
Why and How of Normalization in Microarray Data Analysis

BY: Sandhya Anand | Category: Bioinformatics | Submitted: 2011-01-05 07:13:50


Normalization is the first step in microarray data analysis. It should not be confused with normalization in the statistical sense, where the purpose is to transform the data so that they follow a normal (Gaussian) distribution. Log-transformed microarray data already approximate a Gaussian distribution, so no modification with that aim is needed.

Normalization of microarray data aims to correct systematic measurement errors and bias in the observed data. These may be introduced by several factors, such as differences in probe labeling, the concentration of the target DNA/RNA sequence, the efficiency of hybridization, and instrumental noise from scanners or printers. Normalization allows data to be compared against a common reference.

The raw measurement of a microarray from the scanner consists of the true signal plus an error component. The error in turn has two parts: bias and variance.

The variance is often distributed uniformly across the data. Examples include random errors arising from instrumentation and biological variation, such as differences between tissue samples or strains of mice. Bias is the tendency of the experimental system to err consistently in one direction. For example, the Cy5 and Cy3 dyes differ in their binding effects. The effects of variance can be reduced through technical and biological replicates, and can also be accounted for with appropriate statistical tests. The effect of bias can be removed by normalization.

Normalization is usually applied to the log ratio of the two dye intensities. With R and G denoting the red (Cy5) and green (Cy3) intensities of a spot, the log ratio is
M = log2(R/G) = log2(R) - log2(G)
The average log intensity at each spot is calculated as
A = (1/2) log2(RG) = (1/2) (log2(R) + log2(G))
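
The two formulas above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a full preprocessing pipeline: the function name `ma_values` and the intensity values are made up for the example, and the intensities are assumed to be positive and already background-corrected.

```python
import numpy as np

def ma_values(red, green):
    """Return (M, A) for red (Cy5) and green (Cy3) spot intensities.

    Assumes all intensities are positive, so the logarithms are defined.
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    m = np.log2(red) - np.log2(green)            # M = log2(R/G)
    a = 0.5 * (np.log2(red) + np.log2(green))    # A = (1/2) log2(RG)
    return m, a

# Hypothetical intensities for four spots
R = np.array([1024.0, 512.0, 256.0, 2048.0])
G = np.array([512.0, 512.0, 1024.0, 2048.0])
M, A = ma_values(R, G)
# M -> [ 1.  0. -2.  0.], A -> [ 9.5  9.   9.  11. ]
```

A spot with M = 0 has equal red and green signal, while A places the spot on the overall intensity scale, which is why the two values are usually plotted against each other.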

Normalization methods can be classified as linear or non-linear. Linear normalization is applied either to a selected set of genes or globally to all genes. It is well suited to data whose quality metrics are consistent, and databases are simple to build with this approach. However, it assumes that the error factor is uniform across all genes and that the mRNA content is the same in the cells selected for comparison.

Non-linear normalization is more precise for data at extreme values, but it requires a reference gene set. Its drawback is that it can produce too many false positives and inflate the apparent statistical power even with statistically poor data. The purpose of both approaches is to bring each image in the microarray data set to the same average brightness, using either simple statistics or more complex ones.
The simplest normalization procedure divides the expression ratios by their mean value. It can treat a single microarray slide as a whole or subdivide it into sectors that form sub-matrices. With this method, low-intensity spots show much higher variability than brighter ones, because their variability includes a larger share of background and machine variation. The smaller the absolute amount of RNA available, the greater the change. Dividing by the mean value also removes the information carried by the average itself from the analysis.
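
As a rough sketch of division by the mean, the snippet below normalizes a set of expression ratios either over the whole slide or separately within each sector. The function names, the ratio values and the sector labels are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def mean_normalize(ratios):
    """Divide all expression ratios on a slide by their overall mean."""
    ratios = np.asarray(ratios, dtype=float)
    return ratios / ratios.mean()

def mean_normalize_by_sector(ratios, sector_ids):
    """Divide each ratio by the mean of the sector (sub-matrix) it belongs to."""
    ratios = np.asarray(ratios, dtype=float)
    sector_ids = np.asarray(sector_ids)
    out = np.empty_like(ratios)
    for sector in np.unique(sector_ids):
        mask = sector_ids == sector
        out[mask] = ratios[mask] / ratios[mask].mean()
    return out

# Hypothetical ratios for six spots spread over two sectors
ratios = np.array([2.0, 4.0, 6.0, 1.0, 3.0, 2.0])
sectors = np.array([0, 0, 0, 1, 1, 1])

whole_slide = mean_normalize(ratios)                   # mean becomes 1
by_sector = mean_normalize_by_sector(ratios, sectors)  # each sector mean becomes 1
```

After either call the normalized ratios average to 1, which illustrates the point made above: the original average level of the slide or sector is no longer available to the analysis.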

For the simplest model, an experiment replicated on two slides, the fitted M values are the average of the two slides. In a dye-swap experiment, the M values of one of the slides must first be multiplied by -1; the average of the two slides then gives the fitted M values on which normalization is performed.
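
The averaging rule for two replicate slides, including the sign flip for a dye swap, might look like the following. The function name `combine_replicates` and the M values are illustrative assumptions, and both slides are assumed to list the same spots in the same order.

```python
import numpy as np

def combine_replicates(m_slide1, m_slide2, dye_swap=False):
    """Average the M values of two replicate slides, spot by spot.

    If the second slide is a dye swap of the first, its M values are
    multiplied by -1 before averaging.
    """
    m1 = np.asarray(m_slide1, dtype=float)
    m2 = np.asarray(m_slide2, dtype=float)
    if dye_swap:
        m2 = -m2
    return (m1 + m2) / 2.0

# Hypothetical M values for three spots on a dye-swap pair
m_first = [1.2, -0.4, 0.1]
m_second = [-1.0, 0.6, -0.1]
fitted_m = combine_replicates(m_first, m_second, dye_swap=True)
# fitted_m -> [ 1.1 -0.5  0.1]
```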

Linear models can perform normalization within a slide, between paired slides, and between two slides on which the same type of hybridization is used. The normalization can be based on all genes, on housekeeping genes alone, or on control genes.

Another type of normalization centers the M values of the M vs. A plot on their median, which assumes that the changes are symmetric over all the genes on a slide. Locally weighted regression (loess) methods are more sophisticated. There are two types: print-tip group loess and global loess.
In print-tip group normalization, each M value is normalized by subtracting the corresponding fitted average for its print-tip group. The local regressions are linear and are fitted with a span of 0.4.
Global loess normalization does not account for differences between subarrays and is therefore useful when spatial variation in the M-A plots is negligible. The process can also weight spots by quality and include scale normalization. More complex procedures, such as scaling by the median absolute deviation, are used in special cases. The choice of normalization method depends on the purpose of the research and the quality of the data.
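
The median, global loess and print-tip group methods described above can be sketched with NumPy alone. This is a simplified illustration under stated assumptions: `loess_fit` is a hand-written local linear regression with tricube weights and the span of 0.4 mentioned above, it omits the robustness iterations and spot-quality weights that established microarray packages typically offer, and all function names and data are hypothetical.

```python
import numpy as np

def median_normalize(m):
    """Shift M values so that their median is zero."""
    m = np.asarray(m, dtype=float)
    return m - np.median(m)

def loess_fit(a, m, span=0.4):
    """Fitted M at each A from local linear regressions with tricube weights."""
    a = np.asarray(a, dtype=float)
    m = np.asarray(m, dtype=float)
    n = len(a)
    k = min(n, max(3, int(np.ceil(span * n))))  # spots used in each local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(a - a[i])
        idx = np.argsort(dist)[:k]              # the k nearest spots along A
        h = dist[idx].max()
        if h == 0.0:
            fitted[i] = m[idx].mean()           # all neighbours share the same A
            continue
        w = (1.0 - (dist[idx] / h) ** 3) ** 3   # tricube weights
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(len(idx)), a[idx]])
        beta, *_ = np.linalg.lstsq(design * sw[:, None], m[idx] * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * a[i]
    return fitted

def global_loess_normalize(a, m, span=0.4):
    """Subtract one loess curve fitted to all spots on the slide."""
    return np.asarray(m, dtype=float) - loess_fit(a, m, span)

def print_tip_loess_normalize(a, m, tip_ids, span=0.4):
    """Subtract a separate loess curve for each print-tip group."""
    a = np.asarray(a, dtype=float)
    m = np.asarray(m, dtype=float)
    tip_ids = np.asarray(tip_ids)
    out = np.empty_like(m)
    for tip in np.unique(tip_ids):
        mask = tip_ids == tip
        out[mask] = m[mask] - loess_fit(a[mask], m[mask], span)
    return out

# Hypothetical slide: two print-tip groups with different intensity-dependent bias
a = np.tile(np.linspace(6.0, 14.0, 30), 2)
tips = np.repeat([0, 1], 30)
m = np.where(tips == 0, 0.3 * a - 2.0, -0.2 * a + 1.5)

m_print_tip = print_tip_loess_normalize(a, m, tips)
m_global = global_loess_normalize(a, m)
```

In this constructed example the bias differs between the two print-tip groups, so the per-group fit removes it while the single global curve leaves a residual trend in each group. When the groups share the same trend, the two methods give similar results, which matches the remark that global loess is adequate when spatial variation is negligible.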
