Normalization in Microarray Data Analysis and Types of Normalization Methods
Author: Nivedita Yadav


Normalization is the first step in microarray data analysis and plays an important role in it, because many undesirable systematic variations are commonly observed in microarray data. Normalization aims to correct the systematic measurement errors and bias in the observed data. These errors and biases may be introduced by several factors, such as differences in probe labeling, the concentration of the target DNA/RNA sequence, hybridization efficiency, and instrumental noise from scanners or printers.

Normalization is the method applied in microarray data analysis to correct these errors and biases; it is the process of removing the variation that affects the measured gene expression. Data from microarray experiments are usually handled in log-transformed form, because on the log scale the data follow a Gaussian distribution more closely. The transformation therefore contributes to error correction.

Normalization of microarray data includes two steps:

(i) Selection of genes to be used as normalization features: The selection can comprise the entire gene set (global), housekeeping genes, rank-invariant genes, or genes spotted with the same print-tip.

(ii) Application of a mathematical operator or metric to calculate the normalization factor from the data of the selected genes: The operators or metrics include the mean/median expression intensity, the mean/median expression ratio, the mean/median log expression ratio, the expression-ratio probability density, and non-linear or piecewise-linear regression.
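The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming two-channel data and the median log2 ratio as the metric; the function names, the intensity values, and the indices standing in for a "housekeeping" subset are all hypothetical.

```python
import math
from statistics import median

def normalization_factor(log_ratios, selected=None):
    """Step (ii): median log2 ratio over the selected genes.

    `selected` is a list of gene indices (step (i)); None means the
    entire gene set, i.e. global normalization.
    """
    if selected is None:
        selected = range(len(log_ratios))
    return median(log_ratios[i] for i in selected)

def normalize(log_ratios, selected=None):
    """Subtract the normalization factor from every log2 ratio."""
    factor = normalization_factor(log_ratios, selected)
    return [m - factor for m in log_ratios]

# Hypothetical red/green channel intensities for five spots.
red = [1200.0, 800.0, 150.0, 4000.0, 600.0]
green = [600.0, 400.0, 75.0, 2000.0, 1200.0]
log_ratios = [math.log2(r / g) for r, g in zip(red, green)]

# Global normalization: all genes are used as normalization features.
global_normalized = normalize(log_ratios)

# Subset normalization: only spots 3 and 4 (an assumed reference set).
subset_normalized = normalize(log_ratios, selected=[3, 4])
```

Changing the selected genes changes the factor, which is why the choice made in step (i) matters as much as the metric chosen in step (ii).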

Normalization techniques fall into three broad groups, each of which contains various methods:

1) Normalization
2) Standardization
3) Centralization

As discussed above, normalization in this narrower sense means transforming microarray data (intensity ratios) so that they approximate a normal distribution. Usually, a log transformation serves as the normalization method for microarray data. Log transformation of intensity ratios is preferable for several reasons. The simple ratio squeezes all down-regulated genes into the range between 0 and 1, while up-regulated genes can take any value above 1. Log transformation removes this asymmetry. In statistical terms, log-transformed data give a more realistic sense of variation and make the variation of intensities and intensity ratios more independent of absolute magnitude. Log transformation also stabilizes the variance of high-intensity spots.

Standardization is the process of expanding or contracting the distribution of the data so that the observed values can be compared with those from another experiment.

Most of the methods meant by "normalization" in the microarray field are covered by the term centralization. It is a method of shifting a distribution so that it is centered on the expected mean.
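The three groups can be illustrated with a small Python sketch. It assumes base-2 logarithms, centering on the sample mean, and scaling to unit standard deviation as the form of standardization; these are common choices rather than the only ones, and the ratio values are made up for the example.

```python
import math
from statistics import mean, pstdev

def log_transform(ratios):
    """Normalization: log2-transform the raw intensity ratios."""
    return [math.log2(r) for r in ratios]

def centralize(values):
    """Centralization: shift the distribution so its mean is zero."""
    center = mean(values)
    return [v - center for v in values]

def standardize(values):
    """Standardization (assumed here as a z-score): center the values,
    then rescale them to unit standard deviation."""
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("cannot standardize constant data")
    return [v / spread for v in centralize(values)]

# Hypothetical ratios: 4-fold up and 4-fold down are not symmetric
# around 1 on the raw scale (4.0 versus 0.25).
ratios = [4.0, 2.0, 1.0, 0.5, 0.25]

logged = log_transform(ratios)   # symmetric around 0 on the log scale
centered = centralize(logged)
scaled = standardize(logged)
```

After the log transformation, a 4-fold increase and a 4-fold decrease sit at the same distance from zero, which is the asymmetry correction described above.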

Types of Normalization Methods:

Normalization is essential for comparing the different conditions of microarray experiments. Visual inspection of the data, comparing them before and after normalization, is used to check that the procedure worked correctly. While there are several methods for normalization, which method gives the best results depends on your own experimental setup. Basically, there are three types of normalization methods, grouped here by array platform:

(i) Affymetrix: Background correction + expression estimation + summarization

Probes are synthesized on the array base by base in a process that combines chemistry and photolithography. Five algorithms are commonly used for Affymetrix arrays: MAS5, PLIER, RMA, GCRMA, and Li-Wong.


MAS5:

1. MAS5 is the older Affymetrix method.

2. MAS5 normalizes each array independently and sequentially.

3. MAS5 uses data from the mismatch (MM) probes to estimate a "robust average", based on subtracting the mismatch probe value from the match (PM) probe value.


RMA:

1. RMA is the default and works well if you have more than a few chips.

2. RMA, as the name suggests (robust multi-array average), uses a multi-chip model.

3. RMA does not use the mismatch probes, because their intensities are often higher than those of the match probes, making them unreliable as indicators of non-specific binding.

4. RMA values are in log2 units.
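RMA's multi-chip character is usually realized through quantile normalization across arrays, a step not spelled out in the list above. The following Python sketch shows only that step followed by the log2 conversion, under simplifying assumptions: it omits RMA's background correction and probe-set summarization, breaks ties by position, and uses made-up intensities. It is an illustration of the idea, not the RMA implementation.

```python
import math

def quantile_normalize(arrays):
    """Give every chip the same intensity distribution.

    `arrays` is a list of equal-length lists, one per chip. Each value
    is replaced by the mean, across chips, of the values holding the
    same rank. Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all chips must have the same number of probes")

    # Mean intensity at each rank, taken across the sorted chips.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]

    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result

# Hypothetical PM intensities for four probes on three chips.
chips = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(chips)

# Expression values are then reported in log2 units.
log2_values = [[math.log2(v) for v in chip] for chip in normalized]
```

Because the rank means are computed from all chips together, the result for one chip depends on the others, which is why a multi-chip method benefits from having more than a few chips.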


GCRMA:

1. GCRMA is similar to RMA but also takes the GC content of the probes into account.

2. GeneChip RMA (GC-RMA) is an enhanced form of RMA that uses the sequence-specific probe affinities of the GeneChip probes to obtain more accurate gene expression values.

3. GCRMA makes use of the MM intensities to correct the background.

PLIER (Probe Logarithmic Intensity Error Estimation):

Despite a non-intuitive assumption about the PM and MM errors made in the derivation of PLIER, the resulting probe-level error function does capture the key characteristics of the ideal error function, assuming that MM probes measure only non-specific binding and no signal.

Li-Wong:

1. Li-Wong is the method implemented in dChip.

2. The term 'Li-Wong' refers to the procedure that normalizes arrays using an invariant set of genes and then fits a parametric model to the probe set data.

(ii) Agilent: Background correction + averaging duplicate spots + normalization

An inkjet printing process is used to deposit oligo monomers onto specially prepared glass slides. This produces 60-mer oligonucleotide probes, which are printed base by base.
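The first two stages of the Agilent workflow named in the heading, background correction and averaging of duplicate spots, can be sketched as follows. This is a minimal illustration under stated assumptions: background correction is taken to be simple subtraction with a floor of 1.0 to keep values positive for a later log transformation, and the spot tuple layout, probe names, and numbers are hypothetical rather than taken from any Agilent software.

```python
from collections import defaultdict
from statistics import mean

def background_correct(foreground, background, floor=1.0):
    """Subtract the local background; clamp to `floor` (an assumption)
    so the value stays positive for a later log transformation."""
    return max(foreground - background, floor)

def average_duplicates(spots):
    """Average background-corrected intensities of duplicate spots.

    `spots` is an iterable of (probe_id, foreground, background)
    tuples; spots sharing a probe_id are treated as duplicates.
    """
    grouped = defaultdict(list)
    for probe_id, foreground, background in spots:
        grouped[probe_id].append(background_correct(foreground, background))
    return {probe_id: mean(values) for probe_id, values in grouped.items()}

# Hypothetical spots: probe "A" is spotted twice, probe "B" once.
spots = [
    ("A", 110.0, 10.0),
    ("A", 130.0, 10.0),
    ("B", 50.0, 60.0),
]
averaged = average_duplicates(spots)
```

The averaged values would then go into the normalization stage, for example a log transformation followed by centering.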

(iii) Illumina: Background correction (in GenomeStudio only) + normalization

Probes are attached to beads that are randomly distributed across the arrays. The arrays are then scanned to identify the position of every bead, based on an individual decoding sequence carried by each probe. This information is combined with the measurement information obtained after hybridization and scanning.

About Author / Additional Info:
I am a research scholar at SHUATS, Allahabad.