Biotech Articles

Steps in Microarray Data Analysis - Part II

BY: Sandhya Anand | Category: Bioinformatics | Submitted: 2011-01-18 09:53:39
Article Summary: "Microarray data analysis requires numerous steps and data processing. There are complex statistical tests employed to find the significance of the research finding and these are detailed in this part..."

Depending on the nature of the processed data, it is subjected to either inferential or descriptive statistical tests.

Inferential statistics

Inferential statistics rely on parametric or nonparametric tests to attach statistical significance to the observed gene expression values. This type of analysis suits hypothesis-driven research designs and is well suited to finding differentially expressed genes, such as up-regulated genes.

• The expression ratios of control and experiment data are calculated and normalized.
• Scaling is applied to data with low expression values.
• Set the significance level (the p value cutoff). With a cutoff of 0.05, about 500 of the 10000 genes under analysis are expected to appear significant through random chance alone, even if no gene is truly differentially expressed.
• The goal is to establish whether the targeted gene is differentially expressed between control and experiment data.
• The null hypothesis assumes no difference between the control and experimental data; the alternative hypothesis assumes a difference. If the p value exceeds the significance level of 0.05, the null hypothesis is not rejected.
• A test statistic is computed from the data and converted into a p value, on the basis of which the null hypothesis is either rejected or retained.
• The simplest inferential test is the t-test, which derives the p value from the mean expression values and standard deviations of the two groups.
• A factor that accounts for noise is also incorporated into the test for better precision.
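
The steps above can be sketched for a single gene in plain Python. This is a minimal sketch, not a definitive implementation: the expression values are hypothetical, the helper names are made up for illustration, and an exact permutation test is swapped in for the usual t-distribution lookup so that only the standard library is needed. The noise-correction factor mentioned in the last step is not modeled.

```python
import math
from itertools import combinations
from statistics import mean, variance

# Hypothetical normalized log2 expression values for one gene (4 arrays per group).
control = [7.1, 6.9, 7.3, 7.0]
experiment = [8.4, 8.9, 8.6, 9.1]

alpha = 0.05      # significance level
n_genes = 10000   # genes under analysis
# Genes expected to pass the cutoff by chance alone if no gene truly differs.
expected_false_positives = alpha * n_genes

def welch_t(a, b):
    """t statistic built from the group means and sample variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def permutation_p_value(a, b):
    """Two-sided exact permutation p value for |t|.

    Assumption: used here as a stand-in for the t-distribution lookup.
    Every regrouping must have non-zero variance (true for distinct values).
    """
    pooled = a + b
    observed = abs(welch_t(a, b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count regroupings at least as extreme as the observed one.
        if abs(welch_t(grp_a, grp_b)) >= observed - 1e-12:
            hits += 1
    return hits / total

t_stat = welch_t(experiment, control)
p_value = permutation_p_value(experiment, control)
reject_null = p_value < alpha
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, reject null: {reject_null}")
```

With only four arrays per group there are 70 possible regroupings, so the smallest attainable two-sided p value is 2/70; larger groups give finer resolution.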

Types of inferential statistical tests

Inferential tests are either parametric or nonparametric, depending on the nature of the data. Parametric tests are used for data that follow the normal (Gaussian) distribution. Nonparametric tests are used when the data cannot be assumed to follow a Gaussian distribution. In practice the choice often depends less on the data than on the investigator's judgment, because many test statistics are approximately normally distributed when the sample size is large.

The following general rules guide the choice of statistical test. In each case the first test is for parametric data and the second for nonparametric data.
1. To compare one group to a reference value - One-sample t test and Wilcoxon signed-rank test.
2. To compare two paired groups - Paired t test and Wilcoxon signed-rank test.
3. To compare two unpaired groups - Unpaired t test and Mann-Whitney test.
4. To compare three or more unrelated groups - One-way ANOVA and Kruskal-Wallis test.
5. To compare three or more related groups - Repeated-measures ANOVA and Friedman test.
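
The five rules can also be written as a small lookup table. This is only an illustrative sketch: the key strings and the function name choose_test are hypothetical, and the table does nothing more than restate the list above.

```python
# (comparison design, data assumed to be normally distributed?) -> suggested test
TEST_CHOICE = {
    ("one group vs reference value", True): "one-sample t test",
    ("one group vs reference value", False): "Wilcoxon signed-rank test",
    ("two paired groups", True): "paired t test",
    ("two paired groups", False): "Wilcoxon signed-rank test",
    ("two unpaired groups", True): "unpaired t test",
    ("two unpaired groups", False): "Mann-Whitney test",
    ("three or more unrelated groups", True): "one-way ANOVA",
    ("three or more unrelated groups", False): "Kruskal-Wallis test",
    ("three or more related groups", True): "repeated-measures ANOVA",
    ("three or more related groups", False): "Friedman test",
}

def choose_test(design, parametric):
    """Return the test suggested by the rules for a design and data type."""
    return TEST_CHOICE[(design, parametric)]

print(choose_test("two unpaired groups", parametric=False))
```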

Descriptive statistical tests

This is an exploratory approach in which expression profiles are compared using a measure such as the correlation coefficient and visualized to show the extent of their similarity. The similarity between genes is expressed as a distance metric, which can be any of the following:

• Euclidean distance
• Pearson correlation coefficient or
• Manhattan distance.

The choice of distance measure depends on the area of application and the type of similarity between genes you would like to find.
The expression data are treated as points (vectors) in a multidimensional space. Euclidean distance is based on the squared differences between the coordinates of two vectors, and Manhattan distance on their absolute differences. Manhattan distance is more robust to the presence of outliers. Standardization can be applied before computing any of the three distance measures. After standardization, Euclidean and correlation distances are essentially equivalent, since the squared Euclidean distance becomes proportional to one minus the correlation coefficient.
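
The three measures, and the link between Euclidean and correlation distance after standardization, can be sketched as follows. The gene profiles are hypothetical and the function names are chosen for illustration. Standardization here means zero mean and unit population variance, under which the squared Euclidean distance equals 2n(1 - r) for profiles of length n.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant profiles)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def standardize(x):
    """Rescale a profile to zero mean and unit population variance."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / len(x))
    return [(a - m) / sd for a in x]

# Hypothetical expression profiles of two genes across four arrays.
gene_x = [2.0, 4.0, 6.0, 9.0]
gene_y = [1.0, 3.0, 2.0, 5.0]
n = len(gene_x)

# After standardization, squared Euclidean distance equals 2n(1 - r).
squared_euclid = euclidean(standardize(gene_x), standardize(gene_y)) ** 2
from_correlation = 2 * n * (1 - pearson(gene_x, gene_y))
print(squared_euclid, from_correlation)
```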

Alternatively, statistical algorithms can be used to find the similarity and to group similar objects or data. These descriptive methods are either unsupervised (clustering) or supervised (classification). Unsupervised methods include hierarchical clustering, K-means clustering, and Self Organizing Maps.

• Hierarchical clustering links genes based on similarity and builds a tree, much like a pedigree chart, in which the target gene can be located among its closest relatives.
• Self Organizing Maps (SOM) start from a random initialization and use a measure of correlation to split the genes into subgroups based on similarity. This is an example of machine learning, in which a statistical program learns to cluster similar genes.
• Principal Component Analysis is another method, in which every gene is considered as a dimension (vector). The aim is to find the single dimension, or a small number of dimensions, that best represents the variation in the data.
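
As an illustration of the first method, the sketch below performs a simple agglomerative (bottom-up) hierarchical clustering. It makes several assumptions not taken from the article: average linkage, a distance of one minus the Pearson correlation, four hypothetical gene profiles, and the function name average_linkage. It is meant to show the merging idea rather than replace a clustering package.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant profiles)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters and the merge history, which is the
    information a tree (dendrogram) would be drawn from.
    """
    clusters = [[name] for name in profiles]
    merges = []

    def cluster_distance(c1, c2):
        # Average correlation distance (1 - r) over all cross-cluster pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1.0 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((list(clusters[i]), list(clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical profiles: A and B rise across arrays, C and D fall.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [4.2, 2.9, 2.2, 0.9],
}
clusters, merges = average_linkage(profiles, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Stopping at two clusters groups the rising genes apart from the falling ones; running the loop down to a single cluster would give the full merge history for the tree.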

Supervised methods include use of linear discriminants, decision trees and Support Vector Machines to classify similar genes into groups for analysis.

All these methods are based on the principles of homogeneity and separation: objects that are similar are grouped together, and dissimilar objects are placed in separate clusters. Although these descriptive methods remove the need for a common reference, their results do not carry the statistical significance that inferential tests provide. The choice of method, however, depends on the purpose of the study and the nature of the data.
