Steps in Microarray Data Analysis - Part I
BY: Sandhya Anand | Category: Bioinformatics | Submitted: 2011-01-18 09:52:46
Article Summary: "Microarray data requires complex preprocessing and statistical tests. The individual steps are detailed to give an overview..."
Microarray data have been used extensively in genomic studies. However, the data are voluminous, which makes it difficult to settle on a common reference design and analysis, unlike with conventional biological experimental data.
The steps used in microarray data analysis are summarized below.
1. Generation of data
Microarray data form an array of expression values derived from the hybridization of cDNA probes with the target. Each value in the matrix measures hybridization as the intensity of fluorescence emitted by the Cy5 and Cy3 dyes. Besides the intensity of hybridization, the emitted light is affected by the overall intensity of the light used in scanning, the dye effect and the background emission.
Scanning of the hybridized arrays yields the expression values. In most cases, the scanner software performs these calculations itself and provides normalized log ratios along with the unprocessed data.
Scanning involves three steps:
1. Gridding: the microarray spots are separated using the image coordinates of the spots.
2. Segmentation: the foreground and background pixels of each microarray spot are separated.
3. Intensity extraction: the average foreground and background intensities are calculated for each individual spot of the array.
After gridding, each microarray spot is marked as a circle. The target median is the median value of all the pixels in the circle. A square is marked around the circle, and the pixels outside the circle but inside the square are used to calculate the background median. Area is defined as the number of pixels inside the circle whose intensity is above that of the background pixels (those outside the circle but inside the square).
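The spot measurements described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular scanner: the function name, its parameters and the toy 5x5 image are hypothetical, and "above the background pixels" is interpreted here as "above the background median", which is an assumption.

```python
from statistics import median

def spot_statistics(image, cx, cy, radius, half_side):
    """Return (target_median, background_median, area) for one spot.

    image is a list of rows of pixel intensities; (cx, cy) is the spot
    centre as (column, row). All names here are hypothetical.
    """
    target, background = [], []
    for y in range(cy - half_side, cy + half_side + 1):
        for x in range(cx - half_side, cx + half_side + 1):
            inside_circle = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            if inside_circle:
                target.append(image[y][x])      # foreground: inside the circle
            else:
                background.append(image[y][x])  # inside the square, outside the circle
    target_median = median(target)
    background_median = median(background)
    # Assumption: a pixel counts toward the area when it exceeds the background median.
    area = sum(1 for value in target if value > background_median)
    return target_median, background_median, area

# Toy image: background of 10 everywhere, a bright spot of 100 around (2, 2).
image = [[10] * 5 for _ in range(5)]
for x, y in [(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)]:
    image[y][x] = 100

stats = spot_statistics(image, cx=2, cy=2, radius=1, half_side=2)
```

In this toy case the circle covers five pixels, all of them brighter than the background, so the area equals the full pixel count of the circle.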
From these values, the integrated intensity is calculated as
Integrated intensity = (Target median - Background median) * Area
The intensities of Cy3 and Cy5 are calculated in this way, and the log2 ratio of the two is taken for further analysis.
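As a worked example of the formula above, the following sketch computes the integrated intensity for each channel and then the log2 ratio. The input numbers are made up for illustration, and taking the ratio as Cy5 over Cy3 is an assumed convention that the article does not state.

```python
import math

def integrated_intensity(target_median, background_median, area):
    # Integrated intensity = (Target median - Background median) * Area
    return (target_median - background_median) * area

def log2_ratio(cy5, cy3):
    # Assumption: the ratio is Cy5 / Cy3; both intensities must be positive.
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive to take a log ratio")
    return math.log2(cy5 / cy3)

# Hypothetical spot values for the two channels.
cy5 = integrated_intensity(target_median=1200, background_median=200, area=50)
cy3 = integrated_intensity(target_median=700, background_median=200, area=50)
ratio = log2_ratio(cy5, cy3)
```

Here the Cy5 intensity is twice the Cy3 intensity, so the log2 ratio is 1. A spot whose background exceeds its target median gives a non-positive intensity and has to be filtered out or handled separately before the logarithm is taken.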
2. Data preprocessing
Microarray data follow an approximately Gaussian (normal) distribution after logarithmic transformation. Preprocessing is essentially aimed at eliminating experimental bias and errors. The MA plot is used to inspect the data and to decide on the normalization procedure.
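For readers unfamiliar with the MA plot, the sketch below computes its two coordinates using the usual definitions: M is the log2 ratio of the two channels and A is their average log2 intensity. The function name and the sample intensities are hypothetical, and which channel counts as "red" is an assumption.

```python
import math

def ma_values(red, green):
    """Return a list of (M, A) pairs, one per spot.

    M = log2(R) - log2(G)          (log ratio, plotted on the y axis)
    A = (log2(R) + log2(G)) / 2    (mean log intensity, plotted on the x axis)
    """
    pairs = []
    for r, g in zip(red, green):
        m = math.log2(r) - math.log2(g)
        a = 0.5 * (math.log2(r) + math.log2(g))
        pairs.append((m, a))
    return pairs

# Hypothetical background-corrected intensities for three spots.
red = [1024.0, 256.0, 64.0]
green = [256.0, 256.0, 256.0]
ma = ma_values(red, green)
```

Plotting M against A shows whether the log ratios drift with intensity; a trend away from M = 0 is usually taken as a sign that intensity-dependent normalization is needed.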
Scaling, in which the data are scaled up, is also employed in some cases; it is suitable for low expression values. The major goals of preprocessing are:
1. To separate changes in gene expression due to biologically relevant variation from the total variation.
2. To remove the effects of technical artifacts due to DNA/oligonucleotide deposits on the slide.
3. To remove errors due to equipment, such as the scanners used in quantifying expression.
4. To account for variations in the quality of RNA due to extraction procedures.
5. To account for variations in the washing of the slides after hybridization.
6. To correct errors in reading the signal due to calibration errors and the type of scanner used for measurement.
The preprocessing steps used in microarray data analysis generally assume that only a small fraction of the genes change expression in an experiment.
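To illustrate how such an assumption is used, here is a minimal sketch of global median centering, a simple normalization that the article does not name. It shifts the log ratios so that their median is zero, which is reasonable only when most genes are unchanged between the two samples. The function name and the sample values are hypothetical.

```python
from statistics import median

def median_center(log_ratios):
    """Shift log ratios so that their median becomes zero."""
    shift = median(log_ratios)
    return [value - shift for value in log_ratios]

# Hypothetical log2 ratios with a systematic offset of about 0.5 (for example a dye bias).
log_ratios = [0.5, 0.25, 0.75, 2.5, 0.5]
centered = median_center(log_ratios)
```

After centering, the bulk of the values sit around zero while the one strongly changed gene still stands out. Intensity-dependent methods refine the same idea by estimating the shift separately at each intensity level.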
3. Design of the experiment
Ideally this should be done before the collection of data. However, due to the increasing availability of free databases of microarray data, acquisition of data has become easier and analysis procedures usually start from these databases.
The experimental design can be a simple control vs. experiment design or a more complicated one.
a. Between subject designs:
These designs have two groups, control and experiment, with 'n' subjects in each group, and are analyzed with simple statistical tests such as the t-test.
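As an illustration of the test mentioned above, the following sketch computes a two-sample t statistic for one gene, using the classical pooled-variance (Student) form. It assumes equal variances in the two groups; the function name and the expression values are hypothetical.

```python
import math
from statistics import mean, variance

def two_sample_t(control, treated):
    """Return (t statistic, degrees of freedom) for two independent groups."""
    n1, n2 = len(control), len(treated)
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(treated) - mean(control)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical log2 expression values of one gene in three subjects per group.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t_stat, dof = two_sample_t(control, treated)
```

Converting the statistic to a p-value needs the t distribution, which statistical libraries provide, for example `scipy.stats.ttest_ind` in SciPy. In a real analysis the test is repeated for every gene on the array.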
b. Within subject designs:
These designs take into consideration the intrasubject variations within the groups. An example is a design that uses patient data collected before and after drug intake.
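For a before/after design of this kind, one common choice (not prescribed by the article) is the paired t-test, which works on the per-subject differences. The sketch below is a minimal version; the function name and the measurements are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    if len(before) != len(after):
        raise ValueError("each subject needs one 'before' and one 'after' value")
    # Each subject acts as its own control: analyse the differences.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical log2 expression of one gene in four patients.
before = [5.0, 6.0, 7.0, 8.0]
after = [6.0, 8.0, 8.0, 10.0]
t_stat, dof = paired_t(before, after)
```

Because the large differences between patients cancel out in the subtraction, the paired test can detect a consistent shift that an unpaired comparison of the same eight values would miss.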
c. Factorial designs:
These are similar to between and/or within subject designs, but also consider factors such as age, gender or any other qualitative factor that can be used to group the control and experiment data.
Within subject factorial designs use data from subjects in different groups, categorized by the factors mentioned above, along with another continuous factor, time.
Mixed factorial designs combine the above, with a choice of the factors under consideration.
d. Reference designs:
These designs require a reference sample. The experiment groups can be one or many and the comparison can be either one way or two ways.
e. Balanced designs/ designs without reference:
Here there is no specific reference and the comparison is between the groups. The number of relationships to be tested depends on the number of groups, and the analysis is usually more complicated. Such designs are useful where the genes are expressed at low levels or show little variation, so that a meaningful comparison with a reference is not possible.
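To show how quickly the number of comparisons grows, the sketch below lists every pair of groups. It assumes that all groups are compared directly with each other, which gives n(n-1)/2 comparisons for n groups; the group labels are hypothetical.

```python
from itertools import combinations

def pairwise_comparisons(groups):
    """Return every unordered pair of groups to be compared."""
    return list(combinations(groups, 2))

# Four hypothetical experimental groups and no reference sample.
groups = ["A", "B", "C", "D"]
pairs = pairwise_comparisons(groups)
n_pairs = len(pairs)
```

Four groups already give six comparisons, whereas a reference design with the same four groups needs only one comparison per group against the reference.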
The choice of the statistical test used to establish significance depends largely on the nature of the data and the experimental design. The factors deciding this choice and the statistics used in microarray data analysis are detailed in the next section.
About Author / Additional Info:
Part 2: http://www.biotecharticles.com/Bioinformatics-Article/Steps-in-Microarray-Data-Analysis-Part-II-559.html
Copyright © 2010 biotecharticles.com - Do not copy articles from this website.