GWAS stands for genome-wide association studies. These are research studies that scan the entire human genome for genetic variation. They are designed to find associations between genetic markers and observable phenotypes, including disease conditions.

Such studies are backed by progress in sequencing technology and innovative genotyping methods. The number of polymorphisms discovered in humans has grown rapidly, with millions of SNPs now catalogued. Effective genotyping of other species is also being carried out.

However, the technology poses its own challenges and risks. Because the volume of data analyzed is huge, novel methods of statistical analysis are needed, and software that can handle such data efficiently is increasingly in demand. The large genetic variation among humans makes the analysis more challenging still.

The genetic contribution to aging, cognitive functioning, physical performance, and similar traits has also not been fully defined. The NIH defines GWAS as the study of genetic variation across the human genome, conducted to detect genetic associations with phenotypic traits. Structural differences among humans are greater than nucleotide sequence variations, and hence newer approaches integrate structural variation in addition to sequence variation.

GWAS involves rapid scanning of genetic markers across a sample population to detect genetic variants associated with a particular phenotype or disease. Such studies typically require a large number of subjects, because the associations found with individual SNPs tend to show small odds ratios, on the order of 1.5 or lower. The level of significance needed to declare an association between a variant and the phenotype is also stringent: because a very large number of statistical tests are performed, strong corrections for multiple testing are required.
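To make the scale of these two quantities concrete, here is a minimal Python sketch. It computes an odds ratio from a 2x2 table of allele counts and a Bonferroni-corrected significance threshold. The allele counts, the function names, and the figure of one million tests are illustrative assumptions, not values taken from any study.

```python
def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds of carrying the alternate allele in cases relative to controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical allele counts for a single SNP (assumed for illustration only).
or_value = odds_ratio(case_alt=600, case_ref=1400, ctrl_alt=450, ctrl_ref=1550)

# Assumed number of SNPs tested; one million is used here as a round figure.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)

print(f"odds ratio = {or_value:.2f}")
print(f"per-SNP significance threshold = {threshold:.1e}")
```

With these assumed numbers the odds ratio is about 1.48, while each SNP has to reach a p-value below roughly 5e-8. Detecting such a modest effect at such a strict threshold is what drives the need for large sample sizes.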

Process of GWAS

• A large number of individuals are recruited into case and control groups. They are usually grouped to account for confounding variables such as ethnicity and race. The groups are further stratified by variables such as age and sex, which can enhance the signal.
• Microarray-based techniques are employed to detect single nucleotide polymorphisms (SNPs).
• Haplotypes are derived on the basis of these polymorphisms. Haplotypes are sets of tightly linked alleles that tend to be inherited together and are less susceptible to drastic changes over generations.
• Suitable statistical techniques are employed to reduce the false discovery rate and detect association signals.
• Association signals are mapped, and follow-up studies are carried out to replicate the results.
• Finally, the results are interpreted to identify the biological function or variant to which the association can be linked. Exploratory studies are also done to work out the mechanism of association.
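The statistical step in the process above can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple allelic chi-square test (1 degree of freedom) on a 2x2 table of allele counts, followed by the Benjamini-Hochberg procedure to control the false discovery rate. The SNP names and counts are made up, and real studies typically rely on dedicated software and adjust for covariates.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 allele-count table.

    Returns (chi2, p_value). For 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(chi2 / 2)).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )
    chi2 = numerator / denominator
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True if significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


# Hypothetical allele counts per SNP: (case_alt, case_ref, ctrl_alt, ctrl_ref).
snp_counts = {
    "snp_a": (600, 1400, 450, 1550),
    "snp_b": (500, 1500, 495, 1505),
    "snp_c": (300, 1700, 310, 1690),
}

p_values = {name: allelic_chi_square(*counts)[1] for name, counts in snp_counts.items()}
flags = benjamini_hochberg(list(p_values.values()))
for (name, p), flag in zip(p_values.items(), flags):
    print(f"{name}: p = {p:.3g}, significant at FDR 5% = {flag}")
```

In this toy data only snp_a shows a clear difference in allele counts between cases and controls, so it is the only one flagged. A flagged SNP would still need to be replicated in an independent sample, as the later steps describe.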

Tools for GWAS studies

1. IMPUTE is specifically designed for genotype imputation based on marker data such as HapMap.

2. dbGaP is the database of Genotypes and Phenotypes. It archives and distributes the results of studies in which the interaction of genotype and phenotype has been examined.

3. SNPTEST employs Bayesian methods of analysis to detect SNPs associated with binary or quantitative phenotypic traits. It takes genotype uncertainty into account.

4. HAPGEN is another program; it simulates case and control cohort data at SNP markers, conditional on known haplotypes.

5. MarkerInfofinder is used to search for SNP marker associations by function, location, gene/probe, and type of disease. The search tool integrates the database with the available literature to return the best results.

6. Meta carries out meta-analysis of studies on genetic diseases. However, because few publications report opposing findings and the quality of the data varies widely, such analyses can lack accuracy.

7. GENECLUSTER is software developed for detecting causal loci and finding their genomic location in mapping studies and GWAS.

8. QCTOOL is used for quality control of SNP and sample data.

9. CHIAMO is used in studies with multiple cohorts.

10. GTOOL can generate genotype data subsets and can be used to convert the file format for genotype data for importing data from one program to another.

11. SNP & Variation Suite is designed for analysis of genomic data. It offers a wide range of analysis options, including allele frequencies, p-values, Hardy-Weinberg equilibrium, signed correlation analysis, and genotype counts.

12. Goldsurfer2 is an interactive GWAS program that helps find and highlight areas of interest by arranging data hierarchically. The data can also be linked to graphs and tables for ease of analysis. The software can import several datasets, merge them, and analyze them without loss of significance. PCA techniques can also be employed.
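Some of the quantities mentioned in the list above, such as allele frequency and the Hardy-Weinberg equilibrium (HWE) check, are simple to compute from genotype counts. The sketch below is a generic illustration in Python, assuming a biallelic SNP with both alleles observed and a chi-square goodness-of-fit test with 1 degree of freedom. It is not the implementation used by any of the tools named above, and the genotype counts are hypothetical.

```python
import math


def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (p, q): frequencies of alleles A and B from genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    p = (2 * n_aa + n_ab) / total_alleles
    return p, 1 - p


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions.

    Returns (chi2, p_value) with 1 degree of freedom. Assumes both alleles
    are present, so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    # Expected genotype counts under HWE: p^2, 2pq, q^2 times the sample size.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical genotype counts (AA, AB, BB) for two SNPs.
for label, counts in {"in_hwe": (360, 480, 160), "off_hwe": (400, 400, 200)}.items():
    p, q = allele_frequencies(*counts)
    chi2, p_value = hwe_chi_square(*counts)
    print(f"{label}: freq(A) = {p:.2f}, chi2 = {chi2:.2f}, HWE p-value = {p_value:.3g}")
```

Both example SNPs have the same allele frequencies, but only the first matches the genotype proportions expected under HWE. A strong departure from HWE is commonly treated as a sign of genotyping problems and used as a quality-control filter.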

Newer methods and tools for GWAS continue to appear. The main challenge is filtering noise from the original data to improve accuracy. The new methods aim to improve significance by combining two or more approaches and by developing the underlying statistical algorithms.
