EST based molecular markers
Authors: Chet Ram, Bhupendra Singh Panwar, Lalit Arya, Manjusha Verma

Expressed sequence tags (ESTs) are short, single-pass sequences read from the 5' end and/or the 3' end of cDNA clones, and they represent the expressed (largely coding) regions of a genome. Because of the limits of single-pass sequencing, ESTs typically range from 100 to 800 bp in length. EST collections are redundant by nature, since abundant transcripts are sequenced many times, so they must be made non-redundant before use. Developments in high-throughput sequencing techniques make EST based markers one of the cheapest and most valuable tools for characterizing and evaluating genetic resources. EST based markers are broadly classified into four groups:

1) EST-SNPs
2) EST-InDels
3) EST-SSRs
4) ILP
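
As a rough illustration of the redundancy removal mentioned above, the sketch below collapses exact duplicates and reads that are fully contained in a longer read. It is only a simplified stand-in for a real clustering or assembly step: the function name collapse_redundant and the example reads are hypothetical, and reverse-complement reads and partial overlaps are not handled.

```python
def collapse_redundant(ests):
    """Return ESTs with exact duplicates and fully contained reads removed."""
    # Normalise case and drop exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(seq.upper() for seq in ests))
    # Longest first, so each read is only compared against reads already kept.
    unique.sort(key=len, reverse=True)
    kept = []
    for seq in unique:
        # Discard a read if it is a substring of a longer read that was kept.
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept


# Hypothetical reads: one case-variant duplicate and one contained fragment.
ests = ["ATGGCGTACGTTAGC", "atggcgtacgttagc", "GCGTACGTT", "TTTTGGGGCCCC"]
non_redundant = collapse_redundant(ests)
print(non_redundant)
```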

EST-SNPs

Single nucleotide polymorphisms (SNPs) are regarded as next-generation, sequence-based, co-dominant molecular markers. Because SNPs occur throughout the genome, they are a valuable tool for high-density genetic maps, polymorphism analysis, association mapping, etc. Mining EST databases is one of the methods available for SNP discovery.

Procedure:

• Retrieve EST sequences from databases (random as well as specific ESTs)
• Assemble the ESTs into contigs using assembly software such as ABySS or Velvet
• Perform multiple sequence alignment within contigs to detect candidate single nucleotide polymorphisms
• Design primers from the regions flanking each candidate SNP
• Carry out PCR amplification followed by sequencing to validate the variation/polymorphism
• Design allele-specific primers according to the SNP detection method used
• Screen the germplasm to study polymorphism
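
The alignment step of the procedure above can be sketched as a column-by-column scan of the reads in one contig. This is a minimal illustration, not a published SNP caller: the function name find_snps, the example reads and the rule that each allele must be seen at least twice (used here to discount single-read sequencing errors) are all assumptions.

```python
from collections import Counter


def find_snps(aligned, min_allele_count=2):
    """Return (column, alleles) for columns with two or more supported bases."""
    snps = []
    # zip(*aligned) walks through the alignment one column at a time.
    for pos, column in enumerate(zip(*aligned)):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(base for base in column if base in "ACGT")
        supported = sorted(b for b, n in counts.items() if n >= min_allele_count)
        if len(supported) >= 2:
            snps.append((pos, supported))
    return snps


# Hypothetical aligned reads from one contig (equal length, 0-based columns).
contig_reads = [
    "ATGCCGTAAGCT",
    "ATGCCGTAAGCT",
    "ATGCTGTAAGCT",
    "ATGCTGTAAGCT",
    "ATGCCGTAAGAT",  # the lone A at column 10 is treated as a likely error
]
snps = find_snps(contig_reads)
print(snps)
```

Candidate positions found this way would still need the PCR and sequencing validation listed in the procedure.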

EST-InDels

Insertions/deletions (InDels) at a particular location in the expressed part of the genome can alter the expression of a gene. InDels can therefore be used as markers for evaluating genetic variation between organisms. The source of variation in InDel markers is random insertion or deletion, which changes the length of the amplified fragment and makes these markers co-dominant. Compared with the coding sequence (CDS) of a gene, the untranslated regions (UTRs) are under less selective pressure and are therefore more variable in nucleotide sequence. Because of this variability, UTRs are well suited to the identification of InDel markers; the 3' UTR is considered the more appropriate of the two because, with few exceptions, it has no regulatory role. InDels are identified from EST databases in almost the same way as SNPs.
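
Since InDels are found in much the same way as SNPs, the idea can be sketched on a pairwise alignment in which gaps are written as "-". The function name find_indels and the two aligned sequences are hypothetical; a real workflow would take the alignment from an alignment program rather than typing it in.

```python
import re


def find_indels(ref_aln, alt_aln):
    """List gap runs between two aligned sequences as (kind, column, length)."""
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")
    indels = []
    # A gap run in the reference row means the other sequence has an insertion.
    for match in re.finditer(r"-+", ref_aln):
        indels.append(("insertion", match.start(), len(match.group())))
    # A gap run in the other row means it has a deletion relative to the reference.
    for match in re.finditer(r"-+", alt_aln):
        indels.append(("deletion", match.start(), len(match.group())))
    # Report events in alignment order.
    return sorted(indels, key=lambda item: item[1])


# Hypothetical aligned 3' UTR fragments from two genotypes.
ref = "ATGCC---GTAAGCTTGA"
alt = "ATGCCTTAGTAAG--TGA"
indels = find_indels(ref, alt)
print(indels)
```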

EST-microsatellites (EST-SSR)

Simple Sequence Repeats (SSRs) are tandem repeats of mono-, di-, tri-, tetra-, penta-, hexa-nucleotide (and longer) motifs in the genome of an organism. They are highly variable, multi-locus, co-dominant and highly reproducible markers, but they require prior knowledge of the sequence. They are used to generate genetic maps, for association mapping and for tagging genes for particular traits. The basis of polymorphism is variation in the length of the tandem repeat, i.e. in the number of repeat units, among alleles at a locus.

SSRs can be developed by mining nucleotide repeats from available sequences, or by constructing SSR-enriched libraries for species in which sequence information is not available. For some species, SSR markers have been transferred from related species. ESTs generated by transcriptome projects can be used to develop cost-effective, reliable and functional SSRs, since they provide direct evidence of a nucleotide repeat within a genic region.

Procedure:
• Retrieve EST sequences from databases (random as well as specific)
• Assemble the ESTs into contigs using available software
• Analyse the EST contigs in silico to mine nucleotide repeats using software (e.g. MIcroSAtellite identification tool (MISA), Simple Sequence Repeat Identification Tool (SSRIT), WebSat)
• Design primer pairs flanking the repeats
• PCR assay
• Validation of amplicons on high resolution gel matrix
• Allele scoring and validation of variations
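
The in silico mining step of the procedure above can be illustrated with a regular-expression scan for perfect tandem repeats. This is a minimal sketch and not the algorithm of MISA, SSRIT or any other tool: the names find_ssrs and is_reducible, the minimum repeat counts and the example contig are assumptions chosen for illustration, and compound or imperfect repeats are not detected.

```python
import re

# Minimum number of repeat units per motif length (illustrative, adjustable).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. ATAT = (AT)2."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_count in min_repeats.items():
        # One motif of unit_len bases followed by at least min_count - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. "AA" or "ATAT", which belong to a shorter motif class.
            if is_reducible(motif):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical EST contig carrying an (AG)7 and an (ACC)5 repeat.
contig = "GGCT" + "AG" * 7 + "TTG" + "ACC" * 5 + "TTG"
ssrs = find_ssrs(contig)
print(ssrs)
```

The reported start positions (0-based) indicate where the flanking sequence for primer design begins and ends.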

Intron Length Polymorphism (ILP) markers

Introns were discovered in the late 1970s and were initially regarded as vestigial genomic sequences and a kind of genomic parasite. This view placed them in the category of selfish or junk DNA. These ideas about their non-functionality and evolutionary neutrality have since been shown to be erroneous, which suggests that introns will find wider use in the near future. Their importance in molecular biology is already indicated by their involvement in the control of gene expression through alternative splicing, Intron Mediated Enhancement (IME) and Intron Dependent Spatial Expression (IDSE). They also give rise to miRNAs and snoRNAs during different developmental stages of an organism. Given this active participation in important molecular and developmental events, introns can no longer be assumed to be neutral in the course of genetic evolution.

Intron Length Polymorphism (ILP) markers have great potential for detecting genomic evolution. ILP is an emerging technique that qualifies as a molecular marker system because it exploits a structural feature of genes, the exon-intron boundaries. The ideal features of these markers are that they are gene-specific, genetically co-dominant, highly variable and easily accessible. They are easily detected by PCR using primers designed from the exons flanking an intron. The basis of polymorphism is length variation in the targeted intron of homologous genes.

Procedure:

• Retrieve EST sequences from databases (random as well as specific)
• Assemble the ESTs into contigs using available software
• Align the EST contigs with the full-length genomic sequence of the gene
• Assign exon-intron boundaries
• Design primer pairs from the exons flanking an intron
• PCR assay
• Visualization of amplicons on high resolution gel matrix
• Validation of variations
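
Once exon-intron boundaries have been assigned as in the procedure above, choosing introns to target can be sketched as a simple filter over exon coordinates. The function name ilp_candidates, the coordinates and both thresholds (a minimum exon length of 20 bp to leave room for a primer, and a maximum intron length of 1000 bp for routine PCR) are assumptions for illustration only.

```python
def ilp_candidates(exons, min_exon=20, max_intron=1000):
    """Pick introns whose flanking exons can hold primers and whose size suits PCR.

    exons: (start, end) pairs on the genomic sequence, 0-based, end-exclusive.
    """
    exons = sorted(exons)
    candidates = []
    # Each pair of neighbouring exons encloses exactly one intron.
    for (s1, e1), (s2, e2) in zip(exons, exons[1:]):
        intron_len = s2 - e1
        exons_ok = (e1 - s1) >= min_exon and (e2 - s2) >= min_exon
        if 0 < intron_len <= max_intron and exons_ok:
            candidates.append({"intron": (e1, s2), "intron_length": intron_len})
    return candidates


# Hypothetical exon coordinates taken from an EST-to-gene alignment.
exons = [(0, 120), (310, 460), (900, 915), (1500, 1700), (4000, 4200)]
candidates = ilp_candidates(exons)
print(candidates)
```

In this example only the first intron passes: the third exon is too short to carry a primer, and the last intron exceeds the assumed size limit.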

Importance of EST based markers

Since ESTs represent the expressed, and mostly conserved, part of a genome, EST based molecular markers can be used for mapping a genome and for establishing the phylogenetic relationship between two or more species. They can also be used for species that have no prior genomic resources or that are difficult to sequence. Compared with genomic markers, EST based molecular markers show a high percentage of cross-species transferability among related or distant species, and can hence serve as a potential tool for comparative genomics.

References

Braglia L, Manca A, Mastromauro F and Breviario D (2010) cTBP: A Successful Intron Length Polymorphism (ILP)-Based Genotyping Method Targeted to Well Defined Experimental Needs. Diversity 2:572-585.

Han Z, Wang C, Song X, Guo W, Gou J, Li C, Chen X, Zhang T (2006) Characteristics, development and mapping of Gossypium hirsutum derived EST-SSRs in allotetraploid cotton. Theor. Appl. Genet. 112:430-439.

Picoult-Newberg L, Ideker T. E, Pohl M. G, Taylor S. L, Donaldson M. A, Nickerson D. A and Boyce-Jacino M (1999) Mining SNPs From EST Databases. Genome Res. 9:167-174.

Wang X, Zhao X, Zhu J, Wu W (2005) Genome-wide investigation of Intron-Length Polymorphisms and their potential as molecular markers in rice (Oryza sativa L.). DNA Res. 12:417-427.

About Author / Additional Info:
I am working as a scientist at Division of Genomic Resources, National Bureau of Plant Genetic Resources, Pusa New Delhi-110012