In 1920, H. Winkler coined the term genome to describe the complete set of chromosomal and extrachromosomal genes of an organism or virus. In eukaryotes, DNA is also present inside mitochondria and, in plants, inside chloroplasts. Some 66 years later, in 1986, Thomas Roderick first used the term genomics and described it as a specific discipline of mapping, sequencing and analyzing genomes. This definition is broad, but it emphasizes the systematic exploration of genome information to answer questions arising in biology and its related areas. It is now possible to know the number, location, size and organization of all the genes required to make up an organism. Genomics is thus the study of the molecular organization of genomes, the information they contain and the gene products they encode. The significance of genomics increased substantially when the Human Genome Project was conceived in 1987; the project officially started in October 1990 in the USA. Since then, scientists throughout the world have worked to sequence completely the genomes of organisms from several important groups. A complete genome sequence reveals the total number of genes and the relationships between them, provides opportunities to exploit the sequence for the desired experiments, and serves as an archive of all the genetic information of the organism.
During the mid-1980s the US Department of Energy (US DOE) started several projects to construct detailed genetic and physical maps of the human genome, determine its complete nucleotide sequence and localize its then-estimated 100,000 genes. New computational methods were developed to analyze genetic-map and DNA sequence data, and new techniques and instruments for DNA analysis were needed. To make results available rapidly to scientists, the project relied on advanced forms of collaborative work, and these efforts developed into the Human Genome Project.
In 1995, the complete genome sequences of the first two bacteria, Haemophilus influenzae (1,830 kb) and the much smaller Mycoplasma genitalium (580 kb), were reported. In 1996, sequencing of the first yeast genome, that of Saccharomyces cerevisiae, was completed. In 1997, sequencing of the genomes of the two best-studied bacteria, E. coli and Bacillus subtilis, was completed. The sequence of Mycoplasma genitalium is important because it helps establish the minimal set of genes required for free-living existence. This appears to be 517 genes, of which 140 code for membrane proteins and 5 have regulatory functions. Several computer programs were adopted for data analysis, and new programs had to be written to carry out sequence assembly. The genomes of many bacteria, including species of medical and industrial importance and species from extreme environments, were sequenced at The Institute for Genomic Research (TIGR) in Maryland, USA, an institute established by Craig Venter.
Sequencing a genome
Sequencing a genome is an enormous task. It requires not only determining the nucleotide sequence of small pieces of the genome but also ordering those pieces into the whole genome. In 1977 Frederick Sanger published the chain-termination technique, which became the most widely used method of DNA sequencing. It uses dideoxynucleotide triphosphates (ddNTPs), which stop DNA synthesis wherever they are incorporated. More recently, automated DNA sequencing systems have been developed in which each of the four ddNTPs is labeled with a fluorescent dye of a different colour; a detector scans the bands on the gel, and the sequence is read from the order of the colours.
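The chain-termination principle can be illustrated with a toy simulation. It assumes an idealized reaction in which a dye-labeled ddNTP stops synthesis once at every position, so exactly one product ends at each base. The dye-to-base colours and the function names are illustrative assumptions, not any instrument's actual scheme.

```python
import random

# Illustrative dye assignment; real kits differ, so treat these colours as assumptions.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {colour: base for base, colour in DYE.items()}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template_3to5: str) -> list[str]:
    """All chain-terminated products of synthesis on a template written 3'->5'.

    The new strand grows 5'->3' and is complementary to the template. In this idealized
    model a ddNTP is incorporated at every possible position, giving one product per base.
    """
    new_strand = "".join(COMPLEMENT[b] for b in template_3to5)
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_gel(fragments: list[str]) -> str:
    """Order products by length (as electrophoresis does) and read the dye on each one's last base."""
    colours = [DYE[frag[-1]] for frag in sorted(fragments, key=len)]
    return "".join(BASE_FOR_DYE[c] for c in colours)


frags = terminated_fragments("TACGGTCA")
random.Random(0).shuffle(frags)  # the reaction holds the products in no particular order
print(read_gel(frags))  # ATGCCAGT, the strand complementary to the template
```

Sorting by length stands in for the gel, and the colour of each band's terminal ddNTP is the only information the detector needs.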
Clone based sequencing
In clone-based sequencing (also known as hierarchical shotgun sequencing), the first step is mapping. One first constructs a map of each chromosome, marking it at regular intervals of about 100 kilobases. Known segments of the marked chromosome are then cloned in plasmids. One special type of plasmid used for genome sequencing is the BAC (bacterial artificial chromosome), which can carry DNA fragments of about 80 to 180 kb in E. coli cells. The cloned fragments are then further broken into small, random, overlapping fragments of about 0.5 to 1.0 kb, and automated sequencing machines determine the order of the nucleotides in each small fragment. Data management and analysis are critical parts of the process, because these machines generate vast amounts of data. As the data are generated, computer programs align and join the sequences of thousands of small fragments. By repeating this process for the thousands of clones that span each chromosome, researchers can join the clones and determine the sequence of each chromosome. Finding the sequence of the small clone fragments is relatively easy; the challenge is assembling all the pieces. The National Human Genome Research Institute used clone-based sequencing for the human genome, relying heavily on the work of computer scientists to assemble the final sequence.
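Once BACs have been placed on the chromosome map, a set of clones must be chosen that spans the chromosome with little redundancy before each one is broken up and sequenced. The sketch below does this with a standard greedy interval-cover rule (always take the clone that reaches farthest from the current end). The `Clone` type, the clone names and the kb coordinates are hypothetical; in practice clone positions are only known approximately from the map.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start_kb: float  # hypothetical map position of the clone's left end
    end_kb: float    # hypothetical map position of the clone's right end


def tiling_path(clones: list[Clone], chrom_len_kb: float) -> list[Clone]:
    """Greedily pick clones that together span [0, chrom_len_kb]; raise if coverage has a gap."""
    ordered = sorted(clones, key=lambda c: c.start_kb)
    path: list[Clone] = []
    covered, i = 0.0, 0
    while covered < chrom_len_kb:
        best = None
        # Among clones that start inside (or touching) the covered region, keep the one reaching farthest.
        while i < len(ordered) and ordered[i].start_kb <= covered:
            if best is None or ordered[i].end_kb > best.end_kb:
                best = ordered[i]
            i += 1
        if best is None or best.end_kb <= covered:
            raise ValueError(f"gap in clone coverage at {covered} kb")
        path.append(best)
        covered = best.end_kb
    return path


clones = [Clone("A", 0, 150), Clone("B", 100, 260), Clone("C", 120, 200),
          Clone("D", 240, 400), Clone("E", 390, 500)]
print([c.name for c in tiling_path(clones, 500)])  # ['A', 'B', 'D', 'E']
```

Clone C lies entirely within A and B, so it adds nothing and is skipped; a missing region raises an error rather than silently producing an incomplete path.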
Whole Genome Sequencing by Shotgun
Before 1995, whole-genome sequencing was not possible because computational power was not sufficient to assemble a genome from thousands of DNA fragments. J. Craig Venter and H. Smith developed whole-genome shotgun sequencing and used it to sequence the genomes of the bacteria H. influenzae and M. genitalium. The approach can be divided into four steps.
Library construction: the chromosome is isolated from the desired cells using standard molecular biology methods and randomly broken into small pieces with ultrasonic waves. The fragments are purified and attached to plasmid vectors, and plasmids carrying a single insert are isolated. A library of plasmid clones is prepared by transforming E. coli strains that lack restriction enzymes with these plasmids.
Random sequencing: the DNA is purified from the plasmids, and thousands of DNA fragments are sequenced on automated sequencers using primers labeled with special dyes. Normally universal primers are used for the thousands of templates; these recognize the plasmid DNA sequence next to the bacterial DNA insert. Each region of the genome is sequenced several times over, which makes the final result accurate.
Fragment alignment and gap closure: special computer programs cluster the sequenced DNA fragments and assemble them into longer stretches of sequence by comparing the nucleotide sequence overlaps between fragments. Two fragments are joined into a longer stretch of DNA if the sequences at their ends overlap and match. This overlap comparison yields a set of larger contiguous nucleotide sequences called contigs, which are then placed in the proper order to form the complete genome sequence. If there is a gap between two contigs, it can be analyzed and filled in with its sequence. Several other methods were also used to order contigs and fill gaps. For example, larger fragments that overlap previously sequenced contigs can be identified with oligonucleotide probes; a fragment that overlaps two contigs places them side by side and allows the gap between them to be filled.
Proofreading: the sequence is proofread carefully so that any ambiguities can be resolved. It is also checked for frameshifts, and any that are found are corrected. This approach required less than four months to sequence the roughly 580 kb genome of M. genitalium.
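The random-sequencing and fragment-alignment steps above can be sketched in a few lines of Python. This is a toy, not TIGR's software: it assumes error-free reads taken from one strand only, a genome without long repeats, and a greedy "join the best-overlapping pair" rule, which is only one simple assembly heuristic among many. The function names and the 20-base minimum overlap are illustrative choices.

```python
import random


def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Cut reads at uniformly random positions (a stand-in for sonication plus sequencing)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 20) -> list[str]:
    """Repeatedly join the pair of fragments with the longest end overlap; return the contigs."""
    frags = sorted(set(reads))  # identical reads add no new sequence
    frags = [r for r in frags if not any(r != o and r in o for o in frags)]  # drop contained reads
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return frags  # no joinable ends remain: each remaining fragment is a contig
        merged = frags[best_i] + frags[best_j][best_k:]
        rest = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        # A fragment spanning the new junction is now redundant, so drop it as well.
        frags = [f for f in rest if f not in merged] + [merged]


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(300))
tiled = [genome[i:i + 50] for i in range(0, 251, 10)]  # overlapping reads spanning the genome
rng.shuffle(tiled)  # the assembler must not depend on read order
contigs = greedy_assemble(tiled)
print(len(contigs), contigs[0] == genome)
```

With truly random reads (`shotgun_reads`), some regions may be missed entirely, and the assembler then returns several contigs separated by gaps; this is the situation the gap-closure methods of the third step address. Because each iteration keeps only the single best join, greedy assembly can also be misled by repeated sequences.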
Hybrid Shotgun-Sequencing Strategy
In a hybrid shotgun-sequencing strategy, sequence reads are generated both clone by clone and across the whole genome. The reads from individual BACs are then used to identify additional matching reads generated by whole-genome shotgun sequencing. This is precisely the strategy being used to finish the Drosophila genome sequence. In principle, a hybrid strategy can capture the advantages of both the clone-by-clone and the whole-genome approaches. For example, the whole-genome shotgun component provides rapid insight into the sequence of the entire genome. Such data can be used to learn about the repertoire of repetitive sequences in a genome and to find conserved sequences shared with other organisms, as is now being done with mouse whole-genome shotgun data to annotate the human genome sequence. At the same time, whole-genome shotgun reads are useful when pooled with the corresponding clone-by-clone data, creating data sets with greater sequence coverage.
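The step of using BAC reads to find matching whole-genome reads can be illustrated with shared k-mers (short substrings of length k). This is a hypothetical sketch, not the method of any particular sequencing center: it assumes error-free reads, ignores reverse complements (real reads come from both strands), and the values `k=16` and `min_shared=2` are arbitrary.

```python
import random


def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_wgs_reads(bac_reads: list[str], wgs_reads: list[str],
                      k: int = 16, min_shared: int = 2) -> list[str]:
    """Return the whole-genome reads sharing at least min_shared k-mers with the BAC's reads."""
    bac_index: set[str] = set()
    for read in bac_reads:
        bac_index |= kmers(read, k)
    return [r for r in wgs_reads if len(kmers(r, k) & bac_index) >= min_shared]


rng = random.Random(7)
genome = "".join(rng.choice("ACGT") for _ in range(400))
bac = genome[100:250]                                    # hypothetical BAC insert
bac_reads = [bac[i:i + 50] for i in range(0, 101, 25)]   # reads from that clone
wgs_reads = [genome[200:260], genome[300:360], genome[20:80]]
recruited = recruit_wgs_reads(bac_reads, wgs_reads)
print(len(recruited))
```

Only the read that overlaps the BAC region is recruited; pooling it with the clone's own reads raises the coverage of that region, as the paragraph describes.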
Next generation sequencing technology
Experience in analyzing the various recently generated genome sequences has led to one definite conclusion: the availability of genome-sequence data has a profound and positive impact on the infrastructure of experimental biology. Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information, and this challenge has catalyzed the development of next-generation sequencing (NGS) technologies. Their primary advantage over conventional methods is the inexpensive production of large volumes of sequence data. The automated Sanger method is considered 'first-generation' technology, and the newer methods are referred to as next-generation sequencing (NGS). These newer technologies comprise various strategies that rely on a combination of template preparation, sequencing and imaging, and genome alignment and assembly methods. The arrival of NGS technologies in the marketplace has changed scientific approaches in basic, clinical and applied research.
Sequencing technologies include a number of methods that are grouped broadly as template preparation, sequencing and imaging, and data analysis. The unique combination of specific protocols distinguishes one technology from another and determines the type of data each platform produces. These differences in data output make it difficult to compare platforms on data quality and cost. Although each manufacturer provides quality scores and accuracy estimates, there is no consensus that a base with a given quality score from one platform is equivalent to a base with the same score from another platform.

A current research focus of several sequencing groups is establishing the optimal balance between reads generated clone by clone and reads generated across the whole genome when implementing a hybrid shotgun-sequencing strategy. Although there is general consensus that about 8- to 10-fold redundant coverage is required for a project that aims to deliver high-quality finished sequence, how much of that coverage should come from the clone-by-clone component and how much from the whole-genome component is still being investigated. In particular, the current efforts to sequence the mouse, rat and zebrafish genomes, all of which use a hybrid sequencing strategy, should provide crucial insight into this issue.
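Why roughly 8- to 10-fold coverage is needed can be estimated with the classic Lander-Waterman (Poisson) approximation: with N reads of length L on a genome of length G, the coverage is c = NL/G, and the chance that a given base is missed by every read is about e^(-c). The sketch below is a simplification that assumes reads land uniformly at random and ignores the minimum overlap needed to detect a join; the 580 kb genome size and 500 bp read length are example values.

```python
import math


def reads_needed(genome_len_bp: int, read_len_bp: int, coverage: float) -> int:
    """Number of reads N such that N * L / G reaches the target coverage c."""
    return math.ceil(coverage * genome_len_bp / read_len_bp)


def fraction_uncovered(coverage: float) -> float:
    """Poisson estimate of the probability that a given base lies in no read."""
    return math.exp(-coverage)


def expected_gaps(genome_len_bp: int, read_len_bp: int, coverage: float) -> float:
    """Approximate number of uncovered stretches: N * e^(-c) (zero minimum-overlap simplification)."""
    return reads_needed(genome_len_bp, read_len_bp, coverage) * math.exp(-coverage)


genome_bp, read_bp = 580_000, 500  # example values only
for c in (8, 10):
    print(c, reads_needed(genome_bp, read_bp, c),
          f"{fraction_uncovered(c):.2e}", round(expected_gaps(genome_bp, read_bp, c), 2))

# In a hybrid design, coverage from the clone-by-clone and whole-genome components adds,
# e.g. 5-fold + 3-fold = 8-fold, provided both sets of reads are spread evenly (an assumption).
print(f"{fraction_uncovered(5 + 3):.2e}")
```

Under this model, raising coverage from 8-fold to 10-fold cuts the expected number of gaps by a factor of roughly e^2 (about 7), which is why finishing projects aim for that range.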
Genome Sequencing Strategies