Biotech Articles

Bioinformatic Approaches to Find Gene-genome Duplication

BY: Sandhya Anand | Category: Bioinformatics | Submitted: 2011-06-21 11:01:46
Article Summary: "The article discusses the importance of detection of orthologs and bioinformatic methods used in the process..."


Gene and genome duplication events are known to be driving forces in evolution, alongside processes such as genetic drift and mutation. However, deciphering their importance remains a bottleneck, because genome duplication events show no apparent correlation with genetic diversity.

There are several approaches to studying the pattern of these duplication events. The most promising, however, is the use of bioinformatics tools.

Detecting gene duplication involves finding orthologs and paralogs and constructing gene trees and species trees. For genome duplication, evidence needs to be analyzed within a species as well as across different species. Detecting accelerated divergence and measuring positive selection help to trace how duplicate genes evolve.

Determining orthologs and paralogs

Orthologous genes arise by speciation and usually retain the same function. Paralogs, on the other hand, arise by duplication within a genome. After a duplication event, one of the copies often acquires a new function.
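
The distinction can be made concrete on a gene tree. The following is a minimal sketch, assuming a binary gene tree written as nested tuples and hypothetical leaf labels of the form "species|gene". It applies the species-overlap rule: if the two subtrees under a node share a species, the node is treated as a duplication, otherwise as a speciation. This is one common heuristic, not the only way to reconcile gene and species trees.

```python
def leaves(node):
    """Return the list of leaf labels under a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def species_of(leaf):
    """Extract the species part of a 'species|gene' label (assumed format)."""
    return leaf.split("|")[0]


def classify_pairs(tree):
    """Label every gene pair by the event at its last common ancestor."""
    relations = {}

    def visit(node):
        if isinstance(node, str):
            return
        left, right = node
        left_leaves, right_leaves = leaves(left), leaves(right)
        left_species = {species_of(leaf) for leaf in left_leaves}
        right_species = {species_of(leaf) for leaf in right_leaves}
        # Species-overlap rule: a species on both sides implies a duplication.
        event = "paralogs" if left_species & right_species else "orthologs"
        for a in left_leaves:
            for b in right_leaves:
                relations[frozenset((a, b))] = event
        visit(left)
        visit(right)

    visit(tree)
    return relations


# Toy gene family: one duplication (A1/A2) before the human-mouse split.
gene_tree = ((("human|A1", "mouse|A1"), ("human|A2", "mouse|A2")), "fly|A")
relations = classify_pairs(gene_tree)
print(relations[frozenset(("human|A1", "mouse|A1"))])
print(relations[frozenset(("human|A1", "human|A2"))])
```

In this toy tree, human|A1 and mouse|A1 are separated by a speciation node, while human|A1 and human|A2 trace back to the duplication node.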

Comparative genomic approaches depend on identifying orthology from conserved sequences. Evolutionary genomics makes use of complete maps of whole genomes and compares their sequences, including those that underwent duplication.

Methods of determining orthologs

1. Pairwise sequence comparison

This method identifies regions of sequence similarity. The similarity may indicate a functional, structural or evolutionary relationship between the sequences. As the name indicates, it compares two sequences at a time, here taken from different species, and can be used to identify orthologs. Multiple sequence alignment (MSA), by contrast, infers evolutionary relationships among three or more sequences and is useful for finding homologous sequences. Bidirectional BLAST and FASTA take a heuristic approach, whereas the Needleman-Wunsch (global) and Smith-Waterman (local) algorithms use dynamic programming. These methods focus on one-to-one comparison and identification of orthologs. However, BLAST is less effective at detecting homology between distantly related species.
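
As an illustration, the sketch below scores sequence pairs with a basic Needleman-Wunsch recurrence and then keeps only reciprocal best hits, the idea behind bidirectional comparison. The scoring values (match 1, mismatch -1, gap -2), the function names and the toy sequences are assumptions chosen for readability; a real analysis would use BLAST or a similar tool with a proper substitution matrix.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment of a and b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_hit(query_seq, targets):
    """Name of the target sequence that scores highest against the query."""
    return max(targets, key=lambda name: global_alignment_score(query_seq, targets[name]))


def reciprocal_best_hits(genome_a, genome_b):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    pairs = []
    for name_a, seq_a in genome_a.items():
        name_b = best_hit(seq_a, genome_b)
        if best_hit(genome_b[name_b], genome_a) == name_a:
            pairs.append((name_a, name_b))
    return pairs


# Toy "genomes": gene name -> sequence (made-up data).
genome_a = {"a1": "ACGTACGT", "a2": "TTTTGGGG"}
genome_b = {"b1": "ACGTACGA", "b2": "TTTTGGCG", "b3": "ACGAACGA"}
rbh = reciprocal_best_hits(genome_a, genome_b)
print(rbh)
```

Requiring the best hit in both directions filters out cases such as b3, which resembles a1 but is not a1's closest match.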

2. Hit Clustering methods

Hit clustering methods detect clusters of similar sequences, either from pairwise hits or from cluster graphs. Methods that combine algorithms, such as recursive clustering and the Markov Cluster (MCL) algorithm, have been developed for better detection. Such clustering is not ideal for analyzing large data sets from the same genome, since the clusters lack functional coherence.
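
The simplest form of clustering from pairwise hits is single linkage: two sequences fall in the same cluster if a chain of sufficiently strong hits connects them. The sketch below does this with a union-find structure. It is a deliberately simpler stand-in for MCL, and the hit tuples, score scale and `min_score` threshold are made-up assumptions.

```python
def cluster_hits(hits, min_score):
    """Group sequences linked by hits scoring at least min_score (single linkage)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b, score in hits:
        root_a, root_b = find(a), find(b)  # also registers both sequences
        if score >= min_score and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for seq in parent:
        clusters.setdefault(find(seq), set()).add(seq)
    return sorted(clusters.values(), key=sorted)


# Hypothetical pairwise hits: (sequence, sequence, similarity score).
hits = [
    ("g1", "g2", 95.0),
    ("g2", "g3", 88.0),
    ("g3", "g4", 30.0),  # too weak to link the two groups
    ("g4", "g5", 91.0),
]
clusters = cluster_hits(hits, min_score=50.0)
print(clusters)
```

Single linkage tends to chain unrelated families together through one spurious hit, which is one reason graph-based methods such as MCL are preferred on real data.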

3. Synteny

Synteny refers to the presence of two or more genes on the same chromosome in a species. Determining orthologous sequences using synteny relies on conserved sequences. Regions of conserved synteny share at least one ortholog across the species compared. However, the number of orthologs found this way does not give accurate data on genomic distances, since the total number of chromosomes varies among species.
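
A rough sketch of a conserved-synteny check follows, assuming the ortholog pairs and the chromosome of each gene are already known. It groups ortholog pairs by the chromosome pair they fall on and reports pairs of chromosomes that share at least `min_shared` orthologs. The function name, the threshold of 2 and the toy data are assumptions; gene order within a chromosome is ignored here.

```python
from collections import defaultdict


def conserved_synteny(chrom_a, chrom_b, orthologs, min_shared=2):
    """Map (chromosome in A, chromosome in B) to the ortholog pairs they share."""
    shared = defaultdict(list)
    for gene_a, gene_b in orthologs:
        shared[(chrom_a[gene_a], chrom_b[gene_b])].append((gene_a, gene_b))
    # Keep only chromosome pairs supported by enough ortholog pairs.
    return {pair: genes for pair, genes in shared.items() if len(genes) >= min_shared}


# Gene -> chromosome for two hypothetical species.
chrom_a = {"a1": "chr1", "a2": "chr1", "a3": "chr2", "a4": "chr2"}
chrom_b = {"b1": "chrX", "b2": "chrX", "b3": "chrX", "b4": "chr5"}
orthologs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]

blocks = conserved_synteny(chrom_a, chrom_b, orthologs)
print(blocks)
```

Here only chr1 and chrX qualify, because both a1 and a2 have orthologs on the same chromosome in the second species.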

4. Phylogenetic approaches

Phylogenetic approaches construct phylogenetic trees based on sequence similarity. A tree can also be built from a distance matrix between sequences, restriction data or allele data. Orthologous sequences cluster close to each other. The approach is best suited to specific gene families. A genome-wide analysis can give inaccurate results because of convergent evolution.
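
To show how a tree can be built from a distance matrix, the sketch below computes p-distances (fraction of differing sites) between pre-aligned sequences and joins clusters with UPGMA. UPGMA is chosen only because it is short to write; it assumes a constant rate of change, and the sequences and names (`sp1` to `sp4`) are invented for the example.

```python
def p_distance(a, b):
    """Fraction of aligned positions that differ between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters."""
    sizes = {name: 1 for name in seqs}  # cluster -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a in seqs for b in seqs if a < b
    }
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)
        left, right = sorted(pair, key=str)
        merged = (left, right)
        total = sizes[left] + sizes[right]
        for other in sizes:
            if other in pair:
                continue
            # Size-weighted average of the distances from the merged clusters.
            d_left = dist[frozenset((left, other))] * sizes[left]
            d_right = dist[frozenset((right, other))] * sizes[right]
            dist[frozenset((merged, other))] = (d_left + d_right) / total
        # Drop distances that involve the two clusters just merged.
        dist = {k: v for k, v in dist.items() if left not in k and right not in k}
        sizes[merged] = sizes.pop(left) + sizes.pop(right)
    return next(iter(sizes))


seqs = {
    "sp1": "ACGTACGT",
    "sp2": "ACGTACGA",
    "sp3": "ACGAACTA",
    "sp4": "TCGAGCTA",
}
tree = upgma(seqs)
print(tree)
```

Sequences that end up as neighbours in the resulting tree are the closest by distance, which is the sense in which orthologous sequences cluster near each other.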

Problems in determination of orthologs

However, several constraints complicate genome-wide studies to detect orthology.
a. The number of genes to be analyzed is huge and includes many paralogous gene families that subfunctionalized before species divergence. This makes the correlation almost non-linear.
b. Gene duplication events do not occur at steady rates. Evolutionary history shows evidence of abundant duplication and even of loss of existing genomic material. Similarly, protein families sometimes expand, and genes acquire new functions or become inactivated.
c. False positives can arise when unrelated proteins contain matching sequences that are not due to common ancestry.
d. Noise in genomic data is unavoidable in most cases.
e. Mutation rates can vary between related or similar genes and species. This can result in unpredictable gene divergence.
f. Pseudogenes and incorrect or incomplete gene models also pose serious problems in analysis.

Methods for analyzing orthologs face serious challenges as the data grow, especially in genome-wide analyses. Combining two or more approaches has been promising. Determining gene orthology is a necessary prerequisite for data mining and analysis of genomic data. Several tools, such as PSI-BLAST, COG and INPARANOID, make use of pairwise sequence comparison. Newer approaches such as SynPhyl integrate the syntenic and phylogenetic methods and show promise for analysis at a genome-wide scale.
