How Can Computers Read DNA?

Article Summary:

Ever wondered how thread-like DNA gets into your computer? How do scientists analyze it, and how do programmers build robust, sophisticated algorithms that can read the whole human genome? If you want answers to these questions, read on and find out how it's done.

Listed below are the steps DNA goes through on its way into the computer.

What is Gene Sequencing?
A genome is like a map without any directions or place names. It consists of a long string of the bases A, C, T and G, but not all of this sequence codes for something meaningful. Scientists have to figure out which stretches are genes (particular combinations of A, C, T and G) and which are not. They also have to find out what these genes code for and how they are related and coordinated.
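Real gene prediction is far more involved than any short example, but a classic first-pass heuristic gives a feel for how a program can flag candidate genes in a string of ACTGs: look for an open reading frame (ORF), a run of three-letter codons that starts with ATG and continues to the first in-frame stop codon (TAA, TAG or TGA). The sketch below is a simplified illustration under that assumption, not how the genome projects annotated genes: it scans only one strand and ignores introns, and the function name `find_orfs` and its `min_codons` threshold are hypothetical.

```python
# Simplified open-reading-frame scan (illustrative heuristic only).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) pairs for ORFs on the given strand.

    An ORF here runs from an ATG to the first in-frame stop codon;
    `end` is exclusive and includes the stop codon. ORFs with fewer
    than `min_codons` codons before the stop are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three possible reading frames
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon appears.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return sorted(orfs)


dna = "CCATGAAATTTTAGCC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

On this toy string the scan reports a single candidate, `ATGAAATTTTAG`, starting at index 2.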

Is the Genome Read and Sequenced All at Once?
No. The human genome, for example, consists of about 3 billion base pairs, so it has to be cut into smaller pieces that can be sequenced more easily. The sequenced pieces are then put back together like a jigsaw puzzle, which scientists have to solve piece by piece. A few of the approaches scientists used while mapping the human genome were:
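The cutting step, and why the pieces then behave like an unsolved puzzle, can be sketched in a few lines of Python. The piece length of 1,000 bases and the function names `fragment` and `pieces_needed` are illustrative assumptions, not figures from the genome project.

```python
def fragment(seq: str, piece_len: int) -> list[str]:
    """Cut a sequence into consecutive, non-overlapping pieces of piece_len bases."""
    return [seq[i:i + piece_len] for i in range(0, len(seq), piece_len)]


def pieces_needed(genome_len: int, piece_len: int) -> int:
    """Number of pieces when a genome is cut into piece_len-sized chunks (ceiling division)."""
    return -(-genome_len // piece_len)


print(fragment("ACGTACGTAC", 4))            # ['ACGT', 'ACGT', 'AC']
print(pieces_needed(3_000_000_000, 1_000))  # 3000000
```

The two identical `ACGT` pieces show the puzzle problem: once the pieces are mixed up, nothing inside them says which one came first. This is why shotgun sequencing, described below, works with randomly overlapping fragments instead of clean, non-overlapping cuts.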

1. Gel Electrophoresis.
2. Sanger's DNA Sequencing method.
3. Whole genome Shotgun method.

Method 1: Gel Electrophoresis
First, DNA is extracted. During purification the DNA molecule gets broken, so it is present as pieces of different lengths. Gel electrophoresis separates DNA fragments by length and, with a high-resolution gel, can distinguish fragments that differ by only one base pair. Here is how the process works:

1. A tray is filled with agarose gel (made from seaweed), which acts as a sieve that separates DNA strands by length.
2. Wells are made at one end of the gel using a comb-like tool. One well is loaded with a standard DNA solution (a "ladder") containing DNA fragments of known lengths, while the other wells are loaded with the DNA samples whose lengths we want to measure.
3. DNA is negatively charged because of its phosphate backbone. Taking advantage of this, an electric current is applied across the gel, creating a negative pole at the well end and a positive pole at the far end.
4. Repelled by the negative pole, the DNA starts to move toward the opposite end, the positive pole.
5. Because the gel acts as a sieve, smaller pieces of DNA move faster toward the opposite end, while larger pieces lag behind.
6. The DNA strands sort themselves: strands of the same length move at the same speed and end up at the same position in the gel.
7. The gel is then stained with ethidium bromide. This chemical binds to DNA and fluoresces under ultraviolet light.
8. The stained DNA appears as a series of sorted bands, each containing fragments of one length.
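Steps 2, 5 and 6 can be mimicked numerically. Grouping fragments by length reproduces the bands, and the ladder well is what turns a band's position into a length: over a gel's working range, the distance a fragment travels is roughly linear in the logarithm of its size, so a straight-line fit through the ladder bands lets you read off unknown sizes. This is a sketch under those assumptions; the ladder sizes and migration distances below are made-up numbers, and `gel_bands`, `fit_standard_curve` and `estimate_size` are hypothetical names.

```python
import math
from collections import Counter


def gel_bands(fragment_lengths: list[int]) -> list[tuple[int, int]]:
    """Group equal-length fragments into bands as (length, count).

    Sorted shortest first, i.e. starting with the band that travels farthest.
    """
    return sorted(Counter(fragment_lengths).items())


def fit_standard_curve(ladder: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(size) = slope * distance + intercept.

    `ladder` holds (size_in_bp, distance_travelled) pairs for the standard well.
    """
    xs = [distance for _, distance in ladder]
    ys = [math.log10(size) for size, _ in ladder]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance: float, slope: float, intercept: float) -> float:
    """Convert a band's migration distance into an estimated length in bp."""
    return 10 ** (slope * distance + intercept)


print(gel_bands([500, 120, 500, 900, 120, 120]))  # [(120, 3), (500, 2), (900, 1)]

ladder = [(10_000, 10.0), (1_000, 20.0), (100, 30.0)]  # (bp, mm), made-up values
slope, intercept = fit_standard_curve(ladder)
print(round(estimate_size(25.0, slope, intercept)))  # 316
```

A band halfway between the 1,000 bp and 100 bp ladder bands comes out near 316 bp rather than 550 bp, because the fit is linear in the logarithm of size, not in size itself.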

Method 2: Sanger's DNA Sequencing method
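To identify the bases, scientists use replication-halting nucleotides (replication is the copying of DNA). Four reaction mixtures are set up. Each mixture includes:

1. DNA to be sequenced (template DNA).
2. DNA polymerase (the enzyme that copies DNA).
3. A short primer that marks where copying starts.
4. The four ordinary nucleotides (A, C, G and T) from which the copy is built.
5. A small amount of one replication-halting nucleotide (A, C, G or T), a different one in each mixture.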

How are all the A, C, T and G bases detected? The replication-halting nucleotides are dideoxynucleotide triphosphates (ddNTPs): versions of A, C, T or G that lack the chemical group the polymerase needs to attach the next nucleotide. The DNA to be sequenced is first cut into smaller fragments, and these template fragments are copied in the four reaction mixtures, each containing its own replication-halting variant. Whenever the polymerase happens to insert the variant, copying stops, so each mixture ends up with copies of many different lengths, all ending in (and tagged by) the same variant. When these copies are sorted by size using gel electrophoresis, they form a pattern of bands on photographic film, for example all the copies ending in C. Reading the films carefully reveals the positions of each base, say C, in the copied sequence; because the copies are complementary to the template, the template's own bases follow by base pairing. The outcome from all four mixtures looks something like this:

DNA to be sequenced:**********

Solution with Replication-Halting Nucleotide C
*C
*******C
Solution with Replication-Halting Nucleotide A
A
****A
Solution with Replication-Halting Nucleotide G
**G
********G
Solution with Replication-Halting Nucleotide T
***T
*****T
******T
*********T

Now we know which letter fills each blank. Combining the data from every solution gives the sequence ACGTATTCGT. In this way DNA is sequenced base by base. The approach was developed by Frederick Sanger, is known as the Sanger method of DNA sequencing, and was used in the Human Genome Project.
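The bookkeeping in the example above is easy to express in code: every copy is recorded as (length, last base), sorting by length plays the role of the gel, and reading the last bases in order spells out the sequence. This is a minimal sketch that assumes every possible stopping point shows up as a band; the function names `simulate_reactions` and `read_gel` are hypothetical.

```python
def simulate_reactions(new_strand: str) -> dict[str, list[int]]:
    """Idealized Sanger run: for each halting base, the lengths of copies that end in it.

    Assumes copying stops at least once at every position, so every length appears.
    """
    reactions: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        reactions[base].append(length)
    return reactions


def read_gel(reactions: dict[str, list[int]]) -> str:
    """Sort all copies by length (what the gel does) and read their last bases in order."""
    copies = sorted(
        (length, base) for base, lengths in reactions.items() for length in lengths
    )
    lengths = [length for length, _ in copies]
    if lengths != list(range(1, len(copies) + 1)):
        raise ValueError("missing or duplicate band; the sequence has a gap")
    return "".join(base for _, base in copies)


# Band positions from the four solutions in the example above.
bands = {"C": [2, 8], "A": [1, 5], "G": [3, 9], "T": [4, 6, 7, 10]}
print(read_gel(bands))  # ACGTATTCGT
```

Round-tripping also works: `read_gel(simulate_reactions("GATTACA"))` returns `"GATTACA"`. As in the text, the result is the sequence of the new copies, which is complementary to the template.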

Method 3: Shotgun Sequencing
In shotgun sequencing, multiple copies of the DNA to be sequenced are made. These copies are broken into random smaller fragments, and each fragment is then sequenced. Once the fragments are sequenced, they have to be put back together in their precise order. An immense amount of computer power is used to match the fragments by finding where their sequences overlap.
The approach runs into trouble with repeat sequences: often there is no way of knowing how long a repeat is or where the fragments containing it overlap.
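Matching fragments by their overlaps can be illustrated with a greedy overlap merger, a simple textbook strategy and not the assembler actually used for the human genome. It repeatedly joins the two fragments with the longest suffix-to-prefix overlap. The minimum overlap of 3 bases and the function names `overlap` and `greedy_assemble` are assumptions for this toy example; real assemblers must also cope with sequencing errors, fragments contained inside other fragments, and the repeat problem described above.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap.

    Returns the remaining contigs (ideally a single one).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Shuffled, overlapping reads from the toy sequence TTAGGCATCGAC.
reads = ["ATCGAC", "TTAGGC", "GGCATC"]
print(greedy_assemble(reads))  # ['TTAGGCATCGAC']
```

The toy sequence was chosen so that no three-letter stretch occurs twice. With a repeat longer than the overlap threshold, a greedy merger like this one can join the wrong fragments, which is exactly the weakness the text describes.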

Entering ACTGs into the Computer
For this step a DNA sequencer is used. It works as follows:
1. The sequencer contains one or more lasers that emit light at a wavelength absorbed by the fluorescent dyes used to label the DNA in the sequencing reactions described above.
2. During electrophoresis, as the DNA fragments reach the end of the gel, they pass through the laser beam and are read by a camera.
3. The laser excites the fluorescent dye, which emits light of a characteristic color (in dye-terminator sequencing, each of the four replication-halting nucleotides carries a different dye, so the color identifies the last base of the fragment). The camera records the color and feeds it into a computer.
4. In this way, hundreds of DNA fragments pass the beam and camera one after another and are read.
5. All of these reads are integrated by a computer program, which spots where the DNA fragments overlap and puts them in order.
6. Many overlapping sequences are read so that one uninterrupted sequence can be generated.

The program thus predicts the sequence as it originally appeared in the chromosome.
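Steps 2 to 4 amount to a simple decision for each fragment that passes the laser: which dye color was brightest? The sketch below assumes four-color dye-terminator data, one (A, C, G, T) intensity reading per fragment in the order fragments reach the detector, with a hypothetical channel order and a made-up `min_signal` cutoff below which the base is reported as N (unknown). Real base-calling software also has to correct for overlapping dye spectra, uneven peak spacing and noise.

```python
# Hypothetical channel order: intensity readings are (A, C, G, T).
CHANNELS = ("A", "C", "G", "T")


def call_bases(readings: list[tuple[float, float, float, float]],
               min_signal: float = 0.3) -> str:
    """Call one base per reading by picking the brightest dye channel.

    Readings are in the order fragments pass the laser (shortest first).
    If even the brightest channel is below `min_signal`, the base is 'N'.
    """
    calls = []
    for intensities in readings:
        best = max(range(len(CHANNELS)), key=lambda k: intensities[k])
        calls.append(CHANNELS[best] if intensities[best] >= min_signal else "N")
    return "".join(calls)


readings = [
    (0.90, 0.10, 0.00, 0.05),  # bright in the A channel
    (0.10, 0.80, 0.10, 0.00),  # C
    (0.00, 0.10, 0.70, 0.20),  # G
    (0.10, 0.00, 0.10, 0.95),  # T
    (0.10, 0.10, 0.10, 0.10),  # too faint to call
]
print(call_bases(readings))  # ACGTN
```

Reporting an explicit N instead of guessing keeps uncertain positions visible, so that the overlapping reads in steps 5 and 6 can fill them in later.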

Facts about DNA Sequencing
1. DNA contains repeated sequences, discovered by Alec Jeffreys, that became known as DNA fingerprints. These repeats act like bookmarks within the seemingly random code of the human genome. One approach to sequencing the human genome was to use these repeat sequences as markers to draw a precise map of the genome. Once the genome was mapped, the work could be divided among labs around the world to speed up the sequencing project.
2. During the Human Genome Project, every base pair of DNA was sequenced on average 9 times.
3. Some stretches of DNA were easy to read and needed to be sequenced less often, whereas others were longer and had to be sequenced more often.
4. More than 50 million sequencing reactions were performed during the Human Genome Project, with 2,000 scientists around the world working on it.
5. Only about 25% of the genome lies within genes. The rest includes regulatory regions that control when genes are switched on and off, and long stretches of so-called junk DNA, named that because scientists don't yet know what it does.
6. For the Human Genome Project, DNA samples were obtained from hundreds of donors, whose identities were kept anonymous.
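The "9 times on average" figure in fact 2 is what is usually called coverage or depth: total bases read divided by genome length, C = N × L / G for N reads of length L over a genome of G bases. A sketch of the arithmetic follows, together with a majority vote that shows why reading each base several times helps. The estimate of never-read bases assumes reads land uniformly at random (the idealized Lander-Waterman model), and the pre-aligned reads and the function names `mean_coverage`, `fraction_unread` and `consensus` are hypothetical.

```python
import math
from collections import Counter


def mean_coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Average number of times each base is read: C = N * L / G."""
    return num_reads * read_len / genome_len


def fraction_unread(coverage: float) -> float:
    """Expected fraction of bases never read, if reads land uniformly at random."""
    return math.exp(-coverage)


def consensus(aligned_reads: list[str]) -> str:
    """Majority vote at each position of equal-length reads aligned to the same region."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


genome_len = 3_000_000_000
print(round(genome_len * fraction_unread(9)))  # about 370,000 bases under the ideal model
print(consensus(["ACGT", "ACGA", "ACCT"]))     # ACGT: single-read errors are outvoted
```

Even at 9x coverage the idealized model leaves a few hundred thousand bases unread, and real coverage is less even than the model assumes, which is consistent with fact 3: some stretches had to be sequenced more often than others.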
