Several new ways to sequence DNA are available today.
Table 22.2 shows how greatly DNA sequencing efficiency has improved over time.
Table 22.1 Public Genome Databases
Organization | Website
GenBank | http://www.ncbi.nlm.nih.gov/Genbank/
European Molecular Biology Laboratory (EMBL) | www.ebi.ac.uk/embl/index.html
DNA Data Bank of Japan | www.ddbj.nig.ac.jp
University of California, Santa Cruz Genome Browser | www.genome.ucsc.edu/
National Center for Biotechnology Information Map Viewer | www.ncbi.nlm.nih.gov/mapview/
National Human Genome Research Institute, NIH | www.genome.gov/
SNP Database | www.ncbi.nlm.nih.gov/projects/SNP
U.S. Department of Energy Genomes to Life | http://genomicsgtl.energy.gov/
Genomes OnLine Database (GOLD) | http://www.genomesonline.org/
Figure 22.4 Sequencing genomes. The “shotgun” approach to genome sequencing overlapped DNA pieces cut from several copies of a genome, then assembled the overall sequence. Newer techniques use microfluidics and nanomaterials to sequence DNA.
1 Sequencing: DNA is “shotgunned” into many small fragments, using restriction enzymes. Automated DNA sequencer devices sequence the small fragments.
2 Assembly: Software aligns ends of DNA pieces by recognizing sequence overlaps.
3 Annotation: Software searches for clues to locations of protein-encoding genes. Databases from other species’ genomes are searched for similarities to identify gene functions.
4 Tiling microarrays display genome pieces.
(Figure labels: chromosomes, DNA fragments, automated DNA sequencer, short sequenced segments, overlaps, DNA microarrays, derived sequence.)
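The assembly step in figure 22.4, in which software aligns the ends of DNA pieces by recognizing sequence overlaps, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm used by any particular assembler: it assumes error-free reads from a single strand, a sequence without repeats, and a simple greedy merge of the best-overlapping pair. The function names (`overlap`, `greedy_assemble`) and the three short reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest end overlap.

    Returns the list of sequences left when no pair overlaps by at least
    `min_len` bases (a single sequence if everything could be joined).
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; stop with several pieces
        # Join the two reads, writing the shared bases only once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical short reads, listed out of order as a sequencer might report them.
reads = ["GAGTCATAGC", "TACGATTCC", "ATTCCGAGTC"]
print(greedy_assemble(reads))  # → ['TACGATTCCGAGTCATAGC']
```

Real genomes contain repeats and sequencing errors, so a greedy merge like this one can join the wrong pieces. That is one reason several copies of a genome are cut and sequenced: the extra reads provide redundant overlaps.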
One of the original methods for sequencing DNA, invented in 1977 by Frederick Sanger, is still widely used and was quite brilliant in concept. The Sanger method generates a series of DNA fragments of identical sequence that are complementary to the DNA sequence of interest. These fragments differ in length from each other by one end base, as follows:
Note that the entire complementary sequence appears in the sequence of end bases of each fragment. The complement of the gene of interest is cut into a collection of pieces, differing in the end bases, which are distinguished with a radioactive or fluorescent label. That is, A, T, C, and G are labeled with different fluorescent colors. Then the fragments are separated by size. Once the areas of overlap are aligned, reading the labeled end bases of the pieces in size order reveals the sequence of the complement, from which the sequence of interest is derived.
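The logic of reading a sequence from labeled end bases can be illustrated with a short Python sketch that uses the ten-base example sequence from this section. It follows the chapter's simplified picture only: strand direction and the actual chemistry that produces the fragments are ignored, sorting by length stands in for separating fragments by size, and the function names are hypothetical.

```python
import random

# Base-pairing rules: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq):
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq)


def nested_fragments(comp):
    """Return fragments of the complement that share one end and differ
    from one another in length by a single end base (the first base here)."""
    return [comp[i:] for i in range(len(comp))]


def read_end_bases(fragments):
    """Order fragments from longest to shortest and read each one's end base."""
    ordered = sorted(fragments, key=len, reverse=True)
    return "".join(fragment[0] for fragment in ordered)


sequence_of_interest = "TACGCAGTAC"
comp = complement(sequence_of_interest)   # ATGCGTCATG
fragments = nested_fragments(comp)
random.shuffle(fragments)                 # fragments start out as an unordered mixture

recovered = read_end_bases(fragments)
print(recovered)                          # → ATGCGTCATG
print(complement(recovered))              # → TACGCAGTAC
```

Because every fragment length occurs exactly once, ordering the fragments by size is enough to recover the complement, and applying the base-pairing rules again gives the sequence of interest.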
Figure 22.5 shows how DNA sequence data derived from the Sanger method appear in scientific papers, and figure 22.6 shows how the sequence is read from the end bases.
Newer approaches to DNA sequencing use a microfluidics environment, which is a small, fluid-filled chamber. One
Figure 22.5 DNA sequence data. In automated DNA sequencing, a readout of sequenced DNA is a series of wavelengths that represent the terminal DNA base labeled with a fluorescent molecule.
CTatGCTTTGGAGAAAGGCTCCATTGgCAATCAAGACACACA
CTNGCTTTGGAGAAAGGCTCCATTGNCAATCAAGACACACA
Sequence of interest: T A C G C A G T A C
Complementary sequence: A T G C G T C A T G
Series of fragments:
T G C G T C A T G
G C G T C A T G
C G T C A T G
G T C A T G
T C A T G
C A T G
A T G
T G
G
Table 22.2 Evolution of DNA Sequencing Efficiency
Year | Number of DNA Bases Sequenced per Day
1986 | 1,000
1995 | 15,000
1997 | 500,000
1998 | 1 million
2005 | 80 million
2009 | 8.8 billion
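The size of the improvement in table 22.2 is easier to grasp as a fold increase and an average doubling time. The Python sketch below computes both from the table's values. The helper names are hypothetical, and the doubling time assumes steady exponential growth between the two years compared, which is a simplification.

```python
import math

# Values from table 22.2: DNA bases sequenced per day, by year.
bases_per_day = {
    1986: 1_000,
    1995: 15_000,
    1997: 500_000,
    1998: 1_000_000,
    2005: 80_000_000,
    2009: 8_800_000_000,
}


def fold_increase(data, start, end):
    """How many times greater the daily output was in `end` than in `start`."""
    return data[end] / data[start]


def doubling_time_years(data, start, end):
    """Average number of years per doubling, assuming steady exponential growth."""
    doublings = math.log2(fold_increase(data, start, end))
    return (end - start) / doublings


print(fold_increase(bases_per_day, 1986, 2009))        # → 8800000.0
print(doubling_time_years(bases_per_day, 1986, 2009))  # roughly 1 year per doubling
```

By this rough measure, daily sequencing output doubled about once a year, on average, between 1986 and 2009.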
“The diploid genome sequence of an individual human,” PLoS Biology, October 2007 (Craig Venter)
“The complete genome of an individual by massively parallel DNA sequencing,” Nature, April 17, 2008 (James Watson)
“The Diploid Genome Sequence of an Asian Individual,” Nature, November 6, 2008 (“YH”)
The genomes sequenced in the human genome projects of 2001 were composites of different individuals. The first two genomes of specific individuals to be sequenced—of genome research pioneers Craig Venter and James Watson—yielded few medical surprises.
Instead, they showed that we had greatly underestimated genetic variation by focusing only on the DNA sequence. The numbers of copies of short sequences (copy number variants, or CNVs) contribute significantly to genetic variation.
“Back in 2001, we thought we differed from chimps by 1.27% of our genomes. Now we know that we differ from each other by as much as 1 to 3%. If we count all the differences, we are about 5 to 6% from the chimp. In the way we put sequences in public databases, we lost the insertions and deletions,” said Venter to an American Society of Human Genetics meeting after he’d had a year to think about what his personal genome sequencing had revealed. It wasn’t much that he didn’t already know from his family history and personal experience.
Venter has gene variants associated with increased risk of Alzheimer disease and cardiovascular disease. He has alleles for dry earwax, blue eyes, lactose intolerance, a preference for activities in the evening, and a tendency toward antisocial behavior, novelty seeking, and substance abuse. Not to his great surprise, he is a fast metabolizer of caffeine. “I can have two double lattes and wash it down with a Red Bull and not be affected by it,” he said.
James Watson, according to his genome sequence, is a heterozygote for a dozen rare recessive disorders, including a glycogen storage disease, two eye conditions, and a DNA repair disorder, and he is at elevated risk for twenty other disorders.
Science journals deemed Watson’s results “of thin clinical value” and yielding “few biological insights.” Said Richard Gibbs, one of the researchers who sequenced Watson’s genome, “We tried genetic counseling on Jim, but it was a failure.” Yet Watson and Venter differ in inherited drug responses, supporting the value of pharmacogenetics/genomics (discussed in chapter 20). Said Venter, “You probably wouldn’t suspect this based on our appearance; we are both bald, white scientists.”
The third person to have his genome sequenced was called, simply, “YH.” He is Han Chinese, an East Asian population that accounts for 30 percent of modern humanity. He has no inherited diseases in his family, but his genome includes 116 gene variants that cause recessive disorders, as well as many risk alleles. He shares with Craig Venter a tendency to tobacco addiction and high-risk alleles for Alzheimer disease.
An overall comparison of the first three genome sequences of individuals provides a peek at our variation. Each man has about 1.2 million SNPs, but a unique collection. In each man, only 0.20 to 0.23 percent of SNPs are nonsynonymous, meaning that they alter an encoded amino acid, and the men share only 37 percent of these more meaningful SNPs. The math indicates, therefore, that about 0.07 percent of our SNPs may affect our phenotypes (0.20 percent × 0.37 is approximately 0.07 percent).
Reading 22.2
The First Three Human Genome Sequences
What will we learn from our personal genome sequences?
Key Concepts
1. In the Sanger method of DNA sequencing, complementary copies of an unknown DNA sequence are cut into different-sized pieces differing from each other by an end base. The pieces are overlapped by size and the labeled end bases read off.
2. Newer techniques use short DNA sequences and nanomaterials.
3. Researchers have built the first genome.