Human Genome Project - Sequencing The Human Genome - Learn Science at Scitable
Thanks to the Human Genome Project, researchers have sequenced all 3.2 billion base pairs in the human genome. How
did researchers complete this chromosome map years ahead of schedule?
The Human Genome Project was a 13-year-long, publicly funded project initiated in 1990 with the objective of determining the DNA sequence of the entire euchromatic human genome within 15 years. In its early days, the Human Genome Project was met with skepticism from scientists and nonscientists alike. One prominent question was whether the huge cost of the project would outweigh its potential benefits. Today, however, the overwhelming success of the Human Genome Project is readily apparent. Not only did the completion of this project usher in a new era in medicine, but it also led to significant advances in the technology used to sequence DNA.
The Human Genome Project started with two early goals: (1) building genetic and physical maps of the human and mouse genomes, and (2) sequencing the smaller yeast and worm genomes as a test run for sequencing the larger, more complex human genome (IHGSC, 2001). When the yeast and worm efforts proved successful, sequencing of the human genome proceeded with full force.
The shotgun phase of the Human Genome Project itself consisted of three steps:
1. Obtaining a DNA clone to sequence
2. Sequencing the DNA clone
3. Assembling sequence data from multiple clones to determine overlap and establish a contiguous sequence

Figure 1: Total amount of human sequence in the High-Throughput Genomic Sequences (HTGS) division of GenBank. The total is the sum of finished sequence (red) and unfinished (draft plus predraft) sequence (yellow).
© 2001 Nature Publishing Group. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 867 (2001). All rights reserved.
https://www.nature.com/scitable/topicpage/dna-sequencing-technologies-key-to-the-human-828 1/6
12/15/2018 Human Genome Project: Sequencing the Human Genome | Learn Science at Scitable
The approach used by the members of the International Human Genome Sequencing Consortium (IHGSC) was called the hierarchical shotgun method, because the team members systematically generated overlapping clones mapped to individual human chromosomes and then sequenced each clone individually using a shotgun approach (Figure 2). The clones were derived from DNA libraries made by partially digesting genomic DNA from anonymous human donors with a restriction enzyme and ligating the resulting fragments into bacterial artificial chromosome vectors, which could be propagated in bacteria.
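Partial digestion is what makes library clones overlap: because each copy of the genome is cut at only some of its restriction sites, different copies yield fragments with different end points. The sketch below simulates this on a toy sequence. It assumes a HindIII-style site (AAGCTT, cut after the first A) and a fixed per-site cutting probability; the text does not say which enzyme or conditions were used for the BAC libraries, and the function names are hypothetical.

```python
import random

# Illustrative sketch; function names and parameters are hypothetical.

def cut_sites(genome: str, site: str, offset: int) -> list[int]:
    """Positions where an enzyme recognizing `site` cuts, `offset` bases into each site."""
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start + offset)
        start = genome.find(site, start + 1)
    return positions

def partial_digest(genome: str, site: str, offset: int,
                   p_cut: float, rng: random.Random) -> list[tuple[int, int]]:
    """Digest one copy of `genome`, cutting each site only with probability p_cut.

    Returns fragments as (start, end) coordinates so that fragments from
    different copies can be compared for overlap.
    """
    used = [pos for pos in cut_sites(genome, site, offset) if rng.random() < p_cut]
    bounds = [0] + used + [len(genome)]
    return list(zip(bounds, bounds[1:]))

# Digest three copies of the same toy sequence; because different sites are
# skipped in each copy, fragments from different copies overlap.
genome = "CCAAGCTTGGGGAAGCTTCCTTAAGCTTAA"
rng = random.Random(42)
for copy in range(3):
    print(copy, partial_digest(genome, "AAGCTT", 1, 0.5, rng))
```

With `p_cut=1.0` every site is used (a complete digest); lowering it produces longer fragments whose ends differ from copy to copy.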
When possible, the DNA fragments within the library vectors were mapped to chromosomal regions by screening for sequence-tagged sites (STSs), which are DNA fragments, usually less than 500 base pairs in length, of known sequence and chromosomal location that can be amplified using polymerase chain reaction (PCR). Library clones were also digested with the restriction enzyme HindIII, and the sizes of the resulting DNA fragments were determined using agarose gel electrophoresis. Each library clone exhibited a DNA fragment "fingerprint," which could be compared to that of all other library clones in order to identify overlapping clones. Fluorescence in situ hybridization (FISH) was also used to map library clones to specific chromosomal regions. Collectively, the STS, DNA fingerprint, and FISH data allowed the IHGSC to generate contigs, which consisted of multiple overlapping bacterial artificial chromosome (BAC) library clones spanning each of the 24 different human chromosomes (i.e., 22 autosomes and the X and Y chromosomes).
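Fingerprint comparison can be sketched as follows: digest each clone in silico with HindIII (recognition site AAGCTT, cut between the two A's), record the fragment sizes, and count how many bands two clones share within a sizing tolerance. This is a simplified illustration with hypothetical function names and tolerance; real fingerprint-mapping tools also weigh the chance that bands match by coincidence rather than relying on a raw shared-band count.

```python
# Illustrative sketch; function names and the tolerance value are hypothetical.

def hindiii_fingerprint(clone: str) -> list[int]:
    """Complete HindIII digest (site AAGCTT, cut A^AGCTT); returns sorted fragment sizes in bp."""
    site, offset = "AAGCTT", 1
    cuts = []
    start = clone.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = clone.find(site, start + 1)
    bounds = [0] + cuts + [len(clone)]
    return sorted(end - begin for begin, end in zip(bounds, bounds[1:]))

def shared_bands(fp1: list[int], fp2: list[int], tolerance: int = 0) -> int:
    """Count bands in fp1 that pair with a distinct band in fp2 whose size differs
    by at most `tolerance` bp (gel sizing is never exact)."""
    remaining = list(fp2)
    shared = 0
    for size in fp1:
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[i]  # each band in fp2 can be used only once
                break
    return shared

# Two hypothetical gel fingerprints (fragment sizes in bp).
clone_a = [1200, 2300, 4100, 5050, 7600]
clone_b = [2310, 4080, 5060, 9900]
print(shared_bands(clone_a, clone_b, tolerance=30))  # 3
```

Clone pairs that share many bands become candidate overlaps; STS and FISH data can then confirm where the merged contig lies on a chromosome.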
Next, individual BAC clones selected for DNA sequence analysis were further fragmented, and the smaller genomic DNA fragments were subcloned into vectors to generate a BAC-derived
shotgun library. The inserts were sequenced using primers matching the vector sequence flanking the genomic DNA insert, and overlapping shotgun clones were used to generate a DNA
sequence spanning the entire BAC clone. A summary of this step is shown in Figure 3. The members of the IHGSC agreed that each center would obtain an average of fourfold sequence
coverage, with no clone having less than threefold coverage. The term "shotgun" comes from the fact that the original BAC clone was randomly fragmented and sequenced, and the raw DNA
sequence data was then subjected to computational analyses to generate an ordered set of DNA sequences that spanned the BAC clone.
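The coverage targets above can be made concrete with the standard Lander-Waterman approximation: with N reads of length L on a target of length G, average coverage is c = NL/G, and if reads start at random positions the expected fraction of bases covered by no read is about e^(-c). The BAC length (150 kb) and read length (500 bp) below are illustrative assumptions, not figures from the text, and the helper names are hypothetical.

```python
import math

# Illustrative sketch; helper names and example sizes are hypothetical.

def fold_coverage(num_reads: int, read_length: int, target_length: int) -> float:
    """Average coverage c = N * L / G: how many reads cover a typical base."""
    return num_reads * read_length / target_length

def reads_needed(target_coverage: float, read_length: int, target_length: int) -> int:
    """Smallest number of reads whose combined length reaches the target coverage."""
    return math.ceil(target_coverage * target_length / read_length)

def uncovered_fraction(c: float) -> float:
    """Lander-Waterman estimate of the fraction of bases hit by no read, assuming
    read start points are independent and uniform (Poisson with mean c)."""
    return math.exp(-c)

bac_length, read_length = 150_000, 500  # assumed, illustrative values
n = reads_needed(4, read_length, bac_length)
print(n, fold_coverage(n, read_length, bac_length))
for c in (3, 4):
    print(c, round(uncovered_fraction(c), 4))
```

Under these assumptions, about 1,200 reads give fourfold coverage of a 150-kb clone, and roughly 2% of bases are expected to remain uncovered at 4x versus about 5% at 3x, which is why shotgun drafts still contain gaps.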
In the whole-genome assembly method (also called the whole-genome random shotgun method), Celera generated a massive shotgun library derived from its own DNA sequence data combined
with the "shredded" Human Genome Project DNA sequence data, which together corresponded to a total of 43.32 million sequence reads (Venter et al., 2001). Celera used computational
methods and sophisticated algorithms to identify overlapping DNA sequences and to reconstruct the human genome by generating a set of scaffolds (Figure 5).
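Identifying overlapping reads and merging them into longer sequences can be illustrated with a toy greedy assembler that repeatedly joins the two sequences with the longest suffix-prefix overlap. This is a deliberately simplified stand-in, not Celera's algorithm; real assemblers must also handle sequencing errors, repeats, and reads from both DNA strands. The function names and the minimum-overlap parameter are hypothetical.

```python
# Illustrative greedy-overlap sketch; not the assembler Celera actually used.

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`
    (at least min_len bases), or 0 if there is none."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the pair with the longest overlap until no overlap of at least
    min_len bases remains; returns the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    contigs = [r for r in contigs         # drop reads contained in another read
               if not any(r != other and r in other for other in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

print(greedy_assemble(["CGTTAGC", "ATGCGTAC", "GTACGTTA"]))  # ['ATGCGTACGTTAGC']
```

Greedy merging works on this repeat-free example but can join the wrong pieces when a sequence occurs more than once in a genome, which is one reason large assemblies also use information such as paired end reads.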
Figure 5: In whole-genome assembly, the BAC fragments (red line segments) and the reads from five individuals (black line segments) are combined to produce a contig and a consensus sequence (green line). The contigs are connected into scaffolds, shown in red, by pairing end sequences, which are also called mates. Any gap between consecutive contigs in a scaffold therefore has a known approximate size. Next, the scaffolds are mapped to the genome (gray line) using sequence-tagged site (STS) information, represented by blue stars.
© 2001 American Association for the Advancement of Science. Venter, C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001). All rights reserved.
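The point that gaps between mate-linked contigs have a known size follows from simple arithmetic: if both mates come from a DNA fragment of known length, whatever part of that length is not accounted for by the two contig ends must be gap. The sketch assumes both contigs are in forward orientation and that the insert size is known; in practice insert sizes vary around a mean, so several mate pairs are averaged. All function names and numbers are hypothetical.

```python
from statistics import mean

# Illustrative sketch; names, orientations, and insert sizes are assumptions.

def gap_from_mate(len_a: int, read_a_start: int, read_b_end: int, insert_size: int) -> int:
    """Gap between contig A (left) and contig B (right) implied by one mate pair.

    The left mate starts at read_a_start in A, the right mate ends read_b_end
    bases into B, and insert_size spans from the start of the left mate to the
    end of the right mate:
        insert_size = (len_a - read_a_start) + gap + read_b_end
    A negative result suggests the two contigs actually overlap.
    """
    return insert_size - (len_a - read_a_start) - read_b_end

def gap_from_mates(len_a: int, mates: list[tuple[int, int, int]]) -> int:
    """Average the per-mate estimates (read_a_start, read_b_end, insert_size)."""
    return round(mean(gap_from_mate(len_a, *m) for m in mates))

def layout_scaffold(contig_lengths: list[int], gaps: list[int]) -> list[int]:
    """Start coordinate of each contig in a scaffold, given gaps between neighbours."""
    if len(gaps) != len(contig_lengths) - 1:
        raise ValueError("need exactly one gap between each pair of adjacent contigs")
    starts, pos = [], 0
    for length, gap in zip(contig_lengths, gaps + [0]):
        starts.append(pos)
        pos += length + gap
    return starts

gap = gap_from_mates(5000, [(4500, 300, 2000), (4200, 600, 2000)])
print(gap, layout_scaffold([5000, 3000], [gap]))
```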
In contrast, with the regional chromosome assembly approach (also called the compartmentalized shotgun assembly method), Celera organized its own data and the Human Genome Project sequence data into the largest possible chromosomal segments, followed by shotgun assembly of the sequence data within each segment (Venter et al., 2001); this approach was similar to the hierarchical shotgun approach used by the IHGSC. The first step of the regional assembly approach involved separating Celera reads that matched Human Genome Project reads from those that were distinct from the public sequence data. Of the 27.27 million Celera reads, 21.38 million matched a Human Genome Project bactig (a contig assembled from BAC sequence data), and 5.89 million did not match the public sequence data. These reads were assembled into Celera-specific or Human Genome Project-specific scaffolds, which were then combined and analyzed using whole-genome assembly algorithms. The resulting bactig data were again "shredded" to permit unbiased assembly of the combined sequence data.
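Two steps of the regional approach, shredding bactigs into read-sized pieces and sorting Celera reads by whether they match the public data, can be sketched as below. The text does not say how a "match" was defined, so sharing a single k-mer is an assumed, deliberately crude criterion; the read length, step size, k, and function names are all hypothetical.

```python
# Illustrative sketch; matching criterion and all parameters are assumptions.

def shred(bactig: str, read_len: int, step: int) -> list[str]:
    """Cut a bactig into overlapping pseudo-reads of read_len bases, starting a
    new pseudo-read every `step` bases (step < read_len gives overlaps)."""
    if len(bactig) <= read_len:
        return [bactig]
    starts = list(range(0, len(bactig) - read_len + 1, step))
    if starts[-1] != len(bactig) - read_len:
        starts.append(len(bactig) - read_len)  # make sure the last bases are covered
    return [bactig[s:s + read_len] for s in starts]

def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def partition_reads(reads: list[str], bactigs: list[str], k: int):
    """Split reads into those sharing at least one k-mer with some bactig
    (treated here as matching the public data) and those sharing none."""
    index: set[str] = set()
    for bactig in bactigs:
        index |= kmers(bactig, k)
    matched, unmatched = [], []
    for read in reads:
        (matched if kmers(read, k) & index else unmatched).append(read)
    return matched, unmatched

print(shred("ACGTACGTACG", read_len=4, step=3))
print(partition_reads(["AAAACC", "GGGGTT"], ["AAAAACCCCC"], k=4))
```

The counts in the text partition the Celera reads completely: 21.38 million matched plus 5.89 million unmatched equals 27.27 million.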
Celera's whole-genome and regional chromosome assembly methods were independent of each other, permitting direct comparison of the data. Celera found that the regional chromosome
assembly method was slightly more consistent than the whole-genome assembly method. Using these complementary approaches, Celera generated data that was in strong agreement with that
of the IHGSC.
In February 2001, drafts of the human genome sequence were published simultaneously by both groups in two separate articles (IHGSC, 2001; Venter et al., 2001). Thanks to technical advances in DNA sequencing methods and productive synergy between the two groups, they tied at the finish line, and both projects were completed ahead of schedule.
Researchers from both the IHGSC and Celera used the chain-termination (Sanger) method. They combined the DNA template they were interested in sequencing with DNA polymerase, a single-stranded DNA primer, free deoxynucleotides (dATP, dCTP, dGTP, and dTTP), and a sparse mixture of dideoxynucleotides (ddATP, ddCTP, ddGTP, and ddTTP), each labeled with a fluorescent dye of a different color. Because a dideoxynucleotide lacks the 3′ hydroxyl group needed to attach the next nucleotide, its incorporation at the end of a growing DNA strand terminates synthesis of that strand. The mixture was first heated to denature the template DNA and then cooled to allow the primer to anneal. Following primer annealing, the polymerase synthesized a complementary DNA strand. Each new strand grew in length until a dideoxynucleotide (ddNTP) was incorporated; the conditions were such that this occurred at random along the length of the newly synthesized DNA strands. In the end, the researchers were left with a mixture of newly synthesized DNA strands whose lengths differed by single nucleotides, each labeled at its 3′ end with the color of the dye attached to its terminating ddNTP (Figure 6b).
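The chain-termination reaction can be simulated to show why it yields a nested set of strands that differ in length by one nucleotide and carry a dye identifying their last base. The sketch assumes a constant chance of ddNTP incorporation at every position and an arbitrary dye-to-base color scheme; both assumptions, and the function names, are hypothetical.

```python
import random

# Illustrative sketch; dye colors, probabilities, and names are assumptions.

# Watson-Crick pairing: the incoming nucleotide is the complement of the template base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical dye assignment, one color per ddNTP (real dye sets vary by instrument).
DYE = {"A": "green", "C": "blue", "G": "black", "T": "red"}

def extend_one_strand(template_3to5: str, p_ddntp: float, rng: random.Random):
    """Extend one primer along the template until a ddNTP is incorporated.

    template_3to5 lists the template bases just downstream of the primer, written
    3'->5', so position i pairs with base i of the new strand (written 5'->3').
    Returns (bases added beyond the primer, dye color), or None if the polymerase
    reached the end of the template without terminating.
    """
    for i, template_base in enumerate(template_3to5):
        incoming = COMPLEMENT[template_base]
        if rng.random() < p_ddntp:
            # A ddNTP has no 3'-OH, so no further nucleotide can be attached.
            return i + 1, DYE[incoming]
    return None

def sequencing_reaction(template_3to5: str, n_strands: int, p_ddntp: float, seed: int = 0):
    """Run many independent extensions and collect the terminated products."""
    rng = random.Random(seed)
    products = []
    for _ in range(n_strands):
        product = extend_one_strand(template_3to5, p_ddntp, rng)
        if product is not None:
            products.append(product)
    return products

products = sequencing_reaction("TACGGA", n_strands=1000, p_ddntp=0.2)
print(sorted(set(products)))
```

Keeping ddNTPs sparse (small `p_ddntp`) lets some strands run far along the template before terminating, so long as well as short products appear.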
To determine the sequence of the newly synthesized, color-coded DNA strands, researchers needed a way to separate them by size, even though neighboring strands differed by only one nucleotide. To accomplish this, they electrophoresed the DNA through a gel matrix that permitted single-base differences in size to be easily distinguished. Small fragments run more quickly through the gel, and larger fragments run more slowly (Figure 6c). Because each ddNTP carries a different dye, the entire mixture can be loaded into a single well of the gel, and a laser can scan the DNA bands as they move through the gel and determine their color. These data are used to generate a sequence trace (also called an electropherogram) showing the color and signal intensity of each DNA band that passes through the gel (Figure 6d). The color of each band represents the final 3′ base incorporated at that position, and by reading from the bottom to the top of the gel, one can determine the sequence of the newly synthesized DNA strand from the 5′ to the 3′ end.
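Reading the gel or electropherogram amounts to ordering the terminated strands by length and reporting the dye of each band. The sketch below takes (length, dye) pairs and calls one base per length. The color scheme, the majority-vote base call (a crude stand-in for picking the tallest peak), and the function name are assumptions for illustration, not how production base-calling software works.

```python
from collections import Counter, defaultdict

# Illustrative sketch; the color scheme and names are hypothetical.
COLOR_TO_BASE = {"green": "A", "blue": "C", "black": "G", "red": "T"}

def call_bases(products: list[tuple[int, str]]) -> str:
    """Read a 5'->3' sequence from (length, dye) pairs of chain-terminated strands.

    Shorter strands migrate faster, so ordering by length reproduces the order in
    which bands pass the laser. At each length the most abundant dye is called;
    a length with no signal is reported as 'N'.
    """
    signal: dict[int, Counter] = defaultdict(Counter)
    for length, dye in products:
        signal[length][dye] += 1
    if not signal:
        return ""
    bases = []
    for length in range(1, max(signal) + 1):
        if signal.get(length):
            dye, _count = signal[length].most_common(1)[0]
            bases.append(COLOR_TO_BASE[dye])
        else:
            bases.append("N")
    return "".join(bases)

products = [(3, "black"), (1, "blue"), (2, "red"), (2, "red"), (4, "green"), (2, "blue")]
print(call_bases(products))  # CTGA
```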
step produces a mixture of newly synthesized DNA strands that differ in length by a single nucleotide. Each strand is
labeled at the 3′ end with a fluorescently labeled ddNTP base. C) The DNA mixture is separated by electrophoresis. D)
The electropherogram results show peaks representing the color and signal intensity of each DNA band. From these
data, the sequence of the newly synthesized DNA strand is determined, as shown above the peaks.
© 2003 Macmillan Publishers, Ltd. Dennis, C. & Gallagher, R. (eds) The Human Genome (Palgrave, Basingstoke,
2001). Used with permission. All rights reserved.
Unfortunately, the Human Genome Project has not yet fulfilled its initial promise of accelerating the discovery of new treatments for disease. With the sequence of the human genome in hand, we have learned that curing human disease requires more than knowing the order of the base pairs in our genome. Current efforts are therefore focused on understanding the protein products encoded by our genes. When a gene is mutated, the corresponding protein can be defective. The emerging field of proteomics aims to understand how protein function and expression are altered in human disease states. Investigators are also turning their attention to the expansive regions of our genome devoid of traditional protein-coding genes. We have already started to reap the benefits of our knowledge of the human genome, and future data-mining efforts are likely to uncover many more exciting and unexpected links to human disease.
Summary
Within a span of only 13 years, an amalgam of public and private researchers successfully completed the Human Genome Project. Although these scientists used a number of different methods in their work, they obtained consistent results. In doing so, the researchers not only silenced their critics but also beat their own estimated project timeline by two years. Perhaps even more importantly, these scientists inspired an ongoing revolution in our fight against human disease and provided a new vision of the future of medicine, although that future has yet to be fully realized.
International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).
Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).