Next Generation Sequencing and Sequence Assembly: Methodologies and Algorithms
Ali Masoudi-Nejad, Zahra Narimani, Nazanin Hosseinkhan
Laboratory of Systems Biology and Bioinformatics (LBB)
Institute of Biochemistry and Biophysics
University of Tehran
Tehran, Iran
Chapter 1
Next-Generation Sequencing Methodologies
1.1 Introduction
Although many people believe that the American biologist James Watson and the
English physicist Francis Crick were the first to discover DNA in the 1950s, DNA
was actually discovered by the Swiss chemist Friedrich Miescher in the late 1860s,
during his attempts to isolate the protein components of leukocytes. When he
isolated a substance that, unlike proteins, was resistant to proteolysis and had
chemical properties different from those of proteins, including a much higher
phosphorus content, he realized that he had discovered a new substance [1]. He
called this new substance “nuclein.”
Miescher’s finding was not considered particularly important until the twentieth
century, when the chemical nature of nuclein was studied by the Russian
biochemist Phoebus Levene. He was the first to discover: (1) the order of the three
major components of a single nucleotide (phosphate-sugar-base) (Fig. 1.1); (2) the
carbohydrate component of RNA (ribose) and DNA (deoxyribose); and (3) the
way RNA and DNA molecules are put together. In 1919 Levene proposed that
nucleic acids were composed of a series of nucleotides and that each nucleotide
was in turn composed of one of four nitrogen-containing bases, a sugar
molecule, and a phosphate group.
The study of DNA structure was continued by Erwin Chargaff, an Austrian
biochemist, who uncovered additional details. He reached two major
conclusions [3]: first, the nucleotide composition of DNA varies among species;
second, the amount of the base adenine (A) is usually similar to the amount of
thymine (T), and the amount of guanine (G) is usually similar to the amount of
cytosine (C). The latter is known as Chargaff’s rule (Fig. 1.2).
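Chargaff’s rule can be illustrated with a short calculation. The sketch below is
not from the original text; it assumes an idealized double-stranded molecule in
which every A pairs with T and every G pairs with C, and the sequence and
function names are hypothetical. It counts bases over both strands and shows
that the A and T totals match, as do the G and C totals, even though a single
strand on its own need not be balanced.

```python
from collections import Counter

# Watson-Crick pairing partners (assumed idealized, no mismatches).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def duplex_base_counts(strand: str) -> Counter:
    """Count each base over both strands of a double-stranded molecule."""
    return Counter(strand) + Counter(reverse_complement(strand))


strand = "ATGCGGATTTACGCAAA"  # hypothetical example sequence
single = Counter(strand)
duplex = duplex_base_counts(strand)

print("single strand:", dict(single))
print("both strands: ", dict(duplex))
```

On the single strand the A and T counts differ, while over the duplex they are
equal by construction, which is the pattern Chargaff observed in measured base
compositions.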
Fig. 1.1 The three components of each nucleotide: a nitrogenous base, which belongs to one of
two categories (single ring: pyrimidines; two linked rings: purines), a pentose sugar (ribose in
RNA and deoxyribose in DNA), and a phosphate group [2]
Fig. 1.3 Double-helical structure of DNA. The chains of sugar-phosphate groups are linked
together by complementary bases [2]
Knowing the order (sequence) of nucleotides in DNA, the molecule in which
the genetic information of all organisms is stored, has revolutionized biology and
improved our understanding of life’s secrets (BBSRC Review of Next-Generation
Sequencing, final version).
The first two DNA sequencing techniques, known as first-generation DNA
sequencing methods, were developed independently by Frederick Sanger (1977,
University of Cambridge) and by Allan Maxam and Walter Gilbert (1976–1977,
Harvard University). Sanger’s method, which earned him a Nobel Prize in
Chemistry in 1980, became popular and was in fact the sole method for DNA
sequencing for three decades, because it was technically less complex and used
smaller amounts of toxic chemicals than the Maxam–Gilbert method, which was
based on the chemical modification of DNA and subsequent cleavage at specific
bases. In the Sanger sequencing method, also known as the “chain termination”
or “dideoxy” method, modified nucleotides (fluorescently labeled
dideoxynucleotides) are used in the reaction in addition to normal nucleotides.
This method was gradually improved and automated (the first automatic
sequencing machine, AB370, was introduced in 1987 by Applied Biosystems) and
was therefore the method of choice for large-scale sequencing projects, e.g.,
whole-genome sequencing of various species, for about 30 years [4].
[Fig. 1.5 panel labels: gel lanes G, A, C, T; fragment size axis from Largest to Smallest;
sequence shown: TCGAAGACGTATC]
Fig. 1.5 Sanger sequencing procedure. a Four separate reactions take place in the presence
of all materials required for DNA synthesis. In addition, a distinct type of fluorescently
labeled dideoxynucleotide is added to each reaction, so that after the DNA synthesis cycles are
complete, each reaction contains DNA strands terminated by the specific dideoxynucleotide
present in that reaction. b After the reactions are complete, the contents of the four separate
reactions are separated by electrophoresis on a high-resolution polyacrylamide gel
(www.Wikipedia.org)
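The logic of reading a sequence from four chain-termination reactions can be
sketched in a few lines. This is an idealized illustration, not a model of the
chemistry: it assumes that every position of the synthesized strand yields at
least one terminated fragment and that the gel resolves fragments differing by
a single nucleotide. The function names are hypothetical.

```python
BASES = "ACGT"


def chain_termination_fragments(synthesized: str) -> dict[str, list[int]]:
    """Return, for each of the four reactions, the lengths of its fragments.

    `synthesized` is the strand built by the polymerase. In the reaction that
    contains dideoxy-X, synthesis can stop wherever X is incorporated, so the
    fragment lengths in that reaction are the 1-based positions of X.
    """
    lanes: dict[str, list[int]] = {base: [] for base in BASES}
    for position, base in enumerate(synthesized, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the bands from the smallest fragment to the largest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


synthesized = "TCGAAGACGTATC"  # illustrative sequence
lanes = chain_termination_fragments(synthesized)
for base in BASES:
    print(base, lanes[base])
print(read_gel(lanes))
```

Each lane holds only the fragment lengths that end in its base, and sorting all
bands by length across the four lanes recovers the order of bases in the
synthesized strand.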
The Sanger method can only be applied to fairly short DNA fragments, i.e.,
100–1,000 base pairs. This is because the power to discriminate between fragment
sizes during capillary electrophoresis is limited, which restricts the size of
the DNA that can be reliably sequenced to ~1,000 base pairs (for larger DNA
fragments, longer gels are required). Larger sequences, for example an entire
chromosome, must first be fragmented into smaller pieces and amplified to obtain
a large number of copies of each individual fragment. After the sequencing
reactions, these fragments must be reassembled to produce the original sequence.
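The reassembly step can be illustrated with a toy greedy overlap merge. This is
a minimal sketch under strong assumptions (error-free fragments, all from the
same strand, no repeats longer than the minimum overlap); it is not presented as
the algorithm used by any particular assembler, and the function names, the
`min_len` parameter, and the example sequence are hypothetical.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]
    return contigs


original = "TCGAAGACGTATCGGATTACA"  # hypothetical sequence
# Three overlapping fragments, deliberately listed out of order.
fragments = [original[12:21], original[0:10], original[6:16]]
contigs = greedy_assemble(fragments)
print(contigs)
```

With real data the greedy choice can join fragments incorrectly when the
sequence contains repeats, which is one reason assembly needs more careful
algorithms than this sketch.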