ch10 2
Evolutionary relationships are clear in protein sequences. The now-extinct woolly mammoth (Mammuthus primigenius)
coexisted with early humans, as evidenced by Late Stone Age cave paintings such as those at the Rouffignac Cave in France.
Using samples obtained from frozen preserved tissue, researchers used recombinant DNA technology to determine the
complete sequence of the woolly mammoth genome and the structural and functional adaptations of its proteins. In the
case of α-hemoglobin, only six residues (shown in green) are nonidentical and dissimilar between the woolly mammoth
and human sequences. Amino acid sequence comparisons with living species and functional expression of these extinct
proteins have provided new insights into how this species adapted to very cold environments.
Chapter 10
Exploring Evolution & Bioinformatics
Jane Goodall
The human myoglobin sequence (red) differs from the chimpanzee sequence (blue) in only one
amino acid in a protein chain of 153 residues.
https://www.youtube.com/watch?v=51z7WRDjOjM
Molecular families often have features in common.
Such family resemblance is most easily detected by comparing 3D structure, the aspect of a molecule most
closely linked to function.
For example, angiogenin, a protein that stimulates the growth of new blood vessels, also turns out to be
structurally similar to ribonuclease.
Angiogenin and ribonuclease must have had a common ancestor at some earlier stage of evolution.
Gene sequences and the corresponding amino acid sequences are available for a great number of proteins,
largely owing to the tremendous power of DNA cloning and sequencing techniques including applications to
complete genome sequencing.
For example, 35% of the amino acids in corresponding positions are identical in the sequences of bovine
ribonuclease and angiogenin.
In this chapter, we shall examine the methods that are used to compare amino acid sequences and to deduce
such evolutionary relationships.
Sequence-comparison methods have become one of the most powerful tools in modern biochemistry.
10.1 Homologs Are Descended from a Common Ancestor
The most fundamental relationship between two entities is homology; two molecules are said to be
homologous if they have been derived from a common ancestor.
Statistical Analysis of Sequence Alignments Can Detect Homology
Both nucleic acid and protein sequences can be compared to detect homology.
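Because nucleic acids are composed of fewer building blocks than proteins
(4 bases versus 20 amino acids), random agreement between two nucleic acid sequences is more likely
than between two protein sequences, so homology is usually easier to detect at the protein level.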
Let us consider a class of proteins called the globins.
Sequence Alignment
[Figure: hemoglobin α-chains and β-chains]
How can we tell where to align the two sequences?
To understand how the methods of sequence alignment take potential sequence variations
(insertions, deletions, and substitutions) into account,
let us first consider the simplest approach,
where we slide one sequence past the other, one amino acid at a time, and
count the number of matched residues, or sequence identities.
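The sliding comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a production alignment tool: the function names and the short toy sequences are assumptions made for the example, and no gaps are allowed.

```python
from typing import Tuple


def count_identities(seq_a: str, seq_b: str, offset: int) -> int:
    """Count matched residues when seq_b is shifted by `offset` relative to seq_a."""
    matches = 0
    for i, residue in enumerate(seq_b):
        j = i + offset  # position in seq_a that lines up with seq_b[i]
        if 0 <= j < len(seq_a) and seq_a[j] == residue:
            matches += 1
    return matches


def best_ungapped_offset(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Slide seq_b past seq_a one residue at a time.

    Returns (offset, identities) for the position with the most identities.
    """
    offsets = range(-(len(seq_b) - 1), len(seq_a))
    scored = ((off, count_identities(seq_a, seq_b, off)) for off in offsets)
    return max(scored, key=lambda pair: pair[1])


# Toy sequences (hypothetical, chosen only to illustrate the idea).
offset, identities = best_ungapped_offset("GLSDGEWQLV", "SDGEW")
print(offset, identities)
```

Each offset corresponds to one position of the sliding sequence; the offset with the most identities is the best ungapped alignment under this simple count.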
By introducing a gap into one of the sequences, we can compensate for an insertion or deletion and bring more matching residues into register.
Next, we must determine the significance of this score and level of identity.
Statistical Significance of Alignments Estimated by Shuffling
The similarities in sequence appear striking,
yet there remains the possibility that a grouping of sequence identities has occurred by chance alone.
Hence, we can assess the significance of our alignment by “shuffling,” or randomly rearranging, one of the
sequences, repeating the sequence alignment, and determining a new alignment score.
Repeating this process many times yields the distribution of scores expected by chance.
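The shuffling procedure can be sketched as follows. This is a simplified illustration under stated assumptions: the score is the best ungapped identity count rather than a gapped alignment score, and the function names, the number of trials, and the fixed random seed are choices made for the example.

```python
import random
from statistics import mean, pstdev


def best_identity_score(seq_a: str, seq_b: str) -> int:
    """Best number of identities over all ungapped offsets of seq_b against seq_a."""
    best = 0
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        matches = sum(
            1
            for i, residue in enumerate(seq_b)
            if 0 <= i + offset < len(seq_a) and seq_a[i + offset] == residue
        )
        best = max(best, matches)
    return best


def shuffle_test(seq_a: str, seq_b: str, trials: int = 200, seed: int = 0):
    """Compare the authentic score with scores obtained after shuffling seq_b.

    Returns (authentic score, mean shuffled score, standard deviation of shuffled scores).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    authentic = best_identity_score(seq_a, seq_b)
    residues = list(seq_b)
    shuffled_scores = []
    for _ in range(trials):
        rng.shuffle(residues)  # same composition, randomized order
        shuffled_scores.append(best_identity_score(seq_a, "".join(residues)))
    return authentic, mean(shuffled_scores), pstdev(shuffled_scores)
```

An authentic score that lies many standard deviations above the shuffled mean suggests that the similarity is unlikely to have arisen by chance alone.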
When this procedure is applied to the sequences of myoglobin and α-hemoglobin, the authentic alignment clearly
stands out.
Its score is far above the mean for the alignment scores based on shuffled sequences.
Thus, we can comfortably conclude that the two sequences are genuinely similar;
the simplest explanation for this similarity is that these sequences are
homologous, that is, that the two molecules have descended by divergence
from a common ancestor.
A conservative substitution replaces one amino acid with another that is similar in size and chemical
properties.
Conservative substitutions may have only minor effects on protein structure and often can be tolerated
without compromising protein function.
In contrast, in a nonconservative substitution, an amino acid is replaced by one that is structurally dissimilar.
Conservative and single-nucleotide substitutions are likely to be more common than are substitutions with
more radical effects.
How can we account for the type of substitution when comparing sequences?
Blosum-62 substitution matrix
Blosum: Blocks of amino acid substitution matrix
Furthermore, structurally conservative substitutions such as lysine (K) for arginine (R) and isoleucine (I) for
valine (V) have relatively high scores, whereas nonconservative substitutions such as lysine for tryptophan
result in negative scores.
In addition, the introduction of a single residue gap lowers the alignment score by
12 points and the extension of an existing gap costs 2 points per residue.
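The scoring rules above can be illustrated with a short sketch that scores an alignment that has already been written out with gap characters. The handful of matrix entries below are believed to match Blosum-62 but should be checked against the full matrix; the function names and the toy alignments are assumptions made for the example.

```python
# Small illustrative subset of substitution scores (assumed Blosum-62 values;
# verify against the full matrix before relying on them).
SCORES = {
    ("K", "K"): 5, ("R", "R"): 5, ("I", "I"): 4, ("V", "V"): 4, ("W", "W"): 11,
    ("K", "R"): 2,   # conservative substitution
    ("I", "V"): 3,   # conservative substitution
    ("K", "W"): -3,  # nonconservative substitution
}
GAP_OPEN = 12    # cost of the first residue of a gap
GAP_EXTEND = 2   # cost of each additional residue in the same gap


def pair_score(a: str, b: str) -> int:
    """Look up a substitution score regardless of residue order."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    return SCORES[(b, a)]  # raises KeyError for pairs outside the subset


def alignment_score(row_a: str, row_b: str) -> int:
    """Score two aligned rows of equal length in which '-' marks a gap.

    Simplification: consecutive gap columns count as one gap even if the
    gap switches from one row to the other.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    in_gap = False
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total -= GAP_EXTEND if in_gap else GAP_OPEN
            in_gap = True
        else:
            total += pair_score(a, b)
            in_gap = False
    return total


print(alignment_score("KIVW", "RVVW"))  # substitutions only
print(alignment_score("KIVW", "K--W"))  # one gap of two residues
```

Under these rules a gap of n residues costs 12 + 2(n - 1) points, so one long gap is penalized less than several short ones.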
In many regions, most substitutions are conservative (defined as those substitutions with
scores greater than 0) and relatively few are strongly disfavored types.
This scoring system detects homology between less obviously related sequences with
greater sensitivity than would a comparison of identities only.
Consider, for example, the protein leghemoglobin, an oxygen-binding protein found in the roots of some plants.
Thus, an analysis performed by using the substitution matrix reaches a much firmer
conclusion about the evolutionary relationship between these proteins.
It must be emphasized that the lack of a statistically significant degree of sequence similarity
does not rule out homology.
Databases Can Be Searched to Identify Homologous Sequences
When the sequence of a protein is first determined, comparing it with all previously
characterized sequences can be a source of tremendous insight into its evolutionary relatives
and, hence, its structure and function.
The sequence-alignment methods just described are used to compare an individual sequence
with all members of a database of known sequences.
Database searches are most often accomplished by using resources available on the Internet at
the National Center for Biotechnology Information (www.ncbi.nih.gov).
The procedure used is referred to as a BLAST (Basic Local Alignment Search Tool) search.
An amino acid sequence is typed or pasted into the Web browser, and a search is performed,
most often against a non-redundant database of all known sequences.
Databases can be searched to identify homologous sequences
http://www.ncbi.nlm.nih.gov/blast/
http://www.ncbi.nlm.nih.gov/protein
FIGURE 10.12 Related protein sequences can be rapidly identified using online BLAST searches. Part of the
results from a BLAST search of the nonredundant (nr) protein sequence database using the sequence of ribose
5-phosphate isomerase (also called phosphopentose isomerase, Chapter 20) from E. coli as a query. Among the
thousands of sequences found is the orthologous sequence from humans, and the alignment between these
sequences is shown. The number of sequences with this level of similarity expected to be in the database by
chance is 2 × 10^-25 as shown by the E value (highlighted in red). Because this value is much less than 1, we can
confidently conclude that the observed sequence alignment is highly significant.
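To see why an E value far below 1 is persuasive, it can help to convert it into the probability of finding at least one such match by chance. The sketch below assumes that chance hits follow a Poisson distribution, so that P = 1 - e^(-E); this relation is not stated in the text above and the function name is chosen for the example.

```python
import math


def chance_hit_probability(e_value: float) -> float:
    """Probability of at least one chance hit, assuming Poisson-distributed hits.

    Uses expm1 so that very small E values do not round to zero.
    """
    return -math.expm1(-e_value)


print(chance_hit_probability(2e-25))  # essentially equal to the E value itself
print(chance_hit_probability(1.0))    # a chance hit is more likely than not
```

For very small E values the probability is practically equal to E, which is why E values much less than 1 are read directly as strong evidence of a significant alignment.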
NCBI website search movie
In 1995, investigators reported the first complete sequence of the genome of a free-living organism, the
bacterium Haemophilus influenzae.
With the sequences available, they performed a BLAST search with each deduced protein sequence.
An additional 347 open reading frames could be linked to sequences in the database for which no function
had yet been assigned ("hypothetical proteins").
The remaining 389 sequences did not match any sequence present in the database at that time.
However, biomolecules generally function as intricate three-dimensional structures rather than as linear
polymers.
Mutations occur at the level of sequence, but the effects of the mutations are at the level of function,
and function is directly related to tertiary structure.
Tertiary Structure Is More Conserved Than Primary Structure
Tertiary structures of the globins are extremely similar
even though the similarity between human myoglobin and lupine leghemoglobin is just barely detectable at
the sequence level and that between human α-hemoglobin and lupine leghemoglobin is not statistically
significant (15.6% identity).
This structural similarity firmly establishes that the framework that binds the heme group and facilitates the
reversible binding of oxygen has been conserved over a long evolutionary period.
In a growing number of other cases, however, a comparison of three-dimensional structures has revealed
striking similarities between proteins that were not expected to be related.
A case in point is the pair formed by actin, a major component of the cytoskeleton, and heat shock protein 70
(Hsp-70), which assists protein folding inside cells.
These two proteins were found to be noticeably similar in structure despite only 15.6% sequence identity.
On the basis of their three-dimensional structures, actin and Hsp-70 are paralogs.
The sequence-comparison methods described thus far treat all positions within a sequence equally.
However, we know from examining families of homologous proteins for which at least one three-dimensional
structure is known that regions and residues critical to protein function are more strongly conserved than are
other residues.
For example, each type of globin contains a bound heme group with an iron atom at its center.
A histidine residue that interacts directly with this iron (residue 64 in human myoglobin)
is conserved in all globins.
distal histidine
After we have identified key residues or highly conserved sequences within a family of proteins, we can
sometimes identify other family members even when the overall level of sequence similarity is below statistical
significance.
Thus it may be useful to generate a sequence template: a map of conserved residues that are structurally and
functionally important and are characteristic of particular families of proteins.
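A sequence template can be sketched as a string in which conserved positions keep their residue and variable positions become a wildcard. The example below is a minimal illustration: the aligned sequences are hypothetical, the function names are chosen for the example, and real templates are built from much larger alignments with weighted rather than all-or-nothing conservation.

```python
from typing import List


def build_template(aligned_sequences: List[str], wildcard: str = "x") -> str:
    """Keep residues that are identical in every aligned sequence; mask the rest."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    template = []
    for column in zip(*aligned_sequences):
        conserved = len(set(column)) == 1 and column[0] != "-"
        template.append(column[0] if conserved else wildcard)
    return "".join(template)


def matches_template(template: str, candidate: str, wildcard: str = "x") -> bool:
    """True if the candidate has the conserved residue at every non-wildcard position."""
    return len(template) == len(candidate) and all(
        t == wildcard or t == c for t, c in zip(template, candidate)
    )


# Hypothetical aligned family members (not real globin sequences).
family = ["AHGKL", "SHGRL", "THGKL"]
template = build_template(family)
print(template)
print(matches_template(template, "QHGEL"))
```

A candidate that matches the template at the conserved positions may belong to the family even when its overall sequence identity is low.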
Repeated Motifs Detected by Aligning Sequences with Themselves
More than 10% of all proteins contain sets of two or more
domains that are similar to one another. Sequence search
methods can often detect internally repeated sequences that
have been characterized in other proteins.
Their presence can be detected by attempting to align a given
sequence with itself.
The determination of the three-dimensional structure of the
TATA-box-binding protein confirmed the presence of repeated
structures; the protein is formed of two nearly identical domains.
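Aligning a sequence with itself can be sketched by shifting a copy of the sequence and counting identities at each shift. This is a simplified, ungapped illustration; the function names and the toy sequence (two nearly identical 8-residue halves) are assumptions made for the example and do not represent the TATA-box-binding protein.

```python
from typing import Dict, Tuple


def self_alignment_scores(seq: str) -> Dict[int, int]:
    """Number of identities between seq and a copy of itself at each nonzero shift."""
    scores = {}
    for shift in range(1, len(seq)):
        scores[shift] = sum(
            1 for i in range(len(seq) - shift) if seq[i] == seq[i + shift]
        )
    return scores


def best_repeat_shift(seq: str) -> Tuple[int, int]:
    """Return (shift, identities) for the shift with the most identities."""
    scores = self_alignment_scores(seq)
    return max(scores.items(), key=lambda item: item[1])


# Hypothetical sequence built from two nearly identical halves.
print(best_repeat_shift("MKVLAGTWMKVLAGSW"))
```

A shift that scores far above the others indicates an internal repeat whose length is roughly equal to that shift.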
Thus far, we have been exploring proteins derived from common ancestors, that is, through divergent
evolution.
Other cases have been found of proteins that are structurally similar in important ways but are not descended
from a common ancestor.
How might two unrelated proteins come to resemble each other structurally?
Two proteins evolving independently may have converged on a similar structure to perform a similar
biochemical activity.
Perhaps that structure was an especially effective solution to a biochemical problem that organisms face.
The process by which very different evolutionary pathways lead to the same solution is called convergent
evolution.
An example of convergent evolution is found among the serine proteases.
These enzymes cleave peptide bonds by hydrolysis (Section 6.2).
The active sites for two such enzymes, mammalian chymotrypsin and bacterial subtilisin, are remarkably similar
(Figure 10.16, top). In each case, the catalytic triad of a serine residue, a histidine residue, and an aspartate
residue is positioned in space in nearly identical arrangements.
Whereas chymotrypsin consists almost entirely of β sheets, subtilisin contains extensive α-helical structure.
Moreover, the key serine, histidine, and aspartic acid residues do not even appear in the same order within the
two sequences. It is extremely unlikely that two proteins evolving from a common ancestor could have retained
similar active-site structures while other aspects of the structure changed so dramatically.
Comparison of RNA Sequences Can Be a Source of Insight into RNA Secondary Structures
The methods and interpretation of sequence alignments described above are not limited to proteins.
Homologous RNA sequences can also be similarly studied, yielding important insights into evolutionary
relationships. Additionally, these analyses provide clues to the three-dimensional structure of the RNA itself.
FIGURE 10.17 RNA sequence alignments provide insights into structure based on conservation of base-pairing.
(A) A comparison of sequences in a part of ribosomal RNA taken from a variety of species. (B) The
implied secondary structure. Green lines indicate positions at which Watson–Crick base-pairing is completely
conserved in the sequences shown, whereas dots indicate positions at which Watson–Crick base-pairing is
conserved in most cases.
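The idea of conserved base-pairing can be sketched by checking, for a pair of alignment columns, how many sequences can form a Watson–Crick pair at those positions. The aligned RNA sequences below are hypothetical, the column pairs are chosen by hand, and the 0.5 threshold for "mostly conserved" is an assumption made for the example.

```python
from typing import List

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_fraction(aligned_rnas: List[str], col_i: int, col_j: int) -> float:
    """Fraction of sequences whose residues at col_i and col_j form a Watson-Crick pair."""
    paired = sum(1 for s in aligned_rnas if (s[col_i], s[col_j]) in WATSON_CRICK)
    return paired / len(aligned_rnas)


def classify_pair(aligned_rnas: List[str], col_i: int, col_j: int) -> str:
    """Label a column pair as completely, mostly, or not conserved for pairing."""
    fraction = pairing_fraction(aligned_rnas, col_i, col_j)
    if fraction == 1.0:
        return "completely conserved"
    if fraction > 0.5:  # assumed threshold for "most cases"
        return "mostly conserved"
    return "not conserved"


# Hypothetical hairpin alignment: columns 0-9, 1-8, and 2-7 form the stem.
rnas = ["GCAUUCGUGC", "GGAUUCGUCC", "GCGUUCGUGC"]
for i, j in [(0, 9), (1, 8), (2, 7)]:
    print(i, j, classify_pair(rnas, i, j))
```

Columns 1 and 8 differ between the sequences yet remain complementary in each one; such compensating changes are the signal that two positions are base-paired in the folded RNA.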
10.3 Evolutionary Trees Can Be Constructed on the Basis of Sequence Information
Horizontal gene transfer is the exchange of DNA between species that confers selective advantage on the
recipient.
Horizontal gene transfer is common among prokaryotes.
Example: the eukaryote Galdieria sulphuraria contains genes more closely related to bacteria and archaea.
This subset of genes allows G. sulphuraria to live in hostile environments.
(1) The polymerase chain reaction (Section 9.1) allows the direct amplification and examination of ancient
DNA sequences. If DNA samples can be obtained from preserved remains, we can explore genomes from
species that are no longer living.
(2) Molecular evolution may also be investigated through the process of synthesizing highly diverse
populations of molecules and selecting for a biochemical property.
The combination of these two techniques provides a glimpse into the types of molecules that may have existed
very early in evolution.
SELEX
Ancient DNA Can Sometimes Be Amplified and Sequenced
The tremendous chemical stability of DNA makes the molecule well suited to its role as the storage site of
genetic information. So stable is the molecule that samples of DNA have survived for many thousands of years
under appropriate conditions.
With the development of PCR methods, such ancient DNA can sometimes be amplified and sequenced. This
approach has been applied to mitochondrial DNA from a Neanderthal fossil estimated at 38,000 years of age
excavated from Vindija Cave, Croatia, in 1980.
Remarkably, investigators have completely sequenced the mitochondrial genome from this specimen.
Comparison of the Neanderthal mitochondrial sequence with those from Homo sapiens individuals revealed
between 201 and 234 substitutions, considerably fewer than the approximately 1,500 differences between
human beings and chimpanzees over the same regions.
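Counting substitutions between two aligned sequences can be sketched as follows. This is a minimal illustration: the function name and the short toy sequences are assumptions made for the example, and positions aligned to a gap are skipped rather than counted as substitutions.

```python
def count_substitutions(aligned_a: str, aligned_b: str, gap: str = "-") -> int:
    """Count positions where two aligned sequences carry different nucleotides.

    Columns in which either sequence has a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap and a != b
    )


# Toy aligned fragments (hypothetical, not real mitochondrial sequences).
print(count_substitutions("ACGTACGT", "ACGAACGT"))
print(count_substitutions("AC-TACGT", "ACGTACCT"))
```

Comparing such counts over the same regions for different pairs of species gives a rough measure of their relative divergence.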
Further analysis suggested that the common ancestor of modern human beings and
Neanderthals lived approximately 570,000 years ago.
A few earlier studies claimed to determine the sequences of far more ancient DNA such as that found in insects
trapped in amber, but these studies appear to have been flawed. The source of these sequences turned out to
be contaminating modern DNA.
Successful sequencing of ancient DNA requires sufficient DNA for reliable amplification and the rigorous
exclusion of all sources of contamination.
2022 Nobel Prize in Physiology or Medicine
Svante Pääbo
Molecular Evolution Can Be Examined Experimentally
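Evolution requires three processes: (1) the generation of a diverse population, (2) the selection of members
based on some criterion of fitness, and (3) reproduction to enrich the population in these more-fit members.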
Errors that occur naturally in the course of the replication process introduce
additional variation into the population in each “generation.”
The molecules that bound well to the ATP affinity column were replicated
by reverse transcription into DNA, amplification by PCR, and transcription
back into RNA. Additional diversity was introduced into the pool by the
use of a somewhat error-prone reverse transcriptase, which introduces
additional mutations into the population during each cycle.
The new population was subjected to additional rounds of selection for
ATP-binding activity, and after 8 generations, members of the selected
population were characterized by sequencing.
FIGURE 10.24 A conserved secondary structure is common to RNA molecules selected for ATP binding.
17 different sequences were obtained, 16 of which could form the structure shown in Figure 10.24.
Each of these molecules bound ATP with dissociation constants (KD) less than 50 μM.
The loop folds back to form a deep pocket into which the adenine ring can fit.
Thus, a structure had evolved that was capable of a specific interaction.
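The cycle of diversification, selection, and replication can be mimicked with a toy simulation. This is only a sketch under loud assumptions: "binding" is replaced by the number of positions matching a hypothetical target motif, the motif itself is arbitrary, the selected molecules are carried over unchanged alongside their mutated copies, and the population size, mutation rate, and seed are chosen for the example.

```python
import random

BASES = "ACGU"
TARGET = "GGAUCCAUGC"  # hypothetical motif standing in for ATP-binding fitness


def fitness(molecule: str) -> int:
    """Toy fitness: number of positions that match the hypothetical target motif."""
    return sum(a == b for a, b in zip(molecule, TARGET))


def mutate(molecule: str, rate: float, rng: random.Random) -> str:
    """Error-prone copying: each base is redrawn at random with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in molecule)


def evolve(pop_size=200, generations=8, keep_fraction=0.1, rate=0.05, seed=1):
    """Run rounds of selection and error-prone replication.

    Returns the final population and the best fitness seen in each generation.
    """
    rng = random.Random(seed)
    population = ["".join(rng.choice(BASES) for _ in TARGET) for _ in range(pop_size)]
    best_history = [max(map(fitness, population))]
    for _ in range(generations):
        # Selection: keep the molecules that "bind" best.
        n_keep = max(1, int(pop_size * keep_fraction))
        survivors = sorted(population, key=fitness, reverse=True)[:n_keep]
        # Replication with errors refills the pool from the survivors.
        offspring = [
            mutate(rng.choice(survivors), rate, rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + offspring
        best_history.append(max(map(fitness, population)))
    return population, best_history


population, history = evolve()
print(history)
```

Because the survivors are retained unchanged in this sketch, the best fitness never decreases from one generation to the next; mutation supplies the variation that lets it rise.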
FIGURE 10.25 Experimental molecular evolution yields an ATP-binding RNA molecule.
Synthetic oligonucleotides that can specifically bind ligands, such as the ATP-binding RNA molecules
described above, are referred to as aptamers.
In addition to their role in understanding molecular evolution, aptamers have shown promise as versatile
tools for biotechnology and medicine.
For example, they have been developed for diagnostic applications, serving as sensors for ligands ranging
from small organic molecules, such as cocaine, to larger proteins, such as thrombin.
Several aptamers are also being tested in clinical trials as therapies for diseases ranging from leukemia to
diabetes.
Macugen (pegaptanib sodium), an aptamer which binds to and inhibits the protein vascular endothelial
growth factor, has been approved for the treatment of age-related macular degeneration.
macular degeneration
End of
Biochemistry 1