ch10 2

Chapter 10 discusses the evolutionary relationships revealed through protein sequences, highlighting the use of bioinformatics and DNA technology to analyze genomes, such as that of the woolly mammoth. It explains the concepts of homologs, including paralogs and orthologs, and the importance of sequence alignment and statistical significance in detecting homology. The chapter also emphasizes the role of three-dimensional structures in understanding evolutionary relationships and the phenomenon of convergent evolution among proteins.

Uploaded by

anyaferre7
Copyright
© © All Rights Reserved

CHAPTER 10

Exploring Evolution and Bioinformatics

Evolutionary relationships are clear in protein sequences. The now-extinct woolly mammoth (Mammuthus primigenius)
coexisted with early humans, as evidenced by Late Stone Age cave paintings such as those at the Rouffignac Cave in France.
Using samples obtained from frozen preserved tissue, researchers used recombinant DNA technology to determine the
complete sequence of the woolly mammoth genome and the structural and functional adaptations of its proteins. In the
case of α-hemoglobin, only six residues (shown in green) are nonidentical and dissimilar between the woolly mammoth
and human sequences. Amino acid sequence comparisons with living species and functional expression of these extinct
proteins have provided new insights into how this species adapted to very cold environments.

Jane Goodall

The human sequence (red) differs from the chimpanzee sequence (blue) in only one
amino acid in a protein chain of 153 residues for myoglobin.
https://www.youtube.com/watch?v=51z7WRDjOjM
Molecular families often have features in common.

Such family resemblance is most easily detected by comparing 3D structure, the aspect of a molecule most
closely linked to function.

Similarities revealed by other such comparisons are sometimes surprising.

For example, angiogenin, a protein that stimulates the growth of new blood vessels, also turns out to be
structurally similar to ribonuclease.

Angiogenin and ribonuclease must have had a common ancestor at some earlier stage of evolution.

Gene sequences and the corresponding amino acid sequences are available for a great number of proteins,
largely owing to the tremendous power of DNA cloning and sequencing techniques including applications to
complete genome sequencing.

Evolutionary relationships also are manifest in amino acid sequences.

For example, 35% of the amino acids in corresponding positions are identical in the sequences of bovine
ribonuclease and angiogenin.

Is this level sufficiently high to ensure an evolutionary relationship?

In this chapter, we shall examine the methods that are used to compare amino acid sequences and to deduce
such evolutionary relationships.

Sequence-comparison methods have become one of the most powerful tools in modern biochemistry.

10.1 Homologs Are Descended from a Common Ancestor
The most fundamental relationship between two entities is homology; two molecules are said to be
homologous if they have been derived from a common ancestor.

Homologous molecules, or homologs, can be divided into two classes.


Paralogs are homologs that are present within one species.
Paralogs often differ in their detailed biochemical functions.
Orthologs are homologs that are present within different species and have very similar or identical
functions.
How can we tell whether two human proteins are paralogs or whether a yeast protein is the ortholog of a
human protein?

Statistical Analysis of Sequence Alignments Can Detect Homology

Both nucleic acid and protein sequences can be compared to detect homology.

Because nucleic acids are composed of fewer building blocks than proteins
(4 bases versus 20 amino acids), chance matches are more likely between nucleic acid sequences,
so detection of homology between protein sequences is far more effective.

Let us consider a class of proteins called the globins.

Myoglobin is a protein that binds oxygen in muscle.

Hemoglobin is the oxygen-carrying protein in blood (Chapter 7).

Here, we consider only the α-chain.

Sequence Alignment

[Figure: hemoglobin, with its α-chains and β-chains labeled]
How can we tell where to align the two sequences?

In the course of evolution, insertions and deletions may have occurred.

Individual amino acids may have been mutated.

To understand how the methods of sequence alignment take these potential changes into account,
let us first consider the simplest approach,

where we slide one sequence past the other, one amino acid at a time, and
count the number of matched residues, or sequence identities.

For α-hemoglobin and myoglobin, the best alignment reveals 23 sequence identities.

However, we see that another alignment, featuring 22 identities, is nearly as good.
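
The sliding comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names and the short test sequences are made up for the example, and real globin sequences would be substituted in practice.

```python
def count_identities(a: str, b: str, offset: int) -> int:
    """Count matched residues when b is shifted by `offset` positions relative to a."""
    matches = 0
    for i, residue in enumerate(b):
        j = i + offset  # position in a that lines up with b[i]
        if 0 <= j < len(a) and a[j] == residue:
            matches += 1
    return matches


def rank_offsets(a: str, b: str) -> list[tuple[int, int]]:
    """Slide b past a one residue at a time; return (offset, identities), best first."""
    scores = [(off, count_identities(a, b, off)) for off in range(-len(b) + 1, len(a))]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy sequences, not real globins.
ranking = rank_offsets("HEAGAWGHEE", "AWGH")
print(ranking[:2])  # the top entries show the best and the runner-up alignment
```

Looking at the top two entries of the ranking is the programmatic version of noticing that a second alignment is "nearly as good" as the best one.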

By introducing a gap into one of the sequences, we obtain an alignment with 38 identities. Scoring +10 for each identity and -25 for each gap:

38 identities: 38 × (+10) = 380
1 gap: 1 × (-25) = -25
Score: 355
Insertion of gaps allows the alignment method to compensate for the insertions or
deletions of nucleotides that may have taken place in the course of evolution.

These methods use scoring systems to compare different alignments, including
penalties for gaps to prevent the insertion of an unreasonable number of them.
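
As a worked check of this kind of scoring, the sketch below uses +10 per identity and -25 per gap. The helper names are illustrative, and treating each contiguous run of "-" characters as a single gap is an assumption of this example.

```python
IDENTITY_SCORE = 10   # credit for each matched residue
GAP_PENALTY = -25     # cost for each gap introduced


def alignment_score(identities: int, gaps: int) -> int:
    """Score an alignment from its identity and gap counts."""
    return identities * IDENTITY_SCORE + gaps * GAP_PENALTY


def score_aligned_rows(row_a: str, row_b: str) -> int:
    """Score two equal-length aligned rows in which '-' marks a gap position.

    Assumption: each contiguous run of gap positions counts as one gap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identities, gaps, in_gap = 0, 0, False
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            if not in_gap:
                gaps += 1
            in_gap = True
        else:
            in_gap = False
            if x == y:
                identities += 1
    return alignment_score(identities, gaps)


print(alignment_score(38, 1))                 # 38 identities and 1 gap
print(score_aligned_rows("ACD-EF", "ACDGEF"))  # toy rows: 5 identities, 1 gap
```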

Next, we must determine the significance of this score and level of identity.
Statistical Significance of Alignments Estimated by Shuffling
The similarities in sequence appear striking,
yet there remains the possibility that a grouping of sequence identities has occurred by chance alone.

Hence, we can assess the significance of our alignment by "shuffling," or randomly rearranging, one of the
sequences, repeating the sequence alignment, and determining a new alignment score. Repeating this many times gives a distribution of scores expected by chance.
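
A minimal sketch of the shuffling test follows. For brevity it scores alignments with the simple ungapped identity count rather than a gapped method, which is an assumption of this example; the function names, the number of trials, and the demo sequence are also illustrative.

```python
import random
import statistics


def best_identity_score(a: str, b: str) -> int:
    """Best ungapped identity count over all relative offsets of b against a."""
    best = 0
    for offset in range(-len(b) + 1, len(a)):
        matches = sum(
            1 for i, r in enumerate(b)
            if 0 <= i + offset < len(a) and a[i + offset] == r
        )
        best = max(best, matches)
    return best


def shuffle_test(a: str, b: str, trials: int = 200, seed: int = 0):
    """Compare the real score with scores obtained after shuffling b."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    real = best_identity_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(trials):
        rng.shuffle(residues)          # same composition, random order
        shuffled_scores.append(best_identity_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    z = (real - mean) / sd if sd > 0 else float("inf")
    return real, mean, z


# Hypothetical demo: a sequence compared with itself stands far above shuffled copies.
demo = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(shuffle_test(demo, demo))
```

The returned z value expresses how many standard deviations the authentic score lies above the mean of the shuffled scores.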

When this procedure is applied to the sequences of myoglobin and α-hemoglobin the authentic alignment clearly
stands out.

Its score is far above the mean for the alignment scores based on shuffled sequences.

The probability that such a deviation occurred by chance alone is
approximately 1 in 10^20.

Thus, we can comfortably conclude that the two sequences are genuinely
similar;

the simplest explanation for this similarity is that these sequences are
homologous–that is, that the two molecules have descended by divergence
from a common ancestor.

FIGURE 10.6 Comparison of alignment scores reveals the statistical
significance of an unshuffled alignment. Alignment scores are
calculated for many shuffled sequences, and the number of
sequences generating a particular score is plotted against the score.
The resulting plot is a distribution of alignment scores occurring by
chance. The alignment score for unshuffled α-hemoglobin and
myoglobin (shown in red) is substantially greater than any of these
scores, strongly suggesting that the sequence similarity is significant.
Distant Evolutionary Relationships Can Be Detected Through the Use of Substitution Matrices

The scoring scheme used so far gives no credit for any pairing that is not an identity.

However, not all substitutions are equivalent.

Amino acid changes can be classified as structurally conservative or nonconservative.

A conservative substitution replaces one amino acid with another that is similar in size and chemical
properties.
Conservative substitutions may have only minor effects on protein structure and often can be tolerated
without compromising protein function.

In contrast, in a nonconservative substitution, an amino acid is replaced by one that is structurally dissimilar.

Conservative and single-nucleotide substitutions are likely to be more common than are substitutions with
more radical effects.

How can we account for the type of substitution when comparing sequences?
Blosum-62 substitution matrix
Blosum: Blocks of amino acid substitution matrix

FIGURE 10.7 The Blosum-62 substitution matrix
enables scoring based on amino acid similarity. This
matrix was derived by examining substitutions within
aligned sequence blocks in related proteins. In this
graphical view, amino acids are classified into four
groups (charged, red; polar, green; large and
hydrophobic, blue; other, black). Substitutions that
require the change of only a single nucleotide are
shaded. Identities are boxed. To find the score for a
substitution of, for instance, a Y for an H, you find the Y
in the column having H at the top and check the
number at the left. In this case, the resulting score is 2.
Notice that identity scores are not the same for each residue, because less frequently occurring amino acids such
as cysteine (C) and tryptophan (W) will align by chance less often than the more common residues.

Furthermore, structurally conservative substitutions such as lysine (K) for arginine (R) and isoleucine (I) for
valine (V) have relatively high scores, whereas nonconservative substitutions such as lysine for tryptophan
result in negative scores.

In addition, the introduction of a single-residue gap lowers the alignment score by
12 points, and the extension of an existing gap costs 2 points per residue.
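
To make the scoring concrete, here is a small sketch that combines a handful of Blosum-62 entries with the gap costs stated above. Only a few matrix values are included (those for the pairs mentioned in the text plus three identities), so the lookup table is a deliberately incomplete illustration, and the function names are made up for this example.

```python
# A few Blosum-62 entries, keyed by alphabetically sorted residue pairs.
# This is a small excerpt for illustration, not the full matrix.
BLOSUM62_SUBSET = {
    ("K", "R"): 2,    # conservative: lysine / arginine
    ("I", "V"): 3,    # conservative: isoleucine / valine
    ("H", "Y"): 2,    # the Y-for-H example from the figure legend
    ("K", "W"): -3,   # nonconservative: lysine / tryptophan
    ("A", "A"): 4,    # identity of a common residue
    ("C", "C"): 9,    # identity of a rare residue scores higher
    ("W", "W"): 11,
}

GAP_OPEN = 12     # cost of introducing a single-residue gap
GAP_EXTEND = 2    # cost of each additional residue in an existing gap


def substitution_score(x: str, y: str) -> int:
    """Look up the score for aligning residue x with residue y (order-independent)."""
    return BLOSUM62_SUBSET[tuple(sorted((x, y)))]


def gap_cost(length: int) -> int:
    """Total points subtracted for one gap spanning `length` residues."""
    if length <= 0:
        return 0
    return GAP_OPEN + GAP_EXTEND * (length - 1)


print(substitution_score("Y", "H"), gap_cost(1), gap_cost(3))
```

A gap of three residues therefore costs 12 + 2 + 2 = 16 points under these settings.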
In many regions, most substitutions are conservative (defined as those substitutions with
scores greater than 0) and relatively few are strongly disfavored types.

Yellow: conservative substitution


Orange: identity

This scoring system detects homology between less obviously related sequences with
greater sensitivity than would a comparison of identities only.

Consider, for example, the protein leghemoglobin, an oxygen-binding protein found in the roots of some plants.

With scoring based on identities only, the probability that its alignment with myoglobin occurred by chance alone is 1 in 20.

In contrast, when the substitution matrix is used,
the odds of the alignment occurring by chance are approximately 1 in 300.

Thus, an analysis performed by using the substitution matrix reaches a much firmer
conclusion about the evolutionary relationship between these proteins.

Experience with sequence analysis has led to the following rules of thumb:

Sequences with identities greater than 25% are probably homologous.
Sequences less than 15% identical are unlikely to show statistically significant similarity from alignment alone.
For sequences between 15 and 25% identical, further analysis is necessary to determine the statistical significance of the
alignment.
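
These rules of thumb can be written as a small helper. The function name and the returned labels are illustrative, and the thresholds are guidelines rather than a statistical test.

```python
def interpret_identity(percent_identity: float) -> str:
    """Apply the 15% / 25% rules of thumb to a percent sequence identity."""
    if percent_identity > 25:
        return "probably homologous"
    if percent_identity < 15:
        return "unlikely to be statistically significant"
    return "further analysis needed"


# Values quoted in this chapter: 35% for ribonuclease and angiogenin,
# 15.6% for human alpha-hemoglobin and lupine leghemoglobin.
for value in (35, 15.6, 10):
    print(value, "->", interpret_identity(value))
```

A result in the lowest category is not evidence against homology; it only means that sequence identity alone cannot establish it.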

It must be emphasized that the lack of a statistically significant degree of sequence similarity
does not rule out homology.
Databases Can Be Searched to Identify Homologous Sequences

When the sequence of a protein is first determined, comparing it with all previously
characterized sequences can be a source of tremendous insight into its evolutionary relatives
and, hence, its structure and function.
The sequence-alignment methods just described are used to compare an individual sequence
with all members of a database of known sequences.

Database searches are most often accomplished by using resources available on the Internet at
the National Center for Biotechnology Information (www.ncbi.nih.gov).

The procedure used is referred to as a BLAST (Basic Local Alignment Search Tool) search.

An amino acid sequence is typed or pasted into the Web browser, and a search is performed,
most often against a non-redundant database of all known sequences.

A BLAST search yields a list of sequence alignments, each accompanied by an estimate
giving the likelihood that the alignment occurred by chance.

Databases can be searched to identify homologous sequences
http://www.ncbi.nlm.nih.gov/blast/
http://www.ncbi.nlm.nih.gov/protein

FIGURE 10.12 Related protein sequences can be rapidly identified using online BLAST searches. Part of the
results from a BLAST search of the nonredundant (nr) protein sequence database using the sequence of ribose 5-phosphate
isomerase (also called phosphopentose isomerase, Chapter 20) from E. coli as a query. Among the
thousands of sequences found is the orthologous sequence from humans, and the alignment between these
sequences is shown. The number of sequences with this level of similarity expected to be in the database by
chance is 2 × 10^-25 as shown by the E value (highlighted in red). Because this value is much less than 1, we can
confidently conclude that the observed sequence alignment is highly significant.
NCBI website search movie

In 1995, investigators reported the first complete sequence of the genome of a free-living organism, the
bacterium Haemophilus influenzae.

With the sequences available, they performed a BLAST search with each deduced protein sequence.

Of 1,743 identified protein-coding regions, also called open reading frames,
1,007 (58%) could be linked to some protein of known function that had been previously characterized in
another organism.

An additional 347 open reading frames could be linked to sequences in the database for which no function
had yet been assigned ("hypothetical proteins").

The remaining 389 sequences did not match any sequence present in the database at that time.

Thus, investigators were able to identify likely functions
for more than half the proteins within this organism
solely by sequence comparisons.
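
The bookkeeping behind these figures is easy to verify. The short sketch below uses only the counts quoted above; the category labels are shorthand chosen for this example.

```python
# Counts of open reading frames reported for the H. influenzae genome.
orf_counts = {
    "known function": 1007,
    "hypothetical protein": 347,
    "no database match": 389,
}

total_orfs = sum(orf_counts.values())
percentages = {
    label: round(100 * count / total_orfs)
    for label, count in orf_counts.items()
}

print(total_orfs)
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The three categories add up to the 1,743 open reading frames, and the first category rounds to the 58% quoted in the text.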

Sequence-comparison methods have become one of the most powerful tools in
modern biochemistry.
10.2 Examination of Three-Dimensional Structure Enhances Our Understanding of
Evolutionary Relationships
Sequence comparison is a powerful tool for extending our knowledge of protein function and kinship.

However, biomolecules generally function as intricate three-dimensional structures rather than as linear
polymers.

Mutations occur at the level of sequence, but the effects of the mutations are at the level of function,
and function is directly related to tertiary structure.

Consequently, to gain a deeper understanding of evolutionary relationships between proteins, we must
examine three-dimensional structures, especially in conjunction with sequence information.

Tertiary Structure Is More Conserved Than Primary Structure
Tertiary structures of the globins are extremely similar,
even though the similarity between human myoglobin and lupine leghemoglobin is just barely detectable at
the sequence level and that between human α-hemoglobin and lupine leghemoglobin is not statistically
significant (15.6% identity).

This structural similarity firmly establishes that the framework that binds the heme group and facilitates the
reversible binding of oxygen has been conserved over a long evolutionary period.

In a growing number of other cases, however, a comparison of three-dimensional structures has revealed
striking similarities between proteins that were not expected to be related.

A case in point is the protein actin, a major component of the cytoskeleton, and heat shock protein 70 (Hsp-
70), which assists protein folding inside cells.

These two proteins were found to be noticeably similar in structure despite only 15.6% sequence identity.

On the basis of their three-dimensional structures, actin and Hsp-70 are paralogs.

FIGURE 10.14 The structures of actin and a large fragment of heat
shock protein 70 (Hsp70) reveal an unexpected relationship. A
comparison of the identically colored elements of secondary structure
reveals the overall similarity in structure despite the difference in
biochemical activities. [Drawn from 1ATN.pdb and 1ATR.pdb.]
Knowledge of 3D Structures Aids the Evaluation of Sequence Alignments

The sequence-comparison methods described thus far treat all positions within a sequence equally.

However, we know from examining families of homologous proteins for which at least one three-dimensional
structure is known that regions and residues critical to protein function are more strongly conserved than are
other residues.

For example, each type of globin contains a bound heme group with an iron atom at its center.

A histidine residue that interacts directly with this iron (residue 64 in human myoglobin)
is conserved in all globins.

distal histidine

After we have identified key residues or highly conserved sequences within a family of proteins, we can
sometimes identify other family members even when the overall level of sequence similarity is below statistical
significance.

Thus it may be useful to generate a sequence template, a map of conserved residues that are structurally and
functionally important and are characteristic of particular families of proteins.
Repeated Motifs Detected by Aligning Sequences with Themselves
More than 10% of all proteins contain sets of two or more
domains that are similar to one another. Sequence search
methods can often detect internally repeated sequences that
have been characterized in other proteins.
Their presence can be detected by attempting to align a given
sequence with itself.
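
A self-alignment can be sketched as sliding a sequence against a copy of itself and skipping the trivial zero offset. The function name and the toy repeat below are illustrative, and only identities are counted, with no substitution matrix or gaps.

```python
def self_alignment_offsets(seq: str, min_offset: int = 1) -> list[tuple[int, int]]:
    """Align seq with itself at every nonzero offset; return (offset, identities), best first."""
    results = []
    for offset in range(min_offset, len(seq)):
        matches = sum(
            1 for i in range(len(seq) - offset)
            if seq[i] == seq[i + offset]
        )
        results.append((offset, matches))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical sequence built from two copies of a 10-residue unit.
repeat = "ACDEFGHIKL" * 2
print(self_alignment_offsets(repeat)[0])
```

A strong peak at a nonzero offset suggests an internal repeat whose length matches that offset.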
The determination of the three-dimensional structure of the
TATA-box-binding protein confirmed the presence of repeated
structures; the protein is formed of two nearly identical domains.

For the TATA-box-binding protein, a key protein in controlling
gene transcription, comparison of two 65-residue segments from
these regions reveals that 40% of the amino acids are identical.
The estimated probability of such an alignment occurring by
chance is less than 1 in 10^10.

The evidence is convincing that the gene encoding
this protein evolved by duplication of a gene encoding
a single domain.
FIGURE 10.15 Internal repeats can be identified using
a self-alignment.
Convergent Evolution Illustrates Common Solutions to Biochemical Challenges

Thus far, we have been exploring proteins derived from common ancestors – that is, through divergent
evolution.

Other cases have been found of proteins that are structurally similar in important ways but are not descended
from a common ancestor.

How might two unrelated proteins come to resemble each other structurally?

Two proteins evolving independently may have converged on a similar structure to perform a similar
biochemical activity.

Perhaps that structure was an especially effective solution to a biochemical problem that organisms face.

The process by which very different evolutionary pathways lead to the same solution is called convergent
evolution.

An example of convergent evolution is found among the serine proteases.
These enzymes cleave peptide bonds by hydrolysis (Section 6.2).

The active sites for two such enzymes, mammalian chymotrypsin and bacterial subtilisin, are remarkably similar
(Figure 10.16, top). In each case, the catalytic triad of a serine residue, a histidine residue, and an aspartate
residue is positioned in space in nearly identical arrangements.

This conserved spatial arrangement is critical for the
activity of these enzymes and affords the same mechanistic
solution to the problem of peptide bond hydrolysis.

FIGURE 10.16 Mammalian chymotrypsin and bacterial subtilisin are
examples of convergent evolution. The relative positions of the three key
catalytic residues are nearly identical in the active sites of the serine
proteases chymotrypsin and subtilisin. Yet, the overall protein structures are
quite dissimilar, suggesting that although these proteins derived from
different ancestors, they arrived at a similar solution to the challenge of
peptide bond hydrolysis. [Drawn from 1GCT.pdb and 1SUP.pdb.]
At first glance, this similarity between chymotrypsin and subtilisin might suggest that these proteins are
homologous. However, striking differences in the overall structures of these proteins make an evolutionary
relationship extremely unlikely (Figure 10.16, bottom).

Whereas chymotrypsin consists almost entirely of β sheets, subtilisin contains extensive α-helical structure.
Moreover, the key serine, histidine, and aspartic acid residues do not even appear in the same order within the
two sequences. It is extremely unlikely that two proteins evolving from a common ancestor could have retained
similar active-site structures while other aspects of the structure changed so dramatically.

Comparison of RNA Sequences Can Be a Source of Insight into RNA Secondary Structures
The methods and interpretation of sequence alignments described above are not limited to proteins.
Homologous RNA sequences can also be similarly studied, yielding important insights into evolutionary
relationships. Additionally, these analyses provide clues to the three-dimensional structure of the RNA itself.

As noted in Chapter 8, single-stranded nucleic acid molecules fold back on themselves
to form elaborate structures held together by Watson–Crick base-pairing and other
noncovalent interactions. In a family of sequences that form similar base-paired structures,
base sequences may vary, but base-pairing ability is conserved.
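
One way to look for conserved base-pairing is to check, for a pair of alignment columns, how often the two bases are Watson–Crick complements. The sketch below assumes pre-aligned, equal-length RNA sequences; the function name and the toy alignment are invented for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_conservation(aligned_seqs: list[str], i: int, j: int) -> float:
    """Fraction of sequences in which columns i and j can form a Watson-Crick pair."""
    paired = sum(1 for s in aligned_seqs if (s[i], s[j]) in WATSON_CRICK)
    return paired / len(aligned_seqs)


# Hypothetical alignment: the bases at columns 0 and 6 differ between sequences,
# yet they remain complementary in every sequence.
toy_alignment = ["GGAAACC", "CGAAACG", "AGAAACU"]
print(pairing_conservation(toy_alignment, 0, 6))  # varying bases, conserved pairing
print(pairing_conservation(toy_alignment, 2, 4))  # A opposite A cannot pair
```

Column pairs in which the bases change but complementarity is kept are the ones that hint at a base-paired stem.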

FIGURE 10.17 RNA sequence alignments provide insights into structure based on conservation of base-pairing.
(A) A comparison of sequences in a part of ribosomal RNA taken from a variety of species. (B) The
implied secondary structure. Green lines indicate positions at which Watson–Crick base-pairing is completely
conserved in the sequences shown, whereas dots indicate positions at which Watson–Crick base-pairing is
conserved in most cases.
10.3 Evolutionary Trees Can Be Constructed on the Basis of Sequence Information

350 My

How can we estimate the approximate dates of evolutionary events?

Evolutionary trees can be calibrated by the fossil record.

Jawless fish, which diverged from bony fish approximately 400 million years ago, contain hemoglobin built from a single type of subunit.

FIGURE 10.19 The lamprey is a jawless fish whose
ancestors diverged from bony fishes
approximately 400 million years ago. Lamprey
hemoglobin molecules contain only a single type
of polypeptide chain.
Horizontal Gene Transfer Events May Explain Unexpected Branches of the Evolutionary Tree

Horizontal gene transfer is the exchange of DNA between species that confers selective advantage on the
recipient.
Horizontal gene transfer is common among prokaryotes.

Example: the eukaryote Galdieria sulphuraria contains genes more closely related to bacteria and archaea.
This subset of genes allows G. sulphuraria to live in hostile environments.

FIGURE 10.20 The unicellular red alga Galdieria
sulphuraria is a eukaryote that can survive
extreme environments.

FIGURE 10.21 Proteins encoded within the G. sulphuraria
genome with high homology to prokaryotic sequences
are suggestive of horizontal gene transfer.
10.4 Modern Techniques Make the Experimental Exploration of Evolution Possible
Two techniques of biochemistry have made it possible to examine the course of evolution directly and not
simply by inference.

(1) The polymerase chain reaction (Section 9.1) allows the direct amplification and examination of ancient
DNA sequences. If DNA samples can be obtained from preserved remains, we can explore genomes from
species that are no longer living.

(2) Molecular evolution may also be investigated through the process of synthesizing highly diverse
populations of molecules and selecting for a biochemical property.

The combination of these two techniques provides a glimpse into the types of molecules that may have existed
very early in evolution.

SELEX

Ancient DNA Can Sometimes Be Amplified and Sequenced

The tremendous chemical stability of DNA makes the molecule well suited to its role as the storage site of
genetic information. So stable is the molecule that samples of DNA have survived for many thousands of years
under appropriate conditions.

With the development of PCR methods, such ancient DNA can sometimes be amplified and sequenced. This
approach has been applied to mitochondrial DNA from a Neanderthal fossil estimated at 38,000 years of age
excavated from Vindija Cave, Croatia, in 1980.

Remarkably, investigators have completely sequenced the mitochondrial genome from this specimen.

Comparison of the Neanderthal mitochondrial sequence with those from Homo sapiens individuals revealed
between 201 and 234 substitutions, considerably fewer than the approximately 1,500 differences between
human beings and chimpanzees over the same regions.

Further analysis suggested that the common ancestor of modern human beings and
Neanderthals lived approximately 570,000 years ago.

An evolutionary tree constructed by using these data and others revealed
that the Neanderthal was not an intermediate between chimpanzees
and human beings but, instead, was an evolutionary "dead end" that
became extinct.

Further analysis of these sequences has enabled researchers to
determine the extent of interbreeding between these groups, map the
geographic history of these populations, and make inferences about
additional ancestors whose DNA has not yet been sequenced.

A few earlier studies claimed to determine the sequences of far more ancient DNA such as that found in insects
trapped in amber, but these studies appear to have been flawed. The source of these sequences turned out to
be contaminating modern DNA.

Successful sequencing of ancient DNA requires sufficient DNA for reliable amplification and the rigorous
exclusion of all sources of contamination.
2022 Nobel Prize, Phys & Med

Svante Pä
Molecular Evolution Can Be Examined Experimentally
Evolution requires three processes:

(1) Diverse population


(2) Selection of fitness
(3) Reproduction

Nucleic acid molecules are capable of undergoing all three
processes in vitro.

A diverse population of nucleic acids can be synthesized in the laboratory by the
process of combinatorial chemistry.

A selection process isolates specific molecules with the desired binding properties.

The selected molecules are replicated through the use of PCR.

Errors that occur naturally in the course of the replication process introduce
additional variation into the population in each “generation.”
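
The cycle of diversity, selection, and error-prone replication can be mimicked with a toy simulation. Everything specific here is an assumption made for illustration: "fitness" is simply the number of positions matching an arbitrary target string (a stand-in for binding strength, not a real aptamer sequence), the top half of the pool is kept each round, and the error rate and pool size are arbitrary.

```python
import random

ALPHABET = "ACGU"


def fitness(seq: str, target: str) -> int:
    """Toy stand-in for binding: number of positions that match the target."""
    return sum(1 for x, y in zip(seq, target) if x == y)


def replicate(seq: str, error_rate: float, rng: random.Random) -> str:
    """Copy a sequence, replacing each base with a random one at the given error rate."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < error_rate else base
        for base in seq
    )


def evolve(target: str, pool_size: int = 100, generations: int = 8,
           error_rate: float = 0.02, seed: int = 0) -> list[str]:
    """Run rounds of selection (keep the fitter half) and error-prone replication."""
    rng = random.Random(seed)
    # (1) Diverse starting population of random sequences (pool_size should be even).
    pool = ["".join(rng.choice(ALPHABET) for _ in target) for _ in range(pool_size)]
    for _ in range(generations):
        # (2) Selection: keep the half of the pool that scores best.
        pool.sort(key=lambda s: fitness(s, target), reverse=True)
        survivors = pool[: pool_size // 2]
        # (3) Reproduction: two error-prone copies of each survivor.
        pool = [replicate(s, error_rate, rng) for s in survivors for _ in range(2)]
    return pool


final_pool = evolve("GGAAGAAACUG")  # hypothetical target, chosen arbitrarily
print(max(fitness(s, "GGAAGAAACUG") for s in final_pool))
```

Copying errors play the same role in this sketch as in the laboratory procedure: they add new variants to the pool in each generation.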
The molecules that bound well to the ATP affinity column were replicated
by reverse transcription into DNA, amplification by PCR, and transcription
back into RNA. Additional diversity was introduced into the pool by the
use of a somewhat error-prone reverse transcriptase, which introduces
additional mutations into the population during each cycle.
The new population was subjected to additional rounds of selection for
ATP-binding activity, and after 8 generations, members of the selected
population were characterized by sequencing.
17 different sequences were obtained, 16 of which could form the structure shown in Figure 10.24.
Each of these molecules bound ATP with dissociation constants (KD) less than 50 μM.

FIGURE 10.24 A conserved secondary structure
is common to RNA molecules selected for ATP
binding.

The loop folds back to form a deep pocket into which the adenine ring can fit.
Thus, a structure had evolved that was capable of a specific interaction.

FIGURE 10.25 Experimental molecular evolution yields an ATP-binding RNA molecule.
Synthetic oligonucleotides that can specifically bind ligands, such as the ATP-binding RNA molecules
described above, are referred to as aptamers.

In addition to their role in understanding molecular evolution, aptamers have shown promise as versatile
tools for biotechnology and medicine.

For example, they have been developed for diagnostic applications, serving as sensors for ligands ranging
from small organic molecules, such as cocaine, to larger proteins, such as thrombin.

Several aptamers are also being tested in clinical trials as therapies for diseases ranging from leukemia to
diabetes.

Macugen (pegaptanib sodium), an aptamer which binds to and inhibits the protein vascular endothelial
growth factor, has been approved for the treatment of age-related macular degeneration.

macular degeneration
End of
Biochemistry 1
