
5. Phylogenetics Basics

Chapter 5 discusses molecular phylogenetics, focusing on the use of molecular data, such as DNA and protein sequences, to infer evolutionary relationships among species. It outlines the process of constructing phylogenetic trees, including the selection of molecular markers, alignment of sequences, and evaluation of tree reliability. The chapter also defines key terminology and concepts related to phylogenetic trees, such as clades, tree topology, and various tree construction methods.

Uploaded by

tranvinh13012004

Chapter 5.

Molecular phylogenetics
13/5/2025

The tree of life

Sketch of Darwin’s phylogenetic tree

• I think
Case must be that one generation then should be as many living as now. To do this & to have many species in same genus (as is) requires extinction.

Thus between A & B immense gap of relation. C & B the finest gradation, B & D rather greater distinction. Thus genera would be formed. - bearing relation to ancient types with several extinct forms...

Phylogenetic study uses fossil records

• Fossil records contain morphological information about ancestors of current species and the timeline of divergence
• They are available for only a few species
• They are fragmentary
• Description of morphology can be ambiguous
• They are almost non-existent for microorganisms

Carl Woese’s phylogenetic tree based on SSU rRNA genes

→ Molecular data (DNA or protein sequences) can serve as molecular fossils

→ Evolutionary relationships between species can be inferred from the sequence similarity of the molecules

Woese et al. (1990)
Major assumptions

• Molecular sequences used in phylogenetic construction are homologous

• Phylogenetic divergence is assumed to be bifurcating (a parent branch splits into two daughter branches at any given point)

• Each position in a sequence evolved independently

A new view of the tree of life based on genomics data

Phylogenetics is the study of the evolutionary history of living organisms using treelike diagrams to represent pedigrees of these organisms.

Hug et al., 2016


Some terminology

• A phylogenetic tree shows the evolutionary relationships between organisms or between homologous genes.
• Terminal node: current taxon
• Internal node (ancestral node): extinct ancestor of the current taxa
• Branch: reflects the level of divergence between taxa
• Root node: common ancestor of all taxa presented on the phylogenetic tree

10.3390/bioengineering11050480
Some terminology

• Clade: group of taxa having a common ancestor

• Tree topology: branching pattern

• Dichotomy: ancestor gives rise to two descendants

• Polytomy: ancestor gives rise to more than two descendants.
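
The terms above can be made concrete with a toy tree. The sketch below is illustrative only: the nested-tuple representation, the taxon names, and the helper functions `clade` and `polytomies` are assumptions made for this example, not part of the slides.

```python
# Hypothetical toy tree: a terminal node is a taxon name (str); an internal
# node is a tuple of its child subtrees. The outermost tuple is the root node.
tree = (("human", "chimp"), ("mouse", "rat", "hamster"))

def clade(node):
    """Return the terminal taxa descending from `node` (the clade it defines)."""
    if isinstance(node, str):
        return [node]
    taxa = []
    for child in node:
        taxa.extend(clade(child))
    return taxa

def polytomies(node):
    """Return the clades of all internal nodes with more than two descendants."""
    if isinstance(node, str):
        return []
    found = [clade(node)] if len(node) > 2 else []
    for child in node:
        found.extend(polytomies(child))
    return found

print(clade(tree[0]))    # ['human', 'chimp'], a dichotomy
print(polytomies(tree))  # [['mouse', 'rat', 'hamster']], a polytomy
```

Here the tree topology is simply the nesting of the tuples; the order in which children are listed carries no meaning.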

Some terminology

• Unrooted tree: does not assume the existence of a common ancestor and only shows the relative relationships between current taxa → cannot show the evolutionary path

• Rooted tree: shows a common ancestor

• Problem: the common ancestor is extinct → define the root of the tree by including an outgroup (a homologous sequence sufficiently divergent from the rest of the taxa on the tree to show that it parted from them very early in evolutionary time)

Essential bioinformatics
Form of tree representation

• Branches of a tree can be rotated around a node without changing the relationships among taxa

• Phylogram: branch length represents the amount of evolutionary divergence (how different the sequences of the two taxa are) → the tree is scaled

• Cladogram: shows only the tree topology; branch lengths are not scaled
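
The difference between the two representations can be sketched with a tree written as text. The example assumes the Newick notation, which the slides do not mention, and a made-up three-taxon tree; `to_cladogram` is a hypothetical helper name.

```python
import re

# Hypothetical phylogram in Newick notation: each ":x" is a branch length.
phylogram = "((A:0.1,B:0.2):0.05,C:0.3);"

def to_cladogram(newick):
    """Drop branch lengths so that only the branching pattern remains."""
    return re.sub(r":[0-9.]+(?:[eE][+-]?[0-9]+)?", "", newick)

print(to_cladogram(phylogram))  # ((A,B),C);
```

Both strings describe the same topology; only the phylogram carries the scaled divergence.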

Gene phylogeny vs species phylogeny
Constructing the correct tree

Constructing a phylogenetic tree – a procedure
1. Choosing molecular markers and collecting sequences

2. Performing multiple sequence alignment

3. Choosing an evolutionary model

4. Choosing a tree construction method

5. Assessing tree reliability

10.3390/bioengineering11050480
1. Choosing molecular markers

• DNA or protein

• Nucleotide sequences, which evolve more rapidly than proteins, can be used when studying very closely related organisms.
• E.g., non-coding sequences of mitochondrial DNA when analyzing the evolution of individuals in a population
• Slower-evolving sequences (rRNA or ribosomal proteins) are used when studying more divergent groups of organisms.
• Protein sequences are more conserved than the DNA that encodes them because the genetic code is degenerate

2. Multiple sequence alignment

• Choose a state-of-the-art alignment algorithm, then inspect the alignment manually

• The observed number of substitutions may not represent the true number of evolutionary events due to:
• Multiple substitution events
• Reversion
• Parallel mutation
→ Substitution model / evolutionary model
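
The observed number of substitutions is commonly summarized as the proportion of aligned sites that differ. The sketch below computes that proportion for two aligned sequences; the function name `p_distance`, the choice to skip gapped sites, and the example sequences are assumptions made for illustration.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ, skipping sites with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # a gap is not a substitution
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared

# Two hypothetical aligned sequences that differ at 2 of 10 sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```

Because of multiple substitutions, reversions, and parallel mutations, this observed value tends to underestimate the true amount of change, which is why it is corrected with a substitution model.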

3. Choosing substitution models

• Models differ in how multiple substitutions of each residue are treated

• Jukes-Cantor model of nucleotide substitution

$d_{AB} = -\frac{3}{4}\ln\left[1 - \frac{4}{3}\,p_{AB}\right]$
$d_{AB}$: evolutionary distance between sequences A and B
$p_{AB}$: observed proportion of substituted sites over the length of the alignment
The Jukes-Cantor model applies to closely related sequences
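
As a worked example of the Jukes-Cantor correction, the sketch below evaluates the formula for a given observed proportion of differences. The function name and the input value 0.2 are illustrative assumptions.

```python
import math

def jukes_cantor_distance(p):
    """Jukes-Cantor distance for an observed proportion of differing sites p."""
    if not 0 <= p < 0.75:
        # The logarithm is undefined once p reaches 3/4 (saturation).
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)

# An observed difference of 20% corresponds to a corrected distance of
# about 0.233 substitutions per site.
print(round(jukes_cantor_distance(0.2), 3))
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as the sequences become more divergent.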

3. Choosing substitution models
• Models differ in how multiple substitutions of each residue are treated

• Kimura model of nucleotide substitution

$d_{AB} = -\frac{1}{2}\ln\left(1 - 2p_{ti} - p_{tv}\right) - \frac{1}{4}\ln\left(1 - 2p_{tv}\right)$
$d_{AB}$: evolutionary distance between sequences A and B
$p_{ti}$: frequency of transitions
$p_{tv}$: frequency of transversions
Mutation rates for transitions and transversions are different

• Other models: TN93, HKY, GTR

• For proteins: PAM, JTT amino acid substitution matrices
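
The Kimura distance can be sketched directly from two aligned nucleotide sequences by counting transitions (purine to purine, or pyrimidine to pyrimidine) and transversions separately. The function name, the skipping of non-ACGT sites, and the example sequences are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def kimura_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p_ti = transitions / compared
    p_tv = transversions / compared
    arg1, arg2 = 1 - 2 * p_ti - p_tv, 1 - 2 * p_tv
    if arg1 <= 0 or arg2 <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)

# Hypothetical pair with one transition (A/G) and one transversion (C/A)
# in 10 sites.
print(round(kimura_distance("ACGTACGTAC", "GCGTACGTAA"), 3))
```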
3. Among-Site variations

• In the substitution models used to calculate evolutionary change, all positions in a sequence are assumed to evolve at the same rate.

• In reality, the rate of substitution in DNA differs among codon positions

• The rate of substitution in protein sequences also differs among sites because of functional constraints → among-site rate heterogeneity.

• The evolutionary distance estimate is corrected using a γ (gamma) correction factor
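
One common form of this correction is the gamma-corrected Jukes-Cantor distance, $d = \frac{3}{4}\,\alpha\left[\left(1 - \frac{4}{3}p\right)^{-1/\alpha} - 1\right]$, where $\alpha$ is the shape parameter of the gamma distribution of rates across sites. The slides do not give this formula or any value for $\alpha$, so the sketch below is an assumption-laden illustration rather than the method the slides have in mind.

```python
def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    p     : observed proportion of differing sites (0 <= p < 0.75)
    alpha : gamma shape parameter (assumed known here); smaller values
            mean stronger rate heterogeneity among sites
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - (4.0 / 3.0) * p) ** (-1.0 / alpha) - 1)

# With strong heterogeneity (alpha = 0.5) the corrected distance is larger
# than with nearly uniform rates (very large alpha).
print(jc_gamma_distance(0.2, 0.5))
print(jc_gamma_distance(0.2, 1e6))
```

As $\alpha$ grows, the result approaches the uncorrected Jukes-Cantor distance, which matches the assumption that all sites evolve at the same rate.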

4. Choosing tree construction method

• Distance-based method: neighbor-joining (NJ)

• Character-based method: maximum likelihood (ML)

| Algorithm | Principle | Criterion for selecting the final tree | Application |
|---|---|---|---|
| NJ | Minimum evolution: minimizing the total branch length of the phylogenetic tree | Only one tree is constructed | Short sequences with small evolutionary distance and few informative sites |
| ML | Maximize the likelihood value | Phylogenetic tree with the maximum likelihood value | Distantly related sequences and a small number of sequences |
Neighbor-Joining algorithm

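• Step 1: An initial unrooted star-shaped tree is constructed based on an initial distance matrix
• Step 2: Merge the two nodes with the smallest rate-corrected distance (the pair whose joining most reduces the total branch length) and update the distance matrix with the new node
• Step 3: Repeat step 2 until only one cluster remains, resulting in the NJ tree.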
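
A minimal sketch of these steps follows. It uses the standard neighbor-joining selection criterion, $Q_{ij} = (n-2)\,d_{ij} - r_i - r_j$ with $r_i$ the sum of distances from node $i$, which is more specific than the wording on the slide. The function name, the edge-list output, and the five-taxon distance matrix are assumptions made for illustration.

```python
def neighbor_joining(names, dist):
    """Return the edges (child, parent, branch_length) of an unrooted NJ tree.

    names : list of taxon labels
    dist  : symmetric distance matrix as a list of lists, same order as names
    """
    nodes = list(names)
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Step 2: choose the pair with the smallest rate-corrected distance Q.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"  # new internal node joining a and b
        next_id += 1
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Update the distance matrix with the new node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Step 3 ends when two nodes remain; connect them with the last branch.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges

# Hypothetical distance matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for child, parent, length in neighbor_joining(names, matrix):
    print(child, parent, length)
```

Only one tree is produced, and no alternative topologies are compared, which is why NJ is fast.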

Maximum Likelihood algorithm

• Step 1: Choose an appropriate substitution model
• Step 2: Conduct a tree search
• Step 3: Optimize substitution parameters and branch lengths for each topology to maximize the likelihood value for each topology
• Step 4: The topology with the highest ML value is selected
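
A full ML tree search is too long to show here, but the branch-length optimization in step 3 can be sketched for the simplest possible case: two sequences joined by a single branch under the Jukes-Cantor model. The grid search, the function names, and the example counts (20 differences in 100 sites) are assumptions made for illustration; real programs use numerical optimizers and evaluate many topologies.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood (up to a constant) of distance d under Jukes-Cantor
    for two aligned sequences with n_diff mismatches in n_sites sites."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # probability that a site is unchanged
    p_diff = 0.75 - 0.75 * e  # probability that a site differs
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

def ml_distance(n_sites, n_diff, step=0.001, max_d=3.0):
    """Grid search for the branch length with the highest likelihood."""
    grid = [step * i for i in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

best = ml_distance(n_sites=100, n_diff=20)
print(best, jc69_log_likelihood(best, 100, 20))
```

For this two-sequence case the maximum lies at the same value as the Jukes-Cantor distance formula gives for p = 0.2, up to the grid resolution.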

5. Phylogenetic tree evaluation

• Statistical evaluation of the reliability of the inferred phylogeny

• Approach: analytical resampling strategy such as bootstrapping
• Trees are repeatedly constructed from slightly perturbed alignments that have some random fluctuations introduced.
• If the tree is reliable, the evolutionary relationships among taxa remain the same even though random fluctuation is introduced.
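
In the usual bootstrap, the perturbed alignments are made by resampling alignment columns with replacement. The sketch below shows that resampling step and how a support percentage could be counted afterwards; the dictionary-based alignment, the taxon names, and both function names are assumptions, and the tree-building step between them is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

def bootstrap_support(clade, replicate_clades):
    """Percentage of replicate trees whose set of clades contains `clade`."""
    clade = frozenset(clade)
    hits = sum(clade in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)

# Hypothetical alignment of three taxa.
alignment = {"taxonA": "ACGTACGT", "taxonB": "ACGTTCGT", "taxonC": "ACCTTCGA"}
rng = random.Random(0)  # fixed seed so that the run is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```

Each replicate would then be passed to the chosen tree construction method, and the support for a clade is the share of replicate trees in which it reappears.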

A bootstrap value of 70% approximately corresponds to 95% statistical confidence
