Phylogenetic Tree
What is phylogenetic analysis and why
should we perform it?
[Figure: a rooted tree of taxa A to E with its parts labelled. Terminal nodes represent the taxa (genes, populations, species, etc.) used to infer the phylogeny. Branches are also called lineages. Internal nodes, or divergence points, represent hypothetical ancestors of the taxa. The ancestral node is the root of the tree.]
Phylogenetic trees diagram the evolutionary
relationships between the taxa
[Figure: a tree of Taxon A to Taxon E.]
There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
[Figure: three drawings of the same tree of Taxon A, Taxon B, Taxon C and Taxon D; one drawing is annotated with branch lengths.]
All show the same evolutionary relationships, or branching orders, between the taxa.
Example: Which species are the closest
living relatives of modern humans?
[Figure, left: tree with tips Humans, Chimpanzees, Bonobos, Gorillas, Orangutans; time scale from 14 to 0 MYA.]
Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are related more closely to humans than either is to gorillas.
[Figure, right: tree with tips Gorillas, Chimpanzees, Bonobos, Orangutans, Humans; time scale from 15-30 to 0 MYA.]
The pre-molecular view was that the great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans, and that humans diverged from the apes at least 15-30 MYA.
The goal of phylogeny inference is to resolve the
branching orders of lineages in evolutionary trees:
[Figure: candidate trees showing different branching orders for taxa A to E.]
Phylogenetic tree building (or inference) methods are aimed at
discovering which of the possible unrooted trees is "correct".
We would like this to be the “true” biological tree, that is, one
that accurately represents the evolutionary history of the taxa.
In practice, we must settle for the tree that is computationally
optimal under the phylogenetic method of choice.
The number of unrooted trees increases in a greater
than exponential manner with number of taxa
# Taxa (N)    # Unrooted trees
3             1
4             3
5             15
6             105
7             945
8             10,395
9             135,135
10            2,027,025
...           ...
30            ≈8.69 x 10^36
[Figure: unrooted trees for 3, 4 and 5 taxa, built by adding taxa C, D and E in turn.]
[Figure: an unrooted tree of taxa A, B, C and D shown next to rooted trees of the same taxa, with the root marked.]
An unrooted, four-taxon tree theoretically can be rooted in five
different places to produce five different rooted trees
[Figure: unrooted tree 1 of taxa A, B, C and D, with its five branches numbered 1 to 5. Rooting the tree on each branch in turn gives rooted trees 1a, 1b, 1c, 1d and 1e.]
These trees show five different evolutionary relationships among the taxa!
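The count of five can be checked with a short sketch. It assumes the unrooted tree joins A and B at one internal node and C and D at the other; the node names `u` and `v` and the helper names are illustrative only. Each rooted tree is stored as nested `frozenset`s so that the drawing order of the branches is ignored.

```python
# Unrooted four-taxon tree (assumed shape): A and B meet at "u", C and D at "v".
EDGES = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]

def neighbors(node):
    """Return the nodes adjacent to `node` in the unrooted tree."""
    return [b if a == node else a for a, b in EDGES if node in (a, b)]

def subtree(node, parent):
    """Return the clade below `node` when it is entered from `parent`."""
    children = [n for n in neighbors(node) if n != parent]
    if not children:  # a leaf has no neighbour other than its parent
        return node
    return frozenset(subtree(child, node) for child in children)

def root_on(edge):
    """Place the root in the middle of `edge` and return the rooted tree."""
    a, b = edge
    return frozenset([subtree(a, b), subtree(b, a)])

# One rooted tree per branch of the unrooted tree.
rooted = {edge: root_on(edge) for edge in EDGES}
print(len(set(rooted.values())))
```

Rooting on the internal branch gives the balanced tree ((A,B),(C,D)); rooting on a terminal branch makes that taxon the sister of all the others.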
All of these rearrangements show the same evolutionary
relationships between the taxa
[Figure: rooted tree 1a redrawn several times with branches rotated around the internal nodes; the top-to-bottom order of taxa A, B, C and D changes in each drawing.]
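One way to test whether two drawings are the same tree is to compare them in a form that ignores the order of the children at every node. The sketch below uses nested tuples for drawings and nested `frozenset`s for the order-free form; the example trees are hypothetical and are not taken from the figure.

```python
def canonical(tree):
    """Return an order-independent version of a nested-tuple tree."""
    if isinstance(tree, str):  # a leaf is just its taxon name
        return tree
    return frozenset(canonical(child) for child in tree)

drawing_1 = ("A", ("B", ("C", "D")))
drawing_2 = ((("D", "C"), "B"), "A")   # every internal node rotated
different = ("B", ("A", ("C", "D")))   # A and B have changed places in the tree

print(canonical(drawing_1) == canonical(drawing_2))  # same branching order
print(canonical(drawing_1) == canonical(different))  # different branching order
```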
Possible evolutionary trees
Taxa (n)    Rooted trees    Unrooted trees
2           1               1
3           3               1
4           15              3
5           105             15
6           945             105
7           10,395          945
8           135,135         10,395
9           2,027,025       135,135
10          34,459,425      2,027,025
There are two major ways to root trees:
By outgroup:
Uses taxa (the “outgroup”) that are
known to fall outside of the group of
interest (the “ingroup”). Requires
some prior knowledge about the
relationships among the taxa. The
outgroup can either be species (e.g.,
birds to root a mammalian tree) or
previous gene duplicates (e.g., using
α-globins as the outgroup to root β-globins).
By midpoint or distance:
Roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. Assumes that the taxa are evolving in a clock-like manner. This assumption is built into some of the distance-based tree building methods.
[Figure: tree of taxa A, B, C and D with terminal branches of length 10 (A), 5 (D) and 2 (B and C) and an internal branch of length 3. d(A,D) = 10 + 3 + 5 = 18, so Midpoint = 18 / 2 = 9.]
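The midpoint example, d(A,D) = 18 with the root 9 units from A, can be reproduced with a minimal sketch. The branch lengths are the ones in the example; attaching B next to A and C next to D, and the internal node names `x` and `y`, are assumptions made for illustration.

```python
from itertools import combinations

# Branch lengths from the example; the positions of B and C are assumed.
LENGTHS = {("A", "x"): 10, ("B", "x"): 2, ("x", "y"): 3,
           ("C", "y"): 2, ("D", "y"): 5}
LEAVES = ["A", "B", "C", "D"]

def adjacent(node):
    """Yield (neighbour, branch length) pairs for `node`."""
    for (a, b), length in LENGTHS.items():
        if a == node:
            yield b, length
        elif b == node:
            yield a, length

def path(start, goal, parent=None):
    """Return [(node, length of branch into node), ...] from start to goal."""
    if start == goal:
        return []
    for nxt, length in adjacent(start):
        if nxt == parent:
            continue
        rest = path(nxt, goal, start)
        if rest is not None:
            return [(nxt, length)] + rest
    return None  # dead end: goal is not on this side of the tree

def distance(pair):
    """Sum of branch lengths on the path between two taxa."""
    return sum(length for _, length in path(*pair))

def midpoint_root(leaves):
    """Return the most distant pair, the branch holding the midpoint,
    and the distance of the midpoint from the start of that branch."""
    a, b = max(combinations(leaves, 2), key=distance)
    half = distance((a, b)) / 2
    walked, here = 0, a
    for node, length in path(a, b):
        if walked + length >= half:
            return (a, b), (here, node), half - walked
        walked, here = walked + length, node

pair, branch, offset = midpoint_root(LEAVES)
print(pair, branch, offset)
```

With these lengths the midpoint falls on the terminal branch leading to A, one unit away from the internal node.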
Types of data used in phylogenetic inference:
Character-based methods: Use the aligned characters, such as DNA
or protein sequences, directly during tree inference.
Taxa Characters
Species A ATGGCTATTCTTATAGTACG
Species B ATCGCTAGTCTTATATTACA
Species C TTCACTAGACCTGTGGTCCA
Species D TTGACCAGACCTGTGGTCCG
Species E TTGACCAGTTCTCTAGTTCG
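Character-based methods look at the alignment column by column. As a small illustration, the sketch below classifies each column of the alignment above as invariant, parsimony-informative (at least two states that each occur in at least two taxa), or variable but uninformative. The function name and category labels are illustrative choices.

```python
from collections import Counter

ALIGNMENT = {
    "Species A": "ATGGCTATTCTTATAGTACG",
    "Species B": "ATCGCTAGTCTTATATTACA",
    "Species C": "TTCACTAGACCTGTGGTCCA",
    "Species D": "TTGACCAGACCTGTGGTCCG",
    "Species E": "TTGACCAGTTCTCTAGTTCG",
}

def classify_sites(alignment):
    """Count invariant, parsimony-informative and other variable columns."""
    counts = Counter()
    for column in zip(*alignment.values()):  # one tuple of states per site
        states = Counter(column)
        if len(states) == 1:
            counts["invariant"] += 1
        elif sum(1 for n in states.values() if n >= 2) >= 2:
            counts["informative"] += 1
        else:
            counts["uninformative"] += 1
    return counts

print(classify_sites(ALIGNMENT))
```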
(2N - 5)!! = # unrooted trees for N taxa
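The double factorial is easy to evaluate directly, which makes the growth concrete. The sketch below also computes the rooted count, (2N - 3)!!, which follows because each unrooted tree has 2N - 3 branches on which a root can be placed. Function names are illustrative.

```python
def double_factorial(k):
    """Return k * (k - 2) * (k - 4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n taxa: (2n - 5)!!."""
    return double_factorial(2 * n - 5)

def rooted_trees(n):
    """Number of rooted bifurcating trees for n taxa: (2n - 3)!!."""
    return double_factorial(2 * n - 3)

for n in (3, 4, 5, 6, 7, 8, 9, 10, 30):
    print(n, unrooted_trees(n), rooted_trees(n))
```

Because the counts grow this fast, an exhaustive comparison of all trees is only practical for small numbers of taxa.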
Heuristic search algorithms are input order dependent and can get stuck in local minima or maxima.
Rerunning heuristic searches using different input orders of taxa can help find global minima or maxima.
[Figure: two search landscapes, one for a search for the global minimum and one for a search for the global maximum; each marks a local optimum and the global optimum.]
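The effect of the starting point can be shown on a toy landscape. In the sketch below the list `LANDSCAPE` is a made-up cost curve, not tree scores, and a different start index stands in for a different input order of taxa. A greedy descent stops at whichever minimum it reaches first, and rerunning it from several starts recovers the global minimum.

```python
# Hypothetical cost landscape: local minimum at index 2, global minimum at index 7.
LANDSCAPE = [9, 6, 4, 7, 8, 5, 2, 1, 3, 6]

def descend(start):
    """Walk downhill from `start` until no neighbouring index is lower."""
    i = start
    while True:
        nearby = [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]
        best = min(nearby, key=lambda j: LANDSCAPE[j])
        if LANDSCAPE[best] >= LANDSCAPE[i]:
            return i  # no improving neighbour: a (possibly local) minimum
        i = best

def best_of_restarts(starts):
    """Run the descent from every start and keep the lowest end point."""
    return min((descend(s) for s in starts), key=lambda j: LANDSCAPE[j])

print(descend(0), descend(9), best_of_restarts([0, 9]))
```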
Classification of phylogenetic inference methods
                  COMPUTATIONAL METHOD
DATA TYPE         Optimality criterion              Clustering algorithm
Characters        PARSIMONY, MAXIMUM LIKELIHOOD
Distances
Disadvantages:
• Similarity and relationship are not necessarily the same thing, so clustering by
similarity does not necessarily give an evolutionary tree.
• Cannot be used for character analysis!
• Have no explicit optimization criteria, so one cannot even know if the program
worked properly to find the correct tree for the method.
NJ algorithm
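Maximum Parsimony
Input sequences: ACA, ACT, GTA, GTT
[Figure: candidate trees on the four sequences, with sequences assigned to the internal nodes and the number of changes written on each branch. Two of the labelled trees have MP score = 5 and MP score = 7. The tree that groups ACA with ACT and GTA with GTT, with internal nodes ACA and GTA and changes 1, 2, 1, has MP score = 4.]
Optimal MP tree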
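For a tree whose internal nodes already carry sequences, the MP score is the sum of the differences along every branch. The sketch below scores two labelled trees under that definition. The internal labels (ACA and GTA for the optimal tree, GTT and GTA for the alternative) are read from the figure and should be treated as assumptions.

```python
def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t))

def tree_length(branches):
    """MP score of a fully labelled tree: total changes over all branches."""
    return sum(hamming(u, v) for u, v in branches)

# Each pair is one branch: (sequence at one end, sequence at the other end).
optimal = [("ACA", "ACA"), ("ACT", "ACA"),   # leaves ACA, ACT on internal node ACA
           ("ACA", "GTA"),                   # internal branch
           ("GTA", "GTA"), ("GTT", "GTA")]   # leaves GTA, GTT on internal node GTA

alternative = [("ACT", "GTT"), ("GTT", "GTT"),  # leaves ACT, GTT on internal node GTT
               ("GTT", "GTA"),                  # internal branch
               ("ACA", "GTA"), ("GTA", "GTA")]  # leaves ACA, GTA on internal node GTA

print(tree_length(optimal), tree_length(alternative))
```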
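Maximum Parsimony:
computational complexity
For a fixed tree, an optimal labeling of the internal nodes can be
computed in linear time, O(nk), for n taxa and k sites.
[Figure: the optimal MP tree for ACA, ACT, GTA and GTT, with internal nodes ACA and GTA, changes 1, 2, 1 on the branches, and MP score = 4.]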
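One standard way to get the optimal score for a fixed tree is Fitch's algorithm, which visits each node once per site and therefore does O(nk) work. The sketch below is a minimal version for rooted binary trees written as nested tuples; the taxon names `a` to `d` and the function names are illustrative, and the position of the root does not change the score.

```python
SEQS = {"a": "ACA", "b": "ACT", "c": "GTA", "d": "GTT"}

def fitch(tree, seqs, site):
    """Return (possible states, minimum changes) for one site of `tree`."""
    if isinstance(tree, str):  # leaf: its observed state, no changes yet
        return {seqs[tree][site]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, seqs, site)
    right_states, right_cost = fitch(right, seqs, site)
    shared = left_states & right_states
    if shared:  # children agree on at least one state: no extra change
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1

def mp_score(tree, seqs):
    """Minimum number of changes over all sites for a fixed tree."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, seqs, i)[1] for i in range(n_sites))

trees = [(("a", "b"), ("c", "d")),
         (("a", "c"), ("b", "d")),
         (("a", "d"), ("b", "c"))]
for tree in trees:
    print(tree, mp_score(tree, SEQS))
```

The score returned is the minimum over all internal labelings, so a tree drawn with a particular, non-optimal labeling can show a higher score than the one computed here.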
[Figure: cost plotted over the space of phylogenetic trees, with a local optimum and the global optimum marked.]
Local search for MP
• Determine a candidate solution s
• While s is not a local minimum
– Find a neighbor s’ of s such that MP(s’)<MP(s)
– If found set s=s’
– Else return s and exit
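The loop above translates almost line for line into code. The sketch below is one possible rendering: `local_search` takes the cost function and the neighbourhood as arguments, and it moves to the best improving neighbour, which is a choice the pseudocode leaves open. The demo uses the three possible four-taxon trees for ACA, ACT, GTA and GTT, scored with a compact Fitch step; all names are illustrative.

```python
def local_search(start, cost, neighbors):
    """Move to improving neighbours until none exists; return the local minimum."""
    s = start
    while True:
        better = [t for t in neighbors(s) if cost(t) < cost(s)]
        if not better:
            return s
        s = min(better, key=cost)

SEQS = {"a": "ACA", "b": "ACT", "c": "GTA", "d": "GTT"}
TOPOLOGIES = [(("a", "b"), ("c", "d")),
              (("a", "c"), ("b", "d")),
              (("a", "d"), ("b", "c"))]

def join(x, y):
    """Fitch step: state set and added cost for the parent of x and y."""
    return (x & y, 0) if x & y else (x | y, 1)

def mp(topology):
    """Parsimony score of a four-taxon tree written as two pairs of taxa."""
    (a, b), (c, d) = topology
    total = 0
    for i in range(len(SEQS[a])):
        left, left_cost = join({SEQS[a][i]}, {SEQS[b][i]})
        right, right_cost = join({SEQS[c][i]}, {SEQS[d][i]})
        total += left_cost + right_cost + join(left, right)[1]
    return total

def neighbors(topology):
    """With four taxa, every other topology is one rearrangement away."""
    return [t for t in TOPOLOGIES if t != topology]

best = local_search(TOPOLOGIES[2], mp, neighbors)
print(best, mp(best))
```

With only three candidate trees the search cannot get trapped; local minima become a concern once there are enough taxa that most trees are not neighbours of each other.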
[Figure: plot for TNT, with y-axis ticks from 0 to 0.07 and x-axis ticks from 1 to 336.]
Iterated local search:
escape local optima by
perturbation
[Figure: local search runs until it reaches a local optimum; a perturbation then moves the solution away from it; local search continues from the output of the perturbation.]
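The cycle of local search, perturbation and local search again can be sketched on a toy problem. The code below uses a made-up cost list instead of tree scores, a random jump of fixed size as the perturbation, and a rule that keeps the new local optimum only if it is better. All three are illustrative assumptions and do not describe how any particular phylogenetics program works.

```python
import random

# Hypothetical cost landscape: local minimum at index 2, global minimum at index 7.
LANDSCAPE = [9, 6, 4, 7, 8, 5, 2, 1, 3, 6]

def cost(i):
    return LANDSCAPE[i]

def local_search(i):
    """Walk downhill from index i until no neighbouring index is lower."""
    while True:
        nearby = [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]
        best = min(nearby, key=cost)
        if cost(best) >= cost(i):
            return i
        i = best

def perturb(i, rng, jump=3):
    """Jump a fixed distance left or right, staying inside the landscape."""
    step = rng.choice([-jump, jump])
    return min(max(i + step, 0), len(LANDSCAPE) - 1)

def iterated_local_search(start, rounds=30, seed=0):
    """Alternate perturbation and local search, keeping the best optimum found."""
    rng = random.Random(seed)
    best = local_search(start)
    for _ in range(rounds):
        candidate = local_search(perturb(best, rng))
        if cost(candidate) < cost(best):
            best = candidate
    return best

found = iterated_local_search(0)
print(found, cost(found))
```

Starting from index 0, plain local search stops at the local minimum at index 2; a jump to the right lands in the basin of the global minimum, so repeated perturbation gives the search a chance to escape.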