Phylogenetic Tree Construction - Methods
1. Clustering-Based Methods
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
The UPGMA method uses unweighted distances and assumes that all taxa
have constant evolutionary rates. Because this molecular clock assumption is
often violated in real biological sequences, the neighbour joining (NJ) method
can be used to build more accurate phylogenetic trees. NJ is somewhat similar
to UPGMA in that it builds a tree using stepwise reduced distance matrices;
however, it does not assume the taxa to be equidistant from the root.
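As a rough illustration of the stepwise distance-matrix reduction that both UPGMA and NJ rely on, the following sketch (in Python, with a hypothetical four-taxon distance matrix) repeatedly merges the closest pair of clusters and averages their distances to every remaining cluster, which is the UPGMA update rule under the molecular clock assumption; branch heights are omitted for brevity.

    # Minimal UPGMA-style clustering sketch (illustrative only).
    # Distances and labels are hypothetical; a real implementation also
    # tracks cluster sizes and branch heights for drawing the tree.

    dist = {
        ("A", "B"): 2.0, ("A", "C"): 4.0, ("A", "D"): 6.0,
        ("B", "C"): 4.0, ("B", "D"): 6.0, ("C", "D"): 6.0,
    }

    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    clusters = ["A", "B", "C", "D"]
    sizes = {c: 1 for c in clusters}

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(((a, b) for a in clusters for b in clusters if a < b),
                   key=lambda p: d(*p))
        new = "(" + i + "," + j + ")"
        # Size-weighted arithmetic-mean distance from the merged cluster to the rest.
        for k in clusters:
            if k not in (i, j):
                dist[(new, k)] = (sizes[i] * d(i, k) + sizes[j] * d(j, k)) / (sizes[i] + sizes[j])
        sizes[new] = sizes[i] + sizes[j]
        clusters = [c for c in clusters if c not in (i, j)] + [new]

    print(clusters[0])   # e.g. "(((A,B),C),D)" -- the tree in nested (Newick-like) form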
One of the disadvantages of the NJ method is that it generates only one tree
and does not test other possible tree topologies. This can be problematic
because, in many cases, in the initial step of NJ, there may be more than one
equally close pair of neighbours to join, leading to multiple trees. Ignoring
these multiple options may yield a suboptimal tree. To overcome these
limitations, a generalized NJ method has been developed, in which multiple
NJ trees with different initial taxon groupings are generated. The best tree,
the one that most closely fits the actual evolutionary distances, is then
selected from this pool of regular NJ trees. This more extensive tree search
gives the approach a better chance of finding the correct tree.
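The pair-selection step that distinguishes NJ from UPGMA can be sketched as follows (Python, with a hypothetical distance matrix); NJ joins the pair minimising the rate-corrected criterion Q rather than the pair with the smallest raw distance:

    # Sketch of the neighbour-joining (NJ) pair-selection step (illustrative).
    # NJ joins the pair (i, j) minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j),
    # where r(i) is the sum of distances from i to all other taxa.
    # The distance matrix here is hypothetical.

    import itertools

    taxa = ["A", "B", "C", "D"]
    d = {
        ("A", "B"): 5.0, ("A", "C"): 9.0, ("A", "D"): 9.0,
        ("B", "C"): 10.0, ("B", "D"): 10.0, ("C", "D"): 8.0,
    }

    def dist(x, y):
        return 0.0 if x == y else d.get((x, y), d.get((y, x)))

    n = len(taxa)
    r = {i: sum(dist(i, k) for k in taxa) for i in taxa}

    # Compute Q for every pair; unlike UPGMA, the smallest raw distance
    # is not necessarily the pair that gets joined.
    q = {(i, j): (n - 2) * dist(i, j) - r[i] - r[j]
         for i, j in itertools.combinations(taxa, 2)}

    best_pair = min(q, key=q.get)
    print(best_pair, q[best_pair])

With this particular matrix, (A, B) and (C, D) happen to tie for the minimum Q; in larger matrices such ties are exactly the situation described above, where different join orders can lead to different final NJ trees.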
2. Optimality-Based Methods
Fitch–Margoliash
The Fitch–Margoliash (FM) method selects the best tree among all possible
trees based on the minimal deviation between the pairwise distances implied
by the branches of the tree and the distances in the original dataset. It starts
by randomly clustering two taxa at a node, creating three equations to
describe the distances, and then solving the three algebraic equations for the
unknown branch lengths. Clustering the two taxa produces a newly reduced
distance matrix, and this process is iterated until the tree is completely
resolved. The method searches all tree topologies and selects the one with the
lowest squared deviation between the actual distances and the distances
calculated from the tree branch lengths.
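For reference, the FM optimality criterion is usually written as a weighted least-squares deviation; in its standard form,

    E = \sum_{i<j} \frac{(D_{ij} - d_{ij})^2}{D_{ij}^2}

where D_ij is the observed distance between taxa i and j, d_ij is the distance implied by summing the branch lengths on the path between them in the tree, and the 1/D_ij^2 weighting is the original Fitch–Margoliash choice. The tree with the smallest E is selected.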
Minimum Evolution
Minimum evolution (ME) constructs a tree by a similar procedure, but uses a
different optimality criterion: among all possible trees, it selects the one with
the minimum overall branch length. Searching for the minimum total branch
length is an indirect approach to achieving the best fit of the branch lengths
with the original evolutionary distances.
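In symbols, the ME criterion simply minimises the total tree length

    S = \sum_{b \in \mathrm{branches}} \hat{l}_b

where the branch lengths \hat{l}_b are typically estimated by least-squares fitting of the tree to the observed distances; the topology with the smallest S is chosen.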
3. Character-Based Methods
Maximum Parsimony
The parsimony method chooses the tree that requires the fewest evolutionary
changes, or the shortest overall branch lengths. It is based on a medieval
principle known as Occam’s razor, formulated by William of Occam in the
fourteenth century, which states that the simplest explanation is probably the
correct one. This is because the simplest explanation requires the fewest
assumptions and the fewest leaps of logic. In dealing with problems that may
have an infinite number of possible solutions, choosing the simplest model
helps to “shave off” variables that are not really necessary to explain the
phenomenon. Doing so makes model development easier and reduces the
chance of introducing inconsistencies, ambiguities, and redundancies; hence
the name Occam’s razor.
Parsimony tree building works by searching all possible tree topologies
and reconstructing ancestral sequences that require the minimum number of
changes to evolve into the current sequences. To save computing time, only a
small number of sites that have the richest phylogenetic information are used
in tree determination. These sites are the so-called informative sites, which
are defined as sites that have at least two different kinds of characters, each
occurring at least twice. Informative sites are the ones that can often be
explained by a unique tree topology. Other sites are non-informative, which
are constant sites or sites that have changes occurring only once. Constant
sites have the same state in all taxa and are obviously useless in evaluating
the various topologies. The sites that have changes occurring only once are
not very useful either for constructing parsimony trees because they can be
explained by multiple tree topologies. The non-informative sites are thus
discarded in parsimony tree construction.
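As a small illustration of this definition, the sketch below (Python, on a made-up four-sequence alignment) keeps a column only if at least two distinct characters each occur at least twice:

    # Identify parsimony-informative sites in a toy alignment (illustrative).
    # A site is informative if at least two distinct characters each occur
    # at least twice; constant sites and singleton changes are discarded.

    from collections import Counter

    alignment = [        # hypothetical aligned sequences, one per taxon
        "AGTAC",
        "AGTGC",
        "ACTAC",
        "ACCGC",
    ]

    def is_informative(column):
        counts = Counter(column)
        return sum(1 for c in counts.values() if c >= 2) >= 2

    columns = list(zip(*alignment))
    informative = [i for i, col in enumerate(columns) if is_informative(col)]
    print(informative)   # indices of informative sites, here [1, 3]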
Once the informative sites are identified and the non-informative sites
discarded, the minimum number of substitutions at each informative site is
computed for a given tree topology. The changes at all informative sites are
then summed for each possible topology, and the tree that requires the
smallest total number of changes is chosen as the best tree.
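The per-site minimum is commonly obtained with Fitch's algorithm; the sketch below (Python, with a hypothetical four-taxon topology and character states) is a minimal post-order version of that count for a single site, not the full search over topologies:

    # Fitch's algorithm for one site on a fixed, rooted binary topology
    # (illustrative sketch). Counts the minimum number of substitutions
    # needed to explain the observed states at the tips.

    # Tree ((A,B),(C,D)) as nested tuples; leaf states are hypothetical.
    tree = (("A", "B"), ("C", "D"))
    states = {"A": "G", "B": "G", "C": "T", "D": "T"}

    def fitch(node):
        """Return (state set at this node, substitutions in the subtree)."""
        if isinstance(node, str):                 # leaf
            return {states[node]}, 0
        left_set, left_cost = fitch(node[0])
        right_set, right_cost = fitch(node[1])
        common = left_set & right_set
        if common:                                # intersection -> no new change needed
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    root_set, min_changes = fitch(tree)
    print(min_changes)   # 1 change for this site on this topology

Repeating this count over all informative sites, and over all candidate topologies, gives the parsimony scores that are compared in the final step.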
Weighted Parsimony
The parsimony method discussed so far is unweighted because it treats all
mutations as equivalent. This may be an oversimplification: some kinds of
changes are known to occur less frequently than others, for example,
transversions versus transitions, or changes at functionally important sites
versus neutral sites. Therefore, a weighting scheme that takes into account
the different kinds of mutations helps to select tree topologies more
accurately. The maximum parsimony (MP) method that incorporates such a
weighting scheme is called weighted parsimony.
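Weighted parsimony is commonly computed with Sankoff's dynamic-programming algorithm; the sketch below (Python, with a hypothetical cost matrix that charges transversions twice as much as transitions) scores a single site on a fixed topology:

    # Sankoff-style weighted parsimony for one site (illustrative sketch).
    # Transitions (A<->G, C<->T) are given a lower cost than transversions.

    BASES = "ACGT"
    TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

    def cost(x, y):
        if x == y:
            return 0
        return 1 if (x, y) in TRANSITIONS else 2   # hypothetical weights

    tree = (("A", "B"), ("C", "D"))                # topology ((A,B),(C,D))
    states = {"A": "A", "B": "G", "C": "C", "D": "C"}

    def sankoff(node):
        """Return dict: minimum cost of the subtree given each possible node state."""
        if isinstance(node, str):                  # leaf: 0 for observed state, inf otherwise
            return {b: (0 if b == states[node] else float("inf")) for b in BASES}
        left, right = sankoff(node[0]), sankoff(node[1])
        return {b: min(cost(b, x) + left[x] for x in BASES) +
                   min(cost(b, y) + right[y] for y in BASES)
                for b in BASES}

    root_costs = sankoff(tree)
    print(min(root_costs.values()))   # minimum weighted cost at this site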
Maximum Likelihood
In the maximum likelihood (ML) method, tree topologies are evaluated using
an explicit model of character substitution. After logarithmic conversion, the
likelihood score for a topology is the sum of the log likelihoods of every single
branch of the tree. After computing all possible tree paths with different
combinations of ancestral sequences, the tree path with the highest likelihood
score is taken as the topology at that site. Because all characters are assumed
to have evolved independently, the log likelihood scores are calculated for
each site independently. The overall log likelihood score for a given tree path
over the entire sequence is the sum of the log likelihoods of all individual
sites. The same procedure is repeated for all other possible tree topologies,
and the tree with the highest likelihood score among all of them is chosen as
the best tree, the ML tree. This process is exhaustive in nature and therefore
very time consuming.
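Under the site-independence assumption just described, and writing L_i(T) for the likelihood of site i on topology T under the chosen substitution model, the overall score being maximised is

    \ln L(T) = \sum_{i=1}^{N} \ln L_i(T)

where N is the number of sites; this sum is recomputed for every candidate topology, which is why the exhaustive search is so expensive.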
Quartet Puzzling
The most commonly used heuristic ML method is called quartet puzzling,
which uses a divide-and-conquer approach. In this approach, the full set of
taxa is divided into many subsets of four taxa, known as quartets.
An optimal ML tree is constructed from each of these quartets. This is a
relatively easy process as there are only three possible unrooted topologies for
a four-taxon tree. All the quartet trees are subsequently combined into a
larger tree involving all taxa. This process is like joining pieces in a jigsaw
puzzle, hence the name. The problem in drawing a consensus is that the
branching patterns in quartets with shared taxa may not agree. In this case,
a majority rule is used to determine the positions of branches to be inserted
to create the consensus tree.
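To give a sense of the scale of the divide-and-conquer step, the sketch below (Python, with a hypothetical six-taxon set) enumerates all quartets and the three possible unrooted topologies of one of them:

    # Enumerate quartets and their three unrooted topologies (illustrative).

    from itertools import combinations

    taxa = ["A", "B", "C", "D", "E", "F"]          # hypothetical taxon set

    quartets = list(combinations(taxa, 4))
    print(len(quartets))                           # C(6, 4) = 15 quartets

    def unrooted_topologies(q):
        a, b, c, d = q
        # The three ways to split four taxa into two pairs across the internal branch.
        return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    # For each quartet, an ML tree would be chosen among these three topologies
    # and the resulting quartet trees combined (puzzled) into a full tree.
    print(unrooted_topologies(quartets[0]))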
NJML
Genetic Algorithm
In this way, a near-optimal tree can often be obtained within a limited
number of iterations.
Bayesian Analysis
Without going into much mathematical detail, it is important to know that the
Bayesian method can be used to infer phylogenetic trees with maximum
posterior probability. In Bayesian tree selection, the prior probability is the
probability assigned to each possible topology before the analysis; before tree
building, all topologies are given equal prior probability. The conditional
probability (the likelihood) is derived from the substitution frequencies of the
characters observed in the sequence alignment. These two pieces of
information are combined by the Bayesian algorithm to search for the most
probable trees, the ones that best satisfy the observations.
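In symbols, the posterior probability of a tree T given the alignment data D combines the two quantities just described through Bayes' theorem:

    P(T \mid D) = \frac{P(D \mid T)\, P(T)}{P(D)}

where P(T) is the (equal) prior on topologies, P(D | T) is the likelihood of the data given the tree, and P(D) is a normalising constant summed over all trees; in practice this posterior is usually explored by sampling (for example, Markov chain Monte Carlo) rather than evaluated exhaustively.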
In the end, the Bayesian method can achieve the same or even better
performance than the complete ML method, but is much faster than regular
ML and is able to handle very large datasets. The reason that Bayesian
analysis may achieve better performance than ML is that the ML method
searches for one single best tree, whereas the Bayesian method searches for a
set of best trees. The advantage of the Bayesian method can be explained in
terms of probability. Because the true tree is not known, an optimal ML tree
may have, say, a 90% probability of representing the reality, whereas the
Bayesian method produces hundreds or thousands of optimal or near-optimal
trees, each with an 88% to 90% probability of representing the reality. The
latter approach therefore has a better overall chance of recovering the true
tree.