Phylogenetic Tree Construction - Methods

FUNDAMENTALS OF BIOINFORMATICS

Module 23: Phylogenetic Tree Construction – Methods

Welcome all to a new session on Fundamentals of Bioinformatics. In this
session, we will discuss the theory behind the different methods of
phylogenetic tree construction.

An evolutionary tree is a two-dimensional graph showing evolutionary
relationships among organisms or, in the case of sequences, among certain
genes in separate organisms. There are currently two main categories of
tree-building methods, each having advantages and limitations.

The first category is based on discrete characters, which are molecular
sequences from individual taxa. The basic assumption is that characters at
corresponding positions in a multiple sequence alignment are homologous
among the sequences involved. Therefore, the character states of the common
ancestor can be traced from this dataset. Another assumption is that each
character evolves independently and is therefore treated as an individual
evolutionary unit.

The second category of phylogenetic methods is based on distance, which is
the amount of dissimilarity between pairs of sequences, computed on the
basis of sequence alignment. The distance-based methods assume that all
sequences involved are homologous and that tree branches are additive,
meaning that the distance between two taxa equals the sum of all branch
lengths connecting them.
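
To make the idea of a pairwise distance concrete, here is a minimal sketch in Python. It assumes the simplest possible measure, the p-distance (the proportion of aligned positions at which two sequences differ); the lecture does not name a specific measure, so this choice, the function names, and the toy alignment are all illustrative assumptions.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)

def distance_matrix(alignment):
    """Build a symmetric matrix (dict of dicts) of pairwise p-distances."""
    names = list(alignment)
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        dist[x][y] = dist[y][x] = p_distance(alignment[x], alignment[y])
    return dist

# Hypothetical toy alignment of four taxa (not taken from the lecture).
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGTTC",
    "D": "TCGAACGTTG",
}
matrix = distance_matrix(alignment)
for name in alignment:
    print(name, [round(matrix[name][other], 2) for other in alignment])
```

In practice a corrected distance derived from a substitution model would usually replace the raw p-distance, but the matrix has the same shape either way.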

First, we will look at distance-based methods.

I. Distance-Based Methods

The algorithms for the distance-based tree-building method can be subdivided
into either clustering based or optimality based.

The clustering-type algorithms compute a tree based on a distance matrix
starting from the most similar sequence pairs. These algorithms include the
unweighted pair group method using arithmetic average (UPGMA) and
neighbour joining (NJ).

The optimality-based algorithms compare many alternative tree topologies
and select one that has the best fit between estimated distances in the tree
and the actual evolutionary distances. This category includes the
Fitch–Margoliash and minimum evolution algorithms.

1. Clustering-Based Methods

In this category, we will discuss two important methods: UPGMA and neighbour
joining (NJ).

Unweighted Pair Group Method Using Arithmetic Average (UPGMA)

The simplest clustering method is UPGMA, which builds a tree by sequential
clustering. Given a distance matrix, it starts by grouping the two taxa
with the smallest pairwise distance in the distance matrix. A node is placed
at the midpoint or half distance between them. It then treats the new
cluster as a single taxon and calculates the distances between this
composite taxon and all remaining taxa to create a reduced matrix. The same
grouping process is repeated and another newly reduced matrix is created.
The iteration continues until all taxa are placed on the tree. The last
taxon added is considered the out-group, producing a rooted tree. The basic
assumption of the UPGMA method is that all taxa evolve at a constant rate
and that they are equally distant from the root, implying that a molecular
clock is in effect. However, real data rarely meet this assumption. Thus,
UPGMA often produces erroneous tree topologies. Nevertheless, owing to its
speed of calculation, it has found extensive use in clustering analysis
of DNA microarray data.
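
The clustering steps just described can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function name, the nested-tuple tree representation, and the toy distances are assumptions made for the example. Distances to a merged cluster are averaged with weights equal to the cluster sizes, which is the usual arithmetic-average rule.

```python
from itertools import combinations

def upgma(dist):
    """Cluster taxa by UPGMA.

    dist is a symmetric dict of dicts of pairwise distances between leaf names.
    Returns (tree, height): tree is a nested tuple of leaf names, and height
    maps every cluster to the height of its node above the tips.
    """
    d = {a: dict(row) for a, row in dist.items()}   # working copy of the matrix
    size = {a: 1 for a in d}                        # number of leaves per cluster
    height = {a: 0.0 for a in d}
    active = list(d)
    while len(active) > 1:
        # Group the two clusters with the smallest pairwise distance.
        a, b = min(combinations(active, 2), key=lambda p: d[p[0]][p[1]])
        new = (a, b)
        height[new] = d[a][b] / 2.0                 # node sits at the midpoint
        size[new] = size[a] + size[b]
        # Reduced matrix: size-weighted average distance to every other cluster.
        d[new] = {}
        for c in active:
            if c not in (a, b):
                avg = (size[a] * d[a][c] + size[b] * d[b][c]) / size[new]
                d[new][c] = d[c][new] = avg
        active = [c for c in active if c not in (a, b)] + [new]
    return active[0], height

# Hypothetical clock-like distances for four taxa.
pairs = {("A", "B"): 2.0, ("A", "C"): 4.0, ("B", "C"): 4.0,
         ("A", "D"): 6.0, ("B", "D"): 6.0, ("C", "D"): 6.0}
dist = {name: {} for name in "ABCD"}
for (x, y), value in pairs.items():
    dist[x][y] = dist[y][x] = value

tree, height = upgma(dist)
print(tree)          # ('D', ('C', ('A', 'B')))
print(height[tree])  # 3.0
```

Taxon D is the last one joined, so it ends up as the out-group of the rooted tree, as described above.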

Neighbour Joining (NJ)

The UPGMA method uses unweighted distances and assumes that all taxa
have constant evolutionary rates. Since this molecular clock assumption is
often not met in biological sequences, the neighbour joining (NJ) method can
be used to build more accurate phylogenetic trees. It is somewhat similar
to UPGMA in that it builds a tree by using stepwise reduced distance
matrices. However, the NJ method does not assume the taxa to be equidistant
from the root.

The tree construction process is somewhat opposite to that used in UPGMA.
Rather than building trees from the closest pair of branches and progressing
to the entire tree, the NJ method begins with a completely unresolved
star tree, in which all taxa are joined onto a single node, and progressively
decomposes the tree by selecting pairs of taxa based on modified (corrected)
pairwise distances. This allows the taxa with the shortest corrected
distances to be joined first as a node. After the first node is constructed,
the newly created cluster reduces the matrix by one taxon and allows the next
most closely related taxon to be joined next to the first node. The cycle is
repeated until all internal nodes are resolved. This process is called star
decomposition. Unlike UPGMA, NJ and most other phylogenetic methods produce
unrooted trees. The out-group has to be determined based on external
knowledge.
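
A compact sketch of star decomposition is given below. The lecture does not spell out the distance correction, so the code assumes the commonly used criterion Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from taxon i to all others. The function name, the internal labels such as "node0", and the five-taxon distances are illustrative assumptions.

```python
from itertools import combinations

def neighbor_joining(dist):
    """Return an unrooted tree as a list of (node, neighbour, branch_length) edges.

    dist is a symmetric dict of dicts of pairwise distances between taxa.
    Internal nodes get hypothetical labels node0, node1, ...
    """
    d = {a: dict(row) for a, row in dist.items()}   # working copy of the matrix
    active = list(d)
    edges = []
    counter = 0
    while len(active) > 2:
        n = len(active)
        # Net divergence: sum of distances from each taxon to all others.
        r = {a: sum(d[a][c] for c in active if c != a) for a in active}
        # The pair with the smallest corrected value Q is joined first.
        a, b = min(
            combinations(active, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        node = f"node{counter}"
        counter += 1
        # Branch lengths from the new node to the two joined taxa.
        length_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        length_b = d[a][b] - length_a
        edges.append((node, a, length_a))
        edges.append((node, b, length_b))
        # Reduced matrix: distances from the new node to the remaining taxa.
        d[node] = {}
        for c in active:
            if c not in (a, b):
                d[node][c] = d[c][node] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        active = [c for c in active if c not in (a, b)] + [node]
    a, b = active
    edges.append((a, b, d[a][b]))                   # last remaining branch
    return edges

# Hypothetical additive distances for five taxa.
pairs = {("a", "b"): 5.0, ("a", "c"): 9.0, ("a", "d"): 9.0, ("a", "e"): 8.0,
         ("b", "c"): 10.0, ("b", "d"): 10.0, ("b", "e"): 9.0,
         ("c", "d"): 8.0, ("c", "e"): 7.0, ("d", "e"): 3.0}
dist = {name: {} for name in "abcde"}
for (x, y), value in pairs.items():
    dist[x][y] = dist[y][x] = value

edges = neighbor_joining(dist)
for node, neighbour, length in edges:
    print(node, neighbour, length)
```

In this example the second round has two pairs with the same smallest Q value; the sketch simply takes the first one it encounters and returns a single tree.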

One of the disadvantages of the NJ method is that it generates only one tree
and does not test other possible tree topologies. This can be problematic
because, in many cases, in the initial step of NJ, there may be more than one
equally close pair of neighbours to join, leading to multiple trees. Ignoring
these multiple options may yield a suboptimal tree. To overcome the
limitations, a generalized NJ method has been developed, in which multiple
NJ trees with different initial taxon groupings are generated. A best tree is
then selected from a pool of regular NJ trees that best fit the actual
evolutionary distances. This more extensive tree search means that this
approach has a better chance of finding the correct tree.

2. Optimality-Based Methods

The clustering-based methods produce a single tree as output. However, there
is no criterion for judging how this tree compares with alternative trees.
In contrast, optimality-based methods have a well-defined algorithm to
compare all possible tree topologies and select a tree that best fits the actual
evolutionary distance matrix. Based on the differences in optimality criteria,
there are two types of algorithms, Fitch–Margoliash and minimum evolution,
which we will discuss next.

Fitch–Margoliash

The Fitch–Margoliash (FM) method selects a best tree among all possible trees
based on minimal deviation between the distances calculated in the overall
branches in the tree and the distances in the original dataset. It starts by
randomly clustering two taxa in a node and creating three equations to
describe the distances, and then solving the three algebraic equations for
unknown branch lengths. The clustering of the two taxa helps to create a
newly reduced matrix. This process is iterated until a tree is completely
resolved. The method searches through all tree topologies and selects the one
that has the lowest squared deviation between the actual distances and the
distances calculated from the tree branch lengths.
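
The optimality criterion itself is easy to express in code. The sketch below scores a tree whose branch lengths are already given, by summing squared deviations between the observed distances and the path lengths through the tree. Dividing each deviation by the observed distance is the weighting usually associated with FM and is an assumption here, as are the function names and the toy data. A full FM search would also fit the branch lengths for each topology, which this sketch does not attempt.

```python
def path_lengths(edges, leaves):
    """Path-length distances between leaves of a tree given as (u, v, length) edges."""
    adjacent = {}
    for u, v, w in edges:
        adjacent.setdefault(u, []).append((v, w))
        adjacent.setdefault(v, []).append((u, w))
    result = {}
    for start in leaves:
        seen = {start: 0.0}
        stack = [start]
        while stack:                                # walk the tree from this leaf
            node = stack.pop()
            for nxt, w in adjacent[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + w
                    stack.append(nxt)
        for other in leaves:
            result[(start, other)] = seen[other]
    return result

def fm_score(observed, edges):
    """Weighted sum of squared deviations between observed and tree distances."""
    leaves = sorted({name for pair in observed for name in pair})
    tree_d = path_lengths(edges, leaves)
    return sum(((d - tree_d[pair]) / d) ** 2 for pair, d in observed.items())

# Hypothetical observed distances for four taxa.
observed = {("A", "B"): 3.0, ("A", "C"): 7.0, ("A", "D"): 8.0,
            ("B", "C"): 6.0, ("B", "D"): 7.0, ("C", "D"): 5.0}

# Two candidate trees with fixed branch lengths; x and y are internal nodes.
tree_ab_cd = [("A", "x", 2.0), ("B", "x", 1.0), ("x", "y", 3.0),
              ("C", "y", 2.0), ("D", "y", 3.0)]
tree_ac_bd = [("A", "x", 2.0), ("C", "x", 2.0), ("x", "y", 3.0),
              ("B", "y", 1.0), ("D", "y", 3.0)]

score_ab_cd = fm_score(observed, tree_ab_cd)
score_ac_bd = fm_score(observed, tree_ac_bd)
print(score_ab_cd, score_ac_bd)
```

The tree with the lower score fits the observed distances better; here the first tree reproduces every observed distance exactly.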

Minimum Evolution

Minimum evolution (ME) constructs a tree with a similar procedure, but uses
a different optimality criterion that finds a tree among all possible trees with
a minimum overall branch length. Searching for the minimum total branch
length is an indirect approach to achieving the best fit of the branch lengths
with the original dataset.

So far, our discussion has been on distance-based methods for phylogenetic
tree construction. Now we will move on to the second category.

II. Character-Based Methods

Character-based methods (also called discrete methods) are based directly on
the sequence characters rather than on pairwise distances. They count
mutational events accumulated on the sequences and may therefore avoid the
loss of information when characters are converted to distances. This
preservation of character information means that evolutionary dynamics of
each character can be studied. Ancestral sequences can also be inferred. The
two most popular character-based approaches are the maximum parsimony
(MP) and maximum likelihood (ML) methods.

Maximum Parsimony (MP)

The parsimony method chooses a tree that has the fewest evolutionary
changes or shortest overall branch lengths. It is based on a principle related
to a medieval philosophy called Occam’s razor. The theory was formulated
by William of Occam in the fourteenth century and states that the simplest
explanation is probably the correct one. This is because the simplest
explanation requires the fewest assumptions and the fewest leaps of logic. In
dealing with problems that may have an infinite number of possible solutions,
choosing the simplest model may help to “shave off” those variables that are
not really necessary to explain the phenomenon. By doing this, model
development may become easier, and there may be less chance of introducing
inconsistencies, ambiguities, and redundancies; hence the name Occam’s
razor.

For phylogenetic analysis, parsimony seems a good assumption. By this
principle, a tree with the smallest number of substitutions is probably the
best to explain the differences among the taxa under study. This view is
justified by the fact that evolutionary changes are relatively rare within a
reasonably short time frame. This implies that a tree with minimal changes is
likely to be a good estimate of the true tree. By minimizing the changes, the
method minimizes the phylogenetic noise owing to homoplasy and independent
evolution.

How Does MP Tree Building Work?

Parsimony tree building works by searching through all possible tree
topologies and reconstructing ancestral sequences that require the minimum
number of changes to evolve to the current sequences. To save computing time,
only a small number of sites that have the richest phylogenetic information
are used in tree determination. These sites are the so-called informative
sites, which are defined as sites that have at least two different kinds of
characters, each occurring at least twice. Informative sites are the ones that
can often be explained by a unique tree topology. Other sites are
non-informative; these are constant sites or sites that have changes occurring
only once. Constant sites have the same state in all taxa and are obviously
useless in evaluating the various topologies. The sites that have changes
occurring only once are not very useful either for constructing parsimony
trees because they can be explained by multiple tree topologies. The
non-informative sites are thus discarded in parsimony tree construction.
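
The definition of an informative site translates directly into a small check. The following sketch uses a hypothetical four-sequence alignment and hypothetical function names; it applies the rule stated above (at least two different characters, each occurring at least twice) and does not treat gaps specially.

```python
from collections import Counter

def is_informative(column):
    """True if at least two different characters each occur at least twice."""
    counts = Counter(column)
    return sum(1 for count in counts.values() if count >= 2) >= 2

def informative_sites(sequences):
    """Zero-based positions of the parsimony-informative alignment columns."""
    return [i for i, column in enumerate(zip(*sequences)) if is_informative(column)]

# Hypothetical alignment of four sequences.
sequences = [
    "AAGAGTGCA",
    "AGCCGTGCG",
    "AGATATCCA",
    "AGAGATCCG",
]
sites = informative_sites(sequences)
print(sites)  # [4, 6, 8]
```

Columns 0, 5, and 7 are constant, and columns 1, 2, and 3 have no second character that occurs twice, so only three of the nine columns are kept.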

Once the informative sites are identified and the non-informative sites
discarded, the minimum number of substitutions at each informative site is
computed for a given tree topology. The numbers of changes at all
informative sites are then summed for each possible tree topology. The tree
that has the smallest total number of changes is chosen as the best tree.
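
One standard way to obtain the minimum number of changes at a site for a given topology is Fitch's set-intersection procedure; the lecture does not name a counting algorithm, so its use here is an assumption, as are the function names and the toy alignment. For simplicity the sketch scores every column; the non-informative columns add the same amount to each topology, so the ranking is unchanged.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a leaf name or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:                       # children can agree: no extra change needed
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, alignment):
    """Total minimum number of changes over all sites for one topology."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        states = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total

# Hypothetical alignment of four taxa.
alignment = {
    "s1": "AAGAGTGCA",
    "s2": "AGCCGTGCG",
    "s3": "AGATATCCA",
    "s4": "AGAGATCCG",
}
# The three possible groupings of four taxa.
topologies = [
    (("s1", "s2"), ("s3", "s4")),
    (("s1", "s3"), ("s2", "s4")),
    (("s1", "s4"), ("s2", "s3")),
]
scores = [parsimony_score(tree, alignment) for tree in topologies]
print(scores)  # [10, 11, 12]
```

The first topology needs the fewest changes and would be chosen as the most parsimonious tree for this toy dataset.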

A related term in this category is weighted parsimony.

Weighted Parsimony

The parsimony method discussed is unweighted because it treats all
mutations as equivalent. This may be an oversimplification; some kinds of
mutations are known to occur less frequently than others, for example,
transversions versus transitions, or changes at functionally important sites
versus neutral sites. Therefore, a weighting scheme that takes into account
the different kinds of mutations helps to select tree topologies more
accurately. The MP method that incorporates a weighting scheme is called
weighted parsimony.

Maximum Likelihood Method (ML)

Another character-based approach is ML, which uses probabilistic models to
choose a best tree that has the highest probability or likelihood of reproducing
the observed data. It finds a tree that most likely reflects the actual
evolutionary process. ML is an exhaustive method that searches every
possible tree topology and considers every position in an alignment, not just
informative sites. By employing a particular substitution model that has
probability values of residue substitutions, ML calculates the total likelihood
of ancestral sequences evolving to internal nodes and eventually to existing
sequences. It sometimes also incorporates parameters that account for rate
variations across sites.

How Does the Maximum Likelihood Method Work?

ML works by calculating the probability of a given evolutionary path for a
particular extant sequence. The probability values are determined by a
substitution model (either for nucleotides or amino acids). For a particular
site, the probability of a tree path is the product of the probabilities from
the root to all the tips, including every intermediate branch in the tree
topology. Because multiplication often results in very small values, it is
computationally more convenient to express all probability values as natural
log likelihood (lnL) values, which also converts multiplication into
summation. Because ancestral characters at internal nodes are normally
unknown, all possible scenarios of ancestral states have to be computed.

After logarithmic conversion, the likelihood score for the topology is the sum
of log likelihood of every single branch of the tree. After computing for all
possible tree paths with different combinations of ancestral sequences, the
tree path having the highest likelihood score is the final topology at the site.
Because all characters are assumed to have evolved independently, the log
likelihood scores are calculated for each site independently. The overall log
likelihood score for a given tree path for the entire sequence is the sum of log
likelihood of all individual sites. The same procedure has to be repeated for
all other possible tree topologies. The tree having the highest likelihood score
among all others is chosen as the best tree, which is the ML tree. This process
is exhaustive in nature and therefore very time consuming.
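
As a rough illustration of how a log likelihood is accumulated over sites, the sketch below scores a fixed tree under the Jukes-Cantor (JC69) nucleotide model. Several things here are assumptions not stated in the lecture: the choice of JC69, the nested-tuple tree format with branch lengths, and the treatment of unknown ancestral states, which are summed over at each internal node (the usual practice in likelihood programs) rather than enumerated one scenario at a time.

```python
import math

BASES = "ACGT"

def jc69(i, j, t):
    """JC69 probability that base i is replaced by base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def conditional(node, site):
    """Likelihood of the data below `node` for each possible state at `node`.

    A leaf is a taxon name; an internal node is a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {b: 1.0 if site[node] == b else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        below = conditional(child, site)
        for b in BASES:
            # Sum over every possible state at the child end of the branch.
            result[b] *= sum(jc69(b, c, t) * below[c] for c in BASES)
    return result

def log_likelihood(tree, alignment):
    """Sum of per-site log likelihoods, assuming sites evolve independently."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(length):
        site = {name: seq[i] for name, seq in alignment.items()}
        root = conditional(tree, site)
        total += math.log(sum(0.25 * root[b] for b in BASES))  # equal base frequencies
    return total

def four_taxon(a, b, c, d, t=0.1):
    """Hypothetical helper: tree grouping (a, b) and (c, d), all branches of length t."""
    return ((((a, t), (b, t)), t), (((c, t), (d, t)), t))

# Hypothetical toy alignment.
alignment = {"s1": "AAGG", "s2": "AAGG", "s3": "CCTT", "s4": "CCTT"}
for tree in (four_taxon("s1", "s2", "s3", "s4"),
             four_taxon("s1", "s3", "s2", "s4"),
             four_taxon("s1", "s4", "s2", "s3")):
    print(round(log_likelihood(tree, alignment), 3))
```

A real ML program would also optimize the branch lengths for each topology before comparing scores; here they are held fixed to keep the example short.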

Quartet Puzzling

The most commonly used heuristic ML method is called quartet puzzling,
which uses a divide-and-conquer approach. In this approach, the full
set of taxa is divided into many subsets of four taxa known as quartets.
An optimal ML tree is constructed from each of these quartets. This is a
relatively easy process as there are only three possible unrooted topologies for
a four-taxon tree. All the quartet trees are subsequently combined into a
larger tree involving all taxa. This process is like joining pieces in a jigsaw
puzzle, hence the name. The problem in drawing a consensus is that the
branching patterns in quartets with shared taxa may not agree. In this case,
a majority rule is used to determine the positions of branches to be inserted
to create the consensus tree.
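
To give a feel for the size of the divide step, the short sketch below lists every quartet of a hypothetical six-taxon dataset together with the three unrooted topologies that would be compared for each. The names are illustrative; evaluating each topology by ML and merging the results (the puzzling step) is not shown.

```python
from itertools import combinations

def quartet_topologies(a, b, c, d):
    """The three possible unrooted topologies for four taxa, as pairs of pairs."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

taxa = ["A", "B", "C", "D", "E", "F"]        # hypothetical taxon names
quartets = list(combinations(taxa, 4))

print(len(quartets))                          # 15 quartets for six taxa
print(quartet_topologies(*quartets[0]))
```

The number of quartets grows quickly with the number of taxa, although each individual quartet remains cheap to evaluate.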

NJML

NJML is a hybrid algorithm combining aspects of NJ and ML. It constructs an
initial tree using the NJ method with bootstrapping (which will be described
later). The branches with low bootstrap support are collapsed to produce
multifurcating branches. The resulting polytomy is then resolved using the ML
method. Although the performance of this method is not yet as good as the
complete ML method, it is at least ten times faster.

Genetic Algorithm

A recent addition to fast ML search methods is the GA, a computational
optimization strategy that uses biological terminology as a metaphor because
the method involves “crossing” mathematical routines to generate new
“offspring” routines. The algorithm works by selecting an optimal result
through a mix-and-match process using a number of existing random
solutions. A “fitness” measure is used to monitor the optimization process. By
keeping record of the fitness scores, the process simulates the natural
selection and genetic crossing processes. For instance, a subroutine that has
the best score (best fit process) is selected in the first round and is used as a
starting point for the next round of the optimization cycle. Again using
biological metaphors, this is to generate more “offspring,” which are
mathematical trials with modifications from the previous ones. Different
computational routines (or “chromosomes”) are also allowed to combine (or
“crossover”) to produce a new solution. The iteration continues until an
optimal solution is found.

When applying GA to phylogenetic inference, the method strongly resembles
the pruning and re-grafting routines used in the branch-swapping process. In
GA-based tree searching, the fitness measure is the log likelihood score. The
tree search begins with a population of random trees with arbitrary branch
lengths. The tree with the highest log likelihood score is allowed to leave
more “offspring” with “mutations” on the tree topology. The mutational
process is essentially branch rearrangement. Mutated new trees are scored.
Those that are scored higher than the parent tree are allowed to mutate more
to produce even higher scored offspring, if possible. This process is
repeated until no higher scored trees can be found. The advantage of this
algorithm is its speed; a near optimal tree can often be obtained within a
limited number of iterations.

Bayesian Analysis

Another recent development of a speedy ML method is the use of the Bayesian
analysis method. The essence of Bayesian analysis is to make inference on
something unobserved based on existing observations. It makes use of an
important concept known as posterior probability, which is defined as the
probability that is revised from prior expectations, after learning something
new about the data. In mathematical terms, Bayesian analysis is to calculate
posterior probability of two joint events by using the prior probability and
conditional probability values using the following simplified formula:
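
For reference, Bayes' theorem applied to tree selection is commonly written as shown below. The symbols are assumptions introduced here: T stands for a candidate tree, D for the aligned sequence data, and the sum in the denominator runs over all candidate trees.

```latex
\[
  \underbrace{P(T \mid D)}_{\text{posterior probability}}
  \;=\;
  \frac{\overbrace{P(D \mid T)}^{\text{conditional probability}}
        \;\times\;
        \overbrace{P(T)}^{\text{prior probability}}}
       {\underbrace{\sum_{T'} P(D \mid T')\, P(T')}_{\text{total probability}}}
\]
```

In words: posterior probability = (prior probability × conditional probability) / total probability.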

Without going into much mathematical detail, it is important to know that the
Bayesian method can be used to infer phylogenetic trees with maximum
posterior probability. In Bayesian tree selection, the prior probability is the
probability for all possible topologies before analysis. The probability for each
of these topologies is equal before tree building. The conditional probability is
the substitution frequency of characters observed from the sequence
alignment. These two pieces of information are used as a condition by the
Bayesian algorithm to search for the most probable trees that best satisfy the
observations.

The tree search incorporates an iterative random sampling strategy based on
the Markov chain Monte Carlo (MCMC) procedure. MCMC is designed as a
“hill-climbing” procedure, seeking higher and higher likelihood scores while
searching for tree topologies, although occasionally it goes downhill because
of the random nature of the search. Over time, high-scoring trees are sampled
more often than low-scoring trees. When MCMC reaches high scored regions,
a set of near optimal trees is selected to construct a consensus tree.
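
The sampling behaviour described here can be imitated with a toy Metropolis sampler. Everything in the sketch is an assumption made for illustration: the three candidate topologies, their relative scores (standing in for unnormalized posterior probabilities), and the function name. Real Bayesian programs propose new trees by rearranging branches and score them with a likelihood function instead of a lookup table.

```python
import random
from collections import Counter

def metropolis(weights, steps, seed=0):
    """Sample states in proportion to their weights using the Metropolis rule."""
    rng = random.Random(seed)
    states = list(weights)
    current = states[0]
    visits = Counter()
    for _ in range(steps):
        # Propose one of the other states uniformly at random (a symmetric proposal).
        proposal = rng.choice([s for s in states if s != current])
        # Uphill moves are always accepted; downhill moves only sometimes.
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
        visits[current] += 1
    return visits

# Hypothetical relative scores for the three topologies of four taxa.
weights = {
    "((A,B),(C,D))": 0.7,
    "((A,C),(B,D))": 0.2,
    "((A,D),(B,C))": 0.1,
}
visits = metropolis(weights, steps=20000)
for tree, count in visits.most_common():
    print(tree, count / 20000)
```

The chain occasionally moves to a lower-scoring tree, yet over many steps each tree is visited roughly in proportion to its score, which is why high-scoring trees dominate the sample.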

In the end, the Bayesian method can achieve the same or even better
performance than the complete ML method, but is much faster than regular
ML and is able to handle very large datasets. The reason that the Bayesian
analysis may achieve better performance than ML is that the ML method
searches for one single best tree, whereas the Bayesian method searches for a
set of best trees. The advantage of the Bayesian method can be explained as a
matter of probability. Because the true tree is not known, an optimal ML tree
may have, say, 90% probability of representing the reality. However, the
Bayesian method produces hundreds or thousands of optimal or near-optimal
trees with 88% to 90% probability of representing the reality. Thus, the latter
approach has a better chance overall of guessing the true tree correctly.
