Multiple Sequence Alignment

Uploaded by

r233071x
HBB 133

BIOCOMPUTING AND
SEQUENCE ANALYSIS

MULTIPLE SEQUENCE
ALIGNMENT

MR S. ALIFA
• WHAT IS MULTIPLE SEQUENCE ALIGNMENT?
The process of aligning three or more biological sequences (generally
protein, DNA, or RNA) so that corresponding residues line up in columns
and biological meaning can be inferred.

• WHAT IS THE PURPOSE OF DOING MULTIPLE SEQUENCE ALIGNMENT?
To study closely related genes or proteins in order to find their
evolutionary relationships and to identify shared patterns among
functionally or structurally related genes or proteins.
MSA is based on the following
Assumptions
• Homology assumption: the sequences being aligned are assumed to be
homologous.
• Genetic variation: there are genetic differences among the sequences.
• Conserved regions assumption: certain regions or residues within the
sequences are assumed to have been conserved.
• Independence assumption: the positions within the aligned sequences are
assumed to be statistically independent of each other.

• No lateral gene transfer: MSA assumes that there has been no lateral
(horizontal) transfer of genetic material between the organisms being compared.
Steps to Build an MSA
1. Gather your sequences (clean them if needed)

2. Check that the assumptions hold

3. Choose an aligning tool

4. Build your alignment

5. Generate a consensus sequence from the alignment

6. Validate and Interpret your results
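
Step 5 can be illustrated with a small sketch. It assumes the alignment is already built and given as equal-length strings with "-" for gaps, and it uses a simple majority rule per column; the function name `consensus` and the tie-breaking rule are illustrative choices, not something the slides prescribe.

```python
from collections import Counter


def consensus(alignment):
    """Return a majority-rule consensus for equal-length aligned sequences."""
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    result = []
    for column in zip(*alignment):
        counts = Counter(column)
        # Pick the most frequent symbol in the column (a gap counts as a symbol).
        # Ties are broken by taking the first symbol in sorted order.
        best = max(sorted(counts), key=lambda symbol: counts[symbol])
        result.append(best)
    return "".join(result)


aligned = ["ATG-CA", "ATGACA", "ATG-CT"]
print(consensus(aligned))
```

Real tools usually apply a frequency threshold and ambiguity codes instead of a plain majority vote, so this is only a starting point.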


Heuristic Alignment and
Exhaustive Methods
• Heuristic methods are designed to quickly find approximate solutions to
alignment problems.
• They employ efficient algorithms that make certain assumptions or use
predefined rules to guide the alignment process.
• While these methods are generally faster than exhaustive methods, which
search all possible alignments for the best-scoring one, they do not guarantee
finding the optimal solution. Some common heuristic alignment methods include:
• Greedy Algorithms, Progressive Alignment and Hidden Markov Models (HMMs)
Progressive Alignment Method
• The most commonly used approach to MSA is progressive alignment, e.g. Clustal and
T-Coffee.
• Also known as the tree-based algorithm, it is a step-wise assembly of a multiple
alignment based on pairwise similarity.
• It is termed progressive because it aligns sequences in a step-wise manner.
• First, it performs pairwise alignments of all the sequences using the
Needleman–Wunsch global alignment method and records the similarity scores.
• The two most similar sequences are aligned first. Then a third sequence is chosen
and aligned to that first alignment, and this process is iterated until all
sequences have been aligned.
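
The pairwise step can be sketched as a minimal Needleman–Wunsch implementation. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name `needleman_wunsch` are assumptions made for illustration; real tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Running this for every pair of input sequences gives the table of similarity scores that the later steps work from.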
Progressive Alignment Method
• The alignment scoring and the optimization algorithm are closely
integrated within this iterative process.
• The alignment scoring typically involves assessing the similarity or
dissimilarity between sequences or sequence regions.
• The optimization algorithm is responsible for finding the best
alignment given the scoring information.
• It determines the order in which sequences are added and how they
are aligned to the current alignment.
• The most important heuristic in progressive alignment is to align the
most similar pairs of sequences first. To decide that order, the pairwise
scores are converted into evolutionary distances to create a distance matrix.
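
One simple way to turn pairwise comparisons into a distance matrix is the p-distance: the fraction of compared positions that differ. The sketch below assumes the sequences have already been brought to equal length (for example by a prior pairwise alignment) and ignores columns containing a gap; the names `p_distance` and `distance_matrix` are illustrative. Actual programs may apply further corrections to the raw distances.

```python
def p_distance(a, b, gap="-"):
    """Fraction of differing positions, skipping columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 1.0  # nothing comparable: treat as maximally distant
    mismatches = sum(1 for x, y in compared if x != y)
    return mismatches / len(compared)


def distance_matrix(seqs):
    """Symmetric matrix of p-distances between all pairs of sequences."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = p_distance(seqs[i], seqs[j])
    return dist


for row in distance_matrix(["ACGT", "ACGA", "TCGA"]):
    print(row)
```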
Progressive Alignment Method

• The distance matrix is used to build the guide tree: a binary tree whose leaves
represent sequences and whose interior nodes represent alignments.

• The guide tree directs the alignment of sequences based on their relative
positions on the tree, starting with the two most closely related sequences
and adding more distant sequences one at a time until all sequences are
aligned.

• The root node represents the complete multiple alignment.

• The nodes furthest from the root represent the most similar pairs.
• There are several different progressive alignment methods, but they all follow a
similar basic strategy:

• 1. Pairwise alignment: The first step is to perform pairwise alignments between all
pairs of sequences in the dataset. This produces a matrix of pairwise similarity
scores.

• 2. Guide tree construction: The pairwise similarity scores are used to construct a
guide tree that approximates the evolutionary relationships between the sequences.

• The guide tree is a hierarchical clustering of the sequences, where the most similar
sequences are grouped together at the bottom of the tree and the most dissimilar
sequences are joined last, near the top.
• 3. Progressive alignment: Starting with the two most similar sequences, the
algorithm aligns them using a standard pairwise alignment algorithm.

• Then, the next sequence is added to the alignment by aligning it to the existing
alignment using a profile-based alignment algorithm.

• This process is repeated until all sequences have been added to the alignment.

• 4. Refinement: Once the progressive alignment is complete, it may be refined
using various techniques to improve the accuracy of the alignment.
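
The "profile" used in step 3 can be pictured as per-column residue frequencies of the alignment built so far. The sketch below shows one way to build such a profile and to score a new residue against a column; the names `build_profile` and `column_score` and the +1/-1 scoring are assumptions for illustration, and gaps are simply counted as another symbol.

```python
from collections import Counter


def build_profile(alignment):
    """Per-column symbol frequencies for equal-length aligned sequences."""
    depth = len(alignment)
    return [
        {symbol: count / depth for symbol, count in Counter(column).items()}
        for column in zip(*alignment)
    ]


def column_score(profile_column, residue, match=1.0, mismatch=-1.0):
    """Average score of placing `residue` against one profile column."""
    return sum(
        freq * (match if symbol == residue else mismatch)
        for symbol, freq in profile_column.items()
    )


profile = build_profile(["ACG", "ACT", "AGT"])
print(profile[1])                     # frequencies of C and G in column 2
print(column_score(profile[0], "A"))  # fully conserved column, matching residue
```

A profile-based aligner would plug a score like `column_score` into the same dynamic programming scheme used for pairwise alignment, with columns of the profile taking the place of single residues.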


Progressive Alignment Method

• Progressive methods are relatively fast and can handle large datasets with many
sequences.

• They often produce high-quality alignments, especially when the sequences are
closely related.

• However, they can be sensitive to the order in which the sequences are added
to the alignment, because errors made in early steps are not corrected later.

• They may not always produce the optimal alignment.


Progressive Alignment Method

• One of the key challenges in progressive alignment methods is the
construction of the guide tree.

• The guide tree represents the evolutionary relationships between the
sequences in the dataset and is used to guide the alignment process.


Methods of Guide tree
construction
• Distance-based methods: the most commonly used.
• These methods calculate a distance matrix between all pairs of sequences and
then use this matrix to construct a tree, for example one that minimizes the
total branch length.
• While distance-based methods are computationally efficient, they can be
sensitive to errors in the distance calculations and may not always accurately
represent the true evolutionary relationships between the sequences.
• Examples are neighbor joining and UPGMA.
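
As a rough illustration of a distance-based method, here is a minimal UPGMA sketch. It repeatedly merges the two closest clusters and averages their distances to the remaining clusters, weighted by cluster size. The function name `upgma`, the nested-tuple tree representation, and the toy distances are assumptions for this example; branch lengths are not tracked.

```python
def upgma(names, dist):
    """Build a guide tree (nested tuples) from a symmetric distance matrix."""
    # Each cluster id maps to (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {
        (i, j): dist[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)

    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)

        # Distance from the merged cluster to every remaining cluster:
        # size-weighted average of the two old distances.
        merged = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)

        # Drop all distances involving the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(merged)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1

    (tree, _size), = clusters.values()
    return tree


names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 10, 10],
    [2, 0, 10, 10],
    [10, 10, 0, 4],
    [10, 10, 4, 0],
]
print(upgma(names, dist))
```

In the resulting tree the most similar pair is merged first and the final merge corresponds to the root, which is the order a progressive aligner would follow.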
Methods of Guide tree
construction
• Probabilistic methods:
• These methods use probabilistic models to estimate the likelihood of different
evolutionary scenarios and choose the tree with the highest likelihood.
• While these methods can produce more accurate trees than distance-based
methods, they are also more computationally intensive and may not be
practical for very large datasets.
• Maximum likelihood and Bayesian inference are the two main approaches of
this kind.
Methods of Guide tree
construction
• Incorporation of structural information: still an active area of
research.
• There is also information available in the three-dimensional
structure of proteins and RNA molecules.
• By incorporating this structural information into the alignment
process, it may be possible to produce more accurate alignments
and identify conserved structural elements.
Iterative Methods
• Iterative alignment is a specialized method that can help to align highly
divergent sequences by using both sequence and profile information.

• It is based on generating a profile for each sequence from its local sequence
context, and then using these profiles to align the sequences iteratively.

• It involves a series of iterative steps, where an initial alignment is
generated and then refined through subsequent iterations.
• After the initial alignment is generated, the alignment is scored based on a
predefined scoring system, which assigns a score to each alignment position
based on the similarity between the aligned residues.
• The score is used to identify regions of the alignment that are highly
conserved, and these regions are then used to guide the subsequent
iterations.
• The profiles are then refined using more advanced models, such as hidden
Markov models (HMMs) or position-specific scoring matrices (PSSMs), which
take into account the conservation and variability of each position in the
alignment.
• Iterative multiple sequence alignment can be computationally intensive,
especially for large datasets, as it involves multiple rounds of sequence
alignment and scoring.
• However, it can provide more accurate alignments than non-iterative methods,
particularly for sequences that are highly divergent or contain many gaps.
• One drawback of iterative multiple sequence alignment is that it can be
sensitive to the initial alignment used to start the iteration process.
• Different initial alignments can lead to different final alignments, and it can be
difficult to determine which alignment is the most accurate.
• One popular multiple sequence alignment method is the ClustalW algorithm, which
uses a progressive alignment strategy to generate an initial alignment, followed
by iterative refinement using a variety of algorithms.
• Another widely used iterative alignment algorithm for multiple sequences is
MUSCLE.
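
The scoring step that guides each iteration can be sketched as a per-column conservation score. The example below assumes the current alignment is a list of equal-length strings with "-" for gaps and scores each column by the share of sequences carrying its most common residue; the names `column_conservation` and `conserved_columns` and the 0.8 threshold are illustrative choices, not values from the slides.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Share of sequences that carry the most common residue in each column."""
    scores = []
    for column in zip(*alignment):
        residues = [symbol for symbol in column if symbol != gap]
        if not residues:
            scores.append(0.0)  # an all-gap column carries no signal
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Dividing by the full column depth penalizes gappy columns.
        scores.append(top_count / len(column))
    return scores


def conserved_columns(alignment, threshold=0.8):
    """Indices of columns whose conservation score reaches the threshold."""
    return [
        index
        for index, score in enumerate(column_conservation(alignment))
        if score >= threshold
    ]


alignment = ["ACG-T", "ACGAT", "ATG-T", "ACCAT"]
print(column_conservation(alignment))
print(conserved_columns(alignment))
```

An iterative aligner could hold the highly conserved columns fixed and realign only the regions between them in the next round.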
