Multiple Sequence Alignment
BIOCOMPUTING AND
SEQUENCE ANALYSIS
MULTIPLE SEQUENCE
ALIGNMENT
MR S. ALIFA
• WHAT IS MULTIPLE SEQUENCE
ALIGNMENT?
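Multiple sequence alignment (MSA) is the process of aligning three or more
biological sequences, generally protein, DNA, or RNA, so that corresponding
positions line up and structural, functional, or evolutionary relationships
can be inferred.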
• No lateral gene transfer: MSA assumes that there has not been transfer of
Steps to Build an MSA
1. Gather your sequences (clean them if needed)
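• The gathering and cleaning step could be sketched as below. This is a minimal illustration, not a prescribed procedure: the slides do not say what "cleaning" involves, so the rules used here (upper-casing, dropping empty records, dropping exact duplicate sequences) are assumptions, and `read_fasta` and `clean_sequences` are hypothetical helper names.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()     # new record starts here
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


def clean_sequences(records):
    """Assumed cleaning rules: upper-case, drop empty and duplicate sequences."""
    cleaned = {}
    seen = set()
    for header, seq in records.items():
        seq = seq.upper()
        if not seq or seq in seen:
            continue                      # empty record or exact duplicate
        seen.add(seq)
        cleaned[header] = seq
    return cleaned


example = ">s1\nacgt\nAC\n>s2\nACGTAC\n>s3\n\n>s4\nTTGA\n"
sequences = clean_sequences(read_fasta(example))
```

Here `s2` is dropped as a duplicate of `s1` and `s3` is dropped because it is empty.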
• While heuristic methods are generally faster than exact methods, they do not
guarantee finding the optimal alignment. Some common heuristic alignment methods include:
• Greedy algorithms, progressive alignment, and hidden Markov models (HMMs)
Progressive Alignment Method
• The most commonly used approach to MSA is progressive alignment, e.g. Clustal and
T-Coffee.
• First, it performs pairwise alignments of all the sequences using the
Needleman–Wunsch global alignment method and records the similarity scores.
• Then the two most similar sequences are aligned, a third sequence is chosen and
aligned to this first alignment, and the process is iterated until all sequences
have been aligned.
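• The pairwise step can be illustrated with a small Needleman–Wunsch implementation. This is a sketch under simple assumptions: the scoring values (match = 1, mismatch = -1, linear gap = -2) are illustrative choices, not values taken from Clustal or T-Coffee, which use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align residues
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("ACGT", "ACT")
print(score, top, bottom)
```

The returned score is the similarity score that would be recorded for this pair.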
• The alignment scoring and the optimization algorithm are closely
integrated within this iterative process.
• The alignment scoring typically involves assessing the similarity or
dissimilarity between sequences or sequence regions.
• The optimization algorithm is responsible for finding the best
alignment given the scoring information.
• It determines the order in which sequences are added and how they
are aligned to the current alignment.
• The most important heuristic in progressive alignment is to align the
most similar pairs of sequences first. To identify them, the pairwise
scores are converted into evolutionary distances, which form a distance matrix.
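• One simple way to turn pairwise alignments into a distance matrix is sketched below. The conversion used here (distance = 1 - fractional identity over columns without gaps) is an assumption chosen for illustration; real tools may apply other corrections. `fractional_identity` and `distance_matrix` are hypothetical helper names, and the pairwise alignments are made-up inputs.

```python
def fractional_identity(aligned_a, aligned_b):
    """Fraction of gap-free columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue                  # ignore columns containing a gap
        compared += 1
        matches += (x == y)
    return matches / compared if compared else 0.0


def distance_matrix(names, pairwise_alignments):
    """pairwise_alignments maps (name_i, name_j) -> (aligned_i, aligned_j)."""
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = pairwise_alignments[(names[i], names[j])]
            d = 1.0 - fractional_identity(a, b)
            dist[i][j] = dist[j][i] = d   # the matrix is symmetric
    return dist


names = ["s1", "s2", "s3"]
alignments = {
    ("s1", "s2"): ("ACGT", "ACGT"),
    ("s1", "s3"): ("ACGT", "AGGA"),
    ("s2", "s3"): ("ACGT", "TTTT"),
}
dist = distance_matrix(names, alignments)
```

The smallest off-diagonal entry identifies the most similar pair, which is the pair aligned first.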
• The distance matrix is used to build the guide tree, a binary tree whose leaves
represent sequences and whose interior nodes represent alignments.
• The guide tree is used to direct the alignment of sequences based on their
relative positions on the tree, starting with the two most closely related
sequences and adding more distant sequences one at a time until all
sequences are aligned.
• The nodes furthest from the root represent the most similar pairs.
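• A guide tree can be built from a distance matrix by hierarchical clustering. The sketch below uses average-linkage (UPGMA-style) clustering as an assumed method; the slides do not state which clustering method is intended. The tree is returned as nested tuples, and the distances are made-up example values.

```python
def build_guide_tree(names, dist):
    """Average-linkage clustering; returns the tree as nested tuples of names."""
    # Each cluster is (subtree, indices of the sequences it contains).
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def average_distance(c1, c2):
        pairs = [(i, j) for i in c1[1] for j in c2[1]]
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters (first pair wins on ties).
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = ((clusters[x][0], clusters[y][0]),
                  clusters[x][1] + clusters[y][1])
        # Replace the two merged clusters with their parent node.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


names = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.8, 0.3, 0.0],
]
tree = build_guide_tree(names, dist)
```

The innermost tuples are the nodes furthest from the root, i.e. the most similar pairs.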
• There are several different progressive alignment methods, but they all follow a
similar basic strategy:
• 1. Pairwise alignment: The first step is to perform pairwise alignments between
all pairs of sequences in the dataset. This produces a matrix of pairwise
similarity scores.
• 2. Guide tree construction: The pairwise similarity scores are used to construct a
guide tree that represents the evolutionary relationships between the sequences.
• The guide tree is a hierarchical clustering of the sequences: the most similar
sequences are grouped together at the bottom of the tree, and the most dissimilar
sequences are joined only near the top.
• 3. Progressive alignment: Starting with the two most similar sequences, the
pair is aligned first.
• Then, the next sequence is added to the alignment by aligning it to the existing
alignment.
• This process is repeated until all sequences have been added to the alignment.
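• The progressive step can be sketched as repeatedly aligning one new sequence to the existing alignment. This is a simplified illustration: each alignment column is scored against the new residue by an average match/mismatch score, with a linear gap penalty. The scoring values and the function names are assumptions, and the input order is assumed to already follow the guide tree (most similar sequences first).

```python
def align_seq_to_alignment(alignment, seq, match=1, mismatch=-1, gap=-2):
    """Add seq to an alignment (list of equal-length gapped strings)."""
    cols = list(zip(*alignment))          # columns of the current alignment
    n, m = len(cols), len(seq)

    def col_score(col, ch):
        # Average score of ch against every symbol in the column;
        # a gap symbol in the column counts as a mismatch.
        return sum(match if sym == ch else mismatch for sym in col) / len(col)

    # score[i][j] = best score for the first i columns vs. the first j residues
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + col_score(cols[i - 1], seq[j - 1]),
                score[i - 1][j] + gap,    # gap in the new sequence
                score[i][j - 1] + gap,    # new all-gap column in the alignment
            )

    # Trace back; existing columns are never changed, only gap columns added.
    new_cols, new_seq = [], []
    gap_col = ("-",) * len(alignment)
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + col_score(cols[i - 1], seq[j - 1]):
            new_cols.append(cols[i - 1]); new_seq.append(seq[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            new_cols.append(cols[i - 1]); new_seq.append("-")
            i -= 1
        else:
            new_cols.append(gap_col); new_seq.append(seq[j - 1])
            j -= 1
    new_cols.reverse(); new_seq.reverse()
    rows = ["".join(col[k] for col in new_cols) for k in range(len(alignment))]
    return rows + ["".join(new_seq)]


def progressive_align(seqs_in_order):
    """Align sequences one at a time, in the given (guide-tree) order."""
    alignment = [seqs_in_order[0]]
    for s in seqs_in_order[1:]:
        alignment = align_seq_to_alignment(alignment, s)
    return alignment


msa = progressive_align(["ACGT", "ACT", "AGT"])
print("\n".join(msa))
```

Because gaps placed in earlier steps are kept, the result depends on the order in which sequences are added.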
• Progressive alignment methods are relatively fast and can handle large datasets
with many sequences.
• They also produce high-quality alignments that are often more accurate than
• However, they can be sensitive to the order in which the sequences are added
to the alignment, because errors made in early alignments are not corrected later.
• This method is based on the concept of generating a profile for each sequence
based on its local sequence context, and then using these profiles to align the
sequences in an iterative manner.
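• To make the idea of a profile concrete, the sketch below computes a column-frequency profile from a set of aligned sequences. This is the common position-specific notion of a profile and is an assumption here: the slides do not specify how the per-sequence profiles are built, so this is not a reconstruction of that method. `column_profile` is a hypothetical helper name.

```python
from collections import Counter


def column_profile(alignment):
    """Return one dict per column mapping symbol -> relative frequency."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in zip(*alignment):
        counts = Counter(col)             # gap symbols are counted as well
        total = len(col)
        profile.append({sym: cnt / total for sym, cnt in counts.items()})
    return profile


profile = column_profile(["ACGT", "AC-T", "A-GT"])
for position, freqs in enumerate(profile, start=1):
    print(position, freqs)
```

A column where one symbol has frequency 1.0 is fully conserved; mixed columns show where the sequences differ.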