Module-II

The document provides an overview of sequence alignment techniques in bioinformatics, emphasizing their significance in analyzing biological sequences for functional and evolutionary insights. It covers various alignment methods, including global and local alignments, pairwise sequence alignment algorithms, and the dot matrix and dynamic programming methods, highlighting their advantages and limitations. Additionally, it discusses concepts such as sequence homology, similarity, and identity, along with the importance of scoring matrices and gap penalties in alignment processes.

Uploaded by

kpavankumar887


Bioinformatics

— Unit II-Sequence Alignment Techniques—

Dr. Chandra Mohan D

Assistant Professor
Computer Science and Engineering Group
Indian Institute of Information Technology, Sri City

If you know your own DNA sequence, then you know everything about yourself

January 22, 2025 Bioinformatics 1

Outline
 Sequence Alignment
     Introduction (23-01-2024)
     Global Alignment
     Local Alignment
 Pairwise Sequence Alignment Algorithms
     Dot Matrix
     Dynamic programming methods
         Needleman-Wunsch and Smith-Waterman algorithms
     Scoring Matrices
 Probabilistic Foundations of Sequence Alignment
 Multiple sequence alignment: Uses of MSA

Introduction
 Sequence comparison lies at the heart of bioinformatics analysis
 It is an important step toward structural and functional analysis of newly determined sequences
 As new biological sequences are being generated at exponential rates, sequence comparison is becoming increasingly important for drawing functional and evolutionary inferences about a new protein from proteins already existing in the database
 The most fundamental process in this type of comparison is sequence alignment
 Sequences are compared by searching for common character patterns and establishing residue–residue correspondence among related sequences

Introduction: Significance of sequence alignment
 Sequence alignment is useful for discovering functional, structural, and evolutionary information in biological sequences
 Identifying the evolutionary relationships between sequences helps to characterize the function of unknown sequences
 When a sequence alignment reveals significant similarity among a group of sequences, they can be considered as belonging to the same family
 If one member within the family has a known structure and function, sequence alignment can be used to predict the structure and function of the uncharacterized sequences

Sequence homology vs similarity
 An important concept in sequence analysis is sequence homology
 When two sequences are descended from a common evolutionary origin, they are said to have a homologous relationship or to share homology
 Sequence similarity is the percentage of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity
 Sequence similarity can be quantified using percentages, whereas homology is a qualitative statement
     Two sequences may share 40% similarity
     But they are either homologous or non-homologous

Sequence similarity vs Identity
 Sequence similarity and sequence identity are synonymous for nucleotide sequences
 For protein sequences, however, the two concepts are very different
 In a protein sequence alignment, sequence identity refers to the percentage of matches of the same amino acid residues between two aligned sequences
 Identity in sequence alignment is the number of characters that match exactly between two different sequences
 Hence, gaps do not count when assessing identity
 There are two ways to calculate the sequence similarity/identity
     One involves the use of the overall sequence lengths of both sequences
     The other normalizes by the size of the shorter sequence

Sequence similarity vs Identity
 Normalizing by the shorter sequence has the consequence that sequence identity is not transitive
 If X=Y and Y=Z, then X is not necessarily equal to Z
 This can be seen by treating identity as a distance measure
 Example:
     X has the sequence AAGGCTT, Y has the sequence AAGGC, and Z has the sequence AAGGCAT
     Identity between X and Y is 100% {5 identical nucleotides / min[length(X), length(Y)]}
     Identity between Y and Z is also 100%
     But identity between X and Z is only about 85.7% {6 identical nucleotides / 7}
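The non-transitivity example above can be checked with a few lines of Python. This is a minimal sketch, assuming an ungapped, left-aligned comparison; the function name identity_shorter is hypothetical and not part of the slides.

```python
def identity_shorter(a: str, b: str) -> float:
    """Percent identity of an ungapped, left-aligned comparison,
    normalized by the length of the shorter sequence."""
    shorter = min(len(a), len(b))
    # zip() stops at the end of the shorter sequence
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / shorter


X, Y, Z = "AAGGCTT", "AAGGC", "AAGGCAT"
print(identity_shorter(X, Y))            # 100.0
print(identity_shorter(Y, Z))            # 100.0
print(round(identity_shorter(X, Z), 1))  # 85.7
```

X and Y are 100% identical, Y and Z are 100% identical, yet X and Z are not, which is the sense in which identity normalized by the shorter length is not transitive.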

Sequence similarity vs Identity
 The first method uses the formula below for the percentage sequence similarity (S):
S = [(Ls × 2)/(La + Lb)] × 100
     where S is the percentage sequence similarity
     Ls is the number of aligned residues with similar characteristics
     La and Lb are the total lengths of each individual sequence
 The sequence identity (I%) can be calculated in a similar fashion:
I = [(Li × 2)/(La + Lb)] × 100
     where Li is the number of aligned identical residues
 The second method of calculation is to derive the percentage of identical/similar residues over the full length of the shorter sequence using the formula
I(S) = [Li(s)/La] × 100
     where La is the length of the shorter of the two sequences
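The two formulas above can be compared side by side. The sketch below is only illustrative: the function names and the sample counts (40 identical residues, lengths 100 and 60) are assumptions chosen for the arithmetic, not values from the slides.

```python
def percent_method1(count: int, len_a: int, len_b: int) -> float:
    """Method 1: (count x 2) / (La + Lb) x 100, using both full lengths."""
    return (count * 2) / (len_a + len_b) * 100


def percent_method2(count: int, len_a: int, len_b: int) -> float:
    """Method 2: count / (length of the shorter sequence) x 100."""
    return count / min(len_a, len_b) * 100


# Hypothetical alignment: 40 identical residues, sequences of length 100 and 60
print(percent_method1(40, 100, 60))            # 50.0
print(round(percent_method2(40, 100, 60), 1))  # 66.7
```

The same alignment gives a noticeably higher percentage under the second method, so the normalization used should always be stated alongside the value.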

Pairwise Sequence Similarity Methods
 Pairwise sequence alignment is the process of aligning two sequences and is the basis of
     Database similarity searching and
     Multiple sequence alignment
 The overall goal of pairwise sequence alignment is to find the best pairing of two sequences, such that there is maximum correspondence among residues
 To achieve this, one sequence needs to be shifted relative to the other to find the position where maximum matches are found
 There are two different alignment strategies:
     Global alignment: alignment is carried out from beginning to end of both sequences to find the best possible alignment across the entire length of the two sequences
     Local alignment: only the local regions with the highest level of similarity between the two sequences are found

Alignment Algorithms-Dot Matrix Method
 Global and local alignment algorithms are fundamentally similar and differ only in the optimization strategy
 Both types of algorithms can be based on one of three methods:
     The dot matrix method
     The dynamic programming method
     The word method
 The dot matrix method:
     The most basic sequence alignment method is the dot matrix method, also known as the dot plot method
     It is a graphical way of comparing two sequences in a 2D matrix

Alignment Algorithms-Dot Matrix Method
 In a dot matrix, the two sequences to be compared are written along the horizontal and vertical axes of the matrix
 The comparison is done by scanning each residue of one sequence for similarity with all residues in the other sequence
 If a residue match is found, a dot is placed within the graph
 Otherwise, the matrix positions are left blank
 When the two sequences have substantial regions of similarity, many dots line up to form contiguous diagonal lines, which reveal the sequence alignment
 If there are interruptions in the middle of a diagonal line, they indicate insertions or deletions
 Parallel diagonal lines within the matrix represent repetitive regions of the sequences

Alignment Algorithms-Dot Matrix Method
 A problem exists when comparing large sequences using the dot matrix method, namely, the high noise level
 In most dot plots, dots are plotted all over the graph, obscuring identification of the true alignment
 For DNA sequences, the problem is particularly acute because there are only four possible characters in DNA, so each residue has a one-in-four chance of matching a residue in another sequence
 To reduce noise, instead of using a single residue to scan for similarity, a filtering technique has to be applied, which uses a “window” of fixed length covering a stretch of residue pairs
 When applying filtering, windows slide across the two sequences to compare all possible stretches

Alignment Algorithms-Dot Matrix Method
 Dots are only placed when a stretch of residues equal to the window size from one sequence matches completely with a stretch of the other sequence
 This method has been shown to be effective in reducing the noise level
 The window is also called a tuple, the size of which can be manipulated so that a clear pattern of sequence match can be plotted
 However, if the selected window size is too long, sensitivity of the alignment is lost
 There are many variations of the dot plot method
 A sequence can be aligned with itself to identify internal repeat elements
 In the self comparison, there is a main diagonal for perfect matching of each residue
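The windowed dot plot described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name dot_plot, the text rendering with "*" and ".", and the example sequence are assumptions, not part of the slides.

```python
def dot_plot(seq_x: str, seq_y: str, window: int = 1) -> list:
    """Return the dot matrix as a list of text rows.

    seq_x runs along the horizontal axis, seq_y along the vertical axis.
    A '*' is placed only where the two windows of length `window`
    match completely; all other positions are left as '.'.
    """
    rows = []
    for i in range(len(seq_y) - window + 1):
        row = ""
        for j in range(len(seq_x) - window + 1):
            same = seq_y[i:i + window] == seq_x[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


# Self comparison of a hypothetical sequence containing a repeat
seq = "ATTCGATTCG"
for row in dot_plot(seq, seq, window=1):
    print(row)
print()
for row in dot_plot(seq, seq, window=3):
    print(row)
```

With a window of 1 the plot is scattered with chance matches; with a window of 3 mainly the main diagonal and the shorter parallel diagonals produced by the repeat remain.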

Alignment Algorithms-Dot Matrix Method
 If repeats are present, short parallel lines are observed above and below the main diagonal
 Self-complementarity of DNA sequences (inverted repeats) can also be identified using a dot plot
Advantages:
 It gives a direct visual relationship between two sequences and helps easy identification of the regions of greatest similarity
 It identifies repeat regions based on the presence of parallel diagonals of the same size, vertically or horizontally, in the matrix
 It is useful in identifying chromosomal repeats and in comparing gene order conservation between two closely related genomes
 It can also be used in identifying nucleic acid secondary structures through detecting self-complementarity of a sequence

Alignment Algorithms-Dot Matrix Method
Types of Repeats in DNA:
 Tandem Repeat: 5'-ATTCG ATTCG ATTCG-3'

 Mirror Repeat: 5'-GCATGGTACG-3'

 Pairing Repeat: 5'-TCTAGTTAGATCAA-3'

 Inverted Repeat: 5'-TTACGnnnnnnCGTAA-3'

Disadvantages:
 It is often up to the user to construct a full alignment with insertions and deletions by linking nearby diagonals
 Another limitation of this visual analysis method is that it lacks statistical rigor in assessing the quality of the alignment
 The method is also restricted to pairwise alignment
 It is difficult for the method to scale up to multiple alignment
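Three of the repeat types listed on this slide (tandem, mirror, and inverted) can be verified programmatically. The helper names below are hypothetical, and the checks are a simple sketch that assumes uppercase A/C/G/T input.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def is_tandem_repeat(seq: str, unit: str) -> bool:
    """True if seq is two or more back-to-back copies of unit."""
    copies, remainder = divmod(len(seq), len(unit))
    return remainder == 0 and copies >= 2 and seq == unit * copies


def is_mirror_repeat(seq: str) -> bool:
    """True if the sequence reads the same in both directions on one strand."""
    return seq == seq[::-1]


def is_inverted_repeat(left_arm: str, right_arm: str) -> bool:
    """True if the right arm is the reverse complement of the left arm."""
    return right_arm == reverse_complement(left_arm)


print(is_tandem_repeat("ATTCGATTCGATTCG", "ATTCG"))  # True
print(is_mirror_repeat("GCATGGTACG"))                # True
print(is_inverted_repeat("TTACG", "CGTAA"))          # True
```

The inverted-repeat check is the same self-complementarity that appears in a dot plot when a sequence is compared with its own reverse complement.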

Alignment Algorithms-Dynamic Programming Method
 Dynamic programming is a method that determines the optimal alignment by matching two sequences for all possible pairs of characters
 It is fundamentally similar to the dot matrix method in that it also creates a 2D alignment grid
 However, it finds the alignment in a more quantitative way by converting a dot matrix into a scoring matrix to account for matches and mismatches between sequences
 By searching for the set of highest scores in this matrix, the best alignment can be accurately obtained
 The residue matching is scored according to a particular scoring matrix
 The scores are calculated one row at a time
 The scanning of the second row takes into account the scores already obtained in the first row

Alignment Algorithms-Dynamic Programming Method
 This process is iterated until values for all the cells are filled
 The best score is put into the bottom right corner of an intermediate matrix

Alignment Algorithms-Dynamic Programming Method
 Thus, the scores are accumulated along the diagonal going from the upper left corner to the lower right corner
 Once the scores have been accumulated in the matrix, the next step is to find the path that represents the optimal alignment
 This is done by tracing back through the matrix in reverse order from the lower right-hand corner of the matrix toward the origin
 The best matching path is the one that has the maximum total score
 If two or more paths reach the same highest score, one is chosen arbitrarily to represent the best alignment
 The path can also move horizontally or vertically at a certain point, which corresponds to the introduction of a gap (an insertion or deletion) in one of the two sequences

Alignment Algorithms-Dynamic Programming Method
Gap Penalties:
 Performing optimal alignment between sequences often involves applying gaps that represent insertions and deletions
 In natural evolutionary processes insertions and deletions are relatively rare in comparison to substitutions, so introducing gaps should be made more difficult computationally
 However, assigning penalty values can be more or less arbitrary because there is no evolutionary theory to determine a precise cost for introducing insertions and deletions
 If the penalty values are set too low, gaps can become so numerous that even nonrelated sequences can be matched up with high similarity scores

Alignment Algorithms-Dynamic Programming Method
Gap Penalties:
 If the penalty values are set too high, gaps may become too difficult to introduce, and a reasonable alignment cannot be achieved, which is also unrealistic
 Through empirical studies for globular proteins, a set of penalty values has been developed that appears to suit most alignment purposes
 They are normally implemented as default values in most alignment programs
 Another factor to consider is the cost difference between opening a gap and extending an existing gap
 It is known that it is easier to extend a gap that has already been started
 Thus, gap opening should have a much higher penalty than gap extension

Alignment Algorithms-Dynamic Programming Method
Gap Penalties:
 These differential gap penalties are also referred to as affine gap penalties
 The normal strategy is to use preset gap penalty values for introducing and extending gaps
 For example, one may use a −12/−1 scheme in which the gap opening penalty is −12 and the gap extension penalty is −1
 The total gap penalty (W) is a linear function of the gap length k, calculated using the formula:
W = γ + δ × (k − 1)
where γ is the gap opening penalty, δ is the gap extension penalty, and k is the length of the gap
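As a quick arithmetic check of the affine formula W = γ + δ × (k − 1), the sketch below evaluates it for the −12/−1 scheme mentioned above. The function and parameter names are hypothetical.

```python
def affine_gap_penalty(k: int, gap_open: int = -12, gap_extend: int = -1) -> int:
    """Total penalty W for a single gap of length k (k >= 1).

    The first gap position costs gap_open; each of the remaining
    k - 1 positions costs gap_extend.
    """
    if k < 1:
        raise ValueError("gap length k must be at least 1")
    return gap_open + gap_extend * (k - 1)


for k in (1, 2, 5):
    print(k, affine_gap_penalty(k))
# 1 -12
# 2 -13
# 5 -16
```

One gap of length 5 costs −16 under this scheme, whereas five separate gaps of length 1 would cost −60, which is how the affine scheme favors a few long gaps over many short ones.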

Alignment Algorithms-Dynamic Programming Method
Gap Penalties:
 Besides the affine gap penalty, a constant gap penalty is sometimes also used, which assigns the same score for each gap position regardless of whether it is opening or extending
 However, this penalty scheme has been found to be less realistic than the affine penalty
 Gaps at the terminal regions are often treated with no penalty because in reality many true homologous sequences are of different lengths
 Consequently, end gaps can be allowed to be free to avoid getting unrealistic alignments

Alignment Algorithms-Dynamic Programming Method
Dynamic Programming for Local Alignment
 In regular sequence alignment, the divergence level between the two sequences to be aligned is not easily known
 The lengths of the two sequences may also be unequal
 In such cases, identification of regional sequence similarity may be of greater significance than finding a match that includes all residues
 The first application of dynamic programming in local alignment is the Smith–Waterman algorithm
Dynamic Programming for Global Alignment
 The classical global pairwise alignment algorithm using dynamic programming is the Needleman–Wunsch algorithm
 In this algorithm, an optimal alignment is obtained over the entire lengths of the two sequences

Alignment Algorithms-Dynamic Programming Method
 The global alignment must extend from the beginning to the end of both sequences to achieve the highest total score
 In other words, the alignment path has to go from the bottom right corner of the matrix to the top left corner
 The drawback of focusing on getting a maximum score for the full-length sequence alignment is the risk of missing the best local similarity
 This strategy is best suited to aligning two closely related sequences of roughly the same length
 For divergent sequences or sequences with different domain structures, the approach may not produce a biologically meaningful alignment
 One of the few web servers dedicated to global pairwise alignment is GAP (global alignment program)

Alignment Algorithms-Dynamic Programming Method
Dynamic Programming for Local Alignment
 As in global alignment, the final result is influenced by the choice of scoring system used
 Occasionally, several optimally aligned segments with the best scores are obtained
 The goal of local alignment is to get the highest alignment score locally
 This approach may be suitable for aligning divergent sequences or sequences with multiple domains that may be of different origins
 Most commonly used pairwise alignment web servers apply the local alignment strategy; these include SIM, SSEARCH, and LALIGN

Dynamic Programming Algorithm: Smith Waterman
 The Smith-Waterman algorithm was first proposed by Temple F. Smith and Michael S. Waterman in 1981
 The algorithm performs local sequence alignment
 It finds conserved regions between the two sequences
 One can align two partially overlapping sequences
 It can also align a subsequence of a sequence to itself
 These are the main advantages of local sequence alignment
Gap score or gap penalty:
 Dynamic programming algorithms use gap penalties to maximize the biological meaning of the alignment
 Typical values are –12 for gap opening and –4 for gap extension

Dynamic Programming Algorithm: Smith Waterman
Assumed scoring schema:
 If the residues (nucleotides or amino acids) are the same in both sequences, the match score (Si,j) is assumed to be +5
     It is added to the value of the cell diagonally adjacent to the current cell (position i, j)
 If the residues are not the same, the mismatch score is assumed to be -3
     This score is also added to the value of the diagonally adjacent cell
 The gap penalty score is assumed to be -4
     It is added to the values of the cells to the left of and above the current cell

Working of Smith-Waterman Algorithm
 The basic steps of the algorithm are:
1. Initialization of a matrix
2. Matrix filling with the appropriate scores
3. Trace back of the sequences for a suitable alignment
 To study local sequence alignment, consider the sequences given below
CGTGAATTCAT (sequence #1 or A)
GACTTAC (sequence #2 or B)
Step1: Initialization of Matrix
 The two sequences are arranged in matrix form with A+1 columns and B+1 rows (where A and B are the lengths of the two sequences, here 11 and 7)
 The values in the first row and first column are set to zero

Working of Smith-Waterman Algorithm
Variables used:
i, j denote the row and column indices
M is the matrix value of the required cell (written Mi,j)
S is the match/mismatch score of the required cell (Si,j)
W is the gap penalty

Working of Smith-Waterman Algorithm
Step2: Matrix Filling
 The second and crucial step of the algorithm is filling the entire matrix, so it is important to know the neighbor values (diagonal, upper and left) of the current cell in order to fill each cell
Mi,j = Maximum{(Mi-1,j-1 + Si,j), (Mi,j-1 + W), (Mi-1,j + W), 0}
 Fill the entire matrix using the assumed scoring schema and initial values
 The cell in the 1st row and 1st column (M1,1) can be filled using the scoring schema as follows
 The first residues of the two sequences are ‘C’ and ‘G’, so the mismatch score is added to the neighboring value that is diagonally located, i.e. 0

Working of Smith-Waterman Algorithm
 The gap penalty is added to the upper and left neighbor values from the matrix
 So the scoring schema equation can be written as follows
M1,1 = Maximum{M0,0 + S1,1, M1,0 + W, M0,1 + W, 0}
= Maximum{0 + (-3), 0 + (-4), 0 + (-4), 0}
= Maximum{-3, -4, -4, 0}
= 0
 From the above calculation the maximum value obtained is 0
 Because 0 is taken as the lowest value when finding the maximum for position Mi,j, no negative values can appear in the matrix
 After filling each cell, keep a pointer back to the cell from which the maximum score was derived
 In a similar fashion, fill all the remaining cells of the matrix

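The initialization and matrix-filling steps described above can be sketched in Python as follows. This is a minimal sketch using the assumed schema (match +5, mismatch -3, gap -4) and a single linear gap penalty W; the function name sw_fill is hypothetical.

```python
def sw_fill(seq_a: str, seq_b: str, match: int = 5, mismatch: int = -3, gap: int = -4) -> list:
    """Fill the Smith-Waterman matrix.

    seq_a labels the columns and seq_b labels the rows, so the matrix
    has len(seq_b) + 1 rows and len(seq_a) + 1 columns. The first row
    and first column stay at zero.
    """
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    M = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_b[i - 1] == seq_a[j - 1] else mismatch
            M[i][j] = max(
                M[i - 1][j - 1] + s,  # diagonal: match or mismatch
                M[i][j - 1] + gap,    # left neighbor plus gap penalty
                M[i - 1][j] + gap,    # upper neighbor plus gap penalty
                0,                    # local alignment never drops below 0
            )
    return M


M = sw_fill("CGTGAATTCAT", "GACTTAC")
best = max(max(row) for row in M)
cells = [(i, j) for i, row in enumerate(M) for j, v in enumerate(row) if v == best]
print(M[1][1], best, cells)  # 0 18 [(6, 10), (7, 9)]
```

For the two example sequences this reproduces M1,1 = 0 and a maximum score of 18 occurring in two cells.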
Working of Smith-Waterman Algorithm
 Each cell is back pointed by one or more pointers to the cells from which its maximum score was obtained
Step3: Tracing back the sequences for an optimal alignment
 The final step for obtaining the alignment is the trace back; before that, one needs to find the maximum score in the entire matrix for the local alignment of the sequences

Working of Smith-Waterman Algorithm
 The maximum score may be present in more than one cell; in that case two or more alignments are possible, and the best alignment is found by scoring them
 In this example the maximum score in the matrix is 18, which is found in two positions that lead to multiple alignments, so the best alignment has to be found
 The trace back begins from the position with the highest value and follows the pointers back to the predecessor, then moves to the next predecessor, and continues until a score of 0 is reached

Working of Smith-Waterman Algorithm
 Two pointers may point out from one cell, in which case both paths (alignments) can be considered; the best one is found by scoring them and taking the maximum score
 Thus a local alignment is obtained and one can see the possible alignments
 The two alignments can each be given a score, using +5 for a match, -3 for a mismatch, and -4 for a gap
 Summing up the scores, both alignments give the same total of 18, so both can be regarded as the best alignment
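The trace back described above can be sketched as follows. This is a simplified illustration, not the full procedure from the slides: instead of storing explicit pointers it re-derives the predecessor of each cell from the scores, it starts from the first maximum cell found, and when several predecessors tie it prefers the diagonal, so it returns only one of the equally good alignments. The function name smith_waterman is hypothetical.

```python
def smith_waterman(seq_a: str, seq_b: str, match: int = 5, mismatch: int = -3, gap: int = -4):
    """Return (best score, aligned part of seq_a, aligned part of seq_b)."""
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    M = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_b[i - 1] == seq_a[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + s, M[i][j - 1] + gap, M[i - 1][j] + gap, 0)
            if M[i][j] > best:  # keeps the first cell that reaches the maximum
                best, best_pos = M[i][j], (i, j)

    # Trace back from the maximum until a cell with score 0 is reached
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and M[i][j] > 0:
        s = match if seq_b[i - 1] == seq_a[j - 1] else mismatch
        if M[i][j] == M[i - 1][j - 1] + s:    # came from the diagonal
            top.append(seq_a[j - 1])
            bottom.append(seq_b[i - 1])
            i, j = i - 1, j - 1
        elif M[i][j] == M[i][j - 1] + gap:    # came from the left: gap in seq_b
            top.append(seq_a[j - 1])
            bottom.append("-")
            j -= 1
        else:                                 # came from above: gap in seq_a
            top.append("-")
            bottom.append(seq_b[i - 1])
            i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = smith_waterman("CGTGAATTCAT", "GACTTAC")
print(score)   # 18
print(top)     # GAATTCA
print(bottom)  # GACTT-A
```

The returned alignment has five matches, one mismatch, and one gap: 5 × 5 − 3 − 4 = 18. Starting the trace back from the other maximum cell would give the second alignment with the same score.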

Working of Needleman-Wunsch Algorithm
 To study the algorithm, consider the two given sequences
CGTGAATTCAT (sequence #1), GACTTAC (sequence #2)
 The lengths (counts of nucleotides or amino acids) of sequence 1 and sequence 2 are 11 and 7, respectively
 The initial matrix is created with A+1 columns and B+1 rows (where A and B correspond to the lengths of the sequences)
 The extra row and column are added at the start of the matrix so that a sequence can be aligned with a gap
 The basic steps of the algorithm are:
     Initialization of a matrix
     Matrix filling with the appropriate scores
     Trace back of the sequences for a suitable alignment

Working of Needleman-Wunsch Algorithm
 After creating the initial matrix, a scoring schema has to be introduced, which can be user defined with specific scores
 A simple basic scoring schema can be assumed as follows:
     if the two residues (nucleotides or amino acids) at the ith and jth positions are the same, the match score is 1 (S(i,j) = 1)
     if the two residues at the ith and jth positions are not the same, the mismatch score is -1 (S(i,j) = -1)
     the gap score (w), or gap penalty, is assumed to be -1
Step1: Initialization of Matrix
 The first row and first column of the matrix can be initially filled with 0
 If a gap score is assumed, the gap score is added to the previous cell of the row or column, so that with w = -1 the first row and first column read 0, -1, -2, ...
Step2: Matrix Fill Step
 The second and crucial step of the algorithm is matrix filling, starting from the upper left-hand corner of the matrix

Working of Needleman-Wunsch Algorithm
 To find the maximum score of each cell, it is required to know the neighboring scores (diagonal, left and upper) of the current position
 From the assumed values, add the match or mismatch score to the diagonal value
 Similarly, add the gap score to the other neighboring values
 This gives three different values; take the maximum among them and fill the (i, j) position with the score obtained
 In terms of matrix positions, the three candidate values are
[M(i-1,j-1)+S(i,j), M(i,j-1)+w, M(i-1,j)+w]
 Overall, the equation can be written as
M(i,j) = Maximum{M(i-1,j-1)+S(i,j), M(i,j-1)+w, M(i-1,j)+w}

Working of Needleman-Wunsch Algorithm
 To score the current position (the first position, M1,1), the formula stated above can be used
 The first residues in the two sequences are ‘G’ and ‘C’
 Since they are mismatching residues, the score S1,1 is -1
M1,1 = Maximum{M0,0 + S1,1, M1,0 + W, M0,1 + W}
= Maximum{0 + (-1), -1 + (-1), -1 + (-1)}
= Maximum{-1, -2, -2}
= -1
 The obtained score -1 is placed in position (1,1) of the scoring matrix
 Similarly, using the above equation and method, fill all the remaining rows and columns

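The Needleman-Wunsch initialization and fill steps above can be sketched in Python as follows, using the assumed schema (match 1, mismatch -1, gap -1) and gap-initialized borders as in the M1,1 calculation. The function name nw_fill is hypothetical.

```python
def nw_fill(seq_a: str, seq_b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> list:
    """Fill the Needleman-Wunsch matrix (seq_a on columns, seq_b on rows)."""
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    M = [[0] * cols for _ in range(rows)]
    # Border cells accumulate the gap score: 0, -1, -2, ...
    for j in range(1, cols):
        M[0][j] = M[0][j - 1] + gap
    for i in range(1, rows):
        M[i][0] = M[i - 1][0] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_b[i - 1] == seq_a[j - 1] else mismatch
            M[i][j] = max(
                M[i - 1][j - 1] + s,  # diagonal: match or mismatch
                M[i][j - 1] + gap,    # left neighbor plus gap score
                M[i - 1][j] + gap,    # upper neighbor plus gap score
            )
    return M


M = nw_fill("CGTGAATTCAT", "GACTTAC")
print(M[1][1], M[-1][-1])  # -1 -1
```

Unlike the local case there is no 0 in the maximum, so negative scores are allowed, and the global score is read from the bottom right-hand cell.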
Working of Needleman-Wunsch Algorithm
 Place back pointers to the cells from which the maximum score is obtained; these are the predecessors of the current cell
Step3: Trace back Step
 The final step in the algorithm is the trace back for the best alignment
 In the above-mentioned example, the score in the bottom right-hand corner is -1
 The important point to note here is that there may be two or more alignments possible between the two example sequences
 The current cell with value -1 has an immediate predecessor that is diagonally located, from which the maximum score was obtained, and its value is 0
 If two or more pointers point back from a cell, there can be two or more possible alignments

Working of Needleman-Wunsch Algorithm
 Continuing the trace back by the method defined above, one reaches the 0th row, 0th column
 Following the steps described above, the alignment of the two sample sequences can be found
 The best alignment among the candidates can be identified by the maximum alignment score under a user-defined scoring scheme (for example match = 5, mismatch = -1, gap = -2)
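The global trace back from the bottom right-hand corner to the 0th row and 0th column can be sketched as follows. This is a simplified illustration with the match 1 / mismatch -1 / gap -1 schema: predecessors are re-derived from the scores rather than stored as pointers, and ties are broken in favor of the diagonal, so only one of the possible optimal alignments is returned. The function name needleman_wunsch is hypothetical.

```python
def needleman_wunsch(seq_a: str, seq_b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (global score, aligned seq_a, aligned seq_b)."""
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    M = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        M[0][j] = M[0][j - 1] + gap
    for i in range(1, rows):
        M[i][0] = M[i - 1][0] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_b[i - 1] == seq_a[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + s, M[i][j - 1] + gap, M[i - 1][j] + gap)

    # Trace back from the bottom right corner to the origin (0, 0)
    i, j = len(seq_b), len(seq_a)
    top, bottom = [], []
    while i > 0 or j > 0:
        s = None
        if i > 0 and j > 0:
            s = match if seq_b[i - 1] == seq_a[j - 1] else mismatch
        if s is not None and M[i][j] == M[i - 1][j - 1] + s:
            top.append(seq_a[j - 1])      # diagonal move: residues paired
            bottom.append(seq_b[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and M[i][j] == M[i][j - 1] + gap:
            top.append(seq_a[j - 1])      # horizontal move: gap in seq_b
            bottom.append("-")
            j -= 1
        else:
            top.append("-")               # vertical move: gap in seq_a
            bottom.append(seq_b[i - 1])
            i -= 1
    return M[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("CGTGAATTCAT", "GACTTAC")
print(score)  # -1
print(top)
print(bottom)
```

Both output strings have the same length, and removing the gap characters recovers the original sequences, which is a convenient sanity check on any trace back implementation.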

Multiple Sequence Alignment (MSA)
 An MSA is basically an alignment of more than two sequences
 An MSA tells us about the similarity among multiple sequences
Uses of MSA:
 To characterize protein families by identifying shared regions of homology in a multiple sequence alignment
 Determination of the consensus sequence of several aligned sequences
 Consensus sequences can help to develop a “sequence fingerprint” (motif), which allows the identification of members of a distantly related protein family
 An MSA can help reveal biological facts about proteins, such as through analysis of the secondary/tertiary structure

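Determining a consensus sequence from an existing MSA can be sketched as a simple column-wise majority vote. This is only one possible rule, chosen for illustration; the function name and the three aligned sequences are hypothetical, and ties are resolved in favor of the symbol that appears first in the column.

```python
from collections import Counter


def consensus(aligned: list) -> str:
    """Most frequent symbol in each column of an already aligned set of sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned):
        symbol, _count = Counter(column).most_common(1)[0]
        result.append(symbol)
    return "".join(result)


# Hypothetical alignment of three short sequences ('-' marks a gap)
msa = [
    "ACGT-A",
    "ACGTTA",
    "ACCTTA",
]
print(consensus(msa))  # ACGTTA
```

Real tools usually apply a frequency threshold or ambiguity codes instead of a plain majority, so this should be read as a sketch of the idea rather than a standard definition.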
Multiple Sequence Alignment (MSA)
Types of MSA
Dynamic Programming Approach
 Computes an optimal alignment for a given score function
 Because of its high running time, it is not typically used in practice
Progressive Alignment method
 This approach repeatedly aligns two sequences, two alignments, or a sequence with an alignment
 This method is also called the hierarchical or tree method
Iterative refinement method
 Works similarly to progressive methods but repeatedly realigns the initial sequences as well as adding new sequences to the growing MSA

MSA-Progressive Alignment method
 The most widely used approach
 Builds up a final MSA by combining pairwise alignments, beginning with the most similar pair and progressing to the most distantly related
 Progressive alignment methods require two stages:
     A first stage in which the relationships between the sequences are represented as a tree, called a guide tree
     A second stage in which the MSA is built by adding the sequences sequentially to the growing MSA according to the guide tree

Multiple Sequence Alignment (MSA)
MSA Using CLUSTALW
 Works by progressive alignment
 The most closely related sequences are aligned first, and then additional sequences and groups of sequences are added, guided by the initial alignments
 Uses alignment scores to produce a phylogenetic tree

MSA-Iterative refinement method
 It works similarly to the progressive alignment method, but with an additional refinement step
 Refinement step: once a new sequence is added to the alignment, the initially aligned sequences are realigned in order to obtain the best alignment

Probabilistic Foundations of Sequence Alignment

Pairwise Sequence Alignment
12 pages
Bioinfo-Ders-7-ALLIGNMENT_1
No ratings yet
Bioinfo-Ders-7-ALLIGNMENT_1
55 pages
Alignments & Phylogenetic Trees: Lesk, A. 2 Ed
No ratings yet
Alignments & Phylogenetic Trees: Lesk, A. 2 Ed
18 pages
W03_Pairwise
No ratings yet
W03_Pairwise
55 pages
PCB Lect02 Pairwise Allign
No ratings yet
PCB Lect02 Pairwise Allign
51 pages
Bioinformatics Pairwise Alignment
No ratings yet
Bioinformatics Pairwise Alignment
128 pages
Introduction To Bioinformatics Presentation
No ratings yet
Introduction To Bioinformatics Presentation
13 pages
Sequence Alignment (Chapter 6) : The Biological Problem
No ratings yet
Sequence Alignment (Chapter 6) : The Biological Problem
44 pages
CE6068 Lecture 5
No ratings yet
CE6068 Lecture 5
83 pages
B.I Sec 4.
No ratings yet
B.I Sec 4.
18 pages
lec-02
No ratings yet
lec-02
103 pages
Bioinformatics Basics PDF
No ratings yet
Bioinformatics Basics PDF
10 pages
Bioinfo Notes 2
No ratings yet
Bioinfo Notes 2
9 pages
Pairwise Sequence Alignment
No ratings yet
Pairwise Sequence Alignment
10 pages
Multiple Sequence Alignment Black and White
No ratings yet
Multiple Sequence Alignment Black and White
2 pages