Week 4

This document discusses pairwise sequence alignment and dynamic programming algorithms used for sequence alignment. It provides the following key points: 1. Dynamic programming algorithms allow efficient exploration of all possible sequence alignments to find the optimal alignment without exploring non-optimal alignments. 2. The Needleman-Wunsch algorithm is an example of a dynamic programming approach that can find the globally optimal alignment of two sequences by filling a score matrix. 3. Filling the score matrix proceeds by considering all possible ways to align each additional residue pair, accounting for match, mismatch and gap penalties to optimize the total alignment score.

Uploaded by

Nurullah Mertel

Bioinformatics

Pairwise Sequence Alignment

Assoc. Prof. Dr. Gazi Erkan BOSTANCI

Slides are mainly based on ‘Understanding Bioinformatics’ by Marketa Zvelebil and Jeremy O. Baum
• Given two sequences, and allowing gaps to be inserted, it is possible to
construct a very large number of alignments.

• Of these, there will be an optimal alignment, which in the ideal case
perfectly identifies the true equivalences between the sequences.
• However, there will be many alternative alignments with varying
degrees of error that could potentially be seriously misleading.

• Furthermore, the fact that an alignment can be constructed for any
two sequences, even ones with no meaningful equivalences, has the
potential to be even more misleading.

• Therefore, all useful methods of sequence alignment must not only
generate alignments but also be able to compare them in a
meaningful way and to provide an assessment of their significance.
• The number of alternative alignments is so great, however, that efficient
methods are required to determine those with optimal scores. Fortunately,
algorithms have been derived that can be guaranteed to identify the optimal
alignment between two sequences for a given scoring scheme.
• As long as only single proteins or genes, or small segments of
genomes, are aligned, these methods can be applied with ease on
today’s computers.

• When searching for alignments of a query sequence with a whole
database of sequences it is usual practice to use more approximate
methods that speed up the search.
• Finding the best-scoring alignment between two sequences does not
guarantee the alignment has any scientific validity. Ways must be
found to discriminate between fortuitously good alignments and
those due to a real evolutionary relationship.

• The large number of complete genome sequences has led to
increased interest in aligning very long sequences such as whole
genomes and chromosomes.

• These applications require a number of approximations and
techniques to increase the speed and reduce the storage
requirements.
Dynamic Programming Algorithms
• For any given pair of sequences, if gaps are allowed there is a large
number of possibilities to consider in determining the best-scoring
alignment.
• For example, two sequences of length 1000 have approximately 10^600
different alignments, vastly more than there are particles in the
universe! Given the number and length of known sequences it would
seem impossible to explore all these possibilities.
• Nevertheless, a class of algorithms has been introduced that is able to
efficiently explore the full range of alignments under a variety of
different constraints. They are known as dynamic programming
algorithms, and efficiently avoid needless exploration of the majority
of alignments that can be shown to be nonoptimal.
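
The order of magnitude quoted above can be checked with a few lines of code. The sketch below assumes one simple counting convention, the binomial coefficient C(2n, n) for two sequences of length n; the slides do not say how the figure was obtained, so this convention and the function name are illustrative assumptions.

```python
import math


def alignment_count_estimate(n: int) -> int:
    """Binomial coefficient C(2n, n), used here as an assumed estimate of
    the number of gapped alignments of two sequences of length n."""
    return math.comb(2 * n, n)


count = alignment_count_estimate(1000)
digits = len(str(count))
print(f"C(2000, 1000) has {digits} digits, so it is of order 10^{digits - 1}")
```

Python integers have arbitrary precision, so the count is exact rather than a floating-point approximation.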
• The key property of dynamic programming is that the problem can be
divided into many smaller parts. Consider the following alignment:

X1 X2 … Xu   Xu+1 … Xv   Xv+1 … XL
Y1 Y2 … Yu   Yu+1 … Yv   Yv+1 … YL

in which the subscripts u, v, etc. refer to alignment positions rather
than residue types, so that Xu, Yv, and so on each correspond to a
residue or to a gap.
The alignment has been divided into three parts, with positions labeled
1 -> u, u + 1 -> v, and v + 1 -> L.
Because scores for the individual positions are added together, the
score of the whole alignment is the sum of the scores of the three
parts; that is, their contributions to the score are independent. Thus,
the optimal global alignment can be reduced to the problem of
determining the optimal alignments of smaller sections.
• A corollary to this is that the global optimal alignment will not contain parts that
are not themselves optimal. While affine gap penalties require a slightly more
sophisticated argument, essentially the same property holds true for them as
well.
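
The additivity argument can be written out explicitly. In the sketch below, s(X_k, Y_k) stands for the score of alignment column k (a substitution score, or a gap penalty under a linear gap scheme); this per-column notation is introduced here for illustration and is not taken from the slides.

```latex
S_{\text{total}} = \sum_{k=1}^{L} s(X_k, Y_k)
  = \sum_{k=1}^{u} s(X_k, Y_k)
  + \sum_{k=u+1}^{v} s(X_k, Y_k)
  + \sum_{k=v+1}^{L} s(X_k, Y_k)
```

Each of the three sums depends only on its own columns, so improving one part can never worsen another. This is why an optimal alignment cannot contain a non-optimal part.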

• Starting with sufficiently short sub-sequences, for example the first residue of
each sequence, the optimal alignment can easily be determined, allowing for all
possible gaps.
• Subsequently, further residues can be added to this, at most one from each
sequence at any step. At each stage, the previously determined optimal
subsequence alignment can be assumed to persist, so only the score for adding
the next residue needs to be investigated.

• A worked example later in the section will make this clear. In this way, the optimal
global alignment can be grown from one end of the sequence.

• As an alignment of two sequences will consist of pairs of aligned residues, a
rectangular matrix can conveniently represent these, with rows corresponding to
the residues of one sequence, and columns to those of the other.
• Until the global optimal alignment has been obtained, it is not known which
actual residues are aligned. All possibilities must be considered or the
optimal alignment could be missed. This is not as impossible as it might
seem.
• Saul Needleman and Christian Wunsch published the original dynamic
programming application in this field in 1970. Since then, many variations
and improvements have been made, some of which will be described here.
There have been three different motivations for developing these
modifications:
• Firstly, global and local alignments require slightly different algorithms.
• Secondly, but less commonly, certain gap-penalty functions and the desire to
optimize scoring parameters have resulted in further new schemes.
• Lastly, especially in the past, the computational requirements of the algorithms
prevented some general applications. For example, the basic technique in a standard
implementation requires computer memory proportional to the product m*n for two
sequences of length m and n. Some algorithms have been proposed that reduce this
demand considerably.
Optimal global alignments are produced using efficient variations of the Needleman–Wunsch algorithm
• We will introduce dynamic programming methods by describing their
use to find optimal global alignments. Needleman and Wunsch were
the first to propose this method, but the algorithm given here is not
their original one, because significantly faster methods of achieving
the same goal have since been developed.

• The problem is to align sequence x (x1x2x3…xm) and sequence y
(y1y2y3…yn), finding the best-scoring alignment in which all residues of
both sequences are included. The score will be assumed to be a
measure of similarity, so that the highest score is desired.
• The key concept in all these algorithms is the matrix S of optimal
scores of subsequence alignments. The matrix has (m + 1) rows
labeled 0 -> m and (n + 1) columns labeled 0 -> n.

• The rows correspond to the residues of sequence x, and the columns
to those of sequence y. We shall use as a working example the
alignment of the sequences x = THISLINE and y = ISALIGNED, with the
BLOSUM-62 substitution matrix as the scoring matrix.
• The BLOSUM-62 substitution matrix, with scores in half bits.
• Scores that would be given to identical matched residues are in blue;
positive scores for nonidentical matched residues are in red.

• The latter represent pairs of residues for which substitutions were
observed relatively often in the aligned reference sequences.
• Because the sequences are small they can be aligned manually, and
so we can see that the optimal alignment is:

THIS-LI-NE-
--ISALIGNED

• This alignment might not produce the optimal score if the gap penalty
were set very high relative to the substitution matrix values, but in
this case it could be argued that the scoring parameters would then
not be appropriate for the problem.

• In the matrix in figure below, the element Si,j is used to store the score
for the optimal alignment of all residues up to xi of sequence x with
all residues up to yj of sequence y.
• The initial stage of filling in the dynamic programming matrix to find
the optimal global alignment of the two sequences THISLINE and ISALIGNED.
• The initial stage of filling in the matrix depends only on the linear
gap penalty with E set to –8.
• The arrows indicate the cell(s) to which each cell value contributes.
• The sequences (x1x2x3…xi) and (y1y2y3…yj) with i < m and j < n are called
subsequences. Column Si,0 and row S0,j correspond to the alignment of the
first i or j residues with the same number of gaps. Thus, element S0,3 is the
score for aligning sub-sequence y1y2y3 with a gap of length 3.

• To fill in this matrix, one starts by aligning the beginning of each sequence;
that is, in the extreme upper left-hand corner.

• The elements Si,0 and S0,j are easy to fill in, because there is only one possible
alignment available. Si,0 represents the alignment of x1x2…xi against i gap
positions, and S0,j the alignment of y1y2…yj against j gap positions.
• We will start by considering a linear gap penalty g(ngap) = –8ngap for a gap
of ngap residues, giving the scores of Si,0 and S0,j as –8i and –8j,
respectively. This starting point with numerical values inserted into
the matrix is illustrated in figure above.
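
The initialization step can be sketched in a few lines. This is a minimal illustration, assuming the linear gap penalty of –8 per gap position used in the working example; the function name and the use of `None` for cells not yet computed are choices made here, not part of the slides.

```python
def init_global_matrix(x, y, gap=-8):
    """Return an (m+1) x (n+1) matrix with only row 0 and column 0 filled in.

    Cells that still have to be computed are left as None.
    """
    m, n = len(x), len(y)
    S = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        S[i][0] = gap * i  # x1..xi aligned against i gap positions
    for j in range(n + 1):
        S[0][j] = gap * j  # y1..yj aligned against j gap positions
    return S


S = init_global_matrix("THISLINE", "ISALIGNED")
print("row 0:   ", S[0])
print("column 0:", [row[0] for row in S])
```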
• The other matrix elements are filled in according to simple rules that
can be understood by considering a process of adding one position at
a time to the alignment.
• There are only three options for any given position:
• a pairing of residues from both sequences,
• and the two possibilities of a residue from one sequence aligning with a gap
in the other.
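• These three options can be written as:

$$
\begin{pmatrix} x_i \\ y_j \end{pmatrix}, \qquad
\begin{pmatrix} - \\ y_j \end{pmatrix}, \qquad
\begin{pmatrix} x_i \\ - \end{pmatrix}
$$

• The scores associated with these are s(xi,yj), g, and g, respectively.

• The value of s(xi,yj) is given by the element sa,b of the substitution
score matrix, where a is the residue type of xi and b is the residue
type of yj.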
• The change in notation is solely to improve the clarity of the following
equations.
• Consider the evaluation of element S1,1, so that the only residues that appear
in the alignment are x1 and y1. The left-hand possibility of the three
possibilities could only occur starting from S0,0, as all other alignments will
already contain at least one of these two residues. The middle possibility can
only occur from S1,0 because it requires an alignment that contains x1 but not
y1. Similar reasoning shows that the right-hand possibility can only occur from
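S0,1. The three possible alignments have the following scores:

$$
\begin{aligned}
S_{0,0} + s(x_1, y_1) &= 0 + s(\mathrm{T}, \mathrm{I}) = -1 \\
S_{1,0} + g &= -8 - 8 = -16 \\
S_{0,1} + g &= -8 - 8 = -16
\end{aligned}
$$

where s(T,I) has been obtained from BLOSUM-62. Of these alternatives, the
optimal one is clearly the first. Hence in figure below, S1,1 = –1. Because S1,1 has
been derived from S0,0 an arrow has been drawn linking them in the figure.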
• An identical argument can be made to
construct any element of the matrix
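from three others, using the formula:

$$
S_{i,j} = \max
\begin{cases}
S_{i-1,j-1} + s(x_i, y_j) \\
S_{i,j-1} + g \\
S_{i-1,j} + g
\end{cases}
$$

• The maximum (“max”) implies that we are using a similarity score. The
figure illustrates this formula in the layout of the matrix.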
• Note that it is possible for more than
one of the three alternatives to give the
same optimal score, in which case
arrows are drawn for all optimal
alternatives.
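
The fill step can be sketched as follows for the working example. The table `SUB` holds BLOSUM-62 entries only for the residue pairs that occur in THISLINE and ISALIGNED; it was transcribed by hand for this illustration and should be checked against a full matrix before any reuse. The names `SUB`, `COLS`, `s` and `fill_global` are assumptions made here.

```python
# BLOSUM-62 scores for the residue pairs of this example only (hand-copied).
# Rows: residues occurring in x; columns (COLS): residues occurring in y.
COLS = "ISALGNED"
SUB = {
    "T": (-1, 1, 0, -1, -2, 0, -1, -1),
    "H": (-3, -1, -2, -3, -2, 1, 0, -1),
    "I": (4, -2, -1, 2, -4, -3, -3, -3),
    "S": (-2, 4, 1, -2, 0, 1, 0, 0),
    "L": (2, -2, -1, 4, -4, -3, -3, -4),
    "N": (-3, 1, -2, -3, 0, 6, 0, 1),
    "E": (-3, 0, -1, -3, -2, 0, 5, 2),
}


def s(a, b):
    """Substitution score for residue a (from x) against residue b (from y)."""
    return SUB[a][COLS.index(b)]


def fill_global(x, y, gap):
    """Fill the global alignment score matrix with a linear gap penalty."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = gap * i
    for j in range(1, n + 1):
        S[0][j] = gap * j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i paired with y_j
                S[i][j - 1] + gap,                        # y_j against a gap
                S[i - 1][j] + gap,                        # x_i against a gap
            )
    return S


S = fill_global("THISLINE", "ISALIGNED", gap=-8)
print("S[1][1] =", S[1][1], " S[8][8] =", S[8][8], " S[8][9] =", S[8][9])
```

Python strings are indexed from 0, so residue xi is `x[i - 1]` in the code.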
• The dynamic programming matrix used to find the optimal global alignment of
the two sequences THISLINE and ISALIGNED.
• (A) The completed matrix using the BLOSUM-62 scoring matrix and a linear gap
penalty, with E set to –8. The red arrows indicate steps used in the traceback of
the optimal alignment.
• (B) The optimal alignment returned by these calculations, which has a score of –4.
• We now have a matrix of scores for optimal alignments of many
sub-sequences, together with the global sequence alignment score. This is
given by the value of Sm,n, which in this case is S8,9 = –4.

• Note that this is not necessarily the highest score in the matrix, which
in this case is S8,8 = 4, but only Sm,n includes the information from both
complete sequences.

• For each matrix element we know the element(s) from which it was
directly derived. Arrows in the figures are used to indicate this
information.
• We can use the information on the derivation of each element to obtain
the actual global alignment that produced this optimal score by a process
called traceback.
• Beginning at Sm,n we follow the arrows back through the matrix to the
start (S0,0). Thus, having filled the matrix elements from the beginning of
the sequences, we determine the alignment from the end of the
sequences.
• At each step, we can determine which of the three alternatives given in
the equation for Si,j has been applied, and add it to our alignment. If Si,j
has a diagonal arrow from Si– 1,j – 1, that implies the alignment will contain
xi aligned with yj. Vertical arrows imply a gap in sequence x aligning with
a residue in sequence y, and vice versa for horizontal arrows.
• The traceback arrows involved in the optimal global alignment are shown
in red. When tracing back by hand, care must be taken, as it is easy to
make mistakes, especially by applying the results to residues xi – 1 and yj – 1
instead of xi and yj.
• The traceback information is often stored efficiently in computer
programs, for example using three bits to represent the possible
origins of each matrix element. If a bit is set to zero, that path was not
used, with a value of one indicating the direction. Such schemes allow
all this information to be easily stored and analyzed to obtain the
alignment paths.
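
One possible way to store the traceback information in three bits per cell is sketched below, together with a traceback that reports a single optimal alignment. The bit names, the hand-copied BLOSUM-62 excerpt `SUB`, and the tie-breaking order (pairing first) are assumptions made for this illustration. Moves are described by which index decreases, so the code does not depend on how the matrix is drawn.

```python
# BLOSUM-62 scores for the residue pairs of this example only (hand-copied).
COLS = "ISALGNED"
SUB = {
    "T": (-1, 1, 0, -1, -2, 0, -1, -1),
    "H": (-3, -1, -2, -3, -2, 1, 0, -1),
    "I": (4, -2, -1, 2, -4, -3, -3, -3),
    "S": (-2, 4, 1, -2, 0, 1, 0, 0),
    "L": (2, -2, -1, 4, -4, -3, -3, -4),
    "N": (-3, 1, -2, -3, 0, 6, 0, 1),
    "E": (-3, 0, -1, -3, -2, 0, 5, 2),
}

# One bit per possible origin of a cell: (i-1, j-1), (i-1, j) and (i, j-1).
PAIR, X_ONLY, Y_ONLY = 1, 2, 4


def s(a, b):
    return SUB[a][COLS.index(b)]


def align_global(x, y, gap):
    """Return (score, aligned x, aligned y) for one optimal global alignment."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    P = [[0] * (n + 1) for _ in range(m + 1)]  # traceback bits per cell
    for i in range(1, m + 1):
        S[i][0], P[i][0] = gap * i, X_ONLY
    for j in range(1, n + 1):
        S[0][j], P[0][j] = gap * j, Y_ONLY
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cand = {
                PAIR: S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                X_ONLY: S[i - 1][j] + gap,
                Y_ONLY: S[i][j - 1] + gap,
            }
            S[i][j] = max(cand.values())
            # Set a bit for every origin that reaches the optimal score.
            P[i][j] = sum(bit for bit, v in cand.items() if v == S[i][j])
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if P[i][j] & PAIR:        # x_i aligned with y_j
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif P[i][j] & X_ONLY:    # x_i aligned with a gap
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:                     # y_j aligned with a gap
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return S[m][n], "".join(reversed(ax)), "".join(reversed(ay))


for gap in (-8, -4):
    score, ax, ay = align_global("THISLINE", "ISALIGNED", gap)
    print(f"gap {gap}: score {score}\n  {ax}\n  {ay}")
```

A cell with more than one bit set marks a point where several optimal alignments branch; this sketch follows only the first matching bit.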

• Note that there may be more than one optimal alignment if at some
point along the path during traceback an element is encountered that
was derived from more than one of the three possible alternatives.

• The algorithm does not distinguish between these possible
alignments, although there may be reasons for preferring one to the
others.
• Such preference would normally be justified by knowledge of the molecular
structure or function. Most programs will arbitrarily report just one single
alignment.
• The alignment given by the traceback is not the one we expected, in that it
contains no gaps.

• The carboxy-terminal aspartic acid residue (D) in sequence y is aligned with a
gap only because the two sequences are not the same length. We can readily
understand this outcome if we consider our chosen gap penalty of 8 in the
light of the BLOSUM-62 substitution matrix.
• The worst substitution score in this matrix is –4, only half the size of the gap
penalty. Also, many of the scores for aligning identical residues are only 4 or 5.
This means that if we set such a high gap penalty, a gap is unlikely to be
present in an optimal alignment using this scoring matrix. In these
circumstances, gaps will occur if the sequences are of different length and also
possibly in the presence of particular residues such as tryptophan or cysteine
which have higher scores.
• If instead we use a linear gap penalty g(ngap) = –4ngap, the situation changes, as
shown in Figure below, which gives the optimal alignment we expected.
Because the gap penalty is less severe, gaps are more likely to be introduced,
resulting in a different alignment and a different score.

• In this particular case, four additional gaps occur, two of which occur within the
sequences. The overall alignment score is 7, but this alignment would have
scored –13 with the original gap penalty of 8.

• This example illustrates the need to match the gap penalty to the substitution
matrix used. However, care must be taken in matching these parameters, as the
performance also depends on the properties of the sequences being aligned.
Different parameters may be optimal when looking at long or short sequences,
and depending on the expected sequence similarity.
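
The scores quoted for the two alignments can be re-derived column by column. The sketch below assumes the alignment strings shown in the code, which were reconstructed from the scores and gap counts stated in the text, and a hand-copied set of the BLOSUM-62 values needed for them. The names `PAIR_SCORE` and `score_alignment` are illustrative.

```python
# BLOSUM-62 values for the residue pairs in the two alignments (hand-copied).
PAIR_SCORE = {
    ("T", "I"): -1, ("H", "S"): -1, ("I", "A"): -1, ("S", "L"): -2,
    ("L", "I"): 2, ("I", "G"): -4,
    ("I", "I"): 4, ("S", "S"): 4, ("L", "L"): 4, ("N", "N"): 6, ("E", "E"): 5,
}


def score_alignment(ax, ay, gap):
    """Sum the column scores of an alignment under a linear gap penalty."""
    total = 0
    for a, b in zip(ax, ay):
        total += gap if "-" in (a, b) else PAIR_SCORE[(a, b)]
    return total


ungapped = ("THISLINE-", "ISALIGNED")     # one end gap only
gapped = ("THIS-LI-NE-", "--ISALIGNED")   # five gap positions

for gap in (-8, -4):
    print(f"gap {gap}: ungapped {score_alignment(*ungapped, gap)}, "
          f"gapped {score_alignment(*gapped, gap)}")
```

Under the penalty of 8 the mostly ungapped alignment wins, and under the penalty of 4 the gapped alignment wins, which is the trade-off described above.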
• Optimal global alignment of two sequences, except for a change in gap
scoring. The linear gap penalty using a value of –4 for the parameter E.
• (A) The completed matrix using the BLOSUM-62 scoring matrix.
• (B) The optimal alignment, which has a score of 7.
Local and suboptimal alignments can be produced by making small modifications to the dynamic programming algorithm
• Often we do not expect the whole of one sequence to align well with
the other. For example, the proteins may have just one domain in
common, in which case we want to find this high-scoring zone,
referred to as a local alignment.

• In a global alignment, those regions of the sequences that differ
substantially will often obscure the good agreement over a limited
stretch. The local alignment will identify these stretches while
ignoring the weaker alignment scores elsewhere.
• It turns out that a very similar dynamic programming algorithm to that
described above for global alignments can obtain a local alignment.

• Smith and Waterman first proposed this method. However, it should be noted
that the method presented here requires a similarity-scoring scheme that has
an expected negative value for random alignments and positive value for highly
similar sequences.

• Most of the commonly used substitution matrices fulfill this condition. Note
that the global alignment schemes have no such restriction, and can have all
substitution matrix scores positive.

• Under such a scheme, scores will grow steadily larger as the alignment gets
larger, regardless of the degree of similarity, so that long random alignments
will ultimately be indistinguishable by score alone from short significant ones.
• The key difference in the local alignment algorithm from the global alignment
algorithm set out above is that whenever the score of the optimal
sub-sequence alignment is less than zero it is rejected, and that matrix
element is set to zero.

• The scoring scheme must give a positive score for aligning (at least some)
identical residues. We would expect to be able to find at least one such match
in any alignment worth considering, so that we can be sure that there should
be some positive alignment scores.

• Another algorithmic difference is that we now start traceback from the
highest-scoring matrix element wherever it occurs.
• The extra condition on the matrix elements means that the values of
Si,0 and S0,j are set to zero, as was the case for global alignments
without end gap penalties. The formula for the general matrix
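element Si,j with a general gap penalty function g(ngap) is

$$
S_{i,j} = \max
\begin{cases}
S_{i-1,j-1} + s(x_i, y_j) \\
\max_{1 \le n_{gap} \le j} \left[ S_{i,j-n_{gap}} + g(n_{gap}) \right] \\
\max_{1 \le n_{gap} \le i} \left[ S_{i-n_{gap},j} + g(n_{gap}) \right] \\
0
\end{cases}
$$
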
• Note that the equation only differs from earlier equation by the
inclusion of the zero.
• Figures below show the optimal local alignments for our usual example
in the two cases of linear gap penalties g(ngap) = –8ngap and –4ngap,
respectively. Both result in removal of the differing ends of the
sequences.

• In the first case, the higher gap penalty forces an alignment of serine
(S) and alanine (A) in preference to adding a gap to reach the identical
IS sub-sequence. Lowering the gap penalty in this instance improves
the result to give the local alignment we would expect.
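
A minimal sketch of the local alignment calculation for the working example follows. It specializes the general gap function to the linear penalties used here, so each gap term looks back only one cell. The hand-copied BLOSUM-62 excerpt `SUB` and the function names are assumptions made for this illustration.

```python
# BLOSUM-62 scores for the residue pairs of this example only (hand-copied).
COLS = "ISALGNED"
SUB = {
    "T": (-1, 1, 0, -1, -2, 0, -1, -1),
    "H": (-3, -1, -2, -3, -2, 1, 0, -1),
    "I": (4, -2, -1, 2, -4, -3, -3, -3),
    "S": (-2, 4, 1, -2, 0, 1, 0, 0),
    "L": (2, -2, -1, 4, -4, -3, -3, -4),
    "N": (-3, 1, -2, -3, 0, 6, 0, 1),
    "E": (-3, 0, -1, -3, -2, 0, 5, 2),
}


def s(a, b):
    return SUB[a][COLS.index(b)]


def local_align(x, y, gap):
    """Return (score, aligned x, aligned y) for one optimal local alignment."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay zero
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                0,                                        # reject negative scores
                S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                S[i][j - 1] + gap,
                S[i - 1][j] + gap,
            )
            if S[i][j] > best:
                best, bi, bj = S[i][j], i, j
    # Trace back from the highest-scoring cell until a zero is reached.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


for gap in (-8, -4):
    score, ax, ay = local_align("THISLINE", "ISALIGNED", gap)
    print(f"gap {gap}: score {score}\n  {ax}\n  {ay}")
```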
• The dynamic programming calculation for determining the optimal local
alignment of the two sequences THISLINE and ISALIGNED.
• (A) The completed matrix using the BLOSUM-62 scoring matrix with a
linear gap penalty, with E set to –8.
• (B) The optimal alignment, determined by the highest-scoring element,
which has a score of 12.
• Optimal local alignment calculation with a linear gap penalty with E
set to –4.
• (A) The completed matrix for determining the optimal local alignment
of THISLINE and ISALIGNED using the BLOSUM-62 scoring matrix.
• (B) The optimal alignment, identified by the highest scoring element
in the entire matrix, which has a score of 19.
• The problem with dynamic programming methods is that despite
their efficiency they can place heavy demands on computer memory
and take a long time to run.

• The speed of calculation is no longer as serious a barrier as it has
been in the past, but the problem of insufficient computer memory
persists, particularly as there are now many very long sequences,
including those of whole genomes, available for comparison and
analysis.
• Some modifications of the basic dynamic programming algorithm
have been made that reduce the memory and time demands:

• One way of reducing memory requirements is by storing not the
complete matrix but only the two rows required for calculations.

• However, to recover the alignment from such a calculation takes
longer than if all the traceback information has been saved.

• By only calculating a limited region of the matrix, commonly a
diagonal band, both time and space saving can be made, although at
the risk of not identifying the correct optimal alignment.
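
The two-row idea can be sketched as follows. The function computes only the optimal global score, not the alignment, and keeps just the previous and current rows in memory. The simple identity scoring function (2 for a match, –1 for a mismatch) and the gap penalty of –2 are illustrative assumptions, not values from the slides.

```python
def global_score_two_rows(x, y, score, gap):
    """Optimal global alignment score using two rows of memory.

    `score(a, b)` is any substitution scoring function and `gap` a linear
    gap penalty (a negative number). No traceback information is kept.
    """
    prev = [gap * j for j in range(len(y) + 1)]  # row 0 of the matrix
    for i, a in enumerate(x, start=1):
        curr = [gap * i]                         # column 0 of row i
        for j, b in enumerate(y, start=1):
            curr.append(max(
                prev[j - 1] + score(a, b),  # pair a with b
                prev[j] + gap,              # a against a gap
                curr[j - 1] + gap,          # b against a gap
            ))
        prev = curr
    return prev[-1]


def identity_score(a, b):
    """Toy scoring scheme assumed for this sketch."""
    return 2 if a == b else -1


print(global_score_two_rows("ACGT", "AGT", identity_score, gap=-2))
```

Memory use grows with the length of one sequence rather than with the product of both lengths. The price, as noted above, is that recovering the alignment itself needs extra work.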
• Often the first step in a sequence analysis is to search databases to
retrieve all related sequences. Such searches depend on making
pairwise alignments of the query sequence against all the sequences
in the databases, but because of the scale of this task, fast
approximate methods are usually used to make such searches more
practicable.

• The algorithms for two commonly used search programs, BLAST and
FASTA, make use of indexing techniques such as suffix trees and
hashing to locate short stretches of database sequences highly similar
or identical to parts of the query sequence.
• Attempts are then made to extend these to longer, ungapped local
alignments which are scored, the scores being used to identify
database sequences that are likely to be significantly similar. This
process is considerably faster than applying full-matrix dynamic
programming to each database sequence.

• At this point, both techniques revert to the more accurate methods to
examine the highest-scoring sequences, in order to determine the
optimal local alignment and score, but this is only done for a tiny
fraction of the database entries.
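
The seed-and-extend idea can be illustrated with a toy hash index. This is not the BLAST or FASTA implementation: it assumes exact k-letter matches as seeds and extends them only while residues stay identical, whereas real programs score the extension and apply further heuristics. All function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(seq, k):
    """Map every k-letter word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    return index


def find_seeds(query, subject, k):
    """Return (query position, subject position) pairs of exact k-mer matches."""
    index = build_kmer_index(subject, k)
    return [
        (q, p)
        for q in range(len(query) - k + 1)
        for p in index.get(query[q:q + k], [])
    ]


def extend_ungapped(query, subject, qpos, spos, k):
    """Extend a seed in both directions while residues are identical.

    Returns (query start, subject start, length) of the ungapped match.
    """
    left = 0
    while (qpos - left > 0 and spos - left > 0
           and query[qpos - left - 1] == subject[spos - left - 1]):
        left += 1
    right = k
    while (qpos + right < len(query) and spos + right < len(subject)
           and query[qpos + right] == subject[spos + right]):
        right += 1
    return qpos - left, spos - left, left + right


seeds = find_seeds("THISLINE", "ISALIGNED", k=2)
print("seeds:", seeds)
print("extended:", [extend_ungapped("THISLINE", "ISALIGNED", q, p, 2) for q, p in seeds])
```

Only the subject sequences that yield promising extended seeds would then be passed to the full dynamic programming calculation.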
