Sequence Alignment Presentation

Uploaded by

Munna Kumar


Sequence alignment

Sequence alignment is a way of arranging the sequences of DNA, RNA or
protein to identify regions of similarity that may be a consequence of
functional, structural or evolutionary relationships between the sequences.

The sequences are padded with gaps (dashes) so that, wherever possible,
columns contain identical characters from the sequences involved.

tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt
Sequence alignment

Sequence alignment is important for:

* prediction of function
* database searching
* gene finding
* sequence divergence
Causes for sequence (dis)similarity

mutation (substitution): a nucleotide at a certain location is replaced by
another nucleotide (e.g.: ATA → AGA)

insertion: at a certain location one new nucleotide is inserted between
two existing nucleotides (e.g.: AA → AGA)

deletion: at a certain location one existing nucleotide is deleted
(e.g.: ACTG → AC-G)

indel: an insertion or a deletion


An example of aligning text strings
Raw data:
T C A T G
C A T T G

2 matches, 0 gaps:
T C A T G
      | |
C A T T G

3 matches (2 end gaps):
T C A T G .
  | | |
. C A T T G

4 matches, 1 insertion:
T C A - T G
  | |   | |
. C A T T G

4 matches, 1 insertion:
T C A T - G
  | | |   |
. C A T T G
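
The match counts quoted for the TCATG / CATTG alignments can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `alignment_stats` is hypothetical, and it assumes that both "-" and "." mark a gap column.

```python
def alignment_stats(top, bottom, gap_chars="-."):
    """Count (matches, mismatches, gap columns) for two aligned strings.

    Assumption: "-" marks an internal gap and "." an end gap, and both
    strings have already been padded to the same length.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a in gap_chars or b in gap_chars:
            gaps += 1          # at least one side is a gap
        elif a == b:
            matches += 1       # identical characters in this column
        else:
            mismatches += 1    # two different characters
    return matches, mismatches, gaps


print(alignment_stats("TCATG", "CATTG"))    # ungapped comparison
print(alignment_stats("TCA-TG", ".CATTG"))  # one internal gap, one end gap
```

Each call returns a (matches, mismatches, gaps) tuple, so the candidate alignments can be compared side by side.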
Terminologies of sequence comparison

* Sequence identity -- exactly the same amino acid or nucleotide in the
same position.

* Sequence similarity -- substitutions between residues with similar
chemical properties.

* Sequence homology -- a general term that indicates evolutionary
relatedness among sequences; homology is usually inferred from the
percentage identity (or similarity) of the aligned sequences.

* Pairwise alignment -- used to find the best-matching local (piecewise)
or global alignment of two query sequences. Pairwise alignment compares
only two sequences at a time.

* Multiple sequence alignment -- tries to align all the sequences in a
given query set.
The procedure of comparing two (pairwise alignment) or more (multiple
alignment) sequences is to search for a series of individual
characters or patterns that are in the same order in the
sequences. Typically, the purpose is to find homologues
(relatives) of a gene or gene product in a database.
This information is useful for answering a variety of biological questions:

1. The identification of sequences of unknown structure or function.

2. The study of molecular evolution.

There are two types of alignment: local and global.
* Global alignment attempts to match as much of the sequences as
possible. Tools for global alignment are based on the Needleman-Wunsch
algorithm.

* Local alignment tries to find the regions with the highest density of
matches. Tools for local alignment are based on the Smith-Waterman
algorithm.


A global alignment between two sequences is an alignment in which all the
characters in both sequences participate in the alignment.
Global alignments are useful mostly for finding closely related sequences.
The global best fit between two sequences
Example: the sequences s = VIVALASVEGAS and t = VIVADAVIS align like:
A(s,t) = [alignment figure not recovered from the slide; its gaps were labelled "indels"]

Local alignment methods find related regions within sequences; these
regions can consist of a subset of the characters within each sequence.

Global alignment:
LGPSSKQTGKGS-SRIWDN
LN-ITKSAGKGAIMRLGDA

Local alignment:
-------TGKG--------
-------AGKG--------
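
To make the idea of a local alignment concrete, here is a minimal sketch of Smith-Waterman scoring in Python. It is an illustration rather than the slides' own method: the function name and the scoring values (+1 match, -1 mismatch, -1 per gap position) are assumptions, and only the best score is returned, not the aligned region itself.

```python
def smith_waterman_score(s, t, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score of s and t (linear gap penalty)."""
    prev = [0] * (len(t) + 1)   # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # A local alignment may start anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The two example sequences from the slide, with gap characters removed.
print(smith_waterman_score("LGPSSKQTGKGSSRIWDN", "LNITKSAGKGAIMRLGDA"))
```

The `max(0, ...)` term is what distinguishes local from global scoring: a poorly matching prefix is dropped instead of being carried along as a penalty.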
Methods of pairwise alignment
* Dot matrix analysis
* The dynamic programming (DP) algorithm
* Word methods
Dot matrix analysis
* Dot matrix analysis is a method for comparing two sequences to look
for possible alignments (Gibbs and McIntyre 1970).
* Dot plots are two-dimensional graphs showing a comparison of two
sequences. The two axes, X and Y, represent the two sequences being
compared. Wherever a base or residue on one axis coincides with a base
or residue on the other axis, the cell is marked with a dot. Any region
of similarity is revealed by a diagonal row of dots. Isolated dots not
on a diagonal represent random matches.
* Assume that we have to compare the following sequences:
Sequence 1: AGCTAGGA
Sequence 2: CACTAGGC
Insert a dot in each matching cell and then scan the resulting
graph for a series of dots that form a diagonal.
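  A G C T A G G A
C . . * . . . . .
A * . . . * . . *
C . . * . . . . .
T . . . * . . . .
A * . . . * . . *
G . * . . . * * .
G . * . . . * * .
C . . * . . . . .
(* = matching cell, . = no match)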

To maximize the number of matches, the resulting alignment could then be

-AGCTAGGA-
CA-CTAGG-C
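
The dot matrix for Sequence 1 and Sequence 2 can be generated and scanned for its longest diagonal with a few lines of Python. This is a minimal sketch; the function names `dot_plot` and `longest_diagonal` are hypothetical, and "*" stands for a dot while "." stands for an empty cell.

```python
def dot_plot(seq_x, seq_y):
    """Return one string per row: '*' where seq_y[row] equals a character of seq_x."""
    return ["".join("*" if x == y else "." for x in seq_x) for y in seq_y]


def longest_diagonal(seq_x, seq_y):
    """Return (length, x_start, y_start) of the longest unbroken diagonal run."""
    best = (0, 0, 0)
    # run[i][j] = length of the diagonal run of matches ending at cell (i, j)
    run = [[0] * (len(seq_x) + 1) for _ in range(len(seq_y) + 1)]
    for i, y in enumerate(seq_y, start=1):
        for j, x in enumerate(seq_x, start=1):
            if x == y:
                run[i][j] = run[i - 1][j - 1] + 1
                if run[i][j] > best[0]:
                    best = (run[i][j], j - run[i][j], i - run[i][j])
    return best


seq1, seq2 = "AGCTAGGA", "CACTAGGC"
print("  " + " ".join(seq1))
for label, row in zip(seq2, dot_plot(seq1, seq2)):
    print(label + " " + " ".join(row))

length, x0, y0 = longest_diagonal(seq1, seq2)
print(seq1[x0:x0 + length])  # the shared segment on the longest diagonal
```

Start positions are 0-based. For these two sequences the longest diagonal corresponds to the shared segment CTAGG.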
Dynamic programming algorithm

* The approach compares every pair of characters in the two sequences
and generates an alignment that is optimal under the chosen scoring scheme.
The procedure assigns scores for matches, mismatches and gaps. It
generates a matrix of numbers that represents all possible alignments
between the sequences. The highest-scoring path through the matrix defines
an optimal alignment.
* The method can be useful in aligning nucleotide to protein sequences.
It is computationally demanding and requires large amounts of computing
power.
Algorithmic improvements as well as increasing computer capacity
make it possible to align a query sequence against a large database in a
few minutes.
The dynamic programming approach to sequence alignment always builds on
the best prior result so far.

The two sequences are aligned by inserting gaps at different locations so
as to maximize the score of the alignment.

The score is determined by the "match award", the "mismatch penalty" and
the "gap penalty". The higher the score, the better the alignment.

If both penalties are set to 0, the algorithm finds an alignment with the
maximum number of matches.
It is used to compare two DNA or protein sequences in order to predict
the similarity of their functions.

A global alignment program is based on the Needleman-Wunsch algorithm
(1970) and a local alignment program is based on the Smith-Waterman
algorithm (1981).
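
As an illustration of how dynamic programming builds on the best prior result, the following is a minimal Needleman-Wunsch sketch in Python. The function name and the scoring values (+1 match award, -1 mismatch penalty, -1 gap penalty) are assumptions chosen for the example, and when several alignments tie for the best score only one of them is returned.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Globally align s and t; return (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    # F[i][j] = best score for aligning the prefixes s[:i] and t[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,  # align s[i-1] with t[j-1]
                          F[i - 1][j] + gap,      # gap in t
                          F[i][j - 1] + gap)      # gap in s

    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub:
            a.append(s[i - 1])
            b.append(t[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))


score, top, bottom = needleman_wunsch("VIVALASVEGAS", "VIVADAVIS")
print(score)
print(top)
print(bottom)
```

Filling the matrix takes time proportional to the product of the two sequence lengths, which is why the method becomes costly for database-scale searches.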
Scoring function

The cost for aligning the two sequences s = VIVALASVEGAS and
t = VIVADAVIS, scoring +1 per match and -1 per mismatch or gap (indel), is:

M(A) = 7 matches + 2 mismatches + 3 gaps
     = 7 - 2 - 3 = 2
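
The arithmetic of the scoring function can be reproduced in code. This is a sketch under stated assumptions: the alignment figure is missing from the source, so the gapped strings below are one candidate alignment of s and t that has 7 matches, 2 mismatches and 3 gaps, not necessarily the one shown on the slide. The function name `alignment_score` is hypothetical.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-1):
    """Sum the per-column scores of two aligned, equal-length strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # indel column
        elif a == b:
            score += match      # identical residues
        else:
            score += mismatch   # substitution
    return score


# Assumed alignment (reconstructed from the stated counts, not from the slide).
aligned_s = "VIVALASVEGAS"
aligned_t = "VIVADA-V--IS"
print(alignment_score(aligned_s, aligned_t))  # 7 - 2 - 3 = 2
```

Changing the `match`, `mismatch` and `gap` arguments shows how the same alignment is rated under a different scoring scheme.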
Word methods
* Word methods, also known as k-tuple methods, are heuristic
methods that are not guaranteed to find an optimal alignment,
but are significantly more efficient than dynamic programming.
* The typical tools based on this method are BLAST and FASTA.
• BLAST
  * Heuristic method to find the highest-scoring locally optimal alignments
  * Allows multiple hits to the same sequence
  * Based on the statistics of ungapped sequence alignments
  * The statistics allow the probability of obtaining an ungapped alignment
    with a given score by chance to be estimated
  * Uses dynamic programming in a narrow region
• FASTA
  * Fast sequence search
  * Based on the dot plot
  * Identifies identical words (k-tuples)
  * Searches for significant diagonals
  * Uses dynamic programming in a narrow region
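
The word-lookup and diagonal steps listed above can be sketched as follows. This is a simplified illustration of the k-tuple idea only; it is not how BLAST or FASTA are actually implemented, and the function names and the choice k = 2 are assumptions.

```python
from collections import Counter, defaultdict


def kmer_index(seq, k):
    """Map each k-letter word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, target, k=2):
    """Count shared k-tuples per diagonal (target position minus query position)."""
    index = kmer_index(target, k)
    hits = Counter()
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], ()):
            hits[t - q] += 1   # word hits on the same diagonal share this offset
    return hits


hits = diagonal_hits("AGCTAGGA", "CACTAGGC", k=2)
print(hits.most_common(1))  # the diagonal with the most word hits
```

A real tool would then run dynamic programming only in a narrow band around the best diagonals, which is where the speed-up over a full matrix comes from.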
The substitution score matrix
Substitution: exchange of one amino acid for another.

Conservative substitution: exchange of an amino acid for another with very
similar physicochemical properties, so that the protein's properties or
function are not affected.

A substitution score matrix gives the scores for amino acid substitutions.
When calculating alignment scores, identical amino acids are given a greater
value than substitutions, and among substitutions, conservative substitutions
are given a greater value than non-conservative ones.
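
The ordering described here (identity above conservative substitution above non-conservative substitution) can be shown with a toy lookup table. The numbers below are hypothetical values invented for illustration; they are not taken from any PAM or BLOSUM matrix, and the function names are assumptions.

```python
# Hypothetical scores for illustration only; NOT real PAM or BLOSUM values.
TOY_MATRIX = {
    ("D", "D"): 4, ("I", "I"): 4, ("L", "L"): 4,  # identities
    ("I", "L"): 2,    # conservative: both residues are hydrophobic
    ("D", "I"): -3,   # non-conservative: acidic vs hydrophobic
    ("D", "L"): -3,
}


def pair_score(a, b, matrix=TOY_MATRIX):
    """Look up a symmetric substitution score (keys are stored in sorted order)."""
    return matrix[tuple(sorted((a, b)))]


def ungapped_score(s, t, matrix=TOY_MATRIX):
    """Score two equal-length sequences column by column, without gaps."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(s, t))


print(pair_score("L", "L"), pair_score("L", "I"), pair_score("L", "D"))
print(ungapped_score("LID", "LLD"))
```

In practice the table would be replaced by a full 20 x 20 matrix such as a PAM or BLOSUM matrix loaded from a trusted source.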
Two widely used substitution matrices are PAM and BLOSUM.

PAM - Point Accepted Mutation (Margaret Dayhoff)
Based on closely related proteins

BLOSUM - Blocks Substitution Matrix (Henikoff and Henikoff)
Based on conserved blocks bounded in similarity

PAM | BLOSUM
Based on global alignments of closely related proteins. | Based on local alignments.
PAM1 is calculated from comparisons of sequences with no more than 1% divergence. | BLOSUM62 is calculated from comparisons of sequences with no more than 62% identity in the blocks (more similar sequences are clustered and counted as one).
Other PAM matrices are extrapolated from PAM1. | All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.
PAM

A PAM is the substitution of one amino acid of a protein by another that is
'accepted' by evolution. This implies that within some given species, the
mutation has not only arisen but has, over time, spread to essentially the
entire species. One PAM (PAM1) is a unit of evolutionary divergence in which
1% of the amino acids have been changed (i.e. one point mutation per 100
residues).

PAM is based on the mutation rates estimated from closely related proteins
and is dominated by the amino acid mutations caused by single base changes.

PAM is used to select groups of amino acids that represent conservative
substitutions in proteins, because it summarizes the observed replacements
that have taken place while conserving the structural and functional
properties of proteins.

Thus the PAM matrix provides an empirical, experimental determination of
conserved replacements.
BLOSUM

BLOSUM is based on the observed amino acid substitutions in a large set of
more than 2000 conserved amino acid patterns called blocks. These blocks are
found in a database of protein sequences representing more than 500 families of
related proteins and act as signatures of these protein families.
Multiple sequence alignment:
Determine the best alignment between multiple (more than two) DNA sequences.

Multiple alignment is an extension of pairwise alignment to incorporate
more than two sequences into an alignment.

Multiple alignment methods try to align all of the sequences in a
specified set.

One of the most popular multiple alignment tools is CLUSTAL W.
'W' stands for 'weighted' (sequences are weighted differently).
• MSA is central to many bioinformatics applications
• Phylogenetic tree
• Motifs
• Patterns
• Structure prediction (RNA, protein)
Three-step process
1.) Construct pairwise alignments
2.) Build guide tree
3.) Progressive alignment guided by the tree
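
The first two steps of the three-step process can be sketched in a few lines. This is a simplified illustration under stated assumptions, not CLUSTAL W's actual procedure: step 1 is replaced by a plain fraction-of-differences distance on equal-length sequences instead of real pairwise alignments, step 2 uses average-linkage (UPGMA-style) clustering where CLUSTAL W is usually described as using neighbour-joining, and step 3 is not implemented. The sequences and function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def guide_tree(seqs):
    """Average-linkage clustering; returns the tree as nested tuples of names."""
    # Each key is a (sub)tree, each value the list of sequence names below it.
    clusters = {name: [name] for name in seqs}
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}

    def cluster_dist(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters into one new internal node.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))


# Hypothetical toy sequences, invented for this example.
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGTACCA", "D": "TGGAACCA"}
print(guide_tree(toy))
```

The nested tuples give the order in which a progressive aligner would merge sequences: the closest pairs first, then the resulting groups.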
Multiple alignment

