2. Sequence alignment

Chapter 2 discusses sequence alignments, which are essential for inferring relationships between biological sequences, predicting functions, and assembling larger sequence units. It covers concepts such as homology, similarity, identity, and various alignment methods including global and local alignments, as well as algorithms like Needleman-Wunsch and Smith-Waterman. The chapter also highlights the importance of database searching and tools like BLAST for large-scale sequence analysis.

Uploaded by

phamngochuyen425

Chapter 2.

Sequence Alignments

5/3/2025

Sequence alignment: Overview

• Sequence alignment provides inference for the relatedness of two sequences under study.

seq1: CATTTATTTTC
seq2: AATTTGTA

• Each aligned position is either a match or a mismatch.
• A gap, added to increase the number of matches, represents an insertion or deletion (indel).

Sequence alignment: Purpose

• Predict the function of a sequence by inference from a well-characterized sequence

seq1: CATTTATTTTC
seq2: AATTTGTA

• Infer evolutionary relationships between sequences: if two sequences share significant similarity, they likely derive from a common evolutionary origin

• Predict structural and functional motifs: active site, receptor site

Sequence alignment: Purpose

• Assembly of sequence reads into larger units such as contigs or genomes

Seq 1 The more that
Seq 2 that you read,
Seq 3 you read, the more things
Seq 4 things you will
Seq 5 will know.

Assembled sequence:
The more that you read, the more things you will know.
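The merging of overlapping reads can be sketched in a few lines. This is a minimal illustration, assuming the reads are already in the correct order and overlap exactly; the function names `overlap` and `assemble` are hypothetical, and real assemblers must also handle unordered reads and sequencing errors.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(reads: list[str]) -> str:
    """Merge reads left to right (assumes they are already ordered)."""
    contig = reads[0]
    for read in reads[1:]:
        k = overlap(contig, read)
        contig += read[k:]  # append only the part not already covered
    return contig


reads = [
    "The more that",
    "that you read,",
    "you read, the more things",
    "things you will",
    "will know.",
]
assembled = assemble(reads)
print(assembled)
```

Each shared word ("that", "you read,", "things", "will") is written once in the merged result because only the non-overlapping tail of each read is appended.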

Sequence homology, similarity and identity

• Two sequences are homologous when they share a common ancestor. Homology is a qualitative term, not a quantitative one: sequences either are homologous or are not.

• Sequence identity is the percentage of aligned residues that match exactly.

• Sequence similarity is the percentage of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity. Similarity is a quantitative term.

• For DNA, identity and similarity are the same; for proteins they differ, because distinct amino acids can still be physicochemically similar.
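The difference between identity and similarity can be made concrete with a short calculation. This is a sketch under stated assumptions: the residue groups in `SIMILAR_GROUPS` are a hypothetical, simplified classification chosen for illustration (real tools use a substitution matrix), and percentages are taken over gap-free columns only, which is one of several conventions in use.

```python
# Hypothetical, simplified physicochemical classes (an assumption for
# illustration; not taken from the slides).
SIMILAR_GROUPS = [set("AVLIM"), set("FWY"), set("KRH"), set("DE"), set("STNQ")]


def aligned_pairs(a: str, b: str) -> list[tuple[str, str]]:
    """Return the gap-free columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]


def is_similar(x: str, y: str) -> bool:
    """Identical residues, or residues in the same class, count as similar."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def percent_identity(a: str, b: str) -> float:
    pairs = aligned_pairs(a, b)
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_similarity(a: str, b: str) -> float:
    pairs = aligned_pairs(a, b)
    return 100.0 * sum(is_similar(x, y) for x, y in pairs) / len(pairs)


# Toy protein example: K/R and L/I are similar but not identical.
print(percent_identity("KVLE", "RVIE"))
print(percent_similarity("KVLE", "RVIE"))
```

For the toy pair, two of four columns are identical while all four are similar, so similarity exceeds identity, as expected for proteins.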

Sequence evolution

• Major changes: substitution, insertion, deletion

Ancestral sequence: GACTGGA
Substitution (G -> C): CACTGGA
Deletion (C): CATGGA
Substitution (G -> T): CATGTA
Insertion (T): CATGTTA

A speciation event separates the two lineages, which end as CATGTTA and CACTGGA.

Sequence alignment: which alignment is the best?

Alignment 1:
C A T G T T A
| |         |
C A C T G G A

Alignment 2:
C A - T G T T A
| |   | |     |
C A C T G G - A

Alignment 3:
C A T - G T T A
| |     |     |
C A C T G G - A
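One way to answer the question is to assign each column a score and compare totals. The sketch below scores the three candidate alignments; the scheme (match +1, mismatch -1, gap -2) is an assumption here, borrowed from the Needleman and Wunsch example in this chapter, and the function name `alignment_score` is hypothetical.

```python
def alignment_score(top: str, bottom: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum column scores of a pairwise alignment written with '-' for gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


candidates = [
    ("CATGTTA", "CACTGGA"),    # no gaps
    ("CA-TGTTA", "CACTGG-A"),  # gap after CA
    ("CAT-GTTA", "CACTGG-A"),  # gap after CAT
]
scores = [alignment_score(top, bottom) for top, bottom in candidates]
for (top, bottom), s in zip(candidates, scores):
    print(top, bottom, s)
```

Under this scheme the second alignment scores highest (0, against -1 and -2). A different gap penalty can change the ranking, which is why the scoring scheme must be stated alongside any alignment.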

Pairwise alignment: Global vs. local

• In global alignment, the two sequences to be aligned are assumed to be generally similar over their entire length.
• Global alignment is suited to closely related sequences
• Local alignment does not assume that the sequences are of similar length or similar over their entire length; it finds the local regions that share the highest level of similarity
• Local alignment is used to search for conserved regions within sequences

Sequence alignment: dynamic programming method

• Global alignment: Needleman and Wunsch algorithm

Match: +1, mismatch: -1, gap: -2

• Step 1: set up a matrix
• Step 2: fill in the matrix scores
• Step 3: trace back and identify the alignment
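The three steps can be sketched as follows. This is a minimal illustration with a linear gap penalty and the scores given above; the function name `needleman_wunsch` and the tie-breaking order in the traceback (diagonal first) are assumptions, so other equally scoring alignments may exist.

```python
def needleman_wunsch(s1: str, s2: str,
                     match: int = 1, mismatch: int = -1, gap: int = -2):
    n, m = len(s1), len(s2)

    # Step 1: set up an (n+1) x (m+1) matrix; the first row and column
    # hold cumulative gap penalties.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    # Step 2: fill each cell with the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Step 3: trace back from the bottom-right cell to the top-left cell.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s1[i - 1] == s2[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub:
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return F[n][m], "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = needleman_wunsch("CATGTTA", "CACTGGA")
print(score)
print(top)
print(bottom)
```

The returned score is the value in the bottom-right cell, which is the best score achievable for aligning the two full-length sequences under this scheme.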

Sequence alignment: dynamic programming method

[Figure: dynamic programming matrix of Sequence 1 (length n) against Sequence 2 (length m)]

C A - T G T T A
C A C T G G - A

Sequence alignment: dynamic programming method

• Match: +1, mismatch: -1, gap: -3

Scoring matrix
• A substitution matrix is a set of values that quantify the likelihood of one residue being substituted by another in an alignment.

• Substitution matrices are derived from statistical analysis of residue substitution data from sets of reliable alignments of highly related sequences.

• Scoring matrices for nucleotide sequences are relatively simple: a positive value or high score is given for a match and a negative value or low score for a mismatch.

• Scoring matrices for amino acids are more complicated because the scores reflect the physicochemical properties of amino acid residues, as well as the likelihood of certain residues being substituted among truly homologous sequences.
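A simple nucleotide scoring matrix of the kind described can be built as a lookup table. This is a sketch only; the values +1 and -1 and the function name `nucleotide_matrix` are assumptions for illustration, and amino acid matrices are instead taken from published tables rather than generated this way.

```python
from itertools import product


def nucleotide_matrix(match: int = 1, mismatch: int = -1) -> dict:
    """Map every ordered nucleotide pair to a match or mismatch score."""
    return {(a, b): (match if a == b else mismatch)
            for a, b in product("ACGT", repeat=2)}


matrix = nucleotide_matrix()

# Print the matrix as a 4 x 4 table.
print("   " + "  ".join("ACGT"))
for a in "ACGT":
    print(a, " ".join(f"{matrix[(a, b)]:2d}" for b in "ACGT"))
```

Looking up `matrix[("A", "G")]` returns the score for aligning A against G, which is how an alignment algorithm would consume the table.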

Local alignment: Smith and Waterman algorithm

• Negative scores are replaced by 0

• Traceback starts from the cell with the highest score and stops at a cell scoring 0
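The two rules above are the only changes needed to turn a global alignment into a local one. The sketch below assumes match +1, mismatch -1, and a linear gap penalty of -2; the function name `smith_waterman` and the choice to keep the first highest-scoring cell when there are ties are assumptions.

```python
def smith_waterman(s1: str, s2: str,
                   match: int = 1, mismatch: int = -1, gap: int = -2):
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; any negative score is replaced by 0.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a 0 is reached.
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


local_score, local_top, local_bottom = smith_waterman("CATTTATTTTC", "AATTTGTA")
print(local_score)
print(local_top)
print(local_bottom)
```

Because the traceback stops at 0, the result covers only the best-matching region and ignores the poorly matching ends of both sequences.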

Sequence alignment: dot plots
• Seq1: GATTCTATCTAACTA
• Seq2: GTTCTATTCTAAC

• Place a dot wherever the two sequences have a matching residue
• Connect the dots along the diagonals; diagonal runs mark regions of similarity
• Drawback: high noise
• Solution: a sliding window with a match threshold
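The effect of the sliding window can be seen by plotting the two sequences with and without filtering. This is a sketch under stated assumptions: rows correspond to Seq2 and columns to Seq1, a dot is drawn at the start of each window, and the window size and threshold of 3 are arbitrary illustrative values.

```python
def dot_plot(seq1: str, seq2: str, window: int = 1, threshold: int = 1):
    """Grid of booleans: True where a window has at least `threshold` matches."""
    rows = len(seq2) - window + 1
    cols = len(seq1) - window + 1
    grid = []
    for j in range(rows):
        row = []
        for i in range(cols):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append(matches >= threshold)
        grid.append(row)
    return grid


def render(grid) -> str:
    """Draw the grid with '*' for a dot and '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row)
                     for row in grid)


seq1 = "GATTCTATCTAACTA"
seq2 = "GTTCTATTCTAAC"

raw = dot_plot(seq1, seq2)                             # every single match
filtered = dot_plot(seq1, seq2, window=3, threshold=3)  # 3 of 3 must match

print(render(raw))
print()
print(render(filtered))
```

The unfiltered plot is crowded with isolated dots from chance matches, while the filtered plot keeps mainly the diagonal runs.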

Database similarity searching: pairwise alignment on large scale

• Database searching: a means of assigning putative functions to newly determined sequences.

• How: by pairwise alignment on a large scale, comparing a query sequence (input sequence) against thousands of sequences in the database

Database similarity searching: pairwise alignment on large scale

• Requirements:
  • Sensitivity: the ability to find as many correct hits as possible
  • Selectivity (specificity): the ability to return as few unrelated hits as possible
  • Speed: the time it takes to get results
• Approaches:
  • Exhaustive type: dynamic programming (Smith-Waterman algorithm)
  • Heuristic type: takes shortcuts by reducing the search space

Basic Local Alignment Search Tool (BLAST)

• Developed by Stephen Altschul and colleagues at NCBI in 1990
• Became one of the most popular programs for sequence analysis
• Uses a heuristic approach to align a query sequence with all sequences in the database
• Objective: find high-scoring ungapped segments among related sequences

BLAST steps
1. Break the query sequence into words (e.g., 3 amino acids or 11 nucleotides)
2. Scan the word database with each 3-residue word
3. Assume one of the words finds matches in the database
4. Calculate the sum of match scores for each word match based on a scoring matrix
5. Find the database sequence corresponding to the best word match and extend the alignment in both directions
6. Determine the high-scoring segment with a score above the threshold (e.g., 22)
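The seed-and-extend idea behind these steps can be illustrated with a toy example. This is not BLAST's actual implementation: the sketch assumes exact word matches only (no scoring-matrix neighborhood of similar words), a simple +1/-1 score, and a hypothetical `x_drop` cutoff that stops extension once the score falls too far below its best value. All function names are illustrative.

```python
from collections import defaultdict


def index_words(query: str, w: int) -> dict:
    """Step 1: break the query into overlapping words of length w."""
    table = defaultdict(list)
    for i in range(len(query) - w + 1):
        table[query[i:i + w]].append(i)
    return table


def extend(query, subject, qi, si, w, match=1, mismatch=-1, x_drop=2):
    """Step 5: extend a word hit without gaps, first right, then left."""
    score = best = w * match
    best_right = best_left = 0
    k = 0
    while qi + w + k < len(query) and si + w + k < len(subject):
        score += match if query[qi + w + k] == subject[si + w + k] else mismatch
        k += 1
        if score > best:
            best, best_right = score, k
        elif best - score > x_drop:
            break
    score = best
    k = 0
    while qi - k - 1 >= 0 and si - k - 1 >= 0:
        score += match if query[qi - k - 1] == subject[si - k - 1] else mismatch
        k += 1
        if score > best:
            best, best_left = score, k
        elif best - score > x_drop:
            break
    return best, qi - best_left, si - best_left, w + best_left + best_right


def find_hsps(query: str, subject: str, w: int = 3, threshold: int = 5):
    """Steps 2 to 6: scan the subject for word hits, extend, keep high scorers."""
    table = index_words(query, w)
    hsps = set()
    for si in range(len(subject) - w + 1):
        for qi in table.get(subject[si:si + w], []):
            hsp = extend(query, subject, qi, si, w)
            if hsp[0] >= threshold:
                hsps.add(hsp)
    # Each entry: (score, query start, subject start, segment length)
    return sorted(hsps, reverse=True)


hsps = find_hsps("GATTACA", "CCGATTACACC")
print(hsps)
```

Several word hits on the same diagonal extend to the same segment, so collecting results in a set reports that segment once.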

BLAST results

Statistical significance of BLAST search results

• The E-value (expectation value) is the number of alignments with an equal or better score that would be expected by random chance in a database search. The lower the E-value, the less likely the match is due to chance.

$E = m \times n \times P$
m: total number of residues in the database
n: number of residues in the query sequence
P: probability that an alignment is a result of random chance

E.g., $E = 10^{12} \times 100 \times 10^{-20} = 10^{-6}$
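The worked example can be checked numerically. This is a direct transcription of the simplified formula on this slide, with a hypothetical function name `e_value`; production BLAST reports E-values from a more detailed statistical model.

```python
def e_value(m: float, n: float, p: float) -> float:
    """Simplified E-value: database size x query length x chance probability."""
    return m * n * p


# Database of 10^12 residues, query of 100 residues, P = 10^-20
e = e_value(m=1e12, n=100, p=1e-20)
print(f"{e:.1e}")
```

Because E scales with database size, the same alignment becomes less significant as the database grows.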

Problems

1. Obtain the human HBA and HBB protein sequences. Perform pairwise alignment on the NCBI and EBI websites.
2. You have isolated a novel bacterial strain from a soil sample and submitted the PCR product of its 16S rRNA gene for Sanger sequencing. Now that you have the 16S rRNA gene sequence, use Blastn on NCBI to identify your isolate.
