
Multiple Sequence Alignment: Some Slides From Cuong Dang and Others

Uploaded by Cao Dũng
Copyright © All Rights Reserved



Multiple alignment?

Multiple alignment: why?
MSA is central to many bioinformatics applications:
✓ Phylogenetic tree reconstruction
✓ Structure prediction
Structure is more conserved than sequence

✓ Motif/pattern discovery
Which parts "do the same thing"?

✓ Detecting new homologies between new genes and established sequence families
MSA approaches
Dynamic programming
✓ Generalization of Needleman-Wunsch
✓ Find the alignment that maximizes a score function

Global progressive alignment
✓ Match closely related sequences first, using a guide tree
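Sum of Pairs (SP)
MSA score? The SP score of an alignment is the sum, over all columns, of the pairwise scores of every pair of sequences in that column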
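A minimal sketch of SP scoring: each column is scored by summing a pairwise score over every pair of rows, and the alignment score is the sum over columns. The values (match 10, mismatch -1, indel -2) are borrowed from the CLUSTALW example later in these slides; scoring a gap against a gap as 0 is an assumption (a common convention), and the function names are illustrative.

```python
from itertools import combinations

# Assumed scores, taken from the CLUSTALW example in these slides.
MATCH, MISMATCH, INDEL = 10, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score one pair of characters; gap vs. gap scores 0 (assumed convention)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return INDEL
    return MATCH if a == b else MISMATCH


def sp_score(alignment: list[str]) -> int:
    """Sum-of-pairs score: for each column, add the scores of all row pairs."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        total += sum(pair_score(a, b) for a, b in combinations(column, 2))
    return total


print(sp_score(["AC-T", "A-GT", "ACGT"]))  # 72
```

Here the columns score 30, 6, 6 and 30: fully conserved columns contribute three matches each, while each gapped column contributes one match and two indels.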
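DP: From pairwise to multiple

7 ways an alignment of X1 ... Xi, Y1 ... Yj and W1 ... Wk can end for 3 sequences (every non-empty choice of which sequences contribute a character: 2^3 - 1 = 7). The last column is one of:

X:  Xi  Xi  Xi  -   Xi  -   -
Y:  Yj  Yj  -   Yj  -   Yj  -
W:  Wk  -   Wk  Wk  -   -   Wk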
DP for 3 sequences

Each alignment is a path through the (three-dimensional) dynamic programming matrix

[Figure: an example alignment of three short sequences drawn as a path from Start through the 3D DP matrix]
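DP for 3 sequences
There are 7 ways to reach C[i,j,k]: one from each neighbouring cell C[i-a, j-b, k-c] with a, b, c ∈ {0, 1}, not all zero (e.g. C[i-1,j-1,k-1] or C[i-1,j,k-1])
✓ Run time: O(n^3), i.e. about 7n^3 for three sequences of length n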
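Written out, the 7-way choice corresponds to the recurrence below. It assumes sum-of-pairs column scoring with a pairwise score s and s(-,-) = 0; the notation X_i^{(a)} is introduced here only for compactness.

```latex
C[i,j,k] = \max_{(a,b,c)\,\in\,\{0,1\}^3 \setminus \{(0,0,0)\}}
  \Big( C[i-a,\; j-b,\; k-c] + S\big(X_i^{(a)},\, Y_j^{(b)},\, W_k^{(c)}\big) \Big)

\text{where } X_i^{(1)} = X_i,\; X_i^{(0)} = \text{-} \text{ (gap), and likewise for } Y_j, W_k,
\qquad S(p,q,r) = s(p,q) + s(p,r) + s(q,r),
\qquad C[0,0,0] = 0 .
```

Each of the 7 triples (a, b, c) is one of the 7 possible last columns; predecessors with a negative index are skipped. For example, (1,1,1) reaches C[i-1,j-1,k-1] and (1,0,1) reaches C[i-1,j,k-1].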
Dynamic programming for three sequences

- Step 1: Initialize F(0, 0, 0) = 0.

- Step 2: For every (i, j, v), calculate F(i, j, v) from the 7 previously calculated neighbouring elements.

- Step 3: F(p, q, t) is the optimal score for aligning sequences X, Y and Z (of lengths p, q and t). Trace back from F(p, q, t) and insert '-' into the three sequences X, Y and Z to recover the alignment.
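The three steps can be sketched as below, assuming SP column scoring with the scores from the CLUSTALW example slide (match 10, mismatch -1, indel -2) and gap vs. gap = 0. The function names and the back-pointer dictionary are illustrative choices, not part of the slides.

```python
from itertools import product

# Assumed scores (borrowed from the CLUSTALW example slide); gap vs. gap = 0.
MATCH, MISMATCH, INDEL = 10, -1, -2


def pair_score(a: str, b: str) -> int:
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return INDEL
    return MATCH if a == b else MISMATCH


def column_score(col: tuple[str, str, str]) -> int:
    """Sum-of-pairs score of a single 3-row column."""
    a, b, c = col
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)


# The 7 moves: 1 means "this sequence contributes a character", 0 means "gap".
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]


def align3(x: str, y: str, z: str) -> tuple[int, list[str]]:
    p, q, t = len(x), len(y), len(z)
    F = [[[float("-inf")] * (t + 1) for _ in range(q + 1)] for _ in range(p + 1)]
    back = {}
    F[0][0][0] = 0  # Step 1
    for i in range(p + 1):
        for j in range(q + 1):
            for v in range(t + 1):
                # Step 2: try each of the (up to) 7 predecessors.
                for a, b, c in MOVES:
                    pi, pj, pv = i - a, j - b, v - c
                    if pi < 0 or pj < 0 or pv < 0:
                        continue
                    col = (x[i - 1] if a else "-",
                           y[j - 1] if b else "-",
                           z[v - 1] if c else "-")
                    cand = F[pi][pj][pv] + column_score(col)
                    if cand > F[i][j][v]:
                        F[i][j][v] = cand
                        back[i, j, v] = (a, b, c, col)
    # Step 3: trace back from F(p, q, t), building the gapped rows right to left.
    rows = ["", "", ""]
    i, j, v = p, q, t
    while (i, j, v) != (0, 0, 0):
        a, b, c, col = back[i, j, v]
        rows = [col[r] + rows[r] for r in range(3)]
        i, j, v = i - a, j - b, v - c
    return F[p][q][t], rows


score, rows = align3("VSNS", "SNA", "AS")
print(score, *rows, sep="\n")
```

The triple loop visits (p+1)(q+1)(t+1) cells and tries 7 moves in each, which is where the 7n^3 estimate comes from.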
DP for MSA
Impractical due to its exponential running time
✓ Run time (2^k - 1)n^k for k sequences of length n

Computing an exact MSA is computationally infeasible except for a few short sequences
✓ Heuristics are used instead (progressive alignment)
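The (2^k - 1)n^k figure can be made concrete with a tiny helper. It only counts predecessor checks (n^k cells, 2^k - 1 moves each) and ignores the cost of scoring a column, which with SP scoring adds roughly a further k^2 factor; the function name is illustrative.

```python
def dp_cell_updates(n: int, k: int) -> int:
    """Predecessor checks for exact DP MSA: n^k cells times (2^k - 1) moves each."""
    return (2 ** k - 1) * n ** k


for k in (2, 3, 5, 10):
    print(f"k={k:2d}, n=100: {dp_cell_updates(100, k):.3e}")
```

Going from 3 to 10 sequences of length 100 multiplies the work by more than 10^16, which is why heuristics take over.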
Progressive multiple alignment
Start by aligning two chosen sequences
Choose a third sequence and align it against the alignment of the two previous sequences
Repeat until all sequences have been aligned
There are different options for how to choose the sequences and score the alignments
Progressive multiple alignment

A guide tree is used
✓ Pairwise alignments are used to reconstruct an initial phylogenetic tree
✓ The alignment is built progressively, starting with the most closely related sequences
✓ It follows the branching order of the phylogenetic tree


Progressive multiple alignment

CLUSTALW: a popular profile-based progressive multiple alignment program
✓ Construct pairwise alignments
✓ Build a guide tree
✓ Progressive alignment guided by the tree


CLUSTALW

Profile representation of a multiple alignment
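A simplified, unweighted profile can be computed as the frequency of each character (gap included) in every column. The DNA alphabet "ACGT-" is an assumption for illustration; ClustalW's own profiles also apply sequence weights and position-specific gap penalties, which this sketch leaves out.

```python
from collections import Counter


def profile(alignment: list[str], alphabet: str = "ACGT-") -> list[dict[str, float]]:
    """Column-wise character frequencies of an alignment (one dict per column)."""
    n = len(alignment)
    result = []
    for col in zip(*alignment):
        counts = Counter(col)
        result.append({ch: counts[ch] / n for ch in alphabet})
    return result


for column in profile(["AC", "AG", "A-", "TC"]):
    print(column)
```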


CLUSTALW

How to align two given alignments?

[Figure: alignment 1 and alignment 2, to be aligned against each other]
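One way to align two alignments is ordinary Needleman-Wunsch over columns instead of characters, where two columns are scored by the average pairwise score between their residues. This sketch assumes that averaged SP column score and the slides' example scores; ClustalW's real profile scoring (sequence weights, position-specific gap penalties) is more involved. Existing columns are never split, so gaps already placed stay in place ("once a gap, always a gap").

```python
MATCH, MISMATCH, INDEL = 10, -1, -2  # assumed scores (from the slides' example)


def pair_score(a: str, b: str) -> int:
    """Pairwise character score; gap vs. gap scores 0 (assumed convention)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return INDEL
    return MATCH if a == b else MISMATCH


def col_score(col_a: str, col_b: str) -> float:
    """Average score over all pairs (one residue from col_a, one from col_b)."""
    total = sum(pair_score(a, b) for a in col_a for b in col_b)
    return total / (len(col_a) * len(col_b))


def align_alignments(A: list[str], B: list[str]) -> list[str]:
    """Align alignment A against alignment B, treating each column as a unit."""
    ca = ["".join(c) for c in zip(*A)]  # columns of A
    cb = ["".join(c) for c in zip(*B)]  # columns of B
    gap_a, gap_b = "-" * len(A), "-" * len(B)  # all-gap columns
    n, m = len(ca), len(cb)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + col_score(ca[i - 1], gap_b)
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + col_score(gap_a, cb[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + col_score(ca[i - 1], cb[j - 1]),  # column vs column
                F[i - 1][j] + col_score(ca[i - 1], gap_b),  # gap column in B
                F[i][j - 1] + col_score(gap_a, cb[j - 1]),  # gap column in A
            )
    # Traceback, collecting merged columns from the end.
    merged = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + col_score(ca[i - 1], cb[j - 1]):
            merged.append(ca[i - 1] + cb[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + col_score(ca[i - 1], gap_b):
            merged.append(ca[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cb[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(col[r] for col in merged) for r in range(len(A) + len(B))]


print(align_alignments(["ACGT"], ["AGT"]))  # ['ACGT', 'A-GT']
```

A single sequence is just a one-row alignment, so the same function covers sequence-sequence, sequence-profile and profile-profile steps.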
CLUSTALW: Step 1 - Pairwise alignment

All pairs of sequences are aligned and a similarity matrix is generated
✓ Similarity = exact matches / sequence length (percent identity)
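A sketch of percent identity for one pair of already-aligned sequences. Which "sequence length" goes in the denominator is an assumption here: this version counts only columns where neither sequence has a gap, one common convention. The function name is illustrative.

```python
def percent_identity(s: str, t: str) -> float:
    """Percent identity of two aligned, equal-length strings.

    Assumption: the denominator counts columns where neither string has a gap.
    """
    if len(s) != len(t):
        raise ValueError("aligned strings must have equal length")
    compared = [(a, b) for a, b in zip(s, t) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


print(percent_identity("ACGT", "A-GA"))
```

For building the guide tree, a similarity like this is typically turned into a distance, for example 1 minus the identity fraction.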
CLUSTALW: Step 2 - Build guide tree

ClustalW uses the Neighbor-Joining method to build the guide tree

The guide tree roughly reflects evolutionary relationships
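A compact Neighbor-Joining sketch that returns only the tree topology as nested tuples. It uses the standard NJ Q-criterion Q(a, b) = (n - 2)d(a, b) - r(a) - r(b), where r is a row sum of the current distance matrix. Branch lengths and the rooting step that ClustalW applies to the guide tree are omitted; names are illustrative.

```python
def neighbor_joining(names: list[str], D: list[list[float]]):
    """Return the NJ tree topology as nested tuples (branch lengths omitted)."""
    nodes = list(names)
    # dist[(u, v)] holds the current distance between nodes u and v.
    dist = {(a, b): D[i][j]
            for i, a in enumerate(names) for j, b in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a, b] for b in nodes) for a in nodes}  # row sums
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * dist[a, b] - r[a] - r[b]  # NJ Q-criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = (a, b)  # new internal node joining a and b
        for c in nodes:
            if c != a and c != b:
                d = (dist[a, c] + dist[b, c] - dist[a, b]) / 2
                dist[u, c] = dist[c, u] = d
        dist[u, u] = 0.0
        nodes = [c for c in nodes if c != a and c != b] + [u]
    return tuple(nodes)


D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(list("abcde"), D))
```

With this matrix, a and b are joined first (lowest Q), and d and e end up as a pair, matching their small mutual distance.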
CLUSTALW: Step 3 - Progressive alignment

Progressive alignment guided by the tree
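"Guided by the tree" means children are always merged before their parent. A post-order walk gives that merge order; the nested-tuple tree format and the sequence names below are hypothetical, for illustration only. Each step would then be one alignment-vs-alignment merge.

```python
def merge_order(tree):
    """List the merges (left group, right group) for a binary guide tree.

    The tree is given as nested 2-tuples with sequence names (str) as leaves;
    a post-order walk guarantees every subtree is aligned before its parent.
    """
    steps = []

    def walk(node):
        if isinstance(node, str):  # a leaf: a single sequence
            return (node,)
        left, right = node
        left_group = walk(left)
        right_group = walk(right)
        steps.append((left_group, right_group))
        return left_group + right_group

    walk(tree)
    return steps


for left, right in merge_order((("X1", "X2"), ("X3", "X4"))):
    print(left, "+", right)
```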


CLUSTALW results?

- X1 = "GGATTGT"
- X2 = "GGAAGG"
- X3 = "AAGGTT"
- X4 = "AGGT"

Scores:

- C(x, x) = 10 (match score)
- C(x, y) = -1 (mismatch score, x ≠ y)
- C(x, -) = C(-, x) = -2 (indel score)
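Step 1 on this example can be sketched with a score-only Needleman-Wunsch using exactly these scores (linear indel penalty). The printed matrix is left for the reader to run rather than stated here; the function name is illustrative.

```python
from itertools import combinations

MATCH, MISMATCH, INDEL = 10, -1, -2  # C(x,x), C(x,y), C(x,-) from the slide


def nw_score(s: str, t: str) -> int:
    """Global (Needleman-Wunsch) alignment score with a linear indel penalty."""
    prev = [j * INDEL for j in range(len(t) + 1)]  # row i = 0
    for i in range(1, len(s) + 1):
        cur = [i * INDEL] + [0] * len(t)
        for j in range(1, len(t) + 1):
            sub = MATCH if s[i - 1] == t[j - 1] else MISMATCH
            cur[j] = max(prev[j - 1] + sub, prev[j] + INDEL, cur[j - 1] + INDEL)
        prev = cur
    return prev[-1]


seqs = {"X1": "GGATTGT", "X2": "GGAAGG", "X3": "AAGGTT", "X4": "AGGT"}
for (a, sa), (b, sb) in combinations(seqs.items(), 2):
    print(a, b, nw_score(sa, sb))
```

These pairwise scores (or identities derived from the alignments) fill the similarity matrix that feeds the guide tree.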
Problems with SP
Some pairwise alignments are more important than others
✓ E.g., it is more important to have a good alignment between mouse and human sequences than between mouse and bird sequences

Solution: assign different weights to different pairwise alignments
✓ The weight decreases with evolutionary distance
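A weighted SP score multiplies each pair of rows by its own weight. In this sketch the weights are supplied by hand (for instance row 0 = human, row 1 = mouse, row 2 = bird, with the human-mouse pair weighted higher); how weights are derived from evolutionary distances is not specified here, and missing pairs defaulting to weight 1.0 is an assumption.

```python
from itertools import combinations

MATCH, MISMATCH, INDEL = 10, -1, -2  # assumed scores (from the slides' example)


def pair_score(a: str, b: str) -> int:
    """Pairwise character score; gap vs. gap scores 0 (assumed convention)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return INDEL
    return MATCH if a == b else MISMATCH


def weighted_sp(alignment: list[str], weights: dict[tuple[int, int], float]) -> float:
    """SP score where the row pair (p, q), p < q, is scaled by weights[(p, q)]."""
    total = 0.0
    for column in zip(*alignment):
        for p, q in combinations(range(len(column)), 2):
            total += weights.get((p, q), 1.0) * pair_score(column[p], column[q])
    return total


rows = ["AC", "AC", "AG"]  # hypothetical: human, mouse, bird
print(weighted_sp(rows, {}), weighted_sp(rows, {(0, 1): 2.0}))
```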
MUSCLE: A tool for fast MSA

"MUSCLE: a multiple sequence alignment method with reduced time and space complexity", BMC Bioinformatics 2004, 5:113
✓ Reported there to be consistently more accurate than CLUSTALW

One of the best-performing multiple alignment programs according to published benchmark tests
