Chapter 6 Multiple Sequence Alignment 2022 Bioinformatics For Everyone
6 Multiple sequence alignment
6.1 Introduction
6.1.1 What is multiple sequence alignment?
In multiple sequence alignment (MSA), we attempt to align three or more related
sequences so as to achieve the best possible match between them. The goal of MSA
is to arrange the sequences so that as many characters as possible from each
sequence are matched, as judged by a given scoring scheme (Fig. 6.1).
While DNA and protein sequences from different organisms share many similarities,
they usually contain many unique differences as well. This is because organisms
that share similar genes may use them for similar or completely different functions,
and the sequences diverge under natural selection acting on those differing
functions. Many genes and patterns, however, do not change much because of the
simplicity of their design. To investigate this form of conservation, several
sequences must be compared and aligned at the same time. MSA was developed to
meet this need.
MSA is the process of aligning more than two sequences simultaneously. As an
illustration, consider four hypothetical protein sequences: SeqA, SeqB, SeqC
and SeqD. The MSA of these sequences involves a substitution (F/Y), a deletion
(L) and an insertion (K) (Fig. 6.2).
FIGURE 6.1
Result of multiple sequence alignment of five different sequences.
FIGURE 6.2
Multiple sequence alignment evolutionary tree.
6.1.1.1 Sequences
SeqA: NFLS
SeqB: NFS
SeqC: NKYLS
SeqD: NYLS
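The three events named above (substitution, deletion, insertion) can be made concrete with a small sketch. The gapped rows below are one plausible alignment of SeqA to SeqD, written out by hand as an assumption, since the original alignment appears only in the figure. The helper names `describe_columns` and `ungapped` are hypothetical.

```python
# One plausible gapped alignment of the four sequences (an assumption;
# the source shows the alignment only in Fig. 6.2).
alignment = {
    "SeqA": "N-FLS",
    "SeqB": "N-F-S",
    "SeqC": "NKYLS",
    "SeqD": "N-YLS",
}


def describe_columns(aln):
    """Label each alignment column as conserved, substitution or indel."""
    rows = list(aln.values())
    width = len(rows[0])
    # Every row of an alignment must have the same length.
    assert all(len(row) == width for row in rows)

    labels = []
    for column in zip(*rows):
        residues = {c for c in column if c != "-"}
        if "-" in column:
            labels.append("indel")         # at least one gap in the column
        elif len(residues) == 1:
            labels.append("conserved")     # same residue in every sequence
        else:
            labels.append("substitution")  # no gaps, but residues differ
    return labels


def ungapped(aln):
    """Remove gap characters to recover the original sequences."""
    return {name: row.replace("-", "") for name, row in aln.items()}


for position, label in enumerate(describe_columns(alignment), start=1):
    print(position, label)
```

Column 2 holds the K insertion in SeqC, column 3 the F/Y substitution, and column 4 the L deletion in SeqB. Removing the gaps returns the four input sequences unchanged.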
6.2 Scoring
The MSA scoring method is based on the sum of the scores, taken from a scoring
matrix, over all possible pairs of sequences.
Score of MSA = Σ score(A, B), where score(A, B) = pairwise alignment
score of A and B, and the sum runs over all pairs of sequences.
Let us look at an example.
Seq (1): G K N
Seq (2): T R N
Seq (3): S H E
Sum of pairs: -1 + 1 + 6 = 6, where each term is the sum for one column.
Sum of second column = score(K, R) + score(R, H) + score(K, H) =
2 + 0 - 1 = 1.
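The sum-of-pairs calculation in this example can be checked with a short sketch. The text does not name its scoring matrix. The pair scores below are an assumption: they are the BLOSUM62 values for these residue pairs, which reproduce the column sums given in the example. The names `PAIR_SCORES`, `column_scores` and `sum_of_pairs` are hypothetical.

```python
from itertools import combinations

# Only the pair scores needed for this example. Assumed to be BLOSUM62
# values; the source does not say which matrix it uses.
PAIR_SCORES = {
    frozenset("GT"): -2, frozenset("TS"): 1, frozenset("GS"): 0,   # column 1
    frozenset("KR"): 2, frozenset("RH"): 0, frozenset("KH"): -1,   # column 2
    frozenset("N"): 6, frozenset("NE"): 0,                         # column 3
}


def pair_score(a, b):
    """Look up the symmetric score of residues a and b."""
    return PAIR_SCORES[frozenset((a, b))]


def column_scores(seqs):
    """Sum the scores of all residue pairs within each alignment column."""
    return [
        sum(pair_score(a, b) for a, b in combinations(column, 2))
        for column in zip(*seqs)
    ]


def sum_of_pairs(seqs):
    """Total sum-of-pairs score: the sum of all column scores."""
    return sum(column_scores(seqs))


seqs = ["GKN", "TRN", "SHE"]
print(column_scores(seqs))  # [-1, 1, 6]
print(sum_of_pairs(seqs))   # 6
```

This sketch handles gap-free columns only. Scoring gapped columns would need a gap penalty, which the example above does not define.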
6.3 Multiple sequence alignment: types
6.3.1.1 Advantages
• Fast
• Efficient
• In many instances, the resulting alignments are fair.
6.3.1.2 Disadvantages
• Heuristic, so the optimal alignment is not guaranteed
• Accuracy of the early steps is very important
• Errors made in early progressive steps are propagated to the final alignment.
At present, two of the most widely used progressive alignment
methods are
1. Clustal Omega
2. T-Coffee
FIGURE 6.3
Steps in iterative alignment.
6.3.2.1 Advantages
• Alignment of the profile illustrates conservation in a population (biologically
relevant).
• It is easy and can handle a large number of sequences.
6.3.2.2 Disadvantages
• Imprecise objective function.
• Any misalignments generated during the process are preserved.
expanded into more sequence alignment. However, this problem is very difficult:
with more than three sequences, only a small number of relatively short sequences
can be evaluated. As a consequence, various approximation methods are used, some
of which are described below.
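To give a sense of why only a few short sequences can be handled, the sketch below counts the work involved. It assumes that the truncated passage above refers to extending pairwise dynamic programming to several sequences. In that setting the table has one axis per sequence, and each cell can be reached from every combination of "advance or stay" on each axis except "stay on all". The function names are hypothetical.

```python
from math import prod


def dp_lattice_cells(lengths):
    """Number of cells in an exact dynamic-programming table.

    Assumes one axis per sequence, each of size (length + 1).
    """
    return prod(n + 1 for n in lengths)


def predecessors_per_cell(k):
    """Number of neighbouring cells examined per cell for k sequences."""
    return 2 ** k - 1


for k in (2, 3, 4, 5):
    cells = dp_lattice_cells([100] * k)
    print(k, cells, predecessors_per_cell(k))
```

For sequences of length 100, two sequences need about ten thousand cells and three need about a million. Each extra sequence multiplies the table size by roughly the sequence length, which is why approximation methods are preferred.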