Chapter 6 Multiple Sequence Alignment 2022 Bioinformatics For Everyone


CHAPTER 6
Multiple sequence alignment
6.1 Introduction
6.1.1 What is multiple sequence alignment?
In multiple sequence alignment (MSA), we attempt to align three or more related
sequences with the aim of ensuring the best possible match between them.
The goal of MSA is to arrange the sequences so that as many characters as possible
from every sequence are matched under a given scoring scheme (Fig. 6.1).
While DNA and protein sequences from different organisms share many similarities,
they usually show many differences as well. This is because organisms that share
similar genes may use them for similar or completely different functions, or because
the sequences have diverged under natural selection acting on those functions. Many
genes and sequence patterns nevertheless change very little; they are conserved. In
order to investigate this form of conservation, several sequences must be compared
and aligned at the same time. MSA was created to meet this need.
MSA is the process of aligning more than two sequences simultaneously. As an
illustration, consider four hypothetical protein sequences: SeqA, SeqB, SeqC
and SeqD. The MSA of these sequences is shown below, with a substitution
(F/Y), a deletion (L) and an insertion (K) (Fig. 6.2).

FIGURE 6.1
Result of multiple sequence alignment of five different sequences.

Bioinformatics for Everyone. https://doi.org/10.1016/B978-0-323-91128-3.00011-2


Copyright © 2022 Elsevier Inc. All rights reserved.

FIGURE 6.2
Multiple sequence alignment evolutionary tree.

6.1.1.1 Sequences
SeqA: NFLS
SeqB: NFS
SeqC: NKYLS
SeqD: NYLS

6.1.1.2 Multiple sequence alignment


SeqA: N * F L S
SeqB: N * F * S
SeqC: N K Y L S
SeqD: N * Y L S
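
The alignment above can be checked mechanically. The following Python sketch is only an illustration: it assumes that "*" marks a gap and that SeqB carries a gap where L is deleted. The helper names (ungap, describe_column) are hypothetical. It confirms that removing the gaps from each row gives back the original sequence, and it labels each column as conserved, substituted or gapped.

```python
GAP = "*"  # gap symbol used in the alignment above

# Unaligned sequences and their aligned rows (spaces removed).
sequences = {"SeqA": "NFLS", "SeqB": "NFS", "SeqC": "NKYLS", "SeqD": "NYLS"}
aligned = {
    "SeqA": "N*FLS",
    "SeqB": "N*F*S",  # assumed layout: gap where L is deleted
    "SeqC": "NKYLS",
    "SeqD": "N*YLS",
}


def ungap(row):
    """Remove gap symbols to recover the original sequence."""
    return row.replace(GAP, "")


def describe_column(column):
    """Label one alignment column."""
    if GAP in column:
        return "gap"
    if len(set(column)) == 1:
        return "conserved"
    return "substitution"


# Every aligned row must reduce to its unaligned sequence.
rows_ok = all(ungap(aligned[name]) == sequences[name] for name in sequences)

# zip(*rows) turns the rows into columns.
columns = ["".join(col) for col in zip(*aligned.values())]
column_labels = [describe_column(col) for col in columns]

for number, (col, label) in enumerate(zip(columns, column_labels), start=1):
    print(number, col, label)
```

Columns 2 and 4 are reported as gapped (the insertion of K and the deletion of L), column 3 as a substitution (F/Y), and columns 1 and 5 as conserved.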

6.2 Scoring
The score of an MSA is the sum of the scores, taken from a scoring matrix, for all
possible pairs of sequences.
Score of MSA = Σ score(A, B), summed over all pairs of sequences (A, B), where
score(A, B) = pair-wise alignment score of A, B.
Let us look at an example.
Seq (1): G K N
Seq (2): T R N
Seq (3): S H E
Sum of pairs: -1 + 1 + 6 = 6.
Sum of second column = score(K, R) + score(R, H) + score(K, H) =
2 + 0 - 1 = 1.
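
The sum-of-pairs calculation can be sketched in Python as follows. The chapter does not name the scoring matrix, so the pair scores below are an assumption: they are the BLOSUM62 entries for the residue pairs in this example, and they reproduce the column totals of -1, 1 and 6. The function names are hypothetical.

```python
from itertools import combinations

# Pair scores for this example only (assumed; they match BLOSUM62 entries).
PAIR_SCORES = {
    ("G", "T"): -2, ("T", "S"): 1, ("G", "S"): 0,
    ("K", "R"): 2, ("R", "H"): 0, ("K", "H"): -1,
    ("N", "N"): 6, ("N", "E"): 0,
}


def score(a, b):
    """Look up a pair score; the matrix is symmetric."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]


def column_score(column):
    """Sum the scores of every pair of residues in one column."""
    return sum(score(a, b) for a, b in combinations(column, 2))


def sum_of_pairs(rows):
    """Sum the column scores over the whole alignment."""
    return sum(column_score(col) for col in zip(*rows))


rows = ["GKN", "TRN", "SHE"]
column_scores = [column_score(col) for col in zip(*rows)]
total = sum_of_pairs(rows)
print(column_scores, total)
```

With three sequences, each column contributes three pair scores; with n sequences, it contributes n(n - 1)/2.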

6.3 Multiple sequence alignment - types


Aligning three or more sequences by hand is difficult and almost always
time-consuming. Therefore, these alignments are generated and analysed with
computational algorithms. Most MSA algorithms use dynamic programming or
heuristic approaches.
The techniques for MSA that use heuristic methods are listed below.
1. Progressive Alignment Construction
2. Iterative Alignment Construction
3. Block-Based Alignment
These techniques find an alignment among all the conceivable solutions; however,
they do not guarantee the best one. They are thus regarded as approximations, but
they quickly find a solution that is close to the optimal one.

6.3.1 Progressive Alignment Construction


In 1984, Paulien Hogeweg and Ben Hesper invented this approach, also called the
hierarchical method or tree method. It constructs the final MSA by combining
pairwise alignments, starting with the most similar pair and progressing to the
most distantly related.
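
The first stage of this approach, ranking sequence pairs from most to least similar, can be sketched as below. This is an illustration only, not the procedure of any particular tool: it assumes a simple scoring scheme (match +1, mismatch -1, gap -1), uses the four hypothetical sequences from Section 6.1.1.1, and covers only the ordering step, not the merging of alignments.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score, one DP row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairs_by_similarity(seqs):
    """Return (score, name1, name2) tuples, most similar pair first."""
    scored = [
        (nw_score(seqs[x], seqs[y]), x, y)
        for x, y in combinations(sorted(seqs), 2)
    ]
    # Sort by descending score; break ties by name for a stable order.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


seqs = {"SeqA": "NFLS", "SeqB": "NFS", "SeqC": "NKYLS", "SeqD": "NYLS"}
order = pairs_by_similarity(seqs)
for pair_score, x, y in order:
    print(x, y, pair_score)
```

Under these assumed scores, SeqC and SeqD form the most similar pair and would be aligned first, while SeqB and SeqC are the furthest apart.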

6.3.1.1 Advantages
• Fast
• Efficient
• In many instances, the resulting alignments are fair.

6.3.1.2 Disadvantages
• Heuristic, so the optimal alignment is not guaranteed
• The accuracy of the early alignments is critical
• Errors made in early steps are propagated to the final alignment.
Two of the most widely used progressive alignment methods at present are
1. Clustal Omega
2. T-Coffee

6.3.2 Iterative Alignment Construction


This methodology comprises various techniques for creating MSAs while reducing
the errors of the progressive method. They work in a similar way to progressive
approaches, but repeatedly realign the initial sequences as new sequences are
added to the growing MSA (Fig. 6.3).

FIGURE 6.3
Steps in iterative alignment.

6.3.2.1 Advantages
• Alignment of the profile illustrates conservation in a population (biologically
relevant).
• It is easy and can handle a large number of sequences.

6.3.2.2 Disadvantages
• Imprecise objective function.
• Any misalignments generated during the process are preserved.

6.3.3 Block-based alignment


This methodology divides sequences into blocks and attempts to find ungapped
blocks of alignment. DIALIGN2 is a typical method for block-based alignment.
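
To illustrate what an ungapped block is, the sketch below looks for the longest words that occur, without gaps, in every sequence of a set. It is a deliberately simplified assumption: it requires exact matches, whereas a real block-based method such as DIALIGN2 scores similar (not only identical) segments. The function names are hypothetical.

```python
def shared_blocks(seqs, k):
    """Return the ungapped words of length k found in every sequence."""
    def words(s):
        return {s[i:i + k] for i in range(len(s) - k + 1)}

    common = set.intersection(*(words(s) for s in seqs))
    return sorted(common)


def longest_shared_blocks(seqs):
    """Try the longest possible word length first, then shorter ones."""
    for k in range(min(len(s) for s in seqs), 0, -1):
        found = shared_blocks(seqs, k)
        if found:
            return found
    return []


three = ["NFLS", "NKYLS", "NYLS"]         # SeqA, SeqC, SeqD
four = ["NFLS", "NFS", "NKYLS", "NYLS"]   # all four sequences
print(longest_shared_blocks(three))
print(longest_shared_blocks(four))
```

SeqA, SeqC and SeqD share the ungapped block LS; once SeqB is included, only the single residues N and S remain common to all four.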

6.4 Methods for multiple sequence alignment


MSA is essentially a computational problem. The standard dynamic programming
model, suitable for pairwise alignment of sequences, can be extended to the
alignment of more sequences. However, this problem is very difficult: when more
than three sequences are involved, only a small number of relatively short
sequences can be handled. As a consequence, various approximation models are
used, some of which are described below.

6.4.1 Dynamic programming-based models


Progressive Global Alignment is an alignment procedure that uses dynamic
programming. In this process, the pairwise alignment of the most similar sequences
is made first. The alignment is then built up by adding further sequences.
Another approach that uses dynamic programming is the Iterative Model. In the
iterative model, alignments are first made for groups or classes of sequences.
These alignments are then realigned repeatedly to obtain better alignments.
The main issue with the above-mentioned progressive alignment approach is that
errors made in the initial alignments of the most closely related sequences are
propagated to the final MSA. This problem becomes more pronounced if the initial
alignment is between more distantly related sequences. Iterative models aim to
correct this issue by realigning sub-groups of sequences and then aligning these
sub-groups into an overall alignment.
With a dynamic programming model, an underlying difficulty is that a suitable
scoring scheme must be found, which becomes harder when more than two sequences
are involved at once. The dynamic programming matrix grows exponentially in size
(as a power of the number of sequences). As a consequence, the computation and
storage requirements increase and become impractical for more sequences. Dynamic
programming is suitable for about three sequences of modest length. The challenge
for this approach is therefore to use a suitable combination of sequence
weighting, scoring matrix and gap penalties.
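
The growth described above can be made concrete with a little arithmetic. In the sketch below, which uses hypothetical function names, a full dynamic programming lattice for sequences of lengths L1, L2, ..., Ln has (L1 + 1)(L2 + 1)...(Ln + 1) cells, and each cell depends on up to 2^n - 1 neighbouring cells (every combination of residue or gap across the n sequences, except all gaps).

```python
def lattice_cells(lengths):
    """Number of cells in the full DP lattice for the given sequence lengths."""
    cells = 1
    for n in lengths:
        cells *= n + 1  # one extra position per sequence for the empty prefix
    return cells


def predecessors_per_cell(num_seqs):
    """Each sequence contributes a residue or a gap; all-gap is excluded."""
    return 2 ** num_seqs - 1


for num_seqs in (2, 3, 4, 5):
    cells = lattice_cells([100] * num_seqs)
    print(num_seqs, cells, predecessors_per_cell(num_seqs))
```

For sequences of 100 residues, two sequences need about ten thousand cells and three need about a million; each additional sequence multiplies the count by roughly one hundred.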

6.4.2 Statistical methods and probabilistic models


The MSA problem can also be approached with various statistical and probabilistic
methods. The hidden Markov model (HMM), which considers every possible combination
of matches, mismatches and gaps to produce an alignment of a set of sequences, is
the most commonly used statistical and probabilistic model. HMMs are sometimes as
good as, if not better than, other multiple sequence alignment methods. The model
is first trained on a set of sequences. The trained model is then used, through
posterior probabilities, to obtain the most likely MSA. The model is grounded
entirely in probability theory: no sequence ordering is necessary, no
insertion/deletion penalties are required, and experimental information can be
incorporated.
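
One small piece of such a model can be sketched in Python: estimating residue probabilities for each alignment column and using them to score a new sequence. This is a heavily simplified assumption, not a full HMM. It keeps only match-state emissions, omits insert and delete states and all transition probabilities, uses add-one pseudocounts, and compares against a uniform background. The training columns are the non-insert columns of the example in Section 6.1.1.2, and the function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def match_emissions(columns, pseudocount=1.0):
    """Estimate per-column residue probabilities; gap symbols are ignored."""
    probs = []
    for col in columns:
        counts = Counter(c for c in col if c in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        probs.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return probs


def log_odds(seq, probs):
    """Log-odds of an ungapped sequence against a uniform background."""
    background = 1.0 / len(ALPHABET)
    return sum(math.log(p[c] / background) for c, p in zip(seq, probs))


# Columns 1, 3, 4 and 5 of the example alignment ("*" marks a gap).
columns = ["NNNN", "FFYY", "L*LL", "SSSS"]
probs = match_emissions(columns)

family_like = log_odds("NFLS", probs)
unrelated = log_odds("GKHE", probs)
print(round(family_like, 2), round(unrelated, 2))
```

A sequence resembling the training set receives a positive log-odds score, whereas an unrelated sequence receives a negative one.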

6.5 Usage of multiple sequence alignment


Pairwise alignment of protein or DNA sequences shows the relationship between two
sequences, whereas MSA shows the regions in which a whole group of sequences is
related. For proteins, this information can reveal conserved functional and
structural domains; for DNA sequences, it shows evolutionary relationships.
MSA also sheds light on the evolutionary history of sequences. If the sequences
align well in the multiple alignment, they probably derive from a common ancestral
sequence. A poor alignment may indicate that they are only distantly related. In
this way, evolutionary relationships among the sequences can be discovered.
The objective of comparing protein sequences is to detect structural or functional
similarities between proteins. Biologically related proteins may show no clear
sequence resemblance, but we still want to detect their relationship even when the
sequences share only weak similarity. When sequence similarity is low, biologically
related sequences may not be identified by pairwise comparison, because weak
pairwise similarities can fail statistical tests. Simultaneous comparison of
several sequences can reveal similarities that are invisible in pairwise
comparisons.

6.6 Applications of multiple sequence alignment


MSA can be used for
• Identifying sequence similarities (closely or distantly related).
• Detecting conserved regions or motifs.
• Detecting structural homology.
• Enhanced prediction of secondary and tertiary protein structures.
• Building patterns or models that can be used to identify new members of a
sequence family.
• Inferring evolutionary trees.
NOTE: The various multiple sequence alignment tools, software and protocols
are described in Chapter 7.

