Bioinfo Notes 2

The document discusses global alignment and local alignment algorithms. It describes the Needleman-Wunsch algorithm as the first algorithm for global sequence alignment using dynamic programming to find the optimal alignment between entire sequences. The Smith-Waterman algorithm is presented as the method for local alignment to find locally similar regions between divergent or variably sized sequences. Key steps of the Needleman-Wunsch algorithm including setting up a scoring matrix and performing a trace-back procedure are outlined.

Uploaded by

Raj Lonkar

Global alignment

 A global alignment spans the entire sequence of each protein or
DNA molecule; that is, it tries to align the sequences over their
whole length.

 One of the first and most important algorithms for aligning
two protein sequences was described by Needleman and Wunsch (1970).

 The Needleman-Wunsch algorithm is an example of dynamic
programming.

 In global alignment, the two sequences to be aligned are
assumed to be generally similar over their entire length.

 Alignment is carried out from the beginning to the end of both
sequences to find the best possible alignment across the entire
length of the two sequences.

 This method is more applicable for aligning two closely related
sequences of roughly the same length.

 For divergent sequences and sequences of variable lengths, this
method may not be able to generate optimal results because it
fails to recognize highly similar local regions between the two
sequences.

 This algorithm is important because it produces an optimal
alignment of protein or DNA sequences, even allowing the
introduction of gaps.
 The Needleman-Wunsch approach to global sequence alignment can
be described in three steps:

(1) Setting up a matrix.

 The first step is to compare the two sequences in a
two-dimensional matrix.
 The first sequence is listed horizontally along the matrix, and
the second sequence is listed vertically.
 A matrix of dimensions m + 1 by n + 1 is then built, where m and
n are the lengths of the two sequences; the extra row and column
hold the scores for leading gaps.
 A perfect alignment between two identical sequences would
simply be represented by a diagonal line extending from the top
left to the bottom right.
 Any mismatches between the two sequences would still be
represented on this diagonal path.
 Gaps are represented in this matrix by horizontal or vertical
paths.

(2) Scoring the matrix.

 The goal of the algorithm is to identify an optimal alignment,
that is, the path through the matrix that maximizes the score.
 There are four possible occurrences at each position:
 the two residues may be perfectly matched;
 they may be mismatched;
 a gap may be introduced in the first sequence;
 a gap may be introduced in the second sequence.

(3) Identifying the optimal alignment.

 After the matrix is filled, the alignment is determined by a
trace-back procedure that starts at the bottom-right cell and works
back towards the top-left cell.
 The rewards and penalties used here are: match +1, mismatch -1,
and gap -2.
 With these scores, a cell reached diagonally through a match is
1 larger than its upper-left diagonal neighbour, whereas a cell
reached diagonally through a mismatch is 1 smaller than it.
 If the residues match, move diagonally; if not, move to the
neighbouring cell with the highest value. A horizontal or vertical
move is represented as a gap in the alignment.
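The three steps can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name needleman_wunsch and its parameters are assumptions made here, the scores are the ones quoted in these notes (match +1, mismatch -1, gap -2), and the trace-back follows whichever neighbour actually produced each cell's score, returning just one of possibly several optimal alignments.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment sketch; seq1 runs horizontally, seq2 vertically."""
    m, n = len(seq1), len(seq2)

    # Step 1: set up an (n + 1) x (m + 1) matrix with leading-gap scores.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        score[i][0] = i * gap

    # Step 2: fill each cell with the best of match/mismatch or a gap.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # diagonal
                              score[i - 1][j] + gap,    # gap in seq1
                              score[i][j - 1] + gap)    # gap in seq2

    # Step 3: trace back from the bottom-right cell to the top-left cell.
    aligned1, aligned2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and seq1[j - 1] == seq2[i - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            aligned1.append(seq1[j - 1])
            aligned2.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned1.append("-")          # vertical move: gap in seq1
            aligned2.append(seq2[i - 1])
            i -= 1
        else:
            aligned1.append(seq1[j - 1])  # horizontal move: gap in seq2
            aligned2.append("-")
            j -= 1

    return score[n][m], "".join(reversed(aligned1)), "".join(reversed(aligned2))


print(needleman_wunsch("ACGT", "ACT"))
```

For the short test sequences ACGT and ACT, the sketch aligns three matching residues and introduces one gap, so the total score is 3 - 2 = 1.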


Local alignment
 Local alignment does not assume that the two sequences in
question are similar over their entire length.

 It only finds local regions with the highest level of similarity
between the two sequences and aligns these regions only.

 Stretches of sequence with the highest density of matches are
aligned.

 This approach can be used for aligning partially similar,
different-length, or more divergent sequences with the goal of
searching for conserved patterns in DNA or protein sequences.

 The two sequences to be aligned can be of different lengths; a
substring of the target is aligned with a substring of the query.

 This approach is more appropriate for aligning divergent
biological sequences containing only modules that are similar,
which are referred to as domains or motifs.

 The local alignment method generally used is the Smith-Waterman
algorithm, which is an example of dynamic programming.
 The Smith-Waterman method is very similar to the
Needleman-Wunsch method of global alignment; the main difference
is that negative values in the scoring matrix are replaced by
zero.
 The trace-back step is simpler and more straightforward than in
global alignment: it starts at the highest value in the matrix and
moves back until a zero is reached. This gives a conserved pattern
shared by both sequences.
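The difference from global alignment can be sketched in Python. This is a minimal illustration under stated assumptions: the function name smith_waterman is made up here, and the scores (match +1, mismatch -1, gap -2) are simply reused from the global alignment notes, since no local scoring scheme is given. Negative cell values are replaced by zero, and the trace-back runs from the highest-scoring cell until a zero is reached.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Local alignment sketch; seq1 runs horizontally, seq2 vertically."""
    m, n = len(seq1), len(seq2)

    # The first row and column stay at zero, so an alignment can start anywhere.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            score[i][j] = max(0,                        # negatives become zero
                              score[i - 1][j - 1] + s,  # diagonal
                              score[i - 1][j] + gap,    # gap in seq1
                              score[i][j - 1] + gap)    # gap in seq2
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    aligned1, aligned2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if seq1[j - 1] == seq2[i - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            aligned1.append(seq1[j - 1])
            aligned2.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned1.append("-")
            aligned2.append(seq2[i - 1])
            i -= 1
        else:
            aligned1.append(seq1[j - 1])
            aligned2.append("-")
            j -= 1

    return best, "".join(reversed(aligned1)), "".join(reversed(aligned2))


print(smith_waterman("TTTACGTAAA", "GGGACGTCCC"))
```

In this made-up example the two sequences differ everywhere except for a shared ACGT stretch, which is the only region the local alignment reports.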
Applications of bioinformatics:

Databases
 A database is a computerized archive used to store and organize
data in such a way that information can be retrieved easily via a
variety of search criteria.
 Databases are composed of computer hardware and software
for data management.
 The chief objective of the development of a database is to
organize data in a set of structured records to enable easy
retrieval of information.
 To retrieve a particular record from the database, a user can
specify a particular piece of information, called a value, to be
found in a particular field and expect the computer to retrieve the
whole data record. This process is called making a query.
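The idea of a query can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the record IDs, field names, and the query helper are not taken from any real database.

```python
# Hypothetical structured records; IDs and field names are placeholders.
records = [
    {"id": "REC001", "organism": "Homo sapiens", "molecule": "protein"},
    {"id": "REC002", "organism": "Mus musculus", "molecule": "DNA"},
    {"id": "REC003", "organism": "Homo sapiens", "molecule": "DNA"},
]


def query(records, field, value):
    """Return every whole record whose given field holds the given value."""
    return [record for record in records if record.get(field) == value]


# Making a query: specify a value to be found in a particular field.
for record in query(records, "organism", "Homo sapiens"):
    print(record)
```

The user supplies only the field and the value; the whole matching record comes back, which is the behaviour described above.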

 Biological databases:
 A biological database is a collection of biological information
or data that is organised so that it can be easily accessed,
managed, and updated.
 The kinds of data include DNA sequences of genes or full
genomes, protein sequences, and 3D structures of proteins, nucleic
acids, and protein-nucleic acid complexes.
 Current biological databases use all three types of database
structures: flat files, relational, and object-oriented.
 Based on their contents, biological databases can be roughly
divided into three categories: primary databases, secondary
databases, and specialized databases.
Similarity and identity

 An important concept in sequence analysis is sequence
homology.
 When two sequences are descended from a common
evolutionary origin, they are said to have a homologous
relationship or share homology.
 A related but different term is sequence similarity, which is the
percentage of aligned residues that are similar in physicochemical
properties such as size, charge, and hydrophobicity.
 To be clear, sequence homology is an inference or a conclusion
about a common ancestral relationship drawn from sequence
similarity comparison when the two sequences share a high
enough degree of similarity.
 On the other hand, similarity is a direct result of observation from
the sequence alignment.
 Sequence similarity can be quantified using percentages;
homology is a qualitative statement.
 In a protein sequence alignment, sequence identity refers to the
percentage of matches of the same amino acid residues between
two aligned sequences.
 Sequence similarity and sequence identity are synonymous for
nucleotide sequences, but differ for protein sequences, where
identity means the percentage of exact matches between two
aligned sequences and similarity means the percentage of aligned
residues that share similar characteristics.

 Both identity and similarity are used to deduce homology.
Homology has a specific definition: having a common evolutionary
ancestor.
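The two percentages can be computed from an aligned pair of protein sequences as in the Python sketch below. Two points are assumptions made for illustration: the amino acid groups in SIMILAR_GROUPS are one simplified grouping by physicochemical properties (real tools usually derive similarity from a substitution matrix), and the percentages are taken over the full alignment length including gap columns, although other denominators are also in use.

```python
# Illustrative grouping of amino acids by shared properties (an assumption).
SIMILAR_GROUPS = [
    set("ILVM"),   # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("KRH"),    # positively charged
    set("DE"),     # negatively charged
    set("STNQ"),   # polar, uncharged
]


def identity_and_similarity(aligned1, aligned2):
    """Return (percent identity, percent similarity) for two aligned strings."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")

    identical = similar = 0
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            continue                      # gap columns count as neither
        if a == b:
            identical += 1
            similar += 1                  # identical residues are also similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1

    length = len(aligned1)
    return 100 * identical / length, 100 * similar / length


print(identity_and_similarity("KLDE-", "RLEDA"))
```

In this made-up alignment only the L/L column is an exact match, while K/R, D/E, and E/D pair residues from the same group, so similarity is higher than identity.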

Homology
 Homologs are two or more sequences that descend from a
common ancestral sequence.
 Homologs are the result of divergent evolution.
 Two sequences are homologous if they share a common
evolutionary ancestry.
 There are no degrees of homology; sequences are either
homologous or not.
 Homologous proteins almost always share a significantly related
three-dimensional structure.
 Proteins that are homologous may be orthologous or
paralogous.
 Orthologs are homologous sequences in different species that
arose from a common ancestral gene during speciation.
 Paralogs are homologous sequences that arose by a mechanism
such as gene duplication.
 Xenologs are homologs that result from horizontal gene transfer.
 Gametologs are homologous genes on the sex chromosomes that
have stopped recombining.
 Homoeologs are genes that were separated by a speciation event
and later brought back together in one genome by hybridisation.
