Bioinfo Notes 2

The document discusses global alignment and local alignment algorithms. It describes the Needleman-Wunsch algorithm as the first algorithm for global sequence alignment using dynamic programming to find the optimal alignment between entire sequences. The Smith-Waterman algorithm is presented as the method for local alignment to find locally similar regions between divergent or variably sized sequences. Key steps of the Needleman-Wunsch algorithm including setting up a scoring matrix and performing a trace-back procedure are outlined.

Uploaded by

Raj Lonkar

Global alignment

A global alignment spans the entire length of each protein or DNA sequence; that is, it attempts to align the sequences from end to end.

• One of the first and most important algorithms for aligning two protein sequences was described by Needleman and Wunsch (1970).
• The Needleman-Wunsch algorithm is an example of dynamic programming.
• In global alignment, the two sequences to be aligned are assumed to be generally similar over their entire length.
• Alignment is carried out from the beginning to the end of both sequences to find the best possible alignment across their entire length.
• This method is best suited to aligning two closely related sequences of roughly the same length.
• For divergent sequences and sequences of variable lengths, this method may not generate meaningful results, because forcing an end-to-end alignment can fail to recognize highly similar local regions between the two sequences.
• This algorithm is important because it produces an optimal alignment of protein or DNA sequences, even allowing the introduction of gaps.
• The Needleman-Wunsch approach to global sequence alignment can be described in three steps:

(1) Setting up a matrix.

• The first step is to compare the two sequences in a two-dimensional matrix.
• The first sequence is listed horizontally along the top of the matrix, and the second sequence is listed vertically along its side.
• A matrix of dimensions m + 1 by n + 1 is then built, where m and n are the lengths of the two sequences.
• A perfect alignment between two identical sequences would simply be represented by a diagonal line extending from the top left to the bottom right.
• Any mismatches between the two sequences would still be represented on this diagonal path.
• Gaps are represented in this matrix by horizontal or vertical steps in the path.
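
As a rough sketch of this setup step, the Python snippet below builds the (m + 1) by (n + 1) matrix and fills the first row and column with cumulative gap penalties. The function name `init_matrix` is hypothetical, and the gap value of -2 is borrowed from the penalties listed under step (3); neither is prescribed by the notes.

```python
def init_matrix(seq1, seq2, gap=-2):
    """Build the empty scoring matrix for a global alignment.

    seq1 runs horizontally (columns) and seq2 vertically (rows).
    The gap penalty of -2 is an assumed, illustrative value.
    """
    n_cols = len(seq1) + 1
    n_rows = len(seq2) + 1
    matrix = [[0] * n_cols for _ in range(n_rows)]
    # First row: seq1 residues aligned against leading gaps.
    for j in range(1, n_cols):
        matrix[0][j] = j * gap
    # First column: seq2 residues aligned against leading gaps.
    for i in range(1, n_rows):
        matrix[i][0] = i * gap
    return matrix
```

For example, `init_matrix("GAT", "GC")` gives a matrix with 3 rows and 4 columns whose first row is `[0, -2, -4, -6]`.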

(2) Scoring the matrix.

• The goal of this algorithm is to identify an optimal alignment, that is, the path through the matrix that maximizes the score.
• There are four possible occurrences at each position:
  • the two residues may be perfectly matched;
  • they may be mismatched;
  • a gap may be introduced in the first sequence;
  • a gap may be introduced in the second sequence.
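
One way to picture these four occurrences is as the candidates considered when a single cell of the matrix is filled. The sketch below is illustrative only: the name `cell_score` and the move labels are hypothetical, and the scores (match 1, mismatch -1, gap -2) are the ones the notes list under step (3).

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # values listed in the notes under step (3)

def cell_score(diag, up, left, a, b):
    """Return (best score, move) for one cell.

    diag, up, left are the already-computed neighbouring cell values;
    a and b are the two residues compared at this cell.
    """
    candidates = [
        # Diagonal step: residues a and b are aligned (match or mismatch).
        (diag + (MATCH if a == b else MISMATCH), "diagonal"),
        # Vertical step: a gap is introduced in the horizontal sequence.
        (up + GAP, "up"),
        # Horizontal step: a gap is introduced in the vertical sequence.
        (left + GAP, "left"),
    ]
    # On ties, max() keeps the first candidate, so diagonal is preferred.
    return max(candidates, key=lambda c: c[0])
```

For instance, `cell_score(0, -2, -2, "A", "A")` returns `(1, "diagonal")`, because matching the two residues beats either gap option.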

(3) Identifying the optimal alignment.

• After the matrix is filled, the alignment is determined by a trace-back procedure.
• There are rewards and penalties: match +1, mismatch -1, and gap -2.
• When a cell is reached by a diagonal step, its value is 1 higher than that of its upper-left (diagonal) neighbour for a match and 1 lower for a mismatch.
• Trace-back starts at the bottom-right cell and moves, step by step, to the neighbouring cell from which the current value was derived, until the top-left cell is reached. A diagonal step aligns two residues (match or mismatch); a vertical or horizontal step is represented as a gap.
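
The three steps can be put together in a compact sketch. This is a minimal illustration using the linear scores from the notes (match 1, mismatch -1, gap -2), not a reference implementation; the function name and the tie-breaking order (diagonal, then up, then left) are assumptions, and other tie-breaking orders can yield different but equally optimal alignments.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (aligned seq1, aligned seq2, score)."""
    n, m = len(seq1), len(seq2)  # seq1 = columns, seq2 = rows
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        score[i][0] = i * gap
    # Fill every cell with the best of the diagonal, up and left options.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to the top-left cell.
    out1, out2 = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out1.append(seq1[j - 1])
                out2.append(seq2[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append("-")          # gap in seq1
            out2.append(seq2[i - 1])
            i -= 1
        else:
            out1.append(seq1[j - 1])
            out2.append("-")          # gap in seq2
            j -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), score[m][n]


print(needleman_wunsch("GAT", "GT"))  # ('GAT', 'G-T', 0)
```

In the small example, two matches (+2) and one gap (-2) give a total score of 0, which is better than any gap-free placement of `GT` against `GAT`.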

Local alignment

• Local alignment does not assume that the two sequences in question are similar over their entire length.
• It only finds local regions with the highest level of similarity between the two sequences and aligns these regions only.
• Stretches of sequence with the highest density of matches are aligned.
• This approach can be used for aligning partially similar, different-length, or more divergent sequences, with the goal of searching for conserved patterns in DNA or protein sequences.
• The two sequences to be aligned can be of different lengths; a substring of the target is aligned with a substring of the query.
• This approach is more appropriate for aligning divergent biological sequences containing only modules that are similar, which are referred to as domains or motifs.
• The general local alignment method used is the Smith-Waterman algorithm, which is an example of dynamic programming.
• The Smith-Waterman method is very similar to the Needleman-Wunsch method of global alignment; the main difference is that any cell value that would be negative is set to zero.
• The trace-back step is simpler than in global alignment: it starts at the highest value in the matrix and proceeds until a zero is reached. This yields a conserved pattern shared by both sequences.
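
The two differences from the global method (negative values replaced by zero, and trace-back from the highest cell down to zero) can be sketched as follows. This is a minimal illustration, assuming the same match 1, mismatch -1, gap -2 scores as in the global example; the function name is hypothetical, and when several cells share the highest value the sketch simply keeps the first one found.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Local alignment; returns (aligned part of seq1, of seq2, score)."""
    n, m = len(seq1), len(seq2)  # seq1 = columns, seq2 = rows
    # First row and column stay at zero in local alignment.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            # The extra 0 candidate is what turns negative values into zero.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    out1, out2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if seq1[j - 1] == seq2[i - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out1.append(seq1[j - 1])
            out2.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out1.append("-")
            out2.append(seq2[i - 1])
            i -= 1
        else:
            out1.append(seq1[j - 1])
            out2.append("-")
            j -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), best


print(smith_waterman("TTGATTACATT", "CCGATTACACC"))
# ('GATTACA', 'GATTACA', 7)
```

Here the flanking residues differ completely, so only the shared `GATTACA` region is reported, which is the behaviour the notes describe for sequences that are similar only in local modules.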
Applications of bioinformatics:

Databases
• A database is a computerized archive used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria.
• Databases are composed of computer hardware and software for data management.
• The chief objective of the development of a database is to organize data in a set of structured records to enable easy retrieval of information.
• To retrieve a particular record from the database, a user can specify a particular piece of information, called a value, to be found in a particular field and expect the computer to retrieve the whole data record. This process is called making a query.
• Biological databases:
• A biological database is a collection of biological information or data organized so that it can be easily accessed, managed, and updated.
• The data include DNA sequences of genes or full genomes, protein sequences, and 3D structures of proteins, nucleic acids, and protein-nucleic acid complexes.
• Current biological databases use all three types of database structures: flat files, relational, and object-oriented.
• Based on their contents, biological databases can be roughly divided into three categories: primary databases, secondary databases, and specialized databases.
Similarity and identity

• An important concept in sequence analysis is sequence homology.
• When two sequences are descended from a common evolutionary origin, they are said to have a homologous relationship or share homology.
• A related but different term is sequence similarity, which is the percentage of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity.
• To be clear, sequence homology is an inference or a conclusion about a common ancestral relationship drawn from sequence similarity comparison when the two sequences share a high enough degree of similarity.
• On the other hand, similarity is a direct result of observation from the sequence alignment.
• Sequence similarity can be quantified using percentages; homology is a qualitative statement.
• In a protein sequence alignment, sequence identity refers to the percentage of matches of the same amino acid residues between two aligned sequences.
• Sequence similarity and sequence identity are synonymous for nucleotide sequences but differ for protein sequences, where identity means the percentage of exact matches between two aligned sequences and similarity means the percentage of aligned residues that share similar characteristics.
• Both identity and similarity are used to deduce homology. Homology has a specific definition: having a common evolutionary ancestor.
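
To make the distinction between identity and similarity concrete, the sketch below computes both percentages for an already-aligned pair of protein sequences. Two things are assumptions rather than facts from the notes: the grouping of amino acids in `SIMILAR_GROUPS` is a simplified, illustrative classification (real tools usually derive similarity from a substitution matrix), and the percentages are taken over the full alignment length including gap columns, which is only one of several conventions in use.

```python
# Illustrative grouping only; not a standard classification.
SIMILAR_GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ"]

def identity_and_similarity(aln1, aln2):
    """Return (% identity, % similarity) for two aligned sequences.

    Gaps are written as '-'. Identical residues also count as similar.
    """
    if len(aln1) != len(aln2) or not aln1:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identical = similar = 0
    for a, b in zip(aln1, aln2):
        if a == "-" or b == "-":
            continue  # a gap column is neither identical nor similar
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(aln1)  # assumed denominator: all columns, gaps included
    return 100 * identical / length, 100 * similar / length


print(identity_and_similarity("KVLD", "RVIE"))  # (25.0, 100.0)
```

In this example only the valine is an exact match, so identity is 25%, yet every aligned pair falls in the same illustrative group, so similarity is 100%.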

Homology
• Homologs are two or more sequences that descend from a common ancestral sequence.
• Homologs are the result of divergent evolution.
• Two sequences are homologous if they share a common evolutionary ancestry.
• There are no degrees of homology; sequences are either homologous or not.
• Homologous proteins almost always share a significantly related three-dimensional structure.
• Proteins that are homologous may be orthologous or paralogous.
• Orthologs are homologous sequences in different species that arose from a common ancestral gene through a speciation event.
• Paralogs are homologous sequences that arose through gene duplication.
• Xenologs are the result of horizontal gene transfer.
• Gametologs are homologous genes on sex chromosomes that have not recombined.
• Homoeologs are genes that were separated by a speciation event and later brought back together in one genome by hybridization.
