Algorithm Design and Scoring Matrices PDF

Uploaded by

Jahan Rana


Bioinformatics

Muhammad Muddassir Ali

[email protected]
IBBT
Topics of this lecture
• Course aims and learning goals

• Concept of bioinformatics algorithm

• Different algorithmic methods
• Home task…
Learning objectives
The student should be able to:
• Understand the concept of bioinformatics algorithm

• Understand the basic algorithm that works behind sequence alignment

What is an algorithm?

A procedure, i.e. a sequence or set of instructions, for solving a well-formulated problem.
Example
Initial state (input) → sequence of instructions (algorithm) → end state (output)
Example
Search the entire genome for all plausible genes
Genes start with 'ATG'
• Given a genome, search for all occurrences of ATG and report their positions in the genome.
• Read in the genome sequence
• Store it in a program data structure
• Search for the start codon
• If the current base is A, check whether the next is T
• If it is T, check whether the next is G
• Output the position
• Rough outline?
• Pseudo code?
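The ATG search outlined above could be sketched in Python as follows. This is only an illustration: the function name find_start_codons and the sample sequence are assumptions, not part of the lecture, and positions are reported 0-based.

```python
def find_start_codons(genome, codon="ATG"):
    """Return the 0-based positions of every occurrence of codon in genome."""
    genome = genome.upper()  # accept lower-case input as well
    positions = []
    start = genome.find(codon)
    while start != -1:
        positions.append(start)
        # Move forward by one base so that no later occurrence is skipped.
        start = genome.find(codon, start + 1)
    return positions


print(find_start_codons("CCATGGATGTTATGA"))  # prints [2, 6, 11]
```

The built-in str.find does the "if A, check T, then check G" comparison internally; a hand-written loop over the bases would give the same positions.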
Pseudo code
• What is a rough outline?
• An outline of an algorithm
• Cannot be compiled or executed
• Pseudo code
• There are no real formatting or syntax rules
• Purpose?
• Enables the programmer to concentrate on the implementation of the algorithm
• Programming-language independent
Algorithm design technique
• Intuitive algorithm
• "Given a genome sequence, make a copy of it."
• Abstract algorithm
• A flow chart, or you can write pseudo code
StringCopy(s, n)
    for i ← 1 to n
        t_i ← s_i
    return t
• Implemented algorithm
• Programming language (C, Perl, R, Java)
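As an illustration of the last step, the StringCopy pseudo code could be implemented in Python roughly like this. The lecture lists C, Perl, R and Java; Python is used here only as an example, and the name string_copy is an assumption.

```python
def string_copy(s):
    """Copy s one character at a time, mirroring the pseudo code loop."""
    t = []
    for ch in s:          # for i <- 1 to n
        t.append(ch)      # t_i <- s_i
    return "".join(t)     # return t


print(string_copy("ATGCCGTA"))
```

In practice a Python string is immutable, so no explicit copy is needed; the loop is written out only to match the pseudo code step by step.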
Dynamic programming
Used to derive pairwise alignments
1. Needleman-Wunsch for global alignments
2. Smith-Waterman for local alignments

• Advantage?
• Disadvantage?
Dynamic programming

Design
1. Break a problem into sub-problems
2. Construct an optimal solution for each sub-problem
3. Derive the overall optimal solution by combining the solutions for each sub-problem, without recomputing already computed solutions
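The three design steps can be seen in a minimal sketch of the Needleman-Wunsch global alignment score. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for illustration; the lecture does not fix them. Each table cell holds the optimal score of a sub-problem (a pair of prefixes) and is computed once from three neighbouring cells.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap      # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap      # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # prints 1
```

Smith-Waterman uses the same table but additionally clamps every cell at zero and reports the maximum cell instead of the bottom-right one.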
Gapped and ungapped alignments
Extended gap penalty
Basic Local Alignment Search Tool (BLAST)
• Generate a list of words
• Break down the query sequence into words, i.e. subsequences of length w (here w = 4)
• For each word, create similar words using a substitution matrix
• Search the DB for exact matches, called seeds
Basic Local Alignment Search Tool (BLAST)
• Extend match – MSP
• Extend each hit in both directions to obtain a locally maximal segment pair (MSP)
• Terminate when the score for a segment pair falls below a certain threshold
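The seed-and-extend idea can be sketched as below. This is a simplified illustration, not the actual BLAST implementation: it skips the step that creates similar words from a substitution matrix, uses an assumed +1/-1 match/mismatch score, and stops an extension once the running score has dropped more than an assumed amount (drop) below the best score seen. All function and parameter names are hypothetical.

```python
def find_seeds(query, subject, w=4):
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    seeds = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, w=4, match=1, mismatch=-1, drop=2):
    """Extend a seed without gaps in both directions.

    Returns (best_score, query_start, query_end) with query_end exclusive.
    """
    best = score = w * match
    best_right = k = 0
    while i + w + k < len(query) and j + w + k < len(subject):
        score += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if score > best:
            best, best_right = score, k
        elif best - score > drop:
            break
    score = best                      # continue leftwards from the best right end
    best_left = k = 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        score += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if score > best:
            best, best_left = score, k
        elif best - score > drop:
            break
    return best, i - best_left, i + w + best_right


query, subject = "ACGTACGT", "TTACGTACGTTT"
seeds = find_seeds(query, subject)
print(seeds[0], extend_seed(query, subject, *seeds[0]))  # prints (0, 2) (8, 0, 8)
```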
Score
• A number used to assess the biological
relevance of a finding.
• In the context of sequence alignments, a score
is a numerical value that describes the overall
quality of an alignment.
• Higher numbers correspond to higher
similarity.
• The score scale depends on the scoring system
used (substitution matrix, gap penalty).
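To make the dependence on the scoring system concrete, here is a small sketch that scores an already aligned pair of sequences. The values are assumptions for illustration only: +1 for a match, -1 for a mismatch, -5 for opening a gap and -1 for each further position of the same gap. A real protein alignment would take the match and mismatch values from a substitution matrix instead.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Score two aligned rows of equal length; '-' marks a gap position."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    prev_gap = None  # which row ("a" or "b") had a gap in the previous column
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gap_row = "a" if x == "-" else "b"
            # The first column of a gap pays gap_open, later columns gap_extend.
            total += gap_extend if gap_row == prev_gap else gap_open
            prev_gap = gap_row
        else:
            total += match if x == y else mismatch
            prev_gap = None
    return total


print(score_alignment("AC--GT", "ACTTGT"))  # prints -2
```

Changing any of the four parameters changes the scale of the score, which is why raw scores from different scoring systems are not directly comparable.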
Bit score
• The bit score gives an indication of how good
the alignment is; the higher the score, the
better the alignment.
• In general terms, this score is calculated from a
formula that takes into account the alignment
of similar or identical residues, as well as any
gaps introduced to align the sequences.
Bit-score:
• A log-scaled version of a score.
• Max score = highest alignment score (bit-score) between the query sequence and the database sequence segment.
• In the context of sequence alignments (BLAST), the bit-score S' is a normalized score expressed in bits that lets you estimate the magnitude of the search space you would have to look through before you would expect to find a score as good as or better than this one by chance.
• S' = (λS − ln K) / ln 2
• S is the raw score. Parameters λ and K depend on the substitution matrix and on the gap penalties (Altschul).
• The bit-score is thus a rescaled version of the raw alignment score that is independent of the size of the search space.
• Total score = sum of alignment scores of all segments from the same database sequence that match the query sequence (calculated over all segments). This score is different from the max score if several parts of the database sequence match different parts of the query sequence.
• Query coverage = percent of the query length that is included
in the aligned segments. This coverage is calculated over all
segments (cf. total score).
• E-value = number of alignments expected by chance with a
particular score or better. The expect value is the default
sorting metric and normally gives the same sorting order as
Max score.
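The rescaling from raw score S to bit-score S' can be written as a one-line function using the standard Karlin-Altschul relation S' = (λS − ln K) / ln 2. The parameter values in the example call are illustrative assumptions, not values given in the lecture; BLAST reports the λ and K it actually used in the statistics block of each search.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit-score: (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameter values only (assumed, not from the lecture).
print(round(bit_score(100, lam=0.267, k=0.041), 1))
```

With λ = ln 2 and K = 1 the function returns the raw score unchanged, which is a quick way to check the formula.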
BLAST – E-value

• The E-value is not a chance or a probability, but an estimate of how many times (a count) you would expect a result (e.g. a score in a sequence comparison) at least as extreme as the one observed to occur by chance.
• A value close to zero means that you would expect practically no unrelated sequence to score as high against your query sequence. E-values cannot be negative.
• Measures the reliability of a match
• Given a match with score S, E is the expected number of matches with score S or higher
• Lower E-value = more reliable match
• E-values are often written in scientific notation, e.g. E = 0.00001 = 10^-5 (10 raised to the power −5)
• 2e-6 = 2 × 10^-6

• E-val(S) = P-val(S) × N, where N is the size of the search space (N = n × m, where n is the length of the query sequence and m is the length of the database).
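A short worked sketch of the E-value arithmetic follows. The first function applies the relation E = P × n × m exactly as stated above. The second uses the standard relation between E-value and bit-score, E = n × m × 2^(−S'), which is not given in the lecture and is included here as an assumption for comparison. Function names and the example numbers are hypothetical.

```python
def e_value_from_p(p_value, query_len, db_len):
    """E = P * N with search space N = query_len * db_len."""
    return p_value * query_len * db_len


def e_value_from_bits(bits, query_len, db_len):
    """E = N * 2**(-S') for a bit-score S' (standard relation, assumed here)."""
    return query_len * db_len * 2.0 ** (-bits)


# A query of 100 residues against a database of 1,000,000 residues.
print(e_value_from_p(1e-9, 100, 1_000_000))
print(e_value_from_bits(40, 100, 1_000_000))
# Scientific notation such as 2e-6 is read directly by Python.
print(float("2e-6") == 2 * 10 ** -6)  # prints True
```

Both functions show why the same score gives a larger E-value when the database grows: the search space N enters as a plain factor.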
Similarity scoring
Similarity scoring matrices for proteins
• Point Accepted Mutation (PAM)
• PAM1
• Observed substitution rates when 1% of the amino acids have changed, i.e. 1 mutation per 100 aa
• Dayhoff (1978) used 71 protein families with 1572 mutations
• Matrices with higher PAM values are derived from PAM1
• PAM250 = 250 mutations per 100 aa
• Due to back mutations and silent mutations, sequences at PAM250 are ~20% identical
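Higher PAM matrices are obtained by multiplying the PAM1 mutation probability matrix by itself. The sketch below shows this on a toy two-letter alphabet, which is an assumption made purely to keep the example small; the real PAM1 is a 20 × 20 matrix. It also shows why 250 mutations per 100 positions does not mean 0% identity: repeated changes at the same position can restore the original letter.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Toy "PAM1": each letter has a 1% chance of changing per step (assumed values).
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_pam250 = mat_pow(toy_pam1, 250)
# Probability that a position still shows its original letter after 250 steps.
print(round(toy_pam250[0][0], 3))
```

With only two letters the identity levels off near 50%; with twenty amino acids and realistic substitution rates it levels off much lower, consistent with the ~20% quoted above.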
Point Accepted Mutation (PAM)

• Guideline
• PAM40 for highly similar sequences
• PAM70 for moderately similar sequences
• PAM250 for highly divergent sequences
Similarity scoring
• BLOcks of Amino Acid SUbstitution Matrix (BLOSUM)
• Derived from blocks, i.e. ungapped local alignments, with different levels of identity
• E.g., BLOSUM62 is derived from blocks in which sequences sharing 62% identity or more are clustered and counted as a single sequence
• Henikoff & Henikoff (1992) used 2000 blocks, representing 500 protein groups
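Entries of a BLOSUM-style matrix are log-odds scores: the logarithm of how often a residue pair is observed in the blocks relative to how often it would be expected by chance. The sketch below assumes the half-bit convention (2 × log2, rounded to an integer) used for BLOSUM matrices; the frequencies in the example are made-up numbers, not values from the Blocks database.

```python
import math


def log_odds_score(observed, expected, scale=2.0):
    """Return round(scale * log2(observed / expected)); scale=2 gives half-bits."""
    return round(scale * math.log2(observed / expected))


# A pair seen four times more often than expected by chance scores +4.
print(log_odds_score(0.04, 0.01))    # prints 4
# A pair seen four times less often than expected scores -4.
print(log_odds_score(0.0025, 0.01))  # prints -4
```

Positive scores therefore mark substitutions that are favoured in related proteins, and negative scores mark substitutions that are rarer than chance.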
PAM vs. BLOSUM
• Roughly equivalent PAM and BLOSUM matrices
• PAM100 <=> BLOSUM90
• PAM120 <=> BLOSUM80
• PAM160 <=> BLOSUM60
• PAM200 <=> BLOSUM52
• PAM250 <=> BLOSUM45
