University of Illinois at Urbana-Champaign

Luthey-Schulten Group
Theoretical and Computational Biophysics Group
Biophysics 590C: Fall 2004

Sequence Alignment
Algorithms

Rommie Amaro
Zaida Luthey-Schulten
Felix Autenrieth
Anurag Sethi
Brijeet Dhaliwal
Taras Pogorelov
Barry Isralewitz
September 2004

A current version of this tutorial is available at

http://www.ks.uiuc.edu/Training/Tutorials/

Contents
1 Introduction

2 Sequence Alignment Algorithms
2.1 Manually perform a Needleman-Wunsch alignment
2.2 Finding homologous pairs of Class II tRNA synthetases

1 Introduction
Recent developments such as genome sequencing projects for several organisms
and high-throughput X-ray structure analysis have brought a large amount of
data about the sequences and structures of several thousand proteins to the
scientific community. This information can be used effectively for medical and
biological research only if one can extract functional insight from the
sequence and structural data. To achieve this we need to understand how the
proteins perform their functions. Two main computational techniques exist to
reach this goal: a bioinformatics approach, and atomistic molecular dynamics
simulations. Bioinformatics uses the statistical analysis of protein sequences
and structures to understand their function and to predict structures when only
sequence information is available. Molecular modeling and molecular dynamics
simulations use principles from physics and physical chemistry to study the
function and folding of proteins.
Bioinformatics methods are among the most powerful technologies available
in life sciences today. They are used in fundamental research on theories of
evolution and in more practical considerations of protein design. Algorithms and
approaches used in these studies range from sequence and structure alignments,
secondary structure prediction, functional classification of proteins, threading
and modeling of distantly-related homologous proteins to modeling the progress
of protein expression through a cell’s life cycle.
In this tutorial you will use a classic global sequence alignment method, the
Needleman-Wunsch algorithm, to align two small proteins. First you will align
them by hand and perform your own dynamic programming; afterwards you
will check your work against a computer program that we provide you. The
Needleman-Wunsch alignment programs have been kindly provided by Anurag
Sethi.
The entire tutorial takes about an hour to complete.
Protein sequences vs. nucleotide sequences. A protein is a sequence
of amino acids linked with peptide bonds to form a polypeptide
chain. In this tutorial, the word sequence (unless otherwise
specified) refers to the amino acid residue sequence of a protein; by
convention these sequences are listed from the N-terminal to the
C-terminal of the chain. Sequences can be written with full names, as
in “Lysine, Arginine, Cysteine, ...”, with 3-letter codes, “Lys, Arg,
Cys, ...”, or with 1-letter codes, “K, R, C, ...”. Proteins range in
size from a few dozen to several thousand residues. The nucleotide
sequence of DNA encodes the protein sequence. Sections of genes in
chromosomal DNA are copied to mRNA, which provides the guide
for the ribosome to assemble a protein. A nucleotide sequence may be
written as “Cytosine, Adenine, Adenine, Guanine, ...”, or “C, A, A,
G, ...”.
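The relation between the 3-letter and 1-letter notations can be sketched in a few lines of Python. This is only an illustration: the dictionary holds the standard codes for the 20 common amino acids, and the names `THREE_TO_ONE` and `to_one_letter` are made up for this sketch and are not part of the tutorial software.

```python
# Standard 3-letter to 1-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert a list of 3-letter codes (N- to C-terminal) to a 1-letter string."""
    return "".join(THREE_TO_ONE[r] for r in residues)


# The example from the text: "Lys, Arg, Cys" becomes "KRC".
print(to_one_letter(["Lys", "Arg", "Cys"]))
```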

This tutorial assumes that the alignment programs we provide have been
correctly installed on your computer. Please ask a lab attendant for help
if you have any trouble locating software or data files during the tutorial.

Getting started
The files for this tutorial are located in:
>> mkdir ~/Workshop/bioinformatics-tutorial/
Within this directory is the PDF for the tutorial, as well as the files needed for
running the tutorial. Before you start the tutorial, be sure you are in the
directory with all the files:
>> cd ~/Workshop/bioinformatics-tutorial/bioinformatics

2 Sequence Alignment Algorithms

In this section you will optimally align two short protein sequences using pen
and paper, then search for homologous proteins by using a computer program to
align several, much longer, sequences.
Dynamic programming algorithms are recursive algorithms modified to store
intermediate results, which improves efficiency for certain problems. The
Needleman-Wunsch algorithm uses dynamic programming to find the optimal
global alignment of two sequences, a and b. The alignment algorithm is based
on finding the elements of a matrix H, where the element $H_{i,j}$ is the optimal
score for aligning the sequence $(a_1, a_2, \ldots, a_i)$ with $(b_1, b_2, \ldots, b_j)$.
Two similar amino acids (e.g. arginine and lysine) receive a high score, two
dissimilar amino acids (e.g. arginine and glycine) receive a low score. The
higher the score of a path through the matrix, the better the alignment. The
matrix H is found by progressively finding the matrix elements, starting at
$H_{1,1}$ and proceeding in the directions of increasing i and j. Each element is
set according to:

\[
H_{i,j} = \max \left\{
\begin{array}{l}
H_{i-1,j-1} + S_{i,j} \\
H_{i-1,j} - d \\
H_{i,j-1} - d
\end{array}
\right.
\]


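where $S_{i,j}$ is the similarity score of comparing amino acid $a_i$ to amino acid
$b_j$ (obtained here from the BLOSUM40 similarity table) and d is the penalty
for a single gap. The matrix is initialized with $H_{0,0} = 0$ and, for the global
alignment, the first row and column hold the accumulated gap penalties,
$H_{i,0} = -i\,d$ and $H_{0,j} = -j\,d$. When obtaining the local Smith-Waterman
alignment, $H_{i,j}$ is modified:

\[
H_{i,j} = \max \left\{
\begin{array}{l}
0 \\
H_{i-1,j-1} + S_{i,j} \\
H_{i-1,j} - d \\
H_{i,j-1} - d
\end{array}
\right.
\]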
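The only difference between the two update rules is the extra candidate 0 in the local case. A minimal Python sketch of a single cell update makes this explicit; the function name `cell_score` and its `local` flag are illustrative assumptions, not part of the tutorial programs.

```python
def cell_score(diag, up, left, s, d, local=False):
    """Value of one matrix cell from its three already-computed neighbours.

    diag, up, left -- H values of the diagonal, upper and left neighbours
    s              -- similarity score of the residue pair at this cell
    d              -- penalty for a single gap
    local          -- if True, add the Smith-Waterman floor of 0
    """
    candidates = [diag + s, up - d, left - d]
    if local:
        candidates.append(0)
    return max(candidates)


# Global rule with the numbers of square (1,1) of the worksheet: max(-1, -16, -16).
print(cell_score(0, -8, -8, -1, 8))
# Local rule with zero-valued neighbours: the score cannot drop below 0.
print(cell_score(0, 0, 0, -1, 8, local=True))
```

With the global rule a run of poor matches drags the score down indefinitely, whereas the local rule resets it to 0, which is what allows a local alignment to start anywhere in the matrix.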


The gap penalty can be modified; for instance, d can be replaced by (d × k),
where d is the penalty for a single gap and k is the number of consecutive gaps.
Once the optimal alignment score is found, the “traceback” through H along
the optimal path is performed; this path corresponds to the optimal sequence
alignment for that score. In the next set of exercises you will manually implement
the Needleman-Wunsch alignment for a pair of short sequences, then perform
global sequence alignments with a computer program developed by Anurag
Sethi, which is based on the Needleman-Wunsch algorithm with an affine gap
penalty, d + e(k − 1), where e is the gap extension penalty. The output file will
be in the GCG format, one of the two standard formats in bioinformatics for
storing sequence information (the other standard format is FASTA).
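To see how the two gap models differ, the short Python sketch below tabulates both penalties for runs of k consecutive gaps. The opening penalty d = 8 is the value used in this tutorial; the extension penalty e = 1 is a hypothetical value chosen only for illustration, since the text does not state which e the provided program uses.

```python
def linear_gap(k, d):
    """Penalty d * k for k consecutive gaps."""
    return d * k


def affine_gap(k, d, e):
    """Penalty d + e * (k - 1): d opens the gap, e extends it."""
    return d + e * (k - 1)


D = 8  # single-gap penalty used in this tutorial
E = 1  # hypothetical extension penalty, for illustration only

for k in (1, 2, 3, 4):
    print(k, linear_gap(k, D), affine_gap(k, D, E))
```

For k = 1 the two models agree; for longer gaps the affine penalty grows more slowly whenever e is smaller than d, so one long gap costs less than several short ones.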

2.1 Manually perform a Needleman-Wunsch alignment

In the first exercise you will test the Needleman-Wunsch algorithm on short
sequence fragments of hemoglobin (PDB code 1AOW) and myoglobin (PDB code
1AZI).

Here you will align the sequence HGSAQVKGHG to the sequence KTEAEMKASEDLKKHGT.

The two sequences are arranged in a matrix in Table 1. The sequences start at
the upper left corner, and the initial gap penalties are listed at each offset
starting position. With each move away from the start position, the initial
penalty increases by our single gap penalty of 8.

          H    G    S    A    Q    V    K    G    H    G
     0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
K   -8
T  -16
E  -24
A  -32
E  -40
M  -48
K  -56
A  -64
S  -72
E  -80
D  -88
L  -96
K -104
K -112
H -120
G -128
T -136

Table 1: The empty matrix with initial gap penalties.
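The first row and column of Table 1 are simply multiples of the gap penalty. A minimal Python sketch of this initialization follows; the function name `empty_matrix` and the use of `None` for cells that are not yet filled are choices made for this illustration.

```python
def empty_matrix(a, b, d):
    """Matrix with len(b)+1 rows and len(a)+1 columns.

    Only the first row and first column are filled, with accumulated gap
    penalties; every other cell is left as None until it is computed.
    """
    H = [[None] * (len(a) + 1) for _ in range(len(b) + 1)]
    for i in range(len(a) + 1):
        H[0][i] = -d * i  # top row: gaps against the sequence down the side
    for j in range(len(b) + 1):
        H[j][0] = -d * j  # left column: gaps against the sequence across the top
    return H


H = empty_matrix("HGSAQVKGHG", "KTEAEMKASEDLKKHGT", 8)
print(H[0])                   # top row of Table 1
print([row[0] for row in H])  # left column of Table 1
```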


1 The first step is to fill in the similarity scores $S_{i,j}$ by looking up the
matches in the BLOSUM40 table, shown here labeled with 1-letter amino
acid codes:

A 5
R -2 9
N -1 0 8
D -1 -1 2 9
C -2 -3 -2 -2 16
Q 0 2 1 -1 -4 8
E -1 -1 -1 2 -2 2 7
G 1 -3 0 -2 -3 -2 -3 8
H -2 0 1 0 -4 0 0 -2 13
I -1 -3 -2 -4 -4 -3 -4 -4 -3 6
L -2 -2 -3 -3 -2 -2 -2 -4 -2 2 6
K -1 3 0 0 -3 1 1 -2 -1 -3 -2 6
M -1 -1 -2 -3 -3 -1 -2 -2 1 1 3 -1 7
F -3 -2 -3 -4 -2 -4 -3 -3 -2 1 2 -3 0 9
P -2 -3 -2 -2 -5 -2 0 -1 -2 -2 -4 -1 -2 -4 11
S 1 -1 1 0 -1 1 0 0 -1 -2 -3 0 -2 -2 -1 5
T 0 -2 0 -1 -1 -1 -1 -2 -2 -1 -1 0 -1 -1 0 2 6
W -3 -2 -4 -5 -6 -1 -2 -2 -5 -3 -1 -2 -2 1 -4 -5 -4 19
Y -2 -1 -2 -3 -4 -1 -2 -3 2 0 0 -1 1 4 -3 -2 -1 3 9
V 0 -2 -3 -3 -2 -3 -3 -4 -4 4 2 -2 1 0 -3 -1 1 -3 -1 5
B -1 -1 4 6 -2 0 1 -1 0 -3 -3 0 -3 -3 -2 0 0 -4 -3 -3 5
Z -1 0 0 1 -3 4 5 -2 0 -4 -2 1 -2 -4 -1 0 -1 -2 -2 -3 2 5
X 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 0 -1 -2 0 0 -2 -1 -1 -1 -1 -1
A R N D C Q E G H I L K M F P S T W Y V B Z X

2 We fill in the BLOSUM40 similarity scores for you in Table 2.

3 Turning this S matrix into the dynamic programming H matrix requires
calculating the contents of all 170 boxes. We’ve calculated the first
4 here, and encourage you to calculate the contents of at least 4 more.
The practice will come in handy in the next steps. As described above, a
matrix square cannot be filled with its dynamic programming value until
the squares above, to the left, and to the above-left diagonal are computed.
The value of a square is

\[
H_{i,j} = \max \left\{
\begin{array}{l}
H_{i-1,j-1} + S_{i,j} \\
H_{i-1,j} - d \\
H_{i,j-1} - d
\end{array}
\right.
\]

using the convention that H values appear in the top part of a square in
large print, and S values appear in the bottom part of a square in small
print. Our gap penalty d is 8.

Example: In the upper left square in Table 2, square (1,1), the
similarity score $S_{1,1}$ is -1, the number in small type at the bottom of
the box. The value to assign as $H_{1,1}$ will be the greatest (“max”)
of these three values: ($H_{0,0} + S_{1,1}$), ($H_{0,1} - d$), ($H_{1,0} - d$). That
is, the greatest of (0 + -1), (-8 - 8), (-8 - 8), which just means
the greatest of -1, -16, and -16. This is -1, so we write -1 as the
value of $H_{1,1}$ (the larger number in the top part of the box). The
same reasoning in square (2,1) leads us to set $H_{2,1}$ as -9, and so
on. Note: we consider $H_{0,0}$ to be the “predecessor” of $H_{1,1}$, since
it helped decide $H_{1,1}$’s value. Later, predecessors will qualify to be
on the traceback path.

4 Again, just fill in 4 or 5 boxes in Table 2 until you get a feel for gap
penalties and similarity scores S vs. alignment scores H. In the next step,
we provide the matrix with all values filled in as Table 3. Check that
your 4 or 5 calculations match.

5 Now we move to Table 3, where all 170 $H_{i,j}$ values are shown, to do the
“alignment traceback”. Finding the alignment requires tracing the
path from the end of the sequences (the lower right box) to the
start of the sequences (the upper left box). This job looks complicated,
but should only take about 5-7 minutes.

6 We are tracing a path in Table 3, from the lower right box to the upper
left box. You can only move to a square if it could have been a
“predecessor” of your current square; that is, when the matrix was being filled
with $H_{i,j}$ values, the move from the predecessor square to your current
square would have followed the mathematical rules we used to find $H_{i,j}$
above. Circle each square you move to along your path.

Example: We start at the lower right square (10,17), where $H_{10,17}$
is -21 and $S_{10,17}$ is -2. We need to test for 3 possible directions of
movement: diagonal (up + left), up, and left. The condition for
diagonal movement given above is $H_{i,j} = H_{i-1,j-1} + S_{i,j}$, so for
the diagonal box (9,16) to have contributed to (10,17), $H_{9,16} + S_{10,17}$
would have to equal the H value of our box, -21. Since (-29
+ -2) does not equal -21, the diagonal box is not a “predecessor”,
so we can’t move in that direction. We try the rule for the box
to the left: $H_{i,j} = H_{i-1,j} - d$. Since -37 - 8 does not equal -21,
we also can’t move left. Our last chance is moving up. We test
$H_{i,j} = H_{i,j-1} - d$. Since -21 = (-13 - 8), we can move up! Draw
an arrow from the lower right box ($H_{10,17} = -21$, $S_{10,17} = -2$) to
the box just above it ($H_{10,16} = -13$, $S_{10,16} = 8$).

7 Continue moving squares, drawing arrows, and circling each new square
you land on, until you have reached the upper left corner of the matrix.
If the path branches, follow both branches.

8 Write down the alignment(s) that correspond to your path(s) by writing
the letter codes on the margins of each position along your circled path.
Aligned pairs are at the boxes at which the path exits via the upper-left
corner. When there are horizontal or vertical movements along
your path, there will be a gap (write as a dash, “-”) in your sequence.

9 Now check your results against a computer program. We have prepared
a pairwise Needleman-Wunsch alignment program, pair, which you will
apply to the same sequences that you have just manually aligned.

10 Change your directory by typing at the Unix prompt:

cd ~/Workshop/bioinformatics-tutorial/bioinformatics/pairData
then start the pair alignment executable by typing:
pair targlist
All alignments will be carried out using the BLOSUM40 matrix, with a
gap penalty of 8. The paths to the input files and the BLOSUM40 matrix
used are defined in the file targlist; the BLOSUM40 matrix is the first
25 lines of the file blosum40. (Other substitution matrices can be found
at the NCBI/Blast website.)
Note: In some installations, the pair executable is in
~/Workshop/bioinformatics-tutorial/bioinformatics/pairData, and you must type
./pair targlist to run it.
If you cannot access the pair executable at all, you can see the output from
this step in
~/Workshop/bioinformatics-tutorial/bioinformatics/pairData/example output/

11 After the program runs, you will have three output files: align,
scorematrix, and stats. View the alignment in GCG format by
typing less align. The file scorematrix is the 17x10 H matrix. If
there are multiple paths along the traceback matrix, the program pair
will choose only one path, by following this precedence rule for existing
potential traceback directions, listed in decreasing precedence: diagonal
(left and up), up, left. In the file stats you will find the optimal alignment
score and the percent identity of the alignment.

Questions. Compare your manual alignment to the output of
the pair program. Do the alignments match?
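When comparing alignments it helps to check two things: that removing the gaps gives back the original sequences, and how many columns hold identical residues. The Python sketch below does both for one optimal alignment of the two worksheet sequences (the one obtained by preferring diagonal, then up, then left during traceback). The definition of percent identity used here, identical columns divided by alignment length, is an assumption; the stats file written by pair may use a different denominator.

```python
def ungap(aligned):
    """Remove gap characters to recover the original sequence."""
    return aligned.replace("-", "")


def percent_identity(top, side):
    """Identical, non-gap columns as a percentage of the alignment length.

    Assumption: the denominator is the full alignment length, gaps included.
    """
    if len(top) != len(side):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(top, side) if x == y and x != "-")
    return 100.0 * same / len(top)


top = "--HG--SA-Q-VKGHG-"
side = "KTEAEMKASEDLKKHGT"

print(ungap(top), ungap(side))
print(round(percent_identity(top, side), 1))
```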

2.2 Finding homologous pairs of Class II tRNA synthetases

Homologous proteins are proteins derived from a common ancestral gene. In
this exercise with the Needleman-Wunsch algorithm you will study the sequence
identity of several class II tRNA synthetases, which differ in their domain of
life (Eukarya, Eubacteria, or Archaea) or in the kind of aminoacylation
reaction they catalyze. Table 4 summarizes the reaction type, the organism, and the
PDB accession code and chain name of the employed class II tRNA synthetase
domains.

We have prepared a computer program, multiple, which will align multiple
pairs of proteins.

1 Change your directory by typing at the Unix prompt:

cd ~/Workshop/bioinformatics-tutorial/bioinformatics/multipleData
then start the alignment executable by typing:
multiple targlist

Note: In some installations, the multiple executable is in
~/Workshop/bioinformatics-tutorial/bioinformatics/multipleData, and you must type
./multiple targlist to run it.
If you cannot access the multiple executable at all, you can see the output from
this step in ~/Workshop.work/Bioinformatics/multipleData/example output/

2 In the align and stats files you will find all possible pairwise combinations
of the provided sequences. On a piece of paper, write the names of the
proteins, grouped by their domain of life, as listed in Table 4. Compare
sequence identities of aligned proteins from the same domain of life,
and of aligned proteins from different domains of life, to help answer the
questions below.

Questions. What criteria do you use to determine if two
proteins are homologous? Can you find a pattern when you evaluate
percent identities between the pairs of class II tRNA synthetases?
Which pair is the most closely related evolutionarily, and which pair is
the most divergent, according to the sequence identity?

                 H     G     S     A     Q     V     K     G     H     G
      0    -8   -16   -24   -32   -40   -48   -56   -64   -72   -80
K    -8    -1    -9
           -1    -2     0    -1     1    -2     6    -2    -1    -2
T   -16    -9    -3
           -2    -2     2     0    -1     1     0    -2    -2    -2
E   -24
            0    -3     0    -1     2    -3     1    -3     0    -3
A   -32
           -2     1     1     5     0     0    -1     1    -2     1
E   -40
            0    -3     0    -1     2    -3     1    -3     0    -3
M   -48
            1    -2    -2    -1    -1     1    -1    -2     1    -2
K   -56
           -1    -2     0    -1     1    -2     6    -2    -1    -2
A   -64
           -2     1     1     5     0     0    -1     1    -2     1
S   -72
           -1     0     5     1     1    -1     0     0    -1     0
E   -80
            0    -3     0    -1     2    -3     1    -3     0    -3
D   -88
            0    -2     0    -1    -1    -3     0    -2     0    -2
L   -96
           -2    -4    -3    -2    -2     2    -2    -4    -2    -4
K  -104
           -1    -2     0    -1     1    -2     6    -2    -1    -2
K  -112
           -1    -2     0    -1     1    -2     6    -2    -1    -2
H  -120
           13    -2    -1    -2     0    -4    -1    -2    13    -2
G  -128
           -2     8     0     1    -2    -4    -2     8    -2     8
T  -136
           -2    -2     2     0    -1     1     0    -2    -2    -2

Table 2: Alignment score worksheet. In all alignment boxes, the similarity
score $S_{i,j}$ from the BLOSUM40 matrix lookup is supplied (bottom line of each
row). Four alignment scores are provided as examples (top line of each row,
after the initial gap penalty); try to calculate at least four more, following
the directions provided in the text for calculating $H_{i,j}$.
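The four example scores in the worksheet can be reproduced with a few lines of Python. The sketch below fills only the first two rows (K and T), using the similarity scores copied from those rows; the function name `fill_rows` is an illustrative choice. Note the index order: `rows[j][i]` in the code corresponds to $H_{i,j}$ in the text, because each Python list is one row of the worksheet.

```python
D = 8             # single-gap penalty from the text
A = "HGSAQVKGHG"  # sequence across the top of the worksheet

# Similarity scores copied from the first two rows of the worksheet.
S_ROWS = [
    [-1, -2, 0, -1, 1, -2, 6, -2, -1, -2],  # row K
    [-2, -2, 2, 0, -1, 1, 0, -2, -2, -2],   # row T
]


def fill_rows(s_rows, n_cols, d):
    """Fill H row by row; returns the initial penalty row plus one row per s_row."""
    prev = [-d * i for i in range(n_cols + 1)]
    rows = [prev]
    for j, s_row in enumerate(s_rows, start=1):
        cur = [-d * j]  # initial gap penalty in the first column
        for i in range(1, n_cols + 1):
            diag = prev[i - 1] + s_row[i - 1]  # H(i-1, j-1) + S(i, j)
            left = cur[i - 1] - d              # H(i-1, j) - d
            up = prev[i] - d                   # H(i, j-1) - d
            cur.append(max(diag, left, up))
        rows.append(cur)
        prev = cur
    return rows


rows = fill_rows(S_ROWS, len(A), D)
for row in rows:
    print(row)
```

The values in columns 1 and 2 of the two computed rows are the four examples given in the worksheet: -1 and -9 for row K, -9 and -3 for row T.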

                 H     G     S     A     Q     V     K     G     H     G
      0    -8   -16   -24   -32   -40   -48   -56   -64   -72   -80
K    -8    -1    -9   -16   -24   -31   -39   -42   -50   -58   -66
           -1    -2     0    -1     1    -2     6    -2    -1    -2
T   -16    -9    -3    -7   -15   -23   -30   -38   -44   -52   -60
           -2    -2     2     0    -1     1     0    -2    -2    -2
E   -24   -16   -11    -3    -8   -13   -21   -29   -37   -44   -52
            0    -3     0    -1     2    -3     1    -3     0    -3
A   -32   -24   -15   -10     2    -6   -13   -21   -28   -36   -43
           -2     1     1     5     0     0    -1     1    -2     1
E   -40   -32   -23   -15    -6     4    -4   -12   -20   -28   -36
            0    -3     0    -1     2    -3     1    -3     0    -3
M   -48   -39   -31   -23   -14    -4     5    -3   -11   -19   -27
            1    -2    -2    -1    -1     1    -1    -2     1    -2
K   -56   -47   -39   -31   -22   -12    -3    11     3    -5   -13
           -1    -2     0    -1     1    -2     6    -2    -1    -2
A   -64   -55   -46   -38   -26   -20   -11     3    12     4    -4
           -2     1     1     5     0     0    -1     1    -2     1
S   -72   -63   -54   -41   -34   -25   -19    -5     4    11     4
           -1     0     5     1     1    -1     0     0    -1     0
E   -80   -71   -62   -49   -42   -32   -27   -13    -4     4     8
            0    -3     0    -1     2    -3     1    -3     0    -3
D   -88   -79   -70   -57   -50   -40   -35   -21   -12    -4     2
            0    -2     0    -1    -1    -3     0    -2     0    -2
L   -96   -87   -78   -65   -58   -48   -38   -29   -20   -12    -6
           -2    -4    -3    -2    -2     2    -2    -4    -2    -4
K  -104   -95   -86   -73   -66   -56   -46   -32   -28   -20   -14
           -1    -2     0    -1     1    -2     6    -2    -1    -2
K  -112  -103   -94   -81   -74   -64   -54   -40   -34   -28   -22
           -1    -2     0    -1     1    -2     6    -2    -1    -2
H  -120   -99  -102   -89   -82   -72   -62   -48   -42   -21   -29
           13    -2    -1    -2     0    -4    -1    -2    13    -2
G  -128  -107   -91   -97   -88   -80   -70   -56   -40   -29   -13
           -2     8     0     1    -2    -4    -2     8    -2     8
T  -136  -115   -99   -89   -96   -88   -78   -64   -48   -37   -21
           -2    -2     2     0    -1     1     0    -2    -2    -2

Table 3: Traceback worksheet. The completed alignment score matrix H (top
line of each row, after the initial gap penalty) with the BLOSUM40 lookup
scores $S_{i,j}$ (bottom line of each row). To find the alignment, trace back
starting from the lower right (T vs G, score -21) and proceed diagonally (to
the left and up), left, or up. Only proceed, however, if the square in that
direction could have been a predecessor, according to the conditions
described in the text.
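The fill and the traceback can be checked against the worksheet with the Python sketch below. It is an illustration under stated assumptions, not the pair program: the similarity scores are copied from the worksheet and looked up by worksheet position rather than from the full BLOSUM40 table, the function names `fill` and `traceback` are made up here, and where the path branches only one branch is followed, trying diagonal first, then up, then left.

```python
A = "HGSAQVKGHG"         # sequence across the top of the worksheet
B = "KTEAEMKASEDLKKHGT"  # sequence down the side
D = 8                    # single-gap penalty from the text

# Similarity scores copied from the worksheet: one list per residue of B,
# one entry per column of A (H G S A Q V K G H G).
S = {
    "K": [-1, -2, 0, -1, 1, -2, 6, -2, -1, -2],
    "T": [-2, -2, 2, 0, -1, 1, 0, -2, -2, -2],
    "E": [0, -3, 0, -1, 2, -3, 1, -3, 0, -3],
    "A": [-2, 1, 1, 5, 0, 0, -1, 1, -2, 1],
    "M": [1, -2, -2, -1, -1, 1, -1, -2, 1, -2],
    "S": [-1, 0, 5, 1, 1, -1, 0, 0, -1, 0],
    "D": [0, -2, 0, -1, -1, -3, 0, -2, 0, -2],
    "L": [-2, -4, -3, -2, -2, 2, -2, -4, -2, -4],
    "H": [13, -2, -1, -2, 0, -4, -1, -2, 13, -2],
    "G": [-2, 8, 0, 1, -2, -4, -2, 8, -2, 8],
}


def fill(a, b, s, d):
    """Return H as a list of rows; H[j][i] holds the text's H(i, j)."""
    H = [[-d * i for i in range(len(a) + 1)]]
    for j in range(1, len(b) + 1):
        row = [-d * j]
        for i in range(1, len(a) + 1):
            diag = H[j - 1][i - 1] + s[b[j - 1]][i - 1]
            up = H[j - 1][i] - d
            left = row[i - 1] - d
            row.append(max(diag, up, left))
        H.append(row)
    return H


def traceback(H, a, b, s, d):
    """Follow one predecessor path: diagonal first, then up, then left."""
    i, j = len(a), len(b)
    top, side = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and H[j][i] == H[j - 1][i - 1] + s[b[j - 1]][i - 1]:
            top.append(a[i - 1])   # aligned pair
            side.append(b[j - 1])
            i -= 1
            j -= 1
        elif j > 0 and H[j][i] == H[j - 1][i] - d:
            top.append("-")        # moving up: gap in the top sequence
            side.append(b[j - 1])
            j -= 1
        else:
            top.append(a[i - 1])   # moving left: gap in the side sequence
            side.append("-")
            i -= 1
    return "".join(reversed(top)), "".join(reversed(side))


H = fill(A, B, S, D)
top, side = traceback(H, A, B, S, D)
print(H[-1][-1])
print(top)
print(side)
```

The first printed value is the score in the lower right box, -21. Because the sketch follows a single branch, a hand traceback that follows every branch may find additional alignments with the same score.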

Specificity  Organism    PDB code:chain  ASTRAL catalytic domain

Aspartyl     Eubacteria  1EQR:B          d1eqrb3
Aspartyl     Archaea     1B8A:A          d1b8aa2
Aspartyl     Eukarya     1ASZ:A          d1asza2
Glycyl       Archaea     1ATI:A          d1atia2
Histidyl     Eubacteria  1ADJ:C          d1adjc2
Lysyl        Eubacteria  1BBW:A          d1bbwa2
Aspartyl     Eubacteria  1EFW:A          d1efwa3

Table 4: Domain types, origins, and accession codes
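To organize the comparison, it can help to enumerate every pair of entries in Table 4 and split the pairs by whether both proteins come from the same domain of life. The Python sketch below does only this bookkeeping; it does not read the align or stats files, whose exact layout is not described here, and the variable names are illustrative.

```python
from itertools import combinations

# Entries as listed in Table 4: (specificity, domain of life, PDB code:chain).
DOMAINS = [
    ("Aspartyl", "Eubacteria", "1EQR:B"),
    ("Aspartyl", "Archaea", "1B8A:A"),
    ("Aspartyl", "Eukarya", "1ASZ:A"),
    ("Glycyl", "Archaea", "1ATI:A"),
    ("Histidyl", "Eubacteria", "1ADJ:C"),
    ("Lysyl", "Eubacteria", "1BBW:A"),
    ("Aspartyl", "Eubacteria", "1EFW:A"),
]

# Every unordered pair of the seven entries.
pairs = list(combinations(DOMAINS, 2))
same_domain = [(p, q) for p, q in pairs if p[1] == q[1]]
cross_domain = [(p, q) for p, q in pairs if p[1] != q[1]]

print(len(pairs), len(same_domain), len(cross_domain))
for p, q in same_domain:
    print(p[2], q[2], p[1])
```

Seven entries give 21 pairs in total, so the percent identities can be tabulated in two groups: pairs within one domain of life and pairs across domains.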
