BioAlg02

The document provides an overview of bioinformatics algorithms, focusing on physical mapping and restriction mapping techniques. It discusses the discovery of restriction enzymes, the construction of restriction maps, and the challenges associated with reconstructing DNA sequences from fragment sizes. Additionally, it covers methods like gel electrophoresis, double digest mapping, and various computational problems related to restriction mapping.

Uploaded by

hipaji6592

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Physical Mapping –
Restriction Mapping

Molecular Scissors

Molecular Cell Biology, 4th edition



Discovering Restriction Enzymes

• HindII, the first restriction enzyme, was discovered accidentally in 1970 during a study of how the bacterium Haemophilus influenzae takes up DNA from a virus
• It recognizes and cuts DNA at the sequences:
GTGCAC
GTTAAC

Discovering Restriction Enzymes


Werner Arber, Daniel Nathans, Hamilton Smith

Werner Arber – discovered restriction enzymes
Daniel Nathans – pioneered the application of restriction enzymes for the construction of genetic maps
Hamilton Smith – showed that a restriction enzyme cuts DNA in the middle of a specific sequence

"My father has discovered a servant who serves as a pair of scissors. If a foreign king invades a bacterium, this servant can cut him in small fragments, but he does not do any harm to his own king. Clever people use the servant with the scissors to find out the secrets of the kings. For this reason my father received the Nobel Prize for the discovery of the servant with the scissors".
Werner Arber’s daughter (from Nobel lecture)

Recognition Sites of Restriction Enzymes

Molecular Cell Biology, 4th edition



Restriction Maps
• A map showing positions of restriction sites in a DNA sequence
• If the DNA sequence is known, then construction of the restriction map is a trivial exercise
• In the early days of molecular biology, DNA sequences were often unknown
• Biologists had to solve the problem of constructing restriction maps without knowing DNA sequences
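
To illustrate why the known-sequence case is trivial, here is a minimal sketch that scans a sequence for a recognition site. The function name `restriction_map` and the toy sequence are assumptions made for this example; it reports where each recognition site starts, not the exact cut point within the site.

```python
def restriction_map(sequence, site):
    """Return the 0-based start position of every occurrence of site in sequence."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Continue one character later so overlapping sites are also found.
        start = sequence.find(site, start + 1)
    return positions


# Toy sequence (made up for illustration) containing the site GTTAAC twice.
print(restriction_map("AAGTTAACCGGTTAACT", "GTTAAC"))  # [2, 10]
```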

Physical map
• Definition: Let S be a DNA sequence. A physical map consists of a set M of markers and a function p : M → N that assigns each marker in M a position in S.

• N denotes the set of nonnegative integers



Restriction mapping problem


• For a set X of points on the line, let ΔX = { |x1 – x2| : x1, x2 ∈ X } denote the multiset of all pairwise distances between points in X. In the restriction mapping problem, a subset E ⊆ ΔX (of experimentally obtained fragment lengths) is given and the task is to reconstruct X from E.
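
The multiset of pairwise distances can be computed directly. This is a minimal sketch; the name `delta` is an assumption, and it counts each unordered pair of distinct points once (so no zero distances appear).

```python
from collections import Counter
from itertools import combinations


def delta(points):
    """Multiset of pairwise distances between distinct points, as a Counter."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


dX = delta([0, 2, 4, 7, 10])
print(sorted(dX.elements()))  # [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
```

A Counter is used so that repeated distances keep their multiplicity, which a plain set would lose.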

Full Restriction Digest

• Cutting DNA at each restriction site creates multiple restriction fragments:

Is it possible to reconstruct the order of the fragments from the sizes of the fragments {3,5,5,9}?

Full Restriction Digest: Multiple Solutions

• Alternative ordering of restriction fragments:

[Figure: two different orderings of the same restriction fragments]
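
A small sketch of why fragment sizes alone do not fix the order: every distinct ordering of the sizes {3,5,5,9} gives a different set of cut positions. The helper name `maps_from_fragments` is an assumption for this example.

```python
from itertools import accumulate, permutations


def maps_from_fragments(fragments):
    """Return every distinct tuple of interior cut positions consistent with the sizes."""
    orderings = set(permutations(fragments))  # set() drops repeats caused by equal sizes
    # Prefix sums give the cut positions; the last sum is the total length, not a cut.
    return sorted(tuple(accumulate(order))[:-1] for order in orderings)


maps = maps_from_fragments([3, 5, 5, 9])
print(len(maps))  # 12 candidate maps for a single digest
```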

Measuring Length of Restriction Fragments

• Restriction enzymes break DNA into restriction fragments.

• Gel electrophoresis is a process for separating DNA by size and measuring sizes of restriction fragments

• Can separate DNA fragments that differ in length by only 1 nucleotide for fragments up to 500 nucleotides long

Gel Electrophoresis
• DNA fragments are injected into a gel positioned in an electric field
• DNA is negatively charged near neutral pH
  The deoxyribose-phosphate backbone is acidic, so DNA has an overall negative charge
• DNA molecules move towards the positive electrode
• DNA fragments of different lengths are separated according to size
  Smaller molecules move through the gel matrix more readily than larger molecules
• The gel matrix restricts random diffusion so molecules of different lengths separate into different bands

Gel Electrophoresis: Example

Direction of DNA
movement

Smaller fragments
travel farther

Molecular Cell Biology, 4th edition



Visualization of DNA:
Autoradiography and Fluorescence
• Autoradiography:

• The DNA is radioactively labeled. The gel is laid against a sheet of photographic film in the dark, exposing the film at the positions where the DNA is present

• Fluorescence:

• The gel is incubated with a solution containing the fluorescent dye ethidium; ethidium binds to the DNA

• The DNA lights up when the gel is exposed to ultraviolet light.

Three different problems


1. the double digest problem – DDP
2. the partial digest problem – PDP
3. the simplified partial digest problem – SPDP

Double Digest Mapping


Use two restriction enzymes, A and B, and perform three full digests of the sequence S:
1. a complete digest of S using A,
2. a complete digest of S using B, and
3. a complete digest of S using both A and B.

• Computationally, the Double Digest problem is more complex than the Partial Digest problem
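
The three digests can be simulated when the map is known. This is only a sketch: the function name `digest`, the sequence length 10, and the cut positions are hypothetical values chosen for illustration.

```python
def digest(length, cuts):
    """Sorted fragment lengths from a complete digest of a sequence of the given length."""
    bounds = [0, *sorted(set(cuts)), length]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


# Hypothetical map: a 10-unit sequence; enzyme A cuts at 2 and 7, enzyme B at 4.
cuts_a, cuts_b = [2, 7], [4]
d_a = digest(10, cuts_a)            # [2, 3, 5]
d_b = digest(10, cuts_b)            # [4, 6]
d_ab = digest(10, cuts_a + cuts_b)  # [2, 2, 3, 3]
print(d_a, d_b, d_ab)
```

The experiment only reveals the sorted lengths, which is why the order has to be reconstructed computationally.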

Double Digest: Example



Double Digest: Example

Without the information about X (i.e. the A+B digest), it is impossible to solve the double digest problem, as this diagram illustrates

Double Digest Problem


Input: dA – fragment lengths from the complete digest with enzyme A.
       dB – fragment lengths from the complete digest with enzyme B.
       dX – fragment lengths from the complete digest with both A and B.

Output: A – location of the cuts in the restriction map for the enzyme A.
        B – location of the cuts in the restriction map for the enzyme B.
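
For very small inputs the formulation above can be solved by exhaustive search. The sketch below is not a method from the slides: it simply tries every ordering of dA and dB and keeps the pairs whose combined cuts reproduce dX. The function name `double_digest` is an assumption, and the running time is factorial in the number of fragments.

```python
from itertools import accumulate, permutations


def double_digest(d_a, d_b, d_x):
    """Return all (A, B) cut-position pairs whose digests match d_a, d_b and d_x."""
    total, target = sum(d_a), sorted(d_x)
    solutions = []
    for order_a in set(permutations(d_a)):
        cuts_a = list(accumulate(order_a))[:-1]      # interior cuts of enzyme A
        for order_b in set(permutations(d_b)):
            cuts_b = list(accumulate(order_b))[:-1]  # interior cuts of enzyme B
            bounds = sorted({0, total, *cuts_a, *cuts_b})
            fragments = sorted(b - a for a, b in zip(bounds, bounds[1:]))
            if fragments == target:
                solutions.append((cuts_a, cuts_b))
    return solutions


# Toy input (made up for illustration): dA = {2,3,5}, dB = {4,6}, dX = {2,2,3,3}.
solutions = double_digest([2, 3, 5], [4, 6], [2, 2, 3, 3])
print(([2, 7], [4]) in solutions)  # True
```

On this toy input the search also returns the mirror-image map ([3, 8], [6]), a first hint that solutions need not be unique.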

Double Digest: Multiple Solutions



Double digest
• The decision problem of the DDP is NP-complete.
• All known algorithms have problems with more than 10 restriction sites for each enzyme.
• A solution may not be unique and the number of solutions grows exponentially.
• DDP is a favorite mapping method since the experiments are easy to conduct.

DDP is NP-complete
1. DDP is in NP – easy
2. Given a set of integers X = {x1, . . . , xl}, the Set Partitioning Problem (SPP) is to determine whether we can partition X into two subsets X1 and X2 such that

$\sum_{x \in X_1} x = \sum_{x \in X_2} x$

3. This problem is known to be NP-complete.

DDP is NP-complete
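• Let X = {x1, . . . , xl} be the input of the SPP, assuming that the sum of all elements of X is even. Then set
dA = X,
dB = {K/2, K/2} with $K = \sum_{x \in X} x$, and
dAB = dA.
• If this DDP instance has a solution in which the fragments of A appear in the order xj1, . . . , xjl, then there exists an index n0 with

$\sum_{i=1}^{n_0} x_{j_i} = \sum_{i=n_0+1}^{l} x_{j_i}$

because of the choice of dB and dAB (the single cut of B at K/2 must coincide with a cut of A). Thus a solution for the SPP exists.
• Thus SPP is a DDP in which one of the two enzymes produced only two fragments of equal length.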
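
The reduction can be made concrete with a short sketch. The names `spp_to_ddp` and `ddp_instance_solvable` are assumptions for this example, and the solvability check is a brute-force search over orderings, usable only for tiny inputs.

```python
from itertools import accumulate, permutations


def spp_to_ddp(xs):
    """Build the DDP instance (dA, dB, dAB) that encodes the SPP instance xs."""
    k = sum(xs)
    assert k % 2 == 0, "the construction assumes an even total"
    return list(xs), [k // 2, k // 2], list(xs)


def ddp_instance_solvable(d_a, d_b):
    """Brute force: does some ordering of dA place a cut exactly at K/2?"""
    half = d_b[0]
    return any(half in accumulate(order) for order in permutations(d_a))


d_a, d_b, d_ab = spp_to_ddp([3, 1, 1, 2, 2, 1])
print(ddp_instance_solvable(d_a, d_b))  # True, e.g. 3 + 2 = 1 + 1 + 2 + 1
```

A prefix of an ordering that sums to K/2 is exactly one half of a partition, which is the index n0 in the argument above.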

Partial Restriction Digest


• The sample of DNA is exposed to the restriction enzyme for
only a limited amount of time to prevent it from being cut at
all restriction sites
• This experiment generates the set of all possible restriction
fragments between every two (not necessarily consecutive)
cuts
• This set of fragment sizes is used to determine the positions
of the restriction sites in the DNA sequence

Multiset of Restriction Fragments


• We assume that the multiplicity of a fragment can be detected, i.e., the number of restriction fragments of the same length can be determined (e.g., by observing twice as much fluorescence intensity for a double fragment as for a single fragment)
Multiset: {3, 5, 5, 8, 9, 14, 14, 17, 19, 22}

Partial Digest Fundamentals


X: the set of n integers representing the location of all cuts in the restriction map, including the start and end

n: the total number of cuts

ΔX: the multiset of integers representing lengths of each of the fragments produced from a partial digest

One More Partial Digest Example


X     0   2   4   7  10
0         2   4   7  10
2             2   5   8
4                 3   6
7                     3
10

Representation of ΔX = {2, 2, 3, 3, 4, 5, 6, 7, 8, 10} as a two dimensional table, with elements of X = {0, 2, 4, 7, 10} along both the top and left side. The element at (i, j) in the table is xj – xi for 1 ≤ i < j ≤ n.

Partial Digest Problem: Formulation

Goal: Given all pairwise distances between points on a line, reconstruct the positions of those points

• Input: The multiset of pairwise distances L, containing n(n-1)/2 integers
• Output: A set X, of n integers, such that ΔX = L

Partial Digest: Multiple Solutions


• It is not always possible to uniquely reconstruct a set X based only on ΔX.
• For example, the set
X = {0, 2, 5}
and
(X + 10) = {10, 12, 15}
both produce ΔX = {2, 3, 5} as their partial digest set.
• The sets {0,1,2,5,7,9,12} and {0,1,5,7,8,10,12} present a less trivial example of non-uniqueness. They both digest into:
{1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 7, 7, 7, 8, 9, 10, 11, 12}
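
Both claims are easy to check mechanically. In this sketch the helper names `delta` and `homometric` are assumptions; two sets are compared by the multisets of their pairwise distances.

```python
from collections import Counter
from itertools import combinations


def delta(points):
    """Multiset of pairwise distances between distinct points."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def homometric(x, y):
    """True if x and y produce the same partial digest."""
    return delta(x) == delta(y)


print(homometric([0, 2, 5], [10, 12, 15]))                                # True
print(homometric([0, 1, 2, 5, 7, 9, 12], [0, 1, 5, 7, 8, 10, 12]))        # True
```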

Homometric Sets
X = {0, 1, 2, 5, 7, 9, 12}

      0   1   2   5   7   9  12
 0        1   2   5   7   9  12
 1            1   4   6   8  11
 2                3   5   7  10
 5                    2   4   7
 7                        2   5
 9                            3
12

X = {0, 1, 5, 7, 8, 10, 12}

      0   1   5   7   8  10  12
 0        1   5   7   8  10  12
 1            4   6   7   9  11
 5                2   3   5   7
 7                    1   3   5
 8                        2   4
10                            2
12

Partial Digest: Brute Force


1. Find the restriction fragment of maximum length M. M is the length of the DNA sequence.

2. For every possible set

X = {0, x2, … , xn-1, M}

compute the corresponding ΔX

3. If ΔX is equal to the experimental partial digest L, then X is the correct restriction map

BruteForcePDP
BruteForcePDP(L, n):
  M ← maximum element in L
  for every set of n – 2 integers 0 < x2 < … < xn-1 < M
    X ← {0, x2, …, xn-1, M}
    Form ΔX from X
    if ΔX = L
      return X
  output “no solution”
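
A direct Python rendering of the pseudocode above might look as follows. This is a sketch: the names `delta` and `brute_force_pdp` are assumptions, and `None` stands in for “no solution”.

```python
from collections import Counter
from itertools import combinations


def delta(points):
    """Multiset of pairwise distances between distinct points."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def brute_force_pdp(L, n):
    """Try every set {0, x2, ..., xn-1, M} with integer positions strictly between 0 and M."""
    target = Counter(L)
    M = max(L)
    for inner in combinations(range(1, M), n - 2):
        X = [0, *inner, M]
        if delta(X) == target:
            return X
    return None  # no solution


print(brute_force_pdp([2, 2, 3, 3, 4, 5, 6, 7, 8, 10], 5))  # [0, 2, 4, 7, 10]
```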

Efficiency of BruteForcePDP
• BruteForcePDP takes $O(M^{n-2})$ time since it must examine all possible sets of positions.

• One way to improve the algorithm is to limit the values of xi to only those values which occur in L.


AnotherBruteForcePDP
AnotherBruteForcePDP(L, n)
  M ← maximum element in L
  for every set of n – 2 integers 0 < x2 < … < xn-1 < M from L
    X ← {0, x2, …, xn-1, M}
    Form ΔX from X
    if ΔX = L
      return X
  output “no solution”
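
The only change from the first brute-force version is the candidate pool, as this sketch shows. The names `delta` and `another_brute_force_pdp` are assumptions, and `None` stands in for “no solution”.

```python
from collections import Counter
from itertools import combinations


def delta(points):
    """Multiset of pairwise distances between distinct points."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def another_brute_force_pdp(L, n):
    """Like the brute-force search, but positions are drawn only from values occurring in L."""
    target = Counter(L)
    M = max(L)
    candidates = sorted(set(L) - {M})  # every position is also a distance from 0
    for inner in combinations(candidates, n - 2):
        X = [0, *inner, M]
        if delta(X) == target:
            return X
    return None  # no solution


print(another_brute_force_pdp([2, 998, 1000], 3))  # [0, 2, 1000]
```

Restricting candidates is safe because every position x in X is itself the distance between x and 0, so it must appear in L.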

Efficiency of AnotherBruteForcePDP

• It’s more efficient, but still slow

• If L = {2, 998, 1000} (n = 3, M = 1000), BruteForcePDP will be extremely slow, but AnotherBruteForcePDP will be quite fast
• Fewer sets are examined, but runtime is still exponential: $O(n^{2n-4})$


Branch and Bound Algorithm for PDP

1. Begin with X = {0}


2. Remove the largest element in L and place it in X
3. See if the element fits on the right or left side of the
restriction map
4. When it fits, find the other lengths it creates and remove
those from L
5. Go back to step 1 until L is empty

WRONG ALGORITHM

Defining D(y, X)

• Before describing PartialDigest, first define D(y, X) as the multiset of all distances between point y and all other points in the set X

D(y, X) = {|y – x1|, |y – x2|, …, |y – xn|}

for X = {x1, x2, …, xn}
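
As a quick illustration of the definition, D(y, X) can be written in one line. Representing the multiset as a Counter is a choice made for this sketch.

```python
from collections import Counter


def D(y, X):
    """Multiset of distances between point y and every point of X."""
    return Counter(abs(y - x) for x in X)


print(sorted(D(7, [0, 2, 10]).elements()))  # [3, 5, 7]
```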



PartialDigest Algorithm

PartialDigest(L):
  width ← Maximum element in L
  DELETE(width, L)
  X ← {0, width}
  PLACE(L, X)

PartialDigest Algorithm (cont’d)


PLACE(L, X)
  if L is empty
    output X
    return
  y ← maximum element in L
  if D(y, X) ⊆ L
    Add y to X and remove lengths D(y, X) from L
    PLACE(L, X)
    Remove y from X and add lengths D(y, X) to L
  if D(width – y, X) ⊆ L
    Add width – y to X and remove lengths D(width – y, X) from L
    PLACE(L, X)
    Remove width – y from X and add lengths D(width – y, X) to L
  return
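
The two procedures above can be sketched in Python as follows. This is one possible rendering under stated assumptions, not a definitive implementation: the name `partial_digest` and the nested helpers are hypothetical, L is held in a Counter, and solutions are collected in a set so that a map reached along two branches is reported once.

```python
from collections import Counter


def partial_digest(L):
    """Return every set X (containing 0) whose pairwise distances equal L."""
    remaining = Counter(L)
    width = max(remaining)
    remaining[width] -= 1            # DELETE(width, L)
    X = [0, width]
    solutions = set()

    def fits(d):
        # D(y, X) ⊆ L, respecting multiplicities
        return all(remaining[k] >= c for k, c in d.items())

    def place():
        if sum(remaining.values()) == 0:
            solutions.add(tuple(sorted(X)))
            return
        y = max(k for k, c in remaining.items() if c > 0)
        # dict.fromkeys keeps the order and drops the repeat when y == width - y
        for candidate in dict.fromkeys((y, width - y)):
            d = Counter(abs(candidate - x) for x in X)
            if fits(d):
                X.append(candidate)
                remaining.subtract(d)
                place()
                X.pop()              # backtrack
                remaining.update(d)

    place()
    return sorted(solutions)


print(partial_digest([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]))
# [(0, 2, 4, 7, 10), (0, 3, 6, 8, 10)]
```

The second result is the mirror image of the first, since this sketch does not apply the symmetry shortcut and explores both y and width – y at the root.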


An Example
L = { 2, 2, 3, 3, 4, 5, 6, 7, 8, 10 }
X={0}

Remove 10 from L and insert it into X. We know this must be the length of the DNA sequence because it is the largest fragment.


An Example
L = { 2, 2, 3, 3, 4, 5, 6, 7, 8 }
X = { 0, 10 }

Take 8, the largest remaining element of L, and make y = 8 or y = 10 – 8 = 2. But since the two cases are symmetric, we can assume y = 2.

An Example
L = { 2, 2, 3, 3, 4, 5, 6, 7, 8 }
X = { 0, 10 }

We find that the distances from y = 2 to the other elements in X are D(y, X) = {8, 2}, so we remove {8, 2} from L and add 2 to X.


An Example
L = { 2, 3, 3, 4, 5, 6, 7 }
X = { 0, 2, 10 }

Take 7 from L and make y = 7 or y = 10 – 7 = 3. We will explore y = 7 first, so D(y, X) = {7, 5, 3}.

An Example
L = { 2, 3, 3, 4, 5, 6, 7 }
X = { 0, 2, 10 }

For y = 7, D(y, X) = {7, 5, 3}. Therefore we remove {7, 5, 3} from L and add 7 to X.

D(y, X) = {7, 5, 3} = {|7 – 0|, |7 – 2|, |7 – 10|}


An Example
L = { 2, 3, 4, 6 }
X = { 0, 2, 7, 10 }

Take 6 from L and make y = 6. Unfortunately D(y, X) = {6, 4, 1, 4}, which is not a subset of L. Therefore we won’t explore this branch.

An Example
L = { 2, 3, 4, 6 }
X = { 0, 2, 7, 10 }

This time make y = 10 – 6 = 4. D(y, X) = {4, 2, 3, 6}, which is a subset of L so we will explore this branch. We remove {4, 2, 3, 6} from L and add 4 to X.


An Example
L = { }
X = { 0, 2, 4, 7, 10 }

L is now empty, so we have a solution, which is X.

An Example
L = { 2, 3, 4, 6 }
X = { 0, 2, 7, 10 }

To find other solutions, we backtrack.

An Example
L = { 2, 3, 3, 4, 5, 6, 7 }
X = { 0, 2, 10 }

We backtrack further.

An Example
L = { 2, 3, 3, 4, 5, 6, 7 }
X = { 0, 2, 10 }

This time we will explore y = 3. D(y, X) = {3, 1, 7}, which is not a subset of L, so we won’t explore this branch.

An Example
L = { 2, 2, 3, 3, 4, 5, 6, 7, 8 }
X = { 0, 10 }

We have backtracked to the root. Therefore we have found all the solutions.

Analyzing PartialDigest Algorithm

• Still exponential in worst case, but is very fast on average

• Informally, let T(n) be time PartialDigest takes to place n cuts
  No branching case: T(n) < T(n-1) + O(n), which is quadratic
  Branching case: T(n) < 2T(n-1) + O(n), which is exponential
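
The two bounds follow from unrolling the recurrences. The sketch below writes the O(n) term as cn for an assumed constant c and treats T(1) as a constant.

```latex
% No branching: each level adds linear work.
T(n) \le T(n-1) + cn \le T(1) + c\sum_{k=2}^{n} k = O(n^2)

% Branching at every level: the number of subproblems doubles each time.
T(n) \le 2T(n-1) + cn \le 2^{n-1}\,T(1) + c\sum_{k=0}^{n-2} 2^{k}(n-k) = O(2^n)
```

The last sum is O(2^n) because, after substituting j = n - k, it equals 2^n times a partial sum of the convergent series of j/2^j.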

PDP analysis
• No polynomial time algorithm is known for PDP. In fact, the complexity of PDP is an open problem.
• S. Skiena devised a simple backtracking algorithm that performs well in practice, but may require exponential time.
• This approach is not a popular mapping method, as it is difficult to reliably produce all pairwise distances between restriction sites.

Simplified partial digest problem


• Given a target sequence S and a single restriction enzyme A, two different experiments are performed on two sets of copies of S:
1. In the short experiment, the time span is chosen so that each copy of the target sequence is cut precisely once by the restriction enzyme.
2. In the long experiment, a complete digest of S by A is performed.

SPDP
• Let Γ = {γ1, . . . , γ2N} be the multi-set of all fragment lengths obtained by the short experiment, and
• let Λ = {λ1, . . . , λN+1} be the multi-set of all fragment lengths obtained by the long experiment,
• where N is the number of restriction sites in S.
• Here is an example: Given these (unknown) restriction sites (in kb): 2 8 9 13 16
• We obtain Λ = {2kb, 6kb, 1kb, 4kb, 3kb}.
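
The two experiments can be simulated when the map is known. This is only a sketch: the function name `spdp_experiments` is an assumption, and the total sequence length of 20 kb is a hypothetical value chosen for illustration, since the example above does not state it.

```python
def spdp_experiments(sites, length):
    """Simulate the short (one cut per copy) and long (complete digest) experiments."""
    sites = sorted(sites)
    # Short experiment: a copy cut once at site s yields fragments s and length - s.
    short = sorted(f for s in sites for f in (s, length - s))   # 2N fragments
    # Long experiment: consecutive differences between neighbouring cuts.
    bounds = [0, *sites, length]
    long_ = [b - a for a, b in zip(bounds, bounds[1:])]         # N+1 fragments, in map order
    return short, long_


# Sites from the example; length=20 is an assumed value, not given in the slides.
short, long_ = spdp_experiments([2, 8, 9, 13, 16], length=20)
print(long_)   # [2, 6, 1, 4, 3, 4]
print(short)   # [2, 4, 7, 8, 9, 11, 12, 13, 16, 18]
```

The first five long-experiment fragments do not depend on the assumed length; only the final fragment does.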
