BioAlg02
Physical Mapping –
Restriction Mapping
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Molecular Scissors
Restriction Maps
• A map showing positions
of restriction sites in a
DNA sequence
• If DNA sequence is
known then construction
of restriction map is a
trivial exercise
• In early days of
molecular biology DNA
sequences were often
unknown
• Biologists had to solve
the problem of
constructing restriction
maps without knowing
DNA sequences
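When the sequence is known, the "trivial exercise" amounts to a string search. A minimal sketch, assuming the enzyme is EcoRI (recognition site GAATTC) and using a made-up sequence `dna`; the function name `restriction_map` is illustrative only:

```python
def restriction_map(sequence: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Step one base forward so overlapping occurrences are also found.
        start = sequence.find(site, start + 1)
    return positions


dna = "TTGAATTCAAGGGAATTCTT"  # hypothetical example sequence
print(restriction_map(dna, "GAATTC"))  # [2, 12]
```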
Physical map
• Definition: Let S be a DNA sequence. A
physical map consists of a set M of markers and
a function p : M → N that assigns each marker
in M a position in S.
Gel Electrophoresis
• DNA fragments are injected into a gel positioned in an
electric field
• DNA is negatively charged near neutral pH
The sugar-phosphate backbone is acidic, so DNA
carries an overall negative charge
• DNA molecules move towards the positive electrode
• DNA fragments of different lengths are separated
according to size
Smaller molecules move through the gel matrix more readily than
larger molecules
• The gel matrix restricts random diffusion so molecules of
different lengths separate into different bands
[Figure: gel showing the direction of DNA movement; smaller fragments travel farther]
Visualization of DNA:
Autoradiography and Fluorescence
• Autoradiography:
• Fluorescence:
Double digest
• The decision problem of the DDP (Double Digest
Problem) is NP-complete.
• All known algorithms run into trouble with more
than 10 restriction sites for each enzyme.
• A solution may not be unique, and the number of
solutions grows exponentially.
• DDP is a favorite mapping method since the
experiments are easy to conduct.
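The non-uniqueness is easy to see on a toy instance. The sketch below is a naive exhaustive search, not one of the algorithms the slide refers to. It assumes the DDP input is three multisets of fragment lengths (`d_a` for enzyme A alone, `d_b` for enzyme B alone, `d_ab` for both together) and tries every ordering of `d_a` and `d_b`. All names are illustrative.

```python
from itertools import accumulate, permutations


def cut_positions(fragments):
    """Interior cut positions implied by fragments laid end to end."""
    return set(accumulate(fragments[:-1]))


def fragments_from_cuts(cuts, length):
    """Sorted fragment lengths produced by cutting [0, length] at `cuts`."""
    points = [0, *sorted(cuts), length]
    return sorted(b - a for a, b in zip(points, points[1:]))


def brute_force_ddp(d_a, d_b, d_ab):
    """Yield every (ordering of d_a, ordering of d_b) consistent with d_ab."""
    length = sum(d_a)
    target = sorted(d_ab)
    for order_a in set(permutations(d_a)):
        cuts_a = cut_positions(order_a)
        for order_b in set(permutations(d_b)):
            cuts = cuts_a | cut_positions(order_b)
            if fragments_from_cuts(cuts, length) == target:
                yield order_a, order_b


# Toy instance: A cuts at 3 and 7, B cuts at 2 and 5, on a molecule of length 8.
solutions = list(brute_force_ddp([3, 4, 1], [2, 3, 3], [1, 1, 2, 2, 2]))
print(solutions)
```

The search visits up to |dA|! · |dB|! orderings, which is why this approach stops being usable after a handful of sites. The toy instance already has more than one solution, since the mirror image of any map is also a map.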
DDP is NP-complete
1. DDP is in NP (easy).
2. DDP is NP-hard, by reduction from the Set Partitioning
Problem (SPP): given a set of integers X = {x_1, . . . , x_l},
determine whether we can partition X into two subsets
X_1 and X_2 such that
$$\sum_{x \in X_1} x = \sum_{x \in X_2} x$$
DDP is NP-complete
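• Let X be the input of the SPP, and assume that the sum of all
elements of X is even. Then set
$$dA = X, \qquad dB = \left\{ \tfrac{K}{2}, \tfrac{K}{2} \right\} \text{ with } K = \sum_{x \in X} x, \qquad dAB = dA.$$
• If this DDP instance has a solution in which the fragments of dA
appear in the order x_{j_1}, . . . , x_{j_l}, then there exists an index n_0 with
$$\sum_{i=1}^{n_0} x_{j_i} = \sum_{i=n_0+1}^{l} x_{j_i}$$
because of the choice of dB and dAB. Thus a solution for the SPP exists.
• Thus SPP is a DDP in which one of the two enzymes produces
only two fragments of equal length.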
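The construction above can be sketched in a few lines. The function names are illustrative, and `has_equal_partition` is a brute-force check used only to make the equivalence concrete. The DDP instance is solvable exactly when some ordering of X has a prefix summing to K/2, which is the same as some subset of X summing to K/2.

```python
from itertools import combinations


def spp_to_ddp(x):
    """Build the DDP instance (d_a, d_b, d_ab) from an SPP input x."""
    k = sum(x)
    if k % 2:
        raise ValueError("sum of X must be even")
    return list(x), [k // 2, k // 2], list(x)


def has_equal_partition(x):
    """Brute-force SPP check: does some subset of x sum to half the total?"""
    k = sum(x)
    if k % 2:
        return False
    return any(
        sum(subset) == k // 2
        for r in range(len(x) + 1)
        for subset in combinations(x, r)
    )


d_a, d_b, d_ab = spp_to_ddp([3, 1, 2, 2])
print(d_b)                                 # [4, 4]
print(has_equal_partition([3, 1, 2, 2]))   # True
```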
Homometric Sets
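Pairwise distances for X = { 0, 1, 2, 5, 7, 9, 12 }:

       0   1   2   5   7   9  12
  0        1   2   5   7   9  12
  1            1   4   6   8  11
  2                3   5   7  10
  5                    2   4   7
  7                        2   5
  9                            3
 12

Pairwise distances for X = { 0, 1, 5, 7, 8, 10, 12 }:

       0   1   5   7   8  10  12
  0        1   5   7   8  10  12
  1            4   6   7   9  11
  5                2   3   5   7
  7                    1   3   5
  8                        2   4
 10                            2
 12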
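The two tables can be checked mechanically. A small sketch, where `delta` is an illustrative helper that returns the sorted multiset of pairwise distances:

```python
from itertools import combinations


def delta(x):
    """Sorted multiset of pairwise distances between the points of x."""
    return sorted(b - a for a, b in combinations(sorted(x), 2))


a = [0, 1, 2, 5, 7, 9, 12]
b = [0, 1, 5, 7, 8, 10, 12]
print(delta(a) == delta(b))  # True: the two sets are homometric
```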
BruteForcePDP
BruteForcePDP(L, n):
  M ← maximum element in L
  for every set of n − 2 integers 0 < x_2 < … < x_{n−1} < M
    X ← { 0, x_2, …, x_{n−1}, M }
    form ΔX from X
    if ΔX = L
      return X
  output “no solution”
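A direct Python transcription of the pseudocode, offered as a sketch. It takes L as a list of integers, and returning `None` stands in for "no solution"; the helper `delta` is an illustrative name for forming ΔX.

```python
from itertools import combinations


def delta(x):
    """Sorted multiset of pairwise distances between the points of x."""
    return sorted(b - a for a, b in combinations(sorted(x), 2))


def brute_force_pdp(l, n):
    """Try every choice of n - 2 interior points strictly between 0 and M."""
    m = max(l)
    target = sorted(l)
    for inner in combinations(range(1, m), n - 2):
        x = [0, *inner, m]
        if delta(x) == target:
            return x
    return None  # no solution


print(brute_force_pdp([2, 2, 3, 3, 4, 5, 6, 7, 8, 10], 5))  # [0, 2, 4, 7, 10]
```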
Efficiency of BruteForcePDP
• BruteForcePDP takes O(M^{n−2}) time since it must examine all
possible sets of positions.
AnotherBruteForcePDP
AnotherBruteForcePDP(L, n):
  M ← maximum element in L
  for every set of n − 2 integers 0 < x_2 < … < x_{n−1} < M from L
    X ← { 0, x_2, …, x_{n−1}, M }
    form ΔX from X
    if ΔX = L
      return X
  output “no solution”
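The only change from BruteForcePDP is where the candidate positions come from. Because 0 is in X, every position x_i is itself a distance (x_i − 0) and so must occur in L. A sketch under the same assumptions as before (L is a list of integers, `None` means "no solution", `delta` is an illustrative helper):

```python
from itertools import combinations


def delta(x):
    """Sorted multiset of pairwise distances between the points of x."""
    return sorted(b - a for a, b in combinations(sorted(x), 2))


def another_brute_force_pdp(l, n):
    """Like the brute-force search, but interior points are drawn from L only."""
    m = max(l)
    target = sorted(l)
    candidates = sorted(set(l) - {m})  # distinct values of L below M
    for inner in combinations(candidates, n - 2):
        x = [0, *inner, m]
        if delta(x) == target:
            return x
    return None  # no solution


print(another_brute_force_pdp([2, 2, 3, 3, 4, 5, 6, 7, 8, 10], 5))  # [0, 2, 4, 7, 10]
```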
Efficiency of AnotherBruteForcePDP
WRONG ALGORITHM
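Defining D(y, X)
• D(y, X) is the multiset of all distances between a point y
and the points of a set X = { x_1, …, x_n }:
D(y, X) = { |y − x_1|, |y − x_2|, …, |y − x_n| }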
PartialDigest Algorithm
PartialDigest(L):
  width ← maximum element in L
  DELETE(width, L)
  X ← { 0, width }
  PLACE(L, X)
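PLACE itself is not spelled out on this slide, so the sketch below fills it in with the usual backtracking step as an assumption. It takes the largest remaining distance y, tries the positions y and width − y, and keeps a position only if every distance in D(y, X) is still available in L. D(y, X) is taken to be the multiset of distances |y − x| for x in X. The function collects all solutions rather than printing them; the names are illustrative.

```python
from collections import Counter


def distances(y, x):
    """D(y, X): multiset of distances from y to every point in x."""
    return Counter(abs(y - xi) for xi in x)


def partial_digest(l):
    """Return every point set X whose pairwise distances equal the multiset l."""
    remaining = Counter(l)
    width = max(remaining)
    remaining[width] -= 1          # DELETE(width, L)
    x = [0, width]
    solutions = []

    def place():
        if sum(remaining.values()) == 0:
            solutions.append(sorted(x))
            return
        y = max(d for d, count in remaining.items() if count > 0)
        for candidate in {y, width - y}:
            d = distances(candidate, x)
            # Accept the candidate only if all of D(candidate, X) is still in L.
            if all(remaining[k] >= v for k, v in d.items()):
                remaining.subtract(d)
                x.append(candidate)
                place()
                x.pop()            # backtrack
                remaining.update(d)

    place()
    return solutions


print(sorted(partial_digest([2, 2, 3, 3, 4, 5, 6, 7, 8, 10])))
# [[0, 2, 4, 7, 10], [0, 3, 6, 8, 10]]
```

The two solutions are mirror images of each other (x ↦ 10 − x).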
An Example
L = { 2, 2, 3, 3, 4, 5, 6, 7, 8, 10 }
X = { 0 }
• Remove 10, the largest element of L, and place it as the width.
X = { 0, 10 }, L = { 2, 2, 3, 3, 4, 5, 6, 7, 8 }
• The largest remaining element is 8, so y = 2 or y = 8.
Take y = 2: D(2, X) = { 2, 8 } is contained in L.
X = { 0, 2, 10 }, L = { 2, 3, 3, 4, 5, 6, 7 }
• The largest remaining element is 7, so y = 7 or y = 3.
Take y = 7: D(7, X) = { 7, 5, 3 } is contained in L.
X = { 0, 2, 7, 10 }, L = { 2, 3, 4, 6 }
• The largest remaining element is 6, so y = 6 or y = 4.
D(6, X) = { 6, 4, 1, 4 } is not contained in L, so y = 6 is rejected.
D(4, X) = { 4, 2, 3, 6 } is contained in L.
X = { 0, 2, 4, 7, 10 }, L = { }
• L is empty, so X = { 0, 2, 4, 7, 10 } is a solution.
• To find all solutions, backtrack: undo y = 4, then undo y = 7.
X = { 0, 2, 7, 10 }, then X = { 0, 2, 10 }
• More backtrack: at X = { 0, 2, 10 } the other choice y = 3 gives
D(3, X) = { 3, 1, 7 }, which is not contained in L, so undo y = 2.
X = { 0, 10 }
PDP analysis
• No polynomial time algorithm is known for PDP.
In fact, the complexity of PDP is an open
problem.
• S. Skiena devised a simple backtracking
algorithm that performs well in practice, but
may require exponential time.
• This approach is not a popular mapping method,
as it is difficult to reliably produce all pairwise
distances between restriction sites.
SPDP
• Let Γ = {γ_1, . . . , γ_2N} be the multi-set of all
fragment lengths obtained by the short
experiment, and
• let Λ = {λ_1, . . . , λ_N+1} be the multi-set of all
fragment lengths obtained by the long
experiment,
• where N is the number of restriction sites in S.
• Here is an example: Given these (unknown)
restriction sites (in kb): 2 8 9 13 16
• We obtain Λ = {2kb, 6kb, 1kb, 4kb, 3kb}.
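Both multisets for this example can be computed with a short sketch, under two assumptions that the slide does not state. First, the last value, 16, is read as the end of the molecule, so N = 4 sites at 2, 8, 9 and 13 give N + 1 = 5 long-experiment fragments. Second, the short experiment is taken to cut each copy of the molecule at exactly one site, giving two fragments per site. The function name is illustrative.

```python
def spdp_fragments(sites, length):
    """Return (short, long) fragment-length lists for a molecule of given length."""
    cuts = sorted(sites)
    # Short experiment (assumed): each copy is cut at exactly one site.
    short = sorted(f for s in cuts for f in (s, length - s))
    # Long experiment: every copy is cut at all sites, leaving adjacent gaps.
    points = [0, *cuts, length]
    long = [b - a for a, b in zip(points, points[1:])]
    return short, long


short, long = spdp_fragments([2, 8, 9, 13], 16)
print(long)   # [2, 6, 1, 4, 3]
print(short)  # [2, 3, 7, 8, 8, 9, 13, 14]
```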