Lecture9_IO_BLG336E_2022

This lecture covers dynamic programming with a focus on applications such as the Longest Common Subsequence (LCS) and the Knapsack Problem. It outlines the steps for applying dynamic programming, including identifying optimal substructure and formulating recursive solutions. The lecture also emphasizes the importance of overlapping sub-problems and provides examples relevant to bioinformatics.

BLG 336E

Analysis of Algorithms II
Lecture 9:
Dynamic Programming II
DNA Sequencing, Knapsack Problem

Last time

• Dynamic programming is an algorithm design paradigm.
• Basic idea:
  • Identify optimal sub-structure
    • The optimum for the big problem is built out of optima for small sub-problems
  • Take advantage of overlapping sub-problems
    • Only solve each sub-problem once, then use it again and again
• Keep track of the solutions to sub-problems in a table as you build up to the final solution.
Today
• Examples of dynamic programming:
1. Longest common subsequence
2. Knapsack problem
• Two versions!
3. Independent sets in trees
• If we have time…
• (If not, the slides will be there as a reference)
The goal of this lecture
• For you to get really bored of dynamic programming
Longest Common Subsequence
• How similar are these two species?

DNA 1: AGCCCTAAGGGCTACCTAGCTT
DNA 2: GACAGCCTACAAGCGTTAGCTTG
Longest Common Subsequence
• How similar are these two species?

DNA 1: AGCCCTAAGGGCTACCTAGCTT
DNA 2: GACAGCCTACAAGCGTTAGCTTG
• Pretty similar, their DNA has a long common subsequence:

AGCCTAAGCTTAGCTT
Longest Common Subsequence
• Subsequence:
• BDFH is a subsequence of ABCDEFGH
• If X and Y are sequences, a common subsequence
is a sequence which is a subsequence of both.
• BDFH is a common subsequence of ABCDEFGH and of
ABDFGHI
• A longest common subsequence…
• …is a common subsequence that is longest.
• The longest common subsequence of ABCDEFGH and
ABDFGHI is ABDFGH.
We sometimes want to find these
• Applications in bioinformatics

• The unix command diff


• Merging in version control
• svn, git, etc…
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the length
of the longest common subsequence.
• Step 3: Use dynamic programming to find the
length of the longest common subsequence.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual LCS.
• Step 5: If needed, code this up like a reasonable
person.
Step 1: Optimal substructure
Prefixes:

X = A C G G T
Y = A C G C T T A

Notation: denote the prefix ACGC (the first four characters of Y) by Y4.

• Our sub-problems will be finding LCS's of prefixes of X and Y.
• Let C[i,j] = length_of_LCS( Xi, Yj )
Optimal substructure ctd.
• Subproblem:
• finding LCS’s of prefixes of X and Y.

• Why is this a good choice?


• There’s some relationship between LCS’s of prefixes and
LCS’s of the whole things.
• These subproblems overlap a lot.

To see this formally, on to…


Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the length
of the longest common subsequence.
• Step 3: Use dynamic programming to find the
length of the longest common subsequence.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual LCS.
• Step 5: If needed, code this up like a reasonable
person.
Two cases
• Our sub-problems will be finding LCS's of prefixes of X and Y.
• Let C[i,j] = length_of_LCS( Xi, Yj )

Case 1: X[i] = Y[j] (the last characters are the same)

Xi = A C G G A
Yj = A C G C T T A

• Then C[i,j] = 1 + C[i-1,j-1],
• because LCS(Xi,Yj) = LCS(Xi-1,Yj-1) followed by the matching character A.
Two cases
Case 2: X[i] != Y[j] (the last characters are not the same)

Xi = A C G G T
Yj = A C G C T T A

• Then C[i,j] = max{ C[i-1,j], C[i,j-1] },
• because either LCS(Xi,Yj) = LCS(Xi-1,Yj) and the final T of Xi is not involved,
• or LCS(Xi,Yj) = LCS(Xi,Yj-1) and the final A of Yj is not involved.
Recursive formulation of the optimal solution

Case 0: if i = 0 or j = 0, one of the prefixes is empty, so C[i,j] = 0.

C[i,j] = 0                             if i = 0 or j = 0
       = C[i-1,j-1] + 1                if X[i] = Y[j] and i,j > 0
       = max{ C[i,j-1], C[i-1,j] }     if X[i] ≠ Y[j] and i,j > 0

Case 1: X[i] = Y[j]  (e.g., Xi = ACGGA, Yj = ACGCTTA)
Case 2: X[i] ≠ Y[j]  (e.g., Xi = ACGGT, Yj = ACGCTTA)
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the length
of the longest common subsequence.
• Step 3: Use dynamic programming to find the
length of the longest common subsequence.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual LCS.
• Step 5: If needed, code this up like a reasonable
person.
LCS DP OMG BBQ
• LCS(X, Y):
• C[i,0] = C[0,j] = 0 for all i = 1,…,m, j=1,…n.
• For i = 1,…,m and j = 1,…,n:
• If X[i] = Y[j]:
• C[i,j] = C[i-1,j-1] + 1
• Else:
• C[i,j] = max{ C[i,j-1], C[i-1,j] }

C[i,j] = 0                             if i = 0 or j = 0
       = C[i-1,j-1] + 1                if X[i] = Y[j] and i,j > 0
       = max{ C[i,j-1], C[i-1,j] }     if X[i] ≠ Y[j] and i,j > 0
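The pseudocode above translates almost directly into runnable code. Here is a minimal Python sketch (the function name `lcs_length` is our own):

```python
def lcs_length(X, Y):
    """Length of the longest common subsequence of X and Y."""
    m, n = len(X), len(Y)
    # C[i][j] = length of LCS of X[:i] and Y[:j]; row 0 and column 0 stay 0.
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:          # Case 1: last characters match
                C[i][j] = C[i - 1][j - 1] + 1
            else:                             # Case 2: they don't
                C[i][j] = max(C[i][j - 1], C[i - 1][j])
    return C[m][n]
```

Note that Python strings are 0-indexed, so the slides' X[i] becomes X[i-1].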
Example
X = A C G G A
Y = A C T G

        A  C  T  G
     0  0  0  0  0
A    0
C    0
G    0
G    0
A    0

C[i,j] = 0                             if i = 0 or j = 0
       = C[i-1,j-1] + 1                if X[i] = Y[j] and i,j > 0
       = max{ C[i,j-1], C[i-1,j] }     if X[i] ≠ Y[j] and i,j > 0
Example
X = A C G G A
Y = A C T G

        A  C  T  G
     0  0  0  0  0
A    0  1  1  1  1
C    0  1  2  2  2
G    0  1  2  2  3
G    0  1  2  2  3
A    0  1  2  2  3

So the LCS of X and Y has length 3.

C[i,j] = 0                             if i = 0 or j = 0
       = C[i-1,j-1] + 1                if X[i] = Y[j] and i,j > 0
       = max{ C[i,j-1], C[i-1,j] }     if X[i] ≠ Y[j] and i,j > 0
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the length
of the longest common subsequence.
• Step 3: Use dynamic programming to find the
length of the longest common subsequence.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual LCS.
• Step 5: If needed, code this up like a reasonable
person.
Example
X = A C G G A
Y = A C T G

        A  C  T  G
     0  0  0  0  0
A    0  1  1  1  1
C    0  1  2  2  2
G    0  1  2  2  3
G    0  1  2  2  3
A    0  1  2  2  3

C[i,j] = 0                             if i = 0 or j = 0
       = C[i-1,j-1] + 1                if X[i] = Y[j] and i,j > 0
       = max{ C[i,j-1], C[i-1,j] }     if X[i] ≠ Y[j] and i,j > 0
Example
X = A C G G A
Y = A C T G

        A  C  T  G
     0  0  0  0  0
A    0  1  1  1  1
C    0  1  2  2  2
G    0  1  2  2  3
G    0  1  2  2  3
A    0  1  2  2  3

• Once we've filled this in, we can work backwards.
• A diagonal jump means that we found an element of the LCS!
• The 3 in the bottom-right corner must have come from the 3 above it.
• That 3 came from the 2 on its diagonal: we found a match, G.
• That 2 may as well have come from the other 2 next to it.
• Another diagonal jump finds C, and a final one finds A.
• Reading the matches from last to first: A C G. This is the LCS!

C[i,j] = 0                             if i = 0 or j = 0
       = C[i-1,j-1] + 1                if X[i] = Y[j] and i,j > 0
       = max{ C[i,j-1], C[i-1,j] }     if X[i] ≠ Y[j] and i,j > 0
This gives an algorithm to recover the actual LCS
not just its length

• It runs in time O(n + m)


• We walk up and left in an n-by-m array
• We can only do that for n + m steps.
• So actually recovering the LCS from the table is
much faster than building the table was.
• We can find LCS(X,Y) in time O(mn).
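The backwards walk described above can be bolted onto the table-filling code. A Python sketch (the function name `lcs` is our own; on ties it moves up, matching the "may as well have come from" step in the slides):

```python
def lcs(X, Y):
    """Recover an actual LCS of X and Y by walking the table backwards."""
    m, n = len(X), len(Y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1
            else:
                C[i][j] = max(C[i][j - 1], C[i - 1][j])
    # Walk up and left from C[m][n]; each diagonal step records a match.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])
            i, j = i - 1, j - 1
        elif C[i - 1][j] >= C[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))   # matches were collected last-to-first
```

The walk takes at most n + m steps, so recovery is O(n + m) on top of the O(mn) table build, as the slide says.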
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the length
of the longest common subsequence.
• Step 3: Use dynamic programming to find the
length of the longest common subsequence.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual LCS.
• Step 5: If needed, code this up like a reasonable
person.
This pseudocode actually isn’t so bad

• If we are only interested in the length of the LCS:


• Since we go across the table one-row-at-a-time, we can only
keep two rows if we want.
• If we want to recover the LCS, we need to keep the whole
table.

• Can we do better than O(mn) time?


• A bit better.
• By a log factor or so.
• But doing much better (polynomially better) is an open
problem!
• If you can do it let me know :D
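The two-row trick mentioned above looks like this in Python (a sketch; `lcs_length_two_rows` is our own name). It returns only the length, since the rest of the table is thrown away:

```python
def lcs_length_two_rows(X, Y):
    """LCS length keeping only two rows of the table (O(n) extra space)."""
    n = len(Y)
    prev = [0] * (n + 1)                  # row i-1 of the table
    for i in range(1, len(X) + 1):
        curr = [0] * (n + 1)              # row i, being filled in
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(curr[j - 1], prev[j])
        prev = curr                       # slide the window down one row
    return prev[n]
```

This is why recovering the actual LCS needs the whole table: once a row is overwritten, there is nothing left to walk backwards through.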
What have we learned?
• We can find LCS(X,Y) in time O(nm)
• if |Y|=n, |X|=m

• We went through the steps of coming up with a
dynamic programming algorithm.
• We kept a 2-dimensional table, breaking down the
problem by decrementing the length of X and Y.
Example 2: Knapsack Problem
• We have n items with weights and values:

Item:      1    2    3    4    5
Weight:    6    2    4    3   11
Value:    20    8   14   13   35

• And we have a knapsack that can only carry so much weight: Capacity: 10
Item:      1    2    3    4    5
Weight:    6    2    4    3   11
Value:    20    8   14   13   35
Capacity: 10

• Unbounded Knapsack:
  • Suppose I have infinite copies of all of the items.
  • What's the most valuable way to fill the knapsack?
  (Total weight: 10, Total value: 42)

• 0/1 Knapsack:
  • Suppose I have only one copy of each item.
  • What's the most valuable way to fill the knapsack?
  (Total weight: 9, Total value: 35)
Some notation

Item:      1    2    3   …    n
Weight:   w1   w2   w3   …   wn
Value:    v1   v2   v3   …   vn

Capacity: W
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
Optimal substructure
• Sub-problems:
• Unbounded Knapsack with a smaller knapsack.

First solve the problem for small knapsacks, then larger knapsacks, then larger knapsacks.
Optimal substructure
• Suppose this is an optimal solution for capacity x, with value V, and it contains item i (weight wi, value vi).
• Then the rest of it is optimal for capacity x – wi, with value V – vi:
  • If I could do better than that second solution, then adding item i to the improvement would improve the first solution.
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
Recursive relationship
• Let K[x] be the optimal value for capacity x.

K[x] = maxi { K[x – wi] + vi }

• The maximum is over all i so that wi ≤ x: K[x – wi] is the optimal way to fill the smaller knapsack, and vi is the value of item i.
• (And K[x] = 0 if the maximum is empty, that is, if there is no i so that wi ≤ x.)
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
Let’s write a bottom-up DP algorithm
• UnboundedKnapsack(W, n, weights, values):
• K[0] = 0
• for x = 1, …, W:
• K[x] = 0
• for i = 1, …, n:
• if 𝑤𝑖 ≤ 𝑥:
• 𝐾 𝑥 = max{ 𝐾 𝑥 , 𝐾 𝑥 − 𝑤𝑖 + 𝑣𝑖 }
• return K[W]

Running time: O(nW)


K[x] = maxi { + }
Why does this work?
= maxi { K[x – wi] + vi } Because our recursive relationship makes sense.
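The same algorithm as runnable Python (a sketch; we pass the weights and values as lists rather than n separate arguments):

```python
def unbounded_knapsack(W, weights, values):
    """Max value with unlimited copies of each item and capacity W."""
    K = [0] * (W + 1)              # K[x] = best value for capacity x
    for x in range(1, W + 1):
        for w, v in zip(weights, values):
            if w <= x:             # item fits: try adding it to capacity x - w
                K[x] = max(K[x], K[x - w] + v)
    return K[W]
```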
Can we do better?
• We only need log(W) bits to write down the input W
and to write down all the weights.
• Maybe we could have an algorithm that runs in time
O(n log(W)) instead of O(nW)?
• Or even O( n^1000000 log^1000000(W) )?

• Open problem!
• (But probably the answer is no…otherwise P = NP)
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
Let’s write a bottom-up DP algorithm
• UnboundedKnapsack(W, n, weights, values):
• K[0] = 0
• for x = 1, …, W:
• K[x] = 0
• for i = 1, …, n:
• if 𝑤𝑖 ≤ 𝑥:
• 𝐾 𝑥 = max{ 𝐾 𝑥 , 𝐾 𝑥 − 𝑤𝑖 + 𝑣𝑖 }
• return K[W]

K[x] = maxi { + }

= maxi { K[x – wi] + vi }


Let’s write a bottom-up DP algorithm
• UnboundedKnapsack(W, n, weights, values):
• K[0] = 0
• ITEMS[0] = ∅
• for x = 1, …, W:
• K[x] = 0
• for i = 1, …, n:
• if 𝑤𝑖 ≤ 𝑥:
• 𝐾 𝑥 = max{ 𝐾 𝑥 , 𝐾 𝑥 − 𝑤𝑖 + 𝑣𝑖 }
• If K[x] was updated:
• ITEMS[x] = ITEMS[x – wi] ∪ { item i }
• return ITEMS[W]

K[x] = maxi { + }

= maxi { K[x – wi] + vi }


Example
Item:      1   2   3
Weight:    1   2   3
Value:     1   4   6
Capacity:  4

Running UnboundedKnapsack(4, 3, weights, values):

x = 1: item 1 fits: K[1] = K[0] + v1 = 1, ITEMS[1] = ITEMS[0] ∪ {item 1}
x = 2: item 1 gives K[1] + 1 = 2, but item 2 gives K[0] + 4 = 4:
       K[2] = 4, ITEMS[2] = ITEMS[0] ∪ {item 2}
x = 3: item 1 gives K[2] + 1 = 5, but item 3 gives K[0] + 6 = 6:
       K[3] = 6, ITEMS[3] = ITEMS[0] ∪ {item 3}
x = 4: item 1 gives K[3] + 1 = 7, but item 2 gives K[2] + 4 = 8:
       K[4] = 8, ITEMS[4] = ITEMS[2] ∪ {item 2}

Final table:
x:      0   1   2   3   4
K[x]:   0   1   4   6   8

So the best value for capacity 4 is 8, using two copies of item 2.
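The ITEMS bookkeeping traced above can be implemented like this (a Python sketch with our own names; items are 0-indexed here, so the slides' "item 2" is index 1):

```python
def unbounded_knapsack_items(W, weights, values):
    """Like unbounded knapsack, but also records which items to take."""
    K = [0] * (W + 1)
    items = [[] for _ in range(W + 1)]   # items[x]: a best multiset for capacity x
    for x in range(1, W + 1):
        for i, (w, v) in enumerate(zip(weights, values)):
            # Only copy the item list when K[x] actually improves.
            if w <= x and K[x - w] + v > K[x]:
                K[x] = K[x - w] + v
                items[x] = items[x - w] + [i]
    return K[W], items[W]
```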
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
What have we learned?
• We can solve unbounded knapsack in time O(nW).
• If there are n items and our knapsack has capacity W.

• We again went through the steps to create a DP solution:
• We kept a one-dimensional table, creating smaller
problems by making the knapsack smaller.
Item:      1    2    3    4    5
Weight:    6    2    4    3   11
Value:    20    8   14   13   35
Capacity: 10

• Unbounded Knapsack:
  • Suppose I have infinite copies of all of the items.
  • What's the most valuable way to fill the knapsack?
  (Total weight: 10, Total value: 42)

• 0/1 Knapsack:
  • Suppose I have only one copy of each item.
  • What's the most valuable way to fill the knapsack?
  (Total weight: 9, Total value: 35)
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
Optimal substructure: try 1
• Sub-problems:
• Unbounded Knapsack with a smaller knapsack.

First solve the problem for small knapsacks, then larger knapsacks, then larger knapsacks.
This won’t quite work…
• We are only allowed one copy of each item.
• The sub-problem needs to “know” what items
we’ve used and what we haven’t.

I can’t use
any turtles…
Optimal substructure: try 2
• Sub-problems:
• 0/1 Knapsack with fewer items.

First solve the problem with few items, then more items, then yet more items.
We'll still increase the size of the knapsacks.
Our sub-problems:
• Indexed by x and j: use only the first j items, with a knapsack of capacity x.


Two cases
• Case 1: The optimal solution for the first j items does not use item j.
• Case 2: The optimal solution for the first j items does use item j.


Two cases
• Case 1: The optimal solution for the first j items does not use item j.
  • Suppose it has capacity x and value V.
• Then this is also an optimal solution for the first j-1 items, with the same capacity x and value V.
Two cases
• Case 2: The optimal solution for the first j items uses item j (weight wj, value vj).
  • Suppose it has capacity x and value V.
• Then removing item j gives an optimal solution for the first j-1 items and a smaller knapsack: capacity x – wj and value V – vj.
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
Recursive relationship
• Let K[x,j] be the optimal value for:
  • capacity x,
  • using only the first j items.

K[x,j] = max{ K[x, j-1] , K[x – wj, j-1] + vj }
             (Case 1)     (Case 2, only available if wj ≤ x)

• (And K[x,0] = 0 and K[0,j] = 0.)
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
Bottom-up DP algorithm
• Zero-One-Knapsack(W, n, w, v):
  • K[x,0] = 0 for all x = 0,…,W
  • K[0,j] = 0 for all j = 0,…,n
  • for x = 1,…,W:
    • for j = 1,…,n:
      • K[x,j] = K[x, j-1]                              (Case 1)
      • if wj ≤ x:
        • K[x,j] = max{ K[x,j], K[x – wj, j-1] + vj }   (Case 2)
  • return K[W,n]

Running time: O(nW)
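The same algorithm as runnable Python (a sketch; weights and values are 0-indexed lists, so item j of the slides is index j-1):

```python
def zero_one_knapsack(W, weights, values):
    """Max value taking each item at most once, with capacity W."""
    n = len(weights)
    # K[x][j] = best value with capacity x using only the first j items.
    K = [[0] * (n + 1) for _ in range(W + 1)]
    for x in range(1, W + 1):
        for j in range(1, n + 1):
            K[x][j] = K[x][j - 1]                     # Case 1: skip item j
            if weights[j - 1] <= x:                   # Case 2: take item j
                K[x][j] = max(K[x][j],
                              K[x - weights[j - 1]][j - 1] + values[j - 1])
    return K[W][n]
```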
Example
Item:      1   2   3
Weight:    1   2   3
Value:     1   4   6
Capacity:  3

Running Zero-One-Knapsack(3, 3, w, v) fills the table column by column
(x = 0,…,3 across, j = 0,…,3 down), each entry looking at the row above:

        x=0  x=1  x=2  x=3
j=0      0    0    0    0
j=1      0    1    1    1
j=2      0    1    4    5
j=3      0    1    4    6

For example:
K[2,2] = max{ K[2,1], K[0,1] + v2 } = max{ 1, 0 + 4 } = 4
K[3,2] = max{ K[3,1], K[1,1] + v2 } = max{ 1, 1 + 4 } = 5
K[3,3] = max{ K[3,2], K[0,2] + v3 } = max{ 5, 0 + 6 } = 6

So the optimal solution is to put one watermelon (item 3) in your knapsack!
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
What have we learned?
• We can solve 0/1 knapsack in time O(nW).
• If there are n items and our knapsack has capacity W.

• We again went through the steps to create a DP solution:
• We kept a two-dimensional table, creating smaller
problems by restricting the set of allowable items.
Question
• How did we know which substructure to use in which variant of knapsack?

Answer in retrospect:
• Shrinking the knapsack made sense for unbounded knapsack because that sub-problem doesn't need any memory of what items have been used.
• In 0/1 knapsack, we can only use each item once, so it makes sense to leave out one item at a time.

Operational answer: try some stuff, see what works!
Example 3: Independent Set
(if we still have time)

• Given a graph with weights on the vertices…
• What is the independent set with the largest weight?

An independent set is a set of vertices so that no pair has an edge between them.
Actually this problem is NP-complete,
so we are unlikely to find an efficient algorithm.

• But if we also assume that the graph is a tree…

A tree is a connected graph with no cycles.

Problem:
find a maximal independent set in a tree (with vertex weights).
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution
• Step 3: Use dynamic programming to find the value
of the optimal solution
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
Optimal substructure
• Subtrees are a natural candidate.
• There are two cases:
  1. The root of this tree is not in a maximal independent set.
  2. Or it is.
Case 1:
the root is not in a maximal independent set
• Use the optimal solutions from the smaller problems: the subtrees rooted at the root's children.
Case 2:
the root is in a maximal independent set
• Then its children can't be.
• Below that, use the optimal solutions from the smaller sub-problems: the subtrees rooted at the root's grandchildren.
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
Recursive formulation: try 1
• Let A[u] be the weight of a maximum-weight independent set
in the tree rooted at u.

• A[u] = max { Σv∈u.children A[v] ,  weight(u) + Σv∈u.grandchildren A[v] }

When we implement this, how do
we keep track of the grandchildren term?
Recursive formulation: try 2
Keep two arrays!
• Let A[u] be the weight of a maximum-weight independent set
in the tree rooted at u.
• Let B[u] = Σv∈u.children A[v]

• A[u] = max { Σv∈u.children A[v] ,  weight(u) + Σv∈u.children B[v] }
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
A top-down DP algorithm
• MIS_subtree(u):
• if u is a leaf:
• A[u] = weight(u)
• B[u] = 0
• else:
• for v in u.children:
• MIS_subtree(v)
• A[u] = max{ Σv∈u.children A[v] ,  weight(u) + Σv∈u.children B[v] }
• B[u] = Σv∈u.children A[v]
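The pseudocode above can be sketched in Python. The tree representation here is a hypothetical one (a dict mapping each node to its list of children, plus a parallel weight dict), not from the lecture:

```python
# Top-down DP for maximum-weight independent set in a tree,
# mirroring MIS_subtree: A[u] = best in u's subtree, B[u] = sum of A over children.

def max_weight_independent_set(children, weight, root):
    A, B = {}, {}

    def mis_subtree(u):
        for v in children.get(u, []):   # solve child subproblems first
            mis_subtree(v)
        B[u] = sum(A[v] for v in children.get(u, []))
        # Either skip u (best over children) or take u (children excluded,
        # so we add B over children, i.e. A over grandchildren).
        A[u] = max(B[u], weight[u] + sum(B[v] for v in children.get(u, [])))

    mis_subtree(root)
    return A[root]

# Toy tree (made up): root 0 with children 1, 2; node 1 has children 3, 4.
children = {0: [1, 2], 1: [3, 4]}
weight = {0: 2, 1: 3, 2: 5, 3: 2, 4: 3}
print(max_weight_independent_set(children, weight, 0))   # 10  (take nodes 2, 3, 4)
```

Each node's A and B values are computed exactly once, which is what makes this O(|V|).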

Running time?
• MIS(T):
  • MIS_subtree(T.root)
  • return A[T.root]
• We visit each vertex once, and at every vertex we do O(1) work:
  • make a recursive call
  • look stuff up in tables
• Running time is O(|V|)
Why is this different from divide-and-conquer?
That’s always worked for us with tree problems before…

• MIS_subtree(u):
• if u is a leaf:
• return weight(u)
• else:
• for v in u.children:
• MIS_subtree(v)
• return max{ Σv∈u.children MIS_subtree(v) ,
          weight(u) + Σv∈u.grandchildren MIS_subtree(v) }

• MIS(T):
• return MIS_subtree(T.root)
Why is this different from divide-and-conquer?
That’s always worked for us with tree problems before…

How often would we ask about the subtree rooted at a given node?
Once from its parent's call, and once from its grandparent's call.

So we end up asking about the same subtree twice, and this will
blow up exponentially without using dynamic programming to take
advantage of overlapping subproblems.
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
What have we learned?
• We can find maximum-weight independent sets in trees in
time O(|V|) using dynamic programming!

• For this example, it was natural to implement our
DP algorithm in a top-down way.
Recap
• Today we saw examples of how to come up with
dynamic programming algorithms.
• Longest Common Subsequence
• Knapsack two ways
• (If time) maximal independent set in trees.
• There is a recipe for dynamic programming
algorithms.
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of
the optimal solution.
• Step 3: Use dynamic programming to find the value
of the optimal solution.
• Step 4: If needed, keep track of some additional
info so that the algorithm from Step 3 can find the
actual solution.
• Step 5: If needed, code this up like a reasonable
person.
Recap
• Today we saw examples of how to come up with
dynamic programming algorithms.
• Longest Common Subsequence
• Knapsack two ways
• (If time) maximal independent set in trees.
• There is a recipe for dynamic programming
algorithms.
• Sometimes coming up with the right substructure
takes some creativity
6. DYNAMIC PROGRAMMING

‣ weighted interval scheduling


‣ segmented least squares
‣ knapsack problem
‣ RNA secondary structure

SECTION 6.4
Knapsack problem

Goal. Pack knapsack so as to maximize total value of items taken.


・There are n items: item i provides value vi > 0 and weighs wi > 0.
・Value of a subset of items = sum of values of individual items.
・Knapsack has weight limit of W.
Ex. The subset { 1, 2, 5 } has value $35 (and weight 10).
Ex. The subset { 3, 4 } has value $40 (and weight 11).

Assumption. All values and weights are integral.


i    vi    wi
1    $1    1 kg
2    $6    2 kg       ← weights and values can be
3    $18   5 kg          arbitrary positive integers
4    $22   6 kg
5    $28   7 kg

knapsack instance (weight limit W = 11)
[image: Dake, Creative Commons Attribution-Share Alike 2.5]
Dynamic programming: quiz 2

Which algorithm solves the knapsack problem?

A. Greedy-by-value: repeatedly add item with maximum vi.

B. Greedy-by-weight: repeatedly add item with minimum wi.

C. Greedy-by-ratio: repeatedly add item with maximum ratio vi /wi.

D. None of the above.

i vi wi
1 $1 1 kg
2 $6 2 kg
3 $18 5 kg
4 $22 6 kg
5 $28 7 kg

knapsack instance (weight limit W = 11)
32
Dynamic programming: quiz 3

Which subproblems?

A. OPT(w) = optimal value of knapsack problem with weight limit w.

B. OPT(i) = optimal value of knapsack problem with items 1, …, i.

C. OPT(i, w) = optimal value of knapsack problem with items 1, …, i
subject to weight limit w.

D. Any of the above.

33
Dynamic programming: two variables

Knapsack problem: bottom-up dynamic programming

KNAPSACK(n, W, w1, …, wn, v1, …, vn)

FOR w = 0 TO W
    M[0, w] ← 0.

FOR i = 1 TO n
    FOR w = 0 TO W
        IF (wi > w)  M[i, w] ← M[i – 1, w].
        ELSE  M[i, w] ← max { M[i – 1, w], vi + M[i – 1, w – wi] }.    ← previously computed values

RETURN M[n, W].
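The KNAPSACK pseudocode above translates directly to Python. As a sanity check, running it on the lecture's 5-item instance (W = 11) reproduces the optimal value $40:

```python
# Bottom-up 0/1 knapsack, a direct rendering of the KNAPSACK pseudocode.

def knapsack(W, weights, values):
    n = len(weights)
    # M[i][w] = optimal value using items 1..i with weight limit w
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi, vi = weights[i - 1], values[i - 1]
        for w in range(W + 1):
            if wi > w:
                M[i][w] = M[i - 1][w]                 # item i cannot fit
            else:
                M[i][w] = max(M[i - 1][w],            # leave item i out
                              vi + M[i - 1][w - wi])  # take item i once
    return M

# Lecture instance: values ($), weights (kg), W = 11.
M = knapsack(11, [1, 2, 5, 6, 7], [1, 6, 18, 22, 28])
print(M[5][11])   # 40  (the subset { 3, 4 }: 18 + 22)
```

The full table M matches the "Knapsack Algorithm" table shown later in these slides.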
Knapsack problem: bottom-up dynamic programming demo

Knapsack Problem

Truck – 10t capacity

Optimum cargo combination:

•Item 1: $5 (3t)
•Item 2: $7 (4t)
•Item 3: $8 (5t)
Knapsack Problem
Output function f(i,w)

Optimum output of a combination of items 1
to i with a cumulative weight of w or less.

•Item 1: x1=$5 ; w1=3t


•Item 2: x2=$7 ; w2=4t
•Item 3: x3=$8 ; w3=5t
Knapsack Problem
Output function f(i,w)

f(i,w)=Max[ xi + f(i,w-wi) ; f(i-1,w) ]

ONE item i + optimum combination of weight w – wi
  vs.
NO item i + optimum combination of items 1 to i – 1
Knapsack Problem
Table
     1  2  3  4  5  6  7  8  9  10   ← W
1
2
3
↑
i          f(i,w)

• Item 1: x1=$5 ; w1=3t
• Item 2: x2=$7 ; w2=4t
• Item 3: x3=$8 ; w3=5t
Knapsack Problem
Table
1 2 3 4 5 6 7 8 9 10 W
1 Using only item 1
2
3

i
•Item 1: x1=$5 ; w1=3t
•Item 2: x2=$7 ; w2=4t
•Item 3: x3=$8 ; w3=5t
Knapsack Problem
Table
1 2 3 4 5 6 7 8 9 10 W
1
2 Using only item 1 & 2
3

i
•Item 1: x1=$5 ; w1=3t
•Item 2: x2=$7 ; w2=4t
•Item 3: x3=$8 ; w3=5t
Knapsack Problem
Table
1 2 3 4 5 6 7 8 9 10 W
1
2
3 Using items 1, 2 & 3

i
•Item 1: x1=$5 ; w1=3t
•Item 2: x2=$7 ; w2=4t
•Item 3: x3=$8 ; w3=5t
Knapsack Problem
Table
     1  2  3  4  5  6  7  8  9  10   ← W
1    0  0  5  5  5  10
2
3

0 items n°1 (w < 3) · 1 item n°1 (w1 = 3) · 2 items n°1 (2·w1 = 6)

• Item 1: x1=$5 ; w1=3t
• Item 2: x2=$7 ; w2=4t
• Item 3: x3=$8 ; w3=5t
Knapsack Problem
Table
1 2 3 4 5 6 7 8 9 10 w – w2 =
5–4=1
1 0 0 5 5 5 10 10 10 15 15
2 0 0 5 7
3
+ x2 (= 7)

f(i,w)=Max[ xi + f(i,w-wi) ; f(i-1,w) ]


•Item 1: x1=$5 ; w1=3t
•Item 2: x2=$7 ; w2=4t
•Item 3: x3=$8 ; w3=5t
Knapsack Problem
Table
1 2 3 4 5 6 7 8 9 10
1 0 0 5 5 5 10 10 10 15 15
2 0 0 5 7 7
3
+ x2 (= 7)

f(i,w)=Max[ xi + f(i,w-wi) ; f(i-1,w) ]


•Item 1: x1=$5 ; w1=3t
•Item 2: x2=$7 ; w2=4t
•Item 3: x3=$8 ; w3=5t
Knapsack Problem
Table
1 2 3 4 5 6 7 8 9 10 w – w2 =
6–4=2
1 0 0 5 5 5 10 10 10 15 15
2 0 0 5 7 7
3
+ x2 (= 7)

f(i,w)=Max[ xi + f(i,w-wi) ; f(i-1,w) ]


•Item 1: x1=$5 ; w1=3t
•Item 2: x2=$7 ; w2=4t
•Item 3: x3=$8 ; w3=5t
Knapsack Problem
Table
1 2 3 4 5 6 7 8 9 10
1 0 0 5 5 5 10 10 10 15 15
2 0 0 5 7 7 10
3
+ x2 (= 7)

f(i,w)=Max[ xi + f(i,w-wi) ; f(i-1,w) ]


•Item 1: x1=$5 ; w1=3t
•Item 2: x2=$7 ; w2=4t
•Item 3: x3=$8 ; w3=5t
Knapsack Problem

COMPLETED TABLE
1 2 3 4 5 6 7 8 9 10

1 0 0 5 5 5 10 10 10 15 15

2 0 0 5 7 7 10 12 14 15 17

3 0 0 5 7 8 10 12 14 15 17
•Item 1: x1=$5 ; w1=3t
•Item 2: x2=$7 ; w2=4t
•Item 3: x3=$8 ; w3=5t
Knapsack Problem
Path
1 2 3 4 5 6 7 8 9 10

1 0 0 5 5 5 10 10 10 15 15

2 0 0 5 7 7 10 12 14 15 17

3 0 0 5 7 8 10 12 14 15 17

Item 1 Item 1 Item 2

Optimal: 2 x Item 1 + 1 x Item 2
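The completed table and the trace-back path above can be reproduced with a short script implementing the lecture's recurrence f(i,w) = Max[ xi + f(i,w–wi) ; f(i–1,w) ] (note this version allows repeated copies of an item, which is why the optimum uses item 1 twice):

```python
# The truck example: W = 10t, repetition of items allowed.
values = {1: 5, 2: 7, 3: 8}    # x_i, in $
weights = {1: 3, 2: 4, 3: 5}   # w_i, in tonnes
W = 10

f = [[0] * (W + 1) for _ in range(4)]   # f[0][w] = 0: no items available
for i in range(1, 4):
    for w in range(1, W + 1):
        f[i][w] = f[i - 1][w]                        # NO item i
        if weights[i] <= w:                          # ONE more item i
            f[i][w] = max(f[i][w], values[i] + f[i][w - weights[i]])

# Trace the path back through the table to recover the cargo.
counts = {1: 0, 2: 0, 3: 0}
i, w = 3, W
while i >= 1 and w > 0:
    if f[i][w] == f[i - 1][w]:
        i -= 1                                       # item i not used here
    else:
        counts[i] += 1                               # take one copy of item i
        w -= weights[i]

print(f[3][W], counts)   # 17 {1: 2, 2: 1, 3: 0}
```

This reproduces the completed table's bottom-right entry ($17) and the optimal cargo: 2 × Item 1 + 1 × Item 2.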


Knapsack problem: running time

Theorem. The DP algorithm solves the knapsack problem with n items


and maximum weight W in Θ(n W) time and Θ(n W) space.
Pf.
・Takes O(1) time per table entry.   (weights are integers between 1 and W)

・There are Θ(n W) table entries.


・After computing optimal values, can trace back to find solution:
OPT(i, w) takes item i iff M [i, w] > M [i – 1, w]. ▪

Remarks.
・Algorithm depends critically on assumption that weights are integral.
・Assumption that values are integral was not used.


Knapsack Problem: Bottom-Up

Knapsack. Fill up an n-by-W array.

Input: n, w1,…,wn, v1,…,vn

for w = 0 to W
M[0, w] = 0

for i = 1 to n
for w = 1 to W
if (wi > w)
M[i, w] = M[i-1, w]
else
M[i, w] = max {M[i-1, w], vi + M[i-1, w-wi ]}

return M[n, W]

131
Knapsack Algorithm

W+1

0 1 2 3 4 5 6 7 8 9 10 11

 0 0 0 0 0 0 0 0 0 0 0 0
{1} 0 1 1 1 1 1 1 1 1 1 1 1

n+1 { 1, 2 } 0 1 6 7 7 7 7 7 7 7 7 7
{ 1, 2, 3 } 0 1 6 7 7 18 19 24 25 25 25 25
{ 1, 2, 3, 4 } 0 1 6 7 7 18 22 24 28 29 29 40
{ 1, 2, 3, 4, 5 } 0 1 6 7 7 18 22 28 29 34 34 40

Item   Value   Weight
1      1       1
2      6       2
3      18      5
4      22      6
5      28      7

OPT: { 4, 3 }
value = 22 + 18 = 40
W = 11
132
Dynamic programming: quiz 4

Does there exist a poly-time algorithm for the knapsack problem?

A. Yes, because the DP algorithm takes Θ(n W) time. “pseudo-polynomial”

B. No, because Θ(n W) is not a polynomial function of the input size.

C. No, because the problem is NP-hard.

D. Unknown.

equivalent to the P ≠ NP conjecture,
because the knapsack problem is NP-hard

38
COIN CHANGING

Problem. Given n coin denominations { c1, c2, …, cn } and a target value V,


find the fewest coins needed to make change for V (or report impossible).

Recall. Greedy cashier’s algorithm is optimal for U.S. coin denominations,


but not for arbitrary coin denominations.

Ex. { 1, 10, 21, 34, 70, 100, 350, 1295, 1500 }.

Optimal. 140¢ = 70 + 70.
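A minimal DP sketch for coin changing (illustrative code, not from the lecture): C[v] is the fewest coins needed to make value v. On the example denominations it finds the 2-coin answer 70 + 70 that the greedy cashier's algorithm misses:

```python
# DP for coin changing: C[v] = fewest coins to make change for v.
INF = float("inf")

def min_coins(denoms, V):
    C = [0] + [INF] * V
    for v in range(1, V + 1):
        for c in denoms:                  # try each denomination as the last coin
            if c <= v and C[v - c] + 1 < C[v]:
                C[v] = C[v - c] + 1
    return C[V]                            # INF means "impossible"

denoms = [1, 10, 21, 34, 70, 100, 350, 1295, 1500]
print(min_coins(denoms, 140))   # 2  (greedy would start with 100 and do worse)
```

Running time is O(nV), which, like knapsack, is pseudo-polynomial in the input size.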

39
6. DYNAMIC PROGRAMMING I

‣ weighted interval scheduling


‣ segmented least squares
‣ knapsack problem
‣ RNA secondary structure

SECTION 6.5
RNA secondary structure

RNA. String B = b1b2…bn over alphabet { A, C, G, U}.

Secondary structure. RNA is single-stranded so it tends to loop back and


form base pairs with itself. This structure is essential for understanding
behavior of molecule.

[figure: RNA secondary structure for GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA, with bases and base pairs labeled]
RNA secondary structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:


・[Watson–Crick] S is a matching and each pair in S is a Watson–Crick
complement: A–U, U–A, C–G, or G–C.

[figure]
B = ACGUGGCCCAU
S = { (b1, b10), (b2, b9), (b3, b8) }
S is not a secondary structure (C–A is not a valid Watson–Crick pair).
RNA secondary structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:


・[Watson–Crick] S is a matching and each pair in S is a Watson–Crick
complement: A–U, U–A, C–G, or G–C.
・[No sharp turns] The ends of each pair are separated by at least 4
intervening bases. If (bi, bj) ∈ S, then i < j – 4.

[figure]
B = AUGGGGCAU
S = { (b1, b10), (b2, b9), (b3, b8) }
S is not a secondary structure (≤ 4 intervening bases between G and C).
RNA secondary structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:


・[Watson–Crick] S is a matching and each pair in S is a Watson–Crick
complement: A–U, U–A, C–G, or G–C.
・[No sharp turns] The ends of each pair are separated by at least 4
intervening bases. If (bi, bj) ∈ S, then i < j – 4.

・[Non-crossing] If (bi, bj) and (bk, bℓ) are two pairs in S, then we cannot
have i < k < j < ℓ.

[figure]
B = ACUUGGCCAU
S = { (b1, b10), (b2, b8), (b3, b9) }
S is not a secondary structure (G–C and U–A cross).
RNA secondary structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:


・[Watson–Crick] S is a matching and each pair in S is a Watson–Crick
complement: A–U, U–A, C–G, or G–C.
・[No sharp turns] The ends of each pair are separated by at least 4
intervening bases. If (bi, bj) ∈ S, then i < j – 4.

・[Non-crossing] If (bi, bj) and (bk, bℓ) are two pairs in S, then we cannot
have i < k < j < ℓ.

[figure]
B = AUGUGGCCAU
S = { (b1, b10), (b2, b9), (b3, b8) }
S is a secondary structure (with 3 base pairs).
RNA secondary structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:


・[Watson–Crick] S is a matching and each pair in S is a Watson–Crick
complement: A–U, U–A, C–G, or G–C.
・[No sharp turns] The ends of each pair are separated by at least 4
intervening bases. If (bi, bj) ∈ S, then i < j – 4.

・[Non-crossing] If (bi, bj) and (bk, bℓ) are two pairs in S, then we cannot
have i < k < j < ℓ.

Free-energy hypothesis. RNA molecule will form the secondary structure


with the minimum total free energy.

approximate by number of base pairs


(more base pairs → lower free energy)

Goal. Given an RNA molecule B = b1b2…bn, find a secondary structure S


that maximizes the number of base pairs.
47
Dynamic programming: quiz 5

Is the following a secondary structure?

A. Yes.

B. No, violates Watson–Crick condition.

C. No, violates no-sharp-turns condition.

D. No, violates no-crossing condition.

[figure: candidate secondary structure]
Dynamic programming: quiz 6

Which subproblems?
A. OPT(j) = max number of base pairs in secondary structure
of the substring b1b2 … bj.

B. OPT(j) = max number of base pairs in secondary structure
of the substring bj bj+1 … bn.

C. Either A or B.

D. Neither A nor B.

49
RNA secondary structure: subproblems

First attempt. OPT(j) = maximum number of base pairs in a secondary


structure of the substring b1b2 … bj.

Goal. OPT(n).

Choice. Match bases bt and bj (pairing the last base bj with some bt).

Difficulty. Results in two subproblems (but one of wrong form).
・Find secondary structure in b1b2 … bt–1.   → OPT(t – 1)
・Find secondary structure in bt+1bt+2 … bj–1.   → need more subproblems
   (first base no longer b1)

50
Dynamic programming over intervals

Def. OPT(i, j) = maximum number of base pairs in a secondary structure


of the substring bi bi+1 … bj.

Case 1. If i ≥ j – 4.
・OPT(i, j ) = 0 by no-sharp-turns condition.

Case 2. Base bj is not involved in a pair.


・OPT(i, j )= OPT(i, j – 1).

Case 3. Base bj pairs with bt for some i ≤ t < j – 4.


・Non-crossing condition decouples resulting two subproblems.
・OPT(i, j ) = 1 + max t { OPT(i, t – 1) + OPT(t + 1, j – 1) }.
(match bases bj and bt; take max over t such that i ≤ t < j – 4
and bt and bj are Watson–Crick complements)
Bottom-up dynamic programming over intervals

Q. In which order to solve the subproblems?


A. Do shortest intervals first: increasing order of j – i.

RNA-SECONDARY-STRUCTURE(n, b1, …, bn)

FOR k = 5 TO n – 1                    ← all needed values are already computed
    FOR i = 1 TO n – k
        j ← i + k.
        Compute M[i, j] using formula.

RETURN M[1, n].

[figure: order in which to solve subproblems, by increasing j – i]

Theorem. The DP algorithm solves the RNA secondary structure problem in
O(n^3) time and O(n^2) space.
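The interval DP can be sketched in Python (0-based indices, so the no-sharp-turns condition keeps the form i < j – 4; Watson–Crick pairs only, no wobble pairs). Checked against the 3-pair example B = AUGUGGCCAU from earlier:

```python
# Interval DP for RNA secondary structure:
# M[i][j] = max base pairs in b_i .. b_j, filled in increasing order of j - i.

def rna_secondary_structure(b):
    n = len(b)
    pairs = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    M = [[0] * n for _ in range(n)]
    for k in range(5, n):                 # interval length j - i
        for i in range(n - k):
            j = i + k
            best = M[i][j - 1]            # case: b_j not involved in a pair
            for t in range(i, j - 4):     # case: b_j pairs with b_t
                if (b[t], b[j]) in pairs:
                    left = M[i][t - 1] if t > i else 0
                    best = max(best, 1 + left + M[t + 1][j - 1])
            M[i][j] = best
    return M[0][n - 1]

print(rna_secondary_structure("AUGUGGCCAU"))   # 3 (matches the example above)
```

The three nested loops (over k, i, and t) give the O(n^3) running time from the theorem.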
Dynamic programming summary

Outline.
・Define a collection of subproblems.   (typically, only a polynomial number of subproblems)
・Solution to original problem can be
computed from subproblems.
・Natural ordering of subproblems from
"smallest" to "largest" that enables
determining a solution to a subproblem
from solutions to smaller subproblems.

Techniques.
・Binary choice: weighted interval scheduling.
・Multiway choice: segmented least squares.
・Adding a new variable: knapsack problem.
・Intervals: RNA secondary structure.

Top-down vs. bottom-up dynamic programming. Opinions differ.
NEXT LECTURE

• Min-cut
• Karger’s Algorithm

149
