DP Practice
Problem 1. (Deterministic DP) Consider the following network, where each number
along a link represents the actual distance between the pair of nodes connected by that link.
The objective is to find the shortest path from the origin to the destination.
Use dynamic programming to solve this problem. Construct tables for the decisions in
each stage, like what we’ve done during the lecture.
Problem 2. (Deterministic DP) A political campaign is entering its final stage, and
polls indicate a very close election. One of the candidates has enough funds left to purchase
TV time for a total of five prime-time commercials on TV stations located in four different
areas. Based on polling information, an estimate has been made of the number of additional
votes that can be won in the different broadcasting areas depending upon the number of
commercials run. These estimates are given in the following table in thousands of votes:
Use dynamic programming to determine how the five commercials should be distributed
among the four areas in order to maximize the estimated number of votes won.
Problem 3. (Stochastic DP) Imagine that you have $5,000 to invest and that you will
have an opportunity to invest that amount in either of two investments (A or B) at the be-
ginning of each of the next 3 years. Both investments have uncertain returns. For investment
A you will either lose your money entirely or (with higher probability) get back $10,000 (a
profit of $5,000) at the end of the year. For investment B you will get back either just your
$5,000 or (with low probability) $10,000 at the end of the year. The probabilities for these
events are as follows:
You are allowed to make only (at most) one investment each year, and you can invest
only $5,000 each time. (Any additional money accumulated is left idle.)
(a) Use dynamic programming to find the investment policy that maximizes the expected
amount of money you will have after 3 years.
(b) Use dynamic programming to find the investment policy that maximizes the probability
that you will have at least $10,000 after 3 years.
Solution
Problem 1
DP algorithm: work backward from the destination, computing for each node f*(node) = min over outgoing links (node, next) of {distance(node, next) + f*(next)}, with f*(destination) = 0. The shortest path is then recovered by following the minimizing decisions forward from the origin.
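As an illustration of this backward recursion, the minimal Python sketch below runs it on a small staged network. Since the network figure is not reproduced in this extract, the node names and link distances are hypothetical placeholders rather than the data of the problem; only the recursion itself is the point.

```python
# Backward shortest-path recursion on a hypothetical staged network.
# dist[u][v] = length of the link u -> v (placeholder values, not the figure's data).
dist = {
    "O": {"A": 2, "B": 5, "C": 4},
    "A": {"D": 7, "E": 4},
    "B": {"D": 3, "E": 1},
    "C": {"D": 4, "E": 6},
    "D": {"T": 5},
    "E": {"T": 7},
}

def shortest_paths(dist, destination="T"):
    """Backward recursion: f*(u) = min over links (u, v) of dist(u, v) + f*(v)."""
    f = {destination: 0.0}   # f*(T) = 0
    policy = {}              # best next node out of each state
    # Process nodes in reverse topological order (last stage first).
    for u in ["D", "E", "B", "A", "C", "O"]:
        best_v = min(dist[u], key=lambda v: dist[u][v] + f[v])
        policy[u] = best_v
        f[u] = dist[u][best_v] + f[best_v]
    return f, policy

f, policy = shortest_paths(dist)
print("f*(O) =", f["O"])          # length of the shortest O -> T path
node, path = "O", ["O"]           # recover the path by following the policy
while node != "T":
    node = policy[node]
    path.append(node)
print(" -> ".join(path))
```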
Problem 2
Let xn be the number of commercials run in area n, pn(xn) be the number of additional votes won when xn commercials are run in area n, and sn be the number of commercials remaining to be allocated to areas n, n+1, ..., 4. Then the recursive relationship is

    fn(sn, xn) = pn(xn) + fn+1*(sn − xn),    fn*(sn) = max over 0 ≤ xn ≤ sn of fn(sn, xn),      (1)

with f5* ≡ 0.
Number of stages: 4 (one per area).
f3(s, x3):

    s \ x3     0    1    2    3    4    5  |  f3*(s)   x3*
    0          0    -    -    -    -    -  |    0       0
    1          3    5    -    -    -    -  |    5       1
    2          7    8    9    -    -    -  |    9       2
    3         12   12   12   11    -    -  |   12      0, 1, 2
    4         14   17   16   14   10    -  |   17       1
    5         16   19   21   18   13    9  |   21       2
f2(s, x2):

    s \ x2     0    1    2    3    4    5  |  f2*(s)   x2*
    0          0    -    -    -    -    -  |    0       0
    1          5    6    -    -    -    -  |    6       1
    2          9   11    8    -    -    -  |   11       1
    3         12   15   13   10    -    -  |   15       1
    4         17   18   17   15   11    -  |   18       1
    5         21   23   20   19   16   12  |   23       1
f1(s, x1):

    s \ x1     0    1    2    3    4    5  |  f1*(s)   x1*
    5         23   22   22   20   18   15  |   23       0

Hence the maximum estimated number of additional votes is f1*(5) = 23 thousand. Tracing the decisions forward gives the optimal allocation x1* = 0, x2* = 1, x3* = 1, with the remaining 3 commercials allocated to area 4.
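A minimal Python sketch of recursion (1) is given below. The vote-estimate table from the problem statement is not shown in this extract, so the payoff values used here are reconstructed by back-substitution from the stage tables above and should be treated as an assumption; the sketch reproduces f1*(5) = 23 and the allocation (0, 1, 1, 3).

```python
# Backward recursion f_n(s, x_n) = p_n(x_n) + f*_{n+1}(s - x_n) for Problem 2.
# Vote estimates (thousands of votes) reconstructed from the tables above.
votes = {
    1: [0, 4, 7, 9, 12, 15],   # p_1(x), x = 0..5 commercials in area 1
    2: [0, 6, 8, 10, 11, 12],  # p_2(x)
    3: [0, 5, 9, 11, 10, 9],   # p_3(x)
    4: [0, 3, 7, 12, 14, 16],  # p_4(x)
}
TOTAL = 5  # commercials available

# f_star[n][s] = best votes obtainable from areas n..4 with s commercials left.
f_star = {5: [0] * (TOTAL + 1)}        # boundary: no areas left
best_x = {}

for n in (4, 3, 2, 1):                 # backward over the four stages
    f_star[n], best_x[n] = [], []
    for s in range(TOTAL + 1):
        candidates = [(votes[n][x] + f_star[n + 1][s - x], x) for x in range(s + 1)]
        value, x_opt = max(candidates)
        f_star[n].append(value)
        best_x[n].append(x_opt)

print("Maximum votes (thousands):", f_star[1][TOTAL])   # expect 23

# Recover one optimal allocation by tracing the decisions forward.
s, plan = TOTAL, {}
for n in (1, 2, 3, 4):
    plan[n] = best_x[n][s]
    s -= plan[n]
print("Commercials per area:", plan)                    # expect {1: 0, 2: 1, 3: 1, 4: 3}
```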
Problem 3
(a) Let xn ∈ {0, A, B} be the investment made in year n (0 meaning no investment), sn be the amount of money on hand at the beginning of year n, and fn(sn, xn) be the maximum expected amount of money at the end of the third year given sn and xn.

For 0 ≤ sn < 5000, one cannot invest (each investment requires exactly $5,000), so fn(sn, xn) = fn+1*(sn) and xn* = 0.
For sn ≥ 5000,

    fn(sn, xn) = fn+1*(sn)                                          if xn = 0
                 0.3 fn+1*(sn − 5000) + 0.7 fn+1*(sn + 5000)        if xn = A        (2)
                 0.9 fn+1*(sn) + 0.1 fn+1*(sn + 5000)               if xn = B

with the boundary condition f4*(s4) = s4, the money on hand once year 3 ends. At stage 3 this gives f3*(s3) = s3 with x3* = 0 for s3 < 5000, and f3*(s3) = s3 + 2000 with x3* = A for s3 ≥ 5000, which is used in the stage-2 table below.
f2(s2, x2):

    s2                       0            A            B           f2*(s2)      x2*
    0 ≤ s2 < 5000            s2           -            -           s2            0
    5000 ≤ s2 < 10000        s2 + 2000    s2 + 3400    s2 + 2500   s2 + 3400     A
    s2 ≥ 10000               s2 + 2000    s2 + 4000    s2 + 2500   s2 + 4000     A
f1(s1, x1):

    s1        0       A       B       f1*(s1)    x1*
    5000      8400    9800    8960    9800        A
The optimal policy is to invest in A each year (whenever at least $5,000 is on hand), giving a maximum expected fortune after three years of $9,800.
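The following Python sketch implements recursion (2). It tracks the state in $5,000 blocks, an assumption that is harmless here because every reachable amount is a multiple of $5,000, and it reproduces the expected value of $9,800 with decision A in every year.

```python
# Backward pass for part (a): maximize the expected fortune after 3 years.
UNIT = 5000
N_YEARS = 3

def expected_value_policy():
    # Terminal reward: the money on hand after year 3.  Starting from one
    # $5,000 block, at most 1 + N_YEARS blocks can ever be reached.
    f = [u * UNIT for u in range(1 + N_YEARS + 1)]   # states u = 0..4 blocks
    policies = {}
    for year in range(N_YEARS, 0, -1):               # years 3, 2, 1
        new_f, policy = [], []
        for u in range(year + 1):                    # at most `year` blocks on hand
            options = {"0": f[u]}                    # leave the money idle
            if u >= 1:                               # investing requires $5,000
                options["A"] = 0.3 * f[u - 1] + 0.7 * f[u + 1]
                options["B"] = 0.9 * f[u] + 0.1 * f[u + 1]
            best = max(options, key=options.get)
            new_f.append(options[best])
            policy.append(best)
        f, policies[year] = new_f, policy
    return f, policies

f1, policies = expected_value_policy()
print("Maximum expected fortune from $5,000:", round(f1[1], 2))   # expect ~9800
print("Year-1 decision with $5,000 on hand:", policies[1][1])      # expect 'A'
```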
(b) Let xn and sn be defined as in (a), and let fn(sn, xn) be the maximum probability of having at least $10,000 after 3 years given sn and xn.

As in (a), for 0 ≤ sn < 5000 one cannot invest, so fn(sn, xn) = fn+1*(sn) and xn* = 0.
For sn ≥ 5000, the recursive relationship can be written as

    fn(sn, xn) = fn+1*(sn)                                          if xn = 0
                 0.3 fn+1*(sn − 5000) + 0.7 fn+1*(sn + 5000)        if xn = A        (3)
                 0.9 fn+1*(sn) + 0.1 fn+1*(sn + 5000)               if xn = B
Starting at the final stage (year 3), the probability of having at least $10,000 at the end of year 3 follows directly from the return distributions of the two investments and is listed below:
f3(s3, x3):

    s3                       0      A      B      f3*(s3)    x3*
    0 ≤ s3 < 5000            0      -      -      0           0
    5000 ≤ s3 < 10000        0      0.7    0.1    0.7         A
    10000 ≤ s3 < 15000       1      0.7    1      1           0, B
    s3 ≥ 15000               1      1      1      1           0, A, B
Now suppose we are at the beginning of year 2. The money on hand falls into one of three ranges: [0, 5000), [5000, 10000), or [10000, ∞).

If 0 ≤ s2 < 5000, we cannot invest, so the only action is no investment and the probability is 0.

If 5000 ≤ s2 < 10000, no investment gives f2(s2, 0) = f3*(s2) = 0.7; investing in A gives f2(s2, A) = 0.3 f3*(s2 − 5000) + 0.7 f3*(s2 + 5000) = 0.7; investing in B gives f2(s2, B) = 0.9 f3*(s2) + 0.1 f3*(s2 + 5000) = 0.63 + 0.1 = 0.73.

If s2 ≥ 10000, no investment gives f2(s2, 0) = 1; investing in A gives f2(s2, A) = 0.3 f3*(s2 − 5000) + 0.7 f3*(s2 + 5000) = 0.21 + 0.7 = 0.91; investing in B gives f2(s2, B) = 0.9 f3*(s2) + 0.1 f3*(s2 + 5000) = 1.

Applying the recursive relationship once more gives the stage-1 values. Both stage tables are shown below:
f2(s2, x2):

    s2                       0      A       B       f2*(s2)    x2*
    0 ≤ s2 < 5000            0      -       -       0           0
    5000 ≤ s2 < 10000        0.7    0.7     0.73    0.73        B
    s2 ≥ 10000               1      0.91    1       1           0, B
f1(s1, x1):

    s1        0       A      B        f1*(s1)    x1*
    5000      0.73    0.7    0.757    0.757       B
Hence, the optimal policy is: invest in B in year 1; if only $5,000 is on hand at the beginning of year 2, invest in B again; if only $5,000 is on hand at the beginning of year 3, invest in A; whenever at least $10,000 is already on hand, leaving the money idle (or investing in B) is optimal. The maximum probability of having at least $10,000 at the end of three years is 0.757.
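The same backward pass as in part (a), with the terminal reward replaced by the indicator of having at least $10,000, gives recursion (3). A minimal sketch, again tracking the state in $5,000 blocks (an assumption for the sketch only), reproduces the probability 0.757 with decision B in year 1.

```python
# Backward pass for part (b): maximize P(at least $10,000 after 3 years).
UNIT, TARGET, N_YEARS = 5000, 10000, 3

def backward_pass(terminal):
    """Generic backward pass over years 3, 2, 1 for the two-investment model."""
    f, policies = list(terminal), {}
    for year in range(N_YEARS, 0, -1):
        new_f, policy = [], []
        for u in range(year + 1):                 # at most `year` blocks on hand
            options = {"0": f[u]}                 # leave the money idle
            if u >= 1:                            # investing requires $5,000
                options["A"] = 0.3 * f[u - 1] + 0.7 * f[u + 1]
                options["B"] = 0.9 * f[u] + 0.1 * f[u + 1]
            best = max(options, key=options.get)
            new_f.append(options[best])
            policy.append(best)
        f, policies[year] = new_f, policy
    return f, policies

# Terminal reward = indicator of reaching the $10,000 target after year 3.
terminal = [1.0 if u * UNIT >= TARGET else 0.0 for u in range(1 + N_YEARS + 1)]
f1, policies = backward_pass(terminal)
print("P(at least $10,000) from $5,000:", round(f1[1], 3))     # expect 0.757
print("Year-1 decision with $5,000 on hand:", policies[1][1])   # expect 'B'
```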