Lecture 10 - Pontryagin's Minimum Principle
The HJB equation provides a lot of information: the optimal cost-to-go and the optimal policy
for all time and for all possible states. However, in many cases, we only care about the optimal
control trajectory for a specific initial condition. We will see how to exploit the fact that we
are asking for much less in order to arrive at simpler conditions for optimality (the Minimum
Principle).
10.1 Notation
Let F(t, x) be a continuously differentiable function. Then,
• The partial derivative of F with respect to its first argument, t, is ∂F(t, x)/∂t.
• The total derivative of F with respect to t when subject to x = x(t) is
  dF(t, x(t))/dt = ∂F(t, x)/∂t |_{x=x(t)} + ∂F(t, x)/∂x |_{x=x(t)} · ẋ(t).
Example 1:
Consider F(t, x) = tx. Then,
∂F(t, x)/∂t = x,
dF(t, x(t))/dt = x(t) + t·ẋ(t).
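For completeness, here is a small SymPy check of Example 1 (an added illustration, not part of the original notes), distinguishing the partial derivative from the total derivative along a trajectory x(t):

    import sympy as sp

    t, x_sym = sp.symbols('t x')
    x = sp.Function('x')

    # Partial derivative of F(t, x) = t*x, with x treated as an independent variable.
    print(sp.diff(t * x_sym, t))     # equals x

    # Total derivative of F(t, x(t)) = t*x(t) along a trajectory x(t).
    print(sp.diff(t * x(t), t))      # equals x(t) + t*x'(t), as in Example 1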
Lemma 10.1. Let F(t, x, u) be a continuously differentiable function of t ∈ R, x ∈ R^n, u ∈ R^m, and let U ⊆ R^m be a convex set. Furthermore, assume µ∗(t, x) := arg min_{u∈U} F(t, x, u) exists and is continuously differentiable. Then, for all t and x,
∂/∂t [ min_{u∈U} F(t, x, u) ] = ∂F(t, x, u)/∂t |_{u=µ∗(t,x)},
∂/∂x [ min_{u∈U} F(t, x, u) ] = ∂F(t, x, u)/∂x |_{u=µ∗(t,x)}.
Proof. We prove this for the case U = R^m. Let G(t, x) := min_{u∈U} F(t, x, u) = F(t, x, µ∗(t, x)). Then, by the chain rule,
∂G(t, x)/∂t = ∂F(t, x, u)/∂t |_{u=µ∗(t,x)} + ∂F(t, x, u)/∂u |_{u=µ∗(t,x)} · ∂µ∗(t, x)/∂t = ∂F(t, x, u)/∂t |_{u=µ∗(t,x)},
where the second term vanishes because µ∗(t, x) is an unconstrained minimizer of F over u, so ∂F(t, x, u)/∂u |_{u=µ∗(t,x)} = 0.
Similarly, this can be shown for the partial derivative with respect to x.
Example 2:
Let F(t, x, u) := (1 + t)u² + ux + 1, t ≥ 0. Then,
min_{u∈R} F(t, x, u):  2(1 + t)u + x = 0,  u = −x/(2(1 + t)),
∴ µ∗(t, x) = −x/(2(1 + t)),
∴ min_{u∈R} F(t, x, u) = (1 + t)x²/(4(1 + t)²) − x²/(2(1 + t)) + 1 = −x²/(4(1 + t)) + 1.
1. ∂/∂t [ min_{u∈R} F(t, x, u) ] = ∂F(t, x, u)/∂t |_{u=µ∗(t,x)} = x²/(4(1 + t)²).
2. ∂/∂x [ min_{u∈R} F(t, x, u) ] = ∂F(t, x, u)/∂x |_{u=µ∗(t,x)} = −x/(2(1 + t)).
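For readers who want to double-check Example 2, the following short SymPy script (an added illustration, not part of the original notes) verifies both identities of Lemma 10.1 symbolically:

    import sympy as sp

    t, x, u = sp.symbols('t x u', real=True)
    F = (1 + t)*u**2 + u*x + 1              # F(t, x, u) from Example 2 (t >= 0)

    mu = sp.solve(sp.diff(F, u), u)[0]      # µ∗(t, x) = -x / (2*(1 + t))
    G = sp.simplify(F.subs(u, mu))          # G(t, x) = min_u F(t, x, u) = 1 - x**2/(4*(1 + t))

    # Lemma 10.1: the partials of G equal the partials of F evaluated at u = µ∗(t, x).
    assert sp.simplify(sp.diff(G, t) - sp.diff(F, t).subs(u, mu)) == 0
    assert sp.simplify(sp.diff(G, x) - sp.diff(F, x).subs(u, mu)) == 0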
Dynamics
ẋ(t) = f(x(t), u(t)), x(0) = x, u(t) ∈ U, t ∈ [0, T], (10.1)
where x(t) ∈ S ⊆ R^n is the state and u(t) ∈ U ⊆ R^m is the control input.
Cost
h(x(T)) + ∫_0^T g(x(t), u(t)) dt. (10.2)
Objective
Given an initial condition x(0) = x ∈ S, construct an optimal control trajectory u(t) such that
(10.2) subject to (10.1) is minimized.
One could potentially solve the above problem using the HJB, which gives an optimal policy µ∗(t, x). The optimal input trajectory can then be inferred from the policy for a given initial condition and the solution to ẋ(t) = f(x(t), µ∗(t, x(t))). However, as we have seen, solving the HJB is very difficult in general. The following theorem can be used instead; it gives necessary conditions on the optimal trajectory (henceforth the star superscript ·∗ will be dropped).
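As an aside (an added illustration, not from the notes), recovering an open-loop trajectory from a given policy amounts to integrating the closed-loop ODE and reading off u(t) = µ∗(t, x(t)). The scalar dynamics f and policy mu below are placeholders chosen purely for illustration:

    import numpy as np
    from scipy.integrate import solve_ivp

    def f(x, u):
        return u                # placeholder dynamics f(x, u)

    def mu(t, x):
        return -x               # placeholder policy; in practice µ∗ would come from the HJB

    def closed_loop(t, y):
        return [f(y[0], mu(t, y[0]))]

    T, x0 = 2.0, 1.0
    sol = solve_ivp(closed_loop, (0.0, T), [x0], dense_output=True, rtol=1e-8)

    ts = np.linspace(0.0, T, 50)
    xs = sol.sol(ts)[0]                                   # state trajectory x(t)
    us = np.array([mu(t, x) for t, x in zip(ts, xs)])     # open-loop input u(t) = µ∗(t, x(t))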
Theorem 10.1. For a given initial condition x(0) = x ∈ S, let u(t) be an optimal control trajectory with associated state trajectory x(t) for the system (10.1). Then there exists a trajectory p(t) such that:
1. ẋ(t) = f(x(t), u(t)), x(0) = x,
2. ṗ(t) = −( ∂H(x, u, p)/∂x |_{x(t),u(t),p(t)} )^T, with boundary condition p(T) = ( ∂h(x)/∂x |_{x(T)} )^T,
3. u(t) = arg min_{u∈U} H(x(t), u, p(t)) for all t ∈ [0, T],
4. H(x(t), u(t), p(t)) is constant for all t ∈ [0, T],
where H(x, u, p) := g(x, u) + p^T f(x, u).
The function H(·, ·, ·) in the above is referred to as the Hamiltonian function, a name that comes from Hamilton's Principle of Least Action in mechanics.
Proof. We provide an informal proof which assumes that the cost-to-go J(t, x) is continuously
differentiable, the optimal policy µ(·, ·) is continuously differentiable, and U is convex in order
to make use of Lemma 10.1. However, these assumptions are actually not needed in a more
formal proof.
With continuously differentiable cost-to-go, the HJB is also a necessary condition for optimality:
0 = min_{u∈U} [ g(x, u) + ∂J(t, x)/∂t + ∂J(t, x)/∂x · f(x, u) ], ∀t ∈ [0, T], ∀x ∈ S. (10.3)
Defining F(t, x, u) := g(x, u) + ∂J(t, x)/∂t + ∂J(t, x)/∂x · f(x, u), the HJB (10.3) reads
0 = min_{u∈U} F(t, x, u), ∀t ∈ [0, T], ∀x ∈ S, (10.4)
with boundary condition
J(T, x) = h(x), ∀x ∈ S. (10.5)
Now take the partial derivatives of (10.4) with respect to t and x; by Lemma 10.1
0 = ∂/∂t [ min_{u∈U} F(t, x, u) ] = ∂F(t, x, u)/∂t |_{u=µ(t,x)}
  = ∂²J(t, x)/∂t² + ∂²J(t, x)/∂t∂x · f(x, µ(t, x)), (10.6)
and similarly,
0 = ∂/∂x [ min_{u∈U} F(t, x, u) ] = ∂F(t, x, u)/∂x |_{u=µ(t,x)}
  = ∂g(x, u)/∂x |_{u=µ(t,x)} + ∂²J(t, x)/∂x∂t + f(x, µ(t, x))^T ∂²J(t, x)/∂x² + ∂J(t, x)/∂x · ∂f(x, u)/∂x |_{u=µ(t,x)}. (10.7)
Now consider the specific optimal trajectory u(t) := µ(t, x(t)), where x(t) solves ẋ(t) = f(x(t), µ(t, x(t))), x(0) = x. Evaluating (10.6) and (10.7) along this trajectory yields
0 = ∂²J(t, x)/∂t² |_{x(t)} + ∂²J(t, x)/∂t∂x |_{x(t)} · ẋ(t) = d/dt [ ∂J(t, x)/∂t |_{x(t)} ], (10.8)
0 = ∂g(x, u)/∂x |_{x(t),u(t)} + ∂²J(t, x)/∂x∂t |_{x(t)} + ẋ(t)^T ∂²J(t, x)/∂x² |_{x(t)} + ∂J(t, x)/∂x |_{x(t)} · ∂f(x, u)/∂x |_{x(t),u(t)}
  = ∂g(x, u)/∂x |_{x(t),u(t)} + d/dt [ ∂J(t, x)/∂x |_{x(t)} ] + ∂J(t, x)/∂x |_{x(t)} · ∂f(x, u)/∂x |_{x(t),u(t)}. (10.9)
With r(t) := ∂J(t, x)/∂t |_{x(t)}, (10.8) becomes
ṙ(t) = 0,
and with p(t) := ( ∂J(t, x)/∂x |_{x(t)} )^T, (10.9) becomes
0 = ∂g(x, u)/∂x |_{x(t),u(t)} + ṗ(t)^T + p(t)^T ∂f(x, u)/∂x |_{x(t),u(t)},
i.e. ṗ(t) = −( ∂H(x, u, p)/∂x |_{x(t),u(t),p(t)} )^T.
Taking the partial derivative of the boundary condition (10.5) with respect to x yields
∂J(T, x)/∂x = ∂h(x)/∂x, ∀x ∈ S,
and thus
p(T) = ( ∂h(x)/∂x |_{x(T)} )^T.
Moreover, since the minimum in (10.4) is attained at u = µ(t, x), along the optimal trajectory u(t) minimizes F(t, x(t), u) = g(x(t), u) + r(t) + p(t)^T f(x(t), u) over u ∈ U, which is equivalent to minimizing the Hamiltonian H(x(t), u, p(t)). Finally, evaluating (10.4) along the optimal trajectory gives 0 = H(x(t), u(t), p(t)) + r(t), and since ṙ(t) = 0, the Hamiltonian is constant along the optimal trajectory.
Remarks:
• The Minimum Principle requires solving an ODE with split boundary conditions. It is not trivial to solve, but easier than solving the HJB, which is a PDE (a numerical sketch of one approach follows these remarks).
• The Minimum Principle provides necessary conditions for optimality. If a control trajec-
tory satisfies these conditions, it is not necessarily optimal. Further analysis is needed to
guarantee optimality. One method that often works is to prove that an optimal control
trajectory exists, and to verify that there is only one control trajectory satisfying the
conditions of the Minimum Principle.
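To make the remark about split boundary conditions concrete, here is a minimal single-shooting sketch. It is an added illustration (not from the notes) for a simple scalar problem: minimize ∫_0^T (x(t)² + u(t)²) dt subject to ẋ(t) = u(t), x(0) = x₀, with no terminal cost. The Minimum Principle gives H = x² + u² + pu, hence u(t) = −p(t)/2, ṗ(t) = −2x(t) and p(T) = 0; the unknown initial costate p(0) is found by root-finding on the shooting function.

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import brentq

    T, x0 = 1.0, 1.0

    def minimum_principle_ode(t, y):
        # y = [x, p]; u = -p/2 minimizes H = x^2 + u^2 + p*u over u in R
        x, p = y
        return [-p / 2.0, -2.0 * x]           # xdot = u = -p/2,  pdot = -dH/dx = -2x

    def shooting(p0):
        # Integrate forward from a guessed p(0) and return p(T); we need p(T) = 0.
        sol = solve_ivp(minimum_principle_ode, (0.0, T), [x0, p0], rtol=1e-9, atol=1e-12)
        return sol.y[1, -1]

    p0_star = brentq(shooting, -10.0, 10.0)   # find p(0) such that p(T) = 0
    print(p0_star, 2 * x0 * np.tanh(T))       # should agree with the analytic value 2*x0*tanh(T)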
The dynamics are given by
ẋ(t) = u(t)x(t), x(0) = x > 0,
ż(t) = (1 − u(t))x(t), z(0) = 0,
where x(t) is the number of robots, z(t) is the number of habitats built, and u(t) ∈ [0, 1] is the fraction of robots devoted to replicating themselves (the remaining fraction builds habitats). The objective is to maximize the number of habitats z(T) at the final time T.
Solution
Note that the cost function can be written as a function of x(t) and u(t),
z(T) = ∫_0^T (1 − u(t))x(t) dt, (10.10)
and z(t) does not enter the dynamics of x(t); we can therefore consider x(t) as the only state and u(t) as the control input. The stage cost is then g(x, u) = (1 − u)x, the terminal cost is h(x) = 0, and the dynamics function is f(x, u) = ux. Thus the Hamiltonian is
H(x, u, p) = g(x, u) + p f(x, u) = (1 − u)x + pux = x + x(p − 1)u.
Since the objective is to maximize (10.10), the conditions of Theorem 10.1 apply with the minimization of the Hamiltonian replaced by maximization:
ṗ(t) = − ∂H(x, u, p)/∂x |_{x(t),u(t),p(t)} = −1 + u(t) − p(t)u(t), (10.11)
p(T) = ∂h(x)/∂x |_{x(T)} = 0,
ẋ(t) = x(t)u(t), x(0) = x,
u(t) = arg max_{0≤u≤1} H(x(t), u, p(t)) = arg max_{0≤u≤1} ( x(t) + x(t)(p(t) − 1)u ). (10.12)
Since x(t) > 0 for t ∈ [0, T], from (10.12) we can find the following solution¹:
u(t) = 0 if p(t) < 1,
u(t) = 1 if p(t) ≥ 1.
We will now work backwards from t = T. Since p(T) = 0, for t close to T we have u(t) = 0 and therefore (10.11) becomes ṗ(t) = −1, i.e. p(t) = T − t. Therefore at time t = T − 1 we have p(t) = 1, and that is when the control input switches to u(t) = 1. Thus for t ≤ T − 1, (10.11) becomes ṗ(t) = −1 + 1 − p(t) = −p(t), and hence
p(t) = e^{(T−1)−t}, t ≤ T − 1. (10.13)
Note that by (10.13), p(t) > 1 for 0 ≤ t < T − 1, hence we have
u(t) = 1 if 0 ≤ t ≤ T − 1,
u(t) = 0 if T − 1 < t ≤ T. (10.14)
¹Note that when p(t) = 1, u(t) can be anything between 0 and 1. It can be shown that this choice does not make a difference in the incurred cost.
[Figure: time histories of u(t), p(t), x(t) and z(t) on [0, T], with the switch at t = T − 1.]
In conclusion, an optimal control trajectory is to use all the robots to replicate themselves from time 0 until t = T − 1, and then use all the robots to build habitats. If T < 1, the robots should only build habitats. In general, if the Hamiltonian is linear in u, its maximum or minimum can only be attained on the boundary of U. The resulting control trajectory is known as bang-bang control.
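As a quick numerical sanity check of this example (an added sketch, not part of the original notes), the following script integrates the robot dynamics under the bang-bang policy (10.14) with a forward Euler scheme and compares the resulting z(T) against a few constant policies; for T > 1 the bang-bang trajectory should yield the largest z(T), approximately x(0)·e^{T−1}.

    import numpy as np

    def simulate(policy, x0=1.0, T=3.0, dt=1e-4):
        """Integrate xdot = u*x, zdot = (1 - u)*x with forward Euler and return z(T)."""
        x, z = x0, 0.0
        for k in range(int(T / dt)):
            u = policy(k * dt)
            x, z = x + dt * u * x, z + dt * (1.0 - u) * x
        return z

    T = 3.0
    bang_bang = lambda t: 1.0 if t <= T - 1.0 else 0.0          # policy (10.14)
    print("bang-bang      :", simulate(bang_bang, T=T))          # approx. e**(T-1) = 7.39
    for c in (0.0, 0.25, 0.5, 0.75):
        print(f"constant u={c:4.2f}:", simulate(lambda t, c=c: c, T=T))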
10.3 Summary
The Minimum Principle is a necessary condition for the optimal trajectory; in particular, it
is possible that non-optimal trajectories satisfy the conditions outlined in Theorem 10.1. As
we saw last lecture, the HJB is a sufficient condition for optimality; in particular, if a solution
satisfies the HJB, then we are guaranteed that it is indeed optimal. This is summarized in
Fig. 10.2.
[Diagram omitted: it relates the sets of trajectories satisfying the HJB (lecture 9), its generalized solutions (discussed in lecture 9), the Minimum Principle (this lecture, lecture 10) and the calculus of variations to optimal solutions, local minima and non-optimal solutions.]
Figure 10.2: Optimal solutions and their relation to the HJB and the Minimum Principle.