ECE 551 Lecture 3
Preface
This lecture continues our discussion of the method of dynamic programming. We proceed to generalize
the particular solution obtained from the example in the prior lecture to encompass all optimal control
problems involving an nth order differential equation with both state and control input constraints. This
treatment gives rise to a general mathematical statement of the dynamic programming algorithm when
applied to optimal control problems involving time-invariant dynamical systems.
We next consider whether dynamic programming can be used to solve problems involving continuous-
time systems without converting those systems to a discrete-time representation. This leads to the
classic Hamilton-Jacobi-Bellman (HJB) partial differential equation.
Consider an nth order time-invariant system described by the following state equation:
\dot{x}(t) = a(x(t), u(t))    (4.10)

with the performance measure

J = h(x(t_f)) + \int_0^{t_f} g(x(t), u(t)) \, dt    (4.11)

where the final time t_f is assumed to be fixed, and the admissible controls are constrained to lie within a set U, i.e. u \in U.
We begin our solution by converting equation (4.10) to a difference equation. We assume there are N equally spaced time increments \Delta t in [0, t_f], and as before, we approximate the time derivative as follows:

\dot{x}(t) \approx \frac{x(t + \Delta t) - x(t)}{\Delta t}

Substituting into (4.10) and writing x(k) for x(k \Delta t) gives

x(k+1) = x(k) + \Delta t \, a(x(k), u(k))

Defining a_D(x(k), u(k)) \triangleq x(k) + \Delta t \, a(x(k), u(k)), this becomes

x(k+1) = a_D(x(k), u(k))    (4.12)
Similarly, approximating the integral in (4.11) as a sum gives

J = h(x(N)) + \Delta t \sum_{k=0}^{N-1} g(x(k), u(k))

Defining g_D(x(k), u(k)) \triangleq \Delta t \, g(x(k), u(k)), we have

J = h(x(N)) + \sum_{k=0}^{N-1} g_D(x(k), u(k))    (4.13)
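To make the discretization in (4.12)-(4.13) concrete, here is a minimal sketch in Python. The scalar dynamics a(x, u) = -x + u, cost integrand g(x, u) = x^2 + u^2, and terminal cost h = 0 are illustrative assumptions, not taken from the lecture.

```python
def a(x, u):
    # Hypothetical scalar dynamics (an assumption for illustration).
    return -x + u

def g(x, u):
    # Hypothetical cost integrand (an assumption for illustration).
    return x**2 + u**2

def a_D(x, u, dt):
    # One Euler step: x(k+1) = x(k) + dt * a(x(k), u(k)), eq. (4.12).
    return x + dt * a(x, u)

def g_D(x, u, dt):
    # Incremental cost over one interval: dt * g(x(k), u(k)), eq. (4.13).
    return dt * g(x, u)

def total_cost(x0, controls, dt, h=lambda x: 0.0):
    # J = h(x(N)) + sum_{k=0}^{N-1} g_D(x(k), u(k)), eq. (4.13).
    x, J = x0, 0.0
    for u in controls:
        J += g_D(x, u, dt)
        x = a_D(x, u, dt)
    return J + h(x)

# With u(k) = 0 the discretized state decays as x(k+1) = (1 - dt) x(k),
# so the accumulated cost approximates the integral of e^(-2t) over [0, 1].
N, dt = 100, 0.01
J_free = total_cost(1.0, [0.0] * N, dt)
```

Refining the grid (larger N, smaller \Delta t) drives the sum toward the continuous-time cost, which is the sense in which (4.13) approximates (4.11).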
Now, let's define the following quantity, which describes the cost of reaching the final state value x(N), i.e.

J_{NN}(x(N)) \triangleq h(x(N))    (4.14)

The cost of operation over the final interval is then

J_{N-1,N}(x(N-1), u(N-1)) = J_{NN}(x(N)) + g_D(x(N-1), u(N-1))    (4.15)
Note that equation (4.15) represents a one-stage process, i.e. the cost of operation during the interval (N-1)\Delta t \le t \le N \Delta t. Also note that J_{N-1,N} depends only on x(N-1) and u(N-1), since

J_{N-1,N}(x(N-1), u(N-1)) = g_D(x(N-1), u(N-1)) + J_{NN}(a_D(x(N-1), u(N-1)))    (4.16)

Minimizing over the admissible controls gives the optimal cost for the last stage:

J^*_{N-1,N}(x(N-1)) = \min_{u(N-1)} \{ g_D(x(N-1), u(N-1)) + J_{NN}(a_D(x(N-1), u(N-1))) \}    (4.17)

where the minimizing control, which depends on the value of x(N-1), is denoted u^*(x(N-1), N-1).
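As a numerical illustration of the one-stage minimization (4.17), the sketch below minimizes over a grid of control values. The particular functions (a(x, u) = -x + u, g(x, u) = x^2 + u^2, h(x) = x^2) and the grid bounds are assumptions made for this example only.

```python
def J_star_one_stage(x, dt, u_grid):
    # J*_{N-1,N}(x(N-1)) = min_u { g_D(x, u) + J_NN(a_D(x, u)) }, eq. (4.17),
    # with J_NN(x(N)) = h(x(N)) = x(N)^2 assumed for illustration.
    def cost(u):
        x_next = x + dt * (-x + u)             # a_D, eq. (4.12)
        return dt * (x**2 + u**2) + x_next**2  # g_D + h(x(N))
    return min((cost(u), u) for u in u_grid)   # (optimal cost, minimizer)

u_grid = [i / 100.0 for i in range(-200, 201)]  # u in [-2, 2], step 0.01
J_best, u_best = J_star_one_stage(1.0, 0.1, u_grid)

# For this quadratic stage cost the minimizer can be found by calculus:
# d/du [ dt (x^2 + u^2) + (x (1 - dt) + dt u)^2 ] = 0  gives
# u* = -x (1 - dt) / (1 + dt), i.e. u* = -9/11 when x = 1, dt = 0.1.
u_exact = -1.0 * (1 - 0.1) / (1 + 0.1)
```

The grid search recovers the calculus answer to within the grid spacing, which is all the quantized dynamic-programming procedure ever promises.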
Next, we step back one more stage and consider the cost of operation over the last two intervals:

J_{N-2,N}(x(N-2), u(N-2), u(N-1)) = g_D(x(N-2), u(N-2)) + g_D(x(N-1), u(N-1)) + h(x(N))
                                  = g_D(x(N-2), u(N-2)) + J_{N-1,N}(x(N-1), u(N-1))    (4.18)
Hence, the optimal cost over the last two intervals is given by

J^*_{N-2,N}(x(N-2)) = \min_{u(N-2),\, u(N-1)} \{ g_D(x(N-2), u(N-2)) + J_{N-1,N}(x(N-1), u(N-1)) \}    (4.19)
By the principle of optimality, the inner minimization over u(N-1) yields J^*_{N-1,N}(x(N-1)). Moreover, since x(N-1) is related to x(N-2) and u(N-2) through equation (4.12), J^*_{N-2,N} depends only on x(N-2). Hence,

J^*_{N-2,N}(x(N-2)) = \min_{u(N-2)} \{ g_D(x(N-2), u(N-2)) + J^*_{N-1,N}(a_D(x(N-2), u(N-2))) \}    (4.20)
We can continue backwards in this same manner, and if we do so, we obtain the following result for a k-stage process:

J^*_{N-k,N}(x(N-k)) = \min_{u(N-k)} \{ g_D(x(N-k), u(N-k)) + J^*_{N-k+1,N}(a_D(x(N-k), u(N-k))) \}    (4.21)

Note that equation (4.21) is a recurrence relation: i.e. if we know J^*_{N-k+1,N}, then we can use equation (4.21) to find J^*_{N-k,N}, and so on. Consequently, if we start with J^*_{N,N}, then we can work backwards over the entire N-stage process, obtaining the optimal cost J^*_{0,N} and the minimizing control at every stage along the way.
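The recurrence (4.21) translates directly into a backward sweep over a quantized state grid, which is how a digital computer would carry it out. The sketch below is one such implementation; the dynamics a(x, u) = -x + u, cost g(x, u) = x^2 + u^2, terminal cost h(x) = x^2, and all grid parameters are assumptions chosen for illustration, and linear interpolation is used to evaluate the cost-to-go table at a_D(x, u).

```python
def backward_dp(x_grid, u_grid, N, dt):
    # Backward sweep implementing the recurrence (4.21) on a state grid.
    a_D = lambda x, u: x + dt * (-x + u)     # eq. (4.12), assumed dynamics
    g_D = lambda x, u: dt * (x**2 + u**2)    # eq. (4.13), assumed cost

    def interp(J, x):
        # Linear interpolation of the cost-to-go table on x_grid.
        x = min(max(x, x_grid[0]), x_grid[-1])
        step = x_grid[1] - x_grid[0]
        i = min(int((x - x_grid[0]) / step), len(x_grid) - 2)
        w = (x - x_grid[i]) / step
        return (1 - w) * J[i] + w * J[i + 1]

    J = [x**2 for x in x_grid]       # J*_{NN}(x(N)) = h(x(N)), eq. (4.14)
    policy = []
    for _ in range(N):               # k = 1, ..., N: work backwards
        J_new, u_new = [], []
        for x in x_grid:
            best = min((g_D(x, u) + interp(J, a_D(x, u)), u) for u in u_grid)
            J_new.append(best[0])
            u_new.append(best[1])
        J, policy = J_new, [u_new] + policy
    return J, policy   # J = J*_{0,N} on x_grid; policy[k][i] = u*(x_i, k)

x_grid = [i / 20.0 for i in range(-40, 41)]   # x in [-2, 2]
u_grid = [i / 10.0 for i in range(-20, 21)]   # u in [-2, 2]
J0, policy = backward_dp(x_grid, u_grid, N=20, dt=0.05)
```

Each backward sweep stores the minimizing control, so the result is both the optimal cost J^*_{0,N} and a feedback policy u^*(x, k) defined on the grid, which is exactly the form of solution the recurrence produces.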
We have approximated continuous-time systems with discrete-time representations in our initial treatment
of dynamic programming, and we have seen that this approach led to a recurrence relation suited to
solution by a digital computer. We will now consider an alternative approach, in which dynamic programming is applied directly to the continuous-time system; this results in a non-linear partial differential equation, the so-called HJB equation. Start by considering the following continuous-time state equation:
\dot{x}(t) = a(x(t), u(t), t)    (6.7)

with the performance measure

J = h(x(t_f), t_f) + \int_{t_0}^{t_f} g(x(\tau), u(\tau), \tau) \, d\tau    (6.8)

where h and g are specified functions, t_0 and t_f are fixed, and \tau is a dummy variable of integration.
Consider the cost of operation from an arbitrary time t onward:

J(x(t), t, u(\tau),\ t \le \tau \le t_f) = h(x(t_f), t_f) + \int_t^{t_f} g(x(\tau), u(\tau), \tau) \, d\tau    (6.9)

where t can be any value less than or equal to t_f, and x(t) can be any admissible state value. Now let's try to determine the controls that minimize equation (6.9) for all t \le t_f and all admissible x(t):
J^*(x(t), t) = \min_{u(\tau),\ t \le \tau \le t_f} \left\{ \int_t^{t_f} g(x(\tau), u(\tau), \tau) \, d\tau + h(x(t_f), t_f) \right\}    (6.10)
Subdividing the interval of integration gives

J^*(x(t), t) = \min_{u(\tau),\ t \le \tau \le t_f} \left\{ \int_t^{t+\Delta t} g(x(\tau), u(\tau), \tau) \, d\tau + \int_{t+\Delta t}^{t_f} g(x(\tau), u(\tau), \tau) \, d\tau + h(x(t_f), t_f) \right\}    (6.11)
The principle of optimality requires that the cost incurred from t + \Delta t onward be optimal, so

J^*(x(t), t) = \min_{u(\tau),\ t \le \tau \le t+\Delta t} \left\{ \int_t^{t+\Delta t} g(x(\tau), u(\tau), \tau) \, d\tau + J^*(x(t+\Delta t), t+\Delta t) \right\}    (6.12)
Assuming that J^* has bounded second partial derivatives, we expand J^*(x(t+\Delta t), t+\Delta t) in a Taylor series about the point (x(t), t):

J^*(x(t), t) = \min_{u(\tau),\ t \le \tau \le t+\Delta t} \Big\{ \int_t^{t+\Delta t} g(x(\tau), u(\tau), \tau) \, d\tau + J^*(x(t), t) + \frac{\partial J^*}{\partial t}(x(t), t) \, \Delta t + \Big[ \frac{\partial J^*}{\partial x}(x(t), t) \Big]^T [x(t+\Delta t) - x(t)] + \text{higher-order terms} \Big\}    (6.13)
For small \Delta t, the integral is approximately g(x(t), u(t), t) \, \Delta t and x(t+\Delta t) - x(t) \approx a(x(t), u(t), t) \, \Delta t, so

J^*(x(t), t) = \min_{u(t)} \{ g(x(t), u(t), t) \, \Delta t + J^*(x(t), t) + J^*_t(x(t), t) \, \Delta t + J^{*T}_x(x(t), t) \, [a(x(t), u(t), t)] \, \Delta t + o(\Delta t) \}    (6.14)

where J^*_x \triangleq \partial J^*/\partial x = [\partial J^*/\partial x_1 \;\; \partial J^*/\partial x_2 \;\; \cdots \;\; \partial J^*/\partial x_n]^T denotes the gradient of J^* with respect to x, J^*_t \triangleq \partial J^*/\partial t, and o(\Delta t) denotes terms of higher than first order in \Delta t.
We can remove the terms J^*(x(t), t) and J^*_t(x(t), t) \, \Delta t from the minimization in equation (6.14), since they do not depend on u(t); subtracting J^*(x(t), t) from both sides then yields

0 = J^*_t(x(t), t) \, \Delta t + \min_{u(t)} \{ g(x(t), u(t), t) \, \Delta t + J^{*T}_x(x(t), t) \, [a(x(t), u(t), t)] \, \Delta t + o(\Delta t) \}    (6.15)
Dividing by \Delta t and taking the limit as \Delta t \to 0, we obtain

0 = J^*_t(x(t), t) + \min_{u(t)} \{ g(x(t), u(t), t) + J^{*T}_x(x(t), t) \, [a(x(t), u(t), t)] \}    (6.16)
In order to find the boundary condition for equation (6.16), we set t = t_f in equation (6.10), which yields the following expression:

J^*(x(t_f), t_f) = h(x(t_f), t_f)    (6.17)
We now define the Hamiltonian

H(x(t), u(t), J^*_x, t) \triangleq g(x(t), u(t), t) + J^{*T}_x(x(t), t) \, [a(x(t), u(t), t)]    (6.18)

and the minimizing control u^*(x(t), J^*_x, t), which satisfies

H(x(t), u^*(x(t), J^*_x, t), J^*_x, t) = \min_{u(t)} H(x(t), u(t), J^*_x, t)    (6.19)
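Equation (6.19) is a pointwise minimization of H over u at fixed x, t, and J^*_x. A small sketch, using the assumed (illustrative) choices a(x, u, t) = u and g(x, u, t) = x^2 + u^2, for which completing the square in u gives the closed-form minimizer u^* = -J^*_x / 2:

```python
def H(x, u, Jx):
    # Hamiltonian g + J*_x^T a, eq. (6.18), with the assumed a = u and
    # g = x^2 + u^2 (illustrative choices, not from the lecture).
    return x**2 + u**2 + Jx * u

def u_star_numeric(x, Jx, u_grid):
    # Pointwise minimization of H over u, eq. (6.19).
    return min(u_grid, key=lambda u: H(x, u, Jx))

u_grid = [i / 1000.0 for i in range(-3000, 3001)]   # u in [-3, 3]
u_num = u_star_numeric(x=1.0, Jx=1.6, u_grid=u_grid)
u_analytic = -1.6 / 2.0   # completing the square gives u* = -J*_x / 2
```

For smooth unconstrained problems like this one, (6.19) reduces to setting \partial H / \partial u = 0; the grid search simply confirms the same minimizer.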
Using these definitions, i.e. equations (6.18) and (6.19), we arrive at the HJB equation
0 = J^*_t(x(t), t) + H(x(t), u^*(x(t), J^*_x, t), J^*_x, t)    (6.20)
Note that equation (6.20) is, in general, a non-linear partial differential equation; it is the result of applying dynamic programming to the solution of an optimal control problem involving a continuous-time system.
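One way to see equation (6.20) at work is on a problem whose value function is known in closed form. For the illustrative scalar problem \dot{x} = u, g = x^2 + u^2, h = 0 (an assumed example, not from the lecture), the guess J^*(x, t) = p(t) x^2 reduces the HJB equation to the Riccati equation \dot{p} = p^2 - 1 with p(t_f) = 0 from (6.17), whose solution is p(t) = \tanh(t_f - t). The sketch below integrates the Riccati equation backwards numerically and checks that the HJB residual vanishes:

```python
import math

t_f, M = 2.0, 20000
dt = t_f / M

# Integrate pdot = p^2 - 1 backwards from the boundary condition p(t_f) = 0,
# which follows from (6.17) with h = 0 and the guess J*(x, t) = p(t) x^2.
p = 0.0
for _ in range(M):
    p -= dt * (p**2 - 1.0)   # Euler step backwards in time
p0_numeric = p
p0_exact = math.tanh(t_f)    # closed-form p(0) = tanh(t_f - 0)

# HJB residual 0 = J*_t + min_u { g + J*_x u } at t = 0, x = 1, using the
# exact p(t); the minimizing control is u* = -J*_x / 2 = -p x, as in (6.19).
x, t = 1.0, 0.0
p_t = math.tanh(t_f - t)
J_t = -(1.0 / math.cosh(t_f - t))**2 * x**2   # J*_t = pdot x^2 = -sech^2 x^2
J_x = 2.0 * p_t * x
u_star = -J_x / 2.0
residual = J_t + (x**2 + u_star**2 + J_x * u_star)
```

The residual is zero to roundoff because p(t) = \tanh(t_f - t) satisfies \dot{p} = p^2 - 1 exactly, and the numerically integrated p(0) agrees with \tanh(t_f) to the accuracy of the Euler scheme.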