
ECE 551 LECTURE 3

Preface

This lecture continues our discussion of the method of dynamic programming. We proceed to generalize
the particular solution obtained from the example in the prior lecture to encompass all optimal control
problems involving an nth order differential equation with both state and control input constraints. This
treatment gives rise to a general mathematical statement of the dynamic programming algorithm when
applied to optimal control problems involving time-invariant dynamical systems.

We next consider whether dynamic programming can be used to solve problems involving continuous-
time systems without converting those systems to a discrete-time representation. This leads to the
classic Hamilton-Jacobi-Bellman (HJB) partial differential equation.

Optimal Control Application

We will begin with a statement of the problem as follows:

Consider an nth order time-invariant system described by the following state equation:

$\dot{x}(t) = a\big(x(t),\, u(t)\big)$    (4.10)

We seek the control law that minimizes the performance measure

$J = h\big(x(t_f)\big) + \int_{0}^{t_f} g\big(x(t),\, u(t)\big)\, dt$    (4.11)

where the final time $t_f$ is assumed to be fixed, and the admissible controls are constrained to lie within a set $U$, i.e. $u \in U$.

We begin our solution by converting equation (4.10) to a difference equation. We assume there are $N$ equally spaced time increments in $[0, t_f]$, and, as before, we approximate the time derivative as follows:

$\dot{x}(t) \approx \dfrac{x(t+\Delta t) - x(t)}{\Delta t}$

And so, we have that

$x(k+1) = x(k) + \Delta t\, a\big(x(k),\, u(k)\big)$

Which we denote as follows:

$x(k+1) = a_D\big(x(k),\, u(k)\big)$    (4.12)


Proceeding in a similar manner with the performance measure, we obtain

$J = h\big(x(N)\big) + \Delta t \sum_{k=0}^{N-1} g\big(x(k),\, u(k)\big)$

Which we denote as follows:

$J = h\big(x(N)\big) + \sum_{k=0}^{N-1} g_D\big(x(k),\, u(k)\big)$    (4.13)
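
As a concrete illustration (using an assumed scalar plant, not data from the lecture): for $\dot{x}(t) = -x(t) + u(t)$ and $g\big(x(t), u(t)\big) = x^{2}(t) + u^{2}(t)$, the Euler discretization above gives

$a_D\big(x(k), u(k)\big) = (1-\Delta t)\, x(k) + \Delta t\, u(k), \qquad g_D\big(x(k), u(k)\big) = \Delta t\, \big[x^{2}(k) + u^{2}(k)\big]$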

Now, let's define the following quantity, which describes the cost of reaching the final state value $x(N)$, i.e.

$J_{N,N}\big(x(N)\big) = h\big(x(N)\big)$    (4.14)

Next, we define the following

$J_{N-1,N}\big(x(N-1),\, u(N-1)\big) = J_{N,N}\big(x(N)\big) + g_D\big(x(N-1),\, u(N-1)\big)$    (4.15)

Note that equation (4.15) represents a one-stage process, i.e. the cost of operation during the interval $(N-1)\Delta t \le t \le N\Delta t$. Also note that $J_{N-1,N}$ depends only on $x(N-1)$ and $u(N-1)$, since $x(N)$ is related to both $x(N-1)$ and $u(N-1)$ through equation (4.12). Thus, we may write

$J_{N-1,N}\big(x(N-1),\, u(N-1)\big) = g_D\big(x(N-1),\, u(N-1)\big) + J_{N,N}\Big(a_D\big(x(N-1),\, u(N-1)\big)\Big)$    (4.16)

And so, the optimal cost is given by

$J^{*}_{N-1,N}\big(x(N-1)\big) = \min_{u(N-1)} \Big\{ g_D\big(x(N-1),\, u(N-1)\big) + J_{N,N}\Big(a_D\big(x(N-1),\, u(N-1)\big)\Big) \Big\}$    (4.17)

We know that the optimal choice $u^{*}(N-1)$ will depend on $x(N-1)$; thus we may denote it by $u^{*}\big(x(N-1),\, N-1\big)$.
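
As a brief illustration (continuing the assumed scalar example from above, with terminal cost $h\big(x(N)\big) = s\, x^{2}(N)$): the bracketed quantity in (4.17) becomes $\Delta t\,\big[x^{2}(N-1) + u^{2}(N-1)\big] + s\,\big[(1-\Delta t)\, x(N-1) + \Delta t\, u(N-1)\big]^{2}$, a quadratic in $u(N-1)$; setting its derivative with respect to $u(N-1)$ to zero gives

$u^{*}\big(x(N-1),\, N-1\big) = -\dfrac{s\,(1-\Delta t)}{1 + s\,\Delta t}\; x(N-1),$

which is indeed a function of $x(N-1)$ alone.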

Now, let's consider the last two intervals, i.e.


$J_{N-2,N}\big(x(N-2),\, u(N-2),\, u(N-1)\big) = g_D\big(x(N-2),\, u(N-2)\big) + g_D\big(x(N-1),\, u(N-1)\big) + h\big(x(N)\big)$
$\qquad = g_D\big(x(N-2),\, u(N-2)\big) + J_{N-1,N}\big(x(N-1),\, u(N-1)\big)$    (4.18)

Hence, the optimal policy for the last two intervals is given by

$J^{*}_{N-2,N}\big(x(N-2)\big) = \min_{u(N-2),\, u(N-1)} \Big\{ g_D\big(x(N-2),\, u(N-2)\big) + J_{N-1,N}\big(x(N-1),\, u(N-1)\big) \Big\}$    (4.19)

By the principle of optimality, the inner minimization over $u(N-1)$ yields $J^{*}_{N-1,N}\big(x(N-1)\big)$; and since $x(N-1)$ is related to $x(N-2)$ and $u(N-2)$ through equation (4.12), $J^{*}_{N-2,N}$ depends only on $x(N-2)$. Hence,

$J^{*}_{N-2,N}\big(x(N-2)\big) = \min_{u(N-2)} \Big\{ g_D\big(x(N-2),\, u(N-2)\big) + J^{*}_{N-1,N}\Big(a_D\big(x(N-2),\, u(N-2)\big)\Big) \Big\}$    (4.20)

We can continue backwards in this same manner, and if we do so, we obtain the following result for a $k$-stage process:

$J^{*}_{N-k,N}\big(x(N-k)\big) = \min_{u(N-k)} \Big\{ g_D\big(x(N-k),\, u(N-k)\big) + J^{*}_{N-k+1,N}\Big(a_D\big(x(N-k),\, u(N-k)\big)\Big) \Big\}$    (4.21)


Note that equation (4.21) is a recurrence relation: if we know $J^{*}_{N-k+1,N}$, then we can use equation (4.21) to find $J^{*}_{N-k,N}$, and so on. Consequently, if we start with $J_{N,N}$, then we can work backwards over all $N$ stages. As long as $k \le N$, the cost of the $k$-stage process is imbedded in that of the $N$-stage process; this is the so-called imbedding principle.
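
To make the recurrence concrete, the following is a minimal numerical sketch of equation (4.21) in Python for an assumed scalar example; the dynamics $a(x,u)$, running cost $g(x,u)$, terminal cost $h(x)$, grids, and step size are all illustrative choices, not taken from the lecture. The state and control are restricted to finite grids, and the cost-to-go at the successor state is obtained by linear interpolation.

import numpy as np

dt, N = 0.1, 20                          # time step and number of stages
x_grid = np.linspace(-2.0, 2.0, 81)      # admissible state values
u_grid = np.linspace(-1.0, 1.0, 41)      # admissible controls (the set U)

a = lambda x, u: -x + u                  # assumed plant: x_dot = a(x, u)
g = lambda x, u: x**2 + u**2             # assumed running cost
h = lambda x: 5.0 * x**2                 # assumed terminal cost

a_D = lambda x, u: x + dt * a(x, u)      # discretized dynamics, equation (4.12)
g_D = lambda x, u: dt * g(x, u)          # discretized running cost, equation (4.13)

J = h(x_grid)                            # J*_{N,N}(x(N)) = h(x(N)), equation (4.14)
policy = np.zeros((N, x_grid.size))      # u*(x, k) for each stage k = 0, ..., N-1

for k in range(1, N + 1):                # work backward over the N stages
    X, U = np.meshgrid(x_grid, u_grid, indexing="ij")
    # Candidate costs g_D(x, u) + J*_{N-k+1,N}(a_D(x, u)) from equation (4.21);
    # np.interp evaluates the previously computed cost at the successor state.
    candidates = g_D(X, U) + np.interp(a_D(X, U), x_grid, J)
    best = candidates.argmin(axis=1)     # minimizing control index at each grid state
    policy[N - k] = u_grid[best]
    J = candidates[np.arange(x_grid.size), best]   # this is now J*_{N-k,N}

print("J*_{0,N}(x = 1.0) ~", np.interp(1.0, x_grid, J))
print("u*(x = 1.0, k = 0) ~", np.interp(1.0, x_grid, policy[0]))

Working backwards in this way produces both the optimal cost $J^{*}_{0,N}$ and a tabulated optimal control $u^{*}(x, k)$ for every grid state and stage, which reflects the closed-loop character of the dynamic programming solution.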

Hamilton-Jacobi-Bellman (HJB) Equation

We have approximated continuous-time systems with discrete-time representations in our initial treatment of dynamic programming, and we have seen that this approach leads to a recurrence relation well suited to solution by a digital computer. We will now consider an alternative approach, in which dynamic programming is applied to the continuous-time system directly; this results in a non-linear partial differential equation, the so-called HJB equation. We start by considering the following continuous-time state equation:


$\dot{x}(t) = a\big(x(t),\, u(t),\, t\big)$    (6.7)

And, the performance measure

$J = h\big(x(t_f),\, t_f\big) + \int_{t_0}^{t_f} g\big(x(\tau),\, u(\tau),\, \tau\big)\, d\tau$    (6.8)

where $h$ and $g$ are specified functions, $t_0$ and $t_f$ are fixed, and $\tau$ is a dummy variable of integration.

Let us now apply the imbedding principle, i.e.

$J\big(x(t),\, u(\tau),\, t\big) = h\big(x(t_f),\, t_f\big) + \int_{t}^{t_f} g\big(x(\tau),\, u(\tau),\, \tau\big)\, d\tau, \qquad t \le \tau \le t_f$    (6.9)

where $t$ can be any value less than or equal to $t_f$, and $x(t)$ can be any admissible state value. Now let's try to determine the controls that minimize equation (6.9) for all $t \le t_f$ and all admissible $x(t)$.

The minimum cost is given by

$J^{*}\big(x(t),\, t\big) = \min_{\substack{u(\tau) \\ t \le \tau \le t_f}} \left\{ \int_{t}^{t_f} g\big(x(\tau),\, u(\tau),\, \tau\big)\, d\tau + h\big(x(t_f),\, t_f\big) \right\}$    (6.10)


Now, let's subdivide the time interval as follows:

$J^{*}\big(x(t),\, t\big) = \min_{\substack{u(\tau) \\ t \le \tau \le t_f}} \left\{ \int_{t}^{t+\Delta t} g\big(x(\tau),\, u(\tau),\, \tau\big)\, d\tau + \int_{t+\Delta t}^{t_f} g\big(x(\tau),\, u(\tau),\, \tau\big)\, d\tau + h\big(x(t_f),\, t_f\big) \right\}$    (6.11)

Recall that the principle of optimality requires that

$J^{*}\big(x(t),\, t\big) = \min_{\substack{u(\tau) \\ t \le \tau \le t+\Delta t}} \left\{ \int_{t}^{t+\Delta t} g\big(x(\tau),\, u(\tau),\, \tau\big)\, d\tau + J^{*}\big(x(t+\Delta t),\, t+\Delta t\big) \right\}$    (6.12)

where $J^{*}\big(x(t+\Delta t),\, t+\Delta t\big)$ is the minimum cost for the time interval $t+\Delta t \le \tau \le t_f$ with the initial state equal to $x(t+\Delta t)$. If we assume that the second partial derivatives of $J^{*}$ exist and are bounded, then we can expand $J^{*}\big(x(t+\Delta t),\, t+\Delta t\big)$ in a Taylor series about the point $\big(x(t),\, t\big)$ as follows:

$J^{*}\big(x(t),\, t\big) = \min_{\substack{u(\tau) \\ t \le \tau \le t+\Delta t}} \Big\{ \int_{t}^{t+\Delta t} g\big(x(\tau),\, u(\tau),\, \tau\big)\, d\tau + J^{*}\big(x(t),\, t\big) + \dfrac{\partial J^{*}}{\partial t}\big(x(t),\, t\big)\, \Delta t + \Big[ \dfrac{\partial J^{*}}{\partial x}\big(x(t),\, t\big) \Big]^{T} \big[ x(t+\Delta t) - x(t) \big] + \text{higher-order terms} \Big\}$    (6.13)

For small $\Delta t$, we have


$J^{*}\big(x(t),\, t\big) = \min_{u(t)} \Big\{ g\big(x(t),\, u(t),\, t\big)\, \Delta t + J^{*}\big(x(t),\, t\big) + J^{*}_{t}\big(x(t),\, t\big)\, \Delta t + J^{*T}_{x}\big(x(t),\, t\big)\, \big[ a\big(x(t),\, u(t),\, t\big) \big]\, \Delta t + o(\Delta t) \Big\}$    (6.14)

where we note the following:

1. $o(\Delta t)$ denotes terms containing $(\Delta t)^{2}$ and higher powers of $\Delta t$.

2. $J^{*}_{t} \equiv \dfrac{\partial J^{*}}{\partial t}$

3. $J^{*}_{x} \equiv \dfrac{\partial J^{*}}{\partial x} = \left[ \dfrac{\partial J^{*}}{\partial x_1} \;\; \dfrac{\partial J^{*}}{\partial x_2} \;\; \cdots \;\; \dfrac{\partial J^{*}}{\partial x_n} \right]^{T}$


We can remove the terms $J^{*}\big(x(t),\, t\big)$ and $J^{*}_{t}\big(x(t),\, t\big)\, \Delta t$ in equation (6.14) from the minimization, since they do not depend on $u(t)$; cancelling $J^{*}\big(x(t),\, t\big)$ from both sides then gives

$0 = J^{*}_{t}\big(x(t),\, t\big)\, \Delta t + \min_{u(t)} \Big\{ g\big(x(t),\, u(t),\, t\big)\, \Delta t + J^{*T}_{x}\big(x(t),\, t\big)\, \big[ a\big(x(t),\, u(t),\, t\big) \big]\, \Delta t + o(\Delta t) \Big\}$    (6.15)

Now, let's divide throughout by $\Delta t$ and take the limit as $\Delta t \to 0$, i.e.


$0 = J^{*}_{t}\big(x(t),\, t\big) + \min_{u(t)} \Big\{ g\big(x(t),\, u(t),\, t\big) + J^{*T}_{x}\big(x(t),\, t\big)\, \big[ a\big(x(t),\, u(t),\, t\big) \big] \Big\}$    (6.16)

In order to find the boundary condition for equation (6.16), we set $t = t_f$ in equation (6.10), which yields the following expression:

$J^{*}\big(x(t_f),\, t_f\big) = h\big(x(t_f),\, t_f\big)$    (6.17)

Let's define a new quantity, the Hamiltonian, as follows:

$H\big(x(t),\, u(t),\, J^{*}_{x},\, t\big) = g\big(x(t),\, u(t),\, t\big) + J^{*T}_{x}\big(x(t),\, t\big)\, \big[ a\big(x(t),\, u(t),\, t\big) \big]$    (6.18)

And so, we have that

$H\big(x(t),\, u^{*}\big(x(t),\, J^{*}_{x},\, t\big),\, J^{*}_{x},\, t\big) = \min_{u(t)} H\big(x(t),\, u(t),\, J^{*}_{x},\, t\big)$    (6.19)
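
As a brief illustration of this minimization (using an assumed scalar linear-quadratic form, not data from the lecture): if $g = \tfrac{1}{2}\big(q\, x^{2}(t) + r\, u^{2}(t)\big)$ and $a = \alpha\, x(t) + \beta\, u(t)$, then $\partial H / \partial u = r\, u(t) + \beta\, J^{*}_{x} = 0$ gives the minimizing control $u^{*}\big(x(t),\, J^{*}_{x},\, t\big) = -\dfrac{\beta}{r}\, J^{*}_{x}$.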

Using these definitions, i.e. equations (6.18) and (6.19), we arrive at the HJB equation

$0 = J^{*}_{t}\big(x(t),\, t\big) + H\big(x(t),\, u^{*}\big(x(t),\, J^{*}_{x},\, t\big),\, J^{*}_{x},\, t\big)$    (6.20)


Note that equation (6.20) is, in general, a non-linear partial differential equation; it is the result of applying dynamic programming to the solution of an optimal control problem involving a continuous-time system.
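
To see the HJB equation in action, here is a minimal numerical sanity check in Python for an assumed scalar linear-quadratic problem; every quantity below (the names alpha, beta, q, r, s, the horizon, and the grids) is an illustrative assumption, not data from the lecture. For such a problem the optimal cost has the form $J^{*}(x, t) = \tfrac{1}{2} P(t)\, x^{2}$, where $P(t)$ satisfies a scalar Riccati differential equation integrated backward from $P(t_f) = s$; the script evaluates the right-hand side of equation (6.20) by finite differences and a grid search over $u$, and checks that it is approximately zero.

import numpy as np

# Assumed problem data: dynamics x_dot = alpha*x + beta*u and cost
# J = 0.5*s*x(tf)^2 + integral over [t0, tf] of 0.5*(q*x^2 + r*u^2) dt.
alpha, beta = -1.0, 1.0
q, r, s = 1.0, 0.5, 2.0
t0, tf, dt = 0.0, 2.0, 1.0e-4

# For this problem J*(x, t) = 0.5*P(t)*x^2, where P(t) solves the Riccati ODE
#   dP/dt = -2*alpha*P + (beta**2/r)*P**2 - q,   P(tf) = s,
# integrated here backward in time with simple Euler steps.
times = np.arange(tf, t0 - dt, -dt)
P = np.empty_like(times)
P[0] = s
for i in range(times.size - 1):
    dP = -2.0 * alpha * P[i] + (beta**2 / r) * P[i]**2 - q
    P[i + 1] = P[i] - dt * dP            # stepping backward in time

def J_star(x, t):
    """Candidate optimal cost J*(x, t) = 0.5 * P(t) * x^2."""
    Pt = np.interp(t, times[::-1], P[::-1])
    return 0.5 * Pt * x**2

# Check 0 = J*_t + min_u { g(x, u, t) + J*_x * a(x, u, t) }, equation (6.20),
# at a few sample points, using central differences for J*_t and J*_x and a
# coarse grid for the minimization over u.
u_grid = np.linspace(-10.0, 10.0, 2001)
eps = 1.0e-5
for x, t in [(0.5, 0.3), (-1.0, 1.0), (2.0, 1.7)]:
    Jt = (J_star(x, t + eps) - J_star(x, t - eps)) / (2 * eps)
    Jx = (J_star(x + eps, t) - J_star(x - eps, t)) / (2 * eps)
    hamiltonian = 0.5 * (q * x**2 + r * u_grid**2) + Jx * (alpha * x + beta * u_grid)
    print(f"x = {x:5.2f}, t = {t:4.2f}, HJB residual = {Jt + hamiltonian.min():.2e}")

The residuals are small but not exactly zero because of the Euler integration of $P(t)$, the finite-difference derivatives, and the finite control grid; refining any of these reduces them further.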
