
Automatica, Vol. 25, No. 5, pp. 757-763, 1989. © 1989 International Federation of Automatic Control.

Brief Paper

Optimal Policies for Passive Learning Controllers*

FRANCISCO CASIELLO† and KENNETH A. LOPARO†‡

Key Words—Dual control; dynamic programming; stochastic control; discrete time systems; adaptive control.

Abstract—This paper deals with the optimal control of unknown parameter, partially observed linear discrete time systems. The objective of the paper is to find a suitable cost functional for which passive learning controllers are optimal. It is shown that if there are parameter uncertainties in the observation equation, then a quadratic functional can be found such that a passively adaptive closed loop control law is optimal with respect to this functional. If there are also parameter uncertainties in the system equation, then a quadratic functional can be found such that a passively adaptive open loop feedback control law is optimal with respect to this functional.

*Received 20 June 1988; revised 31 January 1989; received in final form 9 March 1989. The original version of this paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor P. G. Ferreira under the direction of Editor H. Kwakernaak.
†Department of Systems Engineering, Case Western Reserve University, Cleveland, OH 44106, U.S.A.
‡Author to whom correspondence should be addressed.

1. Introduction
THE PROBLEM of optimal control of partially observed, linear discrete time stochastic systems with uncertain parameters taking values in a finite set is studied. In the 1960s and 1970s the efforts directed toward the solution of this problem were concentrated in two areas: finding quadratic approximations to the cost-to-go of the dynamic programming equation to determine suboptimal dual control policies, and developing naive suboptimal controllers, known as passively learning policies because the controller does not use the control to actively influence the uncertainty about the system parameters.

In this paper we formulate and answer the following question: given a passively learning policy, does there exist a cost functional for which this control strategy is optimal? Passive learning controllers of the type described by Bar-Shalom and Tse (1974) and Casiello and Loparo (1985) are studied. Furthermore, it is shown that there are fundamental differences between the following two cases.

(i) The realization of the system is (A, B, C(r)) (model uncertainty only in the observation equation).
(ii) The realization of the system is (A(r), B(r), C(r)) (model uncertainty in the plant and observation equations).

In fact it is shown that for (i) above the passive learning controller (Bar-Shalom and Tse, 1974) is an optimal control law for a suitable quadratic cost functional. This cost is shown to be a convex functional of the control sequence {u(k)} and is composed of two terms, one of them involving a weighted version of the trace of the state covariance matrix. Based on the same idea, an open loop feedback controller is designed for case (ii) to find a quadratic cost functional with respect to which the policy developed by Casiello and Loparo (1985) is optimal.

2. Problem formulation
Consider a linear, discrete time stochastic system of the form

x(k+1) = A(r)x(k) + B(r)u(k) + v(k),  k = 0, 1, ..., N-1   (1a)

y(k) = C(r)x(k) + w(k)   (1b)

where x(k) ∈ R^n, u(k) ∈ R^m, y(k) ∈ R^p, and A(r), B(r) and C(r) are matrices of appropriate dimensions. Also x(0), {v(k)} and {w(k)} are mutually independent white Gaussian random sequences with

x(0) ~ N(x_0, Σ_0),  v(k) ~ N(0, Q^v),  w(k) ~ N(0, Q^w)

and r is an unknown parameter taking values in a finite set F = {0, 1, ..., f}.

For a quadratic cost functional of the form

J = E{ x(N)^T S_N x(N) + Σ_{k=0}^{N-1} [x(k)^T Q_k x(k) + u(k)^T R_k u(k)] }   (2)

it is a well-known fact that the minimizing control sequence satisfies an active learning property, in the sense that the control is actively employed to reduce the parameter uncertainty (Casiello and Loparo, 1985; Bar-Shalom and Tse, 1974). Also, no closed form solution for the optimal control sequence has been obtained, since the cost-to-go function exhibits local maxima and minima (Griffiths and Loparo, 1985). Suboptimal "passive" learning policies have been employed (see Casiello and Loparo (1989b) for an account). It is the objective of this paper to illustrate that the passive learning policy of Deshpande et al. (1973) is optimal with respect to a suitable quadratic cost; this is related to some of the earlier work of Hijab (1986), Fragoso (1988) and some of our earlier work reported in Casiello and Loparo (1989a) for continuous time problems.

In Hijab (1986), a modified cost functional involving an entropy term was proposed such that when certain algebraic conditions are satisfied the optimal closed loop policy can be obtained in closed form, for continuous time stochastic problems. In Casiello and Loparo (1989a) we proposed a modified cost functional, generalizing the results of Hijab (1986), and obtained the solution of an optimal control problem in closed form, for continuous time stochastic problems.

In Fragoso (1988), the continuous time problem for systems with Markovian jump parameters is studied in a somewhat different form, and results similar to our work in Casiello and Loparo (1989a) are obtained, i.e. a suitable cost functional can be constructed so that a "DUL like" policy is optimal. The original and the modified costs are related by a dual cost that enters additively in the cost functional. In Casiello and Loparo (1985) an expansion of the dual cost which leads to a closed form solution of the optimal control problem for discrete time systems was developed.
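To make the setup concrete, the following Python sketch simulates one trajectory of model (1). All matrices here are hypothetical placeholders (they do not come from the paper); the point is only the structure: r is drawn once from F and held fixed over the horizon, and the controller observes y(k), never r.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-model family (f = 1): r indexes (A(r), B(r), C(r)).
A = [np.array([[1.0, 0.1], [0.0, 0.9]]), np.array([[1.0, 0.1], [0.0, 0.7]])]
B = [np.array([[0.0], [0.1]]), np.array([[0.0], [0.2]])]
C = [np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]])]
Qv = 0.01 * np.eye(2)          # process noise covariance Q^v
Qw = 0.04 * np.eye(1)          # measurement noise covariance Q^w
x0, Sigma0 = np.zeros(2), np.eye(2)

N = 50
r = rng.integers(len(A))       # unknown parameter, fixed over the horizon
x = rng.multivariate_normal(x0, Sigma0)
ys = []
for k in range(N):
    u = np.zeros(1)            # placeholder input; a controller goes here
    y = C[r] @ x + rng.multivariate_normal(np.zeros(1), Qw)            # (1b)
    ys.append(y)
    x = A[r] @ x + B[r] @ u + rng.multivariate_normal(np.zeros(2), Qv)  # (1a)
```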
3. Estimation and filtering problem
The optimal estimator and filter for this problem are developed below for A = A(r), B = B(r), C = C(r), i.e. all matrices depend on the unknown parameter r. For a detailed derivation see Griffiths and Loparo (1985).

Let I^k = {y(0), ..., y(k), u(0), ..., u(k-1)} and let r take values in F = {0, 1, ..., f}. At time k = 0 we define the a priori probabilities

q_i(0) = P(r = i | I^0),  i ∈ F.

By application of the theorem of total probability

P(x(k) | I^k) = Σ_{i∈F} P(x(k) | I^k, r = i) P(r = i | I^k)

the conditional density of x at time k assuming model i is given by

p(x(k) | I^k, r = i) = G(x(k); x̂_i(k), P_i(k))

where G(·;·,·) denotes a Gaussian density function and x̂_i(k) and P_i(k) are the conditional mean and covariance matrix assuming model i, given by

x̂_i(k) = m_i(k) + K_i(k)[y(k) - C(i)m_i(k)]   (3)

P_i(k) = (I - K_i(k)C(i))M_i(k)   (4)

with the Kalman gain

K_i(k) = M_i(k)C(i)^T [C(i)M_i(k)C(i)^T + Q^w]^{-1}.   (5)

Here m_i(k) and M_i(k) are the one step ahead mean and covariance matrix assuming model i,

p(x(k) | I^{k-1}, r = i) = G(x(k); m_i(k), M_i(k))

with m_i(k) and M_i(k) computed recursively by

m_i(k) = A(i)x̂_i(k-1) + B(i)u(k-1)   (6)

M_i(k) = A(i)P_i(k-1)A(i)^T + Q^v.   (7)

Define the a posteriori probabilities q_i(k) = P(r = i | I^k); they satisfy

q_i(k) = p(y(k) | r = i, I^{k-1}) P(r = i | I^{k-1}) / p(y(k) | I^{k-1})
       = p(y(k) | r = i, I^{k-1}) q_i(k-1) / p(y(k) | I^{k-1})

where

p(y(k) | r = i, I^{k-1}) = G(y(k); C(i)m_i(k), C(i)M_i(k)C(i)^T + Q^w).

The conditional mean is

x̂(k) = E{x(k) | I^k} = Σ_{i∈F} q_i(k)x̂_i(k)

and the conditional covariance matrix is

Cov(x(k) | I^k) = Σ_{i∈F} q_i(k)P_i(k) + Σ_{(i,j)∈S_F} q_i(k)q_j(k)(x̂_i(k) - x̂_j(k))(x̂_i(k) - x̂_j(k))^T   (8)

where S_F is the set of all pairwise distinct combinations of elements of F. Refer to Appendix A for details pertaining to the derivation of the conditional covariance equation.

Note. Assume that no new information will be collected from time l to time N. This will be the case when we develop the optimal open loop feedback controller. At time l the information available to the controller is contained in I^l. Let

x̂_i^l(k) = E{x(k) | I^l, r = i},  l ≤ k ≤ N

P_i^l(k) = E{(x(k) - x̂_i^l(k))(x(k) - x̂_i^l(k))^T | I^l, r = i},  l ≤ k ≤ N;

then for l ≤ k ≤ N, (6) and (7) can be rewritten as

x̂_i^l(k) = A(i)x̂_i^l(k-1) + B(i)u(k-1)   (6')

P_i^l(k) = A(i)P_i^l(k-1)A(i)^T + Q^v   (7')

with x̂_i^l(l) = x̂_i(l) and P_i^l(l) = P_i(l). The conditional mean is

E{x(k) | I^l} = Σ_{i∈F} q_i(l)x̂_i^l(k),  l ≤ k ≤ N

and the conditional covariance matrix is

Cov(x(k) | I^l) = Σ_{i∈F} q_i(l)P_i^l(k) + Σ_{(i,j)∈S_F} q_i(l)q_j(l)(x̂_i^l(k) - x̂_j^l(k))(x̂_i^l(k) - x̂_j^l(k))^T.   (8')
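A minimal sketch of the estimator of this section, under the same hypothetical model family as the earlier listing: one Kalman filter per candidate model implements equations (3)-(7), and the innovation likelihood drives the Bayes update of q_i(k). The helper names are ours, not the paper's.

```python
import numpy as np

def gauss_pdf(y, mean, cov):
    # Gaussian density G(y; mean, cov), used as the innovation likelihood.
    d = y - mean
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d))
                 / np.sqrt(np.linalg.det(2.0 * np.pi * cov)))

def mm_filter_step(xh, P, q, y, u, A, B, C, Qv, Qw):
    """One step of the multiple-model estimator: per-model Kalman update,
    equations (3)-(7), followed by the Bayes update of q_i(k)."""
    f = len(A)
    xh_new, P_new, lik = [], [], np.zeros(f)
    for i in range(f):
        m = A[i] @ xh[i] + B[i] @ u                    # (6)
        M = A[i] @ P[i] @ A[i].T + Qv                  # (7)
        innov_cov = C[i] @ M @ C[i].T + Qw
        lik[i] = gauss_pdf(y, C[i] @ m, innov_cov)     # p(y(k) | r=i, I^{k-1})
        K = M @ C[i].T @ np.linalg.inv(innov_cov)      # (5)
        xh_new.append(m + K @ (y - C[i] @ m))          # (3)
        P_new.append((np.eye(len(m)) - K @ C[i]) @ M)  # (4)
    q_new = q * lik
    q_new = q_new / q_new.sum()      # division by p(y(k) | I^{k-1})
    xbar = sum(qi * xi for qi, xi in zip(q_new, xh_new))  # conditional mean
    return xh_new, P_new, q_new, xbar
```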
The next theorem relates the minimization of an appropriate quadratic cost functional to the "DUL" passive controller defined by Deshpande et al. (1973). It is shown that this control law is the optimal closed loop strategy for the problem posed.

4. Results for model uncertainty only in the observation equation
Theorem 1. For the system

x(k+1) = Ax(k) + Bu(k) + v(k),  k = 0, ..., N-1

y(k) = C(r)x(k) + w(k),  k = 1, ..., N

where the vectors and matrices were defined before, the passive closed loop control

u(k) = -Γ(k)x̂(k),  k = 0, ..., N-1   (9a)

with

Γ(k) = (B^T S(k+1)B + R_k)^{-1} B^T S(k+1)A   (9b)

is optimal with respect to the following quadratic cost criterion:

J = E{ x(N)^T S_N x(N) + Σ_{k=0}^{N-1} [x(k)^T Q_k x(k) + u(k)^T R_k u(k)] - Σ_{k=0}^{N-1} (x(k) - x̂(k))^T P_k (x(k) - x̂(k)) }   (10)

where S_N and {Q_k}, k = 0, ..., N-1, are symmetric positive semidefinite (≥0) matrices, and {R_k}, k = 0, ..., N-1, is a sequence of symmetric positive definite (>0) matrices. Here {P_k} is a sequence of matrices obtained from

P_k = (B^T S(k+1)A)^T D(k)(B^T S(k+1)A),  k = 0, ..., N-1   (11a)

where S(k) satisfies

S(k) = A^T S(k+1)A + Q_k - (B^T S(k+1)A)^T D(k)(B^T S(k+1)A),  k = N-1, ..., 0   (11b)

with

D(k) = (B^T S(k+1)B + R_k)^{-1},  k = N-1, ..., 0   (11c)

and S(k) satisfies the boundary condition S(N) = S_N. Furthermore, the optimal cost is

J° = min_{{u(k)}_{k=0}^{N-1}} J = x_0^T S(0) x_0 + Σ_{i∈F} q_i(0) tr( Σ_{n=1}^{N} S(n)Q^v + S(0)P_i(0) ).   (12)

Proof. The solution is obtained using dynamic programming. Let

V(k+1, I^k) = min_{u(k)} E{ x(k)^T Q_k x(k) + u(k)^T R_k u(k) - (x(k) - x̂(k))^T P_k (x(k) - x̂(k)) + V(k+2, I^{k+1}) | I^k }

where I^k = {y(0), y(1), ..., y(k), u(0), ..., u(k-1)} is the information state of the system and V(·,·) satisfies the boundary conditions

V(N, I^{N-1}) = min_{u(N-1)} E{ x(N)^T S_N x(N) + x(N-1)^T Q_{N-1} x(N-1) + u(N-1)^T R_{N-1} u(N-1) - (x(N-1) - x̂(N-1))^T P_{N-1}(x(N-1) - x̂(N-1)) | I^{N-1} }

and V(1, I^0) = J°.

At Stage 1 we have

V(N, I^{N-1}) = min_{u(N-1)} { Σ_{i∈F} q_i(N-1) m_i(N)^T S_N m_i(N) + Σ_{i∈F} q_i(N-1) tr S_N M_i(N) + Σ_{i∈F} q_i(N-1) x̂_i(N-1)^T Q_{N-1} x̂_i(N-1) + Σ_{i∈F} q_i(N-1) tr Q_{N-1} P_i(N-1) + u(N-1)^T R_{N-1} u(N-1) - Σ_{i∈F} q_i(N-1) x̂_i(N-1)^T P_{N-1} x̂_i(N-1) - Σ_{i∈F} q_i(N-1) tr P_{N-1} P_i(N-1) }.

Then

u*(N-1) = -(B^T S_N B + R_{N-1})^{-1} B^T S_N A x̂(N-1) = -Γ(N-1)x̂(N-1).

Substituting u*(N-1) into V(N, I^{N-1}) and using the algebraic fact given in Appendix A yields

V(N, I^{N-1}) = Σ_{i∈F} q_i(N-1) x̂_i(N-1)^T S(N-1) x̂_i(N-1)
+ Σ_{(i,j)∈S_F} q_i(N-1)q_j(N-1)(x̂_i(N-1) - x̂_j(N-1))^T (B^T S_N A)^T D_{N-1}(B^T S_N A)(x̂_i(N-1) - x̂_j(N-1))
+ Σ_{i∈F} q_i(N-1)(tr Q_{N-1}P_i(N-1) + tr S_N M_i(N))
- Σ_{(i,j)∈S_F} q_i(N-1)q_j(N-1)(x̂_i(N-1) - x̂_j(N-1))^T P_{N-1}(x̂_i(N-1) - x̂_j(N-1))
- Σ_{i∈F} q_i(N-1) tr P_{N-1}P_i(N-1).

Now select P_{N-1} = (B^T S_N A)^T D_{N-1}(B^T S_N A), so that the two sums over S_F in the previous expression cancel; then, using (7) and (11b),

M_i(N) = A P_i(N-1) A^T + Q^v

and

A^T S_N A = S(N-1) - Q_{N-1} + P_{N-1},

so that

tr S_N M_i(N) = tr S_N (A P_i(N-1) A^T) + tr S_N Q^v = tr A^T S_N A P_i(N-1) + tr S_N Q^v
= tr S(N-1)P_i(N-1) - tr Q_{N-1}P_i(N-1) + tr P_{N-1}P_i(N-1) + tr S_N Q^v

and this implies that

Σ_{i∈F} q_i(N-1) tr( Q_{N-1}P_i(N-1) + S_N M_i(N) - P_{N-1}P_i(N-1) )
= Σ_{i∈F} q_i(N-1) tr( Q_{N-1}P_i(N-1) + S(N-1)P_i(N-1) - Q_{N-1}P_i(N-1) + P_{N-1}P_i(N-1) + S_N Q^v - P_{N-1}P_i(N-1) )
= Σ_{i∈F} q_i(N-1) tr( S(N-1)P_i(N-1) + S_N Q^v )

which yields

V(N, I^{N-1}) = Σ_{i∈F} q_i(N-1) x̂_i(N-1)^T S(N-1) x̂_i(N-1) + Σ_{i∈F} q_i(N-1) tr{ S(N-1)P_i(N-1) + S_N Q^v }

where S(N-1) is defined as previously.

The remainder of the proof follows by mathematical induction. Assume that at Stage N-k-1

V(k+1, I^k) = Σ_{i∈F} q_i(k) x̂_i(k)^T S(k) x̂_i(k) + Σ_{i∈F} q_i(k) tr( Σ_{n=k+1}^{N} S(n)Q^v + S(k)P_i(k) );

then at Stage N-k we want to show that

V(k, I^{k-1}) = Σ_{i∈F} q_i(k-1) x̂_i(k-1)^T S(k-1) x̂_i(k-1) + Σ_{i∈F} q_i(k-1) tr( Σ_{n=k}^{N} S(n)Q^v + S(k-1)P_i(k-1) )

with u(k), P_k and S(k) given by (9) and (11). The induction argument follows from

V(k, I^{k-1}) = min_{u(k-1)} E{ x(k-1)^T Q_{k-1} x(k-1) + u(k-1)^T R_{k-1} u(k-1) - (x(k-1) - x̂(k-1))^T P_{k-1}(x(k-1) - x̂(k-1)) + V(k+1, I^k) | I^{k-1} }.

Computing E{ Σ_{i∈F} q_i(k) x̂_i(k)^T S(k) x̂_i(k) | I^{k-1} } using (7) and (11b), and following the same steps as in Stage 1 with N replaced by k, we obtain the result.
The next result, Theorem 2, proves that the cost functional introduced in Theorem 1 is a convex functional of {u(k)}, and thus the DUL controller is the unique minimizing strategy.

Theorem 2 (verification theorem). The cost (10) is a convex functional of {u(k)} and, as such, there is a unique minimizing policy {u(k)}*.

Proof. It is enough to show that S(k) is a positive semidefinite matrix for k = 0, ..., N-1, where

S(k) = A^T S(k+1)A + Q_k - (B^T S(k+1)A)^T (B^T S(k+1)B + R_k)^{-1} (B^T S(k+1)A)

with

S(N-1) = A^T S_N A + Q_{N-1} - (B^T S_N A)^T (B^T S_N B + R_{N-1})^{-1} (B^T S_N A),

S_N a positive semidefinite symmetric matrix, {Q_k}_{k=0,...,N-1} a sequence of positive semidefinite symmetric matrices and {R_k}_{k=0,...,N-1} a sequence of positive definite symmetric matrices. Write

S(k) = [A - BΓ(k)]^T S(k+1)[A - BΓ(k)] + Γ(k)^T R_k Γ(k) + Q_k

with

Γ(k) = (B^T S(k+1)B + R_k)^{-1} B^T S(k+1)A.

Given S(k+1) positive semidefinite, it follows that S(k) is positive semidefinite, k = N-2, ..., 0. But

S(N-1) = [A - BΓ(N-1)]^T S_N [A - BΓ(N-1)] + Γ(N-1)^T R_{N-1} Γ(N-1) + Q_{N-1} ≥ 0

as required. Now, after a tedious calculation, the cost-to-go at stage N-k-1 can be written as

V(k+1, I^k) = min_{u(k)} { Σ_{i∈F} q_i(k) ‖x̂_i(k)‖²_{S(k)} + ‖u(k) + Σ_{i∈F} q_i(k)Γ(k)x̂_i(k)‖²_{(R_k + B^T S(k+1)B)} + Σ_{i∈F} q_i(k) tr( Σ_{n=k+1}^{N} S(n)Q^v + S(k)P_i(k) ) }.

Thus S(k+1) ≥ 0 implies R_k + B^T S(k+1)B > 0, so that at every stage the cost-to-go is a strictly convex function of the control and has a unique minimum. This proves the desired result.

Comments.
(1) The passive learning control introduced in Deshpande et al. (1973) is optimal with respect to an appropriate quadratic cost functional that includes a weighted version of the trace of the conditional covariance matrix.
(2) The dual property of the control can be defined in terms of the dependence of the conditional covariance on the control (Bar-Shalom and Tse, 1974). Including the conditional covariance weighted by the sequence {P_k} balances out the active learning which is present in the standard LQG problem with parameter uncertainty, so that the passive learning controller is optimal in this context.
(3) It follows that the LQG problem with parameter uncertainty poses an interesting dual control problem which inherently includes active learning, without any modification of the cost functional to formally include an identification cost in the problem.
(4) The feedback gain matrices (9) can be computed off-line. In fact, the optimal policy can be written as the weighted average of the individual optimal closed loop control policies corresponding to each parameter r ∈ F, where the a posteriori probabilities are treated as the weights. This is the DUL law of Deshpande et al. (1973). It should be noted that this result has been obtained for the case of model uncertainty in the observation equation and cannot be straightforwardly extended to the case of general model uncertainty in the system equation. For this last situation we next obtain a control cost for which an open loop feedback policy is optimal. It should also be noted that the computation of the open loop feedback control (which is needed when the system matrices A and B depend on the unknown parameter r) is much more involved than the computation of the closed loop control (which can be obtained in closed form when A and B do not depend on the unknown parameter r). This is an important difference in terms of computational effort and conceptual interpretation.

5. Results for model uncertainty in the plant and observation equations
5.1. Open loop feedback controller. Let φ(·) and f(·,·,·) be non-negative functions of their arguments. Let

J_N = φ(x(N)) + Σ_{k=0}^{N-1} f(x(k), u(k), k).

The open loop feedback control policy is obtained as follows. Let I^k = {y(0), ..., y(k), u(0), ..., u(k-1)} be the information state at time k. Note that q_i(0), q_i(1), ..., q_i(k), i ∈ F, are not included because they can be derived from I^k and the system equations.

Assume that only a priori information is available. The design of the open loop controller can be formulated as min_{{u(k)}_{k=0,...,N-1}} E{J_N | I^0}, which yields the sequence {u^0(0), u^0(1), ..., u^0(N-1)}. Select u^0(0) as the input to be applied at time zero and advance one step, collecting information to obtain I^1. Assume that no new information will be collected in the future and select the control law at time one from min_{{u(k)}_{k=1,...,N-1}} E{J_N | I^1}, which yields the sequence u^1(1), u^1(2), ..., u^1(N-1). Select u^1(1) as the input and advance one step, collecting information to obtain I^2. Repeating this procedure defines the open loop feedback policy {u(k)} = {u^k(k)}; a schematic sketch of the resulting loop is given below. The procedure can be formalized using dynamic programming, as developed in Appendix B.
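The procedure just described is, in effect, a receding decision loop; the following sketch makes the bookkeeping explicit. Here solve_open_loop is a hypothetical helper standing for the minimization of E{J_N | I^l} over the remaining inputs (realized, for the quadratic cost of Theorem 3 below, by the recursions (13) and (15)), and observe and apply_input are the plant interfaces.

```python
def olf_policy(N, I0, observe, apply_input, solve_open_loop):
    """Open loop feedback loop of Section 5.1. solve_open_loop(I, l, N)
    returns the sequence u^l(l), ..., u^l(N-1) minimizing E{J_N | I^l}
    under the assumption that no further measurements will be collected;
    only its first element is ever applied."""
    I = list(I0)                       # I^0: a priori information
    for l in range(N):
        u_seq = solve_open_loop(I, l, N)
        apply_input(u_seq[0])          # apply u^l(l) only
        if l < N - 1:
            # I^{l+1} = {I^l, y(l+1), u^l(l)}
            I = I + [observe(l + 1), u_seq[0]]
```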
The next result shows that the open loop feedback control policy as previously defined is an optimal open loop feedback policy for an appropriately defined quadratic cost functional.

Theorem 3. For the system

x(k+1) = A(r)x(k) + B(r)u(k) + v(k),  k = 0, ..., N-1

y(k) = C(r)x(k) + w(k),  k = 1, ..., N

where the vectors and matrices are defined in Section 2, the open loop feedback control

u(k) = -Γ̂(k)( Σ_{r∈F} q_r(k) B(r)^T S_r(k+1) A(r) x̂_r(k) ),  k = 0, ..., N-1   (13a)

Γ̂(k) = ( Σ_{r∈F} q_r(k)(R_k + B(r)^T S_r(k+1) B(r)) )^{-1},  k = 0, ..., N-1   (13b)

is the optimal open loop feedback control with respect to the following quadratic cost functional:

J = E{ x(N)^T S_N x(N) + Σ_{k=0}^{N-1} [x(k)^T Q_k x(k) + u(k)^T R_k u(k)]
- Σ_{k=0}^{N-1} [ (B(r)^T S_r(k+1) A(r) x(k))^T D(k) (B(r)^T S_r(k+1) A(r) x(k))
- E(B(r)^T S_r(k+1) A(r) x(k) | I^k)^T D(k) E(B(r)^T S_r(k+1) A(r) x(k) | I^k) ] }   (14)

where, for r ∈ F, {S_r^l(k+1)} and {D^l(k)} are sequences of matrices satisfying:

(i)

S_r^l(k) = A(r)^T S_r^l(k+1) A(r) + Q_k - (B(r)^T S_r^l(k+1) A(r))^T D^l(k) (B(r)^T S_r^l(k+1) A(r))   (15a)

with

D^l(k) = ( Σ_{r∈F} q_r(l) B(r)^T S_r^l(k+1) B(r) + R_k )^{-1},  k = N-1, ..., l;  l = 0, ..., N-1.   (15b)

(ii) S_r(k) = S_r^k(k) and D(k) = D^k(k), with the boundary condition

S_r^l(N) = S_N,  r ∈ F,  l = 0, ..., N-1.   (15c)

Remarks.
(1) The upper index l indicates that the information available to the controller is I^l. For all times k with l ≤ k ≤ N-1 the matrices D^l(k) and S_r^l(k) are computed using q_i(l). Thus (15) are the equations associated with the solution of the Bellman equation corresponding to the design of u(l), equation (B6) in Appendix B.
(2) This is neither the certainty equivalent controller nor the DUL controller, because (13) is not the average of the control policies associated with the particular system models (A(i), B(i), C(i)) weighted by the conditional probabilities q_i(k). This controller has been defined in Casiello and Loparo (1985).
(3) The matrices {S_i(k)} cannot be computed off-line, since they depend on {q_i(k)}_{k=0,...,N-1}. They must be recomputed as new information is obtained (Casiello and Loparo, 1985); a computational sketch of (13) and (15) is given below.
Proof. At Stage 1 we have

V(N, I^0) = min_{u(N-1)} E{ x(N)^T S_N x(N) + x(N-1)^T Q_{N-1} x(N-1) + u(N-1)^T R_{N-1} u(N-1)
- (B(r)^T S_r(N) A(r) x(N-1))^T D^0(N-1) (B(r)^T S_r(N) A(r) x(N-1))
+ E(B(r)^T S_r(N) A(r) x(N-1) | I^0)^T D^0(N-1) E(B(r)^T S_r(N) A(r) x(N-1) | I^0) | I^0 }.

Consider the last two terms; evaluating the conditional expectation and using S_r^0(N) = S_N, r ∈ F, these two terms can be written as

-( Σ_{i∈F} q_i(0) {B(i)^T S_N A(i) x̂_i^0(N-1)}^T D^0(N-1) {B(i)^T S_N A(i) x̂_i^0(N-1)}
+ Σ_{i∈F} q_i(0) tr {B(i)^T S_N A(i)}^T D^0(N-1) {B(i)^T S_N A(i)} P_i^0(N-1) )
+ [ Σ_{i∈F} q_i(0)(B(i)^T S_N A(i) x̂_i^0(N-1)) ]^T D^0(N-1) [ Σ_{i∈F} q_i(0)(B(i)^T S_N A(i) x̂_i^0(N-1)) ]

where x̂_i^0(N-1) and P_i^0(N-1) are given by (6') and (7'). Adding these terms and using the algebraic fact given in Appendix A, they become

-( Σ_{(i,j)∈S_F} q_i(0)q_j(0) ‖B(i)^T S_N A(i) x̂_i^0(N-1) - B(j)^T S_N A(j) x̂_j^0(N-1)‖²_{D^0(N-1)}
+ Σ_{i∈F} q_i(0) tr {B(i)^T S_N A(i)}^T D^0(N-1) {B(i)^T S_N A(i)} P_i^0(N-1) )   (16)

with ‖x‖²_P = x^T P x for x ∈ R^n. Performing the minimization using (7') and (11b) and the results above, we obtain

V(N, I^0) = Σ_{i∈F} q_i(0) x̂_i^0(N-1)^T S_i^0(N-1) x̂_i^0(N-1) + Σ_{i∈F} q_i(0) tr( S_i^0(N-1) P_i^0(N-1) + S_N Q^v )

with

S_i^0(N-1) = A(i)^T S_N A(i) + Q_{N-1} - (B(i)^T S_N A(i))^T D^0(N-1) (B(i)^T S_N A(i)),

D^0(N-1) = ( Σ_{i∈F} q_i(0) B(i)^T S_N B(i) + R_{N-1} )^{-1}

and the boundary condition S_i^0(N) = S_N, i ∈ F. The minimizing control obtained is

u^0(N-1) = -Γ̂^0(N-1)( Σ_{i∈F} q_i(0) B(i)^T S_N A(i) x̂_i^0(N-1) ),  Γ̂^0(N-1) = D^0(N-1).

Note the cancellation of the terms involved in (16) with those in the cost-to-go.

To achieve the desired result we apply mathematical induction in two steps.

Step 1. Here we assume that the information state is fixed at I^0 and use induction on the stages to determine u^0(0). This is accomplished as follows. Assume that at stage N-k-1

V(k+1, I^0) = Σ_{i∈F} q_i(0) x̂_i^0(k)^T S_i^0(k) x̂_i^0(k) + Σ_{i∈F} q_i(0) tr( Σ_{n=k+1}^{N} S_i^0(n)Q^v + S_i^0(k)P_i^0(k) );

then we want to show that at stage N-k we have

V(k, I^0) = Σ_{i∈F} q_i(0) x̂_i^0(k-1)^T S_i^0(k-1) x̂_i^0(k-1) + Σ_{i∈F} q_i(0) tr( Σ_{n=k}^{N} S_i^0(n)Q^v + S_i^0(k-1)P_i^0(k-1) )

with u^0(k) given by

u^0(k) = -Γ̂^0(k)( Σ_{i∈F} q_i(0) B(i)^T S_i^0(k+1) A(i) x̂_i^0(k) ),

Γ̂^0(k) = D^0(k) = ( Σ_{i∈F} q_i(0) B(i)^T S_i^0(k+1) B(i) + R_k )^{-1}

and S_i^0(k) given by

S_i^0(k) = A(i)^T S_i^0(k+1) A(i) + Q_k - (B(i)^T S_i^0(k+1) A(i))^T D^0(k) (B(i)^T S_i^0(k+1) A(i))

with the boundary condition S_i^0(N) = S_N, i ∈ F. This induction step can be verified in the following way:

V(k, I^0) = min_{u(k-1)} E{ x(k-1)^T Q_{k-1} x(k-1) + u(k-1)^T R_{k-1} u(k-1)
- (B(r)^T S_r^0(k) A(r) x(k-1))^T D^0(k-1) (B(r)^T S_r^0(k) A(r) x(k-1))
+ E(B(r)^T S_r^0(k) A(r) x(k-1) | I^0)^T D^0(k-1) E(B(r)^T S_r^0(k) A(r) x(k-1) | I^0) + V(k+1, I^0) | I^0 }.

Computing Σ_{i∈F} q_i(0) B(i)^T S_i^0(k) A(i) x̂_i^0(k) using (6'), (7') and (15a), and following the same steps as in Stage 1 with N replaced by k, we obtain the result. Then

u^0(0) = -Γ̂^0(0)( Σ_{i∈F} q_i(0) B(i)^T S_i^0(1) A(i) x̂_i^0(0) )

with

Γ̂^0(0) = ( Σ_{i∈F} q_i(0)(R_0 + B(i)^T S_i^0(1) B(i)) )^{-1}.

Select u(0) = u^0(0), the control to be implemented at time zero.

Step 2. This induction step is for the information state I^l. The idea is to use the previous induction argument (as in Step 1) with information state I^l to show that the same form of the cost-to-go, and hence of the open loop feedback control, results. Assume that at time step l the cost-to-go at stage N-k-1 is

V(k+1, I^l) = Σ_{i∈F} q_i(l) x̂_i^l(k)^T S_i^l(k) x̂_i^l(k) + Σ_{i∈F} q_i(l) tr( Σ_{n=k+1}^{N} S_i^l(n)Q^v + S_i^l(k)P_i^l(k) ),  l ≤ k ≤ N-1.

Then we want to show that at stage N-k we have

V(k, I^l) = Σ_{i∈F} q_i(l) x̂_i^l(k-1)^T S_i^l(k-1) x̂_i^l(k-1) + Σ_{i∈F} q_i(l) tr( Σ_{n=k}^{N} S_i^l(n)Q^v + S_i^l(k-1)P_i^l(k-1) )

with u^l(k) given by

u^l(k) = -Γ̂^l(k)( Σ_{i∈F} q_i(l) B(i)^T S_i^l(k+1) A(i) x̂_i^l(k) ),

Γ̂^l(k) = D^l(k) = ( Σ_{i∈F} q_i(l) B(i)^T S_i^l(k+1) B(i) + R_k )^{-1}

and S_i^l(k) given by

S_i^l(k) = A(i)^T S_i^l(k+1) A(i) + Q_k - (B(i)^T S_i^l(k+1) A(i))^T D^l(k) (B(i)^T S_i^l(k+1) A(i))

with the boundary condition S_i^l(N) = S_N, i ∈ F. This follows directly by replacing I^0 by I^l and using the results of the previous calculation. Now select u(l) = u^l(l).

The next theorem shows that for the cost functional (14) introduced in Theorem 3, the minimizing control in the class of open loop feedback policies is unique.

Theorem 4 (verification theorem). There is a unique open loop feedback policy {u(k)}* that minimizes the cost functional (14).

Proof. It is enough to show that S_i^l(k) is positive semidefinite for l = 0, ..., N-1 and k ≥ l, where

S_i^l(k) = A(i)^T S_i^l(k+1) A(i) + Q_k - (B(i)^T S_i^l(k+1) A(i))^T D^l(k) (B(i)^T S_i^l(k+1) A(i)),

D^l(k) = ( Σ_{i∈F} q_i(l) B(i)^T S_i^l(k+1) B(i) + R_k )^{-1}

with

S_i^l(N-1) = A(i)^T S_N A(i) + Q_{N-1} - (B(i)^T S_N A(i))^T D^l(N-1) (B(i)^T S_N A(i)),

D^l(N-1) = ( Σ_{i∈F} q_i(l) B(i)^T S_N B(i) + R_{N-1} )^{-1},

S_N a positive semidefinite symmetric matrix, {Q_k}_{k=0,...,N-1} a sequence of positive semidefinite symmetric matrices and {R_k}_{k=0,...,N-1} a sequence of positive definite symmetric matrices. Write

S_i^l(k) = [A(i) - B(i)Λ_i^l(k)]^T S_i^l(k+1) [A(i) - B(i)Λ_i^l(k)] + Λ_i^l(k)^T R_k Λ_i^l(k) + Q_k

with

Λ_i^l(k) = D^l(k) B(i)^T S_i^l(k+1) A(i).

Then S_i^l(k+1) positive semidefinite implies S_i^l(k) positive semidefinite, k = N-2, ..., 0. But

S_i^l(N-1) = [A(i) - B(i)Λ_i^l(N-1)]^T S_N [A(i) - B(i)Λ_i^l(N-1)] + Λ_i^l(N-1)^T R_{N-1} Λ_i^l(N-1) + Q_{N-1},

Λ_i^l(N-1) = ( Σ_{i∈F} q_i(l) B(i)^T S_N B(i) + R_{N-1} )^{-1} B(i)^T S_N A(i),

so S_i^l(N-1) ≥ 0 as required. Now, after a tedious calculation, the cost-to-go at stage N-k-1 can be written as

V(k+1, I^l) = min_{u(k)} { Σ_{i∈F} q_i(l) ‖x̂_i^l(k)‖²_{S_i^l(k)} + ‖u(k) + Σ_{i∈F} q_i(l)Λ_i^l(k)x̂_i^l(k)‖²_{(R_k + Σ_{i∈F} q_i(l)B(i)^T S_i^l(k+1)B(i))} + Σ_{i∈F} q_i(l) tr( Σ_{n=k+1}^{N} S_i^l(n)Q^v + S_i^l(k)P_i^l(k) ) }.

Thus S_i^l(k+1) ≥ 0 implies

R_k + Σ_{i∈F} q_i(l) B(i)^T S_i^l(k+1) B(i) > 0,

so that at every stage and for any information state I^l the cost-to-go is a strictly convex function of the control and has a unique minimum.

6. Conclusions
In this paper we have defined quadratic optimal control problems for linear systems with parameter uncertainty for which passive control laws are optimal. For a system with parameter uncertainty in the observation equation only, a convex cost functional was derived such that the DUL control law is optimal. For a system with uncertainty in the plant and observation equations, a cost functional for which the open loop feedback control is optimal was presented.

The synthesis of these passive laws involves modifying the standard LQG cost functional by subtracting a quadratic term, which can be referred to as a "dual cost", since it directly involves the identification aspects of the problem. It would be interesting to study the asymptotic properties of the dual cost as ‖u(k)‖ → ∞; this is studied in Casiello and Loparo (1989b).

References
Bar-Shalom, Y. and E. Tse (1974). Dual effect, certainty equivalence and separation in stochastic control. IEEE Trans. Aut. Control, AC-19, 494-500.
Casiello, F. and K. Loparo (1985). A dual controller for linear systems with random jump parameters. Proc. 24th Conf. on Decision and Control, Fort Lauderdale, pp. 911-915.
Casiello, F. and K. Loparo (1989a). Optimal control of unknown parameter systems. IEEE Trans. Aut. Control, to appear.
Casiello, F. and K. Loparo (1989b). Optimal learning control for a class of unknown parameter systems. Submitted for publication.
Deshpande, J., T. Upadhyay and D. Lainiotis (1973). Adaptive control of linear stochastic systems. Automatica, 9, 107-115.
Fragoso, M. (1988). On a partially observable LQG problem for systems with Markovian jumping parameters. Syst. Control Lett., 10, 349-356.
Griffiths, B. and K. Loparo (1985). Optimal control of jump linear quadratic Gaussian systems. Int. J. Control, 42(4), 791-819.
Hijab, O. (1986). Stabilization of Control Systems. Springer, Berlin.
Appendix A. Derivation of an algebraic identity
Algebraic fact. Let α_i > 0, i ∈ F = {0, 1, ..., f}, satisfy Σ_{i∈F} α_i = 1. Let x_i ∈ R^n, i ∈ F, and let P be an n × n matrix. Then

Σ_{i∈F} α_i x_i^T P x_i - ( Σ_{i∈F} α_i x_i )^T P ( Σ_{i∈F} α_i x_i ) = Σ_{(i,j)∈S_F} α_i α_j (x_i - x_j)^T P (x_i - x_j)   (A1)

where S_F is the set of all pairwise distinct combinations of the elements of F.

Proof. The left-hand side of the above expression equals

Σ_{i∈F} α_i x_i^T P x_i - Σ_{i∈F} α_i² x_i^T P x_i - 2 Σ_{(i,j)∈S_F} α_i α_j x_i^T P x_j
= Σ_{i∈F} α_i (1 - α_i) x_i^T P x_i - 2 Σ_{(i,j)∈S_F} α_i α_j x_i^T P x_j
= Σ_{i∈F} α_i ( Σ_{j≠i} α_j ) x_i^T P x_i - 2 Σ_{(i,j)∈S_F} α_i α_j x_i^T P x_j
= Σ_{(i,j)∈S_F} α_i α_j x_i^T P x_i + Σ_{(i,j)∈S_F} α_i α_j x_j^T P x_j - 2 Σ_{(i,j)∈S_F} α_i α_j x_i^T P x_j
= Σ_{(i,j)∈S_F} α_i α_j [ x_i^T P (x_i - x_j) - x_j^T P (x_i - x_j) ]
= Σ_{(i,j)∈S_F} α_i α_j (x_i - x_j)^T P (x_i - x_j).

Now

Cov(x(k) | I^k) = E{(x(k) - x̂(k))(x(k) - x̂(k))^T | I^k}
= Σ_{i∈F} q_i(k) x̂_i(k)x̂_i(k)^T + Σ_{i∈F} q_i(k) P_i(k) - x̂(k)x̂(k)^T.

Then (8) follows from (A1) with P = I, x_i = x̂_i(k) and α_i = q_i(k).

Appendix B. Synthesis of the open loop feedback controller
Designing u^0(0). At time k = 0, let I^0 represent the knowledge about the initial state distribution. Let

J_N^0 = min_{{u(k)}} E{ φ(x(N)) + Σ_{k=0}^{N-1} f(x(k), u(k)) | I^0 }
= min_{{u(k)}} E{φ(x(N)) | I^0} + Σ_{k=0}^{N-1} E{f(x(k), u(k)) | I^0}.   (B1)

Let

V(k+1, I^0) = min_{u(k)} E{f(x(k), u(k)) | I^0} + V(k+2, I^0)   (B2)

where V(·, I^0) satisfies the boundary conditions

V(N, I^0) = min_{u(N-1)} E{φ(x(N)) + f(x(N-1), u(N-1)) | I^0}

and V(1, I^0) = J_N^0. Then equation (B2) is the Bellman equation corresponding to the design of u(0). Select u(0) = u^0(0) and advance one step, collecting I^1 = {y(0), y(1), u^0(0)}.

Designing u^1(1). After obtaining I^1, let

J_N^1 = min_{{u(k)}} E{ φ(x(N)) + Σ_{k=1}^{N-1} f(x(k), u(k)) | I^1 }
= min_{{u(k)}} E{φ(x(N)) | I^1} + Σ_{k=1}^{N-1} E{f(x(k), u(k)) | I^1}.   (B3)

Let

V(k+1, I^1) = min_{u(k)} E{f(x(k), u(k)) | I^1} + V(k+2, I^1)   (B4)

where V(·, I^1) satisfies the boundary conditions

V(N, I^1) = min_{u(N-1)} E{φ(x(N)) + f(x(N-1), u(N-1)) | I^1}

and V(2, I^1) = J_N^1. Equation (B4) is the Bellman equation corresponding to the design of u(1). Select u(1) = u^1(1) and advance one step, collecting I^2 = {I^1, y(2), u^1(1)}.

Designing u^l(l). In general, let

J_N^l = min_{{u(k)}} E{ φ(x(N)) + Σ_{k=l}^{N-1} f(x(k), u(k)) | I^l }
= min_{{u(k)}} E{φ(x(N)) | I^l} + Σ_{k=l}^{N-1} E{f(x(k), u(k)) | I^l}.   (B5)

Let

V(k+1, I^l) = min_{u(k)} E{f(x(k), u(k)) | I^l} + V(k+2, I^l)   (B6)

where V(·, I^l) satisfies the boundary conditions

V(N, I^l) = min_{u(N-1)} E{φ(x(N)) + f(x(N-1), u(N-1)) | I^l}

and V(l+1, I^l) = J_N^l. Equation (B6) is the Bellman equation corresponding to the design of u(l). Select u(l) = u^l(l) and advance one step, collecting I^{l+1} = {I^l, y(l+1), u^l(l)}.
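Identity (A1) is easy to confirm numerically; the sketch below (hypothetical random data) compares both sides for random weights, vectors and a symmetric weight matrix, with itertools.combinations enumerating the pairs in S_F.

```python
import itertools
import numpy as np

# Numerical check of identity (A1) with hypothetical random data.
rng = np.random.default_rng(1)
f, n = 4, 3
alpha = rng.random(f); alpha /= alpha.sum()     # alpha_i > 0, summing to 1
x = rng.standard_normal((f, n))
P = rng.standard_normal((n, n)); P = P + P.T    # symmetric n x n matrix

lhs = sum(alpha[i] * x[i] @ P @ x[i] for i in range(f))
xbar = alpha @ x                                # sum_i alpha_i x_i
lhs -= xbar @ P @ xbar
rhs = sum(alpha[i] * alpha[j] * (x[i] - x[j]) @ P @ (x[i] - x[j])
          for i, j in itertools.combinations(range(f), 2))  # pairs in S_F
assert np.isclose(lhs, rhs)                     # identity (A1) holds
```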
