Optimal Policies for Passive Learning Controllers
Printed in Great Britain. Pergamon Press plc
~) 1989 International Federation of Automatic Control
Brief Paper
Key Words--Dual control; dynamic programming; stochastic control; discrete time systems; adaptive control.
the conditional density of x at time k assuming model i is given by

p(x(k) | I^k, r = i) = G(x(k); x̂_i(k), P_i(k))

where G(·; ·, ·) denotes a Gaussian density function and x̂_i(k) and P_i(k) are the conditional mean and covariance matrix assuming model i, given by

x̂_i(k) = m_i(k) + K_i(k)[y(k) - C(i)m_i(k)]    (3)

P_i(k) = (I - K_i(k)C(i))M_i(k)    (4)

and the Kalman gain

K_i(k) = M_i(k)C(i)^T [C(i)M_i(k)C(i)^T + Q^w]^{-1}.    (5)

Here m_i(k) and M_i(k) are the one step ahead mean and covariance matrix assuming model i, and

p(x(k) | I^{k-1}, r = i) = G(x(k); m_i(k), M_i(k))

with m_i(k) and M_i(k) computed recursively by

m_i(k) = A(i)x̂_i(k-1) + B(i)u(k-1)    (6)

M_i(k) = A(i)P_i(k-1)A(i)^T + Q^v    (7)

and the conditional covariance matrix is given by

Cov(x(k) | I^k) = Σ_{i∈F} q_i(k)P_i(k) + Σ_{(i,j)∈S_F} q_i(k)q_j(k)(x̂_i(k) - x̂_j(k))(x̂_i(k) - x̂_j(k))^T.    (8')

The next theorem relates the minimization of an appropriate quadratic cost functional to the "DUL" passive controller defined by Despande et al. (1973). It is shown that this control law is the optimal closed loop strategy for the problem posed.

4. Results for model uncertainty only in the observation equation

Theorem 1. For the system

x(k+1) = Ax(k) + Bu(k) + v(k),   k = 0, ..., N-1

y(k) = C(r)x(k) + w(k),   k = 1, ..., N

where the vectors and matrices were defined before, the passive closed loop control

u(k) = -F(k)x̂(k),   k = 0, ..., N-1    (9a)
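The model-conditioned filter bank in (3)-(7) can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the function name `model_filter_step` and all system matrices, dimensions, and numbers are invented for the example.

```python
import numpy as np

def model_filter_step(A, B, C, Qv, Qw, x_prev, P_prev, u_prev, y):
    """One step of the model-conditioned Kalman filter, eqs (3)-(7).

    A, B, C        : system matrices for one candidate model i
    Qv, Qw         : process / measurement noise covariances
    x_prev, P_prev : conditional mean / covariance at time k-1
    u_prev         : control applied at time k-1
    y              : measurement at time k
    """
    # One step ahead prediction, eqs (6)-(7)
    m = A @ x_prev + B @ u_prev
    M = A @ P_prev @ A.T + Qv
    # Kalman gain, eq (5)
    K = M @ C.T @ np.linalg.inv(C @ M @ C.T + Qw)
    # Measurement update, eqs (3)-(4)
    x = m + K @ (y - C @ m)
    P = (np.eye(len(x_prev)) - K @ C) @ M
    return x, P

# Hypothetical two-model example: same (A, B), different C(i)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C1 = np.array([[1.0, 0.0]])   # observation matrix C(1)
C2 = np.array([[0.0, 1.0]])   # observation matrix C(2)
Qv = 0.01 * np.eye(2)
Qw = np.array([[0.04]])

x0 = np.zeros(2); P0 = np.eye(2)
u = np.array([0.5]); y = np.array([0.3])
x1, P1 = model_filter_step(A, B, C1, Qv, Qw, x0, P0, u, y)
x2, P2 = model_filter_step(A, B, C2, Qv, Qw, x0, P0, u, y)
```

Running one such step per model i ∈ F, together with the posterior weights q_i(k), yields the mixture mean and the covariance decomposition (8').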
and the optimal cost is

J° = min_{{u(k)}} J = E{x_0^T S(0)x_0 + Σ_{i∈F} q_i(0) tr[Σ_{n=1}^{N} S(n)Q^v + S(0)P_i(0)]}.    (12)

Proof. The solution is obtained using dynamic programming. Let

V(k+1, I^k) = min_{u(k)} E{x(k)^T Q_k x(k) + u(k)^T R_k u(k) - (x(k) - x̂(k))^T P_k (x(k) - x̂(k)) + V(k+2, I^{k+1}) | I^k}

where I^k = {y(0), y(1), ..., y(k), u(0), ..., u(k-1)} is the information state of the system and V(·, ·) satisfies the boundary conditions

V(N, I^{N-1}) = min_{u(N-1)} E{x(N)^T S_N x(N) + x(N-1)^T Q_{N-1} x(N-1) + u(N-1)^T R_{N-1} u(N-1) - (x(N-1) - x̂(N-1))^T P_{N-1} (x(N-1) - x̂(N-1)) | I^{N-1}}.

Expanding the expectation at the last stage yields the terms

+ Σ_{i∈F} q_i(N-1)(tr Q_{N-1}P_i(N-1) + tr S_N M_i(N))
- Σ_{(i,j)∈S_F} q_i(N-1)q_j(N-1)(x̂_i(N-1) - x̂_j(N-1))^T P_{N-1}(x̂_i(N-1) - x̂_j(N-1))
- Σ_{i∈F} q_i(N-1) tr P_{N-1}P_i(N-1).

Now select P_{N-1} = (B^T S_N A)^T D_{N-1}(B^T S_N A); then using (7) and (11b)

M_i(N) = A(i)P_i(N-1)A(i)^T + Q^v

and

A(i)^T S_N A(i) = S(N-1) - Q_{N-1} + P_{N-1}

then

tr S_N M_i(N) = tr S_N (A(i)P_i(N-1)A(i)^T) + tr S_N Q^v
             = tr A(i)^T S_N A(i) P_i(N-1) + tr S_N Q^v
             = tr S(N-1)P_i(N-1) - tr Q_{N-1}P_i(N-1) + tr P_{N-1}P_i(N-1) + tr S_N Q^v

and this implies that

Σ_{i∈F} q_i(N-1) tr (Q_{N-1}P_i(N-1) + S_N M_i(N) - P_{N-1}P_i(N-1)) = Σ_{i∈F} q_i(N-1)(tr S(N-1)P_i(N-1) + tr S_N Q^v)
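The chain of trace identities above is easy to confirm numerically. The sketch below, with randomly generated matrices standing in for S_N, Q_{N-1}, R_{N-1}, Q^v, and P_i(N-1) (all invented for illustration), checks that tr S_N M_i(N) = tr S(N-1)P_i(N-1) - tr Q_{N-1}P_i(N-1) + tr P_{N-1}P_i(N-1) + tr S_N Q^v.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2

def rand_psd(k):
    """Random positive semidefinite matrix (illustrative data)."""
    X = rng.standard_normal((k, k))
    return X @ X.T

# Illustrative matrices with the structure assumed in the proof
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
SN = rand_psd(n)                 # S_N >= 0
Q = rand_psd(n)                  # Q_{N-1} >= 0
R = rand_psd(m) + np.eye(m)      # R_{N-1} > 0
Qv = rand_psd(n)                 # process noise covariance Q^v
Pi = rand_psd(n)                 # P_i(N-1), a filter covariance

# D_{N-1} = (B^T S_N B + R)^{-1},  P_{N-1} = (B^T S_N A)^T D (B^T S_N A)
D = np.linalg.inv(B.T @ SN @ B + R)
PN1 = (B.T @ SN @ A).T @ D @ (B.T @ SN @ A)
S_prev = A.T @ SN @ A + Q - PN1          # Riccati step giving S(N-1)
Mi = A @ Pi @ A.T + Qv                   # M_i(N), eq (7)

lhs = np.trace(SN @ Mi)
rhs = (np.trace(S_prev @ Pi) - np.trace(Q @ Pi)
       + np.trace(PN1 @ Pi) + np.trace(SN @ Qv))
print(abs(lhs - rhs) < 1e-8)  # True: the trace identity holds
```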
Theorem 2 (verification theorem). The cost (10) is a convex functional of {u(k)} and, as such, there is a unique minimizing policy {u(k)}*.

Proof. It is enough to show that S(k) is a positive semidefinite matrix for k = 0, ..., N-1:

S(k) = A^T S(k+1)A + Q_k - (B^T S(k+1)A)^T (B^T S(k+1)B + R_k)^{-1} (B^T S(k+1)A)

with

S(N-1) = A^T S_N A + Q_{N-1} - (B^T S_N A)^T (B^T S_N B + R_{N-1})^{-1} (B^T S_N A)

with S_N a positive semidefinite symmetric matrix, {Q_k}_{k=0,...,N-1} a sequence of positive semidefinite symmetric matrices and {R_k}_{k=0,...,N-1} a sequence of positive definite symmetric matrices. Write

S(k) = [A - BF(k)]^T S(k+1)[A - BF(k)] + F(k)^T R_k F(k) + Q_k

with

F(k) = (B^T S(k+1)B + R_k)^{-1} B^T S(k+1)A.

Given S(k+1) positive semidefinite it follows that S(k) is positive semidefinite, k = N-2, ..., 0. But

S(N-1) = [A - BF(N-1)]^T S_N [A - BF(N-1)] + F(N-1)^T R_{N-1} F(N-1) + Q_{N-1} ≥ 0

as required.

Now, after a tedious calculation the cost-to-go at stage N-k-1 can be written as

V(k+1, I^k) = min_{u(k)} { Σ_{i∈F} q_i(k) ||x̂_i(k)||²_{S(k)} + ||u(k) + Σ_{i∈F} q_i(k)F(k)x̂_i(k)||²_{(R_k + B^T S(k+1)B)} + Σ_{i∈F} q_i(k) tr [Σ_{n=k+1}^{N} S(n)Q^v + S(k)P_i(k)] }.

Thus S(k+1) ≥ 0 implies R_k + B^T S(k+1)B > 0, so that at every stage the cost-to-go is a strictly convex function of the control, and has a unique minimum. This proves the desired result.

Comments.
(1) The passive learning control introduced in Despande et al. (1973) is optimal with respect to an appropriate quadratic cost functional that includes a weighted version of the trace of the conditional covariance matrix.
(2) The dual property of the control can be defined in terms of the dependence of the conditional covariance on the control (Bar-Shalom and Tse, 1974). Including the conditional covariance weighted by the sequence {P_k} balances out the active learning which is present in the standard LQG problem with parameter uncertainty, so that the passive learning controller is optimal in this context.
(3) It follows that the LQG problem with parameter uncertainty poses an interesting dual control problem which inherently includes active learning, without any modification of the cost functional to include an identification cost formally in the problem.
(4) The feedback gain matrices (9) can be computed off-line. In fact, the optimal policy can be written as the weighted average of the individual optimal closed loop control policies corresponding to each parameter r ∈ F, where the a posteriori probabilities are treated as the weights. This is the DUL law of Despande et al. (1973). It should be noted that this result has been obtained for the case of model uncertainty in the observation equation and cannot be straightforwardly extended to the case of general model uncertainty in the system equation. For this last situation we next obtain a control cost for which an open loop feedback policy is optimal. It should also be noted that the computation of the open loop feedback control (which is needed to solve the problem when the system matrices A and B depend on the unknown parameter r) is much more involved than the computation of the closed loop control (which can be solved in closed form when the system matrices A and B do not depend on the unknown parameter r). This is an important difference in terms of the computational effort and the conceptual interpretation.

5. Results for model uncertainty in the plant and observation equations

5.1. Open loop feedback controller. Let φ(·) and f(·, ·, ·) be non-negative functions of their arguments. Let

J_N = φ(x(N)) + Σ_{k=0}^{N-1} f(x(k), u(k), k).

The open loop feedback control policy is obtained as follows. Let I^k = {y(0), ..., y(k), u(0), ..., u(k-1)} be the information state at time k. Note that q_i(0), q_i(1), ..., q_i(k), i ∈ F, are not included because they can be derived from I^k and the system equations.

Assume that only a priori information is available. The design of the open loop controller can be formulated as min_{{u(k)}_{k=0,...,N-1}} E{J_N | I^0} to obtain the sequence {u^0(0), u^0(1), ..., u^0(N-1)}. Select u^0(0) as the input to be applied at time zero and advance one step, collecting information to obtain I^1. Assume that no new information will be collected in the future and select the control law at time equal to one such that min_{{u(k)}_{k=1,...,N-1}} E{J_N | I^1} to obtain the sequence u^1(1), u^1(2), ..., u^1(N-1). Select u^1(1) as the input and advance one step, collecting information to obtain I^2. Repeating this procedure defines the open loop feedback policy {u(k)} = {u^k(k)}. The procedure can be formalized using dynamic programming as developed in Appendix B.

The next result shows that the open loop feedback control policy as previously defined is an optimal open loop feedback policy for an appropriately defined quadratic cost functional.

Theorem 3. For the system

x(k+1) = A(r)x(k) + B(r)u(k) + v(k),   k = 0, ..., N-1

y(k) = C(r)x(k) + w(k),   k = 1, ..., N

where the vectors and matrices are defined in Section 2, the open loop feedback control

u(k) = -F̂(k)(Σ_{r∈F} q_r(k)B(r)^T S_r(k+1)A(r)x̂_r(k)),   k = 0, ..., N-1    (13a)

F̂(k) = (Σ_{r∈F} q_r(k)(R_k + B(r)^T S_r(k+1)B(r)))^{-1},   k = 0, ..., N-1    (13b)

is the optimal open loop feedback control with respect to the following quadratic cost functional:

J = E{x(N)^T S_N x(N) + Σ_{k=0}^{N-1} (x(k)^T Q_k x(k) + u(k)^T R_k u(k))
    - Σ_{k=0}^{N-1} [(B(r)^T S_r(k+1)A(r)x(k))^T D(k)(B(r)^T S_r(k+1)A(r)x(k))
    - E(B(r)^T S_r(k+1)A(r)x(k) | I^k)^T D(k) E(B(r)^T S_r(k+1)A(r)x(k) | I^k)]}    (14)
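The one-step computation of the open loop feedback law (13a)-(13b) can be sketched directly. The helper name `olf_control` and the two-model numbers below are hypothetical, for illustration only; per-model cost-to-go matrices S_r(k+1) are taken as given.

```python
import numpy as np

def olf_control(q, A_list, B_list, S_next_list, xhat_list, R):
    """Open loop feedback control of Theorem 3, eqs (13a)-(13b).

    q           : posterior model probabilities q_r(k), r in F
    A_list, B_list : per-model system matrices A(r), B(r)
    S_next_list : per-model cost-to-go matrices S_r(k+1)
    xhat_list   : per-model conditional means xhat_r(k)
    R           : control weighting R_k
    """
    # F(k) = ( sum_r q_r(k) (R_k + B(r)^T S_r(k+1) B(r)) )^{-1}   (13b)
    G = sum(qr * (R + B.T @ S @ B)
            for qr, B, S in zip(q, B_list, S_next_list))
    F = np.linalg.inv(G)
    # u(k) = -F(k) sum_r q_r(k) B(r)^T S_r(k+1) A(r) xhat_r(k)    (13a)
    s = sum(qr * (B.T @ S @ A @ xh)
            for qr, A, B, S, xh in zip(q, A_list, B_list, S_next_list, xhat_list))
    return -F @ s

# Hypothetical two-model instance (all numbers illustrative)
A1 = np.array([[1.0, 0.1], [0.0, 1.0]]); A2 = np.array([[1.0, 0.2], [0.0, 0.9]])
B1 = np.array([[0.0], [0.1]]);           B2 = np.array([[0.0], [0.2]])
S1 = np.eye(2); S2 = 2.0 * np.eye(2)
x1 = np.array([1.0, 0.0]); x2 = np.array([0.8, 0.1])
u = olf_control([0.6, 0.4], [A1, A2], [B1, B2], [S1, S2], [x1, x2],
                np.array([[1.0]]))
```

Unlike the DUL law of Section 4, this control is not an average of per-model closed loop policies: the gain mixes all models inside a single inverse.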
where for r ∈ F, {S_r(k+1)} and {D(k)} are sequences of matrices satisfying:

(i)

S_r^l(k) = A(r)^T S_r^l(k+1)A(r) + Q_k - (B(r)^T S_r^l(k+1)A(r))^T D^l(k)(B(r)^T S_r^l(k+1)A(r))    (15a)

with

S_r^l(N) = S_N,   r ∈ F,   l = 0, ..., N-1.    (15c)

Remarks.
(1) The upper index l indicates that the information available to the controller is I^l. For all times k with l < k ≤ N-1 the matrices D^l(k) and S_r^l(k) are computed using q_i(l). Thus (15) are the equations associated with the solution of the Bellman equation corresponding to the design of u(l), equation (B6) in Appendix B.
(2) This is neither the certainty equivalent controller nor the DUL controller, because (13) is not the average of the control policies associated with a particular system model (A(i), B(i), C(i)) weighted by the conditional probabilities q_i(k). This controller has been defined in Casiello and Loparo (1985).
(3) The matrices {S_i(k)} cannot be computed off-line, since they depend on {q_i(k)}_{k=0,...,N-1}. They must be recomputed as new information is obtained (Casiello and Loparo, 1985).

Proof. At the last stage the cost contains the terms

- (B(r)^T S_r(N)A(r)x(N-1))^T D^0(N-1)(B(r)^T S_r(N)A(r)x(N-1))
+ E(B(r)^T S_r(N)A(r)x(N-1) | I^0)^T D^0(N-1) E(B(r)^T S_r(N)A(r)x(N-1) | I^0)

where E(B(r)^T S_r(N)A(r)x(N-1) | I^0) = Σ_{i∈F} q_i(0)B(i)^T S_N A(i)x̂_i^0(N-1), with x̂_i^0(N-1) and P_i^0(N-1) given by (6') and (7'). Adding these terms and using the algebraic fact given in Appendix A, they become

- Σ_{(i,j)∈S_F} q_i(0)q_j(0) ||B(i)^T S_N A(i)x̂_i^0(N-1) - B(j)^T S_N A(j)x̂_j^0(N-1)||²_{D^0(N-1)}
+ Σ_{i∈F} q_i(0) tr (B(i)^T S_N A(i))^T D^0(N-1)(B(i)^T S_N A(i)) P_i^0(N-1)    (16)

with ||x||²_P = x^T P x, for x ∈ R^n.

Performing the minimization using (7') and (11b) and the results above, we obtain

V(N, I^0) = Σ_{i∈F} q_i(0) x̂_i^0(N-1)^T S_i(N-1) x̂_i^0(N-1)

and the boundary condition S_i^0(N) = S_N, i ∈ F. The minimizing control obtained is

u^0(N-1) = -F̂^0(N-1)(Σ_{i∈F} q_i(0)B(i)^T S_N A(i)x̂_i^0(N-1))

F̂^0(N-1) = D^0(N-1).

Note the cancellation of the terms involved in (16) with those in the cost-to-go.

To achieve the desired result we apply mathematical induction in two steps.

Step 1. Here we assume that the information state is fixed at I^0 and use induction on the stages to determine u^0(0). This is accomplished as follows. Assume that at stage N-k-1

V(k+1, I^0) = Σ_{i∈F} q_i(0) x̂_i^0(k)^T S_i^0(k) x̂_i^0(k) + Σ_{i∈F} q_i(0) tr [Σ_{n=k}^{N} S_i^0(n)Q^v + S_i^0(k-1)P_i^0(k-1)]

with u^0(k) given by the stage-k analogue of (13a), and with the cost at the previous stage containing the terms

+ u(k-1)^T R_{k-1} u(k-1)
- (B(r)^T S_r^0(k)A(r)x(k-1))^T D^0(k-1)(B(r)^T S_r^0(k)A(r)x(k-1))
+ E(B(r)^T S_r^0(k)A(r)x(k-1) | I^0)^T D^0(k-1) E(B(r)^T S_r^0(k)A(r)x(k-1) | I^0)
+ V(k+1, I^0) | I^0}.

Computing Σ_{i∈F} q_i(0)B(i)^T S_i^0(k)A(i)x̂_i^0(k) using (6'), (7') and (15a), and following the same steps of Stage 1 with k replaced by N, we obtain the result.
1 ≤ k ≤ N-1.

Then we want to show that at stage N-k we have

u^l(k) = -F̂^l(k)(Σ_{i∈F} q_i(l)B(i)^T S_i^l(k+1)A(i)x̂_i^l(k))

with the cost-to-go containing the term

||u^l(k) + Σ_{i∈F} q_i(l)F̂(k)x̂_i^l(k)||²_{(R_k + Σ_{i∈F} q_i(l)B(i)^T S_i^l(k+1)B(i))}

and S_i^l(k) given by

S_i^l(k) = A(i)^T S_i^l(k+1)A(i) + Q_k - (B(i)^T S_i^l(k+1)A(i))^T D^l(k)(B(i)^T S_i^l(k+1)A(i))

with the boundary condition

S_i^l(N) = S_N,   i ∈ F.

This follows directly by replacing I^0 by I^l and using the results of the previous calculation. Now select u(l) = u^l(l).

The next theorem shows that for the cost functional (14) introduced in Theorem 3, the minimizing control in the class of open loop feedback policies is unique.

Theorem 4 (verification theorem). There is a unique open loop feedback policy {u(k)}* that minimizes the cost functional (14).

Proof. It is enough to show that S_i^l(k) is positive semidefinite, l = 0, ..., N-1, k ≥ l:

S_i^l(k) = A(i)^T S_i^l(k+1)A(i) + Q_k - (B(i)^T S_i^l(k+1)A(i))^T D^l(k)(B(i)^T S_i^l(k+1)A(i))

D^l(k) = (Σ_{i∈F} q_i(l)B(i)^T S_i^l(k+1)B(i) + R_k)^{-1}

with

S_i^l(N-1) = A(i)^T S_N A(i) + Q_{N-1} - (B(i)^T S_N A(i))^T D^l(N-1)(B(i)^T S_N A(i))

D^l(N-1) = (Σ_{i∈F} q_i(l)B(i)^T S_N B(i) + R_{N-1})^{-1}

with S_N a positive semidefinite symmetric matrix, so that at every stage and for any information state I^l the cost-to-go is a strictly convex function of the control and has a unique minimum.

6. Conclusions

In this paper we have defined quadratic optimal control problems for linear systems with parameter uncertainty for which passive control laws are optimal.

For a system with parameter uncertainty in the observation equation only, a convex cost functional was derived such that the DUL control law is optimal.

For a system with uncertainty in the plant and observation equations, a cost functional for which the open loop feedback control is optimal is presented.

The synthesis of these passive laws involves modifying the standard LQG cost functional by subtracting a quadratic term, which can be referred to as a "dual cost", since it directly involves the identification aspects of the problem. It would be interesting to study the asymptotic properties of the dual cost when ||u(k)|| → ∞; this is studied in Casiello and Loparo (1989b).

References

Bar-Shalom, Y. and E. Tse (1974). Dual effect, certainty equivalence and separation in stochastic control. IEEE Trans. Aut. Control, AC-19, 494-500.

Casiello, F. and K. Loparo (1985). A dual controller for linear systems with random jump parameters. Proc. 24th Conf. on Decision and Control, Fort Lauderdale, pp. 911-915.

Casiello, F. and K. Loparo (1989a). Optimal control of unknown parameter systems. IEEE Trans. Aut. Control, to appear.

Casiello, F. and K. Loparo (1989b). Optimal learning control for a class of unknown parameter systems. Submitted for publication.

Despande, J., T. Upadhyay and D. Lainiotis (1973). Adaptive control of linear stochastic systems. Automatica, 9, 107-115.

Fragoso, M. (1988). On a partially observable LQG problem for systems with Markovian jumping parameters. Syst. Control Lett., 10, 349-356.
Appendix A. Derivation of an algebraic identity

Algebraic fact. Let α_i > 0, i ∈ F = {0, 1, ..., f}, satisfy

Σ_{i∈F} α_i = 1.

Let x_i ∈ R^n, i ∈ F, and P an n × n matrix; then

Σ_{i∈F} α_i x_i^T P x_i - (Σ_{i∈F} α_i x_i)^T P (Σ_{i∈F} α_i x_i) = Σ_{(i,j)∈S_F} α_i α_j (x_i - x_j)^T P (x_i - x_j)    (A1)

where S_F is the set of all pairwise distinct combinations of the elements of F.

Proof. The left-hand side of the above expression equals

Σ_{i∈F} α_i x_i^T P x_i - Σ_{i∈F} α_i² x_i^T P x_i - 2 Σ_{(i,j)∈S_F} α_i α_j x_i^T P x_j
= Σ_{i∈F} α_i (1 - α_i) x_i^T P x_i - 2 Σ_{(i,j)∈S_F} α_i α_j x_i^T P x_j
= Σ_{(i,j)∈S_F} α_i α_j (x_i^T P x_i + x_j^T P x_j) - 2 Σ_{(i,j)∈S_F} α_i α_j x_i^T P x_j
= Σ_{(i,j)∈S_F} α_i α_j (x_i - x_j)^T P (x_i - x_j)

where the middle step uses α_i(1 - α_i) = α_i Σ_{j≠i} α_j. Moreover,

E{(x(k) - x̂(k))(x(k) - x̂(k))^T | I^k} = Σ_{i∈F} q_i(k)x̂_i(k)x̂_i(k)^T + Σ_{i∈F} q_i(k)P_i(k) - x̂(k)x̂(k)^T.

Then (8') follows from (A1) with P = I, x_i = x̂_i(k) and α_i = q_i(k).

Appendix B. Synthesis of the open loop feedback controller

Designing u^0(0). At time k = 0, let I^0 represent the knowledge about the initial state distribution. Let

V(k+1, I^0) = min_{u(k)} E{f(x(k), u(k)) | I^0} + V(k+2, I^0)    (B2)

where V(·, I^0) satisfies the boundary conditions

V(N, I^0) = min_{u(N-1)} E{φ(x(N)) + f(x(N-1), u(N-1)) | I^0}

and

V(l+1, I^0) = J_{N_0}.

Then equation (B2) is the Bellman equation corresponding to the design of u(0). Select u(0) = u^0(0) and advance one step, collecting

I^1 = {y(0), y(1), u^0(0)}.

Designing u^1(1). After obtaining I^1, let

J_{N_1} = min_{{u(k)}} E{φ(x(N)) + Σ_{k=1}^{N-1} f(x(k), u(k)) | I^1}
        = min_{{u(k)}} E{φ(x(N)) | I^1} + Σ_{k=1}^{N-1} E{f(x(k), u(k)) | I^1}.    (B3)

Let

V(k+1, I^1) = min_{u(k)} E{f(x(k), u(k)) | I^1} + V(k+2, I^1).    (B4)

In general, after obtaining I^l, let

J_{N_l} = min_{{u(k)}} E{φ(x(N)) | I^l} + Σ_{k=l}^{N-1} E{f(x(k), u(k)) | I^l}    (B5)

and let

V(k+1, I^l) = min_{u(k)} E{f(x(k), u(k)) | I^l} + V(k+2, I^l)    (B6)

where V(·, I^l) satisfies the boundary conditions

V(N, I^l) = min_{u(N-1)} E{φ(x(N)) + f(x(N-1), u(N-1)) | I^l}

and

V(l+1, I^l) = J_{N_l}.
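The algebraic fact (A1) of Appendix A is a weighted variance decomposition and can be checked numerically; the weights, vectors, and dimensions below are randomly generated for illustration only.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
f = 3                                             # F = {0, 1, ..., f}
alpha = rng.random(f + 1); alpha /= alpha.sum()   # weights alpha_i summing to 1
xs = [rng.standard_normal(4) for _ in range(f + 1)]
P = np.eye(4)                                     # (A1) is applied with P = I

# Left-hand side of (A1): weighted quadratic forms minus the form at the mean
xbar = sum(a * x for a, x in zip(alpha, xs))
lhs = sum(a * x @ P @ x for a, x in zip(alpha, xs)) - xbar @ P @ xbar

# Right-hand side of (A1): sum over pairwise distinct combinations S_F
rhs = sum(alpha[i] * alpha[j] * (xs[i] - xs[j]) @ P @ (xs[i] - xs[j])
          for i, j in combinations(range(f + 1), 2))
print(np.isclose(lhs, rhs))  # True: identity (A1) holds
```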