Subspace Algorithms for the Stochastic Identification Problem*†

P. VAN OVERSCHEE‡§‖ and B. DE MOOR‡¶
Abstract--In this paper, we derive a new subspace algorithm to consistently identify stochastic state space models from given output data without forming the covariance matrix and using only semi-infinite block Hankel matrices. The algorithm is based on the concept of principal angles and directions. We describe how they can be calculated with QR and Quotient Singular Value Decomposition. We also provide an interpretation of the principal directions as states of a non-steady state Kalman filter bank.
1. INTRODUCTION

LET $y_k \in \Re^l$, $k = 0, 1, \ldots, K$ be a data sequence that is generated by the following system:
$$x_{k+1} = A x_k + w_k, \qquad (1)$$
$$y_k = C x_k + v_k, \qquad (2)$$
where $x_k \in \Re^n$ is a state vector. The vector sequence $w_k \in \Re^n$ is a process noise, while the vector sequence $v_k \in \Re^l$ is a measurement noise. They are both assumed to be zero mean, white and Gaussian, with covariance matrix**
$$E\left[\begin{pmatrix} w_p \\ v_p \end{pmatrix}\begin{pmatrix} w_q^t & v_q^t \end{pmatrix}\right] = \begin{pmatrix} Q & S \\ S^t & R \end{pmatrix}\delta_{pq},$$
where $\delta_{pq}$ is the Kronecker delta. It is assumed that the stochastic process is zero mean and stationary, i.e. $E[x_k] = 0$ and $E[x_k x_k^t] = \Sigma$ (say). The state covariance matrix $\Sigma$ is independent of the time $k$. This implies that $A$ is a stable matrix (all of its poles are strictly inside the unit circle).

The central problem discussed in this paper is the identification of a state space model from the data $y_k$ (including the determination of the system order $n$) and the determination of the noise covariance matrices. The state space model should be equivalent to the given data up to second order statistics of the output.
* Received 2 January 1992; received in final form 12 September 1992. The original version of this paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor B. Wahlberg under the direction of Editor P. C. Parks.
† The research reported in this paper was partially supported by the Belgian Program on Inter University Attraction Poles initiated by the Belgian State Science Policy Programming (Prime Minister's Office) and the European Community Research Program ESPRIT, Basic Research Action nr. 3280. The scientific responsibility is assumed by its authors.
‡ Katholieke Universiteit Leuven, Department of Electrical Engineering--ESAT, Kardinaal Mercierlaan 94, 3001 Heverlee, Belgium. Email: vanoverschee@esat.kuleuven.ac.be. Fax: 32/16/221855.
§ Research Assistant of the Belgian National Fund for Scientific Research.
‖ Author to whom all correspondence should be sent.
¶ Research Associate of the Belgian National Fund for Scientific Research.
** The expected value operator is denoted by E[·].

The main contributions of this paper are the following:
• Since the pioneering papers by Akaike (1975), canonical correlations (which were first introduced by Jordan (1875) in linear algebra and then by Hotelling (1936) in the statistical community) have been used as a mathematical tool in the stochastic realization problem. In this paper we show how the approach of Akaike (1975) and others (e.g. Arun and Kung, 1990; Larimore, 1990) reduces to applying canonical correlation analysis to two matrices that are doubly infinite (i.e. have an infinite number of rows and columns). A careful analysis reveals the nature of this double infinity, and we manage to reduce the canonical correlation approach to a semi-infinite matrix problem: only the number of columns needs to be very large, while the number of (block) rows remains small. This observation is extremely relevant with respect to the use of updating techniques.
• In order to find the state space model, we derive a finite-dimensional vector sequence (principal directions) which, in the case of doubly infinite block Hankel matrices, would be a valid state sequence of the stochastic model.
This sequence would correspond to the outputs of an infinite number of steady state Kalman filters with an infinite number of output measurements as inputs. For the semi-infinite matrix problem, the sequence corresponds to the outputs of an infinite number of non-steady state Kalman filters that have only used a finite number of output data as input. These state sequences are obtained directly from the output data, without any need for the state space model. The state space model is then derived from these sequences by solving a least squares problem.
Figure 1 illustrates the difference between this approach and the classical one, where first the system matrices are identified, after which the states are determined through a Kalman filter. In our point of view, the notion of a state is largely underestimated in the context of system identification, while it is well accepted in the context of control system design. Most identification approaches are based on an input-output (transfer matrix) framework.
• We give a precise interpretation of the fact that state space models obtained via principal angles and directions are approximately balanced in the stochastic sense.
• It is a common belief that ARMA model identification requires non-linear estimation methods with iterative solution techniques (see e.g. Ljung, 1987). In that framework, the main problems arise from convergence difficulties and/or the existence of local minima. With the subspace approach of this paper, ARMA modelling is basically reduced to the solution of an eigenvalue problem, for which numerically robust and always convergent algorithms exist.
• We derive a numerically robust square root algorithm that mainly uses the QR-decomposition and the Quotient Singular Value Decomposition (QSVD) of the triangular factors, and is completely data driven instead of covariance driven. The fact that only one of the dimensions of the matrices involved needs to go to infinity is very important, since then updating techniques can be used.

We should mention the work by Aoki (1987), in which a "covariance" framework for the stochastic realization is derived (in contrast to our square root version). Another aspect that we emphasize more than is done in Aoki (1987) is the careful analysis of the dynamics of the algorithm as a function of the number of block rows of the Hankel matrices involved.

This paper is organized as follows. In Section 2, we summarize the main properties of linear time-invariant stochastic processes, including the non-uniqueness of the state space description. In Section 3, we briefly present the solution via the classical realization approach, which is based on the fact that the output covariance matrices can be considered as Markov parameters of a deterministic linear time-invariant discrete time system. In Section 4, we describe how the projection of past and future block Hankel matrices of output data leads to the definition of the state sequences. In Section 5, we explore the relationship of these state sequences with the non-steady state Kalman filter. From these sequences, it is easy to derive the stochastic state space model. This will be done in Section 6. The second part of this paper consists of a description of a numerically robust algorithm to calculate the state sequences and the state space matrices. In Section 7, we provide a geometrical interpretation of the main mathematical tool: principal angles and directions. We also show how the state sequences can be calculated as principal directions. In Section 8, we describe a numerically robust computation scheme based on the QR and quotient singular value decompositions. We also put everything together and arrive at a new, numerically robust algorithm to calculate the state space model of the stochastic system directly from output data. Some illustrative examples are given in Section 9.
[FIG. 1. Output data $y_k$ are turned into a state space model along two routes: via principal angles and directions to the states and then to the system matrices (this paper), or via classical identification to the system matrices first, after which the states follow from a Kalman filter.]
Since the noise sequences $w_k$ and $v_k$ are independent of $x_k$, we know that $E[x_k v_k^t] = 0$ and $E[x_k w_k^t] = 0$. Then we find the Lyapunov equation for the state covariance matrix $\Sigma \stackrel{\mathrm{def}}{=} E[x_k x_k^t]$:
$$\Sigma = A \Sigma A^t + Q.$$
Defining the output covariance matrices $\Lambda_i \stackrel{\mathrm{def}}{=} E[y_{k+i} y_k^t]$, we find for $\Lambda_0$:
$$\Lambda_0 = C \Sigma C^t + R. \qquad (3)$$

[...] with $w_k^b = K^b v_k^b$ and $v_k^b = y_k - G^t r_k$. Here $K^b$ is the backward Kalman gain, $K^b = (C^t - A^t N G)(\Lambda_0 - G^t N G)^{-1}$, and $N$ is the backward state covariance matrix, which can be determined from the backward Riccati equation:
$$N = A^t N A + (C^t - A^t N G)(\Lambda_0 - G^t N G)^{-1}(C - G^t N A).$$
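A quick numerical check of the Lyapunov equation and of (3) is sketched below, reusing the assumed toy matrices of the earlier simulation snippet; SciPy's discrete Lyapunov solver handles $\Sigma = A\Sigma A^t + Q$ directly.

```python
# Sketch: stationary state covariance Σ and Λ_0 = CΣC^t + R, cf. (3).
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.7, 0.3], [0.0, 0.5]])   # assumed toy values
C = np.array([[1.0, 0.0]])
Q = 0.10 * np.eye(2)
R = np.array([[0.05]])

Sigma = solve_discrete_lyapunov(A, Q)    # solves Σ = A Σ A^t + Q
Lambda0 = C @ Sigma @ C.T + R            # Λ_0, equation (3)
```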
[...] while the second subscript is the time index of the bottom left element. For all output block Hankel matrices, the number of columns will be $j$, and for all theoretical derivations we assume that $j \to \infty$. How to deal with a finite number of output measurements will become clear in Section 8.

An important remark concerns the estimation of the covariance matrices. Hereto we assume that they can be estimated as
$$\Lambda_i = \lim_{j \to \infty} \frac{1}{j} \sum_{k=0}^{j-1} y_{k+i} y_k^t,$$
for which we will use the notation $\Lambda_i = E_j[y_{k+i} y_k^t]$, with the obvious definition $E_j[\cdot] = \lim_{j\to\infty} \frac{1}{j}[\cdot]$. So, due to ergodicity and the infinite number of data at our disposition, we can replace the expectation operator $E$ (average over an infinite number of experiments) with the different operator $E_j$ applied to the sum of variables (average over one, infinitely long, experiment).
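With a finite record, the $E_j$ averages become plain sample averages. A sketch (the function and variable names are ours, not the paper's) of the estimate of $\Lambda_i$ and of the stacking of the output block Hankel matrices $Y_{p|q}$ used throughout:

```python
# Sketch: sample estimate of Λ_i = E_j[y_{k+i} y_k^t] and the block Hankel
# matrix Y_{p|q} with block rows y_p ... y_q and j columns.
import numpy as np

def lam(y, i, j):
    """Λ_i ≈ (1/j) Σ_{k=0}^{j-1} y_{k+i} y_k^t for a record y of shape (K, l)."""
    return (y[i:i + j].T @ y[:j]) / j

def block_hankel(y, p, q, j):
    """Y_{p|q}: row block r holds y_{p+r}, ..., shifted column by column."""
    return np.vstack([y[r:r + j].T for r in range(p, q + 1)])
```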
The classical realization approach now follows directly from equations (5)-(6). Consider the correlation matrix between the future block output Hankel matrix and the past block output Hankel matrix:
$$E_j[Y_{i|2i-1} Y_{0|i-1}^t] = \begin{pmatrix} \Lambda_i & \Lambda_{i-1} & \Lambda_{i-2} & \cdots & \Lambda_1 \\ \Lambda_{i+1} & \Lambda_i & \Lambda_{i-1} & \cdots & \Lambda_2 \\ \Lambda_{i+2} & \Lambda_{i+1} & \Lambda_i & \cdots & \Lambda_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \Lambda_{2i-1} & \Lambda_{2i-2} & \Lambda_{2i-3} & \cdots & \Lambda_i \end{pmatrix} = \mathcal{O}_i \mathcal{C}_i \ \text{(say)}. \qquad (11)$$
Hence we obtain, as $j \to \infty$, a rank deficient block Toeplitz matrix. Its rank is equal to the state dimension $n$, and a model for $A$, $G$ and $C$ can be obtained from the factorization into an extended observability matrix $\mathcal{O}_i$ (with block rows $C, CA, \ldots, CA^{i-1}$) and controllability matrix $\mathcal{C}_i = (A^{i-1}G \ \cdots \ AG \ G)$, using for instance the approaches via the SVD as in Zeiger and McEwen (1974) or Kung (1978). Note that the rank deficiency is guaranteed to hold as long as $i$ is larger than the largest controllability or observability index of the system.
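A sketch of this classical covariance-driven route (in the Kung style cited above; the helper lam() is from the earlier snippet, and the order $n$ is assumed known): build the block Toeplitz matrix (11) from estimated covariances, factor it by an SVD into $\mathcal{O}_i \mathcal{C}_i$, and read off $C$, $G$ and $A$.

```python
# Sketch: classical realization from the block Toeplitz matrix (11).
import numpy as np

def classical_realization(y, i, n, j):
    l = y.shape[1]
    # Block (r, c) of (11) is Λ_{i+r-c}, for r, c = 0 .. i-1.
    H = np.vstack([np.hstack([lam(y, i + r - c, j) for c in range(i)])
                   for r in range(i)])
    U, s, Vt = np.linalg.svd(H)
    O = U[:, :n] * np.sqrt(s[:n])              # extended observability O_i
    Cc = np.sqrt(s[:n])[:, None] * Vt[:n]      # extended controllability C_i
    Cmat = O[:l]                               # C: first block row of O_i
    G = Cc[:, -l:]                             # G: last block column of C_i
    Amat = np.linalg.pinv(O[:-l]) @ O[l:]      # shift invariance of O_i
    return Amat, G, Cmat
```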
4. ORTHOGONAL PROJECTIONS

In this section, we describe a new approach that avoids the formation of the covariance matrix (11). The identification scheme is based on geometrical insights and, in contrast with previously described similar algorithms (Akaike, 1975; Arun and Kung, 1990; Larimore, 1990) that needed doubly infinite block Hankel matrices, consistent estimates are obtained with semi-infinite output block Hankel matrices.

For the sake of elegance of the proofs, covariance matrices will still be formed in the theoretical motivation that follows. The "square root" algorithm described in Section 8 will avoid this and will thus have far better numerical robustness.

First, we define the orthogonal projection of semi-infinite matrices as follows.

Definition 2. Orthogonal projections of semi-infinite matrices. Given two matrices $P_1 \in \Re^{p_1 \times j}$ and $P_2 \in \Re^{p_2 \times j}$, with $j \to \infty$, the orthogonal projection of the row space of $P_1$ onto the row space of $P_2$ is defined as $E_j[P_1 P_2^t](E_j[P_2 P_2^t])^{-1} P_2 \in \Re^{p_1 \times j}$.

Now, we compute the orthogonal projection of the row space of $Y_{i|2i-1}$ (the future) onto the row space of $Y_{0|i-1}$ (the past). It follows from the rank deficiency of the covariance matrix $E_j[Y_{i|2i-1} Y_{0|i-1}^t]$ (11) that the row space of this projection will be an $n$-dimensional subspace of the $j$-dimensional ambient space. This can also be seen from
$$E_j[Y_{i|2i-1} Y_{0|i-1}^t](E_j[Y_{0|i-1} Y_{0|i-1}^t])^{-1} Y_{0|i-1} = \mathcal{O}_i (A^{i-1}G \ \cdots \ AG \ G) L_i^{-1} Y_{0|i-1} = \mathcal{O}_i \mathcal{C}_i L_i^{-1} Y_{0|i-1},$$
where we define
$$L_i \stackrel{\mathrm{def}}{=} E_j[Y_{0|i-1} Y_{0|i-1}^t] = \begin{pmatrix} \Lambda_0 & \Lambda_{-1} & \cdots & \Lambda_{1-i} \\ \Lambda_1 & \Lambda_0 & \cdots & \Lambda_{2-i} \\ \Lambda_2 & \Lambda_1 & \cdots & \Lambda_{3-i} \\ \vdots & \vdots & \ddots & \vdots \\ \Lambda_{i-1} & \Lambda_{i-2} & \cdots & \Lambda_0 \end{pmatrix} = E_j[Y_{i|2i-1} Y_{i|2i-1}^t].$$
The last equality is due to stationarity. Hence, a basis for the $n$-dimensional projection of the row space of $Y_{i|2i-1}$ onto that of $Y_{0|i-1}$ is formed by the rows of the matrix $Z_i$:
$$Z_i = \mathcal{C}_i L_i^{-1} Y_{0|i-1}. \qquad (12)$$
In a similar way, we find for the orthogonal projection of the row space of $Y_{0|i-1}$ onto the row space of $Y_{i|2i-1}$:
$$E_j[Y_{0|i-1} Y_{i|2i-1}^t](E_j[Y_{i|2i-1} Y_{i|2i-1}^t])^{-1} Y_{i|2i-1} = \mathcal{C}_i^t \mathcal{O}_i^t L_i^{-1} Y_{i|2i-1}.$$
[...]
$$\Sigma_{i+1} = \mathcal{C}_{i+1} L_{i+1}^{-1} \mathcal{C}_{i+1}^t = (A\mathcal{C}_i \quad G)\, L_{i+1}^{-1}\, (A\mathcal{C}_i \quad G)^t,$$
and, working out the inverse of the partitioned matrix $L_{i+1}$, with $\Delta = \Lambda_0 - C \Sigma_i C^t$ we find:
$$\Sigma_{i+1} = A \Sigma_i A^t + (A \Sigma_i C^t - G)(\Lambda_0 - C \Sigma_i C^t)^{-1} (C \Sigma_i A^t - G^t),$$
which is one step of the forward difference Riccati equation. The Kalman gain associated with this equation at time step $i$ is
$$K_i = (G - A \Sigma_i C^t)(\Lambda_0 - C \Sigma_i C^t)^{-1}.$$
We now prove that $Z_{i+1}$ is formed out of $Z_i$ with the help of this Kalman gain. First, we can write (with $N_i$ and $M_i$ unknown)
$$\mathcal{C}_{i+1} (0 \ 0 \ \cdots \ I)^t = A \Sigma_i C^t + M_i (\Lambda_0 - C \Sigma_i C^t),$$
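A sketch of this recursion, assuming $A$, $C$, $G$ and $\Lambda_0$ are given; the starting value $\Sigma_1 = G\Lambda_0^{-1}G^t$ is the one consistent with $\Sigma_i = \mathcal{C}_i L_i^{-1} \mathcal{C}_i^t$ for $i = 1$.

```python
# Sketch: iterate Σ_{i+1} = AΣ_iA^t + K_i Δ_i K_i^t, with
# Δ_i = Λ_0 - CΣ_iC^t and K_i = (G - AΣ_iC^t) Δ_i^{-1}.
import numpy as np

def riccati(A, C, G, Lambda0, steps):
    Sigma = G @ np.linalg.inv(Lambda0) @ G.T          # Σ_1 = C_1 L_1^{-1} C_1^t
    gains = []
    for _ in range(steps):
        Delta = Lambda0 - C @ Sigma @ C.T
        K = (G - A @ Sigma @ C.T) @ np.linalg.inv(Delta)
        gains.append(K)
        Sigma = A @ Sigma @ A.T + K @ Delta @ K.T     # one forward Riccati step
    return Sigma, gains
```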
Theorem 2. Given the sequences $Z_i$, $Z_{i+1}$, $W_i$ and $W_{i+1}$, the matrices $A$, $C$, $G$ and $\Lambda_0$ satisfy:
$$A = E_j[Z_{i+1} Z_i^t](E_j[Z_i Z_i^t])^{-1}, \qquad (17)$$
$$A^t = E_j[W_{i+1} W_i^t](E_j[W_i W_i^t])^{-1}, \qquad (18)$$
$$C = E_j[Y_{i|i} Z_i^t](E_j[Z_i Z_i^t])^{-1}, \qquad (19)$$
$$G^t = E_j[Y_{i-1|i-1} W_i^t](E_j[W_i W_i^t])^{-1}, \qquad (20)$$
$$\Lambda_0 = E_j[Y_{i|i} Y_{i|i}^t]. \qquad (21)$$
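For finite $j$, (17)-(21) become least squares problems: $A$, for instance, minimizes $\|Z_{i+1} - AZ_i\|_F$. A sketch, with the sequences assumed given as $n \times j$ (respectively $l \times j$) arrays:

```python
# Sketch of Theorem 2 with sample averages in place of E_j[.].
import numpy as np

def theorem2(Zi, Zi1, Wi, Wi1, Yii, Yi_1):
    lsq = lambda X, Y: np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T
    A  = lsq(Zi, Zi1)            # (17): Z_{i+1} ≈ A Z_i
    At = lsq(Wi, Wi1)            # (18): W_{i+1} ≈ A^t W_i
    C  = lsq(Zi, Yii)            # (19): Y_{i|i} ≈ C Z_i
    Gt = lsq(Wi, Yi_1)           # (20): Y_{i-1|i-1} ≈ G^t W_i
    Lambda0 = (Yii @ Yii.T) / Yii.shape[1]   # (21)
    return A, At, C, Gt, Lambda0
```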
Proof of equation (17).
$$E_j[Z_{i+1} Z_i^t](E_j[Z_i Z_i^t])^{-1} = \mathcal{C}_{i+1} L_{i+1}^{-1} E_j[Y_{0|i} Y_{0|i-1}^t] L_i^{-1} \mathcal{C}_i^t \left( \mathcal{C}_i L_i^{-1} E_j[Y_{0|i-1} Y_{0|i-1}^t] L_i^{-1} \mathcal{C}_i^t \right)^{-1}$$
$$= \mathcal{C}_{i+1} \begin{pmatrix} I & 0 & \cdots & 0 \\ 0 & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I \\ 0 & 0 & \cdots & 0 \end{pmatrix} L_i^{-1} \mathcal{C}_i^t (\mathcal{C}_i L_i^{-1} \mathcal{C}_i^t)^{-1} = A \mathcal{C}_i L_i^{-1} \mathcal{C}_i^t (\mathcal{C}_i L_i^{-1} \mathcal{C}_i^t)^{-1} = A.$$
The proof of equation (18) is similar.

Proof of equation (19).
$$E_j[Y_{i|i} Z_i^t](E_j[Z_i Z_i^t])^{-1} = E_j[Y_{i|i} Y_{0|i-1}^t] L_i^{-1} \mathcal{C}_i^t (\mathcal{C}_i L_i^{-1} \mathcal{C}_i^t)^{-1} = (\Lambda_i \ \Lambda_{i-1} \ \cdots \ \Lambda_1) L_i^{-1} \mathcal{C}_i^t (\mathcal{C}_i L_i^{-1} \mathcal{C}_i^t)^{-1}$$
$$= C \mathcal{C}_i L_i^{-1} \mathcal{C}_i^t (\mathcal{C}_i L_i^{-1} \mathcal{C}_i^t)^{-1} = C.$$
The proof of (20) is similar. Equation (21) directly follows from the definitions of $\Lambda_0$ and of $E_j$. Hence we have a general outline for finding $A$, $G$, $C$ and $\Lambda_0$ from output data, if $Z_i$, $Z_{i+1}$, $W_i$ and $W_{i+1}$ were available.

Let us, as a last remark, point out that the matrix $A$ also satisfies the following Sylvester equation.

7.1. Principal angles and directions

Definition 3. Principal angles and directions. Given two matrices $U \in \Re^{p \times j}$ and $V \in \Re^{q \times j}$ ($j \to \infty$) of full row rank and the Singular Value Decomposition (SVD)
$$U^t (E_j[UU^t])^{-1} E_j[UV^t] (E_j[VV^t])^{-1} V = P^t S Q, \qquad (22)$$
the principal directions between the row spaces of $U$ and $V$ are defined as the left ($P$) and right ($Q$) singular vectors (the principal directions in the row space of $U$, respectively $V$). The cosines of the principal angles between the row spaces of $U$ and $V$ are defined as the singular values (the diagonal of $S$). The principal directions and angles between the row spaces of $U$ and $V$ will be denoted as
$$\langle U, V \rangle = [P, S, Q].$$

Geometrical motivation. The principal angles between two subspaces are a generalization of the angle between two vectors (the concept goes back to Jordan (1875)). Suppose we are given two matrices $U \in \Re^{p \times j}$ and $V \in \Re^{q \times j}$. The first principal angle $\theta_1$ (the smallest one) is obtained as follows: choose unit vectors $u_1 \in R_{\mathrm{row}}(U)$ and $v_1 \in R_{\mathrm{row}}(V)$ and minimize the angle between them. Next choose a unit vector $u_2 \in R_{\mathrm{row}}(U)$ orthogonal to $u_1$ and $v_2 \in R_{\mathrm{row}}(V)$ orthogonal to $v_1$, and minimize the angle $\theta_2$ between them. This is the second principal angle. Continue in this way until $\min(p, q)$ angles have been found. This informal description can also be formalized.

Definition 4. Principal angles and directions. The principal angles $\theta_1 \le \theta_2 \le \cdots \le \pi/2$ between the row spaces $R_{\mathrm{row}}(U)$ and $R_{\mathrm{row}}(V)$ of two matrices $U \in \Re^{p \times j}$ and $V \in \Re^{q \times j}$, and the corresponding principal directions $u_i \in R_{\mathrm{row}}(U)$ and $v_i \in R_{\mathrm{row}}(V)$, are defined recursively as
$$\cos \theta_k = \max_{u \in R_{\mathrm{row}}(U),\ v \in R_{\mathrm{row}}(V)} u^t v = u_k^t v_k,$$
7.2. Determination of $Z_i$, $W_i$, $Z_{i+1}$ and $W_{i+1}$

Theorem 3. Given (with $j \to \infty$):
$$\langle Y_{0|i-2}, Y_{i-1|2i-1} \rangle = [P_{i+1}, S_{i+1}, Q_{i+1}],$$
$$\langle Y_{0|i-1}, Y_{i|2i-1} \rangle = [P_i, S_i, Q_i],$$
$$\langle Y_{0|i}, Y_{i+1|2i-1} \rangle = [P_{i-1}, S_{i-1}, Q_{i-1}].$$
Then:
• The number of elements in $S_i$ different from zero is equal to the state dimension $n$. The same holds for $S_{i-1}$ and $S_{i+1}$. This means that there are only $n$ principal angles different from $\pi/2$.
• Denote with $P^1_{i+1}$, $P^1_i$, $P^1_{i-1}$, $Q^1_{i+1}$, $Q^1_i$, $Q^1_{i-1}$ the matrices with the first $n$ principal directions, and with $S^1_{i+1}$, $S^1_i$, $S^1_{i-1}$ the $n \times n$ matrices with the non-zero elements of $S_{i+1}$, $S_i$, $S_{i-1}$ on the diagonals.
• Then, a possible choice for the state matrices is given by:§
$$Z_i = (S^1_i)^{1/2} P^1_i, \qquad (23)$$
$$W_i = (S^1_i)^{1/2} Q^1_i, \qquad (24)$$
$$Z_{i+1} = [(\underline{\mathcal{O}}_i)^{\dagger} \hat{\mathcal{O}}_{i-1}] (S^1_{i-1})^{1/2} P^1_{i-1}. \qquad (25)$$

[...] it is easy to derive that:
$$\mathcal{O}_i = E_j[Y_{i|2i-1} Z_i^t](E_j[Z_i Z_i^t])^{-1}. \qquad (31)$$
Together with $E_j[P^1_i (P^1_i)^t] = I$, equation (27) follows easily. Following the same reasoning, equations (24) and (28) can be derived.

Equation (25) can be derived as follows. We know from (14) that $Z_{i+1}$ is equal to any basis of the $n$-dimensional projection of the row space of $Y_{i+1|2i-1}$ onto the row space of $Y_{0|i}$. Once again, we know from Definition 3 that $M_{i-1}(S^1_{i-1})^{1/2} P^1_{i-1}$ is a valid basis, with $M_{i-1}$ an arbitrary similarity transform. In the same way as above, we derive that the extended observability matrix corresponding to the choice of this basis is equal to
$$E_j[Y_{i+1|2i-1} (P^1_{i-1})^t (S^1_{i-1})^{1/2} M_{i-1}^t] (M_{i-1} S^1_{i-1} M_{i-1}^t)^{-1} = E_j[Y_{i+1|2i-1} (P^1_{i-1})^t] (S^1_{i-1})^{-1/2} M_{i-1}^{-1} = \hat{\mathcal{O}}_{i-1} M_{i-1}^{-1},$$
where $\hat{\mathcal{O}}_{i-1} \stackrel{\mathrm{def}}{=} E_j[Y_{i+1|2i-1} (P^1_{i-1})^t] (S^1_{i-1})^{-1/2}$. If we strip the last $l$ rows from $\mathcal{O}_i$ in (27) (denote the result by $\underline{\mathcal{O}}_i$), we must find the same matrix $\mathcal{O}_{i-1}$. Thus:
$$\hat{\mathcal{O}}_{i-1} M_{i-1}^{-1} = \underline{\mathcal{O}}_i.$$
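A sketch of (23)-(24) for finite $j$, reusing principal_angles() and block_hankel() from the earlier snippets; the order $n$ is read off from the gap in the cosines (here simply passed in). The basis-fixing step of (25) is sketched in the worked example of Section 9 below.

```python
# Sketch: state sequences Z_i and W_i from past/future principal directions.
import numpy as np

def state_sequences(y, i, n, j):
    past = block_hankel(y, 0, i - 1, j)        # Y_{0|i-1}
    future = block_hankel(y, i, 2 * i - 1, j)  # Y_{i|2i-1}
    P, S, Q = principal_angles(past, future)
    Zi = np.sqrt(S[:n])[:, None] * P[:n]       # (23): (S_i^1)^{1/2} P_i^1
    Wi = np.sqrt(S[:n])[:, None] * Q[:n]       # (24): (S_i^1)^{1/2} Q_i^1
    return Zi, Wi
```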
[...] the QR-decomposition and the QSVD. The QSVD is a generalization to two matrices of the SVD for one matrix.

Lemma 1. Any pair of matrices $A \in \Re^{m \times n}$ and $B \in \Re^{p \times n}$ can be decomposed as
$$A = U_a S_a X^t, \qquad B = U_b S_b X^t,$$
where $U_a \in \Re^{m \times m}$ and $U_b \in \Re^{p \times p}$ are orthonormal, $X \in \Re^{n \times n}$ is square non-singular, and $S_a \in \Re^{m \times n}$ and $S_b \in \Re^{p \times n}$ have the following quasi-diagonal structure, with $r_a = \mathrm{rank}(A)$, $r_b = \mathrm{rank}(B)$, $r_{ab} = \mathrm{rank}(A^t \ B^t)^t$, column block widths $r_{ab} - r_b$, $r_a + r_b - r_{ab}$, $r_{ab} - r_a$, $n - r_{ab}$, and $D_a$, $D_b$ diagonal with $D_a^2 + D_b^2 = I$:
$$S_a = \begin{pmatrix} I & 0 & 0 & 0 \\ 0 & D_a & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{matrix} r_{ab} - r_b \\ r_a + r_b - r_{ab} \\ m - r_a \end{matrix}, \qquad S_b = \begin{pmatrix} 0 & D_b & 0 & 0 \\ 0 & 0 & I & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{matrix} r_a + r_b - r_{ab} \\ r_{ab} - r_a \\ p - r_b \end{matrix}.$$

[...] the row spaces of $A$ and $B$ are given by:
$$\langle A, B \rangle = [(Q_1 U)^t, S, (Q_1 U S + Q_2 V T)^t].$$

Proof. Because $A$ and $B$ are of full row rank, we have
$$A^t (A A^t)^{-1} A = Q_1 Q_1^t,$$
$$B^t (B B^t)^{-1} B = (Q_1 U S + Q_2 V T)(S^t U^t Q_1^t + T^t V^t Q_2^t).$$
It follows from the properties of the QSVD that $S^t S + T^t T = I$, and hence $Q_1 U S + Q_2 V T$ is an orthonormal matrix. We find that
$$A^t (A A^t)^{-1} A \, B^t (B B^t)^{-1} B = (Q_1 U) S (Q_1 U S + Q_2 V T)^t,$$
which is clearly an SVD. It follows from Definition 3 that it contains the principal angles and directions.

The major advantage of this result lies in the fact that principal angles and directions can
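Neither NumPy nor SciPy exposes a QSVD routine, so the sketch below substitutes the closely related Björck-Golub scheme: orthogonalize each data matrix with one QR factorization and take an SVD of the product of the orthogonal factors. The resulting cosines and directions agree with Definition 3, and no product of the form $AA^t$ or $BB^t$ is ever formed explicitly.

```python
# Sketch: numerically robust principal angles via QR of the data + one SVD.
import numpy as np

def principal_angles_qr(A, B):
    Qa, _ = np.linalg.qr(A.T)           # orthonormal basis for the row space of A
    Qb, _ = np.linalg.qr(B.T)
    Uc, S, Vct = np.linalg.svd(Qa.T @ Qb)
    P = (Qa @ Uc).T                     # principal directions in row(A)
    Q = (Qb @ Vct.T).T                  # principal directions in row(B)
    return P, np.clip(S, 0.0, 1.0), Q   # clip guards against rounding > 1
```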
[...] described in this section;
(b) the algorithm as described in Section 8;
(c) the algorithm described in Aoki (1987).
For every $i$, 100 Monte Carlo experiments were performed. The mean value of these 100 experiments is plotted for every $i$ in Fig. 4 (the exact value is 0.949). Clearly, the algorithm described in this section (which is the one that is normally used in the literature) calculates heavily biased solutions for small $i$. The bias is still significant for $i$ as large as 8 (eight times larger than the dimension of the state vector). On the other hand, the new algorithm (of Section 8) does not suffer at all from this problem. Neither does the method of Aoki, which is a covariance based method and is mentioned just for comparison.

Figure 4 (bottom) also shows the variance of the estimated $A$ matrix. Clearly, the variance grows smaller when $i$ grows larger (as mentioned in Section 5). It should also be noted that, for small $i$, the biased algorithm of this section trades off bias for a smaller variance. Even though it is not clear from this example, we found from other experiments that the variance [...]

[...]
$$A = \begin{pmatrix} \cdot & \cdot & \cdot \\ \cdot & 0.6 & 0 \\ 0 & 0 & 0.8 \end{pmatrix}, \quad B = \begin{pmatrix} \cdot \\ \cdot \\ \cdot \end{pmatrix}, \quad C = \begin{pmatrix} 0.3 & \cdot & 0.5 \end{pmatrix}, \quad D = (1),$$
which is a third order system with one output and complex poles on the unit circle (this is a limiting case of a stable stochastic system). So, there is a pure autonomous sinusoidal mode in the output (initial state $x_0 = (1\ 1\ 0)^t$). The eigenvalues of $A$ are $0.6 \pm 0.8i$ and $0.8$. Once again, we performed 100 Monte Carlo simulations ($i = 4$, $j = 1000$) for the two algorithms. Figure 5 shows the absolute values of the 100 estimates of the poles. The top figure used the algorithm of the previous subsection. The full lines are the absolute values of the exact poles (1 and 0.8). Clearly, there is a bias on the estimates of about $0.2 = 20\%$. The second figure shows the estimates with the new algorithm of this paper. The bias is eliminated. The sample mean for these 100 experiments was
$$m_a = \begin{pmatrix} 0.7993 \\ 0.6733 \end{pmatrix}, \qquad m_b = \begin{pmatrix} 0.9983 \\ 0.8080 \end{pmatrix}.$$
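To close the section, an end-to-end sketch of a single Monte Carlo run in this spirit. The example's matrices are only partially legible above, so an assumed second order test system is used instead; the state sequences come from the SVD of the projections, the stripping step of (25) fixes a common basis for $Z_i$ and $Z_{i+1}$, and the eigenvalues of the estimated $A$ are compared (approximately) with the true ones.

```python
# Sketch: simulate, project, estimate A, compare eigenvalues. All values
# below are assumed test values, not the (partially legible) example above.
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.8, 0.4], [-0.4, 0.8]])      # eigenvalues 0.8 ± 0.4i (stable)
C = np.array([[1.0, 1.0]])
i, j, n = 4, 2000, 2
K = j + 2 * i
x = np.zeros(2)
y = np.empty((K, 1))
for k in range(K):
    y[k] = C @ x + 0.3 * rng.standard_normal(1)
    x = A @ x + 0.3 * rng.standard_normal(2)

def hank(p, q):                              # block Hankel Y_{p|q}, j columns
    return np.vstack([y[r:r + j].T for r in range(p, q + 1)])

def proj(P1, P2):                            # Definition 2 for finite j
    return np.linalg.solve(P2 @ P2.T, P2 @ P1.T).T @ P2

Pi = proj(hank(i, 2 * i - 1), hank(0, i - 1))        # = O_i Z_i
Pi1 = proj(hank(i + 1, 2 * i - 1), hank(0, i))       # = O_{i-1} Z_{i+1}
U, s, Vt = np.linalg.svd(Pi, full_matrices=False)
Oi = U[:, :n] * np.sqrt(s[:n])                       # extended observability
Zi = np.sqrt(s[:n])[:, None] * Vt[:n]
Zi1 = np.linalg.pinv(Oi[:-1]) @ Pi1                  # strip last l = 1 rows, cf. (25)
Ahat = np.linalg.lstsq(Zi.T, Zi1.T, rcond=None)[0].T # least squares, cf. (17)
print(np.linalg.eigvals(Ahat), np.linalg.eigvals(A))
```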
[FIG. 4. Mean (top) and variance (bottom) of the estimated eigenvalues as a function of $i$, for the algorithm of Section 9.1 (circle), for the algorithm of this paper (star) and for the algorithm of Aoki (cross).]

10. CONCLUSIONS

In this paper, we have derived a new subspace algorithm for the consistent identification of stochastic state space descriptions, using only semi-infinite block Hankel matrices of output measurements. The explicit formation of the covariance matrix and of doubly infinite block Hankel matrices is avoided. The algorithm is based on principal angles and directions. We have described a numerically robust and fast way to calculate these on the basis of the QR and QSVD decompositions. We have also interpreted the principal directions as non-steady state Kalman filter states. With an example, we have illustrated that the new algorithm is superior to the classical canonical correlation algorithms, especially for small $i$.
[FIG. 5. One hundred estimates of the absolute value of the eigenvalues with the algorithm of Section 9.1 (top). The horizontal lines show the exact results. The bottom figure shows the same for the algorithm of this paper.]