Cap. 12 Kay - Fundamentals of Statistical Signal Processing - Estimation Theory
Chapter 12

Linear Bayesian Estimators

12.1 Introduction

The optimal Bayesian estimators discussed in the previous chapter are difficult to determine in closed form and in practice too computationally intensive to implement. They involve multidimensional integration for the MMSE estimator and multidimensional maximization for the MAP estimator. Although under the jointly Gaussian assumption these estimators are easily found, in general they are not. When we are unable to make the Gaussian assumption, another approach must be used. To fill this gap we can choose to retain the MMSE criterion but constrain the estimator to be linear. Then an explicit form for the estimator may be determined which depends only on the first two moments of the PDF. In many ways this approach is analogous to the BLUE in classical estimation, and some parallels will become apparent. In practice this class of estimators, which are generically termed Wiener filters, is extensively utilized.

12.2 Summary

The linear estimator is defined by (12.1), and the corresponding Bayesian MSE by (12.2). Minimizing the Bayesian MSE results in the linear MMSE (LMMSE) estimator of (12.6) and the minimum Bayesian MSE of (12.8). The estimator may also be derived using a vector space viewpoint as described in Section 12.4. This approach leads to the important orthogonality principle, which says that the error of the LMMSE estimator must be uncorrelated with the data. The vector LMMSE estimator is given by (12.20), and the minimum Bayesian MSE by (12.21) and (12.22). The estimator commutes over linear transformations (12.23) and has an additivity property (12.24). For a data vector having the Bayesian linear model form (the Bayesian linear model without the Gaussian assumption), the LMMSE estimator and its performance are summarized in Theorem 12.1. If desired, the estimator for the Bayesian linear model form can be implemented sequentially in time using (12.47)-(12.49).

12.3 Linear MMSE Estimation

We begin our discussion by assuming a scalar parameter $\theta$ is to be estimated based on the data set $\{x[0], x[1], \ldots, x[N-1]\}$ or in vector form $\mathbf{x} = [x[0]\; x[1] \ldots x[N-1]]^T$. The unknown parameter is modeled as the realization of a random variable. We do not assume any specific form for the joint PDF $p(\mathbf{x}, \theta)$ but, as we shall see shortly, only a knowledge of the first two moments. That $\theta$ may be estimated from $\mathbf{x}$ is due to the assumed statistical dependence of $\theta$ on $\mathbf{x}$ as summarized by the joint PDF $p(\mathbf{x}, \theta)$, and in particular, for a linear estimator we rely on the correlation between $\theta$ and $\mathbf{x}$. We now consider the class of all linear (actually affine) estimators of the form

$$\hat{\theta} = \sum_{n=0}^{N-1} a_n x[n] + a_N \tag{12.1}$$

and choose the weighting coefficients $a_n$ to minimize the Bayesian MSE

$$\mathrm{Bmse}(\hat{\theta}) = E\left[(\theta - \hat{\theta})^2\right] \tag{12.2}$$

where the expectation is with respect to the PDF $p(\mathbf{x}, \theta)$. The resultant estimator is termed the linear minimum mean square error (LMMSE) estimator. Note that we have included the $a_N$ coefficient to allow for nonzero means of $\mathbf{x}$ and $\theta$. If the means are both zero, then this coefficient may be omitted, as will be shown later. Before determining the LMMSE estimator we should keep in mind that the estimator will be suboptimal unless the MMSE estimator happens to be linear. Such would be the case, for example, if the Bayesian linear model applied (see Section 10.6). Otherwise, better estimators will exist, although they will be nonlinear (see the introductory example in Section 10.3).
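The Bayesian MSE of (12.2) averages the squared error over both the parameter and the data. The following short Python sketch (not part of the text) evaluates this criterion by Monte Carlo for a candidate affine estimator of the form (12.1); the particular joint PDF, data length, and noise level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma2, var_theta = 10, 1.0, 2.0   # assumed values for illustration
M = 100_000                           # Monte Carlo trials

# Draw theta and data x[n] = theta + w[n] jointly (one simple choice of joint PDF)
theta = rng.normal(0.0, np.sqrt(var_theta), M)
x = theta[:, None] + rng.normal(0.0, np.sqrt(sigma2), (M, N))

# Candidate affine estimator (12.1): equal weights plus a constant term a_N
a = np.full(N, 1.0 / N)
aN = 0.0
theta_hat = x @ a + aN

# Bayesian MSE (12.2), averaged over both theta and x
bmse = np.mean((theta - theta_hat) ** 2)
print(f"Monte Carlo Bmse of candidate estimator: {bmse:.4f}")
```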
Since the LMMSE estimator relies on the correlation between random variables, a parameter uncorrelated with the data cannot be linearly estimated. Consequently, the proposed approach is not always feasible. This is illustrated by the following example. Consider a parameter $\theta$ to be estimated based on the single data sample $x[0]$, where $x[0] \sim \mathcal{N}(0, \sigma^2)$. If the parameter to be estimated is the power of the $x[0]$ realization or $\theta = x^2[0]$, then a perfect estimator will be $\hat{\theta} = x^2[0]$ since the minimum Bayesian MSE will be zero. This estimator is clearly nonlinear. If, however, we attempt to use a LMMSE estimator or

$$\hat{\theta} = a_0 x[0] + a_1$$

then the optimal weighting coefficients $a_0$ and $a_1$ can be found by minimizing

$$\mathrm{Bmse}(\hat{\theta}) = E\left[(\theta - \hat{\theta})^2\right] = E\left[(\theta - a_0 x[0] - a_1)^2\right].$$

We differentiate this with respect to $a_0$ and $a_1$ and set the results equal to zero to produce

$$E\left[(\theta - a_0 x[0] - a_1)x[0]\right] = 0$$
$$E\left[\theta - a_0 x[0] - a_1\right] = 0$$

or

$$a_0 E(x^2[0]) + a_1 E(x[0]) = E(\theta x[0])$$
$$a_0 E(x[0]) + a_1 = E(\theta).$$

But $E(x[0]) = 0$ and $E(\theta x[0]) = E(x^3[0]) = 0$, so that

$$a_0 = 0$$
$$a_1 = E(\theta) = E(x^2[0]) = \sigma^2.$$

Therefore, the LMMSE estimator is $\hat{\theta} = \sigma^2$ and does not depend on the data. This is because $\theta$ and $x[0]$ are uncorrelated. The minimum MSE is

$$\mathrm{Bmse}(\hat{\theta}) = E\left[(\theta - \hat{\theta})^2\right] = E\left[(x^2[0] - \sigma^2)^2\right] = E(x^4[0]) - 2\sigma^2 E(x^2[0]) + \sigma^4 = 3\sigma^4 - 2\sigma^4 + \sigma^4 = 2\sigma^4$$

as opposed to a minimum MSE of zero for the nonlinear estimator $\hat{\theta} = x^2[0]$. Clearly, the LMMSE estimator is inappropriate for this problem. Problem 12.1 explores how to modify the LMMSE estimator to make it applicable.

We now derive the optimal weighting coefficients for use in (12.1). Substituting (12.1) into (12.2) and differentiating

$$\frac{\partial\,\mathrm{Bmse}(\hat{\theta})}{\partial a_N} = \frac{\partial}{\partial a_N} E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n] - a_N\right)^2\right] = -2E\left[\theta - \sum_{n=0}^{N-1} a_n x[n] - a_N\right].$$

Setting this equal to zero produces

$$a_N = E(\theta) - \sum_{n=0}^{N-1} a_n E(x[n]) \tag{12.3}$$

which as asserted earlier is zero if the means are zero. Continuing, we need to minimize

$$\mathrm{Bmse}(\hat{\theta}) = E\left\{\left[\sum_{n=0}^{N-1} a_n\bigl(x[n] - E(x[n])\bigr) - \bigl(\theta - E(\theta)\bigr)\right]^2\right\}$$

over the remaining $a_n$'s, where $a_N$ has been replaced by (12.3). Letting $\mathbf{a} = [a_0\; a_1 \ldots a_{N-1}]^T$, we have

$$\begin{aligned}
\mathrm{Bmse}(\hat{\theta}) &= E\left\{\left[\mathbf{a}^T(\mathbf{x} - E(\mathbf{x})) - (\theta - E(\theta))\right]^2\right\}\\
&= E\left[\mathbf{a}^T(\mathbf{x} - E(\mathbf{x}))(\mathbf{x} - E(\mathbf{x}))^T\mathbf{a}\right] - E\left[\mathbf{a}^T(\mathbf{x} - E(\mathbf{x}))(\theta - E(\theta))\right] - E\left[(\theta - E(\theta))(\mathbf{x} - E(\mathbf{x}))^T\mathbf{a}\right] + E\left[(\theta - E(\theta))^2\right]\\
&= \mathbf{a}^T\mathbf{C}_{xx}\mathbf{a} - \mathbf{a}^T\mathbf{C}_{x\theta} - \mathbf{C}_{\theta x}\mathbf{a} + C_{\theta\theta}
\end{aligned} \tag{12.4}$$

where $\mathbf{C}_{xx}$ is the $N \times N$ covariance matrix of $\mathbf{x}$, $\mathbf{C}_{\theta x}$ is the $1 \times N$ cross-covariance vector having the property that $\mathbf{C}_{\theta x}^T = \mathbf{C}_{x\theta}$, and $C_{\theta\theta}$ is the variance of $\theta$. Making use of (4.3) we can minimize (12.4) by taking the gradient to yield

$$\frac{\partial\,\mathrm{Bmse}(\hat{\theta})}{\partial\mathbf{a}} = 2\mathbf{C}_{xx}\mathbf{a} - 2\mathbf{C}_{x\theta}$$

which when set to zero results in

$$\mathbf{a} = \mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta}. \tag{12.5}$$

Using (12.3) and (12.5) in (12.1) produces

$$\hat{\theta} = \mathbf{a}^T\mathbf{x} + a_N = \mathbf{C}_{x\theta}^T\mathbf{C}_{xx}^{-1}\mathbf{x} + E(\theta) - \mathbf{C}_{x\theta}^T\mathbf{C}_{xx}^{-1}E(\mathbf{x})$$

or finally the LMMSE estimator is

$$\hat{\theta} = E(\theta) + \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}(\mathbf{x} - E(\mathbf{x})). \tag{12.6}$$

Note that it is identical in form to the MMSE estimator for jointly Gaussian $\mathbf{x}$ and $\theta$, as can be verified from (10.24). This is because in the Gaussian case the MMSE estimator happens to be linear, and hence our constraint is automatically satisfied. If the means of $\theta$ and $\mathbf{x}$ are zero, then

$$\hat{\theta} = \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{x}. \tag{12.7}$$

The minimum Bayesian MSE is obtained by substituting (12.5) into (12.4) to yield

$$\mathrm{Bmse}(\hat{\theta}) = \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{xx}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta} - \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta} - \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta} + C_{\theta\theta}$$

or finally

$$\mathrm{Bmse}(\hat{\theta}) = C_{\theta\theta} - \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta}. \tag{12.8}$$

Again this is identical to that obtained by substituting (10.25) into (11.12).
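As a brief numerical sketch (not from the text), the general expressions (12.6) and (12.8) can be applied to the uncorrelated-parameter case above, estimating $\theta = x^2[0]$ from $x[0]$; the sample-moment calculation reproduces $a_0 \approx 0$ and a Bayesian MSE near $2\sigma^4$, while the nonlinear estimator is perfect. The noise variance used is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 1.5                      # assumed noise variance for illustration
M = 200_000

x0 = rng.normal(0.0, np.sqrt(sigma2), M)
theta = x0 ** 2                   # parameter is the power of the realization

# Sample-based first two moments needed by (12.6) and (12.8)
C_xx = np.cov(x0)                 # scalar here
C_thx = np.cov(theta, x0)[0, 1]
a0 = C_thx / C_xx                 # (12.5); ~0 since theta and x[0] are uncorrelated
theta_lmmse = theta.mean() + a0 * (x0 - x0.mean())        # (12.6)
bmse = np.var(theta) - C_thx ** 2 / C_xx                  # (12.8)

print(f"a0 ~ {a0:.4f} (theory 0), Bmse ~ {bmse:.3f} (theory 2*sigma^4 = {2*sigma2**2:.3f})")
print(f"MSE of nonlinear estimator x^2[0]: {np.mean((theta - x0**2)**2):.3f} (theory 0)")
```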
An example follows.

Example 12.1 - DC Level in WGN with Uniform Prior PDF

Consider the introductory example in Chapter 10. The data model is

$$x[n] = A + w[n] \qquad n = 0, 1, \ldots, N-1$$

where $A \sim \mathcal{U}[-A_0, A_0]$, $w[n]$ is WGN with variance $\sigma^2$, and $A$ and $w[n]$ are independent. We wish to estimate $A$. The MMSE estimator cannot be obtained in closed form due to the integration required (see (10.9)). Applying the LMMSE estimator, we first note that $E(A) = 0$, and hence $E(x[n]) = 0$. Since $E(\mathbf{x}) = \mathbf{0}$, the covariances are

$$\mathbf{C}_{xx} = E(\mathbf{x}\mathbf{x}^T) = E\left[(A\mathbf{1} + \mathbf{w})(A\mathbf{1} + \mathbf{w})^T\right] = E(A^2)\mathbf{1}\mathbf{1}^T + \sigma^2\mathbf{I}$$
$$\mathbf{C}_{\theta x} = E(A\mathbf{x}^T) = E\left[A(A\mathbf{1} + \mathbf{w})^T\right] = E(A^2)\mathbf{1}^T$$

where $\mathbf{1}$ is an $N \times 1$ vector of all ones. Hence, from (12.7)

$$\hat{A} = \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{x} = \sigma_A^2\mathbf{1}^T\left(\sigma_A^2\mathbf{1}\mathbf{1}^T + \sigma^2\mathbf{I}\right)^{-1}\mathbf{x}$$

where we have let $\sigma_A^2 = E(A^2)$. But the form of the estimator is identical to that encountered in Example 10.2 if we let $\mu_A = 0$, so that from (10.31)

$$\hat{A} = \frac{\sigma_A^2}{\sigma_A^2 + \dfrac{\sigma^2}{N}}\,\bar{x}. \tag{12.9}$$

As opposed to the original MMSE estimator, which required integration, we have obtained the LMMSE estimator in closed form. Also, note that we did not really need to know that $A$ was uniformly distributed but only its mean and variance, or that $w[n]$ was Gaussian but only that it is white and its variance. Likewise, independence of $A$ and $w$ was not required, only that they were uncorrelated. In general, all that is required to determine the LMMSE estimator are the first two moments of $p(\mathbf{x}, \theta)$ or

$$\begin{bmatrix} E(\theta) \\ E(\mathbf{x}) \end{bmatrix}, \qquad \begin{bmatrix} C_{\theta\theta} & \mathbf{C}_{\theta x} \\ \mathbf{C}_{x\theta} & \mathbf{C}_{xx} \end{bmatrix}.$$

However, we must realize that the LMMSE estimator of (12.9) will be suboptimal since it has been constrained to be linear. The optimal estimator for this problem is given by (10.9).
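As a sanity check (my own sketch, not from the text), the following Python snippet evaluates the LMMSE estimator of Example 12.1 both through the general zero-mean form (12.7) and through the closed form (12.9); the values of $N$, $A_0$, and $\sigma^2$ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, A0, sigma2 = 20, 3.0, 1.0          # assumed values for illustration
var_A = A0 ** 2 / 3                   # variance of U[-A0, A0]

A = rng.uniform(-A0, A0)
x = A + rng.normal(0.0, np.sqrt(sigma2), N)

# LMMSE via the general zero-mean form (12.7)
ones = np.ones(N)
Cxx = var_A * np.outer(ones, ones) + sigma2 * np.eye(N)
Cthx = var_A * ones
A_hat_matrix = Cthx @ np.linalg.solve(Cxx, x)

# Closed form (12.9)
A_hat_closed = var_A / (var_A + sigma2 / N) * x.mean()

print(A_hat_matrix, A_hat_closed)     # the two agree to numerical precision
```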
12.4 Geometrical Interpretations

In Chapter 8 we discussed a geometrical interpretation of the LSE based on the concept of a vector space. The LMMSE estimator admits a similar interpretation, although now the "vectors" are random variables. (Actually, both vector spaces are special cases of the more general Hilbert space [Luenberger 1969].) This alternative formulation assumes that $\theta$ and $\mathbf{x}$ are zero mean. If they are not, we can always define the zero mean random variables $\theta' = \theta - E(\theta)$ and $\mathbf{x}' = \mathbf{x} - E(\mathbf{x})$ and consider estimation of $\theta'$ by a linear function of $\mathbf{x}'$ (see also Problem 12.5). Now we wish to find the $a_n$'s in

$$\hat{\theta} = \sum_{n=0}^{N-1} a_n x[n]$$

that minimize

$$\mathrm{Bmse}(\hat{\theta}) = E\left[(\theta - \hat{\theta})^2\right].$$

Let us now think of the random variables $\theta, x[0], x[1], \ldots, x[N-1]$ as elements in a vector space as shown symbolically in Figure 12.1.

[Figure 12.1: Vector space interpretation of random variables]

The reader may wish to verify that the properties of a vector space are satisfied, such as vector addition and multiplication by a scalar, etc. Since, as is usually the case, $\theta$ cannot be perfectly expressed as a linear combination of the $x[n]$'s (if it could, then our estimator would be perfect), we picture $\theta$ as only partially lying in the subspace spanned by the $x[n]$'s. We may define the "length" of each vector $x$ as $\|x\| = \sqrt{E(x^2)}$ or the square root of the variance. Longer length vectors are those with larger variances. The zero length vector is the random variable with zero variance or, therefore, the one that is identically zero (actually not random at all). Finally, to complete our description we require the notion of an inner product between two vectors. (Recall that if $\mathbf{x}, \mathbf{y}$ are Euclidean vectors in $R^3$, then the inner product is $(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\mathbf{y} = \|\mathbf{x}\|\|\mathbf{y}\|\cos\alpha$, where $\alpha$ is the angle between the vectors.) It can be shown that an appropriate definition, i.e., one that satisfies the properties of an inner product between the vectors $x$ and $y$, is (see Problem 12.4)

$$(x, y) = E(xy). \tag{12.10}$$

With this definition we have that

$$(x, x) = E(x^2) = \|x\|^2 \tag{12.11}$$

consistent with our earlier definition of the length of a vector. Also, we can now define two vectors to be orthogonal if

$$(x, y) = E(xy) = 0. \tag{12.12}$$

Since the vectors are zero mean, this is equivalent to saying that two vectors are orthogonal if and only if they are uncorrelated. (In $R^3$ two Euclidean vectors are orthogonal if the angle between them is $\alpha = 90°$, so that $(\mathbf{x}, \mathbf{y}) = \|\mathbf{x}\|\|\mathbf{y}\|\cos\alpha = 0$.) Recalling our discussions from the previous section, this implies that if two vectors are orthogonal, we cannot use one to estimate the other. As shown in Figure 12.2, since there is no component of $y$ along $x$, we cannot use $x$ to estimate $y$.

[Figure 12.2: Orthogonal random variables—y cannot be linearly estimated based on x]

Attempting to do so means that we need to find an $a$ so that $\hat{y} = ax$ minimizes the Bayesian MSE, $\mathrm{Bmse}(\hat{y}) = E[(y - \hat{y})^2]$. To find the optimal value of $a$

$$\frac{d}{da}E\left[(y - ax)^2\right] = \frac{d}{da}\left[E(y^2) - 2aE(xy) + a^2E(x^2)\right] = -2E(xy) + 2aE(x^2) = 0$$

which yields

$$a = \frac{E(xy)}{E(x^2)} = 0.$$

The LMMSE estimator of $y$ is just $\hat{y} = 0$, in accordance with (12.7) where $N = 1$, $\theta = y$, $x[0] = x$, and $C_{\theta x} = E(xy) = 0$.

With these ideas in mind we proceed to determine the LMMSE estimator using the vector space viewpoint. This approach is useful for conceptualization of the LMMSE estimation process and will be used later to derive the sequential LMMSE estimator. As before, we assume that

$$\hat{\theta} = \sum_{n=0}^{N-1} a_n x[n]$$

where $a_N = 0$ due to the zero mean assumption. We wish to estimate $\theta$ as a linear combination of $x[0], x[1], \ldots, x[N-1]$. The weighting coefficients $a_n$ should be chosen to minimize the MSE

$$E\left[(\theta - \hat{\theta})^2\right] = E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)^2\right] = \left\|\theta - \sum_{n=0}^{N-1} a_n x[n]\right\|^2.$$

But this means that minimization of the MSE is equivalent to a minimization of the squared length of the error vector $\epsilon = \theta - \hat{\theta}$. The error vector is shown in Figure 12.3b for several candidate estimates.

[Figure 12.3: Orthogonality principle for LMMSE estimation]

Clearly, the length of the error vector is minimized when $\epsilon$ is orthogonal to the subspace spanned by $\{x[0], x[1], \ldots, x[N-1]\}$. Hence, we require

$$\epsilon \perp x[0], x[1], \ldots, x[N-1] \tag{12.13}$$

or by using our definition of orthogonality

$$E\left[(\theta - \hat{\theta})x[n]\right] = 0 \qquad n = 0, 1, \ldots, N-1. \tag{12.14}$$

This is the important orthogonality principle or projection theorem. It says that in estimating the realization of a random variable by a linear combination of data samples, the optimal estimator is obtained when the error is orthogonal to each data sample. Using the orthogonality principle, the weighting coefficients are easily found as

$$E\left[\left(\theta - \sum_{m=0}^{N-1} a_m x[m]\right)x[n]\right] = 0 \qquad n = 0, 1, \ldots, N-1$$

or

$$\sum_{m=0}^{N-1} a_m E(x[m]x[n]) = E(\theta x[n]) \qquad n = 0, 1, \ldots, N-1.$$

In matrix form this is

$$\begin{bmatrix} E(x^2[0]) & E(x[0]x[1]) & \cdots & E(x[0]x[N-1]) \\ E(x[1]x[0]) & E(x^2[1]) & \cdots & E(x[1]x[N-1]) \\ \vdots & \vdots & \ddots & \vdots \\ E(x[N-1]x[0]) & E(x[N-1]x[1]) & \cdots & E(x^2[N-1]) \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix} = \begin{bmatrix} E(\theta x[0]) \\ E(\theta x[1]) \\ \vdots \\ E(\theta x[N-1]) \end{bmatrix}. \tag{12.15}$$

These are the normal equations. The matrix is recognized as $\mathbf{C}_{xx}$, and the right-hand vector as $\mathbf{C}_{x\theta}$. Therefore,

$$\mathbf{C}_{xx}\mathbf{a} = \mathbf{C}_{x\theta} \tag{12.16}$$

and

$$\mathbf{a} = \mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta}. \tag{12.17}$$

The LMMSE estimator of $\theta$ is

$$\hat{\theta} = \mathbf{a}^T\mathbf{x} = \mathbf{C}_{x\theta}^T\mathbf{C}_{xx}^{-1}\mathbf{x}$$

or finally

$$\hat{\theta} = \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{x} \tag{12.18}$$

in agreement with (12.7).
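A short sketch (mine, not from the text) of the normal equations (12.15)-(12.16): given a zero-mean joint sample of $(\theta, \mathbf{x})$ with an arbitrary correlation structure, the solved weights satisfy the orthogonality condition (12.14) with the sample moments used in place of the true ones.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 4, 500_000                     # illustrative sizes (assumptions)

# Build a zero-mean joint sample of (theta, x) with some correlation
z = rng.normal(size=(M, N + 1)) @ rng.normal(size=(N + 1, N + 1))
theta, x = z[:, 0], z[:, 1:]

Cxx = (x.T @ x) / M                   # sample E(x[m]x[n]) entries of (12.15)
Cxth = (x.T @ theta) / M              # sample E(theta x[n]) entries
a = np.linalg.solve(Cxx, Cxth)        # normal equations (12.16) -> (12.17)

theta_hat = x @ a                     # (12.18) for zero-mean data
err = theta - theta_hat
print("E[err * x[n]] per n:", (x * err[:, None]).mean(axis=0))   # ~0, per (12.14)
```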
The minimum Bayesian MSE is the squared length of the error vector or

$$\mathrm{Bmse}(\hat{\theta}) = \|\epsilon\|^2 = E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)^2\right]$$

for the $a_n$'s given by (12.17). But

$$\mathrm{Bmse}(\hat{\theta}) = E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)\theta\right] - E\left[\left(\theta - \sum_{m=0}^{N-1} a_m x[m]\right)\sum_{n=0}^{N-1} a_n x[n]\right].$$

The last term is zero due to the orthogonality principle, producing

$$\mathrm{Bmse}(\hat{\theta}) = E(\theta^2) - \sum_{n=0}^{N-1} a_n E(\theta x[n]) = C_{\theta\theta} - \mathbf{a}^T\mathbf{C}_{x\theta} = C_{\theta\theta} - \mathbf{C}_{x\theta}^T\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta} = C_{\theta\theta} - \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta}$$

in agreement with (12.8).

Many important results can be easily derived when we view the LMMSE estimator in a vector space framework. In Section 12.6 we examine its application to sequential estimation. As a simple example of its utility in conceptualizing the estimation problem, we provide the following illustration.

Example 12.2 - Estimation by Orthogonal Vectors

Assume that $x[0]$ and $x[1]$ are zero mean and uncorrelated with each other. However, they are each correlated with $\theta$. Figure 12.4a illustrates this case.

[Figure 12.4: Linear estimation by orthogonal vectors]

The LMMSE estimator of $\theta$ based on $x[0]$ and $x[1]$ is the sum of the projections of $\theta$ on $x[0]$ and $x[1]$ as shown in Figure 12.4b or

$$\hat{\theta} = \hat{\theta}_0 + \hat{\theta}_1 = \left(\theta, \frac{x[0]}{\|x[0]\|}\right)\frac{x[0]}{\|x[0]\|} + \left(\theta, \frac{x[1]}{\|x[1]\|}\right)\frac{x[1]}{\|x[1]\|}.$$

Each component is the length of the projection $(\theta, x[n]/\|x[n]\|)$ times the unit vector in the $x[n]$ direction. Equivalently, since $\|x[n]\| = \sqrt{\mathrm{var}(x[n])}$ is a constant, this may be written

$$\hat{\theta} = \frac{(\theta, x[0])}{(x[0], x[0])}x[0] + \frac{(\theta, x[1])}{(x[1], x[1])}x[1].$$

By the definition of the inner product (12.10) we have

$$\hat{\theta} = \frac{E(\theta x[0])}{E(x^2[0])}x[0] + \frac{E(\theta x[1])}{E(x^2[1])}x[1] = \left[\,E(\theta x[0]) \;\; E(\theta x[1])\,\right]\begin{bmatrix} E(x^2[0]) & 0 \\ 0 & E(x^2[1]) \end{bmatrix}^{-1}\begin{bmatrix} x[0] \\ x[1] \end{bmatrix} = \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{x}.$$

Clearly, the ease with which this result was obtained is due to the orthogonality of $x[0], x[1]$ or, equivalently, the diagonal nature of $\mathbf{C}_{xx}$. For nonorthogonal data samples the same approach can be used if we first orthogonalize the samples or replace the data by uncorrelated samples that span the same subspace. We will have more to say about this approach in Section 12.6.

12.5 The Vector LMMSE Estimator

The vector LMMSE estimator is a straightforward extension of the scalar one. Now we wish to find the linear estimator that minimizes the Bayesian MSE for each element. We assume that

$$\hat{\theta}_i = \sum_{n=0}^{N-1} a_{in}x[n] + a_{iN} \tag{12.19}$$

for $i = 1, 2, \ldots, p$ and choose the weighting coefficients to minimize

$$\mathrm{Bmse}(\hat{\theta}_i) = E\left[(\theta_i - \hat{\theta}_i)^2\right] \qquad i = 1, 2, \ldots, p$$

where the expectation is with respect to $p(\mathbf{x}, \theta_i)$. Since we are actually determining $p$ separate estimators, the scalar solution can be applied, and we obtain from (12.6)

$$\hat{\theta}_i = E(\theta_i) + \mathbf{C}_{\theta_i x}\mathbf{C}_{xx}^{-1}(\mathbf{x} - E(\mathbf{x})) \qquad i = 1, 2, \ldots, p$$

and the minimum Bayesian MSE is from (12.8)

$$\mathrm{Bmse}(\hat{\theta}_i) = C_{\theta_i\theta_i} - \mathbf{C}_{\theta_i x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta_i} \qquad i = 1, 2, \ldots, p.$$

The scalar LMMSE estimators can be combined into a vector estimator as

$$\hat{\boldsymbol{\theta}} = \begin{bmatrix} E(\theta_1) \\ E(\theta_2) \\ \vdots \\ E(\theta_p) \end{bmatrix} + \begin{bmatrix} \mathbf{C}_{\theta_1 x}\mathbf{C}_{xx}^{-1}(\mathbf{x} - E(\mathbf{x})) \\ \mathbf{C}_{\theta_2 x}\mathbf{C}_{xx}^{-1}(\mathbf{x} - E(\mathbf{x})) \\ \vdots \\ \mathbf{C}_{\theta_p x}\mathbf{C}_{xx}^{-1}(\mathbf{x} - E(\mathbf{x})) \end{bmatrix} = E(\boldsymbol{\theta}) + \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}(\mathbf{x} - E(\mathbf{x})) \tag{12.20}$$

where now $\mathbf{C}_{\theta x}$ is a $p \times N$ matrix. By a similar approach we find that the Bayesian MSE matrix is (see Problem 12.7)

$$\mathbf{M}_{\hat{\theta}} = E\left[(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^T\right] = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta} \tag{12.21}$$

where $\mathbf{C}_{\theta\theta}$ is the $p \times p$ covariance matrix. Consequently, the minimum Bayesian MSE is (see Problem 12.7)

$$\mathrm{Bmse}(\hat{\theta}_i) = \left[\mathbf{M}_{\hat{\theta}}\right]_{ii}. \tag{12.22}$$

Of course, these results are identical to those for the MMSE estimator in the Gaussian case, for which the estimator is linear. Note that to determine the LMMSE estimator we require only the first two moments of the PDF.
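The vector form (12.20)-(12.22) translates directly into a few lines of code. The helper function below is my own illustration (not from the text), and the moment values fed to it are arbitrary assumptions.

```python
import numpy as np

def lmmse(mu_theta, mu_x, C_thth, C_thx, C_xx, x):
    """Vector LMMSE estimator (12.20) and Bayesian MSE matrix (12.21)."""
    gain = C_thx @ np.linalg.inv(C_xx)
    theta_hat = mu_theta + gain @ (x - mu_x)
    M = C_thth - gain @ C_thx.T
    return theta_hat, M

# Illustration with arbitrary (assumed) moments for p = 2, N = 3
mu_theta = np.array([1.0, -0.5])
mu_x = np.zeros(3)
C_thth = np.array([[2.0, 0.3], [0.3, 1.0]])
C_thx = np.array([[0.8, 0.2, 0.1], [0.1, 0.5, 0.4]])
C_xx = np.array([[1.5, 0.2, 0.0], [0.2, 1.2, 0.1], [0.0, 0.1, 1.0]])

theta_hat, M = lmmse(mu_theta, mu_x, C_thth, C_thx, C_xx, np.array([0.4, -0.2, 1.1]))
print(theta_hat)
print(np.diag(M))    # Bmse(theta_i), per (12.22)
```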
Two properties of the LMMSE estimator are particularly useful. The first one states that the LMMSE estimator commutes over linear (actually affine) transformations. That is to say, if

$$\boldsymbol{\alpha} = \mathbf{A}\boldsymbol{\theta} + \mathbf{b}$$

then the LMMSE estimator of $\boldsymbol{\alpha}$ is

$$\hat{\boldsymbol{\alpha}} = \mathbf{A}\hat{\boldsymbol{\theta}} + \mathbf{b} \tag{12.23}$$

with $\hat{\boldsymbol{\theta}}$ given by (12.20). The second property states that the LMMSE estimator of a sum of unknown parameters is the sum of the individual estimators. Specifically, if we wish to estimate $\boldsymbol{\alpha} = \boldsymbol{\theta}_1 + \boldsymbol{\theta}_2$, then

$$\hat{\boldsymbol{\alpha}} = \hat{\boldsymbol{\theta}}_1 + \hat{\boldsymbol{\theta}}_2 \tag{12.24}$$

where

$$\hat{\boldsymbol{\theta}}_1 = E(\boldsymbol{\theta}_1) + \mathbf{C}_{\theta_1 x}\mathbf{C}_{xx}^{-1}(\mathbf{x} - E(\mathbf{x}))$$
$$\hat{\boldsymbol{\theta}}_2 = E(\boldsymbol{\theta}_2) + \mathbf{C}_{\theta_2 x}\mathbf{C}_{xx}^{-1}(\mathbf{x} - E(\mathbf{x})).$$

The proof of these properties is left to the reader as an exercise (see Problem 12.8).

In analogy with the BLUE there is a corresponding Gauss-Markov theorem for the Bayesian case. It asserts that for data having the Bayesian linear model form, i.e., the Bayesian linear model without the Gaussian assumption, an optimal linear estimator exists. Optimality is measured by the Bayesian MSE. The theorem is just the application of our LMMSE estimator to the Bayesian linear model. More specifically, the data are assumed to be

$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$$

where $\boldsymbol{\theta}$ is a random vector to be estimated and has mean $E(\boldsymbol{\theta})$ and covariance $\mathbf{C}_{\theta\theta}$, $\mathbf{H}$ is a known observation matrix, $\mathbf{w}$ is a random vector with zero mean and covariance $\mathbf{C}_w$, and $\boldsymbol{\theta}$ and $\mathbf{w}$ are uncorrelated. Then, the LMMSE estimator of $\boldsymbol{\theta}$ is given by (12.20), where

$$E(\mathbf{x}) = \mathbf{H}E(\boldsymbol{\theta})$$
$$\mathbf{C}_{xx} = \mathbf{H}\mathbf{C}_{\theta\theta}\mathbf{H}^T + \mathbf{C}_w$$
$$\mathbf{C}_{\theta x} = \mathbf{C}_{\theta\theta}\mathbf{H}^T.$$

(See Section 10.6 for details.) This is summarized by the Bayesian Gauss-Markov theorem.

Theorem 12.1 (Bayesian Gauss-Markov Theorem) If the data are described by the Bayesian linear model form

$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w} \tag{12.25}$$

where $\mathbf{x}$ is an $N \times 1$ data vector, $\mathbf{H}$ is a known $N \times p$ observation matrix, $\boldsymbol{\theta}$ is a $p \times 1$ random vector of parameters whose realization is to be estimated and has mean $E(\boldsymbol{\theta})$ and covariance matrix $\mathbf{C}_{\theta\theta}$, and $\mathbf{w}$ is an $N \times 1$ random vector with zero mean and covariance matrix $\mathbf{C}_w$ and is uncorrelated with $\boldsymbol{\theta}$ (the joint PDF $p(\mathbf{w}, \boldsymbol{\theta})$ is otherwise arbitrary), then the LMMSE estimator of $\boldsymbol{\theta}$ is

$$\hat{\boldsymbol{\theta}} = E(\boldsymbol{\theta}) + \mathbf{C}_{\theta\theta}\mathbf{H}^T\left(\mathbf{H}\mathbf{C}_{\theta\theta}\mathbf{H}^T + \mathbf{C}_w\right)^{-1}\left(\mathbf{x} - \mathbf{H}E(\boldsymbol{\theta})\right) \tag{12.26}$$
$$= E(\boldsymbol{\theta}) + \left(\mathbf{C}_{\theta\theta}^{-1} + \mathbf{H}^T\mathbf{C}_w^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{C}_w^{-1}\left(\mathbf{x} - \mathbf{H}E(\boldsymbol{\theta})\right). \tag{12.27}$$

The performance of the estimator is measured by the error $\boldsymbol{\epsilon} = \boldsymbol{\theta} - \hat{\boldsymbol{\theta}}$, whose mean is zero and whose covariance matrix is

$$\mathbf{C}_\epsilon = E_{x,\theta}\left(\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T\right) = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta\theta}\mathbf{H}^T\left(\mathbf{H}\mathbf{C}_{\theta\theta}\mathbf{H}^T + \mathbf{C}_w\right)^{-1}\mathbf{H}\mathbf{C}_{\theta\theta} \tag{12.28}$$
$$= \left(\mathbf{C}_{\theta\theta}^{-1} + \mathbf{H}^T\mathbf{C}_w^{-1}\mathbf{H}\right)^{-1}. \tag{12.29}$$

The error covariance matrix is also the minimum MSE matrix $\mathbf{M}_{\hat{\theta}}$, whose diagonal elements yield the minimum Bayesian MSE

$$\left[\mathbf{M}_{\hat{\theta}}\right]_{ii} = \left[\mathbf{C}_\epsilon\right]_{ii} = \mathrm{Bmse}(\hat{\theta}_i). \tag{12.30}$$

These results are identical to those in Theorem 11.1 for the Bayesian linear model except that the error vector is not necessarily Gaussian. An example of the determination of this estimator and its minimum Bayesian MSE has already been given in Section 10.6. The Bayesian Gauss-Markov theorem states that within the class of linear estimators the one that minimizes the Bayesian MSE for each element of $\boldsymbol{\theta}$ is given by (12.26) or (12.27). It will not be optimal unless the conditional expectation $E(\boldsymbol{\theta}|\mathbf{x})$ happens to be linear. Such was the case for the jointly Gaussian PDF. Although suboptimal, the LMMSE estimator is in practice quite useful, being available in closed form and depending only on the means and covariances.
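A small numerical sketch (mine, not from the text) of Theorem 12.1: it evaluates both forms (12.26)/(12.28) and (12.27)/(12.29) for an arbitrary observation matrix and prior, and confirms they coincide. The sizes, covariances, and the Gaussian draws are all assumptions made only to produce concrete numbers; the theorem itself requires no Gaussian assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 8, 2                                   # assumed sizes for illustration
H = rng.normal(size=(N, p))
C_thth = np.diag([2.0, 0.5])
C_w = 0.3 * np.eye(N)
mu_th = np.array([1.0, -1.0])

theta = rng.multivariate_normal(mu_th, C_thth)
x = H @ theta + rng.multivariate_normal(np.zeros(N), C_w)

# Form (12.26) and error covariance (12.28)
K = C_thth @ H.T @ np.linalg.inv(H @ C_thth @ H.T + C_w)
th_hat_1 = mu_th + K @ (x - H @ mu_th)
C_eps_1 = C_thth - K @ H @ C_thth

# Equivalent form (12.27) and error covariance (12.29)
C_eps_2 = np.linalg.inv(np.linalg.inv(C_thth) + H.T @ np.linalg.inv(C_w) @ H)
th_hat_2 = mu_th + C_eps_2 @ H.T @ np.linalg.inv(C_w) @ (x - H @ mu_th)

print(np.allclose(th_hat_1, th_hat_2), np.allclose(C_eps_1, C_eps_2))
print("Bmse per component:", np.diag(C_eps_1))   # (12.30)
```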
12.6 Sequential LMMSE Estimation

In Chapter 8 we discussed the sequential LS procedure as the process of updating in time the LSE as new data become available. An analogous procedure can be used for LMMSE estimators. That this is possible follows from the vector space viewpoint. In Example 12.2 the LMMSE estimator was obtained by adding to the old estimate $\hat{\theta}_0$, that based on $x[0]$, the estimate $\hat{\theta}_1$, that based on the new datum $x[1]$. This was possible because $x[0]$ and $x[1]$ were orthogonal. When they are not, the algebra becomes more tedious, although the approach is similar.

Before stating the general results we will illustrate the basic operations involved by considering the DC level in white noise, which has the Bayesian linear model form. The derivation will be purely algebraic. Then we will repeat the derivation but appeal to the vector space approach. The reason for doing so will be to "lay the groundwork" for the Kalman filter in Chapter 13, which will be derived using the same approach. We will assume a zero mean for the DC level $A$, so that the vector space approach is applicable. To begin the algebraic derivation we have from Example 10.1 with $\mu_A = 0$ (see (10.11))

$$\hat{A}[N-1] = \frac{\sigma_A^2}{\sigma_A^2 + \dfrac{\sigma^2}{N}}\,\bar{x} = \frac{\sigma_A^2}{N\sigma_A^2 + \sigma^2}\sum_{n=0}^{N-1}x[n]$$

where $\hat{A}[N-1]$ denotes the LMMSE estimator based on $\{x[0], x[1], \ldots, x[N-1]\}$. This is because the LMMSE estimator is identical in form to the MMSE estimator for the Gaussian case. Also, we note that from (10.14)

$$\mathrm{Bmse}(\hat{A}[N-1]) = \frac{\sigma_A^2\sigma^2}{N\sigma_A^2 + \sigma^2}. \tag{12.31}$$

To update our estimator as $x[N]$ becomes available

$$\begin{aligned}
\hat{A}[N] &= \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}\sum_{n=0}^{N}x[n]
= \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}\left(\frac{N\sigma_A^2 + \sigma^2}{\sigma_A^2}\hat{A}[N-1] + x[N]\right)\\
&= \frac{N\sigma_A^2 + \sigma^2}{(N+1)\sigma_A^2 + \sigma^2}\hat{A}[N-1] + \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}x[N]\\
&= \hat{A}[N-1] + \left(\frac{N\sigma_A^2 + \sigma^2}{(N+1)\sigma_A^2 + \sigma^2} - 1\right)\hat{A}[N-1] + \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}x[N]\\
&= \hat{A}[N-1] + \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}\left(x[N] - \hat{A}[N-1]\right).
\end{aligned} \tag{12.32}$$

Similar to the sequential LS estimator we correct the old estimate $\hat{A}[N-1]$ by a scaled version of the prediction error $x[N] - \hat{A}[N-1]$. The scaling or gain factor is, from (12.32) and (12.31),

$$K[N] = \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2} = \frac{\mathrm{Bmse}(\hat{A}[N-1])}{\mathrm{Bmse}(\hat{A}[N-1]) + \sigma^2} \tag{12.33}$$

and decreases to zero as $N \to \infty$, reflecting the increased confidence in the old estimator. We can also update the minimum Bayesian MSE since from (12.31)

$$\mathrm{Bmse}(\hat{A}[N]) = \frac{\sigma_A^2\sigma^2}{(N+1)\sigma_A^2 + \sigma^2} = \left(1 - \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}\right)\frac{\sigma_A^2\sigma^2}{N\sigma_A^2 + \sigma^2} = (1 - K[N])\,\mathrm{Bmse}(\hat{A}[N-1]).$$

Summarizing our results we have the following sequential LMMSE estimator.

Estimator Update:

$$\hat{A}[N] = \hat{A}[N-1] + K[N]\left(x[N] - \hat{A}[N-1]\right) \tag{12.34}$$

where

$$K[N] = \frac{\mathrm{Bmse}(\hat{A}[N-1])}{\mathrm{Bmse}(\hat{A}[N-1]) + \sigma^2}. \tag{12.35}$$

Minimum MSE Update:

$$\mathrm{Bmse}(\hat{A}[N]) = (1 - K[N])\,\mathrm{Bmse}(\hat{A}[N-1]). \tag{12.36}$$
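As a quick check (my own sketch, with arbitrary values of $\sigma_A^2$ and $\sigma^2$), the recursions (12.34)-(12.36) can be run sample by sample and compared with the batch form $\hat{A}[N-1] = \sigma_A^2/(\sigma_A^2 + \sigma^2/N)\,\bar{x}$; the two agree.

```python
import numpy as np

rng = np.random.default_rng(6)
var_A, sigma2, N = 2.0, 1.0, 50       # assumed values
A = rng.normal(0.0, np.sqrt(var_A))
x = A + rng.normal(0.0, np.sqrt(sigma2), N)

A_hat, bmse = 0.0, var_A              # start from the prior mean E(A)=0 and prior variance
for n in range(N):
    K = bmse / (bmse + sigma2)        # (12.35)
    A_hat = A_hat + K * (x[n] - A_hat)   # (12.34)
    bmse = (1 - K) * bmse             # (12.36)

A_hat_batch = var_A / (var_A + sigma2 / N) * x.mean()
print(A_hat, A_hat_batch)             # agree to numerical precision
```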
We now use the vector space viewpoint to derive the same results. (The general sequential LMMSE estimator is derived in Appendix 12A using the vector space approach. It is a straightforward extension of the following derivation.) Assume that the LMMSE estimator $\hat{A}[1]$ is to be found, which is based on the data $\{x[0], x[1]\}$ as shown in Figure 12.5a. Because $x[0]$ and $x[1]$ are not orthogonal, we cannot simply add the estimate based on $x[0]$ to the estimate based on $x[1]$. If we did, we would be adding in an extra component along $x[0]$. However, we can form $\hat{A}[1]$ as the sum of $\hat{A}[0]$ and a component orthogonal to $\hat{A}[0]$ as shown in Figure 12.5b. That component is $\Delta\hat{A}[1]$. To find a vector in the direction of this component recall that the LMMSE estimator has the property that the error is orthogonal to the data. Hence, if we find the LMMSE estimator of $x[1]$ based on $x[0]$, call it $\hat{x}[1|0]$, the error $x[1] - \hat{x}[1|0]$ will be orthogonal to $x[0]$. This is shown in Figure 12.5c.

[Figure 12.5: Sequential estimation using vector space approach]

Since $x[0]$ and $x[1] - \hat{x}[1|0]$ are orthogonal, we can project $A$ onto each vector separately and add the results, so that

$$\hat{A}[1] = \hat{A}[0] + \Delta\hat{A}[1].$$

To find the correction term $\Delta\hat{A}[1]$ first note that the LMMSE estimator of a random variable $y$ based on $x$, where both are zero mean, is from (12.7)

$$\hat{y} = \frac{E(xy)}{E(x^2)}x. \tag{12.37}$$

Thus, we have that

$$\hat{x}[1|0] = \frac{E(x[0]x[1])}{E(x^2[0])}x[0] = \frac{E\left[(A + w[0])(A + w[1])\right]}{E\left[(A + w[0])^2\right]}x[0] = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0] \tag{12.38}$$

and the error vector $\tilde{x}[1] = x[1] - \hat{x}[1|0]$ represents the new information that $x[1]$ contributes to the estimation of $A$. As such, it is called the innovation. The projection of $A$ along this error vector is the desired correction

$$\Delta\hat{A}[1] = \left(A, \frac{\tilde{x}[1]}{\|\tilde{x}[1]\|}\right)\frac{\tilde{x}[1]}{\|\tilde{x}[1]\|} = \frac{E(A\tilde{x}[1])}{E(\tilde{x}^2[1])}\tilde{x}[1].$$

If we let $K[1] = E(A\tilde{x}[1])/E(\tilde{x}^2[1])$, we have for the LMMSE estimator

$$\hat{A}[1] = \hat{A}[0] + K[1]\left(x[1] - \hat{x}[1|0]\right).$$

To evaluate $\hat{x}[1|0]$ we note that $x[1] = A + w[1]$. Hence, from the additive property (12.24)

$$\hat{x}[1|0] = \hat{A}[0] + \hat{w}[1|0].$$

Since $w[1]$ is uncorrelated with $w[0]$, we have $\hat{w}[1|0] = 0$. Finally, then, $\hat{x}[1|0] = \hat{A}[0]$, and $\hat{A}[0]$ is found as

$$\hat{A}[0] = \frac{E(Ax[0])}{E(x^2[0])}x[0] = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0].$$

Thus,

$$\hat{A}[1] = \hat{A}[0] + K[1]\left(x[1] - \hat{A}[0]\right).$$

It remains only to determine the gain $K[1]$ for the correction. Since

$$\tilde{x}[1] = x[1] - \hat{x}[1|0] = x[1] - \hat{A}[0] = x[1] - \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0]$$

the gain becomes

$$K[1] = \frac{E\left[A\left(x[1] - \dfrac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0]\right)\right]}{E\left[\left(x[1] - \dfrac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0]\right)^2\right]}$$

which agrees with (12.33) for $N = 1$. We can continue this procedure to find $\hat{A}[2], \hat{A}[3], \ldots$. In general, we

1. find the LMMSE estimator of $A$ based on $x[0]$, yielding $\hat{A}[0]$
2. find the LMMSE estimator of $x[1]$ based on $x[0]$, yielding $\hat{x}[1|0]$
3. determine the innovation of the new datum $x[1]$, yielding $x[1] - \hat{x}[1|0]$
4. add to $\hat{A}[0]$ the LMMSE estimator of $A$ based on the innovation, yielding $\hat{A}[1]$
5. continue the process.

In essence we are generating a set of uncorrelated or orthogonal random variables, namely, the innovations $\{x[0],\, x[1] - \hat{x}[1|0],\, x[2] - \hat{x}[2|0,1], \ldots\}$. This procedure is termed a Gram-Schmidt orthogonalization (see Problem 12.10).
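A small numerical sketch (mine, with assumed values of $\sigma_A^2$ and $\sigma^2$) of the innovations idea: the innovation $x[1] - \hat{x}[1|0]$ formed from (12.38) is uncorrelated with $x[0]$, so the projections of $A$ onto $x[0]$ and onto the innovation can simply be added, reproducing the batch LMMSE estimator based on two samples.

```python
import numpy as np

rng = np.random.default_rng(7)
var_A, sigma2, M = 2.0, 1.0, 200_000      # assumed values; M Monte Carlo realizations
A = rng.normal(0.0, np.sqrt(var_A), M)
x0 = A + rng.normal(0.0, np.sqrt(sigma2), M)
x1 = A + rng.normal(0.0, np.sqrt(sigma2), M)

c = var_A / (var_A + sigma2)              # x_hat[1|0] = c*x[0], from (12.38)
innov = x1 - c * x0                       # the innovation
print("corr(innovation, x[0]) ~", np.corrcoef(innov, x0)[0, 1])   # ~0

# Project A onto x[0] and onto the innovation (exact moments) and add the results
K0 = var_A / (var_A + sigma2)                                  # E(A x[0]) / E(x^2[0])
K1 = (var_A * sigma2 / (var_A + sigma2)) / (sigma2 * (sigma2 + 2 * var_A) / (var_A + sigma2))
A_hat_seq = K0 * x0 + K1 * innov          # K1 equals (12.33) with N = 1
A_hat_batch = var_A / (var_A + sigma2 / 2) * (x0 + x1) / 2     # batch LMMSE from two samples
print(np.allclose(A_hat_seq, A_hat_batch))                     # True
```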
To find the LMMSE estimator of $A$ based on $\{x[0], x[1], \ldots, x[N-1]\}$ we simply add the individual estimators to yield

$$\hat{A}[N-1] = \sum_{n=0}^{N-1}K[n]\left(x[n] - \hat{x}[n|0,1,\ldots,n-1]\right) \tag{12.39}$$

where each gain factor is

$$K[n] = \frac{E\left[A\left(x[n] - \hat{x}[n|0,1,\ldots,n-1]\right)\right]}{E\left[\left(x[n] - \hat{x}[n|0,1,\ldots,n-1]\right)^2\right]}. \tag{12.40}$$

This simple form is due to the uncorrelated property of the innovations. In sequential form (12.39) becomes

$$\hat{A}[N] = \hat{A}[N-1] + K[N]\left(x[N] - \hat{x}[N|0,1,\ldots,N-1]\right). \tag{12.41}$$

To complete the derivation we must find $\hat{x}[N|0,1,\ldots,N-1]$ and the gain $K[N]$. We first use the additive property (12.24). Since $A$ and $w[N]$ are uncorrelated and zero mean, the LMMSE estimator of $x[N] = A + w[N]$ is

$$\hat{x}[N|0,1,\ldots,N-1] = \hat{A}[N|0,1,\ldots,N-1] + \hat{w}[N|0,1,\ldots,N-1].$$

But $\hat{A}[N|0,1,\ldots,N-1]$ is by definition the LMMSE estimator of the realization of $A$ at time $N$ based on $\{x[0], x[1], \ldots, x[N-1]\}$ or $\hat{A}[N-1]$ (recalling that $A$ does not change over the observation interval). Also, $\hat{w}[N|0,1,\ldots,N-1]$ is zero since $w[N]$ is uncorrelated with the previous data samples, because $A$ and $w[n]$ are uncorrelated and $w[n]$ is white noise. Thus,

$$\hat{x}[N|0,1,\ldots,N-1] = \hat{A}[N-1]. \tag{12.42}$$

To find the gain we use (12.40) and (12.42):

$$K[N] = \frac{E\left[A\left(x[N] - \hat{A}[N-1]\right)\right]}{E\left[\left(x[N] - \hat{A}[N-1]\right)^2\right]}.$$

But

$$E\left[A\left(x[N] - \hat{A}[N-1]\right)\right] = E\left[\left(A - \hat{A}[N-1]\right)\left(x[N] - \hat{A}[N-1]\right)\right] \tag{12.43}$$

since $x[N] - \hat{A}[N-1]$ is the innovation of $x[N]$, which is orthogonal to $\{x[0], x[1], \ldots, x[N-1]\}$ and hence to $\hat{A}[N-1]$ (a linear combination of these data samples). Also, $E\left[w[N]\left(A - \hat{A}[N-1]\right)\right] = 0$ as explained previously, so that

$$E\left[A\left(x[N] - \hat{A}[N-1]\right)\right] = E\left[\left(A - \hat{A}[N-1]\right)^2\right] = \mathrm{Bmse}(\hat{A}[N-1]). \tag{12.44}$$

Also,

$$E\left[\left(x[N] - \hat{A}[N-1]\right)^2\right] = E\left[\left(w[N] + A - \hat{A}[N-1]\right)^2\right] = E\left(w^2[N]\right) + E\left[\left(A - \hat{A}[N-1]\right)^2\right] = \sigma^2 + \mathrm{Bmse}(\hat{A}[N-1]) \tag{12.45}$$

so that finally

$$K[N] = \frac{\mathrm{Bmse}(\hat{A}[N-1])}{\sigma^2 + \mathrm{Bmse}(\hat{A}[N-1])}. \tag{12.46}$$

The minimum MSE update can be obtained as

$$\begin{aligned}
\mathrm{Bmse}(\hat{A}[N]) &= E\left[\left(A - \hat{A}[N]\right)^2\right] = E\left[\left(A - \hat{A}[N-1] - K[N]\left(x[N] - \hat{A}[N-1]\right)\right)^2\right]\\
&= E\left[\left(A - \hat{A}[N-1]\right)^2\right] - 2K[N]E\left[\left(A - \hat{A}[N-1]\right)\left(x[N] - \hat{A}[N-1]\right)\right] + K^2[N]E\left[\left(x[N] - \hat{A}[N-1]\right)^2\right]\\
&= \mathrm{Bmse}(\hat{A}[N-1]) - 2K[N]\mathrm{Bmse}(\hat{A}[N-1]) + K^2[N]\left(\sigma^2 + \mathrm{Bmse}(\hat{A}[N-1])\right)
\end{aligned}$$

where we have used (12.43)-(12.45). Using (12.46), we have

$$\mathrm{Bmse}(\hat{A}[N]) = (1 - K[N])\,\mathrm{Bmse}(\hat{A}[N-1]).$$

We have now derived the sequential LMMSE estimator equations based on the vector space viewpoint.

In Appendix 12A we generalize the preceding vector space derivation for the sequential vector LMMSE estimator. To do so we must assume the data has the Bayesian linear model form (see Theorem 12.1) and that $\mathbf{C}_w$ is a diagonal matrix, so that the $w[n]$'s are uncorrelated with variance $E(w^2[n]) = \sigma_n^2$. The latter is a critical assumption and was used in the previous example. Fortuitously, the set of equations obtained is valid for the case of nonzero means as well (see Appendix 12A). Hence, the equations to follow are the sequential implementation of the vector LMMSE estimator of (12.26) or (12.27) when the noise covariance matrix $\mathbf{C}_w$ is diagonal. To define the equations we let $\hat{\boldsymbol{\theta}}[n]$ be the LMMSE estimator based on $\{x[0], x[1], \ldots, x[n]\}$, and $\mathbf{M}[n]$ be the corresponding minimum MSE matrix (just the sequential version of (12.28) or (12.29)) or

$$\mathbf{M}[n] = E\left[(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}[n])(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}[n])^T\right].$$

Also, we partition the $(n+1) \times p$ observation matrix as

$$\mathbf{H}[n] = \begin{bmatrix} \mathbf{H}[n-1] \\ \mathbf{h}^T[n] \end{bmatrix} \quad \begin{matrix} n \times p \\ 1 \times p. \end{matrix}$$

Then, the sequential LMMSE estimator becomes (see Appendix 12A)

Estimator Update:

$$\hat{\boldsymbol{\theta}}[n] = \hat{\boldsymbol{\theta}}[n-1] + \mathbf{K}[n]\left(x[n] - \mathbf{h}^T[n]\hat{\boldsymbol{\theta}}[n-1]\right) \tag{12.47}$$

where

$$\mathbf{K}[n] = \frac{\mathbf{M}[n-1]\mathbf{h}[n]}{\sigma_n^2 + \mathbf{h}^T[n]\mathbf{M}[n-1]\mathbf{h}[n]}. \tag{12.48}$$

Minimum MSE Matrix Update:

$$\mathbf{M}[n] = \left(\mathbf{I} - \mathbf{K}[n]\mathbf{h}^T[n]\right)\mathbf{M}[n-1]. \tag{12.49}$$

The gain factor $\mathbf{K}[n]$ is a $p \times 1$ vector, and the minimum MSE matrix has dimension $p \times p$. The entire estimator is summarized in Figure 12.6, where the thick arrows indicate vector processing.

[Figure 12.6: Sequential linear minimum mean square error estimator]

To start the recursion we need to specify initial values for $\hat{\boldsymbol{\theta}}[n-1]$ and $\mathbf{M}[n-1]$, so that $\mathbf{K}[n]$ can be determined from (12.48) and then $\hat{\boldsymbol{\theta}}[n]$ from (12.47). To do so we specify $\hat{\boldsymbol{\theta}}[-1]$ and $\mathbf{M}[-1]$ so that the recursion can begin at $n = 0$. Since no data have been observed at $n = -1$, then from (12.19) our estimator becomes the constant $\hat{\theta}_i = a_{iN}$. It is easily shown that the LMMSE estimator is just the mean of $\theta_i$ (see also (12.3)) or

$$\hat{\boldsymbol{\theta}}[-1] = E(\boldsymbol{\theta}).$$

As a result, the minimum MSE matrix is

$$\mathbf{M}[-1] = E\left[(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}[-1])(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}[-1])^T\right] = \mathbf{C}_{\theta\theta}.$$

Several observations are in order.

1. For no prior knowledge about $\boldsymbol{\theta}$ we can let $\mathbf{C}_{\theta\theta} \to \infty$. Then we have the same form as the sequential LSE (see Section 8.7), although the approaches are fundamentally different.
2. No matrix inversions are required.
3. The gain factor $\mathbf{K}[n]$ depends on our confidence in the new data sample, as measured by $\sigma_n^2$, versus that in the previous data, as summarized by $\mathbf{M}[n-1]$.

An example follows.

Example 12.3 - Bayesian Fourier Analysis

We now use the sequential approach to compute the LMMSE estimator in Example 11.1. The data model is

$$x[n] = a\cos 2\pi f_0 n + b\sin 2\pi f_0 n + w[n] \qquad n = 0, 1, \ldots, N-1$$

where $f_0$ is any frequency (in Example 11.1 we assumed $f_0$ to be a multiple of $1/N$), excepting 0 and 1/2 for which $\sin 2\pi f_0 n$ is identically zero, and $w[n]$ is white noise with variance $\sigma^2$. It is desired to estimate $\boldsymbol{\theta} = [a\; b]^T$ sequentially. We further assume that the Bayesian linear model form applies and that $\boldsymbol{\theta}$ has mean $E(\boldsymbol{\theta})$ and covariance $\sigma_\theta^2\mathbf{I}$. The sequential estimator is initialized by

$$\hat{\boldsymbol{\theta}}[-1] = E(\boldsymbol{\theta})$$
$$\mathbf{M}[-1] = \mathbf{C}_{\theta\theta} = \sigma_\theta^2\mathbf{I}.$$

Then, from (12.47)

$$\hat{\boldsymbol{\theta}}[0] = \hat{\boldsymbol{\theta}}[-1] + \mathbf{K}[0]\left(x[0] - \mathbf{h}^T[0]\hat{\boldsymbol{\theta}}[-1]\right).$$

The $\mathbf{h}^T[0]$ vector is the first row of the observation matrix or

$$\mathbf{h}^T[0] = [1 \quad 0].$$

Subsequent rows are $[\cos 2\pi f_0 n \quad \sin 2\pi f_0 n]$. The $2 \times 1$ gain vector is, from (12.48),

$$\mathbf{K}[0] = \frac{\mathbf{M}[-1]\mathbf{h}[0]}{\sigma^2 + \mathbf{h}^T[0]\mathbf{M}[-1]\mathbf{h}[0]}$$

where $\mathbf{M}[-1] = \sigma_\theta^2\mathbf{I}$. Once the gain vector has been found, $\hat{\boldsymbol{\theta}}[0]$ can be computed. Finally, the MSE matrix update is, from (12.49),

$$\mathbf{M}[0] = \left(\mathbf{I} - \mathbf{K}[0]\mathbf{h}^T[0]\right)\mathbf{M}[-1]$$

and the procedure continues in like manner for $n \geq 1$.
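As an illustration of (12.47)-(12.49), the following Python sketch (mine, not from the text) runs the sequential estimator on the Bayesian Fourier model of Example 12.3 and checks it against the batch form (12.26). The values of $N$, $f_0$, $\sigma^2$, and $\sigma_\theta^2$ are arbitrary assumptions, and a zero prior mean is assumed.

```python
import numpy as np

rng = np.random.default_rng(8)
N, f0, sigma2, var_th = 64, 0.1, 0.5, 4.0     # assumed values for illustration
theta_true = rng.normal(0.0, np.sqrt(var_th), 2)          # [a, b]
n = np.arange(N)
H = np.column_stack((np.cos(2*np.pi*f0*n), np.sin(2*np.pi*f0*n)))
x = H @ theta_true + rng.normal(0.0, np.sqrt(sigma2), N)

theta_hat = np.zeros(2)          # theta_hat[-1] = E(theta), assumed zero
M = var_th * np.eye(2)           # M[-1] = C_theta_theta
for i in range(N):
    h = H[i]
    K = M @ h / (sigma2 + h @ M @ h)                     # (12.48)
    theta_hat = theta_hat + K * (x[i] - h @ theta_hat)   # (12.47)
    M = (np.eye(2) - np.outer(K, h)) @ M                 # (12.49)

# Batch check against (12.26) with zero prior mean
K_batch = var_th * H.T @ np.linalg.inv(var_th * H @ H.T + sigma2 * np.eye(N))
print(theta_hat, K_batch @ x)    # the two agree
```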
12.7 Signal Processing Examples - Wiener Filtering

We now examine in detail some of the important applications of the LMMSE estimator. In doing so we will assume that the data $\{x[0], x[1], \ldots, x[N-1]\}$ is WSS with zero mean. As such, the $N \times N$ covariance matrix $\mathbf{C}_{xx}$ takes the symmetric Toeplitz form

$$\mathbf{C}_{xx} = \begin{bmatrix} r_{xx}[0] & r_{xx}[1] & \cdots & r_{xx}[N-1] \\ r_{xx}[1] & r_{xx}[0] & \cdots & r_{xx}[N-2] \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}[N-1] & r_{xx}[N-2] & \cdots & r_{xx}[0] \end{bmatrix} = \mathbf{R}_{xx} \tag{12.50}$$

where $r_{xx}[k]$ is the ACF of the $x[n]$ process and $\mathbf{R}_{xx}$ denotes the autocorrelation matrix. Furthermore, the parameter $\theta$ to be estimated is also assumed to be zero mean. The group of applications that we will describe are generically called Wiener filters. There are three main problems that we will study. They are (see Figure 12.7)

1. Filtering, where $\theta = s[n]$ is to be estimated based on $x[m] = s[m] + w[m]$ for $m = 0, 1, \ldots, n$. The sequences $s[n]$ and $w[n]$ represent signal and noise processes. The problem is to filter the signal from the noise. Note that the signal sample is estimated based on the present and past data only, so that as $n$ increases, we view the estimation process as the application of a causal filter to the data.

2. Smoothing, where $\theta = s[n]$ is to be estimated for $n = 0, 1, \ldots, N-1$ based on the data set $\{x[0], x[1], \ldots, x[N-1]\}$, where $x[n] = s[n] + w[n]$. In contrast to filtering we are allowed to use future data. For instance, to estimate $s[1]$ we can use the entire data set $\{x[0], x[1], \ldots, x[N-1]\}$, while in filtering we are constrained to use only $\{x[0], x[1]\}$. Clearly, in smoothing an estimate cannot be obtained until all the data have been collected.

3. Prediction, where $\theta = x[N-1+l]$ for $l$ a positive integer is to be estimated based on $\{x[0], x[1], \ldots, x[N-1]\}$. This is referred to as the $l$-step prediction problem.

[Figure 12.7: Wiener filtering problem definitions — (a) filtering, (b) smoothing, (c) prediction, (d) interpolation]

A related problem that is explored in Problem 12.14 is to estimate $x[n]$ based on $\{x[0], \ldots, x[n-1], x[n+1], \ldots, x[N-1]\}$ and is termed the interpolation problem. To solve all three problems we use (12.20) with $E(\boldsymbol{\theta}) = E(\mathbf{x}) = \mathbf{0}$ or

$$\hat{\boldsymbol{\theta}} = \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{x} \tag{12.51}$$

and the minimum Bayesian MSE matrix given by (12.21)

$$\mathbf{M}_{\hat{\theta}} = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta}. \tag{12.52}$$

Consider first the smoothing problem. We wish to estimate $\boldsymbol{\theta} = \mathbf{s} = [s[0]\; s[1] \ldots s[N-1]]^T$ based on $\mathbf{x} = [x[0]\; x[1] \ldots x[N-1]]^T$. We will make the reasonable assumption that the signal and noise processes are uncorrelated. Hence,

$$r_{xx}[k] = r_{ss}[k] + r_{ww}[k].$$

Then, we have

$$\mathbf{C}_{xx} = \mathbf{R}_{xx} = \mathbf{R}_{ss} + \mathbf{R}_{ww}.$$

Also,

$$\mathbf{C}_{\theta x} = E(\mathbf{s}\mathbf{x}^T) = E\left[\mathbf{s}(\mathbf{s} + \mathbf{w})^T\right] = \mathbf{R}_{ss}.$$

Therefore, the Wiener estimator of the signal is, from (12.51),

$$\hat{\mathbf{s}} = \mathbf{R}_{ss}\left(\mathbf{R}_{ss} + \mathbf{R}_{ww}\right)^{-1}\mathbf{x}. \tag{12.53}$$

The $N \times N$ matrix

$$\mathbf{W} = \mathbf{R}_{ss}\left(\mathbf{R}_{ss} + \mathbf{R}_{ww}\right)^{-1} \tag{12.54}$$

is referred to as the Wiener smoothing matrix. The corresponding minimum MSE matrix is, from (12.52),

$$\mathbf{M}_{\hat{s}} = \mathbf{R}_{ss} - \mathbf{R}_{ss}\left(\mathbf{R}_{ss} + \mathbf{R}_{ww}\right)^{-1}\mathbf{R}_{ss} = (\mathbf{I} - \mathbf{W})\mathbf{R}_{ss}. \tag{12.55}$$

As an example, if $N = 1$ we would wish to estimate $s[0]$ based on $x[0] = s[0] + w[0]$. Then, the Wiener smoother $\mathbf{W}$ is just a scalar $W$ given by

$$W = \frac{r_{ss}[0]}{r_{ss}[0] + r_{ww}[0]} \tag{12.56}$$

where $\eta = r_{ss}[0]/r_{ww}[0]$ is the SNR. For a high SNR, so that $W \to 1$, we have $\hat{s}[0] \to x[0]$, while for a low SNR, so that $W \to 0$, we have $\hat{s}[0] \to 0$. The corresponding minimum MSE is

$$M_{\hat{s}} = (1 - W)r_{ss}[0] = \left(1 - \frac{\eta}{\eta + 1}\right)r_{ss}[0]$$

which for these two extremes is either 0 for a high SNR or $r_{ss}[0]$ for a low SNR. A numerical example of a Wiener smoother has been given in Section 11.7. See also Problem 12.15.
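The following short Python sketch (not from the text) applies the Wiener smoother (12.53)-(12.55) to a synthetic record; the signal ACF is an assumed AR(1) model and the noise is white, both arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 200
# Assumed ACFs: AR(1)-like low-pass signal plus white noise
a, var_u, var_w = 0.9, 1.0, 1.0
r_ss = (var_u / (1 - a**2)) * a ** np.arange(N)   # signal ACF (assumed model)
r_ww = np.zeros(N); r_ww[0] = var_w               # white-noise ACF

idx = np.abs(np.arange(N)[:, None] - np.arange(N))  # Toeplitz index pattern
Rss, Rww = r_ss[idx], r_ww[idx]
W = Rss @ np.linalg.inv(Rss + Rww)                # Wiener smoothing matrix (12.54)

# Apply to one synthetic realization x = s + w
s = np.zeros(N)
s[0] = rng.normal(0.0, np.sqrt(var_u / (1 - a**2)))  # start in steady state
for n in range(1, N):
    s[n] = a * s[n-1] + rng.normal(0.0, np.sqrt(var_u))
x = s + rng.normal(0.0, np.sqrt(var_w), N)
s_hat = W @ x                                     # (12.53)

M = (np.eye(N) - W) @ Rss                         # minimum MSE matrix (12.55)
print("empirical MSE:", np.mean((s - s_hat)**2), " predicted (avg diag M):", np.mean(np.diag(M)))
```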
We next consider the filtering problem in which we wish to estimate $\theta = s[n]$ based on $\mathbf{x} = [x[0]\; x[1] \ldots x[n]]^T$. The problem is repeated for each value of $n$ until the entire signal $s[n]$ for $n = 0, 1, \ldots, N-1$ has been estimated. As before, $x[n] = s[n] + w[n]$, where $s[n]$ and $w[n]$ are signal and noise processes that are uncorrelated with each other. Thus,

$$\mathbf{C}_{xx} = \mathbf{R}_{ss} + \mathbf{R}_{ww}$$

where $\mathbf{R}_{ss}, \mathbf{R}_{ww}$ are $(n+1) \times (n+1)$ autocorrelation matrices. Also,

$$\mathbf{C}_{\theta x} = E\left(s[n]\left[\,x[0]\; x[1] \ldots x[n]\,\right]\right) = E\left(s[n]\left[\,s[0]\; s[1] \ldots s[n]\,\right]\right) = \left[\,r_{ss}[n]\; r_{ss}[n-1] \ldots r_{ss}[0]\,\right].$$

Letting the latter row vector be denoted as $\mathbf{r}_{ss}'^T$, we have from (12.51)

$$\hat{s}[n] = \mathbf{r}_{ss}'^T\left(\mathbf{R}_{ss} + \mathbf{R}_{ww}\right)^{-1}\mathbf{x}. \tag{12.57}$$

The $(n+1) \times 1$ vector of weights is seen to be

$$\mathbf{a} = \left(\mathbf{R}_{ss} + \mathbf{R}_{ww}\right)^{-1}\mathbf{r}_{ss}'$$

recalling that $\hat{s}[n] = \mathbf{a}^T\mathbf{x}$, where $\mathbf{a} = [a_0\; a_1 \ldots a_n]^T$ as in our original formulation of the scalar LMMSE estimator of (12.1). We may interpret the process of forming the estimator as $n$ increases as a filtering operation if we define a time varying impulse response $h^{(n)}[k]$. Specifically, we let $h^{(n)}[k]$ be the response of a filter at time $n$ to an impulse applied $k$ samples before. To make the filtering correspondence we let

$$h^{(n)}[k] = a_{n-k} \qquad k = 0, 1, \ldots, n.$$

Note for future reference that the vector $\mathbf{h} = [h^{(n)}[0]\; h^{(n)}[1] \ldots h^{(n)}[n]]^T$ is just the vector $\mathbf{a}$ when flipped upside down. Then,

$$\hat{s}[n] = \sum_{k=0}^{n}a_k x[k] = \sum_{k=0}^{n}h^{(n)}[k]x[n-k] \tag{12.58}$$

which is recognized as a time varying FIR filter. To explicitly find the impulse response $\mathbf{h}$ we note that since

$$\left(\mathbf{R}_{ss} + \mathbf{R}_{ww}\right)\mathbf{a} = \mathbf{r}_{ss}'$$

it follows that

$$\left(\mathbf{R}_{ss} + \mathbf{R}_{ww}\right)\mathbf{h} = \mathbf{r}_{ss}$$

where $\mathbf{r}_{ss} = [r_{ss}[0]\; r_{ss}[1] \ldots r_{ss}[n]]^T$. This result depends on the symmetric Toeplitz nature of $\mathbf{R}_{ss} + \mathbf{R}_{ww}$ and the fact that $\mathbf{h}$ is just $\mathbf{a}$ when flipped upside down. Written out, the set of linear equations becomes

$$\begin{bmatrix} r_{xx}[0] & r_{xx}[1] & \cdots & r_{xx}[n] \\ r_{xx}[1] & r_{xx}[0] & \cdots & r_{xx}[n-1] \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}[n] & r_{xx}[n-1] & \cdots & r_{xx}[0] \end{bmatrix}\begin{bmatrix} h^{(n)}[0] \\ h^{(n)}[1] \\ \vdots \\ h^{(n)}[n] \end{bmatrix} = \begin{bmatrix} r_{ss}[0] \\ r_{ss}[1] \\ \vdots \\ r_{ss}[n] \end{bmatrix} \tag{12.59}$$

where $r_{xx}[k] = r_{ss}[k] + r_{ww}[k]$. These are the Wiener-Hopf filtering equations. It appears that they must be solved for each value of $n$. However, a computationally efficient algorithm for doing so is the Levinson recursion, which solves the equations recursively to avoid resolving them for each value of $n$ [Marple 1987]. For large enough $n$ it can be shown that the filter becomes time invariant, so that only a single solution is necessary. In this case an analytical solution may be possible. The Wiener-Hopf filtering equations can be written as

$$\sum_{k=0}^{n}h^{(n)}[k]r_{xx}[l-k] = r_{ss}[l] \qquad l = 0, 1, \ldots, n$$

where we have used the property $r_{xx}[-k] = r_{xx}[k]$. As $n \to \infty$, we have upon replacing the time varying impulse response $h^{(n)}[k]$ by its time invariant version $h[k]$

$$\sum_{k=0}^{\infty}h[k]r_{xx}[l-k] = r_{ss}[l] \qquad l \geq 0. \tag{12.60}$$

The same set of equations results if we attempt to estimate $s[n]$ based on the present and infinite past, or based on $x[m]$ for $m \leq n$. This is termed the infinite Wiener filter. To see why this is so we let

$$\hat{s}[n] = \sum_{k=0}^{\infty}h[k]x[n-k]$$

and use the orthogonality principle (12.13). Then, we have

$$\left(s[n] - \hat{s}[n]\right) \perp x[n], x[n-1], \ldots$$

or by using the definition of orthogonality

$$E\left[\left(s[n] - \hat{s}[n]\right)x[n-l]\right] = 0 \qquad l = 0, 1, \ldots.$$

Hence,

$$E\left[\sum_{k=0}^{\infty}h[k]x[n-k]\,x[n-l]\right] = E\left(s[n]x[n-l]\right)$$

and therefore the equations to be solved for the infinite Wiener filter impulse response are

$$\sum_{k=0}^{\infty}h[k]r_{xx}[l-k] = r_{ss}[l] \qquad l \geq 0.$$

A little thought will convince the reader that the solutions must be identical, since the problem of estimating $s[n]$ based on $x[m]$ for $0 \leq m \leq n$ as $n \to \infty$ is really just that of using the present and infinite past to estimate the current sample. The time invariance of the filter makes the solution independent of which sample is to be estimated, or independent of $n$. The solution of (12.60) utilizes spectral factorization and is explored in Problem 12.16.
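As a concrete sketch (mine, not from the text), the Wiener-Hopf filtering equations (12.59) can be solved directly for a fixed $n$ and the resulting taps applied as in (12.58); the AR(1) signal ACF and white-noise level are assumed values, and for large $n$ the single solution is reused as an approximately time-invariant filter, as noted above.

```python
import numpy as np

# Assumed ACFs for illustration: AR(1)-like signal in white noise
a, var_u, var_w, n = 0.9, 1.0, 1.0, 30
k = np.arange(n + 1)
r_ss = (var_u / (1 - a**2)) * a ** k
r_xx = r_ss.copy(); r_xx[0] += var_w                     # r_xx = r_ss + r_ww

idx = np.abs(k[:, None] - k)                             # Toeplitz index pattern
h = np.linalg.solve(r_xx[idx], r_ss)                     # Wiener-Hopf equations (12.59)

# Causal filtering of a data record per (12.58): s_hat[n] = sum_k h[k] x[n-k]
rng = np.random.default_rng(10)
s = np.zeros(200)
s[0] = rng.normal(0.0, np.sqrt(var_u / (1 - a**2)))      # start in steady state
for m in range(1, 200):
    s[m] = a * s[m-1] + rng.normal(0.0, np.sqrt(var_u))
x = s + rng.normal(0.0, np.sqrt(var_w), 200)
s_hat = np.convolve(x, h)[:200]                          # h used as an (approximately) time-invariant filter
print("filter taps h[0..4]:", h[:5])
```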
At first glance it might appear that (12.60) could be solved by using Fourier transform techniques, since the left-hand side of the equation is a convolution of two sequences. This is not the case, however, since the equations hold only for $l \geq 0$. If, indeed, they were valid for $l < 0$ as well, then the Fourier transform approach would be viable. Such a set of equations arises in the smoothing problem in which $s[n]$ is to be estimated based on $\{\ldots, x[-1], x[0], x[1], \ldots\}$ or $x[k]$ for all $k$. In this case the smoothing estimator takes the form

$$\hat{s}[n] = \sum_{k=-\infty}^{\infty}a_k x[k]$$

and by letting $h[k] = a_{n-k}$ we have the convolution sum

$$\hat{s}[n] = \sum_{k=-\infty}^{\infty}h[k]x[n-k]$$

where $h[k]$ is the impulse response of an infinite two-sided time invariant filter. The Wiener-Hopf equations become

$$\sum_{k=-\infty}^{\infty}h[k]r_{xx}[l-k] = r_{ss}[l] \qquad -\infty < l < \infty.$$

Since these equations hold for all $l$, the Fourier transform may now be applied to both sides, giving $H(f)P_{xx}(f) = P_{ss}(f)$, so that the frequency response of the infinite Wiener smoother is

$$H(f) = \frac{P_{ss}(f)}{P_{xx}(f)} = \frac{P_{ss}(f)}{P_{ss}(f) + P_{ww}(f)}.$$
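A brief numerical sketch (mine, with assumed PSD models) of this two-sided smoothing case: the filter is applied in the frequency domain using $H(f) = P_{ss}(f)/(P_{ss}(f) + P_{ww}(f))$, with FFT-based (circular) processing standing in for the ideal infinite-length filter.

```python
import numpy as np

rng = np.random.default_rng(11)
N, a, var_u, var_w = 4096, 0.9, 1.0, 1.0          # assumed values
f = np.fft.fftfreq(N)

# PSDs of an AR(1) signal and white noise (assumed models for illustration)
P_ss = var_u / np.abs(1 - a * np.exp(-2j*np.pi*f))**2
P_ww = var_w * np.ones(N)
H = P_ss / (P_ss + P_ww)                          # noncausal Wiener smoother response

# Generate s and x = s + w, then smooth via the FFT (circular approximation)
s = np.zeros(N)
s[0] = rng.normal(0.0, np.sqrt(var_u / (1 - a**2)))
for n in range(1, N):
    s[n] = a * s[n-1] + rng.normal(0.0, np.sqrt(var_u))
x = s + rng.normal(0.0, np.sqrt(var_w), N)
s_hat = np.real(np.fft.ifft(H * np.fft.fft(x)))

print("noisy MSE:", np.mean((x - s)**2), " smoothed MSE:", np.mean((s_hat - s)**2))
```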
Finally, we consider the prediction problem in which $\theta = x[N-1+l]$ for $l$ a positive integer is to be estimated based on $\mathbf{x} = [x[0]\; x[1] \ldots x[N-1]]^T$. The resulting estimator is termed the $l$-step linear predictor. We use (12.51), for which $\mathbf{C}_{xx} = \mathbf{R}_{xx}$, where $\mathbf{R}_{xx}$ is of dimension $N \times N$, and

$$\mathbf{C}_{\theta x} = E\left(x[N-1+l]\left[\,x[0]\; x[1] \ldots x[N-1]\,\right]\right) = \left[\,r_{xx}[N-1+l]\; r_{xx}[N-2+l] \ldots r_{xx}[l]\,\right].$$

Let the latter vector be denoted by $\mathbf{r}_{xx}'^T$. Then,

$$\hat{x}[N-1+l] = \mathbf{r}_{xx}'^T\mathbf{R}_{xx}^{-1}\mathbf{x}.$$

Recalling that

$$\mathbf{a} = \mathbf{R}_{xx}^{-1}\mathbf{r}_{xx}' \tag{12.63}$$

we have

$$\hat{x}[N-1+l] = \sum_{k=0}^{N-1}a_k x[k].$$

If we let $h[N-k] = a_k$ to allow a "filtering" interpretation, then

$$\hat{x}[N-1+l] = \sum_{k=1}^{N}h[k]x[N-k] \tag{12.64}$$

and it is observed that the predicted sample is the output of a filter with impulse response $h[n]$. The equations to be solved are, from (12.63) (noting once again that $\mathbf{h}$ is just $\mathbf{a}$ when flipped upside down),

$$\mathbf{R}_{xx}\mathbf{h} = \mathbf{r}_{xx}$$

where $\mathbf{r}_{xx} = [r_{xx}[l]\; r_{xx}[l+1] \ldots r_{xx}[N-1+l]]^T$. In explicit form they become

$$\begin{bmatrix} r_{xx}[0] & r_{xx}[1] & \cdots & r_{xx}[N-1] \\ r_{xx}[1] & r_{xx}[0] & \cdots & r_{xx}[N-2] \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}[N-1] & r_{xx}[N-2] & \cdots & r_{xx}[0] \end{bmatrix}\begin{bmatrix} h[1] \\ h[2] \\ \vdots \\ h[N] \end{bmatrix} = \begin{bmatrix} r_{xx}[l] \\ r_{xx}[l+1] \\ \vdots \\ r_{xx}[N-1+l] \end{bmatrix}. \tag{12.65}$$

These are the Wiener-Hopf prediction equations for the $l$-step linear predictor based on $N$ past samples. A computationally efficient method for solving these equations is the Levinson recursion [Marple 1987]. For the specific case where $l = 1$, the one-step linear predictor, the values of $-h[n]$ are termed the linear prediction coefficients, which are used extensively in speech modeling [Makhoul 1975]. Also, for $l = 1$ the resulting equations are identical to the Yule-Walker equations used to solve for the AR filter parameters of an AR($N$) process (see Example 7.18 and Appendix 1). The minimum MSE for the $l$-step linear predictor is, from (12.52),

$$M_{\hat{x}} = r_{xx}[0] - \mathbf{r}_{xx}'^T\mathbf{R}_{xx}^{-1}\mathbf{r}_{xx}'$$

or, equivalently,

$$M_{\hat{x}} = r_{xx}[0] - \sum_{k=0}^{N-1}a_k r_{xx}[N-1+l-k] = r_{xx}[0] - \sum_{k=1}^{N}h[N-k]r_{xx}[N-1+l-k] = r_{xx}[0] - \sum_{k=1}^{N}h[k]r_{xx}[k+(l-1)]. \tag{12.66}$$

As an example, assume that $x[n]$ is an AR(1) process with ACF (see Appendix 1)

$$r_{xx}[k] = \frac{\sigma_u^2}{1 - a^2[1]}\left(-a[1]\right)^{|k|}$$

and we wish to find the one-step predictor $\hat{x}[N]$ as (see (12.64))

$$\hat{x}[N] = \sum_{k=1}^{N}h[k]x[N-k].$$

To solve for the $h[k]$'s we use (12.65) for $l = 1$, which is

$$\sum_{k=1}^{N}h[k]r_{xx}[m-k] = r_{xx}[m] \qquad m = 1, 2, \ldots, N.$$

Substituting the ACF, we have

$$\sum_{k=1}^{N}h[k]\left(-a[1]\right)^{|m-k|} = \left(-a[1]\right)^{m} \qquad m = 1, 2, \ldots, N.$$

It is easily verified that these equations are solved for

$$h[k] = \begin{cases} -a[1] & k = 1 \\ 0 & k = 2, 3, \ldots, N. \end{cases}$$

Hence, the one-step linear predictor is

$$\hat{x}[N] = -a[1]x[N-1]$$

and depends only on the previous sample. This is not surprising in light of the fact that the AR(1) process satisfies

$$x[n] = -a[1]x[n-1] + u[n]$$

where $u[n]$ is white noise, so that the sample to be predicted satisfies

$$x[N] = -a[1]x[N-1] + u[N].$$

The predictor cannot predict $u[N]$, since it is uncorrelated with the past data samples (recall that since the AR(1) filter is causal, $x[n]$ is a linear combination of $\{u[n], u[n-1], \ldots\}$ and thus $x[n]$ is uncorrelated with all future samples of $u[n]$). The prediction error is $x[N] - \hat{x}[N] = u[N]$, and the minimum MSE is just the driving noise power $\sigma_u^2$. To verify this we have from (12.66)

$$M_{\hat{x}} = r_{xx}[0] - \sum_{k=1}^{N}h[k]r_{xx}[k] = r_{xx}[0] + a[1]r_{xx}[1] = \frac{\sigma_u^2}{1 - a^2[1]}\left(1 - a^2[1]\right) = \sigma_u^2.$$

We can extend these results to the $l$-step predictor by solving

$$\sum_{k=1}^{N}h[k]r_{xx}[m-k] = r_{xx}[m+l-1] \qquad m = 1, 2, \ldots, N.$$

Substituting the ACF for an AR(1) process this becomes

$$\sum_{k=1}^{N}h[k]\left(-a[1]\right)^{|m-k|} = \left(-a[1]\right)^{m+l-1}.$$

The solution, which can be easily verified, is

$$h[k] = \begin{cases} \left(-a[1]\right)^{l} & k = 1 \\ 0 & k = 2, 3, \ldots, N \end{cases}$$

and therefore the $l$-step predictor is

$$\hat{x}[(N-1)+l] = \left(-a[1]\right)^{l}x[N-1]. \tag{12.67}$$

The minimum MSE is, from (12.66),

$$M_{\hat{x}} = r_{xx}[0] - h[1]r_{xx}[l] = \frac{\sigma_u^2}{1 - a^2[1]}\left(1 - a^{2l}[1]\right).$$

It is interesting to note that the predictor decays to zero with $l$ (since $|a[1]| < 1$). This is reasonable, since the correlation between $x[(N-1)+l]$, the sample to be predicted, and $x[N-1]$, the data sample on which the prediction is based, is $r_{xx}[l]$. As $l$ increases, $r_{xx}[l]$ decays to zero and thus so does $\hat{x}[(N-1)+l]$. This is also reflected in the minimum MSE, which is smallest for $l = 1$ and increases for larger $l$.

A numerical example illustrates the behavior of the $l$-step predictor. If $a[1] = -0.95$ and $\sigma_u^2 = 0.1$, so that the process has a low-pass PSD, then for a given realization of $x[n]$ we obtain the results shown in Figure 12.8.

[Figure 12.8: Linear prediction for a realization of an AR(1) process]

The true data are displayed as a solid line, while the predicted data for $n \geq 11$ are shown as a dashed line. The predictions are given by (12.67), where $N = 11$ and $l = 1, 2, \ldots, 40$, and thus decay to zero with increasing $l$. As can be observed, the predictions are generally poor except for small $l$. See also Problems 12.19 and 12.20.
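The following Python sketch (mine, not from the text) recreates the setup of the numerical example ($a[1] = -0.95$, $\sigma_u^2 = 0.1$, $N = 11$, $l = 1, \ldots, 40$) using the $l$-step predictor (12.67) and the closed-form minimum MSE derived above; the random seed and plotting are left out, so only the numbers are produced.

```python
import numpy as np

rng = np.random.default_rng(12)
a1, var_u, N, L = -0.95, 0.1, 11, 40       # values taken from the numerical example

# Generate one realization of the AR(1) process x[n] = -a1*x[n-1] + u[n]
M = N + L
x = np.zeros(M)
x[0] = rng.normal(0.0, np.sqrt(var_u / (1 - a1**2)))   # start in steady state
for n in range(1, M):
    x[n] = -a1 * x[n-1] + rng.normal(0.0, np.sqrt(var_u))

# l-step predictions from (12.67), based on x[N-1] only
l = np.arange(1, L + 1)
x_pred = (-a1) ** l * x[N-1]

# Minimum MSE from the closed form derived above
bmse = var_u / (1 - a1**2) * (1 - a1**(2*l))
print("first few predictions:", x_pred[:5])
print("Bmse for l=1 and l=40:", bmse[0], bmse[-1])
```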
References

Kay, S., "Some Results in Linear Interpolation Theory," IEEE Trans. Acoust., Speech, Signal Process., Vol. 31, pp. 746-749, June 1983.
Luenberger, D.G., Optimization by Vector Space Methods, J. Wiley, New York, 1969.
Makhoul, J., "Linear Prediction: A Tutorial Review," Proc. IEEE, Vol. 63, pp. 561-580, April 1975.
Marple, S.L., Jr., Digital Spectral Analysis, Prentice-Hall, Englewood Cliffs, N.J., 1987.
Orfanidis, S.J., Optimum Signal Processing, Macmillan, New York, 1985.

Problems

12.1 Consider the quadratic estimator $\hat{\theta} = ax^2[0] + bx[0] + c$ of a scalar parameter $\theta$ based on the single data sample $x[0]$. Find the coefficients $a, b, c$ that minimize the Bayesian MSE. If $x[0] \sim \mathcal{U}[-\frac{1}{2}, \frac{1}{2}]$, find the LMMSE estimator and the quadratic MMSE estimator if $\theta = \cos 2\pi x[0]$. Also, compare the minimum MSEs.

12.2 Consider the data

$$x[n] = Ar^n + w[n]$$

where $A$ is a parameter to be estimated, $r$ is a known constant, and $w[n]$ is zero mean white noise with variance $\sigma^2$. The parameter $A$ is modeled as a random variable with mean $\mu_A$ and variance $\sigma_A^2$ and is independent of $w[n]$. Find the LMMSE estimator of $A$ and the minimum Bayesian MSE.

12.3 A Gaussian random vector $\mathbf{x} = [x_1\; x_2]^T$ has zero mean and covariance matrix $\mathbf{C}_{xx}$. If $x_2$ is to be linearly estimated based on $x_1$, find the estimator that minimizes the Bayesian MSE. Also, find the minimum MSE and prove that it is zero if and only if $\mathbf{C}_{xx}$ is singular. Extend your results to show that if the covariance matrix of an $N \times 1$ zero mean Gaussian random vector is not positive definite, then any random variable may be perfectly estimated by a linear combination of the others. Hint: Note that

$$E\left[\left(\mathbf{a}^T\mathbf{x}\right)^2\right] = \mathbf{a}^T\mathbf{C}_{xx}\mathbf{a}.$$

12.4 An inner product $(x, y)$ between two vectors $x$ and $y$ of a vector space must satisfy the following properties:

a. $(x, x) \geq 0$ and $(x, x) = 0$ if and only if $x = 0$
b. $(x, y) = (y, x)$
c. $(c_1 x_1 + c_2 x_2, y) = c_1(x_1, y) + c_2(x_2, y)$
