ECE 830 Fall 2011 Statistical Signal Processing

instructor: R. Nowak
Lecture 20: Bayesian Linear Estimators
1 Linear Minimum Mean Square Error Estimator
Suppose our data is $x \in \mathbb{R}^n$, a random vector governed by a distribution $p(x|\theta)$, which depends on the parameter $\theta$. Moreover, the parameter $\theta \in \mathbb{R}^k$ is treated as a random variable with $E[\theta] = 0$ and $E[\theta\theta^T] = \Sigma_{\theta\theta}$. Also, assume that $E[x] = 0$ and let $\Sigma_{xx} := E[xx^T]$ and $\Sigma_{\theta x} := E[\theta x^T]$. Then, as we saw in the previous lecture, the linear filter that provides the minimum MSE is given by:

$$\widehat{A} = \arg\min_{A \in \mathbb{R}^{n \times k}} E\left[\|\theta - A^T x\|_2^2\right], \qquad \widehat{A} = \Sigma_{xx}^{-1}\Sigma_{x\theta}.$$

The linear minimum mean square error (LMMSE) estimator is:

$$\hat{\theta} = \widehat{A}^T x = \Sigma_{\theta x}\Sigma_{xx}^{-1}\, x.$$
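To make the formula concrete, here is a minimal NumPy sketch, assuming a small hypothetical linear model (the dimensions, $H$, and variances below are arbitrary illustrative choices, not from the lecture). It forms $\Sigma_{xx}$ and $\Sigma_{\theta x}$ implied by that model and applies the LMMSE filter to one realization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical zero-mean linear model: theta in R^k, x = H theta + w in R^n.
n, k = 8, 3
H = rng.standard_normal((n, k))
sigma_theta, sigma_w = 2.0, 0.5

# Population covariances implied by this model (assumed known, not estimated).
Sigma_xx = sigma_theta**2 * H @ H.T + sigma_w**2 * np.eye(n)   # E[x x^T]
Sigma_theta_x = sigma_theta**2 * H.T                           # E[theta x^T]

# LMMSE filter: theta_hat = Sigma_theta_x Sigma_xx^{-1} x.
W = Sigma_theta_x @ np.linalg.inv(Sigma_xx)

# Apply it to one realization drawn from the same model.
theta = sigma_theta * rng.standard_normal(k)
x = H @ theta + sigma_w * rng.standard_normal(n)
theta_hat = W @ x
print(theta)
print(theta_hat)
```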
2 Orthogonality Principle
Let $\hat{\theta} = \Sigma_{\theta x}\Sigma_{xx}^{-1} x$ be the LMMSE estimator, defined above. Then

$$E\left[(\theta - \hat{\theta})^T x\right] = E\left[\mathrm{tr}\left((\theta - \hat{\theta})\, x^T\right)\right] = \mathrm{tr}\left(\Sigma_{\theta x} - \Sigma_{\theta x}\Sigma_{xx}^{-1}\Sigma_{xx}\right) = 0\,.$$

In other words, the error $(\theta - \hat{\theta})$ is orthogonal to the data $x$. This is shown graphically in Fig. 1. The orthogonality principle also provides a method for deriving the LMMSE filter. Consider any linear estimator of the form $\hat{\theta} = B^T x$. If we impose the orthogonality condition

$$0 = E\left[(\theta - \hat{\theta})\, x^T\right] = \Sigma_{\theta x} - B^T\Sigma_{xx}$$

then we see that $B^T$ must be equal to $\Sigma_{\theta x}\Sigma_{xx}^{-1}$.
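The orthogonality condition is easy to check by simulation. A minimal Monte Carlo sketch (using the same hypothetical linear model as in the previous sketch) estimates $E[(\theta - \hat{\theta})x^T]$ and shows that every entry is near zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, N = 8, 3, 200_000
H = rng.standard_normal((n, k))
sigma_theta, sigma_w = 2.0, 0.5

Sigma_xx = sigma_theta**2 * H @ H.T + sigma_w**2 * np.eye(n)
W = sigma_theta**2 * H.T @ np.linalg.inv(Sigma_xx)        # Sigma_theta_x Sigma_xx^{-1}

theta = sigma_theta * rng.standard_normal((N, k))          # N realizations of theta (rows)
x = theta @ H.T + sigma_w * rng.standard_normal((N, n))    # corresponding realizations of x
err = theta - x @ W.T                                      # rows are theta - theta_hat

# Sample estimate of E[(theta - theta_hat) x^T]; each entry should be close to zero.
print(np.abs(err.T @ x / N).max())
```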
Example 1 (Linear Signal Model). Suppose we model our detected signal as $x = H\theta + w$, where $x, w \in \mathbb{R}^n$, $\theta \in \mathbb{R}^k$, $H \in \mathbb{R}^{n \times k}$ is a known linear transformation, and $w$ is a noise process. Furthermore assume that

$$E[w] = 0, \quad E[ww^T] = \sigma_w^2 I_{n \times n}, \qquad E[\theta] = 0, \quad E[\theta\theta^T] = \sigma_\theta^2 I_{k \times k}.$$

In addition, we know that the parameter and the noise process are uncorrelated, i.e., $E[w\theta^T] = E[\theta w^T] = 0$. As demonstrated before, the LMMSE estimator is

$$\hat{\theta} = \Sigma_{\theta x}\Sigma_{xx}^{-1}\, x$$
Figure 1: Orthogonality between the estimator $\hat{\theta}$ and its error $\theta - \hat{\theta}$.
where $\Sigma_{xx}^{-1}$ and $\Sigma_{\theta x}$ can be obtained as follows:

$$\Sigma_{\theta x} = E[\theta x^T] = E[\theta(H\theta + w)^T] = \sigma_\theta^2 H^T$$
$$\Sigma_{xx} = E[xx^T] = E[(H\theta + w)(H\theta + w)^T] = \sigma_\theta^2 HH^T + \sigma_w^2 I_{n \times n}$$

Therefore, the LMMSE estimator is given by

$$\hat{\theta} = \sigma_\theta^2 H^T\left(\sigma_\theta^2 HH^T + \sigma_w^2 I_{n \times n}\right)^{-1} x = H^T\left(HH^T + \frac{\sigma_w^2}{\sigma_\theta^2} I_{n \times n}\right)^{-1} x\,.$$
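As a quick numerical check of the last algebraic step, the following sketch (with an arbitrary $H$ and arbitrary variances) verifies that the two expressions for the LMMSE filter coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
H = rng.standard_normal((n, k))
sigma_theta2, sigma_w2 = 1.5, 0.3    # the variances sigma_theta^2 and sigma_w^2

I = np.eye(n)
A1 = sigma_theta2 * H.T @ np.linalg.inv(sigma_theta2 * H @ H.T + sigma_w2 * I)
A2 = H.T @ np.linalg.inv(H @ H.T + (sigma_w2 / sigma_theta2) * I)

print(np.allclose(A1, A2))           # True: the two forms of the LMMSE filter agree
```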
3 Gauss-Markov Theorem
It is natural to ask: when does the LMMSE estimator minimize the Bayes MSE among all possible estimators? When is the linear estimator optimal? Based on our previous discussion of Bayesian estimators, we know that the LMMSE estimator is optimal, i.e., it is the minimum Bayesian MSE estimator, when the posterior mean estimator is linear. This happens to be the case when both data and parameter are modeled as jointly Gaussian.

Theorem 1 (Gauss-Markov Theorem). Let $x$ and $y$ be jointly Gaussian random vectors, whose joint distribution can be expressed as

$$\begin{bmatrix} x \\ y \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\right)$$

then the conditional distribution of $y$ given $x$ is

$$y\,|\,x \;\sim\; N\left(\mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x),\; \Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}\right).$$

According to the Gauss-Markov Theorem, the posterior mean of the density $p(y|x)$ is a linear function of $x$, and therefore in this case the minimum MSE estimator is linear. Notice that the matrix $\Sigma_{yx}\Sigma_{xx}^{-1}$ is precisely the optimal $A^T$ derived above (if we take $y = \theta$). In general, when the joint distribution is non-Gaussian, the minimum MSE estimator is nonlinear.
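A minimal sketch of the theorem's conclusion (with an arbitrary small joint covariance, built only for illustration) computes the conditional mean and covariance of $y$ given $x$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2                                   # dim(x), dim(y)

# Arbitrary valid joint mean and covariance for the stacked vector [x; y].
M = rng.standard_normal((n + m, n + m))
Sigma = M @ M.T + 0.1 * np.eye(n + m)         # positive definite
mu = rng.standard_normal(n + m)

Sxx, Sxy = Sigma[:n, :n], Sigma[:n, n:]
Syx, Syy = Sigma[n:, :n], Sigma[n:, n:]
mux, muy = mu[:n], mu[n:]

x = rng.standard_normal(n)                    # a particular observed x

# Conditional distribution of y given x (Gauss-Markov theorem).
cond_mean = muy + Syx @ np.linalg.solve(Sxx, x - mux)
cond_cov = Syy - Syx @ np.linalg.solve(Sxx, Sxy)
print(cond_mean)
print(cond_cov)
```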
Example 2 (Application to the Linear Signal Model). We model the detected signal as $x = H\theta + w$ where $w \sim N(0, \sigma_w^2 I_{n \times n})$ and $\theta \sim N(0, \sigma_\theta^2 I_{k \times k})$. Then the vector $[x^T\;\theta^T]^T$ is a multivariate Gaussian random vector. As we saw in previous lectures, the Bayesian MSE is minimized by the posterior mean $E[\theta|x]$ which, in this case, using the Gauss-Markov theorem, is

$$\begin{aligned}
E[\theta|x] &= \mu_\theta + \Sigma_{\theta x}\Sigma_{xx}^{-1}(x - \mu_x) \\
&= 0 + \sigma_\theta^2 H^T\left(\sigma_\theta^2 HH^T + \sigma_w^2 I_{n \times n}\right)^{-1}(x - 0) \\
&= \sigma_\theta^2 H^T\left(\sigma_\theta^2 HH^T + \sigma_w^2 I_{n \times n}\right)^{-1} x\,,
\end{aligned}$$

which is the previously derived LMMSE estimator.
3.1 Proof of the Gauss-Markov theorem
Without loss of generality assume that $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are zero-mean random vectors. Therefore

$$p(y|x) = \frac{p(x,y)}{p(x)} = \frac{(2\pi)^{-(n+m)/2}\,|\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}\begin{bmatrix} x \\ y \end{bmatrix}^T \Sigma^{-1} \begin{bmatrix} x \\ y \end{bmatrix}\right\}}{(2\pi)^{-n/2}\,|\Sigma_{xx}|^{-1/2}\exp\left\{-\tfrac{1}{2}\, x^T\Sigma_{xx}^{-1} x\right\}}$$

where

$$\Sigma = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}.$$

To simplify the formula we need to determine $\Sigma^{-1}$. The inverse can be written as:

$$\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}^{-1} = \begin{bmatrix} \Sigma_{xx}^{-1} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} -\Sigma_{xx}^{-1}\Sigma_{xy} \\ I \end{bmatrix} Q^{-1} \begin{bmatrix} -\Sigma_{yx}\Sigma_{xx}^{-1} & I \end{bmatrix}$$

where

$$Q := \Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}\,.$$

This formula for the inverse is easily verified by multiplying it by $\Sigma$ to get the identity matrix. Substituting the inverse into $p(y|x)$ (and using the fact that $|\Sigma| = |\Sigma_{xx}|\,|Q|$) yields

$$p(y|x) = (2\pi)^{-m/2}\,|Q|^{-1/2}\exp\left\{-\tfrac{1}{2}\left(y - \Sigma_{yx}\Sigma_{xx}^{-1}x\right)^T Q^{-1}\left(y - \Sigma_{yx}\Sigma_{xx}^{-1}x\right)\right\}$$

which shows that $y\,|\,x \sim N(\Sigma_{yx}\Sigma_{xx}^{-1}x,\, Q)$. For the general case when $E[x] = \mu_x$ and $E[y] = \mu_y$, then

$$(y - \mu_y)\,|\,(x - \mu_x) \;\sim\; N\left(\Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x),\, Q\right)$$
$$y\,|\,x \;\sim\; N\left(\mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x),\, Q\right).$$
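The block-inverse identity used above can also be verified numerically; a short sketch with a random positive-definite $\Sigma$ confirms that the proposed inverse times $\Sigma$ is the identity:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2
M = rng.standard_normal((n + m, n + m))
Sigma = M @ M.T + 0.1 * np.eye(n + m)          # random positive-definite joint covariance

Sxx, Sxy = Sigma[:n, :n], Sigma[:n, n:]
Syx, Syy = Sigma[n:, :n], Sigma[n:, n:]

Sxx_inv = np.linalg.inv(Sxx)
Q = Syy - Syx @ Sxx_inv @ Sxy                  # Schur complement

first_term = np.block([[Sxx_inv, np.zeros((n, m))],
                       [np.zeros((m, n)), np.zeros((m, m))]])
left = np.vstack([-Sxx_inv @ Sxy, np.eye(m)])   # (n+m) x m block column
right = np.hstack([-Syx @ Sxx_inv, np.eye(m)])  # m x (n+m) block row
Sigma_inv = first_term + left @ np.linalg.inv(Q) @ right

print(np.allclose(Sigma_inv @ Sigma, np.eye(n + m)))   # True
```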
4 The Wiener Filter
When the expected values of the parameter $\theta \in \mathbb{R}^k$ and the data $x \in \mathbb{R}^n$ are zero, the Wiener filter $A_{\mathrm{opt}}$ is obtained by minimizing the mean square error between the parameter and its estimator:

$$A_{\mathrm{opt}} = \arg\min_{A:\; \hat{\theta} = Ax}\; E\left[\|\theta - Ax\|_2^2\right]$$

which results in $A_{\mathrm{opt}} = \Sigma_{\theta x}\Sigma_{xx}^{-1}$. It involves only second-order moments, and it becomes the optimal estimator when the data and the parameter are jointly Gaussian.
Example 3 (Signal + Noise Model). We model our detected signal as $x = s + w$ where the noiseless signal $s$ (our parameter) follows a Gaussian distribution $N(0, \Sigma_{ss})$ and $w \sim N(0, \Sigma_{ww})$. In addition, $s$ and $w$ are uncorrelated. Therefore, the observation vector $x \sim N(0, \Sigma_{ss} + \Sigma_{ww})$ and $E[s\, x^T] = E[s(s + w)^T] = \Sigma_{ss}$. From here, the LMMSE estimator $\hat{s}$ becomes:

$$\hat{s} = \Sigma_{ss}\left(\Sigma_{ss} + \Sigma_{ww}\right)^{-1} x\,.$$

Now assume that the observation is modeled as $x = H\theta + w$ where now $\theta \sim N(0, \Sigma_{\theta\theta})$ and $w \sim N(0, \Sigma_{ww})$, where $\theta \in \mathbb{R}^k$ and $w \in \mathbb{R}^n$ (which are uncorrelated) and $H$ is a known $n \times k$ linear transformation matrix. Therefore $x \sim N(0, H\Sigma_{\theta\theta}H^T + \Sigma_{ww})$. In addition, $E[\theta\, x^T] = E[\theta(H\theta + w)^T] = \Sigma_{\theta\theta}H^T$ and the estimator is

$$\hat{\theta} = \Sigma_{\theta\theta}H^T\left(H\Sigma_{\theta\theta}H^T + \Sigma_{ww}\right)^{-1} x\,.$$
Now suppose that $\Sigma_{\theta\theta} = \sigma_\theta^2 I_{k \times k}$ and $\Sigma_{ww} = \sigma_w^2 I_{n \times n}$. Then

$$\hat{\theta} = \sigma_\theta^2 H^T\left(\sigma_\theta^2 HH^T + \sigma_w^2 I_{n \times n}\right)^{-1} x$$

and the LMMSE estimator of the signal is given by

$$\hat{s} = H\hat{\theta} = \sigma_\theta^2 HH^T\left(\sigma_\theta^2 HH^T + \sigma_w^2 I_{n \times n}\right)^{-1} x\,.$$
The matrix $HH^T$ can be diagonalized by an orthonormal transformation $U$ (i.e., $HH^T$ is a symmetric, positive-semidefinite matrix and $UDU^T$ is its eigendecomposition). Since $H$ has rank $k < n$, only the first $k$ elements of the diagonal are nonzero (and non-negative):

$$HH^T = UDU^T = U\,\mathrm{diag}\left(\sigma_1^2,\, \ldots,\, \sigma_k^2,\, 0,\, \ldots,\, 0\right)U^T\,.$$
As a consequence,

$$\begin{aligned}
\hat{s} &= \sigma_\theta^2\, UDU^T\left(\sigma_\theta^2\, UDU^T + \sigma_w^2 I_{n \times n}\right)^{-1} x \\
&= \sigma_\theta^2\, UDU^T\left(\sigma_\theta^2\, UDU^T + \sigma_w^2\, UU^T\right)^{-1} x \\
&= \sigma_\theta^2\, UDU^T\left(U\left[\sigma_\theta^2 D + \sigma_w^2 I_{n \times n}\right]U^T\right)^{-1} x \\
&= \sigma_\theta^2\, UD\left[\sigma_\theta^2 D + \sigma_w^2 I_{n \times n}\right]^{-1}U^T x \\
&= U\left(\sigma_\theta^2 D\left[\sigma_\theta^2 D + \sigma_w^2 I_{n \times n}\right]^{-1}\right)U^T x\,.
\end{aligned}$$
Note that the term in parentheses is the diagonal matrix

$$\mathrm{diag}\left(\frac{\sigma_\theta^2\sigma_1^2}{\sigma_\theta^2\sigma_1^2 + \sigma_w^2},\; \ldots,\; \frac{\sigma_\theta^2\sigma_k^2}{\sigma_\theta^2\sigma_k^2 + \sigma_w^2},\; 0,\; \ldots,\; 0\right)$$

whose first $k$ diagonal entries converge to one as $\sigma_w^2/\sigma_\theta^2$ tends to zero. Therefore, as the SNR $\to \infty$, the Wiener filter output

$$\hat{s} \to \widetilde{U}\widetilde{U}^T x = P_H\, x\,,$$

where $\widetilde{U}$ denotes the first $k$ columns of $U$ and $P_H = H(H^TH)^{-1}H^T$ is the orthogonal projection matrix onto the subspace spanned by the columns of $H$.
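A small sketch (arbitrary $H$, isotropic priors, purely illustrative values) applies the filter through the eigendecomposition of $HH^T$ and shows the convergence to the projection $P_H$ as $\sigma_w^2/\sigma_\theta^2 \to 0$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 10, 3
H = rng.standard_normal((n, k))
sigma_theta2 = 1.0
sigma_w2 = 0.1

# Wiener filter matrix F such that s_hat = F x, for x = H theta + w.
def wiener_matrix(sw2):
    return sigma_theta2 * H @ H.T @ np.linalg.inv(
        sigma_theta2 * H @ H.T + sw2 * np.eye(n))

# Same filter via the eigendecomposition H H^T = U D U^T (per-eigenvector gains).
D, U = np.linalg.eigh(H @ H.T)                  # eigenvalues ascending; n-k of them ~ 0
gains = sigma_theta2 * D / (sigma_theta2 * D + sigma_w2)
print(np.allclose(wiener_matrix(sigma_w2), U @ np.diag(gains) @ U.T))   # True

# As the SNR grows, the filter approaches the orthogonal projection onto range(H).
P_H = H @ np.linalg.inv(H.T @ H) @ H.T
for sw2 in (1.0, 1e-2, 1e-6):
    print(sw2, np.abs(wiener_matrix(sw2) - P_H).max())  # difference shrinks
```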
4.1 Frequency Domain Wiener Filter
Now let us consider a Wiener filter designed in the frequency domain. Again, our model is $x = s + w$, and now we take the DFT of both sides of this equation. Let $U$ be the orthonormal DFT matrix, so that $U^T$ applied to a vector gives its DFT:

$$\tilde{x} = U^T x = U^T s + U^T w = \tilde{s} + \tilde{w}$$

where $\tilde{x}$, $\tilde{s}$, and $\tilde{w}$ denote the DFTs of the observation, signal, and noise, respectively.
Let's specify our signal and noise models in the frequency domain as follows. Let $\tilde{s} \sim N(0, \Lambda_s)$ and $\tilde{w} \sim N(0, \Lambda_w)$, where $\Lambda_s$ and $\Lambda_w$ are diagonal. Equivalently, $s \sim N(0, U\Lambda_s U^T)$ and $w \sim N(0, U\Lambda_w U^T)$. In this case the Wiener filter is given by

$$\begin{aligned}
\hat{s} &= \Sigma_{ss}\left(\Sigma_{ss} + \Sigma_{ww}\right)^{-1} x \\
&= U\Lambda_s U^T\left(U\left[\Lambda_s + \Lambda_w\right]U^T\right)^{-1} x \\
&= U\Lambda_s U^T\, U\left[\Lambda_s + \Lambda_w\right]^{-1}U^T x \\
&= U\Lambda_s\left[\Lambda_s + \Lambda_w\right]^{-1}U^T x \\
&= U\,\mathrm{diag}\left(\frac{\sigma_1^2}{\sigma_1^2 + \nu_1^2},\; \ldots,\; \frac{\sigma_n^2}{\sigma_n^2 + \nu_n^2}\right)U^T x
\end{aligned}$$

where $\sigma_j^2$ and $\nu_j^2$ are the $j$th diagonal elements of the diagonal matrices $\Lambda_s$ and $\Lambda_w$, respectively. Therefore the filtering process can be synthesized by the following algorithm (a short numerical sketch follows the list):
1. Take the DFT of the measured signal.
2. Attenuate the $j$th frequency component by the gain $\frac{1}{1 + \mathrm{SNR}_j^{-1}} = \frac{\sigma_j^2}{\sigma_j^2 + \nu_j^2}$, where $\mathrm{SNR}_j = \sigma_j^2/\nu_j^2$.
3. Take the inverse DFT of the attenuated spectrum.
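Here is a minimal sketch of these three steps using NumPy's FFT, assuming the per-bin signal and noise spectra $\sigma_j^2$ and $\nu_j^2$ are known (the particular low-pass signal spectrum and white-noise level below are illustrative assumptions only):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 256

# Assumed per-bin spectra: low-pass signal, white noise.
S_s = 1.0 / (1.0 + (np.fft.fftfreq(n) * 40.0) ** 2)    # sigma_j^2
S_w = 0.05 * np.ones(n)                                 # nu_j^2

# Synthesize one realization x = s + w with (approximately) these spectra.
s = np.real(np.fft.ifft(np.sqrt(S_s) * np.fft.fft(rng.standard_normal(n))))
w = np.sqrt(S_w) * rng.standard_normal(n)
x = s + w

# 1. DFT of the measurement; 2. attenuate bin j by sigma_j^2/(sigma_j^2 + nu_j^2); 3. inverse DFT.
X = np.fft.fft(x)
gain = S_s / (S_s + S_w)                                # = 1 / (1 + SNR_j^{-1})
s_hat = np.real(np.fft.ifft(gain * X))

# Compare squared errors of the raw measurement and the Wiener estimate.
print(np.mean((s - x) ** 2), np.mean((s - s_hat) ** 2))
```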
4.2 Classical derivation of the Wiener Filter
Again, we start with the model $x = s + w$ where $x$, $s$, and $w$ are wide-sense stationary processes. We express them as time series

$$x[n] = s[n] + w[n]\,.$$

We aim at defining a filter $h[n]$ that will be convolved with $x[n]$ to estimate $s[n]$:

$$\hat{s}[n] = \sum_k h[k]\, x[n-k]\,.$$

Our filter should minimize the MSE:

$$\mathrm{MSE}(\hat{s}[n]) = E\left[(s[n] - \hat{s}[n])^2\right] = E\left[s[n]^2 - 2\, s[n]\sum_k h[k]\, x[n-k] + \Big(\sum_k h[k]\, x[n-k]\Big)^2\right].$$

Differentiating with respect to $h[m]$ and setting the derivative equal to zero,

$$\begin{aligned}
\frac{\partial\,\mathrm{MSE}(\hat{s}[n])}{\partial h[m]} &= E\left[-2\, s[n]\, x[n-m] + 2\Big(\sum_k h[k]\, x[n-k]\Big)x[n-m]\right] \\
&= -2 R_{sx}[m] + 2\sum_k h[k]\, R_{xx}[m-k] \\
&= -2 R_{ss}[m] + 2\sum_k h[k]\left(R_{ss}[m-k] + R_{ww}[m-k]\right) = 0\,.
\end{aligned}$$

Therefore the optimal filter satisfies $R_{ss}[m] = \sum_k h[k]\left(R_{ss}[m-k] + R_{ww}[m-k]\right)$, which is the Wiener-Hopf equation. Taking the Discrete-Time Fourier Transform (DTFT) of both sides, we get

$$S_{ss}(\omega) = H(\omega)\left[S_{ss}(\omega) + S_{ww}(\omega)\right]$$

where $S_{ss}(\omega)$ and $S_{ww}(\omega)$ are the power spectra of the signal and the noise process, respectively. Therefore, the frequency response of the Wiener filter is

$$H(\omega) = \frac{S_{ss}(\omega)}{S_{ss}(\omega) + S_{ww}(\omega)}\,.$$
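In practice one can also solve the Wiener-Hopf equation directly for a finite-length (FIR) filter: restricting $h$ to taps $0, \ldots, L-1$ turns it into a Toeplitz linear system. A minimal sketch, assuming an exponential signal correlation and white noise (illustrative choices, not from the lecture):

```python
import numpy as np
from scipy.linalg import toeplitz

# Assumed correlations: R_ss[m] = rho^|m| (AR(1)-like signal), R_ww[m] = sigma_w^2 delta[m].
L = 16                                # number of filter taps h[0..L-1]
rho, sigma_w2 = 0.9, 0.5
m = np.arange(L)
R_ss = rho ** m
R_ww = np.zeros(L)
R_ww[0] = sigma_w2

# Wiener-Hopf restricted to L taps: sum_k h[k](R_ss[m-k] + R_ww[m-k]) = R_ss[m], m = 0..L-1.
R_xx = toeplitz(R_ss + R_ww)          # symmetric Toeplitz autocorrelation matrix of x
h = np.linalg.solve(R_xx, R_ss)       # solves R_xx h = r_sx (here r_sx = R_ss)

print(h[:5])
```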
5 Deconvolution
The final topic of this lecture is deconvolution. We model the detected signal as $x = Gs + w$ where $G$ is a circular convolution operator (a blurring transformation, shown in Fig. 2). As in the previous sections, $s \sim N(0, U\Lambda_s U^T)$ and $w \sim N(0, U\Lambda_w U^T)$. Furthermore, since $G$ is circulant, $G = UDU^T$, where $D$ is a diagonal matrix containing the frequency response of $G$.

In this case, the Wiener filter solution is computed as follows:

$$\begin{aligned}
\hat{s} &= \Sigma_{ss}G^T\left(G\Sigma_{ss}G^T + \Sigma_{ww}\right)^{-1} x \\
&= U\Lambda_s U^T G^T\left(G\, U\Lambda_s U^T G^T + U\Lambda_w U^T\right)^{-1} x \\
&= U\Lambda_s U^T\, UD^TU^T\left(UDU^T\, U\Lambda_s U^T\, UD^TU^T + U\Lambda_w U^T\right)^{-1} x \\
&= U\Lambda_s D^T\left(D\Lambda_s D^T + \Lambda_w\right)^{-1}U^T x \\
&= U\widetilde{D}\,U^T x
\end{aligned}$$

where

$$\widetilde{D}(k,k) = \frac{D^T(k,k)}{|D(k,k)|^2 + P^{-1}(k,k)} \qquad \text{and} \qquad P(k,k) = \frac{\Lambda_s(k,k)}{\Lambda_w(k,k)}\,.$$

Do not forget that the transpose operator works as the conjugate transpose operator when the matrix has complex elements.
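A minimal sketch of this deconvolution filter in the DFT domain (the blur kernel, spectra, and noise level below are illustrative assumptions only):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 256

# Assumed circular blur kernel g and its frequency response D(k,k).
g = np.exp(-0.5 * ((np.arange(n) - n // 2) ** 2) / 4.0)
g /= g.sum()
D = np.fft.fft(np.roll(g, -n // 2))

# Assumed signal and noise spectra (diagonals of Lambda_s and Lambda_w).
S_s = 1.0 / (1.0 + (np.fft.fftfreq(n) * 30.0) ** 2)
S_w = 1e-3 * np.ones(n)

# Simulate x = G s + w (circular convolution plus noise).
s = np.real(np.fft.ifft(np.sqrt(S_s) * np.fft.fft(rng.standard_normal(n))))
x = np.real(np.fft.ifft(D * np.fft.fft(s))) + np.sqrt(S_w) * rng.standard_normal(n)

# Wiener deconvolution gains: D_tilde = conj(D) / (|D|^2 + P^{-1}), with P = S_s / S_w.
P = S_s / S_w
D_tilde = np.conj(D) / (np.abs(D) ** 2 + 1.0 / P)
s_hat = np.real(np.fft.ifft(D_tilde * np.fft.fft(x)))

# Compare squared errors of the blurred/noisy measurement and the deconvolved estimate.
print(np.mean((s - x) ** 2), np.mean((s - s_hat) ** 2))
```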
Figure 2: Blurring process. (a) Original impulse signal. (b) Blurring function. (c) Blurred signal.
5.1 Classical Wiener Filter
Following a derivation similar to that of Section 4.2, in the case of a blurred, noisy time series modeled as

$$x[n] = g[n] * s[n] + w[n]$$

we aim at obtaining a filter $h[n]$ such that the estimator of the deblurred, noiseless signal is computed from $\hat{s}[n] = \sum_k h[k]\, x[n-k]$. The resulting filter in the Fourier domain is:

$$H(\omega) = \frac{G^*(\omega)\, S_{ss}(\omega)}{|G(\omega)|^2 S_{ss}(\omega) + S_{ww}(\omega)}$$

where $G(\omega)$ is the transfer function of the blurring filter $g[n]$ and $G^*(\omega)$ is its complex conjugate.