MVUE

This document summarizes a lecture on unbiased estimation. It defines an unbiased estimator as one whose expected value equals the parameter being estimated, gives examples of unbiased and biased estimators of a DC level in white Gaussian noise, and explains why an unbiased estimator is not necessarily a good one, covering the bias-variance trade-off. Finally, it introduces the Cramer-Rao lower bound, which specifies the minimum possible variance of any unbiased estimator.


VE564 Summer 2023

Lecture 2: Unbiased Estimation

Prof. H. Qiao
UM-SJTU Joint Institute
May 16, 2023
Unbiased Estimators

An estimator θ̂ is unbiased if

    E(\hat{\theta}) = \theta, \qquad \theta \in (a, b)

Unbiased Estimator for DC Level in White Gaussian Noise

The observations in WGN are

    x[n] = A + w[n], \qquad n = 0, 1, \ldots, N - 1

The sample mean estimator satisfies

    \hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n], \qquad E(\hat{A}) = E\left[\frac{1}{N}\sum_{n=0}^{N-1} x[n]\right] = A

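A quick numerical sanity check, not part of the original slides: the NumPy sketch below simulates this model and confirms that the sample mean is empirically unbiased with variance σ²/N. The constants A = 1.5, σ = 2, N = 100 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A, sigma, N, trials = 1.5, 2.0, 100, 10_000
x = A + sigma * rng.standard_normal((trials, N))  # x[n] = A + w[n], one row per trial
A_hat = x.mean(axis=1)                            # sample-mean estimator per trial
print(A_hat.mean())                # close to A = 1.5: empirically unbiased
print(A_hat.var(), sigma**2 / N)   # both close to 0.04
```
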
Biased Estimator for DC Level in White Gaussian Noise

    \breve{A} = \frac{1}{2N}\sum_{n=0}^{N-1} x[n], \qquad
    E(\breve{A}) = \frac{A}{2}\;\begin{cases} = A & \text{if } A = 0 \\ \neq A & \text{otherwise} \end{cases}

Unbiasedness means being unbiased for all θ ∈ (a, b).

Caution
An unbiased estimator is not necessarily a good estimator. For example, given n uncorrelated, unbiased estimators {θ̂ᵢ}, consider the averaging estimator

    \hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} \hat{\theta}_i

The averaging estimator satisfies

    E(\hat{\theta}) = \theta, \qquad \mathrm{Var}(\hat{\theta}) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(\hat{\theta}_i)

so among unbiased estimators it is the variance that distinguishes good from bad, and averaging drives the variance down.

Minimum Variance Criterion

The essential metric for evaluating different estimators is the mean square error (MSE):

    \mathrm{mse}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right]

Bias-Variance Trade-off

    \mathrm{mse}(\hat{\theta}) = E\left\{\left[\left(\hat{\theta} - E(\hat{\theta})\right) + \left(E(\hat{\theta}) - \theta\right)\right]^2\right\}
                               = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2

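The decomposition can be checked by simulation. Below is a minimal sketch, not from the lecture, using the shrunk sample mean Ă = a·x̄ with an assumed shrinkage a = 0.5, which is biased for a ≠ 1.

```python
import numpy as np

rng = np.random.default_rng(1)
A, sigma, N, a, trials = 1.0, 1.0, 10, 0.5, 200_000
x = A + sigma * rng.standard_normal((trials, N))
A_tilde = a * x.mean(axis=1)          # shrunk sample mean, biased for a != 1
mse  = np.mean((A_tilde - A) ** 2)
var  = A_tilde.var()
bias = A_tilde.mean() - A
print(mse, var + bias**2)             # the two agree up to Monte Carlo error
```
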
DC Level in White Gaussian Noise

Consider the modified DC level estimator

    \breve{A} = \frac{a}{N}\sum_{n=0}^{N-1} x[n]

We have

    \mathrm{mse}(\breve{A}) = \frac{a^2\sigma^2}{N} + (a - 1)^2 A^2, \qquad
    a_{\mathrm{opt}} = \frac{A^2}{A^2 + \sigma^2/N}

The optimal estimator is biased and unrealizable, since a_opt depends on the unknown A.
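
A small sketch (assumptions: A = 1, σ = 1, N = 10) that sweeps a over a grid and confirms the minimizer of mse(a) matches the closed-form a_opt:

```python
import numpy as np

A, sigma, N = 1.0, 1.0, 10
a = np.linspace(0.0, 1.5, 301)
mse = a**2 * sigma**2 / N + (a - 1) ** 2 * A**2
a_opt = A**2 / (A**2 + sigma**2 / N)
print(a[np.argmin(mse)], a_opt)       # both about 0.91
```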


Minimum Variance Unbiased Estimator

For simplicity, we first look for the unbiased estimator that has the smallest variance (and hence the smallest MSE, in this case). Such an estimator may not exist.

Existence of the Minimum Variance Unbiased Estimator

A Counter Example
Assume

    x[0] \sim \mathcal{N}(\theta, 1), \qquad
    x[1] \sim \begin{cases} \mathcal{N}(\theta, 1) & \text{if } \theta \geq 0 \\ \mathcal{N}(\theta, 2) & \text{if } \theta < 0 \end{cases}

Consider two unbiased estimators

    \hat{\theta}_1 = \frac{1}{2}\left(x[0] + x[1]\right), \qquad \hat{\theta}_2 = \frac{2}{3}x[0] + \frac{1}{3}x[1]

It can be shown that

    \mathrm{Var}(\hat{\theta}_1) = \begin{cases} \frac{18}{36} & \text{if } \theta \geq 0 \\ \frac{27}{36} & \text{if } \theta < 0 \end{cases}, \qquad
    \mathrm{Var}(\hat{\theta}_2) = \begin{cases} \frac{20}{36} & \text{if } \theta \geq 0 \\ \frac{24}{36} & \text{if } \theta < 0 \end{cases}

Neither estimator has uniformly smaller variance, so no single unbiased estimator is best for all θ.

A Guarantee [TPE, Theorem 1.2]

Let X be distributed according to a distribution in P = {P_θ, θ ∈ Ω}. For every U-estimable function g(θ), there exists an unbiased estimator that uniformly minimizes the risk for any loss function L(θ, d) which is convex in its second argument.

Finding the Minimum Variance Unbiased Estimator*

• Determine the Cramer-Rao lower bound (CRLB) and see whether any unbiased estimator attains it.

• Apply the Rao-Blackwell-Lehmann-Scheffe theorem.

• Further restrict the unbiased estimators to be linear.

Estimator Accuracy Considerations

Consider the case of a single measurement in Gaussian noise:

    x[0] = A + w[0], \qquad p_i(x[0]; A) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left[-\frac{1}{2\sigma_i^2}\left(x[0] - A\right)^2\right]

Intuition: if σ₁² < σ₂², then A can be estimated more accurately based on p₁(x[0]; A).

The PDF p(x[0]; A) is called the likelihood function of the unknown parameter A, with x[0] fixed. We need to formally characterize the curvature of the likelihood function.

For the single-measurement Gaussian likelihood,

    \ln p(x[0]; A) = -\ln\sqrt{2\pi\sigma^2} - \frac{1}{2\sigma^2}\left(x[0] - A\right)^2

    \frac{\partial \ln p(x[0]; A)}{\partial A} = \frac{1}{\sigma^2}\left(x[0] - A\right)

    -\frac{\partial^2 \ln p(x[0]; A)}{\partial A^2} = \frac{1}{\sigma^2} \quad \to \quad \text{"curvature"}

The "curvature" increases as σ² decreases. The average curvature (varying x[0]) is given by

    -E\left[\frac{\partial^2 \ln p(x[0]; A)}{\partial A^2}\right]

where the expectation is taken with respect to p(x[0]; A). Note: in this special case, the "curvature" is independent of A.

Cramer-Rao Lower Bound

CRLB, Scalar Parameter
Suppose the likelihood function p(x; θ) satisfies the regularity condition

    E\left[\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta}\right] = 0, \qquad \forall \theta \in \Omega

Then the variance of any unbiased estimator θ̂ (i.e., its MSE) must satisfy

    \mathrm{Var}(\hat{\theta}) \geq \frac{1}{-E\left[\dfrac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]}

where the expectation is taken with respect to p(x; θ). Furthermore, an unbiased estimator attains the lower bound if and only if

    \frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta} = I(\theta)\,\left(g(\mathbf{x}) - \theta\right)

for some functions I(·) and g(·). In that case the estimator is θ̂ = g(x), with variance 1/I(θ).

DC Level in White Gaussian Noise
Consider

    x[n] = A + w[n], \qquad n = 0, 1, \ldots, N - 1

The likelihood function is given by

    p(\mathbf{x}; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\left(x[n] - A\right)^2\right]

And we have

    \frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\left(x[n] - A\right) = \frac{N}{\sigma^2}\left(\bar{x} - A\right)

    \frac{\partial^2 \ln p(\mathbf{x}; A)}{\partial A^2} = -\frac{N}{\sigma^2}, \qquad \mathrm{Var}(\hat{A}) \geq \frac{\sigma^2}{N}

For this case, the CRLB is attained by \hat{A} = \bar{x} = \frac{1}{N}\sum_{n=0}^{N-1} x[n].

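A Monte Carlo sketch, not from the slides (the parameters are arbitrary), that checks the regularity condition, equates the score variance with the Fisher information N/σ², and confirms the sample mean attains the CRLB:

```python
import numpy as np

rng = np.random.default_rng(2)
A, sigma, N, trials = 0.7, 1.3, 50, 100_000
x = A + sigma * rng.standard_normal((trials, N))
xbar = x.mean(axis=1)
score = (N / sigma**2) * (xbar - A)   # d/dA ln p(x; A), one value per trial
print(score.mean())                   # ~0: the regularity condition
print(score.var(), N / sigma**2)      # score variance ~ Fisher information
print(xbar.var(), sigma**2 / N)       # the sample mean attains the CRLB
```
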
The CRLB may not be attained by any unbiased estimator. An MVU estimator may or may not be efficient (i.e., attain the CRLB).

Alternative Form of CRLB*

If the likelihood function p(x; θ) satisfies the regularity condition, we have²

    E\left[\left(\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta}\right)^2\right] = -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]

and then

    \mathrm{Var}(\hat{\theta}) \geq \frac{1}{E\left[\left(\dfrac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta}\right)^2\right]}

² See Appendix 3A of S. Kay, Vol. 1.

Fisher Information

The Fisher information is defined by

    I(\theta) \triangleq -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]

Intuitively, the more information, the lower the bound. Furthermore, it is non-negative and additive for independent observations:

    \ln p(\mathbf{x}; \theta) = \sum_{n=0}^{N-1} \ln p(x[n]; \theta)

    -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right] = -\sum_{n=0}^{N-1} E\left[\frac{\partial^2 \ln p(x[n]; \theta)}{\partial \theta^2}\right]

    I(\theta) = N\,i(\theta) = -N\,E\left[\frac{\partial^2 \ln p(x[n]; \theta)}{\partial \theta^2}\right] \quad \text{(if identically distributed)}

Example: CRLB for Signals in White Gaussian Noise

Consider a deterministic signal parameterized by θ in WGN:

    x[n] = s[n; \theta] + w[n], \qquad n = 0, 1, \ldots, N - 1

We have

    \frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\left(x[n] - s[n; \theta]\right)\frac{\partial s[n; \theta]}{\partial \theta}

    E\left(\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right) = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1}\left(\frac{\partial s[n; \theta]}{\partial \theta}\right)^2

and finally

    \mathrm{Var}(\hat{\theta}) \geq \frac{\sigma^2}{\sum_{n=0}^{N-1}\left(\dfrac{\partial s[n; \theta]}{\partial \theta}\right)^2}

This bound implies that signals which change rapidly with θ yield accurate estimators.

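The bound is easy to evaluate numerically for any parametric signal. The sketch below is a hypothetical example, with s[n; f₀] = cos(2πf₀n) and a finite-difference derivative standing in for ∂s/∂f₀; because the derivative grows with n, the sum is large and the bound is small, illustrating the point above.

```python
import numpy as np

sigma, N, f0, eps = 1.0, 100, 0.08, 1e-6
n = np.arange(N)
s = lambda f: np.cos(2 * np.pi * f * n)        # hypothetical s[n; f0]
ds = (s(f0 + eps) - s(f0 - eps)) / (2 * eps)   # central-difference ds/df0
print(sigma**2 / np.sum(ds**2))                # numerical CRLB for f0
```
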
Transformation of Parameters

Suppose we want to estimate α = g(θ). Then the CRLB for estimating α is given by³

    \mathrm{Var}(\hat{\alpha}) \geq \frac{\left(\dfrac{\partial g}{\partial \theta}\right)^2}{-E\left[\dfrac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]}

• Efficiency of an estimator is destroyed by a nonlinear transformation. E.g., x̄ is an efficient estimator of the DC level A, but x̄² is not even unbiased for A² (see the sketch below).

• Efficiency is maintained by an affine transformation: g(θ) = aθ + b.

³ See Appendix 3A.

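A minimal simulation, not from the slides, of the first bullet: the empirical mean of x̄² converges to A² + σ²/N rather than A².

```python
import numpy as np

rng = np.random.default_rng(3)
A, sigma, N, trials = 1.0, 1.0, 10, 500_000
xbar = (A + sigma * rng.standard_normal((trials, N))).mean(axis=1)
print((xbar**2).mean(), A**2 + sigma**2 / N)   # ~1.1 for both, not A^2 = 1
```
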
Extension to a Vector Parameter

Suppose we want to estimate a vector parameter θ = [θ₁, θ₂, ..., θ_p]ᵀ. Then we have⁴

    \mathrm{Var}(\hat{\theta}_i) \geq \left[\mathbf{I}^{-1}(\boldsymbol{\theta})\right]_{ii}

where I(θ) ∈ R^{p×p} is the Fisher information matrix, defined by

    \left[\mathbf{I}(\boldsymbol{\theta})\right]_{ij} = -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \theta_i \partial \theta_j}\right]

⁴ See Appendix 3B.

Example: DC Level in WGN

Suppose we do not know the noise power σ² and want to simultaneously estimate θ = [A, σ²]ᵀ. Then the Fisher information matrix is given by

    \mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix} -E\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^2}\right] & -E\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial \sigma^2}\right] \\ -E\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \sigma^2\,\partial A}\right] & -E\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial (\sigma^2)^2}\right] \end{bmatrix}

We know that

    \ln p(\mathbf{x}; \boldsymbol{\theta}) = -\frac{N}{2}\ln 2\pi - \frac{N}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\left(x[n] - A\right)^2

and it implies

    \mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix} \frac{N}{\sigma^2} & 0 \\ 0 & \frac{N}{2\sigma^4} \end{bmatrix}

The diagonal structure implies that the estimates of A and σ² are uncorrelated (not always the case).

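A small sketch (assumed N = 100, σ² = 2; not from the slides) that builds this FIM and reads the per-parameter CRLBs off the diagonal of its inverse:

```python
import numpy as np

N, sigma2 = 100, 2.0
I = np.diag([N / sigma2, N / (2 * sigma2**2)])   # FIM for theta = [A, sigma^2]
crlb = np.diag(np.linalg.inv(I))
print(crlb, [sigma2 / N, 2 * sigma2**2 / N])     # per-parameter CRLBs agree
```
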
Example: Line Fitting

Consider the problem of line fitting in WGN with parameters θ = [A, B]ᵀ:

    x[n] = A + Bn + w[n]

The Fisher information matrix is given by

    \mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix} -E\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^2}\right] & -E\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial B}\right] \\ -E\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial B\,\partial A}\right] & -E\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial B^2}\right] \end{bmatrix}
    = \frac{1}{\sigma^2}\begin{bmatrix} N & \frac{N(N-1)}{2} \\ \frac{N(N-1)}{2} & \frac{N(N-1)(2N-1)}{6} \end{bmatrix}

The Fisher information matrix I(θ) is not diagonal, which implies the estimates of A and B are correlated. Indeed, in this case, we have

    \mathrm{Var}(\hat{A}) \geq \frac{2(2N-1)\sigma^2}{N(N+1)} \geq \frac{\sigma^2}{N}

Conclusion: the CRLB increases (more precisely, is non-decreasing) as we estimate more parameters.

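As a check, not part of the slides: the same FIM can be computed as HᵀH/σ² with H the line-fitting observation matrix introduced later, and the closed-form bound for Â can be compared with the σ²/N bound that applies when B is known.

```python
import numpy as np

N, sigma2 = 50, 1.0
n = np.arange(N)
H = np.column_stack([np.ones(N), n])       # observation matrix for A + B*n
I = H.T @ H / sigma2                       # Fisher information matrix
crlb_A = np.linalg.inv(I)[0, 0]
print(crlb_A, 2 * (2 * N - 1) * sigma2 / (N * (N + 1)))  # matches the closed form
print(sigma2 / N)                          # smaller bound when B is known
```
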
CRLB-Vector Parameter

CRLB
If the likelihood function p(x; θ) satisfies the regularity condition

    E\left[\frac{\partial \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right] = \mathbf{0}, \qquad \forall \boldsymbol{\theta}

then the covariance matrix of any unbiased estimator θ̂ satisfies

    \mathbf{C}_{\hat{\boldsymbol{\theta}}} = E\left[(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^T\right] \succeq \mathbf{I}^{-1}(\boldsymbol{\theta}), \qquad
    \left[\mathbf{I}(\boldsymbol{\theta})\right]_{ij} = -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \theta_i \partial \theta_j}\right]

An unbiased estimator attains the CRLB if and only if

    \frac{\partial \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \mathbf{I}(\boldsymbol{\theta})\left(\mathbf{g}(\mathbf{x}) - \boldsymbol{\theta}\right)

and the MVU estimator is then given by θ̂ = g(x).

Vector Parameter CRLB for Transformation

Suppose we want to estimate α = g(θ). Then we have

    \mathbf{C}_{\hat{\boldsymbol{\alpha}}} \succeq \frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\, \mathbf{I}^{-1}(\boldsymbol{\theta})\, \left(\frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\right)^T

where the Jacobian matrix is defined by

    \frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T} = \begin{bmatrix} \frac{\partial g_1(\boldsymbol{\theta})}{\partial \theta_1} & \frac{\partial g_1(\boldsymbol{\theta})}{\partial \theta_2} & \cdots & \frac{\partial g_1(\boldsymbol{\theta})}{\partial \theta_p} \\ \frac{\partial g_2(\boldsymbol{\theta})}{\partial \theta_1} & \frac{\partial g_2(\boldsymbol{\theta})}{\partial \theta_2} & \cdots & \frac{\partial g_2(\boldsymbol{\theta})}{\partial \theta_p} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g_r(\boldsymbol{\theta})}{\partial \theta_1} & \frac{\partial g_r(\boldsymbol{\theta})}{\partial \theta_2} & \cdots & \frac{\partial g_r(\boldsymbol{\theta})}{\partial \theta_p} \end{bmatrix}

As a special case, efficiency is maintained by an affine transformation.

Example: CRLB for Signal-to-Noise Ratio (SNR)

Consider the previous example in which both A and σ² are unknown. Suppose we want to estimate SNR = A²/σ². For θ = [A, σ²]ᵀ, we know that

    \mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix} \frac{N}{\sigma^2} & 0 \\ 0 & \frac{N}{2\sigma^4} \end{bmatrix}, \qquad
    \frac{\partial g(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T} = \left[\frac{2A}{\sigma^2},\; -\frac{A^2}{\sigma^4}\right]

so that

    \frac{\partial g(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\, \mathbf{I}^{-1}(\boldsymbol{\theta})\, \left(\frac{\partial g(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\right)^T = \frac{4\,\mathrm{SNR} + 2\,\mathrm{SNR}^2}{N}

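A sketch (arbitrary A, σ², N; not from the slides) that evaluates the transformation formula numerically and confirms the closed-form expression:

```python
import numpy as np

A, sigma2, N = 2.0, 1.0, 100
snr = A**2 / sigma2
I = np.diag([N / sigma2, N / (2 * sigma2**2)])
J = np.array([[2 * A / sigma2, -(A**2) / sigma2**2]])   # dg/d(theta^T)
crlb = (J @ np.linalg.inv(I) @ J.T).item()
print(crlb, (4 * snr + 2 * snr**2) / N)                 # both agree
```
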
CRLB for the General Gaussian Case

Consider the Gaussian model

    \mathbf{x} \sim \mathcal{N}\left(\boldsymbol{\mu}(\boldsymbol{\theta}), \mathbf{C}(\boldsymbol{\theta})\right)

In this case, we have

    \left[\mathbf{I}(\boldsymbol{\theta})\right]_{ij} = \left[\frac{\partial \boldsymbol{\mu}(\boldsymbol{\theta})}{\partial \theta_i}\right]^T \mathbf{C}^{-1}(\boldsymbol{\theta}) \left[\frac{\partial \boldsymbol{\mu}(\boldsymbol{\theta})}{\partial \theta_j}\right] + \frac{1}{2}\mathrm{tr}\left[\mathbf{C}^{-1}\frac{\partial \mathbf{C}}{\partial \theta_i}\,\mathbf{C}^{-1}\frac{\partial \mathbf{C}}{\partial \theta_j}\right]

Example: Random DC Level in WGN

Suppose the DC level is itself a Gaussian random variable:

    x[n] = A + w[n], \qquad A \sim \mathcal{N}(0, \sigma_A^2)

It can be shown that

    \mathbf{C}(\sigma_A^2) = \sigma_A^2 \mathbf{1}\mathbf{1}^T + \sigma^2 \mathbf{I}, \qquad
    \mathbf{C}^{-1}(\sigma_A^2) = \frac{1}{\sigma^2}\left(\mathbf{I} - \frac{\sigma_A^2}{\sigma^2 + N\sigma_A^2}\mathbf{1}\mathbf{1}^T\right)

And the Fisher information for σ_A² is

    I(\sigma_A^2) = \frac{1}{2}\left(\frac{N}{\sigma^2 + N\sigma_A^2}\right)^2

Singular Information Matrix⁵

Consider estimating a vector parameter θ ∈ R^m and its transformation α = f(θ) ∈ R^{m̄}. Let I(θ) be the information matrix for estimating θ:

    \mathbf{I}(\boldsymbol{\theta}) = E\left[\boldsymbol{\Delta}\boldsymbol{\Delta}^T\right] = E\left[\left(\frac{\partial \ln p(\mathbf{y}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)\left(\frac{\partial \ln p(\mathbf{y}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)^T\right]

Let W ∈ R^{m×m̄} be an arbitrary matrix that does not depend on y:

    E\left\{\left[(\hat{\boldsymbol{\alpha}} - E\hat{\boldsymbol{\alpha}}) - \mathbf{W}^T\boldsymbol{\Delta}\right]\left[(\hat{\boldsymbol{\alpha}} - E\hat{\boldsymbol{\alpha}}) - \mathbf{W}^T\boldsymbol{\Delta}\right]^T\right\}
    = \mathbf{C}_{\hat{\boldsymbol{\alpha}}} - \mathbf{H}\mathbf{W} - \mathbf{W}^T\mathbf{H}^T + \mathbf{W}^T\mathbf{I}(\boldsymbol{\theta})\mathbf{W} \succeq \mathbf{0}

where H = ∂(Eα̂ − α)/∂θᵀ + ∂α/∂θᵀ.

⁵ P. Stoica and T. L. Marzetta, 2001.

Consider the eigenvector/eigenvalue decomposition of I(θ):

    \mathbf{I}(\boldsymbol{\theta}) = \left[\mathbf{U}_1, \mathbf{U}_2\right] \begin{bmatrix} \boldsymbol{\Lambda}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{U}_1^T \\ \mathbf{U}_2^T \end{bmatrix}

where U = [U₁, U₂] is orthonormal, Λ₁ ∈ R^{r×r} is diagonal and positive definite, and r is the rank of I(θ). Then we have

    \mathbf{C}_{\hat{\boldsymbol{\alpha}}} \succeq \mathbf{H}_1\boldsymbol{\Lambda}_1^{-1}\mathbf{H}_1^T - \left(\mathbf{W}_1 - \boldsymbol{\Lambda}_1^{-1}\mathbf{H}_1^T\right)^T\boldsymbol{\Lambda}_1\left(\mathbf{W}_1 - \boldsymbol{\Lambda}_1^{-1}\mathbf{H}_1^T\right) + \mathbf{H}_2\mathbf{W}_2 + \mathbf{W}_2^T\mathbf{H}_2^T

where

    \begin{bmatrix} \mathbf{W}_1 \\ \mathbf{W}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{U}_1^T \\ \mathbf{U}_2^T \end{bmatrix}\mathbf{W}, \qquad \left[\mathbf{H}_1, \mathbf{H}_2\right] = \mathbf{H}\left[\mathbf{U}_1, \mathbf{U}_2\right]

The W₁ that maximizes the right-hand side is W₁ = Λ₁⁻¹H₁ᵀ, and then

    \mathbf{C}_{\hat{\boldsymbol{\alpha}}} \succeq \mathbf{H}_1\boldsymbol{\Lambda}_1^{-1}\mathbf{H}_1^T + \mathbf{H}_2\mathbf{W}_2 + \mathbf{W}_2^T\mathbf{H}_2^T

By setting W₂ = 0, we obtain a valid lower bound

    \mathbf{C}_{\hat{\boldsymbol{\alpha}}} \succeq \mathbf{H}_1\boldsymbol{\Lambda}_1^{-1}\mathbf{H}_1^T = \mathbf{H}\,\mathbf{I}(\boldsymbol{\theta})^{\dagger}\,\mathbf{H}^T

However, when H₂ ≠ 0, we can increase the right-hand side without limit. For example, choosing W₂ = σH₂ᵀ/2 gives

    \mathbf{C}_{\hat{\boldsymbol{\alpha}}} \succeq \mathbf{H}\,\mathbf{I}(\boldsymbol{\theta})^{\dagger}\,\mathbf{H}^T + \sigma\mathbf{H}_2\mathbf{H}_2^T

Sufficient conditions for H₂ = 0:

• The parameter α is only a function of U₁ᵀθ (e.g., α = U₁ᵀθ)

• U₁ has no dependence on θ

CRLB Analog of Bayes' Rule⁶

Consider the random observation model in WGN

    \mathbf{y} = \mathbf{x}\theta + \mathbf{w}

Some properties of the CRLB:

• For a fixed θ, the CRLB for θ decreases as the dimension of the observation y increases.

• For a fixed y, if additional parameters θ̃ are estimated, then the CRLB for θ increases as the dimension of θ̃ increases.

• Among all possible distributions of w with a fixed covariance matrix, the CRLB attains its maximum when w is Gaussian.

⁶ D. Zachariah and P. Stoica, 2015.

Consider a partition θ = [αᵀ, βᵀ]ᵀ. The Fisher information matrix can be partitioned as

    \mathbf{I}_{\boldsymbol{\theta}} = \begin{bmatrix} \mathbf{I}_{\alpha} & \mathbf{I}_{\alpha\beta} \\ \mathbf{I}_{\beta\alpha} & \mathbf{I}_{\beta} \end{bmatrix}

For two random vectors a, b, Bayes' rule is

    p(\mathbf{a}, \mathbf{b}) = p(\mathbf{a}\,|\,\mathbf{b})\, p(\mathbf{b})

We already know that

    \mathrm{CRLB}(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \begin{bmatrix} \mathbf{I}_{\alpha} & \mathbf{I}_{\alpha\beta} \\ \mathbf{I}_{\beta\alpha} & \mathbf{I}_{\beta} \end{bmatrix}^{-1}

    \mathrm{CRLB}(\boldsymbol{\alpha}\,|\,\boldsymbol{\beta}) = \mathbf{I}_{\alpha}^{-1} \quad \text{if } \boldsymbol{\beta} \text{ is known}

    \mathrm{CRLB}(\boldsymbol{\alpha}) = \left(\mathbf{I}_{\alpha} - \mathbf{I}_{\alpha\beta}\mathbf{I}_{\beta}^{-1}\mathbf{I}_{\beta\alpha}\right)^{-1} \quad \text{if } \boldsymbol{\beta} \text{ is unknown}

The key is to apply the Schur determinant formula

    \left|\begin{bmatrix} \mathbf{I}_{\alpha} & \mathbf{I}_{\alpha\beta} \\ \mathbf{I}_{\beta\alpha} & \mathbf{I}_{\beta} \end{bmatrix}\right| = \left|\mathbf{I}_{\alpha}\right|\left|\mathbf{I}_{\beta} - \mathbf{I}_{\beta\alpha}\mathbf{I}_{\alpha}^{-1}\mathbf{I}_{\alpha\beta}\right| = \left|\mathbf{I}_{\beta}\right|\left|\mathbf{I}_{\alpha} - \mathbf{I}_{\alpha\beta}\mathbf{I}_{\beta}^{-1}\mathbf{I}_{\beta\alpha}\right|

Then we have

    \left|\mathrm{CRLB}(\boldsymbol{\alpha}, \boldsymbol{\beta})\right| = \left|\mathrm{CRLB}(\boldsymbol{\alpha}\,|\,\boldsymbol{\beta})\right|\left|\mathrm{CRLB}(\boldsymbol{\beta})\right|

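A numerical sketch, not from the cited paper, checking the determinant identity; a random positive-definite matrix stands in for the Fisher information, partitioned 2 + 2:

```python
import numpy as np

rng = np.random.default_rng(9)
M = rng.standard_normal((4, 4))
I = M @ M.T + 4 * np.eye(4)                   # a positive-definite 4x4 FIM
Ia, Iab, Ib = I[:2, :2], I[:2, 2:], I[2:, 2:]
lhs = np.linalg.det(np.linalg.inv(I))                         # |CRLB(alpha, beta)|
det_a_given_b = np.linalg.det(np.linalg.inv(Ia))              # |CRLB(alpha | beta)|
det_b = np.linalg.det(np.linalg.inv(Ib - Iab.T @ np.linalg.inv(Ia) @ Iab))  # |CRLB(beta)|
print(lhs, det_a_given_b * det_b)             # equal up to floating-point error
```
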
Linear Models

In general, it is difficult to find the MVU estimator. But for linear data models, we can find the optimal unbiased estimator and provide a statistical performance guarantee.

Example: Line Fitting Problem

Consider the straight-line model in WGN:

    x[n] = A + Bn + w[n], \qquad n = 0, 1, \ldots, N - 1

In matrix form, we have

    \mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}, \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})

where

    \mathbf{x} = \left[x[0], x[1], \ldots, x[N-1]\right]^T, \qquad \mathbf{w} = \left[w[0], w[1], \ldots, w[N-1]\right]^T

    \boldsymbol{\theta} = [A, B]^T, \qquad \mathbf{H} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & N-1 \end{bmatrix}

H is referred to as the observation matrix.

We know that an unbiased estimator θ̂ = g(x) will be the MVU if

    \frac{\partial \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \mathbf{I}(\boldsymbol{\theta})\left(\mathbf{g}(\mathbf{x}) - \boldsymbol{\theta}\right)

For the line fitting problem, if HᵀH is invertible, we have

    \frac{\partial \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \frac{\mathbf{H}^T\mathbf{H}}{\sigma^2}\left[(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} - \boldsymbol{\theta}\right]

and

    \mathbf{I}(\boldsymbol{\theta}) = \frac{\mathbf{H}^T\mathbf{H}}{\sigma^2}, \qquad \hat{\boldsymbol{\theta}} = \mathbf{g}(\mathbf{x}) = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}

MVU for the Linear Model

Consider the parameterized model

    \mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}

where H ∈ R^{N×p} is the observation matrix of rank p and w ~ N(0, σ²I). Then the MVU estimator is

    \hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}

and the covariance matrix of θ̂ is

    \mathbf{C}_{\hat{\boldsymbol{\theta}}} = \sigma^2(\mathbf{H}^T\mathbf{H})^{-1}

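A minimal sketch of this estimator, assuming the line-fitting H from the earlier example and an arbitrary noise level; np.linalg.lstsq computes (HᵀH)⁻¹Hᵀx more stably than an explicit matrix inverse:

```python
import numpy as np

rng = np.random.default_rng(4)
N, sigma = 100, 0.5
n = np.arange(N)
H = np.column_stack([np.ones(N), n])               # line-fitting observation matrix
theta = np.array([1.0, 0.2])                       # true [A, B]
x = H @ theta + sigma * rng.standard_normal(N)
theta_hat, *_ = np.linalg.lstsq(H, x, rcond=None)  # MVU estimate (H^T H)^{-1} H^T x
C = sigma**2 * np.linalg.inv(H.T @ H)              # covariance of the MVU estimator
print(theta_hat, np.sqrt(np.diag(C)))
```
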
Linear Model Examples

Consider the curve fitting problem

    x(t_n) = \theta_1 + \theta_2 t_n + \theta_3 t_n^2 + w(t_n), \qquad n = 0, 1, \ldots, N - 1

In matrix form, we have

    \mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}, \qquad \mathbf{x} = \left[x(t_0), x(t_1), \ldots, x(t_{N-1})\right]^T

    \boldsymbol{\theta} = [\theta_1, \theta_2, \theta_3]^T, \qquad \mathbf{H} = \begin{bmatrix} 1 & t_0 & t_0^2 \\ 1 & t_1 & t_1^2 \\ \vdots & \vdots & \vdots \\ 1 & t_{N-1} & t_{N-1}^2 \end{bmatrix}

H is a Vandermonde matrix and has full column rank if at least three sample points are distinct.

Consider the Fourier analysis problem

    x[n] = \sum_{k=1}^{M} a_k \cos\left(\frac{2\pi k n}{N}\right) + \sum_{k=1}^{M} b_k \sin\left(\frac{2\pi k n}{N}\right) + w[n], \qquad n = 0, 1, \ldots, N - 1

In matrix form, we have

    \boldsymbol{\theta} = \left[a_1, a_2, \ldots, a_M, b_1, b_2, \ldots, b_M\right]^T

    \mathbf{H} = \begin{bmatrix} 1 & \cdots & 1 & 0 & \cdots & 0 \\ \cos\left(\frac{2\pi}{N}\right) & \cdots & \cos\left(\frac{2\pi M}{N}\right) & \sin\left(\frac{2\pi}{N}\right) & \cdots & \sin\left(\frac{2\pi M}{N}\right) \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \cos\left(\frac{2\pi(N-1)}{N}\right) & \cdots & \cos\left(\frac{2\pi M(N-1)}{N}\right) & \sin\left(\frac{2\pi(N-1)}{N}\right) & \cdots & \sin\left(\frac{2\pi M(N-1)}{N}\right) \end{bmatrix}

It can be shown that HᵀH = (N/2)I, and the MVU estimate of θ is given by

    \hat{a}_k = \frac{2}{N}\sum_{n=0}^{N-1} x[n]\cos\left(\frac{2\pi k n}{N}\right), \qquad
    \hat{b}_k = \frac{2}{N}\sum_{n=0}^{N-1} x[n]\sin\left(\frac{2\pi k n}{N}\right)

and the covariance matrix is

    \mathbf{C}_{\hat{\boldsymbol{\theta}}} = \frac{2\sigma^2}{N}\mathbf{I}

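A sketch (assumed N = 64, M = 2 with arbitrary coefficients; not from the slides) that generates data from this model and recovers the coefficients with the correlation formulas above:

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, sigma = 64, 2, 0.3
n = np.arange(N)
a_true, b_true = [1.0, -0.5], [0.2, 0.8]
x = sigma * rng.standard_normal(N)
for k in range(1, M + 1):
    x = x + a_true[k - 1] * np.cos(2 * np.pi * k * n / N) \
          + b_true[k - 1] * np.sin(2 * np.pi * k * n / N)
a_hat = [2 / N * np.sum(x * np.cos(2 * np.pi * k * n / N)) for k in range(1, M + 1)]
b_hat = [2 / N * np.sum(x * np.sin(2 * np.pi * k * n / N)) for k in range(1, M + 1)]
print(a_hat, b_hat)   # close to the true values; each estimate has variance 2*sigma^2/N
```
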
Linear Model in Colored Noise

Suppose the noise is not white:

    \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})

If C is positive definite, we can use the whitening approach. Let C⁻¹ = DᵀD; then

    E\left[(\mathbf{D}\mathbf{w})(\mathbf{D}\mathbf{w})^T\right] = \mathbf{D}\mathbf{C}\mathbf{D}^T = \mathbf{I}

Then, given x = Hθ + w, we can consider the whitened data

    \mathbf{x}' = \mathbf{D}\mathbf{x} = \mathbf{D}\mathbf{H}\boldsymbol{\theta} + \mathbf{D}\mathbf{w}

The MVU estimator and its covariance are

    \hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}, \qquad
    \mathbf{C}_{\hat{\boldsymbol{\theta}}} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}

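A sketch of the colored-noise MVU estimator; for illustration it assumes an AR(1)-style covariance C with correlation 0.9, but any positive-definite C would do:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 50
n = np.arange(N)
C = 0.9 ** np.abs(n[:, None] - n[None, :])           # assumed noise covariance
H = np.column_stack([np.ones(N), n])
theta = np.array([1.0, 0.1])
w = np.linalg.cholesky(C) @ rng.standard_normal(N)   # colored noise with covariance C
x = H @ theta + w
Ci = np.linalg.inv(C)
theta_hat = np.linalg.solve(H.T @ Ci @ H, H.T @ Ci @ x)
cov = np.linalg.inv(H.T @ Ci @ H)                    # covariance of the estimator
print(theta_hat, np.sqrt(np.diag(cov)))
```
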
General Minimum Variance Unbiased Estimation

For the class of linear models, we can easily find the MVU estimator, which is also efficient (attains the CRLB). Now we try to find MVU estimators that may not be efficient. For this, we need the concept of sufficient statistics and the Rao-Blackwell-Lehmann-Scheffe theorem.

Example: DC Level in WGN

Consider the likelihood function:

    p(\mathbf{x}; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\left(x[n] - A\right)^2\right]

Assume the statistic T(x) = Σ_{n=0}^{N-1} x[n] = T₀ is observed. We then look at the conditional probability p(x | T₀; A):

    p(\mathbf{x}\,|\,T(\mathbf{x}) = T_0; A) = \frac{p(\mathbf{x}, T(\mathbf{x}) = T_0; A)}{p(T(\mathbf{x}) = T_0; A)} = \frac{p(\mathbf{x}; A)\,\delta(T(\mathbf{x}) - T_0)}{p(T(\mathbf{x}) = T_0; A)}

where δ(·) is the Dirac delta function, and

    p(\mathbf{x}; A)\,\delta(T(\mathbf{x}) - T_0) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{n=0}^{N-1}x^2[n] - 2AT_0 + NA^2\right)\right]\delta(T(\mathbf{x}) - T_0)

Then we have

    p(\mathbf{x}\,|\,T(\mathbf{x}) = T_0; A) = \frac{\sqrt{N}}{(2\pi\sigma^2)^{(N-1)/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}x^2[n]\right] \exp\left[\frac{T_0^2}{2N\sigma^2}\right]\delta(T(\mathbf{x}) - T_0)

which is not a function of A and implies T(x) is a sufficient statistic.

Sufficient Statistics

Neyman-Fisher Factorization
If the likelihood function p(x; θ) can be factorized as

    p(\mathbf{x}; \theta) = g(T(\mathbf{x}), \theta)\, h(\mathbf{x})

where g is a function of T(x) and θ, and h is a function only of x, then T(x) is a sufficient statistic for θ. Conversely, if T(x) is a sufficient statistic for θ, then the likelihood function can be factorized as above.

Example

Consider the DC level in WGN:

    p(\mathbf{x}; A) = \underbrace{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\left(NA^2 - 2A\sum_{n=0}^{N-1}x[n]\right)\right]}_{g(T(\mathbf{x}),\,A)} \underbrace{\exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}x^2[n]\right]}_{h(\mathbf{x})}

And then T(x) = Σ_{n=0}^{N-1} x[n] is a sufficient statistic for estimating A.

Example

Suppose we want to estimate the phase of a sinusoid:

    x[n] = A\cos(2\pi f_0 n + \phi) + w[n], \qquad n = 0, 1, \ldots, N - 1

where A, f₀, σ² are known. The likelihood function is

    p(\mathbf{x}; \phi) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left\{-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\left[x[n] - A\cos(2\pi f_0 n + \phi)\right]^2\right\}

    = \underbrace{\frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{n=0}^{N-1}A^2\cos^2(2\pi f_0 n + \phi) - 2AT_1(\mathbf{x})\cos\phi + 2AT_2(\mathbf{x})\sin\phi\right]\right\}}_{g(T_1(\mathbf{x}),\,T_2(\mathbf{x}),\,\phi)} \cdot \underbrace{\exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}x^2[n]\right]}_{h(\mathbf{x})}

where T₁(x) = Σ_{n=0}^{N-1} x[n] cos 2πf₀n and T₂(x) = Σ_{n=0}^{N-1} x[n] sin 2πf₀n, so the pair (T₁(x), T₂(x)) is jointly sufficient for φ.

More Examples

Uniform Distribution
Let X₁, ..., Xₙ be independently distributed according to the uniform distribution U(0, θ). Let T be the largest of the n X's, and consider the conditional distribution of the remaining n − 1 X's given T = t. These n − 1 points are uniformly distributed on (0, t), independently of θ, so T is sufficient.

Symmetric Distribution
Suppose X is symmetrically distributed about zero. Then, given that |X| = t, the only two possible values of X are ±t, each with conditional probability 1/2. Hence T = |X| is sufficient.

Better Estimator Using a Sufficient Statistic T*

Rao-Blackwell Theorem
Let X be a random observable with distribution P_θ ∈ P = {P_θ', θ' ∈ Ω}, and let T be sufficient for P. Let θ̂ be an estimator of an estimand g(θ), and let the loss function L(θ, d) be strictly convex in d. Then, if θ̂ has finite expectation and risk,

    R(\theta, \hat{\theta}) = E\,L[\theta, \hat{\theta}] < \infty

and if

    \breve{\theta}(t) = E[\hat{\theta} \,|\, T = t]

then the risk of the estimator θ̆(T) satisfies

    R(\theta, \breve{\theta}) < R(\theta, \hat{\theta})

unless θ̂(X) = θ̆(T) with probability 1.

Find the MVU Estimator

Given a sufficient statistic, we can find the MVU estimator, which may not be efficient.

Rao-Blackwell-Lehmann-Scheffe
If θ̆ is an unbiased estimator of θ and T(x) is a sufficient statistic for θ, then θ̂ = E(θ̆ | T(x)) is

• a valid estimator of θ (it does not depend on the unknown θ)

• unbiased

• of lesser or equal variance than that of θ̆, for all θ

Additionally, if the sufficient statistic is complete, then θ̂ is the MVU estimator.

A statistic is complete if there is only one function of the statistic that is unbiased.

Example: DC Level in WGN

Consider the unbiased estimator Ă = x[0] and the sufficient statistic T(x) = Σ_{n=0}^{N-1} x[n]. We study the estimator Â = E(x[0] | Σ_{n=0}^{N-1} x[n]).

For [x, y]ᵀ a Gaussian random vector with mean vector μ = [E(x), E(y)]ᵀ and covariance matrix

    \mathbf{C} = \begin{bmatrix} \mathrm{Var}(x) & \mathrm{Cov}(x, y) \\ \mathrm{Cov}(y, x) & \mathrm{Var}(y) \end{bmatrix}

it can be shown that⁷

    E(x\,|\,y) = \int_{-\infty}^{\infty} x\, p(x\,|\,y)\, dx = E(x) + \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(y)}\left(y - E(y)\right)

Applying this with x = x[0] and y = T(x), where Cov(x[0], T) = σ² and Var(T) = Nσ², gives

    \hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]

⁷ See Appendix 10A.

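The slope and intercept of a linear regression of x[0] on T estimate this conditional mean empirically. A Monte Carlo sketch, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(8)
A, sigma, N, trials = 1.0, 1.0, 10, 200_000
x = A + sigma * rng.standard_normal((trials, N))
T = x.sum(axis=1)
# Regressing x[0] on T estimates E(x[0]|T) = intercept + slope*T:
slope, intercept = np.polyfit(T, x[:, 0], 1)
print(slope, 1 / N)   # slope ~ Cov(x[0],T)/Var(T) = sigma^2/(N sigma^2) = 1/N
print(intercept)      # ~0, so E(x[0]|T) = T/N, the sample mean
```
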
Next, we want to show T(x) = Σ_{n=0}^{N-1} x[n] is complete. We have shown that g(T(x)) = T(x)/N is unbiased; suppose there exists a second function h(·) with E[h(T(x))] = A. Then, with v(T) = g(T) − h(T),

    E[g(T) - h(T)] = E(v(T)) = 0

    \int_{-\infty}^{\infty} v(T)\, \frac{1}{\sqrt{2\pi N\sigma^2}} \exp\left[-\frac{1}{2N\sigma^2}(T - NA)^2\right] dT = 0, \qquad \forall A

which implies

    v(T) = g(T) - h(T) = 0 \quad \text{a.e.}

Incomplete Sufficient Statistic

Consider a DC level in bounded noise:

    x[0] = A + w[0], \qquad w[0] \sim U\left[-\tfrac{1}{2}, \tfrac{1}{2}\right]

A trivial unbiased estimator is x[0], which is also a sufficient statistic. Let g(x[0]) = x[0] and let h(x[0]) be any function such that E(h(x[0])) = A. Then, with v = g − h, we need

    \int_{-\infty}^{\infty} v(T)\, p(\mathbf{x}; A)\, d\mathbf{x} = \int_{-\infty}^{\infty} v(T)\, p(T; A)\, dT = 0

Note that

    p(T; A) = \begin{cases} 1 & A - \frac{1}{2} \leq T \leq A + \frac{1}{2} \\ 0 & \text{otherwise} \end{cases}

So the condition becomes

    \int_{A - 1/2}^{A + 1/2} v(T)\, dT = 0, \qquad \forall A

Unlike before, besides v(T) = 0, v(T) = 2 sin 2πT also satisfies the condition (it integrates to zero over every unit-length interval). We can then choose h(T) = T − sin 2πT, giving

    \hat{A} = x[0] - \sin 2\pi x[0]

which is also an unbiased estimator based on the sufficient statistic x[0]. So x[0] is sufficient but not complete.

In general, we need

    \int_{-\infty}^{\infty} v(T)\, p(T; \theta)\, dT = 0, \qquad \forall \theta

to be satisfied only by the zero function.

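A quick numerical check, not from the slides, that v(T) = 2 sin 2πT integrates to zero over every unit-length window, using a plain Riemann sum:

```python
import numpy as np

for A in (0.0, 0.3, 1.7):
    t = np.linspace(A - 0.5, A + 0.5, 100_000, endpoint=False)
    integral = np.mean(2 * np.sin(2 * np.pi * t))   # Riemann sum over a unit interval
    print(A, integral)   # ~0 for every A: a nonzero function of T with zero
                         # expectation under every p(T; A), so x[0] is not complete
```
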
Procedure to Find the MVU

[This slide consists of a figure in the original deck, which is not reproduced in this text extraction.]

Extension to the Vector Parameter

Neyman-Fisher Factorization Theorem: if we can factor the likelihood function p(x; θ) as

    p(\mathbf{x}; \boldsymbol{\theta}) = g(\mathbf{T}(\mathbf{x}), \boldsymbol{\theta})\, h(\mathbf{x})

then T(x) is a sufficient statistic for θ. Conversely, if T(x) is a sufficient statistic for θ, then the likelihood function can be factorized as above.

Example: Sinusoidal Parameter Estimation

Consider a sinusoidal signal in WGN:

    x[n] = A\cos 2\pi f_0 n + w[n], \qquad n = 0, 1, \ldots, N - 1

where the parameters are A, f₀, σ². The likelihood function with respect to θ = [A, f₀, σ²]ᵀ is

    p(\mathbf{x}; \boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\left(x[n] - A\cos 2\pi f_0 n\right)^2\right]

    = \underbrace{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{n=0}^{N-1}x^2[n] - 2A\sum_{n=0}^{N-1}x[n]\cos 2\pi f_0 n + A^2\sum_{n=0}^{N-1}\cos^2 2\pi f_0 n\right)\right]}_{g(\mathbf{T}(\mathbf{x}),\,\boldsymbol{\theta})} \cdot 1

So the sufficient statistic is

    \mathbf{T}(\mathbf{x}) = \left[\sum_{n=0}^{N-1} x[n]\cos 2\pi f_0 n,\; \sum_{n=0}^{N-1} x^2[n]\right]^T

DC Level in WGN with Unknown Noise Power

Consider the linear model

    x[n] = A + w[n], \qquad n = 0, 1, \ldots, N - 1

where θ = [A, σ²]ᵀ. This is a special case of the sinusoidal parameter estimation problem with f₀ = 0. The sufficient statistic is

    \mathbf{T}(\mathbf{x}) = \begin{bmatrix} \sum_{n=0}^{N-1} x[n] \\ \sum_{n=0}^{N-1} x^2[n] \end{bmatrix}

Next, we try to find the MVU estimator based on T(x).

We have

    E(\mathbf{T}(\mathbf{x})) = \begin{bmatrix} NA \\ N(\sigma^2 + A^2) \end{bmatrix}

We can remove the bias by using the function g(·):

    \mathbf{g}(\mathbf{T}(\mathbf{x})) = \begin{bmatrix} \frac{1}{N}T_1(\mathbf{x}) \\ \frac{1}{N-1}\left[T_2(\mathbf{x}) - \frac{1}{N}T_1^2(\mathbf{x})\right] \end{bmatrix} = \begin{bmatrix} \bar{x} \\ \frac{1}{N-1}\left[\sum_{n=0}^{N-1}x^2[n] - N\bar{x}^2\right] \end{bmatrix}

And then

    \hat{\boldsymbol{\theta}} = \begin{bmatrix} \bar{x} \\ \frac{1}{N-1}\left[\sum_{n=0}^{N-1}x^2[n] - N\bar{x}^2\right] \end{bmatrix}

is the MVU estimator. Completeness can be shown similarly to the previous example.

Exponential Families*

Definition
A family of distributions {p_θ} is said to form an s-dimensional exponential family if the distribution p_θ is of the form

    p_\theta(x) = \exp\left[\sum_{i=1}^{s}\eta_i(\theta)T_i(x) - B(\theta)\right] h(x)

with respect to some common measure μ. The η_i and B are real-valued functions of the parameters, the T_i are real-valued statistics, and x is a point in the sample space X, the support of the density. The canonical form is

    p_\eta(x) = \exp\left[\sum_{i=1}^{s}\eta_i T_i(x) - A(\boldsymbol{\eta})\right] h(x)

The set Ξ of points η = (η₁, ..., η_s) for which ∫ exp(Σᵢ ηᵢTᵢ(x)) h(x) dμ(x) < ∞ is called the natural parameter space, and η is called the natural parameter. The set Ξ is convex.

Examples

Normal family: the distribution N(ξ, σ²) is a member of the exponential family with θ = (ξ, σ²):

    p_\theta(x) = \exp\left[\frac{\xi}{\sigma^2}x - \frac{1}{2\sigma^2}x^2 - \frac{\xi^2}{2\sigma^2}\right]\frac{1}{\sqrt{2\pi}\,\sigma}

with natural parameters (η₁, η₂) = (ξ/σ², −1/(2σ²)) ∈ Ξ = R × (−∞, 0).

Multinomial: in n independent trials with s + 1 possible outcomes, let the probability of the i-th outcome be pᵢ in each trial. Let Xᵢ denote the number of occurrences of outcome i; the multinomial distribution is

    p(X_0 = x_0, \ldots, X_s = x_s) = \frac{n!}{x_0!\cdots x_s!}\, p_0^{x_0}\cdots p_s^{x_s}

which can be rewritten as

    \exp\left(x_0\log p_0 + \cdots + x_s\log p_s\right)h(x)

Since Σᵢ xᵢ = n, the distribution is given by

    \exp\left[n\log p_0 + x_1\log(p_1/p_0) + \cdots + x_s\log(p_s/p_0)\right]h(x)

This is an s-dimensional exponential family with natural parameters

    \eta_i = \log(p_i/p_0), \qquad A(\boldsymbol{\eta}) = -n\log p_0 = n\log\left[1 + \sum_{i=1}^{s}e^{\eta_i}\right]

If neither the T's nor the η's satisfy a linear constraint, the natural parameter space is convex and contains an open s-dimensional rectangle. Such an exponential family is said to be of full rank.

Curved normal family: for the normal family N(ξ, σ²), if we know ξ = σ, then

    p_\theta(x) = \exp\left[\frac{1}{\xi}x - \frac{1}{2\xi^2}x^2 - \frac{1}{2}\right]\frac{1}{\sqrt{2\pi}\,\xi}, \qquad \xi > 0

The two-dimensional parameter (1/ξ, −1/(2ξ²)) lies on a curve in R², and this family is rank deficient.

Sufficient Statistic for Exponential Family*

Consider the exponential family. Then T = (T₁, ..., T_s) is minimal sufficient provided the family is

• of full rank, or

• such that the parameter space contains s + 1 points η^{(j)} which span R^s, in the sense that they do not belong to a proper affine subspace.

Note that T is always sufficient, by the factorization criterion.

Curved Normal Family

The curved normal family N(ξ, ξ²), with η = (1/ξ, −1/(2ξ²)), is rank deficient, but the parameter space contains three points (taking ξ = 1, 2, 3):

    \eta^{(0)} = \left(1, -\tfrac{1}{2}\right), \qquad \eta^{(1)} = \left(\tfrac{1}{2}, -\tfrac{1}{8}\right), \qquad \eta^{(2)} = \left(\tfrac{1}{3}, -\tfrac{1}{18}\right)

And the matrix

    \left(\eta^{(1)} - \eta^{(0)},\; \eta^{(2)} - \eta^{(0)}\right) = \begin{bmatrix} \frac{1}{2} - 1 & \frac{1}{3} - 1 \\ -\frac{1}{8} + \frac{1}{2} & -\frac{1}{18} + \frac{1}{2} \end{bmatrix}

is full rank. Thus the statistic T = (Σᵢ xᵢ, Σᵢ xᵢ²) is minimal sufficient.

However, T is not complete: unbiased estimators of the same quantity can be built separately from Σᵢ xᵢ and from Σᵢ xᵢ², and their difference is a nonzero function of T with zero expectation.

Completeness for Exponential Family⁷

If X is distributed according to an exponential family and the family is of full rank, then T = [T₁(X), ..., T_s(X)] is complete.

⁷ Theorem 1.6.22, TPE.

Sufficient Statistic for Finite Family*

Let P be a finite family with densities pᵢ, i = 0, 1, 2, ..., k, all having the same support. Then the statistic

    T(x) = \left(\frac{p_1(x)}{p_0(x)}, \frac{p_2(x)}{p_0(x)}, \ldots, \frac{p_k(x)}{p_0(x)}\right)

is minimal sufficient.

Sufficiency of Order Statistics⁸

Let X = (X₁, ..., Xₙ) be i.i.d. according to an unknown continuous distribution F, and let T = (X₍₁₎, ..., X₍ₙ₎), where X₍₁₎ < ··· < X₍ₙ₎ denotes the ordered observations, called the order statistics. By the continuity assumption, the X's are distinct with probability 1. Given T, the only possible outcomes are the n! vectors (X₍ᵢ₁₎, ..., X₍ᵢₙ₎), and by symmetry each of these has equal probability 1/n!. Thus, T is sufficient.

⁸ See TPE, Chap. 1.

The order statistics are complete for the following families of densities:

• P = {all probability measures on the real line with unimodal densities with respect to Lebesgue measure}

• P = {all probability densities with respect to Lebesgue measure} (see Problem 1.6.33 in TPE)

Nonparametric Families*

Suppose X₁, ..., Xₙ are i.i.d. with distribution F ∈ F. We do not parameterize the family F, but we assume F has a density. The estimand g(F) might be E(Xᵢ) = ∫ x dF(x), Var(Xᵢ), or P(Xᵢ ≤ a) = F(a). For this family, the order statistic X₍₁₎ < ··· < X₍ₙ₎ is a complete sufficient statistic.

An estimator θ̂(X₁, ..., Xₙ) is a function of the order statistic if and only if it is symmetric in its n arguments. Then, to find the MVU estimator, it suffices to find a symmetric unbiased estimator based on the order statistic.

Examples

Estimating the distribution function: we want to estimate g(F) = P(X ≤ a) = F(a) for a given a. The estimator is the number of X's that are ≤ a, divided by the sample size n. This simple estimator is symmetric and unbiased, and thus is the MVU estimator.

Estimating the mean: suppose E|X| < ∞, and let g(F) = ∫ x f(x) dx. The empirical mean X̄ is symmetric and unbiased, so X̄ is MVU. As a short proof, note that X₁ is unbiased for the mean, and compute E[X₁ | X₍₁₎, ..., X₍ₙ₎], which is the empirical mean.

Estimating the variance and second moment: for the variance, the estimator Σᵢ(Xᵢ − X̄)²/(n − 1) is symmetric and unbiased, and is MVU. For the second moment, the estimator Σᵢ Xᵢ²/n is a symmetric, unbiased, MVU estimator of E(X²).

The Information Inequality*

In previous slides, we showed that the CRLB may yield the best unbiased estimator under certain conditions. In many other cases, we can seek the MVU estimator.

More generally, we can use different types of information inequalities to measure the performance of a given estimator. For any estimator δ(x) of an estimand g(θ) and any function ψ(x, θ) with a finite second moment, the covariance inequality states that

    \mathrm{Var}(\delta) \geq \frac{\left[\mathrm{Cov}(\delta, \psi)\right]^2}{\mathrm{Var}(\psi)}

This inequality is not that helpful by itself, as the right-hand side is a function of δ.

The key is to find a proper ψ(x, θ):

Theorem 2.5.1, TPE
A necessary and sufficient condition for Cov(δ, ψ) to depend on δ only through g(θ) is that for all θ

    \mathrm{Cov}(U, \psi) = 0, \qquad \forall U \in \mathcal{U}

where U is the class of statistics defined as

    \mathcal{U} = \left\{U : E_\theta U = 0,\; E_\theta U^2 < \infty,\; \forall \theta \in \Omega\right\}

Example: Hammersley-Chapman-Robbins Inequality. Suppose X is distributed with density p_θ = p(x, θ), and p(x, θ) > 0 for all x. If θ and θ + Δ are two values for which g(θ) ≠ g(θ + Δ), then the function

    \psi(x, \theta) = \frac{p(x, \theta + \Delta)}{p(x, \theta)} - 1

satisfies the condition that

    \mathrm{Cov}(U, \psi) = 0, \qquad \forall U \in \mathcal{U}

And then we have

    \mathrm{Var}(\delta) \geq \left[g(\theta + \Delta) - g(\theta)\right]^2 \Big/\; E_\theta\left[\frac{p(X, \theta + \Delta)}{p(X, \theta)} - 1\right]^2

Information Inequality for Exponential Family (scalar)
Let X be distributed according to the exponential family with s = 1, and let

    \tau(\theta) = E_\theta(T)

called the mean-value parameter. Then

    I(\tau(\theta)) = \frac{1}{\mathrm{Var}_\theta(T)}

Example: information in a gamma variable. Let X follow Gamma(α, β):

    p_\beta(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha - 1}e^{-x/\beta} = e^{(-1/\beta)x - \alpha\log\beta}\, h(x)

With T(x) = x, E(T) = αβ, and the information about αβ is I(αβ) = 1/(αβ²).

Best Linear Unbiased Estimator (BLUE)

Motivation:

• The MVU estimator is difficult to find, even if it exists.

• If we do not know the likelihood function, the approaches based on the CRLB or sufficient statistics do not apply.

• If we restrict the estimator to be linear, we may only need the first and second moments of the underlying process.

Definition of the BLUE

We observe the data set {x[0], x[1], ..., x[N−1]}, whose likelihood function p(x; θ) depends on an unknown parameter θ. The BLUE restricts the estimator to be linear in the data:

    \hat{\theta} = \sum_{n=0}^{N-1} a_n x[n]

where the coefficients {a_n} are to be determined.

For example, for estimating the DC level in WGN, the MVU estimator is the sample mean

    \hat{\theta} = \bar{x} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]

which is also a linear estimator.

If the additive noise is not Gaussian but uniform, the MVU estimator can be shown to be

    \hat{\theta} = \frac{N + 1}{2N}\max_n x[n]

which is nonlinear in the data.

Consider the estimation of σ² from zero-mean data. The MVU estimator is

    \hat{\sigma}^2 = \frac{1}{N}\sum_{n=0}^{N-1} x^2[n]

If we force the estimator to be linear,

    \breve{\sigma}^2 = \sum_{n=0}^{N-1} a_n x[n]

then E(σ̆²) = 0 for any coefficients {a_n}, so no linear estimator of σ² is even unbiased.

Finding the BLUE

We enforce the unbiasedness constraint

    E(\hat{\theta}) = \sum_{n=0}^{N-1} a_n E(x[n]) = \theta

Then the variance of θ̂ is given by

    \mathrm{Var}(\hat{\theta}) = E\left[\left(\mathbf{a}^T\mathbf{x} - \mathbf{a}^T E(\mathbf{x})\right)^2\right] = \mathbf{a}^T\mathbf{C}_x\mathbf{a}

where C_x = E[(x − E(x))(x − E(x))ᵀ]. To satisfy the unbiasedness constraint, we must have

    E(x[n]) = s[n]\,\theta

where the s[n]'s are known. More generally, x[n] can be represented as

    x[n] = \theta s[n] + w[n]

where w[n] is zero-mean (though not necessarily Gaussian). The BLUE is thus applicable to amplitude estimation of known signals in noise.

To find the BLUE, we solve the constrained optimization problem

    \min_{\mathbf{a}}\; \mathrm{Var}(\hat{\theta}) = \mathbf{a}^T\mathbf{C}_x\mathbf{a} \quad \text{s.t.} \quad \sum_{n=0}^{N-1} a_n s[n] = \mathbf{a}^T\mathbf{s} = 1

The solution can be shown to be

    \mathbf{a}_{\mathrm{opt}} = \frac{\mathbf{C}_x^{-1}\mathbf{s}}{\mathbf{s}^T\mathbf{C}_x^{-1}\mathbf{s}}, \qquad
    \hat{\theta} = \frac{\mathbf{s}^T\mathbf{C}_x^{-1}\mathbf{x}}{\mathbf{s}^T\mathbf{C}_x^{-1}\mathbf{s}}, \qquad
    \mathrm{Var}(\hat{\theta}) = \frac{1}{\mathbf{s}^T\mathbf{C}_x^{-1}\mathbf{s}}

To derive the BLUE, we only need

• the scaled mean s

• the covariance C_x

Consider the DC level in uncorrelated noise:

    x[n] = A + w[n]

where Var(w[n]) = σₙ². In this case, s = 1, and the BLUE is given by

    \hat{A} = \frac{\mathbf{1}^T\mathbf{C}_x^{-1}\mathbf{x}}{\mathbf{1}^T\mathbf{C}_x^{-1}\mathbf{1}}, \qquad
    \mathrm{Var}(\hat{A}) = \frac{1}{\mathbf{1}^T\mathbf{C}_x^{-1}\mathbf{1}}

with covariance matrix

    \mathbf{C}_x = \mathrm{diag}\left(\sigma_0^2, \sigma_1^2, \ldots, \sigma_{N-1}^2\right)

Therefore,

    \hat{A} = \frac{\sum_{n=0}^{N-1} x[n]/\sigma_n^2}{\sum_{n=0}^{N-1} 1/\sigma_n^2}

The BLUE weights most heavily those samples with the smallest variances.
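
A simulation sketch, not from the slides (the per-sample variances are assumed), showing the inverse-variance weighted average is unbiased and attains variance 1/Σₙ(1/σₙ²):

```python
import numpy as np

rng = np.random.default_rng(7)
A, trials = 1.0, 100_000
var_n = np.array([0.1, 0.5, 1.0, 2.0, 4.0])        # assumed per-sample noise variances
x = A + np.sqrt(var_n) * rng.standard_normal((trials, var_n.size))
A_blue = (x / var_n).sum(axis=1) / (1.0 / var_n).sum()   # inverse-variance weighting
print(A_blue.mean())                               # ~1.0: unbiased
print(A_blue.var(), 1.0 / (1.0 / var_n).sum())     # variance equals 1/(1^T Cx^{-1} 1)
```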
Extension to a Vector Parameter

For a vector parameter, we enforce the constraint

    \hat{\boldsymbol{\theta}} = \mathbf{A}\mathbf{x}, \qquad E(\hat{\boldsymbol{\theta}}) = \boldsymbol{\theta}

To satisfy the unbiasedness constraint, we must have

    E(\mathbf{x}) = \mathbf{H}\boldsymbol{\theta}, \qquad \mathbf{A}\mathbf{H} = \mathbf{I}

The variance of each component is

    \mathrm{Var}(\hat{\theta}_i) = \mathbf{a}_i^T\mathbf{C}_x\mathbf{a}_i

where aᵢᵀ is the i-th row of A. The optimal solution is given by the Gauss-Markov theorem.

Gauss-Markov Theorem
If the data model is linear,

    \mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}

where w is zero-mean with covariance C, then the BLUE is

    \hat{\boldsymbol{\theta}} = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}

and the covariance of θ̂ is

    \mathbf{C}_{\hat{\boldsymbol{\theta}}} = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}

