MVUE
Prof. H. Qiao
UM-SJTU Joint Institute
May 16, 2023
Unbiased Estimators

An estimator $\hat{\theta}$ is unbiased if
$$E(\hat{\theta}) = \theta, \qquad \theta \in (a, b)$$
Running example (DC level in noise):
$$x[n] = A + w[n], \qquad n = 0, 1, \cdots, N-1$$
The estimator
$$\check{A} = \frac{1}{2N}\sum_{n=0}^{N-1} x[n], \qquad E(\check{A}) = \frac{A}{2} \;\begin{cases} = A & \text{if } A = 0 \\ \neq A & \text{otherwise} \end{cases}$$
is therefore biased for all $A \neq 0$.
Caution
An unbiased estimator is not necessarily a good estimator. For example, given $n$ uncorrelated, unbiased estimators $\{\hat{\theta}_i\}$, consider the averaging estimator
$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} \hat{\theta}_i$$
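A minimal numerical sketch of the point (assuming each $\hat{\theta}_i$ is an independent, unbiased Gaussian estimate; the values are illustrative, not from the slides): averaging keeps the estimator unbiased while dividing the variance by $n$.

    import numpy as np

    rng = np.random.default_rng(0)
    theta, n, trials = 2.0, 10, 100_000

    # Each row holds n uncorrelated, unbiased estimates theta_hat_i of theta.
    theta_i = theta + rng.normal(0.0, 1.0, size=(trials, n))
    theta_avg = theta_i.mean(axis=1)

    print(theta_avg.mean())  # ~ theta = 2.0: still unbiased
    print(theta_avg.var())   # ~ 1/n = 0.1: variance reduced by a factor of n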
Minimum Variance Criterion

Bias-Variance Trade-off
$$\mathrm{mse}(\hat{\theta}) = E\left\{\left[\left(\hat{\theta} - E(\hat{\theta})\right) + \left(E(\hat{\theta}) - \theta\right)\right]^2\right\} = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2$$
since the cross term vanishes: $E[\hat{\theta} - E(\hat{\theta})] = 0$.
Minimum Variance Criterion

For the scaled estimator $\check{A} = a\bar{x}$, we have
$$\mathrm{mse}(\check{A}) = \frac{a^2\sigma^2}{N} + (a-1)^2 A^2, \qquad a_{\mathrm{opt}} = \frac{A^2}{A^2 + \sigma^2/N}$$
Since $a_{\mathrm{opt}}$ depends on the unknown $A$, the minimum-mse estimator is not realizable. For simplicity, we first look for the unbiased estimator with the smallest variance (and hence the smallest mse in this case). Such an estimator may not exist.
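A quick sketch of this trade-off (with assumed values of $A$, $\sigma^2$, and $N$): sweep $a$, locate the numerical minimizer of the mse, and compare it with $a_{\mathrm{opt}}$.

    import numpy as np

    A, sigma2, N = 1.0, 2.0, 50        # assumed true values
    a = np.linspace(0.0, 1.5, 1501)

    # mse(a*xbar) = a^2*sigma^2/N + (a-1)^2*A^2
    mse = a**2 * sigma2 / N + (a - 1.0)**2 * A**2
    a_opt = A**2 / (A**2 + sigma2 / N)

    print(a[np.argmin(mse)], a_opt)    # numerical and analytic minimizers agree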
Existence of the Minimum Variance Unbiased Estimator

A Counterexample
Assume
$$x[0] \sim \mathcal{N}(\theta, 1), \qquad x[1] \sim \begin{cases} \mathcal{N}(\theta, 1) & \text{if } \theta \geq 0 \\ \mathcal{N}(\theta, 2) & \text{if } \theta < 0 \end{cases}$$
Different weighted combinations of $x[0]$ and $x[1]$ achieve the minimum variance on $\theta \geq 0$ and on $\theta < 0$, so no single unbiased estimator has minimum variance for all $\theta$.
Finding the Minimum Variance Unbiased Estimator*
Estimator Accuracy Considerations

$$\ln p(x[0]; A) = -\ln\sqrt{2\pi\sigma^2} - \frac{1}{2\sigma^2}(x[0] - A)^2$$
$$\frac{\partial \ln p(x[0]; A)}{\partial A} = \frac{1}{\sigma^2}(x[0] - A)$$
$$-\frac{\partial^2 \ln p(x[0]; A)}{\partial A^2} = \frac{1}{\sigma^2} \rightarrow \text{"curvature"}$$
The "curvature" increases as $\sigma^2$ decreases. The average curvature (over realizations of $x[0]$) is given by
$$-E\left[\frac{\partial^2 \ln p(x[0]; A)}{\partial A^2}\right]$$
where the expectation is taken with respect to $p(x[0]; A)$. Note: in this special case, the "curvature" is independent of $A$.
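A Monte Carlo sketch (assumed $A$ and $\sigma^2$) of the quantities above: the score $(x[0]-A)/\sigma^2$ has zero mean, matching the regularity condition used later, and its variance equals the average curvature $1/\sigma^2$.

    import numpy as np

    rng = np.random.default_rng(0)
    A, sigma2 = 1.0, 0.5                     # assumed values
    x0 = A + rng.normal(0.0, np.sqrt(sigma2), size=500_000)

    score = (x0 - A) / sigma2                # d ln p(x[0]; A) / dA
    print(score.mean())                      # ~ 0: E[score] = 0
    print(score.var(), 1.0 / sigma2)         # ~ 2.0: equals the average curvature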
Cramer-Rao Lower Bound

CRLB - Scalar Parameter
Suppose the likelihood function $p(\mathbf{x}; \theta)$ satisfies the regularity condition
$$E\left[\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta}\right] = 0, \qquad \forall \theta \in \Omega$$
Then the variance of any unbiased estimator $\hat{\theta}$ (i.e. its MSE) must satisfy
$$\mathrm{Var}(\hat{\theta}) \geq \frac{1}{-E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]}$$
Furthermore, an unbiased estimator attaining the bound exists if and only if
$$\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta} = I(\theta)(g(\mathbf{x}) - \theta)$$
for some functions $I(\cdot)$ and $g(\cdot)$. The estimator is then $\hat{\theta} = g(\mathbf{x})$, with variance $1/I(\theta)$.
DC Level in White Gaussian Noise
Consider
$$x[n] = A + w[n], \qquad n = 0, 1, \cdots, N-1$$
where $w[n]$ is WGN with variance $\sigma^2$. We have
$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n] - A) = \frac{N}{\sigma^2}(\bar{x} - A)$$
$$\frac{\partial^2 \ln p(\mathbf{x}; A)}{\partial A^2} = -\frac{N}{\sigma^2}, \qquad \mathrm{Var}(\hat{A}) \geq \frac{\sigma^2}{N}$$
The score has the form $I(A)(g(\mathbf{x}) - A)$ with $I(A) = N/\sigma^2$ and $g(\mathbf{x}) = \bar{x}$, so the CRLB is attained by $\hat{\theta} = \bar{x} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$.
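A Monte Carlo check (assumed $A$, $\sigma^2$, $N$) that the sample mean is unbiased and its variance equals the CRLB $\sigma^2/N$:

    import numpy as np

    rng = np.random.default_rng(0)
    A, sigma2, N, trials = 1.0, 2.0, 20, 200_000   # assumed values

    x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
    A_hat = x.mean(axis=1)

    print(A_hat.mean())             # ~ A: unbiased
    print(A_hat.var(), sigma2 / N)  # ~ CRLB = 0.1: the bound is attained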
Alternative Form of CRLB*

Under the regularity condition,
$$E\left[\left(\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta}\right)^2\right] = -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]$$
and then
$$\mathrm{Var}(\hat{\theta}) \geq \frac{1}{E\left[\left(\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta}\right)^2\right]}$$
See Appendix 3A of S. Kay, Vol. 1.
Fisher Information

Intuitively, the more information, the lower the bound. Furthermore, the Fisher information is non-negative and additive for independent observations:
$$\ln p(\mathbf{x}; \theta) = \sum_{n=0}^{N-1} \ln p(x[n]; \theta)$$
$$-E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right] = \sum_{n=0}^{N-1} -E\left[\frac{\partial^2 \ln p(x[n]; \theta)}{\partial \theta^2}\right]$$
$$I(\theta) = N i(\theta) = -N E\left[\frac{\partial^2 \ln p(x[n]; \theta)}{\partial \theta^2}\right] \quad \text{(if identically distributed)}$$
Example: CRLB for Signals in White Gaussian Noise

Consider $x[n] = s[n; \theta] + w[n]$ with $w[n]$ WGN of variance $\sigma^2$. We have
$$\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n] - s[n; \theta])\frac{\partial s[n; \theta]}{\partial \theta}$$
$$E\left(\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right) = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1}\left(\frac{\partial s[n; \theta]}{\partial \theta}\right)^2$$
and finally
$$\mathrm{Var}(\hat{\theta}) \geq \frac{\sigma^2}{\sum_{n=0}^{N-1}\left(\frac{\partial s[n; \theta]}{\partial \theta}\right)^2}$$
This bound implies that signals that change rapidly with $\theta$ yield more accurate estimators.
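A sketch evaluating this bound for an assumed signal $s[n; \theta] = \cos(2\pi\theta n)$ (sinusoidal frequency estimation; not a slide example). The derivative $\partial s/\partial\theta = -2\pi n \sin(2\pi\theta n)$ grows with $n$, so the bound tightens rapidly with the record length:

    import numpy as np

    sigma2, N, theta = 0.1, 100, 0.23        # assumed noise power, length, frequency
    n = np.arange(N)

    # s[n; theta] = cos(2*pi*theta*n)  =>  ds/dtheta = -2*pi*n*sin(2*pi*theta*n)
    ds_dtheta = -2 * np.pi * n * np.sin(2 * np.pi * theta * n)
    crlb = sigma2 / np.sum(ds_dtheta**2)
    print(crlb)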
Transformation of Parameters

To estimate $\alpha = g(\theta)$ rather than $\theta$, the CRLB becomes
$$\mathrm{Var}(\hat{\alpha}) \geq \frac{\left(\frac{\partial g(\theta)}{\partial \theta}\right)^2}{-E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]}$$
For a linear transformation
$$g(\theta) = a\theta + b$$
efficiency is preserved. See Appendix 3A.
Extension to a Vector Parameter

For $\boldsymbol{\theta} = [\theta_1, \theta_2, \cdots, \theta_p]^T$, the covariance matrix of any unbiased estimator satisfies
$$\mathbf{C}_{\hat{\boldsymbol{\theta}}} \succeq \mathbf{I}^{-1}(\boldsymbol{\theta}), \qquad [\mathbf{I}(\boldsymbol{\theta})]_{ij} = -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \theta_i \partial \theta_j}\right]$$
See Appendix 3B.
Example: DC Level in WGN

For the DC level with unknown $\boldsymbol{\theta} = [A, \sigma^2]^T$, this implies
$$\mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix} \frac{N}{\sigma^2} & 0 \\ 0 & \frac{N}{2\sigma^4} \end{bmatrix}$$

Example: Line Fitting in WGN

Consider the problem of line fitting in WGN with parameters $\boldsymbol{\theta} = [A, B]^T$:
$$x[n] = A + Bn + w[n]$$
The Fisher information matrix $\mathbf{I}(\boldsymbol{\theta})$ is not diagonal, which implies the estimates of $A$ and $B$ are correlated. Indeed, in this case, we have
$$\mathrm{Var}(\hat{A}) \geq \frac{2(2N-1)\sigma^2}{N(N+1)} \geq \frac{\sigma^2}{N}$$
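A numerical check of the line-fitting bound: build $\mathbf{H} = [\mathbf{1}, \mathbf{n}]$, form $\mathbf{I}(\boldsymbol{\theta}) = \mathbf{H}^T\mathbf{H}/\sigma^2$, and compare the first diagonal entry of $\mathbf{I}^{-1}$ with the closed form above (assumed $\sigma^2$ and $N$):

    import numpy as np

    sigma2, N = 1.0, 50                      # assumed values
    n = np.arange(N)
    H = np.column_stack([np.ones(N), n])     # columns for A and B

    I = H.T @ H / sigma2                     # Fisher information matrix
    I_inv = np.linalg.inv(I)

    print(I_inv[0, 0])                                 # CRLB for A (B unknown)
    print(2 * (2 * N - 1) * sigma2 / (N * (N + 1)))    # closed form above
    print(sigma2 / N)                                  # smaller bound if B is known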
CRLB

If the likelihood function $p(\mathbf{x}; \boldsymbol{\theta})$ satisfies the regularity condition
$$E\left[\frac{\partial \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right] = \mathbf{0}, \qquad \forall \boldsymbol{\theta}$$
then $\mathbf{C}_{\hat{\boldsymbol{\theta}}} - \mathbf{I}^{-1}(\boldsymbol{\theta}) \succeq \mathbf{0}$ for any unbiased $\hat{\boldsymbol{\theta}}$, and an unbiased estimator attaining the bound exists if and only if
$$\frac{\partial \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \mathbf{I}(\boldsymbol{\theta})(\mathbf{g}(\mathbf{x}) - \boldsymbol{\theta})$$
in which case $\hat{\boldsymbol{\theta}} = \mathbf{g}(\mathbf{x})$ with covariance $\mathbf{I}^{-1}(\boldsymbol{\theta})$.
Vector Parameter CRLB for Transformation

For $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$,
$$\mathbf{C}_{\hat{\boldsymbol{\alpha}}} \succeq \frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\, \mathbf{I}^{-1}(\boldsymbol{\theta}) \left(\frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\right)^T$$
where the Jacobian matrix is defined by
$$\frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T} = \begin{bmatrix} \frac{\partial g_1(\boldsymbol{\theta})}{\partial \theta_1} & \frac{\partial g_1(\boldsymbol{\theta})}{\partial \theta_2} & \cdots & \frac{\partial g_1(\boldsymbol{\theta})}{\partial \theta_p} \\ \frac{\partial g_2(\boldsymbol{\theta})}{\partial \theta_1} & \frac{\partial g_2(\boldsymbol{\theta})}{\partial \theta_2} & \cdots & \frac{\partial g_2(\boldsymbol{\theta})}{\partial \theta_p} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g_r(\boldsymbol{\theta})}{\partial \theta_1} & \frac{\partial g_r(\boldsymbol{\theta})}{\partial \theta_2} & \cdots & \frac{\partial g_r(\boldsymbol{\theta})}{\partial \theta_p} \end{bmatrix}$$
Example: CRLB for Signal-to-Noise Ratio (SNR)

For $\boldsymbol{\theta} = [A, \sigma^2]^T$ and $\alpha = g(\boldsymbol{\theta}) = A^2/\sigma^2$, the Jacobian is $\left[\frac{2A}{\sigma^2}, -\frac{A^2}{\sigma^4}\right]$, so that
$$\mathrm{Var}(\hat{\alpha}) \geq \frac{4\alpha + 2\alpha^2}{N}$$
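A numerical check of this transformation bound with assumed $A$, $\sigma^2$, and $N$, using the diagonal $\mathbf{I}(\boldsymbol{\theta})$ from the earlier DC-level slide:

    import numpy as np

    A, sigma2, N = 1.5, 0.8, 40              # assumed values
    alpha = A**2 / sigma2

    I_inv = np.diag([sigma2 / N, 2 * sigma2**2 / N])   # inverse FIM for [A, sigma^2]
    J = np.array([2 * A / sigma2, -A**2 / sigma2**2])  # Jacobian of g(theta)

    print(J @ I_inv @ J)                     # transformation CRLB
    print((4 * alpha + 2 * alpha**2) / N)    # closed form above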
CRLB for the General Gaussian Case

$$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}(\boldsymbol{\theta}), \mathbf{C}(\boldsymbol{\theta}))$$
Example: Random DC Level in WGN

$$x[n] = A + w[n]$$
where $A$ is now random with variance $\sigma_A^2$. Then
$$\mathbf{C}(\sigma_A^2) = \sigma_A^2 \mathbf{1}\mathbf{1}^T + \sigma^2 \mathbf{I}, \qquad \mathbf{C}^{-1}(\sigma_A^2) = \frac{1}{\sigma^2}\left(\mathbf{I} - \frac{\sigma_A^2}{\sigma^2 + N\sigma_A^2}\mathbf{1}\mathbf{1}^T\right)$$
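A quick numerical check of the closed-form inverse (assumed $\sigma^2$, $\sigma_A^2$, $N$):

    import numpy as np

    sigma2, sigmaA2, N = 0.5, 2.0, 8         # assumed values
    one = np.ones((N, 1))

    C = sigmaA2 * (one @ one.T) + sigma2 * np.eye(N)
    C_inv_formula = (np.eye(N) - sigmaA2 / (sigma2 + N * sigmaA2) * (one @ one.T)) / sigma2

    print(np.allclose(np.linalg.inv(C), C_inv_formula))  # True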
Singular Information Matrix

where
$$\begin{bmatrix} \mathbf{W}_1 \\ \mathbf{W}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{U}_1^T \\ \mathbf{U}_2^T \end{bmatrix}\mathbf{W}, \qquad [\mathbf{H}_1, \mathbf{H}_2] = \mathbf{H}[\mathbf{U}_1, \mathbf{U}_2]$$
The bound becomes
$$\mathbf{C}_{\hat{\boldsymbol{\alpha}}} \succeq \mathbf{H}_1 \boldsymbol{\Lambda}_1^{-1} \mathbf{H}_1^T + \mathbf{H}_2 \mathbf{W}_2^T + \mathbf{W}_2 \mathbf{H}_2^T$$
• $\mathbf{U}_1$ has no dependence on $\boldsymbol{\theta}$

(P. Stoica and T. L. Marzetta, 2001)
CRLB Analog of Bayes' Rule

Consider
$$\mathbf{y} = \mathbf{x}\theta + \mathbf{w}$$
In analogy with
$$p(a, b) = p(a|b)p(b)$$
we have
$$|\mathrm{CRLB}(\alpha, \beta)| = |\mathrm{CRLB}(\alpha|\beta)|\,|\mathrm{CRLB}(\beta)|$$
(D. Zachariah and P. Stoica, 2015)
Linear Models

In general, it is difficult to find the MVU estimator. But for linear data models, we can find the optimal unbiased estimator and provide a statistical performance guarantee.
Example: Line Fitting Problem

$$x[n] = A + Bn + w[n], \qquad n = 0, 1, \cdots, N-1$$
$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}, \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$$
where
$$\mathbf{x} = [x[0], x[1], \cdots, x[N-1]]^T, \qquad \mathbf{w} = [w[0], w[1], \cdots, w[N-1]]^T$$
$$\boldsymbol{\theta} = [A, B]^T, \qquad \mathbf{H} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & N-1 \end{bmatrix}$$
Recall the condition for attaining the CRLB:
$$\frac{\partial \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \mathbf{I}(\boldsymbol{\theta})(\mathbf{g}(\mathbf{x}) - \boldsymbol{\theta})$$
For the line fitting problem, if $\mathbf{H}^T\mathbf{H}$ is invertible, we have
$$\frac{\partial \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \frac{\mathbf{H}^T\mathbf{H}}{\sigma^2}\left[(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} - \boldsymbol{\theta}\right]$$
and
$$\mathbf{I}(\boldsymbol{\theta}) = \frac{\mathbf{H}^T\mathbf{H}}{\sigma^2}, \qquad \hat{\boldsymbol{\theta}} = \mathbf{g}(\mathbf{x}) = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$$
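A short sketch of the resulting MVU/least-squares estimator on synthetic line-fitting data (assumed $A$, $B$, $\sigma^2$):

    import numpy as np

    rng = np.random.default_rng(0)
    A, B, sigma2, N = 1.0, 0.3, 0.5, 100     # assumed true values

    n = np.arange(N)
    H = np.column_stack([np.ones(N), n])
    x = A + B * n + rng.normal(0.0, np.sqrt(sigma2), size=N)

    # theta_hat = (H^T H)^{-1} H^T x, computed via a linear solve
    theta_hat = np.linalg.solve(H.T @ H, H.T @ x)
    print(theta_hat)  # ~ [A, B] = [1.0, 0.3]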
MVU for the Linear Model

$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$$
$$\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$$
Linear Model Examples

Fourier analysis: with
$$\boldsymbol{\theta} = [a_1, a_2, \cdots, a_M, b_1, b_2, \cdots, b_M]^T$$
$$\mathbf{H} = \begin{bmatrix} 1 & \cdots & 1 & 0 & \cdots & 0 \\ \cos(\frac{2\pi}{N}) & \cdots & \cos(\frac{2\pi M}{N}) & \sin(\frac{2\pi}{N}) & \cdots & \sin(\frac{2\pi M}{N}) \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \cos(\frac{2\pi(N-1)}{N}) & \cdots & \cos(\frac{2\pi M(N-1)}{N}) & \sin(\frac{2\pi(N-1)}{N}) & \cdots & \sin(\frac{2\pi M(N-1)}{N}) \end{bmatrix}$$
It can be shown that $\mathbf{H}^T\mathbf{H} = \frac{N}{2}\mathbf{I}$, and the MVU estimate of $\boldsymbol{\theta}$ is given by
$$\hat{a}_k = \frac{2}{N}\sum_{n=0}^{N-1} x[n]\cos\left(\frac{2\pi kn}{N}\right), \qquad \hat{b}_k = \frac{2}{N}\sum_{n=0}^{N-1} x[n]\sin\left(\frac{2\pi kn}{N}\right)$$
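A sketch applying these estimators to a synthetic signal (assumed coefficients, noise level, and $M = 2$):

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, sigma2 = 64, 2, 0.1
    a, b = np.array([1.0, -0.5]), np.array([0.25, 0.75])   # assumed coefficients

    n = np.arange(N)
    x = sum(a[k-1] * np.cos(2*np.pi*k*n/N) + b[k-1] * np.sin(2*np.pi*k*n/N)
            for k in range(1, M+1)) + rng.normal(0.0, np.sqrt(sigma2), N)

    # MVU estimates: correlate with the cosine and sine columns of H
    a_hat = [2/N * np.sum(x * np.cos(2*np.pi*k*n/N)) for k in range(1, M+1)]
    b_hat = [2/N * np.sum(x * np.sin(2*np.pi*k*n/N)) for k in range(1, M+1)]
    print(a_hat, b_hat)   # ~ a, b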
Linear Model in Colored Noise

Now let
$$\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$$
Factor $\mathbf{C}^{-1} = \mathbf{D}^T\mathbf{D}$ and prewhiten:
$$\mathbf{x}' = \mathbf{D}\mathbf{x} = \mathbf{D}\mathbf{H}\boldsymbol{\theta} + \mathbf{D}\mathbf{w}$$
so that $\mathbf{D}\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and the white-noise result applies.
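A sketch of the prewhitening step (assumed exponential covariance; $\mathbf{D}$ is taken from a Cholesky factor of $\mathbf{C}^{-1}$, one valid choice). After whitening, the white-noise MVU estimator applied to $(\mathbf{x}', \mathbf{H}')$ is equivalent to $(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}$.

    import numpy as np

    rng = np.random.default_rng(0)
    N, theta = 50, np.array([1.0, 0.3])            # assumed model
    n = np.arange(N)
    H = np.column_stack([np.ones(N), n])

    C = 0.9 ** np.abs(np.subtract.outer(n, n))     # assumed colored-noise covariance
    w = rng.multivariate_normal(np.zeros(N), C)
    x = H @ theta + w

    L = np.linalg.cholesky(np.linalg.inv(C))       # C^{-1} = L L^T, take D = L^T
    D = L.T                                        # then D C D^T = I (white noise)
    Hp, xp = D @ H, D @ x                          # whitened model
    theta_hat = np.linalg.solve(Hp.T @ Hp, Hp.T @ xp)
    print(theta_hat)  # ~ theta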
General Minimum Variance Unbiased Estimation

For the class of linear models, we can easily find the MVU estimator, which is also efficient (attains the CRLB). Now we try to find the MVU estimator when it may not be efficient. For this, we need to study the concepts of sufficient statistics and the Rao-Blackwell-Lehmann-Scheffé theorem.
Example: DC Level in WGN

For $T(\mathbf{x}) = \sum_{n=0}^{N-1} x[n]$, we have
$$p(\mathbf{x}|T(\mathbf{x}) = T_0; A) = \frac{\sqrt{N}}{(2\pi\sigma^2)^{\frac{N-1}{2}}}\exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}x^2[n]\right]\exp\left[\frac{T_0^2}{2N\sigma^2}\right]\delta(T(\mathbf{x}) - T_0)$$
which does not depend on $A$: once $T(\mathbf{x})$ is observed, the data carry no further information about $A$.
Sufficient Statistics

Neyman-Fisher Factorization
If the likelihood function $p(\mathbf{x}; \theta)$ can be factorized as
$$p(\mathbf{x}; \theta) = g(T(\mathbf{x}), \theta)\,h(\mathbf{x})$$
then $T(\mathbf{x})$ is a sufficient statistic for $\theta$.
Example

$$p(\mathbf{x}; A) = \underbrace{\frac{1}{(2\pi\sigma^2)^{\frac{N}{2}}}\exp\left[-\frac{1}{2\sigma^2}\left(NA^2 - 2A\sum_{n=0}^{N-1}x[n]\right)\right]}_{g(T(\mathbf{x}), A)}\ \underbrace{\exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}x^2[n]\right]}_{h(\mathbf{x})}$$
And then $T(\mathbf{x}) = \sum_{n=0}^{N-1} x[n]$ is a sufficient statistic for estimating $A$.
Example

Uniform Distribution
Let $X_1, \cdots, X_n$ be independently distributed according to the uniform distribution $U(0, \theta)$. Let $T$ be the largest of the $n$ $X$'s, and consider the conditional distribution of the remaining $n-1$ $X$'s given $T = t$. These $n-1$ points are uniformly distributed on $(0, t)$, independently of $\theta$; hence $T = \max_i X_i$ is sufficient.
More Examples

Symmetric Distribution
Suppose $X$ is symmetrically distributed about zero. Then, given that $|X| = t$, the only two possible values of $X$ are $\pm t$, each with conditional probability $1/2$. Hence $T = |X|$ is sufficient.
Better Estimator Using Sufficient Statistics T*

Rao-Blackwell Theorem
Let $X$ be a random observable with distribution $P_\theta \in \mathcal{P} = \{P_{\theta'}, \theta' \in \Omega\}$, and let $T$ be sufficient for $\mathcal{P}$. Let $\hat{\theta}$ be an estimator of an estimand $g(\theta)$, and let the loss function $L(\theta, d)$ be a strictly convex function of $d$. Then, if $\hat{\theta}$ has finite expectation and risk, the estimator
$$\breve{\theta}(t) = E[\hat{\theta}|T = t]$$
has strictly smaller risk than $\hat{\theta}$, unless $\hat{\theta} = \breve{\theta}(T)$ with probability 1.
Find the MVU Estimator
Example: DC Level in WGN

Next, we want to show that $T(\mathbf{x}) = \sum_{n=0}^{N-1} x[n]$ is complete. We have shown that $g(T(\mathbf{x})) = \frac{T(\mathbf{x})}{N}$ is unbiased; suppose there exists a second function $h(\cdot)$ such that $E[h(T(\mathbf{x}))] = A$. Then
$$E[g(T(\mathbf{x})) - h(T(\mathbf{x}))] = 0, \qquad \forall A$$
which implies
$$v(T) = g(T) - h(T) = 0, \quad \text{a.e.}$$
(See Appendix 10A.)
Incomplete Sufficient Statistic

In general, we need
$$\int_{-\infty}^{\infty} v(T)\,p(T; \theta)\,dT = 0, \qquad \forall \theta$$
Procedure to Find the MVU
Extension to the Vector Parameter
Example: Sinusoidal Parameter Estimation

So the sufficient statistic is
$$\mathbf{T}(\mathbf{x}) = \left[\sum_{n=0}^{N-1} x[n]\cos 2\pi f_0 n, \ \sum_{n=0}^{N-1} x^2[n]\right]^T$$
DC Level in WGN with Unknown Noise Power

$$x[n] = A + w[n], \qquad n = 0, 1, \cdots, N-1$$
The sufficient statistic satisfies
$$E(\mathbf{T}(\mathbf{x})) = \begin{bmatrix} NA \\ N(\sigma^2 + A^2) \end{bmatrix}$$
And then
$$\hat{\boldsymbol{\theta}} = \begin{bmatrix} \bar{x} \\ \frac{1}{N-1}\left[\sum_{n=0}^{N-1} x^2[n] - N\bar{x}^2\right] \end{bmatrix}$$
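A sketch of this estimator on synthetic data (assumed true $A$ and $\sigma^2$):

    import numpy as np

    rng = np.random.default_rng(0)
    A, sigma2, N = 2.0, 1.5, 1000            # assumed true values

    x = A + rng.normal(0.0, np.sqrt(sigma2), size=N)
    xbar = x.mean()
    s2 = (np.sum(x**2) - N * xbar**2) / (N - 1)   # unbiased noise-power estimate
    print(xbar, s2)  # ~ [A, sigma^2]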
Exponential Families*

Definition
A family of distributions $\{p_\theta\}$ is said to form an $s$-dimensional exponential family if the distribution $p_\theta$ is of the form
$$p_\theta(x) = \exp\left[\sum_{i=1}^{s} \eta_i(\theta)T_i(x) - B(\theta)\right]h(x)$$
For example, the $\mathcal{N}(\xi, \sigma^2)$ family:
$$p_\theta(x) = \exp\left[\frac{\xi}{\sigma^2}x - \frac{1}{2\sigma^2}x^2 - \frac{\xi^2}{2\sigma^2}\right]\frac{1}{\sqrt{2\pi\sigma^2}}$$
Sufficient Statistic for Exponential Family*

$\mathbf{T}(x) = (T_1(x), \cdots, T_s(x))$ is sufficient; it is minimal sufficient if either
• the family is of full rank, or
• the parameter space contains $s+1$ points $\eta^{(j)}$ which span $\mathbb{R}^s$, in the sense that they do not belong to a proper affine subspace.
Curved normal family
Completeness for Exponential Family

If the exponential family is of full rank, then $\mathbf{T}(x)$ is also complete. (Theorem 1.6.22, TPE)
Sufficient Statistic for Finite Family*

For a finite family $\mathcal{P} = \{p_0, p_1, \cdots, p_k\}$ with common support, the vector of likelihood ratios
$$\mathbf{T}(x) = \left(\frac{p_1(x)}{p_0(x)}, \cdots, \frac{p_k(x)}{p_0(x)}\right)$$
is minimal sufficient.
Sufficiency of order statistics

For i.i.d. observations, conditionally on the order statistics $T = (X_{(1)}, \cdots, X_{(n)})$, every ordering of the sample is equally likely, so the conditional distribution does not depend on the underlying distribution. Thus, $T$ is sufficient. (See TPE, Chap. 1.)
Nonparametric Families*
Examples

For the variance, the estimator $\frac{\sum_i (X_i - \bar{X})^2}{n-1}$ is symmetric, unbiased, and is the MVU estimator.
For the second moment, the estimator $\frac{\sum_i X_i^2}{n}$ is a symmetric, unbiased, MVU estimator of $E(X^2)$.
The Information Inequality*

In previous slides, we showed that the CRLB may yield the best unbiased estimator under certain conditions. In many other cases, we can still seek the MVU estimator. For an estimator $\delta$ and any random variable $\psi$,
$$\mathrm{Var}(\delta) \geq \frac{[\mathrm{Cov}(\delta, \psi)]^2}{\mathrm{Var}(\psi)}$$
This inequality is not that helpful as stated, since the right-hand side is a function of $\delta$.
The right-hand side no longer depends on the particular unbiased estimator $\delta$ if $\psi$ satisfies
$$\mathrm{Cov}(U, \psi) = 0, \qquad \forall U \in \mathcal{U}$$
where
$$\mathcal{U} = \{U : E_\theta U = 0, \ E_\theta U^2 < \infty, \ \forall \theta \in \Omega\}$$
One such choice is
$$\psi(x, \theta) = \frac{p(x, \theta + \Delta)}{p(x, \theta)} - 1$$
for which $\mathrm{Cov}(U, \psi) = 0$ for all $U \in \mathcal{U}$.
Information Inequality for Exponential Family (scalar)
Let $X$ be distributed according to the exponential family with $s = 1$, and let
$$\tau(\theta) = E_\theta(T)$$
Then $T$ is an unbiased estimator of $\tau(\theta)$ that attains the information bound.
Definition of the BLUE

We observe the data set $\{x[0], x[1], \cdots, x[N-1]\}$ whose likelihood function $p(\mathbf{x}; \theta)$ depends on an unknown parameter $\theta$. The BLUE restricts the estimator to be linear in the data:
$$\hat{\theta} = \sum_{n=0}^{N-1} a_n x[n]$$
For example, for estimating the DC level in WGN, the MVU estimator is the sample mean
$$\hat{\theta} = \bar{x} = \sum_{n=0}^{N-1} \frac{1}{N} x[n]$$
If the additive noise is not Gaussian but uniform, the MVU estimator is shown to be
$$\hat{\theta} = \frac{N+1}{2N}\max_n x[n]$$
which is nonlinear in the data. For the BLUE we assume
$$E(x[n]) = s[n]\theta$$
where the $s[n]$'s are known. More generally, $x[n]$ can be represented as $x[n] = \theta s[n] + w[n]$, where the noise has zero mean and covariance matrix $\mathbf{C}_x$, and the optimal weights yield
$$\mathbf{a}_{\mathrm{opt}} = \frac{\mathbf{C}_x^{-1}\mathbf{s}}{\mathbf{s}^T\mathbf{C}_x^{-1}\mathbf{s}}, \qquad \hat{\theta} = \frac{\mathbf{s}^T\mathbf{C}_x^{-1}\mathbf{x}}{\mathbf{s}^T\mathbf{C}_x^{-1}\mathbf{s}}, \qquad \mathrm{Var}(\hat{\theta}) = \frac{1}{\mathbf{s}^T\mathbf{C}_x^{-1}\mathbf{s}}$$
Finding the BLUE

Consider
$$x[n] = A + w[n]$$
where $\mathrm{Var}(w[n]) = \sigma_n^2$. In this case, $\mathbf{s} = \mathbf{1}$ and the BLUE is given by
$$\hat{A} = \frac{\mathbf{1}^T\mathbf{C}_x^{-1}\mathbf{x}}{\mathbf{1}^T\mathbf{C}_x^{-1}\mathbf{1}}, \qquad \mathrm{Var}(\hat{A}) = \frac{1}{\mathbf{1}^T\mathbf{C}_x^{-1}\mathbf{1}}$$
and the covariance matrix is $\mathbf{C}_x = \mathrm{diag}(\sigma_0^2, \sigma_1^2, \cdots, \sigma_{N-1}^2)$. Therefore,
$$\hat{A} = \frac{\sum_{n=0}^{N-1}\frac{x[n]}{\sigma_n^2}}{\sum_{n=0}^{N-1}\frac{1}{\sigma_n^2}}$$
The BLUE weights most heavily those samples with the smallest variances.
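A Monte Carlo sketch of this weighted average (assumed per-sample noise powers): it stays unbiased and its variance matches $1/(\mathbf{1}^T\mathbf{C}_x^{-1}\mathbf{1})$.

    import numpy as np

    rng = np.random.default_rng(0)
    A, N, trials = 1.0, 5, 200_000
    sigma_n2 = np.array([0.1, 0.5, 1.0, 2.0, 4.0])        # assumed noise powers

    x = A + rng.normal(0.0, np.sqrt(sigma_n2), size=(trials, N))
    A_blue = (x / sigma_n2).sum(axis=1) / (1.0 / sigma_n2).sum()

    print(A_blue.mean())                              # ~ A: unbiased
    print(A_blue.var(), 1.0 / (1.0 / sigma_n2).sum())  # variance = 1/(1^T Cx^{-1} 1)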
Extension to a Vector Parameter

$$\hat{\boldsymbol{\theta}} = \mathbf{A}\mathbf{x}, \qquad E(\hat{\boldsymbol{\theta}}) = \boldsymbol{\theta}$$
Since $E(\mathbf{x}) = \mathbf{H}\boldsymbol{\theta}$, unbiasedness requires $\mathbf{A}\mathbf{H} = \mathbf{I}$. The variance is
$$\mathrm{Var}(\hat{\theta}_i) = \mathbf{a}_i^T\mathbf{C}_x\mathbf{a}_i$$
where $\mathbf{a}_i^T$ is the $i$-th row of $\mathbf{A}$.
Gauss-Markov Theorem
If the data model is linear,
$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$$
where $\mathbf{w}$ has zero mean and covariance $\mathbf{C}$, then the BLUE is
$$\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}$$
with covariance $\mathbf{C}_{\hat{\boldsymbol{\theta}}} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}$.
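A sketch of the Gauss-Markov BLUE on an assumed linear model; note only the mean and covariance of $\mathbf{w}$ are used (no Gaussian assumption is needed):

    import numpy as np

    rng = np.random.default_rng(0)
    N, theta = 40, np.array([1.0, -0.5])           # assumed model
    n = np.arange(N)
    H = np.column_stack([np.ones(N), n])
    C = np.diag(0.5 + 0.1 * n)                     # assumed noise covariance

    w = rng.normal(0.0, np.sqrt(np.diag(C)))       # zero-mean noise with covariance C
    x = H @ theta + w

    # theta_hat = (H^T C^{-1} H)^{-1} H^T C^{-1} x, via linear solves
    Ci_H = np.linalg.solve(C, H)
    theta_blue = np.linalg.solve(H.T @ Ci_H, Ci_H.T @ x)
    print(theta_blue)  # ~ theta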