Estimation and Detection: Lecture 6: The Bayesian Philosophy
• For large data records, the MLE is efficient and leads (asymptotically) to the MVU estimator.
• For the linear data model with Gaussian noise, the MLE is efficient and equal to the MVU estimator (also for limited data records).
This lecture
• The Bayesian approach can be useful when the MVU estimator is difficult to find.
Example 1 (1)
Example: estimation of the mean

$$x[n] = A + w[n], \quad n = 0, \ldots, N-1, \qquad w[n] \sim \mathcal{N}(0, \sigma^2).$$

In lecture 2 (Ch. 3) we have seen that the MVU estimator can be derived to be the sample mean estimator:

$$\hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$$
Example 1 (2)

$$p(\mathbf{x}; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right]$$

The CRB:

$$\frac{\partial \ln p(\mathbf{x};A)}{\partial A} = \frac{\partial}{\partial A}\left[-\ln\left[(2\pi\sigma^2)^{N/2}\right] - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right] = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A) = \underbrace{\frac{N}{\sigma^2}}_{I(A)}\Bigg(\underbrace{\frac{1}{N}\sum_{n=0}^{N-1}x[n]}_{\hat{A}} - A\Bigg)$$

$$\frac{\partial^2 \ln p(\mathbf{x};A)}{\partial A^2} = -\frac{N}{\sigma^2}$$

$$\mathrm{var}[\hat{A}] \ge \frac{\sigma^2}{N}, \qquad \text{MVU: } \hat{A} = \frac{1}{N}\sum_{n=0}^{N-1}x[n]$$
Example 1 (3)

$$p(\mathbf{x}; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right], \qquad \text{MVU: } \hat{A} = \frac{1}{N}\sum_{n=0}^{N-1}x[n]$$

The MLE:

$$\frac{\partial \ln p(\mathbf{x};A)}{\partial A} = \frac{\partial}{\partial A}\left[-\ln\left[(2\pi\sigma^2)^{N/2}\right] - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right] = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A) = 0$$

$$\Rightarrow\; \hat{A} = \frac{1}{N}\sum_{n=0}^{N-1}x[n]$$
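As a quick numerical illustration (not from the slides; the values of A, σ² and N below are arbitrary demo choices), a minimal NumPy sketch that simulates the model and computes the sample-mean estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary demo values (assumptions, not from the slides)
A_true = 1.5      # true DC level A
sigma2 = 2.0      # noise variance sigma^2
N = 1000          # number of samples

# x[n] = A + w[n],  w[n] ~ N(0, sigma^2)
x = A_true + rng.normal(0.0, np.sqrt(sigma2), size=N)

# Sample-mean estimator: both the MVU estimator and the MLE for this model
A_hat = x.mean()
print(f"A_hat = {A_hat:.3f}, CRB = sigma^2/N = {sigma2 / N:.4f}")
```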
Example 1 (4)
Example: estimation of the mean

$$x[n] = A + w[n], \quad n = 0, \ldots, N-1, \qquad w[n] \sim \mathcal{N}(0, \sigma^2).$$

In lecture 2 (Ch. 3) we have seen that the MVU estimator can be derived to be the sample mean estimator:

$$\hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$$

Let us now assume that $-A_0 \le A \le A_0$. Without any further information about the pdf of A, let us assume that A is uniformly distributed on this interval.
This lecture

[Block diagrams: in the classical approach, a deterministic A plus noise w[n] produces x[n] for n = 0, 1, 2, ..., N-1; in the Bayesian approach, A is first selected from the interval [-A0, A0] (according to its prior), after which noise w[n] is added to produce x[n] for n = 0, 1, 2, ..., N-1.]
In the Bayesian approach, however, both x and θ are random, and the statistics of θ̂ depend on the statistics of both x and θ.
Whereas the classical MSE depends on θ, the Bayesian MSE does not; it depends only on the statistics of θ.
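For reference (these definitions are implied by the discussion but not shown on this slide), the two MSE criteria being contrasted are the classical MSE and the Bayesian MSE:

$$\mathrm{mse}(\hat\theta) = \int (\hat\theta - \theta)^2\, p(\mathbf{x};\theta)\, d\mathbf{x}, \qquad \mathrm{Bmse}(\hat\theta) = \iint (\hat\theta - \theta)^2\, p(\mathbf{x},\theta)\, d\mathbf{x}\, d\theta.$$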
Since $p(\mathbf{x}) \ge 0$ for all x, we have to minimize the inner integral for each x.

Problem: $\displaystyle\min_{\hat\theta} \int (\hat\theta - \theta)^2\, p(\theta|\mathbf{x})\, d\theta$

Solution: the mean of the posterior pdf of θ: $\hat\theta = E(\theta|\mathbf{x}) = \int \theta\, p(\theta|\mathbf{x})\, d\theta$

The a posteriori pdf p(θ|x) is the pdf of θ after observing the data.
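As a one-line check (a standard argument, not spelled out on the slide), differentiating the inner integral with respect to θ̂ and setting the result to zero indeed gives the posterior mean, using $\int p(\theta|\mathbf{x})\,d\theta = 1$:

$$\frac{\partial}{\partial\hat\theta}\int(\hat\theta-\theta)^2 p(\theta|\mathbf{x})\,d\theta = \int 2(\hat\theta-\theta)\,p(\theta|\mathbf{x})\,d\theta = 2\hat\theta - 2E(\theta|\mathbf{x}) = 0 \;\Rightarrow\; \hat\theta = E(\theta|\mathbf{x}).$$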
• The MMSE estimator has a smaller average MSE (Bayesian MSE) than the MVU estimator, but the MMSE estimator is biased whereas the MVU estimator is unbiased.
[Figure: the prior p(A), the likelihood p(x|A), and their product p(x|A)p(A), each plotted as a function of A over the interval [-10, 10].]
• p(x|A) follows from one additional assumption and the signal model; the posterior then follows from Bayes' rule:

$$p(\theta|\mathbf{x}) = \frac{p(\theta)\,p(\mathbf{x}|\theta)}{p(\mathbf{x})}.$$

• The Bayesian MSE minimises the MSE over all realisations of θ and x.
• Bayesian MMSE: $\hat\theta = E(\theta|\mathbf{x}) = \int \theta\, p(\theta|\mathbf{x})\, d\theta$.
• The MMSE estimator thus depends on both the prior knowledge (via p(θ)) and the data (via p(x|θ)).
$$p(\theta|\mathbf{x}) = \frac{p(\theta)\,p(\mathbf{x}|\theta)}{p(\mathbf{x})}.$$

• If the prior knowledge is rather weak compared to the data (p(θ) much wider than p(x|θ)), the estimator will rely primarily on the data and largely ignore the prior knowledge.
• If the prior knowledge is rather strong compared to the data (p(θ) much narrower than p(x|θ)), the estimator will be biased towards the mean of the prior pdf.
• The Bayesian estimator thus always compromises between the prior and the data.
• Increasing the amount of data typically makes p(x|θ) more concentrated, so the estimator relies more and more on the data.
$$x[n] = A + w[n], \quad n = 0, \ldots, N-1, \qquad w[n] \sim \mathcal{N}(0, \sigma^2), \quad A \sim \mathcal{U}(-A_0, A_0)$$

$$\Rightarrow\; p(\mathbf{x}|A) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right]$$

$$\text{and}\quad p(A) = \begin{cases} \dfrac{1}{2A_0}, & |A| \le A_0 \\[1ex] 0, & |A| > A_0 \end{cases}$$
$$p(A|\mathbf{x}) = \begin{cases} \dfrac{\dfrac{1}{2A_0}\dfrac{1}{(2\pi\sigma^2)^{N/2}}\exp\left[-\dfrac{1}{2\sigma^2}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)^2\right]}{\displaystyle\int_{-A_0}^{A_0}\dfrac{1}{2A_0}\dfrac{1}{(2\pi\sigma^2)^{N/2}}\exp\left[-\dfrac{1}{2\sigma^2}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)^2\right]dA}, & |A| \le A_0 \\[3ex] 0, & |A| > A_0 \end{cases}$$

We can write $\sum_{n=0}^{N-1}(x[n]-A)^2$ as

$$\sum_{n=0}^{N-1}(x[n]-A)^2 = \sum_{n=0}^{N-1}x^2[n] - 2NA\bar{x} + NA^2 = N(A-\bar{x})^2 + \sum_{n=0}^{N-1}x^2[n] - N\bar{x}^2,$$

where $\bar{x} = \frac{1}{N}\sum_{n=0}^{N-1}x[n]$.
Example 1 - Bayesian approach (4)

$$\hat{A} = \int A\, p(A|\mathbf{x})\, dA = \frac{\displaystyle\int_{-A_0}^{A_0} A\, \frac{1}{\sqrt{2\pi\sigma^2/N}}\exp\left[-\frac{1}{2\sigma^2/N}(A-\bar{x})^2\right] dA}{\displaystyle\int_{-A_0}^{A_0} \frac{1}{\sqrt{2\pi\sigma^2/N}}\exp\left[-\frac{1}{2\sigma^2/N}(A-\bar{x})^2\right] dA}$$
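Since this ratio of truncated-Gaussian integrals has no simple closed form, it is typically evaluated numerically. A minimal sketch (the values of A0, σ², N and the data are arbitrary demo choices, not from the slides); note that the common grid spacing cancels in the ratio:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary demo values (assumptions, not from the slides)
A0 = 3.0          # prior support: A ~ U(-A0, A0)
sigma2 = 4.0      # noise variance sigma^2
N = 20
A_true = 1.0
x = A_true + rng.normal(0.0, np.sqrt(sigma2), size=N)
x_bar = x.mean()

# Posterior on a grid: a Gaussian N(x_bar, sigma^2/N) truncated to [-A0, A0]
A_grid = np.linspace(-A0, A0, 10001)
kernel = np.exp(-(A_grid - x_bar) ** 2 / (2 * sigma2 / N))

# MMSE estimate = posterior mean = ratio of the two integrals
A_hat = np.sum(A_grid * kernel) / np.sum(kernel)
print(f"x_bar = {x_bar:.3f}, MMSE estimate = {A_hat:.3f}")
```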
$$x[n] = A + w[n],\quad n = 0,\ldots,N-1, \qquad w[n] \sim \mathcal{N}(0,\sigma^2), \quad A \sim \mathcal{N}(\mu_A, \sigma_A^2)$$

$$\Rightarrow\; p(\mathbf{x}|A) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right]$$

$$\text{and}\quad p(A) = \frac{1}{\sqrt{2\pi\sigma_A^2}}\exp\left[-\frac{1}{2\sigma_A^2}(A-\mu_A)^2\right]$$

Since both p(x|A) and p(A) are now Gaussian, the a posteriori pdf p(A|x) will also be Gaussian:

$$p(A|\mathbf{x}) = \frac{1}{\sqrt{2\pi\sigma_{A|x}^2}}\exp\left[-\frac{1}{2\sigma_{A|x}^2}(A-\mu_{A|x})^2\right]$$

with

$$\sigma_{A|x}^2 = \frac{1}{\frac{N}{\sigma^2} + \frac{1}{\sigma_A^2}} \qquad\text{and}\qquad \mu_{A|x} = \left(\frac{N}{\sigma^2}\bar{x} + \frac{\mu_A}{\sigma_A^2}\right)\sigma_{A|x}^2.$$
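For the Gaussian prior the posterior parameters are available in closed form, so the MMSE estimate is one line of code. A minimal sketch (all parameter values are arbitrary demo choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary demo values (assumptions, not from the slides)
mu_A, sigma2_A = 0.5, 1.0    # prior A ~ N(mu_A, sigma_A^2)
sigma2, N = 2.0, 50          # noise variance and data length

A_true = rng.normal(mu_A, np.sqrt(sigma2_A))
x = A_true + rng.normal(0.0, np.sqrt(sigma2), size=N)
x_bar = x.mean()

# Closed-form Gaussian posterior parameters (formulas from the slide)
sigma2_post = 1.0 / (N / sigma2 + 1.0 / sigma2_A)
mu_post = (N / sigma2 * x_bar + mu_A / sigma2_A) * sigma2_post

print(f"A_true = {A_true:.3f}, posterior mean = {mu_post:.3f}, "
      f"posterior variance = {sigma2_post:.4f}")
```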
$$\hat{A} = \alpha\bar{x} + (1-\alpha)\mu_A, \qquad \text{with}\quad \alpha = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}$$

• α expresses the interplay between the prior knowledge (µ_A) and the data (x̄).
• Also notice that the larger the data record (increasing N), the narrower the a posteriori pdf (and the smaller the uncertainty), since $\sigma_{A|x}^2 = \frac{1}{\frac{N}{\sigma^2} + \frac{1}{\sigma_A^2}}$.
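The weight α follows directly from rewriting the posterior mean (a short derivation, not spelled out on the slide):

$$\mu_{A|x} = \frac{\frac{N}{\sigma^2}\bar{x} + \frac{\mu_A}{\sigma_A^2}}{\frac{N}{\sigma^2} + \frac{1}{\sigma_A^2}} = \underbrace{\frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}}_{\alpha}\,\bar{x} + \underbrace{\frac{\sigma^2/N}{\sigma_A^2 + \sigma^2/N}}_{1-\alpha}\,\mu_A.$$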
Recall

$$p(A|\mathbf{x}) = \frac{1}{\sqrt{2\pi\sigma_{A|x}^2}}\exp\left[-\frac{1}{2\sigma_{A|x}^2}(A-\mu_{A|x})^2\right], \quad \sigma_{A|x}^2 = \frac{1}{\frac{N}{\sigma^2}+\frac{1}{\sigma_A^2}}, \quad \mu_{A|x} = \left(\frac{N}{\sigma^2}\bar{x} + \frac{\mu_A}{\sigma_A^2}\right)\sigma_{A|x}^2.$$

• If N → ∞, then Â → x̄.
$$\mathrm{Bmse}(\hat{A}) = E[(A-\hat{A})^2] = \iint (A - E[A|\mathbf{x}])^2\, p(\mathbf{x},A)\, d\mathbf{x}\, dA$$
$$= \int\left[\int (A - E[A|\mathbf{x}])^2\, p(A|\mathbf{x})\, dA\right] p(\mathbf{x})\, d\mathbf{x}$$
$$= \int \mathrm{var}[A|\mathbf{x}]\, p(\mathbf{x})\, d\mathbf{x}$$
$$= \int \frac{1}{\frac{N}{\sigma^2}+\frac{1}{\sigma_A^2}}\, p(\mathbf{x})\, d\mathbf{x} = \frac{1}{\frac{N}{\sigma^2}+\frac{1}{\sigma_A^2}} = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\cdot\frac{\sigma^2}{N}$$

Hence,

$$\mathrm{Bmse}(\hat{A}) = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\cdot\frac{\sigma^2}{N} \;<\; \underbrace{\frac{\sigma^2}{N}}_{\text{CRB for classical estimators}} = \mathrm{mse}(\bar{x}),$$

where mse(x̄) is the (classical) MSE of the sample-mean estimator.
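A quick Monte Carlo check of this comparison (a sketch; all parameter values are arbitrary demo choices): average the squared error of the MMSE estimator and of the sample mean over many realisations of both A and the noise.

```python
import numpy as np

rng = np.random.default_rng(3)

# Arbitrary demo values (assumptions, not from the slides)
mu_A, sigma2_A = 0.0, 1.0
sigma2, N = 2.0, 10
trials = 200_000

alpha = sigma2_A / (sigma2_A + sigma2 / N)

# Draw A from its prior and generate a data record for every trial
A = rng.normal(mu_A, np.sqrt(sigma2_A), size=trials)
x = A[:, None] + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
x_bar = x.mean(axis=1)

A_mmse = alpha * x_bar + (1 - alpha) * mu_A

bmse_mmse = np.mean((A - A_mmse) ** 2)   # should approach alpha * sigma2 / N
bmse_mean = np.mean((A - x_bar) ** 2)    # should approach sigma2 / N
print(f"Bmse(MMSE) ~ {bmse_mmse:.4f} (theory {alpha * sigma2 / N:.4f}), "
      f"Bmse(sample mean) ~ {bmse_mean:.4f} (theory {sigma2 / N:.4f})")
```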
$$\mathbf{x} = \mathbf{1}A + \mathbf{w}.$$

Then we can say that x and A are jointly Gaussian (k = N and l = 1), with zero mean and covariance matrix

$$\mathbf{C}_{x,A} = E\left[\begin{bmatrix}\mathbf{x}\\ A\end{bmatrix}\begin{bmatrix}\mathbf{x}^T & A\end{bmatrix}\right] = \begin{bmatrix}\sigma_A^2\mathbf{1}\mathbf{1}^T + \sigma^2\mathbf{I} & \sigma_A^2\mathbf{1}\\ \sigma_A^2\mathbf{1}^T & \sigma_A^2\end{bmatrix}$$

$$E(A|\mathbf{x}) = \sigma_A^2\mathbf{1}^T\left(\sigma_A^2\mathbf{1}\mathbf{1}^T + \sigma^2\mathbf{I}\right)^{-1}\mathbf{x}$$
$$C_{A|x} = \sigma_A^2 - \sigma_A^4\,\mathbf{1}^T\left(\sigma_A^2\mathbf{1}\mathbf{1}^T + \sigma^2\mathbf{I}\right)^{-1}\mathbf{1}$$

Using the matrix inversion lemma

$$(\mathbf{A} + \mathbf{B}\mathbf{C}\mathbf{D})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}\left(\mathbf{C}^{-1} + \mathbf{D}\mathbf{A}^{-1}\mathbf{B}\right)^{-1}\mathbf{D}\mathbf{A}^{-1}$$

and the identity

$$\sigma_A^2\mathbf{1}^T\left(\sigma_A^2\mathbf{1}\mathbf{1}^T + \sigma^2\mathbf{I}\right)^{-1} = \left(\sigma_A^{-2} + \sigma^{-2}\mathbf{1}^T\mathbf{1}\right)^{-1}\sigma^{-2}\mathbf{1}^T,$$

we can compute

$$E(A|\mathbf{x}) = \sigma_A^2\mathbf{1}^T\left(\sigma_A^2\mathbf{1}\mathbf{1}^T + \sigma^2\mathbf{I}\right)^{-1}\mathbf{x} = \left(\sigma_A^{-2} + \sigma^{-2}\mathbf{1}^T\mathbf{1}\right)^{-1}\sigma^{-2}\mathbf{1}^T\mathbf{x}.$$
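A small numerical sanity check of this simplification (a sketch; the parameter values and the data vector are arbitrary): the direct matrix expression and the simplified form should give the same number.

```python
import numpy as np

rng = np.random.default_rng(4)

# Arbitrary demo values (assumptions, not from the slides)
sigma2_A, sigma2, N = 1.5, 2.0, 8
ones = np.ones((N, 1))
x = rng.normal(0.0, 1.0, size=(N, 1))   # any data vector will do for the check

C = sigma2_A * ones @ ones.T + sigma2 * np.eye(N)

# Direct evaluation: E(A|x) = sigma_A^2 1^T (sigma_A^2 1 1^T + sigma^2 I)^{-1} x
direct = (sigma2_A * ones.T @ np.linalg.solve(C, x)).item()

# Simplified form after the matrix inversion lemma
simplified = ((ones.T @ x) / sigma2 / (1 / sigma2_A + N / sigma2)).item()

print(f"direct = {direct:.6f}, simplified = {simplified:.6f}")  # should agree
```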
$$\hat{A} = E(A|\mathbf{x}) = \frac{\frac{N}{\sigma^2}\bar{x}}{\frac{1}{\sigma_A^2} + \frac{N}{\sigma^2}}$$

$$\mathrm{Bmse}(\hat{A}) = \iint (A-\hat{A})^2\, p(\mathbf{x},A)\, d\mathbf{x}\, dA = \int\left[\int (A - E(A|\mathbf{x}))^2\, p(A|\mathbf{x})\, dA\right] p(\mathbf{x})\, d\mathbf{x}$$
$$= \int C_{A|x}\, p(\mathbf{x})\, d\mathbf{x} = \frac{1}{\frac{1}{\sigma_A^2} + \frac{N}{\sigma^2}} = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\cdot\frac{\sigma^2}{N}$$
Now consider the general Bayesian linear model, with prior $\boldsymbol{\theta} \sim \mathcal{N}(\boldsymbol{\mu}_\theta, \mathbf{C}_\theta)$ and

$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}, \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}).$$

In that case p(θ|x) is also Gaussian, with mean and covariance matrix

$$E(\boldsymbol{\theta}|\mathbf{x}) = \boldsymbol{\mu}_\theta + \mathbf{C}_\theta\mathbf{H}^T\left(\mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}\right)^{-1}(\mathbf{x} - \mathbf{H}\boldsymbol{\mu}_\theta)$$
$$\mathbf{C}_{\theta|x} = \mathbf{C}_\theta - \mathbf{C}_\theta\mathbf{H}^T\left(\mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}\right)^{-1}\mathbf{H}\mathbf{C}_\theta$$
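A small NumPy sketch of these two formulas (the line-fit observation matrix H and all parameter values below are invented for illustration, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented example: fit a line, theta = [intercept, slope], with a Gaussian prior
N = 30
t = np.linspace(0.0, 1.0, N)
H = np.column_stack([np.ones(N), t])     # observation matrix
mu_theta = np.zeros(2)                   # prior mean
C_theta = np.eye(2)                      # prior covariance
C = 0.5 * np.eye(N)                      # noise covariance

theta_true = rng.multivariate_normal(mu_theta, C_theta)
x = H @ theta_true + rng.multivariate_normal(np.zeros(N), C)

# Posterior mean and covariance (formulas from the slide)
S = H @ C_theta @ H.T + C
K = C_theta @ H.T @ np.linalg.inv(S)
theta_mmse = mu_theta + K @ (x - H @ mu_theta)
C_post = C_theta - K @ H @ C_theta

print("theta_true =", np.round(theta_true, 3))
print("theta_mmse =", np.round(theta_mmse, 3))
print("posterior covariance diag =", np.round(np.diag(C_post), 4))
```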
• The Bayesian MMSE estimator always exists (although it is sometimes difficult to calculate analytically).
• For one particular value of θ (when θ is in fact deterministic) it might not perform well.
Recall the example with a Gaussian prior:

$$x[n] = A + w[n],\quad n = 0,\ldots,N-1, \qquad w[n] \sim \mathcal{N}(0,\sigma^2), \quad A \sim \mathcal{N}(\mu_A, \sigma_A^2),$$

with

$$p(\mathbf{x}|A) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right] \quad\text{and}\quad p(A) = \frac{1}{\sqrt{2\pi\sigma_A^2}}\exp\left[-\frac{1}{2\sigma_A^2}(A-\mu_A)^2\right].$$

Since both p(x|A) and p(A) are Gaussian, the a posteriori pdf p(A|x) is also Gaussian:

$$p(A|\mathbf{x}) = \frac{1}{\sqrt{2\pi\sigma_{A|x}^2}}\exp\left[-\frac{1}{2\sigma_{A|x}^2}(A-\mu_{A|x})^2\right],$$

with $\sigma_{A|x}^2 = \frac{1}{\frac{N}{\sigma^2} + \frac{1}{\sigma_A^2}}$ and $\mu_{A|x} = \left(\frac{N}{\sigma^2}\bar{x} + \frac{\mu_A}{\sigma_A^2}\right)\sigma_{A|x}^2$.
$$\hat{A} = \alpha\bar{x} + (1-\alpha)\mu_A$$

Imagine that we have used this Bayesian estimator for A, while A was in fact deterministic. In that case we can evaluate the (classical) MSE:

$$\mathrm{mse}(\hat{A}) = \mathrm{var}(\hat{A}) + b^2(\hat{A}) = \mathrm{var}(\hat{A}) + (E[\hat{A}]-A)^2 = \alpha^2\,\mathrm{var}(\bar{x}) + (\alpha A + (1-\alpha)\mu_A - A)^2$$
$$= \alpha^2\frac{\sigma^2}{N} + (1-\alpha)^2(A-\mu_A)^2$$
Compare this to the classical deterministic (MVU) estimator $\hat{A}_C = \bar{x}$, for which $\mathrm{mse}(\bar{x}) = \frac{\sigma^2}{N}$. Only if A is close to the prior mean µ_A do we get $\mathrm{mse}(\hat{A}) < \mathrm{mse}(\hat{A}_C)$, since the Bayesian estimator trades off bias against a lower variance.
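To see for which true values of A the Bayesian estimator wins, a short sketch that evaluates both MSE expressions over a range of deterministic values of A (all numbers are arbitrary demo choices):

```python
import numpy as np

# Arbitrary demo values (assumptions, not from the slides)
mu_A, sigma2_A = 0.0, 1.0
sigma2, N = 2.0, 10
alpha = sigma2_A / (sigma2_A + sigma2 / N)

A_values = np.linspace(-3.0, 3.0, 7)
mse_bayes = alpha**2 * sigma2 / N + (1 - alpha)**2 * (A_values - mu_A) ** 2
mse_classical = sigma2 / N

for A, mb in zip(A_values, mse_bayes):
    winner = "Bayesian" if mb < mse_classical else "classical"
    print(f"A = {A:+.1f}: mse_Bayes = {mb:.3f}, "
          f"mse_classical = {mse_classical:.3f} -> {winner}")
```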
• The Bayesian MSE is the smallest on average. It trades off bias against variance in order to improve the overall Bayesian MSE (using prior knowledge on A). That is, only if A is really random do we have

$$\mathrm{Bmse}(\hat{A}) = E_A[\mathrm{mse}(\hat{A})] = \alpha^2\frac{\sigma^2}{N} + (1-\alpha)^2 E_A[(A-\mu_A)^2]$$
$$= \alpha^2\frac{\sigma^2}{N} + (1-\alpha)^2\sigma_A^2 = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\cdot\frac{\sigma^2}{N} \;<\; \frac{\sigma^2}{N}.$$
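For completeness, the algebra behind the last equality (a standard manipulation using α = σ_A²/(σ_A² + σ²/N), not spelled out on the slide):

$$\alpha^2\frac{\sigma^2}{N} + (1-\alpha)^2\sigma_A^2 = \frac{\sigma_A^4\,\frac{\sigma^2}{N} + \left(\frac{\sigma^2}{N}\right)^2\sigma_A^2}{\left(\sigma_A^2 + \frac{\sigma^2}{N}\right)^2} = \frac{\sigma_A^2\,\frac{\sigma^2}{N}\left(\sigma_A^2 + \frac{\sigma^2}{N}\right)}{\left(\sigma_A^2 + \frac{\sigma^2}{N}\right)^2} = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\cdot\frac{\sigma^2}{N}.$$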