Estimation and Detection: Lecture 9: Introduction to Detection Theory (Chs. 1, 2, 3)
• Speech coding: Detect whether speech is present. If speech is not present, there is
no need for the device (phone) to transmit any information.
Example – Speech Processing
A VAD (voice activity detector) can be implemented using a Bayesian hypothesis test:
Based on statistical models for S and N and a suitable hypothesis criterion, we can automatically decide whether speech is absent or present.
(More details in the course Digital audio and speech processing, IN4182, 4th quarter.)
Example – Radio Pulsar Navigation
• Highly magnetized rotating neutron star that emits a beam of electromagnetic radiation.
• Radiation can only be observed when the beam of emission is pointing toward the Earth (lighthouse model).

For some millisecond pulsars, the regularity of pulsation is more precise than an atomic clock.
What is Detection Theory?
Definition
Assume a set of data $\{x[0], x[1], \ldots, x[N-1]\}$ is available. To arrive at a decision, we first form a function of the data, $T(x[0], x[1], \ldots, x[N-1])$, and then make a decision based on its value. Determining the function $T$ and its mapping to a decision is the central problem addressed in detection theory.
The Simplest Detection Problem
Binary detection: determine whether a certain signal that is embedded in noise is present or not.

$$H_0: x[n] = w[n]$$
$$H_1: x[n] = s[n] + w[n]$$

Note that if the number of hypotheses is more than two, the problem becomes a multiple hypothesis testing problem. One example is the detection of different digits in speech processing.
Example (1)
Detection of a DC level of amplitude $A = 1$ embedded in white Gaussian noise $w[n]$ with variance $\sigma^2$, using only one sample:

$$H_0: x[0] = w[0]$$
$$H_1: x[0] = 1 + w[0]$$

Detection rule:

$$H_0: x[0] < \tfrac{1}{2}$$
$$H_1: x[0] > \tfrac{1}{2}$$

For the case where $x[0] = \tfrac{1}{2}$, we might arbitrarily choose one of the possibilities. However, the probability of such a case is zero!
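As an illustration (not from the original slides), a minimal Monte Carlo sketch of this single-sample detector, assuming $\sigma^2 = 1$; all parameter values and variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, A, trials = 1.0, 1.0, 100_000   # illustrative values

# Generate single samples under both hypotheses.
h0_samples = rng.normal(0.0, np.sqrt(sigma2), trials)        # x[0] = w[0]
h1_samples = A + rng.normal(0.0, np.sqrt(sigma2), trials)    # x[0] = 1 + w[0]

# Decide H1 whenever x[0] > 1/2, otherwise decide H0.
p_fa = np.mean(h0_samples > 0.5)   # deciding H1 although H0 is true
p_m  = np.mean(h1_samples < 0.5)   # deciding H0 although H1 is true
print(f"P(false alarm) ~ {p_fa:.3f}, P(miss) ~ {p_m:.3f}")
```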
Example (2)
$$p(x[0]; H_0) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}\, x^2[0]\right)$$
$$p(x[0]; H_1) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}\, (x[0]-1)^2\right)$$

Deciding between $H_0$ and $H_1$, we are essentially asking whether $x[0]$ has been generated according to the pdf $p(x[0]; H_0)$ or the pdf $p(x[0]; H_1)$.
Detection Performance
• Can we expect to always make a correct decision? Depending on the noise variance $\sigma^2$, it will be more or less likely to make a decision error.
• The data under both $H_0$ and $H_1$ can be modelled with two different pdfs. Using these pdfs, a decision rule can be formulated. A typical example:
$$T = \frac{1}{N}\sum_{n=0}^{N-1} x[n] > \gamma$$
• The detection performance will increase as the "distance" between the pdfs under $H_0$ and $H_1$ increases.
• Important pdfs
• Neyman-Pearson theorem
Important pdfs – Gaussian pdf
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right), \qquad -\infty < x < +\infty$$

where $\mu$ is the mean and $\sigma^2$ is the variance of $x$. The cdf of the standard normal distribution is

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}t^2\right) dt$$

A more convenient description is the right-tail probability, defined as $Q(x) = 1 - \Phi(x)$. This function, called the Q-function, is used frequently in detection problems where the signal and noise are normally distributed.
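For reference, a short sketch (added here, not in the original slides) of how $\Phi(x)$ and $Q(x)$ can be evaluated numerically; it assumes SciPy is available and uses its standard norm.cdf / norm.sf calls.

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.0, 0.0, 1.0, 3.0])
phi = norm.cdf(x)   # Phi(x), the standard normal cdf
q   = norm.sf(x)    # Q(x) = 1 - Phi(x), the right-tail probability
print(np.allclose(phi + q, 1.0))   # True: Q(x) = 1 - Phi(x)
```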
Important pdfs – Gaussian pdf
[Figure: Gaussian pdf (right panel) and the corresponding cdf $\Phi(x)$ together with the right-tail probability $Q(x)$ (left panel), plotted versus $x$.]
Important pdfs – central Chi-squared
A chi-squared pdf arises as the pdf of $x = \sum_{i=1}^{v} x_i^2$, where the $x_i$ are i.i.d. standard normally distributed random variables. The chi-squared pdf with $v$ degrees of freedom is defined as

$$p(x) = \begin{cases} \dfrac{1}{2^{v/2}\,\Gamma\!\left(\frac{v}{2}\right)}\, x^{\frac{v}{2}-1} \exp\!\left(-\tfrac{1}{2}x\right), & x > 0 \\[4pt] 0, & x < 0 \end{cases}$$

and is denoted by $\chi_v^2$. Here $v$ is assumed to be an integer with $v \geq 1$. The function $\Gamma(u)$ is the Gamma function, defined as

$$\Gamma(u) = \int_0^{\infty} t^{u-1} \exp(-t)\, dt$$

[Figure: $\chi_v^2$ pdf for $v = 2$ (exponential pdf) and $v = 20$ (approaching Gaussian), plotted versus $x$.]
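A small illustrative sketch (not part of the slides) that checks this definition empirically: the sum of $v$ squared standard normal samples should behave like a $\chi_v^2$ variable; scipy.stats.chi2 provides the reference pdf.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
v = 5                                               # degrees of freedom (illustrative)
samples = np.sum(rng.standard_normal((100_000, v))**2, axis=1)

# Compare the empirical mean/variance with the theoretical values E[x] = v, var[x] = 2v.
print(samples.mean(), samples.var())                # ~5, ~10
print(chi2.pdf(3.0, df=v))                          # chi-squared pdf with v dof at x = 3
```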
Important pdfs – non-central Chi-squared
If $x = \sum_{i=1}^{v} x_i^2$, where the $x_i$'s are i.i.d. Gaussian random variables with means $\mu_i$ and variance $\sigma^2 = 1$, then $x$ has a noncentral chi-squared pdf with $v$ degrees of freedom and noncentrality parameter $\lambda = \sum_{i=1}^{v} \mu_i^2$. The pdf then becomes

$$p(x) = \begin{cases} \dfrac{1}{2} \left(\dfrac{x}{\lambda}\right)^{\frac{v-2}{4}} \exp\!\left[-\tfrac{1}{2}(x+\lambda)\right] I_{\frac{v}{2}-1}\!\left(\sqrt{\lambda x}\right), & x > 0 \\[4pt] 0, & x < 0 \end{cases}$$

where $I_r(\cdot)$ denotes the modified Bessel function of the first kind of order $r$.
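Along the same lines, an illustrative sketch (not from the slides) for the noncentral case using scipy.stats.ncx2; the mean vector mu is an arbitrary example choice.

```python
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(2)
mu = np.array([1.0, 2.0, 0.5])            # means of the unit-variance Gaussians (illustrative)
v, lam = len(mu), np.sum(mu**2)           # degrees of freedom and noncentrality parameter

samples = np.sum((mu + rng.standard_normal((100_000, v)))**2, axis=1)
print(samples.mean(), v + lam)            # empirical mean vs. theoretical E[x] = v + lambda
print(ncx2.pdf(4.0, df=v, nc=lam))        # noncentral chi-squared pdf at x = 4
```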
Making Optimal Decisions
Remember the example:

$$H_0: x[0] < \gamma$$
$$H_1: x[0] > \gamma$$

How should the decision rule and the threshold $\gamma$ be chosen optimally? Two approaches are discussed in the following:

• Neyman-Pearson detector
• Bayesian detector
Neyman-Pearson Theorem - Introduction
Example: Assume that we observe a random variable whose pdf is either $\mathcal{N}(0, 1)$ or $\mathcal{N}(1, 1)$. Our hypothesis problem is then:

$$H_0: \mu = 0$$
$$H_1: \mu = 1$$

Detection rule:

$$H_0: x[0] < \tfrac{1}{2}$$
$$H_1: x[0] > \tfrac{1}{2}$$
Neyman-Pearson Theorem – Detection Performance
Neyman-Pearson Theorem
Problem statement
Assume a data set $\mathbf{x} = [x[0], x[1], \ldots, x[N-1]]^T$ is available. The detection problem is defined as follows:

$$H_0: T(\mathbf{x}) < \gamma$$
$$H_1: T(\mathbf{x}) > \gamma$$

where $T$ is the decision function and $\gamma$ is the detection threshold. Our goal is to design $T$ so as to maximize $P_D$ subject to $P_{FA} \leq \alpha$.
Neyman-Pearson Theorem
The Neyman-Pearson theorem states that, to maximize $P_D$ for a given $P_{FA} = \alpha$, we should decide $H_1$ if

$$L(\mathbf{x}) = \frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > \gamma$$

where the threshold $\gamma$ is found from the constraint $P_{FA} = \alpha$. The function $L(\mathbf{x})$ is called the likelihood ratio and the entire test is called the likelihood ratio test (LRT).
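To make the LRT concrete, a small sketch (illustration only, not from the slides) for the earlier $\mathcal{N}(0,1)$ versus $\mathcal{N}(1,1)$ example; with $\gamma = 1$ the numerical LRT reproduces the $x[0] > \tfrac{1}{2}$ rule.

```python
import numpy as np
from scipy.stats import norm

def lrt_decide(x0, gamma=1.0):
    """Decide H1 if L(x) = p(x; H1) / p(x; H0) exceeds the threshold gamma."""
    L = norm.pdf(x0, loc=1.0) / norm.pdf(x0, loc=0.0)
    return L > gamma

# With gamma = 1 the LRT decides H1 exactly when x[0] > 1/2.
for x0 in (0.3, 0.5001, 0.8):
    print(x0, lrt_decide(x0))
```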
Neyman-Pearson Theorem - Derivation
$$\max P_D \quad \text{subject to} \quad P_{FA} = \alpha$$

Introducing a Lagrange multiplier $\lambda$:

$$\begin{aligned}
F &= P_D + \lambda \left( P_{FA} - \alpha \right) \\
  &= \int_{R_1} p(\mathbf{x}; H_1)\, d\mathbf{x} + \lambda \left( \int_{R_1} p(\mathbf{x}; H_0)\, d\mathbf{x} - \alpha \right) \\
  &= \int_{R_1} \left( p(\mathbf{x}; H_1) + \lambda\, p(\mathbf{x}; H_0) \right) d\mathbf{x} - \lambda\alpha
\end{aligned}$$

The problem now is (see figures) to select the right regions $R_1$ and $R_0$. As we want to maximise $F$, a value $\mathbf{x}$ should only be included in $R_1$ if it increases the integral, that is, if the integrand is positive. So, $\mathbf{x}$ should only be included in $R_1$ if
Neyman-Pearson Theorem - Derivation
$$p(\mathbf{x}; H_1) + \lambda\, p(\mathbf{x}; H_0) > 0$$
$$\Rightarrow \quad \frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > -\lambda$$

A likelihood ratio is always positive, so $\gamma = -\lambda > 0$ (if $\lambda > 0$ we would have $P_{FA} = 1$):

$$\frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > \gamma,$$

where $\gamma$ is determined from the constraint $P_{FA} = \alpha$.
Neyman-Pearson Theorem – Example DC in WGN
Consider the following signal detection problem:

$$H_0: x[n] = w[n] \qquad\qquad\;\; n = 0, 1, \ldots, N-1$$
$$H_1: x[n] = s[n] + w[n] \qquad n = 0, 1, \ldots, N-1$$

where the signal is $s[n] = A$ for $A > 0$ and $w[n]$ is WGN with variance $\sigma^2$. Now the NP detector decides $H_1$ if

$$\frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left[-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x[n]-A)^2\right]}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left[-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} x^2[n]\right]} > \gamma$$

Taking the logarithm and rearranging, this reduces to comparing the sample mean $T(\mathbf{x}) = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$ with a threshold $\gamma'$. Since $T(\mathbf{x}) \sim \mathcal{N}(A, \sigma^2/N)$ under $H_1$,

$$P_D = \Pr\!\left(T(\mathbf{x}) > \gamma'; H_1\right) = Q\!\left(\frac{\gamma' - A}{\sqrt{\sigma^2/N}}\right)$$

In this case the deflection coefficient is $d^2 = \frac{N A^2}{\sigma^2}$. Further notice that the detection performance ($P_D$) increases monotonically with the deflection coefficient.
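A sketch (added for illustration, with arbitrary parameter values) that sets the threshold $\gamma'$ from a target $P_{FA}$ and evaluates the resulting $P_D$, verified by a quick Monte Carlo run.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
A, sigma2, N, p_fa_target = 0.5, 1.0, 20, 0.01     # illustrative values

# Under H0, T(x) = mean(x) ~ N(0, sigma2/N): choose gamma' so that P_FA equals the target.
gamma_p = np.sqrt(sigma2 / N) * norm.isf(p_fa_target)      # sqrt(sigma2/N) * Q^{-1}(P_FA)
p_d_theory = norm.sf((gamma_p - A) / np.sqrt(sigma2 / N))  # Q((gamma' - A)/sqrt(sigma2/N))

# Monte Carlo check under H1.
x = A + rng.normal(0.0, np.sqrt(sigma2), (100_000, N))
p_d_mc = np.mean(x.mean(axis=1) > gamma_p)
print(p_d_theory, p_d_mc)
```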
Neyman-Pearson Theorem – Example Change Var
$$H_0: x[n] \sim \mathcal{N}(0, \sigma_0^2)$$
$$H_1: x[n] \sim \mathcal{N}(0, \sigma_1^2),$$

with $\sigma_1^2 > \sigma_0^2$.

Neyman-Pearson test: decide $H_1$ if

$$\frac{\frac{1}{(2\pi\sigma_1^2)^{N/2}} \exp\!\left[-\frac{1}{2\sigma_1^2} \sum_{n=0}^{N-1} x^2[n]\right]}{\frac{1}{(2\pi\sigma_0^2)^{N/2}} \exp\!\left[-\frac{1}{2\sigma_0^2} \sum_{n=0}^{N-1} x^2[n]\right]} > \gamma$$
Neyman-Pearson Theorem – Example Change Var
Taking the logarithm and rearranging, we then have

$$\frac{1}{N}\sum_{n=0}^{N-1} x^2[n] > \gamma'$$

with

$$\gamma' = \frac{\frac{2}{N}\ln\gamma + \ln\frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}$$

What about $P_D$?
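Under $H_0$, $N\,T(\mathbf{x})/\sigma_0^2$ is $\chi^2_N$ distributed, and under $H_1$ the same holds with $\sigma_1^2$, so both the threshold and $P_D$ follow from chi-squared tail probabilities. A sketch (illustration only, arbitrary parameter values):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
sigma0_sq, sigma1_sq, N, p_fa_target = 1.0, 2.0, 32, 0.05   # illustrative values

# Set the threshold on T(x) = (1/N) sum x^2[n] from the chi-squared right tail under H0.
gamma_p = sigma0_sq * chi2.isf(p_fa_target, df=N) / N

# P_D follows from the chi-squared right tail under H1.
p_d_theory = chi2.sf(N * gamma_p / sigma1_sq, df=N)

# Monte Carlo check under H1.
x = rng.normal(0.0, np.sqrt(sigma1_sq), (100_000, N))
p_d_mc = np.mean(np.mean(x**2, axis=1) > gamma_p)
print(p_d_theory, p_d_mc)
```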
Receiver Operating Characteristics
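A receiver operating characteristic (ROC) plots $P_D$ against $P_{FA}$ as the threshold is varied. As an illustration (not from the original slides), a sketch that traces the ROC of the DC-in-WGN detector using $P_D = Q\!\left(Q^{-1}(P_{FA}) - \sqrt{d^2}\right)$, which follows from the earlier expression for $P_D$, for a few values of the deflection coefficient.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

p_fa = np.linspace(1e-4, 1.0, 500)
for d2 in (0.5, 1.0, 4.0):                        # deflection coefficient d^2 = N A^2 / sigma^2
    p_d = norm.sf(norm.isf(p_fa) - np.sqrt(d2))   # P_D = Q(Q^{-1}(P_FA) - sqrt(d^2))
    plt.plot(p_fa, p_d, label=f"d^2 = {d2}")

plt.xlabel("P_FA"); plt.ylabel("P_D"); plt.legend(); plt.show()
```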
Minimum Probability of Error
Assume the prior probabilities of $H_0$ and $H_1$ are known and represented by $P(H_0)$ and $P(H_1)$, respectively. The probability of error, $P_e$, is then defined as

$$P_e = P(H_1)P(H_0|H_1) + P(H_0)P(H_1|H_0) = P(H_1)P_M + P(H_0)P_{FA}$$

Our goal is to design a detector that minimizes $P_e$. It can be shown that the following detector is optimal in this case: decide $H_1$ if

$$\frac{p(\mathbf{x}|H_1)}{p(\mathbf{x}|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma$$

In case $P(H_0) = P(H_1)$, the detector is called the maximum likelihood (ML) detector.
Minimum Probability of Error - Derivation
We know that

$$\int_{R_0} p(\mathbf{x}|H_1)\, d\mathbf{x} = 1 - \int_{R_1} p(\mathbf{x}|H_1)\, d\mathbf{x},$$

such that

$$\begin{aligned}
P_e &= P(H_1)\left(1 - \int_{R_1} p(\mathbf{x}|H_1)\, d\mathbf{x}\right) + P(H_0)\int_{R_1} p(\mathbf{x}|H_0)\, d\mathbf{x} \\
    &= P(H_1) + \int_{R_1} \left[ P(H_0)p(\mathbf{x}|H_0) - P(H_1)p(\mathbf{x}|H_1) \right] d\mathbf{x}
\end{aligned}$$
Minimum Probability of Error - Derivation
$$P_e = P(H_1) + \int_{R_1} \left[ P(H_0)p(\mathbf{x}|H_0) - P(H_1)p(\mathbf{x}|H_1) \right] d\mathbf{x}$$

To minimize $P_e$, a value $\mathbf{x}$ should be included in $R_1$ only if the integrand is negative there, i.e., only if $P(H_1)p(\mathbf{x}|H_1) > P(H_0)p(\mathbf{x}|H_0)$, which is exactly the detector stated above.
Minimum Probability of Error– Example DC in WGN
Consider the following signal detection problem:

$$H_0: x[n] = w[n] \qquad\qquad\;\; n = 0, 1, \ldots, N-1$$
$$H_1: x[n] = s[n] + w[n] \qquad n = 0, 1, \ldots, N-1$$

where the signal is $s[n] = A$ for $A > 0$ and $w[n]$ is WGN with variance $\sigma^2$. Now the minimum probability of error detector decides $H_1$ if $\frac{p(\mathbf{x}|H_1)}{p(\mathbf{x}|H_0)} > \frac{P(H_0)}{P(H_1)} = 1$ (assuming $P(H_0) = P(H_1) = 0.5$), leading to

$$\frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left[-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x[n]-A)^2\right]}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left[-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} x^2[n]\right]} > 1$$

which reduces to deciding $H_1$ when the sample mean exceeds $A/2$. $P_e$ is then given by

$$\begin{aligned}
P_e &= \frac{1}{2}\left[ P(H_0|H_1) + P(H_1|H_0) \right] \\
    &= \frac{1}{2}\left[ \Pr\!\left( \frac{1}{N}\sum_{n=0}^{N-1} x[n] < A/2 \,\Big|\, H_1 \right) + \Pr\!\left( \frac{1}{N}\sum_{n=0}^{N-1} x[n] > A/2 \,\Big|\, H_0 \right) \right] \\
    &= \frac{1}{2}\left[ 1 - Q\!\left( \frac{A/2 - A}{\sqrt{\sigma^2/N}} \right) + Q\!\left( \frac{A/2}{\sqrt{\sigma^2/N}} \right) \right] \\
    &= Q\!\left( \sqrt{\frac{N A^2}{4\sigma^2}} \right)
\end{aligned}$$
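A quick sketch (illustration only, arbitrary parameters) comparing the closed-form $P_e = Q\!\left(\sqrt{NA^2/(4\sigma^2)}\right)$ with a Monte Carlo estimate of the sample-mean detector with threshold $A/2$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
A, sigma2, N, trials = 1.0, 1.0, 10, 100_000       # illustrative values

pe_theory = norm.sf(np.sqrt(N * A**2 / (4 * sigma2)))   # Q(sqrt(N A^2 / (4 sigma^2)))

# Equally likely hypotheses: decide H1 when the sample mean exceeds A/2.
h0 = rng.normal(0.0, np.sqrt(sigma2), (trials, N)).mean(axis=1)
h1 = (A + rng.normal(0.0, np.sqrt(sigma2), (trials, N))).mean(axis=1)
pe_mc = 0.5 * np.mean(h0 > A / 2) + 0.5 * np.mean(h1 < A / 2)
print(pe_theory, pe_mc)
```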
Minimum Probability of Error – MAP detector
Starting from

$$\frac{p(\mathbf{x}|H_1)}{p(\mathbf{x}|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma$$

and using Bayes' rule,

$$P(H_i|\mathbf{x}) = \frac{p(\mathbf{x}|H_i)P(H_i)}{p(\mathbf{x})},$$

we arrive at: decide $H_1$ if

$$P(H_1|\mathbf{x}) > P(H_0|\mathbf{x}).$$

This is called the MAP (maximum a posteriori) detector, which, if $P(H_1) = P(H_0)$, reduces again to the ML detector.
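A sketch (illustration only, not from the slides) of the MAP rule for the DC-level example with an unequal prior; the helper map_decide and its parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def map_decide(x, A=1.0, sigma2=1.0, p_h1=0.2):
    """Decide H1 if P(H1|x) > P(H0|x), i.e. if the likelihood ratio exceeds P(H0)/P(H1)."""
    x = np.atleast_1d(x)
    log_lr = np.sum(norm.logpdf(x, loc=A, scale=np.sqrt(sigma2))
                    - norm.logpdf(x, loc=0.0, scale=np.sqrt(sigma2)))
    return log_lr > np.log((1 - p_h1) / p_h1)

print(map_decide([0.6, 0.7, 0.4]))   # with P(H1) = 0.2 the prior pushes the decision towards H0
```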
Bayes Risk
A generalisation of the minimum $P_e$ criterion is one where costs are assigned to each type of error:
Let $C_{ij}$ be the cost if we decide $H_i$ while $H_j$ is true. Minimizing the expected cost, we get

$$R = E[C] = \sum_{i=0}^{1} \sum_{j=0}^{1} C_{ij}\, P(H_i|H_j) P(H_j)$$

If $C_{10} > C_{00}$ and $C_{01} > C_{11}$, the detector that minimises the Bayes risk is to decide $H_1$ when

$$\frac{p(\mathbf{x}|H_1)}{p(\mathbf{x}|H_0)} > \frac{(C_{10} - C_{00})\, P(H_0)}{(C_{01} - C_{11})\, P(H_1)} = \gamma$$
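A minimal sketch (illustration only, with hypothetical cost values) that forms this Bayes-risk threshold and applies it to the DC-level likelihood ratio; note how a costly miss lowers the threshold compared with the minimum-$P_e$ detector.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical costs: missing the signal (C01) is five times worse than a false alarm (C10).
C00, C11, C10, C01 = 0.0, 0.0, 1.0, 5.0
P_H0, P_H1 = 0.5, 0.5
gamma = (C10 - C00) * P_H0 / ((C01 - C11) * P_H1)   # Bayes-risk threshold on L(x)

def bayes_decide(x, A=1.0, sigma2=1.0):
    """Decide H1 if the likelihood ratio p(x|H1)/p(x|H0) exceeds the Bayes-risk threshold."""
    x = np.atleast_1d(x)
    log_lr = np.sum(norm.logpdf(x, A, np.sqrt(sigma2)) - norm.logpdf(x, 0.0, np.sqrt(sigma2)))
    return log_lr > np.log(gamma)

print(bayes_decide([0.2, -0.1, 0.4]))   # gamma = 0.2: costly misses favour deciding H1
```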