Optimal Joint Detection and Estimation in Linear Models

Jianshu Chen, Yue Zhao, Andrea Goldsmith, and H. Vincent Poor
Abstract— The problem of optimal joint detection and estimation in linear models with Gaussian noise is studied. A simple closed-form expression for the joint posterior distribution of the (multiple) hypotheses and the states is derived. The expression crystallizes the dependence of the optimal detector on the state estimates. The joint posterior distribution characterizes the beliefs ("soft information") about the hypotheses and the values of the states. Furthermore, it is a sufficient statistic for jointly detecting multiple hypotheses and estimating the states. The developed expressions give us a unified framework for joint detection and estimation under all performance criteria.

I. INTRODUCTION

Detection and estimation problems appear simultaneously and are naturally coupled in many engineering systems. Several prominent examples are as follows. To achieve situational awareness in power grids, it is essential to have timely detection of outages as well as estimation of system states [1], [2]. Radar systems detect the existence of targets and also estimate their positions and velocities [3]. Wireless communication systems often need to decode messages and estimate channel states at the same time [4]. In different engineering systems, the problem settings of joint detection and estimation can vary greatly, and many application-specific solutions have been developed in practice.

A classic approach that addresses the detection problem in the presence of unknown states/parameters is composite hypothesis testing [3]. Accordingly, a straightforward approach for joint hypothesis testing and state/parameter estimation is to perform composite hypothesis testing first, followed by state/parameter estimation based on the hard decision made from hypothesis testing. However, such an approach cannot provide optimality guarantees under general performance criteria that depend jointly on detection and estimation results.

In the literature, several studies have addressed such joint performance criteria. The structure of the jointly optimal Bayes detector and estimator with discrete-time data was developed in [5] and [6], and was extended to the continuous-time data case in [7]. There, the detector structure was expressed in terms of some generalized forms of likelihood ratios. The structure of the optimal Bayesian estimator under any given constraints on false alarm probability and probability of missed detection has also been developed for the binary hypothesis case [8].

In this paper, we study the problem of optimal joint detection and estimation for a general class of observation models, namely, linear models with Gaussian noise. Linear models appear in a wide range of engineering applications including power systems [1], channel estimation [9], [10], adaptive array processing [11]–[13], and spectrum estimation [14]. In these applications, not only is state estimation of primary interest, but also the observation matrix can often change over time, and it is essential to detect which observation matrix among many possibilities is currently effective. We formulate these problems as joint multiple hypothesis testing and state estimation problems. Instead of focusing on a particular form of performance criterion and developing the corresponding optimal joint detector and estimator, we develop a unified Bayesian approach that can be applied to any given criterion. Specifically, employing a conjugate prior, we provide closed-form expressions for the joint posterior of the hypotheses and the system states given all measurement samples. The developed expressions reveal the exact dependence of the optimal detectors on the state estimates. Because the joint posterior is a sufficient statistic for joint hypothesis testing and state estimation, the derived explicit forms of such soft information (as opposed to hard decisions) can be applied to all performance criteria with optimality guarantees.

The remainder of the paper is organized as follows. In Section II, we describe the system model and formulate the joint detection and estimation problem. In Section III, we provide a factorization of the likelihood function and derive a simple closed-form expression for the joint posterior distribution. Finally, we conclude the paper and remark on future directions in Section IV.

Notation. We use boldface letters to denote random quantities and regular letters to denote realizations or deterministic quantities.

This research was supported in part by the DTRA under Grant HDTRA1-08-1-0010, in part by the Air Force Office of Scientific Research under MURI Grant FA9550-09-1-0643, and in part by the Office of Naval Research under Grant N00014-12-1-0767.
J. Chen is with the Dept. of Electrical Engineering, University of California, Los Angeles, CA 90095 USA (e-mail: [email protected]).
Y. Zhao is with the Dept. of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA, and with the Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).
A. Goldsmith is with the Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).
H. V. Poor is with the Dept. of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: [email protected]).

II. PROBLEM FORMULATION

We consider the following observation model, which entails a joint detection and estimation problem. Given each of the K + 1 hypotheses H_0, H_1, ..., H_K, the M × 1 sensor measurement vector x_t at time t is obtained according to the following linear model:

\[
H_k : \; x_t = H_k \theta + v_t, \qquad k = 0, 1, \ldots, K, \tag{1}
\]
where H_k is the M × N observation matrix under hypothesis H_k, θ is the N × 1 unknown state vector¹ to estimate, and v_t ∼ N(0, R_v) is the M × 1 measurement noise that is independent and identically distributed (i.i.d.) over time. From the measurement data {x_t}, we want to jointly infer a) the true underlying linear model H_k, and b) the true underlying states θ. Note that neither of them is known beforehand, and we need to solve a problem of jointly detecting H_k and estimating θ. Such problems arise in many applications. We provide in the following an example that arises commonly in power grid monitoring. An outage in a power grid will change the grid topology, and the system operator wants to detect which outage among a candidate set {H_1, ..., H_K} occurs, or whether no outage occurs (H_0). With a given set of sensors in the grid, the k-th outage scenario gives rise to a unique observation matrix H_k, and the sensors measure the states of the grid θ via (1). Consequently, state estimation depends on knowledge of the true outage, and outage detection depends on knowledge of the true states [2]. Clearly, solving a joint detection and estimation problem is essential for monitoring the health of the power grid in real time.

¹ In addition to states, θ can also include parameters in some applications [12], [13]. For the sake of brevity, we refer to θ as states from now on.

For these purposes, this paper provides the joint posterior distribution p(θ, H_k | x^i) (see (12)–(13) below), which gives us the beliefs about both θ and H_k. It is also a sufficient statistic for θ and H_k given the data x^i, i.e., it provides full information from the measured data x^i about the hypothesis H_k and the state vector θ. Therefore, instead of being optimal functions of x^i, the optimal decision rule and the estimator need only be optimal functionals of p(θ, H_k | x^i). Deriving the expressions for the joint posterior distribution will give us a unified framework for joint detection and estimation under all performance criteria (e.g., minimum-risk/minimum-probability-of-error/maximum a posteriori probability (MAP) detection, and MAP/minimum-mean-square-error (MMSE) estimation).

III. JOINT POSTERIOR OF HYPOTHESES AND STATES

We now derive the joint posterior distribution of the hypothesis H_k and the unknown states θ. Specifically, we will use p(θ, H_k | x^i) as a hybrid probability measure to denote the joint posterior distribution of θ and H_k:

\[
p(\theta, H_k \mid x^i) = p(H_k \mid x^i)\, p(\theta \mid H_k, x^i) \tag{2}
\]

where p(H_k | x^i) denotes the posterior probability mass function (PMF) of H_k and p(θ | H_k, x^i) denotes the posterior probability density function (PDF) of θ given H_k.

A. The Likelihood Function

We begin with a factorization of the likelihood function p(x^i | θ, H_k), which will be useful in finding sufficient statistics for jointly detecting H_k and estimating θ, and in computing the joint posterior distribution.

Lemma 1 (Factorization): According to the linear model in (1), we can express the conditional distribution (the likelihood function) p(x^i | θ, H_k) in the following form:

\[
p(x^i \mid \theta, H_k) = p(x^i \mid \hat{\theta}_{k,\mathrm{ML}}, H_k)\,
\exp\Big\{ -\tfrac{1}{2}\,\big\| \theta - \hat{\theta}_{k,\mathrm{ML}} \big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})} \Big\} \tag{3}
\]

where the notation ‖x‖²_Σ denotes x^T Σ x for a positive definite weighting matrix Σ, θ̂_{k,ML} is the maximum likelihood estimate of θ given that hypothesis H_k is true, and I(θ̂_{k,ML}) is the corresponding Fisher information matrix:

\[
\hat{\theta}_{k,\mathrm{ML}} = (H_k^T R_v^{-1} H_k)^{-1} H_k^T R_v^{-1} \bar{x}_i \tag{4}
\]
\[
I(\hat{\theta}_{k,\mathrm{ML}}) = i \cdot (H_k^T R_v^{-1} H_k) \tag{5}
\]
\[
\bar{x}_i = \frac{1}{i} \sum_{t=1}^{i} x_t. \tag{6}
\]

Proof: See Appendix I.

In [15], an asymptotic expression similar to (3) was derived for general likelihood functions satisfying certain regularity conditions for large i. In comparison, our expression (3) holds for all i ≥ 1 due to the properties of the linear model with Gaussian noise that we have assumed. Furthermore, the linear model (1) also allows us to evaluate the expression for p(x^i | θ̂_{k,ML}, H_k), given by the following lemma.

Lemma 2 (Expression for p(x^i | θ̂_{k,ML}, H_k)): The conditional probability p(x^i | θ̂_{k,ML}, H_k) can be expressed as

\[
p(x^i \mid \hat{\theta}_{k,\mathrm{ML}}, H_k) = \left[ \frac{1}{(2\pi)^{M/2} \det(R_v)^{1/2}} \right]^{i}
\exp\Big\{ -\tfrac{1}{2} \sum_{t=1}^{i} \|x_t\|^2_{R_v^{-1}} \Big\}
\cdot \exp\Big\{ \tfrac{1}{2}\, \big\| \hat{\theta}_{k,\mathrm{ML}} \big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})} \Big\} \tag{7}
\]

where θ̂_{k,ML} and I(θ̂_{k,ML}) are given by (4)–(6).

Proof: See Appendix II.

Substituting (7) into (3), we obtain the following factorization of the likelihood function:

\[
p(x^i \mid \theta, H_k) = \left[ \frac{1}{(2\pi)^{M/2} \det(R_v)^{1/2}} \right]^{i}
\exp\Big\{ -\tfrac{1}{2} \sum_{t=1}^{i} \|x_t\|^2_{R_v^{-1}} \Big\}
\cdot \exp\Big\{ \tfrac{1}{2} \big\| \hat{\theta}_{k,\mathrm{ML}} \big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})} \Big\}
\cdot \exp\Big\{ -\tfrac{1}{2} \big\| \theta - \hat{\theta}_{k,\mathrm{ML}} \big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})} \Big\}. \tag{8}
\]

Note that the first two factors in (8) are independent of the hypothesis index k and the state vector θ, while the other two factors depend on {θ̂_{k,ML}}, which, by (4), is in turn determined by x̄_i. Therefore, by the Neyman-Fisher factorization theorem [3], [15], [16], x̄_i is a sufficient statistic for jointly detecting H_k and estimating θ.
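To make the quantities above concrete, the following self-contained Python/NumPy sketch simulates the linear model (1) under one hypothesis, forms the sample mean (6), the ML estimate (4), and the Fisher information (5), and numerically confirms that the factorized likelihood (8) agrees with the direct Gaussian likelihood. All dimensions, matrices, and the random seed are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, i = 3, 2, 50                      # illustrative dimensions and sample count

H_k = rng.standard_normal((M, N))       # observation matrix under hypothesis H_k
Rv = 0.2 * np.eye(M)                    # noise covariance R_v
Rinv = np.linalg.inv(Rv)
theta_true = np.array([0.7, -1.2])      # true (unknown) state vector

# Simulate x_t = H_k theta + v_t, t = 1..i, per eq. (1); row t of X is x_t
X = theta_true @ H_k.T + rng.multivariate_normal(np.zeros(M), Rv, size=i)

xbar = X.mean(axis=0)                                   # sample mean, eq. (6)
A = H_k.T @ Rinv @ H_k
theta_ml = np.linalg.solve(A, H_k.T @ Rinv @ xbar)      # ML estimate, eq. (4)
I_ml = i * A                                            # Fisher information, eq. (5)

const = -0.5 * i * (M * np.log(2 * np.pi) + np.log(np.linalg.det(Rv)))

def loglik_direct(theta):
    """log p(x^i | theta, H_k) evaluated directly from the Gaussian model."""
    r = X - theta @ H_k.T
    return const - 0.5 * np.einsum('tm,mn,tn->', r, Rinv, r)

def loglik_factored(theta):
    """log p(x^i | theta, H_k) via the factorization in eq. (8)."""
    d = theta - theta_ml
    return (const - 0.5 * np.einsum('tm,mn,tn->', X, Rinv, X)
            + 0.5 * theta_ml @ I_ml @ theta_ml - 0.5 * d @ I_ml @ d)

theta_probe = np.array([0.3, 0.4])      # arbitrary test point
print(np.isclose(loglik_direct(theta_probe), loglik_factored(theta_probe)))  # True
```

The two evaluations agree for every θ, which is what (3) and (8) assert; moreover, the data enter the θ-dependent factors only through x̄_i, illustrating the sufficiency of the sample mean.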
This fact will also be reflected further ahead in the joint posterior expressions (12)–(16), where x̄_i is the only statistic we need to track over time via, e.g.,

\[
\bar{x}_i = \bar{x}_{i-1} + \frac{1}{i}\,(x_i - \bar{x}_{i-1}). \tag{9}
\]

B. Conjugate Prior

For a given likelihood function, if a prior distribution produces a posterior distribution of the same family, then such a prior distribution is called a conjugate prior. With a conjugate prior, we need only maintain recursions for the parameters that describe the distribution family of the prior and the posterior. We will use this kind of prior in our joint detection and estimation problem.

At the beginning (before any measurement data are available), we assume that the prior distribution of θ and H_k is given by

\[
p(\theta, H_k) = p(H_k)\, p(\theta \mid H_k) \tag{10}
\]

where p(H_k) is the prior PMF of the hypothesis H_k and p(θ | H_k) is the prior PDF of the state vector θ given hypothesis H_k. Throughout the paper, we assume that, given H_k, θ has a Gaussian prior:

\[
p(\theta \mid H_k) = \frac{1}{(2\pi)^{N/2} \det(C_{k,0})^{1/2}}
\exp\Big\{ -\tfrac{1}{2}\, \|\theta - \theta_{k,0}\|^2_{C_{k,0}^{-1}} \Big\} \tag{11}
\]

where θ_{k,0} and C_{k,0} are the corresponding prior mean and covariance matrix given hypothesis H_k, respectively. We will show in the next section that this prior is indeed a conjugate prior. Furthermore, we will also show that even with an

\[
\cdot\; \frac{ \exp\big\{ \tfrac{1}{2} \|\hat{\theta}_{q,\mathrm{MMSE}}\|^2_{C_{q,\mathrm{MMSE}}^{-1}} \big\} }{ \exp\big\{ \tfrac{1}{2} \|\theta_{q,0}\|^2_{C_{q,0}^{-1}} \big\} }, \tag{14}
\]

\[
\hat{\theta}_{k,\mathrm{MMSE}} \triangleq \big( C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}}) \big)^{-1}
\big( I(\hat{\theta}_{k,\mathrm{ML}})\, \hat{\theta}_{k,\mathrm{ML}} + C_{k,0}^{-1} \theta_{k,0} \big), \tag{15}
\]

and

\[
C_{k,\mathrm{MMSE}} \triangleq \big( C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}}) \big)^{-1}. \tag{16}
\]

Proof: See Appendix III.

Note that θ̂_{k,MMSE} is the classical MMSE estimate of θ given that H_k is true, and C_{k,MMSE} is the corresponding error covariance matrix. In the posterior expression (12),
• f(x^i) is a normalization factor.
• p(H_k) captures the prior PMF.
• The intuition of $\sqrt{\det(C_{k,\mathrm{MMSE}})/\det(C_{k,0})}$ is to penalize the model complexity of H_k. This term reduces to $\sqrt{\det(I(\hat{\theta}_{k,\mathrm{ML}})^{-1})}$ in the case of an uninformative prior (see (18) below), for which a discussion of its meaning can be found in [15].
• The last term in (12) characterizes the similarity between the data (adjusted by the prior PDF) and the hypothesis H_k.

Moreover, the exact dependence of the optimal detector on the state estimator can be seen from expression (12).

Accordingly, the posterior marginal distribution p(θ | x^i) for the state vector θ can be expressed as

\[
p(\theta \mid x^i) = \sum_{k=0}^{K} p(H_k \mid x^i)\,
\frac{1}{(2\pi)^{N/2} \det(C_{k,\mathrm{MMSE}})^{1/2}}
\exp\Big\{ -\tfrac{1}{2} \big\| \theta - \hat{\theta}_{k,\mathrm{MMSE}} \big\|^2_{C_{k,\mathrm{MMSE}}^{-1}} \Big\}
\]
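As a numerical sanity check of (15)–(16) and of the recursion (9), the Python/NumPy sketch below (all dimensions and prior parameters are illustrative choices, not values from the paper) verifies that the MMSE quantities built from x̄_i, θ̂_{k,ML}, and I(θ̂_{k,ML}) coincide with the posterior mean and covariance obtained from standard batch Bayesian linear-Gaussian estimation on the stacked measurements:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, i = 3, 2, 40                      # illustrative dimensions and sample count
H_k = rng.standard_normal((M, N))
Rv = 0.5 * np.eye(M)
Rinv = np.linalg.inv(Rv)

theta0 = np.zeros(N)                    # prior mean theta_{k,0}
C0 = 2.0 * np.eye(N)                    # prior covariance C_{k,0}
theta_true = rng.multivariate_normal(theta0, C0)
X = theta_true @ H_k.T + rng.multivariate_normal(np.zeros(M), Rv, size=i)

# Running sample mean via the recursion in eq. (9)
xbar = np.zeros(M)
for t, x in enumerate(X, start=1):
    xbar = xbar + (x - xbar) / t
assert np.allclose(xbar, X.mean(axis=0))

# MMSE estimate and error covariance, eqs. (15)-(16), built from eqs. (4)-(5)
theta_ml = np.linalg.solve(H_k.T @ Rinv @ H_k, H_k.T @ Rinv @ xbar)
I_ml = i * (H_k.T @ Rinv @ H_k)
C_mmse = np.linalg.inv(np.linalg.inv(C0) + I_ml)                      # eq. (16)
theta_mmse = C_mmse @ (I_ml @ theta_ml + np.linalg.inv(C0) @ theta0)  # eq. (15)

# Cross-check against the batch Gaussian posterior on the stacked data
Hs = np.vstack([H_k] * i)               # (i*M) x N stacked observation matrix
Rs = np.kron(np.eye(i), Rv)             # block-diagonal noise covariance
P = np.linalg.inv(np.linalg.inv(C0) + Hs.T @ np.linalg.inv(Rs) @ Hs)
m = P @ (Hs.T @ np.linalg.inv(Rs) @ X.reshape(-1) + np.linalg.inv(C0) @ theta0)
print(np.allclose(theta_mmse, m), np.allclose(C_mmse, P))  # True True
```

The agreement reflects the conjugacy argument of Section III-B: the full batch posterior under H_k is a Gaussian whose parameters depend on the data only through the running mean x̄_i.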
APPENDIX I
PROOF OF LEMMA 1

\[
\begin{aligned}
p(x^i \mid \theta, H_k)
&= \left[\frac{1}{(2\pi)^{M/2}\det(R_v)^{1/2}}\right]^{i}
\exp\Big\{-\frac{1}{2}\sum_{t=1}^{i}(x_t-H_k\hat{\theta}_{k,\mathrm{ML}})^{T} R_v^{-1}(x_t-H_k\hat{\theta}_{k,\mathrm{ML}})\Big\} \\
&\qquad \cdot \exp\Big\{-\frac{1}{2}\sum_{t=1}^{i}(\hat{\theta}_{k,\mathrm{ML}}-\theta)^{T}(H_k^{T}R_v^{-1}H_k)(\hat{\theta}_{k,\mathrm{ML}}-\theta)\Big\} \\
&\qquad \cdot \exp\Big\{-i\Big[\underbrace{\bar{x}_i^{T}R_v^{-1}H_k(H_k^{T}R_v^{-1}H_k)^{-1}}_{\hat{\theta}_{k,\mathrm{ML}}^{T}}\,H_k^{T}R_v^{-1}H_k(\hat{\theta}_{k,\mathrm{ML}}-\theta)
-\hat{\theta}_{k,\mathrm{ML}}^{T}H_k^{T}R_v^{-1}H_k(\hat{\theta}_{k,\mathrm{ML}}-\theta)\Big]\Big\} \\
&= \left[\frac{1}{(2\pi)^{M/2}\det(R_v)^{1/2}}\right]^{i}
\exp\Big\{-\frac{1}{2}\sum_{t=1}^{i}(x_t-H_k\hat{\theta}_{k,\mathrm{ML}})^{T}R_v^{-1}(x_t-H_k\hat{\theta}_{k,\mathrm{ML}})\Big\} \\
&\qquad \cdot \exp\Big\{-\frac{1}{2}(\hat{\theta}_{k,\mathrm{ML}}-\theta)^{T}\big(i\cdot H_k^{T}R_v^{-1}H_k\big)(\hat{\theta}_{k,\mathrm{ML}}-\theta)\Big\} \\
&= p(x^i \mid \hat{\theta}_{k,\mathrm{ML}}, H_k)\,
\exp\Big\{-\frac{1}{2}(\hat{\theta}_{k,\mathrm{ML}}-\theta)^{T} I(\hat{\theta}_{k,\mathrm{ML}})(\hat{\theta}_{k,\mathrm{ML}}-\theta)\Big\},
\end{aligned} \tag{21}
\]

where the cross term vanishes because, by (4), \( \bar{x}_i^{T}R_v^{-1}H_k(H_k^{T}R_v^{-1}H_k)^{-1} = \hat{\theta}_{k,\mathrm{ML}}^{T} \). This establishes (3).

APPENDIX II
PROOF OF LEMMA 2

Evaluating the Gaussian likelihood at θ = θ̂_{k,ML} gives

\[
p(x^i \mid \hat{\theta}_{k,\mathrm{ML}}, H_k)
= \left[\frac{1}{(2\pi)^{M/2}\det(R_v)^{1/2}}\right]^{i}
\exp\Big\{-\frac{1}{2}\sum_{t=1}^{i}\big\|x_t-H_k\hat{\theta}_{k,\mathrm{ML}}\big\|^{2}_{R_v^{-1}}\Big\}. \tag{22}
\]

Moreover,

\[
\begin{aligned}
\sum_{t=1}^{i}\big\|x_t-H_k\hat{\theta}_{k,\mathrm{ML}}\big\|^{2}_{R_v^{-1}}
&= \sum_{t=1}^{i}\Big(x_t^{T}R_v^{-1}x_t - 2x_t^{T}R_v^{-1}H_k\hat{\theta}_{k,\mathrm{ML}}
+ \hat{\theta}_{k,\mathrm{ML}}^{T}H_k^{T}R_v^{-1}H_k\hat{\theta}_{k,\mathrm{ML}}\Big) \\
&= \sum_{t=1}^{i}x_t^{T}R_v^{-1}x_t - 2i\,\bar{x}_i^{T}R_v^{-1}H_k\hat{\theta}_{k,\mathrm{ML}}
+ i\,\hat{\theta}_{k,\mathrm{ML}}^{T}H_k^{T}R_v^{-1}H_k\hat{\theta}_{k,\mathrm{ML}} \\
&= \sum_{t=1}^{i}\|x_t\|^{2}_{R_v^{-1}} - \big\|\hat{\theta}_{k,\mathrm{ML}}\big\|^{2}_{I(\hat{\theta}_{k,\mathrm{ML}})}, \tag{23}
\end{aligned}
\]

where the last step uses \( i\,\bar{x}_i^{T}R_v^{-1}H_k\hat{\theta}_{k,\mathrm{ML}} = \hat{\theta}_{k,\mathrm{ML}}^{T}I(\hat{\theta}_{k,\mathrm{ML}})\hat{\theta}_{k,\mathrm{ML}} \), which follows from (4)–(5). Finally, substituting (23) into (22), we establish Lemma 2.

APPENDIX III
PROOF OF THEOREM 1

By Bayes' formula, the joint posterior distribution can be expressed as

\[
p(\theta, H_k \mid x^i) = \frac{p(\theta, H_k)\,p(x^i \mid \theta, H_k)}{p(x^i)}. \tag{24}
\]

To compute the above posterior distribution, we need p(x^i), given by

\[
p(x^i) = \sum_{k=0}^{K} p(H_k) \int_{\theta\in\Theta} p(\theta)\,p(x^i \mid \theta, H_k)\,d\theta. \tag{25}
\]

To proceed, we first introduce the following lemma, which gives an integral result useful in deriving both the optimal detection and estimation procedures.

Lemma 3 (A useful integral): Suppose we are given the Gaussian prior distribution (11). Then the following result holds:

\[
\int_{\theta\in\Theta} p(\theta)\,\exp\Big\{-\frac{1}{2}\big\|\theta-\hat{\theta}_{k,\mathrm{ML}}\big\|^{2}_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big\}\,d\theta \tag{26}
\]
\[
\begin{aligned}
&= \exp\Big\{\frac{1}{2}\big\|I(\hat{\theta}_{k,\mathrm{ML}})\hat{\theta}_{k,\mathrm{ML}}+C_{k,0}^{-1}\theta_{k,0}\big\|^{2}_{(C_{k,0}^{-1}+I(\hat{\theta}_{k,\mathrm{ML}}))^{-1}}\Big\}
\cdot \exp\Big\{-\frac{1}{2}\Big(\|\theta_{k,0}\|^{2}_{C_{k,0}^{-1}}+\big\|\hat{\theta}_{k,\mathrm{ML}}\big\|^{2}_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big)\Big\} \\
&\qquad\cdot \det(C_{k,0})^{-1/2}\,\det\big(C_{k,0}^{-1}+I(\hat{\theta}_{k,\mathrm{ML}})\big)^{-1/2}. \tag{28}
\end{aligned}
\]

Proof: Completing the square in the exponent of the Gaussian prior (11),

\[
\begin{aligned}
&\int_{\theta\in\Theta} p(\theta)\,\exp\Big\{-\frac{1}{2}\big\|\theta-\hat{\theta}_{k,\mathrm{ML}}\big\|^{2}_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big\}\,d\theta \\
&= \exp\Big\{\frac{1}{2}\big\|I(\hat{\theta}_{k,\mathrm{ML}})\hat{\theta}_{k,\mathrm{ML}}+C_{k,0}^{-1}\theta_{k,0}\big\|^{2}_{(C_{k,0}^{-1}+I(\hat{\theta}_{k,\mathrm{ML}}))^{-1}}\Big\}
\cdot \exp\Big\{-\frac{1}{2}\Big(\|\theta_{k,0}\|^{2}_{C_{k,0}^{-1}}+\big\|\hat{\theta}_{k,\mathrm{ML}}\big\|^{2}_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big)\Big\} \\
&\qquad\times \frac{1}{\det(C_{k,0})^{1/2}\,\det\big(C_{k,0}^{-1}+I(\hat{\theta}_{k,\mathrm{ML}})\big)^{1/2}} \\
&\qquad\times \frac{1}{(2\pi)^{N/2}\det\big(C_{k,0}^{-1}+I(\hat{\theta}_{k,\mathrm{ML}})\big)^{-1/2}}
\int_{\theta\in\Theta}\exp\Big\{-\frac{1}{2}\Big\|\theta-\big(C_{k,0}^{-1}+I(\hat{\theta}_{k,\mathrm{ML}})\big)^{-1}
\big(I(\hat{\theta}_{k,\mathrm{ML}})\hat{\theta}_{k,\mathrm{ML}}+C_{k,0}^{-1}\theta_{k,0}\big)\Big\|^{2}_{C_{k,0}^{-1}+I(\hat{\theta}_{k,\mathrm{ML}})}\Big\}\,d\theta,
\end{aligned}
\]

where the notation ‖x‖²_Σ = x^T Σ x, and in the last step we used the fact that the integral of a Gaussian distribution over the entire space equals one.

Substituting the likelihood factorization (8) into (25) and applying Lemma 3, we obtain

\[
\begin{aligned}
p(x^i) &= \sum_{k=0}^{K} p(H_k)\int_{\theta\in\Theta}p(\theta)\,p(x^i \mid \theta,H_k)\,d\theta \\
&= \sum_{k=0}^{K} p(H_k)\left[\frac{1}{(2\pi)^{M/2}\det(R_v)^{1/2}}\right]^{i}
\exp\Big\{-\frac{1}{2}\sum_{t=1}^{i}\|x_t\|^{2}_{R_v^{-1}}\Big\}
\cdot \exp\Big\{\frac{1}{2}\big\|\hat{\theta}_{k,\mathrm{ML}}\big\|^{2}_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big\} \\
&\qquad\cdot \exp\Big\{\frac{1}{2}\big\|I(\hat{\theta}_{k,\mathrm{ML}})\hat{\theta}_{k,\mathrm{ML}}+C_{k,0}^{-1}\theta_{k,0}\big\|^{2}_{(C_{k,0}^{-1}+I(\hat{\theta}_{k,\mathrm{ML}}))^{-1}}\Big\}
\cdot \exp\Big\{-\frac{1}{2}\Big(\|\theta_{k,0}\|^{2}_{C_{k,0}^{-1}}+\big\|\hat{\theta}_{k,\mathrm{ML}}\big\|^{2}_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big)\Big\} \\
&\qquad\cdot \frac{1}{\det(C_{k,0})^{1/2}\,\det\big(C_{k,0}^{-1}+I(\hat{\theta}_{k,\mathrm{ML}})\big)^{1/2}}. \tag{30}
\end{aligned}
\]

REFERENCES

[1] A. Abur and A. Gomez-Exposito, Power System State Estimation: Theory and Implementation, Marcel Dekker, New York, 2004.
[2] Y. Zhao, A. Goldsmith, and H. V. Poor, "On PMU location selection for line outage detection in wide-area transmission networks," in Proc. IEEE Power and Energy Society General Meeting, San Diego, CA, July 2012, pp. 1–8.
[3] H. V. Poor, An Introduction to Signal Detection and Estimation, Springer-Verlag, New York, 1994.
[4] A. Goldsmith, Wireless Communications, Cambridge University Press, Cambridge, UK, 2005.
[5] D. Middleton and R. Esposito, "Simultaneous optimum detection and estimation of signals in noise," IEEE Trans. Inf. Theory, vol. 14, no. 3, pp. 434–444, May 1968.
[6] A. Fredriksen, D. Middleton, and V. VandeLinde, "Simultaneous signal detection and estimation under multiple hypotheses," IEEE Trans. Inf. Theory.