Digital Signal Processing: B. Dulek, S. Gezici
Article history: Available online 17 April 2012

Keywords: Measurement cost; Cramer–Rao bound (CRB); Parameter estimation; Gaussian noise

Abstract: Novel convex measurement cost minimization problems are proposed based on various estimation accuracy constraints for a linear system subject to additive Gaussian noise. Closed form solutions are obtained in the case of an invertible system matrix. In addition, the effects of system matrix uncertainty are studied both from a generic perspective and by employing a specific uncertainty model. The results are extended to the Bayesian estimation framework by treating the unknown parameters as Gaussian distributed random variables. Numerical examples are presented to discuss the theoretical results in detail.

© 2012 Elsevier Inc. All rights reserved.
1051-2004/$ – see front matter.
http://dx.doi.org/10.1016/j.dsp.2012.04.009
B. Dulek, S. Gezici / Digital Signal Processing 22 (2012) 828–840 829
individual power constraints are also imposed at each sensor. In [7], the distributed estimation of a deterministic parameter immersed in uncorrelated noise in a WSN is targeted under a total bit rate constraint. The number of active sensors is determined together with the quantization bit rate of each active sensor in order to minimize the MSE. The problem of estimating a spatially distributed, time-varying random field from noisy measurements collected by a WSN is investigated under bandwidth and energy constraints on the sensors in [8]. Using graph-theoretic techniques, it is shown that the energy consumption can be reduced by constructing reduced order Kalman–Bucy filters from only a subset of the sensors. In order to prevent degradation in the root-mean-squared (RMS) estimation error performance, efficient methods employing a Pareto optimality criterion between the communication costs and the RMS estimation error are presented. A power allocation problem for distributed parameter estimation is investigated under a total network power constraint for various topologies in [9]. It is shown that for the basic star topology, the optimal solution assumes either of the sensor selection, water-filling, or channel inversion forms depending on the measurement noise variance, and the corresponding analytical expressions are obtained. Asymptotically optimal power allocation strategies are derived for more complex branch, tree, and linear topologies assuming amplify-and-forward and estimate-and-forward transmission protocols. The decentralized WSN estimation is extended to incorporate the effects of imperfect data transmission from sensors to fusion center under stringent bandwidth constraints in [10].

Important results are also obtained for the sensor selection problem under various constraints on the system cost and estimation accuracy. The problem of choosing a set of k sensor measurements from a set of m available measurements so that the estimation error is minimized is addressed in [11] under a Gaussian assumption. It is shown that the combinatorial complexity of the solution can be significantly reduced without sacrificing much of the estimation accuracy by employing a heuristic based on convex optimization. In [12], a similar sensor selection problem is analyzed in a target detection framework when several classes of binary sensors with different discrimination performance and costs are available. Based on the conditional distributions of the observations at the fusion center, the performance of the corresponding optimal hypothesis tests is assessed using the symmetric Kullback–Leibler divergence. The solution of the resulting constrained maximization problem indicates that the sensor class with the best performance-to-cost ratio should be selected.

As outlined above, to the best of our knowledge, not much work has been performed in the context of jointly designing the measurement stage from a cost-oriented perspective while performing estimation up to a predetermined level of accuracy. In other words, the trade-offs between measurement-associated costs and estimation errors remain, to a large extent, unexplored in the literature. On the other hand, if adopted, such an approach will inevitably require a general and reliable method of assessing the cost of measurements applicable to any real-world phenomenon under consideration, as well as an appropriate means of evaluating the best achievable estimation performance without reference to any specific estimator structure. For the fulfillment of the first requirement, a novel measurement device model is suggested in [1], where the cost of each measurement is determined by the number of amplitude levels that can reliably be distinguished. As a consequence, higher resolution (less noisy) measurements demand higher costs, in accordance with the usual practice. Although the proposed model may fall short of capturing the exact relationship between the cost and the inner workings of any specific measurement hardware, it retains sufficient generality to remain useful under a multitude of circumstances. Based on this measurement model, an optimization problem is formulated in [13] in order to calculate the optimal costs of measurement devices that maximize the average Fisher information for a scalar parameter estimation problem.

Although the optimal cost allocation problem is studied for the single parameter estimation case in [13], and signal recovery based on linear minimum mean-squared-error (LMMSE) estimators is discussed under cost-constrained measurements using a linear system model in [1], no studies have analyzed the implications of the proposed measurement device model in a more general setting by considering both random and nonrandom parameter estimation under various estimation accuracy constraints and uncertainty in the linear system model. The main contributions of our study in this paper extend far beyond a multivariate analysis of the discussion in [13], and can be summarized as follows:

• Formulated new convex optimization problems for the minimization of the total measurement cost by employing constraints on various estimation accuracy criteria (i.e., different functionals of the eigenvalues of the Fisher information matrix (FIM)) assuming a linear system model¹ in the presence of Gaussian noise.
• Studied system matrix uncertainty both from a general perspective and by employing a specific uncertainty model.
• Obtained closed form solutions for two of the proposed convex optimization problems in the case of an invertible system matrix.
• Extended the results to the Bayesian estimation framework by treating the unknown estimated parameters as Gaussian distributed random variables.

In addition to the items listed above, simulation results are presented to discuss the theoretical results. Namely, we compare the performance of various estimation quality metrics through numerical examples using optimal and suboptimal cost allocation schemes, and simulate the effects of system matrix uncertainty. We also examine the behavior of the optimal solutions returned by various estimation accuracy criteria under scaling of the system noise variances, and identify the criterion most robust to variations in the average system noise power via numerical examples. The relationship between the number of effective measurements and the quality of estimation is also investigated under scaling of the system noise variances.

The rest of this paper is organized as follows: In Section 2, we pose the optimal cost allocation problem as a convex optimization problem under various information criteria for nonrandom parameter vector estimation. In Section 3, we modify the proposed optimization problems to handle the worst-case scenarios under system matrix uncertainty. Next, we take a specific but nevertheless practical uncertainty model, and discuss how the optimization problems are altered while preserving convexity. In Section 4, we focus on two optimization problems proposed in Section 2, and simplify them to obtain closed form solutions in the case of an invertible system matrix. In Section 5, we provide several numerical examples to illustrate the results presented in this paper. Extensions to Bayesian estimation with Gaussian priors are discussed in Section 6, and we conclude in Section 7.

2. Optimal cost allocation under estimation accuracy constraints

Consider a discrete-time system model as in Fig. 1, in which noisy measurements are obtained at the output of a linear system, and then the measurements are processed to estimate the value of a nonrandom parameter vector θ.

¹ Such linear models have a multitude of application areas, a few examples of which are channel equalization, wave propagation, compressed sensing, and Wiener filtering problems [14,15].
Fig. 1. Measurement and estimation systems model block diagram for a linear system with additive noise.
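The measurement chain depicted in Fig. 1 can be simulated directly. Below is a minimal sketch (the dimensions, matrix entries, parameter values, and noise variances are all illustrative, not from the paper): it generates the observation x = Hᵀθ + n at the linear system output and the measured vector y = x + m at the measurement devices.

```python
import random

# Sketch of the measurement chain in Fig. 1 (illustrative numbers):
# x = H^T * theta + n at the linear system output, then y = x + m at the
# measurement devices, with L = 2 parameters and K = 3 devices.
random.seed(0)

L, K = 2, 3
H = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]             # L x K system matrix with full row rank L
theta = [1.0, -2.0]               # nonrandom parameter vector
sn2 = [0.5, 0.5, 0.5]             # system noise variances sigma_ni^2
sm2 = [0.1, 0.1, 0.1]             # measurement noise variances sigma_mi^2

# Observation x_i = sum_l H[l][i]*theta_l + n_i, then y_i = x_i + m_i.
x = [sum(H[l][i] * theta[l] for l in range(L)) + random.gauss(0.0, sn2[i] ** 0.5)
     for i in range(K)]
y = [x[i] + random.gauss(0.0, sm2[i] ** 0.5) for i in range(K)]
print(len(y))  # K noisy measurements enter the estimation stage
```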
The observation vector x at the output of the linear system can be represented by x = H^T θ + n, where θ ∈ R^L denotes the vector of parameters to estimate, n ∈ R^K is the inherent random system noise, and x ∈ R^K is the observation vector at the output of the linear system. The system noise n is assumed to be a Gaussian distributed random vector with zero mean and independent but not necessarily identically distributed components, i.e., n ∼ N(0, D_n), where D_n = diag{σ_{n1}^2, σ_{n2}^2, ..., σ_{nK}^2} is a diagonal covariance matrix, and 0 denotes the all-zeros vector of length K. We also assume that the number of observations is at least equal to the number of estimated parameters (i.e., K ≥ L) and that the system matrix H is an L × K matrix with full row rank L, so that the columns of H span R^L.

Noisy measurements of the observation vector x are made by K measurement devices at the output of the linear system, and then the measured values in vector y ∈ R^K are processed to estimate the parameter vector θ. It is assumed that each measurement device is capable of sensing the value of a scalar physical quantity with some resolution in amplitude according to the measurement model y_i = x_i + m_i, where m_i denotes the measurement noise associated with the ith measurement device. In other words, the measurement devices are modeled to introduce additive random measurement noise, which can be expressed as y = x + m. It is also reasonable to assume that the measurement noise vector m is independent of the inherent system noise n. In addition, the noise components introduced by the measurement devices (the elements of m) are assumed to be zero-mean independent Gaussian random variables with possibly distinct variances,² i.e., m ∼ N(0, D_m), where D_m is a diagonal covariance matrix given by D_m = diag{σ_{m1}^2, σ_{m2}^2, ..., σ_{mK}^2}. Based on the outputs of the measurement devices, the unknown parameter vector θ is estimated.

In practical scenarios, a major issue is the cost of performing measurements. The cost of a measurement device is primarily assessed by its resolution, more specifically by the number of amplitude levels that the device can reliably discriminate. Intuitively, as the accuracy of a measurement device increases, so does its cost. Therefore, it may not always be possible to make high resolution measurements with a limited budget. In a recent work [1], a novel measurement device model is proposed where the cost of each device is expressed quantitatively in terms of the number of amplitude levels that can be resolved reliably. In this model, the amplitude resolution of the measurement devices solely determines the cost of each measurement. The dynamic range or scaling of the input to the measurement device is assumed to have no effect on the cost as long as the number of resolvable levels stays the same. More explicitly, in [1], the cost associated with measuring the ith component of the observation vector x is given by C_i = 0.5 log2(1 + σ_{xi}^2/σ_{mi}^2), where σ_{xi}^2 denotes the variance of the ith component of the observation vector x (i.e., the variance of the input to the ith measurement device), and σ_{mi}^2 is the variance of the ith component of m (i.e., the variance of the noise introduced by the ith measurement device).³ Notice that σ_{xi}^2 = σ_{ni}^2, ∀i ∈ {1, 2, ..., K}, since θ is a deterministic parameter vector. Then, the overall cost of measuring all the components of the observation vector x is expressed as

C = Σ_{i=1}^{K} C_i = Σ_{i=1}^{K} (1/2) log2(1 + σ_{ni}^2 / σ_{mi}^2).  (1)

A closer look into (1) reveals that it is a nonnegative, monotonically decreasing and convex function of σ_{mi}^2 for all σ_{ni}^2 > 0 and σ_{mi}^2 > 0. It is also noted that a measurement device has a higher cost if it can perform measurements with a lower measurement variance (i.e., with higher accuracy). Such an approach brings great flexibility by enabling work with variable precision over the acquired measurements. After formulating the measurement device model as outlined above, our objective is to minimize the total cost of the measurement devices under a constraint on estimation accuracy. In other words, we are allowed to design the noise levels of the measurement devices such that the overall cost is minimized under a constraint on the minimum acceptable estimation performance.

In nonrandom parameter estimation problems, the Cramer–Rao bound (CRB) provides a lower bound on the mean-squared errors (MSEs) of unbiased estimators under some regularity conditions [16]. Specifically, the CRB on the estimation error for an arbitrary unbiased estimator θ̂(y) is expressed as

E{(θ̂ − θ)(θ̂ − θ)^T} ⪰ J^{-1}(y, θ) ≜ CRB,  (2)

where J(y, θ) is the Fisher information matrix (FIM) of the measurement y relative to the parameter vector θ, which is defined as

J(y, θ) ≜ ∫ (1/p_θ^y(y)) (∂p_θ^y(y)/∂θ) (∂p_θ^y(y)/∂θ)^T dy,  (3)

where ∂/∂θ denotes the gradient (i.e., a column vector of partial derivatives) with respect to the parameters θ_1, ..., θ_L. Or, equivalently, the elements of the FIM can be calculated from [16]

J_{ij} = −E_{y|θ}{∂² log p_θ^y(y) / (∂θ_i ∂θ_j)}.  (4)

The symbol ⪰ between nonnegative definite matrices in (2) represents the inequality with respect to the positive semidefinite matrix cone. Specifically, it indicates that the difference matrix obtained by subtracting the right-hand side of the inequality from the left-hand side is nonnegative definite. Assuming independent Gaussian distributions for n and m, it can be shown that the CRB is given as follows [17]:

CRB = J^{-1}(y, θ) = (H Cov^{-1}(n + m) H^T)^{-1}.  (5)
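As a quick check of the cost model in (1), the following sketch (with illustrative variances) evaluates the total cost for a coarse and a fine set of measurement devices; lowering σ_{mi}^2 raises the cost, as noted above.

```python
import math

# Total measurement cost from (1): C = sum_i 0.5*log2(1 + sigma_ni^2/sigma_mi^2).
# All variances below are illustrative.

def measurement_cost(sn2, sm2):
    return sum(0.5 * math.log2(1.0 + n / m) for n, m in zip(sn2, sm2))

sn2 = [1.0, 1.0, 1.0]
coarse = measurement_cost(sn2, [1.0, 1.0, 1.0])  # 3 * 0.5 * log2(2) = 1.5 bits
fine = measurement_cost(sn2, [0.1, 0.1, 0.1])    # higher accuracy, higher cost
print(coarse, fine > coarse)
```

The cost is measured in bits, consistent with the number-of-resolvable-levels interpretation of the model in [1].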
the maximum likelihood (ML) estimator (also the best linear unbiased estimator (BLUE) in this case), θ̂(y) = (HDH^T)^{-1} HDy, where the efficiency of the estimator follows from the linearity of the system and from the assumption of Gaussian distributions [16]. Specifically, the covariance matrix of the estimator equals the inverse of the FIM, i.e., Cov(θ̂(y)) = (HDH^T)^{-1}.

Remark. When non-Gaussian distributions are assumed, we can utilize the preceding observation to obtain an upper bound on the CRB. To see this, a few preliminaries are needed. First, the FIM of a random vector z with respect to a translation parameter is defined as follows [17]:

J(z) ≜ J(θ + z, θ) = ∫ (1/p_z(z)) (∂p_z(z)/∂z) (∂p_z(z)/∂z)^T dz,  (6)

where p_z(z) is the probability density function of z, which is independent of θ. A well-known property of the FIM under translation is J(z) ⪰ Cov^{-1}(z), with equality if and only if z is Gaussian [17]. Based on these preliminaries, for linear models in the form of Fig. 1 but with arbitrary probability distributions for n and m, it can be shown that J(y, θ) = H J(n + m) H^T, where J(n + m) indicates the FIM under a translation parameter of the random vector n + m [17]. In order to upper bound the CRB, it is first observed that J(n + m) ⪰ Cov^{-1}(n + m). Using the properties of nonnegative definite matrices, we have

CRB = J^{-1}(y, θ) = (H J(n + m) H^T)^{-1} ⪯ (H Cov^{-1}(n + m) H^T)^{-1},  (7)

which naturally indicates that the difference matrix obtained by subtracting the CRB from the covariance matrix of the linear estimator θ̂(y) must be nonnegative definite. Correspondingly, it is also possible to lower bound the CRB for independent random vectors n and m. To that aim, we can revert to the Fisher Information Inequality (FII) [18]. The FII states that J^{-1}(n + m) ⪰ J^{-1}(n) + J^{-1}(m), with equality if and only if n and m are Gaussian. Therefore,

CRB = J^{-1}(y, θ) ⪰ (H (J^{-1}(n) + J^{-1}(m))^{-1} H^T)^{-1}.  (8)

As a result, a lower bound on the CRB can also be obtained in terms of the FIMs under translation parameters (6) of random vectors n and m with arbitrary probability distributions. □

Returning to our case of independent Gaussian system noise and measurement noise, the CRB is equal to the covariance matrix (i.e., the estimation error covariance) of the ML estimator θ̂(y) = (HDH^T)^{-1} HDy, as mentioned in the paragraph following (5). Furthermore, when the system and measurement noise distributions are not restricted to be Gaussian, the covariance matrix of the linear estimator θ̂(y) can also be used as an upper bound on the CRB, as shown in (7). For this reason, in the following analysis we employ several performance metrics based on the CRB given in (5) in order to assess the quality of estimation. In other words, we propose measurement cost minimization formulations under various estimation accuracy constraints based on the CRB expression in (5). However, before that analysis, we first express the CRB in a form more familiar in the optimization theoretic sense,

CRB = J^{-1}(y, θ) = (Σ_{i=1}^{K} h_i h_i^T / (σ_{ni}^2 + σ_{mi}^2))^{-1},  (9)

and the corresponding ML estimator that achieves this bound becomes

θ̂(y) = (HDH^T)^{-1} HDy = (Σ_{i=1}^{K} h_i h_i^T / (σ_{ni}^2 + σ_{mi}^2))^{-1} Σ_{i=1}^{K} h_i y_i / (σ_{ni}^2 + σ_{mi}^2).  (10)

2.1. Average mean-squared error

The diagonal components of the CRB provide a lower bound on the MSE while estimating the components of the parameter θ. Specifically,

E_{y|θ}{||θ̂(y) − θ||_2^2} ≥ tr{J^{-1}(y, θ)},

where tr{·} denotes the trace operator [16]. In other words, the harmonic average of the eigenvalues of the FIM is taken as the performance metric. Based on this metric, the following measurement cost minimization problem is proposed:

min_{{σ_{mi}^2}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 + σ_{ni}^2 / σ_{mi}^2)
subject to  tr{(Σ_{i=1}^{K} h_i h_i^T / (σ_{ni}^2 + σ_{mi}^2))^{-1}} ≤ E,  (11)

where E denotes a constraint on the maximum allowable average estimation error. Due to the inevitable intrinsic system noise, the design criterion E must satisfy E > tr{(H D_n^{-1} H^T)^{-1}} = tr{(Σ_{i=1}^{K} h_i h_i^T / σ_{ni}^2)^{-1}}. Substituting μ_i ≜ 1/(σ_{ni}^2 + σ_{mi}^2), (11) becomes

max_{{μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  tr{(Σ_{i=1}^{K} μ_i h_i h_i^T)^{-1}} ≤ E.  (12)

It is noted that the objective function is smooth and concave for all μ_i ∈ [0, 1/σ_{ni}^2). Since the constraint is also a convex function of the μ_i's for μ_i ≥ 0, this is a convex optimization problem [19, Section 7.5.2]. Consequently, it can be efficiently solved in polynomial time using interior point methods, and numerical convergence is assured. It is also possible to express this optimization problem using linear matrix inequalities (LMIs) as follows:

max_{{z_j}_{j=1}^{L}, {μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  [ Σ_{i=1}^{K} μ_i h_i h_i^T   e_j ;  e_j^T   z_j ] ⪰ 0,  j = 1, ..., L,   Σ_{j=1}^{L} z_j ≤ E,  (13)

where e_j denotes the column vector of length L with a 1 in the jth coordinate and 0's elsewhere. Or equivalently,

max_{Z ∈ S^L, {μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  [ Z   I ;  I   Σ_{i=1}^{K} μ_i h_i h_i^T ] ⪰ 0,   tr(Z) ≤ E,  (14)

where S^L denotes the set of symmetric L × L matrices.
2.2. Shannon information

An alternative measure of the estimation accuracy considers the Shannon (mutual) information content between the unknown parameter vector θ and the measurement vector y. More explicitly, the interest is in placing a constraint on the log volume of the η-confidence ellipsoid, which is defined as the minimum ellipsoid that contains the estimation error with probability η [19, Section 7.5.2]. As shown in [11], the η-confidence ellipsoid is given by

ε_α = {z : z^T J(y, θ) z ≤ α},  (15)

where α = F_{χ²_K}^{-1}(η) is obtained from the cumulative distribution function of a chi-squared random variable with K degrees of freedom. Then, the log volume of the η-confidence ellipsoid is obtained as⁴

log vol(ε_α) = β − (1/2) log det(Σ_{i=1}^{K} h_i h_i^T / (σ_{ni}^2 + σ_{mi}^2)),  where β ≜ (n/2) log(απ) − log Γ(n/2 + 1),  (16)

with Γ denoting the Gamma function. Notice that the design criterion is related to the geometric mean of the eigenvalues of the FIM. Based on this metric, the following measurement cost optimization problem can be obtained:

⁴ We use 'log' without a subscript to denote the natural logarithm.

max_{{μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  log det(Σ_{i=1}^{K} μ_i h_i h_i^T) ≥ 2(β − S),  (17)

where μ_i is as defined in (12) and S is a constraint on the log volume of the η-confidence ellipsoid satisfying S > β − 0.5 log det(H D_n^{-1} H^T) = β − 0.5 log det(Σ_{i=1}^{K} h_i h_i^T / σ_{ni}^2). Since log det(Σ_{i=1}^{K} μ_i h_i h_i^T) is a smooth concave function of the μ_i's for μ_i ≥ 0, the resulting optimization problem is convex [19, Section 3.1.5]. The smoothness property of the problem is also very helpful for obtaining the solution via numerical methods.

By introducing a lower triangular nonsingular matrix L and utilizing the Cholesky decomposition of positive definite matrices, it is possible to rewrite the constraint in terms of a lower bound. To that aim, let Σ_{i=1}^{K} μ_i h_i h_i^T ⪰ L L^T. Then, the optimization problem can be expressed equivalently as

max_{L ∈ U^L, {μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  [ I   L^T ;  L   Σ_{i=1}^{K} μ_i h_i h_i^T ] ⪰ 0,   Σ_{i=1}^{L} log L_{i,i} ≥ β − S,  (18)

where U^L denotes the set of lower triangular nonsingular L × L square matrices, L_{i,i} represents the ith diagonal coefficient of L, and L is the dimension of L.

2.3. Worst-case error variance

When the primary concern shifts from accuracy requirements towards robust behavior, it may be more desirable to have a constraint on the worst-case variance of the estimation error, which is associated with the maximum (minimum) eigenvalue of the CRB (FIM) [11,20–22]. The corresponding optimization problem is stated as follows:

max_{{μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  λ_min{Σ_{i=1}^{K} μ_i h_i h_i^T} ≥ Λ,  (19)

where λ_min{·} represents the minimum eigenvalue of its argument, and Λ is a predetermined lower bound on the minimum eigenvalue of the FIM satisfying Λ < λ_min{H D_n^{-1} H^T} = λ_min{Σ_{i=1}^{K} h_i h_i^T / σ_{ni}^2}. Since the constraint can be represented in the form of an LMI, this problem can equivalently be expressed as

max_{{μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  Σ_{i=1}^{K} μ_i h_i h_i^T ⪰ ΛI,  (20)

where I is the L × L identity matrix. The resulting problem is also convex [19, Section 7.5.2].

2.4. Worst-case coordinate error variance

Another variation of the worst-case error criteria can be obtained by placing a constraint on the maximum error variance among all the individual estimator components, i.e., by restricting the largest diagonal entry of the CRB. Using this performance criterion, we have the following optimization problem:

max_{{μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  max_{j=1,...,L} [(Σ_{i=1}^{K} μ_i h_i h_i^T)^{-1}]_{j,j} ≤ ρ,  (21)

where ρ is a constraint on the maximum allowable diagonal entry of the CRB (estimation error covariance matrix) satisfying ρ > max_{j=1,...,L} [(H D_n^{-1} H^T)^{-1}]_{j,j} = max_{j=1,...,L} [(Σ_{i=1}^{K} h_i h_i^T / σ_{ni}^2)^{-1}]_{j,j}. This problem can equivalently be expressed as

max_{{μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  [ ρ   e_j^T ;  e_j   Σ_{i=1}^{K} μ_i h_i h_i^T ] ⪰ 0,  j = 1, ..., L,  (22)

where e_j denotes the column vector of length L with a 1 in the jth coordinate and 0's elsewhere. This is also a convex optimization problem [19, Section 7.5.2].

3. Extensions to cases with system matrix uncertainty – robust measurement

It may also be the case that there exists some uncertainty concerning the elements in the system matrix H [11]. Suppose that the system matrix H can take values from a given finite set H. In the robust measurement problem, we consider the optimization over the worst-case scenario. Specifically, we choose the matrix
from the family of system matrices H resulting in the worst estimation accuracy constraint, and perform the optimization accordingly. Recalling that the infimum (supremum) preserves concavity (convexity), it is possible to restate the measurement cost optimization problems given in Section 2 and still maintain convex optimization problems. The resulting optimization problems with respect to each criterion are expressed as follows.

3.1. Average mean-squared error

max_{{μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  sup_{H ∈ H} tr{(Σ_{i=1}^{K} μ_i h_i h_i^T)^{-1}} ≤ E,  (23)

or equivalently,

max_{Z ∈ S^L, {μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  [ Z   I ;  I   Σ_{i=1}^{K} μ_i h_i h_i^T ] ⪰ 0  for all H ∈ H,   tr(Z) ≤ E.  (24)

3.2. Shannon information

max_{{μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  inf_{H ∈ H} log det(Σ_{i=1}^{K} μ_i h_i h_i^T) ≥ 2(β − S),  (25)

or equivalently,

max_{L ∈ U^L, {μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  [ I   L^T ;  L   Σ_{i=1}^{K} μ_i h_i h_i^T ] ⪰ 0  for all H ∈ H,   Σ_{i=1}^{L} log L_{i,i} ≥ β − S.  (26)

3.3. Worst-case error variance

H is in general not finite, and the solutions of the above optimization problems require general techniques from semi-infinite convex optimization such as those explained in [23,24]. In the following, a specific uncertainty model is considered where it is possible to further simplify the optimization problems given in (26) and (27) by expressing the constraints as LMIs. To that aim, let H ∈ H = {H̄ + ∆ : ||∆^T||_2 ≤ δ}, where ||·||_2 denotes the spectral norm (i.e., the square root of the largest eigenvalue of the positive semidefinite matrix ∆∆^T). It is possible to express this constraint as an LMI, ∆^T∆ ⪯ δ²I. Suppose also that μ is defined as the diagonal matrix μ ≜ diag{μ_1, μ_2, ..., μ_K}, and W ≜ LL^T is a symmetric positive definite matrix. Then, the constraint in (26) can be expressed in terms of H̄ and ∆ as

W ⪯ H̄μH̄^T + H̄μ∆^T + ∆μH̄^T + ∆μ∆^T,  for all ∆^T∆ ⪯ δ²I.  (29)

Similarly, the constraint in (27) is given by

ΛI ⪯ H̄μH̄^T + H̄μ∆^T + ∆μH̄^T + ∆μ∆^T,  for all ∆^T∆ ⪯ δ²I.  (30)

In [25, Theorem 3.3], a necessary and sufficient condition is derived for quadratic matrix inequalities in the form of (29) and (30) to be true. In the light of this theorem, (29) holds if and only if there exists t ≥ 0 such that

[ H̄μH̄^T − W − tI   H̄μ ;  μH̄^T   μ + (t/δ²)I ] ⪰ 0,  (31)

and (30) holds if and only if there exists t ≥ 0 such that

[ H̄μH̄^T − (Λ + t)I   H̄μ ;  μH̄^T   μ + (t/δ²)I ] ⪰ 0.  (32)

Notice that (31) and (32) are both linear in μ, W and t. Hence, under this specific uncertainty model, we can express the optimization problem in (26) as

max_{t, W ∈ S_{++}^L, {μ_i}_{i=1}^{K}}  Σ_{i=1}^{K} (1/2) log2(1 − σ_{ni}^2 μ_i)
subject to  [ H̄μH̄^T − W − tI   H̄μ ;  μH̄^T   μ + (t/δ²)I ] ⪰ 0,
log det(W) ≥ 2(β − S),   t ≥ 0,  (33)

where S_{++}^L denotes the set of symmetric positive-definite L × L matrices.
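When the uncertainty set H is finite, the worst-case constraint in (23) reduces to a plain maximum over the candidate matrices. The sketch below (illustrative 2 × 2 candidates, μ values, and error budget; it does not treat the semi-infinite case discussed above) evaluates that maximum and checks feasibility.

```python
# Sketch: the worst-case constraint in (23) for a FINITE uncertainty set H.
# Each candidate system matrix contributes its own tr{J^{-1}}; the constraint
# must hold for the largest one. (Matrices, mu values, budget: illustrative.)

def trace_inv_fim(H, mu):
    """tr{(sum_i mu_i h_i h_i^T)^{-1}} for L = 2, with h_i = columns of H."""
    a = b = c = 0.0
    for i, m in enumerate(mu):
        h1, h2 = H[0][i], H[1][i]
        a += m * h1 * h1
        b += m * h1 * h2
        c += m * h2 * h2
    return (a + c) / (a * c - b * b)

H_set = [
    [[1.0, 0.0], [0.0, 1.0]],   # nominal system matrix
    [[0.8, 0.1], [0.1, 0.9]],   # perturbed candidate
]
mu = [1.0, 1.0]
worst = max(trace_inv_fim(H, mu) for H in H_set)
feasible = worst <= 3.5         # plays the role of "sup_H tr{...} <= E"
print(worst, feasible)
```

This illustrates why the sup preserves convexity here: for a finite set, the worst-case constraint is just the intersection of finitely many convex constraints, one per candidate matrix.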
arbitrary covariance matrix (possibly colored), i.e., n ∼ N (0, n ) From (37), it is noted that the constraint function is linear in σm2 i ’s,
with {σn21 , σn22 , . . . , σn2K } constituting the diagonal components of the objective function is convex, and both functions are contin-
n , and 0 denoting the all-zeros vector of length K as before. To uously differentiable which altogether indicate that Slater’s con-
that aim, assuming independent Gaussian distributions for n and dition holds. Therefore, Karush–Kuhn–Tucker (KKT) conditions are
m, and square H with full-rank (invertible), it is observed that necessary and sufficient for optimality. Then, the optimal measure-
− 1 ment noise variances can be calculated from
CRB = J−1 (y, θ ) = H Cov−1 (n + m)H T
−1 T −1
σn2i σn4i σn2i
= H Cov(n + m)H 2
σ =−
mi + +γ , (38)
−1 T
T 2 4 fi
= H n H−1 + H−1 D m H−1 , (35)
where γ > 0 is obtained by substituting (38) into the average MSE
K
where the first part of the CRB, (H−1 ) T n H−1 is a known quan- constraint, that is i =1 f i σmi = E − t.
2
tity, and the second part (H−1 ) T D H−1 will be subject to design
m Special case: When the inverse of the system matrix has nor-
while assessing the quality of the estimation. Similar to the previ- malized rows, i.e., f i = 1, and the components of the system
ous discussion, CRB can be achieved in this case by employing the noise are independent zero-mean Gaussian random variables,
K the
i =1 σmi =
2
corresponding linear unbiased estimator which turns out simply optimal measurement noise variances should satisfy
K
to be a multiplication of the measurement vector with the inverse E− σn2i . If identical system noise components are assumed
i =1
of the system matrix, i.e., θ̂(y) = (H−1 ) T y. Returning to two com- as well, i.e., σn2i = σn2 , i = 1, . . . , K , then the optimal solution re-
monly used performance metrics introduced in Section 2, we next
sults in σm2 i = σm2 , i = 1, . . . , K , where σm2 = E/ K − σn2 is obtained
examine the closed-form solutions of the corresponding cost min-
from the average MSE constraint. The corresponding optimal cost
imization problems.
is given by ( K /2) log2 (E/(E − K σn2 )). This is an increasing function
of K for fixed E. Furthermore, the derivatives of all orders with
4.1. Average mean-squared error
respect to K exist, and are positive for K < E/σn2 . Therefore, esti-
mating more parameters under an average error constraint based
Due to the CRB, it is known that the average MSE while esti-
on the CRB requires even more accurate measurement devices with
mating the components of the parameter θ is bounded from below
higher costs as long as K < E/σn2 is satisfied.
as
2
Ey|θ θ̂(y) − θ 2 tr J−1 (y, θ ) 4.2. Shannon information
T
T
= tr H−1 n H−1 + tr H−1 Dm H−1 , Another measure of estimation accuracy that results in a closed
form solution in the case of an invertible system matrix H is the

where the last equality follows from the linearity of the trace operator and the invertibility of H. Since (H^{-1})^T D_n H^{-1} is known, let t ≜ tr{(H^{-1})^T D_n H^{-1}}. When the aim is to minimize the measurement cost subject to a constraint on the lower bound for the average MSE (achievable in the case of Gaussian distributions), the optimization problem can be expressed similarly to (11) as follows:

min_{{σ_{m_i}^2}_{i=1}^K}  (1/2) Σ_{i=1}^K log_2(1 + σ_{n_i}^2 / σ_{m_i}^2)

subject to  tr{H^{-1} D_m (H^{-1})^T} ≤ E − t,   (36)

where E denotes a constraint on the overall average estimation error suggested by the CRB (achievable in this case), and t represents the unavoidable estimation error due to the intrinsic system noise n. Notice that, for consistency, the design parameter E should be selected as E > t.

From the independence of the measurement noise components, D_m = diag{σ_{m_1}^2, σ_{m_2}^2, ..., σ_{m_K}^2} is a diagonal covariance matrix with σ_{m_i}^2 > 0 for all i ∈ {1, 2, ..., K}. In view of this observation, it is possible to simplify the objective function further by defining F ≜ (H^{-1})^T = [f_1 f_2 ... f_K], where f_i represents the ith row of the inverse of the system matrix H. Let f̄_i ≜ ||f_i||_2^2 denote the square of the Euclidean norm of the vector f_i, that is, the sum of the squares of the elements in f_i. It is noted that f̄_i is always positive for invertible H, and is constant for fixed H. Then, the optimization problem in (36) can be expressed as follows:

Shannon information criterion. Using this metric as the constraint function, we are effectively restricting the log volume of the η-confidence ellipsoid to stay below a predetermined value S. Using arguments similar to those in Section 2.2 and the invertibility of H,

log det(H Cov^{-1}(n + m) H^T) = log(det H · det Cov^{-1}(n + m) · det H^T)
                               = 2 log |det H| − Σ_{i=1}^K log(σ_{n_i}^2 + σ_{m_i}^2),   (39)

where the second equality follows from the properties of the determinant and the logarithm, i.e., det H = det H^T, det(Cov^{-1}(n + m)) = 1/det(Cov(n + m)), and Cov(n + m) = D_n + D_m = diag{σ_{n_1}^2 + σ_{m_1}^2, σ_{n_2}^2 + σ_{m_2}^2, ..., σ_{n_K}^2 + σ_{m_K}^2} due to the Gaussian distributed independent system and measurement noises with independent components. Since the system matrix H is known, let α ≜ log |det H|. Under these conditions, the optimization problem in (17) can be stated as

min_{{σ_{m_i}^2}_{i=1}^K}  (1/2) Σ_{i=1}^K log_2(1 + σ_{n_i}^2 / σ_{m_i}^2)

subject to  Σ_{i=1}^K log(σ_{n_i}^2 + σ_{m_i}^2) ≤ 2(S + α − β),   (40)
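The determinant identity in (39) is easy to sanity-check numerically. A minimal NumPy sketch (the dimensions and variance ranges below are illustrative choices, not the paper's experimental values):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5  # number of observations (illustrative)

# Illustrative diagonal covariances: D_n (system noise) and D_m (measurement noise).
sn2 = rng.uniform(0.05, 1.0, K)   # sigma_{n_i}^2
sm2 = rng.uniform(0.10, 2.0, K)   # sigma_{m_i}^2
cov = np.diag(sn2 + sm2)          # Cov(n + m) = D_n + D_m

# Random square (hence almost surely invertible) system matrix H.
H = rng.standard_normal((K, K))

# Left-hand side of (39): log det(H Cov^{-1}(n + m) H^T).
sign, lhs = np.linalg.slogdet(H @ np.linalg.inv(cov) @ H.T)

# Right-hand side of (39): 2 log|det H| - sum_i log(sigma_{n_i}^2 + sigma_{m_i}^2).
rhs = 2.0 * np.linalg.slogdet(H)[1] - np.sum(np.log(sn2 + sm2))
```

Because Cov(n + m) is diagonal, the right-hand side reduces to K scalar logarithms, which is what makes the constraint in (40) separable across the measurement devices.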
is subtracted from {σ_m^2 ⪰ 0}. Since the global minimum of the unconstrained objective function is achieved for σ_m^2 = ∞, which is not contained in the set C, and the objective function is convex, it is concluded that the minimum of the objective function has to occur at the boundary; i.e., Σ_{i=1}^K log(σ_{n_i}^2 + σ_{m_i}^2) = 2(S + α − β) must be satisfied [26]. Therefore, we can take the constraint in (40) as an equality. This is a standard optimization problem that can be solved using Lagrange multipliers. Hence, by defining Δ ≜ 2(S + α − β), we can write the Lagrangian as

J(σ_{m_1}^2, ..., σ_{m_K}^2) = (1/2) Σ_{i=1}^K log_2(1 + σ_{n_i}^2/σ_{m_i}^2) + λ (Σ_{i=1}^K log(σ_{n_i}^2 + σ_{m_i}^2) − Δ),   (41)

and differentiating with respect to σ_{m_i}^2, we obtain the following assignment of the noise variances to the measurement devices:

σ_{m_i}^2 = (γ^{1/K} − 1) σ_{n_i}^2,  where  γ = e^Δ / Π_{j=1}^K σ_{n_j}^2.   (42)

For consistency, the design parameter S should be selected such that Δ = 2(S + α − β) > Σ_{i=1}^K log σ_{n_i}^2, since the intrinsic system noise puts a lower bound on the minimum attainable volume of the confidence ellipsoid. Some properties of the obtained solution can be summarized as follows:

• For given Δ, K and σ_{n_i}^2's, the minimum achievable cost is (K/2) log_2(γ^{1/K}/(γ^{1/K} − 1)), where γ is computed as in (42).
• For a fixed value of K (the available number of observations), relaxing the constraint on the volume of the η-confidence ellipsoid (increasing the value of Δ) results in smaller measurement device costs, with a limiting value of 0, as expected.
• If the observation variances are equal, that is, σ_{n_i}^2 = σ_n^2, i = 1, ..., K, then employing identical measurement devices for all the observations, that is, σ_{m_i}^2 = σ_m^2, i = 1, ..., K, is the optimal strategy. From (42), the optimal value of the measurement noise variances is calculated as σ_{m,opt}^2 = e^{Δ/K} − σ_n^2, and the corresponding minimum total measurement cost is given by Δ/(2 ln 2) − (K/2) log_2(e^{Δ/K} − σ_n^2), which is an increasing function of K for Δ > K log σ_n^2. Intuitively, this result indicates that estimating more parameters under a fixed constraint on the volume of the ellipsoid containing the estimation errors requires a higher total measurement device cost.

5. Numerical results

In this section, we present an example that illustrates several theoretical results developed in the previous section. To that aim, a discrete-time linear system as depicted in Fig. 1 is considered:

y = H^T θ + n + m,   (43)

where θ is a length-20 vector containing the unknown parameters to be estimated, H is a 20 × 100 system matrix with full row rank, and the intrinsic system noise n and the measurement noise m are length-100 Gaussian distributed random vectors with independent components. The entries of the system matrix H are generated from a process of i.i.d. uniform random variables on the interval [−0.1, 0.1]. Also, the components of the system noise vector n are independently Gaussian distributed with zero mean, and it is assumed that their variances come from a uniform distribution defined on the interval [0.05, 1]. The implication of this assumption is that the observations at the output of the linear system possess uniformly varying degrees of accuracy. In other words, it is assured that observations corrupted by weak, moderate and strong levels of Gaussian noise are available in similar proportions for the estimation stage. In the following, we look into the problem of optimally assigning costs to measurement devices under various estimation accuracy constraints when the variances of the intrinsic system noise components are uniformly distributed as explained above. Note that our results obtained in the previous section are still valid for Gaussian system noise processes with arbitrary diagonal covariance matrices (i.e., the nonzero components of the diagonal covariance matrix need not be uniformly distributed as in this example). In obtaining the optimal solutions for the convex optimization problems stated above, the fmincon method from MATLAB's Optimization Toolbox and the CVX software [27] are used.

5.1. Performance of various estimation quality metrics under perfect system state information

First, we investigate the cost assignment problem under perfect information on the system matrix and the intrinsic noise variances. Recall that four different performance constraints are proposed for that purpose in Section 2. In the following four experiments, we analyze the behavior of the total measurement cost while each constraint metric is varied between its extreme values. The total cost is measured in bits by taking logarithms with respect to base 2. The constraint metric is expressed as the ratio of its current value to the value it attains in the limiting case when zero measurement noise variances are assumed. As an example, for the average mean-squared-error criterion, the total measurement cost C will be tabulated versus E / tr{(H D_n^{-1} H^T)^{-1}}.

In addition to the optimal cost allocation scheme proposed in this paper, we also consider two suboptimal cost allocation strategies:

• Equal cost to all measurement devices: In this strategy, it is assumed that a single set of measurement devices with identical costs is employed for all observations, so that C_i = C, i = 1, 2, ..., K. This, in turn, implies that the ratio of the measurement noise variance to the intrinsic system noise variance, x ≜ σ_{m_i}^2/σ_{n_i}^2, is constant for all measurement devices. Then, the total cost can be expressed in terms of x as C = 0.5 K log_2(1 + 1/x), and similarly the FIM becomes J(y, θ) = H D_n^{-1} H^T/(x + 1) = (1/(x + 1)) Σ_{i=1}^K h_i h_i^T/σ_{n_i}^2. Using this observation, the constraint functions provided for the different performance metrics in the optimization problems (11), (17), (19), and (21) can be solved algebraically for equality to determine the value of x without applying any convex optimization techniques, and the corresponding measurement variances and cost assignments can be obtained.
• Equal measurement noise variances: In this case, the measurement devices are assumed to introduce random errors with equal noise variances, that is, σ_{m_i}^2 = σ_m^2, i = 1, 2, ..., K. In other words, all observations are assumed to be corrupted by identical noise processes, and the best measurement noise variance value that minimizes the overall measurement cost while satisfying the estimation accuracy constraint is selected. Accordingly, the objective function in the proposed optimization problems simplifies to C = 0.5 Σ_{i=1}^K log_2(1 + σ_{n_i}^2/σ_m^2), and the FIM employed in the constraint functions takes the form J(y, θ) = Σ_{i=1}^K h_i h_i^T/(σ_{n_i}^2 + σ_m^2). By substituting these expressions into the various optimization approaches provided in Section 2, these problems can be solved rapidly over the single parameter σ_m^2 using the tools of convex analysis, and the optimal cost allocations can be obtained for the case of equal measurement noise variances.
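For the equal-cost strategy, the algebraic solution for x is immediate under the average MSE constraint: the scaled FIM multiplies the CRB trace by (1 + x), so meeting the constraint with equality gives x = E / tr{(H D_n^{-1} H^T)^{-1}} − 1. A short sketch under assumed values (the normalized constraint ratio is set to 2 purely for illustration; the dimensions follow the example in (43)):

```python
import numpy as np

rng = np.random.default_rng(1)
p, K = 20, 100  # 20 unknown parameters, 100 observations, as in (43)

# System matrix with i.i.d. uniform entries on [-0.1, 0.1] and intrinsic
# noise variances drawn from U[0.05, 1], as described above.
H = rng.uniform(-0.1, 0.1, (p, K))
sn2 = rng.uniform(0.05, 1.0, K)

# t0 = tr{(H D_n^{-1} H^T)^{-1}}: the CRB trace with zero measurement noise.
t0 = np.trace(np.linalg.inv(H @ np.diag(1.0 / sn2) @ H.T))

# Average MSE constraint E, specified through the normalized ratio E / t0 = 2.
E = 2.0 * t0

# Equal-cost strategy: sigma_{m_i}^2 = x * sigma_{n_i}^2 scales the CRB trace
# by (1 + x), so the constraint holds with equality at:
x = E / t0 - 1.0

# Total measurement cost in bits: C = 0.5 K log2(1 + 1/x).
C = 0.5 * K * np.log2(1.0 + 1.0 / x)
```

The resulting C can then be compared against the optimal allocation obtained from the convex programs in Section 2, which by construction never costs more for the same constraint value.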
Fig. 5. Total cost versus normalized worst-case coordinate error variance constraint.

Fig. 6. The performance of various optimal cost allocation strategies under scaling of the system noise variances. All costs are equal for c = 0.5.

Fig. 7. The performance of various optimal cost allocation strategies under scaling of the system noise variances. All costs are equal for c = 1.

Fig. 8. Number of effective measurements under the scaling of the system noise variances for various estimation accuracy metrics.
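The closed-form allocation in (42) can also be verified numerically: the variances it produces saturate the log-volume constraint in (40), and the attained cost matches the closed-form minimum (K/2) log_2(γ^{1/K}/(γ^{1/K} − 1)) stated earlier. A minimal sketch (K and the variance draws are illustrative; Δ is chosen to exceed Σ_i log σ_{n_i}^2, as required for consistency):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 16
sn2 = rng.uniform(0.05, 1.0, K)   # intrinsic noise variances sigma_{n_i}^2

# Constraint level Delta = 2(S + alpha - beta); consistency requires
# Delta > sum_i log sigma_{n_i}^2, so add a positive margin.
delta = np.sum(np.log(sn2)) + 3.0

# Closed-form solution (42): sigma_{m_i}^2 = (gamma^{1/K} - 1) sigma_{n_i}^2.
gamma = np.exp(delta) / np.prod(sn2)
sm2 = (gamma ** (1.0 / K) - 1.0) * sn2

# The log-volume constraint (40) is met with equality ...
constraint = np.sum(np.log(sn2 + sm2))

# ... and the attained cost agrees with the closed-form minimum.
cost = 0.5 * np.sum(np.log2(1.0 + sn2 / sm2))
cost_min = 0.5 * K * np.log2(gamma ** (1.0 / K) / (gamma ** (1.0 / K) - 1.0))
```

Note from (42) that the optimal ratio σ_{m_i}^2/σ_{n_i}^2 = γ^{1/K} − 1 is identical for every device, so under this particular constraint the optimal allocation coincides with the equal-cost strategy of Section 5.1.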
C = Σ_{i=1}^K C_i = Σ_{i=1}^K (1/2) log_2(1 + σ_{x_i}^2/σ_{m_i}^2),   (47)

where σ_{x_i}^2 is the ith diagonal entry of the observation covariance matrix Cov(x) = H^T Σ_θ H + D_n, with Σ_θ denoting the covariance matrix of the Gaussian prior on θ.

Based on these expressions, all the proposed cost minimization formulations in Section 2 can be modified accordingly to obtain the optimal cost assignment strategies in the presence of prior information. Specifically, the CRB is replaced with the BCRB, and the cost function stated in (47) is substituted as the objective function inside the optimization problems given in (14), (18), (20), and (22). However, the modified optimization problems are not necessarily convex. It is also noted that the problem formulation constructed by employing the LMMSE estimator in [1] is equivalent to the dual of the Bayesian estimation case under the average MSE criterion given in (11) when Gaussian priors are assumed.

7. Conclusion

In this paper, we have studied the measurement cost minimization problem for a linear system in the presence of Gaussian noise, based on the measurement device model introduced in [1]. By considering the nonrandom parameter estimation case, novel convex optimization problems have been obtained under various

References

Prentice Hall, Upper Saddle River, NJ, 1993.
[15] M. Hayes, Statistical Digital Signal Processing and Modeling, John Wiley & Sons, 1996.
[16] H.L.V. Trees, Detection, Estimation, and Modulation Theory: Part I, 2nd ed., John Wiley & Sons, New York, NY, 2001.
[17] R. Zamir, A proof of the Fisher information inequality via a data processing argument, IEEE Trans. Inform. Theory 44 (1998) 1246–1250.
[18] A. Dembo, T.M. Cover, J.A. Thomas, Information theoretic inequalities, IEEE Trans. Inform. Theory 37 (1991) 1501–1518.
[19] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
[20] Y. Eldar, A. Ben-Tal, A. Nemirovski, Robust mean-squared error estimation in the presence of model uncertainties, IEEE Trans. Signal Process. 53 (2005) 168–181.
[21] Z. Ben-Haim, Y.C. Eldar, Maximum set estimators with bounded estimation error, IEEE Trans. Signal Process. 53 (2005) 3172–3182.
[22] A. Das, D. Kempe, Sensor selection for minimizing worst-case prediction error, in: Int. Conf. Inform. Process. Sensor Networks (IPSN'08), pp. 97–108.
[23] R. Hettich, K. Kortanek, Semi-infinite programming: Theory, methods, and applications, SIAM Rev. 35 (1993) 380–429.
[24] A. Mutapcic, S. Boyd, Cutting-set methods for robust convex optimization with pessimizing oracles, Optim. Methods Softw. 24 (2009) 381–406.
[25] Z.-Q. Luo, J.F. Sturm, S. Zhang, Multivariate nonnegative quadratic mappings, SIAM J. Optim. 14 (2002) 1140–1162.
[26] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1968.
[27] M. Grant, S. Boyd, CVX: Matlab software for disciplined convex programming, version 1.21, http://cvxr.com/cvx, 2011.
Berkan Dulek received the B.S. and M.S. degrees with high honors in electrical engineering from Bilkent University, Turkey, in 2003 and 2006, respectively. He is currently studying toward the Ph.D. degree at Bilkent University. His research interests are in statistical signal processing and communications with emphasis on stochastic signaling, randomized detection and estimation under cost constraints.

Sinan Gezici received the B.S. degree from Bilkent University, Turkey in 2001, and the Ph.D. degree in electrical engineering from Princeton University in 2006. From 2006 to 2007, he worked at Mitsubishi Electric Research Laboratories, Cambridge, MA. Since February 2007, he has been an Assistant Professor in the Department of Electrical and Electronics Engineering at Bilkent University. Dr. Gezici's research interests are in the areas of detection and estimation theory, wireless communications, and localization systems. Among his publications in these areas is the book Ultra-wideband Positioning Systems: Theoretical Limits, Ranging Algorithms, and Protocols (Cambridge University Press, 2008).