Sparse Signal Processing
1 Introduction
There are many applications in signal processing and communication systems where the discrete signals are sparse in some domain such as time, frequency, or space; i.e., most of the samples are zero, or alternatively, their transforms in another domain (normally called frequency coefficients) are sparse (see Figures 1 and 2). There are trivial sparse transformations where the sparsity is preserved in both the time and frequency domains; the identity transform matrix and its permutations are extreme examples. Wavelet transformations that preserve the local characteristics of a sparse signal can be regarded as almost sparse in the frequency domain; in general, for sparse signals, the more similar the transformation is to the identity, the sparser the resulting transform-domain representation.
The fields covered in this review include (1) sampling; (2) coding: Galois- and real-field error correction codes [6]; (3) spectral estimation [7-10]; (4) array processing: multi-source location (MSL) and direction of arrival (DOA) estimation [11,12], sparse array processing [13], and sensor networks [14]; (5) sparse component analysis (SCA): blind source separation [15-17] and dictionary representation [18-20]; and (6) channel estimation in Orthogonal Frequency Division Multiplexing (OFDM) [21-23]. The sparsity properties of these fields are summarized in Tables 1, 2, and 3. The details of most of the major applications will be discussed in the next sections, but the common traits are discussed in this introduction.
The columns of Table 1 consist of (0) category, (1) topics, (2) sparsity domain, (3) type of sparsity, (4) information domain, (5) type of sampling in the information domain, (6) minimum sampling rate, (7) conventional reconstruction methods, and (8) applications. The first rows (2-7) of column 1 are on sampling techniques. The 8th and 9th rows are related to channel coding, row 10 is on spectral estimation, and rows 11-13 are related to array processing. Rows 14-15 correspond to SCA and, finally, row 16 covers multicarrier channel estimation, which is a rather new topic. As shown in column 2 of the table, depending on the topic, sparsity is defined in the time, space, or frequency domains. In some applications, the
sparsity is defined as the number of polynomial coefficients (which, in a way, could be regarded as frequency), the number of sources (which may depend on location or time sparsity of the signal sources), or the number of words (signal bases) in a dictionary. The type of sparsity is shown in column 3; for sampling schemes, it is usually lowpass, bandpass, or multiband [24], while for compressed sensing, and most other applications, it is random. Column 4 represents the information domain, where the order of sparsity, locations, and amplitudes can be determined by proper sampling (column 5) in this domain. Column 7 lists traditional reconstruction methods; however, for each area, any of the reconstruction methods can be used. The other columns are self-explanatory and will be discussed in more detail in the following sections.
Rows 2-4 of Table 1 are related to the sampling (uniform or random) of signals that are bandlimited in the Fourier domain. Band-limitedness is a special case of sparsity where the nonzero coefficients in the frequency domain are consecutive. A better assumption in the frequency domain is to have random sparsity [25-27], as shown in row 5 and column 3. A generalization of sparsity in the frequency domain is sparsity in any transform domain such as the Discrete Cosine and Wavelet Transforms (DCT and DWT); this concept is further generalized in CS (row 6), where samples are taken as linear combinations of time-domain samples [2,28-30]. Sampling of signals with finite rate of innovation (row 7) is related to piecewise smooth (polynomial-based) signals. The positions of discontinuities are determined by annihilating filters, which are equivalent to error locator polynomials in error correction codes and Prony's method [10], as discussed in Sections 4 and 5, respectively.
Random errors in a Galois field (row 8) and additive impulsive noise in real-field error correction codes (row 9) are sparse disturbances that need to be detected and removed. For erasure channels, the impulsive noise can be regarded as the negative of the missing sample value [31]; thus the missing sampling problem, which can also be regarded as a special case of nonuniform sampling, is also a special case of the error correction problem. A subclass of impulsive noise for 2-D signals is salt-and-pepper noise [32]. The information domain, where the sampling process occurs, is called the syndrome, which is usually in a transform domain. Spectral estimation (row 10) is the dual of error correction codes, i.e., the sparsity is in the frequency domain. MSL (row 11) and multi-target detection in radars are similar to spectral estimation since targets act as sparse spatial monotones; each target is mapped to a specific spatial frequency determined by its line-of-sight direction relative to the receiver. The techniques developed for this branch of science are unique, with examples such as MUSIC [7], Prony [8], and Pisarenko [9]. We shall see that the techniques used in real-field error correction codes can also be applied to these problems.
Table 1 Various topics and applications with sparsity properties: the sparsity, which may be in the time/space or frequency domains, consists of unknown samples/coefficients that need to be determined. The information domain consists of known samples/coefficients in the frequency or time/space domain (the complement of the sparse domain). A list of acronyms is given in Table 2 at the end of this section; a list of common notations is presented in Table 3. For the definition of ESPRIT (row 11, column 7), see the footnote on page 41.
In summary, the table records the following. For the sampling rows, the sparsity domain is frequency with lowpass, bandpass, or (unions of disjoint) multiband support, the information domain is time/space with uniform, jittered/periodic, or random sampling, the minimum number of required samples is on the order of 2·BW − 1 (lowpass), 2·BW (multiband), or c·k·log(n/k) (compressed sensing), and the conventional reconstruction methods are lowpass filtering/interpolation, filter banks/spline interpolation, iterative methods with adaptive thresholding (RDE/ELP), and basis/matching pursuit, with applications in A/D conversion, data compression, MRI/CT, FM/PPM, radar, and missing-sample recovery. For sampling with finite rate of innovation, the sparsity is in time and in polynomial coefficients, the minimum number of samples is #(coefficients) + 1 + 2·#(discontinuity epochs), the method is the annihilating filter (ELP), and applications include ECG, OCT, and UWB. For channel coding, the sparsity (random errors in a Galois field, or impulsive noise in the real field) is in time, the information domain is the syndrome with uniform or random sampling, the minimum number of samples is 2·#(errors) or 2·#(impulsive-noise samples), and the methods are Berlekamp-Massey/Viterbi/belief propagation and adaptive thresholding (RDE/ELP), with applications in digital communication and fault-tolerant systems. For spectral estimation, the sparsity is random in frequency, the information domain is the time autocorrelation with uniform sampling, the minimum number of samples is 2·#(tones), and the methods are MUSIC, Pisarenko, Prony, and MDL, with military/radar applications. For array processing (MSL/DOA estimation, sparse array beamforming, and sensor networks), the sparsity is in space, the methods include MDL + MUSIC/ESPRIT and optimization (LP, simulated annealing, genetic algorithms), and applications include radar, sonar, ultrasound, seismic, and meteorological sensing. For SCA (BSS and SDR), the sparsity lies in the active sources or the dictionary words, the information domain is a linear mixture of time/space samples, the minimum number of samples is 2·#(active sources) or 2·#(sparse words), and the methods are ℓ1/ℓ2 minimization and SL0, with biomedical and data compression applications. For multipath channel estimation, the sparsity is in the time domain, the information domain is frequency or time with uniform or nonuniform sampling, the minimum number of samples is 2·#(sparse channel components), the method is MIMAT, and the application is channel equalization in OFDM.
Table 2 List of acronyms
AIC: Akaike Information Criterion
AR: Auto-Regressive
ARMA: Auto-Regressive Moving Average
BSS: Blind Source Separation
BW: BandWidth
CAD: Computer Aided Design
CFAR: Constant False Alarm Rate
CG: Conjugate Gradient
CS: Compressed Sensing
CT: Computer Tomography
DAB: Digital Audio Broadcasting
DC: Direct Current
DCT: Discrete Cosine Transform
DFT: Discrete Fourier Transform
DHT: Discrete Hartley Transform
DOA: Direction Of Arrival
DST: Discrete Sine Transform
DT: Discrete Transform
DVB: Digital Video Broadcasting
DWT: Discrete Wavelet Transform
EEG: ElectroEncephaloGraphy
ELP: Error Locator Polynomial
ESPRIT: Estimation of Signal Parameters via Rotational Invariance Techniques
FDTD: Finite-Difference Time-Domain
FETD: Finite-Element Time-Domain
FPE: Final Prediction Error
GA: Genetic Algorithm
GPSR: Gradient Projection Sparse Reconstruction
HNQ: Hannan and Quinn method
ICA: Independent Component Analysis
IDE: Iterative Detection and Estimation
IDT: Inverse Discrete Transform
IMAT: Iterative Method with Adaptive Thresholding
ISTA: Iterative Shrinkage-Thresholding Algorithm
KLT: Karhunen-Loeve Transform
ℓ1/ℓ2: Absolute-sum/Euclidean norms
LDPC: Low Density Parity Check
LP: Linear Programming
MA: Moving Average
MAP: Maximum A Posteriori probability
MDL: Minimum Description Length
MIMAT: Modified IMAT
ML: Maximum Likelihood
MMSE: Minimum Mean Squared Error
MSL: Multi-Source Location
MUSIC: MUltiple SIgnal Classification
NP: Non-Polynomial time
OCT: Optical Coherence Tomography
OFDM: Orthogonal Frequency Division Multiplexing
OFDMA: Orthogonal Frequency Division Multiple Access
OMP: Orthogonal Matching Pursuit
OSR: OverSampling Ratio
PCA: Principal Component Analysis
PHD: Pisarenko Harmonic Decomposition
POCS: Projection Onto Convex Sets
PPM: Pulse-Position Modulation
RDE: Recursive Detection and Estimation
RIP: Restricted Isometry Property
RS: Reed-Solomon
RV: Residual Variance
SA: Simulated Annealing
SCA: Sparse Component Analysis
SDCT: Sorted DCT
SDFT: Sorted DFT
SDR: Sparse Dictionary Representation
SER: Symbol Error Rate
SI: Shift Invariant
SL0: Smoothed ℓ0-norm
SNR: Signal-to-Noise Ratio
ULA: Uniform Linear Array
UWB: Ultra Wide Band
WIMAX: Worldwide Interoperability for Microwave Access
WLAN: Wireless Local Area Network
WMAN: Wireless Metropolitan Area Network
ICA can work if the sources are also sparse; for this special case, ICA analysis is synonymous with SCA. Finally, channel estimation is shown in row 16. In mobile communication systems, multipath reflections create a channel that can be modeled by a sparse FIR filter. For proper decoding of the incoming data, the channel characteristics should be estimated before they can be equalized. For this purpose, a training sequence is inserted within the main data, which enables the receiver to obtain the output of the channel by exploiting this training sequence. The channel estimation problem becomes a deconvolution problem under noisy environments. The sparsity of the channel greatly improves the channel estimation; this is where algorithms for the extraction of a sparse signal can be employed [21,22,35].
When sparsity is random, further signal processing is needed. In this case, there are three items to be considered: (1) evaluating the number of sparse coefficients (or samples), (2) finding the positions of the sparse coefficients, and (3) determining the values of these coefficients. In some applications, only the first two items are needed, e.g., in spectral estimation. However, in almost all the other cases mentioned in Table 1, all three items must be determined. Various types of linear programming (LP) and some iterative algorithms, such as the iterative method with adaptive thresholding (IMAT), determine the number, positions, and values of sparse samples at the same time. On the other hand, the minimum description length (MDL) method, used in DOA/MSL and spectral estimation, determines only the number of sparse source locations or frequencies. In the subsequent sections, we shall describe, in more detail, each algorithm for the various areas and applications based on Table 1.
Finally, it should be mentioned that the signal model for each topic or application may be deterministic or stochastic. For example, in the sampling category, for rows 2-4 and 7 the signal model is typically deterministic, although stochastic models could also be envisioned [36]. On the other hand, for random sampling and CS (rows 5-6), the signal model is stochastic, although deterministic models may also be envisioned [37]. In channel coding and estimation (rows 8-9 and 16), the signal model is normally deterministic. For spectral and DOA estimation (rows 10-11), stochastic models are assumed, whereas for array beamforming (row 12), deterministic models are used. In sensor networks (row 13), both deterministic and stochastic signal models are employed. Finally, in SCA (rows 14-15), statistical independence of the sources may be necessary and thus stochastic models are applied.
Table 3 Common notations: k denotes the order of sparsity, s the original vector, x the observed vector, and ν the noise vector. The basic underdetermined measurement model is

x_{m×1} = A_{m×n} · s_{n×1}   (1)
Since m < n, the vector s_{n×1} cannot be uniquely recovered from the measurement vector x_{m×1}; however, among the infinite number of solutions to (1), the sparsest solution may be unique. For instance, if no 2k columns of A_{m×n} are linearly dependent, the null space of A_{m×n} does not include any 2k-sparse vector (with at most 2k nonzero elements), and therefore the measurement vectors x_{m×1} of different k-sparse vectors are different. Thus, if s_{n×1} is sparse enough (k-sparse), the sparsest solution of (1) is unique and coincides with s_{n×1}; i.e., we have perfect recovery. Unfortunately, there are two obstacles here: (1) the vector x_{m×1} often includes an additive noise term, and (2) finding the sparsest solution of a linear system is in general an NP-hard problem. Since in the rest of the article we frequently deal with the problem of reconstructing the sparsest solution of (1), we first review some of the important reconstruction methods in this section.
The basis pursuit approach relaxes the sparsity objective to the ℓ1-norm:

min ‖s‖₁ subject to A·s = x   (2)

where the ℓ0 measure of sparsity is the number of nonzero entries,

‖x‖₀ = #{ i : x_i ≠ 0 }   (3)

A related unconstrained formulation minimizes the cost function

J(s) = ½·‖x − A·s‖₂² + λ·‖s‖₁   (4)

Note that J(s) is almost the Lagrangian form of the constrained problem in (2), where the Lagrange multiplier is defined as 1/(2λ), with the difference that in (4) the minimization is performed exclusively over s and not over λ. Thus, the outcome of (4) coincides with that of (2) for a suitable choice of λ.
Table 5 Relation between LP and basis pursuit (the notation for LP is from [43]); under this correspondence the LP dimension is 2p, the cost vector is (1, . . . , 1)_{1×m}, the variables satisfy u_j, v_j ≥ 0 for all j, and the basis pursuit solution is s = u − v.

Now, by assuming that all the vectors and matrices are real, it is easy to check that the minimizer of the following cost function F corresponds to the minimizer of J(s):

F(z) = cᵀ·z + ½·zᵀ·B·z   subject to z ≥ 0   (5)

where

z = [u; v],   c = λ·1_{2n×1} + [−Aᵀx; Aᵀx],   B = [AᵀA, −AᵀA; −AᵀA, AᵀA]   (6)
In iterative shrinkage methods, the idea is to replace J(·) with a surrogate cost function whose minimizer is closer to the minimizer of J(·) than s^(0). This technique is useful only when finding the minimizer of the surrogate is easier than solving the original problem. In ISTA [45], at the ith iteration, given the estimate s^(i), the following alternative cost function is used:

J_i(s) = J(s) + (α/2)·‖s − s^(i)‖₂² − ½·‖A·(s − s^(i))‖₂²   (8)

where α is a scalar larger than all squared singular values of A, to ensure (7). By modifying the constant terms and rewriting the above cost function, one can check that the minimizer of J_i(·) is essentially the same as

arg min_s  (α/2)·‖s − z^(i)‖₂² + λ·‖s‖₁   (9)

where

z^(i) = s^(i) + (1/α)·Aᴴ·(x − A·s^(i))   (10)

Since (9) is separable in the entries of s, its minimizer is obtained by applying the soft shrinkage-threshold operator to each entry of z^(i):

S_{λ,α}(z) = { z − λ/α,  if z > λ/α
            { 0,         if |z| ≤ λ/α   (11)
            { z + λ/α,   if z < −λ/α

Table 6 Gradient-projection iteration for (5) (steps 3 and 4)
3. Set z^(i+1) = ( z^(i) − α^(i)·∇F(z^(i)) )₊.
4. Check the termination criterion. If neither the maximum number of iterations has passed nor a given stopping condition is fulfilled, increase i and return to the 2nd step.
FOCUSS (FOCal Underdetermined System Solver) is a nonparametric algorithm that consists of two parts [46]. It starts by finding a low-resolution estimate of the sparse signal, and then prunes this solution to a sparser signal representation through several iterations. The solution at each iteration step is found by taking the pseudo-inverse of a modified weighted matrix, defined by (AW)⁺ = (AW)ᴴ·(AW·(AW)ᴴ)⁻¹, and this process is iterated.
Table 7 ISTA algorithm
1. Choose the scalar α larger than all squared singular values of A and set i = 0. Also initialize s^(0), e.g., s^(0) = A⁺x.
2. Set z^(i) = s^(i) + (1/α)·Aᴴ·(x − A·s^(i)).
3. Apply the shrinkage-threshold operator defined in (11): s_j^(i+1) = S_{λ,α}(z_j^(i)), 1 ≤ j ≤ n.
4. Check the termination criterion. If neither the maximum number of iterations has passed nor a given stopping condition is fulfilled, increase i and return to the 2nd step.
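To make Table 7 concrete, here is a minimal NumPy sketch of the ISTA iteration (our own illustration, not the authors' code); the toy problem sizes and λ are arbitrary, A is real here (so Aᴴ = Aᵀ), and α is set just above the largest squared singular value of A as required.

```python
import numpy as np

def ista(A, x, lam, n_iter=200):
    # step 1: alpha above the largest squared singular value of A; s(0) = A+ x
    alpha = 1.01 * np.linalg.norm(A, 2) ** 2
    s = np.linalg.pinv(A) @ x
    for _ in range(n_iter):
        z = s + (A.T @ (x - A @ s)) / alpha                        # step 2, eq. (10)
        s = np.sign(z) * np.maximum(np.abs(z) - lam / alpha, 0.0)  # step 3, eq. (11)
    return s

# toy problem: recover a 5-sparse vector from 40 random measurements
rng = np.random.default_rng(0)
n, m, k = 100, 40, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
s_true = np.zeros(n)
s_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x = A @ s_true
print(np.linalg.norm(ista(A, x, lam=1e-3) - s_true))  # small recovery error
```

As λ → 0 the iteration approaches the constrained problem (2); larger λ trades data fidelity for sparsity.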
As discussed earlier, the criterion for sparsity is the ℓ0-norm; thus our minimization is

min ‖s‖₀ subject to A·s = x   (13)
In the IDE method, detection and estimation alternate: at iteration l, the indices whose residual correlations fall below a threshold ε^(l),

I^(l) = { 1 ≤ i ≤ m : | a_iᵀ·x − Σ_{j≠i} s_j^(l)·a_iᵀ·a_j | < ε^(l) }

are detected as inactive, and the active coefficients are then re-estimated. In SL0, the discontinuous ℓ0 measure is approximated by a smooth Gaussian function f_σ(s) = e^{−s²/(2σ²)}, for which

lim_{σ→0} f_σ(s) = { 1,  if s = 0
                   { 0,  if s ≠ 0   (15)

so that n − Σ_j f_σ(s_j) approaches ‖s‖₀ as σ → 0. For a decreasing sequence σ_1 > · · · > σ_K, the following steps are repeated for each σ_i:
(a) Let δ = [ s_1·e^{−s_1²/(2σ_i²)}, . . . , s_n·e^{−s_n²/(2σ_i²)} ]ᵀ.
(b) Set s ← s − μ·δ (where μ is a small positive constant).
(c) Project s back onto the feasible set S = {s : A·s = x}:

s ← s − Aᵀ·(A·Aᵀ)⁻¹·(A·s − x)   (14)

(d) Set s^(i) = s.
The final answer is s = s^(K).
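The steps above translate directly into code. The following is a minimal sketch of SL0 under the noiseless model As = x with A of full row rank; the annealing schedule for σ, the step size μ, and the inner iteration count are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def sl0(A, x, sigma_min=1e-4, decay=0.7, mu=2.0, inner=30):
    proj = A.T @ np.linalg.inv(A @ A.T)     # A^T (A A^T)^{-1}, used in step (c)
    s = proj @ x                            # minimum-l2-norm feasible start
    sigma = 2.0 * np.max(np.abs(s))
    while sigma > sigma_min:                # anneal sigma toward 0
        for _ in range(inner):
            delta = s * np.exp(-s**2 / (2 * sigma**2))   # step (a)
            s = s - mu * delta                           # step (b)
            s = s - proj @ (A @ s - x)                   # step (c): back onto As = x
        sigma *= decay
    return s

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 60))
s_true = np.zeros(60); s_true[[3, 17, 41]] = [1.0, -0.5, 2.0]
print(np.round(sl0(A, A @ s_true)[[3, 17, 41]], 3))      # close to the true values
```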
Figure 3 Performance (SNR in dB) of various reconstruction methods (SL0, OMP, FOCUSS, IDE, and LP) with respect to the noise standard deviation σ when n = 1,000, m = 400, and k = 100.
Figure 4 Computational time (complexity) versus the number of sources for m = 0.4n and k = 0.1n.
Figure 5 Performance comparison of some reconstruction techniques for DFT sparse signals.
Figure 6 Performance comparison of some reconstruction techniques for sparse random transformations.
Figure 7 Block diagram of the iterative reconstruction method. The mask is an appropriate filter with coefficients of 1s and 0s, depending on the type of sparsity in the original signal.

For a random sparsity pattern, the number of sparse coefficients plays the same role as the bandwidth; thus, the law of algebra is equivalent to the Nyquist rate, i.e., twice the bandwidth (for discrete signals with DC components, it is twice the bandwidth minus one). The dual of frequency-sparsity is time-sparsity, which can happen in a burst or in a random fashion. The number of frequency coefficients needed follows the Nyquist criterion. This will be further discussed in Section 4 for sparse additive impulsive noise channels.
3.1 Sampling of sparse signals
If the sparsity locations of a signal are known in a transform domain, then the number of samples needed in the time (space) domain is at least equal to the number of sparse coefficients, i.e., the so-called Nyquist rate. However, depending on the type of sparsity (lowpass, bandpass, or random) and the type of sampling (uniform, periodic nonuniform, or random), the reconstruction may be unstable and the corresponding reconstruction matrix may be ill-conditioned [51,52]. Thus, in many applications discussed in Table 1, the sampling rate in column 6 is higher than the minimum (Nyquist) rate.
When the location of sparsity is not known, by the law of algebra, the number of samples needed to specify the sparsity is at least twice the number of sparse coefficients. Again, for stability reasons, the actual sampling rate is higher than this minimum figure [1,50]. To guarantee stability, instead of direct sampling of the signal, a combination of the samples can be used. Donoho has recently shown that if we take linear combinations of the samples, the minimum stable sampling rate is of the order O(k log(n/k)), where n and k are the frame size and the sparsity order, respectively [29].
3.1.1 Reconstruction algorithms

Figure 8 SNR improvement vs. the number of iterations for a random sampling set at the Nyquist rate (OSR = 1) for a bandpass signal; the curves correspond to the simple, Chebyshev-accelerated, and conjugate gradient iterative methods.
Figure 8 shows the SNR improvement in dB versus the number of iterations for a random sampling set for a bandpass signal. In this figure, besides the standard iterative method, accelerated iterations such as the Chebyshev and conjugate gradient methods are also used (see [72] for the algorithms). Iterative methods are quite robust against quantization and additive noise. In fact, we can prove that the iterative methods approach the pseudo-inverse (least squares) solution in a noisy environment, especially when the matrix is ill-conditioned [50].
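As an illustration of the simple (non-accelerated) iteration behind Figures 7 and 8, the sketch below recovers a lowpass signal from a random subset of its samples by alternating between the lowpass mask and re-imposing the known samples (a Papoulis-Gerchberg-style loop); the signal construction and parameters are our own.

```python
import numpy as np

def iterative_recover(y, known, bw, n_iter=300):
    n = len(y)
    x = np.where(known, y, 0.0)            # unknown samples initialized to zero
    for _ in range(n_iter):
        X = np.fft.fft(x)
        X[bw + 1 : n - bw] = 0.0           # mask: keep only the lowpass band
        x = np.fft.ifft(X).real
        x[known] = y[known]                # re-impose the known samples
    return x

rng = np.random.default_rng(2)
n, bw = 256, 16                            # 2*bw + 1 = 33 nonzero DFT coefficients
X = np.zeros(n, dtype=complex)
X[1 : bw + 1] = rng.standard_normal(bw) + 1j * rng.standard_normal(bw)
X[n - bw :] = np.conj(X[1 : bw + 1][::-1]) # Hermitian symmetry -> real signal
sig = np.fft.ifft(X).real
known = rng.random(n) < 0.5                # keep roughly half the samples
rec = iterative_recover(sig, known, bw)
print(np.linalg.norm(rec - sig) / np.linalg.norm(sig))   # small relative error
```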
Figure 9 The IMAT for detecting the number, locations, and values of the sparse samples.
3.2 Compressed sensing (CS)
The signal is assumed to be sparse in a transform domain, x_{n×1} = Ψ_{n×n}·s_{n×1}, where s is an n×1 vector that has at most k nonzero elements (a k-sparse vector). In practical cases, s has at most k significant elements and the insignificant elements are set to zero, which means s is an almost k-sparse vector. For example, x can be the pixels of an image and Ψ can be the corresponding IDCT matrix. In this case, most of the DCT coefficients are insignificant, and if they are set to zero, the quality of the image will not degrade significantly. In fact, this is the main concept behind some of the lossy compression methods such as JPEG. Since the inverse transform on x yields s, the vector s can be used instead of x and can be succinctly represented by the locations and values of its nonzero elements. Although this method efficiently compresses x, it initially requires all the samples of x to produce s, which undermines the whole purpose of CS.
Now let us assume that, instead of samples of x, we take m linear combinations of the samples (called generalized samples). If we represent these linear combinations by the matrix Φ_{m×n} and the resultant vector of samples by y_{m×1}, we have

y_{m×1} = Φ_{m×n}·x_{n×1} = Φ_{m×n}·Ψ_{n×n}·s_{n×1}   (16)
Figure 10 SNR vs. the no. of iterations for sparse signal recovery
using the IMAT (Table 12).
Let ε denote the probability that a k-sparse vector (drawn according to a given probability density function) fails to be reconstructed from its generalized samples. If the probability of failure can be made arbitrarily small, then the sampling scheme (the joint pair Φ, Ψ) is successful in recovering x with probability 1 − ε, i.e., with high probability.
Let us assume that Φ^(m) represents the submatrix formed by m random (uniform) rows of an orthonormal matrix Φ_{n×n}. It is apparent that if we use {Φ^(m)}_{m=0}^{n} as the sampling matrices for a given sparsity domain, the failure probabilities for Φ^(0) and Φ^(n) are, respectively, one and zero, and as the index m increases, the failure probability decreases. The important point shown in [81] is that the decrease of the failure probability is exponential with respect to m/k. Therefore, we expect to reach an almost zero failure probability much earlier than m = n, although the exact rate highly depends on the mutual behavior of the two matrices Φ, Ψ. More precisely, it is shown in [81] that

P_failure < n · e^{ −c·m / (μ²(Φ,Ψ)·k) }   (18)

where c is a positive constant and μ(Φ, Ψ) is the maximum coherence,

μ(Φ, Ψ) = max_{a,b} | ⟨ψ_a, φ_b⟩ |   (19)

where ψ_a and φ_b are the ath column and the bth row of the matrices Ψ and Φ, respectively. The above result implies that the probability of reconstruction is close to one for

m ≥ μ²(Φ, Ψ) · k · (ln n) / c   (20)

This derivation implies that the smaller the maximum coherence between the two matrices, the lower the number of required samples. Thus, to decrease the number of samples, we should look for matrices with low coherence with Ψ. For this purpose, we can use a random Φ. It is shown that the coherence of a random matrix with i.i.d. Gaussian distribution with any unitary Ψ is considerably small [29], which makes it a proper candidate for the sampling matrix. Investigation of the probability distribution has shown that the Gaussian PDF is not the only choice (for example, binary Bernoulli and other distributions are considered in [83]), but it may be the simplest to analyze.
For the case of a random Φ with i.i.d. Gaussian distribution (or more general distributions for which the concentration inequality holds [83]), a stronger inequality compared with (20) is valid; it implies that, for reconstruction with a probability of almost one, the following condition on the number of samples m suffices [2,79]:

m ≥ c · k · log(n/k)   (21)

Notice that the required number of samples given in (20) is for random sampling of an orthonormal basis, while (21) represents the required number of samples with an i.i.d. Gaussian sampling matrix. Typically, the number in (21) is less than that of (20).
3.2.2 Reconstruction from compressed measurements
Geometric methods. The oldest methods for reconstruction from compressed sampling are geometric, i.e., ℓ1-minimization techniques for finding a k-sparse vector s ∈ R^n from a set of m = O(k log n) measurements (the y_i's); see, e.g., [29,81,84-86]. Let us assume that we have applied a suitable Φ which guarantees the invertibility of the sampling process. The reconstruction method should then recover a k-sparse vector s_{n×1} from the observed samples y_{m×1} = Φ_{m×n}·Ψ_{n×n}·s_{n×1}, or possibly from y_{m×1} = Φ_{m×n}·Ψ_{n×n}·s_{n×1} + ν_{m×1}, where ν denotes the noise vector. Suitability of Φ implies that s_{n×1} is the only k-sparse vector that produces the observed samples; therefore, s_{n×1} is also the sparsest solution of y = ΦΨs. Consequently, s can be found using

minimize ‖s‖₀ subject to y = ΦΨ·s   (22)

Since the ℓ0-minimization in (22) is computationally intractable, it is relaxed to

minimize ‖s‖₁ subject to y = ΦΨ·s   (23)

The interesting part is that the number of samples required to replace ℓ0- with ℓ1-minimization has the same order of magnitude as the one needed for the invertibility of the sampling scheme. Hence, s can be derived from (22) using ℓ1-minimization. It is worthwhile to mention that replacement of the ℓ1-norm with the ℓ2-norm, which is faster to implement, does not necessarily produce reasonable solutions. However, there are greedy methods (such as Matching Pursuit, as discussed in Section 7 on SCA [40,88]) which iteratively approach the best solution and compete with the ℓ1-norm optimization (equivalent to the Basis Pursuit methods, also discussed in Section 7 on SCA).
Figure 11 (recoverable sparsity ratio k/m versus the sampling ratio m/n).
is bounded by C·σ_k(s) for some constant C independent of s, where σ_k(s) is the error of the best k-sparse approximation of s. An implication of stability is that small perturbations in the signal caused by noise result in small distortions in the output solution. The previous result means that if s is not k-sparse, then the solution ŝ is close to the k-sparse vector s_o^k that has the k largest components of s. In particular, if s is k-sparse, then s_o^k = s. This stability property is different from the so-called robustness, which is another important characteristic that we wish to have in any reconstruction algorithm. Specifically, an algorithm is robust if small perturbations in the measurements are reflected in small errors in the reconstruction. Both stability and robustness are achieved by the ℓ1-minimization algorithms (after a slight modification of (22); see [83,91]). Although the two concepts of robustness and stability can be related, they are not the same.
In compressed sensing, the degree of stability and robustness of the reconstruction is determined by the characteristics of the sampling matrix Φ. We say that the matrix Φ has RIP of order k when, for all k-sparse vectors s, we have [30,76]:

(1 − δ_k)·‖s‖₂² ≤ ‖Φ·s‖₂² ≤ (1 + δ_k)·‖s‖₂²   (24)

where 0 ≤ δ_k < 1 (the isometry constant). The RIP is a sufficient condition that provides us with the maximum and minimum power of the samples with respect to the input power and ensures that none of the k-sparse inputs falls in the null space of the sampling matrix. The RIP property essentially states that every k columns of the matrix Φ_{m×n} must be almost orthonormal (these submatrices preserve the norm to within the constants 1 ± δ_k). The explicit construction of a matrix with such a property is difficult for any given n, k and m ≈ k log n; however, the problem has been studied in some cases [37,92]. Moreover, given such a matrix Φ, the evaluation of s (or alternatively x) via the minimization problem involves numerical methods (e.g., linear programming, GPSR, SPGL1, FPC [44,93]) for n variables and m constraints, which can be computationally expensive.
However, probabilistic methods can be used to construct m × n matrices satisfying the RIP property for given n, k and m ≈ k log n. This can be achieved using Gaussian random matrices. If Φ is a sample of a Gaussian random matrix with the number of rows satisfying (20), ΦΨ is also a sample of a Gaussian random matrix with the same number of rows, and thus it satisfies RIP with high probability. Using matrices with the appropriate RIP property in the ℓ1-minimization, we guarantee exact recovery of k-sparse signals in a way that is stable and robust against additive noise.
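The isometry constant δ_k cannot be verified exactly for large matrices (the check is combinatorial over all supports), but a Monte Carlo sketch over random k-column submatrices gives an empirical lower bound on δ_k in (24) for a Gaussian Φ; the sizes below are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k, trials = 400, 120, 8, 1000
Phi = rng.standard_normal((m, n)) / np.sqrt(m)     # so that E||Phi s||^2 = ||s||^2
lo, hi = 1.0, 1.0
for _ in range(trials):
    cols = rng.choice(n, k, replace=False)         # a random k-sparse support
    sv = np.linalg.svd(Phi[:, cols], compute_uv=False)
    lo, hi = min(lo, sv[-1] ** 2), max(hi, sv[0] ** 2)
# squared singular values of sampled submatrices bracket 1 -/+ delta_k in (24)
print(f"empirical bounds: [{lo:.3f}, {hi:.3f}], delta_k >= {max(1 - lo, hi - 1):.3f}")
```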
Without loss of generality, assume that Ψ is equal to the identity matrix I, and that instead of Φs we measure the noisy samples y = Φs + ν with ‖ν‖₂ ≤ ε. The reconstruction is obtained by

minimize ‖s‖₁ subject to ‖y − Φ·s‖₂ ≤ ε   (25)

and, for matrices satisfying RIP, the solution ŝ obeys

‖ŝ − s‖₂ ≤ C·ε   (26)

This shows that small perturbations in the measurements cause small perturbations in the output of the ℓ1-minimization method (robustness).
3.3 Sampling of signals with finite rate of innovation
A signal with a finite rate of innovation can be modeled as

x(t) = Σ_{i∈Z} Σ_{r=1}^{R} c_{i,r}·φ_r( (t − t_i) / T_s )   (30)

where the {φ_r(t)} are known functions and the free parameters are the coefficients c_{i,r} and the time instants t_i. The simplest case is a stream of k Diracs,

x(t) = Σ_{i=1}^{k} c_i·δ(t − t_i)   (31)

Filtering x(t) with a sampling kernel φ(t/T_s) and sampling at the instants t = j·T_s yields

y[j] = ( x(t) ∗ φ(t/T_s) ) |_{t = j·T_s} = Σ_{i=1}^{k} c_i·φ(t_i − j)   (34)

For suitable kernels, proper weights κ_{r,j} applied to the samples reproduce the power moments of the innovation instants:

τ_r = Σ_{j∈Z} κ_{r,j}·y[j] = Σ_{i=1}^{k} c_i·Σ_{j∈Z} κ_{r,j}·φ(t_i − j) = Σ_{i=1}^{k} c_i·t_i^r   (35)

The moment sequence {τ_r} then satisfies the annihilating recursion

τ_{r+k} = − Σ_{i=1}^{k} h_i·τ_{r+i−1}   (36)

where the h_i are the coefficients of a polynomial H(z) whose roots reveal the time instants t_i.
In order to find the time instants t_i, we find the polynomial H(z) (or its coefficients h_i) and look for its roots. A recursive relation for the τ_r becomes

[ τ_1    τ_2     . . .  τ_k      ] [ h_1 ]     [ τ_{k+1} ]
[ τ_2    τ_3     . . .  τ_{k+1}  ] [ h_2 ]     [ τ_{k+2} ]
[  ⋮      ⋮       ⋱       ⋮      ] [  ⋮  ] = − [    ⋮    ]   (37)
[ τ_k   τ_{k+1}  . . .  τ_{2k−1} ] [ h_k ]     [ τ_{2k}  ]

By solving the above linear system of equations, we obtain the coefficients h_i (for a discussion on the invertibility of the matrix on the left side, see [102,104]) and, consequently, by finding the roots of H(z), the time instants are revealed. It should be mentioned that the choice of τ_1, . . . , τ_{2k} in (37) can be replaced with any 2k consecutive terms of {τ_i}. After determining {t_i}, (35) becomes a linear system of equations with respect to the values {c_i}, which can easily be solved.
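A small numerical sketch of this procedure (our own toy example): form the moments τ_r = Σ_i c_i t_i^r of (35), solve the Hankel system (37), and read the instants off the roots of the resulting polynomial; the coefficient reversal below accounts for the ordering implied by (37).

```python
import numpy as np

def instants_from_moments(taus, k):
    # taus[j] holds tau_{j+1}; rows of (37): sum_i h_i tau_{r+i-1} = -tau_{r+k}
    T = np.array([[taus[r + i] for i in range(k)] for r in range(k)])
    h = np.linalg.solve(T, -taus[k : 2 * k])
    # monic annihilating polynomial z^k + h_k z^{k-1} + ... + h_1; its roots are t_i
    return np.roots(np.concatenate(([1.0], h[::-1])))

t_true = np.array([0.3, 0.7, 0.9])        # innovation instants
c = np.array([1.0, -2.0, 0.5])            # amplitudes
k = len(t_true)
taus = np.array([np.sum(c * t_true**r) for r in range(1, 2 * k + 1)])
print(np.sort(instants_from_moments(taus, k).real))   # ~ [0.3, 0.7, 0.9]
```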
This reconstruction method can be used for other types of signals satisfying (30), such as signals represented by piecewise polynomials [102] (for large enough n, the nth derivative of these signals becomes a stream of delta functions). An important issue in nonlinear reconstruction is noise analysis; for the purpose of denoising and performance under additive noise, the reader is encouraged to see [27].
A nice application of sampling theory and the concept of sparsity is error correction codes for real and complex numbers [105]. In the next section, we shall see that similar methods can be employed for decoding block and convolutional codes.
Iterative reconstruction for an erasure channel is identical to the missing-sampling problem [115] discussed in Section 3.1.1 and therefore will not be discussed here. Let us assume that we have a finite discrete signal x_orig[i], where i = 1, . . . , l. The DFT of this sequence yields l complex coefficients in the frequency domain (X_orig[j], j = 1, . . . , l). If we insert p consecutive zeros to get n = l + p samples (X[j], j = 1, . . . , n) and take its inverse DFT, we end up with an oversampled version of the original signal with n complex samples.

Figure 12 Convolutional encoders. (a) A real-field systematic convolutional encoder of rate 1/2; the f[i]'s are the taps of an FIR filter. (b) A non-systematic convolutional encoder of rate 1/2; the f1[i]'s and f2[i]'s are the taps of two FIR filters.
E[r] = − (1/h_k) · Σ_{t=1}^{k} E[r + t] · h_{k−t}   (38)
The simulation results for the ELP decoding implementation for n = 32, p = 16, and k = 16 erasures (a burst of 16 consecutive missing samples from position 1 to 16) are shown in Figure 13; this figure shows that we can have perfect reconstruction up to the capacity of the code (up to the finite computer precision, which is above 320 dB; this is also true for Figures 14 and 15). By capacity we mean the maximum number of erasures that a code is capable of correcting.
Since consecutive sample losses represent the worst case [59,116], the proposed method works better for random samples. In practice, the error recovery capability of this technique degrades with an increase of the block and/or burst size due to the accumulation of round-off errors. In order to reduce the round-off error, instead of the DFT, a transform based on the SDFT or the Sorted DCT (SDCT) can be used [1,59,116]. These types of transformations act as an interleaver to break down the bursty erasures.
4.1.2 Simulation results for random impulsive noise channels
There are several methods to determine the number, locations, and values of the impulsive noise samples, namely the modified Berlekamp-Massey algorithm for real fields [118,119], ELP, IMAT, and constant false alarm rate with recursive detection estimation (CFAR-RDE). The Berlekamp-Massey method for real numbers is sensitive to noise and will not be discussed here [118]. The other methods are discussed below.
Figure 14 Simulation results of a convolutional decoder, using the iterative method with the generator matrix, after 30 CG iterations (see
[72]); SNR versus the relative rate of erasures (w.r.t. full capacity) in an erasure channel.
The input signal is taken from a uniform random distribution of size 50, and the simulations are run 1,000 times and then averaged. The following subsections describe the simulation results for erasure and impulsive noise channels.

Figure 15 Simulation results using the IMAT method for detecting the locations and amplitudes of the impulsive noise, λ = 1.9.
G = [ f1[1]    0          0     · · ·
      f2[1]    0          0     · · ·
      f1[2]   f1[1]       0     · · ·
      f2[2]   f2[1]       0     · · ·
        ⋮       ⋮                ⋱
      f1[n]   f1[n−1]           · · ·
      f2[n]   f2[n−1]           · · ·
        0     f1[n]             · · ·
        0     f2[n]             · · ·
        ⋮       ⋮                ⋱  ]   (40)
An iterative decoding scheme for this matrix representation is similar to that of Figure 7, except that the operator G consists of the generator matrix, a mask (erasure operation), and the transpose of the generator matrix. If the rate of erasure does not exceed the encoder's full capacity, the matrix form of the operator G can be shown to be a nonnegative definite square matrix, and therefore its inverse exists [51,60].

Figure 16 The CFAR-RDE method with the use of adaptive soft thresholding and an iterative method for signal reconstruction.

Figure 17 Comparison of CFAR-RDE and a simple soft-decision RDE for DFT block codes.

Figure 14 shows that the SNR values gradually decrease as the rate of erasure reaches its maximum (capacity).
4.2.2 Decoding for impulsive noise channels
5 Spectral estimation
In this section, we review some of the methods used to evaluate the frequency content of data [7-10]. In the field of signal spectrum estimation, there are several methods that are appropriate for different types of signals: some are more suitable for estimating the spectrum of wideband signals, whereas others are better for the extraction of narrow-band components. Since our focus is on sparse signals, it is reasonable to assume sparsity in the frequency domain, i.e., we assume the signal to be a combination of several sinusoids plus white noise.
Conventional methods for spectrum analysis are nonparametric, in the sense that they do not assume any model (statistical or deterministic) for the data, except that the data are zero or periodic outside the observation interval. For example, the periodogram P_per(f) is a well-known nonparametric method that can be computed via the FFT algorithm:
P_per(f) = (1 / (m·T_s)) · | T_s · Σ_{r=0}^{m−1} x_r · e^{−j2π·f·r·T_s} |²   (43)
Figure 18 SNR (dB) versus the percentage of full capacity for rates 1, 2, 5, and 10.
where m is the number of observations, T_s is the sampling interval (usually assumed to be unity), and x_r is the signal. Although nonparametric methods are robust and have low computational complexity, they suffer from fundamental limitations. The most important is their resolution: closely spaced harmonics cannot be distinguished if their spacing is smaller than the inverse of the observation period.
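Since (43) is a scaled squared FFT magnitude, a periodogram sketch takes a few lines; the two test tones (chosen to fall on exact FFT bins) and the noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 500                                     # number of observations, Ts = 1
t = np.arange(m)
x = (np.sin(2 * np.pi * 0.12 * t) + 0.5 * np.sin(2 * np.pi * 0.30 * t)
     + 0.3 * rng.standard_normal(m))        # two tones in white noise
P = np.abs(np.fft.fft(x)) ** 2 / m          # P_per(f) of (43) at f = r/m
f = np.fft.fftfreq(m)
peaks = f[np.argsort(P)[-4:]]               # each real tone peaks at +/- f0
print(np.sort(np.abs(peaks)))               # ~ [0.12, 0.12, 0.30, 0.30]
```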
To overcome this resolution problem, parametric methods have been devised. By assuming a statistical model with some unknown parameters, we can increase the resolution by estimating the parameters from the data, at the cost of more computational complexity. Theoretically, in parametric methods, we can resolve closely spaced harmonics with limited data length if the SNR goes to infinity.
The noiseless signal is modeled as a sum of k complex exponentials,

x_r = Σ_{i=1}^{k} b_i·z_i^r   (44)

where

b_i = a_i·e^{jθ_i}   (45)

and a_i, θ_i, f_i (with z_i = e^{j2πf_i}) represent the amplitude, phase, and frequency (f_i is a complex number in general), respectively. Let us define the polynomial H(z) such that its roots represent the complex exponentials related to the sparse tones (see Section 3.3 on FRI, (38) on ELP, and Appendix 1):

H(z) = Π_{i=1}^{k} (z − z_i) = Σ_{i=0}^{k} h_i·z^{k−i}   (46)

By shifting the index of (44), multiplying by the parameter h_j, and summing over j, we get

Σ_{j=0}^{k} h_j·x_{r−j} = Σ_{i=1}^{k} b_i·z_i^{r−k}·Σ_{j=0}^{k} h_j·z_i^{k−j} = 0   (47)

since the inner sum is H(z_i) = 0. For observations contaminated by additive white noise,

y_r = x_r + η_r   (48)

the recursion (47) becomes

Σ_{i=0}^{k} h_i·y_{r−i} = Σ_{i=0}^{k} h_i·η_{r−i}   (49)

Collecting the terms into vectors,

y = [ y_r, . . . , y_{r−k} ]ᵀ   (50)
h = [ 1, h_1, . . . , h_k ]ᵀ   (51)
η = [ η_r, . . . , η_{r−k} ]ᵀ   (52)

the whiteness of the noise yields

E{ y·ηᴴ } = E{ (x + η)·ηᴴ } = E{ η·ηᴴ } = σ²·I   (53)

and, consequently, the autocorrelation matrix of the observations satisfies

R_yy·h = σ²·h   (54)

i.e., h is the eigenvector of R_yy associated with the eigenvalue σ² (the noise variance).
MUltiple SIgnal Classification (MUSIC) is a method originally devised for high-resolution source direction estimation in the context of array processing, which will be discussed in the next section [122]. The inherent equivalence of array processing and time-series analysis paves the way for employing this method in spectral estimation. MUSIC can be understood as a generalization and improvement of the Pisarenko method. It is known that, in the context of array processing, MUSIC can attain statistical efficiency in the limit of an asymptotically large number of observations [11].
In the PHD method, we construct an autocorrelation matrix of dimension k + 1 under the assumption that its smallest eigenvalue (σ²) belongs to the noise subspace. Then we use the Hermitian property of the covariance matrix to conclude that the noise eigenvector is orthogonal to the signal eigenvectors. In MUSIC, we extend this method using a noise subspace of dimension greater than one to improve the performance. We also use some kind of averaging over the noise eigenvectors to obtain a more reliable signal estimator.
Table 14 PHD algorithm
1. Given the model order k (number of sinusoids), find the autocorrelation matrix of the noisy observations with dimension k + 1 (R_yy).
2. Find the smallest eigenvalue (σ²) of R_yy and the corresponding eigenvector (h).
3. Set the elements of the obtained vector as the coefficients of the polynomial in (46). The roots of this polynomial are the estimated frequencies.
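A compact numerical rendering of Table 14 (our own illustration): estimate the (k+1)-dimensional autocorrelation matrix, take the eigenvector of its smallest eigenvalue, and root the polynomial; two real sinusoids correspond to k = 4 complex exponentials.

```python
import numpy as np

def phd_frequencies(y, k):
    m = len(y)
    r = np.array([y[: m - d] @ y[d:] / (m - d) for d in range(k + 1)])
    R = np.array([[r[abs(i - j)] for j in range(k + 1)] for i in range(k + 1)])
    w, V = np.linalg.eigh(R)            # eigenvalues in ascending order
    h = V[:, 0]                         # eigenvector of the smallest eigenvalue
    z = np.roots(h)                     # roots near the unit circle, e^{j 2 pi f}
    f = np.angle(z) / (2 * np.pi)
    return np.sort(f[f > 0])

rng = np.random.default_rng(6)
t = np.arange(4000)
y = (np.sin(2 * np.pi * 0.10 * t) + np.sin(2 * np.pi * 0.20 * t)
     + 0.05 * rng.standard_normal(t.size))
print(phd_frequencies(y, k=4))          # ~ [0.10, 0.20]
```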
In matrix form, the m noisy observations can be written as

y = A·b + η   (55)

where A is the m × k Vandermonde matrix of the sinusoidal components,

[A]_{p,q} = e^{j2π·f_q·p}   for 1 ≤ p ≤ m, 1 ≤ q ≤ k   (56)

and η represents the noise vector. Since the frequencies are different, A is of rank k, and the first term in (55) forms a k-dimensional signal subspace, while the second term is randomly distributed in both signal and noise subspaces; i.e., unlike the first term, it is not confined to a subspace of lower dimension. The correlation matrix of the observations is given by

R = A·b·bᴴ·Aᴴ + σ²·I   (57)

Let v_{k+1}, . . . , v_m denote the eigenvectors of R associated with its smallest (noise) eigenvalue σ²; the projection onto the noise subspace is

P_N = Σ_{i=k+1}^{m} v_i·v_iᴴ   (59)

and the frequencies are estimated as the peaks of the MUSIC pseudo-spectrum

S_MUSIC(f) = 1 / ( aᴴ(f)·P_N·a(f) )   (60)

where a(f) is the sinusoidal (steering) vector at frequency f.
Figure 19 A comparison of various spectral estimation methods for a sparse mixture of sinusoids (top panel) using the Prony, Pisarenko, MUSIC, and IMAT methods (in order of improved performance); the input SNR is 5 dB and 256 time samples are used. The horizontal axis is frequency (kHz).
Figure 20 Uniform linear array with element distance d, element length l, and a wave arriving from direction θ.
DOA estimation with a ULA resembles the spectral estimation problem, with the difference that sampling of the array elements is not limited in time. In fact, in array processing an additional degree of freedom (the number of elements) is present; thus, array processing is more general than spectral estimation.
The two main fields in array processing are MSL and DOA estimation, for estimating the source locations and directions, respectively. For both purposes, the angle of arrival (azimuth and elevation) should be estimated, while for MSL an extra parameter, the range, is also needed. The simplest case is the 1-D ULA (azimuth-only) for DOA estimation.
For the general case of k sources with angles θ_1, . . . , θ_k with respect to the array, the ULA response is given by the matrix A(θ) = [a(θ_1), . . . , a(θ_k)], where the vector of DOAs is defined as θ = [θ_1, . . . , θ_k]. In the above notation, A is a matrix of size n × k and the a(θ_i)'s are column vectors. Now, the vector of observations at the array elements (y[i]) is given by

y[i] = A·s[i] + ν[i]   (61)
Joint estimation of the number of sources and their parameters can outperform two-stage approaches with the cost of higher computational complexity. In the following, we will describe the Minimum Description Length (MDL) principle as a powerful tool to detect the number of active sources.
6.1.1 Minimum description length
Given the observations y[i], i = 1, . . . , m   (63), the MDL principle selects the model that yields the minimum
code length for the symbols. Furthermore, if p_s is the distribution of the source s and q_s is another distribution, we have [138]:

H(s) = −∫ p_s·log(p_s) ds ≤ −∫ p_s·log(q_s) ds   (64)

where H(s) is the entropy of the signal. This implies that the minimum average code length is obtained only for the correct source distribution (model parameters); in other words, the choice of wrong model parameters (distribution function) leads to larger code lengths. When a particular model with a set of parameters θ is assumed for the data a priori, each time a sequence y is received, the parameters should first be estimated. The optimum estimation method is usually the ML estimator, which results in θ̂_ML. Now, the probability distribution of an observed sequence under the sinusoidal model in white Gaussian noise is

f(x | θ̂) = (2πσ²)^{−n/2}·e^{ −Σ_{t=1}^{n} ( x_t − Σ_{j=1}^{k} a_j·sin(ω_j·t + φ_j) )² / (2σ²) }   (66)
The sample covariance of the observations is

R̂ = (1/m)·Σ_{i=1}^{m} y[i]·yᴴ[i]   (67)

Assume the ordered eigenvalues of R̂ are λ̂_1 ≥ λ̂_2 ≥ · · · ≥ λ̂_n. The likelihood can be written as

f(Y | R) ∝ ( e^{ −tr{R⁻¹·R̂} } / det(R) )^m   (68)

where tr{·} stands for the trace operator. The ML estimates of the signal eigenvalues in R are λ̂_i, i = 1, . . . , k, with the respective eigenvectors {v̂_i}_{i=1}^{k}. Since λ_{k+1} = · · · = λ_n = σ², the ML estimate of the noise eigenvalue is σ̂²_ML = (1/(n−k))·Σ_{i=k+1}^{n} λ̂_i, and {v̂_i}_{i=k+1}^{n} are all noise eigenvectors. Thus, the ML estimate of R given R̂ is

R̂_ML = Σ_{i=1}^{k} λ̂_i·v̂_i·v̂_iᴴ + σ̂²_ML·Σ_{i=k+1}^{n} v̂_i·v̂_iᴴ   (69)

Substituting (69) into (68) and adding the code length of the ν free parameters, the description length becomes

m·Σ_{i=1}^{k} log(λ̂_i) + m·(n − k)·log( (1/(n−k))·Σ_{i=k+1}^{n} λ̂_i ) + (ν/2)·log(m)   (71)
2
where is the number of free parameters in the distribution. This expression should be computed for dierent
values of 0 k n 1 and its minimum point
should be
k MDL . Note that we can subtract the term m ni=1 log( i )
from the expression, which is not dependent on k to get
the well-known MDL criterion [129]:
MDL(k) = m·(n − k)·log( ( (1/(n−k))·Σ_{i=k+1}^{n} λ̂_i ) / ( Π_{i=k+1}^{n} λ̂_i )^{1/(n−k)} ) + (ν/2)·log(m)   (72)

with

ν = 1 + k + Σ_{i=1}^{k} 2(n − i) = k·(2n − k) + 1   (73)
where the first term is the likelihood ratio for the sphericity test of the covariance matrix.
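The criterion (72)-(73) is straightforward to evaluate from the eigenvalues of the sample covariance. The sketch below, with an arbitrary toy setup of 2 sources observed on 6 sensors, returns the k minimizing (72).

```python
import numpy as np

def mdl_order(lam, m):
    """lam: eigenvalues of the sample covariance, descending; m: # snapshots."""
    n = len(lam)
    scores = []
    for k in range(n):                                   # candidate orders 0..n-1
        tail = lam[k:]
        am = tail.mean()                                 # arithmetic mean
        gm = np.exp(np.log(tail).mean())                 # geometric mean
        nu = k * (2 * n - k) + 1                         # free parameters, eq. (73)
        scores.append(m * (n - k) * np.log(am / gm) + 0.5 * nu * np.log(m))
    return int(np.argmin(scores))

rng = np.random.default_rng(7)
n, m, k_true = 6, 800, 2
A = rng.standard_normal((n, k_true))                     # mixing/array response
Y = A @ rng.standard_normal((k_true, m)) + 0.3 * rng.standard_normal((n, m))
lam = np.sort(np.linalg.eigvalsh(Y @ Y.T / m))[::-1]
print(mdl_order(lam, m))                                 # expected: 2
```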
Figure 21 An MDL example; the vertical axis is the probability of correct order detection, and the other two axes are the number of sources (k = 0, . . . , 4) and the SNR (dB). The MDL method estimates the number of active sources (which is 2) correctly when the SNR is relatively high.
Wireless sensor networks typically consist of a large number of sensor nodes, spatially distributed over a region of interest, that observe some physical environment, including acoustic, seismic, and thermal fields, with applications in a wide range of areas such as health care, geographical monitoring, homeland security, and hazard detection. The way sensor networks are used in practical applications can be divided into two general categories:
(1) There exists a central node, known as the fusion center (FC), that retrieves relevant field information from the sensor nodes; communication from the sensor nodes to the FC generally takes place over a power- and bandwidth-constrained wireless channel.
(2) Such a central node does not exist, and the nodes make specific decisions based on the information they obtain and exchange among themselves. Issues such as distributed computing and processing are of high importance in such scenarios.
In general, there are three main tasks that should be implemented efficiently in a wireless sensor network: sensing, communication, and processing. The main challenge in the design of practical sensor networks is to find an efficient way of jointly performing these tasks, while using the minimum amount of system resources (computation, power, bandwidth) and satisfying the required system design parameters (such as distortion levels). For example, one such metric is the so-called energy-distortion tradeoff, which determines how much energy the sensor network consumes in extracting and delivering relevant information up to a given distortion level. Although many theoretical results are already available in the case of point-to-point links, in which separation between source and channel coding can be assumed, the problem of efficiently transmitting or sharing information among a vast number of distributed nodes remains a great challenge. This is due to the fact that well-developed theories and tools for distributed signal processing, communications, and information theory in large-scale networked systems are still under development. However, recent results on distributed estimation and detection indicate that joint optimization through some form of source-channel matching and local node cooperation can result in significant system performance improvements [143-147].
6.2.1 How sparsity can be exploited in a sensor network
Compressive sensing in sensor networks. Most natural phenomena in SNs are compressible through representation in a natural basis [86]. Some examples of such applications are imaging in a scattering medium [148], MIMO radar [149], and geo-exploration via underground seismic data. In such cases, it is possible to construct a highly compressed version of a given field in a decentralized fashion. If the correlations between the data at different nodes are known a priori, it is possible to use schemes that have very favorable power-distortion-latency tradeoffs [143,155,156]. In such cases, distributed source coding techniques, such as Slepian-Wolf coding, can be used to design compression schemes without collaboration between nodes (see [155] and the references therein). Since prior knowledge of such correlations is not available in many applications, collaborative, intra-network processing and compression are used to determine the unknown correlations and dependencies through information exchange between the network nodes. In this regard, the concept of compressive wireless sensing has been introduced in [147] for energy-efficient estimation of sensor data at the FC, based on ideas from wireless communications [143,145,156-158] and compressive sampling theory [29,75,159]. The main objective in such an approach is to combine processing and communication in a single distributed operation [160-162].
Methods to obtain the required sparsity in an SN. While transform-based compression is well developed in traditional signal and image processing domains, the understanding of sparse transforms for networked data is not as advanced [163]. One approach is to associate a graph with a given network, where the vertices of the graph represent the nodes of the network, and edges between vertices represent relationships among the data at neighboring nodes.
Figure 22 Computation of CS projections through superposition of radio waves of randomly weighted values directly from the nodes in
the network to the FC (from [163]).
The sensing capacity of an SN characterizes its ability to reconstruct the field to a desired degree of fidelity based on noisy sensor measurements. The inverse of the sensing capacity is the compression rate, i.e., the ratio of the number of measurements to the number of signal dimensions, which characterizes the minimum rate to which the source can be compressed. As shown in [14], sensing capacity is a function of SNR, the inherent dimensionality of the information space, the sensing diversity, and the desired distortion level.
Another issue to be noted with respect to the sensing capacity is the inherent difference between the sensor network and CS scenarios in the way in which the SNR is handled [14,172]. In sensor networks composed of many sensors, a fixed SNR can be imposed for each individual sensor. Thus, the sensed SNR per location is spread across the field of view, leading to a row-wise normalization of the observation matrix. On the other hand, in CS, the vector-valued observation corresponding to each signal component is normalized column-wise. This difference has led to different regimes of compression rate [172]. In an SN, in contrast to the CS setting, the sensing capacity is generally small, and correspondingly the number of sensors required does not scale linearly with the target sparsity. Specifically, the number of measurements is generally proportional to the signal dimension and is weakly dependent on the target density sparsity. This issue has raised questions about compressive gains in power-limited SN applications based on the sparsity of the underlying source domain.
Recovery of the original source signals from their mixtures, without having a priori information about the sources and the way they are mixed, is called blind source separation (BSS). This process is impossible if no assumption about the sources can be made. Such an assumption on the sources may be uncorrelatedness, statistical independence, lack of mutual information, or disjointness in some space [18,19,49].
The main assumption in ICA is the statistical independence of the constituent sources. Based on this assumption, ICA can play a crucial role in the separation and denoising of signals (BSS).
There has been recent research interest in the field of BSS due to its practicality in a wide range of problems. For example, BSS of acoustic signals measured in a room is often referred to as the cocktail party problem, which means separation of individual sounds from a number of recordings in an echoic and noisy environment. Figure 23 illustrates the BSS concept, wherein the mixing block represents the multipath propagation model between the original sources and the microphone measurements.
Generally, BSS algorithms make assumptions about the environment in order to make the problem more tractable. There are typically three assumptions about the mixing medium. The simplest but most widely used case is the instantaneous one, where the source signals arrive at the sensors at the same time. This has been considered for the separation of biological signals such as the EEG, where the signals have narrow bandwidths and the sampling frequency is normally low [173]. The generative model for BSS in this case can be formulated as

x[i] = H·s[i] + ν[i]   (74)
In the anechoic case, the source signals arrive at the sensors with different attenuations and delays:

x_r[i] = Σ_{j=1}^{n} h_{r,j}·s_j[i − δ_{r,j}] + ν_r[i],   for r = 1, . . . , m   (76)
Figure 23 The BSS concept; the unobservable sources s_1[i], . . . , s_n[i] are mixed and corrupted by additive zero-mean noise to generate the observations x_1[i], . . . , x_m[i]. The target of BSS is to estimate an unmixing system to recover the original sources in y_1[i], . . . , y_n[i].
where the attenuation h_{r,j} and the delay δ_{r,j} of source j to sensor r are determined by the physical position of the source relative to the sensors. The unmixing process to estimate the sources is then given by

y_j[i] = Σ_{r=1}^{m} w_{j,r}·x_r[i − δ̂_{j,r}],   for j = 1, . . . , n   (77)
In the echoic case, the signals arrive at the sensors through multiple paths:

x_r[i] = Σ_{j=1}^{n} Σ_{l=1}^{L} h^l_{r,j}·s_j[i − δ^l_{r,j}] + ν_r[i],   r = 1, . . . , m   (78)

where L denotes the maximum number of paths for the sources, (·)^l refers to the lth path, and ν_r[i] is the accumulated noise at sensor r,

ν_r[i] = Σ_{l=1}^{n} c_{r,l}·η_l[i]   (79)

The unmixing process is formulated similarly to the anechoic case. For a known number of sources, an accurate result may be expected if the number of paths is known; otherwise, the overall number of observations in the echoic case is infinite.
The aim of BSS using ICA is to estimate an unmixing matrix W such that Y = WX best approximates the independent sources S, where Y and X are respectively the matrices with columns y[i] = [y_1[i], y_2[i], . . . , y_n[i]]ᵀ and x[i] = [x_1[i], x_2[i], . . . , x_m[i]]ᵀ. The ICA separation algorithms are subject to permutation and scaling ambiguities in the output components, i.e., W = PDH⁻¹, where P and D are permutation and scaling (diagonal) matrices, respectively. Permutation of the outputs is troublesome in places where either the separated segments of the signals are to be joined together or a frequency-domain BSS is performed.
Mutual information is a measure of independence, and maximizing the non-Gaussianity of the source signals is equivalent to minimizing the mutual information between them [177].
In those cases where the number of sources is more than the number of mixtures (underdetermined systems), the above BSS schemes cannot be applied directly, because the mixing matrix is not invertible and, generally, the original sources cannot be extracted. However, when the signals are sparse, methods based on the disjointness of the sources in some domain may be utilized. Separation of mixtures of sparse signals is potentially possible when, at each sample instant, the number of nonzero sources is not more than a fraction of the number of sensors (see Table 1, rows 14-15, column 6). The mixtures of sparse signals can also be instantaneous or convolutive.
Sparse sources may be disjoint in a transform domain in which they can be represented as a sum of the members of a dictionary, which can consist, for example, of wavelets or wavelet packets. In these cases, the SCA can be performed more efficiently in those domains. Such methods often include a transformation to the time-frequency domain followed by binary masking [181], or a BSS followed by binary masking [176]. One such approach, called the degenerate unmixing estimation technique (DUET) [181], transforms the anechoic convolutive observations into the time-frequency domain using a short-time Fourier transform, and the relative attenuation and delay values between the two observations are calculated from the ratio of corresponding time-frequency points. The regions of significant amplitude (atoms) are then considered to be the source components in the time-frequency domain. In this method, only two mixtures are considered and, as a major limitation, only one source can be active at each time instant.
For instantaneous separation of sparse sources, the common approach used by most researchers is to attempt to maximize the sparsity of the extracted signals at the output of the separator. The columns of the mixing matrix A assign each observed data point to only one source, based on some measure of proximity to those columns [182]; i.e., at each instant only one source is considered active. Therefore, the mixing system can be presented as

x_r[i] = Σ_{j=1}^{n} a_{j,r}·s_j[i],   r = 1, . . . , m   (81)
where, in the ideal case, a_{j,r} = 0 for r ≠ j. Minimization of the ℓ1-norm is one of the most logical methods for estimating the sources, as long as the signals can be considered sparse. ℓ1-norm minimization is a piecewise linear operation that partially assigns the energy of x[i] to the m columns of A around x[i] in R^m space. The remaining n − m columns are assigned zero coefficients; therefore, the ℓ1-norm minimization can be expressed as

min ‖s[i]‖₁ subject to A·s[i] = x[i]   (82)
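Problem (82) is exactly the basis-pursuit LP discussed around Table 5, so a generic LP solver suffices. Below is a sketch using scipy.optimize.linprog with the split s = u − v, u, v ≥ 0; the sensor and source counts are arbitrary.

```python
import numpy as np
from scipy.optimize import linprog

def l1_sources(A, x):
    m, n = A.shape
    c = np.ones(2 * n)                    # ||s||_1 = 1'u + 1'v for s = u - v
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=x, bounds=(0, None))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(8)
m, n = 8, 16                              # 8 sensors, 16 possible sources
A = rng.standard_normal((m, n))
s_true = np.zeros(n); s_true[[2, 9]] = [1.0, -0.7]   # only 2 sources active
print(np.round(l1_sources(A, A @ s_true), 3))        # large entries at 2 and 9
```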
It has been shown that for such sources, this method outperforms both DUET and Li's algorithm. The authors of [185] have recently extended the DUET algorithm to the separation of more than two sources in an echoic mixing scenario in the time-frequency domain.
In a very recent approach, it has been considered that brain signal sources in the space-time-frequency domain are disjoint. Therefore, clustering the observation points in the space-time-frequency domain can be effectively used for the separation of brain sources [186].
As can be seen, BSS generally exploits the independence of the source signals, whereas SCA benefits from the disjointness property of the source signals in some domain. While the BSS algorithms mostly rely on ICA with statistical properties of the signals, SCA uses their geometrical and behavioral properties. Therefore, in SCA, either a clustering approach or a masking procedure can result in estimation of the mixing matrix. Often, an ℓ1-norm minimization is then used to recover the source signals. Generally, in places where the source signals are sparse, the SCA methods often result in more accurate estimation of the signals, with fewer ambiguities in the estimation.
e(F, {V_1, . . . , V_l}) = Σ_{i=1}^{m} min_{1≤j≤l} d²(f_i, V_j)   (83)
The multipath channel can be modeled as a time-varying impulse response composed of k paths,

h(t, τ) = Σ_{l=0}^{k−1} α_l(t)·δ(τ − τ_l(t))   (84)

whose time-varying frequency response is

H(t, f) = ∫_{−∞}^{+∞} h(t, τ)·e^{−j2πf·τ} dτ   (85)
A separation scheme for sources with different morphologies has been presented by developing a multichannel morphological component analysis approach. In this scheme, the signals are considered as combinations of features from different dictionaries; therefore, different dictionaries are assumed for different sources. In [193], the inversion of a random field from pointwise measurements collected by a sensor network is presented. In this article, it is assumed that the field has a sparse representation in a known basis. To illustrate the approach, the inversion of an acoustic field created by the superposition of a discrete number of propagating noisy acoustic sources is considered. The method combines compressed sensing (sparse reconstruction by ℓ1-constrained optimization) with distributed average consensus (mixing the pointwise sensor measurements by local communication among the sensors). [194] addresses source separation from a linear mixture under the assumptions of source sparsity and orthogonality of the mixing matrix. A two-stage separation process is proposed: in the first stage, a sparsity pattern of the sources is recovered by exploiting the orthogonality prior.
Input: a data set F and an initial partition {F_1^(1), . . . , F_l^(1)}
Iterations:
1. Use the SVD to find {V_1^(1), . . . , V_l^(1)} by minimizing e(F_i^(1), V_i^(1)) for each i, and compute γ_1 = Σ_i e(F_i^(1), V_i^(1));
2. Set j = 1;
3. While γ_j = Σ_i e(F_i^(j), V_i^(j)) > e(F, {V_1^(j), . . . , V_l^(j)}):
4. Choose a new partition {F_1^(j+1), . . . , F_l^(j+1)} that satisfies: f ∈ F_k^(j+1) implies that d(f, V_k^(j)) ≤ d(f, V_h^(j)), h = 1, . . . , l;
For OFDM, the discrete version of the time-varying channel of (85) in the frequency domain becomes

$H[r, i] \triangleq H(r T_f, i \Delta f) = \sum_{l=0}^{n-1} h[r, l]\, e^{-j \frac{2\pi i l}{n}}$ (86)

where

$h[r, l] = h(r T_f, l T_s)$ (87)
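As a quick numerical illustration of (86) and (87) for a single OFDM symbol (fixed $r$), a few nonzero taps generate the full subcarrier response with one $n$-point FFT; the tap positions and gains below are illustrative assumptions:

```python
import numpy as np

# Sparse multipath profile: a few nonzero taps h[l] (eq. (87)) for one
# OFDM symbol; positions and gains are illustrative assumptions.
n = 64
h = np.zeros(n, dtype=complex)
h[[0, 3, 11]] = [1.0, 0.5j, -0.2]

# Subcarrier response of eq. (86): H[i] = sum_l h[l] exp(-j*2*pi*i*l/n)
H = np.fft.fft(h)
```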
Figure 26 The impulse response of two typical multipath channels. (a) Brazil-D and (b) TU6 channel profiles.
The noisy observation of the channel frequency response at the pilot subcarriers can be written as

$\hat{H}_{ip} = F_{ip}\, h + \nu_{ip}$ (88)
$\hat{h}_0 = F_{ip}^{+} \hat{H}_{ip} = \underbrace{\frac{1}{N} F_{ip}^{H} F_{ip}}_{G}\, h + \frac{1}{N} F_{ip}^{H} \nu_{ip}$ (89)

where we used

$F_{ip}^{+} = F_{ip}^{H} \big(F_{ip} F_{ip}^{H}\big)^{-1} = \frac{1}{N} F_{ip}^{H}, \qquad F_{ip} F_{ip}^{H} = N\, I_{N_p \times N_p}.$ (90)
CS-based channel estimation: The idea of using time-domain sparsity in OFDM channel estimation has been proposed in [203-205]. There are two main advantages in including the sparsity constraint of the channel impulse response in the estimation process:
(1) Decrease in the MSE: By applying the sparsity constraint, the energy of the estimated channel impulse response is concentrated in a few coefficients, while in the conventional methods, we usually observe a leakage of the energy to the neighboring coefficients of the nonzero taps. Thus, if the sparsity-based methods succeed in estimating the support of the channel impulse response, the MSE is improved by prevention of this leakage effect.
(2) Reduction in the overhead: The number of pilot sub-carriers is in fact the number of (noisy) samples that we obtain from the channel frequency response. Since the pilot sub-carriers do not convey any data, they are considered an overhead imposed to enhance the estimation process. The theoretical results in [203] indicate that, by means of sparsity-based methods, perfect estimation can be achieved with an overhead proportional to the number of non-zero channel taps (which is considerably less than that of the current standards).
In the sequel, we present two iterative methods which
exploit the inherent sparsity of the channel impulse
response to improve the channel estimation task in
OFDM systems.
8.1.3 Iterative method with adaptive thresholding (IMAT) for OFDM channel estimation [206]
In each iteration, the estimate of the sparse channel impulse response is obtained by adaptive thresholding,

$\hat{h}^{(i)}[k] = \begin{cases} \tilde{h}^{(i)}[k], & |\tilde{h}^{(i)}[k]| > \beta e^{-\alpha i} \\ 0, & \text{otherwise} \end{cases}$ (91)

and the unthresholded estimate is refined through the relaxed iteration

$\tilde{h}^{(i+1)} = \lambda\, \hat{h}_0 + \big(I - \lambda G\big)\, \hat{h}^{(i)}$ (92)
where $\lambda$ and $i$ are the relaxation parameter and the iteration number, respectively, $k$ is the index of the channel impulse response, and $G = \frac{1}{N} F_{ip}^{H} F_{ip}$ is defined in (89).
The block diagram of the proposed channel estimation
method is shown in Figure 27.
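A minimal sketch of this iteration for a single OFDM symbol follows; the parameter names (lam, beta, alpha), the threshold profile $\beta e^{-\alpha i}$ of (91), and the usage dimensions are our assumptions:

```python
import numpy as np

def imat_channel_estimate(H_pilot, pilot_idx, N, n_taps,
                          n_iter=50, lam=1.0, beta=1.0, alpha=0.2):
    """IMAT sketch: iterate the relaxed estimate of eq. (92) and keep only
    the taps that survive the decaying threshold of eq. (91)."""
    # F_ip: rows of the N-point DFT matrix at the pilot subcarriers
    F_ip = np.exp(-2j * np.pi * np.outer(pilot_idx, np.arange(n_taps)) / N)
    h0 = F_ip.conj().T @ H_pilot / N           # distorted estimate, eq. (89)
    G = F_ip.conj().T @ F_ip / N               # G = (1/N) F_ip^H F_ip
    h = np.zeros(n_taps, dtype=complex)
    for i in range(n_iter):
        h_tilde = lam * h0 + h - lam * (G @ h)             # eq. (92)
        h = np.where(np.abs(h_tilde) > beta * np.exp(-alpha * i),
                     h_tilde, 0)                           # eq. (91)
    return h

# Usage sketch: 3 nonzero taps, 48 random pilots out of N = 256 subcarriers
rng = np.random.default_rng(0)
N, n_taps = 256, 32
h_true = np.zeros(n_taps, dtype=complex)
h_true[[4, 11, 25]] = rng.standard_normal(3) + 1j * rng.standard_normal(3)
H = np.fft.fft(h_true, N)                      # channel frequency response
pilots = np.sort(rng.choice(N, 48, replace=False))
h_est = imat_channel_estimate(H[pilots], pilots, N, n_taps)
# the energy of h_est concentrates on the true tap positions 4, 11, 25
```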
8.1.4 Modified IMAT (MIMAT) for OFDM channel estimation [23]
Figure 28 SER vs. CNR for the ideal channel, linear interpolation, GPSR, OMP, and the IMAT for the Brazil channel at $F_d = 0$ without the zero-padding effect.
Figure 29 SER vs. CNR for the ideal channel, linear interpolation, GPSR, CoSaMP, and the IMAT for the Brazil channel at $F_d = 50$ Hz without the zero-padding effect.
Figure 30 SER vs. CNR for the ideal channel, linear interpolation, GPSR, CoSaMP, and the MIMAT for the Brazil channel at $F_d = 0$ including the zero-padding effect.
9 Conclusion
A unified view of sparse signal processing has been presented in tutorial form. The sparsity in the key areas of sampling, coding, spectral estimation, array processing, component analysis, and channel estimation has been carefully exploited. Some form of uniform or random sampling has been shown to underpin the associated sparse processing methods used in each of these fields. The reconstruction methods used in each application domain have been introduced, and the interconnections among them have been highlighted.
This development has revealed, for example, that the iterative methods developed for random sampling can be applied to real-field block and convolutional channel coding for impulsive-noise (salt-and-pepper noise in the case of images) removal, SCA, and channel estimation for orthogonal frequency division multiplexing systems. These iterative reconstruction methods have been shown to be naturally extendable to spectral estimation and sparse array processing, due to their similarity to channel coding in terms of mathematical models, with significant improvements. Conversely, the minimum description length method developed for spectral estimation and array processing has potential for application in other areas. The error locator polynomial method developed for channel coding has, moreover, been shown to be a discrete version of the annihilating filter used in sampling with a finite rate of innovation and of the Prony method in spectral estimation; the Pisarenko and MUSIC methods are further improvements of the Prony method when additive noise is also considered.
Linkages with emergent areas such as compressive sensing and channel estimation have also been considered.
In addition, it has been suggested that the linear programming methods developed for compressive sensing
and SCA can be applied to other applications with possible reduction of sampling rate. As such, this tutorial
has provided a route for new applications of sparse signal
processing to emerge, which can potentially reduce computational complexity and improve performance quality.
Other potential applications of sparsity are in the areas of
sensor networks and sparse array design.
Endnotes
a Sparse
d Note
section, we can resolve any closely spaced sources conditioned on (1) limited snapshots and innite SNR, or (2)
limited SNR and innite number of observations, while
the spatial aperture of the array is kept nite.
i Statistical efficiency of an estimator means that it is asymptotically unbiased and its variance goes to zero.
j The array in ESPRIT is composed of sensor doublets with the same displacement. The parameters of the impinging signals can be estimated via a rotational invariance property of the signal subspace. The complexity and storage requirements of ESPRIT are less than those of MUSIC; it is also less vulnerable to array imperfections. ESPRIT, unlike MUSIC, results in an unbiased DOA estimate; nonetheless, MUSIC outperforms ESPRIT in general.
k For a video introduction to these concepts, please refer to http://videolectures.net/icml08_grunwald_mdl.
l Spherical subspace implies the eigenvalues of the autocorrelation matrix are equal in that subspace.
m Similar to Pisarenko method for spectral estimation in
Section 5.2.
n These acronyms are dened in Table 2 at the end of
Section 1.
o In current OFDM standards, a number of subcarriers at both edges of the bandwidth are set to zero to ease the process of analog bandpass filtering.
Appendix 1
ELP decoding for erasure channels [59]
For lost samples, the polynomial locator for the erasure
samples is
$H(z_i) = \prod_{m=1}^{k} \left( z_i - e^{j \frac{2\pi i_m}{n}} \right) = \sum_{t=0}^{k} h_t\, z_i^{k-t}$ (95)
$H(z_{i_m}) = 0, \quad m = 1, 2, \ldots, k$ (96)
where $z_i = e^{j \frac{2\pi}{n} i}$. The polynomial coefficients $h_t$, $t = 0, \ldots, k$ can be found from the product in (95); it is easier to find $h_t$ by obtaining the inverse FFT of $H(z)$. Multiplying (96) by $e[i_m]\, z_{i_m}^{r}$ (where $r$ is an integer) and summing over $m$, we get
$\sum_{t=0}^{k} h_t \sum_{m=1}^{k} e[i_m]\, z_{i_m}^{k+r-t} = 0$ (97)

which is equivalent to

$\sum_{t=0}^{k} h_t\, E[k + r - t] = 0$ (98)

where $E[\cdot]$ is the transform of $e[\cdot]$ defined by the inner sum in (97).
$E[r] = -\sum_{t=1}^{k} h_t\, E[r - t]$ (101)

where $r$ ranges over the indices at which $E[r]$ is initially unknown and the index additions are in mod($n$).
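A short numerical check of (95) and (96): multiplying out the root factors gives the coefficients $h_t$, and the resulting polynomial indeed vanishes at the erasure positions. The positions chosen below are illustrative:

```python
import numpy as np

# Build the erasure-locator coefficients h_t of eq. (95) from known
# erasure positions i_1..i_k; by construction H(z_{i_m}) = 0 (eq. (96)).
n, erasures = 16, [3, 7, 12]
h = np.array([1.0 + 0j])                       # leading coefficient h_0 = 1
for i_m in erasures:
    h = np.convolve(h, [1.0, -np.exp(2j * np.pi * i_m / n)])

for i_m in erasures:                           # check the roots
    assert abs(np.polyval(h, np.exp(2j * np.pi * i_m / n))) < 1e-9
```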
Appendix 2
ELP decoding for impulsive noise channels [31,104]
For all integer values of $r$ such that both $r$ and $r + k$ lie in the range of known $E$ values, we obtain a system of $k$ equations with $k + 1$ unknowns (the $h_t$ coefficients). These equations yield a unique solution for the polynomial with the additional condition that the first nonzero $h_t$ is equal to one. After finding the coefficients, we need to determine the roots of the polynomial in (95). Since the roots of $H(z)$ are of the form $e^{j \frac{2\pi i_m}{n}}$, the inverse DFT (IDFT) of the sequence $\{h_m\}_{m=0}^{k}$ can be used. Before performing the IDFT, we have to pad $n - 1 - k$ zeros at the end of the $\{h_m\}_{m=0}^{k}$ sequence to obtain an $n$-point signal. We refer to the new signal (after the IDFT) as $\{H_i\}_{i=0}^{n-1}$. Each zero in $\{H_i\}$ represents an error in $r[i]$ at the same location.
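The zero-padding and transform step can be verified numerically. Note that with NumPy's sign conventions it is the forward FFT of the padded coefficient sequence that vanishes at the error locations; the IDFT described above corresponds to the conjugate convention. The error positions below are illustrative:

```python
import numpy as np

# Locate the roots of H(z) on the unit circle: zero-pad the k+1
# coefficients to n points and evaluate via the FFT; zeros of the
# result mark the error locations i_m.
n, error_pos = 16, [2, 9]
h = np.array([1.0 + 0j])
for i_m in error_pos:
    h = np.convolve(h, [1.0, -np.exp(2j * np.pi * i_m / n)])

H_grid = np.fft.fft(h, n)                      # zero-padded evaluation
print(np.where(np.abs(H_grid) < 1e-9)[0])      # -> [2 9]
```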
Competing interests
The authors declare that they have no competing interests.
Acknowledgements
We would like to sincerely thank our colleagues for their specific contributions
in various sections in this article. Especially, Drs. S. Holm from University of Oslo
who contributed a section in sparse array design, M. Nouri Moghadam, the
director of the Newton Foundation, H. Saeedi from University of
Massachusetts, and K. Mehrany from EE Dept. of Sharif University of
Technology who contributed to various sections of the original paper before
revision. We are also thankful to M. Valiollahzadeh who edited and contributed
to the SCA section. We are especially indebted to Prof. B. Sankur from Bogazici
University in Turkey for his careful review and comments. We are also thankful
to the students of the Multimedia Lab and members of ACRI at Sharif
University of Technology for their invaluable help and simulations. We are
specifically indebted to A. Hosseini, A. Rashidinejad, R. Eghbali, A. Kazerouni, V.
Montazerhodjat, S. Jafarzadeh, A. Salemi, M. Soltanalian, M. Sharif and H.
Firouzi. The work of Akram Aldroubi was supported in part by grant NSF-DMS
0807464.
Author details
1 Electrical Engineering Department, Advanced Communication Research
Institute (ACRI), Sharif University of Technology, Tehran, Iran. 2 Department of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran.
3 Math Department, Vanderbilt University, Nashville, USA. 4 Department of
Computing, University of Surrey, Surrey, UK. 5 Electrical and Electronic
Department, Loughborough University, Loughborough, UK.
Received: 30 July 2011 Accepted: 23 November 2011
Published: 22 February 2012
References
1. F Marvasti, Nonuniform Sampling: Theory and Practice (Springer, New York, 2001)
2. RG Baraniuk, A lecture on compressive sensing. IEEE Signal Process. Mag. 24(4), 118-121 (2007)
3. M Vetterli, P Marziliano, T Blu, Sampling signals with finite rate of innovation. IEEE Trans. Signal Process. 50(6), 1417-1428 (2002)
4. S Lin, DJ Costello, Error Control Coding (Prentice-Hall, Englewood Cliffs, 1983)
5. T Richardson, R Urbanke, Modern Coding Theory (Cambridge University Press, Cambridge, 2008)
6. F Marvasti, M Hung, MR Nakhai, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP'99), vol. 5. The application of Walsh transform for forward error correction (Phoenix, AZ, USA, 1999), pp. 2459-2462
7. SL Marple, Digital Spectral Analysis (Prentice-Hall, Englewood Cliffs, 1987)
8. SM Kay, SL Marple, Spectrum analysis - a modern perspective. Proc. IEEE (Modern Spectrum Analysis II). 69(11) (1981)
9. SM Kay, Modern Spectral Estimation: Theory and Application (Prentice-Hall, Englewood Cliffs, 1988)
10. P Stoica, RL Moses, Introduction to Spectral Analysis (Prentice-Hall, Upper Saddle River, 1997)
11. P Stoica, A Nehorai, MUSIC, maximum likelihood, and Cramer-Rao bound. IEEE Trans. ASSP. 37(5), 720-741 (1989)
12. P Stoica, A Nehorai, Performance study of conditional and unconditional direction-of-arrival estimation. IEEE Trans. ASSP. 38(10), 1783-1795 (1990)
13. S Holm, A Austeng, K Iranpour, JF Hopperstad, in Nonuniform Sampling: Theory and Practice, ed. by F Marvasti. Sparse sampling in array processing (Springer, New York, 2001), pp. 787-833
14. S Aeron, M Zhao, V Saligrama, in Asilomar Conference on Signals, Systems and Computers (ACSSC'06). Fundamental tradeoffs between sparsity, sensing diversity, and sensing capacity (Pacific Grove, CA, Oct-Nov 2006), pp. 295-299
15. P Bofill, M Zibulevsky, Underdetermined blind source separation using sparse representations. Signal Process. 81(11), 2353-2362 (2001)
16. MA Girolami, JG Taylor, Self-Organising Neural Networks: Independent Component Analysis and Blind Source Separation (Springer, London, 1999)
17. P Georgiev, F Theis, A Cichocki, Sparse component analysis and blind source separation of underdetermined mixtures. IEEE Trans. Neural Netw. 16(4), 992-996 (2005)
18. M Aharon, M Elad, AM Bruckstein, The K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311-4322 (2006)
19. M Aharon, M Elad, AM Bruckstein, On the uniqueness of overcomplete dictionaries, and a practical way to retrieve them. Linear Algebra Appl. 416(1), 48-67 (2006)
20. R Gribonval, M Nielsen, Sparse representations in unions of bases. IEEE Trans. Inf. Theory. 49(12), 3320-3325 (2003)
21. P Fertl, G Matz, in Proc. Asilomar Conf. Signals, Systems and Computers. Efficient OFDM channel estimation in mobile environments based on irregular sampling (Pacific Grove, US, May 2007), pp. 1777-1781
22. O Ureten, N Serinken, in IEEE Vehicular Technology Conf. (VTC). Decision directed iterative equalization of OFDM symbols using non-uniform interpolation (Ottawa, Canada, Sep 2007), pp. 1-5
23. M Soltanolkotabi, A Amini, F Marvasti, in Proc. EUSIPCO'09. OFDM channel estimation based on adaptive thresholding for sparse signal detection (Glasgow, Scotland, Aug 2009), pp. 1685-1689
24. JL Brown, Sampling extensions for multiband signals. IEEE Trans. Acoust. Speech Signal Process. 33, 312-315 (1985)
25. OG Guleryuz, Nonlinear approximation based image recovery using adaptive sparse reconstructions and iterated denoising, parts I and II. IEEE Trans. Image Process. 15(3), 539-571 (2006)
26. H Rauhut, On the impossibility of uniform sparse reconstruction using greedy methods. Sampl. Theory Signal Image Process. 7(2), 197-215 (2008)
27. T Blu, P Dragotti, M Vetterli, P Marziliano, P Coulot, Sparse sampling of signal innovations: theory, algorithms, and performance bounds. IEEE Signal Process. Mag. 25(2) (2008)
28. F Marvasti, Guest editor's comments on special issue on nonuniform sampling. Sampl. Theory Signal Image Process. 7(2), 109-112 (2008)
29. D Donoho, Compressed sensing. IEEE Trans. Inf. Theory. 52(4), 1289-1306 (2006)
H Feichtinger, K Gröchenig, in Wavelets: Mathematics and Applications, ed. by JJ Benedetto, M Frazier. Theory and practice of irregular sampling (CRC Press, Boca Raton, 1994), pp. 305-363
PJSG Ferreira, Noniterative and fast iterative methods for interpolation and extrapolation. IEEE Trans. Signal Process. 42(11), 3278-3282 (1994)
A Aldroubi, K Gröchenig, Non-uniform sampling and reconstruction in shift-invariant spaces. SIAM Rev. 43(4), 585-620 (2001)
A Aldroubi, Non-uniform weighted average sampling and exact reconstruction in shift-invariant and wavelet spaces. Appl. Comput. Harmon. Anal. 13(2), 151-161 (2002)
A Papoulis, C Chamzas, Detection of hidden periodicities by adaptive extrapolation. IEEE Trans. Acoust. Speech Signal Process. 27(5), 492-500 (1979)
C Chamzas, WY Xu, An improved version of Papoulis-Gerchberg algorithm on band-limited extrapolation. IEEE Trans. Acoust. Speech Signal Process. 32(2), 437-440 (1984)
PJSG Ferreira, Interpolation and the discrete Papoulis-Gerchberg algorithm. IEEE Trans. Signal Process. 42(10), 2596-2606 (1994)
K Gröchenig, T Strohmer, in Nonuniform Sampling: Theory and Practice, ed. by F Marvasti. Numerical and theoretical aspects of nonuniform sampling of band-limited images (Springer, New York, 2001), pp. 283-324
DC Youla, Generalized image restoration by the method of alternating orthogonal projections. IEEE Trans. Circuits Syst. 25(9), 694-702 (1978)
DC Youla, H Webb, Image restoration by the method of convex projections: part 1 - theory. IEEE Trans. Med. Imag. 1(2), 81-94 (1982)
K Gröchenig, Acceleration of the frame algorithm. IEEE Trans. Signal Process. 41(12), 3331-3340 (1993)
A Ali-Amini, M Babaie-Zadeh, C Jutten, A new approach for sparse decomposition and sparse source separation, in EUSIPCO 2006 (Florence, 2006)
F Marvasti, in Nonuniform Sampling: Theory and Practice, ed. by F Marvasti. Applications to error correction codes (Springer, New York, 2001), pp. 689-738
E Candes, J Romberg, T Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory. 52(2), 489-509 (2006)
E Candes, T Tao, Near-optimal signal recovery from random projections: universal encoding strategies. IEEE Trans. Inf. Theory. 52(12), 5406-5425 (2006)
Y Tsaig, D Donoho, Extensions of compressed sensing. Signal Process. 86(3), 549-571 (2006)
AJ Jerri, The Shannon sampling theorem - its various extensions and applications: a tutorial review. Proc. IEEE. 65(11), 1565-1596 (1977)
E Candes, M Wakin, An introduction to compressive sampling. IEEE Signal Process. Mag. 25(2), 21-30 (2008)
Y Eldar, Compressed sensing of analog signals in shift-invariant spaces. IEEE Trans. Signal Process. 57(8), 2986-2997 (2009)
E Candes, J Romberg, Sparsity and incoherence in compressive sampling. Inverse Probl. 23, 969-985 (2007)
D Donoho, X Hou, Uncertainty principle and ideal atomic decomposition. IEEE Trans. Inf. Theory. 47(7), 2845-2862 (2001)