
Stats 330: An Introduction to Compressed Sensing

Agenda

Deterministic approach to compressive sensing

1 The restricted isometry property (RIP)


2 General signal recovery from undersampled data
3 General signal recovery from noisy undersampled data
4 Examples of measurements obeying RIP
So far...

Exact recovery of sparse signals


1 We need to deal with compressible signals (not exactly sparse)
2 We need to deal with noise

Can be addressed in the probabilistic framework


Can be addressed in a deterministic framework: this lecture
Conditioning of submatrices

Key by-product of signal recovery result:


A ∈ Rm×n (sensing matrix) with iid rows
T an arbitrary set of size s
Near isometry with high probability: for all x supported on T
(1/2) kxk²`2 ≤ (1/m) kAxk²`2 ≤ (3/2) kxk²`2
Can be interpreted as an uncertainty relation (think of A as being a partial
DFT)
Restricted isometries: C. and Tao (04)

Definition (Restricted isometry constants)


For each k = 1, 2, . . . , δk is the smallest scalar such that

(1 − δk) kxk²`2 ≤ kAxk²`2 ≤ (1 + δk) kxk²`2

for all k-sparse x

Note slight change of normalization


When δk is not too large, condition says that all m × k submatrices are well
conditioned (sparse subsets of columns are not too far from orthonormal)
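As a quick illustration (not from the original slides), here is a Python sketch that brute-forces δk for a toy matrix by scanning every k-column submatrix; the function name and sizes are ours, and the exhaustive scan is only practical for tiny n and k, which is exactly why the probabilistic estimates later in the lecture matter.

import numpy as np
from itertools import combinations

def rip_constant(A, k):
    """Brute-force delta_k: largest deviation from 1 of the eigenvalues of
    A_T^* A_T over all supports T of size k (only feasible for toy sizes)."""
    delta = 0.0
    for T in combinations(range(A.shape[1]), k):
        sub = A[:, list(T)]
        eig = np.linalg.eigvalsh(sub.T @ sub)            # spectrum of the Gram matrix
        delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return delta

rng = np.random.default_rng(0)
m, n, k = 20, 30, 3
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))       # iid N(0, 1/m) entries
print(rip_constant(A, k))                                 # small when m is large relative to k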
General setup

x not necessarily sparse
observe b = Ax
recover by `1 minimization

min kx̂k`1   s.t.  Ax̂ = b

[Figure: example signal x (plot omitted)]

Interested in comparing performance with the sparsest approximation xs :

xs = arg min { kx − zk : kzk`0 ≤ s }

xs : s-sparse
the s largest entries of x are the nonzero entries of xs
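For concreteness, a minimal numerical sketch (ours, not part of the lecture) of the `1 program above, recast as a linear program in the variables (x̂, u) with −u ≤ x̂ ≤ u and solved with scipy.optimize.linprog; the sizes and names are arbitrary.

import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """min ||x||_1 subject to Ax = b, via the standard LP reformulation."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])          # minimize sum(u)
    A_ub = np.block([[ np.eye(n), -np.eye(n)],              #  x - u <= 0
                     [-np.eye(n), -np.eye(n)]])             # -x - u <= 0
    A_eq = np.hstack([A, np.zeros((m, n))])                  # Ax = b
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

rng = np.random.default_rng(1)
m, n, s = 40, 128, 5
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)  # s-sparse signal
x_hat = basis_pursuit(A, A @ x)
print(np.linalg.norm(x_hat - x))                              # essentially zero: exact recovery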
General signal recovery

Theorem (Noiseless recovery; C., Romberg and Tao, this version due to C. 08)

If δ2s < √2 − 1 = 0.414 . . ., `1 recovery obeys

kx̂ − xk`2 ≲ kx − xs k`1 / √s
kx̂ − xk`1 ≲ kx − xs k`1

Deterministic (nothing is random)


Universal (applies to all x)

Exact if x is s-sparse.
Otherwise, essentially reconstructs the s largest entries of x
Powerful if s is close to m
General signal recovery from noisy data
Inaccurate measurements: z error term (stochastic or deterministic)

b = Ax + z, with kzk`2 ≤ ε

Recovery via the LASSO: `1 minimization with relaxed constraints

min kx̂k`1   s.t.  kAx̂ − bk`2 ≤ ε

Theorem (C., Romberg and Tao)



Assume δ2s < √2 − 1; then

kx̂ − xk`2 ≲ kx − xs k`1 / √s + ε  =  approx. error + measurement error

(numerical constants hidden in ≲ are explicit, see proof)

When ε = 0 (no noise), this reduces to the earlier result


Says when we can solve underdetermined systems of equations accurately
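A hedged sketch of the relaxed program above using a generic convex solver; cvxpy (not referenced in the slides) is assumed to be available, and the problem sizes are arbitrary.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
m, n, s, eps = 60, 200, 8, 0.1
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
z = rng.normal(size=m)
z *= eps / np.linalg.norm(z)                       # noise with ||z||_2 <= eps
b = A @ x + z

x_hat = cp.Variable(n)
constraints = [cp.norm(A @ x_hat - b, 2) <= eps]   # relaxed data-fidelity constraint
cp.Problem(cp.Minimize(cp.norm(x_hat, 1)), constraints).solve()
print(np.linalg.norm(x_hat.value - x))             # error on the order of eps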
Interlude: when does sparse recovery make sense?

x is s-sparse: kxk`0 ≤ s
can we recover x from Ax = b?

Perhaps possible if sparse vectors lie away from the null space of A

Yes, if any 2s columns of A are linearly independent:

if x1 , x2 are s-sparse with Ax1 = Ax2 = b, then

A(x1 − x2 ) = 0 ⇒ x1 − x2 = 0 ⇔ x1 = x2


In general, no, if A has 2s linearly dependent columns:

there is h ≠ 0, 2s-sparse, with Ah = 0

write h = x1 − x2 with x1 , x2 both s-sparse

then Ah = 0 ⇔ Ax1 = Ax2 and x1 ≠ x2
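A brute-force check (a toy sketch, ours) of whether every set of 2s columns of A is linearly independent; only practical for very small m, n, s.

import numpy as np
from itertools import combinations

def every_2s_columns_independent(A, s):
    """True iff every m x 2s submatrix of A has full column rank 2s."""
    n = A.shape[1]
    return all(np.linalg.matrix_rank(A[:, list(T)]) == 2 * s
               for T in combinations(range(n), 2 * s))

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 12))
print(every_2s_columns_independent(A, s=2))   # holds with probability 1 for Gaussian A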


Equivalent view of restricted isometry property

δ2k is the smallest scalar such that

(1 − δ2k) kx1 − x2 k²`2 ≤ kAx1 − Ax2 k²`2 ≤ (1 + δ2k) kx1 − x2 k²`2

for all k-sparse vectors x1 , x2

The positive lower bound is what really matters


If lower bound does not hold, then we may have x1 and x2 both sparse and
with disjoint supports, obeying

Ax1 = Ax2

Lower bound guarantees that distinct sparse signals cannot be mapped too
closely (analogy with codes)
Formal equivalence

Suppose there is an s-sparse solution to Ax = b


δ2s < 1: the solution to the combinatorial optimization (min `0 ) is unique
δ2s < 0.414: the solution to the LP relaxation is unique and the same
With a picture

For all k-sparse x1 and x2

1 − δ2k ≤ kAx1 − Ax2 k²`2 / kx1 − x2 k²`2 ≤ 1 + δ2k

[Figure: K-planes; picture from M. Wakin (Φ is our A)]
Number of samples for RIP

Sampling mechanism        order of m

Gaussian (1)              s log(n/s)
Binary (2)                s log(n/s)
Partial DFT (3)           s (log n)⁴
In general (4)            µ s (log n)⁴

All with high probability

(1) entries are iid N(0, 1)
(2) entries are iid Bernoulli, Aij = ±1 w.p. 1/2
(3) rows (frequencies) are selected at random (result due to C. and Tao, Rudelson and Vershynin)
(4) rows are iid sampled from F obeying an isotropy condition and with coherence µ
Gaussian matrices and RIP

Ai,j i.i.d. N (0, 1/m)


To show that for all T of size ≤ 2s

0.59 ≤ λmin (A∗T AT ) ≤ λmax (A∗T AT ) ≤ 1.41

This is a question about eigenvalues of random matrices


Well studied subject
Equivalent formulation
√0.59 ≤ σmin (AT ) ≤ σmax (AT ) ≤ √1.41

σmin , σmax singular values
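A quick empirical check (ours) of this equivalent formulation for one random support T; the sizes below are arbitrary.

import numpy as np

rng = np.random.default_rng(4)
m, n, s = 1000, 5000, 10
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))    # A_ij iid N(0, 1/m)
T = rng.choice(n, size=2 * s, replace=False)           # one random support of size 2s
sv = np.linalg.svd(A[:, T], compute_uv=False)          # singular values of A_T
print(sv.min(), sv.max())                              # compare with sqrt(0.59), sqrt(1.41)
print(sv.min() ** 2, sv.max() ** 2)                    # eigenvalues of A_T* A_T, inside [0.59, 1.41] here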


Asymptotic theory of Gaussian matrices

X: an m by s matrix with i.i.d. N(0, 1/m) entries

Marchenko-Pastur (1967): the spectrum of X*X has a deterministic limit
distribution supported by the interval

[(1 − √c)², (1 + √c)²]    as s, m → ∞, s/m → c < 1

Silverstein (1985):

σmin (X) → 1 − √c a.s.,    σmax (X) → 1 + √c a.s.
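A short simulation (ours) of the Silverstein limits for a single aspect ratio c:

import numpy as np

rng = np.random.default_rng(5)
m, c = 4000, 0.1
s = int(c * m)
X = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, s))   # iid N(0, 1/m) entries
sv = np.linalg.svd(X, compute_uv=False)
print(sv.min(), 1 - np.sqrt(c))                       # sigma_min vs 1 - sqrt(c)
print(sv.max(), 1 + np.sqrt(c))                       # sigma_max vs 1 + sqrt(c)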

We need large deviation results for finite sizes—not asymptotic results


Large deviations

Fix an m by s Gaussian matrix X as above

P( σmax(X) > 1 + √(s/m) + t ) ≤ exp(−mt²/2)
P( σmin(X) < 1 − √(s/m) − t ) ≤ exp(−mt²/2)
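A rough Monte Carlo sanity check (ours) that the empirical exceedance frequency stays below exp(−mt²/2); the sizes and t below are arbitrary.

import numpy as np

rng = np.random.default_rng(6)
m, s, t, trials = 200, 20, 0.3, 2000
threshold = 1 + np.sqrt(s / m) + t
exceed = 0
for _ in range(trials):
    X = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, s))
    if np.linalg.svd(X, compute_uv=False).max() > threshold:
        exceed += 1
print(exceed / trials, np.exp(-m * t**2 / 2))   # empirical frequency vs the stated bound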

Can we justify this?


Borell’s inequality

Let X be an n-dimensional vector of centered, unit-variance, independent Gaussian
variables. If f : Rn → R is Lipschitz

|f (x) − f (y)| ≤ kf kLip · kx − yk,

then for all t > 0


P( f(X) − E f(X) > t ) ≤ exp( −t² / (2 kf k²Lip) ).

X is an m by s array of i.i.d. N (0, 1) variables


σmin and σmax are 1-Lipschitz (k · kF : Frobenius norm)

|σmin (X) − σmin (X′)| ≤ kX − X′kF
|σmax (X) − σmax (X′)| ≤ kX − X′kF

Expectations obey
E σmin (X) ≥ √m − √s,    E σmax (X) ≤ √m + √s
Apply Borell
P( σmax (X) > √m + √s + t ) ≤ exp(−t²/2)
P( σmin (X) < √m − √s − t ) ≤ exp(−t²/2)


Our claim is just this after renormalization


RIP for Gaussian matrices
For each T with |T | ≤ 2s
P( σmax (AT ) > 1 + √(2s/m) + t ) ≤ exp(−mt²/2)
P( σmin (AT ) < 1 − √(2s/m) − t ) ≤ exp(−mt²/2)

P( sup_{|T| ≤ 2s} σmax (AT ) > 1 + √(2s/m) + t ) ≤ #{T : |T| ≤ 2s} · exp(−mt²/2)
                                                  ≤ 2s (n choose 2s) · exp(−mt²/2)
                                                  ≈ exp(2s log(n/2s)) · exp(−mt²/2)

Similarly for σmin. Take t small enough so that 1 + √(2s/m) + t ≤ √1.41 and get

m ≳ s log(n/s)

as claimed
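A back-of-envelope evaluation (ours) of the logarithm of the union bound above for concrete n, s, t, using 2s·C(n, 2s)·exp(−mt²/2) as the count term; the log drops below zero (bound below 1) once m exceeds a constant multiple of s log(n/s).

import numpy as np
from scipy.special import gammaln

def log_union_bound(n, s, m, t):
    """log[ 2s * C(n, 2s) * exp(-m t^2 / 2) ], with log C(n, 2s) via gammaln."""
    log_count = np.log(2 * s) + gammaln(n + 1) - gammaln(2 * s + 1) - gammaln(n - 2 * s + 1)
    return log_count - m * t**2 / 2

n, s, t = 10_000, 20, 0.41
print(s * np.log(n / s))                   # s log(n/s), about 124 here
for m in (1000, 2000, 4000, 8000):
    print(m, log_union_bound(n, s, m, t))  # turns negative for m large enough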
Geometric intuition for noisy recovery

x̂ obeys the constraints → x̂ lies inside the tube:

kA(x − x̂)k`2 ≤ kAx − yk`2 + ky − Ax̂k`2 ≤ 2ε

The true x is feasible → x̂ lies inside the `1 ball:

kx̂k`1 ≤ kxk`1

The radius of the shaded area (tube ∩ `1 ball) is small
Preliminaries

For all x, x′ supported on disjoint subsets T, T′ ⊆ {1, . . . , n} with
|T| ≤ s, |T′| ≤ s′:

|hAx, Ax′i| ≤ δs+s′ kxk`2 kx′k`2

Suppose x and x′ are unit vectors as above. Then

2(1 − δs+s′) ≤ kAx + Ax′k²`2 ≤ 2(1 + δs+s′)
2(1 − δs+s′) ≤ kAx − Ax′k²`2 ≤ 2(1 + δs+s′)

Parallelogram identity:

|hAx, Ax′i| = (1/4) | kAx + Ax′k²`2 − kAx − Ax′k²`2 | ≤ δs+s′
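A small numerical illustration (ours) of this near-orthogonality: for Gaussian A and random disjointly supported unit vectors, the inner product hAx, Ax′i stays small, consistent with a small δs+s′.

import numpy as np

rng = np.random.default_rng(7)
m, n, s = 400, 2000, 10
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

worst = 0.0
for _ in range(500):
    idx = rng.choice(n, size=2 * s, replace=False)       # two disjoint supports of size s
    v1 = rng.normal(size=s); v1 /= np.linalg.norm(v1)     # unit coefficient vectors
    v2 = rng.normal(size=s); v2 /= np.linalg.norm(v2)
    worst = max(worst, abs((A[:, idx[:s]] @ v1) @ (A[:, idx[s:]] @ v2)))
print(worst)                                              # |<Ax, Ax'>| stays well below 1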
Proof of noisy recovery result

Proof is elementary but not trivial.


Solution is x̂ = x + h
T0 : indices of s largest coefficients of x
Tube constraint: x feasible gives

kAhk`2 ≤ kAx̂ − yk`2 + ky − Axk`2 ≤ 2ε

Cone constraint: x feasible gives kx + hk`1 ≤ kxk`1


kx + hk`1 = Σ_{i∈T0} |xi + hi | + Σ_{i∈T0c} |xi + hi |
          ≥ kxT0 k`1 − khT0 k`1 + khT0c k`1 − kxT0c k`1

and thus
khT0c k`1 ≤ khT0 k`1 + 2kxT0c k`1
Divide T0c into subsets of size s in decreasing order of magnitude of hT0c
T1 : indices of the s largest coefficients of hT0c ,
T2 : indices of the next s largest coefficients,
and so on...
For each j ≥ 2,

khTj k`2 ≤ s1/2 khTj k`∞ ≤ s−1/2 khTj−1 k`1

and thus

Σ_{j≥2} khTj k`2 ≤ s−1/2 (khT1 k`1 + khT2 k`1 + · · ·) ≤ s−1/2 khT0c k`1

This gives the useful estimate


kh(T0∪T1)c k`2 = k Σ_{j≥2} hTj k`2 ≤ Σ_{j≥2} khTj k`2 ≤ s−1/2 khT0c k`1

Combining this with the cone constraint and Cauchy-Schwarz (khT0 k`1 ≤ s1/2 khT0 k`2), it follows that

kh(T0∪T1)c k`2 ≤ khT0 k`2 + 2e0 ,    e0 ≡ s−1/2 kx − xs k`1    (1)


We bound khT0∪T1 k`2 by observing that AhT0∪T1 = Ah − Σ_{j≥2} AhTj , so

kAhT0∪T1 k²`2 = hAhT0∪T1 , Ahi − Σ_{j≥2} hAhT0∪T1 , AhTj i

The tube constraint and the restricted isometry property then give
|hAhT0∪T1 , Ahi| ≤ kAhT0∪T1 k`2 kAhk`2 ≤ 2ε √(1 + δ2s) khT0∪T1 k`2

Also
|hAhT0 , AhTj i| ≤ δ2s khT0 k`2 khTj k`2
and likewise for T1 in place of T0

Since khT0 k`2 + khT1 k`2 ≤ √2 khT0∪T1 k`2 (as T0 and T1 are disjoint),

(1 − δ2s) khT0∪T1 k²`2 ≤ kAhT0∪T1 k²`2
                       ≤ khT0∪T1 k`2 ( 2ε √(1 + δ2s) + √2 δ2s Σ_{j≥2} khTj k`2 )

⇒ khT0∪T1 k`2 ≤ α ε + ρ s−1/2 khT0c k`1 ,    α ≡ 2√(1 + δ2s) / (1 − δ2s),    ρ ≡ √2 δ2s / (1 − δ2s)
We now conclude from this that

khT0 ∪T1 k`2 ≤ αε + ρkhT0 ∪T1 k`2 + 2ρe0 ⇒ khT0 ∪T1 k`2 ≤ (1 − ρ)−1 (αε + 2ρe0 )

And finally,

khk`2 ≤ khT0∪T1 k`2 + kh(T0∪T1)c k`2 ≤ 2khT0∪T1 k`2 + 2e0    (using (1))
      ≤ 2(1 − ρ)−1 (αε + (1 + ρ)e0 )

which is what we needed to show


Recovery with `1 metric
We also claimed that in the noiseless case, kx̂ − xk`1 ≤ Ckx − xs k`1 . Why?

Lemma
Let h be any vector in the nullspace of A and let T0 be any set of cardinality s.
Then

khT0 k`1 ≤ ρ khT0c k`1 ,    ρ = √2 δ2s (1 − δ2s )−1 .    (2)

Recall that khT0 k`1 ≤ s1/2 khT0 k`2 ≤ s1/2 khT0 ∪T1 k`2 and
khT0 ∪T1 k`2 ≤ ρs−1/2 khT0c k`1 with ε = 0

Since khT0c k`1 ≤ khT0 k`1 + 2kxT0c k`1 (cone constraint) and khT0 k`1 ≤ ρ khT0c k`1 (the lemma), we have

khT0c k`1 ≤ 2(1 − ρ)−1 kxT0c k`1

Therefore,

khk`1 = khT0 k`1 + khT0c k`1 ≤ 2(1 + ρ)(1 − ρ)−1 kxT0c k`1
