Lecture 24
1 Outline
1. Lasso
2. ℓ1–ℓ0 equivalence
2 Model selection criteria
Classical model selection criteria minimize the ℓ0-penalized objective ‖y − Xβ̂‖² + λ²σ²‖β̂‖0 with different choices of the multiplier λ²:
• AIC: λ² = 2
• BIC: λ² = log n
• RIC: λ² ≈ 2 log p
The critical problem shared by these methods is that finding the minimum requires an exhaustive
search over all possible models. This search is completely intractable for even moderate values of
p.
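For concreteness, here is a minimal NumPy sketch of what the exhaustive search entails (best_subset is an illustrative name, and the penalized score ‖y − XS b‖² + λ²σ²|S| is the criterion assumed above): the loop visits all 2^p subsets, which already exceeds a billion models for p = 30.

```python
import itertools
import numpy as np

def best_subset(X, y, lam2, sigma2):
    """Brute-force l0-penalized model selection: minimize
    ||y - X_S b||^2 + lam2 * sigma2 * |S| over all subsets S of the p columns."""
    n, p = X.shape
    best_score, best_support = np.sum(y ** 2), ()         # score of the empty model
    for size in range(1, p + 1):
        for S in itertools.combinations(range(p), size):  # loops over all nonempty subsets
            cols = list(S)
            b, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ b) ** 2)
            score = rss + lam2 * sigma2 * size
            if score < best_score:
                best_score, best_support = score, S
    return best_support, best_score
```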
3 Lasso
The Lasso replaces the combinatorial ℓ0 penalty with an ℓ1 penalty:
β̂ = argminβ ½ ‖y − Xβ‖² + λσ ‖β‖1 .
This is a quadratic program which can be solved efficiently.
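For illustration, here is a minimal proximal-gradient (ISTA) sketch of this optimization, assuming NumPy; lasso_ista and soft_threshold are illustrative names, and lam plays the role of λσ above. In practice one would use a dedicated solver (e.g., coordinate descent).

```python
import numpy as np

def soft_threshold(x, t):
    """Entrywise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by proximal gradient."""
    L = np.linalg.norm(X, 2) ** 2              # Lipschitz constant of the smooth part
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)            # gradient of 0.5 * ||y - X beta||^2
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta
```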
ℓ1 regularization has a long history in statistics and applied science. The idea appeared in early work on reflection seismology, where one tries to image the earth using sound waves. From the reflected waves, one receives a time series of signals which is noisy and low-frequency. In order to detect where the “jumps” in the earth's structure are, people came up with the idea of using the ℓ1 norm.
• Logan (1956)
• Tibshirani (1996)
3.1 Why ℓ1?
Consider the noiseless setting in which we observe
y = Xβ,
an underdetermined system of equations in the case where we have more variables than observations, i.e., p > n.
In this setting, there is no unique solution to the least squares problem. However, by assuming that
β is sparse, we can reduce our search to sparse solutions, i.e., we wish to find the sparsest solution
to y = Xβ by solving
min ‖β̂‖0 subject to y = Xβ̂ .
Under some conditions, the minimizer is unique, i.e., β̂ = β.
Even so, this doesn't address the issue of computational tractability: the optimization problem as stated remains intractable. This motivates the convex relaxation
min ‖β̂‖1 subject to y = Xβ̂ .
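This relaxed problem can be recast as a linear program by splitting β into its positive and negative parts. A minimal sketch assuming SciPy (basis_pursuit is an illustrative name, not from the notes):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X, y):
    """Solve min ||b||_1 subject to X b = y. Writing b = u - v with u, v >= 0,
    the objective becomes 1'(u + v) and the constraint [X, -X][u; v] = y."""
    n, p = X.shape
    res = linprog(c=np.ones(2 * p),
                  A_eq=np.hstack([X, -X]), b_eq=y,
                  bounds=[(0, None)] * (2 * p), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v
```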
Figure 1: Geometry of the ℓ1 optimization problem in the noiseless setting. To understand the picture, suppose β̂ is a feasible point (Xβ̂ = Xβ = y). Consider ‘descent’ vectors u such that ‖β̂ + u‖1 < ‖β̂‖1 . It is easy to see that β̂ is the solution to the ℓ1 optimization problem if Xu ≠ 0 for all descent vectors u, i.e., if all vectors with smaller ℓ1 norm are non-feasible. (a) The solution here is the true β, because all other feasible vectors have larger ℓ1 norm. (b) Here, β̂ ≠ β. Starting at β, we can consider the descent direction u as shown. The point β + u remains feasible and has ℓ1 norm strictly smaller than that of β.
Rigorous results pertaining to the formal equivalence between the ℓ0 and ℓ1 problems can be found in
• Donoho (2004)
[Illustration: Why an ℓ1 objective? The ℓ1 norm is the closest convex approximation to the ℓ0 quasi-norm. Panels: (a) the ℓ0 quasi-norm, (b) the ℓ1 norm, (c) the ℓ2 norm.]
ℓ1 is the tightest convex relaxation of ℓ0 . One way of seeing this is to look at the model
µ = β1 X1 + β2 X2 + · · · + βp Xp ,
where β is sparse. The ‘building blocks’ of sparse solutions are vectors of the form
(0, . . . , 0, ±1, 0, . . . , 0),
and the ℓ1 ball is the smallest convex body that contains these building blocks. It is the convex hull of these points.
Another interpretation is this: the ℓ1 norm is the best convex lower bound to the cardinality functional over the unit cube. That is to say, if we search for a convex function f such that
f(β) ≤ ‖β‖0 for all β ∈ [−1, 1]^p ,
then the tightest lower bound is the ℓ1 norm; see Figure 2.
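To check that ℓ1 is indeed such a lower bound, note that for any β ∈ [−1, 1]^p with support S = {i1, . . . , ik}, each |βi| ≤ 1, so
‖β‖1 = |βi1| + · · · + |βik| ≤ k = ‖β‖0 ,
with equality when every nonzero entry of β equals ±1, i.e., at the ‘building blocks’ above.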
As we will illustrate, we cannot hope to achieve comparable optimality results for the Lasso.
Case study 1: Consider the design X = [In x], where the extra column is x = (ε, . . . , ε, 1) for a small ε > 0; the last row of X is thus (0, 0, . . . , 1, 1). In this case, p = n + 1 so the system is underdetermined. Suppose that µ is constructed with
β = (1/ε) · (0, . . . , 0, −1, 1) so that µ = Xβ = (1, . . . , 1, 0).
• The ℓ1 solution is
β̂ = (1, . . . , 1, 0, 0) ≠ β
in the case where ε < 2/(n − 1). So ℓ1 does not recover the right solution, as the numerical sketch below confirms.
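A quick numerical check, assuming NumPy and reusing the basis_pursuit sketch from earlier (the values of n and ε are chosen purely for illustration):

```python
import numpy as np

n, eps = 20, 0.05                         # eps < 2/(n - 1): l1 prefers the dense fit
x_extra = np.full(n, eps)
x_extra[-1] = 1.0                         # extra column (eps, ..., eps, 1)
X = np.hstack([np.eye(n), x_extra[:, None]])

beta = np.zeros(n + 1)
beta[n - 1], beta[n] = -1 / eps, 1 / eps  # true beta = (1/eps) * (0, ..., 0, -1, 1)
y = X @ beta                              # equals (1, ..., 1, 0)

beta_hat = basis_pursuit(X, y)            # from the LP sketch above
print(np.round(beta_hat, 3))              # ~ (1, ..., 1, 0, 0): not the true beta
```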
Now consider the noisy setting, where we observe
yi = 1(i ≠ n) + 0.1 zi , i = 1, . . . , n,
with the zi i.i.d. standard normal (so σ = 0.1).
It can be shown that for all reasonable values of λ, the Lasso solution is given by soft-thresholding y, i.e.,
β̂i = (yi − λσ)+ , i = 1, . . . , n − 1, and β̂i = 0, i = n, n + 1,
and that
µ̂i = (yi − λσ)+ , i = 1, . . . , n − 1, and µ̂n = 0.
Consequently, the lasso has a squared bias of roughly (n − 1)λ²σ² and a variance of (n − 1)σ². We would be better off just plugging in µ̂ = y and paying only the variance.
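A small simulation sketch of this calculation, assuming NumPy; the particular value λ = √(2 log p) is an illustrative RIC-style choice, not fixed by the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 1000, 0.1
lam = np.sqrt(2 * np.log(n + 1))             # illustrative RIC-style lambda

mu = np.ones(n)
mu[-1] = 0.0                                 # true mean: mu_i = 1(i != n)
y = mu + sigma * rng.standard_normal(n)

mu_hat = np.maximum(y - lam * sigma, 0.0)    # soft-thresholding formula above
mu_hat[-1] = 0.0

mse = np.sum((mu_hat - mu) ** 2)
approx = (n - 1) * (lam * sigma) ** 2 + (n - 1) * sigma ** 2   # bias^2 + variance
print(mse, approx, 2 * sigma ** 2)           # lasso MSE vs. the ideal ~ 2 sigma^2
```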
The empirical results in the noisy setting are shown in Figure 3.
[Figure 3: two panels, “Lasso estimated mean response” (left) and “Ideal estimated mean response” (right), each plotting the true mean and the estimated mean against the coordinate index 1, . . . , 1000.]
Figure 3: Graphical representation of the example above with n = 1,000, p = 1,001 and σ = 0.1. The lasso estimate fits Xβ̂ = µ̂ = (y − 0.1λ)+ , which results in a large variance and a large bias. The ideal estimate uses only two predictors. The ratio between the lasso MSE and the ideal MSE is over 6,500 ≈ 6.5n ≈ 6.5p.
Summary: This is an instance where the ℓ1 solution is irreparably worse than the ℓ0 solution. Indeed, with ℓ0 we achieve
E‖Xβ̂ − Xβ‖² ≈ 2σ²,
while for the Lasso (no matter the value of λ),
E‖Xβ̂ − Xβ‖² ≳ nσ².
Take-away message: Even in the noiseless case, correlation between the predictors causes severe problems for the Lasso.
Next we will show another case where the Lasso does not work well even when the correlation between the columns of X is very low.
Case study 2: X = [In F2:n ], where F2:n is the discrete cosine transform matrix excluding the first column (the DC component). Specifically, F2:n = [ϕ1 ϕ2 · · · ϕn−1 ], i.e., its (t, k) entry is ϕk(t), where
ϕ2k−1(t) = √(2/n) cos(2πkt/n), k = 1, 2, . . . , n/2 − 1,
ϕ2k(t) = √(2/n) sin(2πkt/n), k = 1, 2, . . . , n/2 − 1,
ϕn−1(t) = (−1)^t /√n.
In this case, p = 2n − 1. The maximum correlation between columns of X is µ(X) = √(2/n), which shows that the columns of X are extremely incoherent. Moreover, ‖x‖2 ≤ 2.
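A sketch of this construction together with a numerical check of the coherence claim, assuming NumPy (dct_design is an illustrative name):

```python
import numpy as np

def dct_design(n):
    """Build X = [I_n  F_{2:n}] from the cosine/sine system defined above (n even)."""
    t = np.arange(1, n + 1)
    cols = []
    for k in range(1, n // 2):                                         # k = 1, ..., n/2 - 1
        cols.append(np.sqrt(2.0 / n) * np.cos(2 * np.pi * k * t / n))  # phi_{2k-1}
        cols.append(np.sqrt(2.0 / n) * np.sin(2 * np.pi * k * t / n))  # phi_{2k}
    cols.append((-1.0) ** t / np.sqrt(n))                              # phi_{n-1}
    return np.hstack([np.eye(n), np.column_stack(cols)])

X = dct_design(256)
gram = np.abs(X.T @ X) - np.eye(X.shape[1])   # off-diagonal inner products
print(X.shape, gram.max(), np.sqrt(2 / 256))  # p = 511 and coherence ~ sqrt(2/n)
```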
Then we can construct f = Xβ, each coordinate of which equals 1, from a sparse β (shown in Figure 4). In the noisy setting, the noisy observation Xβ + z, the Lasso estimate β̂, and the fit Xβ̂ are shown in Figure 5.
Figure 5: Graphical representation of the results in case study 2 with n = 256, p = 511. The lasso
estimate β̂ has far more non-zero entries than β and results in a large MSE.
The Lasso MSE, ‖f̂ − f‖² ∼ nσ²(1 + λ²), is horrible. The ℓ0 norm of β̂, |supp(β̂)| = n, is much larger than that of β, |supp(β)| = (3/2)√n.