Notes_Lecture 13_Regularization_LASSO and RIDGE Regression

The document discusses LASSO and Ridge regression as methods for statistical learning, highlighting the trade-off between prediction accuracy and model interpretability. Ridge regression shrinks coefficients towards zero to reduce variance, while LASSO can set some coefficients to zero, aiding in variable selection. Both methods utilize a tuning parameter to control the strength of their respective penalties.

Lecture 18(b): LASSO and Ridge regression


Foundations of Data Science:

Algorithms and Mathematical Foundations

Mihai Cucuringu
[email protected]

CDT in Mathematics of Random Systems


University of Oxford

28 September, 2023

Overview

Ridge regression

LASSO
The Trade-Off Between Prediction Accuracy and
Model Interpretability
▶ linear regression: fairly inflexible
▶ splines: considerably more flexible (can fit a much wider range of
possible shapes to estimate f )
Inference:
▶ linear model: easy to understand the relationship between Y and
X1 , X2 , . . . , Xp
Very flexible approaches (splines, SVM, etc.)
▶ can lead to complicated estimates of f
▶ hard to understand how any individual predictor is associated
with the response (less interpretable)

Example: LASSO
▶ less flexible
▶ linear model + sparsity of [β0 , β1 , . . . , βp ]
▶ more interpretable; only a small subset of predictors matter
Flexibility vs. Interpretability

[Figure (ISLR Figure 2.7): a representation of the trade-off between flexibility and interpretability, using different statistical learning methods. Interpretability (vertical axis, high to low) against flexibility (horizontal axis, low to high): Subset Selection and Lasso sit at high interpretability and low flexibility, followed by Least Squares, Generalized Additive Models, Trees, Bagging/Boosting, and Support Vector Machines at low interpretability and high flexibility. In general, as the flexibility of a method increases, its interpretability decreases.]
R²
▶ also called the coefficient of determination
▶ pronounced "R squared"
▶ gives the proportion of the variance in the dependent variable
that is predictable from the independent variable(s)

$$R^2 = \frac{TSS - RSS}{TSS}$$
where
$$RSS = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2, \qquad TSS = \sum_{i} (y_i - \bar{y})^2$$
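A quick numerical check of this decomposition (a minimal sketch, not from the slides; the toy data are simulated purely for illustration):

set.seed(1)
n = 100
x = rnorm(n)
y = 1 + 2*x + rnorm(n)          # toy linear relationship plus noise
fit = lm(y ~ x)
rss = sum(residuals(fit)^2)     # residual sum of squares
tss = sum((y - mean(y))^2)      # total sum of squares
(tss - rss)/tss                 # equals summary(fit)$r.squared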
Variable selection
Which predictors are associated with the response? (in order to fit a
single model involving only those d predictors)
▶ Note: R² always increases as you add more variables to the model
▶ adjusted R²: $1 - \frac{RSS/(n-p-1)}{TSS/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$
▶ Mallow's $C_p = \frac{1}{n}\,(RSS + 2p\hat{\sigma}^2)$
▶ Akaike Information Criterion: $AIC = \frac{1}{n\hat{\sigma}^2}\,(RSS + 2p\hat{\sigma}^2)$
Cannot consider all $2^p$ models...
▶ Best Subset Selection: fit a separate least squares regression for
each possible k -combination of the p predictors, and select the
best one
▶ Forward selection: start with the null model and keep adding
predictors one by one
▶ Backward selection: start with all variables in the model, and
remove the variable with the largest p-value
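A minimal sketch of stepwise selection in R, assuming a hypothetical data frame dat whose response column is named y; note that step() ranks candidate variables by AIC rather than by p-values:

null.fit = lm(y ~ 1, data = dat)                 # intercept-only (null) model
full.fit = lm(y ~ ., data = dat)                 # model with all predictors
fwd = step(null.fit, scope = formula(full.fit), direction = "forward")
bwd = step(full.fit, direction = "backward")     # backward elimination by AIC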
Prediction Accuracy
$$\mathrm{MSE} = E\big[(h(x^*) - \bar{h}(x^*))^2\big] + \big[f(x^*) - \bar{h}(x^*)\big]^2 + \mathrm{Var}[\epsilon]$$
x*: new data point, f: ground truth, h: our estimator
$$\mathrm{MSE} = \mathrm{Var}[h(x^*)] + \mathrm{Bias}\big(h(x^*)\big)^2 + \mathrm{Var}[\epsilon]$$

▶ if true relationship is ≈ linear, the OLS will have low bias


▶ if n >> p: OLS also has low variance, and performs well on Xtest
▶ if n ∼ p: OLS has high variability, leads to overfitting/poor
predictions on Xtest
▶ if n < p: OLS estimate is no longer unique!
Today:
▶ by shrinking the estimated coefficients, we can often substantially
reduce the variance at the cost of a negligible increase in bias
▶ can lead to substantial improvements in the accuracy with which
we can predict the response for Xtest
Model Interpretability

▶ some or most of the variables used in a multiple linear regression


may not be associated with the response

▶ excluding them from the fit leads to a model that is more easily
interpreted

Shrinkage/Regularization:
▶ by setting the corresponding coefficient estimates to zero — we
can obtain a model that is more easily interpreted

▶ approach for automatically performing feature/variable selection


and thus excluding irrelevant variables from a multiple regression
model
Variable selection

▶ Subset Selection: identify a subset of p predictors that best relate


to the response, and perform OLS on them

▶ Shrinkage/Regularization: fit a model involving all p predictors,


but the estimated coefficients are shrunken towards zero, or end
up even equal to zero

▶ Dimensionality Reduction: first project the p predictors into a


d-dimensional subspace, with d < p. The d linear combinations,
or projections are subsequently used as predictors in OLS
(principal component regression PCR)
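A minimal sketch of principal component regression (not from the slides; assumes a predictor matrix X and response y are available, and d is chosen by the user):

d = 5                                    # number of components to keep (assumed)
pc = prcomp(X, center = TRUE, scale. = TRUE)
Z = pc$x[, 1:d]                          # n x d matrix of principal component scores
pcr.fit = lm(y ~ Z)                      # OLS on the d projections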
Shrinkage Methods

▶ fit a model containing all p predictors using a technique that


constrains or regularizes the coefficient estimates, or
equivalently, that shrinks the coefficient estimates towards zero

▶ shrinking the coefficient estimates can significantly reduce their


variance

▶ the two best-known techniques for shrinking the regression


coefficients towards zero are
▶ ridge regression
▶ lasso regression

See Section 6.2 in the ISLR textbook.


Regularization penalty

Idea: impose an ℓq penalty on the vector of beta coefficients, to promote shrinking them towards zero.

[Figure: contours of the ℓq penalty for several values of q.]

Credit: Peter Gerstoft
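Written out (a standard formulation, stated here for reference rather than taken from the slide), the penalized criterion is
$$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \; ||y - X\beta||_2^2 + \lambda \sum_{j=1}^{p} |\beta_j|^q,$$
with q = 2 giving ridge regression and q = 1 giving the LASSO.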


Ridge Regression
Recall: OLS estimates β0, β1, . . . , βp such that it minimizes
$$RSS = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2$$
Ridge regression shrinks β1, . . . , βp towards zero. Given a response vector $y \in \mathbb{R}^n$ and a predictor matrix $X \in \mathbb{R}^{n \times p}$,
$$\hat{\beta}^{(\mathrm{ridge})} = \arg\min_{\beta \in \mathbb{R}^p} \underbrace{\sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2}_{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2$$
$$= \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} (y_i - x_i^T \beta)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$
$$= \arg\min_{\beta \in \mathbb{R}^p} \underbrace{||y - X\beta||_2^2}_{\text{Loss}} + \underbrace{\lambda\,||\beta||_2^2}_{\text{Penalty}}$$
$$\hat{\beta}^{(\mathrm{ridge})} = \arg\min_{\beta \in \mathbb{R}^p} \underbrace{||y - X\beta||_2^2}_{\text{Loss}} + \underbrace{\lambda\,||\beta||_2^2}_{\text{Penalty}}$$
$$\hat{\beta}^{(\mathrm{ridge})} = (X^T X + \lambda I)^{-1} X^T y$$

Here λ ≥ 0 is a tuning parameter


▶ controls the strength of the penalty term

▶ λ = 0 recovers the linear regression estimate

▶ λ = ∞ leads to β̂ (ridge) = 0

▶ λ ∈ (0, ∞) trades off two ideas: fitting a linear model of y on X


versus shrinking the coefficients
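The closed-form solution above is easy to check numerically (a minimal sketch, assuming a centred predictor matrix X and response y with no intercept term):

p = ncol(X)
lambda = 1                                # tuning parameter (arbitrary value)
# ridge estimate from the closed form (X'X + lambda*I)^{-1} X'y
b.ridge = solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
b.ols = solve(crossprod(X), crossprod(X, y))   # lambda = 0 recovers OLS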
Experimental setup
Given fixed covariates xᵢ ∈ Rᵖ, i = 1, . . . , n
We observe:
▶ yᵢ = f(xᵢ) + εᵢ, i = 1, . . . , n
▶ for a linear model f(xᵢ) = xᵢᵀβ
▶ εᵢ ∈ R
▶ E[εᵢ] = 0
▶ Var[εᵢ] = σ²
▶ Cov(εᵢ, εⱼ) = 0 for i ≠ j
Experimental setup
▶ n = 50, p = 30, and σ² = 1
▶ The true model is linear with
▶ 10 large coefficients (between 0.5 and 1) and
▶ 20 small ones (between 0 and 0.3)
▶ Histogram of true coefficients

Source: R. Tibshirani
▶ the linear regression fit yields:


▶ Squared bias ≈ 0.006
▶ Variance ≈ 0.627
▶ Pred. error ≈ 1 + 0.006 + 0.627 ≈ 1.633
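One way to reproduce this kind of setup (a sketch; the exact coefficient values used on the slide are not given, so they are drawn at random within the stated ranges):

set.seed(0)
n = 50; p = 30; sigma = 1
beta = c(runif(10, 0.5, 1), runif(20, 0, 0.3))   # 10 large, 20 small coefficients
X = matrix(rnorm(n * p), n, p)
y = drop(X %*% beta + sigma * rnorm(n))
fit.ols = lm(y ~ X + 0)                          # OLS fit, no intercept
# estimating squared bias / variance as on the slide would require repeating
# this over many simulated data sets (a Monte Carlo loop)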
Improved prediction via shrinking

                 Linear Regression        Ridge Reg. (at its best)
Squared bias     ≈ 0.006                  ≈ 0.077
Variance         ≈ 0.627                  ≈ 0.403
Pred. error      ≈ 1 + 0.006 + 0.627      ≈ 1 + 0.077 + 0.403
                 ≈ 1.633                  ≈ 1.48
Ridge regression in R

The function lm.ridge in the package MASS:

▶ lambdas = seq(0, 25, length = 100)

▶ aa = lm.ridge(y ~ x + 0, lambda = lambdas)

▶ b.ridge = coef(aa)

▶ fit.ridge = b.ridge %*% t(x)

The glmnet function/package is also available in R.
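For instance (a sketch, not from the slides), ridge regression in glmnet corresponds to alpha = 0; note that glmnet standardizes the predictors and scales lambda differently from lm.ridge, so the two fits are not directly comparable:

library(glmnet)
ridge.fit = glmnet(x, y, alpha = 0)      # alpha = 0: ridge penalty; alpha = 1: LASSO
coef(ridge.fit)                          # coefficients along glmnet's lambda grid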


Bias and variance of ridge regression

$$\hat{\beta}^{(\mathrm{ridge})} = \arg\min_{\beta \in \mathbb{R}^p} \underbrace{||y - X\beta||_2^2}_{\text{Loss}} + \underbrace{\lambda\,||\beta||_2^2}_{\text{Penalty}}$$

Bias and variance:


▶ not as simple to derive for ridge regression as they are for linear
regression
▶ but closed-form expressions are still possible
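For reference, the standard closed-form expressions under the fixed-design model $y = X\beta + \epsilon$ with $\mathrm{Var}[\epsilon] = \sigma^2 I$ (stated here as a supplement rather than taken from the slide) are
$$E[\hat{\beta}^{(\mathrm{ridge})}] = (X^T X + \lambda I)^{-1} X^T X\,\beta, \qquad \mathrm{Var}[\hat{\beta}^{(\mathrm{ridge})}] = \sigma^2 (X^T X + \lambda I)^{-1} X^T X\,(X^T X + \lambda I)^{-1},$$
so the bias is $-\lambda (X^T X + \lambda I)^{-1}\beta$, which grows in magnitude with λ while the variance shrinks.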

The general trend is:


▶ The bias increases as λ increases
▶ The variance decreases as λ increases
Bias and variance of ridge regression [figure]

Mean squared error (MSE), bias and variance [figure]
Recap: ridge regression
▶ minimizes the usual regression criterion plus a penalty term on
the squared ℓ2 norm of the coefficient vector
▶ shrinks the coefficients towards zero
▶ introduces some bias
▶ but can greatly reduce the variance
▶ overall, it results in a better mean-squared error
▶ the amount of shrinkage is controlled by λ
▶ performs particularly well when there is a subset of true
coefficients that are small or even zero
▶ not as great when all of the true coefficients are moderately large
(can still outperform OLS over a pretty narrow range of (small) λ
values)
▶ does NOT set coefficients to zero exactly, and therefore cannot
perform variable selection in the linear model
LASSO
Recall OLS estimates β0, β1, . . . , βp such that it minimizes
$$RSS = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2$$
LASSO sets some of the coefficients β1, . . . , βp to zero. Given a response vector $y \in \mathbb{R}^n$ and a predictor matrix $X \in \mathbb{R}^{n \times p}$,
$$\hat{\beta}^{(\mathrm{lasso})} = \arg\min_{\beta \in \mathbb{R}^p} \underbrace{\sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2}_{RSS} + \underbrace{\lambda \sum_{j=1}^{p} |\beta_j|}_{\text{Penalty}}$$
$$= \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} (y_i - x_i^T \beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
$$= \arg\min_{\beta \in \mathbb{R}^p} \underbrace{||y - X\beta||_2^2}_{\text{Loss}} + \underbrace{\lambda\,||\beta||_1}_{\text{Penalty}}$$
$$\arg\min_{\beta \in \mathbb{R}^p} \underbrace{||y - X\beta||_2^2}_{\text{Loss}} + \underbrace{\lambda\,||\beta||_1}_{\text{Penalty}}$$

• The tuning parameter λ controls the strength of the penalty, and (like
ridge regression), we get
▶ β̂ (lasso) = the usual OLS estimator, whenever λ = 0
▶ β̂ (lasso) = 0, whenever λ = ∞
For λ ∈ (0, ∞), we are balancing the trade-offs:
▶ fitting a linear model of y on X
▶ shrinking the coefficients; but the nature of the l1 penalty causes
some coefficients to be shrunken to zero exactly
LASSO (vs. Ridge):
▶ LASSO performs variable selection in the linear model
▶ has no closed-form solution (various optimization techniques are
employed)
▶ as λ increases, more coefficients are set to zero (fewer variables are selected), and among the nonzero coefficients, more shrinkage is applied
Ridge: coefficient paths [figure]

LASSO: coefficient paths [figure]
Fitting LASSO models in R with the glmnet package
▶ Lasso and Elastic-Net Regularized Generalized Linear Models
▶ fits a wide variety of models (linear models, generalized linear
models, multinomial models) with LASSO penalties
▶ the syntax is fairly straightforward, though it differs from lm in that
it requires you to form your own design matrix:
fit = glmnet(X, y)
▶ the package also allows you to conveniently carry out
cross-validation:
cvfit = cv.glmnet(X, y); plot(cvfit);
▶ prediction with cross validation. Example:
X = matrix(rnorm(100*20), 100, 20)
y = rnorm(100)
cv.fit = cv.glmnet(X, y)
yhat = predict(cv.fit, newx=X[1:5,])
coef(cv.fit)
coef(cv.fit, s = "lambda.min")
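To reproduce coefficient-path plots like the ones on the two previous slides (a sketch reusing the X and y from the example above):

lasso.fit = glmnet(X, y)                          # alpha = 1 (LASSO) is the default
plot(lasso.fit, xvar = "lambda", label = TRUE)    # one curve per coefficient
ridge.fit = glmnet(X, y, alpha = 0)
plot(ridge.fit, xvar = "lambda", label = TRUE)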
Elastic net - the best of both worlds
Elastic Net combines the penalties of Ridge and LASSO.

$$\hat{\beta}^{(\mathrm{elastic\ net})} = \arg\min_{\beta \in \mathbb{R}^p} \underbrace{||y - X\beta||_2^2}_{\text{Loss}} + \underbrace{\lambda_1 ||\beta||_1}_{\text{Penalty}} + \underbrace{\lambda_2 ||\beta||_2^2}_{\text{Penalty}}$$

Addresses several shortcomings of LASSO:


▶ for n < p (more covariates/features than samples) LASSO can
select only n covariates (even if more are truly associated with
the response)
▶ it tends to select only one covariate from any set of highly
correlated covariates
▶ for n > p, if the covariates are strongly correlated, Ridge tends to
perform better
Elastic Net:
▶ highly correlated covariates will tend to have similar regression
coefficients (desirable grouping effect)
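In glmnet, an elastic-net fit is obtained by setting alpha strictly between 0 and 1 (a sketch; note that glmnet parameterizes the penalty with a single lambda and a mixing weight alpha rather than separate λ1 and λ2):

enet.cv = cv.glmnet(X, y, alpha = 0.5)   # equal mix of l1 and squared l2 penalties
coef(enet.cv, s = "lambda.min")          # coefficients at the CV-selected lambda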
Simpson’s paradox - beware!
A phenomenon in statistics in which trends that appear when a dataset is separated into groups are reversed when the data are aggregated.

▶ can be resolved when confounding variables and causal relations


are appropriately addressed in the statistical modeling
▶ an example of the misleading results that the misuse of statistics can generate
Source: Wiki
