
Assignment 6: Linear Model Selection

SDS293 - Machine Learning

Due: 1 November 2017 by 11:59pm

Conceptual Exercises
6.8.2 (p. 259 ISLR)
For each of the following, indicate whether the method is more or less flexible than least
squares. Describe how each method’s trade-off between bias and variance impacts its prediction
accuracy. Justify your answers.

(a) The lasso

Solution: The lasso places a budget constraint on the least squares coefficients and is therefore
less flexible than least squares. The lasso will have improved prediction accuracy when its
increase in bias is less than its decrease in variance.
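
For reference, the budget-constraint form of the lasso (cf. ISLR Section 6.2.2) makes the "budget" explicit:

$$\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j|\le s$$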

(b) Ridge regression

Solution: For the same reason as above, this method is also less flexible. Ridge regression
will have improved prediction accuracy when its increase in bias is less than its decrease in
variance.
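
The analogous constrained form for ridge regression simply swaps the ℓ1 budget for an ℓ2 budget:

$$\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p}\beta_j^2\le s$$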

(c) Non-linear methods (PCR and PLS)

Solution: Non-linear methods are more flexible and will give improved prediction accuracy
when their increase in variance is less than their decrease in bias.
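
All three answers rest on the bias-variance decomposition of the expected test MSE (cf. ISLR Section 2.2.2):

$$E\big[(y_0-\hat f(x_0))^2\big]=\operatorname{Var}\big(\hat f(x_0)\big)+\big[\operatorname{Bias}\big(\hat f(x_0)\big)\big]^2+\operatorname{Var}(\varepsilon)$$

A restriction on flexibility pays off only when the reduction in variance outweighs the increase in squared bias, and the reverse holds for more flexible methods.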

6.8.5 (p. 261)


Ridge regression tends to give similar coefficient values to correlated variables, whereas the lasso
may give quite different coefficient values to correlated variables. We will now explore this property
in a very simple setting.

Suppose that n = 2, p = 2, x11 = x12, and x21 = x22. Furthermore, suppose that y1 + y2 = 0,
x11 + x21 = 0, and x12 + x22 = 0, so that the estimate for the intercept in a least squares, ridge
regression, or lasso model is zero: β̂0 = 0.

(a) Write out the ridge regression optimization problem in this setting.

Solution: In general, the ridge regression optimization problem is:

min [ Σ_{i=1}^{n} (yi − β̂0 − Σ_{j=1}^{p} β̂j xij)² + λ Σ_{j=1}^{p} β̂j² ]

In this case, β̂0 = 0 and n = p = 2, so the optimization simplifies to:

min [ (y1 − β̂1x11 − β̂2x12)² + (y2 − β̂1x21 − β̂2x22)² + λ(β̂1² + β̂2²) ]

(b) Argue that in this setting, the ridge coefficient estimates satisfy β̂1 = β̂2 .

Solution: We know the following: x11 = x12 , so we’ll call that x1 , and x21 = x22 , so we’ll
call that x2 . Plugging this into the above, we get:
min [ (y1 − β̂1x1 − β̂2x1)² + (y2 − β̂1x2 − β̂2x2)² + λ(β̂1² + β̂2²) ]

Taking the partial derivatives of the above with respect to β̂1 and β̂2 and setting them equal
to 0 gives the point at which the function is minimized. Doing this, we find:

β̂1(x1² + x2² + λ) + β̂2(x1² + x2²) − y1x1 − y2x2 = 0

and

β̂1(x1² + x2²) + β̂2(x1² + x2² + λ) − y1x1 − y2x2 = 0

Since the right-hand sides of both equations are identical, we can set the two left-hand sides
equal to one another:

β̂1(x1² + x2² + λ) + β̂2(x1² + x2²) − y1x1 − y2x2 = β̂1(x1² + x2²) + β̂2(x1² + x2² + λ) − y1x1 − y2x2

and then cancel the common terms step by step:

β̂1(x1² + x2²) + β̂1λ + β̂2(x1² + x2²) = β̂1(x1² + x2²) + β̂2(x1² + x2²) + β̂2λ
β̂1λ + β̂2(x1² + x2²) = β̂2(x1² + x2²) + β̂2λ
β̂1λ = β̂2λ
Thus, β̂1 = β̂2 (for λ > 0).
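
As a quick numerical check, the ridge solution can be computed directly from the normal equations (XᵀX + λI)β̂ = Xᵀy for a toy data set satisfying the conditions above; the specific numbers and λ below are assumptions chosen for illustration, and both coefficients come out equal:

# Toy data with x11 = x12, x21 = x22, y1 + y2 = 0, x11 + x21 = 0
X = matrix(c(1, -1,    # column 1: x11, x21
             1, -1),   # column 2: x12, x22
           nrow = 2)
y = c(2, -2)
lambda = 1

# Ridge estimate (intercept is zero here): solve (X'X + lambda*I) beta = X'y
beta_ridge = solve(t(X) %*% X + lambda * diag(2), t(X) %*% y)
beta_ridge  # both entries are identical, as argued above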

(c) Write out the lasso optimization problem in this setting.

Solution:
min [ (y1 − β̂1x11 − β̂2x12)² + (y2 − β̂1x21 − β̂2x22)² + λ(|β̂1| + |β̂2|) ]

(d) Argue that in this setting, the lasso coefficients β̂1 and β̂2 are not unique – in other words,
there are many possible solutions to the optimization problem in (c). Describe these solutions.

Solution: One way to demonstrate that these solutions are not unique is to make a geometric
argument. To make things easier, we’ll use the alternate form of the lasso constraint that we
saw in class, namely: |β̂1| + |β̂2| ≤ s. Plotted in the (β̂1, β̂2) plane, this constraint takes the
familiar shape of a diamond centered at the origin (0, 0).

Next we’ll consider the squared-error part of the objective (the RSS), namely:

(y1 − β̂1x11 − β̂2x12)² + (y2 − β̂1x21 − β̂2x22)²

Using the facts we were given (x12 = x11, x22 = x21, y2 = −y1, and x21 = −x11), the RSS
simplifies to:

min [ 2(y1 − (β̂1 + β̂2)x11)² ]

This is minimized whenever β̂1 + β̂2 = y1/x11, which defines a line parallel to one edge of the
lasso diamond, β̂1 + β̂2 = s.

The contours of the RSS are therefore lines of constant β̂1 + β̂2, parallel to that edge. When the
budget s binds, the lowest contour that reaches the constraint region meets it along the entire
edge β̂1 + β̂2 = s, so every point on that edge solves the lasso optimization problem!

A similar argument holds for the opposite edge of the diamond, defined by β̂1 + β̂2 = −s.

Thus, the Lasso coefficients are not unique. The general form of solution can be given by
two line segments:

β̂1 + β̂2 = s; β̂1 ≥ 0; β̂2 ≥ 0 and β̂1 + β̂2 = −s; β̂1 ≤ 0; β̂2 ≤ 0
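
As a quick numerical sanity check, the lasso objective can be evaluated along one edge of the diamond for a toy data set satisfying the conditions above; the numbers, the value of λ, and the budget s below are assumptions chosen for illustration. The objective is constant along the whole edge:

# Toy data with x11 = x12, x21 = x22, y1 + y2 = 0, x11 + x21 = 0
x11 = 1; x21 = -1
y1  = 2; y2  = -2
lambda = 1

# Lasso objective in this setting (intercept is zero)
lasso_obj = function(b1, b2) {
  (y1 - b1 * x11 - b2 * x11)^2 +
    (y2 - b1 * x21 - b2 * x21)^2 +
    lambda * (abs(b1) + abs(b2))
}

# Evaluate along the edge b1 + b2 = s with b1, b2 >= 0
s = 1.75
b1_vals = seq(0, s, length.out = 5)
sapply(b1_vals, function(b1) lasso_obj(b1, s - b1))
# Every point on the edge gives the same objective value,
# so no single (b1, b2) on the edge is preferred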

Applied Exercises
6.8.9 (p. 263 ISLR)
In this exercise, we will predict the number of applications received using the other variables in the
College data set. For consistency, please use set.seed(11) before beginning.

(a) Split the data set into a training set and a test set.

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report
the test error obtained.

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error
obtained, along with the number of non-zero coefficient estimates.

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test
error obtained, along with the value of M selected by cross-validation.

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test
error obtained, along with the value of M selected by cross-validation.

(g) Comment on the results you obtained. How accurately can we predict the number of college
applications received? Is there much difference among the test errors resulting from these five
approaches?

A6 Applied Solutions
6.8.9 (a)

library(ISLR)
library(dplyr)

Check to make sure we don’t have any null values

sum(is.na(College))

## [1] 0

Split the data set into a training set and a test set.

set.seed(1)
train = College %>%
  sample_frac(0.5)

test = College %>%
  setdiff(train)

6.8.9 (b)

Fit a linear model using least squares on the training set, and report the test error obtained.

lm_fit = lm(Apps~., data = train)

lm_pred = predict(lm_fit, test)
mean((test[, "Apps"] - lm_pred)^2)

## [1] 1108531

6.8.9 (c)

Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error
obtained.

library(glmnet)
# Build model matrices for
# test and training data
train_mat = model.matrix(Apps~., data = train)
test_mat = model.matrix(Apps~., data = test)

# Find best lambda using cross-validation;
# alpha = 0 --> use ridge regression
grid = 10 ^ seq(4, -2, length=100)
mod_ridge = cv.glmnet(train_mat, train[, "Apps"], alpha = 0, lambda = grid, thresh = 1e-12)

lambda_best_ridge = mod_ridge$lambda.min

# Predict on test data, report error
ridge_pred = predict(mod_ridge, newx = test_mat, s = lambda_best_ridge)
mean((test[, "Apps"] - ridge_pred)^2)

## [1] 1108512

6.8.9 (d)

Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along
with the number of non-zero coefficient estimates.

# Find best lambda using cross-validation;
# alpha = 1 --> use lasso
mod_lasso = cv.glmnet(train_mat, train[, "Apps"], alpha = 1, lambda = grid, thresh = 1e-12)
lambda_best_lasso = mod_lasso$lambda.min

# Predict on test data, report error
lasso_pred = predict(mod_lasso, newx = test_mat, s = lambda_best_lasso)
mean((test[, "Apps"] - lasso_pred)^2)

## [1] 1028718

predict(mod_lasso, newx = test_mat, s = lambda_best_lasso, type="coefficients")

## 19 x 1 sparse Matrix of class "dgCMatrix"
##                            1
## (Intercept) -4.248125e+02
## (Intercept) .
## PrivateYes -4.955003e+02
## Accept 1.540306e+00
## Enroll -3.900157e-01
## Top10perc 4.779689e+01
## Top25perc -7.926581e+00
## F.Undergrad -9.846932e-03
## P.Undergrad .
## Outstate -5.231286e-02
## Room.Board 1.880308e-01
## Books 1.265938e-03
## Personal .
## PhD -4.137294e+00
## Terminal -3.184316e+00
## S.F.Ratio .
## perc.alumni -2.181304e+00
## Expend 3.193679e-02
## Grad.Rate 2.877667e+00
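
Parts (e) and (f) are not shown above. A minimal sketch using the pls package, assuming the same train/test split, might look like the following; the number of components M would be read off the cross-validation output, so no test errors are reported here:

library(pls)

# 6.8.9 (e): PCR with M chosen by cross-validation
pcr_fit = pcr(Apps~., data = train, scale = TRUE, validation = "CV")
validationplot(pcr_fit, val.type = "MSEP")        # pick M at the lowest CV error
# pcr_pred = predict(pcr_fit, test, ncomp = M)    # M = value chosen above (placeholder)
# mean((test[, "Apps"] - pcr_pred)^2)

# 6.8.9 (f): PLS with M chosen by cross-validation
pls_fit = plsr(Apps~., data = train, scale = TRUE, validation = "CV")
validationplot(pls_fit, val.type = "MSEP")
# pls_pred = predict(pls_fit, test, ncomp = M)
# mean((test[, "Apps"] - pls_pred)^2)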

6.8.9 (g)

Results for OLS, ridge, and lasso are comparable. The lasso shrinks the P.Undergrad, Personal, and S.F.Ratio
coefficients to zero and shrinks the coefficients of the other variables. Below are the test R² values for all three models.

test_avg = mean(test[, "Apps"])

lm_test_r2 = 1 - mean((test[, "Apps"] - lm_pred)^2) /mean((test[, "Apps"] - test_avg)^2)

ridge_test_r2 = 1 - mean((test[, "Apps"] - ridge_pred)^2) /mean((test[, "Apps"] - test_avg)^2)

lasso_test_r2 = 1 - mean((test[, "Apps"] - lasso_pred)^2) /mean((test[, "Apps"] - test_avg)^2)

barplot(c(lm_test_r2,
ridge_test_r2,
lasso_test_r2),
ylim=c(0,1),
names.arg = c("OLS", "Ridge", "Lasso"),
main = "Test R-squared")
abline(h = 0.9, col = "red")

[Figure: barplot of test R-squared for OLS, Ridge, and Lasso; all three bars lie above the 0.9 reference line.]


Since the test R² values for all three models are above 0.90, they all predict the number of college applications
received with high accuracy.
