Lecture 8a: Regularization
The idea of regularization revolves around modifying the loss function $\mathcal{L}$; in particular, we add a regularization term that penalizes some specified properties of the model parameters:

$$\mathcal{L}_{\text{reg}}(\boldsymbol{\beta}) = \mathcal{L}(\boldsymbol{\beta}) + \lambda R(\boldsymbol{\beta}),$$

where $\lambda$ is a scalar that gives the weight (or importance) of the regularization term. Fitting the model using the modified loss function $\mathcal{L}_{\text{reg}}$ results in model parameters with desirable properties (specified by $R$).
Alternatively, we can choose a regularization term that penalizes the squares of the parameter magnitudes. Then our regularized (ridge) loss function is:

$$\mathcal{L}_{\text{ridge}}(\boldsymbol{\beta}) = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \boldsymbol{\beta}^{\top}\boldsymbol{x}_i\right|^2 + \lambda \sum_{j=1}^{J} \beta_j^2$$
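As a concrete illustration, here is a minimal NumPy sketch of these regularized losses. The function names are our own, and for simplicity the penalty is applied to every coefficient (in practice the intercept is usually left unpenalized):

```python
import numpy as np

def mse(X, y, beta):
    """Mean squared error of the linear model y ≈ X @ beta."""
    residuals = y - X @ beta
    return np.mean(residuals ** 2)

def ridge_loss(X, y, beta, lam):
    """MSE plus the L2 (ridge) penalty: lam * sum(beta_j^2)."""
    return mse(X, y, beta) + lam * np.sum(beta ** 2)

def lasso_loss(X, y, beta, lam):
    """MSE plus the L1 (LASSO) penalty: lam * sum(|beta_j|)."""
    return mse(X, y, beta) + lam * np.sum(np.abs(beta))
```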
In both ridge and LASSO regression, we see that the larger our choice of the regularization parameter $\lambda$, the more heavily we penalize large values in $\boldsymbol{\beta}$:
• If $\lambda$ is close to zero, we recover the MSE, i.e., ridge and LASSO regression are just ordinary regression.
• If $\lambda$ is sufficiently large, the MSE term in the regularized loss function will be insignificant and the regularization term will force $\hat{\boldsymbol{\beta}}_{\text{ridge}}$ and $\hat{\boldsymbol{\beta}}_{\text{LASSO}}$ to be close to zero.
To avoid ad-hoc choices, we should select $\lambda$ using cross-validation.
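A minimal sketch of this selection, assuming scikit-learn is available; the synthetic data and the candidate grid of $\lambda$ values are purely illustrative (scikit-learn calls the regularization parameter `alpha`):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Hypothetical data; X, y would come from your problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=100)

lambdas = np.logspace(-4, 2, 20)   # candidate regularization strengths
cv_mse = []
for lam in lambdas:
    model = Ridge(alpha=lam)       # sklearn's alpha plays the role of lambda
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_squared_error")
    cv_mse.append(-scores.mean())  # average validation MSE across folds

best_lam = lambdas[int(np.argmin(cv_mse))]
print(f"lambda chosen by 5-fold CV: {best_lam:.4g}")
```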
For reference, the LASSO regularized loss penalizes the absolute values of the parameters,

$$\mathcal{L}_{\text{LASSO}}(\boldsymbol{\beta}) = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \boldsymbol{\beta}^{\top}\boldsymbol{x}_i\right|^2 + \lambda \sum_{j=1}^{J} |\beta_j|,$$

and the penalty can equivalently be viewed as the constraint $\sum_{j=1}^{J} |\hat{\beta}_j| = C$ on the parameters.

[Figure: contours of the MSE in the $(\beta_1, \beta_2)$ plane together with the constraint region of size $C$; the regularized estimate lies where an MSE contour meets the constraint.]
The Geometry of Regularization (LASSO)
[Figure: contours of the MSE in the $(\beta_1, \beta_2)$ plane together with the diamond-shaped LASSO constraint region $\sum_j |\hat{\beta}_j| \le C$; the contours often first touch the constraint at a corner on one of the axes, where a coefficient is exactly zero.]
Ridge estimator: the ridge estimator is where the constraint and the loss intersect. The values of the coefficients decrease as $\lambda$ increases, but they are not nullified.

Lasso estimator: the Lasso estimator tends to zero out parameters, as the OLS loss can easily intersect the constraint on one of the axes. The values of the coefficients decrease as $\lambda$ increases, and are nullified quickly.
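The following sketch (our own illustration on synthetic data, not from the slides) makes the contrast concrete: as $\lambda$ grows, ridge coefficients shrink smoothly toward zero while LASSO coefficients become exactly zero:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Illustrative data: only two of five predictors truly matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.5, size=200)

for lam in [0.01, 1.0, 100.0]:
    ridge_coef = Ridge(alpha=lam).fit(X, y).coef_
    lasso_coef = Lasso(alpha=lam).fit(X, y).coef_
    print(f"lambda={lam:g}")
    print("  ridge:", np.round(ridge_coef, 3))  # shrinks, never exactly zero
    print("  lasso:", np.round(lasso_coef, 3))  # zeros out coefficients
```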
Ridge regularization with only validation: step by step
1. Split the data into train, validation, and test sets.
2. For each candidate value of the regularization parameter $\lambda$:
   A. Determine the $\hat{\boldsymbol{\beta}}_{\lambda}$ that minimizes the regularized loss $\mathcal{L}_{\text{reg}}$ using the train data. This is done using a solver.
   B. Record the loss (MSE) on the validation data.
3. Select the $\lambda^*$ that minimizes the loss on the validation data.
4. Refit the model using both train and validation data, {train ∪ validation}, resulting in $\hat{\boldsymbol{\beta}}_{\lambda^*}$.
5. Report the MSE or $R^2$ on the test data given $\hat{\boldsymbol{\beta}}_{\lambda^*}$.
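A minimal sketch of this procedure, assuming scikit-learn; the dataset, split proportions, and $\lambda$ grid are illustrative stand-ins:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the lecture's dataset.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=300)

# 1. Split the data into train, validation, and test sets.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# 2. For each candidate lambda, fit on train (the solver minimizes L_reg)
#    and record the validation MSE.
lambdas = np.logspace(-3, 2, 10)
val_mse = [mean_squared_error(y_val,
                              Ridge(alpha=lam).fit(X_train, y_train).predict(X_val))
           for lam in lambdas]

# 3. Select the lambda that minimizes the validation loss.
best_lam = lambdas[int(np.argmin(val_mse))]

# 4. Refit on train + validation with the chosen lambda.
final = Ridge(alpha=best_lam).fit(np.vstack([X_train, X_val]),
                                  np.concatenate([y_train, y_val]))

# 5. Report MSE (or R^2) on the test set.
print("test MSE:", mean_squared_error(y_test, final.predict(X_test)))
print("test R^2:", final.score(X_test, y_test))
```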
Since LASSO regression tends to produce zero estimates for a number of the model parameters (we say that LASSO solutions are sparse), we consider LASSO to be a method for variable selection.
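For instance, a short sketch (synthetic data; the setup is our own, not from the lecture) showing how LASSO performs variable selection by zeroing out coefficients:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative setup: 10 predictors, only 3 with nonzero true effect.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
true_beta = np.array([4.0, 0, 0, -3.0, 0, 0, 0, 1.5, 0, 0])
y = X @ true_beta + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)   # indices of nonzero coefficients
print("selected predictors:", selected)  # the sparse solution keeps only a few
```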
Many prefer using LASSO for variable selection (as well as for suppressing extreme parameter values) rather than stepwise selection, as LASSO avoids the statistical problems that arise in stepwise selection.
Question: What are the pros and cons of the two approaches?