
CS 7265 BIG DATA ANALYTICS

REGULARIZATION ON LINEAR MODEL

* Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington

Mingon Kang, Ph.D


Computer Science, Kennesaw State University
Problems in Linear Regression
 More predictors than the number of samples
 Not unusual to see such data. E.g., microarray data

 Ill-conditioned matrix
 LS estimates depend on (X′X)⁻¹; when X is ill-conditioned, X′X may be
singular or nearly singular
 In this case, small changes to the elements of X lead to
large changes in (X′X)⁻¹
Ill-conditioned Matrices
   x + y = 2                x + y = 2
   x + 1.001y = 2           x + 1.001y = 2.001

 The solution is x = 2, y = 0 on the left and x = 1, y = 1 on
the right. The coefficient matrix is called ill-conditioned
because a small change in the constant coefficients
results in a large change in the solution.
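The effect can be reproduced numerically. The following is a minimal sketch, assuming NumPy is available (not part of the slides), using the two right-hand sides above:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.001]])          # coefficient matrix shared by both systems

b_left = np.array([2.0, 2.0])         # right-hand side of the left system
b_right = np.array([2.0, 2.001])      # right-hand side of the right system

print(np.linalg.solve(A, b_left))     # approximately [2. 0.]
print(np.linalg.solve(A, b_right))    # approximately [1. 1.]
print(np.linalg.cond(A))              # large condition number signals ill-conditioning
```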
Solutions
 The estimates can sometimes be improved by trading a little bias
to reduce the variance of the predicted values
 Determine a smaller subset of predictors that
exhibit the strongest effects (Feature Selection)
 This gives the big picture while sacrificing some of the small
details
Motivation
 If two or more independent variables are highly
correlated:
 The intercept is estimated well, but what about the coefficients?

Reference: http://web.as.uky.edu/statistics/users/pbreheny/603/2-20.pdf
Motivation
 This happens because x1 and x2 are highly
correlated.
 RSS(40, -38) = 21.7 (our estimate) is very close to
RSS(1, 1) = 22.6 (the truth)
 An effective way of dealing with this problem is
through penalization:
 Instead of minimizing RSS only, we consider an
additional term in the regression form…

Reference: http://web.as.uky.edu/statistics/users/pbreheny/603/2-20.pdf
Ridge Regression
 Ridge Regression Model
   Minimize   Σ_{i=1}^{n} (y_i − X_i b)²
   s.t.       Σ_{j=1}^{p} b_j² ≤ c
Ridge Regression
 Why does this help?
 Smaller coefficients make the fit less sensitive to changes in the variables.

Reference: http://web.as.uky.edu/statistics/users/pbreheny/603/2-20.pdf
Ridge Regression
 Lagrange Multiplier
 A strategy for finding the local maxima or minima of a function
subject to equality/inequality constraints

Minimizing
   f(x)   s.t.   g(x) ≤ C
is equivalent to minimizing
   f(x) + λg(x),
where λ is positive.
Lagrange Multiplier
 Example
 Find the extrema of the function F(x, y) = 2y + x
subject to the constraint 0 = g(x, y) = y² + xy − 1

 Set the derivatives of H(x, y, z) = 2y + x + z(y² + xy − 1) to zero

Test with http://www.math.uri.edu/~bkaskosz/flashmo/graph3d2/
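A quick way to check this example is to solve the stationarity conditions symbolically. The sketch below assumes SymPy is available (it is not part of the slides):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
H = 2*y + x + z*(y**2 + x*y - 1)

# Set all partial derivatives of H to zero and solve the resulting system
solutions = sp.solve([sp.diff(H, x), sp.diff(H, y), sp.diff(H, z)], [x, y, z], dict=True)
for s in solutions:
    print(s, 'F =', (2*y + x).subs(s))
# Stationary points: (x, y) = (0, 1) with F = 2 and (x, y) = (0, -1) with F = -2
```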


Ridge Regression
 Ridge Regression Model
   Minimize   Σ_{i=1}^{n} (y_i − X_i b)² + λ‖b‖²,

 where ‖b‖ is the L-2 norm of b (Euclidean distance)

 P-norm (p ≥ 1)
   ‖x‖_p := ( Σ_{i=1}^{n} |x_i|^p )^{1/p}
 p = 1, Manhattan norm (L-1 norm); p = 2, Euclidean norm; p = ∞, maximum norm
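For concreteness, the norms named above can be computed with NumPy; this is a small illustrative sketch with one example vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
print(np.linalg.norm(x, 1))       # L-1 (Manhattan) norm: |3| + |-4| + |1| = 8
print(np.linalg.norm(x, 2))       # L-2 (Euclidean) norm: sqrt(9 + 16 + 1) ≈ 5.10
print(np.linalg.norm(x, np.inf))  # maximum norm: max(|3|, |-4|, |1|) = 4
```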
Optimization
 H(b, λ) = (y − Xb)′(y − Xb) + λb′b
         = y′y − 2b′X′y + b′X′Xb + λb′b

 ∂H(b, λ)/∂b = −2X′y + 2X′Xb + 2λb = 0
 (X′X + λI)b = X′y
 b = (X′X + λI)⁻¹X′y

 X′X + λI is always invertible (for λ > 0), so this always gives a unique
solution, b
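The closed-form solution translates directly into code. The minimal sketch below assumes NumPy and uses synthetic data; the names X, y, and lam are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

lam = 1.0
# Ridge estimate: b = (X'X + lam*I)^(-1) X'y
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# As lam -> 0 the estimate approaches OLS; as lam grows it shrinks toward zero
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(np.linalg.norm(b_ridge), np.linalg.norm(b_ols))
```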
Ridge Regression
 Similar to the ordinary least squares solution,
but with the addition of a “ridge”
regularization
 As λ → 0, b_ridge → b_OLS
 As λ → ∞, b_ridge → 0

 Applying the ridge regression penalty has the effect of
shrinking the estimates toward zero
 It introduces bias but reduces the variance of the estimate
LASSO
 LASSO: Least Absolute Shrinkage and Selection Operator
 L-1 norm penalization
 Many coefficients are shrunk all the way to zero, so the
solution is called sparse

   Minimize   Σ_{i=1}^{n} (y_i − X_i b)²
   s.t.       Σ_{j=1}^{p} |b_j| ≤ c

 where X_i (∈ ℝ^{1×p}) represents the i-th row vector of X

Ref: http://statweb.stanford.edu/~tibs/lasso.html
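As an illustration of the sparsity property, the hedged sketch below uses scikit-learn's Lasso on synthetic data; the data and the alpha value are illustrative only, not from the slides:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.standard_normal((n, p))
true_b = np.zeros(p)
true_b[:3] = [2.0, -1.5, 1.0]              # only the first three predictors matter
y = X @ true_b + 0.1 * rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.sum(lasso.coef_ != 0), 'non-zero coefficients out of', p)  # most are exactly zero
```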
LASSO
 The absolute value in the penalty is not
differentiable, so there is no general closed-form solution
 In the special case of an orthonormal design matrix,
X′X = I, a closed-form solution is available
 b_j^lasso = S(b_j^OLS, λ), where S is the soft-thresholding operator
defined as:
              z − λ   if z > λ
   S(z, λ) =  0       if |z| ≤ λ
              z + λ   if z < −λ
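The operator translates directly into a few lines of code. The sketch below is a vectorized NumPy form that is equivalent to the piecewise definition above:

```python
import numpy as np

def soft_threshold(z, lam):
    """Return S(z, lam): shrink z toward zero by lam, and set |z| <= lam to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

print(soft_threshold(np.array([3.0, 0.5, -2.0]), 1.0))   # [ 2.  0. -1.]
```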
Soft-thresholding operator
 Proximal mapping of the L-1 norm:

   b* = argmin_b Σ_{i=1}^{n} (y_i − X_i b)² + λ‖b‖₁

   ∇‖y − Xb‖² + ∂(λ‖b‖₁) ∋ 0   ↔   −X′y + X′Xb + λ∂‖b‖₁ ∋ 0

 The L-1 norm is separable, so we can consider each of its
components separately. Let's examine first the case where
b_i ≠ 0. Then ∂|b_i| = sign(b_i), and (using the orthonormal design, X_i′X_i = 1)
the optimum b_i* is obtained from
   −X_i′y + X_i′X_i b_i + λ sign(b_i) = 0
   b_i = X_i′y − λ sign(b_i)
Soft-thresholding operator
 In the case where b_i = 0, the subdifferential of the
L-1 norm is the interval [−1, 1] and the optimality
condition is
   −X_i′y + λ[−1, 1] ∋ 0
   X_i′y ∈ [−λ, λ]
   |X_i′y| ≤ λ

Absolute value function (left), and its sub-differential as a function of x (right)
Geometry of ridge vs lasso
 A geometric illustration of why lasso results in
sparsity. Lasso (left): coefficients tend to be exactly zero
Reference
 http://web.as.uky.edu/statistics/users/pbreheny/603/2-20.pdf
 https://lagunita.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/model_selection.pdf
 http://statweb.stanford.edu/~tibs/sta305files/Rudyregularization.pdf
 https://onlinecourses.science.psu.edu/stat857/node/155
Elastic Net
 Hui Zou and Trevor Hastie in 2005
 Regularizes by combining the L-1 and L-2 norm penalties of
the Lasso and Ridge
 Overcomes a limitation of Lasso:
 When p is large, n is small, and several
variables are highly correlated with each other, LASSO
tends to select only one variable among them (the others
become zero).
Elastic Net
 The elastic net method includes the LASSO and
ridge regression

   b* = argmin_b Σ_{i=1}^{n} (y_i − X_i b)² + λ₁‖b‖₁ + λ₂‖b‖²

 If λ1 = λ, λ2 = 0, it is LASSO
 If λ1 = 0, λ2 = λ, it is Ridge regression
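A hedged sketch with scikit-learn's ElasticNet is shown below; its alpha and l1_ratio parameters mix the L-1 and L-2 penalties, though its internal scaling differs slightly from the λ₁, λ₂ notation above. The data are synthetic and illustrative only:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)   # two highly correlated predictors
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(n)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_[:2])   # both correlated predictors tend to receive non-zero weight
```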
Group LASSO
 Yuan and Lin in 2006
 When pre-defined groups of covariates are given,
they are allowed to be selected together

   b* = argmin_b Σ_{i=1}^{n} (y_i − X_i b)² + λ Σ_{j=1}^{J} ‖b_j‖_{K_j}

 where ‖φ‖_K := (φ′Kφ)^{1/2},  φ ∈ ℝ^d,  K ∈ ℝ^{d×d}
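One common building block of group lasso solvers is group-wise (block) soft-thresholding. The sketch below assumes the special case K_j = I and is illustrative only, not a complete solver; the variable names and example groups are made up:

```python
import numpy as np

def group_soft_threshold(b, groups, lam):
    """Shrink each group of coefficients toward zero; zero out groups with small norm."""
    out = b.copy()
    for idx in groups:
        norm = np.linalg.norm(b[idx])
        out[idx] = 0.0 if norm <= lam else (1.0 - lam / norm) * b[idx]
    return out

b = np.array([3.0, 4.0, 0.2, -0.1])
groups = [np.array([0, 1]), np.array([2, 3])]
print(group_soft_threshold(b, groups, lam=1.0))   # the second group is removed entirely
```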
Interaction effects
 Linear models assume that the variables act independently
 When considering the relationship among two or more
variables, we need a term that describes their simultaneous
influence (a small sketch follows at the end of this slide):

   y = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂ + e

 Kernel function
 Converts a non-linear problem into a linear one in a higher-dimensional space
 https://www.youtube.com/watch?v=9NrALgHFwTo
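The interaction model above can be fit by adding the x₁x₂ column to the design matrix. The sketch below assumes scikit-learn and uses synthetic data; the names and coefficient values are illustrative only:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))                         # columns x1 and x2
y = 1.0 + 2.0*X[:, 0] - 1.0*X[:, 1] + 0.5*X[:, 0]*X[:, 1] + 0.1*rng.standard_normal(100)

# Expand the design matrix to [x1, x2, x1*x2] and fit an ordinary linear model
X_int = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False).fit_transform(X)
model = LinearRegression().fit(X_int, y)
print(model.intercept_, model.coef_)                      # roughly 1.0 and [2.0, -1.0, 0.5]
```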
References
 Elastic net
 H. Zou and T. Hastie, “Regularization and variable
selection via the elastic net”, J.R. Statist. Soc., 2005
 https://web.stanford.edu/~hastie/Papers/B67.2%20(2005)%20301-320%20Zou%20&%20Hastie.pdf
 Group Lasso
 M. Yuan and Y. Lin, “Model selection and estimation in
regression with grouped variables”, J.R. Statist. Soc.,
2006
 http://pages.stat.wisc.edu/~myuan/papers/glasso.final.pdf
