5.1 Ridge Regression
Motivation: too many predictors
It is not unusual to see the number of input variables greatly exceed the number of observations, e.g. microarray data analysis, environmental pollution studies.
With many predictors, fitting the full model without penalization will result in large prediction intervals, and the LS regression estimator may not uniquely exist.
Motivation: ill-conditioned X
Because the LS estimates depend upon \((X'X)^{-1}\), we would have problems in computing \(\hat{\beta}^{ls}\) if \(X'X\) were singular or nearly singular.
In those cases, small changes to the elements of \(X\) lead to large changes in \((X'X)^{-1}\).
The least squares estimator \(\hat{\beta}^{ls}\) may provide a good fit to the training data, but it will not fit sufficiently well to the test data.
Ridge Regression
One way out of this situation is to abandon the requirement of an unbiased estimator.
We assume only that the X's and Y have been centered, so that we have no need for a constant term in the regression: X is an n by p matrix with centered columns, and Y is a centered n-vector.
Hoerl and Kennard (1970) proposed that potential instability in the LS estimator,
\[ \hat{\beta} = (X'X)^{-1}X'Y, \]
could be improved by adding a small constant value \(\lambda\) to the diagonal entries of the matrix \(X'X\) before taking its inverse.
The result is the ridge regression estimator
\[ \hat{\beta}^{ridge} = (X'X + \lambda I_p)^{-1}X'Y. \]
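The closed form above can be computed directly with a linear solve. Below is a minimal NumPy sketch on simulated, centered data; the variable names and the choice \(\lambda = 1\) are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated, centered data (illustrative only)
n, p = 50, 5
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                      # center the columns of X
beta_true = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
y = X @ beta_true + rng.normal(size=n)
y -= y.mean()                            # center the response

lam = 1.0                                # ridge penalty (lambda)

# beta_ridge = (X'X + lambda * I_p)^{-1} X'Y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge)
```

Solving the linear system rather than forming the inverse explicitly is the numerically preferable route.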
Ridge regression places a particular form of constraint on the parameters (the \(\beta\)'s): \(\hat{\beta}^{ridge}\) is chosen to minimize the penalized sum of squares:
\[ \sum_{i=1}^n \Big( y_i - \sum_{j=1}^p x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^p \beta_j^2, \]
which is equivalent to minimizing \(\sum_{i=1}^n \big( y_i - \sum_{j=1}^p x_{ij}\beta_j \big)^2\) subject to \(\sum_{j=1}^p \beta_j^2 < c\) for some \(c > 0\), i.e. constraining the sum of the squared coefficients.
Therefore, ridge regression puts further constraints on the parameters, the \(\beta_j\)'s, in the linear model. In this case, instead of just minimizing the residual sum of squares we also have a penalty term on the \(\beta\)'s. This penalty term is \(\lambda\) (a pre-chosen constant) times the squared norm of the \(\beta\) vector. This means that if the \(\beta_j\)'s take on large values, the optimization function is penalized. We would prefer smaller \(\beta_j\)'s, or \(\beta_j\)'s that are close to zero, because they keep the penalty term small.
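To connect the penalized objective with the closed-form estimator, the sketch below minimizes the penalized sum of squares numerically (using SciPy's general-purpose optimizer on simulated data; all settings are illustrative) and checks that the result matches \((X'X + \lambda I_p)^{-1}X'Y\).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 40, 4
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(size=n)
y -= y.mean()
lam = 2.0

def penalized_rss(beta):
    # residual sum of squares + lambda * squared norm of beta
    resid = y - X @ beta
    return resid @ resid + lam * beta @ beta

beta_numeric = minimize(penalized_rss, np.zeros(p)).x
beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.abs(beta_numeric - beta_closed).max())   # close to zero (up to optimizer tolerance)
```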
Geometric Interpretation of Ridge Regression:
The ellipses correspond to the contours of the residual sum of squares (RSS): the inner ellipse has smaller RSS, and RSS is minimized at the ordinary least squares (OLS) estimates.
For \(p = 2\), the constraint in ridge regression corresponds to a circle, \(\sum_{j=1}^2 \beta_j^2 < c\).
We are trying to minimize the ellipse size and the circle simultaneously in ridge regression. The ridge estimate is given by the point at which the ellipse and the circle touch.
There is a trade-off between the penalty term and the RSS. Maybe a large \(\beta\) would give you a better residual sum of squares, but then it will push the penalty term higher. This is why you might actually prefer smaller \(\beta\)'s with a worse residual sum of squares. From an optimization perspective, the penalty term is equivalent to a constraint on the \(\beta\)'s. The function is still the residual sum of squares, but now you constrain the norm of the \(\beta_j\)'s to be smaller than some constant c. There is a correspondence between \(\lambda\) and c. The larger \(\lambda\) is, the more you prefer the \(\beta_j\)'s close to zero. In the extreme case when \(\lambda = 0\), you would simply be doing normal linear regression. At the other extreme, as \(\lambda\) approaches infinity, you set all the \(\beta\)'s to zero.
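These two extremes are easy to see numerically. The short sketch below (simulated data; the grid of \(\lambda\) values is arbitrary) checks that \(\lambda = 0\) reproduces the OLS fit and that the coefficient norm shrinks toward zero as \(\lambda\) grows.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(size=n)
y -= y.mean()

def ridge(lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(ridge(0.0), beta_ols))        # lambda = 0 recovers OLS
for lam in (0.1, 10.0, 1e6):
    print(lam, np.linalg.norm(ridge(lam)))      # the norm of beta shrinks as lambda grows
```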
Properties of the Ridge Estimator:
For orthogonal covariates, \(X'X = nI_p\),
\[ \hat{\beta}^{ridge} = \frac{n}{n + \lambda}\,\hat{\beta}^{ls}. \]
Hence, in this case, the ridge estimator always produces shrinkage towards 0. \(\lambda\) controls the amount of shrinkage.
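The uniform shrinkage factor \(n/(n+\lambda)\) can be verified directly. The sketch below builds a design with exactly orthogonal, centered columns scaled so that \(X'X = nI_p\) (the QR-based construction is just one convenient way to do this) and compares the ridge and LS fits.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 4
A = rng.normal(size=(n, p))
A -= A.mean(axis=0)
Q, _ = np.linalg.qr(A)          # orthonormal (and still centered) columns
X = np.sqrt(n) * Q              # now X'X = n * I_p
y = X @ np.array([1.5, -0.5, 2.0, 0.0]) + rng.normal(size=n)
y -= y.mean()

lam = 5.0
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(beta_ridge, n / (n + lam) * beta_ls))   # True: uniform shrinkage by n/(n+lambda)
```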
An important concept in shrinkage is the "effective" degrees of freedom associated with a set of parameters. In a ridge regression setting:
1. If we choose \(\lambda = 0\), we have p parameters (since there is no penalization).
2. If \(\lambda\) is large, the parameters are heavily constrained and the degrees of freedom will effectively be lower, tending to 0 as \(\lambda \to \infty\).
The effective degrees of freedom associated with \(\beta_1, \beta_2, \ldots, \beta_p\) is defined as
\[ df(\lambda) = \mathrm{tr}\big( X(X'X + \lambda I_p)^{-1}X' \big) = \sum_{j=1}^p \frac{d_j^2}{d_j^2 + \lambda}, \]
where the \(d_j\) are the singular values of \(X\).
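Both expressions for \(df(\lambda)\) are cheap to compute, and the sketch below (simulated design, arbitrary \(\lambda\)) checks that the trace form and the singular-value form agree.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 6
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
lam = 3.0

# df(lambda) via the trace of the ridge "hat" matrix X (X'X + lambda I)^{-1} X'
df_trace = np.trace(X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T))

# df(lambda) via the singular values d_j of X
d = np.linalg.svd(X, compute_uv=False)
df_svd = np.sum(d**2 / (d**2 + lam))

print(df_trace, df_svd)     # the two values agree; both lie between 0 and p
```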
Since the ridge estimator is linear, it is straightforward to calculate the variance-covariance matrix:
\[ \mathrm{Var}(\hat{\beta}^{ridge}) = \sigma^2 (X'X + \lambda I_p)^{-1} X'X (X'X + \lambda I_p)^{-1}. \]
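As a sanity check on this expression, the sketch below compares the analytic variance-covariance matrix with a Monte Carlo estimate obtained by refitting the ridge estimator on many responses simulated from a fixed design (all settings are illustrative).

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 80, 3
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
beta = np.array([1.0, -1.0, 0.5])
sigma, lam = 1.0, 4.0

A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
var_formula = sigma**2 * A_inv @ X.T @ X @ A_inv

# Monte Carlo check: refit the ridge estimator on many responses from the same design
m = 20000
Y = X @ beta + rng.normal(scale=sigma, size=(m, n))   # each row is one simulated response
B = Y @ X @ A_inv                                     # each row is one ridge fit (A_inv is symmetric)
var_mc = np.cov(B, rowvar=False)

print(np.abs(var_formula - var_mc).max())             # small, up to Monte Carlo error
```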
A Bayesian Formulation
Consider the linear regression model with normal errors:
\[ Y_i = \sum_{j=1}^p X_{ij}\beta_j + \epsilon_i, \]
where the \(\epsilon_i\) are i.i.d. normal errors with mean 0 and known variance \(\sigma^2\).
Since \(\lambda\) is applied to the squared norm of the \(\beta\) vector, people often standardize all of the covariates to make them have a similar scale. Assume \(\beta_j\) has the prior distribution \(\beta_j \overset{iid}{\sim} N(0, \sigma^2/\lambda)\). A large value of \(\lambda\) corresponds to a prior that is more tightly concentrated around zero, and hence leads to greater shrinkage towards zero.
The posterior is
\[ \beta \mid Y \sim N\big( \hat{\beta},\; \sigma^2 (X'X + \lambda I_p)^{-1} \big), \]
where \(\hat{\beta} = \hat{\beta}^{ridge} = (X'X + \lambda I_p)^{-1}X'Y\), confirming that the posterior mean (and mode) of the Bayesian linear model corresponds to the ridge regression estimator.
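The correspondence between the posterior mean and the ridge estimator can be checked with the usual Gaussian conjugate update. The sketch below (simulated data; \(\sigma^2\) and \(\lambda\) chosen arbitrarily) forms the posterior precision and mean from the prior \(N(0, \sigma^2/\lambda)\) and compares the mean with the ridge closed form.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 4
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=n)

sigma2 = 1.0          # known error variance
lam = 3.0
tau2 = sigma2 / lam   # prior variance: beta_j ~ N(0, sigma^2 / lambda)

# Gaussian conjugate update: posterior precision and posterior mean
post_prec = X.T @ X / sigma2 + np.eye(p) / tau2
post_mean = np.linalg.solve(post_prec, X.T @ y / sigma2)

beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(post_mean, beta_ridge))   # True: the posterior mean is the ridge estimator
```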
Whereas the least squares solutions \(\hat{\beta}^{ls} = (X'X)^{-1}X'Y\) are unbiased if the model is correctly specified, ridge solutions are biased, \(E(\hat{\beta}^{ridge}) \neq \beta\). However, at the cost of bias, ridge regression reduces the variance, and thus might reduce the mean squared error (MSE):
\[ MSE = \mathrm{Bias}^2 + \mathrm{Variance}. \]
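The bias-variance trade-off can be illustrated by simulation. The sketch below repeatedly simulates responses from a fixed design and decomposes the total MSE of the OLS and ridge estimators of \(\beta\) into squared bias and variance; all settings (n, p, \(\sigma\), \(\lambda\), the true coefficients) are illustrative, and whether ridge wins depends on them.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, sigma, lam = 30, 8, 2.0, 10.0
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
beta = rng.normal(scale=0.5, size=p)                 # "true" coefficients for the simulation

A_ols = np.linalg.inv(X.T @ X) @ X.T                 # OLS is the linear estimator A_ols @ y
A_ridge = np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T

def mse_decomposition(A, reps=20000):
    ests = (X @ beta + rng.normal(scale=sigma, size=(reps, n))) @ A.T
    bias2 = np.sum((ests.mean(axis=0) - beta) ** 2)  # total squared bias over the p coefficients
    var = np.sum(ests.var(axis=0))                   # total variance over the p coefficients
    return bias2, var, bias2 + var                   # MSE = Bias^2 + Variance

print("OLS   (bias^2, variance, MSE):", mse_decomposition(A_ols))
print("ridge (bias^2, variance, MSE):", mse_decomposition(A_ridge))
```

With settings like these, ridge typically trades a small squared bias for a larger drop in variance; with other true coefficients or a poorly chosen \(\lambda\) the comparison can go the other way.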
More Geometric Interpretations (optional)
Inputs are centered first. Consider the fitted response
\[ \hat{y}^{ridge} = X\hat{\beta}^{ridge} = X(X'X + \lambda I)^{-1}X'y = UD(D^2 + \lambda I)^{-1}DU'y = \sum_{j=1}^p \mathbf{u}_j \frac{d_j^2}{d_j^2 + \lambda}\, \mathbf{u}_j' y, \]
where \(X = UDV'\) is the singular value decomposition of \(X\) and the \(\mathbf{u}_j\) (the columns of \(U\)) are the normalized principal components of X.
Ridge regression shrinks the coordinates with respect to the orthonormal basis formed by the principal components. Coordinates with respect to principal components with smaller variance are shrunk more.
Instead of using \(X = (X_1, X_2, \ldots, X_p)\) as predicting variables, use the new input matrix \(\tilde{X} = UD\). Then for the new inputs:
\[ \hat{\beta}^{ridge}_j = \frac{d_j^2}{d_j^2 + \lambda}\,\hat{\beta}^{ls}_j, \qquad \mathrm{Var}(\hat{\beta}^{ls}_j) = \frac{\sigma^2}{d_j^2}. \]
We saw this in the previous formula. The larger \(\lambda\) is, the more the projection is shrunk in the direction of \(\mathbf{u}_j\). Coordinates with respect to the principal components with a smaller variance are shrunk more.
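The SVD view also gives an alternative way to compute the ridge fit: shrink the coordinate of y along each principal-component direction \(\mathbf{u}_j\) by \(d_j^2/(d_j^2+\lambda)\). The sketch below (simulated data) checks that this agrees with the direct closed-form fit.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, lam = 40, 5, 2.0
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X @ rng.normal(size=p) + rng.normal(size=n)
y -= y.mean()

# Direct ridge fit
y_hat_direct = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Via the SVD X = U D V': shrink the coordinate along each principal component u_j
U, d, Vt = np.linalg.svd(X, full_matrices=False)
shrink = d**2 / (d**2 + lam)                       # smaller d_j  ->  more shrinkage
y_hat_svd = U @ (shrink * (U.T @ y))

print(np.allclose(y_hat_direct, y_hat_svd))        # True
```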
This interpretation will become convenient when we compare it to principal components regression, where instead of doing shrinkage, we either shrink the direction closer to zero or we don't shrink at all. We will see this in the "Dimension Reduction Methods" lesson.