Regularization
Linear Regression Model
The dataset has 9 attributes, listed below:
[email protected]
6LGU0EZJIR 1. mpg: continuous
2. cylinders: multi-valued discrete
3. displacement: continuous
4. horsepower: continuous
5. weight: continuous
6. acceleration: continuous
7. model year: multi-valued discrete
8. origin: multi-valued discrete
9. car name: string (unique for each instance)
Sol: Ridge_Lasso_Regression.ipynb
Regularising Linear Models (Shrinkage methods)
When we have too many parameters and are exposed to the curse of dimensionality, we resort to dimensionality reduction techniques such as transforming the data with PCA and eliminating the principal components with the smallest eigenvalues. This can be a laborious process before we find the right number of principal components. Instead, we can employ shrinkage methods.
Shrinkage methods attempt to shrink the coefficients of the attributes and lead us towards simpler yet effective models. The two shrinkage methods are:
[email protected]
6LGU0EZJIR
1. Ridge regression is similar to linear regression, where the objective is to find the best-fit surface. The difference is in the way the best coefficients are found. Unlike linear regression, where the function being optimized is the SSE, here the cost function is slightly different:
Linear Regression cost function:

$$J = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Ridge Regression, with an additional term in the cost function:

$$J = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

The $\lambda$ term is like a penalty term used to penalize large-magnitude coefficients: when it is set to a high number, coefficients are suppressed significantly. When it is set to 0, the cost function becomes the same as the linear regression cost function.
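As a minimal sketch of the cost function above (not taken from the referenced notebook), the following NumPy snippet evaluates the ridge objective for a given coefficient vector; the names X, y, beta, and lam are placeholders for illustration:

```python
import numpy as np

def ridge_cost(X, y, beta, lam):
    """SSE plus the L2 penalty: sum((y - X @ beta)**2) + lam * sum(beta**2)."""
    residuals = y - X @ beta
    sse = np.sum(residuals ** 2)
    l2_penalty = lam * np.sum(beta ** 2)
    return sse + l2_penalty

# Tiny synthetic check: with lam = 0 the cost reduces to the plain SSE.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=20)

print(ridge_cost(X, y, beta_true, lam=0.0))   # plain SSE
print(ridge_cost(X, y, beta_true, lam=10.0))  # SSE + penalty
```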
Large coefficients indicate a case where, for a unit change in the input variable, the magnitude of change in the target column is very large.
[email protected]
6LGU0EZJIR
Coefficients for the simple linear regression model of 10 dimensions:

1. The coefficient for cyl is 2.5059518049385052
2. The coefficient for disp is 2.5357082860560483
3. The coefficient for hp is -1.7889335736325294
4. The coefficient for wt is -5.551819873098725
5. The coefficient for acc is 0.11485734803440854
6. The coefficient for yr is 2.931846548211609
7. The coefficient for car_type is 2.977869737601944
8. The coefficient for origin_america is -0.5832955290166003
9. The coefficient for origin_asia is 0.3474931380432235
10. The coefficient for origin_europe is 0.3774164680868855

Coefficients with polynomial features, shooting up to 57 from 10 (λ = 0):

-9.67853872e-13 -1.06672046e+12 -4.45865268e+00 -2.24519565e+00
-2.96922206e+00 -1.56882955e+00 3.00019063e+00 -1.42031640e+12
-5.46189566e+11 3.62350196e+12 -2.88818173e+12 -1.16772461e+00
-1.43814087e+00 -7.49492645e-03 2.59439087e+00 -1.92409515e+00
-3.41759793e+12 -6.27534905e+12 -2.44065576e+12 -2.32961194e+12
3.97766113e-01 1.94046021e-01 -4.26086426e-01 3.58203125e+00
-2.05296326e+00 -7.51019934e+11 -6.18967069e+11 -5.90805593e+11
2.47863770e-01 -6.68518066e-01 -1.92150879e+00 -7.37030029e-01
-1.01183732e+11 -8.33924574e+10 -7.95983063e+10 -1.70394897e-01
5.25512695e-01 -3.33097839e+00 1.56301740e+12 1.28818991e+12
1.22958044e+12 5.80200195e-01 1.55352783e+00 3.64527008e+11
3.00431724e+11 2.86762821e+11 3.97644043e-01 8.58604718e+10
7.07635073e+10 6.75439422e+10 -7.25449332e+11 1.00689540e+12
9.61084146e+11 2.18532428e+11 -4.81675252e+12 2.63818648e+12
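The blow-up above can be reproduced in spirit with scikit-learn. This is a hedged sketch on synthetic, deliberately collinear data, not the notebook's exact preprocessing (the slide's 57 features presumably reflect that preprocessing; a degree-2 expansion of 10 raw columns gives 65 here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the mpg data: 10 highly collinear predictors.
rng = np.random.default_rng(42)
base = rng.normal(size=(100, 1))
X = base + 0.01 * rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Degree-2 polynomial features inflate 10 columns to 65 (squares + interactions).
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

lr = LinearRegression().fit(X_poly, y)
print(X_poly.shape[1], "features")
# With an ill-conditioned design matrix the coefficients are often enormous.
print("largest |coef|:", np.abs(lr.coef_).max())
```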
Regularising Linear Models (Shrinkage methods)
[email protected]
6LGU0EZJIR Large coefficients have been suppressed, almost close to 0 in many cases.
Ref: Ridge_Lasso_Regression.ipynb
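A short, self-contained sketch of this suppression, mirroring the synthetic setup above (again an illustration under assumed data, not the notebook's code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic setup as the previous sketch: collinear predictors,
# degree-2 polynomial expansion.
rng = np.random.default_rng(42)
base = rng.normal(size=(100, 1))
X = base + 0.01 * rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

ols = LinearRegression().fit(X_poly, y)
ridge = Ridge(alpha=1.0).fit(X_poly, y)  # alpha is sklearn's name for lambda
print("max |coef| OLS:  ", np.abs(ols.coef_).max())   # can be astronomically large
print("max |coef| Ridge:", np.abs(ridge.coef_).max()) # shrunk toward 0
```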
Regularising Linear Models (Shrinkage methods)
2. Lasso Regression is similar to Ridge regression, with a difference in the penalty term. Unlike Ridge, the penalty term here is raised to the power 1 (the sum of the absolute values of the coefficients), also known as the L1 norm:

$$J = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

The $\lambda$ term continues to be the input parameter that decides how heavy the penalties on the coefficients will be: the larger the value, the more diminished the coefficients will be.
[email protected]
6LGU0EZJIR
1. Unlike Ridge regression, where the coefficients are driven towards zero but may not become zero,
Lasso Regression penalty process will make many of the coefficients 0. In other words, literally
drop the dimensions
Lasso model: [ 0. 0.52263805 -0.5402102 -1.99423315 -4.55360385 -0.85285179 2.99044036 0.00711821 -0. 0.76073274 -0. -0. -0.19736449
0. 2.04221833 -1.00014513 0. -0. 4.28412669 -0. 0. 0.31442062 -0. 2.13894094 -1.06760107 0. -0. 0. 0. -0.44991392 -1.55885506 -0. -0.68837902 0.
0.17455864 -0.34653644 0.3313704 -2.84931966 0. -0.34340563 0.00815105 0.47019445 1.25759712 -0.69634581 0. 0.55528147 0.2948979 -0.67289549
0.06490671 0. -1.19639935 1.06711702 0. -0.88034391 0. -0. ]
[email protected]
6LGU0EZJIR Large coefficients have been suppressed, to 0 in many cases, making those dimensions useless i.e. dropped from
the model.
Ref: Ridge_Lasso_Regression.ipynb
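A hedged sketch of Lasso producing exact zeros, again on assumed synthetic data rather than the notebook's dataset (scaling the features first matters for L1 penalties):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
X_poly = StandardScaler().fit_transform(X_poly)

lasso = Lasso(alpha=0.1, max_iter=10000).fit(X_poly, y)
n_zero = np.sum(lasso.coef_ == 0)
print(f"{n_zero} of {lasso.coef_.size} coefficients are exactly 0")
```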
Regularising Linear Models (Comparing The Methods)
To compare Ridge and Lasso, let us first transform our error function (which is a quadratic / convex function) into a contour graph.
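The following matplotlib sketch draws such a contour graph for a hypothetical two-coefficient model (m1, m2), together with the L2 (circle) and L1 (diamond) constraint regions discussed next; the data and unit-radius constraints are illustrative assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic 2-predictor data so the SSE is a function of (m1, m2).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 1.5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.3, size=50)

m1, m2 = np.meshgrid(np.linspace(-1, 3, 200), np.linspace(-1, 3, 200))
# SSE over a grid of coefficient pairs: sum_i (y_i - m1*x_i1 - m2*x_i2)^2
sse = ((y[:, None, None] - m1 * X[:, 0][:, None, None]
        - m2 * X[:, 1][:, None, None]) ** 2).sum(axis=0)

fig, ax = plt.subplots()
ax.contour(m1, m2, sse, levels=20)  # elliptical rings of equal SSE
t = np.linspace(0, 2 * np.pi, 100)
ax.plot(np.cos(t), np.sin(t), label="L2 ball (Ridge)")  # m1^2 + m2^2 <= 1
ax.plot([1, 0, -1, 0, 1], [0, 1, 0, -1, 0], label="L1 ball (Lasso)")  # diamond
ax.set_xlabel("m1"); ax.set_ylabel("m2"); ax.legend()
plt.show()
```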
Regularising Linear Models (Ridge Constraint)
[Figure: SSE contours (red rings) with the Ridge constraint region (yellow circle). Labels: "Lowest SSE error ring; violates the constraint", "Most optimal combination of m1, m2 given the constraints", "Allowed combinations of m1, m2 by the Ridge constraint", "Sub-optimal combination of m1, m2; meets the constraint but is not the minimal possible SSE within the constraint".]

1. The yellow circle is the Ridge constraint region, representing the ridge penalty (sum of squared coefficients)
2. Any combination of m1 and m2 that falls within the yellow circle is a possible solution
3. The most optimal of all solutions is the one which satisfies the constraint and also minimizes the SSE (smallest possible red circle)
4. Thus the optimal solution of m1 and m2 is the one where the yellow circle touches a red circle.
The point to note is that the red rings and yellow circle will never be tangential (touch) on the axes
representing the coefficient. Hence Ridge can make coefficients close to zero but never zero. You may
notice some coefficients becoming zero but that will be due to roundoff…
Regularising Linear Models (Ridge Constraint)
1. As the lambda value (shown here as alpha) increases, the coefficients have to become smaller and smaller to minimize the penalty term in the cost function, i.e. the constraint region tightens
2. The tighter the constraint region, the larger will be the red circle in the contour diagram that is tangent to the boundary of the yellow region
3. Thus, the higher the lambda, the stronger the shrinkage: the coefficients shrink significantly and hence the smoother the surface / model (see the sketch below)
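A small sweep over alpha makes this shrinkage visible; this is an illustrative sketch on assumed synthetic data, not output from the referenced notebook:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Larger alpha (lambda) => stronger shrinkage of the coefficients.
for alpha in [0.01, 1, 100, 10000]:
    coefs = Ridge(alpha=alpha).fit(X_poly, y).coef_
    print(f"alpha={alpha:>7}: max |coef| = {np.abs(coefs).max():.4f}")
```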
The beauty of Lasso is that the red circle may touch the constraint region on an attribute axis! In the constraint picture for Lasso, the circle touches the yellow diamond on the m1 axis. But at that point the m2 coefficient is 0! This means that dimension has been dropped from the analysis. Thus Lasso does dimensionality reduction, which Ridge does not.
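The same sweep for Lasso shows the dimensionality reduction in action: as alpha grows, more coefficients hit exactly 0. Again a hedged sketch on assumed synthetic data:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
X_poly = StandardScaler().fit_transform(
    PolynomialFeatures(degree=2, include_bias=False).fit_transform(X))

# Larger alpha => more coefficients driven to exactly 0 (dimensions dropped).
for alpha in [0.001, 0.01, 0.1, 1.0]:
    coefs = Lasso(alpha=alpha, max_iter=10000).fit(X_poly, y).coef_
    print(f"alpha={alpha}: {np.sum(coefs == 0)} of {coefs.size} coefficients are 0")
```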