Elements of Statistical Learning II - Ch.3 Linear Regression - Notes
3 Linear Models
Case study
Note: For a linear regression (LinReg, Lasso, Ridge) case study see "Linear Regression case study
(sklearn).pdf"
- Covers: categorical data (one-hot encoding), scaling of numerical data (needed to interpret
coefficients correctly), fat tails in the target (take the log), multicollinearity/correlated features
and their effect on coefficient interpretation (unstable coefficients), determining the stability of
coefficients (cross-validation), and interpretation of coefficients: the effect/relationship of Xi on Y
given that the other X's remain constant (conditional dependence, as opposed to the marginal
dependence you get from e.g. the plain correlation of Xi and Y). A sketch of such a pipeline
follows below.
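A minimal sketch of that kind of preprocessing/modelling pipeline in scikit-learn. The toy DataFrame, column names and target here are hypothetical placeholders, not taken from the case study itself:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate

# Hypothetical toy data standing in for the case-study dataset
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "city": rng.choice(["A", "B", "C"], size=n),       # categorical -> one-hot
    "age": rng.normal(40, 10, size=n),                 # numerical -> standardize
    "income": rng.lognormal(10, 0.5, size=n),          # numerical -> standardize
})
df["target"] = np.exp(0.02 * df["age"] + rng.normal(0, 0.5, size=n))  # fat-tailed target

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ("num", StandardScaler(), ["age", "income"]),      # scaling makes coefficients comparable
])

# Model the log of the fat-tailed target, predict back on the original scale
model = TransformedTargetRegressor(
    regressor=make_pipeline(preprocess, Ridge(alpha=1.0)),
    func=np.log1p, inverse_func=np.expm1,
)

# Cross-validation: estimate performance and check how stable the coefficients are
cv = cross_validate(model, df[["city", "age", "income"]], df["target"],
                    cv=5, return_estimator=True)
coefs = pd.DataFrame([est.regressor_[-1].coef_ for est in cv["estimator"]])
print(coefs.std())   # large std across folds => unstable coefficients (often collinearity)
```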
Linear Regression
Linear Regression Assumptions
- Broadly, linear regression's assumptions matter for inference rather than for prediction (apart
from obvious ones like linearity in the inputs). See the link below.
- https://stats.stackexchange.com/questions/486672/why-dont-linear-regression-assumptions-matter-in-machine-learning
- No multicollinearity / orthogonal variables
- No heteroskedasticity (the variance of the residuals is the same for any value of X)
- Gaussian errors / for any fixed value of X, y is normally distributed
- Linearity
- Independence: No autocorrelation/Independence in the errors
- https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/R/R5_Correlation-Regression/R5_Correlation-Regression7.html
- https://www.statisticssolutions.com/assumptions-of-linear-regression/ (includes multicollinearity)
- http://people.duke.edu/~rnau/testing.htm
- These violations affect the coefficients (and their interpretation) and their p-values (statistical
significance), but do not affect predictive power (see the book quote below; a quick diagnostics
sketch follows after the links).
- More info on multicollinearity:
o https://statisticsbyjim.com/regression/multicollinearity-in-regression-analysis/
o Includes "Do I have to fix multicollinearity?" (short answer: no if the model is only for
predictive power, maybe if it is also for inference) and "How to deal with multicollinearity"
The fact that some or all predictor variables are correlated among
themselves does not, in general, inhibit our ability to obtain a good fit
nor does it tend to affect inferences about mean responses or
predictions of new observations. —Applied Linear Statistical Models,
p289, 4th Edition.
- http://people.duke.edu/~rnau/regnotes.htm
https://scikit-learn.org/stable/auto_examples/inspection/plot_linear_model_coefficient_interpretation.html#sphx-glr-auto-examples-inspection-plot-linear-model-coefficient-interpretation-py
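A hedged sketch of how one might check some of these assumptions in practice using statsmodels (the data here is a synthetic placeholder, not from any of the linked pages): residuals vs fitted values for heteroskedasticity, the Durbin-Watson statistic for autocorrelation, and variance inflation factors (VIF) for multicollinearity.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical design matrix X (n x p) and target y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=200)   # deliberately collinear column
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=200)

Xc = sm.add_constant(X)
fit = sm.OLS(y, Xc).fit()

# Heteroskedasticity: does the residual spread grow with the fitted values?
print("corr(|resid|, fitted):", np.corrcoef(np.abs(fit.resid), fit.fittedvalues)[0, 1])

# Autocorrelation of errors: Durbin-Watson close to 2 means little autocorrelation
print("Durbin-Watson:", durbin_watson(fit.resid))

# Multicollinearity: VIF well above 10 is a common warning threshold
for i in range(1, Xc.shape[1]):
    print(f"VIF x{i}:", variance_inflation_factor(Xc, i))
```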
Shrinkage methods
- Lasso, Ridge
Purpose
- Ridge regression and lasso regression are designed to deal with situations in which the
candidate independent variables are highly correlated with each other and/or their number is
large relative to the sample size (i.e. overfitting).
o One way to think of it: the OLS solution is β = (XᵀX)^-1 Xᵀy, so the coefficients depend on
inverting the (scaled) covariance matrix of the inputs, which is ill-conditioned when the inputs
are strongly correlated. Ridge solves β = (XᵀX + λI)^-1 Xᵀy instead, so shrinking the betas
effectively stabilizes that inversion. (This is only approximate, since for shrinkage > 0 the
solution no longer involves the plain covariance matrix exactly, but the intuition carries over;
see the numerical sketch below.)
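A small sketch illustrating this point with toy data (my own example, not from the book): with two nearly identical features, the individual OLS coefficients are ill-determined, while ridge spreads the effect across both.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two almost identical (highly correlated) features; the true effect is 1.0 on each
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + 0.01 * rng.normal(size=300)          # near-duplicate of x1
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=300)

# OLS: individual coefficients blow up / become unstable (only their sum is well determined)
print("OLS:  ", LinearRegression().fit(X, y).coef_)

# Ridge: shrinkage splits the effect roughly evenly across the correlated features
print("Ridge:", Ridge(alpha=10.0).fit(X, y).coef_)
```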
Predictive power
- Ridge/L2 usually predicts at least as well as L1/Lasso, but try both, or elastic net
- https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization-How-does-it-solve-the-problem-of-overfitting-Which-regularizer-to-use-and-when
- https://stats.stackexchange.com/questions/331782/if-only-prediction-is-of-interest-why-use-lasso-over-ridge/331809#331809
o Intuition: with collinear variables, ridge keeps both and just shrinks them, whereas lasso
typically "randomly" kicks one out. Having more variables to rely on gives more
"diversification". It does depend on the true distribution of the regression coefficients, though:
if only a small fraction of coefficients are truly nonzero, lasso can perform better (see the
comparison sketch below).
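A hedged sketch of the "just try both (or elastic net)" advice, comparing cross-validated ridge, lasso and elastic net; the synthetic data-generating process is an arbitrary assumption of mine, and which model wins depends entirely on the true coefficient structure.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import cross_val_score

# Synthetic data with many correlated features and only a few truly informative ones
X, y = make_regression(n_samples=300, n_features=50, n_informative=5,
                       effective_rank=10, noise=5.0, random_state=0)

models = {
    "ridge": RidgeCV(alphas=np.logspace(-3, 3, 13)),
    "lasso": LassoCV(cv=5, random_state=0),
    "enet":  ElasticNetCV(cv=5, l1_ratio=[0.1, 0.5, 0.9], random_state=0),
}

# Compare out-of-sample R^2 across the three penalties
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:5s} R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```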
Differences
- L1 is useful for feature selection; L2, on the other hand, is useful when you have
collinear/codependent features.
o https://explained.ai/regularization/L1vsL2.html
- L2 shrinks large coefficients more and small coefficients less, whereas L1 shrinks all
coefficients by the same absolute amount (think of β^2 vs abs(β) and their gradients). This is
illustrated in the sketch after this list.
o From a Bayesian perspective, L2/Ridge corresponds to a Gaussian (normal) prior on the
coefficients, whereas L1/Lasso corresponds to a Laplacian (double-exponential) prior
- Generally, when you have many small/medium sized effects you should
go with ridge. If you have only a few variables with a medium/large
effect, go with lasso. Hastie, Tibshirani, Friedman (ESLII)
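To make the "L2 shrinks large coefficients more, L1 shrinks everything by the same amount" point concrete, here is a small numeric sketch. It assumes an orthonormal design, where the ridge and lasso solutions reduce to the simple closed forms used below (proportional shrinkage vs soft-thresholding):

```python
import numpy as np

# OLS coefficients from a hypothetical orthonormal-design fit
beta_ols = np.array([5.0, 1.0, 0.2])
lam = 0.5

# Ridge with orthonormal inputs: proportional shrinkage beta / (1 + lam)
beta_ridge = beta_ols / (1.0 + lam)

# Lasso with orthonormal inputs: soft-thresholding, same absolute shrinkage for every
# coefficient, and exact zeros for small ones (this is the feature-selection effect)
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

print("OLS:  ", beta_ols)      # [5.    1.    0.2  ]
print("Ridge:", beta_ridge)    # [3.333 0.667 0.133] -- big coefficients lose more in absolute terms
print("Lasso:", beta_lasso)    # [4.5   0.5   0.   ] -- constant shrinkage, small one hits zero
```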
Elastic Net
https://www.quora.com/What-is-the-advantage-of-combining-L2-and-L1-regularizations
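A brief sketch of elastic net combining both penalties; l1_ratio controls the mix (closer to 0 behaves like ridge, 1 is pure lasso) and the data here is again a synthetic placeholder of my own.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Placeholder data: correlated features, only a few truly informative coefficients
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       effective_rank=8, noise=3.0, random_state=0)

# Cross-validation picks both the L1/L2 mix (l1_ratio) and the overall strength (alpha)
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 1.0], cv=5, random_state=0).fit(X, y)
print("chosen l1_ratio:", enet.l1_ratio_)
print("chosen alpha:   ", enet.alpha_)
print("nonzero coefs:  ", (enet.coef_ != 0).sum())
```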