
Programming for Data Science

Lecture 7 – Supervised Learning, Continued.

Thomas Lavastida
University of Texas at Dallas
[email protected]
Spring 2023
Agenda

• Assignment 2 Review
• Quick review of Supervised Learning and Linear Regression
• Linear Regression in Python
• Start Regularization and Cross Validation

Assignment 2 Review
Supervised Learning and Regression Review
Supervised Learning

• Given – labelled data points $(x_1, y_1), \dots, (x_n, y_n)$

  • $x$ – features, independent variables, predictors, columns, etc.
  • $y$ – target, dependent variable, outcome, etc.
  • Continuous $y$ -> then we call this regression
  • Discrete/categorical $y$ -> then we call this classification

• Goal: Find a mapping/function $f$ from $x$’s to $y$’s such that $f(x_i) \approx y_i$


Linear Regression

• Simple class of regression models


• Let $x_1, x_2, \dots, x_k$ be the independent variables
• Model parameters $\beta_1, \dots, \beta_k$ (one for each indep. variable), plus an intercept $\beta_0$
• Predicted outcome computed via a linear function:
  $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$

• Compute the $\beta$’s by minimizing the average squared error


Overfitting

• As the model gets more complex, it can fit the data more closely
• New data we see (and want to make predictions about) may not be fit well (i.e., high error)
• This is called overfitting

• Main idea to deal with this -> split into a train and a test set
  • Training set – used to compute model parameters
  • Test set – used to estimate the accuracy of the model on new data
PYTHON PRACTICE
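A minimal sketch of this train/test workflow with scikit-learn; the file name and column names (data.csv, x1, x2, y) are assumptions for illustration, not the dataset used in class:

# Minimal sketch -- assumed file and column names, adapt to the dataset in class
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

df = pd.read_csv("data.csv")                 # hypothetical dataset
X = df[["x1", "x2"]]                         # feature columns (assumed names)
y = df["y"]                                  # target column (assumed name)

# Hold out a test set so we can estimate accuracy on new data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)                  # computes the beta's on the training data

print(model.intercept_, model.coef_)         # beta_0 and beta_1..beta_k
print(mean_squared_error(y_test, model.predict(X_test)))   # error on unseen data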
Review: Overfitting

• Model with an overfitting problem
  • Good performance on the data in hand
  • Poor predictive accuracy on new data

• Solution 1 – Splitting data
  • Training set: train the model (get parameters)
  • Test set: evaluate performance

• Solution 2 – Regularization
Regularization – Intuition

• Overfitting occurrence: too many variables

• True relationship: $y = \beta_0 + \beta_1 x + \varepsilon$

• Fit the data w/ a 10th-degree polynomial:
  $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_{10} x^{10} + \varepsilon$

Fewer variables
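A small sketch of this intuition with NumPy (the data here is synthetic, not from the lecture): fit the same noisy linear data with a degree-1 and a degree-10 polynomial and compare errors on held-out points.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, 30)     # true relationship is linear plus noise

x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

for degree in (1, 10):
    coeffs = np.polyfit(x_train, y_train, degree)             # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, train_mse, test_mse)   # degree 10 typically fits train better but test worse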
Regularization – Intuition (Cont.)

• Overfitting occurrence: large variance/fluctuation

• Large coefficients => large fluctuations

• Under the same scale:

  • Green: $f(x) = -x^4 + 7x^3 - 5x^2 - 31x + 30$

  • Blue: $g(x) = -\frac{1}{5} f(x)$

Smaller coefficients

https://www.datacamp.com/community/tutorials/towards-preventing-overfitting-regularization
Regularization – Intuition (Cont.)

• What we need
• Smaller coefficients (coefficient closer to 0)
• Fewer variables (coefficient = 0)

• Penalize the magnitude of coefficients

• Regularization
• Modify our original linear regression model
• Add terms to penalize the magnitude of coefficients
Regularization

• Linear regression (fit only)
  • Minimize the error between the actual and predicted values

  $f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} x_i + b) \right)^2$

• Regularization (fit + control overfitting)
  • Minimize the error between the predicted and actual values
  • Penalize the magnitude of the feature coefficients

  $f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} x_i + b) \right)^2 + \mathit{Penalty}(\boldsymbol{\omega})$
Regularization – Two Methods

$f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} x_i + b) \right)^2 + \underbrace{\mathit{Penalty}(\boldsymbol{\omega})}_{\text{Shrinkage Penalty}}$

• Two formulations of the shrinkage penalty

  • L2 regularization: the sum of squared coefficient magnitudes, $\lambda \sum_{j} \omega_j^2$
    => Ridge regression

  • L1 regularization: the sum of absolute coefficient magnitudes, $\lambda \sum_{j} |\omega_j|$
    => Lasso regression
Ridge Regression

• Linear regression with L2 regularization (sum of squared parameters)

• Minimize the function:

  $f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} x_i + b) \right)^2 + \lambda \sum_{j=1}^{k} \omega_j^2$   (the last term is the shrinkage penalty)

  where $\lambda \ge 0$

• Larger coefficient magnitudes increase the amount of penalty


Ridge Regression – Tuning Parameter

$f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} x_i + b) \right)^2 + \lambda \sum_{j=1}^{k} \omega_j^2$

• $\lambda$ – the amount of penalty
  • $\lambda = 0$ => a plain linear regression
  • $\lambda \to \infty$ => all coefficients would be driven toward zero
  • Higher $\lambda$, more penalty, smaller coefficients

• $\lambda$ – hyperparameter
  • NOT estimated with the other parameters
  • Set “manually” before model estimation
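A minimal sketch with scikit-learn, where the tuning parameter is called alpha rather than $\lambda$; X_train, y_train, X_test, y_test are assumed from the earlier split sketch:

from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# alpha plays the role of lambda: higher alpha => stronger shrinkage
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

print(ridge.coef_)                                        # shrunken coefficients
print(mean_squared_error(y_test, ridge.predict(X_test)))  # test-set error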
LASSO

• Linear regression with L1 regularization (sum of absolute values of parameters)

  $f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} x_i + b) \right)^2 + \lambda \sum_{j=1}^{k} |\omega_j|$

  where $\lambda \ge 0$.

• L1 penalty can force some coefficient estimates to be exactly zero

• Combines the shrinking advantage of ridge regression with variable selection

• LASSO: Least absolute shrinkage and selection operator
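A minimal sketch with scikit-learn's Lasso (again alpha plays the role of $\lambda$; X_train and y_train are assumed from the earlier sketch), illustrating that some coefficients come out exactly zero:

import numpy as np
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.1)       # larger alpha => more coefficients forced to exactly zero
lasso.fit(X_train, y_train)

print(lasso.coef_)                                   # some entries are exactly 0
print("variables kept:", np.sum(lasso.coef_ != 0))   # built-in variable selection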


Hyperparameter Tuning and Cross Validation
Hyperparameter Tuning

• Hyperparameters – set before running the model

• Examples
  • LASSO and Ridge – $\lambda$
  • Polynomial regression – degree of the polynomial ($n$)

• Intuition of tuning (polynomial case)
  • Start with some potential values, e.g., $n = 1, 2, 3, \dots$
  • For each $n$, run the model
  • Select the model with the best performance
Tuning Method – Grid Search

• Try all possible hyperparameters of interest

• Most commonly used method for hyperparameter tuning

• Polynomial regression case


  • Define a set of potential polynomial degrees
  • Estimate, evaluate, choose

  [Table on slide: degree vs. MSE values – the degree with the lowest MSE is the selected model]


• Select the model with best performance … on which dataset?


Data Splitting – Model Training

[Diagram on slide: Labeled Data is split into a Training Set (model training, producing parameter estimates) and a Test Set (prediction and evaluation); the performance measure (e.g., MSE) on the test set is unbiased because the test data is untouched new data]

• Model selection?
  • For each model, get the performance measure on the test set
  • Select the model with the best performance on the test data

• Problem
  • The “best model”? It is only the “best fit for the test set!”
  • Overfitting the test set

• Solution: more splits
Data Splitting – Model Selection

[Diagram on slide: the original training set is further split into a Training Set (model training, parameter estimates) and a Validation Set (model selection); the Test Set remains separate]

• Validation set:
  • Used for model selection (e.g., hyperparameter tuning)

• Test set:
  • Untouched during training and selection
  • Used for model assessment (generalizability)
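A minimal sketch of such a three-way split using two calls to train_test_split (the proportions are illustrative assumptions; X and y are from the earlier sketch):

from sklearn.model_selection import train_test_split

# First carve off the untouched test set, then split the rest into train/validation
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)
# Result: 60% training, 20% validation, 20% test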
Limitations of Single Splitting (Partition)

• Data waste: the method is applied to less data

• If there is not enough data – unreliable results
  • Small training set
  • Small test set

• Solution: Cross Validation


K-Fold Cross Validation

• Randomly cut the dataset into $K$ segments (folds)

  • Use the $k$-th segment as the test set, the rest as the training set
  • Obtain $MSE_k$, the mean squared error on the $k$-th segment (test set)
  • After $K$ iterations, calculate the mean of $MSE_1, \dots, MSE_K$

[Diagram on slide: 5 folds; in each of the 5 iterations a different fold serves as the test set, yielding $MSE_1, MSE_2, \dots, MSE_5$]
K-Fold Cross Validation

• No data put to waste
  • Works even with a small dataset
  • Involves more data in training the model
  • More reliable, by taking the mean of multiple $MSE_k$ values

• Model selection
  • Uses more data to evaluate the performance of each model
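A minimal sketch of 5-fold CV with scikit-learn's cross_val_score (X and y assumed from the earlier sketch; the scoring convention is negative MSE, so we flip the sign):

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# 5-fold CV: each fold serves once as the held-out segment
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="neg_mean_squared_error")
mse_per_fold = -scores              # MSE_1, ..., MSE_5
print(mse_per_fold.mean())          # cross-validated estimate of the MSE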
CV for Model Selection

• Combine CV with grid search

• Example: polynomial regression, grid search for the degree, with CV

  • Leave a portion of the data aside as the test set
  • Set a grid for the hyperparameter (let $n$ be the polynomial degree)
  • Select the model with the lowest CV score

  [Table on slide: degree vs. CV MSE values – the degree with the lowest CV score is selected, and that model is then applied to the test set]
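A minimal sketch of this loop with scikit-learn (X_train and y_train assumed from the earlier split); PolynomialFeatures expands the feature columns into powers up to the given degree:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

best_degree, best_mse = None, float("inf")
for degree in range(1, 11):                              # grid of candidate degrees
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X_train, y_train, cv=5,
                           scoring="neg_mean_squared_error").mean()
    if mse < best_mse:
        best_degree, best_mse = degree, mse

print(best_degree)    # lowest CV score; refit this model and apply it to the test set last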

Grid Search with CV

• Manually set a grid of discrete hyperparameter values


• Set a metric for model performance

• Search exhaustively through the grid


• For each set of hyperparameters, evaluate each model’s CV score
• The optimal hyperparameters are those of the model achieving the best CV score
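scikit-learn packages this procedure as GridSearchCV; a minimal sketch for tuning Ridge's alpha (the grid values are illustrative; X_train, y_train, X_test, y_test assumed from before):

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

param_grid = {"alpha": [0.01, 0.1, 1, 10, 100]}           # manually chosen grid
search = GridSearchCV(Ridge(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")   # metric for model performance
search.fit(X_train, y_train)                              # exhaustive search with 5-fold CV

print(search.best_params_)                # hyperparameter value with the best CV score
print(search.score(X_test, y_test))       # final assessment on the untouched test set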
Tuning is expensive

• Run the model repeatedly
  • $N$ grid points, $K$-fold CV => $N \cdot K$ model fits
  • Example: 20 grid points, 5-fold CV => 100 fits

• Computationally expensive

• Sometimes very slight improvement


PYTHON PRACTICE
