
Learning Systems (DT8008)
Lecture 4.2: Generalization and Regularization

• Overfitting and Generalization
• Regularization

Dr. Mohamed-Rafik Bouguelia
[email protected]
Halmstad University
Quick reminder about overfitting

The problem of overfitting

[Figure: regression fits illustrating underfitting vs. overfitting]
[Figure: examples of overfitting in classification and in regression]
Addressing overfitting

1. Model selection (previous lecture)
   – Try various models (of different complexity), compute the generalization
     error for each (as explained previously), and keep the best model.

2. Reducing the number of features (previous lecture)
   – We are more likely to overfit when the number of features is high
     (relative to the size of the dataset).
   • Manually select which features to keep / remove
   • Or use feature selection algorithms

3. Using an ensemble method (previous lecture)

4. Using regularization (this lecture)
   – Keep all features, but reduce the magnitude / values of the parameters θⱼ
   – Works well when we have a lot of features, and each feature contributes
     a bit to predicting y
Regularization

Regularization - Motivation

h_θ(x) = θ₀ + θ₁x₁ + θ₂x₁²          h_θ(x) = θ₀ + θ₁x₁ + θ₂x₁² + θ₃x₁³ + θ₄x₁⁴

In the second model we added more features, e.g. x₁³ and x₁⁴. This model
overfits the data and does not generalize well.

Suppose that we penalize θ₃ and θ₄ to make them really small (≈ 0):

  min_θ  (1/2n) Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + 1000·θ₃² + 1000·θ₄²

Then the only way to make this new cost function small is if θ₃ and θ₄ are
small. With θ₃ ≈ 0 and θ₄ ≈ 0, we are essentially left with the simpler
quadratic model, which generalizes better.
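To make this concrete, here is a minimal numpy sketch (the toy data is illustrative; the penalty weight 1000 follows the slide) that minimizes exactly this penalized cost in closed form and shows the penalty driving θ₃ and θ₄ toward zero:

```python
import numpy as np

# Toy data: y is roughly quadratic in x1, plus a little noise (illustrative).
rng = np.random.default_rng(0)
x1 = np.linspace(-1, 1, 20)
y = 1.0 + 2.0 * x1 - 1.5 * x1**2 + rng.normal(scale=0.1, size=x1.size)

# Polynomial features up to degree 4: columns [1, x1, x1^2, x1^3, x1^4].
X = np.vander(x1, N=5, increasing=True)
n = X.shape[0]

# Penalty of 1000 on theta_3 and theta_4 only, as on the slide.
L = np.diag([0.0, 0.0, 0.0, 1000.0, 1000.0])

# Setting the gradient of (1/2n)*||X @ theta - y||^2 + theta.T @ L @ theta
# to zero gives the linear system (X.T @ X + 2n*L) theta = X.T @ y.
theta_plain = np.linalg.solve(X.T @ X, X.T @ y)
theta_penalized = np.linalg.solve(X.T @ X + 2 * n * L, X.T @ y)

print("unpenalized theta:", np.round(theta_plain, 3))
print("penalized theta:  ", np.round(theta_penalized, 3))  # theta_3, theta_4 ~ 0
```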
Regularization

• Small values for the parameters θ₀, θ₁, …, θ_p
  – imply a simpler hypothesis
  – less prone to overfitting

• So we just modify our cost function as follows:

  E(θ) = (1/2n) [ Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σⱼ₌₁ᵖ θⱼ² ]

  (by convention, the intercept θ₀ is not penalized)

λ is the regularization parameter (a hyper-parameter). It controls the
trade-off between two objectives:
• Objective 1: fit the training dataset well
• Objective 2: keep the parameters small
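A minimal numpy sketch of this modified cost (the arrays X and y are hypothetical; X is assumed to carry a leading column of ones):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """E(theta) = (1/2n) * (sum of squared errors + lam * sum_{j>=1} theta_j**2).
    The intercept theta_0 is not penalized, by convention."""
    n = X.shape[0]
    errors = X @ theta - y                   # h_theta(x^(i)) - y^(i) for all i
    penalty = lam * np.sum(theta[1:] ** 2)   # skip theta[0]
    return (errors @ errors + penalty) / (2 * n)
```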
Regularization

What happens if λ is set to zero?
• E(θ) becomes our original cost function, so overfitting can happen.

What happens if λ is set to an extremely large value?
• The algorithm might result in underfitting.
• Example for linear regression. Suppose:

  h_θ(x) = θ₀ + θ₁x₁ + θ₂x₁² + θ₃x₁³ + θ₄x₁⁴

  We will end up penalizing θ₁, θ₂, θ₃, θ₄ (their values will be close to 0),
  leaving h_θ(x) ≈ θ₀: a flat line that underfits the data.

So it's good to try several values for λ and estimate the generalization
error each time, keeping the value of λ that gives the lowest estimated
generalization error.
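A minimal sketch of this λ-selection procedure, assuming scikit-learn is available (Ridge's alpha parameter plays the role of λ here; the toy data is illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative toy data: a noisy quadratic trend.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))
y = 1 + 2 * X[:, 0] - 1.5 * X[:, 0] ** 2 + rng.normal(scale=0.1, size=30)

for lam in [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]:
    # Degree-4 polynomial features, then ridge regression with strength lam.
    model = make_pipeline(PolynomialFeatures(degree=4), Ridge(alpha=lam))
    # 5-fold cross-validation estimates the generalization error (here, MSE).
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"lambda = {lam:>6}: estimated generalization MSE = {mse:.4f}")
```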
Regularized Linear Regression

Regularized Linear Regression

We minimize:

  E(θ) = (1/2n) [ Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σⱼ₌₁ᵈ θⱼ² ]

where h_θ(x) = θᵀx = θ₀ + θ₁x₁ + θ₂x₂ + ⋯ + θ_d x_d

• By the way, how can you write E(θ) in a more compact way, using
  vectors/matrices?

  E(θ) = (1/2n) ( ‖Xθ − y‖² + λ ‖θ₁:d‖² )

  where Xθ is the vector of predictions, y is the vector of true outputs,
  and θ₁:d is the vector of parameters θ₁, θ₂, …, θ_d (θ₀ is excluded from
  the penalty).
Regularized Linear Regression
Gradient Descent

Repeat until convergence, updating θ₀, θ₁, …, θ_d simultaneously:

  θ₀ := θ₀ − α (1/n) Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
  θⱼ := θⱼ (1 − α λ/n) − α (1/n) Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾    for j = 1, …, d

The factor (1 − α λ/n) shrinks θⱼ by some ratio times its current value; the
second term is the same as what we had previously in (unregularized) gradient
descent.
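A minimal numpy sketch of this update rule (the function name and interface are illustrative; X is assumed to carry a leading column of ones):

```python
import numpy as np

def ridge_gradient_descent(X, y, lam=1.0, alpha=0.1, iters=1000):
    """Gradient descent for regularized linear regression.
    X is n-by-(d+1) with a leading column of ones; lam plays the role of lambda."""
    n, d1 = X.shape
    theta = np.zeros(d1)
    for _ in range(iters):
        residual = X @ theta - y        # h_theta(x^(i)) - y^(i) for all i
        grad = (X.T @ residual) / n     # same gradient as unregularized GD
        reg = (lam / n) * theta         # regularization part of the gradient...
        reg[0] = 0.0                    # ...except theta_0, which is not penalized
        theta -= alpha * (grad + reg)   # simultaneous update of all theta_j
    return theta
```

Folding the regularization part into the update reproduces the slide's form: θⱼ − α(gradⱼ + (λ/n)θⱼ) = θⱼ(1 − αλ/n) − α·gradⱼ.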
Regularized Linear Regression
Normal equation

• Previously (in the lecture about linear regression), when we computed the
  derivative of the cost function (without the regularization term) and set
  it equal to 0 (to find the optimal θ), we found that the solution is:

  θ = (XᵀX)⁻¹ Xᵀy

• If we do the same while including the regularization term in our cost
  function, then the solution becomes:

  θ = (XᵀX + λM)⁻¹ Xᵀy

  where M is the (d+1)×(d+1) identity matrix with its top-left entry set to
  0, so that θ₀ is not regularized.
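A direct numpy translation of this closed-form solution (a sketch; X is again assumed to contain a leading column of ones):

```python
import numpy as np

def ridge_normal_equation(X, y, lam=1.0):
    """Solve (X^T X + lam * M) theta = X^T y, where M is the identity
    with M[0, 0] = 0 so that the intercept theta_0 is not penalized."""
    M = np.eye(X.shape[1])
    M[0, 0] = 0.0
    # np.linalg.solve is preferred over explicitly inverting the matrix.
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)
```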
Regularized Logistic Regression
(for classification)

Regularized Logistic Regression

h_θ(x) = g(θ₀ + θ₁x₁ + θ₂x₁² + θ₃x₁²x₂ + θ₄x₁²x₂² + θ₅x₁²x₂³ + θ₆x₁³x₂ + …)

With many high-order polynomial features like these, the decision boundary
can become overly complex and overfit. As with linear regression, we add a
regularization term, (λ/2n) Σⱼ₌₁ᵖ θⱼ², to the logistic regression cost
function.
Regularized Logistic Regression
Gradient Descent

Simultaneously update all parameters θ₀, θ₁, …, θ_p:

  θ₀ := θ₀ − α (1/n) Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
  θⱼ := θⱼ (1 − α λ/n) − α (1/n) Σᵢ₌₁ⁿ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾    for j = 1, …, p

The update looks identical to the one for regularized linear regression, but
here the hypothesis is the sigmoid h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx)).
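The same gradient descent sketch adapted to logistic regression (names and interface are again illustrative; X carries a leading column of ones and y holds 0/1 labels):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_gd(X, y, lam=1.0, alpha=0.1, iters=1000):
    """Gradient descent for regularized logistic regression."""
    n, p1 = X.shape
    theta = np.zeros(p1)
    for _ in range(iters):
        error = sigmoid(X @ theta) - y   # h_theta(x^(i)) - y^(i) for all i
        grad = (X.T @ error) / n
        reg = (lam / n) * theta
        reg[0] = 0.0                     # theta_0 is not penalized
        theta -= alpha * (grad + reg)    # simultaneous update of all theta_j
    return theta
```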
