Lecture 4.2. Generalization and Regularization
Halmstad University
Quick reminder about overfitting
The problem of overfitting
[Figure: an underfitted fit vs. an overfitted fit]
[Figure: overfitting in both classification and regression]
Addressing overfitting
1. Model selection (previous lecture)
– Try models of varying complexity, estimate the generalization error of each (as explained previously), and keep the best model.
2. Regularization (this lecture)
– Keep all the parameters, but penalize large values so that the hypothesis stays simple.
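As a quick illustration of this model-selection loop (a sketch, not part of the lecture material; the data and all names are hypothetical), one can fit polynomials of increasing degree and keep the degree with the lowest validation error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy quadratic (hypothetical example).
x = rng.uniform(-1, 1, 60)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.1, 60)

# Split into a training set and a held-out validation set.
x_train, y_train = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

best_degree, best_err = None, np.inf
for degree in range(1, 9):
    # Fit a polynomial model of the given complexity on the training data.
    coeffs = np.polyfit(x_train, y_train, degree)
    # Estimate the generalization error on the held-out data.
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    if val_err < best_err:
        best_degree, best_err = degree, val_err

print("selected degree:", best_degree)
```

The validation error, not the training error, drives the choice: the training error alone would always prefer the most complex model.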
Regularization
Regularization - Motivation
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2$ vs. $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \boldsymbol{\theta_3 x_1^3 + \theta_4 x_1^4}$
Regularization
• Small values for the parameters $\theta_0, \theta_1, \dots, \theta_p$
– imply a simpler hypothesis
– which is less prone to overfitting
$\lambda$ = regularization parameter, weighting the penalty on large parameter values (it is a hyper-parameter)
Regularization
Suppose:
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3 + \theta_4 x_1^4$
Penalizing large values of $\theta_3$ and $\theta_4$ drives them towards zero, leaving (approximately) the simpler quadratic hypothesis.
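To see this shrinkage effect numerically (an illustrative sketch with made-up data, not from the lecture), one can compare the closed-form regularized least-squares solution with and without a penalty:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data generated from a quadratic (hypothetical example).
x = rng.uniform(-1, 1, 30)
y = 1.0 + x - 2.0 * x**2 + rng.normal(0, 0.05, 30)

# Design matrix with powers x^0 .. x^4 (the degree-4 hypothesis above).
X = np.vander(x, 5, increasing=True)

def ridge(X, y, lam):
    """Closed-form regularized least squares:
    theta = (X^T X + lam * I)^-1 X^T y,
    with the intercept theta_0 left unpenalized (I[0, 0] = 0)."""
    I = np.eye(X.shape[1])
    I[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)

theta_unreg = ridge(X, y, lam=0.0)
theta_reg = ridge(X, y, lam=10.0)

# With a large lambda, the high-order coefficients shrink towards zero.
print("theta_3, theta_4 without regularization:", theta_unreg[3:])
print("theta_3, theta_4 with lambda = 10:      ", theta_reg[3:])
```

The penalized solution has a smaller overall parameter norm, so the superfluous cubic and quartic terms are suppressed rather than fitting the noise.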
Regularization
Suppose:
$h_\theta(x) = \boldsymbol{\theta_0} + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3 + \theta_4 x_1^4$
By convention, the intercept $\theta_0$ is not included in the regularization penalty: the sum of squared parameters starts at $j = 1$.
Regularized Linear Regression
Regularized Linear Regression
We minimize:
$E(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{j=1}^{d}\theta_j^2\right]$
• By the way, how can you write $E(\theta)$ in a more compact way, using vectors/matrices?
Regularized Linear Regression
In a more compact way, with design matrix $X$ and target vector $y$ (and $\theta_0$ excluded from the penalty):
$E(\theta) = \frac{1}{2m}\left[(X\theta - y)^\top (X\theta - y) + \lambda \sum_{j=1}^{d}\theta_j^2\right]$
Gradient Descent — update $\theta_0, \theta_1, \dots, \theta_d$ simultaneously:
$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$
$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$, for $j = 1, \dots, d$
same as:
$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
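The gradient-descent updates above can be sketched in a few lines of NumPy (an illustrative implementation on synthetic data; the data and hyper-parameter values are made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy regression data (hypothetical example).
m, d = 100, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, d))])  # column of 1s for theta_0
true_theta = np.array([0.5, 1.0, -2.0, 3.0])
y = X @ true_theta + rng.normal(0, 0.1, m)

alpha, lam = 0.1, 1.0
theta = np.zeros(d + 1)
for _ in range(1000):
    grad = X.T @ (X @ theta - y) / m      # (1/m) * sum (h(x) - y) x, for all j at once
    grad[1:] += (lam / m) * theta[1:]     # add (lambda/m) theta_j for j >= 1 only
    theta = theta - alpha * grad          # simultaneous update of all parameters

print("learned theta:", np.round(theta, 2))
```

Computing the whole gradient vector before updating `theta` is what makes the update simultaneous; note that `grad[1:]` leaves $\theta_0$ unpenalized.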
Regularized Logistic Regression
(for classification)
Regularized Logistic Regression
$h_\theta(x) = g\left(\theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^2 x_2 + \theta_4 x_1^2 x_2^2 + \theta_5 x_1^2 x_2^3 + \theta_6 x_1^3 x_2 + \dots\right)$
Regularized Logistic Regression
We minimize the logistic (cross-entropy) error with a regularization term added:
$E(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \underbrace{\frac{\lambda}{2m}\sum_{j=1}^{p}\theta_j^2}_{\text{Regularization term}}$
Regularized Logistic Regression
Gradient Descent — simultaneously update all parameters $\theta_0, \theta_1, \dots, \theta_p$:
$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$
$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$, for $j = 1, \dots, p$
The update has the same form as for regularized linear regression, but here $h_\theta(x)$ passes the linear combination through the sigmoid $g$.
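A minimal sketch of regularized logistic regression trained with these updates (illustrative only; the synthetic data, learning rate, and $\lambda$ are made-up choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def g(z):
    """Sigmoid function."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy two-class data (hypothetical example): label 1 when x1 + x2 > 0.
m = 200
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 2))])
y = (X[:, 1] + X[:, 2] > 0).astype(float)

alpha, lam = 0.5, 1.0
theta = np.zeros(3)
for _ in range(2000):
    # Same gradient form as linear regression, but with h(x) = g(theta^T x);
    # theta_0 is again excluded from the penalty.
    grad = X.T @ (g(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]
    theta = theta - alpha * grad

accuracy = np.mean((g(X @ theta) >= 0.5) == (y == 1))
print("training accuracy:", accuracy)
```

Only the hypothesis changes between the two models; the regularized gradient-descent loop itself is identical, which is why the two slides look so alike.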