Lect03 Linear Model ML
2023
Contents
1. Linear Regression
2. Classification
3. Logistic Regression
4. Softmax Regression
symbol                                                    meaning
$a, b, c, N, \dots$                                       scalar number
$\mathbf{w}, \mathbf{v}, \mathbf{x}, \mathbf{y}, \dots$   column vector
$X, Y, \dots$                                             matrix
$\mathbb{R}$                                              set of real numbers
$\mathbb{Z}$                                              set of integer numbers
$\mathbb{N}$                                              set of natural numbers
$\mathbb{R}^D$                                            set of vectors
$\mathcal{X}, \mathcal{Y}, \dots$                         set
$\mathcal{A}$                                             algorithm

operator                                                  meaning
$\mathbf{w}^\top$                                         transpose
$XY$                                                      matrix multiplication
$X^{-1}$                                                  inverse
Learning diagram
Linear Regression
• Simple Linear Model
• Weighted Linear Model
• Linear Basis Function Model
Problem 1
• Consider the Advertising data set $\mathcal{D}_{\text{train}}$, which consists of the sales of a product in 200 different markets, along with the TV advertising budget for the product in each of those markets. Find the relationship between TV (input) and sales (output).
[Figure: two scatter plots of Sales versus TV]
Linear Regression Model
Concept 1
A linear regression is a model that assumes a linear relationship between the inputs and the output.
Problem Statement
• The requirement is to build a system that takes a vector $\mathbf{x} \in \mathbb{R}^{D+1}$ as input and predicts the value of a scalar $y \in \mathbb{R}$ as its output.
• The hypothesis set $\mathcal{H}$:
\[ y \approx \hat{y} = h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} \tag{1} \]
where $\hat{y}$ is the value that our model (function) predicts for $y$, and $\mathbf{w} \in \mathbb{R}^{D+1}$ is the vector of parameters.
Problem Statement (cont.)
• Task T: predict $y$ from $\mathbf{x}$ by outputting $\hat{y} = h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^\top \mathbf{x}$.
• The train set $\mathcal{D}_{\text{train}}$, denoted $(X, \mathbf{y})$, includes $N$ samples $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_N, y_N)\}$. Construct the matrix $X$ and the vectors $\mathbf{y}$ and $\hat{\mathbf{y}}$:
\[
X = \underbrace{\begin{bmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_N^\top \end{bmatrix}}_{\text{input data matrix}}, \quad
\mathbf{y} = \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}}_{\text{target vector}}, \quad
\hat{\mathbf{y}} = \underbrace{\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_N \end{bmatrix}}_{\text{output vector}} \tag{2}
\]
Problem Statement (cont.)
• Performance measure P:
Concept 2
The mean squared error $\mathrm{MSE}_{\text{train}}$ of the model on the train set $\mathcal{D}_{\text{train}}$:
\[ \mathrm{MSE}_{\text{train}} = \frac{1}{N} \lVert \hat{\mathbf{y}} - \mathbf{y} \rVert^2 = \frac{1}{N} \sum_{n=1}^{N} (\hat{y}_n - y_n)^2 \tag{3} \]
Problem Statement (cont.)
• The learning goal: find the vector of parameters $\mathbf{w}$ such that
\[ \mathbf{w} = \arg\min_{\mathbf{w}} \mathrm{MSE}_{\text{train}} \tag{4} \]
Solving Problem
Solution
• Compute the gradient of $\mathrm{MSE}_{\text{train}}$, using $\hat{\mathbf{y}} = X\mathbf{w}$:
\[
\nabla_{\mathbf{w}} \mathrm{MSE}_{\text{train}}
= \frac{1}{N} \nabla_{\mathbf{w}} \left( \mathbf{w}^\top X^\top X \mathbf{w} - \mathbf{w}^\top X^\top \mathbf{y} - \mathbf{y}^\top X \mathbf{w} + \mathbf{y}^\top \mathbf{y} \right)
= \frac{2}{N} \left( X^\top X \mathbf{w} - X^\top \mathbf{y} \right) \tag{5}
\]
• If $\mathrm{MSE}_{\text{train}}$ reaches its minimum value, then $\nabla_{\mathbf{w}} \mathrm{MSE}_{\text{train}} = 0$:
\[
\begin{aligned}
X^\top X \mathbf{w} - X^\top \mathbf{y} &= 0 \\
X^\top X \mathbf{w} &= X^\top \mathbf{y} \\
\mathbf{w} &= (X^\top X)^{-1} X^\top \mathbf{y}
\end{aligned} \tag{6}
\]
The last step assumes that $X^\top X$ is invertible.
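The closed-form solution (6) can be tried out numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `fit_linear_regression` and the noise-free toy data generated from y = 2 + 3x are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Solve the normal equation X^T X w = X^T y for w."""
    # np.linalg.solve avoids forming the explicit inverse of X^T X.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical noise-free data generated from y = 2 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # constant feature x_0 = 1 for the bias
y = 2.0 + 3.0 * x

w = fit_linear_regression(X, y)
mse_train = np.mean((X @ w - y) ** 2)      # equation (3)
print(w, mse_train)
```

Solving the linear system instead of inverting $X^\top X$ gives the same $\mathbf{w}$ when the inverse exists and is usually the numerically safer choice.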
Programming Example
• Use seaborn to read the tips dataset and find the linear relationship between total_bill and tip.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style("darkgrid")
# Load the example "tips" dataset that ships with seaborn
tips = sns.load_dataset("tips")
# Scatter plot with a fitted regression line and no confidence band
sns.regplot(x="total_bill", y="tip", data=tips, ci=None,
            line_kws={"color": "red"})
plt.show()
Programming Example (cont.)
[Figure: scatter plot of tip versus total_bill with the fitted regression line]
Word Example
1. Find the linear regression function $y = f(x) = w_0 + w_1 x$ given the following data set $\mathcal{D}$:

input x    target y
1          2
2          3
3          3
4          5

2. Find the linear regression function $y = f(\mathbf{x}) = f(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2$ given the following data set $\mathcal{D}$:

input x    target y
(1, 1)     1
(2, 3)     3
(3, 4)     4
(4, 3)     5
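The two exercises above can be checked numerically. This is a sketch only, assuming NumPy; the helper `fit` is a hypothetical name, and it uses `np.linalg.lstsq`, which solves the same least-squares problem as the normal equation (6).

```python
import numpy as np

def fit(inputs, targets):
    """Least-squares fit of w_0 + w_1 x_1 + ... with a leading column of ones."""
    inputs = np.asarray(inputs, dtype=float).reshape(len(targets), -1)
    X = np.column_stack([np.ones(len(targets)), inputs])
    w, *_ = np.linalg.lstsq(X, np.asarray(targets, dtype=float), rcond=None)
    return w

w_a = fit([1, 2, 3, 4], [2, 3, 3, 5])                        # exercise 1
w_b = fit([(1, 1), (2, 3), (3, 4), (4, 3)], [1, 3, 4, 5])    # exercise 2
print(w_a)
print(w_b)
```

Working the normal equation by hand gives $w_0 = 1$, $w_1 = 0.9$ for the first data set and $(w_0, w_1, w_2) = (-7/23,\ 25/23,\ 7/23)$ for the second, which the sketch should reproduce up to rounding.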
Discussion
Weighted Linear Model
• In some cases the observations may be weighted; for example, they may not be equally reliable. In this case, we find the vector of parameters $\mathbf{w}$ that minimizes the weighted sum of squared errors
\[ E_{\text{train}} = \sum_{n=1}^{N} a_n (\hat{y}_n - y_n)^2 \tag{7} \]
Solving Problem
1. Construct the input data matrix $X$, the weight matrix $A$ and the target vector $\mathbf{y}$
\[
X = \underbrace{\begin{bmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_N^\top \end{bmatrix}}_{\text{input data matrix}}, \quad
A = \underbrace{\begin{bmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_N \end{bmatrix}}_{\text{weight matrix}}, \quad
\mathbf{y} = \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}}_{\text{target vector}} \tag{8}
\]
2. Calculate the vector of parameters
\[ \mathbf{w} = (X^\top A X)^{-1} X^\top A \mathbf{y} \tag{9} \]
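Equation (9) can be sketched in a few lines. The example below assumes NumPy; the function name `fit_weighted`, the toy data, and the choice of weights are illustrative assumptions. It shows that equal weights reduce to ordinary least squares, while a tiny weight makes the fit nearly ignore that sample.

```python
import numpy as np

def fit_weighted(X, y, a):
    """Weighted least squares: solve (X^T A X) w = X^T A y with A = diag(a)."""
    XtA = X.T * a                      # same as X.T @ np.diag(a), without building A
    return np.linalg.solve(XtA @ X, XtA @ y)

x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([2.0, 3.0, 3.0, 5.0])

w_equal = fit_weighted(X, y, np.ones(4))                        # ordinary least squares
w_down = fit_weighted(X, y, np.array([1.0, 1.0, 1.0, 1e-6]))    # last sample nearly ignored
print(w_equal, w_down)
```

Multiplying `X.T` by the weight vector scales column $n$ by $a_n$, which is why the diagonal matrix $A$ never has to be stored explicitly.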
Linear in What?
Linear Basis Function Models
Concept 3
A linear basis function model is a linear combination of fixed nonlinear functions of the input variables.
Some types of basis functions
• Polynomial
\[ \phi_j(x) = x^j \tag{12} \]
[Figure: polynomial basis functions plotted for x from −1 to 1]
Some types of basis functions (cont.)
• Gaussian
\[ \phi_j(x; \mu_j, s_j) = \exp\left( -\frac{(x - \mu_j)^2}{s_j^2} \right) \tag{13} \]
[Figure: Gaussian basis functions plotted for x from −1 to 1]
Some types of basis functions (cont.)
• Sigmoid
\[ \phi_j(x; \mu_j, s_j) = \frac{1}{1 + e^{-\frac{x - \mu_j}{s_j}}} \tag{14} \]
[Figure: sigmoid basis functions plotted for x from −1 to 1]
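The three families of basis functions in equations (12) to (14) translate directly into code. A minimal sketch, assuming NumPy; the function names and the parameter values used in the calls are illustrative choices.

```python
import numpy as np

def polynomial_basis(x, j):
    """Equation (12): phi_j(x) = x^j."""
    return x ** j

def gaussian_basis(x, mu, s):
    """Equation (13): exp(-(x - mu)^2 / s^2)."""
    return np.exp(-((x - mu) ** 2) / s ** 2)

def sigmoid_basis(x, mu, s):
    """Equation (14): 1 / (1 + exp(-(x - mu) / s))."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

x = np.linspace(-1.0, 1.0, 5)
print(polynomial_basis(x, 2))
print(gaussian_basis(x, mu=0.0, s=0.5))
print(sigmoid_basis(x, mu=0.0, s=0.1))
```

Here `mu` sets the location of a basis function and `s` its width (Gaussian) or slope (sigmoid).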
Problem 2
• Find the relationship between Years of Education and Income based on the given data.
[Figure: two scatter plots of Income (20 to 80) versus Years of Education (10 to 22)]
Problem Statement
\[ y \approx \hat{y} = h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}) \tag{15} \]
Problem Statement (cont.)
• Task T: predict $y$ from $\mathbf{x}$ by outputting $\hat{y} = h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})$.
• Performance measure P: the mean squared error $\mathrm{MSE}_{\text{train}}$ of the model on the train set $\mathcal{D}_{\text{train}}$, which includes $N$ samples $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_N, y_N)\}$.
• The learning goal: find the vector of parameters $\mathbf{w}$ such that
\[ \mathbf{w} = \arg\min_{\mathbf{w}} \mathrm{MSE}_{\text{train}} \]
Solving Problem
1. Construct the design matrix $\Phi$ and the target vector $\mathbf{y}$
\[
\Phi = \underbrace{\begin{bmatrix} \boldsymbol{\phi}(\mathbf{x}_1)^\top \\ \boldsymbol{\phi}(\mathbf{x}_2)^\top \\ \vdots \\ \boldsymbol{\phi}(\mathbf{x}_N)^\top \end{bmatrix}}_{\text{design matrix}}, \quad
\mathbf{y} = \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}}_{\text{target vector}} \tag{17}
\]
2. Calculate the vector of parameters, which follows from equation (6) with $\Phi$ in place of $X$:
\[ \mathbf{w} = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{y} \]
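A linear basis function model is fitted exactly like the simple linear model once the design matrix is built: by analogy with equation (6), the least-squares solution is $\mathbf{w} = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{y}$. The sketch below assumes NumPy and a polynomial basis; the helper `design_matrix`, the degree, and the noise-free quadratic targets are illustrative assumptions.

```python
import numpy as np

def design_matrix(x, degree):
    """Rows are phi(x_n)^T = (1, x_n, x_n^2, ..., x_n^degree)."""
    return np.vander(x, degree + 1, increasing=True)

x = np.linspace(1.0, 10.0, 10)
y = x ** 2                                   # hypothetical noise-free quadratic targets
Phi = design_matrix(x, degree=2)

# lstsq solves the same least-squares problem as the normal equation.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ w
print(w)
```

The model stays linear in the parameters $\mathbf{w}$ even though it is nonlinear in the input $x$, which is why the closed-form solution still applies.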
Programming Example
[Figure: plot of y (0 to 100) versus x (1 to 10)]
Word Example
2    3
3    3
4    5
Puzzle
[Figure: plot of y versus x, with x ranging from −1.0 to 1.0]
Classification
A real data set
• Some 16-by-16 pixel grayscale images from the MNIST database
Input representation
• Feature extraction: extract useful information, here intensity and symmetry: $\mathbf{x}^\top = [x_1 \;\; x_2]$
• Linear model: $\mathbf{w}^\top = [w_1 \;\; w_2]$
Illustration of Features
[Figure: scatter plot of the samples, with intensity $x_1$ on the horizontal axis and symmetry $x_2$ on the vertical axis]
The Problem with Categorical Data
Integer Encoding
• For categorical variables where no such ordinal relationship exists, the integer encoding is not enough.

id    color    color (integer encoding)
1     red      1
2     green    2
3     blue     3
4     red      1
…     …        …
One-Hot Encoding
• One-hot encoding ensures that the learning algorithm does not assume that higher numbers are more important.

id    color    color_red    color_green    color_blue
1     red      1            0              0
2     green    0            1              0
3     blue     0            0              1
4     red      1            0              0
…     …        …            …              …
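Both encodings can be reproduced in a few lines. This is a sketch assuming NumPy; the variable names and the fixed category order (red, green, blue, matching the tables above) are illustrative assumptions.

```python
import numpy as np

colors = ["red", "green", "blue", "red"]
categories = ["red", "green", "blue"]      # this order fixes the codes 1, 2, 3

# Integer encoding: each category is replaced by its 1-based index.
code_of = {c: i + 1 for i, c in enumerate(categories)}
integer_encoded = np.array([code_of[c] for c in colors])

# One-hot encoding: one 0/1 column per category
# (color_red, color_green, color_blue).
one_hot = np.eye(len(categories), dtype=int)[integer_encoded - 1]

print(integer_encoded)
print(one_hot)
```

Each one-hot row contains a single 1, so no category is numerically "larger" than another.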
Classifier Training
• Select the learning model for the classifier, e.g., the Perceptron.
• Train the classifier/model using a training set $\mathcal{D} = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)\}$.
[Figure: training loop. The training set holds features and targets. Get the next sample, feed it to the model, compute the cost value and update the model. If not satisfied, continue with the next sample; otherwise output the model.]
Logistic Regression
• Binary Classification
• Evaluation
• Multi-class Classification
A Third Linear Model
\[ s = \sum_{i=0}^{d} w_i x_i \]
\[ h(\mathbf{x}) = \operatorname{sign}(s) \qquad h(\mathbf{x}) = s \qquad h(\mathbf{x}) = \sigma(s) \]
The logistic function
• The formula
\[ \sigma(s) = \frac{1}{1 + e^{-s}} \tag{19} \]
• Properties
\[ \sigma(-s) = 1 - \sigma(s) \]
\[ \sigma'(s) = \sigma(s)\,(1 - \sigma(s)) \]
[Figure: plot of $\sigma(s)$ for s from −6 to 6]
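The two properties of the logistic function can be verified numerically. A minimal sketch, assuming NumPy; the grid of test points and the finite-difference step `h` are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(s):
    """Equation (19): 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

s = np.linspace(-6.0, 6.0, 13)

# Symmetry: sigma(-s) = 1 - sigma(s)
symmetry_gap = np.max(np.abs(sigmoid(-s) - (1.0 - sigmoid(s))))

# Derivative: central finite difference versus sigma(s) * (1 - sigma(s))
h = 1e-5
numeric = (sigmoid(s + h) - sigmoid(s - h)) / (2 * h)
analytic = sigmoid(s) * (1.0 - sigmoid(s))
derivative_gap = np.max(np.abs(numeric - analytic))

print(symmetry_gap, derivative_gap)
```

Both gaps should be at the level of floating-point and finite-difference error.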
Probability Interpretation
• $\sigma(s)$: probability of a heart attack
Problem Statement
\[
P(y \mid \mathbf{x}, \mathbf{w}) =
\begin{cases}
h_{\mathbf{w}}(\mathbf{x}) & \text{for } y = 1 \\
1 - h_{\mathbf{w}}(\mathbf{x}) & \text{for } y = 0
\end{cases} \tag{20}
\]
Error measure
\[ P(y \mid \mathbf{x}, \mathbf{w}) = z^{y} (1 - z)^{1 - y} \tag{21} \]
where $z = h_{\mathbf{w}}(\mathbf{x})$.
Error measure (cont.)
• Learning goal: maximizing the likelihood
\[
\begin{aligned}
& \text{Maximize } \prod_{n=1}^{N} P(y_n \mid \mathbf{x}_n, \mathbf{w}) \\
\Leftrightarrow\ & \text{Minimize } -\log \prod_{n=1}^{N} P(y_n \mid \mathbf{x}_n, \mathbf{w}) \\
\Leftrightarrow\ & \text{Minimize } -\sum_{n=1}^{N} \log P(y_n \mid \mathbf{x}_n, \mathbf{w})
\end{aligned} \tag{24}
\]
Learning Algorithm (Gradient Descent)
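The negative log-likelihood of logistic regression has no closed-form minimizer, so it is typically minimized iteratively. The following is a sketch of plain batch gradient descent, assuming NumPy; the step size `lr`, the iteration count, the averaging of the gradient, and the toy data are assumptions for illustration and are not taken from the slides. The gradient used is the standard result $\sum_n (\sigma(\mathbf{w}^\top \mathbf{x}_n) - y_n)\,\mathbf{x}_n$.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def nll(w, X, y):
    """Negative log-likelihood: -sum_n log P(y_n | x_n, w)."""
    z = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)   # guard against log(0)
    return -np.sum(y * np.log(z) + (1 - y) * np.log(1 - z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y)   # gradient of the NLL
        w -= lr * grad / len(y)             # averaged step (assumed step-size rule)
    return w

# Hypothetical 1-D toy data: label 1 tends to occur for larger x.
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

w = fit_logistic(X, y)
print(w, nll(w, X, y))
```

The toy labels are deliberately not perfectly separable, so the weights converge to finite values instead of growing without bound.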
Programming Example
[Figure: survived (0.0 to 1.0) versus age (0 to 80) in two panels, sex = male and sex = female]
Evaluation
y is ⊕ if $P(y \mid \mathbf{x}) \ge th$
• High threshold: high specificity, low sensitivity
• Low threshold: low specificity, high sensitivity
• We should select the threshold that best trades off the cost of false positives against the cost of false negatives.
ROC Curve
• The receiver operating characteristic (ROC) curve is a plot that shows the performance of a binary classifier as a function of its cut-off threshold.
• It essentially shows the true positive rate (sensitivity) against the false positive rate (1 − specificity) for various threshold values.
• The area under the curve (AUC) is an aggregated measure of performance.
[Figure: ROC curves, with sensitivity (0 to 1) on the vertical axis]
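The ROC curve and its AUC can be computed by sweeping the threshold over the predicted scores. This is a sketch assuming NumPy; the helper `roc_points`, the four labels and the four scores are illustrative assumptions, and the AUC is approximated with the trapezoidal rule.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays, one point per candidate threshold."""
    # Start above every score (nothing predicted positive), then lower the threshold.
    thresholds = np.concatenate([[np.inf], np.sort(np.unique(scores))[::-1]])
    pos, neg = np.sum(y_true == 1), np.sum(y_true == 0)
    tpr = np.array([np.sum((scores >= t) & (y_true == 1)) / pos for t in thresholds])
    fpr = np.array([np.sum((scores >= t) & (y_true == 0)) / neg for t in thresholds])
    return fpr, tpr

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])   # hypothetical predicted P(y = 1 | x)

fpr, tpr = roc_points(y_true, scores)
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)   # trapezoidal rule
print(fpr, tpr, auc)
```

Lowering the threshold moves along the curve from (0, 0) towards (1, 1): sensitivity rises while specificity falls, which is the trade-off described above.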
Multi-class classification problems
• Email foldering/tagging: Work (1), Friends (2), Family (3), Hobby (4)
• Medical diagnosis: Not ill, Cold, Flu
• Weather: Sunny, Cloudy, Rain, Snow
Visual of Binary vs Multi-class classification
Approaches
• One-vs-one
• Hierarchical
• One-vs-all
One-vs-all
• Class ○ (1): $h^{(1)}(\mathbf{x}) = P(y = 1 \mid \mathbf{x}, \mathbf{w}_1)$
• Class △ (2): $h^{(2)}(\mathbf{x}) = P(y = 2 \mid \mathbf{x}, \mathbf{w}_2)$
• Class ♦ (3): $h^{(3)}(\mathbf{x}) = P(y = 3 \mid \mathbf{x}, \mathbf{w}_3)$
One-vs-all (cont.)
Learning
• Train a logistic regression classifier $h^{(i)}(\mathbf{x})$ for each class $i$ to predict the probability that $y = i$.
Prediction
• On a new input $\mathbf{x}$, to make a prediction, pick the class $i$ that maximizes $h^{(i)}(\mathbf{x})$:
\[ \arg\max_{i} h^{(i)}(\mathbf{x}) \tag{29} \]
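The one-vs-all procedure can be sketched as follows, assuming NumPy. The training routine `fit_binary` (batch gradient descent with an assumed step size and iteration count), the synthetic three-cluster data, and the use of class labels 0, 1, 2 instead of 1, 2, 3 are all illustrative assumptions.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def fit_binary(X, y, lr=0.5, n_iter=500):
    """Logistic regression by gradient descent (assumed training routine)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def fit_one_vs_all(X, y, n_classes):
    # One weight vector per class: class i against all the others.
    return np.array([fit_binary(X, (y == i).astype(float)) for i in range(n_classes)])

def predict(W, X):
    return np.argmax(sigmoid(X @ W.T), axis=1)   # arg max_i h^(i)(x)

# Hypothetical, well-separated 2-D clusters for three classes (0, 1, 2).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [-4.0, -3.0], [4.0, -3.0]])
y = np.repeat(np.arange(3), 30)
X = np.column_stack([np.ones(90), centers[y] + rng.normal(scale=0.5, size=(90, 2))])

W = fit_one_vs_all(X, y, n_classes=3)
accuracy = np.mean(predict(W, X) == y)
print(accuracy)
```

Each row of `W` is an independent binary classifier, so the three outputs $h^{(i)}(\mathbf{x})$ need not sum to one; only their ranking matters for the prediction.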
Decision boundaries and decision regions
Softmax Regression
• Softmax Regression
• Cross Entropy vs. MSE
Softmax Regression
Concept 4
Softmax regression is a generalization of logistic regression that we can use for multi-class classification.
• In softmax regression, we replace the sigmoid function by the so-called softmax function $\phi(\cdot) = \{\phi_1, \dots, \phi_C\}$.
Score function
Concept 5
The score function $f$ maps the raw features to class scores.
\[ \mathbf{z} = f(\mathbf{x}; W, \mathbf{b}) = W\mathbf{x} + \mathbf{b} \tag{30} \]
Example for an input image with pixel values $\mathbf{x}$:
\[
\underbrace{\begin{bmatrix} 0.2 & -0.5 & 0.1 & 2.0 \\ 1.5 & 1.3 & 2.1 & 0.0 \\ 0 & 0.25 & 0.2 & -0.3 \end{bmatrix}}_{W}
\underbrace{\begin{bmatrix} 56 \\ 231 \\ 24 \\ 2 \end{bmatrix}}_{\mathbf{x}}
+ \underbrace{\begin{bmatrix} 1.1 \\ 3.2 \\ -1.2 \end{bmatrix}}_{\mathbf{b}}
= \begin{bmatrix} -96.8 \\ 437.9 \\ 60.75 \end{bmatrix}
\begin{matrix} \text{cat score} \\ \text{dog score} \\ \text{ship score} \end{matrix}
\]
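The score function (30) is a single matrix-vector product plus a bias. The sketch below, assuming NumPy, evaluates it with the weights, bias and pixel values of the example above; note that with these numbers the ship score evaluates to 60.75.

```python
import numpy as np

W = np.array([[0.2, -0.5, 0.1, 2.0],     # cat row
              [1.5, 1.3, 2.1, 0.0],      # dog row
              [0.0, 0.25, 0.2, -0.3]])   # ship row
b = np.array([1.1, 3.2, -1.2])
x = np.array([56.0, 231.0, 24.0, 2.0])   # pixel values of the input image

def score(x, W, b):
    """Equation (30): z = W x + b, one score per class."""
    return W @ x + b

z = score(x, W, b)
for name, value in zip(["cat", "dog", "ship"], z):
    print(f"{name} score: {value:.2f}")
```

Each row of `W` acts as a separate linear model for one class, so the class with the largest score is the one the model favors.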
Softmax function
Concept 6
The softmax function converts a score vector $\mathbf{z} = (z_1, \dots, z_C)$ to a discrete distribution vector $\mathbf{p} = (p_1, \dots, p_C)$:
\[ p_i = P(y = i \mid \mathbf{z}) = \phi_i(\mathbf{z}) = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \quad i \in \{1, \dots, C\} \tag{32} \]
where
\[ z_i = w_{i0} + w_{i1} x_1 + \dots + w_{iD} x_D = \mathbf{w}_i^\top \mathbf{x} \tag{33} \]
Softmax function (cont.)
[Figure: the softmax function maps score vectors to distribution vectors]
Softmax function (cont.)
[Figure: for the input image, softmax maps the class scores (cat −96.8, dog 437.9, ...) to probabilities (cat 0, dog 1, ...)]
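The softmax function (32) can be sketched as below, assuming NumPy. Subtracting the maximum score before exponentiating is a common numerical-stability step that is an addition here, not something stated on the slides; it does not change the result because the common factor cancels in the ratio. The score vector is the one computed from the score function example.

```python
import numpy as np

def softmax(z):
    """Map a score vector z to a probability vector p (equation 32)."""
    shifted = z - np.max(z)     # subtracting a constant leaves p unchanged
    e = np.exp(shifted)
    return e / np.sum(e)

z = np.array([-96.8, 437.9, 60.75])   # cat, dog, ship scores
p = softmax(z)
print(p)
```

Because the dog score is far larger than the others, almost all probability mass goes to that class.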
Problem Statement
\[ \arg\min_{W} E(\phi_W) = \arg\min_{W} \sum_{n=1}^{N} CE(\mathbf{p}_n, \mathbf{t}_n) \tag{34} \]
Cross Entropy
Concept 7
Cross-entropy (CE) is a measure of the difference between two probability distributions. The cross-entropy between a "true" distribution $\mathbf{t} = (t_1, \dots, t_C)$ and an estimated distribution $\mathbf{p} = (p_1, \dots, p_C)$ is defined as
\[ CE(\mathbf{p}, \mathbf{t}) = -\sum_{i=1}^{C} t_i \log p_i \tag{35} \]
Cross Entropy (cont.)
[Figure: bar chart of the probabilities (0 to 0.45) that p and t assign to the classes A to E]
Concept 8
Mean squared error (MSE) is a measure of the average of the squares of the errors.
\[ MSE(\mathbf{p}, \mathbf{t}) = \frac{1}{C} \sum_{i=1}^{C} (p_i - t_i)^2 \tag{36} \]
• $MSE \ge 0$
• $MSE(\mathbf{p}, \mathbf{t}) = MSE(\mathbf{t}, \mathbf{p})$
• $MSE = 0$ if $p_i = t_i, \ \forall i$
Cross Entropy vs. MSE
• Consider three "true" binary distributions p = (0.1, 0.9), (0.5, 0.5) and (0.8, 0.2).
[Figure: three panels, p = 0.1, p = 0.5 and p = 0.8. Each panel plots, as a function of q from 0 to 1, the cross entropy $-(p \log q + (1 - p) \log(1 - q))$ and the squared error $(q - p)^2$.]
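The comparison can be reproduced numerically for one of the three distributions. This is a sketch assuming NumPy; the grid of candidate estimates `q` and the choice of the first distribution are illustrative. For a binary distribution, the MSE of equation (36) reduces to $(q - p)^2$.

```python
import numpy as np

def cross_entropy(p, t):
    """Equation (35): CE(p, t) = -sum_i t_i log p_i (p estimated, t true)."""
    return -np.sum(t * np.log(p))

def mse(p, t):
    """Equation (36): mean of the squared differences."""
    return np.mean((p - t) ** 2)

t = np.array([0.1, 0.9])                    # one of the "true" distributions
q = np.linspace(0.01, 0.99, 99)             # candidate estimates of the first probability
ce = np.array([cross_entropy(np.array([qi, 1 - qi]), t) for qi in q])
se = np.array([mse(np.array([qi, 1 - qi]), t) for qi in q])

q_best_ce = q[np.argmin(ce)]
q_best_mse = q[np.argmin(se)]
print(q_best_ce, q_best_mse, ce.max(), se.max())
```

Both losses are minimized at the true probability, but the cross entropy grows much faster as the estimate approaches 0 or 1, so confident wrong predictions are penalized more heavily.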
Learning Algorithm
Example
[Figure: two panels; the right panel plots Cost (about 0.05 to 0.25) against Iterations (0 to 500)]
Capacity, Overfitting and Underfitting
• Model Capacity
• Model vs. Data
• Bias-Variance
• Tradeoff of Capacity
• Regularization
• Tradeoff of Regularization
Model Capacity
Concept 9
Capacity is model complexity.

Measures of the capacity of a model:
• VC dimension
• The number of parameters
• The norm of parameters
[Figure: the model space, ranging from simple to complex]
Model vs. Data
Concept 10
Data is divided into three sets: a training set, a validation set and a test set.

Concept 11
Models can be too limited: we cannot find a function that fits the data well. This is called underfitting.

Concept 12
Models can also be too rich: we model not just the data but also the noise. This is called overfitting.
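The three-way split of Concept 10 can be sketched as follows. The function name `split_indices`, the 60/20/20 proportions and the data set size of 200 are illustrative assumptions; the slides do not prescribe a split ratio.

```python
import numpy as np

def split_indices(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train / validation / test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]  # everything that is left
    return train, val, test

train, val, test = split_indices(200)
print(len(train), len(val), len(test))
```

The training set is used to fit the parameters, the validation set to choose hyper-parameters such as the model order, and the test set only for the final estimate of the generalization error.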
Model vs. Data (cont.)
• Given the data set $\mathcal{D} = \{(x_1, y_1), \ldots, (x_{10}, y_{10})\}$ shown in the following figure, find the best regression function for the data.
[Figure: the 10 data points, with x from 0 to 1 and y from −1 to 1]
Model vs. Data (cont.)
[Figure: two panels with x from 0 to 1 and y from −1 to 1; the right panel is labelled M = 1]
Model vs. Data (cont.)
(note that $\mathcal{H}_1 \subset \mathcal{H}_3$)
[Figure: two panels with x from 0 to 1 and y from −1 to 1; the right panel is labelled M = 3]
Model vs. Data (cont.)
(note that $\mathcal{H}_1 \subset \mathcal{H}_3 \subset \mathcal{H}_9$)
[Figure: two panels with x from 0 to 1 and y from −1 to 1; the right panel is labelled M = 9]
Model vs. Data (cont.)
[Figure: four panels of fits with x from 0 to 1 and y from −1 to 1; the bottom panels are labelled M = 3 and M = 9]
Model vs. Data (cont.)
        M = 1     M = 3          M = 9
w0       0.82      0.31           0.35
w1      -1.27      7.99         232.37
w2               -25.43       -5321.83
w3                17.37       48568.31
w4                          -231639.30
w5                           640042.26
w6                         -1061800.52
w7                          1042400.18
w8                          -557682.99
w9                           125201.43
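The growth of the coefficients with M can be reproduced with an ordinary least-squares polynomial fit. The sketch below assumes a data-generating process that the slides do not state: 10 evenly spaced inputs on [0, 1] with targets sin(2πx) plus Gaussian noise of standard deviation 0.3. The exact numbers therefore differ from the table, but the pattern is the same.

```python
import numpy as np

def fit_polynomial(x, y, M):
    """Least-squares fit of y ~ sum_j w_j x^j for j = 0..M; returns w with w0 first."""
    Phi = np.vander(x, M + 1, increasing=True)  # columns 1, x, x^2, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

# Assumed synthetic data (not the lecture's data set).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

max_abs = {}
for M in (1, 3, 9):
    w = fit_polynomial(x, y, M)
    max_abs[M] = float(np.max(np.abs(w)))
    print(f"M={M}: largest |w_j| = {max_abs[M]:.2f}")
```

With M = 9 and 10 points the polynomial passes through every noisy target, which it can only do by using very large coefficients of alternating sign.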
Model Performance
Figure 1: Graphs of the root-mean-square error $E_{RMS}$ evaluated on the training set and on an independent test set for various values of M
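Curves of this kind can be sketched by fitting each order M on a small training set and scoring it on fresh points. The setup below is an assumption (sin(2πx) targets with noise of standard deviation 0.3, 10 training and 100 test points); the errors are averaged over 200 random draws so that the trend does not depend on a single sample.

```python
import numpy as np

def rms_error(w, x, y):
    """Root-mean-square error of the polynomial with coefficients w (w0 first)."""
    pred = np.vander(x, w.size, increasing=True) @ w
    return np.sqrt(np.mean((pred - y) ** 2))

def average_rms(M, n_train=10, n_test=100, n_trials=200, noise=0.3, seed=0):
    """Average training and test RMS error of an order-M fit over many data sets."""
    rng = np.random.default_rng(seed)
    train_rms, test_rms = [], []
    for _ in range(n_trials):
        x_tr = np.linspace(0.0, 1.0, n_train)
        x_te = rng.uniform(0.0, 1.0, n_test)
        y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(scale=noise, size=n_train)
        y_te = np.sin(2 * np.pi * x_te) + rng.normal(scale=noise, size=n_test)
        Phi = np.vander(x_tr, M + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Phi, y_tr, rcond=None)
        train_rms.append(rms_error(w, x_tr, y_tr))
        test_rms.append(rms_error(w, x_te, y_te))
    return float(np.mean(train_rms)), float(np.mean(test_rms))

curves = {M: average_rms(M) for M in range(10)}
for M, (tr, te) in curves.items():
    print(f"M={M}: train RMS = {tr:.3f}, test RMS = {te:.3f}")
```

The training error can only go down as M grows because the hypothesis sets are nested, while the test error eventually rises again once the model starts fitting the noise.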
What happens if we increase N
Figure 2: Using the M = 9 polynomial for N = 15 data points (left plot) and N = 100 data points (right plot). We see that increasing the size of the data set reduces the over-fitting problem.
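The effect of N can be checked with a short experiment. This sketch keeps the model order at 9 and measures the error against the noise-free curve, again assuming sin(2πx) targets with noise of standard deviation 0.3, which the slides do not specify. The helper name `avg_rms_order9` is hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def avg_rms_order9(n_train, n_trials=200, noise=0.3, seed=0):
    """Average RMS distance between an order-9 fit and the true curve sin(2*pi*x)."""
    rng = np.random.default_rng(seed)
    x_eval = np.linspace(0.0, 1.0, 200)
    f_eval = np.sin(2 * np.pi * x_eval)
    errors = []
    for _ in range(n_trials):
        x_tr = np.linspace(0.0, 1.0, n_train)
        y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(scale=noise, size=n_train)
        w = P.polyfit(x_tr, y_tr, 9)   # coefficients ordered from w0 to w9
        pred = P.polyval(x_eval, w)
        errors.append(np.sqrt(np.mean((pred - f_eval) ** 2)))
    return float(np.mean(errors))

rms_15 = avg_rms_order9(15)
rms_100 = avg_rms_order9(100)
print(f"N=15: {rms_15:.3f}   N=100: {rms_100:.3f}")
```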
Errors in Learning Model
$$\text{Expected error} = \text{Bias} + \text{Variance} + \text{Irreducible Error} \qquad (43)$$
Errors in Learning Model (cont.)
Concept 13
Given a learning model $\langle \mathcal{H}, \mathcal{A} \rangle$, we define the “average” hypothesis $\bar{g}(x)$

$$\bar{g}(x) = \mathbb{E}_{\mathcal{D}}\left[ g_{\mathcal{D}}(x) \right] \qquad (44)$$

where $g_{\mathcal{D}}(x)$ is the “best” hypothesis given the data set $\mathcal{D}$

• Bias of learning model

$$\text{Bias} = \mathbb{E}_x\left[ (\bar{g}(x) - f(x))^2 \right] \qquad (45)$$
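For reference, the matching variance term and the full decomposition behind Eq. (43) are usually written as below. This is the standard textbook form, stated under assumptions that are not spelled out on the slide: squared-error loss, $\bar{g}(x)$ the average of $g_{\mathcal{D}}(x)$ over data sets, and noisy targets $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}[\varepsilon] = \sigma^2$ independent of $\mathcal{D}$.

```latex
\begin{align*}
\text{Variance}
  &= \mathbb{E}_x\Big[ \mathbb{E}_{\mathcal{D}}\big[ (g_{\mathcal{D}}(x) - \bar{g}(x))^2 \big] \Big] \\[4pt]
\mathbb{E}_{\mathcal{D}}\Big[ \mathbb{E}_{x,y}\big[ (g_{\mathcal{D}}(x) - y)^2 \big] \Big]
  &= \underbrace{\mathbb{E}_x\big[ (\bar{g}(x) - f(x))^2 \big]}_{\text{Bias}}
   + \underbrace{\mathbb{E}_x\Big[ \mathbb{E}_{\mathcal{D}}\big[ (g_{\mathcal{D}}(x) - \bar{g}(x))^2 \big] \Big]}_{\text{Variance}}
   + \underbrace{\sigma^2}_{\text{Irreducible Error}}
\end{align*}
```

The cross terms vanish because $\mathbb{E}_{\mathcal{D}}[g_{\mathcal{D}}(x) - \bar{g}(x)] = 0$ and the noise has zero mean.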
Errors in Learning Model (cont.)
[Figure: a truth function with the bias and the variance of the learned hypotheses marked]
• Given many independent data sets $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_K$, we can estimate $\bar{g}(x)$ by

$$\bar{g}(x) \approx \frac{1}{K} \sum_{i=1}^{K} g_{\mathcal{D}_i}(x) \qquad (47)$$
Example: two learning models
• We generate 100 data sets $\{\mathcal{D}_i\}$, $i = 1, \ldots, 100$, each containing $N = 2$ data points, independently from the sinusoidal curve $f(x) = \sin(\pi x)$. For each data set $\mathcal{D}_i$, we fit the data using one of two models
  • $\mathcal{H}_0$: set of all lines of the form $h(x) = b$
  • $\mathcal{H}_1$: set of all lines of the form $h(x) = ax + b$
  • Note that $\mathcal{H}_0 \subset \mathcal{H}_1$
Example: two learning models (cont.)
• Given a data set $\mathcal{D}_i = \{(x_1, y_1), (x_2, y_2)\}$:
  • For $\mathcal{H}_0$, we choose the constant hypothesis that best fits the data (the horizontal line at the midpoint, $b = (y_1 + y_2)/2$).
  • For $\mathcal{H}_1$, we choose the line that passes through the two data points $(x_1, y_1)$ and $(x_2, y_2)$.
Example: two learning models (cont.)
• We repeat this process with the 100 data sets $\{\mathcal{D}_i\}$, $i = 1, \ldots, 100$.
Example: two learning models (cont.)
[Figure: two panels. Left ($\mathcal{H}_0$): bias = 0.50, var = 0.25. Right ($\mathcal{H}_1$): bias = 0.21, var = 1.69]
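The experiment can be simulated directly, using the average over data sets as the estimate of the average hypothesis. The sketch below makes two assumptions that the slides leave open: the inputs are drawn uniformly from [−1, 1], and 10,000 data sets are used instead of 100 so that the Monte Carlo estimates are stable. Under these assumptions the printed values should land close to the ones reported in the figure.

```python
import numpy as np

def simulate(n_datasets=10000, seed=0):
    """Estimate bias and variance of H0 (constant) and H1 (line) for f(x) = sin(pi x)."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(-1.0, 1.0, 201)       # points used to average over x
    f = np.sin(np.pi * x_grid)

    x = rng.uniform(-1.0, 1.0, size=(n_datasets, 2))  # two inputs per data set
    y = np.sin(np.pi * x)

    # H0: horizontal line at the midpoint b = (y1 + y2) / 2.
    b0 = y.mean(axis=1)
    g0 = np.repeat(b0[:, None], x_grid.size, axis=1)

    # H1: the line through (x1, y1) and (x2, y2).
    a1 = (y[:, 1] - y[:, 0]) / (x[:, 1] - x[:, 0])
    b1 = y[:, 0] - a1 * x[:, 0]
    g1 = a1[:, None] * x_grid[None, :] + b1[:, None]

    out = {}
    for name, g in (("H0", g0), ("H1", g1)):
        g_bar = g.mean(axis=0)                  # average hypothesis, as in Eq. (47)
        bias = float(np.mean((g_bar - f) ** 2))
        var = float(np.mean((g - g_bar) ** 2))
        out[name] = (bias, var)
    return out

estimates = simulate()
for name, (bias, var) in estimates.items():
    print(f"{name}: bias = {bias:.2f}, var = {var:.2f}")
```

The richer model $\mathcal{H}_1$ has the smaller bias, but with only two points per data set its variance is so large that the simpler $\mathcal{H}_0$ has the smaller total error.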
Generalization and Capacity
The criteria determining how well a machine learning model will perform:
1. Make the training error small.
2. Make the gap between training and test (generalization) error small.
[Figure: Error against Capacity, marking the generalization gap and the optimal capacity]
Regularization
Concept 14
Regularization is any modification we make to a learning model that is intended to reduce its generalization error but not its training error.
[Figure: diagram with the labels performance measure, algorithm, data and model]
Addressing
High bias                    High variance
Obtain more features         Decrease number of features
Decrease regularization λ    Increase regularization λ
Extend model                 Obtain more data
Train longer                 Stop early
New model architecture       New model architecture
Regularization for Linear Regression
• We minimize the training error plus a penalty on the weights, $\mathrm{MSE}_{\mathrm{train}} + \lambda \frac{1}{N} w^\top w$, where $\lambda$ is the regularization coefficient (hyper-parameter) that controls the relative importance of the data-dependent error $\mathrm{MSE}_{\mathrm{train}}$ and the regularization term $\lambda \frac{1}{N} w^\top w$.
• Solving for $w$, we obtain

$$w = (X^\top X + \lambda I)^{-1} X^\top y \qquad (50)$$
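Eq. (50) translates into a single linear solve. The sketch below assumes $\mathrm{MSE}_{\mathrm{train}} = \frac{1}{N}\lVert Xw - y \rVert^2$, so that the factor $1/N$ cancels when the gradient is set to zero. The function name `ridge_fit` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form solution w = (X^T X + lam I)^{-1} X^T y of Eq. (50)."""
    d = X.shape[1]
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Hypothetical data: 50 samples, 5 features, a known weight vector plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(scale=0.1, size=50)

norms = {}
for lam in (0.0, 1.0, 100.0):
    w = ridge_fit(X, y, lam)
    norms[lam] = float(np.linalg.norm(w))
    print(f"lambda={lam}: ||w|| = {norms[lam]:.3f}")
```

With λ = 0 the solution is the ordinary least-squares fit; larger values of λ shrink the norm of the weights toward zero.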
Example: one learning model
• We generate 100 data sets $\{\mathcal{D}_i\}$, $i = 1, \ldots, 100$, each containing $N = 25$ data points, independently from the sinusoidal curve $f(x) = \sin(2\pi x)$. For each data set $\mathcal{D}_i$, we fit a model with 24 Gaussian basis functions by minimizing the regularized error function.
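A rough version of this experiment fits each data set with the regularized closed-form solution and then measures how the fits scatter. Several details below are assumptions rather than the lecture's settings: the basis centres are evenly spaced on [0, 1] with width 0.1, a constant column is added to the design matrix, inputs are uniform on [0, 1], and the targets carry Gaussian noise of standard deviation 0.3.

```python
import numpy as np

def gaussian_design(x, centers, s):
    """Design matrix: a constant column followed by one Gaussian bump per centre."""
    bumps = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * s ** 2))
    return np.hstack([np.ones((x.size, 1)), bumps])

def bias_variance(lam, n_datasets=100, n_points=25, noise=0.3, seed=0):
    """Estimate (bias)^2 and variance of the regularized fit for one value of lambda."""
    rng = np.random.default_rng(seed)
    centers = np.linspace(0.0, 1.0, 24)   # 24 Gaussian basis functions (assumed layout)
    s = 0.1                               # assumed basis width
    x_grid = np.linspace(0.0, 1.0, 100)
    Phi_grid = gaussian_design(x_grid, centers, s)
    f = np.sin(2 * np.pi * x_grid)
    preds = np.empty((n_datasets, x_grid.size))
    for i in range(n_datasets):
        x = rng.uniform(0.0, 1.0, n_points)
        y = np.sin(2 * np.pi * x) + rng.normal(scale=noise, size=n_points)
        Phi = gaussian_design(x, centers, s)
        A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
        w = np.linalg.solve(A, Phi.T @ y)  # regularized least squares
        preds[i] = Phi_grid @ w
    g_bar = preds.mean(axis=0)             # average fit over the data sets
    bias2 = float(np.mean((g_bar - f) ** 2))
    variance = float(np.mean((preds - g_bar) ** 2))
    return bias2, variance

results = {}
for ln_lam in (-3.0, 0.0, 2.0):
    results[ln_lam] = bias_variance(np.exp(ln_lam))
    print(f"ln(lambda)={ln_lam}: bias^2 = {results[ln_lam][0]:.4f}, "
          f"variance = {results[ln_lam][1]:.4f}")
```

A small λ lets every fit follow its own noisy data (low bias, high variance), while a large λ pulls all fits toward the same over-smoothed curve (high bias, low variance).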
Example: one learning model (cont.)
• Illustration of the dependence of bias and variance on the model regularization coefficient
[Figure: six panels, each with vertical axis y]
Example: one learning model (cont.)
• Summary of the dependence of bias and variance on the model regularization coefficient
[Figure: $(\text{bias})^2$, variance, $(\text{bias})^2 + \text{variance}$ and test error (0 to 0.15) against $\ln \lambda$ (−3 to 2)]
References