Lect03 Linear Model ML

LINEAR MODEL

Bùi Tiến Lên

2023
Contents

1. Linear Regression

2. Classification

3. Logistic Regression

4. Softmax Regression

5. Capacity, Overfitting and Underfitting

Notation

symbol                                                        meaning
$a, b, c, N, \ldots$                                          scalar number
$\mathbf{w}, \mathbf{v}, \mathbf{x}, \mathbf{y}, \ldots$      column vector
$\mathbf{X}, \mathbf{Y}, \ldots$                              matrix
$\mathbb{R}$                                                  set of real numbers
$\mathbb{Z}$                                                  set of integer numbers
$\mathbb{N}$                                                  set of natural numbers
$\mathbb{R}^D$                                                set of vectors
$\mathcal{X}, \mathcal{Y}, \ldots$                            set
$\mathcal{A}$                                                 algorithm

operator                                                      meaning
$\mathbf{w}^\top$                                             transpose
$\mathbf{X}\mathbf{Y}$                                        matrix multiplication
$\mathbf{X}^{-1}$                                             inverse

Learning diagram

[Figure: learning diagram]

Linear Regression
• Simple Linear Model
• Weighted Linear Model
• Linear Basis Function Model
Problem 1

• Consider the Advertising data set $\mathcal{D}_{\text{train}}$, which consists of the sales of a product in 200 different markets, along with the TV advertising budget for the product in each of those markets. Find the relationship between TV (input) and sales (output).

[Figure: two scatter plots of Sales versus TV]

Linear Regression Model

Concept 1
A linear regression is a model that assumes a linear relationship between inputs and the output.

Problem Statement

• The requirement is to build a system that takes a vector $\mathbf{x} \in \mathbb{R}^{D+1}$ as input and predicts the value of a scalar $y \in \mathbb{R}$ as its output
• The hypothesis set $\mathcal{H}$

\[
y \approx \hat{y} = h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} \tag{1}
\]

where $\hat{y}$ is the value that our model (function) predicts for $y$, and $\mathbf{w} \in \mathbb{R}^{D+1}$ is a vector of parameters of the model

Problem Statement (cont.)

• Task T: predict $y$ from $\mathbf{x}$ by outputting $\hat{y} = h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^\top \mathbf{x}$
• The train set $\mathcal{D}_{\text{train}}$, denoted $(\mathbf{X}, \mathbf{y})$, includes $N$ samples $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)\}$. Construct the matrix $\mathbf{X}$ and the vectors $\mathbf{y}$ and $\hat{\mathbf{y}}$

\[
\underbrace{\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_N^\top \end{bmatrix}}_{\text{input data matrix}}, \quad
\underbrace{\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}}_{\text{target vector}}, \quad
\underbrace{\hat{\mathbf{y}} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_N \end{bmatrix}}_{\text{output vector}}
\tag{2}
\]

Problem Statement (cont.)

• Performance measure P:

Concept 2
The mean squared error $\text{MSE}_{\text{train}}$ of the model on the train set $\mathcal{D}_{\text{train}}$

\[
\text{MSE}_{\text{train}} = \frac{1}{N} \lVert \hat{\mathbf{y}} - \mathbf{y} \rVert^2 = \frac{1}{N} \sum_{n=1}^{N} (\hat{y}_n - y_n)^2 \tag{3}
\]

Problem Statement (cont.)

• The learning goal: find the parameter vector $\mathbf{w}$ such that

\[
\mathbf{w} = \arg\min_{\mathbf{w}} \left( \text{MSE}_{\text{train}} \right) \tag{4}
\]

Solving Problem

Solution
• Compute the gradient of $\text{MSE}_{\text{train}}$

\[
\begin{aligned}
\nabla_{\mathbf{w}} (\text{MSE}_{\text{train}}) &= \frac{1}{N} \nabla_{\mathbf{w}} \left( \mathbf{w}^\top \mathbf{X}^\top \mathbf{X} \mathbf{w} - \mathbf{w}^\top \mathbf{X}^\top \mathbf{y} - \mathbf{y}^\top \mathbf{X} \mathbf{w} + \mathbf{y}^\top \mathbf{y} \right) \\
&= \frac{2}{N} \left( \mathbf{X}^\top \mathbf{X} \mathbf{w} - \mathbf{X}^\top \mathbf{y} \right)
\end{aligned}
\tag{5}
\]

• If $\text{MSE}_{\text{train}}$ reaches its minimum value, then $\nabla_{\mathbf{w}} (\text{MSE}_{\text{train}}) = 0$. Assuming $\mathbf{X}^\top \mathbf{X}$ is invertible,

\[
\begin{aligned}
\mathbf{X}^\top \mathbf{X} \mathbf{w} - \mathbf{X}^\top \mathbf{y} &= 0 \\
\mathbf{X}^\top \mathbf{X} \mathbf{w} &= \mathbf{X}^\top \mathbf{y} \\
\mathbf{w} &= (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
\end{aligned}
\tag{6}
\]

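The closed-form solution (6) can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original slides: the toy data are hypothetical (generated from y = 1 + 2x without noise), and the linear system is solved with `np.linalg.solve` rather than by forming the inverse explicitly, which gives the same result when the matrix is invertible.

```python
import numpy as np

# Hypothetical toy data generated from y = 1 + 2x (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Input data matrix X: one row per sample, with a leading 1 for the bias w0.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X^T X) w = X^T y, equivalent to equation (6).
w = np.linalg.solve(X.T @ X, X.T @ y)

# Mean squared error on the train set, as in equation (3).
mse_train = np.mean((X @ w - y) ** 2)

print(w)          # approximately [1. 2.]
print(mse_train)  # approximately 0
```

Because the toy targets lie exactly on a line, the recovered weights match the generating coefficients and the training error is zero up to rounding.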
Programming Example

• Use seaborn to load the tips dataset and find the linear relationship between total_bill and tip

    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.set_style("darkgrid")

    # Load the example tips dataset (fetched from seaborn's online data repository)
    tips = sns.load_dataset("tips")

    # Scatter plot with a fitted regression line, without a confidence band
    sns.regplot(x="total_bill", y="tip", data=tips, ci=None,
                line_kws={'color': 'red'})

    plt.show()

Programming Example (cont.)

[Figure: scatter plot of tip versus total_bill with the fitted regression line]

Word Example

1. Find the linear regression function $y = f(x) = w_0 + w_1 x$ given the following data set $\mathcal{D}$

    input x    target y
    1          2
    2          3
    3          3
    4          5

2. Find the linear regression function $y = f(\mathbf{x}) = f(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2$ given the following data set $\mathcal{D}$

    input x    target y
    (1, 1)     1
    (2, 3)     3
    (3, 4)     4
    (4, 3)     5

Discussion

• D is a large number
• Online learning
• Limitations of the model (hypothesis set)

Weighted Linear Model

• In some cases the observations may be weighted; for example, they may not be equally reliable. In this case, we find the parameter vector $\mathbf{w}$ that minimizes the weighted sum of squared errors

\[
E_{\text{train}} = \sum_{n=1}^{N} a_n (\hat{y}_n - y_n)^2 \tag{7}
\]

Solving Problem

1. Construct the matrices $\mathbf{X}$, $\mathbf{A}$ and the vector $\mathbf{y}$

\[
\underbrace{\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_N^\top \end{bmatrix}}_{\text{input data matrix}}, \quad
\underbrace{\mathbf{A} = \begin{bmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_N \end{bmatrix}}_{\text{weight matrix}}, \quad
\underbrace{\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}}_{\text{target vector}}
\tag{8}
\]

2. Calculate the vector of parameters

\[
\mathbf{w} = (\mathbf{X}^\top \mathbf{A} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{A} \mathbf{y} \tag{9}
\]

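As a rough illustration of equation (9), the sketch below fits a weighted linear model with NumPy. The data and the weights are hypothetical: the last sample is an outlier that receives weight 0, so the fit is determined by the three remaining points, which lie on y = 1 + 2x.

```python
import numpy as np

# Hypothetical data: the last target is an unreliable outlier.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 20.0])
a = np.array([1.0, 1.0, 1.0, 0.0])  # per-sample weights a_n; 0 ignores a sample

X = np.column_stack([np.ones_like(x), x])  # input data matrix with bias column
A = np.diag(a)                             # diagonal weight matrix

# Solve (X^T A X) w = X^T A y, equivalent to equation (9).
w = np.linalg.solve(X.T @ A @ X, X.T @ A @ y)

# Weighted sum of squared errors, as in equation (7).
e_train = np.sum(a * (X @ w - y) ** 2)

print(w)        # approximately [1. 2.]
print(e_train)  # approximately 0
```

Setting all weights to 1 reduces this to the ordinary least-squares solution of the simple linear model.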
Linear in What?

• Linearity in the weights

\[
h_{\mathbf{w}}(\mathbf{x}) = w_0 + w_1 x_1 + \ldots + w_D x_D \tag{10}
\]

Linear Basis Function Models

Concept 3
A linear basis function model is a linear combination of fixed nonlinear functions of the input variables

\[
h_{\mathbf{w}}(\mathbf{x}) = w_0 \phi_0(\mathbf{x}) + w_1 \phi_1(\mathbf{x}) + \ldots + w_M \phi_M(\mathbf{x}) \tag{11}
\]

where $\phi_i(\mathbf{x})$ are basis functions

Some types of basis functions

• Polynomial

\[
\phi_j(x) = x^j \tag{12}
\]

[Figure: polynomial basis functions plotted for x from −1 to 1]

Some types of basis functions (cont.)

• Gaussian

\[
\phi_j(x; \mu_j, s_j) = \exp\left( -\frac{(x - \mu_j)^2}{s_j^2} \right) \tag{13}
\]

[Figure: Gaussian basis functions plotted for x from −1 to 1]

Some types of basis functions (cont.)

• Sigmoid

\[
\phi_j(x; \mu_j, s_j) = \frac{1}{1 + e^{-\frac{x - \mu_j}{s_j}}} \tag{14}
\]

[Figure: sigmoid basis functions plotted for x from −1 to 1]

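The three families of basis functions in equations (12) to (14) can be written directly as small NumPy functions. This is a sketch for illustration only; the function names and the sample parameter values (centre 0, scale 0.5) are assumptions, not taken from the slides.

```python
import numpy as np

def polynomial_basis(x, j):
    """Polynomial basis, equation (12): x raised to the power j."""
    return x ** j

def gaussian_basis(x, mu, s):
    """Gaussian basis, equation (13), with centre mu and scale s."""
    return np.exp(-((x - mu) ** 2) / s ** 2)

def sigmoid_basis(x, mu, s):
    """Sigmoid basis, equation (14), with centre mu and scale s."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

# Evaluate each basis function on a small grid over [-1, 1].
x = np.linspace(-1.0, 1.0, 5)
print(polynomial_basis(x, 2))
print(gaussian_basis(x, mu=0.0, s=0.5))
print(sigmoid_basis(x, mu=0.0, s=0.5))
```

At the centre x = mu, the Gaussian basis evaluates to 1 and the sigmoid basis to 0.5, which is a quick sanity check on the formulas.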
Problem 2

• Find the relationship between Years of Education and Income based on the given data

[Figure: two scatter plots of Income versus Years of Education]

Problem Statement

• The requirement is to build a system that takes a vector $\mathbf{x} \in \mathbb{R}^{D}$ as input and predicts the value of a scalar $y \in \mathbb{R}$ as its output
• The hypothesis set $\mathcal{H}$

\[
y \approx \hat{y} = h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}) \tag{15}
\]

where $\hat{y}$ is the value that our model (function) predicts for $y$, $\mathbf{w} \in \mathbb{R}^{M+1}$ is a vector of parameters of the model, and $\boldsymbol{\phi}$ is a set of $M + 1$ basis functions

\[
\boldsymbol{\phi}(\mathbf{x}) = \begin{bmatrix} \phi_0(\mathbf{x}) \\ \phi_1(\mathbf{x}) \\ \vdots \\ \phi_M(\mathbf{x}) \end{bmatrix} \tag{16}
\]

Problem Statement (cont.)

• Task T: predict $y$ from $\mathbf{x}$ by outputting $\hat{y} = h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})$
• Performance measure P: the mean squared error $\text{MSE}_{\text{train}}$ of the model on the train set $\mathcal{D}_{\text{train}}$, which includes $N$ samples $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)\}$
• The learning goal: find the parameter vector $\mathbf{w}$ such that

\[
\mathbf{w} = \arg\min_{\mathbf{w}} \left( \text{MSE}_{\text{train}} \right)
\]

Solving Problem

1. Construct the matrix $\boldsymbol{\Phi}$ and the vector $\mathbf{y}$

\[
\underbrace{\boldsymbol{\Phi} = \begin{bmatrix} \boldsymbol{\phi}(\mathbf{x}_1)^\top \\ \boldsymbol{\phi}(\mathbf{x}_2)^\top \\ \vdots \\ \boldsymbol{\phi}(\mathbf{x}_N)^\top \end{bmatrix}}_{\text{design matrix}}, \quad
\underbrace{\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}}_{\text{target vector}}
\tag{17}
\]

2. Calculate the vector of parameters

\[
\mathbf{w} = (\boldsymbol{\Phi}^\top \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^\top \mathbf{y} \tag{18}
\]

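The two steps above can be sketched with a polynomial basis. The example below is an illustration under stated assumptions: the helper name `design_matrix` and the toy data (generated from y = 1 − 2x + 3x² without noise) are hypothetical, and `np.vander` is used to build the columns x⁰, x¹, …, x^M.

```python
import numpy as np

def design_matrix(x, M):
    """Design matrix whose n-th row is phi(x_n)^T with phi_j(x) = x**j, j = 0..M."""
    return np.vander(x, M + 1, increasing=True)

# Hypothetical toy data generated from y = 1 - 2x + 3x^2 (no noise).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1.0 - 2.0 * x + 3.0 * x ** 2

# Step 1: construct the design matrix (N rows, M + 1 columns).
Phi = design_matrix(x, M=2)

# Step 2: solve (Phi^T Phi) w = Phi^T y, equivalent to equation (18).
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

print(Phi.shape)  # (5, 3)
print(w)          # approximately [ 1. -2.  3.]
```

The model is nonlinear in x but still linear in the weights, which is why the same closed-form solution applies once the inputs are replaced by their basis-function values.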
Programming Example

    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.set_style("darkgrid")

    x = np.array([1, 2, 3, 4, 5, 8, 10])
    y = np.array([1.1, 3.8, 8.5, 16, 24, 65, 99.2])

    # Fit and plot a second-order polynomial; recent seaborn versions
    # require x and y to be passed as keyword arguments
    sns.regplot(x=x, y=y, order=2, ci=None, line_kws={'color': 'red'})
    plt.xlabel('x')
    plt.ylabel('y')

    plt.show()

Programming Example (cont.)

[Figure: scatter plot of y versus x with the fitted second-order polynomial curve]

Word Example

• Find a polynomial regression function $y = f(x) = w_0 + w_1 x + w_2 x^2$ given the data set $\mathcal{D}$

    input x    target y
    1          2
    2          3
    3          3
    4          5

Puzzle

• What basis functions?

[Figure: plot of y versus x]

Classification
A real data set

• Some 16-by-16 pixel grayscale images from the MNIST database

[Figure: sample digit images]

Input representation

Input representation or feature extraction
• “raw” input
  pixels: $\mathbf{x}^\top = \begin{bmatrix} x_0 & x_1 & \ldots & x_{256} \end{bmatrix}$
  linear model: $\mathbf{w}^\top = \begin{bmatrix} w_0 & w_1 & \ldots & w_{256} \end{bmatrix}$
• Feature extraction: extract useful information
  intensity and symmetry: $\mathbf{x}^\top = \begin{bmatrix} x_1 & x_2 \end{bmatrix}$
  linear model: $\mathbf{w}^\top = \begin{bmatrix} w_1 & w_2 \end{bmatrix}$

Illustration of Features

[Figure: scatter plot of symmetry ($x_2$) versus intensity ($x_1$)]

The Problem with Categorical Data

• Some algorithms can work with categorical data directly.
  • For example, a decision tree can be learned directly from categorical data with no data transform required.
• Many machine learning algorithms cannot operate on label data directly.
  • They require all input variables and output variables to be numeric.
• There are two common types of conversion: integer encoding and one-hot encoding.

Integer Encoding

• For categorical variables where no ordinal relationship exists, integer encoding is not enough.

    Before                 After integer encoding
    id    color            id    color
    1     red              1     1
    2     green            2     2
    3     blue             3     3
    4     red              4     1
    …     …                …     …

One-Hot Encoding

• One-hot encoding ensures that a machine learning model does not assume that higher numbers are more important.

    Before                 After one-hot encoding
    id    color            id    color_red    color_green    color_blue
    1     red              1     1            0              0
    2     green            2     0            1              0
    3     blue             3     0            0              1
    4     red              4     1            0              0
    …     …                …     …            …              …

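Both conversions can be reproduced for the colour column of the tables above with plain Python. This is a minimal sketch: the variable names are illustrative, and the category order (red, green, blue) is chosen to match the tables.

```python
# Colour column from the example tables (ids 1 to 4).
colors = ["red", "green", "blue", "red"]

# Fixed category order; integer codes start at 1 to match the table.
categories = ["red", "green", "blue"]
to_int = {category: index + 1 for index, category in enumerate(categories)}

# Integer encoding: each category is replaced by its integer code.
integer_encoded = [to_int[c] for c in colors]

# One-hot encoding: one 0/1 column per category (red, green, blue).
one_hot = [[1 if c == category else 0 for category in categories] for c in colors]

print(integer_encoded)  # [1, 2, 3, 1]
print(one_hot)          # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

The integer codes imply an order (red < green < blue) that does not exist in the data, whereas the one-hot columns treat every category symmetrically.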
Classifier Training

• Select the learning model for the classifier, e.g., Perceptron
• Train the classifier/model using a training set $\mathcal{D} = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\}$

[Figure: training loop flowchart: get the next sample (features, target) from the training set, feed it to the model, compute the cost value, update the model, and repeat until satisfied]

Logistic Regression
• Binary Classification
• Evaluation
• Multi-class Classification
A Third Linear Model

\[
s = \sum_{i=0}^{d} w_i x_i
\]

    linear classification    $h(\mathbf{x}) = \operatorname{sign}(s)$
    linear regression        $h(\mathbf{x}) = s$
    logistic regression      $h(\mathbf{x}) = \sigma(s)$

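The three models share the same signal s and differ only in the function applied to it. The short sketch below illustrates this with hypothetical weights and a hypothetical input (with x₀ = 1 as the bias coordinate); none of the numbers come from the slides.

```python
import numpy as np

# Hypothetical weights and input; x[0] = 1 is the bias coordinate.
w = np.array([-1.0, 2.0])
x = np.array([1.0, 0.75])

# Shared signal s = sum_i w_i x_i.
s = w @ x

h_classification = np.sign(s)          # linear classification: sign(s)
h_regression = s                       # linear regression: s itself
h_logistic = 1.0 / (1.0 + np.exp(-s))  # logistic regression: sigma(s)

print(s)                 # 0.5
print(h_classification)  # 1.0
print(h_logistic)        # about 0.622
```

The classifier returns a hard decision, the regressor returns an unbounded real value, and logistic regression returns a value strictly between 0 and 1.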
The logistic function

• The formula

\[
\sigma(s) = \frac{1}{1 + e^{-s}} \tag{19}
\]

• The logistic function converts a score to a probability
• Properties

\[
\sigma(-s) = 1 - \sigma(s)
\]

\[
\sigma'(s) = \sigma(s)(1 - \sigma(s))
\]

[Figure: plot of $\sigma(s)$ for $s$ from −6 to 6]

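The two properties of the logistic function can be checked numerically. The sketch below uses only the standard library; the test point s = 1.5 and the step size h are arbitrary choices, and the derivative is compared against a central finite difference.

```python
import math

def sigmoid(s):
    """Logistic function, equation (19)."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(s):
    """Closed-form derivative: sigma(s) * (1 - sigma(s))."""
    return sigmoid(s) * (1.0 - sigmoid(s))

s = 1.5  # arbitrary test point

# Property 1: sigma(-s) = 1 - sigma(s); the gap should be close to zero.
symmetry_gap = sigmoid(-s) - (1.0 - sigmoid(s))

# Property 2: compare the closed-form derivative with a central difference.
h = 1e-6
numeric_derivative = (sigmoid(s + h) - sigmoid(s - h)) / (2.0 * h)
derivative_gap = numeric_derivative - sigmoid_derivative(s)

print(abs(symmetry_gap) < 1e-12)   # True
print(abs(derivative_gap) < 1e-8)  # True
```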
Probability Interpretation

• $h(\mathbf{x}) = \sigma(s)$ can be interpreted as a probability
• For example, prediction of heart attacks
  • Input $\mathbf{x}$: cholesterol level, age, weight, etc.
  • The signal $s = \mathbf{w}^\top \mathbf{x}$: risk score
  • $\sigma(s)$: probability of a heart attack

[Figure: mapping from score to probability of heart attack]

Problem Statement

• The target function $f$ is the probability distribution

\[
f : \mathbb{R}^D \to [0, 1]
\]

• Hypothesis set $h_{\mathbf{w}}(\mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x})$ and the conditional probability

\[
P(y \mid \mathbf{x}, \mathbf{w}) =
\begin{cases}
h_{\mathbf{w}}(\mathbf{x}) & \text{for } y = 1 \\
1 - h_{\mathbf{w}}(\mathbf{x}) & \text{for } y = 0
\end{cases}
\tag{20}
\]

Error measure

We define the error measure based on likelihood
• For each $(\mathbf{x}, y)$, $y$ is generated with probability given by $h_{\mathbf{w}}(\mathbf{x})$.
• Plausible error measure based on the likelihood of $y$ given $\mathbf{x}$ and $\mathbf{w}$

\[
P(y \mid \mathbf{x}, \mathbf{w}) = z^{y} (1 - z)^{1 - y} \tag{21}
\]

where

\[
z = h_{\mathbf{w}}(\mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x}) \tag{22}
\]

• The likelihood of $\mathcal{D} = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\}$ given $\mathbf{w}$ is

\[
\prod_{n=1}^{N} P(y_n \mid \mathbf{x}_n, \mathbf{w}) \tag{23}
\]

Error measure (cont.)

• Learning goal: maximizing likelihood

\[
\begin{aligned}
& \text{Maximize } \prod_{n=1}^{N} P(y_n \mid \mathbf{x}_n, \mathbf{w}) \\
\Leftrightarrow\ & \text{Minimize } -\log \prod_{n=1}^{N} P(y_n \mid \mathbf{x}_n, \mathbf{w}) \\
\Leftrightarrow\ & \text{Minimize } -\sum_{n=1}^{N} \left( y_n \log z_n + (1 - y_n) \log(1 - z_n) \right)
\end{aligned}
\tag{24}
\]

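The equivalence in (24) between the negative log-likelihood and the sum of per-sample log terms can be checked on a few numbers. The labels and model outputs below are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical binary targets y_n and model outputs z_n = sigma(w^T x_n).
y = np.array([1, 0, 1, 1])
z = np.array([0.9, 0.2, 0.7, 0.6])

# Likelihood of the data set: product of z^y (1 - z)^(1 - y), equations (21) and (23).
likelihood = np.prod(z ** y * (1.0 - z) ** (1 - y))

# Last line of (24): negative sum of the per-sample log terms.
cross_entropy = -np.sum(y * np.log(z) + (1 - y) * np.log(1.0 - z))

print(likelihood)                                  # 0.9 * 0.8 * 0.7 * 0.6 = 0.3024
print(np.isclose(-np.log(likelihood), cross_entropy))  # True
```

Working with the sum of logs rather than the raw product also avoids numerical underflow when N is large.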
Learning Algorithm (Gradient Descent)

1. Initialize the weights (parameters) $\mathbf{w}_0$ at $t = 0$
2. For $t = 0, 1, 2, \ldots$ do
   2.1 Compute the outputs $z_n$ for each $\mathbf{x}_n$ ($n = 1, \ldots, N$)

\[
z_n = \sigma(\mathbf{w}_t^\top \mathbf{x}_n) \tag{25}
\]

   2.2 Compute the gradient of the error $E$, the negative log-likelihood (24) averaged over the $N$ samples

\[
\nabla_{\mathbf{w}} E = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n (z_n - y_n) \tag{26}
\]

   2.3 Update the weights

\[
\mathbf{w}_{t+1} = \mathbf{w}_t - \eta \nabla_{\mathbf{w}} E \tag{27}
\]

   where $\eta$ is the learning rate (a hyper-parameter)
   Iterate until $\mathbf{w}$ no longer changes
3. Return the final weights $\mathbf{w}$

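The steps of the algorithm can be sketched in NumPy as follows. This is an illustrative implementation under assumptions, not the lecturer's code: the function name `fit_logistic`, the default learning rate, the iteration cap, and the stopping tolerance are all hypothetical choices, and the toy data are deliberately not linearly separable so that the weights converge to finite values.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def fit_logistic(X, y, eta=0.5, max_iters=5000, tol=1e-8):
    """Batch gradient descent for logistic regression (steps 1 to 3)."""
    w = np.zeros(X.shape[1])                # step 1: initialize w_0
    for _ in range(max_iters):              # step 2: iterate
        z = sigmoid(X @ w)                  # 2.1: outputs, equation (25)
        grad = X.T @ (z - y) / len(y)       # 2.2: gradient, equation (26)
        w_next = w - eta * grad             # 2.3: update, equation (27)
        if np.linalg.norm(w_next - w) < tol:
            return w_next                   # w no longer changes
        w = w_next
    return w                                # step 3: final weights

# Hypothetical 1-D toy data with overlapping classes.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(x), x])   # bias column plus the feature

w = fit_logistic(X, y)
final_grad = X.T @ (sigmoid(X @ w) - y) / len(y)
predictions = (sigmoid(X @ w) >= 0.5).astype(int)
print(w)
print(predictions)
```

The stopping rule compares successive weight vectors, which is one simple way to implement "iterate until w no longer changes"; checking the gradient norm is a common alternative.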
Learning Rate

• How does $\eta$ affect the algorithm?

[Figure: effect of the learning rate on gradient descent]

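One way to explore the question is to run gradient descent with several learning rates on a very simple objective. The sketch below swaps in the one-dimensional function E(w) = w², whose gradient is 2w, instead of the logistic regression error, purely to keep the behaviour easy to follow; the learning rates and the number of steps are hypothetical choices.

```python
def gradient_descent(eta, steps=20, w0=1.0):
    """Minimize E(w) = w**2 (gradient 2w) starting from w0."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w
        w = w - eta * grad  # same update rule as equation (27)
    return w

# Final value of w after 20 steps for three learning rates.
final = {eta: gradient_descent(eta) for eta in (0.01, 0.4, 1.1)}
for eta, w in final.items():
    print(eta, w)
```

With the small rate the iterate is still far from the minimum at 0 after 20 steps, with the moderate rate it is essentially at the minimum, and with the large rate each step overshoots and the magnitude of w grows.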
Programming Example

    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.set(style="darkgrid")

    # Load the example titanic dataset
    df = sns.load_dataset("titanic")

    # Make a custom palette with gendered colors
    pal = dict(male="#6495ED", female="#F08080")

    # Show the survival probability as a function of age and sex
    # (logistic=True fits a logistic regression and requires statsmodels)
    g = sns.lmplot(x="age", y="survived", col="sex", hue="sex", data=df,
                   palette=pal, y_jitter=.02, logistic=True, ci=None)
    g.set(xlim=(0, 80), ylim=(-.05, 1.05))
    plt.show()

Programming Example (cont.)

[Figure: two panels, sex = male and sex = female, showing survived versus age with fitted logistic curves]

Evaluation

• Consider a two-class problem with two classes ⊕ and ⊖
• The performance of a logistic regression model depends on a threshold $th$

\[
\begin{aligned}
y \text{ is } \oplus & \quad \text{if } P(y = \oplus \mid \mathbf{x}) \geq th \\
y \text{ is } \ominus & \quad \text{if } P(y = \oplus \mid \mathbf{x}) < th
\end{aligned}
\tag{28}
\]

• High threshold: high specificity, low sensitivity
• Low threshold: low specificity, high sensitivity
• We should select the threshold that gives the best trade-off between the cost of false positives and the cost of false negatives

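The effect of the threshold on sensitivity and specificity can be illustrated on a handful of predictions. In the sketch below the predicted probabilities and true labels are hypothetical, label 1 stands for the ⊕ class, and the helper name `sensitivity_specificity` is an illustrative choice.

```python
def sensitivity_specificity(probs, labels, th):
    """Return (sensitivity, specificity) when predicting positive if p >= th."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= th
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted probabilities and true labels (1 = positive class).
probs = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]

low = sensitivity_specificity(probs, labels, th=0.3)
high = sensitivity_specificity(probs, labels, th=0.7)
print(low)   # sensitivity 1.0, specificity 1/3
print(high)  # sensitivity 2/3, specificity 1.0
```

Raising the threshold from 0.3 to 0.7 trades sensitivity for specificity, matching the two bullet points on high and low thresholds.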
ROC Curve

• The receiver operating characteristic (ROC) curve is a plot that shows the performance of a binary classifier as a function of its cut-off threshold.
• It shows the true positive rate (sensitivity) against the false positive rate (1 − specificity) for various threshold values.
• The area under the curve (AUC) is an aggregated measure of performance.

[Figure: left panel, predicted probability p versus x; right panel, ROC curve of sensitivity versus 1 − specificity]

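An ROC curve can be traced by sweeping the threshold over the predicted probabilities and recording one point per threshold. The sketch below is a plain-Python illustration on hypothetical data; the helper names `roc_points` and `auc` are assumptions, and the area is computed with the trapezoidal rule.

```python
def roc_points(probs, labels):
    """ROC points (false positive rate, true positive rate), one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for th in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= th and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= th and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical predicted probabilities and true labels (1 = positive class).
probs = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]

points = roc_points(probs, labels)
print(points)
print(auc(points))  # 7/9, about 0.778
```

For this data, 7 of the 9 positive-negative pairs are ranked correctly (the positive has the higher score), which agrees with the computed area of 7/9.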
Multi-class classification problems

• Email foldering/tagging: Work (1), Friends (2), Family (3), Hobby (4)
• Medical diagnosis: Not ill, Cold, Flu
• Weather: Sunny, Cloudy, Rain, Snow

Visual of Binary vs Multi-class classification
Approaches
• One-vs-one
• Hierarchical
• One-vs-all

One-vs-all
• Class 1: $h^{(1)}(x) = P(y = 1 \mid x, w_1)$
• Class 2: $h^{(2)}(x) = P(y = 2 \mid x, w_2)$
• Class 3: $h^{(3)}(x) = P(y = 3 \mid x, w_3)$

One-vs-all (cont.)
Learning
• Train a logistic regression classifier $h^{(i)}(x)$ for each class i to predict the probability that y = i.

Prediction
• On a new input x, predict the class i whose classifier gives the highest probability:

$$\hat{y} = \arg\max_i h^{(i)}(x) \tag{29}$$

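The one-vs-all procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's implementation: it uses NumPy, plain batch gradient descent with a fixed number of iterations, classes indexed 0, 1, 2 instead of 1, 2, 3, and hypothetical toy data made of three well-separated clusters. All function names are introduced here.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_binary_logreg(X, y, lr=0.1, n_iter=2000):
    """Gradient descent on the mean binary cross-entropy; y holds 0/1 labels."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w -= lr * (X.T @ (sigmoid(X @ w) - y)) / len(y)
    return w

def train_one_vs_all(X, y, n_classes):
    """One weight vector per class: class i against all the others."""
    return np.stack([train_binary_logreg(X, (y == i).astype(float))
                     for i in range(n_classes)])

def predict_one_vs_all(W, X):
    """Equation (29): pick the class whose classifier reports the highest probability."""
    return np.argmax(sigmoid(X @ W.T), axis=1)

# Hypothetical toy data: three 2-D clusters, with a leading 1 in each row for the bias.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 3.0], [-3.0, -2.0], [3.0, -2.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + 0.5 * rng.standard_normal((150, 2))
X = np.hstack([np.ones((150, 1)), X])

W = train_one_vs_all(X, y, n_classes=3)
accuracy = np.mean(predict_one_vs_all(W, X) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Each row of `W` is trained independently, so the three probabilities need not sum to one; only their ranking is used for prediction.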
Decision boundaries and decision regions
• Logistic regression for the Iris dataset

Softmax Regression
• Softmax Regression
• Cross Entropy vs. MSE
Softmax Regression
Concept 4
Softmax regression is a generalization of logistic regression that we can use for multi-class classification.

• In softmax regression, we replace the sigmoid function with the so-called softmax function $\phi(\cdot) = \{\phi_1, \ldots, \phi_C\}$.

Score function
Concept 5
The score function f maps the raw features to class scores.

$$z = f(x; W, b) = Wx + b \tag{30}$$

Example (the input image is flattened into the feature vector x):

$$
\underbrace{\begin{bmatrix} 0.2 & -0.5 & 0.1 & 2.0 \\ 1.5 & 1.3 & 2.1 & 0.0 \\ 0 & 0.25 & 0.2 & -0.3 \end{bmatrix}}_{W}
\underbrace{\begin{bmatrix} 56 \\ 231 \\ 24 \\ 2 \end{bmatrix}}_{x}
+
\underbrace{\begin{bmatrix} 1.1 \\ 3.2 \\ -1.2 \end{bmatrix}}_{b}
=
\begin{bmatrix} -96.8 \\ 437.9 \\ 60.75 \end{bmatrix}
\begin{matrix} \text{cat score} \\ \text{dog score} \\ \text{ship score} \end{matrix}
$$

Score function (cont.)
• Use the bias trick: prepend a constant $x_0 = 1$ to x and store b as column 0 of W ($W_0 = b$), so that the two parameters W, b are represented as one.

$$z = f(x; W) = Wx \tag{31}$$

$$
\underbrace{\begin{bmatrix} 1.1 & 0.2 & -0.5 & 0.1 & 2.0 \\ 3.2 & 1.5 & 1.3 & 2.1 & 0.0 \\ -1.2 & 0 & 0.25 & 0.2 & -0.3 \end{bmatrix}}_{\text{new } W}
\underbrace{\begin{bmatrix} 1 \\ 56 \\ 231 \\ 24 \\ 2 \end{bmatrix}}_{\text{new } x}
$$

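The score function and the bias trick can be checked numerically with the example values shown on the slides. This is a small sketch assuming NumPy; the variable names `W_new` and `x_new` are introduced here for the augmented matrix and vector.

```python
import numpy as np

W = np.array([[0.2, -0.5, 0.1, 2.0],
              [1.5, 1.3, 2.1, 0.0],
              [0.0, 0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])
x = np.array([56.0, 231.0, 24.0, 2.0])

# Equation (30): separate weights and bias.
z = W @ x + b

# Bias trick, equation (31): b becomes column 0 of W and x gets a leading 1.
W_new = np.hstack([b[:, None], W])
x_new = np.concatenate(([1.0], x))
z_new = W_new @ x_new

print(z)       # cat, dog, ship scores
print(z_new)   # same scores from the single matrix product
```

Both forms give the same score vector. Written out, the ship score is 0·56 + 0.25·231 + 0.2·24 − 0.3·2 − 1.2 = 60.75.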
Softmax function
Concept 6
The softmax function converts a score vector $z = (z_1, \ldots, z_C)$ into a discrete distribution vector $p = (p_1, \ldots, p_C)$:

$$p_i = P(y = i \mid z) = \phi_i(z) = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \quad i \in \{1, \ldots, C\} \tag{32}$$

where

$$z_i = w_{i0} + w_{i1} x_1 + \ldots + w_{iD} x_D = w_i^\top x \tag{33}$$

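Equation (32) translates directly into code. The sketch below assumes NumPy and subtracts the largest score before exponentiating, a standard reformulation that leaves the result unchanged but avoids overflow for large scores; that shift is an addition here, not something stated on the slide.

```python
import numpy as np

def softmax(z):
    """Softmax of a score vector, equation (32), shifted by max(z) for numerical stability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax([1.0, 2.0, 3.0])
print(p, p.sum())

# Widely separated scores give a distribution that is numerically one-hot.
print(softmax([-96.8, 437.9, 60.75]))
```

The shift works because multiplying the numerator and the denominator of (32) by the same constant $e^{-\max_j z_j}$ does not change the ratio.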
Softmax function (cont.)
[Figure: score vectors → softmax → distribution vectors]

Softmax function (cont.)
Example: the input image, flattened to $x = (1, 56, 231, 24, 2)^\top$ (with the leading 1 from the bias trick), is mapped to class scores and then to probabilities by the softmax function.

class    score z    softmax p
cat       -96.8     ≈ 0
dog       437.9     ≈ 1
ship       60.75    ≈ 0

Problem Statement
• Given $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ where $y_n \in \{1, \ldots, C\}$.
• Let $t_n$ denote the one-hot encoding of $y_n$ (the target discrete distribution) and $p_n$ the distribution predicted for $x_n$.
• Learning goal: find a softmax function $\phi_W(\cdot) = \{\phi_1, \ldots, \phi_C\}$ that minimizes

$$\arg\min_W E(\phi_W) = \arg\min_W \sum_{n=1}^{N} CE(p_n, t_n) \tag{34}$$

Cross Entropy
Concept 7
Cross-entropy (CE) is a measure of the difference between two probability distributions. The cross-entropy between a “true” distribution $t = (t_1, \ldots, t_C)$ and an estimated distribution $p = (p_1, \ldots, p_C)$ is defined as

$$CE(p, t) = -\sum_{i=1}^{C} t_i \log p_i \tag{35}$$

Cross Entropy (cont.)
[Figure: bar chart of the probabilities of p and t over the classes A to E]

$p = (0.1, 0.2, 0.4, 0.2, 0.1)$, $t = (0.2, 0.4, 0.2, 0.1, 0.1)$ → $CE(p, t) = 1.678$ (natural logarithm)

• $CE \ge 0$
• $CE(p, t) \ne CE(t, p)$ in general
• For a fixed t, $CE(p, t)$ is minimized when $p_i = t_i, \forall i$

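The value quoted for this example can be reproduced from definition (35). The sketch assumes NumPy and the natural logarithm; terms with $t_i = 0$ are skipped, which follows the usual convention $0 \log 0 = 0$ (an assumption made here so that one-hot targets work).

```python
import numpy as np

def cross_entropy(p, t):
    """CE(p, t) = -sum_i t_i * log(p_i), natural logarithm; terms with t_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    t = np.asarray(t, dtype=float)
    mask = t > 0
    return float(-np.sum(t[mask] * np.log(p[mask])))

p = [0.1, 0.2, 0.4, 0.2, 0.1]
t = [0.2, 0.4, 0.2, 0.1, 0.1]

ce_pt = cross_entropy(p, t)   # estimate p against target t
ce_tt = cross_entropy(t, t)   # entropy of t, the smallest value CE(., t) can take
print(f"CE(p, t) = {ce_pt:.4f}")
print(f"CE(t, t) = {ce_tt:.4f}")
```

Comparing the two printed values illustrates the last bullet: for this fixed t, the estimate p = t gives a smaller cross-entropy than the mismatched p.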
MSE
Concept 8
Mean squared error (MSE) is the average of the squared errors.

$$MSE(p, t) = \frac{1}{C} \sum_{i=1}^{C} (p_i - t_i)^2 \tag{36}$$

• $MSE \ge 0$
• $MSE(p, t) = MSE(t, p)$
• $MSE = 0$ if $p_i = t_i, \forall i$

Cross Entropy vs. MSE
• Consider three “true” binary distributions $(p, 1 - p) = (0.1, 0.9)$, $(0.5, 0.5)$ and $(0.8, 0.2)$, and compare the two losses as functions of the estimated probability q.

[Figure: three panels (p = 0.1, p = 0.5, p = 0.8), each plotting the cross-entropy $-(p \log q + (1 - p) \log(1 - q))$ and the squared error $(q - p)^2$ against $q \in [0, 1]$]

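The comparison between the two losses can be reproduced numerically. The sketch below, assuming NumPy, evaluates both on a grid of q values; the grid and the helper name `loss_curves` are choices made here. For a binary distribution, the MSE of (36) over the two components reduces to $(q - p)^2$, which is the curve used.

```python
import numpy as np

def loss_curves(p, q):
    """Cross-entropy and squared error between the true (p, 1-p) and the estimate (q, 1-q)."""
    ce = -(p * np.log(q) + (1 - p) * np.log(1 - q))
    se = (q - p) ** 2
    return ce, se

# Grid of estimated probabilities, excluding 0 and 1 where the logarithm diverges.
q = np.linspace(0.001, 0.999, 999)

for p in (0.1, 0.5, 0.8):
    ce, se = loss_curves(p, q)
    print(f"p={p}: CE minimized at q={q[np.argmin(ce)]:.3f}, "
          f"squared error minimized at q={q[np.argmin(se)]:.3f}, "
          f"largest CE on grid={ce.max():.2f}, largest squared error={se.max():.2f}")
```

Both losses are minimized at q = p, but the cross-entropy grows without bound as q approaches the wrong extreme, while the squared error never exceeds 1.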
Learning Algorithm
1. Initialize the weights (parameters) $W_0$ at t = 0
2. For t = 0, 1, 2, ... do
   2.1 Compute the output distribution $p_n$ for each $x_n$ (n = 1, ..., N)

       $$p_n = \mathrm{softmax}(W_t x_n) \tag{37}$$

   2.2 Compute the gradient of the mean cross-entropy

       $$\nabla_W E = \frac{1}{N} \sum_{n=1}^{N} (p_n - t_n) x_n^\top \tag{38}$$

   2.3 Update the weights, where η is the learning rate

       $$W_{t+1} = W_t - \eta \nabla_W E \tag{39}$$

   Iterate to the next step until W no longer changes.
3. Return the final weights W

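Steps 1 to 3 can be sketched as batch gradient descent in NumPy. This is an illustration under stated assumptions rather than the lecture's code: the loop runs for a fixed number of iterations instead of testing whether W has stopped changing, classes are indexed from 0, the learning rate value is arbitrary, and the data are hypothetical toy clusters.

```python
import numpy as np

def softmax_rows(Z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, y, n_classes, eta=0.1, n_iter=500):
    """Rows of X are samples x_n (with a leading 1); y holds class indices 0..C-1."""
    N, D = X.shape
    T = np.eye(n_classes)[y]                  # one-hot targets t_n, shape (N, C)
    W = np.zeros((n_classes, D))              # step 1: initial weights
    costs = []
    for _ in range(n_iter):                   # step 2
        P = softmax_rows(X @ W.T)             # (37): p_n = softmax(W x_n)
        costs.append(float(-np.mean(np.log(P[np.arange(N), y]))))  # mean cross-entropy
        grad = (P - T).T @ X / N              # (38)
        W = W - eta * grad                    # (39)
    return W, costs                           # step 3

# Hypothetical toy data: three 2-D clusters, with a leading 1 in each row for the bias.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 3.0], [-3.0, -2.0], [3.0, -2.0]])
y = np.repeat(np.arange(3), 50)
X = np.hstack([np.ones((150, 1)), centers[y] + 0.5 * rng.standard_normal((150, 2))])

W, costs = fit_softmax_regression(X, y, n_classes=3)
accuracy = np.mean(np.argmax(X @ W.T, axis=1) == y)
print(f"cost: {costs[0]:.3f} -> {costs[-1]:.3f}, training accuracy: {accuracy:.2f}")
```

With zero initial weights every class starts with probability 1/C, so the first recorded cost is log C.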
Example
• Softmax regression for the Iris dataset

[Figure: left panel, decision regions for classes 0, 1 and 2; right panel, “Softmax Regression - Gradient Descent”, cost versus iterations (0 to 500)]

Capacity, Overfitting and Underfitting
• Model Capacity
• Model vs. Data
• Bias-Variance
• Tradeoff of Capacity
• Regularization
• Tradeoff of Regularization
Model Capacity
Concept 9
Capacity is model complexity.

The most common ways to estimate the capacity of a model:
• VC dimension
• The number of parameters
• The norm of parameters

[Figure: model space, ranging from simple to complex]

Model vs. Data
Concept 10
Data is divided into three sets: training set, validation set and test set.

Concept 11
Models can be too limited. We can’t find a function that fits the data well. This is called underfitting.

Concept 12
Models can also be too rich. We don’t just model the data, but also the underlying noise. This is called overfitting.

Model vs. Data (cont.)
• Given the data set $D = \{(x_1, y_1), \ldots, (x_{10}, y_{10})\}$ shown in the following figure, find the regression function that best fits the data.

[Figure: scatter plot of the 10 data points, y versus x]

Model vs. Data (cont.)
• Consider the simple hypothesis set

$$H_1 = \{h \mid y = h(x) = w_0 + w_1 x\} \tag{40}$$

[Figure: two panels of y versus x; the right panel shows the fit for M = 1]

Model vs. Data (cont.)
• Consider the hypothesis set

$$H_3 = \{h \mid y = h(x) = w_0 + w_1 x + w_2 x^2 + w_3 x^3\} \tag{41}$$

(note that $H_1 \subset H_3$)

[Figure: two panels of y versus x; the right panel shows the fit for M = 3]

Model vs. Data (cont.)
• Consider the hypothesis set

$$H_9 = \{h \mid y = h(x) = w_0 + w_1 x + w_2 x^2 + \ldots + w_9 x^9\} \tag{42}$$

(note that $H_1 \subset H_3 \subset H_9$)

[Figure: two panels of y versus x; the right panel shows the fit for M = 9]

Model vs. Data (cont.)
Which one is
• Under-fitting
• Over-fitting
• Appropriate fitting

[Figure: four panels of y versus x: the data and the fits for M = 1, M = 3 and M = 9]

Model vs. Data (cont.)
          M = 1      M = 3           M = 9
w0         0.82       0.31            0.35
w1        -1.27       7.99          232.37
w2                  -25.43        -5321.83
w3                   17.37        48568.31
w4                              -231639.30
w5                               640042.26
w6                             -1061800.52
w7                              1042400.18
w8                              -557682.99
w9                               125201.43

Model Performance
[Figure: $E_{RMS}$ versus M (0 to 9) for the training set and the test set]

Figure 1: Graphs of the root-mean-square error evaluated on the training set and on an independent test set for various values of M

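The behaviour of the training and test errors as M grows can be explored with a short experiment. The sketch below assumes NumPy and, as a labeled assumption, generates the data as noisy samples of sin(2πx) on [0, 1] with 10 training points; the slides do not state how their data set was produced, so the numbers printed here will differ from the table and figure above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Hypothetical data: noisy samples of sin(2*pi*x) on [0, 1]."""
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

def rms_error(w, x, y):
    """Root-mean-square error of the polynomial with coefficients w (lowest degree first)."""
    return float(np.sqrt(np.mean((np.vander(x, len(w), increasing=True) @ w - y) ** 2)))

x_train, y_train = make_data(10)
x_test, y_test = make_data(100)

results = {}
for M in (1, 3, 9):
    Phi = np.vander(x_train, M + 1, increasing=True)     # columns 1, x, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)    # least-squares fit
    results[M] = (rms_error(w, x_train, y_train), rms_error(w, x_test, y_test))
    print(f"M={M}: train RMS={results[M][0]:.3f}, test RMS={results[M][1]:.3f}, "
          f"max |w|={np.abs(w).max():.1f}")
```

Because the hypothesis sets are nested, the training error cannot increase with M; the test error and the size of the largest coefficient are the quantities to watch for signs of overfitting.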
What happens if we increase N?

Figure 2: Using the M = 9 polynomial for N = 15 data points (left plot) and N = 100 data points (right plot). We see that increasing the size of the data set reduces the over-fitting problem.

Errors in Learning Model
• Bias error: the error due to the assumptions made in the model
  • High bias signifies underfitting
• Variance error: measures the variability of the results given by the model when the data set is changed
  • High variance signifies overfitting

$$\text{Expected error} = \text{Bias} + \text{Variance} + \text{Irreducible error} \tag{43}$$

Errors in Learning Model (cont.)
Concept 13
Given a learning model $\langle H, A \rangle$, we define the “average” hypothesis $\bar{g}(x)$

$$\bar{g}(x) = E_D[g_D(x)] \tag{44}$$

where $g_D(x)$ is the “best” hypothesis given the data set D

• Bias of the learning model

$$\text{Bias} = E_x\left[(\bar{g}(x) - f(x))^2\right] \tag{45}$$

where $f(x)$ is the “truth” function

• Variance of the learning model

$$\text{Variance} = E_x\left[E_D\left[(g_D(x) - \bar{g}(x))^2\right]\right] \tag{46}$$

Errors in Learning Model (cont.)
[Figure: illustration of bias and variance relative to the truth function]

• Given many independent data sets $D_1, D_2, \ldots, D_K$, we can estimate the average hypothesis by

$$\bar{g}(x) \approx \frac{1}{K} \sum_{i=1}^{K} g_{D_i}(x) \tag{47}$$

Example: two learning models
• Consider the sine target function

$$f : [-1, 1] \to \mathbb{R}, \quad x \mapsto \sin(\pi x) \tag{48}$$

• We generate 100 data sets $\{D_i\}$, i = 1, ..., 100, each containing N = 2 data points, independently from the sinusoidal curve $f(x) = \sin(\pi x)$. For each data set $D_i$, we fit the data using one of two models
  • $H_0$: set of all lines of the form h(x) = b
  • $H_1$: set of all lines of the form h(x) = ax + b
  • Note that $H_0 \subset H_1$

Example: two learning models (cont.)
• Given a data set $D_i = \{(x_1, y_1), (x_2, y_2)\}$:
  • For $H_0$, we choose the constant hypothesis that best fits the data (the horizontal line at the midpoint, $b = (y_1 + y_2)/2$).
  • For $H_1$, we choose the line that passes through the two data points $(x_1, y_1)$ and $(x_2, y_2)$.

Example: two learning models (cont.)
• Repeat this process for each of the 100 data sets $\{D_i\}$, i = 1, ..., 100.

Example: two learning models (cont.)
• The bias and variance of each learning model

          H0      H1
bias     0.50    0.21
var      0.25    1.69

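The bias and variance of the two models can be estimated by simulation, following definitions (45) to (47). This is a sketch under stated assumptions: it uses NumPy, draws the inputs uniformly on [−1, 1], approximates the expectation over x by an average on a regular grid, and uses 10,000 data sets rather than 100 so that the estimates are stable. The helper name `bias_variance` is introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10000                                   # number of simulated data sets (assumption)
x_grid = np.linspace(-1.0, 1.0, 201)        # grid used to approximate E_x
f = np.sin(np.pi * x_grid)                  # target function on the grid

# Each data set holds N = 2 points (x1, y1), (x2, y2), with x uniform on [-1, 1].
x = rng.uniform(-1.0, 1.0, size=(K, 2))
y = np.sin(np.pi * x)

def bias_variance(G):
    """G[k] is the hypothesis fitted on data set k, evaluated on x_grid."""
    g_bar = G.mean(axis=0)                              # (47): average hypothesis
    bias = float(np.mean((g_bar - f) ** 2))             # (45)
    variance = float(np.mean((G - g_bar) ** 2))         # (46): mean over data sets and x
    return bias, variance

# H0: constant hypothesis b = (y1 + y2) / 2.
b0 = y.mean(axis=1)
G0 = np.repeat(b0[:, None], x_grid.size, axis=1)

# H1: line through the two data points.
a1 = (y[:, 1] - y[:, 0]) / (x[:, 1] - x[:, 0])
b1 = y[:, 0] - a1 * x[:, 0]
G1 = a1[:, None] * x_grid[None, :] + b1[:, None]

bias0, var0 = bias_variance(G0)
bias1, var1 = bias_variance(G1)
print(f"H0: bias={bias0:.2f}, var={var0:.2f}")
print(f"H1: bias={bias1:.2f}, var={var1:.2f}")
```

The simpler model $H_0$ has the larger bias but a much smaller variance, so with only two data points per set its total of bias plus variance is lower than that of $H_1$.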
Generalization and Capacity
The criteria determining how well a machine learning model will perform:
1. Make the training error small.
2. Make the gap between training and test (generalization) error small.

[Figure: training error and generalization error versus capacity, with the under-fitting zone, the over-fitting zone, the optimal capacity and the generalization gap marked]

Regularization
Concept 14
Regularization is any modification we make to a learning model that is intended to reduce its generalization error but not its training error.

[Figure: diagram relating performance measure, algorithm, data and model]

Addressing High Bias and High Variance

High bias                    High variance
Obtain more features         Decrease number of features
Decrease regularization λ    Increase regularization λ
Extend model                 Obtain more data
Train longer                 Stop early
New model architecture       New model architecture

Regularization for Linear Regression
• Using a regularized $MSE_{train}$ for linear regression

$$w = \arg\min_w \; MSE_{train} + \lambda \frac{1}{N} w^\top w \tag{49}$$

where λ is the regularization coefficient (hyper-parameter) that controls the relative importance of the data-dependent error $MSE_{train}$ and the regularization term $\lambda \frac{1}{N} w^\top w$

• Solving for w, we obtain

$$w = (X^\top X + \lambda I)^{-1} X^\top y \tag{50}$$

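The closed-form solution (50) can be checked against the objective (49) numerically. The sketch below assumes NumPy and hypothetical synthetic data; `ridge_fit` and `objective` are names introduced here. It solves the linear system instead of forming the inverse explicitly, which is a common numerical preference and gives the same w.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form solution (50): solve (X^T X + lam I) w = X^T y."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

def objective(w, X, y, lam):
    """Regularized objective (49): MSE_train + (lam / N) w^T w."""
    N = len(y)
    return float(np.mean((X @ w - y) ** 2) + lam / N * (w @ w))

# Hypothetical synthetic data: 30 samples, 5 features, known weights plus noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(30)

for lam in (0.0, 1.0, 100.0):
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam:6.1f}  ||w||={np.linalg.norm(w):.3f}  "
          f"objective={objective(w, X, y, lam):.4f}")
```

Setting the gradient of (49) to zero gives $\frac{2}{N}(X^\top X w - X^\top y + \lambda w) = 0$, which rearranges to (50). Larger values of λ shrink the norm of w.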
Example: one learning model
• Consider the sine target function

$$f : [-1, 1] \to \mathbb{R}, \quad x \mapsto \sin(2\pi x) \tag{51}$$

• We generate 100 data sets $\{D_i\}$, i = 1, ..., 100, each containing N = 25 data points, independently from the sinusoidal curve $f(x) = \sin(2\pi x)$. For each data set $D_i$, we fit a model with 24 Gaussian basis functions by minimizing the regularized error function.

Example: one learning model (cont.)
• Illustration of the dependence of bias and variance on the regularization coefficient of the model

[Figure: six panels of fitted curves, y versus x]

Example: one learning model (cont.)
• Summary of the dependence of bias and variance on the regularization coefficient of the model

[Figure: (bias)², variance, (bias)² + variance and test error (0 to 0.15) plotted against ln λ (−3 to 2)]

FRM Part 1 Quants 2023 ML
No ratings yet
FRM Part 1 Quants 2023 ML
8 pages
DS203 2024 01 02 LogisticRegression
No ratings yet
DS203 2024 01 02 LogisticRegression
38 pages
Lecture 7 - Part A - Mutli Class and Overfitting and Regularization
No ratings yet
Lecture 7 - Part A - Mutli Class and Overfitting and Regularization
43 pages
CS373 Lecture18.1
No ratings yet
CS373 Lecture18.1
33 pages
Week 10_Lecture 10
No ratings yet
Week 10_Lecture 10
59 pages
Learning 2
No ratings yet
Learning 2
104 pages
Qabd Unit 2 PDF
No ratings yet
Qabd Unit 2 PDF
74 pages
CS464 Ch9 LinearRegression
100% (1)
CS464 Ch9 LinearRegression
43 pages
Module 3 AE4 Linear Programming The Simplex Method
No ratings yet
Module 3 AE4 Linear Programming The Simplex Method
73 pages
Lecture2 PDF
No ratings yet
Lecture2 PDF
111 pages
Summary of New Features in 12.0
No ratings yet
Summary of New Features in 12.0
19 pages
Course Material- Artificial Intelligence-Week5_update
No ratings yet
Course Material- Artificial Intelligence-Week5_update
30 pages
Logistic Regression
No ratings yet
Logistic Regression
24 pages
Hota ML Regression
No ratings yet
Hota ML Regression
57 pages
Corrupted Rank-One Measurements: Low-Rank Positive Semidefinite Matrix Recovery From
No ratings yet
Corrupted Rank-One Measurements: Low-Rank Positive Semidefinite Matrix Recovery From
12 pages
ML MU Unit 3RegressionTechniquespdf 2025 02-07-10!56!37 (1)
No ratings yet
ML MU Unit 3RegressionTechniquespdf 2025 02-07-10!56!37 (1)
115 pages
1.1. Linear Models — scikit-learn 1.6.1 documentation
No ratings yet
1.1. Linear Models — scikit-learn 1.6.1 documentation
41 pages
Week 04
No ratings yet
Week 04
101 pages
Excel File Tables for Statistical Analysis (1)
No ratings yet
Excel File Tables for Statistical Analysis (1)
8 pages
Lec 17 -Dsfa23
No ratings yet
Lec 17 -Dsfa23
32 pages
Teaching Integer Programming Formulations Using The Traveling Salesman Problem
No ratings yet
Teaching Integer Programming Formulations Using The Traveling Salesman Problem
8 pages
2a Linear Regression 18may
No ratings yet
2a Linear Regression 18may
28 pages
ML - 2
No ratings yet
ML - 2
27 pages
Lecture 4-Revision_Part3_PCA_Reg
No ratings yet
Lecture 4-Revision_Part3_PCA_Reg
39 pages
Feature Selection For SVMS: by J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, V. Vapnik
No ratings yet
Feature Selection For SVMS: by J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, V. Vapnik
19 pages
6. Differentiation, Partial Differentiation & Gradients (1)
No ratings yet
6. Differentiation, Partial Differentiation & Gradients (1)
51 pages
Week 9 PDF
No ratings yet
Week 9 PDF
70 pages
Unit 6 - SA 1 PDF
No ratings yet
Unit 6 - SA 1 PDF
18 pages
Lecture W7ab
No ratings yet
Lecture W7ab
37 pages
Restricted Boltzmann Machines For Collaborative Filtering
No ratings yet
Restricted Boltzmann Machines For Collaborative Filtering
8 pages
1.1. Linear Models - Scikit-Learn 1.4.2 Documentation
No ratings yet
1.1. Linear Models - Scikit-Learn 1.4.2 Documentation
17 pages
Chapter 6 (Part I)
0% (1)
Chapter 6 (Part I)
40 pages
Complexity of Matrix Rank and Rigidity
No ratings yet
Complexity of Matrix Rank and Rigidity
1 page
Linear Regression 18may
No ratings yet
Linear Regression 18may
28 pages
2023 L2 Seminars
No ratings yet
2023 L2 Seminars
53 pages
Regression Analysis: Estimating Relationships
No ratings yet


5. Capacity, Overfitting and Underfitting


a vector of parameters of the model

• D is a large number

1. Construct the matrices X, A and the vector y

• Linearity in the weights

$h_w(x) = w_0 \phi_0(x) + w_1 \phi_1(x) + \dots + w_M \phi_M(x)$ (11)

where $\phi_i(x)$ are basis functions

[Figure: two panels, x-axis: Years of Education]

• The requirement is to build a system that can take a vector $x \in \mathbb{R}^D$ as

where $\hat{y}$ is the value that our model (function) predicts for y, $w \in \mathbb{R}^{M+1}$ is a vector of parameters of the model, and $\phi$ is a set of M + 1 basis functions

1. Construct the matrix $\Phi$ and the vector y

$w = (\Phi^\top \Phi)^{-1} \Phi^\top y$ (18)
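The closed-form solution (18) can be sketched in numpy as follows. This is a minimal illustration, assuming polynomial basis functions $\phi_i(x) = x^i$ and a small made-up dataset; the helper name `design_matrix` is hypothetical and not part of the lecture.

```python
import numpy as np

def design_matrix(x, M):
    """Hypothetical helper: row n is (phi_0(x_n), ..., phi_M(x_n)) with phi_i(x) = x**i."""
    return np.vander(x, M + 1, increasing=True)

# Made-up, noise-free data generated from y = 1 + 2x + 0.5x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x**2

Phi = design_matrix(x, M=2)

# Solve (Phi^T Phi) w = Phi^T y instead of forming the inverse explicitly
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)
```

Solving the linear system is numerically preferable to computing $(\Phi^\top \Phi)^{-1}$ directly, although both express the same formula.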

import numpy as np

• Find a polynomial regression function $y = f(x) = w_0 + w_1 x + w_2 x^2$ given the

• What basis functions?

Input representation or feature extraction

linear model $w^\top = (w_0, w_1, \dots, w_{256})$

• Some algorithms can work with categorical data directly.

linear classification linear regression logistic regression

• The logistic function converts a σ

• $h(x) = \sigma(s)$ can be interpreted as a probability

score probability of heart attack

• The target function f is the probability distribution

• Hypothesis set $h_w(x) = \sigma(w^\top x)$ and the conditional probability

We define the error measure based on likelihood

$z = h_w(x) = \sigma(w^\top x)$ (22)

$\sum_{n=1}^{N} \big( y_n \log z_n + (1 - y_n) \log(1 - z_n) \big)$
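The sum above can be evaluated with a short numpy sketch. The leading sign and the 1/N normalization are assumptions here (the usual negative mean log-likelihood), since the full equation is not shown; the data and the function names are made up for illustration.

```python
import numpy as np

def sigmoid(s):
    """Logistic function sigma(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def cross_entropy(w, X, y):
    """Assumed form: negative mean of y_n log z_n + (1 - y_n) log(1 - z_n)."""
    z = sigmoid(X @ w)
    z = np.clip(z, 1e-12, 1.0 - 1e-12)  # guard against log(0)
    return -np.mean(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

# Made-up data; the first column of ones carries the bias weight w_0
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])

# With w = 0 every z_n equals 0.5, so the loss equals log 2
loss = cross_entropy(np.zeros(2), X, y)
print(loss)
```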

1. Initialize the weights (parameters) at t = 0 to w(0)

• How does η affect the algorithm?
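One way to explore the effect of η is to run the same number of update steps with different learning rates and compare the final loss. The sketch below assumes plain batch gradient descent on the mean cross-entropy of logistic regression; the dataset, step count, and values of η are made up for illustration.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def loss(w, X, y):
    """Mean cross-entropy of logistic regression (assumed error measure)."""
    z = np.clip(sigmoid(X @ w), 1e-12, 1.0 - 1e-12)
    return -np.mean(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

def gradient_descent(X, y, eta, steps=200):
    """Batch gradient descent starting from w(0) = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)  # gradient of the mean loss
        w = w - eta * grad
    return w

# Made-up, non-separable 1-D data with a bias column of ones
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 0.0],
              [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

loss_slow = loss(gradient_descent(X, y, eta=0.01), X, y)
loss_fast = loss(gradient_descent(X, y, eta=1.0), X, y)
print(loss_slow, loss_fast)
```

On this small example the larger η makes more progress within the same number of steps; a value of η that is too large for the data can instead make the iterates oscillate or diverge.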

import matplotlib.pyplot as plt

• Consider a two-class problem with two classes ⊕ and

y is if P(y | x) < th (28)


$h^{(2)}(x) = P(y = 2 \mid x, w_2)$

$h^{(3)}(x) = P(y = 3 \mid x, w_3)$

• Logistic regression for the Iris dataset

• Use the bias trick $W_0 = b$ to represent the two parameters W, b as one

score vectors

24 61.95 0 ship

• Given $\mathcal{D} = \{(x_1, y_1), \dots, (x_N, y_N)\}$ where $y_n \in \{1, \dots, C\}$.

p = (0.1, 0.2, 0.4, 0.2, 0.1)
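As a small worked example, the sketch below treats the vector p above as a predicted class distribution (an assumption about its role in the slide) and computes the cross-entropy against a one-hot target. The true class index and the example scores are made up for illustration.

```python
import numpy as np

def softmax(s):
    """Convert a score vector into probabilities that sum to 1."""
    e = np.exp(s - np.max(s))  # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(p, true_class):
    """Cross-entropy against a one-hot target: -log of the true-class probability."""
    return -np.log(p[true_class])

p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
loss = cross_entropy(p, true_class=2)  # assumed true class (0-based index 2)

# Softmax applied to a made-up score vector
probs = softmax(np.array([1.0, 2.0, 3.0]))
print(loss, probs)
```

The loss is small when the true class receives high probability and grows without bound as that probability approaches 0.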

1. Initialize the weights W(0) (parameters) at t = 0

• Softmax regression for the Iris dataset

The most common ways to estimate the

underlying noise. This is called overfitting.

• Consider the simple hypothesis set

• Consider the hypothesis set

• Consider the hypothesis set

Which one? [Figure panel: M = 1]

• Bias error: error due to assumptions in the model

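$\bar{g}(x) = \mathbb{E}_{\mathcal{D}}[g_{\mathcal{D}}(x)]$ (44)

where $f(x)$ is the “truth” function

$\text{Variance} = \mathbb{E}_x\big[\mathbb{E}_{\mathcal{D}}\big[(g_{\mathcal{D}}(x) - \bar{g}(x))^2\big]\big]$ (46)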
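These quantities can be estimated by simulation. The sketch below is an illustration under stated assumptions: the target is $f(x) = \sin(\pi x)$ on $[-1, 1]$, each dataset holds two uniformly drawn points, the model is the line through them, and the bias is taken as $\mathbb{E}_x[(\bar{g}(x) - f(x))^2]$ (the usual definition, assumed because that equation is not shown here).

```python
import numpy as np

rng = np.random.default_rng(0)
n_datasets = 10000
x_grid = np.linspace(-1.0, 1.0, 201)  # grid used to approximate E_x

def f(x):
    """Assumed target function."""
    return np.sin(np.pi * x)

# Each row is one dataset D of two training points
X = rng.uniform(-1.0, 1.0, size=(n_datasets, 2))
Y = f(X)

# g_D(x) = slope * x + intercept: the line through the two points
slope = (Y[:, 1] - Y[:, 0]) / (X[:, 1] - X[:, 0])
intercept = Y[:, 0] - slope * X[:, 0]
G = slope[:, None] * x_grid[None, :] + intercept[:, None]

g_bar = G.mean(axis=0)                    # average hypothesis, eq. (44)
bias = np.mean((g_bar - f(x_grid)) ** 2)  # assumed bias definition
variance = np.mean((G - g_bar) ** 2)      # eq. (46)
print(bias, variance)
```

Swapping the line for a constant model (the mean of the two targets) gives a simpler hypothesis set to compare against.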

• Consider a sine target function

• The bias-variance for each learning model

Training error

• Using regularized $\text{MSE}_{\text{train}}$ for linear regression

• Consider a sine target function

Goodfellow, I., Bengio, Y., and Courville, A. (2016).
