Lecture 5: Logistic Regression (1)
Boyu Wang
Department of Computer Science
University of Western Ontario
Recall: tumor example
1
Recall: supervised learning
2
Linear model for classification
h_w(x) = w^T x
- How to choose w?
3
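A minimal NumPy sketch (not from the slides) of how such a linear model is used for classification: the score w^T x is thresholded at 0. The function name and variables are illustrative only.

```python
import numpy as np

def predict_linear(w, X):
    """Classify by thresholding the linear score h_w(x) = w^T x at 0.
    X has shape (m, d); returns 0/1 labels. Illustrative helper only."""
    scores = X @ w               # w^T x for every row x of X
    return (scores >= 0).astype(int)
```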
Error (cost) function for classification
4
Probabilistic view for classification
5
Sigmoid function
σ(a) ≜ 1/(1 + e^{-a}) = e^a/(1 + e^a)
- a → +∞ ⇒ σ(a) → 1
- a → −∞ ⇒ σ(a) → 0
- a = 0 ⇒ σ(a) = 0.5

p(y = 1 | x; w) ≜ σ(h_w(x)) = 1/(1 + e^{-w^T x}),
which is exactly what we are looking for!
6
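A minimal NumPy sketch of the sigmoid defined above. The input clipping is an extra numerical-stability step of my own, not part of the slides.

```python
import numpy as np

def sigmoid(a):
    """sigma(a) = 1 / (1 + exp(-a)).
    Clipping avoids overflow in exp for very large |a| (implementation detail)."""
    a = np.clip(a, -500, 500)
    return 1.0 / (1.0 + np.exp(-a))

# Sanity checks matching the limits listed on the slide.
assert sigmoid(0) == 0.5
assert sigmoid(50) > 0.99 and sigmoid(-50) < 0.01
```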
1D sigmoid function
7
2D sigmoid function
8
Error (cost) function for classification – revisited
9
The cross-entropy loss function for logistic regression
- Recall that for any data point (x_i, y_i), the conditional probability can be represented by the sigmoid function:
  p(y_i = 1 | x_i; w) = σ(h_w(x_i))
10
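A sketch of the cross-entropy loss named in this slide's title, computed from the probabilities t_i = σ(h_w(x_i)). The eps-clipping that avoids log(0) is an assumption on my part, not something stated on the slides.

```python
import numpy as np

def cross_entropy(y, t, eps=1e-12):
    """- sum_i [ y_i log t_i + (1 - y_i) log(1 - t_i) ].
    y: 0/1 labels, t: predicted probabilities t_i = sigmoid(w^T x_i).
    eps-clipping is an implementation detail added for stability."""
    t = np.clip(t, eps, 1 - eps)
    return -np.sum(y * np.log(t) + (1 - y) * np.log(1 - t))
```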
Linear regression vs. logistic regression
Both use the linear model: h_w(x) = w^T x
- Conditional probability:
  - linear regression: p(y_i | x_i; w) = (1/√(2πσ²)) e^{-(1/2)((y_i − w^T x_i)/σ)²}
  - logistic regression: p(y_i | x_i; w) = 1/(1 + e^{-w^T x_i}) if y_i = 1, and 1 − 1/(1 + e^{-w^T x_i}) if y_i = 0
- Log-likelihood function:
  - linear regression: log L(w) = Σ_{i=1}^m log( (1/√(2πσ²)) e^{-(1/2)((y_i − w^T x_i)/σ)²} )
  - logistic regression: log L(w) = Σ_{i=1}^m (y_i log t_i + (1 − y_i) log(1 − t_i)), where t_i = σ(h_w(x_i))
- Solution:
  - linear regression: w = (X^T X)^{-1} X^T y
  - logistic regression: no analytical solution
11
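A short sketch contrasting the two "Solution" rows above: linear regression has the closed-form normal-equation solution, while logistic regression must be fit iteratively (see the gradient-descent sketch after slide 16 below). Variable names are illustrative.

```python
import numpy as np

def linear_regression_fit(X, y):
    """Closed-form solution w = (X^T X)^{-1} X^T y.
    Solving the linear system is preferred over forming an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Logistic regression has no such closed form: w is obtained by an
# iterative optimizer such as gradient descent or Newton's method.
```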
Optimization procedure: gradient descent
12
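A generic gradient-descent loop, sketched to make the update rule w ← w − α∇J(w) concrete. The stopping rule, default learning rate, and names are illustrative choices, not values from the slides.

```python
import numpy as np

def gradient_descent(grad, w0, lr=0.1, n_iters=1000, tol=1e-6):
    """Repeat w <- w - lr * grad(w) for n_iters steps, or stop early
    once the gradient norm falls below tol (illustrative defaults)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iters):
        g = grad(w)
        if np.linalg.norm(g) < tol:
            break
        w = w - lr * g
    return w
```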
Effect of the learning rate
13
Optimization procedure: Newton’s method
Objective: minimize a loss function J(w)
14
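The slide's derivation is not reproduced here; as a minimal sketch, the standard Newton update for minimizing J(w) is w ← w − H(w)^{-1}∇J(w), where H is the Hessian of J. The callables and iteration count below are illustrative.

```python
import numpy as np

def newton_method(grad, hess, w0, n_iters=20):
    """Newton update: w <- w - H(w)^{-1} grad(w).
    grad and hess return the gradient vector and Hessian matrix of J(w)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iters):
        step = np.linalg.solve(hess(w), grad(w))   # solve H step = grad
        w = w - step
    return w
```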
Gradient descent for logistic regression
Objective: minimize the cross-entropy loss function (− log L(w)):
J(w) ≜ − log L(w) = − Σ_{i=1}^m (y_i log t_i + (1 − y_i) log(1 − t_i))
∇J(w) = Σ_{i=1}^m (t_i − y_i) x_i,  where t_i = σ(h_w(x_i)) = 1/(1 + e^{-w^T x_i})
16
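Putting the pieces together, a minimal batch gradient-descent sketch for logistic regression using the gradient ∇J(w) = Σ_i (t_i − y_i) x_i from this slide. The learning rate and iteration count are illustrative assumptions.

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.01, n_iters=1000):
    """Batch gradient descent on the cross-entropy loss.
    X: (m, d) data matrix, y: (m,) 0/1 labels.
    lr and n_iters are illustrative choices, not values from the slides."""
    _, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        t = 1.0 / (1.0 + np.exp(-(X @ w)))   # t_i = sigmoid(w^T x_i)
        grad = X.T @ (t - y)                 # sum_i (t_i - y_i) x_i
        w -= lr * grad
    return w
```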
Gradient descent vs. Newton’s method
17
Cross-entropy loss vs. squared loss
18
Regularized logistic regression
One can do regularization for logistic regression just like in the case of linear regression:
J(w) ≜ − log L(w) + (λ/2) ||w||_2^2
19
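Since the gradient of (λ/2)||w||_2^2 is λw, the only change to the earlier gradient-descent loop is one extra term. A sketch, with λ and the other hyperparameters chosen for illustration:

```python
import numpy as np

def fit_logistic_gd_l2(X, y, lam=0.1, lr=0.01, n_iters=1000):
    """Gradient descent on J(w) = -log L(w) + (lam/2) ||w||^2.
    Identical to the unregularized loop except for the lam * w term."""
    _, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        t = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (t - y) + lam * w       # gradient of the L2 penalty is lam * w
        w -= lr * grad
    return w
```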
Multi-class logistic regression
- For 2 classes:
  p(y = 1 | x; w) = 1/(1 + e^{-w^T x}) = e^{w^T x}/(1 + e^{w^T x})
20
Multi-class logistic regression
21
Multi-class logistic regression
22
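The slides' exact multi-class construction is not reproduced above; a common generalization of the two-class form is softmax (multinomial) logistic regression, p(y = k | x) = e^{w_k^T x} / Σ_j e^{w_j^T x}. A sketch of just the probability computation, with illustrative names:

```python
import numpy as np

def softmax_probs(W, x):
    """Softmax class probabilities p(y = k | x) = exp(w_k^T x) / sum_j exp(w_j^T x).
    W has one row of weights per class. Subtracting the max score is a
    standard numerical-stability trick, not something from the slides."""
    scores = W @ x
    scores -= scores.max()
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()
```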
Summary
23