
Lecture 06

Practical Machine Learning

Slides taken from Andrew Ng's course


Agenda
• Problems encountered when applying ML to
  real-world problems, and advice on how to
  tackle them
  – The problem of overfitting
    • Regularization
  – Model Selection
    • How to choose a hypothesis given multiple hypotheses
  – Bias-Variance Tradeoff
  – Precision-Recall

Recap
• Supervised Learning
– Decision Trees
– Linear Regression
– Logistic Regression
– Practical ML
Practical ML
The problem of overfitting
Machine Learning

Example: Linear regression (housing prices)
[Figure: three Price vs. Size fits, ranging from an underfit straight line, to a good quadratic fit, to an overfit high-order polynomial]

Overfitting: if we have too many features, the learned hypothesis
may fit the training set very well ($J(\theta) \approx 0$), but fail
to generalize to new examples (e.g., predicting prices for new houses).
Example: Logistic regression
[Figure: three decision boundaries in the (x1, x2) plane: underfit, just right, and overfit]
($h_\theta(x) = g(\theta^T x)$, where $g$ is the sigmoid function)
Practical ML
Regularization

Machine Learning
Intuition
[Figure: two Price vs. Size-of-house fits: a quadratic fit and a wiggly higher-order fit]

Suppose we penalize $\theta_3$ and $\theta_4$ and make them really small:

$\min_\theta \; \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$

With $\theta_3 \approx 0$ and $\theta_4 \approx 0$, the quartic hypothesis behaves almost like a quadratic.
Regularization

Small values for the parameters $\theta_1, \ldots, \theta_n$:
― "Simpler" hypothesis
― Less prone to overfitting

Housing:
― Features: $x_1, x_2, \ldots, x_n$
― Parameters: $\theta_0, \theta_1, \ldots, \theta_n$

Regularized cost function:

$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$
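To make the penalty concrete, here is a minimal NumPy sketch of this regularized cost. The function and variable names (regularized_cost, lam) are our own, and we follow the usual convention of leaving the intercept $\theta_0$ out of the penalty:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta) -- a sketch.

    X is an (m, n+1) design matrix whose first column is all ones,
    y is an (m,) target vector, and lam is the regularization
    strength lambda. theta[0] (the intercept) is not penalized.
    """
    m = len(y)
    residuals = X @ theta - y                          # h_theta(x) - y per example
    fit_term = (residuals @ residuals) / (2 * m)       # squared-error term
    reg_term = lam * np.sum(theta[1:] ** 2) / (2 * m)  # penalty on theta_1..theta_n
    return fit_term + reg_term
```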
Regularization

[Figure: Price vs. Size of house; the regularized fit is smoother than the unregularized high-order fit]

In regularized linear regression, we choose $\theta$ to minimize $J(\theta)$.

What if $\lambda$ is set to an extremely large value (perhaps too large
for our problem, say $\lambda = 10^{10}$)? All of $\theta_1, \ldots, \theta_n$ are penalized
toward zero, and the hypothesis underfits:

[Figure: Price vs. Size of house; with $\lambda$ too large the fit flattens to $h_\theta(x) \approx \theta_0$]
Practical ML
Regularized linear regression
Machine Learning

Regularized linear regression
Gradient descent
Repeat {
  $\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$
  $\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\,\theta_j\right]$   $(j = 1, \ldots, n)$
}
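A matching NumPy sketch of one update of this loop (names are ours; X follows the same design-matrix convention as the cost sketch above, so theta[0] is updated without the shrinkage term):

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update -- a sketch."""
    m = len(y)
    residuals = X @ theta - y
    grad = (X.T @ residuals) / m        # unregularized gradient, all j
    grad[1:] += (lam / m) * theta[1:]   # add (lambda/m) * theta_j for j >= 1
    return theta - alpha * grad
```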
Practical ML
Regularized logistic regression
Machine Learning

Regularized logistic regression
[Figure: a complex decision boundary in the (x1, x2) plane that regularization smooths out]

Cost function:

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

Gradient descent
Repeat {
  $\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$
  $\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\,\theta_j\right]$   $(j = 1, \ldots, n)$
}
The update looks identical to regularized linear regression's, but here $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.
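A hedged NumPy sketch of this cost (names are ours; eps is a small constant we add to avoid log(0), not something from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic-regression cost -- a sketch."""
    m = len(y)
    h = sigmoid(X @ theta)              # h_theta(x) for every example
    eps = 1e-12                         # numerical guard against log(0)
    nll = -(y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps)) / m
    reg = lam * np.sum(theta[1:] ** 2) / (2 * m)
    return nll + reg
```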
Practical ML
Model Selection
Machine Learning

Evaluating the hypothesis
Dataset (split it, e.g., ~70% of rows for training and the remaining ~30% for testing):

Size (feet²) | Price ($1000s)
2104         | 400
1600         | 330
2400         | 369
1416         | 232
3000         | 540
1985         | 300
1534         | 315
1427         | 199
1380         | 212
1494         | 243
Training/testing procedure for linear regression

- Learn parameter $\theta$ from the training data (by minimizing the
  training error $J_{train}(\theta)$)
- Compute the test set error:

$J_{test}(\theta) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_\theta(x_{test}^{(i)}) - y_{test}^{(i)}\right)^2$
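In code, the test error is the same squared-error formula evaluated on the held-out rows. A minimal sketch (the function name is ours):

```python
import numpy as np

def squared_error(theta, X, y):
    """J(theta) = (1/2m) * sum of squared residuals -- a sketch.

    Pass (X_train, y_train) for the training error or
    (X_test, y_test) for the test error J_test(theta).
    """
    m = len(y)
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * m)
```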
Model selection

Models trained using the training data
(e.g., polynomials of degree d = 1, 2, …, 10)

How do we choose the best model (the degree d of the polynomial
features, in this case)?
• Report the test set error $J_{test}(\theta)$?
• But if d itself is chosen to minimize the test set error, $J_{test}(\theta)$ is
  likely to be an optimistic estimate of the generalization error.

The same problem is encountered when trying to find the
optimal regularization parameter $\lambda$.
Evaluating your hypothesis
Dataset (now split three ways, e.g., ~60% training, ~20% cross-validation, ~20% test):

Size (feet²) | Price ($1000s)
2104         | 400
1600         | 330
2400         | 369
1416         | 232
3000         | 540
1985         | 300
1534         | 315
1427         | 199
1380         | 212
1494         | 243
Train/validation/test error
Training error:
$J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

Cross validation error:
$J_{cv}(\theta) = \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}\left(h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)}\right)^2$

Test error:
$J_{test}(\theta) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_\theta(x_{test}^{(i)}) - y_{test}^{(i)}\right)^2$
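With the three-way split, model selection becomes: fit each candidate degree on the training set, pick the degree with the lowest cross-validation error, and only then report $J_{test}(\theta)$ once as the generalization estimate. A hedged sketch (select_degree is our own name; np.polyfit/np.polyval perform the least-squares fit):

```python
import numpy as np

def select_degree(x_train, y_train, x_cv, y_cv, max_degree=10):
    """Pick the polynomial degree d with the lowest CV error -- a sketch."""
    best_d, best_err = 1, float("inf")
    for d in range(1, max_degree + 1):
        coeffs = np.polyfit(x_train, y_train, d)  # fit on training data only
        preds = np.polyval(coeffs, x_cv)          # evaluate on the CV set
        err = np.mean((preds - y_cv) ** 2) / 2    # J_cv for this degree
        if err < best_err:
            best_d, best_err = d, err
    return best_d
```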
Practical ML
Bias-Variance Tradeoff
Machine Learning

Bias: error from erroneous assumptions in the learning
algorithm (underfitting).

Variance: error from sensitivity to small fluctuations in the
training set. High variance can cause an algorithm to model
the random noise in the training data rather than the
intended output (overfitting).
Bias/variance
[Figure: three Price vs. Size fits: high bias (underfit), "just right", and high variance (overfit)]
Bias/variance
Training error:
$J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

Cross validation error:
$J_{cv}(\theta) = \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}\left(h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)}\right)^2$
Diagnosing bias vs. variance
Suppose your learning algorithm is performing less well than
you were hoping ($J_{cv}(\theta)$ or $J_{test}(\theta)$ is high). Is it a bias
problem or a variance problem?

[Figure: training error and cross-validation error plotted against the
degree of polynomial d: $J_{train}$ keeps decreasing as d grows, while
$J_{cv}$ first falls and then rises again]

Bias (underfit):
  $J_{train}(\theta)$ will be high
  $J_{train}(\theta) \approx J_{cv}(\theta)$

Variance (overfit):
  $J_{train}(\theta)$ will be low
  $J_{train}(\theta) \ll J_{cv}(\theta)$
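These two rules can be phrased as a crude programmatic check. In this sketch the function name, the target_error baseline, and the 2x gap ratio are all our own illustrative choices, not thresholds from the slides:

```python
def diagnose(j_train, j_cv, target_error=0.1):
    """Rough bias-vs-variance heuristic -- a sketch, not a recipe."""
    if j_train > target_error and j_cv <= 2 * j_train:
        return "high bias (underfit): J_train high and close to J_cv"
    if j_train <= target_error and j_cv > 2 * j_train:
        return "high variance (overfit): J_train low, J_cv much higher"
    return "unclear: compare both errors against your desired error level"
```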
Practical ML
Regularization and bias/variance
Machine Learning

Linear regression with regularization
Model: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$

[Figure: three Price vs. Size fits at different regularization strengths]
Large $\lambda$: high bias (underfit)  |  Intermediate $\lambda$: "just right"  |  Small $\lambda$: high variance (overfit)
Choosing the regularization parameter $\lambda$
Model:
$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$
$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

1. Try $\lambda = 0$
2. Try $\lambda = 0.01$
3. Try $\lambda = 0.02$
4. Try $\lambda = 0.04$
5. Try $\lambda = 0.08$
   ⋮
12. Try $\lambda \approx 10$ (roughly doubling at each step)

For each $\lambda$, minimize $J(\theta)$ on the training set, then keep the $\lambda$
whose $\theta$ gives the lowest cross-validation error $J_{cv}(\theta)$.
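A hedged sketch of this search, reusing the gradient_step and squared_error sketches from earlier; the learning rate alpha and the n_steps budget are arbitrary illustrative values:

```python
import numpy as np

# Doubling grid: 0, 0.01, 0.02, 0.04, ..., ~10.24 (12 values, as above).
lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]

def select_lambda(X_train, y_train, X_cv, y_cv, alpha=0.01, n_steps=5000):
    """Keep the lambda whose fitted theta has the lowest CV error -- a sketch."""
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        theta = np.zeros(X_train.shape[1])
        for _ in range(n_steps):                # minimize J(theta) for this lambda
            theta = gradient_step(theta, X_train, y_train, alpha, lam)
        err = squared_error(theta, X_cv, y_cv)  # CV error uses NO penalty term
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```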
Bias/variance as a function of the regularization parameter
[Figure: $J_{train}$ and $J_{cv}$ vs. $\lambda$: $J_{train}$ grows as $\lambda$ increases, while $J_{cv}$ is high for very small $\lambda$ (variance) and very large $\lambda$ (bias), with a sweet spot in between]
Debugging a learning algorithm:
Suppose you have implemented regularized linear regression to predict
housing prices. However, when you test your hypothesis on a new set of
houses, you find that it makes unacceptably large errors in its
predictions. What should you try next?

- Get more training examples      → fixes high variance
- Try smaller sets of features    → fixes high variance
- Try getting additional features → fixes high bias
- Try adding polynomial features  → fixes high bias
- Try decreasing $\lambda$        → fixes high bias
- Try increasing $\lambda$        → fixes high variance
Practical ML
Error metric for skewed classes: Precision/Recall
Machine Learning
Cancer classification example
Train a logistic regression model $h_\theta(x)$ ($y = 1$ if cancer,
$y = 0$ otherwise).
Find that you got 1% error on the test set
(99% correct diagnoses).

But only 0.50% of patients actually have cancer, so a classifier that
always predicts $y = 0$ would achieve just 0.5% error. With classes
this skewed, accuracy alone is a misleading metric.
Precision/Recall
($y = 1$ in the presence of the rare class that we want to detect)

                         Actual class
                         1                  0
Predicted class   1      True positive      False positive
                  0      False negative     True negative

Precision: of all patients where we predicted $y = 1$, what fraction
actually has cancer?
$\text{Precision} = \frac{\text{True pos.}}{\text{True pos.} + \text{False pos.}}$

Recall: of all patients that actually have cancer, what fraction did we
correctly detect as having cancer?
$\text{Recall} = \frac{\text{True pos.}}{\text{True pos.} + \text{False neg.}}$
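A small NumPy sketch of both metrics (the function name is ours; returning 0 when a denominator is zero is a common but arbitrary convention):

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall for the rare positive class -- a sketch.

    y_true and y_pred are NumPy arrays of 0/1 labels.
    """
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall
```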
Trading off precision and recall

Logistic regression: $0 \le h_\theta(x) \le 1$
Predict 1 if $h_\theta(x) \ge 0.5$
Predict 0 if $h_\theta(x) < 0.5$

Suppose we want to predict $y = 1$ (cancer) only if very confident
(set a high threshold, e.g., 0.9):
→ Higher precision, lower recall.

Suppose we want to avoid missing too many cases of cancer, i.e.,
avoid false negatives (set a low threshold, e.g., 0.3):
→ Higher recall, lower precision.

More generally: predict 1 if $h_\theta(x) \ge$ threshold.
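The trade-off can be seen directly by sweeping the threshold. A sketch reusing precision_recall from above (the threshold grid is our own choice; probs stands for the model outputs $h_\theta(x)$):

```python
import numpy as np

def pr_curve(y_true, probs, thresholds=(0.3, 0.5, 0.7, 0.9)):
    """(threshold, precision, recall) triples across thresholds -- a sketch."""
    curve = []
    for t in thresholds:
        y_pred = (probs >= t).astype(int)  # predict 1 only when confident enough
        curve.append((t, *precision_recall(y_true, y_pred)))
    return curve  # raising t typically trades recall away for precision
```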
F1 Score (F score):
$F_1 = 2\,\frac{PR}{P + R}$

How should we compare precision/recall numbers?

              Precision (P)   Recall (R)   Average   F1 Score
Algorithm 1   0.5             0.4          0.45      0.444
Algorithm 2   0.7             0.1          0.4       0.175
Algorithm 3   0.02            1.0          0.51      0.0392

The plain average $(P + R)/2$ would rank Algorithm 3 highest even though
it is nearly useless (e.g., predicting $y = 1$ for everyone); the F1 score,
a harmonic mean, correctly penalizes such extreme values.
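A one-liner reproduces the table's last column, which is one way to sanity-check the formula:

```python
def f1_score(p, r):
    """F1 = 2PR / (P + R): a harmonic mean, so one tiny value drags it down."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0

# Matches the table: f1_score(0.5, 0.4) -> 0.444...,
# f1_score(0.7, 0.1) -> 0.175, f1_score(0.02, 1.0) -> ~0.0392
```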


Summary
• Problems encountered when applying ML to
  real-world problems, and advice on how to
  tackle them
  – The problem of overfitting
    • Regularization
  – Model Selection
    • How to choose a hypothesis given multiple hypotheses
  – Bias-Variance Tradeoff
  – Precision-Recall
