
Cross-validation for detecting and preventing overfitting

Andrew W. Moore
Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
[email protected]
412-268-7599

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.

Copyright © Andrew W. Moore Slide 1


A Regression Problem

y = f(x) + noise
Can we learn f from this data?

[Scatterplot of noisy (x, y) data]

Let's consider three methods…

Copyright © Andrew W. Moore Slide 2


Linear Regression

Copyright © Andrew W. Moore Slide 3


Linear Regression
Univariate Linear regression with a constant term:

  X   Y          In matrix form:
  3   7          X = [ 3 ]    y = [ 7 ]
  1   3              [ 1 ]        [ 3 ]
  :   :              [ : ]        [ : ]

  x1=(3).. y1=7..

(Originally discussed in the previous Andrew lecture: "Neural Nets".)

Copyright © Andrew W. Moore Slide 4


Linear Regression
Univariate Linear regression with a constant term:

  X   Y          In matrix form:
  3   7          X = [ 3 ]    y = [ 7 ]
  1   3              [ 1 ]        [ 3 ]
  :   :              [ : ]        [ : ]

  x1=(3).. y1=7..

Add a constant column to form Z:

  Z = [ 1  3 ]    y = [ 7 ]
      [ 1  1 ]        [ 3 ]
      [ :  : ]        [ : ]

  z1=(1,3).. y1=7..    zk=(1,xk)

Copyright © Andrew W. Moore Slide 5


Linear Regression
Univariate Linear regression with a constant term:

  X   Y          In matrix form:
  3   7          X = [ 3 ]    y = [ 7 ]
  1   3              [ 1 ]        [ 3 ]
  :   :              [ : ]        [ : ]

  x1=(3).. y1=7..

  Z = [ 1  3 ]    y = [ 7 ]
      [ 1  1 ]        [ 3 ]
      [ :  : ]        [ : ]

  z1=(1,3).. y1=7..    zk=(1,xk)

  β = (ZᵀZ)⁻¹(Zᵀy)        yest = β0 + β1 x

Copyright © Andrew W. Moore Slide 6
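Not part of the original slides: a minimal numpy sketch of the recipe above, solving β = (ZᵀZ)⁻¹(Zᵀy) after prepending a constant-1 column. The synthetic data and variable names are invented for illustration.

```python
import numpy as np

# Toy data in the spirit of the slide: y = f(x) + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=20)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=20)

# Build Z by prepending a constant-1 column, so zk = (1, xk)
Z = np.column_stack([np.ones_like(x), x])

# beta = (Z^T Z)^{-1} (Z^T y)
beta = np.linalg.solve(Z.T @ Z, Z.T @ y)

y_est = Z @ beta            # y_est = beta0 + beta1 * x
print("beta0, beta1 =", beta)
```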


Quadratic Regression

Copyright © Andrew W. Moore Slide 7


Quadratic Regression
  X   Y          In matrix form:
  3   7          X = [ 3 ]    y = [ 7 ]
  1   3              [ 1 ]        [ 3 ]
  :   :              [ : ]        [ : ]

  x1=(3).. y1=7..

  Z = [ 1  3  9 ]    y = [ 7 ]
      [ 1  1  1 ]        [ 3 ]
      [ :  :  : ]        [ : ]

  z = (1, x, x²)      β = (ZᵀZ)⁻¹(Zᵀy)      yest = β0 + β1 x + β2 x²

(Much more about this in the future Andrew lecture: "Favorite Regression Algorithms".)

Copyright © Andrew W. Moore Slide 8
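Again not from the slides: the same normal-equation solve with the quadratic basis zk = (1, xk, xk²); only the construction of Z changes. Data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=20)
y = 1.0 - 0.5 * x + 0.3 * x**2 + rng.normal(scale=1.0, size=20)

# zk = (1, xk, xk^2); same normal-equation solve as before
Z = np.column_stack([np.ones_like(x), x, x**2])
beta = np.linalg.solve(Z.T @ Z, Z.T @ y)
y_est = Z @ beta            # beta0 + beta1*x + beta2*x^2
```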


Join-the-dots
Also known as piecewise linear nonparametric regression, if that makes you feel better.

Copyright © Andrew W. Moore Slide 9
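A possible one-liner version of join-the-dots, not from the slides: numpy's interp performs exactly this piecewise linear interpolation between sorted training points (with flat extrapolation at the ends, which is one of several reasonable conventions).

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, size=20))
y = np.sin(x) + rng.normal(scale=0.2, size=20)

# "Join-the-dots": predict by linear interpolation between neighbouring
# training points (flat extrapolation outside the observed range).
x_query = np.linspace(0, 10, 101)
y_pred = np.interp(x_query, x, y)
```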


Which is best?

[Two of the candidate fits plotted side by side on the same data]

Why not choose the method with the best fit to the data?

Copyright © Andrew W. Moore Slide 10


What do we really want?

[Two of the candidate fits plotted side by side on the same data]

Why not choose the method with the best fit to the data?

"How well are you going to predict future data drawn from the same distribution?"

Copyright © Andrew W. Moore Slide 11


The test set method

1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.

Copyright © Andrew W. Moore Slide 12


The test set method

1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.

(Linear regression example)

Copyright © Andrew W. Moore Slide 13


The test set method

1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.

(Linear regression example)   Mean Squared Error = 2.4
Copyright © Andrew W. Moore Slide 14
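Not from the slides: a short sketch of steps 1-4 with a linear regressor, on invented synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=50)

# 1-2. Randomly put 30% of the points in a test set, the rest in a training set.
idx = rng.permutation(len(x))
n_test = int(0.3 * len(x))
test, train = idx[:n_test], idx[n_test:]

# 3. Perform the regression on the training set only.
Z = np.column_stack([np.ones(train.size), x[train]])
beta = np.linalg.lstsq(Z, y[train], rcond=None)[0]

# 4. Estimate future performance with the test-set mean squared error.
y_hat = beta[0] + beta[1] * x[test]
mse_test = np.mean((y[test] - y_hat) ** 2)
print("test-set MSE:", mse_test)
```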
The test set method

1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.

(Quadratic regression example)   Mean Squared Error = 0.9
Copyright © Andrew W. Moore Slide 15
The test set method

1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.

(Join the dots example)   Mean Squared Error = 2.2
Copyright © Andrew W. Moore Slide 16
The test set method
Good news:
• Very very simple
• Can then simply choose the method with the best test-set score
Bad news:
• What's the downside?

Copyright © Andrew W. Moore Slide 17


The test set method
Good news:
• Very very simple
• Can then simply choose the method with the best test-set score
Bad news:
• Wastes data: we get an estimate of the best method to apply to 30% less data
• If we don't have much data, our test set might just be lucky or unlucky

(We say the "test-set estimator of performance has high variance".)
Copyright © Andrew W. Moore Slide 18


LOOCV (Leave-one-out Cross Validation)
For k=1 to R
1. Let (xk,yk) be the kth record

Copyright © Andrew W. Moore Slide 19


LOOCV (Leave-one-out Cross Validation)
For k=1 to R
1. Let (xk,yk) be the kth record
2. Temporarily remove (xk,yk)
from the dataset

Copyright © Andrew W. Moore Slide 20


LOOCV (Leave-one-out Cross Validation)
For k=1 to R
1. Let (xk,yk) be the kth record
2. Temporarily remove (xk,yk) from the dataset
3. Train on the remaining R-1 datapoints

Copyright © Andrew W. Moore Slide 21


LOOCV (Leave-one-out Cross Validation)
For k=1 to R
1. Let (xk,yk) be the kth record
2. Temporarily remove (xk,yk) from the dataset
3. Train on the remaining R-1 datapoints
4. Note your error (xk,yk)

Copyright © Andrew W. Moore Slide 22


LOOCV (Leave-one-out Cross Validation)
For k=1 to R
1. Let (xk,yk) be the kth record
2. Temporarily remove (xk,yk) from the dataset
3. Train on the remaining R-1 datapoints
4. Note your error (xk,yk)
When you've done all points, report the mean error.

Copyright © Andrew W. Moore Slide 23
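Not from the slides: a direct transcription of the LOOCV loop above for the linear-regression case, on invented synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=30)

errors = []
R = len(x)
for k in range(R):
    # 2. Temporarily remove (xk, yk) from the dataset
    mask = np.arange(R) != k
    # 3. Train (here: linear regression) on the remaining R-1 datapoints
    Z = np.column_stack([np.ones(R - 1), x[mask]])
    beta = np.linalg.lstsq(Z, y[mask], rcond=None)[0]
    # 4. Note your squared error on the held-out point (xk, yk)
    errors.append((y[k] - (beta[0] + beta[1] * x[k])) ** 2)

mse_loocv = np.mean(errors)   # mean error over all R points
```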


LOOCV (Leave-one-out Cross Validation)
For k=1 to R
1. Let (xk,yk) be the kth record
2. Temporarily remove (xk,yk) from the dataset
3. Train on the remaining R-1 datapoints
4. Note your error (xk,yk)
When you've done all points, report the mean error.

[Grid of panels: the linear fit obtained with each datapoint left out in turn]

MSELOOCV = 2.12
Copyright © Andrew W. Moore Slide 24


LOOCV for Quadratic Regression
For k=1 to R
1. Let (xk,yk) be the kth record
2. Temporarily remove (xk,yk) from the dataset
3. Train on the remaining R-1 datapoints
4. Note your error (xk,yk)
When you've done all points, report the mean error.

[Grid of panels: the quadratic fit obtained with each datapoint left out in turn]

MSELOOCV = 0.962
Copyright © Andrew W. Moore Slide 25


LOOCV for Join The Dots
For k=1 to R
1. Let (xk,yk) be the kth record
2. Temporarily remove (xk,yk) from the dataset
3. Train on the remaining R-1 datapoints
4. Note your error (xk,yk)
When you've done all points, report the mean error.

[Grid of panels: the join-the-dots fit obtained with each datapoint left out in turn]

MSELOOCV = 3.33
Copyright © Andrew W. Moore Slide 26


Which kind of Cross Validation?
              Downside                                    Upside
Test-set      Variance: unreliable estimate of            Cheap
              future performance
Leave-one-out Expensive.                                  Doesn't waste data
              Has some weird behavior

..can we get the best of both worlds?

Copyright © Andrew W. Moore Slide 27


k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions colored Red, Green and Blue).

Copyright © Andrew W. Moore Slide 28


k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions colored Red, Green and Blue).
For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

Copyright © Andrew W. Moore Slide 29


k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions colored Red, Green and Blue).
For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.
For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

Copyright © Andrew W. Moore Slide 30


k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions colored Red, Green and Blue).
For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.
For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.
For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.

Copyright © Andrew W. Moore Slide 31


k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions colored Red, Green and Blue).
For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.
For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.
For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.
Then report the mean error.

Linear Regression: MSE3FOLD = 2.05

Copyright © Andrew W. Moore Slide 32
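Not from the slides: a sketch of the k=3 procedure (random partitions, per-partition test-set sum of errors, then the mean), again with a linear regressor on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=30)

k = 3
folds = rng.permutation(len(x)) % k      # randomly assign each point to a partition

sum_sq_err, n_points = 0.0, 0
for fold in range(k):
    test = folds == fold                 # e.g. the "red" partition
    train = ~test
    Z = np.column_stack([np.ones(train.sum()), x[train]])
    beta = np.linalg.lstsq(Z, y[train], rcond=None)[0]
    y_hat = beta[0] + beta[1] * x[test]
    sum_sq_err += np.sum((y[test] - y_hat) ** 2)   # test-set sum of errors on this partition
    n_points += test.sum()

mse_kfold = sum_sq_err / n_points        # then report the mean error
```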


k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions colored Red, Green and Blue).
For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.
For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.
For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.
Then report the mean error.

Quadratic Regression: MSE3FOLD = 1.11

Copyright © Andrew W. Moore Slide 33


k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k=3 partitions colored Red, Green and Blue).
For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.
For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.
For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.
Then report the mean error.

Join-the-dots: MSE3FOLD = 2.93

Copyright © Andrew W. Moore Slide 34


Which kind of Cross Validation?
              Downside                                        Upside
Test-set      Variance: unreliable estimate of                Cheap
              future performance
Leave-one-out Expensive.                                      Doesn't waste data
              Has some weird behavior
10-fold       Wastes 10% of the data.                         Only wastes 10%. Only 10 times more
              10 times more expensive than test set           expensive instead of R times.
3-fold        Wastier than 10-fold.                           Slightly better than test-set
              Expensivier than test set
R-fold        Identical to Leave-one-out

Copyright © Andrew W. Moore Slide 35


Which kind of Cross Validation?
              Downside                                        Upside
Test-set      Variance: unreliable estimate of                Cheap
              future performance
Leave-one-out Expensive.                                      Doesn't waste data
              Has some weird behavior
10-fold       Wastes 10% of the data.                         Only wastes 10%. Only 10 times more
              10 times more expensive than test set           expensive instead of R times.
3-fold        Wastier than 10-fold.                           Slightly better than test-set
              Expensivier than test set
R-fold        Identical to Leave-one-out

(But note: one of Andrew's joys in life is algorithmic tricks for making these cheap.)

Copyright © Andrew W. Moore Slide 36


CV-based Model Selection
• We’re trying to decide which algorithm to use.
• We train each machine and make a table…

i   fi   TRAINERR   10-FOLD-CV-ERR   Choice
1   f1
2   f2
3   f3                               ✓
4   f4
5   f5
6   f6
Copyright © Andrew W. Moore Slide 37
CV-based Model Selection
• Example: Choosing number of hidden units in a one-
hidden-layer neural net.
• Step 1: Compute 10-fold CV error for six different model
classes:
Algorithm        TRAINERR   10-FOLD-CV-ERR   Choice
0 hidden units
1 hidden unit
2 hidden units                               ✓
3 hidden units
4 hidden units
5 hidden units
• Step 2: Whichever model class gave best CV score: train it


with all the data, and that’s the predictive model you’ll use.
Copyright © Andrew W. Moore Slide 38
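Not from the slides: a sketch of filling in such a table. To keep it short it uses polynomial degree 0-5 as the model-class knob rather than the number of hidden units; the synthetic data, the fold assignment and the fit_predict helper are all invented for illustration. Whichever row gives the best CV score would then be retrained on all the data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=60)
y = np.sin(x) + rng.normal(scale=0.3, size=60)

def fit_predict(deg, x_tr, y_tr, x_te):
    # Polynomial regression of the given degree via least squares.
    coeffs = np.polyfit(x_tr, y_tr, deg)
    return np.polyval(coeffs, x_te)

folds = rng.permutation(len(x)) % 10             # 10-fold assignment
print("degree  TRAINERR  10-FOLD-CV-ERR")
for deg in range(6):                             # six candidate model classes
    train_err = np.mean((y - fit_predict(deg, x, y, x)) ** 2)
    cv_sq_err = 0.0
    for f in range(10):
        te = folds == f
        pred = fit_predict(deg, x[~te], y[~te], x[te])
        cv_sq_err += np.sum((y[te] - pred) ** 2)
    print(f"{deg:6d}  {train_err:8.3f}  {cv_sq_err / len(x):14.3f}")
```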
CV-based Model Selection
• Example: Choosing “k” for a k-nearest-neighbor regression.
• Step 1: Compute LOOCV error for six different model
classes:
Algorithm   TRAINERR   LOOCV-ERR   Choice
K=1
K=2
K=3
K=4                                ✓
K=5
K=6

• Step 2: Whichever model class gave best CV score: train it


with all the data, and that’s the predictive model you’ll use.
Copyright © Andrew W. Moore Slide 39
CV-based Model Selection
• Example: Choosing "k" for a k-nearest-neighbor regression.
• Step 1: Compute LOOCV error for six different model classes:

Algorithm   TRAINERR   LOOCV-ERR   Choice
K=1
K=2
K=3
K=4                                ✓
K=5
K=6

Q: Why did we use 10-fold-CV for neural nets and LOOCV for k-nearest neighbor?
A: The reason is computational. For k-NN (and all other nonparametric methods) LOOCV happens to be as cheap as regular predictions.

Q: And why stop at K=6?
A: No good reason, except it looked like things were getting worse as K was increasing.

Q: Are we guaranteed that a local optimum of K vs LOOCV will be the global optimum?
A: Sadly, no. And in fact, the relationship can be very bumpy.

Q: What should we do if we are depressed at the expense of doing LOOCV for K=1 through 1000?
A: Idea One: K=1, K=2, K=4, K=8, K=16, K=32, K=64 … K=1024.
   Idea Two: Hillclimbing from an initial guess at K.

• Step 2: Whichever model class gave best CV score: train it with all the data, and that's the predictive model you'll use.
Copyright © Andrew W. Moore Slide 40
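Not from the slides: a sketch of why LOOCV is cheap for k-NN regression. No retraining is needed; each point is simply excluded from its own neighbour list, so one pass over a precomputed distance matrix gives the full LOOCV error. Data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=40)
y = np.sin(x) + rng.normal(scale=0.2, size=40)

def loocv_knn_mse(k):
    # Distances between every pair of points, computed once.
    d = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(d, np.inf)           # a point may not be its own neighbour
    # For each point, average the y-values of its k nearest *other* points.
    nn = np.argsort(d, axis=1)[:, :k]
    preds = y[nn].mean(axis=1)
    return np.mean((y - preds) ** 2)

for k in [1, 2, 4, 8, 16]:                # exponentially spaced candidates
    print(k, loocv_knn_mse(k))
```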
CV-based Model Selection
• Can you think of other decisions we can ask Cross
Validation to make for us, based on other machine learning
algorithms in the class so far?

Copyright © Andrew W. Moore Slide 41


CV-based Model Selection
• Can you think of other decisions we can ask Cross
Validation to make for us, based on other machine learning
algorithms in the class so far?
• Degree of polynomial in polynomial regression
• Whether to use full, diagonal or spherical Gaussians in a Gaussian
Bayes Classifier.
• The Kernel Width in Kernel Regression
• The Kernel Width in Locally Weighted Regression
• The Bayesian Prior in Bayesian Regression

These involve
choosing the value of a
real-valued parameter.
What should we do?

Copyright © Andrew W. Moore Slide 42


CV-based Model Selection
• Can you think of other decisions we can ask Cross
Validation to make for us, based on other machine learning
algorithms in the class so far?
• Degree of polynomial in polynomial regression
• Whether to use full, diagonal or spherical Gaussians in a Gaussian
Bayes Classifier.
• The Kernel Width in Kernel Regression
• The Kernel Width in Locally Weighted Regression
• The Bayesian Prior in Bayesian Regression

These involve choosing the value of a real-valued parameter. What should we do?

Idea One: Consider a discrete set of values (often best to consider a set of values with exponentially increasing gaps, as in the K-NN example).

Idea Two: Compute ∂LOOCV/∂Parameter and then do gradient descent.
Copyright © Andrew W. Moore Slide 43
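Not from the slides: Idea One applied to a real-valued parameter, here the kernel width of a Nadaraya-Watson style kernel regressor (my choice of example), scored by LOOCV over an exponentially spaced grid. The data and grid are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=40)
y = np.sin(x) + rng.normal(scale=0.2, size=40)

def loocv_kernel_regression(width):
    # Gaussian kernel weights between all pairs of points.
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
    np.fill_diagonal(w, 0.0)              # leave each point out of its own prediction
    preds = (w @ y) / (w.sum(axis=1) + 1e-12)
    return np.mean((y - preds) ** 2)

# Idea One: a discrete set of widths with exponentially increasing gaps.
for width in [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]:
    print(width, loocv_kernel_regression(width))
```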


CV-based Model Selection
• Can you think of other decisions we can ask Cross
Validation to make for us, based on other machine learning
algorithms in the class so far?
• Degree of polynomial in polynomial regression
• Whether to use full, diagonal or spherical Gaussians in a Gaussian
Bayes Classifier.
• The Kernel Width in Kernel Regression
• The Kernel Width in Locally Weighted Regression
• The Bayesian Prior in Bayesian Regression
• Also: the scale factors of a non-parametric distance metric

These involve choosing the value of a real-valued parameter. What should we do?

Idea One: Consider a discrete set of values (often best to consider a set of values with exponentially increasing gaps, as in the K-NN example).

Idea Two: Compute ∂LOOCV/∂Parameter and then do gradient descent.

Copyright © Andrew W. Moore Slide 44


CV-based Algorithm Choice
• Example: Choosing which regression algorithm to use
• Step 1: Compute 10-fold-CV error for six different model
classes:

Algorithm      TRAINERR   10-fold-CV-ERR   Choice
1-NN
10-NN
Linear Reg'n
Quad reg'n                                 ✓
LWR, KW=0.1
LWR, KW=0.5

• Step 2: Whichever algorithm gave best CV score: train it


with all the data, and that’s the predictive model you’ll use.
Copyright © Andrew W. Moore Slide 45
Alternatives to CV-based model selection
• Model selection methods:
1. Cross-validation
2. AIC (Akaike Information Criterion)
3. BIC (Bayesian Information Criterion)
4. VC-dimension (Vapnik-Chervonenkis Dimension)

(VC-dimension is only directly applicable to choosing classifiers. Described in a future lecture.)

Copyright © Andrew W. Moore Slide 46


Which model selection method is best?
1. (CV) Cross-validation
2. AIC (Akaike Information Criterion)
3. BIC (Bayesian Information Criterion)
4. (SRMVC) Structural Risk Minimization with VC-dimension
• AIC, BIC and SRMVC advantage: you only need the training
error.
• CV error might have more variance
• SRMVC is wildly conservative
• Asymptotically AIC and Leave-one-out CV should be the same
• Asymptotically BIC and carefully chosen k-fold should be same
• You want BIC if you want the best structure instead of the best
predictor (e.g. for clustering or Bayes Net structure finding)
• Many alternatives, including proper Bayesian approaches.
• It’s an emotional issue.
Copyright © Andrew W. Moore Slide 47
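For reference (not stated on the slide): the usual definitions are AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂, where k is the number of free parameters, n the number of training points, and L̂ the maximized training likelihood. This is why these criteria need only the training fit plus a complexity penalty, with no held-out data.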
Other Cross-validation issues
• Can do "leave-all-pairs-out" or "leave-all-n-tuples-out" if feeling resourceful.
• Some folks do k-folds in which each fold is
an independently-chosen subset of the data
• Do you know what AIC and BIC are?
If so…
• LOOCV behaves like AIC asymptotically.
• k-fold behaves like BIC if you choose k carefully
If not…
• Nyardely nyardely nyoo nyoo
Copyright © Andrew W. Moore Slide 48
Cross-Validation for regression
• Choosing the number of hidden units in a
neural net
• Feature selection (see later)
• Choosing a polynomial degree
• Choosing which regressor to use

Copyright © Andrew W. Moore Slide 49


Supervising Gradient Descent
• This is a weird but common use of Test-set
validation
• Suppose you have a neural net with too
many hidden units. It will overfit.
• As gradient descent progresses, maintain a
graph of MSE-testset-error vs. Iteration
[Graph: Mean Squared Error vs. Iteration of Gradient Descent, with a Training Set curve and a Test Set curve. An arrow at the minimum of the Test Set curve says "Use the weights you found on this iteration".]

Copyright © Andrew W. Moore Slide 50
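Not from the slides: an early-stopping sketch. As an assumption it substitutes a deliberately over-parameterised linear model trained by gradient descent for the too-large neural net; the data, learning rate and iteration count are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
# Over-parameterised setting: 50 training points, 40 features, only 3 of them useful.
X = rng.normal(size=(80, 40))
w_true = np.zeros(40)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + rng.normal(scale=0.5, size=80)
Xtr, ytr, Xte, yte = X[:50], y[:50], X[50:], y[50:]

w = np.zeros(40)
best_w, best_test_mse = w.copy(), np.inf
lr = 0.001
for it in range(2000):
    grad = 2 * Xtr.T @ (Xtr @ w - ytr) / len(ytr)   # gradient of training MSE
    w -= lr * grad
    test_mse = np.mean((Xte @ w - yte) ** 2)
    if test_mse < best_test_mse:                    # keep the weights from the best iteration
        best_test_mse, best_w = test_mse, w.copy()

print("early-stopped test MSE:", best_test_mse)
```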


Supervising Gradient Descent
• This is a weird but common use of Test-set validation
• Suppose you have a neural net with too many hidden units. It will overfit.
• As gradient descent progresses, maintain a graph of MSE-testset-error vs. Iteration

(Relies on an intuition that a not-fully-minimized set of weights is somewhat like having fewer parameters. Works pretty well in practice, apparently.)

[Graph: Mean Squared Error vs. Iteration of Gradient Descent, with a Training Set curve and a Test Set curve. An arrow at the minimum of the Test Set curve says "Use the weights you found on this iteration".]

Copyright © Andrew W. Moore Slide 51


Cross-validation for classification
• Instead of computing the sum squared
errors on a test set, you should compute…

Copyright © Andrew W. Moore Slide 52


Cross-validation for classification
• Instead of computing the sum squared
errors on a test set, you should compute…
The total number of misclassifications on
a testset.

Copyright © Andrew W. Moore Slide 53


Cross-validation for classification
• Instead of computing the sum squared
errors on a test set, you should compute…
The total number of misclassifications on
a testset.
• What’s LOOCV of 1-NN?
• What’s LOOCV of 3-NN?
• What’s LOOCV of 22-NN?

Copyright © Andrew W. Moore Slide 54


Cross-validation for classification
• Instead of computing the sum squared
errors on a test set, you should compute…
The total number of misclassifications on
a testset.
• But there’s a more sensitive alternative:
Compute
log P(all test outputs|all test inputs, your model)

Copyright © Andrew W. Moore Slide 55
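Not from the slides: the two classification scores side by side, for a hypothetical classifier that outputs P(class = 1 | x) on a tiny made-up test set.

```python
import numpy as np

# Hypothetical test set: true labels and the model's predicted P(class=1 | x).
y_test = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p1 = np.array([0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.8, 0.45])

# Score 1: total number of misclassifications on the test set.
y_hat = (p1 >= 0.5).astype(int)
n_misclassified = int(np.sum(y_hat != y_test))

# Score 2 (more sensitive): log P(all test outputs | all test inputs, model).
log_lik = np.sum(np.where(y_test == 1, np.log(p1), np.log(1 - p1)))

print(n_misclassified, log_lik)
```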


Cross-Validation for classification
• Choosing the pruning parameter for decision
trees
• Feature selection (see later)
• What kind of Gaussian to use in a Gaussian-
based Bayes Classifier
• Choosing which classifier to use

Copyright © Andrew W. Moore Slide 56


Cross-Validation for density
estimation
• Compute the sum of log-likelihoods of test
points
Example uses:
• Choosing what kind of Gaussian assumption
to use
• Choose the density estimator
• NOT Feature selection (testset density will
almost always look better with fewer
features)
Copyright © Andrew W. Moore Slide 57
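Not from the slides: a sketch of scoring a density estimator (here a single 1-D Gaussian, chosen for brevity) by the sum of log-likelihoods of held-out points.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=100)
train, test = data[:70], data[70:]

# Fit a 1-D Gaussian density estimator to the training points.
mu, sigma = train.mean(), train.std()

# Score: sum of log-likelihoods of the test points under the fitted density.
log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                 - (test - mu) ** 2 / (2 * sigma**2))
print("held-out log-likelihood:", log_lik)
```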
Feature Selection
• Suppose you have a learning algorithm LA
and a set of input attributes { X1 , X2 .. Xm }
• You expect that LA will only find some
subset of the attributes useful.
• Question: How can we use cross-validation
to find a useful subset?
• Four ideas (a forward-selection sketch follows below):
  • Forward selection
  • Backward elimination
  • Hill Climbing
  • Stochastic search (Simulated Annealing or GAs)

(Another fun area in which Andrew has spent a lot of his wild youth.)
Copyright © Andrew W. Moore Slide 58
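Not from the slides: a sketch of forward selection driven by the k-fold CV error of linear regression. The synthetic data, the round-robin fold assignment and the stopping rule (stop when no attribute improves the CV error) are my own simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 60, 8
X = rng.normal(size=(n, m))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=n)   # only two useful attributes

def cv_mse(cols, k=5):
    # k-fold CV error of linear regression using only the chosen attribute subset.
    folds = np.arange(n) % k          # round-robin folds, for simplicity
    sq = 0.0
    for f in range(k):
        te = folds == f
        Z = np.column_stack([np.ones((~te).sum()), X[~te][:, cols]])
        beta = np.linalg.lstsq(Z, y[~te], rcond=None)[0]
        Zt = np.column_stack([np.ones(te.sum()), X[te][:, cols]])
        sq += np.sum((y[te] - Zt @ beta) ** 2)
    return sq / n

selected, best = [], cv_mse([])       # start from the intercept-only model
while True:
    candidates = [c for c in range(m) if c not in selected]
    if not candidates:
        break
    scores = {c: cv_mse(selected + [c]) for c in candidates}
    c_best = min(scores, key=scores.get)
    if scores[c_best] >= best:        # stop when no attribute improves CV error
        break
    selected.append(c_best)
    best = scores[c_best]

print("selected attributes:", selected)
```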
Very serious warning
• Intensive use of cross validation can overfit.
• How?

• What can be done about it?

Copyright © Andrew W. Moore Slide 59


Very serious warning
• Intensive use of cross validation can overfit.
• How?
• Imagine a dataset with 50 records and 1000
attributes.
• You try 1000 linear regression models, each one
using one of the attributes.

• What can be done about it?

Copyright © Andrew W. Moore Slide 60


Very serious warning
• Intensive use of cross validation can overfit.
• How?
• Imagine a dataset with 50 records and 1000
attributes.
• You try 1000 linear regression models, each one
using one of the attributes.
• The best of those 1000 looks good!

• What can be done about it?

Copyright © Andrew W. Moore Slide 61


Very serious warning
• Intensive use of cross validation can overfit.
• How?
• Imagine a dataset with 50 records and 1000 attributes.
• You try 1000 linear regression models, each one using
one of the attributes.
• The best of those 1000 looks good!
• But you realize it would have looked good even if the
output had been purely random!
• What can be done about it?
• Hold out an additional testset before doing any model
selection. Check the best model performs well even on
the additional testset.
• Or: Randomization Testing
Copyright © Andrew W. Moore Slide 62
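Not from the slides: a small simulation of the warning above. With 50 records, 1000 random attributes and a purely random output, the best of 1000 single-attribute models still shows a sizeable correlation on the data used to select it, while an untouched extra test set exposes it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 1000
X = rng.normal(size=(n, m))
y = rng.normal(size=n)                 # output is purely random noise

# Try 1000 single-attribute linear models; score each by |correlation| with y.
corrs = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(m)])
best_j = int(np.argmax(corrs))
print("best single attribute's |correlation| with random y:", corrs[best_j])

# Check on an additional, untouched test set: the "winner" no longer looks good.
X_new = rng.normal(size=(n, m))
y_new = rng.normal(size=n)
print("same attribute on fresh data:", abs(np.corrcoef(X_new[:, best_j], y_new)[0, 1]))
```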
What you should know
• Why you can’t use “training-set-error” to
estimate the quality of your learning
algorithm on your data.
• Why you can’t use “training set error” to
choose the learning algorithm
• Test-set cross-validation
• Leave-one-out cross-validation
• k-fold cross-validation
• Feature selection methods
• CV for classification, regression & densities
Copyright © Andrew W. Moore Slide 63
