Lecture 9 - Evaluations
•Underfitting: high bias, low variance.
 Remedy: increase the number of features or the complexity of the model.
•Overfitting: low bias, high variance.
 Remedy: get more training data, or reduce the number of features or the complexity of the model.
Bias
•Definition of Bias
•Inability of the model to predict accurately.
•Difference or error between predicted values and actual values.
•Bias as Systematic Error
•Caused by wrong assumptions in the machine learning
process.
•Mathematical Representation (see the formula after this list)
• Low Bias
• Fewer assumptions in target function construction.
• Model closely matches the training dataset.
• High Bias
• More assumptions lead to poor model fit.
• Results in underfitting and high error rates.
• Example of High Bias
• Linear regression model for non-linear data.
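A standard mathematical form of this notion (a sketch; the slide's original formula is not preserved here), where $\hat{f}(x)$ is the model's prediction and $f(x)$ the true target function:

$$\text{Bias}(x) = \mathbb{E}\big[\hat{f}(x)\big] - f(x)$$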
Reducing High Bias in Machine Learning
•Use a more complex model or add more features (the remedy for underfitting noted above).
Variance
•Definition of Variance
•Measure of spread in data from its mean.
•Indicates how the model's performance varies with
different training data subsets.
•Sensitivity to Training Data
•Variance reflects how much the model adjusts to new
subsets of data.
•Mathematical Representation (see the formula below)
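A standard mathematical form of this notion (again a sketch, since the slide's formula is not preserved):

$$\text{Variance}(x) = \mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^{2}\Big]$$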
Types of Variance Errors
•Low Variance
•Model is less sensitive to training data changes.
•Produces consistent estimates across subsets.
•Often leads to underfitting (poor generalization).
•High Variance
•Model is very sensitive to training data changes.
•Significant changes in estimates based on subsets.
•Results in overfitting (good performance on training data
but poor on unseen data).
Ways to Reduce Variance
•Cross-Validation
•Splitting data into training/testing sets to identify overfitting/underfitting.
•Feature Selection
•Choosing only relevant features to decrease model complexity.
•Regularization
•Applying L1 or L2 regularization to reduce variance in models.
•Ensemble Methods
•Combining multiple models (e.g., bagging, boosting) for better
generalization.
•Simplifying the Model
•Reducing complexity, such as decreasing parameters or layers in neural
networks.
•Early Stopping
•Halting training when performance on validation stops improving to
prevent overfitting.
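As a rough illustration of two of these ideas (not from the slides), the sketch below compares a plain linear model with an L2-regularized model and a bagged ensemble, using cross-validation to expose the variance of each; the dataset, model choices, and hyperparameters are illustrative assumptions.

```python
# Sketch: comparing variance-reduction techniques via cross-validation.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] * 3.0 + rng.normal(scale=2.0, size=200)   # noisy target

for name, model in [
    ("plain linear regression", LinearRegression()),
    ("L2-regularized (Ridge)", Ridge(alpha=10.0)),                        # regularization
    ("bagged trees", BaggingRegressor(n_estimators=50, random_state=0)),  # ensemble
]:
    # The spread of fold scores is a rough indicator of variance.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:26s} R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```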
Evaluation on “LARGE” data
Model Evaluation Step 1:
Split the data into a training set and a testing set.
Model Evaluation Step 2:
Build a model on a training set
[Figure: the Model Builder learns from the training set, whose results (labels) are already known ("the past"); the testing set is held aside.]
Model Evaluation Step 3:
Evaluate on test set
[Figure: the trained model produces predictions (Y/N) for the testing set; because the test results are also known, the predictions can be compared against them to evaluate the model.]
A note on parameter tuning
• It is important that the test data is not used in any way to build the
model
• Some learning schemes operate in two stages:
• Stage 1: builds the basic structure
• Stage 2: optimizes parameter settings
• The test data can’t be used for parameter tuning!
• Proper procedure uses three sets: training data, validation data, and
test data
• Validation data is used to optimize parameters
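A minimal sketch of the three-set procedure (the split proportions, model, and parameter grid are illustrative assumptions, not from the lecture):

```python
# Sketch: tune a parameter on the validation set, touch the test set only once.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Carve off the test set first, then split the rest into training/validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)  # 60/20/20 overall

best_c, best_score = None, -1.0
for c in [0.01, 0.1, 1.0, 10.0]:                 # candidate parameter values
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_val, y_val)            # parameters tuned on validation data
    if score > best_score:
        best_c, best_score = c, score

# Refit with the chosen parameter and evaluate on the untouched test set.
final = LogisticRegression(C=best_c, max_iter=1000).fit(X_trainval, y_trainval)
print("chosen C:", best_c, " test accuracy:", final.score(X_test, y_test))
```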
Evaluation on “small” data, 1
• The holdout method reserves a certain amount for testing and uses
the remainder for training
• Usually: one third for testing, the rest for training
• For “unbalanced” datasets, samples might not be representative
• Few or no instances of some classes
• Stratified sampling: a more refined way of balancing the data
• Make sure that each class is represented with approximately equal
proportions in both subsets
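A small sketch of a stratified holdout split on an imbalanced dataset (the data and split fraction are assumptions for illustration):

```python
# Sketch: stratified holdout keeps class proportions similar in both subsets.
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# An imbalanced two-class dataset (roughly 90% / 10%).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, stratify=y, random_state=0)   # one third for testing

print("train class counts:", Counter(y_train))
print("test class counts: ", Counter(y_test))
```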
Evaluation on “small” data, 2
• Divide the data into several groups (folds)
• Hold aside one group for testing and use the rest to build the model
• Repeat, holding aside each group in turn
More on cross-validation
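The previous slide's hold-aside-one-group-and-repeat procedure is k-fold cross-validation; a brief sketch follows (the dataset and classifier are illustrative assumptions):

```python
# Sketch: 10-fold cross-validation. Each fold is held out once for testing
# while the model is built on the remaining folds; the scores are averaged.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=15, random_state=0)

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("fold accuracies:", scores.round(3))
print("mean accuracy:  ", scores.mean().round(3))
```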
Precision and Recall

                 Relevant    Not relevant
Retrieved           tp            fp
Not retrieved       fn            tn
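In terms of the table above, the standard definitions are

$$\text{Precision} = \frac{tp}{tp + fp}, \qquad \text{Recall} = \frac{tp}{tp + fn}$$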
Precision/Recall : Example
                      Actual Positive   Actual Negative
Predicted Positive           1                  1
Predicted Negative           8                 90
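Applying the definitions above: precision $= 1/(1+1) = 0.5$ and recall $= 1/(1+8) \approx 0.11$.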
Recall-Precision Graph
[Figure: a recall-precision curve; some recall levels have multiple precision values.]
Interpolation

Recall                  0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
Interpolated Precision  1.0   1.0   1.0   0.67  0.67  0.5   0.5   0.5   0.5   0.5   0.5
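The values in this table are consistent with the standard interpolation rule: the interpolated precision at recall level $r$ is the maximum precision observed at any recall $r' \ge r$, i.e. $P_{\text{interp}}(r) = \max_{r' \ge r} P(r')$.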
Recap: Confusion matrix
                      Actual: Positive   Actual: Negative
Predicted: Positive          tp                 fp
Predicted: Negative          fn                 tn
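From this table, accuracy (used in the comparison below) is the fraction of correct predictions:

$$\text{Accuracy} = \frac{tp + tn}{tp + fp + fn + tn}$$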
Model 1
             Predicted: P   Predicted: N
Actual: P        150            40 (FN)
Actual: N         60 (FP)      250

Model 2
             Predicted: P   Predicted: N
Actual: P        250            45
Actual: N          5           200

Cost matrix
             Predicted: P   Predicted: N
Actual: P        -1            100
Actual: N         1              0

Model 1: Accuracy = 80%, Cost = 150×(-1) + 40×100 + 60×1 = 3910
Model 2: Accuracy = 90%, Cost = 250×(-1) + 45×100 + 5×1 = 4255
• If we focus on accuracy, we choose Model 2 (and compromise on cost); if we focus on cost, we choose Model 1 (and compromise on accuracy).
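A small numpy sketch (not from the slides) that reproduces the accuracy and cost figures above, taking the total cost as the element-wise product of each confusion matrix with the cost matrix:

```python
# Sketch: total cost = sum over cells of (count in confusion matrix) x (cost).
import numpy as np

cost = np.array([[-1, 100],     # rows: actual P, N; columns: predicted P, N
                 [ 1,   0]])
model1 = np.array([[150,  40],
                   [ 60, 250]])
model2 = np.array([[250,  45],
                   [  5, 200]])

for name, cm in [("Model 1", model1), ("Model 2", model2)]:
    accuracy = np.trace(cm) / cm.sum()          # correct predictions on the diagonal
    total_cost = (cm * cost).sum()              # element-wise product, then sum
    print(f"{name}: accuracy = {accuracy:.0%}, cost = {total_cost}")
```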
Significance Testing