Lecture 3b - Evaluation
Learning Models
Lê Anh Cường
Content
• Evaluation for classification
• Evaluation for regression
Classifier Evaluation Metrics: Confusion
Matrix
Confusion Matrix:
Actual class \ Predicted class | C1                   | ¬C1
C1                             | True Positives (TP)  | False Negatives (FN)
¬C1                            | False Positives (FP) | True Negatives (TN)
Classifier Evaluation Metrics: Accuracy,
Error Rate, Sensitivity and Specificity
A\P | C  | ¬C | Total
C   | TP | FN | P
¬C  | FP | TN | N
◼ Accuracy = (TP + TN) / (P + N): percentage of test tuples that are correctly classified
◼ Error rate = (FP + FN) / (P + N) = 1 − Accuracy
◼ Class imbalance problem: a significant majority of the tuples may belong to the negative class, so accuracy alone can be misleading
◼ Sensitivity = TP/P: true positive recognition rate
◼ Specificity = TN/N: true negative recognition rate
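As a small illustration, the sketch below computes these four metrics directly from confusion-matrix counts; the TP/FN/FP/TN values are hypothetical and not taken from the lecture.

```python
# Hypothetical confusion-matrix counts (not from the slides),
# used only to illustrate the formulas above.
TP, FN = 90, 10    # actual positives:  P = TP + FN
FP, TN = 30, 170   # actual negatives:  N = FP + TN

P, N = TP + FN, FP + TN
accuracy    = (TP + TN) / (P + N)   # fraction of all tuples classified correctly
error_rate  = (FP + FN) / (P + N)   # 1 - accuracy
sensitivity = TP / P                # true positive recognition rate
specificity = TN / N                # true negative recognition rate

print(f"accuracy={accuracy:.3f}, error rate={error_rate:.3f}, "
      f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```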
Classifier Evaluation Metrics:
Precision and Recall, and F-measures
• Precision: exactness – what % of tuples that the classifier labeled as positive are actually positive: Precision = TP / (TP + FP)
• Recall: completeness – what % of positive tuples the classifier labeled as positive: Recall = TP / (TP + FN)
Classifier Evaluation Metrics:
Precision and Recall, and F-measures
• F-measure (F1): the harmonic mean of precision and recall, F = 2 × Precision × Recall / (Precision + Recall)
Classifier Evaluation Metrics: Example
• Precision = ?
• Recall = ?
• F-score=?
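The slide's example table is not reproduced in this transcript; the sketch below answers the three questions above for a set of hypothetical counts.

```python
# Hypothetical counts (the slide's actual example table is not shown here).
TP, FP, FN = 70, 30, 30

precision = TP / (TP + FP)                                  # exactness
recall    = TP / (TP + FN)                                  # completeness
f_score   = 2 * precision * recall / (precision + recall)   # harmonic mean

print(f"precision={precision:.3f}, recall={recall:.3f}, F-score={f_score:.3f}")
```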
Example: Iris dataset
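The Iris example itself is not preserved in the extracted slides; as a minimal sketch (assuming scikit-learn, which the lecture does not specify), such an evaluation can be run as follows.

```python
# Sketch: evaluate a classifier on the Iris dataset with scikit-learn.
# The library and model choice are assumptions for illustration only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))
# Per-class precision, recall and F1, plus macro/weighted averages.
print(classification_report(y_test, y_pred))
```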
ROC curve
An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a
classification model at all classification thresholds. This curve plots two parameters:
• True Positive Rate
• False Positive Rate
https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
ROC curve
An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the
classification threshold classifies more items as positive, thus increasing both False
Positives and True Positives. The following figure shows a typical ROC curve.
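The figure is not reproduced here; as a hedged sketch (assuming scikit-learn and matplotlib, with synthetic data), a ROC curve can be traced from predicted scores like this.

```python
# Sketch: compute TPR/FPR at all thresholds and plot the ROC curve.
# scikit-learn and matplotlib are assumed; the data here is synthetic.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true  = rng.integers(0, 2, size=500)                   # binary labels
y_score = y_true * 0.6 + rng.normal(0, 0.4, size=500)    # noisy model scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

plt.plot(fpr, tpr, label=f"model (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], "--", label="random guess (diagonal)")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```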
Estimating Confidence Intervals:
Classifier Models M1 vs. M2
• Suppose we have two classifier models, M1 and M2, each evaluated by 10-fold cross-validation to obtain a mean error rate
• These mean error rates are just estimates of error on the true population of future data cases
• What if the difference between the 2 error rates is just attributed to chance?
• Use a test of statistical significance
• Obtain confidence limits for our error estimates
Estimating Confidence Intervals:
Null Hypothesis
• Assume samples follow a t distribution with k–1 degrees of freedom (here, k=10)
• Use t-test (or Student’s t-test)
• Null Hypothesis: M1 & M2 are the same
• If we can reject null hypothesis, then
• we conclude that the difference between M1 & M2 is statistically significant
• Choose the model with the lower error rate
Estimating Confidence Intervals: t-test
• For a pairwise comparison (the same 10-fold cross-validation partitioning is used for both models), the t-statistic with k−1 degrees of freedom is
  t = (mean err(M1) − mean err(M2)) / sqrt(var(M1 − M2) / k)
• If two independent test sets are used (nonpaired t-test),
  var(M1 − M2) = var(M1)/k1 + var(M2)/k2,
  where k1 & k2 are # of cross-validation samples used for M1 & M2, respectively
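As a hedged sketch of the paired comparison above, using scipy.stats; the per-fold error rates below are made-up numbers, not results from the lecture.

```python
# Paired t-test on per-fold error rates from the same 10-fold partitioning.
# The error rates are hypothetical; scipy is an assumed dependency.
import numpy as np
from scipy import stats

err_m1 = np.array([0.12, 0.10, 0.14, 0.11, 0.13, 0.12, 0.15, 0.10, 0.11, 0.12])
err_m2 = np.array([0.14, 0.13, 0.15, 0.12, 0.16, 0.13, 0.17, 0.12, 0.13, 0.14])

t_stat, p_value = stats.ttest_rel(err_m1, err_m2)  # k-1 = 9 degrees of freedom
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# If p < 0.05 (sig = 5%), reject the null hypothesis that M1 and M2 perform
# the same, and prefer the model with the lower mean error rate.
if p_value < 0.05:
    better = "M1" if err_m1.mean() < err_m2.mean() else "M2"
    print(f"Difference is significant; choose {better}.")
else:
    print("Difference may be attributed to chance.")
```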
Estimating Confidence Intervals:
Table for t-distribution
• Symmetric
• Significance level, e.g., sig = 0.05 or 5%, means M1 & M2 are significantly different for 95% of the population
• Confidence limit, z = sig/2
Estimating Confidence Intervals:
Statistical Significance
• If the computed t value exceeds the tabled t value for the chosen significance level (or falls below its negative), reject the null hypothesis: the difference between M1 & M2 is statistically significant; otherwise, the difference may be attributed to chance
Model Selection: ROC Curves
• ROC (Receiver Operating
Characteristics) curves: for visual
comparison of classification models
• Originated from signal detection theory
• Shows the trade-off between the true
positive rate and the false positive rate
• The area under the ROC curve is a measure of the accuracy of the model
• Rank the test tuples in decreasing order: the one that is most likely to belong to the positive class appears at the top of the list
• The closer to the diagonal line (i.e., the closer the area is to 0.5), the less accurate is the model
• Vertical axis represents the true positive rate
• Horizontal axis represents the false positive rate
• The plot also shows a diagonal line
• A model with perfect accuracy will have an area of 1.0
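A small sketch of using the area under the ROC curve for model selection; scikit-learn is assumed, and the data and models are illustrative rather than the lecture's example.

```python
# Compare two classifiers by area under the ROC curve (AUC).
# Library and model choices are assumptions for illustration only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("decision tree", DecisionTreeClassifier(max_depth=3))]:
    scores = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y_te, scores):.3f}")
# The model with AUC closer to 1.0 is preferred; ~0.5 is no better than chance.
```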
Issues Affecting Model Selection
• Accuracy
• classifier accuracy: predicting class label
• Speed
• time to construct the model (training time)
• time to use the model (classification/prediction time)
• Robustness: handling noise and missing values
• Scalability: efficiency in disk-resident databases
• Interpretability
• understanding and insight provided by the model
• Other measures, e.g., goodness of rules, such as decision tree
size or compactness of classification rules
Example
Content
• Evaluation for classification
• Evaluation for regression
Metrics
Mean squared error
• On average, how far are our predictions from the true values (in squared distance)? MSE = (1/n) Σ (yᵢ − ŷᵢ)²
• Interpretation downside: the units are squared units of the target
• The square root of MSE (RMSE = root mean squared error) is often used: RMSE = √MSE
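A minimal sketch of both metrics, with made-up true values and predictions (NumPy assumed).

```python
# MSE and RMSE for a regression model, computed on hypothetical values.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # true target values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # model predictions

mse  = np.mean((y_true - y_pred) ** 2)    # in squared units of the target
rmse = np.sqrt(mse)                       # back in the target's original units
print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```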