09 - ML-Model Evaluation
Machine Learning
Model Evaluation
1. Metrics for Classification
Evaluation for Classification
Metrics for Classification
Accuracy score
Confusion matrix
Precision and Recall
F1 score
ROC curve
Area Under the Curve
Accuracy Metrics
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 0, 2, 1, 1, 0, 2, 1, 2])
print(accuracy_score(y_true, y_pred))  # 6 of 10 predictions are correct -> 0.6
Limitation of Accuracy
Plain accuracy treats every outcome equally, so it can be misleading, e.g. on imbalanced datasets where always predicting the majority class already scores highly.
Solution: weight each type of outcome:
Weighted Accuracy = (w_TP·TP + w_TN·TN) / (w_TP·TP + w_FP·FP + w_TN·TN + w_FN·FN)
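A minimal sketch of the weighted-accuracy formula above; the counts and weights below are made-up values for illustration only:

# Hypothetical confusion-matrix counts and outcome weights (illustrative only)
TP, FP, TN, FN = 30, 10, 50, 10
w_TP, w_FP, w_TN, w_FN = 1.0, 2.0, 1.0, 5.0

weighted_acc = (w_TP * TP + w_TN * TN) / (w_TP * TP + w_FP * FP + w_TN * TN + w_FN * FN)
print(weighted_acc)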
Confusion Matrix
                        Predicted Class
                        Positive              Negative
Actual Class  Positive  True Positive (TP)    False Negative (FN)
https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers
Confusion Matrix
                        Predicted Class
                        Positive                             Negative
Actual Class  Positive  True Positive (TP)                   False Negative (FN) (Type II error)
              Negative  False Positive (FP) (Type I error)   True Negative (TN)

Accuracy  = (TP + TN) / (TP + FP + TN + FN)
Precision = TP / (TP + FP)
Recall    = TP / (TP + FN)
F1-score  = 2 * precision * recall / (precision + recall)
          = 2TP / (2TP + FP + FN)
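A small sketch of computing these quantities with scikit-learn; the binary labels below are illustrative (1 = positive class):

from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)                   # 2 1 2 5
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 2/3
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 1/2
print(f1_score(y_true, y_pred))         # 2TP / (2TP + FP + FN) = 4/7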
Precision-Recall
F1-score
The F1 score is the harmonic mean of precision and recall.
F1-score ∈ (0, 1]
Example: (precision = 0.5, recall = 0.5) gives a higher F1 score than (precision = 0.3, recall = 0.8).
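A quick numeric check of that claim using the harmonic-mean formula:

def f1(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(f1(0.5, 0.5))  # 0.5
print(f1(0.3, 0.8))  # ~0.436, so the balanced pair has the higher F1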
Type I and II error
Normalized confusion matrix
                        Predicted Class
                        Positive                Negative
Actual Class  Positive  TPR = TP / (TP + FN)    FNR = FN / (TP + FN)
              Negative  FPR = FP / (FP + TN)    TNR = TN / (FP + TN)
The False Positive Rate is also called the False Alarm Rate.
The False Negative Rate is also called the Miss Detection Rate.
In mine detection, "better a false alarm than a miss": a high False Alarm Rate is acceptable if it yields a low Miss Detection Rate.
In spam filtering, sending an important email to the trash is more serious than letting a spam message through as a normal email.
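A minimal sketch of obtaining these rates with scikit-learn, reusing the binary labels from the earlier example; normalize='true' divides each row by the number of actual samples in that class:

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Each row sums to 1: row 0 is [TNR, FPR], row 1 is [FNR, TPR]
print(confusion_matrix(y_true, y_pred, normalize='true'))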
ROC curve
ROC curve
Diagonal line = random guessing (AUC = 0.5)
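A minimal sketch of drawing an ROC curve and computing its AUC with scikit-learn; the labels and scores below are illustrative:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.35, 0.4, 0.8, 0.45, 0.6, 0.7, 0.9]  # predicted probabilities for class 1

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))   # Area Under the Curve

plt.plot(fpr, tpr, label='model')
plt.plot([0, 1], [0, 1], '--', label='random guessing')  # diagonal line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()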
For multi-class classification
Micro-average: pool the TP/FP/FN counts over all classes, then compute the metric once.
Macro-average: compute the metric per class, then take the unweighted mean (see the sketch below).
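A small sketch contrasting the two averaging schemes on the 3-class example from the accuracy slide:

from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 0, 2, 1, 1, 0, 2, 1, 2]

print(f1_score(y_true, y_pred, average='micro'))  # pool TP/FP/FN over all classes
print(f1_score(y_true, y_pred, average='macro'))  # unweighted mean of per-class F1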
2. Metrics for Regression
2.1. Bias
Figure 1 presents the relationship between a target variable (y) and a single feature (x)
2.2. Mean squared error (MSE)
Pros:
MSE uses the mean (instead of the sum) to keep the metric independent of the dataset size.
As the residuals are squared, MSE puts a significantly heavier penalty on large errors.
The metric is useful for optimization algorithms.
Cons:
Some large errors may come from outliers, so MSE is not robust to their presence.
MSE is not measured in the original units, which can make it harder to interpret.
MSE cannot be used to compare performance between different datasets.
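A minimal sketch with scikit-learn; the values are illustrative:

from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mean_squared_error(y_true, y_pred))  # mean of squared residuals = 0.875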
2.3. Root mean squared error
Root mean squared error (RMSE)
Pros:
Taking the square root of MSE brings the metric back to the scale of the target variable, so it is easier to interpret and understand.
Cons:
Take caution: although RMSE is on the same scale as the target, an RMSE of 10 does not actually mean you are off by 10 units on average.
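RMSE is simply the square root of MSE; a minimal sketch reusing the illustrative values above:

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(np.sqrt(mean_squared_error(y_true, y_pred)))  # sqrt(0.875) ≈ 0.935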
https://developer.nvidia.com/
2.4. Mean absolute error (MAE)
Pros:
Due to the lack of squaring, the metric is expressed on the same scale as the target variable, making it easier to interpret.
All errors are treated equally, so the metric is robust to outliers.
Cons:
The absolute value disregards the direction of the errors, so underforecasting and overforecasting are penalized equally.
Similar to MSE and RMSE, MAE is scale-dependent, so you cannot compare it between different datasets.
When you optimize for MAE, the prediction should be as many times higher than the actual value as it is lower; you are effectively looking for the median, i.e. a value that splits the dataset into two equal parts.
As the formula contains absolute values, MAE is not easily differentiable.
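A minimal sketch with scikit-learn, reusing the same illustrative values:

from sklearn.metrics import mean_absolute_error

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mean_absolute_error(y_true, y_pred))  # mean of |residuals| = 0.75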
2.5. R-squared (R²)
2.5. R-squared
Measures how well the model fits the data:
R² = 1 − RSS / TSS
• RSS: the residual sum of squares
• TSS: the total sum of squares
Pros:
Model fit assessment & model comparison
o A higher R-squared means a better fit.
Helps in feature selection
o If adding a variable improves R-squared a lot, it is likely a good predictor.
Cons:
Sensitive to outliers
Depends on sample size
Does not distinguish between different types of relationships
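A minimal sketch with scikit-learn, reusing the same illustrative values:

from sklearn.metrics import r2_score

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(r2_score(y_true, y_pred))  # 1 - RSS / TSS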
2.6. Some other metrics
Mean squared log error (MSLE)
Root mean squared log error (RMSLE)
Symmetric mean absolute percentage error (sMAPE)
…
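MSLE (and hence RMSLE) is available in scikit-learn; sMAPE is not, so the sketch below codes one commonly used variant by hand (this particular sMAPE definition is an assumption, not taken from the slides):

import numpy as np
from sklearn.metrics import mean_squared_log_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

msle = mean_squared_log_error(y_true, y_pred)
rmsle = np.sqrt(msle)

# One common sMAPE variant: mean of |error| / ((|y_true| + |y_pred|) / 2)
smape = np.mean(np.abs(y_pred - y_true) / ((np.abs(y_true) + np.abs(y_pred)) / 2))
print(msle, rmsle, smape)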
3. Metrics for Clustering
Rand index (RI)
Given the ground-truth class assignments labels_true and our clustering algorithm's assignments of the same samples labels_pred, the (adjusted or unadjusted) Rand index measures the similarity of the two assignments, ignoring permutations.
If C is a ground-truth class assignment and K the clustering, define a and b as:
a: the number of pairs of elements that are in the same set in C and in the same set in K
b: the number of pairs of elements that are in different sets in C and in different sets in K
The unadjusted Rand index is then RI = (a + b) / C(n_samples, 2), i.e. the fraction of all sample pairs on which the two assignments agree.
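A minimal sketch with scikit-learn (rand_score is available in recent versions); the label assignments below are illustrative:

from sklearn.metrics import rand_score

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]
print(rand_score(labels_true, labels_pred))  # agreement over all sample pairs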
Adjusted Rand index (ARI)
However, the Rand index does not guarantee that
random label assignments will get a value close to zero
(esp. if the number of clusters is in the same order of
magnitude as the number of samples).
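A small sketch contrasting the unadjusted and adjusted indices on the same illustrative labels:

from sklearn.metrics import rand_score, adjusted_rand_score

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]
print(rand_score(labels_true, labels_pred))           # unadjusted
print(adjusted_rand_score(labels_true, labels_pred))  # corrected for chance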
Rand index (RI) & Adjusted Rand Index (ARI)
Rand index is a function that measures the similarity of the two assignments,
ignoring permutations:
The Rand index does not guarantee a value close to 0.0 for a random labelling.
The adjusted Rand index corrects for chance and provides such a baseline.
Silhouette Score
The Silhouette Coefficient (sklearn.metrics.silhouette_score) is an example of such an evaluation, where a higher Silhouette Coefficient score relates to a model with better-defined clusters. The Silhouette Coefficient is defined for each sample and is composed of two scores:
a: the mean distance between a sample and all other points in the same cluster
b: the mean distance between a sample and all other points in the next nearest cluster
The Silhouette Coefficient for a single sample is then s = (b − a) / max(a, b).
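A minimal sketch, assuming we cluster some synthetic data with KMeans (the dataset and n_clusters are illustrative):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # illustrative data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))  # higher means better-defined clusters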
Some other metrics
References:
https://scikit-learn.org/stable/modules/clustering.html#clustering-evaluation