
EVALUATION

Evaluation is the stage of testing models before they are deployed in the real world.
Importance of Evaluation

 Critically examines a program.
 Makes judgments about a program:
o what works well
o what could be improved in the program
o how well the program is achieving its goals
o what steps can be taken to improve its effectiveness
o how to make informed programming decisions
 Fine-tunes the model so that it operates correctly and optimally.
2 Parameters of Evaluation

 Prediction
 Reality

True Positive

 The predicted value matches the actual value


 The actual value was positive and the model predicted a positive value
E.g., fire was predicted and it did occur.

True Negative

 The predicted value matches the actual value


 The actual value was negative and the model predicted a negative value
E.g., fire was not predicted and it did not occur.
False Positive

 The predicted value does not match the actual value


 The actual value was negative but the model predicted a positive value
E.g., fire was predicted but it did not occur.
False Negative

 The predicted value does not match the actual value


 The actual value was positive but the model predicted a negative value
E.g., fire was not predicted but it did occur.
Overfitting of data

 Models that are tested on the same data they were trained on will always give the correct output.
 This is known as overfitting.
 To evaluate an AI model, the data that was used to build the model should not be reused, because the model remembers the whole training dataset and therefore always predicts the correct label for any point in that dataset.

WHICH DATA SHOULD NOT BE USED FOR EVALUATION PURPOSES


Training data should not be used for testing or evaluation.

 AI Model remembers the whole training data set


 So it always predicts the correct label for any point in the training dataset
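As a rough illustration of keeping evaluation data separate from training data, here is a minimal sketch assuming scikit-learn is available; the tiny fire-detection dataset and the choice of classifier are made up purely for illustration.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Made-up dataset: each row is [temperature, smoke level]; label 1 = fire, 0 = no fire.
X = [[30, 0.1], [35, 0.2], [80, 0.9], [75, 0.8], [40, 0.3], [85, 0.95], [32, 0.15], [78, 0.85]]
y = [0, 0, 1, 1, 0, 1, 0, 1]

# Hold back 25% of the data purely for evaluation; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)

# Report accuracy on the held-out test split, not on the training data.
print(accuracy_score(y_test, model.predict(X_test)))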
CONFUSION MATRIX
A confusion matrix is a comparison between the results of Prediction and reality.
It represents the prediction summary in matrix form.
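For a two-class problem, one common layout is (rows for Reality, columns for Prediction):

                      Predicted: Positive     Predicted: Negative
Reality: Positive     True Positive (TP)      False Negative (FN)
Reality: Negative     False Positive (FP)     True Negative (TN)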
EXPLAIN WITH THE HELP OF AN EXAMPLE

Example: Case: Loan (Good loan & Bad loan)

TP result - Bad loans are correctly predicted as bad loans.


TN result - Good loans are correctly predicted as good loans.
FP result - (actual) good loans are incorrectly predicted as bad loans.
FN result – (actual) bad loans are incorrectly predicted as good loans.
The banks would lose a lot of money if actual bad loans were predicted as good loans, since those loans would not be repaid.
On the other hand, banks would miss out on revenue if actual good loans were predicted as bad loans.
Therefore, the cost of False Negatives is much higher than the cost of False Positives.

ACCURACY
Percentage of correct predictions out of all the predictions.
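In terms of the confusion-matrix counts:
Accuracy = (TP + TN) / (TP + TN + FP + FN)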

PRECISION
Percentage of true positives out of all the cases predicted as positive.
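As a formula:
Precision = TP / (TP + FP)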

RECALL
Fraction (or percentage) of the actual positive cases that are correctly identified by the model.
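As a formula:
Recall = TP / (TP + FN)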

F1 SCORE
A measure of the balance between Precision and Recall (their harmonic mean).
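As a formula:
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)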

F1 score ranges from 0 to 1


Good F1 score = 1 (100% performance)
Failure F1 score = 0
A model with a high F1 score is ready to be deployed.
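As a minimal sketch of how all four metrics follow from the confusion-matrix counts, the small Python function below (the function and variable names are purely illustrative) mirrors the formulas above:

def evaluate(tp, tn, fp, fn):
    # All four metrics are derived from the four confusion-matrix counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1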
EVALUATION METRICS
a. Mail Spamming -
If the model always predicts that a mail is spam, people would stop looking at flagged mail and might eventually lose important information.
The False Positive condition has a high cost (predicting a mail as spam while the mail is not spam).
b. Gold Mining -
The model says that treasure exists at a point, and you keep digging there, but it turns out to be a false alarm.
The False Positive case is very costly (predicting there is treasure when there is no treasure).
c. Deadly Virus -
A deadly virus has started spreading, and the model that is supposed to predict a viral outbreak does not detect it. The virus might spread widely and infect a lot of people.
Hence, False Negatives can be dangerous.

False Negative cases (a missed viral outbreak) are more critical and dangerous than False Positive cases.
3. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix on Heart Attack Risk. Also suggest which metric would not be a good evaluation parameter here and why?

Calculation:

Accuracy:
Accuracy = (50 + 20) / (50 + 20 + 20 + 10) = 70 / 100 = 0.7

Precision:
Precision = 50 / (50 + 20) = 50 / 70 = 0.714

Recall:
Recall = 50 / (50 + 10) = 50 / 60 = 0.833

F1 Score:
F1 Score = 2 * (0.714 * 0.833) / (0.714 + 0.833) = 2 * (0.595 / 1.547) = 2 * 0.3846 = 0.769

Therefore,
Accuracy = 0.7
Precision = 0.714
Recall = 0.833
F1 Score = 0.769
False Positive (impacts Precision): a person is predicted as high risk but does not have a heart attack.
False Negative (impacts Recall): a person is predicted as low risk but has a heart attack.
False Negatives miss actual heart patients, so the Recall metric needs the most attention here.
False Negatives are more dangerous than False Positives.
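The result can be checked with the evaluate() sketch above, using the counts implied by the Accuracy and Precision calculations (TP = 50, TN = 20, FP = 20, FN = 10):

print(evaluate(tp=50, tn=20, fp=20, fn=10))
# approximately (0.7, 0.714, 0.833, 0.769)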

4. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix on Water Shortage in Schools. Also suggest which metric would not be a good evaluation parameter here and why?

Calculation:

Accuracy:
Accuracy = (75 + 15) / (75 + 15 + 5 + 5) = 90 / 100 = 0.9

Precision:
Precision = 75 / (75 + 5) = 75 / 80 = 0.9375

Recall:
Recall = 75 / (75 + 5) = 75 / 80 = 0.9375

F1 Score:
F1 Score = 2 * (0.9375 * 0.9375) / (0.9375 + 0.9375) = 2 * (0.8789 / 1.875) = 2 * 0.46875 = 0.9375

Therefore,
Accuracy = 0.9
Precision = 0.9375
Recall = 0.9375
F1 Score = 0.9375

Here Precision, Recall and F1 Score are all the same (0.9375), and Accuracy is also high (0.9).

Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix on Spam Filtering. Also suggest which metric would not be a good evaluation parameter here and why?

Calculation

Accuracy:
Accuracy = (10 + 25) / (10 + 25 + 55 + 10) = 35 / 100 = 0.35

Precision:
Precision = 10 / (10 + 55) = 10 / 65 = 0.15

Recall:
Recall = 10 / (10 + 10) = 10 / 20 = 0.5

F1 Score:
F1 Score = 2 * ((0.15 * 0.5) / (0.15 + 0.5)) = 2 * (0.075 / 0.65) = 2 * 0.115 = 0.23

Therefore,
Accuracy = 0.35
Precision = 0.15
Recall = 0.5
F1 Score = 0.23

Here there is a trade-off between Precision and Recall within the test, but Precision is the metric that needs the most improvement.
False Positive (impacts Precision): a mail is predicted as "spam" but it is not.
False Negative (impacts Recall): a mail is predicted as "not spam" but it is spam.
Of course, too many False Negatives would make the spam filter ineffective.
But False Positives may cause important mails to be missed.
Hence, Precision is the more important metric to improve.
