
EVALUATION

Evaluation is the stage of testing models before they are deployed in the real world.
Importance of Evaluation

 Critically examines a program.
 Makes judgments about a program:
o what works well
o what could be improved in the program
o how well the program is achieving its goals
o what steps can be taken to improve its effectiveness
o how to make informed programming decisions
 Fine-tunes the model so that it operates correctly and optimally.
2 Parameters of Evaluation

 Prediction
 Reality

True Positive

 The predicted value matches the actual value


 The actual value was positive and the model predicted a positive value
E.g., fire was predicted and it did occur.

True Negative

 The predicted value matches the actual value


 The actual value was negative and the model predicted a negative value
E.g., fire was not predicted and it did not occur.
False Positive

 The predicted value does not match the actual value


 The actual value was negative but the model predicted a positive value
E.g., fire was predicted but it did not occur.
False Negative

 The predicted value does not match the actual value


 The actual value was positive but the model predicted a negative value
E.g., fire was not predicted but it did occur.
Overfitting of data

 Models that are tested on the same data they were trained on will always give the correct output.
 This is known as overfitting.
 To evaluate an AI model, the data that was used to build the model should not be reused, because the model remembers the whole training dataset and therefore always predicts the correct label for any point in that dataset.

WHICH DATA SHOULD NOT BE USED FOR EVALUATION PURPOSES


Training data should not be used for testing or evaluation.

 AI Model remembers the whole training data set


 So it always predicts the correct label for any point in the training dataset
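As a rough illustration of keeping evaluation data separate from training data, here is a minimal sketch assuming scikit-learn is available; the tiny fire-detection dataset and the choice of classifier are made up purely for illustration.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Made-up dataset: each row is [temperature, smoke level]; label 1 = fire, 0 = no fire.
X = [[30, 0.1], [35, 0.2], [80, 0.9], [75, 0.8], [40, 0.3], [85, 0.95], [32, 0.15], [78, 0.85]]
y = [0, 0, 1, 1, 0, 1, 0, 1]

# Hold back 25% of the data purely for evaluation; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)

# Report accuracy on the held-out test split, not on the training data.
print(accuracy_score(y_test, model.predict(X_test)))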
CONFUSION MATRIX
A confusion matrix is a comparison between the results of Prediction and reality.
It represents the prediction summary in matrix form.
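For a two-class problem, one common layout is (rows for Reality, columns for Prediction):

                      Predicted: Positive     Predicted: Negative
Reality: Positive     True Positive (TP)      False Negative (FN)
Reality: Negative     False Positive (FP)     True Negative (TN)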
EXPLAIN WITH THE HELP OF AN EXAMPLE

Example: Case: Loan (Good loan & Bad loan)

TP result - Bad loans are correctly predicted as bad loans.


TN result - Good loans are correctly predicted as good loans.
FP result - (actual) good loans are incorrectly predicted as bad loans.
FN result – (actual) bad loans are incorrectly predicted as good loans.
The banks would lose a lot of money if actual bad loans were predicted as good loans, since those loans would not be repaid.
On the other hand, banks would miss out on revenue if actual good loans were predicted as bad loans.
Therefore, the cost of False Negatives is much higher than the cost of False Positives.

ACCURACY
Percentage of correct predictions out of all the predictions.
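In terms of the confusion-matrix counts:
Accuracy = (TP + TN) / (TP + TN + FP + FN)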

PRECISION
Percentage of true positives out of all the cases predicted as positive.
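As a formula:
Precision = TP / (TP + FP)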

RECALL
Fraction (or percentage) of the actual positive cases that are correctly identified by the model.
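As a formula:
Recall = TP / (TP + FN)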

F1 SCORE
A measure of the balance between Precision and Recall (their harmonic mean).
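As a formula:
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)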

F1 score ranges from 0 to 1


Good F1 score = 1 (100% performance)
Failure F1 score = 0
A model with a high F1 score is ready to be deployed.
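As a minimal sketch of how all four metrics follow from the confusion-matrix counts, the small Python function below (the function and variable names are purely illustrative) mirrors the formulas above:

def evaluate(tp, tn, fp, fn):
    # All four metrics are derived from the four confusion-matrix counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1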
EVALUATION METRICS
a. Mail Spamming -
If the model always predicts that a mail is spam, people would stop looking at flagged mail and might eventually lose important information.
The False Positive condition has a high cost (predicting a mail as spam while the mail is not spam).
b. Gold Mining -
The model says that treasure exists at a point, and you keep digging there, but it turns out to be a false alarm.
The False Positive case is very costly (predicting there is treasure when there is no treasure).
c. Deadly Virus -
A deadly virus has started spreading, and the model that is supposed to predict a viral outbreak does not detect it. The virus might spread widely and infect a lot of people.
Hence, False Negatives can be dangerous.

False Negative cases (a missed viral outbreak) are more critical and dangerous than False Positive cases.
3. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix on Heart Attack Risk. Also suggest which metric would not be a good evaluation parameter here and why?

Calculation:

Accuracy:
Accuracy = (50 + 20) / (50 + 20 + 20 + 10) = 70 / 100 = 0.7

Precision:
Precision = 50 / (50 + 20) = 50 / 70 = 0.714

Recall:
Recall = 50 / (50 + 10) = 50 / 60 = 0.833

F1 Score:
F1 Score = 2 * (0.714 * 0.833) / (0.714 + 0.833) = 2 * (0.595 / 1.547) = 2 * 0.3846 = 0.769

Therefore,
Accuracy = 0.7
Precision = 0.714
Recall = 0.833
F1 Score = 0.769
False Positive (impacts Precision): a person is predicted as high risk but does not have a heart attack.
False Negative (impacts Recall): a person is predicted as low risk but has a heart attack.
False Negatives miss actual heart patients, so the Recall metric needs the most attention here.
False Negatives are more dangerous than False Positives.
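The result can be checked with the evaluate() sketch above, using the counts implied by the Accuracy and Precision calculations (TP = 50, TN = 20, FP = 20, FN = 10):

print(evaluate(tp=50, tn=20, fp=20, fn=10))
# approximately (0.7, 0.714, 0.833, 0.769)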

4. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix on Water Shortage in Schools. Also suggest which metric would not be a good evaluation parameter here and why?

Calculation:

Accuracy:
Accuracy = (75 + 15) / (75 + 15 + 5 + 5) = 90 / 100 = 0.9

Precision:
Precision = 75 / (75 + 5) = 75 / 80 = 0.9375

Recall:
Recall = 75 / (75 + 5) = 75 / 80 = 0.9375

F1 Score:
F1 Score = 2 * (0.9375 * 0.9375) / (0.9375 + 0.9375) = 2 * (0.8789 / 1.875) = 2 * 0.46875 = 0.9375

Therefore,
Accuracy = 0.9
Precision = 0.9375
Recall = 0.9375
F1 Score = 0.9375

Here Precision, Recall and F1 Score are all the same (0.9375), and Accuracy is also high (0.9).

Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix on Spam Filtering. Also suggest which metric would not be a good evaluation parameter here and why?

Calculation

Accuracy:
Accuracy = (10 + 25) / (10 + 25 + 55 + 10) = 35 / 100 = 0.35

Precision:
Precision = 10 / (10 + 55) = 10 / 65 = 0.15

Recall:
Recall = 10 / (10 + 10) = 10 / 20 = 0.5

F1 Score:
F1 Score = 2 * ((0.15 * 0.5) / (0.15 + 0.5)) = 2 * (0.075 / 0.65) = 2 * 0.115 = 0.23

Therefore,
Accuracy = 0.35
Precision = 0.15
Recall = 0.5
F1 Score = 0.23

Here there is a trade-off between Precision and Recall within the test, but Precision is the metric that needs the most improvement.
False Positive (impacts Precision): a mail is predicted as "spam" but it is not.
False Negative (impacts Recall): a mail is predicted as "not spam" but it is spam.
Of course, too many False Negatives would make the spam filter ineffective.
But False Positives may cause important mails to be missed.
Hence, Precision is the more important metric to improve.
