
Confusion Matrix:

A confusion matrix provides a summary of the predictive results in a classification problem. Correct and incorrect predictions are summarized in a table with their values and broken down by each class.

Confusion Matrix for Binary Classification

Calculating a confusion matrix:

Let's take an example: we have a total of 10 cats and dogs, and our model predicts whether each animal is a cat or not.

Actual values = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
Predicted values = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

Remember: we describe the predicted values as Positive/Negative, and whether a prediction matches the actual value as True/False.

Definition of the Terms:

True Positive: You predicted positive and it's true. You predicted that the animal is a cat, and it actually is a cat.

True Negative: You predicted negative and it's true. You predicted that the animal is not a cat, and it actually is not (it's a dog).

False Positive (Type 1 Error): You predicted positive and it's false. You predicted that the animal is a cat, but it actually is not (it's a dog).

False Negative (Type 2 Error): You predicted negative and it's false. You predicted that the animal is not a cat, but it actually is a cat.
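
Counting these four outcomes for the lists above gives the confusion matrix entries. A minimal Python sketch of that count (an illustrative addition, treating 'cat' as the positive class):

actual    = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

# Count each of the four outcomes by comparing actual and predicted labels pairwise.
tp = sum(a == 'cat' and p == 'cat' for a, p in zip(actual, predicted))  # 3
tn = sum(a == 'dog' and p == 'dog' for a, p in zip(actual, predicted))  # 4
fp = sum(a == 'dog' and p == 'cat' for a, p in zip(actual, predicted))  # 2
fn = sum(a == 'cat' and p == 'dog' for a, p in zip(actual, predicted))  # 1

print(tp, tn, fp, fn)  # 3 4 2 1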

Classification Accuracy:
Classification Accuracy is given by the relation:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Recall (aka Sensitivity):

Recall is defined as the ratio of the total number of correctly classified positive classes divided by the total number of positive classes. Or, out of all the positive classes, how many did we predict correctly? Recall should be high.

Precision:
Precision is defined as the ratio of the total number of correctly classified positive classes divided by the total number of predicted positive classes. Or, out of all the predicted positive classes, how many did we predict correctly? Precision should be high.

Trick to remember: Precision has the Predicted results in the denominator.

F-score or F1-score:
It is difficult to compare two models with different Precision
and Recall. So to make them comparable, we use F-Score. It is
the Harmonic Mean of Precision and Recall. As compared to
Arithmetic Mean, Harmonic Mean punishes the extreme values
more. F-score should be high.
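
For example, with Precision = 1.0 and Recall = 0.1 (illustrative values, not taken from the cat and dog data), the arithmetic mean is 0.55, while the F-score is (2*1.0*0.1)/(1.0+0.1) ≈ 0.18, so one very poor metric cannot be hidden behind one very good metric.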

Specificity:
Specificity determines the proportion of actual negatives that
are correctly identified.
Example to interpret a confusion matrix:
Let's calculate the metrics using the cat and dog example above. From the two lists, TP = 3, TN = 4, FP = 2 and FN = 1.
Classification Accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN) =
(3+4)/(3+4+2+1) = 0.70

Recall: Recall tells us, when the actual value is yes, how often the model predicts yes.
Recall = TP / (TP + FN) = 3/(3+1) = 0.75

Precision: Precision tells us, when the model predicts yes, how often it is correct.
Precision = TP / (TP + FP) = 3/(3+2) = 0.60

F-score:
F-score = (2*Recall*Precision)/(Recall+Precision) = (2*0.75*0.60)/(0.75+0.60) = 0.67

Specificity:
Specificity = TN / (TN + FP) = 4/(4+2) = 0.67
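
These numbers can also be reproduced with scikit-learn; the following is only an illustrative cross-check, assuming the library is installed:

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

actual    = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

# With labels ordered [negative, positive], ravel() returns TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(actual, predicted, labels=['dog', 'cat']).ravel()

print(accuracy_score(actual, predicted))                    # 0.70
print(recall_score(actual, predicted, pos_label='cat'))     # 0.75
print(precision_score(actual, predicted, pos_label='cat'))  # 0.60
print(f1_score(actual, predicted, pos_label='cat'))         # ~0.67
print(tn / (tn + fp))                                       # specificity, ~0.67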

The AUC-ROC curve, or Area Under the Receiver Operating Characteristic curve, is a graphical representation of the performance of a binary classification model at various classification thresholds. It is commonly used in machine learning to assess the ability of a model to distinguish between two classes, typically the positive class (e.g., presence of a disease) and the negative class (e.g., absence of a disease).
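
A minimal sketch of how such a curve and its area might be computed with scikit-learn; the labels and scores below are made-up illustrations, not data from this example:

from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical ground-truth labels (1 = positive class) and model scores, for illustration only.
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # ROC points at each threshold
print(roc_auc_score(y_true, y_score))              # area under that curve
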
Environmental scientists want to solve a two-class classification problem for predicting whether a population contains a specific genetic variant. They can use a confusion matrix to determine in how many ways the machine learning classification model they're analyzing confuses the two classes. Assuming the scientists use 500 samples for their data analysis, a table is constructed for their predicted and actual values before calculating the confusion matrix.

                                    Predicted without the variant   Predicted with the variant
Actual number without the variant
Actual number with the variant
Total predicted value

After creating the matrix, the scientists analyze their sample data. Assume the scientists predict that 350 test samples contain the genetic variant and 150 samples don't. If they determine that the actual number of samples containing the variant is 305, then the actual number of samples without the variant is 195 (500 − 305). These values become the "true" values in the matrix, and the scientists enter the data in the table:
                                          Predicted without the variant   Predicted with the variant
Actual number without the variant = 195   True negative = 45              False positive = 150
Actual number with the variant = 305      False negative = 105            True positive = 200
Total predicted value                     150                             350
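
The original worked example stops at the populated matrix. As an illustrative extension (not part of the source text), the formulas from the cat and dog example can be applied to these counts:

# Applying the earlier formulas to the scientists' 500-sample confusion matrix.
tp, tn, fp, fn = 200, 45, 150, 105

accuracy    = (tp + tn) / (tp + tn + fp + fn)                 # (200 + 45) / 500 = 0.49
recall      = tp / (tp + fn)                                  # 200 / 305 ≈ 0.66
precision   = tp / (tp + fp)                                  # 200 / 350 ≈ 0.57
f_score     = 2 * recall * precision / (recall + precision)   # ≈ 0.61
specificity = tn / (tn + fp)                                  # 45 / 195 ≈ 0.23

print(accuracy, recall, precision, f_score, specificity)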
