Classification Metrics

Evaluation

Binary classification
• Task of classifying the elements of a set into two groups on the basis of a classification rule.
• The observed response (output) y takes only two possible values, + / − or T / F.
• We need to define the relationship between the model output h(x) and y.
• Use a decision rule, for example: predict + if h(x) ≥ 0, and − otherwise.
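A minimal sketch of such a decision rule in Python, assuming a real-valued scoring function h(x) and a threshold of 0 (both illustrative choices, not prescribed by the slides):

```python
# Hypothetical decision rule: threshold a real-valued score h(x) at 0.
def predict(score: float, threshold: float = 0.0) -> str:
    """Map a score h(x) to one of the two labels '+' or '-'."""
    return "+" if score >= threshold else "-"

print(predict(1.7))   # '+'
print(predict(-0.3))  # '-'
```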

Ex:
• Medical test: to determine whether a patient has a certain disease or not.
• Is the person fit or not?
• Spam e-mail classification.
Definition
Instances
• The objects of interest in machine learning.
Instance space
• The set of all possible instances.
• Example: the set of all possible e-mails.
Label space
• The set of possible labels, used in supervised learning to label the examples.
Model
• In order to achieve the task under consideration we need a model:
a mapping from the instance space to the output space.
• For instance, in classification the output space is a set of classes,
while in regression it is the set of real numbers.
• In order to learn such a model we require a training set of labelled
instances (x, l(x)), also called examples.
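A tiny illustrative sketch of these definitions, with hypothetical names: instances are e-mail texts, the label space is {"spam", "ham"}, and a model is a mapping from the instance space to the label space:

```python
# Toy training set of labelled instances (x, l(x)), i.e. examples.
training_set = [
    ("win a free prize now", "spam"),
    ("meeting moved to 3pm", "ham"),
]

def model(email: str) -> str:
    """A deliberately naive mapping from instance space to label space."""
    return "spam" if "free" in email else "ham"

print(model("claim your free gift"))  # spam
```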
Assessing classification performance
• The outputs of learning algorithms need to be assessed and analyzed carefully, and this analysis must be interpreted correctly, so as to evaluate different learning algorithms.
• The performance of classifiers can be summarized by means of a table known as a contingency table or confusion matrix.
1. Contingency table or confusion
matrix
• A confusion matrix is a table that is used to describe
the performance of a classification model (or
“classifier”) on a set of test data for which the true
values are known.
• It is a summary of prediction results on a classification
problem.
• The number of correct and incorrect predictions are
summarized with count values and broken down by
each class.
• It shows the ways in which the classification model is confused when it makes predictions.
• It contains information about actual and predicted classifications.
• True Positive (TP): the actual class is spam and the classifier has correctly predicted it as spam.
• False Negative (FN): the actual class is spam, but the classifier has incorrectly predicted it as non-spam.
• False Positive (FP): the actual class is non-spam, but the classifier has incorrectly predicted it as spam.
• True Negative (TN): the actual class is non-spam and the classifier has correctly predicted it as non-spam.
Ex: Confusion matrix of e-mail classification
• The classification problem has spam and non-spam classes, and the dataset contains 100 examples: 65 spam and 35 non-spam.
• The classifier in this example produces the following counts (used in the calculations below):

                    Predicted spam    Predicted non-spam
  Actual spam           TP = 45            FN = 20
  Actual non-spam       FP = 5             TN = 30
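A sketch of how such a confusion matrix could be computed, assuming scikit-learn is available; the label arrays below are synthetic, built only to reproduce the counts on the slide (TP = 45, FN = 20, FP = 5, TN = 30):

```python
from sklearn.metrics import confusion_matrix

# 65 spam and 35 non-spam examples, arranged to match the slide's counts.
y_true = ["spam"] * 65 + ["non-spam"] * 35
y_pred = (["spam"] * 45 + ["non-spam"] * 20      # 45 TP, 20 FN
          + ["spam"] * 5 + ["non-spam"] * 30)    # 5 FP, 30 TN

cm = confusion_matrix(y_true, y_pred, labels=["spam", "non-spam"])
print(cm)
# [[45 20]
#  [ 5 30]]   rows = actual class, columns = predicted class
```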
Sensitivity: also referred to as True Positive Rate or Recall.
• It is a measure of the positive examples labelled as positive by the classifier.
• It should be as high as possible.
• For instance, the proportion of spam e-mails that are correctly identified as spam among all spam e-mails.
• Out of all the positive examples, how many did we predict correctly?

Sensitivity = TP / (TP + FN) = 45 / (45 + 20) = 69.23%
(69.23% of the spam e-mails are correctly classified as spam.)
• Specificity: True Negative Rate.
• It is a measure of the negative examples labelled as negative by the classifier; it should also be as high as possible.
• For instance, the proportion of non-spam e-mails that are correctly identified as non-spam among all non-spam e-mails.

Specificity = TN / (TN + FP) = 30 / (30 + 5) = 85.71%
(85.71% of the non-spam e-mails are correctly classified as non-spam.)
• Accuracy is the proportion of the total number of predictions that are correct.

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (45 + 30) / (45 + 20 + 5 + 30) = 75%
(75% of the examples are correctly classified by the classifier.)
• Precision is the ratio of the number of correctly classified positive examples to the total number of examples predicted as positive.
• It shows the correctness achieved in positive predictions (out of all the examples we predicted as positive, how many are actually positive).
• High precision indicates that an example labelled as positive is indeed positive (a small number of FP).

Precision = TP / (TP + FP) = 45 / (45 + 5) = 90%
(90% of the examples classified as spam are actually spam.)
• Recall: the ratio of the number of correctly classified positive examples to the total number of positive examples, Recall = TP / (TP + FN); it is the same quantity as sensitivity.
• Out of all the positive examples, how many did we predict correctly?
• It should be as high as possible.
• High recall indicates that the class is correctly recognized (a small number of FN).
F-measure (F1 score)
• A good choice when you seek a balance between precision and recall.
• It combines recall and precision in one equation, so that models with low recall and high precision (or vice versa) can be distinguished and compared:

  F1 = 2 · (Precision · Recall) / (Precision + Recall)

• In the full confusion matrix, the last column and the last row give the marginals (i.e., the column and row sums).
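A minimal sketch that reproduces the metric values computed in the preceding slides from the confusion-matrix counts:

```python
# Counts from the e-mail example: TP = 45, FN = 20, FP = 5, TN = 30.
TP, FN, FP, TN = 45, 20, 5, 30

sensitivity = TP / (TP + FN)                   # recall / true positive rate
specificity = TN / (TN + FP)                   # true negative rate
accuracy    = (TP + TN) / (TP + FN + FP + TN)
precision   = TP / (TP + FP)
f1          = 2 * precision * sensitivity / (precision + sensitivity)

print(f"sensitivity = {sensitivity:.4f}")      # 0.6923
print(f"specificity = {specificity:.4f}")      # 0.8571
print(f"accuracy    = {accuracy:.4f}")         # 0.7500
print(f"precision   = {precision:.4f}")        # 0.9000
print(f"F1          = {f1:.4f}")               # 0.7826
```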
Visualizing classification performance
1. Coverage plot: a coverage plot visualizes four numbers (the number of positives Pos, the number of negatives Neg, the number of true positives TP and the number of false positives FP) by means of a rectangular coordinate system and a point.
• In a coverage plot, classifiers with the same accuracy are connected by line segments with slope 1.
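A minimal coverage-plot sketch, assuming matplotlib is available and reusing the e-mail example counts (Pos = 65, Neg = 35, TP = 45, FP = 5); the dashed line marks the points with the same accuracy as the plotted classifier (slope 1):

```python
import matplotlib.pyplot as plt

Pos, Neg = 65, 35          # number of positive / negative examples
TP, FP = 45, 5             # the classifier's true and false positives

plt.figure(figsize=(4, 4))
plt.scatter([FP], [TP], label="classifier (FP=5, TP=45)")
# Points with the same TP - FP have the same accuracy, so the line has slope 1.
plt.plot([0, Neg], [TP - FP, TP - FP + Neg], "--", label="same accuracy")
plt.xlim(0, Neg); plt.ylim(0, Pos)
plt.xlabel("False positives (out of Neg = 35)")
plt.ylabel("True positives (out of Pos = 65)")
plt.legend(); plt.tight_layout(); plt.show()
```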
2. ROC curves
• An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds.
• This curve plots two parameters: True Positive Rate and False Positive Rate.
• An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both false positives and true positives.
https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
ROC Curve
Let us consider the following hypothetical data:
True Labels: [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
Predicted Probabilities: [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]
https://www.geeksforgeeks.org/auc-roc-curve/
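A sketch of the ROC computation for this hypothetical data, assuming scikit-learn is available:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")

# Every positive here scores above every negative, so the AUC is 1.0.
print("AUC =", roc_auc_score(y_true, y_score))
```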
ROC Curve

Actual Class    Predicted Probability
1               0.8
0               0.96
1               0.4
1               0.3
0               0.2
1               0.7

• Set different thresholds (0, 0.2, 0.4, 0.6, 0.8, 1) and generate the ROC curve by plotting TPR against FPR, as in the sketch below.
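A plain-Python sketch of the threshold sweep described above, assuming that a predicted probability at or above the threshold is classified as positive:

```python
actual = [1, 0, 1, 1, 0, 1]               # actual classes from the table
prob   = [0.8, 0.96, 0.4, 0.3, 0.2, 0.7]  # predicted probabilities

for threshold in [0, 0.2, 0.4, 0.6, 0.8, 1]:
    pred = [1 if p >= threshold else 0 for p in prob]
    tp = sum(1 for a, y in zip(actual, pred) if a == 1 and y == 1)
    fn = sum(1 for a, y in zip(actual, pred) if a == 1 and y == 0)
    fp = sum(1 for a, y in zip(actual, pred) if a == 0 and y == 1)
    tn = sum(1 for a, y in zip(actual, pred) if a == 0 and y == 0)
    tpr, fpr = tp / (tp + fn), fp / (fp + tn)
    print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
# Plotting the (FPR, TPR) pairs, e.g. with matplotlib, gives the ROC curve.
```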
Problem 2 and Problem 3: further ROC exercises.

Example
1. Set the threshold T >= 0.8
2. Calculate TP, TN, FP, FN
3. Calculate TPR and FPR
4. Draw the ROC curve with FPR vs. TPR

A\P      C     ¬C
C        TP    FN    P
¬C       FP    TN    N
         P'    N'    All

• FPR: the proportion of negative examples that were incorrectly classified as positive.
3. AUC CURVE
■ AUC stands for "Area under the ROC
Curve." That is, AUC measures the
entire two-dimensional area
underneath the entire ROC curve
(think integral calculus) from (0,0)
to (1,1).
■ AUC ranges in value from 0 to 1. A
model whose predictions are 100%
wrong has an AUC of 0.0; one
whose predictions are 100% correct
has an AUC of 1.0.
■ Thus, AUC ROC indicates how well the
probabilities from the positive classes
are separated from the negative classes.
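A well-known equivalent reading of AUC is that it equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one (ties counted as half). A small plain-Python check of this pairwise-ranking view on the earlier hypothetical data:

```python
y_true  = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]

pos = [s for y, s in zip(y_true, y_score) if y == 1]
neg = [s for y, s in zip(y_true, y_score) if y == 0]

# Fraction of positive/negative pairs ranked correctly (ties count as 0.5).
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))
print(f"AUC = {auc:.2f}")  # 1.00: every positive outscores every negative
```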

https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
https://medium.com/greyatom/lets-learn-about-auc-roc-curve-4a94b4d88152
AUC CURVE
■ ROC curves for different classifiers on a given dataset.
■ Useful for picking the right classifier: based on the AUC, choose the one with a good TP rate.
■ In the figure, the classifier corresponding to the red line would be selected.
Class Probability Estimation
• The probability of an event is the likelihood that the event will happen.
• Probability-based classifiers produce a class probability estimate: the probability that a test instance belongs to the predicted class.
• They not only predict the class label but also provide a probability for the respective label; the estimated class probability can then be used for decision making.
• Defn: A probabilistic classifier is a classifier that is able to predict, given an observation of an input, a probability distribution over a set of classes.
• A binary (ordinary) classifier uses a function that assigns to a sample x a class label ŷ:
  ŷ = f(x)
• Probabilistic classifiers: instead of functions, they are conditional distributions Pr(Y | X); for a given x ∈ X, they assign probabilities to all y ∈ Y (and these probabilities sum to one).
• Ex: Naive Bayes, logistic regression and multilayer perceptrons are naturally probabilistic.
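A minimal sketch of a probabilistic classifier, assuming scikit-learn: logistic regression returns, for each input, a distribution over the classes, and the probabilities in each row sum to one (the toy data below is purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.1], [0.35], [0.4], [0.7], [0.8], [0.9]])  # toy 1-D features
y = np.array([0, 0, 0, 1, 1, 1])                           # toy labels

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(np.array([[0.2], [0.6]]))
print(proba)               # each row: [Pr(y=0 | x), Pr(y=1 | x)]
print(proba.sum(axis=1))   # [1. 1.]
```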
Ex: Probability estimation tree
Assessing class probability estimates
1. Sum Squared Error (SSE): square the individual error terms (the difference between each estimated value and the actual value), which gives a positive number for every term, and add them up: SSE = Σ (estimated value − actual value)².
2. Mean Squared Error (MSE): measures the average of the squares of the errors, i.e. the average squared difference between the estimated values and the actual values (take the average, or mean, of the individual squared error terms): MSE = SSE / N.
3. Brier score
• A definition of error in probability estimates, used in forecasting theory:

  BS = (1/N) Σ_t (f_t − o_t)²

  where f_t is the probability that was forecast, o_t is the actual outcome of the event at instance t (0 if it does not happen and 1 if it does happen), and N is the number of forecasting instances.
• In effect, it is the mean squared error of the forecast.
• The Brier score in this form is a proper scoring rule for binary events (for example, "rain" or "no rain").
• Ex: Suppose that one is forecasting the probability P that it will rain on a given day. Then the Brier score is calculated as follows:
• If the forecast is 100% (P = 1) and it rains, then the Brier score is 0, the best score achievable.
• If the forecast is 100% and it does not rain, then the Brier score is 1, the worst score achievable.
• If the forecast is 70% (P = 0.70) and it rains, then the Brier score is (0.70 − 1)² = 0.09.
• In contrast, if the forecast is 70% (P = 0.70) and it does not rain, then the Brier score is (0.70 − 0)² = 0.49.
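A short sketch of the Brier score, treating the four single-forecast cases above as one small forecast sequence; scikit-learn's brier_score_loss is used as a cross-check (assuming scikit-learn is available):

```python
from sklearn.metrics import brier_score_loss

forecasts = [1.0, 1.0, 0.70, 0.70]   # forecast probability of rain, f_t
outcomes  = [1,   0,   1,    0]      # o_t: 1 = it rained, 0 = it did not

# Mean squared error of the forecasts; individual terms: 0.0, 1.0, 0.09, 0.49.
brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)
print(brier)                                  # 0.395
print(brier_score_loss(outcomes, forecasts))  # same value
```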
Empirical probability
• It uses the number of occurrences of an outcome within a sample set as a basis for determining the probability of that outcome.
• For example, the number of times "event X" happens out of 100 trials, divided by 100, estimates the probability of event X happening.
• The empirical probability of an event is the ratio of the number of outcomes in which a specified event occurs to the total number of trials.
• Empirical probability (experimental probability) estimates probabilities from experience and observation.
Ex:
In a buffet, 95 out of 100 people chose to order coffee over tea. What is the empirical
probability of someone ordering tea?
Ans: The empirical probability of someone ordering tea is 5/100 = 5%.
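A tiny sketch of this empirical-probability calculation, counting how often the outcome of interest occurs in the observed sample:

```python
orders = ["coffee"] * 95 + ["tea"] * 5   # the 100 observed orders
p_tea = orders.count("tea") / len(orders)
print(p_tea)  # 0.05, i.e. 5%
```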
