Classification Metrics Mod 6

The document discusses binary classification, which involves categorizing instances into two groups based on a classification rule, and outlines key concepts such as confusion matrices, performance metrics (sensitivity, specificity, accuracy, precision, F-measure), and visualization techniques like ROC curves. It also explains class probability estimation, including probabilistic classifiers and methods for assessing probability estimates like Mean Squared Error and Brier Score. Lastly, it touches on empirical probability, which is derived from observed outcomes in sample sets.



Evaluating Binary Classification


Binary classification is the task of classifying elements into two groups based on a
classification rule.

Observed response (output) 'y' has two possible values: +/-, or True/False.
Requires defining the relationship between h(x) and y.
Uses a decision rule.
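For instance, a simple thresholded decision rule (a hypothetical Python sketch, not from the original notes) maps a model score h(x) to one of the two classes:

```python
def decide(h_x, threshold=0.5):
    """Decision rule: map the model output h(x) to one of two class labels."""
    return "+" if h_x >= threshold else "-"

print(decide(0.8))  # '+'
print(decide(0.3))  # '-'
```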

Examples:

Medical test: Determining if a patient has a disease.


Fitness test: Determining if a person is fit.
Spam email classification.

Definitions
Instances: The objects of interest in machine learning.
Instance Space: The set of all possible instances. For example, the set of all
possible e-mails.
Label Space: The set of possible labels; used in supervised learning to label examples.
Model: A mapping from the instance space to the output space.
In classification, the output space is a set of classes.
In regression, it is the set of real numbers.
To learn a model, a training set of labeled instances (x, l(x)), also called
examples, is needed.

Assessing Classification Performance


The outputs of learning algorithms must be assessed and analyzed carefully in
order to compare different learning algorithms. The performance of a classifier
can be summarized using a contingency table or confusion matrix.

1. Contingency Table or Confusion Matrix


A confusion matrix is a table that describes the performance of a
classification model on a set of test data where the true values are known.


It summarizes prediction results on a classification problem.


It contains counts of correct and incorrect predictions, broken down by each
class.
It shows how the classification model is confused when it makes predictions.
It contains information about actual and predicted classifications.

Key terms:

True Positive (TP): The classifier correctly predicts a spam email as spam.
False Negative (FN): The classifier incorrectly predicts a spam email as non-
spam (a miss).
False Positive (FP): The classifier incorrectly predicts a non-spam email as
spam (a false alarm).
True Negative (TN): The classifier correctly predicts a non-spam email as non-
spam.
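To make these terms concrete, here is a small Python sketch (not part of the original notes; the function name and toy labels are made up) that tallies TP, FN, FP, and TN from paired lists of true and predicted labels:

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count TP, FN, FP, TN for a binary problem with the given positive class."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1   # spam correctly flagged as spam
        elif actual == positive:
            fn += 1   # spam missed (predicted as non-spam)
        elif predicted == positive:
            fp += 1   # non-spam wrongly flagged as spam (false alarm)
        else:
            tn += 1   # non-spam correctly predicted as non-spam
    return tp, fn, fp, tn

# Tiny made-up example:
y_true = ["spam", "spam", "non-spam", "non-spam", "spam"]
y_pred = ["spam", "non-spam", "non-spam", "spam", "spam"]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 1)
```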

Example: Confusion matrix of email classification

Classification problem: spam and non-spam classes.
Dataset: 100 examples, 65 are spam, and 35 are non-spam.
Taking spam as the positive class, the counts used in the calculations below are:

                     Predicted spam   Predicted non-spam   Total
Actual spam          TP = 45          FN = 20              65
Actual non-spam      FP = 5           TN = 30              35
Total                50               50                   100

Key Metrics Derived from the Confusion Matrix


Sensitivity (True Positive Rate or Recall): Measure of positive examples
labeled as positive by the classifier. Should be as high as possible.
For instance, the proportion of spam emails correctly labeled as spam
among all spam emails.
Out of all the positive examples, how many we predicted correctly.
Sensitivity = TP / (TP + FN)

Example: Sensitivity = 45 / (45 + 20) = 69.23% (69.23% of spam emails are
correctly classified).
Specificity (True Negative Rate): Measure of negative examples labeled as
negative by the classifier. Should be as high as possible.
For instance, the proportion of non-spam emails correctly labeled as
non-spam among all non-spam emails.
Specificity = TN / (TN + FP)

Example: Specificity = 30 / (30 + 5) = 85.71% (85.71% of non-spam emails are
accurately classified).
Accuracy: Proportion of the total number of predictions that are correct.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Example: Accuracy = (45 + 30) / (45 + 30 + 20 + 5) = 75% (75% of examples are
correctly classified).
Precision: Ratio of correctly classified positive examples to the total number of
predicted positive examples.
Shows correctness achieved in positive prediction (out of all the examples
predicted as positive, how many are actually positive).
High precision indicates that an example labeled as positive is indeed
positive (small number of FPs).
Precision = TP / (TP + FP)

Example: Precision = 45 / (45 + 5) = 90% (90% of examples classified as spam
are actually spam).


Recall: Ratio of correctly classified positive examples to the total number of
positive examples.
Out of all the positive examples, how many we predicted correctly.
Should be as high as possible.
High recall indicates the class is correctly recognized (small number of
FNs).
Recall = TP / (TP + FN) (the same as sensitivity)
F-measure (F1 score): Balances precision and recall.
Combines recall and precision in a single equation, which helps when
comparing models with low recall and high precision (or vice versa).
F-measure = (2 · Precision · Recall) / (Precision + Recall)

In the confusion matrix, the last column and the last row give the marginals (i.e., the column and row sums).
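As a quick check of the numbers above, here is a minimal Python sketch (not from the original notes) that computes these metrics from the email-example counts:

```python
# Counts from the email example; spam is the positive class.
TP, FN, FP, TN = 45, 20, 5, 30

sensitivity = TP / (TP + FN)                 # recall / true positive rate
specificity = TN / (TN + FP)                 # true negative rate
accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = sensitivity
f_measure = 2 * precision * recall / (precision + recall)

print(f"Sensitivity: {sensitivity:.4f}")  # 0.6923
print(f"Specificity: {specificity:.4f}")  # 0.8571
print(f"Accuracy:    {accuracy:.4f}")     # 0.7500
print(f"Precision:   {precision:.4f}")    # 0.9000
print(f"F-measure:   {f_measure:.4f}")    # 0.7826
```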

Visualizing Classification Performance

1. Coverage Plot
A coverage plot visualizes the four numbers (number of positives Pos, number of
negatives Neg, number of true positives TP, and number of false positives FP) in a
rectangular coordinate system: the x-axis runs from 0 to Neg, the y-axis from 0 to
Pos, and each classifier is plotted as a single point (FP, TP). In a coverage plot,
classifiers with the same accuracy are connected by line segments with slope 1 (a
sketch follows below).
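As an illustration only (a hypothetical sketch, not from the original notes), the following Python/matplotlib code places the email-example classifier (FP = 5, TP = 45, with Pos = 65 and Neg = 35) in coverage space and draws the slope-1 line of equal accuracy through it:

```python
import matplotlib.pyplot as plt

# Email example from above: Pos = 65 spam, Neg = 35 non-spam, TP = 45, FP = 5.
POS, NEG, TP, FP = 65, 35, 45, 5

fig, ax = plt.subplots()
ax.scatter([FP], [TP], label="classifier (FP=5, TP=45)")

# Constant accuracy means (TP + TN) is constant, i.e. TP - FP is constant,
# so equally accurate classifiers lie on a line of slope 1 through this point.
ax.plot([0, NEG], [TP - FP, TP - FP + NEG], linestyle="--",
        label="same accuracy (slope 1)")

ax.set_xlim(0, NEG)
ax.set_ylim(0, POS)
ax.set_xlabel("False positives (0 .. Neg)")
ax.set_ylabel("True positives (0 .. Pos)")
ax.legend()
plt.show()
```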

2. ROC Curves
An ROC curve (receiver operating characteristic curve) is a graph showing
the performance of a classification model at all classification thresholds.

This curve plots two parameters:


True Positive Rate (TPR)
False Positive Rate (FPR)

An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the
classification threshold classifies more items as positive, thus increasing both False
Positives and True Positives.

Example:

Hypothetical Data:

True Labels: [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]


Predicted Probabilities: [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]

Using the convention that an instance is predicted positive when its predicted
probability is greater than or equal to the threshold (here Pos = 5 and Neg = 5):

Case 1: Threshold = 0.5

TPR = TP / (TP + FN) = 5 / (5 + 0) = 1.0
FPR = FP / (FP + TN) = 1 / (1 + 4) = 0.2

Case 2: Threshold = 0.7

TPR = TP / (TP + FN) = 4 / (4 + 1) = 0.8
FPR = FP / (FP + TN) = 0 / (0 + 5) = 0

Case 3: Threshold = 0.4

TPR = TP / (TP + FN) = 5 / (5 + 0) = 1.0
FPR = FP / (FP + TN) = 2 / (2 + 3) = 0.4

Case 4: Threshold = 0.2

TPR = TP / (TP + FN) = 5 / (5 + 0) = 1.0
FPR = FP / (FP + TN) = 4 / (4 + 1) = 0.8

Case 5: Threshold = 0.85

TPR = TP / (TP + FN) = 1 / (1 + 4) = 0.2
FPR = FP / (FP + TN) = 0 / (0 + 5) = 0

As expected, lowering the threshold never decreases TPR or FPR, and the
resulting (FPR, TPR) points trace out the ROC curve.
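The following Python sketch (added as a check, assuming the "probability >= threshold means positive" convention stated above) recomputes TPR and FPR for each threshold:

```python
# Hypothetical data from the example above.
y_true = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]

def tpr_fpr(y_true, y_prob, threshold):
    """Return (TPR, FPR) when predicting positive for prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    return tp / (tp + fn), fp / (fp + tn)

for t in [0.5, 0.7, 0.4, 0.2, 0.85]:
    tpr, fpr = tpr_fpr(y_true, y_prob, t)
    print(f"threshold={t:.2f}  TPR={tpr:.1f}  FPR={fpr:.1f}")
# threshold=0.50  TPR=1.0  FPR=0.2
# threshold=0.70  TPR=0.8  FPR=0.0
# threshold=0.40  TPR=1.0  FPR=0.4
# threshold=0.20  TPR=1.0  FPR=0.8
# threshold=0.85  TPR=0.2  FPR=0.0
```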

AUC Curve
AUC stands for "Area Under the ROC Curve." AUC measures the entire
two-dimensional area underneath the entire ROC curve from (0,0) to (1,1).

AUC ranges in value from 0 to 1.


A model whose predictions are 100% wrong has an AUC of 0.0.
One whose predictions are 100% correct has an AUC of 1.0.
AUC ROC indicates how well the probabilities from the positive classes are
separated from the negative classes.
Plotting the ROC curves of different classifiers for a given dataset is useful for
picking the right classifier: prefer the one with the larger AUC, i.e., a better TP
rate across the range of FP rates (a code sketch follows below).
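For comparison, a minimal sketch (assuming scikit-learn is available; not part of the original notes) that computes the ROC points and the AUC for the hypothetical data above:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]

# roc_curve sweeps thresholds over the predicted probabilities.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print(list(zip(thresholds, fpr, tpr)))

# AUC summarizes the whole curve in a single number between 0 and 1.
# Here it is 1.0, since every positive's score exceeds every negative's score.
print(roc_auc_score(y_true, y_prob))
```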

Class Probability Estimation


The probability of an event is the likelihood that the event will happen.


Probability-based classifiers produce a class probability estimate (the
probability that a test instance belongs to the predicted class).
This involves not only predicting the class label but also obtaining the
probability of that label, which can be used for decision-making.

Definition:

A probabilistic classifier is a classifier that is able to predict, given an


observation of an input, a probability distribution over a set of classes.

A binary (ordinary) classifier uses a function that assigns to a sample 'x' a class
label 'ŷ':

ŷ = f(x)

Probabilistic classifiers: Instead of functions, they are conditional distributions
Pr(Y | X): for a given x ∈ X, they assign probabilities to all y ∈ Y (and these
probabilities sum to one).

Examples: Naive Bayes, logistic regression, and multilayer perceptrons are


naturally probabilistic.
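As an illustration (a minimal sketch assuming scikit-learn; the toy data and variable names are made up for this example), a probabilistic classifier such as logistic regression returns class probabilities alongside hard labels:

```python
from sklearn.linear_model import LogisticRegression

# Toy 1-D dataset: feature = number of suspicious words, label = 1 for spam.
X = [[0], [1], [2], [3], [8], [9], [10], [11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

# Hard label (ordinary classifier view): y_hat = f(x)
print(clf.predict([[5]]))        # a single class label near the decision boundary

# Probability distribution over the classes (probabilistic view): Pr(Y | X = x)
print(clf.predict_proba([[5]]))  # one row per sample; rows sum to 1, columns follow clf.classes_
```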

Assessing Class Probability Estimates


1. Sum of Squared Errors (SSE): Square each individual error term (the difference
between the estimated value and the actual value), which gives a non-negative
number, and add these squared terms together.
2. Mean Squared Error (MSE): Measures the average of the squares of the errors.
The average squared difference between the estimated values and the
actual value (take the average, or the mean, of the individual squared error
terms).
3. Brier Score:
Definition of error in probability estimates, used in forecasting theory.
f_t - the probability that was forecast for instance t.
o_t - the actual outcome of the event at instance t (0 if it does not happen
and 1 if it does happen).
N - the number of forecasting instances.
Brier Score = (1/N) · Σ_{t=1..N} (f_t − o_t)²

In effect, it is the mean squared error of the forecast.


In its most common formulation, the Brier score applies to binary events (for
example, "rain" or "no rain").
Example: Suppose one is forecasting the probability P that it will rain on a
given day. Then the Brier score is calculated as follows:
If the forecast is 100% (P = 1) and it rains, then the Brier Score is 0
(best score).
If the forecast is 100% and it does not rain, then the Brier Score is 1
(worst score).
If the forecast is 70% (P = 0.70) and it rains, then the Brier Score is
(0.70 − 1)² = 0.09.
If the forecast is 70% (P = 0.70) and it does not rain, then the Brier
Score is (0.70 − 0)² = 0.49.
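A small sketch (not from the original notes) that reproduces these single-forecast scores and shows the Brier score as the mean squared error over several forecasts:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Single-forecast cases from the rain example above.
print(brier_score([1.00], [1]))  # 0.0  (best score)
print(brier_score([1.00], [0]))  # 1.0  (worst score)
print(brier_score([0.70], [1]))  # ~0.09, i.e. (0.70 - 1)^2
print(brier_score([0.70], [0]))  # ~0.49, i.e. (0.70 - 0)^2

# Averaging over several forecasts gives the N-instance Brier score.
print(brier_score([0.9, 0.7, 0.2], [1, 0, 0]))  # ~0.18 = ((0.1)^2 + (0.7)^2 + (0.2)^2) / 3
```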

Empirical Probability
Empirical probability uses the number of occurrences of an outcome
within a sample set as a basis for determining the probability of that
outcome.


The number of times "event X" happens, out of the total number of trials, gives
the empirical probability of event X happening.
The empirical probability of an event is the ratio of the number of outcomes in
which a specified event occurs to the total number of trials.
Empirical probability (experimental probability) estimates probabilities from
experience and observation.
Example: In a buffet, 95 out of 100 people chose to order coffee over tea. What
is the empirical probability of someone ordering tea?
Answer: The empirical probability of someone ordering tea is 5/100 = 0.05 (5%).
