Classification Metrics Mod 6

The document discusses binary classification, which involves categorizing instances into two groups based on a classification rule, and outlines key concepts such as confusion matrices, performance metrics (sensitivity, specificity, accuracy, precision, F-measure), and visualization techniques like ROC curves. It also explains class probability estimation, including probabilistic classifiers and methods for assessing probability estimates like Mean Squared Error and Brier Score. Lastly, it touches on empirical probability, which is derived from observed outcomes in sample sets.



Evaluating Binary Classification


Binary classification is the task of classifying elements into two groups based on a
classification rule.

Observed response (output) 'y' has two possible values: +/-, or True/False.
Requires defining the relationship between h(x) and y.
Uses a decision rule.
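For instance, a simple thresholded decision rule (a hypothetical Python sketch, not from the original notes) maps a model score h(x) to one of the two classes:

```python
def decide(h_x, threshold=0.5):
    """Decision rule: map the model output h(x) to one of two class labels."""
    return "+" if h_x >= threshold else "-"

print(decide(0.8))  # '+'
print(decide(0.3))  # '-'
```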

Examples:

Medical test: Determining if a patient has a disease.


Fitness test: Determining if a person is fit.
Spam email classification.

Definitions
Instances: The objects of interest in machine learning.
Instance Space: The set of all possible instances. For example, the set of all
possible e-mails.
Label Space: The set of possible labels; used in supervised learning to label examples.
Model: A mapping from the instance space to the output space.
In classification, the output space is a set of classes.
In regression, it is the set of real numbers.
To learn a model, a training set of labeled instances (x, l(x)), also called
examples, is needed.

Assessing Classification Performance


The outputs of learning algorithms must be assessed and analyzed carefully in
order to compare different learning algorithms. The performance of a classifier
can be summarized using a contingency table or confusion matrix.

1. Contingency Table or Confusion Matrix


A confusion matrix is a table that describes the performance of a
classification model on a set of test data where the true values are known.


It summarizes prediction results on a classification problem.


It contains counts of correct and incorrect predictions, broken down by each
class.
It shows how the classification model is confused when it makes predictions.
It contains information about actual and predicted classifications.

Key terms:

True Positive (TP): The classifier correctly predicts a spam email as spam.
False Negative (FN): The classifier incorrectly predicts a spam email as non-
spam (a miss).
False Positive (FP): The classifier incorrectly predicts a non-spam email as
spam (a false alarm).
True Negative (TN): The classifier correctly predicts a non-spam email as non-
spam.
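To make these terms concrete, here is a small Python sketch (not part of the original notes; the function name and toy labels are made up) that tallies TP, FN, FP, and TN from paired lists of true and predicted labels:

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count TP, FN, FP, TN for a binary problem with the given positive class."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1   # spam correctly flagged as spam
        elif actual == positive:
            fn += 1   # spam missed (predicted as non-spam)
        elif predicted == positive:
            fp += 1   # non-spam wrongly flagged as spam (false alarm)
        else:
            tn += 1   # non-spam correctly predicted as non-spam
    return tp, fn, fp, tn

# Tiny made-up example:
y_true = ["spam", "spam", "non-spam", "non-spam", "spam"]
y_pred = ["spam", "non-spam", "non-spam", "spam", "spam"]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 1)
```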

Example: Confusion matrix of email classification

Classification problem: spam and non-spam classes.
Dataset: 100 examples, 65 are spam, and 35 are non-spam.
Taking spam as the positive class, the counts used in the calculations below are:

                     Predicted spam   Predicted non-spam   Total
Actual spam          TP = 45          FN = 20              65
Actual non-spam      FP = 5           TN = 30              35
Total                50               50                   100

Key Metrics Derived from the Confusion Matrix


Sensitivity (True Positive Rate or Recall): Measure of positive examples
labeled as positive by the classifier. Should be as high as possible.
For instance, the proportion of spam emails correctly labeled as spam
among all spam emails.
Out of all the positive examples, how many we predicted correctly.
Sensitivity = TP / (TP + FN)

Example: Sensitivity = 45 / (45 + 20) = 69.23% (69.23% of spam emails are
correctly classified).
Specificity (True Negative Rate): Measure of negative examples labeled as
negative by the classifier. Should be as high as possible.
For instance, the proportion of non-spam emails correctly labeled as
non-spam among all non-spam emails.
Specificity = TN / (TN + FP)

Example: Specificity = 30 / (30 + 5) = 85.71% (85.71% of non-spam emails are
accurately classified).
Accuracy: Proportion of the total number of predictions that are correct.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Example: Accuracy = (45 + 30) / (45 + 30 + 20 + 5) = 75% (75% of examples are
correctly classified).
Precision: Ratio of correctly classified positive examples to the total number of
predicted positive examples.
Shows correctness achieved in positive prediction (out of all the examples
predicted as positive, how many are actually positive).
High precision indicates that an example labeled as positive is indeed
positive (small number of FPs).
Precision = TP / (TP + FP)

Example: Precision = 45 / (45 + 5) = 90% (90% of examples classified as spam
are actually spam).


Recall: Ratio of correctly classified positive examples to the total number of
positive examples.
Out of all the positive examples, how many we predicted correctly.
Should be as high as possible.
High recall indicates the class is correctly recognized (small number of
FNs).
Recall = TP / (TP + FN) (the same as sensitivity)
F-measure (F1 score): Balances precision and recall.
Combines recall and precision in a single equation, which helps when
comparing models with low recall and high precision (or vice versa).
F-measure = (2 · Precision · Recall) / (Precision + Recall)

In the confusion matrix, the last column and the last row give the marginals (i.e., the column and row sums).
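As a quick check of the numbers above, here is a minimal Python sketch (not from the original notes) that computes these metrics from the email-example counts:

```python
# Counts from the email example; spam is the positive class.
TP, FN, FP, TN = 45, 20, 5, 30

sensitivity = TP / (TP + FN)                 # recall / true positive rate
specificity = TN / (TN + FP)                 # true negative rate
accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = sensitivity
f_measure = 2 * precision * recall / (precision + recall)

print(f"Sensitivity: {sensitivity:.4f}")  # 0.6923
print(f"Specificity: {specificity:.4f}")  # 0.8571
print(f"Accuracy:    {accuracy:.4f}")     # 0.7500
print(f"Precision:   {precision:.4f}")    # 0.9000
print(f"F-measure:   {f_measure:.4f}")    # 0.7826
```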

Visualizing Classification Performance

1. Coverage Plot
A coverage plot visualizes the four numbers (number of positives Pos, number of
negatives Neg, number of true positives TP, and number of false positives FP) in a
rectangular coordinate system: the x-axis runs from 0 to Neg, the y-axis from 0 to
Pos, and each classifier is plotted as a single point (FP, TP). In a coverage plot,
classifiers with the same accuracy are connected by line segments with slope 1 (a
sketch follows below).
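As an illustration only (a hypothetical sketch, not from the original notes), the following Python/matplotlib code places the email-example classifier (FP = 5, TP = 45, with Pos = 65 and Neg = 35) in coverage space and draws the slope-1 line of equal accuracy through it:

```python
import matplotlib.pyplot as plt

# Email example from above: Pos = 65 spam, Neg = 35 non-spam, TP = 45, FP = 5.
POS, NEG, TP, FP = 65, 35, 45, 5

fig, ax = plt.subplots()
ax.scatter([FP], [TP], label="classifier (FP=5, TP=45)")

# Constant accuracy means (TP + TN) is constant, i.e. TP - FP is constant,
# so equally accurate classifiers lie on a line of slope 1 through this point.
ax.plot([0, NEG], [TP - FP, TP - FP + NEG], linestyle="--",
        label="same accuracy (slope 1)")

ax.set_xlim(0, NEG)
ax.set_ylim(0, POS)
ax.set_xlabel("False positives (0 .. Neg)")
ax.set_ylabel("True positives (0 .. Pos)")
ax.legend()
plt.show()
```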

2. ROC Curves
An ROC curve (receiver operating characteristic curve) is a graph showing
the performance of a classification model at all classification thresholds.

This curve plots two parameters:


True Positive Rate (TPR)
False Positive Rate (FPR)

An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the
classification threshold classifies more items as positive, thus increasing both False
Positives and True Positives.

Example:

Hypothetical Data:

True Labels: [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]


Predicted Probabilities: [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]

Using the convention that an instance is predicted positive when its predicted
probability is greater than or equal to the threshold (here Pos = 5 and Neg = 5):

Case 1: Threshold = 0.5

TPR = TP / (TP + FN) = 5 / (5 + 0) = 1.0
FPR = FP / (FP + TN) = 1 / (1 + 4) = 0.2

Case 2: Threshold = 0.7

TPR = TP / (TP + FN) = 4 / (4 + 1) = 0.8
FPR = FP / (FP + TN) = 0 / (0 + 5) = 0

Case 3: Threshold = 0.4

TPR = TP / (TP + FN) = 5 / (5 + 0) = 1.0
FPR = FP / (FP + TN) = 2 / (2 + 3) = 0.4

Case 4: Threshold = 0.2

TPR = TP / (TP + FN) = 5 / (5 + 0) = 1.0
FPR = FP / (FP + TN) = 4 / (4 + 1) = 0.8

Case 5: Threshold = 0.85

TPR = TP / (TP + FN) = 1 / (1 + 4) = 0.2
FPR = FP / (FP + TN) = 0 / (0 + 5) = 0

As expected, lowering the threshold never decreases TPR or FPR, and the
resulting (FPR, TPR) points trace out the ROC curve.
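The following Python sketch (added as a check, assuming the "probability >= threshold means positive" convention stated above) recomputes TPR and FPR for each threshold:

```python
# Hypothetical data from the example above.
y_true = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]

def tpr_fpr(y_true, y_prob, threshold):
    """Return (TPR, FPR) when predicting positive for prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    return tp / (tp + fn), fp / (fp + tn)

for t in [0.5, 0.7, 0.4, 0.2, 0.85]:
    tpr, fpr = tpr_fpr(y_true, y_prob, t)
    print(f"threshold={t:.2f}  TPR={tpr:.1f}  FPR={fpr:.1f}")
# threshold=0.50  TPR=1.0  FPR=0.2
# threshold=0.70  TPR=0.8  FPR=0.0
# threshold=0.40  TPR=1.0  FPR=0.4
# threshold=0.20  TPR=1.0  FPR=0.8
# threshold=0.85  TPR=0.2  FPR=0.0
```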

AUC Curve
AUC stands for "Area Under the ROC Curve." AUC measures the entire
two-dimensional area underneath the entire ROC curve from (0,0) to (1,1).

AUC ranges in value from 0 to 1.


A model whose predictions are 100% wrong has an AUC of 0.0.
One whose predictions are 100% correct has an AUC of 1.0.
AUC ROC indicates how well the probabilities from the positive classes are
separated from the negative classes.
Plotting the ROC curves of different classifiers for a given dataset is useful for
picking the right classifier: prefer the one with the larger AUC, i.e., a better TP
rate across the range of FP rates (a code sketch follows below).
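For comparison, a minimal sketch (assuming scikit-learn is available; not part of the original notes) that computes the ROC points and the AUC for the hypothetical data above:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]

# roc_curve sweeps thresholds over the predicted probabilities.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print(list(zip(thresholds, fpr, tpr)))

# AUC summarizes the whole curve in a single number between 0 and 1.
# Here it is 1.0, since every positive's score exceeds every negative's score.
print(roc_auc_score(y_true, y_prob))
```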

Class Probability Estimation


The probability of an event is the likelihood that the event will happen.


Probability-based classifiers produce a class probability estimate (the
probability that a test instance belongs to the predicted class).
This involves not only predicting the class label but also obtaining the
probability of that label, which can be used for decision-making.

Definition:

A probabilistic classifier is a classifier that is able to predict, given an


observation of an input, a probability distribution over a set of classes.

A binary (ordinary) classifier uses a function that assigns to a sample 'x' a class
label 'ŷ':

ŷ = f(x)

Probabilistic classifiers: Instead of functions, they are conditional distributions
Pr(Y | X): for a given x ∈ X, they assign probabilities to all y ∈ Y (and these
probabilities sum to one).

Examples: Naive Bayes, logistic regression, and multilayer perceptrons are


naturally probabilistic.
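As an illustration (a minimal sketch assuming scikit-learn; the toy data and variable names are made up for this example), a probabilistic classifier such as logistic regression returns class probabilities alongside hard labels:

```python
from sklearn.linear_model import LogisticRegression

# Toy 1-D dataset: feature = number of suspicious words, label = 1 for spam.
X = [[0], [1], [2], [3], [8], [9], [10], [11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

# Hard label (ordinary classifier view): y_hat = f(x)
print(clf.predict([[5]]))        # a single class label near the decision boundary

# Probability distribution over the classes (probabilistic view): Pr(Y | X = x)
print(clf.predict_proba([[5]]))  # one row per sample; rows sum to 1, columns follow clf.classes_
```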

Assessing Class Probability Estimates


1. Sum of Squared Errors (SSE): Square each individual error term (the difference
between the estimated value and the actual value), which gives a non-negative
number, and add these squared terms together.
2. Mean Squared Error (MSE): Measures the average of the squares of the errors.
The average squared difference between the estimated values and the
actual value (take the average, or the mean, of the individual squared error
terms).
3. Brier Score:
Definition of error in probability estimates, used in forecasting theory.
f_t - the probability that was forecast for instance t.
o_t - the actual outcome of the event at instance t (0 if it does not happen
and 1 if it does happen).
N - the number of forecasting instances.
Brier Score = (1/N) · Σ_{t=1..N} (f_t − o_t)²

In effect, it is the mean squared error of the forecast.


In its most common formulation, the Brier score applies to binary events (for
example, "rain" or "no rain").
Example: Suppose one is forecasting the probability P that it will rain on a
given day. Then the Brier score is calculated as follows:
If the forecast is 100% (P = 1) and it rains, then the Brier Score is 0
(best score).
If the forecast is 100% and it does not rain, then the Brier Score is 1
(worst score).
If the forecast is 70% (P = 0.70) and it rains, then the Brier Score is
(0.70 − 1)² = 0.09.
If the forecast is 70% (P = 0.70) and it does not rain, then the Brier
Score is (0.70 − 0)² = 0.49.
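A small sketch (not from the original notes) that reproduces these single-forecast scores and shows the Brier score as the mean squared error over several forecasts:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Single-forecast cases from the rain example above.
print(brier_score([1.00], [1]))  # 0.0  (best score)
print(brier_score([1.00], [0]))  # 1.0  (worst score)
print(brier_score([0.70], [1]))  # ~0.09, i.e. (0.70 - 1)^2
print(brier_score([0.70], [0]))  # ~0.49, i.e. (0.70 - 0)^2

# Averaging over several forecasts gives the N-instance Brier score.
print(brier_score([0.9, 0.7, 0.2], [1, 0, 0]))  # ~0.18 = ((0.1)^2 + (0.7)^2 + (0.2)^2) / 3
```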

Empirical Probability
Empirical probability uses the number of occurrences of an outcome
within a sample set as a basis for determining the probability of that
outcome.


The number of times "event X" happens, out of the total number of trials, gives
the empirical probability of event X happening.
The empirical probability of an event is the ratio of the number of outcomes in
which a specified event occurs to the total number of trials.
Empirical probability (experimental probability) estimates probabilities from
experience and observation.
Example: In a buffet, 95 out of 100 people chose to order coffee over tea. What
is the empirical probability of someone ordering tea?
Answer: The empirical probability of someone ordering tea is 5/100 = 0.05 (5%).
