This document discusses machine learning classification tasks and performance evaluation metrics. It covers classifying images from the MNIST dataset using algorithms like logistic regression and decision trees. Various performance metrics are examined, including accuracy, precision, recall, F1 score, and confusion matrices. Tradeoffs between precision and recall are also addressed.
▪ Classification tasks
▪ Different performance measures for evaluating classifiers

Classification
▪ The most common supervised learning tasks are regression (predicting values) and classification (predicting classes).
▪ We previously explored a regression task, predicting housing values, using various algorithms such as Linear Regression, Decision Trees, and Random Forests.
▪ Now we will turn our attention to classification.

MNIST dataset
▪ The MNIST dataset is a set of 70,000 small images of digits handwritten by high school students and employees of the US Census Bureau.
▪ Each image is labeled with the digit it represents.
▪ This set is often called the “Hello World” of Machine Learning: whenever people come up with a new classification algorithm, they are curious to see how it will perform on MNIST.
▪ The following code fetches the MNIST dataset.
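The exact fetch code is not reproduced in these notes; the following is a minimal sketch, assuming scikit-learn’s fetch_openml helper is used to download MNIST:

    from sklearn.datasets import fetch_openml

    # Download MNIST from OpenML (cached locally after the first call).
    mnist = fetch_openml('mnist_784', version=1, as_frame=False)
    X, y = mnist["data"], mnist["target"]

    print(X.shape)  # (70000, 784): 70,000 images, 784 features each
    print(y.shape)  # (70000,): one label (the digit, as a string) per image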
MNIST dataset
▪ There are 70,000 images, and each image has 784 features.
▪ This is because each image is 28×28 pixels, and each feature simply represents one pixel’s intensity, from 0 (black) to 255 (white).
▪ The MNIST dataset is actually already split into a training set (the first 60,000 images) and a test set (the last 10,000 images).

Training a Binary Classifier
▪ Let’s simplify the problem for now and only try to identify one digit, for example the number 5.
▪ This “5-detector” will be an example of a binary classifier, capable of distinguishing between just two classes, 5 and not-5.
▪ A good place to start is with a Stochastic Gradient Descent (SGD) classifier, using the SGDClassifier class.
▪ This classifier is capable of handling very large datasets efficiently.
▪ This is in part because SGD deals with training instances independently, one at a time (which also makes SGD well suited for online learning).

Performance Measures
▪ Evaluating a classifier is often significantly trickier than evaluating a regressor, so we will spend a large part of this chapter on this topic.
▪ There are many performance measures available.
▪ Accuracy is generally not the preferred performance measure for classifiers, especially when you are dealing with skewed datasets (i.e., when some classes are much more frequent than others).
▪ A much better way to evaluate the performance of a classifier is to look at the confusion matrix.

Performance Measures: Confusion Matrix
▪ The general idea is to count the number of times instances of class A are classified as class B.
▪ For example, to know the number of times the classifier confused images of 5s with 3s, you would look in the 5th row and 3rd column of the confusion matrix.
▪ To compute the confusion matrix, you first need to have a set of predictions, so they can be compared to the actual targets.
▪ You could make predictions on the test set, as in the sketch below.
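A minimal sketch of the 5-detector and its confusion matrix, assuming X and y come from the fetch sketch above (variable names such as y_train_5 are illustrative):

    from sklearn.linear_model import SGDClassifier
    from sklearn.metrics import confusion_matrix

    # Use the conventional MNIST split: first 60,000 train, last 10,000 test.
    X_train, X_test = X[:60000], X[60000:]
    y_train, y_test = y[:60000], y[60000:]

    # Binary targets for the 5-detector: True for 5s, False for every other digit.
    y_train_5 = (y_train == '5')
    y_test_5 = (y_test == '5')

    # Train the SGD classifier on the binary task.
    sgd_clf = SGDClassifier(random_state=42)
    sgd_clf.fit(X_train, y_train_5)

    # Compare predictions against the actual targets.
    y_test_pred = sgd_clf.predict(X_test)
    print(confusion_matrix(y_test_5, y_test_pred))

In the printed matrix, each row corresponds to an actual class (not-5, then 5) and each column to a predicted class.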
Performance Measures: Precision and Recall
▪ The confusion matrix gives you a lot of information, but sometimes you may prefer a more concise metric.
▪ An interesting one to look at is the accuracy of the positive predictions; this is called the precision of the classifier.
▪ A trivial way to have perfect precision is to make one single positive prediction and ensure it is correct (precision = 1/1 = 100%).
▪ This would not be very useful, since the classifier would ignore all but one positive instance.
▪ Therefore, precision is typically used along with another metric named recall, also called sensitivity or true positive rate (TPR): the ratio of positive instances that are correctly detected by the classifier.

Performance Measures: F1 Score
▪ It is often convenient to combine precision and recall into a single metric called the F1 score, in particular if you need a simple way to compare two classifiers.
▪ The F1 score is the harmonic mean of precision and recall. Whereas the regular mean treats all values equally, the harmonic mean gives much more weight to low values.
▪ As a result, the classifier will only get a high F1 score if both recall and precision are high.
▪ The F1 score favors classifiers that have similar precision and recall.
▪ This is not always what you want: in some contexts you mostly care about precision, and in other contexts you really care about recall.
▪ For example, if you trained a classifier to detect videos that are safe for kids, you would probably prefer a classifier that rejects many good videos (low recall) but keeps only safe ones (high precision), rather than a classifier that has a much higher recall but lets a few really bad videos show up in your product.
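A minimal sketch of computing these three metrics, assuming y_test_5 and y_test_pred from the 5-detector sketch above:

    from sklearn.metrics import precision_score, recall_score, f1_score

    # precision = TP / (TP + FP): accuracy of the positive predictions
    print(precision_score(y_test_5, y_test_pred))

    # recall = TP / (TP + FN): ratio of positive instances that are detected
    print(recall_score(y_test_5, y_test_pred))

    # F1 = 2 * precision * recall / (precision + recall): harmonic mean
    print(f1_score(y_test_5, y_test_pred))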
Performance Measures: Precision/Recall Tradeoff
▪ On the other hand, suppose you train a classifier to detect shoplifters in surveillance images: it is probably fine if your classifier has only 30% precision as long as it has 99% recall (sure, the security guards will get a few false alerts, but almost all shoplifters will get caught).
▪ Unfortunately, you can’t have it both ways: increasing precision reduces recall, and vice versa. This is called the precision/recall tradeoff, as the sketch below illustrates.
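A minimal sketch of exploring that tradeoff, assuming sgd_clf, X_train, and y_train_5 from the earlier sketches; the 90% precision target at the end is purely illustrative:

    import numpy as np
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import precision_recall_curve

    # Decision scores (not hard predictions) for every training instance,
    # obtained via cross-validation so each score comes from a model that
    # never saw that instance during training.
    y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3,
                                 method="decision_function")

    # Precision and recall for every possible decision threshold.
    precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

    # Raising the threshold increases precision but reduces recall, and vice
    # versa. For example, the lowest threshold giving at least 90% precision:
    threshold_90 = thresholds[np.argmax(precisions[:-1] >= 0.90)]
    y_pred_90 = (y_scores >= threshold_90)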
Summary
▪ Classification tasks
▪ Different metrics for evaluating the performance of classification algorithms