
Basics of Classification

(Adapted from various sources; these slides are for teaching purposes only.)
BASIC IDEA
§ In the exams, each student’s grade was assigned based on their marks as follows:

Rules:
  Mark ≥ 90 : A
  90 > Mark ≥ 80 : B
  80 > Mark ≥ 70 : C
  70 > Mark ≥ 60 : D
  60 > Mark : F

§ Here the classification is done based on a simple rule!
§ We apply a rule / set of rules to classify the data.
§ Classification is a technique for describing important data classes based on some rules.
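
For instance, the grading rule above maps directly to code. A minimal sketch, assuming the 90/80/70/60 boundaries shown above:

    # Rule-based classification: assign a grade from a mark.
    def grade(mark):
        if mark >= 90: return 'A'
        if mark >= 80: return 'B'
        if mark >= 70: return 'C'
        if mark >= 60: return 'D'
        return 'F'

    print(grade(85))   # B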

Classification!
§ The classes are mutually exhaustive and exclusive.
§ This indicates each object can be assigned to precisely one class.
Applications
§ Science
§ Finance
§ Medical
§ Security
§ Prediction
§ Entertainment
§ Social media
§ And more…
Classification types
§ Supervised Classification
§ We already know the set of possible classes.

§ Unsupervised Classification
§ It is called clustering.
§ We don’t know the classes or the number of possible classes.
§ We try to categorize based on some rule, which may not serve our purpose at all.
Image taken from www.webstockreview.net
Points to remember
§ Classification is a supervised technique.
§ A good classifier depends on the two factors below:
§ We need rules for classification.
§ We need a teacher.
How to proceed?
§ Training set (the teacher)
§ Collection of records with a set of attributes and one class label.
§ Develop a model for the class in terms of the other attributes with the training set.
§ Define the rules.

The image is taken from https://sebastianraschka.com/Articles/2015_pca_in_3_steps.html
Classifiers
§ Statistics based - Bayesian
§ Distance based - KNN
§ Decision tree based - CART
§ Machine learning based - SVM
§ Neural network based - CNN
Naïve Bayes
§ The ‘idiot’ or ‘simple’ classifier.
§ Based on statistics.
§ Empirically proven to be useful.
§ Scales very well.
§ Predicts class membership probabilities.
§ Based on Bayes’ Theorem.
§ Assumes the attributes are independent given the class.
In this database (the Iris dataset), there are four attributes
A = [Sepal length, Sepal width, Petal length, Petal width]
with 150 samples.

The categories of classes are:
C = [Iris Versicolor, Iris Setosa, Iris Virginica]

Given this knowledge of the data and classes, we are to find the most likely classification for any unseen instance.
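
As a concrete illustration (not part of the original slides), here is a minimal sketch of exactly this task, assuming scikit-learn and its bundled Iris dataset:

    # Naive Bayes on the Iris data: 150 samples, 4 attributes, 3 classes.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    clf = GaussianNB().fit(X_train, y_train)   # learn per-class statistics
    print(clf.predict(X_test[:5]))             # most likely class for unseen instances
    print(clf.predict_proba(X_test[:5]))       # class membership probabilities
    print("accuracy:", clf.score(X_test, y_test))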
Why Statistics?
§ In many applications, an unknown sample cannot be classified to a class label with certainty.
§ In such a situation, the classification can be achieved probabilistically.
§ In a Bayesian classifier, we try to model probabilistic relationships between the attribute set and the class variable.
§ Bayesian classifiers use Bayes’ Theorem of probability for classification.
Another Application: Digit Recognition

[Diagram: pixel image → Classifier → 5]

§ X1, …, Xn ∈ {0, 1} (blue vs. red pixels)
§ Y ∈ {5, 6} (predict whether a digit is a 5 or a 6)
§ A good strategy is to predict the probability that the image represents a 5 given its pixels.
The Bayes Classifier
§ So … how do we compute the probability that the image represents a 5 given its pixels? Using Bayes’ Theorem:

    P(Y = 5 | X1, …, Xn) = P(X1, …, Xn | Y = 5) · P(Y = 5) / P(X1, …, Xn)

    (numerator: Likelihood × Prior; denominator: Normalization Constant)

How
§ To classify, we’ll simply compute these two probabilities and predict based on which one is greater.
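
To make the comparison concrete, here is a tiny hand-worked sketch (not from the slides; all pixel likelihoods and priors are made-up numbers for illustration):

    # Naive Bayes comparison for the 5-vs-6 digit example.
    # All probabilities below are hypothetical, for illustration only.
    prior = {5: 0.5, 6: 0.5}                      # P(Y)
    p_on = {5: [0.9, 0.2, 0.7],                   # P(Xi = 1 | Y = 5), one per pixel
            6: [0.4, 0.8, 0.6]}                   # P(Xi = 1 | Y = 6)

    def score(y, x):
        # Unnormalized posterior: P(Y = y) * prod_i P(Xi = xi | Y = y),
        # using the naive assumption that pixels are independent given Y.
        s = prior[y]
        for p, xi in zip(p_on[y], x):
            s *= p if xi == 1 else (1 - p)
        return s

    x = [1, 0, 1]                                  # an observed 3-pixel image
    print(5 if score(5, x) > score(6, x) else 6)   # predict the larger posterior

The normalization constant cancels in the comparison, which is why it can be ignored when we only need to know which posterior is larger.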


CLASSIFIERS

[Diagram: feature values X1, X2, X3, …, Xn feed into a Classifier, which outputs a category Y]

[Diagram: the same, where the Classifier also uses a DB, a collection of instances with known categories]
EXAMPLE 1
Determining the decision on a scholarship application based on the following features:

§ Household income (annual income in millions of pesos)
§ Number of siblings in family
§ High school grade (QPI, on a scale of 1.0 – 4.0)

Intuition (reflected in the data set): award scholarships to high-performers and to those with financial need
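
For illustration only, this intuition could be hand-coded as a rule; the thresholds and the way the two criteria combine below are entirely hypothetical:

    # Hypothetical scholarship rule: award to high performers or applicants in need.
    def scholarship_decision(income_mpesos, num_siblings, qpi):
        high_performer = qpi >= 3.5                                 # hypothetical cutoff
        financial_need = income_mpesos < 0.5 or num_siblings >= 4   # hypothetical cutoffs
        return "award" if (high_performer or financial_need) else "deny"

    print(scholarship_decision(0.3, 2, 3.0))   # award (financial need)
    print(scholarship_decision(1.2, 1, 3.8))   # award (high performer)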
Ø Instance-based learning is often termed lazy learning, as there is typically no “transformation” of training instances into more general “statements”

Ø Instead, the presented training data is simply stored and, when a new query instance is encountered, a set of similar, related instances is retrieved from memory and used to classify the new query instance

Ø Hence, instance-based learners never form an explicit general hypothesis regarding the target function. They simply compute the classification of each new query instance as needed
K-NN APPROACH
Ø The simplest and most widely used instance-based learning algorithm is the k-NN algorithm

Ø k-NN assumes that all instances are points in some n-dimensional space and defines neighbors in terms of distance (usually Euclidean distance)

Ø k is the number of neighbors considered
K-NN APPROACH
Ø Unlike the previous learning methods, kNN does not build a model from the training data

Ø To classify a test instance d, define the k-neighborhood P as the k nearest neighbors of d

Ø Count the number n of training instances in P that belong to class cj

Ø Estimate Pr(cj|d) as n/k

Ø No training is needed. Classification time is linear in the training set size for each test case
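
The procedure above is short enough to write out directly. A minimal pure-Python sketch (illustrative, not from the slides):

    # k-NN classification: no training step, just distance + majority vote.
    import math
    from collections import Counter

    def knn_predict(train, query, k=3):
        # train: list of (feature_vector, label) pairs; query: a feature vector.
        # Take the k training instances nearest to the query (Euclidean distance).
        neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
        # Majority vote among their labels; Pr(cj|d) is estimated as n/k.
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    train = [((1.0, 1.0), '+'), ((1.2, 0.8), '+'),
             ((4.0, 4.2), '-'), ((4.1, 3.9), '-')]
    print(knn_predict(train, (1.1, 0.9), k=3))   # '+'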
K-NEAREST-NEIGHBORS

WHAT IS THE MOST LIKELY LABEL FOR c?
Solution: look for the k nearest neighbors of c, and take the majority label as c’s label.
Let’s suppose k = 3.
The 3 nearest points to c are: a, a and o.
Therefore, the most likely label for c is a.
SIMPLE ILLUSTRATION: THE COMPLEXITY

[Figure: a query point q surrounded by + and - training points]

What is the class of q?
q is + under 1-NN, but - under 5-NN
K-NEAREST NEIGHBORS ALGORITHM

Get dataset: for a given instance T, get the top k instances that are “nearest” to T
• Select a reasonable distance measure

Inspect: inspect the category of these k instances, and choose the category C that represents the most instances

Conclude: conclude that T belongs to category C

A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common among its k nearest neighbors, measured by a distance function. If k = 1, then the case is simply assigned to the class of its nearest neighbor.
Evaluation of Classification Models

Adapted from Václav Hlaváč, Czech Technical University, Prague.
Performance of a learned classifier?

— Classifiers (both supervised and unsupervised) are learned (trained) on a finite training multiset (called simply a training set in the sequel).

— A learned classifier has to be tested experimentally on a different test set.

— In run mode, the classifier performs on different data than that on which it was learned.

— The experimental performance on the test data is a proxy for the performance on unseen data. It checks the classifier’s generalization ability.

— There is a need for a criterion function assessing the classifier performance experimentally, e.g., its error rate, accuracy, or expected Bayesian risk (to be discussed later).

— There is also a need for comparing classifiers experimentally.
Evaluation as hypothesis testing

— Evaluation has to be treated as hypothesis testing in statistics.

— The value of the population parameter has to be statistically inferred based on the sample statistics (i.e., a training set in pattern recognition).
Danger of overfitting

— Learning the training data too precisely usually leads to poor classification results on new data.

— The classifier has to have the ability to generalize.


Training vs. test data

— Problem: only finite data are available, and they have to be used both for training and testing.

— More training data gives better generalization. More test data gives a better estimate of the classification error probability.

— Never evaluate performance on training data.
  • The conclusion would be optimistically biased.
Training vs. test data

— Hold out: partitioning of the available finite set of data into training / test sets.

— Bootstrap and cross validation.

— Once evaluation is finished, all the available data can be used to train the final classifier.
Hold out method

— Given data is randomly partitioned into two independent sets.
  • Training multiset (e.g., 2/3 of the data) for the statistical model construction, i.e., learning the classifier.
  • Test set (e.g., 1/3 of the data) is held out for the accuracy estimation of the classifier.

— Random sampling is a variation of the hold out method: repeat the hold out k times; the accuracy is estimated as the average of the accuracies obtained.
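
A minimal sketch of the hold out method with repeated random sampling, assuming scikit-learn (the classifier choice is arbitrary):

    # Hold out: 2/3 training, 1/3 test, repeated k times with random splits.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    accs = []
    for seed in range(10):                           # repeat the hold out k = 10 times
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=1/3, random_state=seed)
        accs.append(KNeighborsClassifier().fit(X_tr, y_tr).score(X_te, y_te))
    print("estimated accuracy:", np.mean(accs))      # average over repetitions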
K-fold cross validation

— The training set is randomly divided into K disjoint sets of equal size, where each part has roughly the same class distribution.

— The classifier is trained K times, each time with a different set held out as a test set.

— The estimated error is the mean of these K errors.

Graphical example: [figure not included]
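
A minimal sketch of K-fold cross validation, assuming scikit-learn; StratifiedKFold keeps roughly the same class distribution in each fold, as required above:

    # 5-fold cross validation: train K times, hold out a different fold each time.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(KNeighborsClassifier(), X, y, cv=cv)
    print("per-fold accuracy:", scores)
    print("estimated error:", 1 - scores.mean())   # mean of the K errors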
Leave-one-out

— A special case of K-fold cross validation with K = n, where n is the total number of samples in the training multiset.

— n experiments are performed, using (n − 1) samples for training and the remaining sample for testing.

— It is rather computationally expensive.

— Leave-one-out cross-validation does not guarantee the same class distribution in training and test data!

— The extreme case:
  • 50% class A, 50% class B. Predict the majority class label in the training data. True error 50%; leave-one-out error estimate 100%!
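
The extreme case can be verified in a few lines (an illustrative sketch, not from the slides):

    # Majority-vote classifier on perfectly balanced data:
    # true error is 50%, but leave-one-out estimates 100%.
    from collections import Counter

    data = ['A'] * 5 + ['B'] * 5                # 50% class A, 50% class B
    errors = 0
    for i, true_label in enumerate(data):
        train = data[:i] + data[i + 1:]         # leave one sample out
        majority = Counter(train).most_common(1)[0][0]
        errors += (majority != true_label)      # the left-out class is now the minority
    print("LOO error estimate:", errors / len(data))   # 1.0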
Bootstrap aggregating

— The bootstrap uses sampling with replacement to form the training set.

— Let the training set T consist of n entries.

— The bootstrap generates m new datasets Ti, each of size n0 < n, by sampling T uniformly with replacement. The consequence is that some entries can be repeated in Ti.

— The m statistical models (e.g., classifiers, regressors) are learned using the above m bootstrap samples.

— The statistical models are combined, e.g., by averaging the output (for regression) or by voting (for classification).
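
A minimal hand-rolled sketch of bagging, assuming scikit-learn for the base models (decision trees are an arbitrary choice):

    # Bagging: m bootstrap samples -> m models -> combine by voting.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = load_iris(return_X_y=True)
    n, m = len(X), 25                              # n entries, m bootstrap datasets

    models = []
    for _ in range(m):
        idx = rng.integers(0, n, size=n)           # sample uniformly with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

    preds = np.array([mdl.predict(X[:5]) for mdl in models])   # shape (m, 5)
    votes = [np.bincount(col).argmax() for col in preds.T]     # majority vote per instance
    print(votes)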
Recommended experimental validation
procedure

— Use K-fold cross-validation (K = 5 or K = 10) for estimating performance (accuracy, etc.).

— Compute the mean value of the performance estimate, along with its standard deviation and confidence intervals.

— Report mean values of performance estimates and their standard deviations or 95% confidence intervals around the mean.
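
A minimal sketch of this reporting procedure, assuming scikit-learn and scipy:

    # Mean, standard deviation, and 95% confidence interval of K-fold scores.
    import numpy as np
    from scipy import stats
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(KNeighborsClassifier(), X, y, cv=10)   # K = 10
    mean, sd = scores.mean(), scores.std(ddof=1)
    # 95% CI around the mean, t distribution with K - 1 degrees of freedom:
    half = stats.t.ppf(0.975, df=len(scores) - 1) * sd / np.sqrt(len(scores))
    print(f"accuracy: {mean:.3f} +/- {sd:.3f}, 95% CI [{mean - half:.3f}, {mean + half:.3f}]")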
Criterion function to assess classifier
performance

— Accuracy and error rate
  • Accuracy is the percent of correct classifications.
  • Error rate is the percent of incorrect classifications.
  • Accuracy = 1 - Error rate.

— Problems with accuracy:
  • Assumes equal costs for misclassification.
  • Assumes a relatively uniform class distribution.

— Other characteristics derived from the confusion matrix.

— Expected Bayesian risk.
Confusion matrix, two classes only

(Entries: a = TN, b = FP, c = FN, d = TP.)

— Accuracy = (a + d)/(a + b + c + d) = (TN + TP)/total

— Precision, positive predictive value = d/(b + d) = TP/predicted positive

— True positive rate, recall, sensitivity = d/(c + d) = TP/actual positive

— Specificity, true negative rate = a/(a + b) = TN/actual negative

— False positive rate, false alarm rate = b/(a + b) = FP/actual negative = 1 - specificity

— False negative rate = c/(c + d) = FN/actual positive
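
These measures follow directly from the four counts. A quick sketch with hypothetical counts:

    # Measures from a 2x2 confusion matrix (a = TN, b = FP, c = FN, d = TP).
    a, b, c, d = 50, 10, 5, 35                    # hypothetical counts

    accuracy    = (a + d) / (a + b + c + d)       # (TN + TP) / total
    precision   = d / (b + d)                     # TP / predicted positive
    recall      = d / (c + d)                     # TP / actual positive (sensitivity)
    specificity = a / (a + b)                     # TN / actual negative
    fpr         = b / (a + b)                     # FP / actual negative = 1 - specificity
    fnr         = c / (c + d)                     # FN / actual positive
    print(accuracy, precision, recall, specificity, fpr, fnr)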
Confusion matrix, # of classes > 2
Here is the summary!!!

— Any ML/AI model depends on the training data set.

— Class balancing is important.

— Validation is important!

— And we also need some measure for validating the data.
THANK YOU FOR LISTENING

Questions???
