
Classification

Part 1

CSE4416 – Introduction to Data Mining

Assoc. Prof. Dr. Derya BİRANT


Outline

◘ What Is Classification?
◘ Classification Examples
◘ Classification Methods
– Decision Trees
– Bayesian Classification
– K-Nearest Neighbor
– Neural Network
– Support Vector Machines (SVM)
– Fuzzy Set Approaches
What Is Classification?

◘ Classification
– Construction of a model to classify data
– When constructing the model, use the training set and the class labels (e.g., yes/no) in the target column

[Diagram: Training Set → Model]


Classification Steps

1. Model construction
– Each tuple is assumed to belong to a predefined class
– The set of tuples used for model construction is the training set
– The model is represented as classification rules, trees, or mathematical formulae

2. Test Model
– Using test set, estimate accuracy rate of the model
• Accuracy rate is the percentage of test set samples that are correctly classified
by the model

3. Model Usage (Classifying future or unknown objects)


– If the accuracy is acceptable, use the model to classify data tuples whose classes are not known
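As a concrete illustration of these three steps, here is a minimal sketch using scikit-learn on synthetic data; the library, dataset, and parameter choices are assumptions for illustration and are not prescribed by the slides.

```python
# Hedged sketch of the three classification steps (assumed library: scikit-learn;
# assumed data: a synthetic two-class dataset).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 1. Model construction: learn a classifier from the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# 2. Test the model: accuracy = percentage of test-set samples classified correctly
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 3. Model usage: if the accuracy is acceptable, classify tuples whose class is unknown
print("predictions:", model.predict(X_test[:3]))
```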
Classification Steps

Training Set (step 1, construct the model):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set (step 2, test the model):

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             No
Yes     Married         50K             Yes
No      Married         150K            Yes
Yes     Divorced        90K             No

New Data (step 3, use the model):

Refund  Marital Status  Taxable Income  Cheat
Yes     Divorced        50K             ?
No      Married         50K             ?
Yes     Single          150K            ?

The classifier is learned from the training set (1. Construct Model), its accuracy is estimated on the test set (2. Test Model), and it is then applied to new records whose class is unknown (3. Use Model).
Classification Example

◘ Given old data about customers and payments, predict a new applicant’s loan eligibility.
– Good Customers
– Bad Customers

[Diagram: previous customers (age, salary, profession, location, customer type) → classifier → rules such as "Salary > 5 L → Good" and "Prof. = Exec → Good", otherwise bad; the rules are then applied to a new applicant's data]

Classification Techniques
1. Decision Trees
2. Bayesian Classification
3. K-Nearest Neighbor
4. Neural Network
5. Support Vector Machines (SVM)
6. Fuzzy Set Approaches

(The formula shown with Bayesian classification is the naive Bayes decision rule:
c = \arg\max_{c_j} \frac{p(c_j)}{p(d)} \prod_{i=1}^{n} p(a_i \mid c_j) )


Classification Techniques

Decision Trees

Bayesian Classification

K-Nearest Neighbor

Classification Neural Network

Support Vector Machines (SVM)

Fuzzy Set Approaches


Decision Trees

◘ Decision Tree is a tree where


– internal nodes are simple decision rules on one or more attributes
– leaf nodes are predicted class labels
◘ Decision trees are used for deciding between several courses of
action
Training data (buys_computer):

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31...40  high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31...40  low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31...40  medium  no       excellent      yes
31...40  high    yes      fair           yes
>40      medium  no       excellent      no

Decision tree learned from this data (each internal node tests an attribute, each branch is an attribute value, each leaf is a classification):

age?
- <=30: student?
    - no  -> no
    - yes -> yes
- 31...40: yes
- >40: credit rating?
    - excellent -> no
    - fair      -> yes

The tree partitions the attribute space into decision regions.
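As a minimal sketch (not from the slides), the learned tree above can be expressed directly as if/else rules; the attribute values follow the table, and the function name is illustrative.

```python
# Hedged sketch: the buys_computer decision tree above written as if/else rules.
def buys_computer(age, student, credit_rating):
    """Predict 'yes'/'no' by following the tree from the root to a leaf."""
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    elif age == "31...40":
        return "yes"
    else:  # age == ">40"
        return "yes" if credit_rating == "fair" else "no"

print(buys_computer("<=30", "yes", "fair"))    # -> yes
print(buys_computer(">40", "no", "excellent")) # -> no
```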
Decision Tree Applications

◘ Decision trees are used extensively in data mining.


◘ Decision trees have been applied to:
– classify medical patients based on the disease,
– classify customers based on past behavior (their interests, loyalty, etc.),
– classify documents
– ...

[Example trees labeled "House" and "Hiring": tests such as "Salary < 1 M", "Job = teacher", and "Age < 30" leading to Good/Bad leaves]
Decision Tree Advantages and Disadvantages

Positives (+)
+ Reasonable training time
+ Fast application
+ Easy to interpret (can be re-represented as if-then-else rules)
+ Easy to implement
+ Can handle a large number of features
+ Does not require any prior knowledge of the data distribution

Negatives (-)
- Cannot handle complicated relationships between features
- Simple decision boundaries
- Problems with lots of missing data
- Output attribute must be categorical
- Limited to one output attribute
Rules Indicated by Decision Trees

◘ Write a rule for each path in the decision tree from the root to a leaf.
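For example, the buys_computer tree shown earlier yields one rule per root-to-leaf path:

IF age <= 30  AND student = no              THEN buys_computer = no
IF age <= 30  AND student = yes             THEN buys_computer = yes
IF age = 31...40                            THEN buys_computer = yes
IF age > 40   AND credit_rating = excellent THEN buys_computer = no
IF age > 40   AND credit_rating = fair      THEN buys_computer = yes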
Decision Tree Algorithms

◘ ID3
– Quinlan (1981)
– Tries to reduce the expected number of comparisons
◘ C4.5
– Quinlan (1993)
– It is an extension of ID3
– Just starting to be used in data mining applications
– Also used for rule induction
◘ CART
– Breiman, Friedman, Olshen, and Stone (1984)
– Classification and Regression Trees
◘ CHAID
– Kass (1980)
– Oldest decision tree algorithm
– Well established in database marketing industry
◘ QUEST
– Loh and Shih (1997)
Decision Tree Construction

◘ Which attribute is the best classifier?


– Calculate the information gain G(S,A) for each attribute A.
– Select the attribute with the highest information gain.

Entropy(S) = -\sum_{i=1}^{m} p_i \log_2 p_i        (for two classes: Entropy(S) = -p_1 \log_2 p_1 - p_2 \log_2 p_2)

Gain(S, A) = Entropy(S) - \sum_{i \in Values(A)} \frac{|S_i|}{|S|} \, Entropy(S_i)
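A minimal Python sketch of these two formulas (the helper names and the data representation are assumptions; the slides give only the equations):

```python
# Hedged sketch of Entropy(S) and Gain(S, A); a dataset is represented as a list
# of attribute dictionaries plus a parallel list of class labels.
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class distribution of S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder
```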
Decision Tree Construction

Which attribute first?


Decision Tree Construction

Entropy(S) = -(9/14) \log_2 (9/14) - (5/14) \log_2 (5/14) = 0.940


Decision Tree Construction

Gain(S, Outlook) = 0.25
Gain(S, Temperature) = 0.03
Gain(S, Humidity) = 0.16
Gain(S, Windy) = 0.05

Outlook has the highest information gain, so it becomes the root node:

Outlook
- sunny    -> ?
- overcast -> yes
- rain     -> ?
Decision Tree Construction

◘ Which attribute is next?

Outlook
- sunny    -> ?
- overcast -> yes
- rain     -> ?

Gain(S_sunny, Wind)        = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019
Gain(S_sunny, Humidity)    = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970
Gain(S_sunny, Temperature) = 0.970 - (2/5)·0 - (2/5)·1 - (1/5)·0 = 0.570

Humidity has the highest gain on the sunny branch, so it is chosen next.
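Continuing the entropy/information_gain sketch shown earlier, the same numbers can be reproduced on the sunny branch; the five rows below are assumed to be the standard play-tennis examples D1, D2, D8, D9, D11 referenced on the next slide.

```python
# Hedged usage sketch: gains on the sunny subset, assuming the standard
# play-tennis rows D1, D2, D8, D9, D11 (requires the helpers defined earlier).
sunny_rows = [
    {"Temperature": "hot",  "Humidity": "high",   "Wind": "weak"},    # D1
    {"Temperature": "hot",  "Humidity": "high",   "Wind": "strong"},  # D2
    {"Temperature": "mild", "Humidity": "high",   "Wind": "weak"},    # D8
    {"Temperature": "cool", "Humidity": "normal", "Wind": "weak"},    # D9
    {"Temperature": "mild", "Humidity": "normal", "Wind": "strong"},  # D11
]
sunny_labels = ["no", "no", "no", "yes", "yes"]

for attr in ("Wind", "Humidity", "Temperature"):
    print(attr, round(information_gain(sunny_rows, sunny_labels, attr), 3))
# -> Wind ~0.019, Humidity ~0.970, Temperature ~0.571, as on this slide
```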


Decision Tree Construction

Outlook
- sunny: Humidity
    - high   -> no    [D1, D2, D8]
    - normal -> yes   [D9, D11]
- overcast: yes       [D3, D7, D12, D13]
- rain: Wind
    - weak   -> yes   [D4, D5, D10]
    - strong -> no    [D6, D14]
Another Example
At the weekend:
- go shopping,
- watch a movie,
- play tennis or
- just stay in.

What you do depends on three things:


- the weather (windy, rainy or sunny);
- how much money you have (rich or poor);
- whether your parents are visiting.
Another Example
Classification Techniques

Decision Trees

Bayesian Classification

K-Nearest Neighbor

Classification Neural Network

Support Vector Machines (SVM)

Fuzzy Set Approaches


Classification Techniques
2- Bayesian Classification
◘ A statistical classifier: performs probabilistic prediction, i.e., predicts class
membership probabilities.

◘ Foundation: Based on Bayes’ Theorem.

◘ Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H \mid X) = \frac{P(X \mid H) \, P(H)}{P(X)}
Classification Techniques
2- Bayesian Classification
◘ X = (age <= 30, income = medium, student = yes, credit_rating = fair)
  (training data: the buys_computer table shown earlier)

◘ P(C1): P(buys_computer = “yes”) = 9/14 = 0.643
  P(C2): P(buys_computer = “no”) = 5/14 = 0.357

◘ Compute P(X|Ci) for each class
  P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
  P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
  P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
  P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
  P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
  P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
  P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
  P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

◘ P(X|C1) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044


P(X|C2) : P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028


P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)
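A minimal sketch reproducing this calculation in Python; the probabilities are copied from the slide rather than re-estimated from the table, and all names are illustrative.

```python
# Hedged naive-Bayes sketch: choose the class maximizing P(X|Ci) * P(Ci),
# using the conditional probabilities listed above.
priors = {"yes": 9 / 14, "no": 5 / 14}
cond = {
    "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9,
            "student=yes": 6 / 9, "credit=fair": 6 / 9},
    "no":  {"age<=30": 3 / 5, "income=medium": 2 / 5,
            "student=yes": 1 / 5, "credit=fair": 2 / 5},
}
x = ["age<=30", "income=medium", "student=yes", "credit=fair"]

scores = {}
for c in priors:
    likelihood = 1.0
    for attribute_value in x:          # P(X|C) under the independence assumption
        likelihood *= cond[c][attribute_value]
    scores[c] = likelihood * priors[c] # P(X|C) * P(C)

print(scores)                          # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))     # -> 'yes'
```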


Classification Techniques

Decision Trees

Bayesian Classification

K-Nearest Neighbor

Classification Neural Network

Support Vector Machines (SVM)

Fuzzy Set Approaches


K-Nearest Neighbor (k-NN)

◘ An object is classified by a majority vote of its k closest neighbors.

◘ If k = 1, the object is simply assigned to the class of its nearest neighbor.

◘ The Euclidean distance measure is used to calculate how close an object is to its neighbors.


K-Nearest Neighbor (k-NN)
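A minimal k-NN sketch (not from the slides): Euclidean distance plus a majority vote over the k nearest training points; the data and names are illustrative.

```python
# Hedged k-NN sketch: classify a query point by the majority class of its
# k nearest neighbors under Euclidean distance.
import math
from collections import Counter

def knn_classify(points, labels, query, k=3):
    """Return the majority class among the k training points closest to query."""
    nearest = sorted(zip(points, labels), key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Example: two small clusters in 2-D
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (2, 2), k=3))  # -> 'A'
```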
