
Classification

Part 1

CSE4416 – Introduction to Data Mining

Assoc. Prof. Dr. Derya BİRANT


Outline

◘ What Is Classification?
◘ Classification Examples
◘ Classification Methods
– Decision Trees
– Bayesian Classification
– K-Nearest Neighbor
– Neural Network
– Support Vector Machines (SVM)
– Fuzzy Set Approaches
What Is Classification?

◘ Classification
– Construction of a model to classify data
– When constructing the model, use the training set and the class labels (e.g., yes/no) in the target column

[Diagram: Training Set → Model]


Classification Steps

1. Model construction
– Each tuple is assumed to belong to a predefined class
– The set of tuples used for model construction is the training set
– The model is represented as classification rules, trees, or mathematical formulae

2. Test Model
– Using test set, estimate accuracy rate of the model
• Accuracy rate is the percentage of test set samples that are correctly classified
by the model

3. Model Usage (Classifying future or unknown objects)


– If the accuracy is acceptable, use the model to classify data tuples whose classes are not known
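As a concrete illustration of these three steps, here is a minimal sketch using scikit-learn on synthetic data; the library, dataset, and parameter choices are assumptions for illustration and are not prescribed by the slides.

```python
# Hedged sketch of the three classification steps (assumed library: scikit-learn;
# assumed data: a synthetic two-class dataset).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 1. Model construction: learn a classifier from the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# 2. Test the model: accuracy = percentage of test-set samples classified correctly
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 3. Model usage: if the accuracy is acceptable, classify tuples whose class is unknown
print("predictions:", model.predict(X_test[:3]))
```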
Classification Steps

Training Set (step 1, construct the model):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set (step 2, test the model):

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             No
Yes     Married         50K             Yes
No      Married         150K            Yes
Yes     Divorced        90K             No

New Data (step 3, use the model):

Refund  Marital Status  Taxable Income  Cheat
Yes     Divorced        50K             ?
No      Married         50K             ?
Yes     Single          150K            ?

The classifier is learned from the training set (1. Construct Model), its accuracy is estimated on the test set (2. Test Model), and it is then applied to new records whose class is unknown (3. Use Model).
Classification Example

◘ Given old data about customers and payments, predict a new applicant’s loan eligibility.
– Good Customers
– Bad Customers

[Diagram: previous customers (age, salary, profession, location, customer type) → classifier → rules such as "Salary > 5 L → Good" and "Prof. = Exec → Good", otherwise bad; the rules are then applied to a new applicant's data]

Classification Techniques
1. Decision Trees
2. Bayesian Classification
3. K-Nearest Neighbor
4. Neural Network
5. Support Vector Machines (SVM)
6. Fuzzy Set Approaches

(The formula shown with Bayesian classification is the naive Bayes decision rule:
c = \arg\max_{c_j} \frac{p(c_j)}{p(d)} \prod_{i=1}^{n} p(a_i \mid c_j) )


Classification Techniques

Decision Trees

Bayesian Classification

K-Nearest Neighbor

Classification Neural Network

Support Vector Machines (SVM)

Fuzzy Set Approaches


Decision Trees

◘ Decision Tree is a tree where


– internal nodes are simple decision rules on one or more attributes
– leaf nodes are predicted class labels
◘ Decision trees are used for deciding between several courses of
action
Training data (buys_computer):

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31...40  high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31...40  low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31...40  medium  no       excellent      yes
31...40  high    yes      fair           yes
>40      medium  no       excellent      no

Decision tree learned from this data (each internal node tests an attribute, each branch is an attribute value, each leaf is a classification):

age?
- <=30: student?
    - no  -> no
    - yes -> yes
- 31...40: yes
- >40: credit rating?
    - excellent -> no
    - fair      -> yes

The tree partitions the attribute space into decision regions.
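As a minimal sketch (not from the slides), the learned tree above can be expressed directly as if/else rules; the attribute values follow the table, and the function name is illustrative.

```python
# Hedged sketch: the buys_computer decision tree above written as if/else rules.
def buys_computer(age, student, credit_rating):
    """Predict 'yes'/'no' by following the tree from the root to a leaf."""
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    elif age == "31...40":
        return "yes"
    else:  # age == ">40"
        return "yes" if credit_rating == "fair" else "no"

print(buys_computer("<=30", "yes", "fair"))    # -> yes
print(buys_computer(">40", "no", "excellent")) # -> no
```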
Decision Tree Applications

◘ Decision trees are used extensively in data mining.


◘ Decision trees have been applied to:
– classify medical patients based on the disease,
– classify customers based on past behavior (their interests, loyalty, etc.),
– classify documents
– ...

[Example trees labeled "House" and "Hiring": tests such as "Salary < 1 M", "Job = teacher", and "Age < 30" leading to Good/Bad leaves]
Decision Tree Advantages and Disadvantages

Positives (+)
+ Reasonable training time
+ Fast application
+ Easy to interpret (can be re-represented as if-then-else rules)
+ Easy to implement
+ Can handle a large number of features
+ Does not require any prior knowledge of the data distribution

Negatives (-)
- Cannot handle complicated relationships between features
- Simple decision boundaries
- Problems with lots of missing data
- Output attribute must be categorical
- Limited to one output attribute
Rules Indicated by Decision Trees

◘ Write a rule for each path in the decision tree from the root to a leaf.
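For example, the buys_computer tree shown earlier yields one rule per root-to-leaf path:

IF age <= 30  AND student = no              THEN buys_computer = no
IF age <= 30  AND student = yes             THEN buys_computer = yes
IF age = 31...40                            THEN buys_computer = yes
IF age > 40   AND credit_rating = excellent THEN buys_computer = no
IF age > 40   AND credit_rating = fair      THEN buys_computer = yes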
Decision Tree Algorithms

◘ ID3
– Quinlan (1981)
– Tries to reduce the expected number of comparisons
◘ C4.5
– Quinlan (1993)
– It is an extension of ID3
– Just starting to be used in data mining applications
– Also used for rule induction
◘ CART
– Breiman, Friedman, Olshen, and Stone (1984)
– Classification and Regression Trees
◘ CHAID
– Kass (1980)
– Oldest decision tree algorithm
– Well established in database marketing industry
◘ QUEST
– Loh and Shih (1997)
Decision Tree Construction

◘ Which attribute is the best classifier?


– Calculate the information gain G(S,A) for each attribute A.
– Select the attribute with the highest information gain.

Entropy(S) = -\sum_{i=1}^{m} p_i \log_2 p_i        (for two classes: Entropy(S) = -p_1 \log_2 p_1 - p_2 \log_2 p_2)

Gain(S, A) = Entropy(S) - \sum_{i \in Values(A)} \frac{|S_i|}{|S|} \, Entropy(S_i)
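A minimal Python sketch of these two formulas (the helper names and the data representation are assumptions; the slides give only the equations):

```python
# Hedged sketch of Entropy(S) and Gain(S, A); a dataset is represented as a list
# of attribute dictionaries plus a parallel list of class labels.
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class distribution of S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder
```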
Decision Tree Construction

Which attribute first?


Decision Tree Construction

Entropy(S) = -(9/14) \log_2 (9/14) - (5/14) \log_2 (5/14) = 0.940


Decision Tree Construction

Gain(S, Outlook) = 0.25
Gain(S, Temperature) = 0.03
Gain(S, Humidity) = 0.16
Gain(S, Windy) = 0.05

Outlook has the highest information gain, so it becomes the root node:

Outlook
- sunny    -> ?
- overcast -> yes
- rain     -> ?
Decision Tree Construction

◘ Which attribute is next?

Outlook
- sunny    -> ?
- overcast -> yes
- rain     -> ?

Gain(S_sunny, Wind)        = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019
Gain(S_sunny, Humidity)    = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970
Gain(S_sunny, Temperature) = 0.970 - (2/5)·0 - (2/5)·1 - (1/5)·0 = 0.570

Humidity has the highest gain on the sunny branch, so it is chosen next.
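Continuing the entropy/information_gain sketch shown earlier, the same numbers can be reproduced on the sunny branch; the five rows below are assumed to be the standard play-tennis examples D1, D2, D8, D9, D11 referenced on the next slide.

```python
# Hedged usage sketch: gains on the sunny subset, assuming the standard
# play-tennis rows D1, D2, D8, D9, D11 (requires the helpers defined earlier).
sunny_rows = [
    {"Temperature": "hot",  "Humidity": "high",   "Wind": "weak"},    # D1
    {"Temperature": "hot",  "Humidity": "high",   "Wind": "strong"},  # D2
    {"Temperature": "mild", "Humidity": "high",   "Wind": "weak"},    # D8
    {"Temperature": "cool", "Humidity": "normal", "Wind": "weak"},    # D9
    {"Temperature": "mild", "Humidity": "normal", "Wind": "strong"},  # D11
]
sunny_labels = ["no", "no", "no", "yes", "yes"]

for attr in ("Wind", "Humidity", "Temperature"):
    print(attr, round(information_gain(sunny_rows, sunny_labels, attr), 3))
# -> Wind ~0.019, Humidity ~0.970, Temperature ~0.571, as on this slide
```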


Decision Tree Construction

Outlook
- sunny: Humidity
    - high   -> no    [D1, D2, D8]
    - normal -> yes   [D9, D11]
- overcast: yes       [D3, D7, D12, D13]
- rain: Wind
    - weak   -> yes   [D4, D5, D10]
    - strong -> no    [D6, D14]
Another Example
At the weekend:
- go shopping,
- watch a movie,
- play tennis or
- just stay in.

What you do depends on three things:


- the weather (windy, rainy or sunny);
- how much money you have (rich or poor);
- whether your parents are visiting.
Another Example
Classification Techniques

Decision Trees

Bayesian Classification

K-Nearest Neighbor

Classification Neural Network

Support Vector Machines (SVM)

Fuzzy Set Approaches


Classification Techniques
2- Bayesian Classification
◘ A statistical classifier: performs probabilistic prediction, i.e., predicts class
membership probabilities.

◘ Foundation: Based on Bayes’ Theorem.

◘ Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H \mid X) = \frac{P(X \mid H) \, P(H)}{P(X)}
Classification Techniques
2- Bayesian Classification
◘ X = (age <= 30, income = medium, student = yes, credit_rating = fair)
  (training data: the buys_computer table shown earlier)

◘ P(C1): P(buys_computer = “yes”) = 9/14 = 0.643
  P(C2): P(buys_computer = “no”) = 5/14 = 0.357

◘ Compute P(X|Ci) for each class
  P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
  P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
  P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
  P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
  P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
  P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
  P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
  P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

◘ P(X|C1) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044


P(X|C2) : P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028


P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)
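A minimal sketch reproducing this calculation in Python; the probabilities are copied from the slide rather than re-estimated from the table, and all names are illustrative.

```python
# Hedged naive-Bayes sketch: choose the class maximizing P(X|Ci) * P(Ci),
# using the conditional probabilities listed above.
priors = {"yes": 9 / 14, "no": 5 / 14}
cond = {
    "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9,
            "student=yes": 6 / 9, "credit=fair": 6 / 9},
    "no":  {"age<=30": 3 / 5, "income=medium": 2 / 5,
            "student=yes": 1 / 5, "credit=fair": 2 / 5},
}
x = ["age<=30", "income=medium", "student=yes", "credit=fair"]

scores = {}
for c in priors:
    likelihood = 1.0
    for attribute_value in x:          # P(X|C) under the independence assumption
        likelihood *= cond[c][attribute_value]
    scores[c] = likelihood * priors[c] # P(X|C) * P(C)

print(scores)                          # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))     # -> 'yes'
```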


Classification Techniques

Decision Trees

Bayesian Classification

K-Nearest Neighbor

Classification Neural Network

Support Vector Machines (SVM)

Fuzzy Set Approaches


K-Nearest Neighbor (k-NN)

◘ An object is classified by a majority vote of its k closest neighbors.

◘ If k = 1, the object is simply assigned to the class of its nearest neighbor.

◘ The Euclidean distance measure is used to calculate how close an object is to its neighbors.


K-Nearest Neighbor (k-NN)
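A minimal k-NN sketch (not from the slides): Euclidean distance plus a majority vote over the k nearest training points; the data and names are illustrative.

```python
# Hedged k-NN sketch: classify a query point by the majority class of its
# k nearest neighbors under Euclidean distance.
import math
from collections import Counter

def knn_classify(points, labels, query, k=3):
    """Return the majority class among the k training points closest to query."""
    nearest = sorted(zip(points, labels), key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Example: two small clusters in 2-D
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (2, 2), k=3))  # -> 'A'
```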
