05 Classification Part1
Part 1
◘ What Is Classification?
◘ Classification Examples
◘ Classification Methods
– Decision Trees
– Bayesian Classification
– K-Nearest Neighbor
– Neural Network
– Support Vector Machines (SVM)
– Fuzzy Set Approaches
What Is Classification?
◘ Classification
– Construction of a model to classify data
– When constructing the model, use the training set and the class labels (e.g., yes/no) in the target column
1. Model construction
– Each tuple is assumed to belong to a predefined class
– The set of tuples used for model construction is the training set
– The model is represented as classification rules, trees, or mathematical formulae
2. Model testing
– Using the test set, estimate the accuracy rate of the model
• The accuracy rate is the percentage of test set samples that are correctly classified by the model
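The accuracy rate defined above can be sketched in a few lines of Python. This is only an illustration: the function name `accuracy_rate` and the label lists are hypothetical and do not come from the slides.

```python
def accuracy_rate(predicted, actual):
    """Percentage of test samples whose predicted class equals the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("the test set is empty")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)


# Hypothetical test-set labels and model outputs (illustrative only).
actual = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "no"]
print(accuracy_rate(predicted, actual))  # 4 of 5 correct -> 80.0
```

The test set should be kept separate from the training set; measuring accuracy on the tuples used to build the model tends to overestimate it.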
[Figure: Training Set → Learn Classifier → Model]
Decision Trees
Bayesian Classification
K-Nearest Neighbor
…
Decision Trees
[Figure: decision tree containing the test Salary < 1 M]
◘ Write a rule for each path in the decision tree from the root to a leaf.
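One way to turn each root-to-leaf path into a rule is a short recursive walk. The sketch below assumes a tree stored as nested dictionaries; that representation, the function name `extract_rules`, and the toy attributes are all assumptions made for illustration, not something the slides specify.

```python
def extract_rules(tree, conditions=()):
    """Return one IF-THEN rule per root-to-leaf path.

    Assumed representation: a leaf is a class label (string); an internal
    node is {"attribute": name, "branches": {value: subtree, ...}}.
    """
    if not isinstance(tree, dict):  # reached a leaf: emit the rule for this path
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():
        path = conditions + ((tree["attribute"], value),)
        rules.extend(extract_rules(subtree, path))
    return rules


# Hypothetical tree (attribute names and values are illustrative only).
toy_tree = {
    "attribute": "Salary",
    "branches": {
        "low": "no",
        "high": {"attribute": "Degree", "branches": {"yes": "yes", "no": "no"}},
    },
}
for rule in extract_rules(toy_tree):
    print(rule)
```

The number of rules equals the number of leaves, since every leaf closes exactly one path.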
Decision Tree Algorithms
◘ ID3
– Quinlan (1981)
– Tries to reduce the expected number of comparisons
◘ C 4.5
– Quinlan (1993)
– It is an extension of ID3
– Just starting to be used in data mining applications
– Also used for rule induction
◘ CART
– Breiman, Friedman, Olshen, and Stone (1984)
– Classification and Regression Trees
◘ CHAID
– Kass (1980)
– Oldest decision tree algorithm
– Well established in database marketing industry
◘ QUEST
– Loh and Shih (1997)
Decision Tree Construction
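\[ \mathrm{Entropy}(S) = -\sum_{i=1}^{m} p_i \log_2 p_i \]
For two classes (m = 2):
\[ \mathrm{Entropy}(S) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 \]
\[ \mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i \in A} \frac{|S_i|}{|S|} \, \mathrm{Entropy}(S_i) \]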
Entropy
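As a rough illustration, entropy and information gain can be computed directly from a list of class labels. In this sketch the function names, the dictionary-per-tuple layout, and the toy data are assumptions; S_i is taken to be the subset of S whose tuples share value i for attribute A.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i), with p_i the share of class i in S."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_i |S_i|/|S| * Entropy(S_i)."""
    total = len(labels)
    subsets = {}  # attribute value -> class labels of the tuples with that value
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Hypothetical toy data: Wind separates the two classes perfectly.
rows = [{"Wind": "weak"}, {"Wind": "weak"}, {"Wind": "strong"}, {"Wind": "strong"}]
labels = ["yes", "yes", "no", "no"]
print(entropy(labels))             # two equally likely classes -> 1.0
print(gain(rows, labels, "Wind"))  # both subsets are pure -> 1.0
```

Entropy is 0 for a set containing a single class and reaches its maximum of 1 bit for two equally frequent classes, which is why an attribute that produces pure subsets gets the highest gain.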
Decision Tree Construction
[Figure: partial tree after the first split]
Outlook
  Sunny → ?
  Overcast → yes
  Rain → ?
Decision Tree Construction
[Figure: final tree]
Outlook
  Sunny → Humidity
    No [D1, D2, D8]
    Yes [D9, D11]
  Overcast → yes
  Rain → Wind
    Yes [D4, D5, D10]
    No [D6, D14]
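The construction shown in these slides (pick the attribute with the highest gain, split, then repeat on each subset) can be sketched as a small recursive function in the spirit of ID3. This is a minimal sketch under stated assumptions: the function names, the nested-dictionary tree, and the six-tuple toy data are hypothetical and are not the D1..D14 data set from the figure.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def partition(rows, labels, attribute):
    """Group rows and their labels by the value each row has for `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[attribute], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return groups


def gain(rows, labels, attribute):
    total = len(labels)
    remainder = sum(
        len(sub_labels) / total * entropy(sub_labels)
        for _, sub_labels in partition(rows, labels, attribute).values()
    )
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or {"attribute": ..., "branches": {...}}."""
    if len(set(labels)) == 1:  # pure subset: stop with a leaf
        return labels[0]
    if not attributes:  # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    return {
        "attribute": best,
        "branches": {
            value: build_tree(sub_rows, sub_labels, remaining)
            for value, (sub_rows, sub_labels) in partition(rows, labels, best).items()
        },
    }


# Hypothetical toy data (illustrative only).
rows = [
    {"Outlook": "sunny", "Wind": "weak"},
    {"Outlook": "sunny", "Wind": "strong"},
    {"Outlook": "overcast", "Wind": "weak"},
    {"Outlook": "overcast", "Wind": "strong"},
    {"Outlook": "rain", "Wind": "weak"},
    {"Outlook": "rain", "Wind": "strong"},
]
labels = ["no", "no", "yes", "yes", "yes", "no"]
tree = build_tree(rows, labels, ["Outlook", "Wind"])
print(tree["attribute"])  # Outlook has the higher gain on this toy data
```

On this toy data the sunny and overcast branches are already pure, so only the rain branch needs a second split, on Wind.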
Another Example
At the weekend:
- go shopping,
- watch a movie,
- play tennis or
- just stay in.
Classification Techniques
2- Bayesian Classification
◘ A statistical classifier: it performs probabilistic prediction, i.e., it predicts class membership probabilities.
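To make "class membership probabilities" concrete, the sketch below applies Bayes' theorem, P(C | x) = P(x | C) P(C) / P(x), to one observed tuple x. The function name `class_probabilities` and all numeric values are hypothetical; in practice the priors and likelihoods would be estimated from the training set.

```python
def class_probabilities(priors, likelihoods):
    """Return P(class | x) for every class via Bayes' theorem.

    priors:      {class: P(class)}
    likelihoods: {class: P(x | class)} for one observed tuple x
    """
    joint = {c: priors[c] * likelihoods[c] for c in priors}  # P(x | C) * P(C)
    evidence = sum(joint.values())                           # P(x)
    if evidence == 0:
        raise ValueError("the observed tuple has zero probability under every class")
    return {c: j / evidence for c, j in joint.items()}


# Hypothetical numbers (illustrative only).
priors = {"yes": 0.6, "no": 0.4}
likelihoods = {"yes": 0.2, "no": 0.5}
posterior = class_probabilities(priors, likelihoods)
predicted_class = max(posterior, key=posterior.get)
print(posterior, predicted_class)
```

The predicted class is the one with the largest posterior; here the higher likelihood of "no" outweighs its lower prior.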
K-Nearest Neighbor (k-NN)