DSTBD_10-DMClassification-ENG
Data Base and Data Mining Group of Politecnico di Torino
Elena Baralis, Politecnico di Torino
Classification
◼ Objectives
◼ prediction of a class label
◼ definition of an interpretable model of a given
phenomenon
Classification
◼ Applications
◼ detection of customers likely to leave a company (churn or attrition)
◼ fraud detection
◼ classification of different pathology types
◼ …
Classification: definition
◼ Given
◼ a collection of class labels
◼ a collection of data objects labelled with a
class label
◼ Find a descriptive profile of each class,
which will allow the assignment of
unlabeled objects to the appropriate class
Definitions
◼ Training set
◼ Collection of labeled data objects used to learn
the classification model
◼ Test set
◼ Collection of labeled data objects used to
validate the classification model
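As a minimal illustration of how the two sets are used (a sketch assuming scikit-learn; the synthetic dataset and parameter values are illustrative, not from the slides):

```python
# Sketch: learn a classification model on a training set and validate it on a test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Labeled data objects: feature matrix X and class labels y
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# Hold out part of the labeled data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)            # learn the model on the training set
y_pred = model.predict(X_test)         # predict class labels for the test set
print("Test accuracy:", accuracy_score(y_test, y_pred))
```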
Classification techniques
◼ Decision trees
◼ Classification rules
◼ Association rules
◼ Neural Networks
◼ Naïve Bayes and Bayesian Networks
◼ k-Nearest Neighbours (k-NN)
◼ Support Vector Machines (SVM)
◼ …
Evaluation of classification techniques
◼ Accuracy
  ◼ quality of the prediction
◼ Efficiency
  ◼ model building time
  ◼ classification time
◼ Interpretability
  ◼ interpretability of the model and of its predictions
Decision trees
Example of decision tree
From: Tan, Steinbach, Kumar, Introduction to Data Mining, McGraw Hill 2006
Another example of decision tree
A different tree fits the same training data: the root splits on Marital Status (Married vs. Single, Divorced); the Married branch is a leaf labeled NO, while the Single/Divorced branch splits on Refund (Yes: NO) and then on Taxable Income (< 80K: NO, > 80K: YES).
Training data:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
There could be more than one tree that fits the same data!
Apply Model to Test Data
Test data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
Start from the root of the tree and follow the branch matching the record at each node: Refund = No leads to the Marital Status node, and Marital Status = Married leads to the leaf labeled NO.
Assign Cheat to "No"
Decision tree induction
◼ Many algorithms to build a decision tree
◼ Hunt’s Algorithm (one of the earliest)
◼ CART
◼ ID3, C4.5, C5.0
◼ SLIQ, SPRINT
General structure of Hunt's algorithm
Let Dt be the set of training records that reach a node t.
Basic steps
◼ If Dt contains records that belong to the same class yt
  ◼ then t is a leaf node labeled as yt
◼ If Dt is an empty set
  ◼ then t is a leaf node labeled as the default (majority) class, yd
◼ If Dt contains records that belong to more than one class
  ◼ select the "best" attribute A on which to split Dt and label node t as A
  ◼ split Dt into smaller subsets and recursively apply the procedure to each subset
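A compact sketch of this recursive scheme, deliberately simplified (binary equality splits on categorical attributes, chosen by weighted Gini; all names and the tiny dataset are illustrative, not the original formulation):

```python
# Simplified sketch of Hunt-style recursive tree growing (illustrative only).
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0

def hunt(records, labels, default):
    if not records:                       # Dt empty: leaf with default (majority) class yd
        return ("leaf", default)
    if len(set(labels)) == 1:             # all records in the same class yt: leaf labeled yt
        return ("leaf", labels[0])
    majority = Counter(labels).most_common(1)[0][0]
    best = None                           # locally select the test with the lowest weighted Gini
    for a in records[0]:
        for v in {r[a] for r in records}:
            left = [i for i, r in enumerate(records) if r[a] == v]
            right = [i for i in range(len(records)) if i not in set(left)]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / len(records)
            if best is None or score < best[0]:
                best = (score, a, v, left, right)
    if best is None:                      # no useful split: leaf with the majority class
        return ("leaf", majority)
    _, a, v, left, right = best
    return ("node", a, v,
            hunt([records[i] for i in left],  [labels[i] for i in left],  majority),
            hunt([records[i] for i in right], [labels[i] for i in right], majority))

# Tiny illustrative dataset (attribute names inspired by the slides; labels are illustrative)
data = [{"Refund": "Yes", "MarSt": "Single"}, {"Refund": "No", "MarSt": "Married"},
        {"Refund": "No", "MarSt": "Divorced"}, {"Refund": "No", "MarSt": "Single"}]
cheat = ["No", "No", "Yes", "Yes"]
print(hunt(data, cheat, default="No"))
```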
Hunt's algorithm: example
(figure: step-by-step growth of the tree on the example training data, with successive splits on Refund, Marital Status, and Taxable Income)
Decision tree induction
◼ Adopts a greedy strategy
◼ “Best” attribute for the split is selected locally at
each step
◼ not a global optimum
◼ Issues
◼ Structure of test condition
◼ Binary split versus multiway split
◼ Selection of the best attribute for the split
◼ Stopping condition for the algorithm
Structure of test condition
◼ Depends on attribute type
◼ nominal
◼ ordinal
◼ continuous
◼ Depends on number of outgoing edges
◼ 2-way split
◼ multi-way split
Splitting on nominal attributes
◼ Multi-way split
◼ use as many partitions as distinct values
  ◼ e.g. CarType: Family, Sports, Luxury
◼ Binary split
  ◼ divides values into two subsets
  ◼ e.g. CarType: {Sports, Luxury} vs. {Family}, or {Family, Luxury} vs. {Sports}
Splitting on ordinal attributes
◼ Multi-way split
◼ use as many partitions as distinct values
  ◼ e.g. Size: Small, Medium, Large
◼ Binary split
  ◼ divides values into two subsets, e.g. Size: {Small, Medium} vs. {Large}
  ◼ What about the split {Small, Large} vs. {Medium}? It does not preserve the order of the attribute values.
Splitting on continuous attributes
◼ Different techniques
◼ Discretization to form an ordinal categorical
attribute
◼ Static – discretize once at the beginning
◼ Dynamic – discretize during tree induction
Ranges can be found by equal interval bucketing,
equal frequency bucketing (percentiles), or
clustering
◼ Binary decision (A < v) or (A ≥ v)
◼ consider all possible splits and find the best cut
◼ more computationally intensive
Splitting on continuous attributes
(figure: a multi-way split on discretized Taxable Income ranges, e.g. < 10K, ..., > 80K, vs. a binary split on Taxable Income > 80K? with Yes/No branches)
Selection of the best attribute
Before splitting: 10 records of class 0, 10 records of class 1
(figure: three candidate splits, on Own Car?, Car Type?, and Student ID?)
Selection of the best attribute
◼ Attributes with homogeneous class
distribution are preferred
◼ Need a measure of node impurity
(example: a node with class counts C0: 5, C1: 5 is non-homogeneous and has high impurity; a node with C0: 9, C1: 1 is more homogeneous and has low impurity)
Measures of node impurity
◼ Many different measures available
◼ Gini index
◼ Entropy
◼ Misclassification error
◼ Different algorithms rely on different
measures
How to find the best attribute
◼ Before splitting, compute the impurity M0 of the parent node (class counts C0: N00, C1: N01)
◼ For each candidate attribute (e.g. A and B, each with a Yes/No split), compute the impurity of the child nodes (M1 and M2 for A, M3 and M4 for B) and combine them into the weighted impurity of the split (M12 for A, M34 for B)
◼ Gain = M0 – M12 vs. M0 – M34: choose the attribute whose split yields the highest gain
GINI impurity measure
◼ Gini index for a given node t

  $GINI(t) = 1 - \sum_j [p(j|t)]^2$

  where p(j|t) is the relative frequency of class j at node t
Examples for computing GINI
(figure: example nodes with different class count distributions and the corresponding Gini values computed with the formula above)
Splitting based on GINI
◼ Used in CART, SLIQ, SPRINT
◼ When a node p is split into k partitions (children), the
quality of the split is computed as
  $GINI_{split} = \sum_{i=1}^{k} \frac{n_i}{n} GINI(i)$

  where
  ni = number of records at child i
  n = number of records at node p
Computing GINI index: Boolean attribute
◼ Splits into two partitions
◼ larger and purer partitions are sought for

Example: parent node with C1 = 6, C2 = 6, Gini = 0.500, split on attribute B (Yes/No):
  Node N1 (Yes): C1 = 5, C2 = 2
  Node N2 (No):  C1 = 1, C2 = 4
Gini(N1) = 1 – (5/7)² – (2/7)² = 0.408
Gini(N2) = 1 – (1/5)² – (4/5)² = 0.32
Gini(split on B) = 7/12 · 0.408 + 5/12 · 0.32 = 0.371
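The numbers above can be reproduced with a few lines of Python (a sketch; the helper name is illustrative):

```python
# Reproduce the Gini computation for the split on B.
def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

parent = gini([6, 6])                      # 0.500
g_n1 = gini([5, 2])                        # ~0.408
g_n2 = gini([1, 4])                        # 0.320
g_split = 7 / 12 * g_n1 + 5 / 12 * g_n2    # ~0.371, children weighted by size
print(round(parent, 3), round(g_n1, 3), round(g_n2, 3), round(g_split, 3))
```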
Computing GINI index: Categorical attribute
◼ For each distinct value, gather counts for each class in the
dataset
◼ Use the count matrix to make decisions
Computing GINI index: Continuous attribute
◼ Based on a binary decision on one splitting value v
  ◼ number of possible splitting values = number of distinct values of the attribute
◼ Each splitting value v has an associated count matrix
  ◼ class counts in the two partitions
    ◼ A < v
    ◼ A ≥ v
(example: the binary split Taxable Income > 80K? on the training data above)
Computing GINI index: Continuous attribute
◼ For each attribute
◼ Sort the attribute on values
◼ Linearly scan these values, each time updating the count matrix
and computing gini index
◼ Choose the split position that has the least gini index
(example: class counts and Gini values computed at every candidate split position on the sorted Taxable Income values; the best split is the one with the lowest Gini, 0.300 — see the sketch below)
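A sketch of the sort-and-scan procedure for one continuous attribute (it assumes the Taxable Income values and Cheat labels from the example table; the helper names are illustrative):

```python
# Sort the attribute values and evaluate the weighted Gini at each candidate cut point.
def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels)) if n else 0

income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]   # Taxable Income (K)
cheat  = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

pairs = sorted(zip(income, cheat))
best = None
for i in range(1, len(pairs)):
    v = (pairs[i - 1][0] + pairs[i][0]) / 2              # candidate split value (midpoint)
    left  = [c for x, c in pairs if x < v]
    right = [c for x, c in pairs if x >= v]
    score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
    if best is None or score < best[0]:
        best = (score, v)
print("best split: Taxable Income <", best[1], "with Gini", round(best[0], 3))
```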
Entropy impurity measure (INFO)
◼ Entropy at a given node t

  $Entropy(t) = -\sum_j p(j|t) \log_2 p(j|t)$

  where p(j|t) is the relative frequency of class j at node t
Splitting Based on INFO
◼ Information Gain

  $GAIN_{split} = Entropy(p) - \sum_{i=1}^{k} \frac{n_i}{n} Entropy(i)$

  where the parent node p is split into k partitions and ni is the number of records in partition i
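For comparison, the same split evaluation with entropy and information gain (a sketch; it reuses the class counts of the Boolean-split example above):

```python
# Entropy and information gain for the earlier split on B (sketch).
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

parent = entropy([6, 6])                                    # 1.0 bit
children = [[5, 2], [1, 4]]                                 # class counts in N1 and N2
n = sum(sum(c) for c in children)
weighted = sum(sum(c) / n * entropy(c) for c in children)
gain = parent - weighted                                    # GAIN_split
print(round(parent, 3), round(weighted, 3), round(gain, 3))
```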
Stopping Criteria for Tree Induction
◼ Stop expanding a node when all the records
belong to the same class
◼ Early termination
◼ Pre-pruning
◼ Post-pruning
Underfitting and Overfitting
(figure: training and test error as a function of model complexity, i.e. the number of tree nodes)
◼ Underfitting: when the model is too simple, both training and test errors are large
◼ Overfitting: when the model is too complex, training error keeps decreasing while test error increases
Overfitting due to Noise
How to address overfitting
◼ Pre-Pruning (Early Stopping Rule)
◼ Stop the algorithm before it becomes a fully-grown tree
How to address overfitting
◼ Post-pruning
◼ Grow decision tree to its entirety
◼ Trim the nodes of the decision tree in a bottom-
up fashion
◼ If generalization error improves after trimming,
replace sub-tree by a leaf node.
◼ Class label of leaf node is determined from
majority class of instances in the sub-tree
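One widely used form of post-pruning, minimal cost-complexity pruning, is exposed by scikit-learn through the ccp_alpha parameter. A hedged sketch (not necessarily the pruning scheme the slide refers to; dataset and parameters are illustrative):

```python
# Sketch of post-pruning via cost-complexity pruning in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow the full tree, then evaluate increasingly pruned versions on held-out data
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    score = tree.score(X_val, y_val)     # generalization estimate on validation data
    if score > best_score:
        best_alpha, best_score = alpha, score
print("best ccp_alpha:", best_alpha, "validation accuracy:", round(best_score, 3))
```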
Data fragmentation
◼ Number of instances gets smaller as you
traverse down the tree
Handling missing attribute values
◼ Missing values affect decision tree
construction in three different ways
◼ Affect how impurity measures are computed
◼ Affect how to distribute instances with missing
value to child nodes
◼ Affect how a test instance with missing value is
classified
Decision boundary
(figure: two-dimensional data with attributes x and y, partitioned by a decision tree whose root test is x < 0.43?; each rectangular region contains records of a single class)
• The border line between two neighboring regions of different classes is known as decision boundary
• The decision boundary is parallel to the axes because each test condition involves a single attribute at a time
Oblique decision trees
(figure: a single oblique test condition, x + y < 1, separates Class = + from Class = –)
◼ Test conditions may involve more than one attribute at a time
Evaluation of decision trees
◼ Accuracy
  ◼ For simple datasets, comparable to other classification techniques
◼ Interpretability
  ◼ Model is interpretable for small trees
  ◼ Single predictions are interpretable
◼ Incrementality
  ◼ Not incremental
◼ Efficiency
  ◼ Fast model building
  ◼ Very fast classification
◼ Scalability
  ◼ Scalable both in training set size and attribute number
◼ Robustness
  ◼ Difficult management of missing data
Random Forest
Random Forest
◼ Ensemble learning technique
◼ multiple base models are combined
◼ to improve accuracy and stability
◼ to avoid overfitting
Bibliography: Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Springer, 2009
Random Forest
(figure: random subsets D1, ..., Dj, ..., DB are drawn from the original training dataset; a decision tree is trained on each subset and their predictions are combined into the final class)
Bootstrap aggregation
◼ Given a training set D of n instances, it selects B
times a random sample with replacement from D
and trains trees on these dataset samples
◼ For b = 1, ..., B
  ◼ sample with replacement n' training examples from D, with n' ≤ n
  ◼ train a decision tree on the sample
Feature bagging
◼ Selects, for each candidate split in the learning
process, a random subset of the features
  ◼ if p is the number of features, √p features are typically selected
◼ Trees are decorrelated
◼ Feature subsets are sampled randomly, hence different
features can be selected as best attributes for the split
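A hedged sketch of the two ingredients (bootstrap samples and √p feature subsets) as exposed by scikit-learn's RandomForestClassifier (parameter values are illustrative):

```python
# Random forest sketch: B bootstrapped trees, each split chosen among sqrt(p) random features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=25, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

forest = RandomForestClassifier(
    n_estimators=100,      # B trees, each trained on a bootstrap sample (bagging)
    max_features="sqrt",   # feature bagging: sqrt(p) candidate features per split
    bootstrap=True,
    random_state=1,
)
forest.fit(X_tr, y_tr)
print("test accuracy:", round(forest.score(X_te, y_te), 3))
```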
Random Forest – Algorithm Recap
(figure: recap of the algorithm, combining bootstrap aggregation and feature bagging to train the B trees of the forest)
Evaluation of random forests
◼ Accuracy
  ◼ Higher than decision trees
◼ Efficiency
  ◼ Fast model building
  ◼ Very fast classification
◼ Interpretability
  ◼ Model is not interpretable (it is an ensemble of many trees)
Rule-based classification
Rule-based classifier
◼ Classify records by using a collection of “if…then…”
rules
◼ Rule: (Condition) → y
◼ where
◼ Condition is a conjunction of simple predicates
◼ y is the class label
◼ LHS: rule antecedent or condition
◼ RHS: rule consequent
◼ Examples of classification rules
  ◼ (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
  ◼ (Taxable Income < 50K) ∧ (Refund=Yes) → Cheat=No
Rule-based Classifier (Example)
Name Blood Type Give Birth Can Fly Live in Water Class
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds
Rule-based classification
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name Blood Type Give Birth Can Fly Live in Water Class
lemur warm yes no no ?
turtle cold no no sometimes ?
dogfish shark cold yes no yes ?
Characteristics of rules
◼ Mutually exclusive rules
◼ Two rule conditions can’t be true at the same
time
◼ Every record is covered by at most one rule
◼ Exhaustive rules
◼ Classifier rules account for every possible
combination of attribute values
◼ Each record is covered by at least one rule
From decision trees to rules
(figure: the decision tree on Refund, Marital Status and Taxable Income; one classification rule is generated for each path from the root to a leaf)
Classification Rules
(Refund=Yes) ==> No
Rules can be simplified
(figure: the decision tree — Refund, then Marital Status, then Taxable Income — together with the 10-record training data from the previous example)
E.g., the rule extracted from the Married branch, (Refund=No) ∧ (Marital Status=Married) → No, can be simplified to (Marital Status=Married) → No.
Effect of rule simplification
◼ Rules are no longer mutually exclusive
◼ A record may trigger more than one rule
◼ Solution?
◼ Ordered rule set
◼ Unordered rule set – use voting schemes
◼ Rules are no longer exhaustive
◼ A record may not trigger any rules
◼ Solution?
◼ Use a default class
Ordered rule set
◼ Rules are rank ordered according to their priority
◼ An ordered rule set is known as a decision list
◼ When a test record is presented to the classifier
◼ It is assigned to the class label of the highest ranked rule it has
triggered
◼ If none of the rules fired, it is assigned to the default class
Name Blood Type Give Birth Can Fly Live in Water Class
turtle cold no no sometimes ?
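A plain-Python sketch of a decision list built from rules R1–R5 above (the rule encoding and attribute names are illustrative): each test record is assigned the class of the highest-ranked rule it triggers, or the default class if no rule fires.

```python
# Decision-list sketch: the first matching rule wins, otherwise the default class is used.
rules = [  # (condition, class), in priority order R1..R5
    (lambda r: r["give_birth"] == "no" and r["can_fly"] == "yes", "Birds"),
    (lambda r: r["give_birth"] == "no" and r["live_in_water"] == "yes", "Fishes"),
    (lambda r: r["give_birth"] == "yes" and r["blood_type"] == "warm", "Mammals"),
    (lambda r: r["give_birth"] == "no" and r["can_fly"] == "no", "Reptiles"),
    (lambda r: r["live_in_water"] == "sometimes", "Amphibians"),
]

def classify(record, default="Unknown"):
    for condition, label in rules:
        if condition(record):
            return label          # highest-ranked triggered rule
    return default                # no rule fired

turtle = {"blood_type": "cold", "give_birth": "no", "can_fly": "no", "live_in_water": "sometimes"}
print(classify(turtle))           # R4 fires before R5: classified as Reptiles
```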
Building classification rules
◼ Direct Method
◼ Extract rules directly from data
◼ e.g.: RIPPER, CN2, Holte’s 1R
◼ Indirect Method
  ◼ Extract rules from other classification models (e.g. decision trees, neural networks)
  ◼ e.g.: C4.5rules
Evaluation of rule-based classifiers
◼ Accuracy
  ◼ Higher than decision trees
◼ Efficiency
  ◼ Fast model building
  ◼ Very fast classification
◼ Interpretability
  ◼ Model and predictions are interpretable
Associative classification
Associative classification
◼ The classification model is defined by
means of association rules
(Condition) → y
◼ rule body is an itemset
◼ Model generation
◼ Rule selection & sorting
◼ based on support, confidence and correlation
thresholds
◼ Rule pruning
    ◼ Database coverage: the training set is covered by selecting topmost rules according to the previous sort
Evaluation of associative classifiers
◼ Accuracy
  ◼ Higher than decision trees and rule-based classifiers
  ◼ correlation among attributes is considered
◼ Interpretability
  ◼ Model and prediction are interpretable
◼ Incrementality
  ◼ Not incremental
◼ Efficiency
  ◼ Rule generation may be slow
    ◼ It depends on the support threshold
  ◼ Very fast classification
◼ Scalability
  ◼ Scalable in training set size
  ◼ Reduced scalability in attribute number
    ◼ Rule generation may become unfeasible
◼ Robustness
  ◼ Unaffected by missing data
  ◼ Robust to outliers
K-Nearest Neighbor
Instance-Based Classifiers
(figure: a set of stored cases with their attributes and class labels, and an unseen case to be classified)
• Store the training records
• Use the stored records to predict the class label of unseen cases
Instance Based Classifiers
◼ Examples
◼ Rote-learner
◼ Memorizes entire training data and performs
classification only if attributes of record match one of
the training examples exactly
◼ Nearest neighbor
◼ Uses k “closest” points (nearest neighbors) for
performing classification
Nearest-Neighbor Classifiers
(figure: an unknown record among labeled training records)
Requires:
– The set of stored records
– A distance metric to compute the distance between records
– The value of k, the number of nearest neighbors to retrieve
Definition of Nearest Neighbor
(figure: the 1-, 2- and 3-nearest neighbors of a record x)
◼ The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
Nearest Neighbor Classification
◼ Compute distance between two points
  ◼ Euclidean distance

    $d(p,q) = \sqrt{\sum_i (p_i - q_i)^2}$
Nearest Neighbor Classification
◼ Scaling issues
◼ Attribute domain should be normalized to prevent
distance measures from being dominated by one
of the attributes
◼ Example: height [1.5m to 2.0m] vs. income
[$10K to $1M]
◼ Problem with distance measures
◼ High dimensional data
◼ curse of dimensionality
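A hedged sketch addressing the scaling issue: normalize the attributes before computing Euclidean distances, e.g. with a scikit-learn pipeline (dataset and parameter values are illustrative):

```python
# k-NN sketch: standardize attributes so no single attribute dominates the distance.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=6, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=3)

knn = make_pipeline(
    StandardScaler(),                                   # normalize attribute domains
    KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
)
knn.fit(X_tr, y_tr)                                     # (almost) no model building: stores the data
print("test accuracy:", round(knn.score(X_te, y_te), 3))
```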
Evaluation of KNN
◼ Accuracy
  ◼ Comparable to other classification techniques for simple datasets
◼ Interpretability
  ◼ Model is not interpretable
  ◼ Single predictions can be "described" by the neighbors
◼ Incrementality
  ◼ Incremental
  ◼ Training set must be available
◼ Efficiency
  ◼ (Almost) no model building
  ◼ Slower classification, requires computing distances
◼ Scalability
  ◼ Weakly scalable in training set size
  ◼ Curse of dimensionality for increasing attribute number
◼ Robustness
  ◼ Depends on distance computation
Bayesian Classification
Bayes theorem
◼ Let C and X be random variables
P(C,X) = P(C|X) P(X)
P(C,X) = P(X|C) P(C)
◼ Hence
P(C|X) P(X) = P(X|C) P(C)
◼ and also
P(C|X) = P(X|C) P(C) / P(X)
Bayesian classification
◼ Let the class attribute and all data attributes be random
variables
◼ C = any class label
◼ X = <x1,…,xk> record to be classified
◼ Bayesian classification
◼ compute P(C|X) for all classes
◼ probability that record X belongs to C
◼ assign X to the class with maximal P(C|X)
◼ Applying Bayes theorem
P(C|X) = P(X|C)·P(C) / P(X)
◼ P(X) constant for all C, disregarded for maximum computation
◼ P(C) a priori probability of C
P(C) = Nc/N
Bayesian classification
◼ How to estimate P(X|C), i.e. P(x1,…,xk|C)?
◼ Naïve hypothesis
P(x1,…,xk|C) = P(x1|C) P(x2|C) … P(xk|C)
◼ statistical independence of attributes x1,…,xk
◼ not always true
◼ model quality may be affected
◼ Computing P(xk|C)
◼ for discrete attributes
P(xk|C) = |xkC|/ Nc
◼ where |xkC| is number of instances having value xk for attribute k
and belonging to class C
◼ for continuous attributes, use probability distribution
◼ Bayesian networks
◼ allow specifying a subset of dependencies among attributes
Bayesian classification: Example
Outlook Temperature Humidity Windy Class
sunny hot high false N
sunny hot high true N
overcast hot high false P
rain mild high false P
rain cool normal false P
rain cool normal true N
overcast cool normal true P
sunny mild high false N
sunny cool normal false P
rain mild normal false P
sunny mild normal true P
overcast mild high true P
overcast hot normal false P
rain mild high true N
From: Han, Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann 2006
Bayesian classification: Example
outlook
P(sunny|p) = 2/9 P(sunny|n) = 3/5 P(p) = 9/14
P(overcast|p) = 4/9 P(overcast|n) = 0 P(n) = 5/14
P(rain|p) = 3/9 P(rain|n) = 2/5
temperature
P(hot|p) = 2/9 P(hot|n) = 2/5
P(mild|p) = 4/9 P(mild|n) = 2/5
P(cool|p) = 3/9 P(cool|n) = 1/5
humidity
P(high|p) = 3/9 P(high|n) = 4/5
P(normal|p) = 6/9 P(normal|n) = 2/5
windy
P(true|p) = 3/9 P(true|n) = 3/5
P(false|p) = 6/9 P(false|n) = 2/5
Bayesian classification: Example
◼ Data to be labeled
X = <rain, hot, high, false>
◼ For class p
P(X|p)·P(p) =
= P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p)
= 3/9·2/9·3/9·6/9·9/14 = 0.010582
◼ For class n
P(X|n)·P(n) =
= P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n)
= 2/5·2/5·4/5·2/5·5/14 = 0.018286
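The two scores above can be checked with a few lines of Python (a sketch; the dictionaries simply transcribe the conditional probabilities estimated on the previous slide):

```python
# Naive Bayes sketch: P(X|C)*P(C) for X = <rain, hot, high, false>.
from fractions import Fraction as F

priors = {"p": F(9, 14), "n": F(5, 14)}
cond = {
    "p": {"rain": F(3, 9), "hot": F(2, 9), "high": F(3, 9), "false": F(6, 9)},
    "n": {"rain": F(2, 5), "hot": F(2, 5), "high": F(4, 5), "false": F(2, 5)},
}
x = ["rain", "hot", "high", "false"]

for c in ("p", "n"):
    score = priors[c]
    for value in x:                 # naive hypothesis: multiply per-attribute probabilities
        score *= cond[c][value]
    print(c, float(score))          # ~0.010582 for p, ~0.018286 for n -> assign class n
```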
Evaluation of Naïve Bayes Classifiers
◼ Accuracy
  ◼ Similar or lower than decision trees
  ◼ Naïve hypothesis simplifies the model
◼ Efficiency
  ◼ Fast model building
  ◼ Very fast classification
◼ Scalability
  ◼ Scalable both in training set size and attribute number
Support Vector Machines
Support Vector Machines
(figures: a linearly separable two-class dataset; several hyperplanes, e.g. B1 and B2, separate the two classes; each hyperplane Bi has margin boundaries bi1 and bi2)
◼ Find a linear hyperplane (decision boundary) that separates the data
◼ Among the possible separating hyperplanes, choose the one that maximizes the margin
Nonlinear Support Vector Machines
◼ What if decision boundary is not linear?
Nonlinear Support Vector Machines
◼ Transform data into higher dimensional space
Evaluation of Support Vector Machines
◼ Accuracy
  ◼ Among best performers
◼ Interpretability
  ◼ Model and prediction are not interpretable
  ◼ Black box model
◼ Incrementality
  ◼ Not incremental
◼ Efficiency
  ◼ Model building requires significant parameter tuning
  ◼ Very fast classification
◼ Scalability
  ◼ Medium scalable both in training set size and attribute number
◼ Robustness
  ◼ Robust to noise and outliers
Artificial Neural Networks
Artificial Neural Networks
◼ Inspired by the structure of the human brain
◼ Neurons as elaboration units
◼ Synapses as connection network
Artificial Neural Networks
◼ Different tasks, different architectures
  ◼ image understanding: convolutional NN (CNN)
  ◼ time series analysis: recurrent NN (RNN)
Feed Forward Neural Network
Structure of a neuron
(figure: the inputs x0, x1, ..., xn are weighted by w0, w1, ..., wn, summed together with the offset -μk, and passed through the activation function f to produce the output y)
Activation Functions
◼ Activation
◼ simulates biological activation to input stimuli
◼ provides non-linearity to the computation
◼ may help to saturate neuron outputs in fixed ranges
Activation Functions
◼ Sigmoid, tanh
◼ saturate input value in a fixed range
◼ non-linear over the whole input range
◼ typically used by FFNNs for both hidden and output layers
◼ E.g. sigmoid in output layers allows generating values between 0 and 1
(useful when output must be interpreted as likelihood)
Activation Functions
◼ Binary Step
◼ outputs 1 when the input is non-negative, 0 otherwise
◼ useful for binary outputs
◼ issues: not appropriate for gradient descent
◼ derivative not defined in x=0
◼ derivative equal to 0 in every other position
Activation Functions
◼ Softmax
  ◼ unlike other activation functions, it is applied only to the output layer
  ◼ works by considering all the neurons in the layer: the outputs are normalized so that they sum to 1 and can be interpreted as class likelihoods
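A small numerical sketch of the activation functions mentioned above (NumPy assumed; these are the standard textbook definitions, not code from the slides):

```python
# Common activation functions (sketch with standard definitions).
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))           # saturates in (0, 1)

def tanh(x):
    return np.tanh(x)                      # saturates in (-1, 1)

def step(x):
    return (x >= 0).astype(float)          # binary step: 0/1 output, unusable gradient

def relu(x):
    return np.maximum(0, x)                # used in CNNs: does not saturate for x > 0

def softmax(z):
    e = np.exp(z - np.max(z))              # applied to the whole output layer
    return e / e.sum()                     # outputs sum to 1 (class likelihoods)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```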
Building a FFNN
◼ For each node, definition of
  ◼ the set of weights
  ◼ the offset value
  that provide the highest accuracy on the training data
◼ Iterative approach on training data
instances
Building a FFNN
◼ Base algorithm
◼ Initially assign random values to weights and offsets
◼ Process instances in the training set one at a time
◼ For each neuron, compute the result when applying weights,
offset and activation function for the instance
◼ Forward propagation until the output is computed
◼ Compare the computed output with the expected output, and
evaluate error
◼ Backpropagation of the error, by updating weights and offset for
each neuron
◼ The process ends when
◼ % of accuracy above a given threshold
◼ % of parameter variation (error) below a given threshold
◼ The maximum number of epochs is reached
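A hedged sketch of training a feed-forward network with scikit-learn's MLPClassifier, whose fit loop follows the same forward-propagation / backpropagation scheme (hyperparameter values and dataset are illustrative):

```python
# Feed-forward NN sketch: iterative training with backpropagation (scikit-learn MLP).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=7)

mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16),   # two hidden layers
                  activation="relu",
                  max_iter=300,                  # maximum number of epochs
                  tol=1e-4,                      # stop when improvement falls below this threshold
                  random_state=7),
)
mlp.fit(X_tr, y_tr)
print("test accuracy:", round(mlp.score(X_te, y_te), 3))
```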
Evaluation of Feed Forward NN
◼ Accuracy
  ◼ Among best performers
◼ Interpretability
  ◼ Model and prediction are not interpretable
  ◼ Black box model
◼ Incrementality
  ◼ Not incremental
◼ Efficiency
  ◼ Model building requires very complex parameter tuning
  ◼ It requires significant time
  ◼ Very fast classification
◼ Scalability
  ◼ Medium scalable both in training set size and attribute number
◼ Robustness
  ◼ Robust to noise and outliers
  ◼ Requires a large training set
  ◼ Otherwise unstable when tuning parameters
Convolutional Neural Networks
◼ Allow automatically extracting features from images and
performing classification
Convolutional Neural Networks
(figure: a CNN pipeline in which the convolutional layers perform feature extraction and the final fully connected layers perform the classification, with softmax activation on the output)
Convolutional Neural Networks
◼ Typical convolutional layer
◼ convolution stage: feature extraction by means of (hundreds to
thousands) sliding filters
◼ sliding filters activation: apply activation functions to input tensor
◼ pooling: tensor downsampling
(figure: convolution → activation → pooling)
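A hedged sketch of one convolution → activation → pooling stage and the resulting tensor shapes (it assumes PyTorch; the layer sizes are illustrative):

```python
# One convolutional stage (sketch): convolution, ReLU activation, max pooling.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)          # input tensor: batch of 1 RGB image, shape [N, d, h, w]

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)  # 16 sliding filters
act = nn.ReLU()
pool = nn.MaxPool2d(kernel_size=2)     # downsample h and w by 2

out = pool(act(conv(x)))
print(conv(x).shape)                   # torch.Size([1, 16, 32, 32]) -> one depth layer per filter
print(out.shape)                       # torch.Size([1, 16, 16, 16]) after pooling
```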
Convolutional Neural Networks
◼ Tensors
◼ data flowing through CNN layers is represented in the form of tensors
◼ Tensor = N-dimensional vector
◼ Rank = number of dimensions
◼ scalar: rank 0
◼ 1-D vector: rank 1
◼ 2-D matrix: rank 2
◼ Shape = number of elements for each dimension
◼ e.g. a vector of length 5 has shape [5]
◼ e.g. a matrix w x h, w=5, h=3 has shape [h, w] = [3, 5]
Convolutional Neural Networks
◼ Images
◼ rank-3 tensors with shape [d,h,w]
◼ where h=height, w=width, d=image depth (1 for grayscale, 3 for RGB
colors)
Convolutional Neural Networks
◼ Convolution
◼ processes data in form of tensors (multi-dimensional matrices)
◼ input: input image or intermediate features (tensor)
◼ output: a tensor with the extracted features
Convolutional Neural Networks
◼ Convolution
◼ a sliding filter produces the values of the output tensor
◼ sliding filters contain the trainable weights of the neural network
◼ each convolutional layer contains many (hundreds) filters
(figure: a sliding filter, with padding, moves over the input tensor to produce the output tensor)
Convolutional Neural Networks
◼ Convolution
◼ images are transformed into features by convolutional filters
◼ after convolving a tensor [d,h,w] with N filters we obtain
◼ a rank-3 tensor with shape [N,h,w]
◼ hence, each filter generates a layer in the depth of the output tensor
Convolutional Neural Networks
◼ Activation
◼ simulates biological activation to input stimuli
◼ provides non-linearity to the computation
◼ ReLU is typically used for CNNs
◼ faster training (no vanishing gradients)
◼ does not saturate
◼ faster computation of derivatives for backpropagation
(figure: the ReLU function, ReLU(x) = max(0, x))
Convolutional Neural Networks
◼ Pooling
◼ performs tensor downsampling
◼ sliding filter which replaces tensor values with a summary statistic of the
nearby outputs
◼ maxpool is the most common: computes the maximum value as statistic
(figure: a sliding max-pooling filter producing the downsampled output tensor)
Convolutional Neural Networks
◼ Convolutional layers training
◼ during training each sliding filter learns to recognize a particular
pattern in the input tensor
◼ filters in shallow layers recognize textures and edges
◼ filters in deeper layers can recognize objects and parts (e.g. eye,
ear or even faces)
(figure: patterns recognized by shallow filters vs. deeper filters)
Convolutional Neural Networks
◼ Semantic segmentation CNNs
◼ allow assigning a class to each pixel of the input image
◼ composed of 2 parts
◼ encoder network: convolutional layers to extract abstract features
◼ decoder network: deconvolutional layers to obtain the output image from
the extracted features
Recurrent Neural Networks
◼ Allow processing sequential data x(t)
◼ Differently from normal FFNN they are able to keep a state which
evolves during time
◼ Applications
◼ machine translation
◼ time series prediction
◼ speech recognition
◼ part of speech (POS) tagging
Recurrent Neural Networks
◼ RNN execution during time
(figure: the recurrent network unrolled over consecutive time steps, with the state passed from each step to the next)
Recurrent Neural Networks
◼ A RNN receives as input a vector x(t) and the state at previous time
step s(t-1)
◼ A RNN typically contains many neurons organized in different layers
(figure: internal structure — the input x(t) enters with weights w, the previous state s(t-1) re-enters through the state retroaction with weights w', and the network produces the output y(t))
Recurrent Neural Networks
◼ Training is performed with Backpropagation Through Time
◼ Given a training pair: input sequence x(t) and expected output y(t)
◼ error is propagated through time
◼ weights are updated to minimize the error across all the time steps
Recurrent Neural Networks
◼ Issues
  ◼ vanishing gradient: the error gradient decreases rapidly over time and the weights are not properly updated
    ◼ this makes it harder to train RNNs with long-term memories
(figure: an LSTM stage; LSTM cells are designed to mitigate the vanishing gradient problem)
Autoencoders
◼ Autoencoders allow compressing input data by means of compact
representations and from them reconstruct the initial input
◼ for feature extraction: the compressed representation can be used as significant
set of features representing input data
◼ for image (or signal) denoising: the image reconstructed from the abstract
representation is denoised with respect to the original one
(figure: the input is encoded into compressed data and then decoded to reconstruct it)
Word Embeddings (Word2Vec)
(figures: each input word, e.g. "man" or "king", is mapped to an embedding vector, e.g. e(man) = [e1, e2, e3], e(king) = [e1', e2', e3'])
Model evaluation
Model evaluation
◼ Methods for performance evaluation
◼ Partitioning techniques for training and test sets
◼ Metrics for performance evaluation
◼ Accuracy, other measures
◼ Techniques for model comparison
◼ ROC curve
Methods for performance evaluation
◼ Objective
◼ reliable estimate of performance
◼ Performance of a model may depend on
other factors besides the learning algorithm
◼ Class distribution
◼ Cost of misclassification
◼ Size of training and test sets
Learning curve
◼ The learning curve shows how accuracy changes with varying training sample size
◼ Requires a sampling schedule for creating the learning curve
  ◼ Arithmetic sampling (Langley et al.)
  ◼ Geometric sampling (Provost et al.)
◼ Effect of small sample size
  ◼ Bias in the estimate
  ◼ Variance of the estimate
Methods of estimation
◼ Partitioning labeled data for training,
validation and test
◼ Several partitioning techniques
◼ holdout
◼ cross validation
◼ Stratified sampling to generate partitions
◼ without replacement
◼ Bootstrap
◼ Sampling with replacement
Holdout
◼ Fixed partitioning
◼ Typically, may reserve 80% for training, 20%
for test
◼ Other proportions may be appropriate,
depending on the dataset size
◼ Appropriate for large datasets
◼ may be repeated several times
◼ repeated holdout
Cross validation
◼ Cross validation
◼ partition data into k disjoint subsets (i.e., folds)
◼ k-fold: train on k-1 partitions, test on the
remaining one
◼ repeat for all folds
◼ reliable accuracy estimation, not appropriate for
very large datasets
◼ Leave-one-out
◼ cross validation for k=n
◼ only appropriate for very small datasets
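A hedged sketch of k-fold cross validation with scikit-learn (k = 5; the estimator and dataset are illustrative):

```python
# k-fold cross validation sketch: train on k-1 folds, test on the remaining fold, repeat.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=5)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=5)   # stratified partitioning
scores = cross_val_score(DecisionTreeClassifier(random_state=5), X, y, cv=cv)
print("fold accuracies:", scores.round(3))
print("mean accuracy:", round(scores.mean(), 3))
```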
Model performance estimation
◼ Model training step
◼ Building a new model
◼ Model validation step
◼ Hyperparameter tuning
◼ Algorithm selection
◼ Model test step
◼ Estimation of model performance
Model performance estimation
◼ Typical dataset size
◼ Training set 60% of labeled data
◼ Validation set 20% of labeled data
◼ Test set 20% of labeled data
◼ Splitting labeled data
◼ Use hold-out to split in
◼ training+validation
◼ test
◼ Use cross validation to split in
◼ training
◼ validation
Metrics for model evaluation
◼ Evaluate the predictive accuracy of a model
◼ Confusion matrix
◼ binary classifier
Accuracy
◼ Most widely-used metric for model
evaluation
  Accuracy = (number of correctly classified objects) / (number of classified objects)
Accuracy
◼ For a binary classifier

                      PREDICTED CLASS
                      Class=Yes    Class=No
ACTUAL    Class=Yes   a (TP)       b (FN)
CLASS     Class=No    c (FP)       d (TN)

  $Accuracy = \frac{a+d}{a+b+c+d} = \frac{TP+TN}{TP+TN+FP+FN}$
Limitations of accuracy
◼ Consider a binary problem
◼ Cardinality of Class 0 = 9900
◼ Cardinality of Class 1 = 100
◼ Model: () → class 0
◼ Model predicts everything to be class 0
◼ accuracy is 9900/10000 = 99.0 %
◼ Accuracy is misleading because the model
does not detect any class 1 object
Limitations of accuracy
◼ Classes may have different importance
◼ Misclassification of objects of a given class is
more important
◼ e.g., ill patients erroneously assigned to the
healthy patients class
◼ Accuracy is not appropriate for
◼ unbalanced class label distribution
◼ different class relevance
Class specific measures
◼ Evaluate separately for each class C
  ◼ precision p and recall r of class C
◼ Maximize

  $F\text{-}measure\ (F) = \frac{2rp}{r+p}$
Class specific measures
◼ For a binary classification problem, computed on the confusion matrix for the positive class

  $Precision\ (p) = \frac{a}{a+c}$

  $Recall\ (r) = \frac{a}{a+b}$

  $F\text{-}measure\ (F) = \frac{2rp}{r+p} = \frac{2a}{2a+b+c}$
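These measures can be computed directly from the confusion matrix counts, or with scikit-learn (a sketch; the label vectors are illustrative):

```python
# Precision, recall and F-measure for the positive class (sketch).
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()   # d, c, b, a in the slides' notation
precision = tp / (tp + fp)                                   # a / (a + c)
recall = tp / (tp + fn)                                      # a / (a + b)
f_measure = 2 * recall * precision / (recall + precision)    # 2rp / (r + p)
print(precision, recall, round(f_measure, 3))

# The same values via scikit-learn
p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, pos_label=1, average="binary")
print(p, r, round(f, 3))
```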
ROC (Receiver Operating Characteristic)
◼ Developed in 1950s for signal detection
theory to analyze noisy signals
◼ characterizes the trade-off between positive hits
and false alarms
◼ ROC curve plots
◼ TPR, True Positive Rate (on the y-axis)
TPR = TP/(TP+FN)
against
  ◼ FPR, False Positive Rate (on the x-axis)
    FPR = FP/(FP+TN)
ROC curve
Each point on the curve is a pair (FPR, TPR):
◼ (0,0): declare everything to be negative class
◼ (1,1): declare everything to be positive class
◼ (0,1): ideal
◼ Diagonal line
  ◼ Random guessing
◼ Below the diagonal line
  ◼ prediction is opposite of the true class
How to build a ROC curve
◼ Use a classifier that produces a posterior probability P(+|A) for each test instance A
◼ Sort the instances according to P(+|A) in decreasing order
◼ Apply a threshold at each unique value of P(+|A)
◼ Count the number of TP, FP, TN, FN at each threshold
  ◼ TP rate TPR = TP/(TP+FN)
  ◼ FP rate FPR = FP/(FP+TN)

Instance   P(+|A)   True Class
1          0.95     +
2          0.93     +
3          0.87     -
4          0.85     -
5          0.85     -
6          0.85     +
7          0.76     -
8          0.53     +
9          0.43     -
10         0.25     +
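The same procedure, applied to the ten scored instances above with scikit-learn (a sketch):

```python
# ROC curve sketch on the ten scored instances of the example.
from sklearn.metrics import roc_auc_score, roc_curve

scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = [1, 1, 0, 0, 0, 1, 0, 1, 0, 1]            # + -> 1, - -> 0

fpr, tpr, thresholds = roc_curve(labels, scores)   # thresholds applied at unique score values
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(labels, scores))
```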
How to build a ROC curve
Counts obtained by applying a threshold at each unique value of P(+|A):

Class       +     -     +     -     -     -     +     -     +     +
Threshold   0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
TP          5     4     4     3     3     3     3     2     2     1     0
FP          5     5     4     4     3     2     1     1     0     0     0
TN          0     0     1     1     2     3     4     4     5     5     5
FN          0     1     1     2     2     2     2     3     3     4     5
TPR         1     0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2   0
FPR         1     1     0.8   0.8   0.6   0.4   0.2   0.2   0     0     0

(figure: the resulting ROC curve)
Using ROC for Model Comparison
(figure: ROC curves of two models, M1 and M2)
◼ No model consistently outperforms the other
  ◼ M1 is better for small FPR
  ◼ M2 is better for large FPR
◼ Area under the ROC curve (AUC)
  ◼ Ideal: area = 1.0
  ◼ Random guess: area = 0.5