Section 5
Main objective of this session
Aim
• To present different techniques used to assess the performance of data mining models.
Learning outcomes
1.4 SEMMA Process
6.1 Assessing linear models
• Coefficient of determination R²
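As a minimal sketch of how the coefficient of determination can be computed, the snippet below uses the standard definition R² = 1 - SS_res / SS_tot. The function name `r_squared` and the sample values are hypothetical and chosen only for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: variation the model fails to explain.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the response around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and linear-model predictions.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y_obs, y_hat)
print(round(r2, 4))
```

A value close to 1 indicates that the model explains most of the variation in the response; a perfect fit gives exactly 1.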
– Given a cut-off, the model can make the classification. Example:
credit card fraud detection. We have a model that assigns each credit
card transaction a probability of being fraudulent,
P(Y = 1 | X1, ..., Xn).
– Transactions whose probability lies above the cut-off are classified
as Fraud; those below it are classified as No Fraud.
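The cut-off rule can be sketched in a few lines. This is an illustration only: the function name `classify`, the scores, and the cut-off value 0.5 are assumptions, as is the choice to treat a score exactly equal to the cut-off as Fraud.

```python
def classify(probabilities, cutoff):
    """Label a transaction 'Fraud' when its score reaches the cut-off."""
    return ["Fraud" if p >= cutoff else "No Fraud" for p in probabilities]


# Hypothetical model outputs P(Y = 1 | X1, ..., Xn) for four transactions.
scores = [0.05, 0.40, 0.62, 0.91]
labels = classify(scores, cutoff=0.5)
print(labels)
```

Raising the cut-off flags fewer transactions as fraudulent; lowering it flags more.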
6.2 Assessing categorical response models – scorecards
• Discriminatory power
– For each cut off we have a new confusion matrix and one pair of
(sensitivity, 1-specificity).
[Figure: the score P(Y = 1 | X1, ..., Xn) on a scale from No Fraud to Fraud, with three candidate cut-offs: Cut off 1, Cut off 2, Cut off 3.]
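To make the link between a cut-off, its confusion matrix, and the resulting (sensitivity, 1-specificity) pair concrete, here is a small sketch. The labels, scores, and cut-off values are made up, and the helper names `confusion_matrix` and `roc_point` are hypothetical. It assumes both classes are present in the data.

```python
def confusion_matrix(y_true, scores, cutoff):
    """Return (TP, FP, TN, FN) for the given cut-off."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_point(y_true, scores, cutoff):
    """Return the pair (sensitivity, 1 - specificity) for one cut-off."""
    tp, fp, tn, fn = confusion_matrix(y_true, scores, cutoff)
    sensitivity = tp / (tp + fn)
    # 1 - specificity equals the false positive rate FP / (FP + TN).
    one_minus_specificity = fp / (fp + tn)
    return sensitivity, one_minus_specificity


# Hypothetical data: 1 = fraud, 0 = no fraud.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]

for cutoff in (0.2, 0.5, 0.8):
    print(cutoff, confusion_matrix(y_true, scores, cutoff),
          roc_point(y_true, scores, cutoff))
```

Each cut-off yields a different confusion matrix and therefore a different pair: a low cut-off catches every fraud at the price of more false alarms, a high one does the opposite.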
• Discriminatory power
– Plotting all (sensitivity, 1-specificity) pairs, we obtain the Receiver Operating Characteristic (ROC) curve.
– The diagonal represents a “random” classifier model.
[Figure: ROC curve, with 1-Specificity on the horizontal axis and Sensitivity on the vertical axis, both running from 0 to 1.]
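The construction of the ROC curve can be sketched by sweeping the cut-off over every distinct score and collecting one point per cut-off. The function name `roc_curve_points` and the data are hypothetical, and the sketch assumes both classes occur in the data.

```python
def roc_curve_points(y_true, scores):
    """Return (1 - specificity, sensitivity) pairs, one per distinct cut-off."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # A cut-off above every score flags nothing: the curve starts at (0, 0).
    points = [(0.0, 0.0)]
    # Lower the cut-off step by step so the curve moves towards (1, 1).
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical data: 1 = fraud, 0 = no fraud.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]

points = roc_curve_points(y_true, scores)
for x, y in points:
    print(f"1-specificity={x:.2f}  sensitivity={y:.2f}")
```

Points lying above the diagonal (sensitivity greater than 1-specificity) indicate a model that separates the classes better than random assignment. The list of points can be handed to any plotting tool to draw the curve.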
• Discriminatory power
• The higher the better!