09 Classification DecisionTree Concept Tool Tagged
Classification 1 (Introduction and Decision Tree)
Prepared by Raymond Wong
The examples used in Decision Tree are borrowed from LW Chan's notes
XLMiner screenshots captured by Qixu Chen
Presented by Raymond Wong
raywong@cse
COMP1942 1
Classification

Suppose there is a person.

Race   Income  Child  Insurance
white  high    no     ?

Decision tree:

root
  child=yes: 100% Yes, 0% No
  child=no:
    Income=high: 100% Yes, 0% No
    Income=low:  0% Yes, 100% No
Classification

Suppose there is a person (new set).

Race   Income  Child  Insurance
white  high    no     ?

Training set:

Race   Income  Child  Insurance
black  high    no     yes
white  high    yes    yes
black  low     no     no
white  low     no     no

Decision tree:

root
  child=yes: 100% Yes, 0% No
  child=no:  0% Yes, 100% No
Applications

Insurance
  According to the attributes of customers, determine which customers will buy an insurance policy.
Marketing
  According to the attributes of customers, determine which customers will buy a product such as a computer.
Bank Loan
  According to the attributes of customers, determine which customers are "risky" customers and which are "safe" customers.
Applications

Network
  According to the traffic patterns, determine whether the patterns are related to some "security attacks".
Software
  According to the experience of programmers, determine which programmers can fix certain bugs.
Same/Difference
Classification
Clustering
Classification Methods
Decision Tree
Bayesian Classifier
Nearest Neighbor Classifier
Decision Trees

ID3 (Iterative Dichotomiser)
C4.5
CART (Classification And Regression Trees)
Measurement
How to use the data mining tool
Entropy

Example 1
Consider a random variable which has a uniform distribution over 32 outcomes.
To identify an outcome, we need a label that takes 32 different values.
Thus, 5-bit strings suffice as labels (log2 32 = 5).
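The arithmetic behind the 5-bit claim can be checked directly; a minimal sketch, assuming nothing beyond the base-2 logarithm:

```python
import math

# 32 equally likely outcomes need log2(32) bits to label one outcome.
bits = math.log2(32)
print(bits)  # 5.0
```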
Entropy

Entropy is used to measure how informative a node is.
If we are given a probability distribution P = (p1, p2, ..., pn), then the information conveyed by this distribution, also called the entropy of P, is:

I(P) = -(p1 x log p1 + p2 x log p2 + ... + pn x log pn)

All logarithms here are in base 2.
Entropy

For example,
  If P is (0.5, 0.5), then I(P) is 1.
  If P is (0.67, 0.33), then I(P) is 0.92.
  If P is (1, 0), then I(P) is 0.
Entropy is a way to measure the amount of information (or uncertainty).
The smaller the entropy, the more informative the node is.
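The entropy values above can be reproduced with a short function; a sketch (the function name `entropy` is ours, not from the slides):

```python
import math

def entropy(probs):
    """I(P) = -(p1 log p1 + ... + pn log pn), base-2 logs; 0 log 0 is taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(round(entropy([0.5, 0.5]), 4))   # 1.0
print(round(entropy([2/3, 1/3]), 4))   # 0.9183, i.e. about 0.92
print(entropy([1, 0]) == 0)            # True
```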
Entropy

Training set:

   Race   Income  Child  Insurance
1  black  high    no     yes
2  white  high    yes    yes
3  white  low     yes    yes
4  white  low     yes    yes
5  black  low     no     no
6  black  low     no     no
7  black  low     no     no
8  white  low     no     no

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Race,
Info(T_black) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(T_white) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.8113
Gain(Race, T) = Info(T) - Info(Race, T) = 1 - 0.8113 = 0.1887

For attribute Race, Gain(Race, T) = 0.1887
Entropy

(Same training set as above.)

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Income,
Info(T_high) = - 1 log 1 - 0 log 0 = 0
Info(T_low) = - 1/3 log 1/3 - 2/3 log 2/3 = 0.9183
Info(Income, T) = ¼ x Info(T_high) + ¾ x Info(T_low) = 0.6887
Gain(Income, T) = Info(T) - Info(Income, T) = 1 - 0.6887 = 0.3113

For attribute Race, Gain(Race, T) = 0.1887
For attribute Income, Gain(Income, T) = 0.3113
root
  child=yes: 100% Yes, 0% No    {2, 3, 4}: Insurance 3 Yes, 0 No
  child=no:  20% Yes, 80% No    {1, 5, 6, 7, 8}: Insurance 1 Yes, 4 No

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Child,
Info(T_yes) = - 1 log 1 - 0 log 0 = 0
Info(T_no) = - 1/5 log 1/5 - 4/5 log 4/5 = 0.7219
Info(Child, T) = 3/8 x Info(T_yes) + 5/8 x Info(T_no) = 0.4512
Gain(Child, T) = Info(T) - Info(Child, T) = 1 - 0.4512 = 0.5488

For attribute Race, Gain(Race, T) = 0.1887
For attribute Income, Gain(Income, T) = 0.3113
For attribute Child, Gain(Child, T) = 0.5488

Since Child gives the largest gain, the root splits on attribute Child.
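The three gains can be reproduced in a few lines of Python; a sketch over the slides' 8-tuple training set (variable and function names are ours):

```python
import math
from collections import Counter

# Training set: (Race, Income, Child, Insurance); Insurance is the class label.
DATA = [
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]

def info(rows):
    """Info(T): entropy of the class distribution of rows."""
    n = len(rows)
    return -sum(c / n * math.log2(c / n)
                for c in Counter(r[-1] for r in rows).values())

def gain(rows, col):
    """Gain(A, T) = Info(T) - Info(A, T) for the attribute in column col."""
    n = len(rows)
    info_a = sum(len(sub) / n * info(sub)
                 for v in {r[col] for r in rows}
                 for sub in [[r for r in rows if r[col] == v]])
    return info(rows) - info_a

for name, col in [("Race", 0), ("Income", 1), ("Child", 2)]:
    print(name, round(gain(DATA, col), 4))
# Race 0.1887, Income 0.3113, Child 0.5488: Child wins the root split.
```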
root
  child=yes: 100% Yes, 0% No    {2, 3, 4}: Insurance 3 Yes, 0 No
  child=no:  20% Yes, 80% No    {1, 5, 6, 7, 8}: Insurance 1 Yes, 4 No

Now consider the child=no branch, T = {1, 5, 6, 7, 8}.

Info(T) = - 1/5 log 1/5 - 4/5 log 4/5 = 0.7219

For attribute Race,
Info(T_black) = - ¼ log ¼ - ¾ log ¾ = 0.8113
Info(T_white) = - 0 log 0 - 1 log 1 = 0
Info(Race, T) = 4/5 x Info(T_black) + 1/5 x Info(T_white) = 0.6490
Gain(Race, T) = Info(T) - Info(Race, T) = 0.7219 - 0.6490 = 0.0729

For attribute Race, Gain(Race, T) = 0.0729
root
  child=yes: 100% Yes, 0% No    {2, 3, 4}: Insurance 3 Yes, 0 No
  child=no:  20% Yes, 80% No    {1, 5, 6, 7, 8}: Insurance 1 Yes, 4 No

Still within the child=no branch, T = {1, 5, 6, 7, 8}.

Info(T) = - 1/5 log 1/5 - 4/5 log 4/5 = 0.7219

For attribute Income,
Info(T_high) = - 1 log 1 - 0 log 0 = 0
Info(T_low) = - 0 log 0 - 1 log 1 = 0
Info(Income, T) = 1/5 x Info(T_high) + 4/5 x Info(T_low) = 0
Gain(Income, T) = Info(T) - Info(Income, T) = 0.7219 - 0 = 0.7219

For attribute Race, Gain(Race, T) = 0.0729
For attribute Income, Gain(Income, T) = 0.7219
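The same computation on the child=no subset reproduces the branch-level gains; a sketch (names are ours), reusing the entropy and gain definitions from before:

```python
import math
from collections import Counter

DATA = [  # (Race, Income, Child, Insurance)
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]

def info(rows):
    n = len(rows)
    return -sum(c / n * math.log2(c / n)
                for c in Counter(r[-1] for r in rows).values())

def gain(rows, col):
    n = len(rows)
    info_a = sum(len(sub) / n * info(sub)
                 for v in {r[col] for r in rows}
                 for sub in [[r for r in rows if r[col] == v]])
    return info(rows) - info_a

# Restrict to the child=no branch, i.e. tuples {1, 5, 6, 7, 8}.
branch = [r for r in DATA if r[2] == "no"]
print(round(info(branch), 4))     # 0.7219
print(round(gain(branch, 0), 4))  # Race:   0.0729
print(round(gain(branch, 1), 4))  # Income: 0.7219
```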
Since Income gives the larger gain, the child=no branch splits on Income.

root
  child=yes: 100% Yes, 0% No
  child=no:
    Income=high: 100% Yes, 0% No    {1}: Insurance 1 Yes, 0 No
    Income=low:  0% Yes, 100% No    {5, 6, 7, 8}: Insurance 0 Yes, 4 No

For attribute Race, Gain(Race, T) = 0.0729
For attribute Income, Gain(Income, T) = 0.7219
Decision tree:

root
  child=yes: 100% Yes, 0% No
  child=no:
    Income=high: 100% Yes, 0% No
    Income=low:  0% Yes, 100% No

Suppose there is a new person.

Race   Income  Child  Insurance
white  high    no     ?

Following the tree (child=no, then Income=high), the predicted Insurance is "yes".
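Classifying the new person amounts to walking the tree from the root; a hand-coded sketch of the final tree above (function name is ours):

```python
def classify(person):
    """The decision tree above, written out as nested conditions."""
    if person["Child"] == "yes":
        return "yes"                 # child=yes leaf: 100% Yes
    if person["Income"] == "high":
        return "yes"                 # child=no, Income=high leaf: 100% Yes
    return "no"                      # child=no, Income=low leaf: 100% No

new_person = {"Race": "white", "Income": "high", "Child": "no"}
print(classify(new_person))  # yes
```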
Decision tree:

root
  child=yes: 100% Yes, 0% No
  child=no:
    Income=high: 100% Yes, 0% No
    Income=low:  0% Yes, 100% No

Termination criteria?
  e.g., height of the tree
  e.g., accuracy of each node
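The whole construction (pick the attribute with the largest gain, recurse, stop on a pure node or a height limit) can be sketched as a small ID3-style builder; the names and the tuple-based tree representation are ours, not the slides':

```python
import math
from collections import Counter

DATA = [  # (Race, Income, Child, Insurance)
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]

def info(rows):
    n = len(rows)
    return -sum(c / n * math.log2(c / n)
                for c in Counter(r[-1] for r in rows).values())

def gain(rows, col):
    n = len(rows)
    info_a = sum(len(sub) / n * info(sub)
                 for v in {r[col] for r in rows}
                 for sub in [[r for r in rows if r[col] == v]])
    return info(rows) - info_a

def build(rows, attrs, max_depth):
    labels = Counter(r[-1] for r in rows)
    # Termination: pure node, no attributes left, or tree-height limit reached.
    if len(labels) == 1 or not attrs or max_depth == 0:
        return labels.most_common(1)[0][0]         # leaf = majority class
    col = max(attrs, key=lambda c: gain(rows, c))  # split on the largest gain
    return (col, {v: build([r for r in rows if r[col] == v],
                           attrs - {col}, max_depth - 1)
                  for v in {r[col] for r in rows}})

tree = build(DATA, {0, 1, 2}, max_depth=3)
# Root splits on Child (column 2); the child=no branch splits on Income (column 1).
```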
Decision Trees

ID3
C4.5
CART
Measurement
How to use the data mining tool
C4.5

ID3
  Impurity measurement:
  Gain(A, T) = Info(T) - Info(A, T)

C4.5
  Impurity measurement (gain ratio):
  Gain(A, T) = (Info(T) - Info(A, T)) / SplitInfo(A)
  where SplitInfo(A) = - Σ_(v ∈ A) p(v) log p(v), summing over the values v of attribute A
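C4.5's measurement divides the ID3 gain by SplitInfo; a sketch on the running training set (names are ours). For Race the two values split the data 4/4, so SplitInfo(Race) = 1 and the gain ratio equals the plain gain:

```python
import math
from collections import Counter

DATA = [  # (Race, Income, Child, Insurance)
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info(rows):
    n = len(rows)
    return entropy([c / n for c in Counter(r[-1] for r in rows).values()])

def gain(rows, col):
    n = len(rows)
    info_a = sum(len(sub) / n * info(sub)
                 for v in {r[col] for r in rows}
                 for sub in [[r for r in rows if r[col] == v]])
    return info(rows) - info_a

def split_info(rows, col):
    """SplitInfo(A) = - sum over values v of A of p(v) log p(v)."""
    n = len(rows)
    return entropy([c / n for c in Counter(r[col] for r in rows).values()])

def gain_ratio(rows, col):
    return gain(rows, col) / split_info(rows, col)

print(round(gain_ratio(DATA, 0), 4))  # Race: 0.1887 (SplitInfo(Race) = 1)
print(round(gain_ratio(DATA, 1), 4))  # Income: about 0.3837
```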
Entropy

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Race,
Info(T_black) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(T_white) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Entropy

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Income,
Info(T_high) = - 1 log 1 - 0 log 0 = 0
Info(T_low) = - 1/3 log 1/3 - 2/3 log 2/3 = 0.9183
CART

Impurity measurement: Gini
I(P) = 1 - Σ_j p_j²
Gini

Info(T) = 1 - (½)² - (½)² = ½

For attribute Race,
Info(T_black) = 1 - (¾)² - (¼)² = 0.375
Info(T_white) = 1 - (¾)² - (¼)² = 0.375
Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.375
Gain(Race, T) = Info(T) - Info(Race, T) = ½ - 0.375 = 0.125

For attribute Race, Gain(Race, T) = 0.125
Gini

Info(T) = 1 - (½)² - (½)² = ½

For attribute Income,
Info(T_high) = 1 - 1² - 0² = 0
Info(T_low) = 1 - (1/3)² - (2/3)² = 0.4444
Info(Income, T) = ¼ x Info(T_high) + ¾ x Info(T_low) = 0.3333
Gain(Income, T) = Info(T) - Info(Income, T) = ½ - 0.3333 = 0.1667
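CART's Gini computations follow the same pattern as before, with I(P) = 1 - Σ p² in place of entropy; a sketch (names are ours):

```python
from collections import Counter

DATA = [  # (Race, Income, Child, Insurance)
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]

def gini(rows):
    """I(P) = 1 - sum of pj^2 over the class distribution of rows."""
    n = len(rows)
    return 1 - sum((c / n) ** 2 for c in Counter(r[-1] for r in rows).values())

def gini_gain(rows, col):
    n = len(rows)
    weighted = sum(len(sub) / n * gini(sub)
                   for v in {r[col] for r in rows}
                   for sub in [[r for r in rows if r[col] == v]])
    return gini(rows) - weighted

print(gini(DATA))                    # 0.5
print(round(gini_gain(DATA, 0), 4))  # Race:   0.125
print(round(gini_gain(DATA, 1), 4))  # Income: 0.1667
```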
Measurement
Confusion Matrix
Error Report
Lift Chart
Decile-wise lift chart
Others
Measurement – Confusion Matrix

Suppose there is a person (new set).

Race   Income  Child  Insurance
white  high    no     ?

Training set:

Race   Income  Child  Insurance
black  high    no     yes
white  high    yes    yes
black  low     no     no
white  low     no     no

Decision tree:

root
  child=yes: 100% Yes, 0% No
  child=no:  0% Yes, 100% No
Measurement – Error Report

Class    # Cases  # Errors  % Error
Yes      4        0         0.00
No       4        0         0.00
Overall  8        0         0.00

Scoring the training set with the decision tree:

Race   Income  Child  Actual  Predicted
black  high    no     yes     yes
white  high    yes    yes     yes
white  low     yes    yes     yes
white  low     yes    yes     yes
black  low     no     no      no
black  low     no     no      no
black  low     no     no      no
white  low     no     no      no
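A confusion matrix and error report tabulate actual vs. predicted labels; a sketch using the slide's training set, where the tree predicts every tuple correctly (names are ours):

```python
from collections import Counter

# (actual, predicted) Insurance labels for the 8 training tuples,
# scored with the decision tree; here every prediction is correct.
actual    = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
predicted = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

confusion = Counter(zip(actual, predicted))
print(confusion[("yes", "yes")], confusion[("yes", "no")])  # 4 0
print(confusion[("no", "no")], confusion[("no", "yes")])    # 4 0

errors = sum(a != p for a, p in zip(actual, predicted))
print(f"{errors} errors, {100 * errors / len(actual):.2f}% error")  # 0 errors, 0.00% error
```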
Measurement
Confusion Matrix
Error Report
Lift Chart
Decile-wise lift chart
Others
Measurement - Lift Chart

Lift charts
  are visual aids for measuring model performance
  consist of a lift curve and a baseline
Measurement - Lift Chart

Lift charts
  We need to define which value in the target attribute is a "success".
  In our running example, we can treat "Yes" as a success.
Measurement - Lift Chart

[Chart: cumulative lift curve. Y-axis: cumulative count (0 to 4); x-axis: # cases (1 to 8); series: "Cumulative Insurance of actual values" (the Lift Curve).]
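The lift curve plots, after each case (with the cases sorted so the predicted successes come first), the cumulative number of actual "Yes" values; the baseline is what a random ordering would give. A sketch (the names and the sorted ordering of the 8 cases are our assumptions):

```python
def cumulative_successes(actuals):
    """Running count of 'yes' labels: the y-values of the lift curve."""
    curve, total = [], 0
    for a in actuals:
        total += (a == "yes")
        curve.append(total)
    return curve

# 8 cases, sorted by predicted probability of success (successes first).
actuals = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
lift = cumulative_successes(actuals)
print(lift)  # [1, 2, 3, 4, 4, 4, 4, 4]

# Baseline: overall success rate (4/8) times the number of cases seen so far.
baseline = [(i + 1) * 4 / 8 for i in range(len(actuals))]
print(baseline)  # [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
```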
Measurement
Confusion Matrix
Error Report
Lift Chart
Decile-wise lift chart
Others
Measurement - Decile-wise Lift Chart

A decile is any of the nine values that divide the sorted data into ten equal parts, so that each part represents 1/10 of the sample or population.
  E.g., 1st decile: the first 10% of tuples
  E.g., 2nd decile: the second 10% of tuples
  E.g., 3rd decile: the third 10% of tuples
Decile-wise Lift Chart

[Chart: decile-wise lift chart; e.g., 9th decile (87.5%): decile mean = 0.0; 10th decile (100%): decile mean = 0.0.]
Measurement
Confusion Matrix
Error Report
Lift Chart
Decile-wise lift chart
Others
Measurement - Others

We will discuss them later.
E.g., Precision, Recall, Specificity, F1-score
Decision Trees

ID3
C4.5
CART
Measurement
How to use the data mining tool
How to use the data mining tool

We can use XLMiner for classification (Decision Tree, CART).
How to use the data mining tool

We have the following 2 versions:
  XLMiner Desktop (installed on the CSE lab machines or your own computer)
  XLMiner Cloud (installed as a plugin in your Office 365 Excel)
Suppose there is a person.

Race   Income  Child  Insurance
white  high    no     ?

XLMiner requires that the input data be in "numeric" format.
We can transform from "categorical" to "numeric" with the XLMiner Transformation Tool first.
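What the Transformation Tool does, in effect, is assign each category a number; a hand-rolled sketch of the same idea (the function name `factorize` is ours, not XLMiner's):

```python
def factorize(values, start=1):
    """Map each distinct category to start, start+1, ... in order of first appearance."""
    mapping = {}
    codes = []
    for v in values:
        if v not in mapping:
            mapping[v] = start + len(mapping)
        codes.append(mapping[v])
    return codes, mapping

codes, mapping = factorize(["black", "white", "white", "black", "white"])
print(codes)    # [1, 2, 2, 1, 2]
print(mapping)  # {'black': 1, 'white': 2}
```

Passing `start=0` gives the tool's other option of numbering from 0.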
[Screenshots: in the Transformation Tool, select the Data source (Worksheet); # Rows: 9, # Columns: 4; choose the Variables to be factored.]
[Screenshot: Option to assign numbers 1, 2, 3, ... or 0, 1, 2, ...]
We have finished the transformation.
However, we want to make the classification process easier.
Thus, we "tidy up" the input format now for the later process of classification.
Copy and paste!
Now, we understand how to perform the transformation.
We also "tidied up" the data for the process of classification.
Next, we need to perform the data mining task for classification (Decision Tree, CART).
[Screenshots: select the Data Source (Workbook, Worksheet, Data range); # Columns: 4, # Rows in Training set: 8.]
[Screenshot: Variables; First row contains header; choose the Selected variables and the Output variables.]
[Screenshots: Binary Classification; Number of classes: 2; Success probability cutoff: 0.5; Preprocessing.]
[Screenshots: Tree Growth settings; limit number of Nodes, Splits, Records in terminal nodes; Maximum number of levels (to display): 5; Full grown tree.]
[Screenshots: Score training data (Detailed report, Summary report, Lift charts); Score new data (In worksheet).]
[Screenshots: for scoring new data, select the Data source (Workbook, Worksheet, Data range); # Rows in data: 1, # Columns in data: 4; Scale variables in input data; match Variables in new data (Match sequentially, Match by name, Unmatch selected).]
Resulting tree (with ordinal-coded attributes):

Child_ord <= 1.5 (# of cases in this branch = 5)
  Income_ord <= 1.5 (# of cases in this branch = 1)
  Income_ord > 1.5 (# of cases in this branch = 4)
Child_ord > 1.5 (# of cases in this branch = 3)
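Assuming the transformation assigned no=1/yes=2 and high=1/low=2 (our reading of the branch counts, not stated explicitly on the slides), the numeric thresholds map back to the categorical tree: Child_ord <= 1.5 means child=no, and Income_ord <= 1.5 means income=high. A sketch, with leaf labels taken from the decision tree built earlier:

```python
# Assumed ordinal codes from the transformation step.
CHILD_ORD = {"no": 1, "yes": 2}
INCOME_ORD = {"high": 1, "low": 2}

def classify_ord(child, income):
    if CHILD_ORD[child] <= 1.5:        # Child_ord <= 1.5  means  child = no
        if INCOME_ORD[income] <= 1.5:  # Income_ord <= 1.5 means  income = high
            return "yes"
        return "no"
    return "yes"                       # Child_ord > 1.5   means  child = yes

print(classify_ord("no", "high"))  # yes
print(classify_ord("no", "low"))   # no
print(classify_ord("yes", "low"))  # yes
```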
How to use the data mining tool

We have the following 2 versions:
  XLMiner Desktop (installed on the CSE lab machines or your own computer)
  XLMiner Cloud (installed as a plugin in your Office 365 Excel)
How to use the data mining tool (XLMiner Cloud)

The way of opening "Create Category Scores" in the XLMiner Cloud plugin in your Office 365 Excel:
"Data Science" tab → Transform → Categorical Data → Create Category Scores
How to use the data mining tool (XLMiner Cloud)

The steps and output format of "Create Category Scores" in XLMiner Cloud are similar to the steps in XLMiner Desktop.
The transformation result of the XLMiner Cloud platform is the same as that from XLMiner Desktop.
How to use the data mining tool (XLMiner Cloud)

The way of opening "Classification Tree" in the XLMiner Cloud plugin in your Office 365 Excel:
"Data Science" tab → Classify → Classification Tree
How to use the data mining tool (XLMiner Cloud)

The steps of performing "Classification Tree" in XLMiner Cloud are similar to the steps in XLMiner Desktop.
The decision tree and classification result of XLMiner Cloud are the same as those from XLMiner Desktop.
How to use the data mining tool (XLMiner Cloud)

The output format of XLMiner Cloud is similar to the output in XLMiner Desktop.
However, to display the classification tree and the lift charts, you need to open the "Charts" window.