ML Classification Tree
Classification Tree
Lecture 06
Learning Outcomes
• Information Theory, Entropy & Information Gain.
• Classification Tree Algorithm ID3.
• Tree Pruning.
Decision Tree Learning
Decision tree learning is a method for approximating discrete-valued
target functions, in which the learned function is represented by a
decision tree.
Learned trees can also be re-represented as a set of IF-THEN rules to improve human readability.
When to consider decision trees:
– Instances are describable by attribute-value pairs
– Target function is discrete valued
Examples:
– Equipment or medical diagnosis
– Credit risk analysis
Decision Tree for PlayTennis (Example)
Decision tree representation:
– Each internal node tests an attribute
– Each branch corresponds to an attribute value
– Each leaf node assigns a classification
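A minimal sketch (not from the original slides) of how such a tree could be represented as a data structure in Python; the class and field names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class TreeNode:
    attribute: Optional[str] = None                  # attribute tested at an internal node
    children: Dict[str, "TreeNode"] = field(default_factory=dict)  # one branch per attribute value
    label: Optional[str] = None                      # classification assigned at a leaf

    def classify(self, example: Dict[str, str]) -> str:
        if self.label is not None:                   # leaf node: return its classification
            return self.label
        # internal node: follow the branch matching the example's attribute value
        return self.children[example[self.attribute]].classify(example)
```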
Example of a Decision Tree
[Figure: a decision tree induced from the training data (Tid, Refund, Marital Status, Taxable Income, Cheat); the splitting attributes appear at the internal nodes.]
Another Example of Decision Tree

Training data:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

One tree that fits this data (root split on MarSt):
MarSt?
  Married -> NO
  Single, Divorced -> Refund?
    Yes -> NO
    No -> TaxInc?
      < 80K -> NO
      > 80K -> YES

There could be more than one tree that fits the same data!
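As noted earlier, a learned tree can be re-represented as IF-THEN rules. A sketch of the tree above written out that way in Python (function name and argument conventions are illustrative):

```python
def classify_cheat(refund: str, marital_status: str, taxable_income_k: float) -> str:
    """IF-THEN rendering of the MarSt -> Refund -> TaxInc tree above."""
    if marital_status == "Married":
        return "No"                   # MarSt = Married -> NO
    if refund == "Yes":               # MarSt = Single or Divorced
        return "No"                   # Refund = Yes -> NO
    if taxable_income_k < 80:         # Refund = No: threshold test on TaxInc
        return "No"                   # TaxInc < 80K -> NO
    return "Yes"                      # TaxInc > 80K -> YES

# e.g. record 8 (Refund = No, Single, 85K) is classified "Yes"
print(classify_cheat("No", "Single", 85))
```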
Top-Down Induction of Decision Trees (Approach)
• Main loop:
1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a new descendant of node
4. Sort training examples to the leaf nodes
5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes
• Many algorithms follow this scheme:
– Hunt's Algorithm
– CART
– ID3, C4.5
– SLIQ, SPRINT
• Greedy strategy: split the records based on an attribute test that optimizes a certain criterion.
• The following issues must be addressed:
1) Determine how to split the records: how to specify the attribute test condition, and how to determine the best split?
2) Determine when to stop splitting
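A short Python sketch of this greedy main loop (the gain argument is assumed to be a scoring function such as the information gain defined on the following slides, and attribute values are assumed to be discrete):

```python
from collections import Counter

def build_tree(examples, target, attributes, gain):
    """Greedy top-down induction: choose the best attribute, split, and recurse."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                 # perfectly classified: make a leaf
        return labels[0]
    if not attributes:                        # nothing left to test: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, target, a))
    node = {best: {}}
    for value in sorted({ex[best] for ex in examples}):
        subset = [ex for ex in examples if ex[best] == value]
        rest = [a for a in attributes if a != best]
        node[best][value] = build_tree(subset, target, rest, gain)
    return node
```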
How to Determine the Best Split and Measure Node Impurity
– Greedy approach: nodes with a homogeneous class distribution are preferred
– Need a measure of node impurity
– For example, a node with class counts C0: 5, C1: 5 is non-homogeneous (high impurity), whereas a node with C0: 9, C1: 1 is nearly homogeneous (low impurity)
Entropy
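The formula on this slide did not survive extraction; the standard definition being referred to, for a sample S with c classes where p_i is the proportion of class i, is:

```latex
\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
\qquad\text{(two classes: } \mathrm{Entropy}(S) = -p_{+}\log_2 p_{+} - p_{-}\log_2 p_{-}\text{)}
```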
Information Gain
Gain(S, A) = expected reduction in entropy due to sorting on A
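In symbols (standard ID3 form, consistent with the worked numbers later in the deck; S_v denotes the subset of S for which attribute A takes value v):

```latex
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
```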
Entropy, a common way to measure impurity
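The worked examples from this slide did not extract; a small Python sketch of entropy as an impurity measure over class counts (the function name is illustrative):

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a node with the given class counts, e.g. [9, 5]."""
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))   # 0.94  -> fairly impure
print(round(entropy([2, 3]), 3))   # 0.971 -> close to maximally impure
print(entropy([4, 0]))             # 0.0   -> pure (homogeneous) node
```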
2-Class Cases
Information Gain
Calculating Information Gain
Classification Tree Example
How would you distinguish Class I from Class II?
Training Examples
Selecting the Next Attribute (1/2)
Which attribute is the best classifier?
Selecting the Next Attribute (2/2)
S_sunny = {D1, D2, D8, D9, D11}
Gain (S_sunny, Humidity) = 0.970 - (3/5)(0.0) - (2/5)(0.0) = 0.970
Gain (S_sunny, Temperature) = 0.970 - (2/5)(0.0) - (2/5)(1.0) - (1/5)(0.0) = 0.570
Gain (S_sunny, Wind) = 0.970 - (2/5)(1.0) - (3/5)(0.918) = 0.019
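A quick cross-check of these values in Python. The per-branch [yes, no] counts below come from the standard PlayTennis data for the sunny subset and are stated here as assumptions, since the training-examples table did not extract:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

def gain(parent, branches):
    """Parent entropy minus the size-weighted entropy of the branches."""
    n = sum(sum(b) for b in branches)
    return entropy(parent) - sum(sum(b) / n * entropy(b) for b in branches)

s_sunny = [2, 3]                                          # 2 yes, 3 no
print(round(gain(s_sunny, [[0, 3], [2, 0]]), 3))          # Humidity: High, Normal -> 0.971 (slide: .970)
print(round(gain(s_sunny, [[0, 2], [1, 1], [1, 0]]), 3))  # Temperature: Hot, Mild, Cool -> 0.571 (slide: .570)
print(round(gain(s_sunny, [[1, 1], [1, 2]]), 3))          # Wind: Strong, Weak -> 0.02 (slide: .019)
```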
Decision Tree Based Classification
Advantages:
– Inexpensive to construct
– Extremely fast at classifying unknown records
– Easy to interpret for small-sized trees
– Accuracy is comparable to other classification techniques for many
simple data sets
Divide and Conquer
Constructing Decision Trees
outlook temperature humidity windy play
sunny 85 85 FALSE no
sunny 80 90 TRUE no
overcast 83 86 FALSE yes
rainy 70 96 FALSE yes
rainy 68 80 FALSE yes
rainy 65 70 TRUE no
overcast 64 65 TRUE yes
sunny 72 95 FALSE no
sunny 69 70 FALSE yes
rainy 75 80 FALSE yes
sunny 75 70 TRUE yes
overcast 72 90 TRUE yes
overcast 81 75 FALSE yes
rainy 71 91 TRUE no
Which attribute to select?
Divide and Conquer
Constructing Decision Trees
• Which is the best attribute?
– The one which will result in the smallest tree
• Popular impurity criterion: information gain
– Information gain increases with the average purity of the subsets that an attribute produces
• Strategy: choose the attribute that results in the greatest information gain
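For the weather data above, a short sketch of this criterion applied to the outlook attribute, using the [yes, no] counts per outlook value listed on the next slide:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

play = [9, 5]                                    # yes/no counts over all 14 days
outlook_branches = [[2, 3], [4, 0], [3, 2]]      # sunny, overcast, rainy
weighted = sum(sum(b) / 14 * entropy(b) for b in outlook_branches)
print(round(entropy(play) - weighted, 3))        # gain(outlook) = 0.247 bits
```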
Divide and Conquer
Constructing Decision Trees
outlook play
sunny no
sunny no
overcast yes
rainy yes
rainy yes
rainy no
overcast yes
sunny no
sunny yes
rainy yes
sunny yes
overcast yes
overcast yes
rainy no
Class counts [yes, no] per outlook value: sunny [2,3], overcast [4,0], rainy [3,2]
Class counts [yes, no] for the candidate splits: outlook gives sunny [2,3], overcast [4,0], rainy [3,2]; temperature (hot, mild, cool) gives [2,2], [4,2], [3,1]
Divide and Conquer:
Constructing Decision Trees
[Figure: tree after splitting on outlook — sunny [2,3], overcast [4,0] (pure), rainy [3,2]]
Divide and Conquer:
Constructing Decision Trees
(The whole data set has play = [9, 5]; the calculations below are for the outlook = sunny subset, whose class counts are [2, 3] and whose entropy is 0.971 bits.)
• Info ([0,2]) = 0
• Info ([1,1]) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit
• Info ([1,0]) = 0
• Info ([0,2], [1,1], [1,0]) = (2/5)(0) + (2/5)(1) + (1/5)(0) = 0.4 bits
• Gain (temperature) = 0.971 - 0.4 = 0.571 bits
Divide and Conquer:
Constructing Decision Trees
• Gain (temperature) = 0.571 bits
• Gain (humidity) = 0.971 bits
• Gain (windy) = 0.020 bits
(All computed within the outlook = sunny subset; humidity gives the largest gain, so it is chosen as the next split.)
Example (1/2)
Example (2/2)
Review Questions
1. What is entropy?
2. What will be the value of entropy if the distribution is homogeneous?
3. What is Information Gain?
4. How do we select the attribute for the root node of the tree?
5. What is Tree Pruning?
Thank you