ML Classification Tree

This document provides a summary of a presentation on machine learning classification trees. It begins with learning outcomes related to information theory, entropy, information gain, and the classification tree algorithm ID3. It then discusses decision tree learning and how decision trees can represent target functions and be converted to rule sets. Examples of classification trees are provided for predicting whether to play tennis based on weather conditions. The document outlines the top-down induction approach for generating decision trees, and how entropy and information gain are used to determine the best attribute to split the data on at each node in the tree.

MACHINE LEARNING

Classification Tree

Presented by: Dr. S. N. Ahsan


(Slides adapted from Tom Mitchell, Machine Learning, and from
I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations)
Welcome!!

Lecture 06
Learning Outcomes
• Information Theory, Entropy & Information Gain.
• Classification Tree Algorithm ID3.
• Tree Pruning

3
Decision Tree Learning
• Decision tree learning is a method for approximating discrete-valued
  target functions, in which the learned function is represented by a
  decision tree.
• Learned trees can also be re-represented as sets of IF-THEN rules to
  improve human readability.
When to consider decision trees:
• Instances are describable by attribute-value pairs
• The target function is discrete valued
Examples:
• Equipment or medical diagnosis
• Credit risk analysis

4
Decision Tree for PlayTennis (Example)
Decision tree representation:
• Each internal node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a classification

Converting a Tree to Rules


IF (Outlook = Sunny) ∧ (Humidity = High)
THEN PlayTennis = No
IF (Outlook = Sunny) ∧ (Humidity = Normal)
THEN PlayTennis = Yes
….
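
As a small illustration (mine, not from the slides), the rules above map directly onto nested conditionals. The sketch below assumes the standard PlayTennis tree, in which Overcast days are always Yes and Rain days are decided by Wind; the function name is an assumption:

```python
def classify_play_tennis(outlook, humidity, wind):
    """Classify a day with the PlayTennis decision tree; each root-to-leaf
    path corresponds to one IF-THEN rule."""
    if outlook == "Sunny":
        # Sunny branch is decided by Humidity (the two rules shown above).
        return "No" if humidity == "High" else "Yes"
    if outlook == "Overcast":
        return "Yes"
    # Remaining branch: outlook == "Rain", decided by Wind.
    return "No" if wind == "Strong" else "Yes"

print(classify_play_tennis("Sunny", "High", "Weak"))   # -> No
print(classify_play_tennis("Rain", "Normal", "Weak"))  # -> Yes
```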

5
Example of a Decision Tree

Training Data:

Tid  Refund  Marital Status  Taxable Income  Cheat
 1   Yes     Single          125K            No
 2   No      Married         100K            No
 3   No      Single          70K             No
 4   Yes     Married         120K            No
 5   No      Divorced        95K             Yes
 6   No      Married         60K             No
 7   Yes     Divorced        220K            No
 8   No      Single          85K             Yes
 9   No      Married         75K             No
10   No      Single          90K             Yes

Decision tree (splitting attributes):
  Refund = Yes -> NO
  Refund = No  -> test MarSt
      MarSt = Single, Divorced -> test TaxInc
          TaxInc < 80K -> NO
          TaxInc > 80K -> YES
      MarSt = Married -> NO

6
Another Example of Decision Tree

Same training data as on the previous slide.

Decision tree:
  MarSt = Married -> NO
  MarSt = Single, Divorced -> test Refund
      Refund = Yes -> NO
      Refund = No  -> test TaxInc
          TaxInc < 80K -> NO
          TaxInc > 80K -> YES

There could be more than one tree that fits the same data!
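
To make that last point concrete, here is a small sketch (my own illustration, not from the slides): both trees above, written as functions, classify all ten training records correctly, so either one fits the data. The record encoding and function names are assumptions.

```python
# Training records: (refund, marital_status, taxable_income_in_K, cheat)
records = [
    ("Yes", "Single",   125, "No"),  ("No", "Married", 100, "No"),
    ("No",  "Single",    70, "No"),  ("Yes", "Married", 120, "No"),
    ("No",  "Divorced",  95, "Yes"), ("No", "Married",  60, "No"),
    ("Yes", "Divorced", 220, "No"),  ("No", "Single",    85, "Yes"),
    ("No",  "Married",   75, "No"),  ("No", "Single",    90, "Yes"),
]

def tree_a(refund, marst, income):
    """First tree: split on Refund, then MarSt, then TaxInc."""
    if refund == "Yes":
        return "No"
    if marst == "Married":
        return "No"
    return "Yes" if income > 80 else "No"

def tree_b(refund, marst, income):
    """Second tree: split on MarSt, then Refund, then TaxInc."""
    if marst == "Married":
        return "No"
    if refund == "Yes":
        return "No"
    return "Yes" if income > 80 else "No"

# Both trees fit the training data equally well.
for refund, marst, income, cheat in records:
    assert tree_a(refund, marst, income) == tree_b(refund, marst, income) == cheat
print("Both trees classify all 10 training records correctly.")
```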
7
Top-Down Induction of Decision Trees (Approach)
• Main loop:
  1. A ← the "best" decision attribute for the next node
  2. Assign A as the decision attribute for node
  3. For each value of A, create a new descendant of node
  4. Sort the training examples to the leaf nodes
  5. If the training examples are perfectly classified, then STOP,
     else iterate over the new leaf nodes

• Many algorithms follow this approach:
  – Hunt's Algorithm
  – CART
  – ID3, C4.5
  – SLIQ, SPRINT

• Greedy strategy: split the records based on an attribute test that optimizes a certain
  criterion. The main issues are the following (an ID3-style sketch follows this slide):
  1) Determine how to split the records
     – How to specify the attribute test condition? How to determine the best split?
  2) Determine when to stop splitting

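The following is a minimal, illustrative sketch of this main loop in Python (my own paraphrase, not code from the slides); `best_attribute` is assumed to be a helper that returns the attribute with the highest information gain, as defined on later slides.

```python
from collections import Counter

def id3(examples, attributes, target, best_attribute):
    """Grow a decision tree top-down, following the main loop above.

    examples       -- list of dicts mapping attribute name -> value
    attributes     -- attribute names still available for splitting
    target         -- name of the class attribute
    best_attribute -- assumed helper: picks the attribute with the
                      highest information gain (defined on later slides)
    """
    labels = [ex[target] for ex in examples]
    # STOP if the examples are perfectly classified or no attributes remain;
    # the leaf predicts the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    # 1-2. Choose the "best" decision attribute for this node.
    best = best_attribute(examples, attributes, target)

    # 3-4. Create one descendant per value of the chosen attribute and
    #      sort the training examples down to it, then recurse (step 5).
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target, best_attribute)

    return tree
```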
8
How to Determine the Best Split: Measuring Node Impurity

– Greedy approach:
  Nodes with a homogeneous class distribution are preferred
– Need a measure of node impurity, e.g. for two nodes with classes C0 and C1:

  C0: 5, C1: 5  -> non-homogeneous, high degree of impurity
  C0: 9, C1: 1  -> homogeneous, low degree of impurity

– The following two are the most commonly used measures of node impurity
  (a Gini sketch follows this slide):
  1. Gini Index
  2. Entropy

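As a small illustration (mine, not from the slides), the Gini index of a node with class proportions p_i is Gini = 1 - sum of p_i^2; the snippet below compares the two nodes shown above. The function name is an assumption.

```python
def gini(counts):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# The two nodes from the slide above:
print(gini([5, 5]))   # 0.5  -> maximum impurity for two classes
print(gini([9, 1]))   # 0.18 -> much closer to a pure node
```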
9
Entropy

• S is a sample of training examples
• p⊕ is the proportion of positive examples in S
• p⊖ is the proportion of negative examples in S
• Entropy measures the impurity of S:

  Entropy(S) ≡ - p⊕ log2 p⊕ - p⊖ log2 p⊖

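A minimal sketch of this two-class entropy measure (my own illustration; the function name `entropy2` is an assumption):

```python
import math

def entropy2(p_pos):
    """Two-class entropy in bits, given the proportion of positive examples."""
    p_neg = 1.0 - p_pos
    # By convention, 0 * log2(0) is taken to be 0.
    terms = [p * math.log2(p) for p in (p_pos, p_neg) if p > 0]
    return -sum(terms)

print(entropy2(9 / 14))  # ~0.940 bits: the PlayTennis data used later
print(entropy2(0.5))     # 1.0 bit: maximum impurity
print(entropy2(1.0))     # 0.0 bits: a pure sample
```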
10
Information Gain
Gain(S, A) = the expected reduction in entropy due to sorting S on attribute A:

  Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)

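A sketch of Gain(S, A) for nominal attributes (my own illustration, not code from the slides); examples are assumed to be dicts mapping attribute names to values:

```python
import math
from collections import Counter, defaultdict

def entropy_of_labels(labels):
    """Entropy in bits of a list of class labels (any number of classes)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    labels = [ex[target] for ex in examples]
    before = entropy_of_labels(labels)

    # Partition S by the value of attribute A.
    partitions = defaultdict(list)
    for ex in examples:
        partitions[ex[attribute]].append(ex[target])

    after = sum(len(part) / len(examples) * entropy_of_labels(part)
                for part in partitions.values())
    return before - after
```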
11
12
13
14
Entropy, a common way to measure impurity

15
2-Class Cases

16
Information Gain

17
Calculating Information Gain

18
Classification Tree Example
How would you distinguish Class I from Class II?

19
Training Examples

20
Selecting the Next Attribute (1/2)
Which attribute is the best classifier?

21
Selecting the Next Attribute (2/2)

Ssunny = {D1, D2, D8, D9, D11}
Gain(Ssunny, Humidity)    = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970
Gain(Ssunny, Temperature) = 0.970 - (2/5)·0.0 - (2/5)·1.0 - (1/5)·0.0 = 0.570
Gain(Ssunny, Wind)        = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019

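A quick self-contained check of these three numbers (my own sketch; `H` is an assumed helper, and the per-value class counts below are read off the weights in the slide's arithmetic together with Mitchell's PlayTennis table):

```python
import math

def H(yes, no):
    """Two-class entropy in bits from class counts."""
    bits = 0.0
    for c in (yes, no):
        p = c / (yes + no)
        if p > 0:
            bits -= p * math.log2(p)
    return bits

base = H(2, 3)  # Ssunny contains 2 Yes and 3 No examples -> about 0.971 bits

# Humidity on Ssunny:    High -> [0 Yes, 3 No],  Normal -> [2 Yes, 0 No]
gain_humidity    = base - (3/5) * H(0, 3) - (2/5) * H(2, 0)
# Temperature on Ssunny: Hot -> [0, 2],  Mild -> [1, 1],  Cool -> [1, 0]
gain_temperature = base - (2/5) * H(0, 2) - (2/5) * H(1, 1) - (1/5) * H(1, 0)
# Wind on Ssunny:        Weak -> [1, 2],  Strong -> [1, 1]
gain_wind        = base - (3/5) * H(1, 2) - (2/5) * H(1, 1)

print(round(gain_humidity, 3), round(gain_temperature, 3), round(gain_wind, 3))
# -> 0.971 0.571 0.02  (the slide rounds these to 0.970, 0.570, 0.019)
```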
22
Decision Tree Based Classification
Advantages:
– Inexpensive to construct
– Extremely fast at classifying unknown records
– Easy to interpret for small-sized trees
– Accuracy is comparable to other classification techniques for many
simple data sets

Practical Issues of Classification:


- Underfitting and Overfitting
- Missing Values
- Costs of Classification

23
Divide and Conquer
Constructing Decision Trees
outlook temperature humidity windy play
sunny 85 85 FALSE no
sunny 80 90 TRUE no
overcast 83 86 FALSE yes
rainy 70 96 FALSE yes
rainy 68 80 FALSE yes
rainy 65 70 TRUE no
overcast 64 65 TRUE yes
sunny 72 95 FALSE no
sunny 69 70 FALSE yes
rainy 75 80 FALSE yes
sunny 75 70 TRUE yes
overcast 72 90 TRUE yes
overcast 81 75 FALSE yes
rainy 71 91 TRUE no
Which attribute to select?
24
Divide and Conquer
Constructing Decision Trees
• Which is the best attribute?
– The one which will result in the smallest tree
• Popular impurity criterion: information gain
– Information gain increases with the average purity of the subsets that an attribute produces
• Strategy: choose attribute that results in greatest information gain

25
Divide and Conquer
Constructing Decision Trees
outlook play
sunny no
sunny no
overcast yes
rainy yes
rainy yes
rainy no
overcast yes
sunny no
sunny yes
rainy yes
sunny yes
overcast yes
overcast yes
rainy no

Class counts [yes, no] per outlook value: sunny [2,3], overcast [4,0], rainy [3,2]

26
Leaf class counts [yes, no] for the four candidate splits of the weather data:
  outlook:     sunny [2,3]   overcast [4,0]   rainy [3,2]
  temperature: hot [2,2]     mild [4,2]       cool [3,1]
  humidity:    high [3,4]    normal [6,1]
  windy:       false [6,2]   true [3,3]

• When the number of either yeses or nos is zero, the information is zero
• When the numbers of yeses and nos are equal, the information reaches a maximum

27
Divide and Conquer:
Constructing Decision Trees

Outlook stump, class counts [yes, no]: sunny [2,3], overcast [4,0], rainy [3,2]

• Info([2,3]) = -2/5 * log2(2/5) - 3/5 * log2(3/5) = 0.971
• Info([4,0]) = -4/4 * log2(4/4) - 0/4 * log2(0/4) = 0   (taking 0 * log2(0) = 0)
• Info([3,2]) = -3/5 * log2(3/5) - 2/5 * log2(2/5) = 0.971
• Info([2,3],[4,0],[3,2])
  = 5/14 * 0.971 + 4/14 * 0 + 5/14 * 0.971 = 0.693 bits

28
Divide and Conquer:
Constructing Decision Trees
play: [9 yes, 5 no]

• Play: Info([9,5]) = -9/14 * log2(9/14) - 5/14 * log2(5/14) = 0.940 bits
• Gain(outlook) = Info([9,5]) - Info([2,3],[4,0],[3,2])
  = 0.940 - 0.693 = 0.247 bits
• Gain(temperature) = 0.029 bits
• Gain(humidity) = 0.152 bits
• Gain (windy) = 0.048 bits
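
A short self-contained check of these gains (my own sketch; the [yes, no] counts per branch are the ones listed a few slides back, and the `info` helper is an assumption):

```python
import math

def info(*counts_per_branch):
    """Weighted average entropy (bits) of a split, given [yes, no] counts per branch."""
    total = sum(sum(branch) for branch in counts_per_branch)
    bits = 0.0
    for branch in counts_per_branch:
        n = sum(branch)
        h = -sum((c / n) * math.log2(c / n) for c in branch if c > 0)
        bits += (n / total) * h
    return bits

base = info([9, 5])  # entropy of the full weather data: ~0.940 bits

splits = {
    "outlook":     ([2, 3], [4, 0], [3, 2]),
    "temperature": ([2, 2], [4, 2], [3, 1]),
    "humidity":    ([3, 4], [6, 1]),
    "windy":       ([6, 2], [3, 3]),
}
for name, branches in splits.items():
    print(name, round(base - info(*branches), 3))
# outlook 0.247, temperature 0.029, humidity 0.152, windy 0.048
```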
29
[Figure: tree after the first split on outlook; which attribute should be tested next on the sunny branch?]
30
Splitting the outlook = sunny branch: Info([2,3]) = 0.971 bits

Temperature stump on the sunny subset, class counts [yes, no]: hot [0,2], mild [1,1], cool [1,0]

• Info([0,2]) = 0
• Info([1,1]) = -1/2 * log2(1/2) - 1/2 * log2(1/2) = 1 bit
• Info([1,0]) = 0
• Info([0,2],[1,1],[1,0]) = 2/5 * 0 + 2/5 * 1 + 1/5 * 0 = 0.4 bits
• Gain(temperature) = 0.971 - 0.4 = 0.571 bits

31
Divide and Conquer:
Constructing Decision Trees
• Gain (temperature) = 0.571 bits
• Gain (humidity) = 0.971 bits
• Gain (windy) = 0.020 bits

32
Example (1/2)

33
Example (2/2)

34
Review Questions
1. What is entropy?
2. What will be the value of entropy if the distribution is homogeneous?
3. What is Information Gain?
4. How do we select the attribute for the root node of the tree?
5. What is Tree Pruning?

35
Thank you
