
Machine Learning

Unit 2 – Issues in Decision Tree Learning

By
Dr. G. Sunitha
Professor & BoS Chairperson
Department of CSE

Department of Computer Science and Engineering

Sree Sainath Nagar, A. Rangampet, Tirupati – 517 102


Issues in Decision Tree Learning
• Practical issues in learning decision trees include
– How deep to grow the tree? (Optimal height of the tree)
– How to handle continuous attributes? (Split Points)
– How to handle missing values?
– How to choose an attribute selection measure? (Entropy, Gini Index, etc.)
– How to handle attributes with different costs? (Normalization procedures)
– How to improve computational efficiency?

2
Decision Tree - Overfitting
• Definition: Given a hypothesis space H, a hypothesis h1 ∈ H is said to overfit the
training data if there exists some alternative hypothesis h2 ∈ H, such that
– h1 has smaller error than h2 over the training examples,
– but h2 has a smaller error than h1 over the entire distribution of instances.

3
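A minimal sketch of this definition in code (not part of the slides; it assumes scikit-learn and uses its bundled breast-cancer dataset): an unrestricted tree h1 typically fits the training set better than a depth-limited tree h2, yet scores worse on held-out data.

```python
# Hedged sketch: h1 (fully grown tree) vs. h2 (depth-limited tree).
# h1 usually has the lower training error, h2 the lower test error.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

h1 = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)               # unrestricted
h2 = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)  # restricted

print("h1 train/test accuracy:", h1.score(X_train, y_train), h1.score(X_test, y_test))
print("h2 train/test accuracy:", h2.score(X_train, y_train), h2.score(X_test, y_test))
```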
Decision Tree – Overfitting . . .

4
Decision Tree – Overfitting . . .

5
Decision Tree – Overfitting . . .
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• All approaches to avoiding overfitting in decision tree learning fall into two types:
– Stopping the growth of the tree early, before it reaches the point where it perfectly
classifies the training data (pre-pruning, by setting hyperparameters).
– Allowing the tree to grow fully, and then pruning it back (post-pruning).

• Regardless of whether pre-pruning or post-pruning is used, the key challenge is to
determine the correct tree size (a pre- vs. post-pruning sketch follows this slide):
– Use a separate set of examples, distinct from the training examples, to evaluate
the utility of post-pruning nodes from the tree (train/validation split, cross-validation).
– Use all the available data for training, but apply a statistical test to estimate
whether expanding or pruning a node improves the performance of the model.
Ex: chi-square test.
– Use an explicit measure of the complexity of encoding the training examples
and the decision tree, halting the growth of the tree when this encoding size is
minimized. Ex: the Minimum Description Length (MDL) principle.
6
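A minimal sketch of the two families in scikit-learn (an assumption of this note, not the slides' tool): pre-pruning through hyperparameters that stop growth early, and post-pruning through minimal cost-complexity pruning (`ccp_alpha`), which is scikit-learn's built-in post-pruning method rather than the reduced error pruning described later.

```python
# Hedged sketch: pre-pruning (stop early) vs. post-pruning (grow fully, then cut back).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: hyperparameters limit depth and leaf size during growth.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Post-pruning: the tree is grown fully, then pruned with a complexity penalty.
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

for name, clf in [("pre-pruned", pre), ("post-pruned", post)]:
    print(name, "| leaves:", clf.get_n_leaves(), "| test accuracy:", round(clf.score(X_te, y_te), 3))
```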
Decision Tree – Overfitting . . .
Important techniques to avoid overfitting are:
– Cross-Validation
– Training With More Data
– Feature Selection
– Early Stopping
– Regularization
– Ensembling

7
K-fold Cross Validation

8
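The slide's k-fold illustration is a figure; as a hedged sketch (assuming scikit-learn), estimating a tree's generalization accuracy with 5-fold cross-validation might look like this:

```python
# Hedged sketch: 5-fold cross-validation of a depth-limited decision tree.
# Each record is used for validation exactly once across the 5 folds.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(tree, X, y, cv=5)   # train on 4 folds, validate on the 5th
print("fold accuracies:", scores.round(3), "| mean:", round(scores.mean(), 3))
```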
Reduced Error Pruning
• Pruning a decision node Di consists of removing the subtree rooted at Di and making Di a
leaf node, labelled with the most common classification of the training examples associated
with that node.
• A node Di is pruned only if the resulting pruned tree performs no worse than the original
tree.
• Pruning decisions are based on the performance of the model on a separate validation
dataset.
• Pruning is an iterative process, which proceeds by always choosing the node whose
removal most improves performance on the validation set.
• Pruning continues until any further pruning would degrade the performance of the tree.
• Not suitable when the available dataset is small, since holding out a validation set leaves
little data for training. (A sketch of the procedure follows this slide.)

9
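A hedged sketch of reduced error pruning on a hand-rolled tree structure; the `Node` class and the `predict`, `accuracy`, and `internal_nodes` helpers are made up here for illustration and are not part of any particular library.

```python
# Hedged sketch: reduced error pruning. Repeatedly turn the internal node whose
# removal most improves validation accuracy into a leaf; stop when pruning hurts.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    majority_class: int                   # most common training class at this node
    feature: Optional[str] = None         # None => leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def predict(node, x):
    if node.feature is None:
        return node.majority_class
    child = node.left if x[node.feature] <= node.threshold else node.right
    return predict(child, x)

def accuracy(root, data):                 # data: list of (feature dict, label) pairs
    return sum(predict(root, x) == y for x, y in data) / len(data)

def internal_nodes(node):
    if node.feature is None:
        return []
    return [node] + internal_nodes(node.left) + internal_nodes(node.right)

def reduced_error_prune(root, val_data):
    while True:
        base = accuracy(root, val_data)
        best_gain, best_node = None, None
        for node in internal_nodes(root):
            saved = (node.feature, node.left, node.right)
            node.feature, node.left, node.right = None, None, None     # prune tentatively
            gain = accuracy(root, val_data) - base
            node.feature, node.left, node.right = saved                # undo
            if best_gain is None or gain > best_gain:
                best_gain, best_node = gain, node
        if best_node is None or best_gain < 0:                         # pruning would hurt
            return root
        best_node.feature = best_node.left = best_node.right = None    # prune for real
```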
Rule Post Pruning
Rule post pruning is used in C4.5.
It involves the following steps:
1) Infer the decision tree from the training data.
2) Convert the tree to rules – one rule per root-to-leaf branch.
– Each path corresponds to a rule.
– Each node along a path corresponds to a precondition.
– Each leaf classification becomes the postcondition.
Ex: If age = "senior" ∧ credit-rating = "excellent", then buys_computer = "yes"
3) Prune each rule by removing any preconditions whose removal improves its estimated
accuracy.
4) Sort the pruned rules by their estimated accuracy and consider them in this sequence
when classifying unseen instances.
(A sketch of step 3 follows this slide.)
10
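A hedged sketch of step 3 only. C4.5 itself uses a pessimistic accuracy estimate computed from the training data; for simplicity this sketch estimates rule accuracy on a held-out set, and the data structures (a rule as a list of attribute–value preconditions plus a predicted class) are made up for illustration.

```python
# Hedged sketch: greedily drop preconditions from a rule while the estimated
# accuracy of the rule does not get worse.
def rule_accuracy(preconditions, label, data):
    covered = [(x, y) for x, y in data
               if all(x.get(attr) == val for attr, val in preconditions)]
    if not covered:
        return 0.0
    return sum(y == label for _, y in covered) / len(covered)

def prune_rule(preconditions, label, held_out):
    preconditions = list(preconditions)
    improved = True
    while improved and preconditions:
        improved = False
        base = rule_accuracy(preconditions, label, held_out)
        for p in list(preconditions):
            trial = [q for q in preconditions if q != p]
            if rule_accuracy(trial, label, held_out) >= base:   # no worse -> drop p
                preconditions, improved = trial, True
                break
    return preconditions

# The slide's example rule:
rule = [("age", "senior"), ("credit-rating", "excellent")]
# pruned = prune_rule(rule, "yes", held_out_examples)   # held_out_examples assumed to exist
```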
Rule Post Pruning . . .
Why Convert The Decision Tree To Rules Before Pruning?

• Converting to rules improves readability.

– Rules are often easier for people to understand.

• Converting to rules allows distinguishing among the different contexts in which a decision
node is used.

– A separate pruning decision can be made for each path through the node.

• Converting to rules removes the distinction between attribute tests near the root and those
near the leaves.

– No bookkeeping is needed on how to reorganize the tree if the root node is pruned.

11
Handling continuous-valued attributes
• How does ID3 handle continuous attributes? ID3 itself works only with discrete attributes;
a continuous attribute is handled by dynamically defining candidate split points (thresholds)
over its sorted values, e.g. for the attribute below.

Income (K): 12, 18, 20, 25, 34, 47, 52, 55

12
Handling continuous-valued attributes . . .

The best split point is the one that gives the highest information gain (a sketch follows this
slide).

Income (K)   Buys_Computer
12           No
18           Yes
20           Yes
25           Yes
34           Yes
47           No
52           No
55           No

13
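A hedged sketch of choosing the split point for Income from the table above: candidate thresholds are taken as midpoints between consecutive sorted values and the one with the highest information gain wins. The helper names are made up.

```python
# Hedged sketch: evaluate candidate thresholds for a continuous attribute by
# information gain, using the Income / Buys_Computer table from the slide.
from math import log2

income = [12, 18, 20, 25, 34, 47, 52, 55]
buys   = ["No", "Yes", "Yes", "Yes", "Yes", "No", "No", "No"]

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def info_gain(threshold):
    left  = [c for v, c in zip(income, buys) if v <= threshold]
    right = [c for v, c in zip(income, buys) if v > threshold]
    remainder = len(left) / len(buys) * entropy(left) + len(right) / len(buys) * entropy(right)
    return entropy(buys) - remainder

candidates = [(a + b) / 2 for a, b in zip(income, income[1:])]   # midpoints
best = max(candidates, key=info_gain)
print("best split point:", best, "| information gain:", round(info_gain(best), 3))
```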
Alternative Attribute Selection Measures
• There is a natural bias in the information gain measure: it favors attributes with many
unique values over those with few unique values.

14
Alternative Attribute Selection Measures . . .
• Alternative Measures:
– Gain Ratio – penalizes attributes with many unique values by incorporating
“split information”. Split information is sensitive to how broadly and uniformly the
attribute splits the data.

$SplitInformation_m(S, A) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$

15
Alternative Attribute Selection Measures . . .
• Alternative Measures:
– Gain Ratio . . .

$Gain_m(S, A) = \dfrac{|S_A| - |M_A|}{|S_A|} \cdot Gain(S, A)$

where $|M_A|$ = no. of missing values for A.

$GainRatio_m(S, A) = \dfrac{Gain_m(S, A)}{SplitInformation_m(S, A)}$

16
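A hedged sketch of gain ratio for a discrete attribute; split information is simply the entropy of the data with respect to the attribute's own values, which is what penalizes many-valued attributes. The toy windy/labels data is made up.

```python
# Hedged sketch: GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A).
from collections import Counter
from math import log2

def entropy(items):
    n = len(items)
    return -sum(c / n * log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    n = len(labels)
    remainder = sum(values.count(v) / n *
                    entropy([l for x, l in zip(values, labels) if x == v])
                    for v in set(values))
    gain = entropy(labels) - remainder
    split_info = entropy(values)          # entropy of S w.r.t. the attribute's values
    return gain / split_info

windy  = ["yes", "yes", "no", "no", "no", "yes"]
labels = ["play", "stay", "play", "play", "stay", "stay"]
print("gain ratio (windy):", round(gain_ratio(windy, labels), 3))
```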
Handling Missing Attribute Values in Training Data
• Ignore the tuple:
– usually done when the class label is missing.
– not effective when the percentage of missing values per attribute varies
considerably.
• Fill in the missing value manually: tedious and often infeasible.
• Fill in the missing value automatically with
– a global constant
• e.g., "unknown", 0, −1, ∞, etc.
• affects the quality of learning
– the attribute mean
– the attribute mean for all samples belonging to the same class
– the attribute mode
– the attribute median
• C4.5 is the successor of ID3 and inherently supports attributes with missing values.
(A sketch of the automatic fill-in strategies follows this slide.)
17
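A hedged sketch of the automatic fill-in strategies (assuming pandas and scikit-learn are available; the small DataFrame is made up):

```python
# Hedged sketch: common ways to fill in missing attribute values automatically.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"income": [12.0, 18.0, np.nan, 25.0, np.nan, 47.0],
                   "label":  ["no", "yes", "yes", "yes", "no", "no"]})

df["const"]  = df["income"].fillna(-1)                        # global constant
df["mean"]   = df["income"].fillna(df["income"].mean())       # attribute mean
df["median"] = df["income"].fillna(df["income"].median())     # attribute median
df["class_mean"] = df.groupby("label")["income"].transform(   # mean within the same class
    lambda s: s.fillna(s.mean()))

# The same mean strategy via scikit-learn's SimpleImputer.
df["sk_mean"] = SimpleImputer(strategy="mean").fit_transform(df[["income"]]).ravel()
print(df)
```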
Handling Missing Attribute Values in Training Data . . .

18
Handling Missing Attribute Values in Training Data . . .
Outlook splits the data into Sunny, Rain and Overcast branches, plus the records whose
Outlook value is missing.

$Gain_m(D, outlook) = \dfrac{|S_{outlook}| - |M_{outlook}|}{|S_{outlook}|} \cdot Gain(D, outlook)$

19
Handling Missing Attribute Values in Training Data . . .

For calculating the entropy of $D_{outlook}$, consider only the records without missing values
(8 records):

$Entropy(D_{outlook}) = -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} = 0.95$

20
Handling Missing Attribute Values in Training Data . . .
Outlook splits into Sunny (1 record), Rain (4 records) and Overcast (3 records), plus the
6 records with missing values.

$Entropy(D_{sunny}) = -\frac{0}{1}\log_2\frac{0}{1} - \frac{1}{1}\log_2\frac{1}{1} = 0$

$Entropy(D_{rain}) = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} = 1$

$Entropy(D_{overcast}) = -\frac{3}{3}\log_2\frac{3}{3} - \frac{0}{3}\log_2\frac{0}{3} = 0$

$Gain(D, outlook) = 0.95 - \left(\frac{1}{8}\times 0 + \frac{4}{8}\times 1 + \frac{3}{8}\times 0\right) = 0.45$

$Gain_m(D, outlook) = \frac{14 - 6}{14} \times Gain(D, outlook) = \frac{8}{14} \times 0.45 \approx 0.257$

21
Handling Missing Attribute Values in Training Data . . .
Outlook splits into Sunny, Rain and Overcast branches, plus the records with missing values
(a sketch that reproduces these calculations follows this slide).

$SplitInformation_m(D, outlook) = -\frac{1}{14}\log_2\frac{1}{14} - \frac{4}{14}\log_2\frac{4}{14} - \frac{3}{14}\log_2\frac{3}{14} - \frac{6}{14}\log_2\frac{6}{14} \approx 1.788$

$GainRatio_m(D, outlook) = \dfrac{Gain_m(D, outlook)}{SplitInformation_m(D, outlook)} = \dfrac{0.257}{1.788} \approx 0.144$

22
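A hedged sketch that recomputes the worked example: 14 records, 6 with Outlook missing, and branch class counts (yes, no) of (0, 1) for Sunny, (2, 2) for Rain and (3, 0) for Overcast, as read off the slides.

```python
# Hedged sketch: Gain_m, SplitInformation_m and GainRatio_m for Outlook.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

n_total, n_missing = 14, 6
branches = {"sunny": (0, 1), "rain": (2, 2), "overcast": (3, 0)}   # (yes, no) per branch
n_known = sum(sum(c) for c in branches.values())                   # 8 records

entropy_known = entropy([sum(c[0] for c in branches.values()),
                         sum(c[1] for c in branches.values())])    # ~0.95
remainder = sum(sum(c) / n_known * entropy(c) for c in branches.values())
gain = entropy_known - remainder                                   # ~0.45
gain_m = (n_total - n_missing) / n_total * gain                    # ~0.26

split_info_m = entropy([sum(c) for c in branches.values()] + [n_missing])  # over 14 records
gain_ratio_m = gain_m / split_info_m

print("Gain_m:", round(gain_m, 3), "| SplitInformation_m:", round(split_info_m, 3),
      "| GainRatio_m:", round(gain_ratio_m, 3))
```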
Handling Attributes with Different Costs
• In some learning tasks, the importance (or priority) of attributes may vary.
• The priority of the attributes is represented in the form of costs.
• In such cases, the decision tree should prefer low-cost attributes wherever possible;
high-cost attributes should be used only when needed to produce reliable classifications.
• ID3 can be modified to consider attribute costs when constructing the decision tree.
• Cost-sensitive attribute selection measures (a sketch follows this slide):

– Ex1: $\dfrac{Gain^2(D, A)}{Cost(A)}$

– Ex2: $\dfrac{2^{Gain(D, A)} - 1}{(Cost(A) + 1)^w}$

where $w \in [0, 1]$ represents the relative importance of cost versus gain.

23
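A hedged sketch of choosing an attribute under the two cost-sensitive measures above; the attribute names, gains and costs are invented for illustration.

```python
# Hedged sketch: rank attributes by the two cost-sensitive measures (Ex1 and Ex2).
gains = {"temperature": 0.30, "blood_test": 0.40, "x_ray": 0.55}   # information gains (made up)
costs = {"temperature": 1.0,  "blood_test": 5.0,  "x_ray": 50.0}   # measurement costs (made up)

def measure_ex1(a):
    return gains[a] ** 2 / costs[a]                      # Gain^2 / Cost

def measure_ex2(a, w=0.5):
    return (2 ** gains[a] - 1) / (costs[a] + 1) ** w     # (2^Gain - 1) / (Cost + 1)^w

print("chosen by Ex1:", max(gains, key=measure_ex1))
print("chosen by Ex2:", max(gains, key=measure_ex2))
```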
