UNIT 2 - Decision Tree - Issues
Unit 2 – Issues in Decision Tree Learning
By Dr. G. Sunitha
Professor & BoS Chairperson
Department of CSE
Decision Tree - Overfitting
• Definition: Given a hypothesis space H, a hypothesis h1 ∈ H is said to overfit the training data if there exists some alternative hypothesis h2 ∈ H, such that
  – h1 has smaller error than h2 over the training examples,
  – but h2 has a smaller error than h1 over the entire distribution of instances.
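This gap between training error and true error can be observed empirically. Below is a minimal sketch, assuming scikit-learn and a synthetic dataset (both illustrative choices, not from the slides), in which a fully grown tree plays the role of h1 and a depth-limited tree the role of h2:

```python
# Minimal sketch: a fully grown tree (h1) vs. a depth-limited tree (h2).
# Library and dataset choices are illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise so a full-depth tree is prone to overfitting.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

h1 = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)               # grown in full
h2 = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)  # restricted

# h1 typically has the lower TRAINING error ...
print("h1 train/test accuracy:", h1.score(X_train, y_train), h1.score(X_test, y_test))
# ... but h2 typically has the lower error on unseen data (the test set here
# stands in for the entire distribution of instances).
print("h2 train/test accuracy:", h2.score(X_train, y_train), h2.score(X_test, y_test))
```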
Decision Tree – Overfitting . . .
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• All approaches to avoid overfitting in decision tree learning fall into two types, as sketched below:
  – Stopping the growth of the tree early, before it reaches the point where it perfectly classifies the training data (pre-pruning, by setting hyperparameters).
  – Allowing the tree to grow in full, and then pruning the tree (post-pruning).
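A minimal sketch of how the two types might be expressed, assuming scikit-learn (the slides do not prescribe a library; the hyperparameter names below are scikit-learn's, and its built-in post-pruning is cost-complexity pruning, a different technique from the reduced error pruning described later):

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: stop growth early by setting hyperparameters before training.
pre_pruned = DecisionTreeClassifier(
    max_depth=4,           # cap the depth of the tree
    min_samples_split=20,  # do not split nodes with fewer than 20 examples
    min_samples_leaf=5,    # every leaf must keep at least 5 examples
)

# Post-pruning: grow the tree in full, then prune it back.
# scikit-learn implements this as cost-complexity pruning via ccp_alpha.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01)  # larger alpha -> more pruning
```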
K-fold Cross Validation
• The available data is partitioned into k equal-sized folds. Each fold in turn serves as the validation set while the remaining k − 1 folds are used for training, and the k validation scores are averaged.
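A minimal sketch of this procedure, assuming NumPy arrays and scikit-learn (illustrative choices; the helper kfold_accuracy is hypothetical):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def kfold_accuracy(X, y, k=5):
    """Average validation accuracy of a decision tree over k folds.
    X and y are NumPy arrays."""
    scores = []
    # Each of the k folds serves once as the validation set.
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))
    return float(np.mean(scores))
```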
Reduced Error Pruning
• Pruning a decision node Di consists of removing the subtree rooted at Di and making Di a leaf node.
• A node Di is pruned only if the resulting pruned tree performs no worse than the original tree.
• Pruning decisions are based on the performance of the model on a validation dataset.
• Pruning is an iterative process that always chooses the node whose removal improves the tree's performance the most (see the sketch below).
• Pruning continues until any further pruning would degrade the performance of the tree.
• Not suitable when the available dataset is small.
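Below is a hedged sketch of this procedure on a toy tree representation. The nested-dict node format and every helper name (classify, accuracy, internal_nodes, reduced_error_prune) are illustrative assumptions, not the slides' notation:

```python
# Internal node: {"attr": name, "branches": {value: subtree}, "majority": label}
# Leaf: a bare class label. val_data is a list of (example_dict, label) pairs.

def classify(node, x):
    # A node flagged "pruned" behaves as a leaf predicting its majority class.
    while isinstance(node, dict) and not node.get("pruned"):
        node = node["branches"].get(x[node["attr"]], node["majority"])
    return node["majority"] if isinstance(node, dict) else node

def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)

def internal_nodes(node):
    if isinstance(node, dict):
        yield node
        for child in node["branches"].values():
            yield from internal_nodes(child)

def reduced_error_prune(tree, val_data):
    while True:
        base = accuracy(tree, val_data)
        best_gain, best_node = -1.0, None
        for node in internal_nodes(tree):
            if node.get("pruned"):
                continue
            node["pruned"] = True                 # trial: turn Di into a leaf
            gain = accuracy(tree, val_data) - base
            node["pruned"] = False                # undo the trial prune
            if gain > best_gain:
                best_gain, best_node = gain, node
        if best_node is None or best_gain < 0:    # further pruning would degrade
            return tree
        best_node["pruned"] = True                # prune the best node for real
```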
Rule Post Pruning
Rule post pruning is used in C4.5.
It involves the following steps:
1) Infer the decision tree from the training data.
2) Convert the tree to rules – one rule per branch.
   – Each path corresponds to a rule.
   – Each node along a path corresponds to a precondition.
   – Each leaf classification is the postcondition.
   – Ex: If age = “senior” ∧ credit-rating = “excellent”, then buys_computer = “yes”
3) Prune each rule by removing any preconditions whose removal results in improved estimated accuracy.
4) Sort the pruned rules by their estimated accuracy and consider them in this sequence when classifying unseen instances. (A sketch of step 2 follows.)
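A minimal sketch of step 2, reusing the toy nested-dict node format from the pruning sketch above (tree_to_rules is a hypothetical helper):

```python
def tree_to_rules(node, preconditions=()):
    """Yield one (preconditions, postcondition) rule per root-to-leaf path."""
    if not isinstance(node, dict):                  # leaf reached: emit the rule
        yield list(preconditions), node
        return
    for value, subtree in node["branches"].items():
        # Each attribute test along the path becomes one more precondition.
        yield from tree_to_rules(subtree, preconditions + ((node["attr"], value),))

# Ex: a path age=senior -> credit-rating=excellent -> "yes" becomes the rule
#   IF age = senior AND credit-rating = excellent THEN buys_computer = yes
```

Step 3 would then drop preconditions from each rule independently whenever doing so improves the rule's estimated accuracy.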
Rule Post Pruning . . .
Why Convert The Decision Tree To Rules Before Pruning?
• Each distinct path through the tree becomes a separate rule, so the pruning decision for an attribute test can be made differently for each path, rather than only choosing between removing a node's subtree entirely or keeping it.
• Converting to rules removes the distinction between attribute tests that occur near the root and those that occur near the leaves.
• Rules are often easier for people to understand.
Handling continuous-valued attributes
• How does ID3 handle continuous attributes? The basic ID3 algorithm is restricted to discrete-valued attributes; a continuous attribute is handled by dynamically defining a boolean test of the form A < c against a chosen threshold c.
• Example: sorted values of the continuous attribute Income (K): 12, 18, 20, 25, 34, 47, 52, 55
Handling continuous-valued attributes . . .

Income (K)   Buys_Computer
12           No
18           Yes
20           Yes
25           Yes
34           Yes
47           No
52           No
55           No

• Candidate thresholds lie at the midpoints between adjacent sorted values whose class labels differ: (12 + 18)/2 = 15 and (34 + 47)/2 = 40.5. The candidate yielding the higher information gain is selected, as in the sketch below.
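A minimal sketch of threshold selection by information gain over the table above (entropy and best_threshold are hypothetical helper names):

```python
import math

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

def best_threshold(values, labels):
    """Evaluate a candidate threshold at each midpoint between adjacent
    sorted values whose class labels differ; return (gain, threshold)."""
    pairs = sorted(zip(values, labels))
    base = entropy([y for _, y in pairs])
    best = (-1.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][1] != pairs[i][1]:            # class label changes here
            t = (pairs[i - 1][0] + pairs[i][0]) / 2   # midpoint, e.g. 15 or 40.5
            left = [y for v, y in pairs if v <= t]
            right = [y for v, y in pairs if v > t]
            weighted = (len(left) * entropy(left)
                        + len(right) * entropy(right)) / len(pairs)
            best = max(best, (base - weighted, t))
    return best

income = [12, 18, 20, 25, 34, 47, 52, 55]
buys = ["No", "Yes", "Yes", "Yes", "Yes", "No", "No", "No"]
print(best_threshold(income, buys))  # -> (about 0.55, 40.5): test Income < 40.5
```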
Alternative Attribute Selection Measures
• There is a natural bias in the Information Gain measure: it favors attributes with many unique values over those with few unique values.
• Ex: an attribute such as Date, which is nearly unique for every record, obtains a very high information gain yet yields a tree that generalizes poorly to unseen instances.
Alternative Attribute Selection Measures . . .
• Alternative Measures:
  – Gain Ratio – penalizes attributes with many unique values by incorporating “split information”. Split information is sensitive to how broadly and uniformly the attribute splits the data:

$\mathrm{SplitInformation}_m(S, A) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$

where $S_1, \dots, S_c$ are the subsets of $S$ produced by partitioning on the $c$ values of attribute $A$.
Alternative Attribute Selection Measures . . .
• Alternative Measures:
  – Gain Ratio . . .

$\mathrm{Gain}_m(S, A) = \frac{|S_A| - |M_A|}{|S_A|} \, \mathrm{Gain}(S, A)$

$\mathrm{GainRatio}_m(S, A) = \frac{\mathrm{Gain}_m(S, A)}{\mathrm{SplitInformation}_m(S, A)}$

where $|S_A|$ is the total number of examples and $|M_A|$ is the number of examples with a missing value for attribute $A$.
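A minimal sketch of the split-information penalty in Python (helper names are illustrative):

```python
import math

def split_information(subset_sizes):
    """-sum(|Si|/|S| * log2(|Si|/|S|)) over the subsets produced by A."""
    total = sum(subset_sizes)
    return -sum((s / total) * math.log2(s / total) for s in subset_sizes if s)

def gain_ratio(gain, subset_sizes):
    return gain / split_information(subset_sizes)

# The penalty in action: a 14-way split (one branch per record) versus a
# binary split of the same 14 records.
print(split_information([1] * 14))  # = log2(14), about 3.81 -> large denominator
print(split_information([9, 5]))    # about 0.94 -> small denominator
```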
Handling Missing Attribute Values in Training Data
• Ignore the tuple:
  – usually done when the class label is missing.
  – not effective when the percentage of missing values per attribute varies considerably.
• Fill in the missing value manually: tedious, and often infeasible for large datasets.
• Fill in the missing value automatically with (sketched below):
  – a global constant
    • e.g., “unknown”, 0, −1, ∞, etc.
    • affects the quality of learning
  – the attribute mean
  – the attribute mean for all samples belonging to the same class
  – the attribute mode
  – the attribute median
• C4.5 is the successor of ID3 and inherently supports handling attributes with missing values.
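A minimal sketch of the automatic fill strategies, assuming pandas (the toy DataFrame and its column names are illustrative assumptions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income":  [12, 18, np.nan, 25, 34, np.nan, 52, 55],
    "outlook": ["sunny", None, "rain", "overcast", None, "rain", "sunny", "rain"],
    "buys_computer": ["No", "Yes", "Yes", "Yes", "Yes", "No", "No", "No"],
})

df["outlook"] = df["outlook"].fillna("unknown")           # global constant
df["income"] = df["income"].fillna(df["income"].mean())   # attribute mean
# Mode and median analogues: df["outlook"].mode()[0], df["income"].median()

# Alternative: the attribute mean over samples belonging to the same class.
# df["income"] = df.groupby("buys_computer")["income"].transform(
#     lambda s: s.fillna(s.mean()))
```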
Handling Missing Attribute Values in Training Data . . .
[Example dataset: a training set D of 14 records in which the Outlook value is missing for 6 records.]
Handling Missing Attribute Values in Training Data . . .
• Evaluating the attribute Outlook:

$\mathrm{Gain}_m(D, \mathrm{outlook}) = \frac{|S_{\mathrm{outlook}}| - |M_{\mathrm{outlook}}|}{|S_{\mathrm{outlook}}|} \, \mathrm{Gain}(D, \mathrm{outlook})$
Handling Missing Attribute Values in Training Data . . .
• Entropy of D over the 8 records with a known Outlook value (5 of one class, 3 of the other):

$\mathrm{Entropy}(D) = -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} = 0.95$
Handling Missing Attribute Values in Training Data . . .
[Diagram: the root node Outlook branches into Sunny, Rain, and Overcast, plus a branch for the records with missing values; the entropy of each subset is computed below.]

$\mathrm{Entropy}(D_{\mathrm{sunny}}) = -\frac{0}{1}\log_2\frac{0}{1} - \frac{1}{1}\log_2\frac{1}{1} = 0$

$\mathrm{Entropy}(D_{\mathrm{rain}}) = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} = 1$

$\mathrm{Entropy}(D_{\mathrm{overcast}}) = -\frac{3}{3}\log_2\frac{3}{3} - \frac{0}{3}\log_2\frac{0}{3} = 0$

$\mathrm{Gain}(D, \mathrm{outlook}) = 0.95 - \left(\frac{1}{8}\times 0 + \frac{4}{8}\times 1 + \frac{3}{8}\times 0\right) = 0.45$

$\mathrm{Gain}_m(D, \mathrm{outlook}) = \frac{14 - 6}{14} \times \mathrm{Gain}(D, \mathrm{outlook}) = \frac{8}{14}\times 0.45 = 0.257$
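The computation can be checked numerically; a minimal sketch (the entropy helper is hypothetical):

```python
import math

def entropy(counts):
    """Entropy from class counts, e.g. [5, 3]."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

# 8 records have a known Outlook value (5 of one class, 3 of the other).
base = entropy([5, 3])                                                # about 0.954
weighted = (1/8)*entropy([0, 1]) + (4/8)*entropy([2, 2]) + (3/8)*entropy([3, 0])
gain = base - weighted                                                # about 0.454
gain_m = (14 - 6) / 14 * gain
print(round(gain_m, 3))  # about 0.26 unrounded; rounding Entropy to 0.95 and
                         # Gain to 0.45, as on the slide, gives 0.257
```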
Handling Missing Attribute Values in Training Data . . .
• Split information treats the records with missing values as one more branch of Outlook (subset sizes 1, 4, 3, and 6 out of 14):

$\mathrm{SplitInformation}_m(D, \mathrm{outlook}) = -\frac{1}{14}\log_2\frac{1}{14} - \frac{4}{14}\log_2\frac{4}{14} - \frac{3}{14}\log_2\frac{3}{14} - \frac{6}{14}\log_2\frac{6}{14} = 1.788$

$\mathrm{GainRatio}_m(D, \mathrm{outlook}) = \frac{0.257}{1.788} = 0.144$
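Continuing the numerical check above (reusing its entropy helper and gain_m value):

```python
# SplitInformation over branch sizes 1, 4, 3, 6 (missing values = 4th branch):
split_info = entropy([1, 4, 3, 6])                          # about 1.788
print(round(split_info, 3), round(gain_m / split_info, 3))  # 1.788 0.145
```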
Handling Attributes with Different Costs
• In some learning tasks, the importance (or priority) of attributes may vary.
• The priority of the attributes is represented in the form of costs.
• In such cases, the decision tree should use low-cost attributes wherever possible; high-cost attributes shall be used only when needed to produce reliable classifications.
• ID3 can be modified to consider attribute costs when constructing the decision tree.
• Other attribute selection measures (sketched below):
  – Ex1: $\frac{\mathrm{Gain}^2(D, A)}{\mathrm{Cost}(A)}$
  – Ex2: $\frac{2^{\mathrm{Gain}(D, A)} - 1}{(\mathrm{Cost}(A) + 1)^w}$, where $w \in [0, 1]$ represents the relative importance of cost versus Gain.
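A minimal sketch of the two example measures (function names are illustrative):

```python
def measure_ex1(gain, cost):
    """Ex1: Gain^2(D, A) / Cost(A)."""
    return gain ** 2 / cost

def measure_ex2(gain, cost, w=0.5):
    """Ex2: (2^Gain(D, A) - 1) / (Cost(A) + 1)^w, with w in [0, 1]."""
    return (2 ** gain - 1) / (cost + 1) ** w

# A cheap, moderately informative attribute can outrank a costlier one:
print(measure_ex2(gain=0.4, cost=1))    # about 0.23
print(measure_ex2(gain=0.6, cost=100))  # about 0.05
```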