ISSUES IN DECISION TREE LEARNING
Practical issues in learning decision trees include:
1. Determining how deeply to grow the decision tree,
2. Handling continuous attributes,
3. Choosing an appropriate attribute selection measure,
4. Handling training data with missing attribute values,
5. Handling attributes with differing costs, and
6. Improving computational efficiency.
1. AVOIDING OVERFITTING THE DATA
 When designing a machine learning model, the model is considered good if it generalizes properly to new input data from the problem domain.
 This lets it make accurate predictions on future data that the model has never seen.
 Underfitting
 A machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data.
 Underfitting degrades the accuracy of the machine learning model.
 Its occurrence simply means that the model or algorithm does not fit the data well enough.
 It usually happens when there is too little data to build an accurate model, or when a linear model is fit to non-linear data.
 Overfitting
 A machine learning algorithm is said to be overfitted when, trained on a large amount of data, it starts learning from the noise and inaccurate entries in the data set.
 The model then fails to categorize the data correctly, because of too many details and too much noise.
 A solution to avoid overfitting is to use a linear algorithm if we have linear data, or to use parameters such as the maximal depth if we are using decision trees, as sketched below.
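As a concrete illustration, a minimal sketch using scikit-learn (an assumption; any tree learner with a depth parameter works similarly) contrasts an unrestricted tree with a depth-limited one:

# A minimal sketch (assuming scikit-learn is installed): capping tree depth
# is one common pre-pruning guard against overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The unrestricted tree usually fits the training set almost perfectly,
# yet the depth-limited tree often generalizes better to the test set.
print("deep:   ", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("shallow:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))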
 Definition — Overfit: Given a hypothesis space H, a
hypothesis h ∈ H is said to overfit the training data if there
exists some alternative hypothesis h’ ∈ H, such that h has
smaller error than h’ over the training examples, but h’ has
a smaller error than h over the entire distribution of
instances.
 Let's try to understand the effect of adding the following positive training example, incorrectly labeled as negative, to the training examples table:
 <Sunny, Hot, Normal, Strong, ->. This example is noisy because the correct label is +.
Given the original error-free data, ID3 produces the decision tree shown in the figure; the noisy example forces ID3 to grow a larger, overfitted tree to accommodate it.
AVOIDING OVERFITTING
 There are several approaches to avoiding overfitting in decision tree learning. These can be grouped into two classes:
 Pre-pruning (avoidance): stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data.
 Post-pruning (recovery): allow the tree to overfit the data, and then post-prune it.
 Two criteria are used to determine the correct final tree size:
 Use a separate set of examples, distinct from the training examples, to evaluate the utility of post-pruning nodes from the tree.
 Use all the available data for training, but apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to produce an improvement beyond the training set.
1. REDUCED ERROR PRUNING
 How exactly might we use a validation set to prevent overfitting? One approach, called reduced-error pruning (Quinlan 1987), is to consider each of the decision nodes in the tree to be a candidate for pruning.
 Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node.
 Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set.
 Reduced-error pruning has the effect that any leaf node added due to coincidental regularities in the training set is likely to be pruned, because these same coincidences are unlikely to recur in the validation set.
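A minimal sketch of this procedure follows, assuming a simple hypothetical Node structure (discrete attribute tests, examples as (dict, label) pairs); it is illustrative, not Quinlan's exact algorithm.

from collections import Counter

class Node:
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # attribute tested at this decision node
        self.branches = branches or {}  # attribute value -> child Node
        self.label = label              # class label if this node is a leaf

def predict(tree, example):
    # Assumes every attribute value in `example` has a branch in the tree.
    node = tree
    while node.label is None:
        node = node.branches[example[node.attribute]]
    return node.label

def accuracy(tree, examples):
    return sum(predict(tree, x) == y for x, y in examples) / len(examples)

def reduced_error_prune(tree, node, train_subset, validation):
    """Bottom-up: turn a decision node into a leaf (labeled with the most
    common training classification at that node) whenever the pruned tree
    performs no worse on the validation set."""
    if node.label is not None or not train_subset:
        return
    for value, child in node.branches.items():
        subset = [(x, y) for x, y in train_subset if x[node.attribute] == value]
        reduced_error_prune(tree, child, subset, validation)
    before = accuracy(tree, validation)
    saved = (node.attribute, node.branches, node.label)
    node.attribute, node.branches = None, {}
    node.label = Counter(y for _, y in train_subset).most_common(1)[0][0]
    if accuracy(tree, validation) < before:
        node.attribute, node.branches, node.label = saved  # pruning hurt: undo

# usage: reduced_error_prune(tree, tree, training_examples, validation_examples)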
2. RULE POST-PRUNING
 Rule post-pruning involves the following steps:
 Infer the decision tree from the training set, growing the tree until the training data is fit as well as possible and allowing overfitting to occur.
 Convert the learned tree into an equivalent set of rules by
creating one rule for each path from the root node to a
leaf node.
 Prune (generalize) each rule by removing any
preconditions that result in improving its estimated
accuracy.
 Sort the pruned rules by their estimated accuracy, and
consider them in this sequence when classifying
subsequent instances.
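A sketch of steps 2 and 3, reusing the hypothetical Node structure from the pruning sketch above; the estimated accuracy here is measured on a held-out example set rather than Quinlan's pessimistic training-set estimate.

def tree_to_rules(node, preconditions=()):
    """Step 2: yield one (preconditions, label) rule per root-to-leaf path."""
    if node.label is not None:
        yield list(preconditions), node.label
        return
    for value, child in node.branches.items():
        yield from tree_to_rules(child, preconditions + ((node.attribute, value),))

def rule_accuracy(preconditions, label, examples):
    covered = [(x, y) for x, y in examples
               if all(x[a] == v for a, v in preconditions)]
    if not covered:
        return 0.0
    return sum(y == label for _, y in covered) / len(covered)

def prune_rule(preconditions, label, examples):
    """Step 3: drop any precondition whose removal improves estimated accuracy."""
    improved = True
    while improved and preconditions:
        improved = False
        for p in list(preconditions):
            shorter = [q for q in preconditions if q != p]
            if rule_accuracy(shorter, label, examples) > rule_accuracy(preconditions, label, examples):
                preconditions, improved = shorter, True
    return preconditions

The pruned rules would then be sorted by estimated accuracy (step 4) and consulted in that order when classifying new instances.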
THERE ARE THREE MAIN ADVANTAGES TO CONVERTING THE DECISION TREE TO RULES BEFORE PRUNING
 Converting to rules allows distinguishing among the different contexts in which a decision node is used.
 Because each distinct path through the decision node produces a distinct rule, the pruning decision regarding that attribute test can be made differently for each path.
 Converting to rules removes the distinction between attribute tests that occur near the root of the tree and those that occur near the leaves.
 Thus, it avoids messy bookkeeping issues such as how to reorganize the tree if the root node is pruned while retaining part of the subtree below this test.
 Converting to rules improves readability. Rules are often easier for people to understand.
2. INCORPORATING CONTINUOUS-VALUED ATTRIBUTES
 Our initial definition of ID3 is restricted to attributes that take on a discrete set of values:
 1. The target attribute whose value is predicted by the learned tree must be discrete valued.
 2. The attributes tested in the decision nodes of the tree must also be discrete valued.
 This second restriction can easily be removed: for a continuous attribute A, the algorithm dynamically defines a new boolean attribute Ac that is true if A < c and false otherwise, choosing the threshold c that maximizes information gain (see the sketch below).
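A sketch of the standard threshold trick: sort the examples by the continuous attribute and consider candidate thresholds at midpoints where the class label changes, keeping the one with the highest information gain. Function names are illustrative.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (information gain, threshold) for the best boolean split
    value < threshold on a continuous attribute."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(labels)
    best = (0.0, None)
    for i in range(1, n):
        if pairs[i - 1][1] == pairs[i][1]:
            continue  # candidate thresholds lie only where the label changes
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v < t]
        right = [y for v, y in pairs if v >= t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[0]:
            best = (gain, t)
    return best

# Mitchell's Temperature example: candidates are 54 and 85; 54 wins.
print(best_threshold([40, 48, 60, 72, 80, 90], ["-", "-", "+", "+", "+", "-"]))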
3. ALTERNATIVE MEASURES FOR SELECTING
ATTRIBUTES
 There is a natural bias in the information gain measure
that favours attributes with many values over those with
few values.
 As an extreme example, consider the attribute Date, which
has a very large number of possible values. What is wrong
with the attribute Date?
 Simply put, it has so many possible values that it is bound
to separate the training examples into very small subsets.
 Because of this, it will have a very high information gain
relative to the training examples.
 However, despite its very high information gain, Date is a very poor predictor of the target function over unseen instances.
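One widely used alternative measure is Quinlan's gain ratio, which divides information gain by the "split information" of the attribute, a term that grows large for many-valued attributes such as Date. A minimal sketch, reusing the entropy helper from the threshold sketch above:

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split
    information (the entropy of the attribute's own value distribution)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)
    split_info = entropy(values)  # -sum (|Si|/|S|) * log2(|Si|/|S|)
    return gain / split_info if split_info > 0 else 0.0

# A Date-like attribute that uniquely identifies every example has maximal
# gain but also maximal split information, so its gain ratio is damped.
labels = ["+", "+", "-", "-", "+", "-"]
date = [1, 2, 3, 4, 5, 6]                   # unique per example
outlook = ["s", "s", "r", "r", "o", "o"]    # a few repeated values
print(gain_ratio(date, labels), gain_ratio(outlook, labels))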
4. HANDLING MISSING ATTRIBUTE VALUES
 In certain cases, the available data may be missing
values for some attributes.
 For example, in a medical domain in which we wish
to predict patient outcome based on various
laboratory tests, it may be that the Blood-Test-
Result is available only for a subset of the patients.
 In such cases, it is common to estimate the missing
attribute value based on other examples for which
this attribute has a known value.
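A sketch of the simplest such estimate: impute a missing value with the attribute's most common observed value (variants restrict this to examples at the current tree node, or to examples with the same classification). The dataset layout here is an assumption for illustration.

from collections import Counter

def fill_missing(examples, attribute):
    """Impute missing (None) values of `attribute` with its most
    common observed value among the examples."""
    known = [x[attribute] for x, _ in examples if x[attribute] is not None]
    mode = Counter(known).most_common(1)[0][0]
    for x, _ in examples:
        if x[attribute] is None:
            x[attribute] = mode
    return examples

# e.g. a patient whose BloodTestResult was never recorded:
patients = [({"BloodTestResult": "high"}, "+"),
            ({"BloodTestResult": "high"}, "-"),
            ({"BloodTestResult": None}, "+")]
print(fill_missing(patients, "BloodTestResult"))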
5. HANDLING ATTRIBUTES WITH DIFFERING
COSTS
 In some learning tasks the instance attributes may
have associated costs.
 For example, in learning to classify medical
diseases we might describe patients in terms of
attributes such as Temperature, BiopsyResult,
Pulse, BloodTestResults, etc.
 These attributes vary significantly in their costs,
both in terms of monetary cost and cost to patient
comfort.
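Cost-sensitive variants of the information gain measure have been proposed for exactly this setting; two from the literature are sketched below (Tan and Schlimmer's Gain^2/Cost, and Nunez's measure, where w in [0, 1] weights the importance of cost).

def tan_schlimmer(gain, cost):
    """Tan and Schlimmer (1990): favor high-gain, low-cost attributes."""
    return gain ** 2 / cost

def nunez(gain, cost, w=0.5):
    """Nunez (1988): w in [0, 1] determines how heavily cost is weighted."""
    return (2 ** gain - 1) / (cost + 1) ** w

# A cheap temperature reading can outscore a costly biopsy of similar gain:
print(tan_schlimmer(gain=0.4, cost=1.0), tan_schlimmer(gain=0.5, cost=50.0))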