ISSUES IN DECISION TREE LEARNING
Practical issues in learning decision trees include:
1. Determining how deeply to grow the decision tree,
2. Handling continuous attributes,
3. Choosing an appropriate attribute selection measure,
4. Handling training data with missing attribute values,
5. Handling attributes with differing costs, and
6. Improving computational efficiency.
1. AVOIDING OVERFITTING THE DATA
 When designing a machine learning model, the model is considered good if it generalizes properly to new input data from the problem domain.
 This lets it make accurate predictions on future data that the model has never seen.
 Underfitting
 A machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data.
 Underfitting degrades the accuracy of the machine learning model.
 Its occurrence simply means that the model or algorithm does not fit the data well enough.
 It usually happens when there is too little data to build an accurate model, or when a linear model is fit to non-linear data.
 Overfitting
 A machine learning algorithm is said to be overfitted when, trained on a large amount of data, it starts learning from the noise and inaccurate entries in the data set.
 The model then fails to categorize the data correctly, because of too many details and too much noise.
 A solution to avoid overfitting is to use a linear algorithm if we have linear data, or to use parameters such as the maximal depth if we are using decision trees, as sketched below.
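As a concrete illustration, a minimal sketch using scikit-learn (an assumption; any tree learner with a depth parameter works similarly) contrasts an unrestricted tree with a depth-limited one:

# A minimal sketch (assuming scikit-learn is installed): capping tree depth
# is one common pre-pruning guard against overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The unrestricted tree usually fits the training set almost perfectly,
# yet the depth-limited tree often generalizes better to the test set.
print("deep:   ", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("shallow:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))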
 Definition — Overfit: Given a hypothesis space H, a
hypothesis h ∈ H is said to overfit the training data if there
exists some alternative hypothesis h’ ∈ H, such that h has
smaller error than h’ over the training examples, but h’ has
a smaller error than h over the entire distribution of
instances.
 Let's try to understand the effect of adding the following positive training example, incorrectly labeled as negative, to the training examples table:
 <Sunny, Hot, Normal, Strong, ->. This example is noisy because the correct label is +.
Given the original error-free data, ID3 produces the decision tree shown in the figure; the noisy example forces ID3 to grow a larger, overfitted tree to accommodate it.
AVOIDING OVERFITTING
 There are several approaches to avoiding overfitting in decision tree learning. These can be grouped into two classes:
 Pre-pruning (avoidance): stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data.
 Post-pruning (recovery): allow the tree to overfit the data, and then post-prune it.
 Two criteria are used to determine the correct final tree size:
 Use a separate set of examples, distinct from the training examples, to evaluate the utility of post-pruning nodes from the tree.
 Use all the available data for training, but apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to produce an improvement beyond the training set.
1. REDUCED ERROR PRUNING
 How exactly might we use a validation set to prevent overfitting? One approach, called reduced-error pruning (Quinlan 1987), is to consider each of the decision nodes in the tree to be a candidate for pruning.
 Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node.
 Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set.
 Reduced-error pruning has the effect that any leaf node added due to coincidental regularities in the training set is likely to be pruned, because these same coincidences are unlikely to recur in the validation set.
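A minimal sketch of this procedure follows, assuming a simple hypothetical Node structure (discrete attribute tests, examples as (dict, label) pairs); it is illustrative, not Quinlan's exact algorithm.

from collections import Counter

class Node:
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # attribute tested at this decision node
        self.branches = branches or {}  # attribute value -> child Node
        self.label = label              # class label if this node is a leaf

def predict(tree, example):
    # Assumes every attribute value in `example` has a branch in the tree.
    node = tree
    while node.label is None:
        node = node.branches[example[node.attribute]]
    return node.label

def accuracy(tree, examples):
    return sum(predict(tree, x) == y for x, y in examples) / len(examples)

def reduced_error_prune(tree, node, train_subset, validation):
    """Bottom-up: turn a decision node into a leaf (labeled with the most
    common training classification at that node) whenever the pruned tree
    performs no worse on the validation set."""
    if node.label is not None or not train_subset:
        return
    for value, child in node.branches.items():
        subset = [(x, y) for x, y in train_subset if x[node.attribute] == value]
        reduced_error_prune(tree, child, subset, validation)
    before = accuracy(tree, validation)
    saved = (node.attribute, node.branches, node.label)
    node.attribute, node.branches = None, {}
    node.label = Counter(y for _, y in train_subset).most_common(1)[0][0]
    if accuracy(tree, validation) < before:
        node.attribute, node.branches, node.label = saved  # pruning hurt: undo

# usage: reduced_error_prune(tree, tree, training_examples, validation_examples)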
2. RULE POST-PRUNING
 Rule post-pruning involves the following steps:
 Infer the decision tree from the training set, growing the tree until the training data is fit as well as possible and allowing overfitting to occur.
 Convert the learned tree into an equivalent set of rules by
creating one rule for each path from the root node to a
leaf node.
 Prune (generalize) each rule by removing any
preconditions that result in improving its estimated
accuracy.
 Sort the pruned rules by their estimated accuracy, and
consider them in this sequence when classifying
subsequent instances.
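A sketch of steps 2 and 3, reusing the hypothetical Node structure from the pruning sketch above; the estimated accuracy here is measured on a held-out example set rather than Quinlan's pessimistic training-set estimate.

def tree_to_rules(node, preconditions=()):
    """Step 2: yield one (preconditions, label) rule per root-to-leaf path."""
    if node.label is not None:
        yield list(preconditions), node.label
        return
    for value, child in node.branches.items():
        yield from tree_to_rules(child, preconditions + ((node.attribute, value),))

def rule_accuracy(preconditions, label, examples):
    covered = [(x, y) for x, y in examples
               if all(x[a] == v for a, v in preconditions)]
    if not covered:
        return 0.0
    return sum(y == label for _, y in covered) / len(covered)

def prune_rule(preconditions, label, examples):
    """Step 3: drop any precondition whose removal improves estimated accuracy."""
    improved = True
    while improved and preconditions:
        improved = False
        for p in list(preconditions):
            shorter = [q for q in preconditions if q != p]
            if rule_accuracy(shorter, label, examples) > rule_accuracy(preconditions, label, examples):
                preconditions, improved = shorter, True
    return preconditions

The pruned rules would then be sorted by estimated accuracy (step 4) and consulted in that order when classifying new instances.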
THERE ARE THREE MAIN ADVANTAGES TO CONVERTING THE DECISION TREE TO RULES BEFORE PRUNING
 Converting to rules allows distinguishing among the different contexts in which a decision node is used.
 Because each distinct path through the decision node produces a distinct rule, the pruning decision regarding that attribute test can be made differently for each path.
 Converting to rules removes the distinction between attribute tests that occur near the root of the tree and those that occur near the leaves.
 Thus, it avoids messy bookkeeping issues such as how to reorganize the tree if the root node is pruned while retaining part of the subtree below this test.
 Converting to rules improves readability. Rules are often easier for people to understand.
2. INCORPORATING CONTINUOUS-VALUED ATTRIBUTES
 Our initial definition of ID3 is restricted to attributes that take on a discrete set of values:
 1. The target attribute whose value is predicted by the learned tree must be discrete valued.
 2. The attributes tested in the decision nodes of the tree must also be discrete valued.
 This second restriction can easily be removed: for a continuous attribute A, the algorithm dynamically defines a new boolean attribute Ac that is true if A < c and false otherwise, choosing the threshold c that maximizes information gain (see the sketch below).
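A sketch of the standard threshold trick: sort the examples by the continuous attribute and consider candidate thresholds at midpoints where the class label changes, keeping the one with the highest information gain. Function names are illustrative.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (information gain, threshold) for the best boolean split
    value < threshold on a continuous attribute."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(labels)
    best = (0.0, None)
    for i in range(1, n):
        if pairs[i - 1][1] == pairs[i][1]:
            continue  # candidate thresholds lie only where the label changes
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v < t]
        right = [y for v, y in pairs if v >= t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[0]:
            best = (gain, t)
    return best

# Mitchell's Temperature example: candidates are 54 and 85; 54 wins.
print(best_threshold([40, 48, 60, 72, 80, 90], ["-", "-", "+", "+", "+", "-"]))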
3. ALTERNATIVE MEASURES FOR SELECTING
ATTRIBUTES
 There is a natural bias in the information gain measure
that favours attributes with many values over those with
few values.
 As an extreme example, consider the attribute Date, which
has a very large number of possible values. What is wrong
with the attribute Date?
 Simply put, it has so many possible values that it is bound
to separate the training examples into very small subsets.
 Because of this, it will have a very high information gain
relative to the training examples.
 However, despite its very high information gain, Date is a very poor predictor of the target function over unseen instances.
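One widely used alternative measure is Quinlan's gain ratio, which divides information gain by the "split information" of the attribute, a term that grows large for many-valued attributes such as Date. A minimal sketch, reusing the entropy helper from the threshold sketch above:

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split
    information (the entropy of the attribute's own value distribution)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)
    split_info = entropy(values)  # -sum (|Si|/|S|) * log2(|Si|/|S|)
    return gain / split_info if split_info > 0 else 0.0

# A Date-like attribute that uniquely identifies every example has maximal
# gain but also maximal split information, so its gain ratio is damped.
labels = ["+", "+", "-", "-", "+", "-"]
date = [1, 2, 3, 4, 5, 6]                   # unique per example
outlook = ["s", "s", "r", "r", "o", "o"]    # a few repeated values
print(gain_ratio(date, labels), gain_ratio(outlook, labels))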
4. HANDLING MISSING ATTRIBUTE VALUES
 In certain cases, the available data may be missing
values for some attributes.
 For example, in a medical domain in which we wish
to predict patient outcome based on various
laboratory tests, it may be that the Blood-Test-
Result is available only for a subset of the patients.
 In such cases, it is common to estimate the missing
attribute value based on other examples for which
this attribute has a known value.
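A sketch of the simplest such estimate: impute a missing value with the attribute's most common observed value (variants restrict this to examples at the current tree node, or to examples with the same classification). The dataset layout here is an assumption for illustration.

from collections import Counter

def fill_missing(examples, attribute):
    """Impute missing (None) values of `attribute` with its most
    common observed value among the examples."""
    known = [x[attribute] for x, _ in examples if x[attribute] is not None]
    mode = Counter(known).most_common(1)[0][0]
    for x, _ in examples:
        if x[attribute] is None:
            x[attribute] = mode
    return examples

# e.g. a patient whose BloodTestResult was never recorded:
patients = [({"BloodTestResult": "high"}, "+"),
            ({"BloodTestResult": "high"}, "-"),
            ({"BloodTestResult": None}, "+")]
print(fill_missing(patients, "BloodTestResult"))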
5. HANDLING ATTRIBUTES WITH DIFFERING
COSTS
 In some learning tasks the instance attributes may
have associated costs.
 For example, in learning to classify medical
diseases we might describe patients in terms of
attributes such as Temperature, BiopsyResult,
Pulse, BloodTestResults, etc.
 These attributes vary significantly in their costs,
both in terms of monetary cost and cost to patient
comfort.
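Cost-sensitive variants of the information gain measure have been proposed for exactly this setting; two from the literature are sketched below (Tan and Schlimmer's Gain^2/Cost, and Nunez's measure, where w in [0, 1] weights the importance of cost).

def tan_schlimmer(gain, cost):
    """Tan and Schlimmer (1990): favor high-gain, low-cost attributes."""
    return gain ** 2 / cost

def nunez(gain, cost, w=0.5):
    """Nunez (1988): w in [0, 1] determines how heavily cost is weighted."""
    return (2 ** gain - 1) / (cost + 1) ** w

# A cheap temperature reading can outscore a costly biopsy of similar gain:
print(tan_schlimmer(gain=0.4, cost=1.0), tan_schlimmer(gain=0.5, cost=50.0))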