
Decision Tree

Algorithm
Supervised ML
What is a Decision Tree?

 A decision tree is a type of supervised learning algorithm (it has a predefined target variable) that is mostly used in classification problems. It works for both categorical and continuous input and output variables. In this technique, we split the population or sample into two or more homogeneous sets (sub-populations) based on the most significant splitter/differentiator among the input variables.
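
In practice, such a tree can be trained in a few lines. A minimal sketch with scikit-learn, assuming the library is installed; the Iris dataset and parameter values are chosen only for illustration:

# Minimal sketch: fit a decision tree classifier on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# criterion="entropy" uses information gain; the default "gini" uses the Gini index.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))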
Structure of a Decision Tree

 Root Node: Represents the entire population or sample; it is further divided into two or more homogeneous sets.
 Splitting: The process of dividing a node into two or more sub-nodes.
 Decision Node: A sub-node that splits into further sub-nodes is called a decision node.
 Leaf / Terminal Node: A node that does not split is called a leaf or terminal node.
 Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
 Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.
How does the Decision Tree Algorithm Work?
The basic idea behind any decision tree algorithm is as follows:
 Select the best attribute using an Attribute Selection Measure (ASM) to split the records.
 Make that attribute a decision node and break the dataset into smaller subsets.
 Build the tree by repeating this process recursively for each child until one of the following conditions is met:
 All the tuples belong to the same class (target attribute value).
 There are no remaining attributes.
 There are no remaining instances.
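
A simplified recursive sketch of this procedure in Python. It is only an illustration, not a full ID3/CART implementation; best_attribute and majority_class are hypothetical helpers, and best_attribute would rely on one of the attribute selection measures described next:

# Simplified sketch of recursive tree building; records are assumed to be
# dicts with a "class" key. best_attribute and majority_class are hypothetical.

def build_tree(records, attributes):
    classes = {r["class"] for r in records}
    if len(classes) == 1:                       # all tuples belong to the same class
        return next(iter(classes))
    if not attributes or not records:           # no remaining attributes or instances
        return majority_class(records)          # in practice: parent's majority class if empty

    attr = best_attribute(records, attributes)  # chosen via an attribute selection measure
    node = {"attribute": attr, "children": {}}
    for value in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == value]
        node["children"][value] = build_tree(subset, [a for a in attributes if a != attr])
    return node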
Attribute Selection Measures

An attribute selection measure is a heuristic for selecting the splitting criterion that partitions the data in the best possible manner. It is also known as a splitting rule because it helps us determine the breakpoints for tuples at a given node. An ASM assigns a rank to each feature (attribute) of the given dataset, and the attribute with the best score is selected as the splitting attribute. In the case of a continuous-valued attribute, split points for the branches also need to be defined.

The most popular selection measures are:
 Entropy
 Gini Index
 Chi-Square
 Gain Ratio
What is Entropy?

Entropy is a measure of the uncertainty or impurity in a dataset. It quantifies the amount of disorder or randomness. In the context of a decision tree, entropy helps to determine how informative a particular split is.
 High Entropy: Indicates high disorder, meaning the data is diverse and uncertain.
 Low Entropy: Indicates low disorder, meaning the data is more homogeneous and certain.
The formula for entropy H for a binary classification problem is:

H(S) = -p+ log2(p+) - p- log2(p-)

where:
p+ is the proportion of positive examples in the dataset S
p- is the proportion of negative examples in the dataset S
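
A small sketch of this formula in Python (binary_entropy is just an illustrative name):

import math

def binary_entropy(p_pos):
    """Entropy H(S) of a binary dataset, given the proportion of positive examples."""
    p_neg = 1.0 - p_pos
    if p_pos == 0.0 or p_neg == 0.0:   # a pure node has zero entropy
        return 0.0
    return -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)

print(binary_entropy(0.5))   # 1.0  (maximum disorder)
print(binary_entropy(1.0))   # 0.0  (pure node)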
What is Information Gain?

 Information Gain (IG) is a measure of the effectiveness of an attribute in classifying the training data. It quantifies the reduction in entropy (uncertainty) achieved by splitting the dataset on that attribute.
 The formula for Information Gain is:

Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) * Entropy(Sv)

where the sum runs over all values v of attribute A, and:
• S is the original dataset
• A is the attribute being evaluated
• Sv is the subset of S for which attribute A has value v
• Entropy(S) is the entropy of the original dataset
• Entropy(Sv) is the entropy of the subset Sv
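
A sketch of this calculation built directly from the definitions above; records are assumed to be dicts keyed by attribute name, with the class label stored under "class":

from collections import Counter
import math

def entropy(labels):
    """Entropy of a list of class labels (generalises the binary formula above)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(records, attr):
    """Gain(S, A): reduction in entropy from splitting records on attribute attr."""
    labels = [r["class"] for r in records]
    before = entropy(labels)
    after = 0.0
    for value in {r[attr] for r in records}:
        subset = [r["class"] for r in records if r[attr] == value]
        after += (len(subset) / len(records)) * entropy(subset)
    return before - after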
Gini index

 Another decision tree algorithm, CART (Classification and Regression Trees), uses the Gini method to create split points. The Gini index of a dataset D is:

Gini(D) = 1 - Σi pi^2

where pi is the probability that a tuple in D belongs to class Ci.

 The Gini Index considers a binary split for each attribute. You can compute a weighted sum of the impurity of each partition. If a binary split on attribute A partitions data D into D1 and D2, the Gini index of D is:

GiniA(D) = (|D1| / |D|) * Gini(D1) + (|D2| / |D|) * Gini(D2)
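
A sketch of these two formulas in Python (gini and gini_split are illustrative names):

from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum of pi^2 over the classes present in labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_split(left_labels, right_labels):
    """Weighted Gini index of a binary split of D into D1 and D2."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) + (len(right_labels) / n) * gini(right_labels)

print(gini(["yes", "yes", "no", "no"]))          # 0.5 (maximally impure)
print(gini_split(["yes", "yes"], ["no", "no"]))  # 0.0 (perfect split)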
Decision tree algorithms:

 CART (Classification and Regression Trees) → uses the Gini Index (classification) as its metric.
 ID3 (Iterative Dichotomiser 3) → uses the Entropy function and Information Gain as its metrics.
Information Gain:

 By using information gain as a criterion, we try to estimate the information contained in each attribute, using a few ideas borrowed from information theory.
 The randomness or uncertainty of a random variable X is measured by its entropy.
 Consider a binary classification problem with only two classes, positive and negative.
 If all examples are positive or all are negative, the entropy is zero, i.e. low.
 If half of the records belong to the positive class and half to the negative class, the entropy is one, i.e. high.
 By calculating the entropy measure for each attribute we can calculate its information gain. Information Gain calculates the expected reduction in entropy due to sorting on that attribute.
Entropy can be calculated using the formula:

Entropy = -p log2(p) - q log2(q)

Here p and q are the probabilities of success and failure, respectively, in that node.

Entropy is also used with a categorical target variable. We choose the split that has the lowest entropy compared to the parent node and the other candidate splits. The lower the entropy, the better.

Steps to calculate entropy for a split:
 Calculate the entropy of the parent node.
 Calculate the entropy of each individual node of the split, and take the weighted average of all sub-nodes in the split.
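
A short worked example of these steps, with made-up numbers: a parent node holds 10 positive and 10 negative records, and a candidate split produces one sub-node with 8 positive / 2 negative and another with 2 positive / 8 negative.

import math

def node_entropy(p, q):
    """Entropy of a node given the probabilities of success (p) and failure (q)."""
    return -(p * math.log2(p) if p else 0.0) - (q * math.log2(q) if q else 0.0)

parent = node_entropy(0.5, 0.5)               # 1.0
left = node_entropy(0.8, 0.2)                 # ~0.722
right = node_entropy(0.2, 0.8)                # ~0.722
split = (10 / 20) * left + (10 / 20) * right  # weighted average ~0.722
print("information gain:", parent - split)    # ~0.278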
PROCEDURE

 First, the entropy of the total dataset is calculated.
 The dataset is then split on the different attributes.
 The entropy for each branch is calculated, then added proportionally to get the total entropy for the split.
 The resulting entropy is subtracted from the entropy before the split.
 The result is the Information Gain, or decrease in entropy.
 The attribute that yields the largest IG is chosen for the decision node.
EXAMPLE
How can we avoid overfitting in decision trees?

 Overfitting is a practical problem while building a decision tree model. A model is considered to be overfitting when the algorithm keeps going deeper and deeper in the tree to reduce the training-set error but ends up with an increased test-set error, i.e. the prediction accuracy of our model goes down. It generally happens when the tree builds many branches due to outliers and irregularities in the data.
Two approaches we can use to avoid overfitting are:
 Pre-Pruning
 Post-Pruning
 Pre-Pruning
In pre-pruning, tree construction is stopped a bit early. It is preferred not to split a node if its goodness measure is below a threshold value, but it is difficult to choose an appropriate stopping point.
 Post-Pruning
In post-pruning, the algorithm first goes deeper and deeper to build a complete tree. If the tree shows the overfitting problem, pruning is then done as a post-processing step. We use cross-validation data to check the effect of our pruning: it tests whether expanding a node makes an improvement or not. If it shows an improvement, we can keep the expanded node; but if it shows a reduction in accuracy, the node should not be expanded, i.e. it should be converted to a leaf node.
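
As a sketch of both ideas with scikit-learn: constraints such as max_depth or min_samples_split act as pre-pruning, while cost-complexity pruning via ccp_alpha is a form of post-pruning. The specific values below are illustrative, not recommendations.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Pre-pruning: stop growing the tree early with depth / sample-count constraints.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow a full tree, then prune it with cost-complexity pruning (ccp_alpha).
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post_pruned.fit(X_train, y_train)

print("Pre-pruned test accuracy: ", pre_pruned.score(X_test, y_test))
print("Post-pruned test accuracy:", post_pruned.score(X_test, y_test))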
