Lecture 12

Decision Tree

Prof. Subir Kumar Das, Dept. of CSE


Entropy and Information gain

• Entropy is used for checking the impurity or uncertainty present in the data.
• Entropy is used to evaluate the quality of a split.
• When entropy is zero, the sample is completely homogeneous, meaning every instance belongs to the same class; entropy is one when the sample is equally divided between the classes.
• Information gain indicates how much information a particular feature/variable gives us about the final outcome.
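As a concrete illustration (not part of the original slides), here is a minimal Python sketch of these two measures: entropy takes the class counts at a node, and information_gain takes the class counts of each branch that a split would produce.

    import math

    def entropy(counts):
        """Entropy of a node from its class counts (0 = pure node, 1 = evenly split two-class node)."""
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

    def information_gain(parent_counts, branch_counts):
        """Parent entropy minus the weighted entropy of the branches produced by a split."""
        total = sum(parent_counts)
        return entropy(parent_counts) - sum(sum(b) / total * entropy(b) for b in branch_counts)

    print(entropy([14, 0]), entropy([7, 7]))                   # 0.0 (pure) and 1.0 (evenly split)
    print(entropy([9, 5]))                                     # ~0.940 (the node used in the example below)
    print(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]))  # ~0.247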



ID3
• ID3 stands for Iterative Dichotomiser 3 and is named so because the algorithm iteratively (repeatedly) dichotomises (divides) features into two or more groups at each step.
• ID3 is an algorithm invented by Ross Quinlan to generate a decision tree from a dataset, and it is one of the most popular algorithms for constructing trees.
• ID3 is the core algorithm for building a decision tree.
• It employs a top-down greedy search through the space of all possible branches, with no backtracking.
• This algorithm uses information gain and entropy to construct a classification decision tree.
• ID3 can overfit the training data (to avoid overfitting, smaller decision trees should be preferred over larger ones).
• This algorithm usually produces small trees, but it does not always produce the smallest possible tree.
• ID3 is harder to use on continuous data, but the trees it builds are inexpensive to construct.
• It is extremely fast at classifying unknown records, easy to interpret for small trees, and robust to noise.
Steps
• It can easily handle redundant or irrelevant attributes.
• But the space of possible decision trees is exponentially large.
• Greedy approaches are often unable to find the best tree.
• It does not take interactions between attributes into account.
• Each decision boundary involves only a single attribute.
• Steps to build a decision tree using ID3 (a sketch follows below):
• a) Take the entire dataset as input.
• b) Calculate the entropy of the target variable, as well as of the predictor attributes.
• c) Calculate the information gain of all attributes.
• d) Choose the attribute with the highest information gain as the root node.
• e) Repeat the same procedure on every branch until the decision node of each branch is finalized.
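A compact recursive sketch of these steps (an illustration added for this handout, not the exact implementation used in the lecture); examples are assumed to be dicts of attribute values plus a "Decision" label.

    import math
    from collections import Counter

    def entropy(labels):
        total = len(labels)
        return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

    def gain(rows, attr, target="Decision"):
        before = entropy([r[target] for r in rows])
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [r[target] for r in rows if r[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return before - remainder

    def id3(rows, attrs, target="Decision"):
        labels = [r[target] for r in rows]
        if len(set(labels)) == 1:                 # pure node -> leaf (step e)
            return labels[0]
        if not attrs:                             # no attributes left -> majority-class leaf
            return Counter(labels).most_common(1)[0][0]
        best = max(attrs, key=lambda a: gain(rows, a, target))   # steps b) to d)
        return {best: {v: id3([r for r in rows if r[best] == v],
                              [a for a in attrs if a != best], target)
                       for v in set(r[best] for r in rows)}}     # recurse per branch (step e)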



Example

• First calculate the entropy for the “Decision” attribute, which is the target variable, and also calculate the entropy for the independent attributes “Outlook”, “Temp.”, “Humidity” and “Wind”.
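The worked example uses the classic 14-day play-tennis data set. The table itself appeared as a figure in the original slides; it is reproduced here from the standard Quinlan/Mitchell version (whose class counts match the numbers used in the following slides) so that the sketches in this handout can be run.

    data = [
        {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Decision": "No"},
        {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "Decision": "No"},
        {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Decision": "Yes"},
        {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Decision": "Yes"},
        {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Decision": "Yes"},
        {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Decision": "No"},
        {"Outlook": "Overcast", "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Decision": "Yes"},
        {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Decision": "No"},
        {"Outlook": "Sunny",    "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Decision": "Yes"},
        {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "Normal", "Wind": "Weak",   "Decision": "Yes"},
        {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "Normal", "Wind": "Strong", "Decision": "Yes"},
        {"Outlook": "Overcast", "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "Decision": "Yes"},
        {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "Decision": "Yes"},
        {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "Decision": "No"},
    ]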



• Information gained before splitting, i.e.,
• Entropy(S) = I(S1, S2) = I(9, 5) = -(9/14)log(9/14) - (5/14)log(5/14)
• = 0.4097 + 0.5305 = 0.9402
• S11 = Sunny + Play Tennis(Yes) --- 2    S21 = Sunny + Play Tennis(No) --- 3
• S12 = Overcast + Play Tennis(Yes) --- 4    S22 = Overcast + Play Tennis(No) --- 0
• S13 = Rain + Play Tennis(Yes) --- 3    S23 = Rain + Play Tennis(No) --- 2

I(S11, S21) = -(2/5)log(2/5) - (3/5)log(3/5)

I(S12, S22) = -(4/4)log(4/4) - 0
I(S13, S23) = -(2/5)log(2/5) - (3/5)log(3/5)

• Entropy(Outlook) = (5/14) I(S11, S21) + (4/14) I(S12, S22) + (5/14) I(S13, S23)
• Information Gain(Outlook) = I(S1, S2) - E(Outlook)
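Filling in the branch entropies computed above gives (a numeric check added for this handout; the original slide leaves the expression symbolic):
• Entropy(Outlook) = (5/14)(0.9710) + (4/14)(0) + (5/14)(0.9710) = 0.6935
• Information Gain(Outlook) = 0.9402 - 0.6935 = 0.2467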
I(S11, S21) = -(2/4)log(2/4) - (2/4)log(2/4)    (Temp = Hot: 2 Yes, 2 No)
I(S12, S22) = -(2/6)log(2/6) - (4/6)log(4/6)    (Temp = Mild: 4 Yes, 2 No)
I(S13, S23) = -(3/4)log(3/4) - (1/4)log(1/4)    (Temp = Cool: 3 Yes, 1 No)



• Entropy(Temperature) = (4/14) I(S11, S21) + (6/14) I(S12, S22) + (4/14) I(S13, S23) = 0.9111
• Gain(Temperature) = I(S1, S2) - E(Temperature) = 0.02922

• Entropy(Humidity) = (7/14) I(S11, S21) + (7/14) I(S12, S22) = 0.7879
• Gain(Humidity) = I(S1, S2) - E(Humidity) = 0.15184




• Entropy(Wind) = (8/14) I(S11, S21) + (6/14) I(S12, S22) = 0.892

• Gain(Wind) = I(S1, S2) - E(Wind) = 0.04813
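A hedged numeric check of all four gains (the class counts per attribute value are taken from the standard play-tennis data reproduced earlier; this snippet is not from the original lecture):

    import math

    def entropy(counts):
        n = sum(counts)
        return -sum(c / n * math.log2(c / n) for c in counts if c)

    def gain(branches, parent=(9, 5)):
        n = sum(parent)
        return entropy(parent) - sum(sum(b) / n * entropy(b) for b in branches)

    gains = {
        "Outlook":     gain([(2, 3), (4, 0), (3, 2)]),   # ~0.247
        "Temperature": gain([(2, 2), (4, 2), (3, 1)]),   # ~0.029
        "Humidity":    gain([(3, 4), (6, 1)]),           # ~0.152
        "Wind":        gain([(6, 2), (3, 3)]),           # ~0.048
    }
    print(max(gains, key=gains.get))   # Outlook -> chosen as the root node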




• Repeat the same procedure on every branch until the decision node of each branch is finalized.
• Take the subset of data reaching each decision node as the data set for that node.



• Entropy(Outlook = Sunny) = 0.97095
• Outlook=Sunny | Temp=Hot
• Outlook=Sunny | Temp=Mild
• Outlook=Sunny | Temp=Cool
• Outlook=Sunny | Humidity=High
• Outlook=Sunny | Humidity=Normal
• Outlook=Sunny | Wind=Strong
• Outlook=Sunny | Wind=Weak
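The per-value tables for these cases were figures in the original slides. As a hedged check, the same gain computation restricted to the five Outlook=Sunny rows (2 Yes, 3 No, counts taken from the standard play-tennis data above) gives:

    import math

    def entropy(counts):
        n = sum(counts)
        return -sum(c / n * math.log2(c / n) for c in counts if c)

    def gain(branches, parent=(2, 3)):          # Sunny subset: 2 Yes, 3 No
        n = sum(parent)
        return entropy(parent) - sum(sum(b) / n * entropy(b) for b in branches)

    print(gain([(0, 2), (1, 1), (1, 0)]))   # Temp     ~0.571
    print(gain([(0, 3), (2, 0)]))           # Humidity ~0.971  <- maximum
    print(gain([(1, 2), (1, 1)]))           # Wind     ~0.020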



• The information gain is maximum for Humidity when Outlook is “Sunny”.
• So “Humidity” is the next-level decision node under Outlook = “Sunny”.
• The decision is always Yes when Outlook is Overcast, so there is no need to calculate entropy and information gain for that branch.
• Continue the process for Outlook = Rain.
• The information gain is highest for (Outlook=Rain | Wind), so Wind is the decision node under Rain.

• The Decision Tree construction is over.
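Running the id3 sketch from earlier on the play-tennis data reproduces the tree described above, shown here as the nested-dict structure that sketch returns (an illustration added for this handout, not output from the original lecture):

    # id3(data, ["Outlook", "Temp", "Humidity", "Wind"]) returns:
    {"Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }}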


Pruning
• In machine learning and data mining, pruning is a technique associated with decision trees.
• Pruning reduces the size of decision trees by removing parts of the tree that do not provide power to classify instances.
• Decision trees are among the machine learning algorithms most susceptible to overfitting, and effective pruning can reduce this likelihood.
• Pruning is a technique that involves diminishing the size of a trained model by eliminating some of its parameters.
• The objective of pruning is to make a smaller, faster, and more effective model while maintaining its accuracy.
• Pruning can be especially useful for huge and complex models, where reducing their size can lead to significant improvements in speed and efficiency.
• Going through the entire dataset and computing the impurities at every level, tree construction costs O(Nkd) for N examples, k attributes and depth d.
• This means it actually lies somewhere between O(Nk log N) and O(N²k).
• The worst-case performance can be extremely bad! Pruning reduces the complexity and thereby reduces the chance of overfitting.
Pre-Pruning
• The pre-pruning technique is applied before the decision tree is fully constructed.
• Pre-pruning can be done using hyperparameter tuning (see the sketch after this list).
• It helps overcome the overfitting issue.
• Halt tree construction early: do not split a node if this would result in the goodness measure falling below a threshold (which is difficult to choose).
• It is usually based on a statistical significance test.
• Upon halting, the node turns into a leaf.
• The leaf can carry the most common class among the subset samples, or the probability distribution of those samples.
• Stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node.
• This technique suffers from a high risk of premature halting:
– if initially no individual attribute exhibits any interesting information about the class,
– the structure may become visible only in a fully expanded tree,
– but pre-pruning will not expand beyond the root node.
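A minimal sketch of pre-pruning via hyperparameters, using scikit-learn's DecisionTreeClassifier on a toy data set (the library, data set, and threshold values are illustrative choices, not prescribed by the slides):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Pre-pruning: stop splitting early via hyperparameters (values are arbitrary examples)
    clf = DecisionTreeClassifier(
        criterion="entropy",
        max_depth=3,                # cap the depth of the tree
        min_samples_split=10,       # do not split small nodes
        min_impurity_decrease=0.01  # require a minimum goodness of split
    )
    clf.fit(X_tr, y_tr)
    print(clf.get_depth(), clf.score(X_te, y_te))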
Post Pruning
• This technique is used after the construction of the decision tree.
• It is used when the decision tree has grown to a very large depth and shows overfitting.
• It is also known as backward pruning.
• It is used when we have a fully (even excessively) grown decision tree.
• This technique removes branches from a “fully grown” tree (see the sketch after this list).
• Use a set of data different from the training data to decide which is the “best pruned tree”.
• First build the full tree, then prune it:
– a fully grown tree shows all attribute interactions,
– but some subtrees might be due to chance effects.
• Two pruning operations:
– Subtree raising
– Subtree replacement
• Possible strategies to select the subtree:
– Error estimation
– Significance testing
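A minimal sketch of post-pruning with error estimation on held-out data, here via scikit-learn's cost-complexity pruning (one concrete strategy; the library and data set are illustrative choices, not prescribed by the slides):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    # 1. Build the full tree, then compute the candidate pruning levels (ccp_alpha values)
    full = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_tr, y_tr)
    path = full.cost_complexity_pruning_path(X_tr, y_tr)

    # 2. Refit one pruned tree per alpha and keep the one that scores best on held-out data
    best = max(
        (DecisionTreeClassifier(criterion="entropy", random_state=0,
                                ccp_alpha=a).fit(X_tr, y_tr) for a in path.ccp_alphas),
        key=lambda t: t.score(X_val, y_val),
    )
    print(best.get_n_leaves(), best.score(X_val, y_val))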
Thank You

