ML - 04 - Decision Trees
Chapter 04
Decision Trees
Prepared by: Ziad Doughan
Email: [email protected]
Example:
Thus, a chosen attribute A with v distinct values divides the training
set E into subsets E1, …, Ev according to their values for A.
In this case, the Information Gain of a split is calculated by
subtracting the weighted entropies of the branches from the
original entropy.
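As a minimal sketch of this calculation in Python (the helper names binary_entropy and information_gain are illustrative, not from the slides):

```python
import math

def binary_entropy(p, n):
    """Entropy I(p/(p+n), n/(p+n)) of a node with p positive and n negative examples."""
    total = p + n
    if p == 0 or n == 0:
        return 0.0  # a pure (or empty) node has zero entropy
    pp, pn = p / total, n / total
    return -pp * math.log2(pp) - pn * math.log2(pn)

def information_gain(p, n, branches):
    """Information gain of splitting a node with p positives and n negatives
    into the given branches, each a (p_i, n_i) pair: the branch entropies,
    weighted by branch size, are subtracted from the parent entropy."""
    total = p + n
    remainder = sum((pi + ni) / total * binary_entropy(pi, ni) for pi, ni in branches)
    return binary_entropy(p, n) - remainder
```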
IG(Patrons) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - remainder(Patrons)

= 1 - \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right) \quad \text{(here } p = n = 6 \text{, so } I(1/2, 1/2) = 1\text{)}

= 1 - \left[\frac{2}{12}\, I\left(\frac{0}{2}, \frac{2}{2}\right) + \frac{4}{12}\, I\left(\frac{4}{4}, \frac{0}{4}\right) + \frac{6}{12}\, I\left(\frac{2}{6}, \frac{4}{6}\right)\right]

= 1 - \left[\frac{2}{12} \times 0 + \frac{4}{12} \times 0 + \frac{6}{12}\left(-\frac{2}{6}\log_2\frac{2}{6} - \frac{4}{6}\log_2\frac{4}{6}\right)\right] \approx 0.541 \text{ bits}

For Type, every branch contains equal numbers of positive and negative examples, so each branch entropy is 1 bit:

IG(Type) = 1 - \left[\frac{2}{12} + \frac{2}{12} + \frac{4}{12} + \frac{4}{12}\right] = 0 \text{ bits}
Doing the same for all the other attributes shows that Patrons has
the highest IG of all.
So Patrons is chosen by the DT learning algorithm as the root.
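Assuming the helper functions sketched above, plugging in the branch counts that appear in the equations (Patrons splits the 12 examples into branches with (0, 2), (4, 0) and (2, 4) positive/negative counts; Type into (1, 1), (1, 1), (2, 2) and (2, 2)) reproduces these values:

```python
# Patrons: None -> (0 pos, 2 neg), Some -> (4, 0), Full -> (2, 4)
print(information_gain(6, 6, [(0, 2), (4, 0), (2, 4)]))          # ~0.541 bits
# Type: French -> (1, 1), Italian -> (1, 1), Thai -> (2, 2), Burger -> (2, 2)
print(information_gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]))  # 0.0 bits
```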
NOTE (for a two-class node; a quick numerical check follows this list):
• Gini impurity ranges from 0 to 0.5.
• Entropy ranges from 0 to 1 bit.
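A sketch of that check (the gini and entropy helpers below are illustrative, not from the slides):

```python
import math

def gini(p):
    """Gini impurity of a two-class node whose positive-class fraction is p."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)

def entropy(p):
    """Entropy (in bits) of a two-class node whose positive-class fraction is p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

print(gini(0.5), entropy(0.5))  # worst-case 50/50 split: 0.5 and 1.0
print(gini(0.0), entropy(0.0))  # pure node: 0.0 and 0.0
```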
The key is to combine weak rules to get better performance (see the
sketch at the end of this section).
• Simple to implement.
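As an illustrative sketch (not taken from the slides), a few one-feature threshold rules ("decision stumps", a hypothetical set here) can be combined by a simple majority vote, so the ensemble can outperform any single weak rule on its own:

```python
def stump(feature_index, threshold):
    """A weak rule: predict 1 when the chosen feature exceeds the threshold."""
    return lambda x: 1 if x[feature_index] > threshold else 0

def majority_vote(rules, x):
    """Combine weak rules by an unweighted majority vote."""
    votes = sum(rule(x) for rule in rules)
    return 1 if votes > len(rules) / 2 else 0

# Hypothetical data: three weak rules over a 3-feature input.
rules = [stump(0, 0.5), stump(1, 0.2), stump(2, 0.8)]
print(majority_vote(rules, [0.7, 0.1, 0.9]))  # two of three rules vote 1 -> prediction 1
```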