Decision Tree
A Supervised ML Algorithm
What is a Decision Tree?

A decision tree is a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node holds a class label.
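For intuition, a trained tree amounts to nested if/else tests on feature values. The sketch below is a hand-written, made-up example (the attributes and labels are illustrative, not from the slides):

```python
def predict_play_tennis(outlook, humidity):
    """A made-up tiny tree: internal nodes test attributes, leaves return a class."""
    if outlook == "Overcast":                      # root test
        return "Yes"                               # leaf
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    return "Yes"                                   # Rainy branch (simplified)

print(predict_play_tennis("Sunny", "High"))  # -> "No"
```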
Attribute Selection Measures

An attribute selection measure (ASM) is a heuristic for selecting the splitting criterion that partitions the data in the best possible manner. It is also known as a splitting rule because it helps us determine breakpoints for tuples on a given node. An ASM ranks each feature (or attribute) of the given dataset, and the attribute with the best score is selected as the splitting attribute (Source). In the case of a continuous-valued attribute, split points for the branches also need to be defined.
Information Gain

Information gain measures the reduction in entropy obtained by splitting the dataset on an attribute:

Gain(S, A) = H(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · H(S_v)

Where:
• S is the original dataset
• A is the attribute being evaluated
• S_v is the subset of S for which attribute A has value v
• H(S) is the entropy of the original dataset
• H(S_v) is the entropy of the subset S_v

The attribute that yields the largest information gain (IG) is chosen for the decision node.
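To make the formula concrete, here is a minimal Python sketch (the toy dataset, column choices, and function names are illustrative assumptions, not taken from the slides) that computes entropy and information gain:

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions in `labels`."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    total = len(labels)
    # Group the class labels by the value of attribute A (given by its column index).
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute_index], []).append(label)
    weighted_child_entropy = sum(
        (len(subset) / total) * entropy(subset) for subset in subsets.values()
    )
    return entropy(labels) - weighted_child_entropy

# Toy "play tennis"-style data: columns are [Outlook, Windy]; labels are Yes/No.
rows = [["Sunny", "False"], ["Sunny", "True"], ["Overcast", "False"],
        ["Rainy", "False"], ["Rainy", "True"]]
labels = ["No", "No", "Yes", "Yes", "No"]
print(information_gain(rows, labels, 0))  # gain from splitting on Outlook
print(information_gain(rows, labels, 1))  # gain from splitting on Windy
```

Whichever attribute prints the higher gain would be selected for the node, mirroring the rule stated above.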
Gini index
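The Gini index scores the impurity of a set of labels as Gini(S) = 1 − Σ_i p_i², where p_i is the proportion of class i; splits with a lower weighted Gini impurity across the child subsets are preferred. A minimal illustrative sketch (the function name and sample labels are assumptions, not from the slides):

```python
from collections import Counter

def gini(labels):
    """Gini(S) = 1 - sum(p_i ** 2) over the class proportions in `labels`."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

# A pure node scores 0.0; a 50/50 two-class node scores 0.5.
print(gini(["No", "No", "Yes", "Yes", "No"]))  # 0.48 for a 3/2 class split
```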
EXAMPLE
How can we avoid over-fitting in decision trees?
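Typical answers are pre-pruning (constraining the tree as it grows, e.g. limiting depth or the minimum samples per leaf) and post-pruning (growing the full tree and then cutting it back, e.g. cost-complexity pruning). The scikit-learn sketch below illustrates both; the Iris dataset and the specific parameter values are stand-ins, not choices from the slides:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: cap the tree depth and require a minimum number of samples per leaf.
pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pruned.fit(X_train, y_train)

# Post-pruning: cost-complexity pruning controlled by ccp_alpha.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post_pruned.fit(X_train, y_train)

print("pre-pruned accuracy:", pruned.score(X_test, y_test))
print("post-pruned accuracy:", post_pruned.score(X_test, y_test))
```

In practice, the pruning strength (max_depth, min_samples_leaf, or ccp_alpha) is chosen by cross-validation on held-out data.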