Unit 3
ID3 (Iterative Dichotomizer 3)
ID3 stands for Iterative Dichotomizer 3 and is named such because the
algorithm iteratively (repeatedly) dichotomizes (divides) features into two or
more groups at each step. ID3 was invented by Ross Quinlan to generate a
decision tree from a dataset, and it is one of the most popular algorithms for
constructing trees.
ID3 is the core algorithm for building a decision tree. It employs a top-down
greedy search through the space of all possible branches, with no backtracking.
The algorithm uses entropy and information gain to construct a classification
decision tree.
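As a minimal Python sketch of these two quantities (the function names and toy
labels below are illustrative, not part of the original notes):

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, groups):
    """Entropy of the parent node minus the weighted entropy of its children."""
    n = len(parent)
    children = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(parent) - children

# Toy example: a split that separates the classes perfectly.
parent = ['yes', 'yes', 'yes', 'no', 'no']
left, right = ['yes', 'yes', 'yes'], ['no', 'no']
print(information_gain(parent, [left, right]))  # 0.971 bits, the parent entropy

At each step, ID3 evaluates this gain for every available attribute and splits
on the one with the highest value.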
Characteristics of ID3 Algorithm
The major characteristics of the ID3 algorithm are listed below:
•ID3 can overfit the training data (to avoid overfitting, smaller
decision trees should be preferred over larger ones).
•This algorithm usually produces small trees, but it does not
always produce the smallest possible tree.
•ID3 is harder to use on continuous data (if the values of a given
attribute are continuous, there are many more places to split the data
on that attribute, and searching for the best value to split on can be
time-consuming; see the sketch after this list).
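To illustrate the cost of continuous attributes, a common practice is to take
midpoints between consecutive sorted unique values as candidate thresholds, so
an attribute with k distinct values yields k - 1 candidate splits, each of
which must be scored. A small Python sketch (the values are hypothetical):

# Candidate thresholds for a continuous attribute: midpoints between
# consecutive sorted unique values.
values = [2.3, 7.1, 4.5, 4.5, 9.0]
unique = sorted(set(values))
candidates = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
print(candidates)  # [3.4, 5.8, 8.05]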
C4.5
The C4.5 algorithm is a successor of ID3. The most significant difference between C4.5
and ID3 is that C4.5 supports continuous features by partitioning their
numerical values into discrete intervals.
•Instead of information gain, C4.5 uses the information gain ratio to determine the best
feature to split on (see the sketch after this list).
•C4.5 uses post-pruning after growing an overly large tree.
•Because the gain ratio penalizes attributes with many distinct values, C4.5 has a
bias towards features with a smaller number of distinct values.
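A minimal Python sketch of the gain ratio, building on the information_gain
function sketched earlier (the function names are illustrative):

import math

def split_information(group_sizes):
    """Entropy of the proportions of examples sent to each branch."""
    n = sum(group_sizes)
    return -sum((s / n) * math.log2(s / n) for s in group_sizes if s > 0)

def gain_ratio(info_gain, group_sizes):
    """C4.5's criterion: information gain normalized by split information."""
    si = split_information(group_sizes)
    return info_gain / si if si > 0 else 0.0

# The perfect split from the ID3 example sent 3 and 2 examples to its branches:
print(gain_ratio(0.971, [3, 2]))  # ~1.0

Dividing by the split information is what counteracts information gain's
preference for attributes with many values.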
CART (Classification and Regression Tree)
CART, like C4.5, supports both categorical and numerical data. However, CART differs
from C4.5 in that it also supports regression.
One major difference between CART, ID3, and C4.5 is the feature selection criterion. While ID3
and C4.5 use entropy, information gain, and the information gain ratio, CART uses Gini impurity.
Additionally, CART constructs a binary tree, which means that every internal node has exactly
two children, unlike the other algorithms we have discussed, which do not necessarily have two
child nodes per parent.
•When training a CART decision tree, the best split is chosen by minimizing the Gini impurity
(see the sketch after this list).
•CART uses post-pruning after growing an overly large tree.
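A minimal Python sketch of Gini impurity and of scoring a candidate binary
split by its weighted impurity (the toy splits are illustrative):

from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(groups):
    """Weighted Gini impurity of a candidate binary split."""
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * gini(g) for g in groups)

# CART keeps the candidate split with the lowest weighted impurity.
split_a = [['yes', 'yes', 'yes'], ['no', 'no']]   # pure split -> 0.0
split_b = [['yes', 'no'], ['yes', 'yes', 'no']]   # mixed split -> ~0.467
print(weighted_gini(split_a), weighted_gini(split_b))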