Unit-3 (MLT)
The term |Dj| / |D| acts as the weight of the jth partition.
Information gain is defined as the difference between the original
information requirement (i.e., based on just the proportion of classes) and
the new requirement (i.e., obtained after partitioning on A).
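The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `info`, `info_after_split`, and `gain` are made up for this sketch, and the class counts at the bottom are hypothetical toy data rather than the example used in these notes.

```python
from math import log2

def info(class_counts):
    """Expected information (entropy, in bits) of a class distribution.

    class_counts holds the number of tuples in each class; empty
    classes are skipped because 0 * log2(0) is taken to be 0.
    """
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

def info_after_split(partitions):
    """Information still needed after partitioning on an attribute A.

    partitions is a list of class-count lists, one per partition Dj.
    Each partition is weighted by |Dj| / |D|.
    """
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * info(p) for p in partitions)

def gain(class_counts, partitions):
    """Information gain: original requirement minus new requirement."""
    return info(class_counts) - info_after_split(partitions)

# Hypothetical toy data: 4 tuples, 2 per class, split into two pure partitions.
toy_gain = gain([2, 2], [[2, 0], [0, 2]])
print(toy_gain)
```

A split that yields only pure partitions removes all remaining uncertainty, so in this toy case the gain equals the original information requirement.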
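The expected information needed to classify a tuple in D is
$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i),$$
where p_i is the probability that a tuple in D belongs to class C_i, estimated by the fraction of tuples in D that belong to C_i.
The expected information needed to classify a tuple in D if the
tuples are partitioned according to age is
$$Info_{age}(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j),$$
where D_1, ..., D_v are the partitions of D induced by the values of age.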
Gain(income) = 0.029.
Therefore, GainRatio(income) = Gain(income) / SplitInfo_income(D) = 0.029/1.557 = 0.019.
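The arithmetic above can be checked with a short Python sketch. The partition sizes 4, 6, and 4 (out of 14 tuples) are an assumption: they are not stated in these notes, but they reproduce the split information of 1.557 used above. The function names `split_info` and `gain_ratio` are hypothetical helpers for this illustration.

```python
from math import log2

def split_info(partition_sizes):
    """Split information of an attribute, from the sizes |Dj| of its partitions."""
    total = sum(partition_sizes)
    return -sum((n / total) * log2(n / total) for n in partition_sizes if n > 0)

def gain_ratio(gain, partition_sizes):
    """Gain ratio: information gain normalized by split information.

    Undefined (division by zero) when all tuples fall into one partition.
    """
    return gain / split_info(partition_sizes)

# Assumed partition sizes for income; the notes give only the result 1.557.
income_sizes = [4, 6, 4]
income_split_info = split_info(income_sizes)
income_gain_ratio = gain_ratio(0.029, income_sizes)

print(round(income_split_info, 3))   # 1.557
print(round(income_gain_ratio, 3))   # 0.019
```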
Attribute Selection Measures
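3) Gini index
The Gini index is used in CART.
The Gini index measures the impurity of D, a data partition or
set of training tuples, as
$$Gini(D) = 1 - \sum_{i=1}^{m} p_i^2,$$
where p_i is the probability that a tuple in D belongs to class C_i and m is the number of classes.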
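As a minimal sketch, the impurity measure can be computed from class counts, assuming the standard definition of one minus the sum of squared class proportions. The function name `gini` and the counts below are hypothetical and chosen only for illustration.

```python
def gini(class_counts):
    """Gini index of a partition D, given the number of tuples per class.

    Computes 1 - sum(p_i ** 2), where p_i is the proportion of class i.
    """
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

# Hypothetical counts: an evenly mixed two-class partition and a pure one.
mixed = gini([5, 5])
pure = gini([10, 0])
print(mixed, pure)
```

A pure partition has impurity 0, while an evenly mixed two-class partition reaches the two-class maximum of 0.5.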