Entropy and Information Gain
We would like to select the attribute that is most useful for classifying examples. A useful measure of this is entropy, which characterizes the (im)purity of a collection of examples. Given a collection S containing positive and negative examples of some target concept, the entropy of S relative to this boolean classification is

Entropy(S) = -p_+ log_2(p_+) - p_- log_2(p_-)

where p_+ is the proportion of positive examples in S and p_- is the proportion of negative examples in S.
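As a minimal sketch (the function name entropy and the counts-based interface are illustrative, not from the text), the two-class entropy can be computed directly from the numbers of positive and negative examples:

    import math

    def entropy(pos, neg):
        """Entropy of a boolean-labeled collection with `pos` positive
        and `neg` negative examples: -p+ log2(p+) - p- log2(p-)."""
        total = pos + neg
        if total == 0:
            return 0.0
        result = 0.0
        for count in (pos, neg):
            p = count / total
            if p > 0:                  # treat 0 * log2(0) as 0
                result -= p * math.log2(p)
        return result

    print(entropy(7, 7))   # 1.0 -- a perfectly mixed collection
    print(entropy(14, 0))  # 0.0 -- a pure collection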
Now, information gain is simply the expected reduction in entropy caused by partitioning the examples according to a given attribute. More precisely, the information gain Gain(S, A) of an attribute A, relative to a collection of examples S, is defined as

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

where Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has value v (i.e., S_v = {s ∈ S | A(s) = v}).
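Building on the entropy sketch above, this definition translates directly into code; representing S as a list of (attribute-dictionary, boolean label) pairs is an assumption made for illustration:

    def information_gain(examples, attribute):
        """Gain(S, A): entropy of S minus the size-weighted entropies of
        the subsets S_v obtained by partitioning S on the attribute."""
        def pos_neg(subset):
            pos = sum(1 for _, label in subset if label)
            return pos, len(subset) - pos

        values = {attrs[attribute] for attrs, _ in examples}
        remainder = 0.0
        for v in values:
            subset = [(a, l) for a, l in examples if a[attribute] == v]
            remainder += len(subset) / len(examples) * entropy(*pos_neg(subset))
        return entropy(*pos_neg(examples)) - remainder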
For example, suppose S is a collection of training-example days described by attributes
including Wind, which can have the values Weak or Strong.
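To make the example concrete (the counts below are assumed for illustration, echoing the classic PlayTennis data rather than given in this text), suppose there are 14 such days, splitting [6+, 2-] when Wind is Weak and [3+, 3-] when Wind is Strong:

    # Hypothetical collection of 14 training-example days.
    days = (
        [({"Wind": "Weak"}, True)] * 6 + [({"Wind": "Weak"}, False)] * 2 +
        [({"Wind": "Strong"}, True)] * 3 + [({"Wind": "Strong"}, False)] * 3
    )
    # Entropy(S) ≈ 0.940; the Strong subset is perfectly mixed (entropy 1.0)
    # and the Weak subset is purer (entropy ≈ 0.811), so the gain is small.
    print(round(information_gain(days, "Wind"), 3))  # ≈ 0.048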
Information gain is precisely the measure used by ID3 to select the best attribute at each
step in growing the tree.
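Under the same assumed representation, ID3's selection step reduces to taking the attribute of maximal gain; a one-line sketch:

    def best_attribute(examples, attributes):
        # ID3's greedy choice: the attribute with the highest information gain.
        return max(attributes, key=lambda a: information_gain(examples, a))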