01 Section 6.2.1 QR Code Content
Example 1: Let us look at a simple example of constructing a decision tree with a training dataset
consisting of 10 data instances and 3 attributes {A1, A2, A3}, as shown in Table 1. The last column
of the table is the target class to be predicted. Each row in the table is called a data instance, and
each data instance is classified as 'Positive' or 'Negative'. Construct the decision tree and predict the
target class for a given test instance.
Table 1
S.No. A1 A2 A3 Target Class
1 10 0.5 2.5 Negative
2 7 1.3 3 Positive
3 5 0.7 5.6 Negative
4 9 1.0 4.3 Positive
5 10 1.2 7.1 Positive
6 8 0.9 6.5 Negative
7 6 0.8 3.2 Negative
8 4 1.5 5.1 Negative
9 9 1.7 4.7 Positive
10 8 1.6 7.4 Positive
Solution:
Table 2 depicts the number of data instances categorized as 'Positive' or 'Negative' in each category
of each of the three attributes in the training dataset.
Table 2
A1    Positive  Negative  Total
≥5    5         4         9
<5    0         1         1

A2    Positive  Negative  Total
≥1    5         1         6
<1    0         4         4

A3    Positive  Negative  Total
≥5    2         3         5
<5    3         2         5
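These counts can be verified directly from Table 1. The following Python sketch is only an illustration (it assumes the data are stored as (A1, A2, A3, target) tuples and uses the split points 5, 1 and 5 shown in Table 2):

# Minimal sketch: tally Positive/Negative counts per category for Table 2.
data = [
    (10, 0.5, 2.5, "Negative"),
    (7,  1.3, 3.0, "Positive"),
    (5,  0.7, 5.6, "Negative"),
    (9,  1.0, 4.3, "Positive"),
    (10, 1.2, 7.1, "Positive"),
    (8,  0.9, 6.5, "Negative"),
    (6,  0.8, 3.2, "Negative"),
    (4,  1.5, 5.1, "Negative"),
    (9,  1.7, 4.7, "Positive"),
    (8,  1.6, 7.4, "Positive"),
]

splits = {"A1": (0, 5), "A2": (1, 1), "A3": (2, 5)}  # attribute -> (column index, split point S)

for name, (col, s) in splits.items():
    for side, keep in ((">=", lambda v: v >= s), ("<", lambda v: v < s)):
        subset = [row for row in data if keep(row[col])]
        pos = sum(1 for row in subset if row[3] == "Positive")
        neg = len(subset) - pos
        print(f"{name} {side} {s}: Positive={pos}, Negative={neg}, Total={len(subset)}")

Running it prints the same Positive, Negative and Total counts for all six categories of Table 2.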
Step 1: First, we calculate the entropy of the whole dataset T based on the target class. The target
class has 5 data instances classified as 'Positive' and 5 classified as 'Negative', so the class entropy
is Entropy_Info(5,5).
Entropy_Info(T) = Entropy_Info(5,5)
= -[(5/10) log₂(5/10) + (5/10) log₂(5/10)]
= -(-0.4997 - 0.4997)
= 0.9994
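The same entropy expression can be written as a small helper function. This is only an illustrative sketch; the name entropy_info mirrors the Entropy_Info notation of this example and is not part of any library:

import math

def entropy_info(*counts):
    """Entropy of a class distribution given its raw counts, e.g. entropy_info(5, 5)."""
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c:                       # a count of 0 contributes nothing (0 * log2(0) is taken as 0)
            p = c / total
            entropy -= p * math.log2(p)
    return entropy

print(entropy_info(5, 5))   # exactly 1.0; the hand calculation gives 0.9994 because of rounded log values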
Step 2: Now we calculate Entropy_Info for each of the attributes in the training dataset. Since
all the attributes in this example are continuous-valued, they are discretized by assuming a split point
S: all values ≥ S belong to one category and all values < S belong to the other. Thus, the
continuous values are converted into categorical values, and every attribute in this example has 2 categories.
Table 2 shows the total number of data instances belonging to each category of an attribute and,
within each category, how many instances are classified as 'Positive' and how many are
classified as 'Negative'. For example, in attribute A1, 9 data instances are ≥ 5 and 1 data instance is < 5.
Among the 9 data instances, 5 are classified as 'Positive' and 4 as 'Negative'. Hence, we
calculate Entropy_Info for A1 as below:
Entropy_Info(T, A1) = (9/10) Entropy_Info(5,4) + (1/10) Entropy_Info(0,1)
= (9/10) [-(5/9) log₂(5/9) - (4/9) log₂(4/9)] + (1/10) [-(0/1) log₂(0/1) - (1/1) log₂(1/1)]
= (9/10) (0.4708 + 0.5196) + (1/10) (0)
= 0.8913
After calculating Entropy_Info(T, A1), we obtain the gain of A1 by subtracting it from Entropy_Info(T):

Gain(A1) = Entropy_Info(T) - Entropy_Info(T, A1)
= 0.9994 - 0.8913
= 0.1081
Entropy_Info(T, A2) = (6/10) [-(5/6) log₂(5/6) - (1/6) log₂(1/6)] + (4/10) [-(0/4) log₂(0/4) - (4/4) log₂(4/4)]
= (6/10) (0.2190 + 0.4305) + 0
= 0.3897

Gain(A2) = 0.9994 - 0.3897
= 0.6097
Entropy_Info(T, A3) = (5/10) [-(2/5) log₂(2/5) - (3/5) log₂(3/5)] + (5/10) [-(3/5) log₂(3/5) - (2/5) log₂(2/5)]
= (5/10) (0.5284 + 0.4419) + (5/10) (0.5284 + 0.4419)
= 0.9703

Gain(A3) = 0.9994 - 0.9703
= 0.0291
Table 3 shows the gain value calculated for each attribute.
Table 3
Attribute Gain
A1 0.1081
A2 0.6097
A3 0.0291
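The gain values in Table 3 can be reproduced from the counts in Table 2. The sketch below is again only an illustration (it repeats the entropy_info helper so that it runs on its own); because it uses exact logarithms, its output agrees with Table 3 up to the rounding of the hand calculation:

import math

def entropy_info(*counts):
    # Entropy of a class distribution from raw counts (same helper as above).
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# (Positive, Negative) counts for the two categories of each attribute, from Table 2.
partitions = {"A1": [(5, 4), (0, 1)],   # A1 >= 5 and A1 < 5
              "A2": [(5, 1), (0, 4)],   # A2 >= 1 and A2 < 1
              "A3": [(2, 3), (3, 2)]}   # A3 >= 5 and A3 < 5

class_entropy = entropy_info(5, 5)      # Entropy_Info(T) for the whole dataset
total = 10

for attr, groups in partitions.items():
    weighted = sum(sum(g) / total * entropy_info(*g) for g in groups)
    print(f"Gain({attr}) = {class_entropy - weighted:.4f}")
# prints Gain(A1) = 0.1080, Gain(A2) = 0.6100, Gain(A3) = 0.0290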
Step 3:
Now we choose the attribute with the maximum gain as the best split attribute. Here A2 has the
maximum gain, so it is placed at the root node. A2 has two outcomes: ≥ 1 and < 1.
Table 4 shows the data instances whose A2 values are < 1. All of them are categorized as 'Negative',
so the entropy of this subset is 0. Hence, the branch A2 < 1 ends in a leaf node with class
'Negative'.
Table 4
S.No A1 A2 A3 Target Class
1 10 0.5 2.5 Negative
3 5 0.7 5.6 Negative
6 8 0.9 6.5 Negative
7 6 0.8 3.2 Negative
All the other data instances, with A2 values ≥ 1, form another subset, and the process is repeated on
it in Iteration 2. This is shown in Figure 1.
Figure 1: Decision tree after Iteration 1, with A2 at the root; the branch A2 < 1 is a leaf labelled 'Negative' and the branch A2 ≥ 1 is expanded further.
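Because the same split-selection step is applied again to each impure subset, the construction is naturally expressed as a recursive routine. The sketch below is only an illustration of that idea (the function and variable names are our own, the split points are fixed to those of Table 2, and edge cases such as an impure subset with no attributes left are ignored):

import math
from collections import Counter

def entropy(rows):
    # Entropy of the target labels (last element of each row).
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def build_tree(rows, splits):
    # splits maps attribute name -> (column index, split point S), as in Table 2.
    labels = {row[-1] for row in rows}
    if len(labels) == 1:                            # pure subset: make a leaf node
        return labels.pop()
    best_attr, best_gain, best_parts = None, -1.0, None
    for attr, (col, s) in splits.items():
        left = [r for r in rows if r[col] >= s]     # the ">= S" category
        right = [r for r in rows if r[col] < s]     # the "< S" category
        weighted = sum(len(p) / len(rows) * entropy(p) for p in (left, right) if p)
        gain = entropy(rows) - weighted
        if gain > best_gain:
            best_attr, best_gain, best_parts = attr, gain, (left, right)
    remaining = {a: v for a, v in splits.items() if a != best_attr}
    left, right = best_parts
    return {"attribute": best_attr, "split": splits[best_attr][1],
            ">=": build_tree(left, remaining) if left else None,
            "<": build_tree(right, remaining) if right else None}

data = [(10, 0.5, 2.5, "Negative"), (7, 1.3, 3.0, "Positive"), (5, 0.7, 5.6, "Negative"),
        (9, 1.0, 4.3, "Positive"), (10, 1.2, 7.1, "Positive"), (8, 0.9, 6.5, "Negative"),
        (6, 0.8, 3.2, "Negative"), (4, 1.5, 5.1, "Negative"), (9, 1.7, 4.7, "Positive"),
        (8, 1.6, 7.4, "Positive")]
splits = {"A1": (0, 5), "A2": (1, 1), "A3": (2, 5)}

tree = build_tree(data, splits)
print(tree["attribute"])        # 'A2' is selected at the root, as in Figure 1
print(tree[">="]["attribute"])  # 'A1' is selected for the A2 >= 1 branch

On this dataset the sketch selects A2 at the root and A1 under the A2 ≥ 1 branch, matching the tree derived step by step in Iteration 2 below.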
Table 5 depicts the number of data instances categorized as 'Positive' or 'Negative' for the attributes
A1 and A3 within this subset (A2 ≥ 1).
Table 5
A1    Positive  Negative  Total
≥5    5         0         5
<5    0         1         1

A3    Positive  Negative  Total
≥5    2         1         3
<5    3         0         3
The entropy of this subset (5 'Positive' and 1 'Negative' instances) is:

Entropy_Info(T) = Entropy_Info(5,1)
= -[(5/6) log₂(5/6) + (1/6) log₂(1/6)]
= -(-0.2190 - 0.4305)
= 0.6495
Entropy_Info(T, A1) = (5/6) [-(5/5) log₂(5/5) - (0/5) log₂(0/5)] + (1/6) [-(0/1) log₂(0/1) - (1/1) log₂(1/1)]
= 0 + 0
= 0

Gain(A1) = 0.6495 - 0
= 0.6495
Entropy_Info(T, A3) = (3/6) [-(2/3) log₂(2/3) - (1/3) log₂(1/3)] + (3/6) [-(3/3) log₂(3/3) - (0/3) log₂(0/3)]
= (3/6) (0.3900 + 0.5283) + 0
= 0.4591

Gain(A3) = 0.6495 - 0.4591
= 0.1904
Table 6
Attribute Gain
A1 0.6495
A3 0.1904
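These gains can be checked in the same way as before, now using the subset counts of Table 5 (again an illustrative sketch with exact logarithms, so the values agree with Table 6 only up to rounding):

import math

def entropy_info(*counts):
    # Entropy of a class distribution from raw counts (same helper as above).
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

subset_entropy = entropy_info(5, 1)     # the A2 >= 1 subset: 5 Positive, 1 Negative
total = 6

# (Positive, Negative) counts per category within the subset, from Table 5.
partitions = {"A1": [(5, 0), (0, 1)],   # A1 >= 5 and A1 < 5
              "A3": [(2, 1), (3, 0)]}   # A3 >= 5 and A3 < 5

for attr, groups in partitions.items():
    weighted = sum(sum(g) / total * entropy_info(*g) for g in groups)
    print(f"Gain({attr}) = {subset_entropy - weighted:.4f}")
# prints Gain(A1) = 0.6500 and Gain(A3) = 0.1909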
As shown in Table 6, A1, with the maximum information gain, is chosen as the best split attribute. The
branch A1 ≥ 5 covers 5 instances of the subset considered in Iteration 2, all of them 'Positive', so its
entropy is 0 and it ends in a leaf node labelled 'Positive'. The branch A1 < 5 has 1 instance, classified
as 'Negative', so it also ends in a leaf node, labelled 'Negative'.
Figure 2: Final decision tree, with A2 at the root, A1 below the branch A2 ≥ 1, and leaf nodes labelled 'Positive' or 'Negative'.
The final decision tree is shown in Figure 2. Given a test instance, the target class can now be
predicted. The sample test instance is (6, 1, 6.7). We traverse the tree from root to leaf, taking at
each node the branch whose condition the instance satisfies: A2 = 1 ≥ 1, so we follow the A2 ≥ 1
branch to node A1; A1 = 6 ≥ 5, so we reach the leaf 'Positive'. The predicted class label is 'Positive'.
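The finished tree of Figure 2 is small enough to be written out directly as nested conditions. The sketch below (our own encoding of the tree described above, not a general-purpose implementation) classifies the sample test instance:

def predict(a1, a2, a3):
    # Traverse the tree of Figure 2: test A2 at the root, then A1 on the A2 >= 1 branch.
    # a3 is accepted for completeness but never tested, since A3 does not appear in the tree.
    if a2 < 1:
        return "Negative"      # leaf from Table 4: every instance with A2 < 1 is Negative
    if a1 >= 5:
        return "Positive"      # all 5 such instances in Iteration 2 are Positive
    return "Negative"          # the single remaining instance (A1 < 5) is Negative

print(predict(6, 1, 6.7))      # A2 = 1 >= 1, then A1 = 6 >= 5  ->  'Positive'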