Lec 3
Machine Learning
Dr Muhammad Sharjeel
Decision Trees
The general goal of a Decision Tree (DT) is to build a model that predicts the class (or value) of a target variable by learning decision rules inferred from prior data (training data).
In a DT, each internal node represents a feature (attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome.
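To make the node/branch/leaf vocabulary concrete, here is a minimal Python sketch that stores a tree as nested dictionaries and classifies a row by following branches from the root down to a leaf. The dictionary layout and the `predict` helper are hypothetical illustrations, not part of the lecture. The tree itself is the Play Golf tree this lecture arrives at with ID3 (Outlook at the root, Humidity under Sunny, Wind under Rain).

```python
# Hypothetical layout: an internal node is {"attribute": name, "branches": {value: subtree}};
# a leaf is just the outcome string.
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity", "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind", "branches": {"Weak": "Yes", "Strong": "No"}},
    },
}

def predict(node, row):
    """Follow the decision rules from the root until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        value = row[node["attribute"]]   # the feature tested at this node
        node = node["branches"][value]   # the branch (rule) matching the row's value
    return node                          # the leaf outcome

row = {"Outlook": "Rain", "Temperature": "Mild", "Humidity": "High", "Wind": "Weak"}
print(predict(TREE, row))  # Yes
```

Only the attributes on the path actually taken are read; Temperature is never tested by this tree.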
Attribute     Gain
Outlook       0.247
Temperature   0.029
Humidity      0.152
Wind          0.048
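For reference, the quantities in these calculations follow the standard ID3 definitions, written here in the lecture's notation (IE is the weighted entropy after splitting on attribute $A$). The symbols $p_i$ (proportion of class $i$ in $S$) and $S_v$ (rows of $S$ with $A = v$) are notation added here, not taken from the slides.

```latex
\mathrm{Entropy}(S) = -\sum_{i} p_i \log_2 p_i, \qquad
\mathrm{IE}(A) = \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v), \qquad
\mathrm{IG}(A) = \mathrm{Entropy}(S) - \mathrm{IE}(A)
```

For example, in the Sunny branch (5 rows: 1 Cool, 2 Hot, 2 Mild), $\mathrm{IE}(\text{Temperature}) = \tfrac{1}{5}(0) + \tfrac{2}{5}(0) + \tfrac{2}{5}(1) = 0.400$, so $\mathrm{IG}(\text{Temperature}) = 0.971 - 0.400 = 0.571$.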
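Figure: Outlook, with the highest gain, becomes the root node. The Overcast branch is already a pure leaf (Yes); the Sunny and Rain branches (?) must be split further.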
Outlook = Sunny
Outlook  Temperature  Humidity  Wind    PlayGolf
Sunny    Hot          High      Weak    No
Sunny    Hot          High      Strong  No
Sunny    Mild         High      Weak    No
Sunny    Cool         Normal    Weak    Yes
Sunny    Mild         Normal    Strong  Yes

Outlook = Rain
Outlook  Temperature  Humidity  Wind    PlayGolf
Rain     Mild         High      Weak    Yes
Rain     Cool         Normal    Weak    Yes
Rain     Cool         Normal    Strong  No
Rain     Mild         Normal    Weak    Yes
Rain     Mild         High      Strong  No
Sunny branch (5 rows: 2 Yes, 3 No)
Entropy(S) = 0.971
Entropy(A)[Temperature](Cool) = 0
Entropy(A)[Temperature](Hot) = 0
Entropy(A)[Temperature](Mild) = 1
IE(Temperature) = 0.400
IG(Temperature) = 0.571
Entropy(S) = 0.971
Entropy(A)[Humidity](High) = 0
Entropy(A)[Humidity](Normal) = 0
IE(Humidity) = 0
IG(Humidity) = 0.971
Entropy(S) = 0.971
Entropy(A)[Wind](Strong) = 1
Entropy(A)[Wind](Weak) = 0.918
IE(Wind) = 0.951
IG(Wind) = 0.020
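The Sunny-branch numbers above can be reproduced with a short Python sketch. It uses only the five Sunny rows from the table (Outlook column dropped); the helper names `entropy` and `info_gain` are illustrative, not from the lecture.

```python
from collections import Counter
from math import log2

# Sunny subset from the table: (Temperature, Humidity, Wind, PlayGolf)
SUNNY = [
    ("Hot", "High", "Weak", "No"),
    ("Hot", "High", "Strong", "No"),
    ("Mild", "High", "Weak", "No"),
    ("Cool", "Normal", "Weak", "Yes"),
    ("Mild", "Normal", "Strong", "Yes"),
]
COLUMNS = {"Temperature": 0, "Humidity": 1, "Wind": 2}

def entropy(labels):
    """Entropy(S) = -sum p_i * log2(p_i) over the class proportions."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    """Return (IE, IG) for the attribute in column `col`; the class label is the last field."""
    labels = [r[-1] for r in rows]
    ie = 0.0
    for value in {r[col] for r in rows}:
        branch = [r[-1] for r in rows if r[col] == value]
        ie += len(branch) / len(rows) * entropy(branch)  # weighted entropy of this branch
    return ie, entropy(labels) - ie

for name, col in COLUMNS.items():
    ie, ig = info_gain(SUNNY, col)
    print(f"IE({name}) = {ie:.3f}  IG({name}) = {ig:.3f}")
```

Humidity splits the Sunny rows into two pure groups (IE = 0), which is why it has the largest gain.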
Figure: partial tree. Outlook at the root; Sunny → Humidity (Normal: Yes, High: No); Overcast → Yes; Rain → ? (still to be split).
Rain branch (5 rows: 3 Yes, 2 No)
Entropy(S) = 0.971
Entropy(A)[Temperature](Cool) = 1
Entropy(A)[Temperature](Mild) = 0.918
IE(Temperature) = 0.951
IG(Temperature) = 0.020
Entropy(S) = 0.971
Entropy(A)[Humidity](High) = 1
Entropy(A)[Humidity](Normal) = 0.918
IE(Humidity) = 0.951
IG(Humidity) = 0.020
Entropy(S) = 0.971
Entropy(A)[Wind](Weak) = 0
Entropy(A)[Wind](Strong) = 0
IE(Wind) = 0
IG(Wind) = 0.971
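The branch-by-branch procedure above is what ID3 does recursively. The following is a minimal ID3 sketch in Python, assuming the usual stopping rules (stop at a pure node or when no attributes remain, labelling the leaf with the majority class); it is an illustration, not the lecture's code. Applied to the Sunny and Rain subsets from the table, it recovers the Humidity and Wind subtrees.

```python
from collections import Counter
from math import log2

NAMES = ["Temperature", "Humidity", "Wind"]  # PlayGolf is the last field of each row

# Sunny and Rain subsets from the table (Outlook column dropped).
SUBSETS = {
    "Sunny": [("Hot", "High", "Weak", "No"), ("Hot", "High", "Strong", "No"),
              ("Mild", "High", "Weak", "No"), ("Cool", "Normal", "Weak", "Yes"),
              ("Mild", "Normal", "Strong", "Yes")],
    "Rain": [("Mild", "High", "Weak", "Yes"), ("Cool", "Normal", "Weak", "Yes"),
             ("Cool", "Normal", "Strong", "No"), ("Mild", "Normal", "Weak", "Yes"),
             ("Mild", "High", "Strong", "No")],
}

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    """IG = Entropy(S) - IE(attribute in column `col`)."""
    ie = 0.0
    for v in {r[col] for r in rows}:
        branch = [r[-1] for r in rows if r[col] == v]
        ie += len(branch) / len(rows) * entropy(branch)
    return entropy([r[-1] for r in rows]) - ie

def id3(rows, cols):
    """Return a leaf label, or {attribute: {value: subtree}} split on the highest-gain column."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not cols:
        return Counter(labels).most_common(1)[0][0]  # pure node or no attributes left
    best = max(cols, key=lambda c: gain(rows, c))
    remaining = [c for c in cols if c != best]
    return {NAMES[best]: {v: id3([r for r in rows if r[best] == v], remaining)
                          for v in sorted({r[best] for r in rows})}}

for outlook, rows in SUBSETS.items():
    print(outlook, id3(rows, [0, 1, 2]))
# Sunny {'Humidity': {'High': 'No', 'Normal': 'Yes'}}
# Rain {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}
```

Each attribute is used at most once per path, since the chosen column is removed before recursing.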
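Figure: final tree. Outlook at the root; Sunny → Humidity (Normal: Yes, High: No); Overcast → Yes; Rain → Wind (Weak: Yes, Strong: No).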
The entropy of the whole dataset, the weighted entropy of the Outlook attribute, and the information gain of Outlook were already calculated for ID3:
Entropy(S) = 0.940
IE(Outlook) = 0.693
IG(Outlook) = 0.940 - 0.693 = 0.247
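A small Python sketch can recompute these values from class counts and extend them to a gain ratio, assuming the usual C4.5 definition GainRatio = IG / SplitInfo, where SplitInfo is the entropy of the branch sizes. The Sunny (2 Yes, 3 No) and Rain (3 Yes, 2 No) counts come from the subset tables; the Overcast counts (4 Yes, 0 No) are inferred from the pure Overcast leaf and its 4/14 weight in the regression example.

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as raw counts, e.g. [yes, no]."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

# [Yes, No] counts per Outlook value (Overcast counts inferred, see above).
OUTLOOK = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}

n = sum(sum(c) for c in OUTLOOK.values())                   # 14 rows
whole = entropy([sum(c[0] for c in OUTLOOK.values()),
                 sum(c[1] for c in OUTLOOK.values())])       # Entropy(S), 9 Yes / 5 No
ie = sum(sum(c) / n * entropy(c) for c in OUTLOOK.values())  # IE(Outlook)
ig = whole - ie                                              # IG(Outlook)
split_info = entropy([sum(c) for c in OUTLOOK.values()])     # entropy of branch sizes 5/4/5
gain_ratio = ig / split_info                                 # C4.5 gain ratio

print(f"Entropy(S)={whole:.3f} IG={ig:.3f} SplitInfo={split_info:.3f} GainRatio={gain_ratio:.3f}")
```

Dividing by SplitInfo penalises attributes that fragment the data into many small branches, which plain information gain tends to favour.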
Outlook has the lowest Gini index, so it is chosen as the root node.
Within Outlook, the binary split {Sunny, Rain} vs {Overcast}, i.e. [Gini(S,R), O], has the lowest Gini index.
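The comparison of Outlook's binary splits can be sketched in Python, assuming the usual CART Gini index 1 - sum(p_i^2), weighted by branch size (lower means purer). The [Yes, No] counts per Outlook value are the same as in the ID3 example, with Overcast's 4 Yes / 0 No inferred from its pure leaf. With three values, each binary split isolates exactly one value.

```python
# [Yes, No] counts per Outlook value (Overcast inferred from its pure Yes leaf).
OUTLOOK = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}

def gini(counts):
    """Gini impurity 1 - sum(p_i^2) of a [yes, no] count vector."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(groups):
    """Size-weighted Gini of a split; each group is a list of Outlook values."""
    merged = [[sum(OUTLOOK[v][k] for v in g) for k in range(2)] for g in groups]
    n = sum(sum(m) for m in merged)
    return sum(sum(m) / n * gini(m) for m in merged)

splits = []
for v in OUTLOOK:                       # isolate one value per binary split
    rest = [u for u in OUTLOOK if u != v]
    splits.append(((tuple(rest), (v,)), weighted_gini([rest, [v]])))

for (left, right), g in sorted(splits, key=lambda s: s[1]):
    print(f"{left} vs {right}: {g:.3f}")
```

The {Sunny, Rain} vs {Overcast} split scores about 0.357, below {Overcast, Rain} vs {Sunny} (about 0.394) and {Sunny, Overcast} vs {Rain} (about 0.457).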
Humidity (sorted)  PlayGolf
65                 Yes
70                 No
70                 Yes
70                 Yes
75                 Yes
78                 Yes
80                 Yes
80                 Yes
80                 No
85                 No
90                 No
90                 Yes
95                 No
96                 Yes
No gain ratio is calculated for the threshold Humidity = 96, because no humidity value is greater than 96 (the "> 96" side of the split would be empty).
The gain is maximum when the threshold is Humidity = 80 (Humidity ≤ 80 vs > 80).
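The threshold search for a continuous attribute can be sketched in Python using the 14 (Humidity, PlayGolf) pairs from the table. Each distinct value except the largest is tried as a threshold t for the binary split Humidity ≤ t vs > t; the helper names are illustrative.

```python
from collections import Counter
from math import log2

# (Humidity, PlayGolf) pairs from the table.
DATA = [(65, "Yes"), (70, "No"), (70, "Yes"), (70, "Yes"), (75, "Yes"), (78, "Yes"), (80, "Yes"),
        (80, "Yes"), (80, "No"), (85, "No"), (90, "No"), (90, "Yes"), (95, "No"), (96, "Yes")]

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def evaluate(threshold):
    """Gain and gain ratio of the split Humidity <= threshold vs Humidity > threshold."""
    left = [y for x, y in DATA if x <= threshold]
    right = [y for x, y in DATA if x > threshold]
    n = len(DATA)
    ie = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    gain = entropy([y for _, y in DATA]) - ie
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))  # entropy of the branch sizes
    return gain, gain / split_info

# Skip the largest value (96): "> 96" would be empty.
candidates = sorted({x for x, _ in DATA})[:-1]
results = {t: evaluate(t) for t in candidates}
for t, (g, gr) in results.items():
    print(f"Humidity <= {t}: gain = {g:.3f}, gain ratio = {gr:.3f}")

best = max(results, key=lambda t: results[t][0])
print("best threshold:", best)
```

The split at 80 puts 9 rows (7 Yes, 2 No) on the left and 5 rows (2 Yes, 3 No) on the right, for a gain of about 0.102.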
Temperature will be the root node as it has the highest gain ratio value
Can you build the complete DT?
Outlook has the highest chi-square value (it is the most significant feature), so it will be the root node.
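As an illustration of the kind of statistic involved, here is a Python sketch of the standard Pearson chi-square statistic for Outlook against PlayGolf, with expected counts taken from the row and column totals. The slides do not show which chi-square variant the lecture uses (some CHAID tutorials use a different per-cell formula), so this value may not match the lecture's. Only Outlook's [Yes, No] counts are available in these notes (Overcast's 4 Yes / 0 No inferred from its pure leaf), so the comparison with the other attributes is not reproduced.

```python
# Observed [Yes, No] counts per Outlook value (Overcast inferred from its pure Yes leaf).
OBSERVED = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}

def chi_square(table):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    rows = list(table.values())
    col_totals = [sum(r[k] for r in rows) for k in range(len(rows[0]))]
    n = sum(col_totals)
    stat = 0.0
    for r in rows:
        for k, observed in enumerate(r):
            expected = sum(r) * col_totals[k] / n  # expected count if Outlook and PlayGolf were independent
            stat += (observed - expected) ** 2 / expected
    return stat

print(f"chi-square(Outlook) = {chi_square(OBSERVED):.3f}")
```

A larger statistic means the observed class counts deviate more from independence, i.e. the attribute is more strongly associated with the target.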
Can you build the complete DT?
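Standard deviation (SD) of the target within each Outlook value:
Overcast = 3.49
Rain = 10.87
Sunny = 7.78
Weighted SD (Outlook) = (4/14)x3.49 + (5/14)x10.87 + (5/14)x7.78 = 7.66
SD reduction (Outlook) = 9.32 - 7.66 = 1.66, where 9.32 is the SD of the target over the whole dataset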
Outlook will be the root node as it has the highest SD reduction value
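The SD-reduction arithmetic can be checked with a small Python sketch. It takes the branch sizes and SDs from the Outlook example as given. The raw target values are not listed in these notes, and the slides do not say whether population or sample SD was used. The helper name `sd_reduction` is illustrative.

```python
def sd_reduction(total_sd, branches):
    """SDR = SD(T) - sum(|T_v|/|T| * SD(T_v)); branches maps value -> (row count, SD of target)."""
    n = sum(count for count, _ in branches.values())
    weighted = sum(count / n * sd for count, sd in branches.values())
    return weighted, total_sd - weighted

# Branch sizes and SDs from the Outlook example; 9.32 is the SD of the whole target.
OUTLOOK = {"Overcast": (4, 3.49), "Rain": (5, 10.87), "Sunny": (5, 7.78)}
weighted, sdr = sd_reduction(9.32, OUTLOOK)
print(f"Weighted SD = {weighted:.2f}, SD reduction = {sdr:.2f}")  # Weighted SD = 7.66, SD reduction = 1.66
```

Running the same function for every candidate attribute and picking the largest reduction gives the root of the regression tree.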
Can you build the complete DT?