Assignment 2
Ammad Ali
Question # 01 :
You are given a dataset containing information about weather conditions and whether people
decide to play tennis under those conditions. The dataset includes the following attributes:
1. Outlook: {Sunny, Overcast, Rain}
2. Temperature: {Hot, Mild, Cool}
3. Humidity: {High, Normal}
4. Wind: {Weak, Strong}
5. PlayTennis: {Yes, No} (target attribute)
The dataset is as follows:

Outlook    Temperature  Humidity  Wind    PlayTennis
Sunny      Hot          High      Weak    No
Sunny      Hot          High      Strong  No
Overcast   Hot          High      Weak    Yes
Rain       Mild         High      Weak    Yes
Rain       Cool         Normal    Weak    Yes
Rain       Cool         Normal    Strong  No
Overcast   Cool         Normal    Strong  Yes
Sunny      Mild         High      Weak    No
Sunny      Cool         Normal    Weak    Yes
Rain       Mild         Normal    Weak    Yes
Sunny      Mild         Normal    Strong  Yes
Overcast   Mild         High      Strong  Yes
Overcast   Hot          Normal    Weak    Yes
Rain       Mild         High      Strong  No

Using the ID3 algorithm, answer the following questions:
1. Entropy Calculation: Calculate the initial entropy of the dataset with respect to the
target attribute "Play Tennis".
2. Information Gain Calculation: Calculate the information gain for each attribute
(Outlook, Temperature, Humidity, and Wind) with respect to the target attribute.
3. Attribute Selection: Based on the information gain calculated, identify which attribute
the ID3 algorithm will choose as the root node for the decision tree.
4. Subsequent Steps: Describe the next steps the ID3 algorithm will take after selecting the
root node.
Answer:
Step 1:
Entropy Calculation (Target: PlayTennis)
We have 14 records.
• Yes: 9
• No: 5
Entropy formula:
Entropy(S) = -p_+ \log_2(p_+) - p_- \log_2(p_-)

Entropy(S) = -\frac{9}{14} \log_2\left(\frac{9}{14}\right) - \frac{5}{14} \log_2\left(\frac{5}{14}\right)
= -0.643 \log_2(0.643) - 0.357 \log_2(0.357)
\approx 0.410 + 0.530 = 0.940
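As a quick sanity check, this value can be recomputed in a few lines of Python. This is an illustrative sketch, not part of the required answer; the entropy helper is a name introduced here.

# Minimal sketch: recompute the Step 1 entropy of the PlayTennis labels.
from math import log2

def entropy(counts):
    # Shannon entropy of a class distribution given as a list of class counts.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))   # 0.94, matching the hand calculation above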
Step 2:
Information Gain Calculation
We calculate the gain for each attribute:
Attribute: Outlook
• Sunny (5): [No, No, No, Yes, Yes] → 2 Yes, 3 No
Entropy = -\frac{2}{5}\log_2\left(\frac{2}{5}\right) - \frac{3}{5}\log_2\left(\frac{3}{5}\right) \approx 0.971
• Overcast (4): [Yes, Yes, Yes, Yes] → 4 Yes, 0 No
Entropy = 0
• Rain (5): [Yes, Yes, No, Yes, No] → 3 Yes, 2 No
Entropy = -\frac{3}{5}\log_2\left(\frac{3}{5}\right) - \frac{2}{5}\log_2\left(\frac{2}{5}\right) \approx 0.971
Gain(Outlook) = 0.940 - \left(\frac{5}{14}\cdot 0.971 + \frac{4}{14}\cdot 0 + \frac{5}{14}\cdot 0.971\right) \approx 0.245
Attribute: Temperature
• Hot (4): [No, No, Yes, Yes] → 2 Yes, 2 No
Entropy = 1.000
• Mild (6): [Yes, No, Yes, Yes, Yes, No] → 4 Yes, 2 No
Entropy = -\frac{4}{6}\log_2\left(\frac{4}{6}\right) - \frac{2}{6}\log_2\left(\frac{2}{6}\right) \approx 0.918
• Cool (4): [Yes, No, Yes, Yes] → 3 Yes, 1 No
Entropy = -\frac{3}{4}\log_2\left(\frac{3}{4}\right) - \frac{1}{4}\log_2\left(\frac{1}{4}\right) \approx 0.811
Gain(Temperature) = 0.940 - \left(\frac{4}{14}\cdot 1.000 + \frac{6}{14}\cdot 0.918 + \frac{4}{14}\cdot 0.811\right) = 0.940 - 0.911 = 0.029
Attribute: Humidity
• High (7): [No, No, Yes, Yes, No, Yes, No] → 3 Yes, 4 No
Entropy = -\frac{3}{7}\log_2\left(\frac{3}{7}\right) - \frac{4}{7}\log_2\left(\frac{4}{7}\right) \approx 0.985
• Normal (7): [Yes, No, Yes, Yes, Yes, Yes, Yes] → 6 Yes, 1 No
Entropy = -\frac{6}{7}\log_2\left(\frac{6}{7}\right) - \frac{1}{7}\log_2\left(\frac{1}{7}\right) \approx 0.592
Gain(Humidity) = 0.940 - \left(\frac{7}{14}\cdot 0.985 + \frac{7}{14}\cdot 0.592\right) = 0.940 - (0.493 + 0.296) = 0.940 - 0.789 = 0.151
Attribute: Wind
• Weak (8): [No, Yes, Yes, Yes, Yes, Yes, Yes, No] → 6 Yes, 2 No
Entropy = -\frac{6}{8}\log_2\left(\frac{6}{8}\right) - \frac{2}{8}\log_2\left(\frac{2}{8}\right) \approx 0.811
• Strong (6): [No, No, Yes, Yes, Yes, No] → 3 Yes, 3 No
Entropy = -\frac{3}{6}\log_2\left(\frac{3}{6}\right) - \frac{3}{6}\log_2\left(\frac{3}{6}\right) = 1.000
Gain(Wind) = 0.940 - \left(\frac{8}{14}\cdot 0.811 + \frac{6}{14}\cdot 1.000\right) = 0.940 - 0.892 = 0.048
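The same gains can be verified programmatically. The sketch below is only an illustration, not the assignment's prescribed method; the names data, ATTRS, entropy, and info_gain are introduced here for the example.

# Recompute the Step 2 information gains from the 14-record table.
from math import log2
from collections import Counter

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
data = [  # (Outlook, Temperature, Humidity, Wind, PlayTennis)
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(rows):
    # Entropy of the PlayTennis labels (last column) in a subset of rows.
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_gain(rows, attr_index):
    # Information gain = entropy of the set minus the weighted entropy of its partitions.
    base = entropy(rows)
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

for i, name in enumerate(ATTRS):
    print(f"Gain({name}) = {info_gain(data, i):.3f}")
# Prints roughly 0.247, 0.029, 0.152, 0.048; the tiny differences from the hand-rounded
# values above come from rounding the subset entropies to three decimal places.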
Step 3:
Attribute Selection
Now compare all the information gains:
• Gain(Outlook): 0.245
• Gain(Temperature): 0.029
• Gain(Humidity): 0.151
• Gain(Wind): 0.048
So, Outlook has the highest information gain → It will be selected as the root node.
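In code, this selection is simply the maximum over the gains from Step 2 (values copied from above):

gains = {"Outlook": 0.245, "Temperature": 0.029, "Humidity": 0.151, "Wind": 0.048}
root = max(gains, key=gains.get)   # attribute with the highest information gain
print(root)                        # Outlook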
Step 4:
Subsequent Steps of the ID3 Algorithm
Once Outlook is selected as the root node, the algorithm proceeds as follows:
1. Create one branch for each value of Outlook:
• Sunny
• Overcast
• Rain
2. Recursively apply ID3 to each branch:
• Outlook = Overcast → Subset: 4 records, all Yes → pure subset → leaf node (Yes).
• Outlook = Sunny → Subset: 5 records → Recompute entropy & info gain for Temperature, Humidity, and Wind in this subset.
• Outlook = Rain → Subset: 5 records → Recompute gains for remaining attributes to split further.
3. Repeat:
Continue selecting attributes with the highest information gain until:
• All samples in a subset belong to the same class (pure), or
• No more attributes left.
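These steps can be summarized in a compact recursive sketch. It reuses the data, ATTRS, entropy, and info_gain definitions from the Step 2 example and is only an illustration of the procedure described above, not a required implementation.

def id3(rows, attr_indices):
    # Recursive ID3: stop on pure subsets or when no attributes remain,
    # otherwise split on the attribute with the highest information gain.
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]                                 # pure subset -> leaf
    if not attr_indices:
        return Counter(labels).most_common(1)[0][0]      # fall back to majority class
    best = max(attr_indices, key=lambda i: info_gain(rows, i))
    branches = {}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, [i for i in attr_indices if i != best])
    return {ATTRS[best]: branches}

print(id3(data, list(range(len(ATTRS)))))
# Yields (branch order may vary):
# {'Outlook': {'Overcast': 'Yes',
#              'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
#              'Rain':  {'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}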
Final Summary:
1. Initial entropy of dataset = 0.940
2. Information gains:
o Outlook: 0.245
o Temperature: 0.029
o Humidity: 0.151
o Wind: 0.048
3. Root node = Outlook (highest information gain)
4. Next steps = Split by Outlook, recursively apply ID3 on resulting subsets using remaining attributes.
Expanding the branches:
1. Outlook = Overcast
Subset: 4 records, all Yes → pure → leaf node (Yes).
2. Outlook = Sunny
Subset: 5 records → 2 Yes, 3 No → Entropy = 0.971
i. Humidity
• High (3): [No, No, No] → All No → Entropy = 0
• Normal (2): [Yes, Yes] → All Yes → Entropy = 0
Gain(Humidity) = 0.971 - \left(\frac{3}{5}\cdot 0 + \frac{2}{5}\cdot 0\right) = 0.971
Best gain → choose Humidity to split the "Sunny" branch.
3. Outlook = Rain
Subset: 5 records → 3 Yes, 2 No
a) Entropy(Rain subset):
Entropy = -\frac{3}{5}\log_2\left(\frac{3}{5}\right) - \frac{2}{5}\log_2\left(\frac{2}{5}\right) = 0.971
i. Wind
• Weak (3): [Yes, Yes, Yes] → All Yes → Entropy = 0
• Strong (2): [No, No] → All No → Entropy = 0
Gain(Wind) = 0.971 - \left(\frac{3}{5}\cdot 0 + \frac{2}{5}\cdot 0\right) = 0.971
Best gain → choose Wind to split the "Rain" branch.
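The helpers from the Step 2 sketch can confirm this subset calculation (again only an illustration; data, ATTRS, and info_gain are the names introduced earlier):

rain = [r for r in data if r[0] == "Rain"]          # the 5 records with Outlook = Rain
for i, name in enumerate(ATTRS):
    print(f"Gain({name} | Rain) = {info_gain(rain, i):.3f}")
# Wind scores highest (0.971) on this subset, matching the hand calculation above.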
Final decision tree:
Outlook
├── Sunny
│   └── Humidity?
│       ├── High → No
│       └── Normal → Yes
├── Overcast → Yes
└── Rain
    └── Wind?
        ├── Weak → Yes
        └── Strong → No
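As a final check, a hypothetical classify helper that hard-codes this tree can predict PlayTennis for any combination of conditions (Temperature never appears, since the tree does not use it):

def classify(outlook, humidity, wind):
    # Hard-coded form of the decision tree derived above.
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    return "Yes" if wind == "Weak" else "No"   # Outlook = Rain

print(classify("Sunny", "High", "Weak"))      # No
print(classify("Rain", "High", "Strong"))     # No
print(classify("Overcast", "High", "Weak"))   # Yes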