
Assignment: Artificial Intelligence

Submitted by: Ammad Ali


Roll No: 301-221123
Department: CS & IT
Submitted To: Ms Muneeba Darwaish

Question # 01:
You are given a dataset containing information about weather conditions and whether people
decide to play tennis under those conditions. The dataset includes the following attributes:
1. Outlook: {Sunny, Overcast, Rain}
2. Temperature: {Hot, Mild, Cool}
3. Humidity: {High, Normal}
4. Wind: {Weak, Strong}
5. PlayTennis: {Yes, No} (target attribute)

The dataset is as follows:

Outlook    Temperature  Humidity  Wind    PlayTennis
Sunny      Hot          High      Weak    No
Sunny      Hot          High      Strong  No
Overcast   Hot          High      Weak    Yes
Rain       Mild         High      Weak    Yes
Rain       Cool         Normal    Weak    Yes
Rain       Cool         Normal    Strong  No
Overcast   Cool         Normal    Strong  Yes
Sunny      Mild         High      Weak    No
Sunny      Cool         Normal    Weak    Yes
Rain       Mild         Normal    Weak    Yes
Sunny      Mild         Normal    Strong  Yes
Overcast   Mild         High      Strong  Yes
Overcast   Hot          Normal    Weak    Yes
Rain       Mild         High      Strong  No

Using the ID3 algorithm, answer the following questions:

1. Entropy Calculation: Calculate the initial entropy of the dataset with respect to the
target attribute "PlayTennis".

2. Information Gain Calculation: Calculate the information gain for each attribute
(Outlook, Temperature, Humidity, and Wind) with respect to the target attribute.

3. Attribute Selection: Based on the information gain calculated, identify which attribute
the ID3 algorithm will choose as the root node for the decision tree.

4. Subsequent Steps: Describe the next steps the ID3 algorithm will take after selecting the
root node.

Answer:

Step 1:
Entropy Calculation (Target: PlayTennis)
We have 14 records.

Count of PlayTennis values:

• Yes: 9
• No: 5

Entropy formula:

$$Entropy(S) = -p_+ \log_2(p_+) - p_- \log_2(p_-)$$

$$Entropy(S) = -\frac{9}{14} \log_2\left(\frac{9}{14}\right) - \frac{5}{14} \log_2\left(\frac{5}{14}\right)$$

$$= -0.643 \log_2(0.643) - 0.357 \log_2(0.357) \approx -0.643 \cdot (-0.637) - 0.357 \cdot (-1.485)$$

$$\approx 0.410 + 0.530 = \boxed{0.940}$$
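As a quick numerical check, here is a minimal Python sketch of this calculation (the `entropy` helper below is an illustrative name, not part of any prescribed library):

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    probs = [labels.count(v) / total for v in set(labels)]
    return -sum(p * log2(p) for p in probs)

# 9 "Yes" and 5 "No", as counted above
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))  # 0.94
```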

Step 2:
Information Gain Calculation
We calculate the gain for each attribute:

Attribute: Outlook

• Sunny (5): [No, No, No, Yes, Yes] → 2 Yes, 3 No


$$Entropy(Sunny) = -\frac{2}{5} \log_2\left(\frac{2}{5}\right) - \frac{3}{5} \log_2\left(\frac{3}{5}\right) = 0.971$$

• Overcast (4): [Yes, Yes, Yes, Yes] → 4 Yes


$$Entropy(Overcast) = 0 \text{ (pure)}$$

• Rain (5): [Yes, Yes, No, Yes, No] → 3 Yes, 2 No


$$Entropy(Rain) = -\frac{3}{5} \log_2\left(\frac{3}{5}\right) - \frac{2}{5} \log_2\left(\frac{2}{5}\right) = 0.971$$

$$Gain(Outlook) = 0.940 - \left( \frac{5}{14} \cdot 0.971 + \frac{4}{14} \cdot 0 + \frac{5}{14} \cdot 0.971 \right)$$

$$= 0.940 - (0.347 + 0 + 0.347) = 0.940 - 0.694 = \boxed{0.246}$$
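The same bookkeeping can be automated. Below is a sketch of an information-gain helper over the full dataset (it reuses the `entropy` function from the earlier sketch; `DATA` and `info_gain` are illustrative names). Its output agrees with the hand calculations up to rounding, e.g. it prints 0.247 for Outlook where the rounded hand arithmetic gives 0.246:

```python
COLS = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
DATA = [dict(zip(COLS, row)) for row in [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]]

def info_gain(rows, attribute, target="PlayTennis"):
    """Gain(attribute) = Entropy(S) minus weighted entropy of the value subsets."""
    gain = entropy([r[target] for r in rows])
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

for attr in ["Outlook", "Temperature", "Humidity", "Wind"]:
    print(attr, round(info_gain(DATA, attr), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
```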

Attribute: Temperature

• Hot (4): [No, No, Yes, Yes] → 2 Yes, 2 No → Entropy = 1.0

• Mild (6): [Yes, Yes, Yes, No, Yes, No] → 4 Yes, 2 No


$$Entropy(Mild) = -\frac{4}{6}\log_2\left(\frac{4}{6}\right) - \frac{2}{6}\log_2\left(\frac{2}{6}\right) \approx 0.918$$

• Cool (4): [Yes, Yes, Yes, No] → 3 Yes, 1 No


$$Entropy(Cool) = -\frac{3}{4}\log_2\left(\frac{3}{4}\right) - \frac{1}{4}\log_2\left(\frac{1}{4}\right) \approx 0.811$$

$$Gain(Temperature) = 0.940 - \left( \frac{4}{14} \cdot 1.0 + \frac{6}{14} \cdot 0.918 + \frac{4}{14} \cdot 0.811 \right)$$

$$= 0.940 - (0.286 + 0.393 + 0.232) = 0.940 - 0.911 = \boxed{0.029}$$

Attribute: Humidity

• High (7): [No, No, Yes, Yes, No, Yes, No] → 3 Yes, 4 No

$$Entropy(High) = -\frac{3}{7}\log_2\left(\frac{3}{7}\right) - \frac{4}{7}\log_2\left(\frac{4}{7}\right) \approx 0.985$$

• Normal (7): [Yes, No, Yes, Yes, Yes, Yes, Yes] → 6 Yes, 1 No

$$Entropy(Normal) = -\frac{6}{7}\log_2\left(\frac{6}{7}\right) - \frac{1}{7}\log_2\left(\frac{1}{7}\right) \approx 0.592$$

$$Gain(Humidity) = 0.940 - \left( \frac{7}{14} \cdot 0.985 + \frac{7}{14} \cdot 0.592 \right)$$

$$= 0.940 - (0.493 + 0.296) = 0.940 - 0.789 = \boxed{0.151}$$

Attribute: Wind

• Weak (8): [No, Yes, Yes, Yes, Yes, Yes, Yes, No] → 6 Yes, 2 No
$$Entropy(Weak) = -\frac{6}{8}\log_2\left(\frac{6}{8}\right) - \frac{2}{8}\log_2\left(\frac{2}{8}\right) \approx 0.811$$

• Strong (6): [No, Yes, No, Yes, Yes, No] → 3 Yes, 3 No


$$Entropy(Strong) = 1.0$$

$$Gain(Wind) = 0.940 - \left( \frac{8}{14} \cdot 0.811 + \frac{6}{14} \cdot 1.0 \right) = 0.940 - (0.463 + 0.429) = 0.940 - 0.892 = \boxed{0.048}$$

Step 3:
Attribute Selection
Now compare all the information gains:

• Gain(Outlook): 0.246
• Gain(Temperature): 0.029
• Gain(Humidity): 0.151
• Gain(Wind): 0.048

Outlook has the highest information gain → it is selected as the root node.

Step 4:
Subsequent Steps of ID3 Algorithm
Once Outlook is selected as the root node, the algorithm proceeds as follows:

1. Split the dataset by Outlook's values:

• Sunny

• Overcast

• Rain
2. Recursively apply ID3 to each branch:

• Outlook = Overcast → All samples are Yes → Leaf node: Yes

• Outlook = Sunny → Subset: 5 records → Recompute entropy & info gain for Temperature, Humidity, and
Wind in this subset.

• Outlook = Rain → Subset: 5 records → Recompute gains for remaining attributes to split further.
3. Repeat:
Continue selecting the attribute with the highest information gain until:
• All samples in a subset belong to the same class (pure), or
• No attributes are left (in which case the leaf takes the majority class).
A compact recursive sketch of this loop follows below.
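As referenced above, the whole procedure fits in a short recursion. This is a teaching sketch that reuses `DATA` and `info_gain` from the earlier snippets (assumed names), not a production implementation:

```python
def id3(rows, attributes, target="PlayTennis"):
    """Recursive ID3: returns a nested-dict tree, or a class label for a leaf."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                 # pure subset -> leaf
        return labels[0]
    if not attributes:                        # no attributes left -> majority class
        return max(set(labels), key=labels.count)
    best = max(attributes, key=lambda a: info_gain(rows, a, target))
    tree = {best: {}}
    for value in set(r[best] for r in rows):  # one branch per observed value
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return tree

print(id3(DATA, ["Outlook", "Temperature", "Humidity", "Wind"]))
# (branch order may vary)
# {'Outlook': {'Overcast': 'Yes',
#              'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
#              'Rain': {'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}
```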

Final Summary:
1. Initial entropy of dataset = 0.940

2. Information gains:

o Outlook: 0.246
o Temperature: 0.029
o Humidity: 0.151
o Wind: 0.048

3. Root node selected = Outlook

4. Next steps = Split by Outlook, recursively apply ID3 on resulting subsets using remaining attributes.

Root Node: Outlook


It splits into three branches:

1. Outlook = Overcast

Outlook   Temperature  Humidity  Wind    PlayTennis
Overcast  Hot          High      Weak    Yes
Overcast  Cool         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes


All are Yes → Pure leaf node:
Outlook = Overcast → PlayTennis = Yes

2. Outlook = Sunny
Subset:

Outlook  Temperature  Humidity  Wind    PlayTennis
Sunny    Hot          High      Weak    No
Sunny    Hot          High      Strong  No
Sunny    Mild         High      Weak    No
Sunny    Cool         Normal    Weak    Yes
Sunny    Mild         Normal    Strong  Yes


Target values: 2 Yes, 3 No → Not pure → Need to split further.
Let’s calculate information gain for attributes in this subset:

• Remaining attributes: Temperature, Humidity, Wind


a) Entropy(Sunny subset):

$$= -\frac{2}{5}\log_2\left(\frac{2}{5}\right) - \frac{3}{5}\log_2\left(\frac{3}{5}\right) \approx 0.971$$

i. Humidity

• High (3): [No, No, No] → All No → Entropy = 0
• Normal (2): [Yes, Yes] → All Yes → Entropy = 0

$$Gain(Humidity) = 0.971 - \left(\frac{3}{5} \cdot 0 + \frac{2}{5} \cdot 0\right) = \boxed{0.971}$$

This gain equals the subset's entire entropy, the maximum possible, so neither Temperature nor Wind can beat it. Best gain → choose Humidity to split the "Sunny" branch (see the check below).
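As a check on that claim, rescoring all three remaining attributes on just the Sunny rows (with the `info_gain` sketch from earlier) shows Humidity far ahead:

```python
# Restrict the dataset to the Sunny branch and rescore the candidates
sunny = [r for r in DATA if r["Outlook"] == "Sunny"]
for attr in ["Temperature", "Humidity", "Wind"]:
    print(attr, round(info_gain(sunny, attr), 3))
# Temperature 0.571, Humidity 0.971, Wind 0.02
```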
Final Sunny Subtree:

If Outlook = Sunny:
└── Humidity
    ├── High → PlayTennis = No
    └── Normal → PlayTennis = Yes
3. Outlook = Rain
Subset:

Outlook  Temperature  Humidity  Wind    PlayTennis
Rain     Mild         High      Weak    Yes
Rain     Cool         Normal    Weak    Yes
Rain     Cool         Normal    Strong  No
Rain     Mild         Normal    Weak    Yes
Rain     Mild         High      Strong  No


Target values: 3 Yes, 2 No → Not pure → Need to split further.
Remaining attributes: Temperature, Humidity, Wind

a) Entropy(Rain subset):

$$= -\frac{3}{5}\log_2\left(\frac{3}{5}\right) - \frac{2}{5}\log_2\left(\frac{2}{5}\right) = 0.971$$

i. Wind
• Weak (3): [Yes, Yes, Yes] → All Yes → Entropy = 0
• Strong (2): [No, No] → All No → Entropy = 0
$$Gain(Wind) = 0.971 - \left(\frac{3}{5} \cdot 0 + \frac{2}{5} \cdot 0\right) = \boxed{0.971}$$

Best gain → choose Wind to split the "Rain" branch.

Final Rain Subtree:

If Outlook = Rain:
└── Wind
    ├── Weak → PlayTennis = Yes
    └── Strong → PlayTennis = No

Final Decision Tree (First Two Levels):

Outlook?
├── Overcast → Yes
├── Sunny
│   └── Humidity?
│       ├── High → No
│       └── Normal → Yes
└── Rain
    └── Wind?
        ├── Weak → Yes
        └── Strong → No
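To close the loop, the finished tree can be written out and queried directly. A minimal sketch follows; the nested-dict encoding and the `predict` helper are illustrative choices, matching the shape the `id3` sketch above returns:

```python
# The learned tree, transcribed from the diagram above
TREE = {"Outlook": {
    "Overcast": "Yes",
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Rain":  {"Wind": {"Weak": "Yes", "Strong": "No"}},
}}

def predict(row, tree=TREE):
    """Walk the tree until a class label (a plain string) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))          # the attribute tested at this node
        tree = tree[attribute][row[attribute]]
    return tree

print(predict({"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
print(predict({"Outlook": "Rain", "Wind": "Strong"}))       # No
```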
