Lecture 6: Decision Trees and the ID3 Algorithm
[Figure: the decision tree built from the 14 training examples (9 yes / 5 no overall), developed branch by branch over several slides. Outlook is tested at the root: Overcast (4 yes / 0 no) -> yes; Sunny (2 yes / 3 no) -> test Humidity: Normal (2 yes / 0 no) -> yes, High (0 yes / 3 no) -> no; Rain (3 yes / 2 no) -> test Wind: Weak (3 yes / 0 no) -> yes, Strong (0 yes / 2 no) -> no.]
ID3 Algorithm
Which attribute to split on?
[Figure: two candidate splits of the same 9 yes / 5 no collection. Outlook: Sunny (2 yes / 3 no), Overcast (4 yes / 0 no), Rain (3 yes / 2 no). Wind: Weak (6 yes / 2 no), Strong (3 yes / 3 no).]
• The entropy is 1 when the collection contains an equal number of positive and negative examples.
• The entropy is 0 if all members of S belong to the same class.
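These properties, and the entropy values used in the rest of the lecture, can be checked with a short Python sketch; the function name and the (pos, neg) counting interface are illustrative choices, not part of the slides.

import math

def entropy(pos, neg):
    # Entropy of a collection with `pos` positive and `neg` negative examples.
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:                      # treat 0 * log2(0) as 0
            p = count / total
            h -= p * math.log2(p)
    return h

print(entropy(7, 7))   # 1.0   -- equal numbers of positive and negative examples
print(entropy(4, 0))   # 0.0   -- all members belong to the same class
print(entropy(9, 5))   # ~0.94 -- the 9 yes / 5 no collection used below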
Which attribute is the best classifier? Example
[Figure: splitting the 9 yes / 5 no collection (E = 0.94) on Humidity: High (3 yes / 4 no, E = 0.985) and Normal (6 yes / 1 no, E = 0.592).]

Gain(S, Humidity) = 0.94 - (7/14)*0.985 - (7/14)*0.592 = 0.151
Gain(S, Wind) = 0.94 - (8/14)*0.81 - (6/14)*1.0 = 0.048
▪ Humidity provides greater information gain than Wind, relative to the target
classification.
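A minimal Python sketch of this comparison, using the counts from the figures above (the helper names entropy and gain are my own, not from the lecture):

import math

def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

def gain(parent, children):
    # Information gain = parent entropy minus the weighted entropies of the child nodes.
    total = sum(p + n for p, n in children)
    return entropy(*parent) - sum((p + n) / total * entropy(p, n) for p, n in children)

S = (9, 5)                                 # 9 yes / 5 no
print(gain(S, [(3, 4), (6, 1)]))           # Humidity: High, Normal -> ~0.151
print(gain(S, [(6, 2), (3, 3)]))           # Wind: Weak, Strong     -> ~0.048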
[Figure: splitting the 9 yes / 5 no collection on Outlook: Sunny (2 yes / 3 no), Overcast (4 yes / 0 no), Rain (3 yes / 2 no).]
H(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94
H(S_Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.97
H(S_Overcast) = -(4/4) log2(4/4) - (0/4) log2(0/4) = 0
H(S_Rain) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.97

Gain(S, Outlook) = 0.94 - (5/14)*0.97 - (4/14)*0 - (5/14)*0.97 = 0.247
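Continuing the sketch above (and reusing its entropy and gain helpers), ID3 computes the gain for every candidate attribute and places the one with the largest gain at the root; with the counts from these slides the winner is Outlook:

S = (9, 5)
splits = {
    "Outlook":  [(2, 3), (4, 0), (3, 2)],  # Sunny, Overcast, Rain
    "Humidity": [(3, 4), (6, 1)],          # High, Normal
    "Wind":     [(6, 2), (3, 3)],          # Weak, Strong
}
gains = {attr: gain(S, kids) for attr, kids in splits.items()}
print(gains)                               # Outlook ~0.247, Humidity ~0.151, Wind ~0.048
print(max(gains, key=gains.get))           # 'Outlook' is chosen as the root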
[Figure: the resulting decision tree for the 9 yes / 5 no collection: Outlook at the root (Sunny / Overcast / Rain), with Humidity (Normal / High) tested under Sunny and Wind (Weak / Strong) tested under Rain.]
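The tree above is grown recursively: the best attribute is placed at a node, the examples are partitioned by its values, and the procedure repeats on each partition until a node is pure or no attributes remain. A self-contained Python sketch of that recursion (representing each example as a dict with a 'label' key is an assumption made here for illustration):

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def gain(examples, attr):
    base = entropy([e["label"] for e in examples])
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e["label"] for e in examples if e[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

def id3(examples, attributes):
    labels = [e["label"] for e in examples]
    if len(set(labels)) == 1:                      # pure node -> leaf
        return labels[0]
    if not attributes:                             # nothing left to test -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a))
    tree = {best: {}}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best])
    return tree

Applied to the 14 training examples behind these slides, this recursion reproduces the tree shown above: Outlook at the root, Humidity tested under Sunny, Wind tested under Rain.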
How to Deal with Overfitting?
Decision Tree Pre-Pruning
Decision Tree Post-Pruning
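The slides do not give code for pruning, but as a rough illustration with scikit-learn (whose trees are CART rather than ID3, so this is an analogy, not the lecture's method): pre-pruning corresponds to stopping criteria fixed before training, such as max_depth or min_samples_leaf, while post-pruning grows the full tree first and then cuts it back, e.g. with cost-complexity pruning via ccp_alpha.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: limit growth up front.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow the full tree, then prune with a cost-complexity penalty.
# (In practice ccp_alpha is chosen on a validation set; a mid-range value is used here.)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)

print("pre-pruned :", pre.tree_.node_count, "nodes, test acc", round(pre.score(X_te, y_te), 3))
print("post-pruned:", post.tree_.node_count, "nodes, test acc", round(post.score(X_te, y_te), 3))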