A Step by Step ID3 Decision Tree Example by Niranjan Kumar Das

The document describes how to build a decision tree model to predict whether to play tennis based on weather data over 14 days. It uses the ID3 algorithm to calculate entropy and information gain at each step to determine the most important factor to split on. The analysis finds that outlook (sunny, overcast, or rainy) has the highest information gain and becomes the root node of the decision tree. Subtrees are then built based on the secondary factors for each outlook value.

For instance, the following table lists the decision-making factors for playing tennis outside over the previous 14 days.

Day  Outlook   Temp.  Humidity  Wind    Decision
1    Sunny     Hot    High      Weak    No
2    Sunny     Hot    High      Strong  No
3    Overcast  Hot    High      Weak    Yes
4    Rain      Mild   High      Weak    Yes
5    Rain      Cool   Normal    Weak    Yes
6    Rain      Cool   Normal    Strong  No
7    Overcast  Cool   Normal    Strong  Yes
8    Sunny     Mild   High      Weak    No
9    Sunny     Cool   Normal    Weak    Yes
10   Rain      Mild   Normal    Weak    Yes
11   Sunny     Mild   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
13   Overcast  Hot    Normal    Weak    Yes
14   Rain      Mild   High      Strong  No

We can summarize the ID3 algorithm with the two formulas below:

Entropy(S) = ∑ – p(I) . log2 p(I)

Gain(S, A) = Entropy(S) – ∑ [ p(S|A) . Entropy(S|A) ]

These formulas may look confusing at first; working through an example makes them clear.
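As a minimal sketch (this code is mine, not the original post's), the two formulas translate directly into Python. The dataset below is the 14-day table above, with the column order Outlook, Temp., Humidity, Wind, Decision:

    from math import log2
    from collections import Counter

    # The 14-day play-tennis table: (Outlook, Temp., Humidity, Wind, Decision)
    rows = [
        ("Sunny", "Hot", "High", "Weak", "No"),
        ("Sunny", "Hot", "High", "Strong", "No"),
        ("Overcast", "Hot", "High", "Weak", "Yes"),
        ("Rain", "Mild", "High", "Weak", "Yes"),
        ("Rain", "Cool", "Normal", "Weak", "Yes"),
        ("Rain", "Cool", "Normal", "Strong", "No"),
        ("Overcast", "Cool", "Normal", "Strong", "Yes"),
        ("Sunny", "Mild", "High", "Weak", "No"),
        ("Sunny", "Cool", "Normal", "Weak", "Yes"),
        ("Rain", "Mild", "Normal", "Weak", "Yes"),
        ("Sunny", "Mild", "Normal", "Strong", "Yes"),
        ("Overcast", "Mild", "High", "Strong", "Yes"),
        ("Overcast", "Hot", "Normal", "Weak", "Yes"),
        ("Rain", "Mild", "High", "Strong", "No"),
    ]
    ATTRS = {"Outlook": 0, "Temp.": 1, "Humidity": 2, "Wind": 3}

    def entropy(labels):
        # Entropy(S) = sum over classes of -p . log2(p)
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def gain(rows, idx):
        # Gain(S, A) = Entropy(S) - sum over values v of p(S_v) . Entropy(S_v)
        n = len(rows)
        g = entropy([r[-1] for r in rows])
        for value in set(r[idx] for r in rows):
            subset = [r[-1] for r in rows if r[idx] == value]
            g -= (len(subset) / n) * entropy(subset)
        return g

    for name, idx in ATTRS.items():
        print(f"Gain(Decision, {name}) = {gain(rows, idx):.3f}")
    # These should match the gains derived step by step below.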

Entropy
We need to calculate the entropy first. The Decision column consists of 14 instances and includes two labels: yes and no. There are 9 decisions labeled yes and 5 labeled no.

Entropy(Decision) = – p(Yes) . log2 p(Yes) – p(No) . log2 p(No)

Entropy(Decision) = – (9/14) . log2 (9/14) – (5/14) . log2 (5/14) = 0.940
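A quick check of this number (a one-off snippet, not from the post):

    from math import log2

    p_yes, p_no = 9 / 14, 5 / 14
    print(-p_yes * log2(p_yes) - p_no * log2(p_no))  # 0.940..., as above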

Now, we need to find the most dominant factor for the decision.

Wind factor on decision


Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Decision|Wind) . Entropy(Decision|Wind) ]

The Wind attribute has two labels: weak and strong. We reflect them in the formula.

Gain(Decision, Wind) = Entropy(Decision)
– [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ]
– [ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ]

Now, we need to calculate Entropy(Decision|Wind=Weak) and Entropy(Decision|Wind=Strong) respectively.

Weak wind factor on decision

Day  Outlook   Temp.  Humidity  Wind  Decision
1    Sunny     Hot    High      Weak  No
3    Overcast  Hot    High      Weak  Yes
4    Rain      Mild   High      Weak  Yes
5    Rain      Cool   Normal    Weak  Yes
8    Sunny     Mild   High      Weak  No
9    Sunny     Cool   Normal    Weak  Yes
10   Rain      Mild   Normal    Weak  Yes
13   Overcast  Hot    Normal    Weak  Yes

There are 8 instances with weak wind. The decision is no for 2 of them and yes for 6, as calculated below.

1- Entropy(Decision|Wind=Weak) = – p(No) . log2 p(No) – p(Yes) . log2 p(Yes)

2- Entropy(Decision|Wind=Weak) = – (2/8) . log2 (2/8) – (6/8) . log2 (6/8) = 0.811

Notice that if the number of instances of a class were 0 out of n total instances, we would need to calculate –(0/n) . log2 (0/n). Here, log2(0) equals –∞, and 0 times ∞ cannot be multiplied directly. This special case appears often in decision tree applications. Even though a program cannot evaluate the expression as written, calculus resolves it: the limit of p . log2 p as p approaches 0 is 0, so the term is taken to be 0.
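As a sketch of that convention in code (the helper name plog2p is my own):

    from math import log2

    def plog2p(p):
        # p . log2(p), with the limit value 0 used when p == 0
        return 0.0 if p == 0 else p * log2(p)

    print(-plog2p(0.0) - plog2p(1.0))  # entropy of a pure node: 0.0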

Strong wind factor on decision

Day  Outlook   Temp.  Humidity  Wind    Decision
2    Sunny     Hot    High      Strong  No
6    Rain      Cool   Normal    Strong  No
7    Overcast  Cool   Normal    Strong  Yes
11   Sunny     Mild   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
14   Rain      Mild   High      Strong  No

Here, there are 6 instances with strong wind, and the decisions are split into two equal parts: 3 yes and 3 no.

1- Entropy(Decision|Wind=Strong) = – p(No) . log2 p(No) – p(Yes) . log2 p(Yes)

2- Entropy(Decision|Wind=Strong) = – (3/6) . log2 (3/6) – (3/6) . log2 (3/6) = 1

Now, we can turn back to the Gain(Decision, Wind) equation.

Gain(Decision, Wind) = Entropy(Decision)
– [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ]
– [ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ]
= 0.940 – (8/14) . 0.811 – (6/14) . 1 = 0.048
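Re-checking the arithmetic (a one-off snippet):

    # weighted sum of the two subset entropies computed above
    gain_wind = 0.940 - (8 / 14) * 0.811 - (6 / 14) * 1.0
    print(round(gain_wind, 3))  # 0.048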

The calculations for the Wind column are done. Now, we need to apply the same calculations to the other columns to find the most dominant factor for the decision.

Other factors on decision


We apply the same calculation to the other columns:

1- Gain(Decision, Outlook) = 0.246

2- Gain(Decision, Temperature) = 0.029

3- Gain(Decision, Humidity) = 0.151

As seen, the outlook attribute produces the highest score. That is why outlook appears in the root node of the tree.


Root decision on the tree

Now, we need to examine the subsets of the dataset for each value of the outlook attribute.
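Before walking through each branch by hand, here is a compact recursive sketch of the whole ID3 procedure. It reuses the rows, ATTRS, and gain definitions from the earlier snippet; the function name id3 and the nested-dict tree representation are my own choices, not the post's:

    def id3(rows, attrs):
        labels = [r[-1] for r in rows]
        if len(set(labels)) == 1:        # pure subset: emit a leaf
            return labels[0]
        if not attrs:                    # no attributes left: majority vote
            return max(set(labels), key=labels.count)
        best = max(attrs, key=lambda a: gain(rows, attrs[a]))
        idx = attrs[best]
        rest = {a: i for a, i in attrs.items() if a != best}
        return {best: {v: id3([r for r in rows if r[idx] == v], rest)
                       for v in set(r[idx] for r in rows)}}

    print(id3(rows, ATTRS))  # the root split should come out as Outlook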

Overcast outlook on decision


Basically, the decision will always be yes if the outlook is overcast.

Day  Outlook   Temp.  Humidity  Wind    Decision
3    Overcast  Hot    High      Weak    Yes
7    Overcast  Cool   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
13   Overcast  Hot    Normal    Weak    Yes

Sunny outlook on decision


Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes


Here, there are 5 instances with sunny outlook: the decision is no for 3 of them and yes for 2. So Entropy(Decision|Outlook=Sunny) = – (3/5) . log2 (3/5) – (2/5) . log2 (2/5) ≈ 0.970.

1- Gain(Outlook=Sunny|Temperature) = 0.570

2- Gain(Outlook=Sunny|Humidity) = 0.970

3- Gain(Outlook=Sunny|Wind) = 0.019

Now, humidity is the next split because it produces the highest score when the outlook is sunny. At this point, the decision will always be no if humidity is high.

Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No

On the other hand, the decision will always be yes if humidity is normal.

Day  Outlook  Temp.  Humidity  Wind    Decision
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes

Finally, this means that when the outlook is sunny, we need to check humidity to make the decision.
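Re-deriving the sunny-branch scores (a one-off snippet): humidity splits the 5 sunny rows into High (3 no) and Normal (2 yes), both pure, so the gain equals the subset entropy itself.

    from math import log2

    e_sunny = -(3 / 5) * log2(3 / 5) - (2 / 5) * log2(2 / 5)
    print(e_sunny)  # ~0.970; both humidity subsets are pure, so the gain is the same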

Rain outlook on decision


Day  Outlook  Temp.  Humidity  Wind    Decision
4    Rain     Mild   High      Weak    Yes
5    Rain     Cool   Normal    Weak    Yes
6    Rain     Cool   Normal    Strong  No
10   Rain     Mild   Normal    Weak    Yes
14   Rain     Mild   High      Strong  No

1- Gain(Outlook=Rain | Temperature) = 0.01997309402197489

2- Gain(Outlook=Rain | Humidity) = 0.01997309402197489

3- Gain(Outlook=Rain | Wind) = 0.9709505944546686

Here, wind produces the highest score when the outlook is rain. That is why we need to check the wind attribute at the second level when the outlook is rain.
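The same one-off check for the rain branch: wind splits the 5 rain rows into Weak (3 yes) and Strong (2 no), both pure, so the gain equals the subset entropy.

    from math import log2

    e_rain = -(3 / 5) * log2(3 / 5) - (2 / 5) * log2(2 / 5)
    print(e_rain)  # 0.9709505944546686, matching the value above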

So, it is revealed that the decision will always be yes if the wind is weak and the outlook is rain.

Day  Outlook  Temp.  Humidity  Wind  Decision
4    Rain     Mild   High      Weak  Yes
5    Rain     Cool   Normal    Weak  Yes
10   Rain     Mild   Normal    Weak  Yes

What's more, the decision will always be no if the wind is strong and the outlook is rain.

Day  Outlook  Temp.  Humidity  Wind    Decision
6    Rain     Cool   Normal    Strong  No
14   Rain     Mild   High      Strong  No

So, the decision tree construction is complete. We can use the following rules for decision making.

Final version of decision tree
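The finished tree can be written down as plain rules; a minimal sketch (the function name play_tennis is mine):

    def play_tennis(outlook, humidity, wind):
        # the three branches found above: overcast, sunny/humidity, rain/wind
        if outlook == "Overcast":
            return "Yes"
        if outlook == "Sunny":
            return "Yes" if humidity == "Normal" else "No"
        if outlook == "Rain":
            return "Yes" if wind == "Weak" else "No"

    print(play_tennis("Sunny", "High", "Weak"))  # "No", matching day 1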

Feature Importance
Decision trees are naturally explainable and interpretable algorithms. Besides, we can extract feature importance values to understand how the model works.
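As a hedged sketch of how this looks in practice: scikit-learn's DecisionTreeClassifier grows CART-style trees rather than ID3, but criterion="entropy" uses the same impurity measure, and feature_importances_ reports each feature's normalized total impurity reduction. The snippet reuses rows from the first sketch; the integer encoding is only for illustration:

    from sklearn.preprocessing import OrdinalEncoder
    from sklearn.tree import DecisionTreeClassifier

    X = [list(r[:4]) for r in rows]   # Outlook, Temp., Humidity, Wind
    y = [r[4] for r in rows]

    enc = OrdinalEncoder()            # map each category to an integer
    clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
    clf.fit(enc.fit_transform(X), y)

    for name, imp in zip(["Outlook", "Temp.", "Humidity", "Wind"], clf.feature_importances_):
        print(f"{name}: {imp:.3f}")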
