3.1 C4.5 Algorithm
C4.5 Decision Tree Example
https://sefiks.com/2018/05/13/a-step-by-step-c4-5-decision-tree-example/
Decision trees are still a hot topic in the data science world. ID3 is the most common conventional decision tree algorithm, but it has bottlenecks: attributes must be nominal, the dataset must not include missing data, and the algorithm tends to overfit. Ross Quinlan, the inventor of ID3, addressed these bottlenecks and created a new algorithm named C4.5. The new algorithm can create more generalized models, handle continuous data, and tolerate missing values. Additionally, some resources such as Weka refer to this algorithm as J48; it is actually a re-implementation of C4.5 release 8.
Day  Outlook  Temp.  Hum.  Wind    Decision
1    Sunny    85     85    Weak    No
2    Sunny    80     90    Strong  No
...
8    Sunny    72     95    Weak    No
14   Rain     71     80    Strong  No
We will do what we did in the ID3 example. First, we need to calculate the global entropy. There are 14 examples; 9 instances refer to the yes decision and 5 instances refer to the no decision.
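Recomputing this with the standard entropy formula (my own arithmetic, shown here as a check):

Entropy(Decision) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.940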
In the ID3 algorithm, we calculated gains for each attribute. Here, we need to calculate gain ratios instead of gains.
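For reference, C4.5's gain ratio normalizes the information gain by the split information of the partition (this is the standard definition; the notation below is mine, since the post's own formulas are not reproduced in this extract):

GainRatio(Decision, A) = Gain(Decision, A) / SplitInfo(Decision, A)
SplitInfo(Decision, A) = -Σi (|Di| / |D|) · log2(|Di| / |D|)

where D1, ..., Dk are the subsets produced by splitting the dataset D on attribute A.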
Wind Attribute
Wind is a nominal attribute. Its possible values are weak and strong.
There are 8 weak wind instances; 2 of them are concluded as no and 6 of them as yes. The remaining 6 instances have strong wind; 3 of them are concluded as yes and 3 as no.
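Recomputing these values with the counts above (approximate, my own arithmetic):

Entropy(Decision | Wind=Weak) = -(6/8)·log2(6/8) - (2/8)·log2(2/8) ≈ 0.811
Entropy(Decision | Wind=Strong) = -(3/6)·log2(3/6) - (3/6)·log2(3/6) = 1
Gain(Decision, Wind) = 0.940 - (8/14)·0.811 - (6/14)·1 ≈ 0.048
SplitInfo(Decision, Wind) = -(8/14)·log2(8/14) - (6/14)·log2(6/14) ≈ 0.985
GainRatio(Decision, Wind) ≈ 0.048 / 0.985 ≈ 0.049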
Outlook Attribute
Outlook is a nominal attribute, too. Its possible values are sunny, overcast and rain.
Notice that log2(0) is actually equal to -∞, but we treat the term 0·log2(0) as 0, since lim (x->0) x·log2(x) = 0. If you wonder about the proof, please look at the referenced post.
There are 5 instances for sunny (2 yes, 3 no), 4 instances for overcast (all yes), and 5 instances for rain (3 yes, 2 no).
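Recomputing the outlook split in the same way (approximate, my own arithmetic):

Entropy(Decision | Outlook=Sunny) = -(2/5)·log2(2/5) - (3/5)·log2(3/5) ≈ 0.971
Entropy(Decision | Outlook=Overcast) = -(4/4)·log2(4/4) - 0·log2(0) = 0
Entropy(Decision | Outlook=Rain) = -(3/5)·log2(3/5) - (2/5)·log2(2/5) ≈ 0.971
Gain(Decision, Outlook) = 0.940 - (5/14)·0.971 - (4/14)·0 - (5/14)·0.971 ≈ 0.246
SplitInfo(Decision, Outlook) = -(5/14)·log2(5/14) - (4/14)·log2(4/14) - (5/14)·log2(5/14) ≈ 1.577
GainRatio(Decision, Outlook) ≈ 0.246 / 1.577 ≈ 0.156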
Humidity Attribute
As an exception, humidity is a continuous attribute. We need to convert continuous
values to nominal ones. C4.5 proposes to perform binary split based on a threshold
value. Threshold should be a value which offers maximum gain for that attribute. Let’s
focus on humidity attribute. Firstly, we need to sort humidity values smallest to largest.
Day  Humidity  Decision
7    65        Yes
6    70        No
9    70        Yes
11   70        Yes
13   75        Yes
3    78        Yes
5    80        Yes
10   80        Yes
14   80        No
1    85        No
2    90        No
12   90        Yes
8    95        No
4    96        Yes
Now, we need to iterate over all humidity values and separate the dataset into two parts: instances less than or equal to the current value, and instances greater than the current value. We calculate the gain (or gain ratio) for every candidate split. The value that maximizes the gain becomes the threshold.
* The split described above refers to the two branches of the decision tree: instances with humidity less than or equal to 65, and instances with humidity greater than 65. It does not mean that humidity must not be equal to 65!
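To make the search concrete, here is a small Python sketch (my own illustration, not code from the original post; the rows, entropy, and split_scores names are hypothetical). It replays the threshold search on the humidity/decision pairs from the sorted table above and reports, for example, a gain of about 0.048 at threshold 65 and about 0.102 at threshold 80.

```python
import math
from collections import Counter

# (day, humidity, decision) tuples copied from the sorted table above
rows = [
    (7, 65, "Yes"), (6, 70, "No"), (9, 70, "Yes"), (11, 70, "Yes"),
    (13, 75, "Yes"), (3, 78, "Yes"), (5, 80, "Yes"), (10, 80, "Yes"),
    (14, 80, "No"), (1, 85, "No"), (2, 90, "No"), (12, 90, "Yes"),
    (8, 95, "No"), (4, 96, "Yes"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels; 0 for an empty or pure list."""
    if not labels:
        return 0.0
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def split_scores(data, threshold):
    """Gain and gain ratio of the binary split <= threshold vs. > threshold."""
    labels = [decision for _, _, decision in data]
    left = [decision for _, hum, decision in data if hum <= threshold]
    right = [decision for _, hum, decision in data if hum > threshold]
    n = len(data)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - remainder
    # split information: entropy of the partition sizes themselves
    split_info = entropy(["left"] * len(left) + ["right"] * len(right))
    gain_ratio = gain / split_info if split_info > 0 else 0.0
    return gain, gain_ratio

# every distinct humidity value except the largest is a candidate threshold
candidates = sorted({hum for _, hum, _ in rows})[:-1]
for t in candidates:
    gain, ratio = split_scores(rows, t)
    print(f"threshold {t}: gain={gain:.3f}, gain ratio={ratio:.3f}")

best = max(candidates, key=lambda t: split_scores(rows, t)[0])
print("gain-maximizing threshold:", best)  # expected: 80
```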
I think these calculation demonstrations are enough, so I will skip the remaining calculations and write only the results.
As seen, the gain is maximized when the threshold is equal to 80 for humidity. This means that, to create a branch in our tree, we compare the other nominal attributes against the binary split of humidity at 80.
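Recomputing that winning split from the sorted table (9 instances with humidity <= 80, 7 yes / 2 no; 5 instances with humidity > 80, 2 yes / 3 no; approximate, my own arithmetic):

Entropy(Decision | Humidity <= 80) = -(7/9)·log2(7/9) - (2/9)·log2(2/9) ≈ 0.764
Entropy(Decision | Humidity > 80) = -(2/5)·log2(2/5) - (3/5)·log2(3/5) ≈ 0.971
Gain(Decision, Humidity split at 80) = 0.940 - (9/14)·0.764 - (5/14)·0.971 ≈ 0.102
SplitInfo(Decision, Humidity split at 80) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.940
GainRatio(Decision, Humidity split at 80) ≈ 0.102 / 0.940 ≈ 0.108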
Let's summarize the calculated gains and gain ratios. The outlook attribute comes with both the maximum gain and the maximum gain ratio. This means that we need to put the outlook decision at the root of the decision tree.
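Collecting the values recomputed above (approximate; the temperature attribute is omitted here because its values are not reproduced in this extract):

Attribute               Gain    Gain ratio
Outlook                 0.246   0.156
Humidity (split at 80)  0.102   0.108
Wind                    0.048   0.049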
Outlook = Sunny
We've split humidity into greater than 80 and less than or equal to 80. Surprisingly, the decision is always no if humidity is greater than 80 when the outlook is sunny. Similarly, the decision is always yes if humidity is less than or equal to 80 for a sunny outlook.
Outlook = Overcast
If the outlook is overcast, then no matter what the temperature, humidity, or wind are, the decision will always be yes.
Outlook = Rain
We've just filtered the rain outlook instances. As seen, the decision is yes when the wind is weak, and it is no when the wind is strong. For example, the two strong-wind instances below are both concluded as no.
Day  Outlook  Temp.  Hum. > 80  Wind    Decision
6    Rain     65     No         Strong  No
14   Rain     71     No         Strong  No
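Putting the three branches together, the resulting tree can be sketched as follows (my own plain-text rendering of the rules stated above):

Outlook = Sunny:
    Humidity > 80  -> No
    Humidity <= 80 -> Yes
Outlook = Overcast -> Yes
Outlook = Rain:
    Wind = Strong -> No
    Wind = Weak   -> Yes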
So, the C4.5 algorithm solves most of the problems in ID3. The algorithm uses gain ratios instead of gains. In this way, it creates more generalized trees and is less prone to overfitting. Moreover, the algorithm transforms continuous attributes into nominal ones based on gain maximization, and in this way it can handle continuous data. Additionally, it can ignore instances with missing values, so it can handle datasets that contain missing data. On the other hand, both ID3 and C4.5 have high CPU and memory demands. Besides, many authorities consider decision tree algorithms to belong to the data mining field rather than machine learning.