Decision Trees
You have some past experiences (data) about restaurants: what kind of food they serve, how busy they are, their prices, and whether you ended up waiting or not.
Now you want to predict: will I have to wait if I go to a new restaurant with similar features?
The entropy of the target variable is
Entropy(V) = − Σ_k P(v_k) · log2 P(v_k)
Where:
V is the set of possible classes for the target variable (in this case, WillWait = {Yes, No}).
P(v_k) is the probability of class v_k in the dataset.
Step-by-step:
1. Count the occurrences of each class:
From the dataset of 12 samples:
Yes appears in: y1, y3, y4, y6, y8, y12 → 6 times
No appears in: y2, y5, y7, y9, y10, y11 → 6 times
So,
P(Yes) = 6/12 = 0.5
P(No) = 6/12 = 0.5
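This counting step is easy to verify in code. A minimal Python sketch, assuming the 12 WillWait labels are stored in a plain list (variable names are illustrative):

```python
from collections import Counter

# WillWait labels for y1..y12 (Yes for y1, y3, y4, y6, y8, y12)
labels = ["Yes", "No", "Yes", "Yes", "No", "Yes",
          "No", "Yes", "No", "No", "No", "Yes"]

counts = Counter(labels)                          # Counter({'Yes': 6, 'No': 6})
probs = {c: n / len(labels) for c, n in counts.items()}
print(probs)                                      # {'Yes': 0.5, 'No': 0.5}
```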
2. Compute the entropy of the whole dataset:
B = −(0.5 · log2(0.5) + 0.5 · log2(0.5)) = 1.0
3. Split by Pat and compute the entropy of each group:
Group 1: Pat = None
p1 = 0, n1 = 2
B = 0 (pure group: all No)
Weight: 2/12 ≈ 0.167
Contribution: 0.167 · 0 = 0
Group 2: Pat = Some
p2 = 4, n2 = 0
B = 0 (pure group: all Yes)
Weight: 4/12 ≈ 0.333
Contribution: 0.333 · 0 = 0
Group 3: Pat = Full
p3 = 2, n3 = 4
B = −(2/6 · log2(2/6) + 4/6 · log2(4/6)) ≈ −[0.333 · (−1.585) + 0.667 · (−0.585)] ≈ 0.918
Weight: 6/12 = 0.5
Contribution: 0.5 · 0.918 ≈ 0.459
Step 4: Remainder(Pat) = 0 + 0 + 0.459 = 0.459
Step 5: Gain(Pat)
Gain(Pat) = 1.0 − 0.459 = 0.541
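The same remainder-and-gain arithmetic can be checked in code. A small sketch, using the per-value (positive, negative) counts for Pat from the groups above (function and variable names are illustrative):

```python
from math import log2

def boolean_entropy(p, n):
    """Entropy (in bits) of a node with p positive and n negative examples."""
    if p == 0 or n == 0:
        return 0.0                               # a pure node has zero entropy
    q = p / (p + n)
    return -(q * log2(q) + (1 - q) * log2(1 - q))

# (positives, negatives) for each value of Pat, as in the groups above
pat_groups = {"None": (0, 2), "Some": (4, 0), "Full": (2, 4)}
total = sum(p + n for p, n in pat_groups.values())          # 12 examples

remainder = sum((p + n) / total * boolean_entropy(p, n)
                for p, n in pat_groups.values())
gain = boolean_entropy(6, 6) - remainder                    # overall entropy is 1.0
print(round(remainder, 3), round(gain, 3))                  # 0.459 0.541
```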
We have:
Entropy = −(6/12 · log2(6/12) + 6/12 · log2(6/12)) = −2 · (1/2) · log2(1/2) = 1.0
Let’s show the results of calculating Information Gain (IG) for each
attribute:
Feature    Information Gain
Pat        0.541
Price      0.196
➡ So, Pat gives the highest information gain = 0.541, making it the root node.
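Choosing the root is then just taking the attribute with the largest gain. A tiny sketch using only the two gains reported above (the gains of the other attributes are not listed in this section, so they are omitted):

```python
# Information gain per attribute, as reported in the comparison above
gains = {"Pat": 0.541, "Price": 0.196}

root = max(gains, key=gains.get)
print(root)    # 'Pat' -- the attribute with the highest gain becomes the root
```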
Interpretation:
Splitting on Pat produces the purest groups (Some is all Yes, None is all No), so the tree tests Pat first.
Example 1: Pat
Step-by-Step Formula:
IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)
Where:
S is the full set of examples and A is the attribute being tested.
S_v is the subset of examples for which A takes the value v.
From data:
Pat     WillWait
Some    Yes
Full    No
Some    Yes
Full    Yes
Full    No
Some    Yes
None    No
Some    Yes
Full    No
Full    No
None    No
Full    Yes
Counts:
None: 0 Yes, 2 No → Entropy = 0
Some: 4 Yes, 0 No → Entropy = 0
Full: 2 Yes, 4 No → Entropy ≈ 0.9183
Info Gain:
IG(S, Pat) = 1.0 − (2/12 · 0 + 4/12 · 0 + 6/12 · 0.9183) = 1.0 − 0.459 = 0.541
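The same 0.541 can be reproduced directly from the (Pat, WillWait) rows above. A short sketch, assuming the table is stored as a list of tuples (names are illustrative):

```python
from collections import Counter
from math import log2

# (Pat, WillWait) rows from the table above
rows = [("Some", "Yes"), ("Full", "No"), ("Some", "Yes"), ("Full", "Yes"),
        ("Full", "No"), ("Some", "Yes"), ("None", "No"), ("Some", "Yes"),
        ("Full", "No"), ("Full", "No"), ("None", "No"), ("Full", "Yes")]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

target = [w for _, w in rows]
gain = entropy(target)                      # 1.0 for the 6/6 split
for value in {v for v, _ in rows}:
    subset = [w for v, w in rows if v == value]
    gain -= len(subset) / len(rows) * entropy(subset)

print(round(gain, 3))                       # 0.541
```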
Example 2: Price
From the data, Price splits the 12 examples into subsets of sizes 6, 2, and 4, with subset entropies 1.0, 0, and 0.8113.
Info Gain:
IG(S, Price) = 1.0 − (6/12 · 1.0 + 2/12 · 0 + 4/12 · 0.8113) = 1.0 − (0.5 + 0 + 0.2704) = 0.2296
(Note: We got ~0.1957 earlier due to rounding/float precision)
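As a quick numerical check of this Price calculation, here is a sketch; the per-subset class splits (3, 3), (2, 0) and (3, 1) are inferred from the stated subset entropies 1.0, 0 and 0.8113, not given explicitly in the data:

```python
from math import log2

def boolean_entropy(p, n):
    if p == 0 or n == 0:
        return 0.0
    q = p / (p + n)
    return -(q * log2(q) + (1 - q) * log2(1 - q))

# Price subsets of sizes 6, 2 and 4 with the class splits implied above
price_subsets = [(3, 3), (2, 0), (3, 1)]
remainder = sum((p + n) / 12 * boolean_entropy(p, n) for p, n in price_subsets)
print(round(1.0 - remainder, 4))    # 0.2296
```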
Summary:
Feature    Information Gain
Pat        0.541
Price      0.2296
This equation is a statistical test used to measure how much the actual class distributions in the subsets deviate from what we'd expect if the feature were irrelevant. It is a χ² (chi-square) statistic of the kind used in decision tree learning, for example for χ²-based pruning or for split selection in algorithms such as CHAID, to decide whether a feature is useful for splitting.
Notation Breakdown
p: Total number of positive examples (e.g., WillWait = Yes)
n: Total number of negative examples (WillWait = No)
d: Number of distinct values the attribute takes (number of subsets after a split)
For subset k:
p_k: Actual number of positive examples in subset k
n_k: Actual number of negative examples in subset k
p̂_k, n̂_k: Expected positives and negatives if the attribute were irrelevant, i.e.
p̂_k = p · (p_k + n_k) / (p + n) and n̂_k = n · (p_k + n_k) / (p + n)
Δ = Σ_{k=1}^{d} [ (p_k − p̂_k)² / p̂_k + (n_k − n̂_k)² / n̂_k ]
This tells you how far each subset's actual class distribution is from the expected one.
Higher Δ → more likely the attribute is important.
Example (Simplified):
Let’s say you have 12 examples:
Total p = 6, n = 6
Split by Pat (3 values: None, Some, Full):
Actual counts:
Pat     Total   p_k (Yes)   n_k (No)
None    2       0           2
Some    4       4           0
Full    6       2           4
Expected counts (if Pat were irrelevant):
Pat     Total   p̂_k         n̂_k
None    2       1           1
Some    4       2           2
Full    6       3           3
Now compute Δ:
Δ = Σ_k [ (p_k − p̂_k)² / p̂_k + (n_k − n̂_k)² / n̂_k ]
For None:
(0 − 1)² / 1 + (2 − 1)² / 1 = 1 + 1 = 2
For Some:
(4 − 2)² / 2 + (0 − 2)² / 2 = 4/2 + 4/2 = 4
For Full:
(2 − 3)² / 3 + (4 − 3)² / 3 = 1/3 + 1/3 = 2/3
Total Δ = 2 + 4 + 2/3 ≈ 6.67
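This Δ computation is easy to reproduce. A short sketch, assuming the actual counts from the table above and deriving the expected counts as p̂_k = p · (p_k + n_k) / (p + n) (variable names are illustrative):

```python
# actual (positives, negatives) per value of Pat
actual = {"None": (0, 2), "Some": (4, 0), "Full": (2, 4)}
p = sum(pk for pk, _ in actual.values())   # 6
n = sum(nk for _, nk in actual.values())   # 6

delta = 0.0
for pk, nk in actual.values():
    size = pk + nk
    pk_hat = p * size / (p + n)            # expected positives if Pat were irrelevant
    nk_hat = n * size / (p + n)            # expected negatives
    delta += (pk - pk_hat) ** 2 / pk_hat + (nk - nk_hat) ** 2 / nk_hat

print(round(delta, 2))                     # 6.67
```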
Interpretation:
High Δ (like 6.67) → the actual counts deviate significantly from the expected counts → the attribute is useful.
Low Δ (≈ 0) → the class distribution looks random → the attribute is not useful.
This statistic can be used to perform a hypothesis test (comparing Δ against a χ² distribution with d − 1 degrees of freedom) or as an alternative to Information Gain when selecting attributes.
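For the hypothesis-test reading, Δ can be compared against a χ² distribution. A sketch using scipy, assuming d − 1 = 2 degrees of freedom for the three values of Pat (the 0.05 threshold is just the conventional choice):

```python
from scipy.stats import chi2

delta = 6.67
dof = 3 - 1                     # d - 1 degrees of freedom for d = 3 attribute values
p_value = chi2.sf(delta, dof)   # probability of a deviation this large if Pat were irrelevant
print(round(p_value, 3))        # ~0.036 -> below 0.05, so Pat looks genuinely informative
```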