02 DecisionTrees Done

A decision tree is a supervised learning algorithm used for classification and regression, which organizes data into a tree structure based on feature attributes. The ID3 algorithm is commonly used to select the best attributes for splitting data, utilizing information gain to measure the effectiveness of each attribute. The document also discusses the concept of entropy and how it relates to information gain in decision tree construction.

Decision Trees

A decision tree is a supervised learning algorithm that is used for classification and regression modeling.
Sample Dataset
• Columns denote features Xi
• Rows denote labeled instances
• Class label denotes whether a tennis game was played
Decision Tree
• A possible decision tree for the data:

• Each internal node: tests one attribute Xi
• Each branch from a node: selects one value for Xi
• Each leaf node: predicts Y
Decision Tree
• A possible decision tree for the data:

• What prediction would we make for
  <outlook=sunny, temperature=hot, humidity=high, wind=weak> ?

  NO
Decision Tree
• If features are continuous, internal nodes can
test the value of a feature against a threshold
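For instance, such a node stores a single threshold and routes an instance by comparing the feature value against it. A minimal Python sketch; the feature name and the threshold value are made up for illustration:

def internal_node_test(instance):
    # Internal node for a continuous feature Xi: compare its value to a threshold.
    # The threshold 3.5 is a hypothetical value, not from the slides.
    return "left_child" if instance["x1"] <= 3.5 else "right_child"

print(internal_node_test({"x1": 2.0}))  # left_child
print(internal_node_test({"x1": 7.2}))  # right_child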
Decision Tree Induced
Partition
Decision Tree – Decision Boundary
• Decision trees divide the feature space into axis-parallel (hyper-)rectangles
• Each rectangular region is labeled with one label
  – or a probability distribution over labels

[Figure: decision boundary]
Another Example:
Restaurant Domain (Russell & Norvig)

Model a patron's decision of whether to wait for a table at a restaurant

~7,000 possible cases

A Decision Tree
from Introspection

Is this the best decision tree?


Preference bias: Ockham's Razor

Idea: the simplest consistent explanation is the best

• Therefore, the smallest decision tree that correctly classifies all of the training examples is best
• Finding the provably smallest decision tree is NP-hard
• ...So instead of constructing the absolute smallest tree consistent with the training examples, construct one that is pretty small
Choosing the Best Attribute

Key problem: choosing which attribute to split a given set of examples
• Some possibilities are:
  – Random: select any attribute at random
  – Least-Values: choose the attribute with the smallest number of possible values
  – Most-Values: choose the attribute with the largest number of possible values
  – Max-Gain: choose the attribute that has the largest expected information gain, i.e., the attribute that results in the smallest expected size of the subtrees rooted at its children
• The ID3 algorithm uses the Max-Gain method of selecting the best attribute
Basic Algorithm for Top-Down Induction of Decision Trees
[ID3, C4.5 by Quinlan] (ID3 = Iterative Dichotomiser 3)

node = root of decision tree
Main loop:
1. A ← the "best" decision attribute for the next node.
2. Assign A as the decision attribute for node.
3. For each value of A, create a new descendant of node.
4. Sort the training examples to the leaf nodes.
5. If the training examples are perfectly classified, stop. Else, recurse over the new leaf nodes.

How do we choose which attribute is best?
The ID3 algorithm uses the Max-Gain method of selecting the best attribute.
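A minimal sketch of this loop in Python, assuming examples are dicts mapping attribute names to values and the tree is built as nested dicts; the gain argument is any attribute-scoring function (ID3 uses information gain, which is defined later in these slides). The names are illustrative, not from the slides.

from collections import Counter

def id3(examples, attributes, target, gain):
    """Top-down induction of a decision tree, returned as nested dicts."""
    labels = [ex[target] for ex in examples]
    # Stop when the examples at this node are perfectly classified...
    if len(set(labels)) == 1:
        return labels[0]
    # ...or when no attributes remain: predict the majority label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: pick the "best" decision attribute for this node (Max-Gain).
    best = max(attributes, key=lambda a: gain(examples, a, target))
    tree = {best: {}}
    # Steps 3-5: one descendant per value of A; sort examples to it and recurse.
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target, gain)
    return tree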
Choosing an Attribute

Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"

Which split is more informative: Patrons? or Type?
ID3-induced
Decision Tree
Compare the Two Decision Trees
Information Gain

Which test is more informative?


[Figure: two candidate tests: a split over whether Balance exceeds 50K (branches: less or equal 50K / over 50K), and a split over whether the applicant is employed (branches: Unemployed / Employed)]


Information Gain
Impurity/Entropy (informal)
– Measures the level of impurity in a group of examples

Impurity

[Figure: three groups of examples, from a very impure group, to a less impure group, to minimum impurity]
Entropy: a common way to measure impurity

Entropy H(X) of a random variable X (summing over the possible values of X):

  H(X) = - Σ_i P(X = i) log2 P(X = i)

H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code)

Why? Information theory:
• The most efficient code assigns -log2 P(X = i) bits to encode the message X = i
• So the expected number of bits to code one random X is Σ_i P(X = i) · (-log2 P(X = i)) = H(X)
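As a small numeric check of this definition (a Python sketch using only the standard library):

import math

def entropy(probabilities):
    # H(X) = -sum_i P(X=i) * log2 P(X=i); zero-probability values contribute 0.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0   -> a fair coin needs 1 bit per outcome
print(entropy([0.9, 0.1]))  # ~0.47 -> a biased coin needs fewer bits on average
print(entropy([1.0]))       # 0.0   -> a pure group has minimum impurity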
Information Gain
• We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned.
• Information gain tells us how important a given attribute of the feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a decision tree.
From Entropy to Information Gain

Entropy H(X) of a random variable X:
  H(X) = - Σ_i P(X = i) log2 P(X = i)

Specific conditional entropy H(X|Y=v) of X given Y=v:
  H(X|Y=v) = - Σ_i P(X = i | Y = v) log2 P(X = i | Y = v)

Conditional entropy H(X|Y) of X given Y:
  H(X|Y) = Σ_v P(Y = v) H(X|Y=v)

Mutual information (Information Gain) of X and Y:
  I(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
Information Gain

Information Gain is the mutual information between input attribute A and target variable Y.

Information Gain is the expected reduction in entropy of target variable Y for data sample S, due to sorting on variable A.
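These quantities can be estimated from a labeled sample with a few lines of Python (standard library only; the function names and the toy data are mine, not the slides'):

import math
from collections import Counter

def entropy_of(values):
    # H(X) estimated from an observed sample of X.
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def conditional_entropy(xs, ys):
    # H(X|Y) = sum_v P(Y=v) * H(X | Y=v)
    total = len(ys)
    return sum((count / total) * entropy_of([x for x, y in zip(xs, ys) if y == v])
               for v, count in Counter(ys).items())

def information_gain_xy(xs, ys):
    # I(X, Y) = H(X) - H(X|Y)
    return entropy_of(xs) - conditional_entropy(xs, ys)

# Toy sample: X = class label, Y = attribute value.
x = ["No", "No", "Yes", "Yes", "Yes", "No"]
y = ["Strong", "Strong", "Weak", "Weak", "Weak", "Strong"]
print(information_gain_xy(x, y))  # 1.0: the attribute fully determines the label here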
Calculating Information Gain

Information Gain = entropy(parent) - [average entropy(children)]

Entire population (30 instances), split into one child with 17 instances and one with 13 instances:

  parent entropy  = -(14/30) log2(14/30) - (16/30) log2(16/30) ≈ 0.996
  child 1 entropy = -(13/17) log2(13/17) - (4/17) log2(4/17)   ≈ 0.787   (17 instances)
  child 2 entropy = -(1/13) log2(1/13) - (12/13) log2(12/13)   ≈ 0.391   (13 instances)

  (Weighted) Average Entropy of Children = (17/30)(0.787) + (13/30)(0.391) ≈ 0.615

  Information Gain = 0.996 - 0.615 = 0.38
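A quick Python check of the arithmetic above (standard library only; the printed values differ from the slide figures only in the final rounding digit):

import math

def entropy_from_counts(counts):
    # Entropy of a class distribution given raw class counts.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

parent  = entropy_from_counts([14, 16])              # ~0.997
child1  = entropy_from_counts([13, 4])               # ~0.787  (17 instances)
child2  = entropy_from_counts([1, 12])               # ~0.391  (13 instances)
average = (17 / 30) * child1 + (13 / 30) * child2    # ~0.616
print("information gain:", parent - average)         # ~0.38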
Entropy-Based Automatic Decision Tree Construction

Training Set X:
  x1 = (f11, f12, …, f1m)
  x2 = (f21, f22, …, f2m)
  .
  .
  xn = (fn1, fn2, …, fnm)

At Node 1: what feature should be used? What values?

Quinlan suggested information gain, based on entropy, in his ID3 system.
Day  Outlook   Temp  Humidity  Wind    Play Tennis
 1   Sunny     Hot   High      Weak    No
 2   Sunny     Hot   High      Strong  No
 3   Overcast  Hot   High      Weak    Yes
 4   Rain      Mild  High      Weak    Yes
 5   Rain      Cool  Normal    Weak    Yes
 6   Rain      Cool  Normal    Strong  No
 7   Overcast  Cool  Normal    Strong  Yes
 8   Sunny     Mild  High      Weak    No
 9   Sunny     Cool  Normal    Weak    Yes
10   Rain      Mild  Normal    Weak    Yes
11   Sunny     Mild  Normal    Strong  Yes
12   Overcast  Mild  High      Strong  Yes
13   Overcast  Hot   Normal    Weak    Yes
14   Rain      Mild  High      Strong  No
Example

[Figure: the Play Tennis table (Days 1-14, as above) shown alongside a decision tree rooted at Outlook, with an Overcast branch predicting Yes, one branch testing Humidity (leaves No / Yes), and one branch testing Wind (leaves No / Yes)]
Example

[Figure: two candidate splits, Question 1 and Question 2, each with Yes / No branches]

Example

[Figure: two more candidate splits, Question 3 and Question 4, each with Yes / No branches]

Question 1 vs. Question 2:
  parent entropy E = 1 for both
  Question 1 children: E = 0.97 and E = 0.92
  Question 2 children: E = 0.72 and E = 0

Information Gain = E(parent) - Σ_k (n_k / n) · E(child k), so Question 2 is the more informative split.


(The Play Tennis table from above, Days 1-14, is repeated here as a reference for the gain calculations that follow.)
E(S) = 0.940 for the full Play Tennis sample (9 Yes, 5 No)

Splitting on Wind: children entropies E = 0.811 (Weak) and E = 1 (Strong)
  G(S, Wind) = 0.048
Splitting on Humidity: children entropies E = 0.985 (High) and E = 0.592 (Normal)
  G(S, Humidity) = 0.151
Splitting on Temp: children entropies E = 1 (Hot), E = 0.92 (Mild), E = 0.81 (Cool)
  G(S, Temp) = 0.029
Splitting on Outlook: children entropies E = 0.971 (Sunny), E = 0 (Overcast), E = 0.971 (Rain)
  G(S, Outlook) = 0.246

Outlook has the largest gain, so it is chosen as the root of the tree.
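These gains can be reproduced from the Play Tennis table with a short Python script (standard library only; differences in the last printed digit versus the slide figures are rounding):

import math
from collections import Counter

def entropy(values):
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def information_gain(examples, attribute, target):
    # Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)
    total = len(examples)
    remainder = sum(
        (count / total) * entropy([ex[target] for ex in examples if ex[attribute] == value])
        for value, count in Counter(ex[attribute] for ex in examples).items())
    return entropy([ex[target] for ex in examples]) - remainder

names = ["Outlook", "Temp", "Humidity", "Wind", "PlayTennis"]  # PlayTennis is the class label
rows = [  # Days 1-14 of the table above
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
examples = [dict(zip(names, r)) for r in rows]

for attr in ["Outlook", "Temp", "Humidity", "Wind"]:
    print(attr, round(information_gain(examples, attr, "PlayTennis"), 3))
# prints: Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048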
The resulting decision tree:

  Outlook
    Sunny    → Humidity:  High → No,  Normal → Yes
    Overcast → Yes
    Rain     → Wind:  Strong → No,  Weak → Yes
Example

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

Content: The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

Link to the data set: Pima Indians Diabetes Database (kaggle.com), to download the .csv file.
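A sketch of how this dataset could be used to grow a tree with scikit-learn; it assumes the downloaded file is named diabetes.csv with the usual Kaggle column layout (eight numeric predictors plus an Outcome column) and that pandas and scikit-learn are installed. None of this is prescribed by the slides.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed file name and column layout for the Kaggle Pima Indians Diabetes data.
data = pd.read_csv("diabetes.csv")
X = data.drop(columns=["Outcome"])
y = data["Outcome"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# criterion="entropy" selects splits by information gain, as in ID3; since every
# feature here is continuous, each internal node tests a feature against a threshold.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))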
Example
Example
