Lecture 19 - Decision Trees

A decision tree is a classifier represented as a tree structure, where internal nodes test attributes, branches correspond to attribute values, and leaf nodes assign classifications. An example decision tree for predicting whether to play tennis based on outlook, temperature, humidity, and wind is presented. The tree classifies an instance by testing an attribute at each internal node and following the branch matching the instance's value for that attribute, until a leaf node with a classification is reached. Decision trees are well suited to problems where instances are represented by attribute-value pairs, the target concept takes discrete values, and disjunctive descriptions may be needed.


Decision trees

Definition
• A decision tree is a classifier in the form of a tree structure with two types of nodes:
– Decision node: specifies a choice or test of some attribute, with one branch for each outcome
– Leaf node: indicates the classification of an example
Decision Tree Example 1
Whether to approve a loan

Employed?
  No  → Credit Score?
          High → Approve
          Low  → Reject
  Yes → Income?
          High → Approve
          Low  → Reject


Data Set
Day Outlook Temperature Humidity Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Decision Tree for PlayTennis
• Attributes and their values:
– Outlook: Sunny, Overcast, Rain
– Humidity: High, Normal
– Wind: Strong, Weak
– Temperature: Hot, Mild, Cool

• Target concept PlayTennis: Yes, No


Decision Tree for PlayTennis
Outlook
  Sunny    → Humidity
               High   → No
               Normal → Yes
  Overcast → …
  Rain     → …

• Each internal node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a classification

A decision tree
• Concept: PlayTennis

Outlook
  Sunny    → Humidity
               High   → No
               Normal → Yes
  Overcast → Yes
  Rain     → Wind
               Strong → No
               Weak   → Yes
Classification by a decision tree
• Instance
<Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>

Outlook
  Sunny    → Humidity
               High   → No
               Normal → Yes
  Overcast → Yes
  Rain     → Wind
               Strong → No
               Weak   → Yes

Following the instance down the tree: Outlook = Sunny selects the Humidity test, and Humidity = High leads to the leaf No, so the instance is classified as PlayTennis = No.
Disjunction of conjunctions
(Outlook = Sunny ^ Humidity = Normal)
v (Outlook = Overcast)
v (Outlook = Rain ^ Wind = Weak)

Outlook
  Sunny    → Humidity
               High   → No
               Normal → Yes
  Overcast → Yes
  Rain     → Wind
               Strong → No
               Weak   → Yes
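
Read off the Yes leaves, this rule set is an executable boolean expression. A minimal Python rendering (the function play_tennis is my own illustration, not part of the lecture):

def play_tennis(outlook, humidity, wind):
    """Disjunction of conjunctions read off the tree's Yes leaves."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

# The instance from the previous slide is classified No:
assert play_tennis("Sunny", "High", "Strong") is False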
Problems suited to decision trees

• Instances are represented by attribute-value pairs
• The target function has discrete output values
• Disjunctive descriptions may be required
• The training data may contain errors
• The training data may contain missing attribute values
Training data
Day Outlook Temperature Humidity Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Which attribute should be tested at each node?
• We want to build a small decision tree

• Information gain
– How well a given attribute separates the training
examples according to their target classification
– Reduction in entropy
• Entropy
– (im)purity of an arbitrary collection of examples
Example
• Let's try an example!
– Let E([X+, Y−]) denote the entropy of a collection containing X positive and Y negative training examples. The entropy of the training data is then E(S) = E([9+, 5−]), because 9 of the 14 training examples are Yes and 5 are No.
Entropy
• If there are only two classes:
  Entropy(S) = −(p+) log2(p+) − (p−) log2(p−)
• In general, for c classes:
  Entropy(S) = −Σ i=1..c  pi log2(pi)

*S = dataset; p+ and p− are the proportions of positive and negative examples, and pi is the proportion of examples in class i
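
As a check, plugging the 9 Yes / 5 No counts from the example above into the two-class formula gives:

  E([9+, 5−]) = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.940 bits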
Information Gain
• The expected reduction in entropy achieved by splitting the training examples on an attribute A:
  Gain(S, A) = Entropy(S) − Σ v∈Values(A)  (|Sv| / |S|) Entropy(Sv)
  where Sv is the subset of S for which attribute A has value v
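
To make the definition concrete, here is a minimal sketch in plain Python (the helper names entropy and information_gain are my own, not from the lecture):

from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Expected entropy reduction from splitting on one attribute:
    values[i] is example i's value for that attribute, labels[i] its class."""
    n = len(labels)
    # Partition the class labels by attribute value.
    partitions = {}
    for v, label in zip(values, labels):
        partitions.setdefault(v, []).append(label)
    return entropy(labels) - sum(
        (len(part) / n) * entropy(part) for part in partitions.values())

# PlayTennis labels for D1..D14 and each day's Humidity value (from the table):
labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
          "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
humidity = ["High", "High", "High", "High", "Normal", "Normal", "Normal",
            "High", "Normal", "Normal", "Normal", "High", "Normal", "High"]
print(round(entropy(labels), 3))                     # 0.94
print(round(information_gain(humidity, labels), 3))  # 0.151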
Example
Computing Information Gain

• Humidity splits S into High → [3+, 4−] (entropy ≈ 0.985) and Normal → [6+, 1−] (entropy ≈ 0.592):
  Gain(S, Humidity) = 0.940 − (7/14)(0.985) − (7/14)(0.592) ≈ 0.151
• Wind splits S into Weak → [6+, 2−] (entropy ≈ 0.811) and Strong → [3+, 3−] (entropy = 1.000):
  Gain(S, Wind) = 0.940 − (8/14)(0.811) − (6/14)(1.000) ≈ 0.048
Which attribute is the best classifier?
• Compare the information gain of every attribute on the full training set:
  – Gain(S, Outlook) = 0.246
  – Gain(S, Humidity) = 0.151
  – Gain(S, Wind) = 0.048
  – Gain(S, Temperature) = 0.029
• Outlook yields the largest reduction in entropy, so it is tested first.
Splitting training data with Outlook
{D1, D2, …, D14}  [9+, 5−]

Outlook
  Sunny    → {D1, D2, D8, D9, D11}   [2+, 3−] → ?
  Overcast → {D3, D7, D12, D13}      [4+, 0−] → Yes
  Rain     → {D4, D5, D6, D10, D14}  [3+, 2−] → ?

The Overcast subset is pure, so it becomes a Yes leaf. The Sunny and Rain subsets are still mixed, so the algorithm recurses on each of them with the remaining attributes.
Python Code
from sklearn import tree
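# X: feature matrix of shape (n_samples, n_features); y: class labels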

clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
• Resource for coding decision trees in Python:

– https://scikit-learn.org/stable/modules/tree.html
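
As a concrete end-to-end illustration, the sketch below fits a tree to the PlayTennis table from this lecture. A few assumptions worth flagging: scikit-learn's DecisionTreeClassifier implements CART (binary splits) rather than multiway ID3, criterion="entropy" is passed so that splits are scored by information gain as described above, and the categorical attributes are one-hot encoded with DictVectorizer because the estimator expects numeric features.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

# The 14 training examples (D1..D14) from the lecture's table.
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
attrs = ["Outlook", "Temperature", "Humidity", "Wind"]
records = [dict(zip(attrs, r[:4])) for r in rows]
y = [r[4] for r in rows]

# One-hot encode the categorical attributes into a numeric matrix X.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

# criterion="entropy" selects splits by information gain; note that
# sklearn builds binary CART trees, not multiway ID3 trees.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Classify the instance from the "Classification by a decision tree" slide:
# <Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong> -> expected No
instance = vec.transform([{"Outlook": "Sunny", "Temperature": "Hot",
                           "Humidity": "High", "Wind": "Strong"}])
print(clf.predict(instance))  # ['No']

# Print the learned tree as text.
print(export_text(clf, feature_names=list(vec.get_feature_names_out())))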
