Lecture 19 - Decision Trees
Definition
• A decision tree is a classifier in the form of a tree structure with two types of nodes:
– Decision node: specifies a choice or test of some attribute, with one branch for each outcome
– Leaf node: indicates the classification of an example
Decision Tree Example 1
Whether to approve a loan
[Figure: loan-approval decision tree. Root decision node "Employed?": the No branch tests "Credit Score?" (High/Low) and the Yes branch tests "Income?" (High/Low); each outcome ends in a Yes or No leaf.]
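The loan-approval tree can be sketched as nested conditionals. This is a minimal sketch; the leaf outcomes (which High/Low branches end in approval) are assumptions, since the slide figure only shows the tests and branch labels:

```python
def approve_loan(employed, income, credit_score):
    """Walk the loan-approval tree: each if is a decision node,
    each return is a leaf node. Leaf outcomes are assumed."""
    if employed == "Yes":               # decision node: Employed?
        return income == "High"         # decision node: Income? -> Yes/No leaf
    else:
        return credit_score == "High"   # decision node: Credit Score? -> Yes/No leaf

print(approve_loan("Yes", "High", "Low"))   # employed with high income -> True
print(approve_loan("No", "Low", "Low"))     # unemployed, low credit score -> False
```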
Classification by a decision tree
• Instance
<Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>
[Figure: the PlayTennis decision tree. Outlook is the root; the Sunny branch tests Humidity (High -> No, Normal -> Yes), Overcast -> Yes, and the Rain branch tests Wind (Strong -> No, Weak -> Yes). The instance above takes Sunny -> Humidity = High, so it is classified No.]
Disjunction of conjunctions
(Outlook = Sunny ^ Humidity = Normal)
v (Outlook = Overcast)
v (Outlook = Rain ^ Wind = Weak)
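The disjunction above can be written directly as a boolean function. A minimal sketch, with attribute values as plain strings (Temperature is omitted because this tree never tests it):

```python
def play_tennis(outlook, humidity, wind):
    """The tree as a disjunction of conjunctions:
    one conjunct per path from the root to a Yes leaf."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

# The instance <Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong>
# satisfies none of the three conjuncts:
print(play_tennis("Sunny", "High", "Strong"))  # False -> classified No
```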
Choosing the splitting attribute
• Information gain
– How well a given attribute separates the training
examples according to their target classification
– Reduction in entropy
• Entropy
– (im)purity of an arbitrary collection of examples
Example
• Let’s try an example!
– Let E([X+, Y-]) denote the entropy of a collection with X positive and Y negative training examples. The entropy of the training data, E(S), can then be written E([9+, 5-]), because 9 of the 14 training examples are labeled Yes and 5 are labeled No.
Entropy
• If there are only two classes, with proportions p+ and p-:
  Entropy(S) = -p+ log2(p+) - p- log2(p-)
• In general, for c classes:
  Entropy(S) = sum_{i=1..c} -p_i log2(p_i)
*S = the dataset; p_i = the proportion of examples in S belonging to class i
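As a quick sketch, the general formula can be computed from a list of class counts (the function name and interface here are illustrative, not from the slides):

```python
from math import log2

def entropy(counts):
    """Entropy of a collection given its class counts, e.g. [9, 5] for E([9+,5-])."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

print(entropy([9, 5]))   # the training data E([9+,5-]) -> about 0.940
print(entropy([7, 7]))   # a 50/50 split is maximally impure -> 1.0
print(entropy([14, 0]))  # a pure collection -> 0
```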
Information Gain
• The expected reduction in entropy achieved by splitting the training examples on attribute A:
  Gain(S, A) = Entropy(S) - sum_{v in Values(A)} (|S_v| / |S|) Entropy(S_v)
  where S_v is the subset of S for which attribute A has value v
Example
Computing information gain: which attribute (Humidity, Wind, or Outlook) best splits the training examples?
[Figure: candidate root splits on Humidity, Wind, and Outlook. Splitting on Outlook classifies the Overcast branch as Yes immediately; the other two branches still need further tests, shown as "?".]
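A self-contained sketch of this computation, assuming the standard 14-example PlayTennis counts (9 Yes / 5 No overall; Humidity splits into [3+,4-]/[6+,1-], Wind into [6+,2-]/[3+,3-], Outlook into [2+,3-]/[4+,0-]/[3+,2-]):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a collection with pos positive and neg negative examples."""
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

def gain(parent, splits):
    """Information gain of a split; parent and each split are (pos, neg) pairs."""
    n = sum(parent)
    return entropy(*parent) - sum(
        (p + q) / n * entropy(p, q) for p, q in splits
    )

S = (9, 5)  # E(S) = E([9+,5-]) ~ 0.940
print(gain(S, [(3, 4), (6, 1)]))          # Humidity: High, Normal  -> ~0.15
print(gain(S, [(6, 2), (3, 3)]))          # Wind: Weak, Strong      -> ~0.05
print(gain(S, [(2, 3), (4, 0), (3, 2)]))  # Outlook                 -> ~0.25
# Outlook has the highest gain, so it becomes the root decision node.
```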
Python Code
from sklearn import tree
# X: feature matrix, y: class labels (assumed already loaded)
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
• Resource for coding decision trees in Python
– https://scikit-learn.org/stable/modules/tree.html
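To make the snippet above runnable end to end, here is a sketch using scikit-learn's built-in iris dataset in place of the unspecified X and y; setting criterion="entropy" makes the tree choose splits by information gain:

```python
from sklearn import tree
from sklearn.datasets import load_iris

# Load a small built-in dataset just so the example is self-contained;
# X is the feature matrix, y holds the class labels.
X, y = load_iris(return_X_y=True)

# criterion="entropy" selects splits by information gain
# (the default, "gini", uses Gini impurity instead).
clf = tree.DecisionTreeClassifier(criterion="entropy")
clf = clf.fit(X, y)

# Predict the class of the first training example.
print(clf.predict(X[:1]))
```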