Decision Trees
[Figure: introductory decision-tree example with attribute tests such as outlook (sunny/rainy) and size (small/med/big) leading to yes/no leaves.]
Decision (chance) nodes
• Each internal node of a DT is a decision point where some condition is tested
• The result of this test determines which branch of the tree is taken next
• Such nodes are therefore called decision nodes, chance nodes, or non-terminal nodes
• A chance node partitions the data reaching it so as to maximize the differences in the dependent variable between the resulting partitions
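The split-selection idea in the last bullet can be sketched with entropy-based information gain, one common criterion (used by ID3/C4.5; CART uses the Gini index instead). A minimal Python sketch; `rows` and `labels` are hypothetical parallel lists of attribute tuples and class labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    # Reduction in entropy after partitioning rows on one attribute
    before = entropy(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    after = sum(len(part) / len(labels) * entropy(part)
                for part in partitions.values())
    return before - after
```

The attribute with the largest gain produces the "most different" partitions and is chosen as the test at that node.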
Terminal nodes
• The leaf nodes of a DT are called terminal nodes
• They indicate the class into which a data instance is classified
• Each has exactly one incoming edge
• They have no child nodes (no outgoing edges)
• No condition is tested at a terminal node
• Traversing the tree from the root to a leaf yields the production rule for that leaf's class
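Rule extraction is just an enumeration of all root-to-leaf paths. A sketch, assuming a hypothetical nested-dict encoding of a tree, `{attribute: {value: subtree_or_class}}`:

```python
def extract_rules(tree, conditions=()):
    """Return one production rule per root-to-leaf path."""
    if not isinstance(tree, dict):          # terminal node: emit a rule
        return [f"IF {' AND '.join(conditions)} THEN class = {tree}"]
    (attribute, branches), = tree.items()   # one test per decision node
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + (f"{attribute} = {value}",))
    return rules

# The play-tennis tree from Quinlan's example, in the assumed encoding
play_tree = {"Outlook": {
    "sunny":    {"Humidity": {"high": "N", "normal": "P"}},
    "overcast": "P",
    "rainy":    {"Windy": {"yes": "N", "no": "P"}},
}}
for rule in extract_rules(play_tree):
    print(rule)
```

Each of the five leaves yields one rule, e.g. "IF Outlook = sunny AND Humidity = high THEN class = N".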
Advantages of DT
• Easy to understand and interpret
• Works for both categorical and quantitative data
• A DT can grow to any depth
• Attributes can be tested in any desired order
• Pruning a DT is easy
• Can handle missing or null values
Advantages contd.
• Can be used to identify outliers
• Production rules can be read directly from the built DT
• Relatively fast to build and apply compared with many other classification models
• A DT can be built even when domain experts are unavailable
Disadvantages
• A DT imposes a fixed sequence of decisions: a poor split near the root cannot be undone lower down
• Class-overlap problem: overlapping classes force large, inaccurate trees
• Correlated attributes can lead to redundant splits
• Deep trees produce long, complex production rules
• Greedy, one-attribute-at-a-time induction means the resulting DT can be sub-optimal
Quinlan’s classical example
#  Outlook  Temperature  Humidity  Windy  Play (Class)
1 sunny hot high no N
2 sunny hot high yes N
3 overcast hot high no P
4 rainy moderate high no P
5 rainy cold normal no P
6 rainy cold normal yes N
7 overcast cold normal yes P
8 sunny moderate high no N
9 sunny cold normal no P
10 rainy moderate normal no P
11 sunny moderate normal yes P
12 overcast moderate high yes P
13 overcast hot normal no P
14 rainy moderate high yes N
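To connect this table to the trees that follow: ranking the four attributes by information gain shows why a greedy inducer such as ID3 puts Outlook at the root. A self-contained sketch (the helper names are mine; the data is the table above):

```python
from collections import Counter
from math import log2

# Quinlan's 14 instances: (Outlook, Temperature, Humidity, Windy, Play)
data = [
    ("sunny", "hot", "high", "no", "N"),      ("sunny", "hot", "high", "yes", "N"),
    ("overcast", "hot", "high", "no", "P"),   ("rainy", "moderate", "high", "no", "P"),
    ("rainy", "cold", "normal", "no", "P"),   ("rainy", "cold", "normal", "yes", "N"),
    ("overcast", "cold", "normal", "yes", "P"), ("sunny", "moderate", "high", "no", "N"),
    ("sunny", "cold", "normal", "no", "P"),   ("rainy", "moderate", "normal", "no", "P"),
    ("sunny", "moderate", "normal", "yes", "P"), ("overcast", "moderate", "high", "yes", "P"),
    ("overcast", "hot", "normal", "no", "P"), ("rainy", "moderate", "high", "yes", "N"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, i):
    # Information gain of splitting the rows on attribute i
    labels = [r[-1] for r in rows]
    parts = {}
    for r in rows:
        parts.setdefault(r[i], []).append(r[-1])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

gains = {ATTRS[i]: round(gain(data, i), 3) for i in range(4)}
print(gains)  # Outlook has the largest gain, so it is tested first
```

Outlook's gain (about 0.247 bits) dominates the other three attributes, which is exactly the "Simple Tree" below; rooting the tree at a low-gain attribute such as Temperature gives the "Complicated Tree".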
Simple Tree
Outlook
  sunny    -> Humidity
                high   -> N
                normal -> P
  overcast -> P
  rainy    -> Windy
                yes -> N
                no  -> P
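This tree translates directly into nested conditionals. A sketch, assuming a data instance is a dict keyed by attribute name:

```python
def classify(instance):
    # Walk the simple tree from the root (Outlook) to a terminal node
    if instance["Outlook"] == "sunny":
        # The sunny branch tests Humidity
        return "N" if instance["Humidity"] == "high" else "P"
    if instance["Outlook"] == "overcast":
        return "P"  # overcast is a pure partition: always Play
    # The rainy branch tests Windy
    return "N" if instance["Windy"] == "yes" else "P"
```

For example, `classify({"Outlook": "rainy", "Windy": "no"})` follows the rainy branch to the no leaf and returns "P".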
Complicated Tree
[Figure: a larger tree for the same data, rooted at Temperature (branches hot, moderate, cold) and re-testing Windy and Outlook in its subtrees.]