Decision Trees
Joseph C. Lorilla
Background
• Decision Trees are the foundation for many
classical machine learning algorithms like
Random Forests, Bagging, and Boosted Decision
Trees. The CART formulation was developed by Leo
Breiman, a statistician at the University of
California, Berkeley, and his colleagues.
• The idea is to represent data as a tree where each
internal node denotes a test on an attribute
(basically a condition), each branch represents
an outcome of the test, and each leaf node
(terminal node) holds a class label (see the
sketch after this list).
• Decision trees are now widely used in many
applications for predictive modeling, including
both classification and regression. Sometimes
decision trees are also referred to as CART,
which is short for Classification and Regression
Trees
• Tree-based algorithms are a popular family of
related non-parametric and supervised
methods for both classification and regression
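As a concrete illustration of this structure, here is a minimal sketch that fits a small tree and prints it, so the internal test nodes, branches, and class-label leaves are visible. It assumes scikit-learn and its bundled Iris dataset, neither of which appears in these slides:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Fit a shallow tree on a small example dataset
    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(iris.data, iris.target)

    # Each "feature <= threshold" line is an internal test node, its indented
    # lines are the branches, and each "class: ..." line is a leaf node
    print(export_text(tree, feature_names=iris.feature_names))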
Types of decision trees
• Classification trees:
• This is where the algorithm has a categorical target variable. For example, consider
you are asked to predict the relative price of a computer as one of three categories:
low, medium, or high. Features could include monitor type, speaker quality, RAM,
and SSD.
• In the simplest case, a classification tree determines whether an event happened
or didn’t happen, i.e., a binary “yes” or “no” outcome
• Categorical data
• Regression trees:
• Predict continuous values based on previous data or information sources. For
example, they can predict the price of gasoline or how much a customer is likely
to spend on eggs at a particular store.
• Continuous data (a sketch contrasting the two tree types follows this list)
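A minimal sketch of the distinction, assuming scikit-learn; the feature values and prices below are invented purely for illustration:

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # Features: monitor size (inches), RAM (GB), SSD (GB)
    X = [[15, 8, 256], [24, 16, 512], [32, 32, 1024]]

    # Classification tree: categorical target (price class)
    clf = DecisionTreeClassifier().fit(X, ["low", "medium", "high"])
    print(clf.predict([[24, 16, 512]]))   # -> a class label such as "medium"

    # Regression tree: continuous target (price in dollars)
    reg = DecisionTreeRegressor().fit(X, [450.0, 900.0, 1600.0])
    print(reg.predict([[24, 16, 512]]))   # -> a continuous value such as 900.0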
Terminologies
• Root node: The topmost node of a decision tree; it represents the entire
dataset before any split is made
• Decision (or internal) node: A node within a decision tree where the data is
tested on an attribute and branches into two or more child nodes
• Leaf (or terminal) node: The leaf node is also called the external node or
terminal node, which means it has no child—it’s the last node in the
decision tree and furthest from the root node
• Splitting: The process of dividing a node into two or more sub-nodes; it is the
point at which a branch is created for each outcome of the node’s test
• Pruning: The opposite of splitting; the process of removing branches so that
only the most important nodes or outcomes remain (these terms are illustrated
in the small sketch after this list)
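A minimal sketch of how these terms map onto a data structure; the Node class and its field names are hypothetical and not taken from the slides:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Node:
        feature: Optional[str] = None                  # attribute tested at a decision node
        children: dict = field(default_factory=dict)   # branch value -> child Node (splitting)
        label: Optional[str] = None                    # class label, set only on leaf nodes

    root = Node(feature="Outlook")                     # root node: tests an attribute
    root.children["Overcast"] = Node(label="Yes")      # leaf (terminal) node
    root.children["Sunny"] = Node(feature="Humidity")  # decision (internal) node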
Examples
Advantages of decision trees
1. Easy to read and interpret
• One of the advantages of decision trees is that their outputs are easy to read and interpret
without requiring statistical knowledge. For example, when using decision trees to present
demographic information on customers, the marketing department staff can read and
interpret the graphical representation of the data without requiring statistical knowledge.
• The data can also generate important insights on the probabilities, costs, and alternatives to
various strategies formulated by the marketing department.
2. Easy to prepare
• Compared to other decision techniques, decision trees take less effort for data preparation.
However, users still need relevant information to create variables with the power to
predict the target variable. Decision trees can also classify data without requiring
complex calculations. For complex situations, users can combine decision trees with
other methods.
3. Less data cleaning required
• Another advantage of decision trees is that less data cleaning is required once the
variables have been created. Missing values and outliers have less influence on the
decision tree’s results.
Disadvantages of decision trees
1. Unstable nature
• One of the limitations of decision trees is that they are largely unstable
compared to other predictors. A small change in the data can result
in a major change in the structure of the decision tree, which can lead to a
result very different from the one obtained before the change. This
instability can be mitigated by ensemble methods such as
boosting and bagging.
2. Less effective in predicting the outcome of a continuous variable
• In addition, decision trees are less effective when the
main goal is to predict the outcome of a continuous variable. This is because
decision trees tend to lose information when discretizing continuous variables
into multiple categories.
How to build a decision tree?
Entropy
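The slide's own content is not reproduced here, but the standard definition used with decision trees is:

H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

where p_i is the proportion of examples in the set S that belong to class i. Entropy is 0 for a pure node (all examples in one class) and is largest when the classes are evenly mixed (1 bit for two balanced classes).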
Information Gain
• The information gain is based on the decrease in entropy after a
dataset is split on an attribute.
• Constructing a decision tree is all about finding the attribute that returns
the highest information gain (i.e., the attribute that yields the most homogeneous branches).
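Written out (again using the standard formula rather than the slide's own notation), the information gain from splitting a set S on an attribute A is:

Gain(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v)

where S_v is the subset of S for which attribute A takes the value v.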
Steps in building decision trees
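The slide content itself is not reproduced here; the following is a minimal plain-Python sketch of the core step (computing entropy and information gain, then picking the attribute to split on). It assumes the data is a list of dicts with a "Play" target column, which is an assumption and not something shown on the slides:

    import math
    from collections import Counter

    def entropy(labels):
        total = len(labels)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())

    def info_gain(rows, attribute, target="Play"):
        labels = [r[target] for r in rows]
        gain = entropy(labels)
        for value in {r[attribute] for r in rows}:
            subset = [r[target] for r in rows if r[attribute] == value]
            gain -= (len(subset) / len(rows)) * entropy(subset)
        return gain

    # ID3-style choice: split on the attribute with the highest information gain
    def best_attribute(rows, attributes):
        return max(attributes, key=lambda a: info_gain(rows, a))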
A complete decision tree
Let’s get our hands dirty, shall we?
Computing the Outlook feature
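Assuming the classic 14-row play-tennis dataset that decks like this usually use (9 "Yes" and 5 "No" examples), the computation works out as follows; treat the numbers as illustrative if the slides used different data:

H(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940
Sunny: 2 Yes / 3 No, H \approx 0.971; Overcast: 4 Yes / 0 No, H = 0; Rain: 3 Yes / 2 No, H \approx 0.971
Gain(S, Outlook) \approx 0.940 - \tfrac{5}{14}(0.971) - \tfrac{4}{14}(0) - \tfrac{5}{14}(0.971) \approx 0.247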
Computing for the Humidity
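Under the same assumed play-tennis data:

High: 3 Yes / 4 No, H \approx 0.985; Normal: 6 Yes / 1 No, H \approx 0.592
Gain(S, Humidity) \approx 0.940 - \tfrac{7}{14}(0.985) - \tfrac{7}{14}(0.592) \approx 0.152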
Consolidated result of information gain
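On the classic dataset these gains are usually reported as roughly Outlook 0.247, Humidity 0.152, Wind 0.048, and Temperature 0.029, so Outlook has the highest information gain and becomes the root node; the exact figures may differ if the slides used other data.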
The initial tree
Computing for the leaf
Consolidated result for the leaf
Updated tree
Computing for the leaf… again!
Completed decision tree
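For reference, the completed tree for the classic play-tennis data is usually drawn as follows (again an assumption about the dataset behind these slides):

    Outlook
    ├── Sunny    -> Humidity (High: No, Normal: Yes)
    ├── Overcast -> Yes
    └── Rain     -> Wind (Strong: No, Weak: Yes)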
Overfitting?
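A fully grown tree can memorize the training data. A common remedy, tied to the pruning idea introduced earlier, is to constrain or prune the tree; a minimal scikit-learn sketch, with parameter values chosen only for illustration:

    from sklearn.tree import DecisionTreeClassifier

    # Pre-pruning: stop the tree from growing deep enough to memorize noise
    clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)

    # Post-pruning: cost-complexity pruning controlled by ccp_alpha
    clf_pruned = DecisionTreeClassifier(ccp_alpha=0.01)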