ML Unit 4
Decision Trees
&
Ensemble Learning and Random Forests
Syllabus
▪ Decision Tree is one of the important algorithms in ML.
▪ When we think about a problem, possible solutions come to mind, and we choose among those decisions.
▪ Ex: I want to buy a car.
What is a Decision Tree?
•It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
Decision Tree
In a decision tree, there are two types of nodes: the decision node and the leaf node.
Decision nodes are used to make decisions and have multiple branches, whereas
leaf nodes are the outputs of those decisions and do not contain any further branches.
The decisions or tests are performed on the basis of the features of the given dataset.
It is called a decision tree because, similar to a tree, it starts with the root node,
which expands into further branches and constructs a tree-like structure.
A decision tree simply asks a question and, based on the answer (Yes/No), further
splits the tree into subtrees.
Advantages of DT
How does the Decision Tree algorithm Work?
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not.
DT Algorithm
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; call the final node a leaf node.
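A minimal sketch of these steps in practice, assuming scikit-learn as the tooling; the tiny job-offer dataset, its feature names, and the numeric values below are invented purely for illustration:

```python
# Hypothetical "job offer" data (not from the slides): two features per offer.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [salary_in_lakhs, distance_from_home_km]
X = [
    [12, 5], [12, 40],   # high salary             -> accept
    [5, 5], [5, 40],     # low salary              -> decline
    [8, 10],             # medium salary, nearby   -> accept
    [8, 45],             # medium salary, far away -> decline
]
y = ["accept", "accept", "decline", "decline", "accept", "decline"]

# criterion="entropy" makes information gain the attribute selection measure
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)

# Inspect the learned root node, branches, and leaf nodes
print(export_text(tree, feature_names=["salary", "distance"]))
print(tree.predict([[9, 8]]))   # a new offer: 9 lakhs, 8 km from home
```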
Attribute Selection Measures
▪ While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes.
▪ To solve such problems there is a technique called the Attribute Selection Measure, or ASM.
▪ Using this measure, we can easily select the best attribute for the nodes of the tree.
▪ There are two popular techniques for ASM, which are:
▪ Information Gain (based on Entropy)
▪ Gini Index
Information Gain
Entropy: Entropy is the measure of randomness or unpredictability in the dataset.
Information Gain:
▪ Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute.
▪ It calculates how much information a feature provides us about a class.
▪ According to the value of information gain, we split the node and build the
decision tree.
▪ A decision tree algorithm always tries to maximize the value of information gain,
and a node/attribute having the highest information gain is split first.
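As a hedged, self-contained sketch of how these two quantities are computed (the class labels and the two-branch split below are made up for illustration):

```python
# Entropy and information gain computed in plain Python.
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over each class i."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Gain = Entropy(parent) - weighted average entropy of the subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted

# Hypothetical class labels before and after splitting on some attribute
parent = ["yes"] * 9 + ["no"] * 5
split = [["yes"] * 6 + ["no"] * 2,   # branch 1 of the attribute
         ["yes"] * 3 + ["no"] * 3]   # branch 2 of the attribute

print(round(entropy(parent), 3))              # ~0.940 for 9 "yes" / 5 "no"
print(round(information_gain(parent, split), 3))
```

The attribute whose split yields the largest information gain is the one chosen for the node.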
Decision trees can represent any boolean function of the input attributes. Let’s use
decision trees to perform the function of three boolean gates AND, OR and XOR.
Boolean Function: AND
Boolean Function: XOR
Decision tree for an XOR operation involving three operands
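A minimal sketch (assuming scikit-learn as the tooling) that fits a decision tree to the three-operand XOR truth table and prints the learned splits:

```python
# A decision tree learning three-input XOR (parity of three bits).
from sklearn.tree import DecisionTreeClassifier, export_text

# All 8 rows of the three-operand XOR truth table
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b ^ c for a, b, c in X]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

print(tree.predict(X))                        # reproduces the XOR outputs exactly
print(export_text(tree, feature_names=["A", "B", "C"]))
```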
Data set
▪ An ensemble method is a technique that combines the predictions from multiple machine learning algorithms to make more accurate predictions than any individual model.
▪ A model that is composed of many models is called an ensemble model.
If you are planning to buy a car, would you enter a showroom and buy
the car that the salesperson shows you?
The answer is probably NO.
Instead, you are likely to ask your friends, family, and colleagues for an opinion, do research on various portals about different models, and visit a few review sites before making a purchase decision.
There are two main reasons to use an ensemble over a single model, and they are related:
Performance: An ensemble can make better predictions and achieve better performance than any single contributing model.
Robustness: An ensemble reduces the spread or dispersion of the predictions and of model performance.
The main ensemble techniques are:
•Bagging (Bootstrap Aggregation) - Ex: Random Forest
•Boosting - Ex: 1. AdaBoost (Adaptive Boosting) 2. Gradient Boosting 3. XGBoost
•Stacking
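As a quick, hedged illustration of the bagging example named above (Random Forest), here is a minimal scikit-learn sketch; the iris dataset is only a stand-in, not a dataset from these slides:

```python
# A Random Forest: many decision trees, each trained on a bootstrap sample,
# with a random subset of features considered at each split.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())   # average accuracy over 5 folds
```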
Bagging
Bagging is a method of ensemble modeling, which is primarily used to solve supervised machine learning problems. It is generally completed in two steps as follows:
Bootstrapping: It is a random sampling method that is used to derive samples from the data using the replacement procedure. In this method, first, random data samples are fed to the primary model, and then a base learning algorithm is run on the samples to complete the learning process.
Aggregation: This is a step that involves the process of combining the output of all base models and, based on their output, predicting an aggregate result with greater accuracy and reduced variance.
Bootstrapping is the method of randomly creating samples of data out of
a population with replacement to estimate a population parameter.
Steps to Perform Bagging
It involves taking random samples with replacement from the training data and fitting a prediction model to each sample.
The final prediction is obtained by averaging the predictions of the models for regression problems or by voting for classification problems.
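A minimal sketch of these steps using scikit-learn's BaggingClassifier (an assumed tooling choice; the synthetic dataset below is only a placeholder):

```python
# Bagging: many decision trees, each fit on a bootstrap sample,
# with predictions combined by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # base learner fit on each bootstrap sample
    n_estimators=50,           # number of bootstrap samples / base models
    bootstrap=True,            # sample the training data with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)
print(bagging.score(X_test, y_test))  # majority vote over the 50 trees
```

For a regression problem, BaggingRegressor would average the base models' outputs instead of voting.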
Advantages of Bagging:
•Bagging minimizes the overfitting of data.
•It improves the model's accuracy.
•It deals with higher dimensional data efficiently.
Boosting
Boosting is an efficient algorithm that converts a weak learner into a strong learner.
Boosting is an ensemble method that enables each member to learn from the preceding member's mistakes and make better predictions for the future.
In boosting, all base learners (weak) are arranged in a sequential format so that they can learn from the mistakes of their preceding learner.
Hence, in this way, all weak learners get turned into strong learners and make a better predictive model with significantly improved performance.
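A minimal sketch of boosting in practice, using scikit-learn's AdaBoostClassifier (one of the boosting algorithms listed earlier); the dataset here is only a stand-in:

```python
# AdaBoost: shallow trees ("weak learners") are fit in sequence, and
# misclassified samples are re-weighted so that each new learner
# focuses on the mistakes of its predecessors.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = AdaBoostClassifier(n_estimators=100, random_state=0)
boost.fit(X_train, y_train)
print(boost.score(X_test, y_test))  # accuracy of the combined strong learner
```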