
UNIT – 4

Decision Trees
&
Ensemble Learning and Random Forests

Syllabus

Decision Trees: Training and Visualizing a Decision Tree, Making


Predictions, Estimating Class Probabilities, The CART Training
Algorithm, Computational Complexity, Gini Impurity or Entropy.

Ensemble Learning and Random Forests: Voting Classifiers,


Bagging and Pasting, Random Forests,
Extra-Trees, Boosting, AdaBoost, Gradient Boosting, Stacking.

▪ Decision Tree is one of the important algorithms in ML.
▪ When we think about a problem, possible solutions come to mind and we choose among decisions.
▪ Ex: I want to buy a car.
What is Decision Tree?

•DT is a tree-shaped diagram used to determine a course of action.
•Each branch of the tree represents a possible decision, occurrence or reaction.
•DT is a supervised learning algorithm.
•It can be used for both Classification and Regression problems, but mostly it is preferred for solving Classification problems.
•It is a tree-structured classifier, where internal nodes test features of a dataset and leaf nodes represent the outcome.
Decision Tree
In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node.

Decision nodes are used to make any decision and have multiple branches, whereas
Leaf nodes are the output of those decisions and do not contain any further branches.

The decisions or the test are performed on the basis of features of the given dataset.

It is a graphical representation for getting all the possible solutions to a


problem/decision based on given conditions.

It is called a decision tree because, similar to a tree, it starts with the root node,
which expands on further branches and constructs a tree-like structure.

A decision tree simply asks a question and, based on the answer (Yes/No), further splits into subtrees.
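As a quick illustration of training, visualizing and querying a decision tree, the following is a minimal sketch assuming scikit-learn and its bundled Iris dataset are available:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small sample dataset (Iris) and train a decision tree classifier
iris = load_iris()
X, y = iris.data, iris.target

clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(X, y)

# Text view of the learned tree: root node, branches and leaf nodes
print(export_text(clf, feature_names=iris.feature_names))

# Estimate class probabilities and predict the class of a new record
sample = [[5.0, 3.5, 1.3, 0.2]]
print(clf.predict_proba(sample))
print(clf.predict(sample))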
Advantages of DT

▪ Simple to understand, interpret and visualize; flow-chart type structure.
▪ Little effort required for data preparation; very useful for decision-making problems.
▪ It can handle both numerical and categorical data.
▪ Nonlinear parameters don't affect its performance.
How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record:
The algorithm starts from the root node of the tree.
It compares the value of the root attribute with the corresponding attribute of the record (real dataset) and, based on the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further.
It continues this process until it reaches a leaf node of the tree.
The complete process can be better understood using the below algorithm:

Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not.
DT Algorithm

Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; call the final node a leaf node.
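The five steps above can be sketched as a small recursive routine. The following is a rough illustration only, using information gain as the ASM and a tiny hypothetical job-offer dataset, not a full implementation:

from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Step-2: choose the attribute with the highest information gain (the ASM used here)
    def gain(a):
        total = entropy(labels)
        for v in set(r[a] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[a] == v]
            total -= len(subset) / len(labels) * entropy(subset)
        return total
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    # Stop when the node is pure or no attributes remain: this becomes a leaf node
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attributes)       # Steps 1-2: pick the best attribute
    tree = {a: {}}
    for v in set(r[a] for r in rows):                  # Step-3: divide into subsets by value
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        # Steps 4-5: generate a child node and recurse on the subset
        tree[a][v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                [x for x in attributes if x != a])
    return tree

# Tiny hypothetical dataset: should the candidate accept the job offer?
rows = [{"salary": "high", "commute": "short"}, {"salary": "high", "commute": "long"},
        {"salary": "low", "commute": "short"},  {"salary": "low", "commute": "long"}]
labels = ["accept", "accept", "accept", "reject"]
print(build_tree(rows, labels, ["salary", "commute"]))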

Attribute Selection Measures

▪ While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes.
▪ To solve such problems there is a technique called Attribute Selection Measure, or ASM.
▪ With this measurement, we can easily select the best attribute for the nodes of the tree.
▪ There are two popular techniques for ASM, which are:
▪ Information Gain
▪ Entropy / Gini Index

Information Gain
Entropy: Entropy is the measure of randomness or unpredictability in the datasets.

Information Gain:
▪ Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute.
▪ It calculates how much information a feature provides us about a class.
▪ According to the value of information gain, we split the node and build the
decision tree.
▪ A decision tree algorithm always tries to maximize the value of information gain,
and a node/attribute having the highest information gain is split first.

Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]

IG(S, A) = E(S) − Σv ( |Sv| / |S| ) × E(Sv)
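As a rough numeric sketch of these formulas (the split below is hypothetical, chosen only to show the calculation):

import numpy as np

def entropy(labels):
    # E(S) = -sum(p_i * log2(p_i)) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, subsets):
    # IG = E(S) - sum(|Sv| / |S| * E(Sv)) over the subsets produced by the split
    weighted = sum(len(s) / len(parent) * entropy(s) for s in subsets)
    return entropy(parent) - weighted

# 10 labels split on a hypothetical attribute into two subsets Sv
parent = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
left, right = parent[:4], parent[4:]
print(information_gain(parent, [left, right]))   # higher values mean a better split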
Expressiveness of decision trees

Decision trees can represent any boolean function of the input attributes. Let’s use
decision trees to perform the function of three boolean gates AND, OR and XOR.
Boolean Function: AND

Decision tree for an AND operation.


Boolean Function: OR

Boolean Function: XOR

Decision tree for an XOR operation involving three operands
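As a small check of this expressiveness claim, the sketch below (assuming scikit-learn) fits a decision tree to the full truth table of a three-operand XOR (parity) function and reproduces it exactly:

import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeClassifier, export_text

# Truth table of XOR over three boolean operands: output is the parity of the inputs
X = np.array(list(product([0, 1], repeat=3)))
y = X.sum(axis=1) % 2

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["x1", "x2", "x3"]))
print(tree.predict(X))   # matches y: the tree represents the XOR function exactly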

Data set

▪ Ensemble method is a technique that combines the predictions from multiple machine learning algorithms to make more accurate predictions than any individual model.
▪ A model that is composed of many models is called an Ensemble Model.

If you are planning to buy a car, would you enter a showroom and buy
the car that the salesperson shows you?
The answer is probably NO.
Most likely, you would ask your friends, family, and colleagues for an
opinion, do research on various portals about different models, and visit
a few review sites before making a purchase decision.

In a nutshell, you would not come to a conclusion directly. Instead, you


would try to make a more informed decision after considering diverse
opinions and reviews.

In the case of ensemble learning, the same principle applies.


Why do we use Ensembles?

There are two main reasons to use an ensemble over a single model, and they are related:

Performance: An ensemble can make better predictions and achieve better performance than any single contributing model.

Robustness: An ensemble reduces the spread or dispersion of the predictions and of model performance.

Ensembles are used to achieve better predictive performance on a predictive modeling problem than a single predictive model.
There are 3 most common ensemble learning methods in machine learning. These are as follows:

• Bagging (Bootstrap Aggregation). Ex: Random Forest
• Boosting. Ex: 1. AdaBoost (Adaptive Boosting), 2. Gradient Boosting, 3. XGBoost
• Stacking
Bagging

Bagging is a method of ensemble modeling which is primarily used to solve supervised machine learning problems. It is generally completed in two steps as follows:

Bootstrapping: A random sampling method that is used to derive samples from the data using the replacement procedure. In this method, first, random data samples are fed to the primary model, and then a base learning algorithm is run on the samples to complete the learning process.

Aggregation: A step that involves combining the output of all base models and, based on their output, predicting an aggregate result with greater accuracy and reduced variance.

Bootstrapping is the method of randomly creating samples of data out of
a population with replacement to estimate a population parameter.
Steps to Perform Bagging

Consider a training set of n records (observations) and m features, and multiple base learners. For each base learner, select a random sample of records from the training dataset (row sampling with replacement).

A subset of the m features may also be chosen randomly when creating a model from the sampled observations.

Bagging thus involves taking random samples with replacement from the training data and fitting a prediction model to each sample.

The final prediction is obtained by averaging the predictions of the models for regression problems or by voting for classification problems.
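A minimal bagging sketch along these lines, assuming scikit-learn (the base-learner parameter is named estimator in recent versions, base_estimator in older ones):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a training set of n records and m features
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 100 trees is fit on a bootstrap sample (rows drawn with replacement);
# the final class is obtained by voting across the base models
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,
    random_state=42,
)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))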
Advantages of Bagging:
▪ Bagging minimizes the overfitting of data.
▪ It improves the model's accuracy.
▪ It deals with higher-dimensional data efficiently.
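Random Forest, named earlier as the standard example of bagging, combines bootstrap row sampling with random feature selection at each split. A minimal sketch with scikit-learn:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An ensemble of decision trees, each trained on a bootstrap sample and
# considering only a random subset of features at every split
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))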
Boosting
Boosting is an ensemble method that enables each member to learn from the preceding member's mistakes and make better predictions for the future.

In boosting, all (weak) base learners are arranged in a sequential format so that they can learn from the mistakes of their preceding learner.

Boosting is an efficient algorithm that converts a weak learner into a strong learner.

Hence, in this way, all weak learners get turned into strong learners and make a better predictive model with significantly improved performance.
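AdaBoost is the canonical example of this sequential scheme. A minimal sketch, assuming scikit-learn (base-learner parameter named estimator in recent versions):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Weak learners (decision stumps) are trained one after another; each new stump
# gives more weight to the examples its predecessors misclassified
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=0.5,
    random_state=42,
)
ada.fit(X_train, y_train)
print(ada.score(X_test, y_test))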
