
UNIT – 4

Decision Trees
&
Ensemble Learning and Random Forests

Syllabus

Decision Trees: Training and Visualizing a Decision Tree, Making


Predictions, Estimating Class Probabilities, The CART Training
Algorithm, Computational Complexity, Gini Impurity or Entropy.

Ensemble Learning and Random Forests: Voting Classifiers,


Bagging and Pasting, Random Forests,
Extra-Trees, Boosting, AdaBoost, Gradient Boosting, Stacking.

▪ Decision Tree is one of the important algorithms in ML.
▪ When we think about a problem, possible solutions come to mind and we choose among decisions.
▪ Ex: I want to buy a car.
What is Decision Tree?

•DT is a tree-shaped diagram used to determine a course of action.
•Each branch of the tree represents a possible decision, occurrence or reaction.
•DT is a supervised learning algorithm.
•It can be used for both Classification and Regression problems, but mostly it is preferred for solving Classification problems.
•It is a tree-structured classifier, where internal nodes test features of a dataset and leaf nodes represent the outcome.
Decision Tree
In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node.

Decision nodes are used to make any decision and have multiple branches, whereas
Leaf nodes are the output of those decisions and do not contain any further branches.

The decisions or the test are performed on the basis of features of the given dataset.

It is a graphical representation for getting all the possible solutions to a


problem/decision based on given conditions.

It is called a decision tree because, similar to a tree, it starts with the root node,
which expands on further branches and constructs a tree-like structure.

A decision tree simply asks a question and, based on the answer (Yes/No), further splits into subtrees.
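As a quick illustration of training, visualizing and querying a decision tree, the following is a minimal sketch assuming scikit-learn and its bundled Iris dataset are available:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small sample dataset (Iris) and train a decision tree classifier
iris = load_iris()
X, y = iris.data, iris.target

clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(X, y)

# Text view of the learned tree: root node, branches and leaf nodes
print(export_text(clf, feature_names=iris.feature_names))

# Estimate class probabilities and predict the class of a new record
sample = [[5.0, 3.5, 1.3, 0.2]]
print(clf.predict_proba(sample))
print(clf.predict(sample))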
Advantages of DT

▪ Simple to understand, interpret and visualize; flow-chart type structure.
▪ Little effort required for data preparation; very useful for decision-making problems.
▪ It can handle both numerical and categorical data.
▪ Nonlinear parameters don't affect its performance.
How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record:
The algorithm starts from the root node of the tree.
It compares the value of the root attribute with the corresponding attribute of the record (real dataset) and, based on the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further.
It continues this process until it reaches a leaf node of the tree.
The complete process can be better understood using the below algorithm:

Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not.
DT Algorithm

Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; call the final node a leaf node.
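The five steps above can be sketched as a small recursive routine. The following is a rough illustration only, using information gain as the ASM and a tiny hypothetical job-offer dataset, not a full implementation:

from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Step-2: choose the attribute with the highest information gain (the ASM used here)
    def gain(a):
        total = entropy(labels)
        for v in set(r[a] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[a] == v]
            total -= len(subset) / len(labels) * entropy(subset)
        return total
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    # Stop when the node is pure or no attributes remain: this becomes a leaf node
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attributes)       # Steps 1-2: pick the best attribute
    tree = {a: {}}
    for v in set(r[a] for r in rows):                  # Step-3: divide into subsets by value
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        # Steps 4-5: generate a child node and recurse on the subset
        tree[a][v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                [x for x in attributes if x != a])
    return tree

# Tiny hypothetical dataset: should the candidate accept the job offer?
rows = [{"salary": "high", "commute": "short"}, {"salary": "high", "commute": "long"},
        {"salary": "low", "commute": "short"},  {"salary": "low", "commute": "long"}]
labels = ["accept", "accept", "accept", "reject"]
print(build_tree(rows, labels, ["salary", "commute"]))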

Attribute Selection Measures

▪ While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes.
▪ To solve such problems there is a technique called Attribute Selection Measure, or ASM.
▪ With this measurement, we can easily select the best attribute for the nodes of the tree.
▪ There are two popular techniques for ASM, which are:
▪ Information Gain
▪ Entropy / Gini Index

Information Gain
Entropy: Entropy is the measure of randomness or unpredictability in the datasets.

Information Gain:
▪ Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute.
▪ It calculates how much information a feature provides us about a class.
▪ According to the value of information gain, we split the node and build the
decision tree.
▪ A decision tree algorithm always tries to maximize the value of information gain,
and a node/attribute having the highest information gain is split first.

Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]

IG(S, A) = E(S) − Σv ( |Sv| / |S| ) × E(Sv)
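As a rough numeric sketch of these formulas (the split below is hypothetical, chosen only to show the calculation):

import numpy as np

def entropy(labels):
    # E(S) = -sum(p_i * log2(p_i)) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, subsets):
    # IG = E(S) - sum(|Sv| / |S| * E(Sv)) over the subsets produced by the split
    weighted = sum(len(s) / len(parent) * entropy(s) for s in subsets)
    return entropy(parent) - weighted

# 10 labels split on a hypothetical attribute into two subsets Sv
parent = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
left, right = parent[:4], parent[4:]
print(information_gain(parent, [left, right]))   # higher values mean a better split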
Expressiveness of decision trees

Decision trees can represent any boolean function of the input attributes. Let’s use
decision trees to perform the function of three boolean gates AND, OR and XOR.
Boolean Function: AND

Decision tree for an AND operation.


Boolean Function: OR

Boolean Function: XOR

Decision tree for an XOR operation involving three operands
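As a small check of this expressiveness claim, the sketch below (assuming scikit-learn) fits a decision tree to the full truth table of a three-operand XOR (parity) function and reproduces it exactly:

import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeClassifier, export_text

# Truth table of XOR over three boolean operands: output is the parity of the inputs
X = np.array(list(product([0, 1], repeat=3)))
y = X.sum(axis=1) % 2

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["x1", "x2", "x3"]))
print(tree.predict(X))   # matches y: the tree represents the XOR function exactly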

Data set

▪ Ensemble method is a technique that combines the predictions from multiple machine learning algorithms to make more accurate predictions than any individual model.
▪ A model that is composed of many models is called an Ensemble Model.

If you are planning to buy a car, would you enter a showroom and buy
the car that the salesperson shows you?
The answer is probably NO.
Most likely, you would ask your friends, family, and colleagues for an
opinion, do research on various portals about different models, and visit
a few review sites before making a purchase decision.

In a nutshell, you would not come to a conclusion directly. Instead, you


would try to make a more informed decision after considering diverse
opinions and reviews.

In the case of ensemble learning, the same principle applies.


Why do we use Ensembles?

There are two main reasons to use an ensemble over a single model, and they are related:

Performance: An ensemble can make better predictions and achieve better performance than any single contributing model.

Robustness: An ensemble reduces the spread or dispersion of the predictions and of model performance.

Ensembles are used to achieve better predictive performance on a predictive modeling problem than a single predictive model.
There are 3 most common ensemble learning methods in machine learning. These are as follows:

• Bagging (Bootstrap Aggregation). Ex: Random Forest
• Boosting. Ex: 1. AdaBoost (Adaptive Boosting), 2. Gradient Boosting, 3. XGBoost
• Stacking
Bagging

Bagging is a method of ensemble modeling which is primarily used to solve supervised machine learning problems. It is generally completed in two steps as follows:

Bootstrapping: A random sampling method that is used to derive samples from the data using the replacement procedure. In this method, first, random data samples are fed to the primary model, and then a base learning algorithm is run on the samples to complete the learning process.

Aggregation: A step that involves combining the output of all base models and, based on their output, predicting an aggregate result with greater accuracy and reduced variance.

Bootstrapping is the method of randomly creating samples of data out of
a population with replacement to estimate a population parameter.
Steps to Perform Bagging

Consider a training set of n records (observations) and m features, and multiple base learners. For each base learner, select a random sample of records from the training dataset (row sampling with replacement).

A subset of the m features may also be chosen randomly when creating a model from the sampled observations.

Bagging thus involves taking random samples with replacement from the training data and fitting a prediction model to each sample.

The final prediction is obtained by averaging the predictions of the models for regression problems or by voting for classification problems.
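A minimal bagging sketch along these lines, assuming scikit-learn (the base-learner parameter is named estimator in recent versions, base_estimator in older ones):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a training set of n records and m features
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 100 trees is fit on a bootstrap sample (rows drawn with replacement);
# the final class is obtained by voting across the base models
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,
    random_state=42,
)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))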
Advantages of Bagging:
▪ Bagging minimizes the overfitting of data.
▪ It improves the model's accuracy.
▪ It deals with higher-dimensional data efficiently.
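Random Forest, named earlier as the standard example of bagging, combines bootstrap row sampling with random feature selection at each split. A minimal sketch with scikit-learn:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An ensemble of decision trees, each trained on a bootstrap sample and
# considering only a random subset of features at every split
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))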
Boosting
Boosting is an ensemble method that enables each member to learn from the preceding member's mistakes and make better predictions for the future.

In boosting, all (weak) base learners are arranged in a sequential format so that they can learn from the mistakes of their preceding learner.

Boosting is an efficient algorithm that converts a weak learner into a strong learner.

Hence, in this way, all weak learners get turned into strong learners and make a better predictive model with significantly improved performance.
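AdaBoost is the canonical example of this sequential scheme. A minimal sketch, assuming scikit-learn (base-learner parameter named estimator in recent versions):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Weak learners (decision stumps) are trained one after another; each new stump
# gives more weight to the examples its predecessors misclassified
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=0.5,
    random_state=42,
)
ada.fit(X_train, y_train)
print(ada.score(X_test, y_test))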
