Random Forest Summary
Ensemble learning is the process of combining the predictions of multiple machine learning models, known as weak learners. Combining several weak learners produces a strong learner that can outperform any of the individual models, giving better predictive performance overall. The two main types of ensemble learning are:
1. Bagging
Bootstrapping creates several subsets of the original training data by sampling observations at random with replacement. Each subset has the same number of observations and can be used to train a model in parallel with the others. Because sampling is done with replacement, some observations may appear more than once in a given subset. A model is trained on each subset independently and the results are aggregated for the final prediction: in classification the class with the most votes (the mode) wins, while in regression the final prediction is the average of all the individual predictions. An example of a bagging algorithm is the Random Forest algorithm.
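As a minimal, illustrative sketch (the toy dataset and parameter values are assumptions, not taken from this summary), scikit-learn's BaggingClassifier trains several decision trees on bootstrap samples of the rows and combines their votes:

# Minimal bagging sketch; dataset and parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Toy dataset standing in for the original training data 'D'.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 25 base learners is trained on a bootstrap sample of the rows
# (sampling with replacement); predictions are combined by majority vote.
bagger = BaggingClassifier(n_estimators=25, bootstrap=True, random_state=0)
bagger.fit(X_train, y_train)
print("bagging accuracy:", bagger.score(X_test, y_test))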
2. Boosting
Boosting builds a model from the training data and then adds a second model that attempts to correct the errors of the first. Models are added sequentially until the training set is predicted perfectly or a maximum number of models is reached. An example of a boosting algorithm is AdaBoost (Adaptive Boosting).
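As a hedged illustration of this sequential idea (toy data and parameter values assumed), scikit-learn's AdaBoostClassifier adds weak learners one after another, each one focusing on the observations its predecessors got wrong:

# Minimal AdaBoost sketch; dataset and parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weak learners (decision stumps by default) are added sequentially;
# each new learner concentrates on the previously misclassified observations.
booster = AdaBoostClassifier(n_estimators=50, random_state=0)
booster.fit(X_train, y_train)
print("AdaBoost accuracy:", booster.score(X_test, y_test))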
Random Forest
Random Forest is an ensemble learning algorithm used for both classification and regression problems.
A random forest builds multiple decision trees and combines them to obtain a more accurate and stable prediction. Decision trees work top-down, choosing at each step the variable that best splits the current set of observations. The leaf nodes hold either a category (classification) or a continuous number (regression). The drawback of a single decision tree is that its learning mechanism is very sensitive to even small changes in the data, and larger trees tend to overfit.
To address this, each tree is built on a subset of the original training data, sampled at random with replacement; this approach is called bootstrap sampling, and it helps reduce the variance of the final model. The results from the different trees are then aggregated (the mean for regression, the mode for classification) to produce the final prediction. A minimal sketch of this procedure is shown below.
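The sketch below is hand-rolled and illustrative (toy data; names such as k and idx are assumptions, not from the summary): bootstrap samples, one decision tree per sample, and a majority vote at the end.

# Hand-rolled bootstrap aggregation with decision trees (illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
n, k = len(X), 15                      # k = number of trees

trees = []
for _ in range(k):
    # Bootstrap sample: rows drawn at random with replacement (D1, D2, ..., Dk).
    idx = rng.integers(0, n, size=n)
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregate: each tree votes, and the most common class (the mode) wins.
votes = np.array([t.predict(X) for t in trees])          # shape (k, n)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy of the ensemble:", (majority == y).mean())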
Row Sampling
Consider a dataset ‘D’, the original training dataset, with ‘n’ observations and 10 features, say A to J. Bagging randomly selects observations from ‘D’ to form a training dataset ‘D¹’. It then draws another random set of observations to form a new dataset ‘D²’, and in the same way builds D³, D⁴, …, Dᵏ.
Because sampling is done with replacement, a single observation (say, observation ‘32’) can appear in many of the training datasets. The original dataset has ‘n’ observations, and each training dataset can have n’ observations, where n’ <= n. We can now build a decision tree for each of these datasets. Each tree differs slightly from the others because the data used to build it is slightly different.
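A tiny sketch of a single bootstrap draw (the sample size and seed are assumptions) shows how sampling with replacement makes some rows repeat while the number of distinct rows stays at most n:

# One bootstrap draw of row indices; values are illustrative.
import numpy as np

rng = np.random.default_rng(42)
n = 100                                   # observations in the original dataset D
idx = rng.integers(0, n, size=n)          # one bootstrap sample D1 (n' = n draws)
distinct = len(np.unique(idx))
print("rows drawn:", len(idx), "| distinct rows:", distinct,
      "| repeated draws:", len(idx) - distinct)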
Column Sampling
The bagging technique considers only ‘row (observation) sampling with replacement’. As a result, the trees are largely going to split on the same features in every model. The Random Forest algorithm adds ‘column (feature) sampling’ on top of row sampling: instead of every tree splitting on the same features, each node considers only a random subset of the original features, so different trees can split on different features.
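As a small sketch (the feature names and the value of m are illustrative, following the example that comes next), column sampling means that every node draws its own random subset of m features to consider for its split:

# Per-node column sampling; feature names and m are illustrative.
import numpy as np

rng = np.random.default_rng(0)
features = np.array(list("ABCDEFGHIJ"))   # the M = 10 features of D1
m = 3

# Each node of the tree gets its own random subset of m candidate features.
for node in ("root", "left child", "right child"):
    candidates = rng.choice(features, size=m, replace=False)
    print(node, "considers:", candidates)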
From the example above, consider the training dataset ‘D¹’ with ‘M’ independent variables, say A to J, and build a decision tree on it. A standard decision tree algorithm works top-down, choosing at each step the feature that best splits the set of observations: it considers all the independent features, finds the best split for each one, and then chooses the best split among all the variables (as in the CART algorithm).
Let ‘m’ = 3. To build the root node, three features are selected at random, say A, E, and I, and whichever of them gives the best split is used to split the node. Similarly, ‘m’ features are drawn at random for every split: after the root node is chosen, the left and right subtrees are split using three newly selected features each, say C, E, F and B, D, J respectively (as in the column-sampling sketch above). The same process is repeated until each tree is grown fully. This ensures that the datasets differ slightly from one another and that the trees built from them differ even more, giving a very diverse forest. Although all the decision trees are allowed to grow large, the final result does not overfit: individual trees tend to overfit the training data, but averaging corrects for this, so there is no need to prune the fully grown trees in a random forest.
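Putting row and column sampling together, a minimal scikit-learn sketch (toy data; parameter values are illustrative) grows fully unpruned trees and lets the ensemble average out the overfitting:

# Minimal random-forest sketch; dataset and parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of bootstrapped trees
    max_features="sqrt",   # m features considered at each split
    max_depth=None,        # trees are grown fully; averaging controls overfitting
    random_state=0,
)
forest.fit(X_train, y_train)
print("random forest accuracy:", forest.score(X_test, y_test))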
The number of features ‘m’ considered at each node, out of the total number of features ‘M’, has to be chosen carefully to avoid correlated or weak trees. If ‘m’ is very large, say equal to ‘M’, then even though the datasets differ slightly, the trees built from them become highly correlated. If ‘m’ is very small, say 2, the chance of including one of the important variables in a split becomes low, and the resulting trees have a very weak ability to predict.
A single tree is extremely sensitive to the data; a large set of trees is diverse enough to provide robustness. At the same time, the trees should not be so diverse that their individual strength drops. A good choice of ‘m’ balances these two effects and gives a much better prediction.
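As an illustrative way of picking ‘m’ in practice (toy data and candidate values are assumptions), cross-validation over scikit-learn's max_features parameter can be used to search for a good trade-off:

# Choosing m (max_features) by cross-validation; values are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_features": [2, 3, 5, 8, 10]},   # candidate values for m
    cv=5,
)
search.fit(X, y)
print("best m:", search.best_params_["max_features"])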