
Bagging and Random Forests

Slides based on: STT450-550: Statistical Data Mining
Ensemble methods
• A single decision tree does not perform well

• But, it is super fast

• What if we learn multiple trees?

We need to make sure they do not all just learn the same thing
Problem!
• Decision trees discussed earlier suffer from high
variance!
• If we randomly split the training data into 2 parts, and fit
decision trees on both parts, the results could be quite
different

• We would like to have models with low variance

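A quick sketch of this instability (the library names, dataset, and parameter values here are illustrative assumptions, not part of the slides): fit unpruned scikit-learn trees on two random halves of the same data and compare their predictions on new points.

```python
# Illustrative sketch (assumes numpy and scikit-learn): two unpruned
# trees fit on random halves of the same data can give quite different
# predictions -- the high-variance problem described above.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.5, random_state=1)

tree_a = DecisionTreeRegressor(random_state=0).fit(X_a, y_a)
tree_b = DecisionTreeRegressor(random_state=0).fit(X_b, y_b)

# The two trees often disagree noticeably on fresh inputs.
X_new = np.random.default_rng(2).normal(size=(5, 5))
print(tree_a.predict(X_new))
print(tree_b.predict(X_new))
```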
Problem!

[Figure slides; source: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/Ensemble%20(v6).pdf]
Bagging
To solve this problem, we can use bagging (bootstrap aggregating).

What is bagging?
• Bagging is an extremely powerful idea based on
two things:
• Averaging: reduces variance!
• Bootstrapping: plenty of training datasets!

• Why does averaging reduce variance?

• Averaging a set of observations reduces variance. Recall that given a set of n independent observations Z1, …, Zn, each with variance σ², the variance of the mean of the observations is given by σ²/n.
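A small numerical check of this fact, as a minimal sketch assuming numpy (the particular values of σ² and n are arbitrary):

```python
# Empirical check that the mean of n independent observations with
# variance sigma^2 has variance sigma^2 / n.
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, reps = 4.0, 25, 100_000

# Draw `reps` samples of size n and look at the variance of their means.
means = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n)).mean(axis=1)
print(means.var())   # close to sigma2 / n = 0.16
```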
Bootstrapping is simple!
• Bootstrapping repeatedly resamples the observed dataset, producing new datasets of the same size as the original, each obtained by random sampling with replacement from the original dataset.

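A minimal sketch of one bootstrap resample, assuming numpy (the toy arrays are illustrative):

```python
# Bootstrap resampling: draw indices with replacement to build a
# resampled dataset of the same size as the original one.
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(10).reshape(-1, 1)   # toy dataset of 10 observations
y = np.arange(10)

idx = rng.choice(len(X), size=len(X), replace=True)   # sample with replacement
X_boot, y_boot = X[idx], y[idx]
print(idx)   # some observations repeat, others are left out entirely
```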


[Figure slide; source: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/Ensemble%20(v6).pdf]
How does bagging work?
• Generate B different bootstrapped training datasets by repeatedly sampling from the (single) training data set.

• Train the statistical learning method on each of the B bootstrapped training datasets, and obtain B predictions.

• For prediction:
• Regression: average all B predictions from all B trees
• Classification: majority vote among all B trees

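The whole procedure fits in a short loop. A hand-rolled sketch for regression, assuming numpy and scikit-learn (B and the synthetic dataset are illustrative):

```python
# Bagging by hand: B bootstrap samples, B unpruned regression trees,
# and an averaged prediction.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
rng = np.random.default_rng(0)
B = 100

trees = []
for _ in range(B):
    idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap sample
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Regression: average the B individual tree predictions.
print(np.mean([t.predict(X[:5]) for t in trees], axis=0))
```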
[Figure slide; source: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/Ensemble%20(v6).pdf]
Bagging
• Reduces overfitting (variance)

• Normally uses one type of classifier

• Decision trees are popular

• Easy to parallelize
Bagging for Regression Trees
• Construct B regression trees using B bootstrapped
training datasets
• Average the resulting predictions

• Note: These trees are not pruned, so each individual tree has high variance but low bias.
• Averaging these trees reduces variance, so the resulting ensemble ends up with both low variance and low bias.

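In practice this is usually delegated to a library. A sketch with scikit-learn's BaggingRegressor, which by default bags unpruned decision trees (the dataset and parameter values are illustrative):

```python
# Bagged regression trees via scikit-learn (sketch).
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

bag = BaggingRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(bag.score(X_te, y_te))   # R^2 on held-out data
```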
Bagging for Classification Trees
• Construct B classification trees using B bootstrapped training datasets
• For prediction, there are two approaches:
1. Record the class predicted by each tree and take the most commonly occurring class as the overall prediction (majority vote).
2. If our classifier produces probability estimates, we can average the probabilities across trees and predict the class with the highest average probability.
• Both methods work well.

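Both aggregation rules are easy to write down explicitly. A minimal sketch assuming numpy and scikit-learn (the data and B are illustrative):

```python
# Two ways to aggregate bagged classification trees:
#   (1) majority vote over the predicted classes
#   (2) average the class probabilities, then take the argmax
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
rng = np.random.default_rng(0)
B = 50

trees = []
for _ in range(B):
    idx = rng.choice(len(X), size=len(X), replace=True)
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.array([t.predict(X[:5]) for t in trees]).astype(int)     # (B, 5)
majority = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)

probs = np.mean([t.predict_proba(X[:5]) for t in trees], axis=0)    # (5, n_classes)
avg_prob = probs.argmax(axis=1)

print(majority, avg_prob)   # the two rules usually agree
```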
A Comparison of Error Rates
• Here the green line represents a simple majority vote approach.

• The purple line corresponds to averaging the probability estimates.

• Both do far better than a single tree (dashed red) and get close to the Bayes error rate (dashed grey).
Out-of-Bag Error Estimation
• Since bootstrapping builds each training data set by randomly sampling observations with replacement, the observations left out of a given bootstrap sample can serve as test data for that tree.
• On average, each bagged tree makes use of around 2/3 of the observations, so roughly 1/3 of the observations (Out of Bag -- OOB) are left over and can be used for testing.

• Probability an observation appears in a given bootstrap sample: 1 - 1/exp(1) ~ 0.632
• Probability it is left out (OOB): 1/exp(1) ~ 0.368
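The 2/3 and 1/3 figures follow from a short calculation, and scikit-learn's bagging and forest estimators can report the OOB error directly. A sketch (the dataset and parameters are illustrative):

```python
# The chance that a given observation lands in a bootstrap sample of
# size n is 1 - (1 - 1/n)^n, which tends to 1 - 1/e ~ 0.632; the rest
# (~0.368) is out of bag. oob_score=True uses those left-out points
# as a built-in validation set.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

n = 1000
print(1 - (1 - 1 / n) ** n)   # ~0.632 in-bag
print((1 - 1 / n) ** n)       # ~0.368 out-of-bag

X, y = make_regression(n_samples=n, n_features=8, noise=5.0, random_state=0)
bag = BaggingRegressor(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print(bag.oob_score_)         # OOB R^2, no separate test set needed
```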
Variable Importance Measure
• Bagging typically improves the accuracy over
prediction using a single tree, but it is now hard to
interpret the model!
• We have hundreds of trees, and it is no longer clear
which variables are most important to the procedure
• Thus bagging improves prediction accuracy at the
expense of interpretability
• But, we can still get an overall summary of the
importance of each predictor using Relative
Influence Plots
Relative Influence Plots
• How do we decide which variables are most useful
in predicting the response?
• We can compute something called relative influence
plots.
• These plots give a score for each variable.
• These scores represent the total decrease in MSE due to splits on a particular variable, averaged over all B trees
• A number close to zero indicates the variable is not
important and could be dropped.
• The larger the score the more influence the variable has.

Example: Housing Data
• Median Income is by far the most important variable.

• Longitude, Latitude and Average occupancy are the next most important.
Random Forests

Random Forests for Classification
• It is a very efficient statistical learning method
• It builds on the idea of bagging, but it provides an improvement
because it de-correlates the trees
• Decision tree:
• Easy to achieve 0% error rate on training data
• If each training example has its own leaf ……
• Random forest: Bagging of decision tree
• Resampling training data is not sufficient
• Randomly restrict the features/questions used in each split
• How does it work?
• Build a number of decision trees on bootstrapped training samples, but when building these trees, each time a split in a tree is considered, a random sample of m predictors is chosen as split candidates from the full set of p predictors (usually m ≈ √p)

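A minimal random-forest classifier in scikit-learn; max_features="sqrt" is what restricts each split to a random subset of roughly √p predictors (the dataset and parameter values are illustrative):

```python
# Random forest: bagged trees where each split only considers a random
# subset of m ~ sqrt(p) predictors.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

rf = RandomForestClassifier(n_estimators=300, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))   # test accuracy
```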
Random Forests for Regression

Why are we considering a random sample of m
predictors instead of all p predictors for splitting?

• Suppose that we have a very strong predictor in the data set along with a number of other moderately strong predictors; then, in the collection of bagged trees, most or all of them will use the very strong predictor for the first split!

• All bagged trees will look similar. Hence all the predictions from the bagged trees will be highly correlated.

• Averaging many highly correlated quantities does not lead to a large variance reduction, so random forests “de-correlate” the bagged trees, leading to a greater reduction in variance.

Random Forest with different values of “m”
• Notice that when random forests are built using m = p, this amounts simply to bagging.

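In scikit-learn terms, max_features plays the role of m, so the special case m = p can be reproduced directly. A sketch (the data and the particular choices of m are illustrative):

```python
# max_features = 1.0 uses all p predictors at every split (plain bagging);
# smaller values de-correlate the trees, as in a random forest.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=600, n_features=20, noise=10.0, random_state=0)

for m in (1.0, 0.5, "sqrt"):   # m = p, m = p/2, m ~ sqrt(p)
    rf = RandomForestRegressor(n_estimators=200, max_features=m, random_state=0)
    print(m, cross_val_score(rf, X, y, cv=5).mean())   # CV R^2 for each choice of m
```

Comparing the cross-validated scores across choices of m is a simple way to see the effect described on these slides.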
