Chapter 7
Ensemble Learning and
Random Forests
Introduction
• Suppose you pose a complex question to thousands of random
people, then aggregate their answers.
• In many cases you will find that this aggregated answer is better than
an expert’s answer. This is called the wisdom of the crowd.
• Similarly, if you aggregate the predictions of a group of predictors
(such as classifiers or regressors), you will often get better predictions
than with the best individual predictor.
• A group of predictors is called an ensemble; thus, this technique is
called Ensemble Learning, and an Ensemble Learning algorithm is
called an Ensemble method.
Voting Classifiers
Suppose you have trained a few classifiers, each one achieving about
80% accuracy.
• A voting classifier that predicts the class getting the most votes (hard voting) often achieves a higher accuracy than the best classifier in the ensemble.
• Ensemble methods work best when the predictors are as
independent from one another as possible.
• One way to get diverse classifiers is to train them using very different
algorithms.
• If all classifiers are able to estimate class probabilities (i.e., they all have a predict_proba() method), then you can tell Scikit-Learn to predict the class with the highest class probability, averaged over all the individual classifiers. This is called soft voting (see the sketch below).
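A minimal sketch (not from the slides) of a soft-voting ensemble built with Scikit-Learn's VotingClassifier; the moons dataset, the choice of base estimators, and the hyperparameter values are illustrative assumptions.

# Illustrative soft-voting ensemble on a toy dataset (assumed, not from the slides).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # predict_proba() needed for soft voting
    ],
    voting="soft",  # average the predicted class probabilities
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))  # often beats each individual classifier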
Bagging and Pasting
• One way to get a diverse set of classifiers is to use very different training algorithms.
• Another approach is to use the same training algorithm for every predictor and train
them on different random subsets of the training set.
• When sampling is performed with replacement, this method is called bagging (short for bootstrap aggregating).
• When sampling is performed without replacement, it is called pasting.
• Once all predictors are trained, the ensemble can make a prediction
for a new instance by simply aggregating the predictions of all
predictors.
• The aggregation function is typically the statistical mode (i.e., the most frequent prediction, as in hard voting) for classification, or the average for regression (see the sketch below).
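As a rough sketch (toy dataset and hyperparameters are assumptions), both bagging and pasting can be implemented with Scikit-Learn's BaggingClassifier; only the bootstrap flag differs.

# Bagging: 500 Decision Trees, each trained on 100 instances sampled with replacement.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=500,
    max_samples=100,
    bootstrap=True,   # with replacement = bagging; set bootstrap=False for pasting
    n_jobs=-1,        # use all available CPU cores
    random_state=42,
)
bag_clf.fit(X_train, y_train)
print(bag_clf.score(X_test, y_test))

Note that BaggingClassifier automatically aggregates by soft voting when the base estimator can estimate class probabilities, and by hard voting otherwise.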
Out-of-Bag Evaluation
• With bagging, some instances may be sampled several times for any
given predictor, while others may not be sampled at all.
• Only about 63% of the training instances are sampled on average for each predictor. The remaining ~37% that are not sampled are called out-of-bag (OOB) instances.
• The OOB error is the mean prediction error on each training sample x_i, computed using only the predictors that did not have x_i in their bootstrap sample (see the sketch below).
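A short sketch of out-of-bag evaluation (dataset and hyperparameters are illustrative): with oob_score=True, Scikit-Learn evaluates each predictor on its OOB instances and exposes the averaged result as oob_score_, approximating the test-set accuracy without a separate validation set.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=500,
    bootstrap=True,
    oob_score=True,   # request out-of-bag evaluation after training
    n_jobs=-1,
    random_state=42,
)
bag_clf.fit(X_train, y_train)
print(bag_clf.oob_score_)             # OOB accuracy estimate
print(bag_clf.score(X_test, y_test))  # usually close to the OOB estimate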
Random Forests
• Random Forest is an ensemble of Decision Trees, generally trained via
the bagging method (or sometimes pasting), typically with
max_samples set to the size of the training set.
• Instead of building a BaggingClassifier and passing it a DecisionTreeClassifier, you can use the RandomForestClassifier class, which is more convenient and optimized for Decision Trees (see the sketch below).
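A minimal illustrative sketch (toy data and hyperparameter values assumed) of RandomForestClassifier:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A Random Forest is essentially bagged Decision Trees that additionally consider
# only a random subset of the features at each split.
rnd_clf = RandomForestClassifier(
    n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42
)
rnd_clf.fit(X_train, y_train)
print(rnd_clf.score(X_test, y_test))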
[Figure: Decision Tree]
[Figure: Random Forest]
Extra-Trees
• When you are growing a tree in a Random Forest, at each node only a
random subset of the features is considered for splitting.
• It is possible to make trees even more random by also using random
thresholds for each feature rather than searching for the best possible
thresholds (like regular Decision Trees do).
• A forest of such extremely random trees is called an Extremely
Randomized Trees ensemble (or Extra-Trees for short).
• This technique trades more bias for a lower variance.
• It also makes Extra-Trees much faster to train than regular Random
Forests, because finding the best possible threshold for each feature
at every node is one of the most time-consuming tasks of growing a
tree.
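For illustration (the toy dataset and hyperparameter values are assumptions), ExtraTreesClassifier mirrors the RandomForestClassifier API:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same API as RandomForestClassifier, but split thresholds are drawn at random
# instead of being searched for, trading a bit more bias for lower variance.
extra_clf = ExtraTreesClassifier(
    n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42
)
extra_clf.fit(X_train, y_train)
print(extra_clf.score(X_test, y_test))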
Feature Importance
• Scikit-Learn measures a feature’s importance by looking at how much
the tree nodes that use that feature reduce impurity on average
(across all trees in the forest).
• More precisely, it is a weighted average, where each node’s weight is
equal to the number of training samples that are associated with it.
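As a hedged example (the iris dataset is an assumption, not taken from the slides), Scikit-Learn exposes these scores through the feature_importances_ attribute after training:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=42)
rnd_clf.fit(iris.data, iris.target)

# Importances are normalized so that they sum to 1.
for name, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(name, round(score, 3))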
Boosting
• Boosting (originally called hypothesis boosting) refers to any
Ensemble method that can combine several weak learners into a
strong learner.
• The general idea of most boosting methods is to train predictors
sequentially, each trying to correct its predecessor.
Adaptive Boosting (AdaBoost)
• One way for a new predictor to correct its predecessor is to pay a bit
more attention to the training instances that the predecessor
underfitted.
• This results in new predictors focusing more and more on the hard
cases.
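A minimal sketch of Scikit-Learn's AdaBoostClassifier built on decision stumps (depth-1 trees); the dataset and hyperparameter values are illustrative assumptions.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=200,
    learning_rate=0.5,
    random_state=42,
)
ada_clf.fit(X_train, y_train)
print(ada_clf.score(X_test, y_test))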
AdaBoost in detail
(0) Initially, each instance weight is set to

    w^{(i)} = \frac{1}{m}

    where m is the number of training instances.

(1) Weighted error rate of the j-th predictor:

    r_j = \frac{\sum_{i=1,\; \hat{y}_j^{(i)} \neq y^{(i)}}^{m} w^{(i)}}{\sum_{i=1}^{m} w^{(i)}}

    where \hat{y}_j^{(i)} is the j-th predictor's prediction for the i-th instance.

(2) Predictor weight:

    \alpha_j = \eta \,\log\frac{1 - r_j}{r_j}
𝜼 is the learning rate hyperparameter (defaults to 1). The more accurate the
predictor is, the higher its weight will be.
If it is just guessing randomly, then its weight will be close to zero.
If its weight is negative, the predictor is less accurate than random guessing (a wrong predictor).
(3) Instance weight update: the weights of the misclassified instances are boosted,

    w^{(i)} \leftarrow w^{(i)} \exp(\alpha_j) \quad \text{if } \hat{y}_j^{(i)} \neq y^{(i)}, \text{ otherwise left unchanged},

    then all weights are normalized and a new predictor is trained with the updated weights.

(4) To make predictions, AdaBoost computes

    \hat{y}(x) = \underset{k}{\operatorname{argmax}} \sum_{j=1,\; \hat{y}_j(x)=k}^{N} \alpha_j

    where N is the number of predictors.
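To make the steps above concrete, here is a rough NumPy sketch of the same update rules using decision stumps as weak learners; it is a didactic toy under the assumptions of a binary moons dataset and 200 boosting rounds, not Scikit-Learn's implementation.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
m = len(X)
w = np.ones(m) / m            # (0) initial instance weights w^(i) = 1/m
eta = 1.0                     # learning rate
alphas, stumps = [], []

for j in range(200):
    stump = DecisionTreeClassifier(max_depth=1, random_state=j)
    stump.fit(X, y, sample_weight=w)
    y_pred = stump.predict(X)
    miss = y_pred != y
    r = w[miss].sum() / w.sum()            # (1) weighted error rate
    alpha = eta * np.log((1 - r) / r)      # (2) predictor weight
    w[miss] *= np.exp(alpha)               # (3) boost weights of misclassified instances
    w /= w.sum()                           #     ...then normalize
    alphas.append(alpha)
    stumps.append(stump)

# (4) prediction: class with the largest total alpha among the stumps voting for it
votes = np.zeros((m, 2))                   # two classes in this toy problem
for alpha, stump in zip(alphas, stumps):
    votes[np.arange(m), stump.predict(X)] += alpha
print((votes.argmax(axis=1) == y).mean())  # training accuracy of the toy ensemble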
Scikit-Learn uses a multiclass version of AdaBoost called SAMME (which stands for Stagewise
Additive Modeling using a Multiclass Exponential loss function). When there are just two classes,
SAMME is equivalent to AdaBoost.
Gradient Boosting
• Just like AdaBoost, Gradient Boosting works by sequentially adding
predictors to an ensemble, each one correcting its predecessor.
• However, Gradient Boosting tries to fit the new predictor to the
residual errors made by the previous predictor.
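A sketch of this residual-fitting idea on a noisy quadratic (the data-generating function and hyperparameter values are assumptions): each new tree is fit to the residuals of the ensemble so far, and the ensemble predicts by summing the trees' predictions; GradientBoostingRegressor builds roughly the same ensemble more conveniently.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-0.5, 0.5, size=(100, 1))
y = 3 * X[:, 0] ** 2 + 0.05 * rng.normal(size=100)   # noisy quadratic

tree1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)
y2 = y - tree1.predict(X)                            # residuals of tree 1
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y2)
y3 = y2 - tree2.predict(X)                           # residuals of trees 1+2
tree3 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y3)

X_new = np.array([[0.4]])
y_pred = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))
print(y_pred)                                        # sum of the three trees' predictions

# Roughly the same ensemble, built by Scikit-Learn:
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3,
                                 learning_rate=1.0, random_state=42)
gbrt.fit(X, y)
print(gbrt.predict(X_new))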
Stacking
• The last Ensemble method we will
discuss in this chapter is called
stacking (short for stacked
generalization).
• The idea: instead of using a trivial aggregation function (such as hard voting), train a model to perform the aggregation.
• To train the blender (meta learner), a common approach is to use a hold-out set (see the sketch after this list).
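A sketch using Scikit-Learn's StackingClassifier (the base estimators, blender, and toy dataset are illustrative assumptions); note that this class trains the blender on cross-validated predictions rather than on a single hold-out set.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stack_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # the blender / meta learner
    cv=5,                                  # out-of-fold predictions feed the blender
)
stack_clf.fit(X_train, y_train)
print(stack_clf.score(X_test, y_test))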
• First, the training set is split into two subsets. The first subset is used to train the predictors in the first layer.
• Next, the first-layer predictors make predictions on the second (held-out) subset; these predictions are used as the input features of a new training set on which the blender is trained.
Split the training set into three subsets:
• The first one is used to train the first layer,
• The second one is used to create the training
set used to train the second layer (using
predictions made by the predictors of the
first layer), and
• The third one is used to create the training set to train the third layer (using predictions made by the predictors of the second layer).
Once this is done, we can make a prediction for
a new instance by going through each layer
sequentially