Chapter 7 - Ensemble
**Soft voting** instead of hard voting when all the classifiers can estimate class probabilities (predict_proba) => averages the predicted class probabilities over all the individual classifiers, so more confident votes carry more weight; it often performs better than hard voting.
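A minimal sketch of soft voting with VotingClassifier (X_train and y_train are assumed here, they are not defined in these notes):
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

voting_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier()),
                ("svc", SVC(probability=True))],  # SVC needs probability=True for predict_proba
    voting="soft")  # average the predicted class probabilities
voting_clf.fit(X_train, y_train)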
When sampling is performed with replacement, this method is called bagging (short for bootstrap aggregating); when sampling is performed without replacement, it is called pasting.
Bagging ends up slightly more biased than pasting, but the extra diversity means the predictors are less correlated, so the ensemble's variance is lower.
OOB (out-of-bag) instances are the training instances that a given predictor never sees during training, around a third of them; they can be used to evaluate the ensemble without a separate validation set.
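A quick sketch of OOB evaluation (X_train / y_train assumed, as above):
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, oob_score=True, n_jobs=-1)
bag_clf.fit(X_train, y_train)
bag_clf.oob_score_  # accuracy estimated on the out-of-bag instances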
Sampling both training instances and features is called the Random Patches method. Keeping all
training instances (by setting bootstrap=False and max_samples=1.0) but sampling features (by setting
bootstrap_features to True and/or max_features to a value smaller than 1.0) is called the Random
Subspaces method.
Sampling features => more diversity => more bias but lower variance
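For example, a Random Subspaces setup could look like this (a sketch; the parameter values are illustrative):
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Random Subspaces: keep every training instance, sample only the features
subspace_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=False, max_samples=1.0,           # all training instances
    bootstrap_features=True, max_features=0.5,  # random feature subsets
    n_jobs=-1)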
# bagging decision trees that use random splits (imports as above)
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(splitter="random", max_leaf_nodes=16),
    n_estimators=500, n_jobs=-1)
Extra-trees
Random thresholds for each feature rather than searching for the best possible threshold =>
Extremely Randomized Trees ensemble
=> trades more bias for lower variance => much faster to train (finding the best possible threshold for each feature at every node is time-consuming)
Tip: hard to tell in advance whether an Extra-Trees ensemble will do better or worse than a regular Random Forest; the only way to know is to try both and compare with cross-validation.
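A quick comparison sketch (X_train / y_train assumed; hyperparameters are illustrative):
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
ext_clf = ExtraTreesClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)

# neither wins by default: compare them on your data
print(cross_val_score(rnd_clf, X_train, y_train, cv=5).mean())
print(cross_val_score(ext_clf, X_train, y_train, cv=5).mean())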
Feature importance:
=> measured as how much the tree nodes that use a feature reduce impurity on average, across all trees in the forest (weighted by how many samples each node covers); exposed via feature_importances_
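For example, on the iris dataset (this mirrors the usual scikit-learn usage):
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1)
rnd_clf.fit(iris["data"], iris["target"])
for name, score in zip(iris["feature_names"], rnd_clf.feature_importances_):
    print(name, score)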
AdaBoost:
Pays more attention to the training instances that its predecessor underfitted.
Warning: one drawback => training cannot be parallelized (each predictor can only be trained after the previous one has been trained and evaluated)
=> does not scale as well as bagging or pasting.
AdaBoost Algo:
Each instance weight w(i) is initially set to 1/m.
Then the predictor's weight αj is computed from its weighted error rate; η is the learning rate hyperparameter (defaults to 1).
The more accurate the predictor, the higher its weight. If it is just guessing randomly, its weight is close to 0; if it is often wrong (worse than random), its weight is negative.
The weights of the misclassified instances are then boosted (multiplied by exp(αj)) and all the weights are normalized.
Then repeat. The algorithm stops when the desired number of predictors is reached, or when a perfect predictor is found.
**To predict**, AdaBoost computes the predictions of all the predictors and weights them by their predictor weights αj. The predicted class is the one that receives the majority of weighted votes.
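For reference, a sketch of the standard AdaBoost update rules (notation: m instances, predictor j, N predictors, learning rate η; this is the usual formulation, not copied from these notes):

$$r_j = \frac{\sum_{i=1,\; \hat{y}_j^{(i)} \neq y^{(i)}}^{m} w^{(i)}}{\sum_{i=1}^{m} w^{(i)}}, \qquad \alpha_j = \eta \, \log\frac{1 - r_j}{r_j}$$

$$w^{(i)} \leftarrow \begin{cases} w^{(i)} & \text{if } \hat{y}_j^{(i)} = y^{(i)} \\ w^{(i)} \exp(\alpha_j) & \text{otherwise} \end{cases}, \quad \text{then normalize so that } \textstyle\sum_i w^{(i)} = 1$$

$$\hat{y}(\mathbf{x}) = \underset{k}{\operatorname{argmax}} \sum_{j=1}^{N} \alpha_j \,\big[\hat{y}_j(\mathbf{x}) = k\big]$$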
Scikit-Learn uses a multiclass version of AdaBoost called SAMME (Stagewise Additive Modeling using a Multiclass Exponential loss function); if the predictors can estimate class probabilities (predict_proba()), it can use the SAMME.R variant, which generally performs better.
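A minimal AdaBoost sketch with decision stumps (X_train / y_train assumed; the algorithm argument is left out because recent scikit-learn releases deprecate the SAMME.R option):
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=200, learning_rate=0.5)
ada_clf.fit(X_train, y_train)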
Gradient Boosting:
Like AdaBoost, but instead of tweaking the instance weights at every iteration, it fits each new predictor to the residual errors made by the previous one.
Scaling down each tree's contribution with a low learning_rate requires more trees but usually generalizes better => this regularization technique is called shrinkage.
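A sketch of the idea with plain decision trees fitted on residuals (X, y, X_new assumed to be NumPy arrays, not defined in these notes):
from sklearn.tree import DecisionTreeRegressor

tree_reg1 = DecisionTreeRegressor(max_depth=2)
tree_reg1.fit(X, y)

# second tree is trained on the residual errors of the first
y2 = y - tree_reg1.predict(X)
tree_reg2 = DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(X, y2)

# third tree is trained on the residuals of the second
y3 = y2 - tree_reg2.predict(X)
tree_reg3 = DecisionTreeRegressor(max_depth=2)
tree_reg3.fit(X, y3)

# the ensemble's prediction is the sum of the trees' predictions
y_pred = sum(tree.predict(X_new) for tree in (tree_reg1, tree_reg2, tree_reg3))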
Early stopping:
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y)  # X, y assumed already defined

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120)
gbrt.fit(X_train, y_train)

# validation error at each training stage, then keep the best number of trees
errors = [mean_squared_error(y_val, y_pred)
          for y_pred in gbrt.staged_predict(X_val)]
bst_n_estimators = np.argmin(errors) + 1

gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=bst_n_estimators)
gbrt_best.fit(X_train, y_train)
You can also set warm_start=True => scikit-learn keeps the existing trees when fit() is called again, allowing incremental training:
gbrt = GradientBoostingRegressor(max_depth=2, warm_start=True)

min_val_error = float("inf")
error_going_up = 0
for n_estimators in range(1, 120):
    gbrt.n_estimators = n_estimators
    gbrt.fit(X_train, y_train)  # keeps the trees already trained
    y_pred = gbrt.predict(X_val)
    val_error = mean_squared_error(y_val, y_pred)
    if val_error < min_val_error:
        min_val_error = val_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:
            break  # early stopping: validation error went up 5 times in a row
subsample hyperparameter => fraction of the training instances used to train each tree => higher bias, lower variance => speeds up training considerably => Stochastic Gradient Boosting
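For example (a sketch; the 0.25 value is illustrative):
from sklearn.ensemble import GradientBoostingRegressor

# each tree is trained on a random 25% of the training instances
sgb_reg = GradientBoostingRegressor(max_depth=2, n_estimators=120, subsample=0.25)
sgb_reg.fit(X_train, y_train)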
Stacking (stacked generalization):
Simple idea => train a model to aggregate the votes => blender/ meta learner
To train the blender => use a hold-out set: train the first-layer predictors on one subset, have them predict on the held-out subset, and use those predictions as the input features for the blender (sketch below)
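A minimal hold-out blender sketch (X, y assumed; the choice of base predictors and blender is illustrative):
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# keep half of the data as a hold-out set for the blender
X_1, X_hold, y_1, y_hold = train_test_split(X, y, test_size=0.5)

# first layer: base predictors trained on the first subset
layer1 = [RandomForestClassifier(n_estimators=100),
          ExtraTreesClassifier(n_estimators=100)]
for clf in layer1:
    clf.fit(X_1, y_1)

# their predictions on the hold-out set become the blender's training features
X_blend = np.column_stack([clf.predict(X_hold) for clf in layer1])
blender = LogisticRegression()
blender.fit(X_blend, y_hold)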
=> Possible to train several different blenders (one using linear regression, another using random forests) => a whole layer of blenders
=> The trick: split the training set into 3 subsets, the first one to train the 1st layer, the second one (via the 1st layer's predictions) to build the training set for the 2nd layer, the third one (via the 2nd layer's predictions) to build the training set for the 3rd layer
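Scikit-Learn also ships StackingClassifier / StackingRegressor, which implement the same idea but use cross-validated predictions instead of a single hold-out set; a sketch (X_train / y_train assumed):
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

stack_clf = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100)),
                ("svc", SVC(probability=True))],
    final_estimator=LogisticRegression(),  # the blender
    cv=5)  # out-of-fold predictions are used to train the blender
stack_clf.fit(X_train, y_train)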