Week 11: Ensemble Learning

Topics: ensemble learning, model selection, statistical validation
Ensemble Learning
Definition
Types of ensembles
The ensemble learning process

[Diagram of the ensemble learning process; one of its phases is labelled as optional.]
Methods to generate homogeneous ensembles

[Diagram: the training examples are fed to an induction algorithm to produce a model; manipulating the training examples yields classifiers A, B and C, whose predictions are then combined.]
Modeling process manipulation
where algorithm, algorithm’ and algorithm’’ are variations of the same induction algorithm
How to combine models (the integration phase)

Combination rules include:
Product
Maximum
Minimum
Median
For classification:
The base classifiers should be as accurate as possible and have errors that are as diverse as possible, so that the true class remains the majority-vote winner (see Brown, G. & Kuncheva, L., “Good” and “Bad” Diversity in Majority Vote Ensembles, Multiple Classifier Systems, Springer, 2010, LNCS 5997, 124-133).
It is not possible to obtain the optimal ensemble of classifiers based only on knowledge of the base learners.
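As an illustration of these combination rules, here is a minimal sketch (not from the slides; the class-probability values are made up) showing how the per-class scores of three base classifiers can be fused:

import numpy as np

# Hypothetical class-probability estimates for one test example:
# one row per base classifier, one column per class.
P = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1],
              [0.2, 0.7, 0.1]])

rules = {
    "product": P.prod(axis=0),
    "maximum": P.max(axis=0),
    "minimum": P.min(axis=0),
    "median":  np.median(P, axis=0),
}
for name, scores in rules.items():
    # The ensemble predicts the class with the highest combined score.
    print(f"{name:>7}: combined scores {scores}, predicted class {scores.argmax()}")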
Characteristics of the base models

For regression:
It is possible to express the error of the ensemble as a function of the errors of the base learners.
Assuming the average as the combination method:
The average error of the base learners (bias) should be as small as possible, i.e., the base learners should be as accurate (on average) as possible.
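The slide does not show the expression itself; one standard way to write it (the ambiguity decomposition of Krogh & Vedelsby, given here as a likely reading of the claim, with f_1, ..., f_M the base learners, \bar{f} their average, and y the target) is:

\bigl(\bar{f}(x) - y\bigr)^2
  = \underbrace{\frac{1}{M}\sum_{i=1}^{M}\bigl(f_i(x) - y\bigr)^2}_{\text{average error of the base learners}}
  - \underbrace{\frac{1}{M}\sum_{i=1}^{M}\bigl(f_i(x) - \bar{f}(x)\bigr)^2}_{\text{ambiguity (diversity)}}

so the ensemble error is never larger than the average individual error, and it decreases as the base learners become more diverse.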
Popular ensemble methods

Bagging:
averaging the predictions over a collection of unstable predictors generated from bootstrap samples (both classification and regression)

Boosting:
weighted vote over a collection of classifiers trained sequentially on training sets that give priority to the instances wrongly classified so far (classification)

Random Forest:
averaging the predictions over a collection of trees split using a randomly selected subset of features (both classification and regression)

Heterogeneous ensembles:
combining a set of heterogeneous predictors (both classification and regression)
Bagging: Bootstrap AGGregatING

Analogy: diagnosis based on the majority vote of multiple doctors.

Training:
Given a set D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample).
A classifier model Mi is learned from each training set Di.

Classification (classifying an unknown sample X):
Each classifier Mi returns its class prediction.
The bagged classifier M* counts the votes and assigns to X the class with the most votes.

Prediction: bagging can also be applied to the prediction of continuous values by taking the average of the individual predictions for a given test tuple.
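The training and classification steps above can be sketched in a few lines of Python (an illustrative sketch only, using scikit-learn decision trees as the base learner and a synthetic dataset):

import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def bagging_train(X, y, n_models=25, random_state=0):
    """Learn one classifier per bootstrap sample of the training set."""
    rng = np.random.default_rng(random_state)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)             # sample d tuples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Each model votes; the class with the most votes wins."""
    votes = np.array([m.predict(X) for m in models])  # shape (n_models, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

X, y = make_classification(n_samples=300, random_state=0)
models = bagging_train(X, y)
print("training accuracy:", np.mean(bagging_predict(models, X) == y))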
Bagging (Breiman 1996)
Accuracy:
Often significantly better than a single classifier derived from D.
For noisy data: not considerably worse, and more robust.
Proven to improve prediction accuracy.

Requirement: needs unstable classifier types.
Unstable means that a small change to the training data may lead to major changes in the resulting decisions.

Stability in training:
Training: construct classifier f from D.
Stability: small changes on D result in small changes on f.
Decision trees are a typical example of an unstable classifier.
[Figure: bagging illustration, http://en.wikibooks.org/wiki/File:DTE_Bagging.png]
Boosting

Analogy: consult several doctors and combine their diagnoses through a weighted vote, with each weight based on the accuracy of the doctor's previous diagnoses.

Models are created incrementally, selecting the training examples according to some distribution.

How does boosting work?
Weights are assigned to each training example.
A series of k classifiers is iteratively learned.
After a classifier Mi is learned, the weights are updated so that the subsequent classifier, Mi+1, pays more attention to the training examples that were misclassified by Mi.
The final classifier M* combines the votes of the individual classifiers, where the weight of each classifier's vote is a function of its accuracy.
Boosting: Construct Weak Classifiers

Using different data distributions:
Start with uniform weighting.
During each step of learning:
Increase the weights of the examples that are not correctly learned by the weak learner.
Decrease the weights of the examples that are correctly learned by the weak learner.

Idea: focus on the difficult examples that were not correctly classified in the previous steps.
Boosting: Combine Weak Classifiers

Weighted voting:
Construct a strong classifier by a weighted vote of the weak classifiers.

Idea:
A better weak classifier gets a larger weight.
Weak classifiers are added iteratively.
The accuracy of the combined classifier is increased through the minimization of a cost function.
Boosting

Differences from bagging:
Models are built sequentially on modified versions of the data.
The predictions of the models are combined through a weighted sum/vote.
AdaBoost: a popular boosting algorithm
(Freund and Schapire, 1996)
AdaBoost comments

This distribution update ensures that instances misclassified by the previous classifier are more likely to be included in the training data of the next classifier.
Hence, the training data of consecutive classifiers are geared towards increasingly hard-to-classify instances.
Unlike bagging, AdaBoost uses a rather undemocratic voting scheme, called weighted majority voting. The idea is an intuitive one: classifiers that have shown good performance during training are rewarded with higher voting weights than the others.
The diagram should be interpreted with the understanding that the algorithm is sequential: classifier C_K is created before classifier C_(K+1), which in turn requires that β_K and the current distribution D_K be available.
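A compact sketch of the AdaBoost mechanism described above (an illustrative reconstruction, not the slides' exact pseudocode; it assumes binary labels in {-1, +1}, decision stumps from scikit-learn as weak learners, and a synthetic dataset):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, n_rounds=20):
    """Train AdaBoost with decision stumps; y must be in {-1, +1}."""
    n = len(y)
    D = np.full(n, 1.0 / n)                     # start with a uniform distribution
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=D)
        pred = stump.predict(X)
        err = np.clip(np.sum(D[pred != y]), 1e-10, 1 - 1e-10)   # weighted error
        alpha = 0.5 * np.log((1 - err) / err)   # voting weight of this classifier
        # Increase weights of misclassified examples, decrease the correct ones.
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted majority vote of the weak classifiers."""
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)

X, y = make_classification(n_samples=300, random_state=0)
y = 2 * y - 1                                   # map {0, 1} to {-1, +1}
stumps, alphas = adaboost_train(X, y)
print("training accuracy:", np.mean(adaboost_predict(stumps, alphas, X) == y))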
Random Forest (Breiman 2001)

Random Forest: a variation of the bagging algorithm, built from individual decision trees.
Diversity is guaranteed by randomly selecting, at each split, a subset of the original features during tree generation.
During classification, each tree votes and the most popular class is returned.
During regression, the result is the average of the predictions of all generated trees.
Random Forest (Breiman 2001)

Two methods to construct a Random Forest:
Forest-RI (random input selection): randomly select, at each node, F attributes as candidates for the split at that node; the CART methodology is used to grow the trees to maximum size.
Forest-RC (random linear combinations): creates new attributes (or features) that are linear combinations of the existing attributes (this reduces the correlation between individual classifiers).

Comparable in accuracy to AdaBoost, but more robust to errors and outliers.
Insensitive to the number of attributes selected for consideration at each split, and faster than bagging or boosting.
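For reference, the Forest-RI idea maps onto the max_features parameter of scikit-learn's RandomForestClassifier, which sets the number of randomly selected candidate attributes at each split (a usage sketch on a synthetic dataset; scikit-learn is not referenced in the slides):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# max_features plays the role of F: "sqrt" means roughly sqrt(20) candidate
# attributes are considered at each split of each tree.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)
print("training accuracy:", rf.score(X, y))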
Ensemble learning via negative correlation learning

Negative correlation learning can only be used with ensemble regression algorithms that minimize/maximize a given objective function (e.g., neural networks, support vector regression).
The idea is that each model should be trained so as to minimize the error function of the ensemble, i.e., a penalty term involving the averaged error of the models already trained is added to its error function.
This approach produces models that are negatively correlated with the averaged error of the previously generated models.
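A common way to write the penalized error of model i in negative correlation learning (following the cited Liu & Yao, 1999, formulation; the slides do not spell it out) is:

E_i = \frac{1}{N}\sum_{n=1}^{N} \frac{1}{2}\bigl(F_i(n) - d(n)\bigr)^2
    + \lambda\,\frac{1}{N}\sum_{n=1}^{N} p_i(n),
\qquad
p_i(n) = \bigl(F_i(n) - \bar{F}(n)\bigr)\sum_{j \ne i}\bigl(F_j(n) - \bar{F}(n)\bigr)

where d(n) is the target, \bar{F}(n) is the ensemble (average) output, and \lambda \ge 0 controls the strength of the correlation penalty.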
Model selection
Hastie, T.; Tibshirani, R. & Friedman, J. H., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001, p. 313.
Statistical validation
Introductory References
Witten, I. H. & Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999
Top References
Wolpert, D. H., Stacked generalization, Neural Networks, 1992, 5, 241-259
Breiman, L., Bagging predictors, Machine Learning, 1996, 24, 123-140
Freund, Y. & Schapire, R., Experiments with a new boosting algorithm, International Conference on Machine Learning, 1996, 148-156
Breiman, L., Random forests, Machine Learning, 2001, 45, 5-32
Liu, Y. & Yao, X., Ensemble learning via negative correlation, Neural Networks, 1999, 12, 1399-1404
Rodríguez, J. J.; Kuncheva, L. I. & Alonso, C. J., Rotation forest: a new classifier ensemble method, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28, 1619-1630