Ensemble Methods
Introduction
• Ensemble learning helps improve machine learning results by
combining several models.
• Ensemble learning combines the predictions from multiple models to
reduce the variance of predictions and reduce generalization error.
• This approach typically produces better predictive performance than
any single model alone.
• Ensemble methods are meta-algorithms that combine several
machine learning techniques into one predictive model in order to
decrease variance (bagging), bias (boosting), or improve predictions
(stacking).
Ensemble Methods
• Models in an ensemble can differ from each other for a variety of
reasons, from the population they are built on to the technique used
to build them.
The differences can be due to:
1. Difference in Population.
2. Difference in Hypothesis.
3. Difference in Modeling Technique.
4. Difference in Initial Seed.
Error in Ensemble Learning (Variance vs. Bias)
The error emerging from any model can be broken down mathematically into three components: bias, variance,
and irreducible error (noise that no model can remove). These components are:
Bias error quantifies how far, on average, the predicted values are from the actual value. A high bias error means
an under-performing model that keeps missing important trends.
Variance, on the other hand, quantifies how much the predictions made for the same observation differ from each
other. A high-variance model will over-fit your training population and perform badly on any observation beyond
the training data. (In the accompanying diagram, assume the red spot is the real value and the blue dots are
predictions.)
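For squared-error loss, this decomposition is commonly written as:
Err(x) = Bias² + Variance + Irreducible Error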
Contd..
• Normally, as you increase the complexity of your model, you will see
a reduction in error due to lower bias. However, this only happens up
to a certain point.
• As you continue to make your model more complex, you end up
over-fitting it, and your model will start suffering from high
variance.
Ensemble Learning Types
Bootstrapping
• Bootstrap refers to random sampling with replacement. Bootstrapping
allows us to better understand the bias and the variance within a
dataset.
• Bootstrapping involves randomly sampling small subsets of data from
the dataset. Because the sampling is done with replacement, the same
example can be selected more than once, and every example in the
dataset has an equal probability of being selected. This method can
help us better estimate statistics such as the mean and standard
deviation of the dataset.
• Let’s assume we have a sample of ‘n’ values (x) and we’d like to get
an estimate of the mean of the sample.
mean(x) = 1/n * sum(x)
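As a minimal sketch of this idea (using NumPy; the sample data and the number of resamples below are purely illustrative), the bootstrap can be used to estimate the mean and its standard error:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)   # a hypothetical sample of n values

boot_means = []
for _ in range(1000):
    # draw n values from x with replacement: one bootstrap sample
    sample = rng.choice(x, size=len(x), replace=True)
    boot_means.append(sample.mean())

print("sample mean:", x.mean())                      # mean(x) = 1/n * sum(x)
print("bootstrap mean estimate:", np.mean(boot_means))
print("bootstrap standard error:", np.std(boot_means))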
Visual Interpretation of Bootstrapping
Parallel Ensemble Learning (Bagging)
• Bagging is a machine learning ensemble meta-algorithm intended to
improve the stability and accuracy of machine learning algorithms
used for classification and regression. It also reduces variance and
helps to overcome over-fitting.
• Parallel ensemble methods, where the base learners are generated in
parallel.
• Algorithms: Random Forest, Bagged Decision Trees, Extra Trees
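A minimal scikit-learn sketch of these parallel ensembles (the dataset and hyperparameters are illustrative, not part of the original slides):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Bagged Decision Trees": BaggingClassifier(n_estimators=100, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

# Each model trains its base learners independently on resampled data
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())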
Sequential Ensemble Learning (Boosting)
• Boosting is a machine learning ensemble meta-algorithm for
principally reducing bias, and also variance, in supervised learning;
it is a family of machine learning algorithms that convert weak
learners into strong ones.
• Sequential ensemble methods, where the base learners are generated
sequentially.
• Examples: AdaBoost, Stochastic Gradient Boosting
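A minimal scikit-learn sketch of these sequential ensembles (dataset and hyperparameters are again illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# AdaBoost re-weights training examples after each weak learner
ada = AdaBoostClassifier(n_estimators=100, random_state=0)

# subsample < 1.0 turns gradient boosting into stochastic gradient boosting
sgb = GradientBoostingClassifier(n_estimators=100, subsample=0.8, random_state=0)

for name, model in [("AdaBoost", ada), ("Stochastic Gradient Boosting", sgb)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())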
Ensemble Methods: Increasing the Accuracy
• Ensemble methods
• Use a combination of models to increase accuracy
• Combine a series of k learned models, M1, M2, …, Mk, with
the aim of creating an improved model M*
• Popular ensemble methods
• Bagging: averaging the prediction over a collection of
classifiers
• Boosting: weighted vote with a collection of classifiers
• Ensemble: combining a set of heterogeneous classifiers
Bagging: Bootstrap Aggregation
• Analogy: Diagnosis based on multiple doctors’ majority vote
• Training
• Given a set D of d tuples, at each iteration i, a training set Di of d tuples is
sampled with replacement from D (i.e., bootstrap)
• A classifier model Mi is learned for each training set Di
• Classification: classify an unknown tuple X
• Each classifier Mi returns its class prediction
• The bagged classifier M* counts the votes and assigns the class with the
most votes to X
• Prediction: can be applied to the prediction of continuous values by taking
the average value of each prediction for a given test tuple
• Accuracy
• Often significantly better than a single classifier derived from D
• Proven to give improved accuracy in prediction
Basic Algorithm
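The pseudocode from the original slide is not reproduced here; the sketch below (a from-scratch illustration using NumPy and scikit-learn decision trees as base learners, with function names of my own choosing) follows the training and voting steps just described:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, random_state=0):
    """Learn k classifiers M1..Mk, each on a bootstrap sample Di of D."""
    rng = np.random.default_rng(random_state)
    d = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, d, size=d)               # sample d tuples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Each Mi votes; the bagged classifier M* returns the majority class.
    Assumes integer class labels 0..C-1."""
    votes = np.array([m.predict(X) for m in models])   # shape (k, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)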
Boosting
• Analogy: Consult several doctors, based on a combination of
weighted diagnoses—weight assigned based on the previous
diagnosis accuracy
• How does boosting work?
• Weights are assigned to each training tuple
• A series of k classifiers is iteratively learned
• After a classifier Mi is learned, the weights are updated to
allow the subsequent classifier, Mi+1, to pay more attention to
the training tuples that were misclassified by Mi
• The final M* combines the votes of each individual classifier,
where the weight of each classifier's vote is a function of its
accuracy
• Boosting algorithm can be extended for numeric prediction
• Comparing with bagging: Boosting tends to have greater accuracy,
but it also risks overfitting the model to misclassified data
AdaBoost (Freund and Schapire, 1997)
• Given a set of d class-labeled tuples, (X1, y1), …, (Xd, yd)
• Initially, all the weights of tuples are set the same (1/d)
• Generate k classifiers in k rounds. At round i,
• Tuples from D are sampled (with replacement) to form a training set
Di of the same size
• Each tuple’s chance of being selected is based on its weight
• A classification model Mi is derived from Di
• Its error rate is calculated using Di as a test set
• If a tuple is misclassified, its weight is increased; otherwise, it is decreased
• Error rate: err(Xj) is the misclassification error of tuple Xj (1 if misclassified, 0 otherwise). Classifier Mi's
error rate is the sum of the weights of the misclassified tuples:
error(Mi) = Σj wj × err(Xj)
• The weight of classifier Mi's vote is log((1 − error(Mi)) / error(Mi))
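A minimal sketch of one round of this weight update (the function name and interface are my own, purely illustrative):

import numpy as np

def adaboost_round(w, y_true, y_pred):
    """One AdaBoost re-weighting step; w are the current tuple weights (summing to 1).
    Assumes 0 < error(Mi) < 0.5."""
    miss = (y_true != y_pred).astype(float)       # err(Xj): 1 if tuple j is misclassified
    error = np.sum(w * miss)                      # error(Mi): sum of weights of misclassified tuples
    alpha = np.log((1.0 - error) / error)         # weight of classifier Mi's vote
    # decrease the weights of correctly classified tuples, then re-normalize
    w = w * np.where(miss == 1.0, 1.0, error / (1.0 - error))
    w = w / w.sum()
    return w, alpha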
Random Forest Classifier
1. Take a random sample of size N with replacement from the data.
2. Take a random sample without replacement of the predictors.
3. Construct the first CART partition of the data.
4. Repeat Step 2 for each subsequent split until the tree is as large as
desired. Do not prune.
5. Repeat Steps 1–4 a large number of times.
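A hedged mapping of these steps onto scikit-learn's RandomForestClassifier (the CART construction in Steps 3-4 is handled internally; parameter values are illustrative):

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=500,        # Step 5: repeat the whole procedure a large number of times
    bootstrap=True,          # Step 1: random sample of size N with replacement
    max_features="sqrt",     # Step 2: random sample of predictors at each split
    max_depth=None,          # Step 4: grow each tree as large as desired, no pruning
    random_state=0,
)
# rf.fit(X, y) would then grow the forest on a feature matrix X and labels y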
Example
• Each decision tree in the ensemble is built upon a random bootstrap sample of
the original data, which contains positive (green labels) and negative (red labels)
examples.
• Class prediction for new instances using a random forest model is based on a
majority voting procedure among all individual trees.
• Bagging features and samples simultaneously: At each tree split, a
random sample of m features is drawn, and only those m features are
considered for splitting.
• Typically m = √d or log2(d), where d is the total number of features
• For each tree grown on a bootstrap sample, the error rate for the
observations left out of that bootstrap sample is monitored. This is
called the "out-of-bag" (OOB) error rate.
• Random forests try to improve on bagging by "de-correlating" the
trees; each tree has the same expectation.
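A short sketch of monitoring the out-of-bag error with scikit-learn (dataset and parameters are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True evaluates each tree on the observations left out of its bootstrap sample
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            oob_score=True, random_state=0).fit(X, y)
print("out-of-bag accuracy:", rf.oob_score_)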
Advantages
• Random forest is considered a highly accurate and robust method
because of the number of decision trees participating in the process.
• It is much less prone to over-fitting, mainly because it averages the
predictions of many de-correlated trees, which reduces variance.
• The algorithm can be used for both classification and regression problems.
• Random forests can also handle missing values. There are two common
ways to do this: using median values to replace missing continuous
variables, and computing a proximity-weighted average of the missing values.
• You can get the relative feature importance, which helps in selecting the
most contributing features for the classifier.
Disadvantages
• Random forest is slow at generating predictions because it has
multiple decision trees: whenever it makes a prediction, all the trees
in the forest have to make a prediction for the same input and then
vote on it. This whole process is time-consuming.
• The model is harder to interpret than a single decision tree, where
you can easily explain a decision by following the path in the tree.
Finding Important Features
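A minimal sketch of extracting the relative feature importances mentioned in the advantages above (using scikit-learn and pandas; the dataset is illustrative):

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(data.data, data.target)

# feature_importances_ holds each feature's relative contribution (values sum to 1)
importances = pd.Series(rf.feature_importances_, index=data.feature_names)
print(importances.sort_values(ascending=False).head(10))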