Ensemble Method

Bias and variance help us in parameter tuning and in deciding which of several candidate models fits the data better.

Broadly, there are two types of reducible errors associated with learning methods:

1. Bias error: Also known as algorithm bias or AI bias, this is the phenomenon that occurs when an
algorithm produces results that are systematically prejudiced due to erroneous assumptions
in the machine learning (ML) process.

Bias is one type of error that occurs due to wrong assumptions about the data, such as assuming
the data is linear when in reality it follows a complex function.

Put simply, bias is the inability of the model to fit the data, which shows up as a systematic
difference (error) between the model’s predicted values and the actual values.

Let Y be the true value of a parameter, and let Y’ be an estimator of Y based on a sample of
data. Then, the bias of the estimator Y’ is given by:

Bias(Y’) = E(Y’) − Y

where E(Y’) is the expected value of the estimator Y’. Bias measures how well the model fits
the data on average.
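
As a rough numerical illustration of this formula, the short sketch below estimates the bias of the uncorrected sample variance as an estimator of the true variance by simulation. The true distribution, sample size, and number of trials are arbitrary choices made only for this example.

# A minimal simulation of Bias(Y') = E(Y') - Y, using the uncorrected sample
# variance as the estimator Y'. The distribution, sample size and number of
# trials are illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
true_variance = 4.0          # Y: the true parameter value
sample_size = 10
n_trials = 100_000

# collect the estimator's value over many repeated samples to approximate E(Y')
estimates = np.array([
    np.var(rng.normal(0, np.sqrt(true_variance), sample_size))  # ddof=0 (biased)
    for _ in range(n_trials)
])

# Bias(Y') = E(Y') - Y; theory gives -sigma^2/n = -0.4 for this estimator
print("estimated bias:", estimates.mean() - true_variance)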

 Low Bias: A low bias value means fewer assumptions are made about the form of the target
function. In this case, the model will closely match the training dataset.
 High Bias: A high bias value means more assumptions are made about the form of the target
function. In this case, the model will not match the training dataset closely.
 A high-bias model cannot capture the trend in the dataset. It is considered an
underfitting model with a high error rate, usually caused by an overly simplified
algorithm.
 For example, a linear regression model may have a high bias if the data has a non-
linear relationship.

Ways to reduce high bias in Machine Learning:

 Use a more complex model: One of the main reasons for high bias is an overly simplified
model that cannot capture the complexity of the data. In such cases, we can make the
model more complex by increasing the number of hidden layers in a deep neural network,
or by using a more expressive model such as polynomial regression for non-linear
datasets, a CNN for image processing, or an RNN for sequence learning (see the sketch
after this list).
 Increase the number of features: Adding more features to the training dataset increases
the complexity of the model and improves its ability to capture the underlying
patterns in the data.
 Reduce regularization of the model: Regularization techniques such as L1 or L2
regularization can help to prevent overfitting and improve the generalization ability of
the model. If the model has high bias, reducing the strength of regularization or
removing it altogether can improve its performance.
 Increase the size of the training data: Increasing the size of the training data can help
to reduce bias by providing the model with more examples to learn from.
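
As a small sketch of the first point above (using a more complex model), the code below compares a plain linear regression with a degree-3 polynomial regression on synthetic non-linear data. The data-generating function and the chosen degree are invented for illustration.

# Minimal sketch: reducing high bias by using a more complex model.
# The synthetic cubic data and the chosen polynomial degree are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(0, 1, 200)   # non-linear target

linear = LinearRegression().fit(X, y)                      # high-bias model
poly = make_pipeline(PolynomialFeatures(degree=3),
                     LinearRegression()).fit(X, y)         # more complex model

print("linear MSE:", mean_squared_error(y, linear.predict(X)))
print("degree-3 MSE:", mean_squared_error(y, poly.predict(X)))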

2. Variance:
Variance is the measure of spread in data from its mean position.

In machine learning, variance is the amount by which the performance of a predictive model
changes when it is trained on different subsets of the training data.

Variance is introduced by high sensitivity to variations in the training data.

More specifically, variance is the variability of the model: how sensitive it is to a different
subset of the training dataset, i.e. how much its fit changes when it is trained on a new subset
of the training data.
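
To make this sensitivity concrete, the sketch below refits the same fully grown decision tree on several bootstrap subsets of a synthetic dataset and measures how much its predictions at fixed test points disagree. The data and the choice of model are arbitrary choices for illustration.

# Minimal sketch: variance as sensitivity to the training subset.
# A deep decision tree is refit on different bootstrap samples and the spread
# of its predictions at fixed test points is measured.
# The synthetic data and choice of model are illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 300)
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)

preds = []
for _ in range(30):                              # 30 different training subsets
    idx = rng.integers(0, len(X), len(X))        # bootstrap sample
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])
    preds.append(tree.predict(X_test))

# a large value means predictions change a lot from subset to subset (high variance)
print("average prediction variance:", np.mean(np.var(preds, axis=0)))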

 Low variance: Low variance means that the model is less sensitive to
changes in the training data and can produce consistent estimates of
the target function with different subsets of data from the
same distribution. When combined with high bias, this is the case of
underfitting, where the model fails to generalize on both training and test data.
 High variance: High variance means that the model is very sensitive to
changes in the training data and can result in significant changes in the
estimate of the target function when trained on different subsets of data
from the same distribution. This is the case of overfitting, when the model
performs well on the training data but poorly on new, unseen test data: it
fits the training data so closely that it fails to generalize.

Ways to Reduce Variance in Machine Learning:

 Cross-validation: By splitting the data into training and testing sets multiple times,
cross-validation can help identify whether a model is overfitting or underfitting and can
be used to tune hyperparameters to reduce variance (see the sketch after this list).
 Feature selection: Choosing only the relevant features decreases the model’s
complexity and can reduce the variance error.
 Regularization: We can use L1 or L2 regularization to reduce variance in machine
learning models.
 Ensemble methods: Ensembles combine multiple models to improve generalization
performance. Bagging, boosting, and stacking are common ensemble methods that
can help reduce variance and improve generalization performance.
 Simplifying the model: Reducing the complexity of the model, such as decreasing
the number of parameters or layers in a neural network, can also help reduce variance
and improve generalization performance.
 Early stopping: Early stopping is a technique used to prevent overfitting by stopping
the training of a deep learning model when the performance on the validation set
stops improving.
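
The sketch below illustrates the cross-validation and regularization points from the list above: 5-fold cross-validation is used to compare an unregularized linear regression against an L2-regularized (ridge) model on synthetic data with many features and few samples. The dataset and the alpha value are arbitrary choices for illustration.

# Minimal sketch: cross-validation to compare an unregularized model with an
# L2-regularized one. The synthetic data and alpha value are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))                 # few samples, many features
y = X[:, 0] + rng.normal(0, 0.5, 60)

for name, model in [("unregularized", LinearRegression()),
                    ("ridge (L2)", Ridge(alpha=10.0))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(name, "mean CV r2:", round(scores.mean(), 3))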

There can be four combinations between bias and variance.

 High Bias, Low Variance: A model with high bias and low variance is said to be
underfitting.

 High Variance, Low Bias: A model with high variance and low bias is said to be
overfitting.

 High-Bias, High-Variance: A model has both high bias and high variance, which
means that the model is not able to capture the underlying patterns in the data (high
bias) and is also too sensitive to changes in the training data (high variance). As a
result, the model will produce inconsistent and inaccurate predictions on
average.

 Low Bias, Low Variance: A model that has low bias and low variance is able to
capture the underlying patterns in the data (low bias) and is not too
sensitive to changes in the training data (low variance). This is the ideal scenario for
a machine learning model, as it is able to generalize well to new, unseen data and
produce consistent and accurate predictions. In practice, however, achieving both
perfectly is rarely possible.

Such a model is just complex enough to capture the complexity of the
data, but not so complex that it overfits the training data. This can happen when the
model has been carefully tuned to achieve a good balance between bias and
variance, by adjusting the hyperparameters and selecting an appropriate model
architecture.

Bias Variance Tradeoff

If the algorithm is too simple (a hypothesis with a linear equation), it may end up in a
high-bias, low-variance condition and thus be error-prone.
If the algorithm fits too complex a hypothesis (a high-degree equation), it may end up in a
high-variance, low-bias condition.

In the latter condition, predictions on new entries will not perform well. There is a middle
ground between these two conditions, known as the trade-off, or bias-variance trade-off.

An algorithm cannot be more complex and less complex at the same time; this tradeoff in
complexity is why there is a tradeoff between bias and variance. On a plot of error against
model complexity, the ideal point lies between the two extremes, where the combined error
from bias and variance is lowest.
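
The sketch below traces this tradeoff empirically: polynomial models of increasing degree are fitted to synthetic data, and training error keeps falling while validation error first falls (bias shrinking) and then rises again (variance growing). The data and the degrees tried are invented for illustration.

# Minimal sketch of the bias-variance tradeoff via model complexity.
# The synthetic data and the polynomial degrees tried are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 80)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 3, 12):
    model = make_pipeline(PolynomialFeatures(degree),
                          LinearRegression()).fit(X_tr, y_tr)
    print("degree", degree,
          "train MSE:", round(mean_squared_error(y_tr, model.predict(X_tr)), 3),
          "val MSE:", round(mean_squared_error(y_val, model.predict(X_val)), 3))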
Ensemble model

The basic idea is to learn a set of classifiers (experts) and to allow them to vote.
Advantage: Improvement in predictive accuracy.
Disadvantage: It is difficult to understand an ensemble of classifiers.

Ensembles overcome three problems –


 Statistical Problem –
The Statistical Problem arises when the hypothesis space is too large for
the amount of available data. Hence, there are many hypotheses with the
same accuracy on the data and the learning algorithm chooses only one
of them! There is a risk that the accuracy of the chosen hypothesis is low
on unseen data!
 Computational Problem –
The Computational Problem arises when the learning algorithm cannot
guarantee finding the best hypothesis.
 Representational Problem –
The Representational Problem arises when the hypothesis space does
not contain any good approximation of the target class(es).

Main Challenge for Developing Ensemble Models?


The main challenge is not to obtain highly accurate base models, but rather to obtain base
models which make different kinds of errors. For example, if ensembles are used for
classification, high accuracies can be accomplished if different base models misclassify
different training examples, even if the base classifier accuracy is low.
Types of Ensemble Classifier –
Bagging:
Bagging (Bootstrap Aggregation) is used to reduce the variance of a decision tree. Given a
set D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement
from D (i.e., a bootstrap sample). A classifier model Mi is then learned from each training set Di.
Each classifier Mi returns its class prediction. The bagged classifier M* counts the votes and
assigns the class with the most votes to X (an unknown sample).
Implementation steps of Bagging (a minimal code sketch follows these steps) –
1. Multiple subsets are created from the original data set with an equal number of tuples,
selecting observations with replacement.
2. A base model is created on each of these subsets.
3. Each model is learned in parallel from its training set, independently of the others.
4. The final predictions are determined by combining the predictions from all the models.
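
These steps correspond closely to scikit-learn's BaggingClassifier, so a minimal sketch using it on a synthetic dataset is shown below; the dataset and parameter values are arbitrary choices for illustration.

# Minimal sketch of bagging: decision trees are fit on bootstrap samples of the
# training data and their votes are aggregated.
# The synthetic dataset and parameter values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                           bootstrap=True, random_state=0).fit(X_train, y_train)

print("single tree accuracy:", accuracy_score(y_test, single_tree.predict(X_test)))
print("bagged trees accuracy:", accuracy_score(y_test, bagged.predict(X_test)))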

Random Forest:

Random Forest is an extension over bagging. Each classifier in the ensemble is a decision
tree classifier and is generated using a random selection of attributes at each node to
determine the split. During classification, each tree votes and the most popular class is
returned.
Implementation steps of Random Forest (a minimal code sketch follows these steps) –
1. Multiple subsets are created from the original data set, selecting observations with
replacement.
2. A subset of features is selected randomly at each node, and whichever feature gives the
best split is used to split the node; this is repeated iteratively.
3. Each tree is grown to its largest extent.
4. Repeat the above steps, and the final prediction is given by aggregating the predictions
from the n trees.
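
A minimal scikit-learn sketch of these steps is shown below, using RandomForestClassifier on a synthetic dataset; the dataset and parameter values are arbitrary choices for illustration.

# Minimal sketch of a random forest: bagged decision trees where each split
# considers only a random subset of the features (max_features).
# The synthetic dataset and parameter values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X_train, y_train)

# each tree votes; the most popular class is returned by predict()
print("random forest accuracy:", accuracy_score(y_test, forest.predict(X_test)))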

Ensemble methods

1. Averaging method: It is mainly used for regression problems. The method consists of
building multiple models independently and returning the average of the predictions of all the
models. In general, the combined output is better than an individual output because variance
is reduced.

Dataset link: https://archive.ics.uci.edu/ml/datasets/Alcohol+QCM+Sensor+Dataset

# importing utility modules
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# importing machine learning models for prediction
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb
from sklearn.linear_model import LinearRegression

# loading train data set in dataframe from train_data.csv file
df = pd.read_csv("train_data.csv")

# getting target data from the dataframe
target = df["target"]

# getting train data from the dataframe (drop the target column)
train = df.drop("target", axis=1)

# splitting train data into training and validation datasets
X_train, X_test, y_train, y_test = train_test_split(
    train, target, test_size=0.20)

# initializing all the model objects with default parameters
model_1 = LinearRegression()
model_2 = xgb.XGBRegressor()
model_3 = RandomForestRegressor()

# training all the models on the training dataset
model_1.fit(X_train, y_train)
model_2.fit(X_train, y_train)
model_3.fit(X_train, y_train)

# predicting the output on the validation dataset
pred_1 = model_1.predict(X_test)
pred_2 = model_2.predict(X_test)
pred_3 = model_3.predict(X_test)

# final prediction after averaging the predictions of all 3 models
pred_final = (pred_1 + pred_2 + pred_3) / 3.0

# printing the mean squared error between real and predicted values
print(mean_squared_error(y_test, pred_final))

2. Max voting: It is mainly used for classification problems. The method consists of building
multiple models independently and getting their individual outputs, called ‘votes’. The class
with the maximum votes is returned as the output.

# importing utility modules
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# importing machine learning models for prediction
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression

# importing voting classifier
from sklearn.ensemble import VotingClassifier

# loading train data set in dataframe from train_data.csv file
df = pd.read_csv("train_data.csv")

# getting target data from the dataframe
target = df["Weekday"]

# getting train data from the dataframe (drop the target column)
train = df.drop("Weekday", axis=1)

# splitting train data into training and validation datasets
X_train, X_test, y_train, y_test = train_test_split(
    train, target, test_size=0.20)

# initializing all the model objects with default parameters
model_1 = LogisticRegression()
model_2 = XGBClassifier()
model_3 = RandomForestClassifier()

# making the final model using the voting classifier (hard / majority voting)
final_model = VotingClassifier(
    estimators=[('lr', model_1), ('xgb', model_2), ('rf', model_3)], voting='hard')

# training the combined model on the train dataset
final_model.fit(X_train, y_train)

# predicting the output on the test dataset
pred_final = final_model.predict(X_test)

# printing the accuracy between actual and predicted values
# (hard voting returns class labels, so accuracy is used here rather than
#  log loss, which expects predicted probabilities)
print(accuracy_score(y_test, pred_final))
3. Bagging: Random Forest (also known as the bootstrapping method)
4. Boosting: Gradient Boost, XGBoost, AdaBoost

Boosting is a sequential method: it aims to prevent a wrong base model from affecting the
final output. Instead of combining the base models independently, the method focuses on
building each new model so that it depends on the previous one; a new model tries to correct
the errors made by its predecessor. Each of these models is called a weak learner. The final
model (aka strong learner) is formed by taking the weighted mean of all the weak learners.
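
In the same style as the earlier snippets, a minimal boosting sketch using scikit-learn's GradientBoostingRegressor is shown below; the synthetic dataset and parameter values are arbitrary choices for illustration.

# Minimal sketch of boosting: trees are built sequentially, each one fitted to
# the errors of the ensemble built so far.
# The synthetic dataset and parameter values are illustrative only.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

booster = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                    max_depth=3, random_state=0).fit(X_train, y_train)

print("boosting test MSE:", mean_squared_error(y_test, booster.predict(X_test)))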

5. Stacking: It is an ensemble method that combines multiple models (classification or
regression) via a meta-model (meta-classifier or meta-regressor).

The base models are trained on the complete dataset, and the meta-model is then trained
on the features returned (as output) by the base models. The base models in stacking are
typically different from one another. The meta-model helps to combine the features from the
base models to achieve the best accuracy.

Stacking differs a bit from the basic ensembling methods because it has first-level and
second-level models. Stacking features are first extracted by training all the first-level
models on the dataset. A second-level model is then trained on the train stacking features,
and this model predicts the final output from the test stacking features.

Algorithm (a minimal scikit-learn sketch follows these steps):
1. Split the train dataset into n parts.
2. A base model (say linear regression) is fitted on n-1 parts and predictions
are made for the nth part. This is done for each of the n parts of the
train set.
3. The base model is then fitted on the whole train dataset.
4. This model is used to predict the test dataset.
5. Steps 2 to 4 are repeated for another base model, which results in
another set of predictions for the train and test datasets.
6. The predictions on the train dataset are used as features to build the new
model.
7. This final model is used to make the predictions on the test dataset.
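
scikit-learn packages this procedure (including the out-of-fold predictions of steps 1 and 2) as StackingRegressor/StackingClassifier. The sketch below uses StackingRegressor on a synthetic dataset; the data and the choice of base models are arbitrary choices for illustration.

# Minimal sketch of stacking: out-of-fold predictions from the base models are
# used as features for a meta-model (here a linear regression).
# The synthetic dataset and the choice of base models are illustrative only.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

stack = StackingRegressor(
    estimators=[("ridge", Ridge()),
                ("rf", RandomForestRegressor(random_state=0))],
    final_estimator=LinearRegression(),   # meta-model trained on base-model outputs
    cv=5,                                 # out-of-fold predictions, as in steps 1-2
).fit(X_train, y_train)

print("stacking test MSE:", mean_squared_error(y_test, stack.predict(X_test)))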

6. Blending: It is similar to the stacking method explained above, but rather than using the
whole dataset for training the base models, a separate validation dataset is held out to make
predictions.
Algorithm (a minimal hand-rolled sketch follows these steps):
1. Split the dataset into train, validation, and test sets.
2. Fit all the base models on the train dataset.
3. Make predictions on the validation and test datasets.
4. The validation predictions are used as features to build a second-level model.
5. This second-level model is used to make the final predictions on the test set from its
meta-features.
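
There is no single built-in blending estimator in scikit-learn, so the sketch below codes the steps by hand: base models are fitted on the train split, their predictions on a held-out validation split become the features of a second-level model, and that model is evaluated on the test split. The synthetic dataset, the base models, and the split sizes are arbitrary choices for illustration.

# Minimal hand-rolled blending sketch.
# Base models are trained on the train split; their predictions on a held-out
# validation split become features for a second-level model, which is then
# evaluated on the test split. Dataset, models and split sizes are illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5,
                                                random_state=0)

base_models = [Ridge(), RandomForestRegressor(random_state=0)]
for model in base_models:
    model.fit(X_train, y_train)                        # step 2: fit on the train split

# steps 3-4: base-model predictions on validation/test become meta-features
val_meta = np.column_stack([m.predict(X_val) for m in base_models])
test_meta = np.column_stack([m.predict(X_test) for m in base_models])

blender = LinearRegression().fit(val_meta, y_val)      # second-level model
print("blending test MSE:", mean_squared_error(y_test, blender.predict(test_meta)))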
