
UNIT-IV

Ensemble Learning and Random Forest: Introduction to Ensemble Learning, Basic Ensemble Techniques (Max Voting,
Averaging, Weighted Average), Voting Classifiers, Bagging and Pasting, Out-of-Bag Evaluation, Random Patches and
Random Subspaces, Random Forests (Extra-Trees, Feature Importance), Boosting (AdaBoost, Gradient Boosting),
Stacking.

# Introduction to Ensemble Learning


Ensemble learning is a machine learning technique that combines multiple individual models to solve a problem and
improve the overall performance compared to a single model. The basic idea is that by combining multiple models,
the weaknesses of one model can be compensated by the strengths of others. This approach typically leads to more
accurate and robust predictions.
Key Concepts of Ensemble Learning:
1. Diversity: For an ensemble method to be effective, the models combined in the ensemble must be diverse,
meaning they make different kinds of errors. This diversity allows the ensemble to correct individual model
errors, improving the overall prediction.
2. Combining Models: The results from multiple models are combined to form a final prediction. The
combination methods can include:
o Averaging: For regression tasks, the predictions of individual models are averaged.
o Voting: For classification tasks, the class with the majority of votes is chosen.
3. Improved Accuracy: The ensemble model usually performs better than individual models by reducing
variance (overfitting) and bias (underfitting).
Types of Ensemble Learning Methods:
1. Bagging (Bootstrap Aggregating):
o In bagging, multiple models (usually the same type) are trained on different subsets of the training
data, often generated by bootstrapping (sampling with replacement).
o Each model votes or averages its predictions, and the final result is based on this aggregated output.
o Example: Random Forest is a well-known bagging algorithm, where multiple decision trees are
trained independently, and their predictions are averaged or voted on.
2. Boosting:
o Boosting focuses on combining weak learners (models that perform slightly better than random
guessing) in a sequential manner.
o Each new model is trained to correct the errors made by previous models, giving more weight to the
misclassified data points.
o The final prediction is a weighted combination of the individual model predictions.
o Example: AdaBoost, Gradient Boosting, and XGBoost are popular boosting algorithms.
3. Stacking:
o Stacking (or stacked generalization) involves training multiple models and using another model,
known as a meta-model or blender, to combine the predictions of the base models.
o Unlike bagging and boosting, where models are combined directly, stacking uses the predictions of
the base models as input features for the meta-model.
o This approach allows for a more sophisticated combination of models.
Advantages of Ensemble Learning:
 Improved Accuracy: By combining several models, ensemble learning often leads to better performance,
especially in complex problems.
 Reduced Overfitting: It can help reduce overfitting (variance) by averaging out individual model errors.
 Robustness: Ensemble methods are more robust and less likely to perform poorly compared to a single
model.
Disadvantages of Ensemble Learning:
 Increased Complexity: Ensembles can be computationally expensive, requiring more time and resources for
training and prediction.
 Interpretability: Ensemble methods, especially those like boosting and stacking, can be harder to interpret
because they combine multiple models.
Common Applications:
 Classification: Ensemble methods are widely used in classification tasks, such as spam detection, image
recognition, and sentiment analysis.
 Regression: For regression tasks like predicting house prices or stock prices, ensemble methods can improve
the accuracy of predictions.
 Anomaly Detection: Ensemble learning can also be applied in anomaly detection to identify unusual patterns
in data.
Conclusion:
Ensemble learning is a powerful technique that enhances the predictive performance by combining multiple models.
By leveraging the strengths and compensating for the weaknesses of individual models, ensemble methods like
bagging, boosting, and stacking can help address various machine learning challenges, from overfitting to increasing
accuracy and robustness.

# Basic Ensemble Techniques


Ensemble learning combines the predictions of multiple models to improve performance. The key to making an
ensemble work effectively is how the individual model outputs are combined. Below are three common techniques
used for combining predictions: Max Voting, Averaging, and Weighted Average.
1. Max Voting (Majority Voting)
 Definition: Max voting is typically used in classification problems. In this approach, each individual model in
the ensemble casts a vote for a class, and the class that receives the most votes is chosen as the final
prediction.
 How it works:
o Suppose you have an ensemble of N models.
o Each model outputs a class label for a given data point.
o The class that appears most frequently across all the models is the predicted class for that data point.
 Example:
o Model 1: Class 0
o Model 2: Class 1
o Model 3: Class 0
o Model 4: Class 0
o Model 5: Class 1
o Max Voting Result: Class 0 wins because it has the most votes (3 votes for Class 0 vs. 2 votes for Class
1).
 Use Case: This technique is commonly used in bagging methods like Random Forest, where each decision
tree in the forest "votes" for a class.
2. Averaging
 Definition: Averaging is used in regression tasks. In this technique, the outputs of all the models in the
ensemble are averaged to make the final prediction. This helps to smooth out errors made by individual
models, leading to more stable predictions.
 How it works:
o Suppose you have an ensemble of N models, and each model predicts a numeric value for the
given data point.
o The final prediction is the average of these individual predictions.
 Example:
o Model 1: 10
o Model 2: 12
o Model 3: 11
o Model 4: 9
o Model 5: 10
o Averaging Result: The final prediction is the average of all the predictions: (10 + 12 + 11 + 9 + 10) / 5 = 52 / 5 = 10.4
 Use Case: Averaging is commonly used in bagging methods like Random Forest Regression, where multiple
regression trees provide their predictions, and the final prediction is the average of all.
3. Weighted Average
 Definition: The weighted average method is a more refined version of the averaging technique. In this
method, each model's prediction is given a different weight, and the final prediction is the weighted average
of all model predictions. The weight can be based on the model's performance or confidence in the
prediction.
 How it works:
o Each model's prediction is multiplied by a weight that reflects the model's accuracy or reliability.
o The final prediction is calculated as the sum of all the weighted predictions divided by the total sum
of the weights.
 Example:
o Suppose you have the following models and their predictions:
 Model 1: Prediction = 10, Weight = 0.4
 Model 2: Prediction = 12, Weight = 0.3
 Model 3: Prediction = 11, Weight = 0.2
 Model 4: Prediction = 9, Weight = 0.1
o The weighted average prediction is: (0.4 × 10 + 0.3 × 12 + 0.2 × 11 + 0.1 × 9) / (0.4 + 0.3 + 0.2 + 0.1) = 10.7 / 1.0 = 10.7
 Use Case: Weighted averaging can be used when models have different performances or when certain
models are known to be more reliable for certain types of data. For instance, models with lower error rates
might be assigned higher weights.
| Technique | Type of Problem | How it Works | Example |
|---|---|---|---|
| Max Voting | Classification | Each model votes for a class, and the class with the most votes is the final prediction. | Random Forest (classification) |
| Averaging | Regression | The final prediction is the average of all model predictions. | Random Forest Regression |
| Weighted Average | Regression/Classification | Predictions are weighted by the model's performance, and the final prediction is a weighted average. | Used in more sophisticated ensembles like boosting |
Summary:
These ensemble techniques—max voting, averaging, and weighted averaging—are foundational methods for
combining predictions from multiple models, and they serve as the core of more advanced ensemble methods like
Random Forests (bagging), Boosting algorithms (like AdaBoost or Gradient Boosting), and Stacking.
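As an illustration of how these three combination rules behave, here is a minimal Python sketch using the made-up model outputs from the examples above. No real models are trained; it only assumes NumPy is available.
python
from collections import Counter
import numpy as np

# Max voting: class labels predicted by five hypothetical classifiers for one data point
class_votes = [0, 1, 0, 0, 1]
print(Counter(class_votes).most_common(1)[0][0])             # 0 -> Class 0 wins with 3 of 5 votes

# Averaging: numeric predictions from five hypothetical regression models
predictions = np.array([10, 12, 11, 9, 10])
print(predictions.mean())                                    # 10.4

# Weighted average: each prediction is weighted by the model's assumed reliability
weighted_predictions = np.array([10, 12, 11, 9])
weights = np.array([0.4, 0.3, 0.2, 0.1])
print(np.average(weighted_predictions, weights=weights))     # 10.7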

# Voting Classifiers
A Voting Classifier is a type of ensemble learning method that combines multiple machine learning models (also
known as "base models" or "weak learners") to make a final prediction based on the majority voting principle. It is
primarily used for classification tasks, where the goal is to combine the predictions of multiple classifiers to increase
accuracy and robustness.
Key Concept of Voting Classifiers:
 Each base model in the ensemble makes a prediction for a given instance.
 Voting occurs to decide the final predicted class:
o The class that receives the most votes from the individual models is chosen as the final prediction.
Voting classifiers are particularly useful when combining different models with complementary strengths, which
leads to improved overall performance.
Types of Voting Classifiers:
1. Hard Voting (Majority Voting):
o Definition: In hard voting, each classifier in the ensemble casts a vote for a class, and the class with
the most votes is chosen as the final prediction. If there is a tie (e.g., two classes have the same
number of votes), a predefined tie-breaking rule may be used.
o How it works:
 Suppose you have an ensemble of N classifiers.
 Each classifier assigns a predicted class label to the input data point.
 The final prediction is the class label that appears most frequently across all classifiers.
o Example:
 Model 1: Class 0
 Model 2: Class 1
 Model 3: Class 0
 Model 4: Class 0
 Model 5: Class 1
 Hard Voting Result: Class 0 is chosen because it has the most votes (3 votes for Class 0 vs. 2 votes for Class 1).
o Use Case: This approach is simple, and often used when combining multiple classifiers like decision
trees, logistic regression, or support vector machines (SVMs) in an ensemble.
2. Soft Voting:
o Definition: Soft voting is a more advanced version of voting. Instead of using hard class labels, soft
voting relies on the predicted probabilities (class probabilities) for each class and takes a weighted
average of these probabilities to make the final prediction.
o How it works:
 For each classifier, the predicted probability of each class is computed.
 The predicted probabilities of each class are averaged (or summed, depending on the
method) across all classifiers.
 The final predicted class is the one with the highest average (or summed) probability.
o Example:
 Model 1: Probability for Class 0 = 0.6, Probability for Class 1 = 0.4
 Model 2: Probability for Class 0 = 0.7, Probability for Class 1 = 0.3
 Model 3: Probability for Class 0 = 0.5, Probability for Class 1 = 0.5
 Soft Voting Result:
 Average probability for Class 0 = (0.6 + 0.7 + 0.5) / 3 = 0.60
 Average probability for Class 1 = (0.4 + 0.3 + 0.5) / 3 = 0.40
 Final prediction: Class 0 (since it has the higher average probability).

o Use Case: Soft voting is more effective than hard voting when the models in the ensemble produce
well-calibrated class probabilities. This is often the case with models like Logistic Regression or Naive
Bayes, which provide probabilities as output.
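To make the probability-averaging step concrete, the numbers from the worked example above can be combined by hand. The short NumPy sketch below is only an illustration of the arithmetic; VotingClassifier performs this averaging internally.
python
import numpy as np

# Rows = the three models in the example above, columns = [P(Class 0), P(Class 1)]
probabilities = np.array([[0.6, 0.4],
                          [0.7, 0.3],
                          [0.5, 0.5]])

average_probabilities = probabilities.mean(axis=0)           # [0.6, 0.4]
print(average_probabilities)
print("Predicted class:", average_probabilities.argmax())    # 0, i.e. Class 0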
Example of Using a Voting Classifier in Python:
Here’s an example of how to implement a Voting Classifier using scikit-learn in Python, with both hard voting and
soft voting.
python
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load a dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define base classifiers
clf1 = LogisticRegression(max_iter=200)
clf2 = DecisionTreeClassifier(random_state=42)
clf3 = SVC(probability=True, random_state=42)

# Create a Voting Classifier (hard voting)
voting_clf_hard = VotingClassifier(estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)], voting='hard')

# Create a Voting Classifier (soft voting)
voting_clf_soft = VotingClassifier(estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)], voting='soft')

# Train the hard voting classifier
voting_clf_hard.fit(X_train, y_train)

# Train the soft voting classifier
voting_clf_soft.fit(X_train, y_train)

# Evaluate the models
print("Hard Voting Classifier Accuracy: ", voting_clf_hard.score(X_test, y_test))
print("Soft Voting Classifier Accuracy: ", voting_clf_soft.score(X_test, y_test))

Advantages of Voting Classifiers:


 Improved Accuracy: By combining multiple models, a voting classifier often outperforms individual
classifiers.
 Robustness: Voting classifiers are more robust to errors because they combine different perspectives (from
different models) to make a final decision.
 Versatility: Voting classifiers can combine a variety of different model types (e.g., decision trees, support
vector machines, logistic regression), leveraging their unique strengths.
Disadvantages of Voting Classifiers:
 Complexity: The ensemble models are often more computationally expensive to train and use, especially
with a large number of classifiers.
 Diminished Returns: If the base models are very similar or have high bias/variance, the ensemble’s
performance might not significantly improve compared to the individual models.
 Interpretability: With multiple models combined in an ensemble, it may be harder to interpret the decision-
making process compared to using a single model.
Use Cases:
 Ensemble for Classification Problems: Voting classifiers are effective in combining different classifiers (e.g.,
decision trees, logistic regression, SVMs) for tasks like sentiment analysis, image classification, and spam
detection.
 Improved Prediction Stability: In cases where individual models might have high variance (e.g., decision
trees), a voting classifier can reduce the variance and increase stability.
Conclusion:
A Voting Classifier is an effective and simple ensemble method that improves classification performance by
combining multiple models. Hard voting works by taking the majority class prediction, while soft voting averages the
predicted probabilities to make a more refined decision. By using these techniques, voting classifiers can produce
more accurate, robust, and stable predictions than individual classifiers, especially in complex or noisy datasets.

# Bagging and Pasting in Machine Learning


Bagging and Pasting are two ensemble learning techniques that aim to improve the performance of machine
learning models by combining multiple base models trained on different subsets of data. Both methods are forms of
Bootstrap Aggregating (Bagging) but differ in how the training data subsets are generated.
Let’s explore both methods in more detail.

1. Bagging (Bootstrap Aggregating)


Bagging, short for Bootstrap Aggregating, is an ensemble learning method that helps to reduce variance and
improve the performance of machine learning models, particularly those that tend to overfit, such as decision trees.
Key Concept of Bagging:
 Data Sampling: In bagging, multiple models are trained on random subsets of the original training data.
These subsets are generated by bootstrapping, which means sampling the data with replacement. This
allows some data points to be repeated in the same subset, while others may not appear at all.
 Combining Predictions: Once the individual models are trained, their predictions are combined to produce a
final output:
o Classification: For classification tasks, the most common method is majority voting (for each data
point, the class predicted by the majority of models is selected).
o Regression: For regression tasks, the predictions are usually averaged to get the final result.
How Bagging Works:
1. Data Subsets: From the original training dataset, N bootstrap samples (random subsets of data with
replacement) are drawn.
2. Train Models: Each bootstrap sample is used to train an individual model, such as a decision tree, on the
data.
3. Combine Results:
o For Classification: Each model "votes" on the predicted class for a given input, and the class with the
majority of votes is chosen.
o For Regression: The predictions of all models are averaged to obtain the final prediction.
Example: Random Forest
 Random Forest is one of the most popular algorithms based on bagging. It builds a collection of decision
trees, each trained on a bootstrap sample of the data. The final prediction is made based on majority voting
(for classification) or averaging (for regression).
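The three steps above can be sketched from scratch with scikit-learn decision trees on the Iris dataset. This is only an illustration of the mechanics; in practice BaggingClassifier or RandomForestClassifier handle the sampling and aggregation internally.
python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

n_estimators = 25
rng = np.random.default_rng(42)
trees = []

# Steps 1 and 2: draw bootstrap samples (with replacement) and train one tree per sample
for _ in range(n_estimators):
    idx = rng.choice(len(X_train), size=len(X_train), replace=True)
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Step 3: combine results by majority vote across all trees
all_preds = np.array([tree.predict(X_test) for tree in trees])    # shape (n_estimators, n_test)
majority_vote = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)
print("Manual bagging accuracy:", (majority_vote == y_test).mean())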
Advantages of Bagging:
 Reduces Overfitting: Bagging is particularly effective for high-variance models (e.g., decision trees) as it
reduces variance without increasing bias significantly.
 Improves Accuracy: By combining multiple models, bagging typically results in a more robust model than any
individual model.
 Parallelism: Since each model is trained independently, bagging can be parallelized, making it
computationally efficient.
Disadvantages of Bagging:
 Computationally Expensive: Training multiple models on different data subsets can require significant
computational resources.
 Model Interpretability: When using a large ensemble of complex models (like decision trees), interpreting
the overall decision-making process becomes challenging.

2. Pasting
Pasting is a variation of bagging with a subtle but important difference in how the training subsets are created.
Key Concept of Pasting:
 Data Sampling: Unlike bagging, pasting generates training subsets by sampling without replacement from
the original training data. In other words, no data point can appear more than once in any given training
subset.
 Combining Predictions: After training the individual models on different subsets of the data, the predictions
are combined in the same way as in bagging: using majority voting for classification or averaging for
regression.
How Pasting Works:
1. Data Subsets: From the original dataset, N samples are drawn without replacement to create each
training subset.
2. Train Models: Each subset is used to train an individual model, just like in bagging.
3. Combine Results:
o For Classification: Each model casts a vote for a predicted class, and the majority vote is taken as the
final prediction.
o For Regression: The predictions from all models are averaged.
Key Difference Between Bagging and Pasting:
 Bagging samples data with replacement, meaning a data point can appear multiple times in the same subset.
 Pasting samples data without replacement, meaning each subset contains unique data points and no data
point appears more than once.
Example:
Imagine a dataset with 10 data points:
 In Bagging: A training subset could contain, for example, data points {1, 2, 2, 4, 5, 7, 8, 8, 9, 10} (with
repetitions).
 In Pasting: A training subset would contain {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} (with no repetitions).
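The difference between the two sampling schemes can be seen directly with NumPy. The sketch below uses an index set of 10 points mirroring the example above; the subset sizes are arbitrary choices for illustration.
python
import numpy as np

rng = np.random.default_rng(0)
data_points = np.arange(1, 11)                       # the 10 data points {1, ..., 10}

# Bagging-style subset: drawn with replacement, so duplicates can occur
bagging_subset = rng.choice(data_points, size=10, replace=True)

# Pasting-style subset: drawn without replacement, so every selected point is unique
pasting_subset = rng.choice(data_points, size=7, replace=False)

print(sorted(bagging_subset))    # duplicates are likely
print(sorted(pasting_subset))    # 7 distinct points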
Advantages of Pasting:
 No Redundancy: Since there are no repeated data points in each training subset, pasting might be more
efficient in utilizing the available training data.
 Good for High-Variance Models: Like bagging, pasting helps reduce overfitting by combining multiple
models.
Disadvantages of Pasting:
 Limited Diversity: Since no data points are repeated in each subset, the subsets might have more in common
with each other than in bagging. This could lead to a reduced level of model diversity, which can impact
performance.
 Computational Cost: Like bagging, pasting requires training multiple models, which can be computationally
expensive.

Comparison of Bagging and Pasting:


| Aspect | Bagging | Pasting |
|---|---|---|
| Sampling Method | Sampling with replacement | Sampling without replacement |
| Training Data Subsets | Some data points may appear multiple times in each subset | Each data point appears at most once in a subset |
| Model Diversity | Higher model diversity due to repeated data points in subsets | Lower diversity compared to bagging due to unique data points in each subset |
| Computational Efficiency | Can be parallelized effectively | Can be parallelized, but subsets are typically smaller |
| Use Case | Best for reducing high variance in models prone to overfitting (e.g., decision trees) | Suitable when the dataset is large and data points should not be reused |

Example of Bagging and Pasting in Python (with scikit-learn):


Both bagging and pasting can be implemented using the BaggingClassifier in scikit-learn. Although the class is named
BaggingClassifier, the only difference between bagging and pasting here is the sampling method, controlled via the
bootstrap parameter: bootstrap=True (the default) samples with replacement (bagging), while bootstrap=False samples
without replacement (pasting). The max_samples parameter sets how large each training subset is.
python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Bagging (Bootstrap Aggregating) using a decision tree classifier
# bootstrap=True is the default, so each tree is trained on a sample drawn with replacement
bagging_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
bagging_clf.fit(X_train, y_train)
print(f"Bagging Accuracy: {bagging_clf.score(X_test, y_test)}")

# Pasting using a decision tree classifier
# bootstrap=False samples without replacement, so no data point is repeated within a subset;
# max_samples is set below 1.0 so the subsets differ from one another
pasting_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, max_samples=0.8,
                                bootstrap=False, random_state=42)
pasting_clf.fit(X_train, y_train)
print(f"Pasting Accuracy: {pasting_clf.score(X_test, y_test)}")
Conclusion:
 Bagging and Pasting are ensemble methods that improve the performance of machine learning models by
reducing overfitting and increasing robustness.
 Bagging uses bootstrap sampling (sampling with replacement), while Pasting uses sampling without
replacement.
 Both methods work by training multiple models on different subsets of data and combining their predictions,
but bagging tends to have more model diversity due to repeated samples in the training sets, while pasting
ensures no redundancy in the data subsets.

# Out-of-Bag (OOB) Evaluation

Out-of-Bag (OOB) Evaluation is a technique used to estimate the performance of an ensemble model, particularly in
methods like Bagging (Bootstrap Aggregating) and Random Forests, without needing a separate validation set or
cross-validation. It leverages the inherent structure of bootstrapping to evaluate model performance on data that
wasn't used in training individual models.
Key Concept of Out-of-Bag Evaluation:
In bagging, each model is trained on a bootstrap sample—a random subset of the training data with replacement.
Since each subset can have repeated data points, some data points are left out of the training set for each model.
These left-out data points are called Out-of-Bag (OOB) samples.
Out-of-Bag Evaluation uses these OOB samples to estimate the performance of the model. The main idea is:
 For each data point in the training set, you can track how often it is left out of the bootstrap samples during
model training.
 When predicting the class (in classification) or the value (in regression) for a data point, only the models that
did not see that point during training are used.
 This allows for a validation-like process without needing to reserve a separate validation set.
How Out-of-Bag (OOB) Evaluation Works:
1. Bootstrapping: During the training phase, each model in the ensemble is trained on a bootstrap sample
(random subset with replacement). For each model, a portion of the data is left out (OOB samples).
2. OOB Prediction: Each data point has multiple models that did not see it during training (since the data point
was left out of the bootstrap sample). These models are used to predict the outcome for that data point.
3. Performance Estimation: The predictions made by the models that did not use a particular data point are
compared to the actual label (for classification) or value (for regression) of that data point. The performance
(e.g., accuracy, mean squared error) is averaged across all data points.
Example:
Consider a dataset with 1000 samples:
 Each decision tree in the random forest is trained on a bootstrap sample, and each tree leaves out some
samples from the original dataset (because of sampling with replacement).
 For each data point in the dataset, we can see how often it was left out and which trees are available to make
predictions for that data point.
 If a data point was not in a tree’s training set, that tree can be used to predict the class or value of that point.
The final prediction is often the average or majority vote of the predictions from all trees that did not use
that point.
 The OOB error rate is then calculated as the average error of all predictions made using OOB samples.
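The sketch below illustrates this empirically with a BaggingClassifier on the Iris dataset: on average roughly 37% of the training points are out-of-bag for each estimator (since (1 − 1/n)^n approaches 1/e), and those points can be scored manually using the estimators_samples_ attribute. The manual figure is close to, but not necessarily identical to, the built-in oob_score_, which averages predicted probabilities rather than hard votes.
python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
n_samples, n_classes = len(X), len(np.unique(y))

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        oob_score=True, random_state=42).fit(X, y)

# Fraction of the data left out of each bootstrap sample (about 0.37 on average)
oob_fractions = [1 - len(np.unique(idx)) / n_samples for idx in bag.estimators_samples_]
print("Average OOB fraction per estimator:", np.mean(oob_fractions))

# Manual OOB accuracy: each point is predicted only by the trees that never saw it
votes = np.zeros((n_samples, n_classes))
for tree, idx in zip(bag.estimators_, bag.estimators_samples_):
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[idx] = False
    preds = tree.predict(X[oob_mask]).astype(int)
    votes[oob_mask] += np.eye(n_classes)[preds]      # one hard vote per OOB prediction

print("Manual OOB accuracy:", (votes.argmax(axis=1) == y).mean())
print("Built-in oob_score_ :", bag.oob_score_)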
Advantages of Out-of-Bag Evaluation:
 No Need for a Separate Validation Set: OOB evaluation effectively uses the data points that were left out
during the training process to estimate the model's performance, so it eliminates the need for an extra
holdout validation set. This is especially useful when the dataset is small.
 Efficient: Since the models are trained on different subsets of the data, OOB evaluation can be done without
additional data splits, which saves time and computation.
 Accurate Estimate: The OOB estimate can be as accurate as other validation techniques like cross-validation.
In fact, for Random Forests, it is often as good or better because the OOB process inherently tests each
model on data that it has not seen.
 Reduces Overfitting: By using the OOB samples for evaluation, the model's tendency to overfit to the training
data is reduced. The OOB samples act as a kind of pseudo-validation set that helps provide an unbiased
estimate of model performance.
Example of Out-of-Bag Evaluation in Random Forests (with Python):
In scikit-learn, RandomForestClassifier and RandomForestRegressor provide built-in support for OOB evaluation.
Here's an example using RandomForestClassifier.
python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
X, y = load_iris(return_X_y=True)

# Split into train and test sets (no validation set used here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize Random Forest Classifier with OOB evaluation enabled
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)

# Train the model
rf.fit(X_train, y_train)

# Access the Out-of-Bag score (similar to accuracy)
print(f"Out-of-Bag Score: {rf.oob_score_}")

# Evaluate on test data
test_accuracy = rf.score(X_test, y_test)
print(f"Test Accuracy: {test_accuracy}")

Interpretation of the Output:


 The OOB Score provides the accuracy of the Random Forest classifier on the out-of-bag samples (those data
points that were not included in each bootstrap sample). It is used as an estimate of the model's
performance on unseen data.
 The Test Accuracy shows how well the model performs on the test set, which is typically used as a final
evaluation.
Use Case:
 Random Forests: OOB evaluation is particularly useful in Random Forests, where multiple trees are trained
on bootstrapped data samples. Each tree in the forest is evaluated on the data points that were not included
in its own training set. This provides an unbiased estimate of performance without needing to use a separate
validation set.
 Other Ensemble Models: Although OOB evaluation is most commonly used in Random Forests, it can also be
applied to other bagging algorithms, like BaggingClassifier and BaggingRegressor.
Summary of Key Points:
 Out-of-Bag (OOB) Evaluation is an efficient method to estimate the performance of an ensemble model like
Random Forests.
 It works by using data points that were not included in the bootstrap samples during training.
 It is an advantage over using a separate validation set because it makes full use of the training data without
overfitting, saving both time and computation.
 In scikit-learn, training with oob_score=True and then reading the oob_score_ attribute retrieves the OOB score for Random Forest (and Bagging) models.
 OOB evaluation provides a reliable estimate of model performance, often comparable to cross-validation.
