ML U3 Notes

Key Concepts in AdaBoost

Here are the important ideas in AdaBoost simplified:

1. Weak Learners

o These are simple models (like decision stumps) that perform slightly better than random
guessing.

o They are trained in sequence, focusing more on data points that previous models found
hard to classify.

2. Strong Classifier

o This is the final model created by combining the predictions of all weak learners.

o It is powerful and accurate because it uses the collective learning of all the weak
learners.

3. Weighted Voting

o Each weak learner gets a weight based on how well it performs.

o More accurate models have a bigger influence on the final prediction.

4. Error Rate

o Measures how many mistakes a weak learner makes.

o Models with fewer errors get higher weights in the ensemble.

5. Iterations

o AdaBoost trains weak learners in multiple rounds (iterations).

o The number of iterations is a key setting; too many can lead to overfitting.
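
In the classic binary AdaBoost, a weak learner with weighted error rate e_t receives voting weight a_t = 1/2 * ln((1 - e_t) / e_t), so a lower error rate directly translates into a larger vote. A minimal sketch of these ideas with scikit-learn, using decision stumps as the weak learners, is shown below; the synthetic dataset and parameter values are illustrative assumptions, and depending on the scikit-learn version the base learner is passed as estimator or base_estimator.

```python
# Illustrative sketch: AdaBoost with decision stumps as weak learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, split into train and test sets (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Weak learner: a decision stump (a tree with a single split).
stump = DecisionTreeClassifier(max_depth=1)

# n_estimators is the number of boosting iterations (weak learners trained in sequence).
model = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=42)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
# Each weak learner's voting weight; more accurate learners get larger weights.
print("First learner weights:", model.estimator_weights_[:5])
```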

Advantages of AdaBoost

Why AdaBoost is useful:

1. Better Accuracy

o Even with simple models, it can significantly improve accuracy by focusing on tough-to-
classify data.

2. Versatile

o Works with many types of base models and can be applied to different problems.

3. Feature Selection

o When trained with simple base learners such as decision stumps, AdaBoost effectively selects an informative feature at each round, reducing the need for manual feature selection.

4. Less Overfitting

o It’s less likely to overfit compared to some other ensemble methods.

Limitations and Challenges

Things to be careful about:

1. Sensitive to Noisy Data

o Noisy data and outliers can mislead AdaBoost because it gives extra weight to
misclassified data points.

2. Computationally Expensive

o Training multiple models takes time, especially for large datasets or many iterations.

3. Overfitting Risk

o Too many iterations can lead to overfitting, especially on small datasets.

4. Complex Tuning

o Choosing the right weak learner and settings (like the number of iterations) can be
tricky.

Summary Table for Quick Memorization

Concept/Advantage/Challenge Key Idea

Weak Learners Simple models trained on hard-to-classify data

Strong Classifier Combines all weak learners for accuracy

Weighted Voting More accurate models get higher influence

Error Rate Measures mistakes; low error = higher weight

Iterations Trains models in multiple rounds

Advantages Improved accuracy, versatility, feature selection

Challenges Sensitive to noise, slow training, overfitting risk

This breakdown makes it easier to recall key points about AdaBoost.

Bagging
• Bagging, an abbreviation for Bootstrap Aggregating, is a machine learning ensemble strategy for
enhancing the reliability and precision of predictive models.
• It entails generating numerous subsets of the training data by employing random sampling with replacement.
• These subsets train multiple base learners, such as decision trees, neural networks, or other
models.

Implementing bagging involves several steps. Here's a general overview:

1. Dataset Preparation: Prepare your dataset, ensuring it's properly cleaned and preprocessed.
Split it into a training set and a test set.

2. Bootstrap Sampling: Randomly sample from the training dataset with replacement to create
multiple bootstrap samples. Each bootstrap sample should typically have the same size as the
original dataset, but some data points may be repeated while others may be omitted.

3. Model Training: Train a base model (e.g., decision tree, neural network, etc.) on each bootstrap
sample. Each model should be trained independently of the others.

4. Prediction Generation: Use each trained model to make predictions on the test dataset.

5. Combining Predictions: Combine the predictions from all the models. You can use majority
voting to determine the final predicted class for classification tasks. For regression tasks, you can
average the predictions.

6. Evaluation: Evaluate the bagging ensemble's performance on the test dataset using appropriate
metrics (e.g., accuracy, F1 score, mean squared error, etc.).

7. Hyperparameter Tuning: If necessary, tune the hyperparameters of the base model(s) or the
bagging ensemble itself using techniques like cross-validation.

8. Deployment: Once you're satisfied with the performance of the bagging ensemble, deploy it to
make predictions on new, unseen data.
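
A hedged sketch of steps 1-6 using scikit-learn's BaggingClassifier; the dataset and hyperparameters are illustrative assumptions, and as above the base learner argument is estimator (base_estimator in older scikit-learn versions).

```python
# Illustrative sketch of the bagging workflow described above.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Dataset preparation: synthetic data split into train and test sets.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 2-3. Bootstrap sampling and model training: BaggingClassifier draws
#      bootstrap samples (with replacement) and fits one tree per sample.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base learner
    n_estimators=25,                     # number of bootstrap samples / models
    bootstrap=True,                      # sample with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)

# 4-5. Prediction generation and combining: predict() majority-votes internally.
y_pred = bagging.predict(X_test)

# 6. Evaluation.
print("Accuracy:", accuracy_score(y_test, y_pred))
```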

Advantages

• Mainly reduces variance by averaging the predictions of independently trained models.

• Reduces the risk of overfitting to the training data.

• Makes the final model more stable: small changes in the training set have less effect on predictions.

Applications

• Random Forest (bagging over decision trees) for classification and regression.

• Any setting with an unstable base learner (e.g., decision trees) where variance reduction improves reliability.

Bagging and Subbagging are similar. The only difference is that Subbagging uses random sampling without replacement, whereas Bagging uses random sampling with replacement.
Differences Between Bagging and Subbagging

Subset Creation
• Bagging: Subsets are created with replacement.
• Subbagging: Subsets are created without replacement.

Sample Size
• Bagging: Each subset can have the same size as the original dataset.
• Subbagging: Subsets are usually smaller than the original dataset.

Data Redundancy
• Bagging: Data points can appear multiple times in a subset.
• Subbagging: Each data point appears at most once in a subset.

Complexity
• Bagging: More computationally intensive due to larger subsets.
• Subbagging: Less computationally intensive.

Overfitting Handling
• Bagging: Better at handling overfitting due to more diverse subsets.
• Subbagging: Less effective at handling overfitting in comparison.

Performance on Noisy Data
• Bagging: Performs better on noisy data due to its robustness.
• Subbagging: May struggle with noisy data.

Best Use Case
• Bagging: Works well with larger datasets and high computational resources.
• Subbagging: Suitable for smaller datasets or when resources are limited.

Examples of Use
• Bagging: Random Forest, Bagging Classifier.
• Subbagging: Simplified ensemble models with reduced data usage.

Summary

Bagging emphasizes diversity by allowing data repetition within subsets.

Subbagging is simpler, faster, and uses smaller, non-repeating subsets.
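
The sampling difference between the two can be shown in a few lines of NumPy; the array of ten indices and the half-size subsample are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
indices = np.arange(n)

# Bagging: sample n indices WITH replacement (duplicates likely, same size as original).
bagging_sample = rng.choice(indices, size=n, replace=True)

# Subbagging: sample a smaller subset WITHOUT replacement (no duplicates).
subbagging_sample = rng.choice(indices, size=n // 2, replace=False)

print("Bagging sample:   ", np.sort(bagging_sample))
print("Subbagging sample:", np.sort(subbagging_sample))
```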

Stumping
• Stumping is a technique where a decision stump (a very simple model) is used as a base learner
in an ensemble learning method like AdaBoost.
• A decision stump is a decision tree with just one split (or decision point).
• It means the model makes decisions based on a single feature.

Purpose of Stumping:

• It simplifies the learning process by focusing on just one feature at a time.


• Stumps are very fast to train because they are extremely simple.

Use in AdaBoost:
• In AdaBoost, many stumps are created sequentially.

• Each stump focuses on the data points that were misclassified by the previous stumps.
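
A short sketch of a single decision stump in scikit-learn (a tree capped at max_depth=1); inspecting the fitted tree confirms it decides using exactly one feature and one threshold. The synthetic data is an illustrative assumption.

```python
# A decision stump is a depth-1 tree: one split on a single feature.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=1)

stump = DecisionTreeClassifier(max_depth=1, random_state=1)
stump.fit(X, y)

# The stump bases its decision on exactly one feature and one threshold.
print("Chosen feature index:", stump.tree_.feature[0])
print("Split threshold:     ", stump.tree_.threshold[0])
```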

Bagging vs Boosting
Differences Between Bagging and Boosting

Type of Ensemble
• Bagging: Parallel ensemble method, where base learners are trained independently.
• Boosting: Sequential ensemble method, where base learners are trained sequentially.

Base Learners
• Bagging: Base learners are typically trained in parallel on different subsets of the data.
• Boosting: Base learners are trained sequentially, with each subsequent learner focusing more on correcting the mistakes of its predecessors.

Weighting of Data
• Bagging: All data points are equally weighted in the training of base learners.
• Boosting: Misclassified data points are given more weight in subsequent iterations to focus on difficult instances.

Reduction of Bias/Variance
• Bagging: Mainly reduces variance by averaging predictions from multiple models.
• Boosting: Mainly reduces bias by focusing on difficult instances and improving the accuracy of subsequent models.

Handling of Outliers
• Bagging: Resilient to outliers due to averaging or voting among multiple models.
• Boosting: More sensitive to outliers, especially in iterations where misclassified instances are given more weight.

Robustness
• Bagging: Generally robust to noisy data and outliers due to averaging of predictions.
• Boosting: May be less robust to outliers, since misclassified instances keep receiving more weight.

Model Training Time
• Bagging: Can be parallelized, allowing for faster training on multi-core systems.
• Boosting: Generally slower than bagging, as base learners are trained sequentially.

Examples
• Bagging: Random Forest is a popular bagging algorithm.
• Boosting: AdaBoost, Gradient Boosting Machines (GBM), and XGBoost are popular boosting algorithms.
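
One practical consequence of the parallel vs. sequential distinction: a bagging ensemble can train its base learners across CPU cores via n_jobs, while AdaBoost must fit them one after another. A small illustrative sketch (the dataset and settings are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Bagging: learners are independent, so training can be parallelized.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(), n_estimators=100, n_jobs=-1, random_state=0
)

# Boosting: each learner depends on its predecessor, so training is sequential.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1), n_estimators=100, random_state=0
)

for name, model in [("Bagging", bagging), ("AdaBoost", boosting)]:
    model.fit(X, y)
    print(name, "training accuracy:", model.score(X, y))
```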

KD Trees
Are KD Trees and KNN the Same?

No, KD Trees and KNN (k-Nearest Neighbors) are not the same, but they are related.

• KNN is an algorithm used for classification or regression, where we find the k-nearest neighbors
of a given data point.

• KD Trees are a data structure used to make finding those neighbors (in KNN) faster, especially in
high-dimensional data.
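
scikit-learn makes this relationship explicit: the KNN estimator can be told to use a KD Tree internally for its neighbor searches. A minimal sketch (the synthetic 3-dimensional data is an illustrative assumption):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(
    n_samples=1000, n_features=3, n_informative=3, n_redundant=0, random_state=0
)

# KNN is the algorithm; the KD Tree is the data structure it uses to find neighbors quickly.
knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
knn.fit(X, y)

print("Predicted class for one query point:", knn.predict(X[:1]))
```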

KD Tree Explained in Simple English

A KD Tree (K-Dimensional Tree) is a binary tree that organizes points in a space with multiple
dimensions (like 2D or 3D) for fast searching of neighbors.

Key Idea:

• Split the data points into smaller regions, where each region focuses on a specific part of the
dataset.

• At each level, split the data based on one dimension (like x, y, or z) and alternate dimensions at
each level.

How KD Tree Works

Building the KD Tree:

1. Start with All Points:

o Begin with a set of points (e.g., locations on a map: (x, y) coordinates).

2. Choose a Splitting Dimension:


o Split the points based on a chosen dimension (e.g., x-coordinate at the first level, y-
coordinate at the second level, etc.).

o Alternate dimensions at each level.

3. Find the Median:

o Sort the points by the chosen dimension and find the median.

o The median becomes the "root" of the current level.

4. Split into Left and Right:

o Points smaller than the median (on the chosen dimension) go to the left subtree.

o Points larger go to the right subtree.

5. Repeat Recursively:

o Continue splitting the remaining points in the same way until all points are in leaf nodes.

Algorithm for KD Tree Construction

1. Input: A set of points and the current depth d.

2. Choose Splitting Dimension:

o Split dimension = d mod k, where k is the total number of dimensions.

3. Find Median:

o Sort points along the splitting dimension and choose the median.

4. Create Node:

o The median becomes the current node.

5. Recursive Calls:

o Build left and right subtrees using points before and after the median.

6. Base Case:

o Stop when no points are left.
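
A compact recursive sketch of this construction in plain Python, using the example points from these notes; the nested-dictionary node layout is an assumption made for illustration.

```python
# Recursive KD Tree construction following the steps above.
def build_kdtree(points, depth=0, k=2):
    """Return a nested dict {point, left, right}, or None when no points remain."""
    if not points:                      # base case: no points left
        return None

    axis = depth % k                    # splitting dimension = depth mod k
    points = sorted(points, key=lambda p: p[axis])
    median = len(points) // 2           # index of the median point

    return {
        "point": points[median],        # median becomes the current node
        "left": build_kdtree(points[:median], depth + 1, k),
        "right": build_kdtree(points[median + 1:], depth + 1, k),
    }

points = [(3, 6), (2, 7), (17, 15), (6, 12), (13, 15), (9, 1), (10, 19)]
tree = build_kdtree(points)
print("Root:", tree["point"])           # (9, 1) for this point set
```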

Example of KD Tree (2D Example)

Points:

(3, 6), (2, 7), (17, 15), (6, 12), (13, 15), (9, 1), (10, 19)
Step-by-Step Construction: (Example in Notes)

1. Depth = 0 (Split by x-coordinate):

o Points sorted by x: (2, 7), (3, 6), (6, 12), (9, 1), (10, 19), (13, 15), (17, 15)

o Median: (9, 1) → Root of the tree.

2. Depth = 1 (Split by y-coordinate):

o Left subtree (points with x < 9): (2, 7), (3, 6), (6, 12)

▪ Median by y: (2, 7) → Root of left subtree.

o Right subtree (points with x > 9): (10, 19), (13, 15), (17, 15)

▪ Median by y: (13, 15) → Root of right subtree.

3. Continue Recursively:

o Repeat the process for each subset, alternating between x and y splits.

Searching in KD Tree (Nearest Neighbor Search)

Goal:

Find the closest point to a given query point.

Steps:

1. Start at the root and compare the query point to the splitting dimension.

2. Move to the left or right subtree based on the query point’s position relative to the current
node.

3. Once you reach a leaf node, calculate the distance to the query point.

4. Backtrack and check the other subtree if necessary (to ensure the closest point isn’t missed).
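
A sketch of these steps as a recursive function, continuing the build_kdtree example above (it reuses the tree built there); the backtracking test in step 4 compares the best distance found so far against the distance from the query to the splitting plane.

```python
import math

def nearest_neighbor(node, query, depth=0, k=2, best=None):
    """Return the tree point closest to the query point, following the steps above."""
    if node is None:
        return best

    point = node["point"]
    if best is None or math.dist(query, point) < math.dist(query, best):
        best = point                                    # update the best candidate

    axis = depth % k
    # Descend into the subtree on the query's side of the split first.
    if query[axis] < point[axis]:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest_neighbor(near, query, depth + 1, k, best)

    # Backtrack: search the other side only if the splitting plane is closer
    # than the best distance so far (otherwise it cannot hide a closer point).
    if abs(query[axis] - point[axis]) < math.dist(query, best):
        best = nearest_neighbor(far, query, depth + 1, k, best)

    return best

print(nearest_neighbor(tree, (5, 10)))   # expected: (6, 12)
```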

Advantages of KD Tree

1. Fast Search: Reduces the number of distance calculations compared to brute-force KNN.

2. Efficient for Low Dimensions: Works well for datasets with a moderate number of dimensions.

3. Supports KNN: KD Trees make KNN searches more efficient.

Limitations of KD Tree
1. Curse of Dimensionality: Performance decreases as dimensions increase.

2. Uneven Splits: If the data isn’t evenly distributed, the tree may become unbalanced.

Example Use Case:

Imagine you have GPS data of cities and want to find the city closest to a given location. Instead of
calculating distances for all cities, a KD Tree organizes the cities for fast nearest neighbor searches.

For example:

• Query: (5, 10)

• The KD Tree quickly identifies (6, 12) as the closest point.
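
The same lookup with a ready-made library implementation, scipy.spatial.KDTree, using the city points from the notes:

```python
from scipy.spatial import KDTree

cities = [(3, 6), (2, 7), (17, 15), (6, 12), (13, 15), (9, 1), (10, 19)]
tree = KDTree(cities)

distance, index = tree.query((5, 10))     # nearest neighbor of the query point
print("Closest point:", cities[index], "at distance", round(distance, 3))
# Closest point: (6, 12) at distance 2.236
```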
