Bayesian Final
🔹 Gradient Boosting:
Imagine you're standing on a hill (the graph of the error) and want to reach the lowest point (the minimum
error). Each boosting step moves you a little further downhill, in the direction in which the error falls fastest.
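That downhill walk can be written as a tiny, purely illustrative gradient-descent loop; the error curve, starting point, and learning rate below are made up for the example and are not from the notes above.

```python
# Purely illustrative: error(w) = (w - 3)**2 has its lowest point at w = 3.
def gradient(w):
    return 2 * (w - 3)                 # slope of the error curve at w

w, learning_rate = 0.0, 0.1            # start somewhere on the hill, pick a step size
for _ in range(50):
    w -= learning_rate * gradient(w)   # take a step downhill, against the gradient
print(round(w, 4))                     # ends up close to 3.0, the bottom of the hill
```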
💡 Key Concepts (XGBoost):
1. Gradient Boosting:
o Builds additive models in a forward stage-wise fashion.
o Fits new models to correct residuals of previous models using the gradient of the
loss function.
2. Regularized Objective:
o Adds L1 (Lasso) and L2 (Ridge) regularization to the loss to avoid overfitting.
\text{Obj} = \sum_i l\bigl(y_i, \hat{y}_i^{(t)}\bigr) + \sum_k \Omega(f_k)
3. Second-Order Approximation:
o Uses both first and second derivatives (Hessian) of the loss function to optimize
trees (unlike traditional GBMs that use only gradients).
4. Tree Pruning:
o Grows each tree to a maximum depth, then prunes back splits with negative
gain (post-pruning), instead of greedily stopping at the first unpromising split.
5. Handling Sparse Data:
o Efficiently manages missing values and sparse data by learning a default
direction at each split (see the sketch after this list).
6. Parallelization:
o Parallel tree construction on a feature-wise basis (column block) to boost training
speed.
7. Out-of-core computation:
o Capable of handling very large datasets that do not fit in memory by using disk.
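As a hedged illustration of how these ideas surface in practice, here is a minimal sketch using the xgboost scikit-learn wrapper; the synthetic data and every parameter value are placeholders, not tuned recommendations.

```python
# Minimal XGBoost sketch (assumes `pip install xgboost scikit-learn`).
import numpy as np
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X[::50, 0] = np.nan                      # missing values follow a learned default direction at splits
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=200,      # number of additive boosting stages
    max_depth=4,           # trees grow to this depth, then negative-gain splits are pruned
    learning_rate=0.1,
    reg_alpha=0.1,         # L1 part of Omega(f_k)
    reg_lambda=1.0,        # L2 part of Omega(f_k)
    n_jobs=-1,             # parallel, column-wise tree construction
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```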
💡 Key Concepts (CatBoost):
1. Ordered Boosting:
o Prevents target leakage by using permutations of the dataset when computing
residuals.
o Avoids overfitting by ensuring that predictions for a row are not based on its own
target.
2. Efficient Categorical Feature Handling:
o Converts categorical values into numbers using target statistics, but in an
ordered and smoothed way to prevent overfitting.
o Avoids one-hot encoding and handles high-cardinality categorical features
efficiently.
3. Symmetric Trees (Oblivious Trees):
o Uses symmetric decision trees, where all nodes at the same depth split on the
same feature.
o Faster inference and highly optimized for CPU/GPU.
4. Minimal Data Preprocessing:
o Can be used without extensive data preprocessing, handling NaNs and
categorical features natively (see the sketch after this list).
5. Robust to Overfitting:
o Due to ordered boosting and regularization methods, it's more stable on small
datasets than LightGBM.
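A minimal CatBoost sketch along the same lines; the tiny dataset is invented and the parameter values are illustrative only. Categorical columns are passed by index via cat_features, so no one-hot encoding is required.

```python
# Minimal CatBoost sketch (assumes `pip install catboost`).
from catboost import CatBoostClassifier, Pool

# Tiny made-up dataset: one categorical column ("city") and one numeric column.
X = [["london", 25], ["paris", 31], ["london", 47], ["berlin", 19],
     ["paris", 52], ["berlin", 33], ["london", 61], ["paris", 28]]
y = [0, 1, 0, 1, 1, 0, 0, 1]

train_pool = Pool(X, y, cat_features=[0])   # column 0 is categorical: ordered target statistics, no one-hot

model = CatBoostClassifier(
    iterations=100,
    depth=4,              # oblivious (symmetric) trees: one split feature per depth level
    learning_rate=0.1,
    verbose=False,
)
model.fit(train_pool)
print(model.predict(Pool([["berlin", 40]], cat_features=[0])))
```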
L1 regularization, also known as Lasso (Least Absolute Shrinkage and Selection Operator),
is a technique used to prevent overfitting and improve model interpretability by shrinking some
model coefficients exactly to zero. This means it effectively performs feature selection by
automatically eliminating less important variables. The penalty term added to the loss function is
the sum of the absolute values of the coefficients. When the regularization strength is
increased, more coefficients are pushed to zero, resulting in a sparse model that uses only a
subset of features. This is particularly useful in high-dimensional datasets where many features
may be irrelevant.
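A small illustrative sketch of that sparsity effect with scikit-learn's Lasso, on synthetic data where only a few features actually matter; the alpha value is arbitrary.

```python
# Lasso (L1) sketch: most coefficients are driven exactly to zero.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

# 50 features, but only 5 influence the target.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5, noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)        # alpha is the L1 regularization strength
lasso.fit(X, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0))   # far fewer than 50
```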
Elastic Net combines the strengths of both Lasso and Ridge by using a mix of L1 and L2
penalties. It adds both the absolute values and the squares of the coefficients to the loss function.
This allows Elastic Net to perform feature selection like Lasso and also stabilize the model like
Ridge. Elastic Net is particularly useful when you have many correlated features, where Lasso
might randomly pick one and ignore the rest. By blending the two approaches, Elastic Net offers
a more balanced regularization that can produce better generalization performance in many real-
world applications.
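A companion sketch with scikit-learn's ElasticNet, where l1_ratio controls the blend between the L1 and L2 penalties; the values are chosen only for illustration.

```python
# Elastic Net sketch: l1_ratio blends the L1 (Lasso) and L2 (Ridge) penalties.
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=50, n_informative=5, noise=5.0, random_state=0)

enet = ElasticNet(alpha=1.0, l1_ratio=0.5)   # 0.5 = equal mix; 1.0 is pure Lasso, 0.0 pure Ridge
enet.fit(X, y)
print("non-zero coefficients:", sum(c != 0 for c in enet.coef_))
```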
Decision Tree
A Decision Tree is a supervised learning model used for both classification and regression tasks.
It works by splitting the data into branches based on feature values to create a tree-like structure.
It is moderately complex and highly interpretable, making it easy to visualize and understand.
Training and prediction are generally fast, but the model is prone to overfitting, especially with
deep trees. It handles non-linear relationships well and does not require feature scaling.
However, it is sensitive to outliers and missing data and performs poorly in those conditions
unless preprocessing is applied. It is suitable for tasks like medical diagnosis or loan approval,
and requires minimal hyperparameter tuning (e.g., tree depth, splitting criteria).
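A minimal scikit-learn sketch, with an arbitrary max_depth as the main overfitting control; the built-in dataset is only a convenient example.

```python
# Decision tree sketch: max_depth limits tree growth to curb overfitting.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, criterion="gini", random_state=0)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))
print(export_text(tree, max_depth=2))   # the fitted tree is directly readable, hence interpretable
```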
Logistic Regression
Logistic Regression is a simple and fast algorithm used primarily for binary classification,
though it can be extended to multi-class problems. Despite its name, it’s a classification model
that outputs probabilities through the sigmoid function, which are then converted to class
labels. It assumes a linear relationship between input features and the log-odds of the outcome.
It is highly interpretable and computationally efficient, making it ideal for large datasets and
real-time systems. However, it does not handle non-linear relationships unless features are
transformed (e.g., with polynomials). It is sensitive to outliers and requires careful handling of
multicollinearity. Common use cases include spam detection and customer churn prediction.
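A short sketch with scikit-learn's LogisticRegression, showing the probability outputs; the scaling step and max_iter are incidental choices, not requirements from the text.

```python
# Logistic regression sketch: predict_proba returns sigmoid-based class probabilities.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(clf.predict_proba(X_test[:3]))   # probabilities per class; predict() thresholds them into labels
```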
Support Vector Machine (SVM)
SVM is a powerful supervised learning algorithm used mainly for classification but also
adaptable to regression (SVR). It works by finding the optimal hyperplane that maximizes the
margin between different classes. It is high in complexity and less interpretable compared to
simpler models. SVM performs well in high-dimensional spaces, especially with the kernel
trick, which allows it to model non-linear decision boundaries. However, it is
computationally expensive and does not scale well to very large datasets. It requires feature
scaling and careful tuning of parameters like the kernel type, regularization parameter (C), and
gamma. SVM is commonly used in applications like image classification and face recognition.
k-Nearest Neighbors (k-NN)
k-NN is a simple, instance-based algorithm used for both classification and regression. It has
no training phase — instead, it makes predictions based on the majority vote or average of the
k-nearest data points in the training set. It is easy to understand but slow at prediction time,
especially on large datasets, because it must compute distances to all training instances. It
handles non-linear patterns well but is very sensitive to irrelevant features and outliers, and it
requires feature scaling. It is best suited for small datasets and applications like
recommendation systems and handwriting recognition. Hyperparameter tuning is important
for choosing the optimal value of k and distance metric.
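A brief scikit-learn sketch; k = 5 and the Euclidean metric are arbitrary starting points that would normally be tuned.

```python
# k-NN sketch: scaling matters because predictions are distance-based; k and the metric are tunable.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5, metric="euclidean"))
knn.fit(X_train, y_train)           # "training" just stores the data
print(knn.score(X_test, y_test))    # each prediction compares against all stored points
```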
Random Forest
Random Forest is an ensemble method that combines many Decision Trees using Bagging
(Bootstrap Aggregation). It is used for both classification and regression, and is highly
accurate and robust due to the averaging (or voting) of multiple trees. This reduces overfitting,
a common problem in individual decision trees. While not as interpretable as a single tree,
Random Forests can still provide insights like feature importance. They handle non-linearity
well, are not sensitive to outliers, and require little to no feature scaling. Random Forest is
scalable, and although training may be slower due to multiple trees, predictions are fairly
efficient. It is commonly used in fields like credit scoring, fraud detection, and stock market
prediction.
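A minimal RandomForestClassifier sketch, including the feature-importance readout mentioned above; the number of trees is an arbitrary choice.

```python
# Random forest sketch: many bagged trees, with feature importances for interpretability.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))
print(rf.feature_importances_[:5])   # impurity-based importance averaged across trees
```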
Linear Regression
Linear Regression is a fundamental model used for predicting continuous values. It assumes a
linear relationship between the input features and the target variable. It is very fast to train
and predict, and offers excellent interpretability through its coefficients. However, it makes
strong assumptions: linearity, independence of errors, constant variance (homoscedasticity),
and normally distributed residuals. It is sensitive to outliers and multicollinearity, and
performs poorly on non-linear data unless transformed. It works best on small to medium-
sized datasets with a clear linear trend. Common use cases include house price prediction,
salary estimation, and sales forecasting. Feature scaling is sometimes necessary, and
overfitting may occur if irrelevant variables are included.
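A short LinearRegression sketch on synthetic data, reading off the fitted coefficients and intercept; the dataset sizes and noise level are illustrative.

```python
# Linear regression sketch: coefficients are directly interpretable as per-feature effects.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LinearRegression()
lr.fit(X_train, y_train)
print(lr.coef_, lr.intercept_)       # slope per feature plus the intercept
print(lr.score(X_test, y_test))      # R^2 on held-out data
```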
Bagging
Bagging involves training multiple models (like decision trees) on different bootstrapped
datasets. Since these models are independent of each other, they can be trained at the same
time, i.e., in parallel (a short sketch appears after the summary points below).
1. Bootstrapping:
o You create several datasets by randomly sampling from the original data.
o These datasets can be generated simultaneously.
2. Model Training:
o Each model (e.g., decision tree) is trained on its own bootstrapped dataset.
o Since these models don’t depend on each other, they can be trained
simultaneously on multiple CPU cores or machines.
3. Prediction:
o Each trained model gives a prediction on test data.
o These predictions can also be generated in parallel, then aggregated (voted or
averaged).
🚀 Faster training: If you use a multi-core CPU or GPU cluster, all trees (or models)
can be trained together.
🔧 Scalable: Works well on big data or real-time systems.
🔄 Efficient: No need to wait for one model to finish before starting the next.
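A minimal bagging sketch with scikit-learn's BaggingClassifier; n_jobs=-1 is what lets the independent trees train (and predict) in parallel across CPU cores. The base estimator, tree count, and dataset are illustrative choices only.

```python
# Bagging sketch: independent trees on bootstrapped samples, trained in parallel via n_jobs.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(
    DecisionTreeClassifier(),   # base learner; each copy sees its own bootstrap sample
    n_estimators=50,
    bootstrap=True,             # sample with replacement (bootstrapping)
    n_jobs=-1,                  # fit and predict with all estimators in parallel
    random_state=0,
)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))    # predictions are aggregated by majority vote
```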