
Module 10 – Part III

Advanced Boosting Models

Prof. Pedram Jahangiry


Decision Trees: Fundamental Questions
• Four fundamental questions to be answered:
1) What feature and cutoff to start with?
2) How to split the samples?
3) How to grow a tree?
4) How to combine trees?

What feature and cutoff to start with?
• Which feature and cutoff add the most information gain (i.e., the largest reduction in impurity)?
• The impurity measures below control how a decision tree decides to split the data (see the sketch below):
• Regression trees: MSE
• Classification trees:
1. Error rate
2. Entropy
3. Gini index
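As a concrete illustration, here is a minimal NumPy sketch of the classification criteria: it computes Gini impurity and entropy for a set of labels and evaluates the information gain of one candidate cutoff. The toy arrays and the cutoff value are made up for the example.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum_k p_k * log2(p_k) over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Toy data: one feature x, binary labels y, and one candidate cutoff
x = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0])
y = np.array([0,   0,   1,   1,   1,   0,   1,   1])
cutoff = 3.2

left, right = y[x <= cutoff], y[x > cutoff]
weighted_child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
gain = gini(y) - weighted_child   # information gain of this split under the Gini criterion
print(f"Gini gain at cutoff {cutoff}: {gain:.3f}")
```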

How to split the samples?

• Pre-sorted and histogram based: sorts the data and builds histograms of the feature values before splitting the tree. This allows for faster splits but can result in less accurate trees.

• GOSS (Gradient-based One-Side Sampling): uses gradient information as a measure of the weight of a sample for splitting. It keeps the instances with large gradients while randomly sampling the instances with small gradients (see the sketch below).

• Greedy method: selects the best split at each step without considering the impact on future splits, which may result in suboptimal trees.
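A minimal sketch of the GOSS idea (not the library implementation): keep the top `a` fraction of instances by gradient magnitude, randomly sample a `b` fraction of the rest, and up-weight the sampled small-gradient instances by (1 - a) / b so the data distribution is roughly preserved. The function name, the a/b values, and the random gradients are all made up for illustration.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Illustrative Gradient-based One-Side Sampling.

    Keeps the top `a` fraction of instances by |gradient|, samples a `b`
    fraction of the remaining instances at random, and up-weights the
    sampled small-gradient instances by (1 - a) / b.
    """
    rng = np.random.default_rng(seed)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))      # indices sorted by |gradient|, descending
    n_top = int(a * n)
    top_idx = order[:n_top]                     # large-gradient instances: always kept
    sampled_idx = rng.choice(order[n_top:], size=int(b * n), replace=False)

    idx = np.concatenate([top_idx, sampled_idx])
    weights = np.ones(len(idx))
    weights[n_top:] = (1 - a) / b               # compensate for under-sampling the rest
    return idx, weights

grads = np.random.default_rng(42).normal(size=1_000)
idx, w = goss_sample(grads)
print(len(idx), w.min(), w.max())               # 300 instances kept; weights 1.0 and 8.0
```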

How to grow a tree?

• Depth-wise (level-wise): repeatedly split the data along the feature with the highest information gain until a maximum depth is reached. This results in a balanced tree in which all leaf nodes are at the same depth.

• Leaf-wise: repeatedly split the leaf that yields the highest information gain, until all leaf nodes contain only a single class or another stopping criterion is met. This results in a highly unbalanced tree in which some branches are much deeper than others.

• Symmetric: repeatedly split the data along the feature with the highest information gain, applying the same split to every node at a given depth (oblivious trees), until a stopping criterion is met (e.g., a minimum number of samples per leaf node). This results in a more balanced tree structure than leaf-wise growth (see the parameter sketch below).
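These three growth strategies map onto specific hyperparameters in the libraries discussed next. The snippet below is a hedged sketch: the parameter names (grow_policy, num_leaves, max_depth, depth) are the ones the libraries expose, but the chosen values are illustrative, not tuned.

```python
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Depth-wise (level-wise) growth: XGBoost's default policy with the histogram method
xgb_depthwise = XGBClassifier(tree_method="hist", grow_policy="depthwise", max_depth=6)

# Leaf-wise (best-first) growth: LightGBM's default; complexity is capped by num_leaves
lgbm_leafwise = LGBMClassifier(num_leaves=31, max_depth=-1)

# Symmetric (oblivious) trees: CatBoost's default policy
cat_symmetric = CatBoostClassifier(grow_policy="SymmetricTree", depth=6, verbose=0)
```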

How to combine trees?
• Bagging consists of creating many “copies” of the training data (each copy slightly different from the others), applying the weak learner to each copy to obtain multiple weak models, and then combining them.
• In bagging, the bootstrapped trees are independent of each other.

• Boosting consists of using the “original” training data and iteratively creating multiple models with a weak learner. Each new model tries to “fix” the errors that the previous models made.
• In boosting, each tree is grown using information from the previously grown trees (a minimal comparison of the two appears below).
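As a quick, hedged comparison, the scikit-learn sketch below fits a bagged ensemble of trees and a gradient boosted ensemble on the same synthetic regression data; the dataset and hyperparameter values are made up for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1_000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: independent deep trees, each fit on a bootstrapped copy of the training data
bagging = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=100,
                           random_state=0).fit(X_train, y_train)

# Boosting: shallow trees fit sequentially, each one correcting the previous model's errors
boosting = GradientBoostingRegressor(n_estimators=100, max_depth=3, learning_rate=0.1,
                                     random_state=0).fit(X_train, y_train)

print("Bagging  test MSE:", mean_squared_error(y_test, bagging.predict(X_test)))
print("Boosting test MSE:", mean_squared_error(y_test, boosting.predict(X_test)))
```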

Evolution of XGBoost

XGBoost: eXtreme Gradient Boosting
• XGBoost is an open-source gradient boosting library developed by Tianqi Chen (2014), with a focus on efficient and scalable machine learning algorithms.
• “Extreme” refers to the fact that the algorithms and methods have been customized to push the limit of what is possible for gradient boosting algorithms.
• XGBoost includes several other features that can improve model performance, such as handling missing values, automatic feature selection, and model ensembling (a usage sketch follows).
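A minimal XGBoost sketch using the scikit-learn interface; the dataset and hyperparameter values are illustrative, not part of the original slide.

```python
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=4,
                      tree_method="hist", random_state=0)
model.fit(X_train, y_train)            # missing values (NaN), if present, are routed natively
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```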

LightGBM (Light Gradient Boosted Machine)
• LightGBM is an open-source gradient boosting library developed by Microsoft (2016) that is fast and efficient, making it suitable for large-scale learning tasks.
• LightGBM can handle categorical features through its own categorical binning, but the columns must first be integer-encoded (e.g., ordinal encoding) or marked with a categorical dtype; one-hot encoding is not needed.
• LightGBM includes several other features that can improve model performance, such as handling missing values, automatic feature selection, and model ensembling (see the sketch below).
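A minimal LightGBM sketch; the tiny DataFrame and column names are made up for illustration. Marking the column as a pandas category dtype lets LightGBM apply its own categorical binning.

```python
import pandas as pd
from lightgbm import LGBMClassifier

df = pd.DataFrame({
    "income":  [40_000, 85_000, 32_000, 120_000, 56_000, 73_000],
    "region":  ["west", "east", "west", "south", "east", "south"],
    "default": [1, 0, 1, 0, 0, 1],
})
# LightGBM expects categorical columns as integer codes or a pandas 'category' dtype,
# not one-hot vectors; 'category' dtype columns are detected automatically.
df["region"] = df["region"].astype("category")

model = LGBMClassifier(n_estimators=20, num_leaves=7, min_child_samples=1)
model.fit(df[["income", "region"]], df["default"])
print(model.predict(df[["income", "region"]]))
```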

CatBoost (Category Boosting)
• CatBoost is an open-source gradient boosting library developed by Yandex (2017) that is specifically designed to handle categorical data.
• CatBoost can handle categorical features directly, without the need for one-hot encoding or other preprocessing.
• CatBoost includes several other features that can improve model performance, such as handling missing values, automatic feature selection, and model ensembling (a short example follows).
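A minimal CatBoost sketch; the tiny DataFrame and column names are made up for illustration. The raw string column is passed as-is via cat_features, and CatBoost encodes it internally.

```python
import pandas as pd
from catboost import CatBoostClassifier

df = pd.DataFrame({
    "income":  [40_000, 85_000, 32_000, 120_000, 56_000, 73_000],
    "region":  ["west", "east", "west", "south", "east", "south"],  # raw strings, no encoding
    "default": [1, 0, 1, 0, 0, 1],
})

model = CatBoostClassifier(iterations=100, depth=4, verbose=0)
# Categorical columns are named in cat_features; CatBoost encodes them internally
# (ordered target statistics), so no one-hot or ordinal preprocessing is required.
model.fit(df[["income", "region"]], df["default"], cat_features=["region"])
print(model.predict(df[["income", "region"]]))
```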

XGBoost vs LightGBM vs CatBoost

|                               | XGBoost                                                 | LightGBM                                             | CatBoost             |
|-------------------------------|---------------------------------------------------------|------------------------------------------------------|----------------------|
| Developer                     | Tianqi Chen (2014)                                      | Microsoft (2016)                                     | Yandex (2017)        |
| Base model                    | Decision trees                                          | Decision trees                                       | Decision trees       |
| Tree growing algorithm        | Depth-wise (leaf-wise also available)                   | Leaf-wise                                            | Symmetric            |
| Parallel training             | Single GPU                                              | Multiple GPUs                                        | Multiple GPUs        |
| Handling categorical features | Encoding required (one-hot, ordinal, target, label, …)  | Automated encoding using categorical feature binning | No encoding required |
| Splitting method              | Pre-sorted and histogram based                          | GOSS (Gradient-based One-Side Sampling)              | Greedy method        |
