
Module 10 – Part III

Advanced Boosting Models

Prof. Pedram Jahangiry


Decision Trees: Fundamental Questions
• Four fundamental questions to be answered:
1) What feature and cutoff to start with?
2) How to split the samples?
3) How to grow a tree?
4) How to combine trees?

What feature and cutoff to start with?
• Which feature and cutoff add the most information gain (i.e., the largest reduction in impurity)?
• The impurity measures below control how a decision tree decides to split the data (see the sketch below):
• Regression trees: MSE
• Classification trees:
1. Error rate
2. Entropy
3. Gini index
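As a concrete illustration, here is a minimal NumPy sketch of the classification criteria: it computes Gini impurity and entropy for a set of labels and evaluates the information gain of one candidate cutoff. The toy arrays and the cutoff value are made up for the example.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum_k p_k * log2(p_k) over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Toy data: one feature x, binary labels y, and one candidate cutoff
x = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0])
y = np.array([0,   0,   1,   1,   1,   0,   1,   1])
cutoff = 3.2

left, right = y[x <= cutoff], y[x > cutoff]
weighted_child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
gain = gini(y) - weighted_child   # information gain of this split under the Gini criterion
print(f"Gini gain at cutoff {cutoff}: {gain:.3f}")
```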

How to split the samples?

• Pre-sorted and histogram based: sorts the data and builds histograms of the feature values before splitting the tree. This allows for faster splits but can result in less accurate trees.

• GOSS (Gradient-based One-Side Sampling): uses gradient information as a measure of the weight of a sample for splitting. It keeps the instances with large gradients while randomly sampling the instances with small gradients (see the sketch below).

• Greedy method: selects the best split at each step without considering the impact on future splits, which may result in suboptimal trees.
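A minimal sketch of the GOSS idea (not the library implementation): keep the top `a` fraction of instances by gradient magnitude, randomly sample a `b` fraction of the rest, and up-weight the sampled small-gradient instances by (1 - a) / b so the data distribution is roughly preserved. The function name, the a/b values, and the random gradients are all made up for illustration.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Illustrative Gradient-based One-Side Sampling.

    Keeps the top `a` fraction of instances by |gradient|, samples a `b`
    fraction of the remaining instances at random, and up-weights the
    sampled small-gradient instances by (1 - a) / b.
    """
    rng = np.random.default_rng(seed)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))      # indices sorted by |gradient|, descending
    n_top = int(a * n)
    top_idx = order[:n_top]                     # large-gradient instances: always kept
    sampled_idx = rng.choice(order[n_top:], size=int(b * n), replace=False)

    idx = np.concatenate([top_idx, sampled_idx])
    weights = np.ones(len(idx))
    weights[n_top:] = (1 - a) / b               # compensate for under-sampling the rest
    return idx, weights

grads = np.random.default_rng(42).normal(size=1_000)
idx, w = goss_sample(grads)
print(len(idx), w.min(), w.max())               # 300 instances kept; weights 1.0 and 8.0
```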

How to grow a tree?

• Depth-wise (level-wise): repeatedly split the data along the feature with the highest information gain until a maximum depth is reached. This results in a balanced tree in which all leaf nodes are at the same depth.

• Leaf-wise: repeatedly split the leaf that yields the highest information gain, until all leaf nodes contain only a single class or another stopping criterion is met. This results in a highly unbalanced tree in which some branches are much deeper than others.

• Symmetric: repeatedly split the data along the feature with the highest information gain, applying the same split to every node at a given depth (oblivious trees), until a stopping criterion is met (e.g., a minimum number of samples per leaf node). This results in a more balanced tree structure than leaf-wise growth (see the parameter sketch below).
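These three growth strategies map onto specific hyperparameters in the libraries discussed next. The snippet below is a hedged sketch: the parameter names (grow_policy, num_leaves, max_depth, depth) are the ones the libraries expose, but the chosen values are illustrative, not tuned.

```python
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Depth-wise (level-wise) growth: XGBoost's default policy with the histogram method
xgb_depthwise = XGBClassifier(tree_method="hist", grow_policy="depthwise", max_depth=6)

# Leaf-wise (best-first) growth: LightGBM's default; complexity is capped by num_leaves
lgbm_leafwise = LGBMClassifier(num_leaves=31, max_depth=-1)

# Symmetric (oblivious) trees: CatBoost's default policy
cat_symmetric = CatBoostClassifier(grow_policy="SymmetricTree", depth=6, verbose=0)
```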

How to combine trees?
• Bagging consists of creating many “copies” of the training data (each copy slightly different from the others), applying the weak learner to each copy to obtain multiple weak models, and then combining them.
• In bagging, the bootstrapped trees are independent of each other.

• Boosting consists of using the “original” training data and iteratively creating multiple models with a weak learner. Each new model tries to “fix” the errors that the previous models made.
• In boosting, each tree is grown using information from the previously grown trees (a minimal comparison of the two appears below).
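As a quick, hedged comparison, the scikit-learn sketch below fits a bagged ensemble of trees and a gradient boosted ensemble on the same synthetic regression data; the dataset and hyperparameter values are made up for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1_000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: independent deep trees, each fit on a bootstrapped copy of the training data
bagging = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=100,
                           random_state=0).fit(X_train, y_train)

# Boosting: shallow trees fit sequentially, each one correcting the previous model's errors
boosting = GradientBoostingRegressor(n_estimators=100, max_depth=3, learning_rate=0.1,
                                     random_state=0).fit(X_train, y_train)

print("Bagging  test MSE:", mean_squared_error(y_test, bagging.predict(X_test)))
print("Boosting test MSE:", mean_squared_error(y_test, boosting.predict(X_test)))
```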

Evolution of XGBoost

XGBoost: eXtreme Gradient Boosting
• XGBoost is an open-source gradient boosting library developed by Tianqi Chen (2014), with a focus on efficient and scalable machine learning algorithms.
• “Extreme” refers to the fact that the algorithms and methods have been customized to push the limit of what is possible for gradient boosting algorithms.
• XGBoost includes several other features that can improve model performance, such as handling missing values, automatic feature selection, and model ensembling (a usage sketch follows).
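A minimal XGBoost sketch using the scikit-learn interface; the dataset and hyperparameter values are illustrative, not part of the original slide.

```python
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=4,
                      tree_method="hist", random_state=0)
model.fit(X_train, y_train)            # missing values (NaN), if present, are routed natively
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```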

LightGBM (Light Gradient Boosted Machine)
• LightGBM is an open-source gradient boosting library developed by Microsoft (2016) that is fast and efficient, making it suitable for large-scale learning tasks.
• LightGBM can handle categorical features through its own categorical binning, but the columns must first be integer-encoded (e.g., ordinal encoding) or marked with a categorical dtype; one-hot encoding is not needed.
• LightGBM includes several other features that can improve model performance, such as handling missing values, automatic feature selection, and model ensembling (see the sketch below).
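A minimal LightGBM sketch; the tiny DataFrame and column names are made up for illustration. Marking the column as a pandas category dtype lets LightGBM apply its own categorical binning.

```python
import pandas as pd
from lightgbm import LGBMClassifier

df = pd.DataFrame({
    "income":  [40_000, 85_000, 32_000, 120_000, 56_000, 73_000],
    "region":  ["west", "east", "west", "south", "east", "south"],
    "default": [1, 0, 1, 0, 0, 1],
})
# LightGBM expects categorical columns as integer codes or a pandas 'category' dtype,
# not one-hot vectors; 'category' dtype columns are detected automatically.
df["region"] = df["region"].astype("category")

model = LGBMClassifier(n_estimators=20, num_leaves=7, min_child_samples=1)
model.fit(df[["income", "region"]], df["default"])
print(model.predict(df[["income", "region"]]))
```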

CatBoost (Category Boosting)
• CatBoost is an open-source gradient boosting library developed by Yandex (2017) that is specifically designed to handle categorical data.
• CatBoost can handle categorical features directly, without the need for one-hot encoding or other preprocessing.
• CatBoost includes several other features that can improve model performance, such as handling missing values, automatic feature selection, and model ensembling (a short example follows).
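A minimal CatBoost sketch; the tiny DataFrame and column names are made up for illustration. The raw string column is passed as-is via cat_features, and CatBoost encodes it internally.

```python
import pandas as pd
from catboost import CatBoostClassifier

df = pd.DataFrame({
    "income":  [40_000, 85_000, 32_000, 120_000, 56_000, 73_000],
    "region":  ["west", "east", "west", "south", "east", "south"],  # raw strings, no encoding
    "default": [1, 0, 1, 0, 0, 1],
})

model = CatBoostClassifier(iterations=100, depth=4, verbose=0)
# Categorical columns are named in cat_features; CatBoost encodes them internally
# (ordered target statistics), so no one-hot or ordinal preprocessing is required.
model.fit(df[["income", "region"]], df["default"], cat_features=["region"])
print(model.predict(df[["income", "region"]]))
```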

XGBoost vs LightGBM vs CatBoost

|                               | XGBoost                                                 | LightGBM                                             | CatBoost             |
|-------------------------------|---------------------------------------------------------|------------------------------------------------------|----------------------|
| Developer                     | Tianqi Chen (2014)                                      | Microsoft (2016)                                     | Yandex (2017)        |
| Base model                    | Decision trees                                          | Decision trees                                       | Decision trees       |
| Tree growing algorithm        | Depth-wise (leaf-wise also available)                   | Leaf-wise                                            | Symmetric            |
| Parallel training             | Single GPU                                              | Multiple GPUs                                        | Multiple GPUs        |
| Handling categorical features | Encoding required (one-hot, ordinal, target, label, …)  | Automated encoding using categorical feature binning | No encoding required |
| Splitting method              | Pre-sorted and histogram based                          | GOSS (Gradient-based One-Side Sampling)              | Greedy method        |
