05.XGBoost
XGBoost Features
SPLITTING ALGORITHMS
XGBoost's core strength lies in its regularized objective function. It doesn't just aim to
minimize the loss (like mean squared error or logistic loss); it also incorporates a penalty
term for the complexity of the tree. This helps prevent overfitting. The objective function
looks something like this:
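Using the notation of the original XGBoost paper, with training loss l, predictions ŷ_i and trees f_k:

\[
\mathcal{L}(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2
\]

Here T is the number of leaves in a tree, w are the leaf weights, and γ and λ control how strongly tree complexity is penalised.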
XGBoost uses a greedy approach to build the trees. It starts with a single root node and
iteratively adds branches to the tree. For each possible split, it calculates the gain – how
much the objective function would be improved by making that split.
A key innovation is that XGBoost uses a second-order Taylor expansion of the loss
function. This gives a more accurate estimate of the gain compared to using just the first
derivative (as in gradient boosting). The second-order information (Hessian) helps
XGBoost find better splits.
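Concretely, for a candidate split that sends summed gradients G_L, G_R and summed Hessians H_L, H_R to the left and right children, the gain used by XGBoost (in the paper's notation) is

\[
\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
\]

A split is only worth keeping if this gain is positive, which is how the γ penalty prunes away weak splits.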
Exact Greedy Algorithm: the main problem in tree learning is finding the best split. The exact greedy algorithm enumerates every possible split on every feature. For continuous features and large datasets this becomes computationally expensive, so XGBoost also provides an approximate greedy algorithm to speed up the process.
Instead of evaluating every possible split point, it proposes a set of candidate split points
(quantiles of the feature distribution). It then evaluates the gain for these candidate split
points and chooses the best one.
These approximations are based on quantiles: the first quantile is the first threshold, the second quantile is the second threshold, and so on. By default, the approximate greedy algorithm builds approximately 33 quantiles (this corresponds to the historical sketch_eps default of 0.03, i.e. roughly 1/0.03 candidate bins).
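As a rough illustration of candidate proposal (plain NumPy rather than XGBoost's internal code; the function name and bin count are made up for the example):

```python
import numpy as np

def propose_candidate_splits(feature_values, n_quantiles=33):
    # Propose thresholds at (roughly) evenly spaced quantiles of a single
    # feature, instead of testing every unique value exactly.
    qs = np.linspace(0.0, 1.0, n_quantiles + 2)[1:-1]  # interior quantiles
    return np.unique(np.quantile(feature_values, qs))

# ~33 thresholds instead of one candidate per unique value
x = np.random.lognormal(size=100_000)
print(len(propose_candidate_splits(x)))
```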
Even the approximate algorithm can be slow if the dataset doesn't fit into memory. XGBoost uses a clever data structure called a weighted quantile sketch to efficiently find the candidate split points. This sketch approximates quantiles of the feature distribution without needing to load the entire dataset into memory, and it weights each example by its second-order derivative (Hessian), so the thresholds divide the data into buckets of roughly equal total Hessian weight rather than equal row counts.
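The real sketch is a streaming summary structure, but the effect it approximates can be shown in a few lines of NumPy (illustrative names only, not XGBoost's API): thresholds are chosen so that each bucket carries roughly the same total Hessian weight.

```python
import numpy as np

def weighted_quantile_thresholds(x, hessians, n_bins=33):
    # Place thresholds so each bin holds roughly equal total Hessian weight;
    # the weighted quantile sketch approximates this without keeping the
    # whole sorted dataset in memory.
    order = np.argsort(x)
    x_sorted, h_sorted = x[order], hessians[order]
    cum_w = np.cumsum(h_sorted) / h_sorted.sum()
    targets = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.unique(x_sorted[np.searchsorted(cum_w, targets)])
```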
Sparsity-aware split finding handles missing values in the training data and defines a default direction for missing values in new data. In this optimisation, the data is split into two groups: one group contains the rows with no missing feature values, and the second group contains the rows with missing feature values together with their associated response values. The data in the first group is sorted in ascending order of the feature. Then, the split finding process calculates two sets of gain values –
● First, it calculates the gain with the missing rows from the second group sent down the left branch of the split
● Second, it calculates the gain with the missing rows from the second group sent down the right branch of the split
This is done for each candidate threshold (quantile). The direction that gives the largest gain overall is picked as the default direction whenever missing data is encountered, as sketched below.
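Below is a simplified sketch of that idea in NumPy (not XGBoost's implementation; split_gain follows the gain formula above and all names are for illustration only):

```python
import numpy as np

def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    # Second-order gain for a candidate split (see the formula above).
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

def sparsity_aware_best_split(x, g, h, missing_mask, thresholds):
    # For every candidate threshold, try sending all missing rows left,
    # then right; keep the best gain and the learned default direction.
    G_miss, H_miss = g[missing_mask].sum(), h[missing_mask].sum()
    xp, gp, hp = x[~missing_mask], g[~missing_mask], h[~missing_mask]
    best = (-np.inf, None, None)  # (gain, threshold, default direction)
    for t in thresholds:
        left = xp < t
        G_L, H_L = gp[left].sum(), hp[left].sum()
        G_R, H_R = gp[~left].sum(), hp[~left].sum()
        for G_add, H_add, side in ((G_miss, H_miss, "left"), (0.0, 0.0, "right")):
            gain = split_gain(G_L + G_add, H_L + H_add,
                              G_R + (G_miss - G_add), H_R + (H_miss - H_add))
            if gain > best[0]:
                best = (gain, t, side)
    return best
```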
Cache-Aware Access
Cache memory is the fastest memory for the CPU to access. Hence, XGBoost stores the first- and second-order derivatives (gradients and Hessians) in cache to rapidly calculate the scores for each node and leaf in the tree.
XGBoost also divides the dataset into multiple blocks stored in Compressed Sparse Column (CSC) format, so that the blocks can be distributed to multiple cores for parallel learning.
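The CSC layout itself is easy to picture with SciPy (illustrative only; XGBoost builds its own block structure internally): each feature's values and row indices can be scanned independently, which is what makes per-feature, per-block parallel split finding convenient.

```python
import numpy as np
from scipy.sparse import csc_matrix

X = np.array([[1.0, 0.0, 3.0],
              [0.0, 2.0, 0.0],
              [4.0, 0.0, 5.0]])
X_csc = csc_matrix(X)

# Walk one feature (column) at a time: only the non-zero entries and their
# row indices are stored, column by column.
for j in range(X_csc.shape[1]):
    start, end = X_csc.indptr[j], X_csc.indptr[j + 1]
    rows, vals = X_csc.indices[start:end], X_csc.data[start:end]
    print(f"feature {j}: rows={rows.tolist()} values={vals.tolist()}")
```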
How does it compare to the traditional gradient boosting technique?
In the traditional sequential gradient boosting technique, the most time-consuming step is the split finding process, which uses a greedy algorithm. Although the greedy algorithm is fast on small datasets, it becomes extremely slow on very large ones. This is because the entire dataset is linearly scanned and every unique value is evaluated as a candidate split, and each split is chosen greedily without accounting for its effect on the splits made later in the tree.
The parallelism within XGBoost occurs within this split finding process for the tree branches. It is highly optimised and well engineered, making the process roughly ten times faster than the traditional gradient boosting technique. In this parallel split finding process, the data is split into multiple subsets and distributed to the available cores (4 in the diagram below). Each subset is scanned over its possible split values using the approximate greedy algorithm. The results from all 4 cores are then combined into an approximate quantile histogram, which provides the candidate thresholds for the tree splits.
The first- and second-order derivatives (gradients and Hessians) calculated for the splits are stored in cache memory for faster access when determining the gain and the output value for the leaf nodes. This process exploits most of the distinctive features of XGBoost: parallel learning, the approximate greedy algorithm, the weighted quantile sketch, sparsity-aware splitting and cache-aware access. Furthermore, the CSC block format makes reading the data from the hard drive much faster, even though the blocks need to be decompressed first.
Below is the diagram I put together to demonstrate how the process and optimisations
come together in XGBoost.
[Diagram: xgboost.png — how the XGBoost process and optimisations fit together]
Goals of XGBoost
Execution Speed: in benchmarks, XGBoost was almost always faster than the other gradient boosting implementations from R, Python, Spark and H2O, and it is markedly faster than most alternative algorithms.
Model Performance: XGBoost dominates structured or tabular datasets on
classification and regression predictive modelling problems.
The eval_metric parameter sets the metric to be used for the validation data. The default values are rmse for regression and error for classification.
Typical values are:
rmse – root mean square error.
mae – mean absolute error.
logloss – negative log-likelihood.
error – Binary classification error rate (0.5 threshold).
merror – Multiclass classification error rate.
mlogloss – Multiclass logloss.
auc – Area under the curve.
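For example, with the core Python API (the random dataset and parameter values below are made up purely to show where eval_metric goes):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)

# Hold out the last 200 rows as validation data.
dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])

# eval_metric controls which metric is reported on the validation set.
params = {"objective": "binary:logistic", "eval_metric": "auc"}
booster = xgb.train(params, dtrain, num_boost_round=50,
                    evals=[(dvalid, "validation")])
```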
Consider using XGBoost for any supervised machine learning task that satisfies the following criteria:
XGBoost is usually used to train gradient-boosted decision trees (GBDT) and other gradient-boosted models. Random forests use the same model representation and inference as gradient-boosted decision trees, but a different training algorithm. XGBoost can also be used to train a standalone random forest, and a random forest can even be used as the base model for gradient boosting; a short sketch of the two model types follows below.
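A small sketch with the scikit-learn style wrappers, contrasting a standalone random forest with gradient-boosted trees (assuming a reasonably recent xgboost package; the hyperparameter values are arbitrary, not recommendations):

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Standalone random forest trained by XGBoost: many parallel trees,
# a single boosting round under the hood.
rf = xgb.XGBRFClassifier(n_estimators=100, max_depth=6, random_state=0)
rf.fit(X_tr, y_tr)
print("random forest accuracy:", rf.score(X_te, y_te))

# Gradient-boosted decision trees for comparison: trees built sequentially.
gbdt = xgb.XGBClassifier(n_estimators=100, max_depth=6, random_state=0)
gbdt.fit(X_tr, y_tr)
print("boosted trees accuracy:", gbdt.score(X_te, y_te))
```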
Further, random forest is an improvement over bagging that helps reduce variance. Random forest builds its trees in parallel, while in boosting the trees are built sequentially: each tree is grown using information from the previously grown trees. This is unlike bagging, where multiple bootstrap copies of the original training data are created and a separate decision tree is fit on each. This is the reason why XGBoost generally performs better than random forest.