
XGBoost: Unleashing the Power of Gradient Boosting

- A Dive into its Key Features and Advantages


Introduction

● What is XGBoost?
○ An open-source library for gradient boosting
○ Combines decision trees with regularization to create highly accurate models
○ Widely used in machine learning competitions and real-world applications

● Boosting is an ensemble learning technique where weak learners are combined
sequentially to create a strong model. It focuses on correcting errors made by
previous learners in the sequence.

● XGBoost stands out as an advanced boosting algorithm known for its efficiency and
performance.
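
Below is a minimal sketch of what training with XGBoost looks like in practice. It assumes the xgboost and scikit-learn Python packages are installed; the synthetic dataset and parameter values are illustrative only.

    # Minimal sketch: train an XGBoost classifier on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    # Synthetic binary-classification data stands in for a real dataset.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # An ensemble of shallow trees, built sequentially by boosting.
    model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
    model.fit(X_train, y_train)

    print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
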
Workings

In machine learning, weak learners are simple models, like decision trees. On their own, they might
not be incredibly accurate. But boosting comes in and harnesses their collective power. Here's how:

Stage 1: The First Foothold: Start with a single weak learner (decision tree) trained on your data. It
makes predictions, but inevitably gets some wrong.

Stage 2: Boosting the Signal: Focus on the errors! Increase the weights of data points the first tree
misclassified. Train a new weak learner to specifically address those errors.

Stage 3: Climbing Together: Combine the predictions of both trees. The first tree provides a rough
direction, and the second fine-tunes it based on the mistakes. Repeat!

Stage N: Reaching the Peak: With each iteration, build a new weak learner focused on the
remaining errors, weighted more for those the previous trees missed. Combine all predictions into a
final, powerful ensemble model.
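
The staged process above can be sketched in a few lines of code. This is a simplified gradient-boosting-style loop (each new tree is fit to the errors left by the current ensemble), not XGBoost's exact weighting and regularization scheme; the data and settings are illustrative.

    # Each round fits a shallow tree to the ensemble's remaining errors.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

    learning_rate = 0.1
    prediction = np.zeros_like(y, dtype=float)   # Stage 1 starts from scratch
    trees = []

    for stage in range(100):
        residuals = y - prediction               # errors the ensemble still makes
        tree = DecisionTreeRegressor(max_depth=3, random_state=stage)
        tree.fit(X, residuals)                   # new weak learner targets those errors
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)

    print("Training MSE after boosting:", np.mean((y - prediction) ** 2))
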
Key Observations

● The weak learners are typically simple models like decision trees, but can be any
type of learner.

● Each new learner focuses on the residuals (errors) left by the previous ones.

● Boosting primarily reduces the model's bias (systematic error), and combining many
learners can also help keep its variance (sensitivity to the training data) in check.

● This iterative process results in an ensemble model that is often much more accurate
than any individual weak learner.
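
To illustrate the last observation, the sketch below compares a single shallow tree against a boosted ensemble of equally shallow trees on synthetic data; the exact numbers will vary, but the ensemble is typically far more accurate.

    # One weak learner vs. a boosted ensemble of weak learners.
    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                               random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    single_tree = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
    boosted = XGBClassifier(n_estimators=200, max_depth=2,
                            learning_rate=0.1).fit(X_train, y_train)

    print("Single tree:", accuracy_score(y_test, single_tree.predict(X_test)))
    print("Boosted ensemble:", accuracy_score(y_test, boosted.predict(X_test)))
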
Objective function
XGBoost's objective function is typically expressed as:

Obj = Loss(y, y_pred) + alpha * Reg_L1 + lambda * Reg_L2

Loss represents the training loss function. Typical loss functions are mean squared error
(for regression), logistic loss (for binary classification), and multiclass logloss (for
multiclass classification). The full XGBoost objective also penalizes tree complexity,
e.g. the number of leaves in each tree (controlled by gamma).

- y denotes the true labels and y_pred signifies the model's predicted values.
- alpha and lambda control the strength of L1 and L2 regularization, respectively.
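
As a toy illustration of the formula above, the NumPy sketch below evaluates the simplified objective for a regression loss, with leaf_weights standing in for the model's leaf values; the per-tree structure and the leaf-count (gamma) term are omitted for brevity.

    import numpy as np

    def objective(y, y_pred, leaf_weights, alpha=0.1, lam=1.0):
        loss = np.mean((y - y_pred) ** 2)        # mean squared error (regression)
        reg_l1 = np.sum(np.abs(leaf_weights))    # L1 penalty: pushes weights to zero
        reg_l2 = np.sum(leaf_weights ** 2)       # L2 penalty: shrinks weights
        return loss + alpha * reg_l1 + lam * reg_l2

    y = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred = np.array([2.5, 0.0, 2.0, 8.0])
    leaf_weights = np.array([0.2, -0.1, 0.4])
    print(objective(y, y_pred, leaf_weights))
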
Key Features
● Speed and Efficiency: Optimized algorithms for parallel processing
● Regularization: Built-in mechanisms to prevent overfitting. Controls
model complexity through penalties on tree structure and leaf values
● Flexibility: Supports various objective functions for regression and
classification tasks and can handle missing values and categorical features
● Sparsity Awareness: Efficiently handles sparse data with many zero
features
● Distributed Learning: Scales to large datasets using distributed
computing frameworks such as Spark, enabling collaborative training on multiple
machines
● Interpretability: Feature importance scores help identify key drivers of
the model
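
As one example of interpretability, the sketch below pulls gain-based feature importance scores from a trained model (synthetic data; the importance_type choice is one of several the library supports).

    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                               random_state=0)
    model = XGBClassifier(n_estimators=100, max_depth=3).fit(X, y)

    # Gain-based importance from the underlying booster (features are f0..f9 here).
    scores = model.get_booster().get_score(importance_type="gain")
    for feature, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(feature, round(score, 3))
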
Hyperparameters - Learning Task Parameters

● objective: Specifies the learning task and the corresponding objective function (e.g.,
'reg:squarederror' for regression, 'binary:logistic' for binary classification,
'multi:softmax' for multiclass classification).

● eval_metric: Evaluation metric used for the validation data (e.g., 'rmse' for
regression, 'logloss' for binary classification).
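
A short sketch of these parameters using the native training API, for a three-class problem; the objective and metric strings are the documented ones, while the data is synthetic.

    import xgboost as xgb
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1500, n_features=20, n_informative=8,
                               n_classes=3, random_state=0)
    dtrain = xgb.DMatrix(X, label=y)

    params = {
        "objective": "multi:softmax",   # learning task: multiclass classification
        "num_class": 3,                 # required by the multiclass objectives
        "eval_metric": "mlogloss",      # metric reported on evaluation data
    }
    booster = xgb.train(params, dtrain, num_boost_round=100)
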
Hyperparameters - Tree Booster Parameters

● eta (learning rate): Shrinks the contribution of each tree, preventing overfitting.
Smaller values lead to slower learning but more robust models. Typical values range
from 0.01 to 0.3.

● max_depth: Maximum depth of a tree. Higher values may lead to overfitting.

● subsample: Fraction of training data to be randomly sampled during each
boosting round. Values between 0.5 and 1.0 are common.

● colsample_bytree: Fraction of features to be randomly sampled for building
each tree.

● min_child_weight: Minimum sum of instance weight (hessian) needed in a child.
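
A sketch of these tree-booster settings on the scikit-learn wrapper; the values are illustrative starting points, not tuned recommendations.

    from xgboost import XGBRegressor

    model = XGBRegressor(
        learning_rate=0.1,     # eta: shrink each tree's contribution
        max_depth=4,           # cap tree depth to limit overfitting
        subsample=0.8,         # fraction of rows sampled per boosting round
        colsample_bytree=0.8,  # fraction of features sampled per tree
        min_child_weight=5,    # minimum hessian sum required in a child
        n_estimators=300,
    )
    # model.fit(X_train, y_train)  # fit on your own training data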


Hyperparameters - Regularization Parameters

● gamma: Minimum loss reduction required to make a further partition on a leaf node.

● alpha: L1 regularization parameter, controlling tree sparsity (number of non-zero
leaf nodes).

● lambda: L2 regularization parameter, shrinking leaf node weights to reduce model
complexity.
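
The same three regularization knobs as they appear on the scikit-learn wrapper (reg_alpha and reg_lambda map to alpha and lambda in the native parameter names); the values are illustrative only.

    from xgboost import XGBRegressor

    model = XGBRegressor(
        gamma=1.0,       # minimum loss reduction required to split a leaf
        reg_alpha=0.5,   # L1 penalty (alpha): encourages sparse leaf weights
        reg_lambda=2.0,  # L2 penalty (lambda): shrinks leaf weights
        n_estimators=300,
    )
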
Other Hyperparameters

● scale_pos_weight: Controls the balance of positive and negative weights, particularly
useful for imbalanced classes.

● nthread: Sets the number of threads to use for parallel processing.

● n_estimators: Number of boosting rounds or trees to build.

● early_stopping_rounds: If a metric does not improve for a certain number of rounds,
training is stopped.
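
A combined sketch of these settings with a validation set for early stopping. Note that recent xgboost releases (roughly 1.6 and later) take early_stopping_rounds in the constructor, while older ones took it in fit(); n_jobs is the scikit-learn-wrapper name for nthread. Data and values are illustrative.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    # Imbalanced synthetic data: roughly 90% negatives, 10% positives.
    X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1],
                               random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    model = XGBClassifier(
        n_estimators=1000,         # upper bound on boosting rounds
        scale_pos_weight=9.0,      # ~ negatives / positives for this imbalance
        n_jobs=4,                  # parallel threads (nthread in the native API)
        eval_metric="logloss",
        early_stopping_rounds=20,  # stop if the metric stalls for 20 rounds
    )
    model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])
    print("Best iteration:", model.best_iteration)
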
