Learning Rate (learning_rate or eta):
o Definition: Controls the step size shrinkage used in updating the weights of the model
during each boosting iteration.
o Where to use: Central to controlling the step size during boosting and preventing
overfitting.
o When to use: Lower values make the boosting process more conservative and require
more boosting rounds to converge, while higher values may lead to overfitting.
o XGBoost Hyperparameter: learning_rate
o Recommended values: Typically in the range [0.01, 0.3].
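A minimal sketch of setting it through the scikit-learn wrapper (the
make_classification dataset here is purely illustrative):

    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=1000, random_state=42)

    # Smaller steps per round: pair a low eta with more boosting rounds.
    model = XGBClassifier(learning_rate=0.05, n_estimators=500)
    model.fit(X, y)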
Number of Estimators (n_estimators):
o Definition: Number of boosting rounds or trees to build.
o Where to use: Dictates the number of boosting rounds and the overall complexity of
the model.
o When to use: Higher values can improve performance up to a point, but more
estimators increase computation time and, without early stopping or a lower
learning rate, the risk of overfitting.
o XGBoost Hyperparameter: n_estimators
o Recommended values: Depends on the size of the dataset and computational
resources, but typically in the range [100, 1000].
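Rather than hand-picking n_estimators, a common pattern is to set it high and
let early stopping choose the effective round count. A sketch assuming
xgboost >= 1.6, where early_stopping_rounds is a constructor argument:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    # Set n_estimators high; training stops once validation loss stalls.
    model = XGBClassifier(n_estimators=1000, learning_rate=0.05,
                          early_stopping_rounds=20, eval_metric="logloss")
    model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
    print(model.best_iteration)  # effective number of boosting rounds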
Maximum Depth (max_depth):
o Definition: Maximum depth of a tree in the ensemble.
o Where to use: Controls the depth of individual trees and the complexity of the model.
o When to use: Higher values allow for more complex trees, but too high may lead to
overfitting.
o XGBoost Hyperparameter: max_depth
o Recommended values: Typically in the range [3, 10].
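One way to choose max_depth within that range is a small cross-validated grid;
a sketch using scikit-learn's GridSearchCV:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=1000, random_state=0)

    # Search shallow-to-deep trees; deeper trees capture more interactions.
    grid = GridSearchCV(XGBClassifier(n_estimators=200),
                        param_grid={"max_depth": [3, 5, 7, 10]}, cv=3)
    grid.fit(X, y)
    print(grid.best_params_)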
Minimum Child Weight (min_child_weight):
o Definition: Minimum sum of instance weight (the hessian) required in a child
node. It helps prevent overfitting by keeping splits from producing very small
child nodes.
o Where to use: Ensures that each leaf node has a minimum number of instances, thus
reducing the complexity of the model.
o When to use: Higher values make the algorithm more conservative and reduce the
risk of overfitting.
o XGBoost Hyperparameter: min_child_weight
o Recommended values: Typically in the range [1, 10].
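A minimal sketch contrasting the default with a more conservative setting:

    from xgboost import XGBClassifier

    # 1 is the XGBoost default; larger values veto splits that would
    # create small, noise-driven leaves.
    default_model = XGBClassifier(min_child_weight=1)
    conservative_model = XGBClassifier(min_child_weight=5)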
Subsample:
o Definition: Fraction of observations to be randomly sampled for each tree. It
introduces randomness and reduces overfitting.
o Where to use: Controls the randomness of the data sampling process for each tree.
o When to use: Lower values make the model more robust to noise but may lead to
underfitting.
o XGBoost Hyperparameter: subsample
o Recommended values: Typically in the range [0.5, 1.0].
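A minimal sketch; 0.8 is a common starting point:

    from xgboost import XGBClassifier

    # Each tree is grown on a random 80% of the rows (without replacement),
    # which decorrelates the trees and acts as a regularizer.
    model = XGBClassifier(subsample=0.8)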
Column Sample by Tree (colsample_bytree):
o Definition: Fraction of features to be randomly sampled for each tree. It introduces
randomness and reduces overfitting.
o Where to use: Controls the randomness of feature selection for each tree.
o When to use: Lower values reduce overfitting by introducing more randomness in
feature selection.
o XGBoost Hyperparameter: colsample_bytree
o Recommended values: Typically in the range [0.5, 1.0].
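A minimal sketch combining row and column subsampling:

    from xgboost import XGBClassifier

    # Each tree sees a random 80% of the columns; together with subsample
    # this randomizes both axes of the training data.
    model = XGBClassifier(subsample=0.8, colsample_bytree=0.8)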
Gamma:
o Definition: Minimum loss reduction required to make a further partition on a leaf
node. It acts as regularization by controlling the complexity of trees.
o Where to use: Helps prevent overfitting by penalizing overly complex trees.
o When to use: Higher values make the algorithm more conservative.
o XGBoost Hyperparameter: gamma
o Recommended values: Typically in the range [0, 0.2].
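A minimal sketch of a mildly conservative setting:

    from xgboost import XGBClassifier

    # A candidate split is kept only if it reduces the loss by at least
    # gamma; the default of 0 accepts any improving split.
    model = XGBClassifier(gamma=0.1)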
Regularization Parameters (reg_alpha and reg_lambda):
o Definition: L1 and L2 regularization terms applied to the weights. They help prevent
overfitting by penalizing large parameter values.
o Where to use: Controls the amount of regularization applied to the model.
o When to use: Increase values to increase regularization and reduce overfitting.
o XGBoost Hyperparameters: reg_alpha, reg_lambda
o Recommended values: Typically in the range [0, 0.5].
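A minimal sketch setting both terms through the scikit-learn wrapper:

    from xgboost import XGBClassifier

    # reg_alpha (L1) can push leaf weights to exactly zero, while
    # reg_lambda (L2, default 1) shrinks them smoothly toward zero.
    model = XGBClassifier(reg_alpha=0.1, reg_lambda=1.0)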
Lambda:
o Definition: L2 regularization term on weights. It penalizes large coefficients and
helps prevent overfitting.
o Where to use: The native-API name for the same L2 term that the scikit-learn
wrapper exposes as reg_lambda; the two are aliases, so set one or the other,
not both.
o When to use: Increase it to shrink leaf weights more strongly and further
control overfitting.
o XGBoost Hyperparameter: lambda
o Recommended values: Typically in the range [0, 0.5].
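In the native (non-scikit-learn) API the term is passed as "lambda" in the
params dict; a minimal sketch (dataset purely illustrative):

    import xgboost as xgb
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=500, random_state=0)
    dtrain = xgb.DMatrix(X, label=y)

    # "lambda" is a reserved word in Python, which is why the native API
    # takes it as a dict key rather than a keyword argument.
    params = {"objective": "binary:logistic", "lambda": 1.0}
    booster = xgb.train(params, dtrain, num_boost_round=100)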
Alpha:
o Definition: L1 regularization term on weights. It encourages sparsity in the weight
vectors.
o Where to use: The native-API name for the same L1 term that the scikit-learn
wrapper exposes as reg_alpha; the two are aliases.
o When to use: Useful when dealing with high-dimensional data or when you suspect
that many features are irrelevant.
o XGBoost Hyperparameter: alpha
o Recommended values: Typically in the range [0, 0.5].
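A sketch in the native API, with a deliberately wide feature set (the dataset
and sizes are illustrative only):

    import xgboost as xgb
    from sklearn.datasets import make_classification

    # A wide, mostly-noisy feature set: 200 columns, 10 informative.
    X, y = make_classification(n_samples=500, n_features=200,
                               n_informative=10, random_state=0)
    dtrain = xgb.DMatrix(X, label=y)

    # L1 on leaf weights encourages sparse trees when many features are noise.
    params = {"objective": "binary:logistic", "alpha": 0.1}
    booster = xgb.train(params, dtrain, num_boost_round=100)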
Scale Pos Weight:
o Definition: Controls the balance of positive and negative weights. It is useful for
imbalanced classification tasks.
o Where to use: Can be used to address class imbalance by assigning different weights
to positive and negative examples.
o When to use: Relevant for binary classification tasks with imbalanced class
distributions; for multi-class problems, per-instance sample weights serve the
same purpose.
o XGBoost Hyperparameter: scale_pos_weight
o Recommended values: Typically set to the ratio of negative examples to positive
examples.
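A minimal sketch of computing that ratio from the training labels (the
imbalanced dataset is generated purely for illustration):

    import numpy as np
    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    # Roughly 95/5 class imbalance.
    X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

    # The usual heuristic: number of negatives divided by number of positives.
    ratio = float(np.sum(y == 0)) / np.sum(y == 1)
    model = XGBClassifier(scale_pos_weight=ratio)
    model.fit(X, y)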