LightGBM - An In-Depth Guide Python
coderzcolumn.com/tutorials/machine-learning/lightgbm-an-in-depth-guide-python
import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.datasets import load_boston, load_breast_cancer, load_wine
import warnings
warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 50)
Load Datasets
We'll be using the three datasets mentioned below, which are available from sklearn, as a part of this tutorial for explanation purposes.
Boston Housing Dataset: It's a regression dataset which has information about various attributes of houses in Boston and their prices in thousands of dollars. It'll be used for explaining regression tasks.
Breast Cancer Dataset: It's a classification dataset that has information about two different types of tumor. It'll be used for explaining
binary classification tasks.
Wine Dataset: It's a classification dataset that has information about ingredients used in three different types of wines. It'll be used for explaining multi-class classification tasks.
We have loaded all three datasets one by one below. We have printed the description of each dataset, which gives us an overview of its features and size. We have also loaded each dataset as a pandas dataframe and displayed the first few samples of data.
boston = load_boston()
boston_df = pd.DataFrame(data=boston.data, columns=boston.feature_names)
boston_df["Price"] = boston.target
boston_df.head()
**Data Set Characteristics:**
:Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT Price
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2
breast_cancer = load_breast_cancer()
breast_cancer_df = pd.DataFrame(data=breast_cancer.data, columns=breast_cancer.feature_names)
breast_cancer_df.head()
:Attribute Information:
- radius (mean of distances from center to points on the perimeter)
- texture (standard deviation of gray-scale values)
- perimeter
- area
- smoothness (local variation in radius lengths)
- compactness (perimeter^2 / area - 1.0)
- concavity (severity of concave portions of the contour)
- concave points (number of concave portions of the contour)
- symmetry
- fractal dimension ("coastline approximation" - 1)
The mean, standard error, and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
- class:
- WDBC-Malignant
- WDBC-Benign
mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  mean fractal dimension  radius error  texture error
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 1.0950 0.9053
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 0.5435 0.7339
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 0.7456 0.7869
3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 0.4956 1.1560
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 0.7572 0.7813
Wine Dataset
wine = load_wine()
wine_df = pd.DataFrame(data=wine.data, columns=wine.feature_names)
wine_df.head()
- class:
- class_0
- class_1
- class_2
train()
The simplest way to create an estimator in lightgbm is by using the train() method. It takes a dictionary of estimator parameters and a training dataset as input. It then trains the estimator and returns an object of type Booster, which is a trained estimator that can be used to make future predictions.
params - This parameter accepts a dictionary specifying the parameters of the gradient boosted decision trees algorithm. We just need to provide an objective function to get started, based on the type of problem (classification/regression). We'll later explain a commonly used list of parameters that can be passed to this dictionary.
train_set - This parameter accepts a lightgbm Dataset object which holds information about feature values and target values. It's an internal data structure designed by lightgbm to wrap data.
num_boost_round - It specifies the number of boosting trees that will be used in the ensemble. The group of gradient boosted trees is called an ensemble, and it is what we generally refer to as an estimator. The default value is 100.
valid_sets - It accepts a list of Dataset objects which act as validation sets. These validation sets will be evaluated after each training round.
valid_names - It accepts a list of strings of the same length as that of valid_sets specifying names for each validation set. These
names will be used when printing evaluation metrics for these datasets as well as when plotting them.
categorical_feature - It accepts a list of strings/ints or the string auto. If we give a list of strings/ints then those columns from the dataset will be treated as categorical columns.
verbose_eval - It accepts bool or int as a value. If we set the value to False or 0 then it won't print metric evaluation results calculated on the validation sets that we passed. If we pass True then it'll print results for each round. If we pass an integer greater than 1 then it'll print results every that many rounds.
Dataset
Dataset is a lightgbm internal data structure for holding data and labels. Below are the important parameters of the class.
data - It accepts numpy array, pandas dataframe, scipy sparse matrix, list of numpy arrays, h2o data table’s frame as input holding
feature values.
label - It accepts numpy array, pandas series, pandas one column dataframe specifying target values. We can even set this parameter
to None if we don't have target values. The default is None.
feature_name - It accepts a list of strings specifying feature names.
categorical_feature - It has the same meaning as the train() method parameter mentioned above. We can handle categorical features here or in that method.
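As a quick illustration, a Dataset wrapping the Boston data can be created like the minimal sketch below; the feature names are taken from the sklearn dataset object and are only meant as an example.

train_dataset = lgb.Dataset(boston.data, label=boston.target,
                            feature_name=boston.feature_names.tolist())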
Regression
The first problem that we'll solve using lightgbm is a simple regression problem using the Boston housing dataset which we loaded earlier. We
have divided the dataset into train/test sets and created a Dataset instance out of them. We have then called the lightgbm.train() method
giving it train and validation set. We have set the number of boosting rounds to 10 hence it'll create 10 boosted trees to solve the problem. After
training completes, it'll return an instance of type Booster which we can later use to make future predictions on the dataset. As we have given
the validation set as input, it'll print the validation l2 score after each iteration of training. Please make a note that by default lightgbm
minimizes l2 loss for regression problems.
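The exact training code is not reproduced above; below is a minimal sketch of it, where the split ratio and random seed are assumptions made for illustration.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target,
                                                     train_size=0.90, random_state=42)

# Wrap numpy arrays into lightgbm's internal Dataset data structure.
train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

# Train an ensemble of 10 trees; l2 is the default loss/metric for regression.
booster = lgb.train({"objective": "regression", "verbosity": -1},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)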
Below we have made predictions on train and test data using a trained booster. We have then calculated R2 metrics for both using the sklearn
metric method. Please make a note that the predict() method accepts numpy array, pandas dataframe, scipy sparse matrix, or h2o data
table’s frame as input for making predictions.
If you are interested in learning the list of available metrics in scikit-learn then please feel free to check our tutorial on the same.
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
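A minimal sketch of the R2 evaluation described above, using sklearn's r2_score:

from sklearn.metrics import r2_score

print("Test  R2 Score : %.2f" % r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f" % r2_score(Y_train, train_preds))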
The predict() method has a few important parameters which can be used to make different kinds of predictions.
raw_score - It's a boolean parameter which, if set to True, will return raw predictions. For regression problems, this won't make any difference, but for classification problems, it'll return raw function values as output rather than probabilities.
pred_leaf - This parameter accepts boolean values which if set to True will return an index of leaf in each tree that was predicted for a
particular sample. The size of the output will be n_samples x n_trees .
pred_contrib - It returns an array of feature contributions for each sample. It'll return an array of size (n_features + 1) for each sample of data, where the last value is the expected value and the first n_features values are the contributions of features in making that prediction. We can add the contribution of each feature to the last expected value and we'll get the actual prediction. These are commonly referred to as SHAP values.
If you are interested in learning about SHAP values and the SHAP package, which lets us visualize these SHAP values in different ways to understand the performance of the model, then please feel free to check our tutorial on the same.
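The sketch below shows these prediction modes, assuming booster is the regression booster trained above; the variable names are only illustrative.

raw_preds = booster.predict(X_test, raw_score=True)     # raw function values
leaf_idxs = booster.predict(X_test, pred_leaf=True)     # shape: (n_samples, n_trees)
shap_vals = booster.predict(X_test, pred_contrib=True)  # shape: (n_samples, n_features + 1)

# Summing the per-feature contributions and the expected value recovers the prediction.
print(shap_vals[0].sum(), booster.predict(X_test[:1])[0])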
We can call the num_trees() method on the booster instance to get the number of trees in the ensemble. Please make a note that if we don't stop training early then the number of trees will be the same as num_boost_round. But if we stop training early then the number of trees will be different from num_boost_round. We have explained later in this tutorial how we can stop training if the ensemble's performance is not improving when evaluated on the validation set.
booster.num_trees()
10
The booster instance has another important method named feature_importance() which can return us the importance of features based on
gain and split values of the trees.
booster.feature_importance(importance_type="gain")
booster.feature_importance(importance_type="split")
Binary Classification
In this section, we have explained how we can use the train() method to create a booster for a binary classification problem. We are training the model on the breast cancer dataset and later evaluating its accuracy using a metric from sklearn. We have set the objective to binary to inform the train() method that we'll be giving data for a binary classification problem. We have also set the verbosity parameter value to -1 in order to prevent training messages. It'll still print validation set evaluation results, which can be turned off by setting the verbose_eval parameter to False.
Please make a note that for classification problems the predict() method of the booster returns probabilities. We have included logic to convert probabilities to the target class.
LightGBM evaluates the binary log loss function by default on the validation set for binary classification problems. We can set the metric parameter in the dictionary which we give to the train() method to any metric name available with lightgbm and it'll evaluate that metric. We'll later explain the list of available metrics with lightgbm.
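The training code for this example is not reproduced above; a minimal sketch follows, where the split parameters are assumptions made for illustration.

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target,
                                                     stratify=breast_cancer.target,
                                                     train_size=0.85, random_state=42)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=breast_cancer.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=breast_cancer.feature_names.tolist())

booster = lgb.train({"objective": "binary", "verbosity": -1},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)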
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
test_preds = (test_preds > 0.5).astype(int)   # convert probabilities to class labels
train_preds = (train_preds > 0.5).astype(int)
MultiClass Classification
As a part of this section, we have explained how we can use the train() method for multi-class classification problems. We are using it on the wine dataset which has three different types of wine as the target variable. We have set the objective function to multiclass. We need to provide the num_class parameter with an integer specifying the number of classes whenever we are using the method for multi-class classification problems.
The predict() method returns the probabilities of each class in case of multi-class problems. We have included logic to select the class with
maximum probability as a prediction.
LightGBM evaluates the multi-class log loss function by default on the validation set for multi-class classification problems.
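The training code is again not reproduced above; a minimal sketch follows (split parameters are assumptions).

X_train, X_test, Y_train, Y_test = train_test_split(wine.data, wine.target,
                                                     stratify=wine.target,
                                                     train_size=0.85, random_state=42)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=list(wine.feature_names))
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=list(wine.feature_names))

booster = lgb.train({"objective": "multiclass", "num_class": 3, "verbosity": -1},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)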
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
test_preds = np.argmax(test_preds, axis=1)   # select the class with maximum probability
train_preds = np.argmax(train_preds, axis=1)
Important Parameters of LightGBM
objective - This parameter lets us define an objective function to use for the task. The default value of this parameter is regression. Below is a list of commonly used values for this parameter.
regression
regression_l1
tweedie
binary
multiclass
multiclassova
cross_entropy
Other available objective functions
metric - This parameter accepts metrics to be evaluated on evaluation datasets, if evaluation datasets are provided via the eval_set/valid_sets parameters. We can provide more than one metric and all will be evaluated on the validation sets. Below is a list of commonly used values for metrics.
rmse
l2
l1
tweedie
binary_logloss
multi_logloss
auc
cross_entropy
Other available metrics
boosting - This parameter accepts one of the below-mentioned string specifying which algorithm to use.
gbdt - Default. Gradient Boosting Decision Tree
rf - Random Forest
dart - Dropouts meet Multiple Additive Regression Trees
goss - Gradient-based One-Side Sampling
num_iterations - This parameter is an alias for num_boost_round which lets us specify the number of trees in the ensemble. The default is 100.
learning_rate - This parameter accepts a learning rate to use for the training process. The default is 0.1 .
num_class - If we are working with multi-class classification problems then we need to provide a number of classes to this parameter.
num_leaves - This parameter accepts integer specifying the number of max leaves allowed per tree. The default is 31 .
num_threads - It accepts an integer specifying the number of threads to use for training. We can set it to the number of real CPU cores of the system.
seed - This lets us specify a seed for training which lets us reproduce the same results.
max_depth - This parameter lets us specify the maximum depth allowed for trees in the ensemble. The default is -1 which lets trees grow as deep as possible. We can restrict this behavior by setting this parameter.
min_data_in_leaf - This parameter accepts integer value specifying a minimum number of data points that can be kept in one leaf.
This parameter can be used to control overfitting. The default value is 20 .
bagging_fraction - This parameter accepts a float value between 0-1, specifying the fraction of the data to be randomly selected for each training iteration. This parameter can help prevent overfitting. The default is 1.0.
feature_fraction - This parameter accepts a float value between 0-1 that informs the algorithm to select that fraction of features from
the total for training at each iteration. The default is 1.0 hence selecting all features.
extra_trees - This parameter accepts boolean values specifying whether to use an extremely randomized tree or not.
early_stopping_round - This parameter accepts an integer specifying that we should stop training if the evaluation metric on the last evaluation set has not improved for that many iterations.
monotone_constraints - This parameter lets us specify whether our model should enforce increasing, decreasing, or no relation of an
individual feature with the target value. We have explained the usage of this parameter in a section named monotonic constraints.
monotone_constraints_method - This parameter accepts one of the below-mentioned string specifying the type of monotonic
constraints to impose.
basic - Basic monotone constraints method which can over constrain the model.
intermediate - It’s a little advanced constraints method which is a little less constraining than the basic method but can take a
little more time.
advanced - It's an advanced constraints method that is less constraining than the basic and intermediate methods but can take more time.
interaction_constraints - This parameter accepts a list of lists where the individual lists specify feature indices which are allowed to interact with one another. We have explained feature interaction in detail in the section on feature interaction constraints.
verbosity - This parameter accepts an integer value for controlling the logging messages printed when training.
<0 - Only fatal errors are displayed.
0 - Error/warning messages are displayed.
1 - Info messages are displayed.
>1 - Debug information is displayed.
is_unbalance - This is a boolean parameter that should be set to True if data is imbalanced. It should be used with binary and multiclassova classification objectives.
device_type - It accepts one of the below string specifying device type of training.
cpu
gpu
force_col_wise - This parameter accepts a boolean value specifying whether to force column-wise histogram building when training. If data has too many columns then setting this parameter to True can speed up the training process and reduce memory usage.
force_row_wise - This parameter accepts a boolean value specifying whether to force row-wise histogram building when training. If data has too many rows then setting this parameter to True can speed up the training process, though it may increase memory usage.
Please make a note that this is not the full list of parameters available with lightgbm but only a few important parameters list. If you are
interested in learning about all parameters then please feel free to check the below link.
LightGBM Full Parameters List
LGBMModel
LGBMModel class is a wrapper around the Booster class that provides a scikit-learn like API for training and prediction in lightgbm. It lets us create an estimator object with a list of parameters as input. We can then call the fit() method giving it train data for training and the predict() method for making predictions. The parameters which we gave as a dictionary to the params parameter of train() can now be given directly to the constructor of LGBMModel to create a model. LGBMModel lets us perform both classification and regression tasks by specifying the objective of the task.
Regression
Below we have explained with a simple example of how we can use LGBMModel to perform regression tasks with Boston housing data. We
have first created an instance of LGBMModel with the objective as regression and number of trees set to 10. The n_estimators parameter is
an alias of num_boost_round parameter of train() method.
We have then called the fit() method for training the model, giving it the train data. Please make a note that it accepts numpy arrays as input and not the lightgbm Dataset object. We have also given a dataset to be used as an evaluation set and metrics to be evaluated on the evaluation dataset. The parameters of the fit() method are almost the same as those of the train() method.
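A minimal sketch of the example described above; the split parameters and seed are assumptions made for illustration.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target,
                                                     train_size=0.90, random_state=42)

booster = lgb.LGBMModel(objective="regression", n_estimators=10)

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="rmse")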
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Binary Classification
Below we have explained with a simple example how we can use LGBMModel for classification tasks. We have trained the model on the breast cancer dataset. Please make a note that the predict() method returns probabilities. We have included logic to calculate the class from probabilities.
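A minimal sketch of the training step (split parameters are assumptions made for illustration):

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target,
                                                     stratify=breast_cancer.target,
                                                     train_size=0.85, random_state=42)

booster = lgb.LGBMModel(objective="binary", n_estimators=10)

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="auc")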
from sklearn.model_selection import train_test_split
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
test_preds = (test_preds > 0.5).astype(int)   # convert probabilities to class labels
train_preds = (train_preds > 0.5).astype(int)
LGBMRegressor
LGBMRegressor is another wrapper estimator around the Booster class provided by lightgbm which has the same API as that of sklearn estimators. As its name suggests, it's designed for regression tasks. LGBMRegressor is almost the same as LGBMModel, with the only difference that it's designed for regression tasks only. Below we have explained the usage of LGBMRegressor with a simple example using the Boston housing dataset. Please make a note that LGBMRegressor provides the score() method which evaluates the R2 score for us, which until now we computed using the sklearn metrics method.
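A minimal sketch of using LGBMRegressor and its score() method; the split parameters are assumptions made for illustration.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target,
                                                     train_size=0.90, random_state=42)

booster = lgb.LGBMRegressor(objective="regression", n_estimators=10)
booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),], eval_metric="rmse")

# score() evaluates the R2 score directly.
print("Test  R2 Score : %.2f" % booster.score(X_test, Y_test))
print("Train R2 Score : %.2f" % booster.score(X_train, Y_train))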
LGBMClassifier
LGBMClassifier is one more wrapper estimator around the Booster class that provides a sklearn-like API for classification tasks. It works
exactly like LGBMModel but for only classification tasks. It also provides a score() method which evaluates the accuracy of data passed to it.
Please make a note that LGBMClassifier predicts actual class labels for the classification tasks with the predict() method. It provides the
predict_proba() method if we want probabilities of target classes.
Binary Classification
Below we have explained with a simple example of how we can use LGBMClassifier for binary classification tasks. We have explained its usage
with the Breast cancer dataset.
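A minimal sketch of this example (split parameters are assumptions made for illustration):

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target,
                                                     stratify=breast_cancer.target,
                                                     train_size=0.85, random_state=42)

booster = lgb.LGBMClassifier(n_estimators=10)
booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),])

test_preds = booster.predict(X_test)        # actual class labels
test_probs = booster.predict_proba(X_test)  # class probabilities

print("Test Accuracy : %.2f" % booster.score(X_test, Y_test))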
Multi-Class Classification
Below we have explained the usage of LGBMClassifier for multi-class classification tasks using the Wine classification dataset.
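A minimal sketch on the wine dataset (split parameters are assumptions made for illustration):

X_train, X_test, Y_train, Y_test = train_test_split(wine.data, wine.target,
                                                     stratify=wine.target,
                                                     train_size=0.85, random_state=42)

booster = lgb.LGBMClassifier(n_estimators=10)
booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),])

print("Test Accuracy : %.2f" % booster.score(X_test, Y_test))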
Please make a note that LGBMModel, LGBMRegressor and LGBMClassifier provide an attribute named booster_ which returns an instance of the Booster class which we can save to disk after training and later load for prediction.
booster.booster_
<lightgbm.basic.Booster at 0x7f21e9f69eb8>
Saving and Loading Models
save_model() - This method takes as input a file name to which the model will be saved.
model_to_string() - This method returns a string representation of the model which we can then save to a text file.
lightgbm.Booster() - This constructor lets us create an instance of the Booster class. It has two important parameters that can help us
load a model from a file or from a string.
model_file - This parameter accepts the file name from which to load the trained model.
model_str - This parameter accepts a string that has information about the trained model. We need to give a string that was
generated using model_to_string() to this parameter after loading from the file.
Below we have explained with simple examples how we can use above mentioned methods to save models to a disk and then load it.
Please make a note that in order to save a model trained using LGBMModel, LGBMRegressor, and LGBMClassifier, we first need to get their Booster instance by using the booster_ attribute of the estimator and then save it. LGBMModel, LGBMRegressor, and LGBMClassifier do not provide saving and loading functionalities. It's only available with the Booster instance.
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
booster.save_model("lgb.model")
<lightgbm.basic.Booster at 0x7f08e8967c50>
loaded_booster = lgb.Booster(model_file="lgb.model")
loaded_booster
<lightgbm.basic.Booster at 0x7f08e8e744a8>
test_preds = loaded_booster.predict(X_test)
train_preds = loaded_booster.predict(X_train)
model_as_str = booster.model_to_string()
open("booster2.model", "w").write(model_as_str)
model_str = open("booster2.model").read()
booster_frm_str = lgb.Booster(model_str=model_str)
<lightgbm.basic.Booster at 0x7f08e8938940>
from sklearn.metrics import r2_score
test_preds = booster_frm_str.predict(X_test)
train_preds = booster_frm_str.predict(X_train)
Cross Validation
Lightgbm lets us perform cross-validation using the cv() method. It accepts model parameters as a dictionary like the train() method. We can then give a dataset on which to perform cross-validation. It performs 5-fold cross-validation by default. We can change the number of folds by setting the nfold parameter. It also accepts sklearn's data splitters like KFold, StratifiedKFold, ShuffleSplit, and StratifiedShuffleSplit. We can provide these data splitters to the folds parameter of the method.
The cv() method returns a dictionary that has information about the mean and standard deviation of the loss for each round of training. We can even ask the method to return an instance of CVBooster by setting the return_cvbooster parameter to True. The CVBooster object holds the boosters trained on the individual folds during cross-validation.
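The exact calls that produced the outputs below aren't reproduced here, but a minimal cv() sketch on the breast cancer data looks like this (the parameters are assumptions made for illustration).

train_dataset = lgb.Dataset(breast_cancer.data, label=breast_cancer.target)

# Returns a dict of per-round mean/stdv metric values; passing return_cvbooster=True
# additionally returns the trained boosters under the 'cvbooster' key.
cv_results = lgb.cv({"objective": "binary", "verbosity": -1},
                    train_set=train_dataset, num_boost_round=10, nfold=5)
cv_results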
{'binary_logloss-mean': [0.5862971048268162,
0.536385329057131,
0.4946178001035051,
0.4577660981720048,
0.42757828019512817,
0.40059432541714546,
0.3787432348470402,
0.355943799374708,
0.3417565456639551,
0.3243928378974005],
'binary_logloss-stdv': [0.008145979941642538,
0.013910430256742287,
0.02139399288171927,
0.026698647074055896,
0.0317980957740354,
0.03473655291456087,
0.039345850387526374,
0.04066125361064387,
0.04311758960643671,
0.04399410008603076]}
[1] cv_agg's auc: 0.891975 + 0.0243025 cv_agg's average_precision: 0.903601 + 0.0403935
[2] cv_agg's auc: 0.947531 + 0.0218243 cv_agg's average_precision: 0.966003 + 0.0157877
[3] cv_agg's auc: 0.959877 + 0.0340906 cv_agg's average_precision: 0.97341 + 0.0230962
[4] cv_agg's auc: 0.962963 + 0.0302406 cv_agg's average_precision: 0.976702 + 0.018958
[5] cv_agg's auc: 0.969136 + 0.0314754 cv_agg's average_precision: 0.980817 + 0.0197982
[6] cv_agg's auc: 0.975309 + 0.0230967 cv_agg's average_precision: 0.985447 + 0.0135086
[7] cv_agg's auc: 0.975309 + 0.0230967 cv_agg's average_precision: 0.985447 + 0.0135086
[8] cv_agg's auc: 0.975309 + 0.0230967 cv_agg's average_precision: 0.985447 + 0.0135086
[9] cv_agg's auc: 0.975309 + 0.0230967 cv_agg's average_precision: 0.985447 + 0.0135086
[10] cv_agg's auc: 0.969136 + 0.0314754 cv_agg's average_precision: 0.980817 + 0.0197982
cvbooster = cv_output['cvbooster']
cvbooster.boosters
[<lightgbm.basic.Booster at 0x7f21e96937b8>,
<lightgbm.basic.Booster at 0x7f21e90dfc88>,
<lightgbm.basic.Booster at 0x7f21e9693240>]
Plotting Functionality
Lightgbm provides a list of the below-mentioned plotting functions.
plot_importance()
This method accepts a booster instance and plots feature importance using it. Below we have created a feature importance plot using the booster trained earlier for the regression task. The method has a parameter named importance_type; if it is set to the string split (the default), it plots the number of times each feature was used for splitting, and if set to the string gain, it plots the total gain of the splits which use the feature. The plot_importance() method has another important parameter max_num_features which accepts an integer specifying how many features to include in the plot. We can limit the number of features using this parameter as it'll include only that many top features in the plot.
lgb.plot_importance(booster, figsize=(8,6));
plot_metric()
This method plots the results of an evaluation metric. We need to give a booster instance to the method in order to plot an evaluation metric
evaluated on the evaluation dataset.
from sklearn.model_selection import train_test_split
booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse", eval_names = ["Validation Set"],
feature_name=boston.feature_names.tolist()
)
lgb.plot_metric(booster, figsize=(8,6));
plot_split_value_histogram()
This method takes as input booster instance and feature name/index. It then plots a split value histogram for the feature.
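For example (the choice of feature below is only an illustration, assuming booster was trained with the Boston feature names):

lgb.plot_split_value_histogram(booster, feature="LSTAT", figsize=(8,6));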
plot_tree()
This method lets us plot the individual tree of the ensemble. We need to give a booster instance and index of the tree which we want to plot to
it.
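A minimal example plotting the first tree of the ensemble (tree_index selects which tree to draw):

# Requires the graphviz library to be installed for rendering.
lgb.plot_tree(booster, tree_index=0, figsize=(15,9));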
Early Stopping Training
Lightgbm lets us stop training early if the evaluation metric is not improving on the validation set for a given number of consecutive rounds, using the early_stopping_rounds parameter of the train()/fit() methods. Please make a note that we need an evaluation dataset in order for this to work as it's based on evaluation metric results evaluated on the evaluation dataset.
Below we have explained the usage of the parameter early_stopping_rounds for regression and classification tasks with simple examples (a sketch of the training call follows).
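The training call that produced the log below is not reproduced above; based on the log message about 5 rounds, it looked roughly like the following sketch (the total number of boosting rounds and the split seed are assumptions).

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, random_state=123)

train_dataset = lgb.Dataset(X_train, Y_train)
test_dataset = lgb.Dataset(X_test, Y_test)

# Stop training when the validation rmse has not improved for 5 consecutive rounds.
booster = lgb.train({"objective": "regression", "metric": "rmse", "verbosity": -1},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=100, early_stopping_rounds=5)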
from sklearn.model_selection import train_test_split
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)
[1] valid_0's rmse: 8.82485
Training until validation scores don't improve for 5 rounds
[2] valid_0's rmse: 8.09497
[3] valid_0's rmse: 7.46686
[4] valid_0's rmse: 6.90991
[5] valid_0's rmse: 6.4172
[6] valid_0's rmse: 5.99212
[7] valid_0's rmse: 5.62928
[8] valid_0's rmse: 5.30155
[9] valid_0's rmse: 5.05191
[10] valid_0's rmse: 4.84863
[11] valid_0's rmse: 4.63474
[12] valid_0's rmse: 4.44933
[13] valid_0's rmse: 4.28644
[14] valid_0's rmse: 4.15939
[15] valid_0's rmse: 4.01791
[16] valid_0's rmse: 3.92719
[17] valid_0's rmse: 3.82892
[18] valid_0's rmse: 3.77695
[19] valid_0's rmse: 3.69585
[20] valid_0's rmse: 3.64548
[21] valid_0's rmse: 3.58403
[22] valid_0's rmse: 3.54853
[23] valid_0's rmse: 3.51134
[24] valid_0's rmse: 3.4976
[25] valid_0's rmse: 3.45016
[26] valid_0's rmse: 3.42836
[27] valid_0's rmse: 3.41483
[28] valid_0's rmse: 3.40661
[29] valid_0's rmse: 3.39959
[30] valid_0's rmse: 3.38903
[31] valid_0's rmse: 3.37894
[32] valid_0's rmse: 3.35784
[33] valid_0's rmse: 3.37572
[34] valid_0's rmse: 3.3732
[35] valid_0's rmse: 3.35426
[36] valid_0's rmse: 3.35484
[37] valid_0's rmse: 3.34265
[38] valid_0's rmse: 3.33666
[39] valid_0's rmse: 3.33256
[40] valid_0's rmse: 3.33374
[41] valid_0's rmse: 3.32778
[42] valid_0's rmse: 3.33335
[43] valid_0's rmse: 3.33888
[44] valid_0's rmse: 3.34715
[45] valid_0's rmse: 3.32557
[46] valid_0's rmse: 3.34178
[47] valid_0's rmse: 3.3474
[48] valid_0's rmse: 3.33983
[49] valid_0's rmse: 3.33105
[50] valid_0's rmse: 3.3198
[51] valid_0's rmse: 3.31533
[52] valid_0's rmse: 3.31672
[53] valid_0's rmse: 3.32232
[54] valid_0's rmse: 3.3158
[55] valid_0's rmse: 3.31626
[56] valid_0's rmse: 3.32085
Early stopping, best iteration is:
[51] valid_0's rmse: 3.31533
from sklearn.model_selection import train_test_split
booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),],
early_stopping_rounds=3)
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Lightgbm provides early stopping training functionality using the early_stopping() callback function as well. We can give the number of rounds to the early_stopping() function and pass that function to the callbacks parameter of the train()/fit() methods. We have explained callbacks in an upcoming section.
from sklearn.model_selection import train_test_split
booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),],
callbacks=[lgb.early_stopping(3)]
)
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Feature Interaction Constraints
Below we have explained with a simple example how we can force feature interaction constraints on an estimator in lightgbm. Lightgbm estimators provide a parameter named interaction_constraints which accepts a list of lists where the individual lists specify indices of features that are allowed to interact with one another.
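A minimal sketch on the Boston data follows; the particular feature groupings are arbitrary choices made for illustration.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target,
                                                     train_size=0.90, random_state=42)

# Features within each sub-list are only allowed to interact with each other.
booster = lgb.LGBMRegressor(objective="regression", n_estimators=10,
                            interaction_constraints=[[0, 1, 2, 11, 12], [3, 4, 5], [6, 7, 8, 9, 10]])

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="rmse")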
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Train/Test Sizes : (455, 13) (51, 13) (455,) (51,)
booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse",
)
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Monotonic Constraints
Lightgbm lets us specify monotonic constraints on a model that specify whether an individual feature has an increasing, decreasing, or no relation with the target value. It lets us specify monotone values of -1, 0, and 1, forcing the model to impose a decreasing, none, or increasing relationship of the feature with the target. We can provide a list with the same length as the number of features, specifying 1, 0, or -1 for the monotonic relationship, using the monotone_constraints parameter. We have explained below with a simple example how we can enforce monotonic constraints in lightgbm.
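A minimal sketch on the Boston data follows; which features to constrain is an arbitrary choice made for illustration (RM is forced to have an increasing relationship with the price and LSTAT a decreasing one).

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target,
                                                     train_size=0.90, random_state=42)

# One entry per feature: 1 -> increasing, -1 -> decreasing, 0 -> no constraint.
feature_names = list(boston.feature_names)
constraints = [0] * len(feature_names)
constraints[feature_names.index("RM")] = 1
constraints[feature_names.index("LSTAT")] = -1

booster = lgb.LGBMRegressor(objective="regression", n_estimators=10,
                            monotone_constraints=constraints)

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="rmse")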
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Train/Test Sizes : (455, 13) (51, 13) (455,) (51,)
booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse",
)
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Custom Objective/Loss Function
Lightgbm lets us define a custom objective/loss function for training. Below we have designed the mean squared error objective function, which returns the gradient and hessian of the loss. We have then given this function to the objective parameter of LGBMModel for explanation purposes.
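The function definition is not reproduced above; a minimal sketch of a mean squared error objective for the sklearn-like API, which must return the gradient and hessian of the loss with respect to the predictions, could look like this (the function name and training data are assumptions):

def mean_squared_error_objective(y_true, y_pred):
    grad = 2 * (y_pred - y_true)       # first derivative of (y_pred - y_true)^2
    hess = 2 * np.ones(len(y_true))    # second derivative
    return grad, hess

booster = lgb.LGBMModel(objective=mean_squared_error_objective, n_estimators=10)

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="rmse")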
from sklearn.model_selection import train_test_split
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Custom Evaluation Function
Lightgbm also lets us define a custom evaluation metric. We need to give a reference to this function as the value of the parameter feval if we are using the train() method to design our estimator. If we are using a sklearn-like estimator then we need to give this function to the eval_metric parameter of the fit() method.
Below we have explained with simple examples how we can use custom evaluation metrics with lightgbm.
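A minimal sketch of a custom mean absolute error metric for the sklearn-like API; such a function must return the metric name, its value, and whether higher values are better (the function name and training data are assumptions):

def mean_absolute_error(y_true, y_pred):
    return "MAE", np.abs(y_true - y_pred).mean(), False  # False -> lower is better

booster = lgb.LGBMModel(objective="regression", n_estimators=10)

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric=mean_absolute_error)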
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Train/Test Sizes : (455, 13) (51, 13) (455,) (51,)
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Callbacks
Lightgbm provides users with a list of callback functions for different purposes that get executed after each iteration of training. Below is a list of available callback functions with lightgbm:
early_stopping(stopping_rounds) - This callback function accepts an integer specifying that we should stop training if the evaluation metric results on the last evaluation set have not improved for that many iterations.
print_evaluation(period, show_stdv) - This callback function accepts an integer value specifying how often to print evaluation results. Evaluation metric results are printed every that many iterations.
record_evaluation(eval_result) - This callback function accepts a dictionary in which evaluation results will be recorded.
reset_parameter() - This callback function lets us reset the learning rate after each iteration of training. It accepts an array of the same size as the number of iterations, or a callable that returns the new learning rate for each iteration.
The callbacks parameter which is available with the train() method and the fit() method of estimators accepts a list of callback
functions.
Below we have explained with simple examples of how we can use different callback functions. The explanation of the early_stopping()
callback function has been covered in the early stopping training section of this tutorial.
from sklearn.model_selection import train_test_split
booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse", verbose=False,
callbacks=[lgb.callback.print_evaluation(period=3)])
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
evals_results = {}
booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse", verbose=False,
callbacks=[lgb.print_evaluation(period=3), lgb.record_evaluation(evals_results)])
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse",
callbacks=[lgb.reset_parameter(learning_rate=np.linspace(0.1,1,10).tolist())])
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)
[1] valid_0's rmse: 19.224
[2] valid_0's rmse: 12.167
[3] valid_0's rmse: 6.42527
[4] valid_0's rmse: 4.44198
[5] valid_0's rmse: 4.22668
[6] valid_0's rmse: 4.43308
[7] valid_0's rmse: 4.29187
[8] valid_0's rmse: 4.47696
[9] valid_0's rmse: 4.5301
[10] valid_0's rmse: 4.64636
This ends our small tutorial explaining the API of LightGBM. Please feel free to let us know your views in the comments section.