Gradient boosting trees for auto insurance loss cost modeling and prediction
Leo Guelman
Royal Bank of Canada, RBC Insurance, 6880 Financial Drive, Mississauga, Ontario, Canada L5N 7Y5
Keywords: Statistical learning; Gradient boosting trees; Insurance pricing

Abstract

Gradient Boosting (GB) is an iterative algorithm that combines simple parameterized functions with "poor" performance (high prediction error) to produce a highly accurate prediction rule. In contrast to other statistical learning methods usually providing comparable accuracy (e.g., neural networks and support vector machines), GB gives interpretable results, while requiring little data preprocessing and tuning of the parameters. The method is highly robust to less than clean data and can be applied to classification or regression problems from a variety of response distributions (Gaussian, Bernoulli, Poisson, and Laplace). Complex interactions are modeled simply, missing values in the predictors are managed almost without loss of information, and feature selection is performed as an integral part of the procedure. These properties make GB a good candidate for insurance loss cost modeling. However, to the best of our knowledge, the application of this method to insurance pricing has not been fully documented to date. This paper presents the theory of GB and its application to the problem of predicting auto "at-fault" accident loss cost using data from a major Canadian insurer. The predictive accuracy of the model is compared against the conventional Generalized Linear Model (GLM) approach.
The predictive learning problem can be characterized by a vector of inputs or predictor variables x = {x_1, ..., x_p} and an output or target variable y. In this application, the input variables are represented by a collection of quantitative and qualitative attributes of the vehicle and the insured, and the output is the actual loss cost. Given a collection of M instances {(y_i, x_i); i = 1, ..., M} of known (y, x) values, the goal is to use this data to obtain an estimate of the function that maps the input vector x into the values of the output y. This function can then be used to make predictions on instances where only the x values are observed. Formally, we wish to learn a prediction function \hat{f}(x): x -> y that minimizes the expectation of some loss function L(y, f) over the joint distribution of all (y, x)-values

\hat{f}(x) = \arg\min_{f(x)} E_{y,x} L(y, f(x))    (1)

Boosting methods are based on the intuitive idea that combining many "weak" rules to approximate (1) should result in classification and regression models with improved predictive performance compared to a single model. A weak rule is a learning algorithm which performs only a little better than a coin flip. The aim is to characterize "local rules" relating variables (e.g., "if an insured characteristic A is present and B is absent, then a claim has a high probability of occurring"). Although such a rule alone would not be strong enough to make accurate predictions on all insureds, it is possible to combine many of those rules to produce a highly accurate model. This idea, known as the "strength of weak learnability" (Schapire, 1990), originated in the machine learning community with the introduction of AdaBoost, which is described in the next section.

The success of AdaBoost for classification problems was seen as a mysterious phenomenon by the statistics community until Friedman, Hastie, and Tibshirani (2000) showed the connection between boosting and statistical concepts such as additive modeling and maximum likelihood. Their main result is that it is possible to rederive AdaBoost as a method for fitting an additive model in a forward stagewise manner. This gave significant understanding of why this algorithm tends to outperform a single base model: by fitting an additive model of different and potentially simple functions, it expands the class of functions that can be approximated.

4. Additive models and boosting

Our discussion in this section will be focused on the regression problem, where the output y is quantitative and the objective is to estimate the mean E(y|x) = f(x). The standard linear regression model assumes a linear form for this conditional expectation

E(y|x) = f(x) = \sum_{j=1}^{p} \beta_j x_j    (2)

An additive model extends the linear model by replacing the linear component \eta = \sum_{j=1}^{p} \beta_j x_j with an additive predictor of the form \eta = \sum_{j=1}^{p} f_j(x_j). We assume

E(y|x) = f(x) = \sum_{j=1}^{p} f_j(x_j),    (3)

where each f_j is an unspecified ("nonparametric") function of the corresponding predictor.
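To make the additive specification in (3) concrete, the short R sketch below fits an additive model with smoothing splines via the mgcv package. The simulated data, variable names and settings are illustrative assumptions only and are not part of the paper's analysis.

library(mgcv)

set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 * x1 + sin(2 * pi * x2) + rnorm(n, sd = 0.3)

# E(y|x) = f1(x1) + f2(x2), with each f_j estimated nonparametrically
fit <- gam(y ~ s(x1) + s(x2))
summary(fit)
plot(fit, pages = 1)   # plots the estimated component functions f_j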
Boosting approximates the regression function by an additive expansion of the form

f(x) = \sum_{t=1}^{T} \beta_t h(x; a_t),    (4)

where h(x; a) represents the "weak learner" and f(x) the weighted majority vote of the individual weak learners. Estimation of the parameters in (4) amounts to solving

\min_{\{\beta_t, a_t\}_{1}^{T}} \sum_{i=1}^{M} L\left(y_i, \sum_{t=1}^{T} \beta_t h(x_i; a_t)\right),    (5)

where L(y, f(x)) is the loss function chosen in (1) to define lack-of-fit. A "greedy" forward stagewise method solves (5) by sequentially fitting a single weak learner and adding it to the expansion of prior fitted terms. The corresponding solution values of each new fitted term are not readjusted as new terms are added into the model. This is outlined in Algorithm 2.

Algorithm 2. Forward Stagewise Additive Modeling
1: Initialize f_0(x) = 0
2: for t = 1 to T do
3:   Obtain estimates \beta_t and a_t by minimizing \sum_{i=1}^{M} L(y_i, f_{t-1}(x_i) + \beta h(x_i; a))
4:   Update f_t(x) = f_{t-1}(x) + \beta_t h(x; a_t)
5: end for
6: Output \hat{f}(x) = f_T(x)

If squared-error is used as the loss function, line 3 simplifies to

L(y_i, f_{t-1}(x_i) + \beta h(x_i; a)) = (y_i - f_{t-1}(x_i) - \beta h(x_i; a))^2 = (r_{it} - \beta h(x_i; a))^2,    (6)

where r_{it} is the residual of the ith observation at the current iteration. Thus, for squared-error loss, the term \beta_t h(x; a_t) fitted to the current residuals is added to the expansion in line 4. It is also fairly easy to show (Hastie et al., 2001) that the AdaBoost algorithm described in Section 3 is equivalent to forward stagewise modeling based on an exponential loss function of the form L(y, f(x)) = exp(-y f(x)).

5. Gradient boosting trees

Squared-error and exponential error are plausible loss functions commonly used for regression and classification problems, respectively. However, there may be situations in which other loss functions are more appropriate. For instance, binomial deviance is far more robust than exponential loss in noisy settings where the Bayes error rate is not close to zero, or in situations where the target classes are mislabeled. Similarly, the performance of squared-error significantly degrades for long-tailed error distributions or in the presence of "outliers" in the data. In such situations, other functions such as absolute error or Huber loss are more appropriate.

Under these alternative specifications for the loss function and for a particular weak learner, the solution to line 3 in Algorithm 2 is difficult to obtain. The gradient boosting algorithm solves the problem using a two-step procedure which can be applied to any differentiable loss function. The first step estimates a_t by fitting a weak learner h(x; a) to the negative gradient of the loss function (i.e., the "pseudo-residuals") using least-squares. In the second step, the optimal value of \beta_t is determined given h(x; a_t). The procedure is shown in Algorithm 3.

Algorithm 3. Gradient Boosting
1: Initialize f_0(x) to be a constant, f_0(x) = \arg\min_{\beta} \sum_{i=1}^{M} L(y_i, \beta)
2: for t = 1 to T do
3:   Compute the negative gradient as the working response
     r_i = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{t-1}(x)}, i = 1, ..., M
4:   Fit a regression model to r_i by least-squares using the input x_i and get the estimate a_t of \beta h(x; a)
5:   Get the estimate \beta_t by minimizing \sum_{i=1}^{M} L(y_i, f_{t-1}(x_i) + \beta h(x_i; a_t))
6:   Update f_t(x) = f_{t-1}(x) + \beta_t h(x; a_t)
7: end for
8: Output \hat{f}(x) = f_T(x)

For squared-error loss, the negative gradient in line 3 is just the usual residuals, so in this case the algorithm reduces to standard least-squares boosting. With absolute error loss, the negative gradient is the sign of the residuals. Least-squares is used in line 4 independently of the chosen loss function.
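As a concrete illustration of Algorithm 3, the following R sketch implements the two-step procedure for squared-error loss, using single-split rpart trees as the weak learners h(x; a). The simulated data and all settings are assumptions chosen for illustration; this is not the code used in the paper.

library(rpart)

set.seed(1)
n   <- 1000
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- 2 * dat$x1 + sin(2 * pi * dat$x2) + rnorm(n, sd = 0.3)

T_iter <- 200
f_hat  <- rep(mean(dat$y), n)          # line 1: f_0(x) = argmin_b sum L(y_i, b) = mean(y)

for (t in seq_len(T_iter)) {
  dat$r <- dat$y - f_hat               # line 3: negative gradient = ordinary residuals for squared error
  tree  <- rpart(r ~ x1 + x2, data = dat,
                 control = rpart.control(maxdepth = 1, cp = 0, minsplit = 20))
  h     <- predict(tree, dat)          # line 4: least-squares fit of the weak learner
  beta  <- sum(dat$r * h) / sum(h^2)   # line 5: optimal multiplier (close to 1 for squared error)
  f_hat <- f_hat + beta * h            # line 6: update the expansion
}

mean((dat$y - f_hat)^2)                # training error after T iterations

For a general differentiable loss, line 5 becomes a one-dimensional optimization of the chosen loss rather than the closed-form expression used here.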
Although boosting is not restricted to trees, our work will focus on the case in which the weak learner is a "small" regression tree, since trees have proven to be a convenient representation for the weak learners h(x; a) in the context of boosting. In this specific case, the algorithm above is called gradient boosting trees, and the parameters a_t represent the split variables, their split values and the fitted values at each terminal node of the tree. Henceforth in this paper, the term "Gradient Boosting" will be used to denote gradient boosting trees.

6. Injecting randomness and regularization

Two additional ingredients to the gradient boosting algorithm were proposed by Friedman, namely regularization through shrinkage of the contributed weak learners (Friedman, 2001) and injecting randomness into the fitting process (Friedman, 2002).

The generalization performance of a statistical learning method is related to its prediction capabilities on independent test data. Fitting a model too closely to the training data can lead to poor generalization performance. Regularization methods are designed to prevent "overfitting" by placing restrictions on the parameters of the model. In the context of boosting, this translates into controlling the number of iterations T (i.e., trees) during the training process. An independent test sample or cross-validation can be used to select the optimal value of T. However, an alternative strategy has been shown to provide better results; it relates to scaling the contribution of each tree by a factor s ∈ (0, 1]. This implies changing line 6 in Algorithm 3 to

f_t(x) = f_{t-1}(x) + s \beta_t h(x; a_t)    (7)

The parameter s has the effect of retarding the learning rate of the series, so the series has to be longer to compensate for the shrinkage, but its accuracy is better. Lower values of s will produce a larger value of T for the same test error. Empirically, it has been shown that small shrinkage factors (s < 0.1) yield dramatic improvements over boosting series built with no shrinkage (s = 1). The trade-off is that a small shrinkage factor requires a higher number of iterations, and computational time increases.
A strategy for model selection often used in practice is to set the value of s as small as possible (i.e., between 0.01 and 0.001) and then choose T by early stopping.

The second modification introduced in the algorithm was to incorporate randomness as an integral part of the fitting procedure. This involves taking a simple random sample without replacement, usually of approximately half the size of the full training data set, at each iteration. This sample is then used to fit the weak learner (line 4 in Algorithm 3) and compute the model update for the current iteration. As a result of this randomization procedure, the variance of the individual weak learner estimates at each iteration increases, but there is less correlation between these estimates at different iterations. The net effect is a reduction in the variance of the combined model. In addition, this randomization procedure has the benefit of reducing the computational demand. For instance, taking half-samples reduces computation by almost 50%.
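Both ingredients of this section, shrinkage and random subsampling, together with cross-validated early stopping, are exposed as tuning parameters in the gbm package (Ridgeway, 2007). The sketch below shows one way they are typically combined; the simulated data and parameter values are illustrative assumptions, not the data or the exact settings used in the paper.

library(gbm)

set.seed(1)
n   <- 5000
dat <- data.frame(x1 = runif(n),
                  x2 = runif(n),
                  x3 = factor(sample(letters[1:4], n, replace = TRUE)))
dat$y <- rbinom(n, 1, plogis(-2 + 1.5 * dat$x1 - dat$x2))

fit <- gbm(y ~ x1 + x2 + x3,
           data              = dat,
           distribution      = "bernoulli",  # Bernoulli deviance, as for a frequency-type model
           n.trees           = 5000,         # a deliberately long boosting series
           shrinkage         = 0.01,         # the scaling factor s in Eq. (7)
           interaction.depth = 3,            # size of the individual trees
           bag.fraction      = 0.5,          # random half-samples at each iteration
           cv.folds          = 5)            # cross-validation used to choose T

best_T <- gbm.perf(fit, method = "cv")       # early stopping: T where the cv error stops decreasing
best_T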
7. Interpretation

Accuracy and interpretability are two fundamental objectives of predictive learning. However, these objectives do not always coincide. In contrast to other statistical learning methods providing comparable accuracy (e.g., neural networks and support vector machines), gradient boosting gives interpretable results. An important measure often useful for interpretation is the relative influence of the input variables on the output. For a single decision tree, Breiman, Friedman, Olshen, and Stone (1984) proposed the following measure as an approximation of the relative influence of a predictor x_j

\hat{I}_j^2 = \sum_{\text{all splits on } x_j} \hat{m}_s^2,    (8)

where \hat{m}_s^2 is the empirical improvement in squared-error as a result of using x_j as a splitting variable at the non-terminal node s. For Gradient Boosting, this relative influence measure is naturally extended by averaging (8) over the collection of trees.

Another important interpretation component is given by a visual representation of the partial dependence of the approximation \hat{f}(x) on a subset x_\ell of size \ell < p of the input vector x. The dependency of \hat{f}(x) on the remaining predictors x_c (i.e., x_\ell \cup x_c = x) must be conditioned out. This can be estimated based on the training data by

\hat{f}(x_\ell) = \frac{1}{M} \sum_{i=1}^{M} \hat{f}(x_\ell, x_{ic})    (9)

Note that this method requires predicting the response over the training sample for each set of the joint values of x_\ell, which can be computationally very demanding. However, for regression trees, a weighted traversal method (Friedman, 2001) can be used, from which \hat{f}(x_\ell) is computed using only the tree, without reference to the data itself.
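In practice, both the relative influence measure in (8) and the partial dependence functions in (9) can be extracted directly from a fitted gbm object, as sketched below. The simulated data and variable names are illustrative assumptions; note that summary.gbm rescales the influences to sum to 100, a slightly different normalization from the one used later in Fig. 2, where the top predictor is set to 100.

library(gbm)

set.seed(1)
n   <- 5000
dat <- data.frame(x1 = runif(n),
                  x2 = runif(n),
                  x3 = factor(sample(letters[1:4], n, replace = TRUE)))
dat$y <- rbinom(n, 1, plogis(-2 + 1.5 * dat$x1 - dat$x2))

fit <- gbm(y ~ x1 + x2 + x3, data = dat, distribution = "bernoulli",
           n.trees = 2000, shrinkage = 0.01, interaction.depth = 3,
           bag.fraction = 0.5)

summary(fit, n.trees = 2000)                      # relative influence of each predictor
plot(fit, i.var = "x1", n.trees = 2000)           # partial dependence on a single predictor
plot(fit, i.var = c("x1", "x2"), n.trees = 2000)  # joint partial dependence of two predictors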
8. Application to auto insurance loss cost modeling

8.1. The data

The data used for this analysis were extracted from a large database from a major Canadian insurer. It consists of policy and claim information at the individual vehicle level. There is one observation for each period of time during which the vehicle was exposed to the risk of having an at-fault collision accident. Mid-term changes and policy cancellations result in a corresponding reduction in the exposure period.

The data set includes 426,838 earned exposures (measured in vehicle-years) from Jan-06 to Jun-09, and 14,984 claims incurred during the same period of time, with losses based on best reserve estimates as of Dec-09. The input variables (for an overview, see Table 1) were measured at the start of the exposure period, and are represented by a collection of quantitative and qualitative attributes of the vehicle and the insured. The output is the actual loss cost, which is calculated as the ratio of the total amount of losses to the earned exposure. In practice, insurance legislation may restrict the usage of certain input variables to calculate insurance premiums. Although our analysis was developed assuming a free rating regulatory environment, the techniques described here can be applied independently of the limitations imposed by any specific legislation.

For statistical modeling purposes, we first partitioned the data into train (70%) and test (30%) data sets. The train set was used for model training and selection, and the test set to assess the predictive accuracy of the selected gradient boosting model against the Generalized Linear Model. To ensure that the estimated performance of the model, as measured on the test sample, is an accurate approximation of the expected performance on future "unseen" cases, the inception date of the policies in the test set is posterior to that of the policies used to build and select the model.

Loss cost is usually broken down into two components: claim frequency (calculated as the ratio of the number of claims to the earned exposure) and claim severity (calculated as the ratio of the total amount of losses to the number of claims). Some factors affect claim frequency and claim severity differently, and thus we considered them separately. For the claim frequency model, the target variable was coded as binary since only a few records had more than one claim during a given exposure period. The exposure period was treated as an offset variable in the model (i.e., a variable with a known parameter of 1).

The actual claim frequency measured on the entire sample is 3.51%. This represents an imbalanced or skewed class distribution for the target variable, with one class represented by a large sample (i.e., the non-claimants) and the other represented by only a few (i.e., the claimants). Classification of data with an imbalanced class distribution has posed a significant drawback for the performance attainable by most standard classifier algorithms, which assume a relatively balanced class distribution (Sun, Kamel, Wong, & Wang, 2007). These classifiers tend to output the simplest hypothesis which best fits the data and, as a result, classification rules that predict the small class tend to be fewer and weaker compared to those that predict the majority class. This may hinder the detection of claim predictors and eventually decrease the predictive accuracy of the model. To address this issue, we re-balanced the class distribution for the target in the frequency model by resampling the data space. Specifically, we under-sampled instances from the majority class to attain a 10% representation of claims in the train sample. The test sample was not modified and thus contains the original class distribution for the target. In econometrics, this sampling scheme is known as choice-based or endogenous stratified sampling (Green, 2000), and it is also popular in the computer science community (Chan & Stolfo, 1998; Estabrooks & Japkowicz, 2004). The "optimal" class distribution for the target variable based on under-sampling is generally dependent on the specific data set (Weiss & Provost, 2003), and it is usually considered as an additional tuning parameter to optimize based on the performance measured on a validation sample.

The estimation of a classification model from a balanced sample can be efficient but will overestimate the actual claim frequency. An appropriate statistical method is required to correct this bias, and several alternatives exist for that purpose. In this application, we used the method of prior correction, which fundamentally involves adjusting the predicted values based on the actual claim frequency in the population. This correction is described for the logit model in King and Zeng (2001), and the same method has been successfully used in a boosting application to predict customer churn (Lemmens & Croux, 2006).
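One common form of prior correction simply rescales the predicted odds from the re-balanced training sample back to the population claim frequency (equivalently, it shifts the model intercept by the log of the ratio of the two odds). The function below is a minimal sketch of this idea using the sample and population rates reported above; the function name and its use are illustrative assumptions, not the paper's code.

# adjust predicted probabilities from the 10% re-balanced training rate
# back to the 3.51% population claim frequency
prior_correct <- function(p_model, rate_sample = 0.10, rate_pop = 0.0351) {
  odds_model     <- p_model / (1 - p_model)
  odds_corrected <- odds_model * (rate_pop / (1 - rate_pop)) /
                                 (rate_sample / (1 - rate_sample))
  odds_corrected / (1 + odds_corrected)
}

prior_correct(0.12)   # e.g., a 12% predicted probability on the re-balanced scale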
Table 1. Overview of loss cost predictors.
8.2. Building the model

The first choice in building the model involves selecting an appropriate loss function L(y, f(x)) as in (1). Squared-error loss, \sum_{i=1}^{M} (y_i - f(x_i))^2, and Bernoulli deviance, -2 \sum_{i=1}^{M} [y_i f(x_i) - \log(1 + \exp(f(x_i)))], were used to define prediction error for the severity and frequency models, respectively. Then, it is necessary to select the shrinkage parameter s applied to each tree and the sub-sampling rate as defined in Section 6. The former was set at the fixed value of 0.001 and the latter at 50%. Next, the size of the individual trees S and the number of boosting iterations T (i.e., the number of trees) need to be selected. The size of the trees was selected by sequentially increasing the interaction depth of the tree, starting with an additive model (single-split regression trees), followed by two-way interactions, and up to six-way interactions. This was done in turn for the frequency and severity models. For each of these models, we ran 20,000 boosting iterations using the training data set.

A drawback of the under-sampling scheme described in Section 8.1 is that we may risk losing information from the majority class when it is under-sampled. To maximize the usage of the information available in the training data, the optimal value for the parameters S and T was chosen based on the smallest estimated prediction error using a K-fold cross-validation procedure with K = 10. This involves splitting the training data into K equal parts, fitting the model to K - 1 parts of the data, and then calculating the value of the prediction error on the kth part. This is done for k = 1, 2, ..., K and then the K estimated values for the prediction error are averaged. Using a three-way interaction gave the best results in both the frequency and severity models. Based on this level of interaction, Fig. 1 shows the train and cv-error as a function of the number of iterations for the severity model. The optimal value of T was set at the point at which the cv-error ceased to decrease.

Fig. 1. The relation between train and cross-validation error and the optimal number of boosting iterations (shown by the vertical green line).
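The following R sketch mirrors the tuning procedure just described: for each interaction depth, a long boosting series is run with 10-fold cross-validation, and the smallest cv error and the corresponding optimal number of trees are recorded. The simulated data are placeholders, and the settings (20,000 trees, shrinkage 0.001, 50% subsampling, depths 1 to 6) simply echo the ones stated above, so the loop is slow; it is an illustration of the search, not the paper's code.

library(gbm)

set.seed(1)
n     <- 5000
train <- data.frame(x1 = runif(n),
                    x2 = runif(n),
                    x3 = factor(sample(letters[1:4], n, replace = TRUE)))
train$claim <- rbinom(n, 1, plogis(-3 + 1.5 * train$x1 - train$x2))

depths  <- 1:6
results <- data.frame(depth = depths, best_T = NA, cv_error = NA)

for (d in depths) {
  fit <- gbm(claim ~ x1 + x2 + x3, data = train,
             distribution      = "bernoulli",
             n.trees           = 20000,
             interaction.depth = d,
             shrinkage         = 0.001,
             bag.fraction      = 0.5,
             cv.folds          = 10)
  best_T              <- gbm.perf(fit, method = "cv", plot.it = FALSE)
  results$best_T[d]   <- best_T
  results$cv_error[d] <- fit$cv.error[best_T]
}

results   # pick the interaction depth with the smallest cross-validated error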
The test data set was not used for model selection purposes, but to assess the generalization error of the final chosen model relative to the Generalized Linear Model approach. The latter model was estimated based on the same training data and using Binomial/Gamma distributions for the response variables in the Frequency/Severity models.

Fig. 2 shows the relative influence of the predictor variables for the frequency (left) and severity (right) models. Since these measures are relative, a value of 100 was assigned to the most important predictor and the others were scaled accordingly. There is a clear differential effect between the models. For instance, the number of years licensed of the principal operator of the vehicle is the most relevant predictor in the frequency model, followed by, among others, driving convictions and the age of the principal operator. For the severity model, the vehicle age is the most influential predictor, followed by the price of the vehicle and the horse power to weight ratio. Partial dependence plots offer additional insights into the way these variables affect the dependent variable in each model. Fig. 3 shows the partial dependence plots for the frequency model. The vertical scale is in the log odds and the hash marks at the base of each plot show the deciles of the distribution of the corresponding variable.
Fig. 2. Relative importance of the predictors for the Frequency (left) and Severity (right) models.

Fig. 3. Partial dependence plots for the frequency model (vertical axes: partial dependence in log odds; panels include DC2, PC7 and AC5).
The partial dependence of each predictor accounts for the average joint effect of the other predictors in the model.

Claim frequency has a nonmonotonic partial dependence on years licensed. It decreases over the main body of the data and increases near the end. The partial dependence on age initially decreases abruptly up to a value of approximately 30, followed by a long plateau up to 70, when it steeply increases. The variables vehicle age and postal code risk score have a roughly monotonically decreasing partial dependence.
Fig. 4. Partial dependence plots for the severity model (vertical axes: partial dependence; panels include PC3, DC2 and AC5).
The age of the vehicle is widely recognized as an important predictor in the frequency model (Brockman & Wright, 1992), since it is believed to be negatively associated with annual mileage. It is not a common practice to use annual mileage directly as an input in the model, due to the difficulty in obtaining a reliable estimate for this variable. Claim frequency is also estimated to increase with the number of driving convictions, and it is higher for vehicles with an occasional driver under 25 years of age.

Note that these plots are not necessarily smooth, since there is no smoothness constraint imposed on the fitting procedure. This is a consequence of using a tree-based model. If a smooth trend is observed, it is a result of the estimated nature of the dependence of the predictors on the response and is purely dictated by the data.

Fig. 4 shows the partial dependence plots for the severity model. The nature of the dependence on vehicle age and price of the vehicle is naturally due to the fact that newer and more expensive cars would cost more to repair in the event of a collision. The shape of these curves is fairly linear over the vast majority of the data. The variable horse power to weight ratio measures the actual performance of the vehicle's engine. The upward trend observed in the curve is anticipated, since drivers with high performance engines will generally drive at a higher speed compared to those with low performance engines. All the remaining variables have the expected partial dependence effect on claim severity.

An interesting relationship is given in Fig. 5, which shows the joint dependence between years licensed and horse power to weight ratio on claim severity. There appears to be an interaction effect between these two variables. Claim severity tends to be higher for low values of years licensed, but this relation tends to be much stronger for high values of horse power to weight ratio.

We next compare the predictive accuracy of Gradient Boosting (GB) against the conventional Generalized Linear Model (GLM) approach based on the test sample. This was done by calculating the ratio of the rate we would charge based on the GB model to the rate we would charge based on the GLM. Then we grouped the observations into five fairly equally sized buckets ranked by the ratio. Finally, for each bucket we calculated the GLM-loss ratio, defined as the ratio of the actual losses to the GLM predicted loss cost. Fig. 6 displays the results. Note that the GLM-loss ratio increases whenever the GB model would suggest to charge a higher rate relative to the GLM. The upward trend in the GLM-loss ratio curve indicates the higher predictive performance of GB relative to GLM.
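The comparison just described can be reproduced with a few lines of R: rank the test policies by the ratio of the GB rate to the GLM rate, cut them into quintiles, and compute the GLM-loss ratio within each bucket. The data frame and its columns below are simulated placeholders, not the insurer's test sample.

set.seed(1)
n    <- 10000
test <- data.frame(glm_loss_cost = rgamma(n, shape = 2, scale = 150))
test$gb_loss_cost <- test$glm_loss_cost * exp(rnorm(n, sd = 0.3))
test$actual_loss  <- rpois(n, lambda = 0.04) * rgamma(n, shape = 1, scale = 6000)

rate_ratio <- test$gb_loss_cost / test$glm_loss_cost
bucket     <- cut(rate_ratio,
                  breaks = quantile(rate_ratio, probs = seq(0, 1, by = 0.2)),
                  include.lowest = TRUE, labels = 1:5)

glm_loss_ratio <- tapply(test$actual_loss, bucket, sum) /
                  tapply(test$glm_loss_cost, bucket, sum)
glm_loss_ratio   # an upward trend across buckets favours the GB model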
Fig. 5. Partial dependence of claim severity on years licensed and horse power to weight ratio.
Fig. 6. GLM-loss ratio (actual losses/GLM predicted loss cost) and exposure count by bucket.
9. Discussion

In this paper, we described the theory of Gradient Boosting (GB) and its application to the analysis of auto insurance loss cost modeling. GB was presented as an additive model that sequentially fits a relatively simple function (weak learner) to the current residuals by least-squares. The most important practical steps in building a model using this methodology have been described. Estimating loss cost involves solving regression and classification problems with several challenges. The large number of categorical and numerical predictors, the presence of non-linearities in the data and the complex interactions among the inputs are often the norm. In addition, the data might not be clean and/or might contain missing values for some predictors. GB fits this data structure very well. First, based on the sample data used in this analysis, the level of accuracy in prediction was shown to be higher for GB relative to the conventional Generalized Linear Model approach. This is not surprising since GLMs are, in essence, relatively simple linear models and thus they are constrained by the class of functions they can approximate. Second, as opposed to other non-linear statistical learning methods such as neural networks and support vector machines, GB provides interpretable results via the relative influence of the input variables and their partial dependence plots. This is a critical aspect to consider in a business environment, where models usually must be approved by non-statistically trained decision makers who need to understand how the output from the "black-box" is being produced. Third, GB requires very little data preprocessing, which is one of the most time consuming activities in a data mining project. Lastly, model selection is done as an integral part of the GB procedure, and so it requires little "detective" work on the part of the analyst.

In short, Gradient Boosting is a good alternative method to Generalized Linear Models for building insurance loss cost models. The freely available package gbm implements gradient boosting methods under the R environment for statistical computing (Ridgeway, 2007).

Acknowledgments

I am deeply grateful to Matthew Buchalter and Charles Dugas for thoughtful discussions. Also special thanks to Greg Ridgeway for freely distributing the gbm software package in R. Comments are welcome.

References

Anderson, D., Feldblum, S., Modlin, C., Schirmacher, D., Schirmacher, E., & Thandi, N. (2007). A practitioner's guide to generalized linear models. Casualty Actuarial Society (CAS), Syllabus Year: 2010, Exam Number: 9, 1–116.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. CRC Press.
Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16, 199–231.
Brockman, M., & Wright, T. (1992). Statistical motor rating: Making effective use of your data. Journal of the Institute of Actuaries, 119, 457–543.
Chan, P., & Stolfo, S. (1998). Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection. Proceedings of the International Conference on Knowledge Discovery and Data Mining, 4 (pp. 164–168).
Chapados, N., Bengio, Y., Vincent, P., Ghosn, J., Dugas, C., Takeuchi, I., et al. (2001). Estimating car insurance premia: A case study in high-dimensional data inference. University of Montreal, DIRO Technical Report, 1199.
Estabrooks, T., & Japkowicz, T. (2004). A multiple resampling method for learning from imbalanced data sets. Computational Intelligence, 20, 315–354.
Francis, L. (2001). Neural networks demystified. Casualty Actuarial Society Forum, Winter 2001, 252–319.
Freund, Y., & Schapire, R. (1996). Experiments with a new boosting algorithm. Proceedings of the International Conference on Machine Learning, 13 (pp. 148–156).
Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28, 337–407.
Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189–1232.
Friedman, J. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38, 367–378.
Green, W. (2000). Econometric analysis (4th ed.). Prentice-Hall.
Haberman, S., & Renshaw, A. (1996). Generalized linear models and actuarial science. Journal of the Royal Statistical Society, Series D, 45, 407–436.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Springer.
King, G., & Zeng, L. (2001). Explaining rare events in international relations. International Organization, 55, 693–715.
Kolyshkina, I., Wong, S., & Lim, S. (2004). Enhancing generalised linear models with data mining. Casualty Actuarial Society 2004, Discussion Paper Program.
Lemmens, A., & Croux, C. (2006). Bagging and boosting classification trees to predict churn. Journal of Marketing Research, 43, 276–286.
McCullagh, P., & Nelder, J. (1989). Generalized linear models (2nd ed.). Chapman and Hall.
Ridgeway, G. (2007). Generalized boosted models: A guide to the gbm package. Available from http://cran.r-project.org/web/packages/gbm/index.html.
Schapire, R. (1990). The strength of weak learnability. Machine Learning, 5, 197–227.
Sun, Y., Kamel, M., Wong, A., & Wang, Y. (2007). Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 40, 3358–3378.
Weiss, G., & Provost, F. (2003). Learning when training data are costly: The effect of class distribution on tree induction. Journal of Artificial Intelligence Research, 19, 315–354.