
Insurance: Mathematics and Economics 45 (2009) 296–304


Using quantile regression for rate-making


Andrey A. Kudryavtsev
Department of Economics, St. Petersburg State University, St. Petersburg, Russia

Article history:
Received November 2007
Received in revised form November 2008
Accepted 29 July 2009

Insurance branch category: IB42
JEL classification: C21
Subject category: IM30

Keywords: Regression models; Generalized linear models; Quantile regression; Confidence band; Rate-making; Quantile approach to the net premium rate-making

Abstract

Regression models are popular tools for rate-making in the framework of heterogeneous insurance portfolios; however, the traditional regression methods have some disadvantages, particularly their sensitivity to the assumptions which significantly restrict the area of their applications. This paper is devoted to an alternative approach – quantile regression. It is free of some disadvantages of the traditional models. The quality of estimators for the approach described is approximately the same as, or sometimes better than, that for the traditional regression methods. Moreover, quantile regression is consistent with the idea of using the distribution quantile for rate-making. This paper provides detailed comparisons between the approaches and gives a practical example of using the new methodology.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

The process of rate-making is one of the most important functions of insurers. This paper is devoted to the statistical estimation of net premium rates including safety loadings. Expense loadings, which are used to cover business expenses and to provide a normal profit, are not the subject of the paper.

In order to correctly estimate rates or tariffs, the so-called premium principles are used in actuarial practice—see, for example, Anderson et al. (2004), Bühlmann (1970), Kudryavtsev (2004) and Mack (1997). Those principles connect the net premium rates with the random future cash flows generated by insurance contracts. The techniques of rate-making are actually based on loss distributions or their moments, which are estimated using historical data with some appropriate adjustments for changing trends.

The standard approach involves a separate analysis of two parts of the net premium: the expected net premium (pure cost of risk) and the safety loading. The former corresponds to the expected value of future losses. The traditional models for such estimation are based on assumptions of risk homogeneity and of the absence of outliers (catastrophic losses, for example). The problem is that those assumptions are often wrong in actuarial practice.

The safety loadings (supporting the insurer's ability to cover losses) are determined from different considerations. They are usually estimated as a proportion of one of the moments of the loss distribution. The popular approaches are the principle of expected value, the principle of variation and the principle of standard deviation—e.g. see Bühlmann (1970).

Mack (1997) suggested another approach in which the net premium rate as a whole may be estimated as a quantile of the loss distribution. The quantile of the distribution of a random variable Y is defined as yθ = inf {y : FY(y) ≥ θ} for a fixed probability θ. In the case of a continuous distribution with a strictly monotonic distribution function, the quantile can be defined as yθ = FY⁻¹(θ). If Y is a random variable of future losses and 1 − θ is an acceptable probability of ruin during the forecasting period, then the quantile can be thought of as the estimator of the net premium rate. This approach explains the need for safety loading quite well and it does not separate the rate into two components.

Tel.: +7 812 393 0059; fax: +7 812 273 2400. E-mail address: [email protected].
0167-6687/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.insmatheco.2009.07.010
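Mack's quantile idea above can be illustrated numerically. The sketch below is not from the paper: the losses are simulated from a hypothetical heavy-tailed distribution, and the net premium is read off the empirical distribution function as yθ = inf {y : FY(y) ≥ θ}:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical heavy-tailed loss sample standing in for the future
# losses Y generated by a portfolio of insurance contracts.
losses = rng.lognormal(mean=8.0, sigma=1.2, size=10_000)

theta = 0.95  # accepted ruin probability is 1 - theta = 5%

# y_theta = inf{y : F_Y(y) >= theta}; for the empirical distribution
# this is the ceil(theta * n)-th order statistic.
order = np.sort(losses)
net_premium = order[int(np.ceil(theta * losses.size)) - 1]

# By construction the loss exceeds this premium in about 5% of cases,
# so the quantile itself already carries the safety loading.
exceed_rate = np.mean(losses > net_premium)
print(net_premium, exceed_rate)
```

Choosing θ directly controls the ruin probability 1 − θ, which is exactly why the quantile approach needs no separate safety-loading step.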

In practice, insurer portfolios are usually not homogeneous. In other words, the objects insured differ from each other. Actuaries try to identify risk factors (covariates) that help explain the differences between the objects insured. Then, the net premium rates are estimated on the basis of conditional loss distributions (assuming the values of those covariates are known). This approach actually breaks the portfolio up into sub-portfolios described by different conditional loss distributions. Each sub-portfolio is usually more homogeneous than the total insurance portfolio.

In practice, regression methods are traditional tools for the analysis of heterogeneous portfolios—see Anderson et al. (2004), Kudryavtsev (2004) and Mack (1997). Actuaries use both linear and non-linear regressions. The latter is often recommended for actuarial applications using generalized linear models—see de Jong and Heller (2008) and Anderson et al. (2004). The ordinary and generalized regression models based on least squares estimators are widely used in different fields, including the actuarial one.

Nevertheless, the traditional regression models are based on assumptions that are often wrong for real insurance portfolios. This is the main disadvantage of those models and the basic constraint on their practical applications.

First of all, those models give estimates of conditional expectations. For rate-making, the latter are thought of as estimates of expected net premium rates (pure costs of risks). Hence, the traditional regression approach ignores safety loadings, which must then be analyzed separately from first principles.

Moreover, traditional regression models often ignore the specific features of the insurance data used. In particular, the following is typical for real insurance portfolios:

• The possibility of catastrophic losses.
• The dependence of insured objects on each other (e.g. cumulating risk).
• An information shortfall when verifying the statistical significance of the model chosen.

The traditional regression models are not robust to outliers and are based on distributions with light tails. Moreover, they often require independence of observations and a large sample size. Therefore, those models should be used with great caution and often need additional qualitative and quantitative information.

Recently, intensive research has been done that tries to solve those informational and technical problems. In particular, Rousseeuw et al. (1984) discussed the construction of robust statistical models for insurance applications in the traditional regression framework. The problem of dependence in insurance portfolios has been intensively studied recently—e.g. Denuit et al. (2005) and Dhaene et al. (2006). Nevertheless, an integrated solution to the problem does not exist.

Moreover, the traditional regression approach does not usually break up an insurance portfolio into absolutely homogeneous groups of policies—there is residual heterogeneity in each sub-portfolio separated on the basis of risk factors (covariates). As a result, the best model for future losses seems to be a mixture of distributions. This does not support the assumption of the traditional regression methods of a fixed type of distribution taken from a restricted a priori set, for example, only Gaussian or a member of the exponential family.

In this paper, a new approach to estimating net premium rates is offered. It is based on the idea of quantile regression proposed by Koenker and Bassett (1978), Koenker and Hallock (2001) and Koenker (2005). This approach offsets some disadvantages of the traditional regression methods. Moreover, it is consistent with the idea of setting net premium rates as quantiles of conditional loss distributions.

The approach proposed is widely used for estimating different types of risks in the financial area, first of all market, credit and operational risks. An integrated description of those risks is given, among others, by Crouhy et al. (2001) and Jorion (2001). Quantile regression is also a popular tool for estimating Value at Risk in the financial framework—see Engle and Manganelli (1999).

The method of quantile regression has already been used for insurance applications, but only to solve some technical problems. In particular, Portnoy (1997) suggested applying it to the graduation of mortality table rates. Pitt (2006) used it to estimate claim termination rates for Income Protection insurance. The author of this paper earlier proposed using quantile regression techniques for the estimation of net premium rates—Abduramanov and Kudryavtsev (2007). Applying quantile regression to rate-making is also discussed by Pitselis (2007) in the context of credibility.

2. Quantile regression

2.1. Advantages of the quantile regression approach

The least squares method is the most common tool among regression techniques. It produces results under the assumptions of normally distributed (Gaussian) errors and, sometimes, of their independence. Alternative approaches (like the generalized linear models) are also based on quite strong assumptions about the type of distributions and about some characteristics of the data.

Nevertheless, actuaries often meet much more complex situations in their practice than is assumed during the modelling process. This may make the applications of traditional regression methods invalid and, hence, useless. For example, the following problems can occur:

1. Inaccurate estimate of the loss distribution. Estimation errors caused by specific features of the sample may ruin the assumptions of its homogeneity and/or of the distribution belonging to an a priori set. As a result, the estimate of the loss distribution may be very different from the real one, which generates a model error.
2. Loss distributions with heavy tails. In practice, there is often a need to give larger weights to the extreme values observed than is assumed in the Gaussian case. For such situations, the least squares method may be inappropriate.
3. A (small) number of outliers in the sample. They are often thought to be "noise" which is impossible to separate from "regular" data on the basis of a priori information. This may make the sample distribution asymmetric. It also restricts the use of the traditional regression tools and requires statistical procedures that are robust to such a "littered" sample.
4. Dependence structure of the data. If the structure of the sample is complex enough, it is difficult or even impossible to identify that structure. Hence, there are problems in analyzing the structure with appropriate models (for example, using the variance–covariance matrix or a copula).

In other words, the traditional regression methods are inappropriate for some practical situations. As a result, any conclusions based on those methods may be incorrect.

This motivates the usefulness of alternative approaches. One of them is the quantile regression method, whose main idea is to replace the quadratic deviations by absolute ones. This approach has a number of advantages:

• It is a distribution-free method, which does not need any assumptions about a parametric distribution family and does not use any properties of it.
• It is robust to outliers.
• It does not require independence of observations or even a weak degree of dependence.

• Some conclusions about fluctuations of the estimated value can be reached directly from the estimation procedure.

Thus, the method of quantile regression overcomes some disadvantages of the traditional regression models. Even if the assumptions of those models are fulfilled, the estimates received by the method of quantile regression have almost the same quality as the ones derived by the least squares method. If the assumptions are wrong, quantile regression gives better estimators. This will be shown in more detail in Section 6.

2.2. General model for quantile regression

Let (yi, xi) be a member of the set of observations (i = 1 . . . n), where yi is the dependent variable in the regression equation, and xi = (xi1 . . . xim) is a row-vector of independent variables (covariates). Then, according to Koenker and Bassett (1978), Koenker and Hallock (2001), and Koenker (2005), the model is given by the formula

Quantθ(ỹi | xi) = xi βθ,   (1)

where Quantθ(ỹi | xi) denotes the conditional quantile of the random variable ỹi for probability θ given the vector of regressors xi, and βθ is the corresponding column-vector of regression coefficients.

In statistical practice, the distribution-free approach is often used for estimation—see Koenker and Bassett (1978), Koenker and Hallock (2001), Koenker (2005), and Hao and Naiman (2007). It is based on a large enough number n of observations (yi, xi), i = 1 . . . n. In that framework, the estimator β̂θ of the vector βθ from formula (1) can be obtained by solving the minimization problem

min over βθ of (1/n) { Σ over i: yi ≥ xi βθ of θ |yi − xi βθ| + Σ over i: yi < xi βθ of (1 − θ) |yi − xi βθ| }.   (2)

If θ = 1/2, this is the classical least absolute deviations (LAD) estimator. The latter is also known as median regression:

min over βθ of (1/n) Σi (1/2) |yi − xi βθ|.

Karst (1958) and Wagner (1959), among others, discussed that model in detail. The model is more popular in practice than the more general approach of quantile regression. Nevertheless, the more general approach is much more important for actuarial practice.

2.3. Quantile regression as a linear programming problem

According to Koenker (2005) and Hao and Naiman (2007), among others, problem (2) can be represented as a linear programming problem of the following type:

θ · 1 · u⁺ + (1 − θ) · 1 · u⁻ → min,
X βθ + u⁺ − u⁻ = y,   (3)
u⁺ ≥ 0, u⁻ ≥ 0,

where 1 is a row-vector of unit values (of dimension n); X is the matrix of observed covariates (of dimension n × m); y is the vector of the dependent variable (of dimension n); u⁺ and u⁻ are the vectors of positive and negative deviations, respectively, with components

u⁺i = max(yi − xi βθ, 0), i.e. u⁺i = yi − xi βθ if yi ≥ xi βθ, and 0 otherwise;
u⁻i = max(xi βθ − yi, 0), i.e. u⁻i = xi βθ − yi if yi < xi βθ, and 0 otherwise.

In practice, it is convenient to take adjusted least squares estimates as the initial values β̂θ for solving the linear programming problem. Another approach to selecting the initial vector is to preliminarily apply the quantile regression approach to a small part of the sample—see Koenker (2005) for details. In our experience, the latter approach considerably reduces the number of iterations and the computation time.

The representation of quantile regression as a linear programming problem has several important consequences. Firstly, it is guaranteed that the computing procedure finishes within a finite number of iterations. Secondly, as was mentioned earlier, this is a robust approach. That is, if yi − xi β̂θ > 0, then yi can be increased up to +∞ without ruining the estimates β̂θ, and vice versa, if yi − xi β̂θ < 0, then yi can be decreased down to −∞ without changing those estimates.

2.4. Confidence bands for the estimates of quantile regression

The general idea of confidence intervals for quantile estimators is well known. For example, it can be found in the handbook edited by Sarhan and Greenberg (1957). The estimators of confidence bands for quantile regression develop this idea. In particular, Zhou and Portnoy (1996) referred to the direct (or distribution-free) approach, studentization and bootstrap methods. The direct approach is used further, in Sections 3 and 4, as it is the best both in terms of reducing the computational time and in terms of the adequacy of the confidence bands as a whole. Nevertheless, the bootstrap method is used in Section 6, as it is better connected with the simulation study.

According to the direct method, the confidence bands are estimated for every vector x as follows:

Iβθ = [ x β̂(θ−b), x β̂(θ+b) ],   (4)

where b = zγ · sqrt( x Q⁻¹ x′ · θ(1 − θ) / n ), with Q = n⁻¹ Σ from i=1 to n of x′i xi; γ ∈ (0, 1) is a confidence probability (the probability of covering the true value by the confidence set); zγ = Φ⁻¹(γ) is the quantile of the standard normal distribution for probability γ, and Φ⁻¹(·) is the inverse of the standard normal (Gaussian) distribution function.

Thus, in order to construct the confidence band for the estimates of quantile regression, one also needs to calculate the quantile regressions for the probability values θ ± b. This makes the approach a universal tool for analysis.

3. Quantile regression for rate-making

3.1. Some practical considerations of rate-making procedures

Actuarial methods are used for solving practical problems. Hence, some specific features of rate-making procedures should be taken into account when modifying well-known statistical procedures. The actuary must consider both the properties of the data used and the design of the rating system. Although the latter does not typically influence the choice of statistical procedures, it may change the form in which they are used.

First of all, the estimation of a loss distribution is, in general, split into two stages—(1) fitting the conditional distribution function (given that a loss occurs) to the loss data and (2) adding the information about the policies without any losses (as a jump of the distribution function at zero). Managers and actuaries of insurance companies are interested in estimators combining the results of both stages. But the regression models are easier to apply to the loss data (used for the first stage) and, separately, to the values of the jumps—see de Jong and Heller (2008). This needs

some modifications of the loss distribution before adopting regression methods.

In this framework, the random variable ỹ of the loss incurred should be regarded as having a truncated or conditional distribution (given that the loss is positive). In order to construct the "full" distribution of the random variable Y, the event "no losses" should be added. Let the probability of that event be p = Pr[Y = 0]. After re-normalizing, the distribution function is FY(y) = p + (1 − p) · Fỹ(y) for y > 0. This is important for estimating quantiles of the unconditional distribution,

yθ = Fỹ⁻¹( (θ − p) / (1 − p) ),

in terms of the conditional distribution given that the loss occurs.

Thus, the quantile approach to making the net premium rate is based on the probability

θ* = (θ − p) / (1 − p)   (5)

applied to the conditional distribution of the random variable ỹ (given that the loss occurs), which is estimated from the loss data.

Another important thing is the design of the premium rate estimators, because some of them are more convenient for practical use than others. In particular, linear models are often not appropriate. A more general type of model can be represented as follows:

y = g⁻¹(Xβ).   (6)

It allows an actuary to take into account the influence of risk factors (covariates) on the loss amount within a linear form while the model itself is non-linear. The different approaches to that model vary in the choice of the function g(·) and the assumed type of errors.

The function g(·) is usually differentiable and monotonic. In some cases, the choice is bound up with the features of statistical procedures. For example, canonical link functions are preferable in the framework of generalized linear models. In applications, the choice is often made for practical convenience. In particular, a logarithmic function is quite popular in insurance practice because it is well connected with multiplicative adjustments—see Mack (1997) among others. Those adjustments are accepted in pricing insurance products and in insurance underwriting because multiplicative adjustments are easily understood by managers, agents, and underwriters. For that function, the particular case of formula (6) can be re-written as follows:

y = e^(xβ) = e^(β0) · e^(x1 β1) · . . . · e^(xk βk),   (7)

where e^(β0) is an estimator of the net premium rate for the basic sub-portfolio (rating class), and e^(xj βj) is a multiplicative adjustment for the jth risk group (rating class) to which the insured object belongs.

Combining formulae (1) and (6), one can use the method of quantile regression in the following way:

Quantθ*(ỹi | xi) = g⁻¹(xi βθ*).

It may also be re-written for the multiplicative model of type (7) with the logarithmic function g(·) as follows:

Quantθ*(ỹi | xi) = e^(xi βθ*).   (8)

Problems (2) and (3), as well as confidence band (4), can easily be changed in a similar way.

Model (8) gives rate estimators that are convenient for practical use: the estimators are conditional quantiles (given the observable risk factors xi) of probability θ* for the ith policy before a loss occurs. This is exactly the quantile definition of the net premium rate with probability 1 − θ of ruin for that policy. Hence, it is possible to directly control the value of the safety loadings by choosing the probability as a parameter of the model. Moreover, the approach generates a system of multiplicative adjustments for different rating classes. A significant advantage of the approach is that this system is designed for net premium rates as a whole, while in the framework of other methods it holds only for expected net premium rates (pure costs of risks). Although separate estimates of expected rates and safety loadings are sometimes important in actuarial applications, the quantile approach is easily adapted for them. For example, the conditional mean could be replaced with the conditional median (given that the risk factors/covariates are known).

3.2. Example of using the quantile regression for rate-making

The data set for the example was provided by one of the major Russian insurance companies. The set covers a large portfolio of vehicles insured against theft in St. Petersburg and its environs. The number of cars in the portfolio was 11,790, of which 2359 had been stolen (claims were reported). Thus, the proportion of claims in the insurance portfolio was approximately 20%. The loss is measured in US dollars. The loss amount depends on both the risk of theft and the price of a car. Information about risk factors (covariates) is available for each of the objects insured. It includes the model (type) of the vehicle, its color, the geographic area and the date of the theft (if any). This information helps to reduce the degree of heterogeneity in the insurance portfolio.

In order to construct the quantile regression estimates, the probability θ should be fixed. In practice, it is chosen by the managers of the insurance company on pricing and solvency considerations. In our example, it is θ = 0.95. In other words, the probability that the actually incurred loss exceeds the net premium is 5%. Then, formula (5) gives the modified value of the quantile θ* as follows:

θ* = (θ − p) / (1 − p) = (0.95 − 0.8) / 0.2 = 0.75.

That value is used further for the estimation by the quantile regression method.

The independent variables (covariates) are Boolean; they are indicators marking a car as belonging to the appropriate rating class. The procedure of estimation includes the selection of covariates (rating classes) with a statistically significant influence on the loss differences. In particular, the grouping of areas into 3 sets (city center, outskirts and rural districts) is not important for the quantile estimates (although the means/expectations for those groups are significantly different). The basic rating class is chosen as the set of cheap dark-colored cars produced in Russia.

The procedure of estimation is to solve problem (3) for a probability of 75% with the appropriate modification for model (8). The results are shown in Table 1.

Table 1. Results of the estimation using quantile regression.

Rating class                    Regression coefficient β̂j    Basic rate / adjustment exp(β̂j)
Basic rating class (j = 0)      9.022                         8280.53
Imported cars                   1.388                         4.01
Trucks / lorries                0.367                         1.44
Luxury cars                     0.545                         1.73
Cars from the mass segment      0.238                         1.27
Light-colored cars              −0.026                        0.97

The results are consistent with practice: luxury and imported (produced abroad) cars are often stolen. The color of the

car is not an important characteristic although dark-colored cars Table 2


are slightly more often stolen. Comparing the amounts of total losses for different models and its observable value.
The model gives estimates for conditional quantiles of loss Multiplicative model based on linear regression 44,719,770
distributions which can be thought of as the net premium rates for Gamma regression 52,510,963
different rating classes. The approach is ‘‘direct’’ in that it allows Quantile regression 67,631,033
Actual values for the sample 52,556,493
an actuary to avoid the 2-step procedure of separate estimation
(predicting expected net premium rates and then safety loadings).
4.2. The example continued
4. Quantile regression approach compared with traditional
regression methods in the rate-making framework The aim is to compare different approaches in order to illustrate
the possibility that the quantile regression could be better than the
4.1. Some aspects of traditional regression methods traditional methods. The regression equations are fitted with the
data introduced in Section 3.2. Then average premiums and other
Traditional regression approaches based on least squares characteristics are compared.
method and generalized linear models are used in a specific way First of all, let us have a look at the histograms of loss data (see
in quantile rate-making framework. In order to compare different Fig. 1).
approaches, such specific features should be discussed. The form of the histogram helps to make the following
For traditional models based on the least squares method, assumptions about the possible type of unimodal distribution:
formula (6) becomes: 1. Losses are lognormally distributed which makes possible the
g (yi ) = xi β + εi or E g ỹi |x̃i = xi = xi β, use of linearization in the framework of least squares.
  
2. Losses are gamma-distributed (it makes sense to explore the
where εi is a random error. This model is consistent with the idea of gamma regression).
linearization using transformation g (·). The standard assumption
is that errors εi are normally distributed with zero expectations Test of fitting rejects both hypotheses although gamma
and appropriate variations σi2 . According to the discussion above, distribution looks slightly better. The result can be explained by
the distribution of random variable ỹi used in the model may be a form of histograms, for which a bimodal distribution seems to be
rather different from actuarial data. Although the assumptions of more adequate.
independence and of the identity of distributions (σi2 = const i ) can Thus, the traditional regression cannot be formally used to
be weakened to a certain extent it is not a solution of the problem analyze the data set. Nevertheless, practicing actuaries sometimes
to construct a realistic model. may use one of the traditional approaches in such a situation,
Nevertheless, the approach is quite popular among actuaries although the quality of the estimates is very bad. For them, it may
because the statistical apparatus has been well known and be better to have any numbers (even though of bad quality) than
statistical software is easily available. In particular, it is possible to nothing. Therefore, both traditional models are used in the paper,
use multiplicative model (7) choosing g (·) as logarithmic function. but only for the purposes of comparison.
In that case, the distribution of errors is assumed to be lognormal. As a first phase of comparison, the observable and predictive
In order to compare results from more traditional regression values of total losses (for all 3 models) are contrasted in Table 2.
methods with results from quantile regression, the estimator of The generalized linear model gives the most accurate estimate
quantile for the least squares method is needed. Normal errors are of total losses. The least squares method underestimated the total
assumed, so the estimator is:

Quantθ(ỹi | xi) = g⁻¹(xiβ + zθσi). (9)

Another approach to regression analysis is generalized linear models. In the framework of this approach, additive errors are assumed in (6) as follows:

yi = g⁻¹(xiβ) + ηi  or  E[ỹi | x̃i = xi] = g⁻¹(xiβ).

In this case, the distribution of errors is selected from a broader set of distributions—the exponential family. In particular, it includes the normal (Gaussian), inverse Gaussian, gamma, Poisson, binomial and negative binomial distributions. The best estimates of the regression coefficients are obtained by maximizing the likelihood function. The function g(·) is often chosen as the canonical link function, which simplifies finding the solution of that maximization problem. For example, when the generalized linear model is based on the gamma distribution (the so-called gamma regression), the canonical link function is logarithmic, which gives the multiplicative model (7)

yi = exp(xiβ) + ηi  or  E[ỹi | x̃i = xi] = exp(xiβ). (10)

The estimator of a quantile in generalized linear models is based on information or an assumption about the underlying distribution. In particular, for gamma regression one should solve the equation

(1/Γ(α)) ∫₀^(yθ·α·exp(−xβ)) t^(α−1) e^(−t) dt = θ (11)

for yθ, where α is a parameter of the gamma distribution.

losses by 14% compared with the actual ones. This means a large enough probability of ruin in the future, which is a significant disadvantage of the method (at least, in the current example). The method of quantile regression overestimated the total losses by 28%. However, the latter estimate already includes the safety loadings which are ignored by the other methods. A reasonable overestimation of the net premium rates gives an additional fund for supporting insurance solvency.

The second phase of the comparison is based on some characteristics of the conditional distributions (given that a loss incurs), first of all their 0.75th quantiles (third quartiles), P^Y, and their confidence intervals. For the multiplicative models based on a linear regression model and on a gamma regression, formulae (9) and (11) are used. The quantile regression model (8) discussed in Section 3.2 directly gives estimates for all the quantiles needed.

In order to compare the results of the estimation for the portfolio as a whole, the quantile estimates for each rating class are mixed on the basis of the actual structural distribution. The result is the average premium per contract. This allows a comparison of the total premium amount for the portfolio. The results are shown in Table 3.

The first line of the table includes the quartile estimates given by the different approaches. They are conditional (given that a loss incurs) and should be re-calculated into unconditional values (the second line). The information in the other lines is used for comparisons. First of all, let us contrast the total net premium estimates from the different methods with the total losses shown in Table 2.

The method of least squares overestimates the total net premium by 46% although it underestimates the total loss. This means that
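Equation (11) has no closed-form solution, but it is easy to solve numerically. Below is a minimal pure-Python sketch (the function names, the series-based CDF and the bisection bracket are my own choices; in practice one would call a library gamma inverse-CDF routine):

```python
import math

def gamma_cdf(x, alpha):
    """Regularized lower incomplete gamma P(alpha, x), via the standard
    series P(a, x) = x^a e^(-x) / Gamma(a) * sum_k x^k / (a (a+1) ... (a+k))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / alpha
    total = term
    k = 0
    while term > 1e-16 * total and k < 10_000:
        k += 1
        term *= x / (alpha + k)
        total += term
    return total * math.exp(alpha * math.log(x) - x - math.lgamma(alpha))

def gamma_quantile(theta, alpha, mu):
    """Solve equation (11) for y_theta: the theta-quantile of a gamma
    distribution with shape alpha and mean mu = exp(x beta).
    The bracket [0, 10 mu] assumes the quantile lies below ten times
    the mean, which holds for the moderate theta used here."""
    lo, hi = 0.0, 10.0 * mu
    for _ in range(100):  # plain bisection
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid * alpha / mu, alpha) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For θ = 0.75, shape α = 4 and mean 300 this returns ≈ 383.2, i.e. the third quartile of a G(4; 75) risk.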

Fig. 1. Loss histogram.

Table 3
Comparison of the results obtained by the different methods of estimation for the portfolio as a whole.

Estimated characteristic              | Multiplicative model based on linear regression | Gamma regression | Quantile regression
0.75th quantile, P^Y (3rd quartile)   | 32,436     | 37,168     | 28,669
Average net premium per contract, P   | 6,487      | 7,434      | 5,734
Total net premium for the portfolio   | 76,515,744 | 87,679,513 | 67,631,033
Lower endpoint of confidence interval | 5,151      | 6,218      | 5,094
Upper endpoint of confidence interval | 8,520      | 9,513      | 6,255
Range of confidence interval          | 3,369      | 3,295      | 1,160
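The re-calculation from the first to the second row of Table 3 is simple arithmetic. A small sketch, assuming a claim probability of 0.2 (a value implied by the ratio 6,487/32,436 in the table rather than stated in the text):

```python
# Unconditional premium = conditional quantile * probability that a loss occurs.
CLAIM_PROBABILITY = 0.2  # implied by 6,487 / 32,436 in Table 3 (assumption)

def net_premium_per_contract(conditional_quantile, p_claim=CLAIM_PROBABILITY):
    """Re-calculate a conditional 0.75th quantile into an unconditional
    net premium per contract."""
    return conditional_quantile * p_claim
```

Applying it to each column reproduces the second row of the table (6,487, 7,434 and 5,734) up to rounding.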

Fig. 2. The estimates of average premium rate and their confidence intervals for different methods.

the safety loadings are too large. As a result, the method of least squares does not seem to be robust for such data.

The generalized linear models, as already noted, describe the total loss quite accurately, but their estimate of the total net premiums is overstated by 67%. Taking safety loadings into account, the resulting tariffs do not seem to be competitive on the insurance market. The safety loadings, as with the method of least squares, do not seem to be robust.

Because the quantile regression method allows the net premium rate to be estimated directly, the total loss can be seen as the volume of total premiums. As already noted, its overestimate of 28% over the actual loss can be interpreted as safety loadings. In absolute values, it is equal to 15 million US dollars, which is at least twice lower than the estimates obtained by the traditional approaches. The resulting estimate looks more competitive on the market.

Further, it is worth comparing the confidence intervals for the estimates of the average net premium from the different methods. They are given in Table 3 and shown in Fig. 2.

As can be seen from the figure, the range of the confidence interval for the quantile regression method is 3 times less than the ones for the traditional methods. And importantly, the upper endpoint of the confidence interval for the quantile regression method is strictly less than the ones for the traditional methods. In other words, the accuracy of estimation within the quantile regression method is relatively high.

5. Simulation study

The example of Sections 3.2 and 4.2 is important as a real illustration of the approach offered, but it does not explain the quality of the quantile regression estimators. This can be achieved with a simulation study, in which all 3 regression methods are compared.

The simulation study is based on a series of generated data sets. Each data set consists of 100 "observations" of losses. Let i be the number of the data set, j the number of the generated value, and yij the generated loss. Each data set is connected with the model "good risks vs. bad risks". Approximately 30% of each data set are "bad" risks (with larger expected losses). They are marked with the Boolean indicator xij = 1, while xij = 0 indicates a "good" risk.

In order to show the most traditional ways of using statistical procedures, the least squares method is based on the linear model

yij = α + βxij + εij,

the gamma regression method follows formula (10), and formula (8) underlies the proposed method of quantile regression.

In mathematical terms, each data set represents a mixture of distributions. All "good" risks are independent copies from one distribution, while all "bad" risks are independent copies from another distribution of the same type. An effective regression method must split the data set into "good" and "bad" risks and predict an appropriate quantile.

The number of losses being equal to 100 corresponds to relatively small insurance portfolios. In order to study the possible error committed with this limited volume, 250 sets of each

Table 4
The basic characteristics of the data generated.

Type of risk | Data set                   | Expected value | Standard deviation | Third quartile
"good"       | Normal (Gaussian)          | 300.0 | 150.0 | 401.17
"good"       | Gamma with small variation | 300.0 | 150.0 | 383.21
"good"       | Gamma with large variation | 300.0 | 375.0 | 409.34
"good"       | Mixture of distributions   | 300.0 | 375.0 | 643.51
"bad"        | Normal (Gaussian)          | 450.0 | 150.0 | 551.17
"bad"        | Gamma with small variation | 450.0 | 150.0 | 540.12
"bad"        | Gamma with large variation | 450.0 | 382.0 | 618.50
"bad"        | Mixture of distributions   | 450.0 | 382.0 | 873.35
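The fourth-type characteristics in Table 4 can be checked directly. Reading "parameters 3 and 40" as the shape and scale of a Pareto Type II (Lomax) distribution is an assumption on my part, but it reproduces the stated mean (0.6 · 20 + 0.4 · 720 = 300); with it, the mixture CDF evaluated at the tabulated third quartile returns 0.75:

```python
import math

def mixture_cdf_good(x):
    """CDF of the fourth-type 'good' risk, assumed here to be
    0.6 * Lomax(shape=3, scale=40) + 0.4 * Normal(mean=720, sd=239.50)."""
    lomax = 1.0 - (1.0 + x / 40.0) ** -3.0
    normal = 0.5 * (1.0 + math.erf((x - 720.0) / (239.50 * math.sqrt(2.0))))
    return 0.6 * lomax + 0.4 * normal
```

Here `mixture_cdf_good(643.51)` evaluates to ≈ 0.75, consistent with the third-quartile entry 643.51 in Table 4.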

type have been generated. They are used for the bootstrap method of confidence interval estimation. Four different types of distribution mixtures are used in the simulation study.

The first type is a normal (Gaussian) distribution mixture with ỹgood ∼ N(300; 150) and ỹbad ∼ N(450; 150). The second type is a gamma distribution mixture with ỹgood ∼ G(4; 75) and ỹbad ∼ G(9; 50). Those random values have the same expectations and variances as the ones of the first type, but the gamma distribution is slightly asymmetric. Those facts are used for comparisons, to understand how important the symmetry is.

The third type is a gamma distribution mixture as well, but with much larger variances: ỹgood ∼ G(0.64; 468.75) and ỹbad ∼ G(1.388; 324.276). The expectations are the same as the ones for the second type, but the asymmetry is larger. Both types of gamma mixtures can be analyzed to test the effect of the degree of asymmetry on the quality of a regression method.

The fourth type is a mixture of two mixed distributions with density function 0.6f1(x) + 0.4f2(x), where f1(x) is the density of a Pareto distribution (with parameters 3 and 40 for "good" risks or 4 and 450 for "bad" risks), and f2(x) is a normal (Gaussian) density with mean 720 and standard deviation 239.50 for "good" risks or mean 900 and standard deviation 102.15 for "bad" risks. Those mixed distributions are guaranteed to be highly asymmetric because of the Pareto distribution and the "humps" on the tails generated by the Gaussian parts. Hence, the residual heterogeneity in sub-portfolios is modelled. Nevertheless, their expectations and variances are equal to the ones for the third type. The comparisons can be made to contrast distributions with different degrees of tail heaviness.

The aim is to estimate the chosen quantile. In the framework of the study, the third quartile (0.75th quantile) is taken as the target quantity. The latter is different for the contrasted types of distributions, although the values are quite close to each other. The basic characteristics of the data generated are given in Table 4.

Those sets of data are analyzed with 3 regression approaches – the least squares method (LSE), the generalized linear models (GLM) and the quantile regression (QR). As the characteristics of the contrasted data sets are similar, the differences in predictions will be explained by the quality of the methods used.

The first two methods give estimates of expected values, while the quantile regression may directly predict the third quartile. For the least squares method and the generalized linear models this quantity should be estimated according to the underlying assumptions (as the appropriate quartiles of normal or gamma distributions, correspondingly). This simulates the analysis made by a practicing actuary who actually does not know the exact distribution, but follows the assumption of the method used.

For the least squares method, the assumption is the normal distribution of errors. The third quartile is estimated as follows:

E[yij | xij] + z0.75 σ̂,

where E[ỹij | xij] is the appropriate estimate of the expected value, E[yij | xij = 0] = α̂ and E[yij | xij = 1] = α̂ + β̂, z0.75 is the third quartile of the standard normal distribution (being equal to 0.67449), and σ̂ is a sample estimate of the standard deviation. As a gamma distribution is used for the generalized linear models, the estimation of the third quartile is based on that distribution function with appropriate values of the parameters—see formula (11).

In order to understand the quality of the methods, confidence intervals are constructed with bootstrap methodology. The confidence interval is fixed as the one between the 7th and 244th values of the set of ordered estimates within the group of 250 generated mixtures of the same type. Those intervals have a confidence level of 95.2%.

This method was chosen because it connects well with the simulation approach and because the volume of each data set was limited to support specific features of insurance records. Those bootstrap confidence intervals are more or less the same as the ones estimated with approximation approaches on the basis of 3000 generated values. The latter makes the normal approximation quite precise. The results of the simulation are shown in Fig. 3.

The quantile regression approach is a bit worse than the traditional methods for normal and gamma distributions, but it is obviously appropriate for distributions with heavy tails, where it is the only adequate estimation approach. Although the range of the confidence intervals for the quantile estimates tends to be a bit larger than for the ones from other methods, the midpoints of those intervals for the quantile regression method are usually closer to the true values.

The quality of the least squares method becomes worse as the asymmetry grows. The degree of tail heaviness seems to be critical for the generalized linear models. Although the traditional methods estimate the expected values quite appropriately, they make large additional errors in the quantile estimates. This certainly is a result of the wrong assumptions. In contrast, the method of quantile regression gives accurate results in every case, as it is distribution-free.

This is the explanation of the real data results of Sections 3.2 and 4.2, where the quantile regression was better than the generalized linear models. Moreover, the simulation study gives some arguments for using the quantile regression in actuarial practice, where some degree of residual heterogeneity is typical.

6. Conclusions

The methods of estimating net premium rates are discussed in this paper. The traditional methods of estimation (the least squares method and the generalized linear models) are compared with a new approach to making net premium rates – quantile regression. The comparison is based on the mathematical properties of the models and illustrated with an example from real insurance data. The new approach to rate-making is shown to have superior efficiency per the simulation study.

The traditional regression approaches have several significant disadvantages. First of all, they are not robust. An additional problem is that there can be difficulties in estimating the size of errors associated with the violation of the underlying assumptions.
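The simulation machinery of Section 5, i.e. data generation, the least squares quartile estimate and the bootstrap interval, can be sketched in a few lines of pure Python (the Gaussian case only; the exact 30/70 split, the seeding and all names are illustrative, since the paper states only "approximately 30%"):

```python
import random

Z_075 = 0.67449  # third quartile of the standard normal distribution

def generate_data_set(n=100, share_bad=0.3, seed=None):
    """One simulated portfolio: about 30% 'bad' risks (x = 1, mean 450)
    and the rest 'good' risks (x = 0, mean 300), sd 150 each."""
    rng = random.Random(seed)
    n_bad = round(n * share_bad)
    xs = [1] * n_bad + [0] * (n - n_bad)   # Boolean risk indicators
    ys = [rng.gauss(450.0 if x else 300.0, 150.0) for x in xs]
    return xs, ys

def lse_third_quartile(ys):
    """Least squares estimate of the third quartile under the normality
    assumption: sample mean plus z_{0.75} times the sample sd."""
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean + Z_075 * var ** 0.5

def bootstrap_ci(estimates):
    """Interval between the 7th and 244th of 250 ordered estimates;
    238/250 = 95.2% of the values lie inside, as in the text."""
    s = sorted(estimates)
    assert len(s) == 250
    return s[6], s[243]  # zero-based indices for the 7th/244th values
```

For population values mean 300 and sd 150, the estimator reduces to 300 + 0.67449 · 150 = 401.17, the first entry of Table 4.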

Fig. 3. The estimates of third quartile and their confidence intervals in the simulation study.

Moreover, the assumptions often contradict the features of real data.

The quantile regression approach overcomes the disadvantages of the traditional methods. Even if the assumptions of the latter are reasonable, the former gives quite appropriate estimates; but in situations in which those assumptions are wrong, the quantile regression estimates are significantly better than the ones given by the traditional methods.

Two important advantages of the method of quantile regression are its robustness to the data and its unbiased estimators. Moreover, it is a distribution-free approach. In addition, the confidence intervals for the estimates under the quantile regression seem to be almost as wide as the ones for the traditional methods. In the example, the net premium estimated with quantile regression is the most competitive on the market among all considered forecasts. The method proposed conforms to the idea of a quantile definition for net premium rates.

Thus, the estimates of net premium rates derived from the quantile regression have better technical properties and better interpretation. Therefore we recommend its use in actuarial applications.

Acknowledgement

The author is grateful to the Editor and the anonymous reviewer for valuable comments and helpful criticism that greatly improved this paper. The author would like to thank R. Abduramanov for access to the real insurance data used in the example of this paper.

References

Abduramanov, R., Kudryavtsev, A., 2007. The method of quantile regression, a new approach to actuarial mathematics. In: 11th International Congress Insurance: Mathematics and Economics, July 10–12, 2007, Piraeus, Greece. Book of Abstracts, pp. 56–57.
Anderson, D., et al., 2004. A practitioner's guide to generalized linear models. CAS Discussion Paper Program, pp. 1–115.
Bühlmann, H., 1970. Mathematical Methods in Risk Theory. Springer, Berlin.
Crouhy, M., et al., 2001. Risk Management. McGraw-Hill, New York.

Denuit, M., et al., 2005. Actuarial Theory for Dependent Risks: Measures, Orders and Models. Wiley, Chichester.
Dhaene, J., et al., 2006. On the structure of premium principles under pointwise comonotonicity. Theory of Stochastic Processes 12 (3–4), 27–45.
Engle, R.F., Manganelli, S., 1999. CAViaR: Conditional value at risk by quantile regression. National Bureau of Economic Research. Working Paper No. 7341.
Hao, L., Naiman, D.Q., 2007. Quantile Regression. SAGE Publications, Los Angeles (Series/Number 07-149).
de Jong, P., Heller, G.Z., 2008. Generalized Linear Models for Insurance Data. Cambridge University Press, Cambridge.
Jorion, Ph., 2001. Value at Risk: The New Benchmark for Managing Financial Risk. McGraw-Hill, New York.
Karst, O.J., 1958. Linear curve fitting using least deviations. Journal of the American Statistical Association 53 (281), 118–132.
Koenker, R., 2005. Quantile Regression. Cambridge University Press, Cambridge.
Koenker, R., Bassett, G., 1978. Regression quantiles. Econometrica 46 (1), 33–50.
Koenker, R., Hallock, K.F., 2001. Quantile regression. Journal of Economic Perspectives 15 (4), 143–156.
Kudryavtsev, A., 2004. Lectures on Rate-Making for Non-Life Insurance. European University at St. Petersburg, St. Petersburg (in Russian).
Mack, Th., 1997. Schadenversicherungsmathematik. VVW, Münchener Rück, Karlsruhe.
Pitselis, G., 2007. Quantile regression in a credibility framework. In: 11th International Congress Insurance: Mathematics and Economics, July 10–12, 2007, Piraeus, Greece. Book of Abstracts, p. 52.
Pitt, D.G.W., 2006. Regression quantile analysis of claim termination rates for income protection insurance. Annals of Actuarial Science 1 (II), 345–357.
Portnoy, E., 1997. Regression-quantile graduation of Australian life tables, 1946–1992. Insurance: Mathematics and Economics 21 (2), 163–172.
Rousseeuw, P., et al., 1984. Applying robust regression to insurance. Insurance: Mathematics and Economics 3 (1), 67–72.
Sarhan, A.E., Greenberg, B.G. (Eds.), 1957. Contributions to Order Statistics. Wiley, New York.
Wagner, H.M., 1959. Linear programming techniques for regression analysis. Journal of the American Statistical Association 54 (285), 206–212.
Zhou, K.Q., Portnoy, S.L., 1996. Direct use of regression quantiles to construct confidence sets in linear models. The Annals of Statistics 24 (1), 287–306.
