CREDIBILITY USING SEMIPARAMETRIC MODELS
BY VIRGINIA R. YOUNG
School of Business
University of Wisconsin-Madison
ABSTRACT
KEYWORDS
1. INTRODUCTION
the resulting model may not represent the loss data very well. One method of
circumventing this problem is to apply empirical Bayesian analysis in which one
uses the data to estimate the parameters of the model (Klugman, 1992).
In this paper, we use a semiparametric mixture model to represent the
insurance losses of a portfolio of risks: We choose a flexible parametric
conditional loss distribution for each risk with unknown conditional mean that
varies across the risks. This conditional distribution may depend on parameters
other than the mean, and we use the data to estimate those parameters. Then, we
apply techniques from nonparametric density estimation to estimate the
distribution of the conditional means.
In Section 2, we describe a mixture model for insurance claims and estimate
the prior density using kernel density estimation. In Section 3, we calculate the
credibility estimator assuming squared-error loss and also give the projection of
that estimator onto the space of linear functions. Finally, in Section 4, we apply
our methodology to simulated data from a mixture of a lognormal conditional
over a lognormal prior. We show that our method can lead to good credibility
formulas, as measured by the mean squared error of the claim predictor, even
when we use a gamma conditional instead of a lognormal conditional.
Bayesian paradigm, our uncertainty about $\theta$ for any particular risk would be
represented by a continuous random variable. Also, if r is large, then the variable
$\theta$ can be well approximated by a continuous random variable. Even if r is not
large, the collection of r risks may be a sample from a larger population of risks
whose distribution can be approximated by a continuous distribution. Assume
that the experience of different risks is independent.
Note that our model is a special case of the one given by Bühlmann and
Straub (1970). Because $X_{ij}$ is the random variable of an average of $w_{ij}$ iid claims
$Y_1, Y_2, \ldots, Y_{w_{ij}}$, given $\theta_i$, we have that $E[X_{ij}|\theta_i] = E[Y|\theta_i] = \theta_i$ is independent of
the period $j$. It also follows that
$$\operatorname{Cov}[X_{ij}, X_{ik}|\theta_i] = \begin{cases} \dfrac{\operatorname{Var}[Y|\theta_i]}{w_{ij}}, & \text{if } j = k, \\[2mm] 0, & \text{if } j \neq k, \end{cases}$$
as in the Bühlmann-Straub model. In the literature, $E[Y|\theta_i]$ is called the hypothetical
mean and $\operatorname{Var}[Y|\theta_i]$ the process variance. Note that we assume the
observations for a risk arise as arithmetic averages of an underlying claim
random variable $Y|\theta$, while Bühlmann and Straub (1970) do not assume this in
their more general model.
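As a quick numerical illustration of this structure, here is a short sketch in Python (not part of the original paper): it simulates period averages $X_{ij}$ from a gamma conditional with mean $\theta_i$ and checks that the averages have mean $\theta_i$ and variance $\operatorname{Var}[Y|\theta_i]/w_{ij}$. The gamma conditional and all numerical values are illustrative assumptions.

```python
# Sketch: given theta_i, the period average X_ij of w_ij iid claims has mean
# theta_i and variance Var[Y | theta_i] / w_ij.  Gamma conditional with shape
# alpha, so Var[Y | theta_i] = theta_i**2 / alpha.  All values are illustrative.
import numpy as np

rng = np.random.default_rng(seed=1)
theta_i, alpha, w_ij = 500.0, 3.0, 25

claims = rng.gamma(shape=alpha, scale=theta_i / alpha, size=(100_000, w_ij))
X_ij = claims.mean(axis=1)                # simulated period averages

print(round(X_ij.mean(), 1))              # close to theta_i = 500
print(round(X_ij.var(), 1))               # close to (theta_i**2 / alpha) / w_ij, about 3333
```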
The goal of credibility theory is to predict the future claim y (or an average of
future claims) of a risk, given that the risk's claim experience is x and exposure w.
In this paper, we restrict our attention to credibility formulas that are functions of
a single statistic because they are easier to estimate and to use. We choose the
sample mean as our statistic, $\bar{x}_i = \dfrac{\sum_{j=1}^{n_i} w_{ij} x_{ij}}{\sum_{j=1}^{n_i} w_{ij}}$, because the claim experience x is a
vector of averages. However, we do not restrict a claim estimator to be linear.
To pick a parametric conditional distribution for $Y|\theta$, we use the following
criteria:
• $E[Y|\theta] = \theta$.
• The sample mean is a sufficient statistic for $\theta$.
• The functional form of $f(y|\theta)$ is closed under averaging. That is, if X is an
average of w claims that follow the distribution given by $f(y|\theta)$, then the density of
X has the same functional form as $f(y|\theta)$.
Three such families of densities are commonly used in actuarial science to
model insurance losses: (1) the normal, with mean $\theta$ and fixed variance $\sigma^2$, (2) the
gamma, with mean $\theta = \alpha/\beta$ and fixed shape parameter $\alpha$, and (3) the inverse
gaussian, with mean $\theta$ and fixed $\lambda = \dfrac{\theta^3}{\operatorname{Var}[Y|\theta]}$. Indeed, $Y|\theta \sim N(\theta, \sigma^2)$ implies
that if X is an average of w iid claims $Y_1, Y_2, \ldots, Y_w$, given $\theta$, then
$X|\theta \sim N(\theta, \sigma^2/w)$. Similarly, if $Y|\theta \sim G(\theta, \alpha)$, then $X|\theta \sim G(\theta, w\alpha)$, and the
probability density function of $Y|\theta$ is
$$f(y|\theta) = \frac{1}{\Gamma(\alpha)} \left(\frac{\alpha}{\theta}\right)^{\alpha} y^{\alpha - 1} e^{-\alpha y/\theta}, \quad y > 0.$$
Finally, if $Y|\theta \sim \mathrm{InvG}(\theta, \lambda)$, then $X|\theta \sim \mathrm{InvG}(\theta, w\lambda)$ and the probability
density function of $Y|\theta$ is
$$f(y|\theta) = \sqrt{\frac{\lambda}{2\pi y^3}}\, \exp\!\left(-\frac{\lambda (y - \theta)^2}{2 y \theta^2}\right), \quad y > 0.$$
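To make the three parameterizations concrete, the brief sketch below (not from the paper) codes each density with mean $\theta$ and checks numerically that it integrates to one and has mean $\theta$; the values of $\theta$, $\sigma^2$, $\alpha$, and $\lambda$ are illustrative.

```python
# Sketch: the normal, gamma, and inverse Gaussian conditionals in the
# mean-theta parameterization above; each should integrate to 1 and have mean theta.
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def normal_pdf(y, theta, sigma2=100.0 ** 2):
    # N(theta, sigma2), fixed variance
    return np.exp(-(y - theta) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def gamma_pdf(y, theta, alpha=3.0):
    # gamma with mean theta and fixed shape alpha (rate alpha / theta)
    return np.exp(alpha * np.log(alpha / theta) + (alpha - 1) * np.log(y)
                  - alpha * y / theta - gammaln(alpha))

def invgauss_pdf(y, theta, lam=1500.0):
    # inverse Gaussian with mean theta and fixed lambda = theta^3 / Var[Y|theta]
    return np.sqrt(lam / (2 * np.pi * y ** 3)) * np.exp(-lam * (y - theta) ** 2 / (2 * y * theta ** 2))

theta = 500.0
cases = [("normal", normal_pdf, theta - 1000.0, theta + 1000.0),
         ("gamma", gamma_pdf, 0.0, np.inf),
         ("inverse Gaussian", invgauss_pdf, 0.0, np.inf)]
for name, pdf, lo, hi in cases:
    total = quad(lambda y: pdf(y, theta), lo, hi)[0]
    mean = quad(lambda y: y * pdf(y, theta), lo, hi)[0]
    print(f"{name:17s} integral = {total:.4f}, mean = {mean:.1f}")   # about 1.0 and 500
```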
$$\int K(t)\, dt = 1.$$
If we were to observe directly the conditional means $\theta_1, \theta_2, \ldots, \theta_r$, then the kernel
density estimate of $\pi(\theta)$ with kernel K would be given by
$$\tilde{\pi}(\theta) = \sum_{i=1}^{r} \frac{w_i}{w_{tot}}\, \frac{1}{h_i}\, K\!\left(\frac{\theta - \theta_i}{h_i}\right).$$
Because we do not observe the $\theta_i$, we replace them with the sample means $\bar{x}_i$. The
weight on the $i$th kernel is the proportion of total exposure attributable to the
claims for the $i$th risk so that the expectation of $\theta$ is the sample mean
$$\bar{x} = \frac{\sum_{i=1}^{r} w_i \bar{x}_i}{\sum_{i=1}^{r} w_i} = \frac{\sum_{i=1}^{r} \sum_{j=1}^{n_i} w_{ij} x_{ij}}{\sum_{i=1}^{r} \sum_{j=1}^{n_i} w_{ij}},$$
in which $w_i = \sum_{j=1}^{n_i} w_{ij}$. We, therefore, propose the following kernel density estimator for $\pi(\theta)$:
$$\hat{\pi}(\theta) = \sum_{i=1}^{r} \frac{w_i}{w_{tot}}\, \frac{1}{h_i}\, K\!\left(\frac{\theta - \bar{x}_i}{h_i}\right), \tag{2.2}$$
in which $w_{tot} = \sum_{i=1}^{r} w_i = \sum_{i=1}^{r} \sum_{j=1}^{n_i} w_{ij}$. See the Appendix for a discussion of the
asymptotic mean square consistency of $\hat{\pi}(\theta)$.
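The estimator is straightforward to code. The sketch below is an illustration, not the author's software: it builds $\hat{\pi}(\theta)$ from sample means $\bar{x}_i$, exposures $w_i$, and per-risk bandwidths $h_i$, using the Epanechnikov kernel on $(-\sqrt{5}, \sqrt{5})$ as in Silverman (1986); the portfolio values are made up for the example.

```python
# Sketch of the exposure-weighted kernel density estimator pi_hat(theta),
# built from the risks' sample means x_bar_i, exposures w_i, and bandwidths h_i.
import numpy as np

def epanechnikov(t):
    # Silverman's (1986) parameterization: support (-sqrt(5), sqrt(5)), variance 1
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) < np.sqrt(5), 0.75 / np.sqrt(5) * (1 - t ** 2 / 5), 0.0)

def pi_hat(theta, x_bar, w, h, kernel=epanechnikov):
    """Estimated prior density at the points `theta`, in the spirit of (2.2)."""
    theta = np.atleast_1d(theta).astype(float)
    w_tot = w.sum()
    # sum_i (w_i / w_tot) * (1 / h_i) * K((theta - x_bar_i) / h_i)
    out = np.zeros_like(theta)
    for xi, wi, hi in zip(x_bar, w, h):
        out += (wi / w_tot) / hi * kernel((theta - xi) / hi)
    return out

# Illustrative portfolio: 5 risks with sample means and total exposures
x_bar = np.array([420.0, 480.0, 510.0, 560.0, 650.0])
w = np.array([10.0, 25.0, 15.0, 40.0, 10.0])
h = np.full_like(x_bar, 60.0)             # common bandwidth, for illustration only

grid = np.linspace(200, 900, 8)
print(np.round(pi_hat(grid, x_bar, w, h), 6))
```

Because the kernel is symmetric, the mean of the estimated density is $\sum_i (w_i/w_{tot})\,\bar{x}_i = \bar{x}$, consistent with the requirement stated above.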
Two commonly used symmetric kernels are (1) the Gaussian kernel,
$$G(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2},$$
and (2) the Epanechnikov kernel,
$$E(t) = \begin{cases} \dfrac{3}{4\sqrt{5}}\left(1 - \dfrac{t^2}{5}\right), & |t| < \sqrt{5}, \\[2mm] 0, & \text{otherwise}. \end{cases}$$
In our example in Section 4, we use the Epanechnikov kernel because its domain
is bounded, and we can, therefore, easily restrict the support of $\hat{\pi}(\theta)$ to lie in the
positive real numbers.
Remark: The Epanechnikov is optimal with respect to mean integrated square
error (Silverman, 1986). The efficiency of the Gaussian kernel with respect to the
optimal Epanechnikov kernel is roughly 95% (Silverman, 1986), so one does not
lose much efficiency by using the Gaussian kernel. Silverman, therefore, suggests
that one choose the kernel according to auxiliary requirements, such as ease of
computing. •
There are many techniques for choosing the window width $h_i$; see, for
example, Silverman (1986, Section 3.4) and Jones, Marron, and Sheather (1996).
In our example in Section 4, we use a (modified) fixed window width selected by
reference to a standard distribution (Silverman, 1986, Section 3.4.2). The window
width h that minimizes the mean integrated squared error is given by
$$h = k_2^{-2/5} \left\{ \int K(t)^2\, dt \right\}^{1/5} \left\{ \int \pi''(\theta)^2\, d\theta \right\}^{-1/5} r^{-1/5},$$
in which $k_2 = \int t^2 K(t)\, dt$. To approximate this optimal window width h, one assumes that $\pi(\theta)$ is, say,
normal, with mean 0 and standard deviation $\sigma$. In that case, the term $\int \pi''(\theta)^2\, d\theta$
equals $\frac{3}{8}\pi^{-1/2}\sigma^{-5}$. We modify the window width h at each point $\bar{x}_i$ to ensure that
the density has support on the nonnegative real numbers. Specifically, we set $h_i$
equal to h if $h < \bar{x}_i/\sqrt{5}$; otherwise, we set $h_i$ equal to $\bar{x}_i/\sqrt{5}$.
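The following sketch implements this bandwidth recipe under the normal reference distribution and the Epanechnikov kernel above. How $\sigma$ is estimated is not specified in the lines above, so the sketch simply uses the sample standard deviation of the $\bar{x}_i$, which is an assumption of the illustration, as are the portfolio values.

```python
# Sketch: normal-reference bandwidth for the Epanechnikov kernel
# (Silverman, 1986, Section 3.4.2), then the per-risk modification
# h_i = min(h, x_bar_i / sqrt(5)) so that pi_hat lives on [0, infinity).
import numpy as np

def reference_bandwidth(x_bar):
    """h = k2^(-2/5) (int K^2)^(1/5) (int pi''^2)^(-1/5) r^(-1/5),
    assuming pi is normal with standard deviation sigma."""
    r = len(x_bar)
    sigma = np.std(x_bar, ddof=1)              # crude plug-in estimate of sigma (assumption)
    k2 = 1.0                                   # int t^2 K(t) dt for the Epanechnikov on (-sqrt5, sqrt5)
    int_K2 = 3.0 / (5.0 * np.sqrt(5.0))        # int K(t)^2 dt
    int_pi2 = 3.0 / (8.0 * np.sqrt(np.pi) * sigma ** 5)   # int pi''(theta)^2 d(theta) under normality
    return k2 ** (-0.4) * int_K2 ** 0.2 * int_pi2 ** (-0.2) * r ** (-0.2)

def truncated_bandwidths(x_bar, h):
    # keep the support of each kernel term inside the nonnegative reals
    return np.minimum(h, x_bar / np.sqrt(5.0))

x_bar = np.array([420.0, 480.0, 510.0, 560.0, 650.0])   # illustrative sample means
h = reference_bandwidth(x_bar)
print("h =", round(h, 1), "  h_i =", np.round(truncated_bandwidths(x_bar, h), 1))
```

For this kernel the constants work out to roughly $h \approx 1.05\,\sigma\, r^{-1/5}$, the Epanechnikov analogue of the familiar $1.06\,\sigma\, r^{-1/5}$ Gaussian rule.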
For a general kernel K and bandwidths $h_i$, this estimated posterior mean of $\theta$ can
be written
$$\hat{\mu}(\bar{x}) = \frac{\displaystyle \sum_{i=1}^{r} \frac{w_i}{w_{tot}\, h_i} \int \theta\, f(\bar{x}|\theta)\, K\!\left(\frac{\theta - \bar{x}_i}{h_i}\right) d\theta}{\displaystyle \sum_{i=1}^{r} \frac{w_i}{w_{tot}\, h_i} \int f(\bar{x}|\theta)\, K\!\left(\frac{\theta - \bar{x}_i}{h_i}\right) d\theta}.$$
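This ratio can be evaluated with one-dimensional numerical integration. The sketch below (illustrative, not the paper's code) does so for a gamma conditional with shape $\alpha$, observed as the average of w claims, under the exposure-weighted kernel prior; the portfolio values, $\alpha$, w, and the new observation are assumptions made up for the example.

```python
# Sketch: the estimated posterior mean mu_hat(x_bar) as a ratio of two
# one-dimensional integrals, with a gamma conditional (shape alpha) for an
# average of w claims and the exposure-weighted Epanechnikov kernel prior.
import numpy as np
from scipy.stats import gamma as gamma_dist
from scipy.integrate import quad

def epanechnikov(t):
    return np.where(np.abs(t) < np.sqrt(5), 0.75 / np.sqrt(5) * (1 - t ** 2 / 5), 0.0)

def pi_hat(theta, x_bar, w, h):
    w_tot = w.sum()
    return sum((wi / w_tot) / hi * epanechnikov((theta - xi) / hi)
               for xi, wi, hi in zip(x_bar, w, h))

def f_cond(x, theta, alpha, w_exp):
    # density of the average of w_exp iid gamma claims with mean theta and shape alpha:
    # again gamma, with shape w_exp * alpha and mean theta
    shape = w_exp * alpha
    return gamma_dist.pdf(x, a=shape, scale=theta / shape)

def mu_hat(x, x_bar, w, h, alpha, w_exp):
    # integrate over the (compact) support of pi_hat
    lo = max(0.0, float(np.min(x_bar - np.sqrt(5) * h)))
    hi = float(np.max(x_bar + np.sqrt(5) * h))
    integrand = lambda t: f_cond(x, t, alpha, w_exp) * pi_hat(t, x_bar, w, h)
    num = quad(lambda t: t * integrand(t), lo, hi)[0]
    den = quad(integrand, lo, hi)[0]
    return num / den

# Illustrative portfolio and a new observation
x_bar = np.array([420.0, 480.0, 510.0, 560.0, 650.0])
w = np.array([10.0, 25.0, 15.0, 40.0, 10.0])
h = np.minimum(60.0, x_bar / np.sqrt(5.0))
print(round(mu_hat(700.0, x_bar, w, h, alpha=3.0, w_exp=5), 1))
```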
Recall that $\bar{x}$ is an average of w iid claims, each of which follows the density
$f(y|\theta)$, as in Section 2.1. If we constrain the estimator d to be linear, then it is well-known
that the least-squares linear estimator of $E[Y|\bar{x}] = E[\theta|\bar{x}]$ is
$$d(\bar{x}) = (1 - Z)\, E[Y] + Z\bar{x}, \tag{3.2}$$
in which $Z = \dfrac{w}{w + k}$ and $k = \dfrac{E[\operatorname{Var}[Y|\theta]]}{\operatorname{Var}[\theta]}$. Under
the prior density (2.2), we obtain $E[Y] = E[\theta] = \bar{x}$, as noted in Section 2.2. In the
case of the normal conditional, $k = \dfrac{\sigma^2}{\operatorname{Var}[\theta]}$; in the case of the gamma conditional,
$k = \dfrac{E[\theta^2]}{\alpha\, \operatorname{Var}[\theta]}$; and in the case of the inverse gaussian conditional, $k = \dfrac{E[\theta^3]}{\lambda\, \operatorname{Var}[\theta]}$.
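To make the linear formula concrete under the gamma conditional, here is a sketch with illustrative assumptions throughout: it computes the moments of $\theta$ under the kernel prior in closed form (each kernel component has mean $\bar{x}_i$ and variance $h_i^2$ for the variance-one Epanechnikov kernel), then the resulting k, Z, and linear estimate.

```python
# Sketch: the linear (Buhlmann-type) estimator d(x) = (1 - Z) E[Y] + Z x with
# Z = w / (w + k), where the moments of theta are taken under the kernel prior
# and the conditional is gamma with fixed shape alpha, so Var[Y|theta] = theta^2 / alpha.
import numpy as np

def linear_credibility(x, w_exp, x_bar, w, h, alpha, kernel_var=1.0):
    p = w / w.sum()                            # mixture weights w_i / w_tot
    mean_theta = np.sum(p * x_bar)             # E[theta] = overall sample mean
    second = np.sum(p * (x_bar ** 2 + kernel_var * h ** 2))   # E[theta^2] for a variance-1 kernel
    var_theta = second - mean_theta ** 2       # variance of the hypothetical means
    epv = second / alpha                       # E[Var[Y|theta]] = E[theta^2] / alpha
    k = epv / var_theta
    Z = w_exp / (w_exp + k)
    return (1 - Z) * mean_theta + Z * x, Z

x_bar = np.array([420.0, 480.0, 510.0, 560.0, 650.0])
w = np.array([10.0, 25.0, 15.0, 40.0, 10.0])
h = np.minimum(60.0, x_bar / np.sqrt(5.0))
est, Z = linear_credibility(x=700.0, w_exp=5, x_bar=x_bar, w=w, h=h, alpha=3.0)
print(f"Z = {Z:.3f}, d(700) = {est:.1f}")
```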
To end this section, we show that as w approaches $\infty$, $\hat{\mu}(\bar{x})$ approaches the
true expected value $\theta_0$ for the given risk. Because $\bar{X}|\theta$ has mean $\theta$ and variance
$\dfrac{\operatorname{Var}[Y|\theta]}{w}$, under certain regularity conditions (DeGroot, 1970; Walker,
1969), the density $f(\bar{x}|\theta)$ approaches the delta function with its mass concentrated
at the point $\bar{x} = \theta_0$. Then,
$$\hat{\mu}(\bar{x}) = \frac{\int \theta\, f(\bar{x}|\theta)\, \hat{\pi}(\theta)\, d\theta}{\int f(\bar{x}|\theta)\, \hat{\pi}(\theta)\, d\theta} \longrightarrow \theta_0.$$
Thus, as an actuary gets more claim information for a given policyholder (w gets
large), the estimated expected claim approaches the true expected claim with
probability 1.
$$f(y|\phi) = \frac{1}{\sqrt{2\pi}\, \sigma y}\, \exp\!\left(-\frac{(\ln y - \ln \phi)^2}{2\sigma^2}\right), \quad y > 0,$$
in which $\sigma > 0$ is a known parameter, and
$$\pi(\phi) = \frac{1}{\sqrt{2\pi}\, \tau \phi}\, \exp\!\left(-\frac{(\ln \phi - \ln \mu)^2}{2\tau^2}\right), \quad \phi > 0,$$
in which $\mu > 0$ and $\tau > 0$ are known parameters. That is, $(\ln Y)|\phi \sim
N(\ln \phi, \sigma^2)$, and $\ln \phi \sim N(\ln \mu, \tau^2)$. The marginal distribution of X is lognormal;
$\ln X \sim N(\ln \mu, \sigma^2 + \tau^2)$.
Given claim data for a specific policyholder, $\mathbf{X} = \mathbf{x} = \langle x_1, x_2, \ldots, x_n \rangle
\in [0, \infty)^n$, the posterior distribution of $\phi|\mathbf{x}$ is lognormal; $(\ln \phi)|\mathbf{x} \sim N(\ln \mu^*, \tau^{*2})$, in which
$$\mu^* = \exp\!\left(\frac{\sigma^2 \ln \mu + \tau^2 \sum_{i=1}^{n} \ln x_i}{\sigma^2 + n\tau^2}\right)$$
and
$$\tau^{*2} = \frac{\sigma^2 \tau^2}{\sigma^2 + n\tau^2}.$$
The true predictive mean is, therefore,
$$E[X_{n+1}|\mathbf{x}] = \mu^* \exp\!\left(\frac{\sigma^2 \left(\sigma^2 + (n+1)\tau^2\right)}{2\left(\sigma^2 + n\tau^2\right)}\right). \tag{4.1}$$
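As a check on the algebra behind (4.1), the sketch below computes $\mu^*$, $\tau^{*2}$, and the predictive mean, and verifies the closed form by Monte Carlo; the values of $\mu$, $\sigma^2$, $\tau^2$, and the single observed claim are illustrative, not the parameters used in the paper's simulation.

```python
# Sketch: the lognormal-over-lognormal posterior and the predictive mean (4.1),
# with a small Monte Carlo check of the closed form.  All values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, tau2 = 500.0, 0.4, 0.6           # prior median mu, known sigma^2 and tau^2
x = np.array([700.0])                         # observed claims (n = 1, as in the example)
n = len(x)

# Posterior of ln(phi): normal with the mean and variance below
post_var = sigma2 * tau2 / (sigma2 + n * tau2)                      # tau*^2
post_logmean = (sigma2 * np.log(mu) + tau2 * np.log(x).sum()) / (sigma2 + n * tau2)
mu_star = np.exp(post_logmean)

# Equation (4.1)
pred_mean = mu_star * np.exp(sigma2 * (sigma2 + (n + 1) * tau2) / (2 * (sigma2 + n * tau2)))

# Monte Carlo check: draw phi from the posterior, then average E[X|phi] = phi * exp(sigma^2 / 2)
phi = np.exp(rng.normal(post_logmean, np.sqrt(post_var), size=200_000))
print(round(pred_mean, 1), round((phi * np.exp(sigma2 / 2)).mean(), 1))   # should agree closely
```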
$$\mathrm{EPV} = E[\operatorname{Var}[X|\phi]]$$
(Willmot, 1994, Section 5.1).
TABLE 4.1
DESCRIPTIVE STATISTICS OF h, MSE, MSEB, AND RATIO
with vertex at 542. See Figure 4.1 for a graph of this quadratic superimposed on a
scatter plot of the observations.
We also computed some of the mean squared errors up to the 99th percentile
and found that the estimated predictive mean compared poorly relative to the
Bühlmann credibility estimator. We conclude that our estimate of the prior
density at larger conditional means may suffer.

FIGURE 4.2: Estimated and True Marginal Densities of Claims.

Silverman (1986) suggests a variable bandwidth approach for estimating densities
with long tails, which uses larger bandwidths in the regions of lower density.
We tried this method without
increased accuracy in the upper percentiles of our claim estimator. We suspect
that the poor fit at the higher percentiles may be due to our using a medium-tailed
gamma conditional to model a heavy-tailed lognormal. We encourage the
interested reader to investigate using an inverse gaussian instead of a gamma
conditional to model the conditional claim distribution.
See Figure 4.2 for graphs of the estimated and true marginal densities of X for
one of the simulations¹. Of the graphs we plotted, Figure 4.2 is typical, in that the
estimated marginal density of X is less skewed than the true density.

¹ In this run, h = 476, MSE = 12,076, and MSEB = 84,571. Recall that n = 1 and that the claim
amount 6,500 is the 95th percentile of X.
See Figure 4.3 for the corresponding graphs of the estimated and true
predictive means. Notice how closely the estimated predictive mean follows the
true predictive mean, compared with the linear Bühlmann estimator for claims
less than 4000. Also note how the estimated predictive mean diverges upward for
claims larger than 4000. This phenomenon occurred in all of the several graphs
that we plotted and is due, we believe, to the fact that we used a gamma
conditional to estimate a lognormal. It may also be due to computational errors
because there are only a few simulated claims in the right tail. One way to adjust
the estimated predictive mean to eliminate this divergence is to extend it linearly
beyond some large value of the sample mean. Another solution may be to use a
conditional distribution with a longer tail, such as the inverse gaussian. Yet
another solution may be to apply my method of blending the criteria of accuracy
and linearity (Young, 1997). Also, it would be interesting if one were to
extend the model to include a trend component, as in Hachemeister (1975), and
apply kernel density estimation in the more general model.
ACKNOWLEDGMENTS
I thank the Committee for Knowledge Extension and Research of the Society of
Actuaries (SOA) and the Actuarial and Education Research Fund for financial
support. I especially thank my SOA Project Oversight Group (Hans Gerber and
Gary Venter, led by Thomas Herzog and assisted by Warren Luckner of the
SOA) for helpful guidance.
APPENDIX
ASYMPTOTIC MEAN SQUARE CONSISTENCY OF (2.2)
Let $\tilde{\pi}(\theta) = \sum_{i=1}^{r} \frac{w_i}{w_{tot}}\, \frac{1}{h_i}\, K\!\left(\frac{\theta - \theta_i}{h_i}\right)$ denote the kernel density estimator of $\pi$ when we
are given observations $\theta_i$, $i = 1, 2, \ldots, r$. Consider the mean squared error of the
density estimate $\hat{\pi}$ at a fixed value $\theta$:
$$E\big[(\hat{\pi}(\theta) - \pi(\theta))^2\big] = E\big[(\hat{\pi}(\theta) - \tilde{\pi}(\theta))^2\big]
+ 2\, E\big[(\hat{\pi}(\theta) - \tilde{\pi}(\theta))(\tilde{\pi}(\theta) - \pi(\theta))\big]
+ E\big[(\tilde{\pi}(\theta) - \pi(\theta))^2\big].$$
By the law of large numbers (Serfling, 1980), $\bar{x}_i$ approaches $\theta_i$, with probability
one, as $w_i$ approaches infinity. Therefore, as $w_i$ approaches infinity, the first term
in the mean squared error goes to zero. By Silverman (1986) or Thompson and
Tapia (1990), the second and third terms go to zero as r goes to infinity if
$\lim_{r \to \infty} h_i = 0$ and $\lim_{r \to \infty} r h_i = \infty$.
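A small simulation can illustrate this consistency numerically. In the sketch below the true prior, the gamma conditional, the bandwidth rule, and all numerical values are assumptions of the illustration, with equal exposures so that the weights $w_i/w_{tot}$ reduce to $1/r$; the estimated mean squared error of $\hat{\pi}$ at a fixed $\theta_0$ should shrink as r and w grow, with $h \to 0$ and $rh \to \infty$.

```python
# Sketch: Monte Carlo estimate of the MSE of pi_hat at a fixed theta0 as the
# number of risks r and the exposures w grow.  Equal exposures, so the weights
# w_i / w_tot reduce to 1 / r.  All distributions and values are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def epanechnikov(t):
    return np.where(np.abs(t) < np.sqrt(5), 0.75 / np.sqrt(5) * (1 - t ** 2 / 5), 0.0)

def pi_hat(theta, x_bar, h):
    return np.mean(epanechnikov((theta - x_bar) / h) / h)

alpha = 3.0                                   # gamma shape of the conditional
true_pi = lambda t: np.exp(-(np.log(t) - np.log(500)) ** 2 / (2 * 0.25)) / (t * np.sqrt(2 * np.pi * 0.25))
theta0 = 500.0

for r, w in [(50, 5), (200, 20), (800, 80)]:
    # rough normal-reference bandwidth 1.05 * sigma * r^(-1/5), with sigma taken as 0.5 * 500
    h = 1.05 * 0.5 * 500 * r ** (-0.2)
    sq_err = []
    for _ in range(200):                      # replications to estimate the MSE at theta0
        theta = np.exp(rng.normal(np.log(500), 0.5, size=r))            # theta_i from the prior
        x_bar = rng.gamma(shape=w * alpha, scale=theta / (w * alpha))   # sample means given theta_i
        sq_err.append((pi_hat(theta0, x_bar, h) - true_pi(theta0)) ** 2)
    print(f"r = {r:4d}, w = {w:3d}:  MSE at theta0 ~ {np.mean(sq_err):.2e}")
```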
REFERENCES
JONES, M. C., J. S. MARRON, and S. J. SHEATHER (1996), A brief survey of bandwidth selection for
density estimation, Journal of the American Statistical Association, 91: 401-407.
KLUGMAN, S. A. (1992), Bayesian Statistics in Actuarial Science with Emphasis on Credibility, Kluwer,
Boston.
SERFLING, R. J. (1980), Approximation Theorems of Mathematical Statistics, Wiley, New York.
SILVERMAN, B. W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall,
London.
THOMPSON, J. R. and R. A. TAPIA (1990), Nonparametric Function Estimation, Modeling, and Simulation,
Society for Industrial and Applied Mathematics, Philadelphia.
WALKER, A. M. (1969), On the asymptotic behaviour of posterior distributions, Journal of the Royal
Statistical Society, Series B, 31: 80-88.
WILLMOT, G. E. (1994), Introductory Credibility Theory, Institute of Insurance and Pension Research,
University of Waterloo, Waterloo, Ontario.
YOUNG, V. R. (1997), Credibility using a loss function from spline theory: Parametric models with a
one-dimensional sufficient statistic, to appear in North American Actuarial Journal.
VIRGINIA R. YOUNG
School of Business
975 University Avenue
University of Wisconsin-Madison
Madison, WI, USA 53706