CourseWork1 20122013 Solutions

This document discusses solutions to coursework problems involving statistics and probabilistic modelling for insurance. The first problem involves a uniform distribution on a disc of radius R: its joint density, its marginal densities and the distribution of the distance from the centre. The second problem involves fitting a lognormal and a Pareto distribution to sample data and performing a goodness-of-fit test. The third problem calculates properties of an excess-of-loss model, such as the expected value of the retained loss. The fourth problem involves modelling individual claims as the product of an indicator variable and a claim severity variable, and estimating parameters using maximum likelihood. The fifth problem computes the mean and variance of aggregate claims in collective risk models.

Uploaded by

yous123
Copyright
© Attribution Non-Commercial (BY-NC)

Statistics and Probabilistic Modelling for Insurance

Solutions to Course Work No1 2012/2013


1. (a) Since $f(x, y)$ is a joint density, we have that

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dy\,dx = c \iint_{x^2+y^2 \le R^2} dy\,dx = 1. \tag{1}$$

Noting that the double integral is equal to the area of the circle, i.e.,

$$\iint_{x^2+y^2 \le R^2} dy\,dx = \pi R^2,$$

from (1) we obtain that $c = \dfrac{1}{\pi R^2}$.
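(b) We have

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \frac{1}{\pi R^2}\int_{x^2+y^2 \le R^2} dy = \frac{1}{\pi R^2}\int_{-\sqrt{R^2-x^2}}^{\sqrt{R^2-x^2}} dy = \frac{2}{\pi R^2}\sqrt{R^2-x^2},$$

if $x^2 \le R^2$, and $f_X(x) = 0$ if $x^2 > R^2$. By symmetry, the marginal density of $Y$ is given by

$$f_Y(y) = \begin{cases} \dfrac{2}{\pi R^2}\sqrt{R^2-y^2} & \text{for } y^2 \le R^2, \\ 0 & \text{for } y^2 > R^2. \end{cases}$$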
(c) The distribution function, $F_D(a)$, $0 \le a \le R$, of the distance $D = \sqrt{X^2+Y^2}$ is obtained as follows:

$$F_D(a) = P(D \le a) = P\left(\sqrt{X^2+Y^2} \le a\right) = P\left[X^2+Y^2 \le a^2\right] = \iint_{x^2+y^2 \le a^2} f(x, y)\,dy\,dx = \frac{1}{\pi R^2}\iint_{x^2+y^2 \le a^2} dy\,dx = \frac{\pi a^2}{\pi R^2} = \frac{a^2}{R^2},$$

where we have used the fact that $\iint_{x^2+y^2 \le a^2} dy\,dx$ is the area of a circle of radius $a$ and thus is equal to $\pi a^2$. When $R = 50$ miles and $a = 10$ miles we obtain

$$P(D \le 10) = \frac{100}{2500} = 0.04.$$
(d) Using the distribution function, $F_D(a)$, from part (c) we obtain

$$f_D(a) = \frac{\partial}{\partial a} F_D(a) = \frac{2a}{R^2}, \qquad 0 \le a \le R.$$

Hence, we have

$$E[D] = \int_0^R a\,\frac{2a}{R^2}\,da = \frac{2}{R^2}\int_0^R a^2\,da = \frac{2R}{3}.$$
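The results of parts (c) and (d) can be checked numerically. The following is a minimal sketch, assuming points are drawn uniformly on the disc by rejection sampling from the enclosing square; the function names, the sample size and the seed are illustrative choices and are not part of the coursework.

```python
import math
import random


def sample_disc(radius, rng):
    """Draw one point uniformly from the disc x^2 + y^2 <= radius^2 by rejection."""
    while True:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            return x, y


def simulate_distance(radius=50.0, a=10.0, n=200_000, seed=0):
    """Estimate P(D <= a) and E[D] for D = sqrt(X^2 + Y^2)."""
    rng = random.Random(seed)
    hits = 0
    total = 0.0
    for _ in range(n):
        x, y = sample_disc(radius, rng)
        d = math.hypot(x, y)
        hits += d <= a          # a bool counts as 0 or 1
        total += d
    return hits / n, total / n


p_hat, mean_hat = simulate_distance()
print(p_hat, 10.0**2 / 50.0**2)   # simulated estimate next to a^2 / R^2
print(mean_hat, 2 * 50.0 / 3)     # simulated estimate next to 2R / 3
```

The estimates should sit close to the closed-form values, with the usual Monte Carlo error of order $1/\sqrt{n}$.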
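2. (a) Equating the theoretical mean and variance to the empirical mean, $\bar{x}$, and variance, we get

$$e^{\mu+\frac{1}{2}\sigma^2} = \bar{x}, \qquad e^{2\mu+\sigma^2}\left[e^{\sigma^2}-1\right] = \frac{\sum_{i=1}^{120}(x_i-\bar{x})^2}{119}$$

$$\Rightarrow\quad e^{2\mu+\sigma^2} = \left(\frac{\sum_{i=1}^{120} x_i}{120}\right)^2, \qquad e^{2\mu+\sigma^2}\left[e^{\sigma^2}-1\right] = \frac{\sum_{i=1}^{120}(x_i-\bar{x})^2}{119}$$

$$\Rightarrow\quad e^{\sigma^2}-1 = \frac{\sum_{i=1}^{120}(x_i-\bar{x})^2 / 119}{\bar{x}^2}.$$

Hence, we have

$$e^{\sigma^2}-1 = \frac{15601373}{2020.292^2} \;\Rightarrow\; \sigma^2 = \log\left[1+\frac{15601373}{2020.292^2}\right] = 1.57327 \;\Rightarrow\; \sigma = 1.2543$$

and

$$e^{\mu+\frac{1}{2}\sigma^2} = 2020.292 \;\Rightarrow\; \mu = \log[2020.292] - \tfrac{1}{2}\sigma^2 = 6.82436.$$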
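The arithmetic of the moment matching can be reproduced in a few lines. This is a small sketch that takes the sample mean 2020.292 and the sample variance 15601373 quoted above as given; the function name is an illustrative choice.

```python
import math


def lognormal_moment_fit(sample_mean, sample_var):
    """Method-of-moments estimates (mu, sigma^2) for a lognormal distribution."""
    sigma2 = math.log(1.0 + sample_var / sample_mean**2)
    mu = math.log(sample_mean) - 0.5 * sigma2
    return mu, sigma2


mu, sigma2 = lognormal_moment_fit(2020.292, 15601373.0)
sigma = math.sqrt(sigma2)
print(mu, sigma2, sigma)
```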
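(b) The likelihood function takes the form

$$L(\alpha, \lambda) = \prod_{i=1}^{n} f_X(x_i) = \prod_{i=1}^{n} \frac{\alpha\lambda^{\alpha}}{(\lambda+x_i)^{\alpha+1}}$$

and hence

$$\log L(\alpha, \lambda) = l(\alpha, \lambda) = \log\prod_{i=1}^{n}\frac{\alpha\lambda^{\alpha}}{(\lambda+x_i)^{\alpha+1}} = \sum_{i=1}^{n}\left[\log(\alpha\lambda^{\alpha}) - \log\left((\lambda+x_i)^{\alpha+1}\right)\right] = \sum_{i=1}^{n}\left(\log(\alpha) + \alpha\log(\lambda) - (\alpha+1)\log(\lambda+x_i)\right)$$

$$= n\log(\alpha) + n\alpha\log(\lambda) - \sum_{i=1}^{n}(\alpha+1)\log(\lambda+x_i).$$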
SPMI CW1 2012/2013 Solutions
2
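Differentiating the log-likelihood function with respect to $\alpha$ and $\lambda$, and then solving for $\alpha$, one finds that the maximum likelihood estimators must satisfy

$$\frac{\partial \log L(\alpha, \lambda)}{\partial \alpha} = \frac{n}{\alpha} + n\log(\lambda) - \sum_{i=1}^{n}\log(\lambda+x_i) = 0 \;\Rightarrow\; \hat{\alpha} = \frac{n}{\sum_{i=1}^{n}\log(\lambda+x_i) - n\log(\lambda)},$$

$$\frac{\partial \log L(\alpha, \lambda)}{\partial \lambda} = \frac{n\alpha}{\lambda} - \sum_{i=1}^{n}\frac{\alpha+1}{\lambda+x_i} = 0 \;\Rightarrow\; \hat{\alpha} = \frac{\sum_{i=1}^{n}\frac{1}{\lambda+x_i}}{\frac{n}{\lambda} - \sum_{i=1}^{n}\frac{1}{\lambda+x_i}}.$$

Hence, the maximum likelihood estimator $\hat{\lambda}$ must be a solution of

$$\frac{n}{\sum_{i=1}^{n}\log(\lambda+x_i) - n\log(\lambda)} = \frac{\sum_{i=1}^{n}\frac{1}{\lambda+x_i}}{\frac{n}{\lambda} - \sum_{i=1}^{n}\frac{1}{\lambda+x_i}},$$

which may be solved in e.g. Excel. Thus, we obtain $\hat{\lambda} = 1872.13$ and $\hat{\alpha} = 1.880467$.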
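As an alternative to a spreadsheet solver, the equation for $\hat{\lambda}$ can be solved by bisection on the difference between the two expressions for $\hat{\alpha}$. The sketch below is one possible approach, not the method used in the coursework. The 120 observations are not reproduced in this document, so the example runs on a synthetic Pareto sample generated by inverse transform; the sample, the seed, the bracket-widening rule and all function names are assumptions made for illustration.

```python
import math
import random


def alpha_given_lambda(lam, xs):
    """Root of d logL / d alpha = 0 for fixed lambda: n / sum(log(1 + x/lam))."""
    return len(xs) / sum(math.log1p(x / lam) for x in xs)


def alpha_from_lambda_score(lam, xs):
    """Root of d logL / d lambda = 0 for fixed lambda.

    Algebraically equal to sum(1/(lam+x)) / (n/lam - sum(1/(lam+x))),
    rewritten to avoid cancellation when lam is large.
    """
    num = sum(1.0 / (lam + x) for x in xs)
    den = sum(x / (lam + x) for x in xs) / lam
    return num / den


def fit_pareto(xs, lam_lo=1e-3, lam_max=1e12, iters=100):
    """Solve the two score equations for (alpha, lambda) by bisection in lambda."""
    def gap(lam):
        return alpha_given_lambda(lam, xs) - alpha_from_lambda_score(lam, xs)

    g_lo = gap(lam_lo)
    lam_hi = 2.0 * lam_lo
    while g_lo * gap(lam_hi) > 0.0:       # widen until the gap changes sign
        lam_hi *= 2.0
        if lam_hi > lam_max:
            raise ValueError("no sign change found; the MLE may not exist")
    for _ in range(iters):                # plain bisection
        mid = 0.5 * (lam_lo + lam_hi)
        g_mid = gap(mid)
        if g_lo * g_mid <= 0.0:
            lam_hi = mid
        else:
            lam_lo, g_lo = mid, g_mid
    lam_hat = 0.5 * (lam_lo + lam_hi)
    return alpha_given_lambda(lam_hat, xs), lam_hat


# Synthetic sample (NOT the coursework data), drawn by inverting the Pareto cdf.
rng = random.Random(1)
true_alpha, true_lam = 1.880467, 1872.13
sample = [true_lam * ((1.0 - rng.random()) ** (-1.0 / true_alpha) - 1.0)
          for _ in range(5000)]
alpha_hat, lam_hat = fit_pareto(sample)
print(alpha_hat, lam_hat)
```

Applied to the actual 120 observations, the same routine would be expected to reproduce the estimates quoted above, up to solver tolerance.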
(c) Note that both the Lognormal and Pareto distributions are defined on $(0, \infty)$ and you need to have $\sum_i E_i = 120 = \sum_i O_i$.
(i) One appropriate choice is to use 10 equally probable bins as determined by the fitted Lognormal distribution. So, in the case of the Lognormal with $\mu = 6.82436$ and $\sigma = 1.2543$, the expected number of data in each bin $[L, U]$ is calculated as

$$120\,P(L < x \le U) = 120\,(F(U) - F(L)) = 120 \times 0.1.$$

We need to compare the test statistic with $\chi^2$ with df = (10-2-1) = 7. The critical value at 5% is 14.07 and so we reject the hypothesis that the data follow the Lognormal distribution with $\mu = 6.82436$ and $\sigma = 1.2543$. The critical value at 1% is 18.48 and so we do not reject the Lognormal fit at that level. The p-value of the test statistic is 0.018503 and of course leads to the same conclusions.
(ii) One appropriate choice is to use 10 equally probable bins as determined by the fitted Pareto distribution (note that there is no built-in inverse Pareto distribution function in Excel, but the distribution function can be inverted explicitly in order to find the bins). So, in the case of the Pareto with $\alpha = 1.8805$ and $\lambda = 1872.13$, the expected number of data in each bin $[L, U]$ is calculated as

$$120\,P(L < x \le U) = 120\,(F(U) - F(L)) = 120 \times 0.1.$$

We need to compare the test statistic with $\chi^2$ with df = (10-2-1) = 7. The critical value at 5% is 14.07 and so we do not reject the hypothesis that the data follow the Pareto distribution with $\alpha = 1.8805$ and $\lambda = 1872.13$. The critical value at 1% is 18.48 and so we again do not reject the Pareto fit. The p-value of the test statistic is 0.277482 and of course leads to the same conclusions.
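The explicit inversion mentioned in (ii) follows from the Pareto cdf $F(x) = 1 - (\lambda/(\lambda+x))^{\alpha}$, which gives the quantile $x_p = \lambda\left((1-p)^{-1/\alpha} - 1\right)$. The sketch below builds the interior edges of 10 equally probable bins and evaluates the statistic $\sum_i (O_i - E_i)^2/E_i$. The observed counts in it are hypothetical placeholders that sum to 120; they are not the coursework data, and the function names are illustrative.

```python
def pareto_cdf(x, alpha, lam):
    """F(x) = 1 - (lam / (lam + x))^alpha for x >= 0."""
    return 1.0 - (lam / (lam + x)) ** alpha


def pareto_quantile(p, alpha, lam):
    """Inverse of pareto_cdf: lam * ((1 - p)^(-1/alpha) - 1)."""
    return lam * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


def equiprobable_edges(alpha, lam, bins=10):
    """Interior bin edges; the first bin starts at 0 and the last is unbounded."""
    return [pareto_quantile(k / bins, alpha, lam) for k in range(1, bins)]


def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (O_i - E_i)^2 / E_i over the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


edges = equiprobable_edges(1.8805, 1872.13)
expected = [120 * 0.1] * 10

# Hypothetical observed counts, for illustration only (they sum to 120).
observed_demo = [10, 14, 12, 11, 13, 12, 9, 15, 12, 12]
stat_demo = chi_square_statistic(observed_demo, expected)
print(edges)
print(stat_demo)
```

The resulting statistic would then be compared with the $\chi^2_7$ critical values 14.07 (5%) and 18.48 (1%) quoted above.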
(iii) Recall that in order for the Goodness-of-Fit test to be valid one needs $E_i \ge 5$, so in both cases (i) and (ii) the test is reliable. The quality of the test is also reaffirmed by the high number of observations, namely 120, used to perform it.
Based on the test statistics one can say that the Pareto fit is better than the Lognormal fit: it has a much smaller test statistic and thus a higher p-value.
This particular choice of bins ensures $O_i > 5$, $i = 1, \ldots, 10$, and simplifies the computation of $E_i$, $i = 1, \ldots, 10$. Other choices are also possible and, clearly, the test statistic will be influenced by the bins selected. However, given the above calculations it is unlikely that the conclusions drawn would be affected.
3. It will be instructive to use the notation $M$ for the retention level and $L$ for the limiting level ($M = 20$, $L = 60$).
(a) It is not difficult to see from the definition of $F_Z(x)$ that the partial density function of its continuous part is $f_Z(z) = f_X(z+M) = c\,\exp\{-c\,(M+z)\}$, if $0 < z < L-M$, where $f_X(\cdot)$ is the density of the original individual claims $X$.
Hence, one can conclude that the r.v. $X$ has an exponential distribution with parameter $c$, i.e., $f_X(x) = c\,e^{-cx}$.
(b) We have

$$Z = \min[\max[0, X-M],\, L-M] = \min[\max[0, X-20],\, 40].$$

(c) Applying similar reasoning as in the lectures,

$$E[Z] = \int_M^L (1-F_X(x))\,dx = \int_M^L \left(1-(1-e^{-cx})\right)dx = \int_M^L e^{-cx}\,dx = \frac{e^{-cM}-e^{-cL}}{c}.$$

Alternatively, but much longer,

$$E[Z] = \int_0^{\infty} \min[\max[0, x-M],\, L-M]\, f_X(x)\,dx$$
$$= \int_M^L (x-M)\, f_X(x)\,dx + (L-M)\int_L^{+\infty} f_X(x)\,dx$$
$$= \int_0^{L-M} y\, f_X(y+M)\,dy + (L-M)\,(1-F_X(L))$$
$$= \int_0^{L-M} y\, c\, e^{-c(y+M)}\,dy + (L-M)\,e^{-cL}$$
$$= c\, e^{-cM}\int_0^{L-M} y\, e^{-cy}\,dy + (L-M)\,e^{-cL}$$
$$= -e^{-cM}\left[(L-M)\,e^{-c(L-M)} + \frac{1}{c}\left(e^{-c(L-M)}-1\right)\right] + (L-M)\,e^{-cL}$$
$$= \frac{e^{-cM}-e^{-cL}}{c}.$$

Hence,

$$E[Z] = \frac{e^{-cM}-e^{-cL}}{c} = \frac{e^{-0.1 \cdot 20}-e^{-0.1 \cdot 60}}{0.1} = 1.32857.$$
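The closed form for $E[Z]$ can be cross-checked against a direct numerical integration of the survival function $e^{-cx}$ over $[M, L]$. This is a minimal sketch using the midpoint rule; the function names and the number of steps are illustrative choices.

```python
import math


def expected_layer(c, m, l):
    """Closed form E[Z] = (exp(-c*m) - exp(-c*l)) / c for X ~ Exp(c)."""
    return (math.exp(-c * m) - math.exp(-c * l)) / c


def expected_layer_numeric(c, m, l, steps=100_000):
    """Midpoint-rule integral of the survival function exp(-c*x) over [m, l]."""
    h = (l - m) / steps
    return h * sum(math.exp(-c * (m + (i + 0.5) * h)) for i in range(steps))


ez_closed = expected_layer(0.1, 20.0, 60.0)
ez_numeric = expected_layer_numeric(0.1, 20.0, 60.0)
print(ez_closed, ez_numeric)
```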
(d) Applying similar reasoning as in the lectures, we have

$$F_Y(y) = \begin{cases} F_X(y), & \text{if } y < M, \\ F_X(L+y-M), & \text{if } y \ge M, \end{cases}$$

i.e.,

$$F_Y(y) = \begin{cases} 1-\exp\{-c\,y\}, & \text{if } y < M, \\ 1-\exp\{-c\,(L+y-M)\}, & \text{if } y \ge M. \end{cases}$$

We have

$$E[Y] = \int_0^M (1-F_X(x))\,dx + \int_L^{+\infty} (1-F_X(x))\,dx = \int_0^M e^{-cx}\,dx + \int_L^{\infty} e^{-cx}\,dx = \frac{1+e^{-cL}-e^{-cM}}{c}.$$

Alternatively, and simpler,

$$E[Y] = E[X] - E[Z] = \frac{1}{c} - \frac{e^{-cM}-e^{-cL}}{c} = \frac{1+e^{-cL}-e^{-cM}}{c}.$$

Alternatively, but much longer,

$$E[Y] = \int_0^{\infty} \min[x, M]\, f_X(x)\,dx + \int_0^{\infty} \max[0, x-L]\, f_X(x)\,dx = \frac{1+e^{-cL}-e^{-cM}}{c}.$$

Finally, we have

$$E[Y] = \frac{1+e^{-cL}-e^{-cM}}{c} = \frac{1+e^{-0.1 \cdot 60}-e^{-0.1 \cdot 20}}{0.1} = 8.67143.$$
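The decomposition $X = Y + Z$ behind the identity $E[Y] = E[X] - E[Z]$ can be illustrated by simulation. The sketch below splits each simulated exponential loss into the layer part $Z$ and the retained part $Y$; the sample size, the seed and the function names are assumptions made for illustration.

```python
import math
import random


def split_loss(x, m=20.0, l=60.0):
    """Split a loss x into the retained part y = x - z and the layer part z."""
    z = min(max(0.0, x - m), l - m)
    return x - z, z


def simulate_means(c=0.1, n=200_000, seed=0):
    """Monte Carlo estimates of E[Y] and E[Z] for X ~ Exp(c)."""
    rng = random.Random(seed)
    sum_y = sum_z = 0.0
    for _ in range(n):
        y, z = split_loss(rng.expovariate(c))
        sum_y += y
        sum_z += z
    return sum_y / n, sum_z / n


c, m, l = 0.1, 20.0, 60.0
ey_closed = (1.0 + math.exp(-c * l) - math.exp(-c * m)) / c
ey_sim, ez_sim = simulate_means(c)
print(ey_closed, ey_sim, ez_sim)
```

For example, a loss of 70 is split into a layer part of 40 and a retained part of 30, which agrees with $\min[70, M] + \max[0, 70-L] = 20 + 10$.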
4.
(a) We have that $Y = I\,\xi$, where $I$ is the indicator of the loss event (accident),

$$I = \begin{cases} 1, & \text{with probability } q, \\ 0, & \text{with probability } 1-q, \end{cases}$$

and $\xi$ is the severity of the loss given that the loss event (accident) occurs. For the cdf $F_Y(y)$, for $y \ge 0$, we have

$$F_Y(y) = P(Y \le y) = P(Y \le y \mid I = 0)\,P(I = 0) + P(Y \le y \mid I = 1)\,P(I = 1)$$
$$= P(I\xi \le y \mid I = 0)\,P(I = 0) + P(I\xi \le y \mid I = 1)\,P(I = 1)$$
$$= P(0 \le y)\,(1-q) + P(\xi \le y)\,q = 1-q+q\,F_\xi(y).$$

Substituting the gamma cdf (see lecture notes on Loss distributions)

$$F_\xi(y) = 1-\exp\{-\alpha y\} - (\alpha y)\exp\{-\alpha y\} = 1-\exp\{-\alpha y\}\,(1+\alpha y),$$

we obtain that

$$F_Y(y) = 1-q+q\,[1-\exp\{-\alpha y\}\,(1+\alpha y)] = 1-q\,(1+\alpha y)\exp\{-\alpha y\}.$$
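The mixed distribution of $Y$ (a point mass at zero plus a gamma severity with shape 2) can be checked by simulation. The sketch below is illustrative only: it borrows the values $q = 0.2$ and $\alpha = 0.002$ estimated in part (b), and the evaluation point, sample size, seed and function names are assumptions.

```python
import math
import random


def cdf_y(y, q, alpha):
    """F_Y(y) = 1 - q (1 + alpha y) exp(-alpha y) for y >= 0, and 0 for y < 0."""
    if y < 0:
        return 0.0
    return 1.0 - q * (1.0 + alpha * y) * math.exp(-alpha * y)


def simulate_cdf_y(y, q, alpha, n=100_000, seed=0):
    """Empirical P(Y <= y), where Y = I * xi and xi ~ Gamma(shape 2, rate alpha)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        # random.gammavariate takes (shape, scale), so the scale is 1 / alpha.
        loss = rng.gammavariate(2.0, 1.0 / alpha) if rng.random() < q else 0.0
        count += loss <= y
    return count / n


q, alpha = 0.2, 0.002
mass_at_zero = cdf_y(0.0, q, alpha)           # equals 1 - q
exact = cdf_y(1000.0, q, alpha)
empirical = simulate_cdf_y(1000.0, q, alpha)
print(mass_at_zero, exact, empirical)
```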
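(b) Applying maximum likelihood to estimate $q$, we have

$$L(q, x) = \binom{n}{x} q^x (1-q)^{n-x} = \binom{2000}{400} q^{400} (1-q)^{2000-400},$$

$$\log L(q, x) = \log\binom{n}{x} + x\log q + (n-x)\log(1-q),$$

$$\frac{\partial \log L(q, x)}{\partial q} = \frac{x}{q} - \frac{n-x}{1-q} = 0 \;\Rightarrow\; \hat{q} = \frac{x}{n} = \frac{400}{2000} = 0.2.$$

Since $E[\xi] = \mu = 2/\alpha$, we have that $\hat{\alpha} = 2/E[\xi] = 2/1000 = 0.002$.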
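(c) Set $\mu = E(\xi)$ and $\sigma^2 = Var(\xi)$, and write $\sigma_Y^2 = Var(Y)$. Then, in view of

$$E[Y^k] = E[(I\xi)^k] = E[I^k]\,E[\xi^k] = q\,E[\xi^k],$$

we have

$$E(Y) = \mu_Y = q\,E(\xi) = q\,\mu = \frac{2q}{\alpha},$$

$$Var(Y) = \sigma_Y^2 = E[Y^2] - (E(Y))^2 = q\,E[\xi^2] - q^2 (E(\xi))^2 = q\,[\sigma^2+\mu^2] - q^2\mu^2 = q\,\sigma^2 + q(1-q)\,\mu^2 = \frac{2q}{\alpha^2} + \frac{4q(1-q)}{\alpha^2} = \frac{2q(3-2q)}{\alpha^2}.$$

Numerically,

$$E(Y) = \mu_Y = q\,\mu = 0.2 \times 1000 = 200,$$

$$Var(Y) = \sigma_Y^2 = \frac{2q(3-2q)}{\alpha^2} = \frac{2 \times 0.2\,(3-2 \times 0.2)}{0.002^2} = 260000,$$

$$E(S_n) = n\,E(Y) = 2000 \times 200 = 400000,$$

$$Var(S_n) = 2000\,\frac{2q(3-2q)}{\alpha^2} = 2000 \times 260000 = 52 \times 10^7.$$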
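The numbers in parts (b) and (c) can be reproduced directly from the formulas $E(Y) = q\mu$ and $Var(Y) = q\sigma^2 + q(1-q)\mu^2$, with the gamma severity moments $\mu = 2/\alpha$ and $\sigma^2 = 2/\alpha^2$. A short sketch, with illustrative function and variable names:

```python
def individual_moments(q, alpha):
    """Mean and variance of Y = I * xi, xi ~ Gamma(shape 2, rate alpha)."""
    mean_xi = 2.0 / alpha
    var_xi = 2.0 / alpha**2
    mean_y = q * mean_xi
    var_y = q * var_xi + q * (1.0 - q) * mean_xi**2
    return mean_y, var_y


q_hat = 400 / 2000          # maximum likelihood estimate x / n
alpha_hat = 2 / 1000        # from E[xi] = 2 / alpha with E[xi] = 1000
mean_y, var_y = individual_moments(q_hat, alpha_hat)

n = 2000
mean_s = n * mean_y         # E(S_n)
var_s = n * var_y           # Var(S_n), by independence of the policies
print(mean_y, var_y, mean_s, var_s)
```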
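(d) Write $\theta$ for the security loading coefficient, $q_\beta$ for the standard normal quantile at level $\beta$, and $\sigma_Y^2 = Var(Y)$. From the lectures we have that for a probability level $\beta$, say $\beta = 0.95$,

$$\beta = P(S_n \le P_n) = P(S_n \le (1+\theta)\,E[S_n]) = P(S_n - E[S_n] \le \theta\,E[S_n]) = P\left(\frac{S_n - E[S_n]}{\sqrt{Var[S_n]}} \le \frac{\theta\,E[S_n]}{\sqrt{Var[S_n]}}\right) = P\left(S^* \le \frac{\theta\,E[S_n]}{\sqrt{Var[S_n]}}\right),$$

where $P(S^* \le x) \approx \Phi(x)$, the standard normal cdf, so that

$$\Phi\left(\frac{\theta\,E[S_n]}{\sqrt{Var[S_n]}}\right) = 0.95$$

and

$$\theta = q_\beta\,\frac{\sqrt{Var[S_n]}}{E[S_n]} = q_\beta\,\frac{\sqrt{n}\,\sigma_Y}{n\,\mu_Y} = q_\beta\,\frac{\sigma_Y}{\sqrt{n}\,\mu_Y} = \frac{1.65 \times \sqrt{260000}}{\sqrt{2000} \times 200} = 0.0940645, \tag{2}$$

where we have used that

$$E[S_n] = \sum_{j=1}^{n} q_j\,\mu_j = n\,q\,\mu = n\,\mu_Y \quad\text{and}\quad Var[S_n] = \sum_{j=1}^{n} Var(Y_j) = n\,Var(Y) = n\,\sigma_Y^2,$$

and that $\mu_Y = 200$, $\sigma_Y = \sqrt{260000} = 509.902$, $q_\beta = 1.65$ at $\beta = 0.95$.
From (2) we see that the security loading coefficient $\theta$ decays as $1/\sqrt{n}$, so the more policies $n$, the smaller the security coefficient $\theta$, which is natural since the risk is shared among a larger number of policyholders.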
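The $1/\sqrt{n}$ decay of the security loading can be made concrete by evaluating formula (2) for a few portfolio sizes. This is a small sketch using $E(Y) = 200$, $Var(Y) = 260000$ and the quantile 1.65 from above; the function name and the alternative values of $n$ are illustrative.

```python
import math


def security_loading(n, mean_y, var_y, quantile=1.65):
    """Loading from formula (2): quantile * sqrt(Var[S_n]) / E[S_n]."""
    return quantile * math.sqrt(n * var_y) / (n * mean_y)


for n in (500, 2000, 8000):
    print(n, security_loading(n, 200.0, 260000.0))
```

Quadrupling the number of policies halves the loading, as the formula suggests.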
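(e) Denote the loss of the insurance company by $W$. We have

$$W = \begin{cases} 0, & \text{if } Y \le d, \\ Y-d, & \text{if } Y > d, \end{cases}$$

$$F_W(w) = P(W \le w) = \begin{cases} 0, & \text{if } w < 0, \\ F_Y(w+d) = P(Y \le w+d), & \text{if } w \ge 0, \end{cases}$$

$$F_W(w) = P(W \le w) = \begin{cases} 0, & \text{if } w < 0, \\ 1-q+q\,F_\xi(w+d), & \text{if } w \ge 0, \end{cases}$$

$$F_W(w) = P(W \le w) = \begin{cases} 0, & \text{if } w < 0, \\ 1-q\,(1+\alpha(w+d))\exp\{-\alpha(w+d)\}, & \text{if } w \ge 0. \end{cases}$$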
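(f) (i) We have

$$W = \begin{cases} 0, & \text{if } Y \le d, \\ Y-d, & \text{if } d < Y \le m, \\ m-d, & \text{if } Y > m, \end{cases}$$

$$F_W(w) = P(W \le w) = \begin{cases} 0, & \text{if } w < 0, \\ F_Y(w+d) = P(Y \le w+d), & \text{if } 0 \le w < m-d, \\ 1, & \text{if } m-d \le w, \end{cases}$$

$$F_W(w) = P(W \le w) = \begin{cases} 0, & \text{if } w < 0, \\ 1-q\,(1+\alpha(w+d))\exp\{-\alpha(w+d)\}, & \text{if } 0 \le w < m-d, \\ 1, & \text{if } m-d \le w. \end{cases}$$

(ii) We have

$$E(W) = \int_0^{\infty} (1-F_W(w))\,dw = \int_0^{m-d} (1-F_W(w))\,dw$$
$$= \int_0^{m-d} \left(1-\left(1-q\,(1+\alpha(w+d))\exp\{-\alpha(w+d)\}\right)\right)dw$$
$$= \int_0^{m-d} q\,(1+\alpha(w+d))\exp\{-\alpha(w+d)\}\,dw$$
$$= \int_0^{m-d} q\,\exp\{-\alpha(w+d)\}\,dw + \int_0^{m-d} q\,\alpha\,(w+d)\exp\{-\alpha(w+d)\}\,dw$$
$$= q\,e^{-\alpha d}\int_0^{m-d} e^{-\alpha w}\,dw + q\,\alpha\int_0^{m-d} (w+d)\,e^{-\alpha(w+d)}\,dw$$
$$= \frac{q}{\alpha}\left[e^{-\alpha d} - e^{-\alpha m}\right] + \frac{q}{\alpha}\left[(1+\alpha d)\,e^{-\alpha d} - (1+\alpha m)\,e^{-\alpha m}\right]$$
$$= \frac{q}{\alpha}\left[(2+\alpha d)\,e^{-\alpha d} - (2+\alpha m)\,e^{-\alpha m}\right].$$

Hence,

$$E(W) = \frac{q}{\alpha}\left[(2+\alpha d)\,e^{-\alpha d} - (2+\alpha m)\,e^{-\alpha m}\right] = 107.438,$$

$$E(S) = 2000\,E(W) = 2000 \times 107.438 = 214876,$$

so the mean is almost halved compared to $E(S_n) = 2000 \times 200 = 400000$.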
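The closed form for $E(W)$ can be cross-checked by integrating $1 - F_W(w)$ numerically. The values of the deductible $d$ and the limit $m$ are not stated in this part of the solutions, so the sketch below uses hypothetical values ($d = 500$, $m = 3000$) purely to compare the two computations; it does not attempt to reproduce the figure 107.438. The function names are also illustrative.

```python
import math


def expected_payment(q, alpha, d, m):
    """Closed form (q / alpha) * [(2 + alpha d) e^{-alpha d} - (2 + alpha m) e^{-alpha m}]."""
    def term(u):
        return (2.0 + alpha * u) * math.exp(-alpha * u)
    return q / alpha * (term(d) - term(m))


def expected_payment_numeric(q, alpha, d, m, steps=100_000):
    """Midpoint-rule integral of 1 - F_W(w) over [0, m - d], with u = w + d."""
    h = (m - d) / steps
    total = 0.0
    for i in range(steps):
        u = d + (i + 0.5) * h
        total += q * (1.0 + alpha * u) * math.exp(-alpha * u)
    return h * total


# d_demo and m_demo are hypothetical values chosen for illustration only.
d_demo, m_demo = 500.0, 3000.0
ew_closed = expected_payment(0.2, 0.002, d_demo, m_demo)
ew_numeric = expected_payment_numeric(0.2, 0.002, d_demo, m_demo)
print(ew_closed, ew_numeric)
```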
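5.
We have the general results

$$E[S] = \mu\,E[N] \quad\text{and}\quad V[S] = \sigma^2 E[N] + \mu^2\,Var[N]. \tag{3}$$

(a)(i) For the Poisson case

$$E[S] = \mu\lambda \quad\text{and}\quad V[S] = \lambda\,E[X^2].$$

So

$$E[S] = \mu\lambda = \left(2 \times \frac{1}{3} + 3 \times \frac{1}{2} + 4 \times \frac{1}{6}\right) \times 150 = 425,$$

$$V[S] = \lambda\,E[X^2] = 150 \times \left(2^2 \times \frac{1}{3} + 3^2 \times \frac{1}{2} + 4^2 \times \frac{1}{6}\right) = 1275.$$

(ii) From (3) we have

$$E[S] = \mu\,E[N] = \mu\,\frac{1-p}{p} = \frac{5}{3} \times \frac{0.98}{0.02} = \frac{245}{3} = 81.6667,$$

$$V[S] = \sigma^2 E[N] + \mu^2\,Var[N] = \sigma^2\,\frac{1-p}{p} + \mu^2\,\frac{1-p}{p^2} = \frac{5}{3^2} \times \frac{0.98}{0.02} + \frac{5^2}{3^2} \times \frac{0.98}{0.02^2} = \frac{61495}{9} = 6832.78.$$

(iii) From (3) we have

$$E[S] = \mu\,E[N] = \mu\,\frac{k\,(1-p)}{p} = 3 \times \frac{4 \times 0.98}{0.02} = 588,$$

$$V[S] = \sigma^2 E[N] + \mu^2\,Var[N] = \sigma^2\,\frac{k\,(1-p)}{p} + \mu^2\,\frac{k\,(1-p)}{p^2} = 3^2 \times \frac{4 \times 0.98}{0.02} + 3^2 \times \frac{4 \times 0.98}{0.02^2} = 89964.$$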
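All three cases are instances of formula (3), so they can be verified with one helper and exact rational arithmetic. In the sketch below the claim-size moments and the moments of $N$ are read off from the expressions above; the helper name and the use of `Fraction` are illustrative choices.

```python
from fractions import Fraction as F


def compound_moments(mean_x, var_x, mean_n, var_n):
    """Formula (3): E[S] = mu E[N], V[S] = sigma^2 E[N] + mu^2 Var[N]."""
    return mean_x * mean_n, var_x * mean_n + mean_x**2 * var_n


# (i) Poisson(150) claim numbers; claim sizes 2, 3, 4 with probabilities 1/3, 1/2, 1/6.
probs = {2: F(1, 3), 3: F(1, 2), 4: F(1, 6)}
mean_x = sum(x * p for x, p in probs.items())
var_x = sum(x * x * p for x, p in probs.items()) - mean_x**2
case_i = compound_moments(mean_x, var_x, 150, 150)

# (ii) E[N] = (1-p)/p, Var[N] = (1-p)/p^2 with p = 0.02; mu = 5/3, sigma^2 = 5/9.
p = F(1, 50)
case_ii = compound_moments(F(5, 3), F(5, 9), (1 - p) / p, (1 - p) / p**2)

# (iii) E[N] = k(1-p)/p, Var[N] = k(1-p)/p^2 with k = 4; mu = 3, sigma^2 = 9.
k = 4
case_iii = compound_moments(F(3), F(9), k * (1 - p) / p, k * (1 - p) / p**2)

print(case_i, case_ii, case_iii)
```

In the Poisson case $E[N] = Var[N] = \lambda$, so formula (3) reduces to $V[S] = \lambda(\sigma^2 + \mu^2) = \lambda E[X^2]$, which is the expression used in (i).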
(b)(i) Comparing

$$M_S(t) = \exp\left\{200\left(\frac{1}{1-2t} - 1\right)\right\}, \quad\text{for } t < 1/2,$$

with formula (22) from the lecture notes on Risk models, which is for the Poisson number of claims,

$$M_S(t) = \exp\{\lambda\,(M_X(t) - 1)\},$$

it is evident that this corresponds to a collective risk model with a Poisson ($\lambda = 200$) number of claims, and

$$M_X(t) = \frac{1}{1-2t}, \quad\text{for } t < 1/2,$$

is the m.g.f. of exponentially distributed claim amounts, $X \sim Exp(0.5)$.
(ii) Therefore we have

$$E[S] = \mu\lambda = 2 \times 200 = 400 \quad\text{and}\quad V[S] = \lambda\,E[X^2] = 200 \times [2^2+2^2] = 1600.$$
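The mean and variance can also be read off the moment generating function itself: the first two derivatives of $\log M_S(t)$ at $t = 0$ are $E[S]$ and $V[S]$. The sketch below approximates them with central finite differences; the step size and the names are illustrative choices.

```python
def cgf(t, lam=200.0):
    """log M_S(t) = lam * (1 / (1 - 2 t) - 1), valid for t < 1/2."""
    return lam * (1.0 / (1.0 - 2.0 * t) - 1.0)


h = 1e-4
mean_s = (cgf(h) - cgf(-h)) / (2.0 * h)                 # first derivative at 0
var_s = (cgf(h) - 2.0 * cgf(0.0) + cgf(-h)) / h**2      # second derivative at 0
print(mean_s, var_s)
```

Both values should agree with 400 and 1600 up to the finite-difference error.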