
Generalized Linear Models

Lecture 5. Models with continuous response

GLM (MTMS.01.011) Lecture 5 1 / 35


Continuous response

In practice, there are many situations where the response is continuous but the
normal distribution does not fit:
response can only take nonnegative values, Y ≥ 0
the conditional probability that Y ≥ k + s, given Y ≥ k, may increase as k increases (decreasing hazard)
variance is not constant

Examples:
insurance claims
time to failure (of an experiment or a machine)
amount of rainfall

The distributions that may suit these scenarios are the gamma, lognormal and
inverse Gaussian distributions



Coefficient of variation

Definition. Coefficient of variation


Coefficient of variation of a random variable Y is the ratio of its standard
deviation to the mean

CV = √(DY)/EY = σ/µ,    CV̂ = s_y/ȳ

Coefficient of variation
is a dimensionless measure of variability (often given in %)
allows to compare the variability of variables measured in different scales
is also called relative standard deviation (RSD)
if µ → 0 then CV → ∞, i.e. it is sensitive to values of µ
CV is often used in situations where the r.v. of interest is (related to) exponential

Then √(DY) = EY ⇒ CV = 1
CV < 1 (Erlang distribution) – small relative variability
CV > 1 (hyperexp. dist. = mixture of exponentials) – large relative variability
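A quick way to see the estimator CV̂ = s_y/ȳ at work (an illustrative sketch, not part of the slides): for exponential data the coefficient of variation should come out close to 1, whatever the rate.

```python
import random
import statistics

def cv(sample):
    """Sample coefficient of variation: CV-hat = s_y / y-bar."""
    return statistics.stdev(sample) / statistics.mean(sample)

random.seed(1)
# Exponential data: the theoretical CV is 1 regardless of the rate
y = [random.expovariate(0.5) for _ in range(100_000)]
print(round(cv(y), 2))
```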
Coefficient of variation for known distributions

Distribution   EY      DY             CV
N(µ, σ²)       µ       σ²             σ/µ
B(n, π)        nπ      nπ(1 − π)      √((1 − π)/(nπ))
Po(λ)          λ       λ              1/√λ
Γ(α, λ)        α/λ     α/λ²           1/√α
Exp(λ)         1/λ     1/λ²           1
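The table entries can be spot-checked by simulation; a sketch for the gamma row using Python's standard library (note that random.gammavariate is parametrized by shape and scale, not rate):

```python
import math
import random
import statistics

random.seed(2)
alpha = 4.0
# random.gammavariate(alpha, beta) takes shape alpha and SCALE beta,
# i.e. this corresponds to Gamma(alpha, lambda) with rate lambda = 1/beta
y = [random.gammavariate(alpha, 2.0) for _ in range(100_000)]
cv_hat = statistics.stdev(y) / statistics.mean(y)
# Theoretical CV for the gamma row is 1/sqrt(alpha)
print(round(cv_hat, 3), round(1 / math.sqrt(alpha), 3))
```

The empirical CV depends only on the shape α, not on the scale, as the table predicts.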


Distributions with constant CV

Consider random variables that are positive and right skewed


In that case, constant variance might not be the best assumption (think of
different scales, for example), but distributions with constant CV might be quite
useful

In other words, let us assume that CV = α, which also means that there is a
linear relation between the standard deviation and the mean:

√(DY) = α·EY

A known distribution with constant coefficient of variation is the gamma distribution


(in what sense?)



Gamma distribution Y ∼ Γ(ν, ν/µ) (Y ∼ Γ(α, λ))

Gamma distribution is a non-negative continuous distribution with the following
characteristics:
Pdf: f(y; µ, ν) = (1/Γ(ν)) (ν/µ)^ν y^(ν−1) exp(−(ν/µ)y), y ≥ 0,
where Γ(ν) is the gamma function: Γ(ν) = ∫₀^∞ x^(ν−1) exp(−x) dx
Mean: EY = ν/(ν/µ) = µ
Variance: DY = µ²/ν
Coefficient of variation: CV = 1/√ν
Skewness: γ₁ = µ₃/µ₂^(3/2) = 2/√ν > 0, since µ₃ = 2µ³/ν²

In other words, the distribution is skewed to the right

Relation between parametrizations: α = ν, λ = ν/µ

Some known properties of the gamma function:
Γ(1) = 1; Γ(ν) = (ν − 1)Γ(ν − 1), ν > 1; Γ(n) = (n − 1)!; Γ(1/2) = √π


Some properties of gamma distribution

Gamma distribution has two parameters:


shape parameter ν > 0
rate (or inverse scale) parameter λ = ν/µ (given in this form to stress the
relation to the mean)

Properties:
if λ = ν/µ = 1/2, then we have the χ²₂ν-distribution
if ν = 1, then we have the exponential distribution
if 0 < ν ≤ 1, the density is monotonically decreasing, otherwise unimodal and
stretched to the right
if Yi ∼ Γ(νi, λ) are independent, their sum has a gamma distribution with
shape Σ νi and rate λ
if ν → ∞, gamma distribution converges to normal distribution

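The convolution property (together with the ν = 1 exponential special case) can be illustrated by simulation: this sketch sums three independent Exp(λ) variables, which should follow Γ(3, λ) with mean ν/λ = 6 and variance ν/λ² = 12.

```python
import random
import statistics

random.seed(3)
lam, k = 0.5, 3
# Sum of k = 3 independent Exp(lam) variables should follow Gamma(shape k, rate lam)
sums = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(100_000)]
print(round(statistics.mean(sums), 2), round(statistics.variance(sums), 2))
```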


Density of gamma distribution



Example of gamma distribution (1)

Let us look at a duration that can be considered as a sum of latent periods,
each having an exponential distribution with fixed intensity.
Then the total duration follows a gamma distribution

Lindsey (1995). Data from Liege, Belgium 1984, the duration of marriages
(n = 1699)
Marriage is sometimes thought of as having three periods of different relationships
through which a couple goes, so that gamma distribution with ν = 3 might be
suitable
Empirical mean is ȳ = 13.85 years and variance is s²_y = 75.9
Parameter estimates: ν̂ = ȳ²/s²_y = 2.53; µ̂ = 13.85
Histogram (next slide) shows a good fit

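The moment estimates above follow from EY = µ and DY = µ²/ν: solving for ν gives ν̂ = ȳ²/s²_y. A one-line check with the reported numbers:

```python
# Method-of-moments estimates for the marriage-duration example:
# nu-hat = ybar**2 / s_y**2, mu-hat = ybar
ybar, s2 = 13.85, 75.9
nu_hat = ybar ** 2 / s2
mu_hat = ybar
print(round(nu_hat, 2), mu_hat)  # → 2.53 13.85
```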


Example of gamma distribution (2)

Source: Lindsey (1995). Introductory Statistics. A modelling approach. Oxford



Gamma distribution as a member of exponential family
Let us now show that gamma distribution belongs to the exponential family.
Recall the pdf

f(yi; µi, ν) = (1/Γ(ν)) (ν/µi)^ν yi^(ν−1) exp(−(ν/µi)yi),

which can be rewritten as

f(yi; µi, ν) = exp{(−yi/µi − ln µi)ν − ln Γ(ν) + ν ln ν + (ν − 1) ln yi}

θi = −1/µi; b(θi) = −ln(−θi); ϕi = 1/ν

Mean: E(Y|xi) = b′(θi) = −1/θi = µi, µi > 0
Variance function: b″(θi) = 1/θi² = µi²
Variance: D(Y|xi) = σi² = ϕi b″(θi) = µi²/ν (depends on the mean)
Coefficient of variation: CV = σi/µi = 1/√ν (is constant)
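The identities b′(θ) = µ and b″(θ) = µ² can be verified numerically with finite differences (an illustrative check, not part of the slides):

```python
import math

def b(theta):
    # cumulant function of the gamma family: b(theta) = -ln(-theta)
    return -math.log(-theta)

mu = 2.5
theta = -1 / mu
h = 1e-5
b1 = (b(theta + h) - b(theta - h)) / (2 * h)                 # ~ b'(theta) = mu
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2   # ~ b''(theta) = mu**2
print(round(b1, 3), round(b2, 2))
```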


Deviance
If the shape parameter ν is known, the log-likelihood can be written as:

l(µi; yi, ν) = (−yi/µi − ln µi)ν − ln Γ(ν) + ν ln ν + (ν − 1) ln yi

Thus, the sample log-likelihood of a current model is

l(µ̂; y, ν) = ν Σ (−yi/µ̂i − ln µ̂i) + C

and for a saturated model (i.e. µ̂i = yi)

l(y; y, ν) = −ν Σ (1 + ln yi) + C

Deviance for gamma model

D(y, µ̂) = −2 Σ (ln(yi/µ̂i) − (yi − µ̂i)/µ̂i)

and asymptotically D ∼ ϕχ²(n−p), where ϕ = 1/ν
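The deviance formula is convenient to wrap in a small function; by construction it is zero for the saturated model and positive otherwise (an illustrative sketch with made-up numbers):

```python
import math

def gamma_deviance(y, mu_hat):
    """Unit deviance of the gamma model:
    D(y, mu-hat) = -2 * sum( ln(y_i/mu_i) - (y_i - mu_i)/mu_i )."""
    return -2 * sum(math.log(yi / mi) - (yi - mi) / mi
                    for yi, mi in zip(y, mu_hat))

y = [1.2, 3.4, 2.2, 5.0]
print(abs(gamma_deviance(y, y)))               # saturated model: deviance 0
print(round(gamma_deviance(y, [3.0] * 4), 3))  # any other fit gives D > 0
```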
Link functions

Canonical link of the gamma model is g(µi) = −1/µi


Nevertheless, the default link function offered in most statistical packages is the
inverse function:

1
g(µi ) =
µi

Why?

Note that by construction we have a restriction µi > 0, which also implies ηi > 0

Other possible choices:


log-link g(µi) = ln µi (very often used)
identity link g(µi) = µi (NB! restriction µi > 0)
power function link g(µi) = µi^α (e.g. √µi)


Example. Hurn, 1945

Data about blood clotting


y – blood clotting time (sec); L = {LOT 1, LOT 2} – 2 different scenarios
u – plasma concentration (%)
u LOT1 LOT2
5 118 69
10 58 35
15 42 26
20 35 21
30 27 18
40 25 16
60 21 13
80 19 12
100 18 12

Bliss (1970) estimated a hyperbolic model using the gamma distribution and inverse link
Results (x = ln u, see next slide for the explanation of this transform):

LOT 1: µ̂⁻¹ = −0.0166 + 0.0153x;  LOT 2: µ̂⁻¹ = −0.0239 + 0.0236x

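Using the reported lot-1 coefficients, the fitted hyperbolic curve can be evaluated directly. Here x is taken to be the natural logarithm of u (an assumption on our part, but it reproduces the observed clotting times closely):

```python
import math

# Reported lot-1 fit with inverse link: 1/mu-hat = -0.0166 + 0.0153 * x,
# where x = ln(u) is assumed here
b0, b1 = -0.0166, 0.0153

def fitted_time(u):
    return 1 / (b0 + b1 * math.log(u))

for u, observed in [(5, 118), (20, 35), (100, 18)]:
    print(u, round(fitted_time(u), 1), observed)
```

The fitted values track the observed lot-1 times well across the concentration range.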


Example. Hurn, 1945 (figure)



Measures of explained variation (R²) for Gamma model

Two estimates for R² are proposed:

estimate based on residual sums of squares:

R²_RSS = 1 − RSS(µ̂)/RSS(µ̄)

estimate based on deviances:

R²_D = 1 − D(y, µ̂)/D(y, µ̄)

Adjustments of these estimates by degrees of freedom and a certain shrinkage
factor have also been proposed

M. Mittlböck, H. Heinzl (2002). Measures of explained variation in Gamma regression models. Commun. Statist. – Simulat. and Comput., 31(1), 67-73.


Estimation of the shape parameter (and model scale)
If the shape parameter ν is not known, we can find ν and ϕ using the maximum
likelihood method (setting the derivative of the log-likelihood with respect to ν
to zero, ...), which gives approximately

ϕ̂ = ν̂⁻¹ = D(y, µ̂)/n

or, a bias-corrected version:

ϕ̂ = ν̂⁻¹ = D(y, µ̂)/(n − p)

These estimates are sensitive to small values of yi and are undefined if yi = 0

The moment-based Pearson χ² estimate is thus often preferred:

ϕ̂ = ν̂⁻¹ = χ²/(n − p)

R: summary(model)$disp gives the Pearson estimate for dispersion;
gamma.shape(model) uses iterations, starting from the deviance-based estimate
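For a gamma model the variance function is V(µ) = µ², so the Pearson statistic is χ² = Σ (yi − µ̂i)²/µ̂i². A sketch with illustrative (made-up) fitted values:

```python
def pearson_dispersion(y, mu_hat, p):
    """Pearson dispersion estimate for a gamma model (V(mu) = mu**2):
    phi-hat = X2 / (n - p), X2 = sum((y_i - mu_i)**2 / mu_i**2)."""
    x2 = sum((yi - mi) ** 2 / mi ** 2 for yi, mi in zip(y, mu_hat))
    return x2 / (len(y) - p)

# Illustrative data and fitted means (not from the lecture)
y      = [1.2, 3.4, 2.2, 5.0, 0.8]
mu_hat = [1.5, 3.0, 2.5, 4.2, 1.0]
print(round(pearson_dispersion(y, mu_hat, 2), 4))
```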
Remarks to gamma models

If the response has a small CV (even as large as 0.6), it is difficult to
distinguish a normal model for log y from a log-linear gamma model (i.e. gamma
with log-link) (Atkinson, 1982)
In situations where censoring is possible (reliability and survival data),
gamma models may have issues in practical applications


Lognormal model

Definition. Lognormal distribution


Random variable Y has lognormal distribution if its logarithm Z = ln(Y) has
normal distribution
If Z ∼ N(µ, σ²) and Z = ln(Y) then Y ∼ LN(µ, σ²)
Pdf: f_Y(y; µ, σ) = (1/y) f_Z(ln y) = (1/(yσ√(2π))) exp{−(ln y − µ)²/(2σ²)}, y > 0
Expectation: EY = exp(µ + σ²/2)
Variance: DY = exp(2µ + σ²)(exp(σ²) − 1)
Coefficient of variation: CV = √(exp(σ²) − 1)

Variance is proportional to the square of the mean (as for the gamma distribution)

In case of small values of σ² (CV < 0.7), it is quite close to the gamma distribution
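The moment formulas can be packaged and cross-checked (an illustrative sketch): CV should equal √DY/EY and depend on σ² only, not on µ.

```python
import math

def lognormal_moments(mu, sigma2):
    """EY, DY and CV of LN(mu, sigma^2), per the formulas above."""
    mean = math.exp(mu + sigma2 / 2)
    var = math.exp(2 * mu + sigma2) * (math.exp(sigma2) - 1)
    cv = math.sqrt(math.exp(sigma2) - 1)
    return mean, var, cv

mean, var, cv = lognormal_moments(0.0, 0.25)
# Consistency check: CV equals sqrt(DY)/EY
print(round(cv, 4), round(math.sqrt(var) / mean, 4))
```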


Pdf of lognormal distribution



Usage of lognormal distribution

Lognormal is widely used in different areas, e.g.


measurements in biology and medicine (size of living tissue (length, skin area,
weight), length of inert appendages (hair, claws, nails, teeth) of biological
specimens, blood pressure of adult humans)
extreme events in hydrology (daily rainfall and river discharge volumes)
income of most of the population (the distribution of higher-income
individuals follows a Pareto distribution)
financial and insurance models (claim amounts; the Black–Scholes model, where
changes in the logarithm of exchange rates, price indices, and stock market
indices are assumed normal)
size of cities
different processes in technology



Lognormal distribution and exponential family

Question
Does lognormal distribution belong to the exponential family?

Answer:
Based on the definition of natural 1-parameter exponential family, log-normal
distribution does not belong to this family

On the other hand, lognormal distribution belongs to the 2-parameter exponential
family with natural parameters and natural statistics, respectively, given by

(−1/(2σ²), µ/σ²) and (ln²(Y), ln(Y))


Inverse Gaussian model

Definition. Inverse Gaussian distribution


Random variable Y has inverse Gaussian distribution, Y ∼ IG(µ, λ), if its pdf has
the following form:

f(y; µ, λ) = (λ/(2πy³))^(1/2) exp{−λ(y − µ)²/(2µ²y)}, y > 0

Mean: EY = µ
Variance: DY = µ³/λ – variance is proportional to the mean cubed
Coefficient of variation: CV = √(µ/λ)

NB! The name can be misleading: "inverse" means that while the Gaussian
describes a Brownian motion’s level at a fixed time, the inverse Gaussian describes
the distribution of the time a Brownian motion with positive drift takes to reach a
fixed positive level.
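Likewise for the inverse Gaussian moments (illustrative sketch): here CV depends on both parameters, and the variance grows with the cube of the mean.

```python
import math

def ig_moments(mu, lam):
    """EY = mu, DY = mu**3/lam, CV = sqrt(mu/lam) for Y ~ IG(mu, lam)."""
    return mu, mu ** 3 / lam, math.sqrt(mu / lam)

mean, var, cv = ig_moments(2.0, 8.0)
# Consistency check: CV equals sqrt(DY)/EY here as well
print(mean, var, cv)
```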


Usage of inverse Gaussian distribution

IG has sharper peak and heavier tails (as compared to lognormal), thus it is used
in areas related to more extreme events
insurance
financial mathematics
meteorology (wind energy applications)
Its hazard function is ∩-shaped as for lognormal and Weibull distributions, thus IG
is also used in
survival analysis
risk analysis (e.g. analysis of noise effects)
IG was first mentioned by Schrödinger (1915) and Smoluchowski (1915); the name
"inverse Gaussian" was proposed by Tweedie (1941)
The same class of distributions was discussed by Wald (1947). If µ = 1, the
inverse Gaussian is called Wald’s distribution


Pdf of inverse Gaussian distribution IG(µ, λ)

Source: Matsuda, K (2005). Inverse Gaussian Distribution.


Inverse Gaussian as a member of exponential family
Let us rewrite the pdf

f(yi; µi, λ) = (λ/(2πyi³))^(1/2) exp{−λ(yi − µi)²/(2µi²yi)}, y > 0

in the following form:

f(yi; µi, λ) = exp{−λ(yi − µi)²/(2µi²yi) + (1/2) ln(λ/(2πyi³))}
             = exp{(−yi/(2µi²) + 1/µi)/(1/λ) − λ/(2yi) + (1/2) ln(λ/(2πyi³))}

Now
canonical parameter: θi = −1/(2µi²)
b(θi) = −1/µi = −(−2θi)^(1/2)
ϕi = 1/λ
b′(θi) = (−2θi)^(−1/2) = µi
b″(θi) = µi³
Prove it!
Link functions used for inverse Gaussian model

Canonical link: g(µi) = −1/(2µi²) – not used very often in this exact form

Default link in most statistical packages: squared inverse, g(µi) = 1/µi²,
which implies µi = 1/√ηi

Other possibilities:
log-link g(µi) = ln µi – often used, especially when squared inverse has
convergence or negativity issues
identity g(µi) = µi – always a simple choice, problems if ηi < 0


Deviance

Sample log-likelihood for inverse Gaussian:

l(µ; y, λ) = Σ [(−yi/(2µi²) + 1/µi)/(1/λ) − λ/(2yi) + (1/2) ln(λ/(2πyi³))]

Thus the deviance is (as ϕ = 1/λ)

D = −(2/λ){l(µ̂; y, λ) − l(y; y, λ)}
  = 2 Σ [(yi/(2µ̂i²) − 1/µ̂i) − (yi/(2yi²) − 1/yi)]
  = Σ (yi/µ̂i² − 2/µ̂i + 1/yi) = Σ (yi − µ̂i)²/(yi µ̂i²)
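As with the gamma model, the final expression is convenient to compute directly (illustrative sketch with made-up numbers): it is zero for the saturated model and positive otherwise.

```python
def ig_deviance(y, mu_hat):
    """Inverse Gaussian deviance:
    D(y, mu-hat) = sum( (y_i - mu_i)**2 / (y_i * mu_i**2) )."""
    return sum((yi - mi) ** 2 / (yi * mi ** 2) for yi, mi in zip(y, mu_hat))

y = [0.8, 1.5, 2.7, 4.1]
print(ig_deviance(y, y))                   # saturated model: 0
print(round(ig_deviance(y, [2.0] * 4), 4))
```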


Example. Australian insurance claims 2004-2005
Data: 67856 insurance policies, 4624 claims
How to model the individual claim amount?
claimcst0 – claim amount (0 if no claim) (min 200, max 55922)
gender – gender of driver: M, F
area – driver’s area of residence: A, B, C, D, E, F
agecat – driver’s age category: 1 (youngest), 2, 3, 4, 5, 6

Histogram of claim amount (figure: frequency vs claim amount)



Example. Results (1)

> m1 = glm(claimcst0~factor(agecat)+gender+area,
data=claims, family="inverse.gaussian"(link="log"))
> summary(m1)
...
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.70839 0.09853 78.230 < 2e-16 ***
factor(agecat)2 -0.15845 0.10349 -1.531 0.125836
...
factor(agecat)6 -0.31865 0.12076 -2.639 0.008349 **
genderM 0.15283 0.05119 2.986 0.002846 **
areaB -0.02976 0.07287 -0.408 0.682977
...
areaF 0.35539 0.13049 2.723 0.006485 **
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
...
AIC: 77162
How to interpret the results?
Example. Results (2)

> m2 = glm(claimcst0~factor(agecat)+gender+area,
data=claims, family="Gamma")

> m3 = glm(log(claimcst0)~factor(agecat)+gender+area,
data=claims, family="gaussian")
> m1$aic
[1] 77162.32

> m2$aic
[1] 79331.75

> m3$aic
[1] 14707.79

Which model is best?



Lognormal model revisited. Log-likelihood

Sample likelihood for lognormal:

L(y; µ, σ) = Π (1/(yi σ√(2π))) exp(−(ln yi − µi)²/(2σ²))

Sample log-likelihood for lognormal:

l(y; µ, σ) = −Σ ln(yi) − n ln(σ√(2π)) − Σ (ln yi − µi)²/(2σ²)

NB! Compare with the log-likelihood of normal applied to the log-sample:

l(ln y; µ, σ) = −n ln(σ√(2π)) − Σ (ln yi − µi)²/(2σ²)


Lognormal model. AIC

AIC = −2·log-likelihood + 2p

AIC for lognormal:

AIC_LN = −2(−Σ ln(yi) − n ln(σ√(2π)) − Σ (ln yi − µi)²/(2σ²)) + 2p

AIC for normal applied to the log-sample:

AIC_N = −2(−n ln(σ√(2π)) − Σ (ln yi − µi)²/(2σ²)) + 2p

⇒ AIC_LN = AIC_N + 2 Σ ln(yi)


Lognormal model. How to proceed?

We just need to calculate 2 Σ ln(yi) and add it to the AIC found for the normal
model

> sum(log(claims$claimcst0))
[1] 31489.81

> m3$aic + 2*sum(log(claims$claimcst0))


[1] 77687.41

Thus the correct value for the lognormal model is 77687

Recall that the other models had AIC_IG = 77162 and AIC_Gamma = 79331
These numbers are now comparable, and we conclude that the inverse Gaussian fits
the data best



Conclusive remarks

Some remarks about residual analysis

Normal distribution: constant variance (for standardized residuals)
If variance varies:
– in case of gamma (and lognormal), variance is proportional to the squared mean,
– in case of inverse Gaussian, variance is proportional to the cubed mean
Graphs:
a) in case of log-link, plotting the argument vs log yi should give a linear pattern
b) in case of inverse link, plotting the argument vs 1/yi should give a linear pattern
