0% found this document useful (0 votes)

12 views167 pages

Distributions.r Forge

The document is a handbook on probability distributions, detailing various discrete, continuous, and multivariate distributions. It provides definitions, estimations, applications, and properties of distributions such as the discrete uniform, Bernoulli, and binomial distributions. The guide is structured into chapters, each focusing on a specific family of distributions, and includes mathematical tools and a bibliography.

Uploaded by

jayakishori1357

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views167 pages

Distributions.r Forge

Uploaded by

jayakishori1357

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 167

R powered R-forge project

From http://r-forge.r-project.org, available http://tinyurl.com/yc7tem9z

Handbook on probability distributions

R-forge distributions Core Team

University Year 2009-2010

LATEXpowered Mac OS’ TeXShop edited

Contents

Introduction 4

I Discrete distributions 6

1 Classic discrete distribution 7

2 Not so-common discrete distribution 27

II Continuous distributions 34

3 Finite support distribution 35

4 The Gaussian family 47

5 Exponential distribution and its extensions 56

6 Chi-squared’s ditribution and related extensions 75

7 Student and related distributions 84

8 Pareto family 88

9 Logistic distribution and related extensions 108

10 Extrem Value Theory distributions 111

3
4 CONTENTS

III Multivariate and generalized distributions 116

11 Generalization of common distributions 117

12 Multivariate distributions 133

13 Misc 135

Conclusion 137

Bibliography 137

A Mathematical tools 141

Introduction

This guide is intended to provide a quite exhaustive (at least as I can) view on probability distri-
butions. It is constructed in chapters of distribution family with a section for each distribution.
Each section focuses on the tryptic: definition - estimation - application.

Ultimate bibles for probability distributions are Wimmer & Altmann (1999) which lists 750
univariate discrete distributions and Johnson et al. (1994) which details continuous distributions.

In the appendix, we recall the basics of probability distributions as well as “common” mathe-
matical functions, cf. section A.2. And for all distribution, we use the following notations

• X a random variable following a given distribution,

• x a realization of this random variable,

• f the density function (if it exists),

• F the (cumulative) distribution function,

• P (X = k) the mass probability function in k,

• M the moment generating function (if it exists),

• G the probability generating function (if it exists),

• φ the characteristic function (if it exists),

Finally all graphics are done the open source statistical software R and its numerous packages
available on the Comprehensive R Archive Network (CRAN∗ ). See the CRAN task view† on prob-
ability distributions to know the package to use for a given “non standard” distribution, which is
not in base R.

∗
http://cran.r-project.org
†
http://cran.r-project.org/web/views/Distributions.html

5
Part I

Discrete distributions

6
Chapter 1

Classic discrete distribution

1.1 Discrete uniform distribution

1.1.1 Characterization

The discrete uniform distribution can be de-

mass probability function
fined in terms of its elementary distribution
(sometimes called mass probability function):
0.14

1
P (X = k) = ,
0.12

n
where k ∈ S = {k1 , . . . , kn } (a finite set of or-
P(X=k)

0.10

dered values). Typically, the ki ’s are consecu-

tive positive integers.
0.08

Equivalenty, we have the following cumula-

tive distribution function:
0.06

n
1X
F (k) = 11(ki ≤k) ,
n 2 4 6 8 10
i=1
k
where 11 is the indicator function.
Figure 1.1: Mass probability function for discrete
Furthermore, the probability generating uniform distribution
function is given by
n
4 1 X ki
G(t) = E(tX ) = t ,
n
i=1

with the special cases where the ki ’s are {1, . . . , n}, we get

1 − zn
G(t) = z ,
1−z

7
8 CHAPTER 1. CLASSIC DISCRETE DISTRIBUTION

when z 6= 1.

Finally, the moment generating function is expressed as follows

n
4 1 X tki
M (t) = E(tX ) = e ,
n
i=1

with the special case e t 1−etn when S = {1, . . . , n}.

1−et

1.1.2 Properties

The expectation is X n , the empirical mean: E(X) = n1 ni=1 ki . When S = {1, . . . , n}, this is just
P
n+1 1 Pn 2 n2 −1
2 . The variance is given by V ar(X) = n i=1 (ki − E(X) which is 12 for S = {1, . . . , n}.

1.1.3 Estimation

Since there is no parameter to estimate, calibration is pretty easy. But we need to check that
sample values are equiprobable.

1.1.4 Random generation

The algorithm is simply

• generate U from a uniform distribution,

• compute the generated index as I = dn × U e,

• finally X is kI .

where d.e denotes the upper integer part of a number.

1.1.5 Applications

A typical application of the uniform discrete distribution is the statistic procedure called bootstrap
or others resampling methods, where the previous algorithm is used.

1.2 Bernoulli/Binomial distribution

1.2.1 Characterization
1.2. BERNOULLI/BINOMIAL DISTRIBUTION 9

mass probability function

Since the Bernoulli distribution is a special case
of the binomial distribution, we start by ex- B(10,1/2)
B(10,2/3)
plaining the binomial distribution. The mass

0.30
probability distribution is

0.25
P (X = k) = Cnk pk (1 − p)n−k ,

0.20
P(X=k)

0.15
n!
where Cnk is the combinatorial number k!(n−k)! ,

0.10
k ∈ N and 0 < p < 1 the ’success’ probabil-
ity. Let us notice that the cumulative distribu-

0.05
tion function has no particular expression. In
the following, the binomial dsitribuion is de-

0.00
noted by B(n, p). A special case of the bino-
0 2 4 6 8 10
mial dsitribution is the Bernoulli when n = 1.
k
This formula explains the name of this distri-
bution since elementary probabilities P (X = k) Figure 1.2: Mass probability function for binomial
are terms of the development of (p + (1 − p))n distributions
according the Newton’s binom formula.

Another way to define the binomial distribution is to say that’s the sum of n identically and
independently Bernoulli distribution B(p). Demonstration can easily be done with probability
generating function. The probability generating function is

G(t) = (1 − p + pz)n ,

while the moment generating function is

M (t) = (1 − p + pet )n .

The binomial distribution assumes that the events are binary, mutually exclusive, independent
and randomly selected.

1.2.2 Properties

The expectation of the binomial distribution is then E(X) = np and its variance V ar(X) =
np(1 − p). A useful property is that a sum of binomial distributions is still binomial if success
L
probabilities are the same, i.e. B(n1 , p) + B(n2 , p) = B(n1 + n2 , p).

We have an asymptotic distribution for the binomial distribution. If n → +∞ and p → 0 such

that np tends to a constant, then B(n, p) → P(np).
10 CHAPTER 1. CLASSIC DISCRETE DISTRIBUTION

1.2.3 Estimation

Bernoulli distribution

Let (Xi )1≤i≤m be an i.i.d. sample of binomial distributions B(n, p). If n = 1 (i.e. Bernoulli
distribution, we have
m
1 X
p̂m = Xi
m
i=1

is the unbiased and efficient estimator of p with minimum variance. It is also the moment-based
estimator.

There exists a confidence interval for the Bernoulli distribution using the Fischer-Snedecor
distribution. We have
" −1 −1 #
m−T +1 m−T
Iα (p) = 1+ f2(m−T +1),2T, α2 , 1+ f α ,
T T + 1 2(m−T ),2(T +1), 2

where T = m
P
i=1 Xi and fν1 ,ν2 ,α the 1 − α quantile of the Fischer-Snedecor distribution with ν1 and
ν2 degrees of freedom.

We can also use the central limit theorem to find an asymptotic confidence interval for p

uα p uα p
Iα (p) = p̂m − √ p̂m (1 − p̂m ), p̂m + √ p̂m (1 − p̂m ) ,
n n

where uα is the 1 − α quantile of the standard normal distribution.

Binomial distribution

When n is not 1, there are two cases: either n is known with certainty or n is unknown. In the
first case, the estimator of p is the same as the Bernoulli distribution. In the latter case, there are
no closed form for the maximum likelihood estimator of n.

One way to solve this problem is to set n̂ to the maximum number of ’success’ at first. Then
we compute the log likelihood for wide range of integers around the maximum and finally choose
the likeliest value for n.

Method of moments for n and p is easily computable. Equalling the 2 first sample moments,
we have the following solution
( 2
Sm
p̃ = 1 − X̄
m ,
ñ = X̄p̃m
with the constraint that ñ ∈ N.

Exact confidence intervals cannot be found since estimators do not have analytical form. But
we can use the normal approximation for p̂ and n̂.
1.3. ZERO-TRUNCATED OR ZERO-MODIFIED BINOMIAL DISTRIBUTION 11

1.2.4 Random generation

It is easy to simulate Bernoulli distribution with the following heuristic:

• generate U from a uniform distribution,

• compute X as 1 if U ≤ p and 0 otherwise.

The binomial distribution is obtained by summing n i.i.d. Bernoulli random variates.

1.2.5 Applications

The direct application of the binomial distribution is to know the probability of obtaining exactly
n heads if a fair coin is flipped m > n times. Hundreds of books deal with this application.

In medecine, the article Haddow et al. (1994) presents an application of the binomial distribution
to test for a particular syndrome.

In life actuarial science, the binomial distribution is useful to model the death of an insured or
the entry in invalidity/incapability of an insured.

1.3 Zero-truncated or zero-modified binomial distribution

1.3.1 Characterization
mass probability function

The zero-truncated version of the binomial dis-

0.5

B(10,2/3)
B(10,2/3,0)
tribution is defined as follows B(10,2/3,1/4)

Cnk pk (1 − p)n−k
0.4

P (X = k) = ,
1 − (1 − p)n
0.3

where k ∈ {1, . . . , n}, n, p usual parameters.

P(X=k)

The distribution function does not have partic-

ular form. But the probability generating func-
0.2

tion and the moment generating function exist

(1 + p(z − 1))n − (1 − p)n
0.1

G(t) = ,
1 − (1 − p)n
0.0

and
0 1 2 3 4
(1 + p(et − 1))n − (1 − p)n
M (t) = . k
1 − (1 − p)n
In the following distribution, we denote the
zero-truncated version by B0 (n, p). Figure 1.3: Mass probability function for zero-
modified binomial distributions
12 CHAPTER 1. CLASSIC DISCRETE DISTRIBUTION

For the zero-modified binomial distribution,

which of course generalizes the zero-truncated
version, we have the following elementary probabilities

p̃ if k = 0
P (X = k) = ,
KCnk pk (1 − p)n−k otherwise
1−p̃
where K is the constant 1−(1−p) n , n, p, p̃ are the parameters. In terms of probability/moment
generating functions we have:

G(t) = p̃ + K((1 − p + pz)n − (1 − p)n ) and M (t) = p̃ + K((1 − p + pet )n − (1 − p)n ).

The zero-modified binomial distribution is denoted by B(n, p, p̃).

1.3.2 Properties

np
The expectation and the variance for the zero-truncated version is E(X) = 1−(1−p)n and V ar(X) =
np(1−p−(1−p+np)(1−p)n )
(1−(1−p)n )2
. For the zero-modified version, we have E(X) = Knp and V ar(X) =
Knp(1 − p).

1.3.3 Estimation

From Cacoullos & Charalambides (1975), we know there is no minimum variance unbiased estimator
for p. NEED HELP for the MLE... NEED Thomas & Gart (1971)

Moment based estimators are numerically computable whatever we suppose n is known or

unknown.

Confidence intervals can be obtained with bootstrap methods.

1.3.4 Random generation

The basic algorithm for the zero-truncated version B0 (n, p) is simply

• do; generate X binomially distributed B(n, p); while X = 0

• return X

In output, we have a random variate in {1, . . . , n}.

The zero-modified version B(n, p, p̃) is a little bit tricky. We need to use the following heuristic:

• generate U from an uniform distribution

1.4. QUASI-BINOMIAL DISTRIBUTION 13

• if U < p̃, then X = 0

• otherwise

– do; generate X binomially distributed B(n, p); while X = 0

• return X

1.3.5 Applications

Human genetics???

1.4 Quasi-binomial distribution

1.4.1 Characterization

The quasi-binomial distribution is a “small” pertubation of the binomial distribution. The mass
probability function is defined by

P (X = k) = Cnk p(p + kφ)k−1 (1 − p − kφ)n−k ,

where k ∈ {0, . . . , n}, n, p usual parameters and φ ∈] − np , 1−p

n [. Of course, we retrieve the binomial
distribution with φ set to 0.

1.4.2 Properties

NEED REFERENCE

1.4.3 Estimation

NEED REFERENCE

1.4.4 Random generation

NEED REFERENCE

1.4.5 Applications

NEED REFERENCE
14 CHAPTER 1. CLASSIC DISCRETE DISTRIBUTION

1.5 Poisson distribution

1.5.1 Characterization

The Poisson distribution is characterized by the mass probability function

following elementary probabilities

0.5
P(4)
P(2)
λk −λ P(1)
P (X = k) = e ,
k!

0.4
where λ > 0 is the shape parameter and k ∈ N.

0.3
The cumulative distribution function has no

P(X=k)
particular form, but the probability generating

0.2
function is given by

G(t) = eλ(t−1) ,
0.1
and the moment generating function is 0.0

t −1)
M (t) = eλ(e . 0 2 4 6 8 10

Figure 1.4: Mass probability function for Poisson

Another way to characterize the Poisson dis- distributions
tribution is to present the Poisson process (cf.
Saporta (1990)). We consider independent and
identically events occuring on a given period of time t. We assume that those events can not occur
simultaneously and their probability to occur only depends on the observation period t. Let c be
the average number of events per unit of time (c for cadency). We can prove that the number of
events N occuring during the period [0, t[ is

(ct)k −ct
P (N = n) = e ,
k!
since the interoccurence are i.i.d. positive random variables with the property of ’lack of memory’∗ .

1.5.2 Properties

The Poisson distribution has the ’interesting’ but sometimes annoying property to have the same
mean and variance. We have E(X) = λ = V ar(X).

The sum of two independent Poisson distributions P(λ) and P(µ) (still) follows a Poisson
distribution P(λ + µ).

Let N follows a Poisson distribution P(λ).PKnowing the value of N = n, let (Xi )1≤i≤n be a
sequence of i.i.d. Bernoulli variable B(q), then ni=1 Xi follows a Poisson distribution P(λq).
∗
i.e. interoccurence are exponentially distributed, cf. the exponential distribution.
1.6. ZERO-TRUNCATED OR ZERO-MODIFIED POISSON DISTRIBUTION 15

1.5.3 Estimation

The estimator maximum likelihood estimator of λ is λ̂ = X n for a sample (Xi )i . It is also the
moment based estimator, an unbiased estimator λ and an efficient estimator.

From the central limit theorem, we have asymptotic confidence intervals

q q
uα uα
Iα (λ) = λ̂n − √ λ̂m , λ̂n + √ λ̂m ,
n n

where uα is the 1 − α quantile of the standard normal distribution.

1.5.4 Random generation

A basic way to generate Poisson random variate is the following:

• initialize variable n to 0, l to e−λ and P to 1,

• do

– generate U from a uniform distribution,

– P = P × U,
– n = n + 1,

while P ≥ l,

• return n − 1.

See Knuth (2002) for details.

TOIMPROVE

Ahrens, J. H. and Dieter, U. (1982). Computer generation of Poisson deviates from modified
normal distributions. ACM Transactions on Mathematical Software, 8, 163?179.

1.5.5 Applications

TODO

1.6 Zero-truncated or zero-modified Poisson distribution

1.6.1 Characterization
16 CHAPTER 1. CLASSIC DISCRETE DISTRIBUTION

The zero-truncated version of the Poisson dis- mass probability function

tribution is defined the zero-truncated binomial

1.0
P(1/2)
distribution for the Poisson distribution. The P(1/2,0)
P(1/2,1/4)
elementary probabilities is defined as

0.8
λk 1
P (X = k) = λ
,
k! (e − 1)

0.6
P(X=k)
where k ∈ N∗ . We can define probabil-

0.4
ity/moment generating functions for the zero-
truncated Poisson distribution P0 (λ):

0.2
t
eλt − 1 eλe − 1
G(t) = and M (t) = .
eλ − 1 eλ − 1

0.0
0 1 2 3 4 5 6

The zero-modified version of the Poisson k

distribution (obviously) generalized the zero-

truncated version. We have the following mass Figure 1.5: Mass probability function for zero-
probability function modified Poisson distributions
(
p if k = 0
P (X = k) = λk −λ ,
K k! e otherwise
1−p
where K is the constant 1−e−λ . The “generating functions” for the zero-modified Poisson distribu-
tion P(λ, p) are
t
G(t) = p + K(eλt − 1) and M (t) = p + K(eλe − 1).

1.6.2 Properties

The expectation of the zero-truncated Poisson distribution is E(X) = 1−eλ−λ and Kλ for the zero-
modified version. While the variance are respectively V ar(X) = (1−eλ−λ )2 and Kλ + (K − K 2 )λ2 .

1.6.3 Estimation

Zero-truncated Poisson distribution

Let (Xi )i be i.i.d. sample of truncated Poisson random variables. Estimators of λ for the zero-
truncated Poisson distribution are studied in Tate & Goen (1958). Here is the list of possible
estimators for λ:

t−1
2 Sn−1
• λ̃ = T
n (1 − t
2 Sn
) is the minimum variance unbiased estimator,

• λ∗ = T
n (1 − N1
T ) is the Plackett’s estimator,

• λ̂, the solution of equation T

n = λ
1−e−λ
, is the maximum likelihood estimator,
1.7. QUASI-POISSON DISTRIBUTION 17

where T = ni=1 Xi , 2 Snk denotes the Stirling number of the second kind and N1 the number of
P
observations equal to 1. Stirling numbers are costly do compute, see Tate & Goen (1958) for
approximate of theses numbers.

Zero-modified Poisson distribution

NEED REFERENCE

1.6.4 Random generation

The basic algorithm for the zero-truncated version P0 (λ) is simply

• do; generate X Poisson distributed P(λ); while X = 0

• return X

In output, we have a random variate in N∗ .

The zero-modified version P(λ, p) is a little bit tricky. We need to use the following heuristic:

• generate U from an uniform distribution

• if U < p, then X = 0
• otherwise
– do; generate X Poisson distributed P(λ); while X = 0
• return X

1.6.5 Applications

NEED REFERENCE

1.7 Quasi-Poisson distribution

NEED FOLLOWING REFERENCE

Biom J. 2005 Apr;47(2):219-29. Generalized Poisson distribution: the property of mixture of

Poisson and comparison with negative binomial distribution. Joe H, Zhu R.

Ecology. 2007 Nov;88(11):2766-72. Quasi-Poisson vs. negative binomial regression: how should
we model overdispersed count data? Ver Hoef JM, Boveng PL.
18 CHAPTER 1. CLASSIC DISCRETE DISTRIBUTION

1.7.1 Characterization

TODO

1.7.2 Properties

TODO

1.7.3 Estimation

TODO

1.7.4 Random generation

TODO

1.7.5 Applications

1.8 Geometric distribution

1.8.1 Characterization

The geometric distribution represents the first mass probability function

outcome of a particular event (with the proba-
0.6

G(1/2)
bility q to raise) in a serie of i.i.d. events. The G(1/3)
G(1/4)
mass probability function is
0.5

P (X = k) = q(1 − q)k ,
0.4

where k ∈ N and 0 < q ≤ 1. In terms of cumu-

P(X=k)

0.3

lative distribution function, it is the same as

0.2

F (k) = 1 − (1 − q)k+1 .
0.1

The whole question is wether this outcome

could be null or at least one (event). If we con-
0.0

sider the distribution to be valued in N∗ , please

0 1 2 3 4 5 6
see the truncated geometric distribution.
k

Figure 1.6: Mass probability function for Geomet-

ric distributions
1.8. GEOMETRIC DISTRIBUTION 19

The probability generating function of the

geometric G(q) is

q
G(t) = ,
1 − (1 − q)t

and its moment generating function

q
M (t) = .
1 − (1 − q)et

1.8.2 Properties

1−q 1−q
The expecation of a geometric distribution is simply E(X) = q and its variance V ar(X) = q2
.

The sum of n i.i.d. geometric G(q) random variables follows a negative binomial distribution
N B(n, q).

The minimum of n independent geometric G(qi ) random variables follows a geometric distribu-
tion G(q. ) with q. = 1 − ni=1 (1 − qi ).
Q

The geometric distribution is the discrete analogue of the exponential distribution thus it is
memoryless.

1.8.3 Estimation

1
The maximum likelihood estimator of q is q̂ = 1+X̄n
, which is also the moment based estimator.

NEED REFERENCE

1.8.4 Random generation

A basic algorithm is to use i.i.d. Bernoulli variables as follows

• initialize X to 0 and generate U from an uniform distribution,

• while U > p do ; generate U from an uniform distribution; X = X + 1;

• return X.

TOIMPROVE WITH Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-

Verlag, New York. Page 480.
20 CHAPTER 1. CLASSIC DISCRETE DISTRIBUTION

1.8.5 Applications

NEED MORE REFERENCE THAN Mačutek (2008)

1.9 Zero-truncated or zero-modified geometric distribution

1.9.1 Characterization

mass probability function

The zero-truncated version of the geometric dis-
tribution is defined as

0.6
G(1/3)
G(1/3,0)
G(1/3,1/4)
P (X = k) = p(1 − p)k−1 ,

0.5
where n ∈ N+ . Obviously, the distribution
takes values in {1, . . . , n, . . . }. Its distribution

0.4
function is P(X=k)

0.3
F (k) = 1 − (1 − p)k .
Finally the probability/moment generating
0.2

functions are
pt pet
0.1

G(t) = , and M (t) = .

1 − (1 − p)t 1 − (1 − p)et
0.0

In the following, it is denoted by G0 (p).

0 1 2 3 4 5 6

k
The zero-modified version of the geometric
distribution is characterized as follows

p if k = 0 Figure 1.7: Mass probability function for zero-
P (X = k) = k ,
Kq(1 − q) otherwise modified geometric distributions
where the constant K is 1−p
1−q and k ∈ N. Of course special cases of the zero modified version of the
geometric G(q, p) are the zero-truncated version with p = 0 and q = p and the classic geometric
distribution with p = q. The distribution function is expressed as follows
F (x) = p + K(1 − (1 − p)k ),
where k ≥ 0. The probability/moment generating functions are

q q
G(t) = p + K −q and M (t) = p + K −q .
1 − (1 − q)t 1 − (1 − q)et

1.9.2 Properties

1 1−p
The expectation of the geometric G0 (p) distribution is E(X) = p and its variance V ar(X) = p2
.

For the zero-modified geometric distribution G(q, p), we have E(X) = K 1−q
q and V ar(X) =
K 1−q
q2
.
1.9. ZERO-TRUNCATED OR ZERO-MODIFIED GEOMETRIC DISTRIBUTION 21

1.9.3 Estimation

Zero-truncated geometric distribution

According to Cacoullos & Charalambides (1975), the (unique) minimim variance unbiased estimator
of q for the zero-truncated geometric distribution is

S̃nt−1
q̃ = t ,
S̃nt

where t denotes the sum ni=1 Xi , S̃nt is defined by n!1 Pn n−k C k (k + t − 1) ∗ . The maximum
P
k=1 (−1) n t
likelihood estimator of q is given by
1
q̂ = ,
X̄n
which is also the moment based estimator. By the uniqueness of the unbiased estimator, q̂ is a
biased estimator.

Zero-modified geometric distribution

X̄n
Moment based estimators for the zero-modified geometric distribution G(p, q) are given by q̂ = Sn2
(X̄n )2
and p̂ = 1 − 2 .
Sn

NEED REFERENCE

1.9.4 Random generation

For the zero-truncated geometric distribution, a basic algorithm is to use i.i.d. Bernoulli variables
as follows

• initialize X to 1 and generate U from an uniform distribution,

• while U > q do ; generate U from an uniform distribution; X = X + 1;

• return X.

While for the zero-modified geometric distribution, it is a little bit tricky

• generate U from an uniform distribution

• if U < p, then X = 0

• otherwise
∗
where Cnk ’s are the binomial coefficient and (n)m is the falling factorial.
22 CHAPTER 1. CLASSIC DISCRETE DISTRIBUTION

– initialize X to 1 and generate U from an uniform distribution

– while U > q do ; generate U from an uniform distribution; X = X + 1;
• return X

1.9.5 Applications

NEED REFERENCE

1.10 Negative binomial distribution

1.10.1 Characterization

1.10.2 Characterization

mass probability function

The negative binomial distribution can be char-
acterized by the following mass probability
0.30

NB(4,1/2)
NB(4,1/3)
function NB(3,1/2)
0.25

k
P (X = k) = Cm+k−1 pm (1 − p)k ,
k
0.20

where k ∈ N, Cm+k−1 ’s are combinatorial num-

bers and parameters m, p are constraint by
P(X=k)

0 < p < 1 and m ∈ N∗ . However a second

0.15

parametrization of the negative binomial distri-

0.10

bution is
r k
Γ(r + k) 1 β
0.05

P (X = k) = ,
Γ(r)k! 1+β 1+β
0.00

where k ∈ N and r, β > 0. We can retrieve the

first parametrization N B(m, p) from the second 0 2 4 6 8 10

parametrization N B(r, β) with k

(
1
1+β = p
r=m Figure 1.8: Mass probability function for negative
binomial distributions

The probability generating functions for

these two parametrizations are
m r
p 1
G(t) = and G(t) = ,
1 − (1 − p)t 1 − β(t − 1)
and their moment generating functions are
m r
p 1
M (t) = and M (t) = .
1 − (1 − p)et 1 − β(et − 1)
1.10. NEGATIVE BINOMIAL DISTRIBUTION 23

One may wonder why there are two parametrization for one distribution. Actually, the first
parametrization N B(m, p) has a meaningful construction: it is the sum of m i.i.d. geometric G(p)
random variables. So it is also a way to characterize a negative binomial distribution. The name
comes from the fact that the mass probability function can be rewritten as

1 − p k 1 −m−k

k
P (X = k) = Cm+k−1 ,
p p
which yields to
k
P (X = k) = Cm+k−1 P k Q−m−k .
This is the general term of the development of (P − Q)−m .

1.10.3 Properties

m(1−p)
The expectation of negative binomial N B(m, p) (or N B(m, p)) is E(X) = p or (rβ), while its
m(1−p)
variance is V ar(X) = p2
or (rβ(1 + β)).

Let N be Poisson distributed P(λΘ) knowing that Θ = θ where Θ is gamma distributed G(a, a).
Then we have N is negative binomial distributed BN (a, λa ).

1.10.4 Estimation

Sn2
X̄n
Moment based estimators are given by β̂ = X̄n
− 1 and r̂ = .
β̂

NEED REFERENCE

1.10.5 Random generation

The algorithm to simulate a negative binomial distribution N B(m, p) is simply to generate m

random variables geometrically distributed and to sum them.

NEED REFERENCE

1.10.6 Applications

From Simon (1962), here are some applications of the negative binomial distribution

• number of bacterial colonies per microscopic field,

• quality control problem,

• claim frequency in non life insurance.

24 CHAPTER 1. CLASSIC DISCRETE DISTRIBUTION

1.11 Zero-truncated or zero-modified negative binomial distribu-

tion

1.11.1 Characterization

The zero-truncated negative binomial distribution is characterized by

Γ(r + k) β k
P (X = k) = r
( ) ,
Γ(r)k!((r + β) − 1) 1 + β

where k ∈ N∗ , r, β usual parameters. In terms of probability generating function, we have

(1 − β(t − 1))r − (1 + β)−r

G(t) = .
1 − (r + β)r

The zero-modified version is defined as follows

(
p if k = 0
P (X = k) = ,
K Γ(r+k) 1 r β k
Γ(r)k! ( 1+β ) ( 1+β ) otherwise

1−p
where K is defined as 1
1−( 1+β )r
, r, β usual parameters and p the new parameter. The probability
generating function is given by

1 r 1 r
G(t) = ( ) −( ) ,
1 − β(t − 1) 1+β

and
1 1 r
M (t) = ( t
)r − ( )
1 − β(e − 1) 1+β
for the moment generating function.

1.11.2 Properties

rβ
Expectations for these two distribution are E(X) = 1−(r+β)r and Krβ respectively for the zero-
rβ(1+β−(1+β+rβ)(1+β)−r )
truncated and the zero-modified versions. Variances are V ar(X) = (1−(r+β)r )2
and
Krβ(1 + β) + (K − K 2 )E 2 [X].

1.11.3 Estimation

According to Cacoullos & Charalambides (1975), the (unique) minimim variance unbiased estimator
of p for the zero-truncated geometric distribution is
t−1
S̃r,n
p̃ = t ,
S̃nt
1.12. PASCAL DISTRIBUTION 25

where t denotes the sum ni=1 Xi , S̃nt is defined by n−k C k (k + t − 1) ∗ .

P 1 Pn
n! k=1 (−1) n t The maximum
likelihood estimator of q is given by

1
q̂ = ,
X̄n

which is also the moment based estimator. By the uniqueness of the unbiased estimator, q̂ is a
biased estimator.

1.11.4 Random generation

1.11.5 Applications

1.12 Pascal distribution

1.12.1 Characterization

The negative binomial distribution can be constructed by summing m geometric distributed vari-
ables G(p). The Pascal distribution is got from summing n geometrically distributed G0 (p) variables.
Thus possible values of the Pascal distribution are in {n, n + 1, . . . }. The mass probability function
is defined as

n−1 n
P (X = k) = Ck−1 p (1 − p)k−n ,

where k ∈ {n, n + 1, . . . }, n ∈ N∗ and 0 < p < 1. The probability/moment generating functions are

n n
pet

pt
G(t) = and M (t) = .
1 − (1 − p)t 1 − (1 − p)et

1.12.2 Properties

For the Pascal distribution Pa(n, p), we have E(X) = np and V ar(X) = n(1−p)
p2
. The link between
Pascal distribution Pa(n, p) and the negative binomial distribution BN (n, p) is to substract the
constant n, i.e. if X ∼ Pa(n, p) then X − n ∼ BN (n, p).

∗
where Cnk ’s are the binomial coefficient and (n)m is the increasing factorial.
26 CHAPTER 1. CLASSIC DISCRETE DISTRIBUTION

1.12.3 Estimation

1.12.4 Random generation

1.12.5 Applications

1.13 Hypergeometric distribution

1.13.1 Characterization

The hypergeometric distribution is characterized by the following elementary probabilities

k C n−k
Cm N −m
P (X = k) = n ,
CN

where N ∈ N+ , (m, n) ∈ {1, . . . , N }2 and k ∈ {0, . . . , min(m, n)}.

It can also be defined though its probability generating function or moment generating function:
n n t
CN −m 2 F1 (−n, −m; N − m − n + 1; t) CN −m 2 F1 (−n, −m; N − m − n + 1; e )
G(t) = n and M (t) = n ,
CN CN

where 2 F1 is the hypergeometric function of second kind.

1.13.2 Properties

nm nm(N −n)(N −m)

The expectation of an hypergeometric distribution is E(X) = N and V ar(X) = N 2 (N −1)
.

m
We have the following asymptotic result: H(N, n, m) 7→ B(n, N ) when N and m are large such
m
that N −→ 0 < p < 1.
N →+∞

1.13.3 Estimation

1.13.4 Random generation

1.13.5 Applications

Let N be the number of individuals in a given population. In this population, m has a particular
m
property, hence a proportion of N . If we draw n individuals among this population, the random
variable associated with the number of people having the desired property follows a hypergeometric
n
distribution H(N, n, m). The ratio N is called the survey rate.
Chapter 2

Not so-common discrete distribution

2.1 Conway-Maxwell-Poisson distribution

2.1.1 Characterization

TODO

2.1.2 Properties

TODO

2.1.3 Estimation

TODO

2.1.4 Random generation

TODO

27
28 CHAPTER 2. NOT SO-COMMON DISCRETE DISTRIBUTION

2.1.5 Applications

2.2 Delaporte distribution

2.2.1 Characterization

TODO

2.2.2 Properties

TODO

2.2.3 Estimation

TODO

2.2.4 Random generation

TODO

2.2.5 Applications

2.3 Engen distribution

2.3.1 Characterization

TODO

2.3.2 Properties

TODO

2.3.3 Estimation

TODO
2.4. LOGARITMIC DISTRIBUTION 29

2.3.4 Random generation

TODO

2.3.5 Applications

2.4 Logaritmic distribution

2.4.1 Characterization

TODO

2.4.2 Properties

TODO

2.4.3 Estimation

TODO

2.4.4 Random generation

TODO

2.4.5 Applications

2.5 Sichel distribution

2.5.1 Characterization

TODO

2.5.2 Properties

TODO
30 CHAPTER 2. NOT SO-COMMON DISCRETE DISTRIBUTION

2.5.3 Estimation

TODO

2.5.4 Random generation

TODO

2.5.5 Applications

2.6 Zipf distribution

The name “Zipf distribution” comes from George Zipf’s work on the discretized version of the Pareto
distribution, cf. Arnold (1983).

2.6.1 Characterization

See Arnold(83) for relationship with Pareto’s distribution.

2.6.2 Properties

TODO

2.6.3 Estimation

TODO

2.6.4 Random generation

TODO
2.7. THE GENERALIZED ZIPF DISTRIBUTION 31

2.6.5 Applications

2.7 The generalized Zipf distribution

2.7.1 Characterization

TODO

2.7.2 Properties

TODO

2.7.3 Estimation

TODO

2.7.4 Random generation

TODO

2.7.5 Applications

2.8 Rademacher distribution

2.8.1 Characterization

TODO

2.8.2 Properties

TODO

2.8.3 Estimation

TODO
32 CHAPTER 2. NOT SO-COMMON DISCRETE DISTRIBUTION

2.8.4 Random generation

TODO

2.8.5 Applications

2.9 Skellam distribution

2.9.1 Characterization

TODO

2.9.2 Properties

TODO

2.9.3 Estimation

TODO

2.9.4 Random generation

TODO

2.9.5 Applications

2.10 Yule distribution

2.10.1 Characterization

TODO

2.10.2 Properties

TODO
2.11. ZETA DISTRIBUTION 33

2.10.3 Estimation

TODO

2.10.4 Random generation

TODO

2.10.5 Applications

2.11 Zeta distribution

2.11.1 Characterization

TODO

2.11.2 Properties

TODO

2.11.3 Estimation

TODO

2.11.4 Random generation

TODO

2.11.5 Applications
Part II

Continuous distributions

34
Chapter 3

Finite support distribution

3.1 Uniform distribution

3.1.1 Characterization

The uniform distribution is the most intuitive

distribution, its density function is density function

1
1.0

U(0,1)
f (x) = , U(0,2)
b−a U(0,3)
0.8

where x ∈ [a, b] and a < b ∈ R. So the uniform

U(a, b) is only valued in [a, b]. From this, we
can derive the following distribution function
0.6
f(x)


 0 if x < a
0.4

x−a
F (x) = if a ≤ x ≤ b .
 b−a
1 otherwise
0.2

Another way to define the uniform distribu-

0.0

tion is to use the moment generating function

0.0 0.5 1.0 1.5 2.0 2.5 3.0

etb − eta x
M (t) =
t(b − a)
Figure 3.1: Density function for uniform distribu-
whereas its characteristic function is tion
eibt − eiat
φ(t) = .
i(b − a)t

3.1.2 Properties

a+b (b−a)2
The expectation of a uniform distribution is E(X) = 2 and its variance V ar(X) = 12 .

35
36 CHAPTER 3. FINITE SUPPORT DISTRIBUTION

If U is uniformally distributed U(0, 1), then (b−a)×U +a follows a uniform distribution U(a, b).

The sum of two uniform distribution does not follow a uniform distribution but a triangle
distribution.

The order statistic Xk:n of a sample of n i.i.d. uniform U(0, 1) random variable is beta distributed
Beta(k, n − k + 1).

Last but not least property is that for all random variables Y having a distribution function
FY , the random variable FY (Y ) follows a uniform distribution U(0, 1). Equivalently, we get that
the random variable FY−1 (U ) has the same distribution as Y where U ∼ U(0, 1) and FY−1 is the
generalized inverse distribution function. Thus, we can generate any random variables having a
distribution from the a uniform variate. This methods is called the inverse function method.

3.1.3 Estimation

For a sample (Xi )i of i.i.d. uniform variate, maximum likelihood estimators for a and b are respec-
tively X1:n and Xn:n , where Xi:n denotes the order statistics. But they are biased so we can use
the following unbiased estimators
n 1 1 n
â = X1:n + Xn:n and b̂ = X1:n + 2 Xn:n .
n2 −1 1−n 2 1−n 2 n −1
Finally the method of moments gives the following estimators
p p
ã = X̄n − 3Sn2 and b̃ = X̄n + 3Sn2 .

3.1.4 Random number generation

Since this is the core distribution, the distribution can not be generated from another distribution.
In our modern computers, we use deterministic algorithms to generate uniform variate initialized
with the machine time. Generally, Mersenne-Twister algorithm (or its extensions) from Matsumoto
& Nishimura (1998) is implemented, cf. Dutang (2008) for an overview of random number genera-
tion.

3.1.5 Applications

The main application is sampling from an uniform distribution by the inverse function method.

3.2 Triangular distribution

3.2.1 Characterization
3.2. TRIANGULAR DISTRIBUTION 37

density function
The triangular distribution has the following
density

1.0
T(0,2,1)
T(0,2,1/2)
 T(0,2,4/3)
 2(x−a) if a ≤ x ≤ c
(b−a)(c−a)

0.8
f (x) = 2(b−x) ,

(b−a)(b−c) if c ≤ x ≤ b

0.6
where x ∈ [a, b], a ∈ R, a < b and a ≤ c ≤ b.

f(x)
The associated distribution function is

0.4

 (x−a)2 if a ≤ x ≤ c
(b−a)(c−a)
F (x) = (b−x)2 .
 1− if c ≤ x ≤ b

0.2
(b−a)(b−c)

0.0
As many finite support distribution, we have
a characteristic function and a moment gener- 0.0 0.5 1.0 1.5 2.0

ating function. They have the following expre- x

sion: Figure 3.2: Density function for triangular distri-

iat
(b − c)e − (b − a)e ict −2(c − a)ebutions
ibt
φ(t) = +
−2(b − a)(c − a)(b − c)t2 (b − a)(c − a)(b − c)t2
and
(b − c)eat − (b − a)ect 2(c − a)ebt
M (t) = + .
2(b − a)(c − a)(b − c)t2 (b − a)(c − a)(b − c)t2

3.2.2 Properties

a+b+c
The expectation of the triangle distribution is E(X) = 3 whereas its variance is V ar(X) =
a2 +b2 +c2
18 − ab+ac+bc
18 .

3.2.3 Estimation

Maximum likelihood estimators for a, b, c do not have closed form. But we can maximise the log-
likelihood numerically. Furthermore, moment based estimators have to be computed numerically
solving the system of sample moments and theoretical ones. One intuitive way to estimate the
parameters of the triangle distribution is to use sample minimum, maximum and mode: â = X1:n ,
b̂ = Xn:n and ĉ = mode(X1 , . . . , Xn ), where mode(X1 , . . . , Xn ) is the middle of the interval whose
bounds are the most likely order statistics.

3.2.4 Random generation

The inverse function method can be used since the quantile function has a closed form:
(
if 0 ≤ u ≤ c−a
p
−1 a + u(b − a)(c − a) b−a
F (u) = .
if c−a
p
b − (1 − u)(b − a)(b − c) b−a ≤ u ≤ 1
38 CHAPTER 3. FINITE SUPPORT DISTRIBUTION

Thus F −1 (U ) with U a uniform variable is triangular distributed.

Stein & Keblis (2008) provides new kind of methods to simulate triangular variable. An algo-
rithm for the triangular T (0, 1, c) distribution is provided. It can be adapted for a, b, c in general.
Let c̃ be c−a
b−a which is in ]0, 1[. The “minmax” algorithm is

• generate U, V (idependently) from a uniform distribution,

• X = a + (b − a) × [(1 − c̃) min(U, V ) + c̃ max(U, V )].

This article also provides another method using a square root of uniform variate, which is called
“one line method”, but it is not necessary more fast if we use vector operation.

3.2.5 Applications

A typical of the triangle distribution is when we know the minimum and the maximum of outputs
of an interest variable plus the most likely outcome, which represent the parameter a, b and c. For
example we may use it in business decision making based on simulation of the outcome, in project
management to model events during an interval and in audio dithering.

3.3 Beta type I distribution

3.3.1 Characterization

The beta distribution of first kind is a distribu- density function

tion valued in the interval [0, 1]. Its density is
B(2,2)
2.0

defined as B(3,1)
B(1,5)
Arcsine
xa−1 (1 − x)b−1
f (x) = ,
β(a, b)
1.5

where x ∈ [0, 1], a, b > 0 and β(., .) is the beta

function defined in terms of the gamma func-
f(x)

1.0

tion.

Since a, b can take a wide range of values,

0.5

this allows many different shapes for the beta

density:
0.0

• a = b = 1 corresponds to the uniform 0.0 0.2 0.4 0.6 0.8 1.0

x
distribution
Figure 3.3: Density function for beta distributions
• when a, b < 1, density is U-shapped
3.3. BETA TYPE I DISTRIBUTION 39

• when a < 1, b ≥ 1 or a = 1, b > 1, density

is strictly decreasing

– for a = 1, b > 2, density is strictly

convex
– for a = 1, b = 2, density is a straight
line
– for a = 1, 1 < b < 2, density is
strictly concave

• when a = 1, b < 1 or a > 1, b ≤ 1, density

is strictly increasing

– for a > 2, b = 1, density is strictly convex

– for a = 2, b = 1, density is a straight line
– for 1 < a < 2, b = 1, density is strictly concave

• when a, b > 1, density is unimodal.

Let us note that a = b implies a symmetric density.

From the density, we can derive its distribution function

β(a, b, x)
F (x) = ,
β(a, b)

where x ∈ [0, 1] and β(., ., .) denotes the incomplete beta function. There is no analytical formula
for the incomplete beta function but can be approximated numerically.

There exists a scaled version of the beta I distribution. Let θ be a positive scale parameter.
The density of the scaled beta I distribution is given by

xa−1 (θ − x)b−1
f (x) = ,
θa+b−1 β(a, b)

where x ∈ [0, θ]. We have the following distribution function

β(a, b, xθ )
F (x) = .
β(a, b)

Beta I distributions have moment generating function and characteristic function expressed in
terms of series:
+∞ k−1
!
X Y a+r tk
M (t) = 1 +
a + b + r k!
k=1 r=0

and
φ(t) = 1 F1 (a; a + b; i t),
where 1 F1 denotes the hypergeometric function.
40 CHAPTER 3. FINITE SUPPORT DISTRIBUTION

3.3.2 Special cases

A special case of the beta I distribution is the arcsine distribution, when a = b = 12 . In this special
case, we have
1
f (x) = p ,
π x(1 − x)
from which we derive the following distribution function
2 √
F (x) = arcsin( x).
π

Another special case is the power distribution when b = 1, with the following density

f (x) = axa−1 and F (x) = xa ,

for 0 < x < 1.

3.3.3 Properties

a ab θa
The moments of the beta I distribution are E(X) = a+b and V ar(X) = (a+b)2 (a+b+1)
(and a+b ,
θ2 ab
(a+b)2 (a+b+1)
for the scaled version respectively).

Raw moments for the beta I distribution are given by

Γ(α + β)Γ(α + r)
E(X r ) = ,
Γ(α + β + r)Γ(α)
while central moments have the following expression
r
α α+β
E ((X − E(X))r ) = − F
2 1 α, −r, α + β, .
α+β α

For the arcsine distribution, we have 12 and 81 respectively. Let us note that the expectation of
a arcsine distribution is the least probable value!

Let n be an integer. If we consider n i.i.d. uniform U(0, 1) variables Ui , then the distribution
of the maximum max Ui of these random variables follows a beta I distribution B(n, 1).
1≤i≤n

3.3.4 Estimation

Maximum likelihood estimators for a and b do not have closed form, we must solve the system
n

1 P


 n log(Xi ) = β(a, b)(ψ(a + b) − ψ(a))
i=1
n
1 P
log(1 − Xi ) = β(a, b)(ψ(a + b) − ψ(b))


 n
i=1
3.4. GENERALIZED BETA I DISTRIBUTION 41

numerically, where ψ(.) denotes the digamma function.

Method of moments gives the following estimators

X̄n (1 − X̄n ) 1 − X̄n
ã = X̄n −1 and b̃ = ã .
Sn2 X̄n

3.3.5 Random generation

NEED REFERENCE

3.3.6 Applications

The arcsine distribution (a special case of the beta I) can be used in game theory. If we have two
players playint at head/tail coin game and denote by (Si )i≥1 the serie of gains of the first player
for the different game events, then the distribution of the proportion of gains among all the Si ’s
that are positive follows asymptotically an arcsine distribution.

3.4 Generalized beta I distribution

3.4.1 Characterization

The generalized beta distribution is the distri- density function

1
bution of the variable θX τ when X is beta dis-
3.0

B(2,2,2,2)
tributed. Thus it has the following density B(3,1,2,2)
B(3,1,1/2,2)
B(1/2,2,1/3,2)
2.5

(x/θ)a−1 (1 − (x/θ))b−1 τ
f (x) =
β(a, b) x
2.0

for 0 < x < θ and a, b, τ, θ > 0. θ is a scale

f(x)

1.5

parameter while a, b, τ are shape parameters.

1.0

As for the beta distribution, the distribution

function is expressed in terms of the incomplete
0.5

beta function
0.0

β(a, b, ( xθ )τ )
F (x) = , 0.0 0.5 1.0 1.5 2.0
β(a, b)
x

for 0 < x < θ. Figure 3.4: Density function for generalized beta
distributions
42 CHAPTER 3. FINITE SUPPORT DISTRIBUTION

3.4.2 Properties

Moments of the generalized beta distribution

are given by the formula
β(a + τr )
E(X r ) = θr .
β(a, b)

For τ = θ = 1, we retrieve the beta I distribution.

3.4.3 Estimation

Maximum likelihood estimators as well as moment based estimators have no chance to have explicit
form, but we can compute it numerically. NEED REFERENCE

3.4.4 Random generation

NEED REFERENCE

3.4.5 Applications

NEED REFERENCE

3.5 Generalization of the generalized beta I distribution

3.5.1 Characterization

A generalization of the generalized beta distribution has been studied in Nadarajah & Kotz (2003).
Its density is given by

bβ(a, b) a+b−1
f (x) = x 2 F1 (1 − γ, a, a + b, x),
β(a, b + γ)

where 0 < x < 1 and 2 F1 denotes the hypergeometric function. Its distribution function is also
expressed in terms of the hypergeometric function:

bβ(a, b)
F (x) = xa+b 2 F1 (1 − γ, a, a + b + 1, x),
(a + b)β(a, b + γ)
3.5. GENERALIZATION OF THE GENERALIZED BETA I DISTRIBUTION 43

3.5.2 Special cases

Nadarajah & Kotz (2003) list specials cases of this distribution: If a + b + γ = 1 then we get

bΓ(b)xa+b−1 (1 − x)−a
f (x) = .
Γ(1 − a)Γ(a + b)
If a + b + γ = 2 then we get
b(a + b − 1)β(a, b)
f (x) = β(a + b − 1, 1 − a, x)
β(a, 2 − a)
If in addition

• a + b − 1 ∈ N, we have
a+b−1
!
b(a + b − 1)β(a, b)β(a + b − 1, 1 − a) X Γ(i − a)
f (x) = 1− xi−1 (1 − x)1−a
β(a, 2 − a) Γ(1 − a)Γ(i)
i=1

• a = 1/2 and b = 1, we have r

4 x
f (x) = arctan
π 1−x

• a = 1/2 and b = k ∈ N, we have

k−1
r !
k(2k − 1)β(1/2, k)β(1/2, k − 1/2) 2 x p X
f (x) = arctan − x(1 − x) di (x, k)
π π 1−x
i=1

If γ = 0 then, we get

f (x) = b(a + b − 1)(1 − x)b−1 β(a + b − 1, 1 − b, x)

If in addition

• a + b − 1 ∈ N, we have
a+b−1
!
X Γ(i − b)
f (x) = b(a + b − 1)β(a, b)β(a + b − 1, 1 − a) 1 − xi−1 (1 − x)1−b
Γ(1 − b)Γ(i)
i=1

• a = 1 and b = 1/2, we have

r
1 x
f (x) = √ arctan
2 1−x 1−x

• a = k ∈ N, we have
k−1
r !
(2k − 1)β(1/2, k − 1/2) 2 x p X
f (x) = √ arctan − x(1 − x) di (x, k)
4 1−x π 1−x
i=1
44 CHAPTER 3. FINITE SUPPORT DISTRIBUTION

If γ = 1, then we get a power function

f (x) = (a + b)xa+b−1

If a = 0, then we get a power function

f (x) = bxb−1

If b = 0, then we get
β(a, γ, x)
f (x) =
β(a, γ + 1)
If in addition

• a ∈ N, we have
a
!
a X Γ(γ + i − 1) i−1 γ
f (x) = +1 1− x (1 − x)
γ Γ(γ)Γ(i)
i=1

• γ ∈ N, we have
a
!
a X Γ(a + i − 1)
f (x) = +1 1− xa (1 − x)i−1
γ Γ(a)Γ(i)
i=1

• a = γ = 1/2, we have r
4 x
f (x) = arctan
π 1−x

• a = k − 1/2 and γ = j − 1/2 with k, j ∈ N, we have

k−1 j−1
r !
a 2 x p X X
f (x) = +1 arctan − x(1 − x) di (x, k) + ci (x, k)
γ π 1−x
i=1 i=1

Where ci , di functions are defined by

Γ(k + i − 1)xk−1/2 (1 − x)i−1/2

ci (x, k) =
Γ(k − 1/2)Γ(i + 1/2)

and
Γ(i)xi−1
di (x, k) = .
Γ(i + 1/2)Γ(1/2)

3.5.3 Properties

Moments for this distribution are given by

bβ(a, b)
E(X n ) = xa+b 3 F1 (1 − γ, a, n + a + b + 1, a + b, n + a + b + 1, 1),
(n + a + b)β(a, b + γ)

where 3 F1 is a hypergeometric function.

3.6. KUMARASWAMY DISTRIBUTION 45

3.5.4 Estimation

NEED REFERENCE

3.5.5 Random generation

NEED REFERENCE

3.5.6 Applications

NEED REFERENCE

3.6 Kumaraswamy distribution

3.6.1 Characterization

The Kumaraswamy distribution has the follow- density function

ing density function
3.0

K(5,2)
K(2,5/2)
f (x) = abxa−1 (1 − xa )b−1 , K(1/2,1/2)
K(1,3)
2.5

where x ∈ [0, 1], a, b > 0. Its distribution func-

tion is
2.0

F (x) = 1 − (1 − xa )b .
A construction of the Kumaraswamy distribu-
f(x)

1.5

tion use minimum/maximum of uniform sam-

ples. Let n be the number of samples (each with
1.0

m i.i.d. uniform variate), then the distribution

of the minimumm of all maxima (by sample)
0.5

is a Kumaraswamy Ku(m, n), which is also the

0.0

distribution of one minus the maximum of all

minima. 0.0 0.2 0.4 0.6 0.8 1.0

From Jones (2009), the shapes of the density

Figure 3.5: Density function for Kumaraswamy
behaves as follows
distributions

• a, b > 1 implies unimodal density,

• a > 1, b ≤ 1 implies increasing density,

• a = b = 1 implies constant density,

• a ≤ 1, b > 1 implies decreasing density,

46 CHAPTER 3. FINITE SUPPORT DISTRIBUTION

• a, b < 1 implies uniantimodal,

which is examplified in the figure on the right.

3.6.2 Properties

Moments for a Kumaraswamy distribution are available and computable with

τ
E(X τ ) = bβ(1 + , b)
a
when τ > −a with β(., .) denotes the beta function. Thus the expectation of a Kumaraswamy
distribution is E(X) = bΓ(1+1/a)Γ(b) 2 2 2 1
Γ(1+1/a+b) and its variance V ar(X) = bβ(1 + a , b) − b β (1 + a , b).

3.6.3 Estimation

From Jones (2009), the maximum likelihood estimators are computable by the following procedure

Pn Yi log Yi
with Yi = Xia to find â∗ ,
Pn log Yi
1. solve the equation n
1+ 1
+ Pni=1 1−Yi
a n i=1 1−Yi i=1 log(1−Yi )

Pn −1
2. compute b̂ = −n i=1 log(1 − Xiâ ) .

3.6.4 Random generation

Since the quantile function is explicit

1
1
F −1 (u) = 1 − (1 − u) b
a
,

an inversion function method F −1 (U ) with U uniformly distributed is easily computable.

3.6.5 Applications

From wikipedia, we know a good example of the use of the Kumaraswamy distribution: the storage
volume of a reservoir of capacity zmax whose upper bound is zmax and lower bound is 0.

∗
the solution for this equation exists and is unique.
Chapter 4

The Gaussian family

4.1 The Gaussian (or normal) distribution

The normal distribution comes from the study of astronomical data by the German mathematician
Gauss. That’s why it is widely called the Gaussian distribution. But there are some hints to
think that Laplace has also used this distribution. Thus sometimes we called it the Laplace Gauss
distribution, a name introduced by K. Pearson who wants to avoid a querelle about its name.

4.1.1 Characterization

The density of a normal distribution N (µ, σ 2 )

is
1 (x−µ)2 density function
f (x) = √ e− 2σ2 ,
σ 2π
0.8

N(0,1)
N(0,2)
where x ∈ R and µ(∈ R) denotes the mean N(0,1/2)
N(-1,1)
of the distribution (a location parameter) and
σ 2 (> 0) its variance (a scale parameter).
0.6

Its distribution function is then

f(x)

0.4

Z x
1 (x−µ)2
F (x) = √ e− 2σ2 du,
−∞ σ 2π
0.2

which has no explicit expressions. Many soft-

wares have this distribution function imple-
mented, since it is The basic distribution. Gen-
0.0

erally, we denote by Φ the distribution function

-4 -2 0 2 4
a N (0, 1) normal distribution, called the stan-
x
dard normal distribution. F can be rewritten
as
x−µ
F (x) = Φ . Figure 4.1: The density of Gaussian distributions
σ

47
48 CHAPTER 4. THE GAUSSIAN FAMILY

Finally, the normal distribution can also be characterized through its moment generating func-
tion
σ 2 t2
M (t) = emt+ 2 ,
as well as its characteristic function
σ 2 t2
φ(t) = eimt− 2 .

4.1.2 Properties

It is obvious, but let us recall that the expectation (and the median) of a normal distribution
N (µ, σ 2 ) is µ and its variance σ 2 . Furthermore if X ∼ N (0, 1) we have that E(X n ) = 0 if x is odd
and (2n)!
2n n! if x is even.

The biggest property of the normal distribution is the fact that the Gaussian belongs to the
family of stable distribution (i.e. stable by linear combinations). Thus we have

• if X ∼ N (µ, σ 2 ) and Y ∼ N (ν, ρ2 ), then aX + bY ∼ N (aµ + bν, a2 σ 2 + b2 ρ2 + 2abCov(X, Y )),

with the special case where X, Y are independent cancelling the covariance term.

• if X ∼ N (µ, σ 2 ), a, b two reals, then aX + b ∼ N (aµ + b, a2 σ 2 ).

If we consider an i.i.d. sample of n normal random variables (Xi )1≤i≤n , then the sample mean
2 2
X n follows a N (µ, σn ) independently from the sample variance Sn2 such that Sσn2n follows a chi-square
distribution with n − 1 degrees of freedom.

A widely used theorem using a normal distribution is the central Pnlimit theorem:
2 X −nm L
If (Xi )1≤i≤n are i.i.d. with mean m and finite variance s , then i=1s√ni −→ N (0, 1). If we
drop the hypothesis of identical distribution, there is still an asymptotic convergence (cf. theorem
of Lindeberg-Feller).

4.1.3 Estimation

The maximum likelihood estimators are

Pn 2
• Xn = 1
n i=1 Xi ∼ N (µ, σn ) is the unbiased estimator with minimum variance of µ,
Pn
• Sn2 = 1
n−1 i=1 (Xi − X n )2 ∼ χ2n−1 is the unbiased estimator with minimum variance of σ 2∗ ,
Γ( n−1
q
)p 2
• σ̂n = n−12
2
Γ( n ) Sn is the unbiased estimator with minimum variance of σ but we generally
p 2
use Sn2 .

Confidence intervals for these estimators are also well known quantities
∗
This estimator is not the maximum likelihood estimator since we unbias it.
4.1. THE GAUSSIAN (OR NORMAL) DISTRIBUTION 49

q q
2
Sn 2
Sn
• I(µ) = X n − n tn−1,α/2 ; X n + n tn−1,α/2 ,
h 2n 2n
i
Sn Sn
• I(σ 2 ) = zn−1,α/2 ;z ,
n−1,1−α/2

where tn−1,α/2 and zn−1,α/2 are quantiles of the Student and the Chi-square distribution.

4.1.4 Random generation

The Box-Muller algorithm produces normal random variates:

• generate U, V from a uniform U(0, 1) distribution,

√ √
• compute X = −2 log U cos(2πV ) and Y = −2 log U sin(2πV ).

In outputs, X and Y follow a standard normal distribution (independently).

But there appears that this algorithm under estimates the tail of the distribution (called the
Neave effect, cf. Patard (2007)), most softwares use the inversion function method, consist in
computing the quantile function Φ−1 of a uniform variate.

4.1.5 Applications

From wikipedia, here is a list of situations where approximate normality is sometimes assumed

• In counting problems (so the central limit theorem includes a discrete-to-continuum approx-
imation) where reproductive random variables are involved, such as Binomial random vari-
ables, associated to yes/no questions or Poisson random variables, associated to rare events;
• In physiological measurements of biological specimens: logarithm of measures of size of living
tissue (length, height, skin area, weight) or length of inert appendages (hair, claws, nails,
teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark
also falls under this category or other physiological measures may be normally distributed,
but there is no reason to expect that a priori;
• Measurement errors are often assumed to be normally distributed, and any deviation from
normality is considered something which should be explained;
• Financial variables: changes in the logarithm of exchange rates, price indices, and stock
market indices; these variables behave like compound interest, not like simple interest, and
so are multiplicative; or other financial variables may be normally distributed, but there is
no reason to expect that a priori;
• Light intensity: intensity of laser light is normally distributed or thermal light has a Bose-
Einstein distribution on very short time scales, and a normal distribution on longer timescales
due to the central limit theorem.
50 CHAPTER 4. THE GAUSSIAN FAMILY

4.2 Log normal distribution

4.2.1 Characterization

One way to characterize a random variable fol-

lows a log-normal distribution is to say that its
logarithm is normally distributed. Thus the dis- densities of log-normal distribution
tribution function of a log-normal distribution

1.0
(LG(µ, σ 2 )) is LN(0,1)
LN(0,2)
LN(0,1/2)

log(x) − µ

0.8
F (x) = Φ ,
σ

0.6
where Φ denotes the distribution function of the

f(x)
standard normal distribution and x > 0.

0.4
From this we can derive an explicit expres-
0.2
sion for the density LG(µ, σ 2 )
0.0

1 (log(x)−µ)2
f (x) = √ e− 2σ2 ,
σx 2π 0 2 4 6 8 10

x
for x > 0, µ ∈ R and σ 2 > 0.

A log-normal distribution does not have fi- Figure 4.2: The density of log-normal distribu-
nite characteristic function or moment generat- tions
ing function.

4.2.2 Properties

The expectation and the variance of a log-

σ2
normal distribution are E(X) = eµ+ 2 and
2 2
V ar(X) = (eσ − 1)e2µ+σ . And raw moments
n2 σ 2
are given by E(X n ) = enµ+ 2 . The median of a log-normal distribution is eµ .

From Klugman et al. (2004), we also have a formula for limited expected values

kσ 2

E (X ∧ L)k = ek(µ+ 2 Φ(u − kσ) + Lk (1 − Φ(u)),

log(L)−µ
where u = σ .

Since the Gaussian distribution is stable by linear combination, log-normal distribution is stable
by product combination. That is to say if we consider X and Y two independent log-normal
variables (LG(µ, σ 2 ) and LG(ν, ρ2 )), we have XY follows a log-normal distribution LG(µ+ν, σ 2 +ρ2 ).
Let us note that X 2
Y also follows a log-normal distribution LG(µ − ν, σ + ρ ).
2
4.2. LOG NORMAL DISTRIBUTION 51

An equivalence of the Limit Central Theorem for the log-normal distribution is the product of
i.i.d. random variables (Xi )1≤i≤n asymptotically follows a log-normal distribution with paramter
nE(log(X)) and nV ar(log(X)).

4.2.3 Estimation

Maximum likelihood estimators for µ and σ 2 are simply

Pn
• µ̂ = 1
n i=1 log(xi ) is an unbiased estimator of µ,
Pn
• σ
c2 = 1
n−1 i=1 (log(xi ) − µ̂)2 is an unbiased estimator of σ 2∗ .

One amazing fact about parameter estimations of log-normal distribution is that those estimators
are very stable.

4.2.4 Random generation

Once we have generated a normal variate, it is easy to generate a log-normal variate just by taking
the exponential of normal variates.

4.2.5 Applications

There are many applications of the log-normal distribution. Limpert et al. (2001) focuses on
application of the log-normal distribution. For instance, in finance the Black & Scholes assumes
that assets are log-normally distributed (cf. Black & Scholes (1973) and the extraordinary number
of articles citing this article). Singh et al. (1997) deals with environmental applications of the
log-normal distribution.

∗
As for the σ 2 estimator of normal distribution, this estimator is not the maximum likelihood estimator since we
unbias it.
52 CHAPTER 4. THE GAUSSIAN FAMILY

4.3 Shifted log normal distribution

4.3.1 Characterization

An extension to the log-normal distribution is density function

the translated log-normal distribution. It is the

0.8
LN(0,1,0)
distribution of X + ν where X follows a log- LN(0,1,1)
LN(0,1,1/2)
normal distribution. It is characterized by the
following distribution function

0.6

log(x − ν) − µ
F (x) = Φ ,
σ

f(x)

0.4
where Φ denotes the distribution function of
the standard normal distribution and x > 0.
Then we have this expression for the density

0.2
T LG(ν, µ, σ 2 )

1 (log(x−ν)−µ)2
0.0

f (x) = √ e− 2σ 2 ,
σ(x − ν) 2π 0.0 0.5 1.0 1.5 2.0

for x > 0, µ, ν ∈ R and σ 2 > 0. Figure 4.3: The density of shifted log-normal dis-
tributions
As for the log-normal distribution, there is
no moment generating function nor character-
istic function.

4.3.2 Properties

σ2
The expectation and the variance of a log-normal distribution are E(X) = ν +eµ+ 2 and V ar(X) =
2 2
σ2 2µ+σ 2 nµ+ n 2σ
(e − 1)e . And raw moments are given by E(X n ) = e .

4.3.3 Estimation

An intuitive approach is to estimate ν with X1:n , then estimate parameters on shifted samples
(Xi − ν)i .

4.3.4 Random generation

Once we have generated a normal variate, it is easy to generate a log-normal variate just by taking
the exponential of normal variates and adding the shifted parameter ν.
4.4. INVERSE GAUSSIAN DISTRIBUTION 53

4.3.5 Applications

An application of the shifted log-normal distribution to finance can be found in Haahtela (2005) or
Brigo et al. (2002).

4.4 Inverse Gaussian distribution

4.4.1 Characterization

The density of an inverse Gaussian distribution densities of inv-gauss distribution

IG(ν, λ) is given by

1.5
r InvG(1,2)
(x − ν)2

λ InvG(2,2)
f (x) = exp −λ , InvG(1,1/2)
2πx3 2ν 2 x
while its distribution function is
1.0
"r # "r #
λ x
2λ/ν λ x
f(x)

F (x) = Φ − 1 +e Φ +1 ,
x ν x ν
0.5

for x > 0, ν ∈ R, λ > 0 and Φ denotes the usual

standard normal distribution.
0.0

Its characteristic function is

» q –
2
( λν )1− 1− 2νλ it 0 1 2 3 4 5 6
φ(t) = e .
x

The moment generating function is ex- Figure 4.4: The density of inverse Gaussian dis-
pressed as tributions –
» q
2
λ
( ν ) 1− 1− 2νλ t
M (t) = e .

4.4.2 Properties

ν3
The expectation of an inverse Gaussian distribution IG(ν, λ) is ν and its variance λ.

Pn−1 Γ(n+i) 2λ i
Moments for the inverse Gaussian distribution are given E(X n ) = ν n i=0 Γ(i+1)Γ(n−i) ( ν )
for n integer.

From Yu (2009), we have the following properties

• if X is inverse Gaussian distributed IG(ν, λ), then aX follows an inverse Gaussian distribution
IG(aν, aλ) for a > 0
• if (Xi )i are i.i.d. inverse Gaussian variables, then the sum ni=1 Xi still follows an inverse
P
Gaussian distribution IG(nν, n2 λ)
54 CHAPTER 4. THE GAUSSIAN FAMILY

4.4.3 Estimation

Maximum likelihood estimators of ν and λ are

n !−1
X 1 1
µ̂ = X̄n and λ̂ = n − .
Xi µ̂
i=1

nλ
From previous properties, µ̂ follows an inverse gaussian distribution IG(µ, nλ) and follows a
λ̂
chi-squared distribution χ2n−1 .

4.4.4 Random generation

NEED

Mitchael,J.R., Schucany, W.R. and Haas, R.W. (1976). Generating random roots from variates
using transformations with multiple roots. American Statistician. 30-2. 88-91.

4.4.5 Applications

NEED REFERENCE

4.5 The generalized inverse Gaussian distribution

This section is taken from Breymann & Lüthi (2008).

4.5.1 Characterization density function

1.5

GIG(-1/2,5,1)
GIG(-1,2,3)
A generalization of the inverse Gaussian distri- GIG(-1,1/2,1)
GIG(1,5,1)
bution exists but there is no closed form for its
distribution function and its density used Bessel
1.0

functions. The latter is as follows

λ
xλ−1

ψ 2 1 χ
f(x)

f (x) = √ exp − + ψx ,
χ 2Kλ ( χψ) 2 x
0.5

where x > 0 and Kλ denotes the modified

Bessel function. Parameters must satisfy
0.0

• χ > 0, ψ ≥ 0, when λ < 0,

0 1 2 3 4 5
• χ > 0, ψ > 0, when λ = 0, x

Figure 4.5: The density of generalized inverse

Gaussian distributions
4.5. THE GENERALIZED INVERSE GAUSSIAN DISTRIBUTION 55

• χ ≥ 0, ψ > 0, when λ > 0.

The generalized inverse Gaussian is noted as

GIG(λ, ψ, χ).

Closed form for distribution function??

Plot

The moment generating function is given by

λ/2 p
ψ Kλ ( χ(ψ − 2t))
M (t) = √ . (4.1)
ψ − 2t Kλ ( χψ)

4.5.2 Properties

The expectation is given by r √

χ Kλ+1 ( χψ)
√ ,
ψ Kλ ( χψ)
and more generally the n-th moment is as follows
n √
n χ 2 Kλ+n ( χψ)
E(X ) = √ .
ψ Kλ ( χψ)

Thus we have the following variance

√ √ 2
χ Kλ+2 ( χψ) χ Kλ+1 ( χψ)
V ar(X) = √ − √ .
ψ Kλ ( χψ) ψ Kλ ( χψ)

Furthermore,
∂dE(X α )
E(log X) = . (4.2)
∂dα α=0
Note that numerical calculations of E(log X) may be performed with the integral representation as
well.

4.5.3 Estimation

NEED REFERENCE

4.5.4 Random generation

NEED REFERENCE
Chapter 5

Exponential distribution and its

extensions

5.1 Exponential distribution

5.1.1 Characterization

The exponential is a widely used and widely density function

known distribution. It is characterized by the
2.0

E(1)
following density E(2)
E(1/2)
f (x) = λe−λx ,
for x > 0 and λ > 0. Its distribution function
1.5

is
F (x) = 1 − e−λx .
f(x)

1.0

Since it is a light-tailed distribution, the mo-

ment generating function of an exponential dis-
0.5

tribution E(λ) exists which is

λ
M (t) = ,
λ−t
0.0

while its characteristic function is

0.0 0.5 1.0 1.5 2.0
λ
φ(t) = . x
λ − it
Figure 5.1: Density function for exponential dis-
tributions
5.1.2 Properties

1 1
The expectation and the variance of an exponential distribution E(λ) are λ and λ2
. Furthermore
the n-th moment is given by
Γ(n + 1)
E(X n ) = .
λn

56
5.1. EXPONENTIAL DISTRIBUTION 57

The exponential distribution is the only one continuous distribution to verify the lack of memory
property. That is to say if X is exponentially distributed, we have
P (X > t + s)
= P (X > t),
P (X > s)
where t, s > 0.

If we sum n i.i.d. exponentially distributed random variables, we get a gamma distribution

G(n, λ).

5.1.3 Estimation

The maximum likelihood estimator and the moment based estimator are the same
n 1
λ̂ = Pn = ,
i=1 Xi Xn
for a sample (Xi )1≤i≤n . But the unbiased estimator with mininum variance is
n−1
λ̃ = Pn .
i=1 Xi

Exact confidence interval for parameter λ is given by

z2n,1− α2 z2n, α2

Iα (λ) = , P ,
2 ni=1 Xi 2 ni=1 Xi
P

where zn,α denotes the α quantile of the chi-squared distribution.

5.1.4 Random generation

Despite the quantile function is F −1 (u) = − λ1 log(1 − u), generally the exponential distribution
E(λ) is generated by applying − λ1 log(U ) on a uniform variate U .

5.1.5 Applications

From wikipedia, the exponential distribution occurs naturally when describing the lengths of the
inter-arrival times in a homogeneous Poisson process.

The exponential distribution may be viewed as a continuous counterpart of the geometric distri-
bution, which describes the number of Bernoulli trials necessary for a ”discrete” process to change
state. In contrast, the exponential distribution describes the time for a continuous process to change
state.

In real-world scenarios, the assumption of a constant rate (or probability per unit time) is rarely
satisfied. For example, the rate of incoming phone calls differs according to the time of day. But
58 CHAPTER 5. EXPONENTIAL DISTRIBUTION AND ITS EXTENSIONS

if we focus on a time interval during which the rate is roughly constant, such as from 2 to 4 p.m.
during work days, the exponential distribution can be used as a good approximate model for the
time until the next phone call arrives. Similar caveats apply to the following examples which yield
approximately exponentially distributed variables:

• the time until a radioactive particle decays, or the time between beeps of a geiger counter;

• the time it takes before your next telephone call

• the time until default (on payment to company debt holders) in reduced form credit risk
modeling

Exponential variables can also be used to model situations where certain events occur with a
constant probability per unit ”distance”:

• the distance between mutations on a DNA strand;

• the distance between roadkill on a given road;

In queuing theory, the service times of agents in a system (e.g. how long it takes for a bank
teller etc. to serve a customer) are often modeled as exponentially distributed variables. (The inter-
arrival of customers for instance in a system is typically modeled by the Poisson distribution in most
management science textbooks.) The length of a process that can be thought of as a sequence of
several independent tasks is better modeled by a variable following the Erlang distribution (which
is the distribution of the sum of several independent exponentially distributed variables).

Reliability theory and reliability engineering also make extensive use of the exponential distri-
bution. Because of the “memoryless” property of this distribution, it is well-suited to model the
constant hazard rate portion of the bathtub curve used in reliability theory. It is also very conve-
nient because it is so easy to add failure rates in a reliability model. The exponential distribution is
however not appropriate to model the overall lifetime of organisms or technical devices, because the
“failure rates” here are not constant: more failures occur for very young and for very old systems.

In physics, if you observe a gas at a fixed temperature and pressure in a uniform gravitational
field, the heights of the various molecules also follow an approximate exponential distribution. This
is a consequence of the entropy property mentioned below.
5.2. SHIFTED EXPONENTIAL 59

5.2 Shifted exponential

5.2.1 Characterization

The distribution of the shifted exponential dis- density function

tribution is simply the distribution of X − τ

0.6
E(1/2,0)
when X is exponentially distributed. Therefore E(1/2,1)
E(1/2,2)
the density is given by

0.5
f (x) = λe−λ(x−τ )

0.4
for x > τ . The distribution function is given by

f(x)

0.3
F (x) = 1 − e−λ(x−τ )

0.2
for x > τ .

0.1
As for the exponential distribution, there
exists a moment generating function 0.0

λ
M (t) = e−tτ 0 1 2 3 4 5
λ−t
x

and also a characteristic function Figure 5.2: Density function for shifted exponen-
λ tial distributions
φ(t) = e−itτ .
λ − it

5.2.2 Properties

1 1
The expectation and the variance of an exponential distribution E(λ, τ ) are τ + λ and λ2
.

Furthermore the n-th moment (for n integer) is computable with the binomial formula by
n
X n! (−τ )n
E(X n ) = .
(n − i)! (−λτ )i
i=0

5.2.3 Estimation

Maximum likelihood estimator for τ and λ are given by

n
τ̂ = X1:n and λ̂ = Pn
i=1 (Xi − τ̂ )

where Xi:n denotes the ith order statistic. Since the minimum X1:n follows a shifted exponential
distribution E(nλ, τ ), we have τ̂ is biased but asympotically unbiased.

NEED REFERENCE for unbiased estimators

60 CHAPTER 5. EXPONENTIAL DISTRIBUTION AND ITS EXTENSIONS

5.2.4 Random generation

The random generation is simple: just add τ to the algorithm of exponential distribution.

5.2.5 Applications

NEED REFERENCE

5.3 Inverse exponential

5.3.1 Characterization

This is the distribution of the random variable density function

1
X when X is exponentially distributed. The
0.6

density defined as IE(1)

IE(2)
IE(3)

λ −λ
0.5

f (x) = e x,
x2
0.4

where x > 0 and λ > 0. The distribution func-

tion can then be derived as
f(x)

0.3

λ
F (x) = e− x .
0.2

We can define inverse exponential distributions

0.1

with characteristic or moment generating func-

tions
0.0

√ √
φ(t) = 2 −itλK1 2 −iλt 0 1 2 3 4 5

and √ √ Figure 5.3: Density function for inverse exponen-

M (t) = 2 −itλK1 2 −λt . tial distributions
where K. (.) denotes the modified Bessel function.

5.3.2 Properties

Moments of the inverse exponential distribution are given by

E(X r ) = λr ∗ Γ(1 − r)

for r < 1. Thus the expectation and the variance of the inverse exponential distribution do not
exist.
5.4. GAMMA DISTRIBUTION 61

5.3.3 Estimation

Maximum likelihood estimator of λ is

n
!−1
X 1
λ̂ = n ,
Xi
i=1

which is also the moment based estimator with E(X −1 ) = λ−1 .

5.3.4 Random generation

The algorithm is simply to inverse an exponential variate of parameter 1

λ, i.e. (−λ log(U ))−1 for
an uniform variable U .

5.3.5 Applications

NEED REFERENCE

5.4 Gamma distribution

5.4.1 Characterization

The gamma distribution is a generalization of density function

the exponential distribution. Its density is de-
fined as G(1,1)
G(2,1)
G(2,2)
1.0

λα −λx α−1 G(1/2,1)

f (x) = e x ,
Γ(α)
0.8

where x ≥ 0, α, λ > 0 and Γ denotes the gamma

0.6

function. We retrieve the exponential distribu-

f(x)

tion by setting α to 1. When α is an integer,

the gamma distribution is sometimes called the
0.4

Erlang distribution.
0.2

The distribution function can be expressed

in terms of the incomplete gamma distribution.
0.0

We get
0 1 2 3 4 5
γ(α, λx)
F (x) = , x
Γ(α)
Figure 5.4: Density function for gamma distribu-
where γ(., .) is the incomplete gamma function. tions
62 CHAPTER 5. EXPONENTIAL DISTRIBUTION AND ITS EXTENSIONS

There is no analytical formula except when

we deal with Erlang distribution (i.e. α ∈ N).
In this case, we have
α−1
X (λx)i −λx
F (x) = 1 − e .
i!
i=0

For the gamma distribution, the moment generating and characteristic functions exist.
−α
λ
φ(t) = ,
λ − it

and −α
λ
M (t) = .
λ−t

5.4.2 Properties

The expectation of a gamma distribution G(α, λ) is E(X) = αλ , while its variance is V ar(X) = α
λ2
.

For a gamma distribution G(α, λ), the τ th moment is given by

Γ(α + r)
E(X r ) = λr ,
Γ(α)

provided that α + r > 0.

As for the exponential, we have a property on the convolution of gamma distributions. Let
X and Y be gamma distributed G(α, λ) and G(β, λ), we can prove that X + Y follows a gamma
distribution G(α + β, λ).

X
For X and Y gamma distributed (G(α, λ) and G(β, λ) resp.), we also have that X+Y follows a
beta distribution of the first kind with parameter α and β.

5.4.3 Estimation

Method of moments give the following estimators

(X̄n )2 X̄n
α̃ = 2
and λ̃ = 2 .
Sn Sn

with X̄n and Sn2 the sample mean and variance.

Maximum likelihood estimators of α, λ verify the system

(
log α − ψ(α) = log( n1 ni=1 Xi ) − n1 ni=1 log Xi
P P
,
λ = Pnnα Xi
i=1
5.5. GENERALIZED ERLANG DISTRIBUTION 63

where ψ(.) denotes the digamma function. The first equation can be solved numerically∗ to get α̂
and then λ̂ = X̄α̂ . But λ̂ is biased, so the unbiased estimator with minimum variance of λ is
n

α̂n α̂
λ̄ =
α̂n − 1 X̄n

NEED REFERENCE for confidence interval

5.4.4 Random generation

Simulate a gamma G(α, λ) is quite tricky for non integer shape parameter. Indeed, if the shape
parameter α is integer, then we simply sum α exponential random variables E(λ). Otherwise we
need to add a gamma variable G(α−bαc, λ). This is carried out by an acceptance/rejection method.

NEED REFERENCE

5.4.5 Applications

NEED REFERENCE

5.5 Generalized Erlang distribution

5.5.1 Characterization

density function
As the gamma distribution is the distribution
of the sum of i.i.d. exponential distributions, Erlang(1,2,3)
Erlang(1,2,4)
the generalized Erlang distribution is the dis- Erlang(1,3,5)
Erlang(2,3,4)
tribution of the sum independent exponential
0.6

distributions. Sometimes it is called the hypo-

exponential distribution. The density is defined
as
0.4
f(x)

 
d d
X Y λ j  λi e−λi x ,
f (x) = 
λj − λi
i=1 j=1,j6=i
0.2

where x ≥ 0 and λj > 0’s† are the paremeters

(for each exponential distribution building the
0.0

generalized Erlang distribution). There is an

0 1 2 3 4 5
∗
algorithm can be initialized with α̃. x
†
with the constraint that all λj ’s are strictly different.
Figure 5.5: Density function for generalized Er-
lang distributions
64 CHAPTER 5. EXPONENTIAL DISTRIBUTION AND ITS EXTENSIONS

explicit form for the distribution function:

 
d d
X Y λj 
F (x) =  (1 − e−λi x ).
λj − λi
i=1 j=1,j6=i

This distribution is noted Erlang(λ1 , . . . , λd ).

Of course, we retrieve the Erlang distribution when ∀i, λi = λ.

Finally, the characteristic and moment generating functions of generalized Erlang distribution
are
d d
Y λj Y λj
φ(t) = and M (t) = .
λj − it λj − t
j=1 j=1

5.5.2 Properties

d
P 1
The expectation of the generalized Erlang distribution is simply E(X) = λi and its variance
i=1
d
P 1
V ar(X) = λ2i
.
i=1

5.5.3 Estimation

NEED REFERENCE

5.5.4 Random generation

The algorithm is very easy simulate independently d random variables exponentially E(λj ) dis-
tributed and sum them.

5.5.5 Applications

NEED REFERENCE

5.6 Chi-squared distribution

A special case of the gamma distribution is the chi-squared distribution. See section 6.1.
5.7. INVERSE GAMMA 65

5.7 Inverse Gamma

5.7.1 Characterization

The inverse gamma distribution is the distribu- density function

tion of a random variable X1 when X is gamma
distributed. Hence the density is InvG(3/2,1)
InvG(3/2,3/2)

1.5
InvG(1,3)

λα λ
f (x) = α+1
e− x ,
Γ(α)x

1.0
where x > 0 and β, α > 0. From this, we can
derive the distribution function

f(x)
γ(α, λx )
F (x) = .

0.5
Γ(α)

We can define inverse gamma distributions

0.0

with characteristic or moment generating func-

0.0 0.5 1.0 1.5 2.0 2.5 3.0
tions
x
√ α
2 −itλ √
φ(t) = Kα (2 −iλt) Figure 5.6: Density function for inverse gamma
Γ(α) distributions
and √ α
2 −itλ √
M (t) = Kα (2 −λt).
Γ(α)
where K. (.) denotes the modified Bessel function.

5.7.2 Properties

λ
The expectation exists only when α > 1 and in this case E(X) = α−1 , whereas the variance is only
λ2
finite if α > 2 and V ar(X) = (α−1)2 (α−2)
.

5.7.3 Estimation

Method of moments give the following estimators

(X̄n )2
α̃ = 2 + and λ̃ = X̄n (α̃ − 1)
Sn2

with X̄n and Sn2 the sample mean and variance. If the variance does not exist, then α will be 2, it
means we must use the maximum likelihood estimator (which works also for α ≤ 2).
66 CHAPTER 5. EXPONENTIAL DISTRIBUTION AND ITS EXTENSIONS

Maximum likelihood estimators of α, λ verify the system


 log α − ψ(α) = log( n1 ni=1
P 1 1 Pn 1
Xi ) − n i=1 log Xi
P −1 ,
 λ = α n1 ni=1 X1
i

where ψ(.) denotes the digamma function. The first equation can be solved numerically∗ to get α̂
and then λ̂ with the second equation.

5.7.4 Random generation

Simply generate a gamma variable G(α, 1/λ) and inverse it.

5.7.5 Applications

NEED REFERENCE

5.8 Transformed or generalized gamma

5.8.1 Characterization

The transformed gamma distribution is defined density function

by the following density function
1.0

TG(3,1/2,1)
TG(3,1/2,1/3)
x τ
−( λ TG(3,1/2,4/3)
τ ( λx )ατ −1 e )
f (x) = ,
0.8

λΓ(α)

where x > 0 and α, λ, τ > 0. Thus, the distri-

0.6

bution function is
f(x)

γ(α, ( λx )τ )
0.4

F (x) = .
Γ(α)
0.2

1
This is the distribution of the variable λX τ
when X is gamma distributed G(α, 1).
0.0

0 1 2 3 4 5
Obviously, a special case of the transformed
x
gamma is the gamma distribution with τ = 1.
But we get the Weibull distribution with α = 1. Figure 5.7: Density function for transformed
gamma distributions
∗
algorithm can be initialized with α̃.
5.8. TRANSFORMED OR GENERALIZED GAMMA 67

5.8.2 Properties

The expectation of the transformed gamma dis-

λΓ(α+ τ1 )
tribution is E(X) = Γ(α) and its variance
λ2 Γ(α+ τ2 )
V ar(X) = Γ(α) − E 2 [X].

From Venter (1983) moments are given by

Γ(α + τr )
E(X r ) = λr ,
Γ(α)
r
with α + τ > 0.

5.8.3 Estimation

Maximum likelihood estimators verify the following system

 n n
1 P 1 P
ψ(α) − log α = τ log X − log( Xiτ )

i


 n n

 i=1 i=1

 n
n n n −1
1 P τ 1 P τ 1 P τ 1 P
α= n Xi n Xi log Xi − n Xi n log Xi ,
 i=1 i=1 i=1 i=1
τ

n


1 P τ α−τ



 λ = n X i
i=1

where ψ denotes the digamma function. This system can be solved numerically.

TODO : use Gomes et al. (2008)

5.8.4 Random generation

1
Generate a gamma distributed variable (G(α, 1)), raise it to power τ and multiply it by λ.

5.8.5 Applications

In an actuarial context, the transformed gamma may be useful in loss severity, for example, in
workers’ compensation, see Venter (1983).
68 CHAPTER 5. EXPONENTIAL DISTRIBUTION AND ITS EXTENSIONS

5.9 Inverse transformed Gamma

5.9.1 Characterization

The transformed gamma distribution is defined density function

by the following density function
λ τ ITG(3,2,1)
τ ( λ )ατ e−( x ) ITG(3,2,1/2)

f (x) = x ITG(3,2,4/3)

3.0
,
xΓ(α)

2.5
where x > 0 and α, λ, τ > 0. Thus, the distri-
bution function is

2.0
γ(α, ( λx )τ )
F (x) = 1 − .

f(x)
Γ(α)

1.5
λ τ
1
This is the distribution of X when X is

1.0
gamma distributed G(α, 1).

0.5
0.0

5.9.2 Properties
0.0 0.5 1.0 1.5 2.0 2.5 3.0

x
The expectation of the transformed gamma dis-
λΓ(α− τ1 ) Figure 5.8: Density function for inverse trans-
tribution is E(X) = Γ(α) and its variance
formed gamma distributions
λ2 Γ(α− τ2 )
V ar(X) = Γ(α) − E 2 [X].

From Klugman et al. (2004), we have the

following formula for the moments
rλr Γ(α − τr )
E(X ) = .
Γ(α)

5.9.3 Estimation

NEED REFERENCE

5.9.4 Random generation

1
Simply simulate a gamma G(α, 1) distributed variable, inverse it, raise it to power α and multiply
it by λ.

5.9.5 Applications

NEED REFERENCE
5.10. LOG GAMMA 69

5.10 Log Gamma

5.10.1 Characterization

Density function for log-gamma distribution is expressed as

x−a
x−a
−e
ek b
b

f (x) =
Γ(k)

for x > 0, where a is the location parameter, b > 0 the scale parameter and k > 0 the shape
parameter. The distribution function is
x−a
γ(k, e b )
F (x) = ,
Γ(k)

for x > 0. This is the distribution of a + b log(X) when X is gamma G(k, 1).

5.10.2 Properties

The expectation is E(X) = a + bψ(k) and the variance V ar(X) = b2 ψ1 (k) where ψ is the digamma
function and ψ1 the trigamma function.

5.10.3 Estimation

NEED REFERENCE

5.10.4 Random generation

Simply simulate a gamma G(k, 1) distributed variable and returns a + b log(X).

5.10.5 Applications

NEED REFERENCE

5.11 Weibull distribution

5.11.1 Characterization
70 CHAPTER 5. EXPONENTIAL DISTRIBUTION AND ITS EXTENSIONS
density function

W(3,1)
Despite the fact the Weibull distribution is not W(3,2)
W(4,2)
particularly related to the chi distribution, its

1.0
W(4,3)

density tends exponentially fast to zero, as chi’s

related distribution. The density of a Weibull

0.8
distribution is given by

0.6
f(x)
β β−1 −( xη )β
f (x) = x e ,
ηβ

0.4
where x > 0 and η, β > 0. In terms of dis-

0.2
tribution function, the Weibull can be defined
as
−( x )η
F (x) = 1 − e β .

0.0
0 2 4 6 8 10

x
There exists a second parametrization of the
Weibull distribution. We have Figure 5.9: Density function for Weibull distribu-
tions
λ
f (x) = τ λxλ−1 e−τ x ,

with the same constraint on the parameters τ, λ > 0. In this context, the distribution function is
λ
F (x) = 1 − e−τ x .

We can pass from the first parametrization to the second one with
(
λ=β
.
τ = η1β

5.11.2 Properties

The expectation of a Weibull distribution W(η, β) is E(X) = ηΓ(1+ β1 ) and the variance V ar(X) =
τ (1+ τ1 )
η 2 [Γ( β+2 β+1 2
β ) − Γ( β ) ]. In the second parametrization, we have E(X) = 1 and V ar(X) =
λτ
1
2 (τ (1 + τ2 ) − τ (1 + τ1 )2 ).
λτ

The rth raw moment E(X r ) of the Weibull distribution W(η, β) is given by ηΓ(1 + βr ) for r > 0.

Xβ
The Weibull distribution is the distribution of the variable η where X follows an exponential
distribution E(1).

5.11.3 Estimation

We work in this sub-section with the first parametrization. From the cumulative distribution, we
have
log(− log |1 − F (x)|) = β log x − β log η.
5.12. INVERSE WEIBULL DISTRIBUTION 71

Thus we can an estimation of β and η by regressing log(− log | ni |) on log Xi:n . Then we get the
following estimators
b̂
β̃ = â and η̃ = e− â ,

where â and b̂ are respectively the slope and the intercept of the regression line.

The maximum likelihood estimators verify the following system

( Pn
− nβ
η +
β
η β+1 i=1 (xi )β = 0
n P n Pn xi β ,
β − n ln(η) + i=1 ln(xi ) − i=1 ln(xi )( η ) =0

which can be solved numerically (with algorithm initialized by the previous estimators).

5.11.4 Random generation

1
Using the inversion function method, we simply need to compute β(− log(1 − U )) η for the first
1
parametrization or − log(1−U
τ
) λ
for the second one where U is an uniform variate.

5.11.5 Applications

The Weibull was created by Weibull when he studied machine reliability.

NEED REFERENCE

5.12 Inverse Weibull distribution

5.12.1 Characterization
density function

The inverse Weibull distribution is defined as InvW(3,1)

InvW(3,2)
InvW(4,2)
4

β η
ηβ η e−( x ) InvW(4,3)

f (x) = ,
xη+1
3

where x > 0 and η, β > 0. Its distribution

function is
f(x)

β η
F (x) = e−( x ) .
2

This is the distribution of 1/X when X is

Weibull distributed W(β −1 , η).

0.0 0.5 1.0 1.5 2.0

x
72 CHAPTER 5. EXPONENTIAL DISTRIBUTION AND ITS EXTENSIONS

5.12.2 Properties

The expectation is given by ηΓ(1 − β1 ) and the

variance η 2 [Γ( β−2 β−1 2
β ) − Γ( β ) ].

The rth moment of the Weibull distribution

IW(η, β) is given by η r Γ(1 − βr ) for r > 0.

5.12.3 Estimation

Maximum likelihood estimators for β and η ver-

ify the following system
 η−1
 1 = 1 Pn β
β n i=1 Xi
η ,
 1 + log(β) = 1 Pn β
log β 1 Pn
η n i=1 Xi Xi + n i=1 log(Xi )

while the method of moment has the following system


 (Sn2 + (X̄n )2 )Γ2 (1 − β1 ) = (X̄n )2 Γ(1 − β2 )
X̄n .
 η= Γ(1− β1 )

Both are to solve numerically.

5.12.4 Random generation

Simply generate a Weibull variable W(β −1 , η) and inverse it.

5.12.5 Applications

NEED REFERENCE

TODO Carrasco et al. (2008)

5.13 Laplace or double exponential distribution

5.13.1 Characterization

Density for the Laplace distribution is given by density function

1 − |x−m|
0.5

f (x) = e σ , L(0,1)
2σ 2 L(0,1)
L(0,3)
0.4
.3
5.13. LAPLACE OR DOUBLE EXPONENTIAL DISTRIBUTION 73

for x ∈ R, m the location parameter and σ >

0 the scale parameter. We have the following
distribution function
(
1 − m−x
2e if x < m
σ
F (x) = 1 − x−m
.
1 − 2e σ otherwise

There exists a moment generating function for

this distribution, which is

emt
M (t) = ,
1 − σ 2 t2
for |t| < σ1 . The characteristic function is ex-
pressed as
eimt
φ(t) = ,
1 + σ 2 t2
for t ∈ R.

5.13.2 Properties

The expectation for the Laplace distribution is given by E(X) = m while the variance is V ar(X) =
2σ 2 .

5.13.3 Estimation

Maximum likelihood estimators for m and σ are

( X n +X n+2
2 :n :n
m̂ = 2
2
if n is even ,
X
bn
2
c:n otherwise

where Xk:n denotes the kth order statistics and

n
1X
σ̂ = |Xi − m̂|.
n
i=1

5.13.4 Random generation

Let U be a uniform variate. Then the algorithm is

• V = U − 1/2

• X = m + σsign(V ) log(1 − 2|V |)

• return X
74 CHAPTER 5. EXPONENTIAL DISTRIBUTION AND ITS EXTENSIONS

5.13.5 Applications

NEED

The Double Exponential Distribution: Using Calculus to Find a Maximum Likelihood Estimator
Robert M. Norton The American Statistician, Vol. 38, No. 2 (May, 1984), pp. 135-136
Chapter 6

Chi-squared’s ditribution and related

extensions

6.1 Chi-squared distribution

6.1.1 Characterization

There are many ways to define the chi-squared density function

distribution. First, we can say a chi-squared
0.5

Chisq(2)
distribution is the distribution of the sum Chisq(3)
Chisq(4)
Chisq(5)
k
0.4

X
Xi2 ,
i=1
0.3

where (Xi )i are i.i.d. normally distributed

f(x)

N (0, 1) and a given k. In this context, k is

0.2

assumed to be an integer.

We can also define the chi-squared distribu-

0.1

tion by its density, which is

0.0

k
x 2 −1 x
f (x) = k e− 2 , 0 2 4 6 8 10
Γ( k2 )2 2
x

where k is the so-called degrees of freedom and Figure 6.1: Density function for chi-squared dis-
x ≥ 0. One can notice that is the density of a tributions
gamma distribution G( k2 , 12 ), so k is not neces-
sarily an integer. Thus the distribution function
can be expressed with the incomplete gamma
function
γ( k , x )
F (x) = 2 k 2 .
Γ( 2 )

75
76 CHAPTER 6. CHI-SQUARED’S DITRIBUTION AND RELATED EXTENSIONS

Thirdly, the chi-squared distribution can be defined in terms of its moment generating function
k
M (t) = (1 − 2t)− 2 ,

or its characteristic function

k
φ(t) = (1 − 2it)− 2 .

6.1.2 Properties

The expectation and the variance of the chi-squared distribution are simply E(X) = k and
V ar(X) = 2k. Raw moments are given by
r k
r 1 Γ( 2 + r)
E(X ) = .
2 Γ( k2 )

6.1.3 Estimation

Same as gamma distribution ??

6.1.4 Random generation

For an integer k, just sum the square of k normal variable. Otherwise use the algorithm for the
gamma distribution.

6.1.5 Applications

The chi-squared distribution is widely used for inference, typically as pivotal function.
6.2. CHI DISTRIBUTION 77

6.2 Chi distribution

6.2.1 Characterization

This is the distribution of the sum density function

0.6
u k Chi(2)
Chi(3)
uX
t Xi2 , Chi(4)
Chi(5)

0.5
i=1

where (Xi )i are i.i.d. normally distributed

0.4
N (0, 1) and a given k. This is equivalent as the
distribution of a square root of a chi-squared

f(x)

0.3
distribution (hence the name).

0.2
The density function has a closed form

0.1
xk−1 2
− x2
f (x) = k e ,
−1 k
2 2 Γ
0.0

2
0 1 2 3 4
where x > 0. The distribution function can
x
be expressed in terms of the gamma incomplete
function Figure 6.2: Density function for chi distributions
k x2
γ( , )
F (x) = 2 k 2 ,
Γ 2
for x > 0.

Characteristic function and moment generating function exist and are expressed by

√ Γ k+1

k 1 −t2

2
φ(t) = 1 F1 , , + it 2
Γ k2

2 2 2

and
√ Γ k+1

k 1 t2

2
M (t) = 1 F1 , , +t 2 .
2 2 2 Γ k2

6.2.2 Properties

√
2Γ( k+1 )
The expectation and the variance of a chi distribution are given by E(X) = 2
Γ( k2 )
and V ar(X) =
k − E 2 (X). Other moments are given by

r Γ( k+r
2 )
E(X r ) = 2 2 ,
Γ( k2 )

for k + r > 0.
78 CHAPTER 6. CHI-SQUARED’S DITRIBUTION AND RELATED EXTENSIONS

6.2.3 Estimation

The maximum likelihood estimator of k satisfies the following equation

n
1 k log(2) 1X
ψ + = log(Xi ),
2 2 2 n
i=1

where ψ denotes the digamma function. This equation can be solved on the positive real line or
just the set of positive integers.

6.2.4 Random generation

Take the square root of a chi-squared random variable.

6.2.5 Applications

NEED REFERENCE

6.3 Non central chi-squared distribution

6.3.1 Characterization

The non central chi-squared distribution is the density function

distribution of the sum
Chisq(2)
0.5

k Chisq(2,1)
X Chisq(4)
Xi2 , Chisq(4,1)
0.4

i=1

where (Xi )i are independent normally dis-

0.3

tributed N (µi , 1), i.e. non centered normal ran-

f(x)

dom variable. We generally define the non cen-

tral chi-squared distribution by the density
0.2

1 x k−2 x+λ
√
e− 2 I k −1
4
f (x) = λx ,
0.1

2 λ 2

for x > 0, k ≥ 2 the degree of freedom, λ the

0.0

non central parameter and Iλ the Bessel’s mod-

0 2 4 6 8 10
ified function. λ is related to the previous sum
x
by
X k Figure 6.3: Density function for non central chi-
λ= µ2i . squared distributions
i=1
6.3. NON CENTRAL CHI-SQUARED DISTRIBUTION 79

The distribution function can be expressed

in terms of a serie
+∞ λ j k x
−λ ( ) γ(j +
2, 2)
X
2
F (x) = e 2 ,
j=0
j!Γ(j + k2 )

for x > 0 where γ(., .) denotes the incomplete gamma function.

Moment generating function for the non central chi-squared distribution exists
λt
e 1−2t
M (t) = k
(1 − 2t) 2

and the characteristic function

λit
e 1−2it
φ(t) = k ,
(1 − 2it) 2
from which we see it is a convolution of a gamma distribution and a compound Poisson distribution.

6.3.2 Properties

Moments for the non central chi-squared distribution are given by

n−1
X (n − 1)!2j−1
E(X n ) = 2n−1 (n − 1)!(k + nλ) + (k + jλ)E(X n−j ),
(n − j)!
j=1

where the first raw moment is

E(X) = k + λ.
The variance is V ar(X) = 2(k + 2λ).

6.3.3 Estimation

Li & Yu (2008) and Saxena & Alam (1982)

6.3.4 Random generation

For integer k degrees q

of freedom, we can use the definition of the sum, i.e. sum k idependent normal
λ
random variables N ( k , 1).

6.3.5 Applications

NEED REFERENCE
80 CHAPTER 6. CHI-SQUARED’S DITRIBUTION AND RELATED EXTENSIONS

6.4 Non central chi distribution

6.4.1 Characterization

This is the distribution of the sum v

u k
uX
t X 2, i
i=1

where (Xi )i are i.i.d. normally distributed N (µi , 1) and a given k. This is equivalent as the
distribution of a square root of a non central chi-squared distribution (hence the name).

We generally define the non central chi distribution by

λxk x2 +λ2
f (x) = k e− 2 I k −1 (λx),
2
(λx) 2

where x > 0 and I. (.) denotes the modified Bessel’s function. The distribution function can be
expressed in terms of the gamma incomplete function

F (x) =??,

for x > 0.

6.4.2 Properties

The expectation and the variance of a chi distribution are given by

π (k/2−1) −λ2
r
E(X) = L
2 1/2 2
and
V ar(X) = k + λ2 − E 2 (X),
where L(.)
. denotes the generalized Laguerre polynomials. Other moments are given by

E(X r ) =??,

for k + r > 0.

6.4.3 Estimation

NEED REFERENCE

6.4.4 Random generation

NEED REFERENCE
6.5. INVERSE CHI-SQUARED DISTRIBUTION 81

6.4.5 Applications

NEED REFERENCE

6.5 Inverse chi-squared distribution

The inverse chi-squared distribution is simply density function

the distribution of X1 when X is chi-squared
distributed. We can also define the chi-squared InvChisq(2)
InvChisq(3)
distribution by its density, which is InvChisq(4)

2.5
InvChisq(2.5)

k
2− 2 − k−2 − 1

2.0
f (x) = x 2 e 2x ,
Γ( k2 )

1.5
f(x)
where k is the so-called degrees of freedom and
x ≥ 0. Thus the distribution function can be
1.0

expressed with the incomplete gamma function

0.5

Γ( k2 , 2x
1
)
F (x) = ,
Γ( k2 )
0.0

0.0 0.2 0.4 0.6 0.8 1.0

where Γ(., .) the upper incomplete gamma func-
x
tion.
Figure 6.4: Density function for inverse chi-
Thirdly, the chi-squared distribution can be squared distributions
defined in terms of its moment generating func-
tion
k
√

2 −t 4
M (t) = Kk −2t ,
Γ( k2 ) 2 2

or its characteristic function

k √
2 −it 4
φ(t) = Kk −2it .
Γ( k2 ) 2 2

6.5.1 Properties

1
The expectation and the variance of the chi-squared distribution are simply E(X) = k−2 if k > 2
and V ar(X) = (k−2)22 (k−4) . Raw moments are given by

E(X r ) =??
82 CHAPTER 6. CHI-SQUARED’S DITRIBUTION AND RELATED EXTENSIONS

6.5.2 Estimation

Maximum likelihood estimator for k verifies the equation

n
k 1X
ψ = − log(2) − log(xi ),
2 n
i=1

where ψ denotes the digamma function.

6.5.3 Random generation

Simply inverse a chi-squared random variable

6.5.4 Applications

NEED REFERENCE

6.6 Scaled inverse chi-squared distribution

6.6.1 Characterization

TODO

6.6.2 Properties

TODO

6.6.3 Estimation

TODO

6.6.4 Random generation

TODO
6.6. SCALED INVERSE CHI-SQUARED DISTRIBUTION 83

6.6.5 Applications

TODO
Chapter 7

Student and related distributions

7.1 Student t distribution

Intro?

7.1.1 Characterization

There are many ways to define the student dis- density function
tribution. One can say that it is the distribution
0.4

T(1)
of √ T(2)
T(3)
dN T(4)
,
C
0.3

where N is a standard normal variable indepen-

dent of C a chi-squared variable with d degrees
of freedom. We can derive the following density
f(x)

0.2

function
− d+1
Γ( d+1 ) x2
2
0.1

f (x) = √ 2 d 1+ ,
πdΓ( 2 ) d

for x ∈ R. d is not necessarily an integer, it

0.0

could be a real but greater than 1. -4 -2 0 2 4

x
The distribution function of the student t
Figure 7.1: Density function for student distribu-
distribution is given by
tions

1 d+1 3 x2
d + 1 2 F1 2 , 2 ; 2 ; − d

1
F (x) = + xΓ √ ,
2 2 πν Γ( d2 )

where 2 F1 denotes the hypergeometric function.

84
7.1. STUDENT T DISTRIBUTION 85

7.1.2 Properties

The expectation of a student distribution is E(X) = 0 if d > 1, infinite otherwise. And the variance
d
is given by V ar(X) = d−2 if d > 2.

Moments are given by

r/2
r
Y 2i − 1 r/2
E(X ) = ν ,
ν − 2i
i=1

where r is an even integer.

7.1.3 Estimation

Maximum likelihood estimator for d can be found by solving numerically this equation
n n
Xi2 d + 1 X (Xi /d)2

d+1 d 1X
ψ −ψ = log 1 + − ,
2 2 n
i=1
d n 1 + Xi2 /d
i=1

where ψ denotes the digamma function.

7.1.4 Random generation

The algorithm is simply

• generate a standard normal distribution N

• generate a chi-squared distribution C

√
• return √dN .
C

7.1.5 Applications

The main application of the student is when dealing with a normally distributed sample, the
derivation of the confidence interval for the standard deviation use the student distribution. Indeed
for a normally distributed N (m, σ 2 ) sample of size n we have that

X̄n − m √
p n
Sn2

follows a student n distribution.

86 CHAPTER 7. STUDENT AND RELATED DISTRIBUTIONS

7.2 Cauchy distribution

7.2.1 Characterization

7.2.2 Characterization

The Cauchy distribution is a special case of the density function

Student distribution when a degree of freedom
Cauchy(0,1)
of 1. Therefore the density function is

0.6
Cauchy(1,1)
Cauchy(1,1/2)
1 Cauchy(1,2)
f (x) = ,

0.5
π(1 + x2 )
where x ∈ R. Its distribution function is

0.4
1 1
F (x) = arctan(x) + .
π 2
f(x)

0.3
There exists a scaled and shifted version of
0.2
the Cauchy distribution coming from the scaled
and shifted version of the student distribution.
0.1

The density is
γ2
0.0

f (x) = ,
π [γ 2 + (x − δ)2 ] -4 -2 0 2 4

x
while its distribution function is
Figure 7.2: Density function for Cauchy distribu-

1 x−δ 1
F (x) = arctan + . tions
π γ 2
Even if there is no moment generating function, the Cauchy distribution has a characteristic function
φ(t) = exp(δ i t − γ |t|).

7.2.3 Properties

The Cauchy distribution C(δ, γ) has the horrible feature not to have any finite moments. How-
ever, the Cauchy distribution belongs to the family of stable distribution, thus a sum of Cauchy
distribution is still a Cauchy distribution.

7.2.4 Estimation

Maximum likelihood estimators verify the following system

n

1 1 P γ
 γ =n

γ 2 +(Xi −δ)2

i=1
n n .
P Xi P δ
=


 γ 2 +(Xi −δ)2 γ 2 +(Xi −δ)2
i=1 i=1
There is no moment based estimators.
7.3. FISHER-SNEDECOR DISTRIBUTION 87

7.2.5 Random generation

Since the quantile function is F −1 (u) = δ + γ tan((u − 1/2)π), we can use the inversion function
method.

7.2.6 Applications

NEED REFERENCE

7.3 Fisher-Snedecor distribution

7.3.1 Characterization

TODO

7.3.2 Properties

TODO

7.3.3 Estimation

TODO

7.3.4 Random generation

TODO

7.3.5 Applications

TODO
Chapter 8

Pareto family

8.1 Pareto distribution

name??

8.1.1 Characterization

The Pareto is widely used by statistician across density function

the world, but many parametrizations of the
1.5

P1(1,1)
Pareto distribution are used. Typically two P1(2,1)
P1(2,2)
different generalized Pareto distribution are P1(2,3)
used in extrem value theory with the work of
Pickands et al. and in loss models by Klugman
1.0

et al. To have a clear view on Pareto distribu-

tions, we use the work of Arnold (1983). Most
f(x)

of the time, Pareto distributions are defined in

terms of their survival function F̄ , thus we omit
0.5

the distribution function. In the following, we

will define Pareto type I, II, III and IV plus the
Pareto-Feller distributions.
0.0

Pareto I 1 2 3 4 5

Figure 8.1: Density function for Pareto I distri-

The Pareto type I distribution PaI (σ, α) is de-
butions
fined by the following survival function
x −α
F̄ (x) = ,
σ
where x > σ and α > 0. Therefore, its density is
α x −α−1
f (x) = ,
σ σ

88
8.1. PARETO DISTRIBUTION 89

still for x > σ. α is the positive slope parameter∗ (sometimes called the Pareto’s index) and σ is
the scale parameter. Pareto type I distribution is sometimes called the classical Pareto distribution
or the European Pareto distribution.

Pareto II
density function

The Pareto type II distribution PaII (µ, σ, α) is

2.0
P2(2,1)
characterized by this survival function P2(2,2)
P2(2,3)
P2(3,2)
x − µ −α

F̄ (x) = 1 + ,

1.5
σ
where x > µ and σ, α > 0. Again α is the shape
parameter, while µ is the location parameter.

f(x)

1.0
We can derive the density from this definition:
x − µ −α−1

α
f (x) = 1+ ,
σ σ
for x > µ. We retrieve the Pareto I distribu- 0.5

tion with µ = σ, i.e. if X follows a Pareto I

0.0

distribution then µ − σ + X follows a Pareto II

distribution. The Pareto II is sometimes called 0 1 2 3 4 5

the American Pareto distribution. x

Figure 8.2: Density function for Pareto II distri-

butions
Pareto III

A similar distribution to the type II distribution density function

is the Pareto type III PaIII (µ, σ, γ) distribution
P3(0,1,1)
defined as P3(1,1,1)
P3(1,2,1)
1 !−1
2.5

P3(1,1,3/2)

x−µ γ
F̄ (x) = 1 + ,
σ
2.0

where x > µ, γ, σ > 0. The γ parameter is

called the index of inequality, and in the special
1.5
f(x)

case of µ = 0, it is the Gini index of inequality.

The density function is given by
1.0

1 1 !−2
1 x − µ γ −1

x−µ γ
f (x) = 1+ ,
0.5

γσ σ σ
where x > µ. The Pareto III is not a general-
0.0

isation of the Pareto II distribution, but from 0 1 2 3 4 5

these two distribution we can derive more gen- x
eral models. It can be seen as the following
Figure 8.3: Density function for Pareto III distri-
transformation µ + σZ γ , where Z is a Pareto II
butions
PaII (0, 1, 1).
∗
the slope of the Pareto chart log F̄ (x) vs. log x, controlling the shape of the distribution.
90 CHAPTER 8. PARETO FAMILY

Pareto IV

The Pareto type IV PaIV (µ, σ, γ, α) distribu- density function

tion is defined by
P4(0,1,1,1)
1 !−α P4(,0,2,1,1)
x−µ γ P4(0,1,3/2,1)

2.5
F̄ (x) = 1 + , P4(0,1,1,2)
σ

2.0
where x > µ and α, σ, γ > 0. The associated
density function is expressed as follows

1.5
1 1 !−α−1

f(x)
α x − µ γ −1

x−µ γ
f (x) = 1+
γσ σ σ

1.0
for x > µ.

0.5
Quantile functions for Pareto distributions
are listed in sub-section random generation.

0.0
0 1 2 3 4 5

The generalized Pareto used in extreme x

value theory due to Pickands (1975) has a lim- Figure 8.4: Density function for Pareto IV distri-
iting distribution with Pareto II PaII (0, σ, α), butions
see chapter on EVT for details. Finally, the
Feller-Pareto is a generalisation of the Pareto
IV distribution, cf. next section.

8.1.2 Properties

Equivalence

It is easy to verify that if X follows a Pareto I distribution PaI (σ, α), then log X follows a translated
exponential distribution T E(σ, α?).

The Pareto type III distribution is sometimes called the log-logistic distribution, since if X has
a logistic distribution then eX has a Pareto type III distribution with µ = 0.

Moments

ασ ασ 2
Moments for the Pareto I distribution are given by E(X) = α−1 if α > 1, V ar(X) = (α−1)2 (α−2)
α
and E(X τ ) = τ α−τ for α > τ and σ = 1.

Moments for the Pareto II, III can be derived from those of Pareto IV distribution, which are
Γ(1 + τ γ)Γ(α − τ γ)
E(X τ ) = σ τ ,
Γ(α)
with −1 < τ γ < α and µ = 0.
8.1. PARETO DISTRIBUTION 91

Convolution and sum

The convolution (i.e. sum) of Pareto I distributions does not have any particular form but the
product of Pareto I distributions does have a analytical form.

If we consider of n i.i.d. Pareto I PaI (σ, α) random variables, then the product Π has the
following density
n−1 x −α
α σ log( x−σ
σ ) σ
fΠ (x) = ,
xΓ(n)
where x > σ.

If we consider only independent Pareto I distribution PaI (σi , αi ), then we have for the density
of the product
n
X αi x −αi −1 Y αk
fΠ (x) = ,
σ σ αi − αk
i=1 k6=i

where x > ni=1 σi .

Other Pareto distributions??

Order statistics

Let (Xi )i be a sample of Pareto distributions. We denote by (Xi:n )i the associated order statistics,
i.e. X1:n is the minimum and Xn:n the maximum.

For Pareto I distribution, the ith order statistic has the following survival function
i i
X x −α(n−j+1) Y n − l + 1
F̄Xi:n (x) = 1+ ,
σ l−j
j=1 l=1
l6=i

where x > 0. Furthermore moments are given by

τ n! Γ(n − i + 1 − τ α−1 )
E(Xi:n ) = στ ,
(n − i)! Γ(n + 1 − τ α−1 )
for τ ∈ R.

For Pareto II distribution, we get

i i
x − µ −α(n−j+1) Y n − l + 1
X
F̄Xi:n (x) = 1+ ,
σ l−j
j=1 l=1
l6=i

where x > µ. Moments can be derived from those in the case of the Pareto I distribution using the
fact Xi:n = µ − σ + Yi:n with Yi:n order statistic for the Pareto I case.

For Pareto III distribution, the ith order statistic follows a Feller-Pareto FPa(µ, σ, γ, i, n−i+1).
Moments of order statistics can be obtained by using the transformation of Pareto II random
92 CHAPTER 8. PARETO FAMILY

γ
variable: we have Xi:n = µ + σZi:n follows a Pareto III distribution, where Z is a Pareto II
PaII (0, 1, 1). Furthermore, we know the moments of the random variable Z:

τ Γ(i + τ )Γ(n − i + τ + 1)
E(Zi:n )=
Γ(i)Γ(n − i + 1)

The minimum of Pareto IV distributions still follows a Pareto IV distribution. Indeed if we

consider n independent random variables Pareto IV PaIV (µ, σ, γ, αi ) distributed, we have

n
!
X
min(X1 , . . . , Xn ) ∼ PaIV µ, σ, γ, αi .
i=1

But the ith order statistic does not have a particular distribution. The intermediate order statistic
can be approximated by the normal distibution with

Xi:n −→ N F −1 (i/n) , i/n (1 − i/n) f −2 F −1 (i/n) n−1

n→+∞

where f and F denotes respectively the density and the distribution function of the Pareto IV
distribution. Moments for the order statistics are computable from the moments of the minima
since we have
Xn
τ n−i
E(Xi:n ) = (−1)r−n+i−1 Cnr Cr−1 τ
E(X1:r ).
r=n−i+1

Since X1:r still follows a Pareto IV distribution PIV (µ, σ, γ, rα), we have

τ
E(X1:r ) = E((µ + σZ1:r )τ ),

τ )= Γ(1+τ γ)Γ(rα−τ γ)
where Z1:r ∼ PaIV (0, 1, γ, rα) and E(Z1:r Γ(rα) .

Truncation

Let us denote by X|X > x0 the random variable X knowing that X > x0 . We have the following
properties (with x0 > µ):

• if X ∼ PaI (σ, α) then X|X > x0 ∼ PaI (x0 , α)∗

• if X ∼ PaII (µ, σ, α) then X|X > x0 ∼ PaI (x0 , σ + x0 − µ, α)

More general distributions do not have any particular form.

∗
In this case, the truncation is a rescaling. It comes from the lack of memory property of the log variable since
the log variable follows an exponential distribution.
8.1. PARETO DISTRIBUTION 93

Record values

Geometric minimization

8.1.3 Estimation

Estimation of the Pareto distribution in the context of actuarial science can be found in Rytgaard
(1990).

Pareto I

Arnold (1983) notices that from a log transformation, the parameter estimation reduces to a prob-
lem for a translated exponentiallly distributed data. From this, we have the following maximum
likelihood estimator for the Pareto I distribution

• α̂n = X1:n ,
h P i−1
• σ̂n = n1 ni=1 log XX1:n
i
,

where (Xi )1≤i≤n denotes a sample of i.i.d. Pareto variables. Those estimators are strongly consis-
tent estimator of α and σ. Let us note that for these estimator we have better than the asymptotic
normality (due to the maximum likelihoodness). The distributions for these two estimators are
respectively Pareto I and Gamma distribution:

• α̂n ∼ PI (σ, nα),

• σ̂n−1 ∼ G(n − 1, (αn)−1 ).

From this, we can see these estimators are biased, but we can derive unbiased estimators with
minimum variance:

• α̃n = n−2
n α̂n ,
h i
• σ̃n = 1 − 1
α̂n σ̂n .

Since those statistics α̃n and σ̃n are sufficient, it is easy to find unbiased estimators of functions of
these parameters h(α, σ) by plugging in α̃n and σ̃n (i.e. h(α̃n , σ̃n )).

However other estimations are possible, for instance we may use a least square regression on the
Pareto chart (plot of log F̄ (x) against log x). We can also estimate parameters by the method of
moments by equalling the sample mean and minimum to corresponding theoretical moments. We
get
94 CHAPTER 8. PARETO FAMILY

nX̄n −X1:n
• α̂nM = n(X̄n −X1:n )
,

nα̂M
n −1
• σ̂nM = nα̂M
X1:n ,
n

where we assume a finite expectation (i.e. α > 1).

Finally, we may also calibrate a Pareto I distribution with a quantile method. We numerically
solve the system  α
 p1 = 1 − Xbnp1 c:n
X σ α ,
 p2 = 1 − bnp2 c:n
σ

for two given probabilities p1 , p2 .

Pareto II-III-IV

Estimation of parameters for Pareto II, III and IV are more difficult. If we write the log-likelihood
for a sample (Xi )1≤i≤n Pareto IV distributed, we have
X n n 1 !
1 xi − µ X xi − µ γ
log L(µ, σ, γ, α) = −1 log −(α+1) log 1 + −n log γ−n log σ+n log α,
γ σ σ
i=1 i=1

with the constraint that ∀1 ≤ i ≤ n, xi > µ. Since the log-likelihood is null when x1:n ≤ µ and a
decreasing function of µ otherwise the maximum likelihood estimator of µ is the minimum µ̂ = X1:n .

Then if we substract µ̂ to all observations, we get the following the log-likelihood

X n n x 1
1 x
i
X i γ
log L(σ, γ, α) = −1 log − (α + 1) log 1 + − n log γ − n log σ + n log α,
γ σ σ
i=1 i=1

which can be maximised numerically. Since there are no close form for estimators of σ, γ, α, we do
not know their distributions, but they are asymptotically normal.

We may also use the method of moments, where again µ̂ is X1:n . Substracting this value to all
observations, we use the expression of moments above to have three equations. Finally solve the
system numerically. A similar scheme can be used to estimate parameters with quantiles.

8.1.4 Random generation

It is very easy to generate Pareto random variate using the inverse function method. Quantiles
function can be easily calculated

−1
• for PI (σ, α) distribution, F −1 (u) = σ(1 − u) α ,
h −1
i
• for PII (µ, σ, α) distribution, F −1 (u) = σ (1 − u) α − 1 + µ,
8.1. PARETO DISTRIBUTION 95

γ
• for PIII (µ, σ, γ) distribution, F −1 (u) = σ (1 − u)−1 − 1 + µ,

h −1
iγ
• for PIV (µ, σ, α) distribution, F −1 (u) = σ (1 − u) α − 1 + µ.

Therefore algorithms for random generation are simply

−1
• for PI (σ, α) distribution, F −1 (u) = σU α,
h −1 i
• for PII (µ, σ, α) distribution, F −1 (u) = σ U α − 1 + µ,
γ
• for PIII (µ, σ, γ) distribution, F −1 (u) = σ U −1 − 1 + µ,

h −1 iγ
• for PIV (µ, σ, α) distribution, F −1 (u) = σ U α − 1 + µ,

where U is an uniform random variate.

8.1.5 Applications

From wikipedia, we get the following possible applications of the Pareto distributions:

• the sizes of human settlements (few cities, many hamlets/villages),

• file size distribution of Internet traffic which uses the TCP protocol (many smaller files, few
larger ones),

• clusters of Bose-Einstein condensate near absolute zero,

• the values of oil reserves in oil fields (a few large fields, many small fields),

• the length distribution in jobs assigned supercomputers (a few large ones, many small ones),

• the standardized price returns on individual stocks,

• sizes of sand particles,

• sizes of meteorites,

• numbers of species per genus (There is subjectivity involved: The tendency to divide a genus
into two or more increases with the number of species in it),

• areas burnt in forest fires,

• severity of large casualty losses for certain lines of business such as general liability, commercial
auto, and workers compensation.

In the litterature, Arnold (1983) uses the Pareto distribution to model the income of an individual
and Froot & O’Connell (2008) apply the Pareto distribution as the severity distribution in a context
of catastrophe reinsurance. Here are just a few applications, many other applications can be listed.
96 CHAPTER 8. PARETO FAMILY

8.2 Feller-Pareto distribution

8.2.1 Characterization

As described in Arnold (1983), the Feller-Pareto distribution is the distribution of

γ
U
X =µ+σ ,
V

where U and V are independent gamma variables (G(δ1 , 1) and G(δ2 , 1) respectively). Let us note
that the ratio of these two variables follows a beta distribution of the second kind. In term of
distribution function, using the transformation of the beta variable, we get

y 1
β δ1 , δ2 , 1+y
x−µ γ
F (x) = with y = ,
β(δ1 , δ2 ) σ

with x ≥ µ, β(., .) denotes the beta function and β(., ., .) the incomplete beta function.

We have the following density for the Feller-Pareto distribution FP(µ, σ, γ, δ1 , δ2 ) :

δ2
−1
( x−µ
σ )
γ
f (x) = 1 ,
γβ(δ1 , δ2 )x(1 + ( x−µ γ δ1 +δ2
σ ) )

x−µ
where x ≥ µ. Let y be σ , the previous expression can be rewritten as

1 !δ2 1 !δ1
1 yγ yγ 1
f (x) = 1 1− 1 ,
γβ(δ1 , δ2 ) 1+y γ 1+y γ xy

for x ≥ µ. In this expression, we see more clearly the link with the beta distribution as well as the
transformation of the variable VU .

There is a lot of special cases to the Feller-Pareto distribution FP(µ, σ, γ, δ1 , δ2 ). When µ = 0,

we retrieve the transformed beta distribution∗ of Klugman et al. (2004) and if in addition γ = 1,
we get the “generalized” Pareto distribution† (as defined by Klugman et al. (2004)).

Finally the Pareto IV distribution is obtained with δ1 = 1. Therefore we have the following
equivalences

• PI (σ, α) = FP(σ, σ, 1, 1, α),

• PII (µ, σ, α) = FP(µ, σ, 1, 1, α),

• PIII (µ, σ, γ) = FP(µ, σ, γ, 1, 1),

• PIV (µ, σ, γ, α) = FP(µ, σ, γ, 1, α).

∗
sometimes called the generalized beta distribution of the second kind.
†
which has nothing to do with the generalized Pareto distribution of the extreme value theory.
8.2. FELLER-PARETO DISTRIBUTION 97

8.2.2 Properties

When µ = 0, raw moments are given by

Γ(δ1 + rγ)Γ(δ2 − rγ)

E(X r ) = σ r ,
Γ(δ1 )Γ(δ2 )

for − δγ1 ≤ r ≤ δ2
γ .

8.2.3 Estimation

NEED REFERENCE

8.2.4 Random generation

Once we have simulated a beta I distribution B, we get a beta II distribution∗ with B̃ = 1−B
B
.
γ
Finally we shift, scale and take the power X = µ + σ B̃ to get a Feller-Pareto random variable.

8.2.5 Applications

NEED REFERENCE

∗
We can also use two gamma variables to get the beta II variable.
98 CHAPTER 8. PARETO FAMILY

8.3 Inverse Pareto

8.3.1 Characterization
density function

From the Feller-Pareto distribution, we get the

1.0
InvP(1,1)
inverse Pareto distribution with µ = 0, δ1 = 1 InvP(2,1)
InvP(2,2)
and γ = 1. Thus the density is InvP(1,2)

0.8
x δ2
1 σ 1 1
f (x) = x x x ,
β(1, δ2 ) 1+ 1+

0.6
σ σ σ

f(x)
It can be rewritten as the density

0.4
τ λxτ −1
f (x) =
(x + λ)τ +1

0.2
which implies the following distribution func-
tion 0.0
τ
x 0.0 0.5 1.0 1.5 2.0 2.5 3.0
F (x) = , x
x+λ
Figure 8.5: Density function for inverse Pareto
for x ≥ 0. Let us note this is the distribution
distributions
of X1 when X is Pareto II.

8.3.2 Properties

λΓ(τ +1)
The expectation of the inverse Pareto distribution is E(X) = Γ(τ ) , but the variance does not
exist.

8.3.3 Estimation

NEED REFERENCE

8.3.4 Random generation

Simply inverse a Pareto II variable.

8.3.5 Applications

NEED REFERENCE
8.4. GENERALIZED PARETO DISTRIBUTION 99

8.4 Generalized Pareto distribution

8.4.1 Characterization
density function

GPD(0)
The generalized Pareto distribution was intro- GPD(1/2)
GPD(1)
duced in Embrechts et al. (1997) in the context GPD(2)

2.0
GPD(3)
of extreme value theory. GPD(-1/3)
GPD(-2/3)
GPD(-1)
GPD(-5/4)

1.5
We first define the standard generalized
Pareto distribution by the following distribu-

f(x)
tion function

1.0
(
− 1ξ
F (x) = 1 − (1 + ξx) if ξ 6= 0 ,

0.5
1 − e−x if ξ = 0

h i
0.0
where x ∈ R+ if ξ ≥ 0 and x ∈ 0, − 1ξ oth-
0.0 0.5 1.0 1.5 2.0 2.5 3.0
erwise. This distribution function is generally
x
denoted by Gξ .
Figure 8.6: Density function for standard gener-
We can see the impact of the shape parame- alized Pareto distributions
ter ξ on the figure on the right. The case where
ξ = 0 can be seen as a limiting case of Gξ when
ξ → 0.

To get the “full” generalized Pareto distribution, we introduce a scale β and a location parameter
µ. We get
 − 1
x−ν ξ
1− 1+ξ β if ξ > 0





x−ν
F (x) = −
1−e β if ξ = 0 ,
1

 −ξ
1 − 1 + ξ x−ν

if ξ < 0


β

h i
where x lies in [ν, +∞[, [ν, +∞[ and ν, ν − βξ respectively. We denote it by Gξ,ν,β (x) (which is
simply Gξ ( x−ν
β )). Let us note when ξ > 0, we have a Pareto II distribution, when ξ = 0 a shifted
exponential distribution and when ξ < 0 a generalized beta I distribution.

From these expression, we can derive a density function for the generalized Pareto distribution
 − 1 −1
1 ξ
1 + ξ x−ν if ξ > 0




 β β
1 − x−ν
f (x) = βe
β if ξ = 0 ,
 1
−1

−ξ

1
1 − (−ξ) x−ν if ξ < 0


β β

for x in the same supports as above.

100 CHAPTER 8. PARETO FAMILY

8.4.2 Properties

For a generalized Pareto distribution Gξ,0,β , we have results on raw moments (for simplicity ν = 0).
The expectation E(X) is finite if and only if ξ < 1. In this case we have
−r !
ξ 1 1
E 1+ X = , for r > −
β 1 + ξr ξ
k !
ξ
E log 1 + X = ξ k k!, for k ∈ N
β
β r+1
E X F̄ (X)r =

, for >0
(r + 1 − ξ)(r + 1) |ξ|
β k Γ(ξ −1 − k) 1
E X k = k+1 −1
k!, for ξ < ,
ξ Γ(1 + ξ ) k

see Embrechts et al. (1997) for details.

If X follows a generalized Pareto distribution GP D(ξ, 0, β), then the treshold excess random
variable X − u|X > u still follows a generalized Pareto distribution GP D(ξ, 0, β + ξu). Let Fu be
the distribution function of X − u|X > u. We have F is in the maximum domain of attraction Hξ
if and only if
lim sup Fu (x) − Gξ,0,β(u) (x) = 0,
u→xf 0<x<xf −u

where β is a positive function. This makes the link between the generalized Pareto distribution
and the generalized extreme value distribution.

8.4.3 Estimation

In this sub-section, we assume ν = 0.

Peak Over a Treshold

We briefly present the Peak Over a Treshold (POT) method to fit the generalized Pareto distribu-
tion. Let (Xi )1≤i≤n an i.i.d. sample whose distribution function belongs to a maximum domain of
attraction Hξ . For a deterministic treshold u > 0, we define the number of exceedances by

Nu = Card(1 ≤ i ≤ n, Xi > u),

with the corresponding excesses (Yi )1≤i≤Nu . We want to fit the excess distribution function Fu
with the GPD distribution function Gξ,0,β(u) .

First we can use the linearity of the mean excess function

β + ξu
E(X − u|X > u) = ,
1−ξ
8.4. GENERALIZED PARETO DISTRIBUTION 101

for a given u. This can be estimated by the empirical mean of the sample (Yi )1≤i≤Nu . Embrechts
et al. (1997) warn us about the difficulty of chosing u, since they are many u for wich the plot of
(u, ȲNu ).

Once we find the treshold u, we can use conditional likelihood estimation on sample (Yi )1≤i≤Nu .
Let τ be −ξ/β. However we can also use a linear regression to fit the shape and the scale parameter.

Maximum likelihood estimation

Maximum likelihood estimators of ξ and β are solutions of the system

n
 P
1 ξXi
 ξ +1


β 2 +βξXi
= βn
i=1
n n ,
1 P ξ 1 P Xi
log 1 + X = ( + 1)

β i

 ξ2 ξ β+ξXi
i=1 i=1

but the system may be instable for ξ ≤ −1/2. When ξ > 1/2, we have some asymptotical properties
of maximum likelihood estimators ξˆ and β̂:
!
√ β̂ L
n ξˆ − ξ, − 1 −→ N (0, M −1 ),
β

where the variance/covariance matrix for the bivariate normal distribution is

−1 1+ξ 1
M = (1 + ξ) .
1 2
Let us note that if we estimate a ξ as zero, then we can try to fit a shifted exponential distribution.

Method of moments

From the properties, we know the theoretical expression of E(X) and E X F̄ (X) . From wich we
get the relation

2E(X)E X F̄ (X) E(X)
β= and ξ = 2 − .
E(X) − 2E X F̄ (X) E(X) − 2E X F̄ (X)

We simply replace E(X) and E X F̄ (X) by the empirical estimators.

8.4.4 Random generation

We have an explicit expression for the quantile function

(
−1 ν + σξ ((1 − u)−ξ − 1) if ξ 6= 0
F (u) = ,
ν − σ log(1 − u) if ξ = 0

thus we can use the inversion function method to generate GPD variables.
102 CHAPTER 8. PARETO FAMILY

8.4.5 Applications

The main application of the generalized Pareto distribution is the extreme value theory, since
there exists a link between the generalized Pareto distribution and the generalized extreme value
distribution. Typical applications are modeling flood in hydrology, natural disaster in insurance
and asset returns in finance.

8.5 Burr distribution

8.5.1 Characterization
density function

The Burr distribution is defined by the follow- Burr(1,1,1)

2.0
Burr(2,1,1)
ing density Burr(2,2,1)
Burr(2,2,2)

ατ (x/λ)τ −1
f (x) = 1.5
λ (1 + (x/λ)τ )α+1

where x ≥ 0, λ the scale parameter and α, τ > 0

f(x)

1.0

the shape parameters. Its distribution function

is given by
0.5

α
λτ

F (x) = 1 − ,
λ τ + xτ
0.0

for x ≥ 0. In a slightly different rewritten form,

0.0 0.5 1.0 1.5 2.0 2.5 3.0
we recognise the Pareto IV distribution
x
x τ −α
F̄ (x) = 1 + , Figure 8.7: Density function for Burr distribu-
λ tions
with a zero location parameter.

8.5.2 Properties

The raw moment of the Burr distribution is given by

Γ(1 + τr )Γ(α − τr )
E(X r ) = λr ,
Γ(α)

hence the expectation and the variance are

!2
Γ(1 + τ1 )Γ(α − τ1 ) Γ(1 + τ2 )Γ(α − τ2 ) Γ(1 + τ1 )Γ(α − τ1 )
E(X) = λ and V ar(X) = λ2 − λ .
Γ(α) Γ(α) Γ(α)
8.5. BURR DISTRIBUTION 103

8.5.3 Estimation

Maximum likelihood estimators are solution of the system

n τ

n P Xi


 α = log 1 + λ
i=1


n n

Xiτ

n P Xi α+1 P Xi
τ = − log λ + τ λ log λ λ +Xiτ ,
τ

 i=1 i=1
 n n
 nλ = − τ −1 1 α+1 P 1
 P
Xi + τ λ

λτ +X τ

λ i
i=1 i=1

which can be solved numerically.

8.5.4 Random generation

1 1
From the quantile function F −1 (u) = λ((1 − u) α − 1) τ , it is easy to generate Burr random variate
1 1
with λ(U α − 1) τ where U is a uniform variable.

8.5.5 Applications

NEED REFERENCE
104 CHAPTER 8. PARETO FAMILY

8.6 Inverse Burr distribution

8.6.1 Characterization
density function

1.4
The inverse Burr distribution (also called the InvBurr(1,1,1,0)
InvBurr(1,2,1,0)
Dagum distribution) is a special case of the InvBurr(2,2,1,0)

1.2
InvBurr(1,2,2,0)
Feller Pareto distribution FP with δ2 = 1.
That is to say the density is given by

1.0
x−µ αγ−1
αγ

0.8
σ
f (x) = α γ+1 ,
σ

f(x)
1 + x−µ

0.6
σ

where x ≤ µ, µ the location parameter, σ the

0.4
scale parameter and α, γ the shape parameters.

0.2
Klugman et al. (2004) defines the inverse Burr
distribution with µ = 0, since this book deals
with insurance loss distributions. In this ex- 0.0
0.0 0.5 1.0 1.5 2.0 2.5 3.0
pression, it is not so obvious that this is the
x
inverse Burr distribution and not the Burr dis-
tribution. But the density can be rewritten as Figure 8.8: Density function for inverse Burr dis-
tributions
α+1
σ
αγ x−µ
f (x) = α γ+1 ,
σ

σ
x−µ +1

From this, the distribution function can be derived to

 γ
1
F (x) =  α  ,
σ
x−µ +1

for x ≥ µ. Here it is also clearer that this is the inverse Burr distribution since we notice the
survival function of the Burr distribution taken in x1 . We denotes the inverse Burr distribution by
IB(γ, α, β, µ).

8.6.2 Properties

The raw moments of the inverse Burr distribution are given by

Γ(γ + αr )Γ(1 − αr )
E(X r ) = σ r ,
Γ(γ)

when µ = 0 and α > r. Thus the expectation and the variance are

Γ(γ + α1 )Γ(1 − α1 )
E(X) = µ + σ
Γ(γ)
8.7. BETA TYPE II DISTRIBUTION 105

and
Γ(γ + α2 )Γ(1 − α2 ) Γ2 (γ + α1 )Γ2 (1 − α1 )
V ar(X) = σ 2 − σ2
Γ(γ) Γ2 (γ)

Furthermore, we have the following special cases

• with γ = α, we get the inverse paralogistic distribution,

• with γ = 1, we have the log logistic distribution,

• with α = 1, this is the inverse Pareto distribution.

8.6.3 Estimation

The maximum likelihood estimator of µ is simply µ̂ = X1:n for a sample (Xi )i , then working on the
transformed sample Yi = Xi − µ̂, other maximum likelihood estimators are solutions of the system
n
 α
n
log 1 + Yλi
P


 γ =
i=1


n n


n α
log Yσi + (γ + 1) log Yσi Y ασ+σα ,
P P
α =−
i

 i=1 i=1
 n n
σα
n 1
− α γ+1
 P P
= (α + 1)

Yiα +σ α

 σ Yi +σ σ
i=1 i=1

8.6.4 Random generation

− γ1 1
Since the quantile function is F −1 (u) = µ + σ −1 (u − 1)− α , we can use the inverse function
method.

8.6.5 Applications

NEED REFERENCE

8.7 Beta type II distribution

8.7.1 Characterization

There are many ways to characterize the beta type II distribution. First we can say it is the
X
distribution of 1−X when X is beta I distributed. But this is also the distribution of the ratio VU
106 CHAPTER 8. PARETO FAMILY

when U and V are gamma distributed (G(a, 1) and G(b, 1) resp.). The distribution function of the
beta of the second distribution is given by
x
β(a, b, 1+x )
F (x) = ,
β(a, b)

for x ≤ 0. The main difference with the beta I distribution is that the beta II distribution takes
values in R+ and not [0, 1].

The density can be expressed as

xa−1
f (x) = ,
β(a, b)(1 + x)a+b
x
for x ≤ 0. It is easier to see the transformation 1−x if we rewrite the density as
a−1 b−1
x x 1
f (x) = 1− .
1+x 1+x β(a, b)(1 + x)2

As already mentioned above, this is a special case of the Feller-Pareto distribution.

8.7.2 Properties

a a(a+b−1)
The expectation and the variance of the beta II are given by E(X) = b−1 and V ar(X) = (b−1)2 (b−2)
when b > 1 and b > 2. Raw moments are expressed as follows

Γ(a + r)Γ(b − r)
E(X r ) = ,
Γ(a)Γ(b)

for b > r.

8.7.3 Estimation

Maximum likelihood estimators for a and b verify the system

n

1 P
 ψ(a) − ψ(a + b) = (log(1 + Xi ) − log(Xi ))

 n
i=1
n ,
1 P
 ψ(b) − ψ(a + b) = log(1 + Xi )


n
i=1

where ψ denotes the digamma function. We may also use the moment based estimators given by

X̄n (X̄n + 1)
b̃ = 2 + and ã = (b̃ − 1)X̄n ,
Sn2

which have the drawback that b̃ is always greater than 2.

8.7. BETA TYPE II DISTRIBUTION 107

8.7.4 Random generation

X
We can simply use the construction of the beta II, i.e. the ratio of 1−X when X is beta I distributed.
However we may also use the ratio of two gamma variables.

8.7.5 Applications

NEED REFERENCE
Chapter 9

Logistic distribution and related

extensions

9.1 Logistic distribution

9.1.1 Characterization

The logistic distribution is defined by the following distribution function

1
F (x) = x−µ ,
1 + e− s

where x ∈ R, µ the location parameter and s the scale parameter. TODO

9.1.2 Properties

TODO

9.1.3 Estimation

TODO

9.1.4 Random generation

TODO

108
9.1. LOGISTIC DISTRIBUTION 109
110 CHAPTER 9. LOGISTIC DISTRIBUTION AND RELATED EXTENSIONS

9.1.5 Applications

9.2 Half logistic distribution

9.2.1 Characterization

9.2.2 Properties

9.2.3 Estimation

9.2.4 Random generation

9.2.5 Applications

9.3 Log logistic distribution

9.3.1 Characterization

9.3.2 Properties

9.3.3 Estimation

9.3.4 Random generation

9.3.5 Applications

9.4 Generalized log logistic distribution

9.4.1 Characterization

9.4.2 Properties

9.4.3 Estimation

9.4.4 Random generation

9.4.5 Applications

9.5 Paralogisitic distribution

Chapter 10

Extrem Value Theory distributions

10.1 Gumbel distribution

10.1.1 Characterization

The standard Gumbel distribution is defined by density function

the following density function
Gum(0,1)
0.7

−x Gum(1/2,1)
f (x) = e−x−e , Gum(0,1/2)
Gum(-1,2)
0.6

where x ∈ R. Its distribution function is ex-

0.5

pressed as follows
0.4

−x
F (x) = e−e .
f(x)

0.3

A scaled and shifted version of the Gumbel

0.2

distribution exists. The density is defined as

0.1

1 − x−µ −e− x−µ

σ
f (x) = e σ ,
0.0

σ
-4 -2 0 2 4
where x ∈ R, µ ∈ R and σ > 0. We get back to
x
the standard Gumbel distribution with µ = 0
and σ = 1. The distribution function of the
Gumbel I distribution is simply Figure 10.1: Density function for Gumbel distri-
x−µ
butions
−e− σ
F (x) = e ,

for x ∈ R.

There exists a Gumbel distribution of the second kind defined by the following distribution
function x−µ
F (x) = 1 − e−e σ ,

111
112 CHAPTER 10. EXTREM VALUE THEORY DISTRIBUTIONS

for x ∈ R. Hence we have the density

1 x−µ −e x−µ
σ
f (x) =
e σ .
σ
This is the distribution of −X when X is Gumbel I distributed.

The characteristic function of the Gumbel distribution of the first kind exists
φ(t) = Γ(1 − iσt)eiµt ,
while its moment generating function are
M (t) = Γ(1 − σt)eµt .

10.1.2 Properties

The expectation of a Gumbel type I distribution is E(X) = γ, the Euler constant, roughly 0.57721.
2
Its variance is V ar(X) = π6 . Thus for the Fisher-Tippett distribution, we have E(X) = µ + σγ
2 2
and V ar(X) = π 6σ .

For the Gumbel type II, expectation exists if a > 1 and variance if a > 2.

10.1.3 Estimation

Maximum likelihood estimators are solutions of the following system

n

X −µ
1 P − iσ
 1= n e


i=1
n n X −µ
,
1 P 1 P − iσ
X = X e


 n i n i
i=1 i=1

which can solved numerically initialized by the moment based estimators

r
6Sn2
µ̃ = X̄n − σ̃γ and σ̃ = ,
π2
where γ is the Euler constant.

10.1.4 Random generation

The quantile function of the Gumbel I distribution is simply F −1 (u) = µ − σ log(− log(u)), thus
we can use the inverse function method.

10.1.5 Applications

The Gumbel distribution is widely used in natural catastrophe modelling, especially for maximum
flood. NEED REFERENCE
10.2. FRÉCHET DISTRIBUTION 113

10.2 Fréchet distribution

A Fréchet type distribution is a distribution whose distribution function is

x−µ −ξ
F (x) = e−( σ ) ,

for x ≥ µ. One can notice this is the inverse Weibull distribution, see section 5.12 for details.

10.3 Weibull distribution

A Weibull type distribution is characterized by the following distribution function

x−µ β
F (x) = 1 − e−( σ ) ,

for x ≥ µ. See section 5.11 for details.

10.4 Generalized extreme value distribution

10.4.1 Characterization

The generalized extreme value distribution is defined by the following distribution function
1
x−µ − ξ
F (x) = e−(1+ξ σ ) ,
x−µ
for 1+ξ σ > 0, ξ the shape parameter, µ the location parameter and σ > 0 the scale parameter.
We can derive a density function
1
x − µ − ξ −1 −(1+ξ x−µ )− 1ξ

1
f (x) = 1+ξ e σ .
σ σ
This distribution is sometimes called the Fisher-Tippett distribution.

Let us note that the values can be taken in R, R− or R+ according to the sign of ξ. The dis-
tribution function is generally noted by Hξ,µ,σ , wich can expressed with the “standard” generalized
extreme value distribution Hξ,0,1 with a shift and a scaling. When ξ tends to zero, we get the
Gumbel I distribution x−µ
−
Hξ,µ,σ (x) −→ e−e σ .
ξ→0

10.4.2 Properties

The expectation and the variance are

σ σ2
E(X) = µ − Γ(1 − ξ) and V ar(X) = 2 (Γ(1 − 2ξ) − Γ2 (1 − ξ))
ξ ξ
114 CHAPTER 10. EXTREM VALUE THEORY DISTRIBUTIONS

if they exist.

From the extreme value theory, we have the following theorem. Let (Xi )1≤i≤n be an i.i.d.
sample and Xi:n the order statistics. If there exits two sequences (an )n and (bn )n valued in R+ and
R respectively, such that

Xn:n − bn
P
an
have a limit in probability distribution. Then the limiting distribution H for the maximum belongs
to the type of one the following three distribution functions

−x−ξ , x ≥ 0, ξ > 0,
 e
 MDA of Fréchet
−(−x) ξ
H(x) = e , x ≤ 0, ξ < 0, MDA of Weibull ,
 −e−x

e , x ∈ R, ξ = 0, MDA of Gumbel

where MDA stands for maximum domains of attraction∗ . For all distribution, there is a unique
MDA. We quickly see that the limiting distribution for the maximum is nothing else than the gen-
eralized extreme value distribution Hξ,0,1 . This theorem is the Fisher-Tippett-Gnedenko theorem.

X1:n −bn
For the minimum, assuming that P an has a limit, the limiting distribution belongs to
 β
 1 − e−x ,
 x ≥ 0, β > 0
−(−x) β
H̃(x) = 1−e , x ≤ 0, β < 0 .
 1 − e−ex ,

x ∈ R, β = 0

Again there are three types for the limiting distribution† .

In the MDA of Fréchet, we have the Cauchy, the Pareto, the Burr, the log-gamma and the stable
distributions, while in the Weibull MDA we retrieve the uniform, the beta and bounded support
power law distribution. Finally, the MDA of Gumbel contains the exponential, the Weibull, the
gamma, the normal, the lognormal, the Benktander distributions.

From the Embrechts et al. (1997), we also have some equivalence given a MDA:

• a distribution function F belongs to the MDA of Fréchet if and only if 1 − F (x) = x−α L(x)
for some slowly varying function L,

• a distribution function F belongs to the MDA of Weibull if and only if 1 − F (xF − 1/x) =
x−α L(x) for some slowly varying function L and xF < +∞,

• a distribution function F belongs to the MDA of Gumbel if and only if there exists z < xF
R x g(t)
− z a(t) dt
such that 1 − F (x) = c(x)e for some measurable function c, g and a continuous
function a.
−x
∗
Sometimes the distribution characterized by the distribution function e−e is called the extreme maximal-value
distribution.
x
†
Sometimes the distribution characterized by the distribution function 1−e−e is called the extreme minimal-value
distribution.
10.5. GENERALIZED PARETO DISTRIBUTION 115

10.4.3 Estimation

According to Embrechts et al. (1997) maximum likelihood estimation is not very reliable in the case
of the generalized extreme value fitting. But that’s not surprising since the generalized extreme
value distribution is a limiting distribution to very heterogeneous distribution, such as heavy tailed,
light tailed or bounded distributions.

We can use weighted moment method, where we estimate moments

r
ωr (ξ, µ, σ) = E(XHξ,µ,σ (X))

by its empirical equivalent

n
1X r
ω̂r = Xj:n Uj:n ,
n
i=1
r are the order statistics of an uniform sample (which can be replaced by its expectation
where Uj:n
(n−r−1)! (n−j)!
(n−1)! (n−j−r)! ). Equalling the theoretical and the empirical moments, we get that ξ is a solution
of
3ω̂2 − ω̂0 3ξ − 1
= ξ .
2ω̂1 − ω̂0 2 −1
Then we estimate the other two parameters with

(2ω̂1 − ω̂0 )ξˆ σ̂ ˆ

σ̂ = and µ̂ = ω̂0 + (1 − Γ(1 − ξ)).
ˆ ξ̂
Γ(1 − ξ)(2 − 1) ξˆ

10.4.4 Random generation

The quantile function of the generalized extreme value distribution is F −1 (u) = µ+ σξ ((− log u)−ξ )−
1 for ξ 6= 0. So we can use the inverse function method.

10.4.5 Applications

The application of the generalized extreme value distribution is obviously the extremex value theory
which can be applied in many fields : natural disaster modelling, insurance/finance extreme risk
management,. . .

10.5 Generalized Pareto distribution

See section 8.4 for details.

Part III

Multivariate and generalized

distributions

116
Chapter 11

Generalization of common
distributions

11.1 Generalized hyperbolic distribution

This part entirely comes from Breymann & Lüthi (2008).

11.1.1 Characterization

The first way to characterize generalized hyperbolic distributions is to say that the random vector
X follows a multivariate GH distribution if
L √
X = µ + W γ + W AZ (11.1)
where

1. Z ∼ Nk (0, Ik )
2. A ∈ Rd×k
3. µ, γ ∈ Rd
4. W ≥ 0 is a scalar-valued random variable which is independent of Z and has a Generalized
Inverse Gaussian distribution, written GIG(λ, χ, ψ).

Note that there are at least five alternative definitions leading to different parametrizations.

Nevertheless, the parameters of a GH distribution given by the above definition admit the
following interpretation:

• λ, χ, ψ determine the shape of the distribution, that is, how much weight is assigned to the
tails and to the center. In general, the larger those parameters the closer is the distribution
to the normal distribution.

117
118 CHAPTER 11. GENERALIZATION OF COMMON DISTRIBUTIONS

• µ is the location parameter.

• Σ = AA0 is the dispersion-matrix.
• γ is the skewness parameter. If γ = 0, then the distribution is symmetric around µ.

Observe that the conditional distribution of X|W = w is normal,

X|W = w ∼ Nd (µ + w γ, wΣ), (11.2)

Another way to define a generalized hyperbolic distribution is to use the density. Since the
conditional distribution of X given W is Gaussian with mean µ + W γ and variance W Σ the GH
density can be found by mixing X|W with respect to W .
Z ∞
fX (x) = fX|W (x|w) fW (w) dw (11.3)
0
∞ 0 −1 γ
e(x−µ) Σ
Z
Q(x) γΣγ
= d 1 d
exp − − fW (w)dw
0 (2π) 2 |Σ| 2 w 2 2w 2/w
(x−µ)0 Σ−1 γ
d
p
( ψ/χ)λ (ψ + γΣγ) 2 −λ Kλ− d2 ( (χ + Q(x))(ψ + γΣγ)) e
p
= d 1 √ × d ,
( (χ + Q(x))(ψ + γΣγ)) 2 −λ
p
(2π) 2 |Σ| 2 Kλ ( χψ)
where Kλ (·) denotes the modified Bessel function of the third kind and Q(x) denotes the maha-
lanobis distance Q(x) = (x − µ)0 Σ−1 (x − µ) (i.e. the distance with Σ−1 as norm). The domain of
variation of the parameters λ, χ and ψ is given in section 11.1.2.

A last way to characterize generalized hyperbolic distributions is the usage of moment generating
functions. An appealing property of normal mixtures is that the moment generating function is
easily calculated once the moment generating function of the mixture is known. Based on equation
(11.4) we obtain the moment generating function of a GH distributed random variable X as
0
M (t) = E(E(exp t0 X |W )) = et µ E(exp W t0 γ + 1/2 t0 Σt )

λ/2 p
Kλ ( ψ(χ − 2t0 γ − t0 Σt))

t0 µ ψ
= e √ , χ ≥ 2 t0 γ + t0 Σt.
ψ − 2t0 γ − t0 Σt Kλ ( χψ)
For moment generating functions of the special cases of the GH distribution we refer to Prause
(1999) and Paolella (2007).

11.1.2 Parametrization

There are several alternative parametrizations for the GH distribution. In the R package ghyp the
user can choose between three of them. There exist further parametrizations which are not imple-
mented and not mentioned here. For these parametrizations we refer to Prause (1999) and Paolella
(2007).

Table 11.1 describes the parameter ranges for each parametrization and each special case.
Clearly, the dispersion matrices Σ and ∆ have to fulfill the usual conditions for covariance ma-
trices, i.e., symmetry and positive definiteness as well as full rank.
11.1. GENERALIZED HYPERBOLIC DISTRIBUTION 119

(λ, χ, ψ, µ, Σ, γ)-Parametrization
λ χ ψ µ Σ γ
ghyp λ∈R χ>0 ψ >0 µ ∈ Rd Σ ∈ RΣ γ ∈ Rd
hyp λ = d+1
2 χ>0 ψ >0 µ ∈ Rd Σ ∈ RΣ γ ∈ Rd
NIG λ = − 12 χ>0 ψ >0 µ ∈ Rd Σ ∈ RΣ γ ∈ Rd
t λ<0 χ>0 ψ =0 µ ∈ Rd Σ ∈ RΣ γ ∈ Rd
VG λ>0 χ=0 ψ >0 µ ∈ Rd Σ ∈ RΣ γ ∈ Rd

(λ, ᾱ, µ, Σ, γ)-Parametrization

λ ᾱ µ Σ γ
ghyp λ∈R ᾱ > 0 µ ∈ Rd Σ ∈ RΣ γ ∈ Rd
hyp λ = d+12 ᾱ > 0 µ ∈ Rd Σ ∈ RΣ γ ∈ Rd
NIG λ = 21 ᾱ > 0 µ ∈ Rd Σ ∈ RΣ γ ∈ Rd
t λ = − ν2 < −1 ᾱ = 0 µ ∈ Rd Σ ∈ RΣ γ ∈ Rd
VG λ>0 ᾱ = 0 µ ∈ Rd Σ ∈ RΣ γ ∈ Rd

(λ, α, µ, Σ, δ, β)-Parametrization
λ α δ µ ∆ β
ghyp λ∈R α>0 δ >0 µ ∈ Rd ∆ ∈ R∆ β ∈ {x ∈ Rd : α2 − x0 ∆x > 0}
hyp λ = d+1
2 α>0 δ >0 µ ∈ Rd ∆ ∈ R∆ β ∈ {x ∈ Rd : α2 − x0 ∆x > 0}
NIG λ = − 12 α>0 δ >0 µ ∈ Rd ∆ ∈ R∆ β ∈ {x ∈ Rd : α2 − x0 ∆x > 0}
√
t λ<0 α = β 0 ∆β δ >0 µ ∈ Rd ∆ ∈ R∆ β ∈ Rd
VG λ>0 α>0 δ =0 µ ∈ Rd ∆ ∈ R∆ β ∈ {x ∈ R : α2 − x0 ∆x > 0}
d

Table 11.1: The domain of variation for the parameters of the GH distribution and some of its
special cases for different parametrizations. We denote the set of all feasible covariance matrices in
Rd×d with RΣ . Furthermore, let R∆ = {A ∈ RΣ : |A| = 1}.

Internally, he package ghyp uses the (λ, χ, ψ, µ, Σ, γ)-parametrization. However, fitting is done
in the (λ, ᾱ, µ, Σ, γ)-parametrization since this parametrization does not necessitate additional con-
straints to eliminate the redundant degree of freedom. Consequently, what cannot be represented
by the (λ, α, µ, Σ, δ, β)-parametrization cannot be fitted (cf. section 11.1.2).

(λ, χ, ψ, µ, Σ, γ)-Parametrization

The (λ, χ, ψ, µ, Σ, γ)-parametrization is obtained as the normal mean-variance mixture distribution

when W ∼ GIG(λ, χ, ψ). This parametrization has a drawback of an identification problem.
Indeed, the distributions GHd (λ, χ, ψ, µ, Σ, γ) and GHd (λ, χ/k, kψ, µ, kΣ, kγ) are identical for any
k > 0. Therefore, an identifying problem occurs when we start to fit the parameters of a GH
distribution to data. This problem could be solved by introducing a suitable contraint. One
possibility is to require the determinant of the dispersion matrix Σ to be 1.
120 CHAPTER 11. GENERALIZATION OF COMMON DISTRIBUTIONS

(λ, ᾱ, µ, Σ, γ)-Parametrization

There is a more elegant way to eliminate the degree of freedom. We simply constrain the expected
value of the generalized inverse Gaussian distributed mixing variable W to be 1 (cf. 4.5). This
makes the interpretation of the skewness parameters γ easier and in addition, the fitting procedure
becomes faster (cf. 11.1.5).
We define r √
χ Kλ+1 ( χψ)
E(W ) = √ = 1. (11.4)
ψ Kλ ( χψ)
and set p
ᾱ = χψ. (11.5)
It follows that
Kλ+1 (ᾱ) ᾱ2 Kλ (ᾱ)
ψ = ᾱ and χ = = ᾱ . (11.6)
Kλ (ᾱ) ψ Kλ+1 (ᾱ)
The drawback of the (λ, ᾱ, µ, Σ, γ)-parametrization is that it does not exist in the case ᾱ = 0 and
λ ∈ [−1, 0], which corresponds to a Student-t distribution with non-existing variance. Note that
the (λ, ᾱ, µ, Σ, γ)-parametrization yields to a slightly different parametrization for the special case
of a Student-t distribution.

(λ, α, µ, Σ, δ, β)-Parametrization

When the GH distribution was introduced in Barndorff-Nielsen (1977), the following parametriza-
tion for the multivariate case was used.
p 0
(α2 − β 0 ∆β)λ/2 Kλ− d (α δ 2 + (x − µ)0 ∆−1 (x − µ)) eβ (x−µ)
2
fX (x) = dp × d , (11.7)
(α δ 2 + (x − µ)0 ∆−1 (x − µ)) 2 −λ
p p
(2π) 2 |∆| δ λ Kλ (δ α2 − β 0 ∆β)

where the determinant of ∆ is constrained to be 1. In the univariate case the above expression
reduces to

(α2 − β 2 )λ/2 p
fX (x) = √ 1 p × Kλ− 1 (α δ 2 + (x − µ)2 ) eβ(x−µ) , (11.8)
2π αλ− 2 δ λ Kλ (δ α2 − β 2 ) 2

which is the most widely used parametrization of the GH distribution in literature.

Switching between different parametrizations

The following formulas can be used to switch between the (λ, ᾱ, µ, Σ, γ), (λ, χ, ψ, µ, Σ, γ), and
the (λ, α, µ, Σ, δ, β)-parametrization. The parameters λ and µ remain the same, regardless of the
parametrization.
The way to obtain the (λ, α, µ, Σ, δ, β)-parametrization from the (λ, ᾱ, µ, Σ, γ)-parametrization
yields over the (λ, χ, ψ, µ, Σ, γ)-parametrization:

(λ, ᾱ, µ, Σ, γ) (λ, χ, ψ, µ, Σ, γ) (λ, α, µ, Σ, δ, β)

11.1. GENERALIZED HYPERBOLIC DISTRIBUTION 121

(λ, ᾱ, µ, Σ, γ) → (λ, χ, ψ, µ, Σ, γ): Use the relations in (11.6) to obtain χ and ψ. The parameters Σ
and γ remain the same.
q √
K ( χψ)
(λ, χ, ψ, µ, Σ, γ) → (λ, ᾱ, µ, Σ, γ): Set k = ψχ Kλ+1(√χψ) .
λ

p
ᾱ = χψ, Σ ≡ k Σ, γ ≡ kγ (11.9)

(λ, χ, ψ, µ, Σ, γ) → (λ, α, µ, Σ, δ, β):

1
∆ = |Σ|− d Σ , β = Σ−1 γ
q q
1 1
δ = χ|Σ| d , α = |Σ|− d (ψ + γ 0 Σ−1 γ) (11.10)

(λ, α, µ, Σ, δ, β) → (λ, χ, ψ, µ, Σ, γ):

Σ = ∆, γ = ∆β, χ = δ2, ψ = α2 − β 0 ∆β. (11.11)

11.1.3 Properties

Moments

The expected value and the variance are given by

E(X) = µ + E(W )γ (11.12)

V ar(X) = E(Cov(X|W )) + Cov(E(X|X)) (11.13)
0
= V ar(W )γγ + E(W )Σ.

Linear transformation

The GH class is closed under linear transformations: If X ∼ GHd (λ, χ, ψ, µ, Σ, γ) and Y = BX +b,
where B ∈ Rk×d and b ∈ Rk , then Y ∼ GHk (λ, χ, ψ, Bµ + b, BΣB 0 , Bγ). Observe that by
introducing a new skewness parameter γ̄ = Σγ, all the shape and skewness parameters (λ, χ, ψ, γ̄)
become location and scale-invariant, provided the transformation does not affect the dimensionality,
that is B ∈ Rd×d and b ∈ Rd .

11.1.4 Special cases

The GH distribution contains several special cases known under special names.

• If λ = d+12 the name generalized is dropped and we have a multivariate hyperbolic (hyp)
distribution. The univariate margins are still GH distributed. Inversely, when λ = 1 we get
a multivariate GH distribution with hyperbolic margins.

• If λ = − 12 the distribution is called Normal Inverse Gaussian (NIG).

122 CHAPTER 11. GENERALIZATION OF COMMON DISTRIBUTIONS

• If χ = 0 and λ > 0 one gets a limiting case which is known amongst others as Variance
Gamma (VG) distribution.

• If ψ = 0 and λ < −1 one gets a limiting case which is known as a generalized hyperbolic
Student-t distribution (called simply Student-t in what follows).

11.1.5 Estimation

Numerical optimizers can be used to fit univariate GH distributions to data by means of maximum
likelihood estimation. Multivariate GH distributions can be fitted with expectation-maximazion
(EM) type algorithms (see Dempster et al. (1977) and Meng & Rubin (1993)).

EM-Scheme

Assume we have iid data x1 , . . . , xn and parameters represented by Θ = (λ, ᾱ, µ, Σ, γ). The problem
is to maximize
n
X
ln L(Θ; x1 , . . . , xn ) = ln fX (xi ; Θ). (11.14)
i=1

This problem is not easy to solve due to the number of parameters and necessity of maximizing
over covariance matrices. We can proceed by introducing an augmented likelihood function
n
X n
X
ln L̃(Θ; x1 , . . . , xn , w1 , . . . , wn ) = ln fX|W (xi |wi ; µ, Σ, γ) + ln fW (wi ; λ, ᾱ) (11.15)
i=1 i=1

and spend the effort on the estimation of the latent mixing variables wi coming from the mixture
representation (11.2). This is where the EM algorithm comes into play.

E-step: Calculate the conditional expectation of the likelihood function (11.15) given the data
x1 , . . . , xn and the current estimates of parameters Θ[k] . This results in the objective function

Q(Θ; Θ[k] ) = E ln L̃(Θ; x1 , . . . , xn , w1 , . . . , wn )|x1 , . . . , xn ; Θ[k] . (11.16)

M-step: Maximize the objective function with respect to Θ to obtain the next set of estimates
Θ[k+1] .

Alternating between these steps yields to the maximum likelihood estimation of the parameter
set Θ. In practice, performing the E-Step means maximizing the second summand of (11.15)
numerically. The log density of the GIG distribution (cf. 4.5.1) is

λ p χ1 ψ
ln fW (w) = ln(ψ/χ) − ln(2Kλ ( χψ)) + (λ − 1) ln w − − w. (11.17)
2 2w 2
When using the (λ, ᾱ)-parametrization this problem is of dimension two instead of three as it is in
the (λ, χ, ψ)-parametrization. As a consequence the performance increases.
11.1. GENERALIZED HYPERBOLIC DISTRIBUTION 123

Since the wi ’s are latent one has to replace w, 1/w and ln w with the respective expected values in
order to maximize the log likelihood function. Let

[k] [k] [k]
ηi := E wi | xi ; Θ[k] , δi := E wi−1 | xi ; Θ[k] , xii := E ln wi | xi ; Θ[k] . (11.18)
We have to find the conditional density of wi given xi to calculate these quantities.

MCECM estimation

In the R implementation a modified EM scheme is used, which is called multi-cycle, expectation,

conditional estimation (MCECM) algorithm (Meng & Rubin 1993, McNeil, Frey & Embrechts
2005a). The different steps of the MCECM algorithm are sketched as follows:

(1) Select reasonable starting values for Θ[k] . For example λ = 1, ᾱ = 1, µ is set to the sample
mean, Σ to the sample covariance matrix and γ to a zero skewness vector.
(2) Calculate χ[k] and ψ [k] as a function of ᾱ[k] using (11.6).
[k] [k]
(3) Use (11.18), (11.12) to calculate the weights ηi and δi . Average the weights to get
n n
[k] 1 X [k] 1 X [k]
η̄ = ηi and δ̄ [k] = δi . (11.19)
n n
i=1 i=1

(4) If an asymmetric model is to be fitted set γ to 0, else set

[k]
1 ni=1 δi (x̄ − xi )
P
[k+1]
γ = . (11.20)
n η̄ [k] δ̄ [k] − 1
(5) Update µ and Σ:
[k]
1 ni=1 δi (xi − γ [k+1] )
P
[k+1]
µ = (11.21)
n δ̄ [k]
n
1 X [k]
Σ[k+1] = δi (xi − µ[k+1] )(xi − µ[k+1] )0 − η̄ [k] γ [k+1] γ [k+1] 0 . (11.22)
n
i=1

[k,2] [k,2] [k,2]

(6) Set Θ[k,2] = (λ[k] , ᾱ[k] , µ[k+1] , Σ[k+1] , γ [k+1] ) and calculate weights ηi , δi and xii using
(11.18), (4.2) and (11.12).
(7) Maximize the second summand of (11.15) with density (11.17) with respect to λ, χ and ψ to
complete the calculation of Θ[k,2] and go back to step (2). Note that the objective function
must calculate χ and ψ in dependence of λ and ᾱ using relation (11.6).

11.1.6 Random generation

We can simply use the first characterization by adding

√
µ + W γ + W AZ
where Z is a multivariate gaussian vector Nk (0, Ik ) and W follows a Generalized Inverse Gaussian
GIG(λ, χ, ψ).
124 CHAPTER 11. GENERALIZATION OF COMMON DISTRIBUTIONS

11.1.7 Applications

Even though the GH distribution was initially ivented to study the distribution of the logarithm
of particle sizes, we will focus on applications of the GH distribution family in finance and risk
measurement.

We have seen above that the GH distribution is very flexible in the sense that it nests several
other distributions such as the Student-t (cf. 7.1).

To give some references and applications of the GH distribution let us first summarize some
of its important properties. Beside of the above mentioned flexibility, three major facts led to the
popularity of GH distribution family in finance:

(1) The GH distribution features both fat tails and skewness. These properties account for the
stylized facts of financial returns.

(2) The GH family is naturally extended to multivariate distributions∗ . A multivariate GH

distribution does exhibit some kind of non-linear dependence, for example tail-dependence.
This reflects the fact that extremes mostly occure for a couple of risk-drivers simultaneously
in financial markets. This property is of fundamental importance for risk-management, and
can influence for instance the asset allocation in portfolio theory.

(3) The GH distribution is infinitely divisible (cf. Barndorff-Nielsen & Halgreen (1977)). This is
a necessary and sufficient condition to build Lévy processes. Lévy processes are widespread
in finance because of their time-continuity and their ability to model jumps.

Based on these properties one can classify the applications of the GH distributions into the fields
empirical modelling, risk and dependence modelling, derivative pricing, and portfolio selection.

In the following, we try to assign papers to each of the classes of applications mentioned above.
Rather than giving abstracts for each paper, we simply cite them and refer the interested reader
to the bibliography and to the articles. Note that some articles deal with special cases of the GH
distribution only.

Empirical modelling Eberlein & Keller (1995), Barndorff-Nielsen & Prause (2001), Fergusson
& Platen (2006)

Risk and dependence modelling Eberlein et al. (1998), Breymann et al. (2003), McNeil et al.
(2005b), Chen et al. (2005), Kassberger & Kiesel (2006)

Lévy processes Barndorff-Nielsen (1997a,b), Bibby & Sorensen (1997), Dilip B. Madan et al.
(1998), Raible (2000), Cont & Tankov (2003)
∗
The extension to multivariate distributions is natural because of the mixing structure (see eq. (11.2)).
11.2. STABLE DISTRIBUTION 125

Portfolio selection Kassberger (2007)

11.2 Stable distribution

A detailed and complete review of stable distributions can be found in Nolan (2009).

11.2.1 Characterization

Stable distributions are characterized by the following equation

L
aX̃ + bX = cX + d,
˜

where X̃ and X are independent copies of a random variable X and some positive constants a, b, c
˜
and d. This equation means stable distributions are distributions closed for linear combinations.
For the terminology, we say X is strictly stable if d = 0 and symmetric stable if in addition we have
L
X = −X. From Nolan (2009), we learn we use the word stable since the shape of the distribution
is preserved under linear combinations.

Another way to define stable distribution is to use characteristic functions. X has a stable
distribution if and only if its characteristic function is
( α πα
itδ e−|tγ| (1−iβ tan( 2 )sign(t)) if α 6= 1
φ(t) = e × 2 ,
e−|tγ|(1+iβ π log |t|sign(t)) if α = 1

where α ∈]0, 2], β ∈] − 1, 1[, γ > 0 and b ∈ R are the parameters. In the following, we denote
S(α, β, γ, δ), where δ is a location parameter, γ a scale parameter, α an index of stability and β a
skewness parameter. This corresponds to the parametrization 1 of Nolan (2009).

We know that stable distributions S(α, β, γ, δ) are continuous distributions whose support is

 [δ, +∞[ if α < 1 and β = 1
] − ∞, δ] if α < 1 and β = −1 .
] − ∞, +∞[ otherwise


11.2.2 Properties

If we work with standard stable distributions S(α, β, 0, 1), we have the reflection property. That is
to say if X ∼ S(α, β, 0, 1), then −X ∼ S(α, −β, 0, 1). This implies the following constraint on the
density and the distribution function:

fX (x) = f−X (−x) and FX (x) = 1 − F−X (x).

126 CHAPTER 11. GENERALIZATION OF COMMON DISTRIBUTIONS

From the definition, we have the obvious property on the sum. If X follows a stable distribution
S(α, β, γ, δ), then aX + b follows a stable distribution of parameters

S(α, sign(a)β, |a|γ, aδ + b) if α 6= 1
.
S(1, sign(a)β, |a|γ, aδ + b − π2 βγa log |a|) if α = 1

Furthermore if X1 and X2 follow a stable distribution S(α, βi , γi , δi ) for i = 1, 2, then the

β γ α +β γ α 1
sum X1 + X2 follows a stable distribution S(α, β, γ, δ) with β = 1 γ1α +γ 2α 2 , γ = (γ1α + γ2α ) α and
1 2
δ = δ1 + δ2 .

11.2.3 Special cases

The following distributions are special cases of stable distributions:

√ (x−µ)2
• S(2, 0, σ/ 2, µ) is a Normal distribution defined by the density f (x) = √ 1 e− 2σ 2 ,
2πσ 2
γ
• S(1, 0, γ, δ) is a Cauchy distribution defined by the density f (x) = 1
π γ 2 +(x−γ)2 ,
q γ
γ − 2(x−δ)
• S(1/2, 1, γ, δ) is a Lévy distribution defined by the density f (x) = 2π
1
3 e .
(x−δ) 2

11.2.4 Estimation

NEED REFERENCE

11.2.5 Random generation

Simulation of stable distributions are carried out by the following algorithm from Chambers et al.
(1976). Let Θ be an independent random uniform variable U(−π/2, π/2) and W be an exponential
variable with mean 1 independent from Θ. For 0 < α ≤ 2, we have

• in the symmetric case,

1−α
sin(αΘ) cos((α − 1)Θ) α
Z= 1
cos(Θ) α W
follows a stable distribution S(α, 0, 1, 0) with the limiting case tan(Θ) when α → 1.
• in the nonsymetric case,
 1−α

 sin(α(Θ+θ)) cos(αθ+(α−1)Θ) α
1 W
Z= (cos(αθ)
cos(Θ)) α π
2 π W cos(Θ)
π ( 2 + βΘ) tan(Θ) − β log 2

 π
2
+βΘ

follows a stable distribution S(α, β, 1, 0) where θ = arctan(β tan(πα/2))/α.

11.3. PHASE-TYPE DISTRIBUTION 127

Then we get a “full” stable distribution with γZ + δ.

11.2.6 Applications

NEED REFERENCE

11.3 Phase-type distribution

11.3.1 Characterization

A phase-type distribution P H(π, T, m) (π a row vector of Rm , T a m × m matrix) is defined as

the distribution of the time to absorption in the state 0 of a Markov jump process, on the set
{0, 1, . . . , m}, with initial probability (0, π) and intensity matrix∗
!
0 0
Λ = (λij )ij = ,
t0 T
where the vector t0 is −T 1m and 1m stands for the column vector of 1 in Rm . This means that if
we note (Mt )t the associated Markov process of a phase-type distribution, then we have

λij h + o(h) if i 6= j
P (Mt+h = j/Mt = i) = .
1 + λii h + o(h) if i = j
The matrix T is called the sub-intensity matrix and t0 the exit rate vector.

The cumulative distribution function of a phase-type distribution is given by

F (x) = 1 − πeT x 1m ,
and its density by
f (x) = πeT x t0 ,
+∞
T n xn
where eT x denote the matrix exponential defined as the matrix serie
P
n! .
n=0

The computation of matrix exponential is studied in details in appendix A.3, but let us notice
that when T is a diagonal matrix, the matrix exponential is the exponential of its diagonal terms.
Let us note that there also exists discrete phase-type distribution, cf. Bobbio et al. (2003).

11.3.2 Properties

The moments of a phase-type distribution are given by (−1)n n!πT −n 1. Since phase-type distribu-
tions are platikurtic or light-tailed distributions, the Laplace transform exists
fb(s) = π(−sIm − T )−1 t0 ,
∗
matrix such that its row sums are equal to 0 and have positive elements except on its diagonal.
128 CHAPTER 11. GENERALIZATION OF COMMON DISTRIBUTIONS

where Im stands for the m × m identity matrix.

One property among many is the set of phase-type distributions is dense with the set of positive
random variable distributions. Hence, the distribution of any positive random variable can be
written as a limit of phase-type distributions. However, a distribution can be represented (exactly)
as a phase-type distribution if and only if the three following conditions are verified

• the distribution has a rational Laplace transform;

• the pole of the Laplace transform with maximal real part is unique;

• it has a density which is positive on R?+ .

11.3.3 Special cases

Here are some examples of distributions, which can be represented by a phase-type distribution

• exponential distribution E(λ) : π = 1, T = −λ and m = 1.

• generalized Erlang distribution G (n, (λi )1≤i≤n ) :

π = (1, 0, . . . , 0),
 
−λ1 λ1 0 ... 0
 .. 

 0 −λ2 λ2 . 0 

T =
 .. ,

 0 0 −λ3 . 0 
 .. .. 
 0 0 . . λn−1 
0 0 0 0 −λn
and m = n.

• a mixture of exponential distribution of parameter (pi , λi )1≤i≤n :

π = (p1 , . . . , pn ),
 
−λ1 0 0 ... 0
 .. 

 0 −λ2 0 . 0 

T =
 .. ,

 0 0 −λ3 . 0 
 .. .. 
 0 0 . . 0 
0 0 0 0 −λn
and m = n.

• a mixture of 2 (or k) Erlang distribution G(ni , λi )i=1,2 with parameter pi :

π = (p1 , 0, . . . , 0, p2 , 0, . . . , 0),
| {z } | {z }
n1 n2
11.3. PHASE-TYPE DISTRIBUTION 129

 
−λ1 λ1 0 0 ... 0 0
 .. .. 

 0 . λ1 0 . 0 0 


 0 0 −λ1 0 0 0 0 

 .. 
T =
 0 0 . −λ2 λ2 0 0 ,

 .. .. 

 0 0 0 0 . . 0 

 .. 
 0 0 0 0 0 . λ2 
0 0 0 0 0 0 −λ2
and m = n1 + n2 .

11.3.4 Estimation

The estimation based on moments can be a starting point for parameters, but according to Feld-
mann & Whitt (1996) the fit is very poor. Feldmann & Whitt (1996) proposes a recursive algo-
rithm matching theoretical quantiles and empirical quantiles. They illustrates their method with
the Weibull and the Pareto distribution by a mixture of exponential distributions.

First Asmussen et al. (1996) and then Lee & Lin (2008) fit phase-type distribution with the EM
algorithm. Lee & Lin (2008) also investigates goodness of fit and graphical comparison of the fit.
Lee & Lin (2008) focuses on mixture of Erlang distributions while Asmussen et al. (1996) provides
an algorithm for general phase-type distributions. Lee & Lin (2008) illustrates their algorithm with
an uniform, a Weibull, a Pareto and log-normal distributions.

11.3.5 Random generation

From Neuts (1981), we have the following algorithm to generate phase-type distributed random
variate. Let s be the state of the underlying Markov chain.

• S initialized from the discrete distribution characterized by π

• X initialized to 0
• while S 6= 0 do
– generate U from an uniform distribution,
– X = X − 1 log(U ),
λ̂ij

– generate S from a discrete distribution characterized by the row Λ̂

where Λ̂ is the transition matrix defined by



 1 if i=j=1
i = 1 and j 6= 1

 0 if
λ̂ij = 0 if i > 1 and j = i .

 λij

if i > 1 and j 6= i

−λii
130 CHAPTER 11. GENERALIZATION OF COMMON DISTRIBUTIONS

11.3.6 Applications

NEED REFERENCE

11.4 Exponential family

11.4.1 Characterization

Clark & Thayer (2004) defines the exponential family by the following density or mass probability
function
f (x) = ed(θ)e(x)+g(θ)+h(x) ,
where d, e, g and h are known functions and θ the vector of paremeters. Let us note that the support
of the distribution can be R or R+ or N. This form for the exponential family is called the natural
form.

When we deal with generalized linear models, we use the natural form of the exponential family,
which is
θx−b(θ)
+c(x,φ)
f (x) = e a(φ) ,
where a, b, c are known functions and θ, φ∗ denote the parameters. This form is derived from the
previous by setting d(θ) = θ, e(x) = x and adding a dispersion parameter φ.

Let µ be the mean of the variable of an exponential family distribution. We have µ = τ (θ) since
φ is only a dispersion parameter. The mean value form of the exponential family is
τ −1 (µ)x−b(τ −1 (µ))
+c(x,φ)
f (x) = e a(φ) .

11.4.2 Properties

For the exponential family, we have E(X) = µ = b0 (θ) and V ar(X) = a(φ)V 00
q (µ) = a(φ)b (θ) where
a(φ) b(3) (θ)a(φ)2
V is the unit variance function. The skewness is given by γ3 (X) = dV dµ (µ) V (µ) = V ar(Y )3/2
, while
2
2 a(φ) b(4) (θ)a(φ)3
the kurtosis is γ4 (X) = 3 + ddµV2 (µ)V (µ) + dV
dµ (µ) V (µ) = 3 + V ar(Y )2 .

The property of uniqueness is the fact that the variance function V uniquely identifies the
distribution.

11.4.3 Special cases

The exponential family of distributions in fact contains the most frequently used distributions.
Here are the corresponding parameters, listed in a table:
∗
the canonic and the dispersion parameters.
11.5. ELLIPTICAL DISTRIBUTION 131

Law Distribution θ φ Expectation Variance

(x−µ)2
Normal N (µ, σ 2 ) √ 1 e− 2σ 2 µ σ2 µ=θ 1
2πσ
β α xα−1 −βx
Gamma G(α, β) Γ(α) e − αβ = 1
µ
1
α µ = − 1θ µ2
q λ(x−µ)2
− 1
Inverse Normal I(µ, λ) λ
2πx3
e 2µ2 x − 2µ1 2 1
λ µ = (−2θ)− 2 µ3
µ eθ
Bernoulli B(µ) µx (1 − µ)1−x log( 1−µ ) 1 µ= 1+eθ
µ(1 − µ)
µx −µ
Poisson P(µ) x! e log(µ) 1 µ = eθ µ
x
µφ
Overdispersed Poisson P(φ, µ) x
! e−µ log(µ) φ φeθ φµ
φ

11.4.4 Estimation

The log likelihood equations are

n

1 P Xi b0 (θ)


 n a(φ) = a(φ)
i=1
n n ,
θXi a0 (φ) 0
1 1 ∂c
b(θ) aa2(φ)
P P
− ∂φ (Xi , φ) =


 n a2 (φ) n (φ)
i=1 i=1

for a sample (Xi )i .

11.4.5 Random generation

NEED REFERENCE

11.4.6 Applications

GLM, credibility theory, lehman scheffe theorem

11.5 Elliptical distribution

11.5.1 Characterization

TODO
132 CHAPTER 11. GENERALIZATION OF COMMON DISTRIBUTIONS

11.5.2 Properties

TODO

11.5.3 Special cases

11.5.4 Estimation

TODO

11.5.5 Random generation

TODO

11.5.6 Applications
Chapter 12

Multivariate distributions

12.1 Multinomial

12.2 Multivariate normal

12.3 Multivariate elliptical

12.4 Multivariate uniform

12.5 Multivariate student

12.6 Kent distribution

12.7 Dirichlet distribution

12.7.1 Characterization

TODO

12.7.2 Properties

TODO

133
134 CHAPTER 12. MULTIVARIATE DISTRIBUTIONS

12.7.3 Estimation

TODO

12.7.4 Random generation

TODO

12.7.5 Applications

TODO

12.8 Von Mises Fisher

12.9 Evens
Chapter 13

Misc

13.1 MBBEFD distribution

The MBBEFD distribution comes from the actuarial science due to Bernegger (1997). MBBEFD
stands for Maxwell-Boltzmann, Bore-Einstein and Fermi-Dirac distribution.

13.1.1 Characterization

The MBBEFD distribution is characterized by the following distribution function.

(
a+1
a a+bx −1 if 0 ≤ x ≤ 1
F (x) = ,
1 if x > 1

for x ∈ R+ . The MBBEFD distribution is a mixed distribution of a continuous distribution on

x ∈]0, 1[ and a discrete distribution in x = 1. We have a mass probability for x = 1

(a + 1)b
p = 1 − F (1) = .
a+b

The parameters (a, b) are defined on a wide set of intervals, which are not trivial: ] − 1, 0[×]1, +∞[
and ]−∞, −1[∪]0, +∞[×]0, 1[. The shape of the distribution function F has the following properties

• for (a, b) ∈ I1 =] − 1, 0[×]1, +∞[, F is concave,

• for (a, b) ∈ I2 =] − ∞, −1[×]0, 1[, F is concave,

• for (a, b) ∈ I3 =]0, b[×]0, 1[, F is concave,

• for (a, b) ∈ I4 = [b, 1[×]0, 1[, F is convex then concave,

• for (a, b) ∈ I4 = [1, +∞[×]0, 1[, F is convex.

135
136 CHAPTER 13. MISC

There is no usual density but if we use the Dirac function δ, we can define a function f such
that
−a(a + 1)bx ln(b)
f (x) = 11]0,1[ (x) + δ1 .
(a + bx )2
which is a mix between a mass probability and a density functions.

13.1.2 Special cases

TODO

13.1.3 Properties

TODO

13.1.4 Estimation

TODO

13.1.5 Random generation

TODO

13.1.6 Applications

13.2 Cantor distribution

TODO

13.3 Tweedie distribution

TODO
Bibliography

Arnold, B. C. (1983), ‘Pareto distributions’, International Co-operative Publishing House 5. 30, 88,
93, 95, 96
Asmussen, S., Nerman, O. & Olsson, M. (1996), ‘Fitting phase-type distributions via the em algo-
rithm’, Scandinavian Journal of Statistics 23(4), 419–441. 129
Barndorff-Nielsen, O. E. (1977), ‘Exponentially decreasing distributions for the logarithm of particle
size’, Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences
353(1674), 401–419. 120
Barndorff-Nielsen, O. E. (1997a), ‘Normal inverse Gaussian distributions and stochastic volatility
modelling’, Scandinavian Journal of Statistics 24(1), 1–13. 124
Barndorff-Nielsen, O. E. (1997b), ‘Processes of normal inverse gaussian type’, Finance and Stochas-
tics 2(1), 41–68. 124
Barndorff-Nielsen, O. E. & Halgreen, O. (1977), ‘Infinite divisibility of the hyperbolic and gen-
eralized inverse gaussian distribution’, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte
Gebiete 38(4), 309–311. 124
Barndorff-Nielsen, O. E. & Prause, K. (2001), ‘Apparent scaling’, Finance and Stochastics
5(1), 103–113. 124
Bernegger, S. (1997), ‘The swiss re exposure curves and the mbbefd distribution class’, Astin Bull.
27(1), 99–111. 135
Bibby, B. M. & Sorensen, M. (1997), ‘A hyperbolic diffusion model for stock prices’, Finance &
Stochastics 2 pp. 25–41. 124
Black, F. & Scholes, M. (1973), ‘The pricing of options and corporate liabilities’, Journal of Political
Economy 81(3). 51
Bobbio, A., Horvath, A., Scarpa, M. & Telek, M. (2003), ‘Acyclic discrete phase type distributions:
properties and a parameter estimation algorithm’, performance evaluation 54, 1–32. 127
Breymann, W., Dias, A. & Embrechts, P. (2003), ‘Dependence Structures for Multivariate High–
Frequency Data in Finance’, Quantitative Finance 3(1), 1–14. 124
Breymann, W. & Lüthi, D. (2008), ghyp: A package on generalized hyperbolic distributions, Institute
of Data Analysis and Process Design. 54, 117
Brigo, D., Mercurio, F., Rapisarda, F. & Scotti, R. (2002), ‘Approximated moment-matching dy-
namics for basket-options simulation’, Product and Business Development Group,Banca IMI,
SanPaolo IMI Group . 53

137
138 BIBLIOGRAPHY

Cacoullos, T. & Charalambides, C. (1975), ‘On minimum variance unbiased estimation for truncated
binomial and negative binomial distributions’, Annals of the Institute of Statistical Mathematics
27(1). 12, 21, 24

Carrasco, J. M. F., Ortega, E. M. M. & Cordeiro, G. M. (2008), ‘A generalized modified weibull

distribution for lifetime modeling’, Computational Statistics and Data Analysis 53, 450–462. 72

Chambers, J. M., Mallows, C. L. & Stuck, B. W. (1976), ‘A method for simulating stable random
variables’, Journal of the American Statistical Association, . 126

Chen, Y., Härdle, W. & Jeong, S.-O. (2005), Nonparametric Risk Management with Generalized
Hyperbolic Distributions, Vol. 1063 of Preprint / Weierstraß–Institut für Angewandte Analysis
und Stochastik, WIAS, Berlin. 124

Clark, D. R. & Thayer, C. A. (2004), ‘A primer on the exponential family of distributions’, 2004
call paper program on generalized linear models . 130

Cont, R. & Tankov, P. (2003), Financial Modelling with Jump Processes, Chapman & Hall CRC
Financial Mathematics Series. 124

Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977), ‘Maximum likelihood from incomplete data
via the Em algorithm’, Journal of the Royal Statistical Society 39(1), 1–38. 122

Dilip B. Madan, Peter Carr & Eric C. Chang (1998), ‘The variance gamma process and option
pricing’, European Finance Review 2, 79–105. 124

Dutang, C. (2008), randtoolbox: Generating and Testing Random Numbers. 36

Eberlein, E. & Keller, U. (1995), ‘Hyperbolic distributions in finance.’, Bernoulli 1 pp. 281–299.
124

Eberlein, E., Keller, U. & Prause, K. (1998), ‘New insights into smile, mispricing and value at risk
measures.’, Journal of Business 71, 371–405. 124

Embrechts, P., Klüppelberg, C. & Mikosch, T. (1997), Modelling extremal events, Springer. 99,
100, 101, 114, 115

Feldmann, A. & Whitt, W. (1996), ‘Fitting mixtures of exponentials to long tail distributions to
analyze network performance models’, AT&T Laboratory Research . 129

Fergusson, K. & Platen, E. (2006), ‘On the distributional characterization of log-returns of a world
stock index’, Applied Mathematical Finance 13(1), 19–38. 124

Froot, K. A. & O’Connell, P. G. J. (2008), ‘On the pricing of intermediated risks: Theory and
application to catastrophe reinsurance’, Journal of banking & finance 32, 69–85. 95

Gomes, O., Combes, C. & Dussauchoy, A. (2008), ‘Parameter estimation of the generalized gamma
distribution’, Mathematics and Computers in Simulation 79, 955–963. 67

Haahtela, T. (2005), Extended binomial tree valuation when the underlying asset distribution is
shifted lognormal with higher moments. Helsinki University of Technology. 53

Haddow, J. E., Palomaki, G. E., Knight, G. J., Cunningham, G. C., Lustig, L. S. & Boyd, P. A.
(1994), ‘Reducing the need for amniocentesis in women 35 years of age or older with serum
markers for screening’, New England Journal of Medicine 330(16), 1114–1118. 11
BIBLIOGRAPHY 139

Johnson, N. L., Kotz, S. & Balakrishnan, N. (1994), Continuous univariate distributions, John
Wiley. 5

Jones, M. C. (2009), ‘Kumaraswamy’s distribution: A beta-type distribution with some tractability

advantages’, Statistical Methodology 6, 70. 45, 46

Kassberger, S. (2007), ‘Efficient Portfolio Optimization in the Multivariate Generalized Hyperbolic

Framework’, SSRN eLibrary . 125

Kassberger, S. & Kiesel, R. (2006), ‘A fully parametric approach to return modelling and risk
management of hedge funds’, Financial markets and portfolio management 20(4), 472–491. 124

Klugman, S. A., Panjer, H. H. & Willmot, G. (2004), Loss Models: From Data to Decisions, 2 edn,
Wiley, New York. 50, 68, 96, 104

Knuth, D. E. (2002), The Art of Computer Programming: seminumerical algorithms, Vol. 2, 3rd
edition edn, Massachusetts: Addison-Wesley. 15

Lee, S. C. & Lin, X. S. (2008), ‘Modeling and evaluating insurance losses via mixtures of erlang’,
North American Actuarial Journal . 129

Li, Q. & Yu, K. (2008), ‘Inference of non-centrality parameter of a truncated non-central chi-squared
distribution’, Journal of Statistical Planning and Inference . 79

Limpert, E., Stahel, W. A. & Abbt, M. (2001), ‘Log-normal distributions across the sciences: Keys
and clues’, Bioscience 51(5). 51

Matsumoto, M. & Nishimura, T. (1998), ‘Mersenne twister: A 623-dimensionnally equidistributed

uniform pseudorandom number generator’, ACM Trans. on Modelling and Computer Simulation
8(1), 3–30. 36

Mačutek, J. (2008), ‘A generalization of the geometric distribution and its application in quantita-
tive linguistics’, Romanian Reports in Physics 60(3), 501–509. 20

McNeil, A. J., Frey, R. & Embrechts, P. (2005a), Quantitative risk management: Concepts, tech-
niques and tools, Princeton University Press, Princeton. 123

McNeil, A. J., Frey, R. & Embrechts, P. (2005b), Quantitative risk management: Concepts, tech-
niques and tools, Princeton University Press, Princeton. 124

Meng, X.-L. & Rubin, D.-B. (1993), ‘Maximum likelihood estimation via the ECM algorithm: A
general framework’, Biometrika 80(2), 267–278. 122, 123

Moler, C. & Van Loan, C. (2003), ‘Nineteen dubious ways to compute the exponential of a matrix,
twenty-five years later’, SIAM review 45(1), 300. 144

Nadarajah, S. & Kotz, S. (2003), ‘A generalized beta distribution ii’, Statistics on the internet .
42, 43

Neuts, M. F. (1981), Generating random variates from a distribution of phase-type, in ‘Winter

Simulation Conference’. 129

Nolan, J. P. (2009), Stable Distributions - Models for Heavy Tailed Data, Birkhäuser, Boston. In
progress, Chapter 1 online at academic2.american.edu/∼jpnolan. 125
140 BIBLIOGRAPHY

Paolella, M. (2007), Intermediate probability: A computational approach, Wiley, Chichester. 118

Patard, P.-A. (2007), ‘Outils numeriques pour la simulation monte carlo des produits derives com-
plexes’, Bulletin français d’actuariat 7(14), 74–117. 49

Pickands, J. (1975), ‘Statistical inference using extreme order statistics’, Annals of Statistics 3, 119–
131. 90

Prause, K. (1999), The Generalized Hyperbolic Model: Estimation Financial Derivatives and Risk
Measures, PhD thesis, Universität Freiburg i. Br., Freiburg i. Br. 118

Raible, S. (2000), Lévy processes in finance: Theory, numerics, and empirical facts, PhD thesis,
Universität Freiburg i. Br. 124

Rytgaard, M. (1990), ‘Estimation in the pareto distribution’, Astin Bull. 20(2), 201–216. 93

Saporta, G. (1990), Probabilités analyse des données et statistique, Technip. 14

Saxena, K. M. L. & Alam, K. (1982), ‘Estimation of the non-centrality parameter of a chi-squared

distribution’, The Annals of Statistics 10(3), 1012–1016. 79

Simon, L. J. (1962), An introduction to the negative binomial distribution and its applications, in
‘Casualty Actuarial Society’, Vol. XLIX. 23

Singh, A. K., Singh, A. & Engelhardt, M. (1997), ‘The lognormal distribution in environmental
applications’, EPA Technology Support Center Issue . 51

Stein, W. E. & Keblis, M. F. (2008), ‘A new method to simulate the triangular distribution’,
Mathematical and computer modelling . 38

Tate, R. F. & Goen, R. L. (1958), ‘Minimum variance unbiased estimation for the truncated poisson
distribution’, The Annals of Mathematical Statistics 29(3), 755–765. 16, 17

Thomas, D. G. & Gart, J. J. (1971), ‘Small sample performance of some estimators of the truncated
binomial distribution’, Journal of the American Statistical Association 66(333). 12

Venter, G. (1983), Transformed beta and gamma distributions and aggregate losses, in ‘Casualty
Actuarial Society’. 67

Wimmer, G. & Altmann, G. (1999), Thesaurus of univariate discrete probability distributions,

STAMM Verlag GmbH Essen. 5

Yu, Y. (2009), ‘Complete monotonicity of the entropy in the central limit theorem for gamma and
inverse gaussian distributions’, Statistics and probability letters 79, 270–274. 53
Appendix A

Mathematical tools

A.1 Basics of probability theory

TODO

A.1.1 Characterising functions

For a discrete distribution, one may use the probability generating function to characterize the
distribution, if it exists or equivalently the moment generating function. For a continuous distri-
bution, we generally use only the moment generating function. The moment generating function is
linked to the Laplace transform of a distribution. When dealing with continuous distribution, we
also use the characteristic function, which is related to the Fourrier transform of a distribution, see
table below for details.

Probability generating Moment generating Laplace Characteristic Fourrier

function GX (z) function MX (t) <=> Transform LX (s) function φX (t) <=> transform

E zX E etX E e−sX E eitX E e−itX

<=> <=>

We have the following results

1 dk GX (t) dk GX (t)
• ∀k ∈ N, X discrete random variable , P (X = k) = k! dtk
|t=0 ; E(X . . . (X−k)) = dtk
|t=1

dk MX (t)
• ∀X continuous random variable E(X k ) = dtk
|t=0

141
142 APPENDIX A. MATHEMATICAL TOOLS

A.2 Common mathematical functions

In this section, we recall the common mathematical quantities used in all this guide. By definition,
we have

A.2.1 Integral functions

R +∞
• gamma function: ∀a > 0, Γ(a) = 0 xa−1 e−x dx
Rx
• incomplete gamma function: lower ∀a, x > 0, γ(a, x) = 0 y a−1 e−y dy and upper Γ(a, x) =
R +∞ a−1 −y
x y e dy;
√
• results for gamma function ∀n ∈ N? , Γ(n) = (n − 1)!, Γ(0) = 1, Γ( 12 ) = π, ∀a > 1, Γ(a) =
(a − 1)Γ(a − 1)
R1
• beta function: ∀a, b > 0, β(a, b) = 0 xa−1 (1 − x)b−1 dx,
Ru
• incomplete beta function ∀1 ≥ u ≥ 0, β(a, b, u) = 0 xa−1 (1 − x)b−1 dx;

Γ(a)Γ(b)
• results for beta function ∀a, b > 0, β(a, b) = Γ(a+b)

Γ0 (x)
• digamma function: ∀x > 0, ψ(x) = Γ(x)

Γ00 (x)
• trigamma function: ∀x > 0, ψ1 (x) = Γ(x)

Rx 2
• error function : erf(x) = √2
π 0 e−t dt

A.2.2 Factorial functions

• factorial : ∀n ∈ N, n! = n × (n − 1) . . . 2 × 1
Γ(n+m)
• rising factorial : ∀n, m ∈ N2 , m(n) = m × (m + 1) . . . (m + n − 2) × (m + n − 1) = Γ(n)

Γ(m)
• falling factorial: ∀n, m ∈ N2 , (m)n = m × (m − 1) . . . (m − n + 2) × (m − n + 1) = Γ(m−n)

• combination number : ∀n, p ∈ N2 , Cnp = n!

p!(n−p)!

• arrangement number Apn = n!

(n−p)!

• Stirling number of the first kind : coefficients 1 Snk of the expansion of (x)n = nk=0 1 Snk xk or
P
k−1
defined by the recurrence 1 Snk = (n − 1) × 1 Sn−1
k + 1 Sn−1 with 1 Sn0 = δn0 and 1 S01 = 0.

• Stirling number of the second kind : coefficients 2 Snk of the expansion nk=0 2 Snk (x)k = xn or
P
k−1
defined by the recurrence 2 Snk = 2 Sn−1 k
+ k × 2 Sn−1 with 2 Sn1 = 2 Snn = 1.
A.3. MATRIX EXPONENTIAL 143

A.2.3 Serie functions

+∞
• Riemann’s zeta function : ∀s > 1, ζ(s) = 1
P
ns
n=1

+∞
zn
• Jonquière’s function : ∀s > 1, ∀z > 0, Lis (z) =
P
ns
n=1

+∞
a(n) z n
• hypergeometric function : ∀a, b, c ∈ N, ∀z ∈ R, 1 F1 (a, b, z) =
P
b(n) n!
, 2 F1 (a, b, c, z) =
n=0
+∞ +∞
P a(n) b(n) z n P a(n) b(n) c(n) z n
c(n) n!
and 3 F1 (a, b, c, d, e, z) = d(n) e(n) n!
.
n=0 n=0

• Bessel’s functions verify the following ODE: x2 y 00 + xy 0 + (x2 − α2 )y = 0. We define the Bessel
∞
(−1)n x 2n+α
function of the 1st kind by Jα (x) = and of the 2nd kind Yα (x) =
P
n!Γ(n+α+1) 2
n=0
Jα (x)cos(απ)−J−α (x)
sin(απ) .
(1)
• Hankel’s function: Hα (x) = Jα (x) + iYα (x)
+∞
P (x/2)2k+α π α+1 (1)
• Bessel’s modified function Iα (x) = i−α Jα (ix) = k!Γ(α+k+1) and Kα (x) = 2 i Hα (x) =
k=0
1 ∞ α−1 − x2 (y+y −1 )
R
2 0 y e dy
x n x xn ) i
• Laguerre’s polynomials: Ln (x) = en! d (e = ni=0 (−1)i Cnn−i xi!
P
dxn
(α) ex dn (ex xn+α ) Pn i n−i xi
• generalized Laguerre’s polynomials: Ln (x) = n!xα dxn = i=0 (−1) Cn+α i!

A.2.4 Miscellanous

+∞ si x = x0
• Dirac function: ∀x > 0, δx0 (x) = et
0 sinon

 0
si x < x0
• heavyside function : Hx0 (x) = 1
si x = x0
2
1 sinon



 x si n=0
1
n 6= 0 et 0 ≥ x ≥ 31

2 n−1 (3x)
F si

• Cantor function : ∀x ∈ [0, 1], Fn (x) = 1

 2 si n 6= 0 et 13 ≥ x ≥ 23
 1 1
2 + 2 F n−1 (3(x − 23 )) si n 6= 0 et 23 ≥ x ≥ 1

A.3 Matrix exponential

Now let us consider the problem of computing eQu . We recall that

+∞
X Qn un
eQu = .
n!
n=0
144 APPENDIX A. MATHEMATICAL TOOLS

There are various methods to compute the matrix exponential, Moler & Van Loan (2003) makes a
deep analysis of the efficiency of different methods. In our case, we choose a decomposition method.
We diagonalize the n × n matrix Q and use the identity

eQu = P eDu P −1 ,

where D is a diagonal matrix with eigenvalues on its diagonal and P the eigenvectors. We compute
m
X
eQu = eλl u P Ml P −1 ,
| {z }
l=1 Cl

where λi stands for the eigenvalues of Q, P the eigenvectors and Ml = (δil δlj )ij (δij is the symbol
Kronecker, i.e. equals to zero except when i = j). As the matrix Ml is a sparse matrix with just
a 1 on the lth term of its diagonal. The constant Ci can be simplified. Indeed, if we denote by Xl
the lth column of the matrix P (i.e. the eigenvector associated to the eigenvalue λl ) and Yl the lth
row of the matrix P −1 , then we have
4
Cl = P Ml P −1 = Xl ⊗ Yl .

Despite Q is not obligatorily diagonalizable, this procedure will often work, since Q may have
a complex eigenvalue (say λi ). In this case, Ci is complex but as eQu is real, we are ensured there
is j ∈ [[1, . . . , m]], such that λj is the conjugate of λl . Thus, we get

eλi u Ci + eλj u Cj = 2cos(=(λi )u)e<λi u <(Xi ⊗ Yi ) − 2sin(=(λi )u)e<λi u =(Xi ⊗ Yi ) ∈ R,

where < and = stands resp. for the real and the imaginary part.

A.4 Kronecker product and sum

The Kronecker product A ⊗ B is defined as the mn × mn matrix

A ⊗ B = (Ai1 ,j1 Bi2 ,j2 )i1 i2 ,j1 j2 ,

when A is a m × m matrix of general term (Ai1 ,j1 )i1 ,j1 and B a n × n matrix of general term
(Bi2 ,j2 )i2 ,j2 . Note that the Kronecker can also be defined for non-square matrixes.

The Kronecker sum A ⊕ B is given by the mn × mn matrix

A ⊗ B = A ⊗ Im + B ⊗ In ,

where Im and In are the identity matrixes of size m and n. This definition is right only for square
matrixes A and B.
Contents

Introduction 4

I Discrete distributions 6

1 Classic discrete distribution 7

1.1 Discrete uniform distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.1.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2 Bernoulli/Binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.2.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.2.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.3 Zero-truncated or zero-modified binomial distribution . . . . . . . . . . . . . . . . . 11

1.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

145
146 CONTENTS

1.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.3.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.3.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.3.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.4 Quasi-binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.4.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.4.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.5 Poisson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.5.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.5.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.5.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.5.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.5.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.6 Zero-truncated or zero-modified Poisson distribution . . . . . . . . . . . . . . . . . . 15

1.6.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.6.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.6.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.6.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.6.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.7 Quasi-Poisson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.7.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
CONTENTS 147

1.7.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.7.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.7.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.7.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.8 Geometric distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.8.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.8.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

1.8.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

1.8.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

1.8.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.9 Zero-truncated or zero-modified geometric distribution . . . . . . . . . . . . . . . . . 20

1.9.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.9.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.9.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.9.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.9.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.10 Negative binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.10.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.10.2 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.10.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

1.10.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

1.10.5 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

1.10.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

1.11 Zero-truncated or zero-modified negative binomial distribution . . . . . . . . . . . . 24

148 CONTENTS

1.11.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

1.11.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

1.11.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

1.11.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.11.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.12 Pascal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.12.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.12.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.12.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.12.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.12.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.13 Hypergeometric distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.13.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.13.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.13.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.13.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.13.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2 Not so-common discrete distribution 27

2.1 Conway-Maxwell-Poisson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.1.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
CONTENTS 149

2.2 Delaporte distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.2.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.2.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.2.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.2.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.3 Engen distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.3.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.3.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.3.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.4 Logaritmic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.4.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.4.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.5 Sichel distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.5.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.5.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.5.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.5.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.5.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
150 CONTENTS

2.6 Zipf distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.6.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.6.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.6.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.6.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.6.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.7 The generalized Zipf distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.7.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.7.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.7.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.7.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.7.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.8 Rademacher distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.8.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.8.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.8.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.8.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.8.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.9 Skellam distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.9.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.9.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.9.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.9.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.9.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
CONTENTS 151

2.10 Yule distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.10.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.10.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.10.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.10.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.10.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.11 Zeta distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.11.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.11.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.11.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.11.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.11.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

II Continuous distributions 34

3 Finite support distribution 35

3.1 Uniform distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.1.4 Random number generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.2 Triangular distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.2.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
152 CONTENTS

3.2.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.2.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.2.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.3 Beta type I distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.3.2 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.3.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.3.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.3.5 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.3.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.4 Generalized beta I distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.4.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.4.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.5 Generalization of the generalized beta I distribution . . . . . . . . . . . . . . . . . . 42

3.5.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.5.2 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.5.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.5.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.5.5 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.5.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.6 Kumaraswamy distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

CONTENTS 153

3.6.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.6.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.6.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.6.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.6.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4 The Gaussian family 47

4.1 The Gaussian (or normal) distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.1.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.2 Log normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.2.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.2.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.3 Shifted log normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.3.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.3.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.3.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
154 CONTENTS

4.4 Inverse Gaussian distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.4.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.4.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.5 The generalized inverse Gaussian distribution . . . . . . . . . . . . . . . . . . . . . . 54

4.5.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.5.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.5.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.5.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

5 Exponential distribution and its extensions 56

5.1 Exponential distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5.1.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5.1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5.2 Shifted exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5.2.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5.2.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5.2.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.2.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
CONTENTS 155

5.3 Inverse exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.3.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.3.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.3.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.4 Gamma distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.4.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.4.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.5 Generalized Erlang distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.5.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.5.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.5.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.5.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.5.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.6 Chi-squared distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.7 Inverse Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.7.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.7.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.7.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.7.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

156 CONTENTS

5.7.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.8 Transformed or generalized gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.8.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.8.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.8.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.8.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.8.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.9 Inverse transformed Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.9.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.9.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.9.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.9.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.9.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.10 Log Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.10.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.10.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.10.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.10.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.10.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.11 Weibull distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.11.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.11.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.11.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.11.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

CONTENTS 157

5.11.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.12 Inverse Weibull distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.12.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.12.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5.12.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5.12.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5.12.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5.13 Laplace or double exponential distribution . . . . . . . . . . . . . . . . . . . . . . . . 72

5.13.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5.13.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.13.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.13.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.13.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

6 Chi-squared’s ditribution and related extensions 75

6.1 Chi-squared distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

6.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

6.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

6.1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

6.1.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

6.1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

6.2 Chi distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

6.2.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

6.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

6.2.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
158 CONTENTS

6.2.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

6.2.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

6.3 Non central chi-squared distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

6.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

6.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

6.3.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

6.3.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

6.3.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

6.4 Non central chi distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6.4.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6.4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6.4.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6.4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

6.5 Inverse chi-squared distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

6.5.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

6.5.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.5.3 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.5.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.6 Scaled inverse chi-squared distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.6.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.6.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.6.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.6.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

CONTENTS 159

6.6.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

7 Student and related distributions 84

7.1 Student t distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

7.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

7.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

7.1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

7.1.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

7.1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

7.2 Cauchy distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

7.2.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

7.2.2 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

7.2.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

7.2.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

7.2.5 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7.2.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7.3 Fisher-Snedecor distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7.3.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7.3.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7.3.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

8 Pareto family 88

8.1 Pareto distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

8.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
160 CONTENTS

8.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

8.1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

8.1.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

8.1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

8.2 Feller-Pareto distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

8.2.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

8.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

8.2.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

8.2.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

8.2.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

8.3 Inverse Pareto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

8.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

8.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

8.3.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

8.3.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

8.3.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

8.4 Generalized Pareto distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

8.4.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

8.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

8.4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

8.4.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

8.4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

8.5 Burr distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

8.5.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

CONTENTS 161

8.5.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

8.5.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

8.5.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

8.5.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

8.6 Inverse Burr distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

8.6.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

8.6.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

8.6.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

8.6.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

8.6.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

8.7 Beta type II distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

8.7.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

8.7.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

8.7.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

8.7.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

8.7.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

9 Logistic distribution and related extensions 108

9.1 Logistic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

9.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

9.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

9.1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

9.1.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

9.1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.2 Half logistic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

162 CONTENTS

9.2.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.2.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.2.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.2.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.3 Log logistic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.3.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.3.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.3.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.4 Generalized log logistic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.4.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.4.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.5 Paralogisitic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.5.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.5.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.5.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.5.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.5.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.6 Inverse paralogistic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

CONTENTS 163

9.6.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.6.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.6.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.6.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.6.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

10 Extrem Value Theory distributions 111

10.1 Gumbel distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

10.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

10.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

10.1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

10.1.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

10.1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

10.2 Fréchet distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

10.3 Weibull distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

10.4 Generalized extreme value distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 113

10.4.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

10.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

10.4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

10.4.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

10.4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

10.5 Generalized Pareto distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

III Multivariate and generalized distributions 116

11 Generalization of common distributions 117

164 CONTENTS

11.1 Generalized hyperbolic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

11.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

11.1.2 Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

11.1.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

11.1.4 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

11.1.5 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

11.1.6 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

11.1.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

11.2 Stable distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

11.2.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

11.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

11.2.3 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

11.2.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

11.2.5 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

11.2.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

11.3 Phase-type distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

11.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

11.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

11.3.3 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

11.3.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

11.3.5 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

11.3.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

11.4 Exponential family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

11.4.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

CONTENTS 165

11.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

11.4.3 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

11.4.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

11.4.5 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

11.4.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

11.5 Elliptical distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

11.5.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

11.5.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

11.5.3 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

11.5.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

11.5.5 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

11.5.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

12 Multivariate distributions 133

12.1 Multinomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

12.2 Multivariate normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

12.3 Multivariate elliptical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

12.4 Multivariate uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

12.5 Multivariate student . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

12.6 Kent distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

12.7 Dirichlet distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

12.7.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

12.7.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

12.7.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

12.7.4 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

166 CONTENTS

12.7.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

12.8 Von Mises Fisher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

12.9 Evens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

13 Misc 135

13.1 MBBEFD distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

13.1.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

13.1.2 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

13.1.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

13.1.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

13.1.5 Random generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

13.1.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

13.2 Cantor distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

13.3 Tweedie distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

Conclusion 137

Bibliography 137

A Mathematical tools 141

A.1 Basics of probability theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

A.1.1 Characterising functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

A.2 Common mathematical functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

A.2.1 Integral functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

A.2.2 Factorial functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

A.2.3 Serie functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

A.2.4 Miscellanous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

CONTENTS 167

A.3 Matrix exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

A.4 Kronecker product and sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

Bernoulli Distribution
No ratings yet
Bernoulli Distribution
5 pages
Lab Test (Q) Questions
No ratings yet
Lab Test (Q) Questions
2 pages
Lecture Note SGD
No ratings yet
Lecture Note SGD
4 pages
Slides Module 4 Lesson 2
No ratings yet
Slides Module 4 Lesson 2
34 pages
Poisson Regression Analysis 136
No ratings yet
Poisson Regression Analysis 136
8 pages
Lect_08 A Bernoulli Distributions
No ratings yet
Lect_08 A Bernoulli Distributions
68 pages
Statistics 5
No ratings yet
Statistics 5
3 pages
Lecture 7
No ratings yet
Lecture 7
26 pages
Navidi_ch4 (1)
No ratings yet
Navidi_ch4 (1)
85 pages
Common Probalility Built in Functions
No ratings yet
Common Probalility Built in Functions
68 pages
Lecture-2
No ratings yet
Lecture-2
90 pages
Probability 2
No ratings yet
Probability 2
67 pages
Ch3 Random Variables
No ratings yet
Ch3 Random Variables
27 pages
Mansi Bharne 16/10: Unit Viii: Correlation and Regression
No ratings yet
Mansi Bharne 16/10: Unit Viii: Correlation and Regression
5 pages
Notes On Unit 3
No ratings yet
Notes On Unit 3
42 pages
Standard Discrete Probability Distribution
No ratings yet
Standard Discrete Probability Distribution
25 pages
Common Probability Distributionsi Math 217/218 Probability and Statistics
No ratings yet
Common Probability Distributionsi Math 217/218 Probability and Statistics
10 pages
MAS202 - Homework For Chapter 13-14
No ratings yet
MAS202 - Homework For Chapter 13-14
7 pages
Chapter 4
No ratings yet
Chapter 4
27 pages
2. Discrete Distributions
No ratings yet
2. Discrete Distributions
21 pages
Statistical Foundations: SOST70151 - LECTURE 2
No ratings yet
Statistical Foundations: SOST70151 - LECTURE 2
43 pages
A
No ratings yet
A
3 pages
Slide - 8 - 04 - Minimum Mean Square Estimation
No ratings yet
Slide - 8 - 04 - Minimum Mean Square Estimation
33 pages
ISLR Chap 4 Shaheryar
No ratings yet
ISLR Chap 4 Shaheryar
16 pages
استدلال احصائي
No ratings yet
استدلال احصائي
110 pages
Statistical Analysis in Model Building
No ratings yet
Statistical Analysis in Model Building
9 pages
Topic: Count Data Analysis (Poisson: Regression)
No ratings yet
Topic: Count Data Analysis (Poisson: Regression)
35 pages
Bernoulli Distribution - Wikipedia
No ratings yet
Bernoulli Distribution - Wikipedia
20 pages
Probability Distributions
No ratings yet
Probability Distributions
167 pages
Probability Gatre
No ratings yet
Probability Gatre
3 pages
Bernoulli_and_binomial_distribution
No ratings yet
Bernoulli_and_binomial_distribution
39 pages
The Simple Linear Regression Model: Specification and Estimation
No ratings yet
The Simple Linear Regression Model: Specification and Estimation
66 pages
Chapter 10 - Logistic Regression: Data Mining For Business Intelligence
No ratings yet
Chapter 10 - Logistic Regression: Data Mining For Business Intelligence
20 pages
Theoretical Distributions: Bernoulli's Distribution Binomial Distributions
No ratings yet
Theoretical Distributions: Bernoulli's Distribution Binomial Distributions
35 pages
p3_sol
No ratings yet
p3_sol
40 pages
28-10-2021
No ratings yet
28-10-2021
15 pages
Discrete Probability Distributions
No ratings yet
Discrete Probability Distributions
25 pages
The Problem of Modeling Rare Events in ML-based Logistic Regression - Assessing Potential Remedies Via MC Simulations
No ratings yet
The Problem of Modeling Rare Events in ML-based Logistic Regression - Assessing Potential Remedies Via MC Simulations
20 pages
Slides - Module 2 - Lesson 2
No ratings yet
Slides - Module 2 - Lesson 2
23 pages
Lesson - 11
No ratings yet
Lesson - 11
50 pages
Chapter 3 - Correlation & Regression
No ratings yet
Chapter 3 - Correlation & Regression
19 pages
Mack-Chain-Ladder-Sloma(1)
No ratings yet
Mack-Chain-Ladder-Sloma(1)
23 pages
Slides-Probability and Random Processes, 4, March 2024
No ratings yet
Slides-Probability and Random Processes, 4, March 2024
116 pages
CSD502 Standard Probability Dist.docx
No ratings yet
CSD502 Standard Probability Dist.docx
15 pages
1) - BERNOULLI'S-WPS Office
No ratings yet
1) - BERNOULLI'S-WPS Office
30 pages
Stepwiseselection MATTOUHI AICHA
No ratings yet
Stepwiseselection MATTOUHI AICHA
7 pages
GED102-1 Multiple and Nonlinear Regression Excel
No ratings yet
GED102-1 Multiple and Nonlinear Regression Excel
20 pages
Study note chap 3
No ratings yet
Study note chap 3
32 pages
Prob. Distri.
No ratings yet
Prob. Distri.
36 pages
Lecture 4
No ratings yet
Lecture 4
28 pages
Chapter 4: Probability Distributions: 4.1 Random Variables
100% (1)
Chapter 4: Probability Distributions: 4.1 Random Variables
53 pages
Discrete Probability Distribution
No ratings yet
Discrete Probability Distribution
24 pages
Tài liệu 5
No ratings yet
Tài liệu 5
19 pages
Bernoulli distribution [Autosaved]
No ratings yet
Bernoulli distribution [Autosaved]
13 pages
Lecture 4 - Fall 2023
No ratings yet
Lecture 4 - Fall 2023
29 pages
Chapter 2
No ratings yet
Chapter 2
8 pages
Lecture 3 - Probability - BMSLec02
No ratings yet
Lecture 3 - Probability - BMSLec02
16 pages
Notes ch2 Discrete and Continuous Distributions
No ratings yet
Notes ch2 Discrete and Continuous Distributions
48 pages
Lecture Note 1
No ratings yet
Lecture Note 1
5 pages
18 Simultaneous Equation Models Two Stage Least Squares Estimation
No ratings yet
18 Simultaneous Equation Models Two Stage Least Squares Estimation
6 pages
Module 5 Common Discrete Probability Distribution - Latest
No ratings yet
Module 5 Common Discrete Probability Distribution - Latest
45 pages
Distribution PPT
No ratings yet
Distribution PPT
75 pages
Chapter 2
No ratings yet
Chapter 2
25 pages
UNIT 4
No ratings yet
UNIT 4
30 pages
Beyene Stat For Management II Chapter 2
No ratings yet
Beyene Stat For Management II Chapter 2
21 pages
Unit 03 - Random Variables - 1 Per Page
No ratings yet
Unit 03 - Random Variables - 1 Per Page
47 pages
Applied Statistics and Probability For Engineers, 5th Edition
75% (4)
Applied Statistics and Probability For Engineers, 5th Edition
23 pages
Block 3
No ratings yet
Block 3
63 pages
Common Probability Distributions: D. Joyce, Clark University Aug 2006
No ratings yet
Common Probability Distributions: D. Joyce, Clark University Aug 2006
9 pages
LectureNotes5 PDF
No ratings yet
LectureNotes5 PDF
28 pages
Arx Models
No ratings yet
Arx Models
14 pages
Cars
No ratings yet
Cars
103 pages
Confidence Interval Estimation
100% (1)
Confidence Interval Estimation
31 pages
MetNum1 2023 1 Week 12
No ratings yet
MetNum1 2023 1 Week 12
61 pages
Linear Regression Analysis in SPSS Statistics
No ratings yet
Linear Regression Analysis in SPSS Statistics
7 pages
Topic 5 Discrete Distributions
No ratings yet
Topic 5 Discrete Distributions
30 pages
Common Probability Distributions: 1.1 Bernoulli Distribution
No ratings yet
Common Probability Distributions: 1.1 Bernoulli Distribution
6 pages
The Detection of Earnings Manipulation by Messod D Beneish (1999)
No ratings yet
The Detection of Earnings Manipulation by Messod D Beneish (1999)
27 pages
Chap 3
No ratings yet
Chap 3
18 pages
Commonly Used Probability Distribution - SHORT
No ratings yet
Commonly Used Probability Distribution - SHORT
26 pages
Probdist Ref
No ratings yet
Probdist Ref
256 pages
Probability & Statistics 2: AS2110 / MA3666
No ratings yet
Probability & Statistics 2: AS2110 / MA3666
32 pages
Probability Space and Random Variable Proporties
No ratings yet
Probability Space and Random Variable Proporties
21 pages
ch11 Heteroscedasticity
No ratings yet
ch11 Heteroscedasticity
31 pages
Unit-8 IGNOU STATISTICS
No ratings yet
Unit-8 IGNOU STATISTICS
15 pages
Statistics
No ratings yet
Statistics
167 pages
Probability Distribution: Shreya Kanwar (16eemme023)
No ratings yet
Probability Distribution: Shreya Kanwar (16eemme023)
51 pages
FMOLS Model
No ratings yet
FMOLS Model
8 pages
Estimating Number of PPQ Batches: Various Approaches: Journal of Pharmaceutical Innovation
No ratings yet
Estimating Number of PPQ Batches: Various Approaches: Journal of Pharmaceutical Innovation
9 pages
Worked Examples in Mathematics for Scientists and Engineers
From Everand
Worked Examples in Mathematics for Scientists and Engineers
G. Stephenson
No ratings yet