Lecture 3

P1.T2.

Quantitative Analysis

Miller, Mathematics & Statistics for Financial Risk


Management

Distributions

Bionic Turtle FRM Video Tutorials


By David Harper, CFA FRM
Distributions
• Describe the key properties of the uniform, Bernoulli, binomial, Poisson,
normal, lognormal, and chi-squared distributions, and of the Student’s t and
F-distributions, and identify common occurrences of each distribution.
• Describe the central limit theorem and
the implications it has when combining
i.i.d. random variables.
• Describe the properties of independent
and identically distributed (i.i.d.) random
variables.
• Describe a mixture distribution and
explain the creation and characteristics
of mixture distributions.

Page 2
Describe the key properties of the uniform distribution, Bernoulli
distribution & Binomial distribution and identify common occurrences
of each distribution.
Uniform distribution

If the random variable, X, is discrete, then the uniform distribution is given by the following probability mass function (pmf):

P[X = x(i)] = 1/n, for i = 1, 2, …, n

This is an extremely simple distribution. Common examples of discrete uniform distributions are:
• A coin, where n = 2, such that the probability P[heads] = 1/2 and P[tails] = 1/2; or
• A six-sided die, where the probability of rolling a one is P[rolling a one] = 1/6

Page 3
Describe the key properties of the uniform distribution, Bernoulli
distribution & Binomial distribution and identify common occurrences
of each distribution (continued)
Uniform distribution
If the random variable, X, is continuous, the uniform distribution
is given by the following probability density function (pdf):

f(x) = 1/(b − a) for a ≤ x ≤ b
f(x) = 0 for x < a or x > b

The mean, µ, is calculated as the average of the start and end values of the
distribution. Similarly, the variance, σ², is calculated as shown below.

µ = (a + b)/2
σ² = (b − a)²/12

The uniform distribution is characterized by the following cumulative
distribution function (CDF):

P[X ≤ x] = (x − a)/(b − a) for a ≤ x ≤ b
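These formulas can be checked with a minimal Python sketch; the endpoints a = 2 and b = 10 are illustrative choices, not values from the slides:

```python
# Continuous uniform on [a, b]: mean, variance, and CDF.
a, b = 2.0, 10.0

mu = (a + b) / 2            # mean = average of endpoints
var = (b - a) ** 2 / 12     # variance = (b - a)^2 / 12

def uniform_cdf(x, a, b):
    """CDF: P[X <= x] = (x - a)/(b - a) on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)
```

For a = 2 and b = 10, the mean is 6 and P[X ≤ 6] = 0.5, as symmetry requires.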

Page 4
Describe the key properties of the uniform distribution, Bernoulli
distribution & Binomial distribution and identify common occurrences
of each distribution (continued)
Bernoulli distribution
A random variable X is called Bernoulli distributed with parameter (p) if it has
only two possible outcomes, often encoded as 1 (“success” or “survival”) or 0
(“failure” or “default”), and if the probability for realizing “1” equals p and the
probability for “0” equals 1 – p. The classic example for a Bernoulli-distributed
random variable is the default event of a company.

A Bernoulli variable is discrete and has two possible outcomes:

X = 1 with probability p
X = 0 with probability 1 − p

Page 5
Describe the key properties of the uniform distribution, Bernoulli
distribution & Binomial distribution and identify common occurrences
of each distribution (continued)
Binomial distribution
A binomially distributed random variable is the sum of (n) independent and
identically distributed (i.i.d.) Bernoulli-distributed random variables. The
probability of observing (k) successes is:

P[X = k] = C(n, k) × p^k × (1 − p)^(n − k), where C(n, k) = n!/(k!(n − k)!)

• The mean of this random variable is np, and the variance of a binomial
distribution is np(1 − p).
• The exhibit here shows the binomial distribution with p = 0.10, for
n = 10, 50, and 100.
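The pmf and moments above can be sketched in a few lines of Python, using the slide’s parameters p = 0.10 and n = 10:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P[X = k] = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.10
mean = n * p                 # np: expected number of successes
variance = n * p * (1 - p)   # np(1 - p)
```

With p = 0.10 and n = 10, the mean is 1.0 and the variance 0.9, matching the formulas on the slide.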

Page 6
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Poisson distribution
A Poisson-distributed random variable is used to describe
the random number of events occurring over a certain
time interval. The Poisson distribution depends upon only
one parameter, lambda λ, and can be interpreted as an
approximation to the binomial distribution. The lambda
parameter (λ) indicates the rate of occurrence of the
random events.

If the rate at which events occur over time is constant, and the probability of any one
event occurring is independent of all other events, then the events follow a Poisson
process, where t is the amount of time elapsed (i.e, the expected number of events
before time t is equal to λt):

P[N = n] = (λt)^n × e^(−λt) / n!
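The Poisson pmf is easy to sketch in Python; the choice λ = 4 (with t = 1) is illustrative, not from the slides:

```python
from math import exp, factorial

def poisson_pmf(n, lam, t=1.0):
    """P[N = n] = (lam*t)^n * e^(-lam*t) / n!"""
    return (lam * t) ** n * exp(-lam * t) / factorial(n)

lam = 4.0
# The expected number of events before time t = 1 is lam * t = lam.
expected = sum(n * poisson_pmf(n, lam) for n in range(100))
```

Summing n × P[N = n] recovers λ, consistent with λt being the expected event count.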

Page 7
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Poisson distribution
In Poisson, lambda is both the expected value (the mean) and the
variance!
The exhibit below represents a Poisson distribution for λ = 2, 4 and 10.

In the Poisson distribution, the random number of events that occur during an
interval of time (e.g., losses/year, failures/day) is given by:

P[N = n] = λ^n × e^(−λ) / n!

Page 8
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Normal distribution
The normal or Gaussian distribution is often referred to as
the bell curve because of the shape of its probability density
function. Characteristics of the normal distribution include:

• The middle of the distribution, mu (µ), is the mean (and


median). This first moment is also called the “location”.
• Standard deviation and variance are measures of dispersion (a.k.a., shape).
Variance is the second moment; typically, variance is denoted by sigma-squared
(σ²), such that standard deviation is sigma (σ).
• The distribution is symmetric around µ. In other words, the normal has
skewness = 0
• The normal has kurtosis = 3 or “excess kurtosis” = 0

Page 9
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Normal distribution
Properties of normal distribution:
• Location-scale invariance: Imagine random variable X, which is
normally distributed with the parameters µ and σ. Now consider
random variable Y, which is a linear function of X, such that: Y = aX
+ b. In general, the distribution of Y might substantially differ from
the distribution of X, but in the case where X is normally distributed,
the random variable Y is again normally distributed, with parameters
mean (= a*µ + b) and variance (= a²*σ²). Specifically, we do not
leave the class of normal distributions if we multiply the
random variable by a factor or shift the random variable.

• Summation stability: If you take the sum of several independent random


variables, which are all normally distributed with mean (µi) and standard
deviation (σi), then the sum will be normally distributed again.

• The normal distribution possesses a domain of attraction. The central limit


theorem (CLT) states that—under certain technical conditions—the distribution
of a large sum of random variables behaves necessarily like a normal
distribution. The normal distribution is not the only class of probability
distributions having a domain of attraction. Actually, three classes of
distributions have this property: they are called stable distributions.

Page 10
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Normal distribution
Here we show an exhibit of a normal distribution for µ = 10 at
various levels of σ (1, 2 and 3).

For a random variable X, the probability density function for the
normal distribution is:

f(x) = 1/(σ√(2π)) × exp(−(x − µ)²/(2σ²))

Conventionally, this is written as: X is normally distributed with a mean of µ and
variance of σ²:

X ~ N(µ, σ²)
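The density can be cross-checked against Python’s stdlib NormalDist; µ = 10 and σ = 2 mirror the exhibit:

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 10.0, 2.0
peak = normal_pdf(mu, mu, sigma)           # density is highest at the mean
reference = NormalDist(mu, sigma).pdf(mu)  # stdlib cross-check
```

Note the density is symmetric around µ, consistent with the normal's zero skew.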

Page 11
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Normal distribution
Why is the normal distribution commonplace?

The central limit theorem (CLT) says that sampling distribution of sample
means tends to be normal regardless of the shape of the underlying distribution;
this explains much of the “popularity” of the normal distribution.

The normal is economical (elegant) because it only requires two parameters


(mean and variance). The standard normal is even more economical: it
requires no parameters.

The normal is tractable: it is easy to manipulate (especially in regard to closed-


form equations like the Black-Scholes)

Parsimony: It requires (or is fully described by) only two parameters: mean and
variance

Page 12
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Normal distribution
Standard Normal distribution
A normal distribution is fully specified by two parameters,
mean and variance (or standard deviation). We can
transform a normal into a unit or standardized variable:
• Standard normal has mean = 0, and variance = 1
• No parameters required!

This unit or standardized variable, Z = (X − µ)/σ, is normally distributed with
zero mean and variance of one. Its standard deviation is also one (variance = 1.0
and standard deviation = 1.0).

This can be written as:

Z ~ N(0, 1)

Page 13
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Normal distribution
Standard normal distribution: Critical Z values

Key locations on the normal distribution are noted here. In the FRM curriculum,
the choice of one-tailed 5% significance and 1% significance (i.e., 95% and 99%
confidence) is common, so please pay particular attention to those two rows.

Critical z value   Two-sided confidence   One-sided significance
1.00               ~ 68%                  ~ 15.87%
1.645 (~1.65)      ~ 90%                  ~ 5.0%
1.96               ~ 95%                  ~ 2.5%
2.327 (~2.33)      ~ 98%                  ~ 1.0%
2.58               ~ 99%                  ~ 0.5%

Memorize the two common critical values: 1.65 and 2.33. These correspond to
confidence levels, respectively, of 95% and 99% for a one-tailed test. For VaR,
the one-tailed test is relevant because we are concerned only about losses (left
tail), not gains (right tail).
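The critical values in the table can be reproduced with the inverse CDF (quantile function) in Python’s stdlib, a sketch of what the lookup table encodes:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sigma 1

one_tailed_95 = z.inv_cdf(0.95)    # ~1.645: the 5% one-tailed critical value
one_tailed_99 = z.inv_cdf(0.99)    # ~2.33: the 1% one-tailed critical value
two_tailed_95 = z.inv_cdf(0.975)   # ~1.96: two-tailed 95% (2.5% per tail)
```

This is the same operation as the "critical one-tailed value" lookups described later for the Student's t.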

Page 14
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Normal distribution
Multivariate normal distributions

Normal can be generalized to a joint distribution of normal; e.g., bivariate


normal distribution. Properties include:

• If X and Y are bivariate normal, then aX + bY is normal; indeed, any linear
combination is normal.
• If a set of variables has a multivariate normal distribution, the marginal
distribution of each is normal.
• If variables with a multivariate normal distribution have covariances that equal
zero, then the variables are independent.

Page 15
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Examples
In the FRM, four distributions are quite common:
• The Bernoulli is invoked when there are only two outcomes. It is
used to characterize a default: an obligor or bond will either default
or survive. Most bonds “survive” each year, until perhaps one year
they default. At any given point in time, or during any given year,
the bond will be in one of two states.

• The binomial is a series of independent and identically distributed (i.i.d.) Bernoulli


variables, such that the binomial is commonly used to characterize a portfolio of
credits.

• The normal distribution is the most common:


 Typically, the central limit theorem (CLT) will justify the significance test of the
sample average in a large sample - for example, to test the sample average
asset return or excess return.
 In many cases, the normal distribution is employed to model equity returns for
short horizons; this is an assumption made with the understanding that it may
not be realistic.

Page 16
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Examples

• The Poisson distribution has two very common purposes:
 Poisson is often used, as a generic stochastic process, to model the time of
default in some credit risk models.
 As a discrete distribution, the Poisson is arguably the most common
distribution employed for operational loss frequency.

                Normal   Binomial        Poisson
Mean            µ        np              λ
Variance        σ²       np(1 − p)       λ
Standard Dev.   σ        √(np(1 − p))    √λ

Page 17
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Lognormal
The lognormal is common in finance: If an asset return (r) is normally distributed,
the continuously compounded future asset price level (or ratio of prices; i.e., the
wealth ratio) is lognormal. Expressed in reverse, if a variable is lognormal, its natural
log is normal. Here is an exhibit of the lognormal distribution for µ = 10 at various
levels of σ (0.25, 0.5 and 1).

Page 18
Describe the key properties of the Poisson distribution, normal
distribution & lognormal distribution and identify common
occurrences of each distribution (continued)
Lognormal
The lognormal distribution is extremely common in finance because it is often the
distribution assumed for asset prices (e.g., stock prices). Specifically, it is common to
assume that log (i.e., continuously compounded) asset returns are normally
distributed such that, by definition, asset prices have a lognormal distribution.

The density function of the lognormal distribution is given by:

f(x) = 1/(xσ√(2π)) × exp(−(ln x − µ)²/(2σ²)), for x > 0
Page 19
Describe the key properties of the Chi-squared distribution,
Student’s t and F-distributions, and identify common occurrences of
each distribution.
Chi-squared

Chi-squared distribution is the sum of the squares of k independent


standard normal random variables. The variable k is referred to as the
degrees of freedom. The exhibit below shows the probability density functions
for some chi-squared distributions with different values of k (1, 2 and 3).

Page 20
Describe the key properties of the Chi-squared distribution,
Student’s t and F-distributions, and identify common occurrences of
each distribution (continued)
Chi-squared

Properties of the chi-squared distribution include:


• Nonnegative (>0), since it is a sum of squared values.
• Skewed right, but as d.f. increases it approaches
normal.
• Expected value (mean) = k and variance = 2k, where k
= degrees of freedom.
• The sum of two independent chi-square variables is
also a chi-squared variable.

Using a chi-square distribution, we can observe a sample variance and
compare it to a hypothetical population variance. The variable
(n − 1)s²/σ² has a chi-square distribution with (n − 1) d.f.

Page 21
Describe the key properties of the Chi-squared distribution,
Student’s t and F-distributions, and identify common occurrences of
each distribution (continued)
Chi-squared

Example (Google’s stock return variance): Google’s sample variance over 30


days is 0.0263%. We can test the hypothesis that the population variance
(Google’s “true” variance) is 0.02%. The chi-square variable = 38.14:

Sample variance (30 days)    0.0263%
Degrees of freedom (d.f.)    29
Population variance?         0.0200%
Chi-square variable          38.14 = 0.0263%/0.02% × 29
=CHIDIST() = p value         11.93% (at 29 d.f., the critical value for p = 0.10 is 39.0875)
Area under curve (1 − p)     88.07%

With 29 degrees of freedom (d.f.), 38.14 corresponds to a p value of 11.93%
(i.e., it falls to the left of the 0.10 critical value on the lookup table).
Therefore, we can reject the null with only 88% confidence; i.e., at the usual 95%
level we fail to reject, and are likely to accept that the true variance is 0.02%.
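The test statistic itself is simple arithmetic; here is a minimal sketch reproducing the slide’s 38.14 (the p-value step still needs a chi-square table or a stats library):

```python
n = 30                    # sample size (30 days of returns)
sample_var = 0.000263     # 0.0263% expressed as a decimal
pop_var = 0.000200        # hypothesized population variance, 0.02%

# Chi-square test statistic: (n - 1) * s^2 / sigma0^2
chi_square = (n - 1) * sample_var / pop_var   # ~38.14
```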

Page 22
Describe the key properties of the Chi-squared distribution,
Student’s t and F-distributions, and identify common occurrences of
each distribution (continued)
Student’s t
Student’s t distribution (for large samples, approximates the normal)

As the degrees of freedom (d.f.) increase, the t-distribution converges to the
normal distribution. It is similar to the normal, except that it exhibits slightly
heavier tails (the lower the d.f., the heavier the tails). The exhibit below shows
the basic shape of the Student’s t distribution and how it changes with k
(specifically, the shape of its tail).

The Student’s t variable is given by:

t = Z / √(X/k)

where Z is a standard normal variable and X is chi-squared with k degrees of freedom.

Page 23
Describe the key properties of the Chi-squared distribution,
Student’s t and F-distributions, and identify common occurrences of
each distribution (continued)
Student’s t

Properties of the t-distribution:
• Like the normal, it is symmetrical.
• Like the standard normal, it has a mean of zero (mean = 0).
• Its variance for k > 2 is k/(k − 2), where k = degrees of freedom. Note
that as k increases, the variance approaches 1.0 and the distribution
approximates the standard normal.
• It is always slightly heavy-tailed (kurtosis > 3.0) but converges to the
normal. However, the Student’s t is not considered a really heavy-tailed
distribution.

In practice, the Student’s t is the most commonly used distribution.

Page 24
Describe the key properties of the Chi-squared distribution,
Student’s t and F-distributions, and identify common occurrences of
each distribution (continued)
Student’s t

For example, Google’s average periodic return over a ten-day sample period
was +0.02%, with a sample standard deviation of 1.54%. If we do not know the
population variance, the sample mean is a random variable following a
Student’s t distribution. We can use this to construct a confidence (random)
interval:

x̄ ± t × s/√n

Sample Mean       0.02%
Sample Std Dev    1.54%
Days (n = 10)     10
Confidence        95%
Significance      5%
Critical t        2.262
Lower limit       −1.08%
Upper limit       1.12%

We need the critical (lookup) t value. The critical t value is a function of:
• Degrees of freedom (d.f.); e.g., 10 − 1 = 9 in this example
• Significance: 1 − 95% confidence = 5% in this example

Page 25
Describe the key properties of the Chi-squared distribution,
Student’s t and F-distributions, and identify common occurrences of
each distribution (continued)
Student’s t
The critical t is just a lookup (reference to) the Student’s t distribution, as opposed to a computed
t-statistic, aka t-ratio. In this way, a critical t is an inverse CDF (quantile function), just as, for a
normal distribution, the "critical one-tailed value" at 1% is −2.33 and at 5% is −1.645. In this case
we want the critical t for (n − 1) degrees of freedom and two-tailed 5% significance (= one-tailed
2.5%). We can find 2.262 on the Student’s t lookup table where column = two-tail 0.05 and d.f. = 9.
In Excel, 2.262 = T.INV.2T(5%, 9). The 95% confidence interval can be computed.

The upper limit is given by:

0.02% + 2.262 × 1.54%/√10 = 1.12%

The lower limit is given by:

0.02% − 2.262 × 1.54%/√10 = −1.08%
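The interval arithmetic can be sketched in Python; the critical t of 2.262 is taken as a lookup, exactly as on the slide:

```python
from math import sqrt

x_bar = 0.0002    # sample mean, 0.02%
s = 0.0154        # sample standard deviation, 1.54%
n = 10            # sample size (days)
t_crit = 2.262    # critical t: two-tailed 5%, 9 d.f.

se = s / sqrt(n)              # standard error of the sample mean
upper = x_bar + t_crit * se   # ~ +1.12%
lower = x_bar - t_crit * se   # ~ -1.08%
```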

Page 26
Describe the key properties of the Chi-squared distribution,
Student’s t and F-distributions, and identify common occurrences of
each distribution (continued)
Student’s t

Both the normal (Z) and Student’s t (t) distributions characterize the sampling
distribution of the sample mean.
The difference is that the normal is used when we know the population
variance; the Student’s t is used when we must rely on the sample variance. In
practice, we don’t know the population variance, so the Student’s t is typically
appropriate.

z = (x̄ − µ)/(σ/√n)        t = (x̄ − µ)/(s/√n)

Page 27
Describe the key properties of the Chi-squared distribution,
Student’s t and F-distributions, and identify common occurrences of
each distribution (continued)
F-Distribution

The F distribution is also called the variance ratio distribution. The F ratio is
the ratio of sample variances, with the greater sample variance in the
numerator:

F = s₁²/s₂², where s₁² ≥ s₂²

Properties of the F distribution:
• Nonnegative (> 0) and skewed to the right
• Like the chi-square distribution, as d.f. increases, it approaches the
normal
• The square of a variable with a t-distribution and k d.f. has an F
distribution with (1, k) d.f.: t² ~ F(1, k)

Page 28
Describe the key properties of the Chi-squared distribution,
Student’s t and F-distributions, and identify common occurrences of
each distribution (continued)
F-Distribution

Example: Based on two 10-day samples, we calculated the sample variance of


Google and Yahoo. Google’s variance was 0.0237% and Yahoo’s was
0.0084%. Find the F ratio.

              GOOG      YHOO
=VAR()        0.0237%   0.0084%
=COUNT()      10        10
F ratio       2.82
Confidence    90%
Significance  10%
=FINV()       2.44

• The F ratio, therefore, is 2.82 (divide higher variance by lower variance; the F ratio
must be greater than, or equal to, 1.0).
• At 10% significance, with (10-1) and (10-1) degrees of freedom, the critical F value
is 2.44. Because our F ratio of 2.82 is greater than (>) 2.44, we reject the null (i.e.,
that the population variances are the same).
• We conclude the population variances are different.
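The F-test logic in a minimal Python sketch; the critical value 2.44 is taken as a lookup, as in the slide’s =FINV() cell:

```python
var_goog = 0.000237   # 0.0237% as a decimal
var_yhoo = 0.000084   # 0.0084% as a decimal

# The greater sample variance goes in the numerator, so F >= 1:
f_ratio = max(var_goog, var_yhoo) / min(var_goog, var_yhoo)   # ~2.82

f_crit = 2.44                    # lookup: 10% significance, (9, 9) d.f.
reject_null = f_ratio > f_crit   # True: conclude the variances differ
```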

Page 29
Describe the central limit theorem and the implications it has when
combining i.i.d. random variables.

Central limit theorem (CLT)


• We assume a population with a known mean and
finite variance, but not necessarily a normal
distribution (we may not know the distribution!).
Random samples of size (n) are then drawn from the
population. The expected value of each sample mean is
the population’s mean. The variance of each sample
mean is equal to the population’s variance divided by n.

• The central limit theorem says that the sample mean is itself normally
distributed, regardless of the shape of the underlying population. Given a
population described by any probability distribution having mean (µ) and finite
variance (σ²), the distribution of the sample mean computed from samples
(where each sample equals size n) will be approximately normal. If the size
of the sample is at least 30 (n ≥ 30), then we can assume the sample mean
is approximately normal!

Page 30
Describe the central limit theorem and the implications it has when
combining i.i.d. random variables (continued)

Each sample has a sample mean. There are many sample means. The
sample means have variation: a sampling distribution. The central limit
theorem (CLT) says the sampling distribution of sample means is
asymptotically normal.

Page 31
Describe the central limit theorem and the implications it has when
combining i.i.d. random variables (continued)

Summary of central limit theorem (CLT):


• We assume a population with a known mean and finite variance, but not
necessarily a normal distribution.
• Random samples (size n) drawn from the population.
• The expected value of each random variable is the population mean
• The distribution of the sample mean computed from samples (where each sample
equals size n) will be approximately (asymptotically) normal.
• The variance of each random variable is equal to population variance divided by n
(equivalently, the standard deviation is equal to the population standard deviation
divided by the square root of n).
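A seeded Monte Carlo sketch of the last bullet: even for a non-normal (uniform) population, the standard deviation of the sample means comes out close to σ/√n. The population choice and sample sizes are illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)
n, trials = 30, 2000

# Population: uniform on [0, 1] -- mean 0.5, sd sqrt(1/12), clearly not normal.
sample_means = [mean(random.random() for _ in range(n)) for _ in range(trials)]

se_theory = sqrt(1 / 12) / sqrt(n)   # sigma / sqrt(n), ~0.0527
se_observed = stdev(sample_means)    # should be close to se_theory
```

The histogram of `sample_means` would also look bell-shaped, which is the CLT's main claim.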

Page 32
Describe the central limit theorem and the implications it has when
combining i.i.d. random variables (continued)

Sample Statistics and Sampling Distributions


• When we draw from (or take) a sample, the sample is a random variable with
its own characteristics. The “standard deviation of a sampling distribution” is
called the standard error.

• The mean of the sample, or the sample mean, is a random variable defined by:

x̄ = (x₁ + x₂ + … + xₙ)/n

Page 33
Describe the properties of independent and identically distributed
(i.i.d.) random variables.

A random sample is a sample of random variables that are independent


and identically distributed (i.i.d.)

Independent and identically distributed (i.i.d.) variables:


• Each random variable has the same (identical)
probability distribution (PDF/PMF, CDF).
• Each random variable is drawn independently of the
others: no serial or auto-correlation.

The concept of independent and identically distributed (i.i.d.) variables is a
key assumption we often encounter: to scale volatility by the square root
of time requires i.i.d. returns. If returns are not i.i.d., then scaling volatility
by the square root of time will give an incorrect answer.
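A one-line sketch of square-root-of-time scaling under the i.i.d. assumption; the 1% daily volatility and 250 trading days are illustrative values:

```python
from math import sqrt

daily_vol = 0.01      # assumed 1% daily volatility (illustrative)
trading_days = 250    # assumed trading days per year (illustrative)

# Under i.i.d. returns, variance scales linearly with time,
# so volatility scales with the square root of time:
annual_vol = daily_vol * sqrt(trading_days)   # ~15.8%
```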

Page 34
Describe a mixture distribution and explain the creation and
characteristics of mixture distributions.

A mixture distribution is a sum of other distribution functions weighted by
probabilities. The density function of a mixture distribution is, then, the
probability-weighted sum of the component density functions:

f(x) = Σᵢ wᵢ fᵢ(x), where Σᵢ wᵢ = 1

where the fᵢ(x)’s are the component distributions, and the wᵢ’s are the mixing
proportions or weights.
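A minimal sketch of a probability-weighted mixture density using Python’s stdlib NormalDist; the two zero-mean components and the 90/10 weights are illustrative choices:

```python
from statistics import NormalDist

def mixture_pdf(x, components, weights):
    """f(x) = sum of w_i * f_i(x); the weights must sum to one."""
    assert abs(sum(weights) - 1.0) < 1e-12, "weights must sum to one"
    return sum(w * c.pdf(x) for c, w in zip(components, weights))

# Illustrative: low-volatility and high-volatility normals, same mean.
components = [NormalDist(0, 1), NormalDist(0, 3)]
weights = [0.9, 0.1]
density_at_zero = mixture_pdf(0.0, components, weights)
```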

Page 35
Describe a mixture distribution and explain the creation and
characteristics of mixture distributions (continued)

Note: The sum of the component weights must equal one.

Consider a stock whose log returns follow a normal distribution with low volatility
90% of the time, and a normal distribution with high volatility 10% of the time.
Most of the time the stock just bounces along, but occasionally the stock’s
behavior is more extreme. In this example of Miller’s, the mixture density is:

f(x) = 0.90 × f_low(x) + 0.10 × f_high(x)

According to Miller, “Mixture distributions are extremely flexible. In a sense they occupy
a realm between parametric distributions and non-parametric distributions. In a typical
mixture distribution, the component distributions are parametric but the weights are
based on empirical (non-parametric) data. Just as there is a trade-off between
parametric distributions and non-parametric distributions, there is a trade-off between
using a low number and a high number of component distributions. By adding more and
more component distributions, we can approximate any data set with increasing
precision. At the same time, as we add more and more component distributions, the
conclusions that we can draw become less and less general in nature.”

Page 36
Describe a mixture distribution and explain the creation and
characteristics of mixture distributions (continued)

Normal mixture distribution

• A mixture distribution is extremely flexible. If two
normal distributions have the same mean, they
combine (mix) to produce a mixture distribution with
leptokurtosis (heavy tails). More generally, mixtures
are almost infinitely flexible.

• Just by adding two normal distributions


together, we can develop a large number
of interesting distributions. For example,
if we combine two normal distributions
with the same mean but different
variances, we can get a symmetrical
mixture distribution that displays excess
kurtosis.
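The excess kurtosis of such a same-mean mixture can be verified analytically: the moments of a zero-mean mixture are the weighted sums of the component moments. The weights and sigmas below are illustrative.

```python
# Moments of a zero-mean two-component normal mixture:
#   E[X^2] = w1*s1^2 + w2*s2^2
#   E[X^4] = 3*(w1*s1^4 + w2*s2^4)   (each normal has E[X^4] = 3*sigma^4)
w1, w2 = 0.9, 0.1    # mixing weights (sum to one)
s1, s2 = 1.0, 3.0    # component standard deviations, same mean of zero

m2 = w1 * s1**2 + w2 * s2**2
m4 = 3 * (w1 * s1**4 + w2 * s2**4)

kurtosis = m4 / m2**2   # > 3 means heavier tails than the normal
```

Here the result is well above 3, so the symmetric mixture indeed displays excess kurtosis.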

Page 37
Describe a mixture distribution and explain the creation and
characteristics of mixture distributions (continued)

By shifting the mean of one distribution, we can also create a distribution with
positive or negative skew. Finally, if we move the means far enough apart, the
resulting mixture distribution will be bimodal; the exhibit below shows a PDF
with two distinct maxima.

Page 38
The End

P1.T2. Quantitative Analysis

Miller, Mathematics & Statistics for Financial Risk Management

Distributions
