
Licensed to Ngoc Le at [email protected]. Downloaded January 11, 2021.

The information provided in this document is intended solely for you. Please do not freely distribute.

P1.T2. Quantitative Analysis

Chapter 3: Common Univariate Random Variables

Bionic Turtle FRM Study Notes



Chapter 3: Common Univariate Random Variables

DISTINGUISH THE KEY PROPERTIES AND IDENTIFY THE COMMON OCCURRENCES OF THE FOLLOWING DISTRIBUTIONS: UNIFORM DISTRIBUTION, BERNOULLI DISTRIBUTION, BINOMIAL DISTRIBUTION, POISSON DISTRIBUTION, NORMAL DISTRIBUTION, LOGNORMAL DISTRIBUTION, CHI-SQUARED DISTRIBUTION, STUDENT’S T, AND F-DISTRIBUTIONS.

DESCRIBE A MIXTURE DISTRIBUTION AND EXPLAIN THE CREATION AND CHARACTERISTICS OF MIXTURE DISTRIBUTIONS.

QUESTIONS & ANSWERS

APPENDIX: ADDITIONAL DISTRIBUTIONS THAT ARE NOT IN THE SYLLABUS BUT OCCASIONALLY RELEVANT


Chapter 3: Common Univariate Random Variables


 Distinguish the key properties … (see below)
 Describe a mixture distribution and explain the creation and characteristics of mixture distributions.

Distinguish the key properties and identify the common occurrences of the following distributions: uniform distribution, Bernoulli distribution, binomial distribution, Poisson distribution, normal distribution, lognormal distribution, Chi-squared distribution, Student’s t, and F-distributions.
Uniform distribution

The uniform distribution can be either discrete or continuous. If the random variable, X, is
discrete, then the uniform distribution is given by the following probability mass function (pmf):

f(x) = 1/n

This is the simplest distribution. Common examples of discrete uniform distributions are:
 A coin, where n=2, such that the probability: P[heads] = 1/2 and P[tails] = 1/2; or
 A six-sided die, where for example, probability of rolling a one is: P[rolling a one] = 1/6

If the random variable, X, is continuous, the uniform distribution is given by the following
probability density function (pdf):

f(x) = 1/(b − a) for a ≤ x ≤ b
f(x) = 0 for x < a or x > b

Using this pdf, the mean is calculated as the average of the start and end values of the distribution. Similarly, the variance is calculated as shown below.

µ = (a + b)/2

σ² = (b − a)²/12

The uniform distribution is characterized by the following cumulative distribution function (CDF), for a ≤ x ≤ b:

P[X ≤ x] = (x − a)/(b − a)
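The moments and CDF above can be verified with a short sketch; the endpoints a = 2 and b = 8 below are illustrative values, not taken from the notes.

```python
# Continuous uniform distribution on [a, b]; a and b are illustrative.
a, b = 2.0, 8.0

mean = (a + b) / 2              # average of the endpoints
variance = (b - a) ** 2 / 12    # (b - a)^2 / 12

def uniform_cdf(x: float) -> float:
    """P[X <= x] for X ~ Uniform(a, b)."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

print(mean)              # 5.0
print(variance)          # 3.0
print(uniform_cdf(5.0))  # 0.5
```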


Bernoulli distribution

A random variable X is called Bernoulli distributed with parameter (p) if it has only two
possible outcomes, often encoded as 1 (“success” or “survival”) or 0 (“failure” or “default”), and
if the probability for realizing “1” equals p and the probability for “0” equals 1 – p. The classic
example for a Bernoulli-distributed random variable is the default event of a company.

A Bernoulli variable is discrete and has two possible outcomes: X = 1 (with probability p) or X = 0 (with probability 1 − p).

The Bernoulli distribution is a special case of the binomial distribution that has only a single trial;
i.e., n = 1. Alternatively, we can say that the binomial distribution generalizes the Bernoulli
because it is a series of independent and identically distributed (i.i.d.) Bernoulli variables.

Binomial distribution

A binomial distributed random variable is the sum of (n) independent and identically distributed
(iid) Bernoulli-distributed random variables. The probability of observing (k) successes is:

P(X = k) = C(n, k) × p^k × (1 − p)^(n − k), where C(n, k) = n!/[(n − k)! × k!]

The mean of this random variable is np and the variance of a binomial distribution is np(1 − p).

The exhibit below compares binomial distribution with p = 0.10, for n = 10, 50, and 100. As the
number of trials increases (i.e., as n  ∞), the binomial tends to resemble the normal
distribution.
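As a sketch of the pmf above, the snippet below evaluates P(X = k) for the exhibit’s first parameter set (p = 0.10, n = 10) using only the standard library; the choice of k is illustrative.

```python
import math

# Binomial pmf: P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
n, p = 10, 0.10   # parameters from the exhibit

def binom_pmf(k: int) -> float:
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

mean = n * p                # np
variance = n * p * (1 - p)  # np(1 - p)

# The pmf sums to 1 over k = 0..n
assert abs(sum(binom_pmf(k) for k in range(n + 1)) - 1.0) < 1e-12

print(round(binom_pmf(1), 4))   # 0.3874
```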


The exhibit below compares the binomial’s pmf to its corresponding CDF, for two sets of parameters (these charts were generated by R’s ggplot2):

Application of binomial: the binomial tree for asset prices

On the next page is shown a simple binomial asset price tree. The assumptions are simple:
 The initial asset price is $100.00
 At each node (each month), the price can jump up by a multiple of 1.10 or down by a
multiple of 1/1.10 = 0.90909; this is a rough, discrete approximation of a lognormal
property. For example, at the end of the first month, the price can either:
o Jump up to $100.00 * 1.10 = $110.00, or
o Jump down to $100.00 * 1/1.10 = $90.909
 Each step is one month. The tree has twelve steps (each step is 1/12 of one year) and
therefore the terminal nodes represent possible prices at the end of one year.
 This tree happens to assume that the probability of an up jump is 60.0%; therefore, the
probability of a down jump is 40.0%. Each step is a Bernoulli variable where p = 0.60.
This model represents a series of iid Bernoulli variables and therefore this is a binomial
model with parameters p = 0.60 (an up jump is “success,” if you will) and n = 12 steps.
Let’s look at the tree with a focus on the final two rows, which represent (i) the thirteen different possible terminal values and (ii) the pmf probabilities of reaching those nodes (colored in blue), which must sum to 100.0%.


Because this is a binomial model, we can use the binomial distribution directly to retrieve the terminal probabilities; the final column itself represents a binomial probability mass function (pmf). For example:
 The probability of ending at $100.00, where the asset started (i.e., 6 up jumps and 6 down jumps), is given by:

C(12, 6) × 0.6^6 × (1 − 0.6)^6 = 0.17658 ~ 17.7%

 The probability of reaching $177.16, which requires 9 up jumps, is given by:

C(12, 9) × 0.6^9 × (1 − 0.6)^3 = 0.14189 ~ 14.2%
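The two node probabilities above can be reproduced by evaluating the full binomial pmf across all thirteen terminal nodes; this sketch uses the tree’s stated parameters (S0 = $100, u = 1.10, d = 1/1.10, p = 0.60, n = 12).

```python
import math

# Terminal-node probabilities of the 12-step binomial tree
S0, u, p, n = 100.0, 1.10, 0.60, 12

terminal = {}   # terminal price -> pmf probability; k = number of up jumps
for k in range(n + 1):
    price = S0 * u**k * (1 / u)**(n - k)
    terminal[round(price, 2)] = math.comb(n, k) * p**k * (1 - p)**(n - k)

# The thirteen probabilities must sum to 100%
assert abs(sum(terminal.values()) - 1.0) < 1e-9

print(round(terminal[100.00], 5))   # 0.17658 (6 up jumps, 6 down jumps)
print(round(terminal[177.16], 5))   # 0.14189 (9 up jumps, 3 down jumps)
```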


Poisson distribution

A Poisson-distributed random variable describes the random number of events occurring over a certain time interval; for example, the number of bond defaults in a portfolio or the number of market crashes over an interval. The Poisson requires only one parameter, lambda (λ), which indicates the rate of occurrence of the random events; i.e., how many events occur on average per unit of time. Below, n denotes the number of events that occur in an interval.

In the Poisson distribution, the random number of events that occur during an interval of time (e.g., losses/year, failures/day) is given by:

P(X = n) = λ^n × e^(−λ)/n!     (4.10)

If the rate at which events occur over time is constant, and the probability of any one event occurring is independent of all other events, then the events follow a Poisson process, where t is the amount of time elapsed (i.e., the expected number of events before time t is equal to λt):1

P(X = n) = (λt)^n × e^(−λt)/n!     (4.11)
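A sketch of the Poisson pmf (4.10); the rate of λ = 4 events per interval is an illustrative choice, not from the notes.

```python
import math

lam = 4.0   # illustrative rate: 4 events per interval, on average

def poisson_pmf(k: int) -> float:
    # P(X = k) = lambda^k * e^(-lambda) / k!
    return lam**k * math.exp(-lam) / math.factorial(k)

print(round(poisson_pmf(4), 4))   # 0.1954, probability of exactly 4 events

# The pmf sums to 1 (the tail beyond k = 99 is negligible here)
total = sum(poisson_pmf(k) for k in range(100))
print(round(total, 6))   # 1.0
```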

In the Poisson, the lambda parameter, λ, is both the expected value (the mean) and the variance. The exhibit below compares Poisson distributions for λ = 2, 4, and 10.

The next exhibit compares the Poisson’s pmf to its corresponding CDF, for two sets of
parameters.

1 Michael Miller, Mathematics and Statistics for Financial Risk Management, 2nd Edition (Hoboken, NJ: John Wiley & Sons, 2013)


Normal distribution

The normal (aka, Gaussian; aka, bell curve, due to the shape of its pdf) has these features:
 The distribution’s middle and peak, mu (µ), is the mean (and the median). This first
moment is also called the location. Standard deviation and variance are measures of
dispersion. Variance is the second central moment and is denoted sigma-squared (σ²),
such that standard deviation is denoted σ.
 The normal is symmetric around µ; i.e., the normal has zero skew. The normal has
kurtosis of three (3) or, equivalently, zero excess kurtosis.
Properties of normal distribution:
 Location-scale invariance: Imagine a normally distributed random variable X ~ N(µ, σ²).
Now imagine random variable Y as a linear function of X: Y = a*X + b. Location-scale
invariance says that Y is also normally distributed with mean (= a*µ + b)
and variance (= a²*σ²). Usefully, we do not exit the class of normal distributions if
we multiply the random variable by a factor and/or shift the random variable.
 Summation stability: If we sum several independent normal random variables, each
with mean (µi) and standard deviation (σi), their sum will be normally distributed again.
 The normal distribution possesses a domain of attraction. The central limit theorem
(CLT) states that—under certain conditions—the distribution of a large sum of random
variables behaves approximately normally. The classes of distributions with this
property are called stable distributions.



Below is an exhibit of a normal distribution for µ =10 and at various levels of σ (1, 2 and 3)

The exhibit below compares the normal distribution’s pdf to its corresponding CDF, for two sets
of parameters. Please note the top row is plotting the standard normal (more detail later) such
that the parameters are not listed. Why are they not required? Because the standard normal
has, by definition, zero mean and unit variance.


For a random variable X, the probability density function for the normal distribution is:

f(x) = 1/(σ√(2π)) × exp(−½ × [(x − µ)/σ]²)     (4.12)

As a normally distributed variable with mean, µ, and variance, σ², X is denoted X ~ N(µ, σ²).

The normal distribution is commonplace for at least three (or four) reasons:
 The central limit theorem (CLT) asserts that the sampling distribution of sample means
tends to be normal (i.e., converges toward a normally shaped distribution) regardless of
the shape of the underlying distribution. This is an incredibly useful idea.
 The normal is economical (aka, the feature of parsimony, or being parsimonious)
because it only requires two parameters (mean and variance). The standard normal is
even easier as it requires no parameters.
 The normal is tractable: it is easy to manipulate (especially in closed-form equations like
the Black-Scholes)
It is common to retrieve an historical dataset such as a series of monthly returns and compute
the mean and standard deviation of the series. In some cases, the analyst will stop at that point,
having determined the first and second moments of the data.

When doing this, the user is implicitly “imposing normality” by assuming the data is
normally distributed. For example, the user might multiply the standard deviation of the
dataset by 1.645 or 2.33 (i.e., normal distribution deviates) in order to estimate a value-at-risk.
But notice what happens in this case: without a test (or QQ-plot, for example) the analyst is
merely assuming normality because the normal distribution is conveniently summarized by only
the first two moments! Many other non-normal distributions have first (aka, location) and second
moments (aka, scale or shape).

In this way, it is not uncommon to see the normal distribution used merely for the sake of
convenience: when we only have the first two distributional moments, the normal is implied
perhaps merely because they are the only moments that have been computed.
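The shortcut described above can be sketched as follows. This is a hypothetical illustration (the mean and volatility are invented, and the simple "deviate times sigma" convention is only one common way to state a VaR), not a prescribed method:

```python
from statistics import NormalDist

mu, sigma = 0.001, 0.02   # hypothetical daily mean return and volatility

# Normal deviates for one-tailed 95% and 99% confidence
z_95 = NormalDist().inv_cdf(0.95)   # ~1.645
z_99 = NormalDist().inv_cdf(0.99)   # ~2.33

# "Imposing normality": a VaR estimate from the first two moments only
var_95 = -(mu - z_95 * sigma)   # 95% VaR stated as a positive loss
print(round(z_95, 3), round(z_99, 2))   # 1.645 2.33
print(round(var_95, 4))                 # 0.0319
```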

Standard normal distribution

A normal distribution is fully specified by two parameters, mean and variance (or standard
deviation). We can transform a normal into a unit or standardized variable:
 Standard normal has mean = 0, and variance = 1
 No parameters required.
This unit or standardized variable is normally distributed with zero mean and variance of
one. Its standard deviation is also one (variance = 1.0 and standard deviation = 1.0). The letter
“Z” is often used to denote the standard normal distribution. So, when variable Z is
approximately (“asymptotically”) normally distributed, we will read this: Z ~ N(0,1).



Standard normal distribution: Critical Z values

Key locations on the normal distribution are noted below. In the FRM curriculum, the choice of
one-tailed 5% significance and 1% significance (i.e., 95% and 99% confidence) is
common, so please pay particular attention to the yellow highlights:

Critical z values    Two-sided Confidence    One-sided Significance
1.00                 ~68%                    ~15.87%
1.645 (~1.65)        ~90%                    ~5.0%
1.96                 ~95%                    ~2.5%
2.327 (~2.33)        ~98%                    ~1.0%
2.58                 ~99%                    ~0.5%

Memorize the two common critical values: 1.65 and 2.33. These correspond to
confidence levels, respectively, of 95% and 99% for a one-tailed test. For VaR, the one-
tailed test is relevant because we are concerned only about losses (left-tail), not gains (right-tail).
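The table’s critical values can be recovered from the standard normal inverse CDF; a sketch using the standard library’s statistics.NormalDist:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

print(round(z.inv_cdf(0.950), 3))   # 1.645 -> one-tailed 5% significance
print(round(z.inv_cdf(0.990), 3))   # 2.326 -> one-tailed 1% significance
print(round(z.inv_cdf(0.975), 2))   # 1.96  -> two-tailed 5% (95% confidence)
print(round(z.inv_cdf(0.995), 2))   # 2.58  -> two-tailed 1% (99% confidence)
```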

Multivariate normal distributions

The normal can be generalized to a joint distribution of normals; e.g., the bivariate normal distribution. Properties include:
 If X and Y are bivariate normal, then aX + bY is normal; any linear combination is normal.
 If a set of variables has a multivariate normal distribution, the marginal distribution of each is normal.
 If variables with a multivariate normal distribution have covariances that equal zero, then
the variables are independent.


Common examples of Bernoulli, binomial, normal and Poisson

In the FRM, these four distributions are quite common:


 The Bernoulli is invoked when there are only two outcomes. It is used to characterize
a default: an obligor or bond will either default or survive. Most bonds “survive” each
year, until perhaps one year they default. At any given point in time, or (for example)
during any given year, the bond will be in one of two states.
 The binomial is a series of independent and identically distributed (i.i.d.) Bernoulli
variables, such that the binomial is commonly used to characterize a portfolio of credits.
 The normal distribution is the most common:
o Typically, the central limit theorem (CLT) will justify the significance test of the
sample average in a large sample - for example, to test the sample average
asset return or excess return.
o In many cases, due to convenience, the normal distribution is employed to model
equity returns for short horizons; typically this is an assumption made with the
understanding that it may not be realistic.
 The Poisson distribution has two very common purposes:
o Poisson is often used, as a generic stochastic process, to model the time of
default in some credit risk models.
o As a discrete distribution, the Poisson is arguably the most common distribution
employed for operational loss frequency (but not for loss severity, which wants a
continuous distribution).

                 Normal    Binomial         Poisson
Mean             µ         np               λ
Variance         σ²        np(1 − p)        λ
Standard Dev.    σ         √[np(1 − p)]     √λ


Lognormal distribution

The lognormal is common in finance: If an asset’s continuously compounded return (r) is normally distributed, the future asset price level (or ratio of prices; i.e., the wealth ratio) is lognormal. Expressed in reverse, if a variable is lognormal, its natural log is normal. Here is an exhibit of the lognormal distribution for µ = 10 and at various levels of σ (0.25, 0.5 and 1):

The exhibit below compares the lognormal distribution’s pdf to its associated CDF, for two sets
of parameters. Notice how the lognormal distribution has a non-negative support.


The lognormal distribution is extremely common in finance because it is often the distribution
assumed for asset prices (e.g., stock prices).

Specifically, it is common to assume that log (i.e., continuously compounded) asset returns
are normally distributed such that, by definition, asset prices have a lognormal
distribution.
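This price/return link can be checked by simulation; the log-return parameters below (µ = 0.08, σ = 0.20) and the seed are illustrative:

```python
import math
import random

random.seed(42)
mu, sigma = 0.08, 0.20   # illustrative parameters for the normal log return

# If r ~ N(mu, sigma^2), the wealth ratio exp(r) is lognormally distributed
ratios = [math.exp(random.gauss(mu, sigma)) for _ in range(100_000)]

# Lognormal support is strictly positive
assert min(ratios) > 0

# Taking logs of the simulated ratios recovers the normal parameters
logs = [math.log(w) for w in ratios]
mean_log = sum(logs) / len(logs)
print(round(mean_log, 2))   # close to mu
```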

The density function of the lognormal distribution is given by:

f(x) = 1/(xσ√(2π)) × exp(−½ × [(ln x − µ)/σ]²)     (4.18)

Chi-squared distribution

The chi-squared distribution is the sum of the squares of k independent standard normal random variables. The parameter k is called its degrees of freedom. The exhibit below shows the pdf for chi-squared distributions with different values of k (1, 2, and 3).



The exhibit below compares the chi-squared distribution’s pdf to its associated CDF, for two
different degrees of freedom (df).

Properties of the chi-squared distribution include:


 Nonnegative (>0), since it is a sum of squared values.
 Skewed right, but as d.f. increases it approaches normal.
 Expected value (mean) = k and variance = 2k, where k = degrees of freedom.
 The sum of two independent chi-square variables is also a chi-squared variable.

Using a chi-square distribution, we can observe a sample variance and compare it to a hypothesized population variance: the ratio (n − 1) × s²/σ² has a chi-square distribution with (n − 1) d.f.

Example (Google’s stock return variance): Google’s sample variance over 30 days is
0.0263%. We can test the hypothesis that the population variance (Google’s “true” variance) is
0.02%. The chi-square variable = 38.14:

Sample variance (30 days)    0.0263%
Degrees of freedom (df)      29
Population variance?         0.0200%
Chi-square variable          38.14 = 0.0263%/0.0200% × 29
=CHIDIST() = p value         11.93% (at 29 df, the 10% critical value is 39.0875)
Area under curve (1 − p)     88.07%

With 29 degrees of freedom (df), 38.14 corresponds to a p value of 11.93% (i.e., 38.14 falls to the left of the 10% critical value on the lookup table). Therefore, we could reject the null with only 88.07% confidence; i.e., we fail to reject at the 10% significance level and are inclined to accept that the true variance could be 0.02%.
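The test statistic above can be reproduced directly; the numbers are the example’s own:

```python
# Chi-square statistic: (n - 1) * sample variance / hypothesized variance
sample_var = 0.000263   # 0.0263% sample variance over 30 days
hypo_var = 0.000200     # 0.0200% hypothesized population variance
df = 29                 # n - 1

chi2_stat = df * sample_var / hypo_var
print(round(chi2_stat, 1))   # 38.1

# The 10% critical value at 29 df is 39.0875 (lookup value), so 38.14
# falls short of it and we cannot reject at the 10% significance level.
assert chi2_stat < 39.0875
```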


Student’s t distribution (for large samples, approximates the normal)

The student’s t distribution (t distribution) is among the most commonly used distributions. As the degrees of freedom (d.f.) increases, the t-distribution converges toward the normal distribution. It is similar to the normal, except that it exhibits slightly heavier tails (the lower the d.f., the heavier the tails). The exhibit below shows the basic shape of the student’s t distribution and how it changes with k (specifically, the shape of its tails).

The exhibit below compares the student t’s pdf to its associated CDF, for two different degrees of freedom (df). The student’s t distribution is often used to characterize a heavy-tailed distribution because it always has heavy tails: its excess kurtosis equals 6/(df − 4), for df > 4. However, notice that its tails are not very heavy. Keep in mind that, if you want significantly heavy tails, the student’s t is unlikely to generate sufficiently heavy tails.


The student’s t variable is given by:

t = Z/√(S/k)

where Z is a standard normal variable and S is an independent chi-squared variable with k degrees of freedom.

Properties of the t-distribution:

 Like the normal, it is symmetrical


 Like the standard normal, it has mean of zero (mean = 0)
 Its variance for k > 2 is k/(k − 2), where k = degrees of freedom. Note, as k increases, the
variance approaches 1.0 and the distribution approximates the standard normal.
 It always exhibits a heavy tail (kurtosis > 3.0) but it quickly converges to the normal
distribution as the degrees of freedom increases. However, as mentioned, the student’s t
is not considered a really heavy-tailed distribution.

In practice, the student’s t is the most commonly used distribution. When we test the
significance of regression coefficients, the central limit theorem (CLT) justifies the normal
distribution (because the coefficients are effectively sample means). But we rarely know the
population variance, such that the student’s t is the appropriate distribution. When the d.f. is
large (e.g., sample over ~30), as the student’s t approximates the normal, we can use the
normal as a proxy. In the assigned Stock & Watson, the sample sizes are large (e.g., 420
observations), so they tend to use the normal.

Example: Google’s average periodic return over a ten-day sample period was +0.02% with a sample standard deviation of 1.54%. Here are the statistics:

Sample Mean       0.02%
Sample Std Dev    1.54%
Days (n)          10
Confidence        95%
Significance      5%
Critical t        2.262
Lower limit       −1.08%
Upper limit       1.12%

The sample mean is a random variable. If we know the population variance, we assume the
sample mean is normally distributed. But if we do not know the population variance (typically the
case!), the sample mean is a random variable following a student’s t distribution. In the
above example, we can use this to construct a confidence (random) interval:

x̄ ± t × s/√n

We need the critical (lookup) t value. The critical t value is a function of:

 Degrees of freedom (d.f.); e.g., 10 − 1 = 9 in this example
 Significance: 1 − 95% confidence = 5% in this example


How do we here retrieve the critical-t value of 2.262?

The critical-t is just a lookup on (a reference to) the student's t distribution, as opposed to a computed t-statistic, aka t-ratio. In this way, a critical t is an inverse CDF (quantile function) just like, for a
normal distribution, the "critical one-tailed value" at 1% is -2.33 and at 5% is -1.645. In this case
we want the critical t for (n-1) degrees of freedom and two-tailed 5% significance (= one tailed
2.5%). We can find 2.262 on the student's t lookup table where column = 2-tail 0.05 and d.f. = 9.
In Excel, 2.262 = T.INV.2T(5%, 9). The 95% confidence interval can be computed.

The upper limit is given by:

0.02% + (2.262 × 1.54%/√10) = 1.12%

And the lower limit is given by:

0.02% − (2.262 × 1.54%/√10) = −1.08%
Please make sure you can take a sample standard deviation, compute the critical t value, and construct the confidence interval.
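The interval above can be sketched in a few lines, using the example’s own inputs and the looked-up critical t of 2.262:

```python
import math

mean, s, n = 0.0002, 0.0154, 10   # +0.02% mean, 1.54% std dev, 10 days
t_crit = 2.262                    # two-tailed 5%, 9 df (lookup value)

half_width = t_crit * s / math.sqrt(n)   # t * s / sqrt(n)
lower = mean - half_width
upper = mean + half_width

print(round(lower * 100, 2))   # -1.08 (%)
print(round(upper * 100, 2))   # 1.12 (%)
```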

Both the normal (Z) and student’s t (t) distribution characterize the sampling distribution of the
sample mean. The difference is that the normal is used when we know the population variance;
the student’s t is used when we must rely on the sample variance. In practice, we don’t know
the population variance, so the student’s t is typically appropriate.

z = (x̄ − µ)/(σ/√n)          t = (x̄ − µ)/(s/√n)


F-Distribution

The F distribution is also called the variance ratio distribution. The F-ratio is the ratio of sample variances, with the greater sample variance in the numerator:

F = s₁²/s₂², where s₁² ≥ s₂²

The properties of the F distribution include the following: it is nonnegative (> 0) and skewed to the right. Like the chi-square distribution, as df increases, the F distribution approaches the normal. The square of a t-distributed variable (with k df) has an F distribution with (1, k) df: t² ~ F(1, k). Below are exhibits that display the F-distribution (for three sets of parameters) and compare its pdf to its CDF.


Example: Based on two 10-day samples, we calculated the sample variance of Google and
Yahoo. Google’s variance was 0.0237% and Yahoo’s was 0.0084%. Find the F ratio.

              GOOG      YHOO
=VAR()        0.0237%   0.0084%
=COUNT()      10        10
F ratio       2.82
Confidence    90%
Significance  10%
=FINV()       2.44

 The F ratio, therefore, is 2.82 (divide higher variance by lower variance; the F ratio
must be greater than, or equal to, 1.0).
 At 10% significance, with (10-1) and (10-1) degrees of freedom, the critical F value is
2.44. Because our F ratio of 2.82 is greater than (>) 2.44, we reject the null (i.e., that the
population variances are the same).
 We conclude the population variances are different.
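The example’s F test reduces to a one-line ratio; a sketch with the example’s own variances:

```python
var_goog = 0.000237   # 0.0237% sample variance
var_yhoo = 0.000084   # 0.0084% sample variance

# Greater sample variance goes in the numerator, so F >= 1
f_ratio = max(var_goog, var_yhoo) / min(var_goog, var_yhoo)
print(round(f_ratio, 2))   # 2.82

# Critical F at 10% significance with (9, 9) df is 2.44 (lookup value),
# so we reject the null of equal population variances.
assert f_ratio > 2.44
```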

Triangular Distribution

The triangular distribution is a distribution whose PDF is a triangle, say with a minimum of a, a
maximum of b, and a mode of c. Like the uniform distribution, it has a finite range, but being
only slightly more complex than a uniform distribution, it has more flexibility. The triangular
distribution has a unique mode, and can be symmetric, positively skewed, or negatively skewed.
Its PDF is described by the following two-part function:

f(x) = 2(x − a)/[(b − a)(c − a)]   for a ≤ x ≤ c
f(x) = 2(b − x)/[(b − a)(b − c)]   for c ≤ x ≤ b

The exhibit below shows a triangular distribution where a, b, and c are 0.0, 1.0, and 0.8,
respectively.


[Exhibit: triangular distribution with three parameters, a = 0, b = 1, and c (mode) = 0.8]

 The PDF is zero at both a and b, and the value of f(x) reaches a maximum, 2/(b − a), at c.
 The mean and variance are given by:

µ = (a + b + c)/3

σ² = (a² + b² + c² − ab − ac − bc)/18
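With the exhibit’s parameters (a = 0.0, b = 1.0, c = 0.8), the formulas above give:

```python
a, b, c = 0.0, 1.0, 0.8   # parameters from the exhibit

mean = (a + b + c) / 3
variance = (a**2 + b**2 + c**2 - a*b - a*c - b*c) / 18
peak = 2 / (b - a)        # maximum of the pdf, reached at the mode c

print(round(mean, 2))       # 0.6
print(round(variance, 4))   # 0.0467
print(peak)                 # 2.0
```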

Beta distribution

The beta distribution has two parameters: alpha (“center”) and beta (“shape”). The beta
distribution is very flexible, and popular for modeling default and recovery rates.


Example: The beta distribution is often used to model recovery rates. Here are two examples:
one beta distribution to model a junior class of debt (i.e., lower mean recovery) and another for
a senior class of debt (i.e., lower loss given default):

                  Junior    Senior
alpha (center)    2.0       4.0
beta (shape)      6.0       3.3
Mean recovery     25%       55%

[Exhibit: beta distributions for recovery/LGD, plotting the junior and senior pdfs over recovery (residual value) from 0% to 98%]

Describe a mixture distribution and explain the creation and characteristics of mixture distributions.
A mixture distribution is a sum of other distribution functions, weighted by probabilities. The density function of a mixture distribution is, then, the probability-weighted sum of the component density functions:

f(x) = Σᵢ wᵢ fᵢ(x), with Σᵢ wᵢ = 1

where fi(x)’s are the component distributions, and wi’s are the mixing proportions or weights.

Note: The sum of the component weights must equal one.

For example, consider a stock whose log returns follow a normal distribution with low volatility
90% of the time, and a normal distribution with high volatility 10% of the time. Most of the time
the stock just bounces along but occasionally, the stock’s behavior may be more extreme. In this example from Miller, the mixture distribution is:

f(x) = 0.90 × f_low(x) + 0.10 × f_high(x)

where f_low(x) and f_high(x) are the low- and high-volatility normal densities, respectively.
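Miller’s two-regime example can be sketched directly; the two volatilities below are illustrative stand-ins for the "low" and "high" regimes:

```python
from statistics import NormalDist

low = NormalDist(mu=0.0, sigma=0.01)    # calm regime (illustrative sigma)
high = NormalDist(mu=0.0, sigma=0.05)   # stressed regime (illustrative sigma)
w = 0.90                                # 90% weight on the low-vol normal

def mixture_pdf(x: float) -> float:
    # f(x) = 0.90 * f_low(x) + 0.10 * f_high(x)
    return w * low.pdf(x) + (1 - w) * high.pdf(x)

# Crude numeric check that the mixture density integrates to ~1
step = 0.001
total = sum(mixture_pdf(-0.5 + i * step) * step for i in range(1000))
print(round(total, 2))   # ~1.0
```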


Normal mixture distribution

A mixture distribution is extremely flexible. If two normal distributions have the same mean but different variances, they combine (mix) to produce a mixture distribution with leptokurtosis (heavy tails). More generally, mixtures are almost infinitely flexible.

So, just by adding two normal distributions together, we can develop a large number of
interesting distributions. For example, if we combine two normal distributions with the same
mean but different variances, we can get a symmetrical mixture distribution that displays excess
kurtosis.

By shifting the mean of one distribution, we can also create a distribution with positive or negative skew. Finally, if we move the means far enough apart, the resulting mixture distribution will be bimodal. The exhibit below shows such a PDF with two distinct maxima.


Questions & Answers:


309.1. Next month, the short interest rate will be either 200 basis points with probability of
28.0%, or 300 basis points. What is nearest to the implied rate volatility?
a) 17.30 bps
b) 44.90 bps
c) 83.50 bps
d) 117.70 bps

309.2. At the start of the year, a stock price is $100.00. A twelve-step binomial model describes
the stock price evolution such that each month the price will either jump up from
S(t) to S(t)*u with 60.0% probability or down to S(t)*d with 40.0% probability. The up jump (u) =
1.1 and the down jump (d) = 1/1.1; note these (u) and (d) parameters correspond to an annual
volatility of about 33% as exp[33%*SQRT(1/12)] ~= 1.10. At the end of the year, which is
nearest to the probability that the stock price will be exactly $121.00?
a) 0.33%
b) 3.49%
c) 12.25%
d) 22.70%

310.1. A large bond portfolio contains 100 obligors. The average default rate is 4.0%. Analyst
Joe assumes defaults follow a Poisson distribution but his colleague Mary assumes the defaults
instead follow a binomial distribution. If they each compute the probability of exactly four (4)
defaults, which is nearest to the difference between their computed probabilities?
a) 0.40%
b) 1.83%
c) 3.55%
d) 7.06%


311.1. George the analyst creates a model, displayed below, which generates two series of
random but correlated asset returns. Both asset prices begin at a price of $10.00 with a periodic
mean return of +1.0%. Series #1 has periodic volatility of 10.0% while Series #2 has periodic
volatility of 20.0%. The desired correlation of the simulated series is 0.80. Each series steps
according to a discrete version of geometric Brownian motion (GBM) where price(t+1) = price (t)
+ price(t)*(mean + volatility*standard random normal). Two standard random normals are
generated at each step, X(1) and X(2), but X(2) is transformed into correlated Y(1) with Y(1) =
rho*X(1) + SQRT(1 - rho^2)*X(2), such that Y(1) informs Series #2. The first five steps are
displayed below:

At the fourth step, when the Series #1 Price = $10.81, what is Y(1) and the Series #2 Price [at
Step 4], both of which cells are highlighted in orange above?
a) -0.27 and $9.08
b) +0.55 and $9.85
c) +0.99 and $11.33
d) +2.06 and $12.40

312.1. A random variable X has a density function that is a normal mixture with two independent
components: the first normal component has an expectation (mean) of 4.0 with variance of 16.0;
the second normal component has an expectation (mean) of 6.0 with variance of 9.0. The
probability weight on the first component is 0.30 such that the weight on the second component
is 0.70. What is the probability that X is less than zero; i.e., Prob [X<0]?
a) 0.015%
b) 1.333%
c) 6.352%
d) 12.487%


Answers:

309.1. B. 44.90 bps

Expected rate = 28%*200 + 72%*300 = 272, and
Variance = (200-272)^2*28% + (300-272)^2*72% = 2,016.0 bps^2, such that
Standard deviation = SQRT(2,016) = 44.90 basis points.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-309-probability-distributions-i-miller-chapter-4.7025
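The two-point mean and variance arithmetic above can be replicated in a few lines (a sketch; the variable names are ours):

```python
from math import sqrt

# Two-point rate distribution: 200 bps with probability 28%, 300 bps with probability 72%
p, r1, r2 = 0.28, 200.0, 300.0
mean = p * r1 + (1 - p) * r2
var = p * (r1 - mean) ** 2 + (1 - p) * (r2 - mean) ** 2
vol = sqrt(var)
print(round(mean), round(var, 1), round(vol, 2))  # -> 272 2016.0 44.9
```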

309.2. D. 22.70%

There are 13 outcomes at the end of the 12-step binomial, with $100 as the outcome that must
correspond to six up jumps and six down jumps. Therefore, $121.0 must be the outcome due to
seven up jumps and five down jumps: $100*1.1^7*(1/1.1)^5 = $121.00
Such that we want the binomial probability given by:
Binomial Prob [X = 7 | n = 12, p = 60%] = 22.70%.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-309-probability-distributions-i-miller-chapter-4.7025
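A quick check of the binomial probability above (a sketch; `p_up` is our name for the answer):

```python
from math import comb

# P[X = 7 | n = 12, p = 0.60]: seven up-jumps and five down-jumps reach $121.00
p_up = comb(12, 7) * 0.60 ** 7 * 0.40 ** 5
print(f"{p_up:.2%}")  # -> 22.70%
```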

310.1. A. 0.40%

Binomial Prob [X = 4 | n = 100 and p = 4%] = 19.939%, and
Poisson Prob [X = 4 | lambda = 100*4%] = 19.537%, such that difference = 0.4022%.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-310-probability-distributions-ii-miller-chapter-4.7036
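The binomial-versus-Poisson comparison above can be reproduced directly (a sketch):

```python
from math import comb, exp, factorial

n, p, k = 100, 0.04, 4
lam = n * p  # Poisson intensity matched to the binomial mean

binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
poisson = exp(-lam) * lam ** k / factorial(k)
diff = binom - poisson

print(f"{binom:.3%} {poisson:.3%}")  # -> 19.939% 19.537%
```

The Poisson is the limiting case of the binomial as n grows with lambda = n*p held fixed, which is why the two probabilities are already close at n = 100.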

311.1. C. 0.99 and $11.33

Correlated Series #2 = 0.80*1.02 + SQRT(1-0.80^2)*0.28 = 0.99; i.e., the standard random
normal 0.28 is transformed into another, correlated standard random normal of 0.99.
The Series #2 Price [Step 4] = $9.38 + 9.38*(1% + 0.99*20%) = $11.33

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-311-probability-distributions-iii-miller.7066
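The correlation transformation above can be replicated as follows. Note that the displayed inputs (1.02, 0.28, $9.38) are themselves rounded, so the last digits differ slightly from the quoted 0.99 and $11.33:

```python
from math import sqrt

rho = 0.80
x1, x2 = 1.02, 0.28  # the two independent standard normal draws at this step (as displayed)
y1 = rho * x1 + sqrt(1 - rho ** 2) * x2  # transformed, correlated standard normal

prev = 9.38                               # Series #2 price at the prior step (as displayed)
price = prev + prev * (0.01 + y1 * 0.20)  # discrete GBM step: S + S*(mean + vol*normal)
print(round(y1, 3), round(price, 2))      # -> 0.984 11.32 (quoted answer rounds to 0.99, $11.33)
```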

312.1. C. 6.352%

Because the normal mixture distribution function is a probability-weighted sum of its component
distribution functions, it is true that:
Prob(mixture)[X < 0] = 0.30*Prob(1st component)[X < 0] + 0.70*Prob(2nd component)[X < 0].
In regard to the 1st component, Z = (0-4)/sqrt(16) = -4/4 = -1.0.
In regard to the 2nd component, Z = (0-6)/sqrt(9) = -6/3 = -2.0. Such that:
Prob(mixture)[X<0] = 0.30*Prob[Z < -1.0] + 0.70*Prob[Z < -2.0],
Prob(mixture)[X<0] = 0.30*15.87% + 0.70*2.28% = 6.352%.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-312-mixture-distributions.7103
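The mixture probability above can be verified with the error function, which gives the standard normal CDF without any external libraries (a sketch):

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Probability-weighted sum of the component CDFs, evaluated at x = 0
prob = 0.30 * norm_cdf((0 - 4) / sqrt(16)) + 0.70 * norm_cdf((0 - 6) / sqrt(9))
print(f"{prob:.3%}")  # -> 6.352%
```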


Appendix: Additional distributions that are not in the syllabus but occasionally relevant
The following distributions are not explicitly assigned in this chapter, but have
historically been relevant to the FRM, to various degrees.

Exponential

The exponential distribution is popular in queuing theory. It is used to model the time we have
to wait until a certain event takes place.

[Figure: Exponential density curves for parameter values 0.5, 1.0, and 2.0]

According to the text, examples include “the time until the next client enters the store, the time
until a certain company defaults or the time until some machine has a defect.” The exponential
density is nonzero only for positive values:

f(x) = λ*exp(-λ*x), where λ = 1/β and x > 0
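A minimal sketch of the exponential distribution, assuming an illustrative rate of λ = 0.5 (i.e., β = 2.0): it evaluates the CDF and draws waiting times by inverse-transform sampling:

```python
import random
from math import exp, log

random.seed(1)
beta = 2.0        # mean waiting time, so the rate is lam = 1/beta = 0.5
lam = 1.0 / beta

def exp_cdf(t, lam):
    # P[T <= t] = 1 - exp(-lam * t), for t > 0
    return 1.0 - exp(-lam * t)

# Inverse-transform sampling: if U ~ Uniform(0,1), then T = -ln(U)/lam is exponential
draws = [-log(random.random()) / lam for _ in range(200_000)]
m = sum(draws) / len(draws)
print(round(m, 1))  # -> 2.0 (the sample mean recovers beta)
```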

Weibull

Weibull is a generalized exponential distribution; i.e., the exponential is a special case of the
Weibull where the alpha parameter equals 1.0.

F(x) = 1 − exp[−(x/β)^α], x > 0

[Figure: Weibull density curves for (α=0.5, β=1), (α=2, β=1), and (α=2, β=2)]


The main difference between the exponential distribution and the Weibull is that, under the
Weibull, the default intensity depends upon the point in time t under consideration. This allows
us to model the aging effect or teething troubles:

 For α > 1—also called the “light-tailed” case—the default intensity is monotonically
increasing with increasing time, which is useful for modeling the “aging effect” as it
happens for machines: The default intensity of a 20-year old machine is higher than the
one of a 2-year old machine.
 For α < 1—the “heavy-tailed” case—the default intensity decreases with increasing
time. That means we have the effect of “teething troubles,” a figurative explanation for
the effect that after some trouble at the beginning things work well, as it is known from
new cars. The credit spread on noninvestment-grade corporate bonds provides a good
example: Credit spreads usually decline with maturity. The credit spread reflects the
default intensity and, thus, we have the effect of “teething troubles.” If the company
survives the next two years, it will survive for a longer time as well, which explains the
decreasing credit spread.
 For α = 1, Weibull distribution reduces to an exponential distribution with parameter β.
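The bullets above can be checked numerically. The sketch below approximates the Weibull default intensity (hazard rate) with a finite difference of the CDF and confirms that it rises with time when α > 1 and falls when α < 1; the parameter values are illustrative:

```python
from math import exp

def weibull_cdf(x, alpha, beta):
    # F(x) = 1 - exp(-(x/beta)^alpha), x > 0
    return 1.0 - exp(-(x / beta) ** alpha)

def hazard(x, alpha, beta, h=1e-6):
    # Default intensity = f(x) / [1 - F(x)], with f approximated by a finite difference
    f = (weibull_cdf(x + h, alpha, beta) - weibull_cdf(x, alpha, beta)) / h
    return f / (1.0 - weibull_cdf(x, alpha, beta))

aging = hazard(2.0, 2.0, 1.0) > hazard(1.0, 2.0, 1.0)     # alpha > 1: intensity rises
teething = hazard(2.0, 0.5, 1.0) < hazard(1.0, 0.5, 1.0)  # alpha < 1: intensity falls
print(aging, teething)  # -> True True
```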

Gamma distribution

The family of Gamma distributions forms a two-parameter probability distribution family with pdf:

f(x) = [1/(β^α * Γ(α))] * x^(α−1) * exp(−x/β), x > 0
[Figure: Gamma density curves for (α=1, β=1), (α=2, β=0.5), and (α=4, β=0.25)]

 For alpha = 1, the Gamma distribution becomes the exponential distribution
 For alpha = k/2 and beta = 2, the Gamma distribution becomes the chi-square distribution (with k degrees of freedom)
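These special cases can be confirmed by direct evaluation of the density. A sketch verifying the alpha = 1 (exponential) case, with illustrative parameter values:

```python
from math import exp, gamma

def gamma_pdf(x, alpha, beta):
    # f(x) = x^(alpha-1) * exp(-x/beta) / (beta^alpha * Gamma(alpha)), x > 0
    return x ** (alpha - 1) * exp(-x / beta) / (beta ** alpha * gamma(alpha))

# With alpha = 1 the density collapses to the exponential: lam * exp(-lam*x), lam = 1/beta
x, beta = 1.5, 2.0
lam = 1.0 / beta
print(abs(gamma_pdf(x, 1.0, beta) - lam * exp(-lam * x)) < 1e-12)  # -> True
```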


Logistic

A logistic distribution has heavy tails.

[Figure: Logistic density curves for (α=0, β=1), (α=2, β=1), and (α=0, β=3), compared to N(0,1)]

Extreme Value Theory

Measures of central tendency and dispersion (variance, volatility) are impacted more by
observations near the mean than by outliers. The problem is that, typically, we are concerned with
outliers; we want to size the likelihood and magnitude of low frequency, high severity (LFHS)
events. Extreme value theory (EVT) solves this problem by fitting a separate distribution to
the extreme tail loss. EVT uses only the tail of the distribution, not the entire dataset.

In applying extreme value theory (EVT), the two general approaches are:
 Block maxima (BM): The classic approach
 Peaks over threshold (POT): The modern approach that is often preferred.


Block maxima

The dataset is parsed into (m) identical, consecutive and non-overlapping periods called blocks.
The length of the block should be greater than the periodicity; e.g., if the returns are daily,
blocks should be weekly or longer. Block maxima partitions the dataset into time-based intervals. It
requires that observations be independently and identically distributed (i.i.d.).

The generalized extreme value (GEV) distribution is fit to the block maxima. It is given by:

H(y) = exp[−(1 + ξy)^(−1/ξ)]   if ξ ≠ 0
H(y) = exp(−e^(−y))            if ξ = 0

The  (xi) parameter is the “tail index;” it represents the fatness of the tails. In this expression, a
lower tail index corresponds to fatter tails.

[Figure: Generalized Extreme Value (GEV) density]
Per the (unassigned) Jorion reading on EVT, the key thing to know here is that (1)
among the three classes of GEV distributions (Gumbel, Frechet, and Weibull), we only
care about the Frechet because it fits to fat-tailed distributions, and (2) the shape
parameter determines the fatness of the tails (higher shape → fatter tails).
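The role of the shape parameter can be checked with a sketch of the standardized GEV distribution function; it confirms that a larger positive tail index ξ (the Frechet case) leaves more probability mass in the right tail:

```python
from math import exp

def gev_cdf(y, xi):
    # Standardized GEV: H(y) = exp(-(1 + xi*y)^(-1/xi)) if xi != 0, else exp(-exp(-y))
    if xi == 0.0:
        return exp(-exp(-y))
    return exp(-(1.0 + xi * y) ** (-1.0 / xi))

# A larger positive tail index leaves more mass in the right tail
tail_fat = 1.0 - gev_cdf(5.0, 0.5)   # xi = 0.5: Frechet (heavy tail)
tail_thin = 1.0 - gev_cdf(5.0, 0.0)  # xi = 0: Gumbel (thin tail)
print(tail_fat > tail_thin)  # -> True
```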


Peaks over threshold (POT)

Peaks over threshold (POT) collects the dataset of losses above (or in excess of) some
threshold.

The cumulative distribution function here refers to the probability that the “excess loss” (i.e., the
loss, X, in excess of the threshold, u) is less than some value, y, conditional on the loss
exceeding the threshold:

G(y) = P(X − u ≤ y | X > u)

[Figure: loss distribution with threshold u marked on the x-axis]

Excess losses over the threshold are modeled with the generalized Pareto distribution (GPD):

G(ξ,β)(x) = 1 − (1 + ξx/β)^(−1/ξ)   if ξ ≠ 0
G(ξ,β)(x) = 1 − exp(−x/β)           if ξ = 0


[Figure: Generalized Pareto Distribution (GPD) density]

Block maxima is time-based (i.e., blocks of time), traditional, less sophisticated and
more restrictive in its assumptions (i.i.d.) while peaks over threshold (POT) is more
modern, has at least three variations (semi-parametric, unconditional parametric and
conditional parametric) and is more flexible.

EVT Highlights: Both GEV and GPD are parametric distributions used to model heavy-tails.
GEV (Block Maxima)
 Has three parameters: location, scale and tail index
 If tail > 0: Frechet
GPD (peaks over threshold, POT)
 Has two parameters: scale and tail (or shape)
 But must select threshold (u)
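A sketch of the GPD distribution function; it confirms that the ξ = 0 branch is the limiting case of the general branch and that a positive ξ fattens the tail (parameter values are illustrative):

```python
from math import exp

def gpd_cdf(x, xi, beta):
    # G(x) = 1 - (1 + xi*x/beta)^(-1/xi) if xi != 0, else 1 - exp(-x/beta)
    if xi == 0.0:
        return 1.0 - exp(-x / beta)
    return 1.0 - (1.0 + xi * x / beta) ** (-1.0 / xi)

# As xi -> 0, the general branch converges to the exponential branch
converges = abs(gpd_cdf(1.0, 1e-9, 1.0) - gpd_cdf(1.0, 0.0, 1.0)) < 1e-6
# A positive xi fattens the tail: more mass beyond a large threshold
fatter = (1.0 - gpd_cdf(10.0, 0.5, 1.0)) > (1.0 - gpd_cdf(10.0, 0.0, 1.0))
print(converges, fatter)  # -> True True
```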
