
CERTIFICATE IN

FINANCE

CQF

Certificate in Quantitative Finance



GLOBAL STANDARD IN FINANCIAL ENGINEERING



Certificate in Quantitative
Finance
Probability and Statistics

June 2011


1 Probability
1.1 Preliminaries
• An experiment is a repeatable process that gives
rise to a number of outcomes.
• An event is a collection (or set) of one or more out-
comes.
• A sample space is the set of all possible outcomes
of an experiment, often denoted Ω.

Example

In an experiment a dice is rolled and the number appearing on top is recorded.

Thus
Ω = {1, 2, 3, 4, 5, 6}
If E1 , E2 , E3 are the events even, odd and prime occur-
ring, then

E1 ={2, 4, 6}
E2 ={1, 3, 5}
E3 ={2, 3, 5}


1.1.1 Probability Scale


The probability of an event E occurring, written P(E), is less than or equal to 1 and greater than or equal to 0:

0 ≤ P(E) ≤ 1

1.1.2 Probability of an Event


The probability of an event occurring is defined as:

P(E) = (the number of ways the event can occur) / (total number of outcomes)

Example

A fair dice is tossed. The event A is defined as 'the number obtained is a multiple of 3'. Determine P(A).

Ω = {1, 2, 3, 4, 5, 6}
A = {3, 6}

∴ P(A) = 2/6

1.1.3 The Complementary Event E′


An event E either occurs or it does not. If E is the event then
E′ is the complementary event, i.e. not E, where
P(E′) = 1 − P(E)


1.2 Probability Diagrams


It is useful to represent problems diagrammatically. Three
useful diagrams are:
• Sample space or two way table
• Tree diagram
• Venn diagram

Example

Two dice are thrown and their numbers added together. What is the probability of achieving a total of 8?

P(8) = 5/36
Example

A bag contains 4 red, 5 yellow and 11 blue balls. A ball is pulled out at random, its colour noted and then replaced. What is the probability of picking a red and a blue ball in any order?

P(Red and Blue) or P(Blue and Red) =

(4/20 × 11/20) + (11/20 × 4/20) = 11/50

Venn Diagram

A Venn diagram is a way of representing data sets or events. Consider two events A and B. A Venn diagram to represent these events could be:

• A∪B ”A or B”


• A∩B ”A and B”

Addition Rule:
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
or
P (A ∩ B) = P (A) + P (B) − P (A ∪ B)

Example

In a class of 30 students, 7 are in the choir, 5 are in


the school band and 2 students are in the choir and the
school band. A student is chosen at random from the
class. Find:
a) The probability the student is not in the band
b) The probability the student is not in the choir nor in
the band


P(not in band) = (5 + 20)/30 = 25/30 = 5/6

P(not in either) = 20/30 = 2/3
Example

A vet surveys 100 of her clients and finds that:


(i) 25 own dogs
(ii) 53 own cats
(iii) 40 own tropical fish
(iv) 15 own dogs and cats
(v) 10 own cats and tropical fish


(vi) 11 own dogs and tropical fish


(vii) 7 own dogs, cats and tropical fish

If she picks a client at random, Find:


a) P(Owns dogs only)
b) P(Does not own tropical fish)
c) P(Does not own dogs, cats or tropical fish)

P(Dogs only) = 6/100

P(Does not own tropical fish) = (6 + 8 + 35 + 11)/100 = 60/100

P(Does not own dogs, cats or tropical fish) = 11/100


1.3 Conditional Probability


The probability of an event B may be different if you
know that a dependent event A has already occurred.

Example

Consider a school which has 100 students in its sixth


form. 50 students study mathematics, 29 study biology
and 13 study both subjects. You walk into a biology class
and select a student at random. What is the probability
that this student also studies mathematics?

P(study maths given they study biology) = P(M|B) = 13/29

In general, we have:


P(A|B) = P(A ∩ B) / P(B)
or, Multiplication Rule:

P (A ∩ B) = P (A|B) × P (B)
Example

You are dealt exactly two playing cards from a well shuffled standard 52 card deck. What is the probability that both your cards are Kings?

Tree Diagram!

P(K ∩ K) = 4/52 × 3/51 = 1/221 ≈ 0.5%

or

P(K ∩ K) = P(2nd is King | 1st is King) × P(1st is King) = 3/51 × 4/52

We know,
P (A ∩ B) = P (B ∩ A)


so
P (A ∩ B) = P (A|B) × P (B)
P (B ∩ A) = P (B|A) × P (A)

i.e.
P (A|B) × P (B) = P (B|A) × P (A)
or

Bayes’ Theorem:

P(B|A) = [P(A|B) × P(B)] / P(A)
Example

You have 10 coins in a bag: 9 are fair and 1 is double-headed. You pull a coin out of the bag at random, without examining it, and toss it five times. Find:
1. the probability of getting 5 heads in a row;
2. the probability that, given you got 5 heads, you picked the double-headed coin.


Let N be the event 'picked a fair (normal) coin' and H the event 'picked the double-headed coin'.

P(5 heads) = P(5 heads | N) × P(N) + P(5 heads | H) × P(H)
           = (1/32 × 9/10) + (1 × 1/10)
           = 41/320
           ≈ 13%

P(H | 5 heads) = P(5 heads | H) × P(H) / P(5 heads)
               = (1 × 1/10) / (41/320)
               = 320/410
               ≈ 78%
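The same numbers can be reproduced directly in code. Below is a minimal Python sketch using exact fractions; the variable names are purely illustrative:

```python
from fractions import Fraction

p_fair = Fraction(9, 10)              # P(N): picked a fair coin
p_double = Fraction(1, 10)            # P(H): picked the double-headed coin

p5_given_fair = Fraction(1, 2) ** 5   # P(5 heads | fair coin) = 1/32
p5_given_double = Fraction(1, 1)      # P(5 heads | double-headed coin) = 1

# Total probability of 5 heads in a row
p5 = p5_given_fair * p_fair + p5_given_double * p_double
print(p5)                             # 41/320, roughly 13%

# Bayes' theorem: P(double-headed | 5 heads)
posterior = p5_given_double * p_double / p5
print(posterior)                      # 32/41, roughly 78%
```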


1.4 Mutually exclusive and Independent events
When events cannot happen at the same time, i.e. they have no outcomes in common, they are called mutually exclusive. If this is the case, then
P (A ∩ B) = 0

and the addition rule becomes


P (A ∪ B) = P (A) + P (B)
Example

Two dice are rolled; event A is 'the sum of the outcomes on both dice is 5' and event B is 'the outcome on each dice is the same'. A sum of 5 cannot be obtained when both dice show the same number, so A and B are mutually exclusive.

When one event has no effect on another event, the two events are said to be independent, i.e.

P (A|B) = P (A)
and the multiplication rule becomes

P (A ∩ B) = P (A) × P (B)


Example

A red dice and a blue dice are rolled. If event A is 'the outcome on the red dice is 3' and event B is 'the outcome on the blue dice is 3', then events A and B are said to be independent.

1.5 Two famous problems

• Birthday Problem - What is the probability that at least 2 people in a group share the same birthday? (A simulation sketch is given below.)

• Monty Hall Game Show - Would you swap?
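For the birthday problem, a short Monte Carlo sketch (pure Python; the trial count is an arbitrary choice) illustrates the well-known result that with only 23 people the probability of a shared birthday is already about 50%:

```python
import random

def shared_birthday(n_people, trials=100_000):
    """Estimate P(at least two of n_people share a birthday)."""
    hits = 0
    for _ in range(trials):
        days = [random.randint(1, 365) for _ in range(n_people)]
        if len(set(days)) < n_people:   # a repeated day means a shared birthday
            hits += 1
    return hits / trials

print(shared_birthday(23))   # roughly 0.507
```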


1.6 Random Variables


1.6.1 Notation
Random Variables X, Y, Z
Observed Variables x, y, z

1.6.2 Definition
Outcomes of experiments are not always numbers, e.g.
two heads appearing; picking an ace from a deck of cards.
We need some way of assigning real numbers to each ran-
dom event. Random variables assign numbers to events.
Thus a random variable (RV) X is a function which
maps from the sample space Ω to the number line.

Example

Let X = the number facing up when a fair dice is rolled,

or let X represent the outcome of a coin toss, where

X = 1 if heads
X = 0 if tails

1.6.3 Types of Random variable


1. Discrete - countable outcomes, e.g. the roll of a dice, rain or no rain
2. Continuous - outcomes can take any value in a range, e.g. the exact amount of rain in mm


1.7 Probability Distributions


Whether you are dealing with a discrete or a continuous random variable determines how you define your probability distribution.

1.7.1 Discrete distributions


When dealing with a discrete random variable we define the probability distribution using a probability mass function, or simply a probability function.

Example
The RV X is defined as 'the sum of scores shown by two fair six-sided dice'. Find the probability distribution of X.
A sample space diagram for the experiment is:

The distribution can be tabulated as:

x        2     3     4     5     6     7     8     9     10    11    12
P(X=x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36


The distribution can also be represented on a graph, as a bar chart of P(X = x) against x.
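The table above can also be checked by enumerating the 36 equally likely outcomes; a small Python sketch:

```python
from collections import Counter
from fractions import Fraction

# Enumerate the sample space of two fair dice and count each total
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

for total in sorted(counts):
    print(total, Fraction(counts[total], 36))   # e.g. 8 -> 5/36
```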

1.7.2 Continuous Distributions


As continuous random variables can take any value, i.e an
infinite number of values, we must define our probability
distribution differently.
For a continuous RV the probability of getting a spe-
cific value is zero, i.e
P (X = x) = 0
and so just as we go from bar charts to histograms when
representing discrete and continuous data, we must use a
probability density function (PDF) when describing the
probability distribution of a continuous RV.


P(a < X < b) = ∫_a^b f(x) dx

Properties of a PDF:

• f(x) ≥ 0, since probabilities are never negative

• ∫_{−∞}^{+∞} f(x) dx = 1

• P(a < X < b) = ∫_a^b f(x) dx

Example

The random variable X has the probability density function:

f(x) = k          for 1 < x < 2
     = k(x − 1)   for 2 ≤ x ≤ 4
     = 0          otherwise


a) Find k and sketch the probability distribution.

b) Find P(X ≤ 1.5).

a)

∫_{−∞}^{+∞} f(x) dx = 1

1 = ∫_1^2 k dx + ∫_2^4 k(x − 1) dx
1 = [kx]_1^2 + [kx²/2 − kx]_2^4
1 = (2k − k) + [(8k − 4k) − (2k − 2k)]
1 = 5k
∴ k = 1/5


b)

P(X ≤ 1.5) = ∫_1^{1.5} (1/5) dx
           = [x/5]_1^{1.5}
           = 1/10

1.8 Cumulative Distribution Function


The CDF is an alternative function for summarising a
probability distribution. It provides a formula for P (X ≤
x), i.e.
F (x) = P (X ≤ x)

1.8.1 Discrete Random variables


Example

Consider the probability distribution

x        1     2     3     4     5     6
P(X=x)   1/2   1/4   1/8   1/16  1/32  1/32

F(x) = P(X ≤ x)
Find:
a) F (2) and
b) F (4.5)


a)

F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2)
     = 1/2 + 1/4
     = 3/4

b)

F(4.5) = P(X ≤ 4.5) = P(X ≤ 4)
       = 1/16 + 1/8 + 1/4 + 1/2
       = 15/16

1.8.2 Continuous Random Variable


For continuous random variables

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(s) ds

or

f(x) = d/dx F(x)
Example

A PDF is defined as

f(x) = (3/11)(4 − x²)   for 0 ≤ x ≤ 1
     = 0                otherwise

Find the CDF.


Consider:
From −∞ to 0: F (x) = 0

From 1 to ∞: F (x) = 1

From 0 to 1 :


F(x) = ∫_0^x (3/11)(4 − s²) ds
     = (3/11)[4s − s³/3]_0^x
     = (3/11)(4x − x³/3)

i.e.

F(x) = 0                      for x < 0
     = (3/11)(4x − x³/3)      for 0 ≤ x ≤ 1
     = 1                      for x > 1

Example

A CDF is defined as:

F(x) = 0                        for x < 1
     = (1/12)(x² + 2x − 3)      for 1 ≤ x ≤ 3
     = 1                        for x > 3

a) Find P(1.5 ≤ X ≤ 2.5)

b) Find f(x)

a)

P(1.5 ≤ X ≤ 2.5) = F(2.5) − F(1.5)
                 = (1/12)(2.5² + 2(2.5) − 3) − (1/12)(1.5² + 2(1.5) − 3)
                 = 0.5


b)

f(x) = d/dx F(x)

f(x) = (1/6)(x + 1)   for 1 ≤ x ≤ 3
     = 0              otherwise

1.9 Expectation and Variance


The expectation or expected value of a random variable
X is the mean µ (measure of center), i.e.
E(X) = µ
The variance of a random variable X is a measure of
dispersion and is labeled σ², i.e.
V ar(X) = σ 2

1.9.1 Discrete Random variables


For a discrete random variable

E(X) = Σ_{all x} x P(X = x)
Example

Consider the probability distribution

x        1    2    3    4
P(X=x)   1/2  1/4  1/8  1/8
then


E(X) = (1 × 1/2) + (2 × 1/4) + (3 × 1/8) + (4 × 1/8)
     = 15/8


Aside
What is Variance?

Variance = Σ(x − µ)²/n = Σx²/n − µ²

Standard deviation = √(Σ(x − µ)²/n) = √(Σx²/n − µ²)


For a discrete random variable

Var(X) = E(X²) − [E(X)]²

Now, for the previous example,

E(X²) = 1² × 1/2 + 2² × 1/4 + 3² × 1/8 + 4² × 1/8 = 37/8
E(X) = 15/8

∴ Var(X) = 37/8 − (15/8)² = 71/64 = 1.10937...

Standard Deviation = 1.05 (3 s.f.)
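These hand calculations are easy to verify in code; a minimal Python sketch for the distribution above:

```python
from fractions import Fraction

dist = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

mean = sum(x * p for x, p in dist.items())              # E(X) = 15/8
second_moment = sum(x**2 * p for x, p in dist.items())  # E(X^2) = 37/8
variance = second_moment - mean**2                      # 71/64

print(mean, variance, float(variance) ** 0.5)           # 15/8, 71/64, ~1.05
```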

1.9.2 Continuous Random Variables


For a continuous random variable

E(X) = ∫_{all x} x f(x) dx

and

Var(X) = E(X²) − [E(X)]²
       = ∫_{all x} x² f(x) dx − ( ∫_{all x} x f(x) dx )²

Example

If

f(x) = (3/32)(4x − x²)   for 0 ≤ x ≤ 4
     = 0                 otherwise


Find E(X) and V ar(X)

E(X) = ∫_0^4 x · (3/32)(4x − x²) dx
     = (3/32) ∫_0^4 (4x² − x³) dx
     = (3/32) [4x³/3 − x⁴/4]_0^4
     = (3/32) [(4(4)³/3 − 4⁴/4) − 0]
     = 2

Var(X) = E(X²) − [E(X)]²
       = ∫_0^4 x² · (3/32)(4x − x²) dx − 2²
       = (3/32) [4x⁴/4 − x⁵/5]_0^4 − 4
       = (3/32) (4⁴ − 4⁵/5) − 4
       = 4/5
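The same moments can be checked numerically. The sketch below uses a simple midpoint rule written in pure Python (the step count is an arbitrary choice), rather than any particular integration library:

```python
def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 3 / 32 * (4 * x - x ** 2)            # the PDF on [0, 4]

mean = integrate(lambda x: x * f(x), 0, 4)         # ~2.0
second = integrate(lambda x: x ** 2 * f(x), 0, 4)  # ~4.8
print(mean, second - mean ** 2)                    # E(X) ~2.0, Var(X) ~0.8
```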


1.10 Expectation Algebra


Suppose X and Y are random variables and a and b
are constants. Then:
• E(X + a) = E(X) + a
• E(aX) = aE(X)
• E(X + Y ) = E(X) + E(Y )
• V ar(X + a) = V ar(X)
• V ar(aX) = a2 V ar(X)
• V ar(b) = 0
If X and Y are independent, then
• E(XY ) = E(X)E(Y )
• V ar(X + Y ) = V ar(X) + V ar(Y )


1.11 Moments
The first moment is E(X) = µ.

The nth moment is E(Xⁿ) = ∫_{all x} xⁿ f(x) dx.

We are often interested in the moments about the mean, i.e. central moments.
The 2nd central moment about the mean is called the variance: E[(X − µ)²] = σ².

The 3rd central moment is E[(X − µ)³].

So that we can compare with other distributions, we scale by σ³ and define Skewness:

Skewness = E[(X − µ)³] / σ³
This is a measure of asymmetry of a distribution. A
distribution which is symmetric has skew of 0. Negative
values of the skewness indicate data that are skewed to
the left, whereas positive values of skewness indicate data
skewed to the right.


The 4th normalised central moment is called Kurtosis and is defined as

Kurtosis = E[(X − µ)⁴] / σ⁴

A normal random variable has Kurtosis of 3 irrespective of its mean and standard deviation. Often when comparing a distribution to the normal distribution, the measure of excess Kurtosis is used, i.e. the Kurtosis of the distribution − 3.

Intuition to help understand Kurtosis

Consider the following data and the effect on the Kurtosis of a continuous distribution.

Data points within one standard deviation of the mean (|xi − µ| < σ):

The contribution to the Kurtosis from all data points within 1 standard deviation of the mean is low, since

(xi − µ)⁴ / σ⁴ < 1

e.g. consider x1 = µ + σ/2, then

(x1 − µ)⁴ / σ⁴ = (σ/2)⁴ / σ⁴ = (1/2)⁴ = 1/16
Data points more than one standard deviation from the mean (|xi − µ| > σ):

The contribution to the Kurtosis from data points more than 1 standard deviation from the mean will be greater the further they are from the mean, since

(xi − µ)⁴ / σ⁴ > 1

e.g. consider x1 = µ + 3σ, then

(x1 − µ)⁴ / σ⁴ = (3σ)⁴ / σ⁴ = 81

This shows that a data point 3 standard deviations from the mean has a much greater effect on the Kurtosis than data close to the mean value. Therefore, if the distribution has more data in the tails, i.e. fat tails, then it will have a larger Kurtosis.
Thus Kurtosis is often seen as a measure of how 'fat' the tails of a distribution are.

If a random variable has Kurtosis greater than 3 it is called leptokurtic; if it has Kurtosis less than 3 it is called platykurtic.

Leptokurtic distributions are associated with PDFs that are simultaneously peaked and have fat tails.


1.12 Covariance
The covariance is useful in studying the statistical dependence between two random variables. If X and Y are random variables, then their covariance is defined as:

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
          = E(XY) − E(X)E(Y)
Intuition

Imagine we have a single sample of X and Y, so that:

X = 1, E(X) = 0

Y = 3, E(Y ) = 4
Now
X − E(X) = 1
and
Y − E(Y ) = −1
i.e.
Cov(X, Y ) = −1

So in this sample when X was above its expected value


and Y was below its expected value we get a negative
number.
Now if we do this for every X and Y and average
this product, we should find the Covariance is negative.
What about if:


X = 4, E(X) = 0

Y = 7, E(Y ) = 4
Now
X − E(X) = 4
and
Y − E(Y ) = 3
i.e.
Cov(X, Y ) = 12
i.e positive

We can now define an important dimensionless quantity (used in finance) called the correlation coefficient, denoted ρXY = ρ(X, Y), where

ρXY = Cov(X, Y) / (σX σY),   −1 ≤ ρXY ≤ 1

If ρXY = −1 =⇒ perfect negative correlation

If ρXY = 1 =⇒ perfect positive correlation

If ρXY = 0 =⇒ uncorrelated
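For two observed data samples, the covariance and correlation coefficient can be computed as below; a pure Python sketch in which the sample data are made up purely for illustration:

```python
def covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    sx = covariance(xs, xs) ** 0.5     # sigma_X
    sy = covariance(ys, ys) ** 0.5     # sigma_Y
    return covariance(xs, ys) / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
print(covariance(xs, ys), correlation(xs, ys))   # positive, and between -1 and 1
```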


1.13 Important Distributions


1.13.1 Binomial Distribution
The Binomial distribution is a discrete distribution and
can be used if the following are true.
• A fixed number of trials, n
• Trials are independent
• Probability of success is a constant p

We say X ∼ B(n, p) and

P(X = x) = C(n, x) p^x (1 − p)^(n−x)

where

C(n, x) = n! / (x!(n − x)!)
Example

If X ∼ B(10, 0.23), find

a) P (X = 3)
b) P (X < 4)

a)

P(X = 3) = C(10, 3) (0.23)³ (0.77)⁷ = 0.2343


b)

P(X < 4) = P(X ≤ 3)
         = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
         = C(10, 0)(0.23)⁰(0.77)¹⁰ + C(10, 1)(0.23)¹(0.77)⁹
           + C(10, 2)(0.23)²(0.77)⁸ + C(10, 3)(0.23)³(0.77)⁷
         = 0.821 (3 d.p.)
Example

Paul rolls a standard fair cubical die 8 times. What is the probability that he gets 2 sixes?

Let X be the random variable equal to the number of 6's obtained, i.e. X ∼ B(8, 1/6).

P(X = 2) = C(8, 2) (1/6)² (5/6)⁶ = 0.2605 (4 d.p.)

It can be shown that for a binomial distribution where X ∼ B(n, p)

E(X) = np

and

Var(X) = np(1 − p)
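The binomial mass function translates directly into code; a minimal Python sketch reproducing the examples above (math.comb requires Python 3.8+):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# X ~ B(10, 0.23)
print(binom_pmf(3, 10, 0.23))                          # ~0.2343
print(sum(binom_pmf(x, 10, 0.23) for x in range(4)))   # P(X < 4) ~0.821

# Paul's dice: X ~ B(8, 1/6)
print(binom_pmf(2, 8, 1 / 6))                          # ~0.2605
```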


1.13.2 Poisson Distribution


The Poisson distribution is a discrete distribution where
the random variable X represents the number of events
that occur ’at random’ in any interval. If X is to have a
Poisson distribution then events must occur
• Singly, i.e. no chance of two events occurring at the
same time
• Independently of each other
• Probability of an event occurring at all points in time
is the same
We say X ∼ P o(λ).

The Poisson distribution has probability function:

P(X = r) = e^(−λ) λ^r / r!,   r = 0, 1, 2, ...

It can be shown that:

E(X) = λ
Var(X) = λ
Example

Between 6pm and 7pm, directory enquiries receives calls at the rate of 2 per minute. Find the probability that:
(i) 4 calls arrive in a randomly chosen minute
(ii) 6 calls arrive in a randomly chosen two-minute period


(i) Let X be the number of calls in 1 minute, so λ = 2, i.e. E(X) = 2, and X ∼ Po(2).

P(X = 4) = e⁻² 2⁴ / 4! = 0.090 (3 d.p.)

(ii) Let Y be the number of calls in 2 minutes, so λ = 4, i.e. E(Y) = 4, and Y ∼ Po(4).

P(Y = 6) = e⁻⁴ 4⁶ / 6! = 0.104 (3 d.p.)
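Again the calculation is easy to reproduce; a minimal Python sketch:

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam ** r / factorial(r)

print(poisson_pmf(4, 2))   # ~0.090  (4 calls in one minute, lambda = 2)
print(poisson_pmf(6, 4))   # ~0.104  (6 calls in a two-minute period, lambda = 4)
```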


1.13.3 Normal Distribution


The Normal distribution is a continuous distribution. This is the most important distribution. If X is a random variable that follows the normal distribution we say:

X ∼ N(µ, σ²)

where

E(X) = µ
Var(X) = σ²

and the PDF is

f(x) = (1/(σ√(2π))) e^(−(x − µ)²/(2σ²))

i.e.

P(X ≤ x) = ∫_{−∞}^{x} (1/(σ√(2π))) e^(−(s − µ)²/(2σ²)) ds

The Normal distribution is symmetric and the area under the graph equals 1, i.e.

∫_{−∞}^{+∞} (1/(σ√(2π))) e^(−(x − µ)²/(2σ²)) dx = 1


To find the probabilities we must integrate under f (x),


this is not easy to do and requires numerical methods.
In order to avoid this numerical calculation we define
a standard normal distribution, for which values have
already been documented.
The Standard Normal distribution is just a transfor-
mation of the Normal distribution.

1.13.4 Standard Normal distribution


We define a standard normal random variable by Z, where Z ∼ N(0, 1), i.e.

E(Z) = 0
Var(Z) = 1

thus the PDF is

φ(z) = (1/√(2π)) e^(−z²/2)

and

Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^(−s²/2) ds


To transform a Normal distribution into a Standard Normal distribution, we use:

Z = (X − µ) / σ
Example

Given X ∼ N(12, 16), find:

a) P(X < 14)
b) P(X > 11)
c) P(13 < X < 15)

a)

Z = (X − µ)/σ = (14 − 12)/4 = 0.5

Therefore we want

P(Z ≤ 0.5) = Φ(0.5) = 0.6915 (from tables)
b)


Z = (11 − 12)/4 = −0.25

Therefore we want P(Z > −0.25), but this is not in the tables. From symmetry this is the same as P(Z < 0.25), i.e. Φ(0.25), thus

P(Z > −0.25) = Φ(0.25) = 0.5987

c)


Z1 = (13 − 12)/4 = 0.25
Z2 = (15 − 12)/4 = 0.75

Therefore

P(0.25 < Z < 0.75) = Φ(0.75) − Φ(0.25)
                   = 0.7734 − 0.5987
                   = 0.1747
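Instead of statistical tables, the standard normal CDF Φ can be evaluated via the error function; a minimal Python sketch reproducing parts a) to c):

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 12, 4          # X ~ N(12, 16), so sigma = 4

print(Phi((14 - mu) / sigma))                           # P(X < 14) ~0.6915
print(1 - Phi((11 - mu) / sigma))                       # P(X > 11) ~0.5987
print(Phi((15 - mu) / sigma) - Phi((13 - mu) / sigma))  # P(13 < X < 15) ~0.1747
```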

1.13.5 Common regions


The percentages of the Normal Distribution lying within
the given number of standard deviations either side of
the mean are approximately:
One Standard Deviation: approximately 68%

Two Standard Deviations: approximately 95%


Three Standard Deviations: approximately 99.7%


1.14 Central Limit Theorem


The Central Limit Theorem states:
Suppose X1 , X2 , ......, Xn are n independent random
variables, each having the same distribution. Then as n
increases, the distributions of
X1 + X2 + ... + Xn

and of

(X1 + X2 + ... + Xn)/n

come increasingly to resemble normal distributions.
Why is this important?
The importance lies in the facts that:
(i) the common distribution of X is not stated - it can be any distribution;
(ii) the resemblance to a normal distribution holds for remarkably small n;
(iii) totals and means are quantities of interest.
If X is a random variable with mean µ and standard deviation σ from an unknown distribution, the central limit theorem states that the distribution of the sample means is approximately Normal.
But what are its mean and variance?
Let us consider the sample mean as another random variable, which we will denote X̄. We know that

X̄ = (X1 + X2 + ... + Xn)/n = (1/n)X1 + (1/n)X2 + ... + (1/n)Xn

We want E(X̄) and Var(X̄).

E(X̄) = E((1/n)X1 + (1/n)X2 + ... + (1/n)Xn)
     = (1/n)E(X1) + (1/n)E(X2) + ... + (1/n)E(Xn)
     = (1/n)µ + (1/n)µ + ... + (1/n)µ
     = n(1/n)µ
     = µ
i.e. the expectation of the sample mean is the popu-
lation mean !

 
Var(X̄) = Var((1/n)X1 + (1/n)X2 + ... + (1/n)Xn)
        = Var((1/n)X1) + Var((1/n)X2) + ... + Var((1/n)Xn)
        = (1/n)²Var(X1) + (1/n)²Var(X2) + ... + (1/n)²Var(Xn)
        = (1/n)²σ² + (1/n)²σ² + ... + (1/n)²σ²
        = n(1/n)²σ²
        = σ²/n
Thus the CLT tells us that, when n is a sufficiently large number of samples,

X̄ ∼ N(µ, σ²/n)

Standardising, we get the equivalent result that

(X̄ − µ) / (σ/√n) ∼ N(0, 1)

This analysis could be repeated for the sum Sn = X1 + X2 + ... + Xn and we would find that

(Sn − nµ) / (σ√n) ∼ N(0, 1)
Example

Consider a 6-sided fair dice. We know that E(X) = 3.5 and Var(X) = 35/12.

Let us now consider an experiment. The experiment


consists of rolling the dice n times and calculating the
average for the experiment. We will run 500 such exper-
iments and record the results in a Histogram.

n=1

In each experiment the dice is rolled once only, this


experiment is then repeated 500 times. The graph below
shows the resulting frequency chart.


This clearly resembles a uniform distribution (as ex-


pected).

Let us now increase the number of rolls, but continue


to carry out 500 experiments each time and see what
happens to the distribution of X̄

n=5


n=10

n=30

We can see that even for small sample sizes (numbers of dice rolls), our resulting distribution begins to look more like a Normal distribution. We can also note that as n increases our distribution begins to narrow, i.e. the variance σ²/n becomes smaller, but the mean remains the same, µ.
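The dice experiment described above is straightforward to reproduce; a Monte Carlo sketch in pure Python (500 experiments, as in the text):

```python
import random
from statistics import mean, variance

def sample_means(n_rolls, n_experiments=500):
    """Average of n_rolls dice throws, repeated n_experiments times."""
    return [mean(random.randint(1, 6) for _ in range(n_rolls))
            for _ in range(n_experiments)]

for n in (1, 5, 10, 30):
    xs = sample_means(n)
    # The mean of the sample means stays near 3.5; their variance shrinks like (35/12)/n
    print(n, round(mean(xs), 2), round(variance(xs), 3))
```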


2 Statistics
2.1 Sampling
So far we have been dealing with populations; however, sometimes the population is too large to analyse and we need to use a sample in order to estimate the population parameters, i.e. the mean and variance.

Consider a population of N data points and a sample


taken from this population of n data points.

We know that the mean and variance of a population


are given by:
population mean, µ = (Σ_{i=1}^N xi) / N

and

population variance, σ² = (Σ_{i=1}^N (xi − µ)²) / N


But how can we use the sample to estimate our pop-


ulation parameters?

First we define an unbiased estimator. An estimator is unbiased when its expected value is exactly equal to the corresponding population parameter.

If x̄ is the sample mean, then x̄ is an unbiased estimator of µ since

E(x̄) = µ

where the sample mean is given by:

x̄ = (Σ_{i=1}^n xi) / n

If S² is the sample variance, then S² is an unbiased estimator of σ² since

E(S²) = σ²

where the sample variance is given by:

S² = (Σ_{i=1}^n (xi − x̄)²) / (n − 1)

2.1.1 Proof
From the CLT, we know:

E(X̄) = µ

and

Var(X̄) = σ²/n

Also

Var(X̄) = E(X̄²) − [E(X̄)]²

i.e.

σ²/n = E(X̄²) − µ²

or

E(X̄²) = σ²/n + µ²

For a single piece of data n = 1, so

E(Xi²) = σ² + µ²

Now

E[Σ(Xi − X̄)²] = E[ΣXi² − nX̄²]
             = ΣE(Xi²) − nE(X̄²)
             = nσ² + nµ² − n(σ²/n + µ²)
             = nσ² + nµ² − σ² − nµ²
             = (n − 1)σ²

∴ σ² = E[Σ(Xi − X̄)²] / (n − 1)
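A quick simulation makes the role of the n − 1 divisor visible: dividing by n systematically underestimates σ², while dividing by n − 1 is correct on average. A sketch with arbitrarily chosen population parameters:

```python
import random

mu, sigma, n, trials = 0.0, 2.0, 5, 100_000

biased, unbiased = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    biased += ss / n
    unbiased += ss / (n - 1)

# Averages over many samples: ~3.2 (biased) versus ~4.0 = sigma^2 (unbiased)
print(biased / trials, unbiased / trials)
```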


2.2 Maximum Likelihood Estimation


Maximum Likelihood Estimation (MLE) is a statistical method used for fitting a model to data (data analysis).
We are asking the question:

"Given the set of data, which model parameters are most likely to have generated this data?"

MLE is well defined for the standard distributions; however, in complex problems the MLE may be unsuitable or even fail to exist.

Note: when using MLE we must first assume a distribution, i.e. a parametric model, after which we can try to determine the model parameters.

2.2.1 Motivating example


Consider data from a Binomial distribution with random
variable X and parameters n = 10 and p = p0 . The
parameter p0 is fixed and unknown to us. That is:
 
f(x; p0) = P(X = x) = C(10, x) p0^x (1 − p0)^(10−x)
Now suppose we observe some data X = 3.

Our goal is to estimate the actual parameter value p0


based on the data.


Thought Experiments:

Let us assume p0 = 0.5, so the probability of generating the data we saw is

f(3; 0.5) = P(X = 3) = C(10, 3) (0.5)³ (0.5)⁷ ≈ 0.117

Not very high!

How about p0 = 0.4? Again,

f(3; 0.4) = P(X = 3) = C(10, 3) (0.4)³ (0.6)⁷ ≈ 0.215

Better...

So in general let p0 = p and we want to maximise f(3; p), i.e.

f(3; p) = P(X = 3) = C(10, 3) p³ (1 − p)⁷

Let us define a new function called the likelihood function ℓ(p; 3) such that ℓ(p; 3) = f(3; p). Now we want to maximise this function.
Maximising this function is the same as maximising the log of this function (we will explain why we do this later!), so let

L(p; 3) = log ℓ(p; 3)
therefore,

L(p; 3) = 3 log p + 7 log(1 − p) + log C(10, 3)

To maximise, we need to find where dL/dp = 0:

dL/dp = 0
3/p − 7/(1 − p) = 0
3(1 − p) − 7p = 0
p = 3/10
Thus the value of p that maximises L(p; 3) is p = 3/10. This is called the Maximum Likelihood estimate of p0.
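The same maximisation can be done numerically; a crude grid-search sketch in Python for the observed data x = 3, n = 10:

```python
from math import comb, log

n, x = 10, 3

def log_likelihood(p):
    return x * log(p) + (n - x) * log(1 - p) + log(comb(n, x))

# scan p over a fine grid in (0, 1) and keep the maximiser
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat)    # 0.3, agreeing with the analytic result x/n
```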

2.2.2 In General
Suppose we have n pieces of iid data x1, x2, x3, ..., xn with probability density (or mass) function f(x1, x2, x3, ..., xn; θ), where θ denotes the unknown parameter(s). Then the likelihood function is defined as

ℓ(θ; x1, x2, x3, ..., xn) = f(x1, x2, x3, ..., xn; θ)

and the log-likelihood function can be defined as


L(θ; x1, x2, x3, ..., xn) = log ℓ(θ; x1, x2, x3, ..., xn)

The maximum likelihood estimate of the parameter(s) θ0 is obtained by maximising L(θ; x1, x2, x3, ..., xn).

2.2.3 Normal Distribution


Consider a random variable X such that X ∼ N(µ, σ²). Let x1, x2, x3, ..., xn be a random sample of iid observations. To find the maximum likelihood estimators of µ and σ² we need to maximise the log-likelihood function.

f(x1, x2, x3, ..., xn; µ, σ) = f(x1; µ, σ) · f(x2; µ, σ) ··· f(xn; µ, σ)

ℓ(µ, σ; x1, x2, x3, ..., xn) = f(x1; µ, σ) · f(x2; µ, σ) ··· f(xn; µ, σ)

∴ L(µ, σ; x1, x2, x3, ..., xn) = log ℓ(µ, σ; x1, x2, x3, ..., xn)
  = log f(x1; µ, σ) + log f(x2; µ, σ) + ... + log f(xn; µ, σ)
  = Σ_{i=1}^n log f(xi; µ, σ)

For the Normal distribution

f(x; µ, σ) = (1/(σ√(2π))) e^(−(x − µ)²/(2σ²))


so

L(µ, σ; x1, x2, x3, ..., xn) = Σ_{i=1}^n log[(1/(σ√(2π))) e^(−(xi − µ)²/(2σ²))]
  = −(n/2) log(2π) − n log(σ) − (1/(2σ²)) Σ_{i=1}^n (xi − µ)²

To maximise, we differentiate partially with respect to µ and σ, set the derivatives to zero and solve. If we were to do this, we would get:

µ = (1/n) Σ_{i=1}^n xi

and

σ² = (1/n) Σ_{i=1}^n (xi − µ)²


2.3 Regression and Correlation


2.3.1 Linear regression
We are often interested in looking at the relationship be-
tween two variables (bivariate data). If we can model
this relationship then we can use our model to make pre-
dictions.
A sensible first step would be to plot the data on a
scatter diagram, i.e. pairs of values (xi , yi )

Now we can try to fit a straight line through the data. We would like to fit the straight line so as to minimise the sum of the squared distances of the points from the line. The difference between a data value and the fitted line is called the residual or error, and the technique is often referred to as the method of least squares.


If the equation of the line is given by

y = bx + a

then the error in y, i.e. the residual of the ith data point (xi, yi), would be

ri = yi − y = yi − (bxi + a)

We want to minimise Σ_{i=1}^n ri², i.e.

S.R = Σ_{i=1}^n ri² = Σ_{i=1}^n [yi − (bxi + a)]²

We want to find the b and a that minimise Σ_{i=1}^n ri².

S.R = Σ [yi² − 2yi(bxi + a) + (bxi + a)²]
    = Σ [yi² − 2byixi − 2ayi + b²xi² + 2baxi + a²]

or, writing avg(·) for the sample mean of a quantity,

S.R = n·avg(y²) − 2bn·avg(xy) − 2an·ȳ + b²n·avg(x²) + 2ban·x̄ + na²


To minimise, we want

(i) ∂(S.R)/∂b = 0
(ii) ∂(S.R)/∂a = 0

(i)
∂(S.R)/∂b = −2n·avg(xy) + 2bn·avg(x²) + 2an·x̄ = 0

(ii)
∂(S.R)/∂a = −2n·ȳ + 2bn·x̄ + 2an = 0

These are linear simultaneous equations in b and a and can be solved to get

b = Sxy / Sxx

where

Sxx = Σ(xi − x̄)² = Σxi² − (Σxi)²/n

and

Sxy = Σ(xi − x̄)(yi − ȳ) = Σxiyi − (Σxi)(Σyi)/n

and

a = ȳ − b x̄
Example
x   5    10   15   20   25   30   35   40
y   98   90   81   66   61   47   39   34

Σxi = 180,  Σyi = 516,  Σxi² = 5100,  Σyi² = 37228,  Σxiyi = 9585


Sxy = 9585 − (180 × 516)/8 = −2025

Sxx = 5100 − 180²/8 = 1050

∴ b = −2025/1050 = −1.929

x̄ = 180/8 = 22.5,   ȳ = 516/8 = 64.5

∴ a = 64.5 − (−1.929 × 22.5) = 107.9

i.e.

y = −1.929x + 107.9
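The same fit in code; a short pure-Python sketch using the Sxx and Sxy formulas above:

```python
xs = [5, 10, 15, 20, 25, 30, 35, 40]
ys = [98, 90, 81, 66, 61, 47, 39, 34]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

Sxx = sum((x - xbar) ** 2 for x in xs)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b = Sxy / Sxx           # ~ -1.929
a = ybar - b * xbar     # ~ 107.9
print(b, a)
```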


2.3.2 Correlation
A measure of how two variables are dependent is their correlation. When viewing scatter graphs we can often determine whether there is any correlation by sight, e.g.


It is often advantageous to try to quantify the correlation between two variables. This can be done in a number of ways; two such methods are described below.

2.3.3 Pearson Product-Moment Correlation Coefficient
A measure often used within statistics to quantify this
is the Pearson product-moment correlation coeffi-
cient. This correlation coefficient is a measure of linear
dependence between two variables, giving a value be-
tween +1 and −1.

PMCC: r = Sxy / √(Sxx Syy)
Example
Consider the previous example, i.e.

x   5    10   15   20   25   30   35   40
y   98   90   81   66   61   47   39   34

We calculated,


Sxy = −2025 and Sxx = 1050

also,

Syy = Σ(yi − ȳ)² = Σyi² − (Σyi)²/n

i.e.

Syy = 37228 − 516²/8 = 3946

therefore,

r = −2025 / √(1050 × 3946) = −0.995

This shows a strong negative correlation, and if we were to plot this using a scatter diagram we could see it visually.
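Extending the regression sketch above, the PMCC for the same data can be computed as follows (a minimal Python sketch):

```python
xs = [5, 10, 15, 20, 25, 30, 35, 40]
ys = [98, 90, 81, 66, 61, 47, 39, 34]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
Sxx = sum((x - xbar) ** 2 for x in xs)
Syy = sum((y - ybar) ** 2 for y in ys)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

r = Sxy / (Sxx * Syy) ** 0.5
print(r)    # ~ -0.995
```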

2.3.4 Spearman's Rank Correlation Coefficient
Another method of measuring the relationship between two variables is to use Spearman's rank correlation coefficient. Instead of dealing with the values of the variables as in the product-moment correlation coefficient, we assign a number (rank) to each variable. We then calculate a correlation coefficient based on the ranks. The calculated value is called the Spearman's Rank Correlation Coefficient, rs, and is an approximation to the PMCC.

rs = 1 − (6 Σdi²) / (n(n² − 1))

where d is the difference in ranks and n is the number of pairs.

Example
Consider two judges who score a dancing championship
and are tasked with ranking the competitors in order.
The following table shows the ranking that the judges
gave the competitors.
Competitor   A  B  C  D  E  F  G  H
Judge X      3  1  6  7  5  4  8  2
Judge Y      2  1  5  8  4  3  7  6

Calculating d², we get

difference d      1  0  1  1  1  1  1  4
difference² d²    1  0  1  1  1  1  1  16

∴ Σdi² = 22 and n = 8

rs = 1 − (6 × 22) / (8(8² − 1)) = 0.738


i.e. strong positive correlation
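The rank correlation is equally short to compute; a minimal Python sketch using the judges' rankings:

```python
judge_x = [3, 1, 6, 7, 5, 4, 8, 2]
judge_y = [2, 1, 5, 8, 4, 3, 7, 6]

n = len(judge_x)
d2 = sum((a - b) ** 2 for a, b in zip(judge_x, judge_y))   # sum of squared rank differences
rs = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(d2, rs)    # 22 and ~0.738
```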

2.4 Time Series


A time series is a sequence of data points, measured typi-
cally at successive times spaced at uniform time intervals.
Examples of time series are the daily closing value of the
Dow Jones index or the annual flow volume of the Nile
River at Aswan.
Time series analysis comprises methods for analyzing
time series data in order to extract meaningful statistics
and other characteristics of the data.
Two methods for modeling time series data are (i)
Moving average models (MA) and (ii) Autoregressive
models.

2.4.1 Moving Average


The moving average model is a common approach to
modeling univariate data. Moving averages smooth the


price data to form a trend following indicator. They do


not predict price direction, but rather define the current
direction with a lag.
Moving averages lag because they are based on past
prices. Despite this lag, moving averages help smooth
price action and filter out the noise. The two most pop-
ular types of moving averages are the Simple Moving
Average (SMA) and the Exponential Moving Average
(EMA).

Simple moving average

A simple moving average is formed by computing the


average over a specific number of periods.
Consider a 5-day simple moving average for closing
prices of a stock. This is the five day sum of closing
prices divided by five. As its name implies, a moving
average is an average that moves. Old data is dropped
as new data comes available. This causes the average
to move along the time scale. Below is an example of a
5-day moving average evolving over three days.

The first day of the moving average simply covers the


last five days. The second day of the moving average


drops the first data point (11) and adds the new data
point (16). The third day of the moving average contin-
ues by dropping the first data point (12) and adding the
new data point (17). In the example above, prices grad-
ually increase from 11 to 17 over a total of seven days.
Notice that the moving average also rises from 13 to 15
over a three day calculation period. Also notice that
each moving average value is just below the last price.
For example, the moving average for day one equals 13
and the last price is 15. Prices the prior four days were
lower and this causes the moving average to lag.

Exponential moving average

Exponential moving averages reduce the lag by apply-


ing more weight to recent prices. The weighting applied
to the most recent price depends on the number of pe-
riods in the moving average. There are three steps to
calculating an exponential moving average. First, calcu-
late the simple moving average. An exponential moving
average (EMA) has to start somewhere so a simple mov-
ing average is used as the previous period’s EMA in the
first calculation. Second, calculate the weighting multi-
plier. Third, calculate the exponential moving average.
The formula below is for an n-period EMA; the example discussed here is a 10-day EMA (n = 10):

E_{i+1} = (2/(n + 1)) (P_{i+1} − E_i) + E_i


A 10-period exponential moving average applies an


18.18% weighting to the most recent price. A 10-period
EMA can also be called an 18.18% EMA.
A 20-period EMA applies a 9.52% weighting to the
most recent price: 2/(20 + 1) = 0.0952. Notice that the weight-
ing for the shorter time period is more than the weighting
for the longer time period. In fact, the weighting drops
by half every time the moving average period doubles.
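Both averages are simple to compute; a pure-Python sketch (the price series is the seven-day example used above):

```python
def sma(prices, n):
    """Simple moving average over a window of n periods."""
    return [sum(prices[i - n + 1:i + 1]) / n for i in range(n - 1, len(prices))]

def ema(prices, n):
    """Exponential moving average, seeded with the first n-period SMA."""
    k = 2 / (n + 1)                    # weighting multiplier
    values = [sum(prices[:n]) / n]     # the first value is a simple average
    for p in prices[n:]:
        values.append(k * (p - values[-1]) + values[-1])
    return values

prices = [11, 12, 13, 14, 15, 16, 17]
print(sma(prices, 5))    # [13.0, 14.0, 15.0], as in the worked example
print(ema(prices, 5))    # also roughly [13, 14, 15] here, since the series is linear
```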


2.4.2 Autoregressive models


Autoregressive models describe random processes (denoted here as et) in which each value can be written as a weighted sum of previous values plus a white noise error.
An AR(1) process is a first-order process, meaning that only the immediately previous value has a direct effect on the current value

et = ret−1 + ut
where r is a constant that has absolute value less than
one, and ut is a white noise process drawn from a distri-


bution with mean zero and finite variance, often a normal


distribution.
An AR(2) would have the form

et = r1 et−1 + r2 et−2 + ut
and so on. In theory a process might be represented
by an AR(∞).
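A minimal simulation sketch of an AR(1) process in pure Python (the parameter values are arbitrary illustrations):

```python
import random

def simulate_ar1(r, n=500, sigma=1.0):
    """Simulate e_t = r * e_(t-1) + u_t with Gaussian white noise u_t."""
    e = [0.0]
    for _ in range(n):
        e.append(r * e[-1] + random.gauss(0.0, sigma))
    return e

path = simulate_ar1(0.8)
print(path[:5])
```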
