CQF Maths Primer: Probability
1 Probability
1.1 Preliminaries
• An experiment is a repeatable process that gives rise to a number of outcomes.
• An event is a collection (or set) of one or more outcomes.
• A sample space is the set of all possible outcomes of an experiment, often denoted Ω.
Example
Consider rolling a fair six-sided die once. Thus
Ω = {1, 2, 3, 4, 5, 6}
If E1, E2, E3 are the events 'even', 'odd' and 'prime' occurring, then
E1 = {2, 4, 6}
E2 = {1, 3, 5}
E3 = {2, 3, 5}
Example
A fair die is rolled and A is the event 'a multiple of 3 is scored'. Then
Ω = {1, 2, 3, 4, 5, 6}
A = {3, 6}
∴ P(A) = 2/6 = 1/3
1.2 Probability Diagrams
Example
Two fair dice are rolled and the scores are added. Using a sample space diagram of the 36 equally likely outcomes,
P(total = 8) = 5/36
Venn Diagram
• A ∪ B: "A or B"
• A ∩ B: "A and B"
Addition Rule:
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
or
P (A ∩ B) = P (A) + P (B) − P (A ∪ B)
Example
(The figures below are read from a Venn diagram, not reproduced here, for a group of 30 students.)
P(not in band) = (5 + 20)/30 = 25/30 = 5/6
P(not in either) = 20/30 = 2/3
Example
(The figures are read from a Venn diagram, not reproduced here, for 100 pet owners and ownership of dogs, cats and tropical fish.)
P(dogs only) = 6/100
P(does not own tropical fish) = (6 + 8 + 35 + 11)/100 = 60/100
P(does not own dogs, cats or tropical fish) = 11/100
1.3 Conditional Probability
Example
(From a Venn diagram, not reproduced here: 29 students study biology, of whom 13 also study maths.)
P(study maths given they study biology) = P(M|B) = 13/29
In general, we have:
P(A|B) = P(A ∩ B) / P(B)
or, Multiplication Rule:
P (A ∩ B) = P (A|B) × P (B)
Example
Two cards are drawn at random, without replacement, from a standard pack of 52. What is the probability that both are kings? (Tree diagram!)
P(K ∩ K) = (4/52) × (3/51) = 1/221 ≈ 0.5%
or
P(K ∩ K) = P(2nd is king | 1st is king) × P(1st is king) = (3/51) × (4/52)
We know,
P (A ∩ B) = P (B ∩ A)
so
P (A ∩ B) = P (A|B) × P (B)
P (B ∩ A) = P (B|A) × P (A)
i.e.
P (A|B) × P (B) = P (B|A) × P (A)
or
Bayes' Theorem:
P(B|A) = P(A|B) × P(B) / P(A)
Example
One of ten otherwise identical coins is double-headed; the other nine are fair. A coin is picked at random and tossed five times, showing heads every time. Let H be the event that the double-headed coin was picked. Find P(H | 5 heads).
P(H | 5 heads) = P(5 heads | H) × P(H) / P(5 heads)
             = (1 × 1/10) / (41/320)
             = 320/410 = 32/41
             ≈ 78%
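This calculation is easy to verify numerically. Below is a minimal Python sketch of the same Bayes' theorem computation; the setup (one double-headed coin among ten, five tosses) follows the example above, and the variable names are purely illustrative.

```python
from fractions import Fraction

# Prior: 1 coin in 10 is double-headed (event H)
p_H = Fraction(1, 10)
p_fair = 1 - p_H

# Likelihoods of seeing 5 heads in 5 tosses
p_5heads_given_H = Fraction(1)
p_5heads_given_fair = Fraction(1, 2) ** 5

# Total probability of 5 heads
p_5heads = p_5heads_given_H * p_H + p_5heads_given_fair * p_fair

# Bayes' theorem: P(H | 5 heads)
posterior = p_5heads_given_H * p_H / p_5heads
print(posterior, float(posterior))  # 32/41, roughly 0.78
```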
1.4 Mutually exclusive and Independent events
Events A and B are mutually exclusive if they cannot occur together, so that P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B).
Events A and B are independent if knowing that B has occurred tells us nothing about A, i.e.
P(A|B) = P(A)
and the multiplication rule becomes
P (A ∩ B) = P (A) × P (B)
1.5 Two famous problems
1.6 Random Variables
1.6.2 Definition
Outcomes of experiments are not always numbers, e.g. two heads appearing; picking an ace from a deck of cards. We need some way of assigning real numbers to each random event. Random variables assign numbers to events. Thus a random variable (RV) X is a function which maps from the sample space Ω to the number line.
1.7 Probability Distributions
Example
The RV X is defined as 'the sum of the scores shown by two fair six-sided dice'. Find the probability distribution of X.
A sample space diagram for the experiment (not reproduced here) gives:
x          2     3     4     5     6     7     8     9     10    11    12
P(X = x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
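The distribution can also be produced by brute-force enumeration of the 36 outcomes. A minimal Python sketch (not part of the original example):

```python
from fractions import Fraction
from collections import Counter

# Enumerate all 36 equally likely outcomes of two fair dice
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

for total in sorted(counts):
    print(total, Fraction(counts[total], 36))  # e.g. 7 -> 1/6, 8 -> 5/36
```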
For a continuous random variable X with probability density function (PDF) f(x), probabilities are given by areas under the graph of f:
P(a < X < b) = ∫_a^b f(x) dx
Properties of a PDF:
• ∫_{−∞}^{+∞} f(x) dx = 1
• P(a < X < b) = ∫_a^b f(x) dx
Example
The continuous RV X has PDF
f(x) = k           1 ≤ x ≤ 2
f(x) = k(x − 1)    2 < x ≤ 4
f(x) = 0           otherwise
a) Find the value of the constant k.
b) Find P(X ≤ 1.5).
a)
∫_{−∞}^{+∞} f(x) dx = 1
1 = ∫_1^2 k dx + ∫_2^4 k(x − 1) dx
1 = [kx]_1^2 + [kx²/2 − kx]_2^4
1 = (2k − k) + [(8k − 4k) − (2k − 2k)]
1 = 5k
∴ k = 1/5
b)
P(X ≤ 1.5) = ∫_1^{1.5} (1/5) dx
= [x/5]_1^{1.5}
= 1/10
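As a rough numerical cross-check of both parts, here is a Python sketch that integrates the assumed piecewise PDF with a simple midpoint rule; the function names and grid size are arbitrary choices.

```python
def f(x, k=0.2):
    # Piecewise PDF from the example: k on [1, 2], k(x - 1) on (2, 4]
    if 1 <= x <= 2:
        return k
    if 2 < x <= 4:
        return k * (x - 1)
    return 0.0

def integrate(g, a, b, n=100_000):
    # Midpoint Riemann sum, accurate enough for a sanity check
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(f, 1, 4))    # close to 1.0, confirming k = 1/5
print(integrate(f, 1, 1.5))  # close to 0.1, i.e. P(X <= 1.5)
```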
1.8 Cumulative Distribution Function
The cumulative distribution function (CDF) of a RV X is F(x) = P(X ≤ x).
Example
The discrete RV X has probability distribution
x          1     2     3     4     5     6
P(X = x)   1/2   1/4   1/8   1/16  1/32  1/32
Find:
a) F(2) and
b) F(4.5)
a)
F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2)
= 1/2 + 1/4
= 3/4
b)
F(4.5) = P(X ≤ 4.5) = P(X ≤ 4)
= 1/16 + 1/8 + 1/4 + 1/2
= 15/16
A PDF is defined as
f(x) = (3/11)(4 − x²)   0 ≤ x ≤ 1
f(x) = 0                otherwise
Find the CDF.
Consider:
From −∞ to 0: F (x) = 0
From 1 to ∞: F (x) = 1
From 0 to 1 :
F(x) = ∫_0^x (3/11)(4 − x²) dx
= (3/11)[4x − x³/3]_0^x
= (3/11)(4x − x³/3)
i.e.
F(x) = 0                     x < 0
F(x) = (3/11)(4x − x³/3)     0 ≤ x ≤ 1
F(x) = 1                     x > 1
Example
The RV X has CDF
F(x) = (1/12)(x² + 2x − 3)   1 ≤ x ≤ 3
with F(x) = 0 for x < 1 and F(x) = 1 for x > 3.
a) Find P(1.5 ≤ X ≤ 2.5).
b) Find the PDF f(x).
a)
P(1.5 ≤ X ≤ 2.5) = F(2.5) − F(1.5)
= (1/12)(2.5² + 2(2.5) − 3) − (1/12)(1.5² + 2(1.5) − 3)
= 0.5
b)
f(x) = dF(x)/dx
f(x) = (1/6)(x + 1)   1 ≤ x ≤ 3
f(x) = 0              otherwise
1.9 Expectation and Variance
For a discrete RV X, the expectation is E(X) = Σ x P(X = x) and the variance is Var(X) = E(X²) − [E(X)]².
Example
If X has the probability distribution
x          1     2     3     4
P(X = x)   1/2   1/4   1/8   1/8
then
E(X) = (1 × 1/2) + (2 × 1/4) + (3 × 1/8) + (4 × 1/8)
= 15/8
Aside
What is Variance?
Variance = Σ(x − µ)² / n
         = Σx²/n − µ²
Standard deviation = √(Σ(x − µ)² / n)
                   = √(Σx²/n − µ²)
E(X²) = (1² × 1/2) + (2² × 1/4) + (3² × 1/8) + (4² × 1/8) = 37/8
and E(X) = 15/8
∴ Var(X) = E(X²) − [E(X)]²
= 37/8 − (15/8)²
= 71/64
= 1.10937...
Standard deviation = 1.05 (3 s.f.)
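These discrete expectation and variance calculations are easy to reproduce in Python; the sketch below uses exact fractions and the table above.

```python
from fractions import Fraction as F

dist = {1: F(1, 2), 2: F(1, 4), 3: F(1, 8), 4: F(1, 8)}

mean = sum(x * p for x, p in dist.items())      # E(X)   = 15/8
ex2 = sum(x**2 * p for x, p in dist.items())    # E(X^2) = 37/8
var = ex2 - mean**2                             # 71/64
print(mean, ex2, var, float(var) ** 0.5)        # standard deviation about 1.05
```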
For a continuous RV the same identity applies,
Var(X) = E(X²) − [E(X)]²
       = ∫_{all x} x² f(x) dx − (∫_{all x} x f(x) dx)²
Example
If
f(x) = (3/32)(4x − x²)   0 ≤ x ≤ 4
f(x) = 0                 otherwise
find E(X).
E(X) = ∫_0^4 x · (3/32)(4x − x²) dx
= (3/32) ∫_0^4 (4x² − x³) dx
= (3/32)[4x³/3 − x⁴/4]_0^4
= (3/32)[(4(4)³/3 − 4⁴/4) − 0]
= 2
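The integral can be sanity-checked numerically; a short Python sketch using a midpoint Riemann sum (grid size arbitrary):

```python
# E(X) = integral of x f(x) dx for f(x) = (3/32)(4x - x^2) on [0, 4]
def f(x):
    return 3 / 32 * (4 * x - x * x)

n, a, b = 200_000, 0.0, 4.0
h = (b - a) / n
mean = sum((a + (i + 0.5) * h) * f(a + (i + 0.5) * h) for i in range(n)) * h
print(mean)  # close to 2.0
```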
1.10 Expectation Algebra
For constants a and b and random variables X and Y:
E(aX + b) = aE(X) + b,   Var(aX + b) = a²Var(X)
E(X + Y) = E(X) + E(Y),  and, for independent X and Y,  Var(X + Y) = Var(X) + Var(Y)
(These rules are used in the Central Limit Theorem derivation below.)
1.11 Moments
The first moment is E(X) = µ
Skewness = E[(X − µ)³] / σ³
This is a measure of the asymmetry of a distribution. A distribution which is symmetric has skew of 0. Negative values of the skewness indicate data that are skewed to the left, while positive values of skewness indicate data skewed to the right.
Kurtosis = E[(X − µ)⁴] / σ⁴
A normal random variable has kurtosis of 3 irrespective of its mean and standard deviation. Often when comparing a distribution to the normal distribution, the measure of excess kurtosis is used, i.e. kurtosis of distribution − 3.
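For a data sample, skewness and kurtosis are estimated by replacing the expectations with sample averages. The Python sketch below uses the simplest moment-based estimators (other conventions apply small-sample corrections); the simulated data are illustrative only.

```python
import random

def sample_moments(data):
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n   # variance (divisor n)
    m3 = sum((x - mu) ** 3 for x in data) / n
    m4 = sum((x - mu) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis

random.seed(1)
normal_sample = [random.gauss(0, 1) for _ in range(100_000)]
print(sample_moments(normal_sample))  # both values should be close to 0
```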
1.12 Covariance
The covariance is useful in studying the statistical dependence between two random variables. If X and Y are random variables, then their covariance is defined as
Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
As a thought experiment, suppose at some observation
X = 1, E(X) = 0
Y = 3, E(Y ) = 4
Now
X − E(X) = 1
and
Y − E(Y ) = −1
i.e.
Cov(X, Y ) = −1
X = 4, E(X) = 0
Y = 7, E(Y ) = 4
Now
X − E(X) = 4
and
Y − E(Y ) = 3
i.e.
Cov(X, Y ) = 12
i.e. positive.
ρXY = Cov(X, Y) / (σX σY),   −1 ≤ ρXY ≤ 1
If ρXY = 0 =⇒ uncorrelated
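A minimal Python sketch of sample covariance and correlation, computed directly from the definitions above (the data values are illustrative; recent Python versions also provide statistics.covariance and statistics.correlation):

```python
def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    sx = covariance(xs, xs) ** 0.5   # standard deviations
    sy = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
print(covariance(xs, ys), correlation(xs, ys))  # correlation lies in [-1, 1]
```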
1.13 Important Distributions
The binomial distribution: if X ~ B(n, p), then
P(X = r) = C(n, r) p^r (1 − p)^(n−r),   r = 0, 1, ..., n,   where C(n, r) = n!/(r!(n − r)!)
Example
X ~ B(10, 0.23). Find:
a) P(X = 3)
b) P(X < 4)
a)
P(X = 3) = C(10, 3)(0.23)³(1 − 0.23)⁷
= 0.2343
b)
P(X < 4) = P(X ≤ 3)
= P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
= C(10, 0)(0.23)⁰(0.77)¹⁰ + C(10, 1)(0.23)¹(0.77)⁹ + C(10, 2)(0.23)²(0.77)⁸ + C(10, 3)(0.23)³(0.77)⁷
= 0.821 (3 d.p.)
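Both binomial answers can be reproduced with math.comb; a quick Python sketch:

```python
from math import comb

def binom_pmf(r, n=10, p=0.23):
    return comb(n, r) * p**r * (1 - p) ** (n - r)

print(round(binom_pmf(3), 4))                          # about 0.2343
print(round(sum(binom_pmf(r) for r in range(4)), 3))   # P(X < 4), about 0.821
```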
The Poisson distribution: if X ~ Po(λ), then
P(X = r) = e^(−λ) λ^r / r!,   r = 0, 1, 2, ...
Example
(The full statement of this example is not reproduced here; it uses X ~ Po(2) and Y ~ Po(4).)
P(X = 4) = e⁻² 2⁴ / 4! = 0.090 (3 d.p.)
λ = 4, i.e. E(Y) = 4
and
P(Y = 6) = e⁻⁴ 4⁶ / 6! = 0.104 (3 d.p.)
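And the matching Poisson check in Python:

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    return exp(-lam) * lam**r / factorial(r)

print(round(poisson_pmf(4, 2), 3))  # about 0.090
print(round(poisson_pmf(6, 4), 3))  # about 0.104
```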
The normal distribution: if X ~ N(µ, σ²), probabilities are found from tables of Φ (the standard normal CDF) after standardising to Z ~ N(0, 1) using
Z = (X − µ) / σ
Example
X ~ N(12, 4²). Find: a) P(X ≤ 14), b) P(X > 11), c) P(13 < X < 15).
a)
Z = (X − µ)/σ = (14 − 12)/4 = 0.5
Therefore we want
P (Z ≤ 0.5) = Φ(0.5)
= 0.6915
(from tables)
b)
Z = (11 − 12)/4 = −0.25
Therefore we want
P (Z > −0.25)
but this is not in the tables. From symmetry this is the
same as
P (Z < 0.25)
i.e.
Φ(0.25)
thus
P (Z > −0.25) = Φ(0.25)
= 0.5987
c)
Z₁ = (13 − 12)/4 = 0.25
Z₂ = (15 − 12)/4 = 0.75
Therefore
P (0.25 < Z < 0.75) = Φ(0.75) − Φ(0.25)
= 0.7734 − 0.5987
= 0.1747
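Rather than using tables, Φ is available through Python's statistics.NormalDist; a sketch reproducing parts a) to c):

```python
from statistics import NormalDist

X = NormalDist(mu=12, sigma=4)
Z = NormalDist()  # standard normal, for comparison with the tables

print(round(X.cdf(14), 4))               # a) P(X <= 14), about 0.6915
print(round(1 - X.cdf(11), 4))           # b) P(X > 11), about 0.5987
print(round(X.cdf(15) - X.cdf(13), 4))   # c) P(13 < X < 15), about 0.1747
print(round(Z.cdf(0.5), 4))              # Phi(0.5), as read from tables
```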
1.14 Central Limit Theorem
Let X₁, X₂, ..., Xₙ be iid RVs, each with mean µ and variance σ², and let X̄ = (1/n)(X₁ + X₂ + ... + Xₙ) be the sample mean. Then
E(X̄) = E( (1/n)X₁ + (1/n)X₂ + ... + (1/n)Xₙ )
= (1/n)E(X₁) + (1/n)E(X₂) + ... + (1/n)E(Xₙ)
= (1/n)µ + (1/n)µ + ... + (1/n)µ
= n × (1/n)µ
= µ
i.e. the expectation of the sample mean is the popu-
lation mean !
Var(X̄) = Var( (1/n)X₁ + (1/n)X₂ + ... + (1/n)Xₙ )
= Var((1/n)X₁) + Var((1/n)X₂) + ... + Var((1/n)Xₙ)
= (1/n)²Var(X₁) + (1/n)²Var(X₂) + ... + (1/n)²Var(Xₙ)
= (1/n)²σ² + (1/n)²σ² + ... + (1/n)²σ²
= n × (1/n)²σ²
= σ²/n
Thus the CLT tells us that, where n is a sufficiently large number of samples,
X̄ ~ N(µ, σ²/n)
Standardising, we get the equivalent result that
(X̄ − µ) / (σ/√n) ~ N(0, 1)
(Histograms of the distribution of X̄ for n = 1, 5, 10 and 30 illustrate the convergence towards the normal distribution.)
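The effect can be demonstrated with a small simulation. The Python sketch below draws sample means from a Uniform(0, 1) population (which is far from normal) and checks that the variance of X̄ shrinks like σ²/n; all parameter choices are illustrative.

```python
import random
import statistics

random.seed(42)

def sample_means(n, trials=20_000):
    # Mean of n iid Uniform(0, 1) draws, repeated many times
    return [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

for n in (1, 5, 10, 30):
    means = sample_means(n)
    # Var(sample mean) should be close to (1/12)/n for Uniform(0, 1)
    print(n, round(statistics.fmean(means), 3), round(statistics.variance(means), 4))
```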
2 Statistics
2.1 Sampling
So far we have been dealing with populations; however, sometimes the population is too large to analyse and we need to use a sample in order to estimate the population parameters, i.e. mean and variance.
2.1.1 Proof
We show that E[Σ(Xᵢ − X̄)²] = (n − 1)σ², so that the sample variance with divisor n − 1 is an unbiased estimator of the population variance σ².
From the CLT, we know:
E(X̄) = µ
and
Var(X̄) = σ²/n
Also
V ar(X̄) = E(X̄ 2 ) − [E(X̄)]2
i.e.
σ²/n = E(X̄²) − µ²
or
E(X̄²) = σ²/n + µ²
For a single piece of data n = 1, so
E(Xᵢ²) = σ² + µ²
Now
E[Σ(Xᵢ − X̄)²] = E[ΣXᵢ² − nX̄²]
= ΣE(Xᵢ²) − nE(X̄²)
= nσ² + nµ² − n(σ²/n + µ²)
= nσ² + nµ² − σ² − nµ²
= (n − 1)σ²
∴ σ² = E[Σ(Xᵢ − X̄)²] / (n − 1)
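This is exactly why Python's statistics.variance divides by n − 1, while statistics.pvariance divides by n. A small simulation sketch illustrating the bias of the divisor-n estimator (all parameters illustrative):

```python
import random
import statistics

random.seed(0)

biased, unbiased = [], []
for _ in range(20_000):
    # Population N(10, 2^2), so the true variance is 4; small samples of size 5
    sample = [random.gauss(10, 2) for _ in range(5)]
    unbiased.append(statistics.variance(sample))   # divisor n - 1
    biased.append(statistics.pvariance(sample))    # divisor n

print(round(statistics.fmean(unbiased), 2))  # close to 4.0
print(round(statistics.fmean(biased), 2))    # close to 3.2, i.e. (n-1)/n * 4
```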
2.2 Maximum Likelihood Estimation
(The introductory pages of this section are not reproduced here.) In the running example, an experiment consists of 10 independent trials with unknown success probability p₀, and 3 successes are observed. For a candidate value p, the likelihood of this outcome is
ℓ(p; 3) = f(3; p) = C(10, 3) p³(1 − p)⁷
Thought Experiments:
f(3; 0.5) = P(X = 3) = C(10, 3)(0.5)³(0.5)⁷ ≈ 0.117
Not very high!
f(3; 0.4) = P(X = 3) = C(10, 3)(0.4)³(0.6)⁷ ≈ 0.215
Better...
It is easier to work with the logarithm of the likelihood (more on this later!), so let
L(p; 3) = log ℓ(p; 3)
therefore,
L(p; 3) = 3 log p + 7 log(1 − p) + log C(10, 3)
To maximise we need to find where dL/dp = 0:
dL/dp = 0
3/p − 7/(1 − p) = 0
3(1 − p) − 7p = 0
p = 3/10
Thus the value of p that maximises L(p; 3) is p = 3/10. This is called the Maximum Likelihood estimate of p₀.
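A quick numerical confirmation that p = 0.3 maximises the log-likelihood, by scanning a grid of candidate values in Python (grid resolution arbitrary):

```python
from math import comb, log

def log_likelihood(p, successes=3, n=10):
    return successes * log(p) + (n - successes) * log(1 - p) + log(comb(n, successes))

candidates = [i / 1000 for i in range(1, 1000)]
best = max(candidates, key=log_likelihood)
print(best)  # 0.3
```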
2.2.2 In General
Suppose we have n pieces of iid data x₁, x₂, x₃, ..., xₙ with probability density (or mass) function f(x₁, x₂, ..., xₙ; θ), where θ are the unknown parameter(s). Then the maximum likelihood function is defined as
ℓ(θ; x₁, x₂, ..., xₙ) = ∏ᵢ f(xᵢ; θ)
and it is usually easier to work with the log-likelihood
L(θ; x₁, x₂, ..., xₙ) = Σᵢ log f(xᵢ; θ)
Example
For iid data drawn from a normal distribution with parameters µ and σ, the log-likelihood is Σᵢ log f(xᵢ; µ, σ), where
f(x; µ, σ) = (1/(σ√(2π))) e^(−(x − µ)²/(2σ²))
so
L(µ, σ; x₁, x₂, ..., xₙ) = Σᵢ log[ (1/(σ√(2π))) e^(−(xᵢ − µ)²/(2σ²)) ]
= −(n/2) log(2π) − n log(σ) − (1/(2σ²)) Σᵢ (xᵢ − µ)²
Setting the partial derivatives with respect to µ and σ to zero gives the maximum likelihood estimates
µ = (1/n) Σᵢ xᵢ = x̄
and
σ² = (1/n) Σᵢ (xᵢ − µ)²
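In Python these ML estimates are simply the sample mean and the divisor-n variance; a brief sketch with simulated data (parameters illustrative):

```python
import random
import statistics

random.seed(7)
data = [random.gauss(5, 3) for _ in range(50_000)]

mu_hat = statistics.fmean(data)          # (1/n) * sum of x_i
sigma2_hat = statistics.pvariance(data)  # (1/n) * sum of squared deviations, the ML estimate
print(round(mu_hat, 2), round(sigma2_hat, 2))  # close to 5 and 9
```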
2.3 Regression and Correlation
(The introductory scatter diagram is not reproduced here.) To fit a straight line y = bx + a through data points (xᵢ, yᵢ), i = 1, ..., n, consider the residuals
rᵢ = yᵢ − ŷᵢ
   = yᵢ − (bxᵢ + a)
We want to minimise Σᵢ rᵢ², i.e.
S.R. = Σᵢ rᵢ² = Σᵢ [yᵢ − (bxᵢ + a)]²
We want to find the b and a that minimise Σᵢ rᵢ².
S.R. = Σᵢ [yᵢ² − 2yᵢ(bxᵢ + a) + (bxᵢ + a)²]
     = Σᵢ [yᵢ² − 2byᵢxᵢ − 2ayᵢ + b²xᵢ² + 2abxᵢ + a²]
or, writing a bar for the sample mean of the quantity beneath it,
     = n·\overline{y²} − 2bn·\overline{xy} − 2an·ȳ + b²n·\overline{x²} + 2abn·x̄ + na²
To minimise, we want
(i) ∂(S.R.)/∂b = 0
(ii) ∂(S.R.)/∂a = 0
(i)
∂(S.R.)/∂b = −2n·\overline{xy} + 2bn·\overline{x²} + 2an·x̄ = 0
(ii)
∂(S.R.)/∂a = −2nȳ + 2bnx̄ + 2an = 0
These are linear simultaneous equations in b and a
and can be solved to get
b = Sxy / Sxx
where
Sxx = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/n
and
Sxy = Σ(xᵢ − x̄)(yᵢ − ȳ) = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n
a = ȳ − bx̄
Example
Fit a least squares regression line y = bx + a to the following data.
x   5    10   15   20   25   30   35   40
y   98   90   81   66   61   47   39   34
Σxᵢ = 180,  Σyᵢ = 516,  Σxᵢ² = 5100,  Σyᵢ² = 37228,  Σxᵢyᵢ = 9585
Sxy = 9585 − (180 × 516)/8 = −2025
Sxx = 5100 − 180²/8 = 1050
∴ b = −2025/1050 = −1.929
x̄ = 180/8 = 22.5,   ȳ = 516/8 = 64.5
∴ a = ȳ − bx̄ = 64.5 − (−1.929)(22.5) = 107.9
so the fitted line is y = 107.9 − 1.929x.
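The least squares formulas translate directly into a few lines of Python; the sketch below reuses the example data.

```python
xs = [5, 10, 15, 20, 25, 30, 35, 40]
ys = [98, 90, 81, 66, 61, 47, 39, 34]
n = len(xs)

sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

b = sxy / sxx                      # about -1.929
a = sum(ys) / n - b * sum(xs) / n  # about 107.9
print(round(b, 3), round(a, 1))
```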
2.3.2 Correlation
A measure of how two variables are dependent is their correlation. When viewing scatter graphs we can often determine whether there is any correlation by sight (the illustrative scatter diagrams are not reproduced here). The product moment correlation coefficient (PMCC) is
r = Sxy / √(Sxx Syy)
Example
Consider the previous example, i.e.
x   5    10   15   20   25   30   35   40
y   98   90   81   66   61   47   39   34
We calculated,
Syy = 37228 − 516²/8 = 3946
r = Sxy / √(Sxx Syy) = −2025 / √(1050 × 3946) ≈ −0.995
i.e. a very strong negative correlation.
An alternative measure based on ranks is Spearman's rank correlation coefficient,
rₛ = 1 − 6Σdᵢ² / (n(n² − 1))
where d is the difference in ranks and n is the number of pairs.
Example
Consider two judges who score a dancing championship
and are tasked with ranking the competitors in order.
The following table shows the ranking that the judges
gave the competitors.
Competitor   A   B   C   D   E   F   G   H
Judge X      3   1   6   7   5   4   8   2
Judge Y      2   1   5   8   4   3   7   6
Calculating d², we get
difference d     1   0   1   1   1   1   1   4
difference² d²   1   0   1   1   1   1   1   16
∴ Σdᵢ² = 22 and n = 8
rₛ = 1 − (6 × 22)/(8(8² − 1)) = 0.738
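The same rank correlation in Python (a sketch; the ranks are taken straight from the table):

```python
judge_x = [3, 1, 6, 7, 5, 4, 8, 2]
judge_y = [2, 1, 5, 8, 4, 3, 7, 6]

n = len(judge_x)
d2 = sum((x - y) ** 2 for x, y in zip(judge_x, judge_y))  # sum of d^2 = 22
rs = 1 - 6 * d2 / (n * (n**2 - 1))
print(round(rs, 3))  # 0.738
```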
2.4 Time Series
(The introductory pages of this section are not reproduced in these notes.) An autoregressive process of order one, AR(1), takes the form
eₜ = r eₜ₋₁ + uₜ
where r is a constant that has absolute value less than one, and uₜ is a white noise process drawn from a distribution with mean zero. Similarly, an AR(2) process takes the form
eₜ = r₁eₜ₋₁ + r₂eₜ₋₂ + uₜ
and so on. In theory a process might be represented by an AR(∞).
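A minimal Python sketch simulating an AR(1) path of this form; the parameter values are illustrative only.

```python
import random

random.seed(123)

def simulate_ar1(r=0.8, n=500, sigma_u=1.0):
    # e_t = r * e_{t-1} + u_t, with u_t drawn as Gaussian white noise
    path = [0.0]
    for _ in range(n):
        path.append(r * path[-1] + random.gauss(0, sigma_u))
    return path

series = simulate_ar1()
print(series[:5])
```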