Gambling, Random Walks and The Central Limit Theorem: 3.1 Random Variables and Laws of Large Numbers
In this chapter we will discuss the basic limit theorems of probability theory,
including the law of large numbers, the central limit theorem and the theory of
large deviations. We will demonstrate the application of the law of large
numbers and the CLT to the problem of Monte-Carlo calculation of integrals
and to the theory of portfolio optimization. Using the simplest discrete portfolio
model we will explain the application of the theory of large deviations to risk
estimation. We will discuss the law of large numbers for Markov chains and
use it to analyse the simplest stochastic volatility model. We will then move
on to more complex stochastic volatility models to illustrate the limitations
of the CLT in application to financial time series.
Finally, we will analyse discrete random walks in the context of the
Gambler's ruin problem and apply the developed formalism to the Cox-Ross-
Rubinstein model. In particular, we will derive the Black-Scholes equation and
the Black-Scholes formula for the value of a European call option in the
continuous limit of the CRR model.
3.1.1 Useful probabilistic tools.
The Chebyshev inequality gives a simple estimate for the probability of a deviation from the mean:
$$\Pr(|\xi - \mu| \ge t) \le \frac{\sigma^2}{t^2}.$$
The Kolmogorov inequality extends this to the whole sequence of partial sums. Let
$$S_k = \xi_1 + \dots + \xi_k$$
be the $k$-th sample sum, and let $m_k$ and $s_k$ be the expectation and the standard deviation of $S_k$ correspondingly. For any $t > 0$,
$$\Pr(|S_1 - m_1| < t s_n,\ |S_2 - m_2| < t s_n,\ \dots,\ |S_n - m_n| < t s_n) \ge 1 - 1/t^2.$$
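The Chebyshev bound is easy to test numerically. Below is a small Python sketch (the chapter's own exercises use MATLAB; the function name and the choice of an exponential(1) test variable are our assumptions). For this distribution $\mu = 1$ and $\sigma^2 = 1$, so the bound at $t = 2$ is $1/4$, while the true deviation probability is $e^{-3} \approx 0.05$:

```python
import random

def chebyshev_check(n_samples=100_000, t=2.0, seed=1):
    """Empirical check of Pr(|xi - mu| >= t) <= sigma^2/t^2 for an
    exponential(1) variable, for which mu = 1 and sigma^2 = 1."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples)
               if abs(rng.expovariate(1.0) - 1.0) >= t)
    empirical = hits / n_samples
    bound = 1.0 / t ** 2          # sigma^2 / t^2 with sigma^2 = 1
    return empirical, bound

emp, bound = chebyshev_check()
print(emp, bound)   # the empirical probability (~0.05) sits well below the bound 0.25
```

The gap between the two numbers illustrates that Chebyshev is a crude, distribution-free estimate.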
3.1.2 Weak law of large numbers.
Theorem 3.1.3. Let $\{\xi_k\}$ be a sequence of mutually independent, identically distributed random variables. If the expectation $\mu = E(\xi_k)$ exists, then for every $\epsilon > 0$,
$$\lim_{n \to \infty} \Pr\left(\left|\frac{S_n}{n} - \mu\right| < \epsilon\right) = 1,$$
where $S_n = \sum_{k=1}^n \xi_k$ is the $n$-th sample sum. In other words, the probability that the sample average $S_n/n$ differs from the expectation by less than an arbitrarily prescribed $\epsilon$ tends to one.
Proof: Note that the theorem does not require the existence of the variance of $\xi$. For the proof of the theorem in complete generality, see Feller, p. 231. Here we present the proof for the case $\sigma^2 = Var(\xi) < \infty$. As variances of independent variables add,
V ar(Sn ) = nσ 2 .
will be satisfied with probability $> (1 - \delta)$.
We will use the notation $S_n = \sum_{k=1}^n \xi_k$ for the sample sum and $m_n = E(S_n)$ for the expectation value of the sample sum.
The following two theorems give sufficient conditions for a sequence to
satisfy the strong law of large numbers:
Proof: Let $A_\nu$ be the event that for at least one $n$ with $2^{\nu-1} < n \le 2^\nu$ the inequality (3.1.1) does not hold. We have to prove that for $\nu > \log_2(N)$ and all $r$,
$$\Pr(A_\nu) + \Pr(A_{\nu+1}) + \dots + \Pr(A_{\nu+r}) < \delta.$$
In other words, we need to prove the convergence of the series $\sum_\nu \Pr(A_\nu)$. By definition of $A_\nu$,
$$|S_n - m_n| \ge n\epsilon > 2^{\nu-1}\epsilon$$
for some $n \in \{2^{\nu-1}+1, \dots, 2^\nu\}$. Hence, by the Kolmogorov inequality,
$$\Pr(A_\nu) \le \left(2^{\nu-1}\epsilon\right)^{-2} s_{2^\nu}^2,$$
Proof: Again, we would like to stress that the existence of the variance is not required for the validity of the theorem. For the sake of brevity we will present the proof under the extra assumption that the variance exists, $Var(\xi_k) = \sigma^2$ for all $k$. In this case
$$\sum_k \frac{Var(\xi_k)}{k^2} = \sigma^2 \sum_k \frac{1}{k^2} < \infty,$$
and the statement follows from the Kolmogorov criterion.
It’s worth noting that Theorem 3.1.5 supersedes Theorem 3.1.3, as strong
convergence implies weak convergence.
counts the number of heads in our experiment.
We would like to estimate the probability
$$\Pr(\alpha \le S_n \le \beta) = \sum_{k=\alpha}^{\beta} \binom{n}{k} p^k q^{n-k} \qquad (A)$$
in the limit $n \to \infty$.
The law of large numbers implies that for large n the probability of ob-
serving the value of Sn far from pn is small. This motivates the introduction
of a new variable $\delta_k$:
$$k = np + \delta_k.$$
To estimate $\binom{n}{np+\delta_k}$ in the limit $n \to \infty$, we use the Stirling formula,
$$n! \sim \sqrt{2\pi}\, n^{n+1/2} e^{-n}, \qquad n \gg 1.$$
The result is
$$\binom{n}{np+\delta_k} p^k q^{n-k} \sim \frac{1}{\sqrt{2\pi n \sigma^2}}\, e^{-\frac{\delta_k^2}{2n\sigma^2} + Error},$$
where $Error = C(p,q)\,\frac{\delta_k^3}{n^2} + \dots$. Here $\sigma^2 = pq$ is the variance of $\xi$. For the error to be small we must have
$$\frac{\delta_k^3}{n^2} \to 0$$
as n → ∞. If this condition is satisfied for all terms in the expression (A), we
can conclude that the probability for the number of successes to fall between
α and β is given by the sum of Gaussian distribution functions!
Finally, let
$$x_k = \frac{\delta_k}{\sigma\sqrt{n}}.$$
Then, provided that $x_k^3/\sqrt{n} \to 0$ as $n \to \infty$,
$$\Pr\left(x_\alpha \le \frac{S_n - pn}{\sigma\sqrt{n}} \le x_\beta\right) = \frac{1}{\sigma\sqrt{n}}\left(g(x_\alpha) + g\!\left(x_\alpha + \frac{1}{\sigma\sqrt{n}}\right) + \dots + g(x_\beta)\right).$$
Here $g(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$. The above (Riemann) sum converges to the corresponding integral in the limit $n \to \infty$ if we keep $x_\alpha = A$ and $x_\beta = B$ fixed, yielding the statement of Theorem 3.2.1.
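As a numerical illustration of this convergence, one can compare the exact binomial sum (A) with the Gaussian integral it approaches. Below is a Python sketch (the notes' numerical work is in MATLAB; the half-integer continuity correction and the particular values of $n$, $p$, $\alpha$, $\beta$ are our choices):

```python
import math

def binom_prob(n, p, a, b):
    """Exact Prob(a <= S_n <= b) for S_n ~ Binomial(n, p), i.e. the sum (A)."""
    q = 1.0 - p
    return sum(math.comb(n, k) * p ** k * q ** (n - k) for k in range(a, b + 1))

def gaussian_approx(n, p, a, b):
    """Gaussian approximation of the same probability: integral of the
    normal density over [a - 1/2, b + 1/2] (continuity-corrected)."""
    q = 1.0 - p
    s = math.sqrt(n * p * q)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return Phi((b + 0.5 - n * p) / s) - Phi((a - 0.5 - n * p) / s)

exact = binom_prob(1000, 0.5, 480, 520)
approx = gaussian_approx(1000, 0.5, 480, 520)
print(exact, approx)   # the two values agree to about three decimal places
```

For $n = 1000$ fair-coin tosses the Gaussian approximation is already essentially indistinguishable from the exact sum.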
The existence of the expectation, $|E(\xi)| < \infty$, is essential for the validity of the weak law of large numbers. Consider for example the Cauchy distribution on $\mathbb{R}$, characterised by the density
$$\varrho(x) = \frac{1}{\pi}\,\frac{1}{1+x^2}.$$
Let $\xi_k$ be i. i. d. Cauchy random variables. It is possible to check that for any $n$,
$$\frac{1}{n}\sum_{k=1}^n \xi_k \overset{Law}{\sim} \xi_1,$$
i. e. the empirical mean of any number of $\xi_k$'s is still a Cauchy random variable with the same distribution as $\xi_1$.
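This stability property — and the resulting failure of the law of large numbers — is easy to observe numerically. A Python sketch (inverse-CDF sampling via $\tan(\pi(U - 1/2))$; names and trial counts are ours):

```python
import math
import random

def cauchy_sample_means(n, trials=2000, seed=0):
    """Sample means of n i.i.d. standard Cauchy variables, generated via
    the inverse CDF: xi = tan(pi*(U - 1/2)) for U uniform on (0,1)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        s = sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n))
        means.append(s / n)
    return means

# The spread of the sample means does not shrink as n grows:
for n in (10, 1000):
    means = cauchy_sample_means(n)
    frac = sum(abs(m) > 1 for m in means) / len(means)
    print(n, frac)   # stays near Prob(|xi_1| > 1) = 1/2 for every n
```

For a distribution with a finite mean this fraction would shrink to zero as $n$ grows; here it stays at $1/2$ because the mean of $n$ Cauchy variables is again standard Cauchy.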
Let us illustrate the CLT by computing the 95% confidence interval of the sample mean
$$S_n/n = \frac{1}{n}\sum_{i=1}^n \xi_i,$$
where $\{\xi_i\}$ is a sequence of i. i. d. random variables, $E(\xi_i) = \mu$, $Var(\xi_i) = \sigma^2$. By definition, the 95% confidence interval is the interval symmetric around the mean such that the probability of the random variable falling into this interval is 0.95.
If $n$ is large, the CLT applies and
$$P\left(\bar\mu - B\frac{\sigma}{\sqrt{n}} \le S_n/n \le \bar\mu + B\frac{\sigma}{\sqrt{n}}\right) \approx \frac{1}{\sqrt{2\pi}}\int_{-B}^{B} e^{-x^2/2}\,dx = 0.95,$$
which holds for $B \approx 1.96$.
It is remarkable that we can get a definite answer without detailed knowledge of the probability distribution of the $\xi_i$'s. Even if $\sigma^2$ is unknown, we can estimate its value using the law of large numbers:
$$\sigma^2 \approx \frac{1}{n-1}\sum_{i=1}^n (\xi_i - \bar\mu)^2.$$
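As a sketch of this recipe in Python (the notes' numerical examples use MATLAB; the uniform(0,1) test distribution and all names are our assumptions), one can check that the interval built with the *estimated* variance really does cover the true mean about 95% of the time:

```python
import math
import random

def confidence_interval(sample, z=1.96):
    """95% CLT confidence interval for the mean, with the variance
    estimated from the sample itself (the n-1 normalisation above)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean - half, mean + half

# Coverage check: the interval should contain the true mean ~95% of the time.
rng = random.Random(3)
trials, hits = 1000, 0
for _ in range(trials):
    sample = [rng.random() for _ in range(400)]   # uniform(0,1): true mean 1/2
    lo, hi = confidence_interval(sample)
    hits += (lo <= 0.5 <= hi)
coverage = hits / trials
print(coverage)   # close to 0.95
```

The observed coverage fluctuates around 0.95, exactly as the CLT predicts for large sample size.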
If the ξi are i.i.d. then V ar(Sn ) = nσ 2 .
If ξi are independent random variables but do not have the same distri-
bution, then one needs some further conditions on the distributions of ξi in
order for the Central Limit Theorem to hold. One can weaken the assump-
tion about independence somewhat, but it is still important to have some
asymptotic independence.
The central limit theorem gives the following error bound: there is a 95% probability that
$$\frac{\xi_1 + \dots + \xi_n}{n}$$
lies between
$$\int_0^1 f(t)\,dt - 1.96\frac{\sigma}{\sqrt{n}} \qquad\text{and}\qquad \int_0^1 f(t)\,dt + 1.96\frac{\sigma}{\sqrt{n}},$$
where $\sigma^2$ is equal to $\int_0^1 [f(t)]^2\,dt - \left(\int_0^1 f(t)\,dt\right)^2$ and could again be estimated from the sample. So in order to increase the precision (but again with the same confidence) by a factor 10 one needs to increase the sample size by a factor 100.
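The Monte-Carlo recipe just described can be sketched in a few lines of Python (the exercises in this chapter ask for a MATLAB version; the function names here are ours). The error bar is the $1.96\,\hat\sigma/\sqrt{n}$ bound from the CLT, with $\hat\sigma^2$ estimated from the same sample:

```python
import math
import random

def mc_integral(f, n, seed=0):
    """Monte-Carlo estimate of the integral of f over [0,1], together
    with the 95% CLT error bar 1.96*sigma_hat/sqrt(n)."""
    rng = random.Random(seed)
    vals = [f(rng.random()) for _ in range(n)]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return mean, 1.96 * math.sqrt(var / n)

est, err = mc_integral(lambda t: t ** 3, 100_000)
print(est, err)   # estimate of the exact answer 1/4, with an error bar of order 1e-3
```

For $f(t) = t^3$ the exact answer is $1/4$, and the reported error bar shrinks like $n^{-1/2}$, in line with the CLT.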
It is good to compare this with the most simple classical numerical integration scheme (so not using a Monte Carlo approach). Let $f : [0,1] \to \mathbb{R}$ be differentiable and such that $|f'| \le M$, and take the $y_i$ to be equally spaced: $y_i = i/n$, $i = 0, 1, \dots, n$. Then by the Mean-Value Theorem, $f(x) \in [f(y_i) - M/n, f(y_i) + M/n]$ for each $x \in [y_i, y_{i+1}]$. So
$$\int_{y_i}^{y_{i+1}} f(t)\,dt \in \frac{f(y_i)}{n} + \left[-\frac{M}{n^2}, \frac{M}{n^2}\right].$$
Hence
$$\int_0^1 f(t)\,dt \in \frac{1}{n}\sum_{i=1}^n f(y_i) + \left[-Mn^{-2}, Mn^{-2}\right].$$
This implies that using the non-probabilistic method, one increases the pre-
cision by a factor 100 by merely increasing the sample size by a factor 10. So
for such a nice function defined on an interval it does not really make sense
to use Monte Carlo methods for estimating the integral. In fact, any integra-
tion method used in practice approximates the integral by a trapezium rule,
quadratic interpolation, or even more advanced methods, and thus obtains
much better approximations than the above very naive method (provided the
function is sufficiently differentiable). But all these methods get less efficient
in higher dimensions: if f : [0, 1]d → R the error bounds get worse for large
d. For example, the analogous calculation gives for the non-probabilistic method an error bound of
$$\int_0^1 \dots \int_0^1 f(t_1, \dots, t_d)\,dt_1 \dots dt_d \in \frac{1}{n}\sum_{i=1}^n f(y_i) + \left[-Mn^{-2/d}, Mn^{-2/d}\right].$$
For large values of d these bounds become bad: the curse of dimensionality.
Compare this with the error of n−1/2 of the Monte-Carlo method. So if d is
large, the Monte-Carlo method wins.
So we see that Monte-Carlo methods for estimating integrals are particularly useful when we work in higher dimensions, or for functions which are not smooth. This is precisely the setting one has in calculating stochastic integrals!
In the second half of the course, various ways of improving on the simplest Monte-Carlo method discussed here will be considered.
• Build the model of an unfair coin tossing experiment in MATLAB using ’binornd’
and ’cumsum’ commands.
• Investigate the fluctuations of the sample mean around the limiting value for sample size N = 1000: use the 'binopdf' and 'hist' commands to build the probability distribution function of the sample mean M1000 numerically, and compare your result against the prediction of the central limit theorem for p = 1/4, 1/2, 3/4.
2. Monte-Carlo integration.
• Write your own MATLAB Monte-Carlo routine for calculating the integrals of
2
functions f (x) = x3 and f (x) = e−x over the interval [0, 1]. Analyse the conver-
gence of your routine as the function of the number of sampling points. Compare
your findings with the prediction of central limit theorem.
• Can you calculate the integral of f (x) = x−2/3 using Monte-Carlo method? Can
you use CLT to estimate the precision of your calculation? Explain your answers.
Firstly, let us calculate the expected amount of money the gambler has after $n$ steps. Let $\xi_k = 1$ if the gambler wins in the $k$-th round and $\xi_k = 0$ if he loses. Then the total number of wins in $n$ rounds is $S_n = \sum_{k=1}^n \xi_k$ and the total number of losses is $n - S_n$. Therefore, the amount of money the gambler has after $n$ rounds is
$$M_n = G^{S_n} L^{n-S_n} M_0$$
pounds, where $G = 1.8$, $L = 0.4$. On average this is
$$E(M_n) = M_0\, L^n\, E\!\left[\left(\frac{G}{L}\right)^{S_n}\right].$$
As the rounds are independent, $M_n$ is the product of independent random variables,
$$M_n = M_0 \prod_{k=1}^n L\left(\frac{G}{L}\right)^{\xi_k}.$$
Therefore,
$$E(M_n) = M_0 \prod_{k=1}^n L\, E\!\left[\left(\frac{G}{L}\right)^{\xi_k}\right] = M_0\left(\frac{G+L}{2}\right)^n.$$
Numerically,
$$E(M_n) = M_0\,(1.1)^n,$$
meaning that the expected gain grows exponentially. Imagine that a casino offers this game for $1.09^n M_0$ pounds for $n$ rounds, where $M_0$ is the amount the player wishes to start with. The argument for playing is that the expected return will be quite big. For instance, for $n = 100$ it is
$$(1.1^n - 1.09^n)\cdot M_0 \approx 8.25\cdot 10^3\cdot M_0.$$
Should the gambler play this game?
Before answering 'you bet!', consider the following argument. Let
$$r_n = \frac{1}{n}\ln\frac{M_n}{M_0}$$
be the average rate of increase of your bet. We can re-write this in terms of the sample sum $S_n$ as follows:
$$r_n = \ln(L) + \ln(G/L)\cdot\frac{S_n}{n}. \qquad (3.3.2)$$
The gambler has won money if $r_n > 0$, or
$$\frac{S_n}{n} > -\frac{\ln(L)}{\ln(G/L)} \approx 0.61.$$
Now that is worrying: according to the strong law of large numbers, $S_n/n$ converges almost surely to $1/2 < 0.61$! The CLT allows us to quantify our worries: the expected value of the sample mean $S_n/n$ is $1/2$, and its standard deviation is $\frac{1}{2\sqrt{n}}$. The CLT tells us that for large $n$, $S_n/n$ will fall into the following interval with probability 0.95:
$$\left[\frac12 - \frac{1.96}{2\sqrt{n}},\ \frac12 + \frac{1.96}{2\sqrt{n}}\right].$$
Using (3.3.2) we can re-state the above by saying that with probability 0.95,
$$-0.16 - 1.47/\sqrt{n}\ \le\ r_n\ \le\ -0.16 + 1.47/\sqrt{n}.$$
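These predictions are easy to confirm by simulation. A Python sketch (the notes' scripts are in MATLAB; the function names and the number of runs here are ours):

```python
import random

def play(n_rounds, rng, G=1.8, L=0.4):
    """One run of the game: the stake is multiplied by G on a win
    (probability 1/2) and by L on a loss."""
    m = 1.0
    for _ in range(n_rounds):
        m *= G if rng.random() < 0.5 else L
    return m

rng = random.Random(7)
n, trials = 100, 10_000
outcomes = [play(n, rng) for _ in range(trials)]
mean_gain = sum(outcomes) / trials                    # heavy-tailed sample mean
frac_ruined = sum(m < 1.0 for m in outcomes) / trials
median_outcome = sorted(outcomes)[trials // 2]
print(mean_gain, frac_ruined, median_outcome)
# The vast majority of runs lose money even though E(M_n) = 1.1^n * M_0.
```

With $n = 100$, about 98% of the runs end below the starting stake, and the typical (median) outcome is tiny — exactly the gap between the exponentially growing $E(M_n)$ and the negative typical rate $r_n$ discussed above.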
We should however ask the question: how large should $N$ be to ensure that $C_N$ approaches $E\,C_N$ with probability close to one? The crudest estimate following from the CLT is
$$N \gg \frac{E(M_i^2)}{(E(M_i))^2} - 1 = \left(\frac{2(G^2+L^2)}{(G+L)^2}\right)^n - 1 \approx 6\cdot 10^{14}$$
for $n = 100$.
Horizontal Diversification.
Horizontal diversification is when you diversify between same-type investments. It can be a broad diversification (like investing in several NASDAQ companies) or a more narrow one (investing in several stocks of the same branch or sector).
Vertical Diversification.
µk = µ > 0 ∀k.
As we are interested in the properties of our portfolio for large values of
n, we need some extra assumptions to control the set of our random variables
in the limit n → ∞. We will use the simplest assumption - that all returns
are uniformly bounded by an n-independent constant,
that’s it. (Which proves by the way, that such an asset does not exist.) We
will assume that all sigma’s are uniformly bounded away from zero,
0 < v < σk2 , ∀k. (3.3.4)
Let $N_i$ be the number of assets of type $i$ in the portfolio, $1 \le i \le n$. Let
$$N = \sum_{k=1}^n N_k.$$
Note that in the terminology of probability theory, the average return is just the empirical mean of all returns. The expected value of the average return is
$$E(R_n) = \frac{1}{N}\sum_{k=1}^n N_k\,\mu = \mu,$$
and the variance is
$$Var(R_n) = \frac{1}{N^2}\sum_{k=1}^n N_k^2\,\sigma_k^2.$$
The simplest way to demonstrate the advantage of portfolio diversification is to consider the case
$$N_1 = N_2 = \dots = N_n = K,$$
where $K$ does not depend on $n$. In other words, we assume that our portfolio contains an equal number $K$ of assets of each type. In this case
$$Var(R_n) = \frac{1}{n^2}\sum_{k=1}^n \sigma_k^2 < \frac{1}{n}(2V)^2 \overset{n\to\infty}{\longrightarrow} 0.$$
Therefore, Rn strongly converges to µ as n → ∞.
As n = ∞ is unrealistic, we also need to know the statistics of fluctuations
of Rn around the mean value. This information is provided by central limit
theorem. Here is the slight modification of CLT for independent, but not
necessarily identically distributed variables:
Theorem 3.3.1. (Lindeberg) Every uniformly bounded (i. e. satisfying (3.3.3)) sequence $X_1, X_2, \dots$ of mutually independent random variables obeys
$$\Pr\left(\alpha < \frac{S_n - \mu_n}{\sigma_n} < \beta\right) \longrightarrow \frac{1}{\sqrt{2\pi}}\int_\alpha^\beta e^{-\frac{x^2}{2}}\,dx, \qquad n \to \infty.$$
In our case, the second constraint is absent, as $E(R_n) = \mu$ for any choice of the $N_k$'s. The minimization of $Var(R_n)$ subject to the constraint $\sum_k N_k = N$ is an example of the classical quadratic optimization problem.
$^3$See Yuh-Dauh Lyuu, Financial Engineering and Computation, for more details. Harry Markowitz developed modern portfolio theory in the 1950s. In 1990 he was awarded the Nobel prize in economics.
For the case at hand, this problem has a unique solution (as $Var(R_n)$ is a convex function of the $N_k$'s and there is at least one set of positive $N_k$'s satisfying the constraint), which can be found using the method of Lagrange multipliers. Let
$$F(N_1, \dots, N_n, \lambda) = Var(R_n) - \lambda\left(\sum_k N_k - N\right).$$
Setting $\partial F/\partial N_k = 0$ gives
$$N_k^{(0)} = \frac{\lambda N^2}{2\sigma_k^2}.$$
Note that the optimal portfolio tends to have more assets with smaller variance. The Lagrange multiplier $\lambda$ can be computed from the constraint (3.3.5), giving us
$$N_k^{(0)} = N\,\frac{\sigma_k^{-2}}{\sum_{m=1}^n \sigma_m^{-2}}.$$
The minimal variance is
$$\min(Var(R_n)) = \frac{1}{N^2}\sum_{k=1}^n \left(N_k^{(0)}\right)^2 \sigma_k^2 = \frac{1}{\sum_{m=1}^n \sigma_m^{-2}}.$$
Using (3.3.3), we can immediately verify that the optimized variance vanishes in the limit of large portfolio size:
$$\min(Var(R_n)) \le \frac{(2V)^2}{n} \to 0 \qquad\text{for } n \to \infty.$$
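The closed-form weights $N_k^{(0)} \propto \sigma_k^{-2}$ and the formula for the minimal variance can be verified in a few lines. A Python sketch (the example variances and names are our assumptions):

```python
def min_variance_weights(sigma2):
    """Optimal asset fractions N_k/N proportional to 1/sigma_k^2 (the
    solution derived above) and the minimal variance 1/sum_m sigma_m^{-2}."""
    inv = [1.0 / s for s in sigma2]
    total = sum(inv)
    weights = [v / total for v in inv]
    return weights, 1.0 / total

sigma2 = [0.5, 1.0, 2.0, 4.0]          # example asset variances (ours)
w, min_var = min_variance_weights(sigma2)
# Direct evaluation of Var(R_n) = sum_k (N_k/N)^2 sigma_k^2 at the optimum
direct = sum(wk ** 2 * sk for wk, sk in zip(w, sigma2))
# Equal-weight portfolio for comparison
equal_weight_var = sum((1.0 / len(sigma2)) ** 2 * sk for sk in sigma2)
print(w, min_var, direct, equal_weight_var)
```

The direct evaluation of $Var(R_n)$ at the optimal weights reproduces $1/\sum_m \sigma_m^{-2}$ exactly, and the optimum beats the naive equal-weight portfolio.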
Fantastic: we managed to optimize our portfolio, and the resulting variance of the mean return becomes small as the number of assets grows. Moreover, by virtue of Theorem 3.3.1, the fluctuations of $R_n$ around the mean value $\mu$ in an interval of length of order $1/\sqrt{n}$ are Gaussian, with variance equal to $\min(Var(R_n))$ computed above.$^4$
$^4$My friend from Moscow 'City' said this about Markowitz portfolio theory: 'As a quantitative theory it never works for realistic portfolio sizes, as something is always happening which invalidates the CLT. But thank God for Markowitz, as now we can explain to customers why their portfolios have so many under-performing assets.'
It is important to remember that both the law of large numbers and
the central limit theorem require mutual independence of random variables.
Portfolio theory provides an example of the spectacular violation of these
limit theorems due to correlations between returns from the various assets.
Let
$$\sigma_{ij} = E\left((r_i - E(r_i))(r_j - E(r_j))\right), \qquad i \ne j,$$
be the covariance of the returns of assets $i$ and $j$. Then
$$Var(R_n) = \frac{1}{N^2}\left(\sum_{k=1}^n N_k^2\sigma_k^2 + \sum_{i\ne j} N_i N_j\,\sigma_{ij}\right).$$
The second term on the right hand side of the above expression is called
(somewhat misleadingly) the systematic portfolio risk. The first term is
referred to as the special or non-systematic risk. The systematic risk is
determined by the correlations between distinct assets in the portfolio.
To make the point, let us assume that $N_k = K$ for all $k$. Let us also assume that all variances are equal to each other and to $\sigma^2$. Finally, let us assume that all assets are correlated in such a way that
$$\sigma_{ij} = z\sigma^2.$$
Then
$$Var(R_n) = \frac{\sigma^2}{n} + \frac{n(n-1)}{n^2}\,z\sigma^2.$$
Note that non-systematic risk vanishes in the limit of large n, whereas sys-
tematic risk converges to the limit equal to the mean covariance for all pairs
of assets (Markowitz law of mean covariation). It is clear that in the
presence of mean covariance, the sequence Rn does not converge to µ. As a
result, it is impossible to achieve perfect diversification of the corresponding
portfolio.
It is still possible to prove limit theorems for non-independent random variables, provided the correlations are weak in some sense. A typical example is an infinite sequence of correlated random variables which can be broken into subsets of fixed size such that elements belonging to different subsets are mutually independent. We are not going to pursue limit theorems for correlated variables in this course.
3.4 Risk estimation and the theory of large deviations.
Despite the demonstrated usefulness of the law of large numbers and the central limit theorem for portfolio optimization and game analysis, there are some important questions which these theorems do not address. Assume for example that we have a well diversified portfolio consisting of $n$ assets with expected return $\mu$ and standard deviation $C/\sqrt{n}$. The central limit theorem allows us to estimate the probability of the return falling in an interval of size $\sim 1/\sqrt{n}$ around the mean value. However, it cannot be used to estimate the probability of a large loss $L$, i.e. $R_n < -L$, in the limit of large $n$. This is unfortunate, as this probability is of principal importance for contingency planning for rare but catastrophic events.
The probability of large losses can often be effectively estimated using the
theory of large deviations. To illustrate this theory we will use the following
simple example:
Assume that our portfolio consists of $n$ independent assets. Each asset brings us one pound per month with probability $p > 1/2$ and loses us one pound with probability $q = 1 - p < 1/2$. Therefore, we can earn up to $n$ pounds and lose up to $n$ pounds per month. The expected mean return is however positive: if $r_i$ is the return from the $i$-th asset, the mean return per asset is $R_n = \frac{1}{n}\sum_k r_k$ and
$$E(R_n) = (p - q) > 0.$$
Due to the law of large numbers, Rn stays close to the deterministic value
p − q with probability close to 1.
However, what is the probability of a large loss L = τ n, where 0 ≤ τ < 1
is fixed?
The answer is given by the following result:
Proposition 3.4.1. (Cramér) Let $r_1, r_2, \dots, r_n$ be i. i. d. bounded random variables, $\mu = E(r_k)$. Let $R_n = \frac{1}{n}\sum_{k=1}^n r_k$. Fix $\tau > -\mu$. Then
$$\lim_{n\to\infty} \frac{1}{n}\ln\left(\Pr(-R_n > \tau)\right) = -I(\tau),$$
In other words, $I(\tau)$ is the Legendre transform of the cumulant generating function $\ln E(e^{-\theta r})$.
The appearance of the rate function in the above form is easy to follow using the Chernoff inequality. By definition,
$$\Pr(-R_n > \tau) = E\left(\chi(-R_n - \tau)\right),$$
where $\chi(x)$ is 1 for $x \ge 0$ and is zero otherwise. Therefore, for any positive $\theta$,
$$\Pr(-R_n > \tau) \le E\left(e^{n\theta(-R_n-\tau)}\chi(-R_n-\tau)\right) \le E\left(e^{n\theta(-R_n-\tau)}\right) = e^{-n\theta\tau}\prod_{k=1}^n E\left(e^{-\theta r_k}\right) = e^{-n\left(\theta\tau - \ln E(e^{-\theta r})\right)}.$$
To find the tightest bound, we need to find the minimum of the right hand side over all $\theta > 0$. Therefore,
$$\Pr(R_n < -\tau) \le e^{-n\sup_{\theta>0}\left(\theta\tau - \ln E(e^{-\theta r})\right)} = e^{-nI(\tau)}.$$
The supremum over $\theta$ is attained at the solution of
$$\tau = \frac{-pe^{-\theta c} + qe^{\theta c}}{pe^{-\theta c} + qe^{\theta c}}.$$
This equation can be solved, leading to a remarkable answer for the rate function:
$$I(\tau) = D_{KL}\left(\left(\frac{1-\tau}{2}, \frac{1+\tau}{2}\right)\,\Big\|\,(p, q)\right),$$
where $D_{KL}$ is the Kullback-Leibler divergence, or relative entropy. $D_{KL}$ is widely used in information theory and statistical mechanics as a measure of the difference between two stochastic vectors. If $\vec P$ and $\vec Q$ are $d$-dimensional stochastic vectors,
$$D_{KL}(\vec P \,\|\, \vec Q) = \sum_{k=1}^d P_k \ln\frac{P_k}{Q_k}.$$
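The rate function and the Chernoff bound above can be checked against the exact binomial tail. Below is a Python sketch for the $\pm 1$-pound portfolio of this section (the particular values of $n$, $p$ and $\tau$ are our choices):

```python
import math

def kl(P_vec, Q_vec):
    """Kullback-Leibler divergence D(P||Q) of two stochastic vectors."""
    return sum(p * math.log(p / q) for p, q in zip(P_vec, Q_vec) if p > 0)

def rate(tau, p):
    """Rate function I(tau) = D_KL(((1-tau)/2, (1+tau)/2) || (p, 1-p))."""
    return kl(((1 - tau) / 2, (1 + tau) / 2), (p, 1 - p))

def exact_tail(n, p, tau):
    """Exact Prob(-R_n > tau): a loss at rate tau means fewer than
    n(1-tau)/2 of the n assets paid out."""
    kmax = math.ceil(n * (1 - tau) / 2) - 1
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(kmax + 1))

n, p, tau = 400, 0.8, 0.2
print(-math.log(exact_tail(n, p, tau)) / n, rate(tau, p))
# the two numbers converge to each other as n grows
```

The quantity $-\frac{1}{n}\ln \Pr(-R_n > \tau)$ approaches $I(\tau)$ with a correction of order $\ln n / n$, as the Cramér proposition states.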
3.4.1 Week 6 MATLAB exercises.
Numerical study of the Game.
Use the binomial distribution to analyse the $n = 100$-round game introduced in Sec. 3.3.3:
$$\Pr(S_n = i) = \binom{n}{i}\left(\frac12\right)^i\left(\frac12\right)^{n-i}.$$
• Write a MATLAB programme for the computation of the chance of loss, the expected gain and the gain variance. Compare your answers with the theoretical predictions. How does the standard deviation compare with the expected value?
Remark. If you can use the statistics package, then you can use the operator binopdf, which enables one to choose larger values of n.
• Write a MATLAB programme which computes the expected value of the mean
rate of winning, its variance and build its histogram. Perform a careful compar-
ison of your numerical results with the predictions of the law of large numbers
and central limit theorem.
• Calculate the same probability using the (incorrectly applied) central limit the-
orem. How does the answer compare with the exact result?
n=2000;
%Create sample sequence of coin flips
x=round(rand(1,n));
%Create the sequence of sample means
sm=zeros(1,n);
for k=1:n
    sm(k)=sum(x(1:k))/k;
end
%Plot the sequence and the expected value it should converge to
xx=1:n;
subplot(2,2,1), plot(xx,1/2*ones(size(xx)),'k--',xx,sm,'r-')
legend('The limit','Random sequence of sample means')
%Conclusion: the sequence of sample means seems to converge to 1/2.
yes=input('Continue?')
%1.2. Quantitative study of convergence
%Fix the number of experiments
expers=2000;
%Create the sequence of sample means S_n/n
Sfinal=zeros(1,expers);
for k=1:expers
    Sfinal(k)=sum(round(rand(1,n)))/n;
end
The probability of loss and large deviations theorem.
%Theory
tau=0; p=0.8;
ptau=[(1-tau)/2,(1+tau)/2]; pp=[p,1-p];
dkl=sum(ptau.*log(ptau./pp))
%Max number of rounds
N=3000; n=1:N;
prob=exp(-dkl*n);
%Experiment: a loss means the number of -1's exceeds the number of +1's
prob_exp=zeros(1,N);
for k=1:N
    prob_exp(k)=sum(binopdf(0:ceil(k/2-1),k,p));
end
%CLT
prob_clt=1/2*erfc((2*p-1)/(8*p*(1-p))^(1/2).*(n.^(1/2)));
subplot(2,2,1)
plot(10:100,log(prob(10:100))./(10:100),'r--',...
     10:100,log(prob_exp(10:100))./(10:100),'k-',...
     10:100,log(prob_clt(10:100))./(10:100),'b-.')
legend('Large deviations','Experiment','CLT')
xlabel('Number of rounds, n'), ylabel('log(Pr(Loss))/n')
title('log(Pr(Loss))/n for a small number of rounds, p=0.8')
LA=N/2:N;
subplot(2,2,2)
plot(LA,log(prob(LA))./LA,'r--',LA,log(prob_exp(LA))./LA,'k-',...
     LA,log(prob_clt(LA))./LA,'b-.')
legend('Large deviations','Experiment','CLT')
xlabel('Number of rounds, n'), ylabel('log(Pr(Loss))/n')
title('log(Pr(Loss))/n for a large number of rounds, p=0.8')
subplot(2,2,3)
semilogy(10:100,prob(10:100),'r--',10:100,prob_exp(10:100),'k-',...
         10:100,prob_clt(10:100),'b-.')
legend('Large deviations','Experiment','CLT')
xlabel('Number of rounds, n'), ylabel('Pr(Loss)')
title('Pr(Loss) for a small number of rounds, p=0.8')
subplot(2,2,4)
semilogy(LA,prob(LA),'r--',LA,prob_exp(LA),'k-',LA,prob_clt(LA),'b-.')
legend('Large deviations','Experiment','CLT')
xlabel('Number of rounds, n'), ylabel('Pr(Loss)')
title('Pr(Loss) for a large number of rounds, p=0.8')
Figure 3.1: The probability of loss: experiment, CLT and large deviations.
3.5 The law of large numbers for Markov chains
The assumption of independence of logarithmic price increments from past price movements plays a fundamental role in financial mathematics.$^5$ According to this assumption, the logarithm of a share's price on day $n$, conditional on the logarithm of the price on day $n-1$, is independent of the rest of the price history (i. e. of the prices on days $n-2, n-3, \dots$). Markov chains are mathematical models describing such random processes.
Let $\xi_t$ be a sequence of discrete random variables taking values in $\{1, \dots, n\}$, where $t \in \mathbb{N}$ is 'discrete time'. If
Theorem 3.5.1. (Law of large numbers for Markov chains)
$$P\left(\left|\frac{T_n^i}{n} - q_i\right| \ge \epsilon\right) \to 0, \qquad n \to \infty,$$
$$P\left(\left|\frac{T_n^{ij}}{n} - P_{ji}\,q_i\right| \ge \epsilon\right) \to 0, \qquad n \to \infty.$$
We will not prove this theorem here. Note only that $T_n^i/n$ is the frequency with which the value $i$ appears in the sample sequence, whereas $q_i$ is the probability of such an event in the equilibrium state. Therefore, we have a typical weak law of large numbers: frequency converges to probability. Similarly, the probability of the transition $i \to j$ is $\Pr(\xi_n = j, \xi_{n-1} = i) = \Pr(\xi_n = j \mid \xi_{n-1} = i)\,\Pr(\xi_{n-1} = i) = P_{ji}\,q_i$, whereas $T_n^{ij}/n$ is its frequency.
Notice also that the rate of convergence of frequency to probability stated in the above theorem depends on the size of the second largest eigenvalue of $P$.
Let
D(n) = L(n) − L(n − 1)
be the logarithmic daily price increment. Let us define the following four
states of the oil market:
In other words, state 1 describes a large (greater than one percent) drop in crude oil prices, state 2 corresponds to a price drop between zero and one percent, and state 3 describes a price rise between zero and one percent. Finally, state 4 describes a large (over one percent) increase in the oil price.
The following table shows the observed frequency of the state S(n) given
the state S(n − 1):
S(n) \ S(n − 1) 1 2 3 4
1 0.31 0.21 0.15 0.27
2 0.23 0.30 0.28 0.32
3 0.25 0.21 0.28 0.16
4 0.21 0.28 0.29 0.25
Assuming that $D(n)$ is independent of $D(n-2), D(n-3), \dots$, given $D(n-1)$, we arrive at a description of the fluctuations of oil prices in terms of a four-state Markov chain with the following transition matrix:
$$P = \begin{pmatrix} 0.31 & 0.21 & 0.15 & 0.27 \\ 0.23 & 0.30 & 0.28 & 0.32 \\ 0.25 & 0.21 & 0.28 & 0.16 \\ 0.21 & 0.28 & 0.29 & 0.25 \end{pmatrix}.$$
The norm of the second largest eigenvalue is 0.1, signalling a rapid convergence to equilibrium. According to Theorem 3.5.1, the components of $\vec Q$ give us the frequencies of large and medium price drops and of medium and large price rises correspondingly. Note that the most frequent state is that of a moderate price drop, closely followed by the state of a large price increase. The matrix of joint probabilities $\Pr(S(n) = j, S(n-1) = i)$ in the equilibrium state is
$$P \otimes \vec Q = \begin{pmatrix} 0.0731 & 0.0597 & 0.0333 & 0.0697 \\ 0.0542 & 0.0853 & 0.0622 & 0.0826 \\ 0.0589 & 0.0597 & 0.0622 & 0.0413 \\ 0.0495 & 0.0796 & 0.0644 & 0.0645 \end{pmatrix}$$
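The equilibrium vector $\vec Q$ and the joint matrix can be reproduced directly from $P$. A minimal Python sketch (power iteration, using the column-stochastic convention of the text; function names ours):

```python
def stationary(P, iters=200):
    """Equilibrium vector q with P q = q, found by power iteration.
    P is column-stochastic: P[i][j] = Prob(state i tomorrow | state j today)."""
    n = len(P)
    q = [1.0 / n] * n
    for _ in range(iters):
        q = [sum(P[i][j] * q[j] for j in range(n)) for i in range(n)]
    return q

P = [[0.31, 0.21, 0.15, 0.27],
     [0.23, 0.30, 0.28, 0.32],
     [0.25, 0.21, 0.28, 0.16],
     [0.21, 0.28, 0.29, 0.25]]
q = stationary(P)
joint = [[P[i][j] * q[j] for j in range(4)] for i in range(4)]
print(q)   # approximately (0.236, 0.284, 0.222, 0.258)
```

Because the second eigenvalue is small, the power iteration converges after only a handful of steps, and the entries of `joint` match the matrix $P \otimes \vec Q$ above.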
to power law distributions of price increments. Such a behavior clearly vio-
lates CLT. The possible reason for such a violation is the presence of strong
correlations between stock prices contributing to a given index.
To motivate such models, let us consider the volatility of the FTSE100 index shown in Fig. 3.6:
Notice that periods of high volatility tend to cluster together. Notice also that there are long periods of volatility staying below the mean value - the periods of mean reversion.
Given such a behaviour, it is quite improbable that daily volatilities are described by independent random variables. Indeed, the quantity
$$A(t, \tau) = \frac{\langle\sigma^2(t)\,\sigma^2(t+\tau)\rangle}{\langle\sigma^2(t)\rangle\,\langle\sigma^2(t+\tau)\rangle} - 1$$
is often referred to as the autocorrelation function. For a stationary sequence of random variables, the autocorrelation function depends on the time lag $\tau$ only.
If volatilities on different days were independent random variables, the autocorrelation function would be zero. Look however at Fig. 3.6: not only is $A(\tau) \ne 0$, it also decays very slowly as the time lag increases. The red curve shows the result of an LMS fit of the exponential function $e^{-\tau/\tau_0}$ to the data; the law of decay is much better described by a power law, signalling long memory in the sequence of daily volatilities.
previous Section: transition probabilities from state i to state j depended on
the state i. For this Markov chain we did not attempt to model the stochastic
volatility explicitly. Instead, transition probabilities were estimated directly
from data.
The first explicit stochastic volatility model is the ARCH(m) model, invented in 1982 by Engle to model the volatility of UK inflation. Let $D(n)$ be the logarithmic price increment on day $n$, and let $\sigma_n$ be the volatility of $D(n)$; the variance is equal to $\sigma_n^2$. According to Engle,
$$D(n) = \sigma_n W(n), \qquad \sigma_n^2 = \gamma V_0 + \sum_{k=1}^m \alpha_k D^2(n-k).$$
Here $V_0$ is the expected variance, $\{W(n) \sim N(0,1)\}$ are independent random variables, and the parameters $\gamma$ and the $\alpha$'s are non-negative numbers such that
$$\gamma + \sum_{k=1}^m \alpha_k = 1.$$
Requiring the stationary expected variance $E(D^2(n))$ to equal $V_0$ leads to the above constraint. Note that for any ARCH model, $E(D(n)D(n-k)) = 0$ for $k \ne 0$. Therefore price increments are uncorrelated (but not independent - dependence creeps in via dependencies in volatilities). It is easy to see how ARCH can lead to volatility clustering and mean reversion: large
$^9$'hetero' literally translates from Greek as 'different', and 'skedasis' as 'scattering'.
values of observed price increments lead to large future volatility, which in
turn increases the probability of large future price increments. If on the other
hand, a sequence of small price increments has occurred, the predicted future
volatility will be less than the mean value.
The most common model in the family is ARCH(1):
$$D(n) = \sigma(n)W(n), \qquad (3.6.9)$$
$$\sigma^2(n) = \gamma V_0 + (1-\gamma)D^2(n-1), \qquad (3.6.10)$$
which can be easily studied using MATLAB. Here is the script generating
the series of volatilities according to ARCH(1):
%%%%%%%%%%%%%%%%%%%%%%
%ARCH(1)
%%%%%%%%%%%%%%%%%%%%%%
%Expected variance
V0=1;
%Runlength
N=1000000;
%Variance array
V=V0*ones(1,N);
%Increments array
D=zeros(1,N);
%Memory parameter
gamma=0.4;
for k=2:N
V(k)=gamma*V0+(1-gamma)*D(k-1)^2;
D(k)=sqrt(V(k))*randn;
end
subplot(1,2,1),plot(1:700,V0*ones(1,700),'r.',1:700,V(1:700),'k-')
xlabel('Time'), ylabel('Volatility'),
legend('Mean volatility', 'Stochastic volatility')
%Autocorrelation function
maxlag=10;
AC=zeros(1,maxlag);
for k=1:maxlag
    AC(k)=(sum(V(1:N-k).*V(1+k:N))/(N-k)-mean(V)^2)/V0^2;
end
subplot(1,2,2), semilogy(AC)
xlabel('Time lag'), ylabel('Autocorrelation')
%%%%%%%%%%%%%%%%%%%%%%
Figure 3.4: Volatility time series and autocorrelation function for ARCH(1).
Currently there exist more advanced models capable of much more accurate reproduction of the statistics of price fluctuations, for example the generalized ARCH, or GARCH(p,q), model (Bollerslev (1986)):
$$\sigma^2(n) = \gamma V_0 + \sum_{k=1}^{q} \alpha_k D^2(n-k) + \sum_{k=1}^{p} \beta_k \sigma^2(n-k).$$
Note that in the GARCH model the future volatility depends not only on past price increments but also on the past volatilities. As a result, the GARCH time series tends to have a longer memory and stronger volatility clustering compared with the ARCH model.
By now we are convinced that stochastic volatility in ARCH-type mod-
els exhibits the correct qualitative behavior. But what are the implications
for the statistics of price increments? For instance, can it be close to Gaus-
sian thus justifying a Black-Scholes type theory? The standard measure of
closeness of a random variable X to Gaussian is its kurtosis,
$$K_X = \frac{E\left((X - E(X))^4\right)}{E\left((X - E(X))^2\right)^2} - 3.$$
Conjecture:
$$\lim_{n\to\infty} \Pr(D_n^2 > V) \sim \frac{Const}{V^{\mu}}\left(1 + O(1/V)\right),$$
where the exponent $\mu > 0$ solves the equation
$$\frac{\Gamma(\mu + 1/2)}{\sqrt{\pi}} = \left(2(1-\gamma)\right)^{-\mu}. \qquad (3.6.13)$$
where $\Gamma(x)$ is the gamma function.
To the best of my knowledge, a rigorous proof of this conjecture is an open problem, but an informal argument leading to the above equation is pretty straightforward.
What is the exponent $\mu$ at the point $\gamma = 1 - 1/\sqrt{3}$, for which the kurtosis becomes infinite? Using the fact that
$$\Gamma(5/2) = \frac{3}{4}\sqrt{\pi},$$
we find that $\mu = 2$. As we know, a random variable distributed according to $P(V) \sim V^{-2}$ violates the assumptions of the CLT, and the corresponding sample mean does not converge to a Gaussian distribution. The fact that volatility modelling has led us to distributions of price fluctuations with power-law tails has fundamental implications for risk estimation.
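Equation (3.6.13) can be solved numerically in a few lines of Python (the MATLAB script below uses fsolve for the same purpose; the bisection bracket here is our choice):

```python
import math

def tail_exponent(g, lo=0.5, hi=10.0, tol=1e-10):
    """Solve Gamma(mu + 1/2)/sqrt(pi) = (2*(1-g))**(-mu) for mu by
    bisection. For g < 1/2 the left side eventually grows while the
    right side decays, and there is a single crossing in [lo, hi]."""
    f = lambda mu: (math.gamma(mu + 0.5) / math.sqrt(math.pi)
                    - (2.0 * (1.0 - g)) ** (-mu))
    a, b = lo, hi
    while b - a > tol:
        mid = 0.5 * (a + b)
        if f(mid) > 0:
            b = mid
        else:
            a = mid
    return 0.5 * (a + b)

print(tail_exponent(1 - 1 / math.sqrt(3)))   # mu = 2: the kurtosis blow-up point
print(tail_exponent(0.1))                    # the exponent for gamma = 0.1
```

At $\gamma = 1 - 1/\sqrt{3}$ the solver reproduces $\mu = 2$, in agreement with the $\Gamma(5/2)$ computation above.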
%Final increments
Dfinal=zeros(1,expers);
for pp=1:expers
    %Variance array
    V=V0*ones(1,N);
    %Increments array
    D=zeros(1,N);
    %Memory parameter (named gam so as not to shadow MATLAB's gamma function)
    gam=0.1;
    for k=2:N
        V(k)=gam*V0+(1-gam)*D(k-1)^2;
        D(k)=sqrt(V(k))*randn;
    end
    %Store the final increment of this run
    Dfinal(pp)=D(N);
end
range=0:.1:100*V0;
pdf_exp=hist(Dfinal,range);
%Theory
%Analytical value of \mu from (3.6.13) with gamma=0.1
x=fsolve(@(x) (2*(1-0.1))^x*gamma(x+1/2)/sqrt(pi)-1,2,optimset('fsolve'))
loglog(range,pdf_exp/sum(pdf_exp),'k--',range,range.^(-x-1),'r-')
legend('Experiment','Theory')
The result for various values of the parameter γ is shown in Figs. 3.5, 3.6
and 3.7. As one can see, numerical simulations support both the algebraic
tail conjecture and the particular value of the tail exponent given by (3.6.13).
[Figure: experimental vs. theoretical tail distribution of the increments, log-log scale; legend: Experiment, Theory.]
What is the probability P+ that either of the players has been ahead in this game
96 percent of the time? What is the probability P0 that neither player has been
ahead more than 52 percent of the time? How do these probabilities compare?
[Figure: experimental vs. theoretical tail distribution, log-log scale; legend: Experiment, Theory.]
Let ξ_i = 1 if player I wins, and ξ_i = −1 if player II wins in the i-th round. Let
Y_n := Σ_{i=1}^{n} ξ_i. Then
P+ = Prob( #{0 ≤ n ≤ 1000 : Y_n > 0} / 1000 ≥ 0.96 )
   + Prob( #{0 ≤ n ≤ 1000 : Y_n > 0} / 1000 ≤ 0.04 ),
P0 = Prob( 0.48 ≤ #{0 ≤ n ≤ 1000 : Y_n > 0} / 1000 ≤ 0.52 ).
A naive argument aiming to compare P+ and P0 goes as follows: for large n, the Y_n
are mean-zero, approximately Gaussian variables. Therefore, Y_n is positive for
about 50 percent of the time, and hence P0 ≫ P+.
This argument completely misses the fact that the random variables Y_1, Y_2, . . . are
correlated: if, for example, Y_n > 1, then Y_{n+1} > 0 with probability one.
Hints. Useful MATLAB commands: cumsum, sum(Y > 0, 2),
sum((Z < 520).*(Z > 480), 2). According to my estimates, P+ ≈ 0.25, P0 ≈ 0.023.
Hence P+ ≫ P0.
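The estimates above can be checked by a direct Monte-Carlo experiment; the following is a Python sketch (the MATLAB hints translate almost verbatim). Trial count and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
trials, n = 5000, 1000
# +/-1 outcome of each round, one row per simulated game
xi = rng.choice(np.array([-1, 1], dtype=np.int8), size=(trials, n))
Y = np.cumsum(xi, axis=1, dtype=np.int32)    # running score of player I
frac = np.sum(Y > 0, axis=1) / n             # fraction of rounds player I was ahead
P_plus = np.mean((frac >= 0.96) | (frac <= 0.04))
P_0 = np.mean((frac >= 0.48) & (frac <= 0.52))
print(P_plus, P_0)                           # roughly 0.25 and 0.023, as claimed
```

The counter-intuitive outcome P+ ≫ P0 is a manifestation of the arcsine law for the last time a random walk changes sign.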
[Figure: experimental vs. theoretical tail distribution for γ = 0.1, log-log scale; legend: Experiment, Theory.]
• If zi ∈ (0, a), zi+1 − zi = 1 with probability p and zi+1 − zi = −1 with probability
q = 1 − p.
[Figure: a sample path of the random walk (walker's coordinate vs. time).]
is the number of transitions zi+1 − zi = 1 (for i ≥ 0) and l the number of
transitions zi+1 − zi = −1 in this path.
The above definition of the measure on Ω_z is simple as we chose to deal
only with the paths which 'stick' in finite time. But there is a price to pay:
we will have to show that Ω_z with the described probability measure is indeed
a probability space, i.e. that
P_z ≡ Σ_{ω_z ∈ Ω_z} p(ω_z) = 1.
Keeping in mind that we will have to repay this debt, let us proceed to
the computation of the probability of winning. Let
W_z = Σ′ p(ω_z),   L_z = Σ′′ p(ω_z),
where Σ′ denotes summation over all 'winning' paths ω_z ∈ Ω_z which end up at a,
and Σ′′ the summation over all 'losing' paths which end up at 0. Clearly,
W_z is the probability of winning, L_z is the probability of losing and
P_z = W_z + L_z.
Conditioning on the outcome of the first step,
W_z = pW_{z+1} + qW_{z−1}.
The equations for Pz and Lz can be derived in exactly the same way:
Proposition 3.7.1. There is a unique solution to the second order homoge-
neous linear difference equation
A_z = pA_{z+1} + qA_{z−1},   0 < z < a,   (3.7.14)
with boundary conditions
A_0 = C_1,   A_a = C_2,
given by
A_z = C_1 + ((λ^z − 1)/(λ^a − 1)) (C_2 − C_1),   λ ≠ 1,
A_z = C_1 + (z/a)(C_2 − C_1),   λ = 1,
where
λ = q/p.
Proof: We will give a constructive proof of the statement, the ideas from
which can be applied to the study of a large class of difference equations.
Let
DA_k = A_{k+1} − A_k
be the discrete derivative. Repeated application of discrete differentiation
gives
D^2 A_k = A_{k+2} − 2A_{k+1} + A_k.
Using the operators D and D^2, one can re-write equation (3.7.14) as
follows:
D^2 A_k = ((q − p)/p) DA_k.
Let us introduce a new dependent variable B_k = DA_k. In terms of this
variable our equation takes the form
B_{k+1} = (1 + (q − p)/p) B_k ≡ λB_k,   k = 0, 1, . . . , a − 1.
This equation can be solved explicitly:
B_k = λ^k B_0.
The solution is manifestly unique given B_0. Matrix counterparts of the equa-
tion we have just solved play a fundamental role in the general theory of dif-
ference equations. Reverting to the original variable A_k we find that
A_{k+1} = A_k + λ^k B_0,
so that A_k = A_0 + B_0 (1 + λ + . . . + λ^{k−1}). The sum of the geometric
series is equal to (λ^k − 1)/(λ − 1) for λ ≠ 1; for λ = 1 it is equal to k. Therefore,
A_k = A_0 + ((λ^k − 1)/(λ − 1)) B_0,   λ ≠ 1,   (∗)
A_k = A_0 + B_0 · k,   λ = 1.
The constructed solution is unique given A_0, B_0. If A_a is specified instead, then
A_a = A_0 + ((λ^a − 1)/(λ − 1)) B_0,
which always has a unique solution with respect to B_0 (treat the λ = 1 case in
the sense of a limit):
B_0 = ((A_a − A_0)/(λ^a − 1)) (λ − 1).
Substituting this into (∗) we get the statement of the proposition for A_0 = C_1,
A_a = C_2.
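The constructive proof translates directly into code. Here is a Python sketch that builds the solution from B_0 exactly as in the proof and checks it against the difference equation; the parameter values p = 0.45, a = 10 are illustrative.

```python
import numpy as np

def solve_difference_eq(p, a, C1, C2):
    """Solve A_z = p*A_{z+1} + q*A_{z-1}, A_0 = C1, A_a = C2,
    following the constructive proof: B_k = lambda^k * B_0,
    then sum the geometric series."""
    q = 1.0 - p
    lam = q / p
    if abs(lam - 1.0) < 1e-12:                 # fair case, lambda = 1
        return np.array([C1 + (C2 - C1) * z / a for z in range(a + 1)])
    B0 = (C2 - C1) * (lam - 1.0) / (lam**a - 1.0)
    return np.array([C1 + B0 * (lam**z - 1.0) / (lam - 1.0)
                     for z in range(a + 1)])

A = solve_difference_eq(0.45, 10, 0.0, 1.0)
# check the difference equation at the interior points
for z in range(1, 10):
    assert abs(A[z] - (0.45 * A[z+1] + 0.55 * A[z-1])) < 1e-9
print(A[9])   # probability of winning starting from z = 9 when p = 0.45
```

With C_1 = 0, C_2 = 1 this produces the winning probabilities W_z used below.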
We can now solve the equations for P_z, W_z and L_z with their respective
boundary conditions. Substituting C_1 = C_2 = 1 into (3.7.14) we find that
P_z = 1,   ∀z.
Substituting C_1 = 0, C_2 = 1 gives the probability of winning,
W_z = (λ^z − 1)/(λ^a − 1) for λ ≠ 1,   W_z = z/a for λ = 1,
and therefore
L_z = 1 − W_z.
Note that in both cases, the chance of winning increases with the initial for-
tune of the gambler. Perhaps this is the reason that Keynes said ”Millionaires
should always gamble, poor men never”.12
The idea of solving equations in order to determine various statistical
properties of a random process is a powerful one. For instance, we can
easily calculate the expected duration (number of rounds) E(τ_z) of the game
started at z. (Note that the expected duration of the game is finite: the
game definitely ends when we have a run of a consecutive wins or losses.
The probability that this happens within any given stretch of a rounds is at least
p^a + q^a > 0. Therefore, the probability that the game ends before time n · a
is ≥ 1 − (1 − (p^a + q^a))^n. This means that P{τ(ω_z) ≥ n} goes to zero exponentially
fast and that E_z(τ) := Σ_{ω_z} τ(ω_z) p(ω_z) < ∞.)
The expected duration of the game satisfies the following difference equa-
tion:
E(τ_z) = p · (E(τ_{z+1}) + 1) + q · (E(τ_{z−1}) + 1)
       = pE(τ_{z+1}) + qE(τ_{z−1}) + 1,   0 < z < a,
which should be equipped with the obvious boundary conditions:
E(τ_0) = E(τ_a) = 0.
12
John Maynard Keynes, 1st Baron Keynes, CB (pronounced keinz) (5 June 1883 - 21
April 1946) was a British economist whose ideas, called Keynesian economics, had a major
impact on modern economic and political theory as well as on many governments’ fiscal
policies. He advocated interventionist government policy, by which the government would
use fiscal and monetary measures to mitigate the adverse effects of economic recessions,
depressions and booms. He is one of the fathers of modern theoretical macroeconomics.
He is known by many for his refrain ”In the long run, we are all dead.” (Wikipedia)
The difference equation we are dealing with is inhomogeneous. The sim-
plest way to solve it is by guessing a particular solution. For example, for
p ≠ q, a good guess would be
F_z = Cz,
where C is a constant. Substituting Cz into the equation we find C = 1/(q − p).
Then
E(τ_z) = z/(q − p) + A_z,
where A_z solves the homogeneous equation (3.7.14) subject to the boundary
conditions
A_0 = 0,   A_a = −a/(q − p).
Using Proposition 3.7.1, we find that
E(τ_z) = (1/(q − p)) ( z − a · (1 − (q/p)^z)/(1 − (q/p)^a) ).
p q z a Ruin Success E(Gain) E(Duration)
0.5 0.5 9 10 0.1 0.9 0 9
0.5 0.5 90 100 0.1 0.9 0 900
0.5 0.5 900 1000 0.1 0.9 0 90,000
0.5 0.5 950 1000 0.05 0.95 0 47,500
0.5 0.5 8,000 10,000 0.2 0.8 0 16,000,000
0.45 0.55 9 10 0.210 0.79 -1.1 11
0.45 0.55 90 100 0.866 0.134 -76.6 765.6
0.4 0.6 90 100 0.983 0.017 -88.3 441.3
0.4 0.6 99 100 0.333 0.667 -32.3 161.7
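The Ruin, Success and E(Duration) columns of the table can be reproduced from the closed-form expressions for W_z and E(τ_z); a Python sketch (for the fair case p = q the formulas reduce to W_z = z/a and E(τ_z) = z(a − z)):

```python
from math import isclose

def ruin_and_duration(p, z, a):
    """Probability of ruin and expected duration of the game,
    starting from fortune z with absorbing barriers at 0 and a."""
    q = 1.0 - p
    if isclose(p, q):                         # fair game: W_z = z/a, E(tau_z) = z(a-z)
        return 1.0 - z / a, z * (a - z)
    lam = q / p
    win = (lam**z - 1.0) / (lam**a - 1.0)     # W_z from Proposition 3.7.1
    duration = (z - a * win) / (q - p)        # E(tau_z)
    return 1.0 - win, duration

print(ruin_and_duration(0.5, 9, 10))          # ruin 0.1, duration 9
print(ruin_and_duration(0.45, 9, 10))         # ruin ~0.21, duration ~11
print(ruin_and_duration(0.4, 99, 100))        # ruin ~0.333, duration ~161.7
```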
3.8 Cox-Ross-Rubinstein model and Black-Scholes pricing formula for European options.
We will now apply our knowledge of discrete random walks to analyse price
fluctuations of assets in the Cox-Ross-Rubinstein (CRR) model and apply the
result to derive one of the most fundamental formulae of quantitative
finance, the Black-Scholes formula for the price of a European option.
In the CRR model the asset price S_n at discrete time n evolves as
S_{n+1} = uS_n with probability p,   S_{n+1} = dS_n with probability q = 1 − p.
Here u > 1 is the price increase factor, 0 < d < 1 is the price reduction factor,
and p is the probability of a price increase. In the CRR model it is assumed that the
magnitude of the upward jump is the same as the magnitude of the downward
jump,
u · d = 1.
Let us parameterise u, d as follows:
u = 1/d = e^σ,
where σ > 0 is a parameter equal to the root mean square of the logarithmic
increment:
√( E( (ln(S_{n+1}/S_n))^2 ) ) = σ.
The CRR model can be defined more compactly, if one introduces a se-
quence of i. i. d. random variables ξn , which take values 1 and −1 with
probability p and q correspondingly. Then
Sn+1 = eσξn Sn .
Note that
ln(S_{n+1}) = ln(S_n) + D_n,
where D_n = σξ_n is the logarithmic price increment. Therefore the CRR
model makes the standard assumption of statistical independence of log-
arithmic price increments, which makes the prediction of price increments
impossible. We see that the sequence ln S_0, ln S_1, ln S_2, . . . describes the tra-
jectory of a one-dimensional random walker with jumps of size σ.
The CRR model has two parameters, σ and p. As we said, the parameter σ
is related to price volatility. Amazingly, the parameter p can be expressed in
terms of σ using the most fundamental assumption about real markets:
risk neutrality. According to this assumption, the expected value of an
asset T days (months, years) from now is equal to the risk-free return one
would get by cashing the asset in today and putting the money in the bank for T
days. Without some form of risk neutrality, a stable market is impossible:
imagine, for example, that the expected return from an asset is less than
the risk-free investment. Then everybody would stop buying shares and take
their money to the bank. The trouble is, the bank wouldn't want to invest
either!
For the case at hand, the risk-neutrality assumption takes a simple form:
er∆T Sn = E(Sn+1 | Sn ), (3.8.15)
where r is the interest rate which we assume to be constant. As we can see,
the interest is measured in units of inverse time.
Explicitly,
e^{r∆T} = up + d(1 − p),
which can be solved for p:
p = (e^{r∆T} − e^{−σ}) / (e^σ − e^{−σ}).
The risk-neutrality assumption even yields an important restriction on σ: as p is
a probability, we must have p < 1. Therefore
σ > r∆T.
If, in particular, we reduce the intervals between the price measurements, ∆T →
0, then σ cannot go to zero faster than ∆T without the model violating the risk-
neutrality condition.
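As a numerical check, the risk-neutral probability and the martingale property (3.8.15) can be verified directly; a Python sketch with illustrative parameter values (note σ > r∆T, as required):

```python
from math import exp

def crr_up_probability(sigma, r, dT):
    """Risk-neutral up-move probability in the CRR model (u = e^sigma, d = e^-sigma)."""
    u, d = exp(sigma), exp(-sigma)
    return (exp(r * dT) - d) / (u - d)

sigma, r, dT = 0.02, 0.05, 1.0 / 252      # illustrative values (daily steps assumed)
p = crr_up_probability(sigma, r, dT)
u, d = exp(sigma), exp(-sigma)
# martingale check: E(S_{n+1} | S_n) = e^{r dT} S_n
assert abs(p * u + (1 - p) * d - exp(r * dT)) < 1e-14
print(p)   # a valid probability, 0 < p < 1
```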
The price fluctuations in the CRR model are completely characterized:
the price logarithm follows a discrete random walk with jump size equal
to volatility. Jump probability is a function of interest rate and volatility.
Volatility has to be found from the analysis of market data. Note, however,
that volatility in the CRR model is constant, which seriously reduces the domain
of its applicability.
How can we use the information about asset prices to calculate a 'fair'
option value in the CRR Universe? To be definite, consider a European call
option. As we briefly discussed before, this is a contract which gives its
owner the right to buy the specified asset (the underlying asset) from the
contract's writer at the specified time T (expiry or maturity date) for the
specified amount K (strike price). The writer has an obligation to sell the
underlying asset at the strike price.
Let S(T) be the asset's price at time T. If S(T) > K, it makes sense for
the owner of the option to exercise it: by buying the asset for K, he/she
can immediately sell it for S(T), thus making a profit of S(T) − K. If, on the
other hand, S(T) < K, the option is worth nothing as exercising it would lead
to a loss. Therefore, the value of the European option at the exercise date is
C(N, S_N) = max(S_N − K, 0),
where we used discrete time N = T /∆T . Note C(N, SN ) is a random vari-
able. The ’fair’ value of the option is fixed by the same risk-neutrality con-
ditions which we used to calculate the parameters of the CRR model. Let
C(n, S) be the value of the option on day n, S is the corresponding price of
the underlying asset. It is important to notice that in the CRR Universe,
the value of the option can depend only on S and n, as part of the price
trajectory for t > n is independent of the past walk history conditional on
the price S_n = S. If we sell the option on day n and put all the money in
the bank at the interest rate r, we will end up with a risk-free pay-off equal to
C(n, S) e^{(N−n)r∆T}.
The risk neutrality principle requires that this is equal to the expected pay-off
resulting from exercising the option,
E(max(SN − K, 0) | Sn = S).
Therefore, we arrive at the formula for the value of the option:
C(n, S) = e^{(n−N)r∆T} E(max(S_N − K, 0) | S_n = S).
The random variable inside the expectation value depends only on the num-
ber of up and down jumps between times n and N. In principle, we could use
this fact to calculate this expectation value using a version of the central limit
theorem.
There is, however, a more instructive way. We have learned recently that
one can compute various expectation values and probabilities for random
walks by solving difference equations. Can we derive an equation for C(n, S)?
Certainly:
C(n, S) = e^{(n−N)r∆T} ( p · E(max(S_N − K, 0) | S_n = S, ξ_n = 1)
                       + q · E(max(S_N − K, 0) | S_n = S, ξ_n = −1) )
        = e^{(n−N)r∆T} ( p · E(max(S_N − K, 0) | S_{n+1} = uS)
                       + q · E(max(S_N − K, 0) | S_{n+1} = dS) ).
Expressing the right hand side in terms of the option values at time n + 1,
we find the equation
C(n, S) = e^{−r∆T} ( p · C(n + 1, uS) + q · C(n + 1, dS) ),   (3.8.17)
which should be supplied with boundary conditions. The first one is
C(n, 0) = 0.
If S becomes very large at some time, it is likely to stay very large up to time
N. The expectation value in the pricing formula becomes approximately
equal to S and we get the second boundary condition:
lim_{S→∞} C(n, S)/S = e^{(n−N)r∆T}.
Finally, the value of the option at n = N is known:
C(N, S) = max(S − K, 0).   (3.8.18)
To find the value of the option on any day n < N, all we need is to solve the
Black-Scholes equation backward in time using the 'final' condition at n = N
we have just derived.14
(i) Construction of the price tree. Firstly, we should build the set of all
possible asset prices at times ∆T, 2∆T, . . . , N ∆T . For instance, at time ∆T
possible prices are uS and dS, at time 2∆T the prices are u2 S, udS = S, d2 S.
In general, at time n∆T (n ≥ 0) there are n + 1 possible prices: u^{n−2m} S,
m = 0, 1, . . . , n.
It is due to condition ud = 1 that the number of vertices grows lin-
early with time rather than exponentially. Such a binomial tree is called
recombining.
14
A practitioner's view of the CRR model: Considered as a model, CRR isn't too pro-
ductive since it does not admit explicit solutions (other than as series or in the BS limit),
it doesn't readily generalise, and empirically it falls over straight away. The world is cer-
tainly not Wiener (whatever else it might be), but it is even less binomial! As a numerical
method it has a greater appeal.
CRR came along in the early 70s (CR wrote a book about BS that introduced a binomial
tree). It was seen at the time as being a simple (the marketing blurb) method for (i) pricing
American option in BS framework and (ii) explaining to non-maths types how you could
get prices out of BS. There were derivations around back then that extracted BS from
CRR. Of course CRR can be viewed as a weak numerical scheme for GBM, as can Jarrow
and Rudd and the other binomial methods that were circulating at the time. (Pr. Nick
Webber, WBS.)
This set of prices can be thought of as the set of vertex labels of the binomial
price tree, see Fig. 3.9, part I. The tree can be constructed recursively using
the forward step shown in Fig. 3.9, part II. The vertices of the tree can
be arranged into levels according to the number of time steps needed to get
from the root of the tree to the given vertex. Vertices at the final, N -th level,
are called leaves.
(ii) Calculation of the option price. Leaves of the tree correspond to the
exercise time. Therefore, the value of the option at each leaf is known due to
(3.8.18). Assume that the value of the option at each vertex at level n is known.
Then its value at each vertex of level n − 1 can be calculated using equation
(3.8.17)! This process is called backward evaluation. Its elementary step is
shown in Fig. 3.9, part III. After N steps of backward evaluation we reach
level zero, i. e. calculate C(S, 0).
The binomial tree method of solving the discrete BS equation can be generalized
to deal with stochastic volatility models and/or more complicated types of
options such as American options. You will have a chance to implement this
algorithm and its generalizations during the MATLAB sessions and while
working on the term project.
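The backward evaluation described in steps (i)-(ii) takes only a few lines. The following Python sketch (the MATLAB sessions use the same algorithm) prices a European call on the recombining tree. The per-step jump σ = σ0√∆T is an assumption anticipating the continuous limit discussed below, and all numerical values are illustrative.

```python
import numpy as np

def crr_call(S0, K, T, r, sigma0, N):
    """European call by backward evaluation on the recombining CRR tree.
    sigma0 is the continuous-time volatility; per-step jump sigma = sigma0*sqrt(dT)."""
    dT = T / N
    sigma = sigma0 * np.sqrt(dT)
    u, d = np.exp(sigma), np.exp(-sigma)
    p = (np.exp(r * dT) - d) / (u - d)        # risk-neutral up-probability
    disc = np.exp(-r * dT)
    # leaf prices S0 * u^(N-2m), m = 0..N, and the payoff (3.8.18) at expiry
    S = S0 * u ** (N - 2 * np.arange(N + 1))
    C = np.maximum(S - K, 0.0)
    for _ in range(N):                         # backward evaluation, one level per step
        C = disc * (p * C[:-1] + (1 - p) * C[1:])
    return C[0]

print(crr_call(1.0, 1.0, 1.0, 0.05, 0.2, 1000))   # converges to the BS value as N grows
```

Because the tree is recombining, the cost is O(N^2) rather than the O(2^N) a non-recombining tree would require.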
e^{−r∆T} ≈ 1 − r∆T,
u ≈ 1 + σ + σ^2/2,
d ≈ 1 − σ + σ^2/2,
[Figure 3.9: I. the binomial price tree; II. the forward step; III. the backward evaluation step.]
Expanding the risk-neutral probability p in powers of σ gives
2σp ≈ r∆T + σ − σ^2/2,
and Taylor expansion of the option value yields
C(uS, t + ∆T) ≈ C(S, t) + (∂C/∂t)(S, t) ∆T + S (∂C/∂S)(S, t)(σ + σ^2/2) + (σ^2/2) S^2 (∂^2C/∂S^2)(S, t),
C(dS, t + ∆T) ≈ C(S, t) + (∂C/∂t)(S, t) ∆T + S (∂C/∂S)(S, t)(−σ + σ^2/2) + (σ^2/2) S^2 (∂^2C/∂S^2)(S, t).
Substituting these expansions into (3.8.17) and collecting the terms we get
the following relation:
(∂C/∂t)(S, t) + rS (∂C/∂S)(S, t) + (σ^2/(2∆T)) S^2 (∂^2C/∂S^2)(S, t) − rC(S, t) + O(σ^3/∆T) = 0.
(3.8.19)
The price of the option should not depend on ∆T as the latter goes to zero.
Therefore, two cases are possible:
Case I. lim_{∆T→0}(σ^2/∆T) = 0. The second-derivative term in (3.8.19) disappears
and the limiting equation is first order; its solution with the final condition
C(S, T) = max(S − K, 0) is
C(S, t) = max(S − Ke^{−r(T−t)}, 0).   (∗∗)
The meaning of this solution is clear: assume that the asset price grows
deterministically according to the risk-free interest rate. Then the risk-neu-
trality assumption applied to C(S, t) and the known pay-off at time T would
tell us that
e^{r(T−t)} C(S, t) = max(Se^{r(T−t)} − K, 0),
which coincides with (∗∗). In other words, Case I corresponds to zero volatil-
ity limit for which it would be impossible to make or lose money using options.
The possibility that real markets fall into this Case has been thoroughly ruled
out experimentally.
Case II. lim∆T →0 (σ 2 /∆T ) = σ02 > 0. The limiting equation is the celebrated
Black-Scholes (BS) equation:
(∂C/∂t)(S, t) + rS (∂C/∂S)(S, t) + (σ_0^2/2) S^2 (∂^2C/∂S^2)(S, t) − rC(S, t) = 0.
(3.8.20)
The parameter σ02 has the meaning of the continuous time volatility, asso-
ciated with the geometric Brownian motion which describes the continuous
limit of the CRR model. We will learn more about this stochastic process in
the near future.
In the meantime, let us solve equation (3.8.20). Introducing the new inde-
pendent variables
x = ln(S/K),   τ = T − t,
we can re-write the BS equation as the forward diffusion equation with the
drift term:
−(∂C/∂τ)(x, τ) + (r − σ_0^2/2)(∂C/∂x)(x, τ) + (σ_0^2/2)(∂^2C/∂x^2)(x, τ) − rC(x, τ) = 0.
Next, we change the dependent variable, writing C = e^{αx+βτ} C_0(x, τ) with
the constants α, β chosen so that C_0 satisfies the pure diffusion equation; the
solution behaves asymptotically as
C ∼ e^{c_± x},   x → ±∞.   (3.8.23)
The solution of the diffusion equation with initial data C_0(y, 0) is
C_0(x, τ) = ∫_{−∞}^{∞} dy G(x − y, τ) C_0(y, 0),   (3.8.24)
where
G(τ, y) = (1/√(2πτσ_0^2)) e^{−y^2/(2τσ_0^2)}
is the Green function of the diffusion equation. Notice that Green’s function
coincides with the Gaussian probability distribution function with variance
τ σ02 , which describes (due to CLT) the law of the position of a discrete random
walker in the continuous limit.
Substituting the initial conditions into (3.8.24) we find:
C_0(x, τ) = S ∫_0^∞ dy G(x − y, τ) e^{(1−α)y} − K ∫_0^∞ dy G(x − y, τ) e^{−αy}.
The integrals in the above solution can be expressed in terms of error func-
tions. Substituting the answer for C0 into C = eαx+βτ and re-writing the
result in terms of the original variables S and t, we arrive at the Black-Scholes
formula for the European call option:
C(S, t) = S · erfc(−d_1)/2 − K e^{−r(T−t)} · erfc(−d_2)/2,   (3.8.25)
where t < T,
d_1 = ( ln(S/K) + (T − t)(r + σ_0^2/2) ) / √(2σ_0^2(T − t)),   (3.8.26)
d_2 = ( ln(S/K) + (T − t)(r − σ_0^2/2) ) / √(2σ_0^2(T − t)),   (3.8.27)
and
erfc(x) = (2/√π) ∫_x^∞ dy e^{−y^2}.   (3.8.28)
[Figure: the option price C as a function of S/K. Left panel: the option price increases with the contract length (time-to-exercise); right panel: the option price increases with volatility.]
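Formula (3.8.25) in the erfc form is straightforward to implement; the following Python sketch includes the discount factor e^{−r(T−t)} multiplying the strike, as in the standard Black-Scholes formula, and reproduces the monotone growth of the option price with time-to-exercise and with volatility.

```python
from math import log, sqrt, exp, erfc

def bs_call(S, K, r, sigma0, tau):
    """Black-Scholes European call in the erfc form of (3.8.25)-(3.8.28), tau = T - t."""
    s = sqrt(2.0 * sigma0**2 * tau)
    d1 = (log(S / K) + tau * (r + sigma0**2 / 2)) / s
    d2 = (log(S / K) + tau * (r - sigma0**2 / 2)) / s
    return S * erfc(-d1) / 2 - K * exp(-r * tau) * erfc(-d2) / 2

# the price grows with time-to-exercise and with volatility
assert bs_call(1.0, 1.0, 0.05, 0.2, 2.0) > bs_call(1.0, 1.0, 0.05, 0.2, 1.0)
assert bs_call(1.0, 1.0, 0.05, 0.3, 1.0) > bs_call(1.0, 1.0, 0.05, 0.2, 1.0)
print(bs_call(1.0, 1.0, 0.05, 0.2, 1.0))
```

The same numbers are obtained in the N → ∞ limit of the binomial backward evaluation, which is a useful consistency check between the discrete and continuous pictures.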