Report
Yu Sun 9074473373
dM/dt = λM
This simple model describes many processes in biology, epidemic theory, physics, chemistry, etc. At the same time, most of these processes can also be described by a Markov chain model: for a probability space (Ω, F, P) with filtration (F_t, t ∈ T) and a measurable space (S, S), for any s < t with s, t ∈ T, and for all A ∈ S,

P(X_t ∈ A | F_s) = P(X_t ∈ A | X_s);

for simplicity, this means the future depends only on the current state information. For example, if we choose a special Markov process X(t) to be a branching process, we have the following lemma.
Lemma. There exists a sequence {X_n(t)} of branching processes whose (suitably rescaled) means converge to M(s) = M(0)e^{λs}, the solution of the deterministic model with initial value M(0).
In the 1970s, Kurtz gave an extension of this lemma to pure jump Markov processes [3].
We will use the same notation as in that paper. Let {X_n(t)} be a sequence of pure jump Markov processes with state spaces (E_n, B_n), where E_n ∈ B^K, B^K denotes the Borel σ-algebra of R^K, and B_n = {E_n ∩ B : B ∈ B^K}. Suppose the X_n(t) are all right continuous, with
• exit distribution µ_n(x, Γ), Γ ∈ B_n;
• P{X_n(τ_x^n) ∈ Γ | X_n(0) = x} = µ_n(x, Γ), which is B_n-measurable in x, where τ_x^n is the first exit time of X_n from x;

then

lim_{n→∞} P{ sup_{s≤t} |X_n([α_n s]) − X(s, x_0)| > ε } = 0,  ∀ε > 0, ∀t > 0.
The proof of this result is based on a similar argument for ODEs, and we skip the details here. We thus have a theorem stating that, under certain conditions, a sequence of pure jump Markov processes X_n(t) converges to the solution X(t) of a first-order differential equation; the sense in which this convergence holds is discussed in detail in [4]. This theorem gives a solid theoretical justification for deterministic models: they are as good as the corresponding stochastic models provided the population is sufficiently large. We will discuss sufficient conditions for a stochastic model to converge to an ODE in detail later; here we give some simple examples for better understanding.
We start with a Markov process with many small jumps occurring at a fast rate [5]. We define the drift as the product of the average jump and the total rate. In more general cases the process may have several types of jumps; the drift is then the sum over jump types of the product of each jump size and its rate. If we take N large enough, then N quantifies well the size of the jumps and the jump rate. (Remark: this scaling arises from the fluid limit, or law of large numbers: for sufficiently large N, the size of each jump is of order 1/N and the rate is of order N. This is quite different from the diffusive, central-limit scaling, in which jumps have size of order 1/√N and the rate is of order N. Further, in the classical central limit, the Wiener process, i.e. the Gaussian diffusive limit, describes the first-order fluctuations of a process around the fluid limit.)
Consider, for instance, a process with jumps of size 1/N occurring at rate λN. Then the drift is easily seen to be λ, and the limiting ODE is

dy_t = λ dt
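This fluid-limit scaling is easy to check numerically. Below is a minimal sketch (ours, not from the paper; the values of N, lam and T are arbitrary illustrative choices) that simulates a process with jumps of size 1/N arriving at rate λN and compares the endpoint with λT:

```python
import random

def simulate_fluid(N, lam, T, seed=0):
    """Simulate jumps of size 1/N arriving at rate lam*N, up to time T."""
    rng = random.Random(seed)
    t, y = 0.0, 0.0
    while True:
        # exponential waiting time between jumps, rate lam*N
        t += rng.expovariate(lam * N)
        if t > T:
            break
        y += 1.0 / N  # each jump has size 1/N
    return y

# For large N the path concentrates around y(T) = lam*T.
approx = simulate_fluid(N=10000, lam=2.0, T=1.0)
```

For N = 10000 the endpoint is within a few percent of λT = 2, consistent with the law-of-large-numbers scaling described above.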
k₁A + k₂B ⇌ k₃C,  forward rate µ₁, backward rate µ₂.

In this reversible reaction, k₁ molecules of A and k₂ molecules of B become k₃ molecules of C at rate µ₁, and the reverse reaction occurs at rate µ₂. If A_t, B_t, C_t denote the numbers of molecules at time t, then with the scaled rates λ_i = k_i/N, i = 1, 2, 3, ν₁ = µ₁/N^{k₁+k₂}, ν₂ = µ₂/N^{k₃}, the rescaled process X̃_t will stay close to the ODE solution x̃_t whenever x̃_0 = X̃_0.
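A stochastic (Gillespie-type) simulation illustrates this reaction in the special case k₁ = k₂ = k₃ = 1. This is only a sketch of ours: the mass-action propensities and the parameter values (initial counts, µ₁ = µ₂ = 1, N = 1000, T = 5) are illustrative assumptions, not taken from the text:

```python
import random

def gillespie_reversible(A, B, C, mu1, mu2, N, T, seed=0):
    """Gillespie simulation of A + B <-> C (case k1 = k2 = k3 = 1),
    with mass-action propensities scaled by N as in the fluid limit."""
    rng = random.Random(seed)
    t = 0.0
    while t < T:
        fwd = mu1 * A * B / N   # propensity of A + B -> C
        bwd = mu2 * C           # propensity of C -> A + B
        total = fwd + bwd
        if total == 0:
            break
        t += rng.expovariate(total)
        if t >= T:
            break
        if rng.random() < fwd / total:
            A, B, C = A - 1, B - 1, C + 1
        else:
            A, B, C = A + 1, B + 1, C - 1
    return A, B, C

A, B, C = gillespie_reversible(A=800, B=600, C=0, mu1=1.0, mu2=1.0, N=1000, T=5.0)
```

The quantities A + C and B + C are conserved along every path, which is the discrete analogue of the conservation laws of the limiting ODE system.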
For a branching-type example with mean offspring number µ, the same recipe gives

dy_t = (µ − 1)y_t dt
So far we have given first-order approximations to the Markov chain. In [Kurtz 71] it is discussed how large "sufficiently large" must be, using martingale theory to prove bounds on the probability in the theorem above. More importantly, we want a more precise approximation than first order; in the next two sections we introduce a theorem on Markov chains converging to SDEs.
2 Introduction to Diffusion Processes

In many physical, biological, economic and social phenomena, diffusion processes are used for approximation or for reasonable modeling. In fact, diffusion processes have many good properties, the most famous and important being the weak convergence connecting the SDE to a deterministic PDE. We start with an introduction to diffusion processes [1].
Definition.1. A continuous-time stochastic process which possesses the (strong) Markov property, and whose sample paths X(t) are a.e. continuous functions of t, is called a diffusion process.

A diffusion process on an interval I is called regular if:

• starting from any interior point of I, the process reaches any other interior point of I with positive probability.

WLOG, the diffusion processes we discuss below are all regular.
For a diffusion process, for every ε > 0,

lim_{h→0} (1/h) P(|X(t + h) − X(t)| > ε | X(t) = x) = 0,

and the limits

lim_{h→0} (1/h) E[X(t + h) − X(t) | X(t) = x] = µ(x, t),
lim_{h→0} (1/h) E[(X(t + h) − X(t))² | X(t) = x] = σ²(x, t)

exist. We usually call µ(x, t) and σ²(x, t) the infinitesimal parameters of the process: µ(x, t) is called the infinitesimal mean or drift parameter; σ²(x, t) is called the infinitesimal variance or diffusion parameter. Generally, µ(x, t) and σ²(x, t) are continuous functions of x and t.
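These defining limits can be sanity-checked by simulating a single Euler step of a diffusion and estimating µ and σ² empirically. The sketch below is only our illustration; the Ornstein-Uhlenbeck-type choices µ(x, t) = −x and σ(x, t) = 0.5 are assumptions, not from the text:

```python
import random
import statistics

def empirical_infinitesimal(mu, sigma, x, t, h=1e-3, n=200000, seed=0):
    """Estimate the infinitesimal mean and variance of the diffusion
    dX = mu(x,t) dt + sigma(x,t) dW from many one-step increments of size h."""
    rng = random.Random(seed)
    incs = [mu(x, t) * h + sigma(x, t) * (h ** 0.5) * rng.gauss(0, 1)
            for _ in range(n)]
    m = statistics.fmean(incs) / h                   # should approach mu(x, t)
    v = statistics.fmean(d * d for d in incs) / h    # should approach sigma^2(x, t)
    return m, v

# Example: mu(x,t) = -x, sigma(x,t) = 0.5, evaluated at x = 1.
m, v = empirical_infinitesimal(lambda x, t: -x, lambda x, t: 0.5, x=1.0, t=0.0)
```

Here m should be close to µ(1, 0) = −1 and v close to σ² = 0.25, up to Monte Carlo error and an O(h) bias.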
An interesting question is how to determine whether a given stochastic process is a diffusion process; that is, we need sufficient conditions for a process to be a diffusion. But first we should start with the standard process.
• left limits of X(t) exist: for all s > 0, lim_{t↑s} X(t) exists;
• X(t) is continuous from the left through Markov times: if T₁ ≤ T₂ ≤ · · · are Markov times converging to T ≤ ∞, then lim_{n→∞} X(T_n) = X(T) on {T < ∞}.

A standard process for which, for every ε > 0,

(1/h) P(|X(t + h) − X(t)| > ε | X(t) = x) → 0, as h → 0,

uniformly in x on every compact subinterval of I, is a diffusion process.
Proof. The proof is quite long and we skip it; all the details can be found in [1].
3 Convergence to Diffusion
In this section we write X^(N) = {X_k^(N)} for the discrete-time case and X^(N) = {X_t^(N)} for the continuous-time case. We first study the discrete case [2].

Let X^(N) = {X_k^(N)} have state space confined to a closed bounded interval of the real line, adapted to an increasing sequence of σ-fields {F_k^(N)}; that is, X_k^(N) is measurable with respect to F_k^(N). Note that we do not require the Markov property in the discrete case.
We use the notation ΔX_n^(N) = X_{n+1}^(N) − X_n^(N), describe ΔX_n^(N) through its conditional moments, and assume it satisfies

• E[ΔX_n^(N) | F_n^(N)] = h_N µ(X_n^(N)) + ε_{1,n}^(N);
• E[(ΔX_n^(N))² | F_n^(N)] = h_N σ²(X_n^(N)) + ε_{2,n}^(N);
• E[(ΔX_n^(N))⁴ | F_n^(N)] = ε_{4,n}^(N);
• for i = 1, 2, 4, Σ_{n<[t/h_N]} E|ε_{i,n}^(N)| → 0, where [z] denotes the integer part of z.

Remark. Usually, when F_n^(N) = σ(X_n^(N)), we can replace the conditional expectation by E[· | F_n^(N)] = E[· | X_n^(N)].
Then we consider the continuous case: let

X^(N)(t) = X^(N)_{[t/h_N]},

a continuous-time step process. We then have the most important theorem of this chapter: under the above conditions, X^(N)(t) converges weakly to X(t); this weak convergence means convergence in distribution, and X(t) has infinitesimal drift µ and infinitesimal variance σ².
Proof. The details are all in [Durr 96], (7.1) and (8.2); the tightness condition is (A) and the truncated moment condition is (B) on p. 297. Briefly, we separate into the discrete and continuous cases. In both cases we can show tightness by showing that

X_t^i − ∫₀^t b_i(X_s) ds,   X_t^i X_t^j − ∫₀^t a_{ij}(X_s) ds

are local martingales. For the continuous case we can alternatively show that

sup_{x∈K} (d/dt) P(X_t^h ∈ R^d | X_0^h = x) < ∞.
Example.1. Generations are in discrete time t = 0, 1, 2, 3, . . .. There are 2 types of players: C (cooperators) and D (defectors). Fix an integer N. Suppose at the beginning (generation 0) there are N/2 players of each type. To go to the next generation:

• randomly pick two players and let them play according to the game matrix; new players are generated according to this step: 2 C and 5 D are generated from this game.
• pick 7 players uniformly at random among the N players in generation 0 and replace them by the 2 C and 5 D. Hence there are still N players in total in generation 1.

Show that C_t^(N), D_t^(N) converge to a system of ODEs.
Proof. Since C and D are conserved in total, dC_t = −dD_t, so for simplicity we consider only D_t. If D_n = i/N, the transition probabilities are

p_{i,i+5−j} = C_i^j C_{N−i}^{7−j} / C_N^7,  j = 0, . . . , 7.
For the drift term,

E[ΔD_n | D_n = i/N]
= Σ_{j=0}^{7} ((i + 5 − j)/N) · (C_i^j C_{N−i}^{7−j} / C_N^7) − i/N
= (1/(N C_N^7)) [ (i + 5) Σ_{j=0}^{7} C_i^j C_{N−i}^{7−j} − Σ_{j=0}^{7} j C_i^j C_{N−i}^{7−j} ] − i/N
= (1/(N C_N^7)) [ (i + 5) C_N^7 − i Σ_{j=1}^{7} C_{i−1}^{j−1} C_{(N−1)−(i−1)}^{6−(j−1)} ] − i/N
= (1/(N C_N^7)) [ (i + 5) C_N^7 − i C_{N−1}^6 ] − i/N
= (1/N)(5 − 7i/N).
For the second moment,

E[(ΔD_n)² | D_n = i/N]
= i²/N² − (2i/N²)(i + 5 − 7i/N) + Σ_{j=0}^{7} ((i + 5 − j)/N)² · (C_i^j C_{N−i}^{7−j} / C_N^7)
= (1/N²) { (−i² − 10i + 14i²/N) + (1/C_N^7) [ (i + 5)² C_N^7 − (2i + 9)(7i/N) C_N^7 + (42 i(i − 1)/(N(N − 1))) C_N^7 ] }
= (1/N²) { 25 − 63i/N + 42 i(i − 1)/(N(N − 1)) } = O(1/N²) = o(1/N),
so σ²_D(x) = 0. A similar calculation gives E[(ΔD_n)⁴ | D_n = i/N] = o(1/N). Therefore dD_t = (5 − 7D_t) dt, and with D_0 = C_0 = 1/2,

D_t = 5/7 − (3/14) e^{−7t},
C_t = 2/7 + (3/14) e^{−7t}.
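The convergence in Example 1 can be illustrated by direct simulation. The sketch below is our own illustration (N = 2000 and t = 1 are arbitrary choices): it runs the replacement chain and compares the defector fraction with the ODE solution D_t = 5/7 − (3/14)e^{−7t}.

```python
import math
import random

def simulate_example1(N, T, seed=0):
    """Simulate the C/D chain of Example 1: each step draw 7 players
    without replacement (j of them defectors), then replace them by
    2 C and 5 D, so the defector count changes by 5 - j.
    Time is scaled as t = n/N."""
    rng = random.Random(seed)
    i = N // 2                                  # start with N/2 defectors
    for _ in range(int(T * N)):
        sample = rng.sample(range(N), 7)
        j = sum(1 for s in sample if s < i)     # defectors among the 7
        i += 5 - j
    return i / N

d_sim = simulate_example1(N=2000, T=1.0)
d_ode = 5 / 7 - (3 / 14) * math.exp(-7 * 1.0)   # ODE solution D_t at t = 1
```

Since σ²_D = 0 in the limit, the simulated fraction concentrates tightly around the deterministic path for large N.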
Example.2. (A more general case from evolutionary game theory.) We still consider a population of fixed size N, with types C and D in the population, and set p = #C/N, q = #D/N. Suppose they have a symmetric game matrix

        C        D
C       α        (β, γ)
D       (β, γ)   δ

Per step:

• pick (C, C), with P = #C(#C − 1)/(N(N − 1)) ≈ p², and generate 2α new C;
• pick (C, D), with P = 2#C#D/(N(N − 1)) ≈ 2pq, and generate β new C and γ new D;
• pick (D, D), with P = #D(#D − 1)/(N(N − 1)) ≈ q², and generate 2δ new D.
The conditional moments of the increments are

          E[·]              E[·²]
X_CC      2αq               2αqp + (2αq)²
X_DD      −2δq              2δqp + (2δq)²
X_CD      (β + γ)q − γ      (β + γ)(q − p − 2γq) + (β + γ)²q² + γ²
Then, by a calculation similar to the one before,

E[ΔC/N] = (1/N) f(C/N);
E[(ΔC/N)²] = (1/N) · (1/N) g(C/N);
E[(ΔC/N)⁴] = 0,

with

f(x) = 2αx²(1 − x) − 2δ(1 − x)³ + 2(β + γ)x(1 − x)² − 2γx(1 − x);
g(x) = 2αx³(1 − x) + 4α²x²(1 − x)² + 2δx(1 − x)³ + 4δ²(1 − x)⁴ + 2x(1 − x){(β + γ)[x(1 − x) − 2γ(1 − x)] + (β + γ)²(1 − x)² + γ²}.
Now consider a game matrix

        A    B
A       a    b
B       c    d

We assume the population is finite (of size N) and constant; the balance between selection and drift can be described by the Moran process:
Definition.2. The transition matrix of the stochastic process is tri-diagonal, and the transition probabilities are

P_{i,i−1} = ((N − i)/N)(i/N),
P_{i,i+1} = (i/N)((N − i)/N),
P_{i,i} = 1 − P_{i,i−1} − P_{i,i+1}.
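The neutral Moran process defined by these probabilities can be sanity-checked by simulation: the number of A individuals is a martingale, so the probability that A eventually fixes (reaches N) equals the initial fraction i₀/N. This is a sketch of ours, not part of the text, with N = 20 and i₀ = 5 as illustrative choices:

```python
import random

def moran_fixation(N, i0, trials=2000, seed=0):
    """Neutral Moran process: per step, i -> i+1 with prob (i/N)((N-i)/N),
    i -> i-1 with the same prob, otherwise i stays.  Returns the fraction
    of runs in which type A fixes (i reaches N)."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        i = i0
        while 0 < i < N:
            p = (i / N) * ((N - i) / N)
            u = rng.random()
            if u < p:
                i += 1
            elif u < 2 * p:
                i -= 1
        fixed += (i == N)
    return fixed / trials

p_fix = moran_fixation(N=20, i0=5)  # theory predicts i0/N = 0.25
```

The martingale argument here is the same tool [Kurtz 71] uses to bound deviation probabilities in the fluid-limit theorem.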
• replacement: the offspring randomly replaces a randomly selected individual in the population;
• frequency-dependent contribution to fitness: associated with interactions with other members of the population.
If i is the number of A individuals, then the average payoffs for the two types are given by

π_i^A = (a(i − 1) + b(N − i))/(N − 1),
π_i^B = (ci + d(N − i − 1))/(N − 1),

and the fitness of type κ ∈ {A, B} is

f_i^κ = 1 − ω + ω π_i^κ,

where ω ∈ [0, 1] determines the relative contribution of the baseline fitness; clearly, the bigger ω is, the stronger the frequency dependence of fitness. Then, combining this with the properties of the Moran process, we get the global-information transition probabilities:

T_g^+(i) = ((1 − ω + ω π_i^A)/(1 − ω + ω⟨π_i⟩)) · (i/N) · ((N − i)/N),
T_g^−(i) = ((1 − ω + ω π_i^B)/(1 − ω + ω⟨π_i⟩)) · ((N − i)/N) · (i/N),
T_g^0(i) = 1 − T_g^−(i) − T_g^+(i).
If we denote by P^τ(i) the probability that the system is in state i at time τ, and introduce the notation

x = i/N;  t = τ/N;  ρ(x, t) = N P^τ(i),

then ρ(x, t) denotes the probability density. The master equation gives
ρ(x, t + 1/N) − ρ(x, t) = ρ(x − 1/N, t) T_ξ^+(x − 1/N) + ρ(x + 1/N, t) T_ξ^−(x + 1/N) − ρ(x, t) T_ξ^+(x) − ρ(x, t) T_ξ^−(x).

For N ≫ 1, Taylor expansion of each term to second order gives

ρ(x ∓ 1/N, t) T_ξ^±(x ∓ 1/N) = ρT_ξ^± ∓ (1/N) ∂_x(ρT_ξ^±) + (1/(2N²)) ∂_x²(ρT_ξ^±) + o(1/N²),

so that

ρ(x, t + 1/N) − ρ(x, t) = −(1/N) ∂_x[(T_ξ^+ − T_ξ^−)ρ] + (1/(2N²)) ∂_x²[(T_ξ^+ + T_ξ^−)ρ] + o(1/N²).
That implies

∂ρ(x, t)/∂t ≈ −(∂/∂x)[a(x)ρ(x, t)] + (1/2)(∂²/∂x²)[b²(x)ρ(x, t)]

with a(x) = T_ξ^+(x) − T_ξ^−(x) and b(x) = √((1/N)[T_ξ^+(x) + T_ξ^−(x)]). Since this form is exactly the Fokker-Planck equation, we know ρ(x, t) actually denotes the probability density of a random variable X_t, an Itô process driven by a standard Wiener process W_t and satisfying the SDE

dX_t = a(X_t) dt + b(X_t) dW_t.
Since N is sufficiently large and |T_ξ^+(x)|, |T_ξ^−(x)| ≤ 1, we have b(X_t) ≈ 0. For N → ∞,

Global:  dX_t = (X_t [π^A(X_t) − ⟨π(X_t)⟩]/(Γ + ⟨π(X_t)⟩)) dt

with

π^A(X_t) = aX_t + b(1 − X_t),
π^B(X_t) = cX_t + d(1 − X_t),
⟨π(X_t)⟩ = π^A(X_t)X_t + π^B(X_t)(1 − X_t),
Γ = (1 − ω)/ω;  Υ = ω/Δπ_max.
Personal comment on this Local Information Model: the model is somewhat appealing, but the locality assumption does not seem fully sensible. If we compute T_l^0(i) = 1 − T_L^+(i) − T_L^−(i) = 1 − (i/N)((N − i)/N), it is actually independent of the baseline fitness, which does not make sense at all. For example, if we consider i to be the number of A-type individuals in the population, then if the baseline fitness of B increases, T_g^+(i) and T_g^0(i) should decrease and T_g^−(i) should increase.
Now assume a mutation matrix M = (m_jk)_{d×d}; the special case of vanishing mutation is M = I_{d×d}. The average payoff of type j in a population (i₁, · · · , i_d) is

π_j(i₁, · · · , i_d) = (Σ_{k=1}^d p_jk i_k)/(N − 1),

and the relative fitness factor is

(1 − ω + ω π_j(i₁, · · · , i_d))/(1 − ω + ω⟨π(i₁, · · · , i_d)⟩),  ⟨π(i₁, · · · , i_d)⟩ = Σ_{k=1}^d (i_k/N) π_k(i₁, · · · , i_d).
For simplicity we can always choose ω = 1 and multiply the payoff matrix by a constant. Now consider the transition probability that an individual of type k is replaced by one of type j (k ≠ j), and denote it by T_kj(i₁, · · · , i_d); this happens in two cases. In both cases an individual of type k is replaced, so going back to the three transition probabilities of the Moran process, only P_{i_k,i_k+1} and part of P_{i_k,i_k} for type k qualify: each sub-case of P_{i_k,i_k+1} must mutate to type j, and within P_{i_k,i_k} only the event of picking type k and generating type k is counted, followed by mutation to type j.
Combining the fitness with ω = 1, we have

T_kj(i₁, · · · , i_d) = Σ_{l≠k} (π_l/⟨π⟩)(i_l/N)(i_k/N) m_lj + (1 − P_{i_k,i_k−1} − P_{i_k,i_k+1}) · ((π_k/⟨π⟩)(i_k/N)(i_k/N)/(1 − P_{i_k,i_k−1} − P_{i_k,i_k+1})) m_kj
= (i_k/(N²⟨π(i₁, · · · , i_d)⟩)) Σ_{l=1}^d i_l π_l(i₁, · · · , i_d) m_lj,
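Since M is row-stochastic and ⟨π⟩ = Σ_l (i_l/N)π_l, this closed form implies Σ_j T_kj = i_k/N, and hence Σ_{k,j} T_kj = 1, which is relevant to the discussion at the end of this report. A quick numeric sketch with made-up payoff and mutation matrices (our illustration only, with ω = 1):

```python
# Numeric check of the T_kj formula; the payoff matrix P and the
# row-stochastic mutation matrix M below are hypothetical examples.
N = 30
i = [10, 12, 8]                       # population counts i_1, i_2, i_3
P = [[1.0, 2.0, 0.5],                 # payoff entries p_jk
     [0.5, 1.5, 1.0],
     [2.0, 0.5, 1.0]]
M = [[0.9, 0.05, 0.05],               # mutation matrix, rows sum to 1
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]

pi = [sum(P[j][k] * i[k] for k in range(3)) / (N - 1) for j in range(3)]
avg = sum(ik / N * pj for ik, pj in zip(i, pi))   # <pi>

def T(k, j):
    """T_kj = i_k / (N^2 <pi>) * sum_l i_l pi_l m_lj."""
    return i[k] / (N * N * avg) * sum(i[l] * pi[l] * M[l][j] for l in range(3))

row_sums = [sum(T(k, j) for j in range(3)) for k in range(3)]   # each = i_k / N
total = sum(row_sums)                                           # = 1
```

The check confirms that the full double sum of the T_kj is 1, while a single sum over j gives i_k/N rather than 1.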
With similar notation, let P^τ(i₁, · · · , i_d) denote the probability of state (i₁, · · · , i_d) at time τ; then

P^{τ+Δτ}(i₁, · · · , i_d)
= Σ_{j,k=1}^d P^τ(i₁, · · · , i_j − 1, · · · , i_k + 1, · · · , i_d) × T_kj(i₁, · · · , i_j − 1, · · · , i_k + 1, · · · , i_d)
= Σ_{j≠k} P^τ(i₁, · · · , i_j − 1, · · · , i_k + 1, · · · , i_d) × T_kj(i₁, · · · , i_j − 1, · · · , i_k + 1, · · · , i_d) + Σ_{j=1}^d P^τ(i₁, · · · , i_d) × T_jj(i₁, · · · , i_d)
= Σ_{j≠k} P^τ(i₁, · · · , i_j − 1, · · · , i_k + 1, · · · , i_d) × T_kj(i₁, · · · , i_j − 1, · · · , i_k + 1, · · · , i_d) + P^τ(i₁, · · · , i_d) × [1 − Σ_{k≠j} T_kj(i₁, · · · , i_d)].
That implies

ρ(x; t + Δt) − ρ(x; t) = Σ_{j≠k} ρ(j^−, k^+; t) × T_kj(j^−, k^+) − Σ_{j≠k} ρ(x; t) × T_kj(x),

where (j^−, k^+) denotes x with x_j decreased and x_k increased by 1/N, and hence

∂ρ(x)/∂t = −Σ_{k=1}^{d−1} (∂/∂x_k)[ρ(x) a_k(x)] + (1/2) Σ_{j,k=1}^{d−1} (∂²/(∂x_k ∂x_j))[ρ(x) b_jk(x)]
with

a_k(x) = Σ_{j≠k}^d [T_jk(x) − T_kj(x)],
b_jk(x) = (1/N) [ δ_jk Σ_{l=1}^d (T_jl(x) + T_lj(x)) − T_jk(x) − T_kj(x) ].
Similarly to Example 3, we could also use the local-information transition probabilities to compute the Fokker-Planck equation, but since we already argued that this idea is technically questionable, we skip this part.
After this, a very interesting question concerns the stationary distribution and equilibrium of the Fokker-Planck equation, since its analytic solution is very hard to obtain. Consider the probability current

J_k(x) = ρ(x) a_k(x) − (1/2) Σ_{j=1}^{d−1} (∂/∂x_j)[ρ(x) b_jk(x)].
Denote the equilibrium density by ρ*(x); at equilibrium J_k(x) = 0 for all k, so we get

Σ_{j=1}^{d−1} b_jk(x) (∂/∂x_j) ρ*(x) = [ 2 a_k(x) − Σ_{j=1}^{d−1} (∂/∂x_j) b_jk(x) ] ρ*(x),  ∀k.
Writing ∂ρ*/∂x_j = Γ_j(x) ρ*(x), the vector Γ(x) solves the linear system above. If we assume Γ(x) is a gradient field, then the solution exists and is independent of the choice of path; with initial data x₀ we get the solution

ρ*(x) = ρ*(x₀) exp( ∫_{x₀}^x Γ(y) · dy ),

but because of the complexity of Γ(x), we cannot easily obtain the analytic solution.
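In the one-dimensional case (d − 1 = 1) the gradient condition holds automatically and the stationary density is explicit. A short derivation from the zero-current condition, under the assumption that b(x) > 0 in the interior:

```latex
J(x) = a(x)\rho^*(x) - \tfrac{1}{2}\frac{d}{dx}\bigl[b(x)\rho^*(x)\bigr] = 0
\;\Longrightarrow\;
\frac{\bigl(b\rho^*\bigr)'}{b\rho^*} = \frac{2a(x)}{b(x)}
\;\Longrightarrow\;
\rho^*(x) = \frac{C}{b(x)}\exp\!\left(\int_{x_0}^{x}\frac{2a(y)}{b(y)}\,dy\right),
```

so in one dimension Γ(y) = 2a(y)/b(y) − b′(y)/b(y), consistent with the equilibrium condition above.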
Another good idea, shown in Traulsen's 2012 paper, is to use the eigenvalue decomposition of the matrix: if we denote the coefficient matrices by (A_k(x))_{1×(d−1)} and (B_jk(x))_{(d−1)×(d−1)}, he uses the fact that Σ_{j=1}^d T_jk(x) = 1, so that

A_k(x) = Σ_{j=1}^d [T_jk(x) − T_kj(x)] = −1 + Σ_{j=1}^d T_jk(x).
It is easy to observe that B(x) is a symmetric matrix, and using the weak diagonal dominance theorem we know that B is non-negative definite. By Itô calculus, the solution of the Fokker-Planck system is given by a Langevin equation, which can be written as the diffusion process

dx_k = A_k(x) dt + C_k(x) · dB_t,

where C^T(x) · C(x) = B(x), C_k(x) denotes the k-th row of C(x), and B_t is a (d − 1)-dimensional vector of uncorrelated Brownian motions. Since the matrix B(x) is non-negative definite, there exists an orthogonal matrix U(x) such that U^T(x) B(x) U(x) is diagonal with non-negative entries, and one may take C(x) = U(x) diag(√λ₁, . . . , √λ_{d−1}) U^T(x).
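The construction of C(x) at a fixed x can be sketched numerically as a generic symmetric square root; the matrix B below is a made-up non-negative definite example, not derived from the model:

```python
import numpy as np

# Hypothetical symmetric non-negative definite B(x) at one fixed x.
B = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# Eigendecomposition B = U diag(w) U^T with orthogonal U;
# eigh is for symmetric matrices and returns real eigenvalues w >= 0 here.
w, U = np.linalg.eigh(B)

# Symmetric square root C = U diag(sqrt(w)) U^T, so C^T C = C C = B.
C = U @ np.diag(np.sqrt(w)) @ U.T
```

With this choice C is symmetric, so the condition C^T(x)C(x) = B(x) used above holds exactly.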
Personal comments:

• For this general case (and Example 3 as the simple case with only 2 types), we should always face the fact that Σ_{j=1}^d T_kj(x) ≠ 1 in general. This means that if we only define transition probabilities for k ≠ j, we could still have Σ_{j≠k} T_kj(x) > 1 when certain types have large fitness and dominate the population; since T_jj(x) is always positive, we therefore have to check that the transition probabilities are well defined.

• Traulsen made a small mistake: Σ_{j=1}^d T_jk(x) is not always equal to 1; rather, it is Σ_{j,k=1}^d T_jk(x) = 1 that holds at all times.
• For the Fokker-Planck system we could try to use PDE techniques to obtain the analytic solution; in Evans's PDE book and Yoshida's heat kernel expansion we can find iterative, geometric approaches to the analytic solution, but the cost is expensive and we need to find a way to minimize it; for instance, finding an easy way to compute the x-dependent eigenvalues is left for future work.
4 References

[1 KT81] Samuel Karlin and Howard M. Taylor. A Second Course in Stochastic Processes. Academic Press, New York, 1981.

[2 Durr 96] Richard Durrett. Stochastic Calculus. Probability and Stochastics Series. CRC Press, Boca Raton, FL, 1996.

[3 Kurtz 70] T. G. Kurtz. Solutions of ordinary differential equations as limits of pure jump Markov processes. Journal of Applied Probability, 1970, 7(1): 49-58.

[4 Kurtz 71] T. G. Kurtz. Limit theorems for sequences of jump Markov processes approximating ordinary differential processes. Journal of Applied Probability, 1971, 8(2): 344-356.