Essential Questions For The Exam 2018, AMCS 308, Stochastic Methods in Engineering

1. The document provides essential questions for an exam on stochastic methods in engineering. 2. It covers key concepts like the Wiener process, the Central Limit Theorem, Monte Carlo methods, acceptance-rejection sampling, and variance reduction techniques. 3. Variance reduction techniques like control variates aim to reduce the variance of Monte Carlo estimates without introducing bias, improving computational efficiency.

Uploaded by

Agustín Estramil


Essential Questions for the Exam 2018, AMCS 308,

Stochastic Methods in Engineering

May 31, 2018

1. Formulate the basic properties of a Wiener process.

The one-dimensional Wiener process is a mapping $W : [0, \infty) \times \Omega \to \mathbb{R}$, where $\Omega$ is a probability space equipped with a probability measure. The process satisfies the following properties:
• $W_0 = 0$.
• The mapping $t \mapsto W_t$ is almost surely continuous on $[0, \infty)$.
• Consider the time mesh $0 = t_0 < t_1 < t_2 < \dots < t_n = T$. Then the increments $W_{t_n} - W_{t_{n-1}}, \dots, W_{t_1} - W_{t_0}$ are independent and normally distributed random variables: $W_t - W_s \sim N(0, t - s)$ for $0 \le s < t$.
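The three properties translate directly into a sampling recipe: start at zero and add independent Gaussian increments over a time mesh. The following is a minimal sketch under that reading; the function name `sample_wiener_path`, the uniform mesh and the seed are illustrative choices, not part of the notes.

```python
import math
import random


def sample_wiener_path(times, rng):
    """Sample W at the given increasing times (times[0] is assumed to be 0)."""
    path = [0.0]  # W_0 = 0
    for t_prev, t_next in zip(times, times[1:]):
        # Independent increment W_t - W_s ~ N(0, t - s); gauss() takes a std dev.
        increment = rng.gauss(0.0, math.sqrt(t_next - t_prev))
        path.append(path[-1] + increment)
    return path


rng = random.Random(0)
mesh = [i / 100 for i in range(101)]  # uniform mesh on [0, 1]
endpoints = [sample_wiener_path(mesh, rng)[-1] for _ in range(2000)]
mean_end = sum(endpoints) / len(endpoints)
var_end = sum((w - mean_end) ** 2 for w in endpoints) / (len(endpoints) - 1)
print(mean_end, var_end)  # should be close to 0 and T = 1
```

The empirical mean and variance of $W_T$ give a quick sanity check against $W_T \sim N(0, T)$.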
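2. State and derive the Central Limit Theorem. State the Berry-Esseen Theorem.
The Central Limit Theorem: Let $X_1, \dots, X_n$ be IID with mean $\mu$ and variance $\sigma^2$. Let $\bar X_n = n^{-1} \sum_{i=1}^n X_i$. Then
\[ Z_n = \frac{\bar X_n - \mu}{\sqrt{V[\bar X_n]}} = \frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \to Z \]
in distribution, where $Z \sim N(0, 1)$.
Proof: Let $Y_i = (X_i - \mu)/\sigma$. Then $Z_n = n^{-1/2} \sum_{i=1}^n Y_i$. If $\varphi_{Y_i}(t)$ is the moment-generating function (MGF) of $Y_i$, assumed to exist near $t = 0$, then $(\varphi_{Y_i}(t))^n$ is the MGF of $\sum_{i=1}^n Y_i$ and $\varphi_{Z_n}(t) = (\varphi_{Y_i}(t/\sqrt{n}))^n$ is the MGF of $Z_n$. We have
\[ \varphi'_{Y_i}(0) = E[Y_i] = 0, \qquad \varphi''_{Y_i}(0) = E[Y_i^2] = V[Y_i] = 1. \]
Then
\[
\begin{aligned}
\varphi_{Y_i}(t) &= \varphi_{Y_i}(0) + t\,\varphi'_{Y_i}(0) + \frac{t^2}{2!}\varphi''_{Y_i}(0) + \frac{t^3}{3!}\varphi'''_{Y_i}(0) + \dots \\
&= 1 + 0 + \frac{t^2}{2!} + \frac{t^3}{3!}\varphi'''_{Y_i}(0) + \dots \\
&= 1 + \frac{t^2}{2!} + \frac{t^3}{3!}\varphi'''_{Y_i}(0) + \dots,
\end{aligned}
\]
and then
\[
\varphi_{Z_n}(t) = \Big[\varphi_{Y_i}\Big(\frac{t}{\sqrt{n}}\Big)\Big]^n
= \Big[1 + \frac{t^2}{2!\,n} + \frac{t^3}{3!\,n^{3/2}}\varphi'''_{Y_i}(0) + \dots\Big]^n \to e^{t^2/2},
\]
which is the MGF of a $N(0, 1)$.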
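The Berry-Esseen inequality: Define
\[ \lambda = \frac{\big(E[|Y - EY|^3]\big)^{1/3}}{\sigma_Y} < \infty \tag{1} \]
Then the inequality is given by
\[ |F(x) - \Phi(x)| \le \frac{c^* \lambda^3}{(1 + |x|)^3 \sqrt{M}} \tag{2} \]
where $F$ is the CDF of the normalized sum of $M$ IID copies of $Y$ (the variable $Z_n$ above with $n = M$), $\Phi$ is the CDF of a standard normal and $c^*$ is a constant.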

3. Show how to apply the Monte Carlo method to compute an integral and discuss the corresponding error.
Consider
\[ \int_0^1 f(x)\,dx \approx \frac{1}{M} \sum_{i=1}^M f(x_i) =: V_M \tag{3} \]
where $x_i \sim U[0, 1]$ are IID samples.
The variance is given by
\[ V\Big[\frac{1}{M} \sum_{i=1}^M f(x_i)\Big] = \frac{1}{M^2} \sum_{i=1}^M V[f(x_i)] = \frac{V[f(x_i)]}{M} \tag{4} \]
Then by the CLT, as the samples are IID, asymptotically we have
\[ \frac{V_M - E[V_M]}{\sqrt{V[V_M]}} = \sqrt{M}\,\frac{V_M - E[V_M]}{\sqrt{V[f(x_i)]}} \sim N(0, 1) \tag{5} \]
Equivalently,
\[ V_M - E[V_M] \sim N\Big(0, \frac{V[f(x_i)]}{M}\Big) \tag{6} \]
and so we can build the asymptotic confidence intervals
\[ V_M \pm C_\alpha \sqrt{\frac{V[f(x_i)]}{M}} \tag{7} \]
where $C_\alpha$ is an appropriate percentile of the standard normal.
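Formulas (3) and (7) can be turned into a few lines of code. This is a minimal sketch, assuming the unknown $V[f(x_i)]$ is replaced by the sample variance and $C_\alpha = 1.96$ (a 95% interval); the name `mc_integrate` and the test integrand $x^2$ are illustrative.

```python
import math
import random


def mc_integrate(f, num_samples, rng, c_alpha=1.96):
    """Return (estimate, half_width) for the integral of f over [0, 1]."""
    values = [f(rng.random()) for _ in range(num_samples)]
    estimate = sum(values) / num_samples  # V_M in (3)
    # The sample variance stands in for the unknown V[f(x_i)].
    variance = sum((v - estimate) ** 2 for v in values) / (num_samples - 1)
    half_width = c_alpha * math.sqrt(variance / num_samples)  # as in (7)
    return estimate, half_width


rng = random.Random(1)
estimate, half_width = mc_integrate(lambda x: x * x, 20000, rng)
print(estimate, "+/-", half_width)  # the exact value of the integral is 1/3
```

The half width shrinks like $M^{-1/2}$, so halving the error costs four times as many samples.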
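4. Describe and give all the proofs on the sampling method of acceptance-rejection.
The method is as follows. We choose a proposal density $\eta$ that shares the support of our target density $f$ and covers it, in the sense that $\epsilon f(x) \le \eta(x)$ for all $x$ and some constant $\epsilon \in (0, 1]$. Then we sample $x \sim \eta$ and $U \sim U[0, 1]$ independently, and
\[
\begin{cases}
\epsilon \dfrac{f(x)}{\eta(x)} < U & \text{reject the sample} \\[2mm]
\epsilon \dfrac{f(x)}{\eta(x)} \ge U & \text{accept the sample}
\end{cases} \tag{8}
\]
Proof: The probability of accepting a proposal is
\[ P\Big(U_k \le \epsilon\frac{f(x_k)}{\eta(x_k)}\Big) = \int \int_0^{\epsilon f(x)/\eta(x)} du\; \eta(x)\,dx = \int \epsilon\frac{f(x)}{\eta(x)}\,\eta(x)\,dx = \epsilon \int f(x)\,dx = \epsilon \tag{9} \]
Let $x_k$ be the first accepted sample. We would like to show that $x_k$ is from the density $f$. Consider
\[
\begin{aligned}
P(x_k \in B) &= \sum_{k \ge 1} P\Big(x_k \in B,\; U_k \le \epsilon\frac{f(x_k)}{\eta(x_k)}\Big) \prod_{m=1}^{k-1} \underbrace{P\Big(U_m > \epsilon\frac{f(x_m)}{\eta(x_m)}\Big)}_{= 1 - \epsilon} \\
&= P\Big(x_k \in B,\; U_k \le \epsilon\frac{f(x_k)}{\eta(x_k)}\Big) \underbrace{\sum_{k \ge 1} (1 - \epsilon)^{k-1}}_{= 1/\epsilon} \qquad \text{(since the draws are IID)}
\end{aligned} \tag{10}
\]
And we have
\[ P\Big(x_k \in B,\; U_k \le \epsilon\frac{f(x_k)}{\eta(x_k)}\Big) = \int_B \int_0^{\epsilon f(x)/\eta(x)} du\; \eta(x)\,dx = \int_B \epsilon\frac{f(x)}{\eta(x)}\,\eta(x)\,dx = \epsilon \int_B f(x)\,dx \tag{11} \]
Hence,
\[ P\Big(x_k \in B \,\Big|\, U_k \le \epsilon\frac{f(x_k)}{\eta(x_k)}\Big) = \int_B f(x)\,dx \tag{12} \]
And so the samples are from the density $f$ given that we have accepted them.

We look at the expected number $K$ of samples drawn from the proposal until one is accepted,
\[ E[K] = \sum_{k \ge 1} k\,P(K = k) = \sum_{k \ge 1} k\,\epsilon(1 - \epsilon)^{k-1} = \frac{1}{\epsilon} \tag{13} \]
We see that
\[
\begin{cases}
E[K] \to \infty & \text{as } \epsilon \to 0 \\
E[K] \to 1 & \text{as } \epsilon \to 1
\end{cases} \tag{14}
\]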
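The acceptance rule (8) and the expected cost (13) can be checked numerically. The sketch below assumes a toy target $f(x) = 2x$ on $[0, 1]$ with a uniform proposal $\eta = 1$ and $\epsilon = 1/2$, so that $\epsilon f \le \eta$; these choices and the name `accept_reject` are illustrative only.

```python
import random


def accept_reject(target_pdf, proposal_pdf, proposal_sampler, eps, rng):
    """Return (sample, number_of_proposals) for one accepted draw."""
    trials = 0
    while True:
        trials += 1
        x = proposal_sampler(rng)  # x ~ eta
        u = rng.random()           # U ~ U[0, 1]
        if u <= eps * target_pdf(x) / proposal_pdf(x):  # accept branch of rule (8)
            return x, trials


rng = random.Random(2)
draws = [
    accept_reject(lambda x: 2.0 * x, lambda x: 1.0, lambda r: r.random(), 0.5, rng)
    for _ in range(20000)
]
mean_x = sum(x for x, _ in draws) / len(draws)
mean_trials = sum(k for _, k in draws) / len(draws)
print(mean_x, mean_trials)  # expect about E[X] = 2/3 and E[K] = 1/eps = 2
```

A larger $\epsilon$ means fewer wasted proposals, which is why the proposal should hug the target as tightly as possible.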

5. Motivate the use of variance reduction techniques. Describe two techniques and discuss their computational efficiency.

Table 1: Comparison between MC and MC with Control Variates
Method              #Samples                           Computational Work
MC                  $(c/TOL)^2\, V[Y]$                 $(c/TOL)^2\, V[Y]$
Control Variates    $(c/TOL)^2\, V[V_m(\beta^*)]$      $(1 + \rho)(c/TOL)^2\, V[V_m(\beta^*)]$

As we have seen in question (3), the Monte Carlo error is asymptotically given by the CLT since the samples are IID,
\[ \Big| E[Y] - \frac{1}{M} \sum_{j=1}^M Y_j \Big| \approx C_\alpha \sqrt{\frac{V[Y]}{M}} \tag{15} \]
where $C_\alpha$ is an appropriate percentile of the standard normal. We would like to reduce the variance without introducing bias.
(a) Control Variates: We would like to estimate $EY$. We sample $Y$ together with an auxiliary random variable $X$ for which $EX$ is known and such that $X$ and $Y$ are highly correlated. For a given $\beta \in \mathbb{R}$, we consider the unbiased estimator
\[ V_m = \frac{1}{M} \sum_{j=1}^M Y_j - \frac{\beta}{M} \sum_{j=1}^M (X_j - EX) \tag{16} \]
We observe that
\[ V[V_m](\beta) = \frac{1}{M} V[Y - \beta X] = \frac{1}{M}\big(V[Y] + \beta^2 V[X] - 2\beta\,\mathrm{Cov}[X, Y]\big) \tag{17} \]
We find the minimizing $\beta$,
\[ \beta^* = \frac{\mathrm{Cov}(Y, X)}{V[X]} \tag{18} \]
Then
\[ V[V_m](\beta^*) = \frac{1}{M} V[Y]\Big(1 - \frac{\mathrm{Cov}(Y, X)^2}{V[X]\,V[Y]}\Big) = \frac{1}{M} V[Y]\big(1 - \mathrm{Cor}(X, Y)^2\big) \le \frac{1}{M} V[Y] \tag{19} \]
Assume that the work to sample $(X, Y)$ is $(1 + \rho)$ times the work of sampling $Y$ alone.
From Table 1, we see that control variates are a good choice only if the increase in computational work is smaller than the benefit of the variance reduction. From (19), we obtain the following computational work inequality,
\[ (1 + \rho)\big(1 - \mathrm{Cor}(X, Y)^2\big) \le 1 \iff \mathrm{Cor}(X, Y)^2 \ge \frac{\rho}{1 + \rho} \tag{20} \]
This shows exactly how high the correlation between $Y$ and the auxiliary variable $X$ must be in order to benefit from control variates computationally. The condition can also be stated as
\[ V[Y] \ge (1 + \rho)\, V[V_m(\beta^*)] \tag{21} \]
where, as in Table 1, $V[V_m(\beta^*)]$ is understood per sample, i.e. as $V[Y - \beta^* X]$ without the factor $1/M$.
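A small numerical illustration of (16), (18) and (19) follows. It is a sketch under assumptions not in the notes: the toy problem is $Y = e^U$ with control $X = U \sim U[0, 1]$ and $EX = 1/2$, and $\beta^*$ is replaced by its sample estimate (which introduces a small bias that vanishes as $M$ grows). The function name `control_variate_estimate` is hypothetical.

```python
import math
import random


def control_variate_estimate(ys, xs, x_mean):
    """Control variate estimate of E[Y] using X with known mean x_mean."""
    m = len(ys)
    y_bar = sum(ys) / m
    x_bar = sum(xs) / m
    cov_yx = sum((y - y_bar) * (x - x_bar) for y, x in zip(ys, xs)) / (m - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (m - 1)
    var_y = sum((y - y_bar) ** 2 for y in ys) / (m - 1)
    beta = cov_yx / var_x                       # sample version of beta* in (18)
    estimate = y_bar - beta * (x_bar - x_mean)  # estimator (16)
    corr_sq = cov_yx ** 2 / (var_x * var_y)     # Cor(X, Y)^2 appearing in (19)
    return estimate, beta, corr_sq


rng = random.Random(3)
xs = [rng.random() for _ in range(5000)]
ys = [math.exp(x) for x in xs]
estimate, beta, corr_sq = control_variate_estimate(ys, xs, 0.5)
print(estimate, math.e - 1)  # estimate vs exact value of E[exp(U)]
print(1 - corr_sq)           # variance reduction factor from (19)
```

By (20), the technique pays off here as long as the relative extra cost $\rho$ satisfies $\rho/(1 + \rho) \le \mathrm{Cor}(X, Y)^2$.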
(b) Antithetic variates: Let $Y = g(X)$, where $X$ has a distribution symmetric about zero, so that $EX = 0$. Then $X$ and $-X$ are identically distributed, which means that $E g(X) = E g(-X)$ and
\[ EY = E\Big[\frac{g(X) + g(-X)}{2}\Big] \tag{22} \]
Then we have the estimator
\[ V_m = \frac{1}{M} \sum_{j=1}^M \frac{g(X_j) + g(-X_j)}{2} \tag{23} \]
which has variance
\[ V[V_m] = \frac{V[g(X) + g(-X)]}{4M} \tag{24} \]
Assume that the extra evaluation of $g(-X)$ has the same cost as evaluating $g(X)$. Then we require that
\[ 2\,V[V_1] \le V[g(X)] \iff V[g(X) + g(-X)] \le 2\,V[g(X)] \tag{25} \]
to effectively reduce the variance: at equal work, the variance of the pair $g(X) + g(-X)$ must not exceed twice the variance of $g(X)$ alone. By the formula for the variance of a sum, $V[g(X) + g(-X)] = 2V[g(X)] + 2\,\mathrm{Cov}[g(X), g(-X)]$, so it is sufficient that the covariance term is negative,
\[ \mathrm{Cov}[g(X), g(-X)] < 0 \tag{26} \]
Note that for linear $g$, where $g(-X) = -g(X)$, we have $V[g(X) + g(-X)] = 0$, since
\[ V[g(X) + g(-X)] = V[g(X) - g(X)] = 2V[g(X)] + 2\underbrace{\mathrm{Cov}(g(X), -g(X))}_{= -V[g(X)]} = 0 \tag{27} \]
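Condition (25) can be verified empirically for a monotone $g$. The sketch below assumes $X \sim U[-1, 1]$ and $g = \exp$, for which $E g(X) = \sinh(1)$; both choices, and the helper names, are illustrative.

```python
import math
import random


def antithetic_estimate(g, xs):
    """Average of (g(X) + g(-X)) / 2 over the samples, as in (23)."""
    pairs = [0.5 * (g(x) + g(-x)) for x in xs]
    return sum(pairs) / len(pairs), pairs


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(4)
xs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]  # symmetric about 0
estimate, pairs = antithetic_estimate(math.exp, xs)
var_plain = sample_variance([math.exp(x) for x in xs])  # estimates V[g(X)]
var_pair = sample_variance(pairs)  # estimates V[g(X) + g(-X)] / 4
print(estimate, math.sinh(1.0))    # estimate vs exact value
print(2 * var_pair, var_plain)     # left side should be the smaller one, cf. (25)
```

Because $\exp$ is monotone, $g(X)$ and $g(-X)$ move in opposite directions and the covariance in (26) is negative.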

6. Define the Poisson counting process and explain how to sample it.
It is a collection of random variables $\{N(t)\}$, where $N(t)$ counts the events that have occurred up to time $t$, starting from $t_0$. The number of events between time $s$ and time $t > s$ is given by $N(t) - N(s)$, which follows a Poisson distribution.
Def.
(a) $N(0) = 0$.
(b) Consider the time mesh $0 = t_0 < t_1 < t_2 < \dots < t_n = T$. Then the increments $N_{t_n} - N_{t_{n-1}}, \dots, N_{t_1} - N_{t_0}$ are independent.
(c) If $I = (s, t) \subseteq \mathbb{R}$, then for $k = 0, 1, 2, \dots$
\[ P(N(t) - N(s) = k) = P(k \text{ jumps in } I) = e^{-\lambda |I|}\,\frac{(\lambda |I|)^k}{k!} \tag{28} \]
where $\lambda > 0$ is the intensity of the process.

Given a time mesh $0 = t_0 < t_1 < \dots < t_n = T$ and $\lambda > 0$:
(a) Sample $\Delta N_{t_k} \sim \mathrm{Poisson}(\lambda(t_k - t_{k-1}))$ independently for $k = 1, 2, \dots, n$.
(b) Set $N_{t_j} = \sum_{k=1}^{j} \Delta N_{t_k}$ for $j = 1, \dots, n$.

7. Define the compound Poisson process and explain how to sample it.

A compound Poisson process with rate $\lambda > 0$ and jump size distribution $f$ is a continuous-time stochastic process $\{X(t)\}$ given by
\[ X(t) = \sum_{i=1}^{N(t)} Z_i \]
where $N(t)$ is a Poisson counting process with rate $\lambda > 0$ and the random variables $Z_i$ are IID with distribution $f$.

The general algorithm:
1 Set $t = 0$, $N = 0$, $X = 0$.
2 Sample $U \sim U(0, 1)$.
3 Set $t = t + [-(1/\lambda) \ln(U)]$. If $t > T$, the final time, stop.
4 Sample $Z \sim f$.
5 Set $N = N + 1$ and $X = X + Z$.
6 Go back to step (2).

To sample a compound Poisson process with jump amplitudes $N(0, 1)$ on a time partition with steps $\Delta t_n = t_n - t_{n-1}$:
1 Sample $\Delta N_{t_n} \sim \mathrm{Poisson}(\lambda(t_n - t_{n-1}))$ for all $n = 1, 2, \dots, N$.
2 Sample $\Delta X_{t_n} \sim N(0, \Delta N_{t_n})$ for all $n = 1, 2, \dots, N$, since a sum of $\Delta N_{t_n}$ independent $N(0, 1)$ jumps has variance $\Delta N_{t_n}$.
3 Set $X(t_n) = \sum_{j=1}^{n} \Delta X_{t_j}$.
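The general algorithm (steps 1 to 6) can be sketched as follows. The rate, horizon and $N(0, 1)$ jump law are illustrative values, and `sample_compound_poisson` is a hypothetical helper name; the uniform draw is shifted to $(0, 1]$ only to avoid evaluating $\ln(0)$.

```python
import math
import random


def sample_compound_poisson(rate, horizon, jump_sampler, rng):
    """Return (jump_times, X(horizon)) following steps 1-6 of the general algorithm."""
    t, x = 0.0, 0.0
    jump_times = []
    while True:
        u = 1.0 - rng.random()    # uniform on (0, 1], avoids log(0)
        t += -math.log(u) / rate  # exponential waiting time, step 3
        if t > horizon:
            return jump_times, x
        jump_times.append(t)      # N = N + 1
        x += jump_sampler(rng)    # X = X + Z


rng = random.Random(5)
runs = [
    sample_compound_poisson(3.0, 2.0, lambda r: r.gauss(0.0, 1.0), rng)
    for _ in range(5000)
]
mean_count = sum(len(times) for times, _ in runs) / len(runs)
var_x = sum(x * x for _, x in runs) / len(runs)  # E[X(T)] = 0 for N(0, 1) jumps
print(mean_count, var_x)  # both should be near rate * horizon = 6
```

Dropping the jump sizes (taking $Z = 1$) recovers a sampler for the Poisson counting process itself.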

8. What is a Brownian bridge? Discuss an algorithm to sample the mid point of the Brownian bridge.

Given a Wiener process $W(t)$ on $[t_1 = 0, t_2]$, let the Brownian bridge be the conditional value $B(s) = W(s) \mid W(t_2)$ for $s \in [t_1 = 0, t_2]$. Since $W(s)$ and $W(t_2)$ are jointly Gaussian, the conditional $B(s) = W(s) \mid W(t_2)$ is also Gaussian. We can find its parameters in the following way,
\[
E[W(s) \mid W(t_2)] = \underbrace{E[W_s]}_{= 0} + \mathrm{Cov}(W_s, W_{t_2})\,\underbrace{V[W_{t_2}]^{-1}}_{= 1/t_2}\,\big(W_{t_2} - \underbrace{E[W_{t_2}]}_{= 0}\big) = \frac{s\,W_{t_2}}{t_2} \tag{29}
\]
and
\[
V[W(s) \mid W(t_2)] = \underbrace{V[W_s]}_{= s} - \mathrm{Cov}[W_s, W_{t_2}]\,\underbrace{V[W_{t_2}]^{-1}}_{= 1/t_2}\,\mathrm{Cov}[W_{t_2}, W_s] = s - \frac{s^2}{t_2} \tag{30}
\]
Note that
\[ \mathrm{Cov}[W_s, W_{t_2}] = E[W_s W_{t_2}] - \underbrace{E[W_s]\,E[W_{t_2}]}_{= 0} = \min(s, t_2) = s \tag{31} \]
We conclude that the distribution of $B$ at $s \in [t_1 = 0, t_2]$ is Normal with parameters
\[ E[B(s)] = \frac{s\,W_{t_2}}{t_2}, \qquad V[B(s)] = s - \frac{s^2}{t_2} \tag{32} \]
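For the mid point $s = t_2/2$, the parameters in (32) reduce to mean $W_{t_2}/2$ and variance $t_2/4$, which gives a one-line sampler. The sketch below assumes $W(0) = 0$ as in the text; the endpoint values and the name `bridge_midpoint` are made up for illustration.

```python
import math
import random


def bridge_midpoint(t_end, w_end, rng):
    """Sample W(t_end / 2) given W(0) = 0 and W(t_end) = w_end."""
    s = 0.5 * t_end
    mean = s * w_end / t_end      # (32): equals w_end / 2
    variance = s - s * s / t_end  # (32): equals t_end / 4
    return rng.gauss(mean, math.sqrt(variance))


rng = random.Random(6)
t_end, w_end = 2.0, 1.0  # hypothetical endpoint data
mids = [bridge_midpoint(t_end, w_end, rng) for _ in range(20000)]
mean_mid = sum(mids) / len(mids)
var_mid = sum((m - mean_mid) ** 2 for m in mids) / (len(mids) - 1)
print(mean_mid, var_mid)  # near w_end / 2 = 0.5 and t_end / 4 = 0.5
```

Applying the same step recursively to each half interval refines a path to any dyadic resolution while keeping the already sampled values fixed.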
9. Describe Kernel density estimates for non parametric estimation. Discuss the choice of the window parameter h and the resulting error in the estimate of the pdf. How does this discussion depend on the dimension of the random variable?

We want to estimate the density $\rho_y(y)$ given a set $\{Y_l\}_{l=1}^M$ of IID observations from a common density $\rho_y$. The kernel density estimator of $\rho_y$ at the point $y$ is given by
\[ \hat\rho_k(y) = \frac{1}{Mk} \sum_{j=1}^M Ker\Big(\frac{y - Y_j}{k}\Big) \tag{33} \]
If $Ker(\cdot)$ is a pdf (i.e. $Ker(z) \ge 0$ and normalized), then $\hat\rho_k(y)$ is also a pdf and hence a candidate estimator.
Since no unbiased estimator can exist for all continuous $\rho_y(y)$, we look for a candidate with the least mean square error: an asymptotically unbiased estimator $\hat\rho_{k_M}(y)$ such that
\[ E[\hat\rho_{k_M}(y)] \xrightarrow{M \to \infty} \rho(y) \quad \forall y \in \mathbb{R} \tag{34} \]
We choose the window size parameter $k > 0$ to minimize the total error in the estimate for $Y \sim \rho$, given by
\[ \hat\rho_k(y) - \rho(y) = \underbrace{E[\hat\rho_k(y)] - \rho(y)}_{\text{bias}} + \underbrace{\hat\rho_k(y) - E[\hat\rho_k(y)]}_{\text{sampling error}} \tag{35} \]
We look closer at the bias error. Writing $Ker_k(z) = k^{-1} Ker(z/k)$,
\[
\begin{aligned}
E[\hat\rho_k(y)] - \rho(y) &= \int Ker_k(y - z)\,\rho(z)\,dz - \rho(y) \\
&= (Ker_k * \rho)(y) - \rho(y) \\
&= \int Ker(z)\,\rho(y - kz)\,dz - \rho(y) \\
&= \int Ker(z)\Big[\rho(y) - kz\,\rho'(y) + \frac{1}{2}(kz)^2 \rho''(y) - \frac{1}{6}(kz)^3 \rho'''(y) + O((kz)^4)\Big]\,dz - \rho(y) \\
&\approx \frac{1}{2}\,\sigma_{Ker}^2\,k^2\,\rho''(y)
\end{aligned} \tag{36}
\]
Note that we have expanded $\rho(y - kz)$ around $y$ using a Taylor series. The first term cancels with $\rho(y)$ because the kernel is a pdf ($\int Ker(z)\,dz = 1$), and the terms that are odd in $z$ integrate to zero because the kernel is symmetric. The remaining leading term is written with $\sigma_{Ker}^2 = \int z^2 Ker(z)\,dz$, plus higher order terms.

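Next we look at the sampling error.

By Chebyshev's inequality, for any $a > 1$,
\[ P\big(|EX - X| \ge a\,\sigma(X)\big) \le \frac{1}{a^2} \tag{37} \]
so we can approximate
\[ E[\hat\rho_k](y) - \hat\rho_k(y) \approx C\,\sigma_{\hat\rho_k}(y) \tag{38} \]
And, with $Y \sim \rho$,
\[
\begin{aligned}
V[\hat\rho_k](y) &= \frac{1}{M k^2}\,V\Big[Ker\Big(\frac{y - Y}{k}\Big)\Big] \\
&= \frac{\rho(y)}{Mk} \int Ker^2(z)\,dz - \frac{\rho'(y)}{M} \int z\,Ker^2(z)\,dz + \frac{k\,\rho''(y)}{2M} \int z^2 Ker^2(z)\,dz + \dots
\end{aligned} \tag{39}
\]
where we have expanded $\rho(y - kz)$ around $y$ using a Taylor series. The first term dominates for small $k$.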

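Next we choose $k$ that minimizes the total squared error,
\[ \big(E[\hat\rho_k](y) - \rho(y)\big)^2 + C\,V[\hat\rho_k](y) = \frac{1}{4}\,\sigma_{Ker}^4\,k^4\,(\rho''(y))^2 + C\,\frac{\rho(y)}{Mk} \int Ker^2(z)\,dz + \dots \tag{40} \]
Balancing the two terms gives $k^* \propto M^{-1/5}$, and so the approximation error is $\propto M^{-2/5}$. However, we see that this choice depends on $y$. Next we find an optimal $k$ independent of $y$. This can be done by minimizing the $L^2$ error,
\[
\begin{aligned}
\int E[(\rho(y) - \hat\rho_k(y))^2]\,dy &\le 2 \int \big(\rho(y) - E[\hat\rho_k](y)\big)^2\,dy + 2 \int V[\hat\rho_k](y)\,dy \\
&\le \frac{1}{2}\,\sigma_{Ker}^4\,k^4 \int |\rho''(y)|^2\,dy + \frac{2c}{Mk} \int Ker^2(z)\,dz + \dots
\end{aligned} \tag{41}
\]
where the last step uses Chebyshev's inequality.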
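We consider the Mean Square Error,
\[
\begin{aligned}
MSE(\hat\rho_k) &= E \int (\hat\rho_k(y) - \rho(y))^2\,dy \\
&= \int E(\hat\rho_k(y) - \rho(y))^2\,dy \\
&= \int MSE(\hat\rho_k(y))\,dy \\
&= \frac{1}{Mk} \int Ker^2(y)\,dy + \Big(1 - \frac{1}{M}\Big) \int (Ker_k * \rho)^2(y)\,dy - 2 \int (Ker_k * \rho)(y)\,\rho(y)\,dy + \int \rho^2(y)\,dy \\
&= \frac{1}{Mk} R(Ker) + \frac{1}{4}\,k^4\,\sigma_{Ker}^4\,R(\rho'') + o\Big(\frac{1}{Mk} + k^4\Big)
\end{aligned} \tag{42}
\]
where $R(Ker) = \int Ker^2(y)\,dy$ and $R(\rho'') = \int (\rho'')^2(y)\,dy$. Hence,
\[ MSE(\hat\rho_k) \approx \frac{1}{Mk} R(Ker) + \frac{1}{4}\,k^4\,\sigma_{Ker}^4\,R(\rho'') \tag{43} \]
Then the optimal choice of $k$ is given by
\[ k_{MSE} = \Big[\frac{R(Ker)}{\sigma_{Ker}^4\,R(\rho'')}\Big]^{1/5} M^{-1/5} \tag{44} \]
This yields an approximation error $\propto M^{-2/5}$.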
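The estimator (33) with a window of order $M^{-1/5}$ can be sketched in a few lines. The code below assumes a Gaussian kernel and, since $R(\rho'')$ in (44) is unknown in practice, uses the placeholder rule $k = \text{std} \cdot M^{-1/5}$; that constant, the test data and the function names are assumptions for illustration, not the notes' prescription.

```python
import math
import random


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kde(y, samples, k):
    """Evaluate the estimator (33) at the point y with window k."""
    return sum(gaussian_kernel((y - s) / k) for s in samples) / (len(samples) * k)


def window_size(samples):
    """k = std * M^(-1/5); the constant in front is a placeholder choice."""
    m = len(samples)
    mean = sum(samples) / m
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (m - 1))
    return std * m ** (-0.2)


rng = random.Random(7)
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]
k = window_size(samples)
density_at_zero = kde(0.0, samples, k)
print(density_at_zero, gaussian_kernel(0.0))  # estimate vs true N(0, 1) density at 0
```

Only the exponent $-1/5$ comes from (44); tuning the constant requires some estimate of the curvature of $\rho$.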

The optimal value of $k^*$ for a KDE of an $\mathbb{R}^d$ random variable $X$, based on observing $X_1, X_2, \dots, X_M$, is as follows.

The $d$-dimensional KDE is defined as
\[ \hat f(x; \mathbf{H}) = \frac{1}{M} \sum_{i=1}^M K_{\mathbf{H}}(x - X_i), \]
where $\mathbf{H}$ is a $d \times d$ SPD bandwidth matrix and $K_{\mathbf{H}}$ is defined as
\[ K_{\mathbf{H}}(x) = |\mathbf{H}|^{-1/2}\,K(\mathbf{H}^{-1/2} x) \]
Note that $K_{\mathbf{H}}$ is a normalized pdf. We propose the kernel $K$ to be a $d$-variate Gaussian,
\[ K(x) = \frac{1}{(2\pi)^{d/2}}\,e^{-\frac{1}{2} x^T x} \]
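To see the convergence rate, we look at the mean square error (MSE), where
\[ MSE = \mathrm{bias}^2(\hat f) + \mathrm{Var}(\hat f) \]
Let us first compute the bias, using a Taylor expansion to simplify. Below, $Df$ denotes the gradient and $\mathcal{H}_f$ the Hessian of $f$:
\[
\begin{aligned}
E[\hat f(x; \mathbf{H})] - f(x) &= \int K_{\mathbf{H}}(x - y)\,f(y)\,dy - f(x) = \int K(z)\,f(x - \mathbf{H}^{1/2} z)\,dz - f(x) \\
&= \int K(z)\Big[f(x) - (\mathbf{H}^{1/2} z)^T Df(x) + \frac{1}{2}(\mathbf{H}^{1/2} z)^T \mathcal{H}_f(x)\,(\mathbf{H}^{1/2} z)\Big]\,dz + o(\mathrm{tr}(\mathbf{H})) - f(x) \\
&= f(x) - \int z^T \mathbf{H}^{1/2} Df(x)\,K(z)\,dz + \frac{1}{2} \int z^T \mathbf{H}^{1/2} \mathcal{H}_f(x)\,\mathbf{H}^{1/2} z\,K(z)\,dz + o(\mathrm{tr}(\mathbf{H})) - f(x) \\
&= \frac{1}{2}\,\mathrm{tr}\Big(\mathbf{H}^{1/2} \mathcal{H}_f(x)\,\mathbf{H}^{1/2} \int z z^T K(z)\,dz\Big) + o(\mathrm{tr}(\mathbf{H})) \sim \frac{1}{2}\,\sigma^2(K)\,\mathrm{tr}(\mathbf{H}\,\mathcal{H}_f(x))
\end{aligned}
\]
where $\sigma^2(K) = \int z_i^2 K(z)\,dz$.
Next we compute the variance of $\hat f$,
\[
\begin{aligned}
V(\hat f(x; \mathbf{H})) &= \frac{1}{M}\Big[|\mathbf{H}|^{-1/2} \int K^2(z)\,f(x - \mathbf{H}^{1/2} z)\,dz - \Big(\int K(z)\,f(x - \mathbf{H}^{1/2} z)\,dz\Big)^2\Big] \\
&= \frac{1}{M}\,|\mathbf{H}|^{-1/2} R(K)\,f(x) + o\Big(\frac{1}{M}\,|\mathbf{H}|^{-1/2}\Big)
\end{aligned}
\]
where $R(K) = \int K^2(z)\,dz$. For simplicity, consider $\mathbf{H} = h^2 I$. Then the MSE is given by
\[ MSE = \frac{1}{4}\,h^4\,(\sigma^2(K))^2\,(\nabla^2 f(x))^2 + \frac{1}{M}\,h^{-d} R(K)\,f(x) + o\Big(\frac{1}{M}\,h^{-d}\Big) \]
where $\nabla^2 f = \mathrm{tr}(\mathcal{H}_f)$ is the Laplacian. The optimal bandwidth $h^*$ that minimizes the MSE is then
\[ h^* = \Big[\frac{d\,R(K)\,f(x)}{M\,(\sigma^2(K))^2\,(\nabla^2 f(x))^2}\Big]^{\frac{1}{d+4}} \propto M^{-\frac{1}{d+4}} \]
Note that at best, when using $h^*$, the MSE scales as $M^{-\frac{4}{d+4}}$, so the approximation error depends on the dimension as
\[ \sqrt{MSE} \propto M^{-\frac{2}{d+4}}. \]
We see that as the dimension increases, the convergence rate deteriorates, as expected. For $d = 1$ this recovers the rate $M^{-2/5}$ found above.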

10. What is affine prediction? Describe its use for the approximation of the conditional expectation E[Y |Z]? When is this approximation exact?

Affine prediction is a method used to solve prediction problems: given the joint distribution of a random vector $(Y, Z)$ and observations of $Z$, we want to compute $\hat Y$, the best prediction of $Y$.
In particular, consider the space
\[ L_p = \{g : g \text{ is a polynomial of degree at most } p\} \tag{45} \]
Restricting the predictor to $L_p$ gives an $L_p$ approximation of $E[Y|Z]$.
Affine prediction: We restrict ourselves to the $L_1$ approximation of $E[Y|Z]$ and consider $\hat Y = g(Z) = a^t(Z - EZ) + b$. To find $(a^*, b^*)$, we solve, for each component $m = 1, \dots, M$ of $Z$,
\[ E[(a^t(Z - EZ) + b)(Z_m - EZ_m)] = E[Y(Z_m - EZ_m)] \tag{46} \]
and
\[ E[a^t(Z - EZ) + b] = E[Y] \tag{47} \]
This can be written in matrix form as
\[
\begin{pmatrix} \mathrm{Cov}(Z) & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} E[Y(Z - EZ)] \\ EY \end{pmatrix}
\]
So we have $b^* = EY$ and $a^* = \mathrm{Cov}(Z)^{-1} E[Y(Z - EZ)]$, and then
\[
\begin{aligned}
g_1^*(Z) &= E[Y(Z - EZ)]^t\,\mathrm{Cov}(Z)^{-1}(Z - EZ) + EY \\
&= \big(E[YZ] - EY\,EZ\big)^t\,\mathrm{Cov}(Z)^{-1}(Z - EZ) + EY
\end{aligned} \tag{48}
\]
The approximation is exact whenever $E[Y|Z]$ is itself affine in $Z$, for example when $(Y, Z)$ is jointly Gaussian.

11. Motivate the use of non-linear weighted least squares starting from a maximum posterior, Gaussian likelihood, Bayesian formulation.
We consider the following setting [1],
\[ x = m + e \tag{49} \]
where $m$ is the model (a deterministic unknown) and $e \sim N(0, \sigma^2 \Delta)$ is a zero-mean multivariate Gaussian noise with covariance $\mathrm{Cov}[e] = \sigma^2 \Delta$. In the weighted least squares framework, where we have $M$ samples, we aim to minimize
\[ \sigma_{WLS} = \|W(x - m)\|_F^2 \tag{50} \]
where $W$ is a diagonal matrix of weights and $\|\cdot\|_F$ is the Frobenius norm. Note that by minimizing (50) we are reducing the error according to different weights. These weights can be chosen based on prior information (a prior distribution).
Next we find the log-likelihood function of the residual $e = x - m$,
\[ \ell = -\frac{M}{2} \ln \sigma^2 - \frac{1}{2} \ln |\Delta| - \frac{1}{2\sigma^2}(x - m)^t \Delta^{-1}(x - m) \tag{51} \]

[1] Bro R, Sidiropoulos ND and Smilde AK. Maximum likelihood fitting using ordinary least squares algorithms. J. Chemom. 2002; 16: 387-400.
Judge GG, Griffiths WE, Carter Hill R, Luetkepohl H and Lee TC. The Theory and Practice of Econometrics. Wiley: New York, 1985.
The maximization of $\ell$ with respect to $\sigma^2$ yields the estimator
\[ \hat\sigma^2 = \frac{(x - m)^t \Delta^{-1}(x - m)}{M} \propto (x - m)^t \Delta^{-1}(x - m) \tag{52} \]
By choosing $W = \Delta^{-1/2}$, we obtain
\[ (x - m)^t \Delta^{-1}(x - m) = (x - m)^t W^t W (x - m) = [W(x - m)]^t [W(x - m)] \tag{53} \]
This is again equivalent to minimizing
\[ \sigma_{ML} = \|W(x - m)\|_F^2 \tag{54} \]
Comparing (50) and (54), we see that by choosing $W = \Delta^{-1/2}$ the weighted least squares estimates coincide with the maximum likelihood estimates.

Next, it remains to minimize (54). We use an iterative majorization technique to tackle a possibly full matrix $W$. We denote the value at the $c$th iteration by $\sigma_{ML}(m_c) = \|W(x - m_c)\|_F^2$. Since we know that $\sigma_{ML} \ge 0$, we aim to have
\[ \sigma_{ML}(m_{c+1}) \le \sigma_{ML}(m_c) \tag{55} \]
We define a majorization function $\sigma_{maj}$ that satisfies the following:
• It bounds $\sigma_{ML}$ from above, so that the iterates converge monotonically.
• It is identical to $\sigma_{ML}$ at the current point $m_c$, i.e. $\sigma_{maj}(m_c) = \sigma_{ML}(m_c)$.

We rewrite (54) using $m = m_c + (m - m_c)$,
\[
\begin{aligned}
\sigma_{ML}(m) &= [(x - m_c) - (m - m_c)]^t\,W^t W\,[(x - m_c) - (m - m_c)] \\
&= \underbrace{(x - m_c)^t W^t W (x - m_c)}_{\text{a constant}} + \underbrace{(m - m_c)^t W^t W (m - m_c)}_{\text{non-linear in } m} - \underbrace{2(m - m_c)^t W^t W (x - m_c)}_{\text{linear in } m}
\end{aligned}
\]
Let $\beta$ be the largest eigenvalue of $W^t W$. Then for any vector $s$,
\[ s^t W^t W s \le \beta\,s^t s \implies (m - m_c)^t W^t W (m - m_c) \le \beta\,(m - m_c)^t (m - m_c) \tag{56} \]
Then the majorization function defined by
\[ \sigma_{maj}(m) = (x - m_c)^t W^t W (x - m_c) + \beta\,(m - m_c)^t (m - m_c) - 2(m - m_c)^t W^t W (x - m_c) \tag{57} \]
satisfies $\sigma_{ML} \le \sigma_{maj}$, and we see that $\sigma_{maj}(m_c) = \sigma_{ML}(m_c) = (x - m_c)^t W^t W (x - m_c)$.

And so it satisfies the requirements of a majorization function as defined above. Note that we have convergence, as
\[ \sigma_{ML}(m_{c+1}) \le \sigma_{maj}(m_{c+1}) \le \sigma_{maj}(m_c) = \sigma_{ML}(m_c) \tag{58} \]

To find the best solution, we perform the following algorithm:

1 Set $c = 0$ and an initial $m_0 = 0$.
2 Find $m_{c+1} = \arg\min_{m \in \xi} \sigma_{maj}(m)$, where $\sigma_{maj}$ is built around the current $m_c$.
3 Set $c = c + 1$ and go to (2) until $\dfrac{\|m_c - m_{c-1}\|_F^2}{\|m_{c-1}\|_F^2} \le \epsilon$.

where $\xi$ is the admissible set of $m$ following the structure of the model and $\epsilon$ is a required tolerance. Hence, in this Bayesian setting, we have used as a prior the covariance $\Delta$ of the noise $e$ and we were able to formulate a maximum posterior estimator of the parameter $\sigma$ using non-linear least squares.

12. Describe Linear Partially Observed State Space Models. What is the filtering problem? What is a Bayesian filter?
An $\mathbb{R}^d$ stochastic sequence $X = \{X_k\}_{k \ge 0}$ is a state space model if it evolves according to a recursion of the form
\[ X_{k+1} = F_k X_k + W_k + U_k \tag{59} \]
where $\{F_k\}_{k \ge 0}$ is a deterministic sequence of $d \times d$ matrices, $\{U_k\}_{k \ge 0}$ is a deterministic sequence of $d \times 1$ vectors and $\{W_k\}_{k \ge 0}$ is a sequence of independent $\mathbb{R}^d$ random vectors for which $E W_k = 0$ and $E[\|W_k\|^2] < \infty$.
Partially observed states: We are unable to observe the state of the system directly. Instead, we observe a corrupted version of the state $X_k$. In this setting, we assume that one can directly observe $\{Z_k\}_{k \ge 0}$, where
\[ Z_k = G_k X_k + V_k \quad \text{for } k \ge 0 \tag{60} \]
Here $\{G_k\}_{k \ge 0}$ is a deterministic sequence of $n \times d$ matrices and $\{V_k\}_{k \ge 0}$ is a stochastic sequence of $n \times 1$ random vectors.
Filtering is concerned with trying to extract the current state $X_k$ from the observed history of the observation process $Z$ up to time $k$.

The Bayesian filter gives the complete evolution of the distribution of the state $X_k$ in terms of the available data in a recursive way. It is done in the following way:
1 Time evolution update:
\[ \pi(X_{k+1} \mid \{Z_j\}_{j=1}^k) = \int \pi(X_{k+1} \mid X_k)\,\pi(X_k \mid \{Z_j\}_{j=1}^k)\,dX_k \]
2 Observation update:
\[ \pi(X_{k+1} \mid \{Z_j\}_{j=1}^{k+1}) = \frac{\pi(Z_{k+1} \mid X_{k+1})\,\pi(X_{k+1} \mid \{Z_j\}_{j=1}^k)}{\pi(Z_{k+1} \mid \{Z_j\}_{j=1}^k)} \]
with
\[ \pi(Z_{k+1} \mid \{Z_j\}_{j=1}^k) = \int \pi(Z_{k+1} \mid X_{k+1})\,\pi(X_{k+1} \mid \{Z_j\}_{j=1}^k)\,dX_{k+1} \]

13. Describe the Kalman filter, motivate it as a particular case of a Bayesian filter. Give a simple example of a Kalman Filter applied to a Linear Observed State Space Model.

Given a Bayesian filter for linear evolution and observations subject to Gaussian noise, with a Gaussian initial condition $X_0$, the state $X_k$ remains Gaussian. This leads to the Kalman filter, which only tracks the conditional mean and the covariance of $X_k$.

Kalman Filter: Let $(X_k, Z_k)$ follow the linear equations
\[ X_{k+1} = F_k X_k + W_k + U_k \tag{61} \]
\[ Z_k = G_k X_k + V_k \]
for $k \ge 0$, where:

(a) $\{F_k\}_{k \ge 0}$ is deterministic
(b) $\{U_k\}_{k \ge 0}$ is deterministic
(c) $\{W_k\}_{k \ge 0}$ is a sequence of independent random vectors
(d) $\{G_k\}_{k \ge 0}$ is deterministic
(e) $\{V_k\}_{k \ge 0}$ is stochastic

We assume that $W$ and $V$ are independent. Then we have,

for $T = 1$ and
\[ \int_0^1 \sigma e^{s-1}\,dW_s \sim N\Big(0, \frac{\sigma^2}{2}(1 - e^{-2})\Big) \tag{64} \]
and
\[ Y_{n+1} = U_{n+1} + \eta_{n+1}, \qquad \eta_n \sim N(0, T) \tag{65} \]
then given a Gaussian initial condition $U_0$, the filtering distribution
\[ \pi(\mu_{n+1} \mid \{Y_j\}_{j=1}^{n+1}) \tag{66} \]
is Gaussian. So using the Kalman filter we can track the evolution of the first two moments as follows:

(a) Prediction step:
\[ m_{n+1} = e^{-1}\,\hat m_n, \qquad C_{n+1} = e^{-2}\,\hat C_n + \Sigma \]
(b) Update step:
\[ \hat m_{n+1} = (I - K_{n+1})\,m_{n+1} + K_{n+1} Y_{n+1}, \qquad \hat C_{n+1} = (I - K_{n+1})\,C_{n+1} \]
where $K_{n+1} = C_{n+1} S_{n+1}^{-1}$ and $S_{n+1} = T + C_{n+1}$.
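The prediction and update steps can be sketched for the scalar case as below. Two things are assumptions here rather than statements from the notes: $\Sigma$ is taken to be the noise variance in (64), and the observation noise variance (written $T$ in (65)) is passed in as `obs_var`. The observations and the name `kalman_step` are made up for illustration.

```python
import math


def kalman_step(m_hat, c_hat, y_next, sigma, obs_var):
    """One prediction/update step for the scalar example with unit time step."""
    # Assumed process noise variance Sigma, taken from (64).
    process_var = 0.5 * sigma ** 2 * (1.0 - math.exp(-2.0))
    # Prediction step
    m_pred = math.exp(-1.0) * m_hat
    c_pred = math.exp(-2.0) * c_hat + process_var
    # Update step
    s = obs_var + c_pred  # S_{n+1}
    gain = c_pred / s     # K_{n+1}
    m_new = (1.0 - gain) * m_pred + gain * y_next
    c_new = (1.0 - gain) * c_pred
    return m_new, c_new


m_hat, c_hat = 0.0, 1.0
for y in [0.8, 0.5, 0.1, -0.2]:  # made-up observations
    m_hat, c_hat = kalman_step(m_hat, c_hat, y, sigma=1.0, obs_var=1.0)
    print(m_hat, c_hat)
```

Note that the covariance recursion does not depend on the data, so the gain sequence could be computed once in advance.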
14. Define Markov chains and give an example. Compute its limit distribution if it exists.

Def. $Y = \{Y_t, t = 0, 1, 2, \dots\}$ is a Markov Chain with state space $S$, initial distribution $\pi$ and transition matrix $P$ if:
(a) $Y_0$ has distribution $\pi$.
(b) The conditional distribution of $Y_{t+1}$ given $Y_t = i$ is $p(i, \cdot)$, the $i$th row of $P$, and it is independent of $Y_0, \dots, Y_{t-1}$.
Example:
Every month Raul travels ($T$) or not ($N$). Given a state, the transition probability is given by
\[ p(T \to T) = 0.9 \quad p(T \to N) = 0.1 \quad p(N \to T) = 0.5 \quad p(N \to N) = 0.5 \tag{67} \]
Then we have the transition matrix
\[ P = \begin{pmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{pmatrix} \]
Then the Markov Chain $X \sim MC(\lambda, p)$ is a regular MC such that
\[ \lim_{n \to \infty} P^n = (1, \dots, 1)^t\,\bar w \tag{68} \]
where $\bar w$ is the distribution that solves
\[
\begin{cases}
\bar w = \bar w P \\
\sum_i \bar w_i = 1
\end{cases} \tag{69}
\]
We find the solution to be $\bar w = [5/6, 1/6]$.
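The limit in (68) suggests a simple numerical check: repeatedly apply $P$ to any starting distribution until it stops changing. The sketch below does this for the example matrix; the function names, the uniform starting vector and the tolerance are illustrative choices.

```python
def step(dist, matrix):
    """One step of the row-vector recursion dist -> dist * matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(matrix, tol=1e-12, max_iter=10000):
    """Iterate dist -> dist * matrix from the uniform distribution."""
    dist = [1.0 / len(matrix)] * len(matrix)
    for _ in range(max_iter):
        new_dist = step(dist, matrix)
        if max(abs(a - b) for a, b in zip(new_dist, dist)) < tol:
            return new_dist
        dist = new_dist
    return dist


P = [[0.9, 0.1], [0.5, 0.5]]  # states ordered (T, N)
w_bar = stationary_distribution(P)
print(w_bar)  # close to [5/6, 1/6]
```

For larger chains one would typically solve the linear system (69) directly instead of iterating.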

15. Let X(t) be a discrete Markov Chain and

u(y, T ) = P (X(T ) = y|X(0) = x).

These probabilities satisfy an evolution equation moving forward in time: state it and derive it.

Define $\mu(x, t) = P(X(t) = x)$, where $x \in S$, the state space. Starting from the numbers $\mu(x, 0) = \lambda_x$ for all $x \in S$ and given the transition matrix $P$, the evolution in time is
\[ \mu(x, t) = (\lambda P^t)_x \quad \forall x \in S \tag{70} \]
By the law of total probability,
\[
\begin{aligned}
\mu(x, t + 1) &= P(X(t + 1) = x) \\
&= \sum_{y \in S} P(X(t + 1) = x \mid X(t) = y)\,P(X(t) = y) \\
&= \sum_{y \in S} p(y, x)\,\mu(y, t)
\end{aligned} \tag{71}
\]
We will write the last equation in matrix form. Suppose that $S = \{x_1, \dots, x_N\}$ with $|S| < \infty$. We define a row vector of order $|S|$:
\[ \mu(t) = \langle \mu(x_1, t), \dots, \mu(x_N, t) \rangle \tag{72} \]
And so we can write
\[ \mu(t + 1) = \mu(t)\,P \tag{73} \]
where $\lambda = \mu(0) = \langle P(X(0) = x_1), \dots, P(X(0) = x_N) \rangle$.

16. Let X(t) be a discrete Markov Chain and for t < T let

f (x, t) = E(V (X(T ))|X(t) = x).

State and derive a backward equation for the above expected value of a state observable.

Def. $f(x, t) = E[V(X(T)) \mid X(t) = x]$ for $t < T$ and $x \in S$. Given $f(x, T) = V(x)$ for all $x \in S$ and given the transition matrix $P$, we have the following backward evolution,
\[ f(x, t) = \big(P^{T - t} f(\cdot, T)\big)(x) \quad \forall x \in S \tag{74} \]
To derive it, condition on the next state and use the Markov property:
\[ f(x, t) = \sum_{y \in S} P(X(t + 1) = y \mid X(t) = x)\,E[V(X(T)) \mid X(t + 1) = y] = \sum_{y \in S} p(x, y)\,f(y, t + 1) \]
The evolution equation of the expectation can be written in matrix form. Again, suppose that $S = \{x_1, \dots, x_N\}$ with $|S| < \infty$. We define a column vector of order $|S|$: $f(t) = \langle f(x_1, t), \dots, f(x_N, t) \rangle$. And so we have
\[ f(t) = P f(t + 1) \quad \text{with } f(x, T) = V(x) \;\; \forall x \in S \tag{76} \]
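The forward recursion $\mu(t+1) = \mu(t) P$ and the backward recursion $f(t) = P f(t+1)$ compute the same expectation $E[V(X(T))]$ from opposite ends, which is easy to check numerically. The sketch below reuses the two-state matrix from question 14; the initial distribution `lam`, the observable `V` and the horizon are hypothetical values chosen for illustration.

```python
def forward_step(mu, P):
    """mu(t + 1) = mu(t) P, with mu a row vector."""
    n = len(P)
    return [sum(mu[y] * P[y][x] for y in range(n)) for x in range(n)]


def backward_step(f_next, P):
    """f(t) = P f(t + 1), with f a column vector."""
    n = len(P)
    return [sum(P[x][y] * f_next[y] for y in range(n)) for x in range(n)]


P = [[0.9, 0.1], [0.5, 0.5]]
lam = [1.0, 0.0]  # hypothetical initial distribution
V = [10.0, -2.0]  # hypothetical observable V(x)
T = 5

mu = lam
for _ in range(T):
    mu = forward_step(mu, P)
f = V
for _ in range(T):
    f = backward_step(f, P)

forward_value = sum(m * v for m, v in zip(mu, V))      # mu(T) . V
backward_value = sum(l * fx for l, fx in zip(lam, f))  # lambda . f(0)
print(forward_value, backward_value)  # the two agree up to rounding
```

Both values equal $\lambda P^T V$; the forward form is convenient for one initial distribution and many observables, the backward form for one observable and many starting states.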
17. State and discuss a method for sampling from the invariant measure of a Markov Chain. Mention a possible application of it.
The Metropolis-Hastings algorithm samples from a pdf $f$. Fixing the target $f = w$, it constructs an ergodic Markov chain, defined recursively, for which $w$ satisfies the detailed balance condition, so that $w$ is its invariant measure.
Metropolis-Hastings:
Input: $f$, the target pdf to be sampled, and $q(y|x)$, a proposal density. Let $X_n$ be the current sample. We perform the following:
1 Given $X_n$, we sample $Y_n \sim q(\cdot \mid X_n)$.
2 Let the acceptance probability be
\[ p(x, y) = \min\Big\{1, \frac{f(y)\,q(x|y)}{f(x)\,q(y|x)}\Big\} \tag{77} \]
and take
\[
X_{n+1} =
\begin{cases}
Y_n & \text{with probability } p(X_n, Y_n) \\
X_n & \text{else}
\end{cases} \tag{78}
\]
In particular, if $q(x|y) = q(y|x)$, then $p(x, y) = \min\Big\{1, \dfrac{f(y)}{f(x)}\Big\}$.
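For a symmetric proposal the two steps above reduce to a random-walk Metropolis sampler, sketched below. The unnormalized standard normal target, the Gaussian proposal step, the burn-in length and the name `metropolis_hastings` are illustrative assumptions; working with $\log f$ is a numerical convenience, not part of the algorithm as stated.

```python
import math
import random


def metropolis_hastings(log_target, x0, step_size, num_steps, rng):
    """Random-walk Metropolis chain; the proposal is symmetric in (x, y)."""
    chain = [x0]
    x = x0
    for _ in range(num_steps):
        y = x + rng.gauss(0.0, step_size)          # Y_n ~ q(. | X_n)
        log_ratio = log_target(y) - log_target(x)  # log of f(y) / f(x)
        # Accept with probability min{1, f(y) / f(x)}.
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = y
        chain.append(x)
    return chain


rng = random.Random(12)
chain = metropolis_hastings(lambda x: -0.5 * x * x, 3.0, 1.0, 50000, rng)
kept = chain[1000:]  # discard an initial transient
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / (len(kept) - 1)
print(mean, var)  # the target N(0, 1) has mean 0 and variance 1
```

Only ratios of $f$ enter the acceptance step, so the normalizing constant of the target is never needed.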
Next, we check the detailed balance property. We have to show that
\[ f(x)\,t(x, y) = f(y)\,t(y, x) \tag{79} \]
where $t(x, y)$ are the transition probabilities of the MC defined by Metropolis-Hastings.

The transition pdf is
\[ t(x, y) = p(x, y)\,q(y|x) + \delta_x(y)\Big(1 - \int p(x, s)\,q(s|x)\,ds\Big) \tag{80} \]
Writing $r(x) = \int p(x, s)\,q(s|x)\,ds$, detailed balance follows from two identities corresponding to the two terms in $f(x)\,t(x, y)$:

(a) $p(x, y)\,q(y|x)\,f(x) = p(y, x)\,q(x|y)\,f(y)$.

(b) $(1 - r(x))\,\delta_x(y)\,f(x) = (1 - r(y))\,\delta_y(x)\,f(y)$.

Hence detailed balance holds and $f$ is a stationary pdf for the MC.
Application: In the Bayesian framework, we have data $\{X_i\}_{i=1}^n$ and we want to estimate the parameter $\theta$:
\[ p(\theta \mid \{X_i\}) \propto L(\{X_i\} \mid \theta)\,p(\theta) \tag{81} \]
The posterior only needs to be known up to a multiplicative constant. Then we can apply the M-H algorithm to draw samples from the posterior $p(\theta \mid \{X_i\})$.
Next we discuss the bias and statistical error of MCMC. For an arbitrary function $\psi$ we would like to estimate the expectation $\theta = E[\psi(X)]$, where $X$ is a discrete random vector. Consider the stationary target probability $\pi$. Then
\[ \theta = E[\psi(X)] = \pi^t \psi \tag{82} \]
Consider the MCMC estimator
\[ \hat\theta_n = \frac{1}{n} \sum_{k=0}^{n-1} \psi(X_k) \tag{83} \]
where $X_k$ is a state in the Markov chain starting from a fixed initial distribution $\alpha$. Consider an irreducible aperiodic transition matrix $P$ whose ergodic limit is $\pi$. Next we check the bias, i.e. whether $E\hat\theta_n = \theta$. We proceed,
\[ E\hat\theta_n = \frac{1}{n} \sum_{k=0}^{n-1} E[\psi(X_k)] = \frac{1}{n} \sum_{k=0}^{n-1} \alpha^t P^k \psi = \frac{1}{n}\,\alpha^t \Big(\sum_{k=0}^{n-1} P^k\Big) \psi \tag{84} \]
Then using the ergodicity of $P$ and its limit $\pi$, we have
\[ \lim_{n \to \infty} E\hat\theta_n = \theta \tag{85} \]
Hence, MCMC is asymptotically unbiased. However, for finite $n$ it is biased, and the bias can be significant while the chain is still close to the initial distribution $\alpha$. Therefore, it is customary to allow a burn-in period $r$ (discarding the output of the first $r$ runs) and then calculate our estimate as follows,
\[ \hat\theta_{n-r} = \frac{1}{n - r} \sum_{k=r+1}^{n} \psi(X_k) \tag{86} \]
The choice of $r$ can be pre-selected, or estimated statistically.
Next we find the variance of the estimator $\hat\theta_n$. Let $\Pi$ be a matrix with identical row vectors equal to the target probability $\pi^t$. Then, by induction on $n$,
\[
\begin{aligned}
(P - \Pi)^n &= (P - \Pi)^{n-1}(P - \Pi) = (P^{n-1} - \Pi)(P - \Pi) = P^n - \Pi P - P^{n-1}\Pi + \Pi^2 \\
&= P^n - \Pi
\end{aligned}
\]
where we have used the fact that $\Pi P = P \Pi = \Pi = \Pi^2$, since $P$ has $\pi$ as the limit probability and $\Pi$ is idempotent. We can also see that
\[ \lim_{n \to \infty} (P - \Pi)^n = \lim_{n \to \infty} P^n - \Pi = 0 \tag{87} \]
Note that $Q = P - \Pi$ describes the probabilities between transient states. Next we can define the fundamental matrix
\[ Z = (I - (P - \Pi))^{-1} \tag{88} \]
which we can expand into
\[ Z = (I - (P - \Pi))^{-1} = I + \sum_{k=1}^{\infty} (P^k - \Pi) \tag{89} \]

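We begin to find the variance of the MCMC estimator,
\[ V[\hat\theta_n] = \frac{1}{n^2}\Big(E\Big[\Big(\sum_{k=0}^{n-1} \psi(X_k)\Big)^2\Big] - \Big(E\Big[\sum_{k=0}^{n-1} \psi(X_k)\Big]\Big)^2\Big) \tag{90} \]
Rewriting,
\[ n^2\,V[\hat\theta_n] = E\Big[\sum_{k=0}^{n-1} \psi(X_k)^2\Big] + 2 \sum_{k < k'} E[\psi(X_k)\,\psi(X_{k'})] - \Big(E\Big[\sum_{k=0}^{n-1} \psi(X_k)\Big]\Big)^2 \tag{91} \]
Consider the case $\alpha = \pi$, where $\pi$ is of fixed length $l$, that is, when the initial probability is the target probability. Then
\[ \Big(E\Big[\sum_{k=0}^{n-1} \psi(X_k)\Big]\Big)^2 = n^2 \theta^2 \tag{92} \]
and
\[ E\Big[\sum_{k=0}^{n-1} \psi(X_k)^2\Big] = \sum_{k=0}^{n-1} E[\psi(X_k)^2] = n \sum_{i=1}^{l} \pi_i \psi_i^2 \tag{93} \]
Also,
\[
\begin{aligned}
\sum_{k < k'} E[\psi(X_k)\,\psi(X_{k'})] &= \sum_{k=1}^{n-1} (n - k)\,E[\psi(X_0)\,\psi(X_k)] \\
&= \sum_{k=1}^{n-1} (n - k) \sum_{i=1}^{l} \sum_{j=1}^{l} \pi_i \psi_i\,p_{i,j}^{(k)}\,\psi_j = \pi^t\,\mathrm{diag}(\psi) \sum_{k=1}^{n-1} (n - k)\,P^k \psi
\end{aligned}
\]
Then we have, after substituting and dividing by $n$,
\[ n\,V[\hat\theta_n] = \underbrace{\sum_{i=1}^{l} \pi_i \psi_i^2 - \theta^2}_{\sigma^2} + 2\pi^t\,\mathrm{diag}(\psi) \sum_{k=1}^{n-1} \frac{n - k}{n}\,P^k \psi - (n - 1)\,\theta^2 \tag{94} \]
Observe that $\theta^2$ can be written as $\theta^2 = \pi^t\,\mathrm{diag}(\psi)\,\Pi \psi$. Then we can rewrite
\[ n\,V[\hat\theta_n] = \sigma^2 + 2\pi^t\,\mathrm{diag}(\psi) \sum_{k=1}^{n-1} \frac{n - k}{n}\,P^k \psi - (n - 1)\,\pi^t\,\mathrm{diag}(\psi)\,\Pi \psi \tag{95} \]
Rewriting, and using $\sum_{k=1}^{n-1} \frac{n - k}{n} = \frac{n - 1}{2}$,
\[
\begin{aligned}
n\,V[\hat\theta_n] &= \sigma^2 + 2\pi^t\,\mathrm{diag}(\psi) \Big[\sum_{k=1}^{n-1} \frac{n - k}{n}\,P^k \psi - \frac{n - 1}{2}\,\Pi \psi\Big] \\
&= \sigma^2 + 2\pi^t\,\mathrm{diag}(\psi) \sum_{k=1}^{n-1} \frac{n - k}{n}\,(P^k - \Pi)\,\psi
\end{aligned} \tag{96}
\]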
Taking the limit and using (89),
\[ \lim_{n \to \infty} n\,V[\hat\theta_n] = \sigma^2 + 2\pi^t\,\mathrm{diag}(\psi)\,(Z - I)\,\psi \tag{97} \]
In the previous calculation we assumed that we started MCMC with an initial distribution equal to the target distribution, $\alpha = \pi$. It remains to show that the result is valid if we start from any initial distribution $\alpha$, i.e.
\[ \lim_{n \to \infty} \big(n\,V[\hat\theta_n^\alpha] - n\,V[\hat\theta_n^\pi]\big) = 0 \tag{98} \]
where $\hat\theta_n^\alpha$ denotes the estimator for a chain started from distribution $\alpha$ and $\hat\theta_n^\pi$ denotes the estimator for a chain started from distribution $\pi$.

For the purpose of manipulation, let us define the following summations,
\[ Y_r = \sum_{k=0}^{r-1} \psi(X_k), \qquad Z_n = \sum_{k=r}^{n-1} \psi(X_k) \tag{99} \]
Then we can write
\[
\begin{aligned}
n\big(V[\hat\theta_n^\pi] - V[\hat\theta_n^\alpha]\big) &= \frac{1}{n}\Big[E(Y_r^\pi + Z_n^\pi)^2 - E(Y_r^\alpha + Z_n^\alpha)^2 - (EY_r^\pi + EZ_n^\pi)^2 + (EY_r^\alpha + EZ_n^\alpha)^2\Big] \\
&= \frac{1}{n}\big[A + B + C\big]
\end{aligned}
\]
with
\[
\begin{aligned}
A &:= E(Y_r^\pi)^2 - (EY_r^\pi)^2 - E(Y_r^\alpha)^2 + (EY_r^\alpha)^2 \\
B &:= 2E\big[(Y_r^\pi - EY_r^\pi)(Z_n^\pi - EZ_n^\pi)\big] - 2E\big[(Y_r^\alpha - EY_r^\alpha)(Z_n^\alpha - EZ_n^\alpha)\big] \\
C &:= E(Z_n^\pi)^2 - (EZ_n^\pi)^2 - E(Z_n^\alpha)^2 + (EZ_n^\alpha)^2
\end{aligned}
\]
We look closer at each term. Note that the term $A$ does not depend on $n$. Hence, $\lim_{n \to \infty} \frac{1}{n} A = 0$. Since the state space is finite, we have a maximum $\psi_{\max} = \max_{i \in S} |\psi(x_i)| < \infty$. Then we can see that $\lim_{n \to \infty} \frac{1}{n} B = 0$ and $\lim_{n \to \infty} \frac{1}{n} C = 0$. Hence, we have shown (98), and the statement (97) holds for any starting distribution $\alpha$; in particular $\lim_{n \to \infty} V[\hat\theta_n] = 0$. Then we can conclude that the MSE of MCMC is given by
\[ MSE(\hat\theta_n) = \mathrm{Bias}(\hat\theta_n)^2 + V[\hat\theta_n] = (E\hat\theta_n - \theta)^2 + V[\hat\theta_n] \tag{100} \]
And we conclude that in the limit
\[ \lim_{n \to \infty} MSE(\hat\theta_n) = \lim_{n \to \infty} (E\hat\theta_n - \theta)^2 + \lim_{n \to \infty} V[\hat\theta_n] = 0 \tag{101} \]
However, each term converges at a different speed. The squared bias converges on the order of $n^{-2}$ while the variance converges on the order of $n^{-1}$. Hence, controlling the variance is much more important. This can be done by selecting an initial distribution $\alpha$ that yields the least variance of the estimator in order to save on computational costs.

5/5 (2)
Student Solutions Manual to Accompany Economic Dynamics in Discrete Time, secondedition
From Everand
Student Solutions Manual to Accompany Economic Dynamics in Discrete Time, secondedition
Yue Jiang
4.5/5 (2)
Lectures on Integral Equations
From Everand
Lectures on Integral Equations
Harold Widom
3.5/5 (1)
Theory of Approximation
From Everand
Theory of Approximation
N. I. Achieser
No ratings yet
Calculus-II (Mathematics) Question Bank
From Everand
Calculus-II (Mathematics) Question Bank
Mohmmad Khaja Shareef
No ratings yet
Elgenfunction Expansions Associated with Second Order Differential Equations
From Everand
Elgenfunction Expansions Associated with Second Order Differential Equations
E. C. Titchmarsh
No ratings yet
Compiled Notes: Mscfe 610 Econometrics
100% (1)
Compiled Notes: Mscfe 610 Econometrics
29 pages
Mechanical Integrity Weibull
No ratings yet
Mechanical Integrity Weibull
16 pages
Factors Affecting Mode Choice of Work Trips Gaza
No ratings yet
Factors Affecting Mode Choice of Work Trips Gaza
13 pages
Adoption and Intensity of Row-Seeding (Case of Wolaita Zone)
No ratings yet
Adoption and Intensity of Row-Seeding (Case of Wolaita Zone)
12 pages
The Australian Journal of Agricultural Economics
No ratings yet
The Australian Journal of Agricultural Economics
13 pages
Documento 1 Bmantenimiento
No ratings yet
Documento 1 Bmantenimiento
70 pages
Ward Ahlquist
No ratings yet
Ward Ahlquist
327 pages
CMRIT B.tech Minor Honors Courses Regulations Syllabus
No ratings yet
CMRIT B.tech Minor Honors Courses Regulations Syllabus
75 pages
Analyzing Raised Median Safety Impacts Using Bayesian Methods - 2011
No ratings yet
Analyzing Raised Median Safety Impacts Using Bayesian Methods - 2011
8 pages
Machine Learning: COMS 4771 Fall 2018
No ratings yet
Machine Learning: COMS 4771 Fall 2018
6 pages
Financial Econometrics Notes
No ratings yet
Financial Econometrics Notes
115 pages
Xspec Tutorial
No ratings yet
Xspec Tutorial
25 pages
Error Analysis in Circle Fitting
No ratings yet
Error Analysis in Circle Fitting
26 pages
Answer-Assignment DMBA103 MBA1 2 Set-1 and 2 Sep 2023
No ratings yet
Answer-Assignment DMBA103 MBA1 2 Set-1 and 2 Sep 2023
12 pages
Duolingo English Test: Technical Manual: Ramsey Cardwell, Geoffrey T. Laflair, and Burr Settles
No ratings yet
Duolingo English Test: Technical Manual: Ramsey Cardwell, Geoffrey T. Laflair, and Burr Settles
37 pages
Mobile Information Systems - 2016 - Ali - On The Eigenvalue Based Detection For Multiantenna Cognitive Radio System
No ratings yet
Mobile Information Systems - 2016 - Ali - On The Eigenvalue Based Detection For Multiantenna Cognitive Radio System
8 pages
LoD MLE A SAS Macro For LoD Estimation FINAL v1 05Jan2016A
No ratings yet
LoD MLE A SAS Macro For LoD Estimation FINAL v1 05Jan2016A
10 pages
NASKAH - Matematika - Statistika - 1606894894 - Detasya Avri Magfira
No ratings yet
NASKAH - Matematika - Statistika - 1606894894 - Detasya Avri Magfira
7 pages
Goodness
No ratings yet
Goodness
12 pages
White Noise With Arima Modelling
No ratings yet
White Noise With Arima Modelling
9 pages
Quality Reliability Eng - 2016 - Ali - An Overview of Control Charts For High Quality Processes
No ratings yet
Quality Reliability Eng - 2016 - Ali - An Overview of Control Charts For High Quality Processes
19 pages
Unit 2
No ratings yet
Unit 2
76 pages
MIT18 657F15 LecNote PDF
No ratings yet
MIT18 657F15 LecNote PDF
194 pages
Coaching Actuaries Exam STAM Suggested Study Schedule: Phase 1: Learn
No ratings yet
Coaching Actuaries Exam STAM Suggested Study Schedule: Phase 1: Learn
7 pages
CS771 IITK EndSem Solutions
100% (1)
CS771 IITK EndSem Solutions
8 pages
Presentation - R Package eRM
100% (1)
Presentation - R Package eRM
40 pages
CONCEPTS_OF_MACHINE_LEARNING [MINOR]
No ratings yet
CONCEPTS_OF_MACHINE_LEARNING [MINOR]
14 pages
Iit Jam Mathematical Statistics Question Paper 2015
No ratings yet
Iit Jam Mathematical Statistics Question Paper 2015
18 pages
Gradient-Based Optimization of Dag-Penalized Likelihood For Learning Linear Dag Models (GOLEM)
No ratings yet
Gradient-Based Optimization of Dag-Penalized Likelihood For Learning Linear Dag Models (GOLEM)
23 pages

Essential Questions for the Exam 2018, AMCS 308,

Stochastic Methods in Engineering

May 31, 2018

1. Formulate the basic properties of a Wiener process.

where φ is the CDF of a standard normal and c∗ is a constant.

We see that, ( →0

5. Motivate the use of variance reduction techniques. Describe two techniques

Which has variance,

2V[V1 ] ≤ V[X] =⇒ V[g(X) + g(−X)] ≤ 2V[g(X)] (25)

Cov[g(X), g(−X)] < 0 (26)

Note that for linear g, where g(−X) = −g(X), we have V[g(X) + g(−X)] = 0, since

V[g(X) + g(−X)] = V[g(X) − g(X)] = 2V[g(X)] + 2 Cov(g(X), −g(X)) = 0 (27)
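The variance comparison in (25) to (27) can be checked numerically. The following is a minimal sketch, assuming X is standard normal and taking g = exp as a hypothetical monotone test function (so that Cov[g(X), g(−X)] < 0); the function name, sample size and seed are illustrative choices, not part of the original notes.

```python
import math
import random
import statistics

def plain_and_antithetic(g, n_pairs, seed=0):
    """Return pair averages from independent draws and from antithetic draws."""
    rng = random.Random(seed)
    plain, anti = [], []
    for _ in range(n_pairs):
        x = rng.gauss(0.0, 1.0)
        x_indep = rng.gauss(0.0, 1.0)
        plain.append(0.5 * (g(x) + g(x_indep)))  # two independent samples
        anti.append(0.5 * (g(x) + g(-x)))        # antithetic pair (X, -X)
    return plain, anti

# Monotone g (assumption): the antithetic average should have smaller variance.
plain, anti = plain_and_antithetic(math.exp, 20000)
var_plain = statistics.variance(plain)
var_anti = statistics.variance(anti)

# Linear g with g(-x) = -g(x): every antithetic average vanishes, as in (27).
linear_anti = plain_and_antithetic(lambda x: 3.0 * x, 1000)[1]
```

Both estimators use two evaluations of g per pair, so the comparison is at equal cost.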

(a) Sample ∆N_{t_n} ∼ Poisson(λ(t_n − t_{n−1})) for n = 1, 2, . . . , N

The general algorithm:

1 Sample ∆N_{t_n} ∼ Poisson(λ(t_n − t_{n−1})) ∀n = 1, 2, . . . , N.
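The increment-sampling step can be sketched as follows. This is only an illustration under stated assumptions: the Poisson sampler uses Knuth's multiplication method (a swapped-in technique chosen to avoid external libraries, adequate for small means), and the rate, time mesh and function names are hypothetical. The path is built by accumulating the sampled increments, starting from N(t_0) = 0.

```python
import math
import random

def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def poisson_path(rate, times, seed=0):
    """Sample N(t_n) on the mesh by summing independent Poisson increments."""
    rng = random.Random(seed)
    path = [0]
    for t_prev, t_next in zip(times, times[1:]):
        increment = sample_poisson(rate * (t_next - t_prev), rng)
        path.append(path[-1] + increment)
    return path

times = [0.1 * n for n in range(11)]      # hypothetical mesh on [0, 1]
path = poisson_path(rate=5.0, times=times)
```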

8. What is a Brownian bridge? Discuss an algorithm to sample the mid point

Given a Wiener process W (t)

We conclude that the distribution of B at s ∈ [t1 = 0, t2 ] is Normal with parameters,
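As a point of comparison, the standard conditional law of a Wiener process pinned at W(t1) = a and W(t2) = b is Normal with mean a + (s − t1)/(t2 − t1) (b − a) and variance (s − t1)(t2 − s)/(t2 − t1). The sketch below uses that standard result (not copied from the notes) to sample the mid point; the function names are illustrative.

```python
import math
import random

def bridge_moments(s, t1, a, t2, b):
    """Mean and variance of W(s) given W(t1) = a and W(t2) = b, t1 <= s <= t2."""
    weight = (s - t1) / (t2 - t1)
    mean = a + weight * (b - a)
    variance = (s - t1) * (t2 - s) / (t2 - t1)
    return mean, variance

def sample_midpoint(t1, a, t2, b, rng):
    """Draw W((t1 + t2) / 2) conditional on the two end values."""
    mean, variance = bridge_moments(0.5 * (t1 + t2), t1, a, t2, b)
    return rng.gauss(mean, math.sqrt(variance))

rng = random.Random(0)
mid_value = sample_midpoint(0.0, 0.0, 1.0, 2.0, rng)
```

At the mid point the mean is the average of the end values and the variance is (t2 − t1)/4, which is what allows a path to be refined recursively by bisection.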

We want to estimate the density ρy (y) given a set {Yl }M

= (Ker_k ∗ ρ)(y) − ρ(y)

Next we look at the sampling error,

Where we have Taylor-expanded ρ(y − kz) about y, in powers of kz.

We consider the Mean Square Error,

This yields an approximation error ∝ M^{−2/5}.
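A one-dimensional kernel density estimate with bandwidth k ∝ M^{−1/5} (the scaling that gives the M^{−2/5} error rate) might look like the sketch below. It assumes a Gaussian kernel and takes the proportionality constant equal to 1, both illustrative choices; the sample data are synthetic standard normal draws.

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Return the density estimate y -> (1 / (M k)) * sum_l Ker((y - Y_l) / k)."""
    m = len(samples)
    norm = 1.0 / (m * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(
            math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in samples
        )

    return density

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(500)]   # synthetic data
k = len(samples) ** (-1.0 / 5.0)                      # bandwidth ~ M^(-1/5)
rho_hat = gaussian_kde(samples, k)

# Riemann sum over a wide grid: the estimate should integrate to about 1.
grid = [-8.0 + 0.05 * i for i in range(321)]
mass = sum(rho_hat(y) for y in grid) * 0.05
```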

The d-dimensional KDE is defined as follows,

where H is a d × d SPD bandwidth matrix and K_H is defined as follows,

K_H(X) = |H|^{−1/2} K(H^{−1/2} X)

Note that K_H is a normalized pdf. We propose the kernel K to be a d-variate Gaussian,

MSE = bias^2(f̂) + Var(f̂)

Then the optimal bandwidth h∗ that minimizes the MSE is given by,

E[(a^t (Z − EZ) + b)(Z_m − EZ_m)] = E[Y (Z_m − EZ_m)] (46)

This can be written in a matrix form as,

So we have that b∗ = EY and a∗ = Cov(Z)^{−1} E[Y (Z − EZ)], then,

(x − m)^t ∆^{−1} (x − m) = (x − m)^t W^t W (x − m) = [W (x − m)]^t [W (x − m)] (53)

This is again equivalent to minimizing,

σ_ML = ||W (x − m)||_F^2 (54)

Next, it remains to minimize (54). We use an iterative majorization technique to tackle

σ_ML(m_{c+1}) ≤ σ_ML(m_c) (55)

We define a majorization function σ_maj that satisfies the following:

• monotonically converges to σ_ML from above.

We rewrite (54) using m = m_c + (m − m_c),

σ_ML(m) = [(x − m_c) − (m − m_c)]^t W^t W [(x − m_c) − (m − m_c)]

Let β be the largest eigenvalue of W^t W, then for any vector s,

s^t W^t W s ≤ β s^t s =⇒ (m − m_c)^t W^t W (m − m_c) ≤ β (m − m_c)^t (m − m_c) (56)

Then a majorization function defined by,

σ_maj = (x − m_c)^t W^t W (x − m_c) + β (m − m_c)^t (m − m_c) − 2 (m − m_c)^t W^t W (x − m_c) (57)

satisfies σ_ML ≤ σ_maj and we see that σ_maj(m_c) = σ_ML(m_c) = (x − m_c)^t W^t W (x − m_c).

σ_ML(m_{c+1}) ≤ σ_maj(m_{c+1}) ≤ σ_maj(m_c) ≤ σ_ML(m_c) (58)

To find the best solution, we perform the following algorithm:

2 Find m_{c+1} = arg min_m σ_maj(m; m_c)
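The monotone descent in (58) can be illustrated with a small sketch. Setting the gradient of (57) with respect to m to zero gives the update m_{c+1} = m_c + (1/β) W^t W (x − m_c). The example below assumes m is unconstrained (in which case the iterates simply converge to x, so it only demonstrates the descent property) and uses a hypothetical 2 × 2 matrix A standing for W^t W.

```python
import math

def matvec(a, v):
    """Product of a 2 x 2 matrix with a length-2 vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def sigma_ml(a, x, m):
    """Objective (x - m)^t A (x - m) with A = W^t W."""
    r = [x[0] - m[0], x[1] - m[1]]
    ar = matvec(a, r)
    return r[0] * ar[0] + r[1] * ar[1]

def largest_eigenvalue(a):
    """Largest eigenvalue of a symmetric 2 x 2 matrix (closed form)."""
    half_trace = 0.5 * (a[0][0] + a[1][1])
    return half_trace + math.sqrt((0.5 * (a[0][0] - a[1][1])) ** 2 + a[0][1] ** 2)

def majorization_step(a, beta, x, m_c):
    """Minimizer of sigma_maj built around m_c: m_c + A (x - m_c) / beta."""
    g = matvec(a, [x[0] - m_c[0], x[1] - m_c[1]])
    return [m_c[0] + g[0] / beta, m_c[1] + g[1] / beta]

A = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical W^t W (symmetric positive definite)
x = [1.0, -2.0]
m = [0.0, 0.0]
beta = largest_eigenvalue(A)
history = [sigma_ml(A, x, m)]
for _ in range(50):
    m = majorization_step(A, beta, x, m)
    history.append(sigma_ml(A, x, m))
```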

13. Describe the Kalman filter, motivate it as a particular case of a Bayesian

Kalman Filter: Let (X_k, Z_k) follow the linear equation:

(a) {F_k}_{k≥0} deterministic

We assume that W and V are independent. Then we have,

π(µn+1 |{Yj }n+1

m_{n+1} = e^{−1} m̂_n,   C_{n+1} = e^{−2} Ĉ_n + Σ

m̂_{n+1} = (I − K_{n+1}) m_{n+1} + K_{n+1} Y_{n+1},   Ĉ_{n+1} = (I − K_{n+1}) C_{n+1}
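A scalar version of the prediction and update steps above can be sketched as follows. The form of the gain is an assumption: for a direct observation Y = µ + noise with noise variance Γ, the usual choice is K = C / (C + Γ); the names sigma and gamma for the model and observation noise variances are illustrative, since the excerpt does not show them.

```python
import math

def kalman_step(m_hat, c_hat, y, sigma, gamma):
    """One predict/update cycle of the scalar filter.

    sigma: model noise variance, gamma: observation noise variance
    (gamma and the gain formula are assumptions, not shown in the notes).
    """
    # Prediction: m_{n+1} = e^{-1} m_hat_n, C_{n+1} = e^{-2} C_hat_n + Sigma
    m_pred = math.exp(-1.0) * m_hat
    c_pred = math.exp(-2.0) * c_hat + sigma
    # Gain for a direct observation Y = mu + noise (assumption)
    gain = c_pred / (c_pred + gamma)
    # Update: m_hat = (1 - K) m + K Y, C_hat = (1 - K) C
    m_new = (1.0 - gain) * m_pred + gain * y
    c_new = (1.0 - gain) * c_pred
    return m_new, c_new

m_post, c_post = kalman_step(m_hat=0.0, c_hat=0.0, y=2.0, sigma=1.0, gamma=1.0)
```

With equal prior and observation variances the gain is 1/2, so the posterior mean sits halfway between the prediction and the observation.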

Def. Y = {Yt , t = 0, 1, 2, . . .} is a Markov Chain with state space S, initial distribution

p(T → T) = 0.9,   p(T → N) = 0.1,   p(N → T) = 0.5,   p(N → N) = 0.5 (67)

Then we have the transition matrix,

Then the Markov Chain X ∼ MC(λ, p) is a regular MC such that:

lim_{n→∞} P^n = (1, . . . , 1)^t w̄ (68)

where w̄ is the distribution that solves

We find the solution to be w̄ = [5/6, 1/6].
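The limit (68) can be checked numerically for the two-state chain with the probabilities in (67). The sketch below orders the states as (T, N) and raises the transition matrix to a high power; the helper names and the exponent 50 are illustrative choices.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_power(p, n):
    """Compute p^n by repeated multiplication (n >= 1)."""
    result = p
    for _ in range(n - 1):
        result = matmul(result, p)
    return result

# Rows and columns ordered as (T, N), using the probabilities in (67).
P = [[0.9, 0.1],
     [0.5, 0.5]]
P_limit = matrix_power(P, 50)

# Stationarity check: w P = w for w = [5/6, 1/6].
w = [5.0 / 6.0, 1.0 / 6.0]
w_next = [sum(w[i] * P[i][j] for i in range(2)) for j in range(2)]
```

Both rows of P_limit approach w̄, which reflects that the limit does not depend on the initial state.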

u(y, T ) = P (X(T ) = y|X(0) = x).

These probabilities satisfy an evolution equation moving forward in time:

µ(t) =< µ1 (x1 , t), . . . , µn (xN , t) > (72)

And so we can write,

f (x, t) = E(V (X(T ))|X(t) = x).

Def. F(x, t) = E[V(X(T)) | X(t) = x] for t < T and x ∈ S. Given F (x, t) =

f(t) = P f(t + 1), with f(x, T) = V(x) ∀x ∈ S (76)
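The backward recursion (76) can be sketched as a loop that starts from the terminal payoff and applies P once per time step. The transition matrix and the payoff vector below are hypothetical; with an indicator payoff, f(0) gives the probability of being in the first state at time T, conditional on the starting state.

```python
def backward_values(P, payoff, steps):
    """Apply f(t) = P f(t + 1) for the given number of steps, from f(T) = payoff."""
    f = list(payoff)
    n = len(f)
    for _ in range(steps):
        f = [sum(P[i][j] * f[j] for j in range(n)) for i in range(n)]
    return f

P = [[0.9, 0.1],
     [0.5, 0.5]]          # hypothetical two-state chain
V = [1.0, 0.0]            # indicator payoff of the first state (assumption)

f_one_step = backward_values(P, V, 1)
f_two_steps = backward_values(P, V, 2)
```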

In particular, if q(x|y) = q(y|x), then p(x, y) = min{1, f(y)/f(x)}

f (x)t(x, y) = f (y)t(y, x) (79)

where t(x, y) are the transition probabilities of the MC defined by Metropolis-Hastings.
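Detailed balance (79) can be verified exactly on a small discrete example. The sketch below assumes a symmetric nearest-neighbour proposal on a ring of four states and a hypothetical unnormalized target f; it builds the Metropolis transition probabilities t(x, y) explicitly rather than simulating the chain.

```python
def mh_transition_matrix(f):
    """Metropolis transition matrix for a symmetric +/-1 proposal on a ring."""
    n = len(f)
    t = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in ((x - 1) % n, (x + 1) % n):
            # Propose y with probability 1/2, accept with min{1, f(y)/f(x)}.
            t[x][y] += 0.5 * min(1.0, f[y] / f[x])
        # Rejected proposals leave the chain at x.
        t[x][x] = 1.0 - sum(t[x][y] for y in range(n) if y != x)
    return t

f = [1.0, 2.0, 4.0, 1.0]   # hypothetical unnormalized target
t = mh_transition_matrix(f)

# Largest violation of f(x) t(x, y) = f(y) t(y, x) over all pairs.
max_violation = max(
    abs(f[x] * t[x][y] - f[y] * t[y][x])
    for x in range(len(f)) for y in range(len(f))
)
```

Because detailed balance only involves ratios f(y)/f(x), the normalizing constant of f is never needed.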

By two identities corresponding to two terms in f (x)t(x, y).

(a) ρ(x, y) q(y|x) f(x) = ρ(y, x) q(x|y) f(y).

We see that, ( →0