Lecture 04
Sampling
Philipp Hennig
27 April 2021
Faculty of Science
Department of Computer Science
Chair for the Methods of Machine Learning
 #  date    content                        Ex │  #  date    content                        Ex
 1  20.04.  Introduction                    1 │ 14  09.06.  Logistic Regression             8
 2  21.04.  Reasoning under Uncertainty       │ 15  15.06.  Exponential Families
 3  27.04.  Continuous Variables            2 │ 16  16.06.  Graphical Models                9
 4  28.04.  Monte Carlo                       │ 17  22.06.  Factor Graphs
 5  04.05.  Markov Chain Monte Carlo        3 │ 18  23.06.  The Sum-Product Algorithm      10
 6  05.05.  Gaussian Distributions            │ 19  29.06.  Example: Topic Models
 7  11.05.  Parametric Regression           4 │ 20  30.06.  Mixture Models                 11
 8  12.05.  Understanding Deep Learning       │ 21  06.07.  EM
 9  18.05.  Gaussian Processes              5 │ 22  07.07.  Variational Inference          12
10  19.05.  An Example for GP Regression      │ 23  13.07.  Example: Topic Models
11  25.05.  Understanding Kernels           6 │ 24  14.07.  Example: Inferring Topics      13
12  26.05.  Gauss-Markov Models               │ 25  20.07.  Example: Kernel Topic Models
13  08.06.  GP Classification               7 │ 26  21.07.  Revision
A Computational Challenge
Integration is the core computation of probabilistic inference
All of these are expectations ∫ f(x) p(x) dx for a suitable f:

▶ f(x) = x (mean)
▶ f(x) = (x − E_p(x))² (variance)
▶ f(x) = x^p (p-th moment)
▶ f(x) = −log p(x) (entropy)
▶ …
The Toolbox
Framework:
∫ p(x₁, x₂) dx₂ = p(x₁)        p(x₁, x₂) = p(x₁ | x₂) p(x₂)        p(x | y) = p(y | x) p(x) / p(y)
Modelling:
▶ Directed Graphical Models
▶ …

Computation:
▶ Monte Carlo
▶ …
Randomized Methods — Monte Carlo
the idea
[image source: Wikipedia, user fruitpunchline]
A method from a different age
Monte Carlo Methods and the Manhattan Project
[images: Los Alamos National Laboratory / Wikipedia]
The FERMIAC
analog Monte Carlo computer
[images: Wikipedia]
Example
a dumb way to compute π
▶ ratio of quarter-circle to square: π/4
▶ π = 4 ∫ I(x⊺x < 1) u(x) dx
▶ draw x ∼ u(x), check x⊺x < 1, count (a code sketch follows)

> 3.13708
> 3.14276

[figure: uniform samples on the unit square [0, 1]², with the quarter circle x⊺x < 1]
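A minimal sketch of this estimator in Python (numpy assumed; the function name estimate_pi and all variable names are mine):

```python
import numpy as np

def estimate_pi(num_samples, seed=0):
    """Monte Carlo estimate of pi: draw x ~ U([0,1]^2), count x^T x < 1."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(num_samples, 2))   # uniform points in the unit square
    inside = (x ** 2).sum(axis=1) < 1.0      # indicator I(x^T x < 1)
    return 4.0 * inside.mean()               # 4 * fraction inside quarter circle

print(estimate_pi(100_000))                  # ~3.14, varies with the seed
```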
Monte Carlo works on every Integrable Function
is this a good thing?
ϕ := ∫ f(x) p(x) dx = E_p(f)

▶ Let x_s ∼ p, s = 1, …, S, i.i.d. (i.e. p(x_s = x) = p(x) and p(x_s, x_t) = p(x_s) p(x_t) ∀ s ≠ t). The Monte Carlo estimator is

  ϕ̂ := (1/S) ∑_{s=1}^S f(x_s)

▶ Its expectation is

  E(ϕ̂) = (1/S) ∑_{s=1}^S ∫ f(x_s) p(x_s) dx_s = (1/S) ∑_{s=1}^S E(f(x_s)) = ϕ

  i.e. ϕ̂ is an unbiased estimator!
▶ the only requirement for this is that ∫ f(x) p(x) dx exists (i.e. f must be Lebesgue-integrable relative to p). Monte Carlo integration can even work on discontinuous functions.
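A quick numerical check of unbiasedness, as a sketch with my own choice of example (p = N(0, 1), f(x) = x², so E_p(f) = 1 exactly): individual small-S estimates scatter widely, but their average hits the true value.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: x ** 2                  # for p = N(0,1): E_p(f) = 1 exactly

S, runs = 10, 100_000                 # deliberately tiny S per estimate
phi_hat = f(rng.normal(size=(runs, S))).mean(axis=1)

print(phi_hat.mean())                 # average over runs ~1.0: no bias
print(phi_hat.std())                  # but single estimates scatter widely
```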
Sampling converges slowly
expected square error
E((ϕ̂ − ϕ)²) = E[((1/S) ∑_{s=1}^S (f(x_s) − ϕ))²]
            = (1/S²) ∑_{s=1}^S ∑_{r=1}^S [E(f(x_s) f(x_r)) − ϕ E(f(x_s)) − E(f(x_r)) ϕ + ϕ²]
            = (1/S²) ∑_{s=1}^S [∑_{r≠s} (ϕ² − 2ϕ² + ϕ²) + (E(f²) − ϕ²)]    (the first bracket vanishes; the second is var(f) := E(f²) − ϕ²)
            = (1/S) var(f) = O(S⁻¹)

▶ Thus, the expected error (the square root of the expected square error) drops as O(S^(−1/2))
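The π estimator from before illustrates this O(S^(−1/2)) rate; a sketch (numpy, my variable names):

```python
import numpy as np

rng = np.random.default_rng(2)
for S in [10**k for k in range(2, 7)]:
    x = rng.uniform(size=(S, 2))
    est = 4.0 * ((x ** 2).sum(axis=1) < 1.0).mean()
    print(f"S = {S:>7}: estimate = {est:.5f}, |error| = {abs(est - np.pi):.1e}")
# error shrinks roughly by sqrt(10) ~ 3.2x per factor-10 increase in S
```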
sampling is for rough guesses
recall example computation for π
[figure: left, Monte Carlo estimate ϕ̂ of π against # samples (10⁰ to 10⁵); right, log-log plot of the error against # samples, following the √(var(f)/S) rate]
Reminder: Change of Measure
The transformation law
Let X = (X₁, …, X_d) have a joint density p_X. Let g : ℝ^d → ℝ^d be continuously differentiable and injective, with non-vanishing Jacobian J_g. Then Y = g(X) has density

  p_Y(y) = p_X(g⁻¹(y)) · |J_{g⁻¹}(y)|   if y is in the range of g,
  p_Y(y) = 0                            otherwise.

The Jacobian J_g is the d × d matrix with [J_g(x)]_ij = ∂g_i(x)/∂x_j.
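A sketch verifying the transformation law on a toy case of my own choosing: for X ∼ U[0, 1] and g(x) = x², the law gives p_Y(y) = 1/(2√y) on (0, 1), hence the CDF P(Y ≤ t) = √t.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.uniform(size=1_000_000) ** 2      # Y = g(X) = X^2 with X ~ U[0, 1]

# transformation law gives p_Y(y) = 1/(2 sqrt(y)), hence CDF P(Y <= t) = sqrt(t)
for t in [0.1, 0.25, 0.5, 0.9]:
    print(np.mean(y <= t), np.sqrt(t))    # empirical vs. analytic, should agree
```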
Some special cases
sampling from an exponential distribution is analytic
p(x) = (1/λ) e^(−x/λ)        ∫₀ˣ p(x′) dx′ = 1 − e^(−x/λ)

draw u ∼ U[0, 1] and set 1 − u = 1 − e^(−x/λ), i.e. x = −λ log(u) (see the sketch below)

[figure: exponential density with samples, x ∈ [0, 4]]
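As a sketch of this inverse-CDF trick (numpy; λ and the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 1.5

u = rng.uniform(size=1_000_000)
x = -lam * np.log(u)                       # inverse-CDF transform of uniforms

print(x.mean(), lam)                       # mean of this exponential is lambda
print(np.mean(x <= lam), 1 - np.exp(-1))   # CDF at lambda: 1 - e^{-1} ~ 0.632
```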
Example: Sampling from a Beta Distribution
uniform variables
Consider u ∼ U[0, 1] (i.e. u ∈ [0, 1], and p(u) = 1). The variable x = u1/α has the Beta density
p_x(x) = p_u(u(x)) · |∂u(x)/∂x| = α · x^(α−1) = B(x; α, 1).
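A quick check of this transform (my own parameter choice α = 3; the CDF of B(x; α, 1) is x^α and its mean is α/(α+1)):

```python
import numpy as np

rng = np.random.default_rng(5)
alpha = 3.0

x = rng.uniform(size=1_000_000) ** (1 / alpha)   # x = u^{1/alpha} ~ Beta(alpha, 1)
print(x.mean(), alpha / (alpha + 1))             # both ~0.75
print(np.mean(x <= 0.5), 0.5 ** alpha)           # CDF of B(x; alpha, 1) is x^alpha
```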
Homework:
Consider two independent variables

  X ∼ G(α, θ)        Y ∼ G(β, θ),

where G(ξ; α, θ) = ξ^(α−1) e^(−ξ/θ) / (Γ(α) θ^α) is the Gamma distribution. Show that the random variable Z = X/(X + Y) is Beta distributed, with the density

  p(Z = z) = B(z; α, β) = Γ(α + β)/(Γ(α) Γ(β)) · z^(α−1) (1 − z)^(β−1).

(A numerical sanity check of this claim appears below.)
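The proof is the homework; a Monte Carlo sanity check of the claim is easy, though (a sketch; numpy's Generator.gamma uses the same shape/scale parametrization as above):

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, beta, theta = 2.0, 5.0, 1.3

x = rng.gamma(shape=alpha, scale=theta, size=1_000_000)
y = rng.gamma(shape=beta, scale=theta, size=1_000_000)
z = x / (x + y)

print(z.mean(), alpha / (alpha + beta))      # Beta(alpha, beta) mean
print(z.var(), alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))  # Beta variance
```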
▶ samples from a probability distribution can be used to estimate expectations, roughly
▶ ‘random numbers’ don’t really need to be unpredictable, as long as they have as little structure as
possible
▶ uniformly distributed random numbers can be transformed into other distributions. In some cases this can be done efficiently, and it is worth looking for such a transformation
What do we do if we don’t know a good transformation?
Why is sampling hard?
Sampling is harder than global optimization
p(x) = p̃(x) / Z

assuming that it is possible to evaluate the unnormalized density p̃ (but not p) at arbitrary points.

Typical example: compute moments of a posterior

  p(x | D) = p(D | x) p(x) / ∫ p(D, x) dx        as        E_{p(x|D)}(xⁿ) ≈ (1/S) ∑_{s=1}^S x_sⁿ   with x_s ∼ p(x | D)
Rejection Sampling
a simple method [Georges-Louis Leclerc, Comte de Buffon, 1707–1788]
[figure: rejection sampling from a one-dimensional density, x ∈ [−4, 10]]
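The slide itself only shows the picture; as a hedged sketch, here is a generic rejection sampler (my own implementation and naming, assuming we can evaluate the unnormalized target p̃ and an envelope c·q ≥ p̃):

```python
import numpy as np

def rejection_sample(p_tilde, q_sample, q_pdf, c, size, seed=0):
    """Draw `size` samples from p ~ p_tilde by rejection under envelope c*q.

    Requires c * q_pdf(x) >= p_tilde(x) for all x.
    """
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < size:
        x = q_sample(rng)                    # propose x ~ q
        u = rng.uniform(0, c * q_pdf(x))     # uniform height under the envelope
        if u < p_tilde(x):                   # accept if the point falls under p_tilde
            out.append(x)
    return np.array(out)

# example: sample N(0,1) from its unnormalized density, wider Gaussian proposal
p_tilde = lambda x: np.exp(-0.5 * x ** 2)
q_pdf = lambda x: np.exp(-0.5 * (x / 2) ** 2) / (2 * np.sqrt(2 * np.pi))
s = rejection_sample(p_tilde, lambda rng: rng.normal(0, 2), q_pdf, c=6.0, size=10_000)
print(s.mean(), s.std())                     # ~0, ~1
```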
The Problem with Rejection Sampling
the curse of dimensionality [MacKay, §29.3]
Example:
▶ p(x) = N(x; 0, σ_p²), q(x) = N(x; 0, σ_q²) in D dimensions
▶ σ_q > σ_p
▶ the optimal c is given by

  c = (2πσ_q²)^(D/2) / (2πσ_p²)^(D/2) = (σ_q/σ_p)^D = exp(D ln(σ_q/σ_p))

▶ acceptance rate is ratio of volumes: 1/c
▶ rejection rate rises exponentially in D
▶ for σ_q/σ_p = 1.1 and D = 100, 1/c < 10⁻⁴ (checked numerically below)

[figure: p(x) and envelope c·q(x) in one dimension, x ∈ [−4, 4]]
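A one-line check of the last bullet (a sketch):

```python
import numpy as np
print(np.exp(-100 * np.log(1.1)))   # acceptance rate 1/c ~ 7.3e-05 < 1e-4
```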
Importance Sampling
a slightly less simple method
[figure: two panels illustrating importance sampling, with axes x and f(x), g(x)]
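The standard importance-sampling estimator behind this picture is E_p(f) ≈ (1/S) ∑_s w_s f(x_s) with weights w_s = p(x_s)/q(x_s) and x_s ∼ q; a sketch with my own toy choice of p, q, and f:

```python
import numpy as np

rng = np.random.default_rng(7)
S = 100_000

# E_p(f) for p = N(0,1) and f(x) = x^2 (true value 1), proposal q = N(0, 2^2)
p = lambda t: np.exp(-0.5 * t ** 2) / np.sqrt(2 * np.pi)
q = lambda t: np.exp(-0.5 * (t / 2) ** 2) / (2 * np.sqrt(2 * np.pi))

x = rng.normal(0.0, 2.0, size=S)     # x_s ~ q
w = p(x) / q(x)                      # importance weights

print(np.mean(w * x ** 2))           # ~1.0: importance-sampled estimate of E_p(x^2)
print(np.mean(w))                    # ~1.0: weights average to 1 for normalized p
```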
Sampling (Monte Carlo) Methods
Sampling is a way of performing rough probabilistic computations, in particular for expectations
(including marginalization).
▶ samples from a probability distribution can be used to estimate expectations, roughly
▶ uniformly distributed random numbers can be transformed into other distributions. In some cases this can be done efficiently, and it is worth looking for such a transformation
▶ Rejection sampling is a primitive but exact method that works with intractable models
▶ Importance sampling makes more efficient use of samples, but can have high variance (and this
may not be obvious)
Next Lecture:
▶ Markov Chain Monte Carlo methods are more elaborate ways of getting approximate answers to
intractable problems.