Basic Sampling Methods

Sargur Srihari
[email protected]
Topics
1. Motivation
Intractability in ML
How sampling can help
2. Ancestral Sampling
Using BNs
3. Transforming a Uniform Distribution
4. Rejection Sampling
5. Importance Sampling
6. Sampling-Importance-Resampling
1. Motivation
• When exact inference is intractable, we need some form of approximation
  – True of probabilistic models of practical significance
• Inference methods based on numerical sampling are known as Monte Carlo techniques
• Most situations will require evaluating expectations of unobserved variables, e.g., to make predictions
  – Rather than the posterior distribution itself
Common Task: Illustration
• Evaluate E[f] = ∫ f(z) p(z) dz using samples z^(l), l = 1,..,L, drawn from p(z)
  – Estimator: f̂ = (1/L) Σ_{l=1}^L f(z^(l))  (see the sketch after this list)
• Then E[f̂] = E[f], i.e., the estimator has the correct mean
  – And var[f̂] = (1/L) E[(f − E[f])²]
    which is the variance of the estimator
• Accuracy independent of dimensionality of z
– High accuracy can be achieved with few samples (ten or twenty)
• However samples may not be independent
– Effective sample size may be smaller than apparent sample size
– In the illustration, f(z) is small where p(z) is high and vice versa
• The expectation may be dominated by regions of small probability, thereby requiring large sample sizes
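A minimal sketch of this basic Monte Carlo estimator; the choices p(z) = N(0,1) and f(z) = z² (so that E[f] = 1) are illustrative assumptions, not taken from the example above:

import numpy as np

def mc_expectation(f, sampler, L):
    """Estimate E[f] by f_hat = (1/L) * sum_l f(z^(l)) with z^(l) ~ p(z)."""
    z = sampler(L)                 # L samples drawn from p(z)
    return np.mean(f(z))

# Illustrative choices: p(z) = N(0,1) and f(z) = z^2, so E[f] = 1
rng = np.random.default_rng(0)
est = mc_expectation(lambda z: z**2, lambda L: rng.standard_normal(L), L=20)
print(est)                         # reasonably close to 1 even with ~20 samples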
2. Ancestral Sampling
• If joint distribution is represented by a BN
– no observed variables
– a straightforward method exists
• Distribution is specified by p(z) = Π_{i=1}^M p(z_i | pa_i)
  – where z_i is the set of variables associated with node i and
  – pa_i is the set of variables associated with the parents of node i
• To obtain samples from the joint
  – we make one pass through the set of variables in the order z_1,..,z_M, sampling from the conditional distribution p(z_i | pa_i)
• After one pass through the graph we obtain one sample
• Frequency of different values defines the distribution
– E.g., allowing us to determine marginals
P(L,S) = Σ_{D,I,G} P(D,I,G,L,S) = Σ_{D,I,G} P(D) P(I) P(G | D,I) P(L | G) P(S | I)
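A minimal ancestral-sampling sketch for a small discrete BN; the fragment D -> G <- I and its CPT values are hypothetical, not the actual tables behind the example above:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary BN fragment D -> G <- I; each entry maps parent values
# to a probability vector over the node's states (values are made up).
cpds = [
    ("D", (),         {(): [0.6, 0.4]}),
    ("I", (),         {(): [0.7, 0.3]}),
    ("G", ("D", "I"), {(0, 0): [0.3, 0.7], (0, 1): [0.9, 0.1],
                       (1, 0): [0.05, 0.95], (1, 1): [0.5, 0.5]}),
]

def ancestral_sample():
    """One pass through the nodes in topological order, sampling each z_i from p(z_i | pa_i)."""
    z = {}
    for node, parents, table in cpds:
        probs = table[tuple(z[p] for p in parents)]
        z[node] = rng.choice(len(probs), p=probs)
    return z

# Frequencies over many passes approximate marginals, e.g. P(G = 1)
samples = [ancestral_sample() for _ in range(10000)]
print(np.mean([s["G"] for s in samples]))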
Undirected Graphs
• There is no one-pass sampling strategy, even for the prior distribution with no observed variables
• Computationally expensive methods such as Gibbs sampling must be used (a minimal sketch follows)
  – Start with a sample
  – Replace the first variable by sampling it conditioned on the current values of the rest, then the next variable, etc.
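The Gibbs sketch referred to above, assuming an illustrative undirected model (a chain of binary ±1 variables with pairwise coupling J); the model and its parameter values are assumptions for demonstration only:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pairwise MRF: a chain of n binary spins x_i in {-1, +1}
# with unnormalized distribution proportional to exp(J * sum_i x_i * x_{i+1}).
n, J, n_sweeps = 10, 0.5, 1000
x = rng.choice([-1, 1], size=n)               # start with an arbitrary sample

def p_plus_given_rest(i, x):
    """p(x_i = +1 | all other variables): depends only on the neighbours of i."""
    field = J * ((x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n - 1 else 0))
    return 1.0 / (1.0 + np.exp(-2.0 * field))

samples = []
for _ in range(n_sweeps):
    for i in range(n):                         # replace each variable in turn,
        x[i] = 1 if rng.random() < p_plus_given_rest(i, x) else -1   # conditioned on the rest
    samples.append(x.copy())

# e.g. estimate E[x_0 * x_1] under the MRF from the Gibbs samples
print(np.mean([s[0] * s[1] for s in samples]))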
Transformation Method
• Goal: generate random numbers from a simple non-uniform distribution
• Assume we have a source of uniformly distributed random numbers
• z is uniformly distributed over (0,1), i.e., p(z) = 1 in that interval
[Figure: uniform density p(z) = 1 on the interval (0,1)]
[Figure: uniform density p(z) on (0,1) transformed through y = f(z) to give the non-uniform density p(y)]
Geometry of Transformation
• We are interested in generating non-uniform random variables from a distribution p(y)
• Choose y = h⁻¹(z), where h(y) is the cumulative distribution function of p(y)
  – e.g., for the exponential distribution p(y) = λ exp(−λy), the CDF is h(y) = 1 − exp(−λy)
  – If we transform using y = −λ⁻¹ ln(1 − z), then y has the desired exponential distribution (see the sketch below)
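The sketch referred to above: transforming uniform samples into exponential samples via the inverse CDF; the rate λ = 2 is an arbitrary illustrative value:

import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                  # rate of the target exponential p(y) = lam * exp(-lam * y)

z = rng.uniform(0.0, 1.0, size=100_000)    # uniform samples on (0, 1)
y = -np.log(1.0 - z) / lam                 # y = -(1/lam) * ln(1 - z), inverse of h(y) = 1 - exp(-lam*y)

print(y.mean())                            # close to the exponential mean 1/lam = 0.5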
Generating a Gaussian
• Generate pairs z_1, z_2 uniformly distributed over (−1, 1), discarding any pair unless r² = z_1² + z_2² ≤ 1
• For each remaining pair evaluate the quantities
    y_1 = z_1 (−2 ln r² / r²)^(1/2)      y_2 = z_2 (−2 ln r² / r²)^(1/2)
• Then y_1 and y_2 are independent Gaussians with zero mean and unit variance
• For arbitrary mean and variance
  – If y ~ N(0,1), then σy + µ ~ N(µ, σ²)
• In the multivariate case
  – If the components of z are independent N(0,1), then y = µ + Lz has distribution N(µ, Σ)
  – where L is obtained from the Cholesky decomposition Σ = LLᵀ
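A minimal sketch combining both steps: the polar (Marsaglia) variant of the Box-Muller method for N(0,1) pairs, followed by a Cholesky factor to obtain correlated samples; the values of µ and Σ are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

def polar_gaussian_pair():
    """Polar variant of Box-Muller: two independent N(0,1) samples from uniforms."""
    while True:
        z1, z2 = rng.uniform(-1.0, 1.0, size=2)
        r2 = z1**2 + z2**2
        if 0.0 < r2 < 1.0:                        # keep only pairs inside the unit circle
            scale = np.sqrt(-2.0 * np.log(r2) / r2)
            return z1 * scale, z2 * scale

# Multivariate case: y = mu + L z with Sigma = L L^T (mu, Sigma are illustrative values)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)                     # lower-triangular factor of Sigma
z = np.array(polar_gaussian_pair())               # z ~ N(0, I)
y = mu + L @ z                                    # y ~ N(mu, Sigma)
print(y)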
4. Rejection Sampling
• Allows sampling from a relatively complex distribution
• Consider univariate, then extend to several variables
• Wish to sample from distribution p(z)
– Not a simple standard distribution
– Sampling from p(z) is difficult
• Suppose we are able to easily evaluate p(z) for any given value of z, up to a normalizing constant Z_p
    p(z) = (1/Z_p) p̃(z)      where p̃(z) is the unnormalized distribution
  – p̃(z) can readily be evaluated, but Z_p is unknown
  – e.g., p̃(z) is a mixture of Gaussians
  – Note that we may know the mixture distribution, but we still need samples in order to evaluate expectations (a minimal sketch follows)
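A minimal rejection-sampling sketch under the standard scheme (propose z0 from q(z), accept if a uniform draw under the envelope k q(z0) falls below p̃(z0)); the Gaussian-mixture target, the Gaussian proposal, and k = 5 are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Unnormalized target p_tilde(z): a two-component Gaussian mixture (illustrative choice)
def p_tilde(z):
    return 0.7 * gauss_pdf(z, -1.0, 0.5) + 0.3 * gauss_pdf(z, 2.0, 0.8)

# Proposal q(z) = N(0, 3^2) with constant k chosen so that k * q(z) >= p_tilde(z) everywhere
q_mu, q_sigma, k = 0.0, 3.0, 5.0

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        z0 = rng.normal(q_mu, q_sigma)                           # draw from the proposal q(z)
        u0 = rng.uniform(0.0, k * gauss_pdf(z0, q_mu, q_sigma))  # uniform under the envelope k*q(z0)
        if u0 <= p_tilde(z0):                                    # accept only if it falls under p_tilde(z0)
            samples.append(z0)
    return np.array(samples)

print(rejection_sample(5))

The acceptance rate falls as k grows, so k should be chosen as small as possible while still bounding p̃(z) everywhere.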
5. Importance Sampling
• Samples z^(l), l = 1,..,L, are drawn from a proposal distribution q(z):
    E[f] = ∫ f(z) p(z) dz
         = ∫ f(z) (p(z)/q(z)) q(z) dz
         ≈ (1/L) Σ_{l=1}^L (p(z^(l))/q(z^(l))) f(z^(l))
• Unlike rejection sampling, all of the samples are retained
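A minimal importance-sampling sketch; the target N(1, 0.5²), the proposal N(0, 2²), and f(z) = z are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Illustrative choices: target p(z) = N(1, 0.5^2), proposal q(z) = N(0, 2^2), f(z) = z
L = 10_000
z = rng.normal(0.0, 2.0, size=L)                         # all L samples are drawn from q(z)
ratio = gauss_pdf(z, 1.0, 0.5) / gauss_pdf(z, 0.0, 2.0)  # importance ratios p(z^(l)) / q(z^(l))
estimate = np.mean(ratio * z)                            # (1/L) * sum_l ratio_l * f(z^(l))
print(estimate)                                          # close to E_p[z] = 1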
Likelihood-weighted Sampling
• Importance sampling of a graphical model using ancestral sampling
• Evidence variables are clamped to their observed values; the remaining variables are sampled ancestrally from p(z_i | pa_i)
• Weight of sample z given evidence variables E:
    w(z) = Π_{z_i ∈ E} p(z_i | pa_i)   (see the sketch below)
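The sketch referred to above: likelihood weighting on a hypothetical two-node BN D -> G with evidence G = 1; the structure and CPT values are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary BN D -> G (CPT values are made up for illustration)
cpds = [
    ("D", (),     {(): [0.6, 0.4]}),
    ("G", ("D",), {(0,): [0.3, 0.7], (1,): [0.8, 0.2]}),
]
evidence = {"G": 1}                    # evidence variable clamped to its observed value

def weighted_sample():
    """Ancestral pass: sample non-evidence nodes, clamp evidence nodes,
    and multiply p(observed value | parents) into the sample weight."""
    z, w = {}, 1.0
    for node, parents, table in cpds:
        probs = table[tuple(z[p] for p in parents)]
        if node in evidence:
            z[node] = evidence[node]
            w *= probs[z[node]]        # weight contribution of the evidence node
        else:
            z[node] = rng.choice(len(probs), p=probs)
    return z, w

samples = [weighted_sample() for _ in range(10_000)]
# Weighted frequencies approximate posteriors, e.g. P(D = 1 | G = 1)
num = sum(w for z, w in samples if z["D"] == 1)
den = sum(w for z, w in samples)
print(num / den)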
SIR Method
• Two stages
• Stage 1: L samples z(1),..,z(L) are drawn from q(z)
• Stage 2: Weights w_1,..,w_L are constructed
  – As in importance sampling, w_l ∝ p̃(z^(l)) / q(z^(l)), normalized so that Σ_l w_l = 1
• Finally, a second set of L samples is drawn from the discrete distribution {z^(1),..,z^(L)} with probabilities given by {w_1,..,w_L}
• If moments wrt distribution p(z) are needed,
use:
    E[f] = ∫ f(z) p(z) dz ≈ Σ_{l=1}^L w_l f(z^(l))
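A minimal sampling-importance-resampling sketch; the target N(1, 0.5²) and proposal N(0, 2²) are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Illustrative choices: target p(z) = N(1, 0.5^2), proposal q(z) = N(0, 2^2)
L = 5_000

# Stage 1: draw L samples from q(z)
z = rng.normal(0.0, 2.0, size=L)

# Stage 2: construct weights w_l proportional to p(z^(l)) / q(z^(l)), normalized to sum to 1
w = gauss_pdf(z, 1.0, 0.5) / gauss_pdf(z, 0.0, 2.0)
w /= w.sum()

# Resample: draw a second set of L samples from {z^(l)} with probabilities {w_l}
resampled = rng.choice(z, size=L, replace=True, p=w)
print(resampled.mean())            # close to E_p[z] = 1

Note that only ratios enter the weights before normalization, so the same code applies when just the unnormalized p̃(z) is available.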