

Basic Sampling Methods

Sargur Srihari
srihari@cedar.buffalo.edu


Topics
1. Motivation
   – Intractability in ML
   – How sampling can help
2. Ancestral Sampling
   – Using BNs
3. Transforming a Uniform Distribution
4. Rejection Sampling
5. Importance Sampling
6. Sampling-Importance-Resampling

1. Motivation
•  When exact inference is intractable, we need some form of approximation
   –  True of probabilistic models of practical significance
•  Inference methods based on numerical sampling are known as Monte Carlo techniques
•  Most situations require evaluating expectations of unobserved variables, e.g., to make predictions
   –  Rather than the posterior distribution itself


Common Task

•  Find the expectation of some function f(z)
   –  with respect to a probability distribution p(z)
   –  Components of z can be discrete, continuous, or a combination
•  In the continuous case we wish to evaluate
      E[f] = ∫ f(z) p(z) dz
•  In the discrete case the integral is replaced by a summation

Illustration: in Bayesian regression, from the sum and product rules,
      p(t | x) = ∫ p(t | x, w) p(w) dw,   where p(t | x, w) ~ N(y(x, w), β⁻¹)
In general such expectations are too complex to be evaluated analytically.

Sampling: main idea

•  Obtain a set of samples z^(l), where l = 1,..,L
•  Drawn independently from the distribution p(z)
•  Allows the expectation E[f] = ∫ f(z) p(z) dz to be approximated by the finite sum
      f̂ = (1/L) ∑_{l=1}^{L} f(z^(l))     (called an estimator)
•  Then E[f̂] = E[f], i.e., the estimator has the correct mean
   –  And var[f̂] = (1/L) E[(f − E[f])²], which is the variance of the estimator
•  Accuracy is independent of the dimensionality of z
   –  High accuracy can be achieved with few (10 or 20) samples
•  However, the samples may not be independent
   –  The effective sample size may be smaller than the apparent sample size
   –  If f(z) is small where p(z) is high, and vice versa, the expectation may be dominated by regions of small probability, thereby requiring large sample sizes
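A minimal Python sketch (not part of the original slides) of the estimator above: it approximates E[f] for the illustrative choices f(z) = z² and p(z) = N(0, 1), for which the true value is 1.

import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_expectation(f, sampler, L=20):
    """Estimator f_hat = (1/L) * sum_l f(z_l) from L independent samples."""
    z = sampler(L)
    return np.mean(f(z))

f = lambda z: z**2                           # function whose expectation we want
sampler = lambda L: rng.standard_normal(L)   # draws z ~ p(z) = N(0, 1)

print(monte_carlo_expectation(f, sampler, L=20))       # rough estimate from 20 samples
print(monte_carlo_expectation(f, sampler, L=100000))   # close to the true value 1.0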


2. Ancestral Sampling
•  If the joint distribution is represented by a BN
   –  with no observed variables
   –  a straightforward sampling method exists
•  The distribution is specified by
      p(z) = ∏_{i=1}^{M} p(z_i | pa_i)
   –  where z_i is the set of variables associated with node i and
   –  pa_i is the set of variables associated with the parents of node i
•  To obtain samples from the joint
   –  we make one pass through the set of variables in the order z_1,..,z_M, sampling from the conditional distribution p(z_i | pa_i)
•  After one pass through the graph we obtain one sample
•  The frequency of different values defines the distribution
   –  E.g., allowing us to determine marginals such as
      P(L, S) = ∑_{D,I,G} P(D, I, G, L, S) = ∑_{D,I,G} P(D) P(I) P(G | D, I) P(L | G) P(S | I)
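A minimal Python sketch of ancestral sampling on a small binary network over the variables D, I, G, L, S appearing in the marginal above; the structure D → G ← I, G → L, I → S and all conditional probability values are assumed purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

def bernoulli(p):
    return int(rng.random() < p)

def sample_once():
    # One ancestral pass: parents are always sampled before their children
    D = bernoulli(0.4)                              # p(D=1)
    I = bernoulli(0.3)                              # p(I=1)
    G = bernoulli([[0.2, 0.6], [0.5, 0.9]][I][D])   # p(G=1 | D, I), indexed [I][D]
    L = bernoulli([0.1, 0.8][G])                    # p(L=1 | G)
    S = bernoulli([0.05, 0.7][I])                   # p(S=1 | I)
    return D, I, G, L, S

# Frequencies over many passes approximate marginals such as P(L=1, S=1)
samples = [sample_once() for _ in range(100000)]
print(np.mean([(l == 1 and s == 1) for _, _, _, l, s in samples]))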

Ancestral sampling with some nodes instantiated

•  Directed graph where some nodes are instantiated with observed values
      P(L = l0, S = s1) = ∑_{D,I,G} P(D) P(I) P(G | D, I) P(L = l0 | G) P(S = s1 | I)
•  Called logic sampling
•  Use ancestral sampling, except
   –  When a sample is obtained for an observed variable:
      •  if they agree, the sampled value is retained and we proceed to the next variable
      •  if they don't agree, the whole sample is discarded
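A minimal sketch of logic sampling, reusing sample_once() and the illustrative probabilities from the previous sketch; here agreement with the evidence is checked after a full ancestral pass, and the acceptance rate estimates the probability of the evidence.

def logic_sampling(evidence, n_samples=100000):
    kept = []
    for _ in range(n_samples):
        D, I, G, L, S = sample_once()
        sample = {"D": D, "I": I, "G": G, "L": L, "S": S}
        if all(sample[v] == val for v, val in evidence.items()):
            kept.append(sample)              # agrees with the evidence: retain
        # otherwise the whole sample is discarded
    return kept

kept = logic_sampling({"L": 0, "S": 1})
print(len(kept) / 100000)                        # estimate of P(L=0, S=1)
print(sum(s["G"] for s in kept) / len(kept))     # estimate of P(G=1 | L=0, S=1)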

Properties of Logic Sampling

      P(L = l0, S = s1) = ∑_{D,I,G} P(D) P(I) P(G | D, I) P(L = l0 | G) P(S = s1 | I)

•  Samples correctly from the posterior distribution
   –  Corresponds to sampling from the joint distribution of hidden and data variables
•  But the probability of accepting a sample decreases as
   –  the number of variables increases and
   –  the number of states the variables can take increases
•  A special case of importance sampling
   –  Rarely used in practice

Undirected Graphs
•  There is no one-pass sampling strategy, even for the prior distribution with no observed variables
•  Computationally expensive methods such as Gibbs sampling must be used
   –  Start with a sample
   –  Replace the first variable conditioned on the rest of the values, then the next variable, etc.
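A minimal Gibbs-sampling sketch (an illustration, not part of the slides) for a bivariate Gaussian with correlation rho, chosen because both full conditionals are available in closed form.

import numpy as np

rng = np.random.default_rng(0)
rho = 0.8

def gibbs(n_samples=5000, burn_in=500):
    z1, z2 = 0.0, 0.0                        # start with an arbitrary sample
    samples = []
    for t in range(n_samples + burn_in):
        # replace each variable in turn, conditioned on the current value of the other
        z1 = rng.normal(rho * z2, np.sqrt(1 - rho**2))   # sample from p(z1 | z2)
        z2 = rng.normal(rho * z1, np.sqrt(1 - rho**2))   # sample from p(z2 | z1)
        if t >= burn_in:
            samples.append((z1, z2))
    return np.array(samples)

print(np.corrcoef(gibbs().T)[0, 1])          # should be close to rho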


3. Basic Sampling Algorithms

•  Simple strategies for generating random samples from a given distribution
   –  They will be pseudo-random numbers that pass a test for randomness
•  Assume the algorithm is provided with a pseudo-random generator for the uniform distribution over (0,1)
•  For standard distributions we can use the transformation method for generating non-uniformly distributed random numbers

Transformation Method
•  Goal: generate random numbers from a simple non-uniform distribution
•  Assume we have a source of uniformly distributed random numbers
•  z is uniformly distributed over (0,1), i.e., p(z) = 1 on that interval
   [Figure: the uniform density p(z) over (0, 1)]


Transformation for Standard Distributions

[Figure: the uniform density p(z) on (0,1) is transformed through y = f(z) to give the density p(y)]

•  If we transform the values of z using a function f() such that y = f(z)
•  The distribution of y is governed by
      p(y) = p(z) |dz/dy|     (1)
•  The goal is to choose f(z) such that the values of y have the desired distribution p(y)
•  Integrating (1) above
   –  Since p(z) = 1 and the integral of dz/dy with respect to y is z,
      z = h(y) ≡ ∫_{−∞}^{y} p(ŷ) dŷ
   –  which is the indefinite integral of p(y)
•  Thus y = h⁻¹(z)
•  So we have to transform the uniformly distributed random numbers
   –  using the function which is the inverse of the indefinite integral of the desired distribution

Geometry of Transformation
•  We are interested in generating random variables from p(y)
   –  non-uniform random variables
•  h(y) is the indefinite integral of the desired p(y)
•  z ~ U(0,1) is transformed using y = h⁻¹(z)
•  This results in y being distributed as p(y)

Transformations for Exponential & Cauchy

•  We need samples from the Exponential distribution
      p(y) = λ exp(−λy),   where 0 ≤ y < ∞
   –  In this case the indefinite integral is
      z = h(y) = ∫₀^y p(ŷ) dŷ = 1 − exp(−λy)
   –  If we transform using y = −λ⁻¹ ln(1 − z)
   –  then y will have an exponential distribution
•  We need samples from the Cauchy distribution
      p(y) = (1/π) · 1/(1 + y²)
   –  The inverse of the indefinite integral can be expressed as a "tan" function
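A minimal sketch of both inverse transformations; the rate lam = 2.0 is an arbitrary illustrative choice.

import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=100000)     # z ~ U(0, 1)

# Exponential: h(y) = 1 - exp(-lam*y), so y = h^{-1}(z) = -(1/lam) * ln(1 - z)
lam = 2.0
y_exp = -np.log(1.0 - z) / lam
print(y_exp.mean())                        # close to the exponential mean 1/lam = 0.5

# Cauchy: h(y) = 1/2 + arctan(y)/pi, so y = h^{-1}(z) = tan(pi * (z - 1/2))
y_cauchy = np.tan(np.pi * (z - 0.5))
print(np.median(y_cauchy))                 # close to the Cauchy median 0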

Generalization of Transformation Method

•  Single variable transformation
      p(y) = p(z) |dz/dy|
   –  where z is uniformly distributed over (0,1)
•  Multiple variable transformation
      p(y_1,..,y_M) = p(z_1,..,z_M) |∂(z_1,..,z_M)/∂(y_1,..,y_M)|
   –  where the last factor is the determinant of the Jacobian of the transformation

Transformation Method for Gaussian

•  Box-Muller method for the Gaussian
   –  Example of a bivariate Gaussian
•  First generate a uniform distribution inside the unit circle
   –  Generate pairs of uniformly distributed random numbers z1, z2 ∈ (−1, 1)
      •  Can be done from U(0,1) using z → 2z − 1
   –  Discard each pair unless z1² + z2² < 1
   –  Leads to a uniform distribution of points inside the unit circle with
      p(z1, z2) = 1/π

Generating a Gaussian
•  For each retained pair z1, z2 evaluate the quantities
      y1 = z1 (−2 ln r² / r²)^{1/2},   y2 = z2 (−2 ln r² / r²)^{1/2},   where r² = z1² + z2²
   –  Then y1 and y2 are independent Gaussians with zero mean and unit variance
•  For arbitrary mean and variance
   –  If y ~ N(0,1), then σy + µ has distribution N(µ, σ²)
•  In the multivariate case
   –  If the components of z are independent and N(0,1), then y = µ + Lz has distribution N(µ, Σ),
      where L is obtained from the Cholesky decomposition Σ = LLᵀ
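A minimal sketch of the polar Box-Muller construction above, followed by the Cholesky construction for an arbitrary mean and covariance; the values of mu and Sigma are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def box_muller_pair():
    """One pair of independent N(0,1) samples via the polar Box-Muller method."""
    while True:
        z1, z2 = rng.uniform(-1.0, 1.0, size=2)    # uniform in the square (-1,1)^2
        r2 = z1**2 + z2**2
        if 0.0 < r2 < 1.0:                         # keep only points inside the unit circle
            factor = np.sqrt(-2.0 * np.log(r2) / r2)
            return z1 * factor, z2 * factor

# Multivariate case: y = mu + L z has distribution N(mu, Sigma), where Sigma = L L^T
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)                      # lower-triangular Cholesky factor
z = np.array(box_muller_pair())
y = mu + L @ z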

Limitation of Transformation Method

•  Need to first calculate and then invert the indefinite integral of the desired distribution
•  Feasible only for a small number of distributions
•  Need alternative approaches
•  Rejection sampling and importance sampling are applicable to univariate distributions only
   –  But useful as components in more general strategies

4. Rejection Sampling
•  Allows sampling from a relatively complex distribution
•  Consider the univariate case first, then extend to several variables
•  We wish to sample from a distribution p(z)
   –  Not a simple standard distribution
   –  Sampling from p(z) directly is difficult
•  Suppose we are able to easily evaluate p(z) for any given value of z, up to a normalizing constant Z_p:
      p(z) = (1/Z_p) p̃(z)     (p̃ unnormalized)
   –  where p̃(z) can readily be evaluated but Z_p is unknown
   –  e.g., p̃(z) is a mixture of Gaussians
   –  Note that we may know the mixture distribution, but we need samples to compute expectations


Rejection sampling: Proposal distribution

•  Samples are drawn from a simple distribution, called the proposal distribution q(z)
•  Introduce a constant k whose value is such that kq(z) ≥ p̃(z) for all z
   –  kq(z) is called the comparison function

Rejection Sampling Intuition

•  Samples are drawn from the simple distribution q(z)
•  They are rejected if they fall in the grey area
   –  between the unnormalized distribution p̃(z) and the scaled distribution kq(z)
•  The resulting samples are distributed according to p(z), which is the normalized version of p̃(z)

Determining if a sample is in the shaded area

•  Generate two random numbers
   –  z0 from q(z)
   –  u0 from the uniform distribution over [0, kq(z0)]
•  This pair has uniform distribution under the curve of the function kq(z)
•  If u0 > p̃(z0) the pair is rejected, otherwise it is retained
•  The remaining pairs have uniform distribution under the curve of p̃(z), and hence the corresponding z values are distributed according to p(z), as desired
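A minimal rejection-sampling sketch following these steps; the unnormalized target p̃(z) (a two-component Gaussian mixture), the Gaussian proposal q(z), and the constant k = 15 are illustrative choices assumed to satisfy kq(z) ≥ p̃(z) everywhere.

import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # unnormalized target: a two-component Gaussian mixture
    return np.exp(-0.5 * (z - 2.0)**2) + 0.5 * np.exp(-0.5 * (z + 2.0)**2)

q_mean, q_std = 0.0, 3.0
def q(z):
    # proposal density: a single broad Gaussian
    return np.exp(-0.5 * ((z - q_mean) / q_std)**2) / (q_std * np.sqrt(2 * np.pi))

k = 15.0                                            # assumed: k*q(z) >= p_tilde(z) for all z

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        z0 = rng.normal(q_mean, q_std)              # z0 drawn from q(z)
        u0 = rng.uniform(0.0, k * q(z0))            # u0 uniform over [0, k*q(z0)]
        if u0 <= p_tilde(z0):                       # retain, otherwise reject
            samples.append(z0)
    return np.array(samples)

samples = rejection_sample(10000)                   # distributed according to the normalized p(z)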

Example of Rejection Sampling

•  Task of sampling from the Gamma distribution
      Gam(z | a, b) = b^a z^{a−1} exp(−bz) / Γ(a)
•  Since the Gamma is roughly bell-shaped, the proposal distribution chosen is a Cauchy
•  The Cauchy has to be slightly generalized
   –  To ensure it is nowhere smaller than the Gamma
   [Figure: scaled Cauchy proposal enveloping the Gamma distribution]

Adaptive Rejection Sampling

•  Used when it is difficult to find a suitable analytic envelope distribution
•  Constructing the envelope is straightforward when p(z) is log concave
   –  i.e., when ln p(z) has derivatives that are non-increasing functions of z
   –  The function ln p(z) and its gradient are evaluated at an initial set of grid points
   –  The intersections of the tangent lines are used to construct the envelope
      •  A sequence of linear functions

Dimensionality and Rejection Sampling

•  Gaussian example
   –  [Figure: the proposal distribution q(z) is a Gaussian; its scaled version is kq(z); the true distribution is p(z)]
•  The acceptance rate is the ratio of the volumes under p(z) and kq(z)
   –  It diminishes exponentially with dimensionality
5. Importance Sampling

•  The principal reason for sampling from p(z) is to evaluate the expectation of some f(z)
      E[f] = ∫ f(z) p(z) dz
   –  Illustration: in Bayesian regression, p(t | x) = ∫ p(t | x, w) p(w) dw, where p(t | x, w) ~ N(y(x, w), β⁻¹)
•  Given samples z^(l), l = 1,..,L, from p(z), the finite sum approximation is
      f̂ = (1/L) ∑_{l=1}^{L} f(z^(l))
•  But drawing samples from p(z) may be impractical
•  Importance sampling uses:
   –  a proposal distribution, like rejection sampling
      •  But all samples are retained
   –  Assumes that for any z, p(z) can be evaluated

Determining Importance Weights

•  Samples {z^(l)} are drawn from a simpler proposal distribution q(z)
      E[f] = ∫ f(z) p(z) dz
           = ∫ f(z) (p(z)/q(z)) q(z) dz
           ≃ (1/L) ∑_{l=1}^{L} (p(z^(l))/q(z^(l))) f(z^(l))
   –  Unlike rejection sampling, all of the samples are retained
•  Samples are weighted by the ratios r_l = p(z^(l)) / q(z^(l))
   –  Known as importance weights
      •  They correct the bias introduced by sampling from the wrong distribution
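A minimal importance-sampling sketch; the target p(z) = N(0, 1) (normalized, so the plain ratios above apply), the wider Gaussian proposal q(z), and f(z) = z² are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)

def p(z):
    # normalized target density N(0, 1)
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

q_std = 2.0
def q(z):
    # proposal density N(0, q_std^2), wider than the target
    return np.exp(-0.5 * (z / q_std)**2) / (q_std * np.sqrt(2 * np.pi))

f = lambda z: z**2                       # function whose expectation we want; E_p[f] = 1

L = 100000
z = rng.normal(0.0, q_std, size=L)       # all samples are drawn from q(z) and retained
r = p(z) / q(z)                          # importance weights r_l = p(z_l) / q(z_l)
print(np.mean(r * f(z)))                 # weighted average, close to E_p[f] = 1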

Likelihood-weighted Sampling
•  Importance sampling of a graphical model using ancestral sampling
•  The weight of a sample z, given the evidence variables E, is
      r(z) = ∏_{z_i ∈ E} p(z_i | pa_i)
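A minimal likelihood-weighting sketch on the same illustrative network and probabilities used in the ancestral-sampling sketch above; evidence variables (here L and S) are clamped to their observed values rather than sampled, and each sample receives the weight defined above.

import numpy as np

rng = np.random.default_rng(0)

def bernoulli(p):
    return int(rng.random() < p)

def likelihood_weighted_sample(evidence):
    # One ancestral pass; evidence nodes contribute a factor p(z_i | pa_i) to the weight
    D = bernoulli(0.4)                              # p(D=1)
    I = bernoulli(0.3)                              # p(I=1)
    G = bernoulli([[0.2, 0.6], [0.5, 0.9]][I][D])   # p(G=1 | D, I)
    weight = 1.0
    p_L1 = [0.1, 0.8][G]                            # p(L=1 | G)
    if "L" in evidence:
        L = evidence["L"]
        weight *= p_L1 if L == 1 else 1.0 - p_L1
    else:
        L = bernoulli(p_L1)
    p_S1 = [0.05, 0.7][I]                           # p(S=1 | I)
    if "S" in evidence:
        S = evidence["S"]
        weight *= p_S1 if S == 1 else 1.0 - p_S1
    else:
        S = bernoulli(p_S1)
    return {"D": D, "I": I, "G": G, "L": L, "S": S}, weight

# Weighted estimate of P(G = 1 | L = 0, S = 1); no samples are discarded
draws = [likelihood_weighted_sample({"L": 0, "S": 1}) for _ in range(100000)]
print(sum(w * s["G"] for s, w in draws) / sum(w for _, w in draws))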


6. Sampling-Importance-Resampling (SIR)

•  Rejection sampling depends on a suitable value of k with kq(z) ≥ p̃(z)
   –  For many pairs of distributions p(z) and q(z) it is impractical to determine a value of k
   –  If k is large enough to guarantee the bound, the acceptance rate becomes impractically small
   –  [Figure: rejection sampling with a Gaussian proposal q(z), its scaled version kq(z), and the true distribution p(z)]
•  The SIR method makes use of a sampling distribution q(z) but avoids having to determine k
•  Name of the method: sampling from the proposal distribution, followed by determining importance weights, and then resampling

SIR Method
•  Two stages
•  Stage 1: L samples z^(1),..,z^(L) are drawn from q(z)
•  Stage 2: Weights w_1,..,w_L are constructed
   –  As in importance sampling, w_l ∝ p̃(z^(l)) / q(z^(l)), normalized so that ∑_l w_l = 1
•  Finally a second set of L samples is drawn from the discrete distribution over {z^(1),..,z^(L)} with probabilities given by {w_1,..,w_L}
•  If moments with respect to the distribution p(z) are needed, use:
      E[f] = ∫ f(z) p(z) dz ≃ ∑_{l=1}^{L} w_l f(z^(l))