
Introduction to Importance Sampling

Xingjie

2023-12-14

Definition
Suppose we want to compute the expectation of a function f(X) with respect to a distribution with density p(x). So
$$
I = \int f(x)\,p(x)\,dx.
$$
That is,
$$
I = E[f(X)],
$$
where the expectation is taken over a random variable X with density p. We could approximate I by "naive simulation":
$$
I \approx \frac{1}{M} \sum_i f(X_i),
$$
where $X_1, \dots, X_M \sim p(x)$.
Now let q(x) denote any other density function that is non-zero whenever p(x) is non-zero. (We need this condition to avoid dividing by 0 in what follows.) Then we can rewrite this as
$$
I = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx.
$$
That is,
$$
I = E\!\left[\frac{f(X)\,p(X)}{q(X)}\right],
$$
where the expectation is taken over a random variable X with density q. So we could also approximate I by simulation:
$$
I \approx \frac{1}{M} \sum_i \frac{p(X_i')}{q(X_i')}\,f(X_i') = \frac{1}{M} \sum_i w(X_i')\,f(X_i'),
$$
where $X_1', \dots, X_M' \sim q(x)$.
This is called "Importance Sampling" (IS), and q is called the "importance sampling function". The quantities $w(X_i') = p(X_i')/q(X_i')$ are known as importance weights; they correct the bias introduced by sampling from the "wrong" distribution q rather than p.
A practical difficulty with this approach is that finding a good q becomes harder as the dimensionality of X grows: a poorly chosen q can make the importance weights extremely variable, so that a few samples dominate the sum.
The idea behind IS is that if q is well chosen then the approximation to I will be better than the naive
approximation.
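
For example, here is a small sketch comparing naive simulation with IS on a toy problem where the answer is known: estimating I = E[X^4] for X ∼ N(0,1) (so I = 3), using q = N(0, 2^2) as the importance sampling function. The particular f and q here are arbitrary choices, just for illustration.

# Illustrative sketch: naive simulation vs IS for I = E[X^4] with X ~ N(0,1), so I = 3
set.seed(1)
M = 100000
f = function(x) x^4

# naive simulation: sample directly from p = N(0,1)
X = rnorm(M)
I.naive = mean(f(X))

# importance sampling: sample from q = N(0, sd = 2) and reweight by w = p/q
Xq = rnorm(M, 0, 2)
w  = dnorm(Xq, 0, 1)/dnorm(Xq, 0, 2)
I.IS = mean(w*f(Xq))

c(naive = I.naive, IS = I.IS)   # both should be close to 3

Here q has heavier tails than p, so the weights p(x)/q(x) stay bounded; a q with lighter tails than p would make the weights, and hence the estimator, far more variable.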

Example 1
Suppose X ∼ N(0, 1), and we want to compute Pr(X > z) for z = 10. That is, f(x) = I(x > 10) and p(x) = φ(x) is the density of the standard normal distribution.
Let's try naive simulation, and compare it with the "truth", as ascertained by the R function pnorm.
set.seed(100)
X = rnorm(100000)   # draws from p = N(0,1)
mean(1*(X>10))      # naive Monte Carlo estimate of Pr(X > 10)

## [1] 0
pnorm(10,lower.tail=FALSE)

## [1] 7.619853e-24
You can see the problem with naive simulation: all the simulations are less than 10 (where f(x)=0), so you
don’t get any precision.
Now we use IS. Here we code the general case for z, using the IS function q to be N(z, 1). The importance weight is then
$$
w(y) = \frac{\phi(y)}{q(y)} = \exp\!\left[-\frac{y^2}{2} + \frac{(y-z)^2}{2}\right].
$$
Note that because of this choice of q many of the samples are > z, where f is non-zero, so we hope to get better precision. Of course, we could do this problem in much better ways... this is just a toy illustration of IS.
pnorm.IS = function(z, nsamp=100000){
  y = rnorm(nsamp, z, 1)                                           # sample from q = N(z,1)
  w = exp(dnorm(y, 0, 1, log = TRUE) - dnorm(y, z, 1, log = TRUE)) # weights w(y) = phi(y)/q(y)
  mean(w*(y>z))                                                    # IS estimate of Pr(X > z)
}
pnorm.IS(10)

## [1] 7.673529e-24
pnorm(10,lower.tail=FALSE)

## [1] 7.619853e-24
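
As a quick check on the precision of the IS estimate, one can repeat it a few times; the estimates should all land close to the pnorm value, whereas naive simulation returned exactly 0.

set.seed(101)
replicate(5, pnorm.IS(10))   # repeated IS estimates; each should be close to 7.6e-24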

Example 2: computing with means on log scale


We just push this example a bit further, to illustrate a numerical issue that can arise quite generally (not just
for IS).
Let’s try the above with z = 100.
pnorm.IS(100)

## [1] 0
pnorm(100,lower.tail=FALSE)

## [1] 0
Hmmm.. we are having numerical issues.
The trick to solving this is to do things on the log scale:
$$
\log \Pr(X > z) = \log\!\left[\frac{1}{M} \sum_i w(X_i')\,I(X_i' > z)\right] = \log\frac{m}{M} + \log\!\left[\frac{1}{m} \sum_i w(X_i')\,I(X_i' > z)\right],
$$
where $m = \sum_i I(X_i' > z)$ is the number of samples exceeding z. The problem is that when z is this large each individual weight $w(X_i')$ is so small that it underflows to 0 in double precision, so we also need to compute the mean of the weights on the log scale. The following function computes the log of the mean of exp(lx) from the log-values lx, without ever forming the underflowing exponentials.
# function to find the log of the mean of exp(lx), avoiding underflow
# by subtracting off the maximum before exponentiating
lmean = function(lx){
  m = max(lx)
  m + log(mean(exp(lx-m)))
}
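
To see why this helps, compare it with exponentiating directly: with log-values around -5000, exp() underflows to 0 in double precision, so the direct computation returns -Inf, while lmean subtracts off the maximum first and returns a finite answer. For example:

lx = c(-5000, -5001, -5002)
log(mean(exp(lx)))   # -Inf, because exp(-5000) underflows to 0
lmean(lx)            # finite, approximately -5000.69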

Exploiting this we can now do IS for z = 100:


lpnorm.IS = function(z, nsamp=100000){
  y = rnorm(nsamp, z, 1)                                     # sample from q = N(z,1)
  lw = dnorm(y, 0, 1, log=TRUE) - dnorm(y, z, 1, log=TRUE)   # log-weights, log w(y)
  log(mean(y>z)) + lmean(lw[y>z])                            # log(m/M) + log of mean weight among y > z
}
lpnorm.IS(100)

## [1] -5005.571
pnorm(100,lower.tail=FALSE,log=TRUE)

## [1] -5005.524
Acknowledgement: This lecture note is from Matthew Stephens.
