AAEC 6984 / SPRING 2014 Instructor: Klaus Moeltner
Assume throughout that θ1, θ2, and z are random elements (scalars or vectors), and y symbolizes observed data. The letter p will denote a generic distribution or probability.
Of course, other split-ups are possible as well. The split-up is usually chosen so that as many known densities as possible appear on the right-hand side. If the original joint density is already conditioned on some other variable, that conditioning is carried through all subsequent components.
Example:
\[
p(\theta_1, \theta_2, z \mid y) = p(\theta_1 \mid y)\, p(\theta_2, z \mid \theta_1, y) = p(\theta_1 \mid y)\, p(\theta_2 \mid \theta_1, y)\, p(z \mid \theta_1, \theta_2, y) \tag{2}
\]
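To make the split-up concrete, here is a minimal Python sketch (not from the original notes) that draws from a joint by composing its components, assuming a hypothetical bivariate normal for (θ1, θ2) with z dropped; all parameter values are illustrative:

```python
import numpy as np

# Hypothetical example (not from the notes): draw from the joint
# p(theta1, theta2) by composing p(theta1) and p(theta2 | theta1),
# mirroring the split-up in equation (2) with z dropped.
rng = np.random.default_rng(0)
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.8  # bivariate normal parameters

R = 100_000
theta1 = rng.normal(mu1, s1, size=R)                # draws from p(theta1)
cond_mean = mu2 + rho * (s2 / s1) * (theta1 - mu1)  # E(theta2 | theta1)
cond_sd = s2 * np.sqrt(1 - rho**2)                  # sd(theta2 | theta1)
theta2 = rng.normal(cond_mean, cond_sd)             # draws from p(theta2 | theta1)

# (theta1, theta2) pairs are draws from the joint; the sample correlation
# should be close to the true rho:
print(np.corrcoef(theta1, theta2)[0, 1])  # approx 0.8
```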
The second trick is marginalization by integration (the “integration trick”): the marginal density of θ1 obtains by integrating the joint over the remaining random elements,
\[
p(\theta_1) = \int_{\theta_2, z} p(\theta_1, \theta_2, z)\, dz\, d\theta_2 = \int_{\theta_2, z} p(\theta_1 \mid \theta_2, z)\, p(\theta_2, z)\, dz\, d\theta_2 \tag{3}
\]
Conditioning on the observed data y, the same operation yields
\[
p(\theta_1 \mid y) = \int_{\theta_2, z} p(\theta_1, \theta_2, z \mid y)\, dz\, d\theta_2 = \int_{\theta_2, z} p(\theta_1 \mid \theta_2, z, y)\, p(\theta_2, z \mid y)\, dz\, d\theta_2 \tag{4}
\]
Thus, draws from the marginal p(θ1|y) can be obtained by drawing from p(θ1|θ2,r, zr, y) for many different draws of θ2,r, zr from p(θ2, z|y).
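As a sketch of this composition idea, the following again assumes the hypothetical bivariate-normal setting above (z dropped): each θ1 drawn conditionally on a θ2 draw is itself a draw from the marginal of θ1.

```python
import numpy as np

# Hypothetical bivariate-normal illustration of equation (4) with z dropped:
# each theta1 drawn from p(theta1 | theta2_r, y) is a draw from the
# marginal p(theta1 | y).
rng = np.random.default_rng(1)
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.8

R = 100_000
theta2 = rng.normal(mu2, s2, size=R)                # theta2_r from p(theta2 | y)
cond_mean = mu1 + rho * (s1 / s2) * (theta2 - mu2)  # E(theta1 | theta2_r, y)
cond_sd = s1 * np.sqrt(1 - rho**2)
theta1 = rng.normal(cond_mean, cond_sd)             # theta1 from p(theta1 | theta2_r, y)

# The theta1 draws follow the marginal, here N(mu1, s1^2):
print(theta1.mean(), theta1.std())  # approx 0.0 and 1.0
```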
The Gibbs Sampler is a special case of this strategy with a built-in reciprocity condition. Dropping z for convenience and without loss of generality, assume we need draws from p(θ1|y), but we only know the form of p(θ1|θ2, y). Using the integration trick and the “breaking up a joint”-trick, we obtain:
\[
p(\theta_1 \mid y) = \int_{\theta_2} p(\theta_1, \theta_2 \mid y)\, d\theta_2 = \int_{\theta_2} p(\theta_1 \mid \theta_2, y)\, p(\theta_2 \mid y)\, d\theta_2 \tag{5}
\]
The problem here is that we don't know the other marginal either, i.e. we don't know p(θ2|y). However, if we have one draw of θ2 (our starting value for the GS), we can take a single draw of θ1 from p(θ1|θ2, y). By the reasoning above, this will also be a draw from the marginal p(θ1|y). We can then set up the reverse integration problem for θ2, i.e.
\[
p(\theta_2 \mid y) = \int_{\theta_1} p(\theta_1, \theta_2 \mid y)\, d\theta_1 = \int_{\theta_1} p(\theta_2 \mid \theta_1, y)\, p(\theta_1 \mid y)\, d\theta_1 \tag{6}
\]
If p(θ2|θ1, y) is known, we can draw θ2 from it (conditioning on the draw of θ1 we just obtained from the first step). This will also be a draw from p(θ2|y). This process is then repeated many times to yield draws from the entire support of p(θ1|y) and p(θ2|y).
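A minimal Gibbs Sampler sketch, again under the hypothetical bivariate-normal posterior (both full conditionals known in closed form; the starting value and burn-in length are illustrative choices, not prescriptions from the notes):

```python
import numpy as np

# Minimal Gibbs Sampler sketch for the same hypothetical bivariate-normal
# posterior: both full conditionals are known normals, neither marginal is
# used directly.
rng = np.random.default_rng(2)
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.8
R, burn = 20_000, 1_000

theta1_draws = np.empty(R)
theta2_draws = np.empty(R)
theta2 = 0.0  # starting value for the GS

for r in range(R):
    # draw theta1 from p(theta1 | theta2, y)
    m1 = mu1 + rho * (s1 / s2) * (theta2 - mu2)
    theta1 = rng.normal(m1, s1 * np.sqrt(1 - rho**2))
    # draw theta2 from p(theta2 | theta1, y), conditioning on the fresh theta1
    m2 = mu2 + rho * (s2 / s1) * (theta1 - mu1)
    theta2 = rng.normal(m2, s2 * np.sqrt(1 - rho**2))
    theta1_draws[r], theta2_draws[r] = theta1, theta2

# After burn-in the draws cover the marginals p(theta1|y) and p(theta2|y):
print(theta1_draws[burn:].mean(), theta1_draws[burn:].std())  # approx 0.0, 1.0
print(theta2_draws[burn:].mean(), theta2_draws[burn:].std())  # approx 1.0, 2.0
```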
Example: consider the marginal density evaluated at a fixed point θ̄1:
\[
p(\bar{\theta}_1 \mid y) = \int_{\theta_2} p(\bar{\theta}_1, \theta_2 \mid y)\, d\theta_2 = \int_{\theta_2} p(\bar{\theta}_1 \mid \theta_2, y)\, p(\theta_2 \mid y)\, d\theta_2 \tag{7}
\]
If we know p(θ1|θ2, y), and we have draws of θ2 from p(θ2|y), we can approximate p(θ̄1|y) via:
\[
p(\bar{\theta}_1 \mid y) \approx \frac{1}{R} \sum_{r=1}^{R} p(\bar{\theta}_1 \mid \theta_{2,r}, y) \tag{8}
\]
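A sketch of the approximation in (8), same hypothetical setting; scipy's normal pdf stands in for the known conditional density, and the evaluation point θ̄1 = 0.5 is an arbitrary illustrative choice:

```python
import numpy as np
from scipy.stats import norm

# Sketch of the density approximation in equation (8) for the same
# hypothetical example: average the known conditional density
# p(theta1_bar | theta2_r, y) over draws theta2_r from p(theta2 | y).
rng = np.random.default_rng(3)
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.8

R = 100_000
theta2 = rng.normal(mu2, s2, size=R)  # theta2_r from p(theta2 | y)
theta1_bar = 0.5                      # fixed evaluation point (illustrative)

cond_mean = mu1 + rho * (s1 / s2) * (theta2 - mu2)
cond_sd = s1 * np.sqrt(1 - rho**2)
approx = norm.pdf(theta1_bar, cond_mean, cond_sd).mean()  # (1/R) * sum over r

# Compare with the true marginal density N(mu1, s1^2) at theta1_bar:
print(approx, norm.pdf(theta1_bar, mu1, s1))  # both approx 0.352
```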
The same strategy applies to posterior moments. For the posterior mean of θ1, for example,
\[
E(\theta_1 \mid y) = \int_{\theta_1} \int_{\theta_2} \theta_1\, p(\theta_1, \theta_2 \mid y)\, d\theta_2\, d\theta_1 = \int_{\theta_1} \int_{\theta_2} \theta_1\, p(\theta_1 \mid \theta_2, y)\, p(\theta_2 \mid y)\, d\theta_2\, d\theta_1 \tag{9}
\]
which can be approximated as \(\frac{1}{R}\sum_{r=1}^{R} \theta_{1,r}\), using r = 1 . . . R draws of θ1 from p(θ1|θ2,r, y), which themselves are based on r = 1 . . . R draws of θ2 from p(θ2|y).
The same logic holds for any other (smooth, continuous) function g(θ1|y), which is exploited when generating posterior predictive distributions (PPDs).
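Finally, a sketch of the Monte Carlo estimates behind (9) and the g(·) remark, again under the assumed bivariate-normal example (the choice g = exp is purely illustrative):

```python
import numpy as np

# Sketch of the Monte Carlo estimates behind equation (9) and the g(.)
# remark, same hypothetical setting: theta2_r from p(theta2 | y), then
# theta1_r from p(theta1 | theta2_r, y), then average theta1_r or g(theta1_r).
rng = np.random.default_rng(4)
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.8

R = 100_000
theta2 = rng.normal(mu2, s2, size=R)
theta1 = rng.normal(mu1 + rho * (s1 / s2) * (theta2 - mu2),
                    s1 * np.sqrt(1 - rho**2))

print(theta1.mean())          # approx E(theta1 | y) = mu1 = 0.0
print(np.exp(theta1).mean())  # approx E(exp(theta1) | y) = exp(0.5) ~ 1.649
```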