
ECO545A: Bayesian Econometrics

April 10, 2023

Chapter 12: Endogeneity in Selected Models


In real-life applications, we often come across data in which the covariates are correlated
with the disturbance term (i.e., the exogeneity assumption is no longer tenable). Such covariates
are referred to as endogenous variables in the econometrics literature. Here, we will study
endogeneity in treatment models, unobserved covariate models, and sample selection models
subject to incidental truncation.

Treatment Models
Treatment models are used to compare the responses of individuals who belong either to the treatment group or to the control group. The average difference in response between the two groups is known as the average treatment effect or ATE. This difference gives the true treatment effect if the assignment to the groups is random, i.e., the assignment is independent of any characteristic of the individual. For example, suppose cancer patients are randomly selected to form two groups, where the first group is given an old medicine and the second group is given the new medicine. Improvement, if any, in the average years of survival for the second group may then be attributed to the new medicine (assuming compliance by both groups).
However, often whether an individual belongs to the treatment or control group is a choice
made by the individual, and the choice may be related to the unobserved covariates that are
correlated with the response variable. Such unobserved covariates are called confounders in the statistics literature; in the econometrics literature, such treatment assignment is said to be endogenous. In such cases, the ATE fails to give the true effect and may lead to misleading inference.
For example, suppose we are interested in wages and the treatment is participation in a
job training program. It is possible that people who participate in the program have more
motivation and would earn higher wages, even without participating in the program, than those
with less motivation. The problem may be (absent or) less serious if individuals are assigned
randomly to the training program, but there still may be some confounding. Such a situation
can arise, if individuals assigned to the program may choose not to participate, and individuals
not assigned to the program may find a way to participate. Inferences drawn from models that
ignore confounding may yield misleading results.

To formally explain endogeneity in the treatment effect model, suppose that the response variable is related to the covariates and the treatment assignment as follows,

y_{0i} = x_i'\beta_0 + u_{0i},
y_{1i} = x_i'\beta_1 + u_{1i},    (1)

where i = 1, \ldots, n; x_i is a (K_1 \times 1) vector of covariates; a 0 subscript indicates assignment to the control group; and 1 indicates assignment to the treatment group. The group assignment is determined by the binary variable s_i as follows,

s_i = \begin{cases} 0 & \text{if } i \text{ is in the control group}, \\ 1 & \text{if } i \text{ is in the treatment group}. \end{cases}

One primary objective of such a model is to determine the effect of the treatment on the response. Typically, this is measured by calculating the ATE. The ATE given x_i is defined as,

\text{ATE}(x_i) = E(y_{1i} - y_{0i} | x_i) = x_i'(\beta_1 - \beta_0).

For a sample of n individuals, the ATE is calculated as,


\text{ATE} = \frac{1}{n} \sum_{i=1}^{n} x_i'(\beta_1 - \beta_0).

Since an individual is assigned to either the treatment or the control group, only one of y_{0i} and y_{1i} is observed. In the presence of confounding, the data provide information about E(y_{1i} | s_i = 1) and E(y_{0i} | s_i = 0), but the difference between them is not the ATE, because of the correlation between the responses and the assignment variable. To see this, we express the difference as

E(y_{1i} | s_i = 1) - E(y_{0i} | s_i = 0) = x_i'(\beta_1 - \beta_0) + \left[ E(u_{1i} | s_i = 1) - E(u_{0i} | s_i = 0) \right].

In the presence of confounding, the term within the square brackets is not equal to zero. Hence, the difference E(y_{1i} | s_i = 1) - E(y_{0i} | s_i = 0) is not equal to the ATE.
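To make the bias concrete, here is a small simulation sketch in Python (hypothetical parameter values, not taken from the chapter): the assignment depends on a confounder ν that also enters the outcome errors, so the naive difference in observed group means drifts away from the true ATE.

```python
# A small simulation sketch (hypothetical parameter values, not from the
# chapter) showing that under confounding the naive difference in observed
# group means does not recover the true ATE.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + covariate
beta0, beta1 = np.array([1.0, 1.0]), np.array([2.0, 1.5])

# Confounder nu drives both the assignment and the outcome errors.
nu = rng.normal(size=n)
u0 = 0.8 * nu + rng.normal(scale=0.5, size=n)
u1 = 0.8 * nu + rng.normal(scale=0.5, size=n)

s = (x @ np.array([0.2, 0.5]) + nu > 0).astype(int)    # endogenous assignment
y0, y1 = x @ beta0 + u0, x @ beta1 + u1

true_ate = (x @ (beta1 - beta0)).mean()                # approx 1.0 here
naive = y1[s == 1].mean() - y0[s == 0].mean()          # biased upward
print(f"true ATE = {true_ate:.3f}, naive difference = {naive:.3f}")
```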

One approach to solving the problem caused by confounders is to model the assignment decision and use instrumental variables (IVs). IVs have two properties: (1) they are independent of u_{0i} and u_{1i}, and (2) they are not independent of s_i.
To elucidate the problem, reconsider the wage-job training example. In this case, a possible IV is the score on an intelligence test. Given a rich set of covariates, the test score is unlikely to be correlated with the errors in the wage equations, and likely to be correlated with the decision to participate. By their independence from the confounders, the IVs introduce an element of randomization into data that are not generated by random assignment.

To explain the treatment effect model, consider equation (1), the independence of z_i and (u_{0i}, u_{1i}), and

s_i^* = x_i'\gamma_1 + z_i'\gamma_2 + \nu_i, \quad \nu_i \sim N(0, 1),

s_i = \begin{cases} 0 & \text{if } s_i^* \le 0, \\ 1 & \text{if } s_i^* > 0, \end{cases}    (2)

and set

y_i = (1 - s_i) y_{0i} + s_i y_{1i},    (3)

where z_i is a (K_2 \times 1) vector of instrumental variables. The latent variable s_i^* for the treatment indicator s_i is modeled as a binary probit model. Confounding or endogeneity arises if there is correlation between u_{0i} and \nu_i or between u_{1i} and \nu_i. This correlation may arise due to an unobserved covariate. The resulting covariance matrices are,
\text{Cov}(u_{0i}, \nu_i) \equiv \Sigma_0 = \begin{pmatrix} \sigma_{00} + \omega_0^2 & \omega_0 \\ \omega_0 & 1 \end{pmatrix}, \qquad \text{Cov}(u_{1i}, \nu_i) \equiv \Sigma_1 = \begin{pmatrix} \sigma_{11} + \omega_1^2 & \omega_1 \\ \omega_1 & 1 \end{pmatrix},

where \sigma_{\nu\nu} = 1 because of the probit specification. The parameterization \sigma_{jj} + \omega_j^2 has been adopted to facilitate the MCMC algorithm. Note that \sigma_{00} and \sigma_{11} are positive, but \omega_0 and \omega_1 are unconstrained.
We first define some notation. The sets of untreated and treated observations are denoted, respectively, by N_0 = \{i : s_i = 0\} and N_1 = \{i : s_i = 1\}; n_0 and n_1 are, respectively, the numbers of observations in N_0 and N_1; y and s are vectors containing the observed data; s^* is a vector containing the s_i^*; \gamma = (\gamma_1', \gamma_2')'; and \Theta = (\beta_0, \beta_1, \gamma_1, \gamma_2, \sigma_{00}, \sigma_{11}, \omega_0, \omega_1) contains the unknown parameters. Let Y_j, X_j, Z_j, and S_j^*, respectively, be vectors and matrices with rows y_{ji}, x_i', z_i', and s_i^* for i \in N_j.

We assume that the \sigma_{jj} and \omega_j are independent and specify the following prior distributions:

\beta_j \sim N_{K_1}(b_{j0}, B_{j0}),
\gamma \sim N_K(g_0, G_0),
\sigma_{jj} \sim IG(\nu_{j0}/2, d_{j0}/2),
\omega_j \sim N(m_{j0}, M_{j0}),

where K = K_1 + K_2, and N and IG denote the normal and inverse gamma distributions, respectively.

The likelihood contribution for individual i is written as,

f(y_i, s_i | s_i^*, \Theta) = f(y_i | s_i, s_i^*, \Theta) f(s_i | s_i^*, \Theta)
= f(y_i | s_i^*, \Theta) \left[ 1(s_i = 0) 1(s_i^* \le 0) + 1(s_i = 1) 1(s_i^* > 0) \right],

where the second line uses the fact that y_i given s_i^* is independent of s_i, and s_i is degenerate given s_i^*. Since the first term is conditioned on s_i^* and \Theta, we can regard \nu_i = s_i^* - x_i'\gamma_1 - z_i'\gamma_2 as known. Whether y_i is drawn from y_{0i} or y_{1i} is also known. Based on the properties of the conditional normal distribution, we have,

y_{ji} | s_i^*, \beta_j, \gamma, \omega_j, \sigma_{jj} \sim N(x_i'\beta_j + \nu_i\omega_j, \sigma_{jj}), \quad i \in N_j, \; j = 0, 1.    (4)

Normal distribution and its properties: Let x = (x_1', x_2')' \sim N(\mu, \Sigma), where x_1 is of dimension p_1 \times 1, x_2 is of dimension p_2 \times 1 such that p_1 + p_2 = p, \mu = (\mu_1', \mu_2')', and \Sigma = (\Sigma_{11}, \Sigma_{12}; \Sigma_{21}, \Sigma_{22}). Then, the marginal and conditional distributions, respectively, for x_1 are as follows:

x_1 \sim N(\mu_1, \Sigma_{11}),
x_1 | x_2 \sim N(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}).

The marginal and conditional distributions for x2 can be obtained by interchanging the sub-
scripts.
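As a quick numerical check of these formulas (an illustrative example, not part of the chapter), the bivariate case can be verified directly with numpy:

```python
# A small numerical check of the conditional-normal formulas above
# (illustrative values, bivariate case, numpy only).
import numpy as np

mu = np.array([1.0, 2.0])                 # (mu_1, mu_2)
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])            # (S11, S12; S21, S22)

x2 = 2.5                                  # conditioning value for x_2
cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
print(cond_mean, cond_var)                # x_1 | x_2 = 2.5 ~ N(1.3, 1.64)
```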

Coming back to the treatment effect model, the conditional mean for y_{ji} is obtained as \mu_{ji} = x_i'\beta_j + \omega_j \times 1 \times (\nu_i - 0) = x_i'\beta_j + \nu_i\omega_j. The conditional variance is obtained as \Sigma_{j|-j} = \sigma_{jj} + \omega_j^2 - \omega_j \times 1 \times \omega_j = \sigma_{jj}.

The joint posterior distribution is obtained by multiplying the likelihood with the prior distributions; the conditional posterior for each object of interest is then derived while holding the remaining parameters fixed.

(1) We first find the conditional posterior for (\beta_j, \omega_j) by multiplying the likelihood contribution given by (4) with the prior distribution of (\beta_j, \omega_j). To write the distribution, define V_j as the vector with rows \nu_i for i \in N_j, \phi_j = (\beta_j', \omega_j)', W_j = (X_j, V_j), f_{j0} = (b_{j0}', m_{j0})', and

F_{j0} = \begin{pmatrix} B_{j0} & 0 \\ 0 & M_{j0} \end{pmatrix}.

The conditional posterior for (\beta_j, \omega_j) can then be expressed as,

\pi(\beta_j, \omega_j | Y_j, \gamma, S_j^*, \sigma_{jj}) \propto \exp\left( -\frac{1}{2\sigma_{jj}} (Y_j - W_j\phi_j)'(Y_j - W_j\phi_j) \right) \times \exp\left( -\frac{1}{2} (\phi_j - f_{j0})' F_{j0}^{-1} (\phi_j - f_{j0}) \right).

Opening the quadratic terms and rearranging, we obtain the following distribution:

\beta_j, \omega_j | Y_j, \gamma, S_j^*, \sigma_{jj} \sim N(\tilde{\beta}_{j0}, \tilde{B}_{j0}),    (5)

where we have used the following notation,

\tilde{B}_{j0} = \left[ F_{j0}^{-1} + \sigma_{jj}^{-1} W_j'W_j \right]^{-1},
\tilde{\beta}_{j0} = \tilde{B}_{j0} \left[ F_{j0}^{-1} f_{j0} + \sigma_{jj}^{-1} W_j'Y_j \right].
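A minimal sketch of this step, assuming numpy arrays and hypothetical argument names, draws φ_j = (β_j', ω_j)' jointly from the normal distribution in (5):

```python
# A minimal sketch (hypothetical names, numpy only) of step (1): drawing
# (beta_j, omega_j) jointly from the normal conditional posterior in (5).
import numpy as np

def draw_beta_omega(Yj, Wj, sigma_jj, f_j0, F_j0, rng):
    """One Gibbs draw of phi_j = (beta_j', omega_j)' ~ N(beta_tilde, B_tilde)."""
    F_inv = np.linalg.inv(F_j0)
    B_tilde = np.linalg.inv(F_inv + Wj.T @ Wj / sigma_jj)
    beta_tilde = B_tilde @ (F_inv @ f_j0 + Wj.T @ Yj / sigma_jj)
    return rng.multivariate_normal(beta_tilde, B_tilde)
```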

(2) The conditional posterior for \sigma_{jj} can be found by multiplying its prior with the likelihood contribution (4), which yields,

\pi(\sigma_{jj} | Y_j, S_j^*, \gamma, \phi_j) \propto \sigma_{jj}^{-n_j/2} \exp\left( -\frac{1}{2\sigma_{jj}} (Y_j - W_j\phi_j)'(Y_j - W_j\phi_j) \right) \times \sigma_{jj}^{-(\nu_{j0}/2 + 1)} \exp\left( -\frac{d_{j0}}{2\sigma_{jj}} \right).

Rearranging the above terms, we have the following:

\sigma_{jj} | Y_j, S_j^*, \gamma, \phi_j \sim IG(\tilde{\nu}_{j0}/2, \tilde{d}_{j0}/2),    (6)

where we have used the following notation,

\tilde{\nu}_{j0} = n_j + \nu_{j0},
\tilde{d}_{j0} = d_{j0} + (Y_j - W_j\phi_j)'(Y_j - W_j\phi_j).
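A corresponding sketch of the inverse-gamma draw in (6), again with hypothetical names, samples σ_jj as the reciprocal of a gamma draw:

```python
# A minimal sketch (hypothetical names) of step (2): drawing sigma_jj from
# its inverse-gamma conditional posterior in (6).
import numpy as np

def draw_sigma_jj(Yj, Wj, phi_j, nu_j0, d_j0, rng):
    resid = Yj - Wj @ phi_j
    shape = (len(Yj) + nu_j0) / 2.0
    rate = (d_j0 + resid @ resid) / 2.0
    # If X ~ Gamma(shape, rate), then 1/X ~ IG(shape, rate);
    # numpy's gamma is parameterized by scale = 1/rate.
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```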

(3) To arrive at the conditional posterior distribution for s_i^* | y_i, s_i, \Theta, we regard u_{ji} = y_{ji} - x_i'\beta_j as known. Some work yields,

s_i^* | y_i, s_i, \Theta \sim TN\left( x_i'\gamma_1 + z_i'\gamma_2 + \frac{\omega_j}{\sigma_{jj} + \omega_j^2} u_{ji}, \; \frac{\sigma_{jj}}{\sigma_{jj} + \omega_j^2} \right),    (7)

which again uses the properties of the conditional distribution from a multivariate normal distribution, and where the support of the truncated distribution is (-\infty, 0] for s_i = 0 and (0, \infty) for s_i = 1.
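A sketch of this step using scipy's truncnorm (hypothetical names; note that truncnorm expects standardized truncation bounds):

```python
# A minimal sketch (hypothetical names) of step (3): drawing the latent s_i*
# from the truncated normal in (7), using scipy's truncnorm.
import numpy as np
from scipy.stats import truncnorm

def draw_s_star(mu, var, s_i, rng):
    """Draw s_i* ~ TN(mu, var) on (-inf, 0] if s_i = 0, on (0, inf) if s_i = 1."""
    sd = np.sqrt(var)
    if s_i == 0:
        a, b = -np.inf, (0.0 - mu) / sd   # standardized truncation bounds
    else:
        a, b = (0.0 - mu) / sd, np.inf
    return truncnorm.rvs(a, b, loc=mu, scale=sd, random_state=rng)
```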
(4) To find the full conditional distribution of \gamma, define U_j as the vector of u_{ji} for i \in N_j, P_j = (X_j, Z_j), and

T_j^* = S_j^* - \frac{\omega_j}{\sigma_{jj} + \omega_j^2} U_j.

Multiply the likelihood for S_j^* by the prior for \gamma to find the full conditional distribution,

\pi(\gamma | y, s^*, \beta_0, \beta_1, \sigma_{00}, \sigma_{11}, \omega_0, \omega_1) \propto \exp\left( -\frac{1}{2} (\gamma - g_0)' G_0^{-1} (\gamma - g_0) \right)
\times \exp\left( -\frac{\sigma_{00} + \omega_0^2}{2\sigma_{00}} (T_0^* - P_0\gamma)'(T_0^* - P_0\gamma) \right)
\times \exp\left( -\frac{\sigma_{11} + \omega_1^2}{2\sigma_{11}} (T_1^* - P_1\gamma)'(T_1^* - P_1\gamma) \right).

Opening the quadratic terms and rearranging, we arrive at the following conditional posterior distribution,

\gamma | y, s^*, \beta_0, \beta_1, \sigma_{00}, \sigma_{11}, \omega_0, \omega_1 \sim N(\tilde{g}, \tilde{G}),    (8)

where we have used the following notation,

\tilde{G} = \left[ G_0^{-1} + \left( \frac{\sigma_{00} + \omega_0^2}{\sigma_{00}} \right) P_0'P_0 + \left( \frac{\sigma_{11} + \omega_1^2}{\sigma_{11}} \right) P_1'P_1 \right]^{-1},

\tilde{g} = \tilde{G} \left[ G_0^{-1} g_0 + \left( \frac{\sigma_{00} + \omega_0^2}{\sigma_{00}} \right) P_0'T_0^* + \left( \frac{\sigma_{11} + \omega_1^2}{\sigma_{11}} \right) P_1'T_1^* \right].

The sample of \beta_j values generated using the above Gibbs algorithm can be used to estimate the ATE as follows:

\widehat{\text{ATE}} = \frac{1}{nG} \sum_{i=1}^{n} \sum_{g=1}^{G} x_i'\left( \beta_1^{(g)} - \beta_0^{(g)} \right).
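A short sketch of this estimator, assuming the draws β_0^{(g)} and β_1^{(g)} are stored as (G × K_1) arrays:

```python
# A short sketch (hypothetical array names) of the ATE estimate above:
# average x_i'(beta_1 - beta_0) over both observations i and MCMC draws g.
import numpy as np

def estimate_ate(X, beta1_draws, beta0_draws):
    """X: (n, K1) covariates; beta*_draws: (G, K1) stored Gibbs draws."""
    diffs = X @ (beta1_draws - beta0_draws).T   # (n, G) matrix of x_i'(b1 - b0)
    return diffs.mean()                          # averages over both i and g
```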

Calculation of marginal likelihood for model comparison: The marginal likelihood is calculated using the method proposed by Chib (1995). The log of the marginal likelihood for our model can be written as,

\log m(y, s) = \ln f(y, s | \beta_0^*, \beta_1^*, \gamma^*, \sigma_{00}^*, \sigma_{11}^*, \omega_0^*, \omega_1^*) + \ln \pi(\beta_0^*, \beta_1^*, \gamma^*, \sigma_{00}^*, \omega_0^*, \sigma_{11}^*, \omega_1^*) - \ln \pi(\beta_0^*, \omega_0^*, \beta_1^*, \omega_1^*, \sigma_{00}^*, \sigma_{11}^*, \gamma^* | y, s),

where the first term is the log likelihood, the second is the log prior, and the third is the log posterior, all evaluated at the posterior means of the parameters from the MCMC run, indicated by an asterisk.
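As a conceptual sketch of this identity (hypothetical function and argument names): the log posterior ordinate must itself be estimated from the Gibbs output, e.g., via Chib's reduced runs, and is passed in here as a precomputed number.

```python
# A conceptual sketch of Chib's (1995) identity. The log posterior ordinate
# log_post_at_star must itself be estimated from the Gibbs output (via
# reduced runs); here it is simply passed in as an argument.
def log_marginal_likelihood(log_lik_at_star, log_prior_at_star, log_post_at_star):
    """log m(y, s) = log f(y, s | theta*) + log pi(theta*) - log pi(theta* | y, s)."""
    return log_lik_at_star + log_prior_at_star - log_post_at_star
```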

Unobserved Covariates/Endogeneity with Continuous Variable
In the previous section, we looked at endogeneity arising from a binary treatment variable.
However, endogeneity in economics is typically related to a continuous variable. We now discuss
the nature of problems caused by the presence of a continuous endogenous covariate.
Suppose the regression model is specified as follows:

y_i = x_i'\beta_1 + \beta_s x_{is} + u_i, \quad i = 1, \ldots, n,

where x_i' is a vector of dimension (1 \times K_1) and is independent of u_i, but u_i and x_{is} have a joint normal distribution,

\begin{pmatrix} u_i \\ x_{is} \end{pmatrix} \sim N_2\left( \begin{pmatrix} 0 \\ E(x_{is}) \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right),    (9)

where \sigma_{12} \neq 0. Then the conditional mean of u_i is given by E(u_i | x_{is}) = 0 + \sigma_{12}\sigma_{22}^{-1}[x_{is} - E(x_{is})] = \frac{\sigma_{12}}{\sigma_{22}}[x_{is} - E(x_{is})]. As such, the conditional mean of y_i can be written as,

E(y_i | x_i, x_{is}) = x_i'\beta_1 + \beta_s x_{is} + \frac{\sigma_{12}}{\sigma_{22}}\left[ x_{is} - E(x_{is}) \right]
= x_i'\beta_1 - \frac{\sigma_{12}}{\sigma_{22}} E(x_{is}) + \left( \beta_s + \frac{\sigma_{12}}{\sigma_{22}} \right) x_{is},    (10)

and the conditional variance of y_i is V(y_i | x_{is}) = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}. Equation (10) implies \frac{\partial E(y_i)}{\partial x_{is}} = \beta_s + \frac{\sigma_{12}}{\sigma_{22}}; the likelihood function contains information on \beta_s and \frac{\sigma_{12}}{\sigma_{22}}, but there is no way to separate the two terms. So, in the absence of information on u_i, \beta_s is unidentified.
To understand how this situation may arise in an economic context, let y_i denote the hourly wage of individual i, x_i be a vector of covariates that control for demographic and economic factors, and x_{is} denote years of schooling for the i-th individual. Let us assume that the unobserved covariate is intelligence and that it likely affects both wages and schooling; that is, an individual with higher intelligence would tend to earn a higher wage for any level of schooling and attain a higher level of education than an individual with lower intelligence (\sigma_{12} > 0). In such a case, the coefficient on education measures both the direct effect of schooling through \beta_s and the indirect effect through the relationship between schooling and intelligence, \frac{\sigma_{12}}{\sigma_{22}}.
If the relation between schooling and intelligence is ignored, the effect of education on wages is overestimated by \frac{\sigma_{12}}{\sigma_{22}}. This may be important for public policy. It is thus important to find a way to estimate \beta_s more accurately.
To proceed, we employ instrumental variables to model the endogenous variable x_{is}. Consider the system,

y_i = x_i'\beta_1 + \beta_s x_{is} + u_i,    (11)
x_{is} = x_i'\gamma_1 + z_i'\gamma_2 + v_i,    (12)

where z_i' is a (1 \times K_2) vector of instrumental variables, assumed to be exogenous to the system in the sense that they are independent of u_i and v_i. On the assumption that (u_i, v_i) \sim N_2(0, \Sigma), where \Sigma = \{\sigma_{ij}\}, this system conditional on x_i and z_i reproduces the joint distribution given in Equation (9).
The next step is to write the likelihood function for the model given by equations (11) and (12), but working directly with the model does not lead to a convenient algorithm because solving the system for (y_i, x_{is}) yields a joint normal distribution with \beta_s appearing in both the mean and the covariance. So, we employ the law of conditional probability and write the joint distribution as follows:

f(y_i, x_{is} | \beta_1, \beta_s, \gamma_1, \gamma_2, \Sigma) = f(x_{is} | \gamma_1, \gamma_2, \sigma_{22}) f(y_i | x_{is}, \gamma_1, \gamma_2, \beta_1, \beta_s, \Sigma),    (13)

where the first density on the right-hand side represents the marginal distribution of x_{is} and the second density represents the conditional density of y_i given x_{is}. Since the density of y_i is conditional on x_{is} and the parameters, this is equivalent to knowing the value of v_i, which, because of its correlation with u_i, provides information about u_i. Considering the distributional assumption on (u_i, v_i), we can write,

x_{is} | \Theta \sim N(x_i'\gamma_1 + z_i'\gamma_2, \sigma_{22}),    (14)

y_i | x_{is}, \Theta \sim N\left( x_i'\beta_1 + \beta_s x_{is} + \beta_{12}\left( x_{is} - x_i'\gamma_1 - z_i'\gamma_2 \right), \omega_{11} \right),    (15)

where \Theta = (\gamma_1, \gamma_2, \beta_1, \beta_s, \beta_{12}, \Sigma), \beta_{12} = \frac{\sigma_{12}}{\sigma_{22}}, and \omega_{11} = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}. The parameters in \Theta are identified: \gamma_1, \gamma_2, and \sigma_{22} are available from f(x_{is} | \Theta), and the remaining parameters from f(y_i | x_{is}, \Theta) because \sigma_{12} = \beta_{12}\sigma_{22} and \sigma_{11} = \omega_{11} + \frac{\sigma_{12}^2}{\sigma_{22}}.
Given the likelihood, we need prior distributions on the parameters to arrive at the joint posterior distribution. We assume the following prior distributions,

\beta = (\beta_1', \beta_s, \beta_{12})' \sim N(\beta_0, B_0),
\gamma = (\gamma_1', \gamma_2')' \sim N(\gamma_0, G_0),
\sigma_{22}^{-1} \sim \text{Gamma}(\alpha_{20}/2, \delta_{20}/2),
\omega_{11}^{-1} \sim \text{Gamma}(\alpha_{10}/2, \delta_{10}/2).

Moreover, we define y as the (n \times 1) vector with i-th row y_i, X as the n \times (K_1 + 2) matrix with i-th row X_i = (x_i', x_{is}, \tilde{v}_i), where \tilde{v}_i = x_{is} - x_i'\gamma_1 - z_i'\gamma_2, and Z as the n \times (K_1 + K_2) matrix with i-th row Z_i = (x_i', z_i'). Employing Bayes' theorem to combine the prior distributions with the joint likelihood, the posterior distribution is,
 
\pi(\beta, \gamma, \omega_{11}, \sigma_{22} | y, x_s) \propto \exp\left( -\frac{1}{2} (\beta - \beta_0)' B_0^{-1} (\beta - \beta_0) \right)
\times \left( \frac{1}{\omega_{11}} \right)^{n/2} \exp\left( -\frac{1}{2\omega_{11}} (y - X\beta)'(y - X\beta) \right)
\times \exp\left( -\frac{1}{2} (\gamma - \gamma_0)' G_0^{-1} (\gamma - \gamma_0) \right)
\times \left( \frac{1}{\sigma_{22}} \right)^{n/2} \exp\left( -\frac{1}{2\sigma_{22}} (x_s - Z\gamma)'(x_s - Z\gamma) \right)
\times \left( \frac{1}{\omega_{11}} \right)^{\alpha_{10}/2 - 1} \exp\left( -\frac{\delta_{10}}{2\omega_{11}} \right)
\times \left( \frac{1}{\sigma_{22}} \right)^{\alpha_{20}/2 - 1} \exp\left( -\frac{\delta_{20}}{2\sigma_{22}} \right).

The above joint posterior distribution can be used to derive the conditional posterior distributions and construct a Gibbs sampler. The conditional posteriors are as follows:

\sigma_{22}^{-1} | y, x_s, \gamma \sim \text{Gamma}(\alpha_{21}/2, \delta_{21}/2),
\omega_{11}^{-1} | y, x_s, \beta \sim \text{Gamma}(\alpha_{11}/2, \delta_{11}/2),
\gamma | y, x_s, \sigma_{22} \sim N_{K_1 + K_2}(\bar{\gamma}, G_1),
\beta | y, x_s, \omega_{11} \sim N_{K_1 + 2}(\bar{\beta}, B_1),

where we have used the following notation,

\alpha_{21} = \alpha_{20} + n,
\delta_{21} = \delta_{20} + (x_s - Z\gamma)'(x_s - Z\gamma),
\alpha_{11} = \alpha_{10} + n,
\delta_{11} = \delta_{10} + (y - X\beta)'(y - X\beta),
G_1 = \left[ \sigma_{22}^{-1} Z'Z + G_0^{-1} \right]^{-1},
\bar{\gamma} = G_1\left[ \sigma_{22}^{-1} Z'x_s + G_0^{-1}\gamma_0 \right],
B_1 = \left[ \omega_{11}^{-1} X'X + B_0^{-1} \right]^{-1},
\bar{\beta} = B_1\left[ \omega_{11}^{-1} X'y + B_0^{-1}\beta_0 \right].
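Putting the four updates together, here is a compact sketch of the Gibbs sampler (hypothetical data and hyperparameter names, numpy only; priors supplied as a dict):

```python
# A compact sketch (hypothetical names, numpy only) of the four-block Gibbs
# sampler above for the continuous-endogeneity model.
import numpy as np

def gibbs_endog(y, xs, Xcov, Zinst, prior, n_iter=5000, seed=0):
    """y: (n,) outcome; xs: (n,) endogenous covariate; Xcov: (n, K1)
    exogenous covariates; Zinst: (n, K2) instruments; prior: dict with keys
    a10, d10, a20, d20, b0, B0, g0, G0 (all hypothetical names)."""
    rng = np.random.default_rng(seed)
    n, K1 = Xcov.shape
    Z = np.hstack([Xcov, Zinst])              # rows Z_i = (x_i', z_i')
    gamma = np.zeros(Z.shape[1])
    beta = np.zeros(K1 + 2)                   # (beta_1', beta_s, beta_12)'
    draws = []

    for _ in range(n_iter):
        # Block 1: sigma22^{-1} ~ Gamma(alpha21/2, delta21/2)
        r = xs - Z @ gamma
        sigma22 = 1.0 / rng.gamma((prior["a20"] + n) / 2,
                                  2.0 / (prior["d20"] + r @ r))
        # Block 2: gamma | sigma22 ~ N(gamma_bar, G1)
        G0_inv = np.linalg.inv(prior["G0"])
        G1 = np.linalg.inv(Z.T @ Z / sigma22 + G0_inv)
        g_bar = G1 @ (Z.T @ xs / sigma22 + G0_inv @ prior["g0"])
        gamma = rng.multivariate_normal(g_bar, G1)

        # Regression matrix with rows (x_i', x_is, v_tilde_i)
        v_tilde = xs - Z @ gamma
        X = np.column_stack([Xcov, xs, v_tilde])

        # Block 3: omega11^{-1} ~ Gamma(alpha11/2, delta11/2)
        e = y - X @ beta
        omega11 = 1.0 / rng.gamma((prior["a10"] + n) / 2,
                                  2.0 / (prior["d10"] + e @ e))
        # Block 4: beta | omega11 ~ N(beta_bar, B1)
        B0_inv = np.linalg.inv(prior["B0"])
        B1 = np.linalg.inv(X.T @ X / omega11 + B0_inv)
        b_bar = B1 @ (X.T @ y / omega11 + B0_inv @ prior["b0"])
        beta = rng.multivariate_normal(b_bar, B1)

        draws.append(np.concatenate([beta, gamma, [omega11, sigma22]]))
    return np.array(draws)
```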

Sample Selection Model or Incidental Truncation Model
The sample selection model (a.k.a. the incidental truncation model or Tobit Type-II model) arises when the response variable y (corresponding to the outcome equation) is NOT observed for all units, and whether it is observed or not depends on the value of the "selection variable" s_i (corresponding to the selection or participation equation).
The classic example is a model designed to explain the wage rate (outcome variable) of married women on the basis of demographic and economic variables. However, the wage rate is observed only if a woman decides to participate in the labor market (selection variable); no wage is observed for women who do not work. Moreover, it is likely that the factors determining a woman's decision to work are associated with, i.e., not independent of, the wage rate. For example, a woman's decision to join the labor force may be a function of unobserved innate drive, ability, or "spunk", which is likely to affect the outcome variable, i.e., the wage rate. If the link between the two equations is ignored, results flowing solely from the outcome equation will suffer from sample selection bias if the regression coefficients are interpreted to hold for the wider population at large (i.e., working AND non-working women).
To show why the sample selection bias arises, suppose the outcome equation is written as,

y_i = x_i'\beta_1 + u_i,

where y_i is the outcome variable, x_i' is a vector of dimension (1 \times K_1), \beta_1 is a column vector of dimension (K_1 \times 1), and u_i is the error term. The variable y_i is observed if s_i = 1 and not observed if s_i = 0. For the units that are observed,

E(y_i | x_i, s_i = 1) = x_i'\beta_1 + E(u_i | x_i, s_i = 1).

The last term on the right may not equal zero, because of the correlation between ui and si ,
and a model that assumes a zero expected value is misspecified.
To deal with sample selection, assume that there are K2 instrumental variables contained
in the vector zi and that zi and xi are observed for all units. In summary, the sample selection
model is specified as,

y_i = x_i'\beta_1 + u_i,
s_i^* = x_i'\gamma_1 + z_i'\gamma_2 + v_i,    (16)
s_i = \begin{cases} 0 & \text{if } s_i^* \le 0, \\ 1 & \text{if } s_i^* > 0, \end{cases}

where (u_i, v_i) \sim N_2(0, \Sigma = [\sigma_{11}, \sigma_{12}; \sigma_{21}, 1]). The restriction \sigma_{22} = 1 arises from the binary probit model for the selection equation.
Let N_0 = \{i : s_i = 0\}, N_1 = \{i : s_i = 1\}, and \Theta = (\beta_1, \gamma_1, \gamma_2, \sigma_{11}, \sigma_{12}) denote the parameters of the model. Then, the contribution of i \in N_0 to the posterior distribution is,

\pi(s_i^*, \Theta | s_i = 0) \propto P(s_i = 0 | s_i^*, \Theta) \pi(s_i^* | \Theta) \pi(\Theta)
= 1(s_i = 0) 1(s_i^* \le 0) \pi(s_i^* | \Theta) \pi(\Theta),

and that of i \in N_1 is,

\pi(s_i^*, \Theta | s_i = 1, y_i) \propto f(s_i = 1, y_i | s_i^*, \Theta) \pi(s_i^* | \Theta) \pi(\Theta)
= f(s_i = 1, y_i, s_i^* | \Theta) \pi(\Theta)
= f(y_i, s_i^* | \Theta) P(s_i = 1 | s_i^*, y_i, \Theta) \pi(\Theta)
= f(y_i, s_i^* | \Theta) 1(s_i = 1) 1(s_i^* > 0) \pi(\Theta).

The posterior distribution is therefore,

\pi(s^*, \Theta | s, y) \propto \pi(\Theta) \prod_{i \in N_0} \pi(s_i^* | \Theta) 1(s_i = 0) 1(s_i^* \le 0) \times \prod_{i \in N_1} f(y_i, s_i^* | \Theta) 1(s_i = 1) 1(s_i^* > 0).    (17)

Next, we begin the development of the MCMC algorithm by sampling \beta = (\beta_1', \gamma_1', \gamma_2')', \sigma_{12}, \omega_{11} = \sigma_{11} - \sigma_{12}^2, and the s_i^* from their respective conditional posterior distributions.
(1) To sample \beta from its conditional posterior, it is convenient to write the likelihood in a SUR framework. Define,

\eta_i = \begin{cases} (0, s_i^*)' & \text{if } i \in N_0, \\ (y_i, s_i^*)' & \text{if } i \in N_1, \end{cases}

X_i = \begin{pmatrix} x_i' & 0 & 0 \\ 0 & x_i' & z_i' \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \gamma_1 \\ \gamma_2 \end{pmatrix}, \quad J = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},

then the likelihood can be written in the SUR formulation as,

\pi(\beta | y, s^*, \Sigma) \propto \exp\left( -\frac{1}{2} \sum_{i \in N_0} (\eta_i - X_i\beta)' J'J (\eta_i - X_i\beta) \right) \times \exp\left( -\frac{1}{2} \sum_{i \in N_1} (\eta_i - X_i\beta)' \Sigma^{-1} (\eta_i - X_i\beta) \right).

Note that the first row of (\eta_i - X_i\beta) for i \in N_0 is zero after pre-multiplication by J. With the prior distribution \beta \sim N_{2K_1 + K_2}(\beta_0, B_0), the conditional posterior can be derived to have the following normal distribution,

\beta | y, s^*, \Sigma \sim N_{2K_1 + K_2}(\bar{\beta}, B_1),

B_1 = \left[ \sum_{i \in N_0} X_i'JX_i + \sum_{i \in N_1} X_i'\Sigma^{-1}X_i + B_0^{-1} \right]^{-1},
\bar{\beta} = B_1\left[ \sum_{i \in N_0} X_i'J\eta_i + \sum_{i \in N_1} X_i'\Sigma^{-1}\eta_i + B_0^{-1}\beta_0 \right],

where we have used J'J = J.


(2) Next, we sample the covariance matrix parameters \sigma_{12} and \omega_{11} = \sigma_{11} - \sigma_{12}^2, which appear in the likelihood function only for i \in N_1. Sampling for \omega_{11} is restricted to positive values and automatically yields a positive \sigma_{11} = \omega_{11} + \sigma_{12}^2. Assume the prior distribution \omega_{11}^{-1} \sim \text{Ga}(\alpha_0/2, \delta_0/2) and write f(y_i, s_i^* | \Theta) = f(y_i | s_i^*, \Theta) f(s_i^* | \Theta) to find,

\pi(\omega_{11} | y, \beta, \sigma_{12}) \propto \left( \frac{1}{\omega_{11}} \right)^{n_1/2} \exp\left( -\frac{1}{2\omega_{11}} \sum_{i \in N_1} \left[ y_i - x_i'\beta_1 - \sigma_{12}(s_i^* - x_i'\gamma_1 - z_i'\gamma_2) \right]^2 \right)
\times \left( \frac{1}{\omega_{11}} \right)^{(\alpha_0/2) - 1} \exp\left( -\frac{\delta_0}{2\omega_{11}} \right),

where we note that \omega_{11} does not appear in f(s_i^* | \Theta). This implies \omega_{11}^{-1} | y, \beta, \sigma_{12} \sim \text{Ga}(\alpha_1/2, \delta_1/2),
where,

\alpha_1 = \alpha_0 + n_1,
\delta_1 = \delta_0 + \sum_{i \in N_1} \left[ y_i - x_i'\beta_1 - \sigma_{12}(s_i^* - x_i'\gamma_1 - z_i'\gamma_2) \right]^2.

To sample \sigma_{12} | y, \beta, \omega_{11}, assume the prior distribution \sigma_{12} \sim N(s_0, S_0) and find,

\pi(\sigma_{12} | y, \beta, \omega_{11}) \propto \exp\left( -\frac{1}{2\omega_{11}} \sum_{i \in N_1} \left[ y_i - x_i'\beta_1 - \sigma_{12}(s_i^* - x_i'\gamma_1 - z_i'\gamma_2) \right]^2 \right) \times \exp\left( -\frac{1}{2S_0} (\sigma_{12} - s_0)^2 \right),

which implies \sigma_{12} | y, \beta, \omega_{11} \sim N(\hat{s}, \hat{S}), where

\hat{S} = \left[ \omega_{11}^{-1} \sum_{i \in N_1} (s_i^* - x_i'\gamma_1 - z_i'\gamma_2)^2 + S_0^{-1} \right]^{-1},
\hat{s} = \hat{S}\left[ \omega_{11}^{-1} \sum_{i \in N_1} (s_i^* - x_i'\gamma_1 - z_i'\gamma_2)(y_i - x_i'\beta_1) + S_0^{-1} s_0 \right].
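A minimal sketch of this scalar-normal update, with hypothetical argument names:

```python
# A minimal sketch (hypothetical names) of the scalar-normal update for
# sigma_12 given omega_11, using the formulas for s_hat and S_hat above.
import numpy as np

def draw_sigma12(y1, X1beta1, resid_sel, omega11, s0, S0, rng):
    """y1, X1beta1: outcome and fitted x_i'beta_1 for i in N1;
    resid_sel: s_i* - x_i'gamma_1 - z_i'gamma_2 for i in N1."""
    S_hat = 1.0 / (resid_sel @ resid_sel / omega11 + 1.0 / S0)
    s_hat = S_hat * (resid_sel @ (y1 - X1beta1) / omega11 + s0 / S0)
    return rng.normal(s_hat, np.sqrt(S_hat))
```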

(3) To sample the s_i^*, we use equation (17) and write, for i \in N_1,

\prod_{i \in N_1} f(y_i, s_i^* | \Theta) 1(s_i = 1) 1(s_i^* > 0) = \prod_{i \in N_1} f(s_i^* | y_i, \Theta) f(y_i | \Theta) 1(s_i = 1) 1(s_i^* > 0),

which implies that the s_i^* are drawn from truncated normal distributions, as follows:

s_i^* \sim \begin{cases} TN_{(-\infty, 0]}\left( x_i'\gamma_1 + z_i'\gamma_2, \; 1 \right) & \text{for } i \in N_0, \\ TN_{(0, \infty)}\left( x_i'\gamma_1 + z_i'\gamma_2 + \frac{\sigma_{12}}{\omega_{11} + \sigma_{12}^2}(y_i - x_i'\beta_1), \; \frac{\omega_{11}}{\omega_{11} + \sigma_{12}^2} \right) & \text{for } i \in N_1. \end{cases}

Note that the sampler generates values of the latent data s_i^*; it does not generate values of the "missing" y_i for i \in N_0.
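A sketch of this step for both regimes, reusing scipy's truncnorm (hypothetical names; entries of y for i ∈ N_0 may be arbitrary placeholders, as they are never used):

```python
# A minimal sketch (hypothetical names) of step (3): drawing s_i* from the
# truncated normals above for both regimes with scipy's truncnorm.
import numpy as np
from scipy.stats import truncnorm

def draw_s_star_selection(mu_sel, y, Xbeta1, s, omega11, sigma12, rng):
    """mu_sel: (n,) values of x_i'gamma_1 + z_i'gamma_2; s: (n,) 0/1 indicator.
    Entries of y and Xbeta1 for i in N0 are never used (y is 'missing' there)."""
    var1 = omega11 / (omega11 + sigma12 ** 2)
    mu1 = mu_sel + sigma12 / (omega11 + sigma12 ** 2) * (y - Xbeta1)
    out = np.empty_like(mu_sel)
    for i in range(len(mu_sel)):
        if s[i] == 0:   # support (-inf, 0], variance 1
            out[i] = truncnorm.rvs(-np.inf, -mu_sel[i], loc=mu_sel[i],
                                   scale=1.0, random_state=rng)
        else:           # support (0, inf), variance omega11/(omega11+sigma12^2)
            sd = np.sqrt(var1)
            out[i] = truncnorm.rvs(-mu1[i] / sd, np.inf, loc=mu1[i],
                                   scale=sd, random_state=rng)
    return out
```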

References
Chib, S. (1995), "Marginal Likelihood from the Gibbs Output," Journal of the American Statistical Association, 90, 1313-1321.
