
ECO545A: Bayesian Econometrics

April 10, 2023

Chapter 12: Endogeneity in Selected Models


In real-life applications, we often come across data in which the covariates are correlated
with the disturbance term (i.e., the exogeneity assumption is no longer tenable). Such covariates
are referred to as endogenous variables in the econometrics literature. Here, we will study
endogeneity in treatment models, unobserved covariate models, and sample selection models
subject to incidental truncation.

Treatment Models
Treatment models are used to compare the responses of individuals who belong either to the treatment group or to the control group. The average difference in response between the two groups is known as the average treatment effect or ATE. This difference gives the true treatment effect if the assignment to the groups is random, i.e., the assignment is independent of any characteristic of the individual. For example, suppose cancer patients are randomly selected to form two groups, where the first group is given an old medicine and the second group is given the new medicine. Improvement, if any, in the average years of survival for the second group may then be attributed to the new medicine (assuming compliance by both groups).
However, often whether an individual belongs to the treatment or control group is a choice
made by the individual, and the choice may be related to the unobserved covariates that are
correlated with the response variable. Such unobserved covariates are called confounders in the statistics literature; in the econometrics literature, such treatment assignment is said to be endogenous. In such cases, the ATE fails to give the true effect and may lead to misleading inference.
For example, suppose we are interested in wages and the treatment is participation in a
job training program. It is possible that people who participate in the program have more
motivation and would earn higher wages, even without participating in the program, than those
with less motivation. The problem may be (absent or) less serious if individuals are assigned
randomly to the training program, but there still may be some confounding. Such a situation
can arise, if individuals assigned to the program may choose not to participate, and individuals
not assigned to the program may find a way to participate. Inferences drawn from models that
ignore confounding may yield misleading results.

To formally explain endogeneity in the treatment effect model, suppose that the response variable is related to the covariates and the treatment assignment as follows,

y_{0i} = x_i'\beta_0 + u_{0i},
y_{1i} = x_i'\beta_1 + u_{1i},    (1)

where i = 1, \ldots, n; x_i is a (K_1 \times 1) vector of covariates; a 0 subscript indicates assignment to the control group; and 1 indicates assignment to the treatment group. The group assignment is determined by the binary variable s_i as follows,

s_i = \begin{cases} 0 & \text{if } i \text{ is in the control group}, \\ 1 & \text{if } i \text{ is in the treatment group}. \end{cases}

One primary objective of such a model is to determine the effect of the treatment on the response. Typically, this is measured by calculating the ATE. The ATE given x_i is defined as,

\text{ATE}(x_i) = E(y_{1i} - y_{0i} | x_i) = x_i'(\beta_1 - \beta_0).

For a sample of n individuals, the ATE is calculated as,


\text{ATE} = \frac{1}{n} \sum_{i=1}^{n} x_i'(\beta_1 - \beta_0).

Since an individual is assigned to either the treatment or the control group, only one of y_{0i} and y_{1i} is observed. In the presence of confounding, the data provide information about E(y_{1i} | s_i = 1) and E(y_{0i} | s_i = 0), but the difference between them is not the ATE, because of the correlation between the responses and the assignment variable. To see this, we express the difference as

E(y_{1i} | s_i = 1) - E(y_{0i} | s_i = 0) = x_i'(\beta_1 - \beta_0) + \left[ E(u_{1i} | s_i = 1) - E(u_{0i} | s_i = 0) \right].

In the presence of confounding, the term within the square brackets is not equal to zero. Hence, the difference E(y_{1i} | s_i = 1) - E(y_{0i} | s_i = 0) is not equal to the ATE.
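To make the bias concrete, here is a small simulation sketch in Python (hypothetical parameter values, not taken from the chapter): the assignment depends on a confounder ν that also enters the outcome errors, so the naive difference in observed group means drifts away from the true ATE.

```python
# A small simulation sketch (hypothetical parameter values, not from the
# chapter) showing that under confounding the naive difference in observed
# group means does not recover the true ATE.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + covariate
beta0, beta1 = np.array([1.0, 1.0]), np.array([2.0, 1.5])

# Confounder nu drives both the assignment and the outcome errors.
nu = rng.normal(size=n)
u0 = 0.8 * nu + rng.normal(scale=0.5, size=n)
u1 = 0.8 * nu + rng.normal(scale=0.5, size=n)

s = (x @ np.array([0.2, 0.5]) + nu > 0).astype(int)    # endogenous assignment
y0, y1 = x @ beta0 + u0, x @ beta1 + u1

true_ate = (x @ (beta1 - beta0)).mean()                # approx 1.0 here
naive = y1[s == 1].mean() - y0[s == 0].mean()          # biased upward
print(f"true ATE = {true_ate:.3f}, naive difference = {naive:.3f}")
```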

One approach to solving the problem caused by confounders is to model the assignment decision and use instrumental variables (IVs). IVs have two properties: (1) they are independent of u_{0i} and u_{1i}, and (2) they are not independent of s_i.
To elucidate the problem, reconsider the wage-job training example. In this case, a possible IV is the score on an intelligence test. Given a rich set of covariates, the test score is unlikely to be correlated with the errors in the wage equations, and likely to be correlated with the decision to participate. By their independence from the confounders, the IVs introduce an element of randomization into data that are not generated by random assignment.

To explain the treatment effect model, consider equation (1), the independence of z_i and (u_{0i}, u_{1i}), and

s_i^* = x_i'\gamma_1 + z_i'\gamma_2 + \nu_i, \quad \nu_i \sim N(0, 1),

s_i = \begin{cases} 0 & \text{if } s_i^* \le 0, \\ 1 & \text{if } s_i^* > 0, \end{cases}    (2)

and set

y_i = (1 - s_i) y_{0i} + s_i y_{1i},    (3)

where z_i is a (K_2 \times 1) vector of instrumental variables. The latent variable s_i^* for the treatment indicator s_i is modeled as a binary probit model. Confounding or endogeneity arises if there is correlation between u_{0i} and \nu_i or between u_{1i} and \nu_i. This correlation may arise due to an unobserved covariate. The resulting covariance matrices are,
\text{Cov}(u_{0i}, \nu_i) \equiv \Sigma_0 = \begin{pmatrix} \sigma_{00} + \omega_0^2 & \omega_0 \\ \omega_0 & 1 \end{pmatrix}, \qquad \text{Cov}(u_{1i}, \nu_i) \equiv \Sigma_1 = \begin{pmatrix} \sigma_{11} + \omega_1^2 & \omega_1 \\ \omega_1 & 1 \end{pmatrix},

where \sigma_{\nu\nu} = 1 because of the probit specification. The parameterization \sigma_{jj} + \omega_j^2 has been adopted to facilitate the MCMC algorithm. Note that \sigma_{00} and \sigma_{11} are positive, but \omega_0 and \omega_1 are unconstrained.
We first define some notation. The sets of untreated and treated observations are denoted, respectively, by N_0 = \{i : s_i = 0\} and N_1 = \{i : s_i = 1\}; n_0 and n_1 are, respectively, the numbers of observations in N_0 and N_1; y and s are vectors containing the observed data; s^* is a vector containing the s_i^*; \gamma = (\gamma_1', \gamma_2')'; and \Theta = (\beta_0, \beta_1, \gamma_1, \gamma_2, \sigma_{00}, \sigma_{11}, \omega_0, \omega_1) contains the unknown parameters. Let Y_j, X_j, Z_j, and S_j^*, respectively, be vectors and matrices with rows y_{ji}, x_i', z_i', and s_i^* for i \in N_j.

We assume that the \sigma_{jj} and \omega_j are independent and specify the following prior distributions:

\beta_j \sim N_{K_1}(b_{j0}, B_{j0}),
\gamma \sim N_K(g_0, G_0),
\sigma_{jj} \sim IG(\nu_{j0}/2, d_{j0}/2),
\omega_j \sim N(m_{j0}, M_{j0}),

where K = K_1 + K_2, and N and IG denote the normal and inverse gamma distributions, respectively.

The likelihood contribution for individual i is written as,

f(y_i, s_i | s_i^*, \Theta) = f(y_i | s_i, s_i^*, \Theta) f(s_i | s_i^*, \Theta)
= f(y_i | s_i^*, \Theta) \left[ 1(s_i = 0) 1(s_i^* \le 0) + 1(s_i = 1) 1(s_i^* > 0) \right],

where the second line uses the fact that y_i given s_i^* is independent of s_i, and s_i is degenerate given s_i^*. Since the first term is conditioned on s_i^* and \Theta, we can regard \nu_i = s_i^* - x_i'\gamma_1 - z_i'\gamma_2 as known. Whether y_i is drawn from y_{0i} or y_{1i} is also known. Based on the properties of the conditional normal distribution, we have,

y_{ji} | s_i^*, \beta_j, \gamma, \omega_j, \sigma_{jj} \sim N(x_i'\beta_j + \nu_i\omega_j, \sigma_{jj}), \quad i \in N_j, \; j = 0, 1.    (4)

Normal distribution and its properties: Let x = (x_1', x_2')' \sim N(\mu, \Sigma), where x_1 is of dimension p_1 \times 1, x_2 is of dimension p_2 \times 1 such that p_1 + p_2 = p, \mu = (\mu_1', \mu_2')', and \Sigma = (\Sigma_{11}, \Sigma_{12}; \Sigma_{21}, \Sigma_{22}). Then, the marginal and conditional distributions, respectively, for x_1 are as follows:

x_1 \sim N(\mu_1, \Sigma_{11}),
x_1 | x_2 \sim N(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}).

The marginal and conditional distributions for x2 can be obtained by interchanging the sub-
scripts.
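As a quick numerical check of these formulas (an illustrative example, not part of the chapter), the bivariate case can be verified directly with numpy:

```python
# A small numerical check of the conditional-normal formulas above
# (illustrative values, bivariate case, numpy only).
import numpy as np

mu = np.array([1.0, 2.0])                 # (mu_1, mu_2)
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])            # (S11, S12; S21, S22)

x2 = 2.5                                  # conditioning value for x_2
cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
print(cond_mean, cond_var)                # x_1 | x_2 = 2.5 ~ N(1.3, 1.64)
```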

Coming back to the treatment effect model, the conditional mean for y_{ji} is obtained as \mu_{ji} = x_i'\beta_j + \omega_j \times 1 \times (\nu_i - 0) = x_i'\beta_j + \nu_i\omega_j. The conditional variance is obtained as \Sigma_{j|-j} = \sigma_{jj} + \omega_j^2 - \omega_j \times 1 \times \omega_j = \sigma_{jj}.

The joint posterior distribution is obtained by multiplying the likelihood with the prior distributions; the conditional posterior for each object of interest is then derived while holding the remaining parameters fixed.

(1) We first find the conditional posterior for (\beta_j, \omega_j) by multiplying the likelihood contribution given by (4) with the prior distribution of (\beta_j, \omega_j). To write the distribution, define V_j as the vector with rows \nu_i for i \in N_j, \phi_j = (\beta_j', \omega_j)', W_j = (X_j, V_j), f_{j0} = (b_{j0}', m_{j0})', and

F_{j0} = \begin{pmatrix} B_{j0} & 0 \\ 0 & M_{j0} \end{pmatrix}.

The conditional posterior for (\beta_j, \omega_j) can then be expressed as,

\pi(\beta_j, \omega_j | Y_j, \gamma, S_j^*, \sigma_{jj}) \propto \exp\left( -\frac{1}{2\sigma_{jj}} (Y_j - W_j\phi_j)'(Y_j - W_j\phi_j) \right) \times \exp\left( -\frac{1}{2} (\phi_j - f_{j0})' F_{j0}^{-1} (\phi_j - f_{j0}) \right).

Opening the quadratic terms and rearranging, we obtain the following distribution:

\beta_j, \omega_j | Y_j, \gamma, S_j^*, \sigma_{jj} \sim N(\tilde{\beta}_{j0}, \tilde{B}_{j0}),    (5)

where we have used the following notation,

\tilde{B}_{j0} = \left[ F_{j0}^{-1} + \sigma_{jj}^{-1} W_j'W_j \right]^{-1},
\tilde{\beta}_{j0} = \tilde{B}_{j0} \left[ F_{j0}^{-1} f_{j0} + \sigma_{jj}^{-1} W_j'Y_j \right].
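A minimal sketch of this step, assuming numpy arrays and hypothetical argument names, draws φ_j = (β_j', ω_j)' jointly from the normal distribution in (5):

```python
# A minimal sketch (hypothetical names, numpy only) of step (1): drawing
# (beta_j, omega_j) jointly from the normal conditional posterior in (5).
import numpy as np

def draw_beta_omega(Yj, Wj, sigma_jj, f_j0, F_j0, rng):
    """One Gibbs draw of phi_j = (beta_j', omega_j)' ~ N(beta_tilde, B_tilde)."""
    F_inv = np.linalg.inv(F_j0)
    B_tilde = np.linalg.inv(F_inv + Wj.T @ Wj / sigma_jj)
    beta_tilde = B_tilde @ (F_inv @ f_j0 + Wj.T @ Yj / sigma_jj)
    return rng.multivariate_normal(beta_tilde, B_tilde)
```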

(2) The conditional posterior for \sigma_{jj} can be found by multiplying its prior with the likelihood contribution (4), which yields,

\pi(\sigma_{jj} | Y_j, S_j^*, \gamma, \phi_j) \propto \sigma_{jj}^{-n_j/2} \exp\left( -\frac{1}{2\sigma_{jj}} (Y_j - W_j\phi_j)'(Y_j - W_j\phi_j) \right) \times \sigma_{jj}^{-(\nu_{j0}/2 + 1)} \exp\left( -\frac{d_{j0}}{2\sigma_{jj}} \right).

Rearranging the above terms, we have the following:

\sigma_{jj} | Y_j, S_j^*, \gamma, \phi_j \sim IG(\tilde{\nu}_{j0}/2, \tilde{d}_{j0}/2),    (6)

where we have used the following notation,

\tilde{\nu}_{j0} = n_j + \nu_{j0},
\tilde{d}_{j0} = d_{j0} + (Y_j - W_j\phi_j)'(Y_j - W_j\phi_j).
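A corresponding sketch of the inverse-gamma draw in (6), again with hypothetical names, samples σ_jj as the reciprocal of a gamma draw:

```python
# A minimal sketch (hypothetical names) of step (2): drawing sigma_jj from
# its inverse-gamma conditional posterior in (6).
import numpy as np

def draw_sigma_jj(Yj, Wj, phi_j, nu_j0, d_j0, rng):
    resid = Yj - Wj @ phi_j
    shape = (len(Yj) + nu_j0) / 2.0
    rate = (d_j0 + resid @ resid) / 2.0
    # If X ~ Gamma(shape, rate), then 1/X ~ IG(shape, rate);
    # numpy's gamma is parameterized by scale = 1/rate.
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```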

(3) To arrive at the conditional posterior distribution for s_i^* | y_i, s_i, \Theta, we regard u_{ji} = y_{ji} - x_i'\beta_j as known. Some work yields,

s_i^* | y_i, s_i, \Theta \sim TN\left( x_i'\gamma_1 + z_i'\gamma_2 + \frac{\omega_j}{\sigma_{jj} + \omega_j^2} u_{ji}, \; \frac{\sigma_{jj}}{\sigma_{jj} + \omega_j^2} \right),    (7)

which again uses the properties of the conditional distribution from a multivariate normal distribution, and where the support of the truncated distribution is (-\infty, 0] for s_i = 0 and (0, \infty) for s_i = 1.
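A sketch of this step using scipy's truncnorm (hypothetical names; note that truncnorm expects standardized truncation bounds):

```python
# A minimal sketch (hypothetical names) of step (3): drawing the latent s_i*
# from the truncated normal in (7), using scipy's truncnorm.
import numpy as np
from scipy.stats import truncnorm

def draw_s_star(mu, var, s_i, rng):
    """Draw s_i* ~ TN(mu, var) on (-inf, 0] if s_i = 0, on (0, inf) if s_i = 1."""
    sd = np.sqrt(var)
    if s_i == 0:
        a, b = -np.inf, (0.0 - mu) / sd   # standardized truncation bounds
    else:
        a, b = (0.0 - mu) / sd, np.inf
    return truncnorm.rvs(a, b, loc=mu, scale=sd, random_state=rng)
```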
(4) To find the full conditional distribution of \gamma, define U_j as the vector of u_{ji} for i \in N_j, P_j = (X_j, Z_j), and

T_j^* = S_j^* - \frac{\omega_j}{\sigma_{jj} + \omega_j^2} U_j.

Multiply the likelihood for S_j^* by the prior for \gamma to find the full conditional distribution,

\pi(\gamma | y, s^*, \beta_0, \beta_1, \sigma_{00}, \sigma_{11}, \omega_0, \omega_1) \propto \exp\left( -\frac{1}{2} (\gamma - g_0)' G_0^{-1} (\gamma - g_0) \right)
\times \exp\left( -\frac{\sigma_{00} + \omega_0^2}{2\sigma_{00}} (T_0^* - P_0\gamma)'(T_0^* - P_0\gamma) \right)
\times \exp\left( -\frac{\sigma_{11} + \omega_1^2}{2\sigma_{11}} (T_1^* - P_1\gamma)'(T_1^* - P_1\gamma) \right).

Opening the quadratic terms and rearranging, we arrive at the following conditional posterior distribution,

\gamma | y, s^*, \beta_0, \beta_1, \sigma_{00}, \sigma_{11}, \omega_0, \omega_1 \sim N(\tilde{g}, \tilde{G}),    (8)

where we have used the following notation,

\tilde{G} = \left[ G_0^{-1} + \left( \frac{\sigma_{00} + \omega_0^2}{\sigma_{00}} \right) P_0'P_0 + \left( \frac{\sigma_{11} + \omega_1^2}{\sigma_{11}} \right) P_1'P_1 \right]^{-1},

\tilde{g} = \tilde{G} \left[ G_0^{-1} g_0 + \left( \frac{\sigma_{00} + \omega_0^2}{\sigma_{00}} \right) P_0'T_0^* + \left( \frac{\sigma_{11} + \omega_1^2}{\sigma_{11}} \right) P_1'T_1^* \right].

The sample of \beta_j values generated using the above Gibbs algorithm can be used to estimate the ATE as follows:

\widehat{\text{ATE}} = \frac{1}{nG} \sum_{i=1}^{n} \sum_{g=1}^{G} x_i'\left( \beta_1^{(g)} - \beta_0^{(g)} \right).
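A short sketch of this estimator, assuming the draws β_0^{(g)} and β_1^{(g)} are stored as (G × K_1) arrays:

```python
# A short sketch (hypothetical array names) of the ATE estimate above:
# average x_i'(beta_1 - beta_0) over both observations i and MCMC draws g.
import numpy as np

def estimate_ate(X, beta1_draws, beta0_draws):
    """X: (n, K1) covariates; beta*_draws: (G, K1) stored Gibbs draws."""
    diffs = X @ (beta1_draws - beta0_draws).T   # (n, G) matrix of x_i'(b1 - b0)
    return diffs.mean()                          # averages over both i and g
```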

Calculation of marginal likelihood for model comparison: The marginal likelihood is calculated using the method proposed by Chib (1995). The log of the marginal likelihood for our model can be written as,

\log m(y, s) = \ln f(y, s | \beta_0^*, \beta_1^*, \gamma^*, \sigma_{00}^*, \sigma_{11}^*, \omega_0^*, \omega_1^*) + \ln \pi(\beta_0^*, \beta_1^*, \gamma^*, \sigma_{00}^*, \omega_0^*, \sigma_{11}^*, \omega_1^*) - \ln \pi(\beta_0^*, \omega_0^*, \beta_1^*, \omega_1^*, \sigma_{00}^*, \sigma_{11}^*, \gamma^* | y, s),

where the first term is the log likelihood, the second is the log prior, and the third is the log posterior, all evaluated at the posterior means of the parameters from the MCMC run, indicated by an asterisk.
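As a conceptual sketch of this identity (hypothetical function and argument names): the log posterior ordinate must itself be estimated from the Gibbs output, e.g., via Chib's reduced runs, and is passed in here as a precomputed number.

```python
# A conceptual sketch of Chib's (1995) identity. The log posterior ordinate
# log_post_at_star must itself be estimated from the Gibbs output (via
# reduced runs); here it is simply passed in as an argument.
def log_marginal_likelihood(log_lik_at_star, log_prior_at_star, log_post_at_star):
    """log m(y, s) = log f(y, s | theta*) + log pi(theta*) - log pi(theta* | y, s)."""
    return log_lik_at_star + log_prior_at_star - log_post_at_star
```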

Unobserved Covariates/Endogeneity with Continuous Variable
In the previous section, we looked at endogeneity arising from a binary treatment variable.
However, endogeneity in economics is typically related to a continuous variable. We now discuss
the nature of problems caused by the presence of a continuous endogenous covariate.
Suppose the regression model is specified as follows:

y_i = x_i'\beta_1 + \beta_s x_{is} + u_i, \quad i = 1, \ldots, n,

where x_i' is a vector of dimension (1 \times K_1) and is independent of u_i, but u_i and x_{is} have a joint normal distribution,

\begin{pmatrix} u_i \\ x_{is} \end{pmatrix} \sim N_2\left( \begin{pmatrix} 0 \\ E(x_{is}) \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right),    (9)

where \sigma_{12} \neq 0. Then the conditional mean of u_i is given by E(u_i | x_{is}) = 0 + \sigma_{12}\sigma_{22}^{-1}[x_{is} - E(x_{is})] = \frac{\sigma_{12}}{\sigma_{22}}[x_{is} - E(x_{is})]. As such, the conditional mean of y_i can be written as,

E(y_i | x_i, x_{is}) = x_i'\beta_1 + \beta_s x_{is} + \frac{\sigma_{12}}{\sigma_{22}}\left[ x_{is} - E(x_{is}) \right]
= x_i'\beta_1 - \frac{\sigma_{12}}{\sigma_{22}} E(x_{is}) + \left( \beta_s + \frac{\sigma_{12}}{\sigma_{22}} \right) x_{is},    (10)

and the conditional variance of y_i is V(y_i | x_{is}) = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}. Equation (10) implies \frac{\partial E(y_i)}{\partial x_{is}} = \beta_s + \frac{\sigma_{12}}{\sigma_{22}}; the likelihood function contains information on \beta_s and \frac{\sigma_{12}}{\sigma_{22}}, but there is no way to separate the two terms. So, in the absence of information on u_i, \beta_s is unidentified.
To understand how this situation may arise in an economic context, let y_i denote the hourly wage of individual i, x_i be a vector of covariates that control for demographic and economic factors, and x_{is} denote years of schooling for the i-th individual. Let us assume that the unobserved covariate is intelligence and that it likely affects both wages and schooling; that is, an individual with higher intelligence would tend to earn a higher wage for any level of schooling and attain a higher level of education than an individual with lower intelligence (\sigma_{12} > 0). In such a case, the coefficient on education measures both the direct effect of schooling through \beta_s and the indirect effect through the relationship between schooling and intelligence, \frac{\sigma_{12}}{\sigma_{22}}.
If the relation between schooling and intelligence is ignored, the effect of education on wages is overestimated by \frac{\sigma_{12}}{\sigma_{22}}. This may be important for public policy. It is thus important to find a way to estimate \beta_s more accurately.
To proceed, we employ instrumental variables to model the endogenous variable x_{is}. Consider the system,

y_i = x_i'\beta_1 + \beta_s x_{is} + u_i,    (11)
x_{is} = x_i'\gamma_1 + z_i'\gamma_2 + v_i,    (12)

where z_i' is a (1 \times K_2) vector of instrumental variables, assumed to be exogenous to the system in the sense that they are independent of u_i and v_i. On the assumption that (u_i, v_i) \sim N_2(0, \Sigma), where \Sigma = \{\sigma_{ij}\}, this system conditional on x_i and z_i reproduces the joint distribution given in Equation (9).
The next step is to write the likelihood function for the model given by equations (11) and (12), but working directly with the model does not lead to a convenient algorithm because solving the system for (y_i, x_{is}) yields a joint normal distribution with \beta_s appearing in both the mean and the covariance. So, we employ the law of conditional probability and write the joint distribution as follows:

f(y_i, x_{is} | \beta_1, \beta_s, \gamma_1, \gamma_2, \Sigma) = f(x_{is} | \gamma_1, \gamma_2, \sigma_{22}) f(y_i | x_{is}, \gamma_1, \gamma_2, \beta_1, \beta_s, \Sigma),    (13)

where the first density on the right-hand side represents the marginal distribution of x_{is} and the second density represents the conditional density of y_i given x_{is}. Since the density of y_i is conditional on x_{is} and the parameters, this is equivalent to knowing the value of v_i, which, because of its correlation with u_i, provides information about u_i. Considering the distributional assumption on (u_i, v_i), we can write,

x_{is} | \Theta \sim N(x_i'\gamma_1 + z_i'\gamma_2, \sigma_{22}),    (14)

y_i | x_{is}, \Theta \sim N\left( x_i'\beta_1 + \beta_s x_{is} + \beta_{12}\left( x_{is} - x_i'\gamma_1 - z_i'\gamma_2 \right), \omega_{11} \right),    (15)

where \Theta = (\gamma_1, \gamma_2, \beta_1, \beta_s, \beta_{12}, \Sigma), \beta_{12} = \frac{\sigma_{12}}{\sigma_{22}}, and \omega_{11} = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}. The parameters in \Theta are identified: \gamma_1, \gamma_2, and \sigma_{22} are available from f(x_{is} | \Theta), and the remaining parameters from f(y_i | x_{is}, \Theta) because \sigma_{12} = \beta_{12}\sigma_{22} and \sigma_{11} = \omega_{11} + \frac{\sigma_{12}^2}{\sigma_{22}}.
Given the likelihood, we need prior distributions on the parameters to arrive at the joint posterior distribution. We assume the following prior distributions,

\beta = (\beta_1', \beta_s, \beta_{12})' \sim N(\beta_0, B_0),
\gamma = (\gamma_1', \gamma_2')' \sim N(\gamma_0, G_0),
\sigma_{22}^{-1} \sim \text{Gamma}(\alpha_{20}/2, \delta_{20}/2),
\omega_{11}^{-1} \sim \text{Gamma}(\alpha_{10}/2, \delta_{10}/2).

Moreover, we define y as the (n \times 1) vector with i-th row y_i, X as the n \times (K_1 + 2) matrix with i-th row X_i = (x_i', x_{is}, \tilde{v}_i), where \tilde{v}_i = x_{is} - x_i'\gamma_1 - z_i'\gamma_2, and Z as the n \times (K_1 + K_2) matrix with i-th row Z_i = (x_i', z_i'). Employing Bayes' theorem to combine the prior distributions with the joint likelihood, the posterior distribution is,
 
\pi(\beta, \gamma, \omega_{11}, \sigma_{22} | y, x_s) \propto \exp\left( -\frac{1}{2} (\beta - \beta_0)' B_0^{-1} (\beta - \beta_0) \right)
\times \left( \frac{1}{\omega_{11}} \right)^{n/2} \exp\left( -\frac{1}{2\omega_{11}} (y - X\beta)'(y - X\beta) \right)
\times \exp\left( -\frac{1}{2} (\gamma - \gamma_0)' G_0^{-1} (\gamma - \gamma_0) \right)
\times \left( \frac{1}{\sigma_{22}} \right)^{n/2} \exp\left( -\frac{1}{2\sigma_{22}} (x_s - Z\gamma)'(x_s - Z\gamma) \right)
\times \left( \frac{1}{\omega_{11}} \right)^{\alpha_{10}/2 - 1} \exp\left( -\frac{\delta_{10}}{2\omega_{11}} \right)
\times \left( \frac{1}{\sigma_{22}} \right)^{\alpha_{20}/2 - 1} \exp\left( -\frac{\delta_{20}}{2\sigma_{22}} \right).

The above joint posterior distribution can be used to derive the conditional posterior distributions and construct a Gibbs sampler. The conditional posteriors are as follows:

\sigma_{22}^{-1} | y, x_s, \gamma \sim \text{Gamma}(\alpha_{21}/2, \delta_{21}/2),
\omega_{11}^{-1} | y, x_s, \beta \sim \text{Gamma}(\alpha_{11}/2, \delta_{11}/2),
\gamma | y, x_s, \sigma_{22} \sim N_{K_1 + K_2}(\bar{\gamma}, G_1),
\beta | y, x_s, \omega_{11} \sim N_{K_1 + 2}(\bar{\beta}, B_1),

where we have used the following notation,

\alpha_{21} = \alpha_{20} + n,
\delta_{21} = \delta_{20} + (x_s - Z\gamma)'(x_s - Z\gamma),
\alpha_{11} = \alpha_{10} + n,
\delta_{11} = \delta_{10} + (y - X\beta)'(y - X\beta),
G_1 = \left[ \sigma_{22}^{-1} Z'Z + G_0^{-1} \right]^{-1},
\bar{\gamma} = G_1\left[ \sigma_{22}^{-1} Z'x_s + G_0^{-1}\gamma_0 \right],
B_1 = \left[ \omega_{11}^{-1} X'X + B_0^{-1} \right]^{-1},
\bar{\beta} = B_1\left[ \omega_{11}^{-1} X'y + B_0^{-1}\beta_0 \right].
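Putting the four updates together, here is a compact sketch of the Gibbs sampler (hypothetical data and hyperparameter names, numpy only; priors supplied as a dict):

```python
# A compact sketch (hypothetical names, numpy only) of the four-block Gibbs
# sampler above for the continuous-endogeneity model.
import numpy as np

def gibbs_endog(y, xs, Xcov, Zinst, prior, n_iter=5000, seed=0):
    """y: (n,) outcome; xs: (n,) endogenous covariate; Xcov: (n, K1)
    exogenous covariates; Zinst: (n, K2) instruments; prior: dict with keys
    a10, d10, a20, d20, b0, B0, g0, G0 (all hypothetical names)."""
    rng = np.random.default_rng(seed)
    n, K1 = Xcov.shape
    Z = np.hstack([Xcov, Zinst])              # rows Z_i = (x_i', z_i')
    gamma = np.zeros(Z.shape[1])
    beta = np.zeros(K1 + 2)                   # (beta_1', beta_s, beta_12)'
    draws = []

    for _ in range(n_iter):
        # Block 1: sigma22^{-1} ~ Gamma(alpha21/2, delta21/2)
        r = xs - Z @ gamma
        sigma22 = 1.0 / rng.gamma((prior["a20"] + n) / 2,
                                  2.0 / (prior["d20"] + r @ r))
        # Block 2: gamma | sigma22 ~ N(gamma_bar, G1)
        G0_inv = np.linalg.inv(prior["G0"])
        G1 = np.linalg.inv(Z.T @ Z / sigma22 + G0_inv)
        g_bar = G1 @ (Z.T @ xs / sigma22 + G0_inv @ prior["g0"])
        gamma = rng.multivariate_normal(g_bar, G1)

        # Regression matrix with rows (x_i', x_is, v_tilde_i)
        v_tilde = xs - Z @ gamma
        X = np.column_stack([Xcov, xs, v_tilde])

        # Block 3: omega11^{-1} ~ Gamma(alpha11/2, delta11/2)
        e = y - X @ beta
        omega11 = 1.0 / rng.gamma((prior["a10"] + n) / 2,
                                  2.0 / (prior["d10"] + e @ e))
        # Block 4: beta | omega11 ~ N(beta_bar, B1)
        B0_inv = np.linalg.inv(prior["B0"])
        B1 = np.linalg.inv(X.T @ X / omega11 + B0_inv)
        b_bar = B1 @ (X.T @ y / omega11 + B0_inv @ prior["b0"])
        beta = rng.multivariate_normal(b_bar, B1)

        draws.append(np.concatenate([beta, gamma, [omega11, sigma22]]))
    return np.array(draws)
```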

Sample Selection Model or Incidental Truncation Model
The sample selection model (a.k.a. the incidental truncation model or Tobit Type-II model) arises when the response variable y (corresponding to the outcome equation) is NOT observed for all units, and whether it is observed or not depends on the value of the "selection variable" s_i (corresponding to the selection or participation equation).
The classic example is a model designed to explain the wage rate (outcome variable) of married women on the basis of demographic and economic variables. However, the wage rate is observed only if a woman decides to participate in the labor market (selection variable); no wage is observed for women who do not work. Moreover, it is likely that the factors determining a woman's decision to work are associated with, i.e., not independent of, the wage rate. For example, a woman's decision to join the labor force may be a function of unobserved innate drive, ability, or "spunk", which is likely to affect the outcome variable, i.e., the wage rate. If the link between the two equations is ignored, results flowing solely from the outcome equation will suffer from sample selection bias if the regression coefficients are interpreted to hold for the wider population at large (i.e., working AND non-working women).
To show why the sample selection bias arises, suppose the outcome equation is written as,

y_i = x_i'\beta_1 + u_i,

where y_i is the outcome variable, x_i' is a vector of dimension (1 \times K_1), \beta_1 is a column vector of dimension (K_1 \times 1), and u_i is the error term. The variable y_i is observed if s_i = 1 and not observed if s_i = 0. For the units that are observed,

E(y_i | x_i, s_i = 1) = x_i'\beta_1 + E(u_i | x_i, s_i = 1).

The last term on the right may not equal zero, because of the correlation between ui and si ,
and a model that assumes a zero expected value is misspecified.
To deal with sample selection, assume that there are K2 instrumental variables contained
in the vector zi and that zi and xi are observed for all units. In summary, the sample selection
model is specified as,

y_i = x_i'\beta_1 + u_i,
s_i^* = x_i'\gamma_1 + z_i'\gamma_2 + v_i,    (16)
s_i = \begin{cases} 0 & \text{if } s_i^* \le 0, \\ 1 & \text{if } s_i^* > 0, \end{cases}

where (u_i, v_i) \sim N_2(0, \Sigma = [\sigma_{11}, \sigma_{12}; \sigma_{21}, 1]). The restriction \sigma_{22} = 1 arises from the binary probit model for the selection equation.
Let N_0 = \{i : s_i = 0\}, N_1 = \{i : s_i = 1\}, and \Theta = (\beta_1, \gamma_1, \gamma_2, \sigma_{11}, \sigma_{12}) denote the parameters of the model. Then, the contribution of i \in N_0 to the posterior distribution is,

\pi(s_i^*, \Theta | s_i = 0) \propto P(s_i = 0 | s_i^*, \Theta) \pi(s_i^* | \Theta) \pi(\Theta)
= 1(s_i = 0) 1(s_i^* \le 0) \pi(s_i^* | \Theta) \pi(\Theta),

and that of i \in N_1 is,

\pi(s_i^*, \Theta | s_i = 1, y_i) \propto f(s_i = 1, y_i | s_i^*, \Theta) \pi(s_i^* | \Theta) \pi(\Theta)
= f(s_i = 1, y_i, s_i^* | \Theta) \pi(\Theta)
= f(y_i, s_i^* | \Theta) P(s_i = 1 | s_i^*, y_i, \Theta) \pi(\Theta)
= f(y_i, s_i^* | \Theta) 1(s_i = 1) 1(s_i^* > 0) \pi(\Theta).

The posterior distribution is therefore,

\pi(s^*, \Theta | s, y) \propto \pi(\Theta) \prod_{i \in N_0} \pi(s_i^* | \Theta) 1(s_i = 0) 1(s_i^* \le 0) \times \prod_{i \in N_1} f(y_i, s_i^* | \Theta) 1(s_i = 1) 1(s_i^* > 0).    (17)

Next, we begin the development of the MCMC algorithm by sampling \beta = (\beta_1', \gamma_1', \gamma_2')', \sigma_{12}, \omega_{11} = \sigma_{11} - \sigma_{12}^2, and the s_i^* from their respective conditional posterior distributions.
(1) To sample \beta from its conditional posterior, it is convenient to write the likelihood in a SUR framework. Define,

\eta_i = \begin{cases} (0, s_i^*)' & \text{if } i \in N_0, \\ (y_i, s_i^*)' & \text{if } i \in N_1, \end{cases}

X_i = \begin{pmatrix} x_i' & 0 & 0 \\ 0 & x_i' & z_i' \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \gamma_1 \\ \gamma_2 \end{pmatrix}, \quad J = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},

then the likelihood can be written in the SUR formulation as,

\pi(\beta | y, s^*, \Sigma) \propto \exp\left( -\frac{1}{2} \sum_{i \in N_0} (\eta_i - X_i\beta)' J'J (\eta_i - X_i\beta) \right) \times \exp\left( -\frac{1}{2} \sum_{i \in N_1} (\eta_i - X_i\beta)' \Sigma^{-1} (\eta_i - X_i\beta) \right).

Note that the first row of (\eta_i - X_i\beta) for i \in N_0 is zero after pre-multiplication by J. With the prior distribution \beta \sim N_{2K_1 + K_2}(\beta_0, B_0), the conditional posterior can be derived to have the following normal distribution,

\beta | y, s^*, \Sigma \sim N_{2K_1 + K_2}(\bar{\beta}, B_1),

B_1 = \left[ \sum_{i \in N_0} X_i'JX_i + \sum_{i \in N_1} X_i'\Sigma^{-1}X_i + B_0^{-1} \right]^{-1},
\bar{\beta} = B_1\left[ \sum_{i \in N_0} X_i'J\eta_i + \sum_{i \in N_1} X_i'\Sigma^{-1}\eta_i + B_0^{-1}\beta_0 \right],

where we have used J'J = J.


(2) Next, we sample the covariance matrix parameters \sigma_{12} and \omega_{11} = \sigma_{11} - \sigma_{12}^2, which appear in the likelihood function only for i \in N_1. Sampling for \omega_{11} is restricted to positive values and automatically yields a positive \sigma_{11} = \omega_{11} + \sigma_{12}^2. Assume the prior distribution \omega_{11}^{-1} \sim \text{Ga}(\alpha_0/2, \delta_0/2) and write f(y_i, s_i^* | \Theta) = f(y_i | s_i^*, \Theta) f(s_i^* | \Theta) to find,

\pi(\omega_{11} | y, \beta, \sigma_{12}) \propto \left( \frac{1}{\omega_{11}} \right)^{n_1/2} \exp\left( -\frac{1}{2\omega_{11}} \sum_{i \in N_1} \left[ y_i - x_i'\beta_1 - \sigma_{12}(s_i^* - x_i'\gamma_1 - z_i'\gamma_2) \right]^2 \right)
\times \left( \frac{1}{\omega_{11}} \right)^{(\alpha_0/2) - 1} \exp\left( -\frac{\delta_0}{2\omega_{11}} \right),

where we note that \omega_{11} does not appear in f(s_i^* | \Theta). This implies \omega_{11}^{-1} | y, \beta, \sigma_{12} \sim \text{Ga}(\alpha_1/2, \delta_1/2),
where,

\alpha_1 = \alpha_0 + n_1,
\delta_1 = \delta_0 + \sum_{i \in N_1} \left[ y_i - x_i'\beta_1 - \sigma_{12}(s_i^* - x_i'\gamma_1 - z_i'\gamma_2) \right]^2.

To sample \sigma_{12} | y, \beta, \omega_{11}, assume the prior distribution \sigma_{12} \sim N(s_0, S_0) and find,

\pi(\sigma_{12} | y, \beta, \omega_{11}) \propto \exp\left( -\frac{1}{2\omega_{11}} \sum_{i \in N_1} \left[ y_i - x_i'\beta_1 - \sigma_{12}(s_i^* - x_i'\gamma_1 - z_i'\gamma_2) \right]^2 \right) \times \exp\left( -\frac{1}{2S_0} (\sigma_{12} - s_0)^2 \right),

which implies \sigma_{12} | y, \beta, \omega_{11} \sim N(\hat{s}, \hat{S}), where

\hat{S} = \left[ \omega_{11}^{-1} \sum_{i \in N_1} (s_i^* - x_i'\gamma_1 - z_i'\gamma_2)^2 + S_0^{-1} \right]^{-1},
\hat{s} = \hat{S}\left[ \omega_{11}^{-1} \sum_{i \in N_1} (s_i^* - x_i'\gamma_1 - z_i'\gamma_2)(y_i - x_i'\beta_1) + S_0^{-1} s_0 \right].
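A minimal sketch of this scalar-normal update, with hypothetical argument names:

```python
# A minimal sketch (hypothetical names) of the scalar-normal update for
# sigma_12 given omega_11, using the formulas for s_hat and S_hat above.
import numpy as np

def draw_sigma12(y1, X1beta1, resid_sel, omega11, s0, S0, rng):
    """y1, X1beta1: outcome and fitted x_i'beta_1 for i in N1;
    resid_sel: s_i* - x_i'gamma_1 - z_i'gamma_2 for i in N1."""
    S_hat = 1.0 / (resid_sel @ resid_sel / omega11 + 1.0 / S0)
    s_hat = S_hat * (resid_sel @ (y1 - X1beta1) / omega11 + s0 / S0)
    return rng.normal(s_hat, np.sqrt(S_hat))
```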

(3) To sample the s_i^*, we use equation (17) and write, for i \in N_1,

\prod_{i \in N_1} f(y_i, s_i^* | \Theta) 1(s_i = 1) 1(s_i^* > 0) = \prod_{i \in N_1} f(s_i^* | y_i, \Theta) f(y_i | \Theta) 1(s_i = 1) 1(s_i^* > 0),

which implies that the s_i^* are drawn from truncated normal distributions, as follows:

s_i^* \sim \begin{cases} TN_{(-\infty, 0]}\left( x_i'\gamma_1 + z_i'\gamma_2, \; 1 \right) & \text{for } i \in N_0, \\ TN_{(0, \infty)}\left( x_i'\gamma_1 + z_i'\gamma_2 + \frac{\sigma_{12}}{\omega_{11} + \sigma_{12}^2}(y_i - x_i'\beta_1), \; \frac{\omega_{11}}{\omega_{11} + \sigma_{12}^2} \right) & \text{for } i \in N_1. \end{cases}

Note that the sampler generates values of the latent data s_i^*; it does not generate values of the "missing" y_i for i \in N_0.
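A sketch of this step for both regimes, reusing scipy's truncnorm (hypothetical names; entries of y for i ∈ N_0 may be arbitrary placeholders, as they are never used):

```python
# A minimal sketch (hypothetical names) of step (3): drawing s_i* from the
# truncated normals above for both regimes with scipy's truncnorm.
import numpy as np
from scipy.stats import truncnorm

def draw_s_star_selection(mu_sel, y, Xbeta1, s, omega11, sigma12, rng):
    """mu_sel: (n,) values of x_i'gamma_1 + z_i'gamma_2; s: (n,) 0/1 indicator.
    Entries of y and Xbeta1 for i in N0 are never used (y is 'missing' there)."""
    var1 = omega11 / (omega11 + sigma12 ** 2)
    mu1 = mu_sel + sigma12 / (omega11 + sigma12 ** 2) * (y - Xbeta1)
    out = np.empty_like(mu_sel)
    for i in range(len(mu_sel)):
        if s[i] == 0:   # support (-inf, 0], variance 1
            out[i] = truncnorm.rvs(-np.inf, -mu_sel[i], loc=mu_sel[i],
                                   scale=1.0, random_state=rng)
        else:           # support (0, inf), variance omega11/(omega11+sigma12^2)
            sd = np.sqrt(var1)
            out[i] = truncnorm.rvs(-mu1[i] / sd, np.inf, loc=mu1[i],
                                   scale=sd, random_state=rng)
    return out
```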

References
Chib, S. (1995), "Marginal Likelihood from the Gibbs Output," Journal of the American Statistical Association, 90, 1313-1321.
