CH10
Multivariate Responses
We consider three examples of models in which the response variable is a vector: (1) the
seemingly unrelated regression (SUR) model, (2) the multivariate probit model (not covered here),
and (3) the panel data model.
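Each equation in the SUR model is represented by a linear regression model of the form:
$$y_{mj} = x_{mj}'\beta_m + \varepsilon_{mj}, \qquad m = 1, \ldots, M, \quad j = 1, \ldots, J, \tag{1}$$
where $x_{mj}'$ is $(1 \times K_m)$, $\beta_m$ is $(K_m \times 1)$ and $\sum_{m=1}^{M} K_m = K$.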
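Note that we allow ex ante for a different set of regressors in each equation (hence the
subscript $m$ on $x$) and a different set of coefficients in each equation (hence the subscript $m$
on $\beta$). For a given individual $j$, we can write the $M$ observations as a block, i.e.,
$$y_j = X_j\beta + \varepsilon_j, \qquad \varepsilon_j \sim N(0_M, \Sigma_{M \times M}), \tag{2}$$
where $y_j = (y_{1j}, \ldots, y_{Mj})'$, $\beta = (\beta_1', \ldots, \beta_M')'$ is $(K \times 1)$, and $X_j$ is the
$(M \times K)$ block-diagonal matrix with $x_{1j}', \ldots, x_{Mj}'$ on its diagonal.
As seen from the covariance structure, we allow the error terms associated with a given individual
to be correlated across equations. The equations are linked only through this correlation, hence
the name seemingly unrelated regression (Zellner, 1962). The covariance matrix $\Sigma$ has
$M(M+1)/2$ free elements. To keep the estimation tractable, we assume that all individuals
share the same $\Sigma$.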
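Stacking the blocks for all $J$ individuals gives
$$y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0_{MJ}, \Omega), \tag{3}$$
where we have,
$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{pmatrix}, \quad
X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_J \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_J \end{pmatrix}, \quad
\Omega = \begin{pmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{pmatrix}.
$$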
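As a minimal sketch of this stacking, the following Python code simulates a small SUR data set and builds $X_j$, the stacked $X$, and $\Omega$. The block-diagonal layout of $X_j$ (each row $x_{mj}'$ lined up with its own $\beta_m$), the sizes, the parameter values, and all variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
M, J = 2, 5                 # number of equations and individuals (illustrative)
K_m = [2, 3]                # regressors per equation, so K = sum(K_m)
K = sum(K_m)

beta = rng.normal(size=K)                    # stacked (beta_1', ..., beta_M')'
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])   # common M x M error covariance


def make_Xj(rows):
    """Put the M row vectors x'_{mj} on the diagonal of an (M x K) matrix."""
    return block_diag(*[r.reshape(1, -1) for r in rows])


X_list = [make_Xj([rng.normal(size=k) for k in K_m]) for _ in range(J)]
eps = rng.multivariate_normal(np.zeros(M), Sigma, size=J)
y_list = [Xj @ beta + e for Xj, e in zip(X_list, eps)]   # y_j = X_j beta + eps_j

# Stack over individuals: y is (MJ,), X is (MJ x K), Omega is block diagonal.
y = np.concatenate(y_list)
X = np.vstack(X_list)
Omega = np.kron(np.eye(J), Sigma)
```

Using `np.kron(np.eye(J), Sigma)` is convenient for small examples; for large $J$ one would normally avoid forming $\Omega$ explicitly and work block by block.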
The likelihood for the model is,
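$$
\begin{aligned}
f(y \mid \beta, \Sigma) &= \prod_{j=1}^{J} (2\pi)^{-M/2} |\Sigma|^{-1/2} \exp\Big[-\frac{1}{2}(y_j - X_j\beta)'\Sigma^{-1}(y_j - X_j\beta)\Big] \\
&= (2\pi)^{-MJ/2} |\Sigma|^{-J/2} \exp\Big[-\frac{1}{2}\sum_{j=1}^{J}(y_j - X_j\beta)'\Sigma^{-1}(y_j - X_j\beta)\Big] \\
&= (2\pi)^{-MJ/2} |\Sigma|^{-J/2} \exp\Big[-\frac{1}{2}(y - X\beta)'\Omega^{-1}(y - X\beta)\Big].
\end{aligned} \tag{4}
$$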
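The equality of the product form and the stacked form of the likelihood can be checked numerically. The sketch below evaluates the log of both expressions on arbitrary toy inputs; the function names and test values are hypothetical and chosen only for illustration.

```python
import numpy as np


def loglik_by_blocks(y_blocks, X_blocks, beta, Sigma):
    """Log-likelihood as a sum over the J independent N(X_j beta, Sigma) blocks."""
    J, M = len(y_blocks), Sigma.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = sum((yj - Xj @ beta) @ Sigma_inv @ (yj - Xj @ beta)
               for yj, Xj in zip(y_blocks, X_blocks))
    return -0.5 * M * J * np.log(2 * np.pi) - 0.5 * J * logdet - 0.5 * quad


def loglik_stacked(y_blocks, X_blocks, beta, Sigma):
    """Same log-likelihood in stacked form, with Omega^{-1} = I_J kron Sigma^{-1}."""
    J, M = len(y_blocks), Sigma.shape[0]
    y, X = np.concatenate(y_blocks), np.vstack(X_blocks)
    Omega_inv = np.kron(np.eye(J), np.linalg.inv(Sigma))
    _, logdet = np.linalg.slogdet(Sigma)
    r = y - X @ beta
    return -0.5 * M * J * np.log(2 * np.pi) - 0.5 * J * logdet - 0.5 * (r @ Omega_inv @ r)


# Arbitrary toy inputs; any conformable values would do.
rng = np.random.default_rng(0)
M, J, K = 2, 4, 3
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
beta = rng.normal(size=K)
X_blocks = [rng.normal(size=(M, K)) for _ in range(J)]
y_blocks = [rng.normal(size=M) for _ in range(J)]

ll_blocks = loglik_by_blocks(y_blocks, X_blocks, beta, Sigma)
ll_stacked = loglik_stacked(y_blocks, X_blocks, beta, Sigma)
```

The two values agree up to floating-point error, which reflects $|\Omega| = |\Sigma|^J$ and the block-diagonal form of $\Omega^{-1}$.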
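Now $\Omega^{-1} = \mathrm{diag}[\Sigma^{-1}, \ldots, \Sigma^{-1}]$, because $\Omega$ is a block diagonal matrix. Assuming the prior
distributions $\beta \sim N(\beta_0, B_0)$ and $\Sigma^{-1} \sim W(\nu_0, R_0)$, the posterior distribution can be written
as,
$$
\begin{aligned}
\pi(\beta, \Sigma \mid y) \propto{} & |\Sigma|^{-J/2} \exp\Big[-\frac{1}{2}\sum_{j=1}^{J}(y_j - X_j\beta)'\Sigma^{-1}(y_j - X_j\beta)\Big] \\
& \times \exp\Big[-\frac{1}{2}(\beta - \beta_0)'B_0^{-1}(\beta - \beta_0)\Big] \\
& \times |\Sigma|^{-\frac{\nu_0 - M - 1}{2}} \exp\Big[-\frac{1}{2}\mathrm{tr}(R_0^{-1}\Sigma^{-1})\Big].
\end{aligned}
$$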
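The derivation of the conditional distributions is straightforward. The conditional
posterior distribution for $\beta$ can be shown to be
$$\beta \mid y, \Sigma \sim N_K(\bar\beta, B_1),$$
where
$$
B_1 = \Big[\sum_{j=1}^{J} X_j'\Sigma^{-1}X_j + B_0^{-1}\Big]^{-1}, \qquad
\bar\beta = B_1\Big[\sum_{j=1}^{J} X_j'\Sigma^{-1}y_j + B_0^{-1}\beta_0\Big].
$$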
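To derive the conditional posterior distribution of $\Sigma^{-1} \mid y, \beta$, we use the property of the trace
operator $Z'AZ = \mathrm{tr}(Z'AZ) = \mathrm{tr}(ZZ'A)$, where $Z$ is a column vector, to obtain,
$$
\sum_{j=1}^{J}(y_j - X_j\beta)'\Sigma^{-1}(y_j - X_j\beta) = \mathrm{tr}\Big\{\sum_{j=1}^{J}(y_j - X_j\beta)(y_j - X_j\beta)'\,\Sigma^{-1}\Big\}.
$$
Collecting the terms in $\Sigma^{-1}$ from the posterior then gives $\Sigma^{-1} \mid y, \beta \sim W_M(\nu_1, R_1)$, with
$\nu_1 = \nu_0 + J$ and $R_1 = \big[R_0^{-1} + \sum_{j=1}^{J}(y_j - X_j\beta)(y_j - X_j\beta)'\big]^{-1}$.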
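The trace identity is easy to verify numerically. In the sketch below, the rows of `resid` stand in for the residual vectors $(y_j - X_j\beta)'$ and `Sigma_inv` is an arbitrary positive definite matrix; both are made-up inputs used only to illustrate the algebra.

```python
import numpy as np

rng = np.random.default_rng(1)
M, J = 3, 6

A = rng.normal(size=(M, M))
Sigma_inv = A @ A.T + M * np.eye(M)     # arbitrary positive definite matrix
resid = rng.normal(size=(J, M))         # row j plays the role of (y_j - X_j beta)'

# Left-hand side: sum of J quadratic forms.
lhs = sum(e @ Sigma_inv @ e for e in resid)

# Right-hand side: trace of (sum_j e_j e_j') Sigma^{-1}.
S = resid.T @ resid                     # equals sum_j e_j e_j', an (M x M) matrix
rhs = np.trace(S @ Sigma_inv)
```

In practice this means only the $M \times M$ matrix of residual cross-products is needed to update $\Sigma^{-1}$.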
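Algorithm 10.1 (Gibbs for SUR Model)
At the $g$-th iteration, given $\Sigma^{(g-1)}$, draw $\beta^{(g)} \sim N_K(\bar\beta^{(g)}, B_1^{(g)})$, where
$$
B_1^{(g)} = \Big[\sum_{j=1}^{J} X_j'\Sigma^{-1(g-1)}X_j + B_0^{-1}\Big]^{-1},
$$
$$
\bar\beta^{(g)} = B_1^{(g)}\Big[\sum_{j=1}^{J} X_j'\Sigma^{-1(g-1)}y_j + B_0^{-1}\beta_0\Big],
$$
and then draw $\Sigma^{-1(g)} \sim W_M(\nu_1, R_1^{(g)})$, where $\nu_1 = \nu_0 + J$ and
$$
R_1^{(g)} = \Big[R_0^{-1} + \sum_{j=1}^{J}(y_j - X_j\beta^{(g)})(y_j - X_j\beta^{(g)})'\Big]^{-1}.
$$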
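The two draws in Algorithm 10.1 can be sketched in Python as follows. This is a minimal illustration on simulated data, not a reference implementation: the function name `gibbs_sur`, the array layout (`y` of shape $J \times M$, `X` of shape $J \times M \times K$), the starting value, the prior settings, and the true parameter values are all assumptions. It relies on `scipy.stats.wishart`, whose `df` and `scale` arguments correspond to $(\nu_1, R_1)$ when the Wishart mean is $\nu_1 R_1$.

```python
import numpy as np
from scipy.stats import wishart


def gibbs_sur(y, X, beta0, B0, nu0, R0, n_iter=300, seed=0):
    """Gibbs sampler sketch for the SUR model. y: (J, M); X: (J, M, K)."""
    rng = np.random.default_rng(seed)
    J, M, K = X.shape
    B0_inv, R0_inv = np.linalg.inv(B0), np.linalg.inv(R0)
    Xt = X.transpose(0, 2, 1)               # X_j' for every j, shape (J, K, M)
    Sigma_inv = np.eye(M)                   # arbitrary starting value
    beta_draws = np.empty((n_iter, K))
    Sigma_draws = np.empty((n_iter, M, M))
    for g in range(n_iter):
        # Draw beta | y, Sigma ~ N(beta_bar, B1).
        XtS = Xt @ Sigma_inv                # X_j' Sigma^{-1}, shape (J, K, M)
        B1 = np.linalg.inv((XtS @ X).sum(axis=0) + B0_inv)
        B1 = (B1 + B1.T) / 2                # remove round-off asymmetry
        beta_bar = B1 @ ((XtS @ y[:, :, None]).sum(axis=0)[:, 0] + B0_inv @ beta0)
        beta = rng.multivariate_normal(beta_bar, B1)
        # Draw Sigma^{-1} | y, beta ~ W_M(nu0 + J, R1).
        resid = y - X @ beta                # (J, M)
        R1 = np.linalg.inv(R0_inv + resid.T @ resid)
        Sigma_inv = np.atleast_2d(
            wishart.rvs(df=nu0 + J, scale=(R1 + R1.T) / 2, random_state=rng))
        beta_draws[g], Sigma_draws[g] = beta, np.linalg.inv(Sigma_inv)
    return beta_draws, Sigma_draws


# Simulated example with M = 2 equations and K_1 = K_2 = 2 regressors each.
rng = np.random.default_rng(42)
J, M = 200, 2
beta_true = np.array([1.0, -1.0, 0.5, 2.0])
Sigma_true = np.array([[1.0, 0.5], [0.5, 1.5]])
X = np.zeros((J, M, 4))
X[:, 0, :2] = rng.normal(size=(J, 2))       # regressors of equation 1
X[:, 1, 2:] = rng.normal(size=(J, 2))       # regressors of equation 2
y = X @ beta_true + rng.multivariate_normal(np.zeros(M), Sigma_true, size=J)

beta_draws, Sigma_draws = gibbs_sur(y, X, np.zeros(4), 100 * np.eye(4), nu0=4, R0=np.eye(2))
beta_hat = beta_draws[100:].mean(axis=0)    # posterior mean after a burn-in of 100
Sigma_hat = Sigma_draws[100:].mean(axis=0)
```

With a weak prior and $J = 200$, the posterior means `beta_hat` and `Sigma_hat` should lie close to the values used to simulate the data; the burn-in length of 100 is an arbitrary choice.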
Panel Data Models
Panel data consist of observations on the same units over several time periods. The
response variable is typically denoted $y_{it}$, for $i = 1, \ldots, n$ and $t = 1, \ldots, T$, where $n$ is the
sample size (the number of units) and $T$ is the number of years over which the sample has been
observed. If the number of years is the same for all units, we have a balanced panel; otherwise
we have an unbalanced panel.
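As a small illustration of this definition, the sketch below checks whether a long-format panel is balanced by counting the observations per unit. The helper name `is_balanced` and the toy identifiers are hypothetical.

```python
from collections import Counter


def is_balanced(unit_ids):
    """True if every unit appears the same number of times in a long-format id column."""
    counts = Counter(unit_ids)
    return len(set(counts.values())) <= 1


balanced_ids = ["a", "a", "a", "b", "b", "b"]     # both units observed 3 times
unbalanced_ids = ["a", "a", "a", "b", "b"]        # unit "b" observed only twice
```

This checks only the number of periods per unit, as in the definition above; it does not verify that the units are observed in the same calendar years.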
Panel data models are applied to data with a large number of units and a small number of
time periods, i.e. $n$ is large and $T$ is small. A typical panel data set consists of a large number of
units, usually firms or households, observed over a time span that is too short to estimate
a separate regression for each unit. The identity of the individual units is of no inherent interest,
and the large number of units makes it impractical to estimate an individual variance for each unit
and a covariance for each pair of units. The panel data model further assumes that the behaviour
of the units is independent at each time period, but that there are differences across individuals that
persist over time. These differences are called heterogeneity, and they are modeled by a non-zero
covariance between the disturbances of a particular firm or household across time.
Example 1: The study by Vella and Verbeek (1998) is based on panel data. The sample includes
observations on 545 young men who worked in each of the eight years from 1980 to 1987.
Example 2: Hausman (1978) considered log wages as a function of demographic variables
in a panel of 629 observations over six years.
The panel data model for unit $i$ at time $t$ is
$$y_{it} = x_{it}'\beta + w_{it}'b_i + u_{it},$$
where $x_{it}'$ is the vector of covariates for individual $i$ at time $t$ of dimension $1 \times K_1$, $\beta$ is a vector
of fixed effects parameters of size $K_1 \times 1$, and $w_{it}$ and $b_i$ are, respectively, the vector of covariates
and the vector of random effects parameters that vary across $i$, each of dimension $K_2 \times 1$. To model
heterogeneity, the $i$ subscript of $b_i$ allows each of the variables in $w_{it}$ to have a different effect on
each observation unit. It is assumed that $u_{it} \sim N(0, h_u^{-1})$ and that $\mathrm{cov}(u_{it}, u_{js}) = 0$ unless $i = j$
and $s = t$. Note that the distribution of $u_{it}$ has been parameterized in terms of the precision $h_u$.
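We first stack the model for each $i$ and define $y_i = (y_{i1}, \ldots, y_{iT})'$, $X_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$,
$u_i = (u_{i1}, \ldots, u_{iT})'$ and $W_i = (w_{i1}, \ldots, w_{iT})'$, so that $X_i$ is $T \times K_1$ and $W_i$ is $T \times K_2$.
Thus, the panel data model can be written as,
$$y_i = X_i\beta + W_i b_i + u_i, \qquad u_i \sim N_T(0, h_u^{-1} I_T).$$
The random effects are modeled as $b_i \mid D \sim N_{K_2}(0, D)$, and the prior distributions are
$\beta \sim N_{K_1}(\beta_0, B_0)$, $D^{-1} \sim W_{K_2}(\nu_0, D_0)$ and $h_u \sim Ga(\alpha_0/2, \delta_0/2)$.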
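To make the stacked notation concrete, the following sketch simulates a balanced panel from a model of the form $y_i = X_i\beta + W_i b_i + u_i$ with $b_i \sim N(0, D)$ and $u_{it} \sim N(0, h_u^{-1})$. The function name `simulate_panel`, the array layout (unit index first), and the parameter values are assumptions for illustration only.

```python
import numpy as np


def simulate_panel(n, T, beta, D, hu, seed=0):
    """Simulate y (n, T), X (n, T, K1), W (n, T, K2), b (n, K2) and u (n, T)."""
    rng = np.random.default_rng(seed)
    K1, K2 = len(beta), D.shape[0]
    X = rng.normal(size=(n, T, K1))                        # row t of X[i] is x_it'
    W = rng.normal(size=(n, T, K2))                        # row t of W[i] is w_it'
    b = rng.multivariate_normal(np.zeros(K2), D, size=n)   # random effects b_i
    u = rng.normal(scale=hu ** -0.5, size=(n, T))          # sd is h_u^{-1/2}
    y = X @ beta + np.einsum("itk,ik->it", W, b) + u       # y_i = X_i beta + W_i b_i + u_i
    return y, X, W, b, u


beta = np.array([1.0, -0.5])
D = np.array([[0.5, 0.1], [0.1, 0.3]])
y, X, W, b, u = simulate_panel(n=50, T=4, beta=beta, D=D, hu=4.0)
```

Each slice `X[i]` and `W[i]` is the $T \times K_1$ and $T \times K_2$ matrix for unit $i$, so `y[i]` equals `X[i] @ beta + W[i] @ b[i] + u[i]`.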
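Consequently the posterior distribution can be written as,
$$
\begin{aligned}
\pi(\beta, b, D, h_u \mid y) \propto{} & \Big\{\prod_{i=1}^{n} f(y_i \mid \beta, b_i, h_u)\,\pi(b_i \mid D)\Big\}\,\pi(\beta)\,\pi(D)\,\pi(h_u) \\
\propto{} & h_u^{nT/2} \exp\Big[-\frac{h_u}{2}\sum_{i=1}^{n}(y_i - X_i\beta - W_i b_i)'(y_i - X_i\beta - W_i b_i)\Big] \\
& \times |D|^{-n/2} \exp\Big[-\frac{1}{2}\sum_{i=1}^{n} b_i'D^{-1}b_i\Big] \exp\Big[-\frac{1}{2}(\beta - \beta_0)'B_0^{-1}(\beta - \beta_0)\Big] \\
& \times |D|^{-\frac{\nu_0 - K_2 - 1}{2}} \exp\Big[-\frac{1}{2}\mathrm{tr}(D_0^{-1}D^{-1})\Big] \times h_u^{\frac{\alpha_0}{2} - 1} \exp\Big[-\frac{\delta_0 h_u}{2}\Big],
\end{aligned}
$$
where $b = (b_1, \ldots, b_n)$. The conditional posterior distributions can be derived from the above
joint posterior distribution.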
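The conditional posterior of the precision is $h_u \mid y, \beta, b \sim Ga(\alpha_1/2, \delta_1/2)$, where,
$$
\alpha_1 = \alpha_0 + nT, \qquad
\delta_1 = \delta_0 + \sum_{i=1}^{n}(y_i - X_i\beta - W_i b_i)'(y_i - X_i\beta - W_i b_i).
$$
Similarly, the conditional posterior of the random effects precision matrix is $D^{-1} \mid b \sim W_{K_2}(\nu_1, D_1)$, where,
$$
\nu_1 = \nu_0 + n, \qquad
D_1 = \Big[D_0^{-1} + \sum_{i=1}^{n} b_i b_i'\Big]^{-1}.
$$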
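The two conditional draws for $h_u$ and $D^{-1}$ can be sketched as below. The helper names `draw_hu` and `draw_D_inv`, the array layout, and the toy inputs are assumptions. Note that $Ga(\alpha_1/2, \delta_1/2)$ is read here as shape $\alpha_1/2$ and rate $\delta_1/2$, which matches the kernel $h_u^{\alpha/2 - 1}\exp(-\delta h_u/2)$; NumPy's gamma generator takes a scale, so the rate is inverted.

```python
import numpy as np
from scipy.stats import wishart


def draw_hu(y, X, W, beta, b, alpha0, delta0, rng):
    """One draw from h_u | y, beta, b ~ Ga(alpha1/2, delta1/2) (shape, rate)."""
    n, T = y.shape
    resid = y - X @ beta - np.einsum("itk,ik->it", W, b)   # y_i - X_i beta - W_i b_i
    alpha1 = alpha0 + n * T
    delta1 = delta0 + np.sum(resid ** 2)
    return rng.gamma(shape=alpha1 / 2, scale=2.0 / delta1)  # scale = 1 / rate


def draw_D_inv(b, nu0, D0, rng):
    """One draw from D^{-1} | b ~ W_{K2}(nu0 + n, D1). b has shape (n, K2)."""
    n = b.shape[0]
    D1 = np.linalg.inv(np.linalg.inv(D0) + b.T @ b)         # b.T @ b = sum_i b_i b_i'
    return np.atleast_2d(
        wishart.rvs(df=nu0 + n, scale=(D1 + D1.T) / 2, random_state=rng))


# Toy inputs (made up) to exercise both draws.
rng = np.random.default_rng(0)
n, T, K1, K2 = 50, 4, 2, 2
X, W = rng.normal(size=(n, T, K1)), rng.normal(size=(n, T, K2))
beta, b = np.array([1.0, -1.0]), rng.normal(size=(n, K2))
y = X @ beta + np.einsum("itk,ik->it", W, b) + rng.normal(scale=0.5, size=(n, T))

hu_draws = np.array([draw_hu(y, X, W, beta, b, 2.0, 2.0, rng) for _ in range(2000)])
D_inv = draw_D_inv(b, nu0=4, D0=np.eye(K2), rng=rng)
```

The average of `hu_draws` should be close to the gamma mean $\alpha_1/\delta_1$, which gives a quick sanity check on the shape and rate convention.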
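It is preferable to sample $(\beta, b)$ in one block from $\pi(\beta, b \mid y, D, h_u)$ rather than in two blocks
from $\pi(\beta \mid b, y, D, h_u)$ and $\pi(b \mid \beta, y, D, h_u)$, because of possible correlation between them. This is done
using the factorization
$$
\pi(\beta, b \mid y, D, h_u) = \pi(\beta \mid y, D, h_u)\,\pi(b \mid \beta, y, D, h_u)
= \pi(\beta \mid y, D, h_u)\prod_{i=1}^{n}\pi(b_i \mid \beta, y, D, h_u).
$$
In other words, $\beta$ is sampled marginally of $b_i$ and then $b_i$ is sampled conditional on $\beta$ and the other
parameters/variables.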
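The conditional posterior for $b_i$ follows from the terms in the joint posterior that involve $b_i$ and is found to be
$$b_i \mid y, \beta, D, h_u \sim N_{K_2}(\bar b_i, D_{1i}),$$
where
$$
D_{1i} = \big[h_u W_i'W_i + D^{-1}\big]^{-1}, \qquad \bar b_i = h_u D_{1i} W_i'(y_i - X_i\beta).
$$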
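To obtain the distribution of $\beta$ marginally of $b$, write $y_i = X_i\beta + (W_i b_i + u_i)$ and
integrate out $b_i$ and $u_i$:
$$
E(y_i) = X_i\beta, \qquad V(y_i) = W_i D W_i' + h_u^{-1} I_T = B_{1i},
$$
so that $y_i \mid \beta, D, h_u \sim N_T(X_i\beta, B_{1i})$. Combining this with the prior for $\beta$ gives
$\beta \mid y, D, h_u \sim N_{K_1}(\bar\beta, B_1)$, where
$$
B_1 = \Big[\sum_{i=1}^{n} X_i'B_{1i}^{-1}X_i + B_0^{-1}\Big]^{-1}, \qquad
\bar\beta = B_1\Big[\sum_{i=1}^{n} X_i'B_{1i}^{-1}y_i + B_0^{-1}\beta_0\Big].
$$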
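A direct translation of these moments into code is sketched below, with hypothetical function names and made-up inputs. The second function is an optional aside that is not part of the derivation above: it uses the Woodbury identity to obtain $B_{1i}^{-1}$ by inverting a $K_2 \times K_2$ matrix instead of a $T \times T$ one, and the example checks that both routes agree.

```python
import numpy as np


def marginal_beta_moments(y, X, W, D, hu, beta0, B0):
    """Return (beta_bar, B1) for beta | y, D, h_u with the b_i integrated out."""
    n, T, K1 = X.shape
    B0_inv = np.linalg.inv(B0)
    precision = B0_inv.copy()                 # accumulates sum_i X_i' B1i^{-1} X_i + B0^{-1}
    linear = B0_inv @ beta0                   # accumulates sum_i X_i' B1i^{-1} y_i + B0^{-1} beta0
    for i in range(n):
        B1i = W[i] @ D @ W[i].T + np.eye(T) / hu    # V(y_i)
        B1i_inv = np.linalg.inv(B1i)
        precision += X[i].T @ B1i_inv @ X[i]
        linear += X[i].T @ B1i_inv @ y[i]
    B1 = np.linalg.inv(precision)
    return B1 @ linear, B1


def woodbury_B1i_inv(Wi, D, hu):
    """B1i^{-1} = h_u I - h_u^2 W_i (D^{-1} + h_u W_i'W_i)^{-1} W_i' (Woodbury identity)."""
    T = Wi.shape[0]
    inner = np.linalg.inv(np.linalg.inv(D) + hu * Wi.T @ Wi)
    return hu * np.eye(T) - hu ** 2 * Wi @ inner @ Wi.T


# Made-up inputs for illustration.
rng = np.random.default_rng(0)
n, T, K1, K2 = 6, 4, 2, 2
X, W, y = rng.normal(size=(n, T, K1)), rng.normal(size=(n, T, K2)), rng.normal(size=(n, T))
D, hu = np.array([[0.5, 0.1], [0.1, 0.3]]), 4.0

beta_bar, B1 = marginal_beta_moments(y, X, W, D, hu, np.zeros(K1), 10.0 * np.eye(K1))
direct = np.linalg.inv(W[0] @ D @ W[0].T + np.eye(T) / hu)
via_woodbury = woodbury_B1i_inv(W[0], D, hu)
```

The Woodbury route matters mainly when $T$ is much larger than $K_2$; for the short panels discussed here either version is inexpensive.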
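At the $g$-th iteration of the Gibbs sampler for the panel data model:
(a) Sample $h_u^{(g)} \mid y, \beta^{(g-1)}, b^{(g-1)} \sim Ga(\alpha_1/2, \delta_1^{(g)}/2)$, where
$$
\alpha_1 = \alpha_0 + nT, \quad \text{and} \quad
\delta_1^{(g)} = \delta_0 + \sum_{i=1}^{n}\big(y_i - X_i\beta^{(g-1)} - W_i b_i^{(g-1)}\big)'\big(y_i - X_i\beta^{(g-1)} - W_i b_i^{(g-1)}\big).
$$
(b) Sample $(\beta, b)$ in one block as follows: sample $\beta^{(g)} \mid y, D^{(g-1)}, h_u^{(g)} \sim N_{K_1}(\bar\beta^{(g)}, B_1^{(g)})$,
marginally of $b$, where,
$$
B_{1i}^{(g)} = W_i D^{(g-1)} W_i' + (h_u^{(g)})^{-1} I_T,
$$
$$
B_1^{(g)} = \Big[\sum_{i=1}^{n} X_i'(B_{1i}^{(g)})^{-1}X_i + B_0^{-1}\Big]^{-1},
$$
$$
\bar\beta^{(g)} = B_1^{(g)}\Big[\sum_{i=1}^{n} X_i'(B_{1i}^{(g)})^{-1}y_i + B_0^{-1}\beta_0\Big].
$$
Then sample $b_i^{(g)} \mid y, \beta^{(g)}, D^{(g-1)}, h_u^{(g)} \sim N_{K_2}(\bar b_i^{(g)}, D_{1i}^{(g)})$ for $i = 1, \ldots, n$, where,
$$
D_{1i}^{(g)} = \big[h_u^{(g)} W_i'W_i + (D^{(g-1)})^{-1}\big]^{-1}, \qquad
\bar b_i^{(g)} = h_u^{(g)} D_{1i}^{(g)} W_i'\big(y_i - X_i\beta^{(g)}\big).
$$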
(c) Sample $D^{-1(g)} \mid b^{(g)} \sim W_{K_2}(\nu_1, D_1^{(g)})$, where,
$$
\nu_1 = \nu_0 + n, \qquad
D_1^{(g)} = \Big[D_0^{-1} + \sum_{i=1}^{n} b_i^{(g)} b_i^{(g)\prime}\Big]^{-1}.
$$
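Steps (a) to (c) can be combined into a compact sampler. The code below is a sketch under stated assumptions rather than a definitive implementation: the function name `gibbs_panel`, the array layout, the starting values, the prior settings, and the simulated data are all hypothetical. The draw of $b_i$ uses the normal conditional obtained by completing the square in the joint posterior, with covariance $[h_u W_i'W_i + D^{-1}]^{-1}$ and mean $h_u$ times that covariance times $W_i'(y_i - X_i\beta)$.

```python
import numpy as np
from scipy.stats import wishart


def gibbs_panel(y, X, W, beta0, B0, nu0, D0, alpha0, delta0, n_iter=400, seed=0):
    """Gibbs sampler sketch. y: (n, T); X: (n, T, K1); W: (n, T, K2)."""
    rng = np.random.default_rng(seed)
    n, T, K1 = X.shape
    K2 = W.shape[2]
    Xt, Wt = X.transpose(0, 2, 1), W.transpose(0, 2, 1)
    B0_inv, D0_inv = np.linalg.inv(B0), np.linalg.inv(D0)
    beta, b, D = np.zeros(K1), np.zeros((n, K2)), np.eye(K2)   # arbitrary starting values
    draws = {"beta": np.empty((n_iter, K1)), "hu": np.empty(n_iter),
             "D": np.empty((n_iter, K2, K2))}
    for g in range(n_iter):
        # (a) h_u | y, beta, b ~ Ga(alpha1/2, delta1/2); NumPy takes scale = 1/rate.
        resid = y - X @ beta - np.einsum("itk,ik->it", W, b)
        hu = rng.gamma((alpha0 + n * T) / 2, 2.0 / (delta0 + np.sum(resid ** 2)))

        # (b) beta | y, D, h_u, marginally of b.
        B1i_inv = np.linalg.inv(W @ D @ Wt + np.eye(T) / hu)   # (n, T, T)
        XtV = Xt @ B1i_inv                                     # (n, K1, T)
        B1 = np.linalg.inv((XtV @ X).sum(axis=0) + B0_inv)
        B1 = (B1 + B1.T) / 2                                   # remove round-off asymmetry
        beta_bar = B1 @ ((XtV @ y[:, :, None]).sum(axis=0)[:, 0] + B0_inv @ beta0)
        beta = rng.multivariate_normal(beta_bar, B1)

        # (b, continued) b_i | y, beta, D, h_u, independently over i.
        D1i = np.linalg.inv(hu * (Wt @ W) + np.linalg.inv(D))  # (n, K2, K2)
        r = (y - X @ beta)[:, :, None]                         # (n, T, 1)
        b_bar = hu * (D1i @ (Wt @ r))                          # (n, K2, 1)
        chol = np.linalg.cholesky(D1i)
        b = (b_bar + chol @ rng.standard_normal((n, K2, 1)))[:, :, 0]

        # (c) D^{-1} | b ~ W_{K2}(nu0 + n, D1).
        D1 = np.linalg.inv(D0_inv + b.T @ b)
        D_inv = wishart.rvs(df=nu0 + n, scale=(D1 + D1.T) / 2, random_state=rng)
        D = np.linalg.inv(np.atleast_2d(D_inv))

        draws["beta"][g], draws["hu"][g], draws["D"][g] = beta, hu, D
    return draws


# Simulated balanced panel with a random intercept and a random slope.
rng = np.random.default_rng(1)
n, T = 100, 5
beta_true, D_true, hu_true = np.array([1.0, -0.5]), np.diag([0.5, 0.3]), 4.0
X = np.stack([np.ones((n, T)), rng.normal(size=(n, T))], axis=2)
W = X.copy()
b_true = rng.multivariate_normal(np.zeros(2), D_true, size=n)
y = (X @ beta_true + np.einsum("itk,ik->it", W, b_true)
     + rng.normal(scale=hu_true ** -0.5, size=(n, T)))

draws = gibbs_panel(y, X, W, beta0=np.zeros(2), B0=100 * np.eye(2),
                    nu0=4, D0=np.eye(2), alpha0=2.0, delta0=2.0)
beta_hat = draws["beta"][100:].mean(axis=0)    # posterior means after a burn-in of 100
hu_hat = draws["hu"][100:].mean()
```

The batched matrix operations handle all $n$ units at once, which keeps the loop over Gibbs iterations as the only Python-level loop. The burn-in of 100 draws and the chain length of 400 are arbitrary choices for a quick check, not recommendations.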
References
Hausman, J. A. (1978), “Specification Tests in Econometrics,” Econometrica, 46, 1251–1271.
Vella, F. and Verbeek, M. (1998), “Whose Wages do Unions Raise? A Dynamic Model of
Unionism and Wage Rate Determination for Young Men,” Journal of Applied Econometrics,
13, 163–183.
Zellner, A. (1962), “An Efficient Method of Estimating Seemingly Unrelated Regressions and
Tests for Aggregation Bias,” Journal of the American Statistical Association, 57, 348–368.