CH10

The document discusses Bayesian econometric models for multivariate responses, focusing on the Seemingly Unrelated Regression (SUR) model and panel data models. It details the structure, formulation, and estimation methods for these models, including examples and the likelihood functions involved. The SUR model is characterized by correlated error terms across equations, while panel data models address observations on the same units over time with a focus on heterogeneity among units.

ECO545A: Bayesian Econometrics

March 26, 2025

Multivariate Responses
We will consider three examples of models in which the response variable is a vector: (1)
seemingly unrelated regression, (2) multivariate probit model (will not be covered), and (3)
panel data model.

Seemingly Unrelated Regression


The Seemingly Unrelated Regression (SUR) model was introduced by Zellner (1962) and
has been applied extensively. SUR models differ from panel data models in a distinct way.
Consider a sample of $j = 1, 2, \ldots, J$ individuals (or years). For each individual we have
one observation for each of $m = 1, 2, \ldots, M$ equations. The SUR model is usually applied to
data for which $M$ is small and $J$ is large, i.e., the number of units is small and the number of time
periods is large. For example, one may study investment expenditures by $M = 5$ investment
firms over $J = 40$ years (more in Example 1 below). In contrast, panel data models are applied to
data with a large number of units and a small number of time periods, i.e., $M$ is large and $J$ is
small (typically, $M$ is replaced by $n$ and $J$ by $T$, so the response is denoted by
$y_{it}$, where $i = 1, \ldots, n$ and $t = 1, \ldots, T$).
The first subscript of the response variable in the SUR model is usually associated with a
unit whose behaviour is expected to differ from that of other units and such differences are of
interest.
Example 1: Let $y_{mj}$ denote investment expenditure by firm $m$ in the $j$-th year. Here the
subscript $m$ identifies one of a small number of firms, and differences in the investment behaviour
of the firms are of research interest. The $j$ subscript indicates a year, and it is assumed that
there are a large number of observations on the investment expenditures of each firm. In fact,
there are enough observations to estimate one equation for each firm. However, it is the correlation
across firms in a given year that distinguishes the SUR model.
Example 2: Let $y_{mj}$ denote the score in test $m$ by individual $j$. Typically, the number of tests
is small but the number of individuals is large. Further, we assume correlation across test results
for a particular individual.
Example 3: Let $y_{mj}$ denote the expenditure on category $m$ by household $j$. Once again, the
number of categories is small and the number of households is large. Further, expenditures across
categories are correlated for a given household.

Each equation in the SUR model is represented by a linear regression model of the form:

$$y_{mj} = x_{mj}'\beta_m + \epsilon_{mj}, \qquad j = 1, \ldots, J, \quad m = 1, \ldots, M, \tag{1}$$

where $x_{mj}'$ is $(1 \times K_m)$, $\beta_m$ is $(K_m \times 1)$, and $\sum_{m=1}^{M} K_m = K$.
Note that we allow ex ante for a different set of regressors for each equation (hence the
subscript $m$ on $x$) and a different set of coefficients per equation (hence the subscript $m$ on $\beta$). For
a given individual, we can write down the $M$ observations as a block, i.e.,

$$y_j = X_j\beta + \epsilon_j, \qquad \epsilon_j \sim N(0_M, \Sigma_{M \times M}), \tag{2}$$

where $y_j = [y_{1j}, \ldots, y_{Mj}]'$ is a vector of dimension $M \times 1$, $\beta = [\beta_1', \ldots, \beta_M']'$ is a vector of dimension $K \times 1$, $\epsilon_j = [\epsilon_{1j}, \ldots, \epsilon_{Mj}]'$ is a vector of dimension $M \times 1$, and the covariate matrix and error covariance are, respectively,

$$
X_j = \begin{bmatrix} x_{1j}' & 0 & \cdots & 0 \\ 0 & x_{2j}' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{Mj}' \end{bmatrix}_{M \times K}, \qquad
\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1M} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{M1} & \sigma_{M2} & \cdots & \sigma_{MM} \end{bmatrix}_{M \times M}.
$$

As seen from the covariance structure, we allow the error terms associated with a given person to
be correlated across equations; hence the name seemingly unrelated regression. The covariance
matrix $\Sigma$ has $M(M+1)/2$ free elements. To keep the estimation tractable, we assume that all
individuals share the same $\Sigma$.

The SUR model in the matrix formulation can be written as,

$$y = X\beta + \epsilon, \qquad \epsilon \sim N_{JM \times 1}(0, \Omega), \tag{3}$$

where we have,

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{bmatrix}, \quad
X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_J \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_J \end{bmatrix}, \quad
\Omega = \begin{bmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{bmatrix}.
$$

Here, $y$ and $\epsilon$ are of dimension $JM \times 1$, $X$ is a $JM \times K$ matrix, $\beta$ is a vector of dimension $K \times 1$, and $\Omega$ is a matrix of dimension $JM \times JM$.
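The stacked objects above are easy to form numerically. The sketch below (hypothetical dimensions and simulated covariates; all names and values are my own choices, not from the text) builds each $X_j$ as an $M \times K$ block-diagonal matrix, stacks the $J$ blocks vertically, and forms $\Omega = I_J \otimes \Sigma$:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
M, J = 3, 40                       # M equations, J observations per equation
Km = [2, 3, 2]                     # regressors per equation; K = sum(Km)
K = sum(Km)

beta = rng.normal(size=K)          # illustrative coefficient vector
A = rng.normal(size=(M, M))
Sigma = A @ A.T + M * np.eye(M)    # a valid M x M error covariance

# For each j, X_j is the M x K block-diagonal covariate matrix of equation (2)
X_blocks, y_blocks = [], []
L = np.linalg.cholesky(Sigma)
for j in range(J):
    Xj = block_diag(*[rng.normal(size=(1, k)) for k in Km])  # M x K
    eps_j = L @ rng.normal(size=M)                           # eps_j ~ N(0, Sigma)
    X_blocks.append(Xj)
    y_blocks.append(Xj @ beta + eps_j)

X = np.vstack(X_blocks)            # JM x K stacked covariate matrix
y = np.concatenate(y_blocks)       # JM x 1 stacked response
Omega = np.kron(np.eye(J), Sigma)  # JM x JM block-diagonal error covariance
```

The Kronecker product `np.kron(np.eye(J), Sigma)` is exactly the block-diagonal $\Omega$ displayed above.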

The likelihood for the model is,

$$
\begin{aligned}
f(y|\beta, \Sigma) &= \prod_{j=1}^{J} (2\pi)^{-M/2} |\Sigma|^{-1/2} \exp\Big[-\frac{1}{2}(y_j - X_j\beta)'\Sigma^{-1}(y_j - X_j\beta)\Big] \\
&= (2\pi)^{-MJ/2} |\Sigma|^{-J/2} \exp\Big[-\frac{1}{2}\sum_{j=1}^{J}(y_j - X_j\beta)'\Sigma^{-1}(y_j - X_j\beta)\Big] \\
&= (2\pi)^{-MJ/2} |\Sigma|^{-J/2} \exp\Big[-\frac{1}{2}(y - X\beta)'\Omega^{-1}(y - X\beta)\Big]. \tag{4}
\end{aligned}
$$

Now $\Omega^{-1} = \mathrm{diag}[\Sigma^{-1}, \ldots, \Sigma^{-1}]$, because $\Omega$ is a block-diagonal matrix. Assuming the prior distributions $\beta \sim N(\beta_0, B_0)$ and $\Sigma^{-1} \sim W(\nu_0, R_0)$, the posterior distribution can be written as,

$$
\begin{aligned}
\pi(\beta, \Sigma|y) &\propto |\Sigma|^{-J/2} \exp\Big[-\frac{1}{2}\sum_{j=1}^{J}(y_j - X_j\beta)'\Sigma^{-1}(y_j - X_j\beta)\Big] \\
&\quad \times \exp\Big[-\frac{1}{2}(\beta - \beta_0)'B_0^{-1}(\beta - \beta_0)\Big] \\
&\quad \times |\Sigma|^{-\frac{\nu_0 - M - 1}{2}} \exp\Big[-\frac{1}{2}\mathrm{tr}(R_0^{-1}\Sigma^{-1})\Big].
\end{aligned}
$$

The derivation of the conditional distributions is quite straightforward. The conditional
posterior distribution for $\beta$ can be shown to be the following,

$$
\beta|\Sigma, y \sim N(\bar\beta, B_1), \quad \text{where} \quad
B_1 = \Big(\sum_{j=1}^{J} X_j'\Sigma^{-1}X_j + B_0^{-1}\Big)^{-1}, \qquad
\bar\beta = B_1\Big(\sum_{j=1}^{J} X_j'\Sigma^{-1}y_j + B_0^{-1}\beta_0\Big).
$$

To derive the conditional posterior distribution of $\Sigma^{-1}|y, \beta$, we use the property of the trace operator $Z'AZ = \mathrm{tr}(Z'AZ) = \mathrm{tr}(ZZ'A)$, where $Z$ is a column vector, to obtain,

$$\sum_{j=1}^{J}(y_j - X_j\beta)'\Sigma^{-1}(y_j - X_j\beta) = \mathrm{tr}\Big\{\sum_{j=1}^{J}(y_j - X_j\beta)(y_j - X_j\beta)'\,\Sigma^{-1}\Big\}.$$
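The trace identity is easy to verify numerically. The following sketch (arbitrary simulated values; `z` stands in for a residual $y_j - X_j\beta$ and `Sinv` for $\Sigma^{-1}$) checks that the quadratic form equals the trace expression:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4
z = rng.normal(size=M)                 # residual vector, e.g. y_j - X_j beta
A = rng.normal(size=(M, M))
Sinv = A @ A.T + M * np.eye(M)         # positive definite, stands in for Sigma^{-1}

quad = z @ Sinv @ z                    # scalar quadratic form z' Sinv z
tr = np.trace(np.outer(z, z) @ Sinv)   # tr(z z' Sinv)
```

The two quantities agree to machine precision, which is what makes the Wishart posterior derivation below go through.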

Using the above relation, we have,

$$
\Sigma^{-1}|y, \beta \sim W_M(\nu_1, R_1), \quad \text{where} \quad
\nu_1 = \nu_0 + J, \qquad
R_1 = \Big[R_0^{-1} + \sum_{j=1}^{J}(y_j - X_j\beta)(y_j - X_j\beta)'\Big]^{-1}.
$$

Algorithm 10.1 (Gibbs for SUR Model)

1. Choose a starting value $\Sigma^{-1(1)}$.

2. At the $g$-th iteration, draw $\beta^{(g)} \sim N_K(\bar\beta^{(g)}, B_1^{(g)})$, where

$$
B_1^{(g)} = \Big(\sum_{j=1}^{J} X_j'\Sigma^{-1(g-1)}X_j + B_0^{-1}\Big)^{-1}, \qquad
\bar\beta^{(g)} = B_1^{(g)}\Big(\sum_{j=1}^{J} X_j'\Sigma^{-1(g-1)}y_j + B_0^{-1}\beta_0\Big),
$$

and then draw $\Sigma^{-1(g)} \sim W_M(\nu_1, R_1^{(g)})$, where

$$
R_1^{(g)} = \Big[R_0^{-1} + \sum_{j=1}^{J}(y_j - X_j\beta^{(g)})(y_j - X_j\beta^{(g)})'\Big]^{-1}.
$$

For an Inverse-Wishart prior $\Sigma \sim IW(\nu_0, S_0)$, the conditional posterior is $\Sigma|y, \beta \sim IW(\nu_1, S_1)$, where $\nu_1 = \nu_0 + J$ and

$$
S_1 = S_0 + \sum_{j=1}^{J}(y_j - X_j\beta^{(g)})(y_j - X_j\beta^{(g)})'.
$$

Panel Data Models
Panel data consist of observations on the same units observed over several time periods. The
response variable is typically denoted $y_{it}$, for $i = 1, \ldots, n$, $t = 1, \ldots, T$, where $n$ is the sample
size and $T$ is the number of years over which the sample has been observed. If the number of years is the
same for all units, we have a balanced panel; otherwise, we have an unbalanced panel.
Panel data models are applied to data with a large number of units and a small number of
time periods, i.e., $n$ is large and $T$ is small. A typical panel dataset consists of a large number of
units, usually firms or households, observed over a time period that is too short to estimate
a separate regression for each unit. The identity of the individual units is of no inherent interest,
and the large number of units makes it impractical to estimate an individual variance for each unit
and covariances for each pair of units. The panel data model further assumes that the behaviour
of the units is independent at each time period, but there are differences across individuals that
persist over time. These differences are called heterogeneity, and they are modeled by a non-zero
covariance between the disturbances of a particular firm or household across time.

Example 1: The Vella and Verbeek (1998) study is based on panel data. The sample includes
observations on 545 young men who worked in each of the eight years 1980–87.
Example 2: Hausman (1978) considered log wages as a function of demographic variables
in a panel of 629 observations over six years.

The panel data model can be written as,

$$y_{it} = x_{it}'\beta + w_{it}'b_i + u_{it}, \qquad i = 1, \ldots, n; \; t = 1, \ldots, T, \tag{5}$$

where $x_{it}'$ is the vector of covariates for individual $i$ at time $t$ of dimension $1 \times K_1$, $\beta$ is a vector of fixed-effects parameters of size $K_1 \times 1$, and $w_{it}$ and $b_i$ are vectors of covariates and random-effects parameters that vary across $i$, each of dimension $K_2 \times 1$. To model heterogeneity, the $i$ subscript of $b_i$ allows each of the variables in $w_{it}$ to have a different effect on each observational unit. It is assumed that $u_{it} \sim N(0, h_u^{-1})$ and that $\mathrm{cov}(u_{it}, u_{js}) = 0$ unless $i = j$ and $s = t$. Note that the distribution of $u_{it}$ has been parameterized in terms of the precision.

We first stack the model for each $i$ and define $y_i = (y_{i1}, \ldots, y_{iT})'$, $X_i = (x_{i1}, \ldots, x_{iT})'$, $u_i = (u_{i1}, \ldots, u_{iT})'$, and $W_i = (w_{i1}, \ldots, w_{iT})'$. Thus, the panel data model can be written as,

$$y_i = X_i\beta + W_ib_i + u_i, \qquad u_i|h_u \sim N_T(0, h_u^{-1}I_T), \qquad b_i|D \sim N_{K_2}(0, D),$$

with priors

$$\beta \sim N_{K_1}(\beta_0, B_0), \qquad D^{-1} \sim Wish(\nu_0, D_0), \qquad h_u \sim Ga(\alpha_0/2, \delta_0/2).$$
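As a concrete illustration, the following sketch simulates data from model (5) with a random intercept and a random slope per unit. All dimensions and parameter values are arbitrary choices for illustration, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, K1, K2 = 50, 5, 3, 2              # units, periods, fixed/random effect dims

beta = np.array([1.0, -0.5, 0.25])      # fixed-effects vector (illustrative)
D = np.array([[0.5, 0.1],
              [0.1, 0.3]])              # random-effects covariance
h_u = 4.0                               # error precision: var(u_it) = 1 / h_u

y, X, W = [], [], []
for i in range(n):
    Xi = rng.normal(size=(T, K1))
    # heterogeneity in intercept and in one slope
    Wi = np.column_stack([np.ones(T), rng.normal(size=T)])
    bi = rng.multivariate_normal(np.zeros(K2), D)        # b_i ~ N(0, D)
    ui = rng.normal(scale=np.sqrt(1.0 / h_u), size=T)    # u_i ~ N(0, h_u^{-1} I_T)
    y.append(Xi @ beta + Wi @ bi + ui)
    X.append(Xi)
    W.append(Wi)
```

Each list entry corresponds to one unit's stacked block $y_i = X_i\beta + W_ib_i + u_i$.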

Consequently, the posterior distribution can be written as,

$$
\begin{aligned}
\pi(\beta, b, D, h_u|y) &\propto \Big\{\prod_{i=1}^{n} f(y_i|\beta, b_i, h_u)\,\pi(b_i|D)\Big\}\,\pi(\beta)\,\pi(D)\,\pi(h_u) \\
&\propto h_u^{nT/2} \exp\Big[-\frac{h_u}{2}\sum_{i=1}^{n}(y_i - X_i\beta - W_ib_i)'(y_i - X_i\beta - W_ib_i)\Big] \\
&\quad \times |D|^{-n/2} \exp\Big[-\frac{1}{2}\sum_{i=1}^{n} b_i'D^{-1}b_i\Big] \times \exp\Big[-\frac{1}{2}(\beta - \beta_0)'B_0^{-1}(\beta - \beta_0)\Big] \\
&\quad \times |D|^{-\frac{\nu_0 - K_2 - 1}{2}} \exp\Big[-\frac{1}{2}\mathrm{tr}(D_0^{-1}D^{-1})\Big] \times h_u^{\frac{\alpha_0}{2} - 1} \exp\Big[-\frac{\delta_0 h_u}{2}\Big],
\end{aligned}
$$

where $b = (b_1, \ldots, b_n)$. The conditional posterior distributions can be derived from the above
joint posterior distribution.

The conditional posterior $h_u|y, \beta, b \sim Ga(\alpha_1/2, \delta_1/2)$, where,

$$\alpha_1 = \alpha_0 + nT, \qquad \delta_1 = \delta_0 + \sum_{i=1}^{n}(y_i - X_i\beta - W_ib_i)'(y_i - X_i\beta - W_ib_i),$$

and the conditional posterior $D^{-1}|b \sim Wish_{K_2}(\nu_1, D_1)$, where,

$$\nu_1 = \nu_0 + n, \qquad D_1 = \Big[D_0^{-1} + \sum_{i=1}^{n} b_ib_i'\Big]^{-1}.$$

It is preferable to sample $(\beta, b)$ in one block from $\pi(\beta, b|y, D, h_u)$, rather than in two blocks
$\pi(\beta|b, y, D, h_u)$ and $\pi(b|\beta, y, D, h_u)$, because of possible correlation between them. This is done
as follows,

$$\pi(\beta, b|y, D, h_u) = \pi(\beta|y, D, h_u)\,\pi(b|\beta, y, D, h_u) = \pi(\beta|y, D, h_u)\prod_{i=1}^{n}\pi(b_i|\beta, y, D, h_u).$$

In other words, $\beta$ is sampled marginally of $b$, and then each $b_i$ is sampled conditional on $\beta$ and the other
parameters/variables.
The conditional posterior for $b_i$ is found to be,

$$
b_i|y, \beta, h_u, D \sim N_{K_2}(\bar{b}_i, D_{1i}), \quad i = 1, \ldots, n, \quad \text{where} \quad
D_{1i} = \big(h_uW_i'W_i + D^{-1}\big)^{-1}, \qquad \bar{b}_i = D_{1i}\big(h_uW_i'(y_i - X_i\beta)\big).
$$

To find the conditional posterior distribution for $\beta$, we write $y_i = X_i\beta + (W_ib_i + u_i)$ and
integrate out $b_i$ and $u_i$:

$$E(y_i) = X_i\beta, \qquad V(y_i) = W_iDW_i' + h_u^{-1}I_T \equiv B_{1i},$$

which implies $y_i|\beta, h_u, D \sim N_T(X_i\beta, B_{1i})$. Thus, it follows that,

$$
\pi(\beta|y, h_u, D) \propto \exp\Big[-\frac{1}{2}\sum_{i=1}^{n}(y_i - X_i\beta)'B_{1i}^{-1}(y_i - X_i\beta)\Big] \times \exp\Big[-\frac{1}{2}(\beta - \beta_0)'B_0^{-1}(\beta - \beta_0)\Big],
$$

which implies $\beta|y, h_u, D \sim N_{K_1}(\bar\beta, B_1)$, where,

$$
B_1 = \Big(\sum_{i=1}^{n} X_i'B_{1i}^{-1}X_i + B_0^{-1}\Big)^{-1}, \qquad
\bar\beta = B_1\Big(\sum_{i=1}^{n} X_i'B_{1i}^{-1}y_i + B_0^{-1}\beta_0\Big).
$$

Algorithm 10.3 (Gibbs Sampler for Panel Data Model)

1. Choose starting values $\beta^{(1)}, b^{(1)}, D^{(1)}$.

2. At the $g$-th iteration,

(a) Sample $h_u^{(g)}|y, \beta^{(g-1)}, b^{(g-1)} \sim Ga(\alpha_1/2, \delta_1^{(g)}/2)$, where,

$$\alpha_1 = \alpha_0 + nT, \qquad \delta_1^{(g)} = \delta_0 + \sum_{i=1}^{n}(y_i - X_i\beta^{(g-1)} - W_ib_i^{(g-1)})'(y_i - X_i\beta^{(g-1)} - W_ib_i^{(g-1)}).$$

(b) Sample $(\beta, b)$ in one block as follows: sample $\beta^{(g)}|y, D^{(g-1)}, h_u^{(g)} \sim N_{K_1}(\bar\beta^{(g)}, B_1^{(g)})$, marginally of $b$, where,

$$
B_{1i}^{(g)} = W_iD^{(g-1)}W_i' + (h_u^{(g)})^{-1}I_T, \qquad
B_1^{(g)} = \Big(\sum_{i=1}^{n} X_i'(B_{1i}^{(g)})^{-1}X_i + B_0^{-1}\Big)^{-1}, \qquad
\bar\beta^{(g)} = B_1^{(g)}\Big(\sum_{i=1}^{n} X_i'(B_{1i}^{(g)})^{-1}y_i + B_0^{-1}\beta_0\Big).
$$

Then, sample $b_i^{(g)}|y, \beta^{(g)}, h_u^{(g)}, D^{(g-1)} \sim N_{K_2}(\bar{b}_i^{(g)}, D_{1i}^{(g)})$, conditional on $\beta^{(g)}$, for $i = 1, \ldots, n$, where,

$$
D_{1i}^{(g)} = \big(h_u^{(g)}W_i'W_i + (D^{(g-1)})^{-1}\big)^{-1}, \qquad
\bar{b}_i^{(g)} = D_{1i}^{(g)}\big(h_u^{(g)}W_i'(y_i - X_i\beta^{(g)})\big).
$$

(c) Sample $D^{-1(g)}|b^{(g)} \sim Wish_{K_2}(\nu_1, D_1^{(g)})$, where,

$$\nu_1 = \nu_0 + n, \qquad D_1^{(g)} = \Big[D_0^{-1} + \sum_{i=1}^{n} b_i^{(g)}b_i^{(g)\prime}\Big]^{-1}.$$
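Algorithm 10.3 can likewise be sketched in Python. This is a minimal sketch under my own naming conventions (the function `gibbs_panel` and its arguments are not from the text); note that numpy's `Generator.gamma` takes a scale argument, so the $Ga(\alpha_1/2, \delta_1/2)$ rate parameterization becomes `scale=2/delta1`:

```python
import numpy as np
from scipy.stats import wishart

def gibbs_panel(y, X, W, beta0, B0, nu0, D0, alpha0, delta0, n_iter=500, seed=0):
    """Gibbs sampler for the panel data model (Algorithm 10.3).

    y: list of n arrays of shape (T,); X: list of (T, K1); W: list of (T, K2), K2 >= 2.
    """
    rng = np.random.default_rng(seed)
    n, T = len(y), y[0].shape[0]
    K2 = W[0].shape[1]
    B0inv, D0inv = np.linalg.inv(B0), np.linalg.inv(D0)
    beta, b, D = beta0.copy(), np.zeros((n, K2)), D0.copy()   # starting values
    draws = []
    for _ in range(n_iter):
        # (a) h_u | y, beta, b ~ Ga(alpha1/2, delta1/2); rate delta1/2 -> scale 2/delta1
        resid = [y[i] - X[i] @ beta - W[i] @ b[i] for i in range(n)]
        delta1 = delta0 + sum(r @ r for r in resid)
        h_u = rng.gamma(shape=(alpha0 + n * T) / 2, scale=2.0 / delta1)
        # (b) beta | y, D, h_u, marginally of b:  y_i ~ N(X_i beta, B1i)
        B1i_inv = [np.linalg.inv(W[i] @ D @ W[i].T + np.eye(T) / h_u)
                   for i in range(n)]
        B1 = np.linalg.inv(B0inv + sum(X[i].T @ B1i_inv[i] @ X[i] for i in range(n)))
        beta_bar = B1 @ (B0inv @ beta0 +
                         sum(X[i].T @ B1i_inv[i] @ y[i] for i in range(n)))
        beta = rng.multivariate_normal(beta_bar, B1)
        #     then b_i | y, beta, h_u, D, for each i
        Dinv = np.linalg.inv(D)
        for i in range(n):
            D1i = np.linalg.inv(h_u * W[i].T @ W[i] + Dinv)
            b_bar = D1i @ (h_u * W[i].T @ (y[i] - X[i] @ beta))
            b[i] = rng.multivariate_normal(b_bar, D1i)
        # (c) D^{-1} | b ~ Wish_{K2}(nu0 + n, D1)
        D1 = np.linalg.inv(D0inv + sum(np.outer(bi, bi) for bi in b))
        D = np.linalg.inv(wishart(df=nu0 + n, scale=D1).rvs(random_state=rng))
        draws.append((beta.copy(), h_u, D.copy()))
    return draws
```

As with the SUR sampler, a burn-in portion of the draws would be discarded in practice, and Cholesky solves can replace the explicit inverses for stability.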

References
Hausman, J. A. (1978), “Specification Tests in Econometrics,” Econometrica, 46, 1251–1271.

Vella, F. and Verbeek, M. (1998), "Whose Wages do Unions Raise? A Dynamic Model of
Unionism and Wage Rate Determination for Young Men," Journal of Applied Econometrics,
13, 163–183.

Zellner, A. (1962), “An Efficient Method of Estimating Seemingly Unrelated Regressions and
Tests for Aggregation Bias,” Journal of the American Statistical Association, 57, 348–368.
