
Estimating Vector Autoregressions with Panel Data

Author(s): Douglas Holtz-Eakin, Whitney Newey and Harvey S. Rosen


Source: Econometrica, Vol. 56, No. 6 (Nov., 1988), pp. 1371-1395
Published by: The Econometric Society
Stable URL: http://www.jstor.org/stable/1913103
Econometrica, Vol. 56, No. 6 (November, 1988), 1371-1395

ESTIMATING VECTOR AUTOREGRESSIONS


WITH PANEL DATA

BY DOUGLAS HOLTZ-EAKIN, WHITNEY NEWEY, AND HARVEY S. ROSEN¹

This paper considers estimation and testing of vector autoregression coefficients in panel
data, and applies the techniques to analyze the dynamic relationships between wages and
hours worked in two samples of American males. The model allows for nonstationary
individual effects, and is estimated by applying instrumental variables to the quasi-dif-
ferenced autoregressive equations. Particular attention is paid to specifying lag lengths,
forming convenient test statistics, and testing for the presence of measurement error. The
empirical results suggest the absence of lagged hours in the wage forecasting equation. Our
results also show that lagged hours is important in the hours equation, which is consistent
with alternatives to the simple labor supply model that allow for costly hours adjustment or
preferences that are not time separable.
KEYWORDS: Vector autoregression, panel data, causality tests, labor supply.

1. INTRODUCTION

VECTOR AUTOREGRESSIONS are now a standard part of the applied econometrician's tool kit. Although their interpretation in terms of causal relationships is controversial, most researchers would agree that vector autoregressions are a parsimonious and useful means of summarizing time series "facts."
To date, vector autoregressive techniques have been used mostly to analyze
macroeconomic time series where there are dozens of observations. (See, e.g.,
Taylor (1980), or Ashenfelter and Card (1982).) In principle, these techniques
should apply equally well to disaggregate data. For example, a vector autoregres-
sion can be used to summarize the dynamic relationship between an individual's
hours of work and wages (see below) or the dynamic relationship between a
government's revenues and expenditures (see Holtz-Eakin, Newey, Rosen (forth-
coming)). Unlike macroeconomic applications, however, the available time series
on micro units are typically quite short. Many of the popular panel data sets, for
example, have no more than ten or twelve years of observations for each unit.2
Also, it is possible that individual heterogeneity is an important feature of
disaggregate data. For these reasons, it is inappropriate to apply standard
techniques for estimating vector autoregressions to panel data.
The purpose of this paper is to formulate a coherent set of procedures for
estimating and testing vector autoregressions in panel data. Section 2 presents the
basic model, which builds upon Chamberlain (1983). Section 3 discusses identifi-
cation and gives methods of parameter estimation and testing. The estimation

1This research was supported in part by NSF Grants SES-8419238 and SES-8410249. We are
grateful to Joseph Altonji, the Editors, and three referees for useful comments. Joseph Altonji and
David Card graciously provided us with the data used in the empirical analysis.
2 Nevertheless, our techniques are appropriate for more "traditional" macroeconomic applications.
For example, Taylor (1980) examined and compared the time series properties of several key
macroeconomic variables for a number of European countries. Our methods could be used to execute
formal tests of similarity between them.

method is similar in spirit to that of Anderson and Hsiao (1982). Section 4 applies the methods to an example from labor economics; we investigate the
dynamic relationships between wages and hours worked. Section 5 provides a
brief summary and conclusion.

2. THE MODEL

In the usual time series context, equations of a bivariate autoregression typically take the form

(2.1)   y_t = α_0 + Σ_{l=1}^{m} α_l y_{t−l} + Σ_{l=1}^{m} δ_l x_{t−l} + u_t,

where the α's and δ's are the coefficients of the linear projection of y_t onto a constant and past values of y_t and x_t, and the lag length m is sufficiently large to ensure that u_t is a white noise error term. While it is not essential that the lag lengths for y and x are equal, we follow typical practice by assuming that they are identical.
Consistent estimation of the parameters of equation (2.1) requires many
observations of x and y values. In time series applications these observations
typically are obtained from a record of x and y over a long period of time. In
contrast, panel data usually have a relatively small number of time series
observations. Instead, there often are a great number of cross-sectional units,
with only a few years of data on each unit. To estimate the parameters of
equation (2.1) one must pool data from different units, a procedure which
imposes the constraint that the underlying structure is the same for each
cross-sectional unit.
The constraint that the time series relationship of x and y is the same for each
cross-sectional unit is likely to be violated in practice, so that it is desirable to be
able to relax this restriction. One way to relax the pooling constraint is to allow
for an "individual effect," which translates in practice into an individual specific
intercept in equation (2.1). Changes in the intercept of a stationary vector
autoregression correspond to changes in the means of the variables, so that
allowing for an individual effect allows for individual heterogeneity in the levels
of x and y. A second way to allow for individual heterogeneity is to allow the
variance of the innovation in equation (2.1) to vary with the cross-section unit.
Changes in the innovation variance of a vector autoregression correspond
to changes in the variance of the variables, so that allowing for cross-section
heteroskedasticity in the innovation variance allows for individual heterogeneity
in the variability of x and y. In what follows we allow for both an individual
effect and cross-section heteroskedasticity in the variance of the innovation.
It is likely that the level and variability of the variables are important sources
of individual heterogeneity, but it would also be nice to allow for individual
heterogeneity in the time series correlation pattern of x and y. In this context,
allowing for such heterogeneity is difficult because the variables on the right-hand


side of the equation are lagged endogenous variables. Here it is impossible to interpret the α's and δ's as means of parameters that vary randomly across individual cross-section units, although this interpretation of regression parameters is possible when the right-hand side variables are exogenous. (See Pakes and Griliches (1984).)
On the other hand, pooling cross-sectional units does have certain advantages.
First, the assumption of time stationarity can be relaxed. The presence of a large
number of cross-sectional units makes it possible to allow for lag coefficients that
vary over time. Second, the asymptotic distribution theory for a large number of
cross-sectional units does not require the vector autoregression to satisfy the
usual conditions that rule out unit and explosive roots. Of course, the presence of
an explosive process may lead to difficulties in interpreting the model. Neverthe-
less, it is still possible, for example, to use standard asymptotic distribution
theory to formulate valid tests for explosive behavior.3
A model with individual effects that relaxes the time stationarity assumption
can be obtained by modifying a model presented by Chamberlain (1983). Assume
that there are N cross-sectional units observed over T periods. Let i index the
cross-sectional observations and t the time periods. A model that is analogous to
equation (2.1), but allows for individual effects and nonstationarities across time
is
(2.2)   y_{it} = α_{0t} + Σ_{l=1}^{m} α_{lt} y_{i,t−l} + Σ_{l=1}^{m} δ_{lt} x_{i,t−l} + ψ_t f_i + u_{it}   (i = 1, ..., N; t = 1, ..., T),

where f_i is an unobserved individual effect and the coefficients α_{0t}, α_{1t}, ..., α_{mt}, δ_{1t}, ..., δ_{mt}, ψ_t are the coefficients of the linear projection of y_{it} on a constant, past values of y_{it} and x_{it}, and the individual effect f_i.
The model of equation (2.2) differs from that of Chamberlain (1983, p. 1263) in that Chamberlain avoids restricting the lag length by assuming that the first period of observation corresponds to the first period of the life of the individual unit. This assumption implies that the projection of y_{it} on all the observed past values of y_{it} and x_{it} (i.e., y_{i,t−1}, ..., y_{i1}, x_{i,t−1}, ..., x_{i1}) is equal to the projection on the entire past. That is, the lag length m in equation (2.2) varies with t according to the relation m(t) = t − 1. In practice, the entire history of each economic unit is not usually observed and some assumptions must be imposed to identify the time series relationship of x and y using the observed data.4 Our method takes this fact into account. The assumption embodied in equation (2.2) is that for each observed time period t the projection of y_{it} on the

3 The asymptotic theory does require that various moments of the data exist. Existence of moments
in models with unit or explosive roots requires an assumption concerning the initial conditions of the
data such as the assumption that the first point in the life of the individual units is a constant.
4 Pakes and Griliches (1984) have considered a similar identification issue in the context of

distributed lag models with exogenous regressors.


entire past depends only on the past m observations. In the next section we will discuss identification, estimation, and inference under this assumption on lag length, as well as ways of testing restrictions on the parameters of equation (2.2).

3. STATISTICAL INFERENCE

A. Identification
The specification of equation (2.2) as a projection equation implies that the error term u_{it} satisfies the orthogonality conditions

(3.1)   E[y_{is} u_{it}] = E[x_{is} u_{it}] = E[f_i u_{it}] = 0   (s < t).

These orthogonality conditions imply that lagged values of x and y qualify as instrumental variables for equation (2.2). Our analysis of identification will be restricted to use of these orthogonality conditions.5 Of course, if other restrictions are imposed on equation (2.2), such as absence of cross-section heteroskedasticity in the forecast error u_{it}, then it will be easier to identify the parameters.6 Such extra restrictions will often take the form of imposing additional cross-section or time-series homogeneity in the relationship of x and y, so that restricting attention to the orthogonality conditions (3.1) is consistent with allowing as much heterogeneity as possible.
In order to use the orthogonality conditions (3.1) to identify the parameters of
equation (2.2), the investigator must deal with the presence of the unobserved
individual effect, fi. It is well known that in models with lagged dependent
variables it is inappropriate to treat individual effects as constants to be esti-
mated.7 Instead, we can transform equation (2.2) to eliminate the individual
effect. Let r_t = ψ_t/ψ_{t−1}, and consider multiplying equation (2.2) for time period t − 1 by r_t and subtracting the result from the equation for period t. Collecting all x and y terms dated t − 1 or before on the right-hand side yields

(3.2)   y_{it} = a_t + Σ_{l=1}^{m+1} c_{lt} y_{i,t−l} + Σ_{l=1}^{m+1} d_{lt} x_{i,t−l} + v_{it}   (t = m + 2, ..., T),

5 Note that it would be valid to use nonlinear functions of lagged values of x and y as instruments only if equation (2.2) could be interpreted as a conditional expectation rather than a linear projection. We choose to work with a linear projection specification because specification of the form of the conditional expectation of y_{it} given lagged values of x and y is difficult. It seems likely that this conditional expectation would involve nonlinear functions of lagged values of x and y.
6 If the first and second moments of the data are the same for different cross-section units, then the minimum distance methods of MaCurdy (1981a) and Chamberlain (1983) could be used to estimate the parameters from cross-section moments. In this case it will be easier to identify the parameters, because the orthogonality conditions (3.1) do not involve all the cross-section moments.
7 One common technique is to compute the difference between each variable and its time mean (by cross-section unit) to eliminate the individual effect. See, e.g., Lundberg (1985). In the current context, this procedure will yield inconsistent estimates, even when the parameters are stationary, because of the presence of lagged endogenous variables. See Nickell (1981).


where

(3.3)   a_t = α_{0t} − r_t α_{0,t−1},
        c_{1t} = r_t + α_{1t},
        c_{lt} = α_{lt} − r_t α_{l−1,t−1}   (l = 2, ..., m),
        c_{m+1,t} = −r_t α_{m,t−1},
        d_{1t} = δ_{1t},
        d_{lt} = δ_{lt} − r_t δ_{l−1,t−1}   (l = 2, ..., m),
        d_{m+1,t} = −r_t δ_{m,t−1},
        v_{it} = u_{it} − r_t u_{i,t−1}.

Note that in the special case where r_t = 1 for each t, this transformation is simple differencing of equation (2.2). This has been suggested for use in estimation of univariate autoregressive models in panel data by Anderson and Hsiao (1982). More generally, this transformation is a quasi-differencing transformation that has been suggested by Chamberlain (1983). We will proceed by first discussing identification of the parameters of the transformed equation (3.2), and then discussing identification of the original parameters of equation (2.2) from (3.2).
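The mapping (3.3) from the original coefficients to the coefficients of the transformed equation can be written out directly. A minimal sketch in code (illustrative; the array conventions are assumptions, not the authors'):

```python
import numpy as np

def transformed_coefficients(alpha_t, alpha_tm1, delta_t, delta_tm1, r_t):
    """Map period-t and period-(t-1) lag coefficients of (2.2) into the
    coefficients c_{lt}, d_{lt} of the quasi-differenced equation (3.2),
    following (3.3).  alpha_t = [alpha_{1t}, ..., alpha_{mt}], etc."""
    alpha_t, alpha_tm1 = np.asarray(alpha_t, float), np.asarray(alpha_tm1, float)
    delta_t, delta_tm1 = np.asarray(delta_t, float), np.asarray(delta_tm1, float)
    m = len(alpha_t)
    c = np.empty(m + 1)
    d = np.empty(m + 1)
    c[0] = r_t + alpha_t[0]                          # c_{1t} = r_t + alpha_{1t}
    c[1:m] = alpha_t[1:] - r_t * alpha_tm1[:m - 1]   # c_{lt}, l = 2, ..., m
    c[m] = -r_t * alpha_tm1[m - 1]                   # c_{m+1,t} = -r_t alpha_{m,t-1}
    d[0] = delta_t[0]                                # d_{1t} = delta_{1t}
    d[1:m] = delta_t[1:] - r_t * delta_tm1[:m - 1]   # d_{lt}, l = 2, ..., m
    d[m] = -r_t * delta_tm1[m - 1]                   # d_{m+1,t} = -r_t delta_{m,t-1}
    return c, d
```

With r_t = 1 for every t, the same mapping reduces to the coefficients of a simply first-differenced equation.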
The orthogonality conditions of equation (3.1) imply that the error term of the transformed equation (3.2) satisfies the orthogonality conditions

(3.4)   E[y_{is} v_{it}] = E[x_{is} v_{it}] = 0   (s < t − 1).

Thus, the vector of instrumental variables that is available to identify the parameters of equation (3.2) is

Z_{it} = [1, y_{i,t−2}, ..., y_{i1}, x_{i,t−2}, ..., x_{i1}].

Using the orthogonality conditions (3.4), a necessary condition for identification is that there are at least as many instrumental variables as right-hand side variables.8 Since there are a total of 2m + 3 right-hand side variables in equation (3.2) and the dimension of Z_{it} is 2t − 3, this order condition reduces to t ≥ m + 3. Thus, we must have T ≥ m + 3 in order to estimate the parameters of equation (3.2) for any t.
Consider the identification of the original parameters. Note first that the
parameters of equation (3.2) involve only the ratio rt of the coefficients of the
individual effect. This is to be expected. Since changes in the level of these
coefficients correspond to changes in the scale of the individual effect, the level of
these coefficients is not identified. The original coefficients that can be identified
are therefore the lag coefficients and the ratios of the coefficients of the individual

8 See Fisher (1966). A sufficient condition for identification is that, in the limit, the cross-product matrix between the instruments and the right-hand side variables has rank equal to the number of right-hand side variables.


effect. We will ignore identification of the constant terms (i.e., the α_{0t}'s) since they are usually considered nuisance parameters.
To identify the original parameters there must be at least as many parameters
in the transformed equation as in the original equation. Note that it is possible to
estimate the parameters of (3.2) for a total of T - m - 2 time periods. Ignoring
the constant terms, there is a total of (T - m - 2)2(m + 1) parameters in the
transformed equation, which involve a total of (T - m - 2) + (T - m - 1)2m
original parameters. Thus, it will not be possible to identify the original parameters unless T − m − 2 ≥ 2m, i.e., unless the number of estimable time periods is at least as large as twice the lag length. Also, because many of the parameters of the transformed equation consist of nonlinear functions of the original parameters with complicated interactions across time periods, for some values of the parameters it may not be possible to recover the original parameters, even when T ≥ 3m + 2.
Importantly, there is no need to recover the original parameters to test certain interesting hypotheses. For example, the hypothesis that x does not (Granger) cause y conditional on the individual effect restricts δ_{lt} = 0 for each l and t. Since this further implies that d_{lt} = 0 for each l and t, this noncausality hypothesis can be tested by testing for zero coefficients for the lagged x variables in the transformed equation.
It is also useful to note that additional restrictions on the original parameters can aid their identification. When the restriction r_t = 1 for each t is imposed, the parameters of the estimable transformed equations involve only (T − m − 1)2m original parameters. There will be at least as many parameters in the estimable time periods for the transformed equation as original parameters when T − m − 2 ≥ m. Thus, when r_t = 1 and the number of estimable time periods is at least as large as the lag length, recovery of the original parameters from the transformed parameters is possible and, as is apparent from equation (3.3), straightforward.
Identification of the original parameters is easiest when the individual effect
coefficients and the lag coefficients are stationary. In this case the transformed
equation (3.2) can be written:

(3.5)   y_{it} − y_{i,t−1} = a_t + Σ_{l=1}^{m} α_l (y_{i,t−l} − y_{i,t−l−1}) + Σ_{l=1}^{m} δ_l (x_{i,t−l} − x_{i,t−l−1}) + v_{it}.

Here there are only 2m + 1 right-hand side variables, so that there are enough instruments to identify the parameters if t ≥ m + 2. In the stationary case it is possible to obtain estimates of the lag parameters when T ≥ m + 2.
A final case that is of interest occurs when measurement error is present.
Suppose that x_{it} and y_{it} are unobserved, and that instead we observe

(3.6)   x̃_{it} = x_{it} + e^x_{it},   ỹ_{it} = y_{it} + e^y_{it},

where e^x_{it} and e^y_{it} are measurement errors that are uncorrelated with all x and y observations and are uncorrelated across time. For simplicity we consider the implications of such measurement error for the stationary case. Substitution of


x_{it} and y_{it} from equation (3.6) in equation (3.5) yields

(3.7)   ỹ_{it} − ỹ_{i,t−1} = a_t + Σ_{l=1}^{m} α_l (ỹ_{i,t−l} − ỹ_{i,t−l−1}) + Σ_{l=1}^{m} δ_l (x̃_{i,t−l} − x̃_{i,t−l−1}) + ν_{it},

where

(3.8)   ν_{it} = v_{it} + (e^y_{it} − e^y_{i,t−1}) − Σ_{l=1}^{m} α_l (e^y_{i,t−l} − e^y_{i,t−l−1}) − Σ_{l=1}^{m} δ_l (e^x_{i,t−l} − e^x_{i,t−l−1}).

By the assumption that the measurement errors are uncorrelated across time, the vector

Z̃_{it} = [1, ỹ_{i,t−m−2}, ..., ỹ_{i1}, x̃_{i,t−m−2}, ..., x̃_{i1}]

will be uncorrelated with ν_{it} and thus qualify as the vector of instrumental variables for equation (3.7).9 Here there are only 2(t − m − 2) + 1 instrumental variables, so that the requirement that there are at least as many instrumental variables as right-hand side variables becomes t ≥ 2m + 2. Thus, it will only be possible to estimate the lag parameters in the stationary case with uncorrelated measurement error when T ≥ 2m + 2.

B. Estimation
The presentation requires some additional notation. Let

Y_t = [y_{1t}, ..., y_{Nt}]′   and   X_t = [x_{1t}, ..., x_{Nt}]′

be N × 1 vectors of observations on units for a given time period. Let

W_t = [e, Y_{t−1}, ..., Y_{t−m−1}, X_{t−1}, ..., X_{t−m−1}]

be the N × (2m + 3) matrix of right-hand side variables for equations (3.2), where e is an N × 1 vector of ones. Let

V_t = [v_{1t}, ..., v_{Nt}]′

be the N × 1 vector of transformed disturbance terms, and let

B_t = [a_t, c_{1t}, ..., c_{m+1,t}, d_{1t}, ..., d_{m+1,t}]′

be the (2m + 3) × 1 vector of coefficients for the equations. Then we can write equations (3.2) as10

(3.9)   Y_t = W_t B_t + V_t   (t = m + 3, ..., T).
9 The autoregressive structure of equation (3.7) will result in Z̃_{it} being correlated with the right-hand side variables. The use of instrumental variables to identify equation (3.7) under measurement error is similar to the methods for identification of panel data models with measurement error that have been suggested by Griliches and Hausman (1984).
10 Observe that we exclude t ≤ m + 2 because these equations are not identified. See the discussion above.


To combine all the observations for each time period, we can "stack" equations (3.9). Let

Y = [Y′_{m+3}, ..., Y′_T]′                        ((T − m − 2)N × 1),
B = [B′_{m+3}, ..., B′_T]′                        ((T − m − 2)(2m + 3) × 1),
V = [V′_{m+3}, ..., V′_T]′                        ((T − m − 2)N × 1),
W = diag[W_{m+3}, ..., W_T]                       ((T − m − 2)N × (T − m − 2)(2m + 3)),

where diag[·] denotes a block diagonal matrix with the given entries along the diagonal. With this, the observations for equations (3.2) can be written:

(3.10)   Y = WB + V.
So far the discussion is quite similar to that of a classical simultaneous
equations system where the equations are indexed by t and the observations by i.
However, here the instrumental variables are different for different equations. The matrix of variables which qualify for use as instrumental variables in period t is

Z_t = [e, Y_{t−2}, ..., Y_1, X_{t−2}, ..., X_1],

which changes with t. Consider the matrix Z defined as

Z = diag[Z_{m+3}, ..., Z_T].

The orthogonality conditions ensure that

plim_{N→∞} (Z′V)/N = plim_{N→∞} [(Z′_{m+3}V_{m+3})/N, ..., (Z′_T V_T)/N]′ = 0.

It follows directly that Z is the appropriate choice of instrumental variables for (3.10).11

To estimate B, premultiply (3.10) by Z′ to obtain:

(3.11)   Z′Y = Z′WB + Z′V.

We can then form a consistent instrumental variables estimator by applying GLS to this equation. As usual, such an estimator requires knowledge of the covariance matrix of the (transformed) disturbances, Z′V. This covariance matrix, Ω, is

11 Limits are taken as N → ∞, with T fixed.


given by

Ω = E{Z′VV′Z}.

Ω is not known and therefore must be estimated. To do so, let B̂_t be the preliminary consistent estimator of B_t formed by estimating the coefficients of the equation for time period t using two-stage least squares (2SLS) on the equation for each time period alone, using the correct list of instrumental variables.12 Using these preliminary estimates, form the vector of residuals for period t: V̂_t = Y_t − W_t B̂_t. A consistent estimator of (Ω/N) is then formed by13

(3.12)   (Ω̂/N)_{rs} = Σ_{i=1}^{N} (v̂_{ir} v̂_{is} Z′_{ir} Z_{is})/N,

where v̂_{it} (t = r, s) is the ith element of V̂_t and Z_{it} is the ith row of Z_t. Finally, Ω̂ is used to form a GLS estimator of the entire parameter vector, B, using all the available observations:

(3.13)   B̂ = [W′Z Ω̂^{−1} Z′W]^{−1} W′Z Ω̂^{−1} Z′Y.
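The full procedure — period-by-period 2SLS, the White-type weighting matrix of (3.12), and the pooled estimator (3.13) — can be sketched as follows. This is an illustrative implementation, not the authors' code: the caller is assumed to supply, for each estimable period, the length-N vector Y_t, the N × (2m + 3) matrix W_t, and the instrument matrix Z_t, and the 1/N scaling of (3.12) is dropped because it cancels in (3.13).

```python
import numpy as np
from scipy.linalg import block_diag

def panel_var_iv(Y_list, W_list, Z_list):
    # Step 1: preliminary period-by-period 2SLS (footnote 12) and residuals.
    resid = []
    for Y, W, Z in zip(Y_list, W_list, Z_list):
        Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)           # projection onto the columns of Z_t
        B_t = np.linalg.solve(W.T @ Pz @ W, W.T @ Pz @ Y)
        resid.append(Y - W @ B_t)

    # Step 2: White-type estimate of Omega = E[Z'VV'Z], block (r, s) as in (3.12).
    ZV = [Z * v[:, None] for Z, v in zip(Z_list, resid)]     # rows are Z_{it} * v_{it}
    Omega = np.block([[zr.T @ zs for zs in ZV] for zr in ZV])

    # Step 3: pooled estimator (3.13) applied to the stacked system Z'Y = Z'W B + Z'V.
    ZtW = block_diag(*[Z.T @ W for Z, W in zip(Z_list, W_list)])
    ZtY = np.concatenate([Z.T @ Y for Z, Y in zip(Z_list, Y_list)])
    A = np.linalg.solve(Omega, ZtW)                          # Omega^{-1} Z'W
    B_hat = np.linalg.solve(ZtW.T @ A, ZtW.T @ np.linalg.solve(Omega, ZtY))
    return B_hat, Omega
```

The same Ω̂ is held fixed when restricted versions of the model are estimated and tested below.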

(i) Imposing Linear Constraints


Stationarity of the individual effect and of the lag coefficients requires estimat-
ing B subject to linear constraints. The hypothesis that x does not cause y also
imposes linear constraints on B. A simple way to formulate such constraints is to
specify that

(3.14)   B = Hγ + G,

where γ is a k × 1 vector of parameters, H is a constant matrix with dimensions ((T − m − 2)(2m + 3) × k), and G is a constant vector of the same dimension as B.14 Since γ is the restricted parameter vector, it has dimension smaller than B. Replacing B by Hγ + G and subtracting WG from both sides of (3.10) gives

(3.15)   Ỹ ≡ Y − WG = WHγ + V ≡ W̃γ + V.

This equation has exactly the same form as (3.10). Thus, we can estimate γ as before, using the data matrices transformed by G and H.
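Continuing the same illustrative sketch, estimation under the restriction (3.14) amounts to transforming the stacked system as in (3.15) and reusing the weighting matrix estimated without restrictions; the matrices H and G are assumed to be built by the user (for example, with coefficient blocks repeated across periods to impose stationary lag coefficients):

```python
import numpy as np
from scipy.linalg import block_diag

def restricted_estimate(Y_list, W_list, Z_list, Omega, H, G):
    # Build Z'W (block diagonal across periods) and the stacked Z'Y.
    ZtW = block_diag(*[Z.T @ W for Z, W in zip(Z_list, W_list)])
    ZtY = np.concatenate([Z.T @ Y for Z, Y in zip(Z_list, Y_list)])
    # Transform the system as in (3.15): Y - WG on the left, WH on the right.
    ZtY_t = ZtY - ZtW @ G
    ZtW_t = ZtW @ H
    # GLS on the transformed system, using the Omega estimated without restrictions.
    A = np.linalg.solve(Omega, ZtW_t)
    gamma_hat = np.linalg.solve(ZtW_t.T @ A, ZtW_t.T @ np.linalg.solve(Omega, ZtY_t))
    return gamma_hat
```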
12 That is,

B̂_t = [W′_t Z_t (Z′_t Z_t)^{−1} Z′_t W_t]^{−1} W′_t Z_t (Z′_t Z_t)^{−1} Z′_t Y_t.

13 This procedure is an extension of White's (1980) heteroskedasticity consistent covariance matrix estimator. It is appropriate if E[v_{ir} v_{js}] = 0 for i, j, r, s such that i ≠ j, that is, error terms for different units are uncorrelated. Note that common factors are controlled by inclusion of time dummy variables in the estimating equations.
14 The rank of H must be k for the restrictions to be unique.


(ii) Efficiency
Several comments concerning the efficiency of B are in order. First, B is
efficient in the class of instrumental variable estimators which use linear combi-
nations of the instrumental variables. This follows directly from the results of
Hansen (1982). (See also White (1982).) However, just as 3SLS on an entire
system of equations may be more efficient than 3SLS on a subset of the
equations, it may be possible to improve the efficiency by jointly estimating both
the equation for yit given past values of y and x and the equation for xi, given
the history of x and y.
Second, recall that our procedure involves dropping the equations for the first
m + 2 time periods. When the parameters are nonstationary this procedure
involves no loss in efficiency. Although the equations that are dropped may be
correlated with the remaining equations, there are no cross equation restrictions,
and they are underidentified. When the parameters are stationary, dropping the
first m + 2 periods may involve some loss in efficiency. Because there are
cross-equation restrictions, efficiency can be improved by adding back t = m + 2
and t = m + 1 period equations, both of which have observable lags. Also, if
there is no heteroskedasticity (across time or individuals) in the innovation
variance for yit and xit, then all of the parameters for the joint Yit and xit
process can be estimated without the earliest cross-section moments, so that it
may be possible to further improve efficiency by using these moments. Cross-sec-
tion moment based estimation of moving average (but not autoregressive) time
series models in panel data has been considered by MaCurdy (1981a).

C. Hypothesis Testing
In this section we discuss the computation of statistics to test the hypotheses
that x does not cause y, that the parameters are stationary, that m is the correct
lag length, and other possible hypotheses. In each case, the test statistic revolves
around the sum of squared residuals, resulting in tests with chi-square distribu-
tion in large samples. Further, we consider two additional topics: tests when
parameters are not identified under the alternative hypothesis and sequences of
tests.
We consider only tests of linear restrictions on the estimated parameters, B. Consider the null hypothesis:

(3.16)   H_0: B = Hγ + G,

where the notation is as before. As we have shown, it is straightforward to impose this restriction during estimation. Let

(3.17)   Q = (Y − WB̂)′Z Ω̂^{−1} Z′(Y − WB̂)/N,
         Q_R = (Ỹ − W̃γ̂)′Z Ω̂^{−1} Z′(Ỹ − W̃γ̂)/N.

Q is the unrestricted sum of squared residuals and Q_R is the restricted sum of


squared residuals. Q and Q_R each have a chi-square distribution as N grows.15 By analogy with the F statistic in the standard linear model, an appropriate test statistic is

(3.18)   L = Q_R − Q.

L has the form of the numerator of the F statistic. By construction, the covariance matrix of the transformed disturbances is an identity matrix. As a result, L has a chi-square distribution with degrees of freedom equal to the degrees of freedom of Q_R minus the degrees of freedom of Q. When all of the parameters are identified under both the null and the alternative hypotheses, the degrees of freedom of Q is equal to the number of instrumental variables (the number of rows of Z′V in (3.11)) minus the number of parameters, i.e., the dimension of B. Similarly, the degrees of freedom of Q_R is equal to the number of instrumental variables minus the dimension of γ, so that L has degrees of freedom equal to the dimension of B minus the dimension of γ.16
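A sketch of the corresponding computation (illustrative code; Omega is assumed to be the unscaled weighting matrix from the earlier estimation sketch, and df is the number of restrictions, dim(B) − dim(γ), when all parameters are identified under both hypotheses):

```python
import numpy as np
from scipy.stats import chi2

def weighted_ssr(Z_list, resid_list, Omega):
    # Weighted sum of squared residuals as in (3.17): (Z'e)' Omega^{-1} (Z'e),
    # where e stacks the period-by-period residual vectors (unrestricted or
    # restricted) and Omega is the weighting matrix from the estimation sketch.
    ZtE = np.concatenate([Z.T @ e for Z, e in zip(Z_list, resid_list)])
    return float(ZtE @ np.linalg.solve(Omega, ZtE))

def l_test(Q_unrestricted, Q_restricted, df):
    # L = Q_R - Q of (3.18), compared with a chi-square on df degrees of freedom.
    L = Q_restricted - Q_unrestricted
    return L, 1.0 - chi2.cdf(L, df)
```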
So far we have restricted our discussion to testing linear hypotheses concerning
the coefficients of a single equation of a vector autoregression. In some contexts
hypotheses that involve the coefficients of more than one equation and/or are
nonlinear may be of interest. The case of linear hypotheses on the coefficients of
more than one equation can be handled by simply stacking together the time
periods for the several equations and proceeding in the manner we have dis-
cussed. The case of nonlinear restrictions can be handled by formulating the constraints as B = H(γ), estimating γ by nonlinear GLS on equation (3.11), with H(γ) in place of B, and forming the test statistic as before.

(i) Unidentified Parameters
When executing the tests, it is often the case that some parameters are not
identified under the alternative hypothesis. For example, under the null hypothe-
sis that x does not cause y, lagged x's can be used as instrumental variables for
lagged y's. This is because lagged x's will be correlated with lagged y's via the
individual effect. Use of these instruments permits us to identify the parameters
in (3.11). Under the alternative hypothesis, the greater number of parameters
means that not all of the parameters are identified. Nonetheless, a test of the null

15 To see this, let P be a matrix such that P′P = Ω̂^{−1}. Then premultiplying (3.11) by P results in

PZ′Y/√N = (PZ′W/√N)B + (PZ′V/√N).

Note that asymptotically, the disturbance PZ′V/√N is normally distributed with a covariance matrix equal to the identity matrix. As usual, sums of these squared residuals will have a chi-square distribution.
16 L can be thought of as the extension of the Gallant and Jorgenson (1979) test statistic for 3SLS
to this application. Of course, we could use other asymptotically equivalent test statistics to test the
null hypothesis. In fact, the well known Wald test is numerically equivalent to our L. Newey and
West (1987) discuss the relationship between L and other test statistics, including regularity
conditions.


hypothesis is still possible. The method is analogous to conducting a Chow test with insufficient observations.17 (See Fisher (1966).)

In more general notation, suppose that the parameters of the equation

Y_s = W_s B_s + V_s

are not identified (in the absence of the restrictions imposed by the null hypothesis) for time period s, i.e., Z_s has fewer columns than B_s has elements (and, hence, fewer than W_s has columns). That is, in the equation

(3.19)   Z′_s Y_s = Z′_s W_s B_s + Z′_s V_s,

the number of rows in Z′_s Y_s is fewer than the number of parameters in B_s.

The appropriate test statistic once again uses the difference between the restricted and unrestricted sum of squared residuals, but care must be taken in constructing the covariance matrix Ω. Since the same covariance matrix must be used when computing both the restricted and unrestricted sum of squared residuals, the following procedure is appropriate.
First obtain the restricted sum of squares, incorporating the fact that B_s is identified under the null hypothesis by adding equation (3.19) to the list of equations to be estimated. Let B* = [B′, B′_s]′ be the coefficients for the equations for all time periods. The parameters B are identified under either hypothesis, but those for time period s are not. Consider the null hypothesis

(3.20)   H_0: B* = Hγ + G,

where the elements of γ are identified. Using similar notation, let

V* = [V′, V′_s]′,   Y* = [Y′, Y′_s]′,
W* = diag[W, W_s],   Z* = diag[Z, Z_s].

Under the null hypothesis, we may add equation (3.19) to equation (3.11) as:

(3.21)   Z*′Ỹ* = Z*′W̃*γ + Z*′V*,

where Ỹ* = Y* − W*G and W̃* = W*H.

Next estimate the parameters, B*, and the covariance matrix, Ω*, using the procedure described above.

To obtain the unrestricted sum of squares and the appropriate test statistic, only those equations identified under the alternative hypothesis are employed. Accordingly, the appropriate estimate of the covariance matrix, Ω, is a submatrix of the covariance matrix estimated under the null hypothesis. The desired submatrix is that for equations identified under the alternative hypothesis.18

17 The analogy is not exact because we consider more general hypotheses than simply hypotheses which impose equality across equations and because the joint covariance matrix across (3.11), above, and (3.19), below, is not diagonal.
18 Importantly, the submatrix must be obtained from the estimated covariance matrix, Ω*, prior to inverting the matrix and constructing the unrestricted sum of squares.


As before, Q_R − Q will have a chi-square distribution in large samples. In this instance, the degrees of freedom is given by:

(3.22)   [dim(Z*′V*) − dim(γ)] − [dim(Z′V) − dim(B)].

(ii) Sequences of Tests


Two important questions in this framework are whether the data are consistent
with a lag of length m and whether x causes y. It seems natural to nest the
hypothesis of noncausality within the hypothesis about the lag length. That is, it
makes sense to think of testing for noncausality conditional upon the outcome of
a test for the lag length. When hypotheses are nested in this manner, we can
construct a sequence of test statistics which will be (asymptotically) statistically
independent. This permits us to isolate the reason for the rejection of the joint
hypothesis.
To see how such a sequence is constructed, consider the two hypotheses

H_1: B = Hγ + G,

and the second hypothesis, nested within H_1,

H_2: γ = H̃γ̃ + G̃.

Let Q be the unrestricted sum of squares, Q_{R1} the restricted sum of squares from imposing H_1, and Q_{R2} the sum of squares from imposing both H_1 and H_2, i.e., the restriction

B = HH̃γ̃ + (HG̃ + G).


Then Q_{R1} − Q is the appropriate test statistic for testing H_1 and Q_{R2} − Q_{R1} is the appropriate statistic for testing H_2 conditional upon H_1 being true. Furthermore, it is the case that the two statistics are asymptotically independently distributed.19

The significance level of a joint test of H_1 and H_2 may be determined. Suppose that the test consists of rejecting H_1 and H_2 if either statistic is too large. Let the first test have significance level α_1, and the second α_2. The significance level of the joint test is

α_1 + α_2 − α_1α_2.20

For example, with α_1 = α_2 = 0.05, the joint test has significance level 0.05 + 0.05 − 0.0025 = 0.0975.

Notice that, if H_1 is accepted, we can infer the correctness of H_2 from whether or not the test statistic for H_2 is too large. However, if H_1 is rejected we can say nothing about H_2 because it is nested within H_1.

19 This result is a simple extension of similar results for the likelihood ratio test for maximum likelihood.
20 A similar procedure based upon Wald tests is discussed by Sargan (1980) in the closely related context of testing for dynamic specification of time series models.


4. AN EXAMPLE

In this section we demonstrate the techniques described above in a dynamic


analysis of the relationship between annual hours worked and hourly earnings.

A. The Issues
The conventional approach to analyzing the relationship between hours worked
and the wage rate is to specify and estimate a model in which hours worked in a
given period depend upon that period's wage rate. Implicitly, past hours and
wages are assumed to have no impact on current hours. Similarly, the possibility
that the past history of wages and hours affects the current wage is ruled out.
However, on theoretical grounds it is quite plausible to expect intertemporal
relationships between wages and hours worked. For example, maximization of
utility in some life cycle models leads to labor supply functions which depend on
wages in other periods. (See, e.g., MaCurdy (1983).) Moreover, if there are costs
to adjusting hours of work in response to changes in wages, one might expect that
past hours of work would help predict current hours of work. At the same time,
some human capital accumulation models suggest that present wage rates depend
on past hours of work. (As hours increase, so does expertise on the job, leading to
a higher subsequent wage.) Alternatively, one can imagine incentive schemes that
link a worker's current wage rate to his past hours of work. (Hamilton (1986)
argues that such schemes may help explain the behavior of medical interns,
associates in law firms, and assistant professors.)
To fix ideas, consider the equation pair

(4.1a)   h_{it} = α^h_t + β w_{it} + f^h_i + ε^h_{it},

(4.1b)   w_{it} = α^w_t + δ_1 w_{i,t−1} + ... + δ_m w_{i,t−m} + f^w_i + ε^w_{it},

where
(4.2)   0 = E[ε^h_{it} | f^h_i, f^w_i, h_{i,t−1}, w_{i,t−1}, h_{i,t−2}, w_{i,t−2}, ...]
          = E[ε^w_{it} | f^h_i, f^w_i, h_{i,t−1}, w_{i,t−1}, h_{i,t−2}, w_{i,t−2}, ...],
and h_{it} is the natural log of hours worked for individual i in period t, w_{it} is the natural log of the wage of individual i in period t, and f^h_i and f^w_i are unobserved individual effects. Equation (4.1a) is similar to a life cycle labor supply equation derived by MaCurdy (1981b) for a particular specification of preferences. In this equation f^h_i represents the marginal utility of lifetime income (see also Heckman and MaCurdy (1980) and Browning, Deaton, and Irish (1985)), plus other individual specific variables. Unlike MaCurdy's (1981b) model, this equation imposes the strong restriction that ε^h_{it} is serially uncorrelated. Of course, variables which are a sum of a function of t and a function of i, such as experience, are allowed for, since they would be absorbed by the time specific term α^h_t and the individual specific term f^h_i.
Equation (4.1b) is a wage forecasting equation. An important feature of this
equation is that lagged hours are excluded. If lagged hours were of use in
predicting wages, as might be the case in some of the scenarios previously


discussed, then the individual would take into account the effect of today's choice
of hours on tomorrow's wages, and the labor supply equation would take a
different form.
Substituting w_{it} from equation (4.1b) into equation (4.1a) gives

(4.3)   h_{it} = (α^h_t + βα^w_t) + βδ_1 w_{i,t−1} + ... + βδ_m w_{i,t−m} + (f^h_i + βf^w_i) + (ε^h_{it} + βε^w_{it}),

which together with equation (4.1b) gives a VAR of the form considered in
Section 2. Note that lagged hours is excluded from equation (4.1a) and (4.3) and
that cross-equation restrictions are present. Evidence of the presence of lagged
hours in either equation might therefore be interpreted as evidence against this
specification. The presence of lagged hours in the wage equation might be
interpreted as presence of the kind of human capital or incentive effects previ-
ously mentioned, while the presence of lagged hours in the hours equations might
occur because of preferences that are not time separable or costs of adjusting
hours. Of course, the presence of lagged hours in the hours equation could also
be due to the omission of relevant variables or the violation of the assumption of
no serial correlation in ε^h_{it}. Since serial correlation is often thought to be present
in such models, this perhaps reduces the substantive implications of the finding
that lagged hours appears in the hours equation.
The important point here is that this model and similar models imply the
presence of dynamic interrelationships between wages and hours, and these
interrelationships can be investigated using panel data on wages and hours. This
fact, of course, has been recognized by earlier investigators. Lundberg (1985)
used panel data to test whether hours Granger-cause wages. However, her
estimation procedure involved taking deviations from means to account for the
presence of individual effects. As we argued in Section 2, such a procedure leads
to inconsistent estimates. Abowd and Card (1986) analyzed the time series
properties of the first differences in hours and earnings. Like MaCurdy (1981b)
and Lundberg, they assumed that the individual effect was stationary. Moreover,
although it would be possible to work backward from their estimates to learn
about the time series properties of the levels, this would be extremely cumber-
some, and they made no attempt to do so. In contrast, the procedures developed
in Section 3 allow us to obtain consistent estimates of the time series properties
of the levels of wages and hours without having to impose stationary individual
effects, and without having to employ difficult nonlinear methods.

B. The Data
We estimate equations for wages and hours using a sample of 898 males from
the Panel Study of Income Dynamics (PSID) covering the years 1968 to 1981.21
21
Our data include the Survey of Economic Opportunity (SEO) subsample, which oversamples low
income households. We performed all of the tests reported in the next section deleting the SEO
subsample. The results, which are available upon request, do not in general differ substantially from
those presented below. For the one exception, see footnote 22.


We study two variables for each individual. First is the log of the individual's annual hours of work ("hours", denoted h_{it}), and second is the log of his annual average hourly earnings ("wages", denoted w_{it}). (For a complete description of
the data, see Altonji and Paxson (1986).) As discussed below, we also check some
of our results using data from the National Longitudinal Survey of Men 45-59.
The wage variable in both data sets is constructed by dividing total earnings by
hours worked. As a result, to the extent that measurement errors are present, they
will be correlated across variables. While this presents a problem for full-infor-
mation methods, the single equation techniques used here are unaffected by this
correlation. Finally, to the extent that measurement error induces a serial
correlation in the composite error terms of the autoregression, this problem will
reveal itself as a correlation between instrumental variables and the transformed
errors.

C. Estimation and Testing


Using the PSID data, we estimate two equations, one for hours and one for
wages. On the right side of each are lags of both wages and hours. We conduct
tests for parameter stationarity, minimum lag length, and causality or exclusion
restrictions.
While in principle it is desirable to begin by specifying an arbitrarily long
initial lag length, this poses a problem in practice. As additional lags are
specified, the block structure of the matrix of instrumental variables causes the
size of the weighting matrix (Ω̂ in equation (3.12) above) to grow rapidly. For
such large matrices, standard numerical procedures for inversion may yield
unsatisfactory results. Therefore we initially assume a lag length m = 3, leading
to four lags in the quasi-differenced reduced form. The wage and hours equations
are estimated for the years 1977 to 1981.

(i) Wage Equation


Results for the wage equation are presented in Table I. The first step is to test
for parameter stationarity; i.e., both the individual effects and the lag parameters
in the equation are the same for each year. As the second line of the table shows,
the chi-square statistic for this test (26.22) indicates that one cannot reject this
hypothesis at any level of significance less than roughly 80%. Thus, the ap-
propriate specification of the wage equation is a first-differenced form containing
at least three lags each of wages and hours.
Column (1) of Table II shows estimates of the wage equation assuming
parameter stationarity, but with no other constraints imposed. The only parame-
ter that is statistically significant is that of the first lagged wage. Not only are the
other coefficients insignificant, they are relatively small in absolute value as well.
None of the other coefficients is more than about one-third the coefficient on the
first lagged wage, in absolute value. While suggestive, these observations do not
tell us which lag length is most consistent with the data. It is necessary to use the


TABLE I
WAGE EQUATION

                               Q        L      DF      p
(i)   m = 3                   0.00      -       0      -
(ii)  All Parameters
      Stationary             26.22    26.22    34    0.828
(iii) m = 2                  27.14     0.92     2    0.631
(iv)  m = 1                  29.31     2.17     2    0.338
(v)   m = 0                  43.40    14.09     2    0.001
(vi)  Exclude Hours,
      m = 1                  29.94     0.63     1    0.427

TABLE II
UNCONSTRAINED PARAMETER ESTIMATES^a

                 (1)               (2)
             Wage Equation    Hours Equation
Δh_{t−1}        0.0623            0.145
               (0.0476)          (0.0262)
Δw_{t−1}        0.183             0.00116
               (0.0631)          (0.0385)
Δh_{t−2}        0.0189           −0.00489
               (0.0276)          (0.0158)
Δw_{t−2}        0.0359           −0.0455
               (0.0320)          (0.0190)
Δh_{t−3}        0.0200           −0.0158
               (0.0234)          (0.0202)
Δw_{t−3}       −0.00328          −0.00185
               (0.0245)          (0.0168)

^a Figures in parentheses are standard errors.

methods of Section 3.C to conduct the appropriate tests. The lag length results
are recorded in lines (iii) through (v) of Table I, which show the results for the
sequence m = 2, m = 1, and m = 0, respectively. The results provide no evidence
that the wage equation contains more than a single lag of hours and wages.
Finally, we conduct a test of the hypothesis that hours do not cause wages.
Line (vi) of Table I shows that one cannot reject the hypothesis that lagged hours
may be excluded from the wage equation. The estimate of this single autoregres-
sive parameter is shown in column (1) of Table III. Note that the exclusion of
lagged hours from the wage equation rejects the notion that past hours of work
affect the current wage. To the extent that workers face a market locus of hours
and wages, it contains at most contemporaneous tradeoffs between hours and
wages.

(ii) Hours Equation


To complete the investigation, we perform a symmetric set of tests for the
specification of the hours equation. The results are reported in Table IV. As was


TABLE III
CONSTRAINED PARAMETER ESTIMATES^a

                 (1)           (2)            (3)
              Wage Eq.     Hours Eq.      Hours Eq.
                            (m = 2)        (m = 1)
Δh_{t−1}         −           0.156          0.156
                            (0.0206)       (0.0175)
Δh_{t−2}         −           0.002            −
                            (0.0181)
Δw_{t−1}       0.135        −0.001            −
              (0.0368)      (0.05)
Δw_{t−2}         −          −0.045            −
                            (0.0174)

^a Figures in parentheses are standard errors.

TABLE IV
HOURS EQUATION

                               Q        L      DF      p
(i)   m = 3                   0.00      -       -      -
(ii)  Parameters
      Stationary             26.69    26.69    34    0.810
(iii) m = 2                  27.33     0.64     2    0.726
(iv)  m = 1                  34.09     6.76     2    0.034
(v)   Exclude Wages,
      m = 2                  37.47    10.14     2    0.006
(vi)  Exclude Wages,
      m = 1                  37.62     3.53     1    0.060

the case with the wage equation, we cannot reject the hypothesis that the
appropriate specification is a first-differencedequation with constant lag parame-
ters (see line (ii)). The unconstrained parameter estimates for this specification
are presented in column (2) of Table II. As in the wage equation, the strongest
effect both from the point of view of the absolute value of the coefficient and
statistical significance, is the own first lag. However, while in the wage equation
nothing else seems to "matter," in the hours equation, the second lag on wages is
statistically significant.22 Indeed, as in Table IV one cannot reject the hypothesis
that m = 2 (see line iii). Further restrictions depend, however, on the chosen level
of significance. As line (iv) indicates, one cannot reject the hypothesis that m = 1
at the 1% level. However, this conclusion is reversed using a 5% significance level.
Therefore, we test the hypothesis that lagged wages do not cause hours
conditional on both m = 1 and m = 2. Using m = 2 (and, thus, adopting a 5%
significance level) one can reject the hypothesis that lagged wages may be
excluded from the equation for hours (line (v)). The parameter estimates for this
AR(2) model are shown in column (2) of Table III. Using the 1% level of
22
When the SEO subsample was excluded, the data did not reject the hypothesis that lagged wages
could be excluded from the hours equation.


significance leads to different results. As shown in line (vi) of Table IV, one
cannot reject the hypothesis that lagged wages may be excluded from the AR(1)
specification of the hours equation at a 5% significance level. The resulting
parameter estimate is shown in column (3) of Table III.
Thus, using a 1% significance level, one is left with a parsimonious representa-
tion of the dynamic behavior of hours and wages. Both variables may be
represented as autonomous autoregressive processes with a single lag. The robust-
ness of this result is examined below.

(iii) Deviations from Time Means


It is interesting to determine whether an inappropriate statistical technique
would lead to substantively different results. Recall that Lundberg tested for
intertemporal relationships between wages and hours by conducting F-tests for
exclusion of wages and hours from a VAR in which individual effects are
removed by measuring all variables as deviations from individual time means. We
applied this method to the PSID data. Specifically, we estimated a VAR with
constant parameters and three lags of wages and hours, measuring all variables as
deviations from time means. In direct contrast to the results presented above, this
procedure indicates that hours Granger-cause wages and wages Granger-cause
hours. The F statistic for the former test is 11.4 and for the latter 4.0. Both are
significant at the 1% level. In short, in this context using an inappropriate
estimation technique can lead to serious errors.

(iv) Measurement Error
The estimation procedure used in the wage and hours equations makes no
special allowance for the possibility of measurement error. Altonji (1986) has
estimated that a large part of the yearly variation in PSID data is due to
measurement error. As noted in Section 3, the estimation procedure may be
modified to accommodate measurement error by simply using a different set of
instrumental variables.
We examine the effect of measurement error on our results by re-estimating the
final form of both the wage and hours equations. In order to isolate the effect of
measurement error, we focus on the correlation between the composite error term
and the instrumental variables. For the AR(1) specification, measurement error
will produce a correlation between instrumental variables dated t - 2 and the
composite error. In the absence of measurement error, no such correlation will be
present. (See equation (3.8).) We estimate the equation for wages using two
different sets of instrumental variables: (i) lagged wages dated both t - 2 and
t- 3 and (ii) lagged wages dated t- 3.23 We estimate the hours equation using
the corresponding sets of instrumental variables. One can formally test the null
hypothesis of no correlation between the instrumental variables dated t - 2 and
23
Note that this set of instruments is more restrictive than that used in our previous estimations.
This change is an attempt to increase the power of the test for measurement error.


TABLE V
MEASUREMENT ERROR CORRECTION^a

                        Wage           Hours
                      Equation        Equation
(i)   Δh_{t−1}            −             0.169
                                       (0.0279)
      Δw_{t−1}          0.179             −
                       (0.050)
(ii)  Δh_{t−1}            −             0.170
                                       (0.224)
      Δw_{t−1}          0.460             −
                       (0.172)
(iii) χ²(5)             6.23            1.427
                     (p = 0.284)     (p = 0.921)

^a Figures in parentheses are standard errors. Estimates in part (i) of the table use variables dated t − 2 and t − 3 as instrumental variables; the estimates in part (ii) use only variables dated t − 3. Part (iii) contains the test statistic for a test of the null hypothesis of no correlation.

the error terms-and thus no measurement error-using a generalized method of


moments test. (See Holtz-Eakin (forthcoming).)
The results are presented in Table V. Part (i) of the table shows the parameter
estimates using instrumental variables lagged t- 2 and t- 3. Part (ii) gives the
alternative parameter estimates using only variables dated t- 3 as instrumental
variables. Part (iii) contains the test statistic for a test of the null hypothesis of no
correlation. The statistic is distributed as a chi-square with five degrees of
freedom. The significance level of the test is shown in parentheses.
As the p-value indicates, the test of the null hypothesis of no correlation fails
to reject at conventional significance levels.24 It is important to note, however,
that the estimated autoregressive parameter in the wage equation changes sub-
stantially with the measurement error correction. It increases from 0.179 to 0.460
when wages dated t- 2 are excluded.
Turning to the hours equation, we find that the estimate of the autoregressive
parameter for hours does not vary with the measurement error correction,
although the precision of the estimate is affected. Not surprisingly, the null
hypothesis of no measurement error in the hours equation cannot be rejected.
In sum, our attempt to correct for measurement error produces mixed signals.
On one hand, changing the set of instrumental variables to allow for measure-
ment error can have an important effect on the parameter estimates. On the
other, the formal test suggests that this difference is not statistically significant.
This suggests that the test may have low power in this particular application.

(v) Evidence from the NLS


As another check of the robustness of these results, we examine data on wages
and hours from the National Longitudinal Survey of Men 45-59 (NLS). The

24 A Hausman test on the difference between the two coefficients also fails to reject the hypothesis,
although the p value of about 0.10 provides somewhat stronger evidence against the null hypothesis.


TABLE VI
NLS RESULTS

                                          Q        L       DF       P

Hours Equation
  m = 1, All Parameters Stationary      21.21      -       16     0.170
  Exclude Wages                         21.36     0.15      1     0.699

Wage Equation
  m = 1, All Parameters Stationary      25.95      -       16     0.055
  Exclude Hours                         26.00     0.05      1     0.823

The NLS sample consists of 1446 men who had positive earnings and hours in each of
the survey years 1966, 1967, 1969, 1971, 1973, and 1975. (See Abowd and Card
(1986) for details on the construction of the sample.) As in the PSID, we use two
data series: the log of annual hours of work and the log of average hourly
earnings. The latter is constructed as the difference between the log of annual
earnings and the log of annual hours.
Use of the NLS is complicated by the above-noted fact that the data are not
available for consecutive years. For the simple AR(1) model
(4.4)    y_{it} = f_i + a_1 y_{i,t-1} + u_{it},
the problem of missing years can be circumvented by successive substitution to
yield a relationship between two-year differences:

(4.5) y=.-Yit-2
Yi Yt 2 = a2(y,2
Yt2Yt4 u ? aluit
? uia1t_ 1-it3 1-it2

Equation (4.5) is estimable using our methods. However, for an AR(2) (or longer
lag), successive substitution gives an infinite order lag specification that is not
estimable using our methods. Fortunately, the results from the PSID discussed
above indicate (using the 1% significance level) that equation (4.4) may be an
adequate representation of both wages and hours. For these reasons, we
concentrate on the estimation of AR(1) models. We test the assumption of
excludability and the overidentifying restrictions implied by the initial assumptions on
stationarity and lag length.
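To illustrate how equation (4.5) can be taken to the data, the sketch below applies ordinary two-stage least squares to the two-period differences, instrumenting the lagged difference with levels dated t - 4 and earlier, as the composition of the error term in (4.5) requires. The array names and the pooling across years are simplifying assumptions made here for illustration; they are not taken from the text.

import numpy as np

def two_sls(y, X, Z):
    """Two-stage least squares of y on X using instrument matrix Z."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]     # first-stage fitted values
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]      # second-stage coefficients

# dy : stacked values of y_{it} - y_{i,t-2}
# dX : a constant and the lagged difference y_{i,t-2} - y_{i,t-4}
# Z  : a constant and levels of both variables dated t - 4 and earlier
#      (for the 1971 cross-section, for example, the 1967 and 1966 levels)
# a1_squared = two_sls(dy, dX, Z)[1]                    # estimate of a_1^2 in (4.5)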
The test results are contained in Table VI. For both wages and hours, we
cannot reject the initial assumptions of stationarity and lag length, although the
test for wages is borderline at the 5% level.25 Under the identification restriction
that both own lag coefficients are nonzero and have the same sign, we can test for
causality by testing whether the wage is significant in the hours equation, and
whether hours is significant in the wage equation. The result of these tests is that
one cannot reject the hypotheses that hours do not cause wages and wages do not

25 Note that we only identify the square of the matrix of autoregressive coefficients from the
biannual data. These restrictions allow us to solve for the original matrix in terms of its square.


TABLE VII
NLS PARAMETER ESTIMATES a

                       Wage Equation    Hours Equation

Δh_{t-2}                     -              0.0263
                                           (0.0410)
Δw_{t-2}                  0.0492               -
                         (0.0259)

TRANSFORMED NLS PARAMETER ESTIMATES a

                       Wage Equation    Hours Equation

Δh_{t-1}                     -              0.162
                                           (0.127)
Δw_{t-1}                   0.222               -
                          (0.0584)

a Figures in parentheses are standard errors.

cause hours. The result in both cases is an equation of the form specified in (4.5).
Thus, neither the PSID nor NLS data lend support to the notion that the current
wage rate depends upon past hours of work.
The parameter estimates are shown in Table VII. Of course, because of the
presence of the a_1^2 in equation (4.5), these parameters do not correspond directly
to those we obtained using PSID data. To make the estimates comparable, we
take square roots, imposing an identifying assumption that the underlying
coefficients are positive. These transformed estimates (and their standard errors)
are shown at the bottom of Table VII.
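The transformation can be reproduced, up to rounding, by the delta method: if b estimates a_1^2 with standard error s, then b^{1/2} estimates a_1 with approximate standard error s/(2 b^{1/2}). A minimal sketch using the rounded entries at the top of Table VII:

import numpy as np

a_sq = np.array([0.0492, 0.0263])         # wage equation Δw_{t-2}, hours equation Δh_{t-2}
se_sq = np.array([0.0259, 0.0410])        # their standard errors

a = np.sqrt(a_sq)                          # positive square root (the identifying assumption)
se = se_sq / (2.0 * a)                     # delta method standard errors

print(np.round(a, 3))                      # [0.222 0.162]
print(np.round(se, 3))                     # [0.058 0.126]; Table VII reports 0.0584 and 0.127,
                                           # computed from unrounded inputs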
The correspondence between these point estimates and those in Table III is
striking. This is particularly compelling when one considers that the two data sets
cover different years and contain different types of individuals: the PSID has
only hourly employees, while the NLS has only relatively old workers. With
respect to the statistical significance of the estimates, in both data sets the lagged
wage is significant at conventional levels. There is some disagreement with
respect to lagged hours, however. For the PSID the t statistic for the coefficient
of lagged hours in the hours equation is 8.90; for the NLS it is 1.28. While this
evidence is mixed, it does suggest that it is potentially dangerous to exclude
lagged hours from the autoregressive hours equation. Whether this fact has
consequences for the appropriate specification of the structural hours equation is
unclear; it depends on whether the presence of lagged hours is due to serial
correlation.26

26 Using PSID data, we tested for the presence of first order serial correlation in e_{ht} of equation
(4.1a) using a Wald test of the common factor restriction that the coefficient of h_{t-1} times the
coefficient of w_{t-1} is equal to the negative of the coefficient of w_{t-2}. (This test is valid under the
assumption that only w_{t-1} appears in the wage equation.) The test indicated that one can reject
the hypothesis of first order autocorrelation at a 3 percent significance level. Of course, testing for
other patterns of serial correlation might produce a different result. Hence, while this test is suggestive
that the presence of lagged hours is indeed due to "structural" considerations, it is not conclusive.
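The Wald statistic in footnote 26 can be formed by the delta method. The sketch below tests the restriction that the coefficient of h_{t-1} times the coefficient of w_{t-1} plus the coefficient of w_{t-2} equals zero; the coefficient ordering, array names, and generic covariance matrix are assumptions made here for illustration.

import numpy as np
from scipy import stats

def common_factor_wald(b, V):
    """Wald test of b[0]*b[1] + b[2] = 0, where b = (coef on h_{t-1}, w_{t-1}, w_{t-2})."""
    g = b[0] * b[1] + b[2]                  # the common factor restriction
    G = np.array([b[1], b[0], 1.0])         # gradient of the restriction at b
    stat = g**2 / (G @ V @ G)               # chi-square(1) under the null
    return stat, 1.0 - stats.chi2.cdf(stat, df=1)

# b and V would be the estimated coefficients and covariance matrix from the
# quasi-differenced hours equation with w_{t-2} included; the rejection at the
# 3 percent level reported in footnote 26 corresponds to a p-value of about 0.03.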


TABLE VIII
NLS MEASUREMENT ERROR CORRECTION a

                       Wage Equation    Hours Equation

(i)    Δh_{t-2}              -             -0.092
                                           (0.0597)
       Δw_{t-2}            0.063              -
                          (0.0312)
(ii)   Δh_{t-2}              -              0.224
                                           (0.35)
       Δw_{t-2}            0.420              -
                          (0.154)
(iii)  χ²(3)              10.913            2.295
                        (p = 0.017)      (p = 0.513)

a Figures in parentheses are standard errors. Estimates in part (i) of the table use variables dated t - 2 and t - 4 as
instrumental variables; the estimates in part (ii) use only variables dated t - 4. Part (iii) contains the test statistic for a test of
the null hypothesis of no correlation.

Finally, in the same fashion as for the PSID, we check for the importance of
measurement error by re-estimating the equations using alternative sets of
instrumental variables. The results for the untransformed parameter estimates are
shown in Table VIII. Comparing these results to those in the top of Table VII,
we see that much like the results from the PSID, the most seriously affected
coefficient is that in the wage equation. Here, however, the formal test for the
wage equation supports the hypothesis of measurement error at close to the 1%
level.27 As before, for the hours equation, the hypothesis of measurement error is
not supported.

5. CONCLUSION

We have presented a simple method of estimating vector autoregression
equations using panel data. The key to its simplicity is the fact that estimation
and testing have straightforward GLS interpretations; no nonlinear optimization
is required.
We applied our estimation procedure to the study of dynamic relationships
between wages and hours. Our empirical results are consistent with the absence
of lagged hours in the wage forecasting equation, and thus with the absence of
certain human capital or dynamic incentive effects. Our results also show that
lagged hours is important in the hours equation, which is consistent with
alternatives to the simple labor supply model that allow for costly hours
adjustment or preferences that are not time separable. As usual, of course, these results
might be due to serial correlation in the error term or to a functional form
misspecification. However, we find it encouraging that broadly similar results are
obtained from two different data sets.

27 A Hausman test also supports the measurement error hypothesis for the wage equation; the
p-value is about 0.02.


More generally, our empirical example demonstrates the importance of testing
for the appropriate lag length prior to causality testing, an issue of considerable
importance in short panels. In the absence of such tests, no inferences concerning
causal relationships can be drawn. The example also demonstrates that use of
inappropriate methods to deal with individual effects in the VAR context can
lead to highly misleading results.

Columbia University, New York City, New York, U.S.A.
and
Princeton University, Princeton, NJ, U.S.A.

Manuscript received June, 1985; final revision received October, 1987.

REFERENCES
ABOWD, JOHN M., AND DAVID CARD (1986): "On the Covariance Structure of Earnings and Hours Changes," mimeo, Princeton University, October, 1986.
ALTONJI, J. (1986): "Intertemporal Substitution in Labor Supply: Evidence from Micro Data," Journal of Political Economy, 94, S176-S215.
ALTONJI, J., AND C. PAXSON (1986): "Job Characteristics and Hours of Work," Research in Labor Economics.
ANDERSON, T. W., AND C. HSIAO (1982): "Formulation and Estimation of Dynamic Models Using Panel Data," Journal of Econometrics, 18, 47-82.
ASHENFELTER, ORLEY, AND D. CARD (1982): "Time Series Representations of Economic Variables and Alternative Models of the Labour Market," Review of Economic Studies, 49, 761-782.
BROWNING, M. J., A. S. DEATON, AND M. IRISH (1985): "A Profitable Approach to Labor Supply and Commodity Demands Over the Life-Cycle," Econometrica, 53, 503-544.
CHAMBERLAIN, GARY (1983): "Panel Data," Chapter 22 in The Handbook of Econometrics, Volume II, ed. by Z. Griliches and M. Intriligator. Amsterdam: North-Holland Publishing Company.
FISHER, FRANKLIN (1966): The Identification Problem in Econometrics. Huntington, N.Y.: Krieger Publishing Company.
GALLANT, RONALD, AND D. JORGENSON (1979): "Statistical Inference for a System of Nonlinear, Implicit Equations in the Context of Instrumental Variables Estimation," Journal of Econometrics, 11, 275-302.
GRILICHES, ZVI, AND J. HAUSMAN (1984): "Errors in Variables in Panel Data," National Bureau of Economic Research Technical Working Paper No. 37, May, 1984.
HAMILTON, BRUCE (1986): "Merit Pay Increases and the Supply of Labor," mimeo, Johns Hopkins University, November, 1986.
HANSEN, LARS (1982): "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50, 1029-1054.
HECKMAN, J. J., AND T. E. MACURDY (1980): "A Life Cycle Model of Female Labor Supply," Review of Economic Studies, 47, 47-74.
HOLTZ-EAKIN, D. (FORTHCOMING): "Testing for Individual Effects in Autoregressive Models," forthcoming in Journal of Econometrics.
HOLTZ-EAKIN, D., W. NEWEY, AND H. ROSEN (FORTHCOMING): "The Revenues-Expenditures Nexus: Evidence from Local Government Data," forthcoming in International Economic Review.
LUNDBERG, S. (1985): "Tied Wage-Hours Offers and the Endogeneity of Wages," Review of Economics and Statistics, 67, 405-410.
MACURDY, T. (1981a): "Time Series Models Applied to Panel Data," mimeo, Stanford University.
(1981b): "An Empirical Model of Labor Supply in a Life Cycle Setting," Journal of Political Economy, 89, 1059-1086.


(1983): "A Simple Scheme for Estimating an Intertemporal Model of Labor Supply and Consumption in the Presence of Taxes and Uncertainty," International Economic Review, 24, 265-289.
NEWEY, WHITNEY, AND K. WEST (1987): "Hypothesis Testing with Efficient Method of Moment Estimation," International Economic Review, 28, 777-787.
NICKELL, STEPHEN (1981): "Biases in Dynamic Models with Fixed Effects," Econometrica, 49, 1417-1426.
PAKES, ARIEL, AND ZVI GRILICHES (1984): "Estimating Distributed Lags in Short Panels with an Application to the Specification of Depreciation Patterns and Capital Stock Constructs," Review of Economic Studies, 51, 243-262.
SARGAN, J. D. (1980): "Some Tests of Dynamic Specification for a Single Equation," Econometrica, 48, 879-898.
TAYLOR, JOHN (1980): "Output and Price Stability-An International Comparison," Journal of Economic Dynamics and Control, 2, 109-132.
WHITE, HALBERT (1980): "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, 48, 817-838.
(1982): "Instrumental Variables Regression with Independent Observations," Econometrica, 50, 483-500.
