Econometrica: Eywords
BY JUSHAN BAI1
This paper considers large N and large T panel data models with unobservable mul-
tiple interactive effects, which are correlated with the regressors. In earnings studies,
for example, workers’ motivation, persistence, and diligence combined to influence the
earnings in addition to the usual argument of innate ability. In macroeconomics, inter-
active effects represent unobservable common shocks and their heterogeneous impacts
on cross sections. We consider identification, consistency, and the limiting distribution
of the interactive-effects estimator. Under both large N and large T, the estimator is
shown to be $\sqrt{NT}$ consistent, which is valid in the presence of correlations and
heteroskedasticities of unknown form in both dimensions. We also derive the constrained
estimator and its limiting distribution, imposing additivity coupled with interactive ef-
fects. The problem of testing additive versus interactive effects is also studied. In ad-
dition, we consider identification and estimation of models in the presence of a grand
mean, time-invariant regressors, and common regressors. Given identification, the rate
of convergence and limiting results continue to hold.
1. INTRODUCTION
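WE CONSIDER THE FOLLOWING PANEL DATA MODEL with N cross-sectional
units and T time periods
$$Y_{it} = X_{it}'\beta + u_{it} \tag{1}$$
and
$$u_{it} = \lambda_i' F_t + \varepsilon_{it}, \qquad i = 1, \dots, N, \; t = 1, \dots, T.$$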
1
I am grateful to a co-editor and four anonymous referees for their constructive comments,
which led to a much improved presentation. I am also grateful for comments and suggestions from
seminar participants at the University of Pennsylvania, Rice, MIT/Harvard, Columbia Econo-
metrics Colloquium, New York Econometrics Camp (Saratoga Springs), Syracuse, Malinvaud
Seminar (Paris), European Central Bank/Center for Financial Studies Joint Workshop, Cam-
bridge University, London School of Economics, Econometric Society European Summer Meet-
ings (Vienna), Quantitative Finance and Econometrics at Stern, and the Federal Reserve Bank
of Atlanta. This work is supported in part by NSF Grants SES-0551275 and SES-0424540.
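Multiple interactive effects include the additive individual and time effects as a special case. With r = 2, $F_t = (1, \xi_t)'$, and $\lambda_i = (\alpha_i, 1)'$, we have
$$\lambda_i' F_t = \alpha_i + \xi_t.$$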
The case of r = 1 has been studied by Holtz-Eakin, Newey, and Rosen (1988)
and Ahn, Lee, and Schmidt (2001), among others.
Owing to potential correlations between the unobservable effects and the
regressors, we treat λi and Ft as fixed-effects parameters to be estimated. This
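is a basic approach to controlling the unobserved heterogeneity; see Chamberlain
(1984) and Arellano and Honoré (2001). We allow the observable $X_{it}$ to
be written as
$$X_{it} = \tau_i + \theta_t + \sum_{k=1}^{r} a_k \lambda_{ik} + \sum_{k=1}^{r} b_k F_{kt} + \sum_{k=1}^{r} c_k \lambda_{ik} F_{kt} + \pi_i' G_t + \eta_{it}, \tag{3}$$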
where ak , bk , and ck are scalar constants (or vectors when Xit is a vector),
and Gt is another set of common factors that do not enter the Yit equation.
So Xit can be correlated with λi alone or with Ft alone, or can be simultane-
ously correlated with λi and Ft . In fact, Xit can be a nonlinear function of λi
and Ft . We make no assumption on whether Ft has a zero mean or whether
Ft is independent over time: it can be a dynamic process without zero mean.
The same is true for λi . We directly estimate λi and Ft , together with β subject
to some identifying restrictions. We consider the least squares method, which
is detailed in Section 3.
While additive effects can be removed by the within-group transformation
(least squares dummy variables), the scheme fails to purge interactive ef-
fects. For example, consider r = 1, Yit = Xit β + λi Ft + εit . Then Yit − Ȳi· =
(Xit − X̄i·) β + λi (Ft − F̄) + εit − ε̄i· where Ȳi·, X̄i·, and ε̄i· are averages
over time. Because Ft ≢ F̄, the within-group transformation with cross-section
dummy variables is unable to remove the interactive effects. Similarly, the interactive
effects cannot be removed with time dummy variables. Thus the within-
PANEL MODELS WITH FIXED EFFECTS 1231
group estimator is inconsistent since the unobservables are correlated with the
regressors. However, the interactive effects can be eliminated by the quasi-
differencing method, as in Holtz-Eakin, Newey, and Rosen (1988). Further
details are provided in Section 3.3.
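The failure of the within-group transformation can be checked numerically. The following Python sketch (NumPy; the panel sizes, the seed, and all variable names are illustrative assumptions, not taken from the paper) demeans an additive effect and an r = 1 interactive effect over time. The additive effect vanishes, while the interactive effect leaves the term λi (Ft − F̄) discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 8
alpha = rng.normal(size=(N, 1))      # additive individual effects alpha_i
lam = rng.normal(size=(N, 1))        # factor loadings lambda_i (r = 1)
F = rng.normal(size=(1, T))          # common factor F_t, varying over t


def within(a):
    """Within-group transformation: subtract each unit's time average."""
    return a - a.mean(axis=1, keepdims=True)


additive = np.repeat(alpha, T, axis=1)   # alpha_i repeated over t (N x T)
interactive = lam * F                    # lambda_i * F_t (N x T)

# The additive effect is wiped out by demeaning over time.
additive_removed = np.allclose(within(additive), 0.0)

# The interactive effect is not: lambda_i * (F_t - F_bar) survives.
leftover = within(interactive)
matches_formula = np.allclose(leftover, lam * (F - F.mean()))
interactive_removed = np.allclose(leftover, 0.0)
```

Here `additive_removed` and `matches_formula` hold by construction, whereas `interactive_removed` is false whenever Ft varies over time.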
Recently, Pesaran (2006) proposed a new estimator that allows for multiple
factor error structure under large N and large T . His method augments the
model with additional regressors, which are the cross-sectional averages of the
dependent and independent variables, in an attempt to control for Ft. His estimator
requires a certain rank condition, which depends on the data generating process and
is not guaranteed to be met. Pesaran showed $\sqrt{N}$ consistency
irrespective of the rank condition, and a possible faster rate of convergence
when the rank condition does hold. Coakley, Fuertes, and Smith (2002) pro-
posed a two-step estimator, but this estimator was found to be inconsistent by
Pesaran. The two-step estimator, while related, is not the least squares estima-
tor. The latter is an iterated solution.
Ahn, Lee, and Schmidt (2001) considered the situation of fixed T and noted
that the least squares method does not give a consistent estimator if ser-
ial correlation or heteroskedasticity is present in εit . Then they explored the
consistent generalized method of moments (GMM) estimators and showed
that a GMM method that incorporates moments of zero correlation and ho-
moskedasticity is more efficient than least squares under fixed T . The fixed T
framework was also studied earlier by Kiefer (1980) and Lee (1991).
Goldberger (1972) and Jöreskog and Goldberger (1975) are among the ear-
lier advocates for factor models in econometrics, but they did not consider
correlations between the factor errors and the regressors. Similar studies in-
clude MaCurdy (1982), who considered random effects type of generalized
least squares (GLS) estimation for fixed T , and Phillips and Sul (2003), who
considered SUR-GLS (seemingly unrelated regressions) estimation for fixed
N. Panel unit root tests with factor errors were studied by Moon and Perron
(2004). Kneip, Sickles, and Song (2005) assumed Ft is a smooth function of t
and estimated Ft by smoothing spline. Given the spline basis, the estimation
problem becomes that of ridge regression. The regressors Xit are assumed to
be independent of the effects.
In this paper, we provide a large N and large T perspective on panel data
models with interactive effects, permitting the regressor Xit to be correlated
with either λi or Ft , or both. Compared with the fixed T analysis, the large T
perspective has its own challenges. For example, an incidental parameter prob-
lem is now present in both dimensions. Consequently, a different argument is
called for. On the other hand, the large T setup also presents new opportunities.
We show that if T is large, comparable with N, then the least squares
estimator for β is $\sqrt{NT}$ consistent, despite serial or cross-sectional correlations
and heteroskedasticities of unknown form in εit. This presents a contrast
to the fixed T framework, in which serial correlation implies inconsistency.
Earlier fixed T studies assume independent and identically distributed (i.i.d.)
Xit over i, disallowing Xit to contain common factors, but permitting Xit to be
correlated with λi . Earlier studies also assume εit are i.i.d. over i. We allow εit
to be weakly correlated across i and over t, thus, uit has the approximate factor
structure of Chamberlain and Rothschild (1983). Additionally, heteroskedas-
ticity is allowed in both dimensions.
Controlling fixed effects by directly estimating them, while often an effective
approach, is not without difficulty—known as the incidental parameter prob-
lem, which manifests itself in bias and inconsistency at least under fixed T , as
documented by Neyman and Scott (1948), Chamberlain (1980), and Nickell
(1981). Even for large T , asymptotic bias can persist in dynamic or nonlinear
panel data models with fixed effects.2 We show that asymptotic bias arises un-
der interactive effects, leading to nonzero centered limiting distributions.
We also show that bias-corrected estimators can be constructed in a way
similar to Hahn and Kuersteiner (2002) and Hahn and Newey (2004), who
argued that bias-corrected estimators may have desirable properties relative
to instrumental variable estimators.
Because additive effects are special cases of interactive effects, the interac-
tive-effects estimator is consistent when the effects are, in fact, additive, but
the estimator is less efficient than the one with additivity imposed. In this pa-
per, we derive the constrained estimator together with its limiting distribution
when additive and interactive effects are jointly present. We also consider the
problem of testing additive effects versus interactive effects.
In Section 2, we explain why incorporating interactive effects can be a useful
modelling paradigm. Section 3 outlines the estimation method and Section 4
discusses the underlying assumptions that lead to consistent estimation. Section 5
derives the asymptotic representation and the asymptotic distribution of
the estimator. Section 6 provides an interpretation of the estimator as a gener-
alized within-group estimator. Section 7 derives the bias-corrected estimators.
Section 8 considers estimators with additivity restrictions and their limiting dis-
tributions. Section 9 studies Hausman tests for additive effects versus interac-
tive effects. Section 10 is devoted to time-invariant regressors and regressors
that are common to each cross-sectional unit. Monte Carlo simulations are
given in Section 11. All proofs are provided either in the Appendix or in the
Supplemental Material (Bai (2009)).
2. SOME EXAMPLES
Macroeconometrics
Here Yit is the output (or growth rate) for country i in period t, Xit is the
input such as labor and capital, Ft represents common shocks (e.g., techno-
logical shocks and financial crises), λi represents the heterogeneous impact of
2
See Nickell (1981), Anderson and Hsiao (1982), Kiviet (1995), Hsiao (2003, pp. 71–74), and
Alvarez and Arellano (2003) for dynamic panel data models; see Hahn and Newey (2004) for
nonlinear panel models.
common shocks on country i, and, finally, εit is the country-specific error term
of output (or growth rate). In general, common shocks not only affect the output
directly (through the total factor productivity or Solow residual), but also
affect the amount of input in the production process (through investment deci-
sions). When common shocks have homogeneous effects on the output, that is,
λi = λ for all i, the model collapses to the usual time effect by letting δt = λ Ft ,
where δt is a scalar. It is the heterogeneity that gives rise to a factor structure.
Recently, Giannone and Lenza (2005) provided an explanation for the
Feldstein–Horioka (1980) puzzle, one of the six puzzles in international macro-
economics (Obstfeld and Rogoff (2000)). The puzzle refers to the excessively
high correlation between domestic savings and domestic investments in open
economies. In their model, Yit is the investment and Xit is the savings for coun-
try i, Ft is the common shock that affects both investment and savings deci-
sions. Giannone and Lenza found that the high correlation is a consequence
of the strong assumption that shocks have homogeneous effects across coun-
tries (additive effects); it disappears when shocks are allowed to have hetero-
geneous impacts (interactive effects).
Microeconometrics
In earnings studies, Yit represents the wage rate for individual i with age (or
age cohort) t and Xit is a vector of observable characteristics, such as educa-
tion, experience, gender, and race. Here λi represents a vector of unobservable
characteristics or unmeasured skills, such as innate ability, perseverance, mo-
tivation, and industriousness, and Ft is a vector of prices for the unmeasured
skills. The model assumes that the price vector for the unmeasured skills is
time-varying. If Ft = f for all t, the standard fixed-effects model is obtained
by letting αi = λi f . In this example, t is not necessarily the calendar time, but
age or age cohort. Applications in this area were given by Cawley, Connelly,
Heckman, and Vytlacil (1997) and Carneiro, Hansen, and Heckman (2003). As
explained in a previous version, the model of Abowd, Kramarz, and Margolis
(1999) can be extended to disentangle the worker and the firm effects, while
incorporating interactive effects. Ahn, Lee, and Schmidt (2001) provided a the-
oretical motivation for a single factor model based on the work of Altug and
Miller (1990) and Townsend (1994).
In the setup of Holtz-Eakin, Newey, and Rosen (1988), the slope coefficient
β is also time-varying. Their model can be considered as a projection of Yit
on {Xit, λi}; see Chamberlain (1984). Pesaran (2006) allowed β to be heterogeneous
over i such that βi = β + vi with vi being i.i.d. In this regard, the
constant slope coefficient is restrictive. To partially alleviate the restriction, it
would be useful to allow additional individual and time effects as
$$Y_{it} = X_{it}'\beta + \alpha_i + \xi_t + \lambda_i' F_t + \varepsilon_{it}. \tag{4}$$
Finance
Here Yit is the excess return of asset i in period t; Xit is a vector of observable
factors such as dividend yields, dividend payout ratio, and consumption gap as
in Lettau and Ludvigson (2001) or book and size factors as in Fama and French
(1993); Ft is a vector of unobservable factor returns; λi is the factor loading; εit
is the idiosyncratic return. The arbitrage pricing theory of Ross (1976) is built
upon a factor model for asset returns. Campbell, Lo, and MacKinlay (1997)
provided many applications of factor models in finance.
Cross-Section Correlation
Interactive-effects models provide a tractable way to model cross-section
correlations. In the error term uit = λi Ft + εit , each cross section shares the
same Ft , causing cross-correlation. If λi = 1 for all i, and εit are i.i.d. over
i and t, an equal correlation model is obtained. In a recent paper, Andrews
(2005) showed that cross-section correlation induced by common shocks can
be problematic for inference. Andrews’ analysis is confined within the frame-
work of a single cross-section unit. In the panel data context, as shown here,
consistency and proper inference can be obtained.
3. ESTIMATION
3.1. Issues of Identification
Even in the absence of regressors Xit , the lack of identification for factor
models is well known; see Anderson and Rubin (1956) and Lawley and Maxwell
(1971). The current setting differs from classical factor identification in two
aspects. First, both factor loadings and factors are treated as parameters, as
opposed to factor loadings only. Second, the number of individuals N is as-
sumed to grow without bound instead of being fixed, and it can be much larger
than the number of observations T .
Write the model as
Yi = Xi β + Fλi + εi
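where
$$Y_i = \begin{bmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{iT} \end{bmatrix}, \quad X_i = \begin{bmatrix} X_{i1}' \\ X_{i2}' \\ \vdots \\ X_{iT}' \end{bmatrix}, \quad F = \begin{bmatrix} F_1' \\ F_2' \\ \vdots \\ F_T' \end{bmatrix}, \quad \varepsilon_i = \begin{bmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT} \end{bmatrix}.$$
In matrix notation, with $\Lambda = (\lambda_1, \lambda_2, \dots, \lambda_N)'$,
$$Y = X\beta + F\Lambda' + \varepsilon. \tag{5}$$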
The normalization
$$F'F/T = I_r \tag{6}$$
yields r(r + 1)/2 restrictions. This is a commonly used normalization; see, for
example, Connor and Korajczyk (1986), Stock and Watson (2002), and Bai and
Ng (2002). Additional r(r − 1)/2 restrictions can be obtained by requiring
$$\Lambda'\Lambda = \text{diagonal}. \tag{7}$$
These two sets of restrictions uniquely determine Λ and F, given the product
$F\Lambda'$.4 The least squares estimators for F and Λ derived below satisfy these
restrictions.
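As an illustration of how the two sets of restrictions pin down F and Λ from the product of factors and loadings, the following Python sketch uses a singular value decomposition. This is one convenient way to impose the normalization, not necessarily the computation used in the paper; the function name `normalize_factors` and the simulated inputs are assumptions made for the example.

```python
import numpy as np


def normalize_factors(common, r):
    """Split a T x N rank-r matrix into (F, Lam) with F'F/T = I_r and Lam'Lam diagonal."""
    T = common.shape[0]
    U, s, Vt = np.linalg.svd(common, full_matrices=False)
    F = np.sqrt(T) * U[:, :r]        # T x r, orthonormal columns scaled by sqrt(T)
    Lam = common.T @ F / T           # N x r, least squares loadings given F
    return F, Lam


rng = np.random.default_rng(1)
T, N, r = 30, 20, 2
F_raw = rng.normal(size=(T, r))      # arbitrary, unnormalized factors
Lam_raw = rng.normal(size=(N, r))    # arbitrary, unnormalized loadings
common = F_raw @ Lam_raw.T           # only this product is identified

F, Lam = normalize_factors(common, r)
gram_F = F.T @ F / T                 # should equal I_r
gram_Lam = Lam.T @ Lam               # should be diagonal
```

The pair `(F, Lam)` reproduces `common` exactly, and it differs from `(F_raw, Lam_raw)` by an invertible r × r transformation.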
With either fixed N or fixed T , factor analysis would require additional re-
strictions. For example, the covariance matrix of εi is diagonal or the covari-
ance matrix depends on a small number of parameters via parameterization.
Under large N and large T , the cross-sectional covariance matrix of εit or the
time series covariance matrix can be of an unknown form. In particular, none
of the elements is required to be zero. However, the correlation—either cross
sectional or serial—must be weak, which we assume to hold. This is known as
the approximate factor model of Chamberlain and Rothschild (1983).
Sufficient variation in Xit is also needed. The usual identification condition
for β is that the matrix $\frac{1}{NT}\sum_{i=1}^{N} X_i' M_F X_i$ is of full rank, where
$M_F = I_T - F(F'F)^{-1}F'$. Because F is not observable and is estimated, a stronger
condition is required. See Section 4 for details.
3.2. Estimation
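The least squares objective function is defined as
$$\mathrm{SSR}(\beta, F, \Lambda) = \sum_{i=1}^{N} (Y_i - X_i\beta - F\lambda_i)'(Y_i - X_i\beta - F\lambda_i) \tag{8}$$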
3
The normalization still leaves rotation indeterminacy. For example, let G be an r × r orthogonal
matrix, and let $F^* = FG$ and $\Lambda^* = \Lambda G$. Then $F\Lambda' = F^*\Lambda^{*\prime}$ and $F^{*\prime}F^*/T = F'F/T = I$. To
remove this indeterminacy, we fix G to make $\Lambda^{*\prime}\Lambda^* = G'\Lambda'\Lambda G$ a diagonal matrix. This is the
reason for restriction (7).
4
Uniqueness is up to a columnwise sign change. For example, −F and −Λ also satisfy the
restrictions.
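subject to the constraint $F'F/T = I_r$ and $\Lambda'\Lambda$ being diagonal. Define the projection
matrix
$$M_F = I_T - F(F'F)^{-1}F' = I_T - FF'/T.$$
The least squares estimator for β for each given F is simply
$$\hat\beta(F) = \Big(\sum_{i=1}^{N} X_i' M_F X_i\Big)^{-1} \sum_{i=1}^{N} X_i' M_F Y_i.$$
The final least squares estimator $(\hat\beta, \hat F)$ solves the system of nonlinear equations
$$\hat\beta = \Big(\sum_{i=1}^{N} X_i' M_{\hat F} X_i\Big)^{-1} \sum_{i=1}^{N} X_i' M_{\hat F} Y_i \tag{11}$$
and
$$\Big[\frac{1}{NT}\sum_{i=1}^{N} (Y_i - X_i\hat\beta)(Y_i - X_i\hat\beta)'\Big]\hat F = \hat F V_{NT}, \tag{12}$$
where $V_{NT}$ is a diagonal matrix that consists of the r largest eigenvalues of the matrix in brackets.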
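Finally, $\hat\Lambda$ is expressed as a function of $(\hat\beta, \hat F)$ such that
$$\hat\Lambda' = T^{-1}\hat F'(Y - X\hat\beta).$$
With $W_i = Y_i - X_i\beta$, the concentrated objective function is
$$\operatorname{tr}(W' M_F W) = \sum_{i=1}^{N} W_i' M_F W_i = \sum_{i=1}^{N} (Y_i - X_i\beta)' M_F (Y_i - X_i\beta). \tag{13}$$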
This is also the objective function considered by Ahn, Lee, and Schmidt (2001),
although a different normalization is used. They as well as Kiefer (1980) dis-
cussed an iteration procedure for estimation. Interestingly, convergence to a
local optimum for such an iterated estimator was proved by Sargan (1964).
Here we suggest a more robust iteration scheme (having a much better con-
vergence property from Monte Carlo evidence) than the one implied by (11)
and (12). Given F and Λ, we compute
$$\hat\beta(F, \Lambda) = \Big(\sum_{i=1}^{N} X_i' X_i\Big)^{-1} \sum_{i=1}^{N} X_i'(Y_i - F\lambda_i),$$
and given β, we compute F and Λ from the pure factor model $W_i = F\lambda_i + e_i$
with $W_i = Y_i - X_i\beta$. This iteration scheme only requires a single matrix inverse
$(\sum_{i=1}^{N} X_i' X_i)^{-1}$, with no need to update during iteration. Our simulation results
are based on this scheme.
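The iteration scheme just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the paper's simulations: the function name `interactive_effects_ls`, the pooled OLS starting value, the stopping rule, and the simulated design are all choices made for the example, and the number of factors r is taken as known.

```python
import numpy as np


def interactive_effects_ls(Y, X, r, max_iter=1000, tol=1e-10):
    """Iterated least squares for Y = X beta + F Lam' + eps.

    Y is T x N, X is T x N x p, r is the number of factors (assumed known).
    Returns (beta, F, Lam) with F'F/T = I_r.
    """
    T, N, p = X.shape
    X2 = X.reshape(T * N, p)
    XtX_inv = np.linalg.inv(X2.T @ X2)           # (sum_i X_i'X_i)^{-1}, computed once
    beta = XtX_inv @ (X2.T @ Y.reshape(T * N))   # pooled OLS starting value (assumption)
    for _ in range(max_iter):
        W = Y - X @ beta                         # T x N pure factor model given beta
        _, vecs = np.linalg.eigh(W @ W.T / (N * T))
        F = np.sqrt(T) * vecs[:, ::-1][:, :r]    # top-r eigenvectors times sqrt(T)
        Lam = W.T @ F / T                        # N x r loadings given F
        resid = (Y - F @ Lam.T).reshape(T * N)
        beta_new = XtX_inv @ (X2.T @ resid)      # update beta given (F, Lam)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, F, Lam


# Simulated panel in which the first regressor is correlated with the effects.
rng = np.random.default_rng(0)
T, N, r = 50, 50, 2
beta_true = np.array([1.0, 3.0])
common = rng.normal(size=(T, r)) @ rng.normal(size=(N, r)).T
X = rng.normal(size=(T, N, 2))
X[:, :, 0] += 0.5 * common
Y = X @ beta_true + common + rng.normal(size=(T, N))

beta_hat, F_hat, Lam_hat = interactive_effects_ls(Y, X, r)
beta_ols = np.linalg.lstsq(X.reshape(T * N, 2), Y.reshape(T * N), rcond=None)[0]
```

Each pass solves one least squares problem exactly given the other block of parameters, so the sum of squared residuals cannot increase from one iteration to the next. In this design pooled OLS is visibly biased for the first coefficient, while the iterated estimate is close to the true value.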
5
We divide this matrix by NT so that VNT will have a proper limit. The scaling does not affect F̂ .
resulting equation, then the factor error will be eliminated. This approach was
used by Ahn, Lee, and Schmidt (2006). The GMM method as in Holtz-Eakin,
Newey, and Rosen (1988) and Ahn, Lee, and Schmidt (2001) can be used to
estimate the model parameters consistently under some identification condi-
tions. For the case of r = 1, GMM was also discussed by Arellano (2003) and
Baltagi (2005). While not always necessary, there may be a need to recover
the original model parameters; see Holtz-Eakin, Newey, and Rosen (1988) for
details.
This estimator is consistent under fixed T despite serial correlation and het-
eroskedasticity in εit . In contrast, the least squares estimator will be inconsis-
tent under this setting. On the other hand, as T increases, due to the many-
parameter and many-instrument problem, the GMM method tends to yield
bias, a known issue from the existing literature (e.g., Newey and Smith (2004)).
The least squares estimator is consistent under large N and large T with un-
known form of correlation and heteroskedasticity, and the bias is decreasing
in N and T . Furthermore, the least squares method directly estimates all pa-
rameters, including the factor processes Ft and the factor loadings λi , so there
is no need to recover the original parameters. In many applications, the esti-
mated factor processes are used as inputs for further analysis, for example, the
diffusion index forecasting of Stock and Watson (2002) and factor-augmented
vector autoregression of Bernanke, Boivin, and Eliasz (2005). The estimated
loadings are useful objects in finance; see Connor and Korajczyk (1986). Recovering
those original parameters becomes more involved under multiple factors
with the quasi-differencing approach. The least squares method is simple
and effective in handling multiple factors. Furthermore, computation of the
least squares method under large N and large T is quite fast.
In summary, with small T and with potential serial correlation and time se-
ries heteroskedasticity, the quasi-differencing method is recommended in view
of its consistency properties. With large T , the least squares method is a viable
alternative. Remark 2 suggests another alternative under small T .
where δt = A Ft . The above model still has a factor error structure. However,
when Ft is assumed to be uncorrelated with the regressors, the aggregated er-
ror ηi Ft + εit is now uncorrelated with the regressors, so we can use a random-
with ρi = Bλi . Again, a random-effects GLS can be used. When both λi and
Ft are correlated with regressors, we apply both projections and augment the
model with cross-products of X̄i· and X̄·t , in addition to X̄i· and X̄·t so that,
with ρi = B ηi and δt = A ξt ,
METHOD 3: The method of Pesaran (2006) augments the model with re-
gressors (Ȳ·t X̄·t ) under the assumption of Ft being correlated with regres-
sors, where Ȳ·t and X̄·t attempt to estimate Ft , similar to the projection ar-
gument of Mundlak. But in the Mundlak argument, the projection residual
ξt is assumed to have a fixed variance. In contrast, the variance of ξt is assumed
to converge to zero as N → ∞ in Pesaran (2006), who assumed Xit is
of the form $X_{it} = B_i F_t + e_{it}$ so that $\bar X_{\cdot t} = BF_t + \xi_t$ with $B = N^{-1}\sum_{i=1}^{N} B_i$ and
$\xi_t = N^{-1}\sum_{i=1}^{N} e_{it}$. The variance of ξt is of $O(N^{-1})$. Thus the factor error $\lambda_i \xi_t$ is
negligible under large N. He established $\sqrt{N}$ consistency and possible $\sqrt{NT}$
consistency for some special cases. It appears that when λi is correlated with
the regressors, additional regressors Ȳi· and X̄i· should also be added to achieve
consistency.
4. ASSUMPTIONS
In this section, we state assumptions needed for consistent estimation
and explain the meaning of each assumption prior to or after its introduction.
Throughout, for a vector or matrix A, its norm is defined as $\|A\| = (\operatorname{tr}(A'A))^{1/2}$.
The p × p matrix
$$D(F) = \frac{1}{NT}\sum_{i=1}^{N} X_i' M_F X_i - \frac{1}{T}\Big[\frac{1}{N^2}\sum_{i=1}^{N}\sum_{k=1}^{N} X_i' M_F X_k a_{ik}\Big],$$
where $a_{ik} = \lambda_i'(\Lambda'\Lambda/N)^{-1}\lambda_k$, plays an important role in this paper. Note that
$a_{ik} = a_{ki}$ since it is a scalar. The identifying condition for β is that D(F) is positive
definite. If F were observable, the identification condition for β would be
that the first term of D(F) on the right-hand side is positive definite. The presence
of the second term is because of unobservable F and Λ. The reason for
this particular form is the nonlinearity of the interactive effects.
Define a T × p vector
$$Z_i = M_F X_i - \frac{1}{N}\sum_{k=1}^{N} M_F X_k a_{ik},$$
so that Zi is equal to the deviation of MF Xi from its mean, but here the mean
is a weighted average. Write $Z_i = (Z_{i1}, Z_{i2}, \dots, Z_{iT})'$. Then
$$D(F) = \frac{1}{NT}\sum_{i=1}^{N} Z_i' Z_i = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{T} Z_{it} Z_{it}'.$$
The first equality follows from $a_{ik} = a_{ki}$ and $N^{-1}\sum_{i=1}^{N} a_{ik}a_{ij} = a_{kj}$, and the second
equality is by definition. Thus D(F) is at least semipositive definite. Since
each $Z_{it}Z_{it}'$ is a rank 1 semidefinite matrix, summation of NT such semidefinite
matrices should lead to a positive definite matrix, given enough variations
in Zit over i and t. Our first condition assumes D(F) is positive definite in the
limit. Suppose that as N, T → ∞, $D(F) \xrightarrow{p} D > 0$. If εit are i.i.d. $(0, \sigma^2)$, then
the limiting distribution of β̂ can be shown to be
$$\sqrt{NT}(\hat\beta - \beta) \to N(0, \sigma^2 D^{-1}).$$
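The equality between the definition of D(F) and its representation through Zi can be confirmed numerically. The sketch below is a check on randomly generated inputs; the dimensions, the seed, and the array names are assumptions made for the example. It computes both expressions and the smallest eigenvalue of the result.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, p, r = 12, 15, 2, 2
X = rng.normal(size=(N, T, p))      # X[i] plays the role of the T x p matrix X_i
F = rng.normal(size=(T, r))
Lam = rng.normal(size=(N, r))

M_F = np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)
A = Lam @ np.linalg.solve(Lam.T @ Lam / N, Lam.T)   # A[i, k] = a_ik
MX = M_F @ X                                        # MX[i] = M_F X_i (broadcast over i)

# Definition: first term minus the double-sum correction.
term1 = np.einsum('itp,itq->pq', X, MX) / (N * T)
term2 = np.einsum('itp,ik,ktq->pq', X, A, MX) / (T * N**2)
D_def = term1 - term2

# Equivalent form: average outer product of the weighted deviations Z_i.
Z = MX - np.einsum('ik,ktp->itp', A, MX) / N
D_from_Z = np.einsum('itp,itq->pq', Z, Z) / (N * T)
min_eig = np.linalg.eigvalsh(D_from_Z).min()
```

With enough variation in the regressors, as in this random design, `min_eig` is strictly positive, which is the positive definiteness the identification condition asks for.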
ASSUMPTION B:
(i) $E\|F_t\|^4 \le M$ and $\frac{1}{T}\sum_{t=1}^{T} F_t F_t' \xrightarrow{p} \Sigma_F > 0$ for some r × r matrix ΣF, as
T → ∞.
(ii) $E\|\lambda_i\|^4 \le M$ and $\Lambda'\Lambda/N \xrightarrow{p} \Sigma_\Lambda > 0$ for some r × r matrix ΣΛ, as N → ∞.
1 1 1
N T
This assumption rules out dynamic panel data models and is given for the
purpose of simplifying the proofs. The procedure works well even with lagged
dependent variables; see Table V in the Supplemental Material. We do allow
Xit , Ft , and εit to be dynamic processes. If lagged dependent variables are in-
cluded in Xit , then εit cannot be serially correlated. Also note that Xit , λi ,
and εit are allowed to be cross-sectionally correlated.
5. LIMITING THEORY
We use $(\beta^0, F^0)$ to denote the true parameters, and we still use λi without the
superscript 0 as it is not directly estimated and thus not necessary. Here $F^0$ denotes
the true data generating process for F that satisfies Assumption B. This
$F^0$ in general has economic interpretations (e.g., supply shocks and demand
shocks). The estimator $\hat F$ below estimates a rotation of $F^0$.6 Define $S_{NT}(\beta, F)$
as the concentrated objective function in (13) divided by NT together with
centering, that is,
$$S_{NT}(\beta, F) = \frac{1}{NT}\sum_{i=1}^{N} (Y_i - X_i\beta)' M_F (Y_i - X_i\beta) - \frac{1}{NT}\sum_{i=1}^{N} \varepsilon_i' M_{F^0} \varepsilon_i.$$
The second term does not depend on β and F, and is for the purpose of centering,
where $M_F = I - P_F = I - FF'/T$ with $F'F/T = I$. We estimate $\beta^0$ and
$F^0$ by
$$(\hat\beta, \hat F) = \arg\min_{\beta, F} S_{NT}(\beta, F),$$
where $\hat F$ is the matrix that consists of the first r eigenvectors (multiplied by
$\sqrt{T}$) of the matrix $\frac{1}{NT}\sum_{i=1}^{N} (Y_i - X_i\hat\beta)(Y_i - X_i\hat\beta)'$ and where $V_{NT}$ is a diagonal
6
If (6) and (7) hold for the data generating processes (i.e., F 0 F 0 /T = I and Λ0 Λ0 is diagonal)
rather than being viewed as estimation restrictions, then F̂ estimates F 0 itself instead of a rotation
of F 0 .
matrix that consists of the first r largest eigenvalues of this matrix. Denote
$P_A = A(A'A)^{-1}A'$ for a matrix A.
with some special cases in which asymptotic bias is absent. This is obtained
by requiring stronger assumptions: the absence of either cross-correlation or
serial correlation and heteroskedasticity. The second theorem deals with the
most general case that allows for correlation and heteroskedasticity in both
dimensions.
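Introduce
$$Z_i = M_{F^0} X_i - \frac{1}{N}\sum_{k=1}^{N} a_{ik} M_{F^0} X_k$$
and the central limit theorem
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^{N} Z_i' \varepsilon_i \xrightarrow{d} N(0, D_Z).$$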
σiits = E(εit εis ) since it does not depend on i, and we denote DZ by D2 . That
is, D1 and D2 are the probability limits of
1
N N T
1
T T N
N d
The corresponding central limit theorem will be denoted by √1 Zi εi −→
N d
NT i=1
where $D_0 = \operatorname{plim} D(F^0) = \operatorname{plim} \frac{1}{NT}\sum_{i=1}^{N} Z_i' Z_i$.
from $Y_{it} - X_{it}'\hat\beta = \lambda_i' F_t + e_{it} - X_{it}'(\hat\beta - \beta)$, which is a pure factor model with an
added error $X_{it}'(\hat\beta - \beta) = (NT)^{-1/2} O_p(1)$. An error of this order of magnitude
does not affect the analysis.
Let β̂Asy be the least squares estimator obtained from the above transformed
variables, treating F and Λ as known. That is,
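$$\hat\beta_{\mathrm{Asy}} = \begin{bmatrix} \operatorname{tr}[M_\Lambda X^{1\prime} M_F X^{1}] & \cdots & \operatorname{tr}[M_\Lambda X^{1\prime} M_F X^{p}] \\ \vdots & & \vdots \\ \operatorname{tr}[M_\Lambda X^{p\prime} M_F X^{1}] & \cdots & \operatorname{tr}[M_\Lambda X^{p\prime} M_F X^{p}] \end{bmatrix}^{-1} \times \begin{bmatrix} \operatorname{tr}[M_\Lambda X^{1\prime} M_F Y] \\ \vdots \\ \operatorname{tr}[M_\Lambda X^{p\prime} M_F Y] \end{bmatrix}.$$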
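The square matrix on the right without inverse is equal to D(F) up to a scaling
constant, that is,
$$D(F) = \frac{1}{TN}\sum_{i=1}^{N} Z_i' Z_i = \frac{1}{TN}\begin{bmatrix} \operatorname{tr}[M_\Lambda X^{1\prime} M_F X^{1}] & \cdots & \operatorname{tr}[M_\Lambda X^{1\prime} M_F X^{p}] \\ \vdots & & \vdots \\ \operatorname{tr}[M_\Lambda X^{p\prime} M_F X^{1}] & \cdots & \operatorname{tr}[M_\Lambda X^{p\prime} M_F X^{p}] \end{bmatrix}.$$
This can be verified by some calculations. The estimator $\hat\beta_{\mathrm{Asy}}$ can be rewritten
as
$$\hat\beta_{\mathrm{Asy}} = \Big(\sum_{i=1}^{N} Z_i' Z_i\Big)^{-1} \sum_{i=1}^{N} Z_i' Y_i.$$
It follows from (15) that $\sqrt{NT}(\hat\beta - \beta) = \sqrt{NT}(\hat\beta_{\mathrm{Asy}} - \beta) + o_p(1)$. To purge the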
fixed effects, the LSDV estimator uses MT and MN to transform the variables,
whereas the interactive-effects estimator uses MF and MΛ to transform the
variables.
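The generalized within-group interpretation can be illustrated with a short numerical check. The sketch below treats F and Λ as known, as in the definition of the estimator above, and compares the trace formula with pooled OLS on variables transformed by MF and MΛ. The simulated design and the helper name `annihilator` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, p, r = 10, 8, 2, 1
X = rng.normal(size=(p, T, N))           # X[k] is the T x N sheet of regressor k
F = rng.normal(size=(T, r))
Lam = rng.normal(size=(N, r))
beta = np.array([1.0, -2.0])
Y = np.tensordot(beta, X, axes=1) + F @ Lam.T + 0.05 * rng.normal(size=(T, N))


def annihilator(A):
    """Return I - A (A'A)^{-1} A', the projection off the column space of A."""
    return np.eye(A.shape[0]) - A @ np.linalg.solve(A.T @ A, A.T)


M_F, M_L = annihilator(F), annihilator(Lam)

# Trace form of the estimator, with F and Lam treated as known.
lhs = np.array([[np.trace(M_L @ X[k].T @ M_F @ X[l]) for l in range(p)]
                for k in range(p)])
rhs = np.array([np.trace(M_L @ X[k].T @ M_F @ Y) for k in range(p)])
beta_trace = np.linalg.solve(lhs, rhs)

# Pooled OLS on variables transformed by M_F (time) and M_Lam (cross section).
X_t = np.array([M_F @ Xk @ M_L for Xk in X])
Y_t = M_F @ Y @ M_L
beta_ols = np.linalg.lstsq(X_t.reshape(p, -1).T, Y_t.ravel(), rcond=None)[0]
```

The two estimates coincide up to rounding, which mirrors the way the LSDV estimator is pooled OLS after transforming by MT and MN.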
7. BIAS-CORRECTED ESTIMATOR
The interactive-effect estimator is shown to have the representation (see
Proposition A.3 in the Appendix)
$$\sqrt{NT}(\hat\beta - \beta^0) = D(F^0)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N} Z_i'\varepsilon_i + \Big(\frac{T}{N}\Big)^{1/2} B + \Big(\frac{N}{T}\Big)^{1/2} C + o_p(1), \tag{21}$$
where B and C are given by (18) and (19), respectively, and they give rise to the
biases. Their presence arises from correlations and heteroskedasticities in εit .
We show that B and C can be consistently estimated so that a bias-corrected
estimator can be constructed, as in the framework of Hahn and Kuersteiner
(2002) and Hahn and Newey (2004). Attention is paid to heteroskedasticities
in both dimensions, assuming no correlation in either dimension to simplify
the presentation. We do point out how to estimate the biases consistently and
outline the idea of the proof when correlation exists in either dimension.
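The expression C is still given by (19), but Ω now becomes a diagonal matrix
under no correlation, that is, $\Omega = \operatorname{diag}\big(\frac{1}{N}\sum_{k=1}^{N}\sigma_{k1}^2, \dots, \frac{1}{N}\sum_{k=1}^{N}\sigma_{kT}^2\big)$. Let
$\hat\Omega = \operatorname{diag}\big(\frac{1}{N}\sum_{k=1}^{N}\hat\varepsilon_{k1}^2, \dots, \frac{1}{N}\sum_{k=1}^{N}\hat\varepsilon_{kT}^2\big)$ be an estimator for Ω. We estimate C by
$$\hat C = -\hat D_0^{-1}\frac{1}{NT}\sum_{i=1}^{N} X_i' M_{\hat F}\,\hat\Omega\,\hat F\Big(\frac{\hat\Lambda'\hat\Lambda}{N}\Big)^{-1}\hat\lambda_i. \tag{24}$$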
In the Appendix we prove $(T/N)^{1/2}(\hat B - B) = o_p(1)$ and $(N/T)^{1/2}(\hat C - C) = o_p(1)$. Define
$$\hat\beta^{\dagger} = \hat\beta - \frac{1}{N}\hat B - \frac{1}{T}\hat C.$$
where $D_3 = \operatorname{plim} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} Z_{it} Z_{it}' \sigma_{it}^2$.
where n/N → 0 and n/T → 0. The argument of Bai and Ng (2006) can be
adapted to show that B̂ is consistent for B.
1
N T
where $\hat Z_{it}$ is equal to $Z_{it}$ with $F^0$, $\lambda_i$, and Λ replaced with $\hat F$, $\hat\lambda_i$, and $\hat\Lambda$, respectively.
Next consider estimating $D_j$, j = 1, 2, 3. For all cases, we limit our
attention to the presence of heteroskedasticity, but no correlation. Thus Dj
1
N T
1 1
n n T
tivity holds but is ignored, the resulting estimator is less efficient. In this sec-
tion, we consider the joint presence of additive and interactive effects, and
show how to estimate the model by imposing additivity and derive the limiting
distribution of the resulting estimator. Consider
$$Y_{it} = X_{it}'\beta + \mu + \alpha_i + \xi_t + \lambda_i' F_t + \varepsilon_{it}, \tag{26}$$
where μ is the grand mean, αi is the usual fixed effect, ξt is the time effect, and
λi Ft is the interactive effect. Restrictions are required to identify the model.
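Even in the absence of the interactive effect, the restrictions
$$\sum_{i=1}^{N}\alpha_i = 0, \qquad \sum_{t=1}^{T}\xi_t = 0 \tag{27}$$
are needed; see Greene (2000, p. 565). The following restrictions are maintained:
$$F'F/T = I_r, \qquad \Lambda'\Lambda = \text{diagonal}. \tag{28}$$
Further restrictions are needed to separate the additive and interactive effects.
They are
$$\sum_{i=1}^{N}\lambda_i = 0, \qquad \sum_{t=1}^{T}F_t = 0. \tag{29}$$
To see this, suppose that $\bar\lambda = \frac{1}{N}\sum_{i=1}^{N}\lambda_i \ne 0$ or $\bar F = \frac{1}{T}\sum_{t=1}^{T}F_t \ne 0$, or both are
not zero. Let $\lambda_i^{\dagger} = \lambda_i - 2\bar\lambda$ and $F_t^{\dagger} = F_t - 2\bar F$. Then
$$Y_{it} = X_{it}'\beta + \mu + \alpha_i^{\dagger} + \xi_t^{\dagger} + \lambda_i^{\dagger\prime} F_t^{\dagger} + \varepsilon_{it},$$
where $\alpha_i^{\dagger} = \alpha_i + 2\bar F'\lambda_i - 2\bar\lambda'\bar F$ and $\xi_t^{\dagger} = \xi_t + 2\bar\lambda' F_t - 2\bar\lambda'\bar F$. It is easy to verify
that $F^{\dagger\prime}F^{\dagger}/T = F'F/T = I_r$ and $\Lambda^{\dagger\prime}\Lambda^{\dagger} = \Lambda'\Lambda$ is diagonal, and at the same time,
$\sum_{i=1}^{N}\alpha_i^{\dagger} = 0$ and $\sum_{t=1}^{T}\xi_t^{\dagger} = 0$. Thus the new model is observationally equivalent
to (26) if (29) is not imposed.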
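The verification can be written out in a few lines. The following derivation is a sketch that uses only the definitions of the daggered quantities and restriction (27); it expands the product of the shifted loadings and factors and collects terms.

```latex
\begin{aligned}
\lambda_i^{\dagger\prime} F_t^{\dagger}
  &= (\lambda_i - 2\bar\lambda)'(F_t - 2\bar F)
   = \lambda_i' F_t - 2\bar F'\lambda_i - 2\bar\lambda' F_t + 4\bar\lambda'\bar F, \\
\alpha_i^{\dagger} + \xi_t^{\dagger}
  &= \alpha_i + \xi_t + 2\bar F'\lambda_i + 2\bar\lambda' F_t - 4\bar\lambda'\bar F, \\
\alpha_i^{\dagger} + \xi_t^{\dagger} + \lambda_i^{\dagger\prime} F_t^{\dagger}
  &= \alpha_i + \xi_t + \lambda_i' F_t, \\
\sum_{i=1}^{N} \alpha_i^{\dagger}
  &= \sum_{i=1}^{N} \alpha_i + 2N\bar F'\bar\lambda - 2N\bar\lambda'\bar F = 0, \qquad
\sum_{t=1}^{T} \xi_t^{\dagger}
   = \sum_{t=1}^{T} \xi_t + 2T\bar\lambda'\bar F - 2T\bar\lambda'\bar F = 0, \\
\sum_{t=1}^{T} F_t^{\dagger} F_t^{\dagger\prime}
  &= \sum_{t=1}^{T} F_t F_t' - 2T\bar F\bar F' - 2T\bar F\bar F' + 4T\bar F\bar F'
   = \sum_{t=1}^{T} F_t F_t'.
\end{aligned}
```

The same expansion with λi in place of Ft gives $\Lambda^{\dagger\prime}\Lambda^{\dagger} = \Lambda'\Lambda$, so every maintained restriction except (29) holds for both parameterizations.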
To estimate the general model under the given restrictions, we introduce
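some standard notation. For any variable φit, define
$$\bar\phi_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T}\phi_{it}, \quad \bar\phi_{\cdot t} = \frac{1}{N}\sum_{i=1}^{N}\phi_{it}, \quad \bar\phi_{\cdot\cdot} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\phi_{it}, \quad \dot\phi_{it} = \phi_{it} - \bar\phi_{i\cdot} - \bar\phi_{\cdot t} + \bar\phi_{\cdot\cdot},$$
and its vector form $\dot\phi_i = \phi_i - \iota_T\bar\phi_{i\cdot} - \bar\phi + \iota_T\bar\phi_{\cdot\cdot}$, where $\bar\phi = (\bar\phi_{\cdot 1}, \dots, \bar\phi_{\cdot T})'$.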
Ẋi β̂)(Ẏi − Ẋi β̂) . Finally, Λ̂ is expressed as a function of (β̂ F̂) such that
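$$\hat\Lambda' = (\hat\lambda_1, \hat\lambda_2, \dots, \hat\lambda_N) = T^{-1}[\hat F'(\dot Y_1 - \dot X_1\hat\beta), \dots, \hat F'(\dot Y_N - \dot X_N\hat\beta)].$$
Iterations are required to obtain $\hat\beta$ and $\hat F$. The remaining parameters $\hat\mu$, $\hat\alpha_i$,
$\hat\xi_t$, and $\hat\Lambda$ require no iteration, and they can be computed once $\hat\beta$ and $\hat F$ are
obtained. The solutions for $\hat\mu$, $\hat\alpha_i$, and $\hat\xi_t$ have the same form as the usual fixed-effects
model; see Greene (2000, p. 565).
We shall argue that $(\hat\mu, \{\hat\alpha_i\}, \{\hat\xi_t\}, \hat\beta, \hat F, \hat\Lambda)$ are indeed the least squares estimators
from minimization of the objective function
$$\sum_{i=1}^{N}\sum_{t=1}^{T} (Y_{it} - X_{it}'\beta - \mu - \alpha_i - \xi_t - \lambda_i' F_t)^2$$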
subject to the restrictions (27)–(29). Concentrating out (μ, {αi}, {ξt}) is equivalent
to using $(\dot Y_{it}, \dot X_{it})$ to estimate the remaining parameters. So the concentrated
objective function is
$$\sum_{i=1}^{N}\sum_{t=1}^{T} (\dot Y_{it} - \dot X_{it}'\beta - \lambda_i' F_t)^2.$$
The dotted variable for $\lambda_i' F_t$ is itself, that is, $\dot c_{it} = c_{it}$, where $c_{it} = \lambda_i' F_t$, due to
restriction (29). This objective function is the same as (8), except Yit and Xit
are replaced by their dotted versions. From the analysis in Section 3, the least
squares estimators for β, F, and Λ are as prescribed above. Given these estimates,
the least squares estimators for (μ, {αi}, {ξt}) are also immediately
obtained as prescribed.
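The two facts used in this argument, that the dotted transformation removes the grand mean and the additive effects, and that it leaves the interactive term unchanged under restriction (29), can be checked directly. The Python sketch below is illustrative; the function name `dot_transform` and the simulated effects are assumptions made for the example.

```python
import numpy as np


def dot_transform(phi):
    """Double demeaning of a T x N array: phi_it - phi_i. - phi_.t + phi_.."""
    unit_mean = phi.mean(axis=0, keepdims=True)   # time average for each i (1 x N)
    time_mean = phi.mean(axis=1, keepdims=True)   # cross-section average for each t (T x 1)
    return phi - unit_mean - time_mean + phi.mean()


rng = np.random.default_rng(5)
T, N, r = 6, 7, 2
mu = 2.0
alpha = rng.normal(size=N)
alpha -= alpha.mean()                 # restriction (27): sum_i alpha_i = 0
xi = rng.normal(size=T)
xi -= xi.mean()                       # restriction (27): sum_t xi_t = 0
additive = mu + alpha[None, :] + xi[:, None]

F = rng.normal(size=(T, r))
F -= F.mean(axis=0)                   # restriction (29): sum_t F_t = 0
Lam = rng.normal(size=(N, r))
Lam -= Lam.mean(axis=0)               # restriction (29): sum_i lambda_i = 0
interactive = F @ Lam.T               # entry (t, i) equals lambda_i' F_t

additive_dotted = dot_transform(additive)
interactive_dotted = dot_transform(interactive)
```

Here `additive_dotted` is zero and `interactive_dotted` equals `interactive`, so applying the transformation to the data leaves a pure interactive-effects model in the dotted variables.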
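We next argue that all restrictions are satisfied. For example, $\frac{1}{N}\sum_{i=1}^{N}\hat\alpha_i = \bar Y_{\cdot\cdot} - \bar X_{\cdot\cdot}'\hat\beta - \hat\mu = \hat\mu - \hat\mu = 0$. Similarly, $\sum_{t=1}^{T}\hat\xi_t = 0$. It requires an extra argument
to show $\sum_{t=1}^{T}\hat F_t = 0$. By definition,
$$\hat F V_{NT} = \Big[\frac{1}{NT}\sum_{i=1}^{N} (\dot Y_i - \dot X_i\hat\beta)(\dot Y_i - \dot X_i\hat\beta)'\Big]\hat F.$$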
1
N
1
N
The entire analysis of Section 4 can be restated here. In particular, under the
conditions of Theorem 2, we have the asymptotic representation
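$$\sqrt{NT}(\hat\beta - \beta^0) = \Big(\frac{1}{NT}\sum_{i=1}^{N}\dot Z_i'\dot Z_i\Big)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\dot Z_i'\dot\varepsilon_i + o_p(1).$$
ASSUMPTION F: (i) $\operatorname{plim}\frac{1}{NT}\sum_{i=1}^{N}\dot Z_i'\dot Z_i = \dot D_0 > 0$; (ii) $\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\dot Z_i'\varepsilon_i \xrightarrow{d} N(0, \dot D_Z)$,
where $\dot D_Z = \operatorname{plim}\frac{1}{NT}\sum_{i,j,t,s}\sigma_{ij,ts}\dot Z_{it}\dot Z_{js}'$.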
The null model is nested in the general model with $\lambda_i = (\alpha_i, 1)'$ and $F_t = (1, \xi_t + \mu)'$.
The interactive-effects estimator for β is consistent under both models (31)
and (32), but is less efficient than the least squares dummy-variable estimator
for model (31), as the latter imposes restrictions on factors and factor loadings.
But the fixed-effects estimator is inconsistent under model (32). The principle
of the Hausman test is applicable here.
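where
$$\eta = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}X_i'M_{F^0}\varepsilon_i, \qquad \xi = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\Big[\frac{1}{N}\sum_{k=1}^{N}a_{ik}X_k'\Big]M_{F^0}\varepsilon_i. \tag{33}$$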
This implies var(β̂IE − β̂FE) = var(β̂IE) − var(β̂FE). Thus the Hausman test
takes the form
$$J = NT\sigma^{2}(\hat\beta_{IE} - \hat\beta_{FE})'\big[D(F^0)^{-1} - C^{-1}\big]^{-1}(\hat\beta_{IE} - \hat\beta_{FE}) \xrightarrow{d} \chi^2_p.$$
REMARK 9: The Hausman test is also applicable when there are no time
effects but only individual effects (i.e., ξt = 0). Then it is testing whether the
individual effects are time-varying. Similarly, the Hausman test is applicable
when αi = 0 in (31) but ξt ≠ 0. Then it is testing whether the common shocks
have heterogeneous effects on individuals. Details are given in the Supplemental
Material.
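As a rough illustration of the Hausman principle used here, the statistic can be written directly in terms of the two estimated variance matrices, using var(β̂IE − β̂FE) = var(β̂IE) − var(β̂FE). The function name `hausman_stat` and the numbers fed to it are hypothetical; they are not taken from the paper.

```python
import numpy as np

def hausman_stat(beta_ie, beta_fe, var_ie, var_fe):
    """Quadratic form d' [var_ie - var_fe]^{-1} d with d = beta_ie - beta_fe.

    var_ie and var_fe are the estimated variance matrices of the two
    estimators; the difference is assumed to be positive definite.
    Returns the statistic and its degrees of freedom p.
    """
    d = np.asarray(beta_ie, dtype=float) - np.asarray(beta_fe, dtype=float)
    V = np.asarray(var_ie, dtype=float) - np.asarray(var_fe, dtype=float)
    return float(d @ np.linalg.solve(V, d)), d.size

# Made-up inputs, only to show the call.
J, df = hausman_stat([1.1, 2.8], [1.0, 3.0], 0.02 * np.eye(2), 0.01 * np.eye(2))
print(J, df)
```

Under the null of additive effects, the statistic is compared with a chi-square critical value with p degrees of freedom; in finite samples the estimated variance difference need not be positive definite, which this sketch does not guard against.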
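where (Xit, xi, wt) is a vector of observable regressors, xi is time invariant, and
wt is cross-sectionally invariant (common). The dimensions of regressors are
such that Xit is p × 1, xi is q × 1, wt is ℓ × 1, and Ft is r × 1. Introduce
$$X_i = \begin{bmatrix} X_{i1}' & x_i' & w_1' \\ X_{i2}' & x_i' & w_2' \\ \vdots & \vdots & \vdots \\ X_{iT}' & x_i' & w_T' \end{bmatrix}, \qquad \beta = \begin{bmatrix} \varphi \\ \gamma \\ \delta \end{bmatrix}, \qquad x = \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_N' \end{bmatrix}, \qquad W = \begin{bmatrix} w_1' \\ w_2' \\ \vdots \\ w_T' \end{bmatrix}.$$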
Then the model can be rewritten as
$$Y_i = X_i\beta + F\lambda_i + \varepsilon_i.$$
Let $(\beta^0, F^0, \Lambda)$ denote the true parameters (superscript 0 is not used for Λ).
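To identify $\beta^0$, it was assumed in Section 4 that the matrix
$$D(F) = \frac{1}{NT}\sum_{i=1}^{N}X_i'M_FX_i - \frac{1}{T}\Big[\frac{1}{N^2}\sum_{i=1}^{N}\sum_{k=1}^{N}X_i'M_FX_k\,\lambda_i'\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_k\Big]$$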
is positive definite for all possible F . This assumption fails when time-invariant
regressors and common regressors exist. This is because D(ιT ) and D(W ) are
not full rank matrices. However, the positive definiteness of D(F) is not a nec-
essary condition. In fact, all that is needed is the identification condition
D(F 0 ) > 0
That is, the matrix D(F) is positive definite when evaluated at the true F 0 , a
much weaker condition than Assumption A. In the Supplemental Material, we
show that the above condition can be decomposed into some intuitive assump-
tions. First, this means that the interactive effects are genuine (not additive
effects); otherwise, we are back to the environment of Hausman and Taylor,
and instrumental variables must be used to identify β. Second, there should
be no multicollinearity between W and F 0 , and no multicollinearity between x
and Λ. Finally, W and x cannot both contain the constant regressor (only one
grand mean parameter).
It remains to argue that D(F 0 ) > 0 (or equivalently, the four conditions
above) implies consistent estimation. We state this result as a proposition.
PROPOSITION 3: Assume Assumptions B–D hold. If $D(F^0) > 0$, then $\hat\beta \xrightarrow{p} \beta^0$.
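The rank failure of D(ιT) and D(W), and the weaker requirement D(F0) > 0, can be checked numerically. The sketch below evaluates the matrix D(F) exactly as defined in this section for a small simulated design; the function name `D_of_F`, the sample sizes, and the way the regressors are drawn are assumptions made only for this illustration.

```python
import numpy as np

def D_of_F(X, F, Lam):
    """D(F) = (1/NT) sum_i X_i'M_F X_i - (1/T)(1/N^2) sum_i sum_k a_ik X_i'M_F X_k."""
    N, T, p = X.shape
    M = np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)      # M_F = I - F(F'F)^{-1}F'
    MX = M @ X                                              # M_F X_k for every k
    a = Lam @ np.linalg.solve(Lam.T @ Lam / N, Lam.T)       # a_ik = lambda_i'(Lam'Lam/N)^{-1}lambda_k
    first = np.einsum('itp,itq->pq', X, MX) / (N * T)
    second = np.einsum('itp,ik,ktq->pq', X, a, MX) / (T * N**2)
    D = first - second
    return (D + D.T) / 2                                    # symmetrize against rounding

rng = np.random.default_rng(1)
N, T, r = 30, 30, 1
Lam = rng.standard_normal((N, r))
F0 = rng.standard_normal((T, r))
x = rng.standard_normal(N)            # time-invariant regressor x_i
w = rng.standard_normal(T)            # common regressor w_t
X = np.stack([rng.standard_normal((N, T)),          # X_it
              np.repeat(x[:, None], T, axis=1),     # x_i repeated over t
              np.repeat(w[None, :], N, axis=0)],    # w_t repeated over i
             axis=2)

min_eig_iota = np.linalg.eigvalsh(D_of_F(X, np.ones((T, 1)), Lam)).min()
min_eig_W = np.linalg.eigvalsh(D_of_F(X, w[:, None], Lam)).min()
min_eig_true = np.linalg.eigvalsh(D_of_F(X, F0, Lam)).min()
print(min_eig_iota, min_eig_W, min_eig_true)
```

With F = ιT the column of the time-invariant regressor is annihilated by the projection, and with F = W the same happens to the common regressor, so both matrices have a zero eigenvalue; at a generic F0 that is not collinear with W the smallest eigenvalue stays away from zero.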
Discussion
When additive effects are also present, (35) becomes
where μ is the grand mean (explicitly written out), and αi and ξt are, respec-
tively, the individual and the time effects. The parameters γ and δ are no
longer directly estimable. Under the restrictions of (27) and (29), the within-group
transformation implies Ẏit = Ẋit′φ + λi′Ft + ε̇it. The parameters φ and
Infeasible Estimator
100 10 1.003 0.061 2.999 0.061 4.994 0.103 1.998 0.060 4.003 0.087
100 20 1.001 0.039 2.998 0.041 5.002 0.065 2.000 0.040 4.000 0.054
100 50 1.000 0.025 3.002 0.024 5.000 0.039 1.999 0.024 4.000 0.030
100 100 1.000 0.017 3.000 0.017 5.000 0.029 1.999 0.017 3.999 0.020
10 100 0.998 0.056 3.002 0.055 4.998 0.098 2.002 0.066 4.001 0.063
20 100 1.000 0.039 2.998 0.039 5.000 0.064 2.002 0.040 3.999 0.046
50 100 1.000 0.024 3.001 0.025 4.999 0.040 2.001 0.025 4.000 0.029
Interactive-Effects Estimator
100 10 1.104 0.135 3.103 0.138 4.611 0.925 1.952 0.242 3.939 0.250
100 20 1.038 0.083 3.036 0.084 4.856 0.524 1.996 0.104 3.989 0.114
100 50 1.010 0.036 3.012 0.037 4.981 0.156 1.995 0.098 3.999 0.058
100 100 1.006 0.032 3.006 0.033 4.992 0.115 1.996 0.066 3.997 0.061
10 100 1.105 0.133 3.108 0.135 4.556 0.962 1.939 0.240 3.949 0.259
20 100 1.038 0.083 3.037 0.084 4.859 0.479 1.991 0.109 3.996 0.082
50 100 1.009 0.035 3.010 0.037 4.974 0.081 2.000 0.041 4.000 0.033
with ι = (1, 1)′. The regressors are correlated with λi, Ft, and the product λi′Ft.
The variables λij, Ftj, and ηitj are all i.i.d. N(0, 1) and the regression error
εit is i.i.d. N(0, 4). We set μ1 = μ2 = c1 = c2 = 1. Further, xi ∼ ι′λi + ei and
wt = ι′Ft + ηt, with ei and ηt being i.i.d. N(0, 1), so that xi is correlated with
λi and wt is correlated with Ft.
Simulation results are reported in Table I (based on 1000 repetitions).
The infeasible estimator in this table assumes observable Ft . Both the infea-
sible and interactive-effects estimators are consistent, but the latter is less effi-
cient than the former, as expected. The coefficients for the common regressors
and time-invariant regressors are estimated well. The within-group estimator
can only estimate β1 and β2 and is not reported.
We next investigate what happens when the interactive-effects estimator is used
when the underlying effects are additive. That is, λi = (αi, 1)′ and Ft = (1, ξt)′
so that λi′Ft = αi + ξt. With regressors Xit1 and Xit2 generated with the earlier
formula, the model is
Yit = Xit1 β1 + Xit2 β2 + αi + ξt + εit
We consider three estimators: (i) the within-group estimator, (ii) the infeasible
estimator, and (iii) the interactive-effects estimator. All three are consistent.
The results are reported in Table II. The interactive-effects estimator remains
valid under additive effects, but is less efficient than the within-group estima-
tor, as expected.
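For reference, the within-group estimator used as the benchmark in this experiment can be sketched as follows. The design below is only loosely modeled on the description above (β1 = 1, β2 = 3, error variance 4); the regressor formula, sample sizes, and seed are assumptions of this example, since the paper's own regressor formula is given earlier in the text.

```python
import numpy as np

def within(A):
    """Two-way within transformation a_it - a_i. - a_.t + a_.. (axis 0 is i, axis 1 is t)."""
    return (A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True)
            + A.mean(axis=(0, 1), keepdims=True))

rng = np.random.default_rng(2)
N, T = 50, 50
beta0 = np.array([1.0, 3.0])
alpha = rng.standard_normal(N)                    # individual effects
xi = rng.standard_normal(T)                       # time effects
additive = alpha[:, None] + xi[None, :]           # alpha_i + xi_t
# Regressors correlated with the additive effects (an assumed design).
X = np.stack([1.0 + additive + rng.standard_normal((N, T)) for _ in range(2)], axis=2)
Y = X @ beta0 + additive + 2.0 * rng.standard_normal((N, T))   # error variance 4

Yd, Xd = within(Y), within(X)
beta_wg = np.linalg.lstsq(Xd.reshape(N * T, 2), Yd.reshape(N * T), rcond=None)[0]
wiped = within(additive)                          # additive effects after the transformation
print(beta_wg)
```

The transformation removes αi + ξt exactly, which is why the within-group estimator needs no factor estimation in this model and is the efficient benchmark against which the interactive-effects estimator is compared.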
TABLE II
MODELS OF ADDITIVE EFFECTS

           Within-Group Estimator       Infeasible Estimator         Interactive-Effects Estimator
           β1 = 1        β2 = 3         β1            β2             β1            β2
  N    T   Mean      SD  Mean      SD   Mean      SD  Mean      SD   Mean      SD  Mean      SD
100    3   1.002  0.146  2.997  0.144   1.001  0.208  2.998  0.206   1.155  0.253  3.164  0.259
100    5   1.001  0.099  3.002  0.100   1.001  0.114  3.003  0.118   1.189  0.194  3.190  0.186
100   10   1.000  0.068  2.996  0.066   1.000  0.072  2.995  0.072   1.110  0.167  3.106  0.167
100   20   0.999  0.048  2.999  0.046   0.998  0.048  2.998  0.047   1.017  0.083  3.016  0.080
100   50   1.001  0.029  2.999  0.029   1.001  0.029  2.999  0.029   1.003  0.029  3.000  0.029
100  100   0.999  0.021  3.000  0.021   0.999  0.021  3.000  0.021   1.000  0.021  3.001  0.021
  3  100   1.001  0.142  2.995  0.143   1.002  0.113  2.996  0.116   1.163  0.240  3.165  0.251
  5  100   1.000  0.102  3.005  0.100   1.000  0.093  3.006  0.092   1.179  0.190  3.180  0.189
 10  100   1.000  0.069  2.999  0.069   1.001  0.066  2.999  0.065   1.106  0.167  3.106  0.164
 20  100   1.001  0.047  3.000  0.047   1.001  0.045  3.000  0.046   1.018  0.080  3.017  0.080
 50  100   0.998  0.030  3.002  0.029   0.998  0.030  3.002  0.028   1.000  0.030  3.004  0.029
APPENDIX A: PROOFS
We use the following facts throughout: $T^{-1}\|X_i\|^2 = T^{-1}\sum_{t=1}^{T}\|X_{it}\|^2 = O_p(1)$
or $T^{-1/2}\|X_i\| = O_p(1)$. Averaging over $i$, $(TN)^{-1}\sum_{i=1}^{N}\|X_i\|^2 = O_p(1)$.
Similarly, $T^{-1/2}\|F^0\| = O_p(1)$, $T^{-1}\|\hat F\|^2 = r$, $T^{-1/2}\|\hat F\| = \sqrt{r}$,
$T^{-1}\|X_i'F^0\| = O_p(1)$, and so forth. Throughout, we define
$\delta_{NT} = \min[\sqrt{N}, \sqrt{T}]$ so that $\delta_{NT}^2 = \min[N, T]$. The proofs of the
lemmas are given in the Supplemental Material.
$$\cdots + \frac{1}{NT}\sum_{i=1}^{N}\varepsilon_i'(P_F - P_{F^0})\varepsilon_i,$$
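where
$$\tilde S_{NT}(\beta, F) = \beta'\Big(\frac{1}{NT}\sum_{i=1}^{N}X_i'M_FX_i\Big)\beta + \operatorname{tr}\Big[\Big(\frac{F^{0\prime}M_FF^0}{T}\Big)\Big(\frac{\Lambda'\Lambda}{N}\Big)\Big] + 2\beta'\frac{1}{NT}\sum_{i=1}^{N}X_i'M_FF^0\lambda_i. \tag{38}$$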
By Lemma A.1,
$$C = \frac{1}{NT}\sum_{i=1}^{N}(\lambda_i' \otimes M_FX_i).$$
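Because $\Lambda'\Lambda/N > 0$ and $(F^{0\prime}M_{\hat F}F^0)/T \ge 0$, the above implies the latter matrix
is $o_p(1)$, that is,
$$\frac{F^{0\prime}M_{\hat F}F^0}{T} = \frac{F^{0\prime}F^0}{T} - \frac{F^{0\prime}\hat F}{T}\,\frac{\hat F'F^0}{T} = o_p(1). \tag{40}$$
By Assumption B, $F^{0\prime}F^0/T$ is invertible, so it follows that $F^{0\prime}\hat F/T$ is invertible.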
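The first equality in (40) is purely algebraic: it holds for any T × r matrix normalized so that its cross-product divided by T is the identity, and the left-hand side is always positive semidefinite. A small numerical check, with arbitrary matrices standing in for F0 and F̂ (so the result here is not small; only the identity is being verified):

```python
import numpy as np

rng = np.random.default_rng(4)
T, r = 40, 2
F0 = rng.standard_normal((T, r))                   # stand-in for the true factors
Q, _ = np.linalg.qr(rng.standard_normal((T, r)))   # orthonormal columns
F_hat = np.sqrt(T) * Q                             # normalized so F_hat'F_hat/T = I_r
M = np.eye(T) - F_hat @ F_hat.T / T                # M_{F_hat}

lhs = F0.T @ M @ F0 / T
rhs = F0.T @ F0 / T - (F0.T @ F_hat / T) @ (F_hat.T @ F0 / T)
min_eig = np.linalg.eigvalsh((lhs + lhs.T) / 2).min()
print(np.max(np.abs(lhs - rhs)), min_eig)
```

In the proof the matrix on the left is additionally shown to vanish in probability, which is what forces the product of the two cross-moment matrices to approach the invertible matrix $F^{0\prime}F^0/T$.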
Next,
Note that for any positive definite matrices, A and B, the eigenvalues of AB
are the same as those of BA, $A^{1/2}BA^{1/2}$, and so forth; therefore, all eigenvalues
are positive. In all remaining proofs, β and β0 are used interchangeably, and
so are F and F 0 .
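The eigenvalue fact used above (for positive definite A and B, the products AB and BA and the symmetric matrix $A^{1/2}BA^{1/2}$ share the same positive eigenvalues) is easy to confirm numerically. The matrices below are arbitrary positive definite draws chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 4

def random_pd(k):
    """An arbitrary positive definite k x k matrix."""
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)

A, B = random_pd(k), random_pd(k)
w, V = np.linalg.eigh(A)
A_half = V @ np.diag(np.sqrt(w)) @ V.T             # symmetric square root of A

eig_AB = np.sort(np.linalg.eigvals(A @ B).real)
eig_BA = np.sort(np.linalg.eigvals(B @ A).real)
S = A_half @ B @ A_half
eig_sym = np.linalg.eigvalsh((S + S.T) / 2)        # ascending order
print(eig_AB, eig_sym)
```

Since $A^{1/2}BA^{1/2}$ is symmetric positive definite, its eigenvalues are real and positive, and the similarity $AB = A^{1/2}(A^{1/2}BA^{1/2})A^{-1/2}$ transfers this to AB.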
PROPOSITION A.1: Under Assumptions A–D, we can make the following statements:
(i) $V_{NT}$ is invertible and $V_{NT} \xrightarrow{p} V$, where $V$ ($r \times r$) is a diagonal matrix
consisting of the eigenvalues of $\Sigma_\Lambda\Sigma_F$; $V_{NT}$ is defined in (12).
(ii) Let $H = (\Lambda'\Lambda/N)(F^{0\prime}\hat F/T)V_{NT}^{-1}$. Then $H$ is an $r \times r$ invertible matrix and
$$\frac{1}{T}\|\hat F - F^0H\|^2 = \frac{1}{T}\sum_{t=1}^{T}\|\hat F_t - H'F_t^0\|^2 = O_p(\|\hat\beta - \beta\|^2) + O_p\Big(\frac{1}{\min[N, T]}\Big).$$
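PROOF: From
$$\Big[\frac{1}{NT}\sum_{i=1}^{N}(Y_i - X_i\hat\beta)(Y_i - X_i\hat\beta)'\Big]\hat F = \hat FV_{NT}$$
and $Y_i - X_i\hat\beta = X_i(\beta - \hat\beta) + F^0\lambda_i + \varepsilon_i$, by expanding terms, we obtain
$$\begin{aligned}
\hat FV_{NT} &= \frac{1}{NT}\sum_{i=1}^{N}X_i(\beta - \hat\beta)(\beta - \hat\beta)'X_i'\hat F + \frac{1}{NT}\sum_{i=1}^{N}X_i(\beta - \hat\beta)\lambda_i'F^{0\prime}\hat F \\
&\quad + \frac{1}{NT}\sum_{i=1}^{N}X_i(\beta - \hat\beta)\varepsilon_i'\hat F + \frac{1}{NT}\sum_{i=1}^{N}F^0\lambda_i(\beta - \hat\beta)'X_i'\hat F \\
&\quad + \frac{1}{NT}\sum_{i=1}^{N}\varepsilon_i(\beta - \hat\beta)'X_i'\hat F \\
&\quad + \frac{1}{NT}\sum_{i=1}^{N}F^0\lambda_i\varepsilon_i'\hat F + \frac{1}{NT}\sum_{i=1}^{N}\varepsilon_i\lambda_i'F^{0\prime}\hat F + \frac{1}{NT}\sum_{i=1}^{N}\varepsilon_i\varepsilon_i'\hat F \\
&\quad + \frac{1}{NT}\sum_{i=1}^{N}F^0\lambda_i\lambda_i'F^{0\prime}\hat F \\
&= I1 + \cdots + I9.
\end{aligned}$$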
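The last term on the right is equal to $F^0(\Lambda'\Lambda/N)(F^{0\prime}\hat F/T)$. Letting $I1, \ldots, I8$
denote the first eight terms on the right, the above can be rewritten as
$$\hat FV_{NT} - F^0(\Lambda'\Lambda/N)(F^{0\prime}\hat F/T) = I1 + \cdots + I8.$$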
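Note that the matrix $V_{NT}(F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}$ is equal to $H^{-1}$, but the
invertibility of $V_{NT}$ is not proved yet. We have
$$T^{-1/2}\big\|\hat F[V_{NT}(F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}] - F^0\big\| = O_p(\|\hat\beta - \beta\|) + O_p(1/\min[\sqrt{N}, \sqrt{T}]). \tag{43}$$
$$\cdots = O_p(\|\hat\beta - \beta\|^2) = o_p(\|\hat\beta - \beta\|)$$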
because $T^{-1}F^{0\prime}(I1 + \cdots + I8) = o_p(1)$. The above equality shows that the
columns of $F^{0\prime}\hat F/T$ are the (nonnormalized) eigenvectors of the matrix
$(F^{0\prime}F^0/T)(\Lambda'\Lambda/N)$, and $V_{NT}$ consists of the eigenvalues of the same matrix
(in the limit). Thus $V_{NT} \xrightarrow{p} V$, where $V$ is $r \times r$, consisting of the $r$ eigenvalues
of the matrix $\Sigma_F\Sigma_\Lambda$.
(ii) Since $V_{NT}$ is invertible, the left-hand side of (43) can be written as
$T^{-1/2}\|\hat FH^{-1} - F^0\|$; thus (43) is equivalent to
$$T^{-1/2}\|\hat F - F^0H\| = O_p(\|\hat\beta - \beta\|) + O_p(1/\min[\sqrt{N}, \sqrt{T}]).$$
Taking squares on each side gives part (ii). Note that the cross-product term
from expanding the square has the same bound.
Q.E.D.
The proofs for the next four lemmas are given in the Supplemental Material.
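LEMMA A.2: Under Assumptions A–C, there exists an $M < \infty$, such that
statements (i) and (ii) hold:
(i) We have
$$E\Big\|N^{-1/2}\sum_{k=1}^{N}\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}F_sF_t'[\varepsilon_{kt}\varepsilon_{ks} - E(\varepsilon_{kt}\varepsilon_{ks})]\Big\|^2 \le M.$$
$O_p(\delta_{NT}^{-2})$.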
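(iv)
$$\frac{1}{NT}\sum_{k=1}^{N}\Big(\frac{X_k'F^0}{T}\Big)\Big(\frac{F^{0\prime}F^0}{T}\Big)^{-1}(\hat FH^{-1} - F^0)'\varepsilon_k = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{k=1}^{N}\Big(\frac{X_k'F^0}{T}\Big)\Big(\frac{F^{0\prime}F^0}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i\Big(\frac{1}{T}\sum_{t=1}^{T}\varepsilon_{it}\varepsilon_{kt}\Big) + (NT)^{-1/2}O_p(\hat\beta - \beta) + N^{-1/2}O_p(\delta_{NT}^{-2}).$$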
LEMMA A.5: Let $G = (F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}$. Under Assumptions A–D, we
have
1
N N
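PROOF: From $Y_i = X_i\beta^0 + F^0\lambda_i + \varepsilon_i$,
$$\hat\beta - \beta^0 = \Big(\sum_{i=1}^{N}X_i'M_{\hat F}X_i\Big)^{-1}\sum_{i=1}^{N}X_i'M_{\hat F}F^0\lambda_i + \Big(\sum_{i=1}^{N}X_i'M_{\hat F}X_i\Big)^{-1}\sum_{i=1}^{N}X_i'M_{\hat F}\varepsilon_i$$
or
$$\Big(\frac{1}{NT}\sum_{i=1}^{N}X_i'M_{\hat F}X_i\Big)(\hat\beta - \beta^0) = \frac{1}{NT}\sum_{i=1}^{N}X_i'M_{\hat F}F^0\lambda_i + \frac{1}{NT}\sum_{i=1}^{N}X_i'M_{\hat F}\varepsilon_i.$$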
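It follows that
$$\frac{1}{NT}\sum_{i=1}^{N}X_i'M_{\hat F}F^0\lambda_i = -\frac{1}{NT}\sum_{i=1}^{N}X_i'M_{\hat F}[I1 + \cdots + I8]\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i = J1 + \cdots + J8,$$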
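where $a_{ik} = \lambda_i'(\Lambda'\Lambda/N)^{-1}\lambda_k$ is a scalar and thus commutable with $\hat\beta - \beta$. Now
consider
$$J3 = \frac{1}{N^2T}\sum_{i=1}^{N}\sum_{k=1}^{N}X_i'M_{\hat F}X_k\Big[\frac{\varepsilon_k'\hat F}{T}\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i\Big](\hat\beta - \beta).$$
Writing $\varepsilon_k'\hat F/T = \varepsilon_k'F^0H/T + \varepsilon_k'(\hat F - F^0H)/T = O_p(T^{-1/2}) + O_p(\hat\beta - \beta) + O_p(1/\min[\sqrt{N}, \sqrt{T}])$,
by Lemma A.4, it is easy to see that $J3 = o_p(1)(\hat\beta - \beta)$.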
Next
$$J4 = -\frac{1}{N^2T}\sum_{i=1}^{N}\sum_{k=1}^{N}X_i'M_{\hat F}F^0\lambda_k(\beta - \hat\beta)'\Big(\frac{X_k'\hat F}{T}\Big)\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i.$$
Writing $M_{\hat F}F^0 = M_{\hat F}(F^0 - \hat FH^{-1})$ and using that $T^{-1/2}\|F^0 - \hat FH^{-1}\|$ is small,
$J4$ is equal to $o_p(1)(\hat\beta - \beta)$. It is easy to show $J5 = o_p(1)(\hat\beta - \beta)$ and thus
it is omitted.
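The last three terms $J6$–$J8$ do not explicitly depend on $\hat\beta - \beta$. Only term
$J7$ contributes to the limiting distribution of $\hat\beta - \beta$; the other two terms
are $o_p((NT)^{-1/2})$ plus $o_p(\hat\beta - \beta)$. We shall establish these claims. Consider
$$J6 = -\frac{1}{N^2T}\sum_{i=1}^{N}\sum_{k=1}^{N}X_i'M_{\hat F}F^0\lambda_k\Big(\frac{\varepsilon_k'\hat F}{T}\Big)\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i.$$
Denote $G = (F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}$ for the moment: it is a matrix of fixed
dimension and does not vary with $i$. Using $M_{\hat F}F^0 = M_{\hat F}(F^0 - \hat FH^{-1})$, we can
write
$$J6 = -\frac{1}{NT}\sum_{i=1}^{N}X_i'M_{\hat F}(F^0 - \hat FH^{-1})\Big[\frac{1}{N}\sum_{k=1}^{N}\lambda_k\frac{\varepsilon_k'\hat F}{T}\Big]G\lambda_i.$$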
Now
1 1 1
N N N
by Lemma A.4(iii). The last equality is because $(NT)^{-1/2}$ dominates
$(NT)^{-1/2}(\hat\beta - \beta)$. Furthermore, by Lemma A.3,
$\frac{1}{NT}\sum_{i=1}^{N}X_i'M_{\hat F}(\hat F - F^0H)\lambda_{i\ell} = O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-2})$
for $\ell = 1, 2, \ldots, r$, and noting $G$ does not depend on $i$ and
$\|G\| = O_p(1)$, we have
$$\cdots + N^{-1/2}O_p(\delta_{NT}^{-4}).$$
1
N N
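$$J8 = -\frac{1}{N^2T^2}\sum_{i=1}^{N}\sum_{k=1}^{N}X_i'M_{\hat F}\varepsilon_k\varepsilon_k'\hat F\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i$$
or, adding and subtracting $\Omega_k$,
$$J8 = -\frac{1}{N^2T^2}\sum_{i=1}^{N}\sum_{k=1}^{N}X_i'M_{\hat F}\Omega_k\hat FG\lambda_i - \frac{1}{N^2T^2}\sum_{i=1}^{N}\sum_{k=1}^{N}X_i'M_{\hat F}(\varepsilon_k\varepsilon_k' - \Omega_k)\hat FG\lambda_i. \tag{46}$$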
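Denote the first term on the right by $A_{NT}$. By Lemma A.5, we have
$$J8 = A_{NT} + O_p\Big(\frac{1}{T\sqrt{N}}\Big) + \frac{1}{\sqrt{NT}}\big[O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-1})\big] + \frac{1}{\sqrt{N}}O_p(\|\hat\beta - \beta\|^2) + \frac{1}{\sqrt{N}}O_p(\delta_{NT}^{-2}).$$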
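$$\Big(\frac{1}{NT}\sum_{i=1}^{N}X_i'M_{\hat F}X_i + o_p(1)\Big)(\hat\beta - \beta) - J2 = \frac{1}{NT}\sum_{i=1}^{N}X_i'M_{\hat F}\varepsilon_i + J7 + A_{NT} + o_p((NT)^{-1/2}) + O_p\Big(\frac{1}{T\sqrt{N}}\Big) + N^{-1/2}O_p(\delta_{NT}^{-2}).$$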
Combining terms and multiplying by $\sqrt{NT}$ yields
$$[D(\hat F) + o_p(1)]\sqrt{NT}(\hat\beta - \beta) = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\Big[X_i'M_{\hat F} - \frac{1}{N}\sum_{k=1}^{N}a_{ik}X_k'M_{\hat F}\Big]\varepsilon_i + \sqrt{NT}A_{NT} + o_p(1) + O_p(T^{-1/2}) + T^{1/2}O_p(\delta_{NT}^{-2}).$$
Thus, if $T/N^2 \to 0$, the last term is also $o_p(1)$. Multiply $D(\hat F)^{-1}$ on each
side of the above and note that $D(\hat F)^{-1}\sqrt{NT}A_{NT} = \sqrt{N/T}\,\zeta_{NT}$. Finally,
$D(\hat F)^{-1}[D(\hat F) + o_p(1)] = I + o_p(1)$, so we have proved the proposition.
Q.E.D.
LEMMA A.6: Under Assumptions A–D, ζNT = Op (1), where ζNT is given in
Proposition A.2.
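where
$$\xi^{\dagger}_{NT} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{N}\frac{(X_i - V_i)'F^0}{T}\Big(\frac{F^{0\prime}F^0}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_k\Big(\frac{1}{T}\sum_{t=1}^{T}\varepsilon_{it}\varepsilon_{kt}\Big) = O_p(1). \tag{47}$$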
Combining Proposition A.2 and Lemma A.8 and noting that
$\sqrt{T}O_p(\|\hat\beta - \beta^0\|^2) + O_p(\|\hat\beta - \beta^0\|)$ is dominated by $\sqrt{NT}(\hat\beta - \beta^0)$ and
$\sqrt{T}O_p(\delta_{NT}^{-2}) = o_p(1)$ if $T/N^2 \to 0$, we have an additional statement:
PROOF OF THEOREM 1: The assumption implies that both $T/N$ and $N/T$
are $O(1)$. Furthermore, $\zeta_{NT} = O_p(1)$ by Lemma A.6, and $\xi^{\dagger}_{NT}$ and hence
$\xi_{NT}$ are $O_p(1)$ by Lemma A.8. The theorem follows from the expression for
$\sqrt{NT}(\hat\beta - \beta)$ given in Corollary 1. Q.E.D.
√ 1
N
which implies B = 0. Thus (48) holds and the limiting distribution once again
follows from Assumption E. Q.E.D.
Bias Correction
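LEMMA A.10: Under Assumptions A–D, the following equalities hold:
(i) $\frac{1}{N}\|\hat\Lambda' - H^{-1}\Lambda'\|^2 = \frac{1}{N}\sum_{i=1}^{N}\|\hat\lambda_i - H^{-1}\lambda_i\|^2 = O_p(\|\hat\beta - \beta\|^2) + O_p(\delta_{NT}^{-2})$.
(ii) $N^{-1}(\hat\Lambda' - H^{-1}\Lambda')\Lambda = O_p(\|\hat\beta - \beta\|) + O_p(\delta_{NT}^{-2})$.
(iii) $\hat\Lambda'\hat\Lambda/N - H^{-1}(\Lambda'\Lambda/N)H'^{-1} = O_p(\|\hat\beta - \beta\|) + O_p(\delta_{NT}^{-2})$.
(iv) $(\hat\Lambda'\hat\Lambda/N)^{-1} - H'(\Lambda'\Lambda/N)^{-1}H = O_p(\|\hat\beta - \beta\|) + O_p(\delta_{NT}^{-2})$.
(v) $\frac{1}{N}\sum_{i=1}^{N}\|\hat\lambda_i - H^{-1}\lambda_i\| = O_p(\delta_{NT}^{-1}) + O_p(\|\hat\beta - \beta\|)$.
(vi) $\frac{1}{N}\sum_{i=1}^{N}T^{-1/2}\|X_i\|\,\|\hat\lambda_i - H^{-1}\lambda_i\| = O_p(\delta_{NT}^{-1}) + O_p(\|\hat\beta - \beta\|)$.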
LEMMA A.11: Under the assumptions of Theorem 4, $\sqrt{T/N}(\hat B - B) = o_p(1)$.
LEMMA A.12: Under the assumptions of Theorem 4, $\sqrt{N/T}(\hat C - C) = o_p(1)$.
The proof of Theorem 4 follows from Proposition A.3, Lemma A.11, and
Lemma A.12. For the proof of Proposition 2, see the Supplemental Material.
REFERENCES
ABOWD, J. M., F. KRAMARZ, AND D. N. MARGOLIS (1999): “High Wage Workers and High Wage
Firms,” Econometrica, 67, 251–333. [1233]
AHN, S. G., Y. H. LEE, AND P. SCHMIDT (2001): “GMM Estimation of Linear Panel Data Models
With Time-Varying Individual Effects,” Journal of Econometrics, 101, 219–255. [1230,1231,1233,
1237,1239,1258]
(2006): “Panel Data Models With Multiple Time-Varying Effects,” Mimeo, Arizona
State University. [1239]
ALTUG, S., AND R. A. MILLER (1990): “Household Choices in Equilibrium,” Econometrica, 58,
543–570. [1233]
ALVAREZ, J., AND M. ARELLANO (2003): “The Time Series and Cross-Section Asymptotics of
Dynamic Panel Data Estimators,” Econometrica, 71, 1121–1159. [1232]
ANDERSON, T. W. (1984): An Introduction to Multivariate Statistical Analysis. New York: Wiley.
[1236]
ANDERSON, T. W., AND C. HSIAO (1982): “Formulation and Estimation of Dynamic Models With
Error Components,” Journal of Econometrics, 76, 598–606. [1232]
ANDERSON, T. W., AND H. RUBIN (1956): “Statistical Inference in Factor Analysis,” in Proceedings
of Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 5, ed. by J. Neyman.
Berkeley: University of California Press, 111–150. [1234]
ANDREWS, D. W. K. (2005): “Cross-Section Regression With Common Shocks,” Econometrica,
73, 1551–1585. [1234]
ARELLANO, M. (2003): Panel Data Econometrics. Oxford: Oxford University Press. [1239]
ARELLANO, M., AND J. HAHN (2005): “Understanding Bias in Nonlinear Panel Models: Some
Recent Developments,” Unpublished Manuscript, CEMFI. [1250]
ARELLANO, M., AND B. HONORE (2001): “Panel Data Models: Some Recent Developments,” in
Handbook of Econometrics, Vol. 5, ed. by J. J. Heckman and E. Leamer. Amsterdam: North-
Holland. [1230]
BAI, J. (1994): “Least Squares Estimation of Shift in Linear Processes,” Journal of Time Series
Analysis, 15, 453–472. [1244]
(2003): “Inferential Theory for Factor Models of Large Dimensions,” Econometrica, 71,
135–173. [1242,1247,1251]
(2009): “Supplement to ‘Panel Data Models With Interactive Fixed Effects’,” Econometrica
Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/6135_proofs.pdf.
[1232]
BAI, J., AND S. NG (2002): “Determining the Number of Factors in Approximate Factor Models,”
Econometrica, 70, 191–221. [1235,1267]
(2006): “Confidence Intervals for Diffusion Index Forecasts and Inference for
Factor-Augmented Regressions,” Econometrica, 74, 1133–1150. [1251]
BALTAGI, B. H. (2005): Econometric Analysis of Panel Data. Chichester: Wiley. [1239]
BEKKER, P. A. (1994): “Alternative Approximations to the Distributions of Instrumental Variable
Estimators,” Econometrica, 62, 657–681. [1250]
BERNANKE, B., J. BOIVIN, AND P. ELIASZ (2005): “Factor Augmented Vector Autoregression and
the Analysis of Monetary Policy,” Quarterly Journal of Economics, 120, 387–422. [1239]
CAMPBELL, J. Y., A. W. LO, AND A. C. MACKINLAY (1997): The Econometrics of Financial Mar-
kets. Princeton, NJ: Princeton University Press. [1234]
CARNEIRO, P., K. T. HANSEN, AND J. J. HECKMAN (2003): “Estimating Distributions of Treatment
Effects With an Application to the Returns to Schooling and Measurement of the Effects of
Uncertainty on College Choice,” Working Paper 9546, NBER; International Economic Review,
44, 362–422. [1233]
CAWLEY, J., K. CONNELLY, J. HECKMAN, AND E. VYTLACIL (1997): “Cognitive Ability, Wages,
and Meritocracy,” in Intelligence, Genes, and Success: Scientists Respond to the Bell Curve, ed. by
B. Devlin, S. E. Feinberg, D. Resnick, and K. Roeder. Berlin: Springer-Verlag, 179–192. [1233]
CHAMBERLAIN, G. (1980): “Analysis of Covariance With Qualitative Data,” Review of Economic
Studies, 47, 225–238. [1232]
(1984): “Panel Data,” in Handbook of Econometrics, Vol. 2, ed. by Z. Griliches and M.
Intriligator. Amsterdam: North-Holland. [1230,1233,1239]
CHAMBERLAIN, G., AND M. ROTHSCHILD (1983): “Arbitrage, Factor Structure and Mean-
Variance Analysis in Large Asset Markets,” Econometrica, 51, 1281–1304. [1232,1235,1238,
1244]
COAKLEY, J., A. FUERTES, AND R. P. SMITH (2002): “A Principal Components Approach to Cross-
Section Dependence in Panels,” Mimeo, Birkbeck College, University of London. [1231]
CONNOR, G., AND R. KORAJCZYK (1986): “Performance Measurement With the Arbitrage Pricing
Theory: A New Framework for Analysis,” Journal of Financial Economics, 15, 373–394.
[1235,1236,1239]
FAMA, E., AND K. FRENCH (1993): “Common Risk Factors in the Returns on Stocks and Bonds,”
Journal of Financial Economics, 33, 3–56. [1234]
FELDSTEIN, M., AND C. HORIOKA (1980): “Domestic Savings and International Capital Flows,”
Economic Journal, 90, 314–329. [1233]
GIANNONE, D., AND M. LENZA (2005): “The Feldstein Horioka Fact,” Unpublished Manuscript,
European Central Bank. [1233]
GOLDBERGER, A. S. (1972): “Structural Equations Methods in the Social Sciences,” Economet-
rica, 40, 979–1001. [1231]
GREENE, W. (2000): Econometric Analysis (Fourth Ed.). Englewood Cliffs, NJ: Prentice-Hall.
[1253,1254]
HAHN, J., AND G. KUERSTEINER (2002): “Asymptotically Unbiased Inference for a Dynamic
Panel Model With Fixed Effects When Both n and T Are Large,” Econometrica, 70, 1639–1657.
[1232,1246,1247,1249]
HAHN, J., AND W. NEWEY (2004): “Jackknife and Analytical Bias Reduction for Nonlinear Panel
Models,” Econometrica, 72, 1295–1319. [1232,1249,1250]
HANSEN, C., J. HAUSMAN, AND W. NEWEY (2005): “Estimation With Many Instruments,” Un-
published Manuscript, Department of Economics, MIT. [1250]
HAUSMAN, J. (1978): “Specification Tests in Econometrics,” Econometrica, 46, 1251–1271. [1256]
HAUSMAN, J. A., AND W. E. TAYLOR (1981): “Panel Data and Unobservable Individual Effects,”
Econometrica, 49, 1377–1398. [1258,1260]
HOLTZ-EAKIN, D., W. NEWEY, AND H. ROSEN (1988): “Estimating Vector Autoregressions With
Panel Data,” Econometrica, 56, 1371–1395. [1230,1231,1233,1238,1239]
HSIAO, C. (2003): Analysis of Panel Data. New York: Cambridge University Press. [1232]
JÖRESKOG, K. G., AND A. S. GOLDBERGER (1975): “Estimation of a Model With Multiple In-
dicators and Multiple Causes of a Single Latent Variable,” Journal of the American Statistical
Association, 70, 631–639. [1231]
KIEFER, N. M. (1980): “A Time Series-Cross Section Model With Fixed Effects With an Intertem-
poral Factor Structure,” Unpublished Manuscript, Department of Economics, Cornell Univer-
sity. [1231,1237]
KIVIET, J. (1995): “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel
Data Models,” Journal of Econometrics, 68, 53–78. [1232]
KNEIP, A., R. SICKLES, AND W. SONG (2005): “A New Panel Data Treatment for Heterogeneity in
Time Trends,” Unpublished Manuscript, Department of Economics, Rice University. [1231]
LAWLEY, D. N., AND A. E. MAXWELL (1971): Factor Analysis as a Statistical Method. London:
Butterworth. [1234]
LEE, Y. H. (1991): “Panel Data Models With Multiplicative Individual and Time Effects: Appli-
cation to Compensation and Frontier Production Functions,” Unpublished Ph.D. Dissertation,
Michigan State University. [1231]
LETTAU, M., AND S. LUDVIGSON (2001): “Resurrecting the (C)CAPM: A Cross-Sectional Test
When Risk Premia Are Time Varying,” Journal of Political Economy, 109, 1238–1287. [1234]
MACURDY, T. (1982): “The Use of Time Series Processes to Model the Error Structure of Earn-
ings in a Longitudinal Data Analysis,” Journal of Econometrics, 18, 83–114. [1231]
MOON, R., AND B. PERRON (2004): “Testing for a Unit Root in Panels With Dynamic Factors,”
Journal of Econometrics, 122, 81–126. [1231]
MUNDLAK, Y. (1978): “On the Pooling of Time Series and Cross Section Data,” Econometrica,
46, 69–85. [1239]
NEWEY, W., AND D. MCFADDEN (1994): “Large Sample Estimation and Hypothesis Testing,” in
Handbook of Econometrics, ed. by R. F. Engle and D. McFadden. Amsterdam: North Holland,
2111–2245. [1244]
NEWEY, W., AND R. SMITH (2004): “Higher Order Properties of GMM and Generalized Empirical
Likelihood Estimators,” Econometrica, 72, 219–255. [1239]
NEWEY, W. K., AND K. D. WEST (1987): “A Simple Positive Semi-Definite, Heteroskedasticity
and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55, 703–708. [1251,1252]
NEYMAN, J., AND E. L. SCOTT (1948): “Consistent Estimates Based on Partially Consistent Ob-
servations,” Econometrica, 16, 1–32. [1232]
NICKELL, S. (1981): “Biases in Dynamic Models With Fixed Effects,” Econometrica, 49,
1417–1426. [1232]
OBSTFELD, M., AND K. ROGOFF (2000): “The Six Major Puzzles in International Macroeconomics:
Is There a Common Cause?” NBER Macroeconomics Annual, 15, 339–390. [1233]
PESARAN, M. H. (2006): “Estimation and Inference in Large Heterogeneous Panels With a Mul-
tifactor Error Structure,” Econometrica, 74, 967–1012. [1231,1233,1240]
PHILLIPS, P. C. B., AND D. SUL (2003): “Dynamic Panel Estimation and Homogeneity Testing
Under Cross Section Dependence,” The Econometrics Journal, 6, 217–259. [1231]
ROSS, S. (1976): “The Arbitrage Theory of Capital Asset Pricing,” Journal of Economic Theory,
13, 341–360. [1234]
SARGAN, J. D. (1964): “Wages and Prices in the United Kingdom: A Study in Econometric
Methodology,” in Econometric Analysis of National Economic Planning, ed. by P. G. Hart, G.
Mill, and J. K. Whitaker. London: Butterworths, 25–54. [1237]
STOCK, J. H., AND M. W. WATSON (2002): “Forecasting Using Principal Components From a
Large Number of Predictors,” Journal of the American Statistical Association, 97, 1167–1179.
[1235,1236,1239]
TOWNSEND, R. (1994): “Risk and Insurance in Village India,” Econometrica, 62, 539–592. [1233]
Dept. Economics, New York University, 19 West 4th Street, New York, NY 10012,
U.S.A., SEM, Tsinghua University, and CEMA, Central University of Finance and
Economics, Beijing, China; [email protected].
Manuscript received October, 2005; final revision received January, 2009.