
Econometrica, Vol. 77, No. 4 (July, 2009), 1229–1279

PANEL DATA MODELS WITH INTERACTIVE FIXED EFFECTS

BY JUSHAN BAI¹
This paper considers large N and large T panel data models with unobservable multiple interactive effects, which are correlated with the regressors. In earnings studies, for example, workers' motivation, persistence, and diligence combine to influence earnings in addition to innate ability, the factor usually emphasized. In macroeconomics, interactive effects represent unobservable common shocks and their heterogeneous impacts on cross sections. We consider identification, consistency, and the limiting distribution of the interactive-effects estimator. Under both large N and large T, the estimator is shown to be $\sqrt{NT}$ consistent, which is valid in the presence of correlations and heteroskedasticities of unknown form in both dimensions. We also derive the constrained estimator and its limiting distribution, imposing additivity coupled with interactive effects. The problem of testing additive versus interactive effects is also studied. In addition, we consider identification and estimation of models in the presence of a grand mean, time-invariant regressors, and common regressors. Given identification, the rate of convergence and limiting results continue to hold.

KEYWORDS: Additive effects, interactive effects, factor error structure, bias-corrected estimator, Hausman tests, time-invariant regressors, common regressors.

1. INTRODUCTION
WE CONSIDER THE FOLLOWING PANEL DATA MODEL with N cross-sectional
units and T time periods

$$Y_{it} = X_{it}'\beta + u_{it}, \qquad u_{it} = \lambda_i'F_t + \varepsilon_{it} \qquad (i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T), \tag{1}$$

where $X_{it}$ is a $p \times 1$ vector of observable regressors, $\beta$ is a $p \times 1$ vector of unknown coefficients, $u_{it}$ has a factor structure, $\lambda_i$ ($r \times 1$) is a vector of factor loadings, and $F_t$ ($r \times 1$) is a vector of common factors so that $\lambda_i'F_t = \lambda_{i1}F_{1t} + \cdots + \lambda_{ir}F_{rt}$; $\varepsilon_{it}$ are idiosyncratic errors; $\lambda_i$, $F_t$, and $\varepsilon_{it}$ are all unobserved. Our interest is centered on inference for the slope coefficient $\beta$, although inference for $\lambda_i$ and $F_t$ will also be discussed.

¹I am grateful to a co-editor and four anonymous referees for their constructive comments, which led to a much improved presentation. I am also grateful for comments and suggestions from seminar participants at the University of Pennsylvania, Rice, MIT/Harvard, Columbia Econometrics Colloquium, New York Econometrics Camp (Saratoga Springs), Syracuse, Malinvaud Seminar (Paris), European Central Bank/Center for Financial Studies Joint Workshop, Cambridge University, London School of Economics, Econometric Society European Summer Meetings (Vienna), Quantitative Finance and Econometrics at Stern, and the Federal Reserve Bank of Atlanta. This work is supported in part by NSF Grants SES-0551275 and SES-0424540.

© 2009 The Econometric Society DOI: 10.3982/ECTA6135


The preceding set of equations constitutes the interactive-effects model in light of the interaction between $\lambda_i$ and $F_t$. The usual fixed-effects model takes the form

$$Y_{it} = X_{it}'\beta + \alpha_i + \xi_t + \varepsilon_{it}, \tag{2}$$

where the individual effects $\alpha_i$ and the time effects $\xi_t$ enter the model additively instead of interactively; accordingly, it will be called the additive-effects model for comparison and reference. Note that multiple interactive effects include additive effects as special cases. For $r = 2$, consider the special factor and factor loading such that, for all $i$ and all $t$,

$$F_t = \begin{bmatrix} 1 \\ \xi_t \end{bmatrix} \quad\text{and}\quad \lambda_i = \begin{bmatrix} \alpha_i \\ 1 \end{bmatrix}.$$

Then

$$\lambda_i'F_t = \alpha_i + \xi_t.$$
The case of r = 1 has been studied by Holtz-Eakin, Newey, and Rosen (1988)
and Ahn, Lee, and Schmidt (2001), among others.
Owing to potential correlations between the unobservable effects and the regressors, we treat $\lambda_i$ and $F_t$ as fixed-effects parameters to be estimated. This is a basic approach to controlling the unobserved heterogeneity; see Chamberlain (1984) and Arellano and Honoré (2001). We allow the observable $X_{it}$ to be written as

$$X_{it} = \tau_i + \theta_t + \sum_{k=1}^{r} a_k\lambda_{ik} + \sum_{k=1}^{r} b_kF_{kt} + \sum_{k=1}^{r} c_k\lambda_{ik}F_{kt} + \pi_i'G_t + \eta_{it}, \tag{3}$$

where $a_k$, $b_k$, and $c_k$ are scalar constants (or vectors when $X_{it}$ is a vector), and $G_t$ is another set of common factors that do not enter the $Y_{it}$ equation. So $X_{it}$ can be correlated with $\lambda_i$ alone or with $F_t$ alone, or can be simultaneously correlated with $\lambda_i$ and $F_t$. In fact, $X_{it}$ can be a nonlinear function of $\lambda_i$ and $F_t$. We make no assumption on whether $F_t$ has a zero mean or whether $F_t$ is independent over time: it can be a dynamic process without zero mean. The same is true for $\lambda_i$. We directly estimate $\lambda_i$ and $F_t$, together with $\beta$, subject to some identifying restrictions. We consider the least squares method, which is detailed in Section 3.
While additive effects can be removed by the within-group transformation (least squares dummy variables), the scheme fails to purge interactive effects. For example, consider $r = 1$, $Y_{it} = X_{it}'\beta + \lambda_iF_t + \varepsilon_{it}$. Then

$$Y_{it} - \bar Y_{i\cdot} = (X_{it} - \bar X_{i\cdot})'\beta + \lambda_i(F_t - \bar F) + \varepsilon_{it} - \bar\varepsilon_{i\cdot},$$

where $\bar Y_{i\cdot}$, $\bar X_{i\cdot}$, and $\bar\varepsilon_{i\cdot}$ are averages over time. Because $F_t \not\equiv \bar F$, the within-group transformation with cross-section dummy variables is unable to remove the interactive effects. Similarly, the interactive effects cannot be removed with time dummy variables. Thus the within-group estimator is inconsistent, since the unobservables are correlated with the regressors. However, the interactive effects can be eliminated by the quasi-differencing method, as in Holtz-Eakin, Newey, and Rosen (1988). Further details are provided in Section 3.3.
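A small numerical check makes the point concrete. The sketch below is a Python/NumPy illustration with simulated values; the loadings `lam`, the factor `F`, and the helpers `demean_time` and `demean_two_way` are hypothetical names introduced here, not taken from the paper. It applies the within-group transformation and the two-way transformation to the interactive term $\lambda_iF_t$ and, for contrast, to additive effects $\alpha_i + \xi_t$. Arrays are stored as $N \times T$ here, with rows indexing units.

```python
import numpy as np

# Illustration only: r = 1 interactive term lambda_i * F_t, no noise, so the
# part that survives each transformation can be inspected exactly.
rng = np.random.default_rng(0)
N, T = 50, 20
lam = rng.normal(size=N)           # hypothetical loadings lambda_i
F = rng.normal(size=T)             # hypothetical factor F_t, not constant over t
inter = np.outer(lam, F)           # N x T array with entries lambda_i * F_t

def demean_time(a):
    """Within-group (cross-section dummy) transformation: subtract each unit's time mean."""
    return a - a.mean(axis=1, keepdims=True)

def demean_two_way(a):
    """Two-way transformation: remove unit means and time means, add back the grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

# Within-group leaves lambda_i * (F_t - Fbar), nonzero unless F_t is constant.
within = demean_time(inter)
# Two-way demeaning leaves (lambda_i - lambda_bar) * (F_t - Fbar).
two_way = demean_two_way(inter)

# By contrast, additive effects alpha_i + xi_t are removed exactly by two-way demeaning.
alpha, xi = rng.normal(size=N), rng.normal(size=T)
additive_two_way = demean_two_way(alpha[:, None] + xi[None, :])

print("max |within residual|:  ", np.abs(within).max())
print("max |two-way residual|: ", np.abs(two_way).max())
print("max |additive residual|:", np.abs(additive_two_way).max())
```

The residual interactive term after two-way demeaning is $(\lambda_i - \bar\lambda)(F_t - \bar F)$, which vanishes only in degenerate cases, whereas the additive term disappears up to rounding.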
Recently, Pesaran (2006) proposed a new estimator that allows for a multiple-factor error structure under large N and large T. His method augments the model with additional regressors, which are the cross-sectional averages of the dependent and independent variables, in an attempt to control for $F_t$. His estimator requires a certain rank condition, which depends on the data generating process and is not guaranteed to be met. Pesaran showed $\sqrt{N}$ consistency irrespective of the rank condition, and a possibly faster rate of convergence when the rank condition does hold. Coakley, Fuertes, and Smith (2002) proposed a two-step estimator, but this estimator was found to be inconsistent by Pesaran. The two-step estimator, while related, is not the least squares estimator. The latter is an iterated solution.
Ahn, Lee, and Schmidt (2001) considered the situation of fixed T and noted that the least squares method does not give a consistent estimator if serial correlation or heteroskedasticity is present in $\varepsilon_{it}$. They then explored consistent generalized method of moments (GMM) estimators and showed that a GMM method that incorporates moments of zero correlation and homoskedasticity is more efficient than least squares under fixed T. The fixed T framework was also studied earlier by Kiefer (1980) and Lee (1991).
Goldberger (1972) and Jöreskog and Goldberger (1975) are among the earlier advocates for factor models in econometrics, but they did not consider correlations between the factor errors and the regressors. Similar studies include MaCurdy (1982), who considered random-effects-type generalized least squares (GLS) estimation for fixed T, and Phillips and Sul (2003), who considered SUR-GLS (seemingly unrelated regressions) estimation for fixed N. Panel unit root tests with factor errors were studied by Moon and Perron (2004). Kneip, Sickles, and Song (2005) assumed $F_t$ is a smooth function of $t$ and estimated $F_t$ by smoothing splines. Given the spline basis, the estimation problem becomes one of ridge regression. The regressors $X_{it}$ are assumed to be independent of the effects.
In this paper, we provide a large N and large T perspective on panel data models with interactive effects, permitting the regressor $X_{it}$ to be correlated with either $\lambda_i$ or $F_t$, or both. Compared with the fixed T analysis, the large T perspective has its own challenges. For example, an incidental parameter problem is now present in both dimensions. Consequently, a different argument is called for. On the other hand, the large T setup also presents new opportunities. We show that if T is large, comparable with N, then the least squares estimator for $\beta$ is $\sqrt{NT}$ consistent, despite serial or cross-sectional correlations and heteroskedasticities of unknown form in $\varepsilon_{it}$. This contrasts with the fixed T framework, in which serial correlation implies inconsistency. Earlier fixed T studies assume independent and identically distributed (i.i.d.) $X_{it}$ over $i$, disallowing $X_{it}$ to contain common factors, but permitting $X_{it}$ to be correlated with $\lambda_i$. Earlier studies also assume $\varepsilon_{it}$ are i.i.d. over $i$. We allow $\varepsilon_{it}$ to be weakly correlated across $i$ and over $t$; thus, $u_{it}$ has the approximate factor structure of Chamberlain and Rothschild (1983). Additionally, heteroskedasticity is allowed in both dimensions.
Controlling fixed effects by directly estimating them, while often an effective approach, is not without difficulty. This difficulty, known as the incidental parameter problem, manifests itself in bias and inconsistency, at least under fixed T, as documented by Neyman and Scott (1948), Chamberlain (1980), and Nickell (1981). Even for large T, asymptotic bias can persist in dynamic or nonlinear panel data models with fixed effects.² We show that asymptotic bias arises under interactive effects, leading to limiting distributions that are not centered at zero. We also show that bias-corrected estimators can be constructed in a way similar to Hahn and Kuersteiner (2002) and Hahn and Newey (2004), who argued that bias-corrected estimators may have desirable properties relative to instrumental variable estimators.
Because additive effects are special cases of interactive effects, the interactive-effects estimator is consistent when the effects are, in fact, additive, but the estimator is less efficient than the one with additivity imposed. In this paper, we derive the constrained estimator together with its limiting distribution when additive and interactive effects are jointly present. We also consider the problem of testing additive effects versus interactive effects.

In Section 2, we explain why incorporating interactive effects can be a useful modelling paradigm. Section 3 outlines the estimation method and Section 4 discusses the underlying assumptions that lead to a consistent estimator. Section 5 derives the asymptotic representation and the asymptotic distribution of the estimator. Section 6 provides an interpretation of the estimator as a generalized within-group estimator. Section 7 derives the bias-corrected estimators. Section 8 considers estimators with additivity restrictions and their limiting distributions. Section 9 studies Hausman tests for additive effects versus interactive effects. Section 10 is devoted to time-invariant regressors and regressors that are common to each cross-sectional unit. Monte Carlo simulations are given in Section 11. All proofs are provided either in the Appendix or in the Supplemental Material (Bai (2009)).

2. SOME EXAMPLES
Macroeconometrics
Here $Y_{it}$ is the output (or growth rate) for country $i$ in period $t$, $X_{it}$ is the input such as labor and capital, $F_t$ represents common shocks (e.g., technological shocks and financial crises), $\lambda_i$ represents the heterogeneous impact of common shocks on country $i$, and, finally, $\varepsilon_{it}$ is the country-specific error term of output (or growth rate). In general, common shocks not only affect the output directly (through the total factor productivity or Solow residual), but also affect the amount of input in the production process (through investment decisions). When common shocks have homogeneous effects on the output, that is, $\lambda_i = \lambda$ for all $i$, the model collapses to the usual time effect by letting $\delta_t = \lambda'F_t$, where $\delta_t$ is a scalar. It is the heterogeneity that gives rise to a factor structure.

²See Nickell (1981), Anderson and Hsiao (1982), Kiviet (1995), Hsiao (2003, pp. 71–74), and Alvarez and Arellano (2003) for dynamic panel data models; see Hahn and Newey (2004) for nonlinear panel models.
Recently, Giannone and Lenza (2005) provided an explanation for the Feldstein–Horioka (1980) puzzle, one of the six puzzles in international macroeconomics (Obstfeld and Rogoff (2000)). The puzzle refers to the excessively high correlation between domestic savings and domestic investments in open economies. In their model, $Y_{it}$ is the investment and $X_{it}$ is the savings for country $i$, and $F_t$ is the common shock that affects both investment and savings decisions. Giannone and Lenza found that the high correlation is a consequence of the strong assumption that shocks have homogeneous effects across countries (additive effects); it disappears when shocks are allowed to have heterogeneous impacts (interactive effects).

Microeconometrics
In earnings studies, $Y_{it}$ represents the wage rate for individual $i$ with age (or age cohort) $t$ and $X_{it}$ is a vector of observable characteristics, such as education, experience, gender, and race. Here $\lambda_i$ represents a vector of unobservable characteristics or unmeasured skills, such as innate ability, perseverance, motivation, and industriousness, and $F_t$ is a vector of prices for the unmeasured skills. The model assumes that the price vector for the unmeasured skills is time-varying. If $F_t = f$ for all $t$, the standard fixed-effects model is obtained by letting $\alpha_i = \lambda_i'f$. In this example, $t$ is not necessarily calendar time, but age or age cohort. Applications in this area were given by Cawley, Connelly, Heckman, and Vytlacil (1997) and Carneiro, Hansen, and Heckman (2003). As explained in a previous version of this paper, the model of Abowd, Kramarz, and Margolis (1999) can be extended to disentangle the worker and the firm effects while incorporating interactive effects. Ahn, Lee, and Schmidt (2001) provided a theoretical motivation for a single factor model based on the work of Altug and Miller (1990) and Townsend (1994).
In the setup of Holtz-Eakin, Newey, and Rosen (1988), the slope coefficient $\beta$ is also time-varying. Their model can be considered as a projection of $Y_{it}$ on $\{X_{it}, \lambda_i\}$; see Chamberlain (1984). Pesaran (2006) allowed $\beta$ to be heterogeneous over $i$ such that $\beta_i = \beta + v_i$ with $v_i$ being i.i.d. In this regard, the constant slope coefficient is restrictive. To partially alleviate the restriction, it would be useful to allow additional individual and time effects as

$$Y_{it} = X_{it}'\beta + \alpha_i + \delta_t + \lambda_i'F_t + \varepsilon_{it}. \tag{4}$$

Model (4) will be considered in Section 8.


Finance
Here Yit is the excess return of asset i in period t; Xit is a vector of observable
factors such as dividend yields, dividend payout ratio, and consumption gap as
in Lettau and Ludvigson (2001) or book and size factors as in Fama and French
(1993); Ft is a vector of unobservable factor returns; λi is the factor loading; εit
is the idiosyncratic return. The arbitrage pricing theory of Ross (1976) is built
upon a factor model for asset returns. Campbell, Lo, and MacKinlay (1997)
provided many applications of factor models in finance.

Cross-Section Correlation
Interactive-effects models provide a tractable way to model cross-section correlations. In the error term $u_{it} = \lambda_i'F_t + \varepsilon_{it}$, each cross section shares the same $F_t$, causing cross-correlation. If $\lambda_i = 1$ for all $i$, and $\varepsilon_{it}$ are i.i.d. over $i$ and $t$, an equal correlation model is obtained. In a recent paper, Andrews (2005) showed that cross-section correlation induced by common shocks can be problematic for inference. Andrews' analysis is confined to the framework of a single cross section. In the panel data context, as shown here, consistency and proper inference can be obtained.
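The equal correlation case can be illustrated by simulation. The sketch assumes, purely for illustration, that $F_t$ is drawn i.i.d. with variance `var_f` (the paper itself treats $F_t$ as parameters) and that $\lambda_i = 1$; under these assumptions the correlation between $u_{it}$ and $u_{jt}$ for $i \ne j$ is `var_f / (var_f + var_eps)`, the same for every pair. The function name `implied_correlation` is hypothetical.

```python
import numpy as np

def implied_correlation(var_f, var_eps):
    """Corr(u_it, u_jt), i != j, when u_it = F_t + eps_it with F_t random (illustrative assumption)."""
    return var_f / (var_f + var_eps)

rng = np.random.default_rng(1)
N, T = 100, 5000
var_f, var_eps = 1.0, 1.0
F = rng.normal(scale=np.sqrt(var_f), size=T)              # common shock shared by all units
eps = rng.normal(scale=np.sqrt(var_eps), size=(N, T))     # i.i.d. idiosyncratic errors
u = F[None, :] + eps                                      # lambda_i = 1 for all i

corr = np.corrcoef(u)                                     # rows are units: N x N correlation matrix
off_diag = corr[~np.eye(N, dtype=bool)]                   # all pairs i != j
print("implied:", implied_correlation(var_f, var_eps))
print("simulated mean off-diagonal correlation:", off_diag.mean())
```

Every pair of units shares the same correlation, which is what makes the model "equal correlation"; heterogeneous $\lambda_i$ would instead produce pair-specific correlations.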

3. ESTIMATION
3.1. Issues of Identification
Even in the absence of regressors $X_{it}$, the lack of identification for factor models is well known; see Anderson and Rubin (1956) and Lawley and Maxwell (1971). The current setting differs from classical factor identification in two aspects. First, both factor loadings and factors are treated as parameters, as opposed to factor loadings only. Second, the number of individuals N is assumed to grow without bound instead of being fixed, and it can be much larger than the number of observations T.
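Write the model as

$$Y_i = X_i\beta + F\lambda_i + \varepsilon_i,$$

where

$$Y_i = \begin{bmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{iT} \end{bmatrix}, \quad X_i = \begin{bmatrix} X_{i1}' \\ X_{i2}' \\ \vdots \\ X_{iT}' \end{bmatrix}, \quad F = \begin{bmatrix} F_1' \\ F_2' \\ \vdots \\ F_T' \end{bmatrix}, \quad \varepsilon_i = \begin{bmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT} \end{bmatrix}.$$

Similarly, define $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)'$, an $N \times r$ matrix. In matrix notation,

$$Y = X\beta + F\Lambda' + \varepsilon, \tag{5}$$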

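where $Y = (Y_1, \ldots, Y_N)$ is $T \times N$ and $X$ is a three-dimensional matrix with $p$ sheets ($T \times N \times p$), the $\ell$th sheet of which is associated with the $\ell$th element of $\beta$ ($\ell = 1, 2, \ldots, p$). The product $X\beta$ is $T \times N$ and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_N)$ is $T \times N$.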
In view of $F\Lambda' = FAA^{-1}\Lambda'$ for an arbitrary $r \times r$ invertible $A$, identification is not possible without restrictions. Because an arbitrary $r \times r$ invertible matrix has $r^2$ free elements, the number of restrictions needed is $r^2$. The normalization³

$$F'F/T = I_r \tag{6}$$

yields $r(r+1)/2$ restrictions. This is a commonly used normalization; see, for example, Connor and Korajczyk (1986), Stock and Watson (2002), and Bai and Ng (2002). Additional $r(r-1)/2$ restrictions can be obtained by requiring

$$\Lambda'\Lambda = \text{diagonal}. \tag{7}$$

These two sets of restrictions, $r(r+1)/2 + r(r-1)/2 = r^2$ in total, uniquely determine $\Lambda$ and $F$, given the product $F\Lambda'$.⁴ The least squares estimators for $F$ and $\Lambda$ derived below satisfy these restrictions.
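The indeterminacy and the role of (6) and (7) can be checked numerically. In the sketch below, a hypothetical helper `normalize` and simulated $F$, $\Lambda$, and $A$ are used for illustration: two observationally equivalent pairs, $(F, \Lambda)$ and $(FA, \Lambda(A^{-1})')$, produce the same product $F\Lambda'$; splitting that product through its singular value decomposition yields factors with $F'F/T = I_r$ and $\Lambda'\Lambda$ diagonal, and the two pairs then agree up to the column signs noted in footnote 4. The SVD route is one convenient way to impose the restrictions, not necessarily the paper's computational choice.

```python
import numpy as np

def normalize(common, r):
    """Split a T x N common component C = F Lam' into (F, Lam) with F'F/T = I_r, as in (6),
    and Lam'Lam diagonal, as in (7), using the SVD C = U S V'."""
    T = common.shape[0]
    U, s, Vt = np.linalg.svd(common, full_matrices=False)
    F = np.sqrt(T) * U[:, :r]            # F'F/T = U_r'U_r = I_r
    Lam = common.T @ F / T               # = V_r S_r / sqrt(T), so Lam'Lam = S_r^2 / T
    return F, Lam

rng = np.random.default_rng(2)
T, N, r = 30, 40, 2
F0, Lam0 = rng.normal(size=(T, r)), rng.normal(size=(N, r))
A = rng.normal(size=(r, r)) + 3 * np.eye(r)      # an arbitrary invertible r x r matrix

# Observationally equivalent pair: (F A)(Lam A^{-1}')' = F A A^{-1} Lam' = F Lam'
F1, Lam1 = F0 @ A, Lam0 @ np.linalg.inv(A).T
C0, C1 = F0 @ Lam0.T, F1 @ Lam1.T

Fa, La = normalize(C0, r)
Fb, Lb = normalize(C1, r)
signs = np.sign(np.sum(Fa * Fb, axis=0))         # columnwise sign ambiguity (footnote 4)
print("same product:", np.allclose(C0, C1))
print("same normalized factors up to sign:", np.allclose(Fa, Fb * signs))
```

Without (7), any orthogonal rotation $G$ of the normalized pair would also satisfy (6), which is exactly the indeterminacy described in footnote 3.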
With either fixed N or fixed T, factor analysis would require additional restrictions. For example, the covariance matrix of $\varepsilon_i$ is diagonal, or the covariance matrix depends on a small number of parameters via parameterization. Under large N and large T, the cross-sectional covariance matrix of $\varepsilon_{it}$ or the time series covariance matrix can be of an unknown form. In particular, none of the elements is required to be zero. However, the correlation, either cross-sectional or serial, must be weak, which we assume to hold. This is known as the approximate factor model of Chamberlain and Rothschild (1983).
Sufficient variation in $X_{it}$ is also needed. The usual identification condition for $\beta$ is that the matrix $\frac{1}{NT}\sum_{i=1}^{N} X_i'M_FX_i$ is of full rank, where $M_F = I_T - F(F'F)^{-1}F'$. Because $F$ is not observable and is estimated, a stronger condition is required. See Section 4 for details.

3.2. Estimation
The least squares objective function is defined as

$$SSR(\beta, F, \Lambda) = \sum_{i=1}^{N}(Y_i - X_i\beta - F\lambda_i)'(Y_i - X_i\beta - F\lambda_i) \tag{8}$$

subject to the constraint $F'F/T = I_r$ and $\Lambda'\Lambda$ being diagonal. Define the projection matrix

$$M_F = I_T - F(F'F)^{-1}F' = I_T - FF'/T.$$

The least squares estimator for $\beta$ for each given $F$ is simply

$$\hat\beta(F) = \Big(\sum_{i=1}^{N} X_i'M_FX_i\Big)^{-1}\sum_{i=1}^{N} X_i'M_FY_i.$$

³The normalization still leaves rotation indeterminacy. For example, let $G$ be an $r \times r$ orthogonal matrix, and let $F^* = FG$ and $\Lambda^* = \Lambda G$. Then $F\Lambda' = F^*\Lambda^{*\prime}$ and $F^{*\prime}F^*/T = F'F/T = I$. To remove this indeterminacy, we fix $G$ to make $\Lambda^{*\prime}\Lambda^* = G'\Lambda'\Lambda G$ a diagonal matrix. This is the reason for restriction (7).

⁴Uniqueness is up to a columnwise sign change. For example, $-F$ and $-\Lambda$ also satisfy the restrictions.

Given $\beta$, the variable $W_i = Y_i - X_i\beta$ has a pure factor structure such that

$$W_i = F\lambda_i + \varepsilon_i. \tag{9}$$

Define $W = (W_1, W_2, \ldots, W_N)$, a $T \times N$ matrix. The least squares objective function is

$$\operatorname{tr}\big[(W - F\Lambda')(W - F\Lambda')'\big].$$

From the analysis of pure factor models estimated by the method of least squares (i.e., principal components; see Connor and Korajczyk (1986) and Stock and Watson (2002)), by concentrating out $\Lambda = W'F(F'F)^{-1} = W'F/T$, the objective function becomes

$$\operatorname{tr}(W'M_FW) = \operatorname{tr}(W'W) - \operatorname{tr}(F'WW'F)/T. \tag{10}$$

Therefore, minimizing with respect to $F$ is equivalent to maximizing $\operatorname{tr}[F'(WW')F]$. The estimator for $F$ (see Anderson (1984)) is equal to the first $r$ eigenvectors (multiplied by $\sqrt{T}$ due to the restriction $F'F/T = I$) associated with the first $r$ largest eigenvalues of the matrix

$$WW' = \sum_{i=1}^{N} W_iW_i' = \sum_{i=1}^{N}(Y_i - X_i\beta)(Y_i - X_i\beta)'.$$
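For a fixed $\beta$, the principal components step can be sketched as follows. This is an illustrative NumPy implementation on simulated data; the function names `principal_components` and `concentrated_ssr` are hypothetical. It returns $\sqrt{T}$ times the eigenvectors of $WW'$ for the $r$ largest eigenvalues together with $\Lambda = W'F/T$, and it can be used to check the identity in (10) and that a randomly drawn alternative $F$ with $F'F/T = I_r$ does not give a smaller concentrated objective.

```python
import numpy as np

def principal_components(W, r):
    """Principal components estimate of (F, Lambda) in W = F Lambda' + eps, W being T x N.

    F is sqrt(T) times the eigenvectors of W W' for the r largest eigenvalues,
    so F'F/T = I_r; Lambda = W'F/T is the concentrated-out loading matrix.
    """
    T = W.shape[0]
    eigval, eigvec = np.linalg.eigh(W @ W.T)        # eigh returns eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:r]            # indices of the r largest eigenvalues
    F = np.sqrt(T) * eigvec[:, order]
    Lam = W.T @ F / T
    return F, Lam

def concentrated_ssr(W, F):
    """Concentrated objective (10): tr(W' M_F W) with M_F = I_T - F F'/T."""
    T = W.shape[0]
    M = np.eye(T) - F @ F.T / T
    return np.trace(W.T @ M @ W)

rng = np.random.default_rng(3)
T, N, r = 40, 60, 2
W = rng.normal(size=(T, r)) @ rng.normal(size=(N, r)).T + 0.5 * rng.normal(size=(T, N))
F_hat, Lam_hat = principal_components(W, r)

# A competing normalized factor matrix, drawn at random for comparison.
Q, _ = np.linalg.qr(rng.normal(size=(T, r)))
F_other = np.sqrt(T) * Q
print(concentrated_ssr(W, F_hat), "<=", concentrated_ssr(W, F_other))
```

Because $\Lambda'\Lambda = F'WW'F/T^2$ and the columns of $F$ are eigenvectors of $WW'$, restriction (7) is satisfied automatically by this construction.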

Therefore, given $F$, we can estimate $\beta$, and given $\beta$, we can estimate $F$. The final least squares estimator $(\hat\beta, \hat F)$ is the solution of the set of nonlinear equations

$$\hat\beta = \Big(\sum_{i=1}^{N} X_i'M_{\hat F}X_i\Big)^{-1}\sum_{i=1}^{N} X_i'M_{\hat F}Y_i \tag{11}$$

and

$$\Big[\frac{1}{NT}\sum_{i=1}^{N}(Y_i - X_i\hat\beta)(Y_i - X_i\hat\beta)'\Big]\hat F = \hat FV_{NT}, \tag{12}$$

where $V_{NT}$ is a diagonal matrix that consists of the $r$ largest eigenvalues of the above matrix⁵ in the brackets, arranged in decreasing order. The solution $(\hat\beta, \hat F)$ can be simply obtained by iteration. Finally, from $\Lambda = W'F/T$, $\hat\Lambda$ is expressed as a function of $(\hat\beta, \hat F)$ such that

$$\hat\Lambda = (\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_N)' = T^{-1}\big[\hat F'(Y_1 - X_1\hat\beta), \ldots, \hat F'(Y_N - X_N\hat\beta)\big]'.$$

We may also write

$$\hat\Lambda' = T^{-1}\hat F'(Y - X\hat\beta),$$

where $Y$ is $T \times N$ and $X$ is $T \times N \times p$, a three-dimensional matrix.
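A minimal sketch of the iteration between (11) and (12) follows, assuming data stored as a $T \times N$ array `Y` and a $T \times N \times p$ array `X` as in (5). The pooled OLS starting value, the stopping rule, and the simulated design (a hypothetical data generating process in the spirit of (3)) are assumptions made for illustration; the function names are hypothetical as well.

```python
import numpy as np

def beta_given_F(Y, X, F):
    """Equation (11): (sum_i X_i'M_F X_i)^{-1} sum_i X_i'M_F Y_i, with M_F = I - FF'/T."""
    T, N, p = X.shape
    M = np.eye(T) - F @ F.T / T
    MX = (M @ X.reshape(T, N * p)).reshape(T, N, p)   # M_F X_i for every unit i
    XMX = np.einsum("tip,tiq->pq", MX, X)             # sum_i X_i'M_F X_i (M_F is symmetric)
    XMY = np.einsum("tip,ti->p", MX, Y)               # sum_i X_i'M_F Y_i
    return np.linalg.solve(XMX, XMY)

def F_given_beta(Y, X, beta, r):
    """Equation (12): sqrt(T) times the top-r eigenvectors of W W'/(NT), W = Y - X beta."""
    T, N = Y.shape
    W = Y - X @ beta                                  # (T, N, p) @ (p,) -> (T, N)
    eigval, eigvec = np.linalg.eigh(W @ W.T / (N * T))
    order = np.argsort(eigval)[::-1][:r]              # r largest, in decreasing order
    return np.sqrt(T) * eigvec[:, order], eigval[order]

def iterate_11_12(Y, X, r, max_iter=2000, tol=1e-8):
    """Alternate (11) and (12) from a pooled OLS start until beta stops changing."""
    T, N, p = X.shape
    beta = np.linalg.lstsq(X.reshape(T * N, p), Y.reshape(T * N), rcond=None)[0]
    for _ in range(max_iter):
        F, _ = F_given_beta(Y, X, beta, r)
        beta_new = beta_given_F(Y, X, F)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    F, V = F_given_beta(Y, X, beta, r)
    Lam = (Y - X @ beta).T @ F / T                    # Lambda-hat = (Y - X beta-hat)'F-hat/T
    return beta, F, Lam, V

def simulate(rng, T=50, N=100, r=2, beta=(1.0, 3.0), c=(1.0, 0.5)):
    """Hypothetical DGP: each regressor loads on lambda_i'F_t, iota'lambda_i and iota'F_t."""
    beta = np.asarray(beta)
    F0, L0 = rng.normal(size=(T, r)), rng.normal(size=(N, r))
    common = F0 @ L0.T                                # T x N matrix of lambda_i'F_t
    X = np.empty((T, N, beta.size))
    for k, ck in enumerate(c):
        X[:, :, k] = (1.0 + ck * common + L0.sum(axis=1)[None, :]
                      + F0.sum(axis=1)[:, None] + rng.normal(size=(T, N)))
    Y = X @ beta + common + rng.normal(size=(T, N))
    return Y, X, beta

Y, X, beta_true = simulate(np.random.default_rng(4))
beta_hat, F_hat, Lam_hat, V_hat = iterate_11_12(Y, X, r=2)
print("true beta:", beta_true, "estimate:", beta_hat)
```

Each step exactly minimizes the concentrated objective (13) in one block of parameters, so the objective cannot increase from one iteration to the next.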

The triplet $(\hat\beta, \hat F, \hat\Lambda)$ jointly minimizes the objective function (8). The pair $(\hat\beta, \hat F)$ jointly minimizes the concentrated objective function (10), which, when substituting $Y_i - X_i\beta$ for $W_i$, is equal to

$$\operatorname{tr}(W'M_FW) = \sum_{i=1}^{N} W_i'M_FW_i = \sum_{i=1}^{N}(Y_i - X_i\beta)'M_F(Y_i - X_i\beta). \tag{13}$$

This is also the objective function considered by Ahn, Lee, and Schmidt (2001), although a different normalization is used. They, as well as Kiefer (1980), discussed an iteration procedure for estimation. Interestingly, convergence to a local optimum for such an iterated estimator was proved by Sargan (1964).
Here we suggest a more robust iteration scheme (having a much better convergence property according to Monte Carlo evidence) than the one implied by (11) and (12). Given $F$ and $\Lambda$, we compute

$$\hat\beta(F, \Lambda) = \Big(\sum_{i=1}^{N} X_i'X_i\Big)^{-1}\sum_{i=1}^{N} X_i'(Y_i - F\lambda_i),$$

and given $\beta$, we compute $F$ and $\Lambda$ from the pure factor model $W_i = F\lambda_i + \varepsilon_i$ with $W_i = Y_i - X_i\beta$. This iteration scheme only requires a single matrix inverse $(\sum_{i=1}^{N} X_i'X_i)^{-1}$, with no need to update it during iteration. Our simulation results are based on this scheme.
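The alternative scheme can be sketched in the same way. In this hypothetical implementation, $(\sum_i X_i'X_i)^{-1}$ is computed once before the loop, $\beta$ is updated by regressing $Y_i - F\lambda_i$ on $X_i$, and $(F, \Lambda)$ are updated by principal components on $W = Y - X\beta$. Starting from $F = 0$ (so the first step is pooled OLS), the tolerance, and the simulated design are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def pc_step(W, r):
    """Principal components for W = F Lambda' + eps: F = sqrt(T) * top-r eigenvectors of WW'."""
    T = W.shape[0]
    eigval, eigvec = np.linalg.eigh(W @ W.T)
    order = np.argsort(eigval)[::-1][:r]
    F = np.sqrt(T) * eigvec[:, order]
    return F, W.T @ F / T                              # (F, Lambda) with Lambda = W'F/T

def iterate_alternative(Y, X, r, max_iter=5000, tol=1e-9):
    """Alternate beta(F, Lambda) and principal components, reusing one inverse of sum_i X_i'X_i."""
    T, N, p = X.shape
    Xs = X.reshape(T * N, p)                           # rows are (t, i) pairs, matching Y.reshape
    XtX_inv = np.linalg.inv(Xs.T @ Xs)                 # (sum_i X_i'X_i)^{-1}, computed once
    F, Lam = np.zeros((T, r)), np.zeros((N, r))        # F = 0: the first beta is pooled OLS
    beta = np.zeros(p)
    for _ in range(max_iter):
        beta_new = XtX_inv @ (Xs.T @ (Y - F @ Lam.T).reshape(T * N))
        F, Lam = pc_step(Y - X @ beta_new, r)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, F, Lam

def simulate(rng, T=50, N=100, r=2, beta=(1.0, 3.0), c=(1.0, 0.5)):
    """Hypothetical DGP: regressors load on lambda_i'F_t, iota'lambda_i and iota'F_t."""
    beta = np.asarray(beta)
    F0, L0 = rng.normal(size=(T, r)), rng.normal(size=(N, r))
    common = F0 @ L0.T
    X = np.empty((T, N, beta.size))
    for k, ck in enumerate(c):
        X[:, :, k] = (1.0 + ck * common + L0.sum(axis=1)[None, :]
                      + F0.sum(axis=1)[:, None] + rng.normal(size=(T, N)))
    return X @ beta + common + rng.normal(size=(T, N)), X, beta

Y, X, beta_true = simulate(np.random.default_rng(5))
beta_alt, F_alt, Lam_alt = iterate_alternative(Y, X, r=2)
print("true beta:", beta_true, "estimate:", beta_alt)
```

The design choice mirrors the text: the only matrix inverse is formed once, and each pass costs one regression update plus one eigen-decomposition of a $T \times T$ matrix.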

REMARK 1: The common factor $F$ is obtained by the principal components method from the matrix $WW'/N$. Under large N but fixed T, $WW'/N \to \Sigma_W = F\Sigma_\Lambda F' + \Omega$, where $\Sigma_\Lambda$, an $r \times r$ matrix, is the limit of $\Lambda'\Lambda/N$, and $\Omega$ is the $T \times T$ covariance matrix of $\varepsilon_i$; see (9). Unless $\Omega$ is a scalar multiple of $I_T$ (the identity matrix), that is, unless there is neither serial correlation nor heteroskedasticity, the first $r$ eigenvectors of $\Sigma_W$ are not a rotation of $F$, implying that the first $r$ eigenvectors of $WW'/N$ are not consistent for $F$ (a rotation of $F$, to be more precise). This leads to inconsistent estimation of the product $F\Lambda'$ and, thus, inconsistent estimation of $\beta$. However, under large T, $\Omega$ does not have to be a scalar multiple of an identity matrix and the principal components estimator for $F$ is consistent. This is the essence of the approximate factor model of Chamberlain and Rothschild (1983).

⁵We divide this matrix by NT so that $V_{NT}$ will have a proper limit. The scaling does not affect $\hat F$.
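Remark 1 can be checked at the population level for $r = 1$. The sketch computes the top eigenvector of $\Sigma_W = F\Sigma_\Lambda F' + \Omega$ for a fixed $T$ and reports the absolute cosine of the angle between it and $F$; a value of 1 means the eigenvector is a rotation (here, a rescaling) of $F$. The factor path and the three choices of $\Omega$ (scalar, heteroskedastic, and AR(1)-type) are hypothetical values chosen for illustration, as is the function name.

```python
import numpy as np

def top_eigvec_alignment(F, Omega, sigma_lam=1.0):
    """|cos| of the angle between F and the top eigenvector of F sigma_lam F' + Omega (r = 1)."""
    Sigma_W = sigma_lam * np.outer(F, F) + Omega
    eigval, eigvec = np.linalg.eigh(Sigma_W)
    v = eigvec[:, np.argmax(eigval)]
    return abs(v @ F) / np.linalg.norm(F)

T = 6
F = np.array([1.0, -1.0, 2.0, 0.5, -1.5, 1.0])            # hypothetical fixed-T factor path
Omega_scalar = 2.0 * np.eye(T)                             # no serial correlation, no heteroskedasticity
Omega_het = np.diag([1.0, 1.0, 1.0, 1.0, 1.0, 4.0])         # heteroskedastic over t
rho = 0.5
Omega_ar1 = rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))  # AR(1)-type correlation

print(top_eigvec_alignment(F, Omega_scalar))   # equals 1 up to rounding
print(top_eigvec_alignment(F, Omega_het))      # strictly below 1
print(top_eigvec_alignment(F, Omega_ar1))      # strictly below 1
```

For $r = 1$ the top eigenvector is proportional to $F$ exactly when $\Omega F$ is proportional to $F$, which holds for a scalar $\Omega$ but not for the other two choices here.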

REMARK 2: Instead of estimating $F$ from (9) by the method of principal components, one can directly use factor analysis. Factor analysis such as the maximum likelihood method allows $\Omega$ to be heteroskedastic and to have nonzero off-diagonal elements. The off-diagonal elements (due to serial correlation) must be parametrized to avoid too many free parameters. Serial correlation can also be removed by adding lagged dependent variables as regressors, leaving a diagonal $\Omega$. In contrast, the principal components method is designed for large N and large T. In this setting, there is no need to assume a parametric form for serial correlations. The principal components method is a quick and effective approach to extracting common factors. Note that lagged dependent variables lead to bias, just as serial correlation does; see Section 7. Small T models can also be estimated by the quasi-differencing approach (Section 3.3). This latter approach also parametrizes serial correlation by including lagged dependent variables as regressors so that $\varepsilon_{it}$ has no serial correlation. The reason to parametrize serial correlation under quasi-differencing is that, with unrestricted serial correlation, lagged dependent variables will not be valid instruments.

REMARK 3: The estimation procedure can be modified to handle unbalanced data. The procedure is elaborated in the Supplemental Material; the details are omitted here.

3.3. Alternative Estimation Methods

While the analysis is focused on the method of least squares, we discuss several alternative estimation strategies.

METHOD 1: The quasi-differencing method in Holtz-Eakin, Newey, and Rosen (1988) can be adapted for multiple factors. Consider the case of two factors:

$$y_{it} = x_{it}'\beta + \lambda_{i1}f_{t1} + \lambda_{i2}f_{t2} + \varepsilon_{it}.$$

Multiplying the equation for $y_{i,t-1}$ by $\phi_t = f_{t1}/f_{t-1,1}$ and then subtracting it from the equation for $y_{it}$, we obtain

$$y_{it} = \phi_ty_{i,t-1} + x_{it}'\beta - x_{i,t-1}'\beta\,\phi_t + \lambda_{i2}\delta_t + \varepsilon_{it}^*,$$

where $\delta_t = f_{t2} - f_{t-1,2}\phi_t$ and $\varepsilon_{it}^* = \varepsilon_{it} - \phi_t\varepsilon_{i,t-1}$. The resulting model has a single factor. If we apply the quasi-differencing method one more time to the resulting equation, then the factor error will be eliminated. This approach was used by Ahn, Lee, and Schmidt (2006). The GMM method as in Holtz-Eakin, Newey, and Rosen (1988) and Ahn, Lee, and Schmidt (2001) can be used to estimate the model parameters consistently under some identification conditions. For the case of $r = 1$, GMM was also discussed by Arellano (2003) and Baltagi (2005). While not always necessary, there may be a need to recover the original model parameters; see Holtz-Eakin, Newey, and Rosen (1988) for details.
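The algebra of the quasi-differencing step can be verified numerically. The sketch below simulates the two-factor model with hypothetical values ($f_{t1}$ is kept away from zero so that $\phi_t$ is defined; all array names are illustrative) and checks that the quasi-differenced equation holds exactly, with the $\lambda_{i1}$ term gone.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, p = 20, 8, 2
beta = np.array([0.5, -1.0])
lam = rng.normal(size=(N, 2))                       # loadings (lambda_i1, lambda_i2)
f1 = 1.0 + rng.uniform(size=T)                      # f_t1 bounded away from zero, so phi_t exists
f2 = rng.normal(size=T)
x = rng.normal(size=(N, T, p))
eps = rng.normal(size=(N, T))
y = x @ beta + np.outer(lam[:, 0], f1) + np.outer(lam[:, 1], f2) + eps   # N x T

phi = f1[1:] / f1[:-1]                              # phi_t = f_t1 / f_{t-1,1}, for t = 2..T
delta = f2[1:] - f2[:-1] * phi                      # delta_t = f_t2 - f_{t-1,2} phi_t
eps_star = eps[:, 1:] - phi * eps[:, :-1]           # eps*_it = eps_it - phi_t eps_{i,t-1}

lhs = y[:, 1:]
rhs = (phi * y[:, :-1] + x[:, 1:] @ beta - (x[:, :-1] @ beta) * phi
       + np.outer(lam[:, 1], delta) + eps_star)     # no lambda_i1 term on the right
print("max discrepancy:", np.max(np.abs(lhs - rhs)))
```

The identity holds because $f_{t1} - \phi_tf_{t-1,1} = 0$ by construction; the price is a loss of one time period and a serially correlated composite error $\varepsilon_{it}^*$.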
This estimator is consistent under fixed T despite serial correlation and heteroskedasticity in $\varepsilon_{it}$. In contrast, the least squares estimator will be inconsistent under this setting. On the other hand, as T increases, due to the many-parameter and many-instrument problem, the GMM method tends to yield bias, a known issue from the existing literature (e.g., Newey and Smith (2004)). The least squares estimator is consistent under large N and large T with unknown forms of correlation and heteroskedasticity, and the bias is decreasing in N and T. Furthermore, the least squares method directly estimates all parameters, including the factor processes $F_t$ and the factor loadings $\lambda_i$, so there is no need to recover the original parameters. In many applications, the estimated factor processes are used as inputs for further analysis, for example, the diffusion index forecasting of Stock and Watson (2002) and the factor-augmented vector autoregression of Bernanke, Boivin, and Eliasz (2005). The estimated loadings are useful objects in finance; see Connor and Korajczyk (1986). Recovering those original parameters becomes more involved under multiple factors with the quasi-differencing approach. The least squares method is simple and effective in handling multiple factors. Furthermore, computation of the least squares method under large N and large T is quite fast.

In summary, with small T and with potential serial correlation and time series heteroskedasticity, the quasi-differencing method is recommended in view of its consistency properties. With large T, the least squares method is a viable alternative. Remark 2 suggests another alternative under small T.

METHOD 2: We can extend the argument of Mundlak (1978) and Chamberlain (1984) to models with interactive effects. When $\lambda_i$ is correlated with the regressors, it can be projected onto the regressors such that $\lambda_i = A\bar X_{i\cdot} + \eta_i$, where $\bar X_{i\cdot}$ is the time average of $X_{it}$ and $A$ is $r \times p$, so that model (1) can be rewritten as

$$Y_{it} = X_{it}'\beta + \bar X_{i\cdot}'\delta_t + \eta_i'F_t + \varepsilon_{it},$$

where $\delta_t = A'F_t$. The above model still has a factor error structure. However, when $F_t$ is assumed to be uncorrelated with the regressors, the aggregated error $\eta_i'F_t + \varepsilon_{it}$ is now uncorrelated with the regressors, so we can use a random-effects GLS to estimate $(\beta, \delta_1, \ldots, \delta_T)$. Similarly, when $F_t$ is correlated with the regressors but $\lambda_i$ is not, one can project $F_t$ onto the cross-sectional averages such that $F_t = B\bar X_{\cdot t} + \xi_t$ to obtain

$$Y_{it} = X_{it}'\beta + \bar X_{\cdot t}'\rho_i + \lambda_i'\xi_t + \varepsilon_{it}$$

with $\rho_i = B'\lambda_i$. Again, a random-effects GLS can be used. When both $\lambda_i$ and $F_t$ are correlated with the regressors, we apply both projections and augment the model with cross-products of $\bar X_{i\cdot}$ and $\bar X_{\cdot t}$, in addition to $\bar X_{i\cdot}$ and $\bar X_{\cdot t}$, so that, with $\rho_i = B'\eta_i$ and $\delta_t = A'\xi_t$,

$$Y_{it} = X_{it}'\beta + \bar X_{i\cdot}'\delta_t + \bar X_{\cdot t}'\rho_i + \bar X_{i\cdot}'C\bar X_{\cdot t} + \eta_i'\xi_t + \varepsilon_{it}, \tag{14}$$

where $C$ is a matrix. The above can be estimated by the random-effects GLS.
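Expanding the interactive effect under both projections shows where each term in (14) comes from. This worked step is added for exposition and is not part of the original derivation; in particular, it suggests $C = A'B$, a value the text leaves unspecified.

```latex
\begin{aligned}
\lambda_i'F_t &= (A\bar X_{i\cdot} + \eta_i)'(B\bar X_{\cdot t} + \xi_t) \\
&= \bar X_{i\cdot}'A'B\,\bar X_{\cdot t} + \bar X_{i\cdot}'A'\xi_t + \eta_i'B\,\bar X_{\cdot t} + \eta_i'\xi_t \\
&= \bar X_{i\cdot}'C\,\bar X_{\cdot t} + \bar X_{i\cdot}'\delta_t + \bar X_{\cdot t}'\rho_i + \eta_i'\xi_t,
\qquad C = A'B,\quad \delta_t = A'\xi_t,\quad \rho_i = B'\eta_i .
\end{aligned}
```

The third term uses the fact that the scalar $\eta_i'B\bar X_{\cdot t}$ equals its transpose $\bar X_{\cdot t}'B'\eta_i$.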

METHOD 3: The method of Pesaran (2006) augments the model with regressors $(\bar Y_{\cdot t}, \bar X_{\cdot t})$ under the assumption of $F_t$ being correlated with the regressors, where $\bar Y_{\cdot t}$ and $\bar X_{\cdot t}$ attempt to estimate $F_t$, similar to the projection argument of Mundlak. But in the Mundlak argument, the projection residual $\xi_t$ is assumed to have a fixed variance. In contrast, the variance of $\xi_t$ is assumed to converge to zero as $N \to \infty$ in Pesaran (2006), who assumed $X_{it}$ is of the form $X_{it} = B_iF_t + e_{it}$, so that $\bar X_{\cdot t} = BF_t + \xi_t$ with $B = N^{-1}\sum_{i=1}^{N} B_i$ and $\xi_t = N^{-1}\sum_{i=1}^{N} e_{it}$. The variance of $\xi_t$ is $O(N^{-1})$. Thus the factor error $\lambda_i'\xi_t$ is negligible under large N. He established $\sqrt{N}$ consistency and possible $\sqrt{NT}$ consistency for some special cases. It appears that when $\lambda_i$ is correlated with the regressors, additional regressors $\bar Y_{i\cdot}$ and $\bar X_{i\cdot}$ should also be added to achieve consistency.
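A one-line bound shows why the variance of $\xi_t$ is of order $N^{-1}$. The sketch treats $e_{it}$ as a scalar for simplicity and assumes weak cross-sectional dependence in the sense that $N^{-1}\sum_{i,j}|\operatorname{cov}(e_{it}, e_{jt})| \le M$; this condition is an illustrative assumption, not a quotation of Pesaran's (2006) conditions.

```latex
\operatorname{Var}(\xi_t)
= \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\operatorname{cov}(e_{it}, e_{jt})
\le \frac{1}{N}\cdot\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}\bigl|\operatorname{cov}(e_{it}, e_{jt})\bigr|
\le \frac{M}{N} = O(N^{-1}).
```

Under independence across $i$ with $\operatorname{Var}(e_{it}) \le M$, only the diagonal terms survive and the same bound follows directly.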

4. ASSUMPTIONS
In this section, we state the assumptions needed for consistent estimation and explain the meaning of each assumption before or after its introduction. Throughout, for a vector or matrix $A$, its norm is defined as $\|A\| = (\operatorname{tr}(A'A))^{1/2}$.

The $p \times p$ matrix

$$D(F) = \frac{1}{NT}\sum_{i=1}^{N} X_i'M_FX_i - \frac{1}{T}\frac{1}{N^2}\sum_{i=1}^{N}\sum_{k=1}^{N} X_i'M_FX_k\,a_{ik},$$

where $a_{ik} = \lambda_i'(\Lambda'\Lambda/N)^{-1}\lambda_k$, plays an important role in this paper. Note that $a_{ik} = a_{ki}$, since $a_{ik}$ is a scalar and $(\Lambda'\Lambda/N)^{-1}$ is symmetric. The identifying condition for $\beta$ is that $D(F)$ is positive definite. If $F$ were observable, the identification condition for $\beta$ would be that the first term of $D(F)$ on the right-hand side is positive definite. The presence of the second term is due to the unobservability of $F$ and $\Lambda$. The reason for this particular form is the nonlinearity of the interactive effects.
Define a $T \times p$ matrix

$$Z_i = M_FX_i - \frac{1}{N}\sum_{k=1}^{N} M_FX_k\,a_{ik},$$

so that $Z_i$ is equal to the deviation of $M_FX_i$ from its mean, but here the mean is a weighted average. Write $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{iT})'$. Then

$$D(F) = \frac{1}{NT}\sum_{i=1}^{N} Z_i'Z_i = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{T} Z_{it}Z_{it}'.$$

The first equality follows from $a_{ik} = a_{ki}$ and $N^{-1}\sum_{i=1}^{N} a_{ik}a_{ij} = a_{kj}$, and the second equality is by definition. Thus $D(F)$ is at least positive semidefinite. Since each $Z_{it}Z_{it}'$ is a positive semidefinite matrix of rank at most one, a sum of $NT$ such matrices should be positive definite, given enough variation in $Z_{it}$ over $i$ and $t$. Our first condition assumes $D(F)$ is positive definite in the limit. Suppose that as $N, T \to \infty$, $D(F) \xrightarrow{p} D > 0$. If $\varepsilon_{it}$ are i.i.d. $(0, \sigma^2)$, then the limiting distribution of $\hat\beta$ can be shown to be

$$\sqrt{NT}(\hat\beta - \beta) \to N(0, \sigma^2D^{-1}).$$

This shows the need for $D(F)$ to be positive definite.
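The two expressions for $D(F)$ can be compared numerically. The sketch below (hypothetical function names; simulated $X$, $F$, and $\Lambda$) computes $D(F)$ from its definition and from the $Z_i$ representation. One way to see why they agree: $a_{ik}/N = \lambda_i'(\Lambda'\Lambda)^{-1}\lambda_k$ are the entries of the projection matrix $\Lambda(\Lambda'\Lambda)^{-1}\Lambda'$, so $Z_i$ is $M_FX_i$ with the part explained by the loadings projected out across units.

```python
import numpy as np

def D_direct(X, F, Lam):
    """D(F) from its definition; X is T x N x p, F is T x r with F'F/T = I, Lam is N x r."""
    T, N, p = X.shape
    M = np.eye(T) - F @ F.T / T
    a = Lam @ np.linalg.inv(Lam.T @ Lam / N) @ Lam.T          # a[i, k] = a_ik
    MX = np.einsum("ts,sip->tip", M, X)                       # M_F X_i for every unit i
    first = np.einsum("tip,tiq->pq", X, MX) / (N * T)         # (1/NT) sum_i X_i'M_F X_i
    second = np.einsum("tip,ik,tkq->pq", X, a, MX) / (T * N**2)
    return first - second

def D_via_Z(X, F, Lam):
    """D(F) = (1/NT) sum_i Z_i'Z_i with Z_i = M_F X_i - (1/N) sum_k M_F X_k a_ik."""
    T, N, p = X.shape
    M = np.eye(T) - F @ F.T / T
    a = Lam @ np.linalg.inv(Lam.T @ Lam / N) @ Lam.T
    MX = np.einsum("ts,sip->tip", M, X)
    Z = MX - np.einsum("ik,tkp->tip", a, MX) / N
    return np.einsum("tip,tiq->pq", Z, Z) / (N * T)

rng = np.random.default_rng(6)
T, N, p, r = 25, 40, 3, 2
X = rng.normal(size=(T, N, p))
Q, _ = np.linalg.qr(rng.normal(size=(T, r)))
F = np.sqrt(T) * Q                                            # F'F/T = I_r
Lam = rng.normal(size=(N, r))
D1, D2 = D_direct(X, F, Lam), D_via_Z(X, F, Lam)
print("smallest eigenvalue of D(F):", np.linalg.eigvalsh(D1).min())
```

With regressors that vary freely over $i$ and $t$, as here, $D(F)$ is comfortably positive definite; the next paragraph shows how particular regressor structures drive it to zero.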

Since $F$ is to be estimated, the identification condition for $\beta$ is assumed as follows:

ASSUMPTION A: $E\|X_{it}\|^4 \le M$. Let $\mathcal{F} = \{F : F'F/T = I\}$. We assume

$$\inf_{F \in \mathcal{F}} D(F) > 0.$$

The matrix $F$ in this assumption is $T \times r$, either deterministic or random. This assumption rules out time-invariant regressors and common regressors. Suppose $X_i = x_i\iota_T$, where $x_i$ is a scalar and $\iota_T = (1, 1, \dots, 1)'$. Since $\iota_T \in \mathcal{F}$ and $D(\iota_T) = 0$, it follows that $\inf_F D(F) = 0$. A common regressor does not vary with $i$. Suppose all regressors are common, so that $X_i = W$. For $F = W(W'W/T)^{-1/2} \in \mathcal{F}$, $D(F) = 0$. The analysis of time-invariant regressors and common regressors is postponed to Section 10, where we show that it is sufficient to have $D(F) > 0$ when evaluated at the true factor process $F$. For now, it is not difficult to show that if $X_{it}$ is characterized by (3), where the $\eta_{it}$ have sufficient variation (e.g., i.i.d. with positive variance), then Assumption A is satisfied.
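The two counterexamples to Assumption A can be checked directly. Every $Z_i$ is a linear combination of the $M_F X_k$, so showing $M_F X_i = 0$ for all $i$ is enough to conclude $D(F) = 0$. The sketch below assumes a single scalar regressor and $r = 1$; the helper name `M` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 40, 25

def M(F):
    """Annihilator M_F = I - F (F'F)^{-1} F'."""
    return np.eye(F.shape[0]) - F @ np.linalg.solve(F.T @ F, F.T)

# Time-invariant regressor: X_i = x_i * iota_T, and F = iota_T satisfies F'F/T = 1.
iota = np.ones((T, 1))
x = rng.standard_normal(N)
X_tinv = [x_i * iota for x_i in x]
print(max(np.abs(M(iota) @ X_i).max() for X_i in X_tinv))   # numerically zero

# Common regressor: X_i = W for all i, and F = W (W'W/T)^{-1/2} satisfies F'F/T = 1.
W = rng.standard_normal((T, 1))
F_w = W / np.sqrt(W.T @ W / T)
print(np.abs(M(F_w) @ W).max())                               # numerically zero
```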

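ASSUMPTION B:
(i) $E\|F_t\|^4 \le M$ and $\frac{1}{T}\sum_{t=1}^{T} F_tF_t' \xrightarrow{p} \Sigma_F > 0$ for some $r \times r$ matrix $\Sigma_F$, as $T \to \infty$.
(ii) $E\|\lambda_i\|^4 \le M$ and $\Lambda'\Lambda/N \xrightarrow{p} \Sigma_\Lambda > 0$ for some $r \times r$ matrix $\Sigma_\Lambda$, as $N \to \infty$.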

This assumption implies the existence of $r$ factors. Note that whether $F_t$ or $\lambda_i$ has zero mean is of no issue, since they are treated as parameters to be estimated; for example, $F_t$ can be a linear trend ($F_t = t/T$). But if $F_t$ is known to be a linear trend, imposing this fact gives more efficient estimation. Moreover, $F_t$ itself can be a dynamic process such that $F_t = \sum_{i=1}^{\infty} C_i e_{t-i}$, where the $e_t$ are an i.i.d. zero-mean process. Similarly, $\lambda_i$ can be cross-sectionally correlated.

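ASSUMPTION C (Serial and Cross-Sectional Weak Dependence and Heteroskedasticity):

$$\frac{1}{N}\sum_{i,j=1}^{N}\bar\sigma_{ij} \le M, \qquad \frac{1}{T}\sum_{t,s=1}^{T}\tau_{ts} \le M, \qquad \frac{1}{NT}\sum_{i,j,t,s=1}|\sigma_{ij,ts}| \le M.$$

The largest eigenvalue of $\Omega_i = E(\varepsilon_i\varepsilon_i')$ ($T \times T$) is bounded uniformly in $i$ and $T$.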

Assumption C is about weak serial and cross-sectional correlation. Heteroskedasticity is allowed, but $\varepsilon_{it}$ is assumed to have a uniformly bounded eighth moment. The first three conditions are relatively easy to understand and are assumed in Bai (2003). We explain the meaning of C(iv). Let $\eta_i = (T^{-1/2}\sum_{t=1}^{T}\varepsilon_{it})^2 - E(T^{-1/2}\sum_{t=1}^{T}\varepsilon_{it})^2$. Then $E(\eta_i) = 0$ and $E(\eta_i^2)$ is bounded. The expected value of $(N^{-1/2}\sum_{i=1}^{N}\eta_i)^2$ is equal to $T^{-2}N^{-1}\sum_{t,s,u,v}\sum_{i,j}\operatorname{cov}(\varepsilon_{it}\varepsilon_{is}, \varepsilon_{ju}\varepsilon_{jv})$, that is, the left-hand side of the first inequality without the absolute value sign. So the first part of C(iv) is slightly stronger than the assumption that the second moment of $N^{-1/2}\sum_{i=1}^{N}\eta_i$ is bounded. The meaning of the second part is similar. It can be easily shown that if the $\varepsilon_{it}$ are independent over $i$ and $t$ with $E\varepsilon_{it}^4 \le M$ for all $i$ and $t$, then C(iv) is true. If the $\varepsilon_{it}$ are i.i.d. with zero mean and $E\varepsilon_{it}^8 \le M$, then Assumption C holds.

ASSUMPTION D: $\varepsilon_{it}$ is independent of $X_{js}$, $\lambda_j$, and $F_s$ for all $i$, $t$, $j$, and $s$.

This assumption rules out dynamic panel data models and is given for the purpose of simplifying the proofs. The procedure works well even with lagged dependent variables; see Table V in the Supplemental Material. We do allow $X_{it}$, $F_t$, and $\varepsilon_{it}$ to be dynamic processes. If lagged dependent variables are included in $X_{it}$, then $\varepsilon_{it}$ cannot be serially correlated. Also note that $X_{it}$, $\lambda_i$, and $\varepsilon_{it}$ are allowed to be cross-sectionally correlated.

5. LIMITING THEORY
We use $(\beta^0, F^0)$ to denote the true parameters, and we still use $\lambda_i$ without the superscript 0, as it is not directly estimated and thus not necessary. Here $F^0$ denotes the true data generating process for $F$ that satisfies Assumption B. This $F^0$ in general has economic interpretations (e.g., supply shocks and demand shocks). The estimator $\hat F$ below estimates a rotation of $F^0$.⁶ Define $S_{NT}(\beta, F)$ as the concentrated objective function in (13) divided by $NT$, together with centering, that is,

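$$S_{NT}(\beta, F) = \frac{1}{NT}\sum_{i=1}^{N}(Y_i - X_i\beta)' M_F (Y_i - X_i\beta) - \frac{1}{NT}\sum_{i=1}^{N}\varepsilon_i' M_{F^0}\varepsilon_i.$$

The second term does not depend on $\beta$ and $F$, and is for the purpose of centering, where $M_F = I - P_F = I - FF'/T$ with $F'F/T = I$. We estimate $\beta^0$ and $F^0$ by

$$(\hat\beta, \hat F) = \arg\min_{\beta, F} S_{NT}(\beta, F).$$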

As explained in the previous section, $(\hat\beta, \hat F)$ satisfies

$$\hat\beta = \Big(\sum_{i=1}^{N} X_i' M_{\hat F} X_i\Big)^{-1} \sum_{i=1}^{N} X_i' M_{\hat F} Y_i,$$

$$\Big[\frac{1}{NT}\sum_{i=1}^{N}(Y_i - X_i\hat\beta)(Y_i - X_i\hat\beta)'\Big]\hat F = \hat F V_{NT},$$

where $\hat F$ is the matrix that consists of the first $r$ eigenvectors (multiplied by $\sqrt{T}$) of the matrix $\frac{1}{NT}\sum_{i=1}^{N}(Y_i - X_i\hat\beta)(Y_i - X_i\hat\beta)'$, and $V_{NT}$ is a diagonal matrix that consists of the $r$ largest eigenvalues of this matrix. Denote $P_A = A(A'A)^{-1}A'$ for a matrix $A$.

⁶If (6) and (7) hold for the data generating processes (i.e., $F^{0\prime}F^0/T = I$ and $\Lambda^{0\prime}\Lambda^0$ is diagonal) rather than being viewed as estimation restrictions, then $\hat F$ estimates $F^0$ itself instead of a rotation of $F^0$.
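The two first-order conditions suggest alternating between them until $\hat\beta$ stops changing. The sketch below is one way to do this in NumPy, under assumptions not stated in the text: $r$ is known, the starting value is pooled OLS that ignores the factors, and the tolerance and iteration cap are arbitrary choices. The function name `interactive_effects` is hypothetical. The data are stored as `Y` of shape $(N, T)$ and `X` of shape $(N, T, p)$.

```python
import numpy as np

def interactive_effects(Y, X, r, tol=1e-8, max_iter=5000):
    """Alternate between the beta_hat formula and the eigenvector condition for F_hat.

    Y: (N, T) outcomes; X: (N, T, p) regressors; r: number of factors (taken as known).
    Returns beta_hat (p,), F_hat (T, r) with F'F/T = I_r, and Lam_hat (N, r).
    """
    N, T, p = X.shape

    def factors(beta):
        W = Y - X @ beta                              # rows are (Y_i - X_i beta)'
        S = W.T @ W / (N * T)                         # (1/NT) sum_i (Y_i - X_i b)(Y_i - X_i b)'
        _, vecs = np.linalg.eigh(S)                   # eigenvalues in ascending order
        return np.sqrt(T) * vecs[:, ::-1][:, :r], W   # r leading eigenvectors times sqrt(T)

    # Assumed starting value: pooled OLS that ignores the factors.
    beta = np.linalg.lstsq(X.reshape(N * T, p), Y.reshape(N * T), rcond=None)[0]
    for _ in range(max_iter):
        F, _ = factors(beta)
        FX = np.einsum("tr,itp->irp", F, X)           # F'X_i for each i
        MX = X - np.einsum("tr,irp->itp", F, FX) / T  # M_F X_i, using M_F = I - FF'/T
        lhs = np.einsum("itp,itq->pq", MX, MX)        # sum_i X_i' M_F X_i
        rhs = np.einsum("itp,it->p", MX, Y)           # sum_i X_i' M_F Y_i
        beta_new = np.linalg.solve(lhs, rhs)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    F, W = factors(beta)
    Lam = W @ F / T                                   # lambda_i = F'(Y_i - X_i beta) / T
    return beta, F, Lam
```

Because $\hat F$ is only identified up to rotation, a sensible diagnostic is the distance between the projections $P_{\hat F}$ and $P_{F^0}$ in simulated data, in line with Proposition 1(ii), rather than between $\hat F$ and $F^0$ themselves.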

PROPOSITION 1 (Consistency): Under Assumptions A–D, as $N, T \to \infty$, the following statements hold:
(i) The estimator $\hat\beta$ is consistent: $\hat\beta - \beta^0 \xrightarrow{p} 0$.
(ii) The matrix $F^{0\prime}\hat F/T$ is invertible and $\|P_{\hat F} - P_{F^0}\| \xrightarrow{p} 0$.

The usual argument for the consistency of extremum estimators would involve showing $S_{NT}(\beta, F) \xrightarrow{p} S(\beta, F)$ uniformly on some bounded set of $\beta$ and $F$, and then showing that $S(\beta, F)$ has a unique minimum at $\beta^0$ and $F^0$; see Newey and McFadden (1994). This argument needs to be modified to take into account the growing dimension of $F$. As $F$ is a $T \times r$ matrix, the limit $S$ would involve an infinite number of parameters as $N$ and $T$ go to infinity, so the limit as a function of $F$ is not well defined. Furthermore, the concept of a bounded $F$ is not well defined either. In this paper, we only require $F'F/T = I$. The modification is similar to Bai (1994), where the parameter space (the break point) increases with the sample size. We show there exists a function $\tilde S_{NT}(\beta, F)$, depending on $(N, T)$ and generally still a random function, such that $\tilde S_{NT}(\beta, F)$ has a unique minimum at $\beta^0$ and $F^0$. In addition, we show the difference is uniformly small:

$$S_{NT}(\beta, F) - \tilde S_{NT}(\beta, F) = o_p(1),$$

where $o_p(1)$ is uniform. This implies the consistency of $\hat\beta$ for $\beta^0$. However, we cannot claim the consistency of $\hat F$ for $F^0$ (or a rotation of $F^0$) owing to its growing dimension. Part (ii) claims that the spaces spanned by $\hat F$ and $F^0$ are asymptotically the same. Alternative consistency concepts, including componentwise consistency and average norm consistency, are provided in the Appendix, as these concepts are also needed.
Given consistency, we can further establish the rate of convergence.

THEOREM 1 (Rate of Convergence): Assume Assumptions A–D hold. For comparable $N$ and $T$ such that $T/N \to \rho > 0$, $\sqrt{NT}(\hat\beta - \beta^0) = O_p(1)$.

The theorem allows cross-section and serial correlations, as well as heteroskedasticities in both dimensions. This is important for applications in macroeconomics (say, cross-country studies) or in finance, where the factors may not fully capture the cross-section correlations, and therefore the approximate factor model of Chamberlain and Rothschild (1983) is relevant. For microeconomic data, cross-section heteroskedasticity is likely to be present.

Although the estimator is $\sqrt{NT}$ consistent, the underlying limiting distribution will not be centered at zero; asymptotic biases exist. The next two theorems provide the limiting behavior of the estimator. The first theorem deals with some special cases in which asymptotic bias is absent. This is obtained by requiring stronger assumptions: the absence of correlation and heteroskedasticity in either the cross-section or the time dimension. The second theorem deals with the most general case, which allows for correlation and heteroskedasticity in both dimensions.
Introduce

$$Z_i = M_{F^0}X_i - \frac{1}{N}\sum_{k=1}^{N} a_{ik} M_{F^0}X_k.$$

Then, in the absence of correlation and heteroskedasticity in one of the dimensions, and given an appropriate relative rate for $T$ and $N$, it is shown in the Appendix that the estimator has the representation

$$\sqrt{NT}(\hat\beta - \beta^0) = \Big(\frac{1}{NT}\sum_{i=1}^{N} Z_i'Z_i\Big)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N} Z_i'\varepsilon_i + o_p(1). \tag{15}$$

If correlation and heteroskedasticity are present in both dimensions, there will be an $O_p(1)$ bias term in the above representation; see (21) in Section 7. In all cases, we need the central limit theorem for $(NT)^{-1/2}\sum_{i=1}^{N} Z_i'\varepsilon_i = (NT)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} Z_{it}\varepsilon_{it}$. Assuming correlation and heteroskedasticity in both dimensions, its variance is given by

$$\operatorname{var}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N} Z_i'\varepsilon_i\Big) = \frac{1}{NT}\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T}\sum_{s=1}^{T}\sigma_{ij,ts}\,E(Z_{it}Z_{js}'),$$

where $\sigma_{ij,ts} = E(\varepsilon_{it}\varepsilon_{js})$. This variance is $O(1)$ because $\frac{1}{NT}\sum_{i,j,t,s}|\sigma_{ij,ts}| \le M$ by assumption.

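ASSUMPTION E: For some nonrandom positive definite matrix $D_Z$,

$$\operatorname{plim}\frac{1}{NT}\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T}\sum_{s=1}^{T}\sigma_{ij,ts}Z_{it}Z_{js}' = D_Z, \tag{16}$$

$$\frac{1}{\sqrt{NT}}\sum_{i=1}^{N} Z_i'\varepsilon_i \xrightarrow{d} N(0, D_Z).$$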

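In the absence of serial correlation and heteroskedasticity, we let $\sigma_{ij} = \sigma_{ij,tt} = E(\varepsilon_{it}\varepsilon_{jt})$, since it does not depend on $t$, and we denote $D_Z$ by $D_1$. Likewise, with no cross-section correlation and heteroskedasticity, we let $\omega_{ts} = \sigma_{ii,ts} = E(\varepsilon_{it}\varepsilon_{is})$, since it does not depend on $i$, and we denote $D_Z$ by $D_2$. That is, $D_1$ and $D_2$ are the probability limits

$$\operatorname{plim}\frac{1}{NT}\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T}\sigma_{ij}Z_{it}Z_{jt}' = D_1, \qquad \operatorname{plim}\frac{1}{NT}\sum_{t=1}^{T}\sum_{s=1}^{T}\sum_{i=1}^{N}\omega_{ts}Z_{it}Z_{is}' = D_2. \tag{17}$$

The corresponding central limit theorems will be denoted by $\frac{1}{\sqrt{NT}}\sum_{i=1}^{N} Z_i'\varepsilon_i \xrightarrow{d} N(0, D_1)$ and $\frac{1}{\sqrt{NT}}\sum_{i=1}^{N} Z_i'\varepsilon_i \xrightarrow{d} N(0, D_2)$, respectively.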

THEOREM 2: Assume Assumptions A–E hold. As $T, N \to \infty$, the following statements hold:
(i) In the absence of serial correlation and heteroskedasticity and with $T/N \to 0$,

$$\sqrt{NT}(\hat\beta - \beta^0) \xrightarrow{d} N(0, D_0^{-1}D_1D_0^{-1}).$$

(ii) In the absence of cross-section correlation and heteroskedasticity and with $N/T \to 0$,

$$\sqrt{NT}(\hat\beta - \beta^0) \xrightarrow{d} N(0, D_0^{-1}D_2D_0^{-1}),$$

where $D_0 = \operatorname{plim} D(F^0) = \operatorname{plim}\frac{1}{NT}\sum_{i=1}^{N} Z_i'Z_i$.

Noting that $D_1 = D_2 = \sigma^2 D_0$ under the i.i.d. assumption on $\varepsilon_{it}$, the following statement holds:

COROLLARY 1: Under the assumptions of Theorem 1, if the $\varepsilon_{it}$ are i.i.d. over $t$ and $i$, with zero mean and variance $\sigma^2$, then $\sqrt{NT}(\hat\beta - \beta^0) \xrightarrow{d} N(0, \sigma^2 D_0^{-1})$.

It is conjectured that $\hat\beta$ is asymptotically efficient if the $\varepsilon_{it}$ are i.i.d. $N(0, \sigma^2)$, based on the argument of Hahn and Kuersteiner (2002).

Part (i) of Theorem 2 still permits cross-section correlation and heteroskedasticity, and part (ii) still permits serial correlation and heteroskedasticity. The theorem also requires an appropriate rate for $N$ and $T$. If $T/N$ converges to a positive constant, there will be a bias term due to correlation and heteroskedasticity. The next theorem is concerned with this bias. We shall deal with the more general case in which correlation and heteroskedasticity exist in both dimensions.

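THEOREM 3: Assume Assumptions A–E hold and $T/N \to \rho > 0$. Then

$$\sqrt{NT}(\hat\beta - \beta^0) \xrightarrow{d} N\big(\rho^{1/2}B_0 + \rho^{-1/2}C_0,\; D_0^{-1}D_ZD_0^{-1}\big),$$

where $B_0$ is the probability limit of $B$ with

$$B = -D(F^0)^{-1}\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{N}\frac{(X_i - V_i)'F^0}{T}\Big(\frac{F^{0\prime}F^0}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_k\,\frac{1}{T}\sum_{t=1}^{T}\sigma_{ik,tt}, \tag{18}$$

and $C_0$ is the probability limit of $C$ with

$$C = -D(F^0)^{-1}\frac{1}{NT}\sum_{i=1}^{N} X_i'M_{F^0}\Omega F^0\Big(\frac{F^{0\prime}F^0}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i, \tag{19}$$

and $V_i = \frac{1}{N}\sum_{j=1}^{N} a_{ij}X_j$, $a_{ij} = \lambda_i'(\Lambda'\Lambda/N)^{-1}\lambda_j$, and $\Omega = \frac{1}{N}\sum_{k=1}^{N}\Omega_k$ with $\Omega_k = E(\varepsilon_k\varepsilon_k')$.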

There will be no biases in the absence of correlations and heteroskedasticities. In particular, bias $B_0 = 0$ when cross-sectional correlation and heteroskedasticity are absent, and similarly $C_0 = 0$ when serial correlation and heteroskedasticity are absent. To see this, consider $C$ in (19). The absence of serial correlation and heteroskedasticity implies $\Omega_k = \sigma_k^2 I_T$; thus $M_{F^0}\Omega F^0 = (\frac{1}{N}\sum_k\sigma_k^2)M_{F^0}F^0 = 0$. It follows that $C = 0$ and hence $C_0 = 0$. The parametric form of serial correlation is usually removed by adding lagged dependent variables. However, lagged dependent variables lead to bias with fixed-effect estimators; see Hahn and Kuersteiner (2002) in a different context. The bias will take a different form and is not studied here. The argument for $B = 0$ is not so obvious and is provided in the proof of Theorem 2(ii). When the $\varepsilon_{it}$ are i.i.d. over $t$ and over $i$, both $B_0$ and $C_0$ become zero, and the result specializes to Corollary 1. These results assume no lagged dependent variables, whose presence will lead to additional bias, which is not studied in this paper.

REMARK 4: Suppose that $k$ factors are allowed in the estimation, with $k$ fixed but $k \ge r$. Then $\hat\beta$ remains $\sqrt{NT}$ consistent, albeit less efficient than with $k = r$. Consistency relies on controlling the space spanned by $\Lambda$ and that spanned by $F$, which is achieved when $k \ge r$.
REMARK 5: Due to the $\sqrt{NT}$ consistency of $\hat\beta$, estimation of $\beta$ does not affect the rates of convergence and the limiting distributions of $\hat F_t$ and $\hat\lambda_i$. That is, they are the same as those of the pure factor model of Bai (2003). This follows from $Y_{it} - X_{it}'\hat\beta = \lambda_i'F_t + \varepsilon_{it} + X_{it}'(\beta - \hat\beta)$, which is a pure factor model with an added error $X_{it}'(\beta - \hat\beta) = (NT)^{-1/2}O_p(1)$. An error of this order of magnitude does not affect the analysis.

6. INTERPRETATIONS OF THE ESTIMATOR

The Meaning of D(F) and the Within-Group Interpretation
Like the least squares dummy-variable (LSDV) estimator, the interactive-effects estimator $\hat\beta$ is a result of least squares with the effects being estimated. In this sense, it is a within estimator. It is more instructive, however, to compare the mathematical expressions of the two estimators. Write the additive-effects model (2) in matrix form:

$$Y = \beta_1X^1 + \beta_2X^2 + \cdots + \beta_pX^p + \iota_T\alpha' + \xi\iota_N' + \varepsilon, \tag{20}$$

where $Y$ and $X^k$ ($k = 1, 2, \dots, p$) are $T \times N$ matrices, with $X^k$ being the regressor matrix associated with parameter $\beta_k$ (a scalar); $\iota_T$ is a $T \times 1$ vector with all elements equal to 1, and similarly for $\iota_N$; $\alpha = (\alpha_1, \dots, \alpha_N)'$ and $\xi = (\xi_1, \dots, \xi_T)'$. Define

$$M_T = I_T - \iota_T\iota_T'/T, \qquad M_N = I_N - \iota_N\iota_N'/N.$$

Multiplying equation (20) by $M_T$ from the left and by $M_N$ from the right yields

$$M_TYM_N = \beta_1(M_TX^1M_N) + \cdots + \beta_p(M_TX^pM_N) + M_T\varepsilon M_N.$$

The least squares dummy-variable estimator is simply least squares applied to the above transformed variables. The interactive-effects estimator has a similar interpretation. Rewrite the interactive-effects model (5) as

$$Y = \beta_1X^1 + \cdots + \beta_pX^p + F\Lambda' + \varepsilon.$$

Then left multiply by $M_F$ and right multiply by $M_\Lambda$ to obtain

$$M_FYM_\Lambda = \beta_1(M_FX^1M_\Lambda) + \cdots + \beta_p(M_FX^pM_\Lambda) + M_F\varepsilon M_\Lambda.$$

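Let $\hat\beta_{Asy}$ be the least squares estimator obtained from the above transformed variables, treating $F$ and $\Lambda$ as known. That is,

$$\hat\beta_{Asy} = \begin{bmatrix}\operatorname{tr}[M_\Lambda X^{1\prime}M_FX^1] & \cdots & \operatorname{tr}[M_\Lambda X^{1\prime}M_FX^p]\\ \vdots & & \vdots\\ \operatorname{tr}[M_\Lambda X^{p\prime}M_FX^1] & \cdots & \operatorname{tr}[M_\Lambda X^{p\prime}M_FX^p]\end{bmatrix}^{-1}\begin{bmatrix}\operatorname{tr}[M_\Lambda X^{1\prime}M_FY]\\ \vdots\\ \operatorname{tr}[M_\Lambda X^{p\prime}M_FY]\end{bmatrix}.$$

The square matrix being inverted is equal to $D(F)$ up to a scaling constant, that is,

$$D(F) = \frac{1}{TN}\sum_{i=1}^{N} Z_i'Z_i = \frac{1}{TN}\begin{bmatrix}\operatorname{tr}[M_\Lambda X^{1\prime}M_FX^1] & \cdots & \operatorname{tr}[M_\Lambda X^{1\prime}M_FX^p]\\ \vdots & & \vdots\\ \operatorname{tr}[M_\Lambda X^{p\prime}M_FX^1] & \cdots & \operatorname{tr}[M_\Lambda X^{p\prime}M_FX^p]\end{bmatrix}.$$

To verify this, note that $a_{ik}/N$ is the $(i, k)$ element of $P_\Lambda = \Lambda(\Lambda'\Lambda)^{-1}\Lambda'$, so the $k$th column of $Z_i$ is the $i$th column of $M_FX^kM_\Lambda$; summing $Z_i'Z_i$ over $i$ therefore gives the trace expressions. The estimator $\hat\beta_{Asy}$ can be rewritten as

$$\hat\beta_{Asy} = \Big(\sum_{i=1}^{N} Z_i'Z_i\Big)^{-1}\sum_{i=1}^{N} Z_i'Y_i.$$

It follows from (15) that $\sqrt{NT}(\hat\beta - \beta) = \sqrt{NT}(\hat\beta_{Asy} - \beta) + o_p(1)$. To purge the fixed effects, the LSDV estimator uses $M_T$ and $M_N$ to transform the variables, whereas the interactive-effects estimator uses $M_F$ and $M_\Lambda$.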
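The equivalence between the trace formulas and the $Z_i$ representation of $\hat\beta_{Asy}$ can be confirmed numerically. This is a sketch with hypothetical function names; $F$ and $\Lambda$ are treated as known, $Y$ and each $X^k$ are stored as $T \times N$ arrays as in (20), and $Z_i$ is built from the weights $a_{ik}$ rather than from $M_FX^kM_\Lambda$, so the comparison is a genuine check.

```python
import numpy as np

def beta_asy_trace(Y, Xs, F, Lam):
    """beta_Asy from the trace formulas. Y: (T, N); Xs: list of p arrays (T, N); F: (T, r); Lam: (N, r)."""
    T, N = Y.shape
    MF = np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)
    ML = np.eye(N) - Lam @ np.linalg.solve(Lam.T @ Lam, Lam.T)
    G = np.array([[np.trace(ML @ Xk.T @ MF @ Xl) for Xl in Xs] for Xk in Xs])
    g = np.array([np.trace(ML @ Xk.T @ MF @ Y) for Xk in Xs])
    return np.linalg.solve(G, g), G / (T * N)

def beta_asy_Z(Y, Xs, F, Lam):
    """Same estimator as (sum_i Z_i'Z_i)^{-1} sum_i Z_i'Y_i, with Z_i built from a_ik."""
    T, N = Y.shape
    X = np.stack([Xk.T for Xk in Xs], axis=2)            # X[i] is the T x p matrix X_i
    MF = np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)
    A = Lam @ np.linalg.solve(Lam.T @ Lam / N, Lam.T)    # a_ik
    MX = np.einsum("ts,isp->itp", MF, X)
    Z = MX - np.einsum("ik,ktp->itp", A, MX) / N
    ZZ = np.einsum("itp,itq->pq", Z, Z)
    ZY = np.einsum("itp,ti->p", Z, Y)                    # Y_i is column i of Y
    return np.linalg.solve(ZZ, ZY), ZZ / (T * N)
```

With noise-free data $Y = \sum_k\beta_kX^k + F\Lambda'$, both functions return $\beta$ exactly, because $M_FF\Lambda'M_\Lambda = 0$.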

7. BIAS-CORRECTED ESTIMATOR
The interactive-effects estimator is shown to have the representation (see Proposition A.3 in the Appendix)

$$\sqrt{NT}(\hat\beta - \beta^0) = D(F^0)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N} Z_i'\varepsilon_i + \Big(\frac{T}{N}\Big)^{1/2}B + \Big(\frac{N}{T}\Big)^{1/2}C + o_p(1), \tag{21}$$

where $B$ and $C$ are given by (18) and (19), respectively, and they give rise to the biases. Their presence arises from correlations and heteroskedasticities in $\varepsilon_{it}$. We show that $B$ and $C$ can be consistently estimated so that a bias-corrected estimator can be constructed, as in the framework of Hahn and Kuersteiner (2002) and Hahn and Newey (2004). Attention is paid to heteroskedasticities in both dimensions, assuming no correlation in either dimension to simplify the presentation. We do point out how to estimate the biases consistently and outline the idea of the proof when correlation exists in either dimension.

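Under the assumption that $E(\varepsilon_{it}^2) = \sigma_{it}^2$ and $E(\varepsilon_{it}\varepsilon_{js}) = 0$ for $i \ne j$ or $t \ne s$, term $B$ becomes

$$B = -D(F^0)^{-1}\frac{1}{N}\sum_{i=1}^{N}\frac{(X_i - V_i)'F^0}{T}\Big(\frac{F^{0\prime}F^0}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i\,\bar\sigma_i^2, \tag{22}$$

where $\bar\sigma_i^2 = \frac{1}{T}\sum_{t=1}^{T}\sigma_{it}^2$. The bias can be estimated by replacing $F^0$ by $\hat F$, $\lambda_i$ by $\hat\lambda_i$, and $\bar\sigma_i^2$ by $\hat{\bar\sigma}_i^2 = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_{it}^2$. This gives, in view of $\hat F'\hat F/T = I_r$,

$$\hat B = -\hat D_0^{-1}\frac{1}{N}\sum_{i=1}^{N}\frac{(X_i - \hat V_i)'\hat F}{T}\Big(\frac{\hat\Lambda'\hat\Lambda}{N}\Big)^{-1}\hat\lambda_i\,\hat{\bar\sigma}_i^2. \tag{23}$$

The expression $C$ is still given by (19), but $\Omega$ now becomes a diagonal matrix under no correlation, that is, $\Omega = \operatorname{diag}(\frac{1}{N}\sum_{k=1}^{N}\sigma_{k1}^2, \dots, \frac{1}{N}\sum_{k=1}^{N}\sigma_{kT}^2)$. Let $\hat\Omega = \operatorname{diag}(\frac{1}{N}\sum_{k=1}^{N}\hat\varepsilon_{k1}^2, \dots, \frac{1}{N}\sum_{k=1}^{N}\hat\varepsilon_{kT}^2)$ be an estimator of $\Omega$. We estimate $C$ by

$$\hat C = -\hat D_0^{-1}\frac{1}{NT}\sum_{i=1}^{N} X_i'M_{\hat F}\hat\Omega\hat F\Big(\frac{\hat\Lambda'\hat\Lambda}{N}\Big)^{-1}\hat\lambda_i. \tag{24}$$

In the Appendix we prove $(T/N)^{1/2}(\hat B - B) = o_p(1)$ and $(N/T)^{1/2}(\hat C - C) = o_p(1)$. Define

$$\hat\beta^\dagger = \hat\beta - \frac{1}{N}\hat B - \frac{1}{T}\hat C.$$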
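Formulas (23) and (24) and the definition of $\hat\beta^\dagger$ translate directly into array operations. The sketch below assumes heteroskedasticity without correlation, as in (22)–(24); it takes $\hat D_0$ as an input (the text defines it in the subsection on estimating the covariance matrices), and the function name `bias_corrected_beta` is hypothetical.

```python
import numpy as np

def bias_corrected_beta(beta_hat, X, resid, F_hat, Lam_hat, D0_hat):
    """beta_dagger = beta_hat - B_hat/N - C_hat/T (heteroskedasticity, no correlation).

    X: (N, T, p); resid: (N, T) residuals eps_hat_it; F_hat: (T, r) with F'F/T = I;
    Lam_hat: (N, r); D0_hat: (p, p).
    """
    N, T, p = X.shape
    Q_inv = np.linalg.inv(Lam_hat.T @ Lam_hat / N)       # (Lam'Lam/N)^{-1}
    A = Lam_hat @ Q_inv @ Lam_hat.T                      # a_ij
    V = np.einsum("ij,jtp->itp", A, X) / N               # V_i = (1/N) sum_j a_ij X_j

    # B_hat, eq. (23): sigma-bar_i^2 estimated by time averages of squared residuals.
    sig2 = (resid ** 2).mean(axis=1)
    XVF = np.einsum("itp,tr->ipr", X - V, F_hat) / T     # (X_i - V_i)'F / T
    b = np.einsum("ipr,rs,is,i->p", XVF, Q_inv, Lam_hat, sig2) / N
    B_hat = -np.linalg.solve(D0_hat, b)

    # C_hat, eq. (24): Omega_hat is diagonal with cross-section averages of squared residuals.
    omega = (resid ** 2).mean(axis=0)
    M = np.eye(T) - F_hat @ F_hat.T / T                  # M_F with F'F/T = I
    MOF = M @ (omega[:, None] * F_hat)                   # M_F Omega_hat F
    c = np.einsum("itp,tr,rs,is->p", X, MOF, Q_inv, Lam_hat) / (N * T)
    C_hat = -np.linalg.solve(D0_hat, c)

    return beta_hat - B_hat / N - C_hat / T, B_hat, C_hat
```

As a sanity check that mirrors the argument given for (19), if all squared residuals are equal then $\hat\Omega$ is proportional to $I_T$ and $\hat C$ vanishes because $M_{\hat F}\hat F = 0$.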

THEOREM 4: Assume Assumptions A–E hold. In addition, $E(\varepsilon_{it}^2) = \sigma_{it}^2$ and $E(\varepsilon_{it}\varepsilon_{js}) = 0$ for $i \ne j$ or $t \ne s$. If $T/N^2 \to 0$ and $N/T^2 \to 0$, then

$$\sqrt{NT}(\hat\beta^\dagger - \beta^0) \xrightarrow{d} N(0, D_0^{-1}D_3D_0^{-1}),$$

where $D_3 = \operatorname{plim}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} Z_{it}Z_{it}'\sigma_{it}^2$.

The limiting variance $D_3$ is a special case of $D_Z$ due to the no-correlation assumption. Bias correction does not contribute to the limiting variance. Also note that the conditions $N/T^2 \to 0$ and $T/N^2 \to 0$ are added. Clearly, these conditions are less restrictive than $T/N$ converging to a positive constant. There exist other bias correction procedures (e.g., the panel jackknife) that could be used; see Arellano and Hahn (2005) and Hahn and Newey (2004). An alternative to bias correction in the case of $T/N \to \rho > 0$ is to use the Bekker (1994) standard errors to improve inference accuracy. This strategy was studied by Hansen, Hausman, and Newey (2005) in the context of many instruments.

REMARK 6: Consider estimating $C$ in the presence of serial correlation. We need consistent estimators for $T^{-1}X_i'\Omega_kF^0$ and $T^{-1}F^{0\prime}\Omega_kF^0$, where $\Omega_k = E\varepsilon_k\varepsilon_k'$ ($T \times T$), and then we take (weighted) averages over $i$ and over $k$. Thus consider estimating them for each given $(i, k)$. These terms are standard expressions in the usual heteroskedasticity and autocorrelation consistent (HAC) limiting covariance. To see this, let $W_i = (X_i, F^0)$, which is $T \times (p + r)$. Then the long-run variance of $T^{-1/2}W_i'\varepsilon_k = T^{-1/2}\sum_{t=1}^{T} W_{it}\varepsilon_{kt}$ is the limit of $\frac{1}{T}W_i'\Omega_kW_i$, which contains $\frac{1}{T}X_i'\Omega_kF^0$ and $\frac{1}{T}F^{0\prime}\Omega_kF^0$ as subblocks. A consistent estimator for $T^{-1}W_i'\Omega_kW_i$ can be constructed by the truncated kernel method of Newey and West (1987) based on the sequence $\hat W_{it}\hat\varepsilon_{kt}$ ($t = 1, \dots, T$). A similar argument has been made in Bai (2003).
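A long-run variance estimator for the sequence $\hat W_{it}\hat\varepsilon_{kt}$ can be sketched as follows. This uses Bartlett weights $1 - j/(L+1)$, the Newey–West choice, which keeps the estimate positive semidefinite; the lag truncation $L$ is left to the user, and the function name `newey_west_lrv` is hypothetical. The inputs in the usage lines are random stand-ins for $(X_i, \hat F)$ and $\hat\varepsilon_k$.

```python
import numpy as np

def newey_west_lrv(u, L):
    """Bartlett-kernel (Newey-West) long-run variance of a (T, m) sequence u_t.

    Returns Gamma_0 + sum_{j=1}^{L} (1 - j/(L+1)) (Gamma_j + Gamma_j'),
    where Gamma_j = (1/T) sum_{t=j+1}^{T} u_t u_{t-j}'.
    """
    u = np.asarray(u, dtype=float)
    T = u.shape[0]
    S = u.T @ u / T
    for j in range(1, L + 1):
        G = u[j:].T @ u[:-j] / T
        S += (1 - j / (L + 1)) * (G + G.T)
    return S

rng = np.random.default_rng(0)
T, p, r = 200, 2, 1
W_i = rng.standard_normal((T, p + r))      # stand-in for (X_i, F_hat)
e_k = rng.standard_normal(T)               # stand-in for residuals eps_hat_k
S = newey_west_lrv(W_i * e_k[:, None], L=4)
X_Omega_F = S[:p, p:]                      # estimate of T^{-1} X_i' Omega_k F
F_Omega_F = S[p:, p:]                      # estimate of T^{-1} F' Omega_k F
```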

REMARK 7: While estimating $B$ in the presence of cross-section correlation is not difficult, the underlying theory for consistency requires a different argument. In the time series dimension, the data are naturally ordered and distant observations have less correlation. The kernel method puts small weights on autocovariances with large lags, leading to consistent estimation. In the cross-section dimension, such an ordering of the data is not available, unless an economic distance can be constructed so that the data can be ordered. In general, a large $|i - j|$ does not imply a smaller correlation between $\varepsilon_{it}$ and $\varepsilon_{jt}$. Bai and Ng (2006) studied the estimation of an object similar to $B$. They showed that if the whole cross-section sample is used in the estimation, the estimator is inconsistent. A partial-sample estimator, with $N$ replaced by $n$ such that $n/N \to 0$ and $n/T \to 0$, is consistent. Thus, $B$ can be estimated by

$$\hat B = -\hat D_0^{-1}\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{n}\frac{(X_i - \hat V_i)'\hat F}{T}\Big(\frac{\hat\Lambda'\hat\Lambda}{N}\Big)^{-1}\hat\lambda_k\,\frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_{it}\hat\varepsilon_{kt}, \tag{25}$$

where $n/N \to 0$ and $n/T \to 0$. The argument of Bai and Ng (2006) can be adapted to show that $\hat B$ is consistent for $B$.

Estimating the Covariance Matrices

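To estimate $D_0$, we define

$$\hat D_0 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat Z_{it}\hat Z_{it}',$$

where $\hat Z_{it}$ is equal to $Z_{it}$ with $F^0$, $\lambda_i$, and $\Lambda$ replaced with $\hat F$, $\hat\lambda_i$, and $\hat\Lambda$, respectively. Next consider estimating $D_j$, $j = 1, 2, 3$. For all cases, we limit our attention to the presence of heteroskedasticity, but no correlation. Thus $D_j$ ($j = 1, 2, 3$) are covariance matrices when heteroskedasticity exists in the cross-section dimension only, in the time dimension only, and in both dimensions, respectively. Accordingly, we define

$$\hat D_1 = \frac{1}{N}\sum_{i=1}^{N}\hat\sigma_i^2\,\frac{1}{T}\sum_{t=1}^{T}\hat Z_{it}\hat Z_{it}', \qquad \hat D_2 = \frac{1}{T}\sum_{t=1}^{T}\hat\omega_t^2\,\frac{1}{N}\sum_{i=1}^{N}\hat Z_{it}\hat Z_{it}', \qquad \hat D_3 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat Z_{it}\hat Z_{it}'\hat\varepsilon_{it}^2,$$

where $\hat\sigma_i^2 = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_{it}^2$, $\hat\omega_t^2 = \frac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_{it}^2$, and $\hat Z_{it}$ was defined previously.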
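The four estimators differ only in how the squared residuals weight the outer products $\hat Z_{it}\hat Z_{it}'$. A minimal sketch, assuming $\hat Z$ is stored as an array of shape $(N, T, p)$ and the residuals as $(N, T)$; the function names are hypothetical. The sandwich $\hat D_0^{-1}\hat D_j\hat D_0^{-1}$ estimates the limiting variance of $\sqrt{NT}(\hat\beta - \beta^0)$ in the corresponding case.

```python
import numpy as np

def covariance_estimates(Z_hat, resid):
    """Return (D0_hat, D1_hat, D2_hat, D3_hat) from Z_hat (N, T, p) and residuals (N, T)."""
    N, T, p = Z_hat.shape
    ZZ = np.einsum("itp,itq->itpq", Z_hat, Z_hat)             # Z_it Z_it'
    e2 = resid ** 2
    sig2_i = e2.mean(axis=1)                                  # sigma_hat_i^2
    omega2_t = e2.mean(axis=0)                                # omega_hat_t^2
    D0 = ZZ.mean(axis=(0, 1))
    D1 = np.einsum("i,ipq->pq", sig2_i, ZZ.mean(axis=1)) / N
    D2 = np.einsum("t,tpq->pq", omega2_t, ZZ.mean(axis=0)) / T
    D3 = np.einsum("it,itpq->pq", e2, ZZ) / (N * T)
    return D0, D1, D2, D3

def sandwich(D0, Dj):
    """D0^{-1} Dj D0^{-1}, the limiting variance of sqrt(NT)(beta_hat - beta0)."""
    D0_inv = np.linalg.inv(D0)
    return D0_inv @ Dj @ D0_inv
```

Standard errors for $\hat\beta$ would then be the square roots of the diagonal of `sandwich(D0, Dj) / (N * T)`.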

PROPOSITION 2: Assume Assumptions A–E hold. Then, as $N, T \to \infty$, $\hat D_0 \xrightarrow{p} D_0$. In addition, in the absence of serial and cross-section correlations, $\hat D_j \xrightarrow{p} D_j$, where $D_1$ and $D_2$ are defined in Theorem 2 with no correlation, and $D_3$ is defined in Theorem 4.

REMARK 8: When cross-section correlation exists, we estimate $D_1$ in (17) by

$$\hat D_1 = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\hat Z_{it}\hat Z_{jt}'\hat\varepsilon_{it}\hat\varepsilon_{jt},$$

where $n$ satisfies $n/N \to 0$ and $n/T \to 0$; see Remark 7. It can be shown that $\hat D_1$ is consistent for $D_1$. When serial correlation exists, we estimate $D_2$ of (17) by estimating the long-run variance of the sequence $\{\hat Z_{it}\hat\varepsilon_{it}\}$ using the truncated kernel of Newey and West (1987); see Remark 6. It can be shown that $\hat D_2$ is consistent for $D_2$. For estimating $D_Z$ (the covariance matrix when correlation exists in both dimensions), we need to use the partial-sample method together with the Newey–West procedure. More specifically, let $\hat\xi_t = n^{-1/2}\sum_{i=1}^{n}\hat Z_{it}\hat\varepsilon_{it}$, where $n$ is chosen as before. The estimated long-run variance (e.g., with the truncated kernel) of the sequence $\hat\xi_t$ is an estimator for $D_Z$. While we conjecture that the estimator is consistent, a formal proof remains to be explored.
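The partial-sample estimator can be sketched in a few lines. One observation makes the structure clear: with lag truncation $L = 0$, $\frac{1}{T}\sum_t\hat\xi_t\hat\xi_t'$ equals the cross-section estimator $\hat D_1$ displayed above, so the $D_Z$ estimator simply adds Bartlett-weighted autocovariances of $\hat\xi_t$. Taking the first $n$ units as the subsample, the Bartlett weights, and the function name `D_Z_partial` are assumptions of this sketch, not prescriptions of the text.

```python
import numpy as np

def D_Z_partial(Z_hat, resid, n, L):
    """Partial-sample long-run variance estimator of D_Z.

    Z_hat: (N, T, p); resid: (N, T); n: number of units used (first n, by assumption);
    L: Bartlett lag truncation. With L = 0 this reduces to the partial-sample D1_hat.
    """
    xi = np.einsum("itp,it->tp", Z_hat[:n], resid[:n]) / np.sqrt(n)   # xi_t, shape (T, p)
    T = xi.shape[0]
    S = xi.T @ xi / T
    for j in range(1, L + 1):
        G = xi[j:].T @ xi[:-j] / T
        S += (1 - j / (L + 1)) * (G + G.T)
    return S
```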

8. MODELS WITH BOTH ADDITIVE AND INTERACTIVE EFFECTS

Although interactive-effects models include the additive models as special cases, additivity has not been imposed so far, even when it is true. When additivity holds but is ignored, the resulting estimator is less efficient. In this section, we consider the joint presence of additive and interactive effects, show how to estimate the model by imposing additivity, and derive the limiting distribution of the resulting estimator. Consider

$$Y_{it} = X_{it}'\beta + \mu + \alpha_i + \xi_t + \lambda_i'F_t + \varepsilon_{it}, \tag{26}$$

where $\mu$ is the grand mean, $\alpha_i$ is the usual fixed effect, $\xi_t$ is the time effect, and $\lambda_i'F_t$ is the interactive effect. Restrictions are required to identify the model.
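Even in the absence of the interactive effect, the restrictions

$$\sum_{i=1}^{N}\alpha_i = 0, \qquad \sum_{t=1}^{T}\xi_t = 0 \tag{27}$$

are needed; see Greene (2000, p. 565). The following restrictions are maintained:

$$F'F/T = I_r, \qquad \Lambda'\Lambda = \text{diagonal}. \tag{28}$$

Further restrictions are needed to separate the additive and interactive effects. They are

$$\sum_{i=1}^{N}\lambda_i = 0, \qquad \sum_{t=1}^{T}F_t = 0. \tag{29}$$

To see this, suppose that $\bar\lambda = \frac{1}{N}\sum_{i=1}^{N}\lambda_i \ne 0$ or $\bar F = \frac{1}{T}\sum_{t=1}^{T}F_t \ne 0$, or both. Let $\lambda_i^\dagger = \lambda_i - 2\bar\lambda$ and $F_t^\dagger = F_t - 2\bar F$. Then

$$Y_{it} = X_{it}'\beta + \mu + \alpha_i^\dagger + \xi_t^\dagger + \lambda_i^{\dagger\prime}F_t^\dagger + \varepsilon_{it},$$

where $\alpha_i^\dagger = \alpha_i + 2\bar F'\lambda_i - 2\bar\lambda'\bar F$ and $\xi_t^\dagger = \xi_t + 2\bar\lambda'F_t - 2\bar\lambda'\bar F$. It is easy to verify that $F^{\dagger\prime}F^\dagger/T = F'F/T = I_r$ and $\Lambda^{\dagger\prime}\Lambda^\dagger = \Lambda'\Lambda$ is diagonal, and at the same time $\sum_{i=1}^{N}\alpha_i^\dagger = 0$ and $\sum_{t=1}^{T}\xi_t^\dagger = 0$. Thus the new model is observationally equivalent to (26) if (29) is not imposed.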
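The "easy to verify" step can be checked numerically. The sketch below draws loadings and factors with nonzero means, applies the reparametrization $\lambda_i^\dagger = \lambda_i - 2\bar\lambda$, $F_t^\dagger = F_t - 2\bar F$ together with the adjusted $\alpha_i^\dagger$ and $\xi_t^\dagger$, and compares the fitted values. The normalization $F'F/T = I_r$ is not imposed here; the identities hold regardless, so (28) is preserved whenever it held for the original parameters. Dimensions and random draws are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, r = 8, 6, 2
lam = rng.standard_normal((N, r)) + 0.5      # loadings with nonzero mean
F = rng.standard_normal((T, r)) + 0.3        # factors with nonzero mean
alpha = rng.standard_normal(N); alpha -= alpha.mean()
xi = rng.standard_normal(T); xi -= xi.mean()
lam_bar, F_bar = lam.mean(axis=0), F.mean(axis=0)

lam_d = lam - 2 * lam_bar
F_d = F - 2 * F_bar
alpha_d = alpha + 2 * lam @ F_bar - 2 * lam_bar @ F_bar
xi_d = xi + 2 * F @ lam_bar - 2 * lam_bar @ F_bar

fit = alpha[:, None] + xi[None, :] + lam @ F.T            # alpha_i + xi_t + lambda_i'F_t
fit_d = alpha_d[:, None] + xi_d[None, :] + lam_d @ F_d.T
print(np.allclose(fit, fit_d))                            # identical fitted values
```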
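To estimate the general model under the given restrictions, we introduce some standard notation. For any variable $\phi_{it}$, define

$$\bar\phi_{\cdot t} = \frac{1}{N}\sum_{i=1}^{N}\phi_{it}, \qquad \bar\phi_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T}\phi_{it}, \qquad \bar\phi_{\cdot\cdot} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\phi_{it},$$

$$\dot\phi_{it} = \phi_{it} - \bar\phi_{i\cdot} - \bar\phi_{\cdot t} + \bar\phi_{\cdot\cdot},$$

and its vector form $\dot\phi_i = \phi_i - \iota_T\bar\phi_{i\cdot} - \bar\phi + \iota_T\bar\phi_{\cdot\cdot}$, where $\bar\phi = (\bar\phi_{\cdot 1}, \dots, \bar\phi_{\cdot T})'$.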
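The dot transformation is the same two-sided demeaning $M_T\Phi M_N$ used by the LSDV estimator, with $\Phi$ the $T \times N$ matrix of $\phi_{it}$. The sketch below (hypothetical function name `two_way_demean`, data stored as an $N \times T$ array) checks that equivalence, shows that any $\mu + \alpha_i + \xi_t$ is removed, and shows that $\lambda_i'F_t$ is left unchanged once (29) holds.

```python
import numpy as np

def two_way_demean(phi):
    """phi_dot_it = phi_it - phi_bar_i. - phi_bar_.t + phi_bar_.. for phi of shape (N, T)."""
    return (phi - phi.mean(axis=1, keepdims=True)
            - phi.mean(axis=0, keepdims=True) + phi.mean())

rng = np.random.default_rng(0)
N, T, r = 7, 5, 2
phi = rng.standard_normal((N, T))

# Matrix form: with Phi = phi' (T x N), Phi_dot = M_T Phi M_N.
M_T = np.eye(T) - np.ones((T, T)) / T
M_N = np.eye(N) - np.ones((N, N)) / N
print(np.allclose(two_way_demean(phi).T, M_T @ phi.T @ M_N))

# Additive effects are wiped out; centered interactive effects pass through unchanged.
additive = 1.5 + rng.standard_normal(N)[:, None] + rng.standard_normal(T)[None, :]
lam = rng.standard_normal((N, r)); lam -= lam.mean(axis=0)   # sum_i lambda_i = 0
F = rng.standard_normal((T, r)); F -= F.mean(axis=0)         # sum_t F_t = 0
```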

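The least squares estimators are

$$\hat\mu = \bar Y_{\cdot\cdot} - \bar X_{\cdot\cdot}'\hat\beta, \qquad \hat\alpha_i = \bar Y_{i\cdot} - \bar X_{i\cdot}'\hat\beta - \hat\mu, \qquad \hat\xi_t = \bar Y_{\cdot t} - \bar X_{\cdot t}'\hat\beta - \hat\mu,$$

$$\hat\beta = \Big(\sum_{i=1}^{N}\dot X_i'M_{\hat F}\dot X_i\Big)^{-1}\sum_{i=1}^{N}\dot X_i'M_{\hat F}\dot Y_i,$$

and $\hat F$ is the $T \times r$ matrix consisting of the first $r$ eigenvectors (multiplied by $\sqrt{T}$) associated with the $r$ largest eigenvalues of the matrix $\frac{1}{NT}\sum_{i=1}^{N}(\dot Y_i - \dot X_i\hat\beta)(\dot Y_i - \dot X_i\hat\beta)'$. Finally, $\hat\Lambda$ is expressed as a function of $(\hat\beta, \hat F)$ such that

$$\hat\Lambda = (\hat\lambda_1, \hat\lambda_2, \dots, \hat\lambda_N)' = T^{-1}\big[\hat F'(\dot Y_1 - \dot X_1\hat\beta), \dots, \hat F'(\dot Y_N - \dot X_N\hat\beta)\big]'.$$

Iterations are required to obtain $\hat\beta$ and $\hat F$. The remaining parameters $\hat\mu$, $\hat\alpha_i$, $\hat\xi_t$, and $\hat\Lambda$ require no iteration, and they can be computed once $\hat\beta$ and $\hat F$ are obtained. The solutions for $\hat\mu$, $\hat\alpha_i$, and $\hat\xi_t$ have the same form as in the usual fixed-effects model; see Greene (2000, p. 565).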
We shall argue that $(\hat\mu, \{\hat\alpha_i\}, \{\hat\xi_t\}, \hat\beta, \hat F, \hat\Lambda)$ are indeed the least squares estimators from minimization of the objective function

$$\sum_{i=1}^{N}\sum_{t=1}^{T}(Y_{it} - X_{it}'\beta - \mu - \alpha_i - \xi_t - \lambda_i'F_t)^2$$

subject to the restrictions (27)–(29). Concentrating out $(\mu, \{\alpha_i\}, \{\xi_t\})$ is equivalent to using $(\dot Y_{it}, \dot X_{it})$ to estimate the remaining parameters. So the concentrated objective function is

$$\sum_{i=1}^{N}\sum_{t=1}^{T}(\dot Y_{it} - \dot X_{it}'\beta - \lambda_i'F_t)^2.$$

The dotted version of $\lambda_i'F_t$ is itself, that is, $\dot c_{it} = c_{it}$, where $c_{it} = \lambda_i'F_t$, due to restriction (29). This objective function is the same as (8), except that $Y_{it}$ and $X_{it}$ are replaced by their dotted versions. From the analysis in Section 3, the least squares estimators for $\beta$, $F$, and $\Lambda$ are as prescribed above. Given these estimates, the least squares estimators for $(\mu, \{\alpha_i\}, \{\xi_t\})$ are also immediately obtained as prescribed.
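We next argue that all restrictions are satisfied. For example, $\frac{1}{N}\sum_{i=1}^{N}\hat\alpha_i = \bar Y_{\cdot\cdot} - \bar X_{\cdot\cdot}'\hat\beta - \hat\mu = \hat\mu - \hat\mu = 0$. Similarly, $\sum_{t=1}^{T}\hat\xi_t = 0$. It requires an extra argument to show $\sum_{t=1}^{T}\hat F_t = 0$. By definition,

$$\hat FV_{NT} = \frac{1}{NT}\sum_{i=1}^{N}(\dot Y_i - \dot X_i\hat\beta)(\dot Y_i - \dot X_i\hat\beta)'\hat F.$$

Multiplying each side from the left by $\iota_T' = (1, \dots, 1)$ yields

$$\iota_T'\hat FV_{NT} = \frac{1}{NT}\sum_{i=1}^{N}\iota_T'(\dot Y_i - \dot X_i\hat\beta)(\dot Y_i - \dot X_i\hat\beta)'\hat F,$$

but $\iota_T'\dot Y_i = \sum_{t=1}^{T}\dot Y_{it} = 0$ and, similarly, $\iota_T'\dot X_i = 0$. Thus the right-hand side is zero, and since $V_{NT}$ is an invertible diagonal matrix of positive eigenvalues, $\iota_T'\hat F = 0$. The same argument leads to $\sum_{i=1}^{N}\hat\lambda_i = 0$.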
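The argument can be confirmed on any residual matrix. In the sketch below, `R` is a random stand-in for $Y - X\hat\beta$ (an assumption for illustration only); after the two-way demeaning, the leading eigenvectors are orthogonal to $\iota_T$ because $\iota_T$ lies in the null space of the matrix being decomposed, so both $\hat F$ and $\hat\Lambda$ have zero column sums.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, r = 50, 30, 2
# Hypothetical residual matrix with individual and time components, rows indexed by i.
R = rng.standard_normal((N, T)) + rng.standard_normal((N, 1)) + rng.standard_normal((1, T))
R_dot = R - R.mean(axis=1, keepdims=True) - R.mean(axis=0, keepdims=True) + R.mean()

_, vecs = np.linalg.eigh(R_dot.T @ R_dot / (N * T))   # (1/NT) sum_i (dotted resid)(dotted resid)'
F_hat = np.sqrt(T) * vecs[:, ::-1][:, :r]             # r leading eigenvectors times sqrt(T)
Lam_hat = R_dot @ F_hat / T                           # lambda_hat_i = F_hat' R_dot_i / T
print(F_hat.sum(axis=0), Lam_hat.sum(axis=0))         # both numerically zero
```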
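To derive the asymptotic distribution for $\hat\beta$, we define

$$\dot Z_i(F) = M_F\dot X_i - \frac{1}{N}\sum_{k=1}^{N} a_{ik}M_F\dot X_k \qquad\text{and}\qquad \dot D(F) = \frac{1}{NT}\sum_{i=1}^{N}\dot Z_i(F)'\dot Z_i(F),$$

where $a_{ik} = \lambda_i'(\Lambda'\Lambda/N)^{-1}\lambda_k$. We assume

$$\inf_{F \in \mathcal{F}}\dot D(F) > 0. \tag{30}$$

Let $\dot Z_i = \dot Z_i(F^0)$. Notice that

$$\dot Y_{it} = \dot X_{it}'\beta + \lambda_i'F_t + \dot\varepsilon_{it}.$$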

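The entire analysis of Section 4 can be restated here. In particular, under the conditions of Theorem 2, we have the asymptotic representation

$$\sqrt{NT}(\hat\beta - \beta^0) = \Big(\frac{1}{NT}\sum_{i=1}^{N}\dot Z_i'\dot Z_i\Big)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\dot Z_i'\dot\varepsilon_i + o_p(1).$$

In the Supplemental Material, we show the identity (see Lemma A.13) $\sum_{i=1}^{N}\dot Z_i'\dot\varepsilon_i \equiv \sum_{i=1}^{N}\dot Z_i'\varepsilon_i$. That is, $\dot\varepsilon_i$ can be replaced by $\varepsilon_i$. It follows that if asymptotic normality is assumed for $\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\dot Z_i'\varepsilon_i$, asymptotic normality also holds for $\sqrt{NT}(\hat\beta - \beta)$.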

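ASSUMPTION F: (i) $\operatorname{plim}\frac{1}{NT}\sum_{i=1}^{N}\dot Z_i'\dot Z_i = \dot D_0 > 0$; (ii) $\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\dot Z_i'\varepsilon_i \xrightarrow{d} N(0, \dot D_Z)$, where $\dot D_Z = \operatorname{plim}\frac{1}{NT}\sum_{i,j,t,s}\sigma_{ij,ts}\dot Z_{it}\dot Z_{js}'$.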

THEOREM 5: Assume Assumptions A–F hold. Then, as $T, N \to \infty$, the following statements hold:
(i) Under the assumptions of Theorem 2(i),

$$\sqrt{NT}(\hat\beta - \beta^0) \xrightarrow{d} N(0, \dot D_0^{-1}\dot D_1\dot D_0^{-1}).$$

(ii) Under the assumptions of Theorem 2(ii),

$$\sqrt{NT}(\hat\beta - \beta^0) \xrightarrow{d} N(0, \dot D_0^{-1}\dot D_2\dot D_0^{-1}),$$

where $\dot D_1$ and $\dot D_2$ are special cases of $\dot D_Z$.

An analogous result to Theorem 3 also holds, and bias-corrected estimators can also be considered. Since the analysis holds with $X_i$ replaced by $\dot X_i$, details are omitted.

9. TESTING ADDITIVE VERSUS INTERACTIVE EFFECTS

There exist two methods to evaluate which specification, fixed effects or interactive effects, gives a better description of the data. The first method is the Hausman test (Hausman (1978)) and the second is based on the number of factors. We detail the Hausman test method, delegating the number-of-factors method to the Supplemental Material. Throughout this section, for simplicity, we assume that the $\varepsilon_{it}$ are i.i.d. over $i$ and $t$, with $E(\varepsilon_{it}^2) = \sigma^2$. The null hypothesis is an additive-effects model

$$Y_{it} = X_{it}'\beta + \alpha_i + \xi_t + \mu + \varepsilon_{it}, \tag{31}$$

with restrictions $\sum_{i=1}^{N}\alpha_i = 0$ and $\sum_{t=1}^{T}\xi_t = 0$ due to the grand mean parameter $\mu$. The alternative hypothesis (more precisely, the encompassing general model) is

$$Y_{it} = X_{it}'\beta + \lambda_i'F_t + \varepsilon_{it}. \tag{32}$$

The null model is nested in the general model with $\lambda_i = (\alpha_i, 1)'$ and $F_t = (1, \xi_t + \mu)'$.

The interactive-effects estimator for $\beta$ is consistent under both models (31) and (32), but is less efficient than the least squares dummy-variable estimator for model (31), as the latter imposes restrictions on factors and factor loadings. But the fixed-effects estimator is inconsistent under model (32). The principle of the Hausman test is applicable here.

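The within-group estimator of $\beta$ in (31) is

$$\sqrt{NT}(\hat\beta_{FE} - \beta) = \Big(\frac{1}{NT}\sum_{i=1}^{N}\dot X_i'\dot X_i\Big)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\dot X_i'\varepsilon_i,$$

where $\dot X_i = X_i - \iota_T\bar X_{i\cdot}' - \bar X + \iota_T\bar X_{\cdot\cdot}'$. Rewrite the fixed-effects estimator more compactly as

$$\sqrt{NT}(\hat\beta_{FE} - \beta) = C^{-1}\psi,$$

where $C = \frac{1}{NT}\sum_{i=1}^{N}\dot X_i'\dot X_i$ and $\psi = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\dot X_i'\varepsilon_i$. The interactive-effects estimator can be written as (see Proposition A.3)

$$\sqrt{NT}(\hat\beta_{IE} - \beta) = D(F^0)^{-1}(\eta - \xi) + o_p(1),$$

where

$$\eta = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N} X_i'M_{F^0}\varepsilon_i, \qquad \xi = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\frac{1}{N}\sum_{k=1}^{N} a_{ik}X_k'M_{F^0}\varepsilon_i. \tag{33}$$

The variances of the two estimators are

$$\operatorname{var}\big(\sqrt{NT}(\hat\beta_{FE} - \beta)\big) = \sigma^2C^{-1}, \qquad \operatorname{var}\big(\sqrt{NT}(\hat\beta_{IE} - \beta)\big) = \sigma^2D(F^0)^{-1}.$$

In the accompanying document, we show, under the null hypothesis of additivity,

$$E[(\eta - \xi)\psi'] = \sigma^2D(F^0). \tag{34}$$

This implies $\operatorname{var}(\hat\beta_{IE} - \hat\beta_{FE}) = \operatorname{var}(\hat\beta_{IE}) - \operatorname{var}(\hat\beta_{FE})$. Thus the Hausman test takes the form

$$J = \frac{NT}{\sigma^2}(\hat\beta_{IE} - \hat\beta_{FE})'\big[D(F^0)^{-1} - C^{-1}\big]^{-1}(\hat\beta_{IE} - \hat\beta_{FE}) \xrightarrow{d} \chi^2_p.$$

Replacing $D(F^0)$ and $\sigma^2$ by their consistent estimators, the above is still true. Proposition 2 shows that $D(F^0)$ is consistently estimated by $\hat D_0$. Let $\hat\sigma^2 = \frac{1}{L}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat\varepsilon_{it}^2$, where $L = NT - (N + T)r - p$. Then $\hat\sigma^2 \xrightarrow{p} \sigma^2$.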
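The feasible statistic needs only $\hat\beta_{IE}$, $\hat\beta_{FE}$, $\hat D_0$, $C$, and the interactive-effects residuals. The sketch below (hypothetical function names) computes $C$ from two-way demeaned regressors and $J$ with the degrees-of-freedom count $L$ given above. It assumes $\hat D_0^{-1} - C^{-1}$ is positive definite, which the asymptotic argument delivers but a finite sample need not.

```python
import numpy as np

def fe_gram(X):
    """C = (1/NT) sum_i Xdot_i' Xdot_i with two-way demeaned regressors; X is (N, T, p)."""
    N, T, _ = X.shape
    Xd = (X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True)
          + X.mean(axis=(0, 1), keepdims=True))
    return np.einsum("itp,itq->pq", Xd, Xd) / (N * T)

def hausman_J(beta_ie, beta_fe, D0_hat, C_fe, resid_ie, r):
    """J = (NT / sigma2_hat) d'[D0^{-1} - C^{-1}]^{-1} d with d = beta_IE - beta_FE."""
    N, T = resid_ie.shape
    p = len(beta_ie)
    L = N * T - (N + T) * r - p                       # degrees-of-freedom count from the text
    sigma2 = (resid_ie ** 2).sum() / L
    d = np.asarray(beta_ie, dtype=float) - np.asarray(beta_fe, dtype=float)
    V = np.linalg.inv(D0_hat) - np.linalg.inv(C_fe)   # assumed positive definite
    J = N * T / sigma2 * (d @ np.linalg.solve(V, d))
    return J, sigma2
```

The statistic is compared with $\chi^2_p$ critical values; for $p = 2$, the 5% critical value is about 5.99.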

REMARK 9: The Hausman test is also applicable when there are no time effects but only individual effects (i.e., $\xi_t = 0$). Then it tests whether the individual effects are time-varying. Similarly, the Hausman test is applicable when $\alpha_i = 0$ in (31) but $\xi_t \ne 0$. Then it tests whether the common shocks have heterogeneous effects on individuals. Details are given in the Supplemental Material.

10. TIME-INVARIANT AND COMMON REGRESSORS

In earnings studies, time-invariant regressors include education, gender, race, and so forth; common variables are those that represent trends or policies. In consumption studies, common regressors include price variables, which are the same for each individual. Those variables are removed by the within-group transformation. As a result, identification and estimation must rely on other means, such as the instrumental variable approach of Hausman and Taylor (1981). This section considers similar problems under interactive effects. Under some reasonable and intuitive conditions, the parameters of the time-invariant and common regressors are shown to be identifiable and can be consistently estimated. In effect, those regressors act as their own instruments; additional instruments, either within or outside the system, are not necessary. Ahn, Lee, and Schmidt (2001) allowed for time-invariant regressors, although they did not consider the joint presence of common regressors. Their identification condition relies on nonzero correlation between factor loadings and regressors.
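A general model can be written as

$$Y_{it} = X_{it}'\varphi + x_i'\gamma + w_t'\delta + \lambda_i'F_t + \varepsilon_{it}, \tag{35}$$

where $(X_{it}', x_i', w_t')'$ is a vector of observable regressors, $x_i$ is time invariant, and $w_t$ is cross-sectionally invariant (common). The dimensions of the regressors are such that $X_{it}$ is $p \times 1$, $x_i$ is $q \times 1$, $w_t$ is $\ell \times 1$, and $F_t$ is $r \times 1$. Introduce

$$X_i = \begin{bmatrix}X_{i1}' & x_i' & w_1'\\ X_{i2}' & x_i' & w_2'\\ \vdots & \vdots & \vdots\\ X_{iT}' & x_i' & w_T'\end{bmatrix}, \qquad \beta = \begin{bmatrix}\varphi\\ \gamma\\ \delta\end{bmatrix}, \qquad x = \begin{bmatrix}x_1'\\ x_2'\\ \vdots\\ x_N'\end{bmatrix}, \qquad W = \begin{bmatrix}w_1'\\ w_2'\\ \vdots\\ w_T'\end{bmatrix}.$$

Then the model can be rewritten as

$$Y_i = X_i\beta + F\lambda_i + \varepsilon_i.$$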
Let $(\beta^0, F^0, \Lambda)$ denote the true parameters (superscript 0 is not used for $\Lambda$). To identify $\beta^0$, it was assumed in Section 4 that the matrix

$$D(F) = \frac{1}{NT}\sum_{i=1}^{N} X_i'M_FX_i - \frac{1}{T}\frac{1}{N^2}\sum_{i=1}^{N}\sum_{k=1}^{N} X_i'M_FX_k\,\lambda_i'\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_k$$

is positive definite for all possible $F$. This assumption fails when time-invariant regressors and common regressors exist, because $D(\iota_T)$ and $D(W)$ are not full rank matrices. However, positive definiteness of $D(F)$ for all $F$ is not a necessary condition. In fact, all that is needed is the identification condition

$$D(F^0) > 0.$$

That is, the matrix $D(F)$ is positive definite when evaluated at the true $F^0$, a much weaker condition than Assumption A. In the Supplemental Material, we show that the above condition can be decomposed into some intuitive assumptions. First, the interactive effects must be genuine (not additive effects); otherwise, we are back to the environment of Hausman and Taylor, and instrumental variables must be used to identify $\beta$. Second, there should be no multicollinearity between $W$ and $F^0$, and no multicollinearity between $x$ and $\Lambda$. Finally, $W$ and $x$ cannot both contain the constant regressor (there is only one grand mean parameter).
It remains to argue that D(F 0 ) > 0 (or equivalently, the four conditions
above) implies consistent estimation. We state this result as a proposition.

PROPOSITION 3: Suppose Assumptions B–D hold. If $D(F^0) > 0$, then $\hat\beta \xrightarrow{p} \beta^0$.

The proof of this proposition is nontrivial and is provided in the Supplemental Material. The proposition implies that $D(F^0) > 0$ is a sufficient condition for consistent estimation.
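Both the failure of Assumption A and the weaker condition $D(F^0) > 0$ can be seen numerically. The Python/NumPy sketch below is an illustration under assumed data (one factor, one regressor varying in both dimensions, one time-invariant regressor independent of the loadings); the helper names are hypothetical. It evaluates $D(F)$ at $F = \iota_T$ and at $F = F^0$, and also checks the equivalent form $D(F) = (NT)^{-1}\sum_i Z_i'Z_i$ with $Z_i = M_F X_i - N^{-1}\sum_k a_{ik}M_F X_k$ and $a_{ik} = \lambda_i'(\Lambda'\Lambda/N)^{-1}\lambda_k$, which shows that $D(F)$ is always positive semidefinite.

```python
import numpy as np

def annihilator(F):
    """M_F = I_T - F (F'F)^{-1} F'."""
    return np.eye(F.shape[0]) - F @ np.linalg.solve(F.T @ F, F.T)

def weights(Lam):
    """a[i, k] = lambda_i' (Lam'Lam/N)^{-1} lambda_k."""
    N = Lam.shape[0]
    return Lam @ np.linalg.solve(Lam.T @ Lam / N, Lam.T)

def D_matrix(X, F, Lam):
    """D(F) from its definition; X is (N, T, K), F is (T, r), Lam is (N, r)."""
    N, T, _ = X.shape
    MX = np.einsum('ts,isk->itk', annihilator(F), X)            # M_F X_i for every i
    first = np.einsum('itk,itl->kl', X, MX) / (N * T)
    second = np.einsum('ij,itk,jtl->kl', weights(Lam), X, MX) / (N**2 * T)
    return first - second

def D_via_Z(X, F, Lam):
    """The same matrix written as (NT)^{-1} sum_i Z_i'Z_i."""
    N, T, _ = X.shape
    MX = np.einsum('ts,isk->itk', annihilator(F), X)
    Z = MX - np.einsum('ij,jtk->itk', weights(Lam), MX) / N
    return np.einsum('itk,itl->kl', Z, Z) / (N * T)

rng = np.random.default_rng(1)
N, T, r = 50, 40, 1
F0 = rng.standard_normal((T, r))
Lam = rng.standard_normal((N, r))
X_both = Lam @ F0.T + rng.standard_normal((N, T))              # varies over i and t
x = rng.standard_normal(N)                                      # time-invariant regressor
X = np.stack([X_both, np.repeat(x[:, None], T, axis=1)], axis=2)   # (N, T, 2)

iota = np.ones((T, 1))
print(np.linalg.eigvalsh(D_matrix(X, iota, Lam)))   # one eigenvalue is zero up to rounding
print(np.linalg.eigvalsh(D_matrix(X, F0, Lam)))     # both eigenvalues are positive
```

With $F = \iota_T$ the time-invariant column is wiped out by $M_{\iota_T}$, so $D(\iota_T)$ has a zero row and column; at $F^0$ the same column survives because $x_i$ is not collinear with the loadings.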
Given consistency, the rest of the argument for the rate of convergence does not hinge on any particular structure of the regressors. Therefore, the rate of convergence of $\hat\beta$ and its limiting distribution remain valid in the presence of the grand mean, time-invariant regressors, and common regressors. More specifically, all results up to and including Section 7 are valid. The result of Section 8 is valid only for regressors with variation in both dimensions. Similarly, hypothesis testing in Section 9 can rely only on the subset of coefficients whose regressors vary in both dimensions.

Discussion
When additive effects are also present, (35) becomes

$$Y_{it} = X_{it}'\phi + \mu + \alpha_i + \xi_t + x_i'\gamma + w_t'\delta + \lambda_i'F_t + \varepsilon_{it},$$

where μ is the grand mean (written out explicitly), and $\alpha_i$ and $\xi_t$ are, respectively, the individual and the time effects. The parameters γ and δ are then no longer directly estimable. Under the restrictions of (27) and (29), the within-group transformation implies $\dot Y_{it} = \dot X_{it}'\phi + \lambda_i'F_t + \dot\varepsilon_{it}$. The parameters φ and $\lambda_i'F_t$ are estimable by the interactive-effects estimator, so they can be treated as known for the purpose of identification. Letting $Y_{it}^* = Y_{it} - X_{it}'\phi - \lambda_i'F_t$, we have

(36) $$Y_{it}^* = \mu + \alpha_i + \xi_t + x_i'\gamma + w_t'\delta + \varepsilon_{it},$$

which is a standard model. As in Hausman and Taylor (1981), if there is a subset of the regressors $X_{it}$ whose time averages are uncorrelated with $\alpha_i$ but correlated with $x_i$, those time averages can be used as instruments for $x_i$. A similar instrument can be assumed for $w_t$, so that both γ and δ can be estimated. This is a direct extension of the Hausman–Taylor framework to interactive-effects models.
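To make the identification problem concrete, the Python/NumPy sketch below applies the two-way within transformation $Z_{it} - \bar Z_{i\cdot} - \bar Z_{\cdot t} + \bar Z_{\cdot\cdot}$ to simulated data. It shows that $\mu + \alpha_i + \xi_t + x_i'\gamma + w_t'\delta$ is wiped out entirely, which is why γ and δ cannot be recovered from the transformed data, while $\lambda_i'F_t$ passes through unchanged when the loadings and factors have zero sample means. That zero-mean normalization is an assumption made here for the illustration; the scalar regressors, the sample sizes, and the helper name `within_two_way` are likewise hypothetical.

```python
import numpy as np

def within_two_way(Z):
    """Two-way within transformation of an (N, T) array: Z_it - Zbar_i. - Zbar_.t + Zbar_.."""
    return Z - Z.mean(axis=1, keepdims=True) - Z.mean(axis=0, keepdims=True) + Z.mean()

rng = np.random.default_rng(2)
N, T = 30, 20
mu, gamma, delta = 5.0, 2.0, 4.0
alpha = rng.standard_normal(N)        # individual effects alpha_i
xi = rng.standard_normal(T)           # time effects xi_t
x = rng.standard_normal(N)            # scalar time-invariant regressor x_i
w = rng.standard_normal(T)            # scalar common regressor w_t

# Everything on the right of (36) except the error term
additive_part = (mu + alpha[:, None] + xi[None, :]
                 + gamma * x[:, None] + delta * w[None, :])
print(np.abs(within_two_way(additive_part)).max())   # zero up to rounding

# lambda_i'F_t with zero-mean loadings and factors (assumed normalization)
lam = rng.standard_normal((N, 2))
lam -= lam.mean(axis=0)
F = rng.standard_normal((T, 2))
F -= F.mean(axis=0)
interactive = lam @ F.T
print(np.allclose(within_two_way(interactive), interactive))   # True
```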
A more interesting setup is to allow time-dependent coefficients for the time-invariant regressors and, similarly, individual-dependent coefficients for the common regressors, namely

(37) $$Y_{it} = X_{it}'\phi + \mu + \alpha_i + \xi_t + x_i'\gamma_t + w_t'\delta_i + \lambda_i'F_t + \varepsilon_{it}.$$

The observable variables are $Y_{it}$, $X_{it}$, $x_i$, and $w_t$. Again, the levels of $\gamma_t$ and $\delta_i$ are not directly estimable because of $\alpha_i + \xi_t$. For example, for any constant vector c, $\alpha_i + x_i'\gamma_t = (\alpha_i + x_i'c) + x_i'(\gamma_t - c) = \alpha_i^* + x_i'\gamma_t^*$. However, the deviations of $\gamma_t$ from its time average, $\gamma_t - E(\gamma_t)$, and the deviations of $\delta_i$ from its individual average, $\delta_i - E(\delta_i)$, are directly estimable. In practice, these deviations or contrasts may be of more importance than the levels, since they reveal patterns across individuals or changes over time, much like coefficients on dummy variables. Restrictions are needed to estimate $(\phi, \mu, \gamma_t, \delta_i)$. For example, we need to impose $E(\gamma_t) = 0$ or $\sum_t \gamma_t = 0$, similar restrictions for $\delta_i$, $\alpha_i$, $\xi_t$, $\lambda_i$, and $F_t$, and some normalization and no-multicollinearity restrictions. Unreported simulations show that these deviations can be well estimated. To estimate the levels $E(\gamma_t)$ and $E(\delta_i)$, the Hausman–Taylor approach appears to be applicable as well. In this case, $Y_{it}^*$ in (36) is replaced by $Y_{it}^* = Y_{it} - X_{it}'\phi - x_i'\gamma_t^* - w_t'\delta_i^* - \lambda_i'F_t$, where $\gamma_t^* = \gamma_t - E(\gamma_t)$ and $\delta_i^* = \delta_i - E(\delta_i)$ are the deviations. The large-sample theory of this model warrants a separate study.

11. FINITE SAMPLE PROPERTIES VIA SIMULATIONS

We assess the performance of the estimator by Monte Carlo simulations. A general model with common regressors and time-invariant regressors is considered:

$$Y_{it} = X_{it1}\beta_1 + X_{it2}\beta_2 + \mu + x_i\gamma + w_t\delta + \lambda_i'F_t + \varepsilon_{it}, \qquad (\beta_1, \beta_2, \mu, \gamma, \delta) = (1, 3, 5, 2, 4),$$

where $\lambda_i = (\lambda_{i1}, \lambda_{i2})'$ and $F_t = (F_{t1}, F_{t2})'$. The regressors are generated according to

$$X_{it1} = \mu_1 + c_1\lambda_i'F_t + \iota'\lambda_i + \iota'F_t + \eta_{it1},$$
$$X_{it2} = \mu_2 + c_2\lambda_i'F_t + \iota'\lambda_i + \iota'F_t + \eta_{it2},$$

with $\iota = (1, 1)'$. The regressors are correlated with $\lambda_i$, $F_t$, and the product $\lambda_i'F_t$. The variables $\lambda_{ij}$, $F_{tj}$, and $\eta_{itj}$ are all i.i.d. N(0, 1), and the regression error $\varepsilon_{it}$ is i.i.d. N(0, 4). We set $\mu_1 = \mu_2 = c_1 = c_2 = 1$. Further, $x_i = \iota'\lambda_i + e_i$ and $w_t = \iota'F_t + \eta_t$, with $e_i$ and $\eta_t$ i.i.d. N(0, 1), so that $x_i$ is correlated with $\lambda_i$ and $w_t$ is correlated with $F_t$.
Simulation results are reported in Table I (based on 1000 repetitions). The infeasible estimator in this table treats $F_t$ as observable. Both the infeasible and the interactive-effects estimators are consistent, but the latter is less efficient than the former, as expected. The coefficients of the common regressors and the time-invariant regressors are estimated well. The within-group estimator can estimate only $\beta_1$ and $\beta_2$, and it is not reported.

TABLE I
MODELS WITH GRAND MEAN, TIME-INVARIANT REGRESSORS AND COMMON REGRESSORS (TWO FACTORS, r = 2)

                 β1 = 1          β2 = 3          μ = 5           γ = 2           δ = 4
   N     T    Mean    SD      Mean    SD      Mean    SD      Mean    SD      Mean    SD
Infeasible Estimator
 100    10   1.003  0.061    2.999  0.061    4.994  0.103    1.998  0.060    4.003  0.087
 100    20   1.001  0.039    2.998  0.041    5.002  0.065    2.000  0.040    4.000  0.054
 100    50   1.000  0.025    3.002  0.024    5.000  0.039    1.999  0.024    4.000  0.030
 100   100   1.000  0.017    3.000  0.017    5.000  0.029    1.999  0.017    3.999  0.020
  10   100   0.998  0.056    3.002  0.055    4.998  0.098    2.002  0.066    4.001  0.063
  20   100   1.000  0.039    2.998  0.039    5.000  0.064    2.002  0.040    3.999  0.046
  50   100   1.000  0.024    3.001  0.025    4.999  0.040    2.001  0.025    4.000  0.029
Interactive-Effects Estimator
 100    10   1.104  0.135    3.103  0.138    4.611  0.925    1.952  0.242    3.939  0.250
 100    20   1.038  0.083    3.036  0.084    4.856  0.524    1.996  0.104    3.989  0.114
 100    50   1.010  0.036    3.012  0.037    4.981  0.156    1.995  0.098    3.999  0.058
 100   100   1.006  0.032    3.006  0.033    4.992  0.115    1.996  0.066    3.997  0.061
  10   100   1.105  0.133    3.108  0.135    4.556  0.962    1.939  0.240    3.949  0.259
  20   100   1.038  0.083    3.037  0.084    4.859  0.479    1.991  0.109    3.996  0.082
  50   100   1.009  0.035    3.010  0.037    4.974  0.081    2.000  0.041    4.000  0.033
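The following Python/NumPy sketch draws one panel from this design and computes an interactive-effects estimate by alternating between the two first-order conditions that appear in Appendix A: given β, $\hat F$ is $\sqrt T$ times the eigenvectors of the r largest eigenvalues of $(NT)^{-1}\sum_i (Y_i - X_i\beta)(Y_i - X_i\beta)'$; given $\hat F$, $\hat\beta = (\sum_i X_i'M_{\hat F}X_i)^{-1}\sum_i X_i'M_{\hat F}Y_i$. It is a minimal sketch, not the code behind Table I. The function names, the pooled-OLS starting value, the first-pass removal of each unit's time mean (a starting-value heuristic assumed here so that the first $\hat F$ does not lock onto the constant direction $\iota_T$), the tolerance, and the iteration cap are all assumptions. A single draw says nothing about the sampling distribution that the table summarizes. The grand mean, $x_i$, and $w_t$ simply enter as extra columns of the regressor array.

```python
import numpy as np

def simulate_design(N, T, rng):
    """One draw from the design above; arrays are indexed (t, i)."""
    lam = rng.standard_normal((N, 2))              # lambda_i ~ N(0, I_2)
    F = rng.standard_normal((T, 2))                # F_t ~ N(0, I_2)
    common = F @ lam.T                             # (T, N) array of lambda_i'F_t
    iota_lam = lam.sum(axis=1)                     # iota'lambda_i
    iota_F = F.sum(axis=1)                         # iota'F_t
    base = common + iota_lam[None, :] + iota_F[:, None]
    X1 = 1.0 + base + rng.standard_normal((T, N))  # mu1 = c1 = 1
    X2 = 1.0 + base + rng.standard_normal((T, N))  # mu2 = c2 = 1
    x = iota_lam + rng.standard_normal(N)          # x_i = iota'lambda_i + e_i
    w = iota_F + rng.standard_normal(T)            # w_t = iota'F_t + eta_t
    ones = np.ones((T, N))
    # Columns: X_it1, X_it2, constant (grand mean), x_i, w_t
    X = np.stack([X1, X2, ones, ones * x[None, :], ones * w[:, None]], axis=2)
    beta0 = np.array([1.0, 3.0, 5.0, 2.0, 4.0])    # (beta1, beta2, mu, gamma, delta)
    eps = 2.0 * rng.standard_normal((T, N))        # N(0, 4) errors
    Y = X @ beta0 + common + eps
    return Y, X, beta0

def interactive_effects(Y, X, r, tol=1e-9, max_iter=2000):
    """Alternate the principal-components step for F and the LS step for beta."""
    T, N, K = X.shape
    # Starting value: pooled OLS that ignores the factors (an assumption)
    beta = np.linalg.lstsq(X.reshape(T * N, K), Y.reshape(T * N), rcond=None)[0]
    for it in range(max_iter):
        R = Y - X @ beta                           # column i is Y_i - X_i beta
        if it == 0:
            # Starting-value heuristic, not from the paper: drop each unit's time mean
            R = R - R.mean(axis=0, keepdims=True)
        _, vecs = np.linalg.eigh(R @ R.T / (N * T))        # eigenvalues ascending
        F = np.sqrt(T) * vecs[:, ::-1][:, :r]              # r leading eigenvectors; F'F/T = I_r
        M = np.eye(T) - F @ F.T / T                        # M_F, using F'F = T I_r
        MX = np.tensordot(M, X, axes=([1], [0]))           # M_F X_i for every i, (T, N, K)
        A = np.tensordot(X, MX, axes=([0, 1], [0, 1]))     # sum_i X_i' M_F X_i
        b = np.tensordot(MX, Y, axes=([0, 1], [0, 1]))     # sum_i X_i' M_F Y_i
        beta_new = np.linalg.solve(A, b)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, F

rng = np.random.default_rng(2009)
Y, X, beta0 = simulate_design(N=100, T=100, rng=rng)
beta_hat, F_hat = interactive_effects(Y, X, r=2)
print("true:    ", beta0)
print("estimate:", np.round(beta_hat, 3))
```

Each iteration lowers the sum of squared residuals, so the loop converges to a stationary point; in practice one would compare several starting values.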
We next investigate what happens when the interactive-effects estimator is used but the underlying effects are additive. That is, $\lambda_i = (\alpha_i, 1)'$ and $F_t = (1, \xi_t)'$, so that $\lambda_i'F_t = \alpha_i + \xi_t$. With regressors $X_{it1}$ and $X_{it2}$ generated by the earlier formulas, the model is

$$Y_{it} = X_{it1}\beta_1 + X_{it2}\beta_2 + \alpha_i + \xi_t + \varepsilon_{it}.$$

We consider three estimators: (i) the within-group estimator, (ii) the infeasible estimator, and (iii) the interactive-effects estimator. All three are consistent. The results are reported in Table II. The interactive-effects estimator remains valid under additive effects, but it is less efficient than the within-group estimator, as expected.
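The additive specification is simply a two-factor structure in which one loading and one factor are fixed at 1. The short Python/NumPy check below, using arbitrary simulated $\alpha_i$ and $\xi_t$ (sizes and seed are assumptions), confirms that $\lambda_i'F_t$ reproduces $\alpha_i + \xi_t$ and that the $N \times T$ matrix of additive effects has rank 2, so an interactive-effects model with r = 2 nests the additive model.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 40, 25
alpha = rng.standard_normal(N)               # individual effects alpha_i
xi = rng.standard_normal(T)                  # time effects xi_t
lam = np.column_stack([alpha, np.ones(N)])   # lambda_i = (alpha_i, 1)'
F = np.column_stack([np.ones(T), xi])        # F_t = (1, xi_t)'

interactive = lam @ F.T                      # (N, T) matrix of lambda_i'F_t
additive = alpha[:, None] + xi[None, :]      # (N, T) matrix of alpha_i + xi_t
print(np.allclose(interactive, additive), np.linalg.matrix_rank(additive))   # prints: True 2
```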
TABLE II
MODELS OF ADDITIVE EFFECTS

             Within-Group Estimator          Infeasible Estimator           Interactive-Effects Estimator
              β1 = 1          β2 = 3          β1              β2              β1              β2
   N     T  Mean    SD      Mean    SD      Mean    SD      Mean    SD      Mean    SD      Mean    SD
 100     3  1.002  0.146    2.997  0.144    1.001  0.208    2.998  0.206    1.155  0.253    3.164  0.259
 100     5  1.001  0.099    3.002  0.100    1.001  0.114    3.003  0.118    1.189  0.194    3.190  0.186
 100    10  1.000  0.068    2.996  0.066    1.000  0.072    2.995  0.072    1.110  0.167    3.106  0.167
 100    20  0.999  0.048    2.999  0.046    0.998  0.048    2.998  0.047    1.017  0.083    3.016  0.080
 100    50  1.001  0.029    2.999  0.029    1.001  0.029    2.999  0.029    1.003  0.029    3.000  0.029
 100   100  0.999  0.021    3.000  0.021    0.999  0.021    3.000  0.021    1.000  0.021    3.001  0.021
   3   100  1.001  0.142    2.995  0.143    1.002  0.113    2.996  0.116    1.163  0.240    3.165  0.251
   5   100  1.000  0.102    3.005  0.100    1.000  0.093    3.006  0.092    1.179  0.190    3.180  0.189
  10   100  1.000  0.069    2.999  0.069    1.001  0.066    2.999  0.065    1.106  0.167    3.106  0.164
  20   100  1.001  0.047    3.000  0.047    1.001  0.045    3.000  0.046    1.018  0.080    3.017  0.080
  50   100  0.998  0.030    3.002  0.029    0.998  0.030    3.002  0.028    1.000  0.030    3.004  0.029

Additional simulations are reported in the Supplemental Material, where we consider cross-sectionally correlated $\varepsilon_{it}$. Under cross-sectional correlation in $\varepsilon_{it}$ and with a fixed N, the interactive-effects estimator is inconsistent; it becomes consistent as N goes to infinity. These theoretical results are confirmed by the simulations. A primary use of the factor model in practice is to account for cross-sectional correlation. With a sufficient number of factors included, much of the correlation in the error terms will either be removed or be reduced, making the correlation a less critical issue. We also report results that include lagged dependent variables as regressors; the idea is to parametrize and control for serial correlation. The parameters are well estimated in the simulations. The interactive-effects estimator is effective under large N and large T. The computation is fast, and the bias decreases with N and T, as shown by the theory and confirmed in the simulations.

12. CONCLUDING REMARKS

In this paper, we have examined issues related to identification and inference for panel data models with interactive effects. In earnings studies, the interactive effects are a result of changing prices for a vector of unmeasured skills. The model can also be motivated from an optimal choice of consumption and labor supply for heterogeneous agents in a competitive economy with complete markets. In macroeconomics, interactive effects represent common shocks and their heterogeneous impacts on the cross-sectional units. In finance, the common factors represent marketwide risks and the loadings reflect assets' exposure to those risks. A factor model is also a useful approach to controlling cross-sectional correlations. This paper focuses on some of the underlying econometric issues.

We showed that the convergence rate of the interactive-effects estimator is $\sqrt{NT}$, and this rate holds in spite of correlations and heteroskedasticity in both dimensions. We also derived the bias-corrected estimator and estimators under additivity restrictions, together with their limiting distributions. We further studied the problem of testing additive effects against interactive effects. The interactive-effects estimator is easy to compute, and both the factor process $F_t$ and the factor loadings $\lambda_i$ can also be consistently estimated up to a rotation. Under interactive effects, we showed that the grand mean, the coefficients of time-invariant regressors, and the coefficients of common regressors are identifiable and can be consistently estimated.
A useful extension is the large N and large T dynamic panel data model with multiple interactive effects. The argument for consistency and the rate of convergence remains the same, but the asymptotic bias will take a different form. Another broad extension is nonstationary panel data analysis, particularly panel data cointegration, a subject that has recently attracted considerable attention. In this setup, $X_{it}$ is a vector of integrated variables, and $F_t$ can be either integrated or stationary. When $F_t$ is integrated, $Y_{it}$, $X_{it}$, and $F_t$ are cointegrated. Neglecting $F_t$ is then equivalent to a spurious regression, and the estimator of β will not be consistent. However, the interactive-effects approach can be applied by jointly estimating the unobserved common stochastic trends $F_t$ and the model coefficients, leading to consistent estimation. Finally, the models introduced in Section 10 (see Discussion) warrant further investigation.

APPENDIX A: PROOFS
We use the following facts throughout: $T^{-1}\|X_i\|^2 = T^{-1}\sum_{t=1}^T \|X_{it}\|^2 = O_p(1)$, or $T^{-1/2}\|X_i\| = O_p(1)$. Averaging over i, $(TN)^{-1}\sum_{i=1}^N \|X_i\|^2 = O_p(1)$. Similarly, $T^{-1/2}\|F^0\| = O_p(1)$, $T^{-1}\|\hat F\|^2 = r$, $T^{-1/2}\|\hat F\| = \sqrt r$, $T^{-1}\|X_i'F^0\| = O_p(1)$, and so forth. Throughout, we define $\delta_{NT} = \min[\sqrt N, \sqrt T]$, so that $\delta_{NT}^2 = \min[N, T]$. The proofs of the lemmas are given in the Supplemental Material.

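LEMMA A.1: Under Assumptions A–D,

$$\sup_F \Big\|\frac{1}{NT}\sum_{i=1}^N X_i'M_F\varepsilon_i\Big\| = o_p(1),$$

$$\sup_F \Big|\frac{1}{NT}\sum_{i=1}^N \lambda_i'F^{0\prime}M_F\varepsilon_i\Big| = o_p(1),$$

$$\sup_F \Big|\frac{1}{NT}\sum_{i=1}^N \varepsilon_i'P_F\varepsilon_i\Big| = o_p(1),$$

where the sup is taken with respect to F such that $F'F/T = I$.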

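PROOF OF PROPOSITION 1: Without loss of generality, assume $\beta^0 = 0$ (purely for notational simplicity). From $Y_i = X_i\beta^0 + F^0\lambda_i + \varepsilon_i = F^0\lambda_i + \varepsilon_i$, expanding $S_{NT}(\beta, F)$, we obtain

$$S_{NT}(\beta, F) = \tilde S_{NT}(\beta, F) + 2\beta'\frac{1}{NT}\sum_{i=1}^N X_i'M_F\varepsilon_i + 2\frac{1}{NT}\sum_{i=1}^N \lambda_i'F^{0\prime}M_F\varepsilon_i + \frac{1}{NT}\sum_{i=1}^N \varepsilon_i'(P_{F^0} - P_F)\varepsilon_i,$$

where

(38) $$\tilde S_{NT}(\beta, F) = \beta'\Big(\frac{1}{NT}\sum_{i=1}^N X_i'M_F X_i\Big)\beta + \operatorname{tr}\Big[\frac{F^{0\prime}M_F F^0}{T}\,\frac{\Lambda'\Lambda}{N}\Big] + 2\beta'\frac{1}{NT}\sum_{i=1}^N X_i'M_F F^0\lambda_i.$$

By Lemma A.1,

(39) $$S_{NT}(\beta, F) = \tilde S_{NT}(\beta, F) + o_p(1)$$

uniformly over bounded β and over F such that $F'F/T = I$. Boundedness of β is in fact not necessary, because the objective function is quadratic in β (that is, it is easy to argue that the objective function cannot achieve its minimum for very large β).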
Clearly, $\tilde S_{NT}(\beta^0, F^0H) = 0$ for any $r \times r$ invertible H, because $M_{F^0H} = M_{F^0}$ and $M_{F^0}F^0 = 0$. The identification restrictions implicitly fix an H. We next show that $\tilde S_{NT}(\beta, F) > 0$ for any $(\beta, F) \neq (\beta^0, F^0H)$; thus, $\tilde S_{NT}(\beta, F)$ attains its unique minimum value 0 at $(\beta^0, F^0H) = (0, F^0H)$. Define

$$A = \frac{1}{NT}\sum_{i=1}^N X_i'M_F X_i, \qquad B = \frac{1}{T}\Big(\frac{\Lambda'\Lambda}{N} \otimes I_T\Big), \qquad C = \frac{1}{NT}\sum_{i=1}^N (\lambda_i \otimes M_F X_i),$$

and let $\eta = \operatorname{vec}(M_F F^0)$. Then

$$\tilde S_{NT}(\beta, F) = \beta'A\beta + \eta'B\eta + 2\beta'C'\eta.$$

Completing the square, we have

$$\tilde S_{NT}(\beta, F) = \beta'(A - C'B^{-1}C)\beta + (\eta + B^{-1}C\beta)'B(\eta + B^{-1}C\beta) = \beta'D(F)\beta + \theta'B\theta,$$

where $\theta = \eta + B^{-1}C\beta$. By Assumption A, $D(F)$ is positive definite, and B is also positive definite, so $\tilde S_{NT}(\beta, F) \ge 0$. In addition, if either $\beta \neq \beta^0 = 0$ or $F \neq F^0H$, then $\tilde S_{NT}(\beta, F) > 0$. Thus, $\tilde S_{NT}(\beta, F)$ achieves its unique minimum at $(\beta^0, F^0H)$. Further, for $\|\beta\| \ge c > 0$, $\tilde S_{NT}(\beta, F) \ge \rho_{\min}c^2 > 0$, where $\rho_{\min}$ is the minimum eigenvalue of the positive definite matrix $\inf_F D(F)$. This implies that $\hat\beta$ is consistent for $\beta^0 = 0$. However, we cannot deduce that $\hat F$ is consistent for $F^0H$. This is because $F^0$ is $T \times r$, and as $T \to \infty$ the number of its elements goes to infinity, so the usual notion of consistency is not well defined. Other notions of consistency will be examined.
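The completing-the-square step is pure algebra, so it can be verified on arbitrary inputs. The Python/NumPy sketch below (random data, sizes, and seed are assumptions) evaluates $\tilde S_{NT}(\beta, F)$ directly from (38), again as the quadratic form in $(\beta, \eta)$, and again as $\beta'D(F)\beta + \theta'B\theta$. It takes $B = T^{-1}(\Lambda'\Lambda/N) \otimes I_T$, the scaling under which $\eta'B\eta$ reproduces the trace term in (38), and checks that $A - C'B^{-1}C$ equals $D(F)$ as defined in Section 10, along with the rotation invariance $M_{F^0H}F^0 = 0$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, p, r = 30, 20, 2, 2
X = rng.standard_normal((N, T, p))        # X[i] is X_i (T x p)
F0 = rng.standard_normal((T, r))          # true factors
Lam = rng.standard_normal((N, r))         # true loadings (rows lambda_i')
F = rng.standard_normal((T, r))           # an arbitrary candidate F
beta = rng.standard_normal(p)             # an arbitrary beta (beta0 = 0)

M = np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)   # M_F
G = Lam.T @ Lam / N

# Direct evaluation of (38)
A = sum(X[i].T @ M @ X[i] for i in range(N)) / (N * T)
cross = sum(X[i].T @ M @ F0 @ Lam[i] for i in range(N)) / (N * T)
S_direct = beta @ A @ beta + np.trace(F0.T @ M @ F0 / T @ G) + 2 * beta @ cross

# Quadratic form in (beta, eta); vec() stacks columns, hence order="F"
B = np.kron(G, np.eye(T)) / T
C = sum(np.kron(Lam[i][:, None], M @ X[i]) for i in range(N)) / (N * T)
eta = (M @ F0).flatten(order="F")
S_quad = beta @ A @ beta + eta @ B @ eta + 2 * beta @ C.T @ eta

# Completed square: beta'D(F)beta + theta'B theta
D = A - C.T @ np.linalg.solve(B, C)
theta = eta + np.linalg.solve(B, C @ beta)
S_square = beta @ D @ beta + theta @ B @ theta

# D(F) from its definition, with a_ik = lambda_i'(Lam'Lam/N)^{-1} lambda_k
a = Lam @ np.linalg.solve(G, Lam.T)
D_formula = A - sum(a[i, k] * X[i].T @ M @ X[k]
                    for i in range(N) for k in range(N)) / (N**2 * T)

# Rotation invariance: M_{F0 H} F0 = 0 for an invertible H
H = rng.standard_normal((r, r))
F0H = F0 @ H
M_rot = np.eye(T) - F0H @ np.linalg.solve(F0H.T @ F0H, F0H.T)
print(S_direct, S_quad, S_square, np.abs(M_rot @ F0).max())
```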
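To prove part (ii), note that the centered objective function satisfies $S_{NT}(\beta^0, F^0) = 0$ and, by definition, $S_{NT}(\hat\beta, \hat F) \le 0$. Therefore, in view of (39),

$$0 \ge S_{NT}(\hat\beta, \hat F) = \tilde S_{NT}(\hat\beta, \hat F) + o_p(1).$$

Combined with $\tilde S_{NT}(\hat\beta, \hat F) \ge 0$, this implies

$$\tilde S_{NT}(\hat\beta, \hat F) = o_p(1).$$

From $\hat\beta \xrightarrow{p} \beta^0 = 0$ and (38), it follows that

$$\operatorname{tr}\Big[\frac{F^{0\prime}M_{\hat F}F^0}{T}\,\frac{\Lambda'\Lambda}{N}\Big] = o_p(1).$$

Because $\Lambda'\Lambda/N > 0$ and $F^{0\prime}M_{\hat F}F^0/T \ge 0$, the above implies that the latter matrix is $o_p(1)$, that is,

(40) $$\frac{F^{0\prime}M_{\hat F}F^0}{T} = \frac{F^{0\prime}F^0}{T} - \frac{F^{0\prime}\hat F}{T}\,\frac{\hat F'F^0}{T} = o_p(1).$$

By Assumption B, $F^{0\prime}F^0/T$ is invertible, so it follows that $F^{0\prime}\hat F/T$ is invertible. Next,

$$\|P_{\hat F} - P_{F^0}\|^2 = \operatorname{tr}[(P_{\hat F} - P_{F^0})^2] = 2\operatorname{tr}(I_r - \hat F'P_{F^0}\hat F/T).$$

But (40) implies $\hat F'P_{F^0}\hat F/T \xrightarrow{p} I_r$, which is equivalent to $\|P_{\hat F} - P_{F^0}\| \xrightarrow{p} 0$.
Q.E.D.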
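The distance identity used in the last step holds for any $\hat F$ normalized so that $\hat F'\hat F/T = I_r$ and any full-rank $F^0$. The Python/NumPy sketch below (random inputs and sizes are assumptions) checks $\|P_{\hat F} - P_{F^0}\|^2 = 2\operatorname{tr}(I_r - \hat F'P_{F^0}\hat F/T)$ with the Frobenius norm, and confirms that the distance is zero when $\hat F$ spans the same space as $F^0$.

```python
import numpy as np

def proj(F):
    """P_F = F (F'F)^{-1} F'."""
    return F @ np.linalg.solve(F.T @ F, F.T)

rng = np.random.default_rng(5)
T, r = 50, 3
F0 = rng.standard_normal((T, r))
Q, _ = np.linalg.qr(rng.standard_normal((T, r)))    # orthonormal columns
F_hat = np.sqrt(T) * Q                               # F_hat'F_hat/T = I_r

lhs = np.linalg.norm(proj(F_hat) - proj(F0)) ** 2    # squared Frobenius norm
rhs = 2 * np.trace(np.eye(r) - F_hat.T @ proj(F0) @ F_hat / T)
print(lhs, rhs)

# Same column space as F0 (i.e., F0 H for some invertible H): the distance vanishes
Q0, _ = np.linalg.qr(F0)
dist_same_span = np.linalg.norm(proj(np.sqrt(T) * Q0) - proj(F0)) ** 2
print(dist_same_span)
```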

Note that for any positive definite matrices A and B, the eigenvalues of AB are the same as those of BA, $A^{1/2}BA^{1/2}$, and so forth; therefore, all of these eigenvalues are positive. In all remaining proofs, β and $\beta^0$ are used interchangeably, and so are F and $F^0$.
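This eigenvalue fact is what makes, for example, the eigenvalues of $\Sigma_\Lambda\Sigma_F$ real and positive even though the product itself is not symmetric. The Python/NumPy sketch below checks it on two random positive definite matrices; the construction of the matrices and the helper names are assumptions made for the illustration.

```python
import numpy as np

def random_pd(n, rng):
    """A random symmetric positive definite matrix."""
    Z = rng.standard_normal((n, n))
    return Z @ Z.T + n * np.eye(n)

def sqrt_pd(A):
    """Symmetric square root A^{1/2} of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(6)
A, B = random_pd(4, rng), random_pd(4, rng)
ev_AB = np.sort(np.linalg.eigvals(A @ B).real)       # AB is not symmetric
ev_BA = np.sort(np.linalg.eigvals(B @ A).real)
A_half = sqrt_pd(A)
ev_sym = np.linalg.eigvalsh(A_half @ B @ A_half)     # symmetric; sorted ascending
print(ev_AB, ev_BA, ev_sym)
```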

PROPOSITION A.1: Under Assumptions A–D, we can make the following statements:
(i) $V_{NT}$ is invertible and $V_{NT} \xrightarrow{p} V$, where V ($r \times r$) is a diagonal matrix consisting of the eigenvalues of $\Sigma_\Lambda\Sigma_F$; $V_{NT}$ is defined in (12).
(ii) Let $H = (\Lambda'\Lambda/N)(F^{0\prime}\hat F/T)V_{NT}^{-1}$. Then H is an $r \times r$ invertible matrix and

$$\frac1T\|\hat F - F^0H\|^2 = \frac1T\sum_{t=1}^T \|\hat F_t - H'F_t^0\|^2 = O_p(\|\hat\beta - \beta\|^2) + O_p\Big(\frac{1}{\min[N, T]}\Big).$$

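PROOF: From

$$\frac{1}{NT}\sum_{i=1}^N (Y_i - X_i\hat\beta)(Y_i - X_i\hat\beta)'\hat F = \hat F V_{NT}$$

and $Y_i - X_i\hat\beta = X_i(\beta - \hat\beta) + F^0\lambda_i + \varepsilon_i$, by expanding terms, we obtain

$$\begin{aligned}\hat F V_{NT} ={}& \frac{1}{NT}\sum_{i=1}^N X_i(\beta-\hat\beta)(\beta-\hat\beta)'X_i'\hat F + \frac{1}{NT}\sum_{i=1}^N X_i(\beta-\hat\beta)\lambda_i'F^{0\prime}\hat F \\ &+ \frac{1}{NT}\sum_{i=1}^N X_i(\beta-\hat\beta)\varepsilon_i'\hat F + \frac{1}{NT}\sum_{i=1}^N F^0\lambda_i(\beta-\hat\beta)'X_i'\hat F \\ &+ \frac{1}{NT}\sum_{i=1}^N \varepsilon_i(\beta-\hat\beta)'X_i'\hat F + \frac{1}{NT}\sum_{i=1}^N F^0\lambda_i\varepsilon_i'\hat F + \frac{1}{NT}\sum_{i=1}^N \varepsilon_i\lambda_i'F^{0\prime}\hat F \\ &+ \frac{1}{NT}\sum_{i=1}^N \varepsilon_i\varepsilon_i'\hat F + \frac{1}{NT}\sum_{i=1}^N F^0\lambda_i\lambda_i'F^{0\prime}\hat F \\ ={}& I1 + \cdots + I9.\end{aligned}$$

The last term on the right is equal to $F^0(\Lambda'\Lambda/N)(F^{0\prime}\hat F/T)$. Letting I1–I8 denote the first eight terms on the right, the above can be rewritten as

(41) $$\hat F V_{NT} - F^0(\Lambda'\Lambda/N)(F^{0\prime}\hat F/T) = I1 + \cdots + I8.$$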

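Right-multiplying each side of (41) by $(F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}$, we obtain

(42) $$\hat F\big[V_{NT}(F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}\big] - F^0 = (I1 + \cdots + I8)(F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}.$$

Note that the matrix $V_{NT}(F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}$ is equal to $H^{-1}$, but the invertibility of $V_{NT}$ has not been proved yet. We have

$$T^{-1/2}\big\|\hat F\big[V_{NT}(F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}\big] - F^0\big\| \le T^{-1/2}(\|I1\| + \cdots + \|I8\|)\cdot\big\|(F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}\big\|.$$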

Consider each term on the right. For the first term, note that $T^{-1/2}\|\hat F\| = \sqrt r$ and

$$T^{-1/2}\|I1\| \le \|\hat\beta - \beta\|^2\Big(\frac1N\sum_{i=1}^N \frac{\|X_i\|^2}{T}\Big)\sqrt r = O_p(\|\hat\beta - \beta\|^2) = o_p(\|\hat\beta - \beta\|)$$

because $\|\hat\beta - \beta\| = o_p(1)$. Using the same argument, it is easy to prove that the next four terms (I2–I5) are each $O_p(\|\hat\beta - \beta\|)$. The last three terms do not explicitly depend on $\hat\beta - \beta$, and they have the same expressions as those in Bai and Ng (2002). Each of these terms is $O_p(1/\min[\sqrt N, \sqrt T])$, which was proved in Bai and Ng (2002, Theorem 1). The proof there uses only the property that $\hat F'\hat F/T = I$ and the assumptions on $\varepsilon_i$; thus the proof needs no modification. In summary, we have

(43) $$T^{-1/2}\big\|\hat F V_{NT}(F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1} - F^0\big\| = O_p(\|\hat\beta - \beta\|) + O_p(1/\min[\sqrt N, \sqrt T]).$$

(i) Left-multiplying (41) by $T^{-1}\hat F'$ and using $\hat F'\hat F/T = I_r$, we have

$$V_{NT} - (\hat F'F^0/T)(\Lambda'\Lambda/N)(F^{0\prime}\hat F/T) = T^{-1}\hat F'(I1 + \cdots + I8) = o_p(1)$$

because $T^{-1/2}\|\hat F\| = \sqrt r$ and $T^{-1/2}\|I1 + \cdots + I8\| = o_p(1)$. Thus

$$V_{NT} = (\hat F'F^0/T)(\Lambda'\Lambda/N)(F^{0\prime}\hat F/T) + o_p(1).$$

Proposition 1 shows that $\hat F'F^0/T$ is invertible; thus $V_{NT}$ is invertible. To obtain the limit of $V_{NT}$, left-multiply (41) by $F^{0\prime}$ and then divide by T to yield

$$(F^{0\prime}F^0/T)(\Lambda'\Lambda/N)(F^{0\prime}\hat F/T) + o_p(1) = (F^{0\prime}\hat F/T)V_{NT}$$

because $T^{-1}F^{0\prime}(I1 + \cdots + I8) = o_p(1)$. The above equality shows that the columns of $F^{0\prime}\hat F/T$ are the (nonnormalized) eigenvectors of the matrix $(F^{0\prime}F^0/T)(\Lambda'\Lambda/N)$, and $V_{NT}$ consists of the eigenvalues of the same matrix (in the limit). Thus $V_{NT} \xrightarrow{p} V$, where V is $r \times r$, consisting of the r eigenvalues of the matrix $\Sigma_F\Sigma_\Lambda$.
(ii) Since $V_{NT}$ is invertible, the left-hand side of (43) can be written as $T^{-1/2}\|\hat FH^{-1} - F^0\|$; because $\|H\| = O_p(1)$, (43) implies

$$T^{-1/2}\|\hat F - F^0H\| = O_p(\|\hat\beta - \beta\|) + O_p(1/\min[\sqrt N, \sqrt T]).$$

Taking squares on each side gives part (ii). Note that the cross-product term from expanding the square has the same bound.
Q.E.D.
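The objects in Proposition A.1 can be computed directly in a simulation. The Python/NumPy sketch below treats β as known, so that $Y_i - X_i\hat\beta = F^0\lambda_i + \varepsilon_i$ and only the principal-components step matters; the sample sizes, the N(0, 1) draws, and the seed are assumptions. It forms $\hat F$ and $V_{NT}$ from the eigenvalue equation at the start of the proof, builds $H = (\Lambda'\Lambda/N)(F^{0\prime}\hat F/T)V_{NT}^{-1}$, and reports $T^{-1}\|\hat F - F^0H\|^2$, the diagonal of $V_{NT}$ next to the eigenvalues of $(F^{0\prime}F^0/T)(\Lambda'\Lambda/N)$, and the gap between $HH'$ and $(F^{0\prime}F^0/T)^{-1}$ (compare Lemma A.7(i)).

```python
import numpy as np

rng = np.random.default_rng(11)
N, T, r = 200, 200, 2
F0 = rng.standard_normal((T, r))
Lam = rng.standard_normal((N, r))
# With beta treated as known, column i of R is Y_i - X_i beta = F0 lambda_i + eps_i
R = F0 @ Lam.T + rng.standard_normal((T, N))

vals, vecs = np.linalg.eigh(R @ R.T / (N * T))     # ascending eigenvalues
F_hat = np.sqrt(T) * vecs[:, ::-1][:, :r]          # F_hat'F_hat/T = I_r
V_NT = np.diag(vals[::-1][:r])                     # r largest eigenvalues, decreasing

H = (Lam.T @ Lam / N) @ (F0.T @ F_hat / T) @ np.linalg.inv(V_NT)
fit_error = np.linalg.norm(F_hat - F0 @ H) ** 2 / T              # T^{-1}||F_hat - F0 H||^2
limit_eigs = np.sort(np.linalg.eigvals((F0.T @ F0 / T) @ (Lam.T @ Lam / N)).real)[::-1]

print("T^-1 ||F_hat - F0 H||^2:", fit_error)
print("diag(V_NT):             ", np.diag(V_NT))
print("eig((F0'F0/T)(L'L/N)):  ", limit_eigs)
print("max |HH' - (F0'F0/T)^-1|:", np.abs(H @ H.T - np.linalg.inv(F0.T @ F0 / T)).max())
```

The sign of each eigenvector is arbitrary, but H absorbs any sign flip, so all printed quantities are unaffected by it.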

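The proofs of the next four lemmas are given in the Supplemental Material.

LEMMA A.2: Under Assumptions A–C, there exists an $M < \infty$ such that statements (i) and (ii) hold:
(i) We have

$$E\Big\|\frac{1}{\sqrt N}\sum_{k=1}^N \frac1T\sum_{t=1}^T\sum_{s=1}^T F_sF_t'[\varepsilon_{kt}\varepsilon_{ks} - E(\varepsilon_{kt}\varepsilon_{ks})]\Big\|^2 \le M.$$

(ii) For all $i = 1, \ldots, N$ and $h = 1, \ldots, r$, we have

$$E\Big\|\frac{1}{\sqrt N}\sum_{k=1}^N \frac1T\sum_{t=1}^T\sum_{s=1}^T X_{it}[\varepsilon_{kt}\varepsilon_{ks} - E(\varepsilon_{kt}\varepsilon_{ks})]F_{hs}\Big\|^2 \le M.$$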

LEMMA A.3: Under Assumptions A–D, we have four equalities:

(i) $T^{-1}F^{0\prime}(\hat F - F^0H) = O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-2})$.
(ii) $T^{-1}\hat F'(\hat F - F^0H) = O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-2})$.
(iii) $T^{-1}X_k'(\hat F - F^0H) = O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-2})$ for each $k = 1, \ldots, N$.
(iv) $\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}(\hat F - F^0H) = O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-2})$.

LEMMA A.4: Under Assumptions A–D, we also have four equalities:

(i) $T^{-1}\varepsilon_k'(\hat F - F^0H) = T^{-1/2}O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-2})$ for each k.
(ii) $\frac{1}{T\sqrt N}\sum_{k=1}^N \varepsilon_k'(\hat F - F^0H) = T^{-1/2}O_p(\hat\beta - \beta) + N^{-1/2}O_p(\hat\beta - \beta) + O_p(N^{-1/2}) + O_p(\delta_{NT}^{-2})$.
(iii) $\frac{1}{NT}\sum_{k=1}^N \lambda_k\varepsilon_k'(\hat FH^{-1} - F^0) = (NT)^{-1/2}O_p(\hat\beta - \beta) + O_p(N^{-1}) + N^{-1/2}O_p(\delta_{NT}^{-2})$.
(iv) $\frac{1}{NT}\sum_{k=1}^N (X_k'F^0/T)(F^{0\prime}F^0/T)(\hat FH^{-1} - F^0)'\varepsilon_k = \frac{1}{N^2}\sum_{i=1}^N\sum_{k=1}^N (X_k'F^0/T)(F^{0\prime}F^0/T)(\Lambda'\Lambda/N)^{-1}\lambda_i\Big(\frac1T\sum_{t=1}^T \varepsilon_{it}\varepsilon_{kt}\Big) + (NT)^{-1/2}O_p(\hat\beta - \beta) + N^{-1/2}O_p(\delta_{NT}^{-2})$.

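LEMMA A.5: Let $G = (F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}$. Under Assumptions A–D, we have

$$\frac{1}{N^2T^2}\sum_{i=1}^N\sum_{k=1}^N X_i'M_{\hat F}(\varepsilon_k\varepsilon_k' - \Omega_k)\hat FG\lambda_i = O_p\Big(\frac{1}{T\sqrt N}\Big) + (NT)^{-1/2}\big[O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-1})\big] + \frac{1}{\sqrt N}O_p(\|\hat\beta - \beta\|^2) + \frac{1}{\sqrt N}O_p(\delta_{NT}^{-2}).$$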

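PROPOSITION A.2: Suppose Assumptions A–D hold. If $T/N^2 \to 0$, then

$$\sqrt{NT}(\hat\beta - \beta^0) = D(\hat F)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^N\Big[X_i'M_{\hat F} - \frac1N\sum_{k=1}^N a_{ik}X_k'M_{\hat F}\Big]\varepsilon_i + \sqrt{\frac NT}\,\zeta_{NT} + o_p(1),$$

where $a_{ik} = \lambda_i'(\Lambda'\Lambda/N)^{-1}\lambda_k$ and

(44) $$\zeta_{NT} = -D(\hat F)^{-1}\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}\Omega\hat F\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i,$$

with $\Omega = \frac1N\sum_{k=1}^N \Omega_k$ and $\Omega_k = E(\varepsilon_k\varepsilon_k')$.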

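PROOF: From $Y_i = X_i\beta^0 + F^0\lambda_i + \varepsilon_i$,

$$\hat\beta - \beta^0 = \Big(\sum_{i=1}^N X_i'M_{\hat F}X_i\Big)^{-1}\sum_{i=1}^N X_i'M_{\hat F}F^0\lambda_i + \Big(\sum_{i=1}^N X_i'M_{\hat F}X_i\Big)^{-1}\sum_{i=1}^N X_i'M_{\hat F}\varepsilon_i,$$

or

(45) $$\Big(\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}X_i\Big)(\hat\beta - \beta) = \frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}F^0\lambda_i + \frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}\varepsilon_i.$$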

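In view of $M_{\hat F}\hat F = 0$, we have $M_{\hat F}F^0 = M_{\hat F}(F^0 - \hat FA)$ for any A. Choosing $A = H^{-1}$, from (42) we get

$$F^0 - \hat FH^{-1} = -[I1 + \cdots + I8](F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}.$$

It follows that

$$\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}F^0\lambda_i = -\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}[I1 + \cdots + I8]\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i = J1 + \cdots + J8,$$

where J1–J8 are implicitly defined vis-à-vis I1–I8. For example,

$$J1 = -\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}(I1)\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i.$$

Term J1 is bounded in norm by $O_p(1)\|\hat\beta - \beta\|^2$, and thus $J1 = o_p(1)(\hat\beta - \beta)$.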

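Consider

$$J2 = -\frac{1}{N^2T}\sum_{i=1}^N X_i'M_{\hat F}\Big[\sum_{k=1}^N X_k(\beta - \hat\beta)\lambda_k'\Big]\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i = \frac{1}{N^2T}\sum_{i=1}^N\sum_{k=1}^N (X_i'M_{\hat F}X_k)\,\lambda_k'\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i\,(\hat\beta - \beta) = \frac{1}{N^2T}\sum_{i=1}^N\sum_{k=1}^N X_i'M_{\hat F}X_k\,a_{ik}(\hat\beta - \beta),$$

where $a_{ik} = \lambda_i'(\Lambda'\Lambda/N)^{-1}\lambda_k$ is a scalar and thus commutes with $\hat\beta - \beta$. Now consider

$$J3 = \frac{1}{N^2T}\sum_{i=1}^N\sum_{k=1}^N X_i'M_{\hat F}X_k\Big[\frac{\varepsilon_k'\hat F}{T}\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i\Big](\hat\beta - \beta).$$

Writing $\varepsilon_k'\hat F/T = \varepsilon_k'F^0H/T + \varepsilon_k'(\hat F - F^0H)/T = O_p(T^{-1/2}) + O_p(\hat\beta - \beta) + O_p(1/\min[\sqrt N, \sqrt T])$ by Lemma A.4, it is easy to see that $J3 = o_p(1)(\hat\beta - \beta)$.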

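Next,

$$J4 = -\frac{1}{N^2T}\sum_{i=1}^N\sum_{k=1}^N X_i'M_{\hat F}F^0\lambda_k(\beta - \hat\beta)'\frac{X_k'\hat F}{T}\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i.$$

Writing $M_{\hat F}F^0 = M_{\hat F}(F^0 - \hat FH^{-1})$ and using the fact that $T^{-1/2}\|F^0 - \hat FH^{-1}\|$ is small, we see that J4 is $o_p(1)(\hat\beta - \beta)$. It is similarly easy to show that $J5 = o_p(1)(\hat\beta - \beta)$, so the details are omitted.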
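The last three terms, J6–J8, do not explicitly depend on $\hat\beta - \beta$. Term J7 drives the limiting distribution of $\hat\beta - \beta$, and J8 contributes, in addition, the bias term $A_{NT}$ defined below; the remaining parts of J6 and J8 are of smaller order. We shall establish these claims. Consider

$$J6 = -\frac{1}{N^2T}\sum_{i=1}^N\sum_{k=1}^N X_i'M_{\hat F}F^0\lambda_k\frac{\varepsilon_k'\hat F}{T}\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i.$$

Denote $G = (F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}$ for the moment; it is a matrix of fixed dimension and does not vary with i. Using $M_{\hat F}F^0 = M_{\hat F}(F^0 - \hat FH^{-1})$, we can write

$$J6 = -\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}(F^0 - \hat FH^{-1})\Big(\frac1N\sum_{k=1}^N \lambda_k\frac{\varepsilon_k'\hat F}{T}\Big)G\lambda_i.$$

Now

$$\frac{1}{NT}\sum_{k=1}^N \lambda_k\varepsilon_k'\hat F = \frac{1}{NT}\sum_{k=1}^N \lambda_k\varepsilon_k'F^0H + \frac{1}{NT}\sum_{k=1}^N \lambda_k\varepsilon_k'(\hat F - F^0H)$$
$$= O_p\Big(\frac{1}{\sqrt{NT}}\Big) + (NT)^{-1/2}O_p(\hat\beta - \beta) + O_p(N^{-1}) + N^{-1/2}O_p(\delta_{NT}^{-2}) = O_p\Big(\frac{1}{\sqrt{NT}}\Big) + O_p(N^{-1}) + N^{-1/2}O_p(\delta_{NT}^{-2})$$

by Lemma A.4(iii). The last equality holds because $(NT)^{-1/2}$ dominates $(NT)^{-1/2}(\hat\beta - \beta)$. Furthermore, by Lemma A.3, $\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}(\hat F - F^0H)\lambda_{i\ell} = O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-2})$ for $\ell = 1, \ldots, r$; noting that G does not depend on i and $\|G\| = O_p(1)$, we have

$$J6 = \big[O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-2})\big]\times\Big[O_p\Big(\frac{1}{\sqrt{NT}}\Big) + O_p(N^{-1}) + N^{-1/2}O_p(\delta_{NT}^{-2})\Big]$$
$$= o_p(\hat\beta - \beta) + o_p\Big(\frac{1}{\sqrt{NT}}\Big) + O_p(\delta_{NT}^{-2})N^{-1} + N^{-1/2}O_p(\delta_{NT}^{-4}).$$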

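The term J7 is simply

$$J7 = -\frac{1}{N^2T}\sum_{i=1}^N\sum_{k=1}^N X_i'M_{\hat F}\varepsilon_k\lambda_k'\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i = -\frac{1}{N^2T}\sum_{i=1}^N\sum_{k=1}^N a_{ik}X_i'M_{\hat F}\varepsilon_k.$$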

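Next consider J8, which has the expression

$$J8 = -\frac{1}{N^2T^2}\sum_{i=1}^N\sum_{k=1}^N X_i'M_{\hat F}\varepsilon_k\varepsilon_k'\hat F\Big(\frac{F^{0\prime}\hat F}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i.$$

Let $E(\varepsilon_k\varepsilon_k') = \Omega_k$ ($T \times T$). Denoting $G = (F^{0\prime}\hat F/T)^{-1}(\Lambda'\Lambda/N)^{-1}$, with $\|G\| = O_p(1)$, and rewriting gives

(46) $$J8 = -\frac{1}{N^2T^2}\sum_{i=1}^N\sum_{k=1}^N X_i'M_{\hat F}\Omega_k\hat FG\lambda_i - \frac{1}{N^2T^2}\sum_{i=1}^N\sum_{k=1}^N X_i'M_{\hat F}(\varepsilon_k\varepsilon_k' - \Omega_k)\hat FG\lambda_i.$$

Denote the first term on the right by $A_{NT}$. By Lemma A.5, we have

$$J8 = A_{NT} + O_p\Big(\frac{1}{T\sqrt N}\Big) + \frac{1}{\sqrt{NT}}\big[O_p(\hat\beta - \beta) + O_p(\delta_{NT}^{-1})\big] + \frac{1}{\sqrt N}O_p(\|\hat\beta - \beta\|^2) + \frac{1}{\sqrt N}O_p(\delta_{NT}^{-2}).$$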

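Collecting terms from J1 to J8, with dominated terms ignored, gives

$$\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}F^0\lambda_i = J2 + J7 + A_{NT} + o_p(\hat\beta - \beta) + o_p\big((NT)^{-1/2}\big) + O_p\Big(\frac{1}{T\sqrt N}\Big) + N^{-1/2}O_p(\delta_{NT}^{-2}).$$

Thus, by (45),

$$\Big[\frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}X_i + o_p(1)\Big](\hat\beta - \beta) - J2 = \frac{1}{NT}\sum_{i=1}^N X_i'M_{\hat F}\varepsilon_i + J7 + A_{NT} + o_p\big((NT)^{-1/2}\big) + O_p\Big(\frac{1}{T\sqrt N}\Big) + N^{-1/2}O_p(\delta_{NT}^{-2}).$$

Combining terms and multiplying by $\sqrt{NT}$ yields

$$[D(\hat F) + o_p(1)]\sqrt{NT}(\hat\beta - \beta) = \frac{1}{\sqrt{NT}}\sum_{i=1}^N\Big[X_i'M_{\hat F} - \frac1N\sum_{k=1}^N a_{ik}X_k'M_{\hat F}\Big]\varepsilon_i + \sqrt{NT}A_{NT} + o_p(1) + O_p(T^{-1/2}) + T^{1/2}O_p(\delta_{NT}^{-2}).$$

Thus, if $T/N^2 \to 0$, the last term is also $o_p(1)$. Multiply each side of the above by $D(\hat F)^{-1}$ and note that $D(\hat F)^{-1}\sqrt{NT}A_{NT} = \sqrt{N/T}\,\zeta_{NT}$. Finally, $D(\hat F)^{-1}[D(\hat F) + o_p(1)] = I + o_p(1)$, so we have proved the proposition.
Q.E.D.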

LEMMA A.6: Under Assumptions A–D, $\zeta_{NT} = O_p(1)$, where $\zeta_{NT}$ is given in Proposition A.2.

LEMMA A.7: Under Assumptions A–D, we have the following equalities:

(i) $HH' = (F^{0\prime}F^0/T)^{-1} + O_p(\|\hat\beta - \beta\|) + O_p(\delta_{NT}^{-2})$.
(ii) $\|P_{\hat F} - P_{F^0}\|^2 = O_p(\|\hat\beta - \beta\|) + O_p(\delta_{NT}^{-2})$.

Proposition A.2 still involves the estimated F. To replace $\hat F$ by $F^0$, we need some preliminary results.

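LEMMA A.8: Under Assumptions A–D,

$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N\Big[X_i'M_{\hat F} - \frac1N\sum_{k=1}^N a_{ik}X_k'M_{\hat F}\Big]\varepsilon_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N\Big[X_i'M_{F^0} - \frac1N\sum_{k=1}^N a_{ik}X_k'M_{F^0}\Big]\varepsilon_i + \sqrt{\frac TN}\,\xi_{NT}^\dagger + \sqrt T\,O_p(\|\hat\beta - \beta^0\|^2) + O_p(\|\hat\beta - \beta^0\|) + \sqrt T\,O_p(\delta_{NT}^{-2}),$$

where, with $V_i = \frac1N\sum_{k=1}^N a_{ik}X_k$,

(47) $$\xi_{NT}^\dagger = -\frac1N\sum_{i=1}^N\sum_{k=1}^N \frac{(X_i - V_i)'F^0}{T}\Big(\frac{F^{0\prime}F^0}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_k\,\frac1T\sum_{t=1}^T \varepsilon_{it}\varepsilon_{kt} = O_p(1).$$

Combining Proposition A.2 and Lemma A.8, and noting that $\sqrt T\,O_p(\|\hat\beta - \beta^0\|^2) + O_p(\|\hat\beta - \beta^0\|)$ is dominated by $\sqrt{NT}(\hat\beta - \beta^0)$ and that $\sqrt T\,O_p(\delta_{NT}^{-2}) = o_p(1)$ if $T/N^2 \to 0$, we obtain the following corollary.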

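COROLLARY A.1: Under Assumptions A–D and as $T/N^2 \to 0$,

$$\sqrt{NT}(\hat\beta - \beta^0) = D(\hat F)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^N\Big[X_i'M_{F^0} - \frac1N\sum_{k=1}^N a_{ik}X_k'M_{F^0}\Big]\varepsilon_i + \sqrt{\frac TN}\,\xi_{NT} + \sqrt{\frac NT}\,\zeta_{NT} + o_p(1),$$

where $\xi_{NT} = D(\hat F)^{-1}\xi_{NT}^\dagger$, $\xi_{NT}^\dagger$ is defined in (47), and $\zeta_{NT}$ is given in (44).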

PROOF OF THEOREM 1: The assumption implies that both T/N and N/T are O(1). Furthermore, $\zeta_{NT} = O_p(1)$ by Lemma A.6, and $\xi_{NT}^\dagger$, and hence $\xi_{NT}$, are $O_p(1)$ by Lemma A.8. The theorem follows from the expression for $\sqrt{NT}(\hat\beta - \beta)$ given in Corollary A.1. Q.E.D.

LEMMA A.9: Under Assumptions A–D, the following equalities hold:

(i) $D(\hat F)^{-1} - D(F^0)^{-1} = o_p(1)$.
(ii) $\sqrt{T/N}\,[D(\hat F)^{-1} - D(F^0)^{-1}] = o_p(1)$ if $T/N^2 \to 0$.
(iii) $\sqrt{N/T}\,[D(\hat F)^{-1} - D(F^0)^{-1}] = o_p(1)$ if $N/T^2 \to 0$.
(iv) $\sqrt{T/N}\,(\xi_{NT} - B) = o_p(1)$ if $T/N^2 \to 0$, where B is given in (18).
(v) $\sqrt{N/T}\,(\zeta_{NT} - C) = o_p(1)$ if $N/T^2 \to 0$, where C is given in (19).

PROPOSITION A.3: Under Assumptions A–D, if $T/N^2 \to 0$ and $N/T^2 \to 0$, then

$$\sqrt{NT}(\hat\beta - \beta^0) = D(F^0)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^N\Big[X_i'M_{F^0} - \frac1N\sum_{k=1}^N a_{ik}X_k'M_{F^0}\Big]\varepsilon_i + \sqrt{\frac TN}\,B + \sqrt{\frac NT}\,C + o_p(1),$$

where B and C are given in (18) and (19), respectively.

In this representation, the right-hand side involves no estimated quantities.

PROOF OF PROPOSITION A.3: From Lemma A.9(i), $D(\hat F)^{-1} = D(F^0)^{-1} + o_p(1)$, so the matrix $D(\hat F)^{-1}$ in Corollary A.1 can be replaced by $D(F^0)^{-1}$, since $\frac{1}{\sqrt{NT}}\sum_{i=1}^N\big[X_i'M_{F^0} - \frac1N\sum_{k=1}^N a_{ik}X_k'M_{F^0}\big]\varepsilon_i = O_p(1)$. Parts (iv) and (v) of Lemma A.9, together with Corollary A.1, then immediately lead to the proposition. Q.E.D.

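PROOF OF THEOREM 2: (i) We use the representation in Proposition A.3. Without serial correlation or heteroskedasticity over time, C in Proposition A.3 is zero. This follows from $\Omega_k = \sigma_k^2 I_T$ and $M_{F^0}\Omega F^0 = \big(\frac1N\sum_{k=1}^N \sigma_k^2\big)M_{F^0}F^0 = 0$. Furthermore, $\sqrt{T/N}\,B \xrightarrow{p} 0$, since $B = O_p(1)$ and $T/N \to 0$. Thus, by Proposition A.3,

(48) $$\sqrt{NT}(\hat\beta - \beta^0) = D(F^0)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^N Z_i'\varepsilon_i + o_p(1),$$

where $Z_i = M_{F^0}X_i - \frac1N\sum_{k=1}^N a_{ik}M_{F^0}X_k$. The limiting distribution now follows from Assumption E.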

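(ii) The proof again uses the representation in Proposition A.3. From $N/T \to 0$, we have $\sqrt{N/T}\,C \to 0$. We next argue that B = 0 when cross-section correlation and heteroskedasticity are absent. Recall that $\sigma_{ik,tt} = E(\varepsilon_{it}\varepsilon_{kt})$. By assumption, $\sigma_{ik,tt} = \sigma_t^2$ for $i = k$ and $\sigma_{ik,tt} = 0$ for $i \neq k$. Thus B in (18) simplifies to

$$B = -D(F^0)^{-1}\frac1N\sum_{i=1}^N \frac{(X_i - V_i)'F^0}{T}\Big(\frac{F^{0\prime}F^0}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i\,\bar\sigma^2,$$

where $\bar\sigma^2 = T^{-1}\sum_{t=1}^T \sigma_t^2$. From $V_i = \frac1N\sum_{k=1}^N X_ka_{ik}$ and the fact that $a_{ik}$ is a scalar, and thus commutes with all matrices, we have $\frac1N\sum_{i=1}^N \lambda_ia_{ik} = \frac1N\sum_{i=1}^N \lambda_i\lambda_i'(\Lambda'\Lambda/N)^{-1}\lambda_k = \lambda_k$. This means that

$$\frac1N\sum_{i=1}^N \frac{V_i'F^0}{T}\Big(\frac{F^{0\prime}F^0}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_i = \frac1N\sum_{k=1}^N \frac{X_k'F^0}{T}\Big(\frac{F^{0\prime}F^0}{T}\Big)^{-1}\Big(\frac{\Lambda'\Lambda}{N}\Big)^{-1}\lambda_k,$$

which implies B = 0. Thus (48) holds, and the limiting distribution once again follows from Assumption E. Q.E.D.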
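Two algebraic facts carry this proof: $\frac1N\sum_i \lambda_i a_{ik} = \lambda_k$, which makes the $V_i$ terms reproduce the $X_k$ terms so that B = 0, and $M_{F^0}\Omega F^0 = 0$ when Ω is proportional to $I_T$, which gives C = 0. The Python/NumPy sketch below checks both on simulated inputs; the sample sizes, the uniform variance draws, and the seed are assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, p, r = 60, 30, 2, 3
Lam = rng.standard_normal((N, r))
F0 = rng.standard_normal((T, r))
X = rng.standard_normal((N, T, p))                  # X[i] is the T x p matrix X_i

a = Lam @ np.linalg.solve(Lam.T @ Lam / N, Lam.T)   # a[i, k] = lambda_i'(Lam'Lam/N)^{-1} lambda_k

# Fact 1: (1/N) sum_i lambda_i a_ik = lambda_k for every k
print(np.abs(a.T @ Lam / N - Lam).max())

# Consequence: the V_i part of B equals the X_k part, so their difference B vanishes
V = np.einsum('ik,ktp->itp', a, X) / N             # V_i = (1/N) sum_k a_ik X_k
Q = np.linalg.inv(F0.T @ F0 / T) @ np.linalg.inv(Lam.T @ Lam / N)
V_part = sum(V[i].T @ F0 / T @ Q @ Lam[i] for i in range(N)) / N
X_part = sum(X[k].T @ F0 / T @ Q @ Lam[k] for k in range(N)) / N
print(np.abs(V_part - X_part).max())

# Fact 2: with Omega = (1/N) sum_k sigma_k^2 I_T, M_{F0} Omega F0 = 0, so C = 0
sigma2 = rng.uniform(0.5, 2.0, size=N)             # heteroskedastic across units only
Omega = sigma2.mean() * np.eye(T)
M0 = np.eye(T) - F0 @ np.linalg.solve(F0.T @ F0, F0.T)
print(np.abs(M0 @ Omega @ F0).max())
```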

PROOF OF THEOREM 3: This again follows from the representation in Proposition A.3. Under the assumption that $T/N \to \rho$, $\sqrt{T/N}\,B \xrightarrow{p} \rho^{1/2}B_0$, where $B_0$ is the probability limit of B, and $\sqrt{N/T}\,C \xrightarrow{p} \rho^{-1/2}C_0$, where $C_0$ is the probability limit of C. Combining this with Assumption E, we obtain the theorem. Q.E.D.

Bias Correction
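LEMMA A.10: Under Assumptions A–D, the following equalities hold:
(i) $\frac1N\|\hat\Lambda - \Lambda H^{-1\prime}\|^2 = \frac1N\sum_{i=1}^N \|\hat\lambda_i - H^{-1}\lambda_i\|^2 = O_p(\|\hat\beta - \beta\|^2) + O_p(\delta_{NT}^{-2})$.
(ii) $N^{-1}(\hat\Lambda - \Lambda H^{-1\prime})'\Lambda = O_p(\|\hat\beta - \beta\|) + O_p(\delta_{NT}^{-2})$.
(iii) $\hat\Lambda'\hat\Lambda/N - H^{-1}(\Lambda'\Lambda/N)H^{-1\prime} = O_p(\|\hat\beta - \beta\|) + O_p(\delta_{NT}^{-2})$.
(iv) $(\hat\Lambda'\hat\Lambda/N)^{-1} - H'(\Lambda'\Lambda/N)^{-1}H = O_p(\|\hat\beta - \beta\|) + O_p(\delta_{NT}^{-2})$.
(v) $\frac1N\sum_{i=1}^N \|\hat\lambda_i - H^{-1}\lambda_i\| = O_p(\delta_{NT}^{-1}) + O_p(\|\hat\beta - \beta\|)$.
(vi) $\frac1N\sum_{i=1}^N T^{-1/2}\|X_i\|\,\|\hat\lambda_i - H^{-1}\lambda_i\| = O_p(\delta_{NT}^{-1}) + O_p(\|\hat\beta - \beta\|)$.

LEMMA A.11: Under the assumptions of Theorem 4, $\sqrt{T/N}\,(\hat B - B) = o_p(1)$.

LEMMA A.12: Under the assumptions of Theorem 4, $\sqrt{N/T}\,(\hat C - C) = o_p(1)$.

The proof of Theorem 4 follows from Proposition A.3, Lemma A.11, and Lemma A.12. For the proof of Proposition 2, see the Supplemental Material.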

REFERENCES
ABOWD, J. M., F. KRAMARZ, AND D. N. MARGOLIS (1999): “High Wage Workers and High Wage
Firms,” Econometrica, 67, 251–333. [1233]
PANEL MODELS WITH FIXED EFFECTS 1277

AHN, S. G., Y. H. LEE, AND P. SCHMIDT (2001): “GMM Estimation of Linear Panel Data Models
With Time-Varying Individual Effects,” Journal of Econometrics, 101, 219–255. [1230,1231,1233,
1237,1239,1258]
(2006): “Panel Data Models With Multiple Time-Varying Effects,” Mimeo, Arizona
State University. [1239]
ALTUG, S., AND R. A. MILLER (1990): “Household Choices in Equilibrium,” Econometrica, 58,
543–570. [1233]
ALVAREZ, J., AND M. ARELLANO (2003): “The Time Series and Cross-Section Asymptotics of
Dynamic Panel Data Estimators,” Econometrica, 71, 1121–1159. [1232]
ANDERSON, T. W. (1984): An Introduction to Multivariate Statistical Analysis. New York: Wiley.
[1236]
ANDERSON, T. W., AND C. HSIAO (1982): “Formulation and Estimation of Dynamic Models With
Error Components,” Journal of Econometrics, 76, 598–606. [1232]
ANDERSON, T. W., AND H. RUBIN (1956): “Statistical Inference in Factor Analysis,” in Proceedings
of Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 5, ed. by J. Neyman.
Berkeley: University of California Press, 111–150. [1234]
ANDREWS, D. W. K. (2005): “Cross-Section Regression With Common Shocks,” Econometrica,
73, 1551–1585. [1234]
ARELLANO, M. (2003): Panel Data Econometrics. Oxford: Oxford University Press. [1239]
ARELLANO, M., AND J. HAHN (2005): “Understanding Bias in Nonlinear Panel Models: Some
Recent Developments,” Unpublished Manuscript, CEMFI. [1250]
ARELLANO, M., AND B. HONORE (2001): “Panel Data Models: Some Recent Developments,” in
Handbook of Econometrics, Vol. 5, ed. by J. J. Heckman and E. Leamer. Amsterdam: North-
Holland. [1230]
BAI, J. (1994): “Least Squares Estimation of Shift in Linear Processes,” Journal of Time Series
Analysis, 15, 453–472. [1244]
(2003): “Inferential Theory for Factor Models of Large Dinensions,” Econometrica, 71,
135–173. [1242,1247,1251]
(2009): “Supplement to ‘Panel Data Models With Interactive Fixed Effects’,” Economet-
rica Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/6135_proofs.
pdf. [1232]
BAI, J., AND S. NG (2002): “Determining the Number of Factors in Approximate Factor Models,”
Econometrica, 70, 191–221. [1235,1267]
(2006): “Confidence Intervals for Diffusion Index Forecasts and Inference for Factor-
Augment Regressions,” Econometrica, 74, 1133–1150. [1251]
BALTAGI, B. H. (2005): Econometric Analysis of Panel Data. Chichester: Wiley. [1239]
BEKKER, P. A. (1994): “Alternative Approximations to the Distributions of Instrumental Variable Estimators,” Econometrica, 62, 657–681. [1250]
BERNANKE, B., J. BOIVIN, AND P. ELIASZ (2005): “Factor Augmented Vector Autoregression and the Analysis of Monetary Policy,” Quarterly Journal of Economics, 120, 387–422. [1239]
CAMPBELL, J. Y., A. W. LO, AND A. C. MACKINLAY (1997): The Econometrics of Financial Markets. Princeton, NJ: Princeton University Press. [1234]
CARNEIRO, P., K. T. HANSEN, AND J. J. HECKMAN (2003): “Estimating Distributions of Treatment Effects With an Application to the Returns to Schooling and Measurement of the Effects of Uncertainty on College Choice,” Working Paper 9546, NBER; International Economic Review, 44, 362–422. [1233]
CAWLEY, J., K. CONNELLY, J. HECKMAN, AND E. VYTLACIL (1997): “Cognitive Ability, Wages, and Meritocracy,” in Intelligence, Genes, and Success: Scientists Respond to the Bell Curve, ed. by B. Devlin, S. E. Fienberg, D. Resnick, and K. Roeder. Berlin: Springer-Verlag, 179–192. [1233]
CHAMBERLAIN, G. (1980): “Analysis of Covariance With Qualitative Data,” Review of Economic Studies, 47, 225–238. [1232]
CHAMBERLAIN, G. (1984): “Panel Data,” in Handbook of Econometrics, Vol. 2, ed. by Z. Griliches and M. Intriligator. Amsterdam: North-Holland. [1230,1233,1239]
CHAMBERLAIN, G., AND M. ROTHSCHILD (1983): “Arbitrage, Factor Structure and Mean-Variance Analysis in Large Asset Markets,” Econometrica, 51, 1281–1304. [1232,1235,1238,1244]
COAKLEY, J., A. FUERTES, AND R. P. SMITH (2002): “A Principal Components Approach to Cross-Section Dependence in Panels,” Mimeo, Birkbeck College, University of London. [1231]
CONNOR, G., AND R. KORAJCZYK (1986): “Performance Measurement With the Arbitrage Pricing Theory: A New Framework for Analysis,” Journal of Financial Economics, 15, 373–394. [1235,1236,1239]
FAMA, E., AND K. FRENCH (1993): “Common Risk Factors in the Returns on Stocks and Bonds,” Journal of Financial Economics, 33, 3–56. [1234]
FELDSTEIN, M., AND C. HORIOKA (1980): “Domestic Savings and International Capital Flows,” Economic Journal, 90, 314–329. [1233]
GIANNONE, D., AND M. LENZA (2005): “The Feldstein Horioka Fact,” Unpublished Manuscript, European Central Bank. [1233]
GOLDBERGER, A. S. (1972): “Structural Equations Methods in the Social Sciences,” Econometrica, 40, 979–1001. [1231]
GREENE, W. (2000): Econometric Analysis (Fourth Ed.). Englewood Cliffs, NJ: Prentice-Hall. [1253,1254]
HAHN, J., AND G. KUERSTEINER (2002): “Asymptotically Unbiased Inference for a Dynamic Panel Model With Fixed Effects When Both n and T Are Large,” Econometrica, 70, 1639–1657. [1232,1246,1247,1249]
HAHN, J., AND W. NEWEY (2004): “Jackknife and Analytical Bias Reduction for Nonlinear Panel Models,” Econometrica, 72, 1295–1319. [1232,1249,1250]
HANSEN, C., J. HAUSMAN, AND W. NEWEY (2005): “Estimation With Many Instruments,” Unpublished Manuscript, Department of Economics, MIT. [1250]
HAUSMAN, J. (1978): “Specification Tests in Econometrics,” Econometrica, 46, 1251–1271. [1256]
HAUSMAN, J. A., AND W. E. TAYLOR (1981): “Panel Data and Unobservable Individual Effects,” Econometrica, 49, 1377–1398. [1258,1260]
HOLTZ-EAKIN, D., W. NEWEY, AND H. ROSEN (1988): “Estimating Vector Autoregressions With Panel Data,” Econometrica, 56, 1371–1395. [1230,1231,1233,1238,1239]
HSIAO, C. (2003): Analysis of Panel Data. New York: Cambridge University Press. [1232]
JÖRESKOG, K. G., AND A. S. GOLDBERGER (1975): “Estimation of a Model With Multiple Indicators and Multiple Causes of a Single Latent Variable,” Journal of the American Statistical Association, 70, 631–639. [1231]
KIEFER, N. M. (1980): “A Time Series-Cross Section Model With Fixed Effects With an Intertemporal Factor Structure,” Unpublished Manuscript, Department of Economics, Cornell University. [1231,1237]
KIVIET, J. (1995): “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models,” Journal of Econometrics, 68, 53–78. [1232]
KNEIP, A., R. SICKLES, AND W. SONG (2005): “A New Panel Data Treatment for Heterogeneity in Time Trends,” Unpublished Manuscript, Department of Economics, Rice University. [1231]
LAWLEY, D. N., AND A. E. MAXWELL (1971): Factor Analysis as a Statistical Method. London: Butterworth. [1234]
LEE, Y. H. (1991): “Panel Data Models With Multiplicative Individual and Time Effects: Application to Compensation and Frontier Production Functions,” Unpublished Ph.D. Dissertation, Michigan State University. [1231]
LETTAU, M., AND S. LUDVIGSON (2001): “Resurrecting the (C)CAPM: A Cross-Sectional Test When Risk Premia Are Time Varying,” Journal of Political Economy, 109, 1238–1287. [1234]
MACURDY, T. (1982): “The Use of Time Series Processes to Model the Error Structure of Earnings in a Longitudinal Data Analysis,” Journal of Econometrics, 18, 83–114. [1231]
MOON, R., AND B. PERRON (2004): “Testing for a Unit Root in Panels With Dynamic Factors,” Journal of Econometrics, 122, 81–126. [1231]
MUNDLAK, Y. (1978): “On the Pooling of Time Series and Cross Section Data,” Econometrica, 46, 69–85. [1239]
NEWEY, W., AND D. MCFADDEN (1994): “Large Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics, ed. by R. F. Engle and D. McFadden. Amsterdam: North-Holland, 2111–2245. [1244]
NEWEY, W., AND R. SMITH (2004): “Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators,” Econometrica, 72, 219–255. [1239]
NEWEY, W. K., AND K. D. WEST (1987): “A Simple Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55, 703–708. [1251,1252]
NEYMAN, J., AND E. L. SCOTT (1948): “Consistent Estimates Based on Partially Consistent Observations,” Econometrica, 16, 1–32. [1232]
NICKELL, S. (1981): “Biases in Dynamic Models With Fixed Effects,” Econometrica, 49, 1417–1426. [1232]
OBSTFELD, M., AND K. ROGOFF (2000): “The Six Major Puzzles in International Macroeconomics: Is There a Common Cause?” NBER Macroeconomics Annual, 15, 339–390. [1233]
PESARAN, M. H. (2006): “Estimation and Inference in Large Heterogeneous Panels With a Multifactor Error Structure,” Econometrica, 74, 967–1012. [1231,1233,1240]
PHILLIPS, P. C. B., AND D. SUL (2003): “Dynamic Panel Estimation and Homogeneity Testing Under Cross Section Dependence,” The Econometrics Journal, 6, 217–259. [1231]
ROSS, S. (1976): “The Arbitrage Theory of Capital Asset Pricing,” Journal of Economic Theory, 13, 341–360. [1234]
SARGAN, J. D. (1964): “Wages and Prices in the United Kingdom: A Study in Econometric Methodology,” in Econometric Analysis of National Economic Planning, ed. by P. G. Hart, G. Mill, and J. K. Whitaker. London: Butterworths, 25–54. [1237]
STOCK, J. H., AND M. W. WATSON (2002): “Forecasting Using Principal Components From a Large Number of Predictors,” Journal of the American Statistical Association, 97, 1167–1179. [1235,1236,1239]
TOWNSEND, R. (1994): “Risk and Insurance in Village India,” Econometrica, 62, 539–592. [1233]

Dept. Economics, New York University, 19 West 4th Street, New York, NY 10012, U.S.A., SEM, Tsinghua University, and CEMA, Central University of Finance and Economics, Beijing, China.
Manuscript received October, 2005; final revision received January, 2009.
