Guja - Chap 16 PDF
Chapter 16
Panel Data Regression Models
In Chapter 1 we discussed briefly the types of data that are generally available for empirical
analysis, namely, time series, cross section, and panel. In time series data we observe
the values of one or more variables over a period of time (e.g., GDP for several quarters
or years). In cross-section data, values of one or more variables are collected for several
sample units, or subjects, at the same point in time (e.g., crime rates for 50 states in the
United States for a given year). In panel data the same cross-sectional unit (say a family
or a firm or a state) is surveyed over time. In short, panel data have space as well as time
dimensions.
We have already seen an example of this in Table 1.1, which gives data on eggs produced
and their prices for 50 states in the United States for years 1990 and 1991. For any given
year, the data on eggs and their prices represent a cross-sectional sample. For any given
state, there are two time series observations on eggs and their prices. Thus, we have in all
100 (pooled) observations on eggs produced and their prices.
Another example of panel data was given in Table 1.2, which gives data on investment,
value of the firm, and capital stock for four companies for the period 1935–1954. The data
for each company over the period 1935–1954 constitute time series data, with 20 observations;
the data for all four companies for a given year are an example of cross-section data, with
only four observations; and the data for all the companies for all the years are an example of
panel data, with a total of 80 observations.
There are other names for panel data, such as pooled data (pooling of time series
and cross-sectional observations), combination of time series and cross-section data,
micropanel data, longitudinal data (a study over time of a variable or group of subjects),
event history analysis (studying the movement over time of subjects through successive
states or conditions), and cohort analysis (e.g., following the career path of 1965 graduates
of a business school). Although there are subtle variations, all these names essentially connote
movement over time of cross-sectional units. We will therefore use the term panel data
in a generic sense to include one or more of these terms. And we will call regression models
based on such data panel data regression models.
Panel data are now being used increasingly in economic research. Some of the well-known
panel data sets are:
1. The Panel Study of Income Dynamics (PSID) conducted by the Institute for Social
Research at the University of Michigan. Started in 1968, the PSID collects data each year
on some 5,000 families about various socioeconomic and demographic variables.
2. The Bureau of the Census of the Department of Commerce conducts a survey similar to
PSID, called the Survey of Income and Program Participation (SIPP). Four times a
year respondents are interviewed about their economic condition.
3. The German Socio-Economic Panel (GESOEP) studied 1,761 individuals every year
between 1984 and 2002. Information on year of birth, gender, life satisfaction, marital
status, individual labor earnings, and annual hours of work was collected for each individual.
There are also many other surveys that are conducted by various governmental agencies,
such as:
Household, Income and Labor Dynamics in Australia Survey (HILDA)
British Household Panel Survey (BHPS)
Korean Labor and Income Panel Study (KLIPS)
At the outset a warning is in order: The topic of panel data regressions is vast, and some of
the mathematics and statistics involved are quite complicated. We only hope to touch on some
of the essentials of the panel data regression models, leaving the details for the references.1 But
be forewarned that some of these references are highly technical. Fortunately, user-friendly
software packages such as LIMDEP, PC-GIVE, SAS, STATA, SHAZAM, and EViews, among
others, have made the task of actually implementing panel data regressions quite easy.
1 Some of the references are G. Chamberlain, “Panel Data,” in Z. Griliches and M. D. Intriligator,
eds., Handbook of Econometrics, vol. II, North-Holland Publishers, 1984, Chapter 22; C. Hsiao,
Analysis of Panel Data, Cambridge University Press, 1986; G. G. Judge, R. C. Hill, W. E. Griffiths,
H. Lutkepohl, and T. C. Lee, Introduction to the Theory and Practice of Econometrics, 2d ed., John Wiley
& Sons, New York, 1985, Chapter 11; W. H. Greene, Econometric Analysis, 6th ed., Prentice-Hall,
Englewood Cliffs, NJ, 2008, Chapter 9; Badi H. Baltagi, Econometric Analysis of Panel Data, John Wiley
& Sons, New York, 1995; and J. M. Wooldridge, Econometric Analysis of Cross Section and Panel
Data, MIT Press, Cambridge, Mass., 2002. For a detailed treatment of the subject with empirical
applications, see Edward W. Frees, Longitudinal and Panel Data: Analysis and Applications in the Social
Sciences, Cambridge University Press, New York, 2004.
2 Baltagi, op. cit., pp. 3–6.
3. The fixed effects within-group model. Here also we pool all 90 observations, but for
each airline we express each variable as a deviation from its mean value and then esti-
mate an OLS regression on such mean-corrected or “de-meaned” values.
4. The random effects model (REM). Unlike the LSDV model, in which we allow each
airline to have its own (fixed) intercept value, we assume that the intercept values are a
random drawing from a much bigger population of airlines.
We now discuss each of these methods using the data given in Table 16.1. (See textbook
website.)
TABLE 16.2
Dependent Variable: C
Method: Least Squares
Included observations: 90
Recall that one of the important assumptions of the classical linear regression model is that
there is no correlation between the regressors and the disturbance or error term.
To see how the error term may be correlated with the regressors, let us consider the
following revision of model (16.3.1):
$$C_{it} = \beta_1 + \beta_2 PF_{it} + \beta_3 LF_{it} + \beta_4 M_{it} + u_{it} \qquad (16.3.2)$$
where the additional variable M = management philosophy or management quality. Of the
variables included in Eq. (16.3.2), only the variable M is time-invariant (or time-constant)
because it varies among subjects but is constant over time for a given subject (airline).
Although it is time-invariant, the variable M is not directly observable and therefore we
cannot measure its contribution to the cost function. We can, however, do this indirectly if
we write Eq. (16.3.2) as
$$C_{it} = \beta_1 + \beta_2 PF_{it} + \beta_3 LF_{it} + \alpha_i + u_{it} \qquad (16.3.3)$$
where αi, called the unobserved, or heterogeneity, effect, reflects the impact of M on
cost. Note that for simplicity we have shown only the unobserved effect of M on cost, but
in reality there may be more such unobserved effects, for example, the nature of ownership
(privately owned or publicly owned), whether it is a minority-owned company, whether the
CEO is a man or a woman, etc. Although such variables may differ among the subjects (airlines),
they will probably remain the same for any given subject over the sample period.
Since αi is not directly observable, why not consider it random and include it in the error
term uit, and thereby consider the composite error term vit = αi + uit? We now write
Eq. (16.3.3) as:
$$C_{it} = \beta_1 + \beta_2 PF_{it} + \beta_3 LF_{it} + v_{it} \qquad (16.3.4)$$
But if the αi term included in the error term vit is correlated with any of the regressors
in Eq. (16.3.4), we have a violation of one of the key assumptions of the classical linear regression
model, namely, that the error term is not correlated with the regressors. As we
know, in this situation the OLS estimates are not only biased but also inconsistent.
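A small simulation can make this inconsistency concrete. The sketch below is only illustrative and assumes hypothetical values (500 subjects, 10 periods, a true slope of 1, and an unobserved effect αi that also enters the regressor); none of these numbers come from the airline data. Pooled OLS, which leaves αi in the error term, converges to about 1.5 in this setup, whereas regressing de-meaned values (the within-group idea) stays close to the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 500, 10, 1.0                      # hypothetical panel size and true slope

alpha = rng.normal(size=N)                     # unobserved heterogeneity, one value per subject
x = alpha[:, None] + rng.normal(size=(N, T))   # regressor correlated with alpha
u = rng.normal(size=(N, T))                    # idiosyncratic error
y = beta * x + alpha[:, None] + u              # alpha sits in the composite error v = alpha + u

def ols_slope(xv, yv):
    """Slope from a simple regression of yv on xv with an intercept."""
    xc, yc = xv - xv.mean(), yv - yv.mean()
    return (xc @ yc) / (xc @ xc)

# Pooled OLS ignores alpha; its probability limit here is beta + cov(x, alpha) / var(x) = 1.5
b_pooled = ols_slope(x.ravel(), y.ravel())

# De-meaning each subject's data removes alpha, so this slope stays consistent
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_within = (x_w.ravel() @ y_w.ravel()) / (x_w.ravel() @ x_w.ravel())

print(f"pooled OLS: {b_pooled:.3f}, within: {b_within:.3f}")
```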
There is a real possibility that the unobservable αi is correlated with one or more of the
regressors. For example, the management of one airline may be astute enough to buy futures
contracts on fuel to avoid severe price fluctuations. This will have the effect of
lowering the cost of airline services. Moreover, because the same αi enters the composite
error vit in every period, it can be shown that $\operatorname{cov}(v_{it}, v_{is}) = \sigma_\alpha^2$ for $t \neq s$,
where $\sigma_\alpha^2$ is the variance of αi (assuming the uit are serially uncorrelated and uncorrelated
with αi). This covariance is nonzero, and therefore the (unobserved) heterogeneity
induces autocorrelation and we will have to pay attention to it. We will show later how
this problem can be handled.
The question, therefore, is how we account for the unobservable, or heterogeneity, effect(s)
so that we can obtain consistent and/or efficient estimates of the parameters of the variables
of prime interest, which are output, fuel price, and load factor in our case. Our prime interest
may not be in obtaining the impact of the unobservable variables because they remain the
same for a given subject. That is why such unobservable, or heterogeneity, effects are called
nuisance parameters. How then do we proceed? It is to this question we now turn.
[Figure 16.1: regression lines by group, e.g., E(Yit | Xit) = α1 + βXit for Group 1, with a second intercept α2; horizontal axis Xit (Output).]
companies; this is equivalent to neglecting the fixed effects.4 You can see from Figure 16.1
how the pooled regression can bias the slope estimate.
How do we actually allow for the (fixed effect) intercept to vary among the airlines? We can
easily do this by using the dummy variable technique, particularly the differential intercept
dummy technique, which we learned in Chapter 9. Now we write Eq. (16.4.1) as:
$$C_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 D_{4i} + \alpha_5 D_{5i} + \alpha_6 D_{6i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \qquad (16.4.2)$$
where D2i = 1 for airline 2, 0 otherwise; D3i = 1 for airline 3, 0 otherwise; and so on.
Notice that since we have six airlines, we have introduced only five dummy variables to
avoid falling into the dummy-variable trap (i.e., the situation of perfect collinearity). Here
we are treating airline 1 as the base, or reference, category. Of course, you can choose any
airline as the reference point. As a result, the intercept α1 is the intercept value of airline 1
and the other α coefficients represent by how much the intercept values of the other airlines
differ from the intercept value of the first airline. Thus, α2 tells by how much the intercept
value of the second airline differs from α1. The sum (α1 + α2) gives the actual value of the
intercept for airline 2. The intercept values of the other airlines can be computed similarly.
Keep in mind that if you want to introduce a dummy for each airline, you will have to drop
the (common) intercept; otherwise, you will fall into the dummy-variable trap.
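The dummy-variable trap can be checked numerically by building the LSDV design matrix and looking at its rank. This is a sketch assuming a balanced panel of 6 airlines over 15 years, as in the example; it builds only the intercept and dummy columns (the Q, PF, and LF columns are left out for brevity), and the variable names are illustrative.

```python
import numpy as np

n_airlines, n_years = 6, 15
airline = np.repeat(np.arange(1, n_airlines + 1), n_years)   # airline id for each of the 90 rows

# One 0/1 column per airline: D1, ..., D6
all_dummies = (airline[:, None] == np.arange(1, n_airlines + 1)).astype(float)
intercept = np.ones((airline.size, 1))

# Correct LSDV layout: common intercept plus D2..D6 (airline 1 is the reference category)
X_ok = np.hstack([intercept, all_dummies[:, 1:]])
# Dummy-variable trap: intercept plus all six dummies, which sum to the intercept column
X_trap = np.hstack([intercept, all_dummies])

print(X_ok.shape, np.linalg.matrix_rank(X_ok))       # (90, 6) 6
print(X_trap.shape, np.linalg.matrix_rank(X_trap))   # (90, 7) 6
```

Adding all six dummies to the common intercept gives seven columns but only six linearly independent ones, because the dummies sum to the intercept column; that is the perfect collinearity the text warns about.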
The results of the model (16.4.2) for our data are presented in Table 16.3.
The first thing to notice about these results is that all the differential intercept coefficients
are individually highly statistically significant, suggesting that perhaps the six airlines
are heterogeneous and, therefore, the pooled regression results given in Table 16.2
may be suspect. The values of the slope coefficients given in Tables 16.2 and 16.3 are also
different, again casting some doubt on the results given in Table 16.2. It seems model
(16.4.1) is better than model (16.3.1). In passing, note that OLS applied to a fixed effect
model produces estimators that are called fixed effect estimators.
TABLE 16.3
Dependent Variable: TC
Method: Least Squares
Sample: 1–90
Included observations: 90
4 Adapted from the unpublished notes of Alan Duncan.
We can provide a formal test of the two models. In relation to model (16.4.1), model
(16.3.1) is a restricted model in that it imposes a common intercept for all the airlines.
Therefore, we can use the restricted F test discussed in Chapter 8. Using formula (8.6.10),
the reader can check that in the present case the F value is:
$$F = \frac{(0.971642 - 0.946093)/5}{(1 - 0.971642)/81} \approx 14.60$$
Note: The unrestricted and restricted R² values are obtained from Tables 16.3 and 16.2, respectively.
Also note that the number of restrictions is 5 (why?).
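The arithmetic of the restricted F test can be reproduced in a few lines. The helper name `restricted_f` is hypothetical; it simply evaluates the R²-based formula with the unrestricted (LSDV) and restricted (pooled) R² values quoted in the text, 5 restrictions, and 81 denominator degrees of freedom.

```python
def restricted_f(r2_unrestricted, r2_restricted, m, df_denominator):
    """R^2 form of the restricted F test for m linear restrictions (cf. formula (8.6.10))."""
    return ((r2_unrestricted - r2_restricted) / m) / ((1.0 - r2_unrestricted) / df_denominator)

# Unrestricted: LSDV (Table 16.3); restricted: pooled regression (Table 16.2)
# 90 observations - 9 coefficients (intercept, 5 airline dummies, 3 slopes) = 81 df
f_stat = restricted_f(0.971642, 0.946093, m=5, df_denominator=81)
print(f"F = {f_stat:.2f}")
```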
The null hypothesis here is that all the differential intercepts are equal to zero. The com-
puted F value for 5 numerator and 81 denominator df is highly statistically significant.
Therefore, we reject the null hypothesis that all the (differential) intercepts are zero. If the
F value were not statistically significant, we would have concluded that there is no differ-
ence in the intercepts of the six airlines. In this case, we would have pooled all 90 of the
observations, as we did in the pooled regression given in Table 16.2.
Model (16.4.1) is known as a one-way fixed effects model because we have allowed the
intercepts to differ between airlines. But we can also allow for a time effect if we believe that
the cost function changes over time because of factors such as technological changes, changes
in government regulation and/or tax policies, and other such effects. Such a time effect can be
easily accounted for if we introduce time dummies for the years 1970 to 1984. Since we have
data for 15 years, we can introduce only 14 time dummies (why?) and extend model
(16.4.1) by adding these variables. If we do that, the model that emerges is called a two-way
fixed effects model because we have allowed for both individual and time effects.
In the present example, if we add the time dummies, we will have in all 23 coefficients to
estimate: the common intercept, five airline dummies, 14 time dummies, and three slope
coefficients. As you can see, we will consume several degrees of freedom. Furthermore, if
we decide to allow the slope coefficients to differ among the companies, we can interact the
five firm (airline) dummies with each of the three explanatory variables and introduce
differential slope dummy coefficients. Then we will have to estimate 15 additional coefficients
(five dummies interacted with three explanatory variables). As if this is not enough, if we
interact the 14 time dummies with the three explanatory variables, we will have 42 additional
coefficients to estimate. In all, that is 80 coefficients from only 90 observations, leaving
just 10 degrees of freedom.
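The bookkeeping in the preceding paragraph is easy to verify. The sketch assumes, as in the text, N = 6 airlines, T = 15 years, and K = 3 slope regressors; it counts the coefficients in the two-way model and in the fully interacted model and reports the degrees of freedom left from 90 observations.

```python
N, T, K = 6, 15, 3                    # airlines, years, slope regressors (Q, PF, LF)
n_obs = N * T                         # 90 observations

two_way = 1 + (N - 1) + (T - 1) + K          # intercept, airline dummies, time dummies, slopes
airline_slope_interactions = (N - 1) * K     # airline dummies x regressors
time_slope_interactions = (T - 1) * K        # time dummies x regressors

total = two_way + airline_slope_interactions + time_slope_interactions
print(two_way, airline_slope_interactions, time_slope_interactions, total, n_obs - total)
# 23 15 42 80 10
```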
Fourth, we have to think carefully about the error term uit. The results we have presented
in Eqs. (16.3.1) and (16.4.1) are based on the assumption that the error term follows
the classical assumptions, namely, $u_{it} \sim N(0, \sigma^2)$. Since the index i refers to cross-section
observations and t to time series observations, the classical assumption for uit may have to
be modified. There are several possibilities, including:
1. We can assume that the error variance is the same for all cross-section units or we can
assume that the error variance is heteroscedastic.5
2. For each entity, we can assume that there is no autocorrelation over time. Thus, in our
illustrative example, we can assume that the error term of the cost function for airline #1 is
non-autocorrelated, or we can assume that it is autocorrelated, say, of the AR(1) type.
3. For a given time, it is possible that the error term for airline #1 is correlated with the
error term for, say, airline #2.6 Or we can assume that there is no such correlation.
There are also other combinations and permutations of the error term. As you will quickly
realize, allowing one or more of these possibilities will make the analysis that much more complicated.
(Space and mathematical demands preclude us from considering all the possibilities.
The references in footnote 1 discuss some of these topics.) Some of these problems may be
alleviated, however, if we consider the alternatives discussed in the next two sections.
TABLE 16.4
Dependent Variable: DMTC
Method: Least Squares
Sample: 1–90
Included observations: 90
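[Figure: regression on de-meaned data, E(Y*it | X*it) = βX*it, with the group intercepts α1 and α2 indicated; horizontal axis X*it (Output).]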
that WG estimators, although consistent, are inefficient (i.e., have larger variances)
compared to the ordinary pooled regression results.7 Observe that the slope coefficients of
Q, PF, and LF are identical in Tables 16.3 and 16.4. This is because the two models are
mathematically equivalent: de-meaning each airline's data removes exactly what the airline
dummies absorb. Incidentally, the regression coefficients estimated by the WG method are
called WG estimators.
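The equivalence of the LSDV and within-group slopes can be checked on simulated data. This is a sketch with made-up values (a balanced panel of 6 units and 15 periods, three regressors, arbitrary true coefficients), not the airline data in Table 16.1; it fits LSDV with five differential intercept dummies and the WG regression on de-meaned data, and the two sets of slope estimates agree to machine precision.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 6, 15, 3                                  # hypothetical balanced panel, 3 regressors
ids = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, K))
alpha = rng.normal(scale=5.0, size=N)               # unit-specific intercepts
y = alpha[ids] + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N * T)

# LSDV: common intercept + (N - 1) differential intercept dummies + regressors
D = (ids[:, None] == np.arange(1, N)).astype(float)
Z = np.column_stack([np.ones(N * T), D, X])
b_lsdv = np.linalg.lstsq(Z, y, rcond=None)[0][-K:]  # keep only the slope coefficients

def demean(a):
    """Subtract each unit's own mean (the within-group transformation)."""
    means = np.array([a[ids == i].mean(axis=0) for i in range(N)])
    return a - means[ids]

b_wg = np.linalg.lstsq(demean(X), demean(y), rcond=None)[0]

print(np.round(b_lsdv, 4), np.round(b_wg, 4))
```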
One disadvantage of the WG estimator can be explained with the following wage
regression model:
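$$W_{it} = \beta_{1i} + \beta_2\,\text{Experience}_{it} + \beta_3\,\text{Age}_{it} + \beta_4\,\text{Gender}_{it} + \beta_5\,\text{Education}_{it} + \beta_6\,\text{Race}_{it} + u_{it} \qquad (16.5.2)$$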
In this wage function, variables such as gender, education, and race are time-invariant. If
we use the WG estimators, these time-invariant variables will be wiped out (because each
variable is expressed as a deviation from its own mean, and a time-invariant variable always
equals its mean). As a result, we will not know how wage reacts to these time-invariant
variables.8 But this is the price we have to pay to avoid the correlation between the error term
(αi included in vit) and the explanatory variables.
7 The reason for this is that when we express variables as deviations from their mean values, the
variation in these mean-corrected values will be much smaller than the variation in the original
values of the variables. In that case, the variation in the disturbance term uit may be relatively
large, thus leading to higher standard errors of the estimated coefficients.
Another disadvantage of the WG estimator is that “. . . it may distort the parameter values
and can certainly remove any long run effects.”9 In general, when we difference a variable,
we remove the long-run component from that variable. What is left is the short-run
value of that variable. We will discuss this further when we discuss time series econometrics
later in the book.
In using LSDV we obtained direct estimates of the intercepts for each airline. How can
we obtain the estimates of the intercepts using the WG method? For the airlines example,
they are obtained as follows:
$$\hat{\alpha}_i = \bar{C}_i - \hat{\beta}_2 \bar{Q}_i - \hat{\beta}_3 \overline{PF}_i - \hat{\beta}_4 \overline{LF}_i \qquad (16.5.3)$$
where bars over the variables denote the sample mean values of the variables for the ith
airline.
That is, we obtain the intercept value of the ith airline by subtracting from the mean
value of the dependent variable the mean values of the explanatory variables for that airline
times the estimated slope coefficients from the WG estimators. Note that the estimated
slope coefficients remain the same for all of the airlines, as shown in Table 16.4. It may be
noted that the intercept estimated in Eq. (16.5.3) is similar to the intercept we estimate in
the standard linear regression model, as can be seen from Eq. (7.4.21). We leave it for
the reader to find the intercepts of the six airlines in the manner shown and verify that they
are the same as the intercept values derived in Table 16.3, apart from rounding errors.
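Eq. (16.5.3) can be checked the same way. In this sketch (again on hypothetical simulated data, not Table 16.1), the intercepts recovered from the WG slopes are compared with the intercepts from an LSDV regression that uses one dummy per unit and no common intercept, the layout mentioned earlier for giving every airline its own dummy without falling into the dummy-variable trap. That layout reports each intercept directly rather than as a differential from airline 1, which makes the comparison simpler.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 6, 15
ids = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, 3))
y = rng.normal(scale=5.0, size=N)[ids] + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N * T)

# Within-group slopes from de-meaned data
xbar = np.array([X[ids == i].mean(axis=0) for i in range(N)])
ybar = np.array([y[ids == i].mean() for i in range(N)])
b_wg = np.linalg.lstsq(X - xbar[ids], y - ybar[ids], rcond=None)[0]

# Eq. (16.5.3): intercept_i = ybar_i - xbar_i' b
alpha_wg = ybar - xbar @ b_wg

# LSDV with one dummy per unit and no common intercept gives the intercepts directly
D_all = (ids[:, None] == np.arange(N)).astype(float)
coef = np.linalg.lstsq(np.column_stack([D_all, X]), y, rcond=None)[0]
alpha_lsdv = coef[:N]

print(np.round(alpha_wg, 4))
print(np.round(alpha_lsdv, 4))
```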
It may be noted that the estimated intercept of each airline represents the subject-specific
characteristics of each airline, but we will not be able to identify these characteristics individually.
Thus, the α1 intercept for airline #1 represents the management philosophy of that
airline, the composition of its board of directors, the personality of the CEO, the gender of
the CEO, etc. All these heterogeneity characteristics are subsumed in the intercept value.
As we will see later, such characteristics can be included in the random effects model.
In passing, we note that an alternative to the WG estimator is the first-difference
method. In the WG method, we express each variable as a deviation from that variable’s
mean value. In the first-difference method, for each subject we take successive differences
of the variables. Thus, for airline #1 we subtract the first observation of TC from the second
observation of TC, the second observation of TC from the third observation of TC, and so
on. We do this for each of the remaining variables and repeat this process for the remaining
five airlines. After this process we have only 14 observations for each airline, since the first
observation has no previous value. As a result, we now have 84 observations instead of the
original 90 observations. We then regress the first-differenced values of the TC variable on
the first-differenced values of the explanatory variables as follows:
$$\Delta TC_{it} = \beta_2 \Delta Q_{it} + \beta_3 \Delta PF_{it} + \beta_4 \Delta LF_{it} + (u_{it} - u_{i,t-1}), \qquad i = 1, 2, \ldots, 6; \quad t = 2, 3, \ldots, 15 \qquad (16.5.4)$$
where $\Delta TC_{it} = TC_{it} - TC_{i,t-1}$. As noted in Chapter 11, Δ is called the first difference
operator.10
8 This is also true of the LSDV model.
9 Dimitrios Asteriou and Stephen G. Hall, Applied Econometrics: A Modern Approach, Palgrave
Macmillan, New York, 2007, p. 347.
10 Notice that Eq. (16.5.4) has no intercept term (why?), but we can include it if there is a trend
variable in the original model.
In passing, note that the original disturbance term is now replaced by the difference
between the current and previous values of the disturbance term. If the original disturbance
term is not autocorrelated, the transformed disturbance is, and therefore it poses the kinds
of estimation problems that we discussed in Chapter 11. However, if the explanatory variables
are strictly exogenous, the first difference estimator is unbiased, given the values of
the explanatory variables. Also note that the first-difference method has the same disadvantages
as the WG method in that the explanatory variables that remain fixed over time for
an individual are wiped out in the first-difference transformation.
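The claim that differencing a non-autocorrelated error produces an autocorrelated one can be illustrated by simulation. With uit independent over time and variance σ², the differenced error has variance 2σ² and covariance −σ² with its own lag, so its first-order correlation is −0.5. The sketch below assumes hypothetical dimensions (2,000 subjects, 15 periods) and standard normal errors.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.normal(size=(2000, 15))        # hypothetical white-noise errors: 2,000 subjects, 15 periods
du = np.diff(u, axis=1)                # u_it - u_i,t-1 for t = 2, ..., 15

# Correlation between successive differenced errors within each subject
r = np.corrcoef(du[:, 1:].ravel(), du[:, :-1].ravel())[0, 1]
print(round(r, 3))                     # close to the theoretical value of -0.5
```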
It may be pointed out that the first difference and fixed effects estimators are the same
when we have only two time periods, but if there are more than two periods, these estimators
differ. The reasons for this are rather involved and the interested reader may consult the
references.11 It is left as an exercise for the reader to apply the first difference method to our
airlines example and compare the results with the other fixed effects estimators.
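The two-period equivalence is easy to confirm numerically. This sketch uses hypothetical simulated data (50 subjects observed in exactly two periods, three regressors); it computes the first-difference estimator without an intercept and the within-group estimator, and the slopes coincide. With T = 2, de-meaning gives each subject's two observations the values −Δ/2 and +Δ/2, so both estimators are built from the same differences.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 50, 3                                 # hypothetical: 50 subjects, exactly 2 periods each
X = rng.normal(size=(N, 2, K))
alpha = rng.normal(size=N)
y = alpha[:, None] + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=(N, 2))

# First-difference estimator: regress (y_i2 - y_i1) on (x_i2 - x_i1), no intercept
b_fd = np.linalg.lstsq(X[:, 1] - X[:, 0], y[:, 1] - y[:, 0], rcond=None)[0]

# Within-group estimator: de-mean each subject's two observations
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(-1, K)
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
b_wg = np.linalg.lstsq(Xw, yw, rcond=None)[0]

print(np.round(b_fd, 4), np.round(b_wg, 4))   # identical when T = 2
```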
If the dummy variables do in fact represent a lack of knowledge about the (true) model,
why not express this ignorance through the disturbance term? This is precisely the approach
suggested by the proponents of the so-called error components model (ECM) or random
effects model (REM), which we will now illustrate with our airline cost function.
The basic idea is to start with Eq. (16.4.1):
$$TC_{it} = \beta_{1i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \qquad (16.6.1)$$
Instead of treating β1i as fixed, we assume that it is a random variable with a mean value
of β1 (no subscript i here). The intercept value for an individual company can be expressed as
$$\beta_{1i} = \beta_1 + \varepsilon_i \qquad (16.6.2)$$
where εi is a random error term with a mean value of zero and a variance of $\sigma_\varepsilon^2$.
What we are essentially saying is that the six firms included in our sample are a drawing
from a much larger universe of such companies and that they have a common mean value
for the intercept (= β1 ). The individual differences in the intercept values of each company
are reflected in the error term εi .
Substituting Eq. (16.6.2) into Eq. (16.6.1), we obtain:
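$$\begin{aligned} TC_{it} &= \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + \varepsilon_i + u_{it} \\ &= \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + w_{it} \end{aligned} \qquad (16.6.3)$$
where
$$w_{it} = \varepsilon_i + u_{it} \qquad (16.6.4)$$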
11 See in particular Jeffrey M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, MIT
Press, Cambridge, Mass., 2002, pp. 279–283.
12 Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, p. 633.
The composite error term wit consists of two components: εi , which is the cross-section,
or individual-specific, error component, and u it , which is the combined time series and
cross-section error component and is sometimes called the idiosyncratic term because it
varies over cross-section (i.e., subject) as well as time. The error components model (ECM)
is so named because the composite error term consists of two (or more) error components.
The usual assumptions made by the ECM are that
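$$\begin{aligned}
&\varepsilon_i \sim N(0, \sigma_\varepsilon^2) \\
&u_{it} \sim N(0, \sigma_u^2) \\
&E(\varepsilon_i u_{it}) = 0; \quad E(\varepsilon_i \varepsilon_j) = 0 \quad (i \neq j) \\
&E(u_{it} u_{is}) = E(u_{it} u_{jt}) = E(u_{it} u_{js}) = 0 \quad (i \neq j;\ t \neq s)
\end{aligned} \qquad (16.6.5)$$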
that is, the individual error components are not correlated with each other and are not autocorrelated
across both cross-section and time series units. It is also very important to note that the ECM
assumes wit is not correlated with any of the explanatory variables included in the model. Since εi is a
component of wit, it is possible that the latter is correlated with the explanatory variables. If that is
indeed the case, the ECM will result in inconsistent estimation of the regression coefficients.
Shortly, we will discuss the Hausman test, which will tell us in a given application if wit is correlated
with the explanatory variables, that is, whether ECM is the appropriate model.
Notice carefully the difference between FEM and ECM. In FEM each cross-sectional
unit has its own (fixed) intercept value, in all N such values for N cross-sectional units. In
ECM, on the other hand, the (common) intercept represents the mean value of all the
(cross-sectional) intercepts and the error component εi represents the (random) deviation
of each individual intercept from this mean value. Keep in mind, however, that εi is not directly
observable; it is what is known as an unobservable, or latent, variable.
As a result of the assumptions stated in Eq. (16.6.5), it follows that
$$E(w_{it}) = 0 \qquad (16.6.6)$$
$$\operatorname{var}(w_{it}) = \sigma_\varepsilon^2 + \sigma_u^2 \qquad (16.6.7)$$
Now if $\sigma_\varepsilon^2 = 0$, there is no difference between models (16.3.1) and (16.6.3) and we can
simply pool all the (cross-sectional and time series) observations and run the pooled regres-
sion, as we did in Eq. (16.3.1). This is true because in this situation there are either no
subject-specific effects or they have all been accounted for in the explanatory variables.
As Eq. (16.6.7) shows, the error term is homoscedastic. However, it can be shown that wit
and wis (t ≠ s) are correlated; that is, the error terms of a given cross-sectional unit at two different
points in time are correlated. The correlation coefficient, corr(wit, wis), is as follows:
$$\rho = \operatorname{corr}(w_{it}, w_{is}) = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + \sigma_u^2}; \quad t \neq s \qquad (16.6.8)$$
Notice two special features of the preceding correlation coefficient. First, for any given
cross-sectional unit, the value of the correlation between error terms at two different times
remains the same no matter how far apart the two time periods are, as is clear from
Eq. (16.6.8). This is in strong contrast to the first-order [AR(1)] scheme that we discussed
in Chapter 12, where we found that the correlation between periods declines over time.
Second, the correlation structure given in Eq. (16.6.8) remains the same for all cross-sectional
units; that is, it is identical for all subjects.
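Both features of Eq. (16.6.8) show up in a simulation of the composite error. The sketch assumes hypothetical values σε² = σu² = 1 (so ρ = 0.5), 5,000 cross-sectional units, 6 periods, and normal components; the sample correlation between wit and wi,t−k is about 0.5 whether k is 1, 3, or 5, rather than decaying as it would under an AR(1) scheme.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 5000, 6
sigma_eps, sigma_u = 1.0, 1.0                        # hypothetical variance components
eps = rng.normal(scale=sigma_eps, size=(N, 1))       # individual-specific component, constant over t
u = rng.normal(scale=sigma_u, size=(N, T))           # idiosyncratic component
w = eps + u                                          # composite error, Eq. (16.6.4)

rho_theory = sigma_eps**2 / (sigma_eps**2 + sigma_u**2)   # Eq. (16.6.8)

def corr_at_lag(k):
    """Sample correlation between w_it and w_i,t-k, pooled over units."""
    return np.corrcoef(w[:, k:].ravel(), w[:, :-k].ravel())[0, 1]

print(rho_theory, [round(corr_at_lag(k), 3) for k in (1, 3, 5)])
```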
If we do not take this correlation structure into account and estimate Eq. (16.6.3) by
OLS, the resulting estimators will be inefficient. The most appropriate method here is the
method of generalized least squares (GLS).
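The text does not go into the GLS mathematics, but for a balanced panel with known variance components the random effects GLS estimator can be computed by a well-known quasi-demeaning shortcut: subtract θ times each unit's mean from every variable (including the intercept column), where θ = 1 − √(σu²/(σu² + Tσε²)), and run OLS on the result. The sketch below assumes the variance components are known; in practice they must be estimated first (feasible GLS, for example with the Swamy and Arora method named in Table 16.5), which is not shown. The function name `re_gls` and the simulated data are hypothetical.

```python
import numpy as np

def re_gls(y, X, ids, sigma2_eps, sigma2_u):
    """Random effects GLS for a balanced panel via quasi-demeaning.
    Assumes known variance components; returns ([intercept, slopes...], theta)."""
    groups = np.unique(ids)
    T = int(np.sum(ids == groups[0]))                 # periods per unit (balanced panel assumed)
    theta = 1.0 - np.sqrt(sigma2_u / (sigma2_u + T * sigma2_eps))
    Z = np.column_stack([np.ones(len(y)), X])         # intercept column is transformed too
    zbar = np.array([Z[ids == g].mean(axis=0) for g in groups])
    ybar = np.array([y[ids == g].mean() for g in groups])
    pos = np.searchsorted(groups, ids)                # row -> unit index
    Zs = Z - theta * zbar[pos]
    ys = y - theta * ybar[pos]
    return np.linalg.lstsq(Zs, ys, rcond=None)[0], theta

# Hypothetical panel: 300 units, 5 periods, intercept 2, slope 1.5, sigma_eps^2 = sigma_u^2 = 1
rng = np.random.default_rng(6)
N, T = 300, 5
ids = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, 1))
y = 2.0 + 1.5 * X[:, 0] + rng.normal(size=N)[ids] + rng.normal(size=N * T)

b_re, theta = re_gls(y, X, ids, sigma2_eps=1.0, sigma2_u=1.0)
b_pooled = np.linalg.lstsq(np.column_stack([np.ones(N * T), X]), y, rcond=None)[0]
b_zero, theta0 = re_gls(y, X, ids, sigma2_eps=0.0, sigma2_u=1.0)
print(theta, b_re, b_pooled)
```

Setting σε² = 0 gives θ = 0 and reproduces pooled OLS, which matches the remark after Eq. (16.6.7); as Tσε² grows relative to σu², θ approaches 1 and the transformation approaches the within-group one.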
TABLE 16.5
Dependent Variable: TC
Method: Panel EGLS (Cross-section random effects)
Sample: 1–15
Periods included: 15
Cross-sections included: 6
Total panel (balanced) observations: 90
Swamy and Arora estimator of component variances
Firm    Effect
1       -270615.0
2        -87061.32
3        -21338.40
4        187142.9
5        134488.9
6         57383.00
We will not discuss the mathematics of GLS in the present context because of its complexity.13
Since most modern statistical software packages now have routines to estimate
ECM (as well as FEM), we will present the results for our illustrative example only. But
before we do that, it may be noted that we can easily extend Eq. (16.4.2) to allow for a random
error component to take into account variation over time (see Exercise 16.6).
The results of ECM estimation of the airline cost function are presented in Table 16.5.
Notice these features of the REM. The (average) intercept value is 107429.3. The (differential)
intercept values of the six entities are given at the bottom of the regression results.
Firm number 1, for example, has an intercept value which is 270615 units lower than the
common intercept value of 107429.3; the actual value of the intercept for this airline is
then −163185.7. On the other hand, the intercept value of firm number 6 is higher by 57383
units than the common intercept value; the actual intercept value for this airline is
(107429.3 + 57383), or 164812.3. The intercept values for the other airlines can be derived
similarly. However, note that if you add the (differential) intercept values of all the six airlines,
the sum is 0 (apart from rounding), as it should be (why?).
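The intercept arithmetic can be reproduced directly from the firm effects reported in Table 16.5. The sketch simply adds each differential effect to the common intercept of 107429.3; the effects sum to zero up to rounding in the reported figures.

```python
common_intercept = 107429.3
firm_effects = {1: -270615.0, 2: -87061.32, 3: -21338.40,
                4: 187142.9, 5: 134488.9, 6: 57383.00}    # from Table 16.5

# Actual intercept of each airline = common intercept + its differential effect
for firm, effect in firm_effects.items():
    print(firm, round(common_intercept + effect, 2))

print("sum of effects:", round(sum(firm_effects.values()), 2))
```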
If you compare the results of the fixed-effect and random-effect regressions, you will see
that there are substantial differences between the two. The important question now is:
Which results are reliable? Or, to put it differently, which of the two models should we
choose? We can apply the Hausman test to shed light on this question.
The null hypothesis underlying the Hausman test is that the FEM and ECM estimators
do not differ substantially. The test statistic developed by Hausman has an asymptotic χ²
distribution. If the null hypothesis is rejected, the conclusion is that the ECM is not appropriate
because the random effects are probably correlated with one or more regressors. In
this case, FEM is preferred to ECM. For our example, the results of the Hausman test are
as shown in Table 16.6.
13 See Kmenta, op. cit., pp. 625–630.
TABLE 16.6
Correlated Random Effects - Hausman Test
Equation: Untitled
Test cross-section random effects
Test Summary            Chi-Sq. Statistic    Chi-Sq. d.f.    Prob.
Cross-section random    49.619687            3               0.0000
The Hausman test clearly rejects the null hypothesis, for the estimated χ² value for 3 df
is highly significant; if the null hypothesis were true, the probability of obtaining a chi-square
value of 49.62 or greater would be practically zero. As a result, we can
reject the ECM (REM) in favor of FEM. Incidentally, the last part of the preceding table
compares the fixed-effect and random-effect coefficients of each variable and, as the last
column shows, in the present example the differences are statistically significant.
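For readers who want to see what lies behind Table 16.6, the Hausman statistic compares the two coefficient vectors: H = d′(V_FE − V_RE)⁻¹d, with d = b_FE − b_RE and V the estimated covariance matrices of the slope coefficients, and H is referred to a χ² distribution with as many degrees of freedom as compared coefficients. The sketch below is a minimal implementation of that formula; the coefficient vectors and covariance matrices in the usage lines are hypothetical, and the χ² tail probability uses a standard closed-form recurrence for integer degrees of freedom so that only the standard library and NumPy are needed. In applications V_FE − V_RE can fail to be positive definite in finite samples, which this sketch does not handle.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, df a positive integer.
    Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2                    # Q(x; 2)
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1         # Q(x; 1)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def hausman(b_fe, b_re, V_fe, V_re):
    """Hausman statistic H = d' (V_fe - V_re)^(-1) d, d = b_fe - b_re, and its p-value."""
    d = np.asarray(b_fe) - np.asarray(b_re)
    H = float(d @ np.linalg.solve(np.asarray(V_fe) - np.asarray(V_re), d))
    return H, chi2_sf(H, len(d))

# Hypothetical inputs (not the airline estimates)
H, p = hausman([1.0, 2.0], [0.8, 2.1], np.diag([0.02, 0.05]), np.diag([0.01, 0.03]))
print(H, p)

# Tail probability for the statistic reported in Table 16.6 (3 df)
print(chi2_sf(49.619687, 3))
```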
14 T. Breusch and A. R. Pagan, “The Lagrange Multiplier Test and Its Application to Model Specification
in Econometrics,” Review of Economic Studies, vol. 47, 1980, pp. 239–253.
15 The following discussion draws on A. Colin Cameron and Pravin K. Trivedi, Microeconometrics:
Methods and Applications, Cambridge University Press, Cambridge, New York, 2005, Chapter 21.
Pooled Estimators
Assuming the slope coefficients are constant across subjects, if the error term in Eq. (16.3.1)
is uncorrelated with the regressors, pooled estimators are consistent. However, as noted
earlier, the error terms are likely to be correlated over time for a given subject. Therefore,
panel-corrected standard errors must be used for hypothesis testing. Make sure the
statistical package you use has this facility; otherwise, the computed standard errors may
be underestimated. It should be noted that if the fixed effects model is appropriate but we
use the pooled estimator, the estimated coefficients will be inconsistent.
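One common way to compute standard errors that allow for correlation over time within a subject is the cluster-robust (panel-robust) variance estimator, clustered by subject. The sketch below assumes that is what is wanted; a given package's "panel-corrected" option may use a different formula (and usually adds a small-sample correction that is omitted here). The simulated data are hypothetical: the subject effect is uncorrelated with the regressor, so pooled OLS is consistent, but both the regressor and the error are persistent within subjects, and the conventional standard error comes out much smaller than the cluster-robust one.

```python
import numpy as np

def ols_with_cluster_se(y, X, clusters):
    """Pooled OLS with conventional and cluster-robust (by subject) standard errors.
    Cluster-robust variance: (X'X)^-1 [sum_g X_g' e_g e_g' X_g] (X'X)^-1, no small-sample correction."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se_conv = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        s = X[clusters == g].T @ e[clusters == g]     # score summed within subject g
        meat += np.outer(s, s)
    se_cluster = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return b, se_conv, se_cluster

rng = np.random.default_rng(7)
N, T = 200, 10
ids = np.repeat(np.arange(N), T)
x = rng.normal(size=N)[ids] + 0.3 * rng.normal(size=N * T)                   # persistent regressor
y = 1.0 + 0.5 * x + rng.normal(size=N)[ids] + 0.5 * rng.normal(size=N * T)   # error with subject effect
X = np.column_stack([np.ones(N * T), x])

b, se_conv, se_cluster = ols_with_cluster_se(y, X, ids)
print(b, se_conv[1], se_cluster[1])
```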
4. If N is large and T is small, and if the assumptions underlying ECM hold, ECM estima-
tors are more efficient than FEM.
5. Unlike FEM, ECM can estimate coefficients of time-invariant variables such as gender
and ethnicity. The FEM does control for such time-invariant variables, but it cannot
estimate them directly, as is clear from the LSDV or within-group estimator models. On
the other hand, FEM controls for all time-invariant variables (why?), whereas ECM can
estimate only such time-invariant variables as are explicitly introduced in the model.
Despite the Hausman test, it is important to keep in mind the warning sounded by
Johnston and DiNardo. In deciding between fixed effects or random effects models, they
argue that, “ . . . there is no simple rule to help the researcher navigate past the Scylla of
fixed effects and the Charybdis of measurement error and dynamic selection. Although
they are an improvement over cross-section data, panel data do not provide a cure-all for all
of an econometrician’s problems.”17
EXAMPLE 16.1 Productivity and Public Investment
To find out why productivity has declined and what the role of public investment is, Alicia
Munnell studied productivity data for the 48 continental U.S. states for the 17 years from
1970 to 1986, for a total of 816 observations.19 Using these data, we estimated the pooled
regression in Table 16.7. Note that this regression does not take into account the panel
nature of the data.
The dependent variable in this model is GSP (gross state product), and the explanatory
variables are PRIVCAP (private capital), PUBCAP (public capital), WATER (water utility
capital), and UNEMP (unemployment rate). Note: L stands for natural log.
17 Jack Johnston and John DiNardo, Econometric Methods, 4th ed., McGraw-Hill, 1997, p. 403.
18 For further details and concrete applications, see Paul D. Allison, Fixed Effects Regression Methods for Longitudinal Data, Using SAS, SAS Institute, Cary, North Carolina, 2005.
19 The Munnell data can be found at www.aw-bc.com/murray.
All the variables have the expected signs, and all are individually, as well as collectively,
statistically significant, provided the assumptions of the classical linear regression
model hold.
To take into account the panel dimension of the data, in Table 16.8 we estimated a fixed
effects model using 47 dummies for the 48 states to avoid falling into the dummy-variable
trap. To save space, we only present the estimated regression coefficients and not the
individual dummy coefficients. But it should be added that all of the 47 state dummies were
individually highly statistically significant.
TABLE 16.8
Dependent Variable: LGSP
Method: Panel Least Squares
Sample: 1970–1986
Periods included: 17
Cross-sections included: 48
Total panel (balanced) observations: 816
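The LSDV regression with 47 state dummies and the within-group estimator give identical slope coefficients, so they are two routes to the same fixed effects model. The Python sketch below checks this on simulated data with the same shape as the Munnell panel (48 states, 17 years); the data, the coefficient values, and the correlation between the state effects and the regressor are assumptions chosen for illustration, not the Munnell data. It also fits the pooled regression, which in this hypothetical design is biased because the state effects are correlated with the regressor.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 48, 17  # same panel shape as the Munnell data; all values below are simulated
groups = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)
x = 0.8 * alpha[groups] + rng.normal(size=N * T)   # regressor correlated with the state effect
y = alpha[groups] + 0.5 * x + 0.1 * rng.normal(size=N * T)

# LSDV: common intercept plus N - 1 = 47 state dummies (state 0 is the base category),
# which avoids the dummy-variable trap.
D = (groups[:, None] == np.arange(1, N)[None, :]).astype(float)
X_lsdv = np.column_stack([np.ones(N * T), D, x])
slope_lsdv = np.linalg.lstsq(X_lsdv, y, rcond=None)[0][-1]

# Within estimator: demean y and x state by state, then regress without an intercept.
def demean(v):
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

xw, yw = demean(x), demean(y)
slope_within = (xw @ yw) / (xw @ xw)

# Pooled OLS ignores the state effects altogether.
X_pool = np.column_stack([np.ones(N * T), x])
slope_pooled = np.linalg.lstsq(X_pool, y, rcond=None)[0][1]

print("LSDV:", round(slope_lsdv, 6), " within:", round(slope_within, 6),
      " pooled:", round(slope_pooled, 6))
```

The LSDV and within slopes should agree to rounding error and lie close to the true value of 0.5, while the pooled slope should be well above it.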
You can see that there are substantial differences between the pooled regression and
the fixed-effects regression, casting doubt on the results of the pooled regression.
To see if the random effects model is more appropriate in this case, we present the
results of the random effects regression model in Table 16.9.
To choose between the two models, we use the Hausman test, which gives the results
shown in Table 16.10.
Since the estimated chi-square value is highly statistically significant, we reject the
hypothesis that there is no significant difference in the estimated coefficients of the two
models. It seems there is correlation between the error term and one or more regressors.
Hence, we can reject the random effects model in favor of the fixed effects model. Note,
however, that, as the last part of Table 16.10 shows, not all coefficients differ in the two
models. For example, there is not a statistically significant difference in the values of the
LUNEMP coefficient in the two models.
TABLE 16.10
Test Summary            Chi-Sq. Statistic   Chi-Sq. d.f.   Prob.
Cross-section random    42.458353           4              0.0000
Cross-section random effects test comparisons:
Variable    Fixed        Random       Var(Diff.)   Prob.
LPRIVCAP     0.267096     0.313980    0.000486     0.0334
LPUBCAP      0.714094     0.641926    0.000159     0.0000
LWATER       0.088272     0.130768    0.000054     0.0000
LUNEMP      -0.138854    -0.139820    0.000006     0.6993
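The Prob. column in the lower part of Table 16.10 can be reproduced, at least approximately, from the other three columns: for each variable the statistic (Fixed − Random)²/Var(Diff.) is compared with a chi-square distribution with 1 df. A minimal Python sketch using only the standard library follows; the function name `hausman_1df` is hypothetical.

```python
import math

# Fixed, Random, and Var(Diff.) columns copied from Table 16.10.
rows = {
    "LPRIVCAP": (0.267096, 0.313980, 0.000486),
    "LPUBCAP": (0.714094, 0.641926, 0.000159),
    "LWATER": (0.088272, 0.130768, 0.000054),
    "LUNEMP": (-0.138854, -0.139820, 0.000006),
}

def hausman_1df(b_fe, b_re, var_diff):
    """Chi-square(1) statistic and p value for a single coefficient difference."""
    stat = (b_fe - b_re) ** 2 / var_diff
    # For 1 df, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

results = {name: hausman_1df(*vals) for name, vals in rows.items()}
for name, (stat, p) in results.items():
    print(f"{name:9s} chi2 = {stat:8.3f}  p = {p:.4f}")
```

With the table's rounded inputs the sketch gives about 0.033 for LPRIVCAP and values near zero for LPUBCAP and LWATER, as in the table; for LUNEMP it gives roughly 0.69 rather than 0.6993, presumably because Var(Diff.) is printed to only six decimal places. The four individual statistics do not add up to the overall value of 42.458353, since the joint test uses the full covariance matrix of the coefficient differences, which the table does not report.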
EXAMPLE 16.2 Demand for Electricity in the USA
In their article, Maddala et al. considered the demand for residential electricity and natural
gas in 49 states in the USA for the period 1970–1990; Hawaii was not included in the
analysis.20 They collected data on several variables; these data can be found on the book’s
website. In this example, we will only consider the demand for residential electricity. We
first present the results based on the fixed effects estimation (Table 16.11) and then the
random effects estimation (Table 16.12), followed by a comparison of the two models.
TABLE 16.11
Dependent Variable: Log(ESRCBPC)
Method: Panel Least Squares
Sample: 1971–1990
Periods included: 20
Cross-sections included: 49
Total panel (balanced) observations: 980
where Log(ESRCBPC) = natural log of residential electricity consumption per capita (in
billion Btu), Log(RESRCD) = natural log of real 1987 electricity price, and Log(YDPC) =
natural log of real 1987 disposable income per capita.
Since this is a double-log model, the estimated slope coefficients represent elasticities.
Thus, holding other things the same, if real per capita income goes up by 1 percent, the
mean consumption of electricity goes up by about 1 percent. Likewise, holding other
things constant, if the real price of electricity goes up by 1 percent, the average
consumption of electricity goes down by about 0.6 percent. All the estimated elasticities are
statistically significant.
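In a double-log model the slope coefficient gives the approximate percentage change in the dependent variable for a 1 percent change in a regressor; the exact change implied by the model is (1 + Δx/100)^β − 1. The Python sketch below uses the rounded elasticities quoted in the text (about 1 for income and about −0.6 for price) as stand-ins for the Table 16.11 estimates, which are not reproduced here; the function name is hypothetical.

```python
# Rounded elasticities from the text, standing in for the Table 16.11 estimates.
income_elasticity = 1.0
price_elasticity = -0.6

def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact % change in y implied by a log-log model when x changes by pct_change_in_x %."""
    return ((1 + pct_change_in_x / 100) ** elasticity - 1) * 100

for dx in (1, 10):
    print(f"x up {dx:2d}%: income effect {pct_change_in_y(income_elasticity, dx):7.3f}%, "
          f"price effect {pct_change_in_y(price_elasticity, dx):7.3f}%")
```

For a 1 percent change the exact and approximate answers are essentially identical (a price elasticity of −0.6 implies a fall of about 0.595 percent); for a 10 percent change they start to diverge (about −5.56 percent rather than −6 percent).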
The results of the random effects model are shown in Table 16.12.
It seems that there is not much difference between the two models. But we can use the
Hausman test to find out whether this is so. The results of this test are shown in Table 16.13.
Although the coefficients of the two models in Tables 16.11 and 16.12 look quite similar,
the Hausman test shows that this is not the case. The chi-square value is highly statistically
significant. Therefore, we can choose the fixed effects model over the random effects model.
This example brings out the important point that when the sample size is large, in our case
980 observations, even small differences in the estimated coefficients of the two models can
be statistically significant. Thus, the coefficients of the Log(RESRCD) variable in the two
models look reasonably close, but statistically they are not.
20 G. S. Maddala, Robert P. Trost, Hongyi Li, and Frederick Joutz, “Estimation of Short-run and Long-run Elasticities of Demand from Panel Data Using Shrinkage Estimators,” Journal of Business and Economic Statistics, vol. 15, no. 1, January 1997, pp. 90–100.
TABLE 16.13
Correlated Random Effects - Hausman Test
Equation: Untitled
Test cross-section random effects
Test Summary            Chi-Sq. Statistic   Chi-Sq. d.f.   Prob.
Cross-section random    105.865216          2              0.0000
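The point about large samples can be illustrated with the single-coefficient form of the Hausman statistic, (b_FE − b_RE)²/Var(b_FE − b_RE). For estimators whose variances shrink in proportion to 1/n, holding the coefficient gap fixed makes the statistic grow in proportion to n. The numbers in the Python sketch below (a gap of 0.02 and a scaled variance of 0.25) are hypothetical and are not taken from Tables 16.11 to 16.13.

```python
import math

def chi2_1df_pvalue(stat):
    # P(chi2_1 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))

gap = 0.02          # hypothetical FE-minus-RE coefficient difference, held fixed
sigma2_diff = 0.25  # hypothetical variance of sqrt(n) times the difference
pvalues = []
for n in (98, 980, 9800):
    var_diff = sigma2_diff / n      # variance of the difference shrinks like 1/n
    stat = gap ** 2 / var_diff      # so the statistic grows in proportion to n
    p = chi2_1df_pvalue(stat)
    pvalues.append(p)
    print(f"n = {n:5d}  H = {stat:7.3f}  p = {p:.4f}")
```

The same coefficient gap that is nowhere near significant with about 100 observations becomes highly significant with about 10,000, which is why a "small" difference can still lead the Hausman test to reject.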
EXAMPLE 16.3 Beer Consumption, Income and Beer Tax
To assess the impact of beer tax on beer consumption, Philip Cook investigated the
relationship between the two, after allowing for the effect of income.21 His data pertain to 50
states and Washington, D.C., for the period 1975–2000. In this example we study the
relationship of per capita beer sales to tax rate and income, all at the state level. We
present the results of pooled OLS, fixed effects, and random effects models in tabular form in
Table 16.14. The dependent variable is per capita beer sales.
These results are interesting. Economic theory leads us to expect a negative
relationship between beer consumption and beer taxes, which is the case in all three
models. The negative income effect on beer consumption would suggest that beer is an
inferior good. An inferior good is one whose demand decreases as consumers’ income
rises. Maybe when their income rises, consumers prefer champagne!
For our purpose, what is interesting is the difference in the estimated coefficients.
Apparently there is not much difference in estimated coefficients between FEM and ECM.
As a matter of fact, the Hausman test produces a chi-square value of 3.4, which is not
significant for 2 df at the 5 percent level; the p value is 0.1783.
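The p value quoted here can be checked without special software: for an even number of degrees of freedom the upper-tail probability of the chi-square distribution has a closed form, which for 2 df reduces to exp(−x/2). The Python sketch below implements that closed form (in practice one would usually call a library routine such as `scipy.stats.chi2.sf`) and applies it to the Hausman statistics reported in this section; the function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x); this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):   # sum of (x/2)^j / j! for j = 0, ..., df/2 - 1
        term *= half / j
        total += term
    return math.exp(-half) * total

p_beer = chi2_sf_even_df(3.4, 2)         # beer example, 2 df
crit_5pct_2df = -2 * math.log(0.05)      # 5 percent critical value for 2 df
implied_stat = -2 * math.log(0.1783)     # statistic implied by the reported p value
print(round(p_beer, 4), round(crit_5pct_2df, 3), round(implied_stat, 3))
print(chi2_sf_even_df(42.458353, 4))     # Table 16.10
print(chi2_sf_even_df(105.865216, 2))    # Table 16.13
```

A statistic of exactly 3.4 gives a p value of about 0.183; the reported 0.1783 corresponds to a statistic of about 3.45, so the 3.4 in the text is presumably a rounded value. Either way the statistic is well below the 5 percent critical value of about 5.99, whereas the statistics in Tables 16.10 and 16.13 give p values that are effectively zero.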
The results based on OLS, however, are vastly different. The coefficient of the beer tax
variable, in absolute value, is much smaller than that obtained from FEM or ECM. The
income variable, although it has the negative sign, is not statistically significant, whereas
the other two models show that it is highly significant.
This example shows very vividly what could happen if we neglect the panel structure
of the data and estimate a pooled regression.
TABLE 16.14
Variable     OLS          FEM          REM
Constant     1.4192       1.7617       1.7542
             (24.37)      (52.23)      (39.22)
Beer tax     −0.0067      −0.0183      −0.0181
             (−2.13)      (−9.67)      (−9.69)
Income       −3.54e−6     −0.000020    −0.000019
             (−1.12)      (−9.17)      (−9.10)
R²           0.0062       0.0052       0.0052
Summary and Conclusions
1. Panel regression models are based on panel data. Panel data consist of observations on
the same cross-sectional, or individual, units over several time periods.
2. There are several advantages to using panel data. First, they increase the sample size
considerably. Second, by studying repeated cross-section observations, panel data are
better suited to study the dynamics of change. Third, panel data enable us to study
more complicated behavioral models.
3. Despite their substantial advantages, panel data pose several estimation and inference
problems. Since such data involve both cross-section and time dimensions, problems
that plague cross-sectional data (e.g., heteroscedasticity) and time series data (e.g.,
autocorrelation) need to be addressed. There are some additional problems as well,
such as correlation across individual units at the same point in time.
21 The data used here are obtained from the website of Michael P. Murray, Econometrics: A Modern Introduction, Pearson/Addison Wesley, Boston, 2006, but the original data were collected by Philip Cook for his book, Paying the Tab: The Costs and Benefits of Alcohol Control, Princeton University Press, Princeton, New Jersey, 2007.
4. There are several estimation techniques to address one or more of these problems. The
two most prominent are (1) the fixed effects model (FEM) and (2) the random effects
model (REM), or error components model (ECM).
5. In FEM, the intercept in the regression model is allowed to differ among individuals in
recognition of the fact that each individual, or cross-sectional, unit may have some special
characteristics of its own. To take into account the differing intercepts, one can use dummy
variables. The FEM using dummy variables is known as the least-squares dummy variable
(LSDV) model. FEM is appropriate in situations where the individual-specific intercept
may be correlated with one or more regressors. A disadvantage of LSDV is that it consumes
a lot of degrees of freedom when the number of cross-sectional units, N, is very large, in
which case we have to introduce N dummies (but suppress the common intercept term).
6. An alternative to FEM is ECM. In ECM it is assumed that the intercept of an individual
unit is a random drawing from a much larger population with a constant mean value. The
individual intercept is then expressed as a deviation from this constant mean value. One
advantage of ECM over FEM is that it is economical in degrees of freedom, as we do not
have to estimate N cross-sectional intercepts. We need only estimate the mean value of
the intercept and its variance. ECM is appropriate in situations where the (random)
intercept of each cross-sectional unit is uncorrelated with the regressors. Another advantage
of ECM is that we can introduce variables such as gender, religion, and ethnicity, which
remain constant for a given subject. In FEM we cannot do that because all such variables
are collinear with the subject-specific intercept. Moreover, if we use the within-group
estimator or first-difference estimator, all such time-invariant variables will be swept out.
7. The Hausman test can be used to decide between FEM and ECM. We can also use the
Breusch–Pagan test to see if ECM is appropriate.
8. Despite their increasing popularity in applied research, and despite the increasing
availability of such data, panel data regressions may not be appropriate in every situation.
One has to use some practical judgment in each case.
9. There are some specific problems with panel data that need to be borne in mind. The
most serious is the problem of attrition, whereby, for one reason or another, subjects of
the panel drop out over time so that over subsequent surveys (or cross-sections) fewer
original subjects remain in the panel. Even if there is no attrition, over time subjects may
refuse or be unwilling to answer some questions.
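Item 7 mentions the Breusch–Pagan test as a check on whether ECM is needed at all. For a balanced panel, its Lagrange multiplier statistic is commonly written LM = NT/(2(T − 1)) · [Σ_i(Σ_t e_it)² / Σ_i Σ_t e_it² − 1]², where the e_it are pooled OLS residuals; under the null hypothesis that the variance of the individual effects is zero it is approximately chi-square with 1 df. The Python sketch below computes it on simulated data with genuine random effects; the data-generating values and the function name `breusch_pagan_lm` are assumptions for illustration.

```python
import numpy as np

def breusch_pagan_lm(resid, N, T):
    """Breusch-Pagan LM statistic for H0: var(alpha_i) = 0 in a balanced panel.

    resid: pooled OLS residuals ordered subject by subject (N blocks of T rows).
    Under H0 the statistic is approximately chi-square with 1 df.
    """
    e = np.asarray(resid).reshape(N, T)
    ratio = (e.sum(axis=1) ** 2).sum() / (e ** 2).sum()
    return N * T / (2 * (T - 1)) * (ratio - 1) ** 2

# Simulated balanced panel with random individual effects (hypothetical values).
rng = np.random.default_rng(2)
N, T = 100, 5
x = rng.normal(size=(N, T))
alpha = rng.normal(size=(N, 1))                 # random intercept shift for each subject
y = 1.0 + 0.5 * x + alpha + rng.normal(size=(N, T))

# Pooled OLS on the stacked data; ravel() keeps the rows ordered subject by subject.
X = np.column_stack([np.ones(N * T), x.ravel()])
b = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
resid = y.ravel() - X @ b

lm = breusch_pagan_lm(resid, N, T)
print(round(lm, 2), "reject H0 at 5%" if lm > 3.841 else "no evidence of random effects")
```

Here the individual effects are large relative to the idiosyncratic error, so the statistic should far exceed the 5 percent critical value of 3.841, pointing away from the pooled regression and toward ECM.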
EXERCISES
Questions
16.1. What are the special features of (a) cross-section data, (b) time series data, and
(c) panel data?
16.2. What is meant by a fixed effects model (FEM)? Since panel data have both time and
space dimensions, how does FEM allow for both dimensions?
16.3. What is meant by an error components model (ECM)? How does it differ from
FEM? When is ECM appropriate? And when is FEM appropriate?
16.4. Is there a difference between LSDV, within-estimator, and first-difference models?
16.5. When are panel data regression models inappropriate? Give examples.
16.6. How would you extend model (16.4.2) to allow for a time error component? Write
down the model explicitly.
16.7. Refer to the data on eggs produced and their prices given in Table 1.1. Which model
may be appropriate here, FEM or ECM? Why?
16.8. For the investment data given in Table 1.2, which model would you choose—FEM
or REM? Why?
16.9. Based on the Michigan Income Dynamics Study, Hausman attempted to estimate
a wage, or earnings, model using a sample of 629 high school graduates, who
were followed for a period of six years, thus giving in all 3,774 observations. The
dependent variable in this study was the logarithm of wage, and the explanatory variables
were: age (divided into several age groups); unemployment in the previous year;
poor health in the previous year; self-employment; region of residence (for a graduate
from the South, South = 1 and 0 otherwise) and area of residence (for a graduate
from a rural area, Rural = 1 and 0 otherwise). Hausman used both FEM and ECM.
The results are given in Table 16.15 (standard errors in parentheses).
16.12. Continue with Exercise 16.11. Before deciding to run the pooled regression, you
want to find out whether the data are “poolable.” For this purpose you decide to use
the Chow test discussed in Chapter 8. Show the necessary calculations involved and
determine if the pooled regression makes any sense.
16.13. Use the investment data given in Table 1.6.
a. Estimate the Grunfeld investment function for each company individually.
b. Now pool the data for all the companies and estimate the Grunfeld investment
function by OLS.
c. Use LSDV to estimate the investment function and compare your results with
the pooled regression estimated in (b).
d. How would you decide between the pooled regression and the LSDV regression?
Show the necessary calculations.
16.14. Table 16.16 gives data on the hourly compensation rate in manufacturing in U.S.
dollars, Y (%), and the civilian unemployment rate, X (index, 1992 = 100), for
Canada, the United Kingdom, and the United States for the period 1980–2006.
Consider the model:
$$Y_{it} = \beta_1 + \beta_2 X_{it} + u_{it} \tag{1}$$
a. A priori, what is the expected relationship between Y and X? Why?
b. Estimate the model given in Eq. (1) for each country.
c. Estimate the model, pooling all of the 81 observations.
d. Estimate the fixed effects model.
e. Estimate the error components model.
f. Which is a better model, FEM or ECM? Justify your answer. (Hint: Apply the
Hausman test.)
16.15. Baltagi and Griffin considered the following gasoline demand function:*
$$\ln Y_{it} = \beta_1 + \beta_2 \ln X_{2it} + \beta_3 \ln X_{3it} + \beta_4 \ln X_{4it} + u_{it}$$
where Y = gasoline consumption per car; X2 = real income per capita; X3 = real
gasoline price; X4 = number of cars per capita; i = country code, for 18 OECD
countries in all; and t = time (annual observations from 1960–1978). Note: The values
in the table are already logged.
a. Estimate the above demand function pooling the data for all 18 of the countries
(a total of 342 observations).
b. Estimate a fixed effects model using the same data.
c. Estimate a random components model using the same data.
d. From your analysis, which model best describes the gasoline demand in the
18 OECD countries? Justify your answer.
16.16. The article by Subhayu Bandyopadhyay and Howard J. Wall, “The Determinants of
Aid in the Post-Cold War Era,” Review, Federal Reserve Bank of St. Louis,
November/December 2007, vol. 89, number 6, pp. 533–547, uses panel data to
estimate the responsiveness of aid to recipient countries’ economic and physical
needs, civil/political rights, and government effectiveness. The data are for
135 countries for three years. The article and data can be found at
http://research.stlouisfed.org/publications/review/past/2007 in the November/December
Vol. 89, No. 10 section. The data can also be found on the textbook website in
Table 16.18. Estimate the authors’ model (given on page 534 of their article) using
a random effects estimator. Compare your results with those of the pooled and fixed
effects estimators given by the authors in Table 2 of their article. Which model is
appropriate here, fixed effects or random effects? Why?
16.17. Refer to the airlines example discussed in the text. For each airline, estimate a time
series logarithmic cost function. How do these regressions compare with the fixed
effects and random effects models discussed in the chapter? Would you also
estimate 15 cross-section logarithmic cost functions? Why or why not?
* B. H. Baltagi and J. M. Griffin, “Gasoline Demand in the OECD: An Application of Pooling and Testing Procedures,” European Economic Review, vol. 22, 1983, pp. 117–137. The data for 18 OECD countries for the years 1960–1978 can be obtained from http://www.wiley.com/legacy/wileychi/baltagi/supp/Gasoline.dat, or from the textbook website, Table 16.17.