Advanced Panel Data Methods: Basic Econometrics
Introduction
Reference: Wooldridge, Chapter 14.1-2 (14.3 optional)
In the previous lecture (and in Chapter 13) we discussed how first differencing can be used to estimate the unobserved effects panel data model. In this chapter we cover two other methods that can be used for this purpose:
The fixed effects estimator
The random effects estimator
Throughout we assume we have panel data, i.e. data on the same individuals (or whatever) over time.
Because $a_i$ is unobserved, we can't include it in the regression. This may create omitted variable bias if $a_i$ is correlated with $x_{it}$. Possible exam question: (a) Why? (b) Please provide an example in which this might be a problem.
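The omitted variable problem can be illustrated with a small simulation. This is a minimal sketch, not part of the lecture: the panel size, the true slope of 1 and the 0.8 loading of $x_{it}$ on $a_i$ are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 500, 3, 1.0           # hypothetical panel size and true slope

a = rng.normal(size=N)             # unobserved effect a_i
# Assumption for illustration: x_it is positively correlated with a_i.
x = 0.8 * a[:, None] + rng.normal(size=(N, T))
u = rng.normal(size=(N, T))        # idiosyncratic error u_it
y = beta * x + a[:, None] + u

# Pooled OLS of y on a constant and x, leaving a_i in the error term.
X = np.column_stack([np.ones(N * T), x.ravel()])
b_pooled = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]

print(f"true slope: {beta:.2f}, pooled OLS slope: {b_pooled:.2f}")
```

Because $a_i$ raises both x and y in this design, the pooled OLS slope should come out well above the true value of 1.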
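Solution: Eliminate $a_i$ from the equation and then estimate the parameters of interest! As we have seen, one way of eliminating $a_i$ is to take first differences of the data:

$$\Delta y_{it} = \beta_1 \Delta x_{it} + \Delta u_{it}.$$

But first differencing is just one of many ways of eliminating $a_i$. An alternative method, which often works better in practice, is called the fixed effects transformation. Consider the model with a single explanatory variable:

$$y_{it} = \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, 2, \ldots, T. \tag{14.1}$$

Now, for each individual i, average this equation over time. We get:

$$\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i, \tag{14.2}$$

where $\bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}$, and so on.

Because $a_i$ is fixed (constant) over time, it appears in both (14.1) and (14.2). Now subtract (14.2) from (14.1):

$$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + u_{it} - \bar{u}_i,$$

which we shall write as

$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}, \tag{14.3}$$

where $\ddot{y}_{it} = y_{it} - \bar{y}_i$ is the time-demeaned data on y (and similarly for $\ddot{x}_{it}$ and $\ddot{u}_{it}$).

This is also called the within transformation. The important thing: The unobserved effect $a_i$ has disappeared.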
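The within transformation can be sketched in a few lines. The function name `within_slope` and the simulated data are assumptions introduced here for illustration; the sketch covers only the one-regressor, balanced-panel case.

```python
import numpy as np

def within_slope(y, x):
    """Within (FE) slope for one regressor; y and x are (N, T) arrays."""
    y_dd = y - y.mean(axis=1, keepdims=True)   # y_it minus ybar_i
    x_dd = x - x.mean(axis=1, keepdims=True)   # x_it minus xbar_i
    return (x_dd * y_dd).sum() / (x_dd ** 2).sum()

rng = np.random.default_rng(0)
N, T, beta = 500, 3, 1.0                       # hypothetical design
a = rng.normal(size=N)                         # unobserved effect, correlated with x
x = 0.8 * a[:, None] + rng.normal(size=(N, T))
y = beta * x + a[:, None] + rng.normal(size=(N, T))

b_fe = within_slope(y, x)
# Adding any individual-specific constant to y leaves the estimate unchanged,
# because time demeaning removes everything that is constant within i.
b_shifted = within_slope(y + 10.0 * a[:, None], x)
print(b_fe, b_shifted)
```

The two printed numbers should agree up to rounding, and both should be close to the true slope even though $a_i$ is correlated with $x_{it}$.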
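Simply use time demeaning on the dependent variable and on each explanatory variable, and then estimate the following equation using OLS:

$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it1} + \beta_2 \ddot{x}_{it2} + \cdots + \beta_k \ddot{x}_{itk} + \ddot{u}_{it}, \quad t = 1, 2, \ldots, T. \tag{14.5}$$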
We use the OLS method to estimate the parameters here. What do we need to assume for this approach to result in unbiased estimates of the $\beta$-parameters? Hint: Revisit MLR.1-MLR.4, Chapter 3.
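Note that each explanatory variable must vary over time for at least some individuals. Otherwise $\ddot{x}_{itj}$ is always equal to zero (yes?), i.e. there's no variation in this variable, so we can't estimate its coefficient.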
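A tiny numerical illustration of this point, using made-up values for two individuals over three periods (the variable names `educ` and `exper` are hypothetical here):

```python
import numpy as np

def time_demean(z):
    """Subtract each individual's time average; z has shape (N, T)."""
    return z - z.mean(axis=1, keepdims=True)

educ = np.array([[12.0, 12.0, 12.0],
                 [16.0, 16.0, 16.0]])    # constant over time for each individual
exper = np.array([[1.0, 2.0, 3.0],
                  [5.0, 6.0, 7.0]])      # changes over time

educ_dd = time_demean(educ)
exper_dd = time_demean(exper)
print(educ_dd)     # all zeros: nothing left to identify a coefficient
print(exper_dd)    # nonzero deviations from each individual's mean
```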
If we also assume that $u_{it}$ is homoskedastic and serially uncorrelated, we can use the usual formula for computing standard errors.
First, make sure the data have been declared to be panel data. We do this by using tsset:

. tsset fcode year
       panel variable:  fcode (strongly balanced)
        time variable:  year, 1987 to 1989
                delta:  1 unit

(Actually, this has already been done for this data set.)
. xtreg lscrap d88 d89 grant grant_1, fe

Fixed-effects (within) regression               Number of obs      =       162
Group variable: fcode                           Number of groups   =        54

R-sq:  within  = 0.2010                         Obs per group: min =         3
       between = 0.0079                                        avg =       3.0
       overall = 0.0068                                        max =         3

                                                F(4,104)           =      6.54
corr(u_i, Xb)  = -0.0714                        Prob > F           =    0.0001

      lscrap |     [95% Conf. Interval]
-------------+-------------------------
         d88 |    -.297309     .1368776
         d89 |   -.5113797     .0169741
       grant |   -.5510178     .0463881
     grant_1 |   -.8384239    -.0047551
       _cons |    .4631142     .7317539

F test that all u_i=0:   F(53, 104) = 24.66     Prob > F = 0.0000

These estimates are based on (14.5).
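The degrees of freedom reported in the output above can be checked by hand. Time demeaning uses up one degree of freedom per firm, so with N firms, T periods and k explanatory variables the residual degrees of freedom are NT - N - k rather than NT - k. With 162 observations, 54 firms and 4 explanatory variables:

```latex
\[
  df = NT - N - k = 162 - 54 - 4 = 104 ,
\]
```

which matches the denominator degrees of freedom in F(4,104) and F(53, 104). This matters mainly if the time-demeaned regression is run by hand with ordinary OLS software, which would otherwise use NT - k.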
Additional points
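While time-constant variables cannot be included by themselves in FE models, they can be interacted with variables that change over time. Can we use FE to estimate the return to education? We cannot estimate the return in any single year if educ does not vary over time, but interacting educ with year dummies lets us estimate how the return has changed relative to the base year.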
When we include a full set of year dummies (one for each year except the base year), we cannot also include a variable whose change over time is constant (e.g. age). Why?
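One way to see the answer is to check the rank of the time-demeaned regressor matrix. The sketch below uses a made-up balanced panel (3 individuals, 4 years, hypothetical starting ages); it is an illustration, not the lecture's data.

```python
import numpy as np

N, T = 3, 4
age0 = np.array([25.0, 31.0, 40.0])          # hypothetical ages in the base year
t = np.arange(T)
age = age0[:, None] + t[None, :]             # (N, T): rises by exactly 1 each year

# Year dummies for years 1..T-1 (year 0 is the base year): shape (N, T, T-1).
D = np.tile(np.eye(T)[:, 1:], (N, 1, 1))

# Time-demean everything, individual by individual.
age_dd = age - age.mean(axis=1, keepdims=True)
D_dd = D - D.mean(axis=1, keepdims=True)

# Stack into an (N*T) x T regressor matrix: T-1 year dummies plus age.
X = np.column_stack([D_dd.reshape(N * T, T - 1), age_dd.ravel()])
rank = np.linalg.matrix_rank(X)
print(X.shape[1], rank)                      # T columns, but rank only T-1
```

After demeaning, age is an exact linear combination of the year dummies (it equals the year index minus its mean for every individual), so its coefficient cannot be separated from the year effects.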
corr(u_i, Xb)  = 0.0991

         lwage |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
---------------+-----------------------------------------------------------------
   _Iyear_1981 |  -.0224158   .1458885    -0.15   0.878    -.3084431    .2636114
   _Iyear_1982 |  -.0057611   .1458558    -0.04   0.968    -.2917243    .2802021
   _Iyear_1983 |   .0104297   .1458579     0.07   0.943    -.2755377    .2963971
   _Iyear_1984 |   .0843743   .1458518     0.58   0.563    -.2015811    .3703297
   _Iyear_1985 |   .0497253   .1458602     0.34   0.733    -.2362465    .3356971
   _Iyear_1986 |   .0656064   .1458917     0.45   0.653    -.2204273    .3516401
   _Iyear_1987 |   .0904448   .1458505     0.62   0.535     -.195508    .3763977
          educ |          0  (omitted)
_IyeaXedu_1981 |   .0115854   .0122625     0.94   0.345    -.0124562    .0356271
_IyeaXedu_1982 |   .0147905   .0122635     1.21   0.228    -.0092533    .0388342
_IyeaXedu_1983 |   .0171182   .0122633     1.40   0.163    -.0069251    .0411615
_IyeaXedu_1984 |   .0165839   .0122657     1.35   0.176     -.007464    .0406319
_IyeaXedu_1985 |   .0237085   .0122738     1.93   0.053    -.0003554    .0477725
_IyeaXedu_1986 |   .0274123    .012274     2.23   0.026     .0033481    .0514765
_IyeaXedu_1987 |   .0304332   .0122723     2.48   0.013     .0063722    .0544942
       married |   .0548205   .0184126     2.98   0.003      .018721      .09092
         union |   .0829785   .0194461     4.27   0.000     .0448527    .1211042
         _cons |   1.362459   .0162385    83.90   0.000     1.330622    1.394296
---------------+-----------------------------------------------------------------
       sigma_u |  .37264193
       sigma_e |  .35335713
           rho |  .52654439   (fraction of variance due to u_i)

F test that all u_i=0:   F(544, 3799) = 8.09     Prob > F = 0.0000
Note: xi expands terms containing categorical variables into dummy variable sets.
        lscrap |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
---------------+-----------------------------------------------------------------
           d88 |  -.0802157   .1094751    -0.73   0.465     -.297309    .1368776
           d89 |  -.2472028   .1332183    -1.86   0.066    -.5113797    .0169741
         grant |  -.2523149    .150629    -1.68   0.097    -.5510178    .0463881
       grant_1 |  -.4215895      .2102    -2.01   0.047    -.8384239   -.0047551
_Ifcode_410538 |   3.905259   .4064064     9.61   0.000      3.09934    4.711178
_Ifcode_410563 |   4.717328   .4064064    11.61   0.000     3.911408    5.523247
_Ifcode_410565 |   4.443668   .4064064    10.93   0.000     3.637748    5.249587
_Ifcode_410566 |   4.621434   .4064064    11.37   0.000     3.815514    5.427353
_Ifcode_410567 |   2.279588   .4064064     5.61   0.000     1.473668    3.085507
_Ifcode_410577 |   3.423147   .4064064     8.42   0.000     2.617228    4.229066
_Ifcode_410592 |    6.12662   .4064064    15.08   0.000       5.3207    6.932539
_Ifcode_410593 |   2.934958   .4064064     7.22   0.000     2.129039    3.740878
_Ifcode_410596 |   4.761838   .4064064    11.72   0.000     3.955919    5.567757
Verify that the coefficients on the year dummies & grant variables are identical to the FE estimates shown earlier.
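The same equivalence can be checked on simulated data. The sketch below is an illustration under assumed values (30 individuals, 3 periods, true slope 2); it compares the within estimator with a regression on x, an intercept and N-1 individual dummies.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 30, 3                                  # hypothetical panel size
a = rng.normal(size=N)
x = 0.5 * a[:, None] + rng.normal(size=(N, T))
y = 2.0 * x + a[:, None] + rng.normal(size=(N, T))

# Within (FE) estimator from time-demeaned data.
x_dd = x - x.mean(axis=1, keepdims=True)
y_dd = y - y.mean(axis=1, keepdims=True)
b_within = (x_dd * y_dd).sum() / (x_dd ** 2).sum()

# Dummy variable regression: y on x, an intercept and N-1 individual dummies.
ids = np.repeat(np.arange(N), T)              # matches the row order of x.ravel()
dummies = (ids[:, None] == np.arange(1, N)[None, :]).astype(float)
X = np.column_stack([x.ravel(), np.ones(N * T), dummies])
b_lsdv = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][0]

print(b_within, b_lsdv)                       # equal up to rounding error
```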
Estimating the ai
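Sometimes we are interested in the estimated $a_i$. These can be computed as follows:

$$\hat{a}_i = \bar{y}_i - \hat{\beta}_1 \bar{x}_{i1} - \cdots - \hat{\beta}_k \bar{x}_{ik}, \quad i = 1, \ldots, N.$$

Stata can do this for us: after your xtreg regression, use

. predict ahati, u

If you are using a dummy variable regression, the estimated coefficients on the N-1 dummies are your estimates of $a_i$ (measured relative to the omitted base individual, whose effect is absorbed by the intercept).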
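The computation can be sketched as follows, assuming the estimate of each $a_i$ is the individual mean of y minus the FE slope times the individual mean of x. The data are simulated and all names are hypothetical. As a cross-check, the sketch uses a variant of the dummy variable regression with a full set of N dummies and no intercept, so that each dummy coefficient is directly comparable to the computed effect.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 20, 4                                  # hypothetical panel size
a = rng.normal(size=N)
x = 0.5 * a[:, None] + rng.normal(size=(N, T))
y = 2.0 * x + a[:, None] + rng.normal(size=(N, T))

# FE slope from time-demeaned data.
x_dd = x - x.mean(axis=1, keepdims=True)
y_dd = y - y.mean(axis=1, keepdims=True)
b_fe = (x_dd * y_dd).sum() / (x_dd ** 2).sum()

# a_hat_i = ybar_i - b_fe * xbar_i
a_hat = y.mean(axis=1) - b_fe * x.mean(axis=1)

# Cross-check: regress y on x and N individual dummies (no intercept).
ids = np.repeat(np.arange(N), T)
dummies = (ids[:, None] == np.arange(N)[None, :]).astype(float)
coef = np.linalg.lstsq(np.column_stack([x.ravel(), dummies]),
                       y.ravel(), rcond=None)[0]
print(np.allclose(a_hat, coef[1:]))           # dummy coefficients match a_hat
```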
Testing for serial correlation after FE estimation is difficult; the methods are beyond the scope of the course. Advice: If N is large and T is not too large (say, less than 20), then FE usually works better in practice than first differencing. A good approach in practice: obtain results for both estimators and compare. If the results do not differ very much, this is reassuring.
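A sketch of such a comparison, assuming the two estimators in question are fixed effects and first differencing (the two methods introduced so far). The data are simulated with a hypothetical true slope of 1, and neither regression includes an intercept or time dummies.

```python
import numpy as np

def fe_slope(y, x):
    """Within estimator for one regressor; y and x are (N, T) arrays."""
    y_dd = y - y.mean(axis=1, keepdims=True)
    x_dd = x - x.mean(axis=1, keepdims=True)
    return (x_dd * y_dd).sum() / (x_dd ** 2).sum()

def fd_slope(y, x):
    """First-difference estimator (no intercept) for one regressor."""
    dy = np.diff(y, axis=1)
    dx = np.diff(x, axis=1)
    return (dx * dy).sum() / (dx ** 2).sum()

rng = np.random.default_rng(3)

def simulate(N, T):
    a = rng.normal(size=N)
    x = 0.5 * a[:, None] + rng.normal(size=(N, T))
    y = 1.0 * x + a[:, None] + rng.normal(size=(N, T))
    return y, x

y2, x2 = simulate(200, 2)
y5, x5 = simulate(200, 5)
print(fe_slope(y2, x2), fd_slope(y2, x2))   # identical when T = 2
print(fe_slope(y5, x5), fd_slope(y5, x5))   # generally differ when T > 2
```

With T = 2 the two estimators coincide exactly in this setup; with more periods they generally differ, and similar estimates are the reassuring case described above.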
Unbalanced panels
So far we have focused on the case where we have the same number of time series observations for each firm (T is the same for all i), i.e. a balanced panel. The main reason is that it makes the exposition above a bit more user friendly. But in practice T often varies across individuals, in which case we have an unbalanced panel. Estimating the FE model with an unbalanced panel is just as easy as for a balanced panel.
However, you may want to think about why the panel is unbalanced, and whether this may lead to bias. For example, if you are analyzing the determinants of firm-level profitability, it may be a problem if the least profitable firms go out of business during the sampling period (attrition). Basically this creates a non-random sample. Addressing this particular problem is difficult and we will not go into further detail.
Indeed, I think it's fair to say that the vast majority of researchers using a panel data approach assume attrition will not bias the estimates.
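To illustrate why FE estimation is no harder with an unbalanced panel, the sketch below demeans each individual's data using that individual's own number of observations. The data are made up, there is no noise, and the helper name `group_demean` is hypothetical.

```python
import numpy as np

def group_demean(z, ids):
    """Subtract each individual's own time average, using its own T_i."""
    sums = np.bincount(ids, weights=z)
    counts = np.bincount(ids)
    return z - (sums / counts)[ids]

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])        # T_i = 3, 2 and 4
x = np.array([1.0, 2.0, 4.0, 0.0, 3.0, 2.0, 2.0, 5.0, 7.0])
a = np.array([10.0, -5.0, 3.0])                    # hypothetical unobserved effects
y = 1.5 * x + a[ids]                               # no noise, so the slope is recovered exactly

x_dd = group_demean(x, ids)
y_dd = group_demean(y, ids)
b_fe = (x_dd * y_dd).sum() / (x_dd ** 2).sum()
print(b_fe)
```

This covers only the mechanics. It says nothing about whether the reason for the missing periods (for example attrition) biases the estimates.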
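The random effects estimator

Start from the same unobserved effects model, now written with an intercept:

$$y_{it} = \beta_0 + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + a_i + u_{it}.$$

Suppose we are willing to assume that $a_i$ is uncorrelated with each explanatory variable in all time periods. We can then write

$$y_{it} = \beta_0 + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + v_{it},$$

where $v_{it} = a_i + u_{it}$ is the composite error term. Now, because $a_i$ is in the composite error term in each time period, the $v_{it}$ will be serially correlated. It can be shown that:

$$\mathrm{Corr}(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}, \quad t \neq s,$$

where $\sigma_a^2 = \mathrm{Var}(a_i)$ and $\sigma_u^2 = \mathrm{Var}(u_{it})$. The serial correlation can be removed by quasi-demeaning the data:

$$y_{it} - \theta \bar{y}_i = \beta_0 (1 - \theta) + \beta_1 (x_{it1} - \theta \bar{x}_{i1}) + \cdots + \beta_k (x_{itk} - \theta \bar{x}_{ik}) + (v_{it} - \theta \bar{v}_i),$$

where $\theta = 1 - [\sigma_u^2 / (\sigma_u^2 + T \sigma_a^2)]^{1/2}$.

Estimating this transformed equation using OLS gives you the random effects estimator.
The error term of this equation is serially uncorrelated.
It is OK to have time-constant explanatory variables.
The parameter $\theta$ is unknown and has to be estimated.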
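The claim that the transformed error is serially uncorrelated can be checked numerically. The sketch below assumes the standard quasi-demeaning parameter $\theta = 1 - [\sigma_u^2 / (\sigma_u^2 + T \sigma_a^2)]^{1/2}$ and uses hypothetical variance values; the function name `re_theta` is an assumption made here.

```python
import numpy as np

def re_theta(sigma_a2, sigma_u2, T):
    """Quasi-demeaning parameter for a balanced panel with T periods."""
    return 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))

sigma_a2, sigma_u2, T = 0.5, 1.0, 4            # hypothetical values
theta = re_theta(sigma_a2, sigma_u2, T)

# Covariance matrix of the composite errors (v_i1, ..., v_iT):
# sigma_u2 on the diagonal plus sigma_a2 in every cell.
Omega = sigma_u2 * np.eye(T) + sigma_a2 * np.ones((T, T))

# Quasi-demeaning as a matrix: v_it - theta * vbar_i.
M = np.eye(T) - theta * np.ones((T, T)) / T
Omega_star = M @ Omega @ M.T

print(theta)
print(np.round(Omega_star, 10))                # diagonal: no serial correlation left
```

Two limiting cases help with intuition: if $\sigma_a^2 = 0$ then $\theta = 0$ and the transformation leaves the data as they are (pooled OLS), while as $T \sigma_a^2$ grows $\theta$ approaches 1 and the transformation approaches full time demeaning (fixed effects).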
Random-effects GLS regression
Group variable: nr

R-sq:  within  = 0.1799
       between = 0.1860
       overall = 0.1830

corr(u_i, X)   = 0 (assumed)

       lwage |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
        educ |   .0918763   .0106597     8.62   0.000     .0709836    .1127689
       black |  -.1393767   .0477228    -2.92   0.003    -.2329117   -.0458417
        hisp |   .0217317   .0426063     0.51   0.610    -.0617751    .1052385
       exper |   .1057545   .0153668     6.88   0.000     .0756361    .1358729
     expersq |  -.0047239   .0006895    -6.85   0.000    -.0060753   -.0033726
     married |    .063986   .0167742     3.81   0.000     .0311091    .0968629
       union |   .1061344   .0178539     5.94   0.000     .0711415    .1411273
 _Iyear_1981 |    .040462   .0246946     1.64   0.101    -.0079385    .0888626
 _Iyear_1982 |   .0309212   .0323416     0.96   0.339    -.0324672    .0943096
 _Iyear_1983 |   .0202806    .041582     0.49   0.626    -.0612186    .1017798
 _Iyear_1984 |   .0431187   .0513163     0.84   0.401    -.0574595    .1436969
 _Iyear_1985 |   .0578155   .0612323     0.94   0.345    -.0621977    .1778286
 _Iyear_1986 |   .0919476   .0712293     1.29   0.197    -.0476592    .2315544
 _Iyear_1987 |   .1349289   .0813135     1.66   0.097    -.0244427    .2943005
       _cons |   .0235864   .1506683     0.16   0.876     -.271718    .3188907
-------------+-----------------------------------------------------------------
     sigma_u |  .32460315
     sigma_e |  .35099001
         rho |  .46100216   (fraction of variance due to u_i)
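The variance components reported above can be used to reproduce rho and to back out the implied quasi-demeaning parameter. This is a sketch under two assumptions: that the standard formula $\theta = 1 - [\sigma_u^2 / (\sigma_u^2 + T \sigma_a^2)]^{1/2}$ applies, and that T = 8 for every person (the number of observations per group reported in the FE output for the same data). In Stata's notation sigma_u is the standard deviation of $a_i$ and sigma_e that of $u_{it}$.

```python
import math

sigma_u = 0.32460315   # Stata's sigma_u: estimated std. dev. of a_i
sigma_e = 0.35099001   # Stata's sigma_e: estimated std. dev. of u_it
T = 8                  # assumption: balanced panel, 8 years per person

# Fraction of the composite error variance due to a_i (Stata's rho).
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)

# Implied quasi-demeaning parameter.
theta = 1.0 - math.sqrt(sigma_e**2 / (sigma_e**2 + T * sigma_u**2))

print(round(rho, 4), round(theta, 3))
```

The computed rho agrees with the value in the output, and theta comes out at roughly 0.64. A theta this far from zero means the RE transformation removes well over half of each individual's time average, placing RE between pooled OLS and FE.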
Compare to FE estimates:
Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1806                         Obs per group: min =         8
       between = 0.0005                                        avg =       8.0
       overall = 0.0635                                        max =         8

                                                F(10,3805)         =     83.85
corr(u_i, Xb)  = -0.1212                        Prob > F           =    0.0000

       lwage |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
        educ |          0  (omitted)
       black |          0  (omitted)
        hisp |          0  (omitted)
       exper |   .1321464   .0098247    13.45   0.000     .1128842    .1514087
     expersq |  -.0051855   .0007044    -7.36   0.000    -.0065666   -.0038044
     married |   .0466804   .0183104     2.55   0.011     .0107811    .0825796
       union |   .0800019   .0193103     4.14   0.000     .0421423    .1178614
 _Iyear_1981 |   .0190448   .0203626     0.94   0.350    -.0208779    .0589674
 _Iyear_1982 |   -.011322   .0202275    -0.56   0.576    -.0509798    .0283359
 _Iyear_1983 |  -.0419955   .0203205    -2.07   0.039    -.0818357   -.0021553
 _Iyear_1984 |  -.0384709   .0203144    -1.89   0.058    -.0782991    .0013573
 _Iyear_1985 |  -.0432498   .0202458    -2.14   0.033    -.0829434   -.0035562
 _Iyear_1986 |  -.0273819   .0203863    -1.34   0.179    -.0673511    .0125872
 _Iyear_1987 |          0  (omitted)
       _cons |    1.02764   .0299499    34.31   0.000     .9689201    1.086359
-------------+-----------------------------------------------------------------
     sigma_u |   .4009279
     sigma_e |  .35099001
         rho |  .56612236   (fraction of variance due to u_i)

F test that all u_i=0:   F(544, 3805) = 9.16     Prob > F = 0.0000