Panel Data
1. Module objective
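This module explores panel data modelling. Panel data models place strong emphasis on the structure of the data: because the same units are observed repeatedly, the data carry both cross-sectional and time series variation, and modelling that exploits both dimensions usually yields more accurate estimates than using either dimension alone. This module covers the structure of panel data, the pooled, fixed effect and random effect models, and the tests used to choose among them.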
2. Modelling structure
In general, data can enter a regression model in four different ways. These include:
Pooled data
Panel data
Panel data are cross-sectional data observed periodically or, equivalently, time series data observed for a cross-sectional set of units. Panel (also called pooled or longitudinal) data are thus a combination of cross-section and time series data, acquired by repeatedly observing the same units over a certain period of time. A typical panel data set therefore has both a cross-sectional dimension and a time series dimension: the same cross-sectional units (e.g. individuals, families, firms, cities, states) are observed over time.
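To make the two dimensions concrete, here is a minimal Python sketch that stores a small panel in "long" format (one record per unit-period pair) and reports N, T and whether every unit is observed in every period. The firm names, the values and the `(unit, period, Y, X)` record layout are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical panel in long format: (unit i, period t, Y_it, X_it)
panel = [
    ("firm_A", 1, 10.0, 2.0),
    ("firm_A", 2, 12.0, 3.0),
    ("firm_A", 3, 13.0, 3.5),
    ("firm_B", 1, 20.0, 5.0),
    ("firm_B", 2, 21.0, 5.5),
    ("firm_B", 3, 23.0, 6.0),
]


def panel_dimensions(records):
    """Return (N, T, balanced) for a list of (unit, period, y, x) records."""
    periods_by_unit = defaultdict(set)
    for unit, period, _y, _x in records:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values())
    # Balanced: every unit is observed in every period that appears in the data
    balanced = all(p == all_periods for p in periods_by_unit.values())
    return len(periods_by_unit), len(all_periods), balanced


print(panel_dimensions(panel))       # (2, 3, True)
print(panel_dimensions(panel[:-1]))  # (2, 3, False): firm_B lacks period 3
```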
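Model representation:
$$Y_{it} = \alpha_i + \beta X_{it} + u_{it}, \qquad i = 1, 2, \ldots, N; \quad t = 1, 2, \ldots, T$$
where $Y_{it}$ and $X_{it}$ are the dependent and explanatory variables for unit $i$ in period $t$, $\alpha_i$ is the intercept for unit $i$, $\beta$ is the slope coefficient and $u_{it}$ is the error term.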
If $\alpha_i = \alpha$ is fixed (constant) for all $i = 1, 2, \ldots, N$ and $u_{it} \sim \mathrm{iid}(0, \sigma^2)$ [i.e. zero mean and constant variance], then we can use the pooled data model, and the procedure is the usual regression estimated by OLS.
The corresponding null hypothesis is $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_N = \alpha$. If $H_0$ is accepted, we can use the pooled data model. This technique simply combines the cross-section and time series data into a new data set (called pooled data), without taking any account of cross-section or time series behaviour. If $H_0$ is rejected, we go for panel data analysis instead.
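The test of $H_0$ can be sketched as an F test that compares the residual sum of squares of the pooled model (one common intercept) with that of the model with a separate intercept for each unit:
$$F = \frac{(RSS_{\text{pooled}} - RSS_{\text{FE}})/(N-1)}{RSS_{\text{FE}}/(NT - N - K)}$$
The Python sketch below assumes a single regressor ($K = 1$) and hypothetical `(unit, period, Y, X)` records; the unit-intercept model is fitted through the equivalent within (demeaned) regression, which gives the same RSS. A large F relative to the $F(N-1,\, NT-N-K)$ critical value rejects $H_0$.

```python
from collections import defaultdict


def pooled_rss(records):
    """RSS of pooled OLS: Y_it = a + b*X_it + u_it (one common intercept)."""
    xs = [x for _, _, _, x in records]
    ys = [y for _, _, y, _ in records]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def fixed_effects_rss(records):
    """RSS of Y_it = a_i + b*X_it + u_it, fitted by demeaning within each unit; also returns N."""
    groups = defaultdict(list)
    for unit, _t, y, x in records:
        groups[unit].append((x, y))
    dx, dy = [], []
    for obs in groups.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        dx += [x - mx for x, _ in obs]
        dy += [y - my for _, y in obs]
    b = sum(u * v for u, v in zip(dx, dy)) / sum(u * u for u in dx)
    return sum((v - b * u) ** 2 for u, v in zip(dx, dy)), len(groups)


def poolability_f(records, k=1):
    """F statistic for H0: alpha_1 = ... = alpha_N (k = number of slope coefficients)."""
    rss_pooled = pooled_rss(records)
    rss_fe, n = fixed_effects_rss(records)
    nt = len(records)
    return ((rss_pooled - rss_fe) / (n - 1)) / (rss_fe / (nt - n - k))


# Hypothetical data: two units with clearly different intercepts
panel = [
    ("A", 1, 2.0, 1.0), ("A", 2, 3.1, 2.0), ("A", 3, 3.9, 3.0),
    ("B", 1, 6.0, 1.0), ("B", 2, 7.0, 2.0), ("B", 3, 8.1, 3.0),
]
print(round(poolability_f(panel), 2))  # about 2745.19, far above the 5% F(1, 3) value of about 10.13
```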
In the present set-up, we suppose that the null hypothesis is rejected and move on to panel data analysis. Panel data modelling has two different approaches: the fixed effect model (FEM) and the random effect model (REM).
In the fixed effect model, the intercept changes across individuals, over time, or both.
In the random effect model, all individual characteristics and cross-sectional specifics are captured in the residuals. So, in the REM, the residual has an individual component, a time series component and a component that varies over both.
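If the intercept and slope are the same for all units and all periods, then we can estimate the model by separating its time component, so that we have T regressions, each having N observations:
$$Y_{it} = \alpha + \beta X_{it} + u_{it}, \qquad i = 1, 2, \ldots, N, \quad \text{for each } t = 1, 2, \ldots, T$$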
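Running the T period-by-period regressions can be sketched in Python as one OLS fit per period on that period's N cross-sectional observations. This minimal sketch assumes one regressor and hypothetical `(unit, period, Y, X)` records. Under the assumption of a common intercept and slope, the per-period estimates should be similar; in the made-up data below they differ, which would cast doubt on that assumption.

```python
from collections import defaultdict


def ols_line(xs, ys):
    """Simple OLS of y on a constant and one regressor; returns (intercept, slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def period_by_period(records):
    """Fit one cross-sectional regression (N observations) for each period t."""
    by_period = defaultdict(lambda: ([], []))
    for _unit, t, y, x in records:
        by_period[t][0].append(x)
        by_period[t][1].append(y)
    return {t: ols_line(xs, ys) for t, (xs, ys) in sorted(by_period.items())}


# Hypothetical data: N = 3 units observed in T = 2 periods
panel = [
    ("A", 1, 2.0, 1.0), ("B", 1, 4.0, 2.0), ("C", 1, 6.0, 3.0),
    ("A", 2, 3.0, 1.0), ("B", 2, 4.0, 2.0), ("C", 2, 5.0, 3.0),
]
print(period_by_period(panel))  # {1: (0.0, 2.0), 2: (2.0, 1.0)}
```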
Here, we assume that the intercept ($\alpha$) and the properties of the residuals are the same across cross-sectional units and over time. This is, however, very rare in reality, so we should consider models in which the intercepts or the residuals change over time and across individuals.
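Let
$$Y_{it} = \alpha_i + \beta X_{it} + u_{it}, \qquad i = 1, 2, \ldots, N; \quad t = 1, 2, \ldots, T$$
Here, variations across individuals are captured in the intercepts $\alpha_i$; variations over time can be captured in the same way by adding period-specific intercepts.
Number of parameters: $N + K$ ($N$ intercepts and $K$ slope coefficients; $K = 1$ above)
Degrees of freedom $= NT - N - K$
The individual intercepts can be handled in either of two ways: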
Dummy variable regression (i.e. include a dummy variable for each cross-sectional unit, along with the other explanatory variables). This may cause estimation difficulties when N is large, because one extra parameter must be estimated for every cross-sectional unit.
First-difference estimator: each variable is differenced once over time, which removes the time-invariant intercepts, so we are effectively estimating the relationship between changes in the variables.
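A minimal Python sketch of the first-difference estimator for a single regressor: within each unit, consecutive observations are differenced, which cancels the unit's intercept, and the differenced Y is regressed on the differenced X without an intercept. The record layout and the data (generated from hypothetical intercepts 5 and -3 with slope 2) are assumptions for illustration.

```python
from collections import defaultdict


def first_difference_slope(records):
    """FD estimate of b in Y_it = a_i + b*X_it + u_it; a_i cancels when differencing."""
    series = defaultdict(list)
    for unit, t, y, x in records:
        series[unit].append((t, y, x))
    dx, dy = [], []
    for obs in series.values():
        obs.sort()  # order each unit's observations by period
        for (t0, y0, x0), (t1, y1, x1) in zip(obs, obs[1:]):
            if t1 - t0 == 1:  # difference only consecutive periods
                dx.append(x1 - x0)
                dy.append(y1 - y0)
    # OLS through the origin: the differenced model has no intercept
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


panel = [
    ("A", 1, 7.0, 1.0), ("A", 2, 9.0, 2.0), ("A", 3, 13.0, 4.0),
    ("B", 1, -3.0, 0.0), ("B", 2, 3.0, 3.0), ("B", 3, 5.0, 4.0),
]
print(first_difference_slope(panel))  # 2.0
```

Adding any constant to all of one unit's Y values leaves the estimate unchanged, which is exactly the sense in which the unobserved, time-constant effect is removed.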
Here, variations across individuals and over time are captured in the residuals: the random error is composed of an individual component, a time component and a component that varies over both.
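Let
$$Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}, \qquad \varepsilon_{it} = u_i + v_t + w_{it}$$
where $u_i$ is the error for the cross-section, $v_t$ is the error for the time series and $w_{it}$ is the error for both.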
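Step 1: Estimate the variances of the error components from the residuals of a preliminary regression.
Step 2: Using the sample variances estimated in Step 1, use GLS to estimate the parameters of the model.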
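The two steps can be sketched in Python as follows. To keep the example short, this sketch simplifies the error structure above to a one-way model ($v_t$ omitted), assumes a balanced panel with one regressor, and uses one common choice of variance estimators for Step 1 (within and between regressions, in the style of Swamy and Arora). Step 2 applies GLS through the equivalent quasi-demeaning transformation $Y_{it} - \theta \bar{Y}_i$, $X_{it} - \theta \bar{X}_i$ with $\theta = 1 - \sqrt{\sigma_w^2 / (\sigma_w^2 + T\sigma_u^2)}$. The data are hypothetical.

```python
import math
from collections import defaultdict


def _ols_slope(xs, ys):
    """Slope from OLS of ys on a constant and xs."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sum((x - xbar) ** 2 for x in xs)


def random_effects(records):
    """Two-step feasible GLS for Y_it = a + b*X_it + u_i + w_it.

    Assumes a balanced panel and one regressor; v_t is omitted (one-way model).
    """
    groups = defaultdict(list)
    for unit, _t, y, x in records:
        groups[unit].append((y, x))
    n, nt = len(groups), len(records)
    t_len = nt // n
    means = {u: (sum(y for y, _ in obs) / t_len, sum(x for _, x in obs) / t_len)
             for u, obs in groups.items()}

    # Step 1a: within regression -> estimate of var(w_it)
    dx = [x - means[u][1] for u, obs in groups.items() for _, x in obs]
    dy = [y - means[u][0] for u, obs in groups.items() for y, _ in obs]
    b_within = sum(a * c for a, c in zip(dx, dy)) / sum(a * a for a in dx)
    sigma2_w = sum((c - b_within * a) ** 2 for a, c in zip(dx, dy)) / (nt - n - 1)

    # Step 1b: between regression on unit means -> estimates var(u_i) + var(w_it)/T
    ybars = [m[0] for m in means.values()]
    xbars = [m[1] for m in means.values()]
    b_between = _ols_slope(xbars, ybars)
    a_between = sum(ybars) / n - b_between * sum(xbars) / n
    s2_between = sum((yb - a_between - b_between * xb) ** 2
                     for xb, yb in zip(xbars, ybars)) / (n - 2)
    sigma2_u = max(0.0, s2_between - sigma2_w / t_len)

    # Step 2: GLS via quasi-demeaning with weight theta
    theta = 1.0 - math.sqrt(sigma2_w / (sigma2_w + t_len * sigma2_u))
    xs_star = [x - theta * means[u][1] for u, obs in groups.items() for _, x in obs]
    ys_star = [y - theta * means[u][0] for u, obs in groups.items() for y, _ in obs]
    return {"theta": theta, "sigma2_u": sigma2_u, "sigma2_w": sigma2_w,
            "b_within": b_within, "b_between": b_between,
            "b_re": _ols_slope(xs_star, ys_star)}


# Hypothetical balanced panel: (unit, period, Y, X)
panel = [
    ("A", 1, 3.0, 1.0), ("A", 2, 5.2, 2.0), ("A", 3, 6.9, 3.0),
    ("B", 1, 7.1, 2.0), ("B", 2, 8.8, 3.0), ("B", 3, 13.2, 5.0),
    ("C", 1, 6.0, 4.0), ("C", 2, 8.1, 5.0), ("C", 3, 9.9, 6.0),
    ("D", 1, 1.2, 0.0), ("D", 2, 4.8, 2.0), ("D", 3, 7.1, 3.0),
]
print(random_effects(panel))
```

With $\theta = 0$ the estimator reduces to pooled OLS, and as $\theta \to 1$ it approaches the within (fixed effects) estimator; in this one-regressor case the RE slope always lies between the within and between slopes.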
Difference between random effect and fixed effect estimators
RE estimates are more efficient (or more precise) if αi is uncorrelated with the explanatory variables.
A balanced panel is a panel data set with observations for the same time periods for all individuals; otherwise, the panel is unbalanced. If a panel is unbalanced for reasons uncorrelated with $u_{it}$, the consistency of the FE estimator is not affected. The "attrition" problem: if an unbalanced panel results from a selection process related to $u_{it}$, then an endogeneity problem is present and needs to be dealt with using correction methods. This problem cannot be solved simply by deleting the units that have missing observations for some time periods.
Hausman test: the RE and FE estimates are compared. If they are statistically different, then the RE assumption (that $\alpha_i$ is uncorrelated with the explanatory variables) is probably invalid, and FE has to be used. Otherwise, RE is preferred because it is more efficient.
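For a single coefficient, the Hausman statistic reduces to $H = (\hat\beta_{FE} - \hat\beta_{RE})^2 / [\widehat{Var}(\hat\beta_{FE}) - \widehat{Var}(\hat\beta_{RE})]$, compared with a $\chi^2(1)$ critical value. The Python sketch below covers only that scalar special case (the general test uses the full coefficient vector and covariance matrices); the estimates and variances are made-up numbers.

```python
def hausman_scalar(b_fe, var_fe, b_re, var_re):
    """Hausman statistic for one coefficient; chi-square with 1 df under H0."""
    var_diff = var_fe - var_re  # FE is less efficient, so this should be positive
    if var_diff <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive in this scalar form")
    return (b_fe - b_re) ** 2 / var_diff


CHI2_1DF_5PCT = 3.841  # 5% critical value of the chi-square distribution with 1 df

# Hypothetical estimates and variances, for illustration only
h = hausman_scalar(b_fe=0.80, var_fe=0.010, b_re=0.60, var_re=0.004)
print(round(h, 3), "use FE" if h > CHI2_1DF_5PCT else "RE not rejected")  # 6.667 use FE
```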
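Breusch and Pagan test: this tests the null hypothesis that there are no random effects, i.e. that the variance of the individual-specific error component is zero. If the null is not rejected, the pooled OLS model is adequate.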
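A minimal Python sketch of the Breusch and Pagan LM statistic for a balanced panel, computed from pooled OLS residuals $e_{it}$:
$$LM = \frac{NT}{2(T-1)} \left[ \frac{\sum_{i}\left(\sum_{t} e_{it}\right)^2}{\sum_{i}\sum_{t} e_{it}^2} - 1 \right]^2$$
which is approximately $\chi^2(1)$ under the null of no random effects (5% critical value about 3.841). One regressor, the record layout and the data are assumptions for illustration.

```python
from collections import defaultdict


def pooled_residuals(records):
    """Residuals from pooled OLS of Y on a constant and X, grouped by unit."""
    xs = [x for _, _, _, x in records]
    ys = [y for _, _, y, _ in records]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    resid = defaultdict(list)
    for unit, _t, y, x in records:
        resid[unit].append(y - a - b * x)
    return dict(resid)


def breusch_pagan_lm(resid_by_unit):
    """LM statistic for H0: no random effects (balanced panel); chi-square(1) under H0."""
    n = len(resid_by_unit)
    t_len = len(next(iter(resid_by_unit.values())))
    sum_sq_of_sums = sum(sum(e) ** 2 for e in resid_by_unit.values())
    total_ss = sum(v * v for e in resid_by_unit.values() for v in e)
    return n * t_len / (2 * (t_len - 1)) * (sum_sq_of_sums / total_ss - 1) ** 2


# Hypothetical panel: (unit, period, Y, X)
panel = [
    ("A", 1, 2.0, 1.0), ("A", 2, 3.1, 2.0), ("A", 3, 3.9, 3.0),
    ("B", 1, 6.0, 1.0), ("B", 2, 7.0, 2.0), ("B", 3, 8.1, 3.0),
]
lm = breusch_pagan_lm(pooled_residuals(panel))
print(lm, "reject H0" if lm > 3.841 else "H0 not rejected")
```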
Independent variables:
Power; Education; Health; Transport; Research and Development; Domestic Investment; Profit; Risk
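The fixed effects regression model takes three forms: the within-groups fixed effects model, the first-difference fixed effects model and the least squares dummy variable (LSDV) fixed effects model. All three start from
$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \alpha_i + \varepsilon_{it}$$
where $\alpha_i$ is the unobserved effect of individual $i$, constant over time.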
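Taking deviations from each individual's means over time removes the unobserved effect $\alpha_i$, which is constant over time:
$$Y_{it} - \bar{Y}_i = \sum_{j=2}^{k} \beta_j \left(X_{jit} - \bar{X}_{ji}\right) + \left(\varepsilon_{it} - \bar{\varepsilon}_i\right)$$
This is known as the within-groups regression model because it explains the variations about the mean of the dependent variable in terms of the variations about the means of the explanatory variables for the group of observations relating to a given individual.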
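The within-groups transformation can be sketched in Python for a single regressor: demean Y and X within each individual, regress the deviations on each other without an intercept, then recover each individual's intercept as $\hat\alpha_i = \bar{Y}_i - \hat\beta \bar{X}_i$. The record layout and data are hypothetical, generated with intercepts 5 and -3 and slope 2, so the sketch should recover those values.

```python
from collections import defaultdict


def within_estimator(records):
    """Within-groups estimate of b in Y_it = a_i + b*X_it + e_it, plus each unit's a_i."""
    groups = defaultdict(list)
    for unit, _t, y, x in records:
        groups[unit].append((x, y))
    # Individual means over time: (mean X_i, mean Y_i)
    means = {u: (sum(x for x, _ in obs) / len(obs), sum(y for _, y in obs) / len(obs))
             for u, obs in groups.items()}
    # Deviations from each individual's own means remove the time-constant a_i
    num = sum((x - means[u][0]) * (y - means[u][1])
              for u, obs in groups.items() for x, y in obs)
    den = sum((x - means[u][0]) ** 2 for u, obs in groups.items() for x, _ in obs)
    b = num / den
    # Recover each individual's intercept from its means
    alphas = {u: my - b * mx for u, (mx, my) in means.items()}
    return b, alphas


panel = [
    ("A", 1, 7.0, 1.0), ("A", 2, 9.0, 2.0), ("A", 3, 13.0, 4.0),
    ("B", 1, -3.0, 0.0), ("B", 2, 3.0, 3.0), ("B", 3, 5.0, 4.0),
]
b, alphas = within_estimator(panel)
print(round(b, 6), {u: round(a, 6) for u, a in alphas.items()})  # 2.0 {'A': 5.0, 'B': -3.0}
```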
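The first-difference fixed effects model is as follows:
$$Y_{it} - Y_{i,t-1} = \sum_{j=2}^{k} \beta_j \left(X_{jit} - X_{ji,t-1}\right) + \left(\varepsilon_{it} - \varepsilon_{i,t-1}\right)$$
Here, the unobserved effect is eliminated by subtracting the observation for the previous time period from the observation for the current time period, for all time periods.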
Here, the unobserved effect is brought explicitly into the model:
$$Y_{it} = \sum_{j=2}^{k} \beta_j X_{jit} + \sum_{h=1}^{N} \alpha_h Z_h + \varepsilon_{it}$$
where $Z_h$ is a dummy variable equal to 1 for an observation relating to individual $h$ and 0 otherwise (no separate common intercept is included, to avoid perfect collinearity with the dummies). Formally, the unobserved effect is treated as the coefficient of the individual-specific dummy variable: the coefficient $\alpha_i$ represents the fixed effect on the dependent variable $Y_{it}$ for individual $i$.
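The LSDV form can be sketched in Python as ordinary least squares with one dummy per individual and no common intercept, solving the normal equations with a small Gaussian elimination routine. It is a standard result that the LSDV slope equals the within-groups estimate, and the dummy coefficients are the individual intercepts. One regressor, the `(unit, period, Y, X)` layout and the data (hypothetical intercepts 5 and -3, slope 2) are assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lsdv(records):
    """OLS of Y on one dummy per unit (no common intercept) and X; returns (alphas, slope)."""
    units = sorted({u for u, _, _, _ in records})
    rows, ys = [], []
    for u, _t, y, x in records:
        rows.append([1.0 if u == v else 0.0 for v in units] + [x])  # Z_1..Z_N, X
        ys.append(y)
    k = len(rows[0])
    # Normal equations (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve(xtx, xty)
    return dict(zip(units, coef[:-1])), coef[-1]


panel = [
    ("A", 1, 7.0, 1.0), ("A", 2, 9.0, 2.0), ("A", 3, 13.0, 4.0),
    ("B", 1, -3.0, 0.0), ("B", 2, 3.0, 3.0), ("B", 3, 5.0, 4.0),
]
alphas, slope = lsdv(panel)
print(round(slope, 6), {u: round(a, 6) for u, a in alphas.items()})  # 2.0 {'A': 5.0, 'B': -3.0}
```

With many individuals the normal equations grow with N, which is the estimation difficulty mentioned earlier for dummy variable regression.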
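Note that when the variables of interest are constant for each individual, fixed effects regression is not an effective tool, because such variables cannot be included: they are removed by the within and first-difference transformations and are perfectly collinear with the individual dummies. The alternative approach is the random effects regression model, which requires two conditions. First, the $Z_i$ should be drawn randomly from a given distribution. This may well be the case if the individual observations constitute a random sample from a given population. If this is the case, the $\alpha_i$ may be treated as random variables drawn from a given distribution, and we can write the model as follows:
$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + u_{it}$$
where $u_{it} = \alpha_i + \varepsilon_{it}$ and $\alpha_i$ has zero mean (any non-zero mean is absorbed into $\beta_1$).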
The second condition is that the $Z_i$ variables are distributed independently of all of the $X_j$ variables. If this is not the case, $\alpha_i$, and hence $u_{it}$, will be correlated with the $X_j$ variables, and the random effects estimates will be biased and inconsistent. We would then have to use fixed effects estimation instead, even if the first condition seems to be satisfied.
BPL 6.95
1. In panel data
2. While dealing with random effects model the most appropriate procedure to be adopted is
a) OLS procedure
b) GLS procedure
b) T distribution
c) Chi‐square distribution
d) F distribution
1. In 1985, neither Florida nor Georgia had laws banning open alcohol containers in vehicle passenger compartments. By 1990, Florida had passed such a
law, but Georgia had not.
a) Suppose you can collect random samples of the driving-age population in both states, for 1985 and 1990. Let arrest be a binary variable equal to unity if a
person was arrested for drunk driving during the year. Without controlling for any other factors, write down a linear probability model that allows you to test
whether the open container law reduced the probability of being arrested for drunk driving. Which coefficient in your model measures the effect of the law?
b) Why might you want to control for other factors in the model? What might some of these factors be?
2. What is meant by an error components model (ECM)? How does it differ from FEM? When is ECM appropriate? And when is FEM appropriate?
3. In order to determine the effects of collegiate athletic performance on applicants, you collect data on applications for a sample of Division I colleges for
1985, 1990, and 1995.
a) What measures of athletic success would you include in an equation? What are some of the timing issues?
c) Write an equation that allows you to estimate the effects of athletic success on the percentage change in applications. How would you estimate this
equation? Why would you choose this method?
4. Suppose that, for one semester, you can collect the following data on a random sample of college juniors and seniors for each class taken: a standardized
final exam score, percentage of lectures attended, a dummy variable indicating whether the class is within the student's major, cumulative grade point
average prior to the start of the semester, and SAT score.
a) Why would you classify this data set as a cluster sample? Roughly how many observations would you expect for the typical student?
b) If you pool all of the data together and use OLS, what are you assuming about unobserved student characteristics that affect performance and attendance
rate? What roles do SAT score and prior GPA play in this regard?
c) If you think SAT score and prior GPA do not adequately capture student ability, how would you estimate the effect of attendance on final exam
performance?