Ecotrics (PR) Panel Data Reference

This document provides an introduction to panel data analysis. It defines panel data as data collected on the same individuals, firms, or geographic units over multiple time periods. This allows the modeling of changes within units over time and controls for time-invariant characteristics. The document contrasts panel data with pooled cross-sectional data, where different units are observed each period. It then gives examples of panel data models for production functions and training program effects. Finally, it discusses challenges like endogeneity that arise in panel data analysis.

Uploaded by

Arka Das


UNIT 8 INTRODUCTION TO PANEL DATA

Structure

8.0 Objectives
8.1 Introduction
8.2 Panel Data Models
8.2.1 Pooled Cross Section Data
8.2.2 Panel Data
8.2.3 Advantages of Panel Data Over Pooled Data
8.3 Linear Static Panel Data Model
8.3.1 Chow Test
8.4 Fixed Effect Versus Random Effect Panel Models
8.4.1 Fixed Effect Model
8.4.2 Random Effect Model
8.4.3 Policy Relevant Inference
8.5 Let Us Sum Up
8.6 Key Words
8.7 Suggested Books for Further Reading
8.8 Answers/Hints to Check Your Progress Exercises

8.0 OBJECTIVES

After reading this unit, you will be able to:

• differentiate between panel data and pooled data;
• state the distinctive features of time series data, cross-section data and panel data;
• explain, with illustrations, the panel data models;
• explain the problem of endogeneity in a static panel data model;
• illustrate the framework of a ‘pooled cross section data’;
• elucidate the features of ‘panel data’;
• contrast the advantages of ‘panel data’ over ‘pooled data’;
• define the terms ‘Static Panel Data Model’ and ‘Dynamic Panel Data Model’;
• specify the steps involved in performing a Chow Test;
• discuss the features of the ‘fixed effect (FE) model’ and the ‘random effect (RE) model’ in panel regressions;
• present a comparative profile of the relative contexts in which the FE or the RE model is appropriate to adopt; and
• outline, with illustration, the ‘policy relevant inference’ that could be drawn from panel data models.

8.1 INTRODUCTION

In applications, econometricians often use either pure cross-sectional data or pure
time series data. Cross-sectional data are collected for different sample units at the
same point of time (e.g. NSSO’s 5-yearly data on manufacturing firms). Pure time
series data, on the other hand, are collected at different points of time for the same
sample unit or set of units (e.g. GDP). The sample units in a pure time series data
set could themselves be cross-sectional in nature.

Data sets that have both cross-sectional and time series dimensions are nowadays
very common in empirical research. Such data sets, often used for policy analysis,
take the form of either a pooled cross section or a panel. An independently pooled
cross section is obtained by random sampling from a large population at different
points of time (usually, but not necessarily, in different years). From a statistical
standpoint, such data sets have an important feature: they consist of independently
sampled observations. Independent sampling plays a key role in the analysis of
cross-sectional data because, among other things, it rules out correlation in the
error terms across different observations. An independently pooled cross section
differs from a single random sample in that sampling from the population at
different points of time likely leads to observations that are not identically
distributed. For instance, the distributions of wages and education have changed
over time in most countries.

A panel data or longitudinal data set, by contrast, consists of a time series for each
cross-sectional unit in the data set. Panel data are collected on the same individuals,
firms or geographical units over specified periods of time. The key difference
between panel data and pooled data is that, in the case of panel data, the same
cross-sectional units are followed over a given time period. In the case of pooled
data, different cross-sectional units are observed in each time period. Thus, the
main features of the three types of data are:

a) Time Series Data: Many observations (large T) on as few as one unit (small N),
e.g. stock price trends, aggregate national statistics.

b) Pooled Cross Sections: Two or more independent samples of many units (large N)
are drawn from the same population at different time periods.

c) Panel Data: Observations of multiple phenomena obtained over multiple time
periods (small T) on many cross-section units (large N).
Hence, to use the methods of panel data analysis, we need information on the same
cross-section units over a given period of time. Such data are more difficult to obtain
than a pooled cross section, where the sampled units can differ from period to period.
But observing the same cross-sectional units repeatedly over time helps us control
for certain ‘unobserved characteristics’ of those units.

8.2 PANEL DATA MODELS

Let us now illustrate panel data models with some instances. Suppose that the
population consists of all manufacturing firms in a country operating during a given
three-year period. The production function describing output in the population of
firms can be specified as:

$$\log(\text{output}_{it}) = \theta_t + \beta_1 \log(\text{labour}_{it}) + \beta_2 \log(\text{capital}_{it}) + \beta_3\, \text{spillover}_{it} + \text{quality}_i + u_{it} \tag{8.1}$$

Here, spillover is a measure of foreign firm concentration in the region containing
the firm. The term quality refers to unobserved factors (e.g. managerial or work
quality) affecting productivity. Note that quality is firm-specific and constant over
time, and that it has the same effect in each time period, while $u_{it}$ changes
across both time and firms. Each firm is randomly chosen from the population of all
manufacturing firms. Thus, in a panel regression with a specification like (8.1), ‘i’
indexes the cross-section unit and ‘t’ indexes time. In analysing a panel data set,
our aim is to capture this time-constant, firm-specific factor as an unobserved
effect. The error term ‘u’ represents the unobserved shocks in each time period.
The parameter $\theta_t$ is a separate intercept for each time period, allowing
aggregate productivity to change over time. The coefficients of the regressors are
assumed to be constant.

Another context we can consider relates to the effect of a training programme for
employees (or any similar programme, like providing mid-day meals to children in
school) on subsequent performance. Its specification can be written as:

$$\log(\text{performance}_{it}) = \theta_t + z_{it}\gamma + \delta_1\, \text{prog}_{it} + c_i + u_{it} \tag{8.2}$$

where ‘i’ indicates the individual, ‘t’ indicates the time period and $\theta_t$ is the
time-varying intercept. $z_{it}$ is the set of observable characteristics that affect
performance and may also be correlated with programme participation. $c_i$ indicates
the ability of the individual. Now, suppose that at t=1 no one has participated in the
programme, so that $\text{prog}_{i1}=0$ for all i. Then some individuals are chosen to
participate in the programme, and subsequent performance is observed for two groups:
the group that underwent the training and the group that did not. The group that
participates in the training programme is defined as the ‘treatment group’ and the
other as the ‘control group’. In period t=1 no one receives the treatment; in t=2 the
treatment group receives the training while the control group does not. If individual
‘i’ can choose whether to participate in the programme, then participation can be
correlated with $c_i$, the inherent ability (or proactive initiative) of the individual.
This is identified in the literature as the problem of ‘self-selection’. The important
issue in a panel model like (8.2) is whether the unobserved factors relevant to
productivity are correlated with the observable factors. Another issue is whether we
can assume, at any time point t, that the unobserved effect is uncorrelated with the
error terms of other time periods. Both issues arise, for example, in estimating the
effect of job training on productivity and thus on subsequent wages. Correlation
between the explanatory variables and the unobservables is known as the problem of
‘endogeneity’. The above example shows how self-selection can lead to endogeneity.

8.2.1 Pooled Cross Section Data

Pooled cross-sectional data are obtained by collecting random samples from a large
population independently of each other at different points of time. Panel data sets,
by contrast, have both cross-sectional and time series features (they consist of time
series data for each statistical unit in the cross section). For instance, consider two
cross-sectional household surveys, one taken in 1985 and one in 1990. In 1985, a
random sample of households was surveyed on variables like income, savings, family
size, etc. In 1990, a new random sample of households was taken using the same
survey questions. To increase our sample size, we can form a pooled cross section by
combining the two years. Pooling cross sections from different years is also an
effective way of analysing the effects of a new government policy. The idea is to
collect data from the years before and after a key policy change. As an example,
consider data on housing prices taken in 1993 and 1995, i.e. before and after a
reduction in property taxes effected in 1994. Suppose we have data on 250 houses for
1993 and on 270 houses for 1995. One method of arranging such a data set is given in
Table 8.1. Observations 1 through 250 correspond to the houses sold in 1993, and
observations 251 through 520 correspond to the 270 houses sold in 1995. A pooled
cross section is analysed much like a standard cross section, except that we often
need to account for secular differences in the variables across time. In fact, in
addition to increasing the sample size, the point of a pooled cross-sectional analysis
is often to see how a key relationship has changed over time. With large N and small
T, one may introduce a separate intercept for each time period.

Table 8.1: Pooled Data on Houses Sold

Source: Wooldridge (5th Edition)

In India, many surveys of individuals, households and firms are repeated at regular
intervals, such as the periodic surveys of the National Sample Survey Organization
(NSSO). For these surveys, the NSSO draws a random sample of households at
five-year intervals. If a random sample is drawn in each time period, pooling the
resulting random samples gives us an independently pooled cross section. One reason
for using independently pooled cross sections is to increase the sample size.

8.2.2 Panel Data

The unique characteristic of panel data structure is that each cross section unit is
followed over a certain period of time. Panel data sets are fairly easy to collect for
districts, cities, states, and countries. Hence, policy analysis is greatly enhanced by
using panel data sets. For the econometric analysis of panel data, we cannot
assume that the observations are independently distributed across time. For
instance, unobserved factors (such as ability) that affect someone’s wage in 2010
will also affect that person’s wage in 2011. Likewise, unobserved factors that
affect a city’s crime rate in 2015 will also affect that city’s crime rate in 2020. For
this reason, special models and methods have been developed to analyse panel data.

In using panel data in an econometric study, it is important to know how the data
should be stored. We must be careful to arrange the data so that the different time
periods for the same cross-sectional unit (person, firm, city, and so on) are easily
linked. For instance, let us suppose that the data set is on cities for two different
years. For most purposes, the best way to enter the data is to have two records for
each city, one for each year. The first record for each city corresponds to the early
year, and the second record is for the later year. These two records should be
adjacent. Therefore, a data set for 100 cities and two years will contain 200 records.
The first two records are for the first city in the sample, the next two records are for
the second city, and so on.

The above method of data arrangement makes it easy to obtain the differences in
the two records for each city and store them in a pooled cross-sectional manner for
an analysis of the differencing estimation. Most of the two-period panel data sets
are stored in this way. We use a direct extension of this scheme for panel data sets
with more than two time periods. A second way of organising the two periods of a
panel data set is to have only one record per cross-sectional unit. This requires two
entries for each variable, one for each time period. Creating the differences from
T1 to T2 is then easy. Placing the data in one record, however, does not allow for
a pooled analysis by using the two time periods on the original data. Also, this
method of organisation does not work for panel data sets with more than two time
periods. Table 8.2 presents a two-year panel data set on crime and related statistics
for 150 cities. Cities are numbered as 1,2,…,150. Just as in a pure cross section,
the ordering in the cross section of a panel data set does not matter. We could use
the city name in place of a number. But it is often useful to have both.
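
The first storage scheme described above (two adjacent records per city) and the differencing it makes easy can be sketched in a few lines of Python. This is only an illustration: the city identifiers, years and figures are made up, and the helper name `first_differences` is hypothetical rather than taken from any library.

```python
# Hypothetical two-year panel in "long" format: two adjacent records per city.
# All numbers are invented for illustration.
records = [
    {"city": 1, "year": 1986, "crime": 80.0, "unem": 8.2},
    {"city": 1, "year": 1990, "crime": 74.0, "unem": 6.9},
    {"city": 2, "year": 1986, "crime": 55.0, "unem": 5.1},
    {"city": 2, "year": 1990, "crime": 61.0, "unem": 6.3},
]


def first_differences(rows, unit="city", time="year", variables=("crime", "unem")):
    """Return one record per unit holding later-year minus earlier-year values."""
    # Sorting guarantees that the two records of a unit are adjacent,
    # with the early year first.
    ordered = sorted(rows, key=lambda r: (r[unit], r[time]))
    diffs = []
    for early, late in zip(ordered[0::2], ordered[1::2]):
        if early[unit] != late[unit]:
            raise ValueError("each unit needs exactly two records")
        row = {unit: early[unit]}
        for v in variables:
            row["d_" + v] = late[v] - early[v]
        diffs.append(row)
    return diffs


# One differenced record per city: a cross section of changes.
diffs = first_differences(records)
```

The differenced records form a single cross section, which is why this layout suits the differencing estimation mentioned above while still allowing a pooled analysis on the original records.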

Table 8.2: Panel Data on Crime and Unemployment by City

Source: Wooldridge (5th Edition)

8.2.3 Advantages of Panel Data Over Pooled Data

Because panel data require replication of the same units over time, panel data sets,
especially those on individuals, households, and firms, are more difficult to obtain
than pooled cross sections. Not surprisingly, observing the same units over time
leads to several advantages over cross-sectional data or even pooled cross-sectional
data. The benefit that we will focus on is of having multiple observations on the
same units which allows us to control for certain unobserved characteristics of
individuals, firms, etc. As we will see, the use of more than one observation can
facilitate causal inference in situations where inferring causality would be difficult
if only a single cross section were available. A second advantage of panel data is
that it allows us to study the importance of lags in the behaviour or the result of
decision making. This information can be significant because many economic
policies can be expected to have an impact only after some time has passed. It
follows that a further advantage of panel data is that we can observe the ‘before
and after’ effects of a treatment on the same individual. Panel data also make it
possible to isolate the effects of the treatment from other factors affecting the
outcome.

Panel data, obtained by combining cross-sectional and time series data, capture both
the differences between cross-sectional units and the dynamics within each unit.
They have several other advantages over cross-sectional and time series data.
Cross-sectional data may be viewed as a panel with T=1 and time series data as a
panel with N=1. Hence, panel data, combining both dimensions, provide more degrees
of freedom and more sample variability than either cross-sectional or time series
data alone. This improves the efficiency of econometric estimates.

Evaluating the effectiveness of a programme by using a cross-sectional sample
typically suffers from the fact that those receiving treatment are different from those
without. In other words, one does not simultaneously observe what happens to an
individual when she receives the treatment and when she does not. An individual is
observed as either receiving treatment or not receiving treatment. Estimating the
effect from the difference between the treatment group and the control group could
suffer from two sources of bias: (i) selection bias due to differences in observable
factors between the treatment and control groups and (ii) selection bias due to
endogeneity of participation in treatment.

It is also frequently argued that the real reason one finds (or does not find) certain
effects is that the model specification ignores variables that are correlated with the
included explanatory variables. Panel data contain information on both the
inter-temporal dynamics and the individuality of the entities. This allows one to
control for the effects of missing or unobserved variables.

By pooling random samples drawn from the same population, but at different points
in time, we can get more precise estimators and test statistics with higher power.
Pooling is helpful in this regard only in so far as the relationship between the
dependent variable and at least some of the independent variables remains constant
over time. Using pooled cross sections raises a statistical complication, viz. the
populations in the two periods could have different distributions. To reflect the fact
that the populations may have different distributions in different time periods, we
allow the intercept to differ across periods. This is accomplished by including dummy
variables for all but one year, with the earliest year in the sample usually chosen as
the base year. Sometimes, the pattern of coefficients on the year dummy variables
could itself be of interest.
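
The use of a year dummy to let the intercept differ across periods can be illustrated with a small pooled regression. The sketch below is an assumption-laden toy example: the two samples are invented and noise-free (so the coefficients can be recovered exactly), the 1993/1995 labels merely echo the housing example, and `ols` is a hypothetical helper that solves the normal equations using only the Python standard library.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Two independent (invented) samples: the base year and a later year.
x_1993 = [1.0, 2.0, 3.0, 4.0]
x_1995 = [2.0, 3.0, 5.0, 6.0]

# Regressors: constant, dummy for the later year, x.
X = [[1.0, 0.0, x] for x in x_1993] + [[1.0, 1.0, x] for x in x_1995]
# Outcome built with base intercept 1.0, intercept shift 0.5, common slope 2.0.
y = [1.0 + 0.5 * d + 2.0 * x for _, d, x in X]

intercept, year_shift, slope = ols(X, y)
```

The coefficient on the dummy (`year_shift`) measures how far the later year's intercept lies from that of the base year, while the slope is restricted to be the same in both years.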

Check Your Progress 1 [answer in about 50-100 words]

1) Distinguish between time-series data and cross-section data.

2) Differentiate between pooled data and panel data.

3) State the two main advantages of ‘panel data’ over ‘pooled data’.

4) What is a ‘statistical complication’ in pooling two cross sections? How is this
complication dealt with in practice?

8.3 LINEAR STATIC PANEL DATA MODEL

Suppose that for each cross-section unit we collect data on the same set of variables
for T time periods. Let X be a vector of k exogenous variables which affect Y. At any
time point ‘t’, the population model is:

$$Y_{it} = X_{it}\beta + c_i + u_{it}, \quad t = 1, 2, \dots, T, \quad i = 1, 2, \dots, N \tag{8.3}$$

where $c_i$ is the unobserved effect and $u_{it}$ is the random error term. (8.3) is a
panel regression with ‘i’ as an indicator of the cross-section unit and ‘t’ as an
indicator of time. The most commonly used method for estimating the parameters
$\beta$ is ‘ordinary least squares’ (OLS). OLS assumes that the explanatory variables
are exogenous, i.e. uncorrelated with the random error term. The primary motivation
for using panel data is to solve the omitted variables problem. In panel data models,
we use the time dimension to account for the unobserved effect (like quality in the
example considered above). We assume that the ‘unobserved effects’ are random
variables. This is an instance of a linear static panel data model. It is a static model
because all explanatory variables are dated contemporaneously with Y, i.e. they take
their period-t values. In contrast, in a dynamic panel data model, one or more lags
of the dependent variable are included among the regressors as a ‘partial adjustment
mechanism’. In this unit, we discuss only static panel data models. You may however
note that a dynamic panel model, with one lagged dependent variable and a single
regressor X, is defined as:

$$y_{it} = \delta_1 + \rho\, y_{i,t-1} + X_{it}\delta_2 + c_i + \varepsilon_{it} \tag{8.4}$$

where $c_i$ is the unit-specific unobserved effect and $\varepsilon_{it}$ is the overall
random error term.

8.3.1 Chow Test

The Chow test examines whether the parameters of one group of data are equal to
those of the other groups. Simply put, the test checks whether the data can be pooled.
If only the intercepts are found to differ across groups, the model becomes a ‘fixed
effect model’. Let us consider two groups from a model like
$y = \alpha + \beta x + \varepsilon$ as follows:

$$y = \alpha_1 + \beta_1 x + \varepsilon_1 \quad \text{for } n_1 \text{ observations (group 1)} \tag{8.5}$$

$$y = \alpha_2 + \beta_2 x + \varepsilon_2 \quad \text{for } n_2 \text{ observations (group 2)} \tag{8.6}$$

The null hypothesis is $\alpha_1 = \alpha_2$ and $\beta_1 = \beta_2$. If the null
hypothesis is rejected, the two groups differ in their intercepts, their slopes or
both, and hence the data are not poolable. The Chow test is simply an F test used to
determine whether a multiple regression function differs across two groups. We can
also apply this test to two different time periods. The ‘sum of squared residuals’
(SSR) obtained from the pooled estimation for both groups combined is designated the
‘restricted SSR’ ($SSR_r$). The unrestricted SSR ($SSR_{ur}$) is the sum of the two
SSRs obtained for the two groups separately. A Chow test can also be computed for
more than two time periods. In such cases, we first estimate the restricted model by
running a pooled regression, which allows a separate intercept for each time period
but imposes common slopes, and obtain $SSR_r$. We then run a separate regression for
each of the T time periods to obtain the sum of squared residuals for each period.
The unrestricted sum of squared residuals is then obtained as
$SSR_{ur} = SSR_1 + SSR_2 + \dots + SSR_T$. If there are k explanatory variables
(excluding the intercept and the time dummies) and T time periods, then we are
imposing $(T-1)k$ restrictions, and $(T + Tk)$ parameters are estimated in the
unrestricted models. Hence, if $n = n_1 + n_2 + \dots + n_T$ is the total number of
observations, the ‘degrees of freedom’ (df) for the F test are $(T-1)k$ and
$(n - T - Tk)$. We compute the F statistic as usual, i.e.:

$$F = \frac{(SSR_r - SSR_{ur}) / [(T-1)k]}{SSR_{ur} / (n - T - Tk)} \tag{8.7}$$

You may simply note at this stage that, as with any F test based on sums of squared
residuals, this test is not robust to heteroscedasticity.
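
The computation in (8.7) can be sketched as follows. This is a minimal illustration under stated assumptions, not a general implementation: it handles only k = 1 regressor, it takes the restricted model to have period-specific intercepts and one common slope (which matches the $(T-1)k$ restrictions counted above), the data are invented, and the function names `ssr_simple` and `chow_f` are hypothetical.

```python
def ssr_simple(x, y):
    """SSR from an OLS regression of y on an intercept and a single regressor x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return sum((yi - ybar - b * (xi - xbar)) ** 2 for xi, yi in zip(x, y))


def chow_f(groups):
    """Chow F statistic for a list of (x, y) samples, one per period, with k = 1."""
    T, k = len(groups), 1
    n = sum(len(x) for x, _ in groups)
    # Unrestricted SSR: a separate regression in every period.
    ssr_ur = sum(ssr_simple(x, y) for x, y in groups)
    # Restricted SSR: own intercept per period, common slope.
    # Demeaning within each period absorbs the period intercepts.
    sxx = sxy = syy = 0.0
    for x, y in groups:
        xbar, ybar = sum(x) / len(x), sum(y) / len(y)
        sxx += sum((xi - xbar) ** 2 for xi in x)
        sxy += sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        syy += sum((yi - ybar) ** 2 for yi in y)
    ssr_r = syy - sxy ** 2 / sxx
    df1, df2 = (T - 1) * k, n - T - T * k
    f_stat = ((ssr_r - ssr_ur) / df1) / (ssr_ur / df2)
    return f_stat, df1, df2


# Invented data for two periods with visibly different slopes.
x = [1.0, 2.0, 3.0, 4.0]
y_period1 = [2.1, 3.9, 6.2, 7.8]
y_period2 = [1.0, 4.1, 6.9, 10.2]

f_stat, df1, df2 = chow_f([(x, y_period1), (x, y_period2)])
```

The resulting statistic would be compared with the critical value of the F distribution with `df1` and `df2` degrees of freedom; a large value leads to rejection of the hypothesis that the slopes are equal across periods.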

8.4 FIXED EFFECT VERSUS RANDOM EFFECT PANEL MODELS

With panel data, the most commonly estimated models are the fixed effects and the
random effects models. Let us therefore focus first on the major differences
between these two types of models. Several considerations affect the choice
between the two types of models. For this, first of all, one has to identify the nature
of the variables that have been omitted from the model. If we have reason to
believe that there are no omitted variables, or we believe that the omitted variables
are uncorrelated with the explanatory variables in the model, then a ‘random
effects’ (RE) model is probably the best. It will produce unbiased estimates of the
coefficients, use all the data available and produce the smallest standard errors. On
the other hand, if there are omitted variables, and these variables are correlated with
the explanatory variables in the model, then ‘fixed effect’ (FE) models provide a
means for controlling the ‘omitted variable bias’. In a fixed-effects model,
‘subjects’ serve as their own controls. The idea is that whatever effects the omitted
variables have on the subject at one time, they will also have the same effect at a
later time. In this sense, their effect will be ‘constant’ or ‘fixed’. However, for
this to be true, the omitted variables must have time-invariant values with
time-invariant effects. By time-invariant values, we mean that the value of the variable
does not change across time. Gender and race are obvious instances, but this can
also include the ‘educational level’ of the respondent.

Second, one needs to consider the variability within subjects (cross-section units).
If subjects change little across time, a fixed effects model may not work very well.
This is because there needs to be within-subject variability if we are to use subjects
as their own controls. If there is little variability within subjects, the standard
errors from a fixed effects model could be too large. Random effects models, by
contrast, will often have smaller standard errors. But the trade-off is that their
coefficients are more likely to be biased.

Third, one needs to decide whether one wants to estimate the effect of variables
whose values do not change across time. With fixed effects models, we do not
estimate the effect of variables whose values do not change across time. Rather,
we control for them or ‘partial them out’. This is similar to an experiment with
random assignment. Though the RE models estimate the effect of time-invariant
variables, the estimates could be biased because we are not controlling for omitted
variables. For a clearer description, let us consider a situation where $y$ and
$x = (x_1, x_2, \dots, x_k)$ are observable random variables with a linear
relationship of the form:

$$y = \alpha + x\beta + c \tag{8.8}$$

where ‘c’ is the unobservable random variable. We are interested in the partial effect
of each observable explanatory variable $x_j$ while holding ‘c’ constant. Our interest
is to estimate the vector $\beta$. If ‘c’ is uncorrelated with x, then ‘c’ is just another
unobserved factor uncorrelated with the explanatory variables. If
$\mathrm{Cov}(x_j, c) \neq 0$ for some j, then we cannot consistently estimate $\beta$
by OLS with ‘c’ left in the error term.

8.4.1 Fixed Effect Model

Fixed effects (FE) thus capture characteristics that are constant over time for each
individual. These are variables like sex or ethnicity, which do not change over time
(or, like age, change at a constant rate). FE explores the relationship between the
predictor variables (i.e.
explanatory or independent variables) and outcome variables (i.e. the dependent
variable). The relationship between them is explored within an entity (country,
person, company, etc.). Each entity has its own individual characteristics that may
or may not influence the predictor variables. For instance, being male or female
could influence opinion on a certain issue, the political system of a particular
country could have some effect on trade or GDP, the business practices of a
company may influence its stock price, etc. When using FE, we assume that
something within the individual may impact, or bias, the predictor and therefore we
might wish to control for this. This is the rationale behind the assumption of the
correlation between entity’s error term and predictor variables. FE removes the
effect of those time-invariant characteristics so that we can assess the net effect of
the predictors on the outcome variable. Another important assumption of the FE
model is that the time-invariant characteristics are unique to the individual and are
not correlated with other individuals’ characteristics. In other words, each entity
is different and therefore the entity’s error term and the constant (which captures
individual characteristics) are not correlated with the others. If the error terms are
correlated, then the FE model is not suitable. In that case, we need to model that
relationship using the RE model.

The FE model allows the unobserved individual effects to be correlated with the
included variables. We can therefore model the differences between units as
parametric shifts of the regression function. This could be viewed as applying only
to the cross-sectional units in the study and not for the additional units outside the
sample. For instance, an inter-country comparison may include the full set of
countries for which it is reasonable to assume that the model is constant. If the
individual effects are strictly uncorrelated with the regressors, then it might be
appropriate to model the individual specific constant terms as randomly distributed
across the cross-sectional units.
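
How FE removes a time-invariant unobserved effect that is correlated with a regressor can be seen in a small simulation. The sketch below rests on assumptions that go beyond this unit: it uses the within transformation (subtracting each unit's time average), which is one standard way of computing the FE estimate but is not spelled out in the text, and the sample sizes, variances, seed and true slope are all invented.

```python
import random

random.seed(0)
N, T, BETA = 500, 4, 2.0  # hypothetical panel dimensions and true slope

# Simulate y_it = BETA * x_it + c_i + u_it, with x_it correlated with c_i.
units = []
for _ in range(N):
    c = random.gauss(0.0, 1.0)                           # unobserved, time-invariant effect
    xs = [c + random.gauss(0.0, 1.0) for _ in range(T)]  # regressor depends on c
    ys = [BETA * x + c + random.gauss(0.0, 0.5) for x in xs]
    units.append((xs, ys))


def slope(xs, ys):
    """OLS slope from a regression of ys on an intercept and xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


# Pooled OLS ignores c_i, which therefore stays in the error term.
pooled_x = [x for xs, _ in units for x in xs]
pooled_y = [y for _, ys in units for y in ys]
b_pooled = slope(pooled_x, pooled_y)

# Within transformation: demeaning by unit wipes out c_i.
within_x, within_y = [], []
for xs, ys in units:
    xbar, ybar = sum(xs) / T, sum(ys) / T
    within_x += [x - xbar for x in xs]
    within_y += [y - ybar for y in ys]
b_fe = slope(within_x, within_y)
```

In this design the pooled slope is pushed away from the true value because the regressor is correlated with the omitted effect, whereas the within estimate uses only variation inside each unit and should lie close to `BETA`.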

8.4.2 Random Effect Model

The random effects (RE) model is useful when we have reason to believe that the
unobserved effect is uncorrelated with all the explanatory variables. In such a
situation, the parameters could be consistently estimated by using a single cross
section, so panel data are not strictly needed for consistency. But using a single
cross section disregards much useful information in the other time periods. We can
therefore use the data in a pooled OLS procedure, i.e. just run OLS of the dependent
variable on the explanatory variables and the time dummies. This, too, produces
consistent estimators of the parameters under the RE assumption. But it ignores the
fact that the composite error, which contains the same unobserved effect in every
time period, is serially correlated across time. We can use the generalised least
squares (GLS) method to deal with this serial correlation.
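
The source of this serial correlation can be written out explicitly. The following is a sketch under assumptions not stated in the unit: $u_{it}$ has constant variance $\sigma_u^2$, is serially uncorrelated and is uncorrelated with $c_i$, whose variance is $\sigma_c^2$. The symbol $v_{it}$ for the composite error is introduced here purely for illustration.

```latex
v_{it} = c_i + u_{it}, \qquad
\operatorname{Var}(v_{it}) = \sigma_c^2 + \sigma_u^2, \qquad
\operatorname{Cov}(v_{it}, v_{is}) = \sigma_c^2 \quad (t \neq s),
\qquad
\operatorname{Corr}(v_{it}, v_{is}) = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_u^2} .
```

Under these assumptions the correlation between the composite errors of any two periods is the same and is positive whenever $\sigma_c^2 > 0$, which is the structure that GLS exploits.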

RE assumes that the unobserved effect is uncorrelated with all explanatory variables,
irrespective of whether the explanatory variables are fixed over time or not. Hence,
we can include a variable like education even if it does not change over time. But we
are then assuming that education is uncorrelated with the unobserved effect. In
applications of FE and RE, it is usually informative to compute the pooled OLS
estimates as well. Comparing the three sets of estimates can help us determine the
nature of the biases caused by leaving the unobserved effect in the error term. We
must, however, remember that, even if the unobserved effect is uncorrelated with all
explanatory variables in all time periods, the usual pooled OLS standard errors and
test statistics are generally invalid. This is because they ignore the often
substantial serial correlation in the composite errors. It is, however, possible to
compute standard errors and test statistics that are robust to arbitrary serial
correlation (and heteroscedasticity) in the composite error. Note that the FE approach
allows for arbitrary correlation between the unobserved effect and the explanatory
variables, while the RE approach does not. Hence, the FE approach is widely thought
to be a more convincing tool for estimating ‘ceteris paribus’ effects.

To sum up, if the key explanatory variable is constant over time, we cannot use FE
to estimate its effect on the dependent variable. In such situations, we must rely on
the RE (or pooled OLS) estimate. We can, however, use the RE approach only if we are
able to assume that the unobserved effect is uncorrelated with the explanatory
variables. Typically, when one uses random effects, many time-constant controls are
included among the explanatory variables. With the FE approach, it is not necessary
to include such controls. RE is preferred to pooled OLS because it is generally more
efficient.

8.4.3 Policy Relevant Inference

The choice of fixed or random effects should be based on the basis of the
background knowledge and the availability of data. Let us have clarity on what we
mean here by the term ‘policy-relevant inference’. Ideally, policy-relevant
inferences are causal inferences about average treatment effects. Causal inferences
tell us what happens if we intervene and change the way the things are being done.
Within the regression modelling framework, and in the absence of experimental or
quasi-experimental data, many issues can be overcome by making assumptions.
But, estimating the treatment effect in an unbiased manner becomes difficult. A
realistic goal is therefore to produce policy-relevant estimates that may be biased,
but are not too much so, so as to lead to misleading policy recommendations. Recall
that the RE approach requires the strong assumption that the unobserved effect is
uncorrelated with any of the covariates. An important reason why the random
effect assumption fails is that there is usually non-random selection of cross section
units. For instance, if each school had drawn its pupils at random from the pupil
population, then the random effect assumption would hold. But, in reality, a non-
random selection mechanism operates through which parents choose schools and
some schools select which children to accept. Thus, the probability of selecting a
particular school varies systematically according to a series of factors
characterising the child, his/her family, the school itself or the higher local
education authority. Some of these factors will be associated with pupil attainment,
either directly or indirectly, through a mediating mechanism.

In light of the above, should we conclude that the FE approach is always
preferable? The debate remains inconclusive, and the answer depends on
circumstances. The FE estimator for β is robust when a school F is not empty.
Therefore, if we have some knowledge about the school selection mechanism, and
we can include measures of these factors in the model as ‘controls’, then we can
estimate the average treatment effect using the RE approach.
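
The role of ‘controls’ in the school example can be sketched with simulated data. Everything in the sketch below is an assumption made for illustration: the selection factor s (say, average family background in a school), the true treatment effect of 1.0, the sample sizes and the helper function ols are hypothetical. Pooled OLS is used as a simple stand-in for RE, since both rely on the unobserved effect being uncorrelated with the covariates.

```python
import numpy as np

rng = np.random.default_rng(1)
n_schools, n_pupils = 400, 25
effect = 1.0                       # assumed true treatment effect

# Hypothetical school-level selection factor (e.g. average family background)
s = rng.normal(size=n_schools)
# Non-random selection: pupils in schools with higher s are treated more often
d = (s[:, None] + rng.normal(size=(n_schools, n_pupils)) > 0).astype(float)
# The school effect depends on s plus a part unrelated to the treatment
c = 0.8 * s + rng.normal(scale=0.5, size=n_schools)
y = effect * d + c[:, None] + rng.normal(size=(n_schools, n_pupils))

def ols(y, *regressors):
    """OLS coefficients of y on an intercept and the given regressors."""
    X = np.column_stack([np.ones(y.size)] + [r.ravel() for r in regressors])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef

s_rep = np.repeat(s[:, None], n_pupils, axis=1)   # s repeated for each pupil
b_no_control = ols(y, d)[1]          # selection factor left in the error term
b_control = ols(y, d, s_rep)[1]      # selection factor included as a control

print(f"without control: {b_no_control:.3f}, with control: {b_control:.3f}")
```

In this design the estimate without the control overstates the assumed effect, while including the measured selection factor brings it close to 1.0. This works only because the remaining part of the school effect is unrelated to the treatment by construction.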

Check Your Progress 2 [answer questions within 50-100 words]

1) Distinguish between Linear ‘Static Panel Data Model’ and ‘Dynamic Panel
Data Model’.

..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................

2) For what purpose is the ‘Chow Test’ used? What does it basically seek to
examine?

3) In what contexts is the ‘fixed effects’ or the ‘random effects’ panel data model
used?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................

4) Specify the considerations that determine the choice between the FE and the
RE models.

.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................

8.5 LET US SUM UP

The unit introduces the panel data models. Panel data refers to observations on
multiple variables obtained over different time periods for the same firms or
individuals. It can be understood by the common expression ‘the two data sets are
drawn from the same panel’. In contrast, pooled cross section data refers to a time
series of cross-sections where the observations in each cross section do not
necessarily relate to the same units. In India, the surveys of the NSSO, conducted
periodically on many subjects, usually at an interval of 5 years, are based on
independent random samples. They are therefore suited to the methods of ‘pooled
data’ analysis. The unit has introduced you to the concepts and
application of the two leading approaches, viz. the FE approach and the RE
approach. The contexts in which the choice between the two can be made are
outlined. Generally, the choice needs to be based on background knowledge of the
variables and the nature and availability of data.

8.6 KEY WORDS

Pooled Cross Section Data: Refers to data collected from independent random
samples at two or more points of time and pooled for the purpose of
analysis. By combining the samples, we get
increased degrees of freedom or higher ‘n’. Data is
often pooled to assess the impact of a new government
policy. In other words, pooled cross section data
helps us in assessing the before/after effects.

Panel Data: Refers to data collected at two or more time points on the same
sample units. In other words, unlike in ‘pooled cross
section’, no new random samples are drawn for the later
surveys. Data is collected on the same variables. This is
particularly useful for assessing the effect of ‘lags’,
which are usual in government policies.

Self-Selection

Endogeneity

8.7 SUGGESTED BOOKS FOR FURTHER READING

1) Cooper, Donald R. and Pamela S. Schindler (2014), Business Research
Methods, Twelfth Edition, McGraw Hill Publication.

2) Research Methodology: Conceptual Foundation (2006), Unit 5, MEC 005,
IGNOU, ISBN: 81-266-2641-0.

3) Wooldridge, J. M. (2006). Introductory Econometrics: A Modern Approach. Michigan
State University. USA.

4) Greene, W. H. (2016). Econometrics. Prentice Hall.

8.8 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES

Check Your Progress 1

1) Time series data is collected over different time points for the same
unit (e.g. GDP for a state). Cross section data is collected over different
sample units at the same point of time (e.g. a single round of the NSSO’s surveys
in India).

2) In the case of panel data, the same cross sectional units are followed up over
different time periods. In the case of pooled data, different cross section units (i.e.
independently selected random samples) are observed in each time
period.

3) First, it allows for causal inference. Second, it allows us to study the effect of
lags in the behaviour or the result of decision making.

4) The complication is that the two samples might have come from populations
with different distributions. The way it is dealt with is by allowing for different
intercept terms or by using a ‘dummy variable’.

Check Your Progress 2

1) The linear static panel data model is $Y_{it} = X_{it}\beta + c_i + u_{it}$,
t = 1, 2, …, T, i = 1, 2, …, N, where $c_i$ is a unit-specific unobserved effect
(like quality), assumed to be a random variable, and $u_{it}$
is the overall random error term, accounting for all other unobserved factors or
effects. When one or more lagged dependent variables are allowed in the
model, as a ‘partial adjustment mechanism’, it becomes a dynamic panel data
model.

2) It is used to test for the feature of ‘poolability’ across data collected in groups.
In other words, it seeks to examine whether the parameters in the models for
the two or more groups are equal.

3) In general, a panel data model is used to control for the effect of ‘omitted
variables’. If we have reason to believe that no omitted variable is correlated
with the explanatory variables, then the ‘random effects model’ can be used. If it
is not so, applying the ‘fixed effects panel data
model’ helps in controlling for the ‘omitted variable bias’.
4) If the key explanatory variable is constant over time, FE cannot estimate its
effect, so the RE model is to be applied. More generally, we can use the RE model
when ‘we are able to assume that the unobserved effect is uncorrelated with the
explanatory variables’. The FE approach allows for arbitrary correlation between
the two while the RE approach does not. Hence, the FE approach is a more
convincing tool for estimating the ‘ceteris paribus’ effects. The choice between
fixed and random effects should therefore be based on background knowledge
and the availability of data.
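
The ‘poolability’ check described in answer 2 of Check Your Progress 2 can be sketched for the two-group case. The sketch below uses the standard two-group form of the Chow F statistic, testing equality of both intercept and slope, which may differ in its degrees of freedom from the multi-period form used in the unit. The simulated data, the coefficient values and the function names ssr and chow_f are assumptions made for illustration.

```python
import numpy as np

def ssr(x, y):
    """Sum of squared residuals from OLS of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def chow_f(x1, y1, x2, y2, k=2):
    """Two-group Chow F statistic; k is the number of parameters per group."""
    # Restricted model: one regression on the pooled data
    ssr_r = ssr(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
    # Unrestricted model: a separate regression for each group
    ssr_ur = ssr(x1, y1) + ssr(x2, y2)
    df = len(y1) + len(y2) - 2 * k
    return ((ssr_r - ssr_ur) / k) / (ssr_ur / df)

rng = np.random.default_rng(42)
x1 = rng.uniform(0, 10, 100)
x2 = rng.uniform(0, 10, 100)
y1 = 1.0 + 1.0 * x1 + rng.normal(size=100)
y2_same = 1.0 + 1.0 * x2 + rng.normal(size=100)   # same parameters as group 1
y2_diff = 1.0 + 3.0 * x2 + rng.normal(size=100)   # different slope

f_same = chow_f(x1, y1, x2, y2_same)
f_diff = chow_f(x1, y1, x2, y2_diff)
print(f"F (same parameters): {f_same:.2f}, F (different slope): {f_diff:.2f}")
```

A large F value, compared with the critical value of the F distribution with k and n1 + n2 - 2k degrees of freedom, leads to rejection of the null hypothesis that the parameters are equal across the two groups, in which case the data should not be pooled under a single set of coefficients.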
