Panel Data Stata

This document provides an introduction to panel regression models and how to apply them in Stata. It defines panel data as longitudinal data collected on many individuals over multiple time periods. The document discusses the advantages of panel data over cross-sectional and time series data alone. It then covers pooled, fixed effects, and random effects panel regression models and how to test between these using the Breusch-Pagan and Hausman tests in Stata. An example is given applying a country fixed effects model to study the relationship between per capita income and services share of GDP using a cross-country panel dataset.

Uploaded by

Aneeza Khan

Panel Regression in Stata

An introduction to types of models and tests

Gunajit Kalita
Rio Tinto India

STATA Users Group Meeting


1st August, 2013, Mumbai

Content

• Understand panel structure and the basic econometrics behind it

• Application of different panel regression models and post-estimation tests in STATA

What are Panel Data?
Panel data are a type of longitudinal data, or data collected at different points in time.
Three main types of longitudinal data:
• Time series data: Many observations (large t) on as few as one unit (small N).
Examples: stock price trends, aggregate national statistics
• Pooled cross sections: Two or more independent samples of many units (large N)
drawn from the same population at different time periods:
• General Social Surveys
• India’s Decennial Census
• Panel data: Two or more observations (small t) on many units (large N)
• Panel surveys of households and individuals (NSS EUS, CES)
• Data on organizations and firms at different time points (ASI, NSS)
• Aggregated country/regional data over time (WDI,WEO,BOP)
• The literature on the econometrics of panel regression and the options available in STATA is vast; this presentation introduces only the fundamentals of the topic

Advantage of Panel Data


Heterogeneity
• Panel data relate to individuals, firms, states, countries, etc. over time, so heterogeneity in these units is natural
• Such heterogeneity can be explicitly taken into account by allowing for individual-specific variables

Degrees of freedom
• Panel data combine time series of cross-section observations, which gives more informative data, more variability, less collinearity among variables, more degrees of freedom and more efficiency
• By studying repeated cross sections of observations, panel data are better suited to studying the dynamics of change

Unobservable
• Panel data can better detect and measure effects that simply cannot be observed in pure cross-section or time series data
• For example, the effect of minimum wage laws on employment and earnings can be better studied if we include successive waves of minimum wage increases in the federal and/or state minimum wages

Behavioural Models
• Panel data enable us to study more complicated behavioural models
• For example, phenomena such as economies of scale and technological change can be better handled by panel data
• Panel data can also minimise the bias that might result if we aggregate individuals or firms into broad aggregates

Data requirement
• Basic panel methods require at least two “waves” of measurement
• Example: consider the services share of GDP in a country and its economic development (GDP per capita) in the last three decades
• One way to construct your panel is to create a single record for each combination of unit (country, firm, individual) and time period
• Data include:
  • A time-invariant unique identifier for each unit (country, firm, individual)
  • A time-varying outcome (services share in GDP, GDP, inflation)
  • An indicator of time (year, quarter, month, day)
• Variation for the dependent variable and regressors:
  • Overall: over time and individuals
  • Between: between individuals
  • Within: within individuals (over time)
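
As a rough illustration of the record layout and of the overall/between/within split described on this slide, the sketch below builds a tiny long-format panel in Python and decomposes the sum of squares of the outcome. The country labels and values are made up for illustration; only the variable name ser_sh is borrowed from the deck.

```python
# Toy long-format panel: one record per (unit, time) combination.
# All values below are invented for illustration only.
records = [
    {"country": "A", "year": 1990, "ser_sh": 40.0},
    {"country": "A", "year": 2000, "ser_sh": 44.0},
    {"country": "A", "year": 2010, "ser_sh": 48.0},
    {"country": "B", "year": 1990, "ser_sh": 60.0},
    {"country": "B", "year": 2000, "ser_sh": 62.0},
    {"country": "B", "year": 2010, "ser_sh": 64.0},
]


def mean(values):
    return sum(values) / len(values)


# Group the outcome by unit (the time-invariant identifier).
by_unit = {}
for r in records:
    by_unit.setdefault(r["country"], []).append(r["ser_sh"])

grand_mean = mean([r["ser_sh"] for r in records])
unit_means = {u: mean(v) for u, v in by_unit.items()}

# Overall: deviations of every observation from the grand mean.
overall_ss = sum((r["ser_sh"] - grand_mean) ** 2 for r in records)
# Between: deviations of unit means from the grand mean (one term per observation).
between_ss = sum(len(v) * (unit_means[u] - grand_mean) ** 2 for u, v in by_unit.items())
# Within: deviations of each observation from its own unit mean.
within_ss = sum((y - unit_means[u]) ** 2 for u, v in by_unit.items() for y in v)

print(overall_ss, between_ss, within_ss)
```

In this toy panel most of the variation is between countries, and the between and within sums of squares add up to the overall one.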

Panel data models


Pooled Model
• The pooled model specifies constant coefficients, the usual assumption in cross-sectional analysis. It is the most restrictive panel model

$$y_{it} = \alpha + x_{it}'\beta + u_{it}$$

• The default standard errors erroneously assume that the errors are independent across observations; in a panel, the errors of the same individual i are typically correlated over time t
Individual-specific effects model
• We assume that there is unobserved heterogeneity across individuals, captured by $\alpha_i$
Example: unobserved ability of an individual that affects wages
• The main question is whether the individual-specific effects $\alpha_i$ are correlated with the regressors
• If they are correlated, we have the fixed effects (FE) model. If they are not correlated, we have the random effects (RE) model

Individual-specific effects model


Fixed effects model (FE)
• It allows the individual-specific effects $\alpha_i$ to be correlated with the regressors $x$. We include $\alpha_i$ as intercepts. Each individual has a different intercept term and the same slope parameters:

$$y_{it} = \alpha_i + x_{it}'\beta + u_{it}$$

• We can recover the individual-specific effects after estimation as:

$$\hat{\alpha}_i = \bar{y}_i - \bar{x}_i'\hat{\beta}$$

In other words, the individual-specific effects are the leftover variation in the dependent variable that cannot be explained by the regressors
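
The fixed effects idea can be sketched numerically. The Python snippet below is a minimal illustration with a single regressor and invented data, not what STATA's xtreg does internally: it demeans x and y within each individual, estimates the common slope from the demeaned data, and then recovers each intercept as the individual mean of y minus the slope times the individual mean of x.

```python
# Toy panel with one regressor: panel[unit] = list of (x, y) pairs.
# Data are constructed so that y = alpha_i + 2 * x exactly,
# with alpha_A = 10 and alpha_B = 5 (hypothetical values).
panel = {
    "A": [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0)],
    "B": [(1.0, 7.0), (2.0, 9.0), (4.0, 13.0)],
}


def mean(values):
    return sum(values) / len(values)


# Individual means of x and y.
x_bar = {u: mean([x for x, _ in obs]) for u, obs in panel.items()}
y_bar = {u: mean([y for _, y in obs]) for u, obs in panel.items()}

# Within (demeaned) estimator of the common slope.
numerator = sum((x - x_bar[u]) * (y - y_bar[u]) for u, obs in panel.items() for x, y in obs)
denominator = sum((x - x_bar[u]) ** 2 for u, obs in panel.items() for x, _ in obs)
beta_hat = numerator / denominator

# Recover the individual-specific intercepts after estimation.
alpha_hat = {u: y_bar[u] - beta_hat * x_bar[u] for u in panel}

print(beta_hat, alpha_hat)
```

Because the toy data contain no noise, the slope and both intercepts are recovered exactly (up to floating-point rounding).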
Random effects model (RE)
• It assumes that the individual-specific effects are distributed independently of the regressors, so we include $\alpha_i$ in the error term. Each individual has the same slope parameters and a composite error term $\varepsilon_{it} = \alpha_i + e_{it}$:

$$y_{it} = x_{it}'\beta + (\alpha_i + e_{it})$$

Here $\mathrm{var}(\varepsilon_{it}) = \sigma_\alpha^2 + \sigma_e^2$ and $\mathrm{cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma_\alpha^2$, so

$$\rho_\varepsilon = \mathrm{cor}(\varepsilon_{it}, \varepsilon_{is}) = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_e^2}$$

• Rho is the intraclass correlation of the error, that is, the fraction of the variance in the error due to the individual-specific effects. It approaches 1 if the individual effects dominate the idiosyncratic error
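
As a quick arithmetic check of the rho formula, the Python sketch below plugs in the sigma_u and sigma_e values reported in the random effects output later in this deck, treating sigma_u as the standard deviation of the individual-specific effect. It should reproduce the rho printed there.

```python
# Standard deviations as reported in the random-effects output in this deck.
sigma_u = 10.817956   # individual-specific effect
sigma_e = 5.8722998   # idiosyncratic error

# Fraction of the composite error variance due to the individual effect.
rho = sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)

print(round(rho, 7))
```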

Choosing between fixed and random effects


Breusch-Pagan Lagrange Multiplier (LM) test
• This is a test for the random effects model based on the OLS residuals. The LM test helps to decide between a random effects regression and a simple OLS regression
• The null hypothesis is that the variance across entities is zero. It tests whether $\sigma_u^2$, or equivalently $\mathrm{cor}(u_{it}, u_{is})$, is significantly different from zero
• If the LM test is not significant, it implies no significant difference across units (i.e. no panel effect), so a simple OLS regression can be run
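
To make the LM test less abstract, here is a minimal Python sketch of the textbook Breusch-Pagan LM statistic for a balanced panel, computed from pooled OLS residuals. The function name and the residuals are hypothetical. Note that the xttest0 output shown later in this deck is labelled chibar2(01), so STATA's reported number need not coincide with this textbook form.

```python
def breusch_pagan_lm(residuals):
    """Textbook LM statistic for a balanced panel.

    residuals[i][t] is the pooled OLS residual of unit i in period t.
    Under the null of no individual effects it is compared with chi2(1).
    """
    n = len(residuals)
    t = len(residuals[0])
    if t < 2 or any(len(row) != t for row in residuals):
        raise ValueError("a balanced panel with at least two periods is required")
    # Sum over units of the squared within-unit residual totals.
    sum_sq_of_unit_sums = sum(sum(row) ** 2 for row in residuals)
    # Plain sum of squared residuals.
    sum_of_squares = sum(e ** 2 for row in residuals for e in row)
    return n * t / (2 * (t - 1)) * (sum_sq_of_unit_sums / sum_of_squares - 1) ** 2


# Invented residuals for 3 units observed in 2 periods.
toy_residuals = [[1.0, 1.0], [-1.0, -1.0], [0.5, -0.5]]
lm = breusch_pagan_lm(toy_residuals)
print(lm)
```

Residuals that share the same sign within a unit push the ratio above 1 and the statistic away from zero, which is the pattern an individual-specific effect would produce.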
Hausman test
• The null hypothesis is that the preferred model is random effects; the alternative is fixed effects. It tests whether the individual-specific effects ($\alpha_i$) are correlated with the regressors; the null hypothesis is that they are not correlated
• The random effects estimator is more efficient, so we should use it if the Hausman test does not reject the null. The Hausman test statistic can be calculated only for the time-varying regressors
• The Hausman test statistic is:

$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[ V(\hat{\beta}_{FE}) - V(\hat{\beta}_{RE}) \right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$$
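
For intuition, the Hausman statistic can be worked out by hand for a single coefficient, where the quadratic form collapses to the squared difference divided by the difference in variances. The Python sketch below uses the lngdp_90s row of the hausman output shown later in this deck. It is a one-coefficient simplification, not the joint test that STATA reports, and the variable names are my own.

```python
import math

# lngdp_90s row of the hausman table in this deck.
b_fe = 0.3742022       # fixed effects coefficient (b)
b_re = 0.3669355       # random effects coefficient (B)
se_diff = 0.0051062    # sqrt(diag(V_b - V_B)) for this coefficient

# Scalar version of (b-B)'[(V_b-V_B)^(-1)](b-B).
h_stat = (b_fe - b_re) ** 2 / se_diff ** 2

# Upper-tail probability of a chi-squared variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(h_stat / 2.0))

print(round(h_stat, 3), round(p_value, 3))
```

A large p-value here means the FE and RE estimates of this one coefficient are not systematically different; the decision in practice rests on the joint statistic over all time-varying regressors.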

Example: Cross country panel


Two Waves of Services Growth (NBER WP: 14968)
“The positive association between the service sector share of output and per capita income is one of the best-known regularities in all of growth and development economics. Yet there is less than complete agreement on the nature of that association. Here we identify two waves of service sector growth…”
• They identify two waves of service sector growth, a first wave in countries with relatively low levels of per capita GDP and a second wave in countries with higher per capita incomes
• There is evidence of the second wave occurring at lower income levels after 1990
• Does that mean India’s experience is not an aberration?

Command: lowess ser_sh lngdpc_pp
[Figure: Lowess Plot of the Relationship between Log Per Capita Income and Services/GDP (1980-2010), 116 countries. x-axis: Log Per Capita GDP at PPP; y-axis: Services (% of GDP); bandwidth = .8]

$$\frac{Serv_{it}}{GDP_{it}} = Constant + \sum_i \alpha_i D_i + \beta_1 Y_{it} + \beta_2 Y_{it}^2 + \beta_3 Y_{it}^3 + \beta_4 Y_{it}^4 + \varepsilon_{it}$$

Panel-Fixed effect (FE) model


STATA Commands:

• To convert country name from string to individual code
encode country, gen(con_cod)

• Declare the panel variables
xtset con_cod year

• Run country fixed effect model
xtreg ser_sh lngdpc_pp lngdp_pp2 lngdp_pp3 lngdp_pp4 lngdp_90s lngdp_20s, fe

Panel-Random effect (RE) model


STATA Commands:

• Run random effect model
xtreg ser_sh lngdpc_pp lngdp_pp2 lngdp_pp3 lngdp_pp4 lngdp_90s lngdp_20s, re

Output:
Random-effects GLS regression        Number of obs = 3397
Group variable: con_cod              Number of groups = 113
R-sq: within = 0.1983                Obs per group: min = 10
      between = 0.2220                              avg = 30.1
      overall = 0.2130                              max = 31
corr(u_i, X) = 0 (assumed)           Wald chi2(6) = 841.07
                                     Prob > chi2 = 0.0000

ser_sh       Coef.        Std. Err.    z        P>|z|    [95% Conf. Interval]
lngdpc_pp    352.3767     73.52802     4.79     0.000    208.2644     496.489
lngdp_pp2    -64.61057    14.17162     -4.56    0.000    -92.38643    -36.83472
lngdp_pp3    5.26195      1.191796     4.42     0.000    2.926072     7.597828
lngdp_pp4    -.1590866    .0369467     -4.31    0.000    -.2315008    -.0866725
lngdp_90s    .3669355     .0308193     11.91    0.000    .3065308     .4273402
lngdp_20s    .6244614     .0347734     17.96    0.000    .5563067     .692616
_cons        -677.8364    140.3619     -4.83    0.000    -952.9406    -402.7321

sigma_u      10.817956
sigma_e      5.8722998
rho          .7724016 (fraction of variance due to u_i)

• Testing for cross-sectional dependence or contemporaneous correlation
xtcsd, pesaran abs
Ho: Residuals are not correlated

OLS or RE or FE
STATA Commands:

• Breusch-Pagan Lagrange Multiplier (LM) test: OLS vs RE
quietly xtreg ser_sh lngdpc_pp lngdp_pp2 lngdp_pp3 lngdp_pp4 lngdp_90s lngdp_20s, re
xttest0

Output:
Breusch and Pagan Lagrangian multiplier test for random effects
ser_sh[con_cod,t] = Xb + u[con_cod] + e[con_cod,t]
Estimated results:
             Var         sd = sqrt(Var)
ser_sh       191.0374    13.82163
e            34.48391    5.8723
u            117.0282    10.81796
Test: Var(u) = 0
chibar2(01) = 29076.72
Prob > chibar2 = 0.0000

• Hausman test: RE vs FE
quietly xtreg ser_sh lngdpc_pp lngdp_pp2 lngdp_pp3 lngdp_pp4 lngdp_90s lngdp_20s, fe
estimates store fe
quietly xtreg ser_sh lngdpc_pp lngdp_pp2 lngdp_pp3 lngdp_pp4 lngdp_90s lngdp_20s, re
estimates store re
hausman fe re

Output:
Coefficients
             (b) fe       (B) re       (b-B) Difference    sqrt(diag(V_b-V_B)) S.E.
lngdpc_pp    332.9264     352.3767     -19.45025           13.70544
lngdp_pp2    -60.60611    -64.61057    4.00446             2.695435
lngdp_pp3    4.906946     5.26195      -.3550045           .2279756
lngdp_pp4    -.1477659    -.1590866    .0113207            .0070114
lngdp_90s    .3742022     .3669355     .0072667            .0051062
lngdp_20s    .6419146     .6244614     .0174533            .0128005

b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 4.58
Prob>chi2 = 0.3337

Prediction
STATA Commands:

• Prediction fitted value including individual-specific effects
predict yhat, xbu

• Prediction standard error of the fitted values
predict se, stdp

• Prediction standard error band
gen up_se=yhat+2*se
gen low_se=yhat-2*se

• Lowess Curve
twoway (lowess yhat lngdpc_pp) (lowess up_se lngdpc_pp) (lowess low_se lngdpc_pp) (line ser_sh lngdpc_pp if (con_cod==50))

[Figure: Predicted path with 2SE band and India's actual services share, with 1990 and 2000 marked. x-axis: Log Per Capita GDP at PPP; y-axis: Services share of GDP (%)]

To produce robust standard error estimates for linear panel models

References
• Panel data analysis, Princeton University, http://dss.princeton.edu/training/
• Econometrics Academy by Ani Katchova, https://sites.google.com/site/econometricsacademy/econometrics-models
• Introduction to Regression Models for Panel Data Analysis, Indiana University, by Prof. Patricia A. McManus, http://www.indiana.edu/~wim/docs/10_7_2011_slides.pdf
• Econometric analysis using Panel Data by Ranjit Kumar Paul, http://www.iasri.res.in/sscnars/socialsci/12-Panel%20data.pdf
• Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence by Daniel Hoechle, http://fmwww.bc.edu/repec/bocode/x/xtscc_paper.pdf
• Two Waves of Services Growth by Poonam Gupta and Barry Eichengreen, NBER Working Paper no. 14968, http://www.nber.org/papers/w14968.pdf

Thank You

Gunajit Kalita
My Blog: http://macroscan.wordpress.com/
