OUP CORRECTED PROOF – FINAL, 19/9/2016, SPi
Econometrics
of Panel Data
Methods and Applications
Erik Biørn
Great Clarendon Street, Oxford, ox2 6dp,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Erik Biørn 2017
The findings, interpretations, and conclusions expressed in this work are entirely
those of the authors and should not be attributed in any manner to the World Bank,
its Board of Executive Directors, or the governments they represent.
The moral rights of the author have been asserted
First Edition published in 2017
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence, or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2016937737
ISBN 978–0–19–875344–5
Printed in Great Britain by
Clays Ltd, St Ives plc
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
PREFACE
Panel data is a data type used with increasing frequency in empirical research in eco-
nomics, social sciences, and medicine. Panel data analysis is a core field in modern
econometrics and multivariate statistics, and studies based on such data occupy a growing
part of the field in all the mentioned disciplines. A substantial literature on methods
and applications has accumulated, also synthesized in survey articles and a number of
textbooks. Why then write another book on panel data analysis? I hope that some of my
motivation and part of the answer will be given in the following paragraphs.
The text collected in this book has originated, and been expanded through several years,
partly in parallel with courses I have given at the University of Oslo for master’s and
doctoral students of economics. I have been interested in the field, both its models and
methods and applications to social sciences, for more than forty years. The first drafts were
lecture notes, later expanded and synthesized into course compendia, first in Norwegian,
later in English. In compiling the text, I have had no ambition of giving an account of
the history of panel data analysis and its various subfields, some of which have a longer
history than others. Within some 350 pages it is impossible to give complete coverage,
and the choice of topics, and the depth of discussion of each of them, to some extent reflect
my preferences. Some readers may miss topics like cross-section-time-series analysis in
continuous time, duration analysis, and analysis of non-linear panel data models (outside
the limited dependent variables field). Topics I give specific attention to, more than many
comparable texts, I think, are coefficient identification of models mixing two-dimensional
and individual-specific variables, regression models with two-way random effects, models
and methods for handling random coefficients and measurement errors, unbalanced
panel data, and panel data in relation to aggregation. Problems at the interface between
unbalance and truncation, and between micro-econometrics and panel data analysis, are
also discussed.
It goes without saying that in a book dealing with data showing temporal–spatial vari-
ation, matrix algebra is unavoidable. Although panel data have a matrix structure, such
algebra should not be a core matter. I have experienced that many students starting on the
topic feel parts of the matrix algebra, especially when written in a dense, compact style,
to be an obstacle. I therefore set out to explain many of the models and methods in some
detail to readers coming to panel data analysis for the first time, including students famil-
iar with basic statistics and classical multiple regression analysis, and applied researchers.
Some technical material is placed in appendices. Yet some advanced and ‘modern’ topics
are discussed. Since one of my intentions is that students should be given the chance to
get an acceptable grasp of basic ideas without having to be involved in extensive matrix-
algebraic exercises, most chapters start with a simple example in scalar (or elementary
vector) notation, and then attempt to take the reader gradually to more complex cases in
fuller matrix notation, even if this necessitates some repetition.
The first chapters survey rather elementary materials. I believe that the initial sections
of the first eight chapters may be useful for bachelor students, provided their knowledge
of multiple regression analysis and basic mathematical statistics is sufficient. Chapters 1–3
and 5 contain mostly basic topics, Chapters 4 and 6–10 contain intermediate level topics,
and Chapters 11 and 12 contain somewhat advanced topics. I have, as far as possible, tried
to keep each chapter self-contained. Yet some connections exist. It may also be helpful to
note that Chapter 6 builds on Chapters 2, 3, and 5, that Chapters 6 and 7 expand topics in
Chapters 2–4, that Chapters 7 and 8 are methodologically related, that Chapter 11 builds
on Chapters 9 and 10, and that Chapter 12 builds on Chapters 3 and 6. Each chapter has
an initial summary; some also have a concluding section.
Some readers may miss exercises with solutions, which would have expanded the size
of the book. On the other hand, examples of applications, some from my own research,
or experiments, and illustrations utilizing publicly available data are included in several
chapters. My view of the panel data field is that the core topics have, to some extent, the
character of being building blocks which may be combined, and the number of potential
combinations is large. Not many combinations are discussed explicitly. Certain potential
combinations—for example, dynamic equations with random coefficients for unbalanced
panel data and random coefficients interacting with limited dependent variables with
measurement errors—have (to my knowledge) hardly been discussed in any existing text.
Although primarily aiming at students and practitioners of econometrics, I believe that
my book, or parts of it, may be useful also for students and researchers in social sciences
outside economics and for students and research workers in psychology, political science,
and medicine, provided they have a sufficient background in statistics. My intention, and
hope, is that the book may serve as a main text for lecturing and seminar education at
universities. I believe that parts may be useful for students, researchers, and other readers
working on their own, inter alia, with computer programming of modules for panel data
analysis.
During the work, I have received valuable feedback from many colleagues and students,
not least PhD students writing their theses, partly under my supervision, on applied panel
data topics. Questions frequently posed during such discussions have undoubtedly made
their mark on the final text. I want to express my gratitude to the many students who
have read, commented on, and in other ways been ‘exposed to’ my notes, sketches, and
chapter drafts. Their efforts have certainly contributed to eliminating errors. I also thank
good colleagues for many long and interesting discussions or for having willingly spent
their time in reading and commenting on drafts of preliminary versions of the various
sections and chapters, sometimes more than once. I regret that I have been unable to take
all their advice into account. I specifically want to mention (in alphabetical order) Jørgen
Aasness, Anne Line Bretteville-Jensen, John K. Dagsvik, Xuehui Han, Terje Skjerpen,
Thor Olav Thoresen, Knut R. Wangen, and Yngve Willassen. Needless to say, none of
them are to be held responsible for remaining errors or shortcomings. The text has been
prepared by the author in the LaTeX document preparation software, and I feel obliged
to the constructors of this excellent scientific text processor. Last, but not least, I express
my sincere gratitude to Oxford University Press, in particular Adam Swallow and Aimee
Wright, for their belief in the project and for support, encouragement, and patience.
Erik Biørn
Oslo, March 2016
CONTENTS
1 Introduction
1.1 Types of panel variables and data
1.2 Virtues of panel data: Transformations
1.3 Panel data versus experimental data
1.4 Other virtues of panel data and some limitations
1.5 Overview
REFERENCES
INDEX
1 Introduction
Panel data, longitudinal data, or combined time-series/cross-section data are terms used
in econometrics and statistics to denote data sets which contain repeated observations
on a selection of variables from a set of observation units. The observations cover simul-
taneously the temporal and the spatial dimension. Examples are: (1) time-series for
production, factor inputs, and profits in a sample of firms over a succession of years;
(2) time-series for consumption, income, wealth, and education in a sample of persons
or households over several years; and (3) time-series for manufacturing production, sales
of medical drugs, or traffic accidents for all, or a sample of, municipalities or counties,
or time-series for variables for countries in the OECD, the EU, etc. Examples (1) and (2)
relate to micro-data, (3) exemplifies macro-data.
‘Panel data econometrics is one of the most exciting fields of inquiry in econometrics
today’ (Nerlove, Sevestre, and Balestra (2008, p. 22)), with a history going back to at least
1950. We will not attempt to survey this interesting history. Elements of a survey can be
found in Nerlove (2002, Chapter 1); see also Nerlove (2014) as well as Griliches (1986,
Sections 5 and 6) and Nerlove, Sevestre and Balestra (2008). A background for the study
of panel data, placing it in a wider context, may also be given by a quotation from a text
on ‘modern philosophy’:
Space and time, or the related concepts of extension and duration, attained special prominence
in early modern philosophy because of their importance in the new science . . . Metaphysical
questions surrounding the new science pertained to the nature of space and time and their relation
to matter. Epistemological questions pertained to the cognition of space itself or extension in
general . . . and also to the operation of the senses in perceiving the actual spatial order of things.
(Hatfield (2006, p. 62))
In this introductory chapter, we first, in Section 1.1, briefly define the main types of
panel data. In Section 1.2 we illustrate, by examples, virtues of panel data and explain in
which sense they may ‘contain more information’ than the traditional data types cross-
section data and time-series data. Section 1.3 briefly contrasts panel data with experimen-
tal data, while some other virtues of panel data, as well as some limitations, are specified
in Section 1.4. An overview of the content of the book follows, in Section 1.5.
Using i as subscript for the unit of observation and t as subscript for the time-period, and
letting the data set contain N units and T time-periods, the coverage of a balanced panel
data set can be denoted as i = 1, . . . , N; t = 1, . . . , T.
Balanced panel data have a matrix structure. We need, in principle, three subscripts to
represent the observations: one for the variable number; one for the individual (unit)
number; and one for the period (year, quarter, month, etc.). A balanced panel data set can
therefore be arranged as a three-dimensional matrix. Quite often N is much larger than T,
but for panels of geographic units, the opposite may well be the case. Regarding asymp-
totics, the distinction between N → ∞ and T → ∞, often denoted as ‘N-asymptotics’
(cross-sectional asymptotics) and ‘T-asymptotics’ (time-serial asymptotics), will often
be important. In ‘micro’ contexts, ‘short’ panels from ‘many’ individuals are a frequently
occurring constellation.
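As a rough illustration of this three-subscript layout, the sketch below stores a balanced panel as a NumPy array indexed by (variable, unit, period). The sizes K, N, and T and the random values are hypothetical; stacking with periods running fastest within each unit is one common convention, not the only one.

```python
import numpy as np

# Hypothetical sizes: K variables, N units, T periods.
K, N, T = 3, 5, 4
rng = np.random.default_rng(0)

# data[k, i, t] = value of variable k for unit i in period t.
data = rng.normal(size=(K, N, T))

# All K variables for unit i = 2 in period t = 3 (0-based indices).
obs = data[:, 2, 3]

# Stack into an (N*T) x K matrix with t running fastest within each i,
# the layout usually fed to regression routines.
stacked = data.reshape(K, N * T).T
row = 2 * T + 3            # row index of observation (i, t) = (2, 3)
print(stacked.shape, np.allclose(stacked[row], obs))
```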
Quite often, however, some variables do not vary both across observation units and
over time-periods. Examples of variables which do not vary over time, time-invariant
variables, are for individuals: birth year; gender; length of the education period (if for
all individuals the education has been finished before the sample period starts); and to
some extent attitudes, norms, and preferences. For firms they are: year of establishment;
sector; location; technical strength; and management ability. Such variables are denoted
as individual-specific or firm-specific. Examples of variables that do not vary across
individuals, individual-invariant variables, may be prices, interest rates, tax parameters,
and variables representing the macro-economic situation. Such variables are denoted
as time-specific or period-specific. As a common term for individual-specific and time-specific variables, we will use unidimensional variables. Variables showing variation across
both individuals and time-periods are denoted as two-dimensional variables. Examples
are (usually) income and consumption (for individuals) and production and labour input
(for firms).
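The classification above can be checked mechanically. A minimal sketch, assuming each variable is held as an N × T NumPy array (rows are units, columns are periods); the helper name `classify` and the toy variables are hypothetical.

```python
import numpy as np

def classify(v: np.ndarray, tol: float = 1e-12) -> str:
    """Classify an N x T array (rows = units i, columns = periods t)."""
    varies_over_t = np.any(np.abs(v - v[:, [0]]) > tol)   # within some unit
    varies_over_i = np.any(np.abs(v - v[[0], :]) > tol)   # within some period
    if varies_over_t and varies_over_i:
        return "two-dimensional"
    if varies_over_i:
        return "individual-specific"   # time-invariant, e.g. birth year
    if varies_over_t:
        return "time-specific"         # individual-invariant, e.g. a price
    return "constant"

# Hypothetical toy data: N = 3 individuals, T = 4 periods.
birth_year = np.repeat([[1960.0], [1975.0], [1988.0]], 4, axis=1)
price = np.tile([100.0, 102.0, 105.0, 107.0], (3, 1))
income = 0.01 * birth_year + price     # varies in both dimensions
print(classify(birth_year), classify(price), classify(income))
```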
The second, also very important, category is unbalanced panel data. Its characteristic is
that not all units are observed in all periods, although some are observed more than once.
There are several reasons why a panel data set may become unbalanced. Entry and exit
of units in a data base (e.g., establishment and close-down of firms and marriages and
dissolution of households) is one reason, another is randomly missing observations in
time series. A particular type of unbalance is created by rotating panel data, which emerges
in sample surveys when the sample changes systematically in a way intended by the data
collector. Another major reason why unbalanced panel data may occur is endogenous
selection, meaning, loosely, that the selection of units observed is partly determined by
variables our model is intended to explain. This may complicate coefficient estimation
and interpretation if the selection mechanism is neglected or improperly accounted for
in the modelling and design of the inference method. Asserting that selection problems are
potentially inherent in any micro-data set, panel data as well as cross-section data, is hardly
an exaggeration.
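Unbalanced panels are often easiest to hold in ‘long’ form, one record per observation actually made. The sketch below uses hypothetical records, counts the number of periods in which each unit is observed, and checks whether the panel is balanced; it says nothing about whether the selection is endogenous.

```python
from collections import Counter

# Hypothetical unbalanced panel in 'long' form: (unit, period, y) per observation.
# Unit 3 enters in period 2; unit 2 exits after period 2.
records = [
    (1, 1, 2.0), (1, 2, 2.1), (1, 3, 2.3),
    (2, 1, 1.5), (2, 2, 1.4),
    (3, 2, 3.0), (3, 3, 3.2),
]

# Number of periods in which each unit is observed.
periods_per_unit = Counter(unit for unit, _, _ in records)
all_periods = {period for _, period, _ in records}

# Balanced only if every unit appears in every period (no duplicate records assumed).
is_balanced = all(n == len(all_periods) for n in periods_per_unit.values())
print(dict(periods_per_unit), is_balanced)
```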
where k is an intercept; β, α, and γ are coefficients and x, z, q are (row) vectors containing
all values of (x, z, q) in the data set; and i and t index individuals and time periods,
respectively. We assume that xit , zi , qt , β, α, and γ are scalars, but the following argument
easily carries over to the case where the variables are row-vectors and the coefficients are
column-vectors. This expression is assumed to describe the relationship between E(y|x, z, q)
and x, z, q for any values of i and t. Let uit = yit −E(yit |x, z, q), which can be interpreted
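as a disturbance, giving the equivalent formulation

$$
y_{it} = k + x_{it}\beta + z_i\alpha + q_t\gamma + u_{it}, \qquad E(u_{it} \mid x, z, q) = 0.
$$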
First, assume that the data set is balanced panel data from N individuals and T periods,
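so that we can specify

$$
y_{it} = k + x_{it}\beta + z_i\alpha + q_t\gamma + u_{it}, \qquad E(u_{it} \mid x, z, q) = 0, \qquad i = 1, \dots, N;\; t = 1, \dots, T, \tag{1.1}
$$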
where now (x, z, q) denote vectors containing the values of (xit , zi , qt ) for i =
1, . . . , N; t = 1, . . . , T. A researcher may sometimes give primary attention to β and
want to estimate it without bias, but α and γ may be of interest as well. In any case,
we include zi and qt as explanatory variables because our theory implies that they
are relevant in explaining yit , and we do not have experimental data which allow
us to ‘control for’ zi and qt by keeping their values constant in repeated samples. If
uit has not only zero conditional expectation, but also is homoskedastic and serially
uncorrelated, we can from the NT observations on (yit , xit , zi , qt ) estimate (k, β, α, γ )
by ordinary least squares (OLS), giving Minimum Variance Linear Unbiased Estima-
tors (MVLUE), the ‘best possible’ linear estimators, or Gauss–Markov estimators of the
coefficients.
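To make the pooled OLS step concrete, the following sketch simulates data from (1.1) with hypothetical values of k, β, α, and γ and homoskedastic, serially uncorrelated disturbances, then regresses the NT stacked observations on (1, xit, zi, qt) with `numpy.linalg.lstsq`. It is an illustration under these assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 6
k, beta, alpha, gamma = 1.0, 0.5, 0.6, 0.8    # hypothetical true values

z = rng.normal(size=(N, 1))                   # individual-specific z_i
q = rng.normal(size=(1, T))                   # period-specific q_t
x = rng.normal(size=(N, T))                   # two-dimensional x_it
u = rng.normal(scale=0.5, size=(N, T))        # homoskedastic, serially uncorrelated
y = k + x * beta + z * alpha + q * gamma + u  # model (1.1), stored as N x T

# Stack the NT observations and regress y on (1, x, z, q) by OLS.
X = np.column_stack([
    np.ones(N * T),
    x.ravel(),
    np.broadcast_to(z, (N, T)).ravel(),
    np.broadcast_to(q, (N, T)).ravel(),
])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
print(dict(zip(["k", "beta", "alpha", "gamma"], coef.round(3))))
```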
What would have been the situation if we only had had access to either time-series or
cross-section data? Assume first that pure time-series, for individual i = 1 (i.e., N = 1) in
periods t = 1, . . . , T, exist.1 Then Model (1.1) should be specialized to this data situation
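by (conditioning on z1 is irrelevant)

$$
y_{1t} = (k + z_1\alpha) + x_{1t}\beta + q_t\gamma + u_{1t}, \qquad E(u_{1t} \mid x_{1\cdot}, q) = 0, \qquad t = 1, \dots, T, \tag{1.2}
$$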
where x1· = (x11 , . . . , x1T ). From time-series for y1t , x1t , and qt , we could estimate β, γ ,
and the composite intercept k + z1 α. This confirms: (i) pure time-series data contain no
information on individual differences or on effects of individual-specific variables; (ii) the
intercept is specific to the individual (unit); (iii) the coefficient α cannot be identified, as it
belongs to a variable with no variation over the data set (having observed z1 is of no help);
and (iv) the coefficients β and γ can be identified as long as x1t and qt are observable and
vary over periods. If u1t is homoskedastic (over t) and shows no serial correlation (over t),
then OLS applied on (1.2) will give estimators which are MVLUE for these coefficients in
the pure time-series data case.
Next, assume that a cross-section, for period t = 1 (i.e., T = 1), for individuals
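i = 1, . . . , N, exists. Then (1.1) should be specialized to (conditioning on q1 is irrelevant):

$$
y_{i1} = (k + q_1\gamma) + x_{i1}\beta + z_i\alpha + u_{i1}, \qquad E(u_{i1} \mid x_{\cdot 1}, z) = 0, \qquad i = 1, \dots, N, \tag{1.3}
$$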
1 We here consider the case with time-series from only one individual (unit) to retain symmetry with the
pure cross-section case below. Most of our following conclusions, however, carry over without essential modifications to situations with aggregate time-series for a sector or for the entire economy, since the equation
is linear. But individual-specific time-series are far from absent. We could, for example, possess annual time-
series of sales, stock prices, or employment for a specific company.
where x·1 = (x11 , . . . , xN1 ). From cross-section data for yi1 , xi1 , and zi , we could estimate
β, α, and the composite intercept k+q1 γ . This confirms: (i) pure cross-section data contain
no information on period-specific differences or on the effects of period-specific variables;
(ii) the intercept is specific to the data period; (iii) the coefficient γ cannot be identified,
as it belongs to a variable with no variation over the data set (having observed q1 is of no
help); (iv) the coefficients β and α can be identified as long as xi1 and zi are observable and
vary across individuals. If ui1 is homoskedastic (over i) and uncorrelated (over i),
then OLS applied on (1.3) will give MVLUE for these coefficients in the pure cross-section
data case.
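The non-identification of α in the pure time-series case and of γ in the pure cross-section case shows up as rank deficiency of the regressor matrix. In the sketch below (hypothetical data), a single time series makes the z1 column a multiple of the intercept column, and a single cross-section does the same for the q1 column, so each four-column design matrix has rank three.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 50, 30

# Pure time series for unit 1 (N = 1): z_1 is one number repeated T times.
x_ts, q_ts, z1 = rng.normal(size=T), rng.normal(size=T), 0.7
X_ts = np.column_stack([np.ones(T), x_ts, np.full(T, z1), q_ts])

# Pure cross-section for period 1 (T = 1): q_1 is one number repeated N times.
x_cs, z_cs, q1 = rng.normal(size=N), rng.normal(size=N), -0.4
X_cs = np.column_stack([np.ones(N), x_cs, z_cs, np.full(N, q1)])

# Four columns but rank three: alpha (resp. gamma) cannot be separated from k;
# only k + z1*alpha (resp. k + q1*gamma) is identified.
rank_ts = np.linalg.matrix_rank(X_ts)
rank_cs = np.linalg.matrix_rank(X_cs)
print(X_ts.shape[1], rank_ts, X_cs.shape[1], rank_cs)
```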
By panel data we may circumvent the problem of lack of identification of α from
time-series data, when using the ‘time-series equation’ (1.2) and of γ from cross-
section data, when using the ‘cross-section equation’ (1.3). Moreover, we may control
for unobserved individual-specific or time-specific heterogeneity. We illustrate this from
(1.1), by first taking the difference between the equations for observations (i, t) and (i, s),
giving
$$
y_{it} - y_{is} = (x_{it} - x_{is})\beta + (q_t - q_s)\gamma + (u_{it} - u_{is}), \qquad E(u_{it} - u_{is} \mid x, q) = 0, \qquad i = 1, \dots, N;\; t, s = 1, \dots, T\; (t \neq s), \tag{1.4}
$$
from which zi vanishes and, in contrast to (1.1), E(uit |z) = 0 is not required for consistency
of OLS. We consequently ‘control for’ the effect on y of z and ‘retain’ only the variation
in y, x, and q. Having the opportunity to do so is crucial if zi is unobservable and reflects
unspecified heterogeneity, but still is believed to affect yit . To see this, assume that zi is
unobservable and correlated with xit (across i) and consider using OLS on (1.1) with zi
excluded, or on

$$
y_{i1} = (k + q_1\gamma) + x_{i1}\beta + u_{i1},
$$
where ui1 is a disturbance in which we (tacitly) include the effect of zi . This gives a biased
estimator for β, since ui1 , via the correlation and the non-zero value of α, captures the
effect of zi on yi1 : we violate E(ui1 |z) = 0. This will not be the case if we instead use OLS
on (1.4).
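A small simulation can illustrate this argument. The sketch below assumes hypothetical coefficient values and lets the unobserved zi be correlated with xit across i. OLS on (1.1) with zi omitted is then biased, while OLS on first differences, that is (1.4) with s = t − 1, is not; first differences are just one convenient choice of (t, s) pairs.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 500, 5
k, beta, alpha, gamma = 1.0, 0.5, 0.6, 0.8     # hypothetical true values

z = rng.normal(size=(N, 1))                    # unobserved z_i ...
x = z + rng.normal(size=(N, T))                # ... correlated with x_it across i
q = rng.normal(size=(1, T))
y = k + x * beta + z * alpha + q * gamma + rng.normal(scale=0.5, size=(N, T))
Q = np.broadcast_to(q, (N, T))

def ols(X, yv):
    """OLS coefficients of yv on the columns of X."""
    return np.linalg.lstsq(X, yv, rcond=None)[0]

# (1.1) with z_i omitted: z_i ends up in the disturbance, so beta is biased.
b_omit = ols(np.column_stack([np.ones(N * T), x.ravel(), Q.ravel()]), y.ravel())[1]

# (1.4) with s = t - 1: k and z_i drop out of the differenced equation.
dy, dx, dq = np.diff(y, axis=1), np.diff(x, axis=1), np.diff(Q, axis=1)
b_diff = ols(np.column_stack([dx.ravel(), dq.ravel()]), dy.ravel())[0]
print(round(b_omit, 3), round(b_diff, 3))
```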
By next in (1.1) taking the difference between the equations for observations (i, t) and
(j, t), it likewise follows that
$$
y_{it} - y_{jt} = (x_{it} - x_{jt})\beta + (z_i - z_j)\alpha + (u_{it} - u_{jt}), \qquad E(u_{it} - u_{jt} \mid x, z) = 0, \qquad i, j = 1, \dots, N\; (i \neq j);\; t = 1, \dots, T, \tag{1.5}
$$
from which qt vanishes and, in contrast to (1.1), E(uit|q) = 0 is not required for consistency of OLS. We consequently ‘control for’ the effect on y of q and ‘retain’ only the variation in y, x, and z. To see this, assume that qt is unobservable and correlated with xit (over t) and consider using OLS on (1.1) with qt excluded, or on

$$
y_{1t} = (k + z_1\alpha) + x_{1t}\beta + u_{1t},
$$

where u1t is a disturbance in which we (tacitly) include the effect of qt. This gives a biased OLS estimator for β, since u1t, via the correlation and the non-zero value of γ, captures the effect of qt on y1t: we violate E(u1t|q) = 0. This will not be the case if we instead use OLS on (1.5).
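The mirror-image case can be sketched the same way. Here, with hypothetical values, the unobserved qt is correlated with xit over t. OLS on (1.1) with qt omitted is biased, while OLS on (1.5), using the first unit (index 0 in the code) as the reference unit j for every other i, removes qt period by period.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 50, 40
k, beta, alpha, gamma = 1.0, 0.5, 0.6, 0.8      # hypothetical true values

q = rng.normal(size=(1, T))                     # unobserved q_t ...
x = q + rng.normal(size=(N, T))                 # ... correlated with x_it over t
z = rng.normal(size=(N, 1))
y = k + x * beta + z * alpha + q * gamma + rng.normal(scale=0.5, size=(N, T))
Z = np.broadcast_to(z, (N, T))

def ols(X, yv):
    """OLS coefficients of yv on the columns of X."""
    return np.linalg.lstsq(X, yv, rcond=None)[0]

# (1.1) with q_t omitted: q_t ends up in the disturbance, so beta is biased.
b_omit = ols(np.column_stack([np.ones(N * T), x.ravel(), Z.ravel()]), y.ravel())[1]

# (1.5) with j = unit 0: k and q_t drop out when unit 0 is subtracted in each period.
dy, dx, dz = y[1:] - y[0], x[1:] - x[0], Z[1:] - Z[0]
b_diff = ols(np.column_stack([dx.ravel(), dz.ravel()]), dy.ravel())[0]
print(round(b_omit, 3), round(b_diff, 3))
```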
If only individual time-series (N = 1) are available, we may perform the transformation
leading to (1.4), but not the one leading to (1.5). Likewise, if one cross-section (T = 1) is
the only data available, we may perform the transformation leading to (1.5), but not the
one leading to (1.4). Transformations which may give both (1.4) and (1.5) are infeasible
unless we have panel data.
We also have the opportunity to make other, more complex, transformations of (1.1).
Deducting from (1.4) the corresponding equation when i is replaced by j (or deducting
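from (1.5) the corresponding equation with t replaced by s) we obtain

$$
(y_{it} - y_{is}) - (y_{jt} - y_{js}) = [(x_{it} - x_{is}) - (x_{jt} - x_{js})]\beta + (u_{it} - u_{is}) - (u_{jt} - u_{js}),
$$
$$
E[(u_{it} - u_{is}) - (u_{jt} - u_{js}) \mid x] = 0, \qquad i, j = 1, \dots, N\; (i \neq j);\; t, s = 1, \dots, T\; (t \neq s). \tag{1.6}
$$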
By this double differencing, zi and qt disappear, and neither E(uit |z) = 0 nor E(uit |q) = 0
is needed for consistency of the OLS estimators. We thus control for the effect on y of
both z and q and retain only the variation in x. To see this, assume that both zi and qt are
unobservable and correlated with xit (over i and t, respectively), and consider using OLS
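on either (1.1) with both zi and qt omitted, on

$$
y_{it} - y_{is} = (x_{it} - x_{is})\beta + (u_{it} - u_{is}),
$$

or on

$$
y_{it} - y_{jt} = (x_{it} - x_{jt})\beta + (u_{it} - u_{jt}),
$$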
while (tacitly) including the effect of, respectively, (qt , zi ), qt , and zi in the equation’s
disturbance, which will give biased (inconsistent) estimators for β. This will not be the
case when using OLS on (1.6).
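A sketch of the double-differencing estimator (1.6), on hypothetical simulated data in which xit is correlated with both zi and qt. The helper `dd` (a hypothetical name) first differences over t with s = t − 1 and then subtracts the same difference for the reference unit j (index 0); OLS through the origin on the transformed data then targets β.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 200, 30
k, beta, alpha, gamma = 1.0, 0.5, 0.6, 0.8     # hypothetical true values

z = rng.normal(size=(N, 1))
q = rng.normal(size=(1, T))
x = z + q + rng.normal(size=(N, T))            # x correlated with both z_i and q_t
y = k + x * beta + z * alpha + q * gamma + rng.normal(scale=0.5, size=(N, T))

# (1.1) with both z_i and q_t omitted: biased.
X = np.column_stack([np.ones(N * T), x.ravel()])
b_omit = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]

def dd(v):
    """Difference over t (s = t - 1), then subtract unit j = 0: removes k, z_i and q_t."""
    dv = np.diff(v, axis=1)
    return (dv[1:] - dv[0]).ravel()

dx_dd, dy_dd = dd(x), dd(y)
b_dd = (dx_dd @ dy_dd) / (dx_dd @ dx_dd)       # OLS through the origin on (1.6)
print(round(b_omit, 3), round(b_dd, 3))
```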
Haavelmo (1944, p. 50), more than 70 years ago, well before panel data became a
common term in econometrics, described the relevance of handling unobserved hetero-
geneity in relation to data variation as follows:
. . . two individuals, or the same individual in two different time periods, may be confronted with
exactly the same set of specified influencing factors and still . . . may have different quantities
y. . . . We may try to remove such discrepancies by introducing more “explaining” factors, x. But,
usually, we shall soon exhaust the number of factors which could be considered as common to all
individuals . . . and which, at the same time, were not merely of negligible influence upon y. The
discrepancies . . . may depend upon a great variety of factors, these factors may be different from
one individual to another, and they may vary with time for each individual.
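Several other linear transformations can be performed on the linear relationship (1.1)
when having panel data. We will show five. Summation in (1.1) over, respectively, i, t,
and (i, t) and division by N, T, and NT, letting $\bar{z} = \frac{1}{N}\sum_{i=1}^{N} z_i$, $\bar{y}_{\cdot t} = \frac{1}{N}\sum_{i=1}^{N} y_{it}$,
$\bar{q} = \frac{1}{T}\sum_{t=1}^{T} q_t$, $\bar{y}_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\bar{y} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} y_{it}$, etc., give

$$
\bar{y}_{\cdot t} = k + \bar{x}_{\cdot t}\beta + \bar{z}\alpha + q_t\gamma + \bar{u}_{\cdot t}, \qquad E(\bar{u}_{\cdot t} \mid x, z, q) = 0, \tag{1.7}
$$
$$
\bar{y}_{i\cdot} = k + \bar{x}_{i\cdot}\beta + z_i\alpha + \bar{q}\gamma + \bar{u}_{i\cdot}, \qquad E(\bar{u}_{i\cdot} \mid x, z, q) = 0, \tag{1.8}
$$
$$
\bar{y} = k + \bar{x}\beta + \bar{z}\alpha + \bar{q}\gamma + \bar{u}, \qquad E(\bar{u} \mid x, z, q) = 0. \tag{1.9}
$$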
Here (1.7) and (1.8) are equations in respectively period-specific and individual-specific
means, which may have interest in themselves. Deducting these equations from (1.1), we
obtain, respectively,
yit − ȳi· = (xit − x̄i· )β + (qt − q̄)γ + (uit − ūi· ), E(uit − ūi· |x, q) = 0, (1.10)
yit − ȳ·t = (xit − x̄·t )β + (zi − z̄)α + (uit − ū·t ), E(uit − ū·t |x, z) = 0. (1.11)
Using (1.10) we can be said to measure the variables from their individual-specific means
and therefore, as in (1.4), eliminate zi , while using (1.11), we measure the variables from
their period-specific means and therefore, as in (1.5), eliminate qt . Consequently, consis-
tency of OLS estimation of β from (1.10) is robust to violation of E(uit |z) = 0, unlike OLS
applied on (1.1), when zi is unobservable, correlated with xit over i, and omitted from the
equation. Likewise, consistency of OLS estimation of β from (1.11) is robust to violation of
E(uit |q) = 0, unlike OLS applied on (1.1) when qt is unobservable, correlated with xit ,
over t, and omitted from the equation.
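The transformations behind (1.10) and (1.11) amount to subtracting row means and column means of an N × T array. A minimal sketch on hypothetical simulated data: it checks that demeaning by unit wipes out a time-invariant zi and demeaning by period wipes out qt, and it estimates β from (1.10) with xit correlated with zi.

```python
import numpy as np

def within_unit(v):
    """v_it minus the individual-specific mean v_i. (row mean)."""
    return v - v.mean(axis=1, keepdims=True)

def within_period(v):
    """v_it minus the period-specific mean v_.t (column mean)."""
    return v - v.mean(axis=0, keepdims=True)

rng = np.random.default_rng(6)
N, T = 300, 8
k, beta, alpha, gamma = 1.0, 0.5, 0.6, 0.8      # hypothetical true values
z = rng.normal(size=(N, 1))
q = rng.normal(size=(1, T))
x = z + rng.normal(size=(N, T))                  # correlated with z_i across i
y = k + x * beta + z * alpha + q * gamma + rng.normal(scale=0.5, size=(N, T))
Z, Q = np.broadcast_to(z, (N, T)), np.broadcast_to(q, (N, T))

# Unit demeaning removes z_i; period demeaning removes q_t (up to rounding).
print(np.abs(within_unit(Z)).max(), np.abs(within_period(Q)).max())

# OLS on (1.10): regress y_it - y_i. on x_it - x_i. and q_t - q_bar.
Xw = np.column_stack([within_unit(x).ravel(), within_unit(Q).ravel()])
b_within = np.linalg.lstsq(Xw, within_unit(y).ravel(), rcond=None)[0][0]
print(round(b_within, 3))
```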
We may perform the last two transformations jointly. Subtracting both time-specific
means (1.7) and individual-specific means (1.8) from (1.1) and adding the global means
(1.9) gives2
2 Equivalently, deduct from (1.10) the period mean of (1.11) or deduct from (1.11) the individual mean
of (1.10).
yit − ȳi· − ȳ·t + ȳ = (xit − x̄i· − x̄·t + x̄)β +(uit − ūi· − ū·t + ū),
(1.12)
E(uit − ūi· − ū·t + ū|x) = 0.
Now, we can be said to be measuring the variables from their individual-specific and time-
specific means jointly and therefore, as in (1.6), eliminate both the individual-specific
variable zi and the time-specific variable qt . Consequently, OLS regression on (1.12) is
robust to (nuisance) correlation both between xit and zi and between xit and qt , which is
not the case for OLS regression on (1.1) if zi and qt are unobservable and are excluded
from the equation.
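The two-way transformation in (1.12) is short to code for a balanced panel. The sketch below, with hypothetical data and coefficient values, verifies that any additive zi + qt is annihilated and that OLS on the transformed variables recovers β even though xit is correlated with both zi and qt. It assumes a balanced panel; with unbalanced data this simple formula generally no longer removes both effects.

```python
import numpy as np

def two_way_demean(v):
    """v_it - v_i. - v_.t + v_.. for a balanced N x T array, as in (1.12)."""
    return (v - v.mean(axis=1, keepdims=True)
            - v.mean(axis=0, keepdims=True) + v.mean())

rng = np.random.default_rng(7)
N, T = 200, 30
beta = 0.5                                          # hypothetical true slope
z = rng.normal(size=(N, 1))
q = rng.normal(size=(1, T))
x = z + q + rng.normal(size=(N, T))                 # correlated with z_i and q_t
y = 1.0 + x * beta + 0.6 * z + 0.8 * q + rng.normal(scale=0.5, size=(N, T))

# Any additive z_i + q_t is annihilated (up to rounding).
print(np.abs(two_way_demean(z + q)).max())

xd, yd = two_way_demean(x), two_way_demean(y)
b_twoway = (xd * yd).sum() / (xd * xd).sum()        # OLS through the origin on (1.12)
print(round(b_twoway, 3))
```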
Example: The following illustration exemplifies the above transformations and shows
that running OLS regressions across different ‘dimensions’ of a panel data set can give
rather different results for a model with only one y and one x, while disregarding
variables like z and q. Using panel data for N = 229 firms in the Norwegian chemi-
cal manufacturing industry observed over T = 8 years, the logarithm of input (y) is
regressed on the logarithm of the output volume (x). For material input and for labour
input the relevant elasticity estimates are, respectively (standard errors in parentheses):
Regression on the full data set (1832 observations):
Materials: 1.0337 (0.0034). Labour: 0.7581 (0.0088)
Regression on the 229 firm means:
Materials: 1.0340 (0.0082). Labour: 0.7774 (0.0227)
Regression on the 8 year means:
Materials: 1.0228 (0.0130). Labour: −0.0348 (0.0752)
The three equations exemplify, respectively, (1.1) with zi and qt omitted, (1.8) with
zi omitted, and (1.7) with qt omitted. The point estimates (of β) are fairly equal for
materials, but for labour the estimate exploiting only the time variation is negative
and much lower than the two others. There are reasons to believe that the underlying
β-coefficient is not the same. Maybe the estimate from the time mean regression
reflects that its variation is dominated by an omitted trend, say, a gradual introduction
of a labour-saving technology. The question of why cross-sectional and time-serial
estimates of presumably the same coefficient often differ substantially has occupied
econometricians for a long time and was posed almost sixty years ago by Kuh (1959).
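The three regressions in the example can be sketched as follows. The function computes the OLS slope from the full data, from the N firm means, and from the T year means. Because the Norwegian data are not available here, the usage lines simulate hypothetical data with the same dimensions (N = 229, T = 8) and a built-in labour-saving trend, one possible version of the explanation suggested above; the numbers produced say nothing about the actual estimates.

```python
import numpy as np

def slope(xv, yv):
    """OLS slope of yv on xv, with an intercept."""
    xc = xv - xv.mean()
    return (xc @ (yv - yv.mean())) / (xc @ xc)

def three_regressions(x, y):
    """x, y: N x T arrays (e.g. log output and log input for N firms over T years)."""
    return {
        "full data": slope(x.ravel(), y.ravel()),             # (1.1), z and q omitted
        "firm means": slope(x.mean(axis=1), y.mean(axis=1)),   # (1.8), z omitted
        "year means": slope(x.mean(axis=0), y.mean(axis=0)),   # (1.7), q omitted
    }

# Hypothetical data: log output grows over time, and a labour-saving trend
# lowers log input by 0.15 per year for given output.
rng = np.random.default_rng(8)
N, T = 229, 8
t = np.arange(T)
c = rng.normal(size=(N, 1))                                    # persistent firm size
x = c + 0.1 * t + 0.2 * rng.normal(size=(N, T))
y = 0.8 * x - 0.15 * t + 0.1 * rng.normal(size=(N, T))
est = three_regressions(x, y)
print({name: round(b, 3) for name, b in est.items()})
```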
time variation nor the individual variation; (ii) only the time variation; (iii) only the
individual variation; or (iv) both types of variation at the same time. It is often said that
practitioners of, e.g., econometrics very rarely have access to experimental data. For panel
data, however, this is not true without qualification. Such data place the researcher in an
intermediate position closer to an experimental situation than pure cross-section data and
pure time-series data do.
Expressed in technical terms, when using panel data one has the opportunity to separate
intra-individual differences (differences within individuals) from inter-individual differ-
ences (differences between individuals). We have seen that, by performing suitable linear
transformations, we can eliminate unobserved individual- or time-specific effects and
avoid violating E(disturbance|regressors) = 0, the core condition for ensuring unbiased esti-
mation in (classical) regression analysis. Panel data therefore make it possible to eliminate
estimation bias (inconsistency) induced by unobserved nuisance variables which are
correlated with the observable explanatory variables in the equation. Many illustrations of
this will be given throughout the book. To quote Lancaster (2006, p. 277): ‘. . . with panel
data we can relax the assumption that the covariates are independent of the errors. They
do this by providing what, on certain additional assumptions, amounts to a “controlled
experiment”. ’
where x1it , x2it , and x3it are (row vectors of) two-dimensional explanatory variables;
ki and k1i are N individual-specific intercepts and k2t are T period-specific intercepts;
the βi s and β2i s are (column vectors of) individual-specific slope coefficients; the β3t s
are period-specific slope coefficients; and β1 is common to all individuals and peri-
ods. Estimating the coefficients in such equations from pure time-series data or pure
cross-section data is impossible, as the number of coefficients exceeds the number of
observation points.
By panel data we may explore aggregation problems for time-series data and time series
models. An illustration can be given by (1.13), assuming balanced panel data from N
individuals. Summation across i gives
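$$
\sum_i y_{it} = \sum_i k_i + \sum_i x_{it}\beta_i + \sum_i u_{it}.
$$

Let $\bar{k} = \frac{1}{N}\sum_i k_i$ and $\bar{\beta} = \frac{1}{N}\sum_i \beta_i$. The equation in time means, after division by N and
a slight rearrangement, can be written as

$$
\bar{y}_{\cdot t} = \bar{k} + \bar{x}_{\cdot t}\bar{\beta}\,(1 + V_{xt}V_{\beta}R_{xt,\beta}) + \bar{u}_{\cdot t},
$$

where $V_{xt}$ and $V_{\beta}$ are the coefficients of variation of $x_{it}$ (across i, in period t) and of $\beta_i$, and $R_{xt,\beta}$ is the coefficient of correlation between $x_{it}$ and $\beta_i$ across i.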
Representing the aggregated equation simply as (which is the only thing we could do if
we only had linearly aggregated data)

$$
\bar{y}_{\cdot t} = \bar{k} + \bar{x}_{\cdot t}\bar{\beta} + \bar{u}_{\cdot t},
$$
and interpreting β̄ as a ‘mean slope coefficient’ we commit an aggregation error, unless the
micro-coefficients are uncorrelated with the variable to which they belong, i.e., unless Rxt,β = 0.
Otherwise, the correct macro-coefficient, β̄(1+Vxt Vβ Rxt,β ), will either show instability
over time or, if Vxt , Vβ , and Rxt,β are approximately time-invariant, will differ from β̄.
How it differs is left in the dark. Having panel data, the aggregation bias may be explored
and corrected for, since $\hat{\beta}_1, \dots, \hat{\beta}_N$ and their standard errors can be estimated. For further
discussion of individual heterogeneity in aggregation contexts, see, e.g., Stoker (1993) and
Blundell and Stoker (2005), as well as Kirman (1992), on the related problems of macro-
economic modelling and analysis as if a ‘representative individual’ exists.
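The aggregation identity behind the macro-coefficient β̄(1+Vxt Vβ Rxt,β) can be checked numerically. A minimal sketch, assuming hypothetical micro-data for a single period t and reading Vxt and Vβ as coefficients of variation and Rxt,β as the correlation across i (all computed with divisor N): the cross-sectional mean of xit βi then equals x̄·t β̄ (1 + Vxt Vβ Rxt,β) exactly, and differs from x̄·t β̄ when Rxt,β ≠ 0.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 1000
x_t = rng.lognormal(size=N)                    # hypothetical x_it across i, one period t
# Hypothetical micro-coefficients beta_i, positively correlated with x_it.
b = 0.5 + 0.3 * (x_t - x_t.mean()) / x_t.std() + 0.1 * rng.normal(size=N)

x_bar, b_bar = x_t.mean(), b.mean()
V_x = x_t.std() / x_bar                        # coefficient of variation, divisor N
V_b = b.std() / b_bar
R = np.corrcoef(x_t, b)[0, 1]                  # correlation across i

micro_mean = (x_t * b).mean()                  # (1/N) sum_i x_it * beta_i
macro_form = x_bar * b_bar * (1 + V_x * V_b * R)
naive = x_bar * b_bar                          # what the simple aggregate equation assumes
print(micro_mean, macro_form, naive)
```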
Extending from time-series data or cross-section data to panel data usually gives an
increased number of observations and hence more degrees of freedom in estimation. Use
of panel data frequently contributes to reducing collinearity among the explanatory vari-
ables and allows more extensive testing of competing model specifications. A common
experience is that the correlation between explanatory variables in a regression equation
often is stronger over time than across individuals or firms.