Using Python for Introductory Econometrics
1st edition
Florian Heiss
Daniel Brunner
© Florian Heiss, Daniel Brunner 2020. All rights reserved.
Address:
Universitätsstraße 1, Geb. 24.31.01.24
40225 Düsseldorf, Germany
Contents

Preface
1. Introduction
   1.1. Getting Started
        1.1.1. Software
        1.1.2. Python Scripts
        1.1.3. Modules
        1.1.4. File Names and the Working Directory
        1.1.5. Errors and Warnings
        1.1.6. Other Resources
   1.2. Objects in Python
        1.2.1. Variables
        1.2.2. Objects in Python
        1.2.3. Objects in numpy
        1.2.4. Objects in pandas
   1.3. External Data
        1.3.1. Data Sets in the Examples
        1.3.2. Import and Export of Data Files
        1.3.3. Data from other Sources
   1.4. Base Graphics with matplotlib
        1.4.1. Basic Graphs
        1.4.2. Customizing Graphs with Options
        1.4.3. Overlaying Several Plots
        1.4.4. Exporting to a File
   1.5. Descriptive Statistics
        1.5.1. Discrete Distributions: Frequencies and Contingency Tables
        1.5.2. Continuous Distributions: Histogram and Density
        1.5.3. Empirical Cumulative Distribution Function (ECDF)
        1.5.4. Fundamental Statistics
   1.6. Probability Distributions
        1.6.1. Discrete Distributions
        1.6.2. Continuous Distributions
        1.6.3. Cumulative Distribution Function (CDF)
        1.6.4. Random Draws from Probability Distributions
   1.7. Confidence Intervals and Statistical Inference
        1.7.1. Confidence Intervals
        1.7.2. t Tests
        1.7.3. p Values
   1.8. Advanced Python
        1.8.1. Conditional Execution
        1.8.2. Loops
        1.8.3. Functions
        1.8.4. Object Orientation
        1.8.5. Outlook
   1.9. Monte Carlo Simulation
        1.9.1. Finite Sample Properties of Estimators
        1.9.2. Asymptotic Properties of Estimators
        1.9.3. Simulation of Confidence Intervals and t Tests

I. Regression Analysis with Cross-Sectional Data
2. The Simple Regression Model
   2.1. Simple OLS Regression
   2.2. Coefficients, Fitted Values, and Residuals
   2.3. Goodness of Fit
   2.4. Nonlinearities
   2.5. Regression through the Origin and Regression on a Constant
   2.6. Expected Values, Variances, and Standard Errors
   2.7. Monte Carlo Simulations
        2.7.1. One Sample
        2.7.2. Many Samples
        2.7.3. Violation of SLR.4
        2.7.4. Violation of SLR.5
3. Multiple Regression Analysis: Estimation
   3.1. Multiple Regression in Practice
   3.2. OLS in Matrix Form
   3.3. Ceteris Paribus Interpretation and Omitted Variable Bias
   3.4. Standard Errors, Multicollinearity, and VIF
4. Multiple Regression Analysis: Inference
   4.1. The t Test
        4.1.1. General Setup
        4.1.2. Standard Case
        4.1.3. Other Hypotheses
   4.2. Confidence Intervals
   4.3. Linear Restrictions: F-Tests
5. Multiple Regression Analysis: OLS Asymptotics
   5.1. Simulation Exercises
        5.1.1. Normally Distributed Error Terms
        5.1.2. Non-Normal Error Terms
        5.1.3. (Not) Conditioning on the Regressors
   5.2. LM Test
6. Multiple Regression Analysis: Further Issues
   6.1. Model Formulae
        6.1.1. Data Scaling: Arithmetic Operations Within a Formula
        6.1.2. Standardization: Beta Coefficients
        6.1.3. Logarithms
        6.1.4. Quadratics and Polynomials
        6.1.5. Hypothesis Testing
        6.1.6. Interaction Terms
   6.2. Prediction
        6.2.1. Confidence and Prediction Intervals for Predictions
        6.2.2. Effect Plots for Nonlinear Specifications
7. Multiple Regression Analysis with Qualitative Regressors
   7.1. Linear Regression with Dummy Variables as Regressors
   7.2. Boolean Variables
   7.3. Categorical Variables
        7.3.1. ANOVA Tables
   7.4. Breaking a Numeric Variable Into Categories
   7.5. Interactions and Differences in Regression Functions Across Groups
8. Heteroscedasticity
   8.1. Heteroscedasticity-Robust Inference
   8.2. Heteroscedasticity Tests
   8.3. Weighted Least Squares
9. More on Specification and Data Issues
   9.1. Functional Form Misspecification
   9.2. Measurement Error
   9.3. Missing Data and Nonrandom Samples
   9.4. Outlying Observations
   9.5. Least Absolute Deviations (LAD) Estimation

II. Regression Analysis with Time Series Data
10. Basic Regression Analysis with Time Series Data
    10.1. Static Time Series Models
    10.2. Time Series Data Types in Python
         10.2.1. Equispaced Time Series in Python
         10.2.2. Irregular Time Series in Python
    10.3. Other Time Series Models
         10.3.1. Finite Distributed Lag Models
         10.3.2. Trends
         10.3.3. Seasonality
11. Further Issues in Using OLS with Time Series Data
    11.1. Asymptotics with Time Series
    11.2. The Nature of Highly Persistent Time Series
    11.3. Differences of Highly Persistent Time Series
    11.4. Regression with First Differences
12. Serial Correlation and Heteroscedasticity in Time Series Regressions
    12.1. Testing for Serial Correlation of the Error Term
    12.2. FGLS Estimation
    12.3. Serial Correlation-Robust Inference with OLS
    12.4. Autoregressive Conditional Heteroscedasticity

III. Advanced Topics
13. Pooling Cross-Sections Across Time: Simple Panel Data Methods
    13.1. Pooled Cross-Sections
    13.2. Difference-in-Differences
    13.3. Organizing Panel Data
    13.4. First Differenced Estimator
14. Advanced Panel Data Methods
    14.1. Fixed Effects Estimation
    14.2. Random Effects Models
    14.3. Dummy Variable Regression and Correlated Random Effects
    14.4. Robust (Clustered) Standard Errors
15. Instrumental Variables Estimation and Two Stage Least Squares
    15.1. Instrumental Variables in Simple Regression Models
    15.2. More Exogenous Regressors
    15.3. Two Stage Least Squares
    15.4. Testing for Exogeneity of the Regressors
    15.5. Testing Overidentifying Restrictions
    15.6. Instrumental Variables with Panel Data
16. Simultaneous Equations Models
    16.1. Setup and Notation
    16.2. Estimation by 2SLS
    16.3. Outlook: Estimation by 3SLS
17. Limited Dependent Variable Models and Sample Selection Corrections
    17.1. Binary Responses
         17.1.1. Linear Probability Models
         17.1.2. Logit and Probit Models: Estimation
         17.1.3. Inference
         17.1.4. Predictions
         17.1.5. Partial Effects
    17.2. Count Data: The Poisson Regression Model
    17.3. Corner Solution Responses: The Tobit Model
    17.4. Censored and Truncated Regression Models
    17.5. Sample Selection Corrections
18. Advanced Time Series Topics
    18.1. Infinite Distributed Lag Models
    18.2. Testing for Unit Roots
    18.3. Spurious Regression
    18.4. Cointegration and Error Correction Models
    18.5. Forecasting
19. Carrying Out an Empirical Project
    19.1. Working with Python Scripts
    19.2. Logging Output in Text Files
    19.3. Formatted Documents with Jupyter Notebook
         19.3.1. Getting Started
         19.3.2. Cells
         19.3.3. Markdown Basics

IV. Appendices
Python Scripts
    1. Scripts Used in Chapter 01
    2. Scripts Used in Chapter 02
    3. Scripts Used in Chapter 03
    4. Scripts Used in Chapter 04
    5. Scripts Used in Chapter 05
    6. Scripts Used in Chapter 06
    7. Scripts Used in Chapter 07
    8. Scripts Used in Chapter 08
    9. Scripts Used in Chapter 09
    10. Scripts Used in Chapter 10
    11. Scripts Used in Chapter 11
    12. Scripts Used in Chapter 12
    13. Scripts Used in Chapter 13
    14. Scripts Used in Chapter 14
    15. Scripts Used in Chapter 15
    16. Scripts Used in Chapter 16
    17. Scripts Used in Chapter 17
    18. Scripts Used in Chapter 18
    19. Scripts Used in Chapter 19
Bibliography
List of Wooldridge (2019) Examples
Index

List of Tables
1.1. Logical Operators
1.2. Python Built-in Data Types
1.3. Important numpy Functions and Methods
1.4. Important pandas Methods
1.5. numpy Functions for Descriptive Statistics
1.6. scipy Functions for Statistical Distributions
to reproduce the results. Some supplementary analyses provide additional intuition and insights.
We want to thank Lars Grönberg for providing us with many suggestions and valuable feedback about
the contents of this book.
The book is designed mainly for students of introductory econometrics who ideally use
Wooldridge (2019) as their main textbook. It can also be useful for readers who are familiar with
econometrics and possibly other software packages. For them, it offers an introduction to Python
and can be used to look up the implementation of standard econometric methods. Because we are
explicitly building on Wooldridge (2019), it is useful to have a copy at hand while working through
this book.
Note that there is a sister book, Using R for Introductory Econometrics, just published in its second
edition; see http://www.URfIE.net. We based this book on the R version, using the same structure,
the same examples, and even much of the same text where it makes sense. This decision was not made
out of laziness alone: it also helps readers to switch back and forth between the books easily. And
somebody who has worked through the R book can easily look up the Python way to achieve exactly
the same results, and vice versa, making it especially easy to learn both languages. Which one
should you start with (given your professor hasn't made the decision for you)? Both share many
advantages, such as having a huge and active user community, being widely used inside and outside
of academia, and being freely available. R is traditionally used in statistics, while Python is dominant
in machine learning and artificial intelligence. These origins are still somewhat reflected in the
availability of specialized extension packages. But most data analysis and econometrics tasks
can be performed equally well in both packages. In the end, the most important point is to get used
to the workflow of a dedicated data analysis software package instead of using no software at all
or a spreadsheet program for data analysis.
All computer code used in this book can be downloaded to make it easier to replicate the results
and tinker with the specifications. The companion website also provides the full text of this book
for online viewing and additional material. It is located at:
http://www.UPfIE.net
1. Introduction
Learning to use Python is straightforward but not trivial. This chapter prepares us for implementing
the actual econometric analyses discussed in the following chapters. First, we introduce the basics
of the software system Python in Section 1.1. In order to build a solid foundation we can later rely
on, Sections 1.2 through 1.4 cover the most important concepts and approaches used in Python, like
working with objects, dealing with data, and generating graphs. Sections 1.5 through 1.7 quickly
go over the most fundamental concepts in statistics and probability and show how they can be
implemented in Python. More advanced Python topics like conditional execution, loops, functions
and object orientation are presented in Section 1.8. They are not really necessary for most of the
material in this book. An exception is Monte Carlo simulation, which is introduced in Section 1.9.
1.1. Getting Started
1.1.1. Software
Python is free and open source software. Its homepage is https://www.python.org/. There, a
wealth of information is available, as well as the software itself. We recommend installing the Python
distribution Anaconda (also open source), which includes Python plus many tools needed for data
analysis. For more information and installation files, see https://www.anaconda.com.
Distributions are available for Windows, Mac, and Linux systems and come in two versions. The
examples in this book are based on the installation of the latest version, Python 3, which is not
backwards compatible with Python 2.
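If you are not sure which Python version your installation provides, you can check it from within Python itself. This is only a minimal sketch; the exact output depends on your installation:

import sys
# Print the version string of the running Python interpreter:
print(sys.version)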
After downloading and installing, Python can be accessed through a command line interface. In
Windows, run the program “Anaconda Prompt”. In Linux or macOS, you can simply open a terminal
window. You start Python by typing python and pressing the return key (↵). This will look similar
to the screenshot in Figure 1.1. It provides some basic information on Python and the installed
version. To the right of the “>>>” sign is the prompt where the user can type commands for Python to
evaluate.
We can type whatever we want here. After pressing ↵, the line is terminated and Python tries to
make sense of what was written and gives an appropriate answer. In the example shown in Figure
1.1, this was done four times. The text we typed is shown next to the “>>>” sign; Python's answer
appears below the respective line.
Our first attempt did not work out well: we got an error message. Unfortunately, Python does
not comprehend the language of Shakespeare. We will have to adjust and learn to speak Python’s
less poetic language. The second command shows one way to do this. Here, we provide the input
to the command print in the correct syntax, so Python understands that we entered text and knows
what to do with it: print it out on the console. Next, we gave Python simple computational tasks and
got the result under the respective command. The syntax should be easy to understand – apparently,
Python can do simple addition and deals with the parentheses in the expected way. The meaning of
the last command is less obvious, because it uses the Python way of writing an exponential
term: 16**0.5 = 16^0.5 = √16 = 4.
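To make this concrete, a console session along these lines could look as follows. This is only a sketch and not the exact content of Figure 1.1; the quoted sentence is an arbitrary example, and the exact wording of the error message depends on the Python version:

>>> To be, or not to be: that is the question
  File "<stdin>", line 1
    To be, or not to be: that is the question
       ^
SyntaxError: invalid syntax
>>> print('To be, or not to be: that is the question')
To be, or not to be: that is the question
>>> (2 + 3) * 4
20
>>> 16**0.5
4.0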
Python is used by typing commands such as these. Not only Apple users may be less than im-
pressed by the design of the user interface and the way the software is used. There are various
approaches to make it more user friendly by providing a different user interface added on top of
plain Python. Notable examples include IDLE, PyCharm, Visual Studio and Spyder. The latter was
already set up during the installation of Anaconda, and we use it for everything that follows. The easiest
way to start Spyder is by selecting it in the Anaconda Navigator that was also set up during the
installation of Anaconda.
A screenshot of the user interface on a Mac computer is shown in Figure 1.2 (on other systems it
will look very similar). There are several sub-windows. The one on the bottom right named “IPython
console” looks very similar to and behaves exactly like the command line. The usefulness of
the other windows will become clear soon.
Here are a few quick tricks for working in the console of Spyder:
• When starting to type a command, press the tabulator key (Tab) to see a list of suggested
commands. Typing pr, for example, followed by (Tab) gives a list of all Python commands
starting with pr, like the print command.
• Use help(command) to print the help page for the provided command; see the short example
after this list.
• With the ↑ and ↓ arrow keys, we can scroll through the previously entered commands to
repeat or correct them.
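For example, the following console command displays the documentation of the built-in print function (the exact help text shown depends on the Python version):

>>> help(print)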
1.1.2. Python Scripts
For serious work, we do not type commands interactively but collect them in a Python script.
This is important since a key feature of the scientific method is reproducibility. Our thesis adviser
as well as the referee in an academic peer review process or another researcher who wishes to build
on our analyses must be able to fully understand where the results come from. This is easy if we can
simply present our Python script which has all the answers.
Working with Python scripts is not only best practice from a scientific perspective, but also very
convenient once we get used to it. In a nontrivial data analysis project, it is very hard to remember
all the steps involved. If we manipulate the data, for example by directly changing numbers in a
spreadsheet, we will never be able to keep track of everything we did. If we are using scripts instead,
then each time we make a mistake (which is impossible to avoid), we can simply correct the command
and let Python start from scratch with a simple mouse click. And if there is a change in the raw data set, we can
simply rerun everything and get the updated tables and figures instantly.
Using Python scripts is straightforward: We just write our commands into a text file and save it
with a “.py” extension. When using a user interface like Spyder, working with scripts is especially
convenient since it is equipped with a specialized editor for script files. To use the editor for working
on a new Python script, use the menu File→New file....
The window in the left part of Figure 1.2 is the script editor. We can type arbitrary text, begin
a new line with the return key, and navigate using the mouse or the ↑ ↓ ← → arrow keys.
Our goal is not to type arbitrary text but sensible Python commands. In the editor, we can also use
tricks like code completion that work in the Console window as described above. A new command
generally starts on a new line, but a semicolon “;” can also be used if we want to cram more than
one command into a single line – which is often not a good idea in terms of readability.
Comments are an extremely useful tool for making Python scripts more readable. These are lines
beginning with a “#”. These lines are not evaluated by Python but can (and should) be used to
structure the script and explain the steps. Python scripts can be saved and opened using the File
menu.
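As a minimal sketch of what such a script file could contain (the file name and commands below are made up for illustration and are not the script shown in Figure 1.3):

# first-python-script.py (hypothetical example)
# Store two numbers in variables:
a = 5
b = 3
# Print their sum and their product:
print(a + b)
print(a * b)

Saving these lines in a text file with a “.py” extension turns them into a script that can be opened and run from Spyder.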
Figures 1.3 and 1.4 show a screenshot of Spyder with a Python script saved as “First-Python-Script.py”.
It consists of six lines in total, including three comments. We can send lines of code to