Syllabus: Ec240a, Fall 2024
Course Description
After introducing some basic optimization tools, this course begins with an analysis
of a basic prediction problem. A decision maker obtains a random sample of covariates
(features) and outcomes. She wishes to use her sample to forecast the outcomes of new
units on the basis of their covariates. We motivate this problem and provide a canonical
representation of it (the K Normal means problem). We use this problem to introduce
some elements of (i) statistical decision theory and (ii) modern regression methods.
We then develop some properties of regression functions. The iteration properties
of mean and linear regression will receive special emphasis.
Finally, we will develop methods for conducting inference on linear regression coefficients estimated by the method of least squares under random sampling. We will develop two approaches. The first is a nonparametric Bayesian method. The second, a frequentist approach, is based on large sample (i.e., asymptotic) approximations. Methods of hypothesis testing and confidence interval construction will be reviewed. If time permits, we will introduce some methods for quantile regression analysis.
[Figure: The central limit theorem at the Santa Cruz Beach Boardwalk (near the Hurricane). See https://en.wikipedia.org/wiki/Bean_machine]
Instructor: Bryan Graham, 665 Evans Hall, email: [email protected]
Time and Location: Monday and Wednesday, 10:00AM to 12:00PM in Lewis Hall, Room 9
Office Hours: Thursdays 2 to 3:45PM (sign up online here). Office hours will be in my Evans
Hall office.
Graduate Student Instructor: Jinglin Yang, e-mail: [email protected]
Prerequisites: linear algebra, multivariate calculus, basic probability and inference theory.
Course Webpage: Various instructional resources, including occasional lecture notes and Jupyter
Notebooks, can be found on GitHub in the following repository
https://github.com/bryangraham/Ec240a
Textbook: There is no mandatory text. Material will be delivered primarily through lecture and assigned papers. Good note-taking is essential for successful performance in the class. Nevertheless, I do recommend the following book as a useful supplement to the material presented in lecture (and also in Ec240b in the spring semester).
1. Wooldridge, Jeffrey M. (2010). Econometric Analysis of Cross Section and Panel Data, 2nd
Ed. Cambridge, MA: The MIT Press.
This is a useful long term reference for anyone who anticipates undertaking empirical research. Two
other useful textbooks are Stachurski (2016) and Hansen (2022). Paul Goldsmith-Pinkham has
made Gary Chamberlain’s first-year Harvard Ph.D. econometrics Lecture Notes from circa 2010
available online. Gary was my primary advisor when I was a Ph.D. student; you may find these
notes helpful as well.
Additional books which you may find helpful include Ferguson (1996), Wasserman (2004), Wasserman (2006), and Manski (2007). Ferguson (1996) is a compact introduction to large sample theory. My treatment of the K Normal means problem draws from Wasserman (2006). Wasserman (2004) is a nice introductory mathematical statistics reference. Manski (2007) provides a textbook treatment of identification with applications of interest to economists. While this course is largely self-contained, I nevertheless encourage you to view the course materials as basic scaffolding to which you can add on your own. Indeed, one goal of the course is to develop some basic literacy in econometrics so that you are better able to learn new material independently. This is not a “cookbook” class.
Grading: Grades for this half of the course will equal a weighted average of homework (40%), a scribing assignment (10%), and mid-term performance (50%). The mid-term will be held on the last day of class (December 4th, 2024). There will be five homework assignments (plus a review sheet). Homeworks are due at 5PM on the assigned due date (the GSI may elect to make small modifications to all things homework-related). Homeworks are graded on a ten-point scale, with one point off per day late for the first three days and no additional penalty thereafter. Concretely, this means homeworks turned in three or more days late can earn only up to seven points (but as long as you turn it in before the last day of the semester it will be counted).
You are free, indeed encouraged, to work in groups, but each student must submit an individual write-up and accompanying Jupyter Notebook (when required; see below). Please write the names of any study partners at the top of your homework. Your lowest homework grade will be dropped, with the average of the remaining scores counting toward your final grade. I will add 5 points to homework aggregates for students who make serious efforts to complete all five problem sets. Concretely, this means that students may amass up to 45 homework points; it also means that if you only complete four problem sets you can earn no more than 40 out of 45 homework points. Problem Set 5 formally serves as the final assessment for the course.
The due dates for the five problem sets are:
Problem Set Due Date
1 November 1st (Friday)
2 November 15th (Friday)
3 November 27th (Wednesday)
4 December 6th (Friday)
5 December 13th (Friday, no assignments accepted after this date)
The scribing assignment involves using a LaTeX editor to prepare notes summarizing the material covered in one lecture. A good scribing assignment will elegantly reproduce the material covered in lecture and also modestly enhance this material by drawing on course readings and possibly other resources. Each assignment will be completed in collaboration with two or three of your classmates. A LaTeX style file will be made available on bCourses and/or GitHub. Each scribing assignment is due one week after the relevant lecture.
Computation: All computational work should be completed in Python. Python is a widely used
general purpose programming language with good functionality for scientific computing. There are
lots of ways of accessing Python (EML, online at https://datahub.berkeley.edu/). For those
wishing to manage a Python environment on their personal computer, the Anaconda distribution,
which is available for download at https://www.anaconda.com/distribution/, is a convenient way
to get started. Some basic tutorials on installing and using Python, with a focus on economic
applications, can be found online at https://quantecon.org/. Good books for learning Python,
with some coverage of statistical applications, are Guttag (2013), VanderPlas (2017), and McKinney
(2022).
The code I will provide will execute properly in Python 3.6 and later releases. Python is also available on the EML workstations (which are remotely accessible). There are a large number of useful resources available for learning Python (including classes at the D-Lab). While issues of computation may arise from time to time during lecture, I will not teach Python programming. This is something you will need to learn outside of class. I do not expect this to be easy. I ask that students with strong backgrounds in technical computing assist classmates with less experience. I am happy to answer programming questions during office hours.
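To give a concrete, purely illustrative sense of the kind of Python work the homework notebooks involve, the sketch below simulates a random sample, estimates a linear regression by least squares with numpy, and checks that the estimates are close to the values used to generate the data. It is not an official course file; the variable names, seed, and parameter values are hypothetical.

```python
# Minimal, illustrative sketch (not an official course file): simulate a random
# sample and estimate a linear regression by ordinary least squares using numpy.
import numpy as np

rng = np.random.default_rng(seed=240)      # seeded generator for reproducibility
n = 1_000                                  # sample size
beta = np.array([1.0, 0.5])                # intercept and slope used to generate the data

x = rng.normal(size=n)                     # covariate (feature)
X = np.column_stack([np.ones(n), x])       # design matrix with a constant
y = X @ beta + rng.normal(size=n)          # outcome = linear CEF plus noise

# Least squares estimates (lstsq solves the normal equations numerically)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS estimates:", beta_hat)          # should be close to [1.0, 0.5] in large samples
```

If this looks unfamiliar, the tutorials at https://quantecon.org/ linked above cover everything the sketch uses.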
Extensions: Extensions for assignments will not be granted. The penalty for lateness is relatively
minor and I also drop the lowest homework grade. I am mindful that you may find the workload
during the first semester of the Ph.D. program challenging. The goal is not to create a miserable
experience, but rather to introduce you to a variety of tools that will be of continuing value as an
economist. Part of the “trick” of getting through the core Ph.D. coursework is to not let work pile
up. There will be times when you may not complete all your work to the standards you are perhaps
used to. This is normal. Do the best you can on your problem sets and turn them in on time.
This allows you to not fall behind. Adopt a growth mindset. Perfection is neither expected nor
optimal; try to enjoy the challenge. Think of the problem sets as puzzles if that helps. Conditional on meeting a basic proficiency standard, your course grade is not particularly important. Work hard, but do not worry.
Academic Integrity: Please read the Center for Student Conduct’s statement on Academic
Integrity at http://sa.berkeley.edu/conduct/integrity. I take issues of intellectual honesty very seriously.
E-mail and office hours: I prefer to avoid having substantive communications by e-mail. Please
limit e-mail use to short yes/no queries. I am unlikely to read or respond to a long/complex e-mail.
Please make use of my office hours. This is time specifically allocated for your use; please come by. I look forward to getting to know all of you. You can sign up for office hour slots online here.
Course Outline
W 10/23   Probability Distributions, Bayes' Rule
          Readings: Mitzenmacher & Upfal (2005, Chs. 1-2); Stachurski (2016, Ch. 4)
M 10/28   Conditional Expectation Functions
          Readings: Wooldridge (2010, Ch. 2); Stachurski (2016, Ch. 5)
W 10/30   K-Normal Means
          Readings: Efron (2004); Wasserman (2006, Ch. 7)
M 11/4    K-Normal Means
          Readings: Wasserman (2006, Ch. 7); Stein (1981)
W 11/6    Linear Regression
          Readings: Wooldridge (2010, Chs. 2, 4); Card (1995); Card & Krueger (1996)
M 11/11   No Class (Veterans' Day)
M 11/18   Large Sample Theory for OLS
          Readings: Wooldridge (2010, Chs. 3-4)
References
Altonji, J. G. & Pierret, C. R. (2001). Employer learning and statistical discrimination. Quarterly
Journal of Economics, 116(1), 313 – 350.
Ashenfelter, O., Ashmore, D., Baker, J. B., Gleason, S., & Hosken, D. S. (2006). Empirical methods
in merger analysis: econometric analysis of pricing in FTC v. Staples. International Journal of the
Economics of Business, 13(2), 265 – 279.
Card, D. (1995). Earnings, schooling, and ability revisited. Research in Labor Economics, 14, 23 – 48.
Card, D. & Krueger, A. (1996). Labor market effects of school quality: theory and evidence. In
Does Money Matter? (pp. 97 – 140). Washington, D.C.: Brookings Institution Press.
Deaton, A. (1989). Rice prices and income distribution in Thailand: a non-parametric analysis.
Economic Journal, 99(395), 1 – 37.
Efron, B. (2004). The estimation of prediction error. Journal of the American Statistical Association,
99(467), 619 – 632.
Ferguson, T. S. (1996). A Course in Large Sample Theory. London: Chapman & Hall.
Manski, C. F. (2007). Identification for Prediction and Decision. Cambridge, MA: Harvard Uni-
versity Press.
McKinney, W. (2022). Python for Data Analysis: Data Wrangling with pandas, NumPy, and
Jupyter. Cambridge: O’Reilly, 3rd edition.
Mitzenmacher, M. & Upfal, E. (2005). Probability and Computing. Cambridge: Cambridge Uni-
versity Press.
Mood, A. M., Graybill, F. A., & Boes, D. C. (1974). Introduction to the Theory of Statistics. New
York: McGraw-Hill Book Company, 3rd edition.
Stachurski, J. (2016). A Primer in Econometric Theory. Cambridge, MA: The MIT Press.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA:
MIT Press, 2nd edition.