Department of Economics, Columbia University

Economics UN3412, Spring 2022

Final Mock Exam (Section 1)

(Wednesday, May 11 at 1:10 PM)

Instructions

(1) Do not turn this page until instructed to do so.


(2) The total points are 100.
(3) The exam has two sections: one for 6 short questions and the other for 2 long
questions. Please put your answers in the space provided under each question.
(4) This is a closed-book exam. However, a formula sheet is given to help you
answer questions.
(5) No calculators, computers, wireless, or other electronic devices are allowed.
You may not share resources with anyone else.
(6) Points will be deducted if you separate the pages, including the last page.
(7) If you believe there are errors, state them and you will receive extra credit
if you are correct.
(8) Columbia’s academic honor code applies.
(9) Put your name and Columbia UNI below.

Name:

UNI:

A. SHORT QUESTIONS
1) Consider panel data with T = 2 time periods. You have estimated a simple regression-in-changes
model and found a statistically significant positive intercept. Discuss the interpretation of the
intercept.
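For reference, a sketch of the specification in question (the notation is illustrative, not prescribed by the exam): with T = 2, the regression in changes is

ΔYi = β0 + β1 ΔXi + Δui,  where ΔYi = Yi2 − Yi1 and ΔXi = Xi2 − Xi1.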

2) For a logit regression with an intercept and one independent variable, write the logit
expression using Euler's constant. In other words, what mathematical formula, involving Euler's
constant, would we use to further express the equation below? Additionally, list two estimation
methods that can be used to estimate the parameters. (An illustrative sketch follows the answer
blanks.)

Pr(Y = 1|X) = F(β0 + β1X) =

1.
2.
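As an illustration of one of the estimation methods in practice, here is a minimal Python sketch using statsmodels; the simulated data and the coefficient values 0.5 and 1.5 are hypothetical, not part of the exam:

import numpy as np
import statsmodels.api as sm

# Simulated illustration: a binary outcome generated from a logistic model.
rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))  # probability that Y = 1
y = rng.binomial(1, p)

# statsmodels estimates the logit coefficients by maximum likelihood.
res = sm.Logit(y, sm.add_constant(x)).fit()
print(res.params)  # estimates should be close to (0.5, 1.5)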

3) Consider the TSLS model framework. Discuss the test used to determine whether the instruments
are weak. How, and from what, is the test statistic formed? How would we make the statistical
inference decision, and what decision value would we reference?
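As a concrete illustration of the diagnostic (a sketch with simulated data; the instruments z1 and z2 and all coefficient values are hypothetical):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated illustration: two instruments and one endogenous regressor d.
rng = np.random.default_rng(1)
n = 500
z1, z2 = rng.normal(size=n), rng.normal(size=n)
d = 0.5 * z1 + 0.3 * z2 + rng.normal(size=n)
df = pd.DataFrame({"d": d, "z1": z1, "z2": z2})

# First stage: regress the endogenous regressor on the instruments
# (any included exogenous controls would be added here as well).
first_stage = smf.ols("d ~ z1 + z2", data=df).fit()

# Joint F-test that both instrument coefficients are zero; the usual
# rule of thumb flags the instruments as weak when this F is below 10.
print(first_stage.f_test("z1 = 0, z2 = 0"))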
4) Suppose you are designing an experiment. You will randomly assign human subjects to a
treatment group or a control group. You suspect that the treatment will have different effects on
different categories of people in the treatment group. This is not a case of heterogeneous
populations in which every subject might have a different β1i; rather, it is a difference in
treatment effect based on a covariate (a control variable). Gender is one example of such a
covariate that might result in a different treatment effect. Discuss in words how you would
model this situation and write a sample regression formula.
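One common way to write such an interacted specification, offered purely as an illustration (here Xi is the treatment indicator and Wi a generic binary covariate such as a female dummy):

Yi = β0 + β1 Xi + β2 Wi + β3 (Xi × Wi) + ui,

so that the treatment effect is β1 when Wi = 0 and β1 + β3 when Wi = 1.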

5) Consider techniques to estimate models with many predictors. Using both words and
equations, discuss the similarities and differences between the ridge and lasso approaches.
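For reference, the two penalized least-squares objectives being compared, in standard form (λ is the penalty parameter and k the number of predictors):

Ridge: min_b Σ_{i=1}^n (Yi − b0 − b1X1i − · · · − bkXki)² + λ Σ_{j=1}^k bj²
Lasso: min_b Σ_{i=1}^n (Yi − b0 − b1X1i − · · · − bkXki)² + λ Σ_{j=1}^k |bj|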
6) Consider a sharp discontinuity situation in which the discontinuity occurs once a threshold
value (call it A) of the running variable (call it a) is crossed. Using a linear and a quadratic term,
you want to model one curve on one side of the discontinuity and a different curve on the other
side of the discontinuity. Write an equation that models the discontinuity and the two curves.
Follow the style of the authors Angrist and Pischke and use the difference between the running
variable and the threshold instead of using the value of the running variable outright.
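One parameterization in the Angrist and Pischke style, offered as an illustration rather than the unique answer (Di = 1{ai ≥ A} indicates crossing the threshold):

Yi = α + β1(ai − A) + β2(ai − A)² + ρDi + γ1 Di(ai − A) + γ2 Di(ai − A)² + ui,

where the γ terms allow the curve on one side of the threshold to differ from the curve on the other side.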

B1. FIRST LONG QUESTION


At times researchers might be trying to estimate an average treatment effect (ATE), and at other
times they might be trying to estimate a local average treatment effect (LATE). Consider the
regression equation below, in which the dummy variable Di is coded as 1 if the entity has
received treatment. In addition to the Y and D variables, we also have a valid instrumental
variable, Z, that can be used to predict D.

Yi = β0 + β1 Di + ui

You suspect a relationship between Di and ui, and so you have chosen to use an instrumental
variables approach and two-stage least squares.
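Before turning to the sub-questions, a numerical sketch of the logic with a binary instrument may be helpful; the simulated data and the true effect of 2.0 are purely illustrative:

import numpy as np

rng = np.random.default_rng(2)
n = 10_000
z = rng.integers(0, 2, size=n)                    # binary instrument
d = (z + rng.normal(size=n) > 0.5).astype(float)  # treatment, shifted by z
y = 2.0 * d + rng.normal(size=n)                  # outcome, true effect 2.0

rho = y[z == 1].mean() - y[z == 0].mean()  # reduced form: effect of Z on Y
phi = d[z == 1].mean() - d[z == 0].mean()  # first stage: effect of Z on D
print(rho, phi, rho / phi)                 # the ratio recovers the LATE (about 2.0)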
7) Using the Angrist and Pischke terminology and notation, consider the reduced form, the effect
of Z on Y. Write an expression for the effect of Z on Y, call it ρ, using the notation of a
difference of conditional expectations. Then write the reduced-form regression equation, placing
ρ and the other coefficients and variables in their proper positions.

ρ =

8) Using the same type of conditional expectations notation, write an expression for the effect of
Z on D, the first stage of a TSLS process. Call the effect φ. Then write the first-stage regression
equation, placing φ and the other coefficients and variables in their proper positions.

φ =

9) Write the expression that combines ρ and φ into a measure of the local average treatment
effect, the LATE, λ. Then write the second-stage regression equation, placing λ and the other
coefficients in their proper positions.

λ =

10) (2 points) In the table below from the Angrist and Pischke discussion of KIPP charter
schools, to which of the four groups of students does the LATE estimate apply as a measure of
the effect of treatment?
11) Now consider the case of heterogeneous populations, in which each entity has its own β1i
and its own π1i, as these coefficients are used in the equations below. Of course, the TSLS
process will not estimate individual values of β1i and π1i but will rather estimate a single value
for β̂1 and π̂1. Considering the LATE, which of these coefficients is the LATE, and why is it
not the ATE? Discuss the difference between the LATE and the ATE in this context.

Yi = β0 + β1i Xi + ui   (equation of interest)
Xi = π0 + π1i Zi + vi   (first stage of TSLS)
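As a reference point, a standard result for this setting (see, e.g., Stock and Watson's discussion of IV with heterogeneous causal effects): TSLS converges to a weighted average of the individual causal effects, with weights proportional to the individual first-stage effects,

β̂1 →p E(β1i π1i) / E(π1i).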

B2. SECOND LONG QUESTION


12) If removing omitted variable bias were our primary motivation for seeking out a panel dataset
relating to a particular research issue, we know that we could eliminate OVB of two types:
variables that …, and variables that …. Using words, fill in these two expressions.

13) In the panel data context, discuss what we are trying to address when we turn to clustered
standard errors. Include what dimension we cluster on, and discuss what comes together in the
clustering process.
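A minimal sketch of the mechanics in Python/statsmodels on a simulated two-period panel (all names and values are illustrative):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated panel: 100 entities observed in T = 2 periods, with an
# entity-level error component that makes u_it correlated within entity.
rng = np.random.default_rng(3)
n_entities, T = 100, 2
entity = np.repeat(np.arange(n_entities), T)
x = rng.normal(size=n_entities * T)
u = np.repeat(rng.normal(size=n_entities), T) + rng.normal(size=n_entities * T)
y = 1.0 + 0.5 * x + u
df = pd.DataFrame({"y": y, "x": x, "entity": entity})

# Clustering on the entity dimension: the variance estimator allows
# arbitrary correlation of the errors within each entity over time,
# while assuming independence across entities.
res = smf.ols("y ~ x", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["entity"]}
)
print(res.bse)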
14) Assuming that the panel model assumptions were met before we started the analysis, but we
forgot to use clustered standard errors, are the estimated coefficients still unbiased and
consistent?

15) For panel analysis, when we employ the model that uses binary regressors with an intercept,
and we are pursuing both entity fixed effects and time fixed effects, do we need to leave out both
one entity and one time period, or can we choose to leave out either one entity or one time
period?
’Metrics Formula Sheet

Random Variables

Probability
Sample space (S): the set of all possible outcomes of a particular experiment.
Event (A): a collection of possible outcomes of an experiment (a subset of S).
• Two events A and B are disjoint if A ∩ B = ∅.
• A1, A2, . . . are pairwise disjoint if Ai ∩ Aj = ∅ for all i ≠ j.
• If A1, A2, . . . are pairwise disjoint and ∪_{i=1}^∞ Ai = S, then the collection {A1, A2, . . .} forms a partition of S.

Axioms of Probability
Given a sample space S and a sigma algebra B, a probability (measure) is a function P with domain B that satisfies:
• P(A) ≥ 0 for all A ∈ B.
• P(S) = 1.
• If A1, A2, . . . ∈ B are pairwise disjoint, then P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai).

Conditional Probability
If A and B are events in S and Pr(B) > 0, then the conditional probability of A given B is
Pr(A|B) = Pr(A ∩ B) / Pr(B).

The Law of Total Probability
For a partition B1, B2, . . . , Bk with Pr(Bj) > 0 for j = 1, . . . , k, we have, for any A ⊂ S,
P(A) = Σ_{j=1}^k Pr(Bj) Pr(A|Bj).

Statistical Independence
Two events A and B are (statistically) independent if
Pr(A ∩ B) = Pr(A) Pr(B).
When A and B are independent,
Pr(A|B) = Pr(A) and Pr(B|A) = Pr(B),
provided that Pr(A) > 0 and Pr(B) > 0.

Bayes’ Theorem
Let B1, B2, . . . , Bk be a partition of the sample space such that Pr(Bj) > 0, and let A be an event such that Pr(A) > 0. Then for each i = 1, 2, . . . , k,
Pr(Bi|A) = Pr(Bi) Pr(A|Bi) / Σ_{j=1}^k Pr(Bj) Pr(A|Bj).

Cumulative Distribution Function
The cumulative distribution function (cdf) of a r.v. X, denoted F_X(x), is defined by
F_X(x) = Pr(X ≤ x) for all x.
The function F(x) is a cdf iff the following three conditions hold:
• lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.
• F(x) is a nondecreasing function of x.
• F(x) is right-continuous.
X is continuous if F_X(x) is a continuous function of x; X is discrete if F_X(x) is a step function of x. X and Y are identically distributed iff F_X(x) = F_Y(x) for every x.

Probability (Density) Function
The probability function (p.f.) of a discrete r.v. X is given by
f_X(x) = Pr(X = x) for all x.
The probability density function (pdf) f_X(x) of a continuous r.v. X is the function that satisfies
F_X(x) = ∫_{−∞}^x f_X(t) dt for all x.
A function f_X(x) is a pdf (or p.f.) of X iff
• f_X(x) ≥ 0 for all x, and
• Σ_x f_X(x) = 1 (p.f.) or ∫_{−∞}^∞ f_X(x) dx = 1 (pdf).

Joint and Marginal PDFs
A function from R² to R is called a joint probability density function of (X, Y) if for every A ⊂ R²,
Pr((X, Y) ∈ A) = ∬_A f(x, y) dx dy.
The marginal pdfs of X and Y are given by
f_X(x) = ∫_{−∞}^∞ f(x, y) dy,  f_Y(y) = ∫_{−∞}^∞ f(x, y) dx.
The joint cumulative distribution function of two random variables X and Y is
F(x, y) = Pr(X ≤ x and Y ≤ y) = ∫_{−∞}^y ∫_{−∞}^x f(r, s) dr ds
for all values of x ∈ R and y ∈ R.

Conditional PDF
Suppose that (X, Y) is a continuous random vector. For any x such that f_X(x) > 0, the conditional pdf of Y given that X = x is a function of y denoted by f(y|x) and defined by
f(y|x) := f(x, y) / f_X(x).

Independent Random Variables
X and Y are called independent random variables if for every (x, y),
f(x, y) = f_X(x) f_Y(y).
X and Y are independent iff f(x, y) = g(x)h(y) for some functions g and h.

Bayes’ Theorem for Continuous Random Variables
f_{Y|X}(y|x) = f_{X|Y}(x|y) f_Y(y) / ∫_{−∞}^∞ f_{X|Y}(x|y) f_Y(y) dy.

Expectation
The expected value or mean of a random variable g(X) is
Eg(X) = ∫_{−∞}^∞ g(x) f_X(x) dx if X is continuous, or Σ_x g(x) f_X(x) if X is discrete.
Let X be a r.v. and let a, b, and c be constants. Then
E(a·g1(X) + b·g2(X) + c) = a·Eg1(X) + b·Eg2(X) + c.
EX is the best predictor of X in the sense that
EX = argmin_b E(X − b)².

Conditional Expectation
The conditional expected value of g(Y) given X = x is defined by
E(g(Y)|x) := ∫_{−∞}^∞ g(y) f(y|x) dy (continuous case).
E(g(Y)|x) is a function of x, and thus E(g(Y)|X) is a random variable.

Law of Iterated Expectations
If X and Y are any two r.v.s, then
E[E[g(Y)|X]] = E[g(Y)].

Variance
The variance of X is VarX = E(X − EX)² = EX² − (EX)². The positive square root of VarX is the standard deviation of X. If X is a r.v. with finite variance, then
Var(a·X + b) = a² VarX
for any constants a and b.

Covariance and Correlation
The covariance of X and Y is the number defined by
Cov(X, Y) = E[(X − EX)(Y − EY)] = E(XY) − E(X)E(Y).
The correlation of X and Y is the number defined by
ρ_XY = Cov(X, Y) / [(VarX)^{1/2} (VarY)^{1/2}].

Markov Inequality
Let X be a r.v. and let g(x) be a nonnegative function. Then for any r > 0,
Pr(g(X) ≥ r) ≤ Eg(X) / r.

Convergence in Probability
A sequence of random variables Zn is said to converge in probability to a constant b if for every ε > 0, as n → ∞,
Pr(|Zn − b| ≥ ε) → 0.
Convergence in probability is denoted by Zn →p b.

Law of Large Numbers (LLN)
Let X̄n denote the sample average of independent and identically distributed (iid) random vectors X1, . . . , Xn. Suppose that E|X1| < ∞. Then X̄n →p EX1.

Convergence in Distribution
A sequence of random variables Xn is said to converge in distribution to a random variable X if
P(Xn ≤ x) → P(X ≤ x)
for every x at which x ↦ P(X ≤ x) is continuous. It is denoted by Xn →d X.

The Central Limit Theorem (CLT)
(Lindeberg–Lévy CLT) If X1, . . . , Xn form a random sample of size n from a given distribution with mean μ and variance σ² (0 < σ² < ∞), then for each fixed number x,
lim_{n→∞} P[ (X̄n − μ) / (σ/n^{1/2}) ≤ x ] = Φ(x),
where Φ denotes the cdf of the standard normal distribution.

Continuous Mapping Theorems
Let g : R → R be a continuous function. Then
(i) if Xn →d X, then g(Xn) →d g(X);
(ii) if Xn →p X, then g(Xn) →p g(X).

Statistics

Mean Squared Error
The mean squared error (MSE) of an estimator δ(X) of a function g(·) of the parameter θ is the function of θ defined by
E_θ{δ(X) − g(θ)}² = Var_θ[δ(X)] + {E_θ[δ(X)] − g(θ)}².
The bias of a point estimator δ(X) of g(θ) is
Bias_θ = E_θ[δ(X)] − g(θ).
An estimator whose bias is identically (in θ) equal to zero is called unbiased and satisfies E_θ[δ(X)] = g(θ) for all θ.

p-values
A p-value is valid if, for every θ that satisfies the null hypothesis and every 0 ≤ α ≤ 1,
Pr_θ(p(X) ≤ α) ≤ α.

Regression Models

Simple Linear Regression Model
The simple linear regression model is
Yi = β0 + β1 Xi + ui,  i = 1, . . . , n.
The OLS estimators of β0 and β1 are
β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)²,  β̂0 = Ȳ − β̂1 X̄.
The OLS predicted values Ŷi and residuals ûi are
Ŷi = β̂0 + β̂1 Xi,  ûi = Yi − Ŷi.

The Least Squares Assumptions
• Assumption #1: E[ui | Xi] = 0.
• Assumption #2: {(Xi, Yi) : i = 1, . . . , n} are independently and identically distributed (i.i.d.).
• Assumption #3: 0 < E(Xi⁴) < ∞ and 0 < E(Yi⁴) < ∞.

Properties of the OLS Estimator
We have that
β̂1 − β1 = [ n⁻¹ Σ_{i=1}^n (Xi − X̄) ui ] / [ n⁻¹ Σ_{i=1}^n (Xi − X̄)² ] = [ n⁻¹ Σ_{i=1}^n (Xi − E[Xi]) ui ] / var(Xi) + Rn,
where the remainder term Rn is asymptotically negligible.

The R²
The total sum of squares (TSS) consists of the explained sum of squares (ESS) and the sum of squared residuals (SSR):
TSS = ESS + SSR,
where
TSS = Σ_{i=1}^n (Yi − Ȳ)²,  ESS = Σ_{i=1}^n (Ŷi − Ȳ)²,  SSR = Σ_{i=1}^n ûi².
The regression R² is
R² = ESS/TSS = 1 − SSR/TSS.

Standard Error of β̂1
When n is large, the distribution of β̂1 is approximated by N(β1, σ²_{β̂1}), where
σ²_{β̂1} := (1/n) · var[(Xi − E[Xi]) ui] / [var(Xi)]².
An estimator of σ_{β̂1} is called the standard error of β̂1:
SE(β̂1) = √( (1/n) · [ (1/(n−2)) Σ_{i=1}^n (Xi − X̄)² ûi² ] / [ (1/n) Σ_{i=1}^n (Xi − X̄)² ]² ).
Under homoskedasticity, var(ui | Xi = x) = var(ui), and
SE_homoskedastic(β̂1) = √( (1/n) · [ (1/(n−2)) Σ_{i=1}^n ûi² ] / [ (1/n) Σ_{i=1}^n (Xi − X̄)² ] ).

The t-Statistic of β̂1
Consider H0 : β1 = β1,0 vs. H1 : β1 ≠ β1,0. Let
t = (β̂1 − β1,0) / SE(β̂1).
When n is large, we obtain the p-value by computing
p-value = 2Φ(−|t_act|).

Confidence Intervals for β1
When n is large, an asymptotically valid (1 − α) confidence interval for β1 is constructed as
[ β̂1 − cv(1 − α) SE(β̂1), β̂1 + cv(1 − α) SE(β̂1) ],
where cv(1 − α) satisfies Φ[−cv(1 − α)] = α/2.

A Formula for Omitted Variable Bias
β̂1 →p β1 if and only if cov(Xi, ui) = 0. In particular,
β̂1 →p β1 + cov(Xi, ui) / var(Xi),
or, equivalently,
β̂1 →p β1 + corr(Xi, ui) · √( var(ui) / var(Xi) ).

The Bonferroni Test of a Joint Hypothesis
Consider H0 : βj = 0 for all j ∈ {1, . . . , J}. The Bonferroni test rejects H0 if there exists some j such that |tj| > c, for the critical value c such that P(|Z| > c) = α/J.

The F-Statistic
In general, if there are q restrictions, the F-statistic times q follows a chi-squared distribution with q degrees of freedom under H0. When q = 2, the F-statistic combines the two t-statistics using the formula
F = (1/2) · ( t1² + t2² − 2 ρ̂_{t1,t2} t1 t2 ) / ( 1 − ρ̂²_{t1,t2} ),
where ρ̂_{t1,t2} is an estimator of the correlation between the two t-statistics. Under H0, the F-statistic is distributed as F_{2,∞}, and the p-value is computed as
p-value = P(F_{2,∞} > F_act).
One can construct an asymptotically valid (1 − α) confidence set using the F-statistic:
{ (β1, β2) : F(β1, β2) ≤ cv(F_{2,∞}, 1 − α) },
where cv(F_{2,∞}, 1 − α) is the critical value.

Errors-in-Variables Bias
Under the classical measurement error model,
β̂1 →p [ σ²_X / (σ²_X + σ²_w) ] β1,
where σ²_X is the variance of Xi (the unmeasured true value of X) and σ²_w is the variance of wi (the measurement error).

Prepared for ECON W3412 (Date: Feb 27, 2022)
The LaTeX template is modified from Winston Chang’s latexsheet: http://wch.github.io/latexsheet/
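The OLS and heteroskedasticity-robust standard error formulas above translate directly into code. A minimal numpy sketch on simulated data (illustrative only; the data-generating values are arbitrary):

import numpy as np

rng = np.random.default_rng(4)
n = 1_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + 0.5 * np.abs(x))  # heteroskedastic errors

# OLS slope and intercept, exactly as on the sheet.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x

# Heteroskedasticity-robust SE(b1), following the formula above.
num = np.sum((x - x.mean()) ** 2 * u_hat ** 2) / (n - 2)
den = (np.sum((x - x.mean()) ** 2) / n) ** 2
se_b1 = np.sqrt(num / den / n)
print(b1, se_b1)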
