5ssmn932 Lecture7 2021 Collated Online

Lecture 7

Introduction: Instrumental Variables Regression

Dragos Radu
[email protected]

5SSMN932: Introduction to Econometrics


outline lecture 7

• motivation and introduction to IV regression
• how does IV regression work?
• assumptions of IV
• application: institutions and economic performance
• the two stage least squares estimator (TSLS)
• formula for the TSLS
• sampling distribution of $\hat{\beta}_1^{TSLS}$
• further examples and applications

recommended reading for this week: Stock and Watson chapter 12.1

corruption and GDP growth (recap from last tutorial)

With the multiple regression (including controls for the trade share, schooling and initial level of GDP) we estimated the unbiased effect of corruption on the GDP growth rate. This effect is represented by the blue area.
We obtained the same unbiased effect in two steps:

step I a: predict the growth residuals

In the first step we predicted the residuals from regressing GDP growth on the control variables (excluding corruption). These residuals are represented by the yellow and blue areas: they capture the variation in GDP GROWTH that does not depend on the control variables.

step I b: predict the corruption residuals

We also predicted the residuals from regressing corruption on the control variables (excluding GDP growth). These residuals are represented by the blue and brown areas: they capture the variation in CORRUPTION that does not depend on the control variables.

step II: regress the GDP growth residuals on the corruption residuals

We can now run a simple regression of the growth residuals (yellow and blue) on the corruption residuals (blue and brown) and obtain the unbiased effect of CORRUPTION on GDP GROWTH, depicted by the blue area of overlap.
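
The two-step procedure above is an instance of the Frisch-Waugh-Lovell result: the coefficient on corruption from the multiple regression equals the slope from regressing the growth residuals on the corruption residuals. The sketch below checks this on simulated data; the single control `w`, the coefficients and the helper functions are hypothetical choices for illustration, not the tutorial dataset.

```python
import random

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    # sample covariance with divisor n (the divisor cancels in every ratio below)
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def residuals(y, x):
    # residuals from the simple OLS regression of y on x (with intercept)
    slope = cov(x, y) / cov(x, x)
    intercept = mean(y) - slope * mean(x)
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]

random.seed(1)
n = 500
w = [random.gauss(0, 1) for _ in range(n)]                # hypothetical control
x = [0.8 * wi + random.gauss(0, 1) for wi in w]           # "corruption", correlated with w
y = [1.0 - 0.5 * xi + 0.7 * wi + random.gauss(0, 1)       # "growth"
     for xi, wi in zip(x, w)]

# multiple regression of y on x and w: coefficient on x from the normal equations
b_multi = ((cov(x, y) * cov(w, w) - cov(w, y) * cov(x, w))
           / (cov(x, x) * cov(w, w) - cov(x, w) ** 2))

# two steps: residualise y and x on w, then regress residuals on residuals
ry = residuals(y, w)
rx = residuals(x, w)
b_fwl = cov(rx, ry) / cov(rx, rx)

print(round(b_multi, 6), round(b_fwl, 6))  # the two estimates coincide
```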
instrumental variables regression
eliminate bias when $E(u|X) \neq 0$

We use a similar approach to deal with violations of the zero conditional mean assumption when:
• the OVB comes from an unobservable variable (which we cannot include in the regression)
• there is simultaneous causality bias (‘X causes Y’ and ‘Y causes X’)
• there is errors-in-variables bias (X is measured with error)

All three problems result in $E(u|X) \neq 0$.


what comes next?

• part 1: how does IV work? two conditions for a valid instrument
• part 2: application: institutions and economic performance
• part 3: the two stage least squares estimator (TSLS)
Lecture 7
Part I: How does IV work?


outline lecture 7 part 1

• the idea of IV regression
• conditions for a valid instrument
• the IV model
instrumental variables regression
eliminate bias when $E(u|X) \neq 0$

Instrumental variables (IV) regression can be used to eliminate bias when $E(u|X) \neq 0$ in a regression
$$Y = \beta_0 + \beta_1 X + u$$
Similar to our tutorial example, IV regression breaks X into two parts:
• one part that is correlated with u, and
• another part that is uncorrelated with u.

We do this using an instrumental variable, Z, which is correlated with X but uncorrelated with u.
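
A small simulation can make this split of X concrete. It is a minimal sketch under an assumed data-generating process (not from the lecture): X = Z + v, and the error u shares the component v, so X is endogenous while Z is not. The fitted values from regressing X on Z form the part of X that is uncorrelated with u; the remainder carries the correlation.

```python
import random

def mean(a):
    return sum(a) / len(a)

def corr(a, b):
    # sample correlation coefficient
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

random.seed(2)
n = 100_000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument, independent of everything else
v = [random.gauss(0, 1) for _ in range(n)]   # source of endogeneity
u = [vi + random.gauss(0, 1) for vi in v]    # regression error shares v with X
x = [zi + vi for zi, vi in zip(z, v)]        # endogenous regressor

# first-stage fit: regress X on Z and keep the fitted values
mz, mx = mean(z), mean(x)
p1 = (sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
      / sum((zi - mz) ** 2 for zi in z))
p0 = mx - p1 * mz
xhat = [p0 + p1 * zi for zi in z]            # part of X driven by Z
vhat = [xi - xh for xi, xh in zip(x, xhat)]  # remaining part of X

print(round(corr(x, u), 3))     # X as a whole is correlated with u (about 0.5)
print(round(corr(xhat, u), 3))  # fitted part: close to 0
print(round(corr(vhat, u), 3))  # residual part carries the correlation
```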
standard regression: $Y = \beta_0 + \beta_1 X + u$

[diagram: X, Y]
no association between X and u; $E(u|X) = 0$ ⇒ OLS consistent

[diagram: X, Y]
correlation between X and u; $E(u|X) \neq 0$ ⇒ OLS inconsistent

[diagram: Z, X, Y]
correlation between X and u; $E(u|X) \neq 0$ ⇒ OLS inconsistent
instrumental variable Z uncorrelated with u, correlated with X
terminology

An endogenous variable is one that is correlated with u.
An exogenous variable is one that is uncorrelated with u.

In IV regression, we focus on the case in which X is endogenous and there is an instrument, Z, which is exogenous.

Digression on terminology: endogenous literally means “determined within the system.” If X is jointly determined with Y, then a regression of Y on X is subject to simultaneous causality bias. But this definition of endogeneity is too narrow, because IV regression can also be used to address omitted variable bias and errors-in-variables bias. Thus we use the broader definition of endogeneity above.
two conditions for a valid instrument

$Y = \beta_0 + \beta_1 X + u$
For an instrumental variable (an “instrument”) Z to be valid, it must satisfy two conditions:
1 Instrument relevance: $\mathrm{corr}(Z, X) \neq 0$
2 Instrument exogeneity: $\mathrm{corr}(Z, u) = 0$
Suppose for now that you have such a Z (we’ll discuss how to find instrumental variables later). How can you use Z to estimate $\beta_1$?
two stage least squares estimates (2SLS)

$Y = \beta_0 + \beta_1 X + u$
TSLS is so named because the estimate can be obtained directly from two consecutive OLS regressions:
1 OLS regression of X on Z to get $\hat{X}$
2 OLS regression of Y on $\hat{X}$ to get $\hat{\beta}_{IV}$

$\hat{\beta}_{IV}$ is a consistent estimator of $\beta_1$.
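
The two consecutive OLS regressions can be sketched in a few lines. The simulated design (true $\beta_1 = 2$, an error term that shares a component with X but not with Z) and the helper `ols` are assumptions for illustration; the point is that OLS stays biased while TSLS settles near $\beta_1$ in a large sample.

```python
import random

def ols(y, x):
    # simple OLS of y on x with intercept; returns (intercept, slope)
    my, mx = sum(y) / len(y), sum(x) / len(x)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

random.seed(3)
n = 50_000
beta0, beta1 = 1.0, 2.0                                   # hypothetical true values
z = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
x = [0.7 * zi + vi for zi, vi in zip(z, v)]               # relevance: X depends on Z
u = [0.8 * vi + random.gauss(0, 1) for vi in v]           # endogeneity: u shares v with X, not with Z
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# stage 1: regress X on Z and keep the fitted values
p0, p1 = ols(x, z)
xhat = [p0 + p1 * zi for zi in z]

# stage 2: regress Y on the fitted values
_, b_tsls = ols(y, xhat)

# naive OLS for comparison
_, b_ols = ols(y, x)

print(f"OLS:  {b_ols:.3f}")   # pushed above 2 because cov(X, u) > 0
print(f"TSLS: {b_tsls:.3f}")  # close to the true beta1 = 2
```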


what comes next?

• part 2: IV example: institutions and economic performance
• part 3: two stage least squares (TSLS)
Lecture 7
Part II
IV Example: Institutions and Economic Prosperity


outline lecture 7 part 2

application: institutions and economic performance
• empirical question: do better institutions cause economic growth?
• main intuition: where do good institutions come from?
• method and empirical setup
• replication of results in Stata (tutorial exercise)
IV example: institutions and economic performance
Acemoglu, Johnson, Robinson (AJR)

To illustrate how IV regression works we use a classic paper in development economics which studies the effect of political institutions on economic performance:
The Colonial Origins of Comparative Development: An Empirical Investigation
by D. Acemoglu, S. Johnson and J. A. Robinson
The American Economic Review, Vol. 91, No. 5 (Dec., 2001)

The theory they want to test is that good institutions (rule of law, property rights) should result in a country having higher long-term economic output than if the same country had poor institutions.
empirical question
Acemoglu, Johnson, Robinson (AJR)

why does prosperity vary so much across countries today?
• income per capita in sub-Saharan Africa is on average 1/20th of that in the US
• in Mali and Congo it is 1/35th of US income per capita
standard economic explanations (recall the growth models):
• differences in physical capital
• differences in human capital
• differences in technology
but where are these differences coming from?
what are the more fundamental causes?
empirical question
Acemoglu, Johnson, Robinson (AJR)

why do some countries invest less in physical and human capital?
why do some countries fail to adopt technologies?
three potential fundamental causes:
• institutions (humanly devised rules)
• geography (exogenous differences in environment)
• culture (differences in beliefs, attitudes, preferences)

we look at the method applied by AJR to identify the role played by institutions in explaining differences in economic prosperity
what are institutions?
Acemoglu, Johnson, Robinson (AJR)

institutions: rules of the game in economic, political and social interactions

big differences in economic and political institutions across countries:
• enforcement and property rights
• legal system
• corruption
• entry barriers
• democracy/oligarchy/dictatorship
• constraints on elites
• electoral rules
• ...
institutions and economic performance
Acemoglu, Johnson, Robinson (AJR)
institutions and economic performance
Acemoglu, Johnson, Robinson (AJR)

does this answer the question: do institutions cause prosperity?

not really! potential problems:
• omitted variables (OVB): e.g. culture and geography are correlated with both income and institutions
• reverse causality: e.g. prosperity might favour the rise of certain institutions
• measurement error: e.g. poor measures of institutions cause “errors in variables”
institutions and economic performance

the yellow circle represents the variation in economic performance (GDP); the orange circle represents the variation in institutions; the blue area of overlap represents the effect of institutions on GDP
institutions and economic performance

we introduce additional control variables to eliminate potential sources of bias: the red area of overlap between the three variables represents the variation in GDP that is explained by institutions and the control variables; the remaining blue area represents the variation in GDP uniquely explained by institutional variation, i.e. the true effect of institutions on economic performance
institutions and economic performance

problem: what if relevant control variables are not available or are unobservable? what if higher GDP also leads to better institutions (reverse causality)? then there is no way to distinguish between the red and the blue areas, i.e. we cannot identify the true effect of institutions on economic performance
institutions and economic performance

we need an instrumental variable for institutions which should have two properties:
• it must be independent of the error term (no overlap with the yellow or red areas)
• it must be a determinant of institutions (large overlap with the orange and blue areas)
do good institutions cause economic prosperity?
$Y = \beta_0 + \beta_1 X + u$
do good institutions cause economic prosperity?
example: conquistadores
where do good institutions come from?
Acemoglu, Johnson, Robinson (AJR)

1 Different types of colonisation policies:
• extremely “extractive states” (e.g. Belgian Congo)
• “Neo-Europes” (USA, Canada, Australia, New Zealand)
2 Colonisation strategy was influenced by the local disease environment:
• “Neo-Europes” would not be established in areas where European colonists faced high mortality
3 Colonial institutions persisted even after independence
where do good institutions come from?
Acemoglu, Johnson, Robinson (AJR)

AJR propose the following causal chain:
(potential) settler mortality ⇒ settlement ⇒ early institutions ⇒ current institutions ⇒ current economic performance
where do good institutions come from?
Acemoglu, Johnson, Robinson (AJR)

if (potential) settler mortality affects current prosperity only via institutions, then we can use settler mortality rates as an instrument
method: is the instrument relevant?
is the settlers’ mortality rate relevant for property rights nowadays?
institutions and economic performance

two conditions for the instrument to be valid:
• it must be a determinant of institutions
• it must affect GDP only via institutions

if settler mortality satisfies these conditions we can use it as an instrument for institutions
institutions and economic performance

the estimation works in two stages:
1 we regress institutions on the instrument and predict the green and brown area
2 we regress GDP on these predicted values (green and brown) to produce a consistent estimate of the effect of institutions on GDP

this corresponds to the green area of overlap
institutions and economic performance

$Y = \beta_0 + \beta_1 X + u$
Y: economic performance
X: institutions
Z: settler mortality rates (instrument)
first stage: we regress X on Z and predict $\hat{X}$, i.e. brown + green
second stage: we regress Y on $\hat{X}$ to produce an estimate of $\beta_1$
institutions and economic performance

the information in the green area is used to form the estimate of $\beta_1$, because the green area corresponds to variation in prosperity arising entirely from variation in institutions
however, the green area is much smaller than the red + blue + green area, which means we use less information in our IV estimation: less efficiency = higher standard errors
empirical setup

method: Two Stage Least Squares (TSLS)
• second stage: log income per capita = f(current economic institutions)
• first stage: current economic institutions = g(settler mortality)

. use iv_tutorial8.dta
. reg prot logmort
. predict prothat
. reg lgdp prothat
first stage: institutions and settler mortality
Acemoglu, Johnson, Robinson (AJR)

. graph twoway (lfit prot logmort)(scatter prot logmort)


first stage results, TSLS

. reg prot logmort


Source | SS df MS Number of obs = 64
-------------+------------------------------ F( 1, 62) = 23.82
Model | 37.7117954 1 37.7117954 Prob > F = 0.0000
Residual | 98.1744456 62 1.5834588 R-squared = 0.2775
-------------+------------------------------ Adj R-squared = 0.2659
Total | 135.886241 63 2.15692446 Root MSE = 1.2584

------------------------------------------------------------------------------
prot | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
logmort | -.6213181 .1273148 -4.88 0.000 -.8758166 -.3668195
_cons | 9.400169 .6116454 15.37 0.000 8.177507 10.62283
------------------------------------------------------------------------------

. predict prothat
(option xb assumed; fitted values)
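
As a quick arithmetic check on the first-stage output: with a single instrument the overall F statistic is the square of the t statistic on logmort, and R-squared is the model sum of squares divided by the total sum of squares. The numbers below are copied from the first-stage table above.

```python
# Numbers copied from the first-stage output (reg prot logmort)
coef_logmort = -0.6213181
se_logmort = 0.1273148
ss_model = 37.7117954
ss_total = 135.886241

t_stat = coef_logmort / se_logmort
f_stat = t_stat ** 2          # one regressor: F equals t squared
r_squared = ss_model / ss_total

print(round(t_stat, 2), round(f_stat, 2), round(r_squared, 4))  # -4.88 23.82 0.2775
```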
second stage results, TSLS

. reg lgdp prothat

Source | SS df MS Number of obs = 64
-------------+------------------------------ F( 1, 62) = 53.34
Model | 31.7169566 1 31.7169566 Prob > F = 0.0000
Residual | 36.8647619 62 .594592934 R-squared = 0.4625
-------------+------------------------------ Adj R-squared = 0.4538
Total | 68.5817185 63 1.08859871 Root MSE = .7711

------------------------------------------------------------------------------
lgdp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
prothat | .9170798 .1255658 7.30 0.000 .6660774 1.168082
_cons | 2.086889 .8237977 2.53 0.014 .4401407 3.733637
------------------------------------------------------------------------------
compare this to the “naive” OLS estimate

. reg lgdp prot

------------------------------------------------------------------------------
lgdp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
prot | .522107 .061185 8.53 0.000 .3997999 .6444142
_cons | 4.660383 .4085062 11.41 0.000 3.843791 5.476976
------------------------------------------------------------------------------
IV in Stata

. ivreg lgdp (prot=logmort)

Instrumental variables (2SLS) regression

Source | SS df MS Number of obs = 64
-------------+------------------------------ F( 1, 62) = 37.29
Model | 15.8432814 1 15.8432814 Prob > F = 0.0000
Residual | 52.7384371 62 .850619954 R-squared = 0.2310
-------------+------------------------------ Adj R-squared = 0.2186
Total | 68.5817185 63 1.08859871 Root MSE = .92229

------------------------------------------------------------------------------
lgdp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
prot | .9170798 .1501859 6.11 0.000 .6168625 1.217297
_cons | 2.086889 .9853227 2.12 0.038 .1172566 4.056521
------------------------------------------------------------------------------
Instrumented: prot
Instruments: logmort
------------------------------------------------------------------------------
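
The coefficient from the manual second stage (`reg lgdp prothat`) and from `ivreg` is the same (.9170798), but the standard errors differ (.1255658 vs .1501859). The manual regression builds its residuals from prothat, whereas the TSLS residuals must use the actual prot; since both share the same denominator (the variation in prothat), the two standard errors should differ only by the ratio of their Root MSEs. A quick check with the numbers reported in the two outputs:

```python
# Standard errors reported in the two Stata outputs
se_manual = 0.1255658   # reg lgdp prothat
se_ivreg = 0.1501859    # ivreg lgdp (prot=logmort)

# Root MSE from the same two outputs
rmse_manual = 0.7711    # residuals computed with prothat
rmse_ivreg = 0.92229    # residuals computed with prot

# same denominator, different residual variance: rescale the manual SE
se_rescaled = se_manual * rmse_ivreg / rmse_manual
print(round(se_rescaled, 4))  # 0.1502, matching ivreg up to rounding

# t statistic reported by ivreg
t_ivreg = 0.9170798 / se_ivreg
print(round(t_ivreg, 2))      # 6.11
```

This is why the standard errors from a manually run second stage should not be used for inference.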
IV in Stata
. ivreg lgdp (prot=logmort), first
First-stage regressions
------------------------------------------------------------------------------
prot | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
logmort | -.6213181 .1273148 -4.88 0.000 -.8758166 -.3668195
_cons | 9.400169 .6116454 15.37 0.000 8.177507 10.62283
------------------------------------------------------------------------------
Instrumental variables (2SLS) regression
Source | SS df MS Number of obs = 64
-------------+------------------------------ F( 1, 62) = 37.29
Model | 15.8432814 1 15.8432814 Prob > F = 0.0000
Residual | 52.7384371 62 .850619954 R-squared = 0.2310
-------------+------------------------------ Adj R-squared = 0.2186
Total | 68.5817185 63 1.08859871 Root MSE = .92229
------------------------------------------------------------------------------
lgdp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
prot | .9170798 .1501859 6.11 0.000 .6168625 1.217297
_cons | 2.086889 .9853227 2.12 0.038 .1172566 4.056521
------------------------------------------------------------------------------
Instrumented: prot
Instruments: logmort
------------------------------------------------------------------------------
conclusions from AJR, 2001

• differences in colonial experience might be a source of exogenous differences in institutions if:
  • early institutions persisted to the present and
  • there is a causal chain: mortality-settlement-institutions
• is the income-institutions relationship robust to additional controls?
• if yes, there are economic gains from improving institutions
what comes next?

• part 3: two stage least squares: intuition and derivation of $\beta_{IV}$


Lecture 7
Part III
The Two Stage Least Squares Estimator


outline lecture 7 part 3

• derivation of $\beta_{IV}$
• explain the math behind the intuition
• further IV examples
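derivation of $\beta_{IV}$

$Y = \beta_0 + \beta_1 X + u$
if Z is an instrument for X, we can write:

$$\begin{aligned}
\mathrm{cov}(Z, Y) &= \mathrm{cov}(Z, \beta_0 + \beta_1 X + u) \\
&= \mathrm{cov}(Z, \beta_0) + \mathrm{cov}(Z, \beta_1 X) + \mathrm{cov}(Z, u) \\
&= 0 + \mathrm{cov}(Z, \beta_1 X) + 0 \\
&= \beta_1 \, \mathrm{cov}(Z, X)
\end{aligned}$$

where $\mathrm{cov}(Z, \beta_0) = 0$ because $\beta_0$ is a constant and $\mathrm{cov}(Z, u) = 0$ because of instrument exogeneity. Dividing by $\mathrm{cov}(Z, X)$, which is nonzero by instrument relevance, gives:

$$\beta_1 = \frac{\mathrm{cov}(Z, Y)}{\mathrm{cov}(Z, X)}$$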
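
Replacing the population covariances by sample covariances gives the IV estimator $\hat{\beta}_{IV} = s_{ZY}/s_{ZX}$. The sketch below, on hypothetical simulated data, checks that this ratio coincides with the slope from the two consecutive OLS regressions (TSLS); the coincidence is exact because $\hat{X}$ is a linear function of Z.

```python
import random

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    # sample covariance with divisor n - 1
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(4)
n = 1_000
z = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + vi for zi, vi in zip(z, v)]                       # hypothetical first stage
y = [1.0 + 2.0 * xi + vi + random.gauss(0, 1) for xi, vi in zip(x, v)]  # endogenous X

# IV estimator: ratio of sample covariances
b_iv = cov(z, y) / cov(z, x)

# TSLS: first stage X on Z, second stage Y on the fitted X
p1 = cov(z, x) / cov(z, z)
p0 = mean(x) - p1 * mean(z)
xhat = [p0 + p1 * zi for zi in z]
b_tsls = cov(xhat, y) / cov(xhat, xhat)

print(b_iv, b_tsls)  # identical up to floating-point rounding
```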
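the variance of $\hat{\beta}_{IV}$

the formula for $\hat{\beta}_{IV}$ allows us to approximate its sampling distribution in large samples by a normal distribution with mean $\beta_1$ and variance:

$$\mathrm{Var}(\hat{\beta}_{IV}) = \frac{1}{n} \, \frac{\mathrm{Var}\big[(Z_i - \mu_Z) u_i\big]}{\big[\mathrm{Cov}(Z_i, X_i)\big]^2}$$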
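
A plug-in version of this variance formula replaces each population moment by its sample counterpart, with residuals $\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$ computed from the actual X (not $\hat{X}$). This is a minimal sketch under an assumed simulated design, for which the true standard error is $\sqrt{8/n}$; it is not the exact routine used by any particular software package.

```python
import math
import random

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    # sample covariance with divisor n
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

random.seed(5)
n = 10_000
z = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + vi for zi, vi in zip(z, v)]      # hypothetical: Cov(Z, X) = 0.5
u = [vi + random.gauss(0, 1) for vi in v]        # hypothetical: Var(u) = 2, u independent of Z
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

# IV point estimates
b1 = cov(z, y) / cov(z, x)
b0 = mean(y) - b1 * mean(x)

# residuals use the actual X, not the first-stage fitted values
uhat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# plug-in version of Var = (1/n) Var[(Z - mu_Z) u] / Cov(Z, X)^2
mz = mean(z)
zu = [(zi - mz) * ui for zi, ui in zip(z, uhat)]
var_b1 = cov(zu, zu) / n / cov(z, x) ** 2
se_b1 = math.sqrt(var_b1)

print(f"beta1_IV = {b1:.3f}, SE = {se_b1:.4f}")
# in this design the population SE is sqrt(8 / n), about 0.028
```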
alternative derivation of $\beta_{IV}$ (TSLS) using the reduced form

the “reduced forms” relate X to Z and Y to Z:

$$X = \pi_0 + \pi_1 Z + e$$
$$Y = \alpha_0 + \alpha_1 Z + \mu$$

because Z is exogenous, Z is uncorrelated with both e and $\mu$.

The intuition: a unit change in Z results in a change in X of $\pi_1$ and a change in Y of $\alpha_1$. Because the change in X arises from the exogenous change in Z, that change in X is itself exogenous. Thus an exogenous change in X of $\pi_1$ units is associated with a change in Y of $\alpha_1$ units, so the effect on Y of an exogenous unit change in X is:

$$\beta_1 = \frac{\alpha_1}{\pi_1}$$
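
The reduced-form route can be sketched directly: estimate the two reduced forms by OLS and take the ratio of their slopes. The simulated design and the helper `slope` are assumptions for illustration; the ratio $\hat{\alpha}_1/\hat{\pi}_1$ is algebraically the same number as $s_{ZY}/s_{ZX}$.

```python
import random

def slope(y, x):
    # OLS slope of y on x (with intercept)
    my, mx = sum(y) / len(y), sum(x) / len(x)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

random.seed(6)
n = 20_000
z = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
x = [0.6 * zi + vi for zi, vi in zip(z, v)]                 # hypothetical first stage
y = [1.0 + 2.0 * xi + 0.9 * vi + random.gauss(0, 1)         # true beta1 = 2, error shares v with X
     for xi, vi in zip(x, v)]

pi1 = slope(x, z)       # reduced form for X: effect of Z on X
alpha1 = slope(y, z)    # reduced form for Y: effect of Z on Y
beta1_iv = alpha1 / pi1

print(f"pi1 = {pi1:.3f}, alpha1 = {alpha1:.3f}, beta1_IV = {beta1_iv:.3f}")
```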
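the math behind the intuition

$$X = \pi_0 + \pi_1 Z + e$$
$$Y = \alpha_0 + \alpha_1 Z + \mu$$

solve the first equation for Z and get:

$$Z = -\frac{\pi_0}{\pi_1} + \frac{1}{\pi_1} X - \frac{1}{\pi_1} e$$

substitute this into the Y equation and collect terms:

$$\begin{aligned}
Y &= \alpha_0 + \alpha_1 Z + \mu \\
&= \alpha_0 + \alpha_1 \left( -\frac{\pi_0}{\pi_1} + \frac{1}{\pi_1} X - \frac{1}{\pi_1} e \right) + \mu \\
&= \left( \alpha_0 - \alpha_1 \frac{\pi_0}{\pi_1} \right) + \frac{\alpha_1}{\pi_1} X + \left( \mu - \frac{\alpha_1}{\pi_1} e \right) \\
&= \beta_0 + \beta_1 X + u
\end{aligned}$$

where:

$$\beta_1 = \frac{\alpha_1}{\pi_1}$$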
additional intuition

1 OLS regression of X on Z to get $\hat{X}$
2 OLS regression of Y on $\hat{X}$ to get $\hat{\beta}_{IV}$

$$X = \pi_0 + \pi_1 Z + e \qquad Y = \alpha_0 + \alpha_1 Z + \mu$$
yield:
$$Y = \beta_0 + \beta_1 X + u$$
where:
$$\beta_1 = \frac{\alpha_1}{\pi_1} = \beta_{IV} = \frac{\text{causal effect of } Z \text{ on } Y}{\text{causal effect of } Z \text{ on } X}$$
Interpretation: an exogenous change in X of $\pi_1$ units is associated with a change in Y of $\alpha_1$ units, so the effect on Y of an exogenous unit change in X is $\beta_1 = \alpha_1 / \pi_1$.
additional intuition

$$\beta_{IV} = \frac{\text{causal effect of } Z \text{ on } Y}{\text{causal effect of } Z \text{ on } X}$$
further IV examples

• supply and demand functions
• the effect of class size on test scores
example 1: supply and demand for butter

IV regression was first developed to estimate demand elasticities for agricultural goods, for example butter:

$$\ln(Q^{butter}) = \beta_0 + \beta_1 \ln(P^{butter}) + u$$

• $\beta_1$ = price elasticity of butter = percent change in quantity for a 1% change in price (recall the log-log specification discussion)
• Data: observations on the price and quantity of butter for different years
• The OLS regression of $\ln(Q^{butter})$ on $\ln(P^{butter})$ suffers from simultaneous causality bias (why?)
example 1: supply and demand for butter
$\ln(Q^{butter}) = \beta_0 + \beta_1 \ln(P^{butter}) + u$

[figure: equilibrium price and quantity for 11 time periods]

how can we estimate the demand curve?

simultaneous causality bias in OLS: price and quantity are determined by the interaction of demand and supply

TSLS isolates demand points by using shifts in supply: the instrument shifts supply but not demand
example 1: supply and demand for butter

$\ln(Q_i^{butter}) = \beta_0 + \beta_1 \ln(P_i^{butter}) + u_i$

• $Z_i = rain_i$ = rainfall in dairy-producing regions
• Stage 1: regress $\ln(P_i^{butter})$ on rain to get $\widehat{\ln(P_i^{butter})}$, which isolates changes in log price that arise from supply (part of supply, at least)
• Stage 2: regress $\ln(Q_i^{butter})$ on $\widehat{\ln(P_i^{butter})}$

This is the regression counterpart of using shifts in the supply curve to trace out the demand curve.
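
A simulated butter market shows why OLS fails here and why rainfall helps. This is a minimal sketch assuming linear demand and supply in logs, with rainfall shifting only supply; all parameter values are hypothetical. Prices and quantities are generated as market equilibria, so OLS mixes the two curves, while the IV ratio recovers the demand elasticity.

```python
import random

def slope(y, x):
    # OLS slope of y on x (with intercept)
    my, mx = sum(y) / len(y), sum(x) / len(x)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

random.seed(7)
n = 20_000
b0, b1 = 5.0, -1.0   # hypothetical demand: lnQ = b0 + b1 * lnP + u
g0, g1 = 1.0, 1.0    # hypothetical supply: lnQ = g0 + g1 * lnP + d * rain + v
d = 1.0

rain = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]   # demand shocks
v = [random.gauss(0, 1) for _ in range(n)]   # other supply shocks

# market equilibrium: set demand equal to supply, solve for lnP, then read lnQ off demand
lnp = [(g0 - b0 + d * r + vi - ui) / (b1 - g1) for r, ui, vi in zip(rain, u, v)]
lnq = [b0 + b1 * p + ui for p, ui in zip(lnp, u)]

ols = slope(lnq, lnp)                      # mixes demand and supply responses
iv = slope(lnq, rain) / slope(lnp, rain)   # uses only rain-driven price changes

print(f"OLS: {ols:.2f}, IV: {iv:.2f}, true demand elasticity: {b1}")
```

With these parameter values OLS lands around -1/3, well away from the true elasticity of -1, while the IV ratio stays close to -1.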
IV example 2: class size and test scores

The California test score/class size regressions could still suffer from OV bias (e.g. parental involvement).
In principle, this bias can be eliminated by IV regression (TSLS).
IV regression requires a valid instrument, i.e. an instrument that is:
• relevant: $\mathrm{corr}(Z, STR) \neq 0$
• exogenous: $\mathrm{corr}(Z, u) = 0$
IV example 2: class size and test scores

here is a hypothetical instrument:
• some districts, randomly hit by an earthquake, “double up” classrooms:
  $Z_i = Quake_i = 1$ if hit by quake, $= 0$ otherwise
• Do the two conditions for a valid instrument hold?
• The earthquake makes it as if the districts were in a random assignment experiment. Thus, the variation in STR arising from the earthquake is exogenous.
• The first stage of TSLS regresses STR on Quake, thereby isolating the part of STR that is exogenous (the part that is “as if” randomly assigned)
IV example 2: class size and test scores
$TestSCR = \beta_0 + \beta_1 STR + u$

[diagram: STR, TestSCR]
no association between STR and u; $E(u|STR) = 0$ ⇒ OLS consistent

[diagram: STR, TestSCR]
correlation between STR and u; $E(u|STR) \neq 0$ ⇒ OLS inconsistent

[diagram: Quake, STR, TestSCR]
correlation between STR and u; $E(u|STR) \neq 0$ ⇒ OLS inconsistent
instrumental variable Quake uncorrelated with u, correlated with STR
IV example 2: class size and test scores

[diagram: Quake, STR, TestSCR]

$$TestSCR = \beta_0 + \beta_1 STR + u$$
$$STR = \pi_0 + \pi_1 Quake + e$$
$$TestSCR = \alpha_0 + \alpha_1 Quake + \mu$$
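
With a binary instrument such as Quake, the IV estimator $\hat{\alpha}_1/\hat{\pi}_1$ reduces to a ratio of differences in group means (often called the Wald estimator): the change in average test scores between hit and non-hit districts, divided by the corresponding change in average STR. The sketch below uses entirely hypothetical numbers (share of districts hit, coefficients, an unobserved `involvement` variable) to check this equivalence and to contrast it with OLS.

```python
import random

def mean(a):
    return sum(a) / len(a)

def slope(y, x):
    # OLS slope of y on x (with intercept)
    my, mx = mean(y), mean(x)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

random.seed(8)
n = 20_000
quake = [1 if random.random() < 0.1 else 0 for _ in range(n)]  # about 10% of districts hit
involvement = [random.gauss(0, 1) for _ in range(n)]            # unobserved (e.g. parental involvement)
str_ = [20 + 4 * q - 1.5 * a + random.gauss(0, 1)
        for q, a in zip(quake, involvement)]                    # quake raises STR
score = [700 - 2 * s + 10 * a + random.gauss(0, 5)
         for s, a in zip(str_, involvement)]                    # true effect of STR: -2

b_ols = slope(score, str_)                        # biased: involvement is omitted
b_iv = slope(score, quake) / slope(str_, quake)   # alpha1 / pi1

# Wald estimator: differences in group means, hit vs not hit
hit = [i for i in range(n) if quake[i] == 1]
rest = [i for i in range(n) if quake[i] == 0]
dy = mean([score[i] for i in hit]) - mean([score[i] for i in rest])
dx = mean([str_[i] for i in hit]) - mean([str_[i] for i in rest])
b_wald = dy / dx

print(f"OLS: {b_ols:.2f}, IV: {b_iv:.3f}, Wald: {b_wald:.3f}")
```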
conclusions: why IV?

Three important threats to internal validity are:
• omitted variable bias from a variable that is correlated with X but is unobserved (so cannot be included in the regression) and for which there are inadequate control variables;
• simultaneous causality bias (X causes Y, Y causes X);
• errors-in-variables bias (X is measured with error).
All three problems result in $E(u|X) \neq 0$.
Instrumental variables regression can eliminate these biases if a valid instrument is available.
what comes next?

next week: Regression with Panel Data
