
09 Causal Inference II: MSBA7003 Quantitative Analysis Methods

The instrumental variable Z is used to estimate the effect of the endogenous variable D on the outcome Y: Z is correlated with D but independent of other factors affecting Y; a two stage least squares regression first estimates the effect of Z on D, then uses this to instrument for D in predicting Y and obtain a consistent estimate of the causal effect.

Uploaded by

Amanda Wang

MSBA7003 Quantitative Analysis Methods

ZHANG, Wei
Assistant Professor
HKU Business School

09 Causal Inference II

Agenda
• Causal Graphs

• Back-door-blocking Strategy

• Instrumental Variable

• Two-Stage Least Squares Regression


• Single IV estimation
• General 2SLS
• Testing for validity of IV

Causal Graphs
• How can we make sure that the control variables allow a perfect stratification?
• We need causal graphs to conceptualize our problem.
• The goal is to represent explicitly all causes of the outcome.

• Each node of a causal graph represents a random variable


• Labeled by a letter such as A, B, or C
• Observed variables are represented by a solid circle ●
• Unobserved variables are represented by a hollow circle ○
• Causes are represented by a directed edge → (i.e., single-headed arrow),
such that an edge from one node to another signifies that the variable at the
origin causes the variable at the terminus.

Causal Graphs

[Figure: example causal graph with nodes U, C, D, A, X, and Y]

Transitivity of Causal Relations
• For any three variables 𝐴, 𝐵, and 𝐶, if 𝐴 causes 𝐵 and 𝐵 causes 𝐶, then 𝐴 causes 𝐶.
• This is an assumption.
• Counter-example: 𝐴 = playing computer games; 𝐵 = academic performance; 𝐶 = brain
capability; 𝐷 = study time (i.e., the effect of 𝐴 on 𝐶 is cancelled out by different mechanisms.)

[Figure: causal graph for the counter-example, showing nodes A and B]

Causal Structures
• There are three basic patterns of causal relationship that would exist for
three variables that are structurally related to each other.
• (1) A chain of mediation: A → C → B
• Unconditional dependence: P(AB)≠P(A)P(B)
• Conditional independence: P(AB|C)=P(A|C)P(B|C)

• (2) A fork of mutual dependence: A ← C → B
• Unconditional dependence: P(AB)≠P(A)P(B)
• Conditional independence: P(AB|C)=P(A|C)P(B|C)

• (3) An inverted fork of mutual causation: A → C ← B
• Unconditional independence: P(AB)=P(A)P(B)
• Conditional dependence: P(AB|C)≠P(A|C)P(B|C)
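
The three patterns can be checked numerically. The following is a minimal sketch, assuming linear relationships with standard normal noise (an assumption made only for illustration). It uses partial correlation as a stand-in for conditional dependence, which is appropriate only in this linear-Gaussian setting. All helper names are hypothetical.

```python
import random

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """Residuals of y after a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

def partial_corr(a, b, c):
    """Correlation of a and b after linearly removing c from both."""
    return corr(residuals(a, c), residuals(b, c))

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000

def noise():
    return [rng.gauss(0, 1) for _ in range(n)]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

# (1) Chain of mediation: A -> C -> B
A = noise()
C = add(A, noise())
B = add(C, noise())
chain = (corr(A, B), partial_corr(A, B, C))

# (2) Fork of mutual dependence: A <- C -> B
C = noise()
A = add(C, noise())
B = add(C, noise())
fork = (corr(A, B), partial_corr(A, B, C))

# (3) Inverted fork of mutual causation: A -> C <- B
A = noise()
B = noise()
C = add(add(A, B), noise())
collider = (corr(A, B), partial_corr(A, B, C))

for name, (r, r_given_c) in [("chain", chain), ("fork", fork),
                             ("inverted fork", collider)]:
    print(f"{name:14s} corr(A,B) = {r:+.2f}   corr(A,B | C) = {r_given_c:+.2f}")
```

In the chain and the fork, the correlation between A and B should be clearly nonzero and should vanish after conditioning on C. In the inverted fork the pattern reverses: A and B start out uncorrelated and become correlated once C is conditioned on.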

Unconditional Correlation
• The Reichenbach Principle: Any two variables A and B are unconditionally
correlated if and only if either (a) A causes B, (b) B causes A, (c) a common cause C
causes both A and B, or (d) any combination of (a)-(c).

[Figure: four causal graphs over nodes A, B, and C, one for each of the cases (a)-(d)]

• Causes may be mediated.


• Correlation due to randomness is possible, but it will disappear as sample size increases.
• In sum, two variables are unconditionally correlated if and only if they share a driving factor.

Conditional Correlation
• Two variables are unconditionally independent, but are correlated
conditioning on C (or E), if
• (1) they are in the positions of A and B, or
• (2) they are in the positions of X and B, or
• (3) they are in the positions of X and Y.

[Figure: causal graph with nodes A, B, C, E, X, and Y]

Which of the following statement(s) is(are) wrong?
(A) If A and B are correlated, and B and C are correlated, then A and C must also be correlated.
(B) If A and C are correlated, and B and C are correlated, then A and B could be independent.
(C) If A and B are independent, and A and C are correlated, then B and C must be independent.
(D) If A and B are independent unconditionally and are correlated conditioning on C, then A and C
must be correlated unconditionally.

Causal Inference Strategy
• Conditioning on “back-door-blocking” variables
• Identify paths that correlate X and Y; block the path by conditioning on a variable on the path
• Condition on C, or
• Condition on A and B
[Figure: causal graph with nodes including V, U, C, D, A, X, Y, and Z]
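
To illustrate why conditioning on a back-door-blocking variable works, here is a minimal sketch on a hypothetical graph that is much simpler than the one on the slide: a single observed confounder C with C → X, C → Y, and X → Y. The coefficients and the linear form are assumptions for illustration. Conditioning is done by removing C linearly from both X and Y before regressing, which gives the same coefficient on X as a regression of Y on X and C.

```python
import random

def slope(y, x):
    """Slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def residuals(y, x):
    """Residuals of y after a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(y, x)
    return [v - my - b * (u - mx) for u, v in zip(x, y)]

rng = random.Random(1)
n = 20000
TRUE_EFFECT = 1.0  # assumed causal effect of X on Y

C = [rng.gauss(0, 1) for _ in range(n)]                 # confounder
X = [c + rng.gauss(0, 1) for c in C]                    # C -> X
Y = [TRUE_EFFECT * x + 2.0 * c + rng.gauss(0, 1)        # X -> Y and C -> Y
     for x, c in zip(X, C)]

# Back-door path X <- C -> Y left open: the slope mixes the causal effect
# with the correlation induced by C.
naive = slope(Y, X)

# Back-door path blocked by conditioning on C.
adjusted = slope(residuals(Y, C), residuals(X, C))

print(f"naive slope:    {naive:.2f}")
print(f"adjusted slope: {adjusted:.2f}  (assumed true effect {TRUE_EFFECT})")
```

The naive slope should land well above the assumed effect, while the adjusted slope should be close to it.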
Back-door-blocking Strategy: An Example
• You are running an online shopping website that focuses on fashion apparel. Your company normally
purchases a product from a supplier before the selling season starts. During the season, customers who
purchase the product can rate it on the website. When the selling season ends, any
leftovers are shipped back to the supplier for a partial refund. You want to use
historical data to estimate how the customer rating of a product influences its sales. You have
prepared data for the following variables:
• Y: the total sales of a product
• X: the average customer rating
• P: the price of a product
• S: the set of dummies that indicate the category of a product
(e.g., male versus female, season, and type)
• W: the beginning inventory level of a product
• C: the number of clicks associated with a product
• Draw the causal graph and find the BDB variable(s).

When Conditioning Does Not Work
• When we cannot block all back-door paths, the conditioning strategy does not work.
• We cannot use matching or simple regressions to estimate the effect of
• Education on income with family background unknown
• Screen size on online purchase behavior with individual income unknown
• Firm revenue on default risk with economic condition unknown
• A possible remedy:
• instrumental variable (IV) estimation
• Z causes X
• Z is related to Y only through X’s effect
• if Z and Y are correlated
• it must be that X causes Y

[Figure: causal graph with nodes V, U, C, D, B, A, X, Y, and Z]

Instrumental Variable
• Suppose 𝑌 = 𝛼 + 𝛿𝐷 + 𝛽𝑋 + 𝛾𝑈 + 𝜖 and the causal structure is given below.
Assume that 𝜖 ⊥ 𝐷, 𝑋 and that 𝐷 can take many values.
• E.g., Y is the total amount of smoking, D is years of education, U is family income, X is place of
origin, and Z is exam training.
• What can go wrong if we use the OLS method to estimate 𝛿? Omitted variable bias (OVB).
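
[Figure: causal graph with nodes Z, U, D, Y, and X, shown next to a plot of Y against D that contrasts the line $Y = \alpha + \beta X + \delta D + \epsilon^*$ with the fitted line $\hat{Y} = \hat{\alpha} + \hat{\beta} X + \hat{\delta} D$]
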
Instrumental Variable
• Alternatively, we can describe our causal system with two simultaneous equations:
• (1) 𝑌 = 𝛼 + 𝛿𝐷 + 𝛽𝑋 + 𝑢, where 𝑢 = 𝛾𝑈 + 𝜖.
• (2) 𝐷 = 𝑎 + 𝑏𝑍 + 𝑐𝑋 + 𝑣, where 𝑣 is correlated with 𝑢 through 𝑈.

• If we fit a regression model based on equation (2), then 𝑏 can be estimated without OVB. Let
$\hat{b}$ denote the estimate of 𝑏.
• If we substitute equation (2) into (1), we have
• (3) 𝑌 = (𝛼 + 𝛿𝑎) + 𝛿𝑏𝑍 + (𝛽 + 𝛿𝑐)𝑋 + (𝑢 + 𝛿𝑣) = 𝐴 + 𝐵𝑍 + 𝐶𝑋 + 𝜀

• Since 𝑍 is uncorrelated with 𝜀, equation (3) and 𝐵 can be estimated without OVB. Note that 𝐵 =
𝛿𝑏. Hence, 𝛿 can be consistently estimated by $\hat{B}/\hat{b}$ (in two steps).
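
The two-step ratio can be tried on simulated data. The sketch below is a simplified version of the system above: it drops 𝑋, and all coefficient values are assumptions chosen for illustration. The unobserved 𝑈 enters both equations, so OLS of 𝑌 on 𝐷 is biased, while the ratio of the two regressions on 𝑍 should recover 𝛿.

```python
import random

def slope(y, x):
    """Slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(2)
n = 20000
DELTA = 1.5  # assumed true causal effect of D on Y

Z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
U = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder

# Equation (2) without X: D = a + b*Z + v, where v contains U.
D = [1.0 + 0.8 * z + u + rng.gauss(0, 1) for z, u in zip(Z, U)]
# Equation (1) without X: Y = alpha + delta*D + gamma*U + eps.
Y = [2.0 + DELTA * d + 2.0 * u + rng.gauss(0, 1) for d, u in zip(D, U)]

ols_delta = slope(Y, D)      # suffers from omitted variable bias
b_hat = slope(D, Z)          # regression based on equation (2)
B_hat = slope(Y, Z)          # regression based on equation (3)
iv_delta = B_hat / b_hat     # delta estimated as B_hat / b_hat

print(f"OLS estimate of delta:   {ols_delta:.2f}")
print(f"b_hat = {b_hat:.2f}, B_hat = {B_hat:.2f}")
print(f"Ratio estimate of delta: {iv_delta:.2f}  (assumed true value {DELTA})")
```

Under these assumed coefficients the OLS slope should be noticeably too large, because 𝑈 pushes 𝐷 and 𝑌 in the same direction.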

Instrumental Variable
• Formally, for 𝑍 to be an instrumental variable for the causal relationship 𝐷 → 𝑌 , 𝑍 must satisfy
two conditions:

• (1) Relevance: 𝐶𝑜𝑣(𝐷, 𝑍) ≠ 0
• The instrument must be correlated with the explanatory variable 𝐷.

• (2) Exogeneity: 𝐶𝑜𝑣(𝜖′, 𝑍) = 0, where 𝜖′ = 𝑌 − 𝛿𝐷
• Here 𝛿𝐷 is the true relationship of how 𝑌 depends on 𝐷, and 𝜖′ collects the sources of variation of 𝑌 other than 𝐷.
• 𝑍 cannot be correlated with 𝑌 except through 𝐷’s effect on 𝑌.

• (1) can be tested by regressing 𝐷 on 𝑍; (2) cannot be directly tested because 𝛿 cannot be correctly
estimated without a valid IV.
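
The two conditions are exactly what is needed to identify 𝛿 in the simplest case. As a short derivation, assuming a single instrument and no included covariates (a simplification of the model on the previous slide), take the covariance of both sides of $Y = \delta D + \epsilon'$ with $Z$:

```latex
\begin{aligned}
\operatorname{Cov}(Y, Z) &= \delta\,\operatorname{Cov}(D, Z) + \operatorname{Cov}(\epsilon', Z) \\
                         &= \delta\,\operatorname{Cov}(D, Z)
                         && \text{(exogeneity: } \operatorname{Cov}(\epsilon', Z) = 0\text{)} \\
\Rightarrow\quad \delta  &= \frac{\operatorname{Cov}(Y, Z)}{\operatorname{Cov}(D, Z)}
                         && \text{(relevance: } \operatorname{Cov}(D, Z) \neq 0\text{)}
\end{aligned}
```

Relevance guarantees that the denominator is nonzero, and exogeneity removes the term that would otherwise bias the ratio.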

Instrumental Variable
[Figure: five causal graphs over nodes Z, U, D, and Y. Two are labeled "Z is a valid IV", two are labeled "Z is not a valid IV", and one asks "Is Z a valid IV?"]

Examples of I.V.
• Using day of the week of hospital admission as an instrument for the effect
of waiting time to surgery on length of stay and patient mortality. (Assume
fractures happen randomly. Many surgeons operate only on weekdays and,
therefore, patients admitted on weekends may wait longer.)

• Using the quarter of birth as an instrument for the effect of years of
schooling on subsequent earnings. (Students born in different months of the
year start school at different ages. Students who are born early in the
academic year are typically older when they enter school. If the fraction of
students who desire to leave school after they reach the legal dropout age is
constant across birthdays, those born at the beginning of the academic year
will have less schooling.)

Examples of I.V.
• Using favorable growing conditions (e.g., the amount of rainfall) as an
instrument for the effect of market price on demand of an agricultural
product. (Favorable growing conditions will affect supply of the product but
not the demand, and supply will affect the price.)

• Counter-example:
• Using tax rate for tobacco products as an instrument for the effect of
smoking on health. (The tax rate for tobacco products is determined mainly
by political and economic factors. Economic factors could be related to the
general health condition of the population. This IV is also invalid if the
government considered general health condition when determining the tax
rate.)
2SLS Estimation
• Suppose there is a single instrument.

• Step 1: Isolate the part of 𝐷 that is uncorrelated with 𝑈 by regressing 𝐷 on 𝑍.
𝐷 = 𝑎 + 𝑏𝑍 + 𝑣
• Then compute the predicted values of 𝐷:
$\hat{D} = \hat{a} + \hat{b}Z$
• Step 2: Replace 𝐷 by $\hat{D}$ in the regression of interest and regress 𝑌 on $\hat{D}$ using OLS.
$Y = \alpha' + \delta\hat{D} + \epsilon'$
• The two-stage least squares (2SLS) estimator is $\hat{\delta}_{2SLS}$.

2SLS Estimation
• With large samples,
$\hat{\delta}_{2SLS} \sim N\left(\delta, \sigma^2_{\hat{\delta}_{2SLS}}\right)$

• Important Notes:
• The OLS standard errors from the second-step regression are not correct because
they do not take into account the estimation error in the first step.
• To get correct standard errors, it is better to use a single command in statistical
software (e.g., R or Stata).
• In R, install ivpack first, and then check package AER.
• Although uncorrelated with 𝑈, 𝑋 is usually added to both stages of the regression in
order to reduce the estimation error.
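
The two steps and the standard-error issue can be made concrete with a small simulation. This is a sketch under assumed coefficients, with a single instrument and no 𝑋. The corrected standard error below uses the usual homoskedastic 2SLS formula, in which the residuals are computed with the observed 𝐷 rather than the predicted values; a dedicated IV command in statistical software reports this kind of corrected value.

```python
import random

def fit_line(y, x):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(x, y))
         / sum((u - mx) ** 2 for u in x))
    return my - b * mx, b

rng = random.Random(3)
n = 20000
DELTA = 1.5  # assumed true effect of D on Y

Z = [rng.gauss(0, 1) for _ in range(n)]
U = [rng.gauss(0, 1) for _ in range(n)]   # unobserved
D = [1.0 + 0.8 * z + u + rng.gauss(0, 1) for z, u in zip(Z, U)]
Y = [2.0 + DELTA * d + 2.0 * u + rng.gauss(0, 1) for d, u in zip(D, U)]

# Step 1: regress D on Z and compute predicted values.
a_hat, b_hat = fit_line(D, Z)
D_hat = [a_hat + b_hat * z for z in Z]

# Step 2: regress Y on the predicted values.
alpha_hat, delta_2sls = fit_line(Y, D_hat)

mean_d_hat = sum(D_hat) / n
ss_d_hat = sum((d - mean_d_hat) ** 2 for d in D_hat)

# Naive standard error: what a plain second-step OLS would report,
# with residuals based on D_hat.
naive_resid = [y - alpha_hat - delta_2sls * dh for y, dh in zip(Y, D_hat)]
naive_se = (sum(e * e for e in naive_resid) / (n - 2) / ss_d_hat) ** 0.5

# Corrected standard error: residuals based on the observed D.
resid = [y - alpha_hat - delta_2sls * d for y, d in zip(Y, D)]
correct_se = (sum(e * e for e in resid) / (n - 2) / ss_d_hat) ** 0.5

print(f"2SLS estimate: {delta_2sls:.3f}")
print(f"naive SE:      {naive_se:.4f}")
print(f"corrected SE:  {correct_se:.4f}")
```

The point estimate is the same either way; only the standard error differs. Whether the naive value is too large or too small depends on the data-generating process, so the direction seen in this simulation should not be generalized.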

General 2SLS Estimation
• There is a set of unobserved explanatory variables: 𝑈.
• There are several explanatory variables of interest: 𝐷1 , … , 𝐷𝑘 , 𝑋1 , … , 𝑋𝑞 ,
where 𝐷1 , … , 𝐷𝑘 are directly correlated with 𝑈 and 𝑋1 , … , 𝑋𝑞 are not
directly correlated with 𝑈.
𝑌 = 𝛽0 + 𝛽1 𝐷1 + ⋯ + 𝛽𝑘 𝐷𝑘 + 𝛾1 𝑋1 + ⋯ + 𝛾𝑞 𝑋𝑞 + 𝜖
• Suppose there is more than one instrument: 𝑍1 , … , 𝑍𝑚 , where we must
have 𝑚 ≥ 𝑘.
• If 𝑚 = 𝑘, we say 𝛽1 , … , 𝛽𝑘 are exactly identified.
• If 𝑚 > 𝑘, we say 𝛽1 , … , 𝛽𝑘 are over-identified.
• Very often, 𝐷1 , … , 𝐷𝑘 are called endogenous variables, 𝑋1 , … , 𝑋𝑞 are called
included exogenous variables, and 𝑍1 , … , 𝑍𝑚 are called excluded
exogenous variables.

General 2SLS Estimation
• Step 1:
• Regress 𝐷𝑖 on 𝑋1 , … , 𝑋𝑞 , 𝑍1 , … , 𝑍𝑚 by OLS for 𝑖 = 1, … , 𝑘.
• Calculate 𝑛 predicted values for each 𝐷𝑖 : $\hat{D}_{ij}$, where 𝑖 = 1, … , 𝑘 and 𝑗 = 1, … , 𝑛.
• Step 2:
• Regress 𝑌 on $\hat{D}_1, \dots, \hat{D}_k$, 𝑋1 , … , 𝑋𝑞 by OLS and obtain the 2SLS estimators for
𝛽1 , … , 𝛽𝑘 , 𝛾1 , … , 𝛾𝑞 .

• Note:
• We normally include 𝑋1 , … , 𝑋𝑞 in both steps to reduce the estimation error.
• Use a single command to get correct standard errors and perform hypothesis tests.
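
The general procedure can be sketched with one endogenous variable (𝑘 = 1), one included exogenous variable (𝑞 = 1), and two instruments (𝑚 = 2), which is an over-identified case. The data are simulated under assumed coefficients, and `ols` is a small hypothetical helper that solves the normal equations so that the example needs no external library. It produces point estimates only; standard errors should still come from a dedicated command.

```python
import random

def ols(columns, y):
    """Least-squares coefficients of y on the given regressor columns."""
    k = len(columns)
    # Augmented normal equations [X'X | X'y].
    M = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(columns[i], y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        pivot = M[i][i]
        M[i] = [v / pivot for v in M[i]]
        for r in range(k):
            if r != i:
                factor = M[r][i]
                M[r] = [vr - factor * vi for vr, vi in zip(M[r], M[i])]
    return [M[i][k] for i in range(k)]

def fitted(columns, coef):
    """Predicted values given regressor columns and coefficients."""
    return [sum(c * col[j] for c, col in zip(coef, columns))
            for j in range(len(columns[0]))]

rng = random.Random(4)
n = 10000
BETA, GAMMA = 1.5, 1.0  # assumed true coefficients on D and X

ones = [1.0] * n
X = [rng.gauss(0, 1) for _ in range(n)]    # included exogenous variable
Z1 = [rng.gauss(0, 1) for _ in range(n)]   # excluded exogenous variable
Z2 = [rng.gauss(0, 1) for _ in range(n)]   # excluded exogenous variable
U = [rng.gauss(0, 1) for _ in range(n)]    # unobserved
D = [1.0 + 0.7 * z1 + 0.5 * z2 + 0.6 * x + u + rng.gauss(0, 1)
     for z1, z2, x, u in zip(Z1, Z2, X, U)]
Y = [2.0 + BETA * d + GAMMA * x + 2.0 * u + rng.gauss(0, 1)
     for d, x, u in zip(D, X, U)]

# Step 1: regress D on X, Z1, Z2 and compute predicted values.
stage1_cols = [ones, X, Z1, Z2]
D_hat = fitted(stage1_cols, ols(stage1_cols, D))

# Step 2: regress Y on D_hat and X.
const_hat, beta_hat, gamma_hat = ols([ones, D_hat, X], Y)

# Plain OLS of Y on D and X, for comparison.
_, beta_ols, _ = ols([ones, D, X], Y)

print(f"2SLS: beta = {beta_hat:.2f}, gamma = {gamma_hat:.2f}")
print(f"OLS:  beta = {beta_ols:.2f}  (assumed true beta {BETA})")
```

Note that 𝑋 appears in both steps, as the slide recommends.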

Example: Housing Value and Rent
• Data from 1980 census on the median dollar value of owner-occupied housing (hsngval) and the
median monthly gross rent (rent) for each state in US (50 observations).
• We want to study how rent is affected by hsngval.
Variable  Description                Mean       Std. Dev.  Min      Max
pop       Population in 1980         4,518,149  4,715,038  401,851  2.37e+07
popgrow   Pop. growth 1970-80        16.29      14.39      -3.6     63.8
popden    Pop./sq. miles             1,543.72   2,213.894  7        9,862
pcturban  Percentage urban           66.95      14.41      33.77    91.29
faminc    Median family income 1979  19,499.92  2,617.22   14,591   28,395
hsng      Housing units in 1980      1,762,686  1,847,308  162,825  9,279,036
hsnggrow  Percentage housing growth  35.65      19.30      9.02     97.01
hsngval   Median housing value       48,484     15,770.24  31,100   119,400
rent      Median gross rent          234.76     35.35      180      368
Example: Housing Value and Rent
• Both housing value and rent are determined by demand and supply factors,
and they share many of these factors.
• Supply side: hsng, government policy (unobserved)
• Demand side: pop, faminc, government policy (unobserved)

• Urban population (pcturban) is an indirect measure of economic status and
can affect both the housing and the rental markets.

• Unlike rent, housing value also depends on future
supply and demand as well as on land value. Hence, it is reasonable to
believe that popgrow, hsnggrow, and popden affect only hsngval directly,
and they can serve as instruments for the estimation.

Example: Housing Value and Rent
• The conceptual model

[Figure: conceptual causal graph with nodes faminc, pop, hsng, pcturban, popgrow, hsnggrow, popden, hsngval, rent, and U]

Example: Housing Value and Rent
          pop       popgrow   popden    pcturban  faminc    hsng      hsnggrow  hsngval   rent
pop       1.0000
popgrow   -0.2311   1.0000
popden    0.2174    -0.4449   1.0000
pcturban  0.4557    0.1499    0.4789    1.0000
faminc    0.1711    0.0111    0.3127    0.5790    1.0000
hsng      0.9986    -0.2110   0.2077    0.4588    0.1637    1.0000
hsnggrow  -0.2678   0.9661    -0.4140   0.1707    0.1135    -0.2474   1.0000
hsngval   0.0619    0.3350    0.1381    0.5631    0.6773    0.0621    0.3948    1.0000
rent      0.1456    0.3834    0.1824    0.5958    0.8455    0.1518    0.4823    0.7987    1.0000

• To avoid the multicollinearity problem, we shouldn’t include popgrow and
hsnggrow at the same time. To better capture the demand-supply tension, we
use the population-housing growth rate ratio: phgrow = popgrow/hsnggrow.
• Similarly, to avoid the strong correlation between pop and hsng, we introduce the
population-housing ratio: phratio = pop/hsng.

Example: Housing Value and Rent
• We estimate the following model:
• Outcome variable: 𝑌 = 𝑟𝑒𝑛𝑡
• Endogenous explanatory variable: 𝐷 = ℎ𝑠𝑛𝑔𝑣𝑎𝑙
• Included exogenous variables:
• 𝑋1 = 𝑝𝑐𝑡𝑢𝑟𝑏𝑎𝑛
• 𝑋2 = 𝑝ℎ𝑟𝑎𝑡𝑖𝑜
• 𝑋3 = 𝑓𝑎𝑚𝑖𝑛𝑐
• Excluded exogenous variables (i.e., instrumental variables):
• 𝑍1 = 𝑝ℎ𝑔𝑟𝑜𝑤
• 𝑍2 = 𝑝𝑜𝑝𝑑𝑒𝑛

Example: Housing Value and Rent
• First stage result: [regression output shown as an image on the original slide]

Example: Housing Value and Rent
• Housing value affects rent significantly. A one-thousand-dollar increase
in housing value causes rent to increase by 1.73 dollars.

• Population-housing ratio is significantly and negatively correlated with
rent. This is unintuitive, but we shouldn’t interpret the coefficient in a
causal way. (Families versus singles?)

• Population density is a weak instrument and thus may not be effective.

Tests
• Weak instruments: This is an F-test on the instruments in the first stage. The
null hypothesis is essentially that we have weak instruments, so a rejection
means our instruments are not weak.
• Wu-Hausman: This tests the consistency of the OLS estimates under the
assumption that the IV is consistent. When we reject, it means OLS is not
consistent, suggesting endogeneity is present. Otherwise, it essentially means
that the OLS and IV estimates are similar, and endogeneity may not have
been a big problem.
• Sargan: This is a test of instrument exogeneity using overidentifying
restrictions, called the J-statistic. It can only be used if you have more
instruments than endogenous regressors. If the null is rejected, it means that
at least one of our instruments is invalid, and possibly all of them. Otherwise,
we have no evidence against the validity of the instruments.
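
The weak-instruments test is the easiest of the three to reproduce by hand. The sketch below covers only that test, for a single instrument with no included covariates, in which case the first-stage F statistic equals the squared t statistic of the instrument. The data and coefficients are assumptions for illustration, and the threshold of 10 mentioned in the comments is a commonly cited rule of thumb rather than something stated on the slide.

```python
import random

def first_stage_f(d, z):
    """F statistic for H0: b = 0 in the first stage D = a + b*Z + v."""
    n = len(z)
    mz, md = sum(z) / n, sum(d) / n
    szz = sum((u - mz) ** 2 for u in z)
    b = sum((u - mz) * (v - md) for u, v in zip(z, d)) / szz
    rss = sum((v - md - b * (u - mz)) ** 2 for u, v in zip(z, d))
    var_b = rss / (n - 2) / szz      # estimated variance of b_hat
    return b * b / var_b             # single restriction, so F = t^2

rng = random.Random(5)
n = 2000
Z = [rng.gauss(0, 1) for _ in range(n)]
U = [rng.gauss(0, 1) for _ in range(n)]

# Strong instrument: Z moves D substantially (assumed coefficient 0.8).
D_strong = [0.8 * z + u + rng.gauss(0, 1) for z, u in zip(Z, U)]
# Weak instrument: Z barely moves D (assumed coefficient 0.02).
D_weak = [0.02 * z + u + rng.gauss(0, 1) for z, u in zip(Z, U)]

F_strong = first_stage_f(D_strong, Z)
F_weak = first_stage_f(D_weak, Z)

# A common rule of thumb treats a first-stage F below about 10 as a
# warning sign of weak instruments.
print(f"first-stage F, strong instrument: {F_strong:.1f}")
print(f"first-stage F, weak instrument:   {F_weak:.1f}")
```

With several instruments or included covariates, the F statistic is a joint test on the excluded instruments and is best taken from the software output.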

Pitfalls of I.V. Estimation
• The exogeneity assumption of IV is hard to defend (especially in the
single IV case).
• IV estimators are consistent in large samples, but are biased in finite
samples. Moreover, the bias can be substantial when instruments
are weak.
• By using only a portion of the covariation in the causal variable and
the outcome variable, IV estimators use only a portion of the
information in the data. This represents a loss in statistical power.
• Maybe only a subpopulation will react to the instruments. The estimates
will then represent local average causal effects. (The quarter of birth in
the dropout example.)
In-Class Exercise
• Conditioning on B, can A and C be correlated?
• Conditioning on X, is A independent of Y?
• If X is a binary cause, then the Naïve Estimator of the unconditional average treatment effect is
biased if we only condition on C. Correct?

[Figure: causal graph with nodes including U, C, D, A, X, Y, and Z]

In-Class Exercise
• In the following causal system, which variable is a valid instrument for
the aggregate causal effect of 𝐷 on 𝑌?

[Figure: causal graph with nodes Z, U, A, X, Y, B, C, D, and V]

In-Class Exercise
• We want to investigate how Manning’s staffing level affects store sales using
OLS. What factors must be included in the regression? If the back-door path
cannot be completely blocked, what can be a valid instrumental variable?

• If we want to estimate the causal effect of skipping classes on final exam
score, what factors should be included in the regression? What can be a
valid instrumental variable?

• Consider estimating the effect of PC ownership on college GPA for senior
students in a large university. What factors should be included in the
regression? What can be a valid instrumental variable?
