09 Causal Inference II: MSBA7003 Quantitative Analysis Methods
ZHANG, Wei
Assistant Professor
HKU Business School
Agenda
• Causal Graphs
• Back-door-blocking Strategy
• Instrumental Variable
Causal Graphs
• How can we make sure that the control variables allow a perfect stratification?
• We need causal graphs to conceptualize our problem.
• The goal is to represent explicitly all causes of the outcome.
Causal Graphs
[Causal graph: nodes U, C, D, A, X, Y]
Transitivity of Causal Relations
• For any three variables 𝐴, 𝐵, and 𝐶, if 𝐴 causes 𝐵 and 𝐵 causes 𝐶, then 𝐴 causes 𝐶.
• This is an assumption.
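• Counter-example: 𝐴 = playing computer games; 𝐵 = academic performance; 𝐶 = brain capability; 𝐷 = study time. Playing games may improve brain capability, which raises academic performance (𝐴 → 𝐶 → 𝐵), but it also reduces study time, which lowers academic performance (𝐴 → 𝐷 → 𝐵). When the two mechanisms cancel out, 𝐴 has no net effect on 𝐵 even though 𝐴 causes 𝐶 and 𝐶 causes 𝐵.
[Figure: causal graph over 𝐴, 𝐵, 𝐶, 𝐷]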
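A minimal numerical sketch of the counter-example, assuming a hypothetical linear model in which the two mechanisms have exactly offsetting strengths (all coefficients are illustrative assumptions, not estimates). 𝐴 is clearly correlated with 𝐶 and 𝐶 with 𝐵, yet 𝐴 and 𝐵 are essentially uncorrelated.

```python
import numpy as np

# Hypothetical linear version of the counter-example; coefficients are illustrative.
rng = np.random.default_rng(0)
n = 100_000

A = rng.normal(size=n)                      # playing computer games
C = 0.5 * A + rng.normal(size=n)            # brain capability: A -> C
D = -0.5 * A + rng.normal(size=n)           # study time: A -> D (negative effect)
B = 1.0 * C + 1.0 * D + rng.normal(size=n)  # academic performance: C -> B, D -> B


def corr(u, v):
    """Sample Pearson correlation between two 1-D arrays."""
    return np.corrcoef(u, v)[0, 1]


r_AC, r_CB, r_AB = corr(A, C), corr(C, B), corr(A, B)
print(f"corr(A, C) = {r_AC:.3f}")  # clearly positive
print(f"corr(C, B) = {r_CB:.3f}")  # clearly positive
print(f"corr(A, B) = {r_AB:.3f}")  # near zero: the two paths cancel
```

In this model 𝐵 = (0.5𝐴 + noise) + (-0.5𝐴 + noise) + noise, so 𝐴 drops out of 𝐵 entirely; any other choice of coefficients that does not cancel exactly would leave a net effect.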
Causal Structures
• There are three basic patterns of causal relationship among three variables that are structurally related to each other.
• (1) A chain of mediation:
• Unconditional dependence: P(A, B) ≠ P(A)P(B)
• Conditional independence: P(A, B | C) = P(A | C)P(B | C)
[Figure: A → C → B]
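In a linear-Gaussian setting, conditional independence given C shows up as a zero partial correlation. The sketch below assumes a hypothetical linear chain A → C → B (coefficients are illustrative) and uses "regress out C" as a stand-in for conditioning; for non-linear or non-Gaussian data, partial correlation is only a rough proxy for conditional independence.

```python
import numpy as np

# Hypothetical linear chain A -> C -> B; coefficients are illustrative.
rng = np.random.default_rng(1)
n = 100_000
A = rng.normal(size=n)
C = 0.8 * A + rng.normal(size=n)   # C mediates the effect of A on B
B = 0.8 * C + rng.normal(size=n)


def residualize(v, w):
    """Residuals from an OLS regression of v on a constant and w."""
    W = np.column_stack([np.ones_like(w), w])
    coef, *_ = np.linalg.lstsq(W, v, rcond=None)
    return v - W @ coef


# Unconditional dependence: A and B are correlated.
r_uncond = np.corrcoef(A, B)[0, 1]
# Conditional independence (linear proxy): partial correlation of A and B given C.
r_partial = np.corrcoef(residualize(A, C), residualize(B, C))[0, 1]
print(f"corr(A, B)     = {r_uncond:.3f}")
print(f"corr(A, B | C) = {r_partial:.3f}")
```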
Unconditional Correlation
• The Reichenbach Principle: Any two variables A and B are unconditionally
correlated if and only if either (a) A causes B, (b) B causes A, (c) a common cause C
causes both A and B, or (d) any combination of (a)-(c).
[Figure: (a) A → B; (b) B → A; (c) A ← C → B; (d) a combination of (a)-(c) involving A, B, and C]
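A sketch of why a correlation by itself cannot tell cases (a)-(c) apart: three hypothetical linear models, one per case, are tuned (by assumption) to produce the same correlation of 0.6 between A and B, even though the causal structures differ.

```python
import numpy as np

# Three hypothetical data-generating processes tuned to the same correlation rho.
rng = np.random.default_rng(2)
n = 100_000
rho = 0.6
s = np.sqrt(1 - rho**2)

# (a) A causes B
A_a = rng.normal(size=n)
B_a = rho * A_a + s * rng.normal(size=n)

# (b) B causes A
B_b = rng.normal(size=n)
A_b = rho * B_b + s * rng.normal(size=n)

# (c) a common cause C drives both; neither A nor B causes the other
C = rng.normal(size=n)
A_c = np.sqrt(rho) * C + np.sqrt(1 - rho) * rng.normal(size=n)
B_c = np.sqrt(rho) * C + np.sqrt(1 - rho) * rng.normal(size=n)

pairs = {"a": (A_a, B_a), "b": (A_b, B_b), "c": (A_c, B_c)}
r = {case: np.corrcoef(x, y)[0, 1] for case, (x, y) in pairs.items()}
for case, value in r.items():
    print(f"case ({case}): corr(A, B) = {value:.3f}")
```

All three print a correlation close to 0.6, which is why the causal graph, not the correlation, has to carry the structural assumptions.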
Conditional Correlation
• Two variables are unconditionally independent, but are correlated conditioning on C (or E), if
• (1) they are in the positions of A and B, or
• (2) they are in the positions of X and B, or
• (3) they are in the positions of X and Y.
[Causal graph: nodes A, B, C, X, Y, E]
Which of the following statement(s) is (are) wrong?
(A) If A and B are correlated, and B and C are correlated, then A and C must also be correlated.
(B) If A and C are correlated, and B and C are correlated, then A and B could be independent.
(C) If A and B are independent, and A and C are correlated, then B and C must be independent.
(D) If A and B are independent unconditionally and are correlated conditioning on C, then A and C must be correlated unconditionally.
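A sketch of the three cases, assuming (because the figure did not survive extraction) a hypothetical linear graph in which C is a common effect of A and B, E is a descendant of C, X → A, and Y → B. Each pair is unconditionally independent, but becomes correlated once we condition on C or E (here approximated by partial correlation, which is exact only in the linear-Gaussian case).

```python
import numpy as np

# Hypothetical linear version of the graph; edge directions are assumptions.
rng = np.random.default_rng(3)
n = 100_000
X = rng.normal(size=n)
Y = rng.normal(size=n)
A = X + rng.normal(size=n)          # X -> A
B = Y + rng.normal(size=n)          # Y -> B
C = A + B + rng.normal(size=n)      # collider: A -> C <- B
E = C + rng.normal(size=n)          # descendant of the collider: C -> E


def partial_corr(u, v, w):
    """Correlation of u and v after regressing each on a constant and w."""
    W = np.column_stack([np.ones_like(w), w])
    ru = u - W @ np.linalg.lstsq(W, u, rcond=None)[0]
    rv = v - W @ np.linalg.lstsq(W, v, rcond=None)[0]
    return np.corrcoef(ru, rv)[0, 1]


cases = {
    "A, B given C": (A, B, C),
    "X, B given C": (X, B, C),
    "X, Y given C": (X, Y, C),
    "A, B given E": (A, B, E),
}
uncond = {k: np.corrcoef(u, v)[0, 1] for k, (u, v, _) in cases.items()}
cond = {k: partial_corr(u, v, w) for k, (u, v, w) in cases.items()}
for k in cases:
    print(f"{k}: unconditional = {uncond[k]:+.3f}, conditional = {cond[k]:+.3f}")
```

With positive effects into the collider, conditioning induces a negative correlation: among units with the same C, a higher A must be offset by a lower B.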
Causal Inference Strategy
• Conditioning on “back-door-blocking” variables
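• Identify the back-door paths that correlate X and Y (paths that start with an arrow pointing into X); block each path by conditioning on a variable on the path that is not a collider
• Condition on C, or
• Condition on A and B
[Causal graph: nodes V, U, C, D, A, X, Y, Z]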
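A minimal sketch of back-door blocking, assuming a simplified hypothetical graph with a single back-door path X ← C → Y (not the full graph above) and an assumed effect of X on Y equal to 2. The naive regression of Y on X absorbs the path through C; adding C to the regression blocks it.

```python
import numpy as np

# Hypothetical graph: C -> X, C -> Y, X -> Y; coefficients are illustrative.
rng = np.random.default_rng(4)
n = 100_000
C = rng.normal(size=n)                          # back-door variable
X = 1.0 * C + rng.normal(size=n)                # C -> X
Y = 2.0 * X + 3.0 * C + rng.normal(size=n)      # X -> Y (effect 2), C -> Y


def ols(y, *cols):
    """OLS coefficients of y on a constant and the given columns."""
    M = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(M, y, rcond=None)[0]


naive = ols(Y, X)[1]          # back-door path X <- C -> Y left open
adjusted = ols(Y, X, C)[1]    # conditioning on C blocks the path
print(f"naive slope    = {naive:.3f}")     # about 2 + 3 * Cov(X, C) / Var(X) = 3.5
print(f"adjusted slope = {adjusted:.3f}")  # about 2
```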
Back-door-blocking Strategy: An Example
• You are running an online shopping website that focuses on fashion apparel. Your company normally purchases a product from a supplier before the selling season starts. During the season, customers who purchase the product can rate it on the website. When the selling season ends, any leftovers are shipped back to the supplier, and a partial refund is provided. You want to use the historical data to estimate how the customer rating of a product influences its sales. You have prepared data for the following variables:
• Y: the total sales of a product
• X: the average customer rating
• P: the price of a product
• S: the set of dummies that indicate the category of a product
(e.g., male versus female, season, and type)
• W: the beginning inventory level of a product
• C: the number of clicks associated with a product
• Draw the causal graph and find the back-door-blocking (BDB) variable(s).
When Conditioning Does Not Work
• When we cannot block all back-door paths, the conditioning strategy does not work.
• We cannot use matching or simple regressions to estimate the effect of
• education on income with family background unknown
• screen size on online purchase behavior with individual income unknown
• firm revenue on default risk with economic condition unknown
• A possible remedy: instrumental variable (IV) estimation
• Z causes X
• Z is related to Y only through X's effect
• if Z and Y are correlated, it must be that X causes Y
[Causal graph: nodes V, U, C, D, B, A, X, Y, Z]
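A minimal sketch of the IV idea, assuming a hypothetical linear model in which an unobserved U confounds X and Y, and Z affects Y only through X. Because Z is unrelated to U, the ratio Cov(Z, Y) / Cov(Z, X) recovers the assumed causal effect of 1.5, while the OLS slope absorbs the confounding.

```python
import numpy as np

# Hypothetical model: U -> X, U -> Y, Z -> X, X -> Y; coefficients are illustrative.
rng = np.random.default_rng(5)
n = 200_000
U = rng.normal(size=n)            # unobserved confounder
Z = rng.normal(size=n)            # instrument: affects X, no direct path to Y
X = 0.8 * Z + 1.0 * U + rng.normal(size=n)
Y = 1.5 * X + 2.0 * U + rng.normal(size=n)

ols_slope = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)   # biased by U
iv_ratio = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]   # uses only Z-driven variation
print(f"OLS slope = {ols_slope:.3f}")  # about 1.5 + 2 * 1 / 2.64
print(f"IV ratio  = {iv_ratio:.3f}")   # about 1.5
```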
Instrumental Variable
• Suppose 𝑌 = 𝛼 + 𝛿𝐷 + 𝛽𝑋 + 𝛾𝑈 + 𝜖 and the causal structure is given below.
Assume that 𝜖 ⊥ 𝐷, 𝑋 and that 𝐷 can take many values.
• E.g., Y is the total amount of smoking, D is years of education, U is family income, X is place of
origin, and Z is exam training.
• What can go wrong if we use OLS to estimate 𝛿? Omitted variable bias (OVB).
[Causal graph: nodes Z, U, D, X, Y]
[Plot: Y against D, comparing the true line $Y = \alpha + \beta X + \delta D + \epsilon^*$ with the fitted OLS line $\hat{Y} = \hat{\alpha} + \hat{\beta} X + \hat{\delta} D$]
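For reference, a standard large-sample omitted-variable-bias formula, stated under the slide's linear model with 𝑈 left out of the regression of 𝑌 on 𝐷 and 𝑋 (derived via Frisch-Waugh-Lovell). The bias vanishes only if 𝛾 = 0 or if 𝐷 is uncorrelated with 𝑈 after accounting for 𝑋.

```latex
% Model: Y = \alpha + \delta D + \beta X + \gamma U + \epsilon, with \epsilon \perp D, X.
% \tilde{D} is the residual from regressing D on a constant and X.
\operatorname{plim} \hat{\delta}_{OLS}
  = \delta + \gamma \,
    \frac{\operatorname{Cov}(\tilde{D}, U)}{\operatorname{Var}(\tilde{D})}
```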
Instrumental Variable
• Alternatively, we can describe our causal system with two simultaneous equations:
• (1) 𝑌 = 𝛼 + 𝛿𝐷 + 𝛽𝑋 + 𝑢, where 𝑢 = 𝛾𝑈 + 𝜖.
• (2) 𝐷 = 𝑎 + 𝑏𝑍 + 𝑐𝑋 + 𝑣, where 𝑣 is correlated with 𝑢 by 𝑈.
• If we develop a regression model based on equation (2), then 𝑏 can be estimated without OVB. Let $\hat{b}$ denote the estimate of 𝑏.
• If we substitute equation (2) into (1), we have
• (3) $Y = (\alpha + \delta a) + \delta b Z + (\beta + \delta c) X + (u + \delta v) = A + BZ + CX + \varepsilon$
• Since 𝑍 and 𝑋 are uncorrelated with 𝜀, equation (3) can also be estimated without OVB; let $\hat{B}$ denote the estimate of 𝐵. Note that 𝐵 = 𝛿𝑏. Hence, 𝛿 can be consistently estimated by $\hat{B}/\hat{b}$ (in two steps).
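A minimal sketch of the two-step logic, assuming hypothetical parameter values and simulated data that follow equations (1) and (2), with 𝑍 and 𝑋 independent of 𝑈. The estimate b̂ comes from equation (2), B̂ from equation (3), and their ratio recovers 𝛿; OLS on equation (1) without 𝑈 does not.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
delta, beta, gamma = 2.0, 1.0, 1.5   # hypothetical structural parameters
a, b, c = 0.5, 0.7, 0.5

X = rng.normal(size=n)
Z = rng.normal(size=n)
U = rng.normal(size=n)               # unobserved: enters both v and u
D = a + b * Z + c * X + U + rng.normal(size=n)                   # equation (2)
Y = 1.0 + delta * D + beta * X + gamma * U + rng.normal(size=n)  # equation (1)


def ols(y, *cols):
    """OLS coefficients of y on a constant and the given columns."""
    M = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(M, y, rcond=None)[0]


b_hat = ols(D, Z, X)[1]        # coefficient on Z in equation (2)
B_hat = ols(Y, Z, X)[1]        # coefficient on Z in equation (3)
delta_two_step = B_hat / b_hat
delta_ols = ols(Y, D, X)[1]    # suffers from OVB because U is omitted
print(f"two-step estimate = {delta_two_step:.3f}")  # about 2
print(f"OLS estimate      = {delta_ols:.3f}")       # about 2 + 1.5 / 2.49
```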
Instrumental Variable
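• Formally, for 𝑍 to be an instrumental variable for the causal relationship 𝐷 → 𝑌, 𝑍 must satisfy two conditions:
• (1) Relevance: 𝑍 is correlated with 𝐷.
• (2) Exclusion (exogeneity): 𝑍 is uncorrelated with the error term 𝑢, i.e., 𝑍 affects 𝑌 only through 𝐷.
• (1) can be tested by regressing 𝐷 on 𝑍; (2) cannot be directly tested because 𝛿 cannot be correctly estimated without a valid IV.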
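Condition (1) can be checked as described. A minimal sketch, assuming simulated data and a hypothetical helper `first_stage_F`: it computes the F-statistic for the instrument's coefficient in a regression of 𝐷 on 𝑍 and 𝑋. A common rule of thumb (Staiger and Stock) treats a first-stage F below about 10 as a sign of a weak instrument. Nothing in this check speaks to condition (2).

```python
import numpy as np


def rss(y, M):
    """Residual sum of squares from an OLS fit of y on the columns of M."""
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    resid = y - M @ coef
    return resid @ resid


def first_stage_F(D, Z, X):
    """F-statistic for H0: the coefficient on Z is zero in D ~ 1 + Z + X."""
    n = len(D)
    ones = np.ones(n)
    full = np.column_stack([ones, Z, X])
    restricted = np.column_stack([ones, X])
    q = 1                                  # one restriction (one instrument)
    df = n - full.shape[1]
    return ((rss(D, restricted) - rss(D, full)) / q) / (rss(D, full) / df)


rng = np.random.default_rng(8)
n = 1_000
X = rng.normal(size=n)
U = rng.normal(size=n)
Z_relevant = rng.normal(size=n)
Z_irrelevant = rng.normal(size=n)          # hypothetical instrument unrelated to D
D = 0.5 * Z_relevant + 0.5 * X + U + rng.normal(size=n)

F_relevant = first_stage_F(D, Z_relevant, X)
F_irrelevant = first_stage_F(D, Z_irrelevant, X)
print(f"F (relevant Z)   = {F_relevant:.1f}")
print(f"F (irrelevant Z) = {F_irrelevant:.2f}")
```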
Instrumental Variable
[Figure: five causal graphs over Z, U, D, Y. Top row: two graphs in which Z is a valid IV. Middle: a graph labeled "Is Z a valid IV?". Bottom row: two graphs in which Z is not a valid IV.]
Examples of I.V.
• Using day of the week of hospital admission as an instrument for the effect
of waiting time to surgery on length of stay and patient mortality. (Assume
fractures happen randomly. Many surgeons operate only on weekdays and,
therefore, patients admitted on weekends may wait longer.)
Examples of I.V.
• Using favorable growing conditions (e.g., the amount of rainfall) as an
instrument for the effect of market price on demand of an agricultural
product. (Favorable growing conditions will affect supply of the product but
not the demand, and supply will affect the price.)
• Counter-example:
• Using tax rate for tobacco products as an instrument for the effect of
smoking on health. (The tax rate for tobacco products is determined mainly
by political and economic factors. Economic factors could be related to the
general health condition of the population. The IV would also be invalid if the government considered the general health condition when determining the tax rate.)
2SLS Estimation
• Suppose there is a single instrument.
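For a single instrument 𝑍, a single endogenous 𝐷, and (as a simplifying assumption) no included exogenous 𝑋, the two steps collapse to a ratio of covariances. A sketch of the derivation, using the first-stage slope b̂ = Cov(𝑍, 𝐷)/Var(𝑍); the result also equals the reduced-form slope divided by the first-stage slope, B̂/b̂.

```latex
% First stage: \hat{D}_i = \hat{a} + \hat{b} Z_i,
% with \hat{b} = \widehat{\mathrm{Cov}}(Z, D) / \widehat{\mathrm{Var}}(Z).
% Second stage: regress Y on \hat{D}.
\hat{\delta}_{2SLS}
  = \frac{\widehat{\mathrm{Cov}}(\hat{D}, Y)}{\widehat{\mathrm{Var}}(\hat{D})}
  = \frac{\hat{b}\,\widehat{\mathrm{Cov}}(Z, Y)}{\hat{b}^{2}\,\widehat{\mathrm{Var}}(Z)}
  = \frac{\widehat{\mathrm{Cov}}(Z, Y)}{\widehat{\mathrm{Cov}}(Z, D)}
  = \frac{\sum_{i=1}^{n} (Z_i - \bar{Z})(Y_i - \bar{Y})}
         {\sum_{i=1}^{n} (Z_i - \bar{Z})(D_i - \bar{D})}
```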
2SLS Estimation
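• With large samples, $\hat{\delta}_{2SLS}$ is approximately normally distributed:
$\hat{\delta}_{2SLS} \sim N\left(\delta, \sigma^2_{\hat{\delta}_{2SLS}}\right)$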
• Important Notes:
• The OLS standard errors from the second step regression are not correct because
they do not take into account estimation errors in the first step.
• To get correct standard errors, we should use a single command in statistical software (e.g., R or Stata).
• In R, install ivpack first, and then check package AER.
• Although uncorrelated with 𝑈, 𝑋 is usually added to both stages of the regression in order to improve precision (i.e., to reduce the standard errors).
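A sketch of the standard-error issue, assuming simulated data that follow equations (1) and (2) with hypothetical parameter values and homoskedastic errors. Running the two steps by hand gives the correct 2SLS point estimate, but the second-step residuals are built from D̂ instead of 𝐷, so the reported standard errors are wrong. Recomputing the residuals with the actual 𝐷 (same coefficients, same D̂-based design matrix) gives the conventional homoskedastic 2SLS standard errors.

```python
import numpy as np

# Hypothetical simulated data following equations (1)-(2); values are illustrative.
rng = np.random.default_rng(11)
n = 5_000
delta, beta, gamma = 2.0, 1.0, 1.5
X = rng.normal(size=n)
Z = rng.normal(size=n)
U = rng.normal(size=n)                                       # unobserved confounder
D = 0.5 + 0.7 * Z + 0.5 * X + U + rng.normal(size=n)         # first-stage equation
Y = 1.0 + delta * D + beta * X + gamma * U + rng.normal(size=n)

ones = np.ones(n)

# Step 1: regress D on a constant, Z and X (X is kept in both stages).
W1 = np.column_stack([ones, Z, X])
D_hat = W1 @ np.linalg.lstsq(W1, D, rcond=None)[0]

# Step 2: regress Y on a constant, D_hat and X.
W2_hat = np.column_stack([ones, D_hat, X])
coef = np.linalg.lstsq(W2_hat, Y, rcond=None)[0]
bread = np.linalg.inv(W2_hat.T @ W2_hat)
dof = n - W2_hat.shape[1]

# Naive SEs: residuals built from D_hat, as a plain second-step OLS would report.
resid_naive = Y - W2_hat @ coef
se_naive = np.sqrt(resid_naive @ resid_naive / dof * np.diag(bread))

# Homoskedastic 2SLS SEs: same coefficients, residuals built from the actual D.
W2 = np.column_stack([ones, D, X])
resid_2sls = Y - W2 @ coef
se_2sls = np.sqrt(resid_2sls @ resid_2sls / dof * np.diag(bread))

print(f"delta_hat = {coef[1]:.3f}")
print(f"naive SE = {se_naive[1]:.4f}, 2SLS SE = {se_2sls[1]:.4f}")
```

In this simulation the naive standard error is inflated, but the direction of the error depends on the data; a dedicated IV command handles this (and heteroskedasticity-robust variants) automatically.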
General 2SLS Estimation
• There is a set of unobserved explanatory variables: $U$.
• There are several explanatory variables of interest: $D_1, \ldots, D_k, X_1, \ldots, X_q$, where $D_1, \ldots, D_k$ are directly correlated with $U$ and $X_1, \ldots, X_q$ are not directly correlated with $U$.
$Y = \beta_0 + \beta_1 D_1 + \cdots + \beta_k D_k + \gamma_1 X_1 + \cdots + \gamma_q X_q + \epsilon$
• Suppose there are several instruments, $Z_1, \ldots, Z_m$, where we must have $m \geq k$.
• If $m = k$, we say $\beta_1, \ldots, \beta_k$ are exactly identified.
• If $m > k$, we say $\beta_1, \ldots, \beta_k$ are over-identified.
• Very often, $D_1, \ldots, D_k$ are called endogenous variables, $X_1, \ldots, X_q$ are called included exogenous variables, and $Z_1, \ldots, Z_m$ are called excluded exogenous variables.
General 2SLS Estimation
• Step 1:
• Regress $D_i$ on $X_1, \ldots, X_q, Z_1, \ldots, Z_m$ by OLS for $i = 1, \ldots, k$.
• Calculate $n$ predicted values for each $D_i$: $\hat{D}_{ij}$, where $i = 1, \ldots, k$ and $j = 1, \ldots, n$.
• Step 2:
• Regress $Y$ on $\hat{D}_1, \ldots, \hat{D}_k, X_1, \ldots, X_q$ by OLS and obtain the 2SLS estimators for $\beta_1, \ldots, \beta_k, \gamma_1, \ldots, \gamma_q$.
• Note:
• We normally include $X_1, \ldots, X_q$ in both steps to reduce the estimation error.
• Use a single command to get correct standard errors and perform hypothesis tests.
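A sketch of the general procedure in matrix form, assuming simulated data with k = 2 endogenous variables, q = 1 included exogenous variable, and m = 3 instruments (all hypothetical). The function `two_sls` is our own helper, not a library API; it computes (W_hat' W)^(-1) W_hat' y, where W_hat holds the Step 1 fitted values, and the final check confirms it matches running Steps 1 and 2 explicitly.

```python
import numpy as np


def two_sls(y, endog, exog, instruments):
    """2SLS coefficients for y = b0 + endog @ b + exog @ g + error.

    endog: n x k endogenous regressors (D); exog: n x q included exogenous (X);
    instruments: n x m excluded exogenous (Z), with m >= k required.
    Returns coefficients ordered as [constant, D..., X...].
    """
    n, k = endog.shape
    if instruments.shape[1] < k:
        raise ValueError("order condition fails: need m >= k instruments")
    ones = np.ones((n, 1))
    W = np.hstack([ones, endog, exog])         # structural regressors
    Zf = np.hstack([ones, exog, instruments])  # every exogenous variable
    # Step 1 for all columns at once; constant and X are reproduced (up to rounding).
    W_hat = Zf @ np.linalg.lstsq(Zf, W, rcond=None)[0]
    # (W_hat' W)^(-1) W_hat' y equals the Step 2 OLS of y on W_hat.
    return np.linalg.solve(W_hat.T @ W, W_hat.T @ y)


rng = np.random.default_rng(12)
n = 50_000
U = rng.normal(size=n)
X = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 3))
D1 = Z[:, 0] + 0.5 * Z[:, 2] + 0.5 * X[:, 0] + U + rng.normal(size=n)
D2 = Z[:, 1] + 0.5 * Z[:, 2] - 0.3 * X[:, 0] + U + rng.normal(size=n)
D = np.column_stack([D1, D2])
y = 1.0 + 2.0 * D1 - 1.0 * D2 + 0.5 * X[:, 0] + 2.0 * U + rng.normal(size=n)

beta = two_sls(y, D, X, Z)

# Explicit Steps 1 and 2, for comparison.
Zf = np.column_stack([np.ones(n), X, Z])
D_hat = Zf @ np.linalg.lstsq(Zf, D, rcond=None)[0]
W_hat = np.column_stack([np.ones(n), D_hat, X])
beta_two_step = np.linalg.lstsq(W_hat, y, rcond=None)[0]
print(np.round(beta, 3), np.round(beta_two_step, 3))
```

Here m = 3 > k = 2, so the coefficients on D1 and D2 are over-identified.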
Example: Housing Value and Rent
• Data from the 1980 census on the median dollar value of owner-occupied housing (hsngval) and the median monthly gross rent (rent) for each US state (50 observations).
• We want to study how rent is affected by hsngval.
Variable Description Mean Std. Dev. Min Max
pop Population in 1980 4,518,149 4,715,038 401,851 2.37e+07
popgrow Pop. growth 1970-80 16.29 14.39 -3.6 63.8
popden Pop./sq. miles 1,543.72 2,213.894 7 9,862
pcturban Percentage urban 66.95 14.41 33.77 91.29
faminc Median family income 1979 19,499.92 2,617.22 14,591 28,395
hsng Housing units in 1980 1,762,686 1,847,308 162,825 9,279,036
hsnggrow Percentage housing growth 35.65 19.30 9.02 97.01
hsngval Median housing value 48,484 15,770.24 31,100 119,400
rent Median gross rent 234.76 35.35 180 368
Example: Housing Value and Rent
• Both housing value and rent are determined by demand and supply factors,
and they share many of these factors.
• Supply side: hsng, government policy (unobserved)
• Demand side: pop, faminc, government policy (unobserved)
• Unlike rent, housing value also depends on future supply and demand as well as the land value. Hence, it is reasonable to believe that popgrow, hsnggrow, and popden directly affect only hsngval, so they can serve as instruments for the estimation.
Example: Housing Value and Rent
• The conceptual model
[Figure: conceptual model over hsnggrow, popden, hsngval, rent, and unobserved U]
Example: Housing Value and Rent
pop popgrow popden pcturban faminc hsng hsnggrow hsngval rent
pop 1.0000
popgrow -0.2311 1.0000
popden 0.2174 -0.4449 1.0000
pcturban 0.4557 0.1499 0.4789 1.0000
faminc 0.1711 0.0111 0.3127 0.5790 1.0000
hsng 0.9986 -0.2110 0.2077 0.4588 0.1637 1.0000
hsnggrow -0.2678 0.9661 -0.4140 0.1707 0.1135 -0.2474 1.0000
hsngval 0.0619 0.3350 0.1381 0.5631 0.6773 0.0621 0.3948 1.0000
rent 0.1456 0.3834 0.1824 0.5958 0.8455 0.1518 0.4823 0.7987 1.0000
Example: Housing Value and Rent
• We estimate the following model:
• Outcome variable: 𝑌 = 𝑟𝑒𝑛𝑡
• Endogenous explanatory variable: 𝐷 = ℎ𝑠𝑛𝑔𝑣𝑎𝑙
• Included exogenous variables:
• 𝑋1 = 𝑝𝑐𝑡𝑢𝑟𝑏𝑎𝑛
• 𝑋2 = 𝑝ℎ𝑟𝑎𝑡𝑖𝑜
• 𝑋3 = 𝑓𝑎𝑚𝑖𝑛𝑐
• Excluded exogenous variables (i.e., instrumental variables):
• 𝑍1 = 𝑝ℎ𝑔𝑟𝑜𝑤
• 𝑍2 = 𝑝𝑜𝑝𝑑𝑒𝑛
Example: Housing Value and Rent
Example: Housing Value and Rent
• First stage result:
Example: Housing Value and Rent
• Housing value affects rent significantly. A one-thousand-dollar increase in housing value causes an increase in monthly rent of 1.73 dollars.
Tests
• Weak instruments: This is an F-test on the instruments in the first stage. The
null hypothesis is essentially that we have weak instruments, so a rejection
means our instruments are not weak.
• Wu-Hausman: This tests the consistency of the OLS estimates under the
assumption that the IV is consistent. When we reject, it means OLS is not
consistent, suggesting endogeneity is present. Otherwise, it essentially means
that the OLS and IV estimates are similar, and endogeneity may not have
been a big problem.
• Sargan: This is a test of instrument exogeneity using the overidentifying restrictions (a J-statistic). It can only be used if you have more instruments than endogenous regressors. If the null is rejected, at least one of our instruments is invalid, and possibly all of them. Otherwise, we find no evidence against the instruments' validity, although this does not prove that all of them are valid.
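A sketch of how the Sargan statistic can be computed, assuming one endogenous regressor, two instruments (so m - k = 1), homoskedastic errors, and simulated data in which, in the second scenario, one instrument is deliberately given a hypothetical direct effect on Y. The statistic is n times the R² from regressing the 2SLS residuals on all exogenous variables, compared with a chi-square distribution with m - k degrees of freedom (3.841 is its 5% critical value for 1 degree of freedom). `sargan_test` is a hypothetical helper, not a library function.

```python
import numpy as np


def sargan_test(y, D, Z):
    """Sargan statistic for one endogenous regressor D and instruments Z (n x m, m > 1).

    Returns (statistic, degrees of freedom, 2SLS coefficients [constant, D]).
    """
    n, m = Z.shape
    W = np.column_stack([np.ones(n), D])        # structural regressors
    Zf = np.column_stack([np.ones(n), Z])       # all exogenous variables
    W_hat = Zf @ np.linalg.lstsq(Zf, W, rcond=None)[0]
    beta = np.linalg.solve(W_hat.T @ W, W_hat.T @ y)
    u = y - W @ beta                            # 2SLS residuals (actual D, not D_hat)
    u_fit = Zf @ np.linalg.lstsq(Zf, u, rcond=None)[0]
    r2 = 1 - np.sum((u - u_fit) ** 2) / np.sum((u - u.mean()) ** 2)
    return n * r2, m - 1, beta


CRIT_5PCT_DF1 = 3.841  # 95th percentile of the chi-square distribution with 1 df

rng = np.random.default_rng(13)
n = 5_000
U = rng.normal(size=n)
Z = rng.normal(size=(n, 2))
D = 0.7 * Z[:, 0] + 0.7 * Z[:, 1] + U + rng.normal(size=n)

# Valid instruments: Z affects y only through D.
y_valid = 1.0 * D + 1.5 * U + rng.normal(size=n)
# Invalid: the second instrument also has a (hypothetical) direct effect on y.
y_invalid = y_valid + 0.5 * Z[:, 1]

stat_valid, df, _ = sargan_test(y_valid, D, Z)
stat_invalid, _, _ = sargan_test(y_invalid, D, Z)
print(f"df = {df}, 5% critical value = {CRIT_5PCT_DF1}")
print(f"valid instruments:  Sargan = {stat_valid:.2f}")
print(f"invalid instrument: Sargan = {stat_invalid:.2f}")
```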
Pitfalls of I.V. Estimation
• The exogeneity assumption of IV is hard to defend (especially in the
single IV case).
• IV estimators are consistent in large samples but biased in finite samples. Moreover, the bias can be substantial when instruments are weak.
• By using only a portion of the covariation in the causal variable and
the outcome variable, IV estimators use only a portion of the
information in the data. This represents a loss in statistical power.
• Maybe only a subpopulation reacts to the instruments. The estimates then represent local average treatment effects (LATE) for that subpopulation (e.g., the quarter of birth in the dropout example).
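A small Monte Carlo sketch of the weak-instrument point, under assumed values (n = 200, three instruments, true effect 1, and a confounder that pushes OLS upward toward about 2). With a strong first stage, the median 2SLS estimate sits near the truth; with a very weak first stage, it drifts toward the biased OLS value. Medians are used because IV estimates with weak instruments can have very heavy tails.

```python
import numpy as np


def tsls_slope(y, d, Z):
    """2SLS slope for one endogenous regressor d with instruments Z (constant added)."""
    n = len(y)
    Zf = np.column_stack([np.ones(n), Z])
    W = np.column_stack([np.ones(n), d])
    W_hat = Zf @ np.linalg.lstsq(Zf, W, rcond=None)[0]
    return np.linalg.solve(W_hat.T @ W, W_hat.T @ y)[1]


def simulate(pi, reps=500, n=200, m=3, delta=1.0, seed=0):
    """Median 2SLS estimate of delta when each of m instruments has first-stage coefficient pi."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        U = rng.normal(size=n)                         # confounder of d and y
        Z = rng.normal(size=(n, m))
        d = pi * Z.sum(axis=1) + U + rng.normal(size=n)
        y = delta * d + 2.0 * U + rng.normal(size=n)
        estimates[r] = tsls_slope(y, d, Z)
    return np.median(estimates)


median_strong = simulate(pi=0.5)
median_weak = simulate(pi=0.02)
print(f"true delta = 1.0; median 2SLS, strong IV = {median_strong:.3f}")
print(f"true delta = 1.0; median 2SLS, weak IV   = {median_weak:.3f}")
```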
In-Class Exercise
• Conditioning on B, can A and C be correlated?
• Conditioning on X, is A independent of Y?
• If X is a binary cause, then the Naïve Estimator of the unconditional average treatment effect is
biased if we only condition on C. Correct?
[Causal graph: nodes U, C, D, A, X, Y, Z]
In-Class Exercise
• In the following causal system, which variable is a valid instrument for
the aggregate causal effect of 𝐷 on 𝑌?
[Causal graph: nodes Z, U, A, X, Y, B, C, D, V]
In-Class Exercise
• We want to investigate how Manning's staffing level affects store sales using OLS. What factors must be included in the regression? If the back-door path cannot be completely blocked, what could serve as a valid instrumental variable?