CH 1
Mahbub Latif
October 2024
• Question 1
Do charter schools increase the test scores of elementary school students?
• If so, how large are the gains compared to those that could be realized by
implementing alternative educational reforms?
• Question 2
Does obtaining a college degree increase an individual’s labor market earnings?
• If so, is this particular effect large relative to the earnings gains that could be
achieved only through on-the-job training?
• Simple cause-and-effect questions are the motivation for much research in the social,
demographic, and health sciences
• Definitive answers to cause-and-effect questions may not always be possible to
formulate, given the constraints that researchers face in collecting data and
evaluating alternative explanations.
• Over the past four decades, a counterfactual model (also known as the
Neyman-Rubin Causal model) of causality has been developed and refined, and as a
result, a unified framework for the prosecution of causal questions is now available.
$$P(Y = y \mid S = s) = \frac{P(Y = y, S = s)}{P(S = s)}$$
• A typical associational parameter is the regression of 𝑌 on 𝑆, i.e., 𝐸(𝑌 | 𝑆 = 𝑠)
• Associational inference consists of making statistical inferences (estimates, tests, etc.)
about associational parameters relating 𝑌 and 𝑆 based on data gathered about 𝑌
and 𝑆 from units in 𝑈
• In associational inference, the role of time is only to affect the definition of the population
of units or to specify the operational meaning of a particular variable; its role in
causal inference is different
[Figure: cholesterol plotted against exercise time (in hours), four panels; the later panels group points by age (20, 25, 30, 40, 50) and add an "overall" category]
• A group of sick patients were given the option to try a new drug
• Among those who tried the new drug, a lower percentage recovered than among those
who did not
• General conclusion
• Drug appears to help men and women, but hurt the general population
• Recovery rates of 700 patients who were given access to the drug were recorded
• A total of 350 patients chose to take the drug, and the remaining 350 did not
• “Data show that income and marriage have a high positive correlation. Therefore,
your earnings will increase if you get married.”
• “Data show that as the number of fires increases, so does the number of firefighters.
Therefore, to reduce fires, you should reduce the number of firefighters.”
• “Data show that people who hurry tend to be late to their meetings. Don’t hurry, or
you will be late.”
• The host of a popular TV game show, Monty shows a contestant three doors – 𝐴, 𝐵,
and 𝐶 – a new car is behind one of the doors, and the other two doors have goats.
• If the contestant guesses correctly, the car is his; otherwise, he gets a goat.
• Suppose the contestant guesses 𝐴 at random and then Monty, who is forbidden from
revealing where the car is, opens Door 𝐶 , which, of course, has a goat behind it.
• He tells the contestant that he can now switch to Door 𝐵, or stick with Door 𝐴.
• Whichever the contestant picks, he will get what’s behind it.
• Let each door have an equal chance of hiding the car, i.e., 𝑃(car-in-A) = 𝑃(car-in-B) = 𝑃(car-in-C) = 1/3
• Initially the contestant picked the door 𝐴 and Monty opened the door 𝐶 , then we
have
𝑃 (open-C | car-in-A) = 1/2
𝑃 (open-C | car-in-B) = 1
𝑃 (open-C | car-in-C) = 0
• Then show that switching to door 𝐵 is the better choice to win the car, i.e., 𝑃(car-in-B | open-C) > 𝑃(car-in-A | open-C)
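The comparison can be checked with Bayes' rule. The sketch below is one possible way to do the arithmetic; it assumes a uniform prior of 1/3 per door and uses the three conditional probabilities of Monty opening door 𝐶 listed above. The function and variable names are illustrative only.

```python
from fractions import Fraction

# Assumed prior: the car is equally likely to be behind each door.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# P(open-C | car-in-d) when the contestant has picked door A (from the text).
p_open_c = {"A": Fraction(1, 2), "B": Fraction(1), "C": Fraction(0)}


def posterior_given_open_c(prior, likelihood):
    """Bayes' rule: P(car-in-d | open-C) for each door d."""
    evidence = sum(prior[d] * likelihood[d] for d in prior)
    return {d: prior[d] * likelihood[d] / evidence for d in prior}


posterior = posterior_given_open_c(prior, p_open_c)
for door, prob in posterior.items():
    print(f"P(car-in-{door} | open-C) = {prob}")
```

Under these assumptions the posterior probability for door 𝐵 is twice that for door 𝐴, which is why switching is the better choice.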
• For the example with the causal effect of having a college degree rather than only a
high school diploma on subsequent earnings:
• adults who have completed only high school diplomas have theoretical what-if earnings
under the state “have a college degree,” and
• adults who have completed college degrees have theoretical what-if earnings under the
state “have only a high school diploma”
• These what-if potential outcomes are counterfactual in the sense that they exist in
theory but are not observed
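$$A = \begin{cases} 1 & \text{for treated} \\ 0 & \text{for untreated} \end{cases} \qquad Y = \begin{cases} 1 & \text{for death} \\ 0 & \text{for survival} \end{cases}$$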
(Note: 𝑃(𝑌^{𝑎=0} = 0 | 𝐴 = 1); giving the treatment reversed the outcome.)
• Patient Zeus
• On January 1, Zeus got a new heart after waiting for a heart transplant, and five days
later he died
• We know somehow “had Zeus not received a heart transplant on January 1, he would
have been alive five days later”
p(Y^(a=0)=1|A=1)
• Patient Hera
• On January 1, Hera got a new heart after waiting for a heart transplant, and five days
later she was alive
• We know somehow “had Hera not received a heart transplant on January 1, she would
still have been alive five days later”
• Counterfactuals
• 𝑌^{𝑎=1} → outcome that would have been observed under 𝑎 = 1
• 𝑌^{𝑎=0} → outcome that would have been observed under 𝑎 = 0

patient  𝐴  𝑌^{𝑎=0}  𝑌^{𝑎=1}
Zeus     1  0        1
Hera     1  0        0

(Note: Zeus died after being treated, so 𝑌^{𝑎=1} = 1; without treatment he would not have died, so 𝑌^{𝑎=0} = 0. Hera would have survived with or without treatment, so both of her counterfactuals are 0.)
• Both 𝑌^{𝑎=1} and 𝑌^{𝑎=0} are random variables, as they can take different values for different
individuals
• The notations 𝑌(1) and 𝑌(0) are also used for 𝑌^{𝑎=1} and 𝑌^{𝑎=0}, respectively, in the
literature
Mahbub Latif Introduction to Causal Inference October 2024 31 / 80
Associational and causal measures
• The role of treatment 𝐴 measured on unit 𝑢 ∈ 𝑈 differs between the models of
associational and causal inference
• In associational inference 𝐴(𝑢) is a characteristic of 𝑢 and in causal inference 𝐴(𝑢)
indicates exposure of 𝑢 to a specific cause
• The role of time is important in causal inference, and a unit is exposed to a cause
that must occur at some specific time or within a specific time period
• Variables are now divided into two classes: pre-exposure and post-exposure
• Response variable falls in the class of post-exposure
• The treatment 𝐴 has a causal effect on an individual outcome 𝑌𝑖 if for the individual
we have
$$Y_i^{a=1} \neq Y_i^{a=0} \;\Rightarrow\; \Delta_i = Y_i^{a=1} - Y_i^{a=0}$$
• For Zeus, the heart transplant has a causal effect, but for Hera, the treatment does not
have a causal effect
• The variables 𝑌^{𝑎=1} and 𝑌^{𝑎=0} are known as potential outcomes or counterfactual
outcomes because only one will ultimately be realized
• Definition: the outcomes that would have been observed for each individual under a possible
treatment value 𝑎 are known as potential outcomes
• Counterfactual outcomes refer to the outcomes in a “counter to the fact”
situation
• That is, the outcomes that would have been observed under a treatment value that the
individual did not actually receive
• The term potential outcomes was first used by Jerzy Neyman (1923) in the context
of randomized experiments
• 𝐴𝑖 is the set of treatments to which unit 𝑖 can be exposed, and 𝐴𝑖 = 𝐴 because the
set of treatments is the same for all units (every unit is offered the same treatments)
• For each individual, the observed outcome 𝑌 is the counterfactual outcome under the
treatment value that the individual actually experienced
• The counterfactual outcome 𝑌^𝑎 under treatment 𝑎 is factual for some individuals and
counterfactual for others
• Consistency is a component of Rubin’s “Stable-unit-treatment-value assumption” (SUTVA)
• The potential outcomes for any unit don’t vary with the treatments assigned to other
units (No interference), i.e.
• 𝑌𝑖 = 𝑌𝑖 (𝐴𝑖 ) and it does not depend on 𝐴𝑖′ , 𝑖′ ≠ 𝑖
• There are no different versions of each treatment level that lead to different
potential outcomes (No hidden variations of treatments)
• Under SUTVA, for a treatment with two levels, the dimension of counterfactuals of a
population of size 𝑁 is 𝑁 × 2; otherwise, it would be 𝑁 × 2^𝑁
• There are two general solutions to the fundamental problem of causal inference
• Scientific solution and statistical solution
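Let 𝑁 be the population size. For the 𝑖th individual, we can define
$$Y_i^{\text{obs}} = Y_i(A_i) = \begin{cases} Y_i(0) & \text{if } A_i = 0 \\ Y_i(1) & \text{if } A_i = 1 \end{cases} \quad \text{and} \quad Y_i^{\text{miss}} = Y_i(1 - A_i) = \begin{cases} Y_i(1) & \text{if } A_i = 0 \\ Y_i(0) & \text{if } A_i = 1 \end{cases}$$
(Note: substituting 𝐴𝑖 = 0 gives 𝑌𝑖(1 − 𝐴𝑖) = 𝑌𝑖(1), and similarly for 𝐴𝑖 = 1.)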
• Observed response
• Potential outcomes are hypothetical and assumed to exist before treatment is assigned
• Observed outcome does not exist until it is assessed, which is after the treatment is
assigned
(Note: potential outcomes are defined before the treatment is given, whereas the observed outcome is known only after the treatment is given.)
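As a small illustration of how the observed and missing outcomes are selected from the potential outcomes, the sketch below applies 𝑌𝑖obs = 𝑌𝑖(𝐴𝑖) and 𝑌𝑖miss = 𝑌𝑖(1 − 𝐴𝑖) to Zeus and Hera, using the values stated in their stories (1 = death, 0 = survival). The dictionary keys `Y0` and `Y1` and the function name are illustrative choices, not notation from the slides.

```python
# Potential outcomes for the two patients described earlier (1 = death, 0 = survival).
patients = {
    "Zeus": {"A": 1, "Y0": 0, "Y1": 1},
    "Hera": {"A": 1, "Y0": 0, "Y1": 0},
}


def observed_and_missing(a, y0, y1):
    """Return (Y_obs, Y_miss) = (Y(A), Y(1 - A)) for a binary treatment A."""
    y_obs = y1 if a == 1 else y0
    y_miss = y0 if a == 1 else y1
    return y_obs, y_miss


results = {
    name: observed_and_missing(p["A"], p["Y0"], p["Y1"])
    for name, p in patients.items()
}
for name, (y_obs, y_miss) in results.items():
    print(f"{name}: Y_obs = {y_obs}, Y_miss = {y_miss}")
```

In real data only `Y_obs` is available; `Y_miss` is shown here only because the stories tell us both potential outcomes.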
Average causal effects
(Note: an average causal effect (ACE) requires a specified group of people, whereas an individual causal effect (ICE) requires only a single individual.)
• Consider a population of 20 subjects for which all the counterfactuals (𝑌^{𝑎=0} and
𝑌^{𝑎=1}) are known
• Pr(𝑌^{𝑎=1} = 1) = 𝐸(𝑌^{𝑎=1}) → proportion of individuals who would have developed
the outcome 𝑌 had everybody in the population been treated
• Pr(𝑌^{𝑎=0} = 1) = 𝐸(𝑌^{𝑎=0}) → proportion of individuals who would have developed
the outcome 𝑌 had everybody in the population been untreated
• Average causal effects in the population
• The treatment 𝐴 has an average causal effect on the outcome of interest 𝑌 if Pr(𝑌^{𝑎=1} = 1) ≠ Pr(𝑌^{𝑎=0} = 1)
• When treatment is not dichotomous, the causal effect can be defined as a contrast of
any functional (e.g., mean, median, hazard, cdf, …) of the distributions of
counterfactuals under different treatment values
• Of the 12 subjects, six were harmed by the treatment 𝐴, and the others benefited
from it.
• Average causal effect is always equal to the average of individual causal effects
$$E(Y^{a=1}) - E(Y^{a=0}) = E(Y^{a=1} - Y^{a=0})$$
• When there is no causal effect for any individual in the population (i.e., 𝑌^{𝑎=1} = 𝑌^{𝑎=0}
for all individuals, so the treatment makes no difference to anyone's outcome), we say that the sharp causal null hypothesis is true
• Average causal effect can sometimes be identified from the data, even if individual
causal effects cannot
• From now on, “average causal effect” is referred to as “causal effect”
$$P(Y^{a=1} = 1) - P(Y^{a=0} = 1)$$
(Note: risk difference = 𝑃(𝑌(1) = 1) − 𝑃(𝑌(0) = 1) = 𝐸(𝑌(1)) − 𝐸(𝑌(0)) = 𝐸(𝑌(1) − 𝑌(0)), the average of the individual causal effects.)
• For difference measures, the causal null value is zero, and for ratio measures, it is one
• Causal risk difference in the population is the average of the individual causal effects,
but the causal risk ratio in the population is not the average of the individual causal
effects
• Exercise: Obtain the causal risk difference, risk ratio, and odds ratio for our
hypothetical population
• If one treats 100 million patients, there will be 10 million fewer deaths than if one does
not treat those 100 million patients
• One needs to treat ten patients to save one life
• The number needed to treat (NNT) is the average number of individuals that need
to receive treatment to reduce the number of cases by one
$$NNT = \frac{-1}{P(Y^{a=1} = 1) - P(Y^{a=0} = 1)}$$
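The NNT formula can be checked against the "ten patients to save one life" statement. In the sketch below, the two counterfactual risks (0.2 if treated, 0.3 if untreated) are hypothetical values chosen only so that their difference is −0.1, matching 10 million fewer deaths per 100 million treated; the slides do not state the individual risks.

```python
from fractions import Fraction


def number_needed_to_treat(risk_treated, risk_untreated):
    """NNT = -1 / (P(Y^{a=1} = 1) - P(Y^{a=0} = 1))."""
    risk_difference = risk_treated - risk_untreated
    if risk_difference == 0:
        raise ValueError("NNT is undefined when the causal risk difference is zero")
    return -1 / risk_difference


# Hypothetical counterfactual risks with a causal risk difference of -1/10.
risk_treated = Fraction(2, 10)
risk_untreated = Fraction(3, 10)

nnt = number_needed_to_treat(risk_treated, risk_untreated)
fewer_deaths = (risk_untreated - risk_treated) * 100_000_000

print(f"NNT = {nnt}")
print(f"Fewer deaths per 100 million treated = {fewer_deaths}")
```

Exact fractions are used instead of floats so that the result is not affected by rounding.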
• Data obtained from actual studies don’t have information on counterfactuals, but the
labels of assigned treatments and the observed outcomes for all individuals are
available
subjects 𝐴 𝑌 subjects 𝐴 𝑌
Rheia 0 0 Leto 0 0
Kronos 0 1 Ares 1 1
Demeter 0 0 Athena 1 1
Hades 0 0 Hephaestus 1 1
Hestia 1 0 Aphrodite 1 1
Poseidon 1 0 Cyclope 1 1
Hera 1 0 Persephone 1 1
Zeus 1 1 Hermes 1 0
Artemis 0 1 Hebe 1 0
Apollo 0 1 Dionysus 1 0
Risk in treated
• 𝑃 (𝑌 = 1 | 𝐴 = 1) → proportion of individuals who developed the outcome 𝑌 among
those who received the treatment (i.e. 𝐴 = 1)
Risk in untreated
• 𝑃 (𝑌 = 1 | 𝐴 = 0) → proportion of individuals who developed the outcome 𝑌 among
those who received no treatment (i.e. 𝐴 = 0)
• Treatment and outcome are associated if 𝑃(𝑌 = 1 | 𝐴 = 1) ≠ 𝑃(𝑌 = 1 | 𝐴 = 0)
• Treatment and outcome are independent if 𝑃(𝑌 = 1 | 𝐴 = 1) = 𝑃(𝑌 = 1 | 𝐴 = 0), i.e., the probability of death is the same whether or not the treatment is given
• Independence is represented by 𝐴 ⟂⟂ 𝑌 or 𝑌 ⟂⟂ 𝐴
• Risk difference
$$P(Y = 1 \mid A = 1) - P(Y = 1 \mid A = 0)$$
• Risk ratio
$$\frac{P(Y = 1 \mid A = 1)}{P(Y = 1 \mid A = 0)}$$
• Odds ratio
$$\frac{P(Y = 1 \mid A = 1)/P(Y = 0 \mid A = 1)}{P(Y = 1 \mid A = 0)/P(Y = 0 \mid A = 0)}$$
• Risk difference
$$P(Y = 1 \mid A = 1) - P(Y = 1 \mid A = 0) = \frac{7}{13} - \frac{3}{7} = 0.110$$
• Risk ratio
$$\frac{P(Y = 1 \mid A = 1)}{P(Y = 1 \mid A = 0)} = \frac{7/13}{3/7} = 1.256$$
• Odds ratio
$$\frac{P(Y = 1 \mid A = 1)/P(Y = 0 \mid A = 1)}{P(Y = 1 \mid A = 0)/P(Y = 0 \mid A = 0)} = \frac{(7/13)/(6/13)}{(3/7)/(4/7)} = 1.556$$
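The three associational measures can be recomputed directly from the 20-subject table of (𝐴, 𝑌) values given earlier. The following is a minimal sketch; the dictionary layout and function name are illustrative, and only the observed data are used (no counterfactuals).

```python
# (A, Y) pairs for the 20 subjects in the observed-data table.
data = {
    "Rheia": (0, 0), "Kronos": (0, 1), "Demeter": (0, 0), "Hades": (0, 0),
    "Hestia": (1, 0), "Poseidon": (1, 0), "Hera": (1, 0), "Zeus": (1, 1),
    "Artemis": (0, 1), "Apollo": (0, 1), "Leto": (0, 0), "Ares": (1, 1),
    "Athena": (1, 1), "Hephaestus": (1, 1), "Aphrodite": (1, 1),
    "Cyclope": (1, 1), "Persephone": (1, 1), "Hermes": (1, 0),
    "Hebe": (1, 0), "Dionysus": (1, 0),
}


def risk(data, a):
    """P(Y = 1 | A = a): share with Y = 1 among subjects with A = a."""
    outcomes = [y for (trt, y) in data.values() if trt == a]
    return sum(outcomes) / len(outcomes)


risk_treated = risk(data, 1)    # 7 of 13 treated subjects have Y = 1
risk_untreated = risk(data, 0)  # 3 of 7 untreated subjects have Y = 1

risk_difference = risk_treated - risk_untreated
risk_ratio = risk_treated / risk_untreated
odds_ratio = (risk_treated / (1 - risk_treated)) / (
    risk_untreated / (1 - risk_untreated)
)

print(f"risk difference: {risk_difference:.3f}")
print(f"risk ratio:      {risk_ratio:.3f}")
print(f"odds ratio:      {odds_ratio:.3f}")
```

These are measures of association, not of causation: they compare two different subsets of the population rather than the same population under two treatment values.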
• An estimand is the parameter that represents the causal effect of interest and it is a
function of counterfactuals
• Estimands are usually formulated based on causal assumptions and considerations of
the study design
• Let us consider a population of size 𝑁 and 𝐴𝑖 = 𝐴, i.e., all units of the population
received treatment from the same set of treatment combinations 𝐴
𝜏𝑖 = 𝑌𝑖 (1) − 𝑌𝑖 (0)
• Average treatment effect for the treated units (ATT) and control units (ATC)
$$\tau_{fs} = \frac{1}{N} \sum_{i=1}^{N} \big(Y_i(1) - Y_i(0)\big) = \frac{1}{N} \sum_{i=1}^{N} \big(Y_i^{a=1} - Y_i^{a=0}\big)$$
• Average treatment effect for the treated, i.e. those who were exposed to the treatment
$$\tau_{fs,t} = \frac{1}{N_t} \sum_{i: A_i = 1} \big(Y_i(1) - Y_i(0)\big)$$
• Interest is in the average effect of the job-training program on hourly wages, averaged
over only those who would have been employed irrespective of the level of treatment
$$\tau_{fs,pos} = \frac{1}{N_{pos}} \sum_{i: Y_i(1) > 0,\, Y_i(0) > 0} \big(Y_i(1) - Y_i(0)\big)$$
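To make the three estimands concrete, the sketch below evaluates 𝜏𝑓𝑠, 𝜏𝑓𝑠,𝑡, and 𝜏𝑓𝑠,𝑝𝑜𝑠 on a small made-up population. All numbers are hypothetical hourly wages invented for illustration (a wage of 0 stands for "not employed"), and both potential outcomes are listed for every unit, which is never the case in real data.

```python
# Hypothetical units: treatment indicator A and both potential outcomes.
# In real data only one of Y(0), Y(1) is observed per unit.
units = [
    {"A": 1, "Y0": 10.0, "Y1": 12.0},
    {"A": 1, "Y0": 0.0, "Y1": 8.0},
    {"A": 0, "Y0": 9.0, "Y1": 9.0},
    {"A": 0, "Y0": 0.0, "Y1": 0.0},
    {"A": 1, "Y0": 11.0, "Y1": 14.0},
]


def mean_effect(subset):
    """Average of the unit-level effects Y(1) - Y(0) over the given units."""
    effects = [u["Y1"] - u["Y0"] for u in subset]
    return sum(effects) / len(effects)


# Average over all N units.
tau_fs = mean_effect(units)
# Average over the treated units only (A = 1).
tau_fs_t = mean_effect([u for u in units if u["A"] == 1])
# Average over units with positive wages under both treatment levels.
tau_fs_pos = mean_effect([u for u in units if u["Y1"] > 0 and u["Y0"] > 0])

print(f"tau_fs     = {tau_fs:.3f}")
print(f"tau_fs_t   = {tau_fs_t:.3f}")
print(f"tau_fs_pos = {tau_fs_pos:.3f}")
```

The three values differ because each estimand averages the same unit-level effects over a different subset of the population.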
¹ A function that remains invariant under any permutation of its rows
Key concepts
• Counterfactuals/potential outcomes
• Individual causal effect
• Average causal effect
• Effect measure
• Association and related measures
• §1: Introduction
• §2: Model for associational inference
• §3: Rubin’s model for causal inference
• §4: Some special cases of causal inference
• §5: Comments on selected philosophers
• §6: Comments from a few statisticians
• §7: What can be a cause?
• §8: Comments on causal inference in various disciplines
• §9: Summary
• Charig et al. (1986) discussed a study about treatments for kidney stones
• 𝑍 represents the treatment (𝑍 = 1 for open surgical procedure and 𝑍 = 0 for small
puncture procedure)
• Outcome 𝑌 is binary (1 for success and 0 for failure)
• Data on another variable 𝑋 , the size of the stone, is also available (0 for small and 1 for
large stones)
• Can you show Simpson’s paradox using this data set?
• Explain the paradox after examining the association between (𝑋 and 𝑍 ) and (𝑋 and 𝑌 )
X Z Y n
0 1 1 81
0 1 0 6
0 0 1 234
0 0 0 36
1 1 1 192
1 1 0 71
1 0 1 55
1 0 0 25
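One way to approach the exercise is to compare success rates by treatment, first overall and then within each stone-size stratum. The sketch below uses the counts from the table above; the function and variable names are illustrative, and it is offered as a starting point rather than a full solution.

```python
# Rows of the table: (X, Z, Y, n) with X = stone size (1 = large),
# Z = treatment (1 = open surgery), Y = outcome (1 = success), n = count.
rows = [
    (0, 1, 1, 81), (0, 1, 0, 6),
    (0, 0, 1, 234), (0, 0, 0, 36),
    (1, 1, 1, 192), (1, 1, 0, 71),
    (1, 0, 1, 55), (1, 0, 0, 25),
]


def success_rate(rows, z, x=None):
    """P(Y = 1 | Z = z), optionally also conditioning on X = x."""
    selected = [(y, n) for (xi, zi, y, n) in rows
                if zi == z and (x is None or xi == x)]
    total = sum(n for _, n in selected)
    successes = sum(n for y, n in selected if y == 1)
    return successes / total


def share_large(rows, z):
    """P(X = 1 | Z = z): share of large stones within a treatment group."""
    total = sum(n for (_, zi, _, n) in rows if zi == z)
    large = sum(n for (xi, zi, _, n) in rows if zi == z and xi == 1)
    return large / total


overall = {z: success_rate(rows, z) for z in (0, 1)}
by_size = {(x, z): success_rate(rows, z, x) for x in (0, 1) for z in (0, 1)}

for z in (0, 1):
    print(f"Z={z}: overall success = {overall[z]:.3f}, "
          f"share of large stones = {share_large(rows, z):.3f}")
for (x, z), rate in by_size.items():
    print(f"X={x}, Z={z}: success = {rate:.3f}")
```

Open surgery has the higher success rate within each stone size but the lower rate overall, and the treatment groups differ sharply in their share of large stones, which is the association between 𝑋 and 𝑍 that the second part of the exercise asks about.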
𝐸(𝑌 | 𝐴 = 1) − 𝐸(𝑌 | 𝐴 = 0)
• Class notes of the workshop “Introduction to Causal Inference”, which was held at
the Harvard School of Public Health during June 4-8, 2018
• Hernán MA and Robins JM (2020). “Causal Inference: What If”. Boca Raton:
Chapman & Hall/CRC.
• Imbens GW and Rubin DB (2015). “Causal Inference for Statistics, Social, and
Biomedical Sciences: An Introduction”. Cambridge University Press.
Bickel, P. J., Hammel, E. A., and O’Connell, J. W. (1975). Sex bias in graduate
admissions: Data from Berkeley: Measuring bias is harder than is usually assumed,
and the evidence is sometimes contrary to expectation. Science, 187(4175):398–404.
Charig, C. R., Webb, D. R., Payne, S. R., and Wickham, J. E. (1986). Comparison of
treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and
extracorporeal shockwave lithotripsy. Br Med J (Clin Res Ed), 292(6524):879–882.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American
Statistical Association, 81(396):945–960.
Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal
of the Royal Statistical Society: Series B (Methodological), 13(2):238–241.