Causal Inference
2 Randomized experiments
2.1 Randomization
2.2 Conditional randomization
2.3 Standardization
2.4 Inverse probability weighting

3 Observational studies
3.1 Identifiability conditions
3.2 Exchangeability
3.3 Positivity
3.4 Consistency: First, define the counterfactual outcome
3.5 Consistency: Second, link to the data
3.6 The target trial

4 Effect modification
4.1 Definition of effect modification
4.2 Stratification to identify effect modification
4.3 Why care about effect modification
4.4 Stratification as a form of adjustment
4.5 Matching as another form of adjustment
4.6 Effect modification and adjustment methods

5 Interaction
5.1 Interaction requires a joint intervention
5.2 Identifying interaction
5.3 Counterfactual response types and interaction
5.4 Sufficient causes
5.5 Sufficient cause interaction
5.6 Counterfactuals or sufficient-component causes?

7 Confounding
7.1 The structure of confounding
7.2 Confounding and exchangeability
7.3 Confounders
7.4 Single-world intervention graphs
7.5 How to adjust for confounding

8 Selection bias
8.1 The structure of selection bias
8.2 Examples of selection bias
8.3 Selection bias and confounding
8.4 Selection bias and censoring
8.5 How to adjust for selection bias
8.6 Selection without bias
By reading this book you are expressing an interest in learning about causal inference. But, as a human being,
you have already mastered the fundamental concepts of causal inference. You certainly know what a causal effect
is; you clearly understand the difference between association and causation; and you have used this knowledge
constantly throughout your life. In fact, had you not understood these causal concepts, you would not have
survived long enough to read this chapter, or even to learn to read. As a toddler you would have jumped right
into the swimming pool after observing that those who did so were later able to reach the jam jar. As a teenager,
you would have skied down the most dangerous slopes after observing that those who did so were more likely to
win the next ski race. As a parent, you would have refused to give antibiotics to your sick child after observing
that those children who took their medicines were less likely to be playing in the park the next day.
Since you already understand the definition of causal effect and the difference between association and cau-
sation, do not expect to gain deep conceptual insights from this chapter. Rather, the purpose of this chapter is
to introduce mathematical notation that formalizes the causal intuition that you already possess. Make sure that
you can match your causal intuition with the mathematical notation introduced here. This notation is necessary
to precisely define causal concepts, and we will use it throughout the book.
the treatment value a = 0. Y^{a=1} and Y^{a=0} are also random variables. Zeus has Y^{a=1} = 1 and Y^{a=0} = 0 because he died when treated but would have survived if untreated, while Hera has Y^{a=1} = 0 and Y^{a=0} = 0 because she survived when treated and would also have survived if untreated. (Sometimes we abbreviate the expression "individual i has outcome Y = 1" by writing Y_i = 1. Technically, when i refers to a specific individual, such as Zeus, Y_i is not a random variable because we are assuming that individual counterfactual outcomes are deterministic; see Technical Point 1.2.)

We can now provide a formal definition of a causal effect for an individual i: the treatment A has a causal effect on an individual's outcome Y if Y_i^{a=1} ≠ Y_i^{a=0} for the individual. Thus the treatment has a causal effect on Zeus's outcome because Y^{a=1} = 1 ≠ 0 = Y^{a=0}, but not on Hera's outcome because Y^{a=1} = 0 = Y^{a=0}. The variables Y^{a=1} and Y^{a=0} are referred to as potential outcomes or as counterfactual outcomes. Some authors prefer the term "potential outcomes" to emphasize that, depending on the treatment that is received, either of these two outcomes can be potentially observed. Other authors prefer the term "counterfactual outcomes" to emphasize that these outcomes represent situations that may not actually occur (that is, counter to the fact situations).

Causal effect for individual i: Y_i^{a=1} ≠ Y_i^{a=0}.
For each individual, one of the counterfactual outcomes, the one that corresponds to the treatment value that the individual actually received, is actually factual. For example, because Zeus was actually treated (A = 1), his counterfactual outcome under treatment Y^{a=1} = 1 is equal to his observed (actual) outcome Y = 1. That is, an individual with observed treatment A equal to a has observed outcome Y equal to his counterfactual outcome Y^a. This equality can be succinctly expressed as Y = Y^A, where Y^A denotes the counterfactual Y^a evaluated at the value a corresponding to the individual's observed treatment A. The equality Y = Y^A is referred to as consistency.

Consistency: if A_i = a, then Y_i^a = Y_i^{A_i} = Y_i.

Individual causal effects are defined as a contrast of the values of counterfactual outcomes, but only one of those outcomes is observed for each individual: the one corresponding to the treatment value actually experienced by the individual. All other counterfactual outcomes remain unobserved. The unhappy conclusion is that, in general, individual causal effects cannot be identified, that is, they cannot be expressed as a function of the observed data, because of missing data. (See Fine Point 2.1 for a possible exception.)
Interference. An implicit assumption in our definition of counterfactual outcome is that an individual’s counterfactual
outcome under treatment value a does not depend on other individuals' treatment values. For example, we implicitly
assumed that Zeus would die if he received a heart transplant, regardless of whether Hera also received a heart transplant.
That is, Hera’s treatment value did not interfere with Zeus’s outcome. On the other hand, suppose that Hera’s getting
a new heart upsets Zeus to the extent that he would not survive his own heart transplant, even though he would
have survived had Hera not been transplanted. In this scenario, Hera’s treatment interferes with Zeus’s outcome.
Interference between individuals is common in studies that deal with contagious agents or educational programs, in
which an individual’s outcome is influenced by their social interaction with other population members.
In the presence of interference, the counterfactual Y_i^a for an individual i is not well defined because an individual's
outcome depends also on other individuals' treatment values. As a consequence, "the causal effect of heart transplant on
Zeus’s outcome” is not well defined when there is interference. Rather, one needs to refer to “the causal effect of heart
transplant on Zeus’s outcome when Hera does not get a new heart” or “the causal effect of heart transplant on Zeus’s
outcome when Hera does get a new heart.” If other relatives and friends’ treatment also interfere with Zeus’s outcome,
then one may need to refer to the causal effect of heart transplant on Zeus’s outcome when “no relative or friend gets
a new heart,” “when only Hera gets a new heart,” etc. because the causal effect of treatment on Zeus’s outcome may
differ for each particular allocation of hearts. The assumption of no interference was labeled “no interaction between
units” by Cox (1958), and is included in the “stable-unit-treatment-value assumption (SUTVA)” described by Rubin
(1980). See Halloran and Struchiner (1995), Sobel (2006), Rosenbaum (2007), and Hudgens and Halloran (2009) for
a more detailed discussion of the role of interference in the definition of causal effects. Unless otherwise specified, we
will assume no interference throughout this book.
Table 1.1

              Y^{a=0}   Y^{a=1}
Rheia            0         1
Kronos           1         0
Demeter          0         0
Hades            0         0
Hestia           0         0
Poseidon         1         0
Hera             0         0
Zeus             0         1
Artemis          1         1
Apollo           1         0
Leto             0         1
Ares             1         1
Athena           1         1
Hephaestus       0         1
Aphrodite        0         1
Cyclope          0         1
Persephone       1         1
Hermes           1         0
Hebe             1         0
Dionysus         1         0

Pr[Y^{a=1} = 1] = 10/20 = 0.5. Similarly, from the other column of Table 1.1, we can conclude that half of the members of the population (10 out of 20) would have died if they had not received a heart transplant. That is, the proportion of individuals that would have developed the outcome had all population individuals received a = 0 is Pr[Y^{a=0} = 1] = 10/20 = 0.5. Note that we have computed the counterfactual risk under treatment to be 0.5 by counting the number of deaths (10) and dividing them by the total number of individuals (20), which is the same as computing the average of the counterfactual outcome across all individuals in the population (to see the equivalence between risk and average for a dichotomous outcome, use the data in Table 1.1 to compute the average of Y^{a=1}).

We are now ready to provide a formal definition of the average causal effect in the population: an average causal effect of treatment A on outcome Y is present if Pr[Y^{a=1} = 1] ≠ Pr[Y^{a=0} = 1] in the population of interest. Under this definition, treatment A does not have an average causal effect on outcome Y in our population because both the risk of death under treatment Pr[Y^{a=1} = 1] and the risk of death under no treatment Pr[Y^{a=0} = 1] are 0.5. That is, it does not matter whether all or none of the individuals receive a heart transplant: half of them would die in either case. When, like here, the average causal effect in the population is null, we say that the null hypothesis of no average causal effect is true. Because the risk equals the average and because the letter E is usually employed to represent the population average or mean (also referred to as 'E'xpectation), we can rewrite the definition of a non-null average causal effect in the population as E[Y^{a=1}] ≠ E[Y^{a=0}] so that the definition applies to both dichotomous and nondichotomous outcomes.

The presence of an "average causal effect of heart transplant A" is defined by a contrast that involves the two actions "receiving a heart transplant (A = 1)" and "not receiving a heart transplant (A = 0)."
Multiple versions of treatment. Another implicit assumption in our definition of an individual's counterfactual outcome
under treatment value a is that there is only one version of treatment value A = a. For example, we said that Zeus
would die if he received a heart transplant. This statement implicitly assumes that all heart transplants are performed
by the same surgeon using the same procedure and equipment. That is, that there is only one version of the treatment
“heart transplant.” If there were multiple versions of treatment (e.g., surgeons with different skills), then it is possible
that Zeus would survive if his transplant were performed by Asclepios, and would die if his transplant were performed by
Hygieia. In the presence of multiple versions of treatment, the counterfactual Y_i^a for an individual i is not well defined
because an individual's outcome depends on the version of treatment a. As a consequence, "the causal effect of heart
transplant on Zeus’s outcome” is not well defined when there are multiple versions of treatment. Rather, one needs to
refer to “the causal effect of heart transplant on Zeus’s outcome when Asclepios performs the surgery” or “the causal
effect of heart transplant on Zeus’s outcome when Hygieia performs the surgery.” If other components of treatment
(e.g., procedure, place) are also relevant to the outcome, then one may need to refer to “the causal effect of heart
transplant on Zeus’s outcome when Asclepios performs the surgery using his rod at the temple of Kos” because the
causal effect of treatment on Zeus’s outcome may differ for each particular version of treatment.
Like the assumption of no interference (see Fine Point 1.1), the assumption of no multiple versions of treatment is
included in the “stable-unit-treatment-value assumption (SUTVA)” described by Rubin (1980). Robins and Greenland
(2000) made the point that if the versions of a particular treatment (e.g., heart transplant) had the same causal effect
on the outcome (survival), then the counterfactual Y^{a=1} would be well-defined. VanderWeele (2009) formalized this
point as the assumption of "treatment variation irrelevance," i.e., the assumption that multiple versions of treatment
A = a may exist but they all result in the same outcome Y^a. We return to this issue in Chapter 3 but, unless otherwise
specified, we will assume treatment variation irrelevance throughout this book.
Average causal effect in the population: E[Y^{a=1}] ≠ E[Y^{a=0}].

When more than two actions are possible (i.e., the treatment is not dichotomous), the particular contrast of interest needs to be specified. For example, "the causal effect of aspirin" is meaningless unless we specify that the contrast of interest is, say, "taking, while alive, 150 mg of aspirin by mouth (or nasogastric tube if need be) daily for 5 years" versus "not taking aspirin." Note that this causal effect is well defined even if counterfactual outcomes under other interventions are not well defined or even do not exist (e.g., "taking, while alive, 500 mg of aspirin by absorption through the skin daily for 5 years").
Absence of an average causal effect does not imply absence of individual effects. Table 1.1 shows that treatment has an individual causal effect on 12 members (including Zeus) of the population because, for each of these 12 individuals, the values of their counterfactual outcomes Y^{a=1} and Y^{a=0} differ. Of the 12, 6 were harmed by treatment, including Zeus (Y^{a=1} − Y^{a=0} = 1), and 6 were helped (Y^{a=1} − Y^{a=0} = −1). This equality is not an accident: the average causal effect E[Y^{a=1}] − E[Y^{a=0}] is always equal to the average E[Y^{a=1} − Y^{a=0}] of the individual causal effects Y^{a=1} − Y^{a=0}, as a difference of averages is equal to the average of the differences. When there is no causal effect for any individual in the population, i.e., Y^{a=1} = Y^{a=0} for all individuals, we say that the sharp causal null hypothesis is true. The sharp causal null hypothesis implies the null hypothesis of no average effect.
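The counts in the preceding paragraphs can be verified directly from Table 1.1. Here is a minimal Python sketch (the variable names are ours, not the book's) that computes the two counterfactual risks and tallies the harmed and helped individuals:

```python
# Counterfactual outcomes (Y^{a=0}, Y^{a=1}) for the 20 individuals of Table 1.1.
table_1_1 = {
    "Rheia": (0, 1), "Kronos": (1, 0), "Demeter": (0, 0), "Hades": (0, 0),
    "Hestia": (0, 0), "Poseidon": (1, 0), "Hera": (0, 0), "Zeus": (0, 1),
    "Artemis": (1, 1), "Apollo": (1, 0), "Leto": (0, 1), "Ares": (1, 1),
    "Athena": (1, 1), "Hephaestus": (0, 1), "Aphrodite": (0, 1), "Cyclope": (0, 1),
    "Persephone": (1, 1), "Hermes": (1, 0), "Hebe": (1, 0), "Dionysus": (1, 0),
}

n = len(table_1_1)
risk_treated = sum(y1 for _, y1 in table_1_1.values()) / n    # Pr[Y^{a=1} = 1]
risk_untreated = sum(y0 for y0, _ in table_1_1.values()) / n  # Pr[Y^{a=0} = 1]

harmed = sum(1 for y0, y1 in table_1_1.values() if y1 - y0 == 1)
helped = sum(1 for y0, y1 in table_1_1.values() if y1 - y0 == -1)

print(risk_treated, risk_untreated)   # 0.5 0.5 -> null average causal effect
print(harmed, helped)                 # 6 6 -> individual effects in 12 members
```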
As discussed in the next chapters, average causal effects can sometimes be
identified from data, even if individual causal effects cannot. Hereafter we refer
to ‘average causal effects’ simply as ‘causal effects’ and the null hypothesis of
no average effect as the causal null hypothesis. We next describe different
measures of the magnitude of a causal effect.
Causal effects in the population. Let E[Y^a] be the mean counterfactual outcome had all individuals in the population received treatment level a. For discrete outcomes, the mean or expected value E[Y^a] is defined as the weighted sum Σ_y y p_{Y^a}(y) over all possible values y of the random variable Y^a, where p_{Y^a}(·) is the probability mass function of Y^a, i.e., p_{Y^a}(y) = Pr[Y^a = y]. For dichotomous outcomes, E[Y^a] = Pr[Y^a = 1]. For continuous outcomes, the expected value E[Y^a] is defined as the integral ∫ y f_{Y^a}(y) dy over all possible values y of the random variable Y^a, where f_{Y^a}(·) is the probability density function of Y^a. A common representation of the expected value that applies to both discrete and continuous outcomes is E[Y^a] = ∫ y dF_{Y^a}(y), where F_{Y^a}(·) is the cumulative distribution function (cdf) of the random variable Y^a. We say that there is a non-null average causal effect in the population if E[Y^a] ≠ E[Y^{a′}] for any two values a and a′.
The average causal effect, defined by a contrast of means of counterfactual outcomes, is the most commonly
used population causal effect. However, a population causal effect may also be defined as a contrast of, say, medians,
variances, hazards, or cdfs of counterfactual outcomes. In general, a causal effect can be defined as a contrast of any
functional of the distributions of counterfactual outcomes under different actions or treatment values. The causal null
hypothesis refers to the particular contrast of functionals (mean, median, variance, hazard, cdf, ...) used to define the
causal effect.
Number needed to treat. Consider a population of 100 million patients in which 20 million would die within five years
if treated (a = 1), and 30 million would die within five years if untreated (a = 0). This information can be summarized
in several equivalent ways:
• the causal risk difference is Pr[Y^{a=1} = 1] − Pr[Y^{a=0} = 1] = 0.2 − 0.3 = −0.1
• if one treats the 100 million patients, there will be 10 million fewer deaths than if one does not treat those 100
million patients.
• one needs to treat 100 million patients to save 10 million lives
• on average, one needs to treat 10 patients to save 1 life
We refer to the average number of individuals that need to receive treatment A = 1 to reduce the number of cases Y = 1 by one as the number needed to treat (NNT). In our example the NNT is equal to 10. For treatments that reduce the average number of cases (i.e., the causal risk difference is negative), the NNT is equal to the reciprocal of the absolute value of the causal risk difference:

NNT = −1 / (Pr[Y^{a=1} = 1] − Pr[Y^{a=0} = 1])
For treatments that increase the average number of cases (i.e., the causal risk difference is positive), one can
symmetrically define the number needed to harm. The NNT was introduced by Laupacis, Sackett, and Roberts (1988).
Note that, like the causal risk difference, the NNT applies to the population and time interval on which it is based. For
a discussion of the relative advantages and disadvantages of the NNT as an effect measure, see Grieve (2003).
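As a quick check of the arithmetic in this Fine Point, a hypothetical helper function (ours, not the book's) that computes the NNT from the two counterfactual risks:

```python
def number_needed_to_treat(risk_treated: float, risk_untreated: float) -> float:
    """NNT = -1 / causal risk difference, for treatments that reduce the risk."""
    risk_difference = risk_treated - risk_untreated
    if risk_difference >= 0:
        raise ValueError("NNT is defined here for risk-reducing treatments only")
    return -1.0 / risk_difference

# 20 million deaths among 100 million if treated, 30 million if untreated.
print(number_needed_to_treat(0.2, 0.3))  # 10.0: treat 10 patients to save 1 life
```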
That is, the causal risk ratio (multiplicative scale) is used to compute how many times treatment, relative to no treatment, increases the disease risk. The causal risk
difference (additive scale) is used to compute the absolute number of cases of
the disease attributable to the treatment. The use of either the multiplicative
or additive scale will depend on the goal of the inference.
We denote the proportion of individuals in the sample who would have died if unexposed as P̂r[Y^{a=0} = 1] = 10/20 = 0.50. The sample proportion P̂r[Y^{a=0} = 1] does not have to be exactly equal to the proportion of individuals who would have died if the entire super-population had been unexposed, Pr[Y^{a=0} = 1]. For example, suppose Pr[Y^{a=0} = 1] = 0.57 in the population but, because of random error due to sampling variability, P̂r[Y^{a=0} = 1] = 0.5 in our particular sample. We use the sample proportion P̂r[Y^a = 1] to estimate the super-population probability Pr[Y^a = 1] under treatment value a. The "hat" over Pr indicates that the sample proportion P̂r[Y^a = 1] is an estimator of the corresponding population quantity Pr[Y^a = 1]. We say that P̂r[Y^a = 1] is a consistent estimator of Pr[Y^a = 1] because the larger the number of individuals in the sample, the smaller the difference between P̂r[Y^a = 1] and Pr[Y^a = 1] is expected to be. This occurs because the error due to sampling variability is random and thus obeys the law of large numbers. (An estimator θ̂ of θ is consistent if, with probability approaching 1, the difference θ̂ − θ approaches zero as the sample size increases towards infinity.)
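The notion of a consistent estimator can be illustrated with a small simulation: as the sample drawn from the super-population grows, the sample proportion gets closer to the true probability. This sketch assumes, as in the example above, a super-population risk of 0.57; everything else is illustrative.

```python
import random

random.seed(3)
true_risk = 0.57   # super-population probability Pr[Y^{a=0} = 1] from the example

for n in (20, 200, 2_000, 200_000):
    # Draw a sample of size n from the super-population and compute the
    # sample proportion, i.e., the "hat" estimator of the true risk.
    sample = [random.random() < true_risk for _ in range(n)]
    estimate = sum(sample) / n
    print(n, round(estimate, 3), round(abs(estimate - true_risk), 3))
# The absolute error tends to shrink as n grows: the sample proportion is a
# consistent estimator of the super-population probability.
```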
Because the super-population probabilities Pr[Y^a = 1] cannot be computed, only consistently estimated by the sample proportions P̂r[Y^a = 1], one cannot conclude with certainty that there is, or there is not, a causal effect. Rather, a statistical procedure must be used to test the causal null hypothesis Pr[Y^{a=1} = 1] = Pr[Y^{a=0} = 1]; the procedure quantifies the chance that the difference between P̂r[Y^{a=1} = 1] and P̂r[Y^{a=0} = 1] is wholly due to sampling variability. (Caution: the term 'consistency' when applied to estimators has a different meaning from that which it has when applied to counterfactual outcomes.)
So far we have only considered sampling variability as a source of random error. But there may be another source of random variability: perhaps the values of an individual's counterfactual outcomes are not fixed in advance. (A second source of random error: nondeterministic counterfactuals.)

We have defined the counterfactual outcome Y^a as the individual's outcome had he received treatment value a. For example, in our first vignette, Zeus would have died if treated and would have survived if untreated. As defined, the values of the counterfactual outcomes are fixed or deterministic for each individual, e.g., Y^{a=1} = 1 and Y^{a=0} = 0 for Zeus. In other words, Zeus has a 100% chance of dying if treated and a 0% chance of dying if untreated. However, we could imagine another scenario in which Zeus has a 90% chance of dying if treated, and a 10% chance of dying if untreated. In this scenario, the counterfactual outcomes are stochastic or nondeterministic because Zeus's probabilities of dying under treatment (0.9) and under no treatment (0.1) are neither zero nor one. The values of Y^{a=1} and Y^{a=0} shown in Table 1.1 would be possible realizations of "random flips of mortality coins" with these probabilities. Further, one would expect that these probabilities vary across individuals because not all individuals are equally susceptible to develop the outcome. Quantum mechanics, in contrast to classical mechanics, holds that outcomes are inherently nondeterministic. That is, if the quantum mechanical probability of Zeus dying is 90%, the theory holds that no matter how much data we collect about Zeus, the uncertainty about whether Zeus will actually develop the outcome if treated is irreducible.

Table 1.2

              A    Y
Rheia         0    0
Kronos        0    1
Demeter       0    0
Hades         0    0
Hestia        1    0
Poseidon      1    0
Hera          1    0
Zeus          1    1
Artemis       0    1
Apollo        0    1
Leto          0    0
Ares          1    1
Athena        1    1
Hephaestus    1    1
Aphrodite     1    1
Cyclope       1    1
Persephone    1    1
Hermes        1    0
Hebe          1    0
Dionysus      1    0

Thus, in causal inference, random error derives from sampling variability, nondeterministic counterfactuals, or both. However, for pedagogic reasons, we will continue to largely ignore random error until Chapter 10. Specifically, we will assume that counterfactual outcomes are deterministic and that we have recorded data on every individual in a very large (perhaps hypothetical) super-population. This is equivalent to viewing our population of 20 individuals as a population of 20 billion individuals in which 1 billion individuals are identical to Zeus, 1 billion individuals are identical to Hera, and so on. Hence, until Chapter 10, we will carry out our computations with Olympian certainty.
Then, in Chapter 10, we will describe how our statistical estimates and
confidence intervals for causal effects in the super-population are identical ir-
respective of whether the world is stochastic (quantum) or deterministic (classi-
cal) at the level of individuals. In contrast, confidence intervals for the average
causal effect in the actual study sample will differ depending on whether coun-
terfactuals are deterministic versus stochastic. Fortunately, super-population
effects are in most cases the causal effects of substantive interest.
(i) Pr[Y = 1|A = 1] − Pr[Y = 1|A = 0] ≠ 0
(ii) Pr[Y = 1|A = 1] / Pr[Y = 1|A = 0] ≠ 1
(iii) (Pr[Y = 1|A = 1]/Pr[Y = 0|A = 1]) / (Pr[Y = 1|A = 0]/Pr[Y = 0|A = 0]) ≠ 1

where the left-hand side of the inequalities (i), (ii), and (iii) is the associational risk difference, risk ratio, and odds ratio, respectively.
We say that treatment A and outcome Y are dependent or associated when Pr[Y = 1|A = 1] ≠ Pr[Y = 1|A = 0]. In our population, treatment and outcome are indeed associated because Pr[Y = 1|A = 1] = 7/13 and Pr[Y = 1|A = 0] = 3/7. The associational risk difference, risk ratio, and odds ratio (and other measures) quantify the strength of the association when it exists. They measure the association on different scales, and we refer to them as association measures. These measures are also affected by random variability. However, until Chapter 10, we will disregard statistical issues by assuming that the population in Table 1.2 is extremely large. (For a continuous outcome Y we define mean independence between treatment and outcome as E[Y|A = 1] = E[Y|A = 0]. Independence and mean independence are the same concept for dichotomous outcomes.)

For dichotomous outcomes, the risk equals the average in the population, and we can therefore rewrite the definition of association in the population as E[Y|A = 1] ≠ E[Y|A = 0]. For continuous outcomes Y, we can also define association as E[Y|A = 1] ≠ E[Y|A = 0]. For binary Y, A and Y are not associated if and only if they are not statistically correlated.
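For concreteness, a short sketch that computes the three association measures from the observed (A, Y) data of Table 1.2 (the list literal below simply encodes the 13 treated with 7 deaths and the 7 untreated with 3 deaths):

```python
# Observed (A, Y) pairs from Table 1.2.
observed = [(0, 0), (0, 1), (0, 0), (0, 0), (1, 0), (1, 0), (1, 0), (1, 1),
            (0, 1), (0, 1), (0, 0), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1),
            (1, 1), (1, 0), (1, 0), (1, 0)]

# Observed risks Pr[Y = 1 | A = a] for a = 0, 1.
risk = {a: sum(y for a_i, y in observed if a_i == a) /
           sum(1 for a_i, _ in observed if a_i == a) for a in (0, 1)}

risk_difference = risk[1] - risk[0]                         # 7/13 - 3/7
risk_ratio = risk[1] / risk[0]                              # (7/13) / (3/7)
odds_ratio = (risk[1] / (1 - risk[1])) / (risk[0] / (1 - risk[0]))
print(round(risk_difference, 2), round(risk_ratio, 2), round(odds_ratio, 2))
```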
In our population of 20 individuals, we found (i ) no causal effect after com-
paring the risk of death if all 20 individuals had been treated with the risk
of death if all 20 individuals had been untreated, and (ii ) an association after
comparing the risk of death in the 13 individuals who happened to be treated
with the risk of death in the 7 individuals who happened to be untreated.
Figure 1.1 depicts the causation-association difference. The population (repre-
sented by a diamond) is divided into a white area (the treated) and a smaller
grey area (the untreated).
Figure 1.1: Population of interest, divided into treated and untreated; causation vs. association.
Does your looking up at the sky make other pedestrians look up too? This question has the main components
of any causal question: we want to know whether a certain action (your looking up) affects a certain outcome (other
people's looking up) in a certain population (say, residents of Madrid in 2017). Suppose we challenge you to design
a scientific study to answer this question. “Not much of a challenge,” you say after some thought, “I can stand on
the sidewalk and flip a coin whenever someone approaches. If heads, I’ll look up; if tails, I’ll look straight ahead.
I’ll repeat the experiment a few thousand times. If the proportion of pedestrians who looked up within 10 seconds
after I did is greater than the proportion of pedestrians who looked up when I didn’t, I will conclude that my
looking up has a causal effect on other people’s looking up. By the way, I may hire an assistant to record what
people do while I’m looking up.” After conducting this study, you found that 55% of pedestrians looked up when
you looked up but only 1% looked up when you looked straight ahead.
Your solution to our challenge was to conduct a randomized experiment. It was an experiment because the
investigator (you) carried out the action of interest (looking up), and it was randomized because the decision to
act on any study subject (pedestrian) was made by a random device (coin flipping). Not all experiments are
randomized. For example, you could have looked up when a man approached and looked straight ahead when a
woman did. Then the assignment of the action would have followed a deterministic rule (up for man, straight for
woman) rather than a random mechanism. However, your findings would not have been nearly as convincing if
you had conducted a nonrandomized experiment. If your action had been determined by the pedestrian's sex,
critics could argue that the “looking up” behavior of men and women differs (women may look up less often than
do men after you look up) and thus your study compared essentially “noncomparable” groups of people. This
chapter describes why randomization results in convincing causal inferences.
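A toy simulation of this randomized experiment may help fix ideas; the 55% and 1% response probabilities are taken from the vignette, while the sample size and everything else are our own illustrative choices:

```python
import random

random.seed(2017)
p_up_if_you_look, p_up_otherwise = 0.55, 0.01   # proportions from the vignette

looked_up = {1: [], 0: []}
for _ in range(5_000):                    # a few thousand pedestrians
    action = random.randint(0, 1)         # coin flip: 1 = you look up, 0 = you don't
    p = p_up_if_you_look if action else p_up_otherwise
    looked_up[action].append(random.random() < p)

for action in (1, 0):
    print(action, round(sum(looked_up[action]) / len(looked_up[action]), 3))
# Because the action was randomized, the difference between these two proportions
# estimates the causal effect of your looking up on other pedestrians' looking up.
```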
2.1 Randomization
In a real world study we will not know both of Zeus's potential outcomes Y^{a=1} under treatment and Y^{a=0} under no treatment. Rather, we can only know his observed outcome Y under the treatment value A that he happened to receive. Table 2.1 summarizes the available information for our population of 20 individuals. Only one of the two counterfactual outcomes is known for each individual: the one corresponding to the treatment level that he actually received. The data are missing for the other counterfactual outcomes. As we discussed in the previous chapter, this missing data creates a problem because it appears that we need the value of both counterfactual outcomes to compute effect measures. The data in Table 2.1 are only good to compute association measures. (Neyman (1923) applied counterfactual theory to the estimation of causal effects via randomized experiments.)

Randomized experiments, like any other real world study, generate data with missing values of the counterfactual outcomes as shown in Table 2.1. However, randomization ensures that those missing values occurred by chance. As a result, effect measures can be computed (or, more rigorously, consistently estimated) in randomized experiments despite the missing data. Let us be more precise.

Suppose that the population represented by a diamond in Figure 1.1 was near-infinite, and that we flipped a coin for each individual in such population.
We assigned the individual to the white group if the coin turned tails, and to the grey group if it turned heads. Note this was not a fair coin because the probability of heads was less than 50%: fewer people ended up in the grey group than in the white group. Next we asked our research assistants to administer the treatment of interest (A = 1) to individuals in the white group and a placebo (A = 0) to those in the grey group. Five days later, at the end of the study, we computed the mortality risks in each group, Pr[Y = 1|A = 1] = 0.3 and Pr[Y = 1|A = 0] = 0.6. The associational risk ratio was 0.3/0.6 = 0.5 and the associational risk difference was 0.3 − 0.6 = −0.3. We will assume that this was an ideal randomized experiment in all other respects: no loss to follow-up, full adherence to the assigned treatment over the duration of the study, a single version of treatment, and double blind assignment (see Chapter 9). Ideal randomized experiments are unrealistic but useful to introduce some key concepts for causal inference. Later in this book we consider more realistic randomized experiments.

Table 2.1

              A    Y    Y^{a=0}   Y^{a=1}
Rheia         0    0       0         ?
Kronos        0    1       1         ?
Demeter       0    0       0         ?
Hades         0    0       0         ?
Hestia        1    0       ?         0
Poseidon      1    0       ?         0
Hera          1    0       ?         0
Zeus          1    1       ?         1
Artemis       0    1       1         ?
Apollo        0    1       1         ?
Leto          0    0       0         ?
Ares          1    1       ?         1
Athena        1    1       ?         1
Hephaestus    1    1       ?         1
Aphrodite     1    1       ?         1
Cyclope       1    1       ?         1
Persephone    1    1       ?         1
Hermes        1    0       ?         0
Hebe          1    0       ?         0
Dionysus      1    0       ?         0

Now imagine what would have happened if the research assistants had misinterpreted our instructions and had treated the grey group rather than the white group. Say we learned of the misunderstanding after the study finished. How does this reversal of treatment status affect our conclusions? Not at all. We would still find that the risk in the treated (now the grey group) Pr[Y = 1|A = 1] is 0.3 and the risk in the untreated (now the white group) Pr[Y = 1|A = 0] is 0.6. The association measure would not change. Because individuals were randomly assigned to white and grey groups, the proportion of deaths among the exposed, Pr[Y = 1|A = 1], is expected to be the same whether individuals in the white group received the treatment and individuals in the grey group received placebo, or vice versa. When group membership is randomized, which particular group received the treatment is irrelevant for the value of Pr[Y = 1|A = 1]. The same reasoning applies to Pr[Y = 1|A = 0], of course. Formally, we say that groups are exchangeable.
Exchangeability means that the risk of death in the white group would have been the same as the risk of death in the grey group had individuals in the white group received the treatment given to those in the grey group. That is, the risk under the potential treatment value a among the treated, Pr[Y^a = 1|A = 1], equals the risk under the potential treatment value a among the untreated, Pr[Y^a = 1|A = 0], for both a = 0 and a = 1. An obvious consequence of these (conditional) risks being equal in all subsets defined by treatment status in the population is that they must be equal to the (marginal) risk under treatment value a in the whole population: Pr[Y^a = 1|A = 1] = Pr[Y^a = 1|A = 0] = Pr[Y^a = 1]. Because the counterfactual risk under treatment value a is the same in both groups A = 1 and A = 0, we say that the actual treatment A does not predict the counterfactual outcome Y^a. Equivalently, exchangeability means that the counterfactual outcome and the actual treatment are independent, or Y^a ⊥⊥ A, for all values a. Randomization is so highly valued because it is expected to produce exchangeability. When the treated and the untreated are exchangeable, we sometimes say that treatment is exogenous, and thus exogeneity is commonly used as a synonym for exchangeability.

Exchangeability: Y^a ⊥⊥ A for all a.
The previous paragraph argues that, in the presence of exchangeability, the
counterfactual risk under treatment in the white part of the population would
equal the counterfactual risk under treatment in the entire population. But the
risk under treatment in the white group is not counterfactual at all because the
white group was actually treated! Therefore our ideal randomized experiment
allows us to compute the counterfactual risk under treatment in the population
Pr[Y^{a=1} = 1] because it is equal to the risk in the treated Pr[Y = 1|A = 1] = 0.3.
Full exchangeability and mean exchangeability. Randomization makes the counterfactual outcomes jointly independent of A, which implies, but is not implied by, exchangeability Y^a ⊥⊥ A for each a. Formally, let 𝒜 = {a, a′, a′′, ...} denote the set of all treatment values present in the population, and Y^𝒜 = {Y^a, Y^{a′}, Y^{a′′}, ...} the set of all counterfactual outcomes. Randomization makes Y^𝒜 ⊥⊥ A. We refer to this joint independence as full exchangeability. For a dichotomous treatment, 𝒜 = {0, 1} and full exchangeability is (Y^{a=1}, Y^{a=0}) ⊥⊥ A.

For a dichotomous outcome and treatment, exchangeability Y^a ⊥⊥ A can also be written as Pr[Y^a = 1|A = 1] = Pr[Y^a = 1|A = 0] or, equivalently, as E[Y^a|A = 1] = E[Y^a|A = 0] for all a. We refer to the last equality as mean exchangeability. For a continuous outcome, exchangeability Y^a ⊥⊥ A implies mean exchangeability E[Y^a|A = a′] = E[Y^a], but mean exchangeability does not imply exchangeability because distributional parameters other than the mean (e.g., variance) may not be independent of treatment.

Neither full exchangeability Y^𝒜 ⊥⊥ A nor exchangeability Y^a ⊥⊥ A are required to prove that E[Y^a] = E[Y|A = a]. Mean exchangeability is sufficient. As sketched in the main text, the proof has two steps. First, E[Y|A = a] = E[Y^a|A = a] by consistency. Second, E[Y^a|A = a] = E[Y^a] by mean exchangeability. Because exchangeability and mean exchangeability are identical concepts for the dichotomous outcomes used in this chapter, we use the shorter term "exchangeability" throughout.
That is, the risk in the treated (the white part of the diamond) is the same as the risk if everybody had been treated (and thus the diamond had been entirely white). Of course, the same rationale applies to the untreated: the counterfactual risk under no treatment in the population Pr[Y^{a=0} = 1] equals the risk in the untreated Pr[Y = 1|A = 0] = 0.6. The causal risk ratio is 0.5 and the causal risk difference is −0.3. In ideal randomized experiments, association is causation.
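The claim that association is causation in an ideal randomized experiment can be illustrated by simulation: give every individual fixed counterfactual outcomes, randomize treatment with a coin, and compare the associational and causal risk ratios. The 0.3 and 0.6 counterfactual risks below match the example; everything else is an illustrative sketch.

```python
import random

random.seed(0)
n = 200_000   # a stand-in for the near-infinite population of the text

# Fixed (deterministic) counterfactual outcomes for every individual.
y1 = [int(random.random() < 0.3) for _ in range(n)]   # Y^{a=1}, risk 0.3
y0 = [int(random.random() < 0.6) for _ in range(n)]   # Y^{a=0}, risk 0.6

# Marginal randomization: a coin decides treatment; consistency links Y to Y^a.
a = [random.randint(0, 1) for _ in range(n)]
y = [y1[i] if a[i] == 1 else y0[i] for i in range(n)]

causal_rr = (sum(y1) / n) / (sum(y0) / n)                     # Pr[Y^{a=1}=1] / Pr[Y^{a=0}=1]
risk_treated = sum(y[i] for i in range(n) if a[i] == 1) / sum(a)
risk_untreated = sum(y[i] for i in range(n) if a[i] == 0) / (n - sum(a))
assoc_rr = risk_treated / risk_untreated                      # Pr[Y=1|A=1] / Pr[Y=1|A=0]
print(round(causal_rr, 2), round(assoc_rr, 2))                # both close to 0.5
```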
Here is another explanation for exchangeability Y^a ⊥⊥ A in a randomized experiment. The counterfactual outcome Y^a, like one's genetic make-up, can be thought of as a fixed characteristic of a person existing before the treatment was randomly assigned. This is because Y^a encodes what would have been one's outcome if assigned to treatment a and thus does not depend on the treatment you later receive. Because treatment A was randomized, it is independent of both your genes and Y^a. The difference between Y^a and your genetic make-up is that, even conceptually, you can only learn the value of Y^a after treatment is given, and then only if one's treatment A is equal to a.
Caution: Y^a ⊥⊥ A is different from Y ⊥⊥ A.

Before proceeding, please make sure you understand the difference between Y^a ⊥⊥ A and Y ⊥⊥ A. Exchangeability Y^a ⊥⊥ A is defined as independence between the counterfactual outcome and the observed treatment. Again, this means that the treated and the untreated would have experienced the same risk of death if they had received the same treatment level (either a = 0 or a = 1). But independence between the counterfactual outcome and the observed treatment Y^a ⊥⊥ A does not imply independence between the observed outcome and the observed treatment Y ⊥⊥ A. For example, in a randomized experiment in which exchangeability Y^a ⊥⊥ A holds and the treatment has a causal effect on the outcome, then Y ⊥⊥ A does not hold because the treatment is associated with the observed outcome. (Suppose there is a causal effect on some individuals so that Y^{a=1} ≠ Y^{a=0}. Since Y = Y^A, with Y^a evaluated at the observed treatment A, the observed Y depends on A and thus will not be independent of A.)

Does exchangeability hold in our heart transplant study of Table 2.1? To answer this question we would need to check whether Y^a ⊥⊥ A holds for a = 0 and for a = 1. Take a = 0 first. Suppose the counterfactual data in Table 1.1 are available to us. We can then compute the risk of death under no treatment Pr[Y^{a=0} = 1|A = 1] = 7/13 in the 13 treated individuals and the risk of death
Crossover randomized experiments. Individual (also known as subject-specific) causal effects can sometimes be identified via randomized experiments. For example, suppose we want to estimate the causal effect of lightning bolt use A on Zeus's blood pressure Y. We define the counterfactual outcomes Y^{a=1} and Y^{a=0} to be 1 if Zeus's blood pressure is temporarily elevated after calling or not calling a lightning strike, respectively. Suppose we convinced Zeus to use his lightning bolt only when suggested by us. Yesterday morning we flipped a coin and obtained heads. We then asked Zeus to call a lightning strike (A = 1). His blood pressure was elevated after doing so. This morning we flipped a coin and obtained tails. We then asked Zeus to refrain from using his lightning bolt (A = 0). His blood pressure did not increase. We have conducted a crossover randomized experiment in which an individual's outcome is sequentially observed under two treatment values. One might argue that, because we have observed both of Zeus's counterfactual outcomes Y^{a=1} = 1 and Y^{a=0} = 0, using a lightning bolt has a causal effect on Zeus's blood pressure.
In crossover randomized experiments, an individual is observed during two or more periods. The individual receives
a different treatment value in each period and the order of treatment values is randomly assigned. The main purported
advantage of the crossover design is that, unlike in non-crossover designs, for each treated individual there is a perfectly
exchangeable untreated subject–him or herself. A direct contrast of an individual’s outcomes under different treatment
values allows the identification of individual effects under the following conditions: 1) treatment is of short duration
and its effects do not carry-over to the next period, and 2) the outcome is a condition of abrupt onset that completely
resolves by the next period. Therefore crossover randomized experiments cannot be used to study the effect of heart
transplant, an irreversible action, on death, an irreversible outcome.
Often treatment is randomized at many different periods. If the individual causal effect changes with time, we
obtain the average of the individual time-specific causal effects.
But only the observed data in Table 2.1, not the counterfactual data in
Table 1.1, are available in the real world. Since Table 2.1 is insufficient to
compute counterfactual risks like the risk under no treatment in the treated
Pr[Y^{a=0} = 1|A = 1], we are generally unable to determine whether exchange-
ability holds in our study. However, suppose for a moment, that we actually
had access to Table 1.1 and determined that exchangeability does not hold
in our heart transplant study. Can we then conclude that our study is not
a randomized experiment? No, for two reasons. First, as you are probably
already thinking, a twenty-person study is too small to reach definite conclu-
sions. Random fluctuations arising from sampling variability could explain
almost anything. We will discuss random variability in Chapter 10. Until
then, let us assume that each individual in our population represents 1 billion
individuals that are identical to him or her. Second, it is still possible that
a study is a randomized experiment even if exchangeability does not hold in
infinite samples. However, unlike the type of randomized experiment described
in this section, it would need to be a randomized experiment in which investi-
gators use more than one coin to randomly assign treatment. The next section
describes randomized experiments with more than one coin.
and the untreated given that they all were in critical condition at the time of treatment assignment. That is, Y^a ⊥⊥ A | L = 1, where Y^a ⊥⊥ A | L = 1 means Y^a and A are independent given L = 1. Similarly, randomization also ensures that the treated and the untreated are exchangeable in the subset of individuals that were in noncritical condition, that is, Y^a ⊥⊥ A | L = 0. When Y^a ⊥⊥ A | L = l holds for all values l we simply write Y^a ⊥⊥ A | L. Thus, although conditional randomization does not guarantee unconditional (or marginal) exchangeability Y^a ⊥⊥ A, it guarantees conditional exchangeability Y^a ⊥⊥ A | L within levels of the variable L. In summary, randomization produces either marginal exchangeability (design 1) or conditional exchangeability (design 2).

Conditional exchangeability: Y^a ⊥⊥ A | L for all a.
We know how to compute effect measures under marginal exchangeability. In marginally randomized experiments the causal risk ratio Pr[Y^{a=1} = 1]/Pr[Y^{a=0} = 1] equals the associational risk ratio Pr[Y = 1|A = 1]/Pr[Y = 1|A = 0] because exchangeability ensures that the counterfactual risk under treatment level a, Pr[Y^a = 1], equals the observed risk among those who received treatment level a, Pr[Y = 1|A = a]. Thus, if the data in Table 2.2 had been collected during a marginally randomized experiment, the causal risk ratio would be readily calculated from the data on A and Y as (7/13)/(3/7) = 1.26. The question is how to compute the causal risk ratio in a conditionally randomized experiment. Remember that a conditionally randomized experiment is simply the combination of two (or more) separate marginally randomized experiments conducted in different subsets of the population, e.g., L = 1 and L = 0. Thus we have two options.

(In a marginally randomized experiment, the values of the counterfactual outcomes are missing completely at random (MCAR). In a conditionally randomized experiment, the values of the counterfactual outcomes are not MCAR, but they are missing at random (MAR) conditional on the covariate L. The terms MCAR, MAR, and NMAR (not missing at random) were introduced by Rubin (1976).)

First, we can compute the average causal effect in each of these subsets or strata of the population. Because association is causation within each subset, the stratum-specific causal risk ratio Pr[Y^{a=1} = 1|L = 1]/Pr[Y^{a=0} = 1|L = 1] among people in critical condition is equal to the stratum-specific associational risk ratio Pr[Y = 1|L = 1, A = 1]/Pr[Y = 1|L = 1, A = 0] among people in critical condition. And analogously for L = 0. We refer to this method to compute stratum-specific causal effects as stratification; stratification and effect modification are discussed in more detail in Chapter 4. Note that the stratum-specific causal risk ratio in the subset L = 1 may differ from the causal risk ratio in L = 0. In that case, we say that the effect of treatment A is modified by L, or that there is effect modification by L.
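A sketch of the stratification option applied to the heart transplant data; the counts below are not printed as a table in this excerpt but are reconstructed from the stratum-specific risks and group sizes given in Sections 2.3 and 2.4:

```python
# Counts reconstructed from the text: in L=0, 4 treated (1 death) and 4 untreated
# (1 death); in L=1, 9 treated (6 deaths) and 3 untreated (2 deaths).
counts = {
    0: {1: (4, 1), 0: (4, 1)},   # noncritical condition
    1: {1: (9, 6), 0: (3, 2)},   # critical condition
}

for l, by_treatment in counts.items():
    risk = {a: deaths / n for a, (n, deaths) in by_treatment.items()}
    print(f"L={l}: stratum-specific risk ratio {risk[1] / risk[0]:.2f}")
# Both stratum-specific risk ratios are 1: within each stratum, association is
# causation, and here the treatment has no effect in either stratum.
```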
Second, we can compute the average causal effect Pr[Y^{a=1} = 1]/Pr[Y^{a=0} = 1] in the entire population, as we have been doing so far. Whether our principal interest lies in the stratum-specific average causal effects versus the average causal effect in the entire population depends on practical and theoretical considerations discussed in detail in Chapter 4 and in Part III. As one example, you may be interested in the average causal effect in the entire population, rather than in the stratum-specific average causal effects, if you do not expect to have information on L for future individuals (e.g., the variable L is expensive to measure) and thus your decision to treat cannot depend on the value of L. Until Chapter 4, we will restrict our attention to the average causal effect in the entire population. The next two sections describe how to use data from conditionally randomized experiments to compute the average causal effect in the entire population.
2.3 Standardization
Our heart transplant study is a conditionally randomized experiment: the investigators used a random procedure to assign hearts (A = 1) with probability 50% to the 8 individuals in noncritical condition (L = 0), and with probability 75% to the 12 individuals in critical condition (L = 1). First, let us focus on the 8 individuals (remember, they are really the average representatives of 8 billion individuals) in noncritical condition. In this group, the risk of death among the treated is Pr[Y = 1|L = 0, A = 1] = 1/4, and the risk of death among the untreated is Pr[Y = 1|L = 0, A = 0] = 1/4. Because treatment was randomly assigned to individuals in the group L = 0, i.e., Y^a ⊥⊥ A|L = 0, the observed risks are equal to the counterfactual risks. That is, in the group L = 0, the risk in the treated equals the risk if everybody had been treated, Pr[Y = 1|L = 0, A = 1] = Pr[Y^{a=1} = 1|L = 0], and the risk in the untreated equals the risk if everybody had been untreated, Pr[Y = 1|L = 0, A = 0] = Pr[Y^{a=0} = 1|L = 0]. Following an analogous reasoning, we can conclude that the observed risks equal the counterfactual risks in the group of 12 individuals in critical condition, i.e., Pr[Y = 1|L = 1, A = 1] = Pr[Y^{a=1} = 1|L = 1] = 2/3, and Pr[Y = 1|L = 1, A = 0] = Pr[Y^{a=0} = 1|L = 1] = 2/3.
Suppose now our goal is to compute the causal risk ratio Pr[Y^{a=1} = 1]/Pr[Y^{a=0} = 1]. The numerator of the causal risk ratio is the risk if all 20 individuals in the population had been treated. From the previous paragraph, we know that the risk if all individuals had been treated is 1/4 in the 8 individuals with L = 0 and 2/3 in the 12 individuals with L = 1. Therefore the risk if all 20 individuals in the population had been treated will be a weighted average of 1/4 and 2/3 in which each group receives a weight proportional to its size. Since 40% of the individuals (8) are in group L = 0 and 60% of the individuals (12) are in group L = 1, the weighted average is 1/4 × 0.4 + 2/3 × 0.6 = 0.5. Thus the risk if everybody had been treated Pr[Y^{a=1} = 1] is equal to 0.5. By following the same reasoning we can calculate that the risk if nobody had been treated Pr[Y^{a=0} = 1] is also equal to 0.5. The causal risk ratio is then 0.5/0.5 = 1.
More formally, the marginal counterfactual risk Pr[Y^a = 1] is the weighted average of the stratum-specific risks Pr[Y^a = 1|L = 0] and Pr[Y^a = 1|L = 1] with weights equal to the proportion of individuals in the population with L = 0 and L = 1, respectively. That is, Pr[Y^a = 1] = Pr[Y^a = 1|L = 0] Pr[L = 0] + Pr[Y^a = 1|L = 1] Pr[L = 1]. Or, using a more compact notation, Pr[Y^a = 1] = Σ_l Pr[Y^a = 1|L = l] Pr[L = l], where Σ_l means sum over all values l that occur in the population. By conditional exchangeability, we can replace the counterfactual risk Pr[Y^a = 1|L = l] by the observed risk Pr[Y = 1|L = l, A = a] in the expression above. That is, Pr[Y^a = 1] = Σ_l Pr[Y = 1|L = l, A = a] Pr[L = l]. The left-hand side of this equality is an unobserved counterfactual risk whereas the right-hand side includes observed quantities only, which can be computed using data on L, A, and Y. When, as here, a counterfactual quantity can be expressed as a function of the distribution (i.e., probabilities) of the observed data, we say that the counterfactual quantity is identified or identifiable; otherwise, we say it is unidentified or not identifiable.
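The identification formula just derived can be coded directly. A minimal sketch using the stratum-specific risks and stratum sizes of the heart transplant example:

```python
# Observed stratum-specific risks Pr[Y = 1 | L = l, A = a] and stratum sizes Pr[L = l].
pr_y_given_l_a = {(0, 0): 1/4, (0, 1): 1/4, (1, 0): 2/3, (1, 1): 2/3}  # keys are (l, a)
pr_l = {0: 8/20, 1: 12/20}

def standardized_risk(a):
    """Pr[Y^a = 1] = sum over l of Pr[Y = 1 | L = l, A = a] * Pr[L = l]."""
    return sum(pr_y_given_l_a[(l, a)] * pr_l[l] for l in pr_l)

risk_all_treated = standardized_risk(1)    # counterfactual risk under a = 1
risk_none_treated = standardized_risk(0)   # counterfactual risk under a = 0
print(round(risk_all_treated, 3), round(risk_none_treated, 3),
      round(risk_all_treated / risk_none_treated, 3))   # 0.5 0.5 1.0
```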
Standardized mean: Σ_l E[Y|L = l, A = a] Pr[L = l].

The method described above is known in epidemiology, demography, and other disciplines as standardization. For example, the numerator Σ_l Pr[Y = 1|L = l, A = 1] Pr[L = l] of the causal risk ratio is the standardized risk in the treated using the population as the standard. In the presence of conditional exchangeability, this standardized risk can be interpreted as the (counterfactual) risk that would have been observed had all the individuals in the population
been treated.
The standardized risks in the treated and the untreated are equal to the counterfactual risks under treatment and no treatment, respectively. Therefore, the causal risk ratio Pr[Y^{a=1} = 1]/Pr[Y^{a=0} = 1] can be computed by standardization as

Σ_l Pr[Y = 1|L = l, A = 1] Pr[L = l] / Σ_l Pr[Y = 1|L = l, A = 0] Pr[L = l].
Figure 2.1
Risk periods. We have defined a risk as the proportion of individuals who develop the outcome of interest during a
particular period. For example, the 5-day mortality risk in the treated Pr[Y = 1|A = 1] is the proportion of treated
individuals who died during the first five days of follow-up. Throughout the book we often specify the period when the
risk is first defined (e.g., 5 days) and, for conciseness, omit it later. That is, we may just say “the mortality risk” rather
than “the five-day mortality risk.”
The following example highlights the importance of specifying the risk period. Suppose a randomized experiment
was conducted to quantify the causal effect of antibiotic therapy on mortality among elderly humans infected with the
plague bacteria. An investigator analyzes the data and concludes that the causal risk ratio is 0.05, i.e., on average
antibiotics decrease mortality by 95%. A second investigator also analyzes the data but concludes that the causal risk
ratio is 1, i.e., antibiotics have a null average causal effect on mortality. Both investigators are correct. The first
investigator computed the ratio of 1-year risks, whereas the second investigator computed the ratio of 100-year risks.
The 100-year risk was of course 1 regardless of whether individuals received the treatment. When we say that a treatment
has a causal effect on mortality, we mean that death is delayed, not prevented, by the treatment.
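A toy illustration of why the risk period matters; the survival times and the size of the treatment effect below are invented solely to mimic the antibiotic example:

```python
import random

random.seed(7)
n = 100_000

# Hypothetical years until death: treatment delays death for most, but prevents none.
time_untreated = [random.uniform(0.1, 2.0) for _ in range(n)]
delayed = [random.random() < 0.95 for _ in range(n)]
time_treated = [t + 20.0 if d else t for t, d in zip(time_untreated, delayed)]

def risk(times, horizon):
    """Proportion of individuals who die within `horizon` years."""
    return sum(t <= horizon for t in times) / len(times)

print(round(risk(time_treated, 1) / risk(time_untreated, 1), 2))      # roughly 0.05: 1-year risk ratio
print(round(risk(time_treated, 100) / risk(time_untreated, 100), 2))  # 1.0: 100-year risk ratio
```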
Figure 2.2
The denominator of the causal risk ratio, Pr[Y^{a=0} = 1], is the counterfactual risk of death had everybody in the population remained untreated. Let us calculate this risk. In Figure 2.1, 4 out of 8 individuals with L = 0 were untreated, and 1 of them died. How many deaths would have occurred had the 8 individuals with L = 0 remained untreated? Two deaths, because if 8 individuals rather than 4 individuals had remained untreated, then 2 deaths rather than 1 death would have been observed. If the number of individuals is multiplied times 2, then the number of deaths is also doubled. In Figure 2.1, 3 out of 12 individuals with L = 1 were untreated, and 2 of them died. How many deaths would have occurred had the 12 individuals with L = 1 remained untreated? Eight deaths, or 2 deaths times 4, because 12 is 3 × 4. That is, if all 8 + 12 = 20 individuals in the population had been untreated, then 2 + 8 = 10 would have died. The denominator of the causal risk ratio, Pr[Y^{a=0} = 1], is 10/20 = 0.5. The first tree in Figure 2.2 shows the population had everybody remained untreated. Of course, these calculations rely on the condition that treated individuals with L = 0, had they remained untreated, would have had the same probability of death as those who actually remained untreated. This condition is precisely exchangeability given L = 0.
The numerator of the causal risk ratio Pr[Y^{a=1} = 1] is the counterfactual risk of death had everybody in the population been treated. Reasoning as in the previous paragraph, this risk is calculated to be also 10/20 = 0.5, under exchangeability given L = 1. The second tree in Figure 2.2 shows the population had everybody been treated. Combining the results from this and the previous paragraph, the causal risk ratio Pr[Y^{a=1} = 1]/Pr[Y^{a=0} = 1] is equal to 0.5/0.5 = 1. We are done.
Let us examine how this method works. The two trees in Figure 2.2 are
a simulation of what would have happened had all individuals in the popula-
tion been untreated and treated, respectively. These simulations are correct
under conditional exchangeability. Both simulations can be pooled to create a
hypothetical population in which every individual appears as a treated and as
an untreated individual. This hypothetical population, twice as large as the
original population, is known as the pseudo-population. Figure 2.3 shows the
entire pseudo-population. Under conditional exchangeability Y^a ⊥⊥ A|L in the original population, the treated and the untreated are (unconditionally) exchangeable in the pseudo-population because A is independent of L. That is, the associational risk ratio in the pseudo-population is equal to the causal risk ratio in both the pseudo-population and the original population.
Figure 2.3
Formal definition of IP weights. An individual's IP weight depends on her values of treatment A and covariate L. For example, a treated individual with L = l receives the weight 1/Pr[A = 1|L = l], whereas an untreated individual with L = l′ receives the weight 1/Pr[A = 0|L = l′]. We can express these weights using a single expression for all individuals, regardless of their individual treatment and covariate values, by using the probability density function (pdf) of A rather than the probability of A. The conditional pdf of A given L evaluated at the values a and l is represented by f_{A|L}[a|l], or simply as f[a|l]. For discrete variables A and L, f[a|l] is the conditional probability Pr[A = a|L = l]. In a conditionally randomized experiment, f[a|l] is positive for all l such that Pr[L = l] is nonzero.

Since the denominator of the weight for each individual is the conditional density evaluated at the individual's own values of A and L, it can be expressed as the conditional density evaluated at the random arguments A and L (as opposed to the fixed arguments a and l), that is, as f[A|L]. This notation, which appeared in Figure 2.3, is used to define the IP weights W^A = 1/f[A|L]. It is needed to have a unified notation for the weights because Pr[A = A|L = L] is not considered proper notation.
by the inverse of the conditional probability of receiving the treatment level that she indeed received. These IP weights are shown in Figure 2.3.

IP weight: W^A = 1/f[A|L].
IP weighting yielded the same result as standardization (a causal risk ratio
equal to 1) in our example above. This is no coincidence: standardization and
IP weighting are mathematically equivalent (see Technical Point 2.3). In fact,
both standardization and IP weighting can be viewed as procedures to build
a new tree in which all individuals receive treatment level a. Each method uses a
different set of the probabilities to build the counterfactual tree: IP weighting
uses the conditional probability of treatment A given the covariate L (as shown
in Figure 2.1), whereas standardization uses the probability of the covariate L
and the conditional probability of outcome Y given A and L.
Because both standardization and IP weighting simulate what would have
been observed if the variable L (or the variables in the vector L) had not been used
to decide the probability of treatment, we often say that these methods adjust
for L. In a slight abuse of language we sometimes say that these methods
control for L, but this “analytic control” is quite different from the “physical
control” in a randomized experiment. Standardization and IP weighting can
be generalized to conditionally randomized studies with continuous outcomes
(see Technical Point 2.3).
Why not finish this book here? We have a study design (an ideal random-
ized experiment) that, when combined with the appropriate analytic method
(standardization or IP weighting), allows us to compute average causal effects.
Unfortunately, randomized experiments are often unethical, impractical, or untimely.
For example, it is questionable that an ethics committee would have
approved our heart transplant study. Hearts are in short supply and society
favors assigning them to individuals who are more likely to benefit from the
transplant, rather than assigning them randomly among potential recipients.
Also one could question the feasibility of the study even if ethical issues were
ignored: double-blind assignment is impossible, individuals assigned to medical
treatment may not resign themselves to forego a transplant, and there may not
be compatible hearts for those assigned to transplant. Even if the study were
feasible, it would still take several years to complete it, and decisions must be
made in the interim. Frequently, conducting an observational study is the least
bad option.
Technical Point 2.3: Equivalence of IP weighting and standardization. Assume that f[a|l] is positive for all l such that
Pr[L = l] is nonzero. This positivity condition is guaranteed to hold in conditionally randomized experiments. Under
positivity, the standardized mean for treatment level a is defined as

    Σ_l E[Y | A = a, L = l] Pr[L = l]

and the IP weighted mean of Y for treatment level a is defined as

    E[ I(A = a) Y / f[A|L] ],

i.e., the mean of Y, reweighted by the IP weight W^A = 1/f[A|L], in individuals with treatment value A = a. The
indicator function I(A = a) is the function that takes value 1 for individuals with A = a, and 0 for the others.
We now prove the equality of the IP weighted mean and the standardized mean under positivity. By definition of an
expectation,

    E[ I(A = a) Y / f[A|L] ] = Σ_l { (1/f[a|l]) E[Y | A = a, L = l] f[a|l] Pr[L = l] }
                             = Σ_l E[Y | A = a, L = l] Pr[L = l],

where in the final step we cancelled f[a|l] from the numerator and denominator, and in the first step we did not need
to sum over the possible values of A because, for any a′ other than a, the quantity I(a′ = a) is zero. The proof treats
A and L as discrete but not necessarily dichotomous. For continuous L, simply replace the sum over l with an integral.
The proof makes no reference to counterfactuals or to causality. However, if we further assume conditional ex-
changeability, then both the IP weighted mean and the standardized mean are equal to the counterfactual mean E[Y^a].
Here we provide two different proofs of this last statement. First, we prove equality of E[Y^a] and the standardized
mean as in the text:

    E[Y^a] = Σ_l E[Y^a | L = l] Pr[L = l]
           = Σ_l E[Y^a | A = a, L = l] Pr[L = l]
           = Σ_l E[Y | A = a, L = l] Pr[L = l],

where the second equality is by conditional exchangeability and positivity, and the third by consistency. Second, we
prove equality of E[Y^a] and the IP weighted mean as follows:

    E[ I(A = a) Y / f[A|L] ] is equal to E[ I(A = a) Y^a / f[A|L] ] by consistency.

Next, because positivity implies f[a|L] is never 0, we have

    E[ I(A = a) Y^a / f[A|L] ] = E{ E[ I(A = a) Y^a / f[A|L] | L ] }
                               = E{ E[ I(A = a) / f[A|L] | L ] E[Y^a | L] }   (by conditional exchangeability)
                               = E{ E[Y^a | L] }   (because E[ I(A = a) / f[A|L] | L ] = 1)
                               = E[Y^a].

The extension to polytomous treatments (i.e., A can take more than two values) is straightforward. When treatment A
is continuous, which is unlikely in conditionally randomized experiments, effect estimates based on the IP weights
W^A = 1/f[A|L] have infinite variance and thus cannot be used. Chapter 12 describes generalized weights. In
Technical Point 3.1, we discuss that the results above no longer hold in the absence of positivity.
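As a quick numerical check of this equivalence, the sketch below (ours, using the same assumed counts expanded to individual records) computes the IP weighted mean E[I(A = a)Y/f[A|L]] and the standardized mean Σ_l E[Y | A = a, L = l] Pr[L = l] directly; both equal 0.5 for a = 0 and for a = 1.

```python
# Expand the assumed stratum counts into individual records (l, a, y).
records = (
    [(0, 0, 1)] * 1 + [(0, 0, 0)] * 3 +   # L=0, untreated: 1 death among 4
    [(0, 1, 1)] * 1 + [(0, 1, 0)] * 3 +   # L=0, treated:   1 death among 4
    [(1, 0, 1)] * 2 + [(1, 0, 0)] * 1 +   # L=1, untreated: 2 deaths among 3
    [(1, 1, 1)] * 6 + [(1, 1, 0)] * 3     # L=1, treated:   6 deaths among 9
)
n = len(records)  # 20

def f(a, l):
    """Conditional probability Pr[A = a | L = l] estimated from the records."""
    denom = sum(1 for li, _, _ in records if li == l)
    return sum(1 for li, ai, _ in records if li == l and ai == a) / denom

def ip_weighted_mean(a):
    """E[ I(A = a) Y / f[A|L] ] computed as a sample average."""
    return sum(y / f(ai, li) for li, ai, y in records if ai == a) / n

def standardized_mean(a):
    """Sum over l of E[Y | A = a, L = l] Pr[L = l]."""
    out = 0.0
    for l in (0, 1):
        ys = [y for li, ai, y in records if li == l and ai == a]
        out += (sum(ys) / len(ys)) * sum(1 for li, _, _ in records if li == l) / n
    return out

for a in (0, 1):
    print(a, ip_weighted_mean(a), standardized_mean(a))  # both 0.5 for a = 0 and a = 1
```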