Probability and Causality
University of Jena
Preface
What can we do to reduce global warming? How can we prevent another global financial
crisis? How to fight AIDS? What is the effect of being infected by a virus on dying within
a fixed amount of time? And which political measures can save lives in a specific virus
pandemic? These and similar questions ask about causal effects of political and social in-
terventions, of medical or psychological treatments, or of expositions to potentially harmful
or beneficial agents, for example, to a virus or to asbestos dust. Obviously, inter-
ventions based on the wrong causal theories and hypotheses will cost the lives of thou-
sands, huge amounts of money that could be spent more appropriately, and fundamen-
tally change our lives. Even if our daily problems beyond these issues are less dramatic,
they are of the same nature. Just think about your own actions that you have to choose in
your responsibilities as a student, scientist, teacher, physician, psychologist, politician, or
as a parent! Whatever you do has direct, indirect, and total effects, and these effects might
be different if you take one action instead of another one. Furthermore, on the individual
level, these effects differ for different persons. The effects of being infected by a virus may
be much more severe for persons above 80 years of age if they already have severe health
problems as compared to younger persons or those who do not have such problems. It is
this kind of thought that makes me believe that there is no other issue in the methodol-
ogy of empirical sciences that deserves and needs more attention and effort than causality.
And because the dependencies we are investigating are of a nondeterministic nature, we
need a probabilistic theory of causality. In other words, we need to understand probability
and causality.
The simple reason for writing this book is that I wanted to understand the concepts of causality that are and
should be used in our theories in science and in practical life. Existing theories and their
answers never satisfied my own standards of what science should be. For me, 'understanding'
includes being able to define all relevant concepts in terms of mathematics. None of
the mainstream theories, neither Rubin's potential-outcome approach nor Pearl's graphical
modeling approach, satisfies this criterion. Although these pioneers contributed tremendously
to our understanding of causality and to the popularity of the issue in the empirical
sciences, their approaches lack mathematical rigor, thus opening the door to misunderstandings
and to endless and all too often fruitless discussions.
This book presents a mathematical theory of causality. All terms are well-defined in
terms of measure and probability theory, and all relevant propositions have a mathemati-
cal proof. This does not rule out that I have made some mistakes. However, it is now possible to
find and correct them, should they have occurred. And, of course, the theory is still
far from being complete.
Empirical causal research involves several inferences and interpretations. Among these
are:
(a) Statistical inference, that is, the inference from a data sample to parameters charac-
terizing the distributions of random variables.
(b) Causal inference, that is, the inference from parameters characterizing the distribu-
tions of random variables to causal effects and/or dependencies.
(c) The substantive interpretation or meaning of the putative cause.
(d) The substantive interpretation or meaning of the outcome variable.
(e) The specification of the random experiment considered.
This book does not deal with all these points. We will neither discuss the mathematics of
statistical inference nor the content issues of construct validity or external validity (Camp-
bell & Stanley, 1963; Cook & Campbell, 1979; Shadish, Cook, & Campbell, 2002) involved
in points (c) to (e). Instead we will focus on the second point: causal inference, that is,
the inference from true parameters (i. e., not their estimates) to causal effects. Parameters
characterizing the distributions of random variables such as the conditional expectation
values of an outcome variable in two treatment conditions have, per se, no causal inter-
pretation. The inference from true parameters to causal effects is what the probabilistic
theory of causal effects and this book are about. As will be shown, causal effects are also
parameters that characterize the joint distributions of the random variables considered
in a random experiment. However, their definitions are less obvious than ‘ordinary’ con-
ditional expectation values and their differences. And, sometimes causal effects are iden-
tical to differences between ordinary conditional expectation values and sometimes they
are not. In other words, sometimes we can have a positive difference between conditional
expectation values that seemingly indicates a positive treatment effect, whereas the causal
effect is negative, and vice versa.
Basic Idea
In order to get a first impression of what this means, let us briefly formulate the basic idea
that can most easily be explained if the putative (or presumed) cause is a treatment or in-
tervention variable. Suppose an individual, or in more general terms, an observational unit
(this can also be a country), could be treated by condition 1 or it could be treated by con-
dition 0, everything else being invariant. If there is a difference in the outcome considered
(some measure of success of the treatment), then this difference is due to the difference in
the two treatment conditions. This idea goes back at least to Mill (1843/1865).
Multiple Determinacy
The problem with this first version of the basic idea is that most outcomes are multiply
determined, that is, they are not only influenced by the treatment variable, but by many
other variables as well. In the field of agricultural research, for example, the yield (outcome)
of a variety does not only depend on the variety (treatment) itself, but it also depends
on the quality of the plot (observational unit), such as the average hours of sunshine on
the plot per day, the amount of water reaching the plot, and the number of microbes in
the plot, and so on. Although Mill’s idea sounds perfect, it is not immediately clear which
implications it has for practice, because the number of other causes is often too large to
keep all of them constant. Furthermore, Mill's idea fails to distinguish between poten-
tial confounders and intermediate variables. Holding constant all intermediate variables
as well — and not only all pretreatment variables — would imply that there is no treatment
effect any more, if we assume that all treatment effects have to be transmitted by some
intermediate variables.
Because of the problem of multiple determinacy, Mill's conception has been comple-
mented by Sir Ronald A. Fisher (1925/1946) and by Jerzy S. Neyman (1923/1990) in the
second and third decades of the last century. Simply speaking, emphasizing and propa-
gating the randomized experiment, Fisher replaced the ceteris paribus clause (‘everything
else invariant’) by the ceteris paribus distributionibus clause: all other possible causes (the
‘pretreatment variables’) having the same distribution. This is what randomized assign-
ment of units to treatment conditions, for example, based on a coin flip, secures.
The Invisible Man
Imagine an invisible man. Although we cannot see him, suppose we know that he is there,
because we can see his shadow. Furthermore, suppose we would like to measure his size.
Doing that, we have two problems, a theoretical and a practical one. The theoretical prob-
lem is to define size. We have to clarify that we do not mean ‘volume’ or ‘weight’, but
‘height’ — without shoes, and without hat and hair. Unfortunately, actual height varies
slightly in the course of a day. Hence, we define size to be the expectation (with respect
to the uniform distribution over the 24 hours) of the momentary heights. This solves the
theoretical problem; now we know what we want to measure.
However, because the man is invisible, we cannot measure his size directly — and this
is not only because his size slightly varies over the day. The crucial problem is that we
can only observe his shadow. And this is the practical problem: How to determine his size
from his shadow? Sometimes, there is almost no shadow at all, sometimes it is huge. Some
geometrical reflection yields a first simple solution: measuring the shadow when the sun
has an angle of 45°. But what if it is winter and the sun does not reach this angle? Now we
need more geometrical knowledge, taking into account the actual angle of the sun and the
observed length of the shadow. This will yield an exact measure of the size of the invisible
man at this time of the day as well.
When determining a causal effect, we face the same kind of problems. First, we have to define
a causal effect, and second, we have to find out how to determine it from empirically es-
timable parameters such as true means, that is, from conditional expectation values. The
simple solution — corresponding to the 45° angle of the sun in the metaphor — is the
perfect randomized experiment. The sample mean differences we observe in a random-
ized experiment only randomly deviate from the causal effect (due to random sample
variation). In contrast, in quasi-experiments and observational studies, solutions to the
practical problem are more sophisticated. They are also more sophisticated than in the
metaphor of the invisible man, because it is not only one other variable (the angle) that
determines the length of the shadow; instead there often are many other variables sys-
tematically determining the sample means as well as the true means that are estimated
by these sample means. This is again the problem of multiple determinacy. Furthermore,
the true parameter being estimated may even be negative although the true causal effect is posi-
tive, and vice versa. And this reversal can be systematic, not merely due to
sampling error.
This book presents a solution to the theoretical and the practical problems mentioned
above. Unfortunately, neither solution is as simple and obvious as in our metaphor.
Furthermore, there is not only one single kind of causal effects, even if we restrict ourselves
to total causal effects and do not consider direct and indirect effects.
To our knowledge, the first pioneer tackling the theoretical and the practical problems was
Jerzy S. Neyman (1923/1990). While Fisher propagated the design technique of random-
ization, Neyman introduced the concepts of total individual and average causal effects,
thus attempting a first solution to the theoretical problem mentioned above. (Note, how-
ever, that he used different terms for these concepts). Developing statistical methods for
agricultural research, he assumed that, for each individual plot, there is an intra-individual
(i. e., plot-specific) distribution of the outcome variable, say Y , under each treatment.
He then defined the individual causal effect of treatment x compared to treatment x ′ to
be the difference between the intra-individual (plot-specific) expectation of Y (the “true
yield”) given treatment (“variety”) x and the intra-individual (plot-specific) expectation of
Y given treatment (“variety”) x ′ . Once the individual causal effect is defined, the average
treatment effect of x compared to x ′ on Y is simply the expectation (true mean) of the cor-
responding individual (plot-specific) causal effects in the set (population) of observational
units (plots). Similarly, several kinds of conditional effects can be defined, conditioning, for
instance, on covariates, that is, on other causes of Y that cannot be affected by X , such as
measures of the quality of the soil before treatment, average hours of sunshine, average
hours of rain, and so on.
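In the notation used later in this book (see section 1.1.4 and ch. 5), these two concepts can be sketched roughly as follows; this is only a preview, and the precise definitions, subscripts, and underlying assumptions are introduced there:

ITE x x′ (u ) = E (Y | X =x ,U =u ) − E (Y | X =x′ ,U =u ),     ATE x x′ = E (ITE x x′ (U )),

where u denotes an observational unit (a plot), x and x′ two treatment conditions (varieties), and U the random variable whose value is the unit drawn.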
At about the same time as Neyman and Fisher developed their ideas, Sewall Wright
(Wright, 1918, 1921, 1923, 1934, 1960a, 1960b) developed his ideas on path analysis and the
concepts of total, direct, and indirect effects. While his total effect aims at the same idea
as the causal average total effect, his direct and indirect effects were new. Simply speaking,
in the context of an experiment or quasi-experiment, a direct effect of the treatment is the
effect that is not transmitted through the intermediate variables; it is the conditional effect
of the treatment variable, holding the intermediate variables constant at one of their
values. In contrast, the indirect effect is the difference between the total effect and the direct
effect.
Whereas the basic ideas outlined above are relatively simple and straightforward, trying to
put them into practice — that is, solving the practical problem mentioned above — is of-
ten difficult and needs considerable sophistication. The “fundamental problem of causal
inference” (Holland, 1986) is that we cannot expose an observational unit to treatment 1
and, at the same time, to treatment 0. However, this is exactly what is necessary if we want
to be sure that ‘everything else is invariant’, a clause that is also an implicit assumption in
the solution proposed by Neyman. Comparing the true yield of treatment 1 to treatment
0 within the same plot at the same time and under identical conditions is an ideal version of the
ceteris paribus clause, which unfortunately is rarely achievable.
Pre-Post Designs
If we choose to first observe a unit under ‘no treatment’ and then observe it again after
‘treatment’, we may be tempted to interpret the pre-post differences as estimates of the
individual causal effects of the treatment given in between. However, this interpretation
might be wrong, because the unit may have developed (matured, learned), may have
suffered from critical life events, may have experienced historical change, and so on (see,
e. g., Campbell & Stanley, 1963; Cook & Campbell, 1979; Shadish et al., 2002). Hence, in
these pre-post designs or synonymously, within-group designs, we have to make assump-
tions on the nature of these possible alternative interpretations of the pre-post compar-
isons, for example, that they do not hold in the application considered or that they have a
certain structure that can be taken into account when making causal inferences based on
pre-post comparisons.
Between-Group Designs
If, instead of making comparisons within a unit, we compare different units to each other
in between-group experiments, we certainly lose the possibility of estimating the individ-
ual causal effects. However, what we can hope for is that we are still able to estimate
the causal average total effect and certain causal conditional total effects. But how to es-
timate the average of the causal individual total effects if, due to the fundamental problem
of causal inference, the causal individual total effects are not estimable? Both between-group
experiments and quasi-experiments have a set of (observational) units, at least two
experimental conditions (‘treatment conditions’, ‘expositions’, ‘interventions’, etc.), and at
least one outcome variable (‘response’, ‘criterion’, ‘dependent variable’) Y . In the medical
sciences, the units are usually patients. In psychology the observational units are often
persons, but it could be persons-in-a-situation, or groups as well. In economics it could
be subjects, companies, or countries, for instance. In educational sciences the units might
be school classes, schools, communities, districts, or countries. In sociology and the polit-
ical sciences, the units could be persons, but also communities, countries, and so on. In
this book we show how to define and also how to make inferences about the average of the
causal individual total effects in such sets (and subsets) of observational units and about
causal conditional total effects, conditioning on attributes of the observational units or on
pretest scores, for instance.
In order to delineate the scope of the theory, consider the following kind of random exper-
iment : Draw an observational unit u (e. g., a person) out of a set of units, observe the value
z of a (possibly multivariate qualitative or quantitative) covariate Z for this unit, assign the
unit or observe its assignment to x, one of several experimental conditions, and record the
numerical value y of the outcome variable Y . We will use U to denote the random variable
representing with its value u the unit drawn. Note that many observations can be made
additionally to observing U , Z , X , and Y . Although this single-unit trial is a prototype of
the kind of empirical phenomena the theory is dealing with, there are other single-unit
trials in which the theory can be applied as well (see ch. 2). In fact, the theory is applicable
far beyond the true (i. e., the randomized experiment) and the quasi-experiment. This in-
cludes applications in which the putative causes are not manipulable. In this volume, we
also treat the case in which the putative cause is a continuous random variable (see, e. g.,
the causality conditions treated in chs. 8 or 9). The theory has its limitations only if there
is no clear time order of the random variables considered as putative causes or outcomes.
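To make the structure of such a single-unit trial concrete, here is a minimal simulation sketch. All ingredients (the set of units, the covariate values, and the numerical probabilities) are hypothetical and serve only to make the three sampling steps explicit; they are not part of the theory.

```python
import random

# Hypothetical ingredients of a single-unit trial (illustration only).
units = ["u1", "u2", "u3", "u4"]                       # set of observational units
covariate = {"u1": "low", "u2": "low",
             "u3": "high", "u4": "high"}               # value z of the covariate Z per unit
p_treatment = {"u1": 0.2, "u2": 0.3,
               "u3": 0.7, "u4": 0.8}                   # P(X=1 | U=u), unit-dependent
p_success = {("u1", 0): 0.4, ("u1", 1): 0.5,           # P(Y=1 | U=u, X=x)
             ("u2", 0): 0.4, ("u2", 1): 0.6,
             ("u3", 0): 0.3, ("u3", 1): 0.5,
             ("u4", 0): 0.2, ("u4", 1): 0.5}

def single_unit_trial():
    """One run of the random experiment: draw a unit, observe its covariate value,
    observe (or assign) its treatment condition, and record the outcome."""
    u = random.choice(units)                             # draw an observational unit
    z = covariate[u]                                     # observe the value z of Z
    x = 1 if random.random() < p_treatment[u] else 0     # assignment or self-selection
    y = 1 if random.random() < p_success[(u, x)] else 0  # record the value y of Y
    return u, z, x, y

# In empirical research, the single-unit trial is repeated n times (n = sample size).
print([single_unit_trial() for _ in range(10)])
```

A randomized experiment would correspond to the special case in which the unit-dependent dictionary p_treatment is replaced by one fixed, known treatment probability for all units, for example .5 for a coin flip.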
The single-unit trial described above is a random experiment, but not necessarily a ran-
domized experiment. A randomized experiment is a special random experiment in which
the drawn unit is assigned to one of the treatment conditions via randomization, for ex-
ample, depending on the outcome of a coin flip. (In empirical research, the single-unit
trials are repeated n times, where n denotes the sample size.) Referring to single-unit tri-
als, we can distinguish the true experiment from the quasi-experiment as follows: In the
true experiment, there are at least two treatment conditions and the assignment to one of
the treatment conditions is randomized, for example, by flipping a coin. In a traditional
randomized experiment, for instance, the treatment probabilities are chosen to be equal
for all units. However, equal treatment probabilities for all units are neither essential for
the definition of the true experiment nor for drawing valid causal inferences. We may as
well have treatment probabilities depending on the units and/or on a covariate (for more
details, see, e. g., Rem. 8.59), as long as these treatment probabilities are fixed or known
by the researcher. Note, however, that in designs in which different units have different
treatment probabilities, standard techniques of data analysis such as t-tests or analysis of
variance no longer test hypotheses about causal effects.
For between-group designs, the quasi-experiment may be defined such that there are at
least two treatment conditions; however, in contrast to the true experiment, the treatment
probabilities are unknown. Nevertheless, valid causal inferences can be drawn in quasi-
experiments provided that we can rely on certain assumptions (see the causality conditions
treated in Part III of this book). In specific applications these assumptions might be wrong.
If they are actually wrong, causal inferences can be completely wrong as well.
The Methodologist
In the first place, I would like to address the methodologist, that is, the expert in empiri-
cal research methodology, especially in the social, economic, behavioral, cognitive, medi-
cal, agricultural, and biological sciences. This book provides answers to some of the most
important and fundamental questions of these empirical sciences: What do we mean by
terms like ‘X affects Y ’, ‘X has an effect on Y ’, ‘X influences Y ’, ‘X leads to Y ’, and so on
used in our informal theories and hypotheses? How can we translate these terms into a
precise language (i. e., probability theory) that is compatible with the statistical analysis of
empirical data? How to design an empirical study and how to analyze the resulting data
if we want to probe our theories and learn from such data about the causal dependencies
postulated in our theories and hypotheses? And last but not least: How to evaluate
interventions, treatments, or expositions to (possibly detrimental) environments, and learn
which effects they have for which kinds of subjects or observational units, and
under which circumstances?
The Statistician
Many statisticians believe that causality is beyond the horizon of their profession. Causal-
ity might be a matter for empirical researchers and philosophers, they say, but not for
statisticians. They think that it cannot be treated mathematically and that therefore a statistician
should refrain from causal interpretations. As a consequence, they ignore the issue of
causality. This book proves that these beliefs are prejudices. The theory of causal effects,
as presented here, is a branch of probability theory, which itself, at least since Kolmogorov
(1933/1977), is a part of pure mathematics — although with an enormous potential for
applications in many empirical sciences and even beyond. The main purpose of this book
is to translate the informal concepts about causal effects shared by many methodologists
and applied statisticians into well-defined terms of mathematical probability theory. The
principle is not to use any term that itself is not defined in other mathematical terms, and
the result is a purely mathematical theory of causal effects. Of course, this will make it
harder to read this book for the methodologist and those not yet trained in probability
theory. However, the reward is a much deeper understanding of what is essential and a
much better grasp of the nature of our theories about the real world.
Of course, undefined terms are still used in this book, but only in the examples, in
the interpretations, and in the motivations of the definitions. The theory itself is pure
mathematics, just in the same way as Kolmogorov’s probability theory presented in 1933,
which explicated the mathematical, measure-theoretical structure of probabilistic con-
cepts. Substantive meaning results, for example, if we interpret the core components of
the formal structure in a specific random experiment considered. And this is also true for
the theory of causal effects presented in this book.
The Empirical Scientist
The empirical scientist in the fields mentioned above has at least three good reasons to
study this book. The first is that some crucial parts of his theories and hypotheses are
explicated, at least when it comes to considering a concrete experiment or study. Ambiguous
causal terms such as ‘X affects Y ’, ‘X has an effect on Y ’, ‘X influences Y ’,
and ‘X leads to Y ’ are no longer necessary. Reading this book will make it possible to re-
place these ambiguous terms by well-understood and well-defined terms, improving the
precision of empirical research and theories.
The second motivation of the empirical scientist is that even if he knows his own theo-
retical concepts and hypotheses, he still has to know how to design experiments and stud-
ies that enable him to test them empirically.
Third, the standard ways of analyzing data offered in the textbooks of applied statistics
and in the available computer programs often do not estimate and test the causal effects
and dependencies we refer to in our theories. And this is not only bad for the empirical
scientist but also for all those relying on the validity of his inferences and his expertise. Just
think about all the harmful consequences of wrong causal theories in various empirical
research fields, if they are applied to solving concrete problems!
There are two messages for those who do their research with experiments, a good one and
a bad one. The good news is that, in a perfect randomized experiment, the causal average
total treatment effect is indeed estimated when comparing sample means between two
different treatment conditions. The bad news is that we cannot rely on randomized as-
signment of units to treatment conditions when it comes to estimating direct and indirect
effects. More specifically, in such an analysis it is usually not sufficient to consider only the
intermediate, treatment, and outcome variables. Instead we also have to include in
our analysis pre-treatment variables such as a pre-test of the intermediate variable and a
pre-test of the outcome variable and apply adjustment methods, very much in the same
way as we have to use these techniques in quasi-experiments. Hence, if we want to study
the black box between the treatment and the outcome variables, we have to adopt the
techniques of causal modeling that are far beyond traditional comparisons of means and
analysis of variance. (For more details see, e. g., Mayer, Thoemmes, Rose, Steyer, & West,
2014).
The Philosopher of Science
Philosophers of science study and teach the methodology of empirical sciences. In that
respect, their task is very similar to that of the methodologist, perhaps only more gen-
eral and less specific for a certain discipline. Therefore, it is not surprising that probabilis-
tic causality has also been tackled by philosophers of science (see, e. g., Cartwright, 1979;
Spohn, 1980; Stegmüller, 1983; Suppes, 1970). Compared to these approaches, our empha-
sis is more on those parts of the theory that have implications for the design of empirical
studies and the analysis of data resulting from such studies.
For reasons detailed before, I believe that the probabilistic theory of causal effects and de-
pendencies is the most rewarding topic in methodology. Although it is tough to get into it,
you will gain insights into why all this methodology stuff was useful and what it was good for. At
least this is what many of my students said at the end of their curriculum, even if they did
not have the choice whether or not to take the courses on causal effects.
Several research traditions have been contributing to the theory of causal effects in vari-
ous ways. From the Neyman-Rubin tradition, I adopted the idea that it is important to de-
fine various causal effects such as individual, conditional, and average total effects, even
though we modified and extended these concepts in some important aspects. Defining
causal effects is important for proving that certain methods of data analysis yield unbi-
ased estimates of these effects if certain assumptions can be made. Are there conditions
under which the analysis of change scores (between pre- and post-tests) and repeated-
measures analysis of variance yield causal effects? Under which conditions do we estimate
and test causal effects in the analysis of covariance? Which are the assumptions under
which propensity score methods yield estimates of causal effects? Which are the assump-
tions under which an instrumental variable analysis estimates a causal effect? All these
questions and their answers presuppose that we have a mathematical definition of causal
effects. Simply speaking, Rubin’s potential outcome variables are replaced by the true out-
come variables (see ch. 5), allowing for variance in the outcome (or response) variables
given treatment and an observational unit. Many important results of the theory, for exam-
ple, about strong ignorability and about propensity scores remain unchanged, while other
results are new, provide more insight, and open the door to new research techniques.
From the Campbellian tradition (see, e. g., Campbell & Stanley, 1966; Cook & Camp-
bell, 1979; Shadish et al., 2002) we learned that there are questions and problems beyond
the theory of causal effects itself that are relevant in empirical causal research, such as: How
to generalize beyond the study? What does the treatment variable actually mean from a
substantive point of view? What is the meaning of the outcome variable? And, perhaps the
most general question: Are there alternative explanations for the effect? The vast major-
ity of social scientists (including myself) have been educated in this research tradition to
some degree. Although this training is still very useful as a general methodology frame-
work, it lacks precision and clarity in a number of issues, and the definition of a causal
effect is one of them: it remains unnecessarily vague in the ideas dealing with internal
validity.
From the graphical modeling tradition (see, e. g., Cox & Wermuth, 2004; Pearl, 2009;
Spirtes, Glymour, & Scheines, 2000), we learned that conditional independence plays an
important role in causal modeling. This research tradition has also been developing tech-
niques to estimate causal effects and to search for causal models if specific assumptions
can be made. The fact that randomization in a true experiment in no way guarantees the
validity of causal inferences on direct effects has been brought up by this research tradi-
tion.
The traditions of structural equation modeling and psychometrics have taught us how to use la-
tent variables and structural models in testing causal hypotheses. Due to a
number of statistical programs such as AMOS (Arbuckle, 2006), EQS (Bentler, 1995), lavaan
(Rosseel, 2012), LISREL (Jöreskog & Sörbom, 1996/2001), Mplus (Muthén & Muthén, 1998-
2007), OpenMx (OpenMx, 2009), RAMONA (Browne & Mels, 1998), structural equation
modeling became extremely popular in the social sciences. Although many users of these
programs hope to find causal answers, it should be clearly stated that structural equation
modeling — and this is true for all kinds of statistical models (including analysis of vari-
ance) — neither automatically estimates and tests causal effects nor provides a
satisfactory theory of causal effects and dependencies. Nevertheless, this research tradi-
tion contributes — just like other areas of statistics — a number of statistical techniques
that can be very useful in causal modeling.
This book is aimed at embedding — and, where necessary, extending — conventional
statistical procedures such as analysis of covariance, nonorthogonal analysis of variance,
and latent variable modeling, but also more recent techniques based on propensity scores
into a coherent theory of probabilistic causality.
Outline of Chapters
This book is written such that standard mathematical probability theory is sufficient for a
complete understanding, provided one takes the time that these topics require. In many
parts, this is not a book one can just read; instead it is a book to be studied. This includes
working on the questions and exercises provided in each chapter. We presume that the
reader is familiar with — or learns while studying this book — the essentials of proba-
bility theory, including not only random variables and their distribution, but also con-
ditional expectations and conditional independence. These essentials of probability the-
ory are extracted in Steyer (2024) from the more complete and detailed book Steyer and
Nagel (2017). Both books are also referred to very often for definitions, theorems, and
other propositions used in this text. The references to Steyer (2024) are abbreviated by
RS-Definition, RS-Theorem, RS-Remark, or RS-(3.3), the latter referring to an equation or
a proposition in that book. Similarly, references to Steyer and Nagel (2017) are abbreviated
by SN-Definition, SN-Theorem, SN-Remark, or SN-(10.32), for example.
The largest part of this book is devoted to the theory of causal effects. The Causal Ef-
fects Explorer (Nagengast, Kröhne, Bauer, & Steyer, 2007) can be used for exploring prima
facie effects, conditional and average total effects given certain parameters. Furthermore,
the program EffectLiteR (Mayer, Dietzfelbinger, Rosseel, & Steyer, 2016) can be used to
estimate conditional and average total effects from empirical data in experiments and
quasi-experiments. Both programs, which are available at www.causal-effects.de, may be
used together with this book in a course on causal modeling. In fact, this is the content of
my workshops on the analysis of causal conditional and average total effects, which are
available both as videos-on-demand on the internet and on DVDs, again at www.causal-effects.de.
Acknowledgements
This book has been written with the help of several colleagues and students. Werner Nagel
(FSU Jena) helped whenever I felt lost in probability spaces. Stephen G. West (Arizona
State University) and Felix Thömmes (Cornell University) made detailed suggestions for
improving readability of the book. Safir Yousfi and Sonja Hahn contributed and/or sug-
gested concrete ideas, Sonja being extremely helpful also in checking some of the math-
ematics. Our students Franz Classe, Lisa Dietzfelbinger, Niclas Heider, Marc Heigener,
Remo Kamm, Lawrence Lo, David Meder, Marita Menzel, Yuka Morikawa, Sebastian Nit-
sche, Fabian Schäfer, Michael Temmerman, Sebastian Weirich, Anna Zimmermann, and
other students critically commented on previous versions, helped to minimize errors, or helped
to organize the references. Uwe Altmann, Linda Gräfe, Sven Hartenstein, Ulf Kröhne, Axel
Mayer, Marc Müller, Christof Nachtigall, Benjamin Nagengast, Andreas Neudecker, Ivailo
Partchev, Jan Plötner, Steffi Pohl, Norman Rose, Marie-Ann Sengewald, and Andreas Wolf
together with the others mentioned above provided the intellectual climate in which this
book could be written. I am also grateful to the students and colleagues participating in
our courses on the analysis of causal effects for asking questions and making important com-
ments. Over the years, this helped a lot to improve this book.
Contents

Part I Introduction

1 Introductory Examples
  1.1 Example 1 — Joe and Ann With Self-Selection
    1.1.1 Joint Probabilities P (X =x , Y =y )
    1.1.2 Marginal Probabilities P (X =x ) and P (Y =y )
    1.1.3 Prima Facie Effect
    1.1.4 Individual Total Effects
    1.1.5 Prima Facie Effect Versus Expectation of the Individual Total Effects
    1.1.6 How to Evaluate the Treatment?
  1.2 Example 2 — Experiment With Two Nonorthogonal Factors
    1.2.1 Prima Facie Effects
    1.2.2 (Z =z)-Conditional Prima Facie Effects
    1.2.3 Average of the (Z =z)-Conditional Prima Facie Effects
    1.2.4 Individual Effects
    1.2.5 Average of the Individual Effects
    1.2.6 (Z =z)-Conditional Total Effects
    1.2.7 How to Evaluate the Treatment?
  1.3 Summary and Conclusion
  1.4 Exercises
Part II Basic Concepts of the Theory of Causal Total Effects

3 Time Order
  3.1 Filtration
  3.2 Prior-To Relations
    3.2.1 Properties of the Prior-to Relation of Measurable Set Systems
Part III Causality Conditions
10 Unconfoundedness
  10.1 Unconfoundedness Conditions
  10.2 Sufficient Conditions of Unconfoundedness
    10.2.1 Fisher Conditions
    10.2.2 Suppes-Reichenbach Conditions
  10.3 Hybrid Sufficient Conditions of Unconfoundedness
  10.4 Example
  10.5 Implications of Unconfoundedness on Unbiasedness
    10.5.1 Unbiasedness of E (Y |X )
    10.5.2 Unbiasedness of E (Y |X , Z ), E X =x (Y |Z ), and E X =x (Y |Z =z)
  10.6 Expectation Stability of Prima Facie Effects and Effect Functions
  10.7 Summary and Conclusions
  10.8 Proofs
  10.9 Exercises

References
List of Figures
4.1 The filtration (Ft )t ∈T and various σ-algebras in a regular causality space
6.1 The person variable U , the function g 1 , and their composition, the true outcome variable τ1 = g 1 (U )
Introduction
Chapter 1
Introductory Examples
For more than a century there have been examples in the statistical literature showing that
comparing means or comparing probabilities (e. g., of success of a treatment) between a
group exposed to a treatment and a comparison group (unexposed or exposed to a dif-
ferent treatment) does not necessarily answer our questions: ‘Which treatment is better
overall?’ or ‘Which treatment is better for which kind of person?’ Differences between true
means and differences between probabilities (or any other comparison between probabil-
ities such as odds ratios, log odds ratios, or relative risk) are usually not the treatment ef-
fects we are looking for (see, e. g., Pearson, Lee, & Bramley-Moore, 1899; Yule, 1903; Simp-
son, 1951). They are just effects at first sight or “prima facie effects” (Holland, 1986).
Just like the shadow in the metaphor of the invisible man presented in the preface,
prima facie effects reflect the effects of the treatment (the size of the invisible man), but
also the effects of other causes (the angle of the sun). The goal of analyzing causal effects
is to estimate the effect of the treatment alone, isolating it from other potential determi-
nants, such as sex, educational background, socio-economic status, and so on. The general
idea is to define and, in applications, estimate a treatment effect that is not biased by pre-
existing differences between treatment groups that would also be observed after treatment
if there were no treatment effect at all.
Overview
We illustrate systematic bias in determining total (as opposed to direct or indirect) treat-
ment effects in quasi-experiments by two examples. The first one deals with a dichoto-
mous outcome variable, the second with a quantitative one. Note that the problems de-
scribed in these two examples cannot occur in a randomized experiment, but they are
ubiquitous in nonrandomized quasi-experimental observational studies.
1.1 Example 1 — Joe and Ann With Self-Selection

In this example, the prima facie effect reverses if we switch from comparing the condi-
tional probabilities of success between treatment and control, that is, from comparing
P (Y =1|X =1) to P (Y =1|X =0), to comparing the corresponding person-specific conditional
probabilities P (Y =1|X =x ,U =u ) for Joe and Ann separately.

Table 1.1

Person u   P(U =u )   P(X =1|U =u )   P(Y =1|U =u , X =0)   P(Y =1|U =u , X =1)
Joe        .5         .04             .7                    .8
Ann        .5         .76             .2                    .4
This kind of phenomenon, which is already known at least since Yule (1903), is called
Simpson’s paradox (Simpson, 1951), and it is still being debated (see, e. g., Hernán, Clayton,
& Keiding, 2011; Wang & Rousseau, 2021).
Table 1.1 shows the parameters specifying a random experiment that is composed of
three parts.
(1) A person is sampled from a set of two persons, Joe and Ann, with identical proba-
bilities for each person u, that is, with probability P (U =u ) = .5.
(2) If Joe is sampled, then he obtains treatment (X =1) with probability P (X =1|U =Joe )
= .04. If Ann is sampled, then she is treated with probability P (X =1|U =Ann) = .76.
(These numbers may reflect self-selection to treatment and the different inclina-
tions of the two persons to go to treatment.)
(3) If Joe is sampled and not treated, then his probability P (Y =1| U =Joe , X =0) of suc-
cess is .7. If he is sampled and treated, then his probability P (Y =1| U =Joe , X =1)
of success is .8. In contrast, if Ann is sampled and not treated, then her probability
P (Y =1| U =Ann, X =0) of success is .2, and if she is sampled and treated, then her
probability P (Y =1| U =Ann, X =1) of success is .4.
This table describes a random experiment and it contains all information we need to
compute the causal total effects of the treatment on the outcome variable Y (success),
including the causal conditional total effects given the person and the causal average total
effect of the treatment variable. These terms will be defined in chapter 5 and computed for
this example in section 1.1.4.
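The following small script, which is reused further below, derives the probabilities of all combinations of unit, treatment, and success from the eight probabilities given in Table 1.1, using P(U=u, X=x, Y=y) = P(U=u) · P(X=x|U=u) · P(Y=y|U=u, X=x). The variable names are ours and only serve the illustration.

```python
from itertools import product

# Parameters of Table 1.1 (Example 1: Joe and Ann with self-selection).
p_u = {"Joe": 0.5, "Ann": 0.5}                        # P(U=u)
p_x1_given_u = {"Joe": 0.04, "Ann": 0.76}             # P(X=1 | U=u)
p_y1_given_ux = {("Joe", 0): 0.7, ("Joe", 1): 0.8,    # P(Y=1 | U=u, X=x)
                 ("Ann", 0): 0.2, ("Ann", 1): 0.4}

# Probabilities of the eight elementary events:
# P(U=u, X=x, Y=y) = P(U=u) * P(X=x | U=u) * P(Y=y | U=u, X=x).
p_uxy = {}
for u, x, y in product(p_u, (0, 1), (0, 1)):
    p_x = p_x1_given_u[u] if x == 1 else 1 - p_x1_given_u[u]
    p_y = p_y1_given_ux[(u, x)] if y == 1 else 1 - p_y1_given_ux[(u, x)]
    p_uxy[(u, x, y)] = p_u[u] * p_x * p_y

for event, prob in sorted(p_uxy.items()):
    print(event, round(prob, 3))   # e.g., ('Joe', 0, 1) -> 0.336 and ('Ann', 1, 1) -> 0.152
```

These eight numbers are exactly the probabilities of the elementary events used throughout the remainder of this section.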
Note that Table 1.1 does not describe a randomized experiment, in which, by definition,
the treatment probabilities P (X =1|U =u ) would be identical for all observational units u.
Instead, it describes a random experiment, which is that kind of empirical phenomenon
that we usually consider when we apply probability theory using terms such as random
variables, their expectations, variances, distribution, correlations, etc. In inferential statis-
tics it is those concepts about which we formulate our hypothesis and that we try to esti-
mate in a sample.
In probability theory, we consider such a random experiment from the pre facto per-
spective. Hence, we do not consider data that would result from actually conducting such
a random experiment. Data are only important in order to learn from observations about
the laws of a random experiment. Data analysis is only a way to learn about these laws.
But it is these laws of the random experiment that are of primary interest. More precisely, if
we know the eight probabilities displayed in Table 1.1, then we have all the information
that we need to compute the causal conditional and average total effects of the treatment
on the outcome (success). All it needs is to define these concepts in terms of probability
theory, and this is what this book is about. Causal effects are nothing philosophical or even
metaphysical. Instead, they are parameters that can be computed from the probabilities
of events and the distributions of random variables pertaining to the random experiment
considered.

Table 1.2

Outcome ωi           P({ωi })   Unit U   Treatment X   Success Y   P(X =1|U )   P(Y =1|X ,U )   P(Y =1|X )
ω1 = (Joe, no, −)    .144       Joe      0             0           .04          .7              .60
ω2 = (Joe, no, +)    .336       Joe      0             1           .04          .7              .60
ω3 = (Joe, yes, −)   .004       Joe      1             0           .04          .8              .42
ω4 = (Joe, yes, +)   .016       Joe      1             1           .04          .8              .42
ω5 = (Ann, no, −)    .096       Ann      0             0           .76          .2              .60
ω6 = (Ann, no, +)    .024       Ann      0             1           .76          .2              .60
ω7 = (Ann, yes, −)   .228       Ann      1             0           .76          .4              .42
ω8 = (Ann, yes, +)   .152       Ann      1             1           .76          .4              .42
Table 1.2 describes the same random experiment as Table 1.1, but in a different way.
The eight triples such as (Joe, no, −) or (Ann, yes, +) represent one of the eight possible
outcomes ω1 , . . . , ω8 of the random experiment that are gathered in the set Ω of possible
outcomes. Remember, an event A is a subset of Ω that has a probability P (A), which is
assigned by the probability measure P to each element A in the set A of all events (see RS-
ch. 1 for these elementary concepts of probability theory). The eight probabilities of the
elementary events contain the same information as the eight probabilities in Table 1.1. All
conditional probabilities appearing in Table 1.2, but also all probabilities and all condi-
tional probabilities presented in Table 1.1 can be computed from these eight probabilities
of the elementary events.
Table 1.2 has the virtue of explicitly showing all possible outcomes of the random ex-
periment considered. Furthermore, it shows how the random variables U , X , and Y are
defined, showing the assignments of their values to each of the eight possible outcomes
of the random experiment (see RS-Def. 2.2 for the definition of a random variable). It also
displays the conditional probabilities P (Y =1|X ,U ), P (Y =1|X ), and P (X =1|U ), which are
random variables on the same probability space as the observables, that is, the random
variables U , X , and Y (see RS-Def. 4.1 and RS-Rem. 4.12 for the definition of such a con-
ditional probability).
The crucial point is that each of the conditional probabilities mentioned above also
assigns a value to each of the eight possible outcomes of the random experiment. For
example, the values assigned by P (X =1|U ) to each outcome ωi ∈ Ω are the conditional
probabilities P (X =1|U =u ). More precisely,
P (X =1|U )(ωi ) = P (X =1|U =u ),   if U (ωi ) = u .
Similarly,
P (Y =1|X ,U )(ωi ) = P (Y =1|X =x ,U =u ),   if X (ωi ) = x and U (ωi ) = u ,
and
P (Y =1|X )(ωi ) = P (Y =1|X =x ),   if X (ωi ) = x .

Table 1.3

            Treatment
Success     No (X =0)   Yes (X =1)
No (Y =0)   .240        .232         .472
Yes (Y =1)  .360        .168         .528
            .600        .400         1.000
Note. The entries in the four cells are the joint probabilities P(X =x ,Y =y), the other entries are
the marginal probabilities P(X =x ) (last row) and P(Y =y ) (last column).
[Figure 1.1: bar chart of the probability of success in the control condition, P (Y =1|X =0) = .60, and in the treatment condition, P (Y =1|X =1) = .42.]
1.1.1 Joint Probabilities P (X =x , Y =y )

These probabilities are easily computed from the probabilities of the elementary events
displayed in the second column of Table 1.2. For example, the probability P (X =0, Y =1)
that the sampled person receives no treatment and is successful is the sum of the proba-
bilities of the elementary events {ω2 } = {(Joe, no, +)} and {ω6 } = {(Ann, no, +)}, that is,
P (X =0, Y =1) = P ({(Joe, no, +)}) + P ({(Ann, no, +)}) = .336 + .024 = .36.
Similarly, the probability P (X =1, Y =1) that the sampled person receives treatment
and is successful is the sum of the probabilities of the two elementary events {ω4 } =
{(Joe, yes, +)} and {ω8 } = {(Ann, yes, +)}, that is,
P (X =1, Y =1) = P ({(Joe, yes, +)}) + P ({(Ann, yes, +)}) = .016 + .152 = .168.
Table 1.3 is the theoretical analog to a contingency table that would be observed in a
data sample. More precisely, if we multiply the displayed numbers by the sample size, then
we receive the expected frequencies of the corresponding events. For example, if the sample
size is 1000, then we expect 240 cases in cell (X =0, Y = 0) and 360 cases in cell (X =0, Y =1),
etc. Of course, in a data sample, the observed frequencies would fluctuate around these
expected frequencies (see again Exercise 1-7).
1.1.2 Marginal Probabilities P (X =x ) and P (Y =y )

The marginal probabilities P (X =x ) and P (Y =y) are also easily computed from the prob-
abilities of the elementary events displayed in the second column of Table 1.2. For exam-
ple, the probability P (X =0) that the sampled person receives no treatment is the sum of
the probabilities of the four elementary events {ω1 } = {(Joe, no, −)}, {ω2} = {(Joe, no, +)},
{ω5 } = {(Ann, no, −)}, and {ω6 } = {(Ann, no, +)}, that is,
P (X =0) = P ({(Joe, no, −)}) + P ({(Joe, no, +)}) + P ({(Ann, no, −)}) + P ({(Ann, no, +)})
         = .144 + .336 + .096 + .024 = .6,
which implies
P (X =1) = 1 − P (X =0) = .4.
Similarly, the probability P (Y =1) that the sampled person is successful is the sum of the
probabilities of the four elementary events {ω2} = {(Joe, no, +)}, {ω4 } = {(Joe, yes, +)}, {ω6 } =
{(Ann, no, +)}, and {ω8 } = {(Ann, yes, +)}, that is,
P (Y =1) = P ({(Joe, no, +)}) + P ({(Joe, yes, +)}) + P ({(Ann, no, +)}) + P ({(Ann, yes, +)})
         = .336 + .016 + .024 + .152 = .528,
which implies
P (Y = 0) = 1 − P (Y =1) = .472.
1.1.3 Prima Facie Effect

Comparing the conditional probability P (Y =1|X =1) of success given the treatment con-
dition to the conditional probability P (Y =1|X =0) of success given the control condition
would lead us to the (wrong) conclusion that the treatment is harmful. These two condi-
tional probabilities can be computed by
P (Y =1| X =1) = P (Y =1, X =1) / P (X =1) = .168 / .4 = .42
and
P (Y =1| X =0) = P (Y =1, X =0) / P (X =0) = .36 / .6 = .6,
respectively (see, e. g., RS-Def. 1.32 for the events {Y =1}, {X =0}, and {X =1}). Figure 1.1
displays both conditional probabilities in a bar chart.
These two conditional probabilities can be compared to each other in different ways.
The simplest one is looking at the difference P (Y =1| X =1) − P (Y =1| X = 0). This is a par-
ticular case of the difference E (Y |X =1) − E (Y |X =0) between two conditional expectation
values, in which the outcome variable Y is dichotomous with values 0 and 1 (see RS-
Rem. 3.22). Following Holland (1986), we will call this difference the (unconditional) prima
facie effect and use the notation
PFE 10 = E (Y |X =1) − E (Y |X =0) = P (Y =1| X =1) − P (Y =1| X =0) = .42 − .60 = −.18.
Other possibilities of comparing the two conditional probabilities are to compute the odds
ratio, its logarithm, or the risk ratio (see, e. g., SN-sect. 13.3 or chapter 4 of Rothman, Green-
land, & Lash, 2008, for a detailed discussion of these and other effect parameters). No mat-
ter which of these effect parameters we choose, they all lead to the conclusion that the
treatment is harmful (see Exercise 1-8). As shown in the following section this conclusion
is utterly wrong.
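Using the dictionary p_uxy of elementary-event probabilities computed in the sketch above, the prima facie effect as well as the risk ratio and the odds ratio mentioned here can be obtained in a few lines; all three parameters point in the same, misleading, direction. The helper function p and the variable names are ours.

```python
# Continues the sketch above; p_uxy contains the probabilities P(U=u, X=x, Y=y).

def p(pred):
    """Probability of the event defined by the predicate pred on (u, x, y)."""
    return sum(prob for (u, x, y), prob in p_uxy.items() if pred(u, x, y))

p_y1_x1 = p(lambda u, x, y: x == 1 and y == 1) / p(lambda u, x, y: x == 1)  # P(Y=1 | X=1) = .42
p_y1_x0 = p(lambda u, x, y: x == 0 and y == 1) / p(lambda u, x, y: x == 0)  # P(Y=1 | X=0) = .60

pfe = p_y1_x1 - p_y1_x0                                              # prima facie effect: -.18
risk_ratio = p_y1_x1 / p_y1_x0                                       # .70
odds_ratio = (p_y1_x1 / (1 - p_y1_x1)) / (p_y1_x0 / (1 - p_y1_x0))   # about .48

print(round(pfe, 2), round(risk_ratio, 2), round(odds_ratio, 2))
```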
1.1.4 Individual Total Effects

Table 1.4

Joe (U =Joe )
Treatment
Success No (X =0) Yes (X =1)
No (Y = 0) .144 .004 .148
Yes (Y =1) .336 .016 .352
.48 .02 .5
Ann (U =Ann )
Treatment
Success No (X =0) Yes (X =1)
No (Y = 0) .096 .228 .324
Yes (Y =1) .024 .152 .176
.12 .38 .5
Note. The entries in the four cells for Joe and in the four cells for Ann are the joint probabilities
P(U =u , X =x ,Y =y). The other entries are the joint probabilities P(U =u , X =x ) (third and last
row) and P(U =u ,Y =y ) (last column), respectively, and the two marginal probabilities P(U =u ).
The conclusion about the effect of the treatment is completely different if we look at the
treatment effects separately for Joe and Ann. Table 1.4 shows the joint distributions of
treatment, success, and the person variable U with values Joe and Ann . The probabilities
of sampling Joe and of sampling Ann are identical, that is, P (U =Joe ) = P (U =Ann ) = .5.
Furthermore, the joint probabilities P (U =u , X =x , Y =y ) are the probabilities of the ele-
mentary events displayed in the second column of Table 1.2. These joint probabilities are
displayed again in a form analog to (2×2×2)-contingency table in Table 1.4.
As already mentioned in section 1.1.1, in empirical applications, this random exper-
iment cannot be repeated in order to obtain a data sample. However, we can repeat it
in a simulation (see Exercise 1-7). If, in such a simulation, we multiply the numbers dis-
played in Table 1.4 by the sample size, then we receive the expected frequencies of the cor-
responding events. For example, if the sample size is 1000, then we expect 144 cases in
cell (U =Joe , X =0, Y = 0) and 336 cases in cell (U =Joe , X =0, Y =1), etc. Of course, in data
samples, the observed frequencies fluctuate around these expected frequencies.
Using the joint probabilities displayed in Table 1.4, the conditional probability of suc-
cess for Joe in the treatment condition can be computed as follows:
P (Y =1| X =1,U =Joe ) = P (U =Joe , X =1, Y =1) / P (U =Joe , X =1) = .016 / (.016 + .004) = .8
(see Exercise 1-9). In contrast, Joe’s conditional probability of success in the control con-
dition is
P (Y =1| X =0,U =Joe ) = P (U =Joe , X =0, Y =1) / P (U =Joe , X =0) = .336 / (.144 + .336) = .7.
Hence,
P (Y =1| X =1,U =Joe ) − P (Y =1| X = 0,U =Joe ) = .8 − .7 = .1,
which may lead us to conclude that the treatment is beneficial for Joe. Again, because Y is
binary with values 0 and 1, this difference is a special case of the difference
E (Y | X =1,U =Joe ) − E (Y | X =0,U =Joe ),
which we call the individual total (treatment) effect of Joe, using the notation ITE U ;10 ( Joe ).
Hence,
ITE U ;10 ( Joe ) = E (Y | X =1,U =Joe ) − E (Y | X =0,U =Joe )
               = P (Y =1| X =1,U =Joe ) − P (Y =1| X =0,U =Joe ).        (1.4)
What about the individual total effect of Ann? Table 1.4 shows that the conditional prob-
ability of success for Ann in the treatment condition is
P (Y =1| X =1,U =Ann ) = .152 / (.228 + .152) = .4,
whereas it is
P (Y =1| X =0,U =Ann ) = .024 / (.096 + .024) = .2
in the control condition. Figure 1.2 shows these conditional probabilities in a bar chart.
Considering the individual total effect
ITE U ;10 (Ann ) = P (Y =1| X =1,U =Ann ) − P (Y =1| X =0,U =Ann ) = .4 − .2 = .2
of Ann may lead us to conclude that the treatment is also beneficial for Ann.
Hence, it seems that the treatment is beneficial for Joe and for Ann. This seems to con-
tradict our finding ignoring the person variable. Just considering the prima facie effect
PFE 10 = −.18 would lead us to conclude that the treatment is harmful.
1.1.5 Prima Facie Effect Versus Expectation of the Individual Total Effects
In contrast to our intuition, the prima facie effect E (Y |X =1) − E (Y |X =0) is neither the
simple average nor any weighted average of the corresponding individual total effects.
[Figure 1.2: bar charts of the conditional probabilities of success P (Y =1|X =x ,U =u ) for Joe and Ann in the control and treatment conditions.]
The conditional probability P (Y =1| X = 0) of success given control is the sum of the corre-
sponding probabilities P (Y =1| X = 0,U =Joe ) and P (Y =1| X = 0,U =Ann ), weighted by the
conditional probabilities P (U =Joe |X =0) and P (U =Ann|X =0), respectively, that is,
P (Y =1| X =0) = P (Y =1| X =0,U =Joe ) · P (U =Joe |X =0) + P (Y =1| X =0,U =Ann ) · P (U =Ann|X =0)
              = .7 · (.48/.6) + .2 · (.12/.6) = .6
[see Box 3.2 (ii) and Exercise 1-10]. Because the difference between the conditional prob-
abilities P (U =Joe |X =0) = .48/.6 and P (U =Ann|X =0) = .12/.6 is large, the probability of
success in treatment 0 is much closer to .7 than to .2 (see the dots above X = 0 in Fig. 1.3).
Similarly, the conditional probability P (Y =1| X =1) of success given treatment con-
dition (X =1) is the sum of the two corresponding individual conditional probabilities
P (Y =1| X =1,U =Joe ) and P (Y =1| X =1,U =Ann ), weighted by the conditional probabili-
ties P (U =Joe |X =1) and P (U =Ann|X =1), respectively, that is,
P (Y =1| X =1) = P (Y =1| X =1,U =Joe ) · P (U =Joe |X =1) + P (Y =1| X =1,U =Ann ) · P (U =Ann|X =1)
              = .8 · (.02/.4) + .4 · (.38/.4) = .42.
Hence, the prima facie effect is
PFE 10 = P (Y =1| X =1) − P (Y =1| X =0)
       = Σu P (Y =1|X =1,U =u ) · P (U =u |X =1) − Σu P (Y =1|X =0,U =u ) · P (U =u |X =0)        (1.7)
       = .42 − .60 = −.18.
[Figure 1.3: the conditional probabilities P (Y =1|X =x ,U =u ) for Joe and Ann and the conditional probabilities P (Y =1|X =x ), plotted against the treatment variable X .]
Because the two (X =1)-conditional probabilities P (U =Joe |X =1) = .02/.4 = .05 and
P (U =Ann|X =1) = .38/.4 = .95 are very different, the probability of success in treatment
1 is much closer to .4 than to .8 (see the dots above X =1 in Fig. 1.3). (The size of the area
of the dotted circles is proportional to the conditional probabilities P (U =u |X =x ) that are
used in the computation of the conditional expectation values E (Y |X =x ). This kind of
graphic has been adopted from Agresti, 2007.)
The prima facie effect is not identical to the expectation of the individual total effects,
which is the expectation of the random variable ITE U ; 10 (U ), the values of which are the
two individual total effects ITE U ; 10 ( Joe ) and ITE U ; 10 (Ann ) for Joe and Ann, respectively,
that is,
E (ITE U ;10 (U )) = Σu ITE U ;10 (u ) · P (U =u )
                 = Σu P (Y =1|X =1,U =u ) · P (U =u ) − Σu P (Y =1|X =0,U =u ) · P (U =u ).        (1.8)
Because the two individual effects are ITE U ; 10 ( Joe ) = .1 and ITE U ; 10 (Ann ) = .2,
E (ITE U ;10 (U )) = .1 · P (U =Joe ) + .2 · P (U =Ann ) = .1 · (1/2) + .2 · (1/2) = .15.
Hence, whereas the prima facie effect PFE 10 = P (Y =1| X =1)− P (Y =1| X = 0) is negative,
namely −.18, the expectation of the individual total-effect variable is positive, namely .15
(see Exercise 1-15). This expectation will be called the causal average total effect of the
treatment on the outcome variable Y , denoted ATE 10 .
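With the same elementary-event probabilities as before, the two individual total effects and their expectation, the causal average total effect, can be computed as follows; the contrast with the prima facie effect of −.18 obtained above is then immediate. Again, the function and variable names are only ours.

```python
# Continues the sketch above: individual total effects and their expectation.

def p_y1_given(u, x):
    """P(Y=1 | X=x, U=u), computed from the elementary-event probabilities."""
    return p_uxy[(u, x, 1)] / (p_uxy[(u, x, 0)] + p_uxy[(u, x, 1)])

ite = {u: p_y1_given(u, 1) - p_y1_given(u, 0) for u in p_u}    # ITE(Joe) = .1, ITE(Ann) = .2
ate = sum(ite[u] * p_u[u] for u in p_u)                        # average total effect: .15

print({u: round(effect, 2) for u, effect in ite.items()}, round(ate, 2))
```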
1.1.6 How to Evaluate the Treatment?

The prima facie effect and the two individual total effects are contradictory. Which of these comparisons should we trust? Is the treatment harmful
as P (Y =1|X =1) − P (Y =1|X =0) = −.18 suggests? Or is it beneficial as suggested by the
two positive differences P (Y =1|X =1,U =u ) − P (Y =1|X =0,U =u )? Which of these com-
parisons are meaningful for evaluating the causal total effect of the treatment on the suc-
cess variable Y ? Before we come back to these questions, we consider another example.
1.2 Example 2 — Experiment With Two Nonorthogonal Factors

In this section, we treat an example with three treatment conditions, representing two
treatments and a control, for instance. Furthermore, there are a discrete covariate with
three values, representing, for example, educational status, and a quantitative outcome
variable, indicating the degree of success, for instance.1
Table 1.5 describes a random experiment. Again, this random experiment is
composed of three parts.
(a) A person is sampled from a set of eight persons with identical probabilities for each
person u, that is, with probability P (U =u ) = 1/8.
(b) If Tom is sampled, then he obtains treatment 1 with probability P (X =1|U =Tom )
= 10/60 and treatment 2 with probability P (X =2 |U =Tom ) = 3/60. The corre-
sponding probabilities are also displayed for the other seven persons. The proba-
bilities of getting treatment 0 can be computed from the displayed probabilities for
treatment 1 and 2. For example, for Tom it is P (X =0 |U =Tom ) = 1 − (10/60 + 3/60).
And again, all these conditional probabilities may reflect self-selection to one of the
treatments and the different inclinations of the persons to go to those treatments.
(c) After receiving treatment, a value of the outcome variable Y (success) is assessed.
These values cannot be displayed in the table, because we assume that Y is contin-
uous. Instead, the table displays the (U =u , X =x )-conditional expectation values of
Y. If Tom is sampled and not treated, then his expectation value E(Y | U=Tom, X=0) is 120.
1 In this example, we consider a (3×3)-factorial design with crossed, non-orthogonal factors. The analysis of
such designs has puzzled many statisticians (see, e. g., Aitkin, 1978; Appelbaum & Cramer, 1974; Carlson &
Timm, 1974; Gosslee & Lucas, 1965; Jennings & Green, 1984; Keren & Lewis, 1976; Kramer, 1955; Langsrud, 2003;
Nelder & Lane, 1995; Overall & Spiegel, 1969, 1973b, 1973a; Overall, Spiegel, & Cohen, 1975; Williams, 1972). In
fact, none of the statistical packages such as SAS, SysStat, or SPSS with their Type I, II, III or IV sums of squares
provide correct estimates and tests of the average effects (or main effects) for such a design unless the second
factor has a uniform distribution, with equal probabilities for all values of the second factor. In this case Type III
analysis yields correct results, at least, if the second factor is assumed to be fixed. However, in most applications
in the social sciences, the second factor is not fixed but stochastic with a distribution of this factor (a qualitative
random variable) varying between samples. Mayer and Thoemmes (2019) show how to conduct a correct analysis
including average total effects (see also Exercise 1-17).
Table 1.5. Parameters of the random experiment in Example 2

Person u   Educational status z   P(U=u)   P(X=1 | U=u)   P(X=2 | U=u)   E(Y | X=0, U=u)   E(Y | X=1, U=u)   E(Y | X=2, U=u)
Tom        low                    1/8      10/60          3/60           120               100                80
Tim        low                    1/8      18/60          9/60           120               100                80
Joe        med                    1/8      26/60          17/60           90                90                70
Jim        med                    1/8      26/60          17/60          100               100                80
Ann        med                    1/8      26/60          17/60          120               100               100
Eva        med                    1/8      26/60          17/60          130               110               110
Sue        hi                     1/8      12/60          44/60           60               100               140
Mia        hi                     1/8      16/60          36/60           60               100               140
This table also contains the values of a qualitative covariate Z , which indicates an at-
tribute of the person, his or her educational status. Because it is an attribute of the person,
there is no extra sampling of Z . The value z of Z is fixed as soon as the person is actually
sampled. In chapter 2, we will also deal with random experiments in which we do have an
extra sampling process of one or several covariates. This will always be the case if, given
the person, his or her value on Z is not fixed. A typical example is Z being a fallible pretest,
for instance, a psychological test (say, of life satisfaction) that is not perfectly reliable, so
that there is measurement error.
For this example to be realistic, we have to assume that there is still variation of the
outcome variable Y in each combination of person and treatment condition. This condi-
tional variance may be due to (a) measurement error, but also to (b) mediator effects, that
is, to effects of variables and events that are in between X and the outcome variable Y in
the process considered. Because Y is continuous and subject to measurement error and
mediator effects, a full table similar to Table 1.2 with all possible outcomes is not feasi-
ble. Such a table does not even exist if Y is actually continuous, which would be the case if we assumed,
for example, that Y has a normal distribution given the combination of a person u and a
treatment condition x.
Nevertheless, it is still possible to present a table that is analogous to Table 1.1. In that table,
the conditional probabilities P (Y =1|X =x ,U =u ) are identical to the conditional expecta-
tion values E (Y | X =x ,U =u ) if Y is binary with values 0 and 1 (see RS-Rem. 3.22).
Table 1.6. Conditional expectation values of the outcome variable Y given treatment

Treatment      E(Y | X=x)    P(X=x)
X = 0          111.25        (1/3)
X = 1          100.00        (1/3)
X = 2          114.25        (1/3)
The conditional expectation values of the outcome variable Y given one of the three treat-
ment conditions x are displayed in Table 1.6. The ratios in the last column are the treat-
ment probabilities P (X =x ), which are 1/3 for all three values x of X . Note that this is not a
randomized design as will become obvious if we look at the joint probabilities of X and Z
(see Table 1.7). Furthermore, considering the conditional expectation values, and not the
sample means, should make clear that we are not discussing statistical inference (i. e., in-
ference from sample statistics to true parameters), but causal inference, that is, inference
from the conditional expectation values such as E (Y | X =x ) or E (Y | X =x , Z =z) to causal
effects.
If our evaluation of the treatment effects were based on the differences between the
conditional expectation values E (Y |X =x ) of Y in the three treatment conditions x, then
we would conclude that there is a negative effect of treatment 1 compared to control,
namely,

E(Y | X=1) − E(Y | X=0) = 100 − 111.25 = −11.25,

and a positive effect of treatment 2 compared to control, namely,

E(Y | X=2) − E(Y | X=0) = 114.25 − 111.25 = 3.00.
A second attempt to evaluate the ‘effects’ of the treatment is to look at the differences
between the conditional expectation values of Y in the three treatment conditions given
one of the three values of Z : low, med, and hi. These (Z =z)-conditional effects are also
called simple effects in the literature on analysis of variance.
Table 1.7 displays the conditional expectation values of the outcome variable Y in the
nine cells of the (3×3)-design. The ratios in parentheses are the probabilities that the pairs
(x, z) of values of X and Z are observed. Hence, this table contains the conditional expec-
tation values (true cell means) E (Y | X =x , Z =z) of the outcome variable Y , and the joint
probabilities P (X =x , Z =z) determining the true joint distribution of X and Z .2
2 In this context, ‘true’ just indicates that we are not referring to sample means or relative frequencies in a sample.
Instead these are the true means around which sample means would vary.
Table 1.7. Conditional expectation values E(Y | X=x, Z=z) given treatment and status

                       Status
Treatment   low (Z=0)         med (Z=1)         hi (Z=2)          P(X=x)
X = 0       120 (20/120)      110 (17/120)       60 (3/120)       (40/120)
X = 1       100 (7/120)       100 (26/120)      100 (7/120)       (40/120)
X = 2        80 (3/120)        90 (17/120)      140 (20/120)      (40/120)
P(Z=z)          (30/120)          (60/120)          (30/120)
In the low status condition (Z = 0), there are large negative effects, both of treatment 1
and of treatment 2 compared to the control:
PFE Z ;10 (0) = E (Y | X =1, Z = 0) − E (Y | X = 0, Z = 0) = 100 − 120 = −20
and
PFE Z ;20 (0) = E (Y | X =2, Z = 0) − E (Y | X = 0, Z = 0) = 80 − 120 = −40.
In the medium status condition (Z =1), there are also negative effects of treatment 1 and
of treatment 2 compared to the control:
PFE Z ;10 (1) = E (Y |X =1, Z =1) − E (Y |X = 0, Z =1) = 100 − 110 = −10
and
PFE Z ; 20 (1) = E (Y | X =2, Z =1) − E (Y |X = 0, Z =1) = 90 − 110 = −20.
Finally, in the high status condition (Z =2), the effects of treatment 1 and treatment 2 are
both positive:
PFE Z ; 10 (2) := E (Y | X =1, Z =2) − E (Y | X =0, Z =2) = 100 − 60 = 40
and
PFE Z ;20 (2) := E (Y | X =2, Z =2) − E (Y | X =0, Z =2) = 140 − 60 = 80.
Based on these comparisons, we can conclude that the ‘effects’ of the treatments depend
on the status of the subjects: the differences between the expectations of Y are negative
for subjects with low and medium status, and they are positive for the subjects with high
status.
Now we consider the average of the (Z =z)-conditional prima facie effects, where Z is again
the qualitative covariate status.3 Because we already looked at the corresponding (Z =z)-
conditional prima facie effects (see section 1.2.2), we just have to compute their averages,
3 Note that we assume that Z is a random variable. In contrast, in analysis of variance it is assumed that Z is
a fixed factor with a fixed number of observations for each value z of Z , that is, these numbers of observations
are assumed to be invariant across different samples. In many empirical applications, this assumption is not
realistic, but it does not invalidate the statistical conclusions as long as the parameters of interest do not involve
the distribution of Z . However, a hypothesis about the average total effect does involve the distribution of Z
if the term ‘average’ is specified as an expectation value, and this is the reason why programs for analysis
of variance usually are not able to correctly estimate and test hypotheses about average total effects. For more
details see again Mayer and Thoemmes (2019).
[Figure: Conditional expectation values E(Y | X=x) and E(Y | X=x, Z=z) plotted against the treatment conditions x = 0, 1, 2.]
more precisely, the expectations of these conditional effects over the distribution of status:
E(PFE_{Z;10}(Z)) = Σ_z PFE_{Z;10}(z) · P(Z=z) = (−20) · (1/4) + (−10) · (1/2) + 40 · (1/4) = 0.     (1.9)

Hence, the average of the (Z=z)-conditional prima facie effects of treatment 1 compared
to the control is 0.
Comparing treatment 2 to control yields the average of the (Z=z)-conditional prima
facie effects:

E(PFE_{Z;20}(Z)) = Σ_z PFE_{Z;20}(z) · P(Z=z) = (−40) · (1/4) + (−20) · (1/2) + 80 · (1/4) = 0.     (1.10)

According to this result, the average of the (Z=z)-conditional prima facie effects of
treatment 2 compared to the control is 0 as well.
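As a numerical cross-check of Equations (1.9) and (1.10), here is a small Python sketch that recomputes the (Z=z)-conditional prima facie effects and their expectation from the cell means and joint probabilities of Table 1.7; the helper names are ours.

```python
from fractions import Fraction as F

# Cell means E(Y | X=x, Z=z) and joint probabilities P(X=x, Z=z), transcribed from Table 1.7.
cell_mean = {(0, "low"): 120, (0, "med"): 110, (0, "hi"):  60,
             (1, "low"): 100, (1, "med"): 100, (1, "hi"): 100,
             (2, "low"):  80, (2, "med"):  90, (2, "hi"): 140}
p_xz = {(0, "low"): F(20, 120), (0, "med"): F(17, 120), (0, "hi"): F(3, 120),
        (1, "low"): F(7, 120),  (1, "med"): F(26, 120), (1, "hi"): F(7, 120),
        (2, "low"): F(3, 120),  (2, "med"): F(17, 120), (2, "hi"): F(20, 120)}

# Marginal probabilities P(Z=z): 1/4, 1/2, 1/4.
p_z = {z: sum(p_xz[(x, z)] for x in (0, 1, 2)) for z in ("low", "med", "hi")}

def pfe_z(x, x0, z):
    """(Z=z)-conditional prima facie effect of treatment x compared to treatment x0."""
    return cell_mean[(x, z)] - cell_mean[(x0, z)]

def avg_pfe(x, x0):
    """Expectation of the (Z=z)-conditional prima facie effects over the distribution of Z."""
    return sum(pfe_z(x, x0, z) * p_z[z] for z in p_z)

print([pfe_z(1, 0, z) for z in p_z], avg_pfe(1, 0))  # [-20, -10, 40] 0
print([pfe_z(2, 0, z) for z in p_z], avg_pfe(2, 0))  # [-40, -20, 80] 0
```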
In this fictitious example we can also look at the individual effects of treatment 1 compared
to control and of treatment 2 compared to control. These two effects can be read from Table
1.5 for each person. For example, for Tom the individual effect of treatment 1 compared to
control is ITE_{U;10}(Tom) = E(Y | X=1, U=Tom) − E(Y | X=0, U=Tom) = 100 − 120 = −20.
For the reasons mentioned in section 1.1.1, unlike the (Z =z)-conditional prima facie ef-
fects treated in section 1.2.2, the individual effects usually cannot be estimated in em-
pirical applications. Nevertheless, they will play a crucial role in the definition of causal
effects.
Of course, individual effects are more informative than their average if we want to know
which treatment is the best for which individual. Nevertheless, we might ask: What are the
total individual treatment effects on average ? And, which are the (Z =z)-conditional effects,
that is, the total individual treatment effects on average, given the value z of Z ? Further-
more, if the total individual effects cannot be estimated in empirical applications under
realistic assumptions, is it possible to estimate at least the average of the total individual
treatment effects and/or the total individual treatment effects on average given the value
z of Z ? And if yes, under which conditions?
Note that we have two averages of the individual effects in this example; we can com-
pare treatment 1 to control and treatment 2 to control. Because we already looked at the
corresponding individual effects, we just have to compute their averages, that is, the ex-
pectations of these conditional effects over the distribution of the person variable U , that
is,

E(ITE_{U;10}(U)) = Σ_u ITE_{U;10}(u) · P(U=u)
                 = (100 − 120) · (1/8) + (100 − 120) · (1/8) + (90 − 90) · (1/8) + . . . + (100 − 60) · (1/8) = 0.
Hence, the average total effect of treatment 1 compared to the control is 0. Comparing
treatment 2 to control yields
E(ITE_{U;20}(U)) = Σ_u ITE_{U;20}(u) · P(U=u)
                 = (80 − 120) · (1/8) + (80 − 120) · (1/8) + (70 − 90) · (1/8) + . . . + (140 − 60) · (1/8) = 0.
According to this result, the average total effect of treatment 2 compared to the control is
0 as well.
Again, because we already know the individual effects, we just have to compute their av-
erages given the value z of the covariate, or more precisely, the (Z =z)-conditional expec-
tation values of the corresponding individual effects. We exemplify the computations for
the value med of the covariate (second factor) Z . Comparing treatment 1 to control yields
E(ITE_{U;10}(U) | Z=med) = Σ_u ITE_{U;10}(u) · P(U=u | Z=med)
                         = (90 − 90) · (1/4) + (100 − 100) · (1/4) + (100 − 120) · (1/4) + (110 − 130) · (1/4) = −10.
Hence, the (Z =med )-conditional total effect of treatment 1 compared to control is −10.
Comparing treatment 2 to control we obtain
E(ITE_{U;20}(U) | Z=med) = Σ_u ITE_{U;20}(u) · P(U=u | Z=med)
                         = (70 − 90) · (1/4) + (80 − 100) · (1/4) + (100 − 120) · (1/4) + (110 − 130) · (1/4) = −20.

Hence, the (Z=med)-conditional total effect of treatment 2 compared to control is −20.
The computations for the other two values of Z are analogous. For Z=low we obtain
E(ITE_{U;10}(U) | Z=low) = −20 and E(ITE_{U;20}(U) | Z=low) = −40. In contrast, for Z=hi we
obtain E(ITE_{U;10}(U) | Z=hi) = 40 and E(ITE_{U;20}(U) | Z=hi) = 80.
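The same numbers can be verified directly from Table 1.5. The sketch below computes the individual total effects as well as their unconditional and (Z=z)-conditional expectations; the person-level parameters are transcribed from Table 1.5, and the function names are ours.

```python
from fractions import Fraction as F

# For each person: status z and E(Y | X=x, U=u) for x = 0, 1, 2, transcribed from Table 1.5;
# P(U=u) = 1/8 for every person, so the conditional distributions given Z=z are uniform as well.
persons = {"Tom": ("low", 120, 100,  80), "Tim": ("low", 120, 100,  80),
           "Joe": ("med",  90,  90,  70), "Jim": ("med", 100, 100,  80),
           "Ann": ("med", 120, 100, 100), "Eva": ("med", 130, 110, 110),
           "Sue": ("hi",   60, 100, 140), "Mia": ("hi",   60, 100, 140)}

def ite(u, x, x0):
    """Individual total effect of treatment x compared to treatment x0 for person u."""
    _, *ey = persons[u]
    return ey[x] - ey[x0]

def avg_ite(x, x0, z=None):
    """Expectation of the individual total effects, unconditional or given Z=z."""
    us = [u for u, (status, *_) in persons.items() if z is None or status == z]
    return sum(F(ite(u, x, x0), len(us)) for u in us)

print(int(avg_ite(1, 0)), int(avg_ite(2, 0)))                 # 0 0
print([int(avg_ite(1, 0, z)) for z in ("low", "med", "hi")])  # [-20, -10, 40]
print([int(avg_ite(2, 0, z)) for z in ("low", "med", "hi")])  # [-40, -20, 80]
```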
To summarize, we discussed several ways that may, at first sight, be used to evaluate the
treatment effects in empirical applications: First, we may compare the differences be-
tween the conditional expectation values E (Y |X =x ) of the outcome variable in the three
treatment conditions x ∈ { 0, 1, 2}. Second, we may consider the corresponding differences
between the conditional expectation values E (Y |X =x ,Z =z) given each of the three val-
ues z ∈ {low, med, hi } of status. Third, we may compare the expectations of these differ-
ences between the (X =x , Z =z)-conditional expectation values over the distribution of Z .
All these comparisons yield different results. Which of them are meaningful for the evalu-
ation of the treatment effects? All three of them, or only two, just one, or none at all? And
which are the conditions under which they are meaningful?
Furthermore, we also presented three parameters based on individual effects that may
be used to evaluate the treatment effects: First, the individual total effect of a treatment
compared to a control. These effects are hard to estimate in empirical applications, unless
we introduce very restrictive assumptions. Second, the expectation of these individual ef-
fects, which, in this example, might also be called average total effects. This kind of effect
is less informative than the individual total effects, but it is a summary parameter that in-
forms us if the treatment is beneficial, ineffective, or even harmful on average. Third, the
conditional expectation values of the individual total effects, which, in this example, might
also be called the (Z=z)-conditional total treatment effects. They inform us with a single
number for each value z if the treatment is beneficial, ineffective, or even harmful on av-
erage for those individuals with value z on the covariate Z . Box 1.1 provides a summary of
these effects.
In this example, the averages of the (Z =z)-conditional prima facie effects are identical
to the averages of the individual total effects. Is this just a coincidence? Or is this due to
systematic conditions that hold in this example? If yes, which are these conditions?
1.3 Summary and Conclusion

In this chapter, we treated two examples. In the first one, a dichotomous treatment vari-
able X has a negative (prima facie) effect P (Y =1|X =1) − P (Y =1|X =0) on a dichotomous
outcome variable Y (‘success’), although the corresponding individual treatment effects
are positive. Taking the expectation of these two individual effects also yields a positive
effect.
In the second example, there are nonzero differences E (Y |X =1) − E (Y |X =0) and
E (Y |X =2) − E (Y |X =0), where Y is a quantitative outcome variable, and nonzero condi-
tional ‘effects’ E (Y |X =1, Z =z) − E (Y |X =0, Z =z) and E (Y |X =2, Z =z) − E (Y |X =0, Z =z)
for the different values z of status. The expectations of these (Z =z)-conditional prima fa-
cie effects (comparing treatment 1 to 0 and comparing treatment 2 to 0) over the three
status conditions are zero.
The Problem
Box 1.1 displays the various total effects that have been computed and discussed in the
two examples. Because the conclusions drawn from each of these putative effects are con-
tradictory, which of these should we trust? In the first example: Is the treatment harmful —
as the difference P (Y =1|X =1) − P (Y =1|X =0) suggests? Or is it beneficial as suggested by
the individual effects P (Y =1|X =1,U =u ) − P (Y =1|X =0,U =u )? In the second example:
Do the prima facie effects E (Y | X =1) − E (Y | X =0) have a meaningful causal interpreta-
tion? Or do the (Z =z)-conditional prima facie effects E (Y | X =1, Z =z) − E (Y | X =0, Z =z)
have a meaningful causal interpretation? And, does this apply also to their expectation?
In the first example, we demonstrated that we cannot expect that the difference
P(Y=1 | X=1) − P(Y=1 | X=0) is identical to the average of the individual effects
P(Y=1 | X=1, U=u) − P(Y=1 | X=0, U=u). Similarly, in the second example, we showed that we cannot expect that the difference
E(Y | X=1) − E(Y | X=0)
is identical to the average of the conditional effects E(Y | X=1, Z=z) − E(Y | X=0, Z=z)
given a value z of status. And, how do we know that these (Z=z)-conditional effects are
meaningful for the evaluation of the treatment? As noted before, these questions are not
related to statistical inference; they are not raised at the sample level, but on the level of
true conditional expectation values!
Hence our examples show that conditional expectation values and their differences,
the prima facie effects, can be totally misleading in evaluating the effects of a treatment
variable X on an outcome variable Y . This conclusion can also be extended to conditional
probabilities, to correlations and to all other parameters describing relationships and de-
pendencies between random variables. They all are like the shadow in the metaphor of the
invisible man (see the preface).
If this is true, is the whole idea of learning from experience — the core of empirical
sciences — wrong? Our answer is ‘No’. However, we have to be more explicit about what we
mean by causal effects.
Box 1.1

E(PFE_{Z;xx′}(Z)): Expectation of the (Z=z)-conditional prima facie effects of treatment x compared to treatment x′. If Z is discrete, then it is computed by
    E(PFE_{Z;xx′}(Z)) := Σ_z PFE_{Z;xx′}(z) · P(Z=z).

E(ITE_{U;xx′}(U)): Expectation of the individual effects of treatment x compared to treatment x′. It is computed by
    E(ITE_{U;xx′}(U)) := Σ_u ITE_{U;xx′}(u) · P(U=u).

E(ITE_{U;xx′}(U) | Z=z): (Z=z)-conditional expectation value of the individual effects of treatment x compared to treatment x′. It is computed by
    E(ITE_{U;xx′}(U) | Z=z) := Σ_u ITE_{U;xx′}(u) · P(U=u | Z=z).
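The three formulas in this box translate directly into code. The following sketch states them as generic functions over finite distributions; the dictionary-based representation is our own choice and not part of the theory.

```python
def expected_conditional_pfe(pfe_z, p_z):
    """E(PFE_{Z;xx'}(Z)) = sum over z of PFE_{Z;xx'}(z) * P(Z=z), for a discrete covariate Z."""
    return sum(pfe_z[z] * p_z[z] for z in p_z)

def expected_ite(ite_u, p_u):
    """E(ITE_{U;xx'}(U)) = sum over u of ITE_{U;xx'}(u) * P(U=u)."""
    return sum(ite_u[u] * p_u[u] for u in p_u)

def conditional_expected_ite(ite_u, p_u_given_z):
    """E(ITE_{U;xx'}(U) | Z=z) = sum over u of ITE_{U;xx'}(u) * P(U=u | Z=z)."""
    return sum(ite_u[u] * p for u, p in p_u_given_z.items())

# Example 1 (Joe-Ann): the average total effect .1 * 1/2 + .2 * 1/2 = .15.
print(round(expected_ite({"Joe": 0.1, "Ann": 0.2}, {"Joe": 0.5, "Ann": 0.5}), 2))  # 0.15
# Example 2: the average of the (Z=z)-conditional prima facie effects of treatment 1 vs. 0.
print(expected_conditional_pfe({"low": -20, "med": -10, "hi": 40},
                               {"low": 0.25, "med": 0.5, "hi": 0.25}))  # 0.0
```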
In a randomized experiment, the prima facie effects are informative about total causal treatment effects. But why? What is so special in the randomized
experiment? Which are the mathematical conditions that we create in a randomized ex-
periment? Are there also conditions that can be utilized in quasi-experimental evaluation
studies? How can we estimate causal effects in quasi-experimental observational studies?
Conclusive answers to these questions can be hoped for only within a theory of causal
effects.
Obviously, these questions are of fundamental importance for the methodology of empir-
ical sciences and for the empirical sciences themselves. The answers to these questions
have consequences for the design and analysis of experiments, quasi-experiments, and
other studies aiming at estimating the effects of treatments, interventions, or expositions.
No prevention study can meaningfully be conducted and analyzed without knowing the
concepts of causal effects and how they can be estimated from empirical data. Similarly,
without a clear concept of causal effects we are not able to learn from our data about the
effects of a certain (possibly harmful) environment on our health, or about the effects of
certain behaviors such as smoking or drug abuse. Again, this is similar to the problem of
measuring the invisible man’s size via the length of his shadow: only with a clear concept
of size, some basic knowledge of geometry, and additional information such as the angle
of the sun at the time of measurement are we able to determine the size of the man
from the length of his shadow.
Research Traditions
Of course, raising these questions and attempting answers is not new. Immense knowl-
edge and wisdom about experiments and quasi-experiments has been collected in the
Campbellian tradition of experiments and quasi-experiments (see, e. g., Campbell & Stan-
ley, 1963; Cook & Campbell, 1979; Shadish et al., 2002). In the last decades, a more for-
mal approach has been developed supplementing the Campbellian theory and terminol-
ogy in important aspects: the theory of causal effects in the Neyman-Rubin tradition (see,
e. g., Splawa-Neyman, 1923/1990; Rubin, 1974, 2005). Many papers and books indicate the
growing influence of this theory (see, e. g., Greenland, 2000, 2004; Höfler, 2005; Rosen-
baum, 2002a; Rubin, 2006; Winship & Morgan, 1999; Morgan & Winship, 2007) and re-
markable efforts have already been made to integrate it into the Campbellian framework
(West, Biesanz, & Pitts, 2000). Furthermore, these questions have also been dealt with in
the graphical modeling tradition (see, e. g., Pearl, 2009; Spirtes et al., 2000) as well as in
biometrics, econometrics, psychometrics, epidemiology, and other fields dealing with the
methodology of empirical research.
Outlook
In this volume, we present the theory of causal total effects in terms of mathematical prob-
ability theory. We show that a number of questions that have been debated controver-
sially and inconclusively can now be given a clear-cut answer. What kinds of causal effects
can meaningfully be defined? Which design techniques allow for unbiased estimation of
causal effects? How to analyze nonorthogonal ANOVA designs (cf., e. g., Aitkin, 1978; Appelbaum
& Cramer, 1974; Gosslee & Lucas, 1965; Maxwell & Delaney, 2004; Overall et al.,
1975)? How to analyze non-equivalent control-group designs (cf., e. g., Reichardt, 1979)?
Should we compare pre-post differences between treatment groups (cf., e. g., Lord, 1967;
Senn, 2006; van Breukelen, 2006; Wainer, 1991)? Should we use analysis of covariance to
adjust for differences between treatment and control that already existed prior to treat-
ment (cf., e. g., Maxwell & Delaney, 2004; Cohen, Cohen, West, & Aiken, 2003)? Should
we use propensity score methods instead of the more traditional procedures mentioned
above (cf., e. g., Rosenbaum & Rubin, 1983b)? How do we deal with non-compliance to
treatment assignment (cf., e. g., Cheng & Small, 2006; Dunn et al., 2003; Jo, 2002a, 2002b,
2002c; Jo, Asparouhov, Muthén, Ialongo, & Brown, 2008; J. Robins & Rotnitzky, 2004;
J. M. Robins, 1998)?
We do not treat the statistical sampling models with their distributional assumptions,
their implications for parameter estimation, and the evaluation (or tests) of hypotheses
about these parameters. However, we will discuss the virtues and problems of general
strategies of data analysis such as the analysis of difference scores, analysis of covariance,
its generalizations, and analysis based on propensity scores.
1.4 Exercises

⊲ Exercise 1-1 Why do we need a concept of causal treatment effects?

⊲ Exercise 1-2 What is the relationship between the unconditional prima facie effect PFE 10 and the
expectations E (Y |X =0) and E (Y |X =1) of the outcome variable Y in the two treatment conditions?
⊲ Exercise 1-3 Compute the probabilities P(X =x ,Y =y) presented in Table 1.3 from the probabili-
ties P(U =u , X =x ,Y =y) presented in Table 1.4.
⊲ Exercise 1-4 Which are the kinds of prima facie effects treated in this chapter?
⊲ Exercise 1-5 What is the difference between statistical inference and causal inference?
⊲ Exercise 1-6 Why are the conditional expectation values E (Y |X =x ) in treatment conditions x
also the (X =x )-conditional probabilities for the event {Y =1} in the first example treated in this chap-
ter?
⊲ Exercise 1-7 Download Kbook Table 1.1.sav from www.causal-effects.de. This data set has been
generated from Table 1.1 for a sample of size N = 10,000. Compute the contingency table corre-
sponding to Table 1.3 and the associated estimates of the conditional probabilities P(Y =1|X =0)
and P(Y =1|X =1).
⊲ Exercise 1-8 Use P(Y =1| X =1) = .42 and P(Y =1| X = 0) = .6 computed in section 1.1.3 in order to
compute the corresponding odds ratio, its logarithm, and the risk ratio, according to the definitions
of these parameters presented in SN-sect. 13.3.
⊲ Exercise 1-9 Compute the conditional probability P(Y =1| X =1,U =Joe ) from Table 1.4.
⊲ Exercise 1-10 Compute the probability P(Y =1| X = 0) from the corresponding conditional prob-
abilities P(Y =1|X =0,U =u ).
⊲ Exercise 1-11 What (i. e., how big) are the unconditional prima facie effects of the treatments, that
is, the prima facie effects E (Y |X =1) −E (Y |X =0) and E (Y |X =2) −E (Y |X =0) in the second example
of this chapter?
⊲ Exercise 1-12 What are the conditional prima facie effects of the treatments, that is, the prima
facie effects E (Y |X =1, Z=z) −E (Y |X =0, Z=z) and E (Y |X =2, Z=z) −E (Y |X =0, Z=z ) in the second
example of this chapter?
⊲ Exercise 1-13 What are the averages of the (Z=z)-conditional prima facie effects in the second example of this chapter?
⊲ Exercise 1-14 Compute the conditional probability P(U =Tom | X =0) from the parameters pre-
sented in Tables 1.5 and 1.6.
⊲ Exercise 1-15 Open the Causal Effects Xplorer with table K-book table 1.1.tab. Change the condi-
tional probabilities P(X =1|U =u ) of receiving treatment 1 for Joe and Ann to 2/5. Then compare the
two individual treatment effects of Joe and Ann and their average to the prima facie effect E (Y |X =1)
−E (Y |X =0).
⊲ Exercise 1-16 Open the Causal Effects Xplorer with table K-book Table 1.1.tab displaying the con-
ditional probabilities P(U =u |X =x ). Then use RS-Box 3.2 (ii) in order to compute the three condi-
tional expectation values E (Y |X =x ) displayed in Table 1.6 from the parameters presented in Table
1.5.
⊲ Exercise 1-17 Download Kbook Table 1.5.sav from www.causal-effects.de. This data set has been
generated with the Causal Effects Xplorer from Table 1.5 for a sample of size N = 10,000 with error
variance 10 given each person.
(a) Compute the cell means and the relative frequencies of observations in each of the nine cells
of the (3×3)-table.
(b) Use each of the procedures offered by your statistical program package to analyze the data
including a test of the main effects of the treatment factor (most programs offer Type I, II, and
III sums of squares for such an analysis).
(c) Compare the results of these analyses to the parameters presented in Table 1.7.
Solutions
⊲ Solution 1-1 We need the concept of a causal treatment effect, because the two examples show
that differences between conditional expectation values are meaningless for the evaluation of the ef-
fects of a treatment, unless we can show how the differences between these conditional expectation
values are related to the causal treatment effects. Obviously, without a definition of causal treatment
effects this is not possible. Estimating causal treatment effects is crucial for answering questions
such as ‘Does the treatment help our patients with respect to the outcome variable considered?’
⊲ Solution 1-2 The unconditional prima facie effect PFE 10 is defined as the difference between the
two conditional expectation values E (Y |X =1) and E (Y |X =0).
⊲ Solution 1-3 This can easily be verified by adding the probabilities P(U=u, X=x, Y=y) over the two
persons Joe and Ann for each pair (x, y) of values of X and Y. This yields .144 + .096 = .24, .004 + .228 = .232, .336 +
.024 = .36 and .016 + .152 = .168.
⊲ Solution 1-4 The kinds of prima facie effects treated in this chapter are: the unconditional prima
facie effect, the conditional prima facie effect given the value z of a covariate Z , and the average of the
(Z=z)-conditional prima facie effects. The unconditional prima facie effect of treatment 1 compared
to treatment 0 is the difference PFE 10 = E (Y |X =1) −E (Y |X =0) between the conditional expecta-
tion values of an outcome variable Y given the two treatment conditions. The (Z=z)-conditional
prima facie effect is the difference PFE Z ; 10 (z) = E (Y |X =1, Z=z) − E (Y |X = 0, Z=z) between the
(X =1, Z=z)-conditional expectation value and the (X =0, Z=z)-conditional expectation value of the
outcome variable Y . The average prima facie effect is the expectation of the (Z=z)-conditional
prima facie effects over the distribution of Z [see Eqs. (1.9) and (1.10)].
⊲ Solution 1-5 In statistical inference we estimate and test hypotheses about parameters charac-
terizing the (joint or marginal) distributions of random variables from sample data. In causal infer-
ence we interpret some of these parameters as causal effects, provided that certain conditions are
satisfied that allow for such a causal interpretation.
⊲ Solution 1-6 E (Y |X =x ) = P(Y =1| X =x ), because, in this example, Y is dichotomous with values
0 and 1. In this case, the term P(Y =1 | X =x ) is defined by E (Y |X =x ) (see RS-Rem. 3.22).
⊲ Solution 1-7 No solution provided. Just compare your results to the true parameters presented in
Table 1.3 and to the conditional probabilities P(Y =1|X =0) and P(Y =1|X =1) presented in section
1.1.3.
⊲ Solution 1-8 The odds ratio is

[P(Y=1 | X=1) / (1 − P(Y=1 | X=1))] / [P(Y=1 | X=0) / (1 − P(Y=1 | X=0))] ≈ .483.

Because this number is smaller than 1 it indicates that there is a negative effect of the treatment. The
natural logarithm of the odds ratio is the log odds ratio, which is

ln{[P(Y=1 | X=1) / (1 − P(Y=1 | X=1))] / [P(Y=1 | X=0) / (1 − P(Y=1 | X=0))]} ≈ −0.728.

This number is smaller than 0 indicating that there is a negative effect of the treatment. The log odds
ratio is identical to the logistic regression coefficient λ1 in the equation

P(Y=1 | X) = exp(λ0 + λ1 · X) / [1 + exp(λ0 + λ1 · X)].

Another closely related parameter is the risk ratio

P(Y=1 | X=1) / P(Y=1 | X=0) = .7.
Because this number is smaller than 1, it indicates that there is a negative effect of the treatment.
Hence, no matter which of these parameters we use, we would always come to the same (wrong)
conclusion that the treatment is detrimental for our patients.
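For a quick numerical check of these three parameters, here is a short Python sketch using the two conditional probabilities P(Y=1 | X=1) = .42 and P(Y=1 | X=0) = .6 from section 1.1.3.

```python
import math

p1 = 0.42  # P(Y=1 | X=1)
p0 = 0.60  # P(Y=1 | X=0)

odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
log_odds_ratio = math.log(odds_ratio)  # equals the logistic regression coefficient lambda_1
risk_ratio = p1 / p0

print(round(odds_ratio, 3), round(log_odds_ratio, 3), round(risk_ratio, 2))  # 0.483 -0.728 0.7
```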
⊲ Solution 1-9 Using the joint probabilities presented in Table 1.4, the definition of the conditional
probability yields
P(Y=1 | X=1, U=Joe) = P(X=1, Y=1, U=Joe) / P(X=1, U=Joe) = .016 / (.016 + .004) = .8.
⊲ Solution 1-10 First of all, note that the theorem of total probability (see RS-Th. 1.38) can also
be applied to conditional probabilities (see RS-Th. 1.42). In this exercise, it is applied to the (X=0)-
conditional probabilities P(Y=1 | X=0) = P_{X=0}(Y=1). Hence, according to this theorem,

P(Y=1 | X=0) = P(Y=1 | X=0, U=Joe) · P(U=Joe | X=0) + P(Y=1 | X=0, U=Ann) · P(U=Ann | X=0).

The probabilities P(Y=1 | X=0, U=Joe) = .7 and P(Y=1 | X=0, U=Ann) = .2 are computed analo-
gously to Exercise 1-9 and the other two probabilities occurring in this formula are P(U=Joe | X=0) =
.48/.6 and P(U=Ann | X=0) = .12/.6 (see Table 1.4). Hence,

P(Y=1 | X=0) = .7 · (.48 / (.48 + .12)) + .2 · (.12 / (.48 + .12)) = .6.
⊲ Solution 1-11 The prima facie effects E(Y | X=1) − E(Y | X=0) and E(Y | X=2) − E(Y | X=0) can be
computed from Table 1.6 as follows:

PFE_10 = E(Y | X=1) − E(Y | X=0) = 100 − 111.25 = −11.25

and

PFE_20 = E(Y | X=2) − E(Y | X=0) = 114.25 − 111.25 = 3.00.
⊲ Solution 1-12 The conditional prima facie effects E(Y | X=1, Z=z) − E(Y | X=0, Z=z) and
E(Y | X=2, Z=z) − E(Y | X=0, Z=z) can be computed from Table 1.7. For low status (Z=low), they are
PFE_{Z;10}(low) = 100 − 120 = −20 and PFE_{Z;20}(low) = 80 − 120 = −40; the computations for
Z=med and Z=hi are analogous (see section 1.2.2).
⊲ Solution 1-13 Using the results of the last exercise, the average of the (Z=z)-conditional prima
facie effects can be computed from the conditional effects as follows:
E(PFE_{Z;10}(Z)) = PFE_{Z;10}(low) · P(Z=low) + PFE_{Z;10}(med) · P(Z=med) + PFE_{Z;10}(hi) · P(Z=hi)
                 = (−20) · (1/4) + (−10) · (1/2) + 40 · (1/4) = 0.

E(PFE_{Z;20}(Z)) = PFE_{Z;20}(low) · P(Z=low) + PFE_{Z;20}(med) · P(Z=med) + PFE_{Z;20}(hi) · P(Z=hi)
                 = (−40) · (1/4) + (−20) · (1/2) + 80 · (1/4) = 0.
⊲ Solution 1-14

P(U=Tom | X=0) = P(U=Tom, X=0) / P(X=0) = [P(X=0 | U=Tom) · P(U=Tom)] / P(X=0)
               = [(47/60) · (1/8)] / (1/3) = 47/160.
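The same computation as a short sketch; the fractions are those from Table 1.5, and P(X=0) = 1/3 as noted for Table 1.6.

```python
from fractions import Fraction as F

p_x0_given_tom = 1 - (F(10, 60) + F(3, 60))  # P(X=0 | U=Tom) = 47/60
p_tom = F(1, 8)                              # P(U=Tom)
p_x0 = F(1, 3)                               # P(X=0)

p_tom_given_x0 = p_x0_given_tom * p_tom / p_x0
print(p_tom_given_x0)  # 47/160
```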
⊲ Solution 1-15 With this change, the prima facie effect changes to E (Y |X =1) −E (Y |X =0) = .6 −
.45 = .15, which is the average of the two individual total effects; these still are .10 for Joe and .20 for
Ann. Note that identical treatment probabilities P(X=1 | U=u) for all persons u are what we create by
randomly assigning a person to treatment 1 in a randomized experiment.
⊲ Solution 1-16 The (X=x)-conditional probabilities P(U=u | X=x) can be computed from the parameters
presented in Table 1.5 [see RS-Box 3.2 (ii)]. Hence,
E(Y | X=0) = Σ_u E(Y | X=0, U=u) · P(U=u | X=0)
           = 120 · (47/160) + 120 · (33/160) + . . . + 60 · (8/160) = 111.25,

E(Y | X=1) = Σ_u E(Y | X=1, U=u) · P(U=u | X=1)
           = 100 · (10/160) + 100 · (18/160) + . . . + 100 · (16/160) = 100,

and

E(Y | X=2) = Σ_u E(Y | X=2, U=u) · P(U=u | X=2)
           = 80 · (3/160) + 80 · (9/160) + . . . + 140 · (36/160) = 114.25.
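The following sketch reproduces these three conditional expectation values from the parameters presented in Table 1.5, computing the weights P(U=u | X=x) via Bayes' theorem as in Solution 1-14; the helper names are ours.

```python
from fractions import Fraction as F

# For each person: P(X=1 | U=u), P(X=2 | U=u), and E(Y | X=x, U=u) for x = 0, 1, 2 (Table 1.5);
# P(U=u) = 1/8 for every person.
table = {"Tom": (F(10, 60), F(3, 60),  120, 100,  80),
         "Tim": (F(18, 60), F(9, 60),  120, 100,  80),
         "Joe": (F(26, 60), F(17, 60),  90,  90,  70),
         "Jim": (F(26, 60), F(17, 60), 100, 100,  80),
         "Ann": (F(26, 60), F(17, 60), 120, 100, 100),
         "Eva": (F(26, 60), F(17, 60), 130, 110, 110),
         "Sue": (F(12, 60), F(44, 60),  60, 100, 140),
         "Mia": (F(16, 60), F(36, 60),  60, 100, 140)}

def p_x_given_u(x, row):
    """Conditional treatment probability P(X=x | U=u) for the given table row."""
    p1, p2 = row[0], row[1]
    return (1 - p1 - p2, p1, p2)[x]

def e_y_given_x(x):
    """E(Y | X=x) = sum over u of E(Y | X=x, U=u) * P(U=u | X=x)."""
    p_x = sum(p_x_given_u(x, row) * F(1, 8) for row in table.values())
    return sum(row[2 + x] * p_x_given_u(x, row) * F(1, 8) / p_x for row in table.values())

print([float(e_y_given_x(x)) for x in (0, 1, 2)])  # [111.25, 100.0, 114.25]
```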
⊲ Solution 1-17 No solution provided. Just compare your results to the parameters presented in
Table 1.7.
Chapter 2
Some Typical Kinds of Random Experiments
Overview
We start with the single-unit trial of simple experiments and then treat increasingly more
complex ones introducing additional design features. Specifically, we will introduce the
single-unit trials of experiments and quasi-experiments with fallible covariates, a multi-
factorial design with more than one treatment, multilevel experiments and quasi-experi-
ments, and experiments and quasi-experiments with latent covariates and/or outcome
variables.
We also discuss different kinds of random variables that will play a crucial role in the
chapters to come. Among these random variables are the observational-unit variable or
person variable, manifest and latent covariates, treatment variables, as well as manifest
and latent outcome variables. In this chapter, we confine ourselves to an informal descrip-
tion of single-unit trials and the random variables involved, preparing the stage for their
mathematical representations in the subsequent chapters.
2.1 Simple Experiments

As a first class of random experiments we consider the single-unit trials of simple exper-
iments and quasi-experiments. Such single-unit trials are experiments and quasi-experi-
ments in which no fallible covariates are assessed, that is, no covariates whose values con-
sist in part of measurement error. Such a single-unit trial consists of:
(a) sampling an observational unit u (e. g., a person) from a set of units,
(b) assigning the unit or observing its assignment to one of several experimental con-
ditions (represented by the value x of the treatment variable X ),
(c) recording the value y of the outcome (or response) variable Y .
Figure 2.1 displays a tree representation of the set of possible outcomes of this single-
unit trial. Note that this is the kind of random experiment we considered in the Joe-Ann
example presented in section 1.1 and in the two-factorial design example treated in sec-
tion 1.2. The random variables X (treatment), Y (success), and Z (status), the conditional
expectation values E (Y |X =x ) and E (Y |X =x ,Z =z), as well as the probabilities P (X =x ),
P (Z =z), P (X =x , Z =z) all referred to such a single-unit trial. Of course, all these condi-
tional expectation values and probabilities are unknown in empirical applications. Never-
theless, they are among the parameters that determine the outcome of a single-unit trial,
just in the same way as the probability of heads determines the outcome of flipping a coin.
In order to illustrate this point, imagine flipping a deformed coin that has the shape of a
Chinese wok, and suppose that in this case the probability of flipping heads is .8 instead of
.5. Although this probability does not deterministically determine the outcome of flipping
the coin, it stochastically determines the outcome.
In fact, we may consider the single-unit trial of (a) sampling a coin u from a set of coins,
(b) forming (X =1) or not forming (X =0) a wok out of it, and (c) observing whether (Y =1)
or not (Y = 0) we flip heads. In this single-unit trial, the difference .8 − .5 = .3 would be the
causal total effect of the treatment variable X on the outcome variable Y . Note that the
probabilities .8 and .5 and their difference .3 refer to this single-unit trial, although these
probabilities can only be estimated if we conduct many of these single-unit trials, that is,
if we draw a data sample. However, if these probabilities were known, we could dispense
with a sample (including the data that would result from drawing it), and still have a per-
fect theory and prediction for the outcome of such a single-unit trial (see Exercise 2-1).
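To illustrate in what sense these probabilities 'stochastically determine' the outcome, one can simulate many replications of this single-unit trial. The sketch below assumes, purely for illustration, that a sampled coin is formed into a wok with probability .5; the success probabilities .8 and .5 are those assumed in the text, and the estimated difference approaches the causal total effect .3 only as the number of simulated trials grows.

```python
import random

def single_unit_trial(rng):
    """One trial: flip a fair 'assignment' coin for X, then flip the (possibly deformed) coin for Y."""
    x = 1 if rng.random() < 0.5 else 0   # X=1: the coin has been formed into a wok (assumed probability .5)
    p_heads = 0.8 if x == 1 else 0.5     # P(Y=1 | X=x), as assumed in the text
    y = 1 if rng.random() < p_heads else 0
    return x, y

rng = random.Random(1)
trials = [single_unit_trial(rng) for _ in range(100_000)]
p1 = sum(y for x, y in trials if x == 1) / sum(1 for x, y in trials if x == 1)
p0 = sum(y for x, y in trials if x == 0) / sum(1 for x, y in trials if x == 0)
print(round(p1 - p0, 3))  # close to the causal total effect .8 - .5 = .3
```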
[Figure 2.1: Tree representation of the set of possible outcomes of the single-unit trial: a unit u is sampled, assigned to control or treatment, and a value y of the outcome variable is recorded.]
Sampling a Unit
The first part of this single-unit trial consists of sampling an observational unit. In the so-
cial sciences, units often are persons, but they might be groups, school classes, schools
and even countries. Usually such units change over time. Therefore, it should be em-
phasized that, in simple experiments and quasi-experiments, we are talking about the
units at the onset of treatment . Later we will see that we have to distinguish between units
at the onset of treatment and units at the time of assessment of the outcome variable, which
might be months or even years later (for details see Steyer, Mayer, Geiser, & Cole, 2015).
In a single-unit trial of simple experiments and quasi-experiments, the units can be rep-
resented by the observational-unit variable U , whose possible values u are the units at the
onset of treatment.
Note that the unit at the onset of treatment also comprises his or her experiences in the year
and/or on the day before treatment, as well as the psycho-bio-social situation in which he
or she is at the onset of treatment. Both the experiences and the situation already hap-
pened before the onset of treatment (see again Steyer et al., 2015 for more details). Therefore,
they are attributes of the observational units u. They can be treated in the same way as
other attributes such as sex and educational status. However, if these attributes are actu-
ally assessed and if this assessment is fallible, then we have to distinguish between these
attributes and their fallible assessments (see sect. 2.2).
Treatment Variable
In a randomized experiment, the sampled unit is assigned to one of the possible treatments with a probability that is fixed by the experimenter. In
contrast, in a quasi-experiment we just observe selection (e. g., self-selection or selec-
tion by someone else) to one of the treatment conditions. In the simplest case there are at
least two treatment conditions, for example, treatment and control. These treatment con-
ditions are the possible values of the treatment variable X . For simplicity, we use the values
0, 1, . . . , J to represent J +1 treatment conditions. Furthermore, unless stated otherwise, we
presume that treatment assignment and actual exposure to treatment are equivalent, that
is, we assume that there is perfect compliance.
Selection of a unit into one of the treatment conditions x may happen with unknown
probabilities. This is the case, for example, in self-selection or assignment by an unknown
physician. In this case we talk about a quasi-experiment. However, assignment can also
be done with known probabilities that are equal for different units such as in the sim-
ple randomized experiment or with known probabilities that may be unequal for different
units such as in the conditionally randomized experiment . In this case, these treatment
probabilities may also depend on a covariate Z representing pre-treatment attributes of
the units. As mentioned above, conditional and unconditional randomized assignment
distinguish the true experiment from the quasi-experiment, in which the assignment prob-
abilities are unknown. (See Remarks 8.58 and 8.59 for more details on randomization and
conditional randomization.)
In simple experiments and quasi-experiments, the focus is usually on total treatment ef-
fects on an outcome variable. Hence, if we are interested in the treatment variable as a
cause, then each attribute of the observational units is a potential confounder. Examples
are sex, race, educational status, and socio-economic status. Once the unit is drawn, its sex,
race, educational status, and socio-economic status are fixed. This means that there is no
additional sampling process associated with assessing these potential confounders. This
is also the reason why they do not appear in points (a) to (c) describing the single-unit
trial.
A potential confounder is also called a covariate if it is actually assessed and used to-
gether with X in a conditional expectation or a conditional distribution. Note that a po-
tential confounder can also be unobserved, and in this case we usually do not call it a
covariate.
Because potential confounders represent attributes of the unit at the onset of treatment
they can never be affected by the treatment. However, there can be (stochastic) dependen-
cies between the treatment variable and potential confounders. In the Joe-Ann example
treated in sect. 1.1, for instance, there is a stochastic dependence between the treatment
variable X and the person variable U . Similarly, in the second example presented in sec-
tion 1.2 there is a stochastic dependence between Z (status) and the treatment variable
X.
Note that the observational-unit variable U and the U -conditional treatment probability
P (X =x |U ) (see RS-Def. 4.4) are potential confounders as well. The values of P (X =x |U )
are the conditional probabilities P (X =x |U =u ), which are attributes of the persons u [see
Eq. (1.1)]. Similarly, the Z -conditional treatment probability P (X =x | Z ) is also a potential
confounder provided that Z is a covariate [see Def. 4.11 (iv) and Rem. 4.16]. Furthermore,
the assignment to treatment x with values ‘yes’ and ‘no’ is also a potential confounder if
assignment to treatment and exposure to treatment (again with values ‘yes’ and ‘no’) are
not identical and exposure to treatment is focused as a (putative) cause. This distinction
is useful in experiments with non-compliance (see, e. g., Jo, 2002a, 2002b, 2002c; Jo et al.,
2008).
Outcome Variable
Of course, the outcome variable Y refers to a time at which the treatment might have had
its impact. Hence, treatment variables are always prior to the outcome variable. In prin-
ciple, we may also observe several outcome variables, for example, in order to study how
the effects of a treatment grow or decline over time or to study treatment effects that are
not confined to a single outcome variable. All random variables mentioned above refer to
a concrete single-unit trial and they have a joint distribution. Each combination of unit,
treatment condition, and score of the outcome variable may be an observed result of such
a single-unit trial. This implies that the variables U , Z , X , and Y , as well as unobserved
potential confounders, say W , have a joint distribution (see RS-Def. 2.38 and SN-section
5.3). Once we specified the random experiment to be studied, this joint distribution is
fixed, even though it might be known only in parts or even be unknown altogether.
There are already several kinds of causal effects that can be considered in the single-unit
trial of a simple experiment or quasi-experiment. For simplicity, suppose the treatment
has just two values, say treatment and control. First, there is the causal average total effect
of treatment (compared to control) on the outcome variable Y . Second, there are the
causal conditional total treatment effects on Y , where we may condition on any function
of the observational-unit variable U . If, for example, Z := sex with values m for male and
f for female, then we may consider the causal (Z =m)-conditional total treatment effect
on Y , that is, the causal average total treatment effect for males, and the causal (Z = f )-
conditional total treatment effect on Y , that is, the causal average total treatment effect
for females. Similarly, if Z := socio-economic status, we may consider the causal condi-
tional total treatment effects on Y for each status group, etc. Third, although difficult and
often impossible to estimate, we may also consider the causal individual total effect of
treatment compared to control on Y .
By definition, within a simple experiment and quasi-experiment we cannot consider
any direct treatment effects with respect to one or more specified potential mediators, that
is, the effects of the treatment on the outcome variable that are not transmitted through
specified potential mediators. However, the causal total treatment effects discussed above
are, of course, transmitted through potential mediators, irrespective of whether or not we
observe (or are aware of) these potential mediators.
2.2 Experiments With Fallible Covariates

Another class of random experiments are single-unit trials of experiments and quasi-
experiments in which we assess a fallible covariate. In this case, the fallible covariate does
not represent a (deterministic) attribute of the observational units. The single-unit trial of
such an experiment or quasi-experiment consists of:
(a) sampling an observational unit u (e. g., a person) from a set of units,
(b) assessing the values z1 , . . . , zk of the covariates (pre-treatment variables) Z 1 , . . . , Z k ,
k ≥ 1.
(c) assigning the unit or observing its assignment to one of several experimental con-
ditions (represented by the value x of the treatment variable X ),
(d) recording the value y of the outcome variable Y .
The crucial distinction between a simple (quasi-) experiment and a (quasi-) experiment
with fallible covariates is that there is variability of at least one of the covariates given the
observational unit u (see Fig. 2.2). In this case, we may distinguish between the latent
covariate, say ξ, representing the attribute to be assessed and its fallible measures, some
manifest variables that can actually be observed. (For the theory of latent variables see
Steyer et al., 2015). Also note that sometimes it is crucial to adjust the effect of X on Y by
conditioning on the latent variable ξ in order to fully adjust for the bias of the prima-facie
effect of X on Y . In these cases, only adjusting for the manifest variables that measure the
latent covariate ξ does not completely remove bias (see, e. g., Sengewald, Steiner, & Pohl,
2019).
Furthermore, this distinction also implies that the unit whose attributes are measured
at the time when the potential confounder is assessed is no longer identical to the unit
at the onset of treatment (see section 2.1).

[Figure 2.2: Tree representation of the set of possible outcomes of the single-unit trial with a fallible covariate: a unit u is sampled, a covariate value z is assessed, the unit is assigned to control or treatment, and a value y of the outcome variable is recorded.]

The covariate might be assessed some months
before the treatment is given — enough time and plenty of possibilities for the unit to
change in various ways, for example, due to maturation, learning, critical life events, and
other experiences that are not fixed yet at the time of assessing the covariate. As a con-
sequence, a variable, say W , representing such intermediate events or experiences may
also affect the outcome variable Y and the treatment variable. Hence, such intermediate
variables are also potential confounders. This is one of the reasons why we need to define
causal effects in a more general way than in the Neyman-Rubin tradition (see ch. 5).
Note that assessing a fallible covariate does not only change the interpretation of the
observational-unit variable U (now its values are the units at the time of assessment of the
manifest covariates), but it also changes the random experiment, and with it, the empiri-
cal phenomenon we are considering. Assessing a fallible covariate often means that the
sampled person fills in a questionnaire or takes a test. Assessing, prior to treatment, a fal-
lible covariate such as a test of an ability, an attitude, or a personality trait, may change the
observational units and their attributes, as well as the effects of the treatment on a spec-
ified outcome variable, which usually is related to such pre-treatment variables. This has
already been discussed by Campbell and Stanley (1963), who also recommended designs
for studying how pre-treatment assessment modifies the effects of the treatment variable
on the outcome variable.
Which are the potential confounders of the treatment variable X in such a single-unit trial?
First of all, it is each attribute of the units at the time of the assessment of the observed
covariates. This does not only include variables such as sex, race, and educational status,
but also a latent covariate, say ξ, (which might be multi-dimensional). Furthermore, aside
from the manifest covariates, each variable W representing an intermediate event or ex-
perience of the unit (occurring in between the assessment of the observed covariates and
the onset of the treatment), as well as any attribute of the unit at the onset of treatment is
a potential confounder as well, irrespective of whether or not these potential confounders
are observed.
Note that a latent covariate ξ may be considered a cause of its fallible measures Z 1 , . . . , Z k
and of the outcome variable Y . This is not in conflict with the theory that the treatment
variable X is a cause of Y as well. In this kind of single-unit trial, we have several causes and
several outcome variables, and a cause itself can be considered as an outcome variable. For
example, it would be possible to consider the treatment variable X to be causally depen-
dent on the manifest or latent covariates. In other words, we may also raise the question
if the conditional treatment probabilities P (X =1 | Z 1 , . . . , Z k ) or P (X =1 | ξ ) describe causal
dependencies. This makes clear that the terms ‘potential confounder’ and ‘covariate’ can
only be defined with respect to a focused cause.
Sampling a Unit
Treatment Variables
As a simple example, let us consider an experiment in which we study the effects — in-
cluding the joint effects — of two treatment factors, say individual therapy represented by
X (with values ‘yes’ and ‘no’) and group therapy represented by Z (with values ‘yes’ and
‘no’).
In such a two-factorial experiment, we may consider group therapy as a covariate and
individual therapy to be the cause in order to ask for the conditional and average total
effects of individual therapy given group therapy and given no group therapy. In contrast,
we may also consider individual therapy to be a covariate and group therapy to be the
focused treatment variable. Finally, we may also consider the two-dimensional variable
(X , Z ) as the cause. Which option is chosen depends on the causal effects we are interested
in (see below).
Outcome Variable
Again, the outcome variable Y refers to a time at which the treatment might have exerted
the effects to be estimated. Hence, both treatment variables are prior to the outcome vari-
able considered. And again, we may also observe several outcome variables, for example,
in order to study how effects of a treatment grow or decline over time or to study effects
that are not confined to a single outcome variable.
Causal Effects
There are several causal effects we might look at. If X and Z have only two values, then we
may be interested in the following effects on the outcome variable Y :
(a1 ) the conditional total effect of ‘individual therapy’ as compared to ‘no individual ther-
apy’ given that the unit treated also receives ‘group therapy’,
(b 1 ) the corresponding conditional total effect given that the unit does not receive ‘group
therapy’, and
(c 1 ) the average of these conditional total effects of ‘individual therapy’ as compared to
‘no individual therapy’, averaging over the two values of Z (group therapy).
Vice versa, we might also be interested in the following effects on the outcome variable Y :
(a2) the conditional total effect of ‘group therapy’ as compared to ‘no group therapy’
given that the unit treated also receives ‘individual therapy’,
(b 2) the corresponding conditional total effect given that the unit does not receive ‘indi-
vidual therapy’, and
(c 2 ) the average of these conditional total effects of ‘group therapy’ as compared to ‘no
group therapy’, averaging over the two values of X .
Finally, considering the two-dimensional variable (X , Z ) as the cause, we might also be interested in the following effects on Y :
(a3 ) the total effect of receiving ‘individual therapy’ and ‘group therapy’ as compared to
receiving none of the two treatments.
(b 3 ) the total effect of receiving ‘individual therapy’ and ‘no group therapy’ as compared
to receiving ‘group therapy’ and ‘no individual therapy’.
All these effects may answer meaningful causal questions. In fact there are even more
causal effects than those listed above. For example, we could compare each of the four
combinations of the two treatments to an average of the other treatments. Furthermore,
many additional causal effects can be considered if we condition on other covariates such
as sex or educational status.
Which are the potential confounders in multilevel designs if the treatment variable X is
considered as the cause? The answer depends on the type of design considered: In de-
signs with assignment of units to clusters, attributes of the observational unit such as sex,
race, or educational status, are potential confounders of X . Other potential confounders
are attributes of the cluster such as school type, hospital ownership, or school-level socio-
economic status or school-level intelligence. The last two kinds of potential confounders
would be defined as conditional expectations of the corresponding potential confounders
at the unit-level given the cluster variable.
In these designs, clusters may not only be considered as potential confounders, but also
as treatments, because some of the effects observed later on may depend on the compo-
sition of the group to which a particular unit, say Joe, is assigned. Receiving group therapy
together with beautiful Ann in the same group might make a great difference as compared
to getting it together with awful Joe. In designs in which clusters as a whole are assigned to
treatment conditions, only attributes of the cluster can influence the assignment. Hence,
in data analysis we would focus on controlling for the potential confounders on the cluster
level (see, e. g., Nagengast, 2009, for more details).
We may also consider single-unit trials of experiments with a latent outcome variable. The
basic goal of such experiments is to investigate the effect of the treatment variable X on a
latent outcome variable, say η. This is of interest, for example, where a quantitative out-
come variable can only be measured by qualitative observations such as solving or not
solving certain items indicating the (latent) ability. However, it can also be of interest if the
manifest measures are linearly related to the latent variable such as in models of classical
test theory (see, e. g., Steyer, 2001) or in models of latent state-trait theory (see, e. g., Steyer
et al., 2015). If, for example, there are three manifest variables Y1 , Y2, and Y3 measuring a
single latent variable η, then we may ask if there is just one single effect of the treatment
on the latent outcome variable η – which transmits these effects to the manifest variables
Y1 , Y2, and Y3 – instead of three separate effects of X on each variable Yi . Hence, the latent
variable may also be considered to be a mediator variable. Showing that all effects of X on
the variables Yi are indirect, that is, mediated by η is one of the research efforts that aims
at establishing construct validity of the latent variable η.
In the simplest case with a single latent variable, we consider the following single-unit
trial:
(a) sampling an observational unit u (e. g., a person) from a set of units,
(b) assigning the unit or observing its assignment to one of several experimental con-
ditions (represented by the value x of the treatment variable X ),
(c) recording the values y1 , . . . , yk of the manifest outcome variables Y1 , . . . , Yk measuring the
latent outcome variable η.
Which are the potential confounders in such a single-unit trial? Again, the answer depends
on the cause considered. If it is the treatment variable X , then each attribute of the unit at
the onset of treatment is a potential confounder (with respect to X ). Obviously, this again
includes variables such as sex, race, and educational status. Note that in this kind of experiment, the set of potential confounders of X is the same irrespective of the choice of the
outcome variable. Remember, we may not only consider the latent outcome variable η but
also the manifest outcome variables Yi , for example, in order to study whether or not the
effects of X on these manifest outcome variables are perfectly transmitted (or mediated)
through the latent variable η.
Choosing the latent outcome variable η as a cause of the manifest outcome variables
Yi brings additional potential confounders into play, for instance, all those variables that
are in between treatment and the assessment of η. If, for example, we consider an exper-
iment studying the effects of different teaching methods, these additional potential con-
founders are critical life events (such as father or mother leaving the family), or additional
lessons taken after treatment and before outcome assessment, for instance.
2.6 Summary and Conclusion
Note that all terms mentioned in this box are still of an informal nature. Their mathematical
specification starts in chapter 3.
Random experiment The kind of empirical phenomenon to which events, random
variables, and their dependencies refer.
Randomized experiment A random experiment in which the experimenter fixes the treat-
ment probabilities for each observational unit.
Single-unit trial A particular kind of random experiment that consists of sampling
a single unit from a set of observational units and observing the
values of one or more random variables related to this unit.
Cause A random variable. Its effect on an outcome variable is consid-
ered.
Outcome variable A random variable. Its dependency on a cause is considered.
Potential confounder If we confine the discussion to total causal effects, then it is a ran-
dom variable that is prior or simultaneous to the cause consid-
ered. It might be correlated with the cause and the outcome vari-
able.
Covariate A potential confounder that is considered together with X in a
conditional expectation or a conditional distribution.
Fallible covariate A covariate that is assessed with measurement error.
Latent covariate A covariate that is not directly observed. Instead it is defined us-
ing some parameters of the joint distribution of a set of manifest
random variables.
Intermediate variable A variable that might mediate (transmit) the effect of the cause on
the outcome variable. The cause is always prior to a potential
mediator and a potential mediator is always prior to the out-
come variable. A potential mediator is not necessarily affected
by the cause and it does not necessarily have an effect on the out-
come variable.
Mediator An intermediate variable on which X has a causal effect and
which itself has a causal effect on the outcome variable Y .
In this chapter, we described several kinds of random experiments and single-unit trials and emphasized which of the random variables considered are ‘prior’ or ‘simultaneous’ to the treatment variable, which itself is ‘prior’ to the outcome
variable. Furthermore, for each single-unit trial and each cause in such a single-unit trial,
we discussed the potential confounders involved. We emphasized that each cause consid-
ered in such a single-unit trial has its own set of potential confounders.
Outlook

The single-unit trials discussed in this chapter are just a small selection of single-unit tri-
als in which causal effects and causality of stochastic dependencies are of interest. We
might also consider single-unit trials with latent covariates and latent outcome variables
and manifest and/or latent potential mediators, but also single-unit trials with multiple
mediation. Furthermore, we could also consider single-unit trials of growth curve models
(see, e. g., Biesanz, Deeb-Sossa, Aubrecht, Bollen, & Curran, 2004; Bollen & Curran, 2006;
Meredith & Tisak, 1990; Singer & Willett, 2003; Tisak & Tisak, 2000), latent change mod-
els (see, e. g., McArdle, 2001; Steyer, Eid, & Schwenkmezger, 1997; Steyer, 2005), or cross-
lagged panel models (see, e. g., Kenny, 1975; Rogosa, 1980; Watkins, Lei, & Canivez, 2007;
Wolf, Chandler, & Spies, 1981). Causality is also an issue in uni- and multivariate time-
series analysis as well as in stochastic processes with continuous time. However, in this
book our examples will usually deal with experiments and quasi-experiments, including
latent covariates and outcome variables.
2.7 Exercises
⊲ Exercise 2-1 Imagine that the probabilities of a crash for a flight with Airline A is ten times smaller
than with Airline B. Which airline would you choose?
⊲ Exercise 2-2 Why does the theory of causal effects refer to single-unit trials?
⊲ Exercise 2-3 Why is it important to know which random experiment we are talking about?
⊲ Exercise 2-4 Which type of random experiment did we refer to in the two examples described in
chapter 1?
⊲ Exercise 2-5 Why is it important to emphasize that, in simple experiments and quasi-experiments
(see section 2.1), the observational-unit variable U represents the observational units at the onset of
treatment?
⊲ Exercise 2-6 How do we characterize a potential confounder of a cause in a single-unit trial?
⊲ Exercise 2-7 Which kinds of causal effects can be considered in the simple experiment or quasi-
experiment in which no fallible potential confounder and no potential mediator is assessed?
Solutions
⊲ Solution 2-1 If your answer is A, then you implicitly apply these probabilities to the random ex-
periment of flying once with A or B, even if these probabilities have been estimated in a sample. This
example serves to emphasize that, not only in theory but also in practice, we are mainly interested
in a single-unit trial, not in a sample consisting of many such single-unit trials, and in particular not
in what applies to sample size going to infinity. (This is how many applied statisticians try to specify
the term ‘population’.)
⊲ Solution 2-2 Within such a single-unit trial, the various concepts of causal effects can be defined
and we can study how to identify these causal effects from the parameters describing the joint dis-
tribution of the random variables considered. In such a single-unit trial, there usually is a clear time
order which helps (but is not sufficient) to disentangle the possible causal relationships between the
random variables considered.
⊲ Solution 2-3 Different random experiments are different empirical phenomena. Although the
names of the variables in different random experiments might be the same, the variables themselves
are different entities, implying that the dependencies and effects between these variables might dif-
fer between different random experiments.
⊲ Solution 2-4 The type of random experiment we refer to in these examples is the single-unit trial
of simple experiments and quasi-experiments described in section 2.1, because there is no extra
sampling of a covariate. Instead, the value of this covariate is fixed as soon as the person is sampled.
That is, the covariate is an attribute of the person.
⊲ Solution 2-5 In the social sciences, units are often persons, and persons can change over time.
If, in a simple experiment or quasi-experiment, a value u of U represents the observational unit
sampled at the onset of treatment, each potential confounder is a function of U . If, in contrast, U
represents the observational unit at the assessment of a fallible covariate (see sect. 2.2), which is
some time prior to the onset of treatment, then there can be other potential confounders in between
assessment of the fallible potential confounder and the onset of treatment. We have to consider
these additional potential confounders both in the definition of causal effects and in data analysis.
⊲ Solution 2-6 A potential confounder of a cause is a random variable that is prior or simultaneous
to the cause, at least as long as we only consider total effects. (If we also consider direct effects, then
a potential confounder can also be posterior to a cause.)
⊲ Solution 2-7 If the treatment has just two values, say treatment and control, then there are differ-
ent kinds of causal effects of the treatment variable on the outcome variable Y , such as the average
total treatment effect, the conditional total treatment effects given a value z of a covariate Z , and the
individual total effect of X on Y given an observational unit u. Aside from these treatment effects,
we may also consider the causal effects of a potential confounder Z on the treatment variable X , but
also on the outcome variable Y .
Part II

Chapter 3
Time Order
In chapter 1 we studied some examples showing that the conditional expectation values
E (Y |X =x ) of an outcome variable Y and their differences E (Y |X =x ) − E (Y |X =x ′ ), the
prima facie effects, can be seriously misleading in evaluating the causal effect of a (treat-
ment) variable X on an (outcome or response) variable Y. Hence, conditional expectation
values cannot be used offhandedly to define the causal effects in which we are interested
when we want to evaluate a treatment, an intervention, or an exposition. For the purpose
of such an evaluation, the concept of a conditional expectation value E (Y |X =x ) has two
deficits. The first one is that the terms E (Y |X =x ) and E (Y |X =x ′ ) do not necessarily de-
scribe the kind of dependency in which we are interested for the evaluation of a treatment.
This deficit is related to (causal) bias, which will be treated in chapter 6. The second deficit
is that it does not guarantee that X is prior to Y in time, which is indispensable for a dif-
ference E (Y |X =x ) − E (Y |X =x ′ ) to describe a causal effect of a value x of X compared to
another value x ′ of X .
Overview
In the present chapter, we focus on time order of sets of events and of random variables.
We introduce the concepts of a filtration and the relations prior to, simultaneous to, and
prior or simultaneous to with respect to a filtration. Note that the definitions of these rela-
tions do not involve a probability measure. Instead, they can be introduced for measurable
set systems and measurable maps. In the framework of a probability space, these relations
represent time order of sets of events and random variables, respectively. For brevity, we re-
frain from explicitly treating the corresponding relations among measurable sets (and with
it, among events).
Requirements
Reading this chapter requires that the reader is familiar with the contents of the first two
chapters of Steyer (2024). The first of these chapters deals with the concepts of probabil-
ity and conditional probability of events, including the necessary mathematical frame-
work such as a probability space (Ω, A, P ) consisting of a set Ω of possible outcomes, a
σ-algebra A of events, and a probability measure P on A. The second chapter introduces
the concepts of a random variable as a special measurable map and its distribution as a
special image measure. These chapters will be referred to as RS-chapter 1 and RS-chapter
2. The same kind of shortcut is used when referring to other parts of that book, such as
sections, definitions, theorems, remarks, or equations, for instance.
3.1 Filtration
The definition of an event in probability theory does not presume that there is a time or-
der between events, sets of events, and random variables. However, in many applications
of probability theory such a time order is important, in particular if causal interpretations
of dependencies are intended. In both examples presented in chapter 1, for instance, it
is crucial for the evaluation of the treatment that the treatment variable is prior to the
outcome variable Y. Such a time order can be defined with respect to a filtration, a funda-
mental concept of the theory of stochastic processes (see, e. g., Klenke, 2020, Def. 9.9).
Note that this concept is defined in the context of a measurable space (Ω, A ) (see RS-
Def. 1.4), which just consists of a set Ω and a σ-algebra A. No probability measure is in-
volved, and this also applies to the prior-to relations that will be introduced in section 3.2.
Hence, throughout this chapter, we do not presume that there is a probability measure
P on (Ω, A ). Nevertheless, our examples refer to random experiments, which are repre-
sented by a probability space (Ω, A, P ).
Example 3.2 [Joe and Ann With Self-Selection] In the random experiment presented in
Table 1.2, all elements ω1 , . . . , ω8 of the set of possible outcomes are listed in the first column
of this table. The set of these possible outcomes is the Cartesian product
Ω = ΩU × ΩX × ΩY
  = {ω1 , . . . , ω8 }    (3.1)
  = {(Joe , no , −), . . . , (Ann , yes , +)},
where ΩU = { Joe , Ann }, ΩX = {no , yes }, and ΩY = {−, +}. The σ-algebra on Ω is specified by
A = P (Ω), that is, A is chosen to be the power set of Ω, which is the set of all subsets of Ω,
consisting of 2⁸ = 256 elements. Finally, the probability measure P : A → [0, 1] is specified by
the assignment of the probabilities P ({ωi }) to the eight elements of Ω. These probabilities
are shown in the second column of Table 1.2. All other 248 probabilities P (A), A ∈ A, can
be computed from the probabilities P ({ωi }), i = 1, . . . , 8, because, except for the empty set,
they are unions of the elementary events {ωi } [see RS-Box 1.1 (x)].
In this example, we consider the person variable

U : Ω → ΩU ,    (3.2)

the treatment variable

X : Ω → ΩX′ ,    (3.3)

and the outcome variable

Y : Ω → ΩY′ ,    (3.4)

where X has the co-domain ΩX′ = {0, 1} and Y has the co-domain ΩY′ = {0, 1}. Table 1.2 shows the assignment of values of these random
variables to each element of Ω. Furthermore, we choose the σ-algebras AU = P (ΩU ) and
AX′ = AY′ = P ({ 0, 1}) to be the power sets of ΩU and of ΩX′ = ΩY′ , respectively. Hence, the
value space of U is (ΩU , AU ), and (ΩX′ , AX′ ) = (ΩY′ , AY′ ) is the value space of X and Y (see
RS-Def. 2.2).
Now, we specify the filtration FT = (Ft )t ∈T , T = {1, 2, 3}, in A by

F1 = σ(U ),   F2 = σ(U , X ),   F3 = σ(U , X , Y )    (3.5)

(see RS-Def. 2.12). The first of these three σ-algebras is the σ-algebra generated by U,
σ(U ) = {U ⁻¹(A′ ): A′ ∈ AU }    (3.6)
      = {U ⁻¹({Joe}), U ⁻¹({Ann}), U ⁻¹(ΩU ), U ⁻¹(Ø)},

with

U ⁻¹({Joe}) = {(Joe , no , −), (Joe , no , +), (Joe , yes , −), (Joe , yes , +)} = {ω1 , ω2 , ω3 , ω4 }    (3.7)
U ⁻¹({Ann}) = {(Ann , no , −), (Ann , no , +), (Ann , yes , −), (Ann , yes , +)} = {ω5 , ω6 , ω7 , ω8 }    (3.8)
U ⁻¹(ΩU ) = Ω    (3.9)
U ⁻¹(Ø) = Ø    (3.10)
(see Exercises 3-1 and 3-2). Hence, these sets are the four events that the person variable U takes on a value in {Joe}, in {Ann}, in ΩU , or in Ø, respectively.
The σ-algebra generated by (the bivariate random variable) (U , X ) (see RS-sect. 2.1.4) is
σ(U , X ) = {(U , X )⁻¹(A′ ): A′ ∈ AU ⊗ AX′ }    (3.11)
(see RS-Def. 1.15). In this example, this set is identical to the power set of the Cartesian
product ΩU × ΩX′ . Hence, gathering the inverse images of all elements of AU ⊗ AX′ [see
Eq. (3.11)] yields
σ(U , X ) = { {ω1 , ω2 }, {ω3 , ω4 }, {ω5 , ω6 }, {ω7 , ω8 },
             {ω1 , ω2 , ω3 , ω4 }, {ω5 , ω6 , ω7 , ω8 },
             {ω1 , ω2 , ω5 , ω6 }, {ω1 , ω2 , ω7 , ω8 },
             {ω3 , ω4 , ω5 , ω6 }, {ω3 , ω4 , ω7 , ω8 },    (3.13)
             {ω1 , ω2 , ω3 , ω4 , ω5 , ω6 }, {ω1 , ω2 , ω3 , ω4 , ω7 , ω8 },
             {ω1 , ω2 , ω5 , ω6 , ω7 , ω8 }, {ω3 , ω4 , ω5 , ω6 , ω7 , ω8 }, Ω, Ø }
(see Exercise 3-4). Comparing σ(U ) to σ(U , X ) [see Eqs. (3.7) to (3.10)] shows that all ele-
ments of σ(U ) are also elements of σ(U , X ). Hence, σ(U ) ⊂ σ(U , X ).
Finally, the σ-algebra σ(U , X , Y ) generated by (U , X , Y ) is identical to the power set of
Ω. It consists of 2⁸ = 256 elements. Because σ(U ) ⊂ σ(U , X ) and the power set consists
of all subsets of Ω, we can conclude σ(U ) ⊂ σ(U , X ) ⊂ σ(U , X , Y ). Hence, FT defined by
Equations (3.5) is a filtration in A. ⊳
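Readers who want to verify such statements by brute force can do so, because Ω is finite. The following sketch (our own illustration; the tuple encoding of the outcomes and the helper function are not part of the text) enumerates the three generated σ-algebras and checks that they form a filtration.

# Minimal sketch: generated sigma-algebras in the finite Joe-and-Ann example.
# Outcomes are encoded as triples (u, x, y); this encoding is our own choice.
from itertools import product

outcomes = list(product(["Joe", "Ann"], ["no", "yes"], ["-", "+"]))  # omega_1, ..., omega_8

def generated_sigma_algebra(variables):
    """All unions of the atoms (preimages of single value combinations) of the maps."""
    atoms = {}
    for w in outcomes:
        atoms.setdefault(tuple(v(w) for v in variables), set()).add(w)
    atoms = [frozenset(a) for a in atoms.values()]
    return {frozenset().union(*(a for a, b in zip(atoms, bits) if b))
            for bits in product([0, 1], repeat=len(atoms))}

U = lambda w: w[0]   # person variable
X = lambda w: w[1]   # treatment variable
Y = lambda w: w[2]   # outcome variable

F1 = generated_sigma_algebra((U,))
F2 = generated_sigma_algebra((U, X))
F3 = generated_sigma_algebra((U, X, Y))

print(len(F1), len(F2), len(F3))   # 4 16 256
print(F1 <= F2 <= F3)              # True: (F_t) is a filtration in the power set of Omega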
3.2 Prior-To Relations

Now we introduce time order. More precisely, we define the prior-to relation among mea-
surable set systems (sets of events), and measurable maps (random variables). For these
definitions, it suffices to refer to a measurable space (Ω, A ). In the framework of a pro-
bability space (Ω, A, P ), measurable set systems (i.e., subsets of A ) are sets of events (see
RS-Def. 1.4) and measurable maps are random variables (see RS-Def. 2.2).
Reading the following definition, note that ∃ means ‘there is’ and ∧ symbolizes the
conjunction of two propositions, that is, the logical ‘and’. Furthermore, remember that
the σ-algebra σ(X ) generated by a measurable map X on (Ω, A ) is a set system satisfying, among other things, σ(X ) ⊂ A (see RS-Def. 2.12).

Definition 3.3 [Prior-to Relation] Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C , D ⊂ A.
(i) We say that C is prior in FT to D (and D posterior in FT to C ), denoted C ≺_FT D, if the following two conditions hold:
    (a) ∃ s ∈T : C ⊂ Fs ∧ D ⊄ Fs
    (b) ∃ t ∈T : D ⊂ Ft .
(ii) Let X and Y be measurable maps on (Ω, A ). Then we say that X is prior in FT to Y (and Y posterior in FT to X ), denoted X ≺_FT Y , if σ(X ) is prior in FT to σ(Y ).
Remark 3.4 [Prior-to Relation of Measurable Sets and of Events] Let (Ω, A ) be a measurable space and {C }, {D } be set systems containing the sets C , D ∈ A as their only element. In the context of a probability space (Ω, A, P ), the sets C and D represent events. We say that C is prior in FT to D (or D posterior in FT to C ) and denote it by C ≺_FT D, if {C } is prior in FT to {D }. Using this definition, all propositions about the prior-to relation of measurable set systems can be applied to the prior-to relation of measurable sets.
However, for brevity, we refrain from explicitly spelling out more details about the prior-to
relation of measurable sets or events. ⊳
Example 3.5 [Joe and Ann With Self-Selection] In Example 3.2, we specified the proba-
bility space (Ω, A, P ), the random variables U , X , Y, and the filtration FT = (Ft )t ∈T in A,
T = {1, 2, 3}, by
F1 = σ(U ), F2 = σ(U , X ), F3 = σ(U , X , Y ),
where σ(U ), σ(U , X ), and σ(U , X , Y ) are the σ-algebras generated by the random variables
U , (U , X ), and (U , X , Y ), respectively.
According to Definition 3.3 (i), the set system σ(U ) is prior in FT to the set system σ(X ) because σ(U ) ⊂ F1 , σ(X ) ⊄ F1 [see Def. 3.3 (i) (a)], but σ(X ) ⊂ F2 [see Def. 3.3 (i) (b)]. Note that

F2 = σ(U , X ) = σ(σ(U ) ∪ σ(X ))

[see RS-Eq. (2.16)], which implies σ(X ) ⊂ F2 . Similarly, σ(U ) is prior in FT to σ(Y ) because σ(U ) ⊂ F1 , σ(Y ) ⊄ F1 , but σ(Y ) ⊂ F3 . Again note that

F3 = σ(U , X , Y ) = σ(σ(U ) ∪ σ(X ) ∪ σ(Y ))

[see again RS-Eq. (2.16)], which implies σ(Y ) ⊂ F3 . Finally, σ(X ) is prior in FT to σ(Y ) because σ(X ) ⊂ F2 and σ(Y ) ⊄ F2 , but σ(Y ) ⊂ F3 (see Exercise 3-5).
Remember that random variables are measurable maps and that the prior-to relation of measurable maps is defined via their generated σ-algebras. Now that we have clarified the
prior-to relation of the set systems σ(U ), σ(X ), and σ(Y ), we can also conclude that U is
prior in FT to X and Y, and that X is prior in FT to Y [see Def. 3.3 (ii)]. ⊳
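Conditions (a) and (b) of Definition 3.3 (i) can also be checked mechanically in this finite example. The following sketch is our own illustration; the encoding of the outcomes and the helper functions are assumptions of the sketch, not notation of the text.

# Minimal sketch of the prior-to relation of Definition 3.3 (i) in the finite
# Joe-and-Ann example; the representation of outcomes and events is our own.
from itertools import product

outcomes = list(product(["Joe", "Ann"], ["no", "yes"], ["-", "+"]))

def sigma(*variables):
    """Sigma-algebra generated by coordinate maps: all unions of their atoms."""
    atoms = {}
    for w in outcomes:
        atoms.setdefault(tuple(v(w) for v in variables), set()).add(w)
    atoms = [frozenset(a) for a in atoms.values()]
    return {frozenset().union(*(a for a, b in zip(atoms, bits) if b))
            for bits in product([0, 1], repeat=len(atoms))}

U, X, Y = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[2])
filtration = [sigma(U), sigma(U, X), sigma(U, X, Y)]   # F_1, F_2, F_3

def prior_to(C, D, F):
    """(a) some F_s contains C but not D, and (b) some F_t contains D."""
    a = any(C <= Fs and not D <= Fs for Fs in F)
    b = any(D <= Ft for Ft in F)
    return a and b

print(prior_to(sigma(U), sigma(X), filtration))   # True:  sigma(U) is prior to sigma(X)
print(prior_to(sigma(X), sigma(Y), filtration))   # True:  sigma(X) is prior to sigma(Y)
print(prior_to(sigma(Y), sigma(X), filtration))   # False: asymmetry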
Now we study some properties of the prior-to relations. First of all, let us ascertain that
the prior-to relation of measurable set systems is asymmetric and transitive. Reading this
theorem, note that ¬ denotes the negation of a proposition and ⇒ the implication between
two propositions.
(i) C ≺_FT D ⇒ ¬(D ≺_FT C )    (asymmetry)
(ii) (C ≺_FT D ∧ D ≺_FT E ) ⇒ C ≺_FT E .    (transitivity)

(Proof p. 70)
Hence, according to Proposition (i) of Theorem 3.6, the set system C being prior in FT
to the set system D implies that D is not prior in FT to C . This property of the prior-to
relation is called asymmetry. Furthermore, according to Proposition (ii) of Theorem 3.6,
the prior-to relation is transitive. That is, if C is prior in FT to D that itself is prior in FT to
E , then C is prior in FT to E .
The following remark prepares another important property of the prior-to relation of
measurable set systems.
Remark 3.7 [Some Propositions About Subsets] Some general properties of subsets are:
(A ⊂ B ∧ A ⊂ C ) ⇔ A ⊂ (B ∩ C ) (3.14)
(A ⊂ C ∧ B ⊂ C ) ⇔ (A ∪ B) ⊂ C (3.15)
and
(A ⊂ B ∧ B ⊂ C ) ⇒ A ⊂C (3.16)
(see Exercises 3-6 to 3-8). Hence, according to Proposition (3.16), if (Ω, A ) is a measurable
space and FT = (Ft )t ∈T a filtration in A, then:
(C0 ⊂ C ∧ C ⊂ Ft ) ⇒ C0 ⊂ Ft .    (3.17)

C ≺_FT D ⇒ C0 ≺_FT D,   if C0 ⊂ C .    (3.18)

(Proof p. 70)
Now we turn to some properties of the prior-to relation of measurable set systems
that involve σ-algebras generated by set systems. Let (Ω, A ) be a measurable space,
FT = (Ft )t ∈T a filtration in A, and C ⊂ A. Because each Ft , t ∈T , is a σ-algebra,
∀ t ∈T : C ⊂ Ft ⇔ σ(C ) ⊂ Ft (3.19)
holds for the σ-algebra σ(C ) generated by C [see RS-Def. 1.7 and RS-Prop. (1.5)]. This
proposition is used in the proof of the following theorem.
(i) C ≺_FT D ⇔ C ≺_FT σ(D)
(ii) C ≺_FT D ⇔ σ(C ) ≺_FT D.

(Proof p. 70)
Hence, according to Proposition (i) of Theorem 3.9, the set system C being prior in FT
to the set system D is equivalent to C being prior in FT to the σ-algebra σ(D) generated by
D. And, according to Proposition (ii) of this theorem, C being prior in FT to D is equivalent
to σ(C ) being prior in FT to D.
Remark 3.10 [The σ-Algebra Generated by the Union of Two Set Systems] Now we turn
to some properties of the prior-to relation of set systems involving the σ-algebra generated
by the union of two set systems. This is of interest because
σ(X , Y ) = σ(σ(X ) ∪ σ(Y ))    (3.20)
(see RS-Lem. 2.15). According to this equation, the union of the σ-algebras σ(X ) and σ(Y )
generated by the measurable maps X and Y on a measurable space (Ω, A ) generates the
σ-algebra generated by the bivariate measurable map (X , Y ) (see RS-sect. 2.1.4).
Note that, if C is a σ-algebra on Ω, then
σ(X ) ∪ σ(Y ) ⊂ C ⇔ σ(σ(X ) ∪ σ(Y )) ⊂ C    (3.21)
(see RS-Rem. 1.9). Therefore, and because a filtration FT = (Ft )t ∈T is a family of σ-algebras,
all properties of the prior-to relation of set systems involving the σ-algebra generated by the
union of two set systems can be translated to corresponding properties involving a bivariate
measurable map or a bivariate random variable (see RS-sect. 2.1.4). ⊳
C , D ≺_FT E :⇔ (C ≺_FT E ∧ D ≺_FT E ).    (3.22)

(i) (C ≺_FT D ∧ ∃ t ∈T : E ⊂ Ft ) ⇒ C ≺_FT σ(D ∪ E )
(ii) C , D ≺_FT E ⇔ σ(C ∪ D) ≺_FT E .

(Proof p. 71)
Remark 3.12 [A Special Case] Note that C ≺_FT E implies the existence of a t ∈T with E ⊂ Ft [see Prop. (b) of Def. 3.3 (i)]. Hence,

C ≺_FT D, E ⇒ C ≺_FT σ(D ∪ E )    (3.23)

is an immediate implication of Theorem 3.11 (i). In this proposition, the premise is a shortcut for C ≺_FT D ∧ C ≺_FT E . Hence, if C is prior in FT to D and to E , then C is also prior in FT to the σ-algebra generated by the union of D and E . ⊳
Example 3.14 [Joe and Ann With Self-Selection] In Example 3.2, we already specified the
probability space (Ω, A, P ), the random variables U , X , Y, and the filtration FT = (Ft )t ∈T
in A, T = {1, 2, 3}, by

F1 = σ(U ),   F2 = σ(U , X ),   F3 = σ(U , X , Y ).

Hence,

σ(U ) ≺_FT σ(X ), σ(Y ), σ(X , Y ), σ(U , X , Y ),

because σ(U ) ⊂ F1 and σ(X ), σ(Y ), σ(X , Y ), σ(U , X , Y ) ⊄ F1 , but σ(X ) ⊂ F2 and σ(Y ), σ(X , Y ), σ(U , X , Y ) ⊂ F3 [see Def. 3.3 (i)]. That is, σ(U ) is prior in FT to σ(X ), prior to σ(Y ), prior to σ(X , Y ), and prior to σ(U , X , Y ).
Finally,

σ(X ) ≺_FT σ(Y ), σ(X , Y ), σ(U , X , Y ),

because σ(X ) ⊂ F2 and σ(Y ), σ(X , Y ), σ(U , X , Y ) ⊄ F2 , but σ(Y ), σ(X , Y ), σ(U , X , Y ) ⊂ F3 [see again Def. 3.3 (i)]. Hence, σ(X ) is prior in FT to σ(Y ), prior to σ(X , Y ), and prior to σ(U , X , Y ). ⊳
Now we will translate the properties of the prior-to relation of measurable set systems to
the prior-to relation of measurable maps. For a full understanding, the following remarks
on the measurability of a composition are important.
Remark 3.15 [Composition of Two Maps] Let X : Ω → ΩX′ and g : ΩX′ → Ωg′ be maps. Then
a map W : Ω → Ωg′ is called the composition of X and g , denoted g ◦ X or g (X ) if
W (ω) = g (X (ω)),   ∀ ω ∈ Ω .    (3.25)
The following lemma is useful whenever we deal with the composition of two measur-
able maps. (For a proof, see RS-Theorem 2.11, RS-Definition 2.12, RS-Remark 2.13, and
Lemma 2.34).
Now we turn to properties of the prior-to relation of measurable maps; the most im-
portant ones are gathered in Box 3.1. Because random variables on a probability space
(Ω, A, P ) are measurable maps on the measurable space (Ω, A ), all propositions about
measurable maps in this box and in the present section also hold for random variables.
Box 3.1 starts repeating the definition of the prior-to relation of measurable maps. This
definition is based on the framework of a measurable space (Ω, A ) and a filtration FT in A.
Proposition (i) of Box 3.1 is the asymmetry property of the prior-to relation of measurable
maps. According to this proposition, if X is prior in FT to Y, then Y is not prior in FT to X .
This proposition immediately follows from Theorem 3.6 (i) because the prior-to relation
of measurable maps is defined via their generated σ-algebras [see Def. 3.3 (ii)].
According to Proposition (ii) of Box 3.1, X is prior in FT to Y if and only if X is prior in
FT to (X , Y ), where (X , Y ) denotes the bivariate measurable map on (Ω, A ) that consists of
X and Y (for more details see RS-sect. 2.1.4). This proposition is an immediate implication
of Theorem 3.13 and Definition 3.3 (ii).
According to Propositions (iii) and (iv) of this box, if W is X -measurable, then X being
prior in FT to Y implies that W is also prior in FT to Y and to the bivariate map (X , Y ).
Proposition (iii) is an immediate implication of Theorem 3.8, whereas Proposition (iv) fol-
lows from Theorem 3.13, Proposition (3.20), and Theorem 3.8 [see Exercise 3-11].
Now we turn to Propositions (v) and (vi) of Box 3.1, which involve an additional measur-
able map Z on the measurable space (Ω, A ). We prove these propositions in the following
theorem. In this theorem, we use the notation
Box 3.1

Let X and Y be measurable maps on a measurable space (Ω, A ), let FT = (Ft )t ∈T be a filtration in A, and let σ(X ) and σ(Y ) denote the σ-algebras generated by X and Y, respectively. Then we say that X is prior in FT to Y (and Y posterior in FT to X ), denoted X ≺_FT Y , if the following two conditions hold:
(a) ∃ s ∈T : σ(X ) ⊂ Fs ∧ σ(Y ) ⊄ Fs
(b) ∃ t ∈T : σ(Y ) ⊂ Ft .
A first property is

X ≺_FT Y ⇒ ¬(Y ≺_FT X ).    (asymmetry) (i)

Additionally, let (X , Y ) denote the bivariate measurable map consisting of X and Y. Then:

X ≺_FT Y ⇔ X ≺_FT (X , Y ).    (ii)

Additionally, let also Z be a measurable map on (Ω, A ), let (Y , Z ) denote the bivariate map consisting of Y and Z , and σ(X , Y ) the σ-algebra generated by (X , Y ). Then:

X , Y ≺_FT Z :⇔ (X ≺_FT Z ∧ Y ≺_FT Z ).    (3.26)
Hence, according to Proposition (v) of Box 3.1, if X and Y are prior in FT to Z , then
each (X , Y )-measurable map W is prior in FT to Z as well. Note that each X -measurable map W is also (X , Y )-measurable, that is,

σ(W ) ⊂ σ(X ) ⇒ σ(W ) ⊂ σ(X , Y ).    (3.27)
Remark 3.20 [Two Special Cases] For W =X , Proposition (ii) of Theorem 3.19 yields
(X ≺_FT Y ∧ Y ≺_FT Z ) ⇒ X ≺_FT Z .    (transitivity) (3.28)

(X ≺_FT Y ∧ Y ≺_FT Z ) ⇒ (X , Y ) ≺_FT Z .    (3.29)
Now we turn to Proposition (vii) of Box 3.1, which is proved in the following theorem.
(X ≺_FT Y ∧ ∃ t ∈T : σ(Z ) ⊂ Ft ) ⇒ X ≺_FT (Y , Z ).    (3.31)
Example 3.23 [Joe and Ann With Self-Selection] In Example 3.14, we already showed that

U ≺_FT X , Y , (X , Y ), (U , X , Y )    (3.32)

and

X ≺_FT Y , (X , Y ), (U , X , Y ),    (3.33)

that is, X is prior in FT to Y, prior in FT to the bivariate random variable (X , Y ), and prior in FT to the trivariate random variable (U , X , Y ).
In order to illustrate some propositions of Box 3.1, consider the indicator 1U = Joe of the
event that Joe is sampled. Because 1U = Joe is U -measurable, according to Proposition (3.32)
and Box 3.1 (iii),
1U=Joe ≺_FT X , Y , (X , Y ), (U , X , Y ).    (3.34)

1U=Joe ≺_FT Y , (X , Y ), (U , X , Y ).    (3.35)
Finally, consider the product 1U=Joe · X , which is the indicator of the event that Joe is sampled and treated. (Note that, in this example, X is an indicator, too.) This indicator is (U , X )-measurable, that is, σ(1U=Joe · X ) ⊂ σ(U , X ). Hence, according to (3.32), (3.33), and Box 3.1 (v), the random variable 1U=Joe · X is prior in FT to Y. ⊳
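As a small check of the measurability argument just used, the following sketch (our own illustration, with an arbitrary encoding of the outcomes) verifies that the σ-algebra generated by 1U=Joe · X is indeed contained in the σ-algebra generated by (U , X ) in this finite example.

# Minimal sketch: the indicator of "Joe is sampled and treated" is (U, X)-measurable.
from itertools import product

outcomes = list(product(["Joe", "Ann"], ["no", "yes"], ["-", "+"]))

def sigma(f):
    """Sigma-algebra generated by a single map f on the finite outcome set."""
    atoms = {}
    for w in outcomes:
        atoms.setdefault(f(w), set()).add(w)
    atoms = [frozenset(a) for a in atoms.values()]
    return {frozenset().union(*(a for a, b in zip(atoms, bits) if b))
            for bits in product([0, 1], repeat=len(atoms))}

indicator = lambda w: int(w[0] == "Joe" and w[1] == "yes")   # 1_{U=Joe} * X
UX = lambda w: (w[0], w[1])                                  # the bivariate map (U, X)

print(sigma(indicator) <= sigma(UX))   # True: sigma(1_{U=Joe} * X) is a subset of sigma(U, X)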
3.3 Simultaneous-to Relations

Definition 3.24 [Simultaneous-to Relation] Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C , D ⊂ A.
(i) We say that C and D are simultaneous in (or, with respect to) FT , denoted C ≈_FT D, if the following two conditions hold:
    (a) ∃ t ∈T : C ⊂ Ft
    (b) ∀ t ∈T : C ⊂ Ft ⇔ D ⊂ Ft .
(ii) Let X and Y be measurable maps on (Ω, A ). Then we say that X and Y are simultaneous in FT , denoted X ≈_FT Y , if σ(X ) and σ(Y ) are simultaneous in FT .
The simultaneous-to relation always refers to a measurable space (Ω, A ) and a fil-
tration FT = (Ft )t ∈T in A. For C and D to be simultaneous in FT , or, synonymously,
for C to be simultaneous in FT to D , we require that there is an element t ∈ T such that
the set system C is a subset of Ft [see Def. 3.24 (i) (a)]. In this case we also say that
C is in the filtration FT . Furthermore, in condition (b) of this definition, we require that
C is a subset of Ft if and only if D is a subset of Ft , for all t ∈T . Note that the conjunc-
tion of (a) and (b) implies that D is in the filtration FT as well, that is, there is a t ∈T such
that D ⊂ Ft . Note again that, in the context of a probability space (Ω, A, P ), a measurable
map is a random variable on the probability space (Ω, A, P ), and measurable set systems
C, D ⊂ A are sets of events.
Remark 3.25 [Simultaneous-to Relation of Measurable Sets and of Events] Let (Ω, A ) be a measurable space and {C }, {D } set systems containing the sets C , D ∈ A as their only element. We say that C and D are simultaneous in FT , denoted C ≈_FT D, if {C } and {D } are
simultaneous in FT . Using this definition, all propositions about the simultaneous-to re-
lation of measurable set systems are easily translated to the simultaneous-to relation of
measurable sets. Again remember, in the context of a probability space (Ω, A, P ), the sets C
and D represent events. Again, for brevity, we will not treat any details of the simultaneous-
to relation of measurable sets. ⊳
Example 3.26 [Joe and Ann With Self-Selection] In Example 3.2, we already specified the
probability space (Ω, A, P ), the random variables U , X , Y, and the filtration FT = (Ft )t ∈T .
In this example, the person variable U and the event {Joe} × ΩX × ΩY that Joe is sampled are simultaneous in FT . This is because the set system {{Joe} × ΩX × ΩY } is a subset of the σ-algebra generated by U and because the first σ-algebra F1 in the filtration FT has been defined to be the σ-algebra generated by U . Hence, conditions (a) and (b) of Definition 3.24 (i) hold for C = σ(U ) and D = {{Joe} × ΩX × ΩY }. ⊳
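The two conditions of Definition 3.24 (i) can again be verified by enumeration. The sketch below is our own illustration; the encoding of the outcomes and the helper functions are not part of the text.

# Minimal sketch of the simultaneous-in relation of Definition 3.24 (i).
from itertools import product

outcomes = list(product(["Joe", "Ann"], ["no", "yes"], ["-", "+"]))

def sigma(*variables):
    """Sigma-algebra generated by coordinate maps: all unions of their atoms."""
    atoms = {}
    for w in outcomes:
        atoms.setdefault(tuple(v(w) for v in variables), set()).add(w)
    atoms = [frozenset(a) for a in atoms.values()]
    return {frozenset().union(*(a for a, b in zip(atoms, bits) if b))
            for bits in product([0, 1], repeat=len(atoms))}

U, X, Y = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[2])
F = [sigma(U), sigma(U, X), sigma(U, X, Y)]                  # filtration F_1, F_2, F_3

def simultaneous_in(C, D, F):
    """(a) some F_t contains C, and (b) for every F_t: C in F_t iff D in F_t."""
    return any(C <= Ft for Ft in F) and all((C <= Ft) == (D <= Ft) for Ft in F)

joe_event = frozenset(w for w in outcomes if w[0] == "Joe")  # {Joe} x Omega_X x Omega_Y
print(simultaneous_in(sigma(U), {joe_event}, F))             # True
print(simultaneous_in(sigma(U), sigma(X), F))                # False: U is prior to X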
Now we study some elementary properties of the simultaneous-to relation. First of all, we
show that this relation is reflexive, symmetric and transitive, that is, we show that it is an
equivalence relation.
Now we turn to some properties of the simultaneous-to relation involving the σ-algebras
σ(C ) and σ(C ∪ D) generated by the set systems C and C ∪ D, respectively (see RS-
Def. 1.7). The motivation is to obtain some properties of the simultaneous-to relation of
measurable maps and random variables (see again Rem. 3.10).
(i) C ≈_FT D ⇔ σ(C ) ≈_FT D
(ii) C ≈_FT D ⇒ σ(C ∪ D) ≈_FT D.
(Proof p. 73)
Hence, according to Theorem 3.28, the set systems C and D being simultaneous in FT
is equivalent to σ(C ) (i.e., the σ-algebra generated by C ) being simultaneous in FT to D
[see Prop. (i)]. Furthermore, C and D being simultaneous in FT implies that σ(C ∪D) (i.e.,
the σ-algebra generated by the union C ∪ D) is simultaneous in FT to D [see Prop. (ii)].
In the following theorem we treat a property of the simultaneous-to relation involving
a third measurable set system E ⊂ A.
Hence, if the set systems C , D and E are simultaneous in FT to each other, then the
σ-algebra generated by the union of C and D is simultaneous in FT to E .
Example 3.30 [Joe and Ann With Self-Selection] In Example 3.2, we already specified the
probability space (Ω, A, P ), the random variables U , X , Y, and the filtration FT = (Ft )t ∈T
for the random experiment presented in Table 1.2. In this example,
σ(X ) ≈_FT σ(U , X )

and

X ≈_FT (U , X )

and

Y ≈_FT (X , Y ), (U , X , Y ). ⊳
Now we treat some theorems involving the prior-to and the simultaneous-to relations.
In the first one we show that the set system C being prior in FT to the set system D implies
that σ(C ∪ D), the σ-algebra generated by the union of C and D, is simultaneous in FT to
D and that C being simultaneous in FT to D implies that C is not prior in FT to D.
(i) C ≺_FT D ⇒ σ(C ∪ D) ≈_FT D
(ii) C ≈_FT D ⇒ ¬(C ≺_FT D).
(i) C ≈_FT D ⇒ (C0 ≺_FT D ∨ C0 ≈_FT D),   if C0 ⊂ σ(C ∪ D)
(ii) C ≺_FT D ⇒ (C0 ≺_FT D ∨ C0 ≈_FT D),   if C0 ⊂ σ(C ∪ D).

(Proof p. 74)
Hence, according to the two propositions of this theorem, if the set system C is prior or
simultaneous in FT to the set system D and C 0 is a subset of σ(C ∪ D), then C 0 is prior or
simultaneous in FT to D.
Finally, we show: If C is simultaneous in FT to D that itself is prior in FT to E , then
C 0 is prior in FT to E , provided that C 0 is a subset of σ(C ∪ D). Furthermore, if C is prior
in FT to D that itself is simultaneous in FT to E , and C 0 is a subset of C, then we can
conclude that C 0 is prior in FT to E .
(i) (C ≈_FT D ∧ D ≺_FT E ) ⇒ C0 ≺_FT E ,   if C0 ⊂ σ(C ∪ D)
(ii) (C ≺_FT D ∧ D ≈_FT E ) ⇒ C0 ≺_FT E ,   if C0 ⊂ C .

(Proof p. 75)
In the following remark, we consider the special case of Theorem 3.33 in which C 0 =C.
Remark 3.34 [A Special Case] Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration
in A, and C, D, E ⊂ A. Then:
(i) (C ≈_FT D ∧ D ≺_FT E ) ⇒ C ≺_FT E
(ii) (C ≺_FT D ∧ D ≈_FT E ) ⇒ C ≺_FT E .
Hence, if the set system C is simultaneous in FT to the set system D that itself is prior
in FT to the set system E , then C is prior in FT to E as well. Furthermore, if C is prior in
FT to D and D is simultaneous in FT to E , then C is also prior in FT to E . ⊳
Now we turn to properties of the simultaneous-to relation of measurable maps, the most
important of which are gathered in Box 3.2. Again remember, because random variables
on a probability space (Ω, A, P ) are measurable maps on the measurable space (Ω, A ), all
propositions about the simultaneous-to relation of measurable maps in this box and in
this section also hold for random variables.
Box 3.2 starts repeating the definition of the simultaneous-to relation of measurable
maps. Propositions (i) (reflexivity) and (ii) (symmetry) of Box 3.2 immediately follow from
Propositions (i) and (ii) of Theorem 3.27 because the simultaneous-to relation of measur-
able maps is defined via their generated σ-algebras [see Def. 3.24 (ii)].
Proposition (iii) of Box 3.2 is an immediate implication of Theorem 3.31 (ii) because
the simultaneous-to and prior-to relations of measurable maps are defined via the corre-
sponding relation of their generated σ-algebras [see Defs. 3.3 (ii) and 3.24 (ii)]. According
to this proposition, if X is simultaneous in FT to Y, then X is not prior in FT to Y, nor is Y
prior in FT to X (which follows from symmetry).
According to Propositions (iv) and (v) of Box 3.2, X being simultaneous or prior in FT
to Y implies that the bivariate map (X , Y ) is also simultaneous in FT to Y. The first of these
propositions immediately follows from Theorem 3.28 (ii), Definition 3.24 (ii), and Equation
(3.20), the second from Theorem 3.31 (i), Definition 3.24 (ii), and Equation (3.20).
Proposition (vi) of Box 3.2 is called transitivity of the simultaneous-to relation of mea-
surable maps. According to this proposition, if X is simultaneous in FT to Y and Y simul-
taneous in FT to Z , then X is also simultaneous in FT to Z . This proposition immediately
follows from Theorem 3.27 (iii), Definition 3.24 (ii), and Equation (3.20).
According to Proposition (vii) of Box 3.2, X being simultaneous in FT to Y and Y being
simultaneous in FT to Z implies that the bivariate map (X , Y ) is also simultaneous in FT
to Z . This proposition immediately follows from Theorem 3.29, Definition 3.24 (ii), and
Equation (3.20). Symmetry immediately yields
(X ≈_FT Y ∧ Y ≈_FT Z ) ⇒ (X , Z ) ≈_FT Y .    (3.37)
According to Propositions (viii) and (ix) of Box 3.2, X being prior or simultaneous in
FT to Y implies that each (X , Y )-measurable map W is prior or simultaneous in FT to Y
(see again Rem. 3.15 to Example 3.18). These two propositions are proved in the following
theorem.
Box 3.2

Let X and Y be measurable maps on a measurable space (Ω, A ), let FT = (Ft )t ∈T be a filtration in A, and let σ(X ) and σ(Y ) denote the σ-algebras generated by X and Y, respectively. Then we say that X is simultaneous to Y in FT , denoted X ≈_FT Y , if the following two conditions hold:
(a) ∃ t ∈T : σ(X ) ⊂ Ft
(b) ∀ t ∈T : σ(X ) ⊂ Ft ⇔ σ(Y ) ⊂ Ft .
Additionally, let (X , Y ) denote the bivariate measurable map consisting of X and Y. Then:

X ≈_FT Y ⇒ (X , Y ) ≈_FT Y    (iv)
X ≺_FT Y ⇒ (X , Y ) ≈_FT Y .    (v)

(X ≈_FT Y ∧ Y ≈_FT Z ) ⇒ X ≈_FT Z    (transitivity) (vi)
(X ≈_FT Y ∧ Y ≈_FT Z ) ⇒ (X , Y ) ≈_FT Z .    (vii)
(i) X ≺_FT Y ⇒ (W ≺_FT Y ∨ W ≈_FT Y )
(ii) X ≈_FT Y ⇒ (W ≺_FT Y ∨ W ≈_FT Y ).

(Proof p. 76)
Remark 3.36 [Two Special Cases] For W =X , Proposition (x) of this box yields
(X ≈_FT Y ∧ Y ≺_FT Z ) ⇒ X ≺_FT Z .    (3.38)
3.4 Prior-or-Simultaneous-to Relations

(i) ∃ t ∈T : C ⊂ Ft ⇒ C ≼_FT C    (reflexivity)
(ii) (C ≼_FT D ∧ D ≼_FT C ) ⇒ C ≈_FT D    (pseudo-antisymmetry)
(iii) (∃ s, t ∈T : C ⊂ Fs ∧ D ⊂ Ft ) ⇒ (C ≼_FT D ∨ D ≼_FT C ).    (linearity)
(iv) (C ≼_FT D ∧ D ≼_FT E ) ⇒ C ≼_FT E .    (transitivity)

(Proof p. 76)
Hence, according to Proposition (i) of Theorem 3.39, if the set system C is in the filtra-
tion, then it is prior or simultaneous in FT to itself. Furthermore, if C is prior or simulta-
neous in FT to the set system D that itself is prior or simultaneous in FT to C , then we
can conclude that C and D are simultaneous in FT to each other [see Prop. (ii)]. Further-
more, if C and D are in the filtration FT , then C is prior or simultaneous in FT to D or D
is prior or simultaneous in FT to C [see Prop. (iii)]. Finally, according to Proposition (iv),
if C is prior or simultaneous in FT to D that itself is prior or simultaneous in FT to the set
system E , then C is prior or simultaneous in FT to E .
Now we treat an implication of a set system C being prior or simultaneous in a filtration
FT to a set system D for a subset C 0 of σ(C ∪ D), the σ-algebra generated by the union of
the sets C and D.
∃ t ∈T : C ⊂ Ft ⇒ C0 ≼_FT C ,   if C0 ⊂ C .    (3.41)

(Proof p. 78)
According to this theorem, if the set system C is prior or simultaneous in FT to the set
system D, then each subset C 0 of the σ-algebra σ(C ∪ D) generated by the union of C and
D is prior or simultaneous in FT to σ(C ∪ D).
Remark 3.43 [A Special Case] In the special case C 0 =C, Proposition (3.42) yields
C , D ≼_FT E :⇔ (C ≼_FT E ∧ D ≼_FT E ).    (3.44)
(i) C ≼_FT D ⇔ C ≼_FT σ(D)
(ii) C ≼_FT D ⇔ σ(C ) ≼_FT D.

(Proof p. 79)
Box 3.3

Let X and Y be measurable maps on a measurable space (Ω, A ), let FT = (Ft )t ∈T be a filtration in A, and let σ(X ) and σ(Y ) denote the σ-algebras generated by X and Y, respectively. Then we say that X is prior or simultaneous to Y in FT , denoted X ≼_FT Y , if X ≺_FT Y or X ≈_FT Y.

Two first properties are:

(X ≼_FT Y ∧ Y ≼_FT X ) ⇒ X ≈_FT Y    (pseudo-antisymmetry) (i)
(∃ s, t ∈T : σ(X ) ⊂ Fs ∧ σ(Y ) ⊂ Ft ) ⇒ (X ≼_FT Y ∨ Y ≼_FT X ).    (linearity) (ii)
Now we turn to the prior-or-simultaneous-to relation of measurable maps, the most im-
portant properties of which are gathered in Box 3.3. The first two, pseudo-antisymmetry
and linearity, are immediate implications of Definition 3.37 (ii) and Theorem 3.39 (ii) and
(iii), respectively. According to Proposition (i) of Box 3.3, if X is prior or simultaneous in
FT to Y and Y is prior or simultaneous in FT to X , then we can conclude that X and Y
are simultaneous in FT to each other. And, according to Proposition (ii) of Box 3.3, if X
and Y are in the filtration FT , then X is prior or simultaneous in FT to Y , or Y is prior or
simultaneous in FT to X .
Proposition (iii) of Box 3.3 has already been proved in Theorem 3.35 (ii). According to
this proposition, X and Y being simultaneous in FT implies that each (X , Y )-measurable
map W is prior or simultaneous in FT to Y.
Proposition (iv) of Box 3.3 is an immediate implication of Theorem 3.35 (i) and (ii). Ac-
cording to this proposition, if X is prior or simultaneous in FT to Y , then W is prior or
simultaneous in FT to Y as well, provided that W is measurable with respect to (X , Y ),
that is, provided that σ(W ) ⊂ σ(X , Y ).
According to Proposition (v) of Box 3.3, if X is prior or simultaneous in FT to Y, then
W is prior or simultaneous in FT to the bivariate measurable map (X , Y ), provided that
W is measurable with respect to (X , Y ). This is the case, for example, if W = X or W = Y.
Proposition (v) is an immediate implication of Theorem 3.40 and Definition 3.37 (ii).
Note again that σ(W ) ⊂ σ(X ) implies σ(W ) ⊂ σ(X , Y ), but not vice versa. The analog of
Proposition (vi) for measurable set systems has been treated in Theorem 3.39 (i). Together
with Definition 3.37 (ii), this proves Proposition (vi) of Box 3.3, and with it, Proposition
(3.46).
According to Proposition (vii), if X and Y as well as Y and Z are simultaneous in FT to
each other, then each (X , Y )-measurable map W is prior or simultaneous in FT to Z . For
example, if 1X =x is the indicator of the event that X takes on the value x, X is simultaneous
to Y , and Y simultaneous to Z in FT , then 1X =x is prior or simultaneous to Z .
Furthermore, according to Proposition (viii) of Box 3.3, if X is prior or simultaneous in
FT to Y and Y is prior or simultaneous in FT to Z in FT , then W is prior or simultaneous
in FT to Z as well, provided that W is (X , Y )-measurable.
Also note, for W = X , Proposition (viii) of Box 3.3 yields

(X ≼_FT Y ∧ Y ≼_FT Z ) ⇒ X ≼_FT Z .

The proofs of Propositions (vii) and (viii) of Box 3.3 are found in the following theorem.
(i) (X ≈_FT Y ∧ Y ≈_FT Z ) ⇒ W ≼_FT Z
(ii) (X ≼_FT Y ∧ Y ≼_FT Z ) ⇒ W ≼_FT Z .

(Proof p. 79)
Finally, we consider Proposition (ix) of Box 3.3. According to this proposition, if X and
Y are prior or simultaneous to Z in FT and W is (X , Y )-measurable, then W is prior or
simultaneous in FT to Z as well. The proof of this proposition is found in the following
theorem, in which we use the notation
X , Y ≼_FT Z :⇔ (X ≼_FT Z ∧ Y ≼_FT Z ).    (3.48)
Let (Ω,A ) be a measurable space, that is, let Ω be a set and A a σ-algebra on Ω.
FT    Filtration in A. A family FT := (Ft )t ∈T of σ-algebras Ft ⊂ A satisfying Fs ⊂ Ft , for all s, t ∈T with s ≤ t , T ⊂ R , and T ≠ Ø.
Additionally, let C, D ⊂ A.
C ≺_FT D    C is prior in FT to D . This means:
    (a) there is an s ∈T such that C ⊂ Fs and D ⊄ Fs , and
    (b) there is a t ∈T such that D ⊂ Ft .
Let X ,Y be measurable maps on (Ω,A ) and let σ(X ), σ(Y ) denote their generated σ-algebras.
X ≺_FT Y    X is prior in FT to Y . This means that σ(X ) is prior in FT to σ(Y ).
X ≈_FT Y    X is simultaneous in FT to Y . This means that σ(X ) is simultaneous in FT to σ(Y ).
X ≼_FT Y    X is prior or simultaneous in FT to Y . This means that σ(X ) is prior or simultaneous in FT to σ(Y ).
It should be noted that these concepts are defined in the framework of a measurable
space (Ω, A ). No probability measure on A is involved. Hence, whether or not a random
variable X is prior to a random variable Y does not depend on the stochastic dependencies
between these random variables, let alone on data that result from conducting a random
experiment.
3.6 Proofs
(a) ∃ s ∈T: C ⊂ Fs ∧ D 6⊂ Fs
(b) ∃ t ∈T: D ⊂ Ft .
(a) ∃ s ∈T: C ⊂ Fs ∧ D 6⊂ Fs
(b) ∃ t ∈T: D ⊂ Ft .
(c) ∃ s ∈T: C 0 ⊂ Fs ∧ D 6⊂ Fs .
However, the conjunction of (c) and (b) is equivalent to C 0 F≺ D [see Def. 3.3 (i)].
T
(a) ∃ s ∈T : C ⊂ Fs ∧ D 6⊂ Fs
(b) ∀ t ∈T : C ⊂ Ft ⇔ D ⊂ Ft .
According to Proposition (3.19), (a) is equivalent to
(c) ∃ s ∈T : C ⊂ Fs ∧ σ(D) 6⊂ Fs ,
and for the same reason, (b) is equivalent to
(d) ∀ t ∈T : C ⊂ Ft ⇔ σ(D) ⊂ Ft .
However, the conjunction of (c) and (d) is equivalent to C F≺ σ(D).
T
(ii). As mentioned above, C F≺ D is equivalent to the conjunction of (a) and (b). Accord-
T
ing to Proposition (3.19), (a) is equivalent to
(e) ∃ s ∈T : σ(C ) ⊂ Fs ∧ D 6⊂ Fs .
(i).
(C F≺ D ∧ ∃ t ∈T : E ⊂ Ft )
¡ T
⇔ ∃ r ∈T : C ⊂ Fr ∧ D 6⊂ Fr [Def. 3.3 (i) (a)]
∧ ∃ s ∈T : D ⊂ Fs [Def. 3.3 (i) (b)]
¢
∧ ∃ t ∈T : E ⊂ Ft [part 2 of the premise in (i)]
¡
⇒ ∃ r ∈T : C ⊂ Fr ∧ σ(D ∪ E ) 6⊂ Fr [D 6⊂ Fr , D ⊂ σ(D ∪ E )]
¢
∧ ∃ u ∈T : σ(D ∪ E ) ⊂ Fu [u = max {s, t }]
⇔ C F≺ σ(D ∪ E ). [Def. 3.3 (i)]
T
(a) ∃ r ∈T: C ⊂ Fr ∧ E 6⊂ Fr
(b) ∃ s ∈T: D ⊂ Fs ∧ E 6⊂ Fs
(c) ∃ t ∈T: E ⊂ Ft
[see Prop. (3.22) and Def. 3.3 (i)]. Without loss of generality, we assume r ≤ s. Because FT
is a filtration, this implies
(d) C, D ⊂ Fs
(e) ∀u ∈T : C, D ⊂ Fu ⇔ σ(C ∪ D) ⊂ Fu
[see RS-Prop. (1.9)].
C, D F≺ E ⇒ σ(C ∪ D) F≺ E . The conjunction of (b), (d), and (e) implies
T T
(f) ∃ s ∈T : σ(C ∪ D) ⊂ Fs ∧ E 6⊂ Fs .
However, the conjunction of (c) and (f) is equivalent to σ(C ∪ D) F≺ E [see Def. 3.3 (i)].
T
C, D F≺ E ⇐ σ(C ∪D) F≺ E . As just has been said, the conjunction of (c) and (f) is equiv-
T T
alent to σ(C ∪D) ≺ E . Now, the conjunction of (f) and (e) implies (a) and (b). However, the
FT
conjunction of (a), (b), and (c) is equivalent to C, D ≺ E .
FT
C F≺ D ⇒ C F≺ σ(C ∪ D). This proposition is a special case of Theorem 3.11 (i), in which C
T T
also takes the role of E . The existence of a t ∈T with C ⊂ Ft is a part of the premise C ≺ D
FT
[see Def. 3.3 (i)].
C ≺ D ⇐ C ≺ σ(C ∪ D). According to Definition 3.3 (i), C ≺ σ(C ∪ D) is equivalent to
FT FT FT
the conjunction of
(a) ∃ s ∈T : C ⊂ Fs ∧ σ(C ∪ D) 6⊂ Fs
(b) ∃ t ∈T : σ(C ∪ D) ⊂ Ft .
(i).
(ii).
¡ ¢
(X F≺ Y ∧ Y F≺ Z ) ⇔ σ(X ) F≺ σ(Y ) ∧ σ(Y )F≺ σ(Z ) [Def. 3.3 (ii)]
T T T T
¡ ¢
⇒ σ(X ) F≺ σ(Z ) ∧ σ(Y )F≺ σ(Z ) [Th. 3.6 (ii)]
T T
¡ ¢
X F≺ Y ∧ ∃ t ∈T : σ(Z ) ⊂ Ft
T
¡ ¢
⇔ σ(X ) F≺ σ(Y ) ∧ ∃ t ∈T : σ(Z ) ⊂ Ft [Def. 3.3 (ii)]
T
¡ ¢
⇒ σ(X ) F≺ σ σ(Y ) ∪ σ(Z ) [Th. 3.11 (i)]
T
According to Definition 3.24 (i), the conjunction of (a) and (b) is equivalent to C ≈ C .
FT
(c) ∀ t ∈T : C ⊂ Ft ⇔ D ⊂ Ft .
(f) ∀ t ∈T : D ⊂ Ft ⇔ E ⊂ Ft .
(a) ∃ t ∈T : C ⊂ Ft
(b) ∀ t ∈T : C ⊂ Ft ⇔ D ⊂ Ft .
(e) ∀ t ∈T : C ⊂ Ft ⇔ σ(C ∪ D) ⊂ Ft .
However, the conjunction of (a) and (e) is equivalent to C ≈ σ(C ∪ D) [see Def. 3.24 (i)], to
FT
σ(C ∪ D) ≈ C [see Th. 3.27 (ii)], and to σ(C ∪ D) ≈ D [see Th. 3.27 (iii)].
FT FT
(a) ∃ s ∈T : C ⊂ Fs ∧ D 6⊂ Fs
(b) ∃ t ∈T : D ⊂ Ft .
(c) ∀ t ∈T : D ⊂ Ft ⇒ C ⊂ Ft .
(f) ∀ t ∈T : D ⊂ Ft ⇔ σ(C ∪ D) ⊂ Ft .
Finally, according to Definition 3.24 (i) and Theorem 3.27 (ii), the conjunction of (b) and
(f) is equivalent to σ(C ∪ D) ≈ D.
FT
(g) ∃ t ∈T : C ⊂ Ft
(h) ∀ t ∈T : C ⊂ Ft ⇔ D ⊂ Ft .
Now, (h) implies
(i) ¬ ∃ t ∈T : C ⊂ Ft ∧ D 6⊂ Ft ,
[see Def. 3.24 (i)]. Because of C 0 ⊂ σ(C ∪D) and Proposition (3.17), Proposition (a) implies
(c) ∃ t ∈T : C 0 ⊂ Ft .
Now, we distinguish two cases.
Case 1: ∀ t ∈T : C 0 ⊂ Ft ⇔ σ(C ∪ D) ⊂ Ft . Together with (b), this proposition implies
(d) ∀ t ∈T : C 0 ⊂ Ft ⇔ D ⊂ Ft .
Now, the conjunction of (c) and (d) is equivalent to C 0 ≈ D [see Def. 3.24 (i)].
¡ ¢ FT
Case 2: ¬ ∀ t ∈T : C 0 ⊂ Ft ⇔ σ(C ∪ D) ⊂ Ft . In conjunction with (c), this proposition
implies
(e) ∃ s ∈T : C 0 ⊂ Fs ∧ σ(C ∪ D) 6⊂ Fs .
Together with (b), this proposition implies
(f) ∃ s ∈T : C 0 ⊂ Fs ∧ D 6⊂ Fs .
Furthermore, the conjunction of (a) and (b) implies
(g) ∃ t ∈T : D ⊂ Ft .
However, the conjunction of (f) and (g) is equivalent to C 0 F≺ D [see Def. 3.3 (i)].
T
(ii).
(f) ∃ s ∈T : (C ⊂ Fs ∧ D 6⊂ Fs )
(g) ∃ t ∈T : D ⊂ Ft
(h) ∀ t ∈T : D ⊂ Ft ⇔ E ⊂ Ft .
(i) ∃ s ∈T : (C ⊂ Fs ∧ E 6⊂ Fs ),
and the conjunction of (g) and (h) implies
(j) ∃ t ∈T : E ⊂ Ft .
However, according to Definition 3.3 (i), the conjunction of (i) and (j) is equivalent to
C F≺ E . Now, Proposition (ii) follows from Proposition (3.18).
T
(i).
X F≺ Y
T
(ii). Similarly,
X ≈Y
FT
(i).
(a) C F≺ D ⇒ ¬(D F≺ C )
T T
(b) C ≈ D ⇒ ¬(D F≺ C ).
FT T
Hence, in both cases in which C 4 D holds, we can conclude ¬(D ≺ C ), and in both cases
FT FT
in which D 4 C holds, we can conclude ¬(C F≺ D). Therefore,
FT T
⇒ ¬(C F≺ D ∧ D F≺ C )
T T
⇔ ¬ (C F≺ D) ∨ ¬ (D F≺ C ). [de Morgan]
T T
(iii). We presume ∃ s, t ∈T : C ⊂ Fs ∧ D ⊂ Ft .
Case 1: (∀ t ∈T : C ⊂ Ft ⇔ D ⊂ Ft )
⇒ C ≈D [∃ s ∈T : C ⊂ Fs , Def. 3.24 (i)]
FT
⇒ (C 4 D ∨ D 4 C ).
FT FT
Case 2: ¬ (∀ t ∈T : C ⊂ Ft ⇔ D ⊂ Ft )
¡ ¢
⇒ ∃ t ∈T : (C ⊂ Ft ∧ D 6⊂ Ft ) ∨ (C 6⊂ Ft ∧ D ⊂ Ft )
⇒ (C F≺ D ∨ D F≺ C ) [Def. 3.3 (i)]
T T
(C 4 D ∧ D 4 E )
F FT
¡ T ¢
⇔ (C F≺ D ∨ C ≈ D) ∧ (D F≺ E ∨ D ≈ E ) . [Def. 3.37 (i)]
T FT T FT
¡ ¢
⇔ (C ≺ D ∧ D ≺ E ) ∨ (C ≺ D ∧ D ≈ E ) ∨ (C ≈ D ∧ D ≺ E ) ∨ (C ≈ D ∧ D ≈ E ) .
FT FT FT FT FT FT FT FT
The latter proposition follows from repeatedly applying the first distributive law. Accord-
ing to the last proposition, we consider four cases.
Case 1: (C ≺ D ∧ D ≺ E )
FT FT
⇒ C ≺E [Th. 3.6 (ii)]
FT
Case 2: (C F≺ D ∧ D ≈ E )
T FT
⇒ C F≺ E [Th. 3.33 (ii)]
T
Case 3: (C ≈ D ∧ D F≺ E )
FT T
Case 4: (C ≈ D ∧ D ≈ E )
FT FT
⇒ (C 0 F≺ D ∨ C 0 ≈ D) [Th. 3.32]
T FT
⇒ C0 4C . [Th. 3.40]
FT
C, D 4 E
FT
⇔ (C 4 E ∧ D 4 E ) [(3.44)]
FT FT
¡ ¢
⇔ (C F≺ E ∨ C ≈ E ) ∧ (D F≺ E ∨ D ≈ E ) [Def. 3.37 (i)]
T FT T FT
¡ ¢
⇔ (C F≺ E ∧ D F≺ E ) ∨ (C F≺ E ∧ D ≈ E ) ∨ (C ≈ E ∧ D F≺ E ) ∨ (C ≈ E ∧ D ≈ E ) .
T T T FT FT T FT FT
The last proposition follows from repeatedly applying the first distributive law. Accord-
ingly, we consider four cases.
¡ ¢
Case 2: (C F≺ E ∧ D ≈ E ) ⇒ σ(C ∪ E ) ≈ E ∧ E ≈ D [Ths. 3.31 (i), 3.27 (ii)]
T FT FT FT
¡ ¢
⇒ σ σ(C ∪ D) ∪ E ≈ E [Ths. 3.29, 3.27 (iii)]
FT
Case 3: (C ≈ E ∧ D F≺ E ) ⇔ (D F≺ E ∧ C ≈ E )
FT T T FT
(i).
(ii). Similarly,
(i).
(ii).
¡ ¢
(X 4 Y ∧ Y 4 Z ) ⇔ σ(X ) 4 σ(Y ) ∧ σ(Y ) 4 σ(Z ) [Def. 3.37 (ii)]
FT FT FT FT
X ,Y 4 Z ⇔ (X 4 Z ∧ Y 4 Z ) [Prop. (3.48)]
FT FT FT
¡ ¢
⇔ 4
σ(X ) σ(Z ) ∧ σ(Y ) 4 σ(Z ) [Def. 3.37 (ii)]
FT FT
¡ ¢
⇒ σ σ(X ) ∪ σ(Y ) 4 σ(Z ) [Th. 3.44]
FT
3.7 Exercises
⊲ Exercise 3-1 Which are the elements of the σ-algebra σ(U ) in Example 3.2?
⊲ Exercise 3-2 Consider Table 1.2 and write down the inverse image X ⁻¹(A′ ) of the set A′ = {1} in
terms of a subset of Ω = {ω1 , ω2 , . . . , ω8 }.
⊲ Exercise 3-3 Consider the random experiment presented in Example 3.2 and enumerate all elements of the σ-algebra generated by X . Instead of specifying the value space (ΩX′ , P (ΩX′ )) of X with ΩX′ = {0, 1} as in Example 3.2, choose (R , B), where B denotes the Borel σ-algebra on the set R of real numbers (see RS-Rem. 1.14).
⊲ Exercise 3-4 Consider Table 1.2 and write down the inverse images

(U , X )⁻¹(A′i ) = {ω ∈ Ω: (U , X )(ω) ∈ A′i },   i = 1, 2,

in terms of subsets of Ω = {ω1 , ω2 , . . . , ω8 } for A′1 = {(Joe , 0)} and A′2 = {(Joe , 1), (Ann , 0), (Ann , 1)}.
⊲ Exercise 3-5 Consider Example 3.5 and name three more pairs of sets systems such that one is
prior in FT to the other.
Solutions
⊲ Solution 3-1 The probability space (Ω,A,P) representing the random experiment described in
Table 1.2 is specified in Example 3.2. Furthermore, the random variable U is specified in Table 1.2.
The σ-algebra σ(U ) has four elements. Aside from Ω and Ø, these are the events
C = {Joe} × ΩX × ΩY = {(Joe , no , −), (Joe , yes , −), (Joe , no , +), (Joe , yes , +)}

and its complement, the event {Ann} × ΩX × ΩY that Ann is drawn (see also Exercise 3-2).
⊲ Solution 3-4 The first inverse image is

(U , X )⁻¹(A′1 ) = (U , X )⁻¹({(Joe , 0)}) = {ω1 , ω2 }.

This is the event that Joe is drawn and not treated. The second inverse image is

(U , X )⁻¹(A′2 ) = (U , X )⁻¹({(Joe , 1), (Ann , 0), (Ann , 1)}) = {ω3 , ω4 , ω5 , ω6 , ω7 , ω8 }

[consider again Eqs. (3.11) to (3.13)]. This is the event that Joe is drawn and treated or Ann is drawn.
⊲ Solution 3-5 σ(U ) is prior in FT to σ(U , X ), σ(U ) is prior to σ(U , X ,Y ), and σ(X ) is prior to
σ(U , X ,Y ).
⊲ Solution 3-6
(A ⊂ B ∧ A ⊂ C ) ⇔ ∀a ∈ A : a ∈ B ∧ a ∈ C
⇔ ∀a ∈ A : a ∈ (B ∩C )
⇔ A ⊂ (B ∩C ).
⊲ Solution 3-7
(A ⊂ C ∧ B ⊂ C ) ⇔ ∀a ∈ A : a ∈ C ∧ ∀b ∈B : b ∈ C
⇔ ∀c ∈ A ∪ B : c ∈ C
⇔ A ∪B ⊂ C.
⊲ Solution 3-8
(A ⊂ B ∧ B ⊂ C ) ⇔ ∀a ∈ A : a ∈ B ∧ ∀b ∈ B : b ∈ C
⇒ ∀a ∈ A : a ∈ C
⇔ A ⊂ C.
⊲ Solution 3-9 If α ∈ R , then W =α·X is the composition g (X ) of the measurable maps X : (Ω,A ) →
(R, B) and g : (R,B) → (R,B) defined by
g (x) = α · x, ∀x ∈ R.
Similarly, if β1 , β2 ∈ R , then W = β1 · X + β2 · Y is the composition g (X , Y ) of the bivariate measurable map (X , Y ) and the map g defined by

g (x1 , x2 ) = β1 x1 + β2 x2 ,   ∀ (x1 , x2 ) ∈ R ².
(X ≈_FT Y ∧ Y ≺_FT Z )
⇔ (σ(X ) ≈_FT σ(Y ) ∧ σ(Y ) ≺_FT σ(Z ))    [Defs. 3.3 (ii), 3.24 (ii)]
⇔ (σ(σ(X ) ∪ σ(Y )) ≈_FT σ(Y ) ∧ σ(Y ) ≺_FT σ(Z ))    [Th. 3.28 (i)]
⇒ σ(W ) ≺_FT σ(Z )    [σ(W ) ⊂ σ(X , Y ), Th. 3.33 (i)]
⇔ W ≺_FT Z .    [Def. 3.3 (ii)]
(X ≺_FT Y ∧ Y ≈_FT Z )
⇔ (σ(X ) ≺_FT σ(Y ) ∧ σ(Y ) ≈_FT σ(Z ))    [Defs. 3.3 (ii), 3.24 (ii)]
⇒ σ(W ) ≺_FT σ(Z )    [σ(W ) ⊂ σ(X ), Th. 3.33 (ii)]
⇔ W ≺_FT Z .    [Def. 3.3 (ii)]
Chapter 4
Regular Causality Space and Potential Confounder
In chapter 1 we studied some examples showing that the conditional expectation values
E (Y |X =x ) of an outcome variable Y and their differences E (Y |X =x ) − E (Y |X =x ′ ), the
prima facie effects, can be seriously misleading in evaluating the causal effect of a treat-
ment variable X on an outcome variable Y. These examples demonstrate that the standard
probabilistic concepts — such as conditional expectation values or conditional probabili-
ties — cannot be used offhandedly to define the causal effects in which we are interested
when we have to evaluate a treatment, an intervention, or an exposition. In chapter 2, we
described random experiments of various research designs in which a causal total effect is
of interest. In chapter 3 we started the mathematical theory of causal effects introducing
the prior-to, simultaneous-to, and prior-or-simultaneous-to relations among measurable
set systems and among measurable maps. In the context of a probability space (Ω, A, P ),
these relations also apply to sets of events and random variables, respectively.
Overview
In the present chapter, we introduce the concepts of a regular causality space and a regular
causality setup. A regular causality setup provides the mathematical structure that allows
us to define a putative cause variable X , an outcome variable Y of X , a potential confounder
of X , and a potential mediator between X and Y . In many cases, conditional expectations
describing a causal dependence can be distinguished from conditional expectations that
have no such causal interpretation by their relationship to the potential confounders of
the putative cause variable X . This will be substantiated in the chapters on causality con-
ditions.
Just as in chapter 3, none of the concepts treated in this chapter involves a probabi-
lity measure. Instead, a measurable space (Ω, A ) (i. e., a set Ω and a σ-algebra A on Ω)
suffices. Note, however, that all properties of a regular causality space still hold if we add
a probability measure P on (Ω, A ), considering a probability space (Ω, A, P ). This will be
necessary as soon as we turn to random variables and their stochastic dependencies on
each other (see the chapters to come). In empirical applications, (Ω, A, P ) represents a
concrete random experiment, Ω the set of possible outcomes, and A the set of possible
events in this random experiment.
Prerequisites
Reading this chapter requires that the reader is familiar with the concepts of a σ-algebra, a
measurable space, and a measurable map as treated, for example, in the first two chapters
of Steyer (2024). These chapters will be referred to as RS-chapter 1 and RS-chapter 2, and
the same kind of shortcut is used when referring to other parts of that book.
4.1 Regular Causality Space and Setup
In this section, we introduce the notions of a regular causality space and a regular proba-
bilistic causality space. A regular causality space is the minimal formal framework in which
we can introduce the concepts of a putative cause variable, a potential confounder, a po-
tential mediator, and an outcome variable. In a regular probabilistic causality space we
can define true outcome variables, (causal) unbiasedness, and causality conditions, which
imply unbiasedness of various conditional expectations and their differences.
Defining a regular causality space (see Def. 4.6), we refer to a measurable space (Ω, A )
that is assumed to be the product of three measurable spaces (Ωt , At ), t ∈T = {1, 2, 3} (see RS-Def. 1.15). This implies that Ω is the set product of the three sets Ω1 , Ω2 , and Ω3 , each of
which can itself be a set product of other sets.
Remark 4.1 [Intuitive Background of the Product Space (Ω, A )] When we consider the
effect of a putative cause variable X (treatment, intervention, or exposition variable) on
an outcome variable Y that is assessed for an observational unit to be sampled (often-
times a person), then there are variables that are prior to X . They represent attributes of
the observational unit before treatment. It is obvious that these pretreatment attributes or
their fallible observations cannot be caused by a subsequent treatment. Simple examples
are age, sex, race, socioeconomic status, or educational status before treatment. The set Ω1
occurring in Definition 4.6 should be chosen such that these pretreatment variables solely
depend on the elements of Ω1 .
Next, there might be variables that are simultaneous to X , for example, a second treat-
ment variable that varies at the same time as X . For example, you can drink coffee (or not)
and alcohol (or not) at the same time. The set Ω2 in Definition 4.6 should be chosen such
that variables that are simultaneous to a focused putative cause variable X (including X
itself) only depend on the elements of Ω2 .
Finally, the set Ω3 should be chosen such that the outcome variable Y depends, at least
in part, on the elements of this set. For details and a mathematical formulation of these
ideas, see Definitions 4.6 and 4.11. ⊳
Remark 4.2 [Intuitive Background of the Projections] Definition 4.6 refers to the projec-
tions (or coordinate maps) π1 to π3 (see, e. g., RS-Def. 2.27). Intuitively speaking, the pro-
jection π1 is a map that contains the information about all events that are prior to the pu-
tative cause variable. In any case, all nonconstant maps that only depend — in the sense of
measurability, not in the sense of a probabilistic dependence — on π1 are potential con-
founders.
In contrast, π2 contains the information about all events that are simultaneous to the
putative cause variable, say X , including the events represented by X itself. The projection
π2 can be multidimensional, that is, it may consist of several projections π2j . Hence, π2 =
(π2j , j ∈ J ) is a family of projections for a nonempty index set J .
Finally, the projection π3 contains the information about all events that are simultane-
ous or posterior to the outcome variable, say Y . We will require that Y depends at least in
part on π3 . This will allow us to choose V2 − V1 as an outcome variable, where V2 is a function
of π3 (e.g., a posttest) and V1 a function of π1 (e.g., a pretest assessing the same attribute
before treatment, see Rem. 4.14). ⊳
Remark 4.3 [Filtration (Ft )t ∈T ] In Definition 4.6 we specify a specific filtration (Ft )t ∈T ,
T = {1, 2, 3}. From the perspective of the focused putative cause variable, the σ-algebras
F1 and F2 represent the sets of past and present events, respectively. In contrast, F3 rep-
resents all events that are posterior to X . ⊳
Remark 4.4 [Cause σ-Algebra] In Remark 4.2, we already discussed the relation between
the projection π2 and a putative cause variable X . In Definition 4.6 (i), this relationship is
made explicit for the cause σ-algebra C with respect to which we will require a putative
cause variable X to be measurable (see Def. 4.11). ⊳
Remark 4.5 [Potential Confounder σ-Algebra] Intuitively speaking, we consider all vari-
ables as potential confounders of a putative cause variable X that are prior or simultane-
ous to X , except for X itself. In Definition 4.6 we introduce the concept of a confounder
σ-algebra of C , denoted DC . This σ-algebra DC can be interpreted to represent the set of
past and present possible events from the perspective of the cause σ-algebra C , except for
those that are represented by C itself. The σ-algebra DC also plays a crucial role in the
definition of a true outcome variable and all other concepts based thereon (see ch. 5). ⊳
Finally, let π2 = (π2j , j ∈ J ), where J is a nonempty finite subset of N, and assume that
{Ω, Ø} ≠ σ(π2j , j ∈ K ), Ø ≠ K ⊂ J . Then we call
(i) C := σ(π2j , j ∈ K ), Ø ≠ K ⊂ J , the cause σ-algebra,
(ii) DC := σ(π1 , π2j , j ∈ J \ K ) the confounder σ-algebra of C , and
(iii) ((Ω, A ), (Ft )t ∈T , C , DC ) a regular causality space.
Remark 4.7 [Regular Probabilistic Causality Space] Beginning with chapter 5, we will additionally
consider a probability measure P on the σ-algebra A , and then
((Ω, A, P ), (Ft )t ∈T , C , DC )
is called a regular probabilistic causality space. ⊳
Table 4.1. Joe and Ann with a single treatment variable

                          Unit   Treatment   Success   Treatment      Outcome
Possible outcomes ωi      π1     π2          π3        variable X     variable Y
ω1 = ( Joe, no, −)        Joe    no          −         0              0
ω2 = ( Joe, no, +)        Joe    no          +         0              1
ω3 = ( Joe, yes, −)       Joe    yes         −         1              0
ω4 = ( Joe, yes, +)       Joe    yes         +         1              1
ω5 = (Ann, no, −)         Ann    no          −         0              0
ω6 = (Ann, no, +)         Ann    no          +         0              1
ω7 = (Ann, yes, −)        Ann    yes         −         1              0
ω8 = (Ann, yes, +)        Ann    yes         +         1              1
Remark 4.9 [(Ft )t ∈T Is a Filtration in A ] The family of σ-algebras (Ft )t ∈T , T = {1, 2, 3},
specified in the definition of a regular causality space [see Eq. (4.1)] is a filtration in A.
This immediately follows from RS-Equation (1.3) and RS-Equation (2.15). ⊳
Example 4.10 [Joe and Ann With a Single Treatment Variable] We illustrate the concepts
treated in Definition 4.6 by the kind of experiment presented in Table 4.1. It is the same
kind of experiment as already presented in Table 1.2. The first part of the experiment con-
sists of sampling a person u from the set Ω1 = { Joe, Ann }. Then the sampled person re-
ceives (yes) or does not receive (no) a treatment. The two treatment conditions are the
elements of the set Ω2 = {no, yes}. Finally, it is observed whether or not a success criterion
is reached at some appropriate time after treatment. These two possible outcomes are the
elements of the set Ω3 = {−, +}. Hence, the sets Ωt , t ∈T = {1, 2, 3}, occurring in Definition
4.6 are
Ω1 = { Joe, Ann }, Ω2 = {no, yes}, Ω3 = {−, +}.
(see Exercises 4-1 to 4-4). Furthermore, according to Definition 4.6, in this kind of experi-
ment,
C = σ(π2 ) = { {ω1 , ω2 , ω5 , ω6 }, {ω3 , ω4 , ω7 , ω8 }, Ω, Ø }     (4.3)
is necessarily the cause σ-algebra, because J = K = {1} implies π2 = π21 = (π2j , j ∈ K ) (see
Exercises 4-5 and 4-6). Finally,
DC = σ(π1 ) = { {ω1 , . . . , ω4 }, {ω5 , . . . , ω8 }, Ω, Ø }     (4.4)
is the confounder σ-algebra of C .
Definition 4.11 [Regular Causality Setup and Potential Confounder]
Let ((Ω, A ), (Ft )t ∈T , C , DC ) be a regular causality space and X , Y , D X , W measurable
maps on (Ω, A ). Then we call
(i) X a putative cause variable of Y if
    (a) {Ω, Ø} ≠ σ(X ) ⊂ C = σ(π2j , j ∈ K )
    (b) ¬∃ K ′ ⊂ K , K ′ ≠ K , such that σ(X ) ⊂ σ(π2j , j ∈ K ′ )
(ii) Y an outcome or response variable if σ(Y ) ⊄ F2
(iii) D X a global potential confounder of X if σ(D X ) = DC
(iv) W a potential confounder of X if σ(W ) ⊂ σ(D X )
(v) ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) a regular causality setup.
A potential confounder W of X is called trivial if σ(W ) = {Ω, Ø}.
In Definition 4.11, we require that all variables introduced in this definition are mea-
surable maps on the same measurable space (Ω, A ). Specifying (Ω, A ) and (Ft )t ∈T , we fix
which kind of experiment we are talking about, and with X and Y, we choose the puta-
tive cause variable and the outcome variable, respectively. Finally, with D X we specify the
global potential confounder of X . Note that the set of all potential confounders of X is the
set of all measurable maps on (Ω, A ) that are D X -measurable.
Remark 4.13 [Putative Cause Variable] In Definition 4.11 (i), we postulate σ(X ) ≠ {Ω, Ø}.
Hence, a putative cause variable X is not a constant. Requiring σ(X ) ⊂ C — and not
σ(X ) = C — means that X is not necessarily the only putative cause variable we might consider
in the framework of a given regular causality space. Instead we can also coarsen an
original putative cause variable and consider a new one with fewer values (see the example
in sect. 4.2.1). According to condition (b) of Definition 4.11 (i), there is no proper subset K ′
of K such that a putative cause variable X is measurable with respect to (π2j , j ∈ K ′ ). This
requirement secures, for example, that we cannot choose a putative cause variable that is
measurable with respect to a single projection, say π21 , if we choose C to be generated by
(π21 , π22 ). If, in a setting with π2 = (π21 , π22 ), a π21 -measurable putative cause variable X is
desired, then we have to choose C = σ(π21 ), implying σ(π22 ) ⊂ DC = σ(π1 , π22 ). Hence, if
π2 = (π21 , π22 ) and σ(X ) ⊂ σ(π21 ), then the projection π22 is a potential confounder of X .
For more details see the first regular causality setup presented in the example of section
4.2.2. ⊳
Remark 4.14 [Outcome Variable] Requiring σ(Y ) ⊄ F2 , we ensure that an outcome variable
is neither a constant nor a potential confounder of X . We allow for outcome variables
that are functions of a potential confounder and a map that is measurable with respect
to σ(π3 ). A typical example is a change variable Y = V2 − V1 between a σ(π3 )-measurable
variable V2 and a σ(π1 )-measurable V1 — for example, a pretest — assessing the same at-
tribute as V2, but before treatment. In this case, the change variable still depends on the
elements of Ω3 , but not exclusively (see Rem. 4.2).
Of course, a σ(π3 )-measurable map is an outcome variable as well. The definition of
an outcome variable also allows us to rescale, coarsen, or aggregate a new outcome vari-
able from other outcome variables. For example, instead of considering a ten-dimensional
outcome variable (Y1 , . . . , Y10 ) representing the binary answers to ten items in a question-
naire, one might be interested in a uni-dimensional outcome variable Y defined as the
sum Y1 + . . . + Y10 . ⊳
Remark 4.15 [Potential Confounder] According to Definition 4.11 (iv), each non-constant
measurable map W satisfying σ(W ) ⊂ DC is a potential confounder of X . Hence, each pre-
treatment variable W — which is measurable with respect to the projection π1 — is a po-
tential confounder of a putative cause variable X [see Def. 4.11 (iv)]. However, a treatment
variable can also be a potential confounder of another treatment variable if this other
treatment variable is focused as the putative cause variable [see Defs. 4.6 (ii) and 4.11 (i)].
⊳
Remark 4.16 [Covariate of X ] If W is a potential confounder of X , then we also call it a
covariate of X , in particular in a context in which we condition on W and X in a con-
ditional expectation such as E (Y | X ,W ) or in a conditional distribution P Y |X ,W . From a
mathematical point of view, the terms potential confounder of X and covariate of X are
synonyms. Note, however, that, in the statistical literature, the term ‘covariate’ is not used
consistently. Sometimes it even refers to a putative cause variable of Y . ⊳
Remark 4.17 [Global Potential Confounder] According to Definition 4.11 (iii), each non-
constant measurable map D X satisfying σ(D X ) = DC is a global or comprehensive potential
confounder of X . Hence, instead of specifying the σ-algebra DC we may also specify a
global potential confounder D X , which may often be more convenient. The concept of a
global potential confounder also plays a crucial role in the definition of a true outcome
variable and all other concepts based thereon (see ch. 5). The σ-algebra σ(D X ) generated
by D X can be interpreted to represent, from the perspective of the cause σ-algebra C , the
set of past and present possible events, except for those that are represented by C itself. ⊳

Figure 4.1. The filtration (Ft )t ∈T and various σ-algebras in a regular causality space.
[The figure displays the nested σ-algebras F1 = σ(π1 ) ⊂ F2 = σ(π1 , π2 ) ⊂ F3 = σ(π1 , π2 , π3 ) = A ,
together with C , σ(X ), D = σ(D X ), σ(W1 ), σ(W2 ), σ(W3 ), and σ(π3 ) located within them.]
Remark 4.18 [There Are Several Global Potential Confounders] There are always several
global potential confounders of X . For example, in Table 4.1, the projection π1 and the in-
dicator variable 1U = Joe of the event that Joe is sampled generate the same σ-algebra, which
is also the confounder σ-algebra if we consider the treatment variable X as the putative
cause variable. Therefore, we talk about ‘a’ — and not about ‘the’ — global potential con-
founder of X unless a specific global potential confounder is already specified. In contrast,
the confounder σ-algebra DC = σ(D X ) is uniquely determined, once the measurable space
(Ω, A ), the filtration (Ft )t ∈T , and the cause σ-algebra C are specified (see Def. 4.6). ⊳
Figure 4.1 shows the subset relationships between the various σ-algebras in a regular
causality setup. These relationships follow from the conditions specified in Definitions 4.6
and 4.11. The figure also shows which of the σ-algebras occur for the first time in one of the
three σ-algebras F1 , F2 , and F3 . Note that W1 , W2 , and W3 denote potential confounders of
X . The σ-algebra σ(Y ) generated by the outcome variable Y is not shown in the figure. It
can be a subset of σ(π3 ) or any other subset of A as long as it is not a subset of F2 .
In experiments in which we have more than two treatment conditions, for example, treat-
ment a, treatment b, and control, we might be interested in comparing treatment a or
treatment b against control, or treatment a against treatment b. In each of these cases, we
use putative cause variables that are not maps on the original measurable space (Ω, A )
but on (Ω0 , A | Ω0 ), where Ω0 is a subset of Ω and A | Ω0 denotes the restriction of the
σ-algebra A to Ω0 . For detailed examples see sections 4.2.1 and 4.2.2.
Remark 4.20 [Restriction of a Set System and a σ-Algebra] Let E be a set of subsets of a
nonempty set Ω and Ω0 ⊂ Ω. Then
E | Ω0 := { Ω0 ∩ A : A ∈ E }     (4.5)
is called the restriction of E to Ω0 . In Theorem 4.22, we also consider the restrictions πt | Ω0
of the projections πt , t ∈ T , introduced in Definition 4.6. That is, each map πt | Ω0 is the
restriction of the corresponding map πt to the subset Ω0 of Ω. ⊳
A regular causality space ((Ω, A ), (Ft )t ∈T , C , DC ) is a structure that entirely consists
of σ-algebras on the set Ω. According to the following theorem, the restrictions of these
σ-algebras to a subset Ω0 of Ω constitute a new regular causality space, which is the frame-
work for putative cause variables that are measurable maps on Ω0 (see sect. 4.2.1 for a
detailed example).
Theorem 4.22 [Restriction of a Regular Causality Space]
Let ((Ω, A ), (Ft )t ∈T , C , DC ) be a regular causality space and let Ω0 = Ω01 × Ω02 × Ω03 ,
where Ø ≠ Ω0t ⊂ Ωt for all t ∈ T and C | Ω0 ≠ {Ω0 , Ø}. Then
F1 | Ω0 = σ(π1 | Ω0 )
F2 | Ω0 = σ((π1 , π2 )| Ω0 )     (4.9)
F3 | Ω0 = σ((π1 , π2 , π3 )| Ω0 )
and
((Ω, A ), (Ft )t ∈T , C , DC )| Ω0 := (Ω0 , A | Ω0 , (Ft | Ω0 )t ∈T , C | Ω0 , DC | Ω0 )     (4.10)
is a regular causality space. It is called the restriction of ((Ω, A ), (Ft )t ∈T , C , DC ) to Ω0 .
(Proof p. 106)
Remark 4.23 [Restriction of a Putative Cause Variable and a Potential Confounder] Let
us also consider the restriction X | Ω0 of the putative cause variable X and the restriction
Y | Ω0 of the outcome variable Y to Ω0 . Adding these maps to the restriction of a regular
causality space yields a new regular causality setup. ⊳
4.2 Examples
We illustrate the concepts treated in section 4.1 by the kind of experiment presented in Ta-
ble 4.2. In this kind of experiment, we first sample a person u from the set Ω1 = { Joe, Ann }.
Then the sampled person receives treatment a, b, or c. These treatment conditions are the
elements of the set Ω2 = {a, b, c }. In this experiment, there is no other treatment variable.
Finally, it is observed whether or not a success criterion is reached at some appropriate
time after treatment. These two possible outcomes are the elements of the set Ω3 = {−, +}.
Hence, the sets Ωt , t ∈ T = {1, 2, 3}, occurring in Definition 4.6 are
Ω1 = { Joe, Ann },   Ω2 = {a, b, c },   Ω3 = {−, +},
and Ω = Ω1 × Ω2 × Ω3 contains the twelve elements listed in the first column of Table 4.2.
Furthermore, we choose the σ-algebras on these sets to be At = P (Ωt ), t ∈T , the cor-
responding power sets. Because, in Definition 4.6, we require that (Ω, A ) is the product
of the measurable spaces (Ωt , At ) (see RS-Def. 1.15), the measurable space (Ω, A ) is com-
pletely specified, and with it the projections π1, π2, and π3 . The assignment of values of
these projections to all elements of Ω is explicitly shown in Table 4.2.
Table 4.2. Joe and Ann with three treatment conditions

                          Unit   Treatment   Success
Possible outcomes ωi      π1     π2          π3        X   1A   1B   1C   Y   X | Ωab   Y | Ωab
ω1  = ( Joe, a, −)        Joe    a           −         0   1    0    0    0   0         0
ω2  = ( Joe, a, +)        Joe    a           +         0   1    0    0    1   0         1
ω3  = ( Joe, b, −)        Joe    b           −         1   0    1    0    0   1         0
ω4  = ( Joe, b, +)        Joe    b           +         1   0    1    0    1   1         1
ω5  = ( Joe, c, −)        Joe    c           −         2   0    0    1    0
ω6  = ( Joe, c, +)        Joe    c           +         2   0    0    1    1
ω7  = (Ann, a, −)         Ann    a           −         0   1    0    0    0   0         0
ω8  = (Ann, a, +)         Ann    a           +         0   1    0    0    1   0         1
ω9  = (Ann, b, −)         Ann    b           −         1   0    1    0    0   1         0
ω10 = (Ann, b, +)         Ann    b           +         1   0    1    0    1   1         1
ω11 = (Ann, c, −)         Ann    c           −         2   0    0    1    0
ω12 = (Ann, c, +)         Ann    c           +         2   0    0    1    1

Note. X is the treatment variable with values 0 (a), 1 (b), and 2 (c); 1A , 1B , and 1C are the indicator
variables of the treatment conditions a, b, and c; Y is the outcome variable (1 = success); X | Ωab and
Y | Ωab are the restrictions of X and Y to Ωab (defined only on Ωab ).
In this kind of experiment, C = σ(π2 )
is necessarily the cause σ-algebra because J = K = {1}, implying π2 = π21 = (π2j , j ∈ K ) (see
Def. 4.6 and Exercise 4-7). Finally,
DC = σ(π1 , π2j , j ∈ J \ K ) = σ(π1 ) = { {ω1 , . . . , ω6 }, {ω7 , . . . , ω12 }, Ω, Ø }
is the confounder σ-algebra of C . Aside from X itself, Table 4.2 also displays the indicator
variables 1A , 1B , and 1C of the events that the sampled person receives treatment a, b, or c,
respectively, each of which may serve as a putative cause variable.
The σ-algebras generated by these three putative cause variables are subsets of C . They
can be used for comparing one treatment condition against the other two. All four putative
cause variables, X and the three indicator variables listed above, are maps on Ω, sharing
the same cause and confounder σ-algebras, and the same set of potential confounders.
Hence, we specified the four regular causality setups
((Ω, A ), (Ft )t ∈T , C , DC , X , Y ),     ((Ω, A ), (Ft )t ∈T , C , DC , 1A , Y ),
((Ω, A ), (Ft )t ∈T , C , DC , 1B , Y ),     ((Ω, A ), (Ft )t ∈T , C , DC , 1C , Y ),
which only differ in their putative cause variables (see Exercise 4-8).
Comparing Treatment a to b
For this comparison, we use the restriction of the regular causality space to the subset
Ωab := Ω1 × {a, b } × Ω3 = {ω1 , ω2 , ω3 , ω4 , ω7 , ω8 , ω9 , ω10 }
of Ω. For the cause σ-algebra, Equation (4.5) yields
C | Ωab = { Ωab ∩ A : A ∈ C }
        = { Ωab ∩ {ω1 , ω2 , ω7 , ω8 }, Ωab ∩ {ω3 , ω4 , ω9 , ω10 }, Ωab ∩ {ω5 , ω6 , ω11 , ω12 },
            Ωab ∩ {ω1 , ω2 , ω3 , ω4 , ω7 , ω8 , ω9 , ω10 }, Ωab ∩ {ω1 , ω2 , ω5 , ω6 , ω7 , ω8 , ω11 , ω12 },     (4.18)
            Ωab ∩ {ω3 , ω4 , ω5 , ω6 , ω9 , ω10 , ω11 , ω12 }, Ωab ∩ Ω, Ωab ∩ Ø }
        = { {ω1 , ω2 , ω7 , ω8 }, {ω3 , ω4 , ω9 , ω10 }, Ωab , Ø }.
Note that C | Ωab contains only four elements (see the last equation), whereas the orig-
inal σ-algebra C contains eight elements. Checking conditions (a) to (c) of RS-Definition
1.4 for the set in the last displayed line shows that C | Ωab is in fact a σ-algebra on Ωab (see
Exercise 4-9).
Finally, we may consider the restriction X | Ωab of the original putative cause variable X
to Ωab specified by
X | Ωab (ω) = X (ω) = 0, if ω ∈ {ω1 , ω2 , ω7 , ω8 },
                      1, if ω ∈ {ω3 , ω4 , ω9 , ω10 },
and the restriction Y | Ωab of the original outcome variable Y to Ωab specified by
Y | Ωab (ω) = Y (ω) = 0, if ω ∈ {ω1 , ω3 , ω7 , ω9 },
                      1, if ω ∈ {ω2 , ω4 , ω8 , ω10 }
(see the last two columns of Table 4.2). Hence, ((Ω, A ), (Ft )t ∈T , C , DC , X , Y )| Ωab is a regular
causality setup in which we can compare treatments a and b to each other with respect to
the outcome variable Y | Ωab .
If we intend to compare two other treatment conditions to each other, for example, a to c
or b to c, then each of these comparisons would be based on the restriction of the regular
causality space ((Ω, A ), (Ft )t ∈T , C , DC ) to another subset of Ω. For comparing a to c, this
subset of Ω is
Ωac := Ω1 × {a, c } × Ω3 = {ω1 , ω2 , ω5 , ω6 , ω7 , ω8 , ω11 , ω12 }.
Now we illustrate the concepts introduced in section 4.1 by the kind of experiment pre-
sented in Table 4.3. Again, we sample a person u from the set Ω1 = { Joe, Ann }. Then the
sampled person receives or does not receive treatment a (e. g., drug a) and, simultane-
ously, he or she receives or does not receive treatment b (e. g., drug b). The pairs of these
treatment conditions are the elements of the set
Ω2 = Ω21 × Ω22 = {no, yes} × {no, yes} = {(no, no), (no, yes), (yes, no), (yes, yes)}.     (4.19)
Table 4.3. Joe and Ann with two simultaneous treatment variables

                               Unit   π2 = (π21 , π22 )   Success
Possible outcomes ωi           π1     (Treatment a, b)    π3        Z   X   1A   Y   X | Ω0   Y | Ω0
ω1  = ( Joe, no, no, −)        Joe    (no, no)            −         0   0   0    0   0        0
ω2  = ( Joe, no, no, +)        Joe    (no, no)            +         0   0   0    1   0        1
ω3  = ( Joe, no, yes, −)       Joe    (no, yes)           −         0   1   0    0
ω4  = ( Joe, no, yes, +)       Joe    (no, yes)           +         0   1   0    1
ω5  = ( Joe, yes, no, −)       Joe    (yes, no)           −         1   0   0    0
ω6  = ( Joe, yes, no, +)       Joe    (yes, no)           +         1   0   0    1
ω7  = ( Joe, yes, yes, −)      Joe    (yes, yes)          −         1   1   1    0   1        0
ω8  = ( Joe, yes, yes, +)      Joe    (yes, yes)          +         1   1   1    1   1        1
ω9  = (Ann, no, no, −)         Ann    (no, no)            −         0   0   0    0   0        0
ω10 = (Ann, no, no, +)         Ann    (no, no)            +         0   0   0    1   0        1
ω11 = (Ann, no, yes, −)        Ann    (no, yes)           −         0   1   0    0
ω12 = (Ann, no, yes, +)        Ann    (no, yes)           +         0   1   0    1
ω13 = (Ann, yes, no, −)        Ann    (yes, no)           −         1   0   0    0
ω14 = (Ann, yes, no, +)        Ann    (yes, no)           +         1   0   0    1
ω15 = (Ann, yes, yes, −)       Ann    (yes, yes)          −         1   1   1    0   1        0
ω16 = (Ann, yes, yes, +)       Ann    (yes, yes)          +         1   1   1    1   1        1

Note. Z is the treatment variable representing treatment a (1 = yes), X the treatment variable
representing treatment b (1 = yes), 1A the indicator of receiving both treatments, Y the outcome
variable (1 = success), and X | Ω0 and Y | Ω0 the restrictions of X and Y to Ω0 (defined only on Ω0 ).
Choosing again the power sets At = P (Ωt ), t ∈ T , as the σ-algebras on these sets, the measurable space
(Ω, A ) = (Ω1 × Ω2 × Ω3 , A1 ⊗ A2 ⊗ A3 )
is completely specified, and with it the projections π1 , π2 , π3 , and the σ-algebras of the filtration
(Ft )t ∈T [see Eqs. (4.1)].
Note that the first projection π1 is identical to the person variable U that has been used
in the examples of chapter 1, for instance. In this example, the second projection π2 =
(π21 , π22) consists of two projections π21 : Ω → Ω21 and π22 : Ω → Ω22 [see Eq. (4.19)]. Note
that π2 generates the same σ-algebra as (Z , X ).
Table 4.3 shows the sixteen elements ωi of the set Ω = Ω1 × Ω2 × Ω3 of all possible out-
comes of this kind of experiment. It also shows the values assigned to these elements by
the projections π1 to π3 , the treatment variable Z (representing treatment a vs. ¬a), the
treatment variable X (representing treatment b vs. ¬b), and the outcome variable Y.
First Regular Causality Setup. Again, in this kind of experiment, we can choose among
several maps as the focused putative cause variable. However, in contrast to the example
presented in section 4.2.1, now the cause and confounder σ-algebras change, even if we
stick to the unrestricted measurable space (Ω, A ).
96 4 Regular Causality Space and Potential Confounder
Referring to Definition 4.6, the index set J is {1, 2} because π2 = (π21 , π22). If we want to
choose X to take the role of the putative cause variable, then we have to specify K = {2},
which yields the cause σ-algebra
C 1 = σ(π2j , j ∈ K ) = σ(π22 ).
Another choice of the index set K would not meet conditions (a) and (b) of Definition 4.11
(i) for X . For K = {2}, condition (a) is satisfied because {Ω, Ø} ≠ σ(X ) = σ(π22 ) = C 1 . Condition
(b) is met as well because there is no proper subset K ′ of K = {2} such that X is measurable
with respect to (π2j , j ∈ K ′ ). In contrast, for K = {1}, condition (a) is not satisfied because
σ(X ) ⊄ σ(π21 ). Finally, for K = {1, 2}, condition (a) is satisfied; however, condition (b) does
not hold, because K ′ = {2} is a proper subset of K and the σ-algebra generated by X is
identical to (and therefore a subset of) σ(π2j , j ∈ K ′ ) = σ(π22 ).
Choosing K = {2} does not only imply C 1 = σ(π22 ) but also that the corresponding confounder
σ-algebra is
DC 1 = σ(π1 , π2j , j ∈ J \ K ) = σ(π1 , π21 ).
In Table 4.3 we already specified the putative cause variable, the treatment variable X , and
the outcome variable Y. In this example, σ(X ) = σ(π22) = C 1 , σ(Z ) = σ(π21 ) ⊂ DC 1 , and
σ(Y ) = σ(π3 ). Hence, the treatment variable Z , which is simultaneous to X , as well as π1
and π21 are potential confounders of X because they all are DC 1 -measurable.
Finally, the bivariate projection (π1, π21 ) as well as (π1, Z ) are global potential con-
founders of X and can take the role of D X [see Def. 4.11 (iii)] because σ(π1, π21 ) = σ(π1, Z ) =
DC 1 . Hence, we completely specified the regular causality setup
((Ω, A ), (Ft )t ∈T , C 1 , DC 1 , X , Y ).
Second Regular Causality Setup. Because X and Z are simultaneous treatment vari-
ables, their roles can be exchanged. Hence, now we consider Z as the focused putative
cause variable and X as a potential confounder of Z (see again Table 4.3). For this purpose,
we specify K = {1}, which yields C 2 = σ(π21 ) and that DC 2 = σ(π1, π22) is the confounder σ-
algebra of Z (see again Def. 4.6). Now, the bivariate projection (π1, π22) as well as (π1, X ) are
global potential confounders of Z and can take the role of D Z [see Def. 4.11 (iii)]. Hence,
the regular causality setup is now
((Ω, A ), (Ft )t ∈T , C 2 , DC 2 , Z , Y ),
and (π1, π22 ), π1, as well as X are potential confounders of Z . The outcome variable is still
Y.
Third Regular Causality Setup. In this example, we can also choose the bivariate map
(X , Z ) to take the role of a putative cause variable. This allows us to study the joint effects
of treatment a and b. In this case, K = {1, 2}, C 3 = σ(π21 , π22 ) is the cause σ-algebra, and
DC 3 = σ(π1) the confounder σ-algebra (see again Def. 4.6). Now, π1 is a global potential
confounder of the putative cause variable (X , Z ). Hence, a third regular causality setup in
the example presented in Table 4.3 is
((Ω, A ), (Ft )t ∈T , C 3 , DC 3 , (X , Z ), Y ).
Fourth Regular Causality Setup. Instead of the bivariate map (X , Z ), we may also choose
the indicator variable 1A of the event that both treatments are received, that is, of the event
{(Z , X ) = (1, 1)} (see Table 4.3).
This new putative cause variable allows us to compare receiving both treatments to re-
ceiving at most one of the two treatments. The cause and the confounder σ-algebras are
still C 3 = σ(π2) = σ(π21 , π22 ) and DC 3 = σ(π1 ), respectively, because 1A is measurable with
respect to π2 but neither with respect to π21 nor with respect to π22. Hence, we specified
the fourth regular causality setup
((Ω, A ), (Ft )t ∈T , C 3 , DC 3 , 1A , Y ).
Fifth Regular Causality Setup. If, for example, we intend to compare receiving both
treatments to receiving no treatment, then we have to use the restriction of the regular
causality space ((Ω, A ), (Ft )t ∈T , C 3 , DC 3 ) to the subset
Ω0 := Ω1 × {(no, no), (yes, yes)} × Ω3 = {ω1 , ω2 , ω7 , ω8 , ω9 , ω10 , ω15 , ω16 }.
For the cause σ-algebra, this means
C 3 | Ω0 = { {ω1 , ω2 , ω9 , ω10 }, {ω7 , ω8 , ω15 , ω16 }, Ω0 , Ø }.
For the intended comparison we can use the restriction X | Ω0 of the original putative
cause variable X (see Table 4.3) to Ω0 , specified by
X | Ω0 (ω) = X (ω) = 0, if ω ∈ {ω1 , ω2 , ω9 , ω10 },
                     1, if ω ∈ {ω7 , ω8 , ω15 , ω16 },
and the corresponding restriction Y | Ω0 of the outcome variable Y (see the last two columns
of Table 4.3).
Of course, we might also want to compare receiving only treatment a to only receiving
treatment b, or receiving both treatments to receiving only one single treatment, and so
on.
All such comparisons would need their own restriction of the regular causality space
((Ω, A ), (Ft )t ∈T , C 3 , DC 3 ) to the appropriate subset of Ω (see Exercise 4-10).
Consider again the example presented in Table 1.5. In contrast to the previous examples,
now we consider an outcome variable Y whose possible values are not only 0 and 1 any
more. Therefore, this example cannot be presented in the same format as before.
Nevertheless, we can specify the sets Ωt , t ∈ T = {1, 2, 3}, occurring in Definition 4.6. In
this example, they are Ω1 = {Tom , Tim , . . . , Mia }, Ω2 = {control, treatment 1, treatment 2 },
and Ω3 = R . Furthermore, we choose the σ-algebras on these sets to be At = P (Ωt ),
t = 1, 2, and A3 = B, the Borel σ-algebra on R (see RS-Rem. 1.14).
Requiring (Ω, A ) to be the product of the measurable spaces (Ωt , At ), t ∈T (see Def. 4.6),
the measurable space (Ω, A ) is completely specified and with it the projections π1 to π3 .
The projection π1 : Ω → Ω1 has the possible values Tom , Tim , . . . , Mia . Hence, π1 is identi-
cal to the person variable U used in Table 1.5. The projection π2 : Ω → Ω2 has the possible
values control, treatment 1, treatment 2, and the possible values of π3 : Ω → Ω3 are the real
numbers.
The filtration (Ft )t ∈T is specified by
F1 = σ(π1 ),   F2 = σ(π1 , π2 ),   F3 = σ(π1 , π2 , π3 ) = A .
Furthermore, the index sets occurring in Definition 4.6 are J = K = {1}, implying that the
cause σ-algebra is
C = σ(π2j , j ∈ K ) = σ(π2 )
and the confounder σ-algebra is DC = σ(π1 ).
Choosing X as the putative cause variable and Y as the outcome variable (see Table 1.5),
we specify the regular causality setup
((Ω, A ), (Ft )t ∈T , C , DC , X , Y ).
Note that the treatment variable X is the composition of the projection π2 and a measurable
map, which implies σ(X ) ⊂ σ(π2 ) (see RS-Lemma 2.35). Hence, the values of X only depend on the elements
of the set Ω2 = {control, treatment 1, treatment 2 }, the second factor set in the set product
Ω = Ω1 × Ω2 × Ω3 .
Similarly, the outcome variable Y is the composition of the projection π3 and a measurable
map f : (R , B) → (R , B), that is, Y = f (π3 ). This implies σ(Y ) ⊂ σ(π3 ) (see again RS-Lemma
2.35). Hence, the values of Y only depend on the elements of Ω3 = R , the third
set in the set product Ω = Ω1 × Ω2 × Ω3 .
According to Definition 4.11 (iv), each non-constant measurable map W that satisfies
σ(W ) ⊂ DC is a potential confounder of X . In this example, this applies, for instance, to
educational status specified in Table 1.5, but also to sex, which is also an attribute of the
persons in the set Ω1 = {Tom , Tim , . . . , Mia }. Again, each of these variables can be written
as the composition of some map and the projection π1, implying that they are measurable
with respect to σ(π1 ) = DC (see Exercise 4-11).
4.3 Properties
In this section, we study some properties of a regular causality space, a putative cause
variable, an outcome variable, a potential confounder, and other maps that are measur-
able with respect to these maps.
According to the following theorem, the intersection of the cause σ-algebra C and the confounder
σ-algebra DC is the trivial σ-algebra, and both C and DC are subsets of F2 . In contrast to C , the confounder σ-algebra DC
can also be a subset of F1 . According to Definition 4.6 (ii), this is the case
if J = K , implying DC = σ(π1 ). This is the case if there are no events A ∈ A that are simultaneous
to the cause σ-algebra C and that are not elements of C .
Theorem 4.25 [Properties of C and DC ]
If ((Ω, A ), (Ft )t ∈T , C , DC ) is a regular causality space, then
C ∩ DC = {Ω, Ø}     (4.22)
C ⊄ F1              (4.23)
C ⊂ F2              (4.24)
DC ⊂ F2 .           (4.25)
(Proof p. 109)
Proposition (4.22) has two immediate implications for the σ-algebra generated by a puta-
tive cause variable and the σ-algebra generated by a potential confounder.
Remark 4.26 [Two Immediate Implications] If ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) is a regular
causality setup, then
σ(X ) ∩ DC = {Ω, Ø}     (4.26)
and, for every potential confounder W of X ,
σ(X ) ∩ σ(W ) = {Ω, Ø},
because σ(W ) ⊂ DC . ⊳
Example 4.28 [Joe and Ann With a Single Treatment Variable] The issue addressed in Re-
mark 4.27 is exemplified by RS-Tables 1.2 and 1.4. These tables present random experi-
ments that differ from each other only in the probability measure P , whereas the structure
is the same as described in Example 4.10. Hence, in both examples, σ(X ) ∩ σ(U ) = {Ω, Ø}
holds for the putative cause variable X and the global potential confounder U . Because,
in this example, σ(X ) = C and σ(U ) = DC , this can be seen from Equations (4.3) and (4.4).
Obviously, the only elements shared by these σ-algebras are Ω and Ø.
Furthermore, while X and U are stochastically independent in the example of RS-Table
1.2, they are stochastically dependent in the example of RS-Table 1.4. This can be seen
from comparing the columns P (X =1|U =u ) in the two tables to each other. In RS-Table
1.2, the conditional probability P (X =1|U =u ) is the same for both persons u, which im-
plies stochastic independence of X and U . In contrast, in RS-Table 1.4, the conditional
probabilities P (X =1|U =Joe ) and P (X =1|U =Ann) differ from each other, and this implies
stochastic dependence of X and U . ⊳
According to the following theorem, a putative cause variable is not a potential con-
founder of itself, it is measurable with respect to F2, and it is not measurable with respect
to F1 .
Theorem 4.29 [Properties of a Putative Cause Variable]
If ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) is a regular causality setup, then
σ(X ) ⊄ DC      (4.29)
σ(X ) ⊂ F2      (4.30)
σ(X ) ⊄ F1 .    (4.31)
(Proof p. 110)
Theorem 4.30 [Two Properties of a Potential Confounder]
If ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) is a regular causality setup and W a potential confounder
of X , then
σ(W ) ⊂ F2        (4.32)
σ(W ) ⊄ σ(X ).    (4.33)
(Proof p. 110)
Now we turn to time order concerning a putative cause variable and a potential con-
founder. According to the first proposition of the following theorem, a measurable map
V that is prior in (Ft )t ∈T to a putative cause variable X is measurable with respect to DC .
And, according to the second proposition, if V is DC -measurable, then it is prior or simul-
taneous in (Ft )t ∈T to X . Note that V can also be a global potential confounder of X .
Theorem 4.31 [A Measurable Map That Is Prior to X Is DC -Measurable]
Let ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) be a regular causality setup, V a measurable map on
(Ω, A ), and let FT denote the filtration (Ft )t ∈T . Then
V ≺FT X ⇒ σ(V ) ⊂ DC      (4.34)
σ(V ) ⊂ DC ⇒ V ≼FT X .    (4.35)
Proposition (4.34) and Definition 4.11 (iv) immediately imply the following corollary, in
which we consider a non-constant measurable map on (Ω, A ) that is prior in (Ft )t ∈T to X .
Corollary 4.34 [Measurable Maps of a Potential Confounder]
Let ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) be a regular causality setup and let V , W be measurable
maps on (Ω, A ). If W is a potential confounder of X and {Ω, Ø} ≠ σ(V ) ⊂ σ(W ), then V
is a potential confounder of X as well.
(Proof p. 111)
Theorem 4.35 [An Outcome Variable Is F3 -Measurable]
If ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) is a regular causality setup, then σ(Y ) ⊄ F2 and
σ(Y ) ⊂ F3 .     (4.36)
(Proof p. 111)
Remark 4.36 [Immediate Implications] Note that σ(Y ) ⊄ F2 implies σ(Y ) ⊄ F1 because
(Ft )t ∈T is a filtration. It also implies σ(Y ) ⊄ DC because DC ⊂ F2 [see Prop. (4.25)]. ⊳
Corollary 4.37 [A Potential Confounder of X Is Prior in (Ft )t ∈T to Y ]
If ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) is a regular causality setup and W a potential confounder
of X , then
W ≺FT Y .     (4.37)
Theorem 4.38 [Measurable Map of Y and W ]
Let ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) be a regular causality setup, let W be F2 -measurable,
and V a measurable map on (Ω, A ) that is not F2 -measurable. If σ(V ) ⊂ σ(W, Y ), then
σ(V ) ⊂ F3      (4.38)
and
X ≺FT V .       (4.39)
Under the assumptions of Theorem 4.38, a (W,Y )-measurable map V on (Ω, A ) that is
not F2-measurable is posterior in (Ft )t ∈T to X .
According to RS-Lemma 2.34, the premise σ(V ) ⊂ σ(W, Y ) holds for the composition of
(W, Y ) and a measurable map g . Hence, the propositions of Theorem 4.38 and Corollary
4.39 apply to such a composition g (W, Y ).
Corollary 4.40 [X Is Prior to g (W, Y )]
Let ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) be a regular causality setup, let FT denote the filtration
(Ft )t ∈T , and let W : (Ω, A ) → (Ω′W , A′W ) be measurable with respect to F2 . Furthermore,
assume that g : (Ω′W × Ω′Y , A′W ⊗ A′Y ) → (Ω′g , A′g ) is a measurable map and
σ(g (W, Y )) ⊄ F2 . Then
σ(g (W, Y )) ⊂ F3      (4.40)
and
X ≺FT g (W, Y ).       (4.41)
Hence, if the composition g (W, Y ) is not measurable with respect to the σ-algebra F2,
then g (W, Y ) is F3 -measurable and X is prior in FT to g (W, Y ).
Example 4.41 [Linear Combination of W and Y ] A special case of a map g (W, Y ) men-
tioned in Corollary 4.40 that is not F2-measurable is a linear combination g (W, Y ) = αW +
βY , α, β ∈ R , provided that β ≠ 0. Note that in Definition 4.11 (ii) we assume σ(Y ) ⊄ F2 ,
which implies σ(Y ) ≠ {Ω, Ø}. ⊳
Example 4.42 [Change Score Variable] A special case of a linear combination of W and Y
mentioned in Example 4.41 is the difference variable g (W, Y ) = Y −W , where W assesses
the same attribute as Y, only before treatment (see sect. 2.2). Again, remember the requirement
σ(Y ) ⊄ F2 [see Def. 4.11 (ii)], which implies σ(Y ) ≠ {Ω, Ø}. ⊳
104 4 Regular Causality Space and Potential Confounder
Example 4.43 [Residual] Another special case of a function g (W, Y ) mentioned in Corol-
lary 4.40 that is not F2-measurable is the residual g (W, Y ) = Y − E (Y |W ) of Y with respect
to its W -conditional expectation. [Here, we additionally assume that P is a probability
measure on (Ω, A ) (see RS-sect. 4.5).] Of course, the requirements for Y mentioned in the
last examples still stand. ⊳
The conjunction of Definition 4.11, Remark 4.9, and Corollary 4.40 implies that, whenever
g (W, Y ) is not F2 -measurable, ((Ω, A, P ), (Ft )t ∈T , C , DC , X , g (W, Y ))
is a regular probabilistic causality setup. Note that in all of these setups we assume that Y
is not measurable with respect to F2 , which implies σ(Y ) ≠ {Ω, Ø}.

4.4 Summary and Conclusions
In this chapter, we specified the mathematical structure, and formulated the assumptions
under which we can define causal effects. The most important concepts are gathered in
Box 4.1 and their properties are summarized in Box 4.2.
The fundamental concept is a regular causality space ((Ω, A ), (Ft )t ∈T , C , DC ), which
consists of a measurable space (Ω, A ), a filtration (Ft )t ∈T , a cause σ-algebra C and a con-
founder σ-algebra DC . None of these concepts involves a probability measure. However,
in the context of a probability measure P on (Ω, A ), a measurable space (Ω, A ) is the math-
ematical framework for any statement about events, their probabilities, and their (causal
or noncausal) dependencies. It is also the framework for the definition of random vari-
ables, their distributions, and their (causal or noncausal) dependencies (see, e. g., Steyer &
Nagel, 2017).
Box 4.1. The concepts of a regular causality space.

((Ω, A ), (Ft )t ∈T , C , DC )   Regular causality space. It consists of a measurable space,
                                a filtration in A , and the two σ-algebras C and DC .
(Ω, A )                         Measurable space. (Ω, A ) is assumed to be the product of
                                the measurable spaces (Ωt , At ), t ∈ T = {1, 2, 3}. For all t ∈ T
                                and all ωt ∈ Ωt , we assume {ωt } ∈ At .
(Ft )t ∈T                       Filtration (Ft )t ∈T in A . It consists of the σ-algebras
                                F1 = σ(π1 ), F2 = σ(π1 , π2 ), and F3 = σ(π1 , π2 , π3 ), where πt ,
                                t ∈ T , are the projections πt : Ω → Ωt defined by πt (ω) = ωt ,
                                for all ω ∈ Ω. The projection π2 may itself consist of several
                                projections, that is, π2 = (π2j , j ∈ J ), J ⊂ N.
C                               Cause σ-algebra. For Ø ≠ K ⊂ J it is defined by
                                C = σ(π2j , j ∈ K ).
DC                              Confounder σ-algebra of C . It is defined by
                                DC = σ(π1 , π2j , j ∈ J \ K ).

A filtration (Ft )t ∈T in A allows us to define time order among elements of A , among subsets
of A , and among measurable maps on (Ω, A ) (see ch. 3). In the context of a probability
space (Ω, A, P ), this is tantamount to time order among events, among sets of events, and
among random variables, respectively. Such a time order is indispensable for a formaliza-
tion of the intuitive idea that a cause precedes its outcome. The filtration (Ft )t ∈T consists
of three σ-algebras. Their substantive interpretation is that, from the perspective of the
cause, F1 represents the set of all past events, F2 the set of all past and present events, and
F3 the set of all past, present, and future events.
The third and fourth components of a regular causality space are the cause σ-algebra
C and its confounder σ-algebra DC . In the context of a probability space and from the
perspective of a focused putative cause variable X , the σ-algebra C represents the set of
all present events with respect to which a putative cause variable X is measurable. In con-
trast, DC consists of all past and all other present events. They are the events that might
confound or disturb the stochastic dependency of the outcome variable Y on the putative
cause variable X .
Adding a putative cause variable X and an outcome variable Y to a regular causality
space constitutes a regular causality setup ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ). By definition, the
putative cause variable X satisfies two conditions. According to condition (a), it is C -
measurable and its generated σ-algebra is not trivial. The latter would be the case if X
were a constant map. According to condition (b), there is no family of projections
(π2j , j ∈ K ′ ), K ′ ⊂ K , K ′ ≠ K , such that X is measurable with respect to this family of projections.
This condition secures that the confounder σ-algebra DC is chosen such that all other
putative cause variables that are simultaneous to X are measurable with respect to DC .
By definition, the outcome variable Y is not measurable with respect to F2. Hence, an
outcome variable Y can either solely depend on π3 or depend on π3 and one or more of the
projections π1 and π2 . This allows us, for instance, to consider pre-post difference variables
(‘pre’ and ‘post’ with respect to the focused putative cause variable) as outcome variables
(see Examples 4.41 to 4.43).
Furthermore, we introduced the concepts of a potential confounder and a global po-
tential confounder of a putative cause variable X . By definition, a potential confounder W
of a putative cause variable X is a nonconstant DC -measurable map, and a global potential
confounder D X of X is a potential confounder that generates the confounder σ-algebra DC .
Hence, all potential confounders are measurable with respect to D X .
Last but not least, in Theorem 4.22 we showed that the restriction ((Ω, A ), (Ft )t ∈T , C , DC )| Ω0
of a regular causality space to Ω0 ⊂ Ω is a new regular causality space in which all
σ-algebras involved in ((Ω, A ), (Ft )t ∈T , C , DC ) are restricted to Ω0 . Adding the restrictions
of the maps X and Y to Ω0 then yields ((Ω, A ), (Ft )t ∈T , C , DC , X , Y )| Ω0 , the restriction of
a regular causality setup to Ω0 , which is the formal framework in which we can consider
putative cause variables and outcome variables that are measurable maps on Ω0 (see the
examples in sections 4.2.1 and 4.2.2).
Box 4.2. Properties of the σ-algebras in a regular causality setup.

All propositions gathered in this box refer to the same regular causality setup
((Ω, A ), (Ft )t ∈T , C , DC , X , Y ), and throughout this box, W denotes a potential confounder
of X , which, by definition, satisfies σ(W ) ⊂ DC .
C ∩ DC = {Ω, Ø}                 (i)
C ⊄ F1                          (ii)
C ⊂ F2                          (iii)
DC ⊂ F2                         (iv)
σ(Y ) ⊂ F3                      (ix)
∀ t ∈ {1, 2}: σ(Y ) ⊄ Ft        (x)
σ(Y ) ⊄ DC                      (xi)
σ(W ) ⊂ F2                      (xii)
σ(W ) ⊄ σ(X )                   (xiii)
W ≼FT X                         (xiv)

4.5 Proofs

Proof of Theorem 4.22. Remember, (Ω, A ) is the product of the measurable spaces (Ωt , At ), t ∈ T . Hence, first we
show that (Ω0 , A | Ω0 ) is the product of the measurable spaces (Ω0t , At | Ω0t ), t ∈ T . That is,
we show A | Ω0 = A1 | Ω01 ⊗ A2 | Ω02 ⊗ A3 | Ω03 . Now consider the set system
E 0 := { ×_{t=1}^{3} A 0t : A 0t ∈ At | Ω0t }
     = { ×_{t=1}^{3} A 0t : A 0t ∈ { Ω0t ∩ A t : A t ∈ At } }           [(4.5)]
     = { ×_{t=1}^{3} (Ω0t ∩ A t ) : A t ∈ At }                           [A 0t = Ω0t ∩ A t ]
     = { ×_{t=1}^{3} Ω0t ∩ ×_{t=1}^{3} A t : A t ∈ At }
     = { Ω0 ∩ ×_{t=1}^{3} A t : A t ∈ At }.                              [Ω0 = ×_{t=1}^{3} Ω0t ]
According to RS-Equation (1.10), the set system E 0 is a generating system of the product
σ-algebra A1 | Ω01 ⊗ A2 | Ω02 ⊗ A3 | Ω03 . Hence, using this fact and the last equation yields
⊗_{t=1}^{3} At | Ω0t = σ(E 0 )                                           [RS-Eq. (1.10)]
     = σ({ Ω0 ∩ ×_{t=1}^{3} A t : A t ∈ At })                            [E 0 = { Ω0 ∩ ×_{t=1}^{3} A t : A t ∈ At }]
     = σ({ Ω0 ∩ A : A ∈ ⊗_{t=1}^{3} At })                                [RS-(1.3), { ×_{t=1}^{3} A t : A t ∈ At } ⊂ ⊗_{t=1}^{3} At ]
     = { Ω0 ∩ A : A ∈ ⊗_{t=1}^{3} At }                                   [RS-(1.6)]
     = { Ω0 ∩ A : A ∈ A }                                                [Def. 4.6, RS-Eq. (1.10)]
     = A | Ω0 .                                                          [(4.5)]
Hence, A | Ω0 is in fact the product of the σ-algebras At | Ω0t , t ∈ T , as required in the definition
of a regular causality space.
Next we show that (Ft | Ω0 )t ∈T is a filtration in A | Ω0 :
F1 ⊂ F2 ⊂ F3
⇒ { Ω0 ∩ A : A ∈ F1 } ⊂ { Ω0 ∩ A : A ∈ F2 } ⊂ { Ω0 ∩ A : A ∈ F3 }
⇔ F1 | Ω0 ⊂ F2 | Ω0 ⊂ F3 | Ω0 .     [(4.5)]
The filtration (Ft | Ω0 )t ∈T satisfies Equations (4.9), which can be seen as follows:
F1 | Ω0 = { Ω0 ∩ A : A ∈ F1 }                                                          [(4.5)]
        = { A 0 : A 0 ∈ F1 | Ω0 }                                                       [A 0 := Ω0 ∩ A ]
        = { {ω ∈ Ω0 : π1 | Ω0 (ω) ∈ A 1 } : A 1 ∈ A1 | Ω01 }                              [(4.8)]
        = { (π1 | Ω0 )^{−1} (A 1 ) : A 1 ∈ A1 | Ω01 }                                     [RS-(2.1)]
        = σ(π1 | Ω0 ).                                                                  [RS-(2.11), (2.12)]
F2 | Ω0 = { Ω0 ∩ A : A ∈ F2 }                                                          [(4.5)]
        = { A 0 : A 0 ∈ F2 | Ω0 }                                                       [A 0 := Ω0 ∩ A ]
        = { {ω ∈ Ω0 : (π1 | Ω0 , π2 | Ω0 )(ω) ∈ A 12 } : A 12 ∈ A1 | Ω01 ⊗ A2 | Ω02 }       [(4.8)]
        = { (π1 | Ω0 , π2 | Ω0 )^{−1} (A 12 ) : A 12 ∈ A1 | Ω01 ⊗ A2 | Ω02 }                [RS-(2.1)]
        = σ(π1 | Ω0 , π2 | Ω0 ).                                                        [RS-(2.11), (2.12)]
F3 | Ω0 = { Ω0 ∩ A : A ∈ F3 }                                                          [(4.5)]
        = { A 0 : A 0 ∈ F3 | Ω0 }                                                       [A 0 := Ω0 ∩ A ]
        = { {ω ∈ Ω0 : (π1 | Ω0 , π2 | Ω0 , π3 | Ω0 )(ω) ∈ A 123 } : A 123 ∈ ⊗_{t=1}^{3} At | Ω0t }   [(4.8)]
        = { (π1 | Ω0 , π2 | Ω0 , π3 | Ω0 )^{−1} (A 123 ) : A 123 ∈ ⊗_{t=1}^{3} At | Ω0t }             [RS-(2.1)]
        = σ(π1 | Ω0 , π2 | Ω0 , π3 | Ω0 ).                                              [RS-(2.11), (2.12)]
C | Ω0 = { Ω0 ∩ A : A ∈ C }                                                 [(4.5)]
       = { Ω0 ∩ A : A ∈ σ(π2j , j ∈ K ) }                                    [Def. 4.6 (i)]
       = { A 0 : A 0 ∈ σ(π2j | Ω0 , j ∈ K ) }                                [A 0 = Ω0 ∩ A ]
       = σ(π2j | Ω0 , j ∈ K ).
DC | Ω0 = { Ω0 ∩ A : A ∈ DC }                                               [(4.5)]
        = { Ω0 ∩ A : A ∈ σ(π1 , π2j , j ∈ J \ K ) }                          [Def. 4.6 (ii)]
        = { A 0 : A 0 ∈ σ(π1 | Ω0 , π2j | Ω0 , j ∈ J \ K ) }                  [A 0 = Ω0 ∩ A ]
        = σ(π1 | Ω0 , π2j | Ω0 , j ∈ J \ K ).
Proof of Theorem 4.25. Equation (4.22). Note that K ∩ ({1} ∪ (J \ K )) = Ø holds for the index sets used in
Definition 4.6; hence the families of projections generating C and DC are disjoint, which yields C ∩ DC = {Ω, Ø}.
Furthermore,
C ⊂ F1 ⇔ C ⊂ σ(π1 )     [(4.1)]
       ⇒ C ⊂ DC ,       [(3.16), Def. 4.6 (ii)]
which is a contradiction to Proposition (4.22), because C ≠ {Ω, Ø} [see Def. 4.6 (i)]. This proves Proposition (4.23).
Proposition (4.24). We assume Ø ≠ K ⊂ J (see Def. 4.6). Hence, C = σ(π2j , j ∈ K ) ⊂ σ(π1 , π2 ) = F2 .
Proposition (4.25). Analogously, DC = σ(π1 , π2j , j ∈ J \ K ) ⊂ σ(π1 , π2 ) = F2 .
Proposition (4.29). In Definition 4.11 (i), we assume σ(X ) ≠ {Ω, Ø}. Hence, σ(X ) ⊂ DC
would be a contradiction to Equation (4.26).
Proposition (4.30). σ(X ) ⊂ C ⊂ F2 [see Def. 4.11 (i) (a) and (4.24)].
Proposition (4.32). σ(W ) ⊂ DC ⊂ F2 [see Def. 4.11 (iv) and (4.25)].
Proposition (4.34). According to Propositions (4.30) and (4.31), σ(X ) ⊂ F2 and σ(X ) ⊄ F1 .
Hence, V ≺FT X implies σ(V ) ⊂ F1 = σ(π1 ) ⊂ DC .
Proposition (4.35).
σ(V ) ⊂ DC
⇒ σ(V ) ⊂ σ(π1 , π2 )                                 [Def. 4.6 (ii), (3.16)]
⇔ σ(V ) ⊂ F2                                          [(4.1)]
⇒ σ(V ) ⊂ F1 ∨ (σ(V ) ⊂ F2 ∧ σ(V ) ⊄ F1 )             [F1 ⊂ F2 ]
⇒ σ(V ) ≺FT σ(X ) ∨ σ(V ) ≈FT σ(X )                    [Def. 3.3, F1 ⊉ σ(X ) ⊂ F2 , Def. 3.24]
⇔ V ≼FT X .                                           [Def. 3.37 (ii)]
4.6 Exercises
⊲ Exercise 4-1 Write down the set of all values of the bivariate projection (π1,π2 ) occurring in Equa-
tion (4.2).
⊲ Exercise 4-2 Consider Table 4.1 and gather in a set all elements of the σ-algebra σ(π1 ), using RS-
Equations (2.1), (2.11), and (2.12).
⊲ Exercise 4-3 Consider Table 4.1 and write down the product σ-algebra A1 ⊗ A2 with all its ele-
ments. The σ-algebras A1 and A2 are specified in Example 4.10 to be the power sets of Ω1 and Ω2 ,
respectively. Use RS-Equation (1.10) as well as RS-Definitions 1.4 and 1.7.
⊲ Exercise 4-4 Consider Table 4.1 and gather in two sets all elements of the two inverse images
(π1 , π2 )^{−1} ({( Joe, no )}) and (π1 , π2 )^{−1} ({( Joe, no ), (Ann, no )}), using RS-Equation (2.1).
⊲ Exercise 4-5 In the context of a probability space (Ω,A,P), what is the substantive interpretation
of the sets {ω1 ,ω2,ω5 ,ω6 } and {ω3 ,ω4 ,ω7 ,ω8 } occurring in the set C specified in Equation 4.3?
⊲ Exercise 4-6 Enumerate all elements of the σ-algebra generated by the map X : Ω → R specified
in Table 4.1, using σ(X ) = X^{−1} (B) [see RS-Eqs. (2.11) and (2.12)], where B denotes the Borel
σ-algebra on R .
⊲ Exercise 4-7 Enumerate all elements of the cause σ-algebra C = σ(π2 ) = π2^{−1} (A2 ) of the experiment presented in Table 4.2,
where A2 = P ({a, b, c }) [see RS-Eqs. (2.11) and (2.12)]. Write down the power set P ({a, b, c }), listing
its elements in the sequence corresponding to the sequence of the elements in C .
⊲ Exercise 4-8 Gather in a set all elements of the σ-algebra σ(1A ) generated by the indicator vari-
able 1A : Ω → {0,1} specified in Equation (4.15). Use the power set of {0,1} as the σ-algebra on {0,1},
and RS-Equations (2.1), (2.11), and (2.12).
⊲ Exercise 4-9 Check conditions (a) to (c) of RS-Definition 1.4 for the set in the last displayed line
of Equation (4.18).
⊲ Exercise 4-11 Consider the example presented in Table 1.5 and specify the measurable map
W : Ω → {male, female } by assigning to each element of Ω the values male or female according to
the sex of the person to be sampled. Then specify W once again as the composition of the projection
π1 : Ω → Ω1 and a map g : Ω1 → {male, female }.
Solutions
⊲ Solution 4-1 The set of all values of (π1 , π2 ) is { ( Joe, no ), ( Joe, yes ), (Ann, no ), (Ann, yes ) }.
⊲ Solution 4-2 In this example, Ω1 = { Joe, Ann } and A1 = { { Joe }, {Ann }, Ω1 , Ø } is the power set of
Ω1 (see Example 4.10). Hence,
σ(π1 ) = { π1^{−1} (A 1 ) : A 1 ∈ A1 }                                         [RS-(2.11), RS-(2.12)]
       = { π1^{−1} ({ Joe }), π1^{−1} ({Ann }), π1^{−1} (Ω1 ), π1^{−1} (Ø) }   [A1 = P (Ω1 )]
       = { {ω1 , . . . , ω4 }, {ω5 , . . . , ω8 }, Ω, Ø }.                      [RS-(2.1), Table 4.1]
⊲ Solution 4-4
(π1 , π2 )^{−1} ({( Joe, no )}) = { ω ∈ Ω: (π1 , π2 )(ω) ∈ {( Joe, no )} } = {ω1 , ω2 }
and
(π1 , π2 )^{−1} ({( Joe, no ), (Ann, no )}) = { ω ∈ Ω: (π1 , π2 )(ω) ∈ {( Joe, no ), (Ann, no )} } = {ω1 , ω2 , ω5 , ω6 }     [RS-(2.1), Table 4.1].
⊲ Solution 4-5 In the context of a probability space (Ω, A, P ), the set {ω1 , ω2 , ω5 , ω6 } is the event
that the sampled person does not receive treatment, and {ω3 , ω4 , ω7 , ω8 } is the event that it does
receive treatment.
⊲ Solution 4-6 Although there is an uncountably infinite number of elements B in the Borel σ-algebra
B on the set R of real numbers (see RS-Rem. 1.14), there are only four different inverse images
X^{−1} (B) of sets B ∈ B under X , because, for all B ∈ B,
X^{−1} (B) = {ω3 , ω4 , ω7 , ω8 },   if 0 ∉ B and 1 ∈ B
             {ω1 , ω2 , ω5 , ω6 },   if 0 ∈ B and 1 ∉ B
             Ω,                      if 0 ∈ B and 1 ∈ B
             Ø,                      if 0 ∉ B and 1 ∉ B.
These four inverse images are the elements of the σ-algebra σ(X ) = X^{−1} (B) generated by X [see
RS-Eqs. (2.11) and (2.12)]. Because, in this example, X is binary, σ(X ) = X^{−1} (B) = X^{−1} (P ({0, 1})).
⊲ Solution 4-7 A2 = P ({a, b, c }) = { {a }, {b }, {c }, {a, b }, {a, c }, {b, c }, Ω2 , Ø }.
⊲ Solution 4-8 Using the power set of {0, 1} as the σ-algebra on {0, 1},
σ(1A ) = 1A^{−1} (P ({0, 1})) = { {ω1 , ω2 , ω7 , ω8 }, {ω3 , ω4 , ω5 , ω6 , ω9 , ω10 , ω11 , ω12 }, Ω, Ø }.
⊲ Solution 4-9 Condition (a) of RS-Definition 1.4 holds for the set C | Ωab because Ωab ∈ C | Ωab .
Condition (b) also holds because the complement C c = Ωab \ C of each set C ∈ C | Ωab is an element
of C | Ωab as well. Finally, Condition (c) of RS-Definition 1.4 holds too. For each sequence C 1 ,C 2 ,...
of elements of C | Ωab , the union of these elements is an element of C | Ωab as well. Examples of such
sequences are
{ω1 ,ω2 ,ω7 ,ω8 }, Ø, Ø, ...
{ω1 ,ω2,ω7 ,ω8 }, {ω3 ,ω4 ,ω9 ,ω10 }, Ωab , Ωab ,...
and
{ω1 ,ω2,ω7 ,ω8 }, {ω3 ,ω4 ,ω9 ,ω10 }, Ωab ,Ø, Ø, ... .
The union of the elements of such sequences is always itself an element of C | Ωab .
⊲ Solution 4-11 One possibility is to specify W directly by assigning the value male or female to
each outcome ω ∈ Ω according to the sex of the person to be sampled (see Table 1.5).
Alternatively, we may define W as the composition W = g (π1 ) of the first projection π1 and a map
g : Ω1 → {male, female }, where Ω1 = {Tom, Tim, . . . , Mia } (see the first column of Table 1.5), and
g (ω1 ) = male,    if ω1 ∈ {Tom, . . . , Jim }
         female,  if ω1 ∈ {Ann, . . . , Mia }.
Chapter 5
True Outcome Variable and Causal Total Effects
In chapter 4, we introduced the notion of a regular causality setup, which is the formal
framework in which we can discuss time order of measurable maps (see ch. 3), define
putative cause variables, their potential confounders, outcome variables, and study their
mathematical properties. We also defined the concept of a global potential confounder of
a putative cause variable X , denoted D X . A global potential confounder of X comprises all
potential confounders of X in the sense that the σ-algebra generated by a potential con-
founder of X is a subset of the σ-algebra generated by D X (for a detailed summary see Box
4.1).
In chapter 4, we also mentioned that a regular causality setup is also called a regular
probabilistic causality setup if there is a probability measure P on the measurable space
(Ω, A ). In the framework of a probability space (Ω, A, P ), the measurable maps mentioned
above, including the putative cause variables, their potential confounders, and the out-
come variables are random variables on (Ω, A, P ) (see RS-Def. 2.2).
In this and all other chapters to come we assume that there is such a regular probabilistic
causality setup ((Ω, A, P ), (Ft )t ∈T , C , DC , X , Y ), which has all the properties of a
regular causality setup studied in chapter 4. In the framework of such a regular probabilis-
tic causality setup we can meaningfully define causal effects and dependencies among
two random variables X and Y on a probability space (Ω, A, P ).
We begin this chapter by defining a true outcome variable τx = E X =x (Y |D X ) as a version
of the D X -conditional expectation of Y with respect to the (X =x )-conditional probabil-
ity measure P X =x on A (see RS-ch. 5), where x denotes a value of X for which we assume
P (X =x ) > 0. As mentioned above, with D X we condition on all potential confounders of X .
Although, in empirical applications, the values of such a true outcome variable are rarely
estimable, the expectation of τx as well as the conditional expectation of τx given a covari-
ate of X , can be estimated under appropriate and realistic assumptions.
Based on the concept of a true outcome variable we then define a true total effect vari-
able. The expectation of the true total effect variable is then defined to be the causal aver-
age total effect on Y comparing x to x ′ , where x ′ denotes a second value of X . Then we turn
to the definition of a causal conditional total effect of x compared to x ′ given the value z
of a random variable Z , and a causal Z -conditional total effect function comparing treat-
ment x to treatment x ′ . Each of these kinds of conditional total effects or effect functions
provides specific information that might be of interest in empirical causal research and
evaluation studies.
In the first place, these parameters and effect functions are of a purely theoretical na-
ture. However, in the next chapter we study how and under which assumptions these vari-
ous kinds of causal effects can be identified by empirically estimable parameters and how
the causal effect functions can be identified by empirically estimable functions.
116 5 True Outcome Variable and Causal Total Effects
Requirements
Reading this chapter requires that the reader is familiar with the contents of the first five
chapters of Steyer (2024), referred to as RS-chapters 1 to 5, the first two of which have al-
ready been required in chapters 3 and 4 of this book. RS-chapter 3 deals with the concepts
expectation, variance, covariance, and correlation, and RS-chapter 4 with the concept of a
conditional expectation. The most important one for the present chapter is RS-chapter 5,
introducing the concept of a conditional expectation with respect to the probability mea-
sure P X =x .
In this chapter, we will often refer to some of the following assumptions and notation.

5.1 True Outcome Variable

In this section, we introduce the concept of a (total effects) true outcome variable of Y
given the value x of X . This concept can be defined in the framework of a regular probabilistic
causality setup ((Ω, A, P ), (Ft )t ∈T , C , DC , X , Y ) if we assume P (X =x ) > 0. Although
this concept is not mandatory for all causality conditions (see, e.g., chs. 8 and 9), it is fundamental
for several causality conditions and useful for others.
In the definition of a true outcome variable we consider a D X -conditional expecta-
tion of Y with respect to the conditional probability measure P X =x , denoted E X =x (Y |D X )
(see RS-Def. 5.4), where D X denotes a global potential confounder of X (see Def. 4.6).
Considering the term E X =x (Y |D X ) is tantamount to conditioning on the event {X =x } =
{ω ∈ Ω: X (ω) = x } and on all potential confounders of X , presuming P (X =x ) > 0.
Remark 5.3 [Two Concepts] The intuitive idea of a true outcome variable outlined in Re-
mark 5.2 can mathematically be specified by conditional expectations E X =x (Y |D X ) with
respect to a conditional probability measure P X =x (see RS-section 5.1). If P (X =x ) > 0, then
this concept can be used to describe how a numerical random variable Y depends on a
random variable D X given the event {X =x } that X takes on the value x. The difference
E X =x (Y |D X ) − E X =x ′ (Y |D X )
then defines the true effect variable comparing the values x and x ′ to each other (see
Def. 5.18), provided that we assume that E X =x (Y |D X ) and E X =x ′ (Y |D X ) are P-unique (see
sect. 5.2 below).
Alternatively, we might use the partial conditional expectations E (Y | X =x , D X ) and
E (Y | X =x ′, D X ) (see RS-Def. 5.34) comparing conditions x and x ′ to each other. According
to RS-Theorem 5.40, if P (X =x ) > 0, then E (Y | X =x , D X ) is also a version of the conditional
expectation E X =x (Y |D X ) with respect to the conditional probability measure P X =x . Hence,
we may use both concepts and their corresponding notation for the definition of a true
outcome variable. However, E X =x (Y |D X ) is more convenient because we can immediately
utilize the (at least since Kolmogorov, 1933/1977) well-known properties of conditional
expectations, which are presented in some detail in RS-chapter 4. Reading the following
definition, remember that X (Ω) = {X (ω): ω ∈ Ω } denotes the image of Ω under the map X .
⊳
If x ∈ X (Ω) and P (X =x ) > 0, then
τx := E X =x (Y |D X )     (5.1)
is called the (total effects) true outcome variable of Y given the value x of X .
1 The idea of comparing conditional expectation values on the individual level is already found in Splawa-
Neyman (1923/1990). Later, it has been oversimplified by Rubin, who compares the values Y x (u) and Y x ′ (u)
of his potential outcome variables between two treatment conditions x and x ′ (see, e. g., Rubin, 1974, 2005).
Note that τx is a random variable on the probability space (Ω, A, P ), just as the putative
cause X , the outcome variable Y, and the global potential confounder D X of X . Only Y
has to be real-valued, whereas X and D X may take on their values in any sets Ω′X and Ω′D X ,
respectively.
For a value d of D X with P (X =x , D X =d ) > 0, we then have
E X =x (Y |D X =d ) = E (Y | X =x , D X =d ),     (5.3)
[see RS-Eq. (5.26)]. That is, a value of a true outcome variable τx is identical to a con-
ditional expectation value of Y given the value x of X and the value d of a global po-
tential confounder D X . Such a conditional expectation value is uniquely defined only if
P (X =x , D X =d ) > 0 (see RS-Rem. 4.33). ⊳
Remark 5.9 [The Values of τx Cannot be Observed] Because the values of τx are the con-
ditional expectation values E X =x (Y |D X =d ), they cannot be observed. In contrast, the val-
ues of X and Y can be observed, if the random experiment represented by (Ω, A, P ) is con-
ducted. In most applications, the values of τx cannot even be estimated (for more details,
see Rem. 5.13). ⊳
Remark 5.10 [ τx is D X -Measurable] Note that τx is a random variable on the probability
space (Ω, A, P ) that is measurable with respect to D X , that is, σ(τx ) ⊂ σ(D X ). This immedi-
ately follows from RS-Definition 5.4 (a). ⊳
Remark 5.11 [Factorization of τx ] According to RS-Corollary 4.18, there is a measurable
function g x : (Ω′D X , A ′D X ) → (R, B) such that τx = E X =x (Y |D X ) = g x (D X ) is the compo-
sition of D X and the factorization g x of E X =x (Y |D X ) (see RS-sect. 5.29). Therefore, if
P (X =x , D X =d ) > 0, then we may rewrite Equations (5.2) and (5.3) as follows:
g x (d ) = E X =x (Y |D X =d ) = E (Y | X =x , D X =d ).     (5.4) ⊳

Table 5.1. Joe and Ann with self-selection. [The table lists the possible outcomes ωi , the probability
measures P ({ωi }), P X =0 ({ωi }), and P X =1 ({ωi }), the observables U (Unit), X (Treatment), and Y
(Success), and the conditional expectations P (X =1|U ), E (Y | X ), E (Y | X ,U ), the true outcome
variables τ0 = E X =0 (Y |U ) and τ1 = E X =1 (Y |U ), and their difference δ10 = τ1 − τ0 .]
Example 5.12 [Joe and Ann With Self-Selection] Consider the random experiment pre-
sented in Table 5.1. It describes the same random experiment as Table 1.2 and it has
the same structure as the experiment presented in Table 4.1. However, in Table 5.1 we
added some terms that are important to illustrate the true outcome variables τ0 , τ1 and
their difference τ1 − τ0 . Furthermore, we also used the more general notation for condi-
tional expectations, which also applies if the outcome variable is not an indicator vari-
able with values 0 and 1. In Example 4.10, we already specified the set Ω = Ω1 × Ω2 × Ω3
of possible outcomes with its elements ωi = (u, ωX , ωY ), the probability space (Ω, A, P ),
the random variables U = π1, X , Y, and the filtration (Ft )t ∈T in A, where T = {1, 2, 3}.
Furthermore, in Example 4.10 we also asserted that U = π1 is a global potential con-
founder of the treatment variable X , playing the role of D X in the general theory. Therefore,
τx = E X =x (Y |D X ) = E X =x (Y |U ) for both treatments x = 0 and x = 1.
Now we specify the values of the true outcome variables τx = E X =x (Y |U ) for the two
treatment conditions x = 0 and x = 1. According to Remark 5.11,
τ0 = g 0 (U ), (5.5)
where U : Ω → ΩU with
U (ωi ) = U ((u, ωX , ωY )) = u, for all ωi ∈ Ω, (5.6)
and g 0 : ΩU → R with
g 0 (u) = E (Y | X =0,U =u ), for all u ∈ ΩU (5.7)
[see Eq. (5.4)]. Hence, in order to assign a value of τ0 to an outcome ωi ∈ Ω of the random
experiment, first we have to assign to ωi a value u ( Joe or Ann ) of the person variable
U , and then assign to u via g 0 the number E (Y | X =0,U =u ), that is, the corresponding
conditional expectation value.
If, for instance, ω3 = ( Joe , yes , −) (see the third row in Table 5.1), then U (ω3 ) = Joe , and
the value of τ0 is
τ0 (ω3 ) = g 0 (U (ω3 )) = g 0 ( Joe ) = E (Y | X =0,U =Joe ) = P (Y =1| X =0,U =Joe ) = .7.
This is true even though X (ω3 ) = 1 and the value of the conditional expectation E (Y | X ,U )
is
E (Y | X ,U )(ω3 ) = E (Y | X =1,U =Joe ) = P (Y =1| X =1,U =Joe ) = .8.
Hence, the true outcome variable τ0 given treatment 0 takes on a well-defined value for ω3
even though the unit drawn receives treatment 1. This illustrates the distinction between
the random variables τ0 = E X=0 (Y |U ) and E (Y | X ,U ). While τ0 = g 0 (U ) is solely a function
of U [see Eq. (5.5)], the conditional expectation E (Y | X ,U ) is a function of X and U , that is,
there is a function g : ΩX′ ×ΩU → R such that E (Y | X ,U ) = g (X ,U ), where g (X ,U ) denotes
the composition of (X ,U ) and g . Again, we simply say that E (Y | X ,U ) is a function of X
and U .
Because, in this example, the outcome variable Y is dichotomous with values 0 and
1, the conditional expectation value E (Y | X =0,U =u ) is also the conditional probability
P (Y =1| X =0,U =u ) of success, and because in this example, U has only two values, Joe
and Ann , the true outcome variable τ0 also has only two different values, the two condi-
tional probabilities P (Y =1| X =0,U =Joe ) = .7 and P (Y =1 | X =0,U =Ann ) = .2 (see Table
5.1). Hence, if ωi ∈ {U =Ann} = {ω ∈ Ω: U (ω) = Ann }, then the value of τ0 is
τ0 (ωi ) = g 0 (U (ωi )) = g 0 ( Ann ) = E (Y | X =0,U =Ann ) = P (Y =1| X =0,U =Ann ) = .2.
Similarly, the true outcome variable τ1 = E X =1 (Y |U ) given treatment 1 is specified by
τ1 = g 1 (U ), (5.8)
where g 1 : ΩU → R is defined by
g 1 (u) = E (Y | X =1,U =u ), for all u ∈ ΩU . (5.9)
Hence, if ωi ∈ {U =Joe } = {ω ∈ Ω: U (ω) = Joe }, then the value of τ1 is
τ1 (ωi ) = g 1 (U (ωi )) = g 1 ( Joe ) = E (Y | X =1,U =Joe ) = P (Y =1| X =1,U =Joe ) = .8,
and if ωi ∈ {U =Ann}, then the value of τ1 is
τ1 (ωi ) = g 1 (U (ωi )) = g 1 ( Ann ) = E (Y | X =1,U =Ann ) = P (Y =1| X =1,U =Ann ) = .4.
The last two columns of Table 5.1 show which values the two random variables τ0 =
E X=0 (Y |D X ) and τ1 = E X =1 (Y |D X ) assign to each of the eight possible outcomes ωi of the
random experiment. ⊳
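As a computational illustration, the following Python sketch encodes the factorizations g 0 and g 1 of Example 5.12 as dictionaries over the values of the person variable U and evaluates the true outcome variables τ0 , τ1 and their difference δ10 = τ1 − τ0 for both persons. All four conditional probabilities are the ones given in the example; nothing else is assumed.

```python
# Factorizations g_x(u) = E(Y | X=x, U=u) = P(Y=1 | X=x, U=u) from Example 5.12
g0 = {"Joe": 0.7, "Ann": 0.2}  # true outcomes under control (x = 0)
g1 = {"Joe": 0.8, "Ann": 0.4}  # true outcomes under treatment (x = 1)

def tau(x, u):
    """Value of the true outcome variable tau_x for an outcome omega with U(omega) = u."""
    return (g1 if x == 1 else g0)[u]

for u in ("Joe", "Ann"):
    delta10 = tau(1, u) - tau(0, u)  # value of the true total effect variable
    print(f"{u}: tau_0 = {tau(0, u)}, tau_1 = {tau(1, u)}, delta_10 = {delta10:.1f}")
# Joe: tau_0 = 0.7, tau_1 = 0.8, delta_10 = 0.1
# Ann: tau_0 = 0.2, tau_1 = 0.4, delta_10 = 0.2
```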
Remark 5.13 [The Fundamental Problem of Causal Inference] In Remark 5.9 we already
mentioned that the values of τx cannot be estimated. The reason is as follows: Consider
the last but one column of Table 5.1 with the values E X =1 (Y |U =u ) of τ1 , namely .8 for
u = Joe and .4 for u = Ann . Imagine that we conduct the random experiment presented in this table, that Joe is sampled, and that he selects treatment (X =1). Then we can observe the value of Y for Joe under treatment, which is an estimate of E X =1 (Y |U =Joe ),
although a bad one. However, in many applications, we cannot estimate E X=0 (Y |U =Joe )
at the same time because, if he selects treatment, then we cannot observe his value of Y
under control, and vice versa. This has been called “the fundamental problem of causal
inference” by Holland (1986). Observing Joe’s value of the outcome variable Y first under control and then under treatment may yield a value of Y for Joe under treatment that systematically differs from the corresponding value that would be observed if Joe had not been in the control condition in the first place. There are numerous reasons for this, which are
discussed in great detail by Campbell and Stanley (1966). From a formal point of view, it
should be noted that observing Joe’s value of the outcome variable Y under control and
then again under treatment refers to a different random experiment (represented by a dif-
ferent probability space) than observing his outcome value under treatment without the
preceding observation under control. ⊳
Remark 5.15 [True Outcomes vs. Potential Outcomes] Rubin (1974, 2005) assumes that,
given an observational unit u and a treatment condition x, the values of his potential out-
come variables Y0 and Y1 are fixed numbers. In the example presented in Table 5.1, this
would mean to replace the two true outcome variables τ0 and τ1 by the two potential out-
come variables Y0 and Y1 that can take on only the values 0 and 1. Substantively speaking,
this would mean that, given a concrete treatment and a concrete observational unit, the
outcome is fixed to 0 or 1. For example, if the outcome is being alive (Y =1) or not (Y = 0) at
the age of 80 and the treatment is receiving (X =1) or not receiving (X =0) an anti-smoking
therapy before the age of 40, then this deterministic idea is not in line with our knowledge
of causes for being alive or not being alive at the age of 80.
In contrast, the two true outcome variables τ0 and τ1 can take on any real number
as values. In the example of Table 5.1, they can take on any value between 0 and 1, in-
clusively. In the smoking example, their values would be the person-specific probabil-
ities of being alive at the age of 80 given treatment or given no treatment. [As we will
see later on, it is irrelevant that we usually cannot estimate these conditional probabili-
ties P (Y =1| X =1,U =u ).] Therefore, the true outcome variables can be considered to be
a generalization of the potential outcome variables. Most important, in contrast to po-
tential outcome variables, true outcome variables are in line with the idea that mediators
and events that may occur between treatment and outcome might also affect the outcome
variable Y.
Hence, in Rubin’s potential outcome approach the observational unit u and treatment
x determine the value of the outcome variable Y. In contrast, according to true outcome
theory, u and x only determine the conditional distribution P Y |X =x ,U =u , and with it, the
conditional expectation value E (Y |X =x ,U =u ) of the outcome variable Y. ⊳
In general, this random variable is not uniquely defined, so that there can be many versions of such a true total effect variable. However, assuming P -uniqueness of τx = E X =x (Y |D X ) and of τx ′ = E X =x ′ (Y |D X ) implies that the difference τx − τx ′ is P-unique as well [see RS-Box 5.1 (viii)].
Remark 5.16 [True Outcome Variables Are P X =x -Unique] In Remark 5.5 we already men-
tioned that, according to its definition as a D X -conditional expectation of Y with respect
to the probability measure P X =x , a true outcome variable τx is P X =x -unique. That is, if
E X =x (Y |D X ) denotes the set of all versions of the D X -conditional expectation of Y with
respect to the measure P X =x , then the following holds: whenever τx and τx∗ are two versions of the D X -conditional expectation of Y with respect to P X =x , they are identical P X =x -almost surely. ⊳
Remark 5.17 [Assuming P -Uniqueness of True Outcome Variables] However, in the def-
inition of a true total effect variable (see Def. 5.18) we assume that the true outcome vari-
ables τx and τx ′ are P -unique. Remember that P -uniqueness of τx means that all versions of the D X -conditional expectation of Y with respect to P X =x are identical P-almost surely, and analogously for τx ′ .
Definition 5.18 [True Total Effect Variable and True Total Effect ]
Let the Assumptions 5.1 (a) to (d) and (g) hold. Furthermore, let τx and τx ′ denote true
outcome variables of Y given the values x and x ′ of X , respectively.
(i) If τx and τx ′ are P-unique, then we call CTE D X ; x x ′ : Ω′D X → R a version of the
true total effect function comparing x to x ′ (with respect to Y), if
CTE D X ; xx ′ (D X ) =_P τx − τx ′ . (5.15)
Of course, the concepts of a true total effect and a true total effect variable are of a the-
oretical nature and can be estimated only under rather restrictive assumptions. However,
other causal total effects such as the expectation of τx − τx ′ (see sect. 5.3) can be estimated
under realistic assumptions.
Example 5.20 [Joe and Ann With Self-Selection] In the example presented in Table 5.1, the true total effect function CTE U ;10 : ΩU → R is
CTE U ;10 := g 1 − g 0 .
Its value CTE U ;10 ( Joe ) = g 1 ( Joe ) − g 0 ( Joe ) = .8 − .7 = .1 is the treatment effect of Joe, and CTE U ;10 ( Ann ) = g 1 ( Ann ) − g 0 ( Ann ) = .4 − .2 = .2 is
the treatment effect of Ann. Hence, CTE U ;10 assigns to each person u ∈ ΩU the person-
specific true effect on the outcome variable Y (success) comparing x =1 to x = 0. ⊳
Remark 5.21 [Uniqueness of a True Total Effect Variable] For simplicity, we denote a true
total effect variable CTE D X ; x x ′ (D X ) also by
δxx ′ = τx − τx ′ . (5.17)
[see Def. 5.18 (iii) and RS-Def. 5.30]. If P (X =x , D X =d ), P (X =x ′, D X =d ) > 0, then the value CTE D X ; xx ′ (d ) is identical for all versions of the true total effect function CTE D X ; xx ′ and for all versions of the true total effect variable CTE D X ; xx ′ (D X ) = δxx ′ . ⊳
Remark 5.24 [Probabilistic Ceteris Paribus Clause] Considering a value CTE D X ; x x ′ (d) is
tantamount to comparing x to x ′ (with respect to the outcome variable Y ) keeping con-
stant the global potential confounder D X , and with it, keeping constant all potential con-
founders of X . Keeping D X constant is the translation of the ceteris paribus clause for total
effects into probability theory. ⊳
Example 5.25 [Joe and Ann With Self-Selection] In Example 5.20, we already specified
the true total effect function CTE U ;10 : ΩU → R . The corresponding true total effect vari-
able CTE U ;10 (U ) is the composition of U and CTE U ;10 . It is a random variable on (Ω, A, P ).
For all ωi ∈ Ω, its values are assigned by
δxx ′ (ωi ) = CTE U ;10 (U )(ωi ) = CTE U ;10 (U (ωi ))
 = CTE U ;10 ( Joe ) = .1, if ωi ∈ {U =Joe },
 = CTE U ;10 ( Ann ) = .2, if ωi ∈ {U =Ann }.
The expectation E (δxx ′ ) is identical for all versions of a true total effect variable [see Eq. (5.17) and RS-Box 3.1 (vii)]. Hence, the causal average
total effect is the expectation of a true total effect variable δxx ′ ; it is not an unweighted
average as the name might suggest.
As mentioned before, this expectation can be estimated under assumptions that are
less restrictive than those that allow estimating the total effect variable δxx ′ itself. This will
be detailed in the chapters on unbiasedness and its sufficient conditions.
Taking the expectation of δxx ′ (with respect to P ) means that we consider the average
total effect with respect to the measure P , that is,
ATE xx ′ = E (δxx ′ ) = E ( CTE D X ; x x ′ (D X ) ) = E ( E X =x (Y |D X ) − E X =x ′ (Y |D X ) ). (5.22)
Note again that CTE D X ; x x ′ (D X ) is a random variable on (Ω, A, P ). In principle, we can also
consider the average total effect with respect to another measure than P (for more details
see Rem. 5.45).
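Equation (5.22) can be made concrete with a small Python sketch that computes ATE xx ′ as the probability-weighted average of the values of δxx ′ = τx − τx ′ over a discrete global potential confounder. The τ-values below are those of Table 5.1; the person probabilities P (U =Joe ) = P (U =Ann ) = .5 are hypothetical, since the marginal distribution of U is not reproduced in this section.

```python
# Sketch of Eq. (5.22): ATE = E(tau_x - tau_x') = sum_d (tau_x(d) - tau_x'(d)) * P(D_X = d).
# Here D_X = U with values Joe and Ann; the tau-values are taken from Table 5.1,
# whereas the weights P(U=u) = 0.5 are hypothetical (not given in this section).
tau1 = {"Joe": 0.8, "Ann": 0.4}
tau0 = {"Joe": 0.7, "Ann": 0.2}
p_u = {"Joe": 0.5, "Ann": 0.5}  # hypothetical marginal distribution of U

ate_10 = sum((tau1[u] - tau0[u]) * p_u[u] for u in p_u)
print(round(ate_10, 3))  # 0.15 under these hypothetical weights
```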
Note that CTE D X ; x x ′ is a random variable on (Ω′D X , A ′D X , P D X ), not on (Ω, A, P ) (see again
RS-Rem. 2.39). ⊳
Remark 5.28 [P -Uniqueness and Expectations of τx and τx ′ ] According to RS-Box 5.1 (vii),
P -uniqueness of τx also implies that E (τx ) = E (τx∗ ) for any two versions τx , τx∗ ∈ E X =x (Y |D X ). Hence, under the assumptions of Definition 5.26, the expectation of τx is a uniquely defined number, and the same applies to the expectation of τx ′ . 2 ⊳
If the true outcome variables τx and τx ′ are not P-unique or the expectations E (τx ) and
E (τx ′ ) are not finite, then the causal average total effect ATE x x ′ is not defined. ⊳
Now we gather some sufficient conditions for E (τx ) to be finite. For more details and
proofs see SN-Remark 14.47, from which the following remark is adapted.
Remark 5.30 [Sufficient Conditions for Finiteness of E (τx )] Remember, the expectation of a random variable Y on a probability space (Ω, A, P ) exists if the integral ∫ Y + d P of the positive part of Y or the integral ∫ Y − d P of the negative part of Y is finite (see RS-Def. 3.1). Hence, under the assumptions of Definition 5.26, some sufficient conditions for the expectation E (τx ) of any version τx ∈ E X =x (Y |D X ) to exist and to be finite are:
(a) σ(D X ) is a finite set and E X =x (Y ) is finite,
(b) τx has only a finite number of real values,
(c) τx is P -almost surely bounded on both sides, that is, there is α ∈ R such that −α ≤ τx ≤ α P -almost surely,
(d) Y is P -almost surely bounded on both sides, that is, there is α ∈ R such that −α ≤ Y ≤ α P -almost surely.
Note that 0 ≤ Y ≤ α P -almost surely, for 0 < α ∈ R , as well as Y = 1A , for A ∈ A, are special cases of (d). ⊳
Remark 5.31 [Causal Average Effect] If X represents a treatment variable, then ATE x x ′ is
also called the ‘average causal effect’ or the ‘causal average treatment effect’ comparing
treatment x to treatment x ′, which is unambiguous as long as no direct and/or indirect
treatment effects are discussed in the same context. ⊳
2 The expectation E (τx ) corresponds to the term E [Y |do(x)] in Pearl’s and to E (Yx ) in Rubin’s terminologies (see, e. g., Pearl, 2009, p. 108 and Rubin, 2005, p. 323).
In the example presented in Table 5.1, the causal average total effect is
ATE 10 = E (τ1 ) − E (τ0 )
 = Σu g 1 (u) · P (U =u ) − Σu g 0 (u) · P (U =u )
 = P (Y =1| X =1,U =Joe ) · P (U =Joe ) + P (Y =1| X =1,U =Ann ) · P (U =Ann )
 − ( P (Y =1| X =0,U =Joe ) · P (U =Joe ) + P (Y =1| X =0,U =Ann ) · P (U =Ann ) ).
Figure 1.3 also illustrates various conditional probabilities of success. The points marked
by the dashed line are the probabilities P (Y =1|X =1,U =Joe ) and P (Y =1|X =0,U =Joe ) of
success for Joe given that he is treated and given that he is not treated, respectively. Simi-
larly, the two points marked by the solid line show the probabilities P (Y =1|X =1,U =Ann)
and P (Y =1|X =0,U =Ann) of success for Ann given that she is treated and given that she
is not treated, respectively. The points marked by the dotted line represent the conditional
probabilities P (Y =1| X =1) and P (Y =1| X =0) of success given treatment and given con-
trol, respectively. The size of the area of the dotted circles is proportional to the conditional
probabilities P (U =u |X =x ) that are used in the computation of the conditional expecta-
tion values
E (Y |X =x ) = Σu E (Y | X =x ,U =u ) · P (U =u |X =x ). (5.25)
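Equation (5.25) shows that E (Y |X =x ) weights the person-specific values E (Y | X =x ,U =u ) by P (U =u |X =x ) rather than by P (U =u ). The following sketch illustrates how this can make the prima facie effect E (Y |X =1) − E (Y |X =0) differ sharply from the individual effects; the conditional expectation values are those of Table 5.1, whereas the selection probabilities P (U =u |X =x ) are hypothetical and only serve to mimic self-selection.

```python
# Sketch of Eq. (5.25): E(Y|X=x) = sum_u E(Y|X=x, U=u) * P(U=u | X=x).
ey = {(0, "Joe"): 0.7, (1, "Joe"): 0.8,   # E(Y|X=x, U=u) from Table 5.1
      (0, "Ann"): 0.2, (1, "Ann"): 0.4}
# Hypothetical self-selection: Joe is over-represented in the treatment condition.
p_u_given_x = {0: {"Joe": 0.2, "Ann": 0.8}, 1: {"Joe": 0.8, "Ann": 0.2}}

def e_y_given_x(x):
    return sum(ey[(x, u)] * p_u_given_x[x][u] for u in ("Joe", "Ann"))

prima_facie = e_y_given_x(1) - e_y_given_x(0)
print(round(prima_facie, 2))  # 0.42 here, although the individual effects are only .1 and .2
```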
5.4 Causal Conditional Total Effect and Total Effect Function
In this section, we introduce the concepts of the causal conditional total effect given the value z of Z and of a causal Z-
conditional total effect function. Often Z is a pretest that assesses the ‘same’ attribute as the
outcome variable Y, only prior to treatment. In other examples, Z could be X itself. First,
we explain the assumptions based on which we can introduce these concepts, present
their definitions, and then turn to re-aggregating conditional effects.
One of the assumptions in the definition of a causal conditional total effect is P Z=z -unique-
ness of the true outcome variables τx and τx ′ , which is defined in the following remark.
Remark 5.34 [P Z=z -Uniqueness of a True Total Effect Variable] Let the Assumptions 5.1
(a) to (f) hold. Then τx is called P Z=z -unique if all versions of the D X -conditional expectation of Y with respect to the probability measure P X =x are identical, P Z=z -almost surely.
Also note that P Z=z -uniqueness of τx is equivalent to P (X =x |D X ) > 0 P Z=z -almost surely (see RS-Th. 5.27), that is, P Z=z ({P (X =x |D X ) > 0}) = 1.
Remark 5.35 [An Implication of P Z=z -Uniqueness of τx and τx ′ ] The conditional expecta-
tion value E (τx − τx ′ |Z =z) is a uniquely defined number if P (Z =z) > 0 and we assume
P Z=z -uniqueness of τx and τx ′ . In other words, if τx , τx ′ are P Z=z-unique and δxx ′ , δ∗xx ′ are different versions of the true total effect variable CTE D X ; xx ′ (D X ) [see Def. 5.18 (ii)], then
E Z=z (δxx ′ ) = E Z=z (δ∗xx ′ ) = E (δxx ′ | Z =z) = E (δ∗xx ′ | Z =z). (5.28) ⊳
Remark 5.36 [P -Uniqueness Implies P Z=z -Uniqueness] If P (Z =z) > 0, then P -unique-
ness of τx = E X =x (Y |D X ) implies that τx is also P Z=z-unique [see RS-Box 5.1 (v)]. In con-
trast, P Z=z -uniqueness of τx does not imply that it is also P-unique. Hence, P Z=z-unique-
ness of τx is a weaker assumption than P -uniqueness of τx . However, there are conditions
under which P Z=z -uniqueness implies P -uniqueness, so that they are equivalent to each
other. ⊳
( ∀ z ∈ Z (Ω): τx is P Z=z -unique ) ⇔ τx is P-unique. (5.29)
(Proof p. 147)
Hence, under the assumptions of Lemma 5.37, τx is P Z=z-unique for all values z of Z
if and only if τx is P-unique. However, note that P -uniqueness of τx is also defined if Z is
continuous and P (Z =z) = 0, for all z ∈ Z (Ω). In this case, Proposition (5.29) does not hold.
CTE Z ; xx ′ (Z ) =_P E (τx − τx ′ |Z ) (5.30)
Under the assumptions of Definition 5.38 (ii), CTE Z ; x x ′ (z) is uniquely defined. Further-
more, if τx and τx ′ are P Z=z-unique for all z ∈ Z (Ω), then, according to Lemma 5.37, they
are also P-unique.
That is, if τx and τx ′ are P Z=z-unique for all z ∈ Z (Ω), then the causal (Z =z)-conditional
total effects CTE Z ; x x ′ (z) are the uniquely defined values of the causal Z -conditional total
effect function CTE Z ; xx ′ .
Remark 5.40 [CTE Z ; x x ′ (Z ) Versus CTE Z ; x x ′ ] While the composition CTE Z ; x x ′ (Z ) is a ran-
dom variable on the probability space (Ω, A, P ), which assigns values to all ω ∈ Ω, the func-
tion CTE Z ; xx ′ is a random variable on (ΩZ′ , AZ′ , P Z ), which assigns values to all z ∈ ΩZ′ . It is
the factorization of the conditional expectation CTE Z ; x x ′ (Z ) (see RS-sect. 4.3). ⊳
Conditioning on the global potential confounder D X translates the ceteris paribus clause into the language of probability theory. It has been used to define a
true outcome variable τx = E X =x (Y |D X ).
Estimating the true outcome variables τx and τx ′ or their difference δxx ′ requires strong
assumptions. Considering a causal Z-conditional total effect variable, we re-aggregate the
true total effect variable τx − τx ′ [see Eq. (5.30)]. This yields a less fine-grained or coarsened
total effect variable, but it is still a causal conditional total effect variable. Considering
E (τx − τx ′ |Z ) instead of τx − τx ′ itself, we may lose information, but we do not lose causal
interpretability. In contrast to a true total effect variable τx − τx ′ , a Z-conditional total effect
variable E (τx − τx ′ |Z ) can often be identified under realistic assumptions by empirically
estimable conditional expectations (see, e. g., chs. 6 to 11). ⊳
Remark 5.42 [E (τx |Z ) Versus E X =x (Y | Z )] Note the distinction between the two Z-condi-
tional expectations E (τx |Z ) and E (τx ′ |Z ) on one side and the two Z-conditional expectations E X =x (Y | Z ) and E X =x ′ (Y | Z ) on the other side. The difference between the first two
conditional expectations is a causal Z-conditional total effect variable, that is,
CTE Z ; x x ′ (Z ) =_P E (τx − τx ′ |Z ) =_P E (τx |Z ) − E (τx ′ |Z ). (5.34)
In contrast, the conditional expectations E X =x (Y | Z ) and E X =x ′ (Y | Z ) and their difference have no causal meaning unless they are unbiased (see Def. 6.13). The difference E X =x (Y | Z ) − E X =x ′ (Y | Z ) is just a Z -conditional prima facie effect variable, which can be
seriously misleading if erroneously interpreted as a causal conditional total effect variable.
⊳
Remark 5.43 [Coarsening the True Total Effect Variable τx − τx ′ ] With E (τx |Z ) we coarsen
(or re-aggregate) the true outcome variable τx = E X =x (Y |D X ). Conditioning on the global
potential confounder D X we control for all potential confounders of X . Therefore the conditional expectations E X =x (Y |D X ) and E X =x ′ (Y |D X ) inform us how Y depends on the values x and x ′ controlling for all potential confounders of X . Hence, considering the Z-con-
ditional expectation of the difference variable τx − τx ′ does not introduce bias. As already
stated in Remark 5.41, it just coarsens the most fine-grained total effects to causal total
effects that are less fine-grained. In contrast, considering the conditional expectations E X =x (Y | Z ) and E X =x ′ (Y | Z ) and their difference, we only control for Z , possibly neglecting important potential confounders. In chapter 6 we will define E X =x (Y | Z ) to be unbiased or biased depending on whether or not E X =x (Y | Z ) =_P E (τx |Z ) (see again Def. 6.13). ⊳
Remark 5.44 [Values of a Causal Conditional Total Effect Variable] Note that the compo-
sition CTE Z ; x x ′ (Z ) of the conditional total effect function CTE Z ; x x ′ and Z is a random vari-
able on (Ω, A, P ), and, according to RS-Equation (4.18), its values are CTE Z ; x x ′ (Z )(ω) = CTE Z ; x x ′ (Z (ω)), for all ω ∈ Ω.
Hence, if z is a value of Z such that the assumptions of Definition 5.38 (ii) are satisfied, then CTE Z ; x x ′ (z) is uniquely defined and it is identical to the (Z =z)-conditional expectation value of τx − τx ′ given the event {Z =z} = {ω ∈ Ω: Z (ω) = z }. Furthermore, according to RS-Equation (3.35),
CTE Z ; x x ′ (z) = E (τx − τx ′ | Z =z) = E (τx | Z =z) − E (τx ′ | Z =z).
That is, a value of the causal conditional total effect variable CTE Z ; xx ′ (Z ) is the difference
between the (Z =z)-conditional expectation values of τx and τx ′ . ⊳
Remark 5.45 [Average Total Effect With Respect to P Z=z ] In Definition 5.38 (ii) it is as-
sumed that P (Z =z) > 0 and that τx and τx ′ are P Z=z-unique. Therefore, the causal (Z =z)-
conditional total effect CTE Z ; x x ′ (z) on Y comparing x to x ′ is identical to the causal aver-
age total effect on Y comparing x to x ′ with respect to the measure P Z=z . That is,
CTE Z ; x x ′ (z) = E Z=z (δxx ′ ) = E Z=z (τx ) − E Z=z (τx ′ ). ⊳
Remark 5.46 [Causal Conditional Versus Causal Average Total Effects] A causal condi-
tional total effect variable is more informative than the causal average total effect. If the
values z of Z are pretest scores that assess the ‘same’ attribute (e. g., life satisfaction) as the
outcome variable Y (the post-test), but prior to the onset of the treatment, then compar-
ing the conditional total effects CTE Z ; xx ′ (z) and CTE Z ; xx ′ (z ′ ) shows if these conditional
total effects are different for different values z and z ′ of this pretest. If they are, then the
numbers CTE Z ; xx ′ (z) and CTE Z ; x x ′ (z ′ ) may inform us about the differential indication of
the treatment. That is, they answer questions such as “Which treatment is good for which
kind of persons?” ⊳
Note that we can talk about a causal individual total effect although the individual is not yet treated and
even if it will never be treated, just in the same way as we can talk about the probability of
flipping ‘heads’, even if the coin is never flipped. ⊳
If
E (τx |X ) =_P E (τx ) and E (τx ′ |X ) =_P E (τx ′ ), (5.39)
then
CTE X ; x x ′ (X ) =_P E (τx − τx ′ |X ) =_P E (τx |X ) − E (τx ′ |X ) =_P E (τx ) − E (τx ′ ) = ATE x x ′ . (5.40)
Hence, if Proposition (5.39) holds, then the causal (X =x ∗)-conditional total treatment ef-
fects CTE X ; x x ′ (x ∗ ) are identical for all values x ∗ of X for which P (X =x ∗) > 0. A sufficient
condition of Proposition (5.39) is stochastic independence of X and the global potential
confounder D X (see Exercise 5-12), a condition that is created in the randomized experi-
ment. (For more details see ch. 8.) ⊳
Remark 5.50 [CTE X ; xx ′ (x) Versus CTE X ; xx ′ (x ′ )] Suppose we are interested in the effects
of a treatment (represented by X =x ) compared to a control (represented by X =x ′ ) with
respect to the outcome variable Y, say well-being, and assume that there is no random
assignment of persons to treatments. In this case, the persons that tend to take the treat-
ment may differ in their well-being before treatment and in other pre-treatment variables
from those who tend to be in the control condition. If so, there might be large differences between the causal (X =x )-conditional total effect CTE X ; xx ′ (x) and the causal (X =x ′ )-conditional total effect CTE X ; xx ′ (x ′ ). In this scenario the causal average total effect ATE xx ′ would not be of much interest. The causal (X =x )-conditional effect CTE X ; x x ′ (x) helps us evaluate how good the treatment is on average for those who tend
to take this treatment. In contrast, CTE X ; x x ′ (x ′ ) informs us about the average effect of the
treatment on those who tend not to take the treatment — under the side conditions under
which the random experiment is to be conducted. Hence, if the causal conditional total
effect CTE X ; xx ′ (x) is smaller than the causal conditional total effect CTE X ; x x ′ (x ′ ), one may
raise the question whether or not it would be worthwhile to change the regime of assigning
units to treatment, provided, of course, that this regime is under our control. ⊳
Remark 5.51 [Multivariate Random Variable Z ] Also note that the concept of a causal
(Z =z)-conditional total effect is not restricted to a univariate random variable Z . Instead,
Z = (Z 1 , . . . , Z m ) may also be an m-variate random variable on (Ω, A, P ) such that a value
z = (z1 , . . . , zm ) of Z is an m-tuple of values of the random variables Z 1 , . . . , Z m . ⊳
In this section we consider complete re-aggregation of conditional effects, that is, we con-
sider the expectation of a causal total effect variable CTE Z ; xx ′ (Z ). According to Theorem 5.52,
this expectation is identical to the causal average total effect on Y comparing x to x ′ .
Now we turn to a less rigorous re-aggregation of a causal total effect variable CTE Z ; x x ′ (Z ), considering a W-conditional expectation of CTE Z ; xx ′ (Z ), which, according to Theorem 5.55, is a causal W-conditional total effect variable CTE W ; xx ′ (W ), provided that W is Z -measurable, that is, provided that σ(W ) ⊂ σ(Z ). Furthermore, the conditional expectation value E ( CTE Z ; xx ′ (Z ) | W =w ) is identical to the causal (W =w)-conditional total effect on Y comparing x to x ′ , if we assume P (W =w) > 0.
Remark 5.56 [Partial Re-Aggregation] According to Theorem 5.55 (i), the W-conditional
expectation of a causal Z-conditional total effect variable is P -almost surely identical to
a causal W-conditional total effect variable, provided that W is Z -measurable. If W is Z -measurable, then σ(W ) ⊂ σ(Z ), and if also σ(W ) ≠ σ(Z ), then the conditional expectation E ( CTE Z ; xx ′ (Z ) | W ) may be called a partial re-aggregation of the original causal total effect variable CTE Z ; xx ′ (Z ). It is tantamount to coarsening the original causal total effect variable CTE Z ; x x ′ (Z ) to a less fine-grained causal total effect variable CTE W ; x x ′ (W ). ⊳
Remark 5.57 [The Proper Way of Partial Re-Aggregation] Inserting the definition of the
causal conditional total effect variable CTE Z ; xx ′ (Z ) [see Eq. (5.30)] into Equation (5.45)
yields
CTE W ; x x ′ (W ) =_P E ( CTE Z ; xx ′ (Z ) | W ) [(5.45)]
 =_P E ( E (τx − τx ′ |Z ) | W ) [(5.30)] (5.47)
 =_P E (τx − τx ′ |W ) [RS-Box 4.1 (xiii)]
 =_P E (τx |W ) − E (τx ′ |W ). [RS-Box 4.1 (xviii)]
Hence, partial re-aggregation actually yields the causal W -conditional effect variable, that
is, the W -conditional expectation of the true effect variable τx − τx ′ [see Def. 5.18 (i)].
If Y is binary, then we take a W-conditional expectation of Z-conditional probabilities
P X =x (Y =1|D X ) = E X =x (Y |D X ) = τx and P X =x ′ (Y =1|D X ) = E X =x ′ (Y |D X ) = τx ′ ,
and not of their log odds ratios or other transformations of these probabilities. In general,
a partial re-aggregation of such transformed probabilities does not yield a causal W -con-
ditional total effect variable. ⊳
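This caution against transforming the probabilities before re-aggregating can be checked numerically. In the following sketch, two conditional success probabilities are re-aggregated to a coarser level once on the probability scale and once on the log-odds scale; only the former corresponds to the conditional expectation discussed above. All probabilities and weights in the sketch are hypothetical and only serve to illustrate the point.

```python
import math

# Two hypothetical Z-conditional success probabilities within one value w of a coarser
# variable W, together with hypothetical weights P(Z=z | W=w).
p_z = [0.9, 0.5]
w_z = [0.5, 0.5]

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Correct partial re-aggregation: W-conditional expectation of the probabilities themselves.
p_agg = sum(p * w for p, w in zip(p_z, w_z))                          # 0.7

# Averaging the log odds instead and transforming back yields a different number.
p_via_logit = inv_logit(sum(logit(p) * w for p, w in zip(p_z, w_z)))  # 0.75

print(round(p_agg, 3), round(p_via_logit, 3))
```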
5.5 Example: Joe and Ann With Bias at the Individual Level
Now we illustrate the various causal total effects by an example in which there are two
treatment variables. Such a two-factorial experiment has already been discussed at an in-
formal level in section 2.3 and at a structural level in section 4.2.2. In contrast to the ex-
ample in section 4.2.2, now the outcome variable is not binary any more. In this example,
we also exemplify that bias can occur at the individual level if we control only for the person variable but not for the second treatment variable that is simultaneous with the focal putative cause variable.
The set of possible outcomes is Ω = Ω1 × Ω2 × Ω3 = ΩU × (ΩZ × ΩX ) × ΩY ,
where
Ω1 = ΩU := { Joe, Ann },
Ω2 = ΩZ × ΩX := { (no , no ), (no , yes ), (yes , no ), (yes , yes ) },
and ΩY is the set of possible observations based on which the score of the outcome vari-
able Y is computed.
If Ω3 = ΩY is finite or countably infinite, then we may choose A = P (Ω) as the σ-al-
gebra on Ω, where P (Ω) denotes the power set of Ω. However, if ΩY = R , then A is the
product σ-algebra A = P (ΩU )⊗P (ΩZ ×ΩX )⊗B, where B denotes the Borel σ-algebra on
R (see RS-Rem. 1.14).
The probability measure P on (Ω, A ) is known only to the extent that it can be computed from the parameters displayed in Table 5.2. For example, the conditional prob-
abilities for the two kinds of treatments are as follows: Joe receives treatment a (Z =1)
with probability P (Z =1|U =Joe ) = 1/2 and he receives treatment b (X =1) with probability
P (X =1|U =Joe, Z = 0) = 3/4 if he does not receive treatment a (Z = 0), and with probability
P (X =1|U =Joe, Z =1) = 1/4 if he receives treatment a (Z =1) as well. Similarly, Ann receives
treatment b (X =1) with probability P (X =1|U =Ann, Z = 0) = 3/4 if she does not receive
treatment a (Z = 0), and with probability P (X =1|U =Ann, Z =1) = 1/4 if she also receives
treatment a (Z =1). This is a realistic scenario if the probability of a person getting treat-
ment b is fixed by design depending on whether or not this person receives treatment a.
(Availability of resources might be a reason for such a design.)
We specify the filtration (Ft )t ∈T , T = {1, 2, 3}, as follows: F1 = σ(π1 ) = σ(U ), F2 = σ(π1, π2 ) =
σ(U , Z , X ), and F3 = σ(π1, π2, π3 ) = σ(U , Z , X , Y ), presuming that the two treatment vari-
ables X and Z are simultaneous. Of course, treatments are processes in time and they
could be represented by a more fine-grained filtration that would allow us to study de-
pendencies within the treatment process. However, for our present purpose the filtration
(Ft )t ∈T specified above will suffice.
Table 5.2. Joe and Ann with bias at the individual level

Person u   P (U =u )   Group therapy z   P (Z=z |U =u )   P (X =1 |U =u , Z=z)   E X=0 (Y |U =u , Z=z)   E X =1 (Y |U =u , Z=z)   P X=0 (Z=z |U =u )   P X =1 (Z=z |U =u )
Joe        1/2         0                 1/2              3/4                    68                      82                       1/4                  3/4
                       1                 1/2              1/4                    96                      100                      3/4                  1/4
Ann        1/2         0                 1/2              3/4                    80                      98                       1/4                  3/4
                       1                 1/2              1/4                    104                     106                      3/4                  1/4
There are several true total causal effects we might look at. In principle, we might be inter-
ested in
(a1 ) the causal individual total effect on Y of treatment a (Z =1) compared to not treat-
ment a (Z = 0) given that Joe (Ann) also receives treatment b (X =1),
(b 1 ) the causal individual total effect on Y of treatment a (Z =1) compared to not treat-
ment a (Z = 0) given that Joe (Ann) does not receive treatment b (X =0), and
(c 1 ) the average of these causal individual total effects, averaging over the two values of
X (representing treatment b and not treatment b, respectively).
Of course, the causal average effect is certainly less informative than the two conditional
effects.
Similarly, we may also be interested in
(a2 ) the causal individual total effect on Y of treatment b (X =1) compared to not treat-
ment b (X =0) given that Joe (Ann) also receives treatment a (Z =1),
(b 2 ) the causal individual total effect of treatment b (X =1) compared to not treatment b
(X =0) on Y given that Joe (Ann) does not receive treatment a (Z = 0), and
(c 2 ) the average of these causal individual total effects, averaging over the two values of
Z (representing treatment a and not treatment a, respectively).
Looking at the effects (a1 ) to (c 1 ), we consider treatment b to be a (qualitative) covariate
and treatment a to be the putative cause variable asking for the causal conditional effects
of treatment a given treatment b and their average, the ‘main effect’ of treatment a. In
contrast, looking at the effects (a2 ) to (c 2 ), we consider treatment a to be a (qualitative)
covariate and treatment b to be the putative cause variable.
In principle, both treatment variables, X and Z , may take the role of a covariate (and
potential confounder), depending on which treatment effects we are studying, the causal
effects of X on Y or the causal effects of Z on Y . In this example, we focus on X as a
putative cause variable of Y, that is, we consider the regular probabilistic causality setup
((Ω, A, P ), (Ft )t ∈T , C, DC , X , Y ). In this setup, the index sets in Definition 4.11 are J = {1, 2} and K = {2}.
In this setup, Z is a potential confounder of X because σ(Z ) ⊂ DC [see Def. 4.11 (iv)]. Fur-
thermore, the bivariate random variable (U , Z ) is a global potential confounder of X be-
cause σ(U , Z ) = σ(π1, π21 ) [see Def. 4.11 (iii)].
The values of τ0 and τ1 , denoted E X=0 (Y |U =u , Z =z) and E X =1 (Y |U =u , Z =z), are dis-
played in Table 5.2. According to RS-Equation (5.26),
E X=0 (Y |U =u , Z =z) = E (Y | X =0,U =u , Z =z)
and
E X =1 (Y |U =u , Z =z) = E (Y | X =1,U =u , Z =z).
The true total effect variable is
CTE (U ,Z ); 10 (U , Z ) = τ1 − τ0 .
It is (U , Z )-measurable. According to Table 5.2, its values are as follows: If Joe does not re-
ceive treatment a (Z = 0), then his causal total effect of treatment b compared to ¬b is
CTE (U ,Z ); 10 ( Joe , 0) = E (Y | X =1,U =Joe , Z = 0) − E (Y | X =0,U =Joe , Z = 0) = 82 − 68 = 14, (5.49)
and it is
CTE (U ,Z ); 10 ( Joe , 1) = 100 − 96 = 4 (5.50)
if he receives treatment a (Z =1). Similarly, Ann’s causal total effect of treatment b compared to ¬b is
CTE (U ,Z ); 10 ( Ann , 0) = 98 − 80 = 18 (5.51)
if she does not receive treatment a (Z = 0), and
CTE (U ,Z ); 10 ( Ann , 1) = 106 − 104 = 2 (5.52)
if she does (Z =1). Hence, all these (U =u , Z =z)-conditional total effects are positive, and
they are the true total effects on Y comparing treatment b to treatment ¬b [see Def. 5.18
(iii)]. However, in this example, these effects are not identical to the individual effects
CTE U ;10 (u), as will be shown later on in this section. The individual total effect CTE U ;10 (u)
is an attribute of the person u, whereas the true total effect CTE (U , Z ); 10 (u, z) is an attribute
of the person u in treatment z.
Using the true total effects [see Eqs. (5.49) to (5.52)], the causal average total effect of treat-
ment b (X =1) compared to ¬b (X =0) can be computed by
E ( CTE (U ,Z ); 10 (U , Z ) ) = Σu Σz CTE (U ,Z ); 10 (u, z) · P (U =u , Z =z)
 = 14 · 1/4 + 4 · 1/4 + 18 · 1/4 + 2 · 1/4 = 9.5.
Because CTE (U ,Z ); 10 (U , Z ) denotes the composition of (U , Z ) and the true total effect function CTE (U ,Z ); 10 [see Def. 5.38 (i)], in these computations we used RS-Equation (3.13), and
P (U =u , Z =z) = P (Z =z |U =u ) · P (U =u ) = 1/2 · 1/2 = 1/4
for all values (u, z) of the global potential confounder (U , Z ). Note that in other examples the probabilities P (U =u , Z =z) may not be identical for all pairs (u, z) of values of U and Z .
Now we compute the (U =u )-conditional prima facie effects, that is, the differences
E (Y | X =1,U =Joe ) − E (Y | X =0,U =Joe )
and
E (Y | X =1,U =Ann ) − E (Y | X =0,U =Ann ).
These differences may also be called the individual or person-specific prima facie effects.
They are not identical to the causal individual total effects, that is, they are not the values
of the causal U -conditional total effect function, which will be computed in the next sub-
section. Hence, in this example, these individual prima facie effects are biased (see ch. 6
for more details).
In order to compute the conditional expectation values of Y given treatment and unit,
we use the equation
E (Y | X =x,U =u) = Σz E (Y | X =x,U =u, Z =z) · P (Z =z | X =x,U =u), (5.53)
which is always true if Z is discrete with P (X =x,U =u, Z =z) > 0 for all values of Z [see
RS-Box 3.2 (ii)]. Both kinds of parameters occurring on the right-hand side of this equa-
tion are displayed in Table 5.2. This includes not only the conditional expectation values E (Y | X =x,U =u, Z =z) = E X =x (Y |U =u, Z =z), but also the conditional probabilities P (Z =z | X =x,U =u) = P X =x (Z =z |U =u), which have been computed via
P (Z =z | X =x ,U =u ) = P (X =x |U =u , Z =z) · P (Z =z |U =u ) / Σz ∗ P (X =x |U =u , Z =z ∗ ) · P (Z =z ∗ |U =u ). (5.54)
For Joe, Equation (5.53) yields the conditional expectation values
E (Y | X = 0,U =Joe ) = 68 · 1/4 + 96 · 3/4 = 89,
E (Y | X =1,U =Joe ) = 82 · 3/4 + 100 · 1/4 = 86.5,
and his individual prima facie effect of X on Y is
E (Y | X =1,U =Joe ) − E (Y | X = 0,U =Joe ) = 86.5 − 89 = −2.5. (5.55)
In this example, the individual prima facie effect of treatment b compared to not treat-
ment b is negative, namely −2.5, although all (U =Joe , Z =z)-conditional effects are posi-
tive, namely 14 for U =Joe and Z = 0 (i. e., given Joe and not treatment a) and 4 for U =Joe
and Z =1 (i. e., given Joe and treatment a).
Similarly, using Equation (5.53), the (X =x ,U =Ann)-conditional expectation values for
Ann are
E (Y | X = 0,U =Ann ) = 80 · 1/4 + 104 · 3/4 = 98,
E (Y | X =1,U =Ann ) = 98 · 3/4 + 106 · 1/4 = 100,
and her individual prima facie effect of X on Y is
E (Y | X =1,U =Ann ) − E (Y | X = 0,U =Ann ) = 100 − 98 = 2.
This prima facie effect does not have a causal interpretation. It is not identical to the causal
(U =Ann)-conditional total effect of X on Y , which is computed in the following subsec-
tion.
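The following Python sketch replicates these prima facie computations. It derives the weights P (Z =z | X =x ,U =u ) from P (X =1|U =u , Z =z) and P (Z =z |U =u ) of Table 5.2 and then applies Equation (5.53); it reproduces the values −2.5 for Joe and 2 for Ann.

```python
# Table 5.2 parameters
e_y = {("Joe", 0): (68, 82), ("Joe", 1): (96, 100),   # E(Y|X=x, U=u, Z=z) as (x=0, x=1)
       ("Ann", 0): (80, 98), ("Ann", 1): (104, 106)}
p_x1 = {("Joe", 0): 0.75, ("Joe", 1): 0.25,           # P(X=1 | U=u, Z=z)
        ("Ann", 0): 0.75, ("Ann", 1): 0.25}
p_z = {0: 0.5, 1: 0.5}                                # P(Z=z | U=u), identical for both persons

def p_x_given_u(x, u):
    # P(X=x | U=u) = sum_z P(X=x | U=u, Z=z) * P(Z=z | U=u)
    return sum((p_x1[(u, z)] if x == 1 else 1 - p_x1[(u, z)]) * p_z[z] for z in (0, 1))

def e_y_given_xu(x, u):
    # Eq. (5.53) with P(Z=z | X=x, U=u) = P(X=x | U=u, Z=z) * P(Z=z | U=u) / P(X=x | U=u)
    total = 0.0
    for z in (0, 1):
        p_zxu = (p_x1[(u, z)] if x == 1 else 1 - p_x1[(u, z)]) * p_z[z] / p_x_given_u(x, u)
        total += e_y[(u, z)][x] * p_zxu
    return total

for u in ("Joe", "Ann"):
    pfe = e_y_given_xu(1, u) - e_y_given_xu(0, u)
    print(u, round(pfe, 2))  # Joe -2.5, Ann 2.0
```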
CTE U ;10 ( Joe ) = E ( CTE (U ,Z ); 10 (U , Z ) |U =Joe )
 = E ( E X =1 (Y |U , Z ) − E X=0 (Y |U , Z ) | U =Joe )
 = Σu Σz ( E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z) ) · P (U =u , Z =z |U =Joe )
 = (82 − 68) · 1/2 + (100 − 96) · 1/2 + (98 − 80) · 0 + (106 − 104) · 0 = 9
[see Eqs. (5.31), (5.48), and RS-Eq. (3.28)]. This is an example of partial re-aggregation (see
Th. 5.55). For Ann, the corresponding causal total individual effect is
CTE U ;10 ( Ann ) = E ( CTE (U ,Z ); 10 (U , Z ) |U =Ann )
 = E ( E X =1 (Y |U , Z ) − E X=0 (Y |U , Z ) | U =Ann )
 = Σu Σz ( E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z) ) · P (U =u , Z =z |U =Ann )
 = (82 − 68) · 0 + (100 − 96) · 0 + (98 − 80) · 1/2 + (106 − 104) · 1/2 = 10.
Hence, in this example, the two causal individual total effects for Joe and Ann are both
positive.
Comparing the causal individual effect CTE U ;10 ( Joe ) = 9 to the corresponding prima fa-
cie effect −2.5 [see Eq. (5.55)] shows that the individual prima facie effect E (Y | X =1,U = Joe )
− E (Y | X = 0,U = Joe ) strongly differs from its causal counterpart, and the same applies to
the individual prima facie effect of Ann. This is evident if we compare her prima facie effect
E (Y | X =1,U =Ann) − E (Y | X = 0,U =Ann) = 2 to her causal individual effect CTE U ;10 (Ann )
= 10.
According to Equation (5.41), the expectation of these causal individual effects,
E ( CTE U ;10 (U ) ) = Σu CTE U ;10 (u) · P (U =u ) = 9 · 1/2 + 10 · 1/2 = 9.5,
is the causal average total effect ATE 10 of treatment b compared to not treatment b.
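The same re-aggregation can be expressed in code. The following sketch computes CTE U ;10 (u) = E (δ10 |U =u ) by weighting the true total effects of Table 5.2 within each person with P (Z =z |U =u ) = 1/2, and then averages over P (U =u ) to recover the causal average total effect.

```python
# True total effects CTE_(U,Z);10(u, z) = E(Y|X=1,U=u,Z=z) - E(Y|X=0,U=u,Z=z) from Table 5.2
delta = {("Joe", 0): 14, ("Joe", 1): 4, ("Ann", 0): 18, ("Ann", 1): 2}
p_z_given_u = {0: 0.5, 1: 0.5}   # P(Z=z | U=u), identical for Joe and Ann
p_u = {"Joe": 0.5, "Ann": 0.5}   # P(U=u)

def cte_u(u):
    # E(delta_10 | U=u): only the (u, z)-cells of this person receive positive weight
    return sum(delta[(u, z)] * p_z_given_u[z] for z in (0, 1))

cte = {u: cte_u(u) for u in p_u}
ate_10 = sum(cte[u] * p_u[u] for u in p_u)
print(cte, ate_10)  # {'Joe': 9.0, 'Ann': 10.0} 9.5
```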
Causal individual total effects are more informative than the causal average total ef-
fect and usually more informative than causal conditional total effects given a value of a
pre-test or a second treatment variable. However, note again that causal individual [i. e.,
(U =u )-conditional] total effects are not necessarily the most fine-grained causal total ef-
fects. In this example, there is a second treatment variable, denoted Z , that contributes
to the variation of the outcome variable Y beyond the individual level. This is exem-
plified by comparing the causal (U =u , Z =z)-conditional total effects to the causal (U =u )-
conditional total effects.
In the example presented in Table 5.2, the causal (Z = 0)-conditional (i. e., given not treat-
ment a) total effect of treatment b (X =1) compared to not treatment b (X =0) can be com-
puted by
CTE Z ;10 (0) = E ( CTE (U ,Z ); 10 (U , Z ) | Z = 0 )
 = E ( E X =1 (Y |U , Z ) − E X=0 (Y |U , Z ) | Z = 0 )
 = Σu Σz ( E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z) ) · P (U =u , Z =z | Z = 0)
 = (82 − 68) · 1/2 + (100 − 96) · 0 + (98 − 80) · 1/2 + (106 − 104) · 0 = 16
[see Eqs. (5.31), (5.48), and RS-Eq. (3.28)]. The corresponding causal (Z =1)-conditional
(i. e., given treatment a) total effect is
CTE Z ;10 (1) = E ( CTE (U ,Z ); 10 (U , Z ) | Z =1 )
 = E ( E X =1 (Y |U , Z ) − E X=0 (Y |U , Z ) | Z =1 )
 = Σu Σz ( E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z) ) · P (U =u , Z =z | Z =1)
 = (82 − 68) · 0 + (100 − 96) · 1/2 + (98 − 80) · 0 + (106 − 104) · 1/2 = 3.
According to Equation (5.41), taking the expectation
E ( CTE Z ; 10 (Z ) ) = Σz E ( CTE Z ; 10 (Z ) | Z =z ) · P (Z =z) = 16 · 1/2 + 3 · 1/2 = 9.5 (5.57)
again yields the average total effect. In this equation, we used the theorem of total probability in order to compute P (Z =z) = Σu P (Z =z |U =u ) · P (U =u ) (see RS-Th. 1.38), which yields P (Z = 0) = P (Z =1) = 1/2.
Consider again the example presented in Table 5.2. Because D X = (U , Z ), the causal
(X =0)-conditional total effect of treatment b (X =1) compared to not treatment b (X =0)
can be computed by
CTE X ;10 (0) = E ( CTE (U ,Z );10 (U , Z ) | X =0 )
 = Σu Σz ( E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z) ) · P (U =u , Z =z | X =0)
 = (82 − 68) · 1/8 + (100 − 96) · 3/8 + (98 − 80) · 1/8 + (106 − 104) · 3/8 = 6.25
[see again Eqs. (5.31), (5.48), and RS-Eq. (3.28)].
In contrast,
CTE X ; 10 (1) = E ( CTE (U ,Z ); 10 (U , Z ) | X =1 )
 = Σu Σz ( E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z) ) · P (U =u , Z =z | X =1)
 = (82 − 68) · 3/8 + (100 − 96) · 1/8 + (98 − 80) · 3/8 + (106 − 104) · 1/8 = 12.75
yields the (X =1)-conditional total effect of treatment b (X =1) compared to not treatment
b (X =0). In these equations, we used
P (U =u , Z =z | X =x ) = P (X =x |U =u , Z =z) · P (U =u , Z =z) / P (X =x ). (5.58)
According to Equation (5.41), taking the expectation
E ( CTE X ; 10 (X ) ) = Σx CTE X ;10 (x) · P (X =x ) = 6.25 · 1/2 + 12.75 · 1/2 = 9.5
yields the causal average total effect. In this equation, we again used the theorem of total probability, that is, P (X =x ) = Σu Σz P (X =x |U =u , Z =z) · P (U =u , Z =z) (see RS-Th. 1.38), which yields P (X =0) = P (X =1) = 1/2.
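These (X =x )-conditional effects can also be verified in code. The following sketch computes P (U =u , Z =z | X =x ) via Equation (5.58) from the parameters of Table 5.2 and re-aggregates the true total effects accordingly; it reproduces CTE X ;10 (0) = 6.25, CTE X ;10 (1) = 12.75, and their expectation 9.5.

```python
# Table 5.2 parameters and true total effects
delta = {("Joe", 0): 14, ("Joe", 1): 4, ("Ann", 0): 18, ("Ann", 1): 2}
p_x1_given_uz = {("Joe", 0): 0.75, ("Joe", 1): 0.25, ("Ann", 0): 0.75, ("Ann", 1): 0.25}
p_uz = {key: 0.25 for key in delta}  # P(U=u, Z=z) = 1/4

def p_x(x):
    # P(X=x) = sum_{u,z} P(X=x | U=u, Z=z) * P(U=u, Z=z)
    return sum((p if x == 1 else 1 - p) * p_uz[key] for key, p in p_x1_given_uz.items())

def cte_x(x):
    # Eq. (5.58): P(U=u, Z=z | X=x) = P(X=x | U=u, Z=z) * P(U=u, Z=z) / P(X=x)
    return sum(delta[key] * ((p if x == 1 else 1 - p) * p_uz[key] / p_x(x))
               for key, p in p_x1_given_uz.items())

print(cte_x(0), cte_x(1))                         # 6.25 12.75
print(cte_x(0) * p_x(0) + cte_x(1) * p_x(1))      # 9.5
```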
The causal (X =x , Z =z)-conditional total effects can be computed by
CTE (X ,Z ); 10 (x, z) = E ( CTE (U ,Z );10 (U , Z ) | X =x , Z =z ) = E (δ10 | X =x , Z =z) (5.59)
 = Σu Σz ∗ ( E X =1 (Y |U =u , Z =z ∗ ) − E X=0 (Y |U =u , Z =z ∗ ) ) · P (U =u , Z =z ∗ | X =x , Z =z)
[see Eqs. (5.31), (5.48), and RS-Eq. (3.28)]. In this equation, the summation is over all
values u of U and all values z ∗ of Z . In the term P (U =u , Z =z ∗ |X =x , Z =z) we con-
dition on a fixed value x of X and a fixed value z of Z . The conditional probabilities
P (U =u , Z =z ∗ |X =x , Z =z) can be computed via
P (U =u , Z =z ∗ | X =x , Z =z) = P (U =u | X =x , Z =z), if z ∗ = z,
P (U =u , Z =z ∗ | X =x , Z =z) = 0, if z ∗ ≠ z, (5.60)
because, if z = z ∗, then
P (U =u , Z =z ∗ | X =x , Z =z) = P (U =u , Z =z ∗, X =x , Z =z) / P (X =x , Z =z)
 = P (U =u , X =x , Z =z) / P (X =x , Z =z) = P (U =u | X =x , Z =z) (5.61)
 = P (X =x |U =u , Z =z) · P (U =u , Z =z) / ( P (X =x | Z =z) · P (Z =z) ),
where
P (X =x | Z =z) = Σu P (X =x |U =u , Z =z) · P (U =u | Z =z), (5.62)
For example,
P (U =u , Z = 0 | X =0, Z = 0) = (1/4 · 1/4) / (1/4 · 1/2) = 1/2,
for u =Joe and for u =Ann . Hence, the equation for E ( CTE (U ,Z ); 10 (U , Z ) | X =x , Z =z ) yields
E ( CTE (U ,Z ); 10 (U , Z ) | X =0, Z = 0 ) = E (δ10 | X =0, Z = 0)
 = (82 − 68) · 1/2 + (100 − 96) · 0 + (98 − 80) · 1/2 + (106 − 104) · 0 = 16.
For X =1 and Z = 0, we obtain
E ( CTE (U ,Z ); 10 (U , Z ) | X =1, Z = 0 ) = E (δ10 | X =1, Z = 0)
 = (82 − 68) · 1/2 + (100 − 96) · 0 + (98 − 80) · 1/2 + (106 − 104) · 0 = 16,
for X =0 and Z =1, we obtain
E ( CTE (U ,Z );10 (U , Z ) | X =0, Z =1 ) = E (δ10 | X =0, Z =1)
 = (82 − 68) · 0 + (100 − 96) · 1/2 + (98 − 80) · 0 + (106 − 104) · 1/2 = 3,
and for X =1 and Z =1,
E ( CTE (U ,Z );10 (U , Z ) | X =1, Z =1 ) = E (δ10 | X =1, Z =1)
 = (82 − 68) · 0 + (100 − 96) · 1/2 + (98 − 80) · 0 + (106 − 104) · 1/2 = 3.
Hence, in this example, the conditional total effects E (δ10 | Z =z) and E (δ10 | X =x , Z =z) are
identical.
According to Equation (5.41), taking the expectation
E ( E ( CTE (U ,Z ); 10 (U , Z ) | X , Z ) ) = E ( E (δ10 | X , Z ) )
 = Σx Σz E (δ10 | X =x , Z =z) · P (X =x , Z =z)
 = 16 · 1/8 + 3 · 3/8 + 16 · 3/8 + 3 · 1/8 = 9.5,
yields again the causal average total effect ATE 10 . In this equation, we used P (X =x , Z =z) =
P (X =x | Z =z) · P (Z =z), where the conditional probabilities P (X =x |Z =z) are obtained via
Equation (5.62).
5.6 Summary and Conclusions
In this chapter, we introduced the concept of a true outcome variable of the value x of a pu-
tative cause variable X , which was then used to define various causal total effects. Assum-
ing P (X =x ) > 0, a true outcome variable τx has been defined such that its values are the
conditional expectation values E (Y | X =x , D X =d ) of the outcome variable Y holding con-
stant X at the value x and D X at a value d, where D X denotes a global potential confounder
of X . These conditional expectation values are uniquely defined if P (X =x , D X =d ) > 0.
Note that this requirement is not necessary in the definition of a true outcome variable
itself. Also note that, in this definition, we only consider total effects.
Based on the concept of a true outcome variable τx = E X =x(Y |D X ), we defined several
kinds of causal total effects of treatment x compared to another treatment x ′ using the
true total effect variable
δxx ′ = τx − τx ′ .
The definitions of a causal average total effect ATE xx ′ and of a causal Z-conditional total
effect variable CTE Z ; xx ′ (Z ) (see Box 5.1) are based on the assumption that the two true
outcome variables τx and τx ′ are P-unique. (In Box 5.1 it is additionally assumed that x ′ ∈ ΩX′ , that P (X =x ′ ) > 0, and that τx and τx ′ denote true outcome variables of Y given the values x and x ′ of X , respectively.) Defining the causal (Z =z)-conditional total effect CTE Z ; x x ′ (z) we only assume that τx and τx ′ are P Z=z-unique. The term ‘total’ is used in order to distinguish these effects from direct and indirect effects, which are not considered in this volume.
While D X is a global potential confounder of X on which we condition in order to con-
trol for all potential confounders of X , the variable Z may be used to re-aggregate the
(D X =d )-conditional total effects in order to consider less fine-grained causal conditional
total effects. Examples of Z are the observational-unit variable U , a pre-treatment variable
Z , and a treatment variable X .
Often we have to content ourselves with the causal average total effect or with causal con-
ditional total effects. Note however, that there might be cases in which half of the units
have positive causal individual total effects and the other half negative ones. The causal av-
erage total effect can then be zero. This is not a paradox but the nature of an average. Also
remember that a causal average total effect is informative for causal inference, whereas
an ordinary true mean difference E (Y | X =1) − E (Y | X =0), the prima facie effect, is not.
These prima facie effects have no causal interpretation at all, unless they are identical to
the causal average total effect. This will be detailed in chapter 6.
Conceptually, the causal average total effect is what is tested in a t -test for two independent
groups, provided that the data are sampled in a perfect randomized experiment. Similarly,
in this case, a test of the main effect of the ‘treatment factor’ in orthogonal analysis of vari-
ance is a simultaneous test of several causal average total effects if there are more than
two treatment conditions. Furthermore, if Z is a qualitative covariate of X , then it is con-
sidered a second ‘factor’ in analysis of variance. In this case, the (Z =z)-conditional total
effects are often called the ‘simple main effects’ (see, e. g., Woodward & Bonett, 1991).
Note that causal average total effects are uniquely defined even if there are inter-indivi-
dual differences in the causal individual total effects, and even if there is interaction be-
tween X and a covariate Z of X in the sense that the effect of X depends on the values of
Z . However, only in the randomized experiment can we be sure that, with the main effects
in analysis of variance, we test the causal average total effects.
Of course, the causal conditional effects given the values of a covariate are usually more
informative than their average, that is, than the causal average total effect; but sometimes
averaging is useful in order to avoid information overload, and sometimes we may be able
to estimate precisely enough only the causal average effect, but not the causal conditional
effects, for example, because of small sample sizes.
Note that our definitions of the various kinds of total effects solely use concepts of prob-
ability theory. No concepts have to be borrowed from philosophy or any other science,
although the basic idea goes back at least to Mill (1843/1865). We do not take a counter-
factual but a pre-facto perspective, which is the perspective taken in every application of
probability theory. Causal total effects are parameters, just in the same way as the proba-
bility of flipping ‘heads’ is a parameter about which we can talk before the coin is flipped
and even if the coin is never flipped. It is even meaningful to talk about the causal condi-
tional total effect of a treatment given control [see Eq. (5.38) and Rem. 5.47].
Note that all concepts introduced in this chapter such as the causal average total effects,
causal conditional total effects, causal individual total effects, and so on, are of a purely
theoretical nature. This does not mean that they are irrelevant for practical research. On
the contrary, they explicate what exactly we are looking for when we ask for the causal total
effects, for example, comparing two values of a treatment variable with respect to an out-
come variable Y . It is these effects that we have to estimate if we want to evaluate (a) if and
how dangerous an infection is, and (b) the overall effects of medical, psychological, social,
or political interventions on a specified criterion. And this includes their (undesired) side
effects.
The examples treated in chapters 4 and 5 exemplify what we mean by ‘purely theoretical
nature’. For example, Table 5.2 does not show data that might be obtained in a data sam-
ple. Instead, it contains the theoretical parameters we would like to estimate from sample
data. Data serve to estimate theoretical parameters, including the various causal effects.
Defining these parameters is necessary if we want to study the conditions under which
these parameters can in fact be estimated.
Not all theoretical parameters have a causal meaning. In terms of the metaphor presented
in the preface, the causal total effects are the size of the invisible man. In contrast, in chap-
ter 1, we only dealt with (a) ordinary conditional expectation values E (Y |X =x ) of an out-
come variable Y given treatment x, (b) conditional expectation values E (Y |X =x ,Z =z) of
the outcome variable given treatment x and value z of another random variable Z , (c)
differences between these (conditional) expectation values, the (conditional) prima facie
effects, and (d) averages over these conditional prima facie effects.
The conditional expectation values E (Y |X =x ) and E (Y |X =x ,Z =z) are easily estimated
under the usual assumptions made for a sample, such as the assumption of independent
and identically distributed observations. However, they are only like the length of the in-
visible man’s shadow; depending on the angle of the sun, they can be seriously biased if
mistaken for the size of the invisible man itself.
Limitations
A limitation of the concept of a true outcome variable and the definitions of causal effects
based thereon is that they are defined only for values x of a putative cause variable X that
have a positive probability. This is not restrictive as long as we confine ourselves to con-
sidering only experiments and quasi-experiments. True outcome variables are restrictive,
however, if we study causal dependencies among continuous random variables and causal
dependencies on latent variables. In these cases, the concept of a true outcome variable
does not apply. In chapter 8 we will treat a class of causality conditions that do also apply
if the putative cause variable X is a continuous random variable. Furthermore, the class
of causality conditions presented in chapter 9 also applies if X is continuous or a latent
variable. In these cases causal effects and causal dependencies have to be defined with-
out true outcome variables. Nevertheless, true outcome variables are important for many
applications.
Another limitation is that we did not consider potential mediators, that is, variables that
are between X and Y . However, focussing only on causal total effects does not imply that
we deny that there are variables mediating these effects. This is one of the virtues of true
outcome theory: Given treatment x and person u, only the expectation of the outcome
variable Y is fixed, not the value of Y itself. In contrast, in Rubin’s potential outcome ap-
proach it is assumed that the value of a potential outcome variable is fixed if we condition
on a treatment x and a person u. Such a determinism is at odds with the idea that potential
mediators might also affect the outcome variable Y . True outcome theory remedies this
deficiency. Nevertheless, defining potential mediators, direct, and indirect causal effects
would necessitate a filtration (Ft )t ∈T with more than three σ-algebras Ft .
5.7 Proofs
∀ z ∈ Z (Ω): τx is P Z=z -unique
E ( CTE Z ; x x ′ (Z ) ) = E ( E (τx − τx ′ |Z ) ) [(5.30)]
Proposition (i).
E ( CTE Z ; x x ′ (Z ) | W ) =_P E ( E (τx − τx ′ |Z ) | W ) [(5.30)]
 =_P E (τx − τx ′ |W ) [RS-Box 4.1 (xiii)]
 =_P CTE W ; xx ′ (W ). [(5.30)]
Proposition (ii).
E ( CTE Z ; x x ′ (Z ) | W ) =_P CTE W ; xx ′ (W ) [(5.45)]
⇔ E ( CTE Z ; x x ′ (Z ) | W ) =_P E (τx − τx ′ |W ) [Def. 5.38 (i)]
⇒ E ( CTE Z ; x x ′ (Z ) | W =w ) = E (τx − τx ′ |W =w ) [RS-(2.68), RS-(4.17)]
⇔ E ( CTE Z ; x x ′ (Z ) | W =w ) = CTE W ; xx ′ (w). [Def. 5.38 (ii)]
5.8 Exercises
⊲ Exercise 5-1 What is the conceptual framework in which we can define a true outcome variable?
⊲ Exercise 5-2 What does it mean that a true outcome variable τx is P-unique?
⊲ Exercise 5-3 What are the values of the true outcome variable τ0 and of the conditional expectation E (Y | X ,U ) for ω4 = (Joe , yes , +) in the example presented in Table 5.1?
⊲ Exercise 5-4 Compute the values of the true outcome variable τ0 = E X=0 (Y |U ) in the example
presented in RS-Table 2.1.
⊲ Exercise 5-5 Suppose that X is a binary treatment variable and Y an outcome variable. Why are
the conditional expectation values E (Y |X =0), E (Y |X =1), and their difference, the prima facie effect
E (Y |X =1) −E (Y |X =0), often useless in the evaluation of the causal total effect?
⊲ Exercise 5-6 Suppose that X is a treatment variable and Y an outcome variable. If the conditional
expectation values E (Y |X =x ) and their differences E (Y |X =x ) −E (Y |X =x ′ ) do not represent the
treatment effects we are interested in, then what are the treatment effects we would like to study?
⊲ Exercise 5-7 What is the difference between the causal average total effect ATE xx ′ and the prima
facie effect PFE xx ′ ?
⊲ Exercise 5-8 What is the causal conditional total effect CTE Z ; x x ′ (z) on Y comparing x to x ′ given
the value z of a random variable Z ?
⊲ Exercise 5-9 What is the causal conditional total effect CTE X ; x x ′ (x ∗ ) on Y comparing x to x ′ given
the value x ∗ of the putative cause variable X ?
⊲ Exercise 5-10 What is the causal conditional total effect CTE (X ,Z ); xx ′ (x ∗, z) on Y comparing x to
x ′ given treatment x ∗ and the value z of a random variable Z ?
⊲ Exercise 5-11 Use RS-Theorem 1.38 to compute the probability P(X =1) for the example dis-
played in Table 5.2.
⊲ Exercise 5-12 Show that Proposition (5.39) follows from independence of X and D X .
⊲ Exercise 5-14 Compute the probability P X =1(U =Ann, Z = 0) in the example of Table 5.2.
⊲ Exercise 5-16 Compute the causal average total effect ATE 10 for the random experiment pre-
sented in Table 5.2.
⊲ Exercise 5-17 Compute the causal conditional total effect CTE Z ; 10 (0) given no group therapy for
the random experiment presented in Table 5.2.
⊲ Exercise 5-18 Let Z represent sex with values m (males) and f (females). Furthermore, suppose
CTE Z ;10 (m) = 11, CTE Z ; 10 ( f ) = 5, P(Z =m) = 1/3, and P(Z = f ) = 2/3. What is the causal average total effect ATE 10 ?
Solutions
⊲ Solution 5-1 First, we assume that there is a probability space (Ω,A,P). (In an empirical applica-
tion, this probability space represents the concrete random experiment considered.) Second, there
are two random variables on (Ω,A,P), say X and Y, where X represents the putative cause variable
and Y the outcome variable. Third, there is a filtration (Ft )t ∈T in A in which X is prior to Y (see Box
3.1). Fourth, we assume that P (X =x ) > 0. Fifth, we assume that D X is a global potential confounder
of X . By definition, τx = E X =x (Y |D X ) is P X =x -unique. In this definition, we do not assume that the
true outcome variable τx = E X =x (Y |D X ) is P-unique (see Exercise 5-2).
and

P(Y=1 | X=0, U=Ann) = P(Y=1, X=0, U=Ann) / P(X=0, U=Ann) = .06 / (.24 + .06) = .2.
⊲ Solution 5-5 A potential confounder W of X may determine the probabilities of being treated
[i.e., P(X=x|W) ≠ P(X=x), x ∈ {0,1}] and the (X=x)-conditional expectation values of the outcome variable Y [i.e., E^{X=x}(Y|W) ≠ E^{X=x}(Y), x ∈ {0,1}]. In this case, there are examples in which
the difference E (Y |X =1) − E (Y |X =0) is not identical to the treatment effects to be studied. Simp-
son’s paradox presented in chapter 1 is such an example. Another example of such a potential con-
founder is W = severity of symptoms. If there is self-selection or if there is systematic selection to
treatment by experts that is also determined by the severity of the symptoms, then W will affect the
treatment probability and the conditional expectation values of the outcome variable (e. g., severity
of symptoms after treatment).
⊲ Solution 5-6 The basic idea is to consider the true total effect variable, that is, the difference
τ_x − τ_{x'} = E^{X=x}(Y|D_X) − E^{X=x'}(Y|D_X), where we condition on a global potential confounder D_X of
X. This means controlling for all potential confounders. If we take the expectation of the difference
τ_x − τ_{x'} (over the distribution of D_X), then this yields the causal average total effect on Y comparing
x to x'. Note that taking this expectation necessitates assuming that τ_x and τ_{x'} are P-unique.
This assumption implies that the expectation E(τ_x − τ_{x'}) is identical for all versions of τ_x and τ_{x'} (see
Remarks 5.27 to 5.29).
⊲ Solution 5-7 The causal average total effect ATE x x ′ comparing treatment x to treatment x ′ has
been defined by Equation (5.21) (see also the solution to Exercise 5-6). It is this causal average total
effect that is of interest in the empirical sciences if our goal is to evaluate the treatment conditions x
and x ′ with respect to the outcome variable Y by a single number. In contrast, the prima facie effect
PFE x x ′ comparing x to x ′ is usually not of interest for the evaluation of such a treatment effect be-
cause it can be biased. Both terms differ from each other because PFE x x ′ = E (Y | X =x ) −E (Y | X =x ′ )
is not necessarily identical to ATE_{xx'} = E(τ_x) − E(τ_{x'}). Note, however, that there are conditions under
which PFE xx ′ = ATE x x ′ . Such conditions, which are called causality conditions, are studied in some
detail in the next chapters.
⊲ Solution 5-8 The causal conditional total effect (on the outcome variable Y) comparing x to x' given the value z of a random variable Z is the (Z=z)-conditional expectation value of the true total effect variable δ_{xx'} = τ_x − τ_{x'}, that is,

CTE_{Z;xx'}(z) = E(δ_{xx'} | Z=z).

It is presumed that P(Z=z) > 0 and that τ_x and τ_{x'} are P^{Z=z}-unique. These assumptions imply that
CTE_{Z;xx'}(z) is a uniquely defined number.
⊲ Solution 5-9 The causal conditional total effect (on Y ) comparing x to x ′ given the value x ∗ of the
putative cause variable X is the (X =x ∗)-conditional expectation value of δxx ′ = τx −τx ′ , that is,
CTE_{X;xx'}(x*) = E(δ_{xx'} | X=x*),

where we presume P(X=x*) > 0 and that τ_x and τ_{x'} are P^{X=x*}-unique. If X represents a treatment
variable, x ∗= x , and x ′ = 0 represents a control group, then CTE X ; x0 (x) is the causal conditional total
effect comparing treatment x to control given treatment x. If x ∗= x ′ and x ′ = 0 represents a control
group, then CTE X ; x0 (0) is the causal conditional total effect comparing treatment x to control given
control. Although this sounds paradoxical, the term CTE X ; x0 (0) is meaningful and well-defined. Like
all other causal effects it refers to a random experiment to be conducted in the future. This means
that these concepts are well-defined even if the experiment is not yet conducted, or will never be
conducted (see sect. 5.4.2 for more details).
⊲ Solution 5-10 The causal conditional total effect CTE (X ,Z ); xx ′ (x ∗, z) (on Y ) comparing x to x ′
given treatment x ∗ and value z of Z is the (X =x ∗, Z=z)-conditional expectation value of the true-
effect variable, that is,
CTE_{(X,Z);xx'}(x*, z) = E(δ_{xx'} | X=x*, Z=z),

where we presume P((X,Z)=(x*, z)) > 0 and that τ_x and τ_{x'} are P^{X=x*,Z=z}-unique. If X represents
a treatment variable, x* = x, the value 0 of X represents a control group, and m (male) the value
of Z = sex, then CTE (X ,Z ); x0 (x,m) is the causal conditional total effect comparing treatment x to
control given treatment x and the person to be sampled is male. If x ∗= 0, then CTE (X ,Z ); x0 (0,m) is
the causal conditional total effect comparing treatment x to control given control and the person to
be sampled is male.
⊲ Solution 5-11 Note that the four pairs (u, z) of values of U and Z are disjoint and all these pairs
of values have positive probabilities. Hence, we can apply the theorem of total probability [see RS-
Eq. (1.38)]:
P(X=1) = Σ_u Σ_z P(X=1 | U=u, Z=z) · P(U=u, Z=z) = (3/4 + 1/4 + 3/4 + 1/4) · 1/4 = 1/2.
Also note that P(X =1) = E (1X =1 ) = E [E (1X =1 |U , Z )] [see RS-Box 4.1 (iv)], and that the proba-
bilities P(X =1|U =u , Z=z) are the values of the conditional expectation E (1X =1 |U , Z ). Then using
RS-Equation (3.13) yields the same formula. This second way makes clear that the unconditional
probability P(X =1) is the expectation of the conditional probability P(X =1|U , Z ) [see again RS-Box
4.1 (iv)].
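The same computation can be reproduced with a few lines of Python (a minimal sketch added for illustration, not part of the original solution; the four conditional treatment probabilities and the uniform cell probabilities are those quoted in Solution 5-11):

    # Theorem of total probability: P(X=1) = sum over (u, z) of P(X=1|U=u, Z=z) * P(U=u, Z=z)
    p_x1_given_uz = [3/4, 1/4, 3/4, 1/4]     # P(X=1|U=u, Z=z) for the four (u, z) cells
    p_uz = [1/4, 1/4, 1/4, 1/4]              # P(U=u, Z=z), uniform over the four cells
    p_x1 = sum(p * q for p, q in zip(p_x1_given_uz, p_uz))
    print(p_x1)                              # 0.5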
⊲ Solution 5-12 Independence of X and D_X implies that also X and E^{X=x}(Y|D_X) are independent,
because E^{X=x}(Y|D_X) is D_X-measurable [see RS-Box 2.1 (iv)]. Hence,

E^{X=x}(τ_x) =_P E(E^{X=x}(Y|D_X))    [RS-Box 4.1 (v)]
             =_P E(τ_x).              [τ_x = E^{X=x}(Y|D_X), RS-Box 3.1 (v)]
⊲ Solution 5-13 No solution provided. See what happens with the various techniques of re-aggregating conditional effects, for example, re-aggregating the log odds ratios, and compare it to re-aggregating conditional effects according to Equation (5.44). Play with other parameter constellations.
[see RS-Eq. (5.1)]. For U =Ann, Z = 0, and X =1, this equation yields
P^{X=x}(Z=z | U=u) = P^{X=x}(Z=z, U=u) / P^{X=x}(U=u)
                   = [P(Z=z, U=u, X=x)/P(X=x)] / [P(U=u, X=x)/P(X=x)]
                   = P(Z=z | X=x, U=u).
For both units u, the complementary conditional probability to P(X =1|U =u , Z = 0) = 3/4 (see
Table 5.2) is P(X =0 |U =u , Z = 0) = 1/4. Similarly the complementary conditional probability to
P (X =1|U =u , Z =1) = 1/4 (see again Table 5.2) is P(X =0 |U =u , Z =1) = 3/4. Now we can use the
equation
P(Z=z | X=x, U=u) = [P(X=x | U=u, Z=z) · P(Z=z | U=u)] / [Σ_z P(X=x | U=u, Z=z) · P(Z=z | U=u)],
which follows from Bayes’ Theorem (see RS-Th. 1.39) using P(Z=z | X =x ,U =u ) = P U =u (Z=z | X =x )
[see RS-Eq. (5.26)]. For example, the probability of not receiving group therapy (Z = 0), if Ann is
drawn (U =Ann) and does not receive individual therapy (X =0), is
1/4 · 1/2 + 3/4 · 1/2 = 1/2.

Inserting this result into Equation (5.63) yields the conditional probability

P(Z=0 | X=0, U=Ann) = (1/4 · 1/2) / (1/2) = 1/4.
Using the same procedure for all values of U , X , and Z leads to the other conditional probabilities
displayed in the last two columns of Table 5.2.
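For readers who want to check these numbers, here is a minimal Python sketch (added for illustration; it assumes, as in the computation above, that P(Z=0|U=Ann) = P(Z=1|U=Ann) = 1/2):

    # Bayes' theorem as in the displayed equation:
    # P(Z=z|X=x, U=u) = P(X=x|U=u, Z=z) * P(Z=z|U=u) / sum over z of the same products
    p_x0_given_z = {0: 1/4, 1: 3/4}   # P(X=0|U=Ann, Z=z), complements of 3/4 and 1/4
    p_z_given_u = {0: 1/2, 1: 1/2}    # P(Z=z|U=Ann), assumed as in the text above
    denom = sum(p_x0_given_z[z] * p_z_given_u[z] for z in (0, 1))   # 1/2
    p_z0 = p_x0_given_z[0] * p_z_given_u[0] / denom                 # 1/4 = P(Z=0|X=0, U=Ann)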
⊲ Solution 5-16 In this example, (U , Z ) is a global potential confounder of X . Hence, according to
RS-Equation (3.14), Equations (5.20) and (5.21), using the parameters shown in Table 5.2 results in
ATE_10 = Σ_u Σ_z (E(Y | X=1, U=u, Z=z) − E(Y | X=0, U=u, Z=z)) · P(U=u, Z=z)
       = ((82 − 68) + (100 − 96) + (98 − 80) + (106 − 104)) · 1/4 = 9.5.
⊲ Solution 5-17 In this example, (U , Z ) is a global potential confounder of X . Hence, according to
RS-Equation (3.14) and Equation (5.31), E(δ_10 | Z=0) = E^{Z=0}(δ_10), using the parameters displayed
in Table 5.2 results in
CTE_{Z;10}(0) = Σ_u Σ_z (E(Y | X=1, U=u, Z=z) − E(Y | X=0, U=u, Z=z)) · P(U=u, Z=z | Z=0)
             = Σ_u Σ_z (E(Y | X=1, U=u, Z=z) − E(Y | X=0, U=u, Z=z)) · P^{Z=0}(U=u, Z=z)
             = (82 − 68) · 1/2 + (100 − 96) · 0 + (98 − 80) · 1/2 + (106 − 104) · 0 = 16.
⊲ Solution 5-18 Using Equation (5.41), we can compute the causal average total effect as follows:
ATE_10 = CTE_{Z;10}(m) · 1/3 + CTE_{Z;10}(f) · 2/3 = 11 · 1/3 + 5 · 2/3 = 7.
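The last three solutions can also be checked numerically. The following Python sketch is added for illustration (the cell parameters are those quoted in Solutions 5-16 and 5-17, and the conditional effects and weights in the last line are those given in Exercise 5-18):

    # ATE_10 and CTE_Z;10(0) for the random experiment of Table 5.2
    # Each tuple: (E(Y|X=1,U=u,Z=z), E(Y|X=0,U=u,Z=z), z, P(U=u,Z=z))
    cells = [(82, 68, 0, 1/4), (100, 96, 1, 1/4), (98, 80, 0, 1/4), (106, 104, 1, 1/4)]
    ate_10 = sum((e1 - e0) * p for e1, e0, z, p in cells)                      # 9.5
    p_z0 = sum(p for e1, e0, z, p in cells if z == 0)                          # 0.5
    cte_z0 = sum((e1 - e0) * p for e1, e0, z, p in cells if z == 0) / p_z0     # 16.0
    # Re-aggregating conditional effects as in Solution 5-18:
    ate = 11 * (1/3) + 5 * (2/3)                                               # 7.0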
Part III
Causality Conditions
Chapter 6
Unbiasedness and Identification of Causal Effects
Tackling this problem, in this chapter we introduce and study unbiasedness of various
conditional expectation values, conditional expectations, prima facie effects, and prima
facie effect functions. In particular, we study how and under which conditions these pa-
rameters and random variables can be used to identify the corresponding causal effects
and effect functions. Hence, in this chapter we provide the link between causal effects and
causal effect functions on one side and parameters and functions that can empirically be
estimated on the other side. The unbiasedness conditions are the first and logically weak-
est kind of causality conditions, which, together with the structural components listed in
a regular probabilistic causality setup, distinguish causal stochastic dependencies from
ordinary stochastic dependencies that have no causal meaning.
Requirements
Reading this chapter we assume again that the reader is familiar with the contents of the
first five chapters of Steyer (2024), referred to as RS-chapters 1 to 5. Furthermore, we as-
sume familiarity with chapters 4 and 5 of the present book.
In this chapter we will often refer to the following notation and assumptions.
Under the Assumptions 6.1 (a) to (d) and (g), and assuming that τ_x = E^{X=x}(Y|D_X) and
τ_{x'} = E^{X=x'}(Y|D_X) are P-unique, we defined the average total effect by ATE_{xx'} = E(δ_{xx'})
(see sect. 5.3). Inserting the definition of a true total effect variable δ_{xx'} = τ_x − τ_{x'} and the
definition of a true outcome variable yields

ATE_{xx'} = E(τ_x − τ_{x'}) = E(τ_x) − E(τ_{x'}) = E(E^{X=x}(Y|D_X)) − E(E^{X=x'}(Y|D_X)).
Under the assumptions mentioned above, including P -uniqueness of τx and τx ′ , all terms
in these equations are uniquely defined numbers [see RS-Th. 5.27 (v)].
Remark 6.4 [Unbiased With Respect to DC ] Note that Definition 6.3 refers to total effects
and the corresponding true outcome variables. Considering these true outcome variables τ_x = E^{X=x}(Y|D_X), we
condition on a global potential confounder D X of X , and with it on its generated σ-algebra
σ(D X ) = DC (see RS-Def. 5.4). This is also the reason why DC occurs in the shortcuts for
unbiasedness. ⊳
Example 6.5 [No Treatment for Joe] In the example displayed in RS-Table 2.1, we assume
that the regular causality space is the same as specified in Example 4.10, and again, X , Y ,
and U take the roles of the putative cause variable, the outcome variable, and the global
potential confounder D X . Furthermore, the co-domain ΩX′ of X is any subset of R (includ-
ing R itself) containing the elements 0 and 1. The values of the two true outcome variables
τ_x = P^{X=x}(Y=1|U) = E^{X=x}(Y|U), x = 0, 1,

are displayed in the table. Whereas τ_0 is the only element of the set E^{X=0}(Y|U) (which
means that τ_0 is uniquely defined and therefore P-unique), τ_1 is not the only element
in the set E^{X=1}(Y|U), because the random variable τ_1^* = P^{X=1}(Y=1|U)^* displayed in the
last column of RS-Table 2.1 is also an element of E^{X=1}(Y|U). Furthermore, τ_1 =_P τ_1^* does
not hold. Instead, τ_1(ω) ≠ τ_1^*(ω) for ω ∈ {ω_1, ..., ω_4} and P({ω_1, ..., ω_4}) = .5. Hence, in this
example, the conditional expectation value E(Y|X=0) is unbiased, whereas the conditions
of the definition of unbiasedness hold neither for E(Y|X=1) nor for the conditional
expectation E(Y|X), because τ_1 is not P-unique (see Def. 6.3). In fact, in this example,
E(τ_1) = E(P^{X=1}(Y=1|U)) ≠ E(τ_1^*) = E(P^{X=1}(Y=1|U)^*), which is easily seen looking at
the last two columns of RS-Table 2.1 and the probabilities P({ω_i}) displayed in the first
numerical column. ⊳
E (Y |X =x ) = E X =x (Y ) (6.3)
[see RS-Eq. (3.24)]. Hence, if P (X =x ) > 0, then the conditional expectation value E (Y |X =x )
is unbiased if and only if the expectation E X =x (Y ) of Y with respect to the conditional
probability measure P X =x [see again RS-Eq. (5.1)] is unbiased, that is, if and only if τx is
P-unique and
E^{X=x}(Y) = E(τ_x) = E(E^{X=x}(Y|D_X)). (6.4)
⊳
Remark 6.8 [Identification of E (τx )] Unbiasedness of E (Y |X =x ) is important because it
gives us access to the expectation E (τx ) of the true outcome variable τx . If E (Y |X =x ) is
unbiased, then, according to Definition 6.3 (i), an estimate of E (Y |X =x ) is also an estimate
of E (τx ). In contrast to E (τx ), the conditional expectation value E (Y |X =x ) can often be
estimated from a data sample of the random variables X and Y. The sample mean of the
values of Y that are observed together with the value x of the putative cause variable X is
such an estimate of E (Y |X =x ) (see RS-Exercise 3-4). If X is a treatment variable, then this
is the sample mean of the observed values of Y in treatment x. ⊳
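To make this estimation step concrete, here is a small Python sketch (the data pairs are hypothetical and only illustrate the idea; they are not taken from the book):

    # Estimate E(Y|X=x) by the sample mean of the Y values observed together with X = x
    data = [(1, 23.0), (0, 17.5), (1, 25.0), (0, 16.0), (1, 21.0)]   # hypothetical (x, y) pairs
    def conditional_mean(pairs, x):
        ys = [y for xi, y in pairs if xi == x]
        return sum(ys) / len(ys)
    estimate_treated = conditional_mean(data, 1)   # estimate of E(Y|X=1)
    estimate_control = conditional_mean(data, 0)   # estimate of E(Y|X=0)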
In the following theorem, we present three conditions that are equivalent to unbiasedness
of E (Y |X =x ). Each of these conditions involves a true outcome variable τx . Reading this
theorem, remember
τ_x is mean independent from 1_{X=x} ⇔ E(τ_x | 1_{X=x}) =_P E(τ_x) (6.5)

ε_x := τ_x − E(τ_x | 1_{X=x}), (6.8)
Remark 6.10 [Sufficient Conditions of Unbiasedness] Later it will be shown that E^{X=x}(τ_x)
= E(τ_x), as well as mean independence of τ_x from 1_{X=x}, follow from independence of τ_x and 1_{X=x} (see Th. 7.7), which
itself follows from D_X ⊥⊥ 1_{X=x}, that is, from independence of a global potential confounder
D_X of X and the indicator 1_{X=x} [see Th. 8.22 (i) for X = 1_{X=x}]. Note that, in an experi-
ment in which X is the treatment variable and the person variable U takes the role of a
global potential confounder of X, the independence condition D_X ⊥⊥ 1_{X=x} can be created
by randomized assignment of the observational unit to treatment x (see the examples in
RS-Table 1.2 and in Table 6.3). ⊳
Remark 6.11 [Dichotomous X] If X is dichotomous and x is one of the two values, then

E(τ_x | X) =_P E(τ_x | 1_{X=x}) =_P E(τ_x | 1_{X≠x}), (6.11)

because, if X is dichotomous, then the σ-algebras generated by X, 1_{X=x}, and 1_{X≠x} are identical. Remember, the σ-algebras σ(X), σ(1_{X=x}), and σ(1_{X≠x}) play a crucial role in the definition of the conditional expectations E(τ_x | X), E(τ_x | 1_{X=x}), and E(τ_x | 1_{X≠x}) (see RS-Def. 4.4). Hence, if X is dichotomous and the conditional expectations E^{X=x}(Y|D_X) and E^{X≠x}(Y|D_X) are P-unique, then

τ_x is mean independent from X ⇔ τ_x is mean independent from 1_{X=x} ⇔ E(Y|X) ⊢ D_X, (6.12)

E(Y|X) ⊢ D_X ⇔ E(Y|X=x) ⊢ D_X ∧ E(Y|X≠x) ⊢ D_X. (6.13)
⊳
Remark 6.12 [Unbiasedness of E (Y |X ) in Quasi-Experiments] In empirical applications
in which there is no randomized assignment of the observational unit to one of the treat-
ment conditions, unbiasedness of E (Y |X ) or E (Y |X =x ) is not very likely. However, if we
additionally consider a (uni- or multivariate) covariate Z of X and the conditional expecta-
tion values E (Y |X =x , Z =z), then unbiasedness of these parameters and of the conditional
expectation E (Y |X, Z ) is much more realistic, even beyond experiments with randomized
assignment of the unit to a treatment condition. This motivates the following section. ⊳
Remember that a true outcome variable τx denotes a version of the D X -conditional expec-
tation of Y with respect to the probability measure P X =x . That is, τx denotes an element of
the set E X =x (Y |D X ) (see Def. 5.4 and Rem. 5.5). Furthermore, remember
and that P(X=x, Z=z) > 0 implies not only P^{X=x}(Z=z) > 0 and P(Z=z) > 0, but also
[see RS-Eq. (5.26)]. All terms appearing in Proposition (6.14) and Equation (6.15) have been
introduced in RS-section 5.1.
Furthermore, remember, under the Assumptions 6.1 (a) to (f),
and that this property follows from P-uniqueness of τ_x [see RS-Box 5.1 (v)]. Also note that
P^{Z=z}-uniqueness of τ_x is equivalent to P(X=x|D_X) >_{P^{Z=z}} 0 (see RS-Th. 5.27), which is defined by
P^{Z=z}(P(X=x|D_X) > 0) = 1.
Definition 6.13
(i) E^{X=x}(Y|Z) is called unbiased, denoted E^{X=x}(Y|Z) ⊢ D_X, if
(a) τ_x is P-unique
(b) E^{X=x}(Y|Z) =_P E(τ_x|Z).
(ii) If we additionally assume P(Z=z) > 0, then E^{X=x}(Y|Z=z) is called unbiased, denoted E^{X=x}(Y|Z=z) ⊢ D_X, if
(a) τ_x is P^{Z=z}-unique
(b) E^{X=x}(Y|Z=z) = E(τ_x|Z=z).
Under the assumptions of Definition 6.13 (ii), the (Z =z)-conditional expectation value
E (τx |Z =z) is uniquely defined. Furthermore, if τx is P Z=z-unique for all z ∈ Z (Ω), then,
according to Lemma 5.37, it is also P-unique and
∀ z ∈ Z (Ω): E (τx |Z )(ω) = E (τx |Z =z), if ω ∈ {Z =z } . (6.18)
That is, if τx is P Z=z-unique for all z ∈ Z (Ω), then the (Z =z)-conditional expectation values
E (τx |Z =z) are the uniquely defined values of the conditional expectation E (τx |Z ).
Remark 6.14 [Unbiasedness of E Z=z(Y |X =x ) and E (Y |X =x , Z =z)] Let the assumptions
of Definition 6.13 (ii) hold. Then Proposition (6.15) allows us to define
and
Remark 6.15 [Identification of E(τ_x|Z=z) and E(τ_x|Z)] According to Definition 6.13 (i), if
E X =x (Y |Z ) is unbiased, then we can identify the Z -conditional expectation E (τx |Z ) by
E X =x (Y |Z ). Hence, an estimate of E X =x (Y |Z ) is also an estimate of E (τx |Z ) provided that
E X =x (Y |Z ) is unbiased. Analogously, if E X =x (Y |Z =z) is unbiased, then, according to Defi-
nition 6.13 (i), we can identify the (Z =z)-conditional expectation E (τx |Z =z) of a true out-
come variable τx by the (Z =z)-conditional expectation E X =x(Y |Z =z) of Y with respect to
the conditional probability measure P X =x . Hence, an estimate of E X =x(Y |Z =z) is also an
estimate of E (τx |Z =z), provided that E X =x(Y |Z =z) is unbiased. In contrast to E (τx |Z ), the
conditional expectation E X =x (Y |Z ) can often be estimated from a data sample of the ran-
dom variables X , Y, and Z using the values of Y and Z observed together with the value x
of the putative cause variable X (see Exercise 6-10). ⊳
E X =x (Y |Z ) ⊢ D X ⇒ E (Y |X =x , Z =z) ⊢ D X . (6.21)
(Proof p. 188)
∀ x ∈ X (Ω): E X =x (Y |Z ) ⊢ D X . (6.23)
(ii) If we additionally assume 6.1 (f) and P^{Z=z}(X=x) > 0, for all x ∈ X(Ω), then E^{Z=z}(Y|X) is called unbiased, denoted E^{Z=z}(Y|X) ⊢ D_X, if
[see Props. (6.19) and (6.20)]. Hence, we can replace Proposition (6.24) by each of the
propositions on the right-hand sides of (6.25) and (6.26).
In the next theorem we treat a condition that is equivalent to unbiasedness of a condi-
tional expectation E Z=z (Y |X ).
Now we turn to some conditions that are equivalent to unbiasedness of a conditional ex-
pectation E X =x (Y |Z ). These conditions are also used in the proofs of sufficient conditions
of unbiasedness (see chs. 8 to 10). Note again, in this chapter we assume that Z is a covari-
ate of X . That is, in contrast to chapter 5 where we defined a causal Z -conditional total
effect variable, now we exclude that a putative cause variable X can take the role of Z .
(i) Then

E^{X=x}(Y|Z) ⊢ D_X ⇔ E^{X=x}(τ_x|Z) =_P E(τ_x|Z) (6.28)
                   ⇔ E(τ_x | 1_{X=x}, Z) =_P E(τ_x|Z). (6.29)

(ii) If we define

ε_x := τ_x − E(τ_x | 1_{X=x}, Z), (6.30)

then

E^{X=x}(Y|Z) ⊢ D_X ⇔ E^{X=x}(ε_x|Z) =_P E(ε_x|Z) (6.31)
                   ⇔ E(ε_x | 1_{X=x}, Z) =_P E(ε_x|Z). (6.32)
(Proof p. 189)
PFE x x ′ := E (Y | X =x ) − E (Y | X =x ′ ). (6.39)
where PFE Z ; x x ′ (Z ) denotes the composition of Z and PFE Z ; x x ′ . While PFE Z ; x x ′ assigns to
each value z ∈ ΩZ′ a (Z =z)-conditional prima facie effect of x compared to x ′, the composi-
tion PFE Z ; x x ′ (Z ) is a random variable on (Ω, A, P ) assigning values to each ω ∈ Ω. We call
the composition PFE_{Z;xx'}(Z) a Z-conditional prima facie effect variable. Finally, presuming P(X=x, Z=z), P(X=x', Z=z) > 0, we define the (Z=z)-conditional prima facie effect

PFE_{Z;xx'}(z) := E^{X=x}(Y|Z=z) − E^{X=x'}(Y|Z=z) = E(Y|X=x, Z=z) − E(Y|X=x', Z=z).
Note again that all three concepts of unbiasedness refer to total effects and that Z de-
notes a covariate of X .
Now we show that unbiasedness of the conditional expectation values E (Y |X =x ) and
E (Y |X =x ′ ) implies unbiasedness of the prima facie effect PFE x x ′ .
Hence, under the Assumptions 6.1 (a) to (d) and (g), unbiasedness of E (Y |X =x ) and
E (Y |X =x ′ ) implies that the prima facie effect PFE x x ′ is unbiased.
Next, we show that unbiasedness of the Z-conditional expectations E^{X=x}(Y|Z) and E^{X=x'}(Y|Z) implies unbiasedness of the Z-conditional prima facie effect function PFE_{Z;xx'}.

Hence, under the Assumptions 6.1 (a) to (e) and (g), unbiasedness of E^{X=x}(Y|Z) and E^{X=x'}(Y|Z) implies that the Z-conditional prima facie effect function PFE_{Z;xx'} and the composition PFE_{Z;xx'}(Z) of PFE_{Z;xx'} and Z, the Z-conditional prima facie effect variable, are unbiased [see Def. 6.23 (ii)].

In the following theorem, we explicate the relationship between unbiasedness of the (Z=z)-conditional expectation values E^{X=x}(Y|Z=z), E^{X=x'}(Y|Z=z), and unbiasedness of the (Z=z)-conditional prima facie effect PFE_{Z;xx'}(z).

Hence, under the Assumptions 6.1 (a) to (f) and (g), unbiasedness of the conditional expectation values E^{X=x}(Y|Z=z) and E^{X=x'}(Y|Z=z) implies that the (Z=z)-conditional prima facie effect PFE_{Z;xx'}(z) is unbiased.
Box 6.1 summarizes the definitions of unbiasedness of various conditional expecta-
tions, their values, prima facie effect functions, and prima facie effects.
Unbiasedness of various conditional expectations, their values, prima facie effect functions,
and prima facie effects is symbolized and defined as follows:
cannot even be estimated unless very restrictive assumptions are introduced. This has
been discussed in some detail by Holland (1986) and has been called the “fundamental
problem of causal inference” (see also the preface). However, in chapters 7 to 10 we intro-
duce other causality conditions that can be tested empirically and that imply unbiased-
ness. It is those causality conditions that can be used for covariate selection in empirical
causal research. ⊳
In chapter 5 we introduced causal average total effects and causal conditional total effect
functions, which, in the first place, are of a purely theoretical nature. They just define what
6.4 Identification of Causal Total Effects 167
we are interested in, for example, in studies evaluating the causal effects of a treatment,
an intervention, or an exposition. Now we study how causal total effects can be identified,
that is, how they can be computed from parameters that can be estimated in samples, and
how the causal conditional total effect functions can be identified by functions that can be
estimated in samples.
presuming that τx and τx ′ are P-unique. Taking its expectation, the true total effect variable
δxx ′ = τx − τx ′ is coarsened to a single number. With such a coarsening, we often lose infor-
mation. However, the resulting causal average total effect is still unbiased. In this context,
instead of coarsening, we also use the term aggregation or re-aggregation. To emphasize,
re-aggregation does not mean to ignore the potential confounders of a putative cause vari-
able X . By definition, a potential confounder of X is a random variable on the probability
space considered that is measurable with respect to the global potential confounder D X
[see Def. 4.11 (iii)]. Re-aggregation only means coarsening and losing information about
more fine-grained conditional effects. It does not reintroduce bias. Instead, re-aggregation
maintains causal interpretability.
Remark 6.30 [Basic Idea of the True-Outcome Theory of Causal Effects] In the construc-
tion of the theory of causal effects, we first condition on a global potential confounder
D X in order to control for all potential confounders. Doing this, we obtain the most fine-
grained causal total effect variable CTE D X ; xx ′ (D X ) = τx − τx ′ [see Def. 5.18 (i)]. Then we re-
aggregate it and obtain a coarsened causal effect function or effect parameter that can be
computed from an empirically estimable function or parameter. ⊳
Corollary 6.31 [Identifying the Causal Average Total Effect via PFE x x ′ ]
Let the Assumptions 6.1 (a) to (d) and (g) hold, and assume that PFE x x ′ is unbiased.
Then
ATE x x ′ = PFE x x ′ . (6.46)
Example 6.33 [Joe and Ann With Self-Selection] Table 6.1 displays the crucial parame-
ters of a random experiment, in which the effect of treatment 1 compared to treatment
0 is reversed if the person variable U is ignored. This table has already been shown in a
similar form in Table 1.2. However, now it is written in the terms introduced in chapters
5 and 6. In this example, the person variable U is a global potential confounder of X . The
causal average total effect is
ATE_10 = E(τ_1) − E(τ_0)
       = Σ_u E^{X=1}(Y|U=u) · P(U=u) − Σ_u E^{X=0}(Y|U=u) · P(U=u)
       = (.8 · 1/2 + .4 · 1/2) − (.7 · 1/2 + .2 · 1/2) = .6 − .45 = .15
[see Box 6.2 (ii)]. In contrast, the corresponding prima facie effect is

PFE_10 = E(Y|X=1) − E(Y|X=0) = .42 − .6 = −.18
[see Box 6.2 (i)]. Considering Box 6.2 and comparing Equations (i) and (ii) to each other
reveals why such a reversal of effects (.15 vs. −.18) can occur: Computing the conditional
expectation value E (Y |X =x ) we weigh the conditional expectation values E X =x (Y |U =u )
by the conditional probabilities P (U =u |X =x ) [see Eq. (i) in Box 6.2], whereas computing
the expectation E (τx ) of the true outcome variable we weigh them by the probabilities
P (U =u ) [see Eq. (ii) in that box]. [For a proof of Equations (i) to (iv) of Box 6.2 see Exercise
6-11]. ⊳
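The two weighting schemes of Box 6.2 can also be replicated with a few lines of Python (a sketch added for illustration; the parameters are those displayed in Table 6.1):

    # Table 6.1 (Joe and Ann with self-selection)
    persons = {"Joe": {"p_u": 1/2, "p_x1": 0.04, "e0": 0.7, "e1": 0.8},
               "Ann": {"p_u": 1/2, "p_x1": 0.76, "e0": 0.2, "e1": 0.4}}
    p_x1 = sum(v["p_u"] * v["p_x1"] for v in persons.values())        # P(X=1) = .40
    p_x0 = 1 - p_x1
    # E(tau_x): weights P(U=u)            [Box 6.2 (ii)]
    e_tau1 = sum(v["e1"] * v["p_u"] for v in persons.values())                           # .60
    e_tau0 = sum(v["e0"] * v["p_u"] for v in persons.values())                           # .45
    # E(Y|X=x): weights P(U=u|X=x)        [Box 6.2 (i)]
    e_y_x1 = sum(v["e1"] * v["p_u"] * v["p_x1"] / p_x1 for v in persons.values())        # .42
    e_y_x0 = sum(v["e0"] * v["p_u"] * (1 - v["p_x1"]) / p_x0 for v in persons.values())  # .60
    ate_10 = e_tau1 - e_tau0   # 0.15
    pfe_10 = e_y_x1 - e_y_x0   # -0.18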
According to the following theorem we can also identify the causal average total effect ATE_{xx'} if the prima facie effect PFE_{xx'} is not unbiased. It suffices to assume that the Z-conditional prima facie effect variable PFE_{Z;xx'}(Z) =_P E^{X=x}(Y|Z) − E^{X=x'}(Y|Z) is unbiased. Reading this theorem, note that Z may be a multivariate covariate of X such that
Z = (Z 1 , . . . , Z m ) consists of m univariate covariates Z i , i = 1, . . . , m.
Theorem 6.34 [Identifying the Causal Average Total Effect via PFE Z ; x x ′ ]
Let the Assumptions 6.1 (a) to (e) and (g) hold, and assume that PFE Z ; x x ′ is unbiased.
Then
ATE_{xx'} = E(PFE_{Z;xx'}(Z)). (6.47)
(Proof p. 191)
In Equation (6.47) we re-aggregate the prima facie effect function PFE_{Z;xx'} to obtain a single number. If PFE_{Z;xx'} is unbiased, then this does not mean to ignore the potential confounders of X.
Table 6.1

Person u   P(U=u)   P(X=1|U=u)   E^{X=0}(Y|U=u)   E^{X=1}(Y|U=u)   CTE_{U;10}(u)   P(U=u|X=0)   P(U=u|X=1)
Joe        1/2      .04          .7               .8               .1              4/5          1/20
Ann        1/2      .76          .2               .4               .2              1/5          19/20

                       x=0    x=1
E(τ_x):                .45    .6      ATE_10 = .15
E(Y|X=x):              .6     .42     PFE_10 = −.18
Remark 6.35 [Foundation for the Analysis of Causal Total Effects] Theorem 6.34 is the
theoretical foundation for the analysis of causal average total effects beyond the simple
randomized experiment. The crucial assumption is unbiasedness of PFE Z ; x x ′ , and this as-
sumption may also hold in observational studies. Of course, finding a (possibly multivari-
ate) covariate Z of X for which PFE_{Z;xx'} is unbiased is often a challenge for empirical
research. Also note that if PFE Z ; x x ′ is unbiased, then it is identical to the more fine-grained
causal total effect variable CTE Z ; x x ′ (Z ) (see Cor. 6.37), which is much more informative
than the causal average total effect ATE x x ′ . In the chapters to come we will learn more
about sufficient conditions for unbiasedness of PFE Z ; x x ′ . ⊳
The first term on the right-hand side of this equation is called the Z-adjusted (X =x )-condi-
tional expectation value of Y , and the second term the Z-adjusted (X =x ′ )-conditional ex-
pectation value of Y . Again, we presume that Z is a covariate of X . ⊳
In Definition 5.38 we introduced the causal conditional total effect function CTE_{Z;xx'} and the composition CTE_{Z;xx'}(Z) by

CTE_{Z;xx'}(Z) =_P E(τ_x − τ_{x'} | Z), (6.49)

assuming that τ_x and τ_{x'} are P-unique and that Z is a random variable on (Ω, A, P). In Definition 6.23 (ii) we defined unbiasedness of the prima facie effect function PFE_{Z;xx'} by
Box 6.2

Consider the examples presented in Tables 6.1 to 6.4, let x ∈ X(Ω) denote a value of the treatment variable X, and let U denote the observational-unit variable. In these examples, U = D_X, P(X=x, U=u) > 0 for all values u ∈ U(Ω), and τ_x = E^{X=x}(Y|U) ∈ E^{X=x}(Y|U). Then:

E(Y|X=x) = E(τ_x|X=x) = Σ_u E(Y|X=x, U=u) · P(U=u|X=x), (i)

whereas

E(τ_x) = Σ_u E(Y|X=x, U=u) · P(U=u). (ii)

Additionally, let Z be a covariate of X and let z ∈ Z(Ω). For the examples in Tables 6.1 to 6.4, in which U is finite, this implies P(X=x, Z=z) > 0 and

E(Y|X=x, Z=z) = E(τ_x|X=x, Z=z) = Σ_u E(Y|X=x, U=u) · P(U=u|X=x, Z=z), (iii)

whereas

E(τ_x|Z=z) = Σ_u E(Y|X=x, U=u) · P(U=u|Z=z). (iv)
PFE_{Z;xx'}(Z) =_P E(τ_x − τ_{x'} | Z), (6.50)
Note that this assumption implies σ(V ) ⊂ σ(1X =x , Z ) and σ(V ) ⊂ σ(1X =x ′ , Z ). It holds, for
example, if one of the following conditions applies:
This assumption holds, for example, if one of the following conditions applies:
(a) σ(V ) ⊂ σ(Z )
(b) σ(V ) ⊂ σ(X ).
More general conditions implying Proposition (6.55) are found in RS-Lemma 2.35 and RS-
Corollary 2.36.
If σ(X) ⊄ σ(1_{X=x}, 1_{X=x'}), then Theorem 6.41 extends the results of Theorem 6.39 to a larger set of random variables taking the role of V. However, the price is to assume P-uniqueness of τ_x and τ_{x'} as well as Z-conditional mean-independence of τ_x and τ_{x'} from X, that is,

τ_x is mean independent from X given Z and τ_{x'} is mean independent from X given Z. (6.56)
E(τ_x | X, Z) =_P E(τ_x | Z) and E(τ_{x'} | X, Z) =_P E(τ_{x'} | Z), (6.57)
respectively. Some sufficient conditions for the equations in Proposition (6.57) will be
treated in chapter 7 (see, e. g., Table 7.2), which is devoted to the Rosenbaum-Rubin
causality conditions.
Under P-uniqueness of τ_x and τ_{x'}, assuming (6.57) implies unbiasedness of E^{X=x}(Y|Z) and E^{X=x'}(Y|Z), that is, it implies

E^{X=x}(Y|Z) ⊢ D_X and E^{X=x'}(Y|Z) ⊢ D_X (6.58)

(see Exercise 6-12). Finally, note that unbiasedness of E^{X=x}(Y|Z) and E^{X=x'}(Y|Z) is equivalent to Z-conditional mean independence of τ_x from 1_{X=x} and of τ_{x'} from 1_{X=x'}, provided that τ_x and τ_{x'} are P-unique [see Th. 6.20 (i)].
Theorem 6.41 [Identifying a Causal V -Conditional Effect Function via PFE Z ; x x ′ III ]
Let the Assumptions 6.1 (a) to (e) and (g) hold, and let V be a random variable on
(Ω, A, P ) satisfying σ(V ) ⊂ σ(X , Z ). Furthermore, assume that τx and τx ′ are P-unique,
and that Equations (6.57) hold. Then
CTE_{V;xx'}(V) =_P E(PFE_{Z;xx'}(Z) | V) (6.59)
               =_P E(E^{X=x}(Y|Z) | V) − E(E^{X=x'}(Y|Z) | V). (6.60)
(Proof p. 192)
Remark 6.42 [Foundation for the Analysis of Causal Conditional Total Effects] Corollary
6.37, Theorem 6.39, and Theorem 6.41 are the theoretical foundations for the analysis of
causal conditional total effect functions. The crucial assumptions in Theorem 6.39 are un-
biasedness of PFE Z ; x x ′ and Proposition (6.52). In contrast, the crucial assumptions in The-
orem 6.41 are that τx and τx ′ are P-unique, σ(V ) ⊂ σ(X , Z ), and that the Equations (6.57)
hold. In the chapters to come we will study various sufficient conditions for these require-
ments. ⊳
The causal conditional total effect CTE_{Z;xx'}(z) has been defined by

CTE_{Z;xx'}(z) = E(δ_{xx'} | Z=z), (6.61)

assuming that τ_x and τ_{x'} are P^{Z=z}-unique [see Def. 5.38 (ii)]. Furthermore, additionally assuming P(X=x, Z=z) > 0 and P(X=x', Z=z) > 0, unbiasedness of the conditional prima facie effect PFE_{Z;xx'}(z) has been defined by the conjunction of

PFE_{Z;xx'}(z) = E(δ_{xx'} | Z=z) (6.62)

and P^{Z=z}-uniqueness of τ_x and τ_{x'} [see Def. 6.23 (iii)]. The assumption that PFE_{Z;xx'}(z) is unbiased comprises the assumption that τ_x and τ_{x'} are P^{Z=z}-unique. This assumption has already been explained in more detail in Remarks 5.34 and 5.36.
Equations (6.61) and (6.62) immediately imply the following corollary.
Hence, under the assumptions of Corollary 6.43, the (Z=z)-conditional causal total
effect on Y comparing x to x ′ , that is, CTE Z ; xx ′ (z), is identical to the corresponding prima
facie effect PFE Z ; x x ′ (z).
Now we consider again re-aggregation of a causal Z-conditional effect variable, assuming CTE_{Z;xx'}(Z) =_P PFE_{Z;xx'}(Z), that is, assuming unbiasedness of PFE_{Z;xx'}(Z). Theorem 6.39 and Theorem 6.41 imply the following corollary about the identification of the causal (V=v)-conditional total effect CTE_{V;xx'}(v), which is a uniquely defined number because we assume P(V=v) > 0 (see RS-Rem. 4.26).
Hence, under the assumptions of Corollary 6.44, the (V=v)-conditional causal total
effect on Y comparing x to x ′ , that is, CTE V ; xx ′ (v), is identical to the (V =v)-conditional
expected value of the prima facie effect variable PFE Z ; x x ′ (Z ). Note that the terms on the
right-hand sides of these equations are estimable in an appropriate sampling model. Also
note, the true outcome variables τx and τx ′ are implicitly involved in the term CTE V ; xx ′ (v)
because CTE V ; xx ′ (v) = E (τx |V =v) − E (τx ′ |V =v). In contrast, true outcome variables are
not involved in the terms on the right-hand sides of Equations (6.64) and (6.65).
Remark 6.45 [Unbiasedness of PFE Z ; x x ′ (z) vs. Unbiasedness of PFE V ; x x ′ (v)] Again, note
that even if V is Z -measurable, then CTE V ; x x ′ (v) is not necessarily identical to
PFE_{V;xx'}(v) = E^{X=x}(Y|V=v) − E^{X=x'}(Y|V=v).
Comparing the right-hand side of this equation to the right-hand side of Equation (6.65)
reveals the difference. However, CTE V ; x x ′ (v) = PFE V ; x x ′ (v), if PFE V ; x x ′ (v) is unbiased. Note
that unbiasedness of PFE V ; x x ′ (v) is not implied by the assumptions of Corollary 6.44.
Hence, Corollary 6.44 offers a way to identify CTE V ; xx ′ (v) even if PFE V ; x x ′ (v) is biased.
The crucial assumptions are mentioned in Theorems 6.39 and 6.41. ⊳
Box 6.3 Identification of causal total effects and causal conditional effect functions
Various causal total effects and causal total effect functions can be identified ...
See Box 6.1 for the definitions of the unbiasedness assumptions such as PFE x x ′ ⊢ DC .
CTE_{X;10}(0) = E(PFE_{Z;10}(Z) | X=0) (6.66)
             = E(E^{X=1}(Y|Z) | X=0) − E(E^{X=0}(Y|Z) | X=0) (6.67)

and

CTE_{X;10}(1) = E(PFE_{Z;10}(Z) | X=1) (6.68)
             = E(E^{X=1}(Y|Z) | X=1) − E(E^{X=0}(Y|Z) | X=1). (6.69)
CTE_{(X,W);10}(0, w) = E(PFE_{Z;10}(Z) | X=0, W=w) (6.70)
                     = E(E^{X=1}(Y|Z) | X=0, W=w) − E(E^{X=0}(Y|Z) | X=0, W=w) (6.71)

is the causal conditional total effect (comparing treatment to control) given control and the value w of W. Correspondingly,

CTE_{(X,W);10}(1, w) = E(PFE_{Z;10}(Z) | X=1, W=w) (6.72)
                     = E(E^{X=1}(Y|Z) | X=1, W=w) − E(E^{X=0}(Y|Z) | X=1, W=w). (6.73)
is the causal conditional total effect given treatment and the value w of W . ⊳
Tables 6.2 to 6.4 show parameters pertaining to fictive random experiments such as the
single-unit trials described in chapter 2. Among these parameters are the individual ex-
pectation values E (Y |X =x ,U =u ) given the treatment conditions, and the individual treat-
ment probabilities P (X =1|U =u ). The parameters presented in the tables can be used to
generate sample data that would result if the random experiments to which the tables refer
were conducted n times.1
Remark 6.47 [The Probability Space] For simplicity, we consider random experiments in
which no fallible covariates are observed and in which there is neither a second treatment
variable nor any other variable that is simultaneous to the treatment variable. In this case,
the set
Ω = Ω1 × Ω2 × Ω3 = ΩU × ΩX × R (6.74)
suffices to describe the set of possible outcomes of the random experiment, where ΩU =
{Tom , Tim , Joe , Jim , Ann , Sue } and ΩX = {treatment, control}. Furthermore, we consider
the product σ-algebra A = P (ΩU ) ⊗ P (ΩX ) ⊗ B, where B denotes the Borel σ-algebra on
R , the set of real numbers (see RS-Rem. 1.14). The probability measure P on (Ω, A ) is only
partly known. Looking at Table 6.2, for example, we only know the conditional expectation
E (Y |X ,U ). In contrast, we do not know the conditional distribution P Y | X ,U , which would
be known only if additional information were added, such as ‘Y is conditionally normally
distributed given (X ,U )’, with a specified conditional variance of Y given (X ,U ). However,
for our purpose, the conditional distribution of Y is not relevant because we only consider
1 Although the focus of this book is on theory and not on data analysis, we also provide sample data for each table
on the home page of this book: www.causal-effects.de. These and other examples of this type as well as a data
sample generated by these examples can easily be created with the PC-program CausalEffectsXplorer that is also
provided on www.causal-effects.de, together with an extensive help file providing the most important concepts
and formulas.
Table 6.2 Fundamental parameters: Person u, Sex z, P(U=u), P(X=1|U=u), E^{X=0}(Y|U=u), E^{X=1}(Y|U=u), CTE_{U;10}(u), P(U=u|X=0), P(U=u|X=1) (the person-level rows are not reproduced here)

                       x=0        x=1
E(τ_x):                92.333     102.333    ATE_10 = 10
E(Y|X=x):              100.286    94.429     PFE_10 = −5.857
causal total effects that are defined in terms of the conditional expectation E (Y | X ,U ) and
its values E (Y | X =x ,U =u ) (see ch. 5). ⊳
Remark 6.48 [Filtration and Global Potential Confounder] The filtration (F_t)_{t∈T}, T = {1, 2, 3}, consists of the σ-algebras

F_1 = σ(π_1), F_2 = σ(π_1, π_2), F_3 = σ(π_1, π_2, π_3),

where π_1, π_2, and π_3 are the projections π_t: Ω → Ω_t, t ∈ T (see Def. 4.6). ⊳
Remark 6.49 [Random Variables] In all examples of this section, we consider the person variable U: Ω → Ω_U with value space (Ω_U, P(Ω_U)), the treatment variable X: Ω → Ω'_X = {0, 1} with value space (Ω'_X, P(Ω'_X)), where X takes on the value 1 for treatment and 0 for control, and the real-valued outcome variable Y: Ω → R with value space (R, B). ⊳
Remark 6.50 [Cause and Confounder σ-Algebras] Because there are no other variables
that are simultaneous to the treatment variable X , the index sets J and K used in Defi-
nition 4.6 are J = K = {1}. Hence, the cause σ-algebra is C = σ(π2j , j ∈ K ) = σ(π2) and the
confounder σ-algebra is DC = σ(π1, π2j , j ∈ J \ K ) = σ(π1 ).
Note that U = π1, which is a global potential confounder of X [see Def. 4.11 (iii)]. Fur-
thermore, U is prior in (Ft )t ∈T to X and Y, and X is prior to Y (see sect. 3.1). ⊳
Figure 6.1. The person variable U, the function g_1, and their composition, the true outcome variable τ_1 = E^{X=1}(Y|U) = g_1(U).
Hence, for all examples of this section, we specified the regular probabilistic causality
setup
((Ω, A, P), (F_t)_{t∈T}, C, DC, X, Y)
[see Def. 4.11 (v)] and asserted that U is a global potential confounder of X .
Remark 6.51 [True Outcome Variables] In all examples of this chapter, P (X =x ,U =u ) > 0
for all pairs (x, u) of values of X and U. Therefore, and because σ(U) = DC, the two true outcome variables are
τx := E X =x (Y |D X ) = E X =x (Y |U ), x ∈ {0, 1}. (6.76)
According to Equation (5.4), the true outcome variable τx can also be written as a function
of the person variable U . More specifically, it can be written as the composition of U and
the function g x : ΩU → R defined by
g x (u) = E (Y |X =x ,U =u ), for all u ∈ ΩU . (6.77)
Note that, in this definition, x is a fixed value of X, and τ_x = g_x(U) is the composition of U and g_x, that is,

∀ ω ∈ Ω: τ_x(ω) = g_x(U(ω)) = g_x(u), if ω ∈ {U=u}. (6.78)
This implies that the values of the conditional expectation E X =x (Y |U ) are identical to the
conditional expectation values E (Y |X =x ,U =u ) [see Eq. (5.3)]. Figure 6.1 illustrates these
equations for treatment x =1. ⊳
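For the finite examples that follow, the composition τ_x = g_x(U) can be made concrete with a small Python sketch (added for illustration; the values of g_1 are the true outcomes under treatment of Tables 6.2 to 6.4, and the sample outcome ω is hypothetical):

    # tau_1 = g_1(U): the composition of the person variable U and the function g_1,
    # where g_1(u) = E(Y|X=1, U=u)
    g1 = {"Tom": 81, "Tim": 86, "Joe": 100, "Jim": 103, "Ann": 114, "Sue": 130}
    def tau_1(omega):
        u = omega[0]                      # U is the projection on the first component of omega
        return g1[u]
    omega = ("Ann", "treatment", 117.2)   # a hypothetical element of Omega = Omega_U x Omega_X x R
    print(tau_1(omega))                   # 114 = E(Y|X=1, U=Ann)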
In the first example (see Table 6.2), the individual treatment probabilities are different for
each and every person, and they strongly depend on the individual expectation values of
the outcomes under control, that is, they depend on the true outcome variable τ0 and also
on τ1 . The conditional expectation values E (Y |X =1) and E (Y |X =0) are biased. In fact, the
prima facie effect PFE 10 is negative, whereas the causal average total effect ATE 10 is posi-
tive. Furthermore, the conditional expectation values E (Y |X =1, Z =z) and E (Y |X =0, Z =z)
are biased as well. Although the causal (Z =z)-conditional total effects and the causal aver-
age total effect are defined (and can be computed from the parameters displayed in the up-
per left part of the table), they cannot be estimated from empirically estimable parameters
such as the conditional expectation values E (Y |X =1) and E (Y |X =0) or E (Y |X =1, Z =z)
and E (Y |X =0, Z =z).
In the second example (see Table 6.3), the treatment probabilities are identical for all
persons, implying that X and U are independent, which has many implications that are
studied in detail in chapter 8. Among these implications are that the conditional expecta-
tion E (Y |X ) and its values E (Y |X =x ) as well as the conditional expectation E (Y |X, Z ) and
its values E (Y |X =x , Z =z) are unbiased.
In the third example (see Table 6.4), the treatment probabilities differ between males
and females. Furthermore, males (m) and females ( f ) also differ in their conditional ex-
pectation values of the true outcome variable τ_0, that is, E(τ_0|Z=m) ≠ E(τ_0|Z=f). How-
ever, given the value m or f of Z , the treatment probabilities do not differ from each
other. This implies that X and U are Z -conditionally independent. In this example, in
which P (X =x ,U =u ) > 0 for all pairs of values of X and U , implying that the true out-
come variables τ0 and τ1 are P-unique (see Rem. 5.17), we can conclude that E (Y |X, Z )
is unbiased (see Th. 8.31). Hence, in this third example, the conditional expectation val-
ues E (Y |X =x ) are biased, whereas the conditional expectation values E (Y |X =x , Z =z) are
unbiased. Again, this case is studied extensively in chapter 8.
Tables 6.2, 6.3, and 6.4 display the true outcomes and the individual treatment proba-
bilities. According to Equation (6.76), the values of τ_x are also the individual conditional
expectation values E (Y |X =x ,U =u ), and, according to Equation (6.79), the values of the
conditional probability P (X =1|D X ) = P (X =1|U ) are identical to the individual treatment
probabilities P(X=1|U=u). The tables also display the values of Z := sex, which is a covariate of X because it is U-measurable.
Looking at the first four numerical columns, the three tables differ only in the treatment
probabilities P (X =1|U =u ). All other entries in these first four columns, such as the true
outcomes are the same. However, if we look at the other parameters, the three tables differ
in important aspects.
Table 6.3 Fundamental parameters

Person u   Sex z   P(U=u)   P(X=1|U=u)   E^{X=0}(Y|U=u)   E^{X=1}(Y|U=u)   CTE_{U;10}(u)   P(U=u|X=0)   P(U=u|X=1)
Tom        m       1/6      3/4          68               81               13              1/6          1/6
Tim        m       1/6      3/4          78               86               8               1/6          1/6
Joe        m       1/6      3/4          88               100              12              1/6          1/6
Jim        m       1/6      3/4          98               103              5               1/6          1/6
Ann        f       1/6      3/4          106              114              8               1/6          1/6
Sue        f       1/6      3/4          116              130              14              1/6          1/6

                       x=0        x=1
E(τ_x):                92.333     102.333    ATE_10 = 10
E(Y|X=x):              92.333     102.333    PFE_10 = 10
These expectations and conditional expectations are easy to compute from the param-
eters displayed in Table 6.2. The expectation E (τx ) of the true outcome variable τx is ob-
tained using the unconditional probabilities P (U =u ) as weights [see Box 6.2 (ii)]. In con-
trast, the corresponding conditional expectation values E(Y|X=x) are identical to the
(X =x )-conditional expectation values E (τx |X =x ) of the true outcome variables, using as
weights the conditional probabilities P (U =u |X =x ) [see Box 6.2 (i)].
If used for the evaluation of the total treatment effect, the (X=x)-conditional expectation values would lead to completely wrong conclusions. First, the direction of the prima facie effect

PFE_10 = E(Y|X=1) − E(Y|X=0) = 94.429 − 100.286 = −5.857

is reversed if compared to

ATE_10 = E(CTE_{U;10}(U)) = E(τ_1) − E(τ_0) = 10.
And second, it is also reversed if compared to each and every individual total effect [see
the column CTE U ;10 (u) in Table 6.2]. All individual total effects are positive in this exam-
ple, ranging between 5 and 14. The bias in this example is due to strong inter-individual
differences in the true outcomes and to the fact that the individual treatment probabilities
P (X =1|U =u ) heavily depend on the true outcome variables, and, therefore, on the per-
son variable U . For instance, Tom has a true outcome under control of E X=0 (Y |U =Tom ) =
Table 6.4 Fundamental parameters

Person u   Sex z   P(U=u)   P(X=1|U=u)   E^{X=0}(Y|U=u)   E^{X=1}(Y|U=u)   CTE_{U;10}(u)   P(U=u|X=0)   P(U=u|X=1)
Tom        m       1/6      3/4          68               81               13              1/10         3/14
Tim        m       1/6      3/4          78               86               8               1/10         3/14
Joe        m       1/6      3/4          88               100              12              1/10         3/14
Jim        m       1/6      3/4          98               103              5               1/10         3/14
Ann        f       1/6      1/4          106              114              8               3/10         1/14
Sue        f       1/6      1/4          116              130              14              3/10         1/14

                       x=0      x=1
E(τ_x):                92.333   102.333    ATE_10 = 10
E(Y|X=x):              99.8     96.714     PFE_10 = −3.086
68 and a treatment probability of 6/7, while Sue has a true outcome under control of
E X=0 (Y |U =Sue ) = 116 and a treatment probability of 1/7. Such a constellation is to be
expected under self-selection of subjects to treatments, if the subjects base their decisions
to take treatment on the severity of their dysfunction before treatment and if severity of their
dysfunction after treatment is assessed as the outcome variable.
This example once again drastically demonstrates the necessity for the distinction
between a difference E (Y | X =1) − E (Y | X =0) of conditional expectation values and the
causal average total effect E (τ1 ) − E (τ0 ). Obviously, only the latter is of interest if we want
to evaluate the treatment.
For the second example presented in Table 6.3, the situation is completely different.
Although the true outcome variables are the same as in Table 6.2, here, the conditional
expectation values E (Y |X =x ) and the expectations E (τx ) of the true outcome variables
τx = E X =x (Y |U ) are identical to each other, and this applies to both values x=0 and x=1
of X . Hence, in this example, the conditional expectation values E (Y |X =x ) are unbiased
and can be used for the evaluation of the treatment effect. This is due to the fact that the
individual treatment probabilities do not depend on the persons. This constellation occurs
in a perfect randomized experiment, in which the experimenter decides that each person
is in treatment 1 with probability P (X =1) and in treatment 0 with probability 1 − P (X =1),
provided, of course, that the persons comply with the experimenter's decisions. In our sec-
ond example, P (X =1) = 3/4. Note, however, that P (X =1) could be any number between 0
and 1, exclusively. The only important point is that the individual treatment probabilities
do not differ between persons, that is, P (X =1|U =u ) = P (X =1) for all persons u ∈ ΩU . Such
a randomized assignment may be performed by drawing a ball from an urn with three
black balls and one white ball, adopting the rule that the subject is treated if a black ball is
drawn.
In the third example (see Table 6.4), the conditional expectation values E (Y |X =x ) are
biased again. Here, E (Y |X =0) = 99.8, whereas E (τ0 ) = 92.333. Again, E (Y |X =0) is much
larger than E (τ0 ). In contrast, E (Y |X =1) = 96.714, whereas E (τ1 ) = 102.333, that is, again
E (Y |X =1) is much smaller than E (τ1 ). Hence, in this example, the conditional expectation
values E (Y |X =x ) are strongly biased as well. However, in contrast to the first example, the
conditional expectation values E (Y |X =x , Z =z) are unbiased. In this example, the treat-
ment probability is 3/4 for all male units, while it is 1/4 for all female units. The crucial
point is that these probabilities are identical given a value m or a value f of the covariate
Z , that is, P (X =1| Z =z,U =u ) = P (X =1| Z =z) for each person u and both values z of the
covariate Z . This constellation holds in a perfect conditionally randomized experiment in
which we assign the sampled person to treatment with probability P (X =1| Z =m) if he is
male and with probability P (X =1| Z = f ) if the sampled person is female.
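Such a conditionally randomized assignment can be sketched in a few lines of Python (illustrative only; the two assignment probabilities are those of the third example):

    import random
    # Conditionally randomized assignment: the treatment probability depends only on Z (sex)
    p_treat_given_z = {"m": 3/4, "f": 1/4}
    def assign_treatment(z, rng=random):
        # returns 1 (treatment) with probability P(X=1|Z=z) and 0 (control) otherwise
        return 1 if rng.random() < p_treat_given_z[z] else 0
    x_for_a_female = assign_treatment("f")   # treated with probability 1/4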
The conditional expectation values E(Y|X=x, Z=z) can be computed from the parameters displayed in Table 6.4, applying Equation (iii) of Box 6.2. For this purpose we also need the formula

P(U=u | X=x, Z=z) = [P(X=x | U=u) · P(U=u, Z=z)] / [P(X=x | Z=z) · P(Z=z)], (6.80)

where

P(X=x | Z=z) = Σ_u P(X=x | U=u) · P(U=u, Z=z) / P(Z=z) (6.81)
(see Exercise 6-13). Note that in the three examples P (X =x |U =u , Z =z) = P (X =x |U =u )
because in these examples Z is U-measurable. Intuitively speaking, this means that Z
(sex) does not contain any information that is not already contained in U (the person vari-
able). All terms on the right-hand side of Equation (6.80) are displayed in Table 6.4 or can
be computed from the parameters displayed in this table.2
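The following Python sketch evaluates Equations (6.81) and (6.80) for the third example (added for illustration; the parameters are those of Table 6.4):

    # P(X=x|Z=z) via Eq. (6.81) and P(U=u|X=x, Z=z) via Eq. (6.80) for Table 6.4
    rows = [("Tom", "m", 1/6, 3/4), ("Tim", "m", 1/6, 3/4), ("Joe", "m", 1/6, 3/4),
            ("Jim", "m", 1/6, 3/4), ("Ann", "f", 1/6, 1/4), ("Sue", "f", 1/6, 1/4)]
    def p_z(z):
        return sum(p_u for _, zz, p_u, _ in rows if zz == z)
    def p_x_given_z(x, z):
        # Eq. (6.81); here P(U=u, Z=z) = P(U=u) whenever z is the sex of u, because Z is U-measurable
        num = sum((p1 if x == 1 else 1 - p1) * p_u for _, zz, p_u, p1 in rows if zz == z)
        return num / p_z(z)
    def p_u_given_xz(u, x, z):
        # Eq. (6.80)
        _, zz, p_u, p1 = next(r for r in rows if r[0] == u)
        if zz != z:
            return 0.0
        p_x_u = p1 if x == 1 else 1 - p1
        return p_x_u * p_u / (p_x_given_z(x, z) * p_z(z))
    print(p_x_given_z(1, "m"))            # 0.75
    print(p_u_given_xz("Tom", 1, "m"))    # 0.25

Using these conditional probabilities in Equation (iii) of Box 6.2 then yields the conditional expectation values E(Y|X=x, Z=z).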
Remark 6.53 [How Realistic Are These Examples?] In empirical applications, assuming
D X = U is correct if (a) there is neither a second treatment variable nor another variable
that is simultaneous to X and if (b) no fallible covariate is observed. In this case, u sig-
nifies the observational unit at the onset of treatment. If, however, a fallible covariate of
X is observed and u represents the observational unit at the time at which the covariate
is assessed, then there may very well be covariates that are not measurable with respect
to U , which affect the outcome variable Y and/or the treatment probability (see section
2.2). Hence, in this case, D X = U would not hold. In this case a global potential confounder
would be the multivariate random variable (U , Z ), where Z = (Z 1 , . . . , Z m ) consists of the
fallible covariates Z i , i = 1, . . . , m, to be assessed before treatment. That is, in this case
D X = (U , Z ) is a global potential confounder of X [see Def. 4.11 (iii)]. ⊳
2 An alternative is using the Causal Effects Xplorer provided at www.causal-effects.de, the home page of this
book.
Comparing the conditional prima facie effects to the causal conditional total effects re-
veals that the conditional prima facie effects are still biased with respect to total effects
in the random experiment presented in Table 6.2, but not in the examples displayed in
Tables 6.3 and 6.4. Hence, in the first example, the conditional prima facie effect and the
causal conditional total effect for males are not identical, while they are identical in the
second and third example, and the same applies to the corresponding conditional prima
facie effects for females.
The bias of the conditional prima facie effects in the example presented in Table 6.2 is
no surprise because there are still individual differences within the two sets of males and
females with respect to (a) the true outcomes under treatment and under control, as well
as (b) in the individual treatment probabilities P (X =1|U =u ). In contrast, in the second
and third example, the individual treatment probabilities are all the same within each of
the two sets of males and females.
In all three examples, the average over the individual total effects is equal to the causal
average total effect. [Remember, the causal average total effect is defined as the expec-
tation of the true total effect variable CTE_{D_X;10}(D_X) = δ_10 (see Def. 5.26) and in this exam-
ple U =D X .] However, only in the second and third example, the expectation of the Z -
conditional prima facie effects is equal to the causal average total effect. Because this is no
coincidence, this fact can be used for causal inference even in those cases in which the
unconditional prima facie effects are biased, provided that the conditional prima facie ef-
fects are unbiased, that is, provided that PFE Z ;10 (z) = E (δ10 |Z =z) for each value z of the
covariate Z (see Th. 6.34).
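This re-aggregation can be checked numerically for the third example. The following Python sketch is added for illustration; it uses the Table 6.4 parameters and exploits the fact that, within each sex, the probabilities P(U=u) and the treatment probabilities are constant, so that E(Y|X=x, Z=z) is the simple mean of the values E(Y|X=x, U=u) over the persons with sex z:

    # Table 6.4: E(Y|X=0,U=u) and E(Y|X=1,U=u), grouped by sex
    groups = {"m": [(68, 81), (78, 86), (88, 100), (98, 103)],
              "f": [(106, 114), (116, 130)]}
    p_z = {"m": 4/6, "f": 2/6}
    def conditional_pfe(z):
        e0 = sum(a for a, _ in groups[z]) / len(groups[z])   # E(Y|X=0, Z=z)
        e1 = sum(b for _, b in groups[z]) / len(groups[z])   # E(Y|X=1, Z=z)
        return e1 - e0
    ate_10 = sum(conditional_pfe(z) * p_z[z] for z in p_z)   # 9.5 * 2/3 + 11 * 1/3 = 10.0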
Whether or not a causal average total effect is meaningful if there are different causal
conditional total effects — some of which may even be negative, while some are positive —
needs judgement with regard to content in the specific applications considered. In some
applications it might be meaningful, in others it might not. Clearly, causal conditional to-
tal effects give more specific information than the causal average total effect. However,
there are also advantages of causal average total effects. First, they give a summary eval-
uation of a treatment in a single number and different treatments may be compared to
each other with respect to this number. Second, in samples of limited size, causal average
effects can be estimated with more accuracy than the plenitude of causal conditional ef-
fects. And third, one should keep in mind that even causal conditional total effects are only
causal average total effects (see, e. g., Table 6.4). Hence, it is always a matter of case-specific
judgement how fine-grained the analysis should be.
First Conclusions
The three examples show that conditioning on a covariate of X does not necessarily yield
unbiasedness given the values of the covariate. While there is no bias at all in the second
example, the third example shows that conditioning may remove bias. Comparing Tables
6.3 and 6.4 to each other shows that unbiasedness of the conditional expectation values
E (Y |X =x , Z =z) relies on specific conditions. In these two tables, P (X =1|U ) = P (X =1| Z ),
that is, in these two tables there are equal individual treatment probabilities for all units
with an identical value z of Z . Such conditions implying unbiasedness of E (Y |X ) or of
E (Y |X, Z ) and their values are called causality conditions. Note, however, that there are
several such causality conditions that do not involve the U-conditional treatment prob-
abilities (see ch. 9).
Now we treat an example demonstrating that there can be unbiasedness of the conditional
expectation E (Y |X ) and at the same time bias of the conditional expectation E (Y |X , Z ),
where Z is a covariate of X . This example shows that unbiasedness can be accidental, that
is, there are cases in which unbiasedness is not a logical consequence of experimental
design but an ‘accident of numbers’. In chapter 8 we show that the experimental design
technique of randomized assignment always induces unbiasedness of the conditional ex-
pectations E (Y |X ) and E (Y |X, Z ), whenever Z is a covariate of X , that is, whenever Z is
measurable with respect to D X .3
Table 6.5 displays the relevant parameters. We assume that it is a simple experiment
having the same structure as the examples treated in section 6.5. That is,
((Ω, A, P), (F_t)_{t∈T}, C, DC, X, Y),
as specified in section 6.5, is the regular probabilistic causality setup. The only difference
is that Ω1 = ΩU = { Joe , Jim , Ann , Sue } now consists of four (instead of six) persons. Again,
D X = U is a global potential confounder of X .
In this specific example, the causal individual total effects are the same for all persons,
namely 5, implying that the causal average total effect is also 5. The prima facie effect
can be computed from the difference between the two conditional expectation values
E (Y |X =0) and E (Y |X =1). In this example, Box 6.2 (i) yields
E(Y|X=0) = Σ_u E(Y | X=0, U=u) · P(U=u | X=0)
         = 95 · 3/16 + 65 · 1/16 + 80 · 7/16 + 50 · 5/16 = 72.5

and

E(Y|X=1) = Σ_u E(Y | X=1, U=u) · P(U=u | X=1)
         = 100 · 5/16 + 70 · 7/16 + 85 · 1/16 + 55 · 3/16 = 77.5.
Using Equation (ii) of Box 6.2, the corresponding expectations of the true outcome vari-
ables are
3 Note again that unbiasedness does not refer to a sample and that there is no (successful) randomized assign-
ment if there is systematic attrition, that is, if persons do not comply with the assignments of the experimenter.
Table 6.5 Fundamental parameters

Person u   Sex z   P(U=u)   P(X=1|U=u)   E^{X=0}(Y|U=u)   E^{X=1}(Y|U=u)   CTE_{U;10}(u)   P(U=u|X=0)   P(U=u|X=1)
Joe        m       1/4      5/8          95               100              5               3/16         5/16
Jim        m       1/4      7/8          65               70               5               1/16         7/16
Ann        f       1/4      1/8          80               85               5               7/16         1/16
Sue        f       1/4      3/8          50               55               5               5/16         3/16

                       x=0     x=1
E(τ_x):                72.5    77.5    ATE_10 = 5
E(Y|X=x):              72.5    77.5    PFE_10 = 5
E(τ_0) = Σ_u E(Y | X=0, U=u) · P(U=u)
       = 95 · 1/4 + 65 · 1/4 + 80 · 1/4 + 50 · 1/4 = 72.5
and
E(τ_1) = Σ_u E(Y | X=1, U=u) · P(U=u)
       = 100 · 1/4 + 70 · 1/4 + 85 · 1/4 + 55 · 1/4 = 77.5.
Hence, the conditional expectation values E (Y |X =0) and E (Y |X =1) are unbiased because
they are identical to the corresponding expectations E (τ0 ) and E (τ1 ) of the true outcome
variables and because the two true outcome variables are P-unique.
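For readers who want to reproduce such computations, the following is a minimal Python sketch (an illustration only, not part of the formal theory) that encodes the fundamental parameters of Table 6.5 and checks that the conditional expectation values E(Y | X=x) coincide with the expectations E(τ_x) of the true outcome variables.

```python
# Fundamental parameters of Table 6.5 (four persons, dichotomous treatment X)
p_u = {"Joe": 1/4, "Jim": 1/4, "Ann": 1/4, "Sue": 1/4}        # P(U=u)
p_x1_u = {"Joe": 5/8, "Jim": 7/8, "Ann": 1/8, "Sue": 3/8}     # P(X=1 | U=u)
ey_xu = {  # true outcomes E(Y | X=x, U=u)
    0: {"Joe": 95, "Jim": 65, "Ann": 80, "Sue": 50},
    1: {"Joe": 100, "Jim": 70, "Ann": 85, "Sue": 55},
}

def p_x(x):
    """P(X=x) via the theorem of total probability."""
    return sum((p_x1_u[u] if x == 1 else 1 - p_x1_u[u]) * p_u[u] for u in p_u)

def p_u_given_x(u, x):
    """P(U=u | X=x) via Bayes' theorem."""
    p_x_u = p_x1_u[u] if x == 1 else 1 - p_x1_u[u]
    return p_x_u * p_u[u] / p_x(x)

def e_y_given_x(x):
    """E(Y | X=x) = sum_u E(Y | X=x, U=u) * P(U=u | X=x)   [Box 6.2 (i)]"""
    return sum(ey_xu[x][u] * p_u_given_x(u, x) for u in p_u)

def e_tau(x):
    """E(tau_x) = sum_u E(Y | X=x, U=u) * P(U=u)           [Box 6.2 (ii)]"""
    return sum(ey_xu[x][u] * p_u[u] for u in p_u)

for x in (0, 1):
    print(x, e_y_given_x(x), e_tau(x))   # 72.5 and 72.5 for x=0, 77.5 and 77.5 for x=1
```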
The conditional expectation values E (Y |X =1, Z =z) and E (Y |X = 0, Z =z) can be computed
from the parameters displayed in Table 6.5 using Equation (iii) of Box 6.2. This equation
holds because, in this example, the random variable Z is measurable with respect to U
(see RS-Cor. 2.36). While the individual expected outcomes E (Y |X =x ,U =u ) are displayed
in Table 6.5, the conditional probabilities P (U =u |X =x , Z =z) have to be computed via
Equation (6.80) (see Exercise 6-13).
For Z =m (males), Equation (iii) of Box 6.2 yields
E(Y | X=0, Z=m) = Σ_u E(Y | X=0, U=u) · P(U=u | X=0, Z=m)
                = 95 · 9/12 + 65 · 3/12 + 80 · 0 + 50 · 0 = 87.5
and
E(Y | X=1, Z=m) = Σ_u E(Y | X=1, U=u) · P(U=u | X=1, Z=m)
                = 100 · 5/12 + 70 · 7/12 + 85 · 0 + 55 · 0 = 82.5.
In contrast, using Equation (iv) of Box 6.2, the (Z=m)-conditional expectation values of the true outcome variables are
E(τ_0 | Z=m) = Σ_u E(Y | X=0, U=u) · P(U=u | Z=m)
             = 95 · 1/2 + 65 · 1/2 + 80 · 0 + 50 · 0 = 80
and
E(τ_1 | Z=m) = Σ_u E(Y | X=1, U=u) · P(U=u | Z=m)
             = 100 · 1/2 + 70 · 1/2 + 85 · 0 + 55 · 0 = 85.
For Z = f (females), Equation (iii) of Box 6.2 yields
E(Y | X=0, Z=f) = Σ_u E(Y | X=0, U=u) · P(U=u | X=0, Z=f)
                = 95 · 0 + 65 · 0 + 80 · 7/12 + 50 · 5/12 = 67.5
and
E(Y | X=1, Z=f) = Σ_u E(Y | X=1, U=u) · P(U=u | X=1, Z=f)
                = 100 · 0 + 70 · 0 + 85 · 3/12 + 55 · 9/12 = 62.5.
In contrast, using Equation (iv) of Box 6.2, the (Z=f)-conditional expectation values of the true outcome variables are
E(τ_0 | Z=f) = Σ_u E(Y | X=0, U=u) · P(U=u | Z=f)
             = 95 · 0 + 65 · 0 + 80 · 1/2 + 50 · 1/2 = 65
and
E(τ_1 | Z=f) = Σ_u E(Y | X=1, U=u) · P(U=u | Z=f)
             = 100 · 0 + 70 · 0 + 85 · 1/2 + 55 · 1/2 = 70.
Obviously, the conditional expectation values E (Y |X =x , Z =z) of the outcome variable Y
are not identical to the corresponding conditional expectation values E (τx |Z =z) of the
true outcome variables. Hence, the E (Y |X =x , Z =z) are biased although the conditional
expectation values E (Y |X =x ) are unbiased.
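The same kind of bookkeeping can be extended to the Z-conditional quantities. The sketch below (again only an illustration; the dictionaries simply re-encode Table 6.5) computes E(Y | X=x, Z=z) and E(τ_x | Z=z) and reproduces the mismatch derived above.

```python
# Table 6.5 again, now with the covariate Z = sex, which is a function of U
p_u = {"Joe": 1/4, "Jim": 1/4, "Ann": 1/4, "Sue": 1/4}        # P(U=u)
p_x1_u = {"Joe": 5/8, "Jim": 7/8, "Ann": 1/8, "Sue": 3/8}     # P(X=1 | U=u)
sex = {"Joe": "m", "Jim": "m", "Ann": "f", "Sue": "f"}        # Z = g(U)
ey_xu = {0: {"Joe": 95, "Jim": 65, "Ann": 80, "Sue": 50},
         1: {"Joe": 100, "Jim": 70, "Ann": 85, "Sue": 55}}

def p_x_given_u(x, u):
    return p_x1_u[u] if x == 1 else 1 - p_x1_u[u]

def p_uxz(u, x, z):
    """Joint probability P(U=u, X=x, Z=z); it is zero unless g(u) = z."""
    return p_x_given_u(x, u) * p_u[u] if sex[u] == z else 0.0

def e_y_given_xz(x, z):
    """E(Y | X=x, Z=z) = sum_u E(Y | X=x, U=u) * P(U=u | X=x, Z=z)."""
    p_xz = sum(p_uxz(u, x, z) for u in p_u)
    return sum(ey_xu[x][u] * p_uxz(u, x, z) / p_xz for u in p_u)

def e_tau_given_z(x, z):
    """E(tau_x | Z=z) = sum_u E(Y | X=x, U=u) * P(U=u | Z=z)."""
    p_z = sum(p_u[u] for u in p_u if sex[u] == z)
    return sum(ey_xu[x][u] * p_u[u] / p_z for u in p_u if sex[u] == z)

for z in ("m", "f"):
    for x in (0, 1):
        print(z, x, e_y_given_xz(x, z), e_tau_given_z(x, z))
# m: 87.5 vs 80 and 82.5 vs 85;  f: 67.5 vs 65 and 62.5 vs 70  ->  biased
```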
Unbiasedness
Unbiasedness is a first kind of causality condition, which, together with the additional
structural components listed in a regular probabilistic causality space, distinguishes a
conditional expectation that has a causal meaning from an ordinary conditional expecta-
tion. Several kinds of conditional expectations and their values as well as their differences
can be unbiased (see Box 6.1). The general insight of this chapter is that comparing conditional expectation values (true means) does not allow us to draw any conclusions about the effects of a treatment or intervention unless they are unbiased. In terms of the metaphor
discussed in the preface, conditional expectation values and their differences are like the
shadow of the invisible man. The length of this shadow is identical to the height of the invisible man only under very specific conditions, in particular a specific angle between the sun and the surface of the earth at the point where the man stands.
Identification
The unbiasedness conditions are the weakest assumptions under which we can identify
causal average total effects and causal conditional total effects and effect functions. Box
6.3 summarizes the identification equations. Note that the right-hand sides of these equa-
tions are empirically estimable parameters or empirically estimable functions that can be
computed from the conditional expectations E (Y |X ) or E (Y |X, Z ). That is, only the pu-
tative cause variable X , the outcome variable Y , and the random variables Z and V are
involved. All other causality conditions that will be treated in the chapters to come imply
unbiasedness, provided that true outcome theory applies. Some of these other causality
conditions are empirically testable, at least in the sense of falsifiability. Unfortunately, this
does not apply to unbiasedness itself.
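As a rough numerical illustration of such an identification equation (the conditional expectation values and the distribution of Z used here are made up, not taken from any table of this book): if the E(Y | X=x, Z=z) are unbiased, the causal average total effect is obtained by averaging the Z-conditional prima facie effects over the distribution of Z.

```python
# Hypothetical illustration (numbers made up): covariate adjustment by averaging the
# Z-conditional prima facie effects PFE_Z;10(z) = E(Y | X=1, Z=z) - E(Y | X=0, Z=z)
# over the distribution of Z identifies ATE_10 whenever these conditional
# expectation values are unbiased.
e_y = {                      # assumed conditional expectation values E(Y | X=x, Z=z)
    ("a", 0): 10.0, ("a", 1): 14.0,
    ("b", 0): 20.0, ("b", 1): 23.0,
}
p_z = {"a": 0.5, "b": 0.5}   # assumed distribution of the covariate Z

ate_10 = sum((e_y[(z, 1)] - e_y[(z, 0)]) * p_z[z] for z in p_z)
print(ate_10)                # 0.5 * 4 + 0.5 * 3 = 3.5
```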
Limitations
6.8 Proofs
E(Y | X=x) ⊢ D_X
⇔ E(Y | X=x) = E(τ_x)                                [Def. 6.3 (i)]
⇔ E^{X=x}(Y) = E(τ_x)                                [(6.3)]
⇔ E^{X=x}(E^{X=x}(Y | D_X)) = E(τ_x)                 [RS-Box 4.1 (iv)]
⇔ E^{X=x}(τ_x) = E(τ_x).                             [(5.1)]
Equation (6.7).
E(Y | X=x) ⊢ D_X
⇔ E^{X=x}(τ_x) = E(τ_x)                              [(6.6)]
⇔ E(τ_x | X=x) = E(τ_x)                              [(6.3)]
⇔ τ_x ⊥ 1_{X=x}                                      [RS-Th. 4.38 (ii)]
⇔ E(τ_x | 1_{X=x}) =_P E(τ_x).                       [RS-(4.35)]
Equation (6.9).
E(Y | X=x) ⊢ D_X
⇔ E^{X=x}(τ_x) = E(τ_x)                                                           [(6.6)]
⇔ E^{X=x}(ε_x + E(τ_x | 1_{X=x})) = E(ε_x + E(τ_x | 1_{X=x}))                     [(6.8)]
⇔ E^{X=x}(ε_x) + E^{X=x}(E(τ_x | 1_{X=x})) = E(ε_x) + E(E(τ_x | 1_{X=x}))         [RS-Box 3.1 (vii)]
⇔ E^{X=x}(ε_x) + E^{X=x}(E(τ_x)) = E(ε_x) + E(τ_x)                                [(6.7), RS-Box 4.1 (iv)]
⇔ E^{X=x}(ε_x) + E(τ_x) = E(ε_x) + E(τ_x)                                         [RS-Box 3.1 (i)]
⇔ E^{X=x}(ε_x) = E(ε_x).
Equation (6.10).
E(Y | X=x) ⊢ D_X
⇔ E^{X=x}(ε_x) = E(ε_x)                              [(6.9)]
⇔ E(ε_x | X=x) = E(ε_x)                              [(6.3)]
⇔ ε_x ⊥ 1_{X=x}                                      [RS-Th. 4.38 (ii)]
⇔ E(ε_x | 1_{X=x}) =_P E(ε_x).                       [RS-(4.35)]
Let g x denote the factorization of E X =x (Y |Z ) [see RS-Eq. (5.23)] and g denote the fac-
torization of E (τx |Z ) [see RS-Eq. (4.14)]. Furthermore, note that P (X =x , Z =z) > 0 implies
P (Z =z) > 0.
E^{X=x}(Y | Z) ⊢ D_X ⇔ E^{X=x}(Y | Z) =_P E(τ_x | Z)        [Def. 6.13 (i)]
⇔ g_x(Z) =_P g(Z)                                           [RS-(4.14), RS-(5.23)]
⇒ g_x(z) = g(z)                                             [P(Z=z) > 0, RS-(2.68)]
⇔ E^{X=x}(Y | Z=z) = E(τ_x | Z=z)                           [RS-(5.24), RS-(4.17)]
⇔ E^{X=x}(Y | Z=z) ⊢ D_X                                    [Def. 6.13 (ii)]
⇔ E(Y | X=x, Z=z) ⊢ D_X.                                    [(6.20)]
Again, let g x denote the factorization of E X =x (Y |Z ) [see RS-Eq. (5.23)] and g denote the
factorization of E (τx |Z ) [see RS-Eq. (4.14)].
Proposition (6.28). If Z is a covariate of X , then σ(Z ) ⊂ σ(D X ) [see Def. 4.11 (iv) and
Rem. 4.16]. Hence,
E^{X=x}(Y | Z) ⊢ D_X
⇔ E^{X=x}(Y | Z) =_P E(τ_x | Z)                              [Def. 6.13 (i)]
⇔ E^{X=x}(E^{X=x}(Y | D_X) | Z) =_P E(τ_x | Z)               [RS-Box 4.1 (xiii)]
⇔ E^{X=x}(τ_x | Z) =_P E(τ_x | Z).                           [(5.1)]
Proposition (6.29).
E^{X=x}(Y | Z) ⊢ D_X
⇔ E^{X=x}(τ_x | Z) =_P E(τ_x | Z)                            [(6.28)]
⇔ τ_x ⊥ 1_{X=x} | Z                                          [RS-Th. 5.50]
⇔ E(τ_x | 1_{X=x}, Z) =_P E(τ_x | Z).                        [RS-(4.48)]
Proposition (6.31).
E^{X=x}(Y | Z) ⊢ D_X
⇔ E^{X=x}(τ_x | Z) =_P E(τ_x | Z)                                                              [(6.28)]
⇔ E^{X=x}(ε_x + E(τ_x | 1_{X=x}, Z) | Z) =_P E(ε_x + E(τ_x | 1_{X=x}, Z) | Z)                  [RS-Box 4.1 (xiv), (6.30)]
⇔ E^{X=x}(ε_x | Z) + E^{X=x}(E(τ_x | 1_{X=x}, Z) | Z) =_P E(ε_x | Z) + E(E(τ_x | 1_{X=x}, Z) | Z)   [RS-Box 4.1 (xvii)]
⇔ E^{X=x}(ε_x | Z) + E^{X=x}(E(τ_x | Z) | Z) =_P E(ε_x | Z) + E(E(τ_x | Z) | Z)                [(6.29)]
⇔ E^{X=x}(ε_x | Z) + E(τ_x | Z) =_P E(ε_x | Z) + E(τ_x | Z)                                    [RS-Box 4.1 (xi)]
⇔ E^{X=x}(ε_x | Z) =_P E(ε_x | Z).
Proposition (6.32).
E^{X=x}(Y | Z) ⊢ D_X
⇔ E^{X=x}(ε_x | Z) =_P E(ε_x | Z)                            [(6.31)]
⇔ ε_x ⊥ 1_{X=x} | Z                                          [RS-Th. 5.50]
⇔ E(ε_x | 1_{X=x}, Z) =_P E(ε_x | Z).                        [RS-(4.48)]
Equation (6.37). If Z is a covariate of X, then σ(Z) ⊂ σ(D_X) [see Def. 4.11 (iv) and Rem. 4.16], which implies {Z=z} = {ω ∈ Ω: Z(ω) = z} ∈ σ(D_X). Also note that E^{X=x}(Y | D_X) =_P E^{X=x}(Y | D_C) [see Def. 4.11 (iii) and RS-Def. 4.4]. Hence,
E^{X=x}(Y | Z=z) ⊢ D_X
⇔ E^{X=x}(Y | Z=z) = E(τ_x | Z=z)                            [Def. 6.13 (ii)]
⇔ E^{X=x}(Y | {Z=z}) = E(τ_x | Z=z)                          [RS-(3.23)]
⇔ E^{X=x}(E^{X=x}(Y | D_X) | {Z=z}) = E(τ_x | Z=z)           [{Z=z} ∈ σ(D_X), RS-Box 4.1 (xii)]
⇔ E^{X=x}(τ_x | Z=z) = E(τ_x | Z=z).                         [(5.1), RS-(3.23)]
Equation (6.38).
E^{X=x}(Y | Z=z) ⊢ D_X
⇔ E^{X=x}(τ_x | Z=z) = E(τ_x | Z=z)                                                                      [(6.37)]
⇔ E^{X=x}(ε_x + E(τ_x | 1_{X=x}, Z) | Z=z) = E(ε_x + E(τ_x | 1_{X=x}, Z) | Z=z)                          [(6.30)]
⇔ E^{X=x}(ε_x | Z=z) + E^{X=x}(E(τ_x | 1_{X=x}, Z) | Z=z) = E(ε_x | Z=z) + E(E(τ_x | 1_{X=x}, Z) | Z=z)  [RS-Box 3.1 (vii)]
⇔ E^{X=x}(ε_x | Z=z) + E^{X=x}(E(τ_x | 1_{X=x}, Z) | {Z=z}) = E(ε_x | Z=z) + E(E(τ_x | 1_{X=x}, Z) | {Z=z})   [RS-(3.23)]
⇔ E^{X=x}(ε_x | Z=z) + E^{X=x}(τ_x | Z=z) = E(ε_x | Z=z) + E(τ_x | Z=z)   [{Z=z} ∈ σ(1_{X=x}, Z), RS-Box 4.1 (xii), RS-(3.23)]
⇔ E^{X=x}(ε_x | Z=z) = E(ε_x | Z=z).                                                                     [(6.37)]
E(Y | X=x) ⊢ D_X ∧ E(Y | X=x′) ⊢ D_X
⇔ E(Y | X=x) = E(τ_x) ∧ E(Y | X=x′) = E(τ_{x′}) ∧ τ_x, τ_{x′} are P-unique          [Def. 6.3 (i)]
⇒ E(Y | X=x) − E(Y | X=x′) = E(τ_x) − E(τ_{x′}) ∧ τ_x, τ_{x′} are P-unique
⇔ PFE_{xx′} = E(τ_x) − E(τ_{x′}) ∧ τ_x, τ_{x′} are P-unique                          [(6.39)]
⇔ PFE_{xx′} ⊢ D_C.                                                                   [Def. 6.23 (i)]
E^{X=x}(Y | Z) ⊢ D_X ∧ E^{X=x′}(Y | Z) ⊢ D_X
⇔ E^{X=x}(Y | Z) =_P E(τ_x | Z) ∧ E^{X=x′}(Y | Z) =_P E(τ_{x′} | Z) ∧ τ_x, τ_{x′} are P-unique        [Def. 6.13 (i)]
⇒ E^{X=x}(Y | Z) − E^{X=x′}(Y | Z) =_P E(τ_x | Z) − E(τ_{x′} | Z) ∧ τ_x, τ_{x′} are P-unique          [SN-Rem. 2.76 (ii)]
⇔ PFE_{xx′}(Z) = E(τ_x | Z) − E(τ_{x′} | Z) ∧ τ_x, τ_{x′} are P-unique                                 [(6.40)]
⇔ PFE_{Z; xx′} ⊢ D_C.                                                                                  [Def. 6.23 (ii)]
E^{X=x}(Y | Z=z) ⊢ D_X ∧ E^{X=x′}(Y | Z=z) ⊢ D_X
⇔ E^{X=x}(Y | Z=z) = E(τ_x | Z=z) ∧ E^{X=x′}(Y | Z=z) = E(τ_{x′} | Z=z) ∧ τ_x, τ_{x′} are P^{Z=z}-unique   [Def. 6.13 (ii)]
⇒ E^{X=x}(Y | Z=z) − E^{X=x′}(Y | Z=z) = E(τ_x | Z=z) − E(τ_{x′} | Z=z) ∧ τ_x, τ_{x′} are P^{Z=z}-unique
⇔ PFE_{Z; xx′}(z) = E(τ_x − τ_{x′} | Z=z) ∧ τ_x, τ_{x′} are P^{Z=z}-unique                                  [RS-(3.35), (6.41)]
⇔ PFE_{Z; xx′}(z) ⊢ D_C.                                                                                    [Def. 6.23 (iii)]
E(PFE_{Z; xx′}(Z)) = E(E(τ_x − τ_{x′} | Z))          [Def. 6.23 (ii)]
                   = E(τ_x − τ_{x′})                 [RS-Box 4.1 (iv)]
                   = ATE_{xx′}.                      [(5.20), (5.21)]
Equation (6.53).
E(PFE_{Z; xx′}(Z) | V)
=_P E(E(τ_x − τ_{x′} | Z) | V)                                         [Def. 6.23 (ii)]
=_P E(E(τ_x | Z) − E(τ_{x′} | Z) | V)                                  [RS-Box 4.1 (xviii)]
=_P E(E(τ_x | 1_{X=x}, Z) − E(τ_{x′} | 1_{X=x′}, Z) | V)               [(6.29)]
=_P E(E(τ_x | 1_{X=x}, Z) | V) − E(E(τ_{x′} | 1_{X=x′}, Z) | V)        [RS-Box 4.1 (xviii)]
=_P E(τ_x | V) − E(τ_{x′} | V)          [σ(V) ⊂ σ(1_{X=x}, Z), σ(V) ⊂ σ(1_{X=x′}, Z), RS-Box 4.1 (xiii)]
=_P E(τ_x − τ_{x′} | V)                                                [RS-Box 4.1 (xviii)]
=_P CTE_{V; xx′}(V).                                                   [(5.34)]
Equation (6.54).
CTE_{V; xx′}(V) =_P E(PFE_{Z; xx′}(Z) | V)                             [(6.53)]
                =_P E(E^{X=x}(Y | Z) − E^{X=x′}(Y | Z) | V)            [(6.40)]
                =_P E(E^{X=x}(Y | Z) | V) − E(E^{X=x′}(Y | Z) | V).    [RS-Box 4.1 (xviii)]
Equation (6.59).
E(PFE_{Z; xx′}(Z) | V)
=_P E(E(τ_x − τ_{x′} | Z) | V)                                         [(6.58), Def. 6.23 (ii)]
=_P E(E(τ_x | Z) − E(τ_{x′} | Z) | V)                                  [RS-Box 4.1 (xviii)]
=_P E(E(τ_x | X, Z) − E(τ_{x′} | X, Z) | V)                            [(6.57)]
=_P E(E(τ_x | X, Z) | V) − E(E(τ_{x′} | X, Z) | V)                     [RS-Box 4.1 (xviii)]
=_P E(τ_x | V) − E(τ_{x′} | V)                                         [σ(V) ⊂ σ(X, Z), RS-Box 4.1 (xiii)]
=_P E(τ_x − τ_{x′} | V)                                                [RS-Box 4.1 (xvii)]
=_P CTE_{V; xx′}(V).                                                   [(5.30)]
Equation (6.60).
CTE_{V; xx′}(V)
=_P E(PFE_{Z; xx′}(Z) | V)                                             [(6.59)]
=_P E(E^{X=x}(Y | Z) − E^{X=x′}(Y | Z) | V)                            [(6.40)]
=_P E(E^{X=x}(Y | Z) | V) − E(E^{X=x′}(Y | Z) | V).                    [RS-Box 4.1 (xvii)]
Both sets of assumptions, those of Theorem 6.39 and those of Theorem 6.41, yield
CTE_{V; xx′}(V) =_P E(PFE_{Z; xx′}(Z) | V).
Because both sides are compositions of V and some numerical functions, RS-Remark 2.55
implies Equation (6.64). The same kind of argument proves Equation (6.65).
6.9 Exercises
⊲ Exercise 6-1 What is the difference between the two terms E (τx ) and E (Y |X =x )?
⊲ Exercise 6-2 Compute the probabilities P(Z=z) occurring in Equation (6.81) for both values of Z
in the example displayed in Table 6.2.
⊲ Exercise 6-3 What are the probabilities P(U=Tom, Z=m) and P(U=Ann, Z=m) occurring in Equation (6.81) for the example displayed in Table 6.2?
⊲ Exercise 6-4 Compute the two conditional probabilities P(U =u |Z =m) for u =Tom and u = Ann
displayed in Table 6.2.
⊲ Exercise 6-5 Use RS-Theorem 1.38 to compute the probability P(X =1) for the example displayed
in Table 6.2.
⊲ Exercise 6-6 Compute the probabilities P(U =u |X = 0) and P(U =u |X =1) displayed in Table 6.2
for all six persons.
⊲ Exercise 6-7 Compute the conditional probabilities P(U =u |X =1, Z =m) occurring in Equation
(iii) of Box 6.2 for the example of Table 6.2.
⊲ Exercise 6-8 Compute the conditional expectation values E (Y |X =0) and E (Y |X =1) for the ex-
ample in Table 6.4.
⊲ Exercise 6-9 Compute the conditional expectation values E (τ1 |Z = f ) and E (τ0 |Z = f ) displayed
in Table 6.2.
⊲ Exercise 6-10 Download Kbook Table 8.4.sav from www.causal-effects.de. This data set has been
generated from Table 6.4 for a sample of size N = 10,000. Estimate the conditional expectations
E X=0 (Y | Z ) and E X =1 (Y | Z ) and use them to compute the four conditional expectation values
E (Y |X =x , Z=z ) displayed in Table 6.4.
Solutions
⊲ Solution 6-1 The term E (τx ) denotes the expectation of a true outcome variable τx , where x is a
value of a putative cause variable. It is these true outcome variables that are of interest in the em-
pirical sciences because the difference between τx and τx ′ is the conditional effect function of x
compared to x ′ controlling for all potential confounders of X . This implies that the true outcome
variables cannot be biased, and this also applies to their expectations E (τx ). In causal research, we
often aim at estimating the differences E (τx ) −E (τx ′ ). In contrast, the differences between the con-
ditional expectation values E (Y |X =x ) and E (Y |X =x ′ ) of the outcome variable Y are not of interest
in causal research because they do not have a causal interpretation unless E (Y |X =x ) = E (τx ) and
E (Y |X =x ′ ) = E (τx ′ ), that is, unless E (Y |X =x ) and E (Y |X =x ′ ) are unbiased.
⊲ Solution 6-2 The events that U takes on the value u_i and that U takes on the value u_j, i ≠ j, are disjoint. Therefore, we can use the theorem of total probability (see RS-Th. 1.38):
P(Z=m) = P(Z=m, U=Tom) + ... + P(Z=m, U=Sue) = 1/6 + 1/6 + 1/6 + 1/6 + 0 + 0 = 4/6,
P(Z=f) = P(Z=f, U=Tom) + ... + P(Z=f, U=Sue) = 0 + 0 + 0 + 0 + 1/6 + 1/6 = 2/6.
⊲ Solution 6-3 P(U =Tom , Z =m) = 1/6 and P(U =Ann, Z =m) = 0.
⊲ Solution 6-4
P(U=Tom | Z=m) = P(U=Tom, Z=m) / P(Z=m) = (1/6) / (4/6) = 1/4.
P(U=Ann | Z=m) = P(U=Ann, Z=m) / P(Z=m) = 0 / (4/6) = 0.
⊲ Solution 6-5 The events {U=Tom}, ..., {U=Sue} are disjoint and all these events have positive probabilities. Hence we can apply the theorem of total probability (see RS-Th. 1.38):
P(X=1) = P(X=1 | U=Tom) · P(U=Tom) + ... + P(X=1 | U=Sue) · P(U=Sue)
       = 6/7 · 1/6 + 5/7 · 1/6 + 4/7 · 1/6 + 3/7 · 1/6 + 2/7 · 1/6 + 1/7 · 1/6
       = 21/42 = 1/2.
⊲ Solution 6-6 We have to use the equation
P(U=u | X=x) = P(X=x | U=u) · P(U=u) / P(X=x),
with P(U=u) = 1/6, P(X=0) = P(X=1) = 1/2, and the treatment probabilities P(X=1 | U=u) displayed in Table 6.2. This yields P(U=u | X=1) = 6/21, 5/21, 4/21, 3/21, 2/21, and 1/21 and P(U=u | X=0) = 1/21, 2/21, 3/21, 4/21, 5/21, and 6/21 for u = Tom, Tim, Joe, Jim, Ann, and Sue, respectively.
⊲ Solution 6-7 Using Equation (6.80) yields P(U=Tom | X=1, Z=m) = 6/18 as well as 5/18, 4/18, and 3/18 for the corresponding conditional probabilities for u=Tim, u=Joe, and u=Jim. The conditional probabilities P(U=Ann | X=1, Z=m) and P(U=Sue | X=1, Z=m) are zero.
⊲ Solution 6-8 According to Equation (i) of Box 6.2,
E(Y | X=0) = Σ_u E(Y | X=0, U=u) · P(U=u | X=0)
           = (68 + 78 + 88 + 98) · 1/10 + (106 + 116) · 3/10 = 33.2 + 66.6 = 99.8,
and
E(Y | X=1) = Σ_u E(Y | X=1, U=u) · P(U=u | X=1)
           = (81 + 86 + 100 + 103) · 3/14 + (114 + 130) · 1/14 = 1110/14 + 244/14 = 1354/14 ≈ 96.714.
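A quick numerical check of these two weighted sums (a throwaway sketch, using only the values quoted above):

```python
# E(Y | X=0) and E(Y | X=1) for Table 6.4 as plain weighted sums
e_y_x0 = sum(y * 1/10 for y in (68, 78, 88, 98)) + sum(y * 3/10 for y in (106, 116))
e_y_x1 = sum(y * 3/14 for y in (81, 86, 100, 103)) + sum(y * 1/14 for y in (114, 130))
print(e_y_x0, round(e_y_x1, 3))   # 99.8 and 96.714
```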
⊲ Solution 6-9 Remember again, in this example, D_X = U. Therefore, the values of the true outcome variable τ_x are the conditional expectation values E^{X=x}(Y | U=u) = E(Y | X=x, U=u). Hence,
E(τ_0 | Z=f) = Σ_u E(Y | X=0, U=u) · P(U=u | Z=f) = 68 · 0 + ... + 98 · 0 + 106 · 1/2 + 116 · 1/2 = 111,
E(τ_1 | Z=f) = Σ_u E(Y | X=1, U=u) · P(U=u | Z=f) = 81 · 0 + ... + 103 · 0 + 114 · 1/2 + 130 · 1/2 = 122.
⊲ Solution 6-10 Because Z is dichotomous, one way to estimate the conditional expectations
E X=0 (Y | Z ) and E X =1 (Y | Z ) is to estimate the linear regressions of Y on the indicator 1Z =m in treat-
ments 0 and 1, that is, within the data subsamples with x = 0 and x =1, respectively.
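A possible implementation of this estimation step is sketched below. It assumes pandas (with the pyreadstat package for reading SPSS files) and assumes that the variables in the file are named X, Y, and Z; the actual column names and value labels in the data set may differ. Because Z is dichotomous, regressing Y on the indicator 1_{Z=m} within each treatment group amounts to computing the two within-group means.

```python
# Sketch of the estimation described in Solution 6-10 (column names are assumptions)
import pandas as pd

df = pd.read_spss("Kbook Table 8.4.sav")      # requires the pyreadstat package
df["ind_m"] = (df["Z"] == "m").astype(int)    # indicator 1_{Z=m}

for x in (0, 1):
    sub = df[df["X"] == x]
    # Within treatment x, the regression of Y on the dichotomous indicator is
    # equivalent to the two conditional sample means:
    means = sub.groupby("ind_m")["Y"].mean()
    print(f"estimate of E^(X={x})(Y | Z=f):", means.loc[0])
    print(f"estimate of E^(X={x})(Y | Z=m):", means.loc[1])
```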
⊲ Solution 6-11 (i). We only have to prove E(Y | X=x) = E(τ_x | X=x) because the second equation is Equation (ii) of RS-Box 3.2. Hence, for τ_x ∈ E^{X=x}(Y | U),
E(Y | X=x) = E^{X=x}(Y)                              [RS-(3.24)]
           = E^{X=x}(E^{X=x}(Y | U))                 [RS-Box 4.1 (iv)]
           = E^{X=x}(τ_x)                            [τ_x = E^{X=x}(Y | U)]
           = E(τ_x | X=x).                           [RS-(3.24)]
(ii). Note that there is a measurable mapping g_x such that E^{X=x}(Y | U) = g_x(U) [see Rem. 5.11]. Furthermore, under the assumptions of Box 6.2, in particular P(X=x, U=u) > 0 for all u ∈ U(Ω), the true outcome variable τ_x is P-unique. Hence, according to RS-Box 3.1 (v), for τ_x ∈ E^{X=x}(Y | U),
E(τ_x) = E(E^{X=x}(Y | U))                           [τ_x = E^{X=x}(Y | U)]
       =_P E(g_x(U))                                 [E^{X=x}(Y | U) = g_x(U)]
       = Σ_u g_x(u) · P(U=u)                         [RS-(3.13)]
       = Σ_u E^{X=x}(Y | U=u) · P(U=u)               [RS-(5.24)]
       = Σ_u E(Y | X=x, U=u) · P(U=u).               [RS-(5.26)]
(iii). Note that
P(X=x, U=u) > 0
⇒ P^{X=x}(Z=z) > 0              [P^{X=x}(Z=z) = Σ_{u: f(u)=z} P^{X=x}(U=u)]
⇔ P(X=x, Z=z) > 0.              [P^{X=x}(Z=z) = P(X=x, Z=z) / P(X=x)]
Furthermore,
E^{X=x}(Y | Z) =_P E^{X=x}(E^{X=x}(Y | U) | Z)                         [RS-Box 4.1 (xiii)]
⇒ E^{X=x}(Y | Z=z) = E^{X=x}(E^{X=x}(Y | U) | Z=z)                     [P^{X=x}(Z=z) > 0, RS-Rem. 2.55]
⇔ E^{X=x}(Y | Z=z) = E^{X=x}(g_x(U) | Z=z)                             [RS-(5.23)]
and
E^{X=x}(E^{X=x}(Y | U) | Z=z) = E^{X=x}(τ_x | Z=z) = E(τ_x | X=x, Z=z),
which, together with the equivalence propositions above, proves the first equation of (iii).
(iv).
⊲ Solution 6-12
τ_x ⊥ X | Z
⇔ E(τ_x | X, Z) =_P E(τ_x | Z)                                          [RS-Def. 4.41]
⇒ E(E(τ_x | X, Z) | 1_{X=x}, Z) =_P E(E(τ_x | Z) | 1_{X=x}, Z)          [RS-Box 4.1 (xiv)]
⊲ Solution 6-13 In the examples presented in Tables 6.2 to 6.4, Z (sex) is U -measurable. According
to RS-Corollary 2.36, there is a mapping g : ΩU → {m, f } such that Z is the composite function of U
and g , that is, Z = g (U ). Therefore,
{U=u, Z=z} = {U=u} if g(u) = z, and {U=u, Z=z} = Ø otherwise.                    (6.82)
According to Equation (6.82), the event to sample person u and that the sampled person is male is identical to the event to sample person u, if that person is male [i.e., if g(u) = m]. Correspondingly, the event to sample person u and that the sampled person is female [i.e., g(u) = f] is identical to the event to sample person u, if that person is female. In contrast, the event to sample a male person u and to observe Z(ω) = g(U(ω)) = f is the empty set. The same applies to the event to sample a female person u and to observe Z(ω) = g(U(ω)) = m. Hence, Equation (6.82) implies
P(U=u, Z=z) = P(U=u) if g(u) = z, and P(U=u, Z=z) = 0 otherwise,                 (6.83)
and
P(X=x | U=u, Z=z) = P(X=x, U=u, Z=z) / P(U=u, Z=z) = P(X=x, U=u) / P(U=u).       (6.84)
Furthermore, in our examples, P(X=x, Z=z) > 0. Therefore, if P(U=u, Z=z) > 0, then
P(U=u | X=x, Z=z) = P(X=x, U=u, Z=z) / P(X=x, Z=z)
                  = [P(X=x | U=u, Z=z) · P(U=u, Z=z)] / [P(X=x | Z=z) · P(Z=z)]
                  = [P(X=x | U=u) · P(U=u, Z=z)] / [P(X=x | Z=z) · P(Z=z)],      [Eq. (6.84)]
which is Equation (6.80). If P(U=u, Z=z) = 0, then P(X=x, U=u, Z=z) = 0. Hence, if P(U=u, Z=z) = 0, then
P(U=u | X=x, Z=z) = P(X=x, U=u, Z=z) / P(X=x, Z=z) = 0.
Requirements
Reading this chapter we assume that the reader is familiar with the concepts treated in
all chapters of Steyer (2024). Chapters 4 to 6 are now crucial, dealing with the concepts of
a conditional expectation, a conditional expectation with respect to a conditional proba-
bility measure, and conditional independence. Furthermore, we assume familiarity with
chapters 4 to 6 of the present book.
In the present chapter we will often refer to the following notation and assumptions.
τ_x ⊥ 1_{X=x} :⇔ E(τ_x | 1_{X=x}) =_P E(τ_x)                                    (7.1)
τ_x ⊥ X :⇔ E(τ_x | X) =_P E(τ_x).                                               (7.2)
These are the first two causality conditions listed in Box 7.1.
Furthermore, under the Assumptions 7.1 (a) to (f) we define
∀x: τ_x ⊥ 1_{X=x} :⇔ ∀ x ∈ X(Ω): E(τ_x | 1_{X=x}) =_P E(τ_x)                    (7.3)
and call it mean-independence of τ_x from 1_{X=x} for all x. Under the same assumptions, mean-independence of all τ_x from X is defined by
∀x: τ_x ⊥ X :⇔ ∀ x ∈ X(Ω): E(τ_x | X) =_P E(τ_x).                               (7.4)
Remark 7.2 [Consequences of P-Uniqueness of the True Outcome Variables] If τ_x, τ_x* are two versions of a true outcome variable, then P-uniqueness of τ_x does not only imply that the two versions are P-equivalent, that is, τ_x =_P τ_x* (see RS-Def. 2.46), but also that their expectations are identical and their X-conditional expectations are P-equivalent [see RS-Box 4.1 (xiv)]. Without P-uniqueness of τ_x, neither τ_x =_P τ_x*, nor E(τ_x) = E(τ_x*), nor E(τ_x | X) =_P E(τ_x* | X) are guaranteed. Hence, if a true outcome variable τ_x is not P-unique, then assuming E(τ_x | X) =_P E(τ_x) does not make sense because E(τ_x) would not be uniquely defined. ⊳
Remark 7.3 [τ_x Is the Composition of D_X and a Function g_x] At first sight, mean-independence of τ_x from X seems paradoxical; however, it is not. The reason is that τ_x = E^{X=x}(Y | D_X) is a function of the global potential confounder D_X. More precisely, τ_x is a composition g_x(D_X) of D_X and a function g_x: Ω′_{D_X} → R [see RS-Eq. (5.23)], where (Ω′_{D_X}, A′_{D_X}) denotes the value space of D_X. Although τ_x refers to a specific value x of X, a true outcome variable τ_x is not a function of X. Therefore, postulating E(τ_x | X) =_P E(τ_x) does make sense, provided that τ_x is P-unique. ⊳
τ_x ⊥ X ⇒ τ_x ⊥ 1_{X=x}                                  (7.5)
        ⇔ E(Y | X=x) ⊢ D_X                               (7.6)
∀x: τ_x ⊥ X ⇒ ∀ x ∈ X(Ω): τ_x ⊥ 1_{X=x}                  (7.7)
            ⇔ ∀ x ∈ X(Ω): E(Y | X=x) ⊢ D_X               (7.8)
            ⇔ E(Y | X) ⊢ D_X                             (7.9)
(see Exercise 7-2). Hence, mean-independence of all τx from X implies unbiasedness of all
conditional expectation values E (Y |X =x ), and under the Assumptions 7.1 (a) to (f), this is
equivalent to unbiasedness of the conditional expectation E (Y |X ). ⊳
In this section, we treat some independence conditions involving the true outcome variables τ_x. These conditions also imply unbiasedness of E(Y | X=x) and E(Y | X). Remember, X ⊥⊥ Y denotes independence of two random variables X and Y (see RS-Def. 2.59). Furthermore, τ = (τ_0, τ_1, ..., τ_J), that is, τ is a (J+1)-variate random variable consisting of the true outcome variables τ_x = E^{X=x}(Y | D_X), x ∈ X(Ω) = {0, 1, ..., J}.
τ_x ⊥⊥ 1_{X=x}    Independence of τ_x and 1_{X=x}. If the Assumptions 7.1 (a) to (c) hold, then it is equivalent to
                  P(X=x | τ_x) =_P P(X=x).                                      (iii)
τ_x ⊥⊥ X          Independence of τ_x and X. If the Assumptions 7.1 (a) to (c) hold, then it is equivalent to
                  ∀ x′ ∈ X(Ω): P(X=x′ | τ_x) =_P P(X=x′).                       (iv)
The last two conditions are well-defined under the Assumptions 7.1 (a) to (c). However, only in conjunction with Assumption 7.1 (d), each of conditions (i) to (iv) implies E(Y | X=x) ⊢ D_X.
∀x: τ_x ⊥⊥ 1_{X=x}    Independence of τ_x and 1_{X=x} for all x. If the Assumptions 7.1 (a) to (c) and (e) hold, then it is equivalent to
                      ∀ x ∈ X(Ω): P(X=x | τ_x) =_P P(X=x).                      (vii)
∀x: τ_x ⊥⊥ X          Independence of τ_x and X for all x. If the Assumptions 7.1 (a) to (c) and (e) hold, then it is equivalent to
                      ∀ x, x′ ∈ X(Ω): P(X=x′ | τ_x) =_P P(X=x′).                (viii)
τ ⊥⊥ X                Independence of τ and X. If the Assumptions 7.1 (a) to (c) and (e) hold, then it is equivalent to
                      ∀ x ∈ X(Ω): P(X=x | τ) =_P P(X=x).                        (ix)
The last three conditions are well-defined under the Assumptions 7.1 (a) to (c) and (e). However, only in conjunction with Assumption 7.1 (f), each of conditions (v) to (ix) implies unbiasedness of E(Y | X) and all E(Y | X=x), x ∈ X(Ω).
τ_x ⊥⊥ 1_{X=x} ⇔ P(X=x | τ_x) =_P P(X=x)                                        (7.10)
τ_x ⊥⊥ X ⇔ ∀ x′ ∈ X(Ω): P(X=x′ | τ_x) =_P P(X=x′)                               (7.11)
∀x: τ_x ⊥⊥ 1_{X=x} ⇔ ∀ x ∈ X(Ω): P(X=x | τ_x) =_P P(X=x)                        (7.12)
∀x: τ_x ⊥⊥ X ⇔ ∀ x, x′ ∈ X(Ω): P(X=x′ | τ_x) =_P P(X=x′)                        (7.13)
τ ⊥⊥ X ⇔ ∀ x ∈ X(Ω): P(X=x | τ) =_P P(X=x).                                     (7.14)
Note again that independence of two random variables X and Y is also defined if nei-
ther X nor Y are finite or countable (see RS-Def. 2.59). However, because we assume that X
is finite or countable, the propositions on the right-hand sides above are more convenient
and more intuitive than the general definition. ⊳
Table 7.1 displays the implications among the causality conditions listed in Box 7.1. Note
that the propositions summarized in this table are special cases of the corresponding
propositions presented in Table 7.2. (For proofs see Exercises 7-3 and 7-4.)
In the sequel we study some consequences of some of the independence conditions, starting with the consequences of τ_x ⊥⊥ 1_{X=x}, that is, of independence of the true outcome variable τ_x and the indicator variable 1_{X=x}. According to Theorem 7.7, under the Assumptions 7.1 (a) to (d), τ_x ⊥⊥ 1_{X=x} implies mean-independence of τ_x from 1_{X=x}, which itself is equivalent to unbiasedness of E(Y | X=x).
Theorem 7.7 [τ_x ⊥⊥ 1_{X=x} Implies Unbiasedness of E(Y | X=x)]
Let the Assumptions 7.1 (a) to (d) hold. Then
τ_x ⊥⊥ 1_{X=x} ⇒ τ_x ⊥ 1_{X=x}                       (7.15)
               ⇔ E(Y | X=x) ⊢ D_X.                   (7.16)
(Proof p. 220)
Table 7.1 Implications among the simple causality conditions listed in Box 7.1 (rows: τ_x ⊥ X; τ_x ⊥⊥ 1_{X=x}; τ_x ⊥⊥ X; ∀x: τ_x ⊥ 1_{X=x}; ∀x: τ_x ⊥ X; ∀x: τ_x ⊥⊥ 1_{X=x}; ∀x: τ_x ⊥⊥ X; τ ⊥⊥ X; the columns list the same conditions as possible consequences).
Note: An entry such as (a)-(d) means that the condition in the row implies the condition in the column, provided that the Assumptions 7.1 (a) to (d) hold. The symbols involving ⊥ or ⊥⊥ are explained in Box 7.1. Trivial equivalences are omitted. The first three conditions imply unbiasedness of E(Y | X=x), provided that the Assumptions 7.1 (a) to (d) hold and, under the Assumptions 7.1 (a) to (f), the last five imply unbiasedness of E(Y | X) and all E(Y | X=x), x ∈ X(Ω).
τ_x ⊥⊥ X ⇒ τ_x ⊥⊥ 1_{X=x}.                           (7.17)
τ_x ⊥⊥ X ⇒ τ_x ⊥ X                                   (7.18)
         ⇒ τ_x ⊥ 1_{X=x}                             (7.19)
         ⇔ E(Y | X=x) ⊢ D_X.                         (7.20)
(Proof p. 221)
Remark 7.9 [τ ⊥⊥ X Implies τ_x ⊥⊥ 1_{X=x}] Remember, according to RS-Corollary 6.18,
P(X=x | τ_x) =_P P(X=x) ⇔ τ_x ⊥⊥ 1_{X=x}.                                       (7.21)
As mentioned before, even if we assume τ_x ⊥⊥ X for all x ∈ X(Ω), this is less restrictive than τ ⊥⊥ X. More precisely, if the Assumptions 7.1 (a) to (c) and (e) hold, then
τ ⊥⊥ X ⇒ (∀ x ∈ X(Ω): τ_x ⊥⊥ X),                                                 (7.22)
which follows from σ(τ_x) ⊂ σ(τ) and RS-Box 2.1 (iv). Note that the term on the right-hand side of Proposition (7.22) does not imply τ ⊥⊥ X. ⊳
τ ⊥⊥ X ⇔ ∀ x ∈ X(Ω): τ ⊥⊥ 1_{X=x}                                               (7.23)
       ⇒ ∀ x ∈ X(Ω): τ_x ⊥⊥ X.                                                  (7.24)
∀ x ∈ X(Ω): τ_x ⊥⊥ X ⇒ ∀ x ∈ X(Ω): τ_x ⊥ X                                      (7.25)
                     ⇒ ∀ x ∈ X(Ω): τ_x ⊥ 1_{X=x}                                (7.26)
                     ⇔ ∀ x ∈ X(Ω): E(Y | X=x) ⊢ D_X                             (7.27)
                     ⇔ E(Y | X) ⊢ D_X.                                          (7.28)
X ⊥⊥ Y | (Z=z) :⇔ X ⊥⊥ Y with respect to P^{Z=z}                                (7.29)
               ⇔ P^{Z=z}(A ∩ B) = P^{Z=z}(A) · P^{Z=z}(B), ∀ (A, B) ∈ σ(X) × σ(Y),
Y ⊥ X | (Z=z) :⇔ Y ⊥ X with respect to P^{Z=z} ⇔ E^{Z=z}(Y | X) =_{P^{Z=z}} E^{Z=z}(Y)      (7.30)
Using this notation, Theorem 7.10 implies the following corollary about the conse-
quences of (Z =z)-conditional independence of τ and X . Reading this corollary, note that
P -uniqueness of τx implies P Z=z -uniqueness of τx (see RS-Box 5.1) and that unbiasedness
of the terms E Z=z (Y |X ) and E Z=z(Y |X =x ), x ∈ X (Ω), is defined only if Z is a covariate of
X.
τ ⊥⊥ X | (Z=z) ⇔ ∀ x ∈ X(Ω): τ ⊥⊥ 1_{X=x} | (Z=z)                               (7.31)
               ⇒ ∀ x ∈ X(Ω): τ_x ⊥⊥ X | (Z=z).                                  (7.32)
∀ x ∈ X(Ω): τ_x ⊥⊥ X | (Z=z) ⇒ ∀ x ∈ X(Ω): τ_x ⊥ X | (Z=z)                      (7.33)
                             ⇒ ∀ x ∈ X(Ω): τ_x ⊥ 1_{X=x} | (Z=z).               (7.34)
Now we generalize the causality conditions treated in section 7.1, conditioning on a ran-
dom variable Z on (Ω, A, P ). Under the appropriate assumptions, which include that Z
is a covariate of X , these conditions imply unbiasedness of the conditional expectation
E (Y |X, Z ) and all Z -conditional expectations E X =x (Y |Z ), x ∈ X (Ω). These causality con-
ditions can indirectly be created by conditionally randomized assignment of the observa-
tional unit to a treatment condition, but also via covariate selection (see chs. 8 to 10).
Box 7.2 lists all causality conditions treated in this section, including their symbols and
definitions, and Table 7.2 summarizes the implications among them. The most restric-
tive of these causality conditions is Z -conditional independence of a (J + 1)-variate true
outcome variable τ and X , which is the translation (into true outcome theory) of strong
ignorability (Rosenbaum & Rubin, 1983b).
τ_x ⊥ 1_{X=x} | Z :⇔ E(τ_x | 1_{X=x}, Z) =_P E(τ_x | Z).                        (7.37)
τ_x ⊥ X | Z :⇔ E(τ_x | X, Z) =_P E(τ_x | Z).                                    (7.38)
Remark 7.15 [Dichotomous X ] If the Assumptions 7.1 (a) to (d) and (g) hold, then X be-
ing dichotomous implies
E (τx |X , Z ) =
P
E (τx | 1X =x , Z ) =
P
E (τx | 1X 6=x , Z ) (7.39)
because σ(X , Z ) = σ(1X =x , Z ) = σ(1X 6=x , Z ) (see RS-Def. 4.4). Hence, under these assump-
tions, X being dichotomous implies
τx X | Z ⇔ τx 1X =x | Z ⇔ τx 1X 6=x | Z . (7.40)
τx X | Z ⇔ E (Y |X, Z ) ⊢ D X . (7.41)
Conditions three and four in Box 7.2 are Z-conditional independence of τ_x and 1_{X=x}, denoted τ_x ⊥⊥ 1_{X=x} | Z, and Z-conditional independence of τ_x and X, denoted τ_x ⊥⊥ X | Z. Remember that the general concept of conditional independence of two random variables X and Y given a random variable Z, denoted X ⊥⊥ Y | Z, has been introduced in RS-Definition 6.2. However, under the Assumptions 7.1 (a) to (c) and (g),
τ_x ⊥⊥ 1_{X=x} | Z ⇔ P(X=x | Z, τ_x) =_P P(X=x | Z),                            (7.42)
and
τ_x ⊥⊥ X | Z ⇔ ∀ x′ ∈ X(Ω): P(X=x′ | τ_x, Z) =_P P(X=x′ | Z).                   (7.43)
τ_x ⊥⊥ 1_{X=x} | Z    Z-conditional independence of τ_x and 1_{X=x}. Under the Assumptions 7.1 (a) to (c) and (g), it is equivalent to
                      P(X=x | Z, τ_x) =_P P(X=x | Z).                            (iii)
τ_x ⊥⊥ X | Z          Z-conditional independence of τ_x and X. Under the Assumptions 7.1 (a) to (c) and (g), it is equivalent to
                      ∀ x′ ∈ X(Ω): P(X=x′ | τ_x, Z) =_P P(X=x′ | Z).             (iv)
If we additionally assume 7.1 (d) and Z is a covariate of X, then each of conditions (i) to (iv) implies E^{X=x}(Y | Z) ⊢ D_X.
∀x: τ_x ⊥⊥ 1_{X=x} | Z    Z-conditional independence of τ_x and 1_{X=x} for all x. Under the Assumptions 7.1 (a) to (c), (e) and (g), it is equivalent to
                          ∀ x ∈ X(Ω): P(X=x | Z, τ_x) =_P P(X=x | Z).            (vii)
∀x: τ_x ⊥⊥ X | Z          Z-conditional independence of τ_x and X for all x. Under the Assumptions 7.1 (a) to (c), (e) and (g), it is equivalent to
                          ∀ x, x′ ∈ X(Ω): P(X=x′ | τ_x, Z) =_P P(X=x′ | Z).      (viii)
τ ⊥⊥ X | Z                Z-conditional independence of τ and X (strong ignorability). Under the Assumptions 7.1 (a) to (c), (e) and (g), it is equivalent to
                          ∀ x ∈ X(Ω): P(X=x | Z, τ) =_P P(X=x | Z).              (ix)
If we additionally assume 7.1 (f) and Z is a covariate of X, then each of conditions (v) to (ix) implies E(Y | X, Z) ⊢ D_X and E^{X=x}(Y | Z) ⊢ D_X, for all x ∈ X(Ω).
Also note that τ_x ⊥⊥ X | Z implies τ_x ⊥⊥ 1_{X=x} | Z but not vice versa, unless X is dichotomous.
Conditions (v) to (viii) of Box 7.2 postulate that conditions (i) to (iv) hold for all values
x of X . Under the appropriate assumptions including that Z is a covariate of X , these
conditions imply unbiasedness of E (Y |X, Z ) and E X =x (Y |Z ), for all values x ∈ X (Ω). Note
that condition (viii) has been proposed by Porta (2014, p. 142).
Finally, the last condition in Box 7.2 is Z -conditional independence of τ and X , denoted
τ⊥⊥X |Z , where τ = (τ0 , τ1 , . . . , τ J ) is a J + 1-dimensional random variable consisting of the
true outcome variables τx , x ∈ X (Ω). Under the Assumptions 7.1 (a) to (c), (e) and (g), it is
equivalent to
τ ⊥⊥ X | Z ⇔ ∀ x ∈ X(Ω): P(X=x | Z, τ) =_P P(X=x | Z)                           (7.44)
P(X=x | D_X) >_P 0                                                              (7.45)
Table 7.2 displays the implications among the causality conditions listed in Box 7.2. In this section, we present some theorems in which most of these implications are proved. The solution to Exercise 7-4 provides a guide to the proofs of all these implications.
In the following theorem, we present some propositions about the consequences of τ_x ⊥ X | Z concerning unbiasedness.
τ_x ⊥ X | Z ⇒ τ_x ⊥ 1_{X=x} | Z.                     (7.46)
τ_x ⊥ 1_{X=x} | Z ⇔ E^{X=x}(Y | Z) ⊢ D_X.            (7.47)
(Proof p. 221)
τ_x ⊥⊥ 1_{X=x} | Z ⇒ τ_x ⊥ 1_{X=x} | Z.              (7.50)
τ_x ⊥ 1_{X=x} | Z ⇔ E^{X=x}(Y | Z) ⊢ D_X.            (7.51)
(Proof p. 222)
Again, note that Proposition (7.50) holds for any value x of X for which τx is P-unique.
Hence, if τx is P-unique for all x ∈ X (Ω), then Proposition (7.50) holds for all x ∈ X (Ω). Cor-
respondingly, if τx is P-unique for all x ∈ X (Ω) and Z is a covariate of X , then Proposition
(7.51) holds for all x ∈ X (Ω).
In Theorem 7.19 we considered Z -conditional independence of τx and an indicator
variable 1X =x . In the following theorem we turn to Z -conditional independence of τx and
X , which may take on more than just two different values.
Table 7.2 Implications among the Z-conditional causality conditions listed in Box 7.2 (rows: τ_x ⊥ X | Z; τ_x ⊥⊥ 1_{X=x} | Z; τ_x ⊥⊥ X | Z; ∀x: τ_x ⊥ 1_{X=x} | Z; ∀x: τ_x ⊥ X | Z; ∀x: τ_x ⊥⊥ 1_{X=x} | Z; ∀x: τ_x ⊥⊥ X | Z; τ ⊥⊥ X | Z; the columns list the same conditions as possible consequences).
Note: An entry such as (a)-(g) means that the condition in the row implies the condition in the column, provided that we assume 7.1 (a) to (g). The symbols involving ⊥ or ⊥⊥ are explained in Box 7.2. Trivial equivalences are omitted. If we additionally assume that Z is a covariate of X, then the first three conditions listed in the first column of the table imply unbiasedness of E^{X=x}(Y | Z), provided that the Assumptions 7.1 (a) to (d) and (g) hold. The last five conditions imply unbiasedness of E(Y | X, Z) and all E^{X=x}(Y | Z), x ∈ X(Ω), if Z is a covariate of X and we assume 7.1 (a) to (g).
τ_x ⊥⊥ X | Z ⇒ τ_x ⊥ X | Z.                          (7.52)
Furthermore,
τ_x ⊥⊥ X | Z ⇒ τ_x ⊥⊥ 1_{X=x} | Z                    (7.53)
             ⇒ τ_x ⊥ 1_{X=x} | Z.                    (7.54)
(Proof p. 222)
τ ⊥⊥ X | Z ⇔ ∀ x ∈ X(Ω): τ ⊥⊥ 1_{X=x} | Z                                       (7.55)
           ⇒ ∀ x, x′ ∈ X(Ω): τ_x ⊥⊥ 1_{X=x′} | Z                                (7.56)
           ⇔ ∀ x ∈ X(Ω): τ_x ⊥⊥ X | Z                                           (7.57)
           ⇒ ∀ x ∈ X(Ω): τ_x ⊥⊥ 1_{X=x} | Z.                                    (7.58)
∀ x ∈ X(Ω): τ_x ⊥⊥ X | Z ⇒ ∀ x ∈ X(Ω): τ_x ⊥ X | Z                              (7.59)
                         ⇒ ∀ x ∈ X(Ω): τ_x ⊥ 1_{X=x} | Z                        (7.60)
and
∀ x ∈ X(Ω): τ_x ⊥⊥ 1_{X=x} | Z ⇒ ∀ x ∈ X(Ω): τ_x ⊥ 1_{X=x} | Z.                 (7.61)
Remark 7.22 [Methodological Consequences] What has been said in Remark 7.11 about the causality conditions for E(Y | X) and its values E(Y | X=x) also applies to the causality conditions for E(Y | X, Z) and for the conditional expectations E^{X=x}(Y | Z). There is no direct way to create conditions such as τ ⊥⊥ X | Z or ∀x: τ_x ⊥ X | Z. There is also no direct way to select the (possibly multivariate) random variable Z such that these conditions hold. However, in chapters 8 to 10 we will treat other causality conditions that imply τ ⊥⊥ X | Z and ∀x: τ_x ⊥ X | Z. These conditions can be created via conditionally randomized assignment of the observational unit (person) to a treatment condition, but also via an appropriate selection of Z. Hence, there are indirect ways to create τ ⊥⊥ X | Z and ∀x: τ_x ⊥ X | Z, and with them unbiasedness of the conditional expectations E(Y | X, Z) and the conditional expectations E^{X=x}(Y | Z), x ∈ X(Ω). Remember, it is these conditional expectations that can be estimated in data samples. ⊳
7.4 Examples
In this section, we study two examples. In the first one, there is Z -conditional mean-inde-
pendence of τx from 1X =x for all values x of X . In the second one, Z -conditional indepen-
dence of τ from X holds.
Table 7.3 displays the parameters of a random experiment in which Z-conditional mean-independence of τ_x from 1_{X=x} holds for each value x of X. However, neither Z-conditional mean-independence of all τ_x from X nor Z-conditional independence of τ from X holds. We assume that this random experiment has the same structure as the random experiments treated in section 6.5. That is,
((Ω, A, P), (F_t)_{t∈T}, C, D_C, X, Y),
as specified in section 6.5, is the regular probabilistic causality setup. The only difference is that Ω_2 = Ω_X = {treatment 0, treatment 1, treatment 2} now consists of three (instead of two) treatment conditions. Again, U is a global potential confounder of X.
The upper left part of Table 7.3 displays the true outcomes under treatments 0, 1, and
2, the probabilities P (U =u ) for each person u to be sampled, as well as the conditional
probabilities P (X =1|U =u ) and P (X =2 |U =u ) to be assigned to treatment 1 and treat-
ment 2, respectively. All other parameters, such as the associated individual causal total
effects CTE U ;10 (u) and CTE U ; 20 (u) or the conditional probabilities P (U =u |X =x ), can be
computed from those ‘fundamental parameters’. The table also displays the values of the
covariate (potential confounder) Z =sex. To emphasize, the table does not contain sample
data; it displays the parameters describing the laws of a random experiment, the single-
unit trial, which consists of (a) sampling a person from the set of (the six) persons, (b)
assigning or registering the (self-) assignment of the person to one of the three treatment
conditions, and (c) observing the value of the outcome variable. (For more details on such
single-unit trials see ch. 2).
In this random experiment, the true treatment probabilities and true outcomes are such that τ_x ⊥ 1_{X=x} | Z holds for each of the three values x of X. By definition, τ_x ⊥ 1_{X=x} | Z is equivalent to
E(τ_x | 1_{X=x}, Z) =_P E(τ_x | Z),                  (7.64)
Table 7.3 Fundamental parameters (for each person u: sex z, P(U=u), the treatment probabilities P(X=1 | U=u) and P(X=2 | U=u), the true outcomes E^{X=0}(Y | U=u), E^{X=1}(Y | U=u), and E^{X=2}(Y | U=u), the individual effects CTE_{U;10}(u) and CTE_{U;20}(u), and the conditional probabilities P(U=u | X=0), P(U=u | X=1), and P(U=u | X=2)). The marginal parameters are:

                  x=0        x=1        x=2
E(τ_x):           93         105        105.333
E(Y | X=x):       102.857    101.186    105.333
ATE_10 = 12,  ATE_20 = 12.333;  PFE_10 = −1.671,  PFE_20 = 2.476
According to RS-Theorem 5.51 and RS-Equations (5.49) and (5.50), Equation (7.64) holds if and only if E(τ_x | X=x, Z=z) = E(τ_x | Z=z), for all values z of Z.
The values of the true outcome variable τ_x = E^{X=x}(Y | U) = g_x(U) [see RS-Eq. (5.23)] are the conditional expectation values E^{X=x}(Y | U=u) = E(Y | X=x, U=u). Now,
E(τ_x | X=x, Z=z) = E(g_x(U) | X=x, Z=z)
                  = Σ_u g_x(u) · P(U=u | X=x, Z=z)                              (7.66)
                  = Σ_u E^{X=x}(Y | U=u) · P(U=u | X=x, Z=z),     ∀ z ∈ {m, f}.
In contrast,
E(τ_x | Z=z) = E(g_x(U) | Z=z)
             = Σ_u g_x(u) · P(U=u | Z=z)                                        (7.67)
             = Σ_u E^{X=x}(Y | U=u) · P(U=u | Z=z),               ∀ z ∈ {m, f}.
According to Equation (7.66), the conditional expectation value E (τ1 |X =1, Z =z) can be
computed from Tables 7.3 and 7.4 by
Table 7.4 The conditional probabilities P(U=u | X=x, Z=z) for each person u, x ∈ {0, 1, 2}, and z ∈ {m, f}.
E(τ_1 | X=1, Z=m) = Σ_u E^{X=1}(Y | U=u) · P(U=u | X=1, Z=m)
                  = 87 · 6/17 + 80 · 5/17 + 73 · 6/17 = 80.
This is exactly the same result as obtained for
E(τ_1 | Z=m) = Σ_u E^{X=1}(Y | U=u) · P(U=u | Z=m)
             = (87 + 80 + 73) · 1/3 = 80.
Hence, E(τ_1 | X=1, Z=m) = E(τ_1 | Z=m), and the corresponding equations hold for the second value of Z, namely Z = f.
Now we show that τ_x ⊥ X | Z does not hold in the example of Table 7.3 for at least one value x of X. Of course, this implies that ∀x: τ_x ⊥ X | Z does not hold either. We start gathering the relevant equations. By definition, ∀x: τ_x ⊥ X | Z is equivalent to
E(τ_x | X, Z) =_P E(τ_x | Z),     ∀ x ∈ X(Ω)                    (7.68)
[see Eq. (7.38)]. As mentioned before, in this example U is a global potential confounder
of X , the image of Ω under X is X (Ω) = {0, 1, 2}, and τx = E X =x (Y |U ) is P-unique for all
x ∈ X(Ω). Hence, because P(X=x, Z=z) > 0 for all pairs (x, z) of values of (X, Z), Equation (7.68) is equivalent to
E(τ_x | X=x′, Z=z) = E(τ_x | Z=z),     ∀ (x, x′, z) ∈ {0, 1, 2}² × {m, f}                    (7.69)
[see RS-Th. 4.42 (ii)].
The values of the true outcome variable τ_x = E^{X=x}(Y | U) = g_x(U) are the conditional expectation values E^{X=x}(Y | U=u) = E(Y | X=x, U=u). Hence, applying RS-Equation (3.28) to the left-hand side of Equation (7.69) yields
E(τ_x | X=x′, Z=z) = E(g_x(U) | X=x′, Z=z)
                   = Σ_u g_x(u) · P(U=u | X=x′, Z=z)                                        (7.70)
                   = Σ_u E^{X=x}(Y | U=u) · P(U=u | X=x′, Z=z),   ∀ (x, x′, z) ∈ {0, 1, 2}² × {m, f}.
In contrast, applying RS-Equation (3.28) to the right-hand side of Equation (7.69) yields
E(τ_x | Z=z) = E(g_x(U) | Z=z)
             = Σ_u g_x(u) · P(U=u | Z=z)                                                    (7.71)
             = Σ_u E^{X=x}(Y | U=u) · P(U=u | Z=z),               ∀ (x, z) ∈ {0, 1, 2} × {m, f}.
In order to show that ∀x: τ_x ⊥ X | Z does not hold in this example, it suffices to show that there is a triple (x, x′, z) ∈ {0, 1, 2}² × {m, f} for which E(τ_x | X=x′, Z=z) ≠ E(τ_x | Z=z) [see Eq. (7.69)].
According to Equation (7.70), the conditional expectation value E(τ_2 | X=1, Z=f) can be computed from Tables 7.3 and 7.4 by
E(τ_2 | X=1, Z=f) = Σ_u E^{X=2}(Y | U=u) · P(U=u | X=1, Z=f)
                  = 90 · 2/5 + 120 · 1/5 + 126 · 2/5 = 110.4.
In contrast,
E(τ_2 | Z=f) = Σ_u E^{X=2}(Y | U=u) · P(U=u | Z=f)
             = 90 · 1/3 + 120 · 1/3 + 126 · 1/3 = 112.
Hence, E(τ_2 | X=1, Z=f) ≠ E(τ_2 | Z=f), and this proves that ∀x: τ_x ⊥ X | Z does not hold in this example.
To summarize: Whereas, in this example, τ_x ⊥ 1_{X=x} | Z holds for all values x of X (and with it, unbiasedness of the conditional expectations E(Y | X, Z), E^{X=x}(Y | Z), and the conditional expectation values E(Y | X=x, Z=z)), the more restrictive condition ∀x: τ_x ⊥ X | Z does not hold. Hence, τ ⊥⊥ X | Z (strong ignorability), which implies ∀x: τ_x ⊥ X | Z, does not hold either in this example (see Exercise 7-5).
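The two comparisons above follow the same pattern, namely comparing two weighted sums of the values E^{X=x}(Y | U=u) with different weights [Eqs. (7.70) and (7.71)]. A small Python helper like the following makes such checks routine; it uses only the numbers quoted in this section, not the full Table 7.3.

```python
# Compare E(tau_x | X=x', Z=z) with E(tau_x | Z=z): both are weighted sums of the
# true outcomes E^{X=x}(Y | U=u), with weights P(U=u | X=x', Z=z) and P(U=u | Z=z).

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights))

# tau_1 for the three male persons of Table 7.3 (values quoted above):
tau1_m = [87, 80, 73]
print(weighted_mean(tau1_m, [6/17, 5/17, 6/17]))  # E(tau_1 | X=1, Z=m) = 80
print(weighted_mean(tau1_m, [1/3, 1/3, 1/3]))     # E(tau_1 | Z=m)      = 80  -> equal

# tau_2 for the three female persons (values quoted above):
tau2_f = [90, 120, 126]
print(weighted_mean(tau2_f, [2/5, 1/5, 2/5]))     # E(tau_2 | X=1, Z=f) = 110.4
print(weighted_mean(tau2_f, [1/3, 1/3, 1/3]))     # E(tau_2 | Z=f)      = 112  -> not equal
```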
Table 7.5 displays the parameters of a random experiment in which there is Z -conditional
independence of τ and X . Again we assume that this random experiment has the same
structure as the random experiments treated in section 6.5. That is,
Table 7.5 Fundamental parameters, with the covariate Z = (Z_1, Z_2) = (sex, college)

Person u  Sex z1  College z2  P(U=u)  P(X=1|U=u)  P(X=1|Z=(z1,z2))  E^{X=0}(Y|U=u)  E^{X=1}(Y|U=u)  CTE_{U;10}(u)  P(U=u|X=0)  P(U=u|X=1)
Tom       m       no          1/6     7/8         6/8               72              83              11             1/22        7/26
Tim       m       no          1/6     5/8         6/8               72              83              11             3/22        5/26
Joe       m       yes         1/6     5/8         5/8               95              100             5              3/22        5/26
Jim       m       yes         1/6     5/8         5/8               100             105             5              3/22        5/26
Ann       f       yes         1/6     2/8         2/8               106             114             8              6/22        2/26
Sue       f       yes         1/6     2/8         2/8               116             130             14             6/22        2/26

                              x=0       x=1
E(τ_x):                       93.5      102.5     ATE_10 = 9
E(Y | X=x):                   100.227   96.5      PFE_10 = −3.727
E(τ_x | Z=(m, no)):           72        83        CTE_{Z;10}(m, no) = 11
E(Y | X=x, Z=(m, no)):        72        83        PFE_{Z;10}(m, no) = 11
E(τ_x | Z=(m, yes)):          97.5      102.5     CTE_{Z;10}(m, yes) = 5
E(Y | X=x, Z=(m, yes)):       97.5      102.5     PFE_{Z;10}(m, yes) = 5
E(τ_x | Z=(f, yes)):          111       122       CTE_{Z;10}(f, yes) = 11
E(Y | X=x, Z=(f, yes)):       111       122       PFE_{Z;10}(f, yes) = 11
((Ω, A, P), (F_t)_{t∈T}, C, D_C, X, Y),
as specified in section 6.5, is the regular probabilistic causality setup. This also includes
the set Ω2 = ΩX = {control, treatment }, represented by the values 0 and 1 of the treatment
variable X . And again, U is a global potential confounder of X .
The upper left part of Table 7.5 displays the true outcomes under treatment and under
control as well as the probabilities for each person to be assigned to treatment condition
1. In the random experiment presented in this table, the true treatment probabilities and
true outcomes are such that Z -conditional independence of τ and X (i. e., Z -conditional
strong ignorability) holds.
We check whether τ ⊥⊥ X | Z actually holds by checking
P(X=1 | Z, τ) =_P P(X=1 | Z),                    (7.72)
which is equivalent to
P(X=0 | Z, τ) =_P P(X=0 | Z)                     (7.73)
because X is binary. According to the same theorem, this equation is also equivalent to τ ⊥⊥ 1_{X=1} | Z, and because, in this example, 1_{X=1} = X, it is equivalent to τ ⊥⊥ X | Z as well. Furthermore, in this example, P(Z=z, τ=t) > 0 for all pairs (z, t) ∈ Z(Ω) × τ(Ω). Therefore, Equation (7.72) is also equivalent to
P(X=1 | Z=z, τ=t) = P(X=1 | Z=z),     ∀ (z, t) ∈ Z(Ω) × τ(Ω)                    (7.74)
[see RS-Th. 4.42 and RS-Eq. (3.26)]. Note that, in this example, Z = (Z_1, Z_2) and τ = (τ_0, τ_1) are two-dimensional random variables.
In the sequel, we use
P(X=x | V=v) = Σ_u P(X=x | V=v, U=u) · P(U=u | V=v),                    (7.75)
which is always true if P(V=v, U=u) > 0 for all values of U [see RS-Box 3.2 (ii) for Y = 1_{X=x}]. Using Equation (7.75) for the parameters displayed in Table 7.5 with Z = (Z_1, Z_2) taking the role of V and considering x=1 and z=(m, no) yields
P(X=1 | Z=(m, no)) = Σ_u P(X=1 | Z=(m, no), U=u) · P(U=u | Z=(m, no))
                   = 7/8 · 1/2 + 5/8 · 1/2 = 6/8.
In this case, we only have to sum over the first two persons displayed in Table 7.5 because, for the other four persons, the probabilities P(U=u | Z=(m, no)) are zero. Applying Equation (7.75) to V = (Z, τ) and the combination of values z = (m, no) and t = (72, 83) yields exactly the same probability:
P(X=1 | Z=(m, no), τ=(72, 83)) = Σ_u P(X=1 | Z=(m, no), τ=(72, 83), U=u) · P(U=u | Z=(m, no), τ=(72, 83))
                               = 7/8 · 1/2 + 5/8 · 1/2 = 6/8
(see the first two rows of Table 7.5). Hence, we have shown
P(X=1 | Z=(m, no)) = P(X=1 | Z=(m, no), τ=(72, 83)) = 6/8.
Again using Equation (7.75) with Z taking the role of V and considering the case x=1 and z=(m, yes) yields
P(X=1 | Z=(m, yes)) = Σ_u P(X=1 | Z=(m, yes), U=u) · P(U=u | Z=(m, yes))
                    = 5/8 · 1/2 + 5/8 · 1/2 = 5/8.
In this case, we only have to sum over persons three and four displayed in Table 7.5; for the other four persons, the conditional probabilities P(U=u | Z=(m, yes)) are zero.
Applying Equation (7.75) to V = (Z, τ) and the combination of values z = (m, yes) and t = (95, 100) yields exactly the same conditional probability:
P(X=1 | Z=(m, yes), τ=(95, 100)) = Σ_u P(X=1 | Z=(m, yes), τ=(95, 100), U=u) · P(U=u | Z=(m, yes), τ=(95, 100))
                                 = 5/8 · 1 = 5/8
(see the third row of Table 7.5). The same result is obtained if we apply Equation (7.75) to V = (Z, τ) and the combination of values z = (m, yes) and t = (100, 105) (see the fourth row of Table 7.5). Hence we have shown
P(X=1 | Z=(m, yes)) = P(X=1 | Z=(m, yes), τ=(95, 100)) = P(X=1 | Z=(m, yes), τ=(100, 105)) = 5/8.
This proves that Proposition (7.74), and with it Equation (7.72), which is equivalent to τ ⊥⊥ X | Z, holds in this example.
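The checks just carried out by hand can also be automated. The following sketch encodes the six persons of Table 7.5 and compares P(X=1 | Z=z) with P(X=1 | Z=z, τ=t) for every value combination; equality for all combinations is exactly what τ ⊥⊥ X | Z requires in this example.

```python
# Check Z-conditional independence of tau and X for the example of Table 7.5.
# Each person is sampled with probability 1/6, so the conditional probabilities
# reduce to weighted averages of the individual treatment probabilities.
persons = {
    #        Z = (sex, college)   P(X=1|U=u)  tau_0  tau_1
    "Tom": (("m", "no"),  7/8,  72,  83),
    "Tim": (("m", "no"),  5/8,  72,  83),
    "Joe": (("m", "yes"), 5/8,  95, 100),
    "Jim": (("m", "yes"), 5/8, 100, 105),
    "Ann": (("f", "yes"), 2/8, 106, 114),
    "Sue": (("f", "yes"), 2/8, 116, 130),
}
p_u = 1 / len(persons)  # P(U=u) = 1/6 for every person

def p_x1_given(select):
    """P(X=1 | event), where the event is described by a predicate on a person's row."""
    rows = [row for row in persons.values() if select(row)]
    return sum(p_x1 * p_u for (_, p_x1, _, _) in rows) / (len(rows) * p_u)

for z in sorted({row[0] for row in persons.values()}):
    lhs = p_x1_given(lambda row: row[0] == z)                      # P(X=1 | Z=z)
    for t in sorted({(row[2], row[3]) for row in persons.values() if row[0] == z}):
        rhs = p_x1_given(lambda row: row[0] == z and (row[2], row[3]) == t)
        print(z, t, lhs, rhs, abs(lhs - rhs) < 1e-12)              # True for every (z, t)
```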
7.5 Summary and Conclusions
In this chapter, we treated some causality conditions, all of which involve the true outcome variables τ_x. The simple (i.e., the unconditional) ones are listed in Box 7.1, the conditional ones in Box 7.2. The implication relations among the simple causality conditions are listed in Table 7.1, whereas the implications among the conditional ones are found in Table 7.2. According to the last row of Table 7.2, Z-conditional independence of τ and X, the translation of Rosenbaum and Rubin's Z-conditional strong ignorability into true outcome theory, is the strongest, that is, the most restrictive condition among those causality conditions in which we condition on a covariate Z of X; it implies all other causality conditions listed in that table. The same applies to independence of τ and X, which implies all other causality conditions listed in Table 7.1.
Note that there are no implications between the simple causality conditions summarized in Table 7.1 and the conditional ones listed in Table 7.2. Even the strongest (most restrictive) condition τ ⊥⊥ X does not imply any of the causality conditions listed in Table 7.2, or vice versa. This implies, for example, that ∀x: τ_x ⊥ 1_{X=x}, which is equivalent to unbiasedness of E(Y | X), does not imply ∀x: τ_x ⊥ 1_{X=x} | Z, which is equivalent to unbiasedness of E(Y | X, Z) (see also the example described in section 6.6). Furthermore, τ ⊥⊥ X | Z does not imply τ ⊥⊥ X or any of the simple causality conditions listed in Box 7.1.
Limitations
The causality conditions treated in the present chapter have three important limitations:
They are not generalizable, they can only indirectly be created via experimental design
techniques, and they are not testable in empirical applications.
The term not generalizable refers to the fact mentioned above that, for example, τ ⊥⊥ X does not imply τ ⊥⊥ X | Z, not even if Z is a covariate of X. This has serious disadvantages
7.6 Proofs
τ_x ⊥⊥ 1_{X=x} ⇒ τ_x ⊥ 1_{X=x}                       [RS-Th. 4.40]
⇔ E(τ_x | 1_{X=x}) =_P E(τ_x)                        [RS-Def. 4.36]
⇔ E(Y | X=x) ⊢ D_X.                                  [Th. 6.9 (i)]
Proposition (7.17). This proposition immediately follows from the fact that σ(1_{X=x}) ⊂ σ(X) and RS-Box 2.1 (iv).
Propositions (7.18) to (7.20). With 7.1 (d) we assume that τ_x is P-unique. Under this assumption,
τ_x ⊥⊥ X ⇒ τ_x ⊥ X                                   [RS-Th. 4.40]
         ⇒ τ_x ⊥ 1_{X=x}                             [σ(1_{X=x}) ⊂ σ(X), RS-(4.45)]
         ⇔ E(Y | X=x) ⊢ D_X.                         [(7.16)]
Proposition (7.46). According to Assumption 7.1 (d), the true outcome variable τ_x is P-unique. Hence,
τ_x ⊥ X | Z
⇔ E(τ_x | X, Z) =_P E(τ_x | Z)                                                  [(7.38)]
⇒ E(E(τ_x | X, Z) | 1_{X=x}, Z) =_P E(E(τ_x | Z) | 1_{X=x}, Z)                  [RS-Box 4.1 (xiv)]
⇔ E(τ_x | 1_{X=x}, Z) =_P E(E(τ_x | Z) | 1_{X=x}, Z)                            [σ(1_{X=x}, Z) ⊂ σ(X, Z), RS-Box 4.1 (xiii)]
⇔ E(τ_x | 1_{X=x}, Z) =_P E(τ_x | Z)                                            [σ(E(τ_x | Z)) ⊂ σ(1_{X=x}, Z), RS-Box 4.1 (xi)]
⇔ τ_x ⊥ 1_{X=x} | Z.                                                            [(7.37)]
Proposition (7.47).
τ_x ⊥ 1_{X=x} | Z ⇔ E(τ_x | 1_{X=x}, Z) =_P E(τ_x | Z)                          [(7.37)]
                  ⇔ E^{X=x}(Y | Z) ⊢ D_X.                                       [Th. 6.20 (i)]
Proposition (7.48).
Proposition (7.49).
τ_x ⊥⊥ 1_{X=x} | Z ⇒ τ_x ⊥ 1_{X=x} | Z.                                         [RS-Box 4.1 (xiv), RS-Th. 6.8]
τ_x ⊥ 1_{X=x} | Z ⇔ E^{X=x}(Y | Z) ⊢ D_X.                                       [(7.47)]
τ_x ⊥⊥ X | Z ⇒ τ_x ⊥ X | Z.                                                     [RS-Th. 6.8]
τ_x ⊥⊥ X | Z ⇒ τ_x ⊥⊥ 1_{X=x} | Z                                               [σ(1_{X=x}) ⊂ σ(X), RS-Box 6.1 (vi)]
             ⇒ τ_x ⊥ 1_{X=x} | Z.                                               [τ_x is P-unique, RS-Th. 6.8]
Propositions (7.55) to (7.58). Under Assumptions 7.1 (a) to (c), (e), and (g),
τ ⊥⊥ X | Z
⇔ ∀ x ∈ X(Ω): τ ⊥⊥ 1_{X=x} | Z                                                  [RS-(6.8)]
⇒ ∀ x, x′ ∈ X(Ω): τ_x ⊥⊥ 1_{X=x′} | Z                                           [σ(τ_x) ⊂ σ(τ), RS-Box 6.1 (vi)]
⇔ ∀ x ∈ X(Ω): τ_x ⊥⊥ X | Z                                                      [RS-(6.8)]
⇔ ∀ x, x′ ∈ X(Ω): τ_x ⊥⊥ 1_{X=x′} | Z                                           [RS-(6.8)]
⇒ ∀ x ∈ X(Ω): τ_x ⊥⊥ 1_{X=x} | Z.                                               [x ∈ X(Ω)]
Propositions (7.59) and (7.60). Under the Assumptions 7.1 (a) to (g),
∀ x ∈ X(Ω): τ_x ⊥⊥ X | Z
⇒ ∀ x ∈ X(Ω): τ_x ⊥ X | Z                                                       [τ_x is P-unique, RS-Th. 6.8]
⇒ ∀ x ∈ X(Ω): τ_x ⊥ 1_{X=x} | Z.                                                [σ(1_{X=x}) ⊂ σ(X), RS-(4.52)]

∀ x ∈ X(Ω): τ_x ⊥⊥ 1_{X=x} | Z
⇒ ∀ x ∈ X(Ω): τ_x ⊥ 1_{X=x} | Z.                                                [τ_x is P-unique, RS-Th. 6.8]

∀ x ∈ X(Ω): τ_x ⊥ 1_{X=x} | Z
⇔ ∀ x ∈ X(Ω): E^{X=x}(Y | Z) ⊢ D_X                                              [Th. 6.20 (i), (7.38)]
⇔ E(Y | X, Z) ⊢ D_X.                                                            [Def. 6.18 (i)]
7.7 Exercises
⊲ Exercise 7-4 Check the implications listed in Table 7.2 and find their proofs in this chapter.
⊲ Exercise 7-5 Change a single number in Table 7.3 so that in this modified example τ⊥
⊥X |(Z = f )
holds. Use the Causal Effects Xplorer to check this condition.
⊲ Exercise 7-6 Show that P(X=x | D_X) >_P 0 implies P(X=x | Z) >_P 0, if Z is a covariate of X.
Solutions
⊲ Solution 7-2 Under the Assumptions 7.1 (a) to (f ), Propositions (7.7) to (7.8) immediately follow
from Propositions (7.5) to (7.6), and Proposition (7.9) follows from Definition 6.3 (ii).
⊲ Solution 7-3
τ ⊥⊥ X ⇔ ∀ x ∈ X(Ω): P(X=x | τ) =_P P(X=x)                    [(7.14)]
       ⇔ ∀x: τ_x ⊥⊥ X.                                         [(7.13)]
⊲ Solution 7-4 The propositions of Table 7.2 are considered row wise.
Row 1
(21) Under the Assumptions 7.1 (a) to (c), (e) and (g): τ ⊥⊥ X | Z ⇒ τ_x ⊥⊥ 1_{X=x} | Z.
     This follows from Propositions (7.55) and (7.58).
(22) Under the Assumptions 7.1 (a) to (c), (e) and (g): τ ⊥⊥ X | Z ⇒ τ_x ⊥⊥ X | Z.
     This follows from Propositions (7.55) to (7.57).
(23) Under the Assumptions 7.1 (a) to (g): τ ⊥⊥ X | Z ⇒ (∀ x ∈ X(Ω): τ_x ⊥ 1_{X=x} | Z).
     This follows from Propositions (7.55) to (7.60).
(24) Under the Assumptions 7.1 (a) to (g): τ ⊥⊥ X | Z ⇒ (∀ x ∈ X(Ω): τ_x ⊥ X | Z).
     This immediately follows from Propositions (7.55) to (7.59).
(25) Under the Assumptions 7.1 (a) to (c), (e) and (g): τ ⊥⊥ X | Z ⇒ (∀ x ∈ X(Ω): τ_x ⊥⊥ 1_{X=x} | Z).
     This follows from Propositions (7.55) to (7.58).
(26) Under the Assumptions 7.1 (a) to (c), (e) and (g): τ ⊥⊥ X | Z ⇒ (∀ x ∈ X(Ω): τ_x ⊥⊥ X | Z).
     This follows from Propositions (7.55) to (7.57).
are defined only if the values x of X have a positive probability P (X =x ). For example, if
X is normally distributed, then P (X =x ) = 0 for all values x of X , and this also applies if X
has any other continuous distribution. Even in the definitions of causal effects treated in
chapter 5 we presumed that X is discrete and has at least two values x and x ′ having a pos-
itive probability. Hence, although the Fisher conditions have many useful implications for unbiasedness and true outcome variables, they have far-reaching consequences beyond true outcome theory.
Requirements
Reading this chapter we assume that the reader is familiar with the concepts treated in all
chapters of Steyer (2024). Again, chapters 4 to 6 of that book are now crucial. They deal
with the concepts of a conditional expectation, a conditional expectation with respect to a
conditional probability measure, and conditional independence. Furthermore, we assume
familiarity with chapters 4 to 7 of the present book.
In this chapter, we will often refer to the following assumptions and notation.
8.1 F-Conditions
In this section, we present the causality conditions summarily referred to as the simple
(or unconditional) and the conditional Fisher (F) conditions. In the simple F-conditions,
we assume independence of a putative cause variable X and a global potential confounder
D X of X , or independence of D X and an indicator variable 1X =x , where x denotes a value of
X . Among other things, these simple F-conditions imply unbiasedness of the conditional
expectation E (Y |X ) and the conditional expectation value E (Y |X =x ), respectively. Such a
simple F-condition is equivalent to independence of all potential confounders of X on one
side and X (or 1X =x ) on the other side. In contrast, the conditional F-conditions postulate
Z -conditional independence of D X and X (or 1X =x ), where Z denotes a covariate of X .
Among other things, these conditional F-conditions imply unbiasedness of E (Y |X, Z ) and
E X =x (Y |Z ), respectively. We also explicitly consider F-conditions in which we condition
on a single value z of Z .
From a methodological point of view, it should be noted that the simple F-conditions
are the mathematical foundation of the randomized experiment. In contrast, the condi-
tional F-conditions are the mathematical foundation of the conditionally randomized ex-
periment. However, the conditional F-conditions can also be used for covariate selection
aiming at establishing conditional independence of D X and X given the (possibly multi-
variate) covariate Z (for more details, see sect. 8.6).
In RS-section 2.4 we already introduced the concept of independence of two random vari-
ables with respect to a probability measure P . This concept is used repeatedly in Box 8.1, in
which the definitions of all F-conditions are gathered. Reading the definitions in this box,
remember that σ(D X ) and σ(X ) denote the σ-algebras generated by the random variables
D X and X , respectively (see RS-Def. 2.12).
Remark 8.2 [Independence of D_X and 1_{X=x}] Box 8.1 (i) presents the definition of independence of a global potential confounder D_X of X and an indicator variable 1_{X=x} for the event {X=x}. In this definition, we require the Assumptions 8.1 (a) and (b). The notation is D_X ⊥⊥ 1_{X=x}. According to RS-Corollary 6.18,
D_X ⊥⊥ 1_{X=x} ⇔ P(X=x | D_X) =_P P(X=x)                                        (8.1)
               ⇔ P(1_{X=x}=1 | D_X) =_P P(1_{X=x}=1)                            (8.2)
               ⇔ P(1_{X=x}=0 | D_X) =_P P(1_{X=x}=0)                            (8.3)
               ⇔ P(X≠x | D_X) =_P P(X≠x).                                       (8.4)
Note that Propositions (8.1) to (8.4) also hold if D_X is neither discrete nor numerical and if P(X=x) = 0. Hence, the condition D_X ⊥⊥ 1_{X=x} can be used beyond true outcome theory of causal effects (see ch. 5), and this applies to all other conditions presented in Box 8.1. ⊳
Remark 8.3 [Independence of D_X and X] The second definition [see Box 8.1 (ii)] is independence of the putative cause variable X and a global potential confounder D_X of X. The notation is D_X ⊥⊥ X. This concept also applies if neither X nor D_X is discrete or numerical. We only require Assumption 8.1 (a). However, if X is discrete (see RS-Def. 2.62), then
D_X ⊥⊥ X ⇔ ∀ x ∈ X(Ω): P(X=x | D_X) =_P P(X=x)                                  (8.5)
         ⇔ ∀ x ∈ X(Ω): D_X ⊥⊥ 1_{X=x}                                           (8.6)
(see RS-Cor. 6.17). Hence, if X is discrete, then according to Proposition (8.5), independence of D_X and X (with respect to the probability measure P) implies that the D_X-conditional probabilities of the events {X=x} = {ω ∈ Ω: X(ω) = x} do not depend on the global
Box 8.1  Definitions of the F-conditions.

Simple F-conditions

D_X ⊥⊥ 1_{X=x}    Independence of D_X and 1_{X=x}. Under Assumptions 8.1 (a) and (b), it is defined by
    ∀ (A, B) ∈ σ(D_X) × σ(1_{X=x}): P(A ∩ B) = P(A) · P(B).      (i)

D_X ⊥⊥ X    Independence of D_X and X. Under Assumptions 8.1 (a), it is defined by
    ∀ (A, B) ∈ σ(D_X) × σ(X): P(A ∩ B) = P(A) · P(B).      (ii)

Z-conditional F-conditions

D_X ⊥⊥ 1_{X=x} | Z    Z-conditional independence of D_X and 1_{X=x}. Under Assumptions 8.1 (a), (b), and (d), it is defined by
    ∀ (A, B) ∈ σ(D_X) × σ(1_{X=x}): P(A ∩ B | Z) =_P P(A | Z) · P(B | Z).      (iii)

D_X ⊥⊥ X | Z    Z-conditional independence of D_X and X. Under Assumptions 8.1 (a) and (d), it is defined by
    ∀ (A, B) ∈ σ(D_X) × σ(X): P(A ∩ B | Z) =_P P(A | Z) · P(B | Z).      (iv)
    Note: Under Assumptions 8.1 (a), (c), (d), (g), and that Z is a covariate of X, it implies E(Y | X, Z) ⊢ D_X.

(Z=z)-conditional F-conditions

D_X ⊥⊥ 1_{X=x} | (Z=z)    (Z=z)-conditional independence of D_X and 1_{X=x}. Under Assumptions 8.1 (a), (b), (d), and (e), it is defined by
    ∀ (A, B) ∈ σ(D_X) × σ(1_{X=x}): P^{Z=z}(A ∩ B) = P^{Z=z}(A) · P^{Z=z}(B).      (v)

D_X ⊥⊥ X | (Z=z)    (Z=z)-conditional independence of D_X and X. Under Assumptions 8.1 (a), (d), and (e), it is defined by
    ∀ (A, B) ∈ σ(D_X) × σ(X): P^{Z=z}(A ∩ B) = P^{Z=z}(A) · P^{Z=z}(B).      (vi)
    Note: Under Assumptions 8.1 (a) to (e), (g), and that Z is a covariate of X, it implies E^{Z=z}(Y | X) ⊢ D_X.

The proofs that the six F-conditions imply unbiasedness of the specified conditional expectations are found in the theorems and corollaries of section 8.3.
The general concept of conditional independence of two random variables given a ran-
dom variable has been treated in some detail in RS-chapter 6. For a more detailed presen-
tation of conditional independence of random variables see SN-chapter 16.
The Z-conditional independence of D_X and 1_{X=x} [see Box 8.1 (iii)] can equivalently be characterized as follows:

D_X ⊥⊥ 1_{X=x} | Z ⇔ P(X=x | D_X, Z) =_P P(X=x | Z)      (8.7)
                   ⇔ P(1_{X=x}=1 | D_X, Z) =_P P(1_{X=x}=1 | Z)      (8.8)
                   ⇔ P(1_{X=x}=0 | D_X, Z) =_P P(1_{X=x}=0 | Z)      (8.9)
                   ⇔ P(X≠x | D_X, Z) =_P P(X≠x | Z).      (8.10)
Remark 8.8 [The Putative Cause Variable Does Not Have to Be Discrete] Just as D_X ⊥⊥ X, the condition D_X ⊥⊥ X | Z is also defined and may hold if X is continuous. In this case, true outcome theory of causal effects is not applicable because there we presume positive probabilities P(X=x) for all x ∈ X(Ω), which does not hold if X is continuous. Nevertheless, if Z is a covariate of X, then D_X ⊥⊥ X | Z is still a causality condition, implying that E(Y | X, Z) describes a causal Z-conditional dependence of Y on X. However, in this volume we refrain from generalizing the theory to continuous putative cause variables. ⊳
Remark 8.9 [Z-Conditional Independence of D_X and a Discrete X] If X is discrete, then

D_X ⊥⊥ X | Z ⇔ ∀ x ∈ X(Ω): P(X=x | D_X, Z) =_P P(X=x | Z)      (8.11)
             ⇔ ∀ x ∈ X(Ω): D_X ⊥⊥ 1_{X=x} | Z      (8.12)

(see RS-Th. 6.5). To emphasize, the right-hand sides of these propositions are equivalent to D_X ⊥⊥ X | Z only if X is discrete. However, in contrast to the definition presented in Box 8.1 (iv), they do not hold anymore if X is continuous. Also note that Proposition (8.5) is a special case of (8.11) for Z being a constant map, that is, for σ(Z) = {Ω, Ø}. ⊳
Remark 8.10 [Consequences of Z Being a Covariate of X ] If we assume that Z is a covari-
ate of X , then, according to Definition 4.11 (iv) and Remark 4.16, σ(Z ) ⊂ σ(D X ) holds for
the σ-algebras generated by these two random variables. This implies σ(D X ) = σ(Z , D X )
[see RS-Prop. (2.19)] and

P(X=x | D_X, Z) =_P P(X=x | D_X)      (8.13)

[see RS-Def. 4.4 and RS-Eq. (4.10)]. Hence, if X is discrete and Z is a covariate of X, then Proposition (8.11) can also be written as

D_X ⊥⊥ X | Z ⇔ ∀ x ∈ X(Ω): P(X=x | D_X) =_P P(X=x | Z).      (8.14)

Correspondingly,

D_X ⊥⊥ 1_{X=x} | Z ⇔ P(X=x | D_X) =_P P(X=x | Z)      (8.15)
                   ⇔ P(1_{X=x}=1 | D_X) =_P P(1_{X=x}=1 | Z)      (8.16)
                   ⇔ P(1_{X=x}=0 | D_X) =_P P(1_{X=x}=0 | Z)      (8.17)
                   ⇔ P(X≠x | D_X) =_P P(X≠x | Z).      (8.18)
⊳
Example 8.11 [A First Example of Z -Conditional Independence] An example of Z -con-
ditional independence of the putative cause variable X and a global potential confounder
D X of X has already been presented in Table 6.4. In this example, U takes the role of a
global potential confounder of X . In section 8.5 we will treat several examples in more
detail. ⊳
Remark 8.12 [(Z =z)-Conditional Independence of D X And X ] The fifth causality condi-
tion [see Box 8.1 (v)] is (Z =z)-conditional independence of D X and 1X =x . It is denoted by
DX⊥ ⊥1X =x |(Z =z). The required assumptions are 8.1 (a), (b), (d), and (e). This condition
means that 1X =x on one side and all potential confounders of X on the other side are
(Z =z)-conditionally independent. The definition shown in Box 8.1 (v) reveals that (Z =z)-
conditional independence of D X and 1X =x is equivalent to independence of D X and 1X =x
with respect to the probability measure P^{Z=z} [see Assumptions 8.1 (e)]. That is,

D_X ⊥⊥ 1_{X=x} | (Z=z) :⇔ D_X ⊥⊥_{P^{Z=z}} 1_{X=x}.      (8.19)

Moreover, analogously to (8.1),

D_X ⊥⊥ 1_{X=x} | (Z=z) ⇔ P^{Z=z}(X=x | D_X) =_{P^{Z=z}} P^{Z=z}(X=x).      (8.20)
⊳
Remark 8.13 [(Z =z)-Conditional Independence of D X and X ] The sixth causality condi-
tion, introduced in Box 8.1 (vi), is (Z =z)-conditional independence of D X and X , denoted
DX⊥ ⊥X |(Z =z). The required assumptions are 8.1 (a), (d), and (e). This condition means
that the putative cause variable X on one side and all potential confounders of X on the
other side are (Z =z)-conditionally independent. According to the definition in Box 8.1 (vi),
(Z =z)-conditional independence of D X and X is equivalent to independence of D X and X
with respect to the probability measure P Z=z . That is,
D_X ⊥⊥ X | (Z=z) :⇔ D_X ⊥⊥_{P^{Z=z}} X.      (8.24)
⊳
Remark 8.14 [(Z =z)-Conditional Independence of D X and a Finite X ] Suppose that X is
finite or countable. Then, according to Remark 8.3 and Proposition (8.24),
D_X ⊥⊥ X | (Z=z) ⇔ ∀ x ∈ X(Ω): P^{Z=z}(X=x | D_X) =_{P^{Z=z}} P^{Z=z}(X=x)      (8.25)
                 ⇔ ∀ x ∈ X(Ω): D_X ⊥⊥ 1_{X=x} | (Z=z).      (8.26)
In this section we study the implication structure among the F-conditions. A summary of all these implications is presented in Table 8.1. Proofs are found in the theorems treated in the present section and in the solution to Exercise 8-2.

Under the Assumptions 8.1 (a) and (b),

D_X ⊥⊥ X ⇒ D_X ⊥⊥ 1_{X=x}      (8.27)

because σ(1_{X=x}) ⊂ σ(X) [see RS-Box 2.1 (iv)]. Correspondingly, and for the same reason, under the Assumptions 8.1 (a), (b), and (d),

D_X ⊥⊥ X | Z ⇒ D_X ⊥⊥ 1_{X=x} | Z.      (8.28)
Hence, under the Assumptions 8.1 (a) and that W is a potential confounder of X , in-
dependence of a putative cause variable X and a global potential confounder D X of X
implies W -conditional independence of X and D X . This proposition is called generaliz-
ability of D_X ⊥⊥ X. Its methodological implications are discussed in more detail in Remark 8.63. Note that assuming W to be a potential confounder of X is crucial for this implication. Also note that, under the Assumptions 8.1 (a) and (b), Propositions (8.27) and (8.30) imply

D_X ⊥⊥ X ⇒ ∀ W ∈ W_X: D_X ⊥⊥ 1_{X=x} | W.      (8.31)
Table 8.1  Implications among the F-conditions

Column conditions: (1) D_X ⊥⊥ 1_{X=x} | (Z=z), (2) D_X ⊥⊥ X | (Z=z), (3) D_X ⊥⊥ 1_{X=x} | Z, (4) D_X ⊥⊥ X | Z, (5) D_X ⊥⊥ 1_{X=x}.

Row condition             (1)                   (2)                (3)            (4)        (5)
D_X ⊥⊥ X | (Z=z)          (a),(b),(d),(e)       —                  —              —          —
D_X ⊥⊥ 1_{X=x} | Z        (a),(b),(d),(e)       —                  —              —          —
D_X ⊥⊥ X | Z              (a),(b),(d),(e)       (a),(d),(e)        (a),(b),(d)    —          —
D_X ⊥⊥ 1_{X=x}            (a),(b),(d),(e),(h)   —                  (a),(b),(h)    —          —
D_X ⊥⊥ X                  (a),(b),(d),(e),(h)   (a),(d),(e),(h)    (a),(b),(h)    (a),(h)    (a),(b)

Note: An entry such as (a), (b) means that the condition in the row implies the condition in the column, provided that the Assumptions 8.1 (a) and (b) hold. Trivial equivalences such as D_X ⊥⊥ 1_{X=x} ⇔ D_X ⊥⊥ 1_{X=x} are omitted.
D_X ⊥⊥ X | Z ⇒ D_X ⊥⊥ X | (Z=z).      (8.35)

D_X ⊥⊥ X ⇒ D_X ⊥⊥ X | (Z=z).      (8.36)

Hence, under the assumptions of this corollary, independence of a putative cause variable X and a global potential confounder of X implies (Z=z)-conditional independence of X and D_X.

D_X ⊥⊥ X | (Z=z) ⇒ D_X ⊥⊥ 1_{X=x} | (Z=z).      (8.37)
Note that the assumptions of the theorems and the other propositions stated above
neither include P (X =x ) > 0 nor that Y is real-valued, which are prerequisites for the true
outcome variables τx to be defined. Hence, these propositions also hold beyond true out-
come theory. This remark also applies to all other propositions summarized in Table 8.1.
(For the proofs of these propositions, see the solution to Exercise 8-2.)
8.3 Implications of F-Conditions on RR-Conditions and Unbiasedness

In this section we treat the consequences of the Fisher conditions on the Rosenbaum-
Rubin conditions and on unbiasedness. Now our assumptions include that the putative
cause variable X is finite with values in the set X (Ω) = {0, 1, . . . , J }, P (X =x ) > 0 for all values
of X , and that Y is real-valued. These assumptions are prerequisites for the definitions of
the true outcome variables τx , x ∈ X (Ω).
Table 8.2 summarizes the consequences of the Fisher conditions on the Rosenbaum-
Rubin conditions. These consequences are proved in the theorems treated in this section
or in Exercise 8-3. We also prove the implications of the Fisher conditions on unbiased-
ness.
In the first theorem of this section, we consider consequences of D X ⊥ ⊥X for the multi-
variate true outcome variable τ = (τ0 , τ1 , . . . , τJ ) and unbiasedness of the conditional ex-
pectation E (Y |X ) and its values E (Y |X =x ). In this theorem, we confine ourselves to those
consequences that do not involve conditioning on a covariate of X .
Remark 8.23 [D X ⊥ ⊥X Implies That All τx Are P -Unique] Under the assumptions of The-
orem 8.22, D X ⊥ ⊥X implies that all true outcome variables τx , x ∈ X (Ω), are P-unique [see
Prop. (iii) of Th. 8.22]. Hence, P -uniqueness of the true outcome variables τx is not an ad-
ditional assumption in Proposition (v) of Theorem 8.22, according to which independence
of D X and X implies unbiasedness of E (Y |X ). ⊳
Hence, according to this theorem, D X ⊥ ⊥ 1X =x implies that the true outcome variable
E X =x(Y |D X ) and the indicator 1X =x are independent. Correspondingly, D X ⊥ ⊥ 1X =x implies
that E X 6=x(Y |D X ) and the indicator 1X 6=x are independent. It also implies that the true out-
come variables E X =x (Y |D X ) and E X 6=x(Y |D X ) are P-unique. Finally, D X ⊥ ⊥ 1X =x implies
that E (Y | 1X =x ) and its values E (Y |X =x ) and E (Y |X 6=x) are unbiased.
Table 8.2  Implications of the F-conditions on the Rosenbaum-Rubin conditions

Column conditions: (1) τ ⊥⊥ 1_{X=x} | (Z=z), (2) τ ⊥⊥ X | (Z=z), (3) τ ⊥⊥ 1_{X=x} | Z, (4) τ ⊥⊥ X | Z, (5) τ ⊥⊥ 1_{X=x}, (6) τ ⊥⊥ X.

Row condition               (1)            (2)            (3)              (4)              (5)           (6)
D_X ⊥⊥ 1_{X=x} | (Z=z)      (a)-(e),(g)    —              —                —                —             —
D_X ⊥⊥ X | (Z=z)            (a)-(e),(g)    (a)-(e),(g)    —                —                —             —
D_X ⊥⊥ 1_{X=x} | Z          (a)-(e),(g)    —              (a)-(d),(g)      —                —             —
D_X ⊥⊥ X | Z                (a)-(g)        (a)-(e),(g)    (a)-(d),(g)      (a)-(d),(g)      —             —
D_X ⊥⊥ 1_{X=x}              (a)-(g),(h)    —              (a)-(d),(g),(h)  —                (a),(c),(g)   —
D_X ⊥⊥ X                    (a)-(g),(h)    (a)-(g),(h)    (a)-(d),(g),(h)  (a)-(d),(g),(h)  (a),(c),(g)   (a),(c),(g)

Note: An entry such as (a)-(g) means that the condition in the row implies the condition in the column, provided that the Assumptions 8.1 (a) to (g) hold. Consequences of the Rosenbaum-Rubin conditions displayed in the columns of the table are found in Tables 7.1 and 7.2.
E^{X=x}(Y | D_X) ⊥⊥ 1_{X≠x}  and  E^{X≠x}(Y | D_X) ⊥⊥ 1_{X=x}.      (8.38)
Example 8.33 [No Treatment For Males] Table 8.3 displays the parameters of a random
experiment in which males have a zero probability to be treated. We assume that this ran-
dom experiment has the same structure as the random experiments treated in section 6.5.
That is,

((Ω, A, P), (F_t)_{t∈T}, C, D_C, X, Y),

as specified in section 6.5, is the regular probabilistic causality setup. Again, U takes the role of a global potential confounder D_X of X and Z = sex is a covariate of X because σ(Z) ⊂ σ(U). This implies

P(X=1 | U, Z) =_P P(X=1 | U)

because σ(U, Z) = σ(U) [see RS-Prop. (2.19), RS-Def. 4.4, and RS-Rem. 4.12].
In this example, there is Z -conditional independence of U and X but the true outcome
variable τ1 is not P-unique. Therefore, unbiasedness of E X =1 (Y | Z ) and unbiasedness of
E (Y |X, Z ) are not defined in this example.
We start checking if U ⊥⊥ X | Z holds. According to RS-Theorem 6.6, if X is binary, then

U ⊥⊥ X | Z ⇔ P(X=1 | U, Z) =_P P(X=1 | Z).

Because P(X=1 | U, Z) =_P P(X=1 | U) (see above), this is equivalent to

U ⊥⊥ X | Z ⇔ P(X=1 | U) =_P P(X=1 | Z).
Table 8.3  Fundamental parameters (Example 8.33: no treatment for males)

Person u   Sex z   P(U=u)   P(X=1|U=u)   E^{X=0}(Y|U=u)   E^{X=1}(Y|U=u)   CTE_{U;10}(u)   P(U=u|X=0)   P(U=u|X=1)
Joe        m       1/4      0            68               999*             ndef            2/5          0
Jim        m       1/4      0            78               −999*            ndef            2/5          0
Ann        f       1/4      3/4          106              114              8               1/10         1/2
Sue        f       1/4      3/4          116              130              14              1/10         1/2

                                         x=0              x=1
E(τ_x):                                  92.0             111*             ATE_10 = ndef
E(Y | X=x):                              80.6             122              PFE_10 = 41.4

Note: An asterisk (*) indicates that this number is arbitrary and could be any other real number because, in this example, the corresponding term is not uniquely defined (see RS-sect. 4.4 for more details). Furthermore, ndef means that this term is not defined in this example.
From Table 8.3 we obtain

P(X=1 | U)(ω) = P(X=1 | Z)(ω) = 0, if ω ∈ {Z=m},  and  = 3/4, if ω ∈ {Z=f},      (8.39)

which proves U ⊥⊥ X | Z.
Now we show that, in this example, the true outcome variable τ_1 is not P-unique. Because U is a global potential confounder of X, P-uniqueness of τ_1 is equivalent to P(X=1 | U) >_P 0 (see RS-Th. 5.27), which in turn is equivalent to

P({ω ∈ Ω: P(X=1 | U)(ω) > 0}) = 1

[see RS-Eq. (4.12)]. Looking at the columns headed P(U=u) and P(X=1 | U=u) in Table 8.3 shows that

P({ω ∈ Ω: P(X=1 | U)(ω) > 0}) = 1/4 + 1/4 = 1/2.

Hence, P(X=1 | U) >_P 0 does not hold and, therefore, τ_1 is not P-unique. ⊳
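The two checks of this example can also be done mechanically. The following sketch re-computes, from the values tabulated in Table 8.3, the Z-conditional treatment probabilities and the probability of the event {P(X=1|U) > 0}; it is purely illustrative.

```python
# A numerical check of Example 8.33, assuming the values tabulated in Table 8.3.
units = {  # u: (sex, P(U=u), P(X=1|U=u), E^{X=0}(Y|U=u), E^{X=1}(Y|U=u) or None)
    "Joe": ("m", 0.25, 0.0, 68.0, None),
    "Jim": ("m", 0.25, 0.0, 78.0, None),
    "Ann": ("f", 0.25, 0.75, 106.0, 114.0),
    "Sue": ("f", 0.25, 0.75, 116.0, 130.0),
}

# Z-conditional treatment probabilities P(X=1 | Z=z).
p_z = {}
for sex in ("m", "f"):
    num = sum(p * p1 for s, p, p1, *_ in units.values() if s == sex)
    den = sum(p for s, p, *_ in units.values() if s == sex)
    p_z[sex] = num / den

# U ⊥⊥ X | Z holds iff P(X=1 | U=u) = P(X=1 | Z=z) for every unit u with sex z.
print(all(p1 == p_z[s] for s, _, p1, *_ in units.values()))   # True

# tau_1 is P-unique iff P({P(X=1|U) > 0}) = 1; here this probability is only 1/2.
print(sum(p for _, p, p1, *_ in units.values() if p1 > 0))    # 0.5
```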
For the second part of Theorem 8.31 we need the additional assumption that all true
outcome variables τx , x ∈ X (Ω), are P-unique. In the following theorem we present a con-
dition under which Z -conditional independence of D X and X implies that a true outcome
variable τx is P-unique.
D_X ⊥⊥ 1_{X=x} | Z ∧ P(X=x | Z) >_P 0 ⇒ τ_x is P-unique.      (8.40)

Also note that

P(X=x | Z) >_P 0 ⇔ E^{X=x}(Y | Z) is P-unique.
Hence, under the assumptions of Theorem 8.37, D X ⊥ ⊥X |Z implies that all true out-
come variables τx , x ∈ X (Ω), are P Z=z-unique and that τ = (τ0 , τ1 , . . . , τJ ) and X are (Z =z)-
conditionally independent. Furthermore, under the additional assumption that Z is a co-
variate of X , D X ⊥⊥X |Z also implies unbiasedness of all conditional expectation values
E Z=z(Y |X =x ), x ∈ X (Ω), and of the X-conditional expectation E Z=z (Y |X ) of Y with respect
to the measure P Z=z .
Remark 8.38 [Consequences of τ⊥ ⊥X |(Z =z)] Corollary 7.14 lists some other consequences
that follow from the conjunction of τ⊥ ⊥X |(Z =z) and P Z=z -uniqueness of the true out-
come variables. Because of Proposition (ii) of Theorem 8.37, these consequences also fol-
low from D X ⊥⊥X |Z , provided that the assumptions of Theorem 8.37 hold. ⊳
In Theorem 8.37 and Remark 8.38, we only consider a single value z of the random
variable Z and assume that P (Z =z) > 0. In contrast, in the following theorem we assume
P (X =x , Z =z) > 0 for all pairs (x, z) of values of X and Z , which implies P (X =x ) > 0 for all
values x of X and P (Z =z) > 0 for all values z of Z .
Hence, under the assumptions of Theorem 8.39, we can conclude that all true outcome
variables τx , x ∈ X (Ω), are P-unique and, if Z is a covariate of X , then it also follows that all
conditional expectations E X =x (Y |Z ), x ∈ X (Ω), as well as E (Y |X, Z ) are unbiased.
The assumptions of Theorem 8.39 under which E(Y | X, Z) is unbiased can often be created by the experimenter. The first is the assump-
tion that P (X =x , Z =z) > 0 for all pairs (x, z) of values of X and Z . If, for example, X is
a binary treatment variable and Z the binary covariate sex, then the experimenter simply
has to make sure that there is a positive probability for each of the four pairs (x, z) of values
of X and Z under which the outcome variable Y is observed. ⊳
Hence, under the assumptions of Theorem 8.41, D X ⊥ ⊥X |(Z =z) implies that the multi-
variate true outcome variable τ and X are (Z =z)-conditionally independent. This implies
that τ and each indicator variable 1X =x , x ∈ X (Ω), are (Z =z)-conditionally independent. If
we additionally assume P Z=z (X =x ) > 0 for all x ∈ X (Ω), then D X ⊥ ⊥X |(Z =z) also implies
that all true outcome variables τ_x, x ∈ X(Ω), are P^{Z=z}-unique. Finally, if Z is a covariate of X, then D_X ⊥⊥ X | (Z=z) also implies that the conditional expectation E^{Z=z}(Y | X) and all the conditional expectation values E^{Z=z}(Y | X=x) = E(Y | X=x, Z=z), x ∈ X(Ω), are unbiased. Other implications can be found in the last row of Table 7.2.
In the next theorem we consider some consequences of (Z=z)-conditional independence of a global potential confounder D_X of X and an indicator 1_{X=x} for a value x of the putative cause variable X.

(i) E^{X=x}(Y | D_X) ⊥⊥ 1_{X=x} | (Z=z)  and  E^{X≠x}(Y | D_X) ⊥⊥ 1_{X≠x} | (Z=z).
In section 6.4 we showed how to identify average and conditional causal total effects and
effect functions from unbiased prima facie effects and unbiased prima facie effect func-
tions. Now we turn to sufficient conditions for unbiasedness of conditional and uncon-
ditional prima facie effects and effect functions. Hence, this section is also crucial for the
identification of conditional and average causal total effects.
In the following theorem we specify the conditions under which the prima facie effect

PFE_{xx'} = E(Y | X=x) − E(Y | X=x')      (8.44)

is unbiased, that is, under which PFE_{xx'} ⊢ D_C. Hence, according to Definition 6.23 (i), we specify conditions under which τ_x and τ_{x'} are P-unique and the prima facie effect PFE_{xx'} is identical to the average causal total effect ATE_{xx'} = E(τ_x) − E(τ_{x'}).
Theorem 8.43 [An F-Condition Implying Unbiasedness of the Prima Facie Effect]
Let the Assumptions 8.1 (a) to (c), and (f ) hold. Then
(D_X ⊥⊥ 1_{X=x} ∧ D_X ⊥⊥ 1_{X=x'}) ⇒ PFE_{xx'} ⊢ D_C.      (8.46)

(Proof p. 263)
Remark 8.44 [Identification of ATE x x ′ ] Hence, under the assumptions of Theorem 8.43,
the conjunction of (a) independence of the indicator 1X =x and a global potential con-
founder D X of X and (b) independence of the indicator 1X =x ′ and D X implies P -unique-
ness of τ_x and τ_{x'} as well as

PFE_{xx'} = ATE_{xx'}.      (8.47)

That is, the conjunction D_X ⊥⊥ 1_{X=x} ∧ D_X ⊥⊥ 1_{X=x'} implies that the difference between the conditional expectation values E(Y | X=x) and E(Y | X=x') is identical to the causal average total effect of x compared to x'. In this context we also say that the ATE_{xx'} is identified by that difference. Note that P-uniqueness of τ_x follows from D_X ⊥⊥ 1_{X=x} and P-uniqueness of τ_{x'} follows from D_X ⊥⊥ 1_{X=x'}. It is not an additional assumption. ⊳
Note that D_X ⊥⊥ X implies the conjunction D_X ⊥⊥ 1_{X=x} ∧ D_X ⊥⊥ 1_{X=x'}, which follows from σ(1_{X=x}), σ(1_{X=x'}) ⊂ σ(X) and RS-Box 2.1 (iv). Hence, under the assumptions of Theorem 8.43,

D_X ⊥⊥ X ⇒ PFE_{xx'} ⊢ D_C      (8.49)
         ⇒ PFE_{xx'} = ATE_{xx'}.      (8.50)

This means, under D_X ⊥⊥ X, the average causal total effect ATE_{xx'} is identified by the prima facie effect PFE_{xx'}. ⊳
In the next theorem we specify two conditions under each of which a Z-conditional prima facie effect variable

PFE_{Z;xx'}(Z) =_P E^{X=x}(Y | Z) − E^{X=x'}(Y | Z)      (8.51)

is unbiased and hence P-equivalent to the causal Z-conditional total effect variable

CTE_{Z;xx'}(Z) =_P E(τ_x − τ_{x'} | Z).      (8.53)

Hence, the conjunction of Z-conditional independence of a global potential confounder D_X of X and the indicator 1_{X=x} and of D_X and 1_{X=x'} implies that the Z-conditional prima facie effect variable PFE_{Z;xx'}(Z) is unbiased.
Under the same assumptions, the same conclusion can be drawn from Z -conditional
independence of D X and X . Because unbiasedness of PFE Z ; x x ′ (Z ) implies
PFE Z ; x x ′ (Z ) =
P
CTE Z ; x x ′ (Z )
[see Prop. (8.52)], we also say that the causal Z-conditional total effect variable CTE_{Z;xx'}(Z)
is identified by PFE Z ; x x ′ (Z ), provided that the assumptions of Theorem 8.46 hold. ⊳
Remark 8.48 [PFE Z ; x x ′ ⊢ DC And the Identification of the Causal Average Total Effect] If
PFE Z ; x x ′ is unbiased, then we can not only identify the causal Z -conditional total effect
variable CTE Z ; xx ′ (Z ) (see Rem. 8.47), but also the causal average total effect ATE x x ′ , even
if D X ⊥
⊥X and Equation (8.47) do not hold. For details see Theorem 6.34 and the example
presented in Table 6.4. ⊳
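The identification strategy behind this remark can be summarized in one line. The following display is a sketch of the relation used there (cf. Theorem 6.34), written out for a discrete covariate Z with P(Z=z) > 0 for all its values, and assuming that the Z-conditional prima facie effects are unbiased:

```latex
\mathrm{ATE}_{xx'}
  = E\!\left[\mathrm{CTE}_{Z;xx'}(Z)\right]
  = \sum_{z} \mathrm{CTE}_{Z;xx'}(z)\, P(Z{=}z)
  = \sum_{z} \bigl[E(Y \mid X{=}x, Z{=}z) - E(Y \mid X{=}x', Z{=}z)\bigr]\, P(Z{=}z).
```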
According to the following theorem, independence of X and a global potential confounder D_X of X is a sufficient condition for unbiasedness of a Z-conditional prima facie effect function PFE_{Z;xx'}, and it does not require the additional assumptions P(X=x | Z) >_P 0 and P(X=x' | Z) >_P 0 that are indispensable in Theorem 8.46. Instead, these additional assumptions already follow from D_X ⊥⊥ X.
Now we consider a single value z of Z and specify conditions under which a (Z=z)-conditional prima facie effect

PFE_{Z;xx'}(z) = E^{X=x}(Y | Z=z) − E^{X=x'}(Y | Z=z)      (8.54)

is unbiased. Note that PFE_{Z;xx'}(z) is the value of the effect function PFE_{Z;xx'}: Ω'_Z → R. Also remember that PFE_{Z;xx'}(z) ⊢ D_C denotes unbiasedness of PFE_{Z;xx'}(z). Here, E(τ_x − τ_{x'} | Z=z) [see Eq. (5.31)] is the causal (Z=z)-conditional total effect and CTE_{Z;xx'}(z) is the value of the causal Z-conditional total effect function CTE_{Z;xx'}: Ω'_Z → R.
Remark 8.51 [Identification of CTE Z ; xx ′ (z)] Given the assumptions of Theorem 8.50, each
of the four conditions listed in this theorem implies that the causal (Z =z)-conditional
total effect is identical to the difference between the two conditional expectation values
E(Y | X=x, Z=z) and E(Y | X=x', Z=z). That is, each of these four conditions implies

CTE_{Z;xx'}(z) = E(Y | X=x, Z=z) − E(Y | X=x', Z=z).
In this context we also say that CTE Z ; xx ′ (z) is identified by the (Z =z)-conditional prima
facie effect PFE Z ; x x ′ (z), that is, it is identified by the difference between the conditional
expectation values E (Y |X =x , Z =z) and E (Y |X =x ′, Z =z). ⊳
Remark 8.52 [Conditional Randomization] Note that D X ⊥ ⊥X |Z [see Th. 8.50 (iii)] can be
created by conditional randomization, that is, by randomized assignment of the unit to a
treatment x, conditional on the values z of a covariate Z . In this case, the treatment prob-
abilities P (X =x | Z =z) can be fixed by the experimenter and may differ between different
values z of the covariate Z (see Rem. 8.15). ⊳
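Operationally, conditional randomization as described in the remark above amounts to a single random draw per unit whose success probability depends on the observed covariate value. The following sketch is illustrative only; the covariate values and the assignment probabilities are hypothetical choices.

```python
# A minimal sketch of conditionally randomized assignment (cf. the remark above).
import random

# Treatment probabilities P(X=1 | Z=z), fixed by the experimenter per covariate value z.
assignment_probabilities = {"mild": 0.50, "moderate": 0.70, "severe": 0.90}

def assign_treatment(z: str, rng: random.Random) -> int:
    """Assign the sampled unit to treatment 1 with probability P(X=1 | Z=z)."""
    return int(rng.random() < assignment_probabilities[z])

rng = random.Random(1)
for z in ("mild", "moderate", "severe"):
    x = assign_treatment(z, rng)
    print(f"covariate value z={z!r}: assigned to treatment x={x}")
```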
Remark 8.53 [Covariate Selection] Also note that we may try to select the (possibly multi-
variate) covariate Z = (Z 1 , . . . , Z m ) in such a way that D X ⊥
⊥X |Z holds. For instance, severity
of the disorder, knowing about the treatment, and availability of the treatment often are
candidates for such covariates. However, there is no guarantee that D X ⊥ ⊥X |Z holds for a
specified (univariate or multivariate) covariate Z . This is why the conditions specified in
the following theorem are of utmost importance. ⊳
Remark 8.55 [Randomization] Note that D X ⊥ ⊥X [see Th. 8.54 (ii)] can be created in an
experiment by randomization, that is, by randomized assignment of the sampled unit to a
treatment condition (represented by a value x of X ). Also note that Z can be any covariate
of X and remember, according to Theorem 4.31, Definition 4.11 (iv), and Remark 4.16,
every random variable on (Ω, A, P ) that is prior in (Ft )t ∈T to X is a covariate of X . ⊳
8.5 Examples
Now we illustrate the causality conditions treated in this chapter by some examples. In
RS-chapter 1 we introduced an example with independence of X and a global potential
confounder D X (see RS-Table 1.2). A second example has already been presented in chap-
ter 6 (see Table 6.3), and in the same chapter, there is also an example with Z -conditional
independence of D X and X (see Table 6.4). The structure of these random experiments has
already been treated in section 6.5, that is,
((Ω, A, P), (F_t)_{t∈T}, C, D_C, X, Y),

as specified in section 6.5, is the regular probabilistic causality setup. Furthermore, the observational-unit variable U is a global potential confounder, that is, D_X = U, implying

E(Y | X, D_X) =_P E(Y | X, U)

and

P(X=1 | D_X) =_P P(X=1 | U).
In this equation, P (X =1|U ) denotes the individual treatment probability function, whose
values are the individual treatment probabilities P (X =1|U =u ) (see RS-Remarks 4.12 and
4.25). If the values u of U represent the observational units at the onset of treatment, then
D X = U will hold in empirical applications if
(a) no fallible covariate is assessed, and
(b) there is no other variable that is simultaneous to the treatment variable X (such as
a second treatment variable).
If a fallible covariate is assessed, then this covariate, say Z , is not measurable with respect
to U , which implies that it is not identical to a composition f (U ) of U and some map f .
In this case, D X = (U , Z ) is a global potential confounder of X , unless there are still other
fallible covariates of X .
Example 8.56 [Independence of X and U] In the random experiment presented in Table 6.3, the treatment probability neither depends on the unit nor on the value of the covariate Z, that is,

P(X=1 | U, Z) =_P P(X=1 | Z) =_P P(X=1) = 3/4.
Note that U ⊥ ⊥X | Z implies that X and all random variables that are measurable with re-
spect to U are also Z -conditionally independent [see RS-Box 6.1 (vi)].
If D X =U , then, according to Theorem 8.22 (v), independence of U and X implies unbi-
asedness of the conditional expectation E (Y |X ). Furthermore, because, in this example,
X is binary and P (X =0) > 0, P (X =1) > 0, independence of X and U also implies unbiased-
ness of the conditional expectation values E (Y |X =0) and E (Y |X =1) [see Th. 8.22 (iv)] as
well as unbiasedness of the prima facie effect PFE_10 = E(Y | X=1) − E(Y | X=0) (see Th. 8.43). Furthermore, the Z-conditional prima facie effects

PFE_{Z;10}(m) = E(Y | X=1, Z=m) − E(Y | X=0, Z=m) = 9.50

and

PFE_{Z;10}(f) = E(Y | X=1, Z=f) − E(Y | X=0, Z=f) = 122 − 111 = 11

are unbiased (see Th. 8.50).
Also note that

PFE_10 = E[PFE_{Z;10}(Z)] = 9.50 · 4/6 + 11 · 2/6 = 10.
This means that the prima facie effect is the expectation of the corresponding conditional
prima facie effect variable. And because PFE Z ; 10 (Z ) is unbiased, it is identical to the causal
Z -conditional total effect variable CTE Z ;10 (Z ), and its expectation is identical to the causal
average total effect ATE 10 (see Th. 6.34). ⊳
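The last computation is easy to verify mechanically; the following one-liner re-does it with exact fractions, using the values quoted in the text.

```python
from fractions import Fraction

# Checking the arithmetic of Example 8.56, assuming the values quoted in the text.
p_z = {"m": Fraction(4, 6), "f": Fraction(2, 6)}      # P(Z=z)
pfe_z = {"m": Fraction(19, 2), "f": Fraction(11)}     # 9.50 and 11

pfe_10 = sum(pfe_z[z] * p_z[z] for z in p_z)
print(pfe_10)   # 10 = E[PFE_{Z;10}(Z)] = PFE_10 = ATE_10 in this example
```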
Example 8.57 [Z -Conditional Independence of X and U ] In Table 6.4 we already treated
an example in which D X ⊥ ⊥X |Z holds, whereas D X ⊥ ⊥X does not. Again, D X =U . As is easily
seen in this table, the individual treatment probabilities are the same for all units within
each of the two subsets of males and females, that is,
for each male unit u: P(X=1 | U=u, Z=m) = P(X=1 | Z=m) = 3/4

and

for each female unit u: P(X=1 | U=u, Z=f) = P(X=1 | Z=f) = 1/4.
Hence, the individual treatment probabilities are 3/4 for males and 1/4 for females. These
individual treatment probabilities differ for different values of the covariate Z , but they are
invariant given a value of the covariate Z . Note that the conditional treatment probability
P (X =1| Z =m) = 3/4 is also the individual treatment probability P (X =1|U =u ) for the four
male units, and the conditional treatment probability P (X =1| Z = f ) = 1/4 is also the indi-
vidual treatment probability P (X =1|U =u ) for the two female units. This follows from the
fact that P (X =1|U , Z ) and P (X =1|Z ) are conditional expectations [see RS-Eq. (4.10)] and
P(X=1 | U) =_P E(P(X=1 | U, Z) | U)      [σ(U, Z) = σ(U), RS-Box 4.1 (xiii)]
           =_P E(P(X=1 | Z) | U)         [P(X=1 | U, Z) =_P P(X=1 | Z)]
           =_P P(X=1 | Z).               [σ(P(X=1 | Z)) ⊂ σ(U), RS-Box 4.1 (xi)]
In Table 6.4 there is only Z-conditional independence of U and the treatment variable X, that is, U ⊥⊥ X | Z, whereas U ⊥⊥ X does not hold. Therefore, in this table, the prima facie effect

PFE_10 = E(Y | X=1) − E(Y | X=0) ≈ 96.715 − 99.800 ≈ −3.085
is biased, because the average total effect in this example is ATE 10 = 10 (see Exercise 8-9).
However, the (Z =z)-conditional prima facie effects are unbiased. In fact, they are the same
as in Table 6.3 (see Example 8.56). Hence, we can use the conditional prima facie effects to
compute the average total effect. If the conditional prima facie effects are unbiased, that is,
if they are equal to the corresponding causal conditional total effects, then the expectation
of the conditional prima facie effect variable is equal to the causal average total effect (see
Th. 6.34). In our example, this expectation is,
ATE_10 = E(PFE_{Z;10}(Z)) = PFE_{Z;10}(m) · 4/6 + PFE_{Z;10}(f) · 2/6
       = 9.50 · 4/6 + 11 · 2/6 = 10.00.
⊳
8.6 Methodological Consequences

Now we discuss the conclusions from the theory treated in this chapter for the design and
analysis of experiments and quasi-experiments. Theorems 8.22 and 8.27 are the theoreti-
cal foundation of the experimental design technique of randomization and of the analysis
of causal conditional and causal average total treatment effects in experiments by compar-
ing (unadjusted) means between treatment conditions. Theorem 8.31 is the theoretical
foundation for the experimental design technique of conditional randomization and of
the analysis of causal conditional total treatment effects. Together with Theorem 6.34, this
theorem can also be used for the analysis of causal average total effects. Finally, Theorems
8.31 and 8.34, and Corollary 8.36 can be used in (nonrandomized) quasi-experiments for
covariate selection and the analysis of causal conditional and causal average total effects.
This will now be explained in more detail.
In a randomized experiment, randomized assignment of the unit to one of the treatment conditions creates

P(X=x | D_X) =_P P(X=x | U) =_P P(X=x),  ∀ x ∈ X(Ω).      (8.58)
The last one of these two equations implies that all units u ∈ U (Ω) have the same probabil-
ity P (X =x |U =u ) = P (X =x ) to be assigned to treatment x, and this holds for all x ∈ X (Ω).
Remember that we are talking about a single-unit trial. A simple example of such a
single-unit trial consists of sampling a single observational unit from a set of units, pos-
sibly assessing a number of covariates, assigning the unit (or observing its assignment) to
one of the treatment conditions and observing the outcome variable (see ch. 2 for more
details and other kinds of single-unit trials).
Note that Equation (8.58) does not imply that the probabilities P (X =x ) are the same for
all treatment conditions x. If, for example, there are two treatment conditions, say 0 and 1,
then the two treatment probabilities might be P (X =1) = 1/4 and P (X = 0) = 3/4. However,
DX ⊥ ⊥X implies P (X =1|U ) = P (X =1) because σ(U ) ⊂ σ(D X ). In other words, D X ⊥ ⊥X im-
plies that the individual treatment probabilities P (X =x |U =u ) are identical for all units,
and they are equal to the (unconditional) treatment probability P (X =x ). Hence, in a per-
fect randomized experiment we ensure D X ⊥ ⊥X . If P (X =x ) > 0 for all x ∈ X (Ω), then this
condition implies that the conditional expectation values E (Y |X =x ), x ∈ X (Ω), are unbi-
ased (see Th. 8.22) and that a prima facie effect E (Y |X =x ) − E (Y |X =x ′ ) is identical to the
average causal total effect ATE x x ′ (see Th. 8.43).
It is important to note that through a perfect randomized experiment we do not only create D_X ⊥⊥ X but also D_X ⊥⊥ X | W for each potential confounder W of X (see Th. 8.17). If P(X=x) > 0 for all x ∈ X(Ω), then this condition implies P(X=x | W) >_P 0 for all x ∈ X(Ω), which in turn implies that each conditional expectation E^{X=x}(Y | W), x ∈ X(Ω), the conditional expectation E(Y | X, W), and each prima facie effect variable

PFE_{W;xx'}(W) = E^{X=x}(Y | W) − E^{X=x'}(Y | W),  x, x' ∈ X(Ω),

are unbiased.

However, even in an initially randomized experiment, systematic attrition of subjects may invalidate the F-condition D_X ⊥⊥ X (see, e. g., Abraham
& Russell, 2004; Fichman & Cummings, 2003; Graham & Donaldson, 1993; Shadish et al.,
2002). In this case, we will say that randomization failed and the initially randomized ex-
periment turned into a quasi-experiment.
In a quasi-experiment, selecting the covariates in the covariate vector Z := (Z 1 , . . . , Z m )
for which we can hope that D X ⊥ ⊥X |Z holds is a useful strategy in the analysis of causal
conditional and causal average total treatment effects. However, note that there might be
many covariates determining the treatment probabilities. For instance, the severity of the
disorder, knowing about the treatment, and availability of the treatment are candidates for
such covariates. To emphasize, there is no guarantee that D X ⊥ ⊥X |Z holds for a specified
(univariate or multivariate) covariate Z , unless the conditional probabilities P (X =x | Z ),
x ∈ X (Ω), are fixed by the experimenter. ⊳
Remark 8.61 [Testability] As already mentioned before, in contrast to the causality con-
ditions treated in chapters 6 and 7, the causality conditions D X ⊥ ⊥X and D X ⊥
⊥X |Z can
be tested in empirical applications, at least in the sense that some consequences of these
conditions can be checked. If at least one of these consequences does not hold, then we
say that the corresponding causality condition is falsified.
Let us briefly outline how we can test the assumption that D_X ⊥⊥ X holds. Remember, if X(Ω) is finite or countable, then D_X ⊥⊥ X is equivalent to

P(X=x | D_X) =_P P(X=x),  ∀ x ∈ X(Ω).      (8.59)
Correspondingly, if Z is a covariate of X, then D_X ⊥⊥ X | Z is equivalent to

P(X=x | D_X) =_P P(X=x | Z),  ∀ x ∈ X(Ω).      (8.61)

In empirical applications, we may check, for a selected (possibly multivariate) covariate Z* and for each additionally observed potential confounder W* of X, whether

P(X=x | W*, Z*) =_P P(X=x | Z*),  ∀ x ∈ X(Ω),      (8.63)

holds, hoping that

P(X=x | D_X) =_P P(X=x | Z*),  ∀ x ∈ X(Ω).      (8.64)
Of course, such a procedure does not guarantee that we find a (possibly multivariate) co-
variate Z ∗ such that Equation (8.64) holds. Instead, in a quasi-experiment, Equation (8.64)
always remains an assumption. However, this assumption can always be falsified, which
has a positive and a negative side. The negative side is that we can never be sure that this
assumption holds. The positive side is that this assumption is empirically testable and, in
this sense, it is not just a matter of belief (cf. Popper, 2005). ⊳
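The kind of falsification check described here can be sketched as follows. The data-generating step below is hypothetical and only serves as a stand-in for an observed data set; in practice one would attach a statistical test to account for sampling error rather than eyeballing the estimated probabilities.

```python
# Sketch: compare the estimated treatment probabilities given Z* alone with
# those given (Z*, W*) for an additionally observed potential confounder W*.
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
z = rng.integers(0, 2, size=n)                 # selected covariate Z*
w = rng.integers(0, 2, size=n)                 # additional potential confounder W*
# Here treatment depends on Z* only, so the check should (approximately) pass.
x = rng.binomial(1, np.where(z == 1, 0.7, 0.3))

for zz in (0, 1):
    p_z = x[z == zz].mean()
    for ww in (0, 1):
        p_zw = x[(z == zz) & (w == ww)].mean()
        print(f"z={zz} w={ww}: P(X=1|Z*,W*)={p_zw:.3f} vs P(X=1|Z*)={p_z:.3f}")
```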
Remark 8.63 [Generalizability of D X ⊥ ⊥X ] If D X ⊥
⊥X holds, then, according to Theorem
8.17, D X ⊥⊥X |W holds as well, provided that W is a potential confounder of X , or synony-
mously, a covariate of X . Under the assumptions of Theorem 8.22, this implies: If D X ⊥ ⊥X
holds, then E (Y |X ) is unbiased, and under the assumptions of Theorem 8.27, which in-
clude that Z is a covariate of X , it also implies unbiasedness of E (Y |X, Z ). Conditioning on
a covariate Z of X may be meaningful in order to obtain a causal conditional total effect function CTE_{Z;xx'} that is more fine-grained and therefore provides more specific information than the causal average total effect ATE_{xx'}.
If, for example, Z denotes the covariate sex with values m (male) and f (female), and
P(X=x, Z=z), P(X=x', Z=z) > 0 for both values z of Z, then D_X ⊥⊥ X does not only imply PFE_{xx'} = ATE_{xx'} but also

PFE_{Z;xx'}(z) = CTE_{Z;xx'}(z),

where CTE_{Z;xx'}(z) denotes the causal (Z=z)-conditional total effect comparing x to x'. This applies to males (z=m) and to females (z=f). Even if PFE_{xx'} = PFE_{Z;xx'}(m) = PFE_{Z;xx'}(f), this would be important information, extending the substantive interpretation of the causal average total effect ATE_{xx'}, because then it would be identical to the causal conditional total effects CTE_{Z;xx'}(m) and CTE_{Z;xx'}(f).
In order to emphasize the importance of this point for substantive research, imagine an empirical study in which we could interpret E(Y | X=x) − E(Y | X=x') as the causal average total treatment effect while, at the same time, the corresponding differences E(Y | X=x, Z=z) − E(Y | X=x', Z=z) had no causal interpretation! ⊳
Remark 8.64 [Generalizability of D X ⊥ ⊥X |Z ] Now assume that D X ⊥ ⊥X |Z holds, where Z
is a random variable on (Ω, A, P ), but not necessarily a covariate of X . If W is a poten-
tial confounder of X , then, according to Theorem 8.18, this implies that we also have
DX⊥ ⊥X |(Z ,W ), that is, (Z ,W )-conditional independence of D X and X . Hence, if we con-
dition on Z and D X ⊥ ⊥X |Z holds, then there is no need to control for further potential con-
founders, at least not for the purpose of establishing unbiasedness. However, controlling
for W may still be meaningful in order to obtain a more fine-grained total effect function
CTE Z ,W ; x x ′ that contains more specific information than CTE Z ; x x ′ .
Under the Assumptions 8.1 (a) to (d), (g), P(X=x | Z, W) >_P 0 for all x ∈ X(Ω), and that Z
is a covariate of X , the condition D X ⊥ ⊥X |Z implies unbiasedness not only of E (Y |X, Z )
but, via D X ⊥ ⊥X |(Z ,W ), also of E (Y |X , Z ,W ) (see Cor. 8.36), no matter which other poten-
tial confounder W of X we consider. Controlling, additionally to Z , for another potential
confounder W of X may still be meaningful in order to obtain a causal conditional total
effect function CTE Z ,W ; xx ′ that is more fine-grained than CTE Z ; xx ′ . ⊳
8.8 Proofs
D_X ⊥⊥ 1_{X=x} ⇔ ∀ W ∈ W_X: (W, D_X) ⊥⊥ 1_{X=x}      [σ(W, D_X) = σ(D_X)]
               ⇒ ∀ W ∈ W_X: D_X ⊥⊥ 1_{X=x} | W.       [RS-Box 6.1 (ix)]

D_X ⊥⊥ X | Z ⇔ ∀ W ∈ W_X: (W, D_X) ⊥⊥ X | Z      [σ(W, D_X) = σ(D_X)]
             ⇒ ∀ W ∈ W_X: D_X ⊥⊥ X | (Z, W).      [RS-Box 6.1 (viii)]

D_X ⊥⊥ 1_{X=x} | Z ⇔ ∀ W ∈ W_X: (W, D_X) ⊥⊥ 1_{X=x} | Z      [σ(W, D_X) = σ(D_X)]
                   ⇒ ∀ W ∈ W_X: D_X ⊥⊥ 1_{X=x} | (Z, W).      [RS-Box 6.1 (viii)]

Proposition (i). This follows from σ(τ) ⊂ σ(D_X) and RS-Box 2.1 (iv).

Proposition (ii).
D_X ⊥⊥ X ⇒ τ ⊥⊥ X      [(i)]
         ⇔ ∀ x ∈ X(Ω): τ ⊥⊥ 1_{X=x}.      [(7.23)]

Proposition (iii).
D_X ⊥⊥ X ⇔ ∀ x ∈ X(Ω): P(X=x | D_X) =_P P(X=x)      [(8.5)]
         ⇒ ∀ x ∈ X(Ω): P(X=x | D_X) >_P 0      [P(X=x) > 0, SN-(2.40)]
         ⇔ ∀ x ∈ X(Ω): τ_x is P-unique.      [RS-Th. 5.27]

Note that 0 < P(X=x) < 1 implies 0 < P(X≠x) < 1 because P(X≠x) = 1 − P(X=x).
D_X ⊥⊥ 1_{X=x} ⇒ E^{X=x}(Y | D_X) ⊥⊥ 1_{X=x}.

Correspondingly, because σ(E^{X≠x}(Y | D_X)) ⊂ σ(D_X) and σ(1_{X=x}) = σ(1_{X≠x}), RS-Box 2.1 (iv) yields

D_X ⊥⊥ 1_{X≠x} ⇒ E^{X≠x}(Y | D_X) ⊥⊥ 1_{X≠x}.

Proposition (ii).
D_X ⊥⊥ 1_{X=x} ⇔ P(X=x | D_X) =_P P(X=x)      [(8.1)]
               ⇒ P(X=x | D_X) >_P 0      [P(X=x) > 0, SN-(2.40)]
               ⇔ E^{X=x}(Y | D_X) is P-unique.      [RS-Th. 5.27]

Correspondingly,
D_X ⊥⊥ 1_{X=x} ⇔ P(X≠x | D_X) =_P P(X≠x)      [(8.4)]
               ⇒ P(X≠x | D_X) >_P 0      [P(X≠x) > 0, SN-(2.40)]
               ⇔ E^{X≠x}(Y | D_X) is P-unique.      [SN-Cor. 14.48]

Proposition (iii).
D_X ⊥⊥ 1_{X=x}
⇒ E^{X=x}(Y | D_X) ⊥⊥ 1_{X=x} ∧ E^{X=x}(Y | D_X) is P-unique      [(i), (iii)]
⇒ E(Y | X=x) ⊢ D_X.      [E^{X=x}(Y | D_X) = τ_x, Th. 7.8]

D_X ⊥⊥ 1_{X=x}
⇒ E^{X≠x}(Y | D_X) ⊥⊥ 1_{X≠x} ∧ E^{X≠x}(Y | D_X) is P-unique      [(i), (iii)]
⇒ E^{X≠x}(Y | D_X) is mean-independent from 1_{X≠x} ∧ E^{X≠x}(Y | D_X) is P-unique      [RS-Th. 4.40]
⇒ E(E^{X≠x}(Y | D_X) | 1_{X≠x}) =_P E(E^{X≠x}(Y | D_X)) ∧ E^{X≠x}(Y | D_X) is P-unique      [RS-Def. 4.36]
⇔ E(Y | X≠x) ⊢ D_X.      [Th. 6.9 (i), (6.13)]

Proposition (iv). This immediately follows from Propositions (iii) and (6.13).
Proposition (i).
D_X ⊥⊥ X ⇒ D_X ⊥⊥ X | Z      [Th. 8.17]
         ⇒ τ ⊥⊥ X | Z.      [σ(τ) ⊂ σ(D_X), RS-Box 6.1 (vi)]

Proposition (ii).
D_X ⊥⊥ X ⇒ τ ⊥⊥ X | Z      [(i)]
         ⇔ ∀ x ∈ X(Ω): τ ⊥⊥ 1_{X=x} | Z.      [RS-(6.8)]

D_X ⊥⊥ X ⇒ τ ⊥⊥ X | Z ∧ (∀ x ∈ X(Ω): τ_x is P-unique)      [(i), Th. 8.22 (iii)]
         ⇒ ∀ x ∈ X(Ω): E^{X=x}(Y | Z) ⊢ D_X      [Th. 7.21]
         ⇔ E(Y | X, Z) ⊢ D_X.      [(7.63)]

Proposition (i).
D_X ⊥⊥ X
⇒ Z ⊥⊥ X      [σ(Z) ⊂ σ(D_X), RS-Box 2.1 (iv)]
⇒ ∀ x ∈ X(Ω): P(X=x, Z=z) = P(X=x) · P(Z=z)      [{X=x} ∈ σ(X), {Z=z} ∈ σ(Z), Box 8.1 (ii)]
⇒ ∀ x ∈ X(Ω): P(X=x, Z=z) > 0.      [P(Z=z) > 0, ∀ x ∈ X(Ω): P(X=x) > 0]

Proposition (ii).
D_X ⊥⊥ X ⇒ τ ⊥⊥ X | Z      [Th. 8.27 (i)]
         ⇒ τ ⊥⊥_{P^{Z=z}} X      [P(Z=z) > 0, RS-(6.37)]
         ⇔ τ ⊥⊥ X | (Z=z).      [RS-(6.47)]

Proposition (iii).
D_X ⊥⊥ X ⇒ τ ⊥⊥ X | (Z=z)      [(ii)]
         ⇔ τ ⊥⊥_{P^{Z=z}} X      [RS-(6.47)]
         ⇔ ∀ x ∈ X(Ω): τ ⊥⊥_{P^{Z=z}} 1_{X=x}      [RS-(6.27)]
         ⇔ ∀ x ∈ X(Ω): τ ⊥⊥ 1_{X=x} | (Z=z).      [RS-(6.47)]

D_X ⊥⊥ X | Z
⇒ τ ⊥⊥ X | Z      [σ(τ) ⊂ σ(D_X), RS-Box 6.1 (vi)]
⇔ ∀ x ∈ X(Ω): τ ⊥⊥ 1_{X=x} | Z      [RS-Th. 6.5]
⇒ ∀ x, x' ∈ X(Ω): τ_x ⊥⊥ 1_{X=x'} | Z.      [σ(τ_x) ⊂ σ(τ), RS-Box 6.1 (vi)]

Propositions (iv) and (v). These propositions follow from Proposition (i) and Theorem 7.21.
Proposition (8.40).
D_X ⊥⊥ 1_{X=x} | Z ∧ P(X=x | Z) >_P 0
⇔ P(X=x | D_X) =_P P(X=x | Z) ∧ P(X=x | Z) >_P 0      [(8.7), (8.13)]
⇒ P(X=x | D_X) >_P 0      [SN-(2.40)]
⇔ τ_x is P-unique.      [RS-Th. 5.27]
Proposition (8.41).
Proposition (i).
D_X ⊥⊥ X | Z ⇒ D_X ⊥⊥_{P^{Z=z}} X      [P(Z=z) > 0, RS-(6.37)]
             ⇒ ∀ x ∈ X(Ω): τ_x is P^{Z=z}-unique.      [Th. 8.22 (iii)]

Proposition (ii).
D_X ⊥⊥ X | Z ⇒ τ ⊥⊥ X | Z      [Th. 8.31]
             ⇒ τ ⊥⊥_{P^{Z=z}} X      [P(Z=z) > 0, RS-(6.37)]
             ⇔ τ ⊥⊥ X | (Z=z).      [RS-(6.47)]

Proposition (iii).
D_X ⊥⊥ X | Z ⇒ τ ⊥⊥ X | (Z=z)      [(ii)]
             ⇔ τ ⊥⊥_{P^{Z=z}} X      [RS-(6.47)]
             ⇔ ∀ x ∈ X(Ω): τ ⊥⊥_{P^{Z=z}} 1_{X=x}      [(7.12), (7.14)]
             ⇔ ∀ x ∈ X(Ω): τ ⊥⊥ 1_{X=x} | (Z=z).

D_X ⊥⊥ X | Z
⇒ (τ ⊥⊥ X | (Z=z) ∧ ∀ x ∈ X(Ω): τ_x is P^{Z=z}-unique)      [(i), (ii)]
⇒ ∀ x ∈ X(Ω): (τ_x ⊥⊥ X | (Z=z) ∧ τ_x is P^{Z=z}-unique)      [(7.32)]
⇒ ∀ x ∈ X(Ω): E^{Z=z}(Y | X=x) ⊢ D_X.      [σ(Z) ⊂ σ(D_X), (7.35)]

D_X ⊥⊥ X | Z ⇒ ∀ x ∈ X(Ω): E^{Z=z}(Y | X=x) ⊢ D_X      [(iv)]
             ⇔ E^{Z=z}(Y | X) ⊢ D_X.      [Def. 6.18 (ii)]

Proposition (i).
(D_X ⊥⊥ X | Z ∧ ∀ (x, z) ∈ X(Ω) × Z(Ω): P(X=x, Z=z) > 0)
⇔ (D_X ⊥⊥ X | Z ∧ ∀ (x, z) ∈ X(Ω) × Z(Ω): P^{Z=z}(X=x) > 0)      [def. of P^{Z=z}(X=x)]
⇒ (D_X ⊥⊥ X | Z ∧ ∀ x ∈ X(Ω): P(X=x | Z) >_P 0)      [∀ z ∈ Z(Ω): P(Z=z) > 0]
⇒ ∀ x ∈ X(Ω): τ_x is P-unique.      [Th. 8.34]

Proposition (ii).
D_X ⊥⊥ X | Z ⇒ ∀ (x, z) ∈ X(Ω) × Z(Ω): E^{Z=z}(Y | X=x) ⊢ D_X      [Th. 8.37 (iv)]
             ⇔ ∀ (x, z) ∈ X(Ω) × Z(Ω): E^{X=x}(Y | Z=z) ⊢ D_X      [(6.19)]
             ⇒ ∀ x ∈ X(Ω): E^{X=x}(Y | Z) ⊢ D_X.      [(i), Th. 6.17]

Proposition (iii). This follows from Proposition (ii) and Definition 6.18 (i).
Proposition (i).
D_X ⊥⊥ X | (Z=z) ⇔ D_X ⊥⊥_{P^{Z=z}} X      [(8.24)]
                 ⇒ τ ⊥⊥_{P^{Z=z}} X      [σ(τ) ⊂ σ(D_X), RS-Box 2.1 (iv)]
                 ⇔ τ ⊥⊥ X | (Z=z).      [RS-(6.47)]

Proposition (ii).
D_X ⊥⊥ X | (Z=z)
⇒ τ ⊥⊥ X | (Z=z)      [(i)]
⇔ τ ⊥⊥_{P^{Z=z}} X      [RS-(6.47)]
⇒ ∀ x ∈ X(Ω): τ ⊥⊥_{P^{Z=z}} 1_{X=x}      [∀ x ∈ X(Ω): σ(1_{X=x}) ⊂ σ(X), RS-Box 2.1 (iv)]
⇔ ∀ x ∈ X(Ω): τ ⊥⊥ 1_{X=x} | (Z=z).      [RS-(6.47)]

Proposition (iii).
D_X ⊥⊥ X | (Z=z)
⇔ D_X ⊥⊥_{P^{Z=z}} X      [(8.24)]
⇔ ∀ x ∈ X(Ω): P^{Z=z}(X=x | D_X) =_{P^{Z=z}} P^{Z=z}(X=x)      [(8.25)]
⇒ ∀ x ∈ X(Ω): P^{Z=z}(X=x | D_X) >_{P^{Z=z}} 0      [P^{Z=z}(X=x) > 0, SN-(2.40)]
⇔ ∀ x ∈ X(Ω): τ_x is P^{Z=z}-unique.      [RS-Th. 5.27 (i), (ii)]

Propositions (iv) and (v). Now we also assume that Z is a covariate of X. Hence,
D_X ⊥⊥ X | (Z=z) ⇒ τ ⊥⊥ X | (Z=z)      [(i)]
                 ⇒ ∀ x ∈ X(Ω): τ_x ⊥⊥ X | (Z=z)      [(7.32)]
                 ⇒ ∀ x ∈ X(Ω): E^{Z=z}(Y | X=x) ⊢ D_X      [(iii), (7.35)]
                 ⇒ E^{Z=z}(Y | X) ⊢ D_X.      [(iii), (7.36)]
The proof is analogous to the proof of Theorem 8.25. We only have to replace the probability measure P on (Ω, A, P) by the measure P^{Z=z} defined in Assumption 8.1 (e). Also note that

E^{X=x}(Y | D_X) =_P E^{1_{X=x}=1}(Y | D_X)  and  E^{X≠x}(Y | D_X) =_P E^{1_{X=x}=0}(Y | D_X).

Hence, E^{X=x}(Y | D_X) and E^{X≠x}(Y | D_X) are versions of the two true outcome variables pertaining to the indicator variable 1_{X=x} (see Def. 5.4).
Proposition (i).
D_X ⊥⊥ 1_{X=x} | (Z=z)
⇔ D_X ⊥⊥_{P^{Z=z}} 1_{X=x}      [(8.19)]
⇒ E^{X=x}(Y | D_X) ⊥⊥_{P^{Z=z}} 1_{X=x}      [σ(E^{X=x}(Y | D_X)) ⊂ σ(D_X), RS-Box 2.1 (iv)]
⇔ E^{X=x}(Y | D_X) ⊥⊥ 1_{X=x} | (Z=z).      [RS-Prop. (6.47)]

D_X ⊥⊥ 1_{X=x} | (Z=z)
⇔ P^{Z=z}(X=x | D_X) =_{P^{Z=z}} P^{Z=z}(X=x)      [(8.20)]

Analogously,

D_X ⊥⊥ 1_{X=x} | (Z=z)
⇔ D_X ⊥⊥_{P^{Z=z}} 1_{X=x}      [(8.19)]
⇒ E^{X=x}(Y | D_X) ⊥⊥_{P^{Z=z}} 1_{X=x} ∧ E^{X=x}(Y | D_X) is P^{Z=z}-unique      [(i), (iii)]

D_X ⊥⊥ 1_{X=x} | (Z=z)
⇔ D_X ⊥⊥_{P^{Z=z}} 1_{X=x}      [(8.19)]
⇔ D_X ⊥⊥_{P^{Z=z}} 1_{X≠x}      [σ(1_{X=x}) = σ(1_{X≠x}), RS-Def. 2.59]
⇒ E^{X≠x}(Y | D_X) ⊥⊥_{P^{Z=z}} 1_{X≠x} ∧ E^{X≠x}(Y | D_X) is P^{Z=z}-unique      [(i), (iii)]

Proposition (iv). This immediately follows from Propositions (iii) and (6.13).

D_X ⊥⊥ 1_{X=x} ∧ D_X ⊥⊥ 1_{X=x'}
⇒ E(Y | X=x) ⊢ D_X ∧ E(Y | X=x') ⊢ D_X      [Th. 8.25 (iii)]
Proposition (i).
D_X ⊥⊥ 1_{X=x} | Z ∧ D_X ⊥⊥ 1_{X=x'} | Z
⇒ E^{1_{X=x}=1}(Y | Z) ⊢ D_C ∧ E^{1_{X=x'}=1}(Y | Z) ⊢ D_C      [Th. 8.31 (iv), (8.40)]
⇔ E^{X=x}(Y | Z) ⊢ D_X ∧ E^{X=x'}(Y | Z) ⊢ D_X      [P(1_{X=x}=1) = P(X=x), P(1_{X=x'}=1) = P(X=x')]
⇒ PFE_{Z;xx'} ⊢ D_C.      [Th. 6.25]

Proposition (i).
D_X ⊥⊥ 1_{X=x} ∧ D_X ⊥⊥ 1_{X=x'}
⇒ D_X ⊥⊥ 1_{X=x} | Z ∧ D_X ⊥⊥ 1_{X=x'} | Z ∧ τ_x, τ_{x'} are P-unique      [Th. 8.17 (ii), Th. 8.25 (ii)]
⇒ E^{1_{X=x}=1}(Y | Z) ⊢ D_C ∧ E^{1_{X=x'}=1}(Y | Z) ⊢ D_C      [Th. 8.31 (iv)]
⇔ E^{X=x}(Y | Z) ⊢ D_X ∧ E^{X=x'}(Y | Z) ⊢ D_X      [P(1_{X=x}=1) = P(X=x), P(1_{X=x'}=1) = P(X=x')]
⇒ PFE_{Z;xx'} ⊢ D_C.      [Th. 6.25]
Proposition (i).
D_X ⊥⊥ 1_{X=x} | (Z=z) ∧ D_X ⊥⊥ 1_{X=x'} | (Z=z)
⇒ E^{Z=z}(Y | X=x) ⊢ D_X ∧ E^{Z=z}(Y | X=x') ⊢ D_X      [Th. 8.42 (iii)]
⇔ E^{X=x}(Y | Z=z) ⊢ D_X ∧ E^{X=x'}(Y | Z=z) ⊢ D_X      [(6.19)]
⇒ PFE_{Z;xx'}(z) ⊢ D_C.      [Th. 6.26]

Proposition (ii).
D_X ⊥⊥ 1_{X=x} | Z ∧ D_X ⊥⊥ 1_{X=x'} | Z
⇒ D_X ⊥⊥ 1_{X=x} | (Z=z) ∧ D_X ⊥⊥ 1_{X=x'} | (Z=z)      [RS-(6.37), RS-(6.47)]
⇒ PFE_{Z;xx'}(z) ⊢ D_C.      [(i)]

Proposition (iii).
D_X ⊥⊥ X | Z
⇒ D_X ⊥⊥ 1_{X=x} | Z ∧ D_X ⊥⊥ 1_{X=x'} | Z      [σ(1_{X=x}), σ(1_{X=x'}) ⊂ σ(X), RS-Box 6.1 (vi)]
⇒ PFE_{Z;xx'}(z) ⊢ D_C.      [(ii)]

Proposition (iv).
D_X ⊥⊥ X | (Z=z)
⇔ D_X ⊥⊥_{P^{Z=z}} X
⇒ D_X ⊥⊥_{P^{Z=z}} 1_{X=x} ∧ D_X ⊥⊥_{P^{Z=z}} 1_{X=x'}      [σ(1_{X=x}), σ(1_{X=x'}) ⊂ σ(X), RS-Box 2.1 (iv)]
⇔ D_X ⊥⊥ 1_{X=x} | (Z=z) ∧ D_X ⊥⊥ 1_{X=x'} | (Z=z)      [RS-(6.47)]

Proposition (i).
D_X ⊥⊥ 1_{X=x} ∧ D_X ⊥⊥ 1_{X=x'}
⇒ E^{Z=z}(Y | X=x) ⊢ D_X ∧ E^{Z=z}(Y | X=x') ⊢ D_X      [Th. 8.29 (iv)]
⇒ PFE_{Z;xx'}(z) ⊢ D_C.      [Th. 6.26]
8.9 Exercises
⊲ Exercise 8-2 Check if and where the implications listed in Table 8.1 have been proven in this
chapter and prove those that have not. Use and specify the appropriate choice of assumptions listed
in the Assumptions 8.1.
⊲ Exercise 8-3 Check if and where the implications listed in Table 8.2 have been proven in this
chapter and prove those that have not. Use and specify the appropriate choice of assumptions listed
in the Assumptions 8.1.
⊲ Exercise 8-4 Consider Table 8.3 and show that P(X=1|U)(ω) = P(X=1|Z)(ω), for all ω ∈ {Z=m} ∪ {Z=f}, implies P(X=1|U) =_P P(X=1|Z).
⊲ Exercise 8-5 Consider Table 8.3 and specify the four values of a second version of the U -condi-
tional expectation of Y with respect to the measure P X =1.
⊲ Exercise 8-6 Let the Assumptions 8.1 (a) to (c) and (g) hold and let Z be a covariate of X. Which terms are then unbiased if D_X ⊥⊥ X holds?

⊲ Exercise 8-7 Let the Assumptions 8.1 (a) to (d) and (g) hold. Which terms are then unbiased if we also assume D_X ⊥⊥ X | Z and P(X=x | Z) >_P 0?
P(X=x | W) =_P P(X=x),

provided that the random variable W is measurable with respect to D_X. Assume that P(X=x) > 0.
⊲ Exercise 8-9 Compute the conditional expectation value E (Y |X =0) and the expectation E (τ0 ) in
the example presented in Table 6.4 and compare these numbers to each other.
⊲ Exercise 8-10 Describe randomized assignment of a unit to one of two treatment conditions!
⊲ Exercise 8-11 Describe conditionally randomized assignment of a unit to one of two treatment
conditions given a covariate Z ! For simplicity, assume that Z is finite with P(Z=z) > 0 for all its
values z ∈ Z (Ω) and that X is dichotomous with values 0 and 1.
P(X=x | Z, W) =_P P(X=x | Z),

if the random variable W is measurable with respect to D_X. Assume that P(X=x) > 0.
Solutions
⊲ Solution 8-1 Definitions 4.11 (iv) and (iii) of a potential confounder and a global potential con-
founder of X imply σ(W ) ⊂ σ(D X ). Therefore, RS-Proposition (6.5) yields the proposition.
⊲ Solution 8-2

(1) Under the Assumptions 8.1 (a), (b), (d), and (e): D_X ⊥⊥ X | (Z=z) ⇒ D_X ⊥⊥ 1_{X=x} | (Z=z).
D_X ⊥⊥ X | (Z=z) ⇔ D_X ⊥⊥_{P^{Z=z}} X      [(8.24)]
                 ⇒ D_X ⊥⊥_{P^{Z=z}} 1_{X=x}      [(8.6)]
                 ⇔ D_X ⊥⊥ 1_{X=x} | (Z=z).      [(8.19)]

(2) Under the Assumptions 8.1 (a), (b), (d), and (e): D_X ⊥⊥ 1_{X=x} | Z ⇒ D_X ⊥⊥ 1_{X=x} | (Z=z).
This immediately follows from RS-Propositions (6.47) and (6.48).

(3) Under the Assumptions 8.1 (a), (b), (d), and (e): D_X ⊥⊥ X | Z ⇒ D_X ⊥⊥ 1_{X=x} | (Z=z).
D_X ⊥⊥ X | Z ⇒ D_X ⊥⊥ 1_{X=x} | Z      [σ(1_{X=x}) ⊂ σ(X), RS-Box 6.1 (vi)]
             ⇒ D_X ⊥⊥ 1_{X=x} | (Z=z).      [(2)]
(6) Under the Assumptions 8.1 (a), (b), (d), (e), and (h): D_X ⊥⊥ 1_{X=x} ⇒ D_X ⊥⊥ 1_{X=x} | (Z=z).
D_X ⊥⊥ 1_{X=x} ⇒ D_X ⊥⊥ 1_{X=x} | Z      [(8.30)]
               ⇒ D_X ⊥⊥ 1_{X=x} | (Z=z).      [(2)]

(8) Under the Assumptions 8.1 (a), (b), (d), (e), and (h): D_X ⊥⊥ X ⇒ D_X ⊥⊥ 1_{X=x} | (Z=z).
D_X ⊥⊥ X ⇒ D_X ⊥⊥ 1_{X=x}      [(8.27)]
         ⇒ D_X ⊥⊥ 1_{X=x} | (Z=z).      [(6)]
(9) Under the Assumptions 8.1 (a), (d), (e), and (h): D_X ⊥⊥ X ⇒ D_X ⊥⊥ X | (Z=z).
This is Proposition (8.36).

(10) Under the Assumptions 8.1 (a), (b), and (h): D_X ⊥⊥ X ⇒ D_X ⊥⊥ 1_{X=x} | Z.
This is Proposition (8.31).

(11) Under the Assumptions 8.1 (a) and (h): D_X ⊥⊥ X ⇒ D_X ⊥⊥ X | Z.
This is Proposition (8.29).

(12) Under the Assumptions 8.1 (a) and (b): D_X ⊥⊥ X ⇒ D_X ⊥⊥ 1_{X=x}.
This is Proposition (8.27).
⊲ Solution 8-3

(1) Under the Assumptions 8.1 (a) to (e) and (g): D_X ⊥⊥ 1_{X=x} | (Z=z) ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
This is Theorem 8.41 (i) for 1_{X=x} taking the role of X.

(2) Under the Assumptions 8.1 (a) to (e) and (g): D_X ⊥⊥ X | (Z=z) ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
This immediately follows from Theorem 8.41 (ii).

(3) Under the Assumptions 8.1 (a) to (e) and (g): D_X ⊥⊥ X | (Z=z) ⇒ τ ⊥⊥ X | (Z=z).
This is Theorem 8.41 (i).

(4) Under the Assumptions 8.1 (a) to (e) and (g): D_X ⊥⊥ 1_{X=x} | Z ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
This is Theorem 8.37 (ii) for 1_{X=x} taking the role of X.

(5) Under the Assumptions 8.1 (a) to (d) and (g): D_X ⊥⊥ 1_{X=x} | Z ⇒ τ ⊥⊥ 1_{X=x} | Z.
This is Theorem 8.31 (i) for 1_{X=x} taking the role of X.

(6) Under the Assumptions 8.1 (a) to (g): D_X ⊥⊥ X | Z ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
This immediately follows from Theorem 8.37 (iii).

(7) Under the Assumptions 8.1 (a) to (e) and (g): D_X ⊥⊥ X | Z ⇒ τ ⊥⊥ X | (Z=z).
This is Theorem 8.37 (ii).

(8) Under the Assumptions 8.1 (a) to (d) and (g): D_X ⊥⊥ X | Z ⇒ τ ⊥⊥ 1_{X=x} | Z.
This is Theorem 8.31 (ii).

(9) Under the Assumptions 8.1 (a) to (d) and (g): D_X ⊥⊥ X | Z ⇒ τ ⊥⊥ X | Z.
This is Theorem 8.31 (i).

(10) Under the Assumptions 8.1 (a) to (g) and (h): D_X ⊥⊥ 1_{X=x} ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
This is Theorem 8.29 (ii) for 1_{X=x} taking the role of X.

(11) Under the Assumptions 8.1 (a) to (d), (g), and (h): D_X ⊥⊥ 1_{X=x} ⇒ τ ⊥⊥ 1_{X=x} | Z.
This is Theorem 8.27 (i) for 1_{X=x} taking the role of X.

(12) Under the Assumptions 8.1 (a), (c), and (g): D_X ⊥⊥ 1_{X=x} ⇒ τ ⊥⊥ 1_{X=x}.
This is Theorem 8.22 (i) for 1_{X=x} taking the role of X.

(13) Under the Assumptions 8.1 (a) to (g) and (h): D_X ⊥⊥ X ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
This immediately follows from Theorem 8.29 (iii).

(14) Under the Assumptions 8.1 (a) to (g) and (h): D_X ⊥⊥ X ⇒ τ ⊥⊥ X | (Z=z).
This is Theorem 8.29 (ii).

(15) Under the Assumptions 8.1 (a) to (d), (g), and (h): D_X ⊥⊥ X ⇒ τ ⊥⊥ 1_{X=x} | Z.
This immediately follows from Theorem 8.27 (ii).

(16) Under the Assumptions 8.1 (a) to (d), (g), and (h): D_X ⊥⊥ X ⇒ τ ⊥⊥ X | Z.
This is Theorem 8.27 (i).

(17) Under the Assumptions 8.1 (a), (c), and (g): D_X ⊥⊥ X ⇒ τ ⊥⊥ 1_{X=x}.
This immediately follows from Theorem 8.22 (ii).

(18) Under the Assumptions 8.1 (a), (c), and (g): D_X ⊥⊥ X ⇒ τ ⊥⊥ X.
This is Theorem 8.22 (i).
⊲ Solution 8-4 In Example 8.33 we specified the regular causality setup to be the same as in section
6.5. Hence, Ω = ΩU × ΩX × R [see Eq. (6.74)]. Furthermore, according to Equation (8.39), the values
of P(X =1|U ) and P(X =1|Z ) are identical for all elements of the set
{Z =m} ∪ {Z = f } = { Joe , Jim } × ΩX × R ∪ {Ann ,Sue } × ΩX × R = Ω.
Hence, the set A used in RS-Definition 2.46 is Ø, the empty set. Because P(Ø) = 0 and
∀ ω ∈ Ω \ Ø: P(X =1|U )(ω) = P(X =1|Z )(ω)
holds [see RS-Eq. (2.61)], this proves P(X=1|U) =_P P(X=1|Z).
⊲ Solution 8-5 The four values of a second version τ*_1 of the U-conditional expectation of Y with respect to the measure P^{X=1} are

τ*_1(ω) = 1000, if ω ∈ {U=Joe}
τ*_1(ω) = 2000, if ω ∈ {U=Jim}
τ*_1(ω) = 114, if ω ∈ {U=Ann}
τ*_1(ω) = 130, if ω ∈ {U=Sue}.

Note that, instead of 1000 and 2000, we could have chosen any other two numbers. The crucial point is: τ*_1 =_{P^{X=1}} τ_1 because P^{X=1}(U=Joe) = P^{X=1}(U=Jim) = 0 (see Table 8.3).
⊲ Solution 8-6 Under these assumptions D X ⊥ ⊥X implies that E (Y |X ), E (Y |X, Z ), all E (Y |X =x ),
and all E X =x (Y |Z ), x ∈ X (Ω), are unbiased. This also implies that the prima facie effects PFE x x ′ and
the prima facie effect variables PFE Z ; x x ′ (Z ), x, x ′ ∈ X (Ω), are unbiased as well.
⊲ Solution 8-7 Under these assumptions D X ⊥ ⊥X |Z implies that E (Y |X, Z ) and all E X =x (Y |Z ),
x ∈ X (Ω), are unbiased. It also implies that the prima facie effect variables PFE Z ; x x ′ (Z ), x, x ′ ∈ X (Ω),
are unbiased as well. Also note that the causal average total effects ATE x x ′ can be computed from
PFE Z ; x x ′ (Z ) (see Th. 6.34).
⊲ Solution 8-9 According to RS-Box 3.2 (ii), the conditional expectation value E(Y | X=0) can be computed via

E(Y | X=0) = Σ_u E(Y | X=0, U=u) · P(U=u | X=0)
           = (68 + 78 + 88 + 98) · 1/10 + (106 + 116) · 3/10 = 99.8.

In contrast, according to RS-(3.13) and Equations (6.77), (6.78), the expectation E(τ_0) can be computed via

E(τ_0) = E[E^{X=0}(Y | U)] = E(g_0(U)) = Σ_u g_0(u) · P(U=u)
       = Σ_u E(Y | X=0, U=u) · P(U=u)
       = (68 + 78 + 88 + 98 + 106 + 116) · 1/6 = 92.3333.

Comparing E(Y | X=0) = 99.8 to E(τ_0) = 92.3333 shows that E(Y | X=0) is strongly biased [see Def. 6.3 (i)].
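The two computations can be reproduced mechanically. The following sketch simply re-does the arithmetic of Solution 8-9 with exact fractions, using the numbers given there.

```python
from fractions import Fraction

# Re-doing the arithmetic of Solution 8-9 with the numbers used there.
e_y_given_x0_u = [68, 78, 88, 98, 106, 116]                     # E(Y | X=0, U=u) per unit
p_u_given_x0 = [Fraction(1, 10)] * 4 + [Fraction(3, 10)] * 2    # P(U=u | X=0)
p_u = [Fraction(1, 6)] * 6                                      # P(U=u)

e_y_x0 = sum(e * p for e, p in zip(e_y_given_x0_u, p_u_given_x0))
e_tau0 = sum(e * p for e, p in zip(e_y_given_x0_u, p_u))
print(float(e_y_x0))   # 99.8
print(float(e_tau0))   # 92.333...
```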
⊲ Solution 8-10 In a random experiment, in which a unit u is sampled and assigned to one of two treatment conditions, we may assign the unit by coin toss, for instance. This ensures

P(X=1 | D_X) =_P P(X=1),

that is, the treatment probabilities do not depend on the global potential confounder D_X of X [see Prop. (8.5)] and therefore not on any potential confounder of X (see Rem. 8.61). If D_X is specified such that U is measurable with respect to D_X, then P(X=1 | D_X) =_P P(X=1) implies that each unit u has the same probability P(X=1 | U=u) = P(X=1) of being assigned to treatment 1. If we assume that X is dichotomous with values 0 and 1, then this also implies that each unit u has the same probability P(X=0) to be assigned to treatment 0 as well because

P(X=0 | U=u) = 1 − P(X=1 | U=u) = 1 − P(X=1) = P(X=0).
⊲ Solution 8-11 We consider a random experiment, in which a unit u is sampled and a value z of
the covariate Z is assessed before the unit is assigned to one of the two treatment conditions. We
also assume P(Z =z ) > 0 for all values z ∈ Z (Ω) and 0 < P(X =1| Z ) < 1. Then Z -conditional random-
ized assignment of a unit to one of the two treatment conditions refers to assigning the unit u to
treatment 1 with probability P(X=1 | Z=z), where these conditional treatment probabilities are fixed by the experimenter and may differ between the values z of the covariate Z.
⊲ Solution 8-12

E(P(X=x | U)) = E(E(1_{X=x} | U))      [RS-(4.10)]
              = E(1_{X=x})             [RS-Box 4.1 (iv)]
              = P(X=x).                [RS-(3.9)]
9 Suppes-Reichenbach Conditions

Requirements
Reading this chapter we assume that the reader is familiar with the concepts treated in
all chapters of Steyer (2024). Again, chapters 4 to 6 of that book are now crucial, dealing
with the concepts of a conditional expectation, a conditional expectation with respect to a
conditional probability measure, and conditional independence. Furthermore, we assume
familiarity with chapters 4 to 8 of the present book.
In this chapter we often refer to the following notation and assumptions.
(d) For x ∈ X (Ω), let τx = E X =x (Y |D X ) denote a (version of the) true outcome variable
of Y given (the value) x (of X ).
(e) For all x ∈ X (Ω) = {0, 1, . . . , J }, let {x } ∈ AX′ and assume 0 < P (X =x ) < 1. Further-
more, let τ := (τ0 , τ1 , . . . , τJ ) denote the (J +1)-variate random variable consisting
of the true outcome variables τx , x ∈ X (Ω).
(f) Let x' ∈ Ω'_X and {x'} ∈ A'_X, let 1_{X=x'} denote the indicator of the event {X=x'} = {ω ∈ Ω: X(ω) = x'} and assume that 0 < P(X=x') < 1. Furthermore, let τ_{x'} = E^{X=x'}(Y | D_X) denote a (version of the) true outcome variable of Y given x'.
(g) Let Z be a random variable on (Ω, A, P ) and let (ΩZ′ , AZ′ ) denote its value space.
(h) Let Z be a covariate of X , that is, let Z be a random variable on (Ω, A, P ) satisfy-
ing σ(Z ) ⊂ σ(D X ).
(i) Assume that τx is P-unique.
(j) Assume that all τx , x ∈ X (Ω), are P-unique.
(k) Assume that τx ′ is P-unique.
9.1 SR-Conditions
Box 9.1 presents all eight Suppes-Reichenbach conditions discussed in this chapter. We
start commenting on those conditions that only involve conditioning on X or on one of its
values x, but not on another random variable Z .
Remark 9.2 [Independence of Y and D X With Respect to P X =x ] The very first condition
displayed in Box 9.1 (i), Y ⊥⊥D X |(X =x ), is equivalent to independence of Y and D X with
respect to the measure P X =x . This measure has been specified in Assumptions 9.1 (b).
Hence,
Y ⊥⊥ D_X | (X=x) ⇔ Y ⊥⊥_{P^{X=x}} D_X,      (9.1)

where

Y ⊥⊥_{P^{X=x}} D_X :⇔ ∀ (A, B) ∈ σ(Y) × σ(D_X): P^{X=x}(A ∩ B) = P^{X=x}(A) · P^{X=x}(B).      (9.2)
This implies that we can utilize all properties of independence of two random variables
with respect to a probability measure. In this case, it is the measure P X =x instead of P .
Box 9.1  The Suppes-Reichenbach (SR) conditions.

Y ⊥⊥ D_X | (X=x)    (X=x)-conditional independence of Y and D_X. Under Assumptions 9.1 (a) and (b), it is defined by
    ∀ (A, B) ∈ σ(Y) × σ(D_X): P(A ∩ B | X=x) = P(A | X=x) · P(B | X=x).      (i)

(X=x)-conditional mean-independence of Y from D_X (see Rem. 9.3), defined by
    E^{X=x}(Y | D_X) =_{P^{X=x}} E^{X=x}(Y).      (ii)

Under Assumptions 9.1 (a) to (d), each of the above two conditions implies E(Y | X=x) ⊢ D_X. If, additionally, Z is a covariate of X, then each of them also implies E^{X=x}(Y | Z) ⊢ D_X.

Y ⊥⊥ D_X | X    X-conditional independence of Y and D_X. Under Assumptions 9.1 (a), it is defined by
    ∀ (A, B) ∈ σ(Y) × σ(D_X): P(A ∩ B | X) =_P P(A | X) · P(B | X).      (iii)

X-conditional mean-independence of Y from D_X (see Rem. 9.5), defined by
    E(Y | X, D_X) =_P E(Y | X).      (iv)

Under Assumptions 9.1 (a) to (e), each of the last two conditions implies E(Y | X) ⊢ D_X. If, additionally, Z is a covariate of X, then each of them also implies E(Y | X, Z) ⊢ D_X.

Y ⊥⊥ D_X | (X=x, Z)    (X=x, Z)-conditional independence of Y and D_X. Under Assumptions 9.1 (a), (b), and (g), it is defined by
    Y ⊥⊥_{P^{X=x}} D_X | Z.      (v)

(X=x, Z)-conditional mean-independence of Y from D_X, defined by
    E^{X=x}(Y | Z, D_X) =_{P^{X=x}} E^{X=x}(Y | Z).      (vi)

Under Assumptions 9.1 (a) to (d) and if Z is a covariate of X, then each of the last two conditions implies E^{X=x}(Y | Z) ⊢ D_X.

Y ⊥⊥ D_X | (X, Z)    (X, Z)-conditional independence of Y and D_X. Under Assumptions 9.1 (a) and (g), it is defined by
    ∀ (A, B) ∈ σ(Y) × σ(D_X): P(A ∩ B | X, Z) =_P P(A | X, Z) · P(B | X, Z).      (vii)

(X, Z)-conditional mean-independence of Y from D_X, defined by
    E(Y | X, Z, D_X) =_P E(Y | X, Z).      (viii)

Under Assumptions 9.1 (a) to (e) and that Z is a covariate of X, each of the last two conditions implies E(Y | X, Z) ⊢ D_X and E^{X=x}(Y | Z) ⊢ D_X for all x ∈ X(Ω).
Note that the condition Y ⊥⊥D X |(X =x ) does not apply if X is continuous because we as-
sume P (X =x ) > 0 [see Assumptions 9.1 (b) and RS-Rem. 2.77]. Independence of random
variables (with respect to a probability measure) and some of its properties have been
treated in RS-section 2.4. (More details are provided, for example, in SN-chapter 16. In
particular, equivalent conditions for Y ⊥⊥D X |X in terms of conditional distributions are
found in SN-section 17.6.) ⊳
Remark 9.3 [Mean-Independence of Y From D X With Respect to P^{X=x}] The causality condition defined in Box 9.1 (ii), that is, (X=x)-conditional mean-independence of Y from D_X, is equivalent to mean-independence of Y from D_X with respect to the measure P^{X=x} (see RS-Def. 4.36). Again note that this condition does not apply if X is continuous because it is defined only if P(X=x) > 0. ⊳
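As a rough numerical sketch of these two conditions (not from the book), the following Python snippet builds a small finite example in which an invented variable U stands in for a D_X-measurable potential confounder; all probability tables are made up. It checks condition (i) via (9.2) and condition (ii) by comparing E^{X=x}(Y | U=u) with E^{X=x}(Y).

```python
# Hypothetical finite example (not from the book): U stands in for a
# D_X-measurable potential confounder; all probability tables are invented.
# We check condition (i) via (9.2) and condition (ii) of Box 9.1 for x = 1.

P_U = {0: 0.6, 1: 0.4}                                    # P(U=u)
P_X_given_U = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # u -> {x: P(X=x | U=u)}
P_Y1_given_XU = {(0, 0): 0.2, (0, 1): 0.5,                # (x, u) -> P(Y=1 | X=x, U=u)
                 (1, 0): 0.4, (1, 1): 0.9}

def p(u, x, y):
    """Joint probability P(U=u, X=x, Y=y) of an elementary event."""
    py1 = P_Y1_given_XU[(x, u)]
    return P_U[u] * P_X_given_U[u][x] * (py1 if y == 1 else 1.0 - py1)

def P_x(event, x):
    """P^{X=x}(event) for an event given as a predicate on (u, y)."""
    num = sum(p(u, x, y) for u in (0, 1) for y in (0, 1) if event(u, y))
    den = sum(p(u, x, y) for u in (0, 1) for y in (0, 1))
    return num / den

x = 1
# Condition (i), via (9.2): is P^{X=x}(Y=1, U=1) equal to P^{X=x}(Y=1) * P^{X=x}(U=1)?
lhs = P_x(lambda u, y: y == 1 and u == 1, x)
rhs = P_x(lambda u, y: y == 1, x) * P_x(lambda u, y: u == 1, x)
print("condition (i):", abs(lhs - rhs) < 1e-12, round(lhs, 4), round(rhs, 4))

# Condition (ii): is E^{X=x}(Y | U=u) equal to E^{X=x}(Y) for u = 0, 1?
E_Y_given_U = {u: P_x(lambda v, y: y == 1 and v == u, x) / P_x(lambda v, y: v == u, x)
               for u in (0, 1)}
E_Y = P_x(lambda u, y: y == 1, x)
print("condition (ii):", {u: round(e, 4) for u, e in E_Y_given_U.items()}, "vs", round(E_Y, 4))
```

With these made-up numbers both checks fail, which illustrates that the SR-conditions are substantive assumptions about the joint distribution rather than automatic facts.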
Remark 9.4 [X -Conditional Independence of Y and D X ] The third causality condition
defined in Box 9.1 (iii) is X-conditional independence of the outcome variable Y and D X ,
denoted by Y ⊥⊥ D_X | X. This condition implies that the distribution of the outcome vari-
able Y does not depend on the global potential confounder D X , once we condition on
the putative cause variable X . That is, we implicitly postulate that P Y | X is a version of the
conditional distribution P Y | X , D X (see SN-ch. 17). This condition also applies if X is contin-
uous. More details about conditional independence are found in RS-chapter 6 and about
conditional distributions in SN-chapter 16. ⊳
Remark 9.5 [X-Conditional Mean-Independence of Y From D X ] The next causality condition, defined in Box 9.1 (iv), is X-conditional mean-independence of Y from D_X. With this condition we postulate that the (X, D_X)-conditional expectation of Y does not depend on the global potential confounder D_X of X, once we condition on X. That is, we postulate that E(Y |X) is a version of the conditional expectation E(Y | X, D_X) (see RS-ch. 4). Again, note that this condition also applies if X is continuous. ⊳
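To illustrate the check behind Box 9.1 (iv), here is a minimal hypothetical sketch in which an invented variable U stands in for a D_X-measurable confounder and both conditional-probability tables are made up. For a dichotomous outcome, E(Y | X=x, U=u) = P(Y=1 | X=x, U=u), so the same comparison also checks the X-conditional independence condition (iii) in this special case.

```python
# Hypothetical check of Box 9.1 (iv): X-conditional mean-independence of Y from
# D_X requires E(Y | X, U) not to vary with U (U stands in for D_X here, all
# numbers are invented, and every (x, u) combination has positive probability).
# For a 0/1 outcome, E(Y | X=x, U=u) = P(Y=1 | X=x, U=u), so this is also a
# check of the X-conditional independence condition (iii) in this special case.

confounded   = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # (x, u) -> P(Y=1 | X=x, U=u)
unconfounded = {(0, 0): 0.3, (0, 1): 0.3, (1, 0): 0.7, (1, 1): 0.7}

def condition_iv_holds(table):
    """True iff E(Y | X=x, U=u) does not depend on u for every x."""
    return all(table[(x, 0)] == table[(x, 1)] for x in (0, 1))

print("confounded table:   condition (iv) holds ->", condition_iv_holds(confounded))
print("unconfounded table: condition (iv) holds ->", condition_iv_holds(unconfounded))
```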
Remark 9.6 [A Caveat Concerning Mediators] Note that none of the causality conditions
treated so far excludes that mediators (and other variables that are between X and Y ) de-
termine the distribution of Y. For example, Y ⊥⊥ D_X | X only implies that the conditional
distribution P Y | X is a version of the conditional distribution P Y |X ,W whenever W is a po-
tential confounder of X . Remember, a potential confounder of X is D X -measurable and
therefore prior or simultaneous in (Ft )t ∈T to X (see Cor. 4.33). ⊳
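The caveat of Remark 9.6 can be illustrated by a toy computation with invented numbers: X is randomized, so there is no pre-treatment confounding at all, yet a mediator M between X and Y still determines the distribution of Y.

```python
# Hypothetical illustration of Remark 9.6 (all numbers invented): X is randomized,
# so there is no pre-treatment confounding, yet the mediator M between X and Y
# determines the distribution of Y, i.e. P(Y=1 | X, M) differs from P(Y=1 | X).

P_X = {0: 0.5, 1: 0.5}                                    # randomized treatment
P_M_given_X = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # x -> {m: P(M=m | X=x)}
P_Y1_given_M = {0: 0.1, 1: 0.6}                           # Y depends on X only through M

def p(x, m, y):
    """Joint probability P(X=x, M=m, Y=y)."""
    py1 = P_Y1_given_M[m]
    return P_X[x] * P_M_given_X[x][m] * (py1 if y == 1 else 1.0 - py1)

def P_Y1_given(x, m=None):
    """P(Y=1 | X=x) if m is None, else P(Y=1 | X=x, M=m)."""
    ms = (0, 1) if m is None else (m,)
    num = sum(p(x, mm, 1) for mm in ms)
    den = sum(p(x, mm, y) for mm in ms for y in (0, 1))
    return num / den

for x in (0, 1):
    print(f"P(Y=1 | X={x}) = {P_Y1_given(x):.2f},",
          f"P(Y=1 | X={x}, M=0) = {P_Y1_given(x, 0):.2f},",
          f"P(Y=1 | X={x}, M=1) = {P_Y1_given(x, 1):.2f}")
```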
In the following theorem we present a condition that is equivalent to X-conditional mean-independence of Y from D_X, provided that P(X=x) > 0 for all values x of X.
Remark 9.8 [Constant True Outcome Variables] Let the Assumptions 9.1 (a) to (e) hold, which includes that all τ_x, x ∈ X(Ω), are P-unique. Then X-conditional mean-independence of Y from D_X [Box 9.1 (iv)] is equivalent to
$$\tau \underset{P}{=} \bigl(E^{X=0}(Y),\, E^{X=1}(Y),\, \ldots,\, E^{X=J}(Y)\bigr), \qquad (9.6)$$
that is, to
$$\tau_x \underset{P}{=} E^{X=x}(Y), \quad \forall\, x \in X(\Omega) = \{0, 1, \ldots, J\}. \qquad (9.7)$$
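The following hypothetical sketch illustrates (9.6) and (9.7) numerically, again with an invented variable U standing in for D_X: if E(Y | X=x, U) does not vary with U, each τ_x is constant and equals E^{X=x}(Y); otherwise τ_x is a genuine random variable.

```python
# Hypothetical sketch of (9.6)/(9.7): tau_x as a function of u (U stands in for
# D_X) versus the number E^{X=x}(Y) = E(Y | X=x). All numbers are invented.

P_U = {0: 0.6, 1: 0.4}
P_X_given_U = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}

def report(P_Y1_given_XU, label):
    def p(u, x, y):
        py1 = P_Y1_given_XU[(x, u)]
        return P_U[u] * P_X_given_U[u][x] * (py1 if y == 1 else 1.0 - py1)
    for x in (0, 1):
        tau_x = {u: P_Y1_given_XU[(x, u)] for u in (0, 1)}   # tau_x(u) = E(Y | X=x, U=u)
        E_x_Y = (sum(p(u, x, 1) for u in (0, 1)) /
                 sum(p(u, x, y) for u in (0, 1) for y in (0, 1)))  # E^{X=x}(Y)
        print(f"{label}: tau_{x} = {tau_x},  E^(X={x})(Y) = {E_x_Y:.3f}")

report({(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}, "confounded  ")
report({(0, 0): 0.3, (0, 1): 0.3, (1, 0): 0.7, (1, 1): 0.7}, "unconfounded")
```

In the unconfounded specification the τ_x are constant and coincide with E^{X=x}(Y), in line with (9.7); in the confounded one they do not.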
Remark 9.9 [(X=x, Z)-Conditional Independence of Y and D X ] The first one, which is defined in Box 9.1 (v), is called (X=x, Z)-conditional independence of Y and D_X, and denoted Y ⊥⊥ D_X | (X=x, Z) or Y ⊥⊥_{P^{X=x}} D_X | Z. Hence,
$$Y \perp\!\!\!\perp D_X \mid (X{=}x, Z) \;\;\Leftrightarrow\;\; Y \underset{P^{X=x}}{\perp\!\!\!\perp} D_X \mid Z, \qquad (9.8)$$
that is, the two symbols denote the same kind of Z-conditional independence of Y from D_X given a value x of X. ⊳
If Z is a covariate of X, then σ(Z) ⊂ σ(D_X) and hence
$$E^{X=x}(Y \mid D_X) \underset{P^{X=x}}{=} E^{X=x}(Y \mid Z, D_X). \qquad (9.9)$$
Furthermore, (X, Z)-conditional mean-independence of Y from D_X is then equivalent to
$$E(Y \mid X, D_X) \underset{P}{=} E(Y \mid X, Z). \qquad (9.14)$$
⊳
In the following theorem we present a condition that is equivalent to (X, Z)-conditional mean-independence of Y from D_X if X is discrete. This theorem is a generalization of Theorem 9.7.
Remark 9.15 [True Outcome Variables as Functions of Z ] Under the Assumptions 9.1 (a) to (e), which include P-uniqueness of the true outcome variables τ_x, x ∈ X(Ω), (X, Z)-conditional mean-independence of Y from D_X is equivalent to
$$\tau \underset{P}{=} \bigl(E^{X=0}(Y \mid Z),\, E^{X=1}(Y \mid Z),\, \ldots,\, E^{X=J}(Y \mid Z)\bigr). \qquad (9.16)$$
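As a final hypothetical sketch of (9.16), take Z = U as a covariate of X in the same kind of invented finite setup; in this simple world conditioning on (Z, D_X) amounts to conditioning on Z alone, so each τ_x coincides with the function z ↦ E^{X=x}(Y | Z=z).

```python
# Hypothetical sketch of (9.16) with invented numbers: Z = U is a covariate of X,
# and in this simple world conditioning on (Z, D_X) amounts to conditioning on Z
# alone, so tau_x coincides with the function z -> E^{X=x}(Y | Z=z).

P_U = {0: 0.6, 1: 0.4}
P_X_given_U = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_Y1_given_XU = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}

def p(u, x, y):
    """Joint probability P(U=u, X=x, Y=y)."""
    py1 = P_Y1_given_XU[(x, u)]
    return P_U[u] * P_X_given_U[u][x] * (py1 if y == 1 else 1.0 - py1)

def E_x_Y_given_Z(x, z):
    """E^{X=x}(Y | Z=z) = P(Y=1 | X=x, U=z), computed from the joint distribution."""
    num = sum(p(z, x, y) * y for y in (0, 1))
    den = sum(p(z, x, y) for y in (0, 1))
    return num / den

tau = {x: {z: round(E_x_Y_given_Z(x, z), 3) for z in (0, 1)} for x in (0, 1)}
print("tau = (tau_0, tau_1) as functions of z:", tau)
```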