
Rolf Steyer

Probability and Causality

Conditional and Average Total Effects

Version: November 15, 2024

University of Jena
Preface

What can we do to reduce global warming? How can we prevent another global financial
crisis? How to fight AIDS? What is the effect of being infected by a virus on dying within
a fixed amount of time? And which political measures can save lives in a specific virus
pandemic? These and similar questions ask about causal effects of political and social in-
terventions, of medical or psychological treatments, or of expositions to potentially harm-
ful or beneficial expositions, for example, to a virus or to asbestos dust. Obviously, inter-
ventions based on the wrong causal theories and hypotheses will cost the lives of thou-
sands, huge amounts of money that could be spent more appropriately, and fundamen-
tally change our lives. Even if our daily problems beyond these issues are less dramatic,
they are of the same nature. Just think about your own actions that you have to chose in
your responsibilities as a student, scientist, teacher, physician, psychologist, politician, or
as a parent! Whatever you do has direct, indirect, and total effects, and these effects might
be different if you take one action instead of another one. Furthermore, on the individual
level, these effects differ for different persons. The effects of being infected by a virus may
be much more severe for persons above 80 years of age if they already have severe health
problems as compared to younger persons or those who do not have such problems. It is
these kinds of thoughts that make me believe that there is no other issue in the methodol-
ogy of empirical sciences that deserves and needs more attention and effort than causality.
And because the dependencies we are investigating are of a nondeterministic nature, we
need a probabilistic theory of causality. In other words, we need to understand probability
and causality.

Why Another Book on Causality?

The simple reason is that I wanted to understand the concepts of causality that are and should be used in our theories in science and in practical life. Existing theories and their answers never satisfied my own standards of what science should be. For me, ‘understanding’ includes being able to define all relevant concepts in terms of mathematics. None of the mainstream theories satisfies this criterion, neither Rubin’s potential outcome approach nor Pearl’s graphical modeling approach. Although these pioneers contributed tremendously to our understanding and to the popularity of the issue of causality in the empirical sciences, their approaches lack mathematical rigor, which opens the door to misunderstandings and to endless and all too often fruitless discussions.
This book presents a mathematical theory of causality. All terms are well-defined in
terms of measure and probability theory, and all relevant propositions have a mathemati-
cal proof. This does not exclude that I have made some mistakes. However, it is now possible to find and correct them wherever they may have occurred. And, of course, the theory is still far from being complete.

What This Book is About

Empirical causal research involves several inferences and interpretations. Among these
are:

(a) Statistical inference, that is, the inference from a data sample to parameters charac-
terizing the distributions of random variables.
(b) Causal inference, that is, the inference from parameters characterizing the distribu-
tions of random variables to causal effects and/or dependencies.
(c) The substantive interpretation or meaning of the putative cause.
(d) The substantive interpretation or meaning of the outcome variable.
(e) The specification of the random experiment considered.

This book does not deal with all these points. We will neither discuss the mathematics of
statistical inference nor the content issues of construct validity or external validity (Camp-
bell & Stanley, 1963; Cook & Campbell, 1979; Shadish, Cook, & Campbell, 2002) involved
in points (c) to (e). Instead we will focus on the second point: causal inference, that is,
the inference from true parameters (i. e., not their estimates) to causal effects. Parameters
characterizing the distributions of random variables such as the conditional expectation
values of an outcome variable in two treatment conditions have, per se, no causal inter-
pretation. The inference from true parameters to causal effects is what the probabilistic
theory of causal effects and this book are about. As will be shown, causal effects are also
parameters that characterize the joint distributions of the random variables considered
in a random experiment. However, their definitions are less obvious than those of ‘ordinary’ conditional expectation values and their differences. And sometimes causal effects are identical to differences between ordinary conditional expectation values, and sometimes they
are not. In other words, sometimes we can have a positive difference between conditional
expectation values that seemingly indicates a positive treatment effect, whereas the causal
effect is negative, and vice versa.
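
To make this reversal concrete, here is a minimal numerical sketch in Python. The two units (‘Joe’ and ‘Ann’), their success probabilities, and the self-selection probabilities are hypothetical numbers chosen only for this illustration; chapter 1 develops such examples in detail.

```python
# Minimal sketch (hypothetical numbers): a positive average total effect
# can coexist with a negative prima facie effect under self-selection.

# Intra-individual success probabilities P(Y=1 | U=u, X=x)
p_success = {("Joe", 1): 0.4, ("Joe", 0): 0.3,   # Joe's individual effect: +0.1
             ("Ann", 1): 0.9, ("Ann", 0): 0.8}   # Ann's individual effect: +0.1

p_treat = {"Joe": 0.9, "Ann": 0.1}   # self-selection: P(X=1 | U=u)
p_unit = {"Joe": 0.5, "Ann": 0.5}    # P(U=u)

# Average total effect: expectation of the individual total effects
ate = sum(p_unit[u] * (p_success[(u, 1)] - p_success[(u, 0)]) for u in p_unit)

# Prima facie effect: E(Y | X=1) - E(Y | X=0)
def e_y_given_x(x):
    w = {u: p_unit[u] * (p_treat[u] if x == 1 else 1 - p_treat[u]) for u in p_unit}
    return sum(w[u] * p_success[(u, x)] for u in w) / sum(w.values())

pfe = e_y_given_x(1) - e_y_given_x(0)
print(f"average total effect: {ate:+.2f}")   # +0.10
print(f"prima facie effect:   {pfe:+.2f}")   # -0.30
```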

Basic Idea

In order to get a first impression of what this means, let us briefly formulate the basic idea
that can most easily be explained if the putative (or presumed) cause is a treatment or in-
tervention variable. Suppose an individual, or in more general terms, an observational unit
(this can also be a country), could be treated by condition 1 or it could be treated by con-
dition 0, everything else being invariant. If there is a difference in the outcome considered
(some measure of success of the treatment), then this difference is due to the difference in
the two treatment conditions. This idea goes back at least to Mill (1843/1865).

Multiple Determinacy

The problem with this first version of the basic idea is that most outcomes are multiply
determined, that is, they are not only influenced by the treatment variable, but by many
other variables as well. In the field of agricultural research, for example, the yield (outcome) of a variety depends not only on the variety (treatment) itself but also
on the quality of the plot (observational unit), such as the average hours of sunshine on
the plot per day, the amount of water reaching the plot, and the number of microbes in
the plot, and so on. Although Mill’s idea sounds perfect, it is not immediately clear which
implications it has for practice, because the number of other causes is often too large to keep all of them constant. Furthermore, Mill’s idea fails to distinguish between poten-
tial confounders and intermediate variables. Holding constant all intermediate variables
as well — and not only all pretreatment variables — would imply that there is no treatment
effect any more, if we assume that all treatment effects have to be transmitted by some
intermediate variables.
Because of the problem of multiple determinacy, Mill’s conception was complemented by Sir Ronald A. Fisher (1925/1946) and by Jerzy S. Neyman (1923/1990) in the 1920s. Simply speaking, emphasizing and propa-
gating the randomized experiment, Fisher replaced the ceteris paribus clause (‘everything
else invariant’) by the ceteris paribus distributionibus clause: all other possible causes (the
‘pretreatment variables’) having the same distribution. This is what randomized assign-
ment of units to treatment conditions, for example, based on a coin flip, secures.

A Metaphor — The Invisible Man and His Shadow

Imagine an invisible man. Although we cannot see him, suppose we know that he is there,
because we can see his shadow. Furthermore, suppose we would like to measure his size.
In doing so, we face two problems, a theoretical and a practical one. The theoretical prob-
lem is to define size. We have to clarify that we do not mean ‘volume’ or ‘weight’, but
‘height’ — without shoes, and without hat and hair. Unfortunately, actual height varies
slightly in the course of a day. Hence, we define size to be the expectation (with respect
to the uniform distribution over the 24 hours) of the momentary heights. This solves the
theoretical problem; now we know what we want to measure.
However, because the man is invisible, we cannot measure his size directly — and this
is not only because his size slightly varies over the day. The crucial problem is that we
can only observe his shadow. And this is the practical problem: How to determine his size
from his shadow? Sometimes, there is almost no shadow at all, sometimes it is huge. Some
geometrical reflection yields a first simple solution: measuring the shadow when the sun
has an angle of 45°. But what if it is winter and the sun does not reach this angle? Now we
need more geometrical knowledge, taking into account the actual angle of the sun and the
observed length of the shadow. This will yield an exact measure of the size of the invisible
man at this time of the day as well.
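
The geometry behind this second solution is simple trigonometry. The following short Python sketch (with made-up numbers) only illustrates the idea:

```python
import math

def height_from_shadow(shadow_length, sun_elevation_deg):
    """Height of the invisible man from shadow length and sun elevation.

    At 45 degrees, tan(45) = 1, so the height equals the shadow length;
    at lower winter angles the same formula still applies.
    """
    return shadow_length * math.tan(math.radians(sun_elevation_deg))

print(height_from_shadow(1.80, 45.0))   # 1.80: shadow equals height at 45 degrees
print(height_from_shadow(4.32, 22.6))   # about 1.80 from a long winter shadow
```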

The Perfect Randomized Experiment

Determining a causal effect we face the same kind of problems. First, we have to define
a causal effect, and second, we have to find out how to determine it from empirically es-
timable parameters such as true means, that is, from conditional expectation values. The
simple solution — corresponding to the 45° angle of the sun in the metaphor — is the
perfect randomized experiment. The sample mean differences we observe in a random-
ized experiment only randomly deviate from the causal effect (due to random sample
variation). In contrast, in quasi-experiments and observational studies, solutions to the
practical problem are more sophisticated. They are also more sophisticated than in the
metaphor of the invisible man, because it is not only one other variable (the angle) that
determines the length of the shadow; instead there often are many other variables sys-
tematically determining the sample means as well as the true means that are estimated
by these sample means. This is again the problem of multiple determinacy. Furthermore,
a true effect to be estimated may even be negative although the true causal effect is posi-
tive, and vice versa. And this reversal of effects can be systematic, and not only be due to
sampling error.

This book presents a solution to the theoretical and the practical problems mentioned
above. Unfortunately, both solutions are not as simple and obvious as in our metaphor.
Furthermore, there is not only one single kind of causal effects, even if we restrict ourselves
to total causal effects and do not consider direct and indirect effects.

Total Individual and Average Causal Effects

To our knowledge, the first pioneer tackling the theoretical and the practical problems was
Jerzy S. Neyman (1923/1990). While Fisher propagated the design technique of random-
ization, Neyman introduced the concepts of total individual and average causal effects,
thus attempting a first solution to the theoretical problem mentioned above. (Note, how-
ever, that he used different terms for these concepts). Developing statistical methods for
agricultural research, he assumed that, for each individual plot, there is an intra-individual
(i. e., plot-specific) distribution of the outcome variable, say Y , under each treatment.
He then defined the individual causal effect of treatment x compared to treatment x ′ to
be the difference between the intra-individual (plot-specific) expectation of Y (the “true
yield”) given treatment (“variety”) x and the intra-individual (plot-specific) expectation of
Y given treatment (“variety”) x ′ . Once the individual causal effect is defined, the average
treatment effect of x compared to x ′ on Y is simply the expectation (true mean) of the cor-
responding individual (plot-specific) causal effects in the set (population) of observational
units (plots). Similarly, several kinds of conditional effects can be defined, conditioning, for
instance, on covariates, that is, on other causes of Y that cannot be affected by X , such as
measures of the quality of the soil before treatment, average hours of sunshine, average
hours of rain, and so on.
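
In modern notation (used here only as an illustration; the book develops its own formal definitions via true outcome variables in ch. 5), Neyman’s two concepts can be sketched as follows, with U denoting the unit variable and X the treatment variable:

\[
\mathrm{ICE}_{x,x'}(u) = E(Y \mid X{=}x,\, U{=}u) - E(Y \mid X{=}x',\, U{=}u),
\qquad
\mathrm{ATE}_{x,x'} = E\bigl[\mathrm{ICE}_{x,x'}(U)\bigr].
\]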

Total, Direct, and Indirect Effects

At about the same time as Neyman and Fisher developed their ideas, Sewall Wright
(Wright, 1918, 1921, 1923, 1934, 1960a, 1960b) developed his ideas on path analysis and the
concepts of total, direct, and indirect effects. While his total effect aims at the same idea
as the causal average total effect, his direct and indirect effects were new. Simply speaking,
in the context of an experiment or quasi-experiment, a direct effect of the treatment is the
effect that is not transmitted through the intermediate variables; it is the conditional effect
of the treatment variable holding the intermediate variables constant at one of their values. In contrast, the indirect effect is the difference between the total effect and the direct
effect.
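
Schematically (again only as illustrative bookkeeping, not as the book’s formal definitions), if TE denotes the total effect and DE(m) the direct effect holding an intermediate variable M constant at the value m, then the corresponding indirect effect is the remainder:

\[
\mathrm{IE}_{x,x'}(m) = \mathrm{TE}_{x,x'} - \mathrm{DE}_{x,x'}(m).
\]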

Fundamental Problem of Causal Inference

Whereas the basic ideas outlined above are relatively simple and straightforward, trying to
put them into practice — that is, solving the practical problem mentioned above — is of-
ten difficult and needs considerable sophistication. The “fundamental problem of causal
inference” (Holland, 1986) is that we cannot expose an observational unit to treatment 1
and, at the same time, to treatment 0. However, this is exactly what is necessary if we want
to be sure that ‘everything else is invariant’, a clause that is also an implicit assumption in
the solution proposed by Neyman. Comparing the true yield of treatment 1 to treatment
0 within the same plot at the same time and under identical conditions is an ideal version of the ceteris paribus clause, which unfortunately is rarely achievable.

Pre-Post Designs

If we choose to first observe a unit under ‘no treatment’ and then observe it again after
‘treatment’, we may be tempted to interpret the pre-post differences as estimates of the
individual causal effects of the treatment given in between. However, this interpretation
might be wrong, because the unit may have developed (matured, learned), may have
suffered from critical life events, may have experienced historical change, and so on (see,
e. g., Campbell & Stanley, 1963; Cook & Campbell, 1979; Shadish et al., 2002). Hence, in
these pre-post designs (or, synonymously, within-group designs), we have to make assump-
tions on the nature of these possible alternative interpretations of the pre-post compar-
isons, for example, that they do not hold in the application considered or that they have a
certain structure that can be taken into account when making causal inferences based on
pre-post comparisons.

Between-Group Designs

If, instead of making comparisons within a unit, we compare different units to each other
in between-group experiments, we certainly lose the possibility of estimating the individ-
ual causal effects. However, what we can hope for is that we are still able to estimate
the causal average total effect and certain causal conditional total effects. But how to es-
timate the average of the causal individual total effects if, due to the fundamental problem
of causal inference, the causal individual total effects are not estimable? Both between-group experiments and quasi-experiments have a set of (observational) units, at least two
experimental conditions (‘treatment conditions’, ‘expositions’, ‘interventions’, etc.), and at
least one outcome variable (‘response’, ‘criterion’, ‘dependent variable’) Y . In the medical
sciences, the units are usually patients. In psychology the observational units are often
persons, but they could also be persons-in-a-situation or groups. In economics they could
be subjects, companies, or countries, for instance. In educational sciences the units might
be school classes, schools, communities, districts, or countries. In sociology and the polit-
ical sciences, the units could be persons, but also communities, countries, and so on. In
this book we show how to define and also how to make inferences about the average of the
causal individual total effects in such sets (and subsets) of observational units and about
causal conditional total effects, conditioning on attributes of the observational units or on
pretest scores, for instance.

Scope of the Theory

In order to delineate the scope of the theory, consider the following kind of random exper-
iment: Draw an observational unit u (e. g., a person) out of a set of units, observe the value
z of a (possibly multivariate qualitative or quantitative) covariate Z for this unit, assign the
unit or observe its assignment to x, one of several experimental conditions, and record the
numerical value y of the outcome variable Y . We will use U to denote the random variable
whose value u represents the unit drawn. Note that many observations can be made in addition to observing U , Z , X , and Y . Although this single-unit trial is a prototype of
the kind of empirical phenomena the theory is dealing with, there are other single-unit
trials in which the theory can be applied as well (see ch. 2). In fact, the theory is applicable
far beyond the true experiment (i. e., the randomized experiment) and the quasi-experiment. This in-
cludes applications in which the putative causes are not manipulable. In this volume, we
also treat the case in which the putative cause is a continuous random variable (see, e. g.,
the causality conditions treated in chs. 8 or 9). The theory has its limitations only if there
is no clear time order of the random variables considered as putative causes or outcomes.
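
As a concrete, purely illustrative picture of such a single-unit trial, the following Python sketch draws a unit, observes its covariate value, assigns a treatment with a covariate-dependent (but known) probability, and records an outcome. All names, distributions, and parameters are invented for this sketch; it also anticipates the covariate-dependent treatment probabilities discussed in the next subsection.

```python
import random

def single_unit_trial(units, p_treat_given_z, outcome_model):
    """One single-unit trial: draw u, observe z, assign x, record y."""
    u, z = random.choice(units)                            # draw the observational unit
    x = 1 if random.random() < p_treat_given_z(z) else 0   # (randomized) assignment
    y = outcome_model(u, z, x)                             # record the outcome
    return u, z, x, y

# Hypothetical ingredients, only to make the sketch runnable
units = [("unit_1", 0), ("unit_2", 1), ("unit_3", 1)]      # pairs (u, z)
p_treat_given_z = lambda z: 0.7 if z == 1 else 0.3         # covariate-dependent, but known
outcome_model = lambda u, z, x: random.gauss(2.0 * x + z, 1.0)

# In empirical research, this trial is repeated n times (the sample size)
sample = [single_unit_trial(units, p_treat_given_z, outcome_model) for _ in range(5)]
print(sample)
```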

True Experiments and Quasi-Experiments

The single-unit trial described above is a random experiment, but not necessarily a ran-
domized experiment. A randomized experiment is a special random experiment in which
the drawn unit is assigned to one of the treatment conditions via randomization, for ex-
ample, depending on the outcome of a coin flip. (In empirical research, the single-unit
trials are repeated n times, where n denotes the sample size.) Referring to single-unit tri-
als, we can distinguish the true experiment from the quasi-experiment as follows: In the
true experiment, there are at least two treatment conditions and the assignment to one of
the treatment conditions is randomized, for example, by flipping a coin. In a traditional
randomized experiment, for instance, the treatment probabilities are chosen to be equal
for all units. However, equal treatment probabilities for all units are neither essential for
the definition of the true experiment nor for drawing valid causal inferences. We may as
well have treatment probabilities depending on the units and/or on a covariate (for more
details, see, e. g., Rem. 8.59), as long as these treatment probabilities are fixed or known
by the researcher. Note, however, that in designs in which different units have different treatment probabilities, standard techniques of data analysis such as t-tests or analysis of variance no longer test hypotheses about a causal effect.
For between-group designs, the quasi-experiment may be defined such that there are at
least two treatment conditions; however, in contrast to the true experiment, the treatment
probabilities are unknown. Nevertheless, valid causal inferences can be drawn in quasi-
experiments provided that we can rely on certain assumptions (see the causality conditions
treated in Part III of this book). In specific applications these assumptions might be wrong.
If they are actually wrong, causal inferences can be completely wrong as well.

Who Should Study This Book?

The Methodologist

In the first place, I would like to address the methodologist, that is, the expert in empiri-
cal research methodology, especially in the social, economic, behavioral, cognitive, medi-
cal, agricultural, and biological sciences. This book provides answers to some of the most
important and fundamental questions of these empirical sciences: What do we mean by
terms like ‘X affects Y ’, ‘X has an effect on Y ’, ‘X influences Y ’, ‘X leads to Y ’, and so on
used in our informal theories and hypotheses? How can we translate these terms into a
precise language (i. e., probability theory) that is compatible with the statistical analysis of
empirical data? How to design an empirical study and how to analyze the resulting data
if we want to probe our theories and learn from such data about the causal dependencies
postulated in our theories and hypotheses? And last but not least: How to evaluate in-
terventions, treatments, or expositions to (possibly detrimental) environments, and learn which effects they have for which kinds of subjects or observational units, and under which circumstances?

The Statistician

Many statisticians believe that causality is beyond the horizon of their profession. Causal-
ity might be a matter for empirical researchers and philosophers, they say, but not for statisticians themselves. They think that it cannot be treated mathematically and that therefore a statistician
should refrain from causal interpretations. As a consequence, they ignore the issue of
causality. This book proves that these beliefs are prejudices. The theory of causal effects,
as presented here, is a branch of probability theory, which itself, at least since Kolmogorov
(1933/1977), is a part of pure mathematics — although with an enormous potential for
applications in many empirical sciences and even beyond. The main purpose of this book
is to translate the informal concepts about causal effects shared by many methodologists
and applied statisticians into well-defined terms of mathematical probability theory. The
principle is not to use any term that is not itself defined in other mathematical terms, and
the result is a purely mathematical theory of causal effects. Of course, this will make the book harder to read for methodologists and those not yet trained in probability
theory. However, the reward is a much deeper understanding of what is essential and a
much better grasp of the nature of our theories about the real world.
Of course, undefined terms are still used in this book, but only in the examples, in
the interpretations, and in the motivations of the definitions. The theory itself is pure
mathematics, just in the same way as Kolmogorov’s probability theory presented in 1933,
which explicated the mathematical, measure-theoretical structure of probabilistic con-
cepts. Substantive meaning results, for example, if we interpret the core components of
the formal structure in a specific random experiment considered. And this is also true for
the theory of causal effects presented in this book.

The Empirical Scientist

The empirical scientist in the fields mentioned above has at least three good reasons to
study this book. The first is that some crucial parts of his theories and hypotheses are
explicated, at least when it comes to considering a concrete experiment or study. The am-
biguity in causal language such as ‘X affects Y ’, ‘X has an effect on Y ’, ‘X influences Y ’,
or ‘X leads to Y ’ is no longer necessary. Reading this book will make it possible to re-
place these ambiguous terms by well-understood and well-defined terms, improving the
precision of empirical research and theories.
The second reason is that, even if the empirical scientist knows his own theo-
retical concepts and hypotheses, he still has to know how to design experiments and stud-
ies that enable him to test them empirically.
Third, the standard ways of analyzing data offered in the textbooks of applied statistics
and in the available computer programs often do not estimate and test the causal effects
and dependencies we refer to in our theories. And this is not only bad for the empirical
scientist but also for all those relying on the validity of his inferences and his expertise. Just
think about all the harmful consequences of wrong causal theories in various empirical
research fields, if they are applied to solving concrete problems!

The Experimental Scientist

There are two messages for those who do their research with experiments, a good one and
a bad one. The good news is that, in a perfect randomized experiment, the causal average
total treatment effect is indeed estimated when comparing sample means between two
different treatment conditions. The bad news is that we cannot rely on randomized as-
signment of units to treatment conditions when it comes to estimating direct and indirect
effects. More specifically, in such an analysis it is usually not sufficient to consider only the intermediate, treatment, and outcome variables. Instead we also have to include in
our analysis pre-treatment variables such as a pre-test of the intermediate variable and a
pre-test of the outcome variable and apply adjustment methods, very much in the same
way as we have to use these techniques in quasi-experiments. Hence, if we want to study
the black box between the treatment and the outcome variables, we have to adopt the
techniques of causal modeling that are far beyond traditional comparisons of means and
analysis of variance. (For more details see, e. g., Mayer, Thoemmes, Rose, Steyer, & West,
2014).

The Philosopher of Science

Philosophers of science study and teach the methodology of empirical sciences. In that
respect, their task is very similar to that of the methodologist, perhaps only more gen-
eral and less specific for a certain discipline. Therefore, it is not surprising that probabilis-
tic causality has also been tackled by philosophers of science (see, e. g., Cartwright, 1979;
Spohn, 1980; Stegmüller, 1983; Suppes, 1970). Compared to these approaches, our empha-
sis is more on those parts of the theory that have implications for the design of empirical
studies and the analysis of data resulting from such studies.

The Students in These Fields

For reasons detailed before, I believe that the probabilistic theory of causal effects and de-
pendencies is the most rewarding topic in methodology. Although it is tough to get into it,
you will get insights into why all this methodology stuff is useful and what it is good for. At
least this is what many of my students said at the end of their curriculum, even if they did
not have the choice whether or not to take the courses on causal effects.

Research Traditions in Causal Effects

Several research traditions have been contributing to the theory of causal effects in vari-
ous ways. From the Neyman-Rubin tradition, I adopted the idea that it is important to de-
fine various causal effects such as individual, conditional, and average total effects, even
though we modified and extended these concepts in some important aspects. Defining
causal effects is important for proving that certain methods of data analysis yield unbi-
ased estimates of these effects if certain assumptions can be made. Are there conditions
under which the analysis of change scores (between pre- and post-tests) and repeated-
measures analysis of variance yield causal effects? Under which conditions do we estimate
and test causal effects in the analysis of covariance? Which are the assumptions under
which propensity score methods yield estimates of causal effects? Which are the assump-
tions under which an instrumental variable analysis estimates a causal effect? All these
questions and their answers presuppose that we have a mathematical definition of causal
effects. Simply speaking, Rubin’s potential outcome variables are replaced by the true out-
come variables (see ch. 5), allowing for variance in the outcome (or response) variables
given treatment and an observational unit. Many important results of the theory, for exam-
ple, about strong ignorability and about propensity scores remain unchanged, while other
results are new, yield additional insights, and open the door to new research techniques.
From the Campbellian tradition (see, e. g., Campbell & Stanley, 1966; Cook & Camp-
bell, 1979; Shadish et al., 2002) we learned that there are questions and problems beyond
the theory of causal effects itself that are relevant in empirical causal research, such as: How
to generalize beyond the study? What does the treatment variable actually mean from a
substantive point of view? What is the meaning of the outcome variable? And, perhaps the
most general question: Are there alternative explanations for the effect? The vast major-
ity of social scientists (including myself) have been educated in this research tradition to
some degree. Although this training is still very useful as a general methodology frame-
work, it lacks precision and clarity in a number of issues — and the definition of a causal
effect is one of them, remaining unnecessarily vague in this tradition’s treatment of internal validity.
From the graphical modeling tradition (see, e. g., Cox & Wermuth, 2004; Pearl, 2009;
Spirtes, Glymour, & Scheines, 2000), we learned that conditional independence plays an
important role in causal modeling. This research tradition has also been developing tech-
niques to estimate causal effects and to search for causal models if specific assumptions
can be made. The fact that randomization in a true experiment in no way guarantees the
validity of causal inferences on direct effects has been brought up by this research tradi-
tion.
The tradition of structural equation modeling and psychometrics has taught us how to use latent variables and structural models in testing causal hypotheses. Due to a
number of statistical programs such as AMOS (Arbuckle, 2006), EQS (Bentler, 1995), lavaan
(Rosseel, 2012), LISREL (Jöreskog & Sörbom, 1996/2001), Mplus (Muthén & Muthén, 1998-
2007), OpenMx (OpenMx, 2009), RAMONA (Browne & Mels, 1998), structural equation
modeling became extremely popular in the social sciences. Although many users of these
programs hope to find causal answers, it should be clearly stated that structural equation
modeling — and this is true for all kinds of statistical models (including analysis of vari-
ance) — neither automatically estimates and tests causal effects, nor does it provide a
satisfactory theory of causal effects and dependencies. Nevertheless, this research tradi-
tion contributes — just like other areas of statistics — a number of statistical techniques
that can be very useful in causal modeling.
This book is aimed at embedding — and, where necessary, extending — conventional
statistical procedures such as analysis of covariance, nonorthogonal analysis of variance,
and latent variable modeling, as well as more recent techniques based on propensity scores,
into a coherent theory of probabilistic causality.

Outline of Chapters

Chapter 1 is intended to motivate the theory by some examples. Specifically, it is shown that, in general, true mean differences do not have — without further assumptions — a
valid causal interpretation. Furthermore, we present an example showing that, under cer-
tain conditions, traditional techniques of data analysis such as analysis of variance systematically estimate the wrong parameters and test the wrong hypotheses if causal effects are the quantities of interest.
In chapter 2, we describe in more detail some random experiments, that is, some of
the empirical phenomena the theory is dealing with. These include, for example, a simple
experiment in which we study the effect of a treatment on an outcome variable, but also
more complex random experiments in which manifest or latent covariates (e. g., latent
pretest variables) are measured, a treatment variable is observed, and manifest or latent
outcome variables are assessed.
While the first two chapters are accessible without much knowledge of probability theory, the rest of the book is not. Therefore we recommend studying an introduction to prob-
ability and conditional expectation such as Steyer (2024) or Steyer and Nagel (2017) before
reading the rest of this book and/or whenever these terms appear. These books are also
often referred to using the abbreviations RS and SN, respectively.
In chapter 3, we introduce the additional mathematical concepts that are necessary for
the definition of causal effects and that allow us to meaningfully raise the question whether a depen-
dency of a random variable on another one describes a causal dependency. These terms
include the concept of a filtration that is fundamental in the theory of stochastic processes.
Using a filtration, we can introduce time order between events, families of events, and ran-
dom variables. It will also be used to distinguish potential confounders from other kinds
of random variables.
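
For orientation, a filtration in the usual sense of the theory of stochastic processes (the book’s exact formulation may differ in detail) is a family of sub-σ-algebras of the underlying σ-algebra that grows with the time index:

\[
(\mathcal{F}_t)_{t \in T} \quad \text{with} \quad \mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{A} \quad \text{for all } s, t \in T,\ s \le t .
\]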
Chapter 4 introduces the mathematical structure that allows us to define a putative
cause variable X , an outcome variable Y of X , and a potential confounder of X . In many
cases, conditional expectations describing a causal dependence can be distinguished from
conditional expectations that have no such causal interpretation by their relationship to
the potential confounders of the putative cause variable X .
Chapter 5 starts with the concepts of true outcome variables and true effect variables
that play an important role in the definition of total causal effects. Then several kinds
of conditional and average total effects are introduced. As indicated above, the basic ideas
are simple. However, in the subsequent chapters it will become obvious that the theory has
tremendous consequences, for the design of experiments, for data analysis in experiments
and quasi-experiments, and beyond.
In chapter 6, we will study how (conditional and unconditional) prima facie effects
(PFEs) are related to the different kinds of causal effects. Among these prima facie effects
are the true mean differences E (Y |X =x ) − E (Y |X =x ′ ) between two treatment conditions x and x ′, and the corresponding mean differences E (Y |X =x , Z =z) − E (Y |X =x ′, Z =z), conditioning also on the value z of a covariate Z . For all these concepts, we introduce the
notion of unbiasedness with respect to total effects.
In chapter 7 we intensively study the Rosenbaum-Rubin conditions, which deal with the
relationship between true outcome variables and the putative cause variable. The weakest
of these conditions is equivalent to unbiasedness, the strongest is the probabilistic version
of strong ignorability (see, e.g., Rosenbaum & Rubin, 1984). Although these conditions are
of theoretical interest, they can neither be tested empirically nor can they be used for co-
variate selection.
Chapter 8 deals with other causality conditions, the Fisher conditions, which imply un-
biasedness if the putative cause variable is discrete. However, these conditions also hold
for a continuous putative cause variable, for which unbiasedness and true outcome vari-
ables are not defined. These causality conditions can deliberately be created by the design
technique of randomized assignment of a unit to one of the treatment conditions. Further-
more, these causality conditions are the first ones treated that can be tested in empirical
applications.
In contrast to the Fisher conditions, which deal with the relationship between the pu-
tative cause variable and the potential confounders, the Suppes-Reichenbach conditions
treated in chapter 9 focus on the relationship between the outcome variable and the potential
confounders. Just like the Fisher conditions, they are empirically testable and can there-
fore be used for covariate selection.
In chapter 10 we present a fourth kind of causality conditions, the unconfoundedness
conditions. To our knowledge, they are the weakest conditions implying unbiasedness that
are still empirically testable.
In chapter 11 we show that propensities can also be used instead of the original co-
variates to adjust the expectations of the outcomes in the treatment conditions and to
compute the average causal effect. This also involves weighting the outcome variable with
a function of the propensity scores.
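
As a rough preview (stated informally here; the precise assumptions and the book’s own formulation are given in ch. 11), writing π_x(Z) := P(X =x | Z) for such a propensity, the weighting rests on the identity

\[
E\!\left[\frac{1_{X=x}\, Y}{\pi_x(Z)}\right] = E\bigl[E(Y \mid X{=}x,\, Z)\bigr], \qquad \text{provided } \pi_x(Z) > 0 ,
\]

whose causal interpretation as an adjusted expectation in treatment condition x requires causality conditions of the kind treated in Part III.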

How to Use This Book

This book is written such that standard mathematical probability theory is sufficient for a
complete understanding, provided one takes the time that these topics require. In many
parts, this is not a book one can just read; instead it is a book to be studied. This includes
working on the questions and exercises provided in each chapter. We presume that the
reader is familiar with — or learns while studying this book — the essentials of proba-
bility theory, including not only random variables and their distribution, but also con-
ditional expectations and conditional independence. These essentials of probability the-
ory are extracted in Steyer (2024) from the more complete and detailed book Steyer and
Nagel (2017). Both books are also referred to very often for definitions, theorems, and
other propositions used in this text. The references to Steyer (2024) are abbreviated by
RS-Definition, RS-Theorem, RS-Remark, or RS-(3.3), the latter referring to an equation or
a proposition in that book. Similarly, references to Steyer and Nagel (2017) are abbreviated
by SN-Definition, SN-Theorem, SN-Remark, or SN-(10.32), for example.
The largest part of this book is devoted to the theory of causal effects. The Causal Ef-
fects Explorer (Nagengast, Kröhne, Bauer, & Steyer, 2007) can be used for exploring prima
facie effects, conditional and average total effects given certain parameters. Furthermore,
the program EffectLiteR (Mayer, Dietzfelbinger, Rosseel, & Steyer, 2016), can be used to
estimate conditional and average total effects from empirical data in experiments and
quasi-experiments. Both programs, which are available at www.causal-effects.de, may be
used together with this book in a course on causal modeling. In fact, this is the content of
my workshops on the analysis of causal conditional and average total effects, which are
available both as videos-on-demand on the internet and on DVDs, again at www.causal-effects.de.

Acknowledgements

This book has been written with the help of several colleagues and students. Werner Nagel
(FSU Jena) helped whenever I felt lost in probability spaces. Stephen G. West (Arizona
State University) and Felix Thömmes (Cornell University) made detailed suggestions for
improving readability of the book. Safir Yousfi and Sonja Hahn contributed and/or sug-
gested concrete ideas, Sonja being extremely helpful also in checking some of the math-
ematics. Our students Franz Classe, Lisa Dietzfelbinger, Niclas Heider, Marc Heigener,
Remo Kamm, Lawrence Lo, David Meder, Marita Menzel, Yuka Morikawa, Sebastian Nit-
sche, Fabian Schäfer, Michael Temmerman, Sebastian Weirich, Anna Zimmermann, and
other students critically commented on previous versions, helped to minimize errors, or helped to organize the references. Uwe Altmann, Linda Gräfe, Sven Hartenstein, Ulf Kröhne, Axel
Mayer, Marc Müller, Christof Nachtigall, Benjamin Nagengast, Andreas Neudecker, Ivailo
Partchev, Jan Plötner, Steffi Pohl, Norman Rose, Marie-Ann Sengewald, and Andreas Wolf
together with the others mentioned above provided the intellectual climate in which this
book could be written. I am also grateful to the students and colleagues participating in our courses on the analysis of causal effects for asking questions and making important com-
ments. Over the years, this helped a lot to improve this book.

Jena, August 15, 2024


Contents

Part I
Introduction

1 Introductory Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Example 1 — Joe and Ann With Self-Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Joint Probabilities P (X =x , Y =y ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.2 Marginal Probabilities P (X =x ) and P (Y =y) . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.3 Prima Facie Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.4 Individual Total Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.5 Prima Facie Effect Versus Expectation of the Individual Total Effects 10
1.1.6 How to Evaluate the Treatment? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2 Example 2 — Experiment With Two Nonorthogonal Factors . . . . . . . . . . . . . . . 13
1.2.1 Prima Facie Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.2 (Z =z)-Conditional Prima Facie Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.3 Average of the (Z =z)-Conditional Prima Facie Effects . . . . . . . . . . . . . . 16
1.2.4 Individual Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.5 Average of the Individual Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2.6 (Z =z)-Conditional Total Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2.7 How to Evaluate the Treatment? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2 Some Typical Kinds of Random Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29


2.1 Simple Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2 Experiments With Fallible Covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3 Two-Factorial Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4 Multilevel Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5 Experiments With Latent Outcome Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.6 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Part II
Basic Concepts of the Theory of Causal Total Effects

3 Time Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.1 Filtration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.2 Prior-To Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2.1 Properties of the Prior-to Relation of Measurable Set Systems . . . . . . . 51
3.2.2 Properties of the Prior-to Relation of Measurable Maps . . . . . . . . . . . . 54


3.3 Simultaneous-to Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.1 Properties of the Simultaneous-to Relation of Measurable Set
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.2 Properties of the Simultaneous-to Relation of Measurable Maps . . . . 62
3.4 Prior-or-Simultaneous-to Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4.1 Properties of the Prior-or-Simultaneous-to Relation of Measurable
Set Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4.2 Properties of the Prior-or-Simultaneous-to Relation of Measurable
Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.6 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

4 Regular Causality Space and Potential Confounder . . . . . . . . . . . . . . . . . . . . . . . . . . . 83


4.1 Regular Causality Space and Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.1.1 Regular Causality Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.1.2 Regular Causality Setup and Potential Confounder . . . . . . . . . . . . . . . . 87
4.1.3 Restriction of a Regular Causality Space and Setup . . . . . . . . . . . . . . . . . 90
4.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2.1 Joe and Ann With Three Treatment Conditions . . . . . . . . . . . . . . . . . . . . 91
4.2.2 Joe and Ann With Two Simultaneous Treatment Variables . . . . . . . . . . 94
4.2.3 Nonorthogonal Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.3.1 Cause σ-Algebra and Potential Confounder σ-Algebra . . . . . . . . . . . . . 99
4.3.2 Putative Cause Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.3.3 Potential Confounder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.3.4 Outcome Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.4 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.5 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

5 True Outcome Variable and Causal Total Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115


5.1 True Outcome Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.2 True Total Effect Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.3 Causal Average Total Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.4 Causal Conditional Total Effect and Total Effect Function . . . . . . . . . . . . . . . . . 127
5.4.1 Notation, Assumptions and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.4.2 Causal (X =x ∗ )-Conditional Total Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.4.3 Complete Re-Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.4.4 Partial Re-Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.5 Example: Joe and Ann With Bias at the Individual Level . . . . . . . . . . . . . . . . . . . 134
5.6 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.7 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
5.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

Part III
Causality Conditions

6 Unbiasedness and Identification of Causal Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155


6.1 Unbiasedness of E (Y |X ) and Its Values E (Y |X =x ) . . . . . . . . . . . . . . . . . . . . . . . . 156
6.2 Unbiasedness of E (Y |X, Z ) and Related Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.2.1 Definition and First Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.2.2 Equivalent Conditions of Z -Conditional Unbiasedness . . . . . . . . . . . . 162
6.3 Unbiasedness of Prima Facie Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.4 Identification of Causal Total Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.4.1 Identification of the Causal Average Total Effect . . . . . . . . . . . . . . . . . . . 167
6.4.2 Identification of a Causal Conditional Total Effect Function . . . . . . . . 169
6.4.3 Identification of a Causal Conditional Total Effect . . . . . . . . . . . . . . . . . 172
6.5 Three Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.6 An Example With Accidental Unbiasedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.7 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6.8 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

7 Rosenbaum-Rubin Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199


7.1 RR-Conditions for E (Y |X =x ) and E (Y |X ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7.1.1 Mean-Independence Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7.1.2 Independence Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
7.1.3 Implications Among RR-Conditions for E (Y |X =x ) and E (Y |X ) . . . . . 203
7.2 RR-Conditions for E Z=z(Y |X =x ) and E Z=z (Y |X ) . . . . . . . . . . . . . . . . . . . . . . . . . 205
7.3 RR-Conditions for E X =x (Y |Z ) and E (Y |X, Z ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
7.3.1 Conditional Mean-Independence Conditions . . . . . . . . . . . . . . . . . . . . . 207
7.3.2 Conditional Independence Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
7.3.3 Implications Among RR-Conditions for E X =x (Y |Z ) and E (Y |X, Z ) . . 209
7.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
7.4.1 Z -Conditional Mean-Independence of τx From 1 X =x for All x . . . . . . 213
7.4.2 Z -Conditional Independence of τ and X . . . . . . . . . . . . . . . . . . . . . . . . . 216
7.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
7.6 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
7.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

8 Fisher Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227


8.1 F-Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
8.1.1 Simple F-Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
8.1.2 Z -Conditional F-Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.1.3 (Z =z)-Conditional F-Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
8.2 Implications Among F-Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
8.3 Implications of F-Conditions on RR-Conditions and Unbiasedness . . . . . . . . 236
8.3.1 Consequences of Independence of D X and X . . . . . . . . . . . . . . . . . . . . . . 237
8.3.2 Consequences of Z -Conditional Independence of D X and X . . . . . . . 239
8.3.3 Consequences of (Z =z)-Conditional Independence of D X and X . . . 244
8.4 Unbiasedness of Prima Facie Effects and Effect Functions . . . . . . . . . . . . . . . . . 245
8.4.1 Unbiasedness of the Prima Facie Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
8.4.2 Unbiasedness of a Z -Conditional Prima Facie Effect Function . . . . . . 246
8.4.3 Unbiasedness of a (Z =z)-Conditional Prima Facie Effect . . . . . . . . . . . 247
8.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
8.6 Methodological Consequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251


8.7 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
8.8 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
8.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

9 Suppes-Reichenbach Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271


9.1 SR-Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
9.1.1 Simple SR-Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
9.1.2 Conditional SR-Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
9.1.3 Example: (X , Z )-Conditional Mean-Independence of Y From D X . . . 277
9.2 Implications Among the Suppes-Reichenbach Conditions . . . . . . . . . . . . . . . . . 278
9.3 Consequences for the Rosenbaum-Rubin Conditions . . . . . . . . . . . . . . . . . . . . . 282
9.3.1 Consequences of Y  D X |(X =x ) and Y  D X |X . . . . . . . . . . . . . . . . . . . . 282
9.3.2 Consequences of Y  D X |(X =x , Z ) and Y  D X |(X , Z ) . . . . . . . . . . . . . . 286
9.3.3 Consequences for the Prima Facie Effect Variables . . . . . . . . . . . . . . . . . 288
9.3.4 Example: (X , Z )-Conditional Mean-Independence of Y From D X . . . 289
9.4 Methodological Consequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
9.4.1 Methodological Consequences of Y ⊥ ⊥D X |X and Y  D X |X . . . . . . . . . 290
9.4.2 Methodological Consequences of Y ⊥ ⊥D X |(X , Z ) and Y  D X |(X , Z ) 290
9.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
9.6 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
9.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298

10 Unconfoundedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
10.1 Unconfoundedness Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
10.2 Sufficient Conditions of Unconfoundedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
10.2.1 Fisher Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
10.2.2 Suppes-Reichenbach Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
10.3 Hybrid Sufficient Conditions of Unconfoundedness . . . . . . . . . . . . . . . . . . . . . . 319
10.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
10.5 Implications of Unconfoundedness on Unbiasedness . . . . . . . . . . . . . . . . . . . . . 324
10.5.1 Unbiasedness of E (Y |X ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
10.5.2 Unbiasedness of E (Y |X, Z ), E X =x (Y |Z ), and E X =x (Y |Z =z) . . . . . . . . . 325
10.6 Expectation Stability of Prima Facie Effects and Effect Functions . . . . . . . . . . . 326
10.7 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
10.8 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
10.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337

11 Identification of Causal Effects Using Propensities . . . . . . . . . . . . . . . . . . . . . . . . . . . 343


11.1 True Propensities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
11.1.1 True Propensities and True Outcome Variables . . . . . . . . . . . . . . . . . . . . 347
11.1.2 Identification of Total Effect Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
11.1.3 Identifying the Expectation of a True Outcome Variable . . . . . . . . . . . . 350
11.1.4 Identifying a Conditional Expectation of a True Outcome Variable . . 351
11.1.5 Identifying a Causal Conditional Total Effect Function . . . . . . . . . . . . . 352
11.1.6 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
11.2 Methodological Implications of the Theory of True Propensities . . . . . . . . . . . 355
11.2.1 Known True Propensities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

11.2.2 Unknown True Propensities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357


11.3 Z -Conditional Propensities for Total Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
11.3.1 Methodological Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
11.3.2 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
11.4 Weighting the Outcome Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
11.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
11.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372

12 Methodological Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377


12.1 Summary of the Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
12.2 Design and Analysis of Experiments and Quasi-Experiments . . . . . . . . . . . . . . 380
12.3 Other Issues of Causal Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
List of Figures

1.1 Probability of success given treatment conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 7


1.2 Conditional probabilities of success given treatment and person . . . . . . . . . . . . . 11
1.3 Conditional probabilities of success given treatment and person . . . . . . . . . . . . . 12
1.4 Conditional expectation values of Y given treatment and status . . . . . . . . . . . . . 17

2.1 A simple experiment or quasi-experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


2.2 Experiment or quasi-experiment with a fallible covariate . . . . . . . . . . . . . . . . . . . . 35

4.1 The filtration (Ft )t ∈T and various σ-algebras in a regular causality space. . . . . . 89

6.1 The person variable U , the function g 1 , and their composition, the true
outcome variable τ1 = g 1 (U ). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
List of Tables

1.1 Joe and Ann with self-selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4


1.2 Joe and Ann with self-selection – explicit table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Joint and marginal probabilities of treatment and success . . . . . . . . . . . . . . . . . . . 6
1.4 Joint and marginal probabilities of all three observables . . . . . . . . . . . . . . . . . . . . . 9
1.5 Random experiment with two nonorthogonal factors . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Conditional expectation values of the outcome variable Y given treatment . . . 15
1.7 Conditional expectation values E (Y | X =x , Z =z) given treatment and status . . 16

4.1 Joe and Ann with a single treatment variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86


4.2 Joe and Ann with three treatment conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3 Joe and Ann with two simultaneous treatment variables . . . . . . . . . . . . . . . . . . . . . 95

5.1 Joe and Ann with self-selection revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119


5.2 Joe and Ann with bias at the individual level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

6.1 Joe and Ann with self-selection to treatment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169


6.2 Self-selection to treatment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.3 Randomized assignment of the person to a treatment . . . . . . . . . . . . . . . . . . . . . . . 179
6.4 Conditionally randomized assignment of the person to a treatment . . . . . . . . . . 180
6.5 Accidental unbiasedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

7.1 Implications among RR-conditions for E (Y |X =x ) and E (Y |X ) . . . . . . . . . . . . . . 204


7.2 Implications among RR-conditions for E X =x (Y |Z ) and E (Y |X, Z ) . . . . . . . . . . . . 211
7.3 Z -conditional mean-independence of τx from 1X =x for all values x of X . . . . . . 214
7.4 Conditional probabilities P (U =u |X =x , Z =z) supplementing Table 7.3 . . . . . . 215
7.5 Z -conditional independence of X and τ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

8.1 Implications among the F-conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235


8.2 Implications of the Fisher conditions on some Rosenbaum-Rubin conditions . 238
8.3 No treatment for males: Z -conditional independence of X and D X . . . . . . . . . . 241

9.1 (X , Z )-conditional mean-independence of Y from D X . . . . . . . . . . . . . . . . . . . . . . 277


9.2 Implications among the Suppes-Reichenbach conditions . . . . . . . . . . . . . . . . . . . 279
9.3 Implications of the Suppes-Reichenbach conditions on some
Rosenbaum-Rubin conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

10.1 Implications among unconfoundedness conditions . . . . . . . . . . . . . . . . . . . . . . . . 311


10.2 Implications of the Fisher conditions on unconfoundedness . . . . . . . . . . . . . . . . 312
10.3 Implications of Suppes-Reichenbach conditions on unconfoundedness . . . . . 316

10.4 Unconfoundedness of E (Y |X, Z ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322

11.1 Conditional expectation values E X =x (Y |ϕ=p) of the outcome variable Y


given treatment x and true propensity p in the example of Table 7.5 . . . . . . . . . 354
11.2 Conditional expectation values E X =x (Y |π1 = p) in the example of Table 7.5 . . . 366
11.3 Implications among some causality conditions involving propensities . . . . . . . 372
Part I

Introduction
Chapter 1
Introductory Examples

For more than a century there have been examples in the statistical literature showing that
comparing means or comparing probabilities (e. g., of success of a treatment) between a
group exposed to a treatment and a comparison group (unexposed or exposed to a dif-
ferent treatment) does not necessarily answer our questions: ‘Which treatment is better
overall?’ or ‘Which treatment is better for which kind of person?’ Differences between true
means and differences between probabilities (or any other comparison between probabil-
ities such as odds ratios, log odds ratios, or relative risk) are usually not the treatment ef-
fects we are looking for (see, e. g., Pearson, Lee, & Bramley-Moore, 1899; Yule, 1903; Simp-
son, 1951). They are just effects at first sight or “prima facie effects” (Holland, 1986).
Just like the shadow in the metaphor of the invisible man presented in the preface,
prima facie effects reflect the effects of the treatment (the size of the invisible man), but
also the effects of other causes (the angle of the sun). The goal of analyzing causal effects
is to estimate the effect of the treatment alone, isolating it from other potential determi-
nants, such as sex, educational background, socio-economic status, and so on. The general
idea is to define and, in applications, estimate a treatment effect that is not biased by pre-
existing differences between treatment groups that would also be observed after treatment
if there were no treatment effect at all.

Overview

We illustrate systematic bias in determining total (as opposed to direct or indirect) treat-
ment effects in quasi-experiments by two examples. The first one deals with a dichoto-
mous outcome variable, the second with a quantitative one. Note that the problems de-
scribed in these two examples cannot occur in a randomized experiment, but they are
ubiquitous in nonrandomized quasi-experimental observational studies.

1.1 Example 1 — Joe and Ann With Self-Selection

In this example, the prima facie effect reverses if we switch from comparing the condi-
tional probabilities of success between treatment and control, that is, from comparing

P (Y =1|X =1) to P (Y =1|X =0)

to comparing the corresponding probabilities additionally conditioning on the person


variable U with values u ∈ { Joe , Ann }, that is, to comparing

P (Y =1| U =u , X =1) to P (Y =1| U =u , X =0).



Table 1.1. Joe and Ann with self-selection

Person u P(U =u ) P(X =1|U =u ) P(Y =1|U =u , X =0) P(Y =1|U =u , X =1)

Joe .5 .04 .7 .8
Ann .5 .76 .2 .4

This kind of phenomenon, which is already known at least since Yule (1903), is called
Simpson’s paradox (Simpson, 1951), and it is still being debated (see, e. g., Hernán, Clayton,
& Keiding, 2011; Wang & Rousseau, 2021).
Table 1.1 shows the parameters specifying a random experiment that is composed of
three parts.
(1) A person is sampled from a set of two persons, Joe and Ann, with identical proba-
bilities for each person u, that is, with probability P (U =u ) = .5.
(2) If Joe is sampled, then he obtains treatment (X =1) with probability P (X =1|U =Joe )
= .04. If Ann is sampled, then she is treated with probability P (X =1|U =Ann) = .76.
(These numbers may reflect self-selection to treatment and the different inclina-
tions of the two persons to go to treatment.)
(3) If Joe is sampled and not treated, then his probability P (Y =1| U =Joe , X =0) of suc-
cess is .7. If he is sampled and treated, then his probability P (Y =1| U =Joe , X =1)
of success is .8. In contrast, if Ann is sampled and not treated, then her probability
P (Y =1| U =Ann, X =0) of success is .2, and if she is sampled and treated, then her
probability P (Y =1| U =Ann, X =1) of success is .4.
This table describes a random experiment and it contains all information we need to
compute the causal total effects of the treatment on the outcome variable Y (success),
including the causal conditional total effects given the person and the causal average total
effect of the treatment variable. These terms will be defined in chapter 5 and computed for
this example in section 1.1.4.
Note that Table 1.1 does not describe a randomized experiment, in which, by definition,
the treatment probabilities P (X =1|U =u ) would be identical for all observational units u.
Instead, it describes a random experiment, which is that kind of empirical phenomenon
that we usually consider when we apply probability theory using terms such as random
variables, their expectations, variances, distribution, correlations, etc. In inferential statis-
tics it is those concepts about which we formulate our hypotheses and that we try to esti-
mate in a sample.
In probability theory, we consider such a random experiment from the pre facto per-
spective. Hence, we do not consider data that would result from actually conducting such
a random experiment. Data are only important in order to learn from observations about
the laws of a random experiment. Data analysis is only a way to learn about these laws.
But it is these laws of the random experiment that are of primary interest. More precisely, if
we know the eight probabilities displayed in Table 1.1, then we have all the information
that we need to compute the causal conditional and average total effects of the treatment
on the outcome (success). All it needs is to define these concepts in terms of probability
theory, and this is what this book is about. Causal effects are nothing philosophical or even
metaphysical. Instead, they are parameters that can be computed from the probabilities

Table 1.2. Joe and Ann with self-selection – explicit table

Possible outcome ωi     P({ωi })   Person U   Treatment X   Success Y   P(X =1|U )   P(Y =1|X )   P(Y =1|X, U )

ω1 = (Joe, no, −)     .144   Joe   0   0   .04   .6    .7
ω2 = (Joe, no, +)     .336   Joe   0   1   .04   .6    .7
ω3 = (Joe, yes, −)    .004   Joe   1   0   .04   .42   .8
ω4 = (Joe, yes, +)    .016   Joe   1   1   .04   .42   .8
ω5 = (Ann, no, −)     .096   Ann   0   0   .76   .6    .2
ω6 = (Ann, no, +)     .024   Ann   0   1   .76   .6    .2
ω7 = (Ann, yes, −)    .228   Ann   1   0   .76   .42   .4
ω8 = (Ann, yes, +)    .152   Ann   1   1   .76   .42   .4

Note. Each possible outcome ωi is a triple (unit, treatment, success). The columns show the probabilities P({ωi }) of the elementary events, the values of the observables U , X , and Y , and the values of the conditional probabilities P(X =1|U ), P(Y =1|X ), and P(Y =1|X,U ).

of events and the distributions of random variables pertaining to the random experiment
considered.
Table 1.2 describes the same random experiment as Table 1.1, but in a different way.
Each of the eight triples, such as (Joe, no, −) or (Ann, yes, +), represents one of the eight possible
outcomes ω1 , . . . , ω8 of the random experiment that are gathered in the set Ω of possible
outcomes. Remember, an event A is a subset of Ω that has a probability P (A), which is
assigned by the probability measure P to each element A in the set A of all events (see RS-
ch. 1 for these elementary concepts of probability theory). The eight probabilities of the
elementary events contain the same information as the eight probabilities in Table 1.1. All
conditional probabilities appearing in Table 1.2, but also all probabilities and all condi-
tional probabilities presented in Table 1.1 can be computed from these eight probabilities
of the elementary events.
Table 1.2 has the virtue of explicitly showing all possible outcomes of the random ex-
periment considered. Furthermore, it shows how the random variables U , X , and Y are
defined, showing the assignments of their values to each of the eight possible outcomes
of the random experiment (see RS-Def. 2.2 for the definition of a random variable). It also
displays the conditional probabilities P (Y =1|X ,U ), P (Y =1|X ), and P (X =1|U ), which are
random variables on the same probability space as the observables, that is, the random
variables U , X , and Y (see RS-Def. 4.1 and RS-Rem. 4.12 for the definition of such a con-
ditional probability).
The crucial point is that each of the conditional probabilities mentioned above also
assigns a value to each of the eight possible outcomes of the random experiment. For
example, the values assigned by P (X =1|U ) to each outcome ωi ∈ Ω are the conditional
probabilities P (X =1|U =u ). More precisely,

P (X =1| U )(ωi ) = P (X =1| U =u ), if ωi ∈ {ω ∈ Ω: U (ω) = u }. (1.1)



Table 1.3. Joint and marginal probabilities of treatment and success

Treatment
Success No (X =0) Yes (X =1)
No (Y =0) .240 .232 .472
Yes (Y =1) .360 .168 .528
.600 .400 1.000

Note. The entries in the four cells are the joint probabilities P(X =x ,Y =y), the other entries are
the marginal probabilities P(X =x ) (last row) and P(Y =y) (last column).

Similarly,

P (Y =1|X )(ωi ) = P (Y =1|X =x ), if ωi ∈ {ω ∈ Ω: X (ω) = x }, (1.2)

and

P (Y =1|X ,U )(ωi ) = P (Y =1|X =x ,U =u ), if ωi ∈ {ω ∈ Ω: X (ω) = x,U (ω) = u } (1.3)

(see Table 1.2 in order to check these assignment rules).
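To make these assignment rules concrete, the following minimal sketch (in Python; it is only an illustration added here, not part of the original analysis, and names such as prob and p are ad hoc) recovers the conditional probabilities P(X =1|U =u), P(Y =1|X =x), and P(Y =1|X =x, U =u) of Tables 1.1 and 1.2 from the probabilities of the eight elementary events.

# Illustrative sketch: the eight elementary events of Table 1.2, encoded as
# (person u, treatment x, success y) -> P({omega_i}).
prob = {
    ("Joe", 0, 0): .144, ("Joe", 0, 1): .336, ("Joe", 1, 0): .004, ("Joe", 1, 1): .016,
    ("Ann", 0, 0): .096, ("Ann", 0, 1): .024, ("Ann", 1, 0): .228, ("Ann", 1, 1): .152,
}

def p(condition):
    """P(A) for the event A = {omega : condition(omega)}, as a sum of elementary probabilities."""
    return sum(pr for omega, pr in prob.items() if condition(omega))

# P(X=1 | U=u), cf. Table 1.1: .04 for Joe, .76 for Ann
for u in ("Joe", "Ann"):
    print(u, round(p(lambda o: o[0] == u and o[1] == 1) / p(lambda o: o[0] == u), 2))

# P(Y=1 | X=x), cf. Table 1.2: .6 for x = 0, .42 for x = 1
for x in (0, 1):
    print(x, round(p(lambda o: o[1] == x and o[2] == 1) / p(lambda o: o[1] == x), 2))

# P(Y=1 | X=x, U=u), cf. the last column of Table 1.2: .7/.8 for Joe, .2/.4 for Ann
for u in ("Joe", "Ann"):
    for x in (0, 1):
        print(u, x, round(p(lambda o: o == (u, x, 1)) / p(lambda o: o[:2] == (u, x)), 2))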

1.1.1 Joint Probabilities P (X =x , Y =y)

Unfortunately, in realistic applications, we cannot estimate the probabilities of all elemen-


tary events displayed in Table 1.2 because, usually, we cannot repeat this random experi-
ment. Once a person is treated, we cannot undo the treatment and repeat this process. This
has been called the “fundamental problem of causal inference” (Holland, 1986). Often-
times a treatment is irreversible and the time between treatment and the assessment of the
outcome variable Y can take months and even years. Nevertheless, it is meaningful to con-
sider the random experiment described by Table 1.2 including the person-specific prob-
abilities of treatment and the probabilities of success given treatment and person. As will
be shown in chapter 5, we even have to consider this kind of random experiment when we
want to define causal effects. And once we do consider such random experiments, causal
effects are relatively easy to define.
What can be estimated in empirical applications are the joint probabilities such as
P (X =x , Y =y ) and the conditional probabilities such as P (Y =1|X =1) and P (Y =1|X =0),
that is, the conditional probabilities of success given treatment and given control, respec-
tively. In order to estimate P (X =x , Y =y ) we only have to observe the relative frequencies
of the joint occurrence of treatment x and outcome y. In a simulation, we can easily repeat
this random experiment n times in order to generate a data sample of size n (see Exercise
1-7). In contrast, in an empirical application, estimating P (X =x , Y =y ) requires sampling
from a very large set of persons (observational units), and not just from the set { Joe , Ann }
of two persons. (See Splawa-Neyman, 1923/1990 for an early sampling model dealing with
the problem of non-replacement).
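For readers who want to try this, here is one way such a simulation could look (an illustrative Python sketch assuming the parameters of Table 1.1; it is not the solution to Exercise 1-7, and the names p_treat, p_success, and draw are ad hoc): the three parts of the random experiment are sampled in sequence, and the relative frequencies of the pairs (x, y) fluctuate around the joint probabilities P(X =x, Y =y) of Table 1.3.

# Illustrative simulation sketch of the three-part random experiment of Table 1.1.
import random
from collections import Counter

p_treat   = {"Joe": .04, "Ann": .76}                      # P(X=1 | U=u)
p_success = {("Joe", 0): .7, ("Joe", 1): .8,              # P(Y=1 | U=u, X=x)
             ("Ann", 0): .2, ("Ann", 1): .4}

def draw():
    u = random.choice(["Joe", "Ann"])                     # part (1): sample the person
    x = int(random.random() < p_treat[u])                 # part (2): (self-)selection to treatment
    y = int(random.random() < p_success[(u, x)])          # part (3): success or failure
    return x, y

n = 100_000
counts = Counter(draw() for _ in range(n))
for xy in sorted(counts):
    print(xy, counts[xy] / n)    # relative frequencies, fluctuating around Table 1.3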
Table 1.3 shows the joint probabilities P (X =x , Y =y ) of treatment and success, as well
as the marginal probabilities P (X =x ) and P (Y =y) of treatment and success, respectively.

Figure 1.1. Probability of success given treatment conditions (bar chart, figure omitted: .6 under control, .42 under treatment)

These probabilities are easily computed from the probabilities of the elementary events
displayed in the second column of Table 1.2. For example, the probability P (X =0, Y =1)
that the sampled person receives no treatment and is successful is the sum of the proba-
bilities of the elementary events {ω2 } = {(Joe, no, +)} and {ω6 } = {(Ann, no, +)}, that is,
P(X =0, Y =1) = P({(Joe, no, +)}) + P({(Ann, no, +)}) = .336 + .024 = .36.

Similarly, the probability P (X =1, Y =1) that the sampled person receives treatment
and is successful is the sum of the probabilities of the two elementary events {ω4 } =
{(Joe, yes, +)} and {ω8 } = {(Ann, yes, +)}, that is,
P(X =1, Y =1) = P({(Joe, yes, +)}) + P({(Ann, yes, +)}) = .016 + .152 = .168.

Table 1.3 is the theoretical analog to a contingency table that would be observed in a
data sample. More precisely, if we multiply the displayed numbers by the sample size, then
we receive the expected frequencies of the corresponding events. For example, if the sample
size is 1000, then we expect 240 cases in cell (X =0, Y = 0) and 360 cases in cell (X =0, Y =1),
etc. Of course, in a data sample, the observed frequencies would fluctuate around these
expected frequencies (see again Exercise 1-7).

1.1.2 Marginal Probabilities P (X =x ) and P (Y =y)

The marginal probabilities P (X =x ) and P (Y =y) are also easily computed from the prob-
abilities of the elementary events displayed in the second column of Table 1.2. For exam-
ple, the probability P (X =0) that the sampled person receives no treatment is the sum of
the probabilities of the four elementary events {ω1 } = {(Joe, no, −)}, {ω2} = {(Joe, no, +)},
{ω5 } = {(Ann, no, −)}, and {ω6 } = {(Ann, no, +)}, that is,
P(X =0) = P({(Joe, no, −)}) + P({(Joe, no, +)}) + P({(Ann, no, −)}) + P({(Ann, no, +)})
        = .144 + .336 + .096 + .024 = .6,

and this implies

P (X =1) = 1 − P (X =0) = .4

Similarly, the probability P (Y =1) that the sampled person is successful is the sum of the
probabilities of the four elementary events {ω2} = {(Joe, no, +)}, {ω4 } = {(Joe, yes, +)}, {ω6 } =
{(Ann, no, +)}, and {ω8 } = {(Ann, yes, +)}, that is,
P(Y =1) = P({(Joe, no, +)}) + P({(Joe, yes, +)}) + P({(Ann, no, +)}) + P({(Ann, yes, +)})
        = .336 + .016 + .024 + .152 = .528,

which implies

P (Y = 0) = 1 − P (Y =1) = .472.

1.1.3 Prima Facie Effect

Comparing the conditional probability P (Y =1|X =1) of success given the treatment con-
dition to the conditional probability P (Y =1|X =0) of success given the control condition
would lead us to the (wrong) conclusion that the treatment is harmful. These two condi-
tional probabilities can be computed by

P(Y =1 | X =1) = P(Y =1, X =1) / P(X =1) = .168 / .4 = .42

and

P(Y =1 | X =0) = P(Y =1, X =0) / P(X =0) = .36 / .6 = .6,
respectively (see, e. g., RS-Def. 1.32 for the events {Y =1}, {X =0}, and {X =1}). Figure 1.1
displays both conditional probabilities in a bar chart.
These two conditional probabilities can be compared to each other in different ways.
The simplest one is looking at the difference P (Y =1| X =1) − P (Y =1| X = 0). This is a par-
ticular case of the difference E (Y |X =1) − E (Y |X =0) between two conditional expectation
values, in which the outcome variable Y is dichotomous with values 0 and 1 (see RS-
Rem. 3.22). Following Holland (1986), we will call this difference the (unconditional) prima
facie effect and use the notation

PFE 10 = E (Y |X =1) − E (Y |X =0) = P (Y =1| X =1) − P (Y =1| X = 0) .

Hence, in this example,

PFE 10 = P (Y =1| X =1) − P (Y =1| X = 0) = .42 − .6 = −.18.

Other possibilities of comparing the two conditional probabilities are to compute the odds
ratio, its logarithm, or the risk ratio (see, e. g., SN-sect. 13.3 or chapter 4 of Rothman, Green-
land, & Lash, 2008, for a detailed discussion of these and other effect parameters). No mat-
ter which of these effect parameters we choose, they all lead to the conclusion that the
treatment is harmful (see Exercise 1-8). As shown in the following section this conclusion
is utterly wrong.
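These computations can be checked with a few lines of Python (an illustrative sketch, not part of the original text; p_joint and the other names are ad hoc): the prima facie effect, the risk ratio, and the odds ratio are all computed from the joint probabilities of Table 1.3, and all of them point in the same direction.

# Illustrative sketch: prima facie effect, risk ratio, and odds ratio from Table 1.3.
p_joint = {(0, 0): .240, (0, 1): .360, (1, 0): .232, (1, 1): .168}    # P(X=x, Y=y)

p_x = {x: p_joint[(x, 0)] + p_joint[(x, 1)] for x in (0, 1)}          # P(X=x): .6 and .4
p_y1 = {x: p_joint[(x, 1)] / p_x[x] for x in (0, 1)}                  # P(Y=1 | X=x): .6 and .42

pfe_10 = p_y1[1] - p_y1[0]                                            # -.18
risk_ratio = p_y1[1] / p_y1[0]                                        # .7
odds_ratio = (p_y1[1] / (1 - p_y1[1])) / (p_y1[0] / (1 - p_y1[0]))    # about .48

print(round(pfe_10, 2), round(risk_ratio, 2), round(odds_ratio, 2))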

Table 1.4. Joint and marginal probabilities of all three observables

Joe (U =Joe )

Treatment
Success No (X =0) Yes (X =1)
No (Y = 0) .144 .004 .148
Yes (Y =1) .336 .016 .352
.48 .02 .5

Ann (U =Ann )

Treatment
Success No (X =0) Yes (X =1)
No (Y = 0) .096 .228 .324
Yes (Y =1) .024 .152 .176
.12 .38 .5

Note. The entries in the four cells for Joe and in the four cells for Ann are the joint probabilities
P(U =u , X =x ,Y =y). The other entries are the joint probabilities P(U =u , X =x ) (third and last
row) and P(U =u ,Y =y ) (last column), respectively, and the two marginal probabilities P(U =u ).

1.1.4 Individual Total Effects

The conclusion about the effect of the treatment is completely different if we look at the
treatment effects separately for Joe and Ann. Table 1.4 shows the joint distributions of
treatment, success, and the person variable U with values Joe and Ann . The probabilities
of sampling Joe and of sampling Ann are identical, that is, P (U =Joe ) = P (U =Ann ) = .5.
Furthermore, the joint probabilities P (U =u , X =x , Y =y ) are the probabilities of the ele-
mentary events displayed in the second column of Table 1.2. These joint probabilities are
displayed again in a form analogous to a (2×2×2)-contingency table in Table 1.4.
As already mentioned in section 1.1.1, in empirical applications, this random exper-
iment cannot be repeated in order to obtain a data sample. However, we can repeat it
in a simulation (see Exercise 1-7). If, in such a simulation, we multiply the numbers dis-
played in Table 1.4 by the sample size, then we receive the expected frequencies of the cor-
responding events. For example, if the sample size is 1000, then we expect 144 cases in
cell (U =Joe , X =0, Y = 0) and 336 cases in cell (U =Joe , X =0, Y =1), etc. Of course, in data
samples, the observed frequencies fluctuate around these expected frequencies.
Using the joint probabilities displayed in Table 1.4, the conditional probability of suc-
cess for Joe in the treatment condition can be computed as follows:
P(Y =1 | X =1, U =Joe) = P(U =Joe, X =1, Y =1) / P(U =Joe, X =1) = .016 / (.016 + .004) = .8
(see Exercise 1-9). In contrast, Joe’s conditional probability of success in the control con-
dition is

P(Y =1 | X =0, U =Joe) = P(U =Joe, X =0, Y =1) / P(U =Joe, X =0) = .336 / (.336 + .144) = .7.

Hence,
P (Y =1| X =1,U =Joe ) − P (Y =1| X = 0,U =Joe ) = .8 − .7 = .1,

which may lead us to conclude that the treatment is beneficial for Joe. Again, because Y is
binary with values 0 and 1, this difference is a special case of the difference

E (Y | X =1,U =Joe ) − E (Y | X =0,U =Joe ),

which we call the individual total (treatment) effect of Joe, using the notation ITE U ;10 ( Joe ).
Hence,
ITE U ;10 (Joe) = E(Y | X =1, U =Joe) − E(Y | X =0, U =Joe)
              = P(Y =1 | X =1, U =Joe) − P(Y =1 | X =0, U =Joe).     (1.4)

What about the individual total effect of Ann? Table 1.4 shows that the conditional prob-
ability of success for Ann in the treatment condition is

P(Y =1 | X =1, U =Ann) = P(U =Ann, X =1, Y =1) / P(U =Ann, X =1) = .152 / (.152 + .228) = .4,

whereas it is

P(Y =1 | X =0, U =Ann) = P(U =Ann, X =0, Y =1) / P(U =Ann, X =0) = .024 / (.024 + .096) = .2

in the control condition. Figure 1.2 shows these conditional probabilities in a bar chart.
Considering the individual total effect

ITE U ;10 (Ann) = P(Y =1 | X =1, U =Ann) − P(Y =1 | X =0, U =Ann) = .4 − .2 = .2     (1.5)

of Ann may lead us to conclude that the treatment is also beneficial for Ann.
Hence, it seems that the treatment is beneficial for Joe and for Ann. This seems to con-
tradict our finding ignoring the person variable. Just considering the prima facie effect

PFE 10 = E (Y |X =1) − E (Y |X =0) = P (Y =1| X =1) − P (Y =1| X = 0) = −.18,

ignoring the person variable U , the treatment seems to be harmful.

1.1.5 Prima Facie Effect Versus Expectation of the Individual Total Effects

In contrast to our intuition, the prima facie effect E (Y |X =1) − E (Y |X =0) is neither the
simple average nor any weighted average of the corresponding individual total effects

ITE U ; 10 (u) = E (Y |X =1,U =u ) − E (Y |X =0,U =u ). (1.6)

This is studied in more detail in the sequel.



Figure 1.2. Conditional probabilities of success given treatment and person (bar chart, figure omitted: .7 and .8 for Joe under control and treatment, .2 and .4 for Ann)

Prima Facie Effect

The conditional probability P (Y =1| X = 0) of success given control is the sum of the corre-
sponding probabilities P (Y =1| X = 0,U =Joe ) and P (Y =1| X = 0,U =Ann ), weighted by the
conditional probabilities P (U =Joe |X =0) and P (U =Ann|X =0), respectively, that is,
P(Y =1 | X =0) = P(Y =1 | X =0, U =Joe) · P(U =Joe | X =0) + P(Y =1 | X =0, U =Ann) · P(U =Ann | X =0)
             = .7 · (.48/.6) + .2 · (.12/.6) = .6
[see Box 3.2 (ii) and Exercise 1-10]. Because the difference between the conditional prob-
abilities P (U =Joe |X =0) = .48/.6 and P (U =Ann|X =0) = .12/.6 is large, the probability of
success in treatment 0 is much closer to .7 than to .2 (see the dots above X = 0 in Fig. 1.3).
Similarly, the conditional probability P (Y =1| X =1) of success given treatment con-
dition (X =1) is the sum of the two corresponding individual conditional probabilities
P (Y =1| X =1,U =Joe ) and P (Y =1| X =1,U =Ann ), weighted by the conditional probabili-
ties P (U =Joe |X =1) and P (U =Ann|X =1), respectively, that is,
P(Y =1 | X =1) = P(Y =1 | X =1, U =Joe) · P(U =Joe | X =1) + P(Y =1 | X =1, U =Ann) · P(U =Ann | X =1)
             = .8 · (.02/.4) + .4 · (.38/.4) = .42.
Hence, the prima facie effect is
PFE 10 = P(Y =1 | X =1) − P(Y =1 | X =0)
       = ∑u P(Y =1 | X =1, U =u) · P(U =u | X =1) − ∑u P(Y =1 | X =0, U =u) · P(U =u | X =0)     (1.7)
       = .42 − .6 = −.18.

Figure 1.3. Conditional probabilities of success given treatment and person (figure omitted: P(Y =1 | X =x, U =Joe), P(Y =1 | X =x, U =Ann), and P(Y =1 | X) plotted against x = 0, 1)

Because the two (X =1)-conditional probabilities P (U =Joe |X =1) = .02/.4 = .05 and
P (U =Ann|X =1) = .38/.4 = .95 are very different, the probability of success in treatment
1 is much closer to .4 than to .8 (see the dots above X =1 in Fig. 1.3). (The size of the area of the dotted circles is proportional to the conditional probabilities P(U =u | X =x) that are used in the computation of the conditional expectation values E(Y | X =x). This kind of graphic has been adapted from Agresti, 2007.)

Average of the Individual Effects

The prima facie effect is not identical to the expectation of the individual total effects,
which is the expectation of the random variable ITE U ; 10 (U ), the values of which are the
two individual total effects ITE U ; 10 ( Joe ) and ITE U ; 10 (Ann ) for Joe and Ann, respectively,
that is,
E(ITE U ;10 (U )) = ∑u ITE U ;10 (u) · P(U =u)
                 = ∑u P(Y =1 | X =1, U =u) · P(U =u) − ∑u P(Y =1 | X =0, U =u) · P(U =u).     (1.8)

Because the two individual effects are ITE U ; 10 ( Joe ) = .1 and ITE U ; 10 (Ann ) = .2,

E(ITE U ;10 (U )) = .1 · P(U =Joe) + .2 · P(U =Ann) = .1 · (1/2) + .2 · (1/2) = .15.

Hence, whereas the prima facie effect PFE 10 = P (Y =1| X =1)− P (Y =1| X = 0) is negative,
namely −.18, the expectation of the individual total-effect variable is positive, namely .15
(see Exercise 1-15). This expectation will be called the causal average total effect of the
treatment on the outcome variable Y , denoted ATE 10 .
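The contrast between the two parameters can also be checked computationally. The following sketch (illustrative Python, not part of the original text; p_u, p_treat, and p_success are ad hoc names for the parameters of Table 1.1) computes the individual total effects, their expectation ATE 10, and the prima facie effect PFE 10.

# Illustrative sketch: individual total effects, their expectation, and the prima facie effect.
p_u       = {"Joe": .5, "Ann": .5}                        # P(U=u)
p_treat   = {"Joe": .04, "Ann": .76}                      # P(X=1 | U=u)
p_success = {("Joe", 0): .7, ("Joe", 1): .8,              # P(Y=1 | U=u, X=x)
             ("Ann", 0): .2, ("Ann", 1): .4}

ite = {u: p_success[(u, 1)] - p_success[(u, 0)] for u in p_u}         # ITE_{U;10}(u): .1 and .2
ate = sum(ite[u] * p_u[u] for u in p_u)                               # ATE_10 = .15

# The prima facie effect weights the same conditional probabilities by P(U=u | X=x) instead.
p_x1 = sum(p_u[u] * p_treat[u] for u in p_u)                          # P(X=1) = .4
w1 = {u: p_u[u] * p_treat[u] / p_x1 for u in p_u}                     # P(U=u | X=1)
w0 = {u: p_u[u] * (1 - p_treat[u]) / (1 - p_x1) for u in p_u}         # P(U=u | X=0)
pfe = sum(p_success[(u, 1)] * w1[u] for u in p_u) - \
      sum(p_success[(u, 0)] * w0[u] for u in p_u)                     # -.18

print({u: round(v, 2) for u, v in ite.items()}, round(ate, 2), round(pfe, 2))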

1.1.6 How to Evaluate the Treatment?

The conclusions drawn from the prima facie effect

PFE 10 = P (Y =1|X =1) − P (Y =1|X =0)

and from the individual effects

ITEU ;10 (u) = P (Y =1|X =1,U =u ) − P (Y =1|X =0,U =u )

are contradictory. Which of these comparisons should we trust? Is the treatment harmful
as P (Y =1|X =1) − P (Y =1|X =0) = −.18 suggests? Or is it beneficial as suggested by the
two positive differences P (Y =1|X =1,U =u ) − P (Y =1|X =0,U =u )? Which of these com-
parisons are meaningful for evaluating the causal total effect of the treatment on the suc-
cess variable Y ? Before we come back to these questions, we consider another example.

1.2 Example 2 — Experiment With Two Nonorthogonal Factors

In this section, we treat an example with three treatment conditions, representing two
treatments and a control, for instance. Furthermore, there is a discrete covariate with
three values, representing, for example, educational status, and a quantitative outcome
variable, indicating the degree of success, for instance.1
Table 1.5 displays the parameters specifying a random experiment. Again, this random experiment is
composed of three parts.
(a) A person is sampled from a set of eight persons with identical probabilities for each
person u, that is, with probability P (U =u ) = 1/8.
(b) If Tom is sampled, then he obtains treatment 1 with probability P (X =1|U =Tom )
= 10/60 and treatment 2 with probability P (X =2 |U =Tom ) = 3/60. The corre-
sponding probabilities are also displayed for the other seven persons. The proba-
bilities of getting treatment 0 can be computed from the displayed probabilities for
treatment 1 and 2. For example, for Tom it is P (X =0 |U =Tom ) = 1 − (10/60 + 3/60).
And again, all these conditional probabilities may reflect self-selection to one of the
treatments and the different inclinations of the persons to go to those treatments.
(c) After receiving treatment, a value of the outcome variable Y (success) is assessed.
These values cannot be displayed in the table, because we assume that Y is contin-
uous. Instead, the table displays the (U =u , X =x )-conditional expectation values of
Y . If Tom is sampled and not treated, then his expectation value E (Y |U =Tom , X =0)
1 In this example, we consider a (3×3)-factorial design with crossed, non-orthogonal factors. The analysis of
such designs has been puzzling many statisticians (see, e. g., Aitkin, 1978; Appelbaum & Cramer, 1974; Carlson &
Timm, 1974; Gosslee & Lucas, 1965; Jennings & Green, 1984; Keren & Lewis, 1976; Kramer, 1955; Langsrud, 2003;
Nelder & Lane, 1995; Overall & Spiegel, 1969, 1973b, 1973a; Overall, Spiegel, & Cohen, 1975; Williams, 1972). In
fact, none of the statistical packages such as SAS, SysStat, or SPSS with their Type I, II, III or IV sums of squares
provide correct estimates and tests of the average effects (or main effects) for such a design unless the second
factor has a uniform distribution, with equal probabilities for all values of the second factor. In this case Type III
analysis yields correct results, at least, if the second factor is assumed to be fixed. However, in most applications
in the social sciences, the second factor is not fixed but stochastic with a distribution of this factor (a qualitative
random variable) varying between samples. Mayer and Thoemmes (2019) show how to conduct a correct analysis
including average total effects (see also Exercise 1-17).

Table 1.5. Random experiment with two nonorthogonal factors

Person u   Educational status z   P(U =u)   P(X =1|U =u)   P(X =2|U =u)   E(Y |X =0, U =u)   E(Y |X =1, U =u)   E(Y |X =2, U =u)

Tom   low   1/8   10/60    3/60   120   100    80
Tim   low   1/8   18/60    9/60   120   100    80
Joe   med   1/8   26/60   17/60    90    90    70
Jim   med   1/8   26/60   17/60   100   100    80
Ann   med   1/8   26/60   17/60   120   100   100
Eva   med   1/8   26/60   17/60   130   110   110
Sue   hi    1/8   12/60   44/60    60   100   140
Mia   hi    1/8   16/60   36/60    60   100   140

of Y is 120. If he is sampled and receives treatment 1, then his expectation value


E (Y |U =Tom , X =1) is 100, and if he gets treatment 2, then his expectation value
E (Y |U =Tom , X =2) is 80. The corresponding conditional expectation values are
also displayed in Table 1.5 for the other seven persons.

This table also contains the values of a qualitative covariate Z , which indicates an at-
tribute of the person, his or her educational status. Because it is an attribute of the person,
there is no extra sampling of Z . The value z of Z is fixed as soon as the person is actually
sampled. In chapter 2, we will also deal with random experiments in which we do have an
extra sampling process of one or several covariates. This will always be the case if, given
the person, his or her value on Z is not fixed. A typical example is Z being a fallible pretest,
for instance, a psychological test (say, of life satisfaction) that is not perfectly reliable, so
that there is measurement error.

For this example to be realistic, we have to assume that there is still variation of the
outcome variable Y in each combination of person and treatment condition. This condi-
tional variance may be due to (a) measurement error, but also to (b) mediator effects, that
is, to effects of variables and events that are in between X and the outcome variable Y in
the process considered. Because Y is continuous and subject to measurement error and
mediator effects, a full table similar to Table 1.2 with all possible outcomes is not feasi-
ble. It will not exist if Y is actually continuous, which would be true if we assumed,
for example, that Y has a normal distribution given the combination of a person u and a
treatment condition x.

Nevertheless, it is still possible to present a table that is analogous to Table 1.1. In that table,
the conditional probabilities P (Y =1|X =x ,U =u ) are identical to the conditional expecta-
tion values E (Y | X =x ,U =u ) if Y is binary with values 0 and 1 (see RS-Rem. 3.22).

Table 1.6. Conditional expectation values of the outcome variable Y given treatment

Treatment condition E (Y |X =x ) P(X =x )


X =0 (Control) 111.25 1/3
X =1 (Treatment 1) 100 1/3
X = 2 (Treatment 2) 114.25 1/3
E (Y ) 108.5

1.2.1 Prima Facie Effects

The conditional expectation values of the outcome variable Y given one of the three treat-
ment conditions x are displayed in Table 1.6. The ratios in the last column are the treat-
ment probabilities P (X =x ), which are 1/3 for all three values x of X . Note that this is not a
randomized design as will become obvious if we look at the joint probabilities of X and Z
(see Table 1.7). Furthermore, considering the conditional expectation values, and not the
sample means, should make clear that we are not discussing statistical inference (i. e., in-
ference from sample statistics to true parameters), but causal inference, that is, inference
from the conditional expectation values such as E (Y | X =x ) or E (Y | X =x , Z =z) to causal
effects.
If our evaluation of the treatment effects were based on the differences between the
conditional expectation values E (Y |X =x ) of Y in the three treatment conditions x, then
we would conclude that there is a negative effect of treatment 1 compared to control,
namely,
E (Y |X =1) − E (Y |X =0) = 100 − 111.25 = −11.25,

and a positive effect of treatment 2 compared to control, namely,

E (Y |X =2) − E (Y |X =0) = 114.25 − 111.25 = 3

(see Exercise 1-16).
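As an illustration (a Python sketch, not part of the original text; the Fraction type is used only to avoid rounding, and all names are ad hoc), the conditional expectation values E(Y |X =x) of Table 1.6 can be computed from the parameters of Table 1.5 by weighting the person-specific conditional expectation values with P(U =u |X =x).

# Illustrative sketch: E(Y | X=x) of Table 1.6, computed from the parameters of Table 1.5.
from fractions import Fraction as F

# person: (status z, P(X=1|U=u), P(X=2|U=u), E(Y|X=0,U=u), E(Y|X=1,U=u), E(Y|X=2,U=u))
table = {
    "Tom": ("low", F(10, 60), F(3, 60),  120, 100,  80),
    "Tim": ("low", F(18, 60), F(9, 60),  120, 100,  80),
    "Joe": ("med", F(26, 60), F(17, 60),  90,  90,  70),
    "Jim": ("med", F(26, 60), F(17, 60), 100, 100,  80),
    "Ann": ("med", F(26, 60), F(17, 60), 120, 100, 100),
    "Eva": ("med", F(26, 60), F(17, 60), 130, 110, 110),
    "Sue": ("hi",  F(12, 60), F(44, 60),  60, 100, 140),
    "Mia": ("hi",  F(16, 60), F(36, 60),  60, 100, 140),
}
p_u = F(1, 8)                                              # P(U=u) = 1/8 for every person

def p_x_given_u(row, x):                                   # P(X=x | U=u)
    return (1 - row[1] - row[2], row[1], row[2])[x]

for x in (0, 1, 2):
    p_x = sum(p_u * p_x_given_u(row, x) for row in table.values())               # P(X=x) = 1/3
    e_y = sum(p_u * p_x_given_u(row, x) * row[3 + x] for row in table.values()) / p_x
    print(x, p_x, float(e_y))                              # 111.25, 100, 114.25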

1.2.2 (Z =z)-Conditional Prima Facie Effects

A second attempt to evaluate the ‘effects’ of the treatment is to look at the differences
between the conditional expectation values of Y in the three treatment conditions given
one of the three values of Z : low, med, and hi. These (Z =z)-conditional effects are also
called simple effects in the literature on analysis of variance.
Table 1.7 displays the conditional expectation values of the outcome variable Y in the
nine cells of the (3×3)-design. The ratios in parentheses are the probabilities that the pairs
(x, z) of values of X and Z are observed. Hence, this table contains the conditional expec-
tation values (true cell means) E (Y | X =x , Z =z) of the outcome variable Y , and the joint
probabilities P (X =x , Z =z) determining the true joint distribution of X and Z .2

2 In this context, ‘true’ just indicates that we are not referring to sample means or relative frequencies in a sample.
Instead these are the true means around which sample means would vary.

Table 1.7. Conditional expectation values E (Y | X =x , Z=z) given treatment and status

Status
Treatment low (Z = 0) med (Z =1) hi (Z = 2)
X =0 120 (20/120) 110 (17/120) 60 (3/120) (40/120)
X =1 100 (7/120) 100 (26/120) 100 (7/120) (40/120)
X =2 80 (3/120) 90 (17/120) 140 (20/120) (40/120)
(30/120) (60/120) (30/120)

Note. Probabilities P(X =x , Z=z), P(Z=z), and P(X =x ) in parentheses.

In the low status condition (Z = 0), there are large negative effects, both of treatment 1
and of treatment 2 compared to the control:
PFE Z ;10 (0) = E (Y | X =1, Z = 0) − E (Y | X = 0, Z = 0) = 100 − 120 = −20
and
PFE Z ;20 (0) = E (Y | X =2, Z = 0) − E (Y | X = 0, Z = 0) = 80 − 120 = −40.
In the medium status condition (Z =1), there are also negative effects of treatment 1 and
of treatment 2 compared to the control:
PFE Z ;10 (1) = E (Y |X =1, Z =1) − E (Y |X = 0, Z =1) = 100 − 110 = −10
and
PFE Z ; 20 (1) = E (Y | X =2, Z =1) − E (Y |X = 0, Z =1) = 90 − 110 = −20.
Finally, in the high status condition (Z =2), the effects of treatment 1 and treatment 2 are
both positive:
PFE Z ; 10 (2) := E (Y | X =1, Z =2) − E (Y | X =0, Z =2) = 100 − 60 = 40
and
PFE Z ;20 (2) := E (Y | X =2, Z =2) − E (Y | X =0, Z =2) = 140 − 60 = 80.
Based on these comparisons, we can conclude that the ‘effects’ of the treatments depend
on the status of the subjects: the differences between the expectations of Y are negative
for subjects with low and medium status, and they are positive for the subjects with high
status.

1.2.3 Average of the (Z =z)-Conditional Prima Facie Effects

Now we consider the average of the (Z =z)-conditional prima facie effects, where Z is again
the qualitative covariate status.3 Because we already looked at the corresponding (Z =z)-
conditional prima facie effects (see section 1.2.2), we just have to compute their averages,
3 Note that we assume that Z is a random variable. In contrast, in analysis of variance it is assumed that Z is
a fixed factor with a fixed number of observations for each value z of Z , that is, these numbers of observations
are assumed to be invariant across different samples. In many empirical applications, this assumption is not
realistic, but it does not invalidate the statistical conclusions as long as the parameters of interest do not involve
the distribution of Z . However, a hypothesis about the average total effect does involve the distribution of Z
if we the term ‘average’ is specified as an expectation value, and this is the reason why programs on analysis
of variance usually are not able to correctly estimate and test hypotheses about average total effects. For more
details see again Mayer and Thoemmes (2019).

Figure 1.4. Conditional expectation values of Y given treatment and status (figure omitted: E(Y | X =x), E(Y | X =x, Z =0), E(Y | X =x, Z =1), and E(Y | X =x, Z =2) plotted against x = 0, 1, 2)

more precisely, the expectations of these conditional effects over the distribution of status:

E(PFE Z ;10 (Z )) = ∑z PFE Z ;10 (z) · P(Z =z) = (−20) · (1/4) + (−10) · (1/2) + 40 · (1/4) = 0.     (1.9)

Hence, the average of the (Z =z)-conditional prima facie effects of treatment 1 compared
to the control is 0.
Comparing treatment 2 to control yields the average of the (Z =z)-conditional prima
facie effects:
E(PFE Z ;20 (Z )) = ∑z PFE Z ;20 (z) · P(Z =z) = (−40) · (1/4) + (−20) · (1/2) + 80 · (1/4) = 0.     (1.10)

According to this result, the average effect of the (Z =z)-conditional prima facie effects of
treatment 2 compared to the control is 0 as well.
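Written as a few lines of Python (an illustration only, not part of the original text; the numbers are the conditional prima facie effects from section 1.2.2 and the probabilities P(Z =z) from Table 1.7), the expectations (1.9) and (1.10) are simply weighted averages:

# Illustrative sketch: the expectations (1.9) and (1.10) as weighted averages.
p_z      = {"low": 1/4, "med": 1/2, "hi": 1/4}            # P(Z=z), cf. Table 1.7
pfe_z_10 = {"low": -20, "med": -10, "hi": 40}             # PFE_{Z;10}(z)
pfe_z_20 = {"low": -40, "med": -20, "hi": 80}             # PFE_{Z;20}(z)

print(sum(pfe_z_10[z] * p_z[z] for z in p_z))             # 0.0, cf. (1.9)
print(sum(pfe_z_20[z] * p_z[z] for z in p_z))             # 0.0, cf. (1.10)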

1.2.4 Individual Effects

In this fictive example we can also look at the individual effects of treatment 1 compared
to control and treatment 2 compared to control. These two effects can be read from Table
1.5 for each person. For example, for Tom the individual effect of treatment 1 compared to
control is

ITE U ; 10 (Tom ) = E (Y | X =1, Tom ) − E (Y | X =0, Tom ) = 100 − 120 = −20,

and his individual effect of treatment 2 compared to control is

ITE U ;20 (Tom ) = E (Y | X =2, Tom ) − E (Y | X =0, Tom ) = 80 − 120 = −40.

Correspondingly, for Joe the individual effect of treatment 1 compared to control is



ITEU ;10 ( Joe ) = E (Y | X =1, Joe ) − E (Y | X =0, Joe ) = 90 − 90 = 0,

and his individual effect of treatment 2 compared to control is

ITE U ;20 ( Joe ) = E (Y | X =2, Joe ) − E (Y | X =0, Joe ) = 70 − 90 = −20.

For the reasons mentioned in section 1.1.1, unlike the (Z =z)-conditional prima facie ef-
fects treated in section 1.2.2, the individual effects usually cannot be estimated in em-
pirical applications. Nevertheless, they will play a crucial role in the definition of causal
effects.
Of course, individual effects are more informative than their average if we want to know
which treatment is the best for which individual. Nevertheless, we might ask: What are the
total individual treatment effects on average ? And, which are the (Z =z)-conditional effects,
that is, the total individual treatment effects on average, given the value z of Z ? Further-
more, if the total individual effects cannot be estimated in empirical applications under
realistic assumptions, is it possible to estimate at least the average of the total individual
treatment effects and/or the total individual treatment effects on average given the value
z of Z ? And if yes, under which conditions?

1.2.5 Average of the Individual Effects

Note that we have two averages of the individual effects in this example; we can com-
pare treatment 1 to control and treatment 2 to control. Because we already looked at the
corresponding individual effects, we just have to compute their averages, that is, the ex-
pectations of these conditional effects over the distribution of the person variable U , that
is,

E(ITE U ;10 (U )) = ∑u ITE U ;10 (u) · P(U =u)
                 = (100 − 120) · (1/8) + (100 − 120) · (1/8) + (90 − 90) · (1/8) + . . . + (100 − 60) · (1/8) = 0.
Hence, the average total effect of treatment 1 compared to the control is 0. Comparing
treatment 2 to control yields
E(ITE U ;20 (U )) = ∑u ITE U ;20 (u) · P(U =u)
                 = (80 − 120) · (1/8) + (80 − 120) · (1/8) + (70 − 90) · (1/8) + . . . + (140 − 60) · (1/8) = 0.
According to this result, the average total effect of treatment 2 compared to the control is
0 as well.

1.2.6 (Z =z)-Conditional Total Effects

Again, because we already know the individual effects, we just have to compute their av-
erages given the value z of the covariate, or more precisely, the (Z =z)-conditional expec-
tation values of the corresponding individual effects. We exemplify the computations for
the value med of the covariate (second factor) Z . Comparing treatment 1 to control yields
E(ITE U ;10 (U ) | Z =med) = ∑u ITE U ;10 (u) · P(U =u | Z =med)
                          = (90 − 90) · (1/4) + (100 − 100) · (1/4) + (100 − 120) · (1/4) + (110 − 130) · (1/4) = −10.

Hence, the (Z =med )-conditional total effect of treatment 1 compared to control is −10.
Comparing treatment 2 to control we obtain
E(ITE U ;20 (U ) | Z =med) = ∑u ITE U ;20 (u) · P(U =u | Z =med)
                          = (70 − 90) · (1/4) + (80 − 100) · (1/4) + (100 − 120) · (1/4) + (110 − 130) · (1/4) = −20.

Hence, the (Z =med)-conditional total effect of treatment 2 compared to control is −20.
The computations for the other two values of Z are analogous. For Z =low we obtain E(ITE U ;10 (U ) | Z =low) = −20 and E(ITE U ;20 (U ) | Z =low) = −40. In contrast, for Z =hi we obtain E(ITE U ;10 (U ) | Z =hi) = 40 and E(ITE U ;20 (U ) | Z =hi) = 80.
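The following sketch (illustrative Python, not part of the original text; it re-encodes the relevant columns of Table 1.5 and uses ad hoc names) computes the individual total effects, their (Z =z)-conditional expectations, and their overall expectation for both comparisons, so the values derived in sections 1.2.5 and 1.2.6 can be checked directly.

# Illustrative sketch: conditional and unconditional expectations of the individual total effects.
# person: (status z, E(Y|X=0,U=u), E(Y|X=1,U=u), E(Y|X=2,U=u)); P(U=u) = 1/8 for every person,
# so P(U=u | Z=z) is uniform within each status group.
table = {
    "Tom": ("low", 120, 100,  80), "Tim": ("low", 120, 100,  80),
    "Joe": ("med",  90,  90,  70), "Jim": ("med", 100, 100,  80),
    "Ann": ("med", 120, 100, 100), "Eva": ("med", 130, 110, 110),
    "Sue": ("hi",   60, 100, 140), "Mia": ("hi",   60, 100, 140),
}

def ite(u, x):                                   # ITE_{U;x0}(u): treatment x compared to control
    _, e0, e1, e2 = table[u]
    return (e0, e1, e2)[x] - e0

for x in (1, 2):
    ate = sum(ite(u, x) for u in table) / len(table)            # E(ITE_{U;x0}(U)) = 0
    groups = {z: [ite(u, x) for u in table if table[u][0] == z] for z in ("low", "med", "hi")}
    cte = {z: sum(v) / len(v) for z, v in groups.items()}       # E(ITE_{U;x0}(U) | Z=z)
    print(x, ate, cte)
# treatment 1 vs. control: 0 overall; -20 (low), -10 (med), 40 (hi)
# treatment 2 vs. control: 0 overall; -40 (low), -20 (med), 80 (hi)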

1.2.7 How to Evaluate the Treatment?

To summarize, we discussed several ways that may, at first sight, be used to evaluate the
treatment effects in empirical applications: First, we may compare the differences be-
tween the conditional expectation values E (Y |X =x ) of the outcome variable in the three
treatment conditions x ∈ { 0, 1, 2}. Second, we may consider the corresponding differences
between the conditional expectation values E (Y |X =x ,Z =z) given each of the three val-
ues z ∈ {low, med, hi } of status. Third, we may compare the expectations of these differ-
ences between the (X =x , Z =z)-conditional expectation values over the distribution of Z .
All these comparisons yield different results. Which of them are meaningful for the evalu-
ation of the treatment effects? All three of them, or only two, just one, or none at all? And
which are the conditions under which they are meaningful?
Furthermore, we also presented three parameters based on individual effects that may
be used to evaluate the treatment effects: First, the individual total effect of a treatment
compared to a control. These effects are hard to estimate in empirical applications, unless
we introduce very restrictive assumptions. Second, the expectation of these individual ef-
fects, which in this example, might also be called average total effects. This kind of effect
is less informative than the individual total effects, but it is a summary parameter that in-
forms us if the treatment is beneficial, ineffective, or even harmful on average. Third, the
conditional expectation values of the individual total effects, which in this example, might
also be called the (Z =z)-conditional total treatment effects. They inform us with a single
number for each value z if the treatment is beneficial, ineffective, or even harmful on av-
erage for those individuals with value z on the covariate Z . Box 1.1 provides a summary of
these effects.
In this example, the averages of the (Z =z)-conditional prima facie effects are identical
to the averages of the individual total effects. Is this just a coincidence? Or is this due to
systematic conditions that hold in this example? If yes, which are these conditions?

1.3 Summary and Conclusion

In this chapter, we treated two examples. In the first one, a dichotomous treatment vari-
able X has a negative (prima facie) effect P (Y =1|X =1) − P (Y =1|X =0) on a dichotomous
outcome variable Y (‘success’), although the corresponding individual treatment effects

P (Y =1|X =1,U =u ) − P (Y =1|X =0,U =u )

are positive. Taking the expectation of these two individual effects also yields a positive
effect.
In the second example, there are nonzero differences E (Y |X =1) − E (Y |X =0) and
E (Y |X =2) − E (Y |X =0), where Y is a quantitative outcome variable, and nonzero condi-
tional ‘effects’ E (Y |X =1, Z =z) − E (Y |X =0, Z =z) and E (Y |X =2, Z =z) − E (Y |X =0, Z =z)
for the different values z of status. The expectations of these (Z =z)-conditional prima fa-
cie effects (comparing treatment 1 to 0 and comparing treatment 2 to 0) over the three
status conditions are zero.

The Problem

Box 1.1 displays the various total effects that have been computed and discussed in the
two examples. Because the conclusions drawn from each of these putative effects are con-
tradictory, which of these should we trust? In the first example: Is the treatment harmful —
as the difference P (Y =1|X =1) − P (Y =1|X =0) suggests? Or is it beneficial as suggested by
the individual effects P (Y =1|X =1,U =u ) − P (Y =1|X =0,U =u )? In the second example:
Do the prima facie effects E (Y | X =1) − E (Y | X =0) have a meaningful causal interpreta-
tion? Or do the (Z =z)-conditional prima facie effects E (Y | X =1, Z =z) − E (Y | X =0, Z =z)
have a meaningful causal interpretation? And, does this apply also to their expectation?
In the first example, we demonstrated that we cannot expect that the difference

P (Y =1|X =1) − P (Y =1|X =0)

is the average (expectation) of the corresponding person-specific differences

P (Y =1|X =1,U =u )− P (Y =1|X =0,U =u ) .

Similarly, in the second example, we showed that we cannot expect that the difference

E (Y | X =1) − E (Y | X =0)

is the average (expectation) of the corresponding differences

E (Y | X =1, Z =z) − E (Y | X =0, Z =z)

given a value z of status. And, how do we know that these (Z =z)-conditional effects are
meaningful for the evaluation of the treatment? As noted before, these questions are not
related to statistical inference; they are not raised at the sample level, but on the level of
true conditional expectation values!
Hence our examples show that conditional expectation values and their differences,
the prima facie effects, can be totally misleading in evaluating the effects of a treatment
variable X on an outcome variable Y . This conclusion can also be extended to conditional
probabilities, to correlations and to all other parameters describing relationships and de-
pendencies between random variables. They all are like the shadow in the metaphor of the
invisible man (see the preface).
If this is true, is the whole idea of learning from experience — the core of empirical
sciences — wrong? Our answer is ‘No’. However, we have to be more explicit in what we

Box 1.1 Various effects treated in this chapter

PFE xx′     Prima facie effect of treatment x compared to treatment x′. It is defined by
            PFE xx′ := E(Y | X =x) − E(Y | X =x′).

PFE Z ;xx′ (z)     (Z =z)-conditional prima facie effect of treatment x compared to treatment x′. It is defined by
            PFE Z ;xx′ (z) := E(Y | X =x, Z =z) − E(Y | X =x′, Z =z).

E(PFE Z ;xx′ (Z ))     Expectation of the (Z =z)-conditional prima facie effects of treatment x compared to treatment x′. If Z is discrete, then it is computed by
            E(PFE Z ;xx′ (Z )) := ∑z PFE Z ;xx′ (z) · P(Z =z).

ITE U ;xx′ (u)     Individual effect of treatment x compared to treatment x′. It is defined by
            ITE U ;xx′ (u) := E(Y | X =x, U =u) − E(Y | X =x′, U =u).

E(ITE U ;xx′ (U ))     Expectation of the individual effects of treatment x compared to treatment x′. It is computed by
            E(ITE U ;xx′ (U )) := ∑u ITE U ;xx′ (u) · P(U =u).

E(ITE U ;xx′ (U ) | Z =z)     (Z =z)-conditional expectation value of the individual effects of treatment x compared to treatment x′. It is computed by
            E(ITE U ;xx′ (U ) | Z =z) := ∑u ITE U ;xx′ (u) · P(U =u | Z =z).

mean by terms like ‘X affects Y ’, ‘X has an effect on Y ’, ‘X influences Y ’, ‘X leads to Y ’,


and so on used in our theories and hypotheses. How can these terms be uniquely defined
in a language that is compatible with statistical analyses of empirical data? How to design
an empirical study and how to look at the resulting data if we want to probe our theories
and learn about the causal effects postulated in these theories and hypotheses?
In the chapters to come we will show that these parameters are meaningful under cer-
tain conditions, just like the shadow of the invisible man can be meaningful under certain
conditions in order to measure his height. In the metaphor, a crucial condition is the 45° angle of the sun. Do we also have such a crucial condition for causal inference? We
know that a reversal of total effects does not occur in randomized experiments, that is, in
experiments in which observational units (in the social and behavioral sciences, usually
the subjects or individuals) are randomly assigned to one of at least two treatment condi-
tions. In the randomized experiment comparing conditional expectation values is infor-

mative about total causal treatment effects. But why? What is so special in the randomized
experiment? Which are the mathematical conditions that we create in a randomized ex-
periment? Are there also conditions that can be utilized in quasi-experimental evaluation
studies? How can we estimate causal effects in quasi-experimental observational studies?
Conclusive answers to these questions can be hoped for only within a theory of causal
effects.

Relevance of the Problem

Obviously, these questions are of fundamental importance for the methodology of empir-
ical sciences and for the empirical sciences themselves. The answers to these questions
have consequences for the design and analysis of experiments, quasi-experiments, and
other studies aiming at estimating the effects of treatments, interventions, or expositions.
No prevention study can meaningfully be conducted and analyzed without knowing the
concepts of causal effects and how they can be estimated from empirical data. Similarly,
without a clear concept of causal effects we are not able to learn from our data about the
effects of a certain (possibly harmful) environment on our health, or about the effects of
certain behaviors such as smoking or drug abuse. Again, this is similar to the problem of
measuring the invisible man’s size via the length of his shadow: only with a clear concept
of size, some basic knowledge in geometry, and additional information such as the an-
gle of the sun at the time of measurement are we able to determine the size of the man
from the length of his shadow.

Research Traditions

Of course, raising these questions and attempting answers is not new. Immense knowl-
edge and wisdom about experiments and quasi-experiments has been collected in the
Campbellian tradition of experiments and quasi-experiments (see, e. g., Campbell & Stan-
ley, 1963; Cook & Campbell, 1979; Shadish et al., 2002). In the last decades, a more for-
mal approach has been developed supplementing the Campbellian theory and terminol-
ogy in important aspects: the theory of causal effects in the Neyman-Rubin tradition (see,
e. g., Splawa-Neyman, 1923/1990; Rubin, 1974, 2005). Many papers and books indicate the
growing influence of this theory (see, e. g., Greenland, 2000, 2004; Höfler, 2005; Rosen-
baum, 2002a; Rubin, 2006; Winship & Morgan, 1999; Morgan & Winship, 2007) and re-
markable efforts have already been made to integrate it into the Campbellian framework
(West, Biesanz, & Pitts, 2000). Furthermore, these questions have also been dealt with in
the graphical modeling tradition (see, e. g., Pearl, 2009; Spirtes et al., 2000) as well as in
biometrics, econometrics, psychometrics, epidemiology, and other fields dealing with the
methodology of empirical research.

Outlook

In this volume, we present the theory of causal total effects in terms of mathematical prob-
ability theory. We show that a number of questions that have been debated controver-
sially and inconclusively can now be given a clear-cut answer. What kinds of causal effects
can meaningfully be defined? Which design techniques allow for unbiased estimation of
causal effects? How to analyze nonorthogonal ANOVA designs (cf., e. g., Aitkin, 1978; Ap-

pelbaum & Cramer, 1974; Gosslee & Lucas, 1965; Maxwell & Delaney, 2004; Overall et al.,
1975)? How to analyze non-equivalent control-group designs (cf., e. g., Reichardt, 1979)?
Should we compare pre-post differences between treatment groups (cf., e. g., Lord, 1967;
Senn, 2006; van Breukelen, 2006; Wainer, 1991)? Should we use analysis of covariance to
adjust for differences between treatment and control that already existed prior to treat-
ment (cf., e. g., Maxwell & Delaney, 2004; Cohen, Cohen, West, & Aiken, 2003)? Should
we use propensity score methods instead of the more traditional procedures mentioned
above (cf., e. g., Rosenbaum & Rubin, 1983b)? How do we deal with non-compliance to
treatment assignment (cf., e. g., Cheng & Small, 2006; Dunn et al., 2003; Jo, 2002a, 2002b,
2002c; Jo, Asparouhov, Muthén, Ialongo, & Brown, 2008; J. Robins & Rotnitzky, 2004;
J. M. Robins, 1998)?
We do not treat the statistical sampling models with their distributional assumptions,
their implications for parameter estimation, and the evaluation (or tests) of hypotheses
about these parameters. However, we will discuss the virtues and problems of general
strategies of data analysis such as the analysis of difference scores, analysis of covariance,
its generalizations, and analysis based on propensity scores.

1.4 Exercises

⊲ Exercise 1-1 Why do we need the concept of a causal treatment effect?

⊲ Exercise 1-2 What is the relationship between the unconditional prima facie effect PFE 10 and the
expectations E (Y |X =0) and E (Y |X =1) of the outcome variable Y in the two treatment conditions?

⊲ Exercise 1-3 Compute the probabilities P(X =x ,Y =y) presented in Table 1.3 from the probabili-
ties P(U =u , X =x ,Y =y) presented in Table 1.4.

⊲ Exercise 1-4 Which are the kinds of prima facie effects treated in this chapter?

⊲ Exercise 1-5 What is the difference between statistical inference and causal inference?

⊲ Exercise 1-6 Why are the conditional expectation values E (Y |X =x ) in treatment conditions x
also the (X =x )-conditional probabilities for the event {Y =1} in the first example treated in this chap-
ter?

⊲ Exercise 1-7 Download Kbook Table 1.1.sav from www.causal-effects.de. This data set has been
generated from Table 1.1 for a sample of size N = 10,000. Compute the contingency table corre-
sponding to Table 1.3 and the associated estimates of the conditional probabilities P(Y =1|X =0)
and P(Y =1|X =1).

⊲ Exercise 1-8 Use P(Y =1| X =1) = .42 and P(Y =1| X = 0) = .6 computed in section 1.1.3 in order to
compute the corresponding odds ratio, its logarithm, and the risk ratio, according to the definitions
of these parameters presented in SN-sect. 13.3.

⊲ Exercise 1-9 Compute the conditional probability P(Y =1| X =1,U =Joe ) from Table 1.4.

⊲ Exercise 1-10 Compute the probability P(Y =1| X = 0) from the corresponding conditional prob-
abilities P(Y =1|X =0,U =u ).

⊲ Exercise 1-11 What (i. e., how big) are the unconditional prima facie effects of the treatments, that
is, the prima facie effects E (Y |X =1) −E (Y |X =0) and E (Y |X =2) −E (Y |X =0) in the second example
of this chapter?

⊲ Exercise 1-12 What are the conditional prima facie effects of the treatments, that is, the prima
facie effects E (Y |X =1, Z=z) −E (Y |X =0, Z=z) and E (Y |X =2, Z=z) −E (Y |X =0, Z=z ) in the second
example of this chapter?

⊲ Exercise 1-13 What are the averages of the conditional prima facie effects

E (Y |X =1, Z=z) − E (Y |X =0, Z=z) and E (Y |X =2, Z=z) − E (Y |X =0, Z=z)

in the second example of this chapter?

⊲ Exercise 1-14 Compute the conditional probability P(U =Tom | X =0) from the parameters pre-
sented in Tables 1.5 and 1.6.

⊲ Exercise 1-15 Open the Causal Effects Xplorer with table K-book table 1.1.tab. Change the condi-
tional probabilities P(X =1|U =u ) of receiving treatment 1 for Joe and Ann to 2/5. Then compare the
two individual treatment effects of Joe and Ann and their average to the prima facie effect E (Y |X =1)
−E (Y |X =0).

⊲ Exercise 1-16 Open the Causal Effects Xplorer with table K-book Table 1.1.tab displaying the con-
ditional probabilities P(U =u |X =x ). Then use RS-Box 3.2 (ii) in order to compute the three condi-
tional expectation values E (Y |X =x ) displayed in Table 1.6 from the parameters presented in Table
1.5.

⊲ Exercise 1-17 Download Kbook Table 1.5.sav from www.causal-effects.de. This data set has been
generated with the Causal Effects Xplorer from Table 1.5 for a sample of size N = 10,000 with error
variance 10 given each person.
(a) Compute the cell means and the relative frequencies of observations in each of the nine cells
of the (3×3)-table.
(b) Use each of the procedures offered by your statistical program package to analyze the data
including a test of the main effects of the treatment factor (most programs offer Type I, II, and
III sums of squares for such an analysis).
(c) Compare the results of these analyses to the parameters presented in Table 1.7.

Solutions

⊲ Solution 1-1 We need the concept of a causal treatment effect, because the two examples show
that differences between conditional expectation values are meaningless for the evaluation of the ef-
fects of a treatment, unless we can show how the differences between these conditional expectation
values are related to the causal treatment effects. Obviously, without a definition of causal treatment
effects this is not possible. Estimating causal treatment effects is crucial for answering questions
such as ‘Does the treatment help our patients with respect to the outcome variable considered?’
⊲ Solution 1-2 The unconditional prima facie effect PFE 10 is defined as the difference between the
two conditional expectation values E (Y |X =1) and E (Y |X =0).
⊲ Solution 1-3 This can easily be verified by adding the probabilities for the observations of the
pairs (x, y) of X and Y over males and females. This yields .144 + .096 = .24, .004 + .228 = .232, .336 +
.024 = .36 and .016 + .152 = .168.
⊲ Solution 1-4 The kinds of prima facie effects treated in this chapter are: the unconditional prima
facie effect, the conditional prima facie effect given the value z of a covariate Z , and the average of the
(Z=z)-conditional prima facie effects. The unconditional prima facie effect of treatment 1 compared

to treatment 0 is the difference PFE 10 = E (Y |X =1) −E (Y |X =0) between the conditional expecta-
tion values of an outcome variable Y given the two treatment conditions. The (Z=z)-conditional
prima facie effect is the difference PFE Z ; 10 (z) = E (Y |X =1, Z=z) − E (Y |X = 0, Z=z) between the
(X =1, Z=z)-conditional expectation value and the (X =0, Z=z)-conditional expectation value of the
outcome variable Y . The average prima facie effect is the expectation of the (Z=z)-conditional
prima facie effects over the distribution of Z [see Eqs. (1.9) and (1.10)].
⊲ Solution 1-5 In statistical inference we estimate and test hypotheses about parameters charac-
terizing the (joint or marginal) distributions of random variables from sample data. In causal infer-
ence we interpret some of these parameters as causal effects, provided that certain conditions are
satisfied that allow for such a causal interpretation.
⊲ Solution 1-6 E (Y |X =x ) = P(Y =1| X =x ), because, in this example, Y is dichotomous with values
0 and 1. In this case, the term P(Y =1 | X =x ) is defined by E (Y |X =x ) (see RS-Rem. 3.22).
⊲ Solution 1-7 No solution provided. Just compare your results to the true parameters presented in
Table 1.3 and to the conditional probabilities P(Y =1|X =0) and P(Y =1|X =1) presented in section
1.1.3.
⊲ Solution 1-8 The odds ratio is
[P(Y =1|X =1) / (1 − P(Y =1|X =1))] / [P(Y =1|X =0) / (1 − P(Y =1|X =0))] ≈ .483.
Because this number is smaller than 1 it indicates that there is a negative effect of the treatment. The
natural logarithm of the odds ratio is the log odds ratio, which is
ln{ [P(Y =1|X =1) / (1 − P(Y =1|X =1))] / [P(Y =1|X =0) / (1 − P(Y =1|X =0))] } ≈ −0.728.
This number is smaller than 0 indicating that there is a negative effect of the treatment. The log odds
ratio is identical to the logistic regression coefficient λ1 in the equation
P(Y =1|X ) = exp(λ0 + λ1 · X ) / [1 + exp(λ0 + λ1 · X )].
Another closely related parameter is the risk ratio
P(Y =1|X =1) / P(Y =1|X =0) = .7.
Because this number is smaller than 1, it indicates that there is a negative effect of the treatment.
Hence, no matter which of these parameters we use, we would always come to the same (wrong)
conclusion that the treatment is detrimental for our patients.
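These three parameters can also be checked in a few lines of Python (a small sketch; the two conditional probabilities are the ones given in the exercise):

    from math import log

    p1 = .42   # P(Y=1 | X=1), computed in section 1.1.3
    p0 = .60   # P(Y=1 | X=0), computed in section 1.1.3

    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    log_odds_ratio = log(odds_ratio)    # equals the logistic regression coefficient lambda_1
    risk_ratio = p1 / p0

    print(round(odds_ratio, 3))         # 0.483
    print(round(log_odds_ratio, 3))     # -0.728
    print(round(risk_ratio, 3))         # 0.7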
⊲ Solution 1-9 Using the joint probabilities presented in Table 1.4, the definition of the conditional
probability yields
P(Y =1|X =1,U =Joe ) = P(X =1,Y =1,U =Joe ) / P(X =1,U =Joe ) = .016 / (.016 + .004) = .8.
⊲ Solution 1-10 First of all, note that the theorem of total probability (see RS-Th. 1.38) can also
be applied to conditional probabilities (see RS-Th. 1.42). In this exercise, it is applied to the (X =0)-
conditional probabilities P(Y =1|X =0) = PX =0 (Y =1). Hence, according to this theorem,

P(Y =1| X = 0) = P(Y =1|X =0,U =Joe ) · P(U =Joe |X =0) +


P(Y =1|X =0,U =Ann) · P(U =Ann |X =0).

The probabilities P(Y =1|X =0,U =Joe ) = .7 and P(Y =1|X =0,U =Ann) = .2 are computed analo-
gously to Exercise 1-9 and the other two probabilities occurring in this formula are P(U =Joe |X =0) =
.48/.6 and P(U =Ann|X =0) = .12/.6 (see Table 1.4). Hence,
P(Y =1| X =0) = (.7 · .48) / (.48 + .12) + (.2 · .12) / (.48 + .12) = .6.
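The same computation can be written as a short Python sketch (using the probabilities from Table 1.4 that appear in this solution):

    # Theorem of total probability applied to the (X=0)-conditional distribution
    p_y1_joe = .7              # P(Y=1 | X=0, U=Joe)
    p_y1_ann = .2              # P(Y=1 | X=0, U=Ann)
    p_joe_given_x0 = .48 / .6  # P(U=Joe | X=0)
    p_ann_given_x0 = .12 / .6  # P(U=Ann | X=0)

    p_y1_given_x0 = p_y1_joe * p_joe_given_x0 + p_y1_ann * p_ann_given_x0
    print(round(p_y1_given_x0, 2))   # 0.6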

⊲ Solution 1-11 The prima facie effects E (Y |X =1) −E (Y |X =0) and E (Y |X =2) −E (Y |X =0) can be
computed from Table 1.6 as follows:

PFE 10 = E (Y |X =1) −E (Y |X =0) = 100.00 − 111.25 = −11.25

and
PFE 20 = E (Y |X =2) −E (Y |X =0) = 114.25 − 111.25 = 3.00.
⊲ Solution 1-12 The conditional prima facie effects

E (Y |X =1, Z=z) −E (Y |X =0, Z=z) and E (Y |X =2, Z=z) −E (Y |X =0, Z=z)

can be computed from Table 1.7. For low status (Z =low), they are:

PFE Z ;10 (low) = E (Y | X =1, Z =low) −E (Y | X = 0, Z =low) = 100 − 120 = −20

PFE Z ;20 (low) = E (Y | X =2, Z =low) −E (Y | X = 0, Z =low) = 80 − 120 = −40.


For medium status (Z =med), they are:

PFE Z ; 10 (med ) = E (Y |X =1, Z =med ) −E (Y |X = 0, Z =med ) = 100 − 110 = −10

PFE Z ; 20 (med ) = E (Y |X =2, Z =med ) −E (Y |X = 0, Z =med ) = 90 − 110 = −20.


Finally, for high status (Z =hi ), the conditional prima facie effects are:

PFE Z ; 10 (hi ) = E (Y | X =1, Z =hi ) − E (Y | X =0, Z =hi ) = 100 − 60 = 40

PFE Z ;20 (hi ) = E (Y | X =2, Z =hi ) − E (Y | X =0, Z =hi ) = 140 − 60 = 80.

⊲ Solution 1-13 Using the results of the last exercise, the average of the (Z=z)-conditional prima
facie effects can be computed from the conditional effects as follows:
E (PFE Z ;10 (Z )) = PFE Z ;10 (low) · P(Z =low) + PFE Z ;10 (med ) · P(Z =med ) + PFE Z ;10 (hi ) · P(Z =hi )
                 = (−20) · 1/4 + (−10) · 1/2 + 40 · 1/4 = 0,

E (PFE Z ;20 (Z )) = PFE Z ;20 (low) · P(Z =low) + PFE Z ;20 (med ) · P(Z =med ) + PFE Z ;20 (hi ) · P(Z =hi )
                 = (−40) · 1/4 + (−20) · 1/2 + 80 · 1/4 = 0.
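The averaging step is easily mirrored in a short Python sketch (with the conditional effects and status probabilities used above):

    p_z = {"low": 1/4, "med": 1/2, "hi": 1/4}     # P(Z=z)
    pfe_10 = {"low": -20, "med": -10, "hi": 40}   # PFE_Z;10(z)
    pfe_20 = {"low": -40, "med": -20, "hi": 80}   # PFE_Z;20(z)

    avg_10 = sum(pfe_10[z] * p_z[z] for z in p_z)   # E(PFE_Z;10(Z))
    avg_20 = sum(pfe_20[z] * p_z[z] for z in p_z)   # E(PFE_Z;20(Z))
    print(avg_10, avg_20)                           # 0.0 0.0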
⊲ Solution 1-14
P(U =Tom | X =0) = P(U =Tom , X =0) / P(X =0) = [P(X =0 |U =Tom ) · P(U =Tom )] / P(X =0)
                 = [(47/60) · (1/8)] / (1/3) = 47/160.
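Using exact fractions, the same Bayes-type computation can be reproduced as follows (a sketch with Python's fractions module; the numbers are those used above):

    from fractions import Fraction

    p_x0_given_tom = Fraction(47, 60)   # P(X=0 | U=Tom)
    p_tom = Fraction(1, 8)              # P(U=Tom)
    p_x0 = Fraction(1, 3)               # P(X=0)

    p_tom_given_x0 = p_x0_given_tom * p_tom / p_x0
    print(p_tom_given_x0)               # 47/160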
⊲ Solution 1-15 With this change, the prima facie effect changes to E (Y |X =1) −E (Y |X =0) = .6 −
.45 = .15, which is the average of the two individual total effects, which still are .10 for Joe and .20 for
Ann. Note that identical treatment probabilities P(X =1|U =u ) for all persons u is what we create by
randomly assigning a person to treatment 1 in a randomized experiment.
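A short Python sketch reproduces this solution. It fixes P(X=1|U=u) = 2/5 for both persons and keeps the conditional outcome probabilities P(Y=1|X=x, U=u) implied by Solutions 1-9, 1-10, and 1-15; with equal treatment probabilities, the prima facie effect coincides with the average of the individual effects:

    p_u = {"Joe": .5, "Ann": .5}              # P(U=u)
    p_x1 = {"Joe": 2/5, "Ann": 2/5}           # P(X=1 | U=u), changed as in the exercise
    p_y1 = {                                  # P(Y=1 | X=x, U=u)
        ("Joe", 1): .8, ("Joe", 0): .7,
        ("Ann", 1): .4, ("Ann", 0): .2,
    }

    def e_y_given_x(x):
        """E(Y|X=x) = sum over u of P(Y=1|X=x,U=u) * P(U=u|X=x)."""
        w = {u: (p_x1[u] if x == 1 else 1 - p_x1[u]) * p_u[u] for u in p_u}  # P(X=x, U=u)
        p_x = sum(w.values())                                                # P(X=x)
        return sum(p_y1[(u, x)] * w[u] / p_x for u in p_u)

    pfe_10 = e_y_given_x(1) - e_y_given_x(0)
    avg_ite = sum((p_y1[(u, 1)] - p_y1[(u, 0)]) * p_u[u] for u in p_u)
    print(round(pfe_10, 2), round(avg_ite, 2))    # 0.15 0.15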

⊲ Solution 1-16 Rewriting RS-Box 3.2 (ii) for our example,


E (Y |X =x ) = Σu E (Y |X =x ,U =u ) · P(U =u |X =x ).

Hence,

E (Y |X =0) = Σu E (Y |X =0,U =u ) · P(U =u |X =0)
            = 120 · 47/160 + 120 · 33/160 + ... + 60 · 8/160 = 111.25,

E (Y |X =1) = Σu E (Y |X =1,U =u ) · P(U =u |X =1)
            = 100 · 10/160 + 100 · 18/160 + ... + 100 · 16/160 = 100,

and

E (Y |X =2) = Σu E (Y |X =2,U =u ) · P(U =u |X =2)
            = 80 · 3/160 + 80 · 9/160 + ... + 140 · 36/160 = 114.25.
⊲ Solution 1-17 No solution provided. Just compare your results to the parameters presented in
Table 1.7.
Chapter 2
Some Typical Kinds of Random Experiments

In chapter 1 we have seen that comparing conditional expectation values of an outcome


variable between treatment groups can be completely misleading if used for the evalua-
tion of treatment effects. In this chapter we continue preparing the stage for the theory
of causal total effects and dependencies, describing the kind of empirical phenomena it
refers to: single-unit trials of experiments or quasi-experiments, but also single-unit trials
of observational studies in which causal total effects and dependencies can be investi-
gated. First examples of such a single-unit trial have already been treated in sections 1.1
and 1.2 (see in particular Tables 1.2 and 1.5).
A single-unit trial is a specific random experiment. Note the distinction between a ran-
dom experiment and a randomized experiment. Stochastic dependencies between events
and between random variables always refer to a random experiment, but not necessarily
to a randomized experiment in which the assignment of a subject to one of the treatment
conditions is determined exclusively by a random procedure such as flipping a coin. In
contrast, a random experiment is the concrete empirical phenomenon to which stochas-
tic dependencies between events and random variables (described by conditional distri-
butions, probabilities, correlations, and conditional expectations) refer to.
The single-unit trial is not the sample dealt with in statistical models. In a sample, we
consider repeating the single-unit trial many times in one way or another. This is neces-
sary if we want to deal with estimation of parameters and tests of hypotheses about these
parameters, some of which might be causal effects. The single-unit trial does not allow
treating problems of parameter estimation or hypothesis testing. However, it is sufficient
for defining causal effects and studying how to identify them, that is, investigating under
which conditions and how they can be computed from empirically estimable parameters.
A single-unit trial is also what we refer to in hypotheses and theories of the empiri-
cal sciences. In many textbooks on applied statistics the dazzling term ‘population’ is
used instead, obfuscating what we are actually talking about when we use probabilistic
terms such as expectation, variance, covariance, correlation, regression, etc. Furthermore,
single-unit trials are what is of interest in practical work. How does the treatment of a pa-
tient affect the outcome of this patient if compared to another possible treatment? What is
the treatment effect for a male, and what is its effect for a female? Which variables explain
inter-individual differences in individual causal total effects? All these practical questions
are raised using concepts referring to a single-unit trial.

Overview

We start with the single-unit trial of simple experiments and then treat increasingly more
complex ones introducing additional design features. Specifically, we will introduce the
single-unit trials of experiments and quasi-experiments with fallible covariates, a multi-

factorial design with more than one treatment, multilevel experiments and quasi-experi-
ments, and experiments and quasi-experiments with latent covariates and/or outcome
variables.
We also discuss different kinds of random variables that will play a crucial role in the
chapters to come. Among these random variables are the observational-unit variable or
person variable, manifest and latent covariates, treatment variables, as well as manifest
and latent outcome variables. In this chapter, we confine ourselves to an informal descrip-
tion of single-unit trials and the random variables involved, preparing the stage for their
mathematical representations in the subsequent chapters.

2.1 Simple Experiments

As a first class of random experiments we consider the single-unit trials of simple exper-
iments and quasi-experiments. Such single-unit trials are experiments and quasi-experi-
ments in which no fallible covariates are assessed, that is, no covariates whose values con-
sist in part of a measurement error. Such a single-unit trial consists of:
(a) sampling an observational unit u (e. g., a person) from a set of units,
(b) assigning the unit or observing its assignment to one of several experimental con-
ditions (represented by the value x of the treatment variable X ),
(c) recording the value y of the outcome (or response) variable Y .
Figure 2.1 displays a tree representation of the set of possible outcomes of this single-
unit trial. Note that this is the kind of random experiment we considered in the Joe-Ann
example presented in section 1.1 and in the two-factorial design example treated in sec-
tion 1.2. The random variables X (treatment), Y (success), and Z (status), the conditional
expectation values E (Y |X =x ) and E (Y |X =x ,Z =z), as well as the probabilities P (X =x ),
P (Z =z), P (X =x , Z =z) all referred to such a single-unit trial. Of course, all these condi-
tional expectation values and probabilities are unknown in empirical applications. Never-
theless, they are among the parameters that determine the outcome of a single-unit trial,
just in the same way as the probability of heads determines the outcome of flipping a coin.
In order to illustrate this point, imagine flipping a deformed coin that has the shape of a
Chinese wok, and suppose that in this case the probability of flipping heads is .8 instead of
.5. Although this probability does not deterministically determine the outcome of flipping
the coin, it stochastically determines the outcome.
In fact, we may consider the single-unit trial of (a) sampling a coin u from a set of coins,
(b) forming (X =1) or not forming (X =0) a wok out of it, and (c) observing whether (Y =1)
or not (Y = 0) we flip heads. In this single-unit trial, the difference .8 − .5 = .3 would be the
causal total effect of the treatment variable X on the outcome variable Y . Note that the
probabilities .8 and .5 and their difference .3 refer to this single-unit trial, although these
probabilities can only be estimated if we conduct many of these single-unit trials, that is,
if we draw a data sample. However, if these probabilities were known, we could dispense
with a sample (including the data that would result from drawing it), and still have a per-
fect theory and prediction for the outcome of such a single-unit trial (see Exercise 2-1).
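A few lines of Python make the phrase ‘stochastically determines’ concrete: simulating many independent repetitions of this (hypothetical) single-unit trial, the difference between the relative frequencies of heads under the two conditions approaches the causal total effect of .3. The sketch collapses step (a) of the trial by assuming that all coins in the set behave alike.

    import random

    random.seed(1)
    P_HEADS = {1: .8, 0: .5}   # P(Y=1 | X=x): wok-shaped coin (x=1) vs. intact coin (x=0)
    P_TREAT = .5               # P(X=1): probability of forming a wok out of the sampled coin

    counts = {0: [0, 0], 1: [0, 0]}        # per condition: [number of trials, number of heads]
    for _ in range(100_000):               # many independent single-unit trials
        x = 1 if random.random() < P_TREAT else 0
        y = 1 if random.random() < P_HEADS[x] else 0
        counts[x][0] += 1
        counts[x][1] += y

    freq = {x: counts[x][1] / counts[x][0] for x in counts}
    print(freq[1] - freq[0])   # close to .3, the causal total effect in this single-unit trial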

[Tree diagram omitted: each observational unit u1, u2, . . . branches into the experimental conditions (control, treatment), and each condition branches into the possible values y1, y2, . . . of the outcome variable Y.]

Figure 2.1. A simple experiment or quasi-experiment

Sampling a Unit

The first part of this single-unit trial consists of sampling an observational unit. In the so-
cial sciences, units often are persons, but they might be groups, school classes, schools
and even countries. Usually such units change over time. Therefore, it should be em-
phasized that, in simple experiments and quasi-experiments, we are talking about the
units at the onset of treatment. Later we will see that we have to distinguish between units
at the onset of treatment and units at the time of assessment of the outcome variable, which
might be months or even years later (for details see Steyer, Mayer, Geiser, & Cole, 2015).
In a single-unit trial of simple experiments and quasi-experiments, the units can be rep-
resented by the observational-unit variable U , whose possible values u are the units at the
onset of treatment.
Note that the unit at the onset of treatment also comprises his or her experiences a year
and/or the day before treatment, as well as the psycho-bio-social situation in which he
or she is at the onset of treatment. Both, the experiences and the situation, already hap-
pen before the onset of treatment (see again Steyer et al., 2015 for more details). Therefore,
they are attributes of the observational units u. They can be treated in the same way as
other attributes such as sex and educational status. However, if these attributes are actu-
ally assessed and if this assessment is fallible, then we have to distinguish between these
attributes and their fallible assessments (see sect. 2.2).

Treatment Variable

In an experiment or quasi-experiment, there is always a treatment variable, which we fo-


cus as a cause and usually denote it by X . (We use the term (putative) cause for a random
variable if we consider its causal effect on an outcome variable; note that a causal effect can
also be 0.) In a true experiment, the unit drawn is assigned
to one of the possible treatments with a probability that is fixed by the experimenter. In
contrast, in a quasi-experiment we just observe selection (e. g., self-selection or selec-
tion by someone else) to one of the treatment conditions. In the simplest case there are at
least two treatment conditions, for example, treatment and control. These treatment con-
ditions are the possible values of the treatment variable X . For simplicity, we use the values
0, 1, . . . , J to represent J +1 treatment conditions. Furthermore, unless stated otherwise, we
presume that treatment assignment and actual exposure to treatment are equivalent, that
is, we assume that there is perfect compliance.
Selection of a unit into one of the treatment conditions x may happen with unknown
probabilities. This is the case, for example, in self-selection or assignment by an unknown
physician. In this case we talk about a quasi-experiment. However, assignment can also
be done with known probabilities that are equal for different units such as in the sim-
ple randomized experiment or with known probabilities that may be unequal for different
units such as in the conditionally randomized experiment . In this case, these treatment
probabilities may also depend on a covariate Z representing pre-treatment attributes of
the units. As mentioned above, conditional and unconditional randomized assignment,
distinguish the true experiment from the quasi-experiment, in which the assignment prob-
abilities are unknown. (See Remarks 8.58 and 8.59 for more details on randomization and
conditional randomization.)

Potential Confounders and Covariates

In simple experiments and quasi-experiments, the focus is usually on total treatment ef-
fects on an outcome variable. Hence, if we are interested in the treatment variable as a
cause, then each attribute of the observational units is a potential confounder. Examples
are sex, race, educational status, and socio-economic status. Once the unit is drawn, its sex,
race, educational status, and socio-economic status are fixed. This means that there is no
additional sampling process associated with assessing these potential confounders. This
is also the reason why they do not appear in points (a) to (c) describing the single-unit
trial.
A potential confounder is also called a covariate if it is actually assessed and used to-
gether with X in a conditional expectation or a conditional distribution. Note that a po-
tential confounder can also be unobserved, and in this case we usually do not call it a
covariate.
Because potential confounders represent attributes of the unit at the onset of treatment
they can never be affected by the treatment. However, there can be (stochastic) dependen-
cies between the treatment variable and potential confounders. In the Joe-Ann example
treated in sect. 1.1, for instance, there is a stochastic dependence between the treatment
variable X and the person variable U . Similarly, in the second example presented in sec-
tion 1.2 there is a stochastic dependence between Z (status) and the treatment variable
X.

Multidimensional Potential Confounders and Covariates

Potential confounders – and therefore also covariates – may be uni- or multi-dimensional,


qualitative (such as Z 1 :=sex and Z 2 := educational status ) or quantitative (such as Z 3 :=
height and Z 4 :=body mass index). If it is a multivariate variable made up of several

uni-dimensional variables, it may consist of qualitative and quantitative potential con-


founders such as Z 5 = (Z 1 , Z 4 ).

Specific Potential Confounders

Note that the observational-unit variable U and the U -conditional treatment probability
P (X =x |U ) (see RS-Def. 4.4) are potential confounders as well. The values of P (X =x |U )
are the conditional probabilities P (X =x |U =u ), which are attributes of the persons u [see
Eq. (1.1)]. Similarly, the Z -conditional treatment probability P (X =x | Z ) is also a potential
confounder provided that Z is a covariate [see Def. 4.11 (iv) and Rem. 4.16]. Furthermore,
the assignment to treatment x with values ‘yes’ and ‘no’ is also a potential confounder if
assignment to treatment and exposure to treatment (again with values ‘yes’ and ‘no’) are
not identical and exposure to treatment is focused as a (putative) cause. This distinction
is useful in experiments with non-compliance (see, e. g., Jo, 2002a, 2002b, 2002c; Jo et al.,
2008).

Unobserved Potential Confounders

Even if we consider a multivariate potential confounder Z consisting of several univariate


potential confounders, there are always unobserved variables that are prior or simultane-
ous to treatment. Such variables are called unobserved potential confounders. Sometimes
they are also called hidden confounders (cf., e. g., Rosenbaum, 2002a). Of course such an
unobserved potential confounder may bias the conditional expectation values of the out-
come variable just in the same way as an observed covariate. Whether or not the condi-
tional expectation values of the outcome variable in the treatment conditions are unbiased
such that their differences represent causal total effects does not only depend on the rela-
tionship between the observed variables such as X , Y , and the observed (possible multi-
variate) covariate, say Z , but also on the relationship of these variables to the unobserved
potential confounders. Potential confounders exert their maleficent effects irrespective of
whether or not we observe them.

Outcome Variable

Of course, the outcome variable Y refers to a time at which the treatment might have had
its impact. Hence, treatment variables are always prior to the outcome variable. In prin-
ciple, we may also observe several outcome variables, for example, in order to study how
the effects of a treatment grow or decline over time or to study treatment effects that are
not confined to a single outcome variable. All random variables mentioned above refer to
a concrete single-unit trial and they have a joint distribution. Each combination of unit,
treatment condition, and score of the outcome variable may be an observed result of such
a single-unit trial. This implies that the variables U , Z , X , and Y , as well as unobserved
potential confounders, say W , have a joint distribution (see RS-Def. 2.38 and SN-section
5.3). Once we specified the random experiment to be studied, this joint distribution is
fixed, even though it might be known only in parts or even be unknown altogether.

Causal Effects and Causal Dependencies

There are already several kinds of causal effects that can be considered in the single-unit
trial of a simple experiment or quasi-experiment. For simplicity, suppose the treatment
has just two values, say treatment and control. First, there is the causal average total effect
of treatment (compared to control) on the outcome variable Y . Second, there are the
causal conditional total treatment effects on Y , where we may condition on any function
of the observational-unit variable U . If, for example, Z := sex with values m for male and
f for female, then we may consider the causal (Z =m)-conditional total treatment effect
on Y , that is, the causal average total treatment effect for males, and the causal (Z = f )-
conditional total treatment effect on Y , that is, the causal average total treatment effect
for females. Similarly, if Z := socio-economical status, we may consider the causal condi-
tional total treatment effects on Y for each status group, etc. Third, although difficult and
often impossible to estimate, we may also consider the causal individual total effect of
treatment compared to control on Y .
By definition, within a simple experiment and quasi-experiment we cannot consider
any direct treatment effects with respect to one or more specified potential mediators, that
is, the effects of the treatment on the outcome variable that are not transmitted through
specified potential mediators. However, the causal total treatment effects discussed above
are, of course, transmitted through potential mediators, irrespective of whether or not we
observe (or are aware of) these potential mediators.

2.2 Experiments With Fallible Covariates

Another class of random experiments are single-unit trials of experiments and quasi-
experiments in which we assess a fallible covariate. In this case, the fallible covariate does
not represent a (deterministic) attribute of the observational units. The single-unit trial of
such an experiment or quasi-experiment consists of:
(a) sampling an observational unit u (e. g., a person) from a set of units,
(b) assessing the values z1 , . . . , zk of the covariates (pre-treatment variables) Z 1 , . . . , Z k ,
k ≥ 1.
(c) assigning the unit or observing its assignment to one of several experimental con-
ditions (represented by the value x of the treatment variable X ),
(d) recording the value y of the outcome variable Y .
The crucial distinction between a simple (quasi-) experiment and a (quasi-) experiment
with fallible covariates is that there is variability of at least one of the covariates given the
observational unit u (see Fig. 2.2). In this case, we may distinguish between the latent
covariate, say ξ, representing the attribute to be assessed and its fallible measures, some
manifest variables that can actually be observed. (For the theory of latent variables see
Steyer et al., 2015). Also note that sometimes it is crucial to adjust the effect of X on Y by
conditioning on the latent variable ξ in order to fully adjust for the bias of the prima facie
effect of X on Y . In these cases, only adjusting for the manifest variables that measure the
latent covariate ξ does not completely remove bias (see, e. g., Sengewald, Steiner, & Pohl,
2019).
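A small simulation sketch illustrates this point. The functional forms and parameter values below are assumed for illustration only (they are not taken from the cited study): the treatment probability and the outcome both depend on a latent covariate ξ, the manifest covariate Z measures ξ with error, and a linear regression adjustment for Z removes only part of the confounding, whereas adjusting for ξ itself recovers the true effect of .5.

    import numpy as np

    rng = np.random.default_rng(1)
    n, true_effect = 200_000, 0.5

    xi = rng.normal(size=n)                          # latent covariate (confounder)
    x = rng.binomial(1, 1 / (1 + np.exp(-xi)))       # treatment probability depends on xi
    y = true_effect * x + xi + rng.normal(size=n)    # outcome depends on treatment and on xi
    z = xi + rng.normal(size=n)                      # fallible manifest measure of xi

    def x_coefficient(*covariates):
        """Least-squares coefficient of X when regressing Y on X and the given covariates."""
        design = np.column_stack([np.ones(n), x, *covariates])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return beta[1]

    print(x_coefficient())     # no adjustment: clearly larger than 0.5 (confounded)
    print(x_coefficient(z))    # adjusting for the fallible Z: bias reduced, but not removed
    print(x_coefficient(xi))   # adjusting for the latent xi: close to 0.5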
Furthermore, this distinction also implies that the unit whose attributes are measured
at the time when the potential confounder is assessed is not identical any more to the unit

[Tree diagram omitted: each observational unit u1, u2, . . . branches into the possible values z1, z2, . . . of the fallible covariate, each of these branches into the experimental conditions (control, treatment), and each condition branches into the possible values y1, y2, . . . of the outcome variable Y.]

Figure 2.2. Experiment or quasi-experiment with a fallible covariate

at the onset of treatment (see section 2.1). The covariate might be assessed some months
before the treatment is given — enough time and plenty of possibilities for the unit to
change in various ways, for example, due to maturation, learning, critical life events, and
other experiences that are not fixed yet at the time of assessing the covariate. As a con-
sequence, a variable, say W , representing such intermediate events or experiences may
also affect the outcome variable Y and the treatment variable. Hence, such intermediate
variables are also potential confounders. This is one of the reasons why we need to define
causal effects in a more general way than in the Neyman-Rubin tradition (see ch. 5).
Note that assessing a fallible covariate does not only change the interpretation of the
observational-unit variable U (now its values are the units at the time of assessment of the
manifest covariates), but it also changes the random experiment, and with it, the empiri-
cal phenomenon we are considering. Assessing a fallible covariate often involves that the
sampled person fills in a questionnaire or takes a test. Assessing, prior to treatment, a fal-

lible covariate such as a test of an ability, an attitude, or a personality trait, may change the
observational units and their attributes, as well as the effects of the treatment on a spec-
ified outcome variable, which usually is related to such pre-treatment variables. This has
already been discussed by Campbell and Stanley (1963), who also recommended designs
for studying how pre-treatment assessment modifies the effects of the treatment variable
on the outcome variable.

Potential Confounders and Covariates

Which are the potential confounders of the treatment variable X in such a single-unit trial?
First of all, it is each attribute of the units at the time of the assessment of the observed
covariates. This does not only include variables such as sex, race, and educational status,
but also a latent covariate, say ξ, (which might be multi-dimensional). Furthermore, aside
from the manifest covariates, each variable W representing an intermediate event or ex-
perience of the unit (occurring in between the assessment of the observed covariates and
the onset of the treatment), as well as any attribute of the unit at the onset of treatment is
a potential confounder as well, irrespective of whether or not these potential confounders
are observed.
Note that a latent covariate ξ may be considered a cause of its fallible measures Z 1 , . . . , Z k
and of the outcome variable Y . This is not in conflict with the theory that the treatment
variable X is a cause of Y as well. In this kind of single-unit trial, we have several causes and
several outcome variables, and a cause itself can be considered as an outcome variable. For
example, it would be possible to consider the treatment variable X to be causally depen-
dent on the manifest or latent covariates. In other words, we may also raise the question
if the conditional treatment probabilities P (X =1 | Z 1 , . . . , Z k ) or P (X =1 | ξ ) describe causal
dependencies. This makes clear that the terms ‘potential confounder’ and ‘covariate’ can
only be defined with respect to a focused cause.

2.3 Two-Factorial Experiments

As a third class of random experiments we consider two-factorial experiments. The single-


unit trial of such a two-factorial experiment or quasi-experiment consists of:
(a) sampling an observational unit u (e. g., a person) from a set of units,
(b) assigning the unit or observing its assignment to one of several experimental con-
ditions that are defined by the pair (x, z) of levels of two treatment variables X and
Z , respectively.
(c) recording the value y of the outcome variable Y .

Sampling a Unit

Because we presume that no fallible potential confounders such as ‘severity of symptoms’,


‘motivation for treatment’, etc. are assessed before treatment, sampling an observational
unit means that we are sampling a unit at the onset of treatment.

Treatment Variables

As a simple example, let us consider an experiment in which we study the effects — in-
cluding the joint effects — of two treatment factors, say individual therapy represented by
X (with values ‘yes’ and ‘no’) and group therapy represented by Z (with values ‘yes’ and
‘no’).
In such a two-factorial experiment, we may consider group therapy as a covariate and
individual therapy to be the cause in order to ask for the conditional and average total
effects of individual therapy given group therapy and given no group therapy. In contrast,
we may also consider individual therapy to be a covariate and group therapy to be the
focused treatment variable. Finally, we may also consider the two-dimensional variable
(X , Z ) as the cause. Which option is chosen depends on the causal effects we are interested
in (see below).

Outcome Variable

Again, the outcome variable Y refers to a time at which the treatment might have exerted
the effects to be estimated. Hence, both treatment variables are prior to the outcome vari-
able considered. And again, we may also observe several outcome variables, for example,
in order to study how effects of a treatment grow or decline over time or to study effects
that are not confined to a single outcome variable.

Causal Effects

There are several causal effects we might look at. If X and Z have only two values, then we
may be interested in the following effects on the outcome variable Y :

(a1 ) the conditional total effect of ‘individual therapy’ as compared to ‘no individual ther-
apy’ given that the unit treated also receives ‘group therapy’,
(b 1 ) the corresponding conditional total effect given that the unit does not receive ‘group
therapy’, and
(c 1 ) the average of these conditional total effects of ‘individual therapy’ as compared to
‘no individual therapy’, averaging over the two values of Z (group therapy).

Vice versa, we might also be interested in the following effects on the outcome variable Y :

(a2) the conditional total effect of ‘group therapy’ as compared to ‘no group therapy’
given that the unit treated also receives ‘individual therapy’,
(b 2) the corresponding conditional total effect given that the unit does not receive ‘indi-
vidual therapy’, and
(c 2 ) the average of these conditional total effects of ‘group therapy’ as compared to ‘no
group therapy’, averaging over the two values of X .

Furthermore, there are other causal effects on Y we might study, namely

(a3 ) the total effect of receiving ‘individual therapy’ and ‘group therapy’ as compared to
receiving neither of the two treatments.
(b 3 ) the total effect of receiving ‘individual therapy’ and ‘no group therapy’ as compared
to receiving ‘group therapy’ and ‘no individual therapy’.

All these effects may answer meaningful causal questions. In fact there are even more
causal effects than those listed above. For example, we could compare each of the four
combinations of the two treatments to an average of the other treatments. Furthermore,
many additional causal effects can be considered if we condition on other covariates such
as sex or educational status.

Potential Confounders and Covariates

If we focus on the effect of X (individual therapy), then we consider Z (group therapy)


as a covariate of X . In contrast, we treat X as a covariate of Z if we study the effects of
Z (group therapy). Furthermore, in both cases, each attribute of the unit at the onset of
treatment (such as sex or educational status) could be considered as covariates as well.
Assessing these covariates does not appear in points (a) to (c) of the random experiment,
because these covariates are (deterministic) functions of the observational-unit variable.
Therefore, there is no additional sampling process associated with their assessment.
This is also true for other potential confounders, for example, variables characterizing
the situation in which the unit is at the onset of treatment, the number of hours slept last
night, or day time at which the unit receives its treatment. Even variables that characterize
early experiences in the childhood of the unit such as a broken home or mother’s child care
behavior are potential confounders in this single-unit trial. They are there and exert their
effects even if they are not assessed.
Note again that assessment of these potential confounders in a questionnaire filled in
by the person constitutes a new random experiment that may differ in important ways
from a random experiment in which the unit has no such task (see sect. 2.2). In psychology,
an assessment often is a treatment of its own.

2.4 Multilevel Experiments

In multilevel experiments and quasi-experiments we also study the effect of a treatment


on an outcome variable. However, in such a design the observational units are nested
within higher hierarchical units referred to as clusters. Examples include experiments, in
which students are nested within classrooms, patients are nested within clinics, and in-
habitants are nested in cities and neighborhoods. Multilevel designs can be classified as
designs with treatment assignment at the unit-level or at the cluster-level. Furthermore,
multilevel designs differ with respect to the assignment of units to clusters. There are de-
signs with pre-existing clusters and there are designs with assignment of units to clusters.
All these designs involve different single-unit trials.
A single-unit trial with pre-existing clusters consists of:
(a) sampling a cluster c (e. g., a school class, a neighborhood, or a hospital) from a set
of clusters,
(b) sampling an observational unit u (e. g., a person) from a set of units within the clus-
ter,
(c) assigning the unit or the cluster (depending on the design) or observing their as-
signment to one of several experimental conditions (represented by the value x of
the treatment variable X ),
(d) recording the value y of the outcome variable Y .

In contrast, a single-unit trial with assignment of units to clusters consists of:


(a) sampling an observational unit u (e. g., a person) from a set of units,
(b) assigning the unit or observing its assignment to one of several clusters (repre-
sented by the value c of the cluster variable C ),
(c) assigning the unit or the cluster (depending on the design) or observing their as-
signment to one of several experimental conditions (represented by the value x of
the treatment variable X ),
(d) recording the value y of the outcome variable Y .
In the experiment with pre-existing clusters, each unit can only appear in one cluster,
whereas in the experiment with assignment of units to a cluster, each unit can be assigned
to any cluster. Note again that we are considering single-unit trials from the pre-factual
perspective, not from the post-factual or ‘counter-factual’ perspective (see the remarks fol-
lowing the description of the random experiment presented in Table 1.1). Hence, in exper-
iments with assignment of units to a cluster, the cluster variable can bias the dependency
of the outcome variable on the treatment variable on the level of the observational unit. In
this aspect this design resembles the multifactorial design described in section 2.3.

Potential Confounders and Covariates

Which are the potential confounders in multilevel designs if the treatment variable X is
considered as the cause? The answer depends on the type of design considered: In de-
signs with assignment of units to clusters, attributes of the observational unit such as sex,
race, or educational status, are potential confounders of X . Other potential confounders
are attributes of the cluster such as school type, hospital ownership, or school-level socio-
economic status or school-level intelligence. The last two kinds of potential confounders
would be defined as conditional expectations of the corresponding potential confounders
at the unit-level given the cluster variable.
In these designs, clusters may not only be considered as potential confounders, but also
as treatments, because some of the effects observed later on may depend on the compo-
sition of the group to which a particular unit, say Joe, is assigned. Receiving group therapy
together with beautiful Ann in the same group might make a great difference as compared
to getting it together with awful Joe. In designs in which clusters as a whole are assigned to
treatment conditions, only attributes of the cluster can influence the assignment. Hence,
in data analysis we would focus on controlling for the potential confounders on the cluster
level (see, e. g., Nagengast, 2009, for more details).

2.5 Experiments With Latent Outcome Variables

We may also consider single-unit trials of experiments with a latent outcome variable. The
basic goal of such experiments is to investigate the effect of the treatment variable X on a
latent outcome variable, say η. This is of interest, for example, where a quantitative out-
come variable can only be measured by qualitative observations such as solving or not
solving certain items indicating the (latent) ability. However, it can also be of interest if the
manifest measures are linearly related to the latent variable such as in models of classical
test theory (see, e. g., Steyer, 2001) or in models of latent state-trait theory (see, e. g., Steyer

et al., 2015). If, for example, there are three manifest variables Y1 , Y2, and Y3 measuring a
single latent variable η, then we may ask if there is just one single effect of the treatment
on the latent outcome variable η – which transmits these effects to the manifest variables
Y1 , Y2, and Y3 – instead of three separate effects of X on each variable Yi . Hence, the latent
variable may also be considered to be a mediator variable. Showing that all effects of X on
the variables Yi are indirect, that is, mediated by η is one of the research efforts that aims
at establishing construct validity of the latent variable η.

In the simplest case with a single latent variable, we consider the following single-unit
trial:

(a) Sampling a person u0 out of a set of persons,

(b) assigning the unit or observing its assignment to one of several experimental con-
ditions (represented by the value x of the treatment variable X ),

(c) recording the values y 1 , . . . , y m of the manifest outcome variables Y1 , . . . , Ym .

In this single-unit trial, a value u1 of the observational-unit variable U 1 represents the


observational unit at the onset of treatment, while the latent outcome variable η represents
some attribute of the same unit at the time point at which the outcome of the treatment
is assessed. In such a design we have to distinguish between u1 , a unit at time 1, and u2 ,
the same unit at time 2 (see Steyer et al., 2015, for more details). Obviously, time 2 is after
treatment and prior to the observation of the manifest outcome variables Yi , at least as
long as we preclude change in the latent variable during the process of assessing the man-
ifest outcome variables. If this cannot be precluded, we would have to consider the time
sequence in assessing the manifest outcome variables (e. g., of the items to be solved or
answered) as well.

Potential Confounders and Covariates

Which are the potential confounders in such a single-unit trial? Again, the answer depends
on the cause considered. If it is the treatment variable X , then each attribute of the unit at
the onset of treatment is a potential confounder (with respect to X ). Obviously, this again
includes variables such as sex, race, and educational status. Note that in this kind of exper-
iments, the set of potential confounders of X is the same irrespective of the choice of the
outcome variable. Remember, we may not only consider the latent outcome variable η but
also the manifest outcome variables Yi , for example, in order to study whether or not the
effects of X on these manifest outcome variables are perfectly transmitted (or mediated)
through the latent variable η.

Choosing the latent outcome variable η as a cause of the manifest outcome variables
Yi brings additional potential confounders into play, for instance, all those variables that
are in between treatment and the assessment of η. If, for example, we consider an exper-
iment studying the effects of different teaching methods, these additional potential con-
founders are critical life events (such as father or mother leaving the family), or additional
lessons taken after treatment and before outcome assessment, for instance.

Box 2.1 Glossary of new concepts

Note that all terms mentioned in this box are still of an informal nature. Their mathematical
specification starts in chapter 3.
Random experiment The kind of empirical phenomenon to which events, random
variables, and their dependencies refer.
Randomized experiment A random experiment in which the experimenter fixes the treat-
ment probabilities for each observational unit.
Single-unit trial A particular kind of random experiment that consists of sampling
a single unit from a set of observational units and observing the
values of one or more random variables related to this unit.
Cause A random variable. Its effect on an outcome variable is consid-
ered.
Outcome variable A random variable. Its dependency on a cause is considered.
Potential confounder If we confine the discussion to total causal effects, then it is a ran-
dom variable that is prior or simultaneous to the cause consid-
ered. It might be correlated with the cause and the outcome vari-
able.
Covariate A potential confounder that is considered together with X in a
conditional expectation or a conditional distribution.
Fallible covariate A covariate that is assessed with measurement error.
Latent covariate A covariate that is not directly observed. Instead it is defined us-
ing some parameters of the joint distribution of a set of manifest
random variables.
Intermediate variable A variable that might mediate (transmit) the effect of the cause on
the outcome variable. The cause is always prior to a potential
mediator and a potential mediator is always prior to the out-
come variable. A potential mediator is not necessarily affected
by the cause and it does not necessarily have an effect on the out-
come variable.
Mediator An intermediate variable on which X has a causal effect and
which itself has a causal effect on the outcome variable Y .

2.6 Summary and Conclusion

In this chapter we described a number of random experiments in informal terms. The


purpose was to get a first idea of which kind of empirical phenomena causal theories and
hypotheses refer to. We focused on single-unit trials, which are the kinds of empirical phe-
nomena we are interested in, both in theory and practice. We emphasized that a single-
unit trial is a random experiment and discussed several kinds of random variables playing
a crucial role in the theory of causal effects. We also mentioned that there is a certain time
order among these random variables, for example, saying that the potential confounders

are ‘prior’ or ‘simultaneous’ to the treatment variable, which itself is ‘prior’ to the outcome
variable. Furthermore, for each single-unit trial and each cause in such a single-unit trial,
we discussed the potential confounders involved. We emphasized that each cause consid-
ered in such a single-unit trial has its own set of potential confounders.

Other Single-Unit Trials

The single-unit trials discussed in this chapter are just a small selection of single-unit tri-
als in which causal effects and causality of stochastic dependencies are of interest. We
might also consider single-unit trials with latent covariates and latent outcome variables
and manifest and/or latent potential mediators, but also single-unit trials with multiple
mediation. Furthermore, we could also consider single-unit trials of growth curve models
(see, e. g., Biesanz, Deeb-Sossa, Aubrecht, Bollen, & Curran, 2004; Bollen & Curran, 2006;
Meredith & Tisak, 1990; Singer & Willett, 2003; Tisak & Tisak, 2000), latent change mod-
els (see, e. g., McArdle, 2001; Steyer, Eid, & Schwenkmezger, 1997; Steyer, 2005), or cross-
lagged panel models (see, e. g., Kenny, 1975; Rogosa, 1980; Watkins, Lei, & Canivez, 2007;
Wolf, Chandler, & Spies, 1981). Causality is also an issue in uni- and multivariate time-
series analysis as well as in stochastic processes with continuous time. However, in this
book our examples will usually deal with experiments and quasi-experiments, including
latent covariates and outcome variables.

Outlook

In chapter 3 we will study additional mathematical concepts that allow us to meaningfully


talk about time order between events and random variables. This will be used in chapter
4, in which we introduce the notions of a causality space and the concept of a potential
confounder. This will provide the mathematical framework and language in which causal
effects can meaningfully be defined (see ch. 5).

2.7 Exercises

⊲ Exercise 2-1 Imagine that the probabilities of a crash for a flight with Airline A is ten times smaller
than with Airline B. Which airline would you choose?

⊲ Exercise 2-2 Why does the theory of causal effects refer to single-unit trials?

⊲ Exercise 2-3 Why is it important to know which random experiment we are talking about?

⊲ Exercise 2-4 Which type of random experiment did we refer to in the two examples described in
chapter 1?

⊲ Exercise 2-5 Why is it important to emphasize that, in simple experiments and quasi-experiments
(see section 2.1), the observational-unit variable U represents the observational units at the onset of
treatment ?

⊲ Exercise 2-6 What is the basic idea of a potential confounder of a cause?

⊲ Exercise 2-7 Which kinds of causal effects can be considered in the simple experiment or quasi-
experiment in which no fallible potential confounder and no potential mediator is assessed?

Solutions

⊲ Solution 2-1 If your answer is A, then you implicitly apply these probabilities to the random ex-
periment of flying once with A or B, even if these probabilities have been estimated in a sample. This
example serves to emphasize that, not only in theory but also in practice, we are mainly interested
in a single-unit trial, not in a sample consisting of many such single-unit trials, and in particular not
in what applies to sample size going to infinity. (This is how many applied statisticians try to specify
the term ‘population’.)
⊲ Solution 2-2 Within such a single-unit trial, the various concepts of causal effects can be defined
and we can study how to identify these causal effects from the parameters describing the joint dis-
tribution of the random variables considered. In such a single-unit trial, there usually is a clear time
order which helps (but is not sufficient) to disentangle the possible causal relationships between the
random variables considered.
⊲ Solution 2-3 Different random experiments are different empirical phenomena. Although the
names of the variables in different random experiments might be the same, the variables themselves
are different entities, implying that the dependencies and effects between these variables might dif-
fer between different random experiments.
⊲ Solution 2-4 The type of random experiment we refer to in these examples is the single-unit trial
of simple experiments and quasi-experiments described in section 2.1, because there is no extra
sampling of a covariate. Instead, the value of this covariate is fixed as soon as the person is sampled.
That is, the covariate is an attribute of the person.
⊲ Solution 2-5 In the social sciences, units are often persons, and persons can change over time.
If, in a simple experiment or quasi-experiment, a value u of U represents the observational unit
sampled at the onset of treatment, each potential confounder is a function of U . If, in contrast, U
represents the observational unit at the assessment of a fallible covariate (see sect. 2.2), which is
some time prior to the onset of treatment, then there can be other potential confounders in between
assessment of the fallible potential confounder and the onset of treatment. We have to consider
these additional potential confounders both in the definition of causal effects and in data analysis.
⊲ Solution 2-6 A potential confounder of a cause is a random variable that is prior or simultaneous
to the cause, at least as long as we only consider total effects. (If we also consider direct effects, then
a potential confounder can also be posterior to a cause.)
⊲ Solution 2-7 If the treatment variable X has just two values, say treatment and control, then there are differ-
ent kinds of causal effects of the treatment variable on the outcome variable Y , such as the average
total treatment effect, the conditional total treatment effects given a value z of a covariate Z , and the
individual total effect of X on Y given an observational unit u. Aside from these treatment effects,
we may also consider the causal effects of a potential confounder Z on the treatment variable X as
well as on the outcome variable Y .
Part II

Basic Concepts of the Theory of Causal Total Effects
Chapter 3
Time Order

In chapter 1 we studied some examples showing that the conditional expectation values
E (Y |X =x ) of an outcome variable Y and their differences E (Y |X =x ) − E (Y |X =x ′ ), the
prima facie effects, can be seriously misleading in evaluating the causal effect of a (treat-
ment) variable X on an (outcome or response) variable Y. Hence, conditional expectation
values cannot be used offhandedly to define the causal effects in which we are interested
when we want to evaluate a treatment, an intervention, or an exposition. For the purpose
of such an evaluation, the concept of a conditional expectation value E (Y |X =x ) has two
deficits. The first one is that the terms E (Y |X =x ) and E (Y |X =x ′ ) do not necessarily de-
scribe the kind of dependency in which we are interested for the evaluation of a treatment.
This deficit is related to (causal) bias, which will be treated in chapter 6. The second deficit
is that it does not guarantee that X is prior to Y in time, which is indispensable for a dif-
ference E (Y |X =x ) − E (Y |X =x ′ ) to describe a causal effect of a value x of X compared to
another value x ′ of X .

Overview

In the present chapter, we focus on time order of sets of events and of random variables.
We introduce the concepts of a filtration and the relations prior to, simultaneous to, and
prior or simultaneous to with respect to a filtration. Note that the definitions of these rela-
tions do not involve a probability measure. Instead, they can be introduced for measurable
set systems and measurable maps. In the framework of a probability space, these relations
represent time order of sets of events and random variables, respectively. For brevity, we re-
frain from explicitly treating the corresponding relations among measurable sets (and with
it, among events).

Requirements

Reading this chapter requires that the reader is familiar with the contents of the first two
chapters of Steyer (2024). The first of these chapters deals with the concepts of probabil-
ity and conditional probability of events, including the necessary mathematical frame-
work such as a probability space (Ω, A, P ) consisting of a set Ω of possible outcomes, a
σ-algebra A of events, and a probability measure P on A. The second chapter introduces
the concepts of a random variable as a special measurable map and its distribution as a
special image measure. These chapters will be referred to as RS-chapter 1 and RS-chapter
2. The same kind of shortcut is used when referring to other parts of that book, such as
sections, definitions, theorems, remarks, or equations, for instance.

3.1 Filtration

The definition of an event in probability theory does not presume that there is a time or-
der between events, sets of events, and random variables. However, in many applications
of probability theory such a time order is important, in particular if causal interpretations
of dependencies are intended. In both examples presented in chapter 1, for instance, it
is crucial for the evaluation of the treatment that the treatment variable is prior to the
outcome variable Y. Such a time order can be defined with respect to a filtration, a funda-
mental concept of the theory of stochastic processes (see, e. g., Klenke, 2020, Def. 9.9).

Definition 3.1 [Filtration]


Let (Ω, A) be a measurable space, T ⊂ R, and T ≠ Ø. A family FT = (Ft)t∈T of σ-algebras
Ft ⊂ A is called a filtration in A, if Fs ⊂ Ft for all s, t ∈ T with s ≤ t.

Note that this concept is defined in the context of a measurable space (Ω, A ) (see RS-
Def. 1.4), which just consists of a set Ω and a σ-algebra A. No probability measure is in-
volved, and this also applies to the prior-to relations that will be introduced in section 3.2.
Hence, throughout this chapter, we do not presume that there is a probability measure
P on (Ω, A ). Nevertheless, our examples refer to random experiments, which are repre-
sented by a probability space (Ω, A, P ).
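For readers who want to experiment, the nestedness condition of Definition 3.1 is easy to check mechanically for a finite Ω. The following is a minimal sketch, not part of the original text: a σ-algebra is represented as a Python set of frozensets, a filtration as a list of such σ-algebras already ordered by time, and the helper name is_filtration as well as the toy σ-algebras F1 and F2 are my own illustrative assumptions.

```python
# Minimal, illustrative sketch (not from the text): a sigma-algebra over a
# finite Omega is represented as a Python set of frozensets, and a filtration
# as a list of such sigma-algebras already ordered by time.

def is_filtration(family):
    """Check Definition 3.1 for a time-ordered family (F_1, ..., F_n):
    by transitivity of set inclusion it suffices that each sigma-algebra
    is contained in its successor."""
    return all(family[i] <= family[i + 1] for i in range(len(family) - 1))

OMEGA = frozenset({1, 2, 3, 4})
F1 = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), OMEGA}
F2 = F1 | {frozenset({1}), frozenset({2}), frozenset({1, 3, 4}), frozenset({2, 3, 4})}

print(is_filtration([F1, F2]))   # True:  F1 is contained in F2
print(is_filtration([F2, F1]))   # False: the reversed family is not a filtration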

Example 3.2 [Joe and Ann With Self-Selection] In the random experiment presented in
Table 1.2, all elements ω1 , . . . , ω8 of the set of possible outcomes are listed in the first column
of this table. The set of these possible outcomes is the Cartesian product

Ω = ΩU × ΩX × ΩY
  = {ω1, . . . , ω8}                                                      (3.1)
  = {(Joe, no, −), . . . , (Ann, yes, +)},

where ΩU = {Joe, Ann}, ΩX = {no, yes}, and ΩY = {−, +}. The σ-algebra on Ω is specified by
A = P(Ω), that is, A is chosen to be the power set of Ω, which is the set of all subsets of Ω,
consisting of 2^8 = 256 elements. Finally, the probability measure P : A → [0, 1] is specified by
the assignment of the probabilities P ({ωi }) to the eight elements of Ω. These probabilities
are shown in the second column of Table 1.2. All other 248 probabilities P (A), A ∈ A, can
be computed from the probabilities P ({ωi }), i = 1, . . . , 8, because, except for the empty set,
they are unions of the elementary events {ωi } [see RS-Box 1.1 (x)].
In this example, the person variable

U : Ω → ΩU (3.2)

has the co-domain ΩU = { Joe , Ann }, the treatment variable

X : Ω → ΩX′ (3.3)

has the co-domain ΩX′ = { 0, 1}, and the outcome variable

Y : Ω → ΩY′ (3.4)

has the co-domain ΩY′ = { 0, 1}. Table 1.2 shows the assignment of values of these random
variables to each element of Ω. Furthermore, we choose the σ-algebras AU = P (ΩU ) and
AX′ = AY′ = P ({ 0, 1}) to be the power sets of ΩU and of ΩX′ = ΩY′ , respectively. Hence, the
value space of U is (ΩU , AU ), and (ΩX′ , AX′ ) = (ΩY′ , AY′ ) is the value space of X and Y (see
RS-Def. 2.2).
Now, we specify the filtration FT = (Ft )t ∈T , T = {1, 2, 3}, in A by

F1 = σ(U ), F2 = σ(U , X ), F3 = σ(U , X , Y ) (3.5)

(see RS-Def. 2.12). The first of these three σ-algebras is the σ-algebra generated by U,

σ(U) = {U⁻¹(A′): A′ ∈ AU}                                                 (3.6)
     = {U⁻¹({Joe}), U⁻¹({Ann}), U⁻¹(ΩU), U⁻¹(Ø)}.

This σ-algebra consists of four events, the inverse images U⁻¹(A′) = {ω ∈ Ω: U(ω) ∈ A′} of
the elements A′ of

AU = P(ΩU) = { {Joe}, {Ann}, ΩU, Ø }

under the map U . These inverse images are the sets

U⁻¹({Joe}) = {(Joe, no, −), (Joe, no, +), (Joe, yes, −), (Joe, yes, +)}
           = {ω1, ω2, ω3, ω4}                                             (3.7)

U⁻¹({Ann}) = {(Ann, no, −), (Ann, no, +), (Ann, yes, −), (Ann, yes, +)}
           = {ω5, ω6, ω7, ω8}                                             (3.8)

U⁻¹(ΩU) = Ω                                                               (3.9)

U⁻¹(Ø) = Ø                                                                (3.10)

(see Exercises 3-1 and 3-2). Hence, these sets are the four events that the person variable
U takes on

− the value Joe , which is the event that Joe is drawn,


− the value Ann , which is the event that Ann is drawn,
− a value in the set ΩU = { Joe , Ann }, which is the event that Joe or Ann is drawn,
− a value in the empty set, which is the event that no one is drawn.

The σ-algebra generated by (the bivariate random variable) (U , X ) (see RS-sect. 2.1.4) is

σ(U, X) = {(U, X)⁻¹(A′): A′ ∈ AU ⊗ AX′}.                                  (3.11)

This σ-algebra consists of 2^4 = 16 elements, the inverse images (U, X)⁻¹(A′) of the
elements A′ of the product σ-algebra

AU ⊗ AX′ = σ(A1 × A2: A1 ∈ AU, A2 ∈ AX′)
         = { {(Joe, 0)}, {(Joe, 1)}, {(Ann, 0)}, {(Ann, 1)},
             {(Joe, 0), (Joe, 1)}, {(Ann, 0), (Ann, 1)},
             {(Joe, 0), (Ann, 0)}, {(Joe, 1), (Ann, 1)},                  (3.12)
             {(Joe, 0), (Ann, 1)}, {(Joe, 1), (Ann, 0)},
             {(Joe, 0), (Joe, 1), (Ann, 0)}, {(Joe, 0), (Joe, 1), (Ann, 1)},
             {(Joe, 0), (Ann, 0), (Ann, 1)}, {(Joe, 1), (Ann, 0), (Ann, 1)},
             ΩU × ΩX′, Ø }

(see RS-Def. 1.15). In this example, this set is identical to the power set of the Cartesian
product ΩU × ΩX′ . Hence, gathering the inverse images of all elements of AU ⊗ AX′ [see
Eq. (3.11)] yields
σ(U, X) = { {ω1, ω2}, {ω3, ω4}, {ω5, ω6}, {ω7, ω8},
            {ω1, ω2, ω3, ω4}, {ω5, ω6, ω7, ω8},
            {ω1, ω2, ω5, ω6}, {ω1, ω2, ω7, ω8},
            {ω3, ω4, ω5, ω6}, {ω3, ω4, ω7, ω8},                           (3.13)
            {ω1, ω2, ω3, ω4, ω5, ω6}, {ω1, ω2, ω3, ω4, ω7, ω8},
            {ω1, ω2, ω5, ω6, ω7, ω8}, {ω3, ω4, ω5, ω6, ω7, ω8}, Ω, Ø }

(see Exercise 3-4). Comparing σ(U ) to σ(U , X ) [see Eqs. (3.7) to (3.10)] shows that all ele-
ments of σ(U ) are also elements of σ(U , X ). Hence, σ(U ) ⊂ σ(U , X ).
Finally, the σ-algebra σ(U , X , Y ) generated by (U , X , Y ) is identical to the power set of
Ω. It consists of 2^8 = 256 elements. Because σ(U) ⊂ σ(U, X) and the power set consists
of all subsets of Ω, we can conclude σ(U ) ⊂ σ(U , X ) ⊂ σ(U , X , Y ). Hence, FT defined by
Equations (3.5) is a filtration in A. ⊳
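As a computational companion to this example (a sketch under my own representational assumptions, not material from the text), the three σ-algebras can be generated explicitly: for finitely many outcomes, σ(U), σ(U, X), and σ(U, X, Y) are just the sets of all unions of the corresponding preimage atoms. The helper name generated_sigma_algebra is hypothetical.

```python
from itertools import combinations

# Illustrative sketch of Example 3.2: outcomes are triples (u, x, y), and a
# sigma-algebra is represented as a Python set of frozensets of outcomes.
OMEGA = [(u, x, y) for u in ("Joe", "Ann") for x in ("no", "yes") for y in ("-", "+")]

def generated_sigma_algebra(rv, omega):
    """sigma(rv) on a finite omega: collect the preimage atoms rv^{-1}({value})
    and take all possible unions of them (including the empty union)."""
    atoms = {}
    for w in omega:
        atoms.setdefault(rv(w), set()).add(w)
    blocks = [frozenset(b) for b in atoms.values()]
    return {frozenset().union(*combo)
            for r in range(len(blocks) + 1)
            for combo in combinations(blocks, r)}

F1 = generated_sigma_algebra(lambda w: w[0], OMEGA)           # sigma(U)
F2 = generated_sigma_algebra(lambda w: (w[0], w[1]), OMEGA)   # sigma(U, X)
F3 = generated_sigma_algebra(lambda w: w, OMEGA)              # sigma(U, X, Y)

print(len(F1), len(F2), len(F3))   # 4 16 256, as computed in the example
print(F1 <= F2 <= F3)              # True: F1, F2, F3 form a filtration in A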

3.2 Prior-To Relations

Now we introduce time order. More precisely, we define the prior-to relation among mea-
surable set systems (sets of events), and measurable maps (random variables). For these
definitions, it suffices to refer to a measurable space (Ω, A ). In the framework of a pro-
bability space (Ω, A, P ), measurable set systems (i.e., subsets of A ) are sets of events (see
RS-Def. 1.4) and measurable maps are random variables (see RS-Def. 2.2).
Reading the following definition, note that ∃ means ‘there is’ and ∧ symbolizes the
conjunction of two propositions, that is, the logical ‘and’. Furthermore, remember that
the σ-algebra σ(X ) generated by a measurable map X on (Ω, A ) is a set system satisfying,
among other things, σ(X ) ⊂ A (see RS-Def. 2.12).

Definition 3.3 [Prior-to Relations]


Let (Ω, A) be a measurable space, FT = (Ft)t∈T a filtration in A, and C, D ⊂ A.
(i) We say C is prior in FT to D (and D posterior in FT to C), denoted C ≺_FT D, if
    (a) ∃ s ∈ T: C ⊂ Fs ∧ D ⊄ Fs, and
    (b) ∃ t ∈ T: D ⊂ Ft.
(ii) Let X and Y be measurable maps on (Ω, A). Then we say X is prior in FT to Y
    (and Y posterior in FT to X), denoted X ≺_FT Y, if σ(X) is prior in FT to σ(Y).

Remark 3.4 [Prior-to Relation of Measurable Sets and of Events] Let (Ω, A ) be a mea-
surable space and {C }, {D } be set systems containing the sets C , D ∈ A as their only el-
ement. In the context of a probability space (Ω, A, P ), the sets C and D represent events.
We say that C is prior in FT to D (or D posterior in FT to C ) and denote it by C ≺ D , if
FT
{C } is prior in FT to {D }. Using this definition, all propositions about the prior-to rela-
tion of measurable set systems can be applied to the prior-to relation of measurable sets.
However, for brevity, we refrain from explicitly spelling out more details about the prior-to
relation of measurable sets or events. ⊳
Example 3.5 [Joe and Ann With Self-Selection] In Example 3.2, we specified the proba-
bility space (Ω, A, P ), the random variables U , X , Y, and the filtration FT = (Ft )t ∈T in A,
T = {1, 2, 3}, by
F1 = σ(U ), F2 = σ(U , X ), F3 = σ(U , X , Y ),

where σ(U ), σ(U , X ), and σ(U , X , Y ) are the σ-algebras generated by the random variables
U , (U , X ), and (U , X , Y ), respectively.
According to Definition 3.3 (i), the set system σ(U) is prior in FT to the set system σ(X)
because σ(U) ⊂ F1, σ(X) ⊄ F1 [see Def. 3.3 (i) (a)], but σ(X) ⊂ F2 [see Def. 3.3 (i) (b)]. Note
that

F2 = σ(U, X) = σ(σ(U) ∪ σ(X))

[see RS-Eq. (2.16)], which implies σ(X) ⊂ F2. Similarly, σ(U) is prior in FT to σ(Y) because
σ(U) ⊂ F1, σ(Y) ⊄ F1, but σ(Y) ⊂ F3. Again note that

F3 = σ(U, X, Y) = σ(σ(U) ∪ σ(X) ∪ σ(Y))

[see again RS-Eq. (2.16)], which implies σ(Y) ⊂ F3. Finally, σ(X) is prior in FT to σ(Y)
because σ(X) ⊂ F2 and σ(Y) ⊄ F2, but σ(Y) ⊂ F3 (see Exercise 3-5).
Remember that random variables are measurable maps and that the prior-to relation
of measurable maps is defined via their generated σ-algebras. Now that we have clarified the
prior-to relation of the set systems σ(U), σ(X), and σ(Y), we can also conclude that U is
prior in FT to X and Y, and that X is prior in FT to Y [see Def. 3.3 (ii)]. ⊳
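Definition 3.3 (i) can be turned into a small executable check. The sketch below is illustrative only (the representation and the names sigma and is_prior_to are my assumptions, not the book's notation); it reproduces the relations of this example, including asymmetry.

```python
from itertools import combinations

def sigma(rv, omega):
    """sigma-algebra generated by rv on a finite omega: all unions of the
    preimage atoms rv^{-1}({value})."""
    atoms = {}
    for w in omega:
        atoms.setdefault(rv(w), set()).add(w)
    blocks = [frozenset(b) for b in atoms.values()]
    return {frozenset().union(*c)
            for r in range(len(blocks) + 1)
            for c in combinations(blocks, r)}

def is_prior_to(C, D, filtration):
    """Definition 3.3 (i): (a) some F_s contains C but not D,
    and (b) some F_t contains D."""
    a = any(C <= F and not D <= F for F in filtration)
    b = any(D <= F for F in filtration)
    return a and b

OMEGA = [(u, x, y) for u in ("Joe", "Ann") for x in ("no", "yes") for y in ("-", "+")]
sU = sigma(lambda w: w[0], OMEGA)
sX = sigma(lambda w: w[1], OMEGA)
sY = sigma(lambda w: w[2], OMEGA)
F_T = [sU,
       sigma(lambda w: (w[0], w[1]), OMEGA),
       sigma(lambda w: w, OMEGA)]

print(is_prior_to(sU, sX, F_T), is_prior_to(sX, sY, F_T))   # True True
print(is_prior_to(sX, sU, F_T))                             # False (asymmetry)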

3.2.1 Properties of the Prior-to Relation of Measurable Set Systems

Now we study some properties of the prior-to relations. First of all, let us ascertain that
the prior-to relation of measurable set systems is asymmetric and transitive. Reading this
theorem, note that ¬ denotes the negation of a proposition and ⇒ the implication between
two propositions.

Theorem 3.6 [Asymmetry and Transitivity]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D, E ⊂ A.
Then:
(i) C ≺_FT D ⇒ ¬(D ≺_FT C)                                    (asymmetry)

(ii) (C ≺_FT D ∧ D ≺_FT E) ⇒ C ≺_FT E.                        (transitivity)
(Proof p. 70)

Hence, according to Proposition (i) of Theorem 3.6, the set system C being prior in FT
to the set system D implies that D is not prior in FT to C . This property of the prior-to
relation is called asymmetry. Furthermore, according to Proposition (ii) of Theorem 3.6,
the prior-to relation is transitive. That is, if C is prior in FT to D that itself is prior in FT to
E , then C is prior in FT to E .
The following remark prepares another important property of the prior-to relation of
measurable set systems.

Remark 3.7 [Some Propositions About Subsets] Some general properties of subsets are:

(A ⊂ B ∧ A ⊂ C ) ⇔ A ⊂ (B ∩ C ) (3.14)
(A ⊂ C ∧ B ⊂ C ) ⇔ (A ∪ B) ⊂ C (3.15)
and
(A ⊂ B ∧ B ⊂ C ) ⇒ A ⊂C (3.16)

(see Exercises 3-6 to 3-8). Hence, according to Proposition (3.16), if (Ω, A ) is a measurable
space and FT = (Ft )t ∈T a filtration in A, then:

(C 0 ⊂ C ∧ C ⊂ Ft ) ⇒ C 0 ⊂ Ft . (3.17)

Theorem 3.8 [An Implication of C ≺_FT D for a Subset of C]

Let (Ω, A) be a measurable space, FT = (Ft)t∈T a filtration in A, and C, D ⊂ A. Then

C ≺_FT D ⇒ C₀ ≺_FT D, if C₀ ⊂ C.                                        (3.18)
(Proof p. 70)

Now we turn to some properties of the prior-to relation of measurable set systems
that involve σ-algebras generated by set systems. Let (Ω, A ) be a measurable space,
FT = (Ft )t ∈T a filtration in A, and C ⊂ A. Because each Ft , t ∈T , is a σ-algebra,

∀ t ∈T : C ⊂ Ft ⇔ σ(C ) ⊂ Ft (3.19)

holds for the σ-algebra σ(C ) generated by C [see RS-Def. 1.7 and RS-Prop. (1.5)]. This
proposition is used in the proof of the following theorem.

Theorem 3.9 [First Properties of the Prior-to Relation Involving σ-Algebras]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C , D ⊂ A . Then:

(i) C F≺ D ⇔ C F≺ σ(D)
T T

(ii) C ≺ D ⇔ σ(C ) ≺ D.
FT FT
(Proof p. 70)

Hence, according to Proposition (i) of Theorem 3.9, the set system C being prior in FT
to the set system D is equivalent to C being prior in FT to the σ-algebra σ(D) generated by
D. And, according to Proposition (ii) of this theorem, C being prior in FT to D is equivalent
to σ(C ) being prior in FT to D.

Remark 3.10 [The σ-Algebra Generated by the Union of Two Set Systems] Now we turn
to some properties of the prior-to relation of set systems involving the σ-algebra generated
by the union of two set systems. This is of interest because
¡ ¢
σ(X , Y ) = σ σ(X ) ∪ σ(Y ) (3.20)

(see RS-Lem. 2.15). According to this equation, the union of the σ-algebras σ(X ) and σ(Y )
generated by the measurable maps X and Y on a measurable space (Ω, A ) generates the
σ-algebra generated by the bivariate measurable map (X , Y ) (see RS-sect. 2.1.4).
Note that, if C is a σ-algebra on Ω, then
¡ ¢
σ(X ) ∪ σ(Y ) ⊂ C ⇔ σ σ(X ) ∪ σ(Y ) ⊂ C (3.21)

(see RS-Rem. 1.9). Therefore, and because a filtration FT = (Ft )t ∈T is a family of σ-algebras,
all properties of the prior-to relation of set systems involving the σ-algebra generated by the
union of two set systems can be translated to corresponding properties involving a bivariate
measurable map or a bivariate random variable (see RS-sect. 2.1.4). ⊳

In the following theorem we use the notation

C, D ≺_FT E :⇔ (C ≺_FT E ∧ D ≺_FT E).                                    (3.22)

Theorem 3.11 [Properties Involving the Union of Set Systems]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D, E ⊂ A.
Then:
(i) (C ≺_FT D ∧ ∃ t ∈ T: E ⊂ Ft) ⇒ C ≺_FT σ(D ∪ E)

(ii) C, D ≺_FT E ⇔ σ(C ∪ D) ≺_FT E.
(Proof p. 71)

Hence, according to Proposition (i) of Theorem 3.11, if C is prior in FT to D and E


is in the filtration (i.e., there is a t ∈T with E ⊂ Ft ), then C is also prior in FT to σ(D ∪ E ).
Furthermore, C and D being prior in FT to E is equivalent to σ(C ∪ D) being prior in FT
to E [see Prop. (ii)].

Remark 3.12 [A Special Case] Note that C F≺ E implies the existence of a t ∈T with E ⊂ Ft
T
[see Prop. (b) of Def. 3.3 (i)]. Hence,

C F≺ D, E ⇒ C F≺ σ(D ∪ E ) (3.23)
T T

is an immediate implication of Theorem 3.11 (i). In this proposition, the premise is a short-
cut for C F≺ D ∧ C F≺ E . Hence, if C is prior in FT to D and to E , then C is also prior in FT
T T
to the σ-algebra generated by the union of D and E . ⊳

According to the following theorem, C being prior in FT to D is equivalent to C being


prior in FT to the σ-algebra generated by the union of C and D.

Theorem 3.13 [Another Property Involving the Union of Set Systems]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D ⊂ A. Then:

C ≺_FT D ⇔ C ≺_FT σ(C ∪ D).                                              (3.24)
(Proof p. 71)

Example 3.14 [Joe and Ann With Self-Selection] In Example 3.2, we already specified the
probability space (Ω, A, P ), the random variables U , X , Y, and the filtration FT = (Ft )t ∈T
in A, T = {1, 2, 3}, by

F1 = σ(U ), F2 = σ(U , X ), F3 = σ(U , X , Y ) .


Remember, σ(X, Y) = σ(σ(X) ∪ σ(Y)) [see Eq. (3.20)], which implies σ(X), σ(Y) ⊂ σ(X, Y)
[see RS-Prop. (1.9)]. Similarly, σ(U, X, Y) = σ(σ(U) ∪ σ(X) ∪ σ(Y)), and this implies σ(U),
σ(X, Y) ⊂ σ(U, X, Y).
In this example,

σ(U) ≺_FT σ(X), σ(Y), σ(X, Y), σ(U, X, Y),

because σ(U) ⊂ F1 and σ(X), σ(Y), σ(X, Y), σ(U, X, Y) ⊄ F1, but σ(X) ⊂ F2 and σ(Y),
σ(X, Y), σ(U, X, Y) ⊂ F3 [see Def. 3.3 (i)]. That is, σ(U) is prior in FT to σ(X), prior to
σ(Y), prior to σ(X, Y), and prior to σ(U, X, Y).
Finally,

σ(X) ≺_FT σ(Y), σ(X, Y), σ(U, X, Y),

because σ(X) ⊂ F2 and σ(Y), σ(X, Y), σ(U, X, Y) ⊄ F2, but σ(Y), σ(X, Y), σ(U, X, Y) ⊂ F3
[see again Def. 3.3 (i)]. Hence, σ(X) is prior in FT to σ(Y), prior to σ(X, Y), and prior to
σ(U, X, Y). ⊳

3.2.2 Properties of the Prior-to Relation of Measurable Maps

Now we will translate the properties of the prior-to relation of measurable set systems to
the prior-to relation of measurable maps. For a full understanding, the following remarks
on the measurability of a composition are important.

Remark 3.15 [Composition of Two Maps] Let X : Ω → ΩX′ and g : ΩX′ → Ωg′ be maps. Then
a map W : Ω → Ωg′ is called the composition of X and g , denoted g ◦ X or g (X ) if
¡ ¢
W (ω) = g X (ω) , ∀ω ∈ Ω . (3.25)

The following lemma is useful whenever we deal with the composition of two measur-
able maps. (For a proof, see RS-Theorem 2.11, RS-Definition 2.12, RS-Remark 2.13, and
Lemma 2.34).

Lemma 3.16 [Measurability of a Composition]


Let X : (Ω, A ) → (ΩX′ , AX′ ) and g : (ΩX′ , AX′ ) → (Ωg′ , Ag′ ) be measurable maps and assume
W = g (X ). Then W is X-measurable, that is, then σ(W ) ⊂ σ(X ).

According to this lemma, we can conclude that W is X -measurable, whenever it is the


composition of a measurable map X and another measurable map g . Note that no re-
quirements are necessary for the three measurable spaces involved. However, this lemma
only contains a sufficient condition of X -measurability of W .

Remark 3.17 [Measurability of a Composition of X and g ] Now we add a requirement for


the measurable space (Ωg′ , Ag′ ). This yields a necessary and sufficient condition of X -mea-
surability of W. Let X : (Ω, A ) → (ΩX′ , AX′ ) and W : (Ω, A ) → (R, B) be measurable maps on
a measurable space (Ω, A ). Then σ(W ) ⊂ σ(X ) if and only if there is a measurable map
g : (ΩX′ , AX′ ) → (R, B) such that W = g (X ) is the composition of X and g (see RS-Lem. 2.35).
Hence, if W is numerical, then it is X-measurable if and only if it can be written as the com-
position of X and g . ⊳

Example 3.18 [Some Simple Examples] If W = α · X , α ∈ R , then W is X -measurable (see


Exercise 3-9). Another example is W = 1X =x , the indicator of X taking on the value x. Note
that, in Lemma 3.16 and Remark 3.17, X may also be a multidimensional map. For exam-
ple, if W = β1 X 1 + β2 X 2 with β1 , β2 ∈ R , then W is (X 1 , X 2 )-measurable, that is, W is measur-
able with respect to the bivariate map X =(X 1 , X 2 ) (see Exercise 3-10). Another example is
the product W = 1X =x ·1Y =y of the indicators of X taking on the value x and Y taking on the
value y. In this example, W is (X , Y )-measurable and (1X =x , 1Y =y )-measurable. ⊳
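On a finite space, Lemma 3.16 and these examples can be verified directly by comparing generated σ-algebras. The following sketch is illustrative (the helper sigma and the toy maps are my assumptions): the indicator of X1 taking on a value is X1-measurable, and a linear combination of X1 and X2 is (X1, X2)-measurable but, in general, not X1-measurable.

```python
from itertools import combinations

def sigma(rv, omega):
    """sigma-algebra generated by rv on a finite omega (all unions of preimage atoms)."""
    atoms = {}
    for w in omega:
        atoms.setdefault(rv(w), set()).add(w)
    blocks = [frozenset(b) for b in atoms.values()]
    return {frozenset().union(*c)
            for r in range(len(blocks) + 1)
            for c in combinations(blocks, r)}

OMEGA = [(x1, x2) for x1 in (0, 1, 2) for x2 in (0, 1)]
X1 = lambda w: w[0]
X = lambda w: w                          # the bivariate map (X1, X2)

W1 = lambda w: 1 if w[0] == 2 else 0     # indicator of X1 taking on the value 2
W2 = lambda w: w[0] + w[1]               # beta1*X1 + beta2*X2 with beta1 = beta2 = 1

print(sigma(W1, OMEGA) <= sigma(X1, OMEGA))   # True:  W1 = g(X1) is X1-measurable
print(sigma(W2, OMEGA) <= sigma(X, OMEGA))    # True:  W2 is (X1, X2)-measurable
print(sigma(W2, OMEGA) <= sigma(X1, OMEGA))   # False: W2 is not X1-measurable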

Now we turn to properties of the prior-to relation of measurable maps; the most im-
portant ones are gathered in Box 3.1. Because random variables on a probability space
(Ω, A, P ) are measurable maps on the measurable space (Ω, A ), all propositions about
measurable maps in this box and in the present section also hold for random variables.
Box 3.1 starts repeating the definition of the prior-to relation of measurable maps. This
definition is based on the framework of a measurable space (Ω, A ) and a filtration FT in A.
Proposition (i) of Box 3.1 is the asymmetry property of the prior-to relation of measurable
maps. According to this proposition, if X is prior in FT to Y, then Y is not prior in FT to X .
This proposition immediately follows from Theorem 3.6 (i) because the prior-to relation
of measurable maps is defined via their generated σ-algebras [see Def. 3.3 (ii)].
According to Proposition (ii) of Box 3.1, X is prior in FT to Y if and only if X is prior in
FT to (X , Y ), where (X , Y ) denotes the bivariate measurable map on (Ω, A ) that consists of
X and Y (for more details see RS-sect. 2.1.4). This proposition is an immediate implication
of Theorem 3.13 and Definition 3.3 (ii).
According to Propositions (iii) and (iv) of this box, if W is X -measurable, then X being
prior in FT to Y implies that W is also prior in FT to Y and to the bivariate map (X , Y ).
Proposition (iii) is an immediate implication of Theorem 3.8, whereas Proposition (iv) fol-
lows from Theorem 3.13, Proposition (3.20), and Theorem 3.8 [see Exercise 3-11].
Now we turn to Propositions (v) and (vi) of Box 3.1, which involve an additional measur-
able map Z on the measurable space (Ω, A ). We prove these propositions in the following
theorem. In this theorem, we use the notation

Box 3.1 Properties of the prior-to relation of measurable maps

Let X and Y be measurable maps on a measurable space (Ω,A ), let FT = (Ft )t ∈T be a filtration
in A, and let σ(X ) and σ(Y ) denote the σ-algebras generated by X and Y, respectively. Then we
say that X is prior in FT to Y (and Y posterior in FT to X ), denoted X ≺ Y , if the following two
FT
conditions hold:
(a) ∃ s ∈T : σ(X ) ⊂ Fs ∧ σ(Y ) 6⊂ Fs
(b) ∃ t ∈T : σ(Y ) ⊂ Ft .

A first property is
X ≺Y ⇒ ¬(Y ≺ X ). (asymmetry) (i)
FT FT

Additionally, let (X ,Y ) denote the bivariate measurable map consisting of X and Y. Then:
X ≺Y ⇔ X ≺ (X ,Y ). (ii)
FT FT

Additionally, let also W be a measurable map on (Ω,A ). Then:

X ≺Y ⇒ W ≺ Y, if σ(W ) ⊂ σ(X ) (iii)


FT FT
X ≺Y ⇒ W ≺ (X ,Y ), if σ(W ) ⊂ σ(X ) . (iv)
FT FT

Additionally, let also Z be a measurable map on (Ω,A ), let (Y , Z ) denote the bivariate map
consisting of Y and Z , and σ(X ,Y ) the σ-algebra generated by (X ,Y ). Then:

X ,Y ≺ Z ⇒ W ≺ Z, if σ(W ) ⊂ σ(X ,Y ) (v)


FT FT
X ≺Y ∧ Y ≺Z ⇒ W ≺ Z, if σ(W ) ⊂ σ(X ,Y ) . (vi)
FT FT FT

Additionally, let σ(Z ) denote the σ-algebra generated by Z . Then:


¡ ¢
X ≺ Y ∧ ∃ t ∈T : σ(Z ) ⊂ Ft ⇒ W ≺ (Y , Z ), if σ(W ) ⊂ σ(X ) . (vii)
FT FT

X , Y F≺ Z :⇔ (X F≺ Z ∧ Y F≺ Z ). (3.26)
T T T

Theorem 3.19 [Further Implications]


Let W, X , Y , Z be measurable maps on a measurable space (Ω, A ) and FT = (Ft )t ∈T a
filtration in A. Then:

(i) X , Y F≺ Z ⇒ W F≺ Z , if σ(W ) ⊂ σ(X , Y ).


T T

(ii) (X F≺ Y ∧ Y F≺ Z ) ⇒ W F≺ Z , if σ(W ) ⊂ σ(X , Y ).


T T T
(Proof p. 72)

Hence, according to Proposition (v) of Box 3.1, if X and Y are prior in FT to Z , then
each (X , Y )-measurable map W is prior in FT to Z as well. Note that each X -measurable
map W is also (X , Y )-measurable, that is,

σ(W ) ⊂ σ(X ) ⇒ σ(W ) ⊂ σ(X , Y ) . (3.27)



Furthermore, according to Proposition (vi) of Box 3.1, if X is prior in FT to Y that itself


is prior in FT to Z , then each (X , Y )-measurable map W is also prior in FT to Z .

Remark 3.20 [Two Special Cases] For W =X , Proposition (ii) of Theorem 3.19 yields

(X F≺ Y ∧ Y F≺ Z ) ⇒ X F≺ Z . (transitivity) (3.28)
T T T

Hence, if X is prior in FT to Y and Y is prior to Z , then X is prior in FT to Z as well. This


is the transitivity property of the prior-to relation of measurable maps.
Furthermore, for W =(X , Y ), Proposition (ii) of Theorem 3.19 yields

(X F≺ Y ∧ Y F≺ Z ) ⇒ (X , Y ) F≺ Z . (3.29)
T T T

Hence, if X is prior in FT to Y and Y is prior to Z , then the bivariate measurable map


(X , Y ) is prior in FT to Z as well. ⊳

Now we turn to Proposition (vii) of Box 3.1, which is proved in the following theorem.

Theorem 3.21 [Another Implication of X ≺_FT Y for an X-Measurable Map]

Let W, X, Y, Z be measurable maps on a measurable space (Ω, A) and FT = (Ft)t∈T a
filtration in A. Then

(X ≺_FT Y ∧ ∃ t ∈ T: σ(Z) ⊂ Ft) ⇒ W ≺_FT (Y, Z), if σ(W) ⊂ σ(X).         (3.30)
(Proof p. 72)

Hence, if X is prior in FT to Y and Z is in the filtration FT , then each X -measurable


map W is prior in FT to the bivariate map (Y , Z ).

Remark 3.22 [More Implications of X ≺_FT Y ] For W = X, Theorem 3.21 implies

(X ≺_FT Y ∧ ∃ t ∈ T: σ(Z) ⊂ Ft) ⇒ X ≺_FT (Y, Z).                         (3.31)

Hence, if X is prior in FT to Y and Z is in the filtration FT , then X is prior to the bivariate


map (Y , Z ). ⊳

Example 3.23 [Joe and Ann With Self-Selection] In Example 3.14, we already showed that

σ(U) ≺_FT σ(X), σ(Y), σ(X, Y), σ(U, X, Y)

and

σ(X) ≺_FT σ(Y), σ(X, Y), σ(U, X, Y).

Therefore, according to Definition 3.3 (ii),

U ≺_FT X, Y, (X, Y), (U, X, Y).                                           (3.32)

That is, U is prior in FT to X , prior in FT to Y, and prior in FT to (the multivariate random


variables) (X , Y ) and (U , X , Y ). Furthermore,

X ≺_FT Y, (X, Y), (U, X, Y),                                              (3.33)

that is, X is prior in FT to Y, prior in FT to the bivariate random variable (X , Y ), and prior
in FT to the trivariate random variable (U , X , Y ).
In order to illustrate some propositions of Box 3.1, consider the indicator 1U = Joe of the
event that Joe is sampled. Because 1U = Joe is U -measurable, according to Proposition (3.32)
and Box 3.1 (iii),

1U=Joe ≺_FT X, Y, (X, Y), (U, X, Y).                                      (3.34)

Furthermore, according to Propositions (3.32), (3.33), and Box 3.1 (vi),

1U=Joe ≺_FT Y, (X, Y), (U, X, Y).                                         (3.35)

Finally, consider the product 1U = Joe · X , which is the indicator of the event that Joe is
sampled and treated. (Note that, in this example, X is an indicator, too.) This indicator is
(U , X )-measurable, that is, σ(1U = Joe · X ) ⊂ σ(U , X ). Hence, according to (3.32), (3.33), and
Box 3.1 (v), the random variable 1U = Joe · X is prior in FT to Y. ⊳
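The last claim can be checked mechanically on the finite space of this example. In the following sketch (illustrative only; the helper names are my assumptions), σ(1U=Joe · X) is built from its preimage atoms and is found to be contained in F2 = σ(U, X) and to be prior in FT to σ(Y), in line with Box 3.1 (v).

```python
from itertools import combinations

def sigma(rv, omega):
    """sigma-algebra generated by rv on a finite omega (all unions of preimage atoms)."""
    atoms = {}
    for w in omega:
        atoms.setdefault(rv(w), set()).add(w)
    blocks = [frozenset(b) for b in atoms.values()]
    return {frozenset().union(*c)
            for r in range(len(blocks) + 1)
            for c in combinations(blocks, r)}

def is_prior_to(C, D, filtration):
    # Definition 3.3 (i): (a) some F contains C but not D, (b) some F contains D.
    return (any(C <= F and not D <= F for F in filtration)
            and any(D <= F for F in filtration))

OMEGA = [(u, x, y) for u in ("Joe", "Ann") for x in ("no", "yes") for y in ("-", "+")]
F_T = [sigma(lambda w: w[0], OMEGA),            # sigma(U)
       sigma(lambda w: (w[0], w[1]), OMEGA),    # sigma(U, X)
       sigma(lambda w: w, OMEGA)]               # sigma(U, X, Y)

indicator = lambda w: (1 if w[0] == "Joe" else 0) * (1 if w[1] == "yes" else 0)
sW = sigma(indicator, OMEGA)                    # sigma of the product indicator
sY = sigma(lambda w: w[2], OMEGA)               # sigma(Y)

print(sW <= F_T[1])                # True: the indicator is (U, X)-measurable
print(is_prior_to(sW, sY, F_T))    # True: it is prior in F_T to Y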

3.3 Simultaneous-to Relations

In chapter 2, we already discussed that we consider as potential confounders of X all those


random variables that are prior or simultaneous to the focused putative cause variable
X . As an example of a potential confounder that is simultaneous to X we mentioned a
second treatment variable that is manipulated at the same time as X . As another example,
consider studying the effects of X , the amount of antibodies at time t . In such a case, we
might also consider Z , the amount of leucocites at time t , referring to the same time point
t . Then X and Z would be simultaneous to each other.
Again, we start defining the simultaneous-to relation for measurable set systems C and
D, and extend it to measurable maps X and Y via their generated σ-algebras σ(X ) and
σ(Y ).

Definition 3.24 [Simultaneous-to Relations]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C , D ⊂ A.

(i) We say that C and D are simultaneous in (or, with respect to) FT , denoted
    C ≈_FT D, if the following two conditions hold:
    (a) ∃ t ∈ T: C ⊂ Ft
    (b) ∀ t ∈ T: C ⊂ Ft ⇔ D ⊂ Ft.
(ii) Let X and Y be measurable maps on (Ω, A). Then we say that X and Y are
    simultaneous in FT , denoted X ≈_FT Y, if σ(X) and σ(Y) are simultaneous in FT .

The simultaneous-to relation always refers to a measurable space (Ω, A ) and a fil-
tration FT = (Ft )t ∈T in A. For C and D to be simultaneous in FT , or, synonymously,
for C to be simultaneous in FT to D , we require that there is an element t ∈ T such that
the set system C is a subset of Ft [see Def. 3.24 (i) (a)]. In this case we also say that
C is in the filtration FT . Furthermore, in condition (b) of this definition, we require that
C is a subset of Ft if and only if D is a subset of Ft , for all t ∈T . Note that the conjunc-
tion of (a) and (b) implies that D is in the filtration FT as well, that is, there is a t ∈T such
that D ⊂ Ft . Note again that, in the context of a probability space (Ω, A, P ), a measurable
map is a random variable on the probability space (Ω, A, P ), and measurable set systems
C, D ⊂ A are sets of events.

Remark 3.25 [Simultaneous-to Relation of Measurable Sets and of Events] Let (Ω, A )
be a measurable space and {C }, {D } set systems containing the sets C , D ∈ A as their only
element. We say that C and D are simultaneous in FT , denoted C ≈ D , if {C } and {D } are
FT
simultaneous in FT . Using this definition, all propositions about the simultaneous-to re-
lation of measurable set systems are easily translated to the simultaneous-to relation of
measurable sets. Again remember, in the context of a probability space (Ω, A, P ), the sets C
and D represent events. Again, for brevity, we will not treat any details of the simultaneous-
to relation of measurable sets. ⊳
Example 3.26 [Joe and Ann With Self-Selection] In Example 3.2, we already specified the
probability space (Ω, A, P ), the random variables U , X , Y, and the filtration FT = (Ft )t ∈T .
In this example, the person variable U and the event {Joe} × ΩX × ΩY that Joe is sampled
are simultaneous in FT . This is because the set system { {Joe} × ΩX × ΩY } is a subset of the
σ-algebra generated by U and because the first σ-algebra F1 in the filtration FT has been
defined to be the σ-algebra generated by U . Hence, conditions (a) and (b) of Definition
3.24 (i) hold for C = σ(U) and D = { {Joe} × ΩX × ΩY }. ⊳
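Definition 3.24 (i) also lends itself to a mechanical check: condition (a) asks whether C is in the filtration at all, and condition (b) whether C and D are contained in exactly the same σ-algebras of the filtration. The sketch below is illustrative (the representation and the name is_simultaneous are my assumptions) and reproduces the claim of this example.

```python
from itertools import combinations

def is_simultaneous(C, D, filtration):
    a = any(C <= F for F in filtration)                 # (a): C is in the filtration
    b = all((C <= F) == (D <= F) for F in filtration)   # (b): C in F_t iff D in F_t
    return a and b

def sigma(rv, omega):
    """sigma-algebra generated by rv on a finite omega (all unions of preimage atoms)."""
    atoms = {}
    for w in omega:
        atoms.setdefault(rv(w), set()).add(w)
    blocks = [frozenset(b) for b in atoms.values()]
    return {frozenset().union(*c)
            for r in range(len(blocks) + 1)
            for c in combinations(blocks, r)}

OMEGA = [(u, x, y) for u in ("Joe", "Ann") for x in ("no", "yes") for y in ("-", "+")]
sU = sigma(lambda w: w[0], OMEGA)
sX = sigma(lambda w: w[1], OMEGA)
F_T = [sU, sigma(lambda w: (w[0], w[1]), OMEGA), sigma(lambda w: w, OMEGA)]

joe_event = frozenset(w for w in OMEGA if w[0] == "Joe")   # the event that Joe is sampled

print(is_simultaneous(sU, {joe_event}, F_T))   # True, as claimed in Example 3.26
print(is_simultaneous(sU, sX, F_T))            # False: sigma(U) enters F_T before sigma(X)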

3.3.1 Properties of the Simultaneous-to Relation of Measurable Set Systems

Now we study some elementary properties of the simultaneous-to relation. First of all, we
show that this relation is reflexive, symmetric and transitive, that is, we show that it is an
equivalence relation.

Theorem 3.27 [Reflexivity, Symmetry, and Transitivity]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D, E ⊂ A.
Then:
(i) ∃ t ∈T : C ⊂ Ft ⇒ C ≈C (reflexivity)
FT
(ii) C ≈ D ⇒ D ≈C (symmetry)
FT FT
(iii) (C ≈ D ∧ D ≈ E ) ⇒ C ≈ E. (transitivity)
FT FT FT
(Proof p. 73)

Now we turn to some properties of the simultaneous-to relation involving the σ-algebras
σ(C ) and σ(C ∪ D) generated by the set systems C and C ∪ D, respectively (see RS-
Def. 1.7). The motivation is to obtain some properties of the simultaneous-to relation of
measurable maps and random variables (see again Rem. 3.10).

Theorem 3.28 [Properties of the Simultaneous-to Relation Involving σ-Algebras]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C , D ⊂ A. Then:

(i) C ≈ D ⇔ σ(C ) ≈ D
FT FT
(ii) C ≈ D ⇒ σ(C ∪ D) ≈ D.
FT FT
(Proof p. 73)

Hence, according to Theorem 3.28, the set systems C and D being simultaneous in FT
is equivalent to σ(C ) (i.e., the σ-algebra generated by C ) being simultaneous in FT to D
[see Prop. (i)]. Furthermore, C and D being simultaneous in FT implies that σ(C ∪D) (i.e.,
the σ-algebra generated by the union C ∪ D) is simultaneous in FT to D [see Prop. (ii)].
In the following theorem we treat a property of the simultaneous-to relation involving
a third measurable set system E ⊂ A.

Theorem 3.29 [A Proposition Involving a Third Measurable Set Systems]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D, E ⊂ A.
Then
(C ≈ D ∧ D ≈ E ) ⇒ σ(C ∪ D) ≈ E . (3.36)
FT FT FT
(Proof p. 74)

Hence, if the set systems C , D and E are simultaneous in FT to each other, then the
σ-algebra generated by the union of C and D is simultaneous in FT to E .

Example 3.30 [Joe and Ann With Self-Selection] In Example 3.2, we already specified the
probability space (Ω, A, P ), the random variables U , X , Y, and the filtration FT = (Ft )t ∈T
for the random experiment presented in Table 1.2. In this example,

σ(X ) ≈ σ(U , X ),
FT

and

σ(Y ) ≈ σ(X , Y ), σ(U , X , Y ) .


FT

That is, σ(X ) is simultaneous in FT to σ(U , X ), and σ(Y ) is simultaneous in FT to σ(X , Y )


and to σ(U , X , Y ). Therefore, according to Definition 3.3 (ii),

X ≈ (U , X )
FT

and

Y ≈ (X , Y ), (U , X , Y ).
FT

That is, X is simultaneous in FT to (the multivariate random variable) (U , X ), and Y is


simultaneous in FT to (X , Y ) and to (U , X , Y ). ⊳

Now we treat some theorems involving the prior-to and the simultaneous-to relations.
In the first one we show that the set system C being prior in FT to the set system D implies
that σ(C ∪ D), the σ-algebra generated by the union of C and D, is simultaneous in FT to
D and that C being simultaneous in FT to D implies that C is not prior in FT to D.

Theorem 3.31 [Combining the Prior-to and Simultaneous-to Relations]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D ⊂ A. Then:

(i) C F≺ D ⇒ σ(C ∪ D) ≈ D
T FT

(ii) C ≈ D ⇒ ¬(C F≺ D).


FT T
(Proof p. 74)

In the following theorem we study an implication of C and D being prior or simultane-


ous in FT for a subset C 0 of the σ-algebra generated by the union of C and D.

Theorem 3.32 [Combining the Prior-to and Simultaneous-to Relations]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D ⊂ A. If
C 0 ⊂ σ(C ∪ D), then:

(i) C ≈ D ⇒ (C 0 ≺ D ∨ C 0 ≈ D)
FT FT FT
(ii) C F≺ D ⇒ (C 0 F≺ D ∨ C 0 ≈ D).
T T FT
(Proof p. 74)

Hence, according to the two propositions of this theorem, if the set system C is prior or
simultaneous in FT to the set system D and C 0 is a subset of σ(C ∪ D), then C 0 is prior or
simultaneous in FT to D.
Finally, we show: If C is simultaneous in FT to D that itself is prior in FT to E , then
C 0 is prior in FT to E , provided that C 0 is a subset of σ(C ∪ D). Furthermore, if C is prior
in FT to D that itself is simultaneous in FT to E , and C 0 is a subset of C, then we can
conclude that C 0 is prior in FT to E .

Theorem 3.33 [Combining the Prior-to and Simultaneous-to Relations]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D, E ⊂ A.
Then:

(i) (C ≈ D ∧ D F≺ E ) ⇒ C 0 F≺ E , if C 0 ⊂ σ(C ∪ D)
FT T T

(ii) (C ≺ D ∧ D ≈ E ) ⇒ C0 ≺ E , if C 0 ⊂ C.
FT FT FT
(Proof p. 75)

In the following remark, we consider the special case of Theorem 3.33 in which C 0 =C.

Remark 3.34 [A Special Case] Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration
in A, and C, D, E ⊂ A. Then:
(i) (C ≈_FT D ∧ D ≺_FT E) ⇒ C ≺_FT E

(ii) (C ≺_FT D ∧ D ≈_FT E) ⇒ C ≺_FT E.

Hence, if the set system C is simultaneous in FT to the set system D that itself is prior
in FT to the set system E , then C is prior in FT to E as well. Furthermore, if C is prior in
FT to D and D is simultaneous in FT to E , then C is also prior in FT to E . ⊳

3.3.2 Properties of the Simultaneous-to Relation of Measurable Maps

Now we turn to properties of the simultaneous-to relation of measurable maps, the most
important of which are gathered in Box 3.2. Again remember, because random variables
on a probability space (Ω, A, P ) are measurable maps on the measurable space (Ω, A ), all
propositions about the simultaneous-to relation of measurable maps in this box and in
this section also hold for random variables.
Box 3.2 starts repeating the definition of the simultaneous-to relation of measurable
maps. Propositions (i) (reflexivity) and (ii) (symmetry) of Box 3.2 immediately follow from
Propositions (i) and (ii) of Theorem 3.27 because the simultaneous-to relation of measur-
able maps is defined via their generated σ-algebras [see Def. 3.24 (ii)].
Proposition (iii) of Box 3.2 is an immediate implication of Theorem 3.31 (ii) because
the simultaneous-to and prior-to relations of measurable maps are defined via the corre-
sponding relation of their generated σ-algebras [see Defs. 3.3 (ii) and 3.24 (ii)]. According
to this proposition, if X is simultaneous in FT to Y, then X is not prior in FT to Y, nor is Y
prior in FT to X (which follows from symmetry).
According to Propositions (iv) and (v) of Box 3.2, X being simultaneous or prior in FT
to Y implies that the bivariate map (X , Y ) is also simultaneous in FT to Y. The first of these
propositions immediately follows from Theorem 3.28 (ii), Definition 3.24 (ii), and Equation
(3.20), the second from Theorem 3.31 (i), Definition 3.24 (ii), and Equation (3.20).
Proposition (vi) of Box 3.2 is called transitivity of the simultaneous-to relation of mea-
surable maps. According to this proposition, if X is simultaneous in FT to Y and Y simul-
taneous in FT to Z , then X is also simultaneous in FT to Z . This proposition immediately
follows from Theorem 3.27 (iii), Definition 3.24 (ii), and Equation (3.20).
According to Proposition (vii) of Box 3.2, X being simultaneous in FT to Y and Y being
simultaneous in FT to Z implies that the bivariate map (X , Y ) is also simultaneous in FT
to Z . This proposition immediately follows from Theorem 3.29, Definition 3.24 (ii), and
Equation (3.20). Symmetry immediately yields
(X ≈_FT Y ∧ Y ≈_FT Z) ⇒ (X, Z) ≈_FT Y.                                    (3.37)

According to Propositions (viii) and (ix) of Box 3.2, X being prior or simultaneous in
FT to Y implies that each (X , Y )-measurable map W is prior or simultaneous in FT to Y
(see again Rem. 3.15 to Example 3.18). These two propositions are proved in the following
theorem.

Theorem 3.35 [Two Propositions Involving the Prior-to Relation]


Let W, X , Y be measurable maps on a measurable space (Ω, A ), let FT = (Ft )t ∈T be a
filtration in A, and assume σ(W ) ⊂ σ(X , Y ). Then
(i) X F≺ Y ⇒ (W F≺ Y ∨ W ≈ Y )
T T FT

Box 3.2 Properties of the simultaneous-to relation of measurable maps

Let X and Y be measurable maps on a measurable space (Ω,A ), let FT = (Ft )t ∈T be a filtration
in A, and let σ(X) and σ(Y) denote the σ-algebras generated by X and Y, respectively. Then we
say that X is simultaneous to Y in FT , denoted X ≈ Y , if the following two conditions hold:
FT
(a) ∃ t ∈T : σ(X )⊂ Ft
(b) ∀ t ∈T : σ(X ) ⊂ Ft ⇔ σ(Y ) ⊂ Ft .

Some first properties are

∃ t ∈T : σ(X ) ⊂ Ft ⇒ X ≈X (reflexivity) (i)


FT
X ≈Y ⇒ Y ≈X (symmetry) (ii)
FT FT
X ≈Y ⇒ ¬(X ≺ Y ). (iii)
FT FT

Additionally, let (X ,Y ) denote the bivariate measurable map consisting of X and Y. Then:

X ≈Y ⇒ (X ,Y ) ≈ Y (iv)
FT FT
X ≺Y ⇒ (X ,Y ) ≈ Y . (v)
FT FT

Additionally, let also Z denote a measurable map on (Ω,A ). Then:

(X ≈ Y ∧ Y ≈ Z ) ⇒ X ≈Z (transitivity) (vi)
FT FT FT
(X ≈ Y ∧ Y ≈ Z ) ⇒ (X ,Y ) ≈ Z . (vii)
FT FT FT

Additionally, let also W be a measurable map on (Ω,A ). Then:

X ≺Y ⇒ (W ≺ Y ∨ W ≈ Y ), if σ(W ) ⊂ σ(X ,Y ) (viii)


FT FT FT
X ≈Y ⇒ (W ≺ Y ∨ W ≈ Y ), if σ(W ) ⊂ σ(X ,Y ) (ix)
FT FT FT
(X ≈ Y ∧ Y ≺ Z ) ⇒ W ≺ Z, if σ(W ) ⊂ σ(X ,Y ) (x)
FT FT FT

(X ≺ Y ∧ Y ≈ Z ) ⇒ W ≺ Z, if σ(W ) ⊂ σ(X ). (xi)


FT FT FT

(ii) X ≈ Y ⇒ (W F≺ Y ∨ W ≈ Y ).
FT T FT
(Proof p. 76)

According to Proposition (x) of Box 3.2, if X is simultaneous in FT to Y that itself is prior


in FT to Z , then each (X , Y )-measurable map W is also prior in FT to Z . This proposition
immediately follows from Theorem 3.33 (i), Definition 3.3 (ii), and Definition 3.24 (ii) (see
Exercise 3-12).
Finally, according to Proposition (xi) of Box 3.2, if X is prior in FT to Y that itself is
simultaneous in FT to Z , then each X -measurable map W is also prior in FT to Z . This
proposition immediately follows from Theorem 3.33 (ii), Definition 3.3 (ii), and Definition
3.24 (ii). Note that X -measurability of W implies that W is (X , Y )-measurable, but not vice
versa.

Remark 3.36 [Two Special Cases] For W = X, Proposition (x) of this box yields

(X ≈_FT Y ∧ Y ≺_FT Z) ⇒ X ≺_FT Z.                                         (3.38)

Hence, if X is simultaneous in FT to Y that itself is prior in FT to Z , then X is prior in FT
to Z .
Similarly, for W = X, Proposition (xi) of Box 3.2 yields

(X ≺_FT Y ∧ Y ≈_FT Z) ⇒ X ≺_FT Z.                                         (3.39)

According to this proposition, if X is prior in FT to Y and Y is simultaneous in FT to Z ,


then X is also prior in FT to Z . ⊳

3.4 Prior-or-Simultaneous-to Relations

As noted before, the intuitive idea of a potential confounder W of X is that it is a random
variable that is prior or simultaneous to X . This concept is now defined for measurable
maps — and with it for random variables — and for measurable set systems — and with it
for sets of events. Reading the following definition, remember that the σ-algebra σ(X ) gen-
for sets of events. Reading the following definition, remember that the σ-algebra σ(X ) gen-
erated by a measurable map X is a set system (see RS-Def. 2.12).

Definition 3.37 [Prior-or-Simultaneous-to Relations]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D ⊂ A .

(i) We say that C is prior or simultaneous in FT to D, denoted C ≼_FT D, if C is
    prior in FT to D or simultaneous in FT to D.
(ii) Let X and Y be measurable maps on (Ω, A). Then we say that X is prior or
    simultaneous in FT to Y, denoted X ≼_FT Y, if σ(X) is prior or simultaneous in
    FT to σ(Y).

Remark 3.38 [Prior-or-Simultaneous-to Relation of Measurable Sets and of Events]

Let (Ω, A) be a measurable space and {C}, {D} set systems containing the sets C, D ∈ A
as their only element. Then we say that C and D are prior or simultaneous in FT , denoted
C ≼_FT D, if {C} and {D} are prior or simultaneous in FT . Remember again that in the con-
text of a probability space (Ω, A, P), the sets C and D represent events. Again, for the sake of
brevity, we will not treat any details of the prior-or-simultaneous-to relation of measurable
sets. However, using this definition, all propositions about the prior-or-simultaneous-to
relation of measurable set systems treated in this section are easily translated to the prior-
or-simultaneous-to relation of measurable sets. ⊳
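Because the prior-or-simultaneous-to relation is simply the disjunction of the two relations defined before, a computational check reduces to a one-liner on top of the earlier predicates. The following sketch is illustrative (the predicate names and the toy filtration are my assumptions); its last two lines also preview the linearity property treated in Theorem 3.39 (iii) below.

```python
def is_prior_to(C, D, filtration):
    return (any(C <= F and not D <= F for F in filtration)
            and any(D <= F for F in filtration))

def is_simultaneous(C, D, filtration):
    return (any(C <= F for F in filtration)
            and all((C <= F) == (D <= F) for F in filtration))

def is_prior_or_simultaneous(C, D, filtration):
    # Definition 3.37 (i): prior in F_T to D, or simultaneous in F_T to D.
    return is_prior_to(C, D, filtration) or is_simultaneous(C, D, filtration)

# Toy filtration on Omega = {1, 2, 3, 4} (sigma-algebras as sets of frozensets).
OMEGA = frozenset({1, 2, 3, 4})
F1 = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), OMEGA}
F2 = F1 | {frozenset({1}), frozenset({2}), frozenset({1, 3, 4}), frozenset({2, 3, 4})}
F_T = [F1, F2]

C = {frozenset({1, 2})}    # a set system already contained in F1
D = {frozenset({1})}       # a set system that enters the filtration only at F2

print(is_prior_or_simultaneous(C, D, F_T))   # True  (here: C is prior in F_T to D)
print(is_prior_or_simultaneous(D, C, F_T))   # False
# Both C and D are in the filtration, so at least one of the two directions
# must hold; this is the linearity property of Theorem 3.39 (iii).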

3.4.1 Properties of the Prior-or-Simultaneous-to Relation of Measurable Set


Systems

Now we show that the prior-or-simultaneous-to relation is reflexive, pseudo-antisym-


metric, linear, and transitive.

Theorem 3.39 [Reflexivity, Pseudo-Antisymmetry, and Transitivity]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D ⊂ A. Then:

(i) ∃ t ∈T : C ⊂ Ft ⇒ C 4C (reflexivity)
FT
(ii) (C 4 D ∧ D 4 C ) ⇒ C ≈D (pseudo-antisymmetry)
FT FT FT
(iii) (∃ s, t ∈T : C ⊂ Fs ∧ D ⊂ Ft ) ⇒ (C 4 D ∨ D 4 C ). (linearity)
FT FT

If, additionally, E ⊂ A, then

(iv) (C 4 D ∧ D 4 E ) ⇒ C 4E . (transitivity)
FT FT FT
(Proof p. 76)

Hence, according to Proposition (i) of Theorem 3.39, if the set system C is in the filtra-
tion, then it is prior or simultaneous in FT to itself. Furthermore, if C is prior or simulta-
neous in FT to the set system D that itself is prior or simultaneous in FT to C , then we
can conclude that C and D are simultaneous in FT to each other [see Prop. (ii)]. Further-
more, if C and D are in the filtration FT , then C is prior or simultaneous in FT to D or D
is prior or simultaneous in FT to C [see Prop. (iii)]. Finally, according to Proposition (iv),
if C is prior or simultaneous in FT to D that itself is prior or simultaneous in FT to the set
system E , then C is prior or simultaneous in FT to E .
Now we treat an implication of a set system C being prior or simultaneous in a filtration
FT to a set system D for a subset C 0 of σ(C ∪ D), the σ-algebra generated by the union of
the sets C and D.

Theorem 3.40 [An Implication of C 4 D for a Subset of σ(C ∪ D)]


FT
Let (Ω, A ) be a measurable space, let FT = (Ft )t ∈T a filtration in A, and C, D ⊂ A.
Then:
C 4D ⇒ C 0 4 D, if C 0 ⊂ σ(C ∪ D). (3.40)
FT FT
(Proof p. 78)

Hence, if the measurable set system C is prior or simultaneous in FT to a measurable


set system D, then each subset of the σ-algebra generated by the union C ∪ D is also prior
or simultaneous in FT to D.
The following corollary is a generalization of relexivity of the prior-or-simultaneous-to
relation of measurable set systems to subsets of a measurable set system.

Corollary 3.41 [A Generalization of Reflexivity]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A and C ⊂ A. Then:

∃ t ∈T : C ⊂ Ft ⇒ C 0 4 C, if C 0 ⊂ C . (3.41)
FT
(Proof p. 78)

According to Proposition (3.41), if C is in the filtration FT , then each subset of C is


prior or simultaneous to C.

In the following theorems we treat some properties of the prior-or-simultaneous-to re-


lation of measurable set systems involving σ-algebra generated by the union of two set
systems (for the motivation see again Rem. 3.10).

Theorem 3.42 [An Implication of C 4 D for a Subset of σ(C ∪ D)]


FT
Let (Ω, A ) be a measurable space, let FT = (Ft )t ∈T a filtration in A, and C, D ⊂ A.
Then:
C 4D ⇒ C 0 4 σ(C ∪ D), if C 0 ⊂ σ(C ∪ D). (3.42)
FT FT
(Proof p. 78)

According to this theorem, if the set system C is prior or simultaneous in FT to the set
system D, then each subset C 0 of the σ-algebra σ(C ∪ D) generated by the union of C and
D is prior or simultaneous in FT to σ(C ∪ D).
Remark 3.43 [A Special Case] In the special case C 0 =C, Proposition (3.42) yields

C 4D ⇒ C 4 σ(C ∪ D). (3.43)


FT FT

Hence, if the measurable set system C is prior or simultaneous in FT to a measurable set


system D, then C is also prior or simultaneous in FT to σ(C ∪ D). ⊳
In the next theorem we use the notation

C, D 4 E :⇔ (C 4 E ∧ D 4 E ). (3.44)
FT FT FT

Theorem 3.44 [Propositions Involving the Union of Two Set Systems]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D, E ⊂ A.
Then:
C, D 4 E ⇒ σ(C ∪ D) 4 E . (3.45)
FT FT
(Proof p. 78)

Hence, if the measurable set systems C and D are prior or simultaneous in FT to a


measurable set system E , then the σ-algebra generated by their union C ∪ D is also prior
or simultaneous in FT to E .
In the following theorem we show that the set system C being prior or simultaneous
in FT to the set system D is equivalent to C being prior or simultaneous in FT to the
σ-algebra generated by D. Furthermore, it is also equivalent to the σ-algebra generated by
C being prior or simultaneous in FT to D.

Theorem 3.45 [More Properties Involving σ-Algebras]


Let (Ω, A ) be a measurable space, FT = (Ft )t ∈T a filtration in A, and C, D ⊂ A. Then:

(i) C 4 D ⇔ C 4 σ(D)
FT FT
(ii) C 4 D ⇔ σ(C ) 4 D.
FT FT
(Proof p. 79)

Box 3.3 Properties of the prior-or-simultaneous-to relation of measurable maps

Let X and Y be measurable maps on a measurable space (Ω,A ), let FT = (Ft )t ∈T be a filtration
in A , and let σ(X ) and σ(Y ) denote the σ-algebras generated by X and Y, respectively. Then we
say that X is prior or simultaneous to Y in FT , denoted X 4 Y , if X ≺ Y or X ≈ Y.
FT FT FT
Two first properties are:

(X 4 Y ∧ Y 4 X ) ⇒ X ≈Y (pseudo-antisymmetry) (i)
FT FT FT
¡ ¢
∃ s, t ∈T : σ(X ) ⊂ Fs ∧ σ(Y ) ⊂ Ft ⇒ (X 4 Y ∨ Y 4 X ). (linearity) (ii)
FT FT

Additionally, let also W be a measurable map on (Ω,A ). Then:


X ≈Y ⇒ W 4 Y, if σ(W ) ⊂ σ(X ,Y ) (iii)
FT FT

X 4Y ⇒ W 4 Y, if σ(W ) ⊂ σ(X ,Y ) (iv)


FT FT

X 4Y ⇒ W 4 (X ,Y ), if σ(W ) ⊂ σ(X ,Y ) (v)


FT FT
∃ t ∈T : σ(X ) ⊂ Ft ⇒ W 4 X, if σ(W ) ⊂ σ(X ). (vi)
FT

Additionally, let also Z be a measurable map on (Ω,A ). Then:

(X ≈ Y ∧ Y ≈ Z ) ⇒ W 4 Z, if σ(W ) ⊂ σ(X ,Y ) (vii)


FT FT FT

(X 4 Y ∧ Y 4 Z ) ⇒ W 4 Z, if σ(W ) ⊂ σ(X ,Y ) (viii)


FT FT FT

X ,Y 4 Z ⇒ W 4 Z, if σ(W ) ⊂ σ(X ,Y ). (ix)


FT FT

3.4.2 Properties of the Prior-or-Simultaneous-to Relation of Measurable Maps

Now we turn to the prior-or-simultaneous-to relation of measurable maps, the most im-
portant properties of which are gathered in Box 3.3. The first two, pseudo-antisymmetry
and linearity, are immediate implications of Definition 3.37 (ii) and Theorem 3.39 (ii) and
(iii), respectively. According to Proposition (i) of Box 3.3, if X is prior or simultaneous in
FT to Y and Y is prior or simultaneous in FT to X , then we can conclude that X and Y
are simultaneous in FT to each other. And, according to Proposition (ii) of Box 3.3, if X
and Y are in the filtration FT , then X is prior or simultaneous in FT to Y , or Y is prior or
simultaneous in FT to X .
Proposition (iii) of Box 3.3 has already been proved in Theorem 3.35 (ii). According to
this proposition, X and Y being simultaneous in FT implies that each (X , Y )-measurable
map W is prior or simultaneous in FT to Y.
Proposition (iv) of Box 3.3 is an immediate implication of Theorem 3.35 (i) and (ii). Ac-
cording to this proposition, if X is prior or simultaneous in FT to Y , then W is prior or
simultaneous in FT to Y as well, provided that W is measurable with respect to (X , Y ),
that is, provided that σ(W ) ⊂ σ(X , Y ).
According to Proposition (v) of Box 3.3, if X is prior or simultaneous in FT to Y, then
W is prior or simultaneous in FT to the bivariate measurable map (X , Y ), provided that
W is measurable with respect to (X , Y ). This is the case, for example, if W = X or W = Y.
Proposition (v) is an immediate implication of Theorem 3.40 and Definition 3.37 (ii).

According to Proposition (vi) of Box 3.3, if X is in the filtration FT and W is X -measur-


able, then W is prior or simultaneous in FT to X . In the special case W = X , this implies

∃ t ∈T : σ(X ) ⊂ Ft ⇒ X 4X. (reflexivity) (3.46)


FT

Note again that σ(W ) ⊂ σ(X ) implies σ(W ) ⊂ σ(X , Y ), but not vice versa. The analog of
Proposition (vi) for measurable set systems has been treated in Theorem 3.39 (i). Together
with Definition 3.37 (ii), this proves Proposition (vi) of Box 3.3, and with it, Proposition
(3.46).
According to Proposition (vii), if X and Y as well as Y and Z are simultaneous in FT to
each other, then each (X , Y )-measurable map W is prior or simultaneous in FT to Z . For
example, if 1X =x is the indicator of the event that X takes on the value x, X is simultaneous
to Y , and Y simultaneous to Z in FT , then 1X =x is prior or simultaneous to Z .
Furthermore, according to Proposition (viii) of Box 3.3, if X is prior or simultaneous in
FT to Y and Y is prior or simultaneous in FT to Z in FT , then W is prior or simultaneous
in FT to Z as well, provided that W is (X , Y )-measurable.
Also note, for W = X , Proposition (viii) of Box 3.3 yields

(X 4 Y ∧ Y 4 Z ) ⇒ X 4Z, (transitivity) (3.47)


FT FT FT

The proofs of Propositions (vii) and (viii) of Box 3.3 are found in the following theorem.

Theorem 3.46 [Further Implications]


Let W, X , Y, Z be measurable maps on a measurable space (Ω, A ), let FT = (Ft )t ∈T be
a filtration in A, and assume σ(W ) ⊂ σ(X , Y ). Then:

(i) (X ≈ Y ∧ Y ≈ Z ) ⇒ W 4Z
FT FT FT

(ii) (X 4 Y ∧ Y 4 Z ) ⇒ W 4Z.
FT FT FT
(Proof p. 79)

Finally, we consider Proposition (ix) of Box 3.3. According to this proposition, if X and
Y are prior or simultaneous to Z in FT and W is (X , Y )-measurable, then W is prior or
simultaneous in FT to Z as well. The proof of this proposition is found in the following
theorem, in which we use the notation

X ,Y 4 Z :⇔ (X 4 Z ∧ Y 4 Z ). (3.48)
FT FT FT

Theorem 3.47 [An Implication Involving an (X , Y )-Measurable Map]


Let W, X , Y, Z be measurable maps on a measurable space (Ω, A ) and FT = (Ft )t ∈T a
filtration in A. Then

X ,Y 4 Z ⇒ W 4Z, if σ(W ) ⊂ σ(X , Y ). (3.49)


FT FT
(Proof p. 80)

Box 3.4 Glossary of new concepts

Let (Ω, A) be a measurable space, that is, let Ω be a set and A a σ-algebra on Ω.

FT          Filtration in A. A family FT := (Ft)t∈T of σ-algebras Ft ⊂ A satisfying Fs ⊂ Ft
            for all s, t ∈ T with s ≤ t, T ⊂ R, and T ≠ Ø.

Additionally, let C, D ⊂ A.

C ≺_FT D    C is prior in FT to D. This means:
            (a) there is an s ∈ T such that C ⊂ Fs and D ⊄ Fs, and
            (b) there is a t ∈ T such that D ⊂ Ft.

C ≈_FT D    C is simultaneous in FT to D. This means:
            (c) there is a t ∈ T such that C ⊂ Ft, and
            (d) for all t ∈ T: C ⊂ Ft if and only if D ⊂ Ft.

C ≼_FT D    C is prior or simultaneous in FT to D. This means that C is prior in FT to D or
            C is simultaneous in FT to D.

Let X, Y be measurable maps on (Ω, A) and let σ(X), σ(Y) denote their generated σ-algebras.

X ≺_FT Y    X is prior in FT to Y. This means that σ(X) is prior in FT to σ(Y).

X ≈_FT Y    X is simultaneous in FT to Y. This means that σ(X) is simultaneous in FT to σ(Y).

X ≼_FT Y    X is prior or simultaneous in FT to Y. This means that σ(X) is prior or simulta-
            neous in FT to σ(Y).

3.5 Summary and Conclusions

In this chapter we introduced the fundamental concept of a filtration FT = (Ft )t ∈T in the


σ-algebra A of a measurable space (Ω, A ). Such a filtration induces time order among
measurable set systems (i.e., subsets of A ) and among measurable maps on a measur-
able space (Ω, A ), provided that we interpret T as a time set. In the context of a probabi-
lity space (Ω, A, P ), this means that a filtration induces time order among sets of events,
and random variables. Referring to a filtration, we defined the prior-to, simultaneous-
to, and prior-or-simultaneous-to relations of measurable set systems and of measurable
maps. In the framework of a probability space this yields the prior-to, simultaneous-to,
and prior-or-simultaneous-to relations of sets of events and of random variables. Box 3.4
summarizes these concepts. The properties of the prior-to relation, the simultaneous-to
relation, and the prior-or-simultaneous-to relation of measurable maps have been gath-
ered in Boxes 3.1, 3.2, and 3.3, respectively.

It should be noted that these concepts are defined in the framework of a measurable
space (Ω, A ). No probability measure on A is involved. Hence, whether or not a random
variable X is prior to a random variable Y does not depend on the stochastic dependencies
between these random variables, let alone on data that result from conducting a random
experiment.

3.6 Proofs

Proof of Theorem 3.6

(i). If C is prior in FT to D, then

(a) ∃ s ∈ T: C ⊂ Fs ∧ D ⊄ Fs
(b) ∃ t ∈ T: D ⊂ Ft.

Because FT is a filtration, every Fr with D ⊂ Fr satisfies Fs ⊂ Fr (otherwise D ⊂ Fr ⊂ Fs,
contradicting D ⊄ Fs), and hence C ⊂ Fr. Therefore, ¬(∃ r ∈ T: D ⊂ Fr ∧ C ⊄ Fr). This
implies that D is not prior to C, which proves asymmetry.
(ii). If C is prior to D in FT , then (a) holds, and if D is prior to E , then
(c) ∃ t ∈ T: D ⊂ Ft ∧ E ⊄ Ft,
(d) ∃ r ∈ T: E ⊂ Fr.
Because FT is a filtration, the conjunction of (a) and (c) implies
(e) ∃ s ∈ T: C ⊂ Fs ∧ E ⊄ Fs.
In conjunction with (d) this is equivalent to C ≺_FT E [see Def. 3.3 (i)].

Proof of Theorem 3.8

(i). If C is prior in FT to D, then

(a) ∃ s ∈ T: C ⊂ Fs ∧ D ⊄ Fs
(b) ∃ t ∈ T: D ⊂ Ft.

According to Proposition (3.17), Proposition (a) and C₀ ⊂ C imply

(c) ∃ s ∈ T: C₀ ⊂ Fs ∧ D ⊄ Fs.

However, the conjunction of (c) and (b) is equivalent to C₀ ≺_FT D [see Def. 3.3 (i)].

Proof of Theorem 3.9

(i). C ≺_FT D is equivalent to the conjunction of

(a) ∃ s ∈ T: C ⊂ Fs ∧ D ⊄ Fs
(b) ∃ t ∈ T: D ⊂ Ft.

According to Proposition (3.19), (a) is equivalent to

(c) ∃ s ∈ T: C ⊂ Fs ∧ σ(D) ⊄ Fs,

and for the same reason, (b) is equivalent to

(d) ∃ t ∈ T: σ(D) ⊂ Ft.

However, the conjunction of (c) and (d) is equivalent to C ≺_FT σ(D).

(ii). As mentioned above, C ≺_FT D is equivalent to the conjunction of (a) and (b). According
to Proposition (3.19), (a) is equivalent to

(e) ∃ s ∈ T: σ(C) ⊂ Fs ∧ D ⊄ Fs,

while (b) remains unchanged because it does not involve C. However, the conjunction of
(e) and (b) is equivalent to σ(C) ≺_FT D.

Proof of Theorem 3.11

(i).

(C ≺_FT D ∧ ∃ t ∈ T: E ⊂ Ft)
  ⇔ (∃ r ∈ T: C ⊂ Fr ∧ D ⊄ Fr            [Def. 3.3 (i) (a)]
     ∧ ∃ s ∈ T: D ⊂ Fs                    [Def. 3.3 (i) (b)]
     ∧ ∃ t ∈ T: E ⊂ Ft)                   [part 2 of the premise in (i)]
  ⇒ (∃ r ∈ T: C ⊂ Fr ∧ σ(D ∪ E) ⊄ Fr      [D ⊄ Fr, D ⊂ σ(D ∪ E)]
     ∧ ∃ u ∈ T: σ(D ∪ E) ⊂ Fu)            [u = max{s, t}]
  ⇔ C ≺_FT σ(D ∪ E).                      [Def. 3.3 (i)]

(ii). First of all, note that C, D ≺_FT E is equivalent to the conjunction of:

(a) ∃ r ∈ T: C ⊂ Fr ∧ E ⊄ Fr
(b) ∃ s ∈ T: D ⊂ Fs ∧ E ⊄ Fs
(c) ∃ t ∈ T: E ⊂ Ft

[see Prop. (3.22) and Def. 3.3 (i)]. Without loss of generality, we assume r ≤ s. Because FT
is a filtration, this implies

(d) C, D ⊂ Fs
(e) ∀ u ∈ T: C, D ⊂ Fu ⇔ σ(C ∪ D) ⊂ Fu

[see RS-Prop. (1.9)].

C, D ≺_FT E ⇒ σ(C ∪ D) ≺_FT E. The conjunction of (b), (d), and (e) implies

(f) ∃ s ∈ T: σ(C ∪ D) ⊂ Fs ∧ E ⊄ Fs.

However, the conjunction of (c) and (f) is equivalent to σ(C ∪ D) ≺_FT E [see Def. 3.3 (i)].

C, D ≺_FT E ⇐ σ(C ∪ D) ≺_FT E. As has just been said, the conjunction of (c) and (f) is
equivalent to σ(C ∪ D) ≺_FT E. Now, the conjunction of (f) and (e) implies (a) and (b).
However, the conjunction of (a), (b), and (c) is equivalent to C, D ≺_FT E.

Proof of Theorem 3.13

C ≺_FT D ⇒ C ≺_FT σ(C ∪ D). This proposition is a special case of Theorem 3.11 (i), in which C
also takes the role of E. The existence of a t ∈ T with C ⊂ Ft is part of the premise C ≺_FT D
[see Def. 3.3 (i)].

C ≺_FT D ⇐ C ≺_FT σ(C ∪ D). According to Definition 3.3 (i), C ≺_FT σ(C ∪ D) is equivalent to
the conjunction of

(a) ∃ s ∈ T: C ⊂ Fs ∧ σ(C ∪ D) ⊄ Fs
(b) ∃ t ∈ T: σ(C ∪ D) ⊂ Ft.

According to RS-Proposition (1.9), Proposition (a) implies

(c) ∃ s ∈ T: C ⊂ Fs ∧ D ⊄ Fs.

Furthermore, because D ⊂ σ(C ∪ D), Proposition (b) implies

(d) ∃ t ∈ T: D ⊂ Ft.

However, according to Definition 3.3 (i), the conjunction of (c) and (d) is equivalent to
C ≺_FT D.

Proof of Theorem 3.19

(i).

X, Y ≺_FT Z ⇔ σ(X), σ(Y) ≺_FT σ(Z)          [Def. 3.3 (ii)]
            ⇔ σ(σ(X) ∪ σ(Y)) ≺_FT σ(Z)       [Th. 3.11 (ii)]
            ⇔ σ(X, Y) ≺_FT σ(Z)              [(3.20)]
            ⇒ σ(W) ≺_FT σ(Z)                 [σ(W) ⊂ σ(X, Y), Th. 3.8]
            ⇔ W ≺_FT Z.                      [Def. 3.3 (ii)]

(ii).

(X ≺_FT Y ∧ Y ≺_FT Z) ⇔ (σ(X) ≺_FT σ(Y) ∧ σ(Y) ≺_FT σ(Z))    [Def. 3.3 (ii)]
                       ⇒ (σ(X) ≺_FT σ(Z) ∧ σ(Y) ≺_FT σ(Z))    [Th. 3.6 (ii)]
                       ⇔ σ(X), σ(Y) ≺_FT Z                     [(3.22)]
                       ⇔ X, Y ≺_FT Z                           [Def. 3.3 (ii)]
                       ⇒ W ≺_FT Z.                             [σ(W) ⊂ σ(X, Y), (i)]

Proof of Theorem 3.21

(X ≺_FT Y ∧ ∃ t ∈ T: σ(Z) ⊂ Ft)
  ⇔ (σ(X) ≺_FT σ(Y) ∧ ∃ t ∈ T: σ(Z) ⊂ Ft)    [Def. 3.3 (ii)]
  ⇒ σ(X) ≺_FT σ(σ(Y) ∪ σ(Z))                  [Th. 3.11 (i)]
  ⇒ σ(W) ≺_FT σ(Y, Z)                         [Th. 3.8, (3.20)]
  ⇔ W ≺_FT (Y, Z).                            [Def. 3.3 (ii)]

Proof of Theorem 3.27

(i). The premise

(a) ∃ t ∈ T: C ⊂ Ft

implies

(b) ∀ t ∈ T: C ⊂ Ft ⇔ C ⊂ Ft.

According to Definition 3.24 (i), the conjunction of (a) and (b) is equivalent to C ≈_FT C.

(ii). The premise C ≈_FT D is equivalent to the conjunction of (a) and

(c) ∀ t ∈ T: C ⊂ Ft ⇔ D ⊂ Ft.

Now, the conjunction of (a) and (c) implies

(d) ∃ t ∈ T: D ⊂ Ft

and

(e) ∀ t ∈ T: D ⊂ Ft ⇔ C ⊂ Ft.

However, the conjunction of (d) and (e) is equivalent to D ≈_FT C.

(iii). C ≈_FT D is equivalent to (a) and (c), and D ≈_FT E implies (d) and

(f) ∀ t ∈ T: D ⊂ Ft ⇔ E ⊂ Ft.

Now, the conjunction of (c) and (f) implies

(g) ∀ t ∈ T: C ⊂ Ft ⇔ E ⊂ Ft.

However, the conjunction of (a) and (g) is equivalent to C ≈_FT E.

Proof of Theorem 3.28

(i). C ≈_FT D is equivalent to the conjunction of the following two propositions:

(a) ∃ t ∈ T: C ⊂ Ft
(b) ∀ t ∈ T: C ⊂ Ft ⇔ D ⊂ Ft.

Furthermore, according to Proposition (3.19), (a) is equivalent to

(c) ∃ t ∈ T: σ(C) ⊂ Ft,

and (b) is equivalent to

(d) ∀ t ∈ T: σ(C) ⊂ Ft ⇔ D ⊂ Ft.

However, the conjunction of (c) and (d) is equivalent to σ(C) ≈_FT D.

(ii). As mentioned above, C ≈_FT D is equivalent to the conjunction of (a) and (b). FT
being a filtration implies that each Ft, t ∈ T, is a σ-algebra. Therefore, and because of RS-
Proposition (1.9), (b) is equivalent to:

(e) ∀ t ∈ T: C ⊂ Ft ⇔ σ(C ∪ D) ⊂ Ft.

However, the conjunction of (a) and (e) is equivalent to C ≈_FT σ(C ∪ D) [see Def. 3.24 (i)], to
σ(C ∪ D) ≈_FT C [see Th. 3.27 (ii)], and to σ(C ∪ D) ≈_FT D [see Th. 3.27 (iii)].

Proof of Theorem 3.29

(C ≈_FT D ∧ D ≈_FT E) ⇒ (σ(C ∪ D) ≈_FT D ∧ D ≈_FT E)    [Th. 3.28 (ii)]
                       ⇒ σ(C ∪ D) ≈_FT E.                 [Th. 3.27 (iii)]

Proof of Theorem 3.31

(i). C ≺_FT D is equivalent to the conjunction of

(a) ∃ s ∈ T: C ⊂ Fs ∧ D ⊄ Fs
(b) ∃ t ∈ T: D ⊂ Ft.

Because FT is a filtration, the conjunction of (a) and (b) implies

(c) ∀ t ∈ T: D ⊂ Ft ⇒ C ⊂ Ft.

Because of RS-Proposition (1.9), (c) is equivalent to

(d) ∀ t ∈ T: D ⊂ Ft ⇒ σ(C ∪ D) ⊂ Ft.

Furthermore, because D ⊂ σ(C ∪ D),

(e) ∀ t ∈ T: σ(C ∪ D) ⊂ Ft ⇒ D ⊂ Ft.

However, the conjunction of (d) and (e) is equivalent to

(f) ∀ t ∈ T: D ⊂ Ft ⇔ σ(C ∪ D) ⊂ Ft.

Finally, according to Definition 3.24 (i) and Theorem 3.27 (ii), the conjunction of (b) and
(f) is equivalent to σ(C ∪ D) ≈_FT D.

(ii). C ≈_FT D is equivalent to the conjunction of

(g) ∃ t ∈ T: C ⊂ Ft
(h) ∀ t ∈ T: C ⊂ Ft ⇔ D ⊂ Ft.

Now, (h) implies

(i) ¬∃ t ∈ T: C ⊂ Ft ∧ D ⊄ Ft,

which in turn implies ¬(C ≺_FT D) [see again Def. 3.24 (i)].

Proof of Theorem 3.32

(i). According to Theorem 3.28 (ii), C ≈_FT D implies σ(C ∪ D) ≈_FT D, which in turn is
equivalent to the conjunction of

(a) ∃ t ∈ T: σ(C ∪ D) ⊂ Ft
(b) ∀ t ∈ T: σ(C ∪ D) ⊂ Ft ⇔ D ⊂ Ft

[see Def. 3.24 (i)]. Because of C₀ ⊂ σ(C ∪ D) and Proposition (3.17), Proposition (a) implies

(c) ∃ t ∈ T: C₀ ⊂ Ft.

Now, we distinguish two cases.

Case 1: ∀ t ∈ T: C₀ ⊂ Ft ⇔ σ(C ∪ D) ⊂ Ft. Together with (b), this proposition implies

(d) ∀ t ∈ T: C₀ ⊂ Ft ⇔ D ⊂ Ft.

Now, the conjunction of (c) and (d) is equivalent to C₀ ≈_FT D [see Def. 3.24 (i)].

Case 2: ¬(∀ t ∈ T: C₀ ⊂ Ft ⇔ σ(C ∪ D) ⊂ Ft). In conjunction with (c), this proposition
implies

(e) ∃ s ∈ T: C₀ ⊂ Fs ∧ σ(C ∪ D) ⊄ Fs.

Together with (b), this proposition implies

(f) ∃ s ∈ T: C₀ ⊂ Fs ∧ D ⊄ Fs.

Furthermore, the conjunction of (a) and (b) implies

(g) ∃ t ∈ T: D ⊂ Ft.

However, the conjunction of (f) and (g) is equivalent to C₀ ≺_FT D [see Def. 3.3 (i)].

(ii).

C ≺_FT D ⇒ σ(C ∪ D) ≈_FT D                  [Th. 3.31 (i)]
         ⇒ (C₀ ≺_FT D ∨ C₀ ≈_FT D).          [σ(σ(C ∪ D) ∪ D), (i)]

Proof of Theorem 3.33

(i). According to Theorem 3.28 (ii),

(C ≈_FT D ∧ D ≺_FT E) ⇒ (σ(C ∪ D) ≈_FT D ∧ D ≺_FT E).

The right-hand side of this proposition is equivalent to the conjunction of

(a) ∃ t ∈ T: σ(C ∪ D) ⊂ Ft
(b) ∀ t ∈ T: σ(C ∪ D) ⊂ Ft ⇔ D ⊂ Ft
(c) ∃ s ∈ T: (D ⊂ Fs ∧ E ⊄ Fs)
(d) ∃ r ∈ T: E ⊂ Fr.

Now the conjunction of (b) and (c) implies

(e) ∃ s ∈ T: (σ(C ∪ D) ⊂ Fs ∧ E ⊄ Fs).

However, according to Definition 3.3 (i), the conjunction of (e) and (d) is equivalent to
σ(C ∪ D) ≺_FT E. Now, Proposition (i) follows from Proposition (3.18).

(ii). C ≺_FT D ∧ D ≈_FT E is equivalent to the conjunction of

(f) ∃ s ∈ T: (C ⊂ Fs ∧ D ⊄ Fs)
(g) ∃ t ∈ T: D ⊂ Ft
(h) ∀ t ∈ T: D ⊂ Ft ⇔ E ⊂ Ft.

Now, the conjunction of (f) and (h) implies

(i) ∃ s ∈ T: (C ⊂ Fs ∧ E ⊄ Fs),

and the conjunction of (g) and (h) implies

(j) ∃ t ∈ T: E ⊂ Ft.

However, according to Definition 3.3 (i), the conjunction of (i) and (j) is equivalent to
C ≺_FT E. Now, Proposition (ii) follows from Proposition (3.18).

Proof of Theorem 3.35

(i).

X ≺_FT Y ⇔ σ(X) ≺_FT σ(Y)                              [Def. 3.3 (ii)]
         ⇒ σ(σ(X) ∪ σ(Y)) ≈_FT σ(Y)                     [Th. 3.31 (i)]
         ⇔ σ(X, Y) ≈_FT σ(Y)                            [(3.20)]
         ⇒ (σ(W) ≺_FT σ(Y) ∨ σ(W) ≈_FT σ(Y))            [σ(W) ⊂ σ(X, Y), Th. 3.32 (i)]
         ⇔ (W ≺_FT Y ∨ W ≈_FT Y).                       [Defs. 3.3 (ii), 3.24 (ii)]

(ii). Similarly,

X ≈_FT Y ⇔ σ(X) ≈_FT σ(Y)                              [Def. 3.24 (ii)]
         ⇒ σ(σ(X) ∪ σ(Y)) ≈_FT σ(Y)                     [Th. 3.28 (ii)]
         ⇔ σ(X, Y) ≈_FT σ(Y)                            [(3.20)]
         ⇒ (σ(W) ≺_FT σ(Y) ∨ σ(W) ≈_FT σ(Y))            [σ(W) ⊂ σ(X, Y), Th. 3.32 (i)]
         ⇔ (W ≺_FT Y ∨ W ≈_FT Y).                       [Defs. 3.3 (ii), 3.24 (ii)]

Proof of Theorem 3.39

(i).

∃ t ∈ T: C ⊂ Ft ⇒ C ≈_FT C      [Th. 3.27 (i)]
                ⇒ C ≼_FT C.      [Def. 3.37 (i)]

(ii). According to Theorem 3.6 (i),

(a) C ≺_FT D ⇒ ¬(D ≺_FT C)

and, according to Theorems 3.31 (ii) and 3.27 (ii),

(b) C ≈_FT D ⇒ ¬(D ≺_FT C).

Hence, in both cases in which C ≼_FT D holds, we can conclude ¬(D ≺_FT C), and in both cases
in which D ≼_FT C holds, we can conclude ¬(C ≺_FT D). Therefore,

(C ≼_FT D ∧ D ≼_FT C) ⇒ ¬(C ≺_FT D) ∧ ¬(D ≺_FT C)      [(a), (b), Def. 3.37 (i)]
                       ⇒ ¬(C ≺_FT D ∧ D ≺_FT C)
                       ⇔ ¬(C ≺_FT D) ∨ ¬(D ≺_FT C).     [de Morgan]

Now ¬(C ≺_FT D) implies C ≈_FT D, because we assume C ≼_FT D [see Def. 3.37 (i)], and
¬(D ≺_FT C) implies C ≈_FT D, because we assume D ≼_FT C [see Def. 3.37 (i) and Th. 3.27 (ii)].

(iii). We presume ∃ s, t ∈ T: C ⊂ Fs ∧ D ⊂ Ft.

Case 1: (∀ t ∈ T: C ⊂ Ft ⇔ D ⊂ Ft)
  ⇒ C ≈_FT D                      [∃ s ∈ T: C ⊂ Fs, Def. 3.24 (i)]
  ⇒ C ≼_FT D                      [Def. 3.37 (i)]
  ⇒ (C ≼_FT D ∨ D ≼_FT C).

Case 2: ¬(∀ t ∈ T: C ⊂ Ft ⇔ D ⊂ Ft)
  ⇒ ∃ t ∈ T: ((C ⊂ Ft ∧ D ⊄ Ft) ∨ (C ⊄ Ft ∧ D ⊂ Ft))
  ⇒ (C ≺_FT D ∨ D ≺_FT C)         [Def. 3.3 (i)]
  ⇒ (C ≼_FT D ∨ D ≼_FT C).        [Def. 3.37 (i)]

(iv). Note that

(C ≼_FT D ∧ D ≼_FT E)
  ⇔ ((C ≺_FT D ∨ C ≈_FT D) ∧ (D ≺_FT E ∨ D ≈_FT E))     [Def. 3.37 (i)]
  ⇔ ((C ≺_FT D ∧ D ≺_FT E) ∨ (C ≺_FT D ∧ D ≈_FT E) ∨ (C ≈_FT D ∧ D ≺_FT E) ∨ (C ≈_FT D ∧ D ≈_FT E)).

The latter proposition follows from repeatedly applying the first distributive law. Accord-
ing to the last proposition, we consider four cases.

Case 1: (C ≺_FT D ∧ D ≺_FT E) ⇒ C ≺_FT E      [Th. 3.6 (ii)]
                               ⇒ C ≼_FT E.      [Def. 3.37 (i)]

Case 2: (C ≺_FT D ∧ D ≈_FT E) ⇒ C ≺_FT E      [Th. 3.33 (ii)]
                               ⇒ C ≼_FT E.      [Def. 3.37 (i)]

Case 3: (C ≈_FT D ∧ D ≺_FT E) ⇒ C ≺_FT E      [Th. 3.33 (i)]
                               ⇒ C ≼_FT E.      [Def. 3.37 (i)]

Case 4: (C ≈_FT D ∧ D ≈_FT E) ⇒ C ≈_FT E      [Th. 3.27 (iii)]
                               ⇒ C ≼_FT E.      [Def. 3.37 (i)]

Proof of Theorem 3.40

C ≼_FT D ⇔ (C ≺_FT D ∨ C ≈_FT D)        [Def. 3.37 (i)]
         ⇒ (C₀ ≺_FT D ∨ C₀ ≈_FT D)       [Th. 3.32]
         ⇔ C₀ ≼_FT D.                    [Def. 3.37 (i)]

Proof of Corollary 3.41

∃ t ∈ T: C ⊂ Ft ⇒ C ≼_FT C       [Th. 3.39 (i)]
                ⇒ C₀ ≼_FT C.      [Th. 3.40]

Proof of Theorem 3.42

C ≼_FT D ⇔ (C ≺_FT D ∨ C ≈_FT D)                                          [Def. 3.37 (i)]
         ⇒ (C ≺_FT σ(C ∪ D) ∨ C ≈_FT σ(C ∪ D))                             [Ths. 3.13, 3.27 (ii), 3.28 (ii)]
         ⇒ (C₀ ≺_FT σ(C ∪ D) ∨ (C₀ ≺_FT σ(C ∪ D) ∨ C₀ ≈_FT σ(C ∪ D)))       [Th. 3.32]
         ⇔ (C₀ ≺_FT σ(C ∪ D) ∨ C₀ ≈_FT σ(C ∪ D))
         ⇔ C₀ ≼_FT σ(C ∪ D).                                               [Def. 3.37 (i)]

Proof of Theorem 3.44

First of all, note that

C, D ≼_FT E
  ⇔ (C ≼_FT E ∧ D ≼_FT E)                                    [(3.44)]
  ⇔ ((C ≺_FT E ∨ C ≈_FT E) ∧ (D ≺_FT E ∨ D ≈_FT E))           [Def. 3.37 (i)]
  ⇔ ((C ≺_FT E ∧ D ≺_FT E) ∨ (C ≺_FT E ∧ D ≈_FT E) ∨ (C ≈_FT E ∧ D ≺_FT E) ∨ (C ≈_FT E ∧ D ≈_FT E)).

The last proposition follows from repeatedly applying the first distributive law. Accord-
ingly, we consider four cases.

Case 1: (C ≺_FT E ∧ D ≺_FT E) ⇔ σ(C ∪ D) ≺_FT E               [Th. 3.11 (ii)]
                               ⇒ σ(C ∪ D) ≼_FT E.              [Def. 3.37 (i)]

Case 2: (C ≺_FT E ∧ D ≈_FT E) ⇒ (σ(C ∪ E) ≈_FT E ∧ E ≈_FT D)   [Ths. 3.31 (i), 3.27 (ii)]
                               ⇒ σ(σ(C ∪ D) ∪ E) ≈_FT E         [Ths. 3.29, 3.27 (iii)]
                               ⇒ σ(C ∪ D) ≼_FT E.               [Th. 3.32 (i), Def. 3.37 (i)]

Case 3: (C ≈_FT E ∧ D ≺_FT E) ⇔ (D ≺_FT E ∧ C ≈_FT E)
                               ⇒ σ(C ∪ D) ≼_FT E.               [Case 2, σ(D ∪ C) = σ(C ∪ D)]

Case 4: (C ≈_FT E ∧ D ≈_FT E) ⇒ (C ≈_FT D ∧ D ≈_FT E)           [Th. 3.27 (ii), (iii)]
                               ⇒ σ(C ∪ D) ≈_FT E                [Th. 3.29]
                               ⇒ σ(C ∪ D) ≼_FT E.               [Def. 3.37 (i)]

Proof of Theorem 3.45

(i).

C ≼_FT D ⇔ (C ≺_FT D ∨ C ≈_FT D)              [Def. 3.37 (i)]
         ⇔ (C ≺_FT σ(D) ∨ C ≈_FT σ(D))         [Ths. 3.9 (i), 3.28 (i), 3.27 (ii)]
         ⇔ C ≼_FT σ(D).                        [Def. 3.37 (i)]

(ii). Similarly,

C ≼_FT D ⇔ (C ≺_FT D ∨ C ≈_FT D)              [Def. 3.37 (i)]
         ⇔ (σ(C) ≺_FT D ∨ σ(C) ≈_FT D)         [Ths. 3.9 (ii), 3.28 (i)]
         ⇔ σ(C) ≼_FT D.                        [Def. 3.37 (i)]

Proof of Theorem 3.46

(i).

(X ≈_FT Y ∧ Y ≈_FT Z) ⇒ X ≈_FT Z       [Box 3.2 (vi)]
                       ⇒ W ≼_FT Z.      [Th. 3.35 (ii), Def. 3.37 (ii)]

(ii).

(X ≼_FT Y ∧ Y ≼_FT Z) ⇔ (σ(X) ≼_FT σ(Y) ∧ σ(Y) ≼_FT σ(Z))     [Def. 3.37 (ii)]
                       ⇒ σ(X) ≼_FT σ(Z)                        [Th. 3.39 (iv)]
                       ⇔ X ≼_FT Z                              [Def. 3.37 (ii)]
                       ⇒ W ≼_FT Z.                             [Th. 3.35 (ii), (i)]

Proof of Theorem 3.47

X, Y ≼_FT Z ⇔ (X ≼_FT Z ∧ Y ≼_FT Z)               [Prop. (3.48)]
            ⇔ (σ(X) ≼_FT σ(Z) ∧ σ(Y) ≼_FT σ(Z))    [Def. 3.37 (ii)]
            ⇒ σ(σ(X) ∪ σ(Y)) ≼_FT σ(Z)             [Th. 3.44]
            ⇔ σ(X, Y) ≼_FT σ(Z)                    [(3.20)]
            ⇒ (X, Y) ≼_FT Z                        [Def. 3.37 (ii)]
            ⇒ W ≼_FT Z.                            [σ(W) ⊂ σ(X, Y), Box 3.3 (iv)]

3.7 Exercises

⊲ Exercise 3-1 Which are the elements of the σ-algebra σ(U ) in Example 3.2?

⊲ Exercise 3-2 Consider Table 1.2 and write down the inverse image X⁻¹(A′) of the set A′ = {1} in
terms of a subset of Ω = {ω1, ω2, ..., ω8}.

⊲ Exercise 3-3 Consider the random experiment presented in Example 3.2 and enumerate all ele-
ments of the σ-algebra generated by X. Instead of specifying the value space (Ω′X, P(Ω′X)) of X with
Ω′X = {0, 1} as in Example 3.2, choose (R, B), where B denotes the Borel σ-algebra on the set R of real
numbers (see RS-Rem. 1.14).

⊲ Exercise 3-4 Consider Table 1.2 and write down the inverse images

(U, X)⁻¹(A′i) = {ω ∈ Ω: (U, X)(ω) ∈ A′i},   i = 1, 2,

in terms of subsets of Ω = {ω1, ω2, ..., ω8} for A′1 = {(Joe, 0)} and A′2 = {(Joe, 1), (Ann, 0), (Ann, 1)}.

⊲ Exercise 3-5 Consider Example 3.5 and name three more pairs of set systems such that one is
prior in FT to the other.

⊲ Exercise 3-6 Prove: (A ⊂ B ∧ A ⊂ C) ⇔ A ⊂ (B ∩ C).

⊲ Exercise 3-7 Prove: (A ⊂ C ∧ B ⊂ C) ⇔ (A ∪ B) ⊂ C.

⊲ Exercise 3-8 Prove: (A ⊂ B ∧ B ⊂ C) ⇒ A ⊂ C.

⊲ Exercise 3-9 Prove: If X is a real-valued measurable map on (Ω,A ) and W = α · X , α ∈ R , then W


is X -measurable.

⊲ Exercise 3-10 Show: If W = β1 X 1 + β2 X 2 with β1 , β2 ∈ R , then W is (X 1 , X 2 )-measurable.

⊲ Exercise 3-11 Prove Proposition (iv) of Box 3.1.

⊲ Exercise 3-12 Prove Propositions (x) and (xi) of Box 3.2.

Solutions

⊲ Solution 3-1 The probability space (Ω, A, P) representing the random experiment described in
Table 1.2 is specified in Example 3.2. Furthermore, the random variable U is specified in Table 1.2.
The σ-algebra σ(U) has four elements. Aside from Ω and Ø, these are the events

C = {Joe} × ΩX × ΩY = {(Joe, no, −), (Joe, yes, −), (Joe, no, +), (Joe, yes, +)}

that Joe is drawn and

Cᶜ = {Ann} × ΩX × ΩY = {(Ann, no, −), (Ann, yes, −), (Ann, no, +), (Ann, yes, +)}

that Ann is drawn.


⊲ Solution 3-2 This inverse image is X⁻¹(A′) = X⁻¹({1}) = {ω3, ω4, ω7, ω8}. This is the event that X
takes on the value 1.
⊲ Solution 3-3 There are four different inverse images of sets B ∈ B under X:

∀ B ∈ B:  X⁻¹(B) = {ω1, ω2, ω5, ω6}   if 0 ∈ B and 1 ∉ B,
          X⁻¹(B) = {ω3, ω4, ω7, ω8}   if 0 ∉ B and 1 ∈ B,
          X⁻¹(B) = Ω                  if 0 ∈ B and 1 ∈ B,
          X⁻¹(B) = Ø                  if 0 ∉ B and 1 ∉ B.

These four inverse images are the elements of σ(X). The same four elements are obtained if, instead
of (R, B), we choose (Ω′X, P(Ω′X)) = ({0, 1}, {{0}, {1}, {0, 1}, Ø}) as the value space of X (cf. Exercise
3-2).
⊲ Solution 3-4 The first inverse image is

(U, X)⁻¹(A′1) = (U, X)⁻¹({(Joe, 0)}) = {ω1, ω2}.

This is the event that Joe is drawn and not treated. The second inverse image is

(U, X)⁻¹(A′2) = (U, X)⁻¹({(Joe, 1), (Ann, 0), (Ann, 1)}) = {ω3, ω4, ω5, ω6, ω7, ω8}

[consider again Eqs. (3.11) to (3.13)]. This is the event that Joe is drawn and treated, or that Ann is drawn.
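
The enumerations in Solutions 3-1 to 3-4 can also be reproduced by brute force. The following Python sketch is only illustrative; the tuple encoding of ω1, ..., ω8 (in the ordering of Tables 1.2 and 4.1) and the helper names are our own choices and not part of the formal development.

# Outcomes ω1, ..., ω8 encoded as (unit, treatment, success) triples:
Omega = [(u, t, s) for u in ("Joe", "Ann") for t in ("no", "yes") for s in ("-", "+")]

U = lambda w: w[0]                       # person variable
X = lambda w: 1 if w[1] == "yes" else 0  # treatment variable

def inverse_image(f, B):
    """f⁻¹(B) = {ω ∈ Ω: f(ω) ∈ B}."""
    return {w for w in Omega if f(w) in B}

def generated_sigma_algebra(f):
    """σ(f): the inverse images f⁻¹(B) of all subsets B of the value set of f."""
    values = sorted(set(map(f, Omega)), key=str)
    return {frozenset(inverse_image(f, {v for i, v in enumerate(values) if k >> i & 1}))
            for k in range(2 ** len(values))}

print(inverse_image(X, {1}))              # the outcomes ω3, ω4, ω7, ω8 of Solution 3-2
print(len(generated_sigma_algebra(U)))    # 4, the number of elements of σ(U) in Solution 3-1
UX = lambda w: (U(w), X(w))
print(inverse_image(UX, {("Joe", 0)}))    # the outcomes ω1, ω2 of Solution 3-4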
⊲ Solution 3-5 σ(U ) is prior in FT to σ(U , X ), σ(U ) is prior to σ(U , X ,Y ), and σ(X ) is prior to
σ(U , X ,Y ).
⊲ Solution 3-6

(A ⊂ B ∧ A ⊂ C ) ⇔ ∀a ∈ A : a ∈ B ∧ a ∈ C
⇔ ∀a ∈ A : a ∈ (B ∩C )
⇔ A ⊂ (B ∩C ).

⊲ Solution 3-7

(A ⊂ C ∧ B ⊂ C ) ⇔ ∀a ∈ A : a ∈ C ∧ ∀b ∈B : b ∈ C
⇔ ∀c ∈ A ∪ B : c ∈ C
⇔ A ∪B ⊂ C.

⊲ Solution 3-8

(A ⊂ B ∧ B ⊂ C ) ⇔ ∀a ∈ A : a ∈ B ∧ ∀b ∈ B : b ∈ C
⇒ ∀a ∈ A : a ∈ C
⇔ A ⊂ C.

⊲ Solution 3-9 If α ∈ R , then W =α·X is the composition g (X ) of the measurable maps X : (Ω,A ) →
(R, B) and g : (R,B) → (R,B) defined by

g (x) = α · x, ∀x ∈ R.

According to Remark 3.17, W is X -measurable.


⊲ Solution 3-10 If β1 , β2 ∈ R , then W = β1 X 1 + β2 X 2 is the composition g (X 1 , X 2 ) of the bivariate
measurable map (X 1 , X 2 ): (Ω,A ) → (R 2, B 2 ) and the measurable map g : (R 2, B 2 ) → (R, B) defined
by

g (x1 , x2 ) = β1 x1 + β2 x2 , ∀(x1 , x2 ) ∈ R 2.

According to Remark 3.17, W is (X 1 , X 2 )-measurable.


⊲ Solution 3-11

X ≺_FT Y ⇔ σ(X) ≺_FT σ(Y)                 [Def. 3.3 (ii)]
         ⇔ σ(X) ≺_FT σ(σ(X) ∪ σ(Y))        [Th. 3.13]
         ⇔ σ(X) ≺_FT σ(X, Y)               [(3.20)]
         ⇒ σ(W) ≺_FT σ(X, Y)               [Th. 3.8]
         ⇔ W ≺_FT (X, Y).                  [Def. 3.3 (ii)]

⊲ Solution 3-12 Box 3.2 (x).

(X ≈_FT Y ∧ Y ≺_FT Z)
  ⇔ (σ(X) ≈_FT σ(Y) ∧ σ(Y) ≺_FT σ(Z))               [Defs. 3.3 (ii), 3.24 (ii)]
  ⇔ (σ(σ(X) ∪ σ(Y)) ≈_FT σ(Y) ∧ σ(Y) ≺_FT σ(Z))      [Th. 3.28 (i)]
  ⇒ σ(W) ≺_FT σ(Z)                                    [σ(W) ⊂ σ(X, Y), Th. 3.33 (i)]
  ⇔ W ≺_FT Z.                                         [Def. 3.3 (ii)]

Box 3.2 (xi).

(X ≺_FT Y ∧ Y ≈_FT Z)
  ⇔ (σ(X) ≺_FT σ(Y) ∧ σ(Y) ≈_FT σ(Z))               [Defs. 3.3 (ii), 3.24 (ii)]
  ⇒ σ(W) ≺_FT σ(Z)                                   [σ(W) ⊂ σ(X), Th. 3.33 (ii)]
  ⇔ W ≺_FT Z.                                        [Def. 3.3 (ii)]
Chapter 4
Regular Causality Space and Potential Confounder

In chapter 1 we studied some examples showing that the conditional expectation values
E (Y |X =x ) of an outcome variable Y and their differences E (Y |X =x ) − E (Y |X =x ′ ), the
prima facie effects, can be seriously misleading in evaluating the causal effect of a treat-
ment variable X on an outcome variable Y. These examples demonstrate that the standard
probabilistic concepts — such as conditional expectation values or conditional probabili-
ties — cannot be used offhandedly to define the causal effects in which we are interested
when we have to evaluate a treatment, an intervention, or an exposition. In chapter 2, we
described random experiments of various research designs in which a causal total effect is
of interest. In chapter 3 we started the mathematical theory of causal effects introducing
the prior-to, simultaneous-to, and prior-or-simultaneous-to relations among measurable
set systems and among measurable maps. In the context of a probability space (Ω, A, P ),
these relations also apply to sets of events and random variables, respectively.

Overview

In the present chapter, we introduce the concepts of a regular causality space and a regular
causality setup. A regular causality setup provides the mathematical structure that allows
us to define a putative cause variable X , an outcome variable Y of X , a potential confounder
of X , and a potential mediator between X and Y . In many cases, conditional expectations
describing a causal dependence can be distinguished from conditional expectations that
have no such causal interpretation by their relationship to the potential confounders of
the putative cause variable X . This will be substantiated in the chapters on causality con-
ditions.
Just as in chapter 3, none of the concepts treated in this chapter involves a probabi-
lity measure. Instead, a measurable space (Ω, A ) (i. e., a set Ω and a σ-algebra A on Ω)
suffices. Note, however, that all properties of a regular causality space still hold if we add
a probability measure P on (Ω, A ), considering a probability space (Ω, A, P ). This will be
necessary as soon as we turn to random variables and their stochastic dependencies on
each other (see the chapters to come). In empirical applications, (Ω, A, P ) represents a
concrete random experiment, Ω the set of possible outcomes, and A the set of possible
events in this random experiment.

Prerequisites

Reading this chapter requires that the reader is familiar with the concepts of a σ-algebra, a
measurable space, and a measurable map as treated, for example, in the first two chapters
of Steyer (2024). These chapters will be referred to as RS-chapter 1 and RS-chapter 2, and
the same kind of shortcut is used when referring to other parts of that book.

4.1 Regular Causality Space and Setup

In this section, we introduce the notions of a regular causality space and a regular proba-
bilistic causality space. A regular causality space is the minimal formal framework in which
we can introduce the concepts of a putative cause variable, a potential confounder, a po-
tential mediator, and an outcome variable. In a regular probabilistic causality space we
can define true outcome variables, (causal) unbiasedness, and causality conditions, which
imply unbiasedness of various conditional expectations and their differences.

4.1.1 Regular Causality Space

Defining a regular causality space (see Def. 4.6), we refer to a measurable space (Ω, A )
that is assumed to be the product of three measurable spaces (Ωt, At), t ∈ T = {1, 2, 3} (see
RS-Def. 1.15). This implies that Ω is the set product of the three sets Ω1, Ω2, and Ω3, each of
which can itself be a set product of other sets.

Remark 4.1 [Intuitive Background of the Product Space (Ω, A )] When we consider the
effect of a putative cause variable X (treatment, intervention, or exposition variable) on
an outcome variable Y that is assessed for an observational unit to be sampled (often-
times a person), then there are variables that are prior to X . They represent attributes of
the observational unit before treatment. It is obvious that these pretreatment attributes or
their fallible observations cannot be caused by a subsequent treatment. Simple examples
are age, sex, race, socioeconomic status, or educational status before treatment. The set Ω1
occurring in Definition 4.6 should be chosen such that these pretreatment variables solely
depend on the elements of Ω1 .
Next, there might be variables that are simultaneous to X , for example, a second treat-
ment variable that varies at the same time as X . For example, you can drink coffee (or not)
and alcohol (or not) at the same time. The set Ω2 in Definition 4.6 should be chosen such
that variables that are simultaneous to a focused putative cause variable X (including X
itself) only depend on the elements of Ω2 .
Finally, the set Ω3 should be chosen such that the outcome variable Y depends, at least
in part, on the elements of this set. For details and a mathematical formulation of these
ideas, see Definitions 4.6 and 4.11. ⊳

Remark 4.2 [Intuitive Background of the Projections] Definition 4.6 refers to the projec-
tions (or coordinate maps) π1 to π3 (see, e. g., RS-Def. 2.27). Intuitively speaking, the pro-
jection π1 is a map that contains the information about all events that are prior to the pu-
tative cause variable. In any case, all nonconstant maps that only depend — in the sense of
measurability, not in the sense of a probabilistic dependence — on π1 are potential con-
founders.
In contrast, π2 contains the information about all events that are simultaneous to the
putative cause variable, say X , including the events represented by X itself. The projection
π2 can be multidimensional, that is, it may consist of several projections π2j . Hence, π2 =
(π2j , j ∈ J ) is a family of projections for a nonempty index set J .
Finally, the projection π3 contains the information about all events that are simultane-
ous or posterior to the outcome variable, say Y . We will require that Y depends at least in
part on π3. This will allow us to choose V2 − V1 as an outcome variable, where V2 is a function of π3, and V1 a function of π1. Examples for V1 and V2 are achievement, well-being, or


health scores, before and after treatment, respectively. ⊳

Remark 4.3 [Filtration (Ft )t ∈T ] In Definition 4.6 we specify a specific filtration (Ft )t ∈T ,
T = {1, 2, 3}. From the perspective of the focused putative cause variable, the σ-algebras
F1 and F2 represent the sets of past and present events, respectively. In contrast, F3 rep-
resents all events that are posterior to X . ⊳

Remark 4.4 [Cause σ-Algebra] In Remark 4.2, we already discussed the relation between
the projection π2 and a putative cause variable X . In Definition 4.6 (i), this relationship is
made explicit for the cause σ-algebra C with respect to which we will require a putative
cause variable X to be measurable (see Def. 4.11). ⊳

Remark 4.5 [Potential Confounder σ-Algebra] Intuitively speaking, we consider all vari-
ables as potential confounders of a putative cause variable X that are prior or simultane-
ous to X , except for X itself. In Definition 4.6 we introduce the concept of a confounder
σ-algebra of C , denoted DC . This σ-algebra DC can be interpreted to represent the set of
past and present possible events from the perspective of the cause σ-algebra C , except for
those that are represented by C itself. The σ-algebra DC also plays a crucial role in the
definition of a true outcome variable and all other concepts based thereon (see ch. 5). ⊳

Definition 4.6 [Regular Causality Space]


Let (Ω, A ) denote the product of the measurable spaces (Ωt , At ), t ∈T = {1, 2, 3}, where
for all ωt ∈ Ωt , t ∈T , we assume {ωt } ∈ At . Furthermore, let πt : Ω → Ωt , t ∈T , denote the
projection with πt (ω) = ωt , for all ω ∈ Ω, and define the filtration (Ft )t ∈T by

F1 := σ(π1 ), F2 := σ(π1 , π2), F3 := σ(π1 , π2 , π3 ). (4.1)

Finally, let π2 = (π2j, j ∈ J), where J is a nonempty finite subset of N, and assume that
{Ω, Ø} ≠ σ(π2j, j ∈ K), Ø ≠ K ⊂ J. Then we call
(i) C := σ(π2j, j ∈ K), Ø ≠ K ⊂ J, the cause σ-algebra,
(ii) DC := σ(π1, π2j, j ∈ J \ K) the confounder σ-algebra of C,
(iii) ((Ω, A), (Ft)t∈T, C, DC) a regular causality space.

Remark 4.7 [Regular Probabilistic Causality Space] Beginning with chapter 5, we will ad-
ditionally consider a probability measure P on the σ-algebra A, and then

((Ω, A, P), (Ft)t∈T, C, DC)

will be called a regular probabilistic causality space. ⊳

Remark 4.8 [Cause σ-Algebra C and Confounder σ-Algebra DC ] According to Definition


4.6 (i), the cause σ-algebra C can be chosen to be generated by π2 itself or by any proper
subset of the projections π2j , j ∈ J , that constitute π2. This choice of the cause σ-algebra C
also determines the confounder σ-algebra DC . In particular, the choice of the index set
K ⊂ J and with it, the choice of C , determines which projections π2j generate, together
with π1, the confounder σ-algebra DC [see Def. 4.6 (ii)]. ⊳

Table 4.1. Joe and Ann with a single treatment variable

Possible outcomes      Unit π1   Treatment π2   Success π3   Treatment variable X   Outcome variable Y
ω1 = (Joe, no, −)      Joe       no             −            0                      0
ω2 = (Joe, no, +)      Joe       no             +            0                      1
ω3 = (Joe, yes, −)     Joe       yes            −            1                      0
ω4 = (Joe, yes, +)     Joe       yes            +            1                      1
ω5 = (Ann, no, −)      Ann       no             −            0                      0
ω6 = (Ann, no, +)      Ann       no             +            0                      1
ω7 = (Ann, yes, −)     Ann       yes            −            1                      0
ω8 = (Ann, yes, +)     Ann       yes            +            1                      1

Remark 4.9 [(Ft )t ∈T Is a Filtration in A ] The family of σ-algebras (Ft )t ∈T , T = {1, 2, 3},
specified in the definition of a regular causality space [see Eq. (4.1)] is a filtration in A.
This immediately follows from RS-Equation (1.3) and RS-Equation (2.15). ⊳

Example 4.10 [Joe and Ann With a Single Treatment Variable] We illustrate the concepts
treated in Definition 4.6 by the kind of experiment presented in Table 4.1. It is the same
kind of experiment as already presented in Table 1.2. The first part of the experiment con-
sists of sampling a person u from the set Ω1 = { Joe, Ann }. Then the sampled person re-
ceives (yes) or does not receive (no) a treatment. The two treatment conditions are the
elements of the set Ω2 = {no, yes}. Finally, it is observed whether or not a success criterion
is reached at some appropriate time after treatment. These two possible outcomes are the
elements of the set Ω3 = {−, +}. Hence, the sets Ωt , t ∈T = {1, 2, 3}, occurring in Definition
4.6 are
Ω1 = { Joe, Ann }, Ω2 = {no, yes}, Ω3 = {−, +}.

Furthermore, we choose the σ-algebras on these sets to be At = P (Ωt ), t ∈T , the cor-


responding power sets. Because, in Definition 4.6, we require that (Ω, A ) is the product
of the measurable spaces (Ωt , At ) (see again RS-Def. 1.15), the measurable space (Ω, A ) is
completely specified, and with it the projections π1, π2, and π3 . The assignment of values
of these projections to all elements ωi of Ω is explicitly shown in Table 4.1. Note that the
projection π1 is identical to the person variable U specified in Table 1.2.
The σ-algebras of the filtration (Ft )t ∈T , T = {1, 2, 3}, are

F1 = σ(π1 ), F2 = σ(π1, π2), F3 = σ(π1, π2, π3 ) (4.2)

(see Exercises 4-1 to 4-4). Furthermore, according to Definition 4.6, in this kind of experiment,

C = σ(π2) = { {ω1, ω2, ω5, ω6}, {ω3, ω4, ω7, ω8}, Ω, Ø }        (4.3)

is necessarily the cause σ-algebra because J = K = {1}, implying π2 = π21 = (π2j, j ∈ K) (see
Exercises 4-5 and 4-6). Finally,

DC = σ(π1) = { {ω1, ..., ω4}, {ω5, ..., ω8}, Ω, Ø }        (4.4)

is the confounder σ-algebra of C because J = K. Hence, we completely specified the regular
causality space ((Ω, A), (Ft)t∈T, C, DC).
Note that the structure of this kind of experiment is essentially the same for every sim-
ple experiment of the type described in section 2.1. Only the numbers of values of the
projections πt increase if we consider more than two observational units, more than two
treatment conditions, or more than two values of the outcome variable. ⊳
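
Since Ω has only eight elements, the objects of Example 4.10 can also be generated exhaustively. The following Python sketch is merely an illustrative check of Equations (4.2) to (4.4); the encoding of the outcomes as triples and the helper function sigma are our own choices and not part of the formal development.

# Outcomes ω = (π1(ω), π2(ω), π3(ω)) of Table 4.1:
Omega = [(u, t, s) for u in ("Joe", "Ann") for t in ("no", "yes") for s in ("-", "+")]

def sigma(*maps):
    """σ(f1, ..., fk) on the finite Ω: all inverse images of sets of value combinations."""
    value = lambda w: tuple(f(w) for f in maps)
    vals = sorted(set(map(value, Omega)))
    algebra = set()
    for k in range(2 ** len(vals)):
        B = {v for i, v in enumerate(vals) if k >> i & 1}
        algebra.add(frozenset(w for w in Omega if value(w) in B))
    return algebra

pi1, pi2, pi3 = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[2])

F1, F2, F3 = sigma(pi1), sigma(pi1, pi2), sigma(pi1, pi2, pi3)   # Eq. (4.2)
C, DC = sigma(pi2), sigma(pi1)                                   # Eqs. (4.3), (4.4)

print(F1 <= F2 <= F3)                   # True: (Ft)t∈T is a filtration in A
print(len(C), len(DC))                  # 4 and 4, as in Eqs. (4.3) and (4.4)
print(sorted(map(sorted, C), key=len))  # the four events of the cause σ-algebra C

The output of the last line lists Ø, the two treatment events, and Ω, which is exactly the set system displayed in Equation (4.3).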

4.1.2 Regular Causality Setup and Potential Confounder

Now we introduce a regular causality setup, which, additionally to a regular causality


space, comprises a putative cause variable and an outcome variable. Furthermore, we de-
fine the concepts of a potential confounder of a putative cause variable and a global poten-
tial confounder of a putative cause variable. In particular it is explicated how these maps
are related to a regular causality space.

Definition 4.11 [Regular Causality Setup and Potential Confounder]

Let ((Ω, A), (Ft)t∈T, C, DC) be a regular causality space and X, Y, DX, W measurable
maps on (Ω, A). Then we call
(i) X a putative cause variable of Y if
    (a) {Ω, Ø} ≠ σ(X) ⊂ C = σ(π2j, j ∈ K)
    (b) ¬∃ K₀ ⊂ K, K₀ ≠ K, such that σ(X) ⊂ σ(π2j, j ∈ K₀)
(ii) Y an outcome or response variable if σ(Y) ⊄ F2,
(iii) DX a global potential confounder of X if σ(DX) = DC,
(iv) W a potential confounder of X if σ(W) ⊂ σ(DX),
(v) ((Ω, A), (Ft)t∈T, C, DC, X, Y) a regular causality setup.

A potential confounder W of X is called trivial if σ(W) = {Ω, Ø}.

In Definition 4.11, we require that all variables introduced in this definition are mea-
surable maps on the same measurable space (Ω, A ). Specifying (Ω, A ) and (Ft )t ∈T , we fix
which kind of experiment we are talking about, and with X and Y, we choose the puta-
tive cause variable and the outcome variable, respectively. Finally, with D X we specify the
global potential confounder of X . Note that the set of all potential confounders of X is the
set of all measurable maps on (Ω, A ) that are D X -measurable.

Remark 4.12 [Regular Probabilistic Causality Setup] As mentioned before, in chapter 5,


we will additionally consider a probability measure P on the σ-algebra A, and then

((Ω, A, P), (Ft)t∈T, C, DC, X, Y)

will be called a regular probabilistic causality setup. ⊳



Remark 4.13 [Putative Cause Variable] In Definition 4.11 (i), we postulate σ(X) ≠ {Ω, Ø}.
Hence, a putative cause variable X is not a constant. Requiring σ(X ) ⊂ C — and not
σ(X ) = C — means that X is not necessarily the only putative cause variable we might con-
sider in the framework of a given regular causality space. Instead we can also coarsen an
original putative cause variable and consider a new one with fewer values (see the example
in sect. 4.2.1). According to condition (b) of Definition 4.11 (i), there is no proper subset K 0
of K such that a putative cause variable X is measurable with respect to (π2j , j ∈ K 0 ). This
requirement secures, for example, that we cannot choose a putative cause variable that is
measurable with respect to a single projection, say π21 , if we choose C to be generated by
(π21 , π22). If, in a setting with π2 = (π21 , π22), a π21 -measurable putative cause variable X is
desired, then we have to choose C = σ(π21 ), implying σ(π22 ) ⊂ DC = σ(π1, π22 ). Hence, if
π2 = (π21 , π22 ) and σ(X ) ⊂ σ(π21 ), then the projection π22 is a potential confounder of X .
For more details see the first regular causality setup presented in the example of section
4.2.2. ⊳
Remark 4.14 [Outcome Variable] Requiring σ(Y) ⊄ F2, we ensure that an outcome vari-
able is neither a constant nor a potential confounder of X . We allow for outcome variables
that are functions of a potential confounder and a map that is measurable with respect
to σ(π3 ). A typical example is a change variable Y = V2 − V1 between a σ(π3 )-measurable
variable V2 and a σ(π1 )-measurable V1 — for example, a pretest — assessing the same at-
tribute as V2, but before treatment. In this case, the change variable still depends on the
elements of Ω3 , but not exclusively (see Rem. 4.2).
Of course, a σ(π3 )-measurable map is an outcome variable as well. The definition of
an outcome variable also allows us to rescale, coarsen, or aggregate a new outcome vari-
able from other outcome variables. For example, instead of considering a ten-dimensional
outcome variable (Y1 , . . . , Y10 ) representing the binary answers to ten items in a question-
naire, one might be interested in a uni-dimensional outcome variable Y defined as the
sum Y1 + . . . + Y10 . ⊳
Remark 4.15 [Potential Confounder] According to Definition 4.11 (iv), each non-constant
measurable map W satisfying σ(W ) ⊂ DC is a potential confounder of X . Hence, each pre-
treatment variable W — which is measurable with respect to the projection π1 — is a po-
tential confounder of a putative cause variable X [see Def. 4.11 (iv)]. However, a treatment
variable can also be a potential confounder of another treatment variable if this other
treatment variable is focused as the putative cause variable [see Defs. 4.6 (ii) and 4.11 (i)]. ⊳

Remark 4.16 [Covariate of X ] If W is a potential confounder of X , then we also call it a
covariate of X , in particular in a context in which we condition on W and X in a con-
ditional expectation such as E (Y | X ,W ) or in a conditional distribution P Y |X ,W . From a
mathematical point of view, the terms potential confounder of X and covariate of X are
synonyms. Note, however, that, in the statistical literature, the term ‘covariate’ is not used
unanimously. Sometimes it even refers to a putative cause variable of Y . ⊳
Remark 4.17 [Global Potential Confounder] According to Definition 4.11 (iii), each non-
constant measurable map D X satisfying σ(D X ) = DC is a global or comprehensive potential
confounder of X . Hence, instead of specifying the σ-algebra DC we may also specify a
global potential confounder D X , which may often be more convenient. The concept of a
global potential confounder also plays a crucial role in the definition of a true outcome
variable and all other concepts based thereon (see ch. 5). The σ-algebra σ(DX) generated
by DX can be interpreted to represent, from the perspective of the cause σ-algebra C, the
set of past and present possible events, except for those that are represented by C itself. ⊳

[Figure 4.1. The filtration (Ft)t∈T and various σ-algebras in a regular causality space: the nested
σ-algebras F1 = σ(π1) ⊂ F2 = σ(π1, π2) ⊂ F3 = σ(π1, π2, π3) = A, the σ-algebra D = σ(DX), the cause
σ-algebra C with σ(X) ⊂ C, the σ-algebra σ(π3), and the σ-algebras σ(W1), σ(W2), σ(W3) of potential
confounders.]

Remark 4.18 [There Are Several Global Potential Confounders] There are always several
global potential confounders of X . For example, in Table 4.1, the projection π1 and the in-
dicator variable 1U = Joe of the event that Joe is sampled generate the same σ-algebra, which
is also the confounder σ-algebra if we consider the treatment variable X as the putative
cause variable. Therefore, we talk about ‘a’ — and not about ‘the’ — global potential con-
founder of X unless a specific global potential confounder is already specified. In contrast,
the confounder σ-algebra DC = σ(D X ) is uniquely determined, once the measurable space
(Ω, A ), the filtration (Ft )t ∈T , and the cause σ-algebra C are specified (see Def. 4.6). ⊳

Remark 4.19 [Reference to X ] According to Definition 4.11, a potential confounder of X


is every non-constant map that is measurable with respect to the σ-algebra generated by a
global potential confounder D X of X . Note that the reference to X is important. Choosing
another putative cause variable may lead to another confounder σ-algebra [see the exam-
ples in sect. 4.2] and with it, to other potential confounders. ⊳

Figure 4.1 shows the subset relationships between the various σ-algebras in a regular
causality setup. These relationships follow from the conditions specified in Definitions 4.6
and 4.11. The figure also shows which of the σ-algebras occur for the first time in one of the
three σ-algebras F1, F2, and F3. Note that W1, W2, and W3 denote potential confounders of
X. The σ-algebra σ(Y) generated by the outcome variable Y is not shown in the figure. It
can be a subset of σ(π3) or any other subset of A as long as it is not a subset of F2.

4.1.3 Restriction of a Regular Causality Space and Setup

In experiments in which we have more than two treatment conditions, for example, treat-
ment a, treatment b, and control, we might be interested in comparing treatment a or
treatment b against control, or treatment a against treatment b. In each of these cases, we
use putative cause variables that are not maps on the original measurable space (Ω, A )
but on (Ω0 , A | Ω0 ), where Ω0 is a subset of Ω and A | Ω0 denotes the restriction of the
σ-algebra A to Ω0 . For detailed examples see sections 4.2.1 and 4.2.2.

Remark 4.20 [Restriction of a Set System and a σ-Algebra] Let E be a set of subsets of a
nonempty set Ω and Ω0 ⊂ Ω. Then

E | Ω0 := { Ω0 ∩ A : A ∈ E } (4.5)

is called the restriction of E to Ω0 or the trace of E in Ω0 . If A is a σ-algebra on Ω, then


A | Ω0 is a σ-algebra on Ω0 . (For a proof see SN-Exercise 1-5). ⊳
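
For finite set systems, Equation (4.5) amounts to a one-line computation. The following Python sketch is only illustrative; the function name restrict and the representation of a set system as a collection of frozensets are our own choices and not part of the formal development.

def restrict(E, Omega0):
    """Trace E|Ω0 = {Ω0 ∩ A: A ∈ E} of a set system E in the subset Ω0, as in Eq. (4.5)."""
    Omega0 = frozenset(Omega0)
    return {Omega0 & frozenset(A) for A in E}

# Example: the trace of a four-element σ-algebra in a two-element subset.
A = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2, 3, 4})}
print(restrict(A, {1, 3}))   # {Ø, {1}, {3}, {1, 3}} — again a σ-algebra, now on {1, 3}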
Remark 4.21 [Restriction of a Map] In the sequel we will also use the restriction of a map
f : Ω → Ω ′ to the subset Ω0 of Ω, defined by

f | Ω0 (ω) = f (ω), for all ω ∈ Ω0 . (4.6)

This concept also applies to multivariate maps f = ( f 1 , . . . , f m ) consisting of several maps


f1, ..., fm. In this case

f |Ω0 = (f1, ..., fm)|Ω0 = (f1|Ω0, ..., fm|Ω0).        (4.7)

Note that we can also consider the restrictions

πt | Ω0 (ω) = πt (ω), for all ω ∈ Ω0 , (4.8)

of the projections πt , t ∈T , introduced in Definition 4.6. That is, each map πt | Ω0 is the re-
striction of the corresponding map πt to the subset Ω0 of Ω. ⊳

A regular causality space ((Ω, A), (Ft)t∈T, C, DC) is a structure that entirely consists
of σ-algebras on the set Ω. According to the following theorem, the restrictions of these
σ-algebras to a subset Ω0 of Ω constitute a new regular causality space, which is the frame-
work for putative cause variables that are measurable maps on Ω0 (see sect. 4.2.1 for a
detailed example).

Theorem 4.22 [Restriction of a Regular Causality Space]

Let ((Ω, A), (Ft)t∈T, C, DC) be a regular causality space and

Ω0 = (Ω01 × Ω02 × Ω03) ⊂ Ω.

Then

F1|Ω0 = σ(π1|Ω0)
F2|Ω0 = σ((π1, π2)|Ω0)                                                  (4.9)
F3|Ω0 = σ((π1, π2, π3)|Ω0)

and

((Ω, A), (Ft)t∈T, C, DC)|Ω0 := (Ω0, A|Ω0, (Ft|Ω0)t∈T, C|Ω0, DC|Ω0)        (4.10)

is a regular causality space. It is called the restriction of ((Ω, A), (Ft)t∈T, C, DC) to Ω0.
(Proof p. 106)

Remark 4.23 [Restriction of a Putative Cause Variable and a Potential Confounder] Let
us also consider the restriction of the putative cause variable X to Ω0 ,

X | Ω0 (ω) = X (ω), for all ω ∈ Ω0 , (4.11)

and the restriction of the outcome variable Y to Ω0 ,

Y | Ω0 (ω) = Y (ω), for all ω ∈ Ω0 . (4.12)

Adding these maps to the restriction of a regular causality space yields a new regular cau-
sality setup. ⊳

Definition 4.24 [Restriction of a Regular Causality Setup]


Let the assumptions of Theorem 4.22 hold. Then

((Ω, A), (Ft)t∈T, C, DC, X, Y)|Ω0 := (Ω0, A|Ω0, (Ft|Ω0)t∈T, C|Ω0, DC|Ω0, X|Ω0, Y|Ω0)

is called the restriction of ((Ω, A), (Ft)t∈T, C, DC, X, Y) to Ω0.

4.2 Examples

4.2.1 Joe and Ann With Three Treatment Conditions

We illustrate the concepts treated in section 4.1 by the kind of experiment presented in Ta-
ble 4.2. In this kind of experiment, we first sample a person u from the set Ω1 = { Joe, Ann }.
Then the sampled person receives treatment a, b, or c. These treatment conditions are the
elements of the set Ω2 = {a, b, c }. In this experiment, there is no other treatment variable.
Finally, it is observed whether or not a success criterion is reached at some appropriate
time after treatment. These two possible outcomes are the elements of the set Ω3 = {−, +}.
Hence, the sets Ωt , t ∈T = {1, 2, 3}, occurring in Definition 4.6 are

Ω1 = { Joe, Ann }, Ω2 = {a, b, c }, Ω3 = {−, +},

and Ω = Ω1 × Ω2 × Ω3 contains the twelve elements listed in first column of Table 4.2.
Furthermore, we choose the σ-algebras on these sets to be At = P (Ωt ), t ∈T , the cor-
responding power sets. Because, in Definition 4.6, we require that (Ω, A ) is the product
of the measurable spaces (Ωt , At ) (see RS-Def. 1.15), the measurable space (Ω, A ) is com-
pletely specified, and with it the projections π1, π2, and π3 . The assignment of values of
these projections to all elements of Ω is explicitly shown in Table 4.2.

Table 4.2. Joe and Ann with three treatment conditions

Possible outcomes    Unit π1   Treatment π2   Success π3   X   1A   1B   1C   Y   X|Ωab   Y|Ωab
ω1 = (Joe, a, −)     Joe       a              −            0   1    0    0    0   0       0
ω2 = (Joe, a, +)     Joe       a              +            0   1    0    0    1   0       1
ω3 = (Joe, b, −)     Joe       b              −            1   0    1    0    0   1       0
ω4 = (Joe, b, +)     Joe       b              +            1   0    1    0    1   1       1
ω5 = (Joe, c, −)     Joe       c              −            2   0    0    1    0
ω6 = (Joe, c, +)     Joe       c              +            2   0    0    1    1
ω7 = (Ann, a, −)     Ann       a              −            0   1    0    0    0   0       0
ω8 = (Ann, a, +)     Ann       a              +            0   1    0    0    1   0       1
ω9 = (Ann, b, −)     Ann       b              −            1   0    1    0    0   1       0
ω10 = (Ann, b, +)    Ann       b              +            1   0    1    0    1   1       1
ω11 = (Ann, c, −)    Ann       c              −            2   0    0    1    0
ω12 = (Ann, c, +)    Ann       c              +            2   0    0    1    1

Note. X, 1A, 1B, and 1C denote the treatment variables and Y the outcome variable; X|Ωab and Y|Ωab
are the restrictions to Ωab. For the definition of Ωab see the text.

The filtration (Ft )t ∈T , T = {1, 2, 3}, is specified by

F1 = σ(π1 ), F2 = σ(π1, π2), F3 = σ(π1, π2, π3 ). (4.13)

Furthermore, according to Definition 4.6, in this kind of experiment,

C = σ(π2) = { {ω1, ω2, ω7, ω8}, {ω3, ω4, ω9, ω10}, {ω5, ω6, ω11, ω12},
              {ω1, ω2, ω3, ω4, ω7, ω8, ω9, ω10},
              {ω1, ω2, ω5, ω6, ω7, ω8, ω11, ω12},                              (4.14)
              {ω3, ω4, ω5, ω6, ω9, ω10, ω11, ω12}, Ω, Ø }

is necessarily the cause σ-algebra because J = K = {1}, implying π2 = π21 = (π2j, j ∈ K) (see
Def. 4.6 and Exercise 4-7). Finally,

DC = σ(π1, π2j, j ∈ J \ K) = σ(π1) = { {ω1, ..., ω6}, {ω7, ..., ω12}, Ω, Ø }

is the confounder σ-algebra of C because J = K. Hence, we completely specified the regular
causality space ((Ω, A), (Ft)t∈T, C, DC).
In the framework of this space, we can consider several putative cause variables. The
first one is X as specified in Table 4.2. The σ-algebra generated by X is identical to C =
σ(π2). (Comparing X to π2 in the table shows why.) Other putative cause variables are the
indicator variable of treatment a defined by
1A(ω) = { 1, if ω ∈ {ω1, ω2, ω7, ω8}
        { 0, if ω ∈ Ω \ {ω1, ω2, ω7, ω8},        (4.15)

the indicator variable of treatment b defined by

1B(ω) = { 1, if ω ∈ {ω3, ω4, ω9, ω10}
        { 0, if ω ∈ Ω \ {ω3, ω4, ω9, ω10},        (4.16)

and the indicator variable of treatment c defined by

1C(ω) = { 1, if ω ∈ {ω5, ω6, ω11, ω12}
        { 0, if ω ∈ Ω \ {ω5, ω6, ω11, ω12}.       (4.17)

The σ-algebras generated by these three putative cause variables are subsets of C . They
can be used for comparing one treatment condition against the other two. All four putative
cause variables, X and the three indicator variables listed above, are maps on Ω, sharing
the same cause and confounder σ-algebras, and the same set of potential confounders.
Hence, we specified the four regular causality setups

((Ω, A), (Ft)t∈T, C, DC, X, Y),    ((Ω, A), (Ft)t∈T, C, DC, 1A, Y),
((Ω, A), (Ft)t∈T, C, DC, 1B, Y),   ((Ω, A), (Ft)t∈T, C, DC, 1C, Y),

which only differ in their putative cause variables (see Exercise 4-8).

Comparing Treatment a to b

However, if we intend to compare treatment conditions a and b to each other, then we
have to use the restriction of the regular causality space ((Ω, A), (Ft)t∈T, C, DC) to the subset

Ωab := {ω1, ω2, ω3, ω4, ω7, ω8, ω9, ω10}

of Ω. That is, we use the regular causality space ((Ω, A), (Ft)t∈T, C, DC)|Ωab. All σ-algebras
in this space are the restrictions of the corresponding σ-algebras in ((Ω, A), (Ft)t∈T, C, DC)
to the set Ωab ⊂ Ω (see Def. 4.24). For example, for the cause σ-algebra C specified in Equation (4.14), this means

C|Ωab = {Ωab ∩ A: A ∈ C}
      = { Ωab ∩ {ω1, ω2, ω7, ω8}, Ωab ∩ {ω3, ω4, ω9, ω10}, Ωab ∩ {ω5, ω6, ω11, ω12},
          Ωab ∩ {ω1, ω2, ω3, ω4, ω7, ω8, ω9, ω10},
          Ωab ∩ {ω1, ω2, ω5, ω6, ω7, ω8, ω11, ω12},                              (4.18)
          Ωab ∩ {ω3, ω4, ω5, ω6, ω9, ω10, ω11, ω12}, Ωab ∩ Ω, Ωab ∩ Ø }
      = { {ω1, ω2, ω7, ω8}, {ω3, ω4, ω9, ω10}, Ωab, Ø }.

Note that C|Ωab contains only four elements (see the last equation), whereas the original
σ-algebra C contains eight elements. Checking conditions (a) to (c) of RS-Definition 1.4 for
the set in the last displayed line shows that C|Ωab is in fact a σ-algebra on Ωab (see
Exercise 4-9).

Finally, we may consider the restriction X|Ωab of the original putative cause variable X
to Ωab specified by

X|Ωab(ω) = X(ω) = { 0, if ω ∈ {ω1, ω2, ω7, ω8}
                  { 1, if ω ∈ {ω3, ω4, ω9, ω10},

and the restriction Y|Ωab of the original outcome variable Y to Ωab specified by

Y|Ωab(ω) = Y(ω) = { 0, if ω ∈ {ω1, ω3, ω7, ω9}
                  { 1, if ω ∈ {ω2, ω4, ω8, ω10}

(see the last two columns of Table 4.2). Hence, ((Ω, A), (Ft)t∈T, C, DC, X, Y)|Ωab is a regular
causality setup in which we can compare treatments a and b to each other with respect to
the outcome variable Y|Ωab.

Comparing Two Other Treatment Conditions To Each Other

If we intend to compare two other treatment conditions to each other, for example, a to c
or b to c, then each of these comparisons would be based on the restriction of the regular
causality space ((Ω, A), (Ft)t∈T, C, DC) to another subset of Ω. For comparing a to c, this
subset of Ω is

Ωac := {ω1, ω2, ω5, ω6, ω7, ω8, ω11, ω12},

and for comparing b to c, it is

Ωbc := {ω3, ω4, ω5, ω6, ω9, ω10, ω11, ω12}.

The corresponding restrictions of the regular causality space ((Ω, A), (Ft)t∈T, C, DC) are

((Ω, A), (Ft)t∈T, C, DC)|Ωac   and   ((Ω, A), (Ft)t∈T, C, DC)|Ωbc.

4.2.2 Joe and Ann With Two Simultaneous Treatment Variables

Now we illustrate the concepts introduced in section 4.1 by the kind of experiment pre-
sented in Table 4.3. Again, we sample a person u from the set Ω1 = { Joe, Ann }. Then the
sampled person receives or does not receive treatment a (e. g., drug a) and, simultane-
ously, he or she receives or does not receive treatment b (e. g., drug b). The pairs of these
treatment conditions are the elements of the set

Ω2 = Ω21 × Ω22 = {no, yes} × {no, yes}
              = {(no, no), (no, yes), (yes, no), (yes, yes)}.        (4.19)

Finally, it is observed whether or not a success criterion is reached at some appropriate


time after treatment. The set Ω3 = {−, +} contains these two possible outcomes as ele-
ments.
Furthermore, we choose the σ-algebras on these sets to be the power sets At = P (Ωt ),
t ∈ T = {1, 2, 3}, and specify (Ω, A ) to be the product of the measurable spaces (Ωt , At ) (see
Def. 4.6 and RS-Def. 1.15). Hence, the measurable space

Table 4.3. Joe and Ann with two simultaneous treatment variables

Possible outcomes           Unit π1   π2 = (π21, π22)   Success π3   Z   X   1A   Y   X|Ω0   Y|Ω0
ω1 = (Joe, no, no, −)       Joe       (no, no)          −            0   0   0    0   0      0
ω2 = (Joe, no, no, +)       Joe       (no, no)          +            0   0   0    1   0      1
ω3 = (Joe, no, yes, −)      Joe       (no, yes)         −            0   1   0    0
ω4 = (Joe, no, yes, +)      Joe       (no, yes)         +            0   1   0    1
ω5 = (Joe, yes, no, −)      Joe       (yes, no)         −            1   0   0    0
ω6 = (Joe, yes, no, +)      Joe       (yes, no)         +            1   0   0    1
ω7 = (Joe, yes, yes, −)     Joe       (yes, yes)        −            1   1   1    0   1      0
ω8 = (Joe, yes, yes, +)     Joe       (yes, yes)        +            1   1   1    1   1      1
ω9 = (Ann, no, no, −)       Ann       (no, no)          −            0   0   0    0   0      0
ω10 = (Ann, no, no, +)      Ann       (no, no)          +            0   0   0    1   0      1
ω11 = (Ann, no, yes, −)     Ann       (no, yes)         −            0   1   0    0
ω12 = (Ann, no, yes, +)     Ann       (no, yes)         +            0   1   0    1
ω13 = (Ann, yes, no, −)     Ann       (yes, no)         −            1   0   0    0
ω14 = (Ann, yes, no, +)     Ann       (yes, no)         +            1   0   0    1
ω15 = (Ann, yes, yes, −)    Ann       (yes, yes)        −            1   1   1    0   1      0
ω16 = (Ann, yes, yes, +)    Ann       (yes, yes)        +            1   1   1    1   1      1

Note. Z and X are the treatment variables (treatment a and treatment b, respectively), 1A the indicator
of receiving both treatments, and Y the outcome variable; X|Ω0 and Y|Ω0 are the restrictions to Ω0.
For the definition of Ω0 see the text.

(Ω, A ) = (Ω1 × Ω2 × Ω3 , A1 ⊗ A2 ⊗ A3 )

is completely specified, and with it the projections π1, π2 , π3 , and the σ-algebras of the fil-
tration (Ft )t ∈T [see Eqs. (4.1)].
Note that the first projection π1 is identical to the person variable U that has been used
in the examples of chapter 1, for instance. In this example, the second projection π2 =
(π21 , π22) consists of two projections π21 : Ω → Ω21 and π22 : Ω → Ω22 [see Eq. (4.19)]. Note
that π2 generates the same σ-algebra as (Z , X ).
Table 4.3 shows the sixteen elements ωi of the set Ω = Ω1 × Ω2 × Ω3 of all possible out-
comes of this kind of experiment. It also shows the values assigned to these elements by
the projections π1 to π3 , the treatment variable Z (that represents treatment a vs.¬a), the
treatment variable X (representing treatment b vs.¬b), and by the outcome variable Y.

First Regular Causality Setup. Again, in this kind of experiment, we can choose among
several maps as the focused putative cause variable. However, in contrast to the example
presented in section 4.2.1, now the cause and confounder σ-algebras change, even if we
stick to the unrestricted measurable space (Ω, A ).

Referring to Definition 4.6, the index set J is {1, 2} because π2 = (π21 , π22). If we want to
choose X to take the role of the putative cause variable, then we have to specify K = {2},
which yields

C 1 = σ(π2j , j ∈ K ) = σ(π22). (4.20)

Another choice of the index set K would not meet conditions (a) and (b) of Definition 4.11
(i) for X. For K = {2}, condition (a) is satisfied because Ø ≠ σ(X) = σ(π22) = C1. Condition
(b) is met as well because there is no proper subset K₀ of K = {2} such that X is measurable
with respect to (π2j, j ∈ K₀). In contrast, for K = {1}, condition (a) is not satisfied because
σ(X) ⊄ σ(π21). Finally, for K = {1, 2}, condition (a) is satisfied; however, condition (b) does
not hold because K₀ = {2} is a proper subset of K and the σ-algebra generated by X is
identical to (and therefore a subset of) σ(π2j, j ∈ K₀).
Choosing K = {2} does not only imply C 1 = σ(π22 ) but also that the corresponding con-
founder σ-algebra is

DC 1 = σ(π1, π2j , j ∈ J \ K ) = σ(π1, π21 ). (4.21)

In Table 4.3 we already specified the putative cause variable, the treatment variable X , and
the outcome variable Y. In this example, σ(X ) = σ(π22) = C 1 , σ(Z ) = σ(π21 ) ⊂ DC 1 , and
σ(Y ) = σ(π3 ). Hence, the treatment variable Z , which is simultaneous to X , as well as π1
and π21 are potential confounders of X because they all are DC 1 -measurable.
Finally, the bivariate projection (π1, π21 ) as well as (π1, Z ) are global potential con-
founders of X and can take the role of D X [see Def. 4.11 (iii)] because σ(π1, π21 ) = σ(π1, Z ) =
DC1. Hence, we completely specified the regular causality setup

((Ω, A), (Ft)t∈T, C1, DC1, X, Y).

Second Regular Causality Setup. Because X and Z are simultaneous treatment vari-
ables, their roles can be exchanged. Hence, now we consider Z as the focused putative
cause variable and X as a potential confounder of Z (see again Table 4.3). For this purpose,
we specify K = {1}, which yields C 2 = σ(π21 ) and that DC 2 = σ(π1, π22) is the confounder σ-
algebra of Z (see again Def. 4.6). Now, the bivariate projection (π1, π22) as well as (π1, X ) are
global potential confounders of Z and can take the role of D Z [see Def. 4.11 (iii)]. Hence,
the regular causality setup is now

((Ω, A), (Ft)t∈T, C2, DC2, Z, Y),

and (π1, π22 ), π1, as well as X are potential confounders of Z . The outcome variable is still
Y.

Third Regular Causality Setup. In this example, we can also choose the bivariate map
(X , Z ) to take the role of a putative cause variable. This allows us to study the joint effects
of treatment a and b. In this case, K = {1, 2}, C 3 = σ(π21 , π22 ) is the cause σ-algebra, and
DC 3 = σ(π1) the confounder σ-algebra (see again Def. 4.6). Now, π1 is a global potential
confounder of the putative cause variable (X , Z ). Hence, a third regular causality setup in
the example presented in Table 4.3 is

((Ω, A), (Ft)t∈T, C3, DC3, (X, Z), Y),

with the outcome variable Y specified in Table 4.3.
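
How the choice of the index set K ⊂ J = {1, 2} shifts information between the cause σ-algebra and the confounder σ-algebra can also be seen by counting elements. The following Python sketch is only an illustrative check under our own encoding of the outcomes of Table 4.3; the helper generated_partition is not part of the formal development.

from itertools import product

# Outcomes ω = (π1(ω), π21(ω), π22(ω), π3(ω)) of Table 4.3:
Omega = list(product(("Joe", "Ann"), ("no", "yes"), ("no", "yes"), ("-", "+")))

def generated_partition(*maps):
    """Atoms of σ(f1, ..., fk) on the finite Ω: the finest events the maps can distinguish."""
    blocks = {}
    for w in Omega:
        blocks.setdefault(tuple(f(w) for f in maps), set()).add(w)
    return list(blocks.values())

pi1, pi21, pi22 = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[2])

# Choice of K in Definition 4.6 for π2 = (π21, π22):
#   K = {2}:    C1 = σ(π22),       DC1 = σ(π1, π21)   (X is the putative cause variable)
#   K = {1}:    C2 = σ(π21),       DC2 = σ(π1, π22)   (Z is the putative cause variable)
#   K = {1,2}:  C3 = σ(π21, π22),  DC3 = σ(π1)        ((X, Z) is the putative cause variable)
for name, gens in [("C1", (pi22,)), ("DC1", (pi1, pi21)),
                   ("C2", (pi21,)), ("DC2", (pi1, pi22)),
                   ("C3", (pi21, pi22)), ("DC3", (pi1,))]:
    atoms = generated_partition(*gens)
    print(name, "has", 2 ** len(atoms), "elements")   # here |σ| = 2^(number of atoms)

For instance, C1 and C2 each have 4 elements with 16-element confounder σ-algebras, whereas C3 has 16 elements and its confounder σ-algebra DC3 = σ(π1) only 4, reflecting that choosing the larger K moves the other treatment variable out of the set of potential confounders.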


Fourth Regular Causality Setup. In the regular causality space ((Ω, A), (Ft)t∈T, C3, DC3)
we can choose among several putative cause variables. The first one, the bivariate map
(X, Z), has been selected above. However, we can also choose the indicator variable

1A(ω) = { 1, if ω ∈ {ω7, ω8, ω15, ω16}
        { 0, otherwise.

This new putative cause variable allows us to compare receiving both treatments to re-
ceiving at most one of the two treatments. The cause and the confounder σ-algebras are
still C 3 = σ(π2) = σ(π21 , π22 ) and DC 3 = σ(π1 ), respectively, because 1A is measurable with
respect to π2 but neither with respect to π21 nor with respect to π22. Hence, we specified
the fourth regular causality setup

((Ω, A), (Ft)t∈T, C3, DC3, 1A, Y).

Fifth Regular Causality Setup. If, for example, we intend to compare receiving both
treatments to receiving
¡ no treatment, then ¢ we have to use the restriction of the regular
causality space (Ω, A ), (Ft )t ∈T , C 3 , DC 3 to the subset

Ω0 = {ω1 , ω2 , ω7 , ω8 , ω9 , ω10 , ω15 , ω16 }


¡ ¢
of Ω. That is, we have to use (Ω, A ), (Ft )t ∈T , C 3 , DC 3 | Ω0 because the comparison of re-
ceiving both treatments to receiving no treatment only involves the elements of Ω0 and
not the other elements of Ω (see Table 4.3). All σ-algebras involved in this space are the
restrictions of the corresponding σ-algebras to Ω0 . For example, for the cause σ-algebra

C 3 = σ(π2 ) = π2− 1 (A2 ) = π2− 1 (P (Ω2 ))
    = { {ω1 , ω2 , ω9 , ω10 }, {ω3 , ω4 , ω11 , ω12 }, {ω5 , ω6 , ω13 , ω14 }, {ω7 , ω8 , ω15 , ω16 },
        {ω1 , ω2 , ω3 , ω4 , ω9 , ω10 , ω11 , ω12 }, {ω1 , ω2 , ω5 , ω6 , ω9 , ω10 , ω13 , ω14 },
        {ω1 , ω2 , ω7 , ω8 , ω9 , ω10 , ω15 , ω16 }, {ω3 , ω4 , ω5 , ω6 , ω11 , ω12 , ω13 , ω14 },
        {ω3 , ω4 , ω7 , ω8 , ω11 , ω12 , ω15 , ω16 }, {ω5 , ω6 , ω7 , ω8 , ω13 , ω14 , ω15 , ω16 },
        {ω1 , ω2 , ω3 , ω4 , ω5 , ω6 , ω9 , ω10 , ω11 , ω12 , ω13 , ω14 },
        {ω1 , ω2 , ω3 , ω4 , ω7 , ω8 , ω9 , ω10 , ω11 , ω12 , ω15 , ω16 },
        {ω3 , ω4 , ω5 , ω6 , ω7 , ω8 , ω11 , ω12 , ω13 , ω14 , ω15 , ω16 },
        {ω1 , ω2 , ω5 , ω6 , ω7 , ω8 , ω9 , ω10 , ω13 , ω14 , ω15 , ω16 }, Ω, Ø },

this means

C 3 | Ω0 = { {ω1 , ω2 , ω9 , ω10 }, {ω7 , ω8 , ω15 , ω16 }, Ω0 , Ø } .

For the intended comparison, we can use the restriction X | Ω0 of the original putative
cause variable X (see Table 4.3) to Ω0 , specified by

X | Ω0 (ω) = X (ω) =  0, if ω ∈ {ω1 , ω2 , ω9 , ω10 }
                      1, if ω ∈ {ω7 , ω8 , ω15 , ω16 },

and the restriction Y | Ω0 of the original outcome variable Y to Ω0 , specified by

Y | Ω0 (ω) = Y (ω) =  0, if ω ∈ {ω1 , ω7 , ω9 , ω15 }
                      1, if ω ∈ {ω2 , ω8 , ω10 , ω16 }.

Hence, the fifth regular causality setup is


((Ω, A ), (Ft )t ∈T , C 3 , DC 3 , X , Y ) | Ω0 = (Ω0 , A | Ω0 , (Ft | Ω0 )t ∈T , C 3 | Ω0 , DC 3 | Ω0 , X | Ω0 , Y | Ω0 ).

Of course, we might also want to compare receiving only treatment a to only receiving
treatment b, or receiving both treatments to receiving only one single treatment, and so
on.
¡ All such comparisons ¢ would need their own restriction of the regular causality space
(Ω, A ), (Ft )t ∈T , C 3 , DC 3 to the appropriate subset of Ω (see Exercise 4-10).
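
Although purely set-theoretical, such restrictions are easy to compute explicitly when Ω is finite. The following Python sketch is not part of the formal development; it merely illustrates Equation (4.5) by restricting the 16-element σ-algebra C 3 to Ω0 . The helper names and the labels w1, ..., w16 (standing for ω1 , ..., ω16 ) are ours.

def restrict(sigma_algebra, omega_0):
    """Restriction to omega_0: intersect every event with omega_0 (duplicates collapse)."""
    return {frozenset(A & omega_0) for A in sigma_algebra}

def generated_sigma_algebra(atoms):
    """Sigma-algebra generated by a finite partition: all unions of atoms."""
    from itertools import combinations
    return {frozenset().union(*map(frozenset, combo))
            for r in range(len(atoms) + 1)
            for combo in combinations(atoms, r)}

# The four atoms of C_3 in this example.
atoms = [{'w1', 'w2', 'w9', 'w10'}, {'w3', 'w4', 'w11', 'w12'},
         {'w5', 'w6', 'w13', 'w14'}, {'w7', 'w8', 'w15', 'w16'}]
C3 = generated_sigma_algebra(atoms)                                   # 16 events
Omega_0 = frozenset({'w1', 'w2', 'w7', 'w8', 'w9', 'w10', 'w15', 'w16'})
C3_restricted = restrict(C3, Omega_0)                                 # the 4 events listed above
print(len(C3), len(C3_restricted))                                    # 16 4

The restricted σ-algebra consists of exactly the four events displayed for C 3 | Ω0 above.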

4.2.3 Nonorthogonal Factors

Consider again the example presented in Table 1.5. In contrast to the previous examples,
now we consider an outcome variable Y whose possible values are not only 0 and 1 any
more. Therefore, this example cannot be presented in the same format as before.
Nevertheless, we can specify the sets Ωt , t ∈ T = {1, 2, 3}, occurring in Definition 4.6. In
this example, they are Ω1 = {Tom , Tim , . . . , Mia }, Ω2 = {control, treatment 1, treatment 2 },
and Ω3 = R . Furthermore, we choose the σ-algebras on these sets to be At = P (Ωt ),
t = 1, 2, and A3 = B, the Borel σ-algebra on R (see RS-Rem. 1.14).
Requiring (Ω, A ) to be the product of the measurable spaces (Ωt , At ), t ∈T (see Def. 4.6),
the measurable space (Ω, A ) is completely specified and with it the projections π1 to π3 .
The projection π1 : Ω → Ω1 has the possible values Tom , Tim , . . . , Mia . Hence, π1 is identi-
cal to the person variable U used in Table 1.5. The projection π2 : Ω → Ω2 has the possible
values control, treatment 1, treatment 2, and the possible values of π3 : Ω → Ω3 are the real
numbers.
The filtration (Ft )t ∈T is specified by

F1 = σ(π1 ), F2 = σ(π1, π2), F3 = σ(π1, π2, π3 ).

Furthermore, the index sets occurring in Definition 4.6 are J = K = {1}, implying that the
cause σ-algebra is
C = σ(π2j , j ∈ K ) = σ(π2)

and that the confounder σ-algebra of C is

DC = σ(π1, π2j , j ∈ J \ K ) = σ(π1) .

Choosing X as the putative cause variable and Y as the outcome variable (see Table 1.5),
we specify the regular causality setup
((Ω, A ), (Ft )t ∈T , C , DC , X , Y ).

Note that the putative cause variable X is the composition of the projection π2 and
a measurable map g : (Ω2 , P (Ω2 )) → (R , B) (see RS-sect. 2.1.7), that is, X = g (π2 ). This
implies σ(X ) ⊂ σ(π2) (see RS-Lemma 2.35). Although X is a measurable map on (Ω, A ),
intuitively speaking, X = g (π2) means that the values of X only depend on the elements

of the set Ω2 = {control, treatment 1, treatment 2 }, the second factor set in the set product
Ω = Ω1 × Ω2 × Ω3 .
Similarly, the outcome variable Y is the composition of the projection π3 and a measur-
able map f : (R , B) → (R , B), that is, Y = f (π3 ). This implies σ(Y ) ⊂ σ(π3 ) (see again RS-
Lemma 2.35). Hence, the values of Y only depend on the elements of Ω3 = R , the third
factor set in the set product Ω = Ω1 × Ω2 × Ω3 .
According to Definition 4.11 (iv), each non-constant measurable map W that satisfies
σ(W ) ⊂ DC is a potential confounder of X . In this example, this applies, for instance, to
educational status specified in Table 1.5, but also to sex, which is also an attribute of the
persons in the set Ω1 = {Tom , Tim , . . . , Mia }. Again, each of these variables can be written
as the composition of some map and the projection π1, implying that they are measurable
with respect to σ(π1 ) = DC (see Exercise 4-11).
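
For a finite Ω, measurability statements such as σ(W ) ⊂ σ(π1 ) = DC can be checked by brute force, because the σ-algebra generated by a map is the set of preimages of all subsets of its (finite) range. The following Python sketch illustrates this for a composition W = g (π1 ); the person list and the value sets are simplified, hypothetical stand-ins for Table 1.5, and the function names are ours.

from itertools import product, chain, combinations

persons    = ['Tom', 'Tim', 'Ann', 'Mia']                  # part of Omega_1
treatments = ['control', 'treatment 1', 'treatment 2']     # Omega_2
outcomes   = [0.0, 1.0]                                    # finite stand-in for Omega_3 = R
Omega = list(product(persons, treatments, outcomes))

def sigma(f):
    """Sigma-algebra generated by a map on the finite Omega: preimages of all subsets of its range."""
    values = sorted(set(f(w) for w in Omega), key=str)
    subsets = chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))
    return {frozenset(w for w in Omega if f(w) in set(B)) for B in subsets}

pi1 = lambda w: w[0]                                       # projection onto Omega_1
g   = lambda person: 'male' if person in ('Tom', 'Tim') else 'female'
W   = lambda w: g(pi1(w))                                  # composition W = g(pi1)

print(sigma(W) <= sigma(pi1))                              # True: sigma(W) is a subset of sigma(pi1) = D_C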

4.3 Properties

In this section, we study some properties of a regular causality space, a putative cause
variable, an outcome variable, a potential confounder, and other maps that are measur-
able with respect to these maps.

4.3.1 Cause σ-Algebra and Potential Confounder σ-Algebra

According to the following theorem, the intersection of the cause σ-algebra C and the con-
founder σ-algebra DC is the trivial σ-algebra, and both C and DC are subsets of F2. In
contrast to C , the confounder σ-algebra DC can also be a subset of F1 . According to Defi-
nition 4.6 (ii), this is the case if J = K , implying DC = σ(π1 ), which holds if there are no
events A ∈ A that are simultaneous to the cause σ-algebra C and that are not elements of C .

Theorem 4.25 [Properties of C and DC ]
If ((Ω, A ), (Ft )t ∈T , C , DC ) is a regular causality space, then

C ∩ DC = {Ω, Ø} (4.22)
C 6⊂ F1 (4.23)
C ⊂ F2 (4.24)
DC ⊂ F2 . (4.25)
(Proof p. 109)
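
For the finite example of Table 4.1, the four propositions can also be verified numerically. The Python sketch below assumes the eight-outcome structure ωi = (u, ωX , ωY ) of Example 4.10 (with J = K = {1}, so that DC = σ(π1 )); the helper names are ours.

from itertools import product, chain, combinations

Omega = list(product(['Joe', 'Ann'], ['no', 'yes'], ['-', '+']))      # the 8 outcomes of Table 4.1

def sigma(*maps):
    """Sigma-algebra generated by maps on the finite Omega: preimages of all subsets of the joint range."""
    f = lambda w: tuple(m(w) for m in maps)
    values = sorted(set(f(w) for w in Omega))
    subsets = chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))
    return {frozenset(w for w in Omega if f(w) in set(B)) for B in subsets}

pi1, pi2 = (lambda w: w[0]), (lambda w: w[1])
C, D_C   = sigma(pi2), sigma(pi1)             # cause and confounder sigma-algebra
F1, F2   = sigma(pi1), sigma(pi1, pi2)

trivial = {frozenset(), frozenset(Omega)}
print(C & D_C == trivial)                     # (4.22)
print(not C <= F1)                            # (4.23)
print(C <= F2)                                # (4.24)
print(D_C <= F2)                              # (4.25)

All four checks print True, in line with the theorem.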

4.3.2 Putative Cause Variable

Proposition (4.22) has two immediate implications for the σ-algebra generated by a puta-
tive cause variable and the σ-algebra generated by a potential confounder.
Remark 4.26 [Two Immediate Implications] If ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) is a regular
causality setup, then

σ(X ) ∩ DC = {Ω, Ø} (4.26)

because σ(X ) ⊂ C . Furthermore, if W is a potential confounder of X , then

σ(X ) ∩ σ(W ) = {Ω, Ø}, (4.27)

because σ(W ) ⊂ DC . ⊳

Remark 4.27 [Implications Involving a Global Potential Confounder] Remember, by def-


inition, a global potential confounder D X of X satisfies σ(D X ) = DC . Hence, Equation (4.26)
is equivalent to

σ(X ) ∩ σ(D X ) = {Ω, Ø} . (4.28)


In the framework of a regular probabilistic causality setup ((Ω, A, P ), (Ft )t ∈T , C , DC , X , Y ),
this equation neither implies stochastic independence nor dependence of X and D X . That
is, X and D X can be stochastically dependent or independent of each other, and this solely
is determined by the probability measure P on (Ω, A ). ⊳

Example 4.28 [Joe and Ann With a Single Treatment Variable] The issue addressed in Re-
mark 4.27 is exemplified by RS-Tables 1.2 and 1.4. These tables present random experi-
ments that differ from each other only in the probability measure P , whereas the structure
is the same as described in Example 4.10. Hence, in both examples, σ(X ) ∩ σ(U ) = {Ω, Ø}
holds for the putative cause variable X and the global potential confounder U . Because,
in this example, σ(X ) = C and σ(U ) = DC , this can be seen from Equations (4.3) and (4.4).
Obviously, the only elements shared by these σ-algebras are Ω and Ø.
Furthermore, while X and U are stochastically independent in the example of RS-Table
1.2, they are stochastically dependent in the example of RS-Table 1.4. This can be seen
from comparing the columns P (X =1|U =u ) in the two tables to each other. In RS-Table
1.2, the conditional probability P (X =1|U =u ) is the same for both persons u, which im-
plies stochastic independence of X and U . In contrast, in RS-Table 1.4, the conditional
probabilities P (X =1|U =Joe ) and P (X =1|U =Ann) differ from each other, and this implies
stochastic dependence of X and U . ⊳
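
Whether X and U are stochastically dependent is determined entirely by P and can be read off the conditional probabilities P (X =1|U =u ). The Python sketch below uses the self-selection probabilities that reappear in Table 5.1 later in this book (the numbers of RS-Tables 1.2 and 1.4 themselves are not reproduced here); with a measure that assigns both persons the same conditional treatment probability, the same code would return equal values and thus indicate independence.

# P({omega_i}) for the eight outcomes (u, x, y); these are the self-selection
# probabilities that reappear in Table 5.1.
P = {('Joe', 0, 0): .144, ('Joe', 0, 1): .336, ('Joe', 1, 0): .004, ('Joe', 1, 1): .016,
     ('Ann', 0, 0): .096, ('Ann', 0, 1): .024, ('Ann', 1, 0): .228, ('Ann', 1, 1): .152}

def prob(event):
    """P(event), where the event is given as a predicate on the outcomes."""
    return sum(p for w, p in P.items() if event(w))

for u in ('Joe', 'Ann'):
    p_x1_given_u = prob(lambda w: w[0] == u and w[1] == 1) / prob(lambda w: w[0] == u)
    print(u, round(p_x1_given_u, 2))
# Joe 0.04, Ann 0.76: the conditional probabilities differ, so X and U are
# stochastically dependent; equal values for all u would indicate independence.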

According to the following theorem, a putative cause variable is not a potential con-
founder of itself, it is measurable with respect to F2, and it is not measurable with respect
to F1 .

Theorem 4.29 [Properties of a Putative Cause Variable]
If ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) is a regular causality setup, then

σ(X ) 6⊂ DC (4.29)
σ(X ) ⊂ F2 (4.30)
σ(X ) 6⊂ F1 . (4.31)
(Proof p. 110)

4.3.3 Potential Confounder

Now we study some properties of a potential confounder. In the following theorem, we


show that every potential confounder is measurable with respect to the σ-algebra F2, but
not measurable with respect to the cause σ-algebra σ(X ).

Theorem 4.30 [Two Properties of a Potential Confounder]
If ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) is a regular causality setup and W a potential con-
founder of X , then

σ(W ) ⊂ F2 (4.32)
σ(W ) 6⊂ σ(X ) . (4.33)
(Proof p. 110)

Now we turn to time order concerning a putative cause variable and a potential con-
founder. According to the first proposition of the following theorem, a measurable map
V that is prior in (Ft )t ∈T to a putative cause variable X is measurable with respect to DC .
And, according to the second proposition, if V is DC -measurable, then it is prior or simul-
taneous in (Ft )t ∈T to X . Note that V can also be a global potential confounder of X .

Theorem 4.31 [A Measurable Map That Is Prior to X Is DC -Measurable]
Let ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) be a regular causality setup, V a measurable map on
(Ω, A ), and let FT denote the filtration (Ft )t ∈T . Then

V ≺FT X ⇒ σ(V ) ⊂ DC    (4.34)

σ(V ) ⊂ DC ⇒ V ≼FT X .    (4.35)
(Proof p. 110)

Proposition (4.34) and Definition 4.11 (iv) immediately imply the following corollary, in
which we consider a non-constant measurable map on (Ω, A ) that is prior in (Ft )t ∈T to X .

Corollary 4.32 [A Non-Constant Measurable Map That Is Prior to X ]


Under the assumptions of Theorem 4.31, if V is prior in FT to X and σ(V ) 6= {Ω, Ø},
then V is a potential confounder of X .

An immediate implication of Proposition (4.35) and Definition 4.11 applies to potential


confounders, which, by definition are non-constant.

Corollary 4.33 [A Potential Confounder Is Prior or Simultaneous to X ]


Under the assumptions of Theorem 4.31, if W is a potential confounder of X , then W is
prior or simultaneous in FT to X .

Hence, according to this corollary, each potential confounder of X is prior or simulta-


neous in FT to X . Remember, however, a potential confounder of X is not measurable
with respect to X [see Prop. (4.33)].
According to the following corollary, each non-constant W -measurable map is a poten-
tial confounder of X , if W is a potential confounder of X .

Corollary 4.34 [Measurable Maps of a Potential Confounder]
Let ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) be a regular causality setup and let V ,W be measurable
maps on (Ω, A ). If W is a potential confounder of X and {Ω, Ø} 6= σ(V ) ⊂ σ(W ), then V
is a potential confounder of X as well.
(Proof p. 111)

4.3.4 Outcome Variable

Now we study some measurability properties of an outcome variable Y. According to the


following theorem, an outcome variable Y is F3 -measurable but not measurable with re-
spect to F2.

Theorem 4.35 [An Outcome Variable Is F3 -Measurable]
If ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) is a regular causality setup, then σ(Y ) 6⊂ F2 and

σ(Y ) ⊂ F3 . (4.36)
(Proof p. 111)

Remark 4.36 [Immediate Implications] Note that σ(Y ) 6⊂ F2 implies σ(Y ) 6⊂ F1 because
(Ft )t ∈T is a filtration. It also implies σ(Y ) 6⊂ DC because DC ⊂ F2 [see Prop. (4.25)]. ⊳

According to the following corollary, a potential confounder of X is prior in (Ft )t ∈T to Y .


According to Remark 4.16, this also applies to a covariate of X . This corollary immediately
follows from Theorems 4.30, 4.35, and Definition 3.3.

Corollary 4.37 [A Potential Confounder of X Is Prior in (Ft )t ∈T to Y ]
If ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) is a regular causality setup and W a potential con-
founder of X , then

W ≺FT Y .    (4.37)

Now we consider a map V on (Ω, A ) that is not F2-measurable. If it is measurable with


respect to (W, Y ), where W is F2-measurable and Y is F3 -measurable, then, according to
the following theorem, V is measurable with respect to F3 .

Theorem 4.38 [Measurable Map of Y and W ]
Let ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) be a regular causality setup, let W be F2-measurable,
and V a measurable map on (Ω, A ) that is not F2-measurable. Then

σ(V ) ⊂ σ(W, Y ) ⇒ σ(V ) ⊂ F3 . (4.38)


(Proof p. 111)

Under the assumptions of Theorem 4.38, a (W,Y )-measurable map V on (Ω, A ) that is
not F2-measurable is posterior in (Ft )t ∈T to X .

Corollary 4.39 [X Is Prior to V ]


Let the assumptions of Theorem 4.38 hold and let FT denote the filtration (Ft )t ∈T . Then

σ(V ) ⊂ σ(W, Y ) ⇒ X ≺FT V .    (4.39)
(Proof p. 111)

According to RS-Lemma 2.34, the premise σ(V ) ⊂ σ(W, Y ) holds for the composition of
(W, Y ) and a measurable map g . Hence, the propositions of Theorem 4.38 and Corollary
4.39 apply to such a composition g (W, Y ).

Corollary 4.40 [X Is Prior to g (W, Y )]
Let ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) be a regular causality setup, let FT denote the filtra-
tion (Ft )t ∈T , and let W : (Ω, A ) → (ΩW′ , AW′ ) be measurable with respect to F2. Fur-
thermore, assume that g : (ΩY′ × ΩW′ , AY′ ⊗ AW′ ) → (Ωg′ , Ag′ ) is a measurable map and
σ(g (W, Y )) 6⊂ F2. Then
σ(g (W, Y )) ⊂ F3    (4.40)

and

X ≺FT g (W, Y ).    (4.41)

Hence, if the composition g (W, Y ) is not measurable with respect to the σ-algebra F2,
then g (W, Y ) is F3 -measurable and X is prior in FT to g (W, Y ).

Example 4.41 [Linear Combination of W and Y ] A special case of a map g (W, Y ) men-
tioned in Corollary 4.40 that is not F2-measurable is a linear combination g (W, Y ) = αW +
βY , α, β ∈ R , provided that β 6= 0. Note that in Definition 4.11 (ii) we assume σ(Y ) 6⊂ F2,
which implies σ(Y ) 6= {Ω, Ø}. ⊳

Example 4.42 [Change Score Variable] A special case of a linear combination of W and Y
mentioned in Example 4.41 is the difference variable g (W, Y ) = Y −W , where W assesses
the same attribute as Y, only before treatment (see sect. 2.2). Again, remember the require-
ment σ(Y ) 6⊂ F2 [see Def. 4.11 (ii)], which implies σ(Y ) 6= {Ω, Ø}. ⊳

Example 4.43 [Residual] Another special case of a function g (W, Y ) mentioned in Corol-
lary 4.40 that is not F2-measurable is the residual g (W, Y ) = Y − E (Y |W ) of Y with respect
to its W -conditional expectation. [Here, we additionally assume that P is a probability
measure on (Ω, A ) (see RS-sect. 4.5).] Of course, the requirements for Y mentioned in the
last examples still stand. ⊳
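
The constructions of Examples 4.41 to 4.43 are easily carried out numerically. The following Python sketch uses hypothetical data (W a pretest, Y a posttest, each outcome with an explicit probability) and computes the change score Y − W as well as the residual Y − E (Y |W ), evaluating E (Y |W ) as the probability-weighted mean of Y on each set {W = w }. The numbers and names are illustrative only.

# Hypothetical outcomes w = (pretest W, posttest Y) with probabilities P({w}).
P = {(2.0, 3.0): .2, (2.0, 5.0): .3, (4.0, 4.0): .1, (4.0, 6.0): .4}
W = lambda w: w[0]
Y = lambda w: w[1]

def E_Y_given_W(w):
    """Value of (a version of) E(Y | W) at w: probability-weighted mean of Y on {W = W(w)}."""
    block = {v: p for v, p in P.items() if W(v) == W(w)}
    return sum(Y(v) * p for v, p in block.items()) / sum(block.values())

for w in P:
    change_score = Y(w) - W(w)               # outcome variable Y - W   (Example 4.42)
    residual     = Y(w) - E_Y_given_W(w)     # outcome variable Y - E(Y|W)   (Example 4.43)
    print(w, change_score, round(residual, 2))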
The conjunction of Definition 4.11, Remark 4.9, and Corollary 4.40 implies the following
corollary.

Corollary 4.44 [A New Regular Causality Setup]


Let the assumptions of Corollary 4.40 hold. Then
((Ω, A ), (Ft )t ∈T , C , DC , X , g (W, Y ))    (4.42)

is a regular causality setup.

According to this corollary, X is a putative cause variable of the outcome variable
g (W, Y ), provided that the assumptions of Corollary 4.40, including σ(g (W, Y )) 6⊂ F2, hold.
Hence, according to Example 4.41,

((Ω, A ), (Ft )t ∈T , C , DC , X , αW + βY )    (4.43)

is a regular causality setup, provided that β 6= 0. Furthermore, according to Example 4.42,


((Ω, A ), (Ft )t ∈T , C , DC , X , Y −W )    (4.44)

is a regular causality setup. Finally, according to Example 4.43,


((Ω, A, P ), (Ft )t ∈T , C , DC , X , Y −E (Y |W ))    (4.45)

is a regular probabilistic causality setup. Note that in all of these setups we assume that Y
is not measurable with respect to F2, which implies σ(Y ) 6= {Ω, Ø}.

4.4 Summary and Conclusions

In this chapter, we specified the mathematical structure, and formulated the assumptions
under which we can define causal effects. The most important concepts are gathered in
Box 4.1 and their properties are summarized in Box 4.2.
The fundamental concept is a regular causality space ((Ω, A ), (Ft )t ∈T , C , DC ), which
consists of a measurable space (Ω, A ), a filtration (Ft )t ∈T , a cause σ-algebra C and a con-
founder σ-algebra DC . None of these concepts involves a probability measure. However,
in the context of a probability measure P on (Ω, A ), a measurable space (Ω, A ) is the math-
ematical framework for any statement about events, their probabilities, and their (causal
or noncausal) dependencies. It is also the framework for the definition of random vari-
ables, their distributions, and their (causal or noncausal) dependencies (see, e. g., Steyer &
Nagel, 2017).
A filtration (Ft )t ∈T in A allows to define time order among elements of A , among sub-
sets of A , and among measurable maps on (Ω, A ) (see ch. 3). In the context of a probability

Box 4.1 Glossary of new concepts

((Ω,A ),(Ft )t ∈T ,C , DC )    Regular causality space. It consists of a measurable space,
a filtration on A , and two σ-algebras C and DC on A .
(Ω,A ) Measurable space. (Ω,A ) is assumed to be the product of
the measurable spaces (Ωt ,At ), T = {1,2,3}. For all t ∈T
and all ωt ∈ Ωt , we assume {ωt } ∈ At .
(Ft )t ∈T Filtration (Ft )t ∈T in A . It consists of the σ-algebras F1
= σ(π1 ), F2 = σ(π1,π2 ), and F3 = σ(π1,π2,π3 ), where πt ,
t ∈T , are the projections πt : Ω → Ωt defined by πt (ω) = ωt ,
for all ω ∈ Ω. The projection π2 may itself consist of several
projections, that is, π2 = (π2j , j ∈ J ), J ⊂ N.
C Cause σ-algebra. For Ø 6= K ⊂ J it is defined by
C = σ(π2j , j ∈ K ) .

DC Confounder σ-algebra of C . It is defined by


DC = σ(π1,π2j , j ∈ J \ K ).
((Ω,A ),(Ft )t ∈T ,C , DC , X ,Y )    Regular causality setup. It consists of a regular causality
space, a putative cause variable X , and an outcome vari-
able Y , which are measurable maps on (Ω,A ).
X Putative cause variable. By definition, it satisfies
(a) {Ω,Ø} 6= σ(X ) ⊂ C
(b) ¬∃ K 0 ⊂ K , K 0 6= K ,such that σ(X ) ⊂ σ(π2j , j ∈ K 0 ).

Y Outcome variable or response variable. It is defined by


σ(Y ) 6⊂ F2 .
In the framework of a regular causality space this implies
that Y is measurable with respect to F3 .
W Potential confounder of X or covariate of X . It is a measur-
able map on (Ω,A ) satisfying σ(W ) ⊂ DC .
DX Global potential confounder of X . A potential confounder
of X satisfying σ(D X ) = DC .
((Ω,A ),(Ft )t ∈T ,C , DC ) | Ω0    Restriction of a regular causality space to Ω0 ⊂ Ω. It is
                                   a new regular causality space in which all σ-algebras of
                                   ((Ω,A ),(Ft )t ∈T ,C , DC ) are restricted to Ω0 .
((Ω,A ),(Ft )t ∈T ,C , DC , X ,Y ) | Ω0    Restriction of a regular causality setup to Ω0 ⊂ Ω. It is a reg-
                                   ular causality setup in which we consider the restrictions
                                   of X and Y to Ω0 , and ((Ω,A ),(Ft )t ∈T ,C , DC ) | Ω0 .

space (Ω, A, P ), this is tantamount to time order among events, among sets of events, and
among random variables, respectively. Such a time order is indispensable for a formaliza-
tion of the intuitive idea that a cause precedes its outcome. The filtration (Ft )t ∈T consists
of three σ-algebras. Their substantive interpretation is that, from the perspective of the
cause, F1 represents the set of all past events, F2 the set of all past and present events, and
F3 the set of all past, present, and future events.
The third and fourth components of a regular causality space are the cause σ-algebra
C and its confounder σ-algebra DC . In the context of a probability space and from the
perspective of a focused putative cause variable X , the σ-algebra C represents the set of
all present events with respect to which a putative cause variable X is measurable. In con-
trast, DC consists of all past and all other present events. They are the events that might
confound or disturb the stochastic dependency of the outcome variable Y on the putative
cause variable X .
Adding a putative cause variable X and an outcome variable Y to a regular causality
space constitutes a regular causality setup ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ). By definition, the
putative cause variable X satisfies two conditions. According to condition (a), it is C -
measurable and its generated σ-algebra is not trivial. The latter would be the case if X
would be a constant map. According to the condition (b), there is no family of projections
(π2j , j ∈ K 0 ), K 0 ⊂ K , such that X is measurable with respect to this family of projections.
This condition secures that the confounder σ-algebra DC is chosen such that all other pu-
tative cause variables that are simultaneous to X are measurable with respect to DC .
By definition, the outcome variable Y is not measurable with respect to F2. Hence, an
outcome variable Y can either solely depend on π3 or depend on π3 and one or more of the
projections π1 and π2. This allows, for instance, to consider pre-post difference variables
(‘pre’ and ‘post’ with respect to the focused putative cause variable) as outcome variables
(see Examples 4.41 to 4.43).
Furthermore, we introduced the concepts of a potential confounder and a global po-
tential confounder of a putative cause variable X . By definition, a potential confounder W
of a putative cause variable X is a nonconstant DC -measurable map, and a global potential
confounder D X of X is a potential confounder that generates the confounder σ-algebra DC .
Hence, all potential confounders are measurable with respect to D X .
Last but not least, in Theorem 4.22 we showed that the restriction ((Ω, A ), (Ft )t ∈T , C , DC ) | Ω0
of a regular causality space to Ω0 ⊂ Ω is a new regular causality space in which all
σ-algebras involved in ((Ω, A ), (Ft )t ∈T , C , DC ) are restricted to Ω0 . Adding the restrictions
of the maps X and Y to Ω0 then yields ((Ω, A ), (Ft )t ∈T , C , DC , X , Y ) | Ω0 , the restriction of
a regular causality setup to Ω0 , which is the formal framework in which we can consider
putative cause variables and outcome variables that are measurable maps on Ω0 (see the
Examples in sect. 4.2.1 and 4.2.2).

4.5 Proofs

Proof of Theorem 4.22

Remember, (Ω, A ) is the product of the measurable spaces (Ωt , At ), t ∈ T . Hence, first we
show that (Ω0 , A | Ω0 ) is the product of the measurable spaces (Ω0t , At | Ω0t ), t ∈ T . That is,
we show A | Ω0 = A1 | Ω01 ⊗ A2 | Ω02 ⊗ A3 | Ω03 . Now consider the set system

Box 4.2 Properties of a regular causality setup and potential confounders

All propositions gathered in this box refer to the same regular causality setup
((Ω,A ),(Ft )t ∈T ,C , DC , X ,Y ), and throughout this box, W denotes a potential confounder
of X , which, by definition, satisfies σ(W ) ⊂ DC .

C ∩ DC = {Ω,Ø} (i)
C 6⊂ F1 (ii)
C ⊂ F2 (iii)
DC ⊂ F2 (iv)

σ(X ) ∩ σ(W ) = {Ω,Ø} (v)


σ(X ) ⊂ F2 (vi)
σ(X ) 6⊂ F1 (vii)
σ(X ) 6⊂ DC (viii)

σ(Y ) ⊂ F3 (ix)
∀ t ∈ {1,2}: σ(Y ) 6⊂ Ft (x)
σ(Y ) 6⊂ DC (xi)

σ(W ) ⊂ F2 (xii)
σ(W ) 6⊂ σ(X ) (xiii)
W ≼FT X .    (xiv)

If V is a measurable map on (Ω,A ) that is not F2-measurable, then

σ(V ) ⊂ σ(W,Y ) ⇒ (a) σ(V ) ⊂ F3 (xv)


(b) X ≺FT V    (xvi)
(c) V is an outcome variable. (xvii)

If Z is a measurable map on (Ω,A ), then

{Ω,Ø} 6= σ(Z ) ⊂ σ(W ) ⇒ Z is a potential confounder of X .    (xviii)

E 0 := { A 01 × A 02 × A 03 : A 0t ∈ At | Ω0t , t ∈ T }
    = { A 01 × A 02 × A 03 : A 0t ∈ {Ω0t ∩ A t : A t ∈ At }, t ∈ T }          [(4.5)]
    = { (Ω01 ∩ A 1 ) × (Ω02 ∩ A 2 ) × (Ω03 ∩ A 3 ): A t ∈ At , t ∈ T }        [A 0t = Ω0t ∩ A t ]
    = { (Ω01 × Ω02 × Ω03 ) ∩ (A 1 × A 2 × A 3 ): A t ∈ At , t ∈ T }
    = { Ω0 ∩ (A 1 × A 2 × A 3 ): A t ∈ At , t ∈ T } .                          [Ω0 = Ω01 × Ω02 × Ω03 ]

According to RS-Equation (1.10), the set system E 0 is a generating system of the product
σ-algebra A1| Ω01 ⊗ A2| Ω02 ⊗ A3 | Ω03 . Hence, using this fact and the last equation yields

A1 | Ω01 ⊗ A2 | Ω02 ⊗ A3 | Ω03
    = σ(E 0 )                                                   [RS-Eq. (1.10)]
    = σ({ Ω0 ∩ (A 1 × A 2 × A 3 ): A t ∈ At , t ∈ T })          [E 0 = { Ω0 ∩ (A 1 × A 2 × A 3 ): A t ∈ At , t ∈ T }]
    = σ({ Ω0 ∩ A : A ∈ A1 ⊗ A2 ⊗ A3 })                          [RS-(1.3), { A 1 × A 2 × A 3 : A t ∈ At } ⊂ A1 ⊗ A2 ⊗ A3 ]
    = { Ω0 ∩ A : A ∈ A1 ⊗ A2 ⊗ A3 }                             [RS-(1.6)]
    = { Ω0 ∩ A : A ∈ A }                                        [Def. 4.6, RS-Eq. (1.10)]
    = A | Ω0 .                                                  [(4.5)]

Hence, A | Ω0 is in fact the product of the σ-algebras At| Ω0t , t ∈T , as required in the defini-
tion of a regular causality space.
Next we show that (Ft | Ω0 )t ∈T is a filtration in A | Ω0 .

F1 ⊂ F2 ⊂ F3
⇒ { Ω0 ∩ A : A ∈ F1 } ⊂ { Ω0 ∩ A : A ∈ F2 } ⊂ { Ω0 ∩ A : A ∈ F3 }
⇔F1 | Ω0 ⊂ F2 | Ω0 ⊂ F3 | Ω0 . [(4.5)]
The filtration (Ft | Ω0 )t ∈T satisfies Equations (4.9), which can be seen as follows:

F1 | Ω0 = { Ω0 ∩ A : A ∈ F1 }                                                     [(4.5)]
        = { A 0 : A 0 ∈ F1 | Ω0 }                                                 [A 0 := Ω0 ∩ A ]
        = { {ω ∈ Ω0 : π1 | Ω0 (ω) ∈ A 0 }: A 0 ∈ F1 | Ω0 }                         [(4.8)]
        = { (π1 | Ω0 )− 1 (A 1 ): A 1 ∈ A1 | Ω01 }                                 [RS-(2.1)]
        = σ(π1 | Ω0 ) .                                                            [RS-(2.11), (2.12)]

F2 | Ω0 = { Ω0 ∩ A : A ∈ F2 }                                                     [(4.5)]
        = { A 0 : A 0 ∈ F2 | Ω0 }                                                 [A 0 := Ω0 ∩ A ]
        = { {ω ∈ Ω0 : (π1 | Ω0 , π2 | Ω0 )(ω) ∈ A 0 }: A 0 ∈ F2 | Ω0 }             [(4.8)]
        = { (π1 | Ω0 , π2 | Ω0 )− 1 (A 12 ): A 12 ∈ A1 | Ω01 ⊗ A2 | Ω02 }           [RS-(2.1)]
        = σ(π1 | Ω0 , π2 | Ω0 ) .                                                  [RS-(2.11), (2.12)]

F3 | Ω0 = { Ω0 ∩ A : A ∈ F3 }                                                     [(4.5)]
        = { A 0 : A 0 ∈ F3 | Ω0 }                                                 [A 0 := Ω0 ∩ A ]
        = { {ω ∈ Ω0 : (π1 | Ω0 , π2 | Ω0 , π3 | Ω0 )(ω) ∈ A 0 }: A 0 ∈ F3 | Ω0 }   [(4.8)]
        = { (π1 | Ω0 , π2 | Ω0 , π3 | Ω0 )− 1 (A 123 ): A 123 ∈ A1 | Ω01 ⊗ A2 | Ω02 ⊗ A3 | Ω03 }   [RS-(2.1)]
        = σ(π1 | Ω0 , π2 | Ω0 , π3 | Ω0 ) .                                        [RS-(2.11), (2.12)]

Now we show that C | Ω0 is a cause σ-algebra. Let Ø 6= K ⊂ J . Then

C | Ω0 = { Ω0 ∩ A : A ∈ C }                                   [(4.5)]
       = { Ω0 ∩ A : A ∈ σ(π2j , j ∈ K ) }                     [Def. 4.6 (i)]
       = { A 0 : A 0 ∈ σ(π2j | Ω0 , j ∈ K ) }                 [A 0 = Ω0 ∩ A ]
       = σ(π2j | Ω0 , j ∈ K ) .

Finally, we show that DC | Ω0 is the confounder σ-algebra of C | Ω0 .

DC | Ω0 = { Ω0 ∩ A : A ∈ DC }                                         [(4.5)]
        = { Ω0 ∩ A : A ∈ σ(π1 , π2j , j ∈ J \ K ) }                   [Def. 4.6 (ii)]
        = { A 0 : A 0 ∈ σ(π1 | Ω0 , π2j | Ω0 , j ∈ J \ K ) }          [A 0 = Ω0 ∩ A ]
        = σ(π1 | Ω0 , π2j | Ω0 , j ∈ J \ K ) .

Proof of Theorem 4.25

Equation (4.22). Note that K ∩ ({1} ∪ J \ K ) = Ø holds for the sets used in Definition 4.6.
Hence,

C ∩ DC = σ(π2j , j ∈ K ) ∩ σ(π1, π2j , j ∈ J \ K ) [Def. 4.6 (i), (ii)]


= {Ω, Ø} . [RS-Eq. (2.39)]

Proposition (4.23). The proof is by contradiction.

C ⊂ F1 ⇔ C ⊂ σ(π1) [(4.1)]
⇒ C ⊂ DC , [(3.16), Def. 4.6 (ii)]

which is a contradiction to Proposition (4.22) because C 6= {Ω, Ø} [see Def. 4.6 (i)].
Proposition (4.24). We assume Ø 6= K ⊂ J (see Def. 4.6). Hence,

C = σ(π2j , j ∈ K ) [Def. 4.6 (i)]


⇒ C ⊂ σ(π2j , j ∈ J ) [(3.16)]
⇔ C ⊂ σ(π2) [Def. 4.6]
⇒ C ⊂ F2 . [(3.16), (4.1)]

Proposition (4.25).

DC = σ(π1, π2j , j ∈ J \ K ) [Def. 4.6 (ii)]


⇒ DC ⊂ σ(π1, π2j , j ∈ J ) [(3.16)]
⇔ DC ⊂ σ(π1, π2 ) [Def. 4.6]
⇔ DC ⊂ F2 . [(4.1)]

Proof of Theorem 4.29

Proposition (4.29). In Definition 4.11 (i), we assume σ(X ) 6= {Ω, Ø}. Hence, σ(X ) ⊂ DC
would be a contradiction to Equation (4.26).
Proposition (4.30).

σ(X ) ⊂ C [Def. 4.11 (i)]


⇒ σ(X ) ⊂ σ(π1 ) ∪ σ(π2) [Def. 4.6 (i), (3.16)]
⇒ σ(X ) ⊂ σ(σ(π1 ) ∪ σ(π2 ))    [RS-(1.6), RS-(1.7)]
⇔ σ(X ) ⊂ σ(π1 , π2 ) [(3.20)]
⇔ σ(X ) ⊂ F2 . [(4.1)]

Proposition (4.31). The proof is by contradiction.

σ(X ) ⊂ F1 ⇔ σ(X ) ⊂ σ(π1) [(4.1)]


⇒ σ(X ) ⊂ DC , [(3.16), Def. 4.6 (ii)]

which is a contradiction to Proposition (4.29) because we assume σ(X ) 6= {Ω, Ø} [see


Def. 4.11 (i)].

Proof of Theorem 4.30

Proposition (4.32).

σ(W ) ⊂ DC [Def. 4.11 (iv)]


⇒ σ(W ) ⊂ σ(π1, π2) [Def. 4.6 (ii), (3.16)]
⇔ σ(W ) ⊂ F2 . [(4.1)]

Proposition (4.33). The proof is by contradiction. Hence, additionally to σ(W ) ⊂ DC


and σ(W ) 6= {Ω, Ø} [see Def. 4.11 (iv)], we assume σ(W ) ⊂ σ(X ). Then
(σ(W ) ⊂ DC ) ∧ (σ(W ) ⊂ σ(X )) ⇔ σ(W ) ⊂ DC ∩ σ(X )    [(3.14)]
⇒ σ(W ) ⊂ {Ω, Ø}, [(4.26)]

which is a contradiction to the assumption σ(W ) 6= {Ω, Ø}.

Proof of Theorem 4.31

Proposition (4.34). According to Propositions (4.30) and (4.31), σ(X ) ⊂ F2 and σ(X ) 6⊂ F1 .
Hence,

V ≺FT X ⇒ σ(V ) ⊂ F1    [(4.30), (4.31), Def. 3.3]

⇔ σ(V ) ⊂ σ(π1) [(4.1)]


⇒ σ(V ) ⊂ DC . [Def. 4.6 (ii), (3.16)]

Proposition (4.35).

σ(V ) ⊂ DC
⇒ σ(V ) ⊂ σ (π1, π2 ) [Def. 4.6 (ii), (3.16)]
⇔ σ(V ) ⊂ F2 [(4.1)]
⇒ σ(V ) ⊂ F1 ∨ (σ(V ) ⊂ F2 ∧ σ(V ) 6⊂ F1 )    [F1 ⊂ F2 ]
⇒ σ(V ) ≺FT σ(X ) ∨ σ(V ) ≈FT σ(X )           [Def. 3.3, F1 6⊃ σ(X ) ⊂ F2 , Def. 3.24]
⇔ V ≼FT X .                                    [Def. 3.37 (ii)]

Proof of Corollary 4.34

{Ω, Ø} 6= σ(V ) ⊂ σ(W )


⇒ {Ω, Ø} 6= σ(V ) ⊂ DC [σ(W ) ⊂ DC , (3.16)]
⇔ V is a potential confounder of X . [Def. 4.11 (iv)]

Proof of Theorem 4.35

The property σ(Y ) 6⊂ F2 is required in Definition 4.11 (ii). Furthermore,


σ(Y ) ⊂ A [Def. 4.11, SN-Cor. 2.28]
⇔ σ(Y ) ⊂ A1 ⊗ A2 ⊗ A3 [Def. 4.6, RS-Def. 1.15]
⇔ σ(Y ) ⊂ σ(π1, π2 , π3 ) [RS-Th. 2.30]
⇔ σ(Y ) ⊂ F3 . [(4.1)]

Proof of Theorem 4.38

σ(W ) ⊂ F2 ∧ σ(Y ) ⊂ F3 [(4.36)]


⇒ σ(W ) ⊂ F3 ∧ σ(Y ) ⊂ F3 [F2 ⊂ F3 , (3.16)]
⇒ (σ(Y ) ∪ σ(W )) ⊂ F3    [(3.15)]
⇔ σ(W, Y ) ⊂ F3 [(3.20), (3.21)]
⇒ σ(V ) ⊂ F3 . [σ(V ) ⊂ σ(W, Y ), (3.16)]

Proof of Corollary 4.39

According to Proposition (4.30), σ(X ) ⊂ F2 and according to Theorem 4.38, σ(V ) ⊂ F3 .


However, this implies X ≺FT V (see Def. 3.3).

4.6 Exercises

⊲ Exercise 4-1 Write down the set of all values of the bivariate projection (π1,π2 ) occurring in Equa-
tion (4.2).

⊲ Exercise 4-2 Consider Table 4.1 and gather in a set all elements of the σ-algebra σ(π1 ), using RS-
Equations (2.1), (2.11), and (2.12).

⊲ Exercise 4-3 Consider Table 4.1 and write down the product σ-algebra A1 ⊗ A2 with all its ele-
ments. The σ-algebras A1 and A2 are specified in Example 4.10 to be the power sets of Ω1 and Ω2 ,
respectively. Use RS-Equation (1.10) as well as RS-Definitions 1.4 and 1.7.

⊲ Exercise 4-4 Consider Table 4.1 and gather in two sets all elements of the two inverse images
(π1,π2) − 1 ({( Joe ,no )}) and (π1,π2 ) − 1 ({( Joe ,no ),(Ann ,no )}), using RS-Equation (2.1).

⊲ Exercise 4-5 In the context of a probability space (Ω,A,P), what is the substantive interpretation
of the sets {ω1 ,ω2,ω5 ,ω6 } and {ω3 ,ω4 ,ω7 ,ω8 } occurring in the set C specified in Equation 4.3?

⊲ Exercise 4-6 Enumerate all elements of the σ-algebra generated by the map X : Ω → R speci-
fied in Table 4.1, using σ(X ) = X − 1 (B) [see RS-Eqs. (2.11) and (2.12)], where B denotes the Borel
σ-algebra on R .

⊲ Exercise 4-7 The σ-algebra C in Equation (4.14) is identical to σ(π2 ) = {π2− 1 (A 2 ): A 2 ∈ A2 },
where A2 = P ({a,b,c }) [see RS-Eqs. (2.11) and (2.12)]. Write down the power set P ({a,b,c }), list-
ing its elements in the sequence corresponding to the sequence of the elements in C .

⊲ Exercise 4-8 Gather in a set of all elements of the σ-algebra σ(1A ) generated by the indicator vari-
able 1A : Ω → {0,1} specified in Equation (4.15). Use the power set of {0,1} as the σ-algebra on {0,1},
and RS-Equations (2.1), (2.11), and (2.12).

⊲ Exercise 4-9 Check conditions (a) to (c) of RS-Definition 1.4 for the set in the last displayed line
of Equation (4.18).

⊲ Exercise 4-10 Consider Table 4.3 and specify the two subsets of Ω for the restriction of the regular
causality space ((Ω,A ),(Ft )t ∈T ,C 3 , DC 3 ) within which we can compare
(a) receiving only treatment a to receiving only treatment b
(b) receiving both treatments to receiving only one single treatment.

⊲ Exercise 4-11 Consider the example presented in Table 1.5 and specify the measurable map
W : Ω → {male,female } by assigning to each element of Ω the values male or female according to
the sex of the person to be sampled. Then specify W once again as the composition of the projection
π1 : Ω → Ω1 and a map g : Ω1 → {male,female }.

Solutions
⊲ Solution 4-1 The set of all values of (π1,π2 ) is {( Joe ,no ), ( Joe ,yes ), (Ann ,no ), (Ann ,yes )}.
⊲ Solution 4-2 In this example, Ω1 = { Joe , Ann } and A1 = { { Joe }, {Ann }, Ω1 , Ø } is the power set of
Ω1 (see Example 4.10). Hence,

σ(π1 ) = {π1− 1 (A 1 ): A 1 ∈ A1 }                                              [RS-(2.11), RS-(2.12)]
       = {π1− 1 ({ Joe }), π1− 1 ({Ann }), π1− 1 (Ω1 ), π1− 1 (Ø)}              [A1 = P (Ω1 )]
       = { {ω1 ,... ,ω4 }, {ω5 ,... ,ω8 }, Ω, Ø } .                             [RS-(2.1), Table 4.1]
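
This enumeration can be mirrored programmatically: σ(π1 ) is the set of preimages π1− 1 (A 1 ) for all A 1 in the power set of Ω1 . A minimal Python sketch, with ω1 , ..., ω8 encoded as triples (person, treatment, outcome) as in Table 4.1, is the following.

from itertools import product, chain, combinations

Omega = list(product(['Joe', 'Ann'], ['no', 'yes'], ['-', '+']))   # omega_1, ..., omega_8
pi1 = lambda w: w[0]

Omega1 = ['Joe', 'Ann']
powerset_Omega1 = chain.from_iterable(combinations(Omega1, r) for r in range(len(Omega1) + 1))

sigma_pi1 = {frozenset(w for w in Omega if pi1(w) in A1) for A1 in powerset_Omega1}
for event in sigma_pi1:
    print(sorted(event))
# yields the empty set, {omega_1, ..., omega_4}, {omega_5, ..., omega_8}, and Omega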

⊲ Solution 4-3 In Example 4.10, Ω1 = { Joe , Ann } and

A1 = { { Joe }, {Ann }, Ω1 , Ø }

is the power set of Ω1 . Similarly, Ω2 = {no ,yes } and

A2 = { {no }, {yes }, Ω2 , Ø }

is the power set of Ω2 . Hence,


A1 ⊗ A2 = σ({ A 1 × A 2 : A 1 ∈ A1 , A 2 ∈ A2 })                                  [RS-(1.10)]
        = σ({ {( Joe ,no )}, {( Joe ,yes )}, {(Ann ,no )}, {(Ann ,yes )},
              {( Joe ,no ),( Joe ,yes )}, {(Ann ,no ),(Ann ,yes )},
              {( Joe ,no ),(Ann ,no )}, {( Joe ,yes ),(Ann ,yes )},
              {( Joe ,no ),( Joe ,yes ),(Ann ,no ),(Ann ,yes )} })                [def. of A 1 × A 2 ]
        = { {( Joe ,no )}, {( Joe ,yes )}, {(Ann ,no )}, {(Ann ,yes )},
            {( Joe ,no ),( Joe ,yes )}, {( Joe ,no ),(Ann ,no )},
            {( Joe ,no ),(Ann ,yes )}, {( Joe ,yes ),(Ann ,no )},
            {( Joe ,yes ),(Ann ,yes )}, {(Ann ,no ),(Ann ,yes )},
            {( Joe ,no ),( Joe ,yes ),(Ann ,no )},
            {( Joe ,no ),( Joe ,yes ),(Ann ,yes )},
            {( Joe ,yes ),(Ann ,no ),(Ann ,yes )},
            {( Joe ,no ),(Ann ,no ),(Ann ,yes )}, Ω1 × Ω2 , Ø } .                 [RS-Defs. 1.4, 1.7]

⊲ Solution 4-4 The two inverse images are

(π1,π2 )− 1 ({( Joe ,no )}) = {ω ∈ Ω: (π1,π2 )(ω) ∈ {( Joe ,no )}} = {ω1 ,ω2 }

and

(π1,π2 )− 1 ({( Joe ,no ),(Ann ,no )}) = {ω ∈ Ω: (π1,π2 )(ω) ∈ {( Joe ,no ),(Ann ,no )}} = {ω1 ,ω2 ,ω5 ,ω6 }.

⊲ Solution 4-5 In the context of a probability space (Ω,A,P), the set {ω1 ,ω2 ,ω5 ,ω6 } is the event
that the sampled person receives treatment and {ω3 ,ω4 ,ω7 ,ω8 } is the event that it does not receive
treatment.
⊲ Solution 4-6 Although there is an uncountably infinite number of elements B in the Borel σ-al-
gebra B on the set R of real numbers (see RS-Rem. 1.14), there are only four different inverse images
X − 1 (B) of sets B ∈ B under X because

∀ B ∈ B :   X − 1 (B) =   {ω3 ,ω4 ,ω7 ,ω8 },  if 0 ∉ B and 1 ∈ B
                          {ω1 ,ω2 ,ω5 ,ω6 },  if 0 ∈ B and 1 ∉ B
                          Ω,                  if 0 ∈ B and 1 ∈ B
                          Ø,                  if 0 ∉ B and 1 ∉ B.

These four inverse images are the elements of the σ-algebra σ(X ) = X − 1 (B) generated by X [see
RS-Eqs. (2.11) and (2.12)]. Because, in this example, X is binary, σ(X ) = X − 1 (B) = X − 1 (P ({0,1})) .

⊲ Solution 4-7 A2 = P ({a,b,c }) = { {a }, {b }, {c }, {a,b }, {a,c }, {b,c }, Ω2 , Ø } .

⊲ Solution 4-8

σ(1A ) = {1A− 1 (C ): C ∈ P ({0,1})}                                          [RS-(2.11), RS-(2.12)]
       = {1A− 1 ({0}), 1A− 1 ({1}), 1A− 1 ({0,1}), 1A− 1 (Ø)}                  [P ({0,1}) = {{0},{1},{0,1},Ø}]
       = { {ω1 ,ω2 ,ω7 ,ω8 }, {ω3 ,... ,ω6 ,ω9 ,... ,ω12 }, Ω, Ø } .           [RS-(2.1), Table 4.2]

⊲ Solution 4-9 Condition (a) of RS-Definition 1.4 holds for the set C | Ωab because Ωab ∈ C | Ωab .
Condition (b) also holds because the complement C c = Ωab \ C of each set C ∈ C | Ωab is an element
of C | Ωab as well. Finally, Condition (c) of RS-Definition 1.4 holds too. For each sequence C 1 ,C 2 ,...
of elements of C | Ωab , the union of these elements is an element of C | Ωab as well. Examples of such
sequences are
{ω1 ,ω2 ,ω7 ,ω8 }, Ø, Ø, ...
{ω1 ,ω2,ω7 ,ω8 }, {ω3 ,ω4 ,ω9 ,ω10 }, Ωab , Ωab ,...
and
{ω1 ,ω2,ω7 ,ω8 }, {ω3 ,ω4 ,ω9 ,ω10 }, Ωab ,Ø, Ø, ... .
The union of the elements of such sequences is always itself an element of C | Ωab .

⊲ Solution 4-10 For comparison (a) we have to use the restriction of the regular causality space
((Ω,A ),(Ft )t ∈T ,C 3 , DC 3 ) to Ω(a) = {ω3 ,... ,ω6 , ω11 ,... ,ω14 }, and for comparison (b) we have to use
the restriction of ((Ω,A ),(Ft )t ∈T ,C 3 , DC 3 ) to Ω(b) = {ω3 ,... ,ω8 , ω11 ,... ,ω16 }.

⊲ Solution 4-11 The potential confounder W : Ω → {male,female } can be defined by


W (ω) =  male,    if ω ∈ {Tom ,... , Jim } × Ω2 × Ω3
         female,  if ω ∈ {Ann ,... ,Mia } × Ω2 × Ω3 .

Alternatively, we may define W as the composition W = g (π1) of the first projection π1 and a map
g : Ω1 → {male,female }, where

π1(ω) = π1(ω1 ,ω2,ω3 ) = ω1 , ∀ (ω1 ,ω2 ,ω3 ) ∈ Ω1 ×Ω2 ×Ω3 ,

Ω1 = {Tom ,Tim ,... ,Mia } (see the first column of Table 1.5), and
g (ω1 ) =  male,    if ω1 ∈ {Tom ,... , Jim }
           female,  if ω1 ∈ {Ann ,... ,Mia }.
Chapter 5
True Outcome Variable and Causal Total Effects

In chapter 4, we introduced the notion of a regular causality setup, which is the formal
framework in which we can discuss time order of measurable maps (see ch. 3), define
putative cause variables, their potential confounders, outcome variables, and study their
mathematical properties. We also defined the concept of a global potential confounder of
a putative cause variable X , denoted D X . A global potential confounder of X comprises all
potential confounders of X in the sense that the σ-algebra generated by a potential con-
founder of X is a subset of the σ-algebra generated by D X (for a detailed summary see Box
4.1).
In chapter 4, we also mentioned that a regular causality setup is also called a regular
probabilistic causality setup if there is a probability measure P on the measurable space
(Ω, A ). In the framework of a probability space (Ω, A, P ), the measurable maps mentioned
above, including the putative cause variables, their potential confounders, and the out-
come variables are random variables on (Ω, A, P ) (see RS-Def. 2.2).
In this and all other chapters to come we assume that there is such a regular prob-
abilistic causality setup ((Ω, A, P ), (Ft )t ∈T , C , DC , X , Y ), which has all the properties of a
regular causality setup studied in chapter 4. In the framework of such a regular probabilis-
tic causality setup we can meaningfully define causal effects and dependencies among
two random variables X and Y on a probability space (Ω, A, P ).
We begin this chapter defining a true outcome variable τx = E X =x(Y |D X ) as a version
of the D X -conditional expectation of Y with respect to the (X =x )-conditional probabil-
ity measure P X =x on A (see RS-ch. 5), where x denotes a value of X for which we assume
P (X =x ) > 0. As mentioned above, with D X we condition on all potential confounders of X .
Although, in empirical applications, the values of such a true outcome variable are rarely
estimable, the expectation of τx as well as the conditional expectation of τx given a covari-
ate of X , can be estimated under appropriate and realistic assumptions.
Based on the concept of a true outcome variable we then define a true total effect vari-
able. The expectation of the true total effect variable is then defined to be the causal aver-
age total effect on Y comparing x to x ′ , where x ′ denotes a second value of X . Then we turn
to the definition of a causal conditional total effect of x compared to x ′ given the value z
of a random variable Z , and a causal Z -conditional total effect function comparing treat-
ment x to treatment x ′ . Each of these kinds of conditional total effects or effect functions
provides specific information that might be of interest in empirical causal research and
evaluation studies.
In the first place, these parameters and effect functions are of a purely theoretical na-
ture. However, in the next chapter we study how and under which assumptions these vari-
ous kinds of causal effects can be identified by empirically estimable parameters and how
the causal effect functions can be identified by empirically estimable functions.

Requirements

Reading this chapter requires that the reader is familiar with the contents of the first five
chapters of Steyer (2024), referred to as RS-chapters 1 to 5, the first two of which have al-
ready been required in chapters 3 and 4 of this book. RS-chapter 3 deals with the concepts
expectation, variance, covariance, and correlation, and RS-chapter 4 with the concept of a
conditional expectation. The most important one for the present chapter is RS-chapter 5,
introducing the concept of a conditional expectation with respect to the probability mea-
sure P X =x .
In this chapter, we will often refer to some of the following assumptions and notation.

Notation and Assumptions 5.1


(a) Let ((Ω, A, P ), (Ft )t ∈T , C , DC , X , Y ) be a regular probabilistic causality setup and
let D X denote a global potential confounder of X .
(b) Let (ΩX′ , AX′ ) denote the value space of X , let x ∈ ΩX′ , let {x } ∈ AX′ , and let 1X =x
denote the indicator of the event {X =x } = {ω ∈ Ω: X (ω) = x }.
(c) Let Y be real-valued with positive variance.
(d) Assume 0 < P (X =x ) < 1, define the probability measure P X =x : A → [0, 1] by
P X =x (A) = P (A | X =x ), for all A ∈ A, let E X =x (Y |D X ) denote a (version of the)
D X -conditional expectation of Y with respect to P X =x, and E X =x (Y |D X ) the set
of all such versions.
(e) Let Z be a random variable on (Ω, A, P ) and let (ΩZ′ , AZ′ ) denote its value space.
(f ) Let z ∈ ΩZ′ be a value of Z , let {z } ∈ AZ′ , assume P (Z =z) > 0, and define the pro-
bability measure P Z=z : A → [0, 1] by P Z=z (A) = P (A | Z =z), for all A ∈ A.
(g) Let x ′ ∈ ΩX′ and {x ′ } ∈ AX′ , let 1X =x ′ denote the indicator of the event {X =x ′ } =

{ω ∈ Ω: X (ω) = x ′ }, assume 0 < P (X =x ′ ) < 1, and let E X =x ′ (Y |D X ) denote a
(version of the) D X -conditional expectation of Y with respect to P X =x ′ , which is
defined analogously to P X =x in Assumption (d).
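
Assumptions (d) and (f) involve nothing more than elementary conditioning, P X =x (A) = P (A ∩ {X =x })/P (X =x ). The following Python sketch, with helper names of our own, constructs P X =x on a finite Ω from the probabilities of the singletons; the numbers are those reused in Table 5.1 below.

# P({omega_i}) on a finite Omega (the probabilities reused in Table 5.1).
P = {('Joe', 0, 0): .144, ('Joe', 0, 1): .336, ('Joe', 1, 0): .004, ('Joe', 1, 1): .016,
     ('Ann', 0, 0): .096, ('Ann', 0, 1): .024, ('Ann', 1, 0): .228, ('Ann', 1, 1): .152}
X = lambda w: w[1]

def P_given_X(x):
    """The conditional probability measure P^{X=x}, returned as a dict on the singletons."""
    p_x = sum(p for w, p in P.items() if X(w) == x)                  # P(X=x), assumed > 0
    return {w: (p if X(w) == x else 0.0) / p_x for w, p in P.items()}

P_X0, P_X1 = P_given_X(0), P_given_X(1)
print(round(P_X0[('Joe', 0, 0)], 2), round(P_X1[('Ann', 1, 0)], 2))  # 0.24 0.57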

5.1 True Outcome Variable

In this section, we introduce the concept of a (total effects) true outcome variable of Y
given the value x of X . This concept can be defined in the framework of a regular proba-
bilistic causality setup ((Ω, A, P ), (Ft )t ∈T , C , DC , X , Y ) if we assume P (X =x ) > 0. Although
this concept is not mandatory for all causality conditions (see, e.g., chs. 8 and 9), it is fun-
damental for several causality conditions and useful for others.
In the definition of a true outcome variable we consider a D X -conditional expecta-
tion of Y with respect to the conditional probability measure P X =x , denoted E X =x (Y |D X )
(see RS-Def. 5.4), where D X denotes a global potential confounder of X (see Def. 4.6).
Considering the term E X =x (Y |D X ) is tantamount to conditioning on the event {X =x } =
{ω ∈ Ω: X (ω) = x } and on all potential confounders of X , presuming P (X =x ) > 0.

Remark 5.2 [Intuitive Background] Conditioning on a global potential confounder, we


share John Stuart Mill’s idea already described in the preface. However, we make a slight

modification. Instead of comparing values of Y, we compare to each other the D X -condi-


tional expectations of Y given the values x and x ′ of the putative cause X .
The intuitive idea is most easily explained by an example. For instance, in Example 4.10
(see also RS-Tables 1.2 and 1.4), the projection π1 (with values Joe and Ann ) takes the role
of a global potential confounder of X . Now, suppose a person u may receive treatment
(X =1) or control (X =0) (e. g., no treatment or an alternative treatment). If there is a differ-
ence between the conditional expectation values E (Y | X =1,U =u ) and E (Y | X =0,U =u ),
then this difference is due to (i. e., caused by) the treatment variable X .
In this example, conditioning on a person u means that ‘everything else is invariant’, for
example, the severity of symptoms, the motivation for treatment, educational status, and
so on, which all are attributes of the person. Hence, this is a probabilistic version of the
ceteris paribus clause.1 Note that the treatment effect can be different for different values
of the global potential confounder, that is, in this example, it can be different for different
persons. ⊳

Remark 5.3 [Two Concepts] The intuitive idea of a true outcome variable outlined in Re-
mark 5.2 can mathematically be specified by conditional expectations E X =x (Y |D X ) with
respect to a conditional probability measure P X =x (see RS-section 5.1). If P (X =x ) > 0, then
this concept can be used to describe how a numerical random variable Y depends on a
random variable D X given the event {X =x } that X takes on the value x. The difference

E X =x (Y |D X ) − E X =x ′ (Y |D X )

then defines the true effect variable comparing the values x and x ′ to each other (see
Def. 5.18), provided that we assume that E X =x (Y |D X ) and E X =x ′ (Y |D X ) are P-unique (see
Def. 5.18), provided that we assume that E X =x (Y |D X ) and E X =x (Y |D X ) are P-unique (see
sect. 5.2 below).
Alternatively, we might use the partial conditional expectations E (Y | X =x , D X ) and
E (Y | X =x ′, D X ) (see RS-Def. 5.34) comparing conditions x and x ′ to each other. According
to RS-Theorem 5.40, if P (X =x ) > 0, then E (Y | X =x , D X ) is also a version of the conditional
expectation E X =x (Y |D X ) with respect to the conditional probability measure P X =x . Hence,
we may use both concepts and their corresponding notation for the definition of a true
outcome variable. However, E X =x (Y |D X ) is more convenient because we can immediately
utilize the (at least since Kolmogorov, 1933/1977) well-known properties of conditional
expectations, which are presented in some detail in RS-chapter 4. Reading the following
definition, remember that X (Ω) = {X (ω): ω ∈ Ω } denotes the image of Ω under the map X .

Definition 5.4 [True Outcome Variable ]


Let the Assumptions 5.1 (a) to (d) hold. Then

τx := E X =x (Y |D X ) (5.1)

is called a true outcome variable of Y given (the value) x (of X ) .

1 The idea of comparing conditional expectation values on the individual level is already found in Splawa-
Neyman (1923/1990). Later, it has been oversimplified by Rubin, who compares the values Y x (u) and Y x ′ (u)
of his potential outcome variables between two treatment conditions x and x ′ (see, e. g., Rubin, 1974, 2005).

Note that τx is a random variable on the probability space (Ω, A, P ), just as the putative
cause X , the outcome variable Y, and the global potential confounder D X of X . Only Y
has to be real-valued whereas X and D X may take on their values in any set ΩX′ and Ω D ′
X
,
respectively.

Remark 5.5 [Uniqueness of a True Outcome Variable τx ] According to Equation (5.1), a


true outcome variable τx denotes any element of the set E X =x (Y |D X ) of versions of the
D X -conditional expectation of Y with respect to the probability measure P X =x (see RS-
Rem. 5.8). Note that, in general, there are many elements in the set E X =x (Y |D X ). However,
if τx , τx∗ ∈ E X =x (Y |D X ), then τx =P X =x τx∗ , that is, the versions τx and τx∗ are identical P X =x -al-
most surely. Hence, a true outcome variable is uniquely defined up to almost sure identity
with respect to the measure P X =x. In this case we also say that τx is P X =x-unique. ⊳
Remark 5.6 [Total Effects] To emphasize, controlling for a global potential confounder
D X , we do not condition on potential mediators. This is why we may explicitly refer to
‘total effects’ if the context is ambiguous. ⊳
Remark 5.7 [A Caveat on Notation] The shortcut τx for (a version of) the conditional ex-
pectation E X =x (Y |D X ) is meaningful only in a context in which the references to a speci-
fied outcome variable Y, a specified putative cause variable X with its value x, and a global
potential confounder D X of X are unambiguous. ⊳
Remark 5.8 [Value of a True Outcome Variable] A value of a true outcome variable τx is
called the true (or expected) outcome of Y given the value x of X and the value d of D X or
given the event {X =x , D X =d }. According to Equation (5.1) and RS-Equation (5.25),

∀ω ∈ Ω : τx (ω) = E X =x (Y |D X )(ω) = E X =x (Y |D X =d ), if ω ∈ {D X =d }. (5.2)

Furthermore, if P (X =x , D X =d) > 0, then

E X =x (Y |D X =d ) = E (Y | X =x , D X =d), (5.3)

[see RS-Eq. (5.26)]. That is, a value of a true outcome variable τx is identical to a con-
ditional expectation value of Y given the value x of X and the value d of a global po-
tential confounder D X . Such a conditional expectation value is uniquely defined only if
P (X =x , D X =d ) > 0 (see RS-Rem. 4.33). ⊳
Remark 5.9 [The Values of τx Cannot be Observed] Because the values of τx are the con-
ditional expectation values E X =x (Y |D X =d ), they cannot be observed. In contrast, the val-
ues of X and Y can be observed, if the random experiment represented by (Ω, A, P ) is con-
ducted. In most applications, the values of τx even cannot be estimated (for more details,
see Rem. 5.13). ⊳
Remark 5.10 [ τx is D X -Measurable] Note that τx is a random variable on the probability
space (Ω, A, P ) that is measurable with respect to D X , that is, σ(τx ) ⊂ σ(D X ). This immedi-
ately follows from RS-Definition 5.4 (a). ⊳
Remark 5.11 [Factorization of τx ] According to RS-Corollary 4.18, there is a measurable
function g x : (Ω′D X , A ′D X ) → (R, B) such that τx = E X =x (Y |D X ) = g x (D X ) is the compo-
sition of D X and the factorization g x of E X =x (Y |D X ) (see RS-sect. 5.29). Therefore, if
P (X =x , D X =d ) > 0, then we may rewrite Equations (5.2) and (5.3) as follows:

Table 5.1. Joe and Ann with self-selection revisited

Possible outcomes ωi    P({ωi })  P X=0({ωi })  P X=1({ωi })   U    X   Y   P(X=1|U )  E(Y |X )  E(Y |X ,U )   τ0    τ1    δ10
ω1 = (Joe, no, −)        .144      .24           0             Joe  0   0   .04        .6        .7            .7    .8    .1
ω2 = (Joe, no, +)        .336      .56           0             Joe  0   1   .04        .6        .7            .7    .8    .1
ω3 = (Joe, yes, −)       .004      0             .01           Joe  1   0   .04        .42       .8            .7    .8    .1
ω4 = (Joe, yes, +)       .016      0             .04           Joe  1   1   .04        .42       .8            .7    .8    .1
ω5 = (Ann, no, −)        .096      .16           0             Ann  0   0   .76        .6        .2            .2    .4    .2
ω6 = (Ann, no, +)        .024      .04           0             Ann  0   1   .76        .6        .2            .2    .4    .2
ω7 = (Ann, yes, −)       .228      0             .57           Ann  1   0   .76        .42       .4            .2    .4    .2
ω8 = (Ann, yes, +)       .152      0             .38           Ann  1   1   .76        .42       .4            .2    .4    .2

Note. The first three numerical columns are the probability measures P , P X=0 , and P X=1 evaluated at {ωi };
the observables are the person variable U (unit), the treatment variable X , and the outcome variable Y (success);
the remaining columns are conditional expectations, with τ0 = E X=0 (Y |U ), τ1 = E X=1 (Y |U ), and δ10 = τ1 − τ0 .

∀ ω ∈ Ω:   τx (ω) = E X =x (Y |D X )(ω) = g x (D X )(ω) = g x (D X (ω)) = g x (d )
                  = E X =x (Y |D X =d ) = E (Y | X =x , D X =d ),   if ω ∈ {D X =d },     (5.4)

Hence, in order to assign a value of τx to an outcome ω ∈ Ω of the random experiment,


first we may assign to ω — via the global potential confounder D X — a value d of this
map, and then assign to d — via the factorization g x — the corresponding conditional
expectation value E X =x (Y |D X =d ), which is identical to E (Y | X =x , D X =d ) [see Eq. (5.3)
and Example 5.12]. ⊳

Example 5.12 [Joe and Ann With Self-Selection] Consider the random experiment pre-
sented in Table 5.1. It describes the same random experiment as Table 1.2 and it has
the same structure as the experiment presented in Table 4.1. However, in Table 5.1 we
added some terms that are important to illustrate the true outcome variables τ0 , τ1 and
their difference τ1 − τ0 . Furthermore, we also used the more general notation for condi-
tional expectations, which also apply if the outcome variable is not an indicator vari-
able with values 0 and 1. In Example 4.10, we already specified the set Ω = Ω1 × Ω2 × Ω3
of possible outcomes with its elements ωi = (u, ωX , ωY ), the probability space (Ω, A, P ),
the random variables U = π1, X , Y, and the filtration (Ft )t ∈T in A, where T = {1, 2, 3}.
Furthermore, in Example 4.10 we also asserted that U = π1 is a global potential con-
founder of the treatment variable X , playing the role of D X in the general theory. Therefore,
τx = E X =x (Y |D X ) = E X =x (Y |U ) for both treatments x = 0 and x = 1.
Now we specify the values of the true outcome variables τx = E X =x (Y |U ) for the two
treatment conditions x = 0 and x = 1. According to Remark 5.11,

τ0 = g 0 (U ), (5.5)

where U : Ω → ΩU with
U (ωi ) = U (u, ωX , ωY ) = u, for all ωi ∈ Ω,    (5.6)
and g 0 : ΩU → R with
g 0 (u) = E (Y | X =0,U =u ), for all u ∈ ΩU (5.7)
[see Eq. (5.4)]. Hence, in order to assign a value of τ0 to an outcome ωi ∈ Ω of the random
experiment, first we have to assign to ωi a value u ( Joe or Ann ) of the person variable
U , and then assign to u via g 0 the number E (Y | X =0,U =u ), that is, the corresponding
conditional expectation value.
If, for instance, ω3 = ( Joe , yes , −) (see the third row in Table 5.1), then U (ω3 ) = Joe , and
the value of τ0 is
τ0 (ω3 ) = g 0 (U (ω3 )) = g 0 ( Joe ) = E (Y | X =0,U =Joe ) = P (Y =1| X =0,U =Joe ) = .7.
This is true even though X (ω3 ) = 1 and the value of the conditional expectation E (Y | X ,U )
is
E (Y | X ,U )(ω3 ) = E (Y | X =1,U =Joe ) = P (Y =1| X =1,U =Joe ) = .8.
Hence, the true outcome variable τ0 given treatment 0 takes on a well-defined value for ω3
even though the unit drawn receives treatment 1. This illustrates the distinction between
the random variables τ0 = E X=0 (Y |U ) and E (Y | X ,U ). While τ0 = g 0 (U ) is solely a function
of U [see Eq. (5.5)], the conditional expectation E (Y | X ,U ) is a function of X and U , that is,
there is a function g : ΩX′ ×ΩU → R such that E (Y | X ,U ) = g (X ,U ), where g (X ,U ) denotes
the composition of (X ,U ) and g . Again, we simply say that E (Y | X ,U ) is a function of X
and U .
Because, in this example, the outcome variable Y is dichotomous with values 0 and
1, the conditional expectation value E (Y | X =0,U =u ) is also the conditional probability
P (Y =1| X =0,U =u ) of success, and because in this example, U has only two values, Joe
and Ann , the true outcome variable τ0 also has only two different values, the two condi-
tional probabilities P (Y =1| X =0,U =Joe ) = .7 and P (Y =1 | X =0,U =Ann ) = .2 (see Table
5.1). Hence, if ωi ∈ {U =Ann} = {ω ∈ Ω: U (ω) = Ann }, then the value of τ0 is
τ0 (ωi ) = g 0 (U (ωi )) = g 0 ( Ann ) = E (Y | X =0,U =Ann ) = P (Y =1| X =0,U =Ann ) = .2.
Similarly, the true outcome variable τ1 = E X =1 (Y |U ) given treatment 1 is specified by
τ1 = g 1 (U ), (5.8)
where g 1 : ΩU → R is defined by
g 1 (u) = E (Y | X =1,U =u ), for all u ∈ ΩU . (5.9)
Hence, if ωi ∈ {U =Joe } = {ω ∈ Ω: U (ω) = Joe }, then the value of τ1 is
τ1 (ωi ) = g 1 (U (ωi )) = g 1 ( Joe ) = E (Y | X =1,U =Joe ) = P (Y =1| X =1,U =Joe ) = .8,
and if ωi ∈ {U =Ann}, then the value of τ1 is
τ1 (ωi ) = g 1 (U (ωi )) = g 1 ( Ann ) = E (Y | X =1,U =Ann ) = P (Y =1| X =1,U =Ann ) = .4.
The last two columns of Table 5.1 show which values the two random variables τ0 =
E X=0 (Y |D X ) and τ1 = E X =1 (Y |D X ) assign to each of the eight possible outcomes ωi of the
random experiment. ⊳
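
All columns of Table 5.1 can be recomputed from the first probability column. The following Python sketch, with helper names of our own, obtains the values of τ0 and τ1 as the conditional expectation values E (Y | X =x ,U =u ) and thereby reproduces the last three columns of the table.

# Recompute tau_0, tau_1, and delta_10 of Table 5.1 from the column P({omega_i}).
P = {('Joe', 0, 0): .144, ('Joe', 0, 1): .336, ('Joe', 1, 0): .004, ('Joe', 1, 1): .016,
     ('Ann', 0, 0): .096, ('Ann', 0, 1): .024, ('Ann', 1, 0): .228, ('Ann', 1, 1): .152}
U, X, Y = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[2])

def E_Y_given(x, u):
    """E(Y | X=x, U=u), assuming P(X=x, U=u) > 0."""
    block = {w: p for w, p in P.items() if X(w) == x and U(w) == u}
    return sum(Y(w) * p for w, p in block.items()) / sum(block.values())

for u in ('Joe', 'Ann'):
    tau0, tau1 = E_Y_given(0, u), E_Y_given(1, u)
    print(u, round(tau0, 2), round(tau1, 2), round(tau1 - tau0, 2))
# Joe 0.7 0.8 0.1
# Ann 0.2 0.4 0.2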

Remark 5.13 [The Fundamental Problem of Causal Inference] In Remark 5.9 we already
mentioned that the values of τx cannot be estimated. The reason is as follows: Consider
the last but one column of Table 5.1 with the values E X =1 (Y |U =u ) of τ1 , namely .8 for
u = Joe and .4 for u = Ann . Imagine that we would conduct the random experiment pre-
sented in this table and that Joe is sampled and he selects treatment (X =1). Then we can
observe the value of Y for Joe under treatment, which is an estimate of E X =1 (Y |U =Joe ),
although a bad one. However, in many applications, we cannot estimate E X=0 (Y |U =Joe )
at the same time because, if he selects treatment, then we cannot observe his value of Y
under control, and vice versa. This has been called “the fundamental problem of causal
inference” by Holland (1986). Observing Joe’s value of the outcome variable Y, first under
control and then under treatment may yield a value of Y for Joe under treatment that may
systematically differ from the corresponding value that would be observed if Joe would not
have been in control, in the first place. There are numerous reasons for this fact that are
discussed in great detail by Campbell and Stanley (1966). From a formal point of view, it
should be noted that observing Joe’s value of the outcome variable Y under control and
then again under treatment refers to a different random experiment (represented by a dif-
ferent probability space) than observing his outcome value under treatment without the
preceding observation under control. ⊳

Example 5.14 [Nonorthogonal Factors] In Example 4.2.3 we already showed that U is


a global potential confounder of X and that Z is a potential confounder of X . In the
last three columns of Table 1.5 we specified the true outcome variables τ0 = E X=0 (Y |U ),
τ1 =E X =1 (Y |U ), and τ2 =E X =2 (Y |U ). In this example, all three true outcome variables are
uniquely defined because P (X =1|U ) > 0 (see RS-Th. 5.27). ⊳

Remark 5.15 [True Outcomes vs. Potential Outcomes] Rubin (1974, 2005) assumes that,
given an observational unit u and a treatment condition x, the values of his potential out-
come variables Y0 and Y1 are fixed numbers. In the example presented in Table 5.1, this
would mean to replace the two true outcome variables τ0 and τ1 by the two potential out-
come variables Y0 and Y1 that can take on only the values 0 and 1. Substantively speaking,
this would mean that, given a concrete treatment and a concrete observational unit, the
outcome is fixed to 0 or 1. For example, if the outcome is being alive (Y =1) or not (Y = 0) at
the age of 80 and the treatment is receiving (X =1) or not receiving (X =0) an anti-smoking
therapy before the age of 40, then this deterministic idea is not in line with our knowledge
of causes for being alive or not being alive at the age of 80.
In contrast, the two true outcome variables τ0 and τ1 can take on any real number
as values. In the example of Table 5.1, they can take on any value between 0 and 1, in-
clusively. In the smoking example, their values would be the person-specific probabil-
ities of being alive at the age of 80 given treatment or given no treatment. [As we will
see later on, it is irrelevant that we usually cannot estimate these conditional probabili-
ties P (Y =1| X =1,U =u ).] Therefore, the true outcome variables can be considered to be
a generalization of the potential outcome variables. Most important, in contrast to po-
tential outcome variables, true outcome variables are in line with the idea that mediators
and events that may occur between treatment and outcome might also affect the outcome
variable Y.
Hence, in Rubin’s potential outcome approach the observational unit u and treatment
x determine the value of the outcome variable Y. In contrast, according to true outcome

theory, u and x only determine the conditional distribution P Y |X =x ,U =u , and with it, the
conditional expectation value E (Y |X =x ,U =u ) of the outcome variable Y. ⊳

5.2 True Total Effect Variable

Now we introduce the concept of a true (D X =d )-conditional total effect comparing x to x ′,
where x and x ′ are two different values of the focused putative cause variable X . The basic
idea of this concept is to hold constant all those other possible causes of Y that are prior or
simultaneous to X at one combination of their values and then compare the conditional
expectation values of Y between the two values x and x ′ of X .
As mentioned before, the intuitive version of this basic idea goes back at least to John
Stuart Mill (1843/1865). It is often referred to as the ceteris paribus clause (all other things
equal). Remember that a global potential confounder can be a multivariate random vari-
able, consisting of several univariate random variables (potential confounders). Also note
that a global potential confounder of X does not comprise potential mediators, that is,
variables that might be affected by X and might have an effect on Y. The term ‘does not
comprise’ is used here in the sense that the σ-algebra generated by such a potential medi-
ator is not a subset of the σ-algebra generated by D X .
If we assume P (X =x , D X =d ) > 0 and P (X =x ′, D X =d ) > 0, then a true (D X =d )-condi-
tional total effect
E (Y | X =x , D X =d) − E (Y | X =x ′, D X =d )
is a uniquely defined number [see RS-Eq. (3.23)]. This difference may also be called the
true total effect on Y comparing x to x ′ given (the event) {D X =d }.
For different values d and d ′ of the global potential confounder D X , the true conditional
total effects can differ from each other. This necessitates a second concept, a true total
effect variable of Y comparing x to x ′, that is,

E X =x (Y |D X ) − E X =x ′ (Y |D X ) = τx − τx ′ .

In general, this random variable is not uniquely defined so that there can be many versions
of such a true total effect variable. However, assuming P -uniqueness of τx = E X =x (Y |D X )

and of τx ′ = E X =x ′ (Y |D X ) implies that the difference τx − τx ′ is P-unique as well [see RS-Box
5.1 (viii)].

Remark 5.16 [True Outcome Variables Are P X =x -Unique] In Remark 5.5 we already men-
tioned that, according to its definition as a D X -conditional expectation of Y with respect
to the probability measure P X =x , a true outcome variable τx is P X =x -unique. That is, if
E X =x (Y |D X ) denotes the set of all versions of the D X -conditional expectation of Y with
respect to the measure P X =x , then

∀ τx , τx∗ ∈ E X =x (Y |D X ): τx =P X=x τx∗   (5.10)

holds, where

τx =P X=x τx∗ :⇔ P X =x ({ω ∈ Ω: τx (ω) = τx∗ (ω)}) = 1.   (5.11)

That is, whenever τx and τx∗ are two versions of the D X -conditional expectation of Y with
respect to P X =x , then they are identical P X =x -almost surely. ⊳

Remark 5.17 [Assuming P -Uniqueness of True Outcome Variables] However, in the def-
inition of a true total effect variable (see Def. 5.18) we assume that the true outcome vari-
ables τx and τx ′ are P -unique. Remember,

τx is P-unique :⇔ ∀ τx , τx∗ ∈ E X =x (Y |D X ): τx =P τx∗ ,   (5.12)

where

τx =P τx∗ :⇔ P ({ω ∈ Ω: τx (ω) = τx∗ (ω)}) = 1.   (5.13)

According to RS-Theorem 5.27, P -uniqueness of τx is equivalent to P (X =x |D X ) >P 0, which
is defined by

P (X =x |D X ) >P 0 :⇔ P ({ω ∈ Ω: P (X =x |D X )(ω) > 0}) = 1.   (5.14)

That is, P -uniqueness of τx is equivalent to P (X =x |D X ) being positive, P -almost surely. ⊳

Definition 5.18 [True Total Effect Variable and True Total Effect ]
Let the Assumptions 5.1 (a) to (d) and (g) hold. Furthermore, let τx and τx ′ denote true
outcome variables of Y given the values x and x ′ of X , respectively.
(i) If τx and τx ′ are P-unique, then we call CTE D X ; x x ′ : Ω′D X → R a version of the
true total effect function comparing x to x ′ (with respect to Y), if

CTE D X ; xx ′ (D X ) =P τx − τx ′   (5.15)

holds for the composition CTE D X ; x x ′ (D X ) of D X and CTE D X ; x x ′ .


(ii) If τx and τx ′ are P-unique, then the composition CTE D X ; xx ′ (D X ) is called a ver-
sion of the true total effect variable comparing x to x ′ (with respect to Y ).
(iii) If d is a value of D X such that P (X =x , D X =d ), P (X =x ′, D X =d ) > 0, then we call

CTE D X ; xx ′ (d) := E (Y | X =x , D X =d ) − E (Y | X =x ′, D X =d ) (5.16)

the true total effect on Y given the value d of D X comparing x to x ′ .

Of course, the concepts of a true total effect and a true total effect variable are of a the-
oretical nature and can be estimated only under rather restrictive assumptions. However,
other causal total effects such as the expectation of τx − τx ′ (see sect. 5.3) can be estimated
under realistic assumptions.

Remark 5.19 [CTE D X ; x x ′ (D X ) Versus CTE D X ; xx ′ ] While CTE D X ; x x ′ (D X ) is a random vari-


able on the probability space (Ω, A, P ) assigning values to all ω ∈ Ω, the function CTE D X ; x x ′
is a random variable on the probability space (Ω′D X , A ′D X , P D X ) that assigns values to all
d ∈ Ω′D X (see RS-Rem. 2.39). From a substantive point of view the two maps contain the
same information. ⊳
Example 5.20 [Joe and Ann With Self-Selection] In Example 5.12, we already specified
the true outcome variables τ0 and τ1 . Their difference τ1 − τ0 is a version of the true to-
tal effect variable. Because, in this example, the person variable U is a global potential
confounder of X , the true total effect variable may also be written

CTE U ;10 (U ) = E X =1 (Y |U ) − E X=0 (Y |U ) = τ1 − τ0 .

It is a random variable on the probability space (Ω, A, P ). In contrast, the true total effect
function CTE U ;10 : ΩU → R is a random variable on the probability space (ΩU , P (ΩU ), PU ).
Using the functions g 0 and g 1 defined by Equations (5.7) and (5.9),

CTE U ;10 := g 1 − g 0 .

Its values are

CTE U ;10 ( Joe ) = g 1 ( Joe ) − g 0 ( Joe ) = .8 − .7 = .1,

the treatment effect of Joe, and

CTE U ;10 (Ann ) = g 1 (Ann ) − g 0 (Ann ) = .4 − .2 = .2,

the treatment effect of Ann. Hence, CTE U ;10 assigns to each person u ∈ ΩU the person-
specific true effect on the outcome variable Y (success) comparing x =1 to x = 0. ⊳
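As a small computational illustration (not part of the original example), the following minimal Python sketch re-computes these person-specific true total effects from the conditional success probabilities g 0 (u) and g 1 (u) given above; the variable names are merely illustrative:

    # True outcome functions from Table 5.1: g_x(u) = P(Y=1 | X=x, U=u)
    g0 = {"Joe": 0.7, "Ann": 0.2}  # success probability under control (X=0)
    g1 = {"Joe": 0.8, "Ann": 0.4}  # success probability under treatment (X=1)

    # True total effect function CTE_{U;10} = g1 - g0, evaluated per person
    cte_u_10 = {u: round(g1[u] - g0[u], 10) for u in g0}
    print(cte_u_10)  # {'Joe': 0.1, 'Ann': 0.2}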
Remark 5.21 [Uniqueness of a True Total Effect Variable] For simplicity, we denote a true
total effect variable CTE D X ; x x ′ (D X ) also by

δxx ′ = τx − τx ′ . (5.17)

It is a random variable on (Ω, A, P ) that is not necessarily unique. However, if P (X =x ) and


P (X =x ′ ) are positive and τx , τx ′ are P -unique, then the difference variable δxx ′ is also P -
unique [see RS-Box 5.1 (viii)]. That is, two versions of such a difference variable are iden-
tical P -almost surely. Hence, if δxx ′ and δxx∗ ′ are two such versions, then E (δxx ′ ) = E (δxx∗ ′ )
[see RS-Box 3.1 (vi) with A = Ω]. Other implications are that δxx ′ and δxx∗ ′ have identical
distributions, variances, and covariances with other random variables (see RS-Rem. 4.11).

Remark 5.22 [Uniqueness of a True Total Effect Function] In contrast to δxx ′ , which is a
function on Ω, a true total effect function CTE D X ; x x ′ is a function on Ω′D X . It is also not
necessarily unique. However, if there are two versions of CTE D X ; xx ′ , then they are identical
P D X -almost surely. This follows from δxx ′ =P δxx∗ ′ (see Rem. 5.21) and RS-Theorem 2.54. ⊳
Remark 5.23 [Values of a True Total Effect Variable] As mentioned above, δx x ′ is a ran-
dom variable on (Ω, A, P ). According to Equation (5.1), and RS-Equations (5.25) and (5.26),

∀ ω ∈ Ω: δxx ′ (ω) = E X =x (Y |D X )(ω) − E X =x ′ (Y |D X )(ω)
= E X =x (Y |D X =d ) − E X =x ′ (Y |D X =d )   (5.18)
= E (Y | X =x , D X =d ) − E (Y | X =x ′, D X =d ), if ω ∈ {D X =d }.
Hence, a value of a true total effect variable δxx ′ is the difference between the conditional
expectation values of Y given the values (x, d) and (x ′, d) of (X , D X ). If d is a value of D X
such that P (X =x , D X =d ), P (X =x ′, D X =d) > 0, then

∀ ω ∈ Ω: δx x ′ (ω) = CTE D X ; xx ′ (d), if ω ∈ {D X =d } (5.19)

[see Def. 5.18 (iii) and RS-Def. 5.30]. If P (X =x , D X =d ), P (X =x ′, D X =d ) > 0, then this value
is identical for all versions of the true total effect function CTE D X ; xx ′ and for all versions of
the true total effect variable CTE D X ; xx ′ (D X ) = δxx ′ .


Remark 5.24 [Probabilistic Ceteris Paribus Clause] Considering a value CTE D X ; x x ′ (d) is
tantamount to comparing x to x ′ (with respect to the outcome variable Y ) keeping con-
stant the global potential confounder D X , and with it, keeping constant all potential con-
founders of X . Keeping D X constant is the translation of the ceteris paribus clause for total
effects into probability theory. ⊳
Example 5.25 [Joe and Ann With Self-Selection] In Example 5.20, we already specified
the true total effect function CTE U ;10 : ΩU → R . The corresponding true total effect vari-
able CTE U ;10 (U ) is the composition of U and CTE U ;10 . It is a random variable on (Ω, A, P ).
For all ωi ∈ Ω, its values are assigned by
δxx ′ (ωi ) = CTE U ;10 (U )(ωi ) = CTE U ;10 (U (ωi )) = CTE U ;10 ( Joe ) = .1, if ωi ∈ {U =Joe },
and δxx ′ (ωi ) = CTE U ;10 (Ann ) = .2, if ωi ∈ {U =Ann }

(see the last column of Table 5.1). ⊳

5.3 Causal Average Total Effect

Now we define a causal average total effect by the expectation

E (δxx ′ ) = E (τx − τx ′ ) = E (τx ) − E (τx ′ ) (5.20)

of a true total effect variable [see Eq. (5.17) and RS-Box 3.1 (vii)]. Hence, the causal average
total effect is the expectation of a true total effect variable δxx ′ ; it is not an unweighted
average as the name might suggest.
As mentioned before, this expectation can be estimated under assumptions that are
less restrictive than those that allow estimating the total effect variable δxx ′ itself. This will
be detailed in the chapters on unbiasedness and its sufficient conditions.

Definition 5.26 [Causal Average Total Effect]


Let the Assumptions 5.1 (a) to (d) and (g) hold, let τx and τx ′ denote true outcome vari-
ables of Y given the values x and x ′ of X , respectively, and let δx x ′ = τx − τx ′ . Further-
more, assume that τx and τx ′ are P-unique and that the expectations E (τx ) and E (τx ′ )
are finite. Then

ATE x x ′ := E (δxx ′ ), (5.21)

is called the causal average total effect on Y comparing x to x ′ (with respect to P ).

Taking the expectation of δxx ′ (with respect to P ) means that we consider the average
total effect with respect to the measure P , that is,

ATE xx ′ = E (δxx ′ ) = E (CTE D X ; xx ′ (D X )) = E (E X =x (Y |D X ) − E X =x ′ (Y |D X )).   (5.22)

Note again that CTE D X ; x x ′ (D X ) is a random variable on (Ω, A, P ). In principle, we can also
consider the average total effect with respect to another measure than P (for more details
see Rem. 5.45).

Remark 5.27 [Expectation of CTE D X ; xx ′ With Respect to the Distribution of D X ] Accord-


ing to RS-Theorem 3.11, ATE xx ′ is identical to the expectation of a true total effect function
CTE D X ; x x ′ with respect to the distribution of the global potential confounder D X , that is,

ATE xx ′ = E (δxx ′ ) = E D X (CTE D X ; xx ′ ). (5.23)

Note that CTE D X ; x x ′ is a random variable on (Ω′D X , A ′D X , P D X ), not on (Ω, A, P ) (see again
RS-Rem. 2.39). ⊳

Remark 5.28 [P -Uniqueness and Expectations of τx and τx ′ ] According to RS-Box 5.1 (vii),
P -uniqueness of τx also implies

∀τx , τx∗ ∈ E X =x (Y |D X ) : E (τx ) = E (τx∗). (5.24)

Hence, under the assumptions of Definition 5.26, the expectation of τx is a uniquely de-
fined number, and the same applies to the expectation of τx ′ . 2 ⊳

Remark 5.29 [Expectation of δx x ′ ] Under the assumptions of Definition 5.26, E (δxx ′ ) is a


uniquely defined real number because

E (δxx ′ ) = E (τx − τx ′ ) = E (τx ) − E (τx ′ ) [(5.20)]
= E (E X =x (Y |D X )) − E (E X =x ′ (Y |D X )). [(5.1)]

If the true outcome variables τx and τx ′ are not P-unique or the expectations E (τx ) and
E (τx ′ ) are not finite, then the causal average total effect ATE x x ′ is not defined. ⊳

Now we gather some sufficient conditions for E (τx ) to be finite. For more details and
proofs see SN-Remark 14.47, from which the following remark is adapted.

Remark 5.30 [Sufficient Conditions for Finiteness of E (τx )] Remember, the expectation
of a random variable Y on a probability space (Ω, A, P ) exists if the integral ∫ Y + dP of
the positive part of Y or the integral ∫ Y − dP of the negative part of Y is finite (see RS-
Def. 3.1). Hence, under the assumptions of Definition 5.26, some sufficient conditions for
the expectation E (τx ) of any version τx ∈ E X =x (Y |D X ) to exist and to be finite are:
(a) σ(D X ) is a finite set and E X =x (Y ) is finite,
(b) τx has only a finite number of real values,
(c) τx is P -almost surely bounded on both sides, that is, ∃ α ∈ R : −α ≤P τx ≤P α,
(d) Y is P -almost surely bounded on both sides, that is, ∃ α ∈ R : −α ≤P Y ≤P α.
Note that 0 ≤P Y ≤P α, for 0 < α ∈ R , as well as Y =P 1A , for A ∈ A, are special cases of (d). ⊳

Remark 5.31 [Causal Average Effect] If X represents a treatment variable, then ATE x x ′ is
also called the ‘average causal effect’ or the ‘causal average treatment effect’ comparing
treatment x to treatment x ′, which is unambiguous as long as no direct and/or indirect
treatment effects are discussed in the same context. ⊳
2 The expectation E (τx ) corresponds to the term E [Y |do(x)] in Pearl’s and to E (Yx ) in Rubin’s terminologies (see,
e. g., Pearl, 2009, p. 108 and Rubin, 2005, p. 323).

Remark 5.32 [Hypothesis of a t -Test in a Randomized Experiment] That the causal average
total effect is zero is the hypothesis tested by a t -test of µ0 = µ1 [i. e., E (Y |X =0) =
E (Y |X =1)] in an experiment with randomized assignment of an observational unit to a
treatment condition. The expectations E (τ0 ) and E (τ1 ) occurring in Equation (5.20) are es-
timated, for example, by the sample group means in such a randomized experiment with
treatment conditions 0 and 1. In this case, the expectations E (τ0 ) and E (τ1 ) are identical
to the conditional expectation values E (Y |X =0) and E (Y |X =1), respectively. (For more
details see ch. 8, in particular Th. 8.43.) ⊳
Example 5.33 [Joe and Ann With Self-Assignment] In Tables 1.2 and 5.1 we presented an
example in which the person sampled assigns him- or herself to one of the two treatment
conditions, 0 and 1. In Example 5.12, we specified the true outcome variables τ0 = g 0 (U )
and τ1 = g 1 (U ) and their values g 0 (u) and g 1 (u) for this example.
Now we illustrate the causal average total effect comparing treatment to control, which
can be computed as follows:
ATE 10 = E (E X =1 (Y |U ) − E X =0 (Y |U )) = E (τ1 − τ0 ) = E (τ1 ) − E (τ0 )
= Σu g 1 (u) · P (U =u ) − Σu g 0 (u) · P (U =u )
= [P (Y =1| X =1,U =Joe ) · P (U =Joe ) + P (Y =1| X =1,U =Ann ) · P (U =Ann )]
− [P (Y =1| X =0,U =Joe ) · P (U =Joe ) + P (Y =1| X =0,U =Ann ) · P (U =Ann )]
= .80 · .50 + .40 · .50 − (.70 · .50 + .20 · .50) = .15,


using the transformation theorem (see RS-Th. 3.11 and RS-Rem. 3.13).
Figure 1.3 visualizes the causal average total effect, which, in this example, is not iden-
tical to the difference

E (Y | X =1) − E (Y | X =0) = P (Y =1| X =1) − P (Y =1| X =0) = .48 − .60 = −.12.

Figure 1.3 also illustrates various conditional probabilities of success. The points marked
by the dashed line are the probabilities P (Y =1|X =1,U =Joe ) and P (Y =1|X =0,U =Joe ) of
success for Joe given that he is treated and given that he is not treated, respectively. Simi-
larly, the two points marked by the solid line show the probabilities P (Y =1|X =1,U =Ann)
and P (Y =1|X =0,U =Ann) of success for Ann given that she is treated and given that she
is not treated, respectively. The points marked by the dotted line represent the conditional
probabilities P (Y =1| X =1) and P (Y =1| X =0) of success given treatment and given con-
trol, respectively. The size of the area of the dotted circles is proportional to the conditional
probabilities P (U =u |X =x ) that are used in the computation of the conditional expecta-
tion values
E (Y |X =x ) = Σu E (Y | X =x ,U =u ) · P (U =u |X =x )   (5.25)

[see RS-Box 3.2 (ii)]. ⊳
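The following minimal Python sketch recomputes this example. The conditional success probabilities and P (U =u ) = .50 are taken from Table 5.1; the self-selection probabilities P (X =1|U =Joe ) = .2 and P (X =1|U =Ann ) = .8 are assumptions chosen only because they reproduce the conditional expectation values E (Y | X =1) = .48 and E (Y | X =0) = .60 reported above:

    p_u = {"Joe": 0.5, "Ann": 0.5}                    # P(U=u)
    g = {(0, "Joe"): 0.7, (1, "Joe"): 0.8,            # P(Y=1 | X=x, U=u)
         (0, "Ann"): 0.2, (1, "Ann"): 0.4}
    p_x1_u = {"Joe": 0.2, "Ann": 0.8}                 # assumed P(X=1 | U=u)

    # Causal average total effect E(tau_1) - E(tau_0), transformation theorem
    ate = sum((g[(1, u)] - g[(0, u)]) * p_u[u] for u in p_u)

    # Prima facie effect E(Y|X=1) - E(Y|X=0), weighting by P(U=u | X=x)
    def e_y_given_x(x):
        p_x_u = {u: (p_x1_u[u] if x == 1 else 1 - p_x1_u[u]) for u in p_u}
        p_x = sum(p_x_u[u] * p_u[u] for u in p_u)     # P(X=x)
        return sum(g[(x, u)] * p_x_u[u] * p_u[u] / p_x for u in p_u)

    print(round(ate, 2))                              # 0.15
    print(round(e_y_given_x(1) - e_y_given_x(0), 2))  # -0.12, a sign reversal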

5.4 Causal Conditional Total Effect and Total Effect Function

So far we considered the random variables X , Y, and D X , a global potential confounder


of X . Now we bring into play an additional random variable Z on (Ω, A, P ) and introduce

the concepts of the causal conditional total effect given the value z of Z and of a causal Z-
conditional total effect function. Often Z is a pretest that assesses the ‘same’ attribute as the
outcome variable Y, only prior to treatment. In other examples, Z could be X itself. First,
we explain the assumptions based on which we can introduce these concepts, present
their definitions, and then turn to re-aggregating conditional effects.

5.4.1 Notation, Assumptions and Definitions

One of the assumptions in the definition of a causal conditional total effect is P Z=z -unique-
ness of the true outcome variables τx and τx ′ , which is defined in the following remark.

Remark 5.34 [P Z=z -Uniqueness of a True Total Effect Variable] Let the Assumptions 5.1
(a) to (f) hold. Then

τx is P Z=z -unique :⇔ ∀ τx , τx∗ ∈ E X =x (Y |D X ): P Z=z ({ω ∈ Ω: τx (ω) = τx∗ (ω)}) = 1,   (5.26)

That is, if τx is P Z=z-unique, then all versions of the D X -conditional expectation of Y with
respect to the probability measure P X =x are identical, P Z=z -almost surely.
Also note that P Z=z -uniqueness of τx is equivalent to P (X =x |D X ) >P Z=z 0 (see RS-Th. 5.27),
which is defined by

P (X =x |D X ) >P Z=z 0 :⇔ P Z=z ({ω ∈ Ω: P (X =x |D X )(ω) > 0}) = 1.   (5.27)

Hence, P Z=z -uniqueness of τx is equivalent to P (X =x |D X ) being positive, P Z=z -almost


surely. ⊳

Remark 5.35 [An Implication of P Z=z -Uniqueness of τx and τx ′ ] The conditional expecta-
tion value E (τx − τx ′ |Z =z) is a uniquely defined number if P (Z =z) > 0 and we assume
P Z=z -uniqueness of τx and τx ′ . In other words, if τx , τx ′ are P Z=z-unique and δx x ′ , δx∗x ′ are
different versions of the true total effect variable CTE D X ; xx ′ (D X ) [see Def. 5.18 (ii)], then
E Z=z (δxx ′ ) = E Z=z (δxx∗ ′ ) = E (δxx ′ | Z =z) = E (δxx∗ ′ | Z =z)   (5.28)

[see RS-Box 5.1 (viii)]. ⊳

Remark 5.36 [P -Uniqueness Implies P Z=z -Uniqueness] If P (Z =z) > 0, then P -unique-
ness of τx = E X =x (Y |D X ) implies that τx is also P Z=z-unique [see RS-Box 5.1 (v)]. In con-
trast, P Z=z -uniqueness of τx does not imply that it is also P-unique. Hence, P Z=z-unique-
ness of τx is a weaker assumption than P -uniqueness of τx . However, there are conditions
under which P Z=z -uniqueness implies P -uniqueness, so that they are equivalent to each
other. ⊳

Lemma 5.37 [A Condition Under Which P Z=z -Uniqueness Implies P -Uniqueness]


Let the Assumptions 5.1 (a) to (e) hold. Furthermore, assume 5.1 (f ) for all z ∈ Z (Ω).
Finally, let τx denote a true outcome variable of Y given the value x. Then

(∀ z ∈ Z (Ω): τx is P Z=z -unique) ⇔ τx is P-unique.   (5.29)
(Proof p. 147)

Hence, under the assumptions of Lemma 5.37, τx is P Z=z-unique for all values z of Z
if and only if τx is P-unique. However, note that P -uniqueness of τx is also defined if Z is
continuous and P (Z =z) = 0, for all z ∈ Z (Ω). In this case, Proposition (5.29) does not hold.

Definition 5.38 [Causal Conditional Total Effect Function]


Let the Assumptions 5.1 (a) to (e) and (g) hold, let τx and τx ′ denote true outcome vari-
ables of Y given the value x and x ′ , respectively, and let CTE Z ; x x ′ : (ΩZ′ , AZ′ ) → (R , B) be
a measurable function.
(i) If τx and τx ′ are P-unique, then the composition

CTE Z ; xx ′ (Z ) =P E (τx − τx ′ |Z )   (5.30)

is called a version of the causal Z-conditional total effect variable comparing x to


x ′ (with respect to Y ), and CTE Z ; xx ′ is called a version of the causal Z-conditional
total effect function comparing x to x ′ (with respect to Y ).
(ii) Let the Assumptions 5.1 (a) to (g) hold and assume that τx and τx ′ are P Z=z-unique.
Then
CTE Z ; x x ′ (z) = E (τx − τx ′ |Z =z). (5.31)
is called the causal (Z =z)-conditional total effect on Y comparing x to x ′.

Under the assumptions of Definition 5.38 (ii), CTE Z ; x x ′ (z) is uniquely defined. Further-
more, if τx and τx ′ are P Z=z-unique for all z ∈ Z (Ω), then, according to Lemma 5.37, they
are also P-unique and

∀ z ∈ Z (Ω): CTE Z ; x x ′ (Z )(ω) = CTE Z ; xx ′ (z), if ω ∈ {Z =z } . (5.32)

That is, if τx and τx ′ are P Z=z-unique for all z ∈ Z (Ω), then the causal (Z =z)-conditional
total effects CTE Z ; x x ′ (z) are the uniquely defined values of the causal Z -conditional total
effect function CTE Z ; xx ′ .

Remark 5.39 [A Characterization of CTE Z ; xx ′ (Z )] Lemma 3.16 and RS-Proposition (4.8)


imply that CTE Z ; xx ′ (Z ) is Z -measurable and a version of the Z-conditional expectation of
τx − τx ′ , that is,

CTE Z ; xx ′ (Z ) ∈ E (τx − τx ′ |Z ). (5.33)

Remark 5.40 [CTE Z ; x x ′ (Z ) Versus CTE Z ; x x ′ ] While the composition CTE Z ; x x ′ (Z ) is a ran-
dom variable on the probability space (Ω, A, P ), which assigns values to all ω ∈ Ω, the func-
tion CTE Z ; xx ′ is a random variable on (ΩZ′ , AZ′ , P Z ), which assigns values to all z ∈ ΩZ′ . It is
the factorization of the conditional expectation CTE Z ; x x ′ (Z ) (see RS-sect. 4.3). ⊳

Remark 5.41 [Conditioning on D X Versus Conditioning on Z ] So far, we considered two


kinds of random variables on which we condition. The first is D X , a global potential con-
founder of X . Such a variable is essential for the theory and in particular for translating the

ceteris paribus clause into the language of probability theory. It has been used to define a
true outcome variable τx = E X =x (Y |D X ).
Estimating the true outcome variables τx and τx ′ or their difference δxx ′ requires strong
assumptions. Considering a causal Z-conditional total effect variable, we re-aggregate the
true total effect variable τx − τx ′ [see Eq. (5.30)]. This yields a less fine-grained or coarsened
total effect variable, but it is still a causal conditional total effect variable. Considering
E (τx − τx ′ |Z ) instead of τx − τx ′ itself, we may lose information, but we do not lose causal
interpretability. In contrast to a true total effect variable τx − τx ′ , a Z-conditional total effect
variable E (τx − τx ′ |Z ) can often be identified under realistic assumptions by empirically
estimable conditional expectations (see, e. g., chs. 6 to 11). ⊳
Remark 5.42 [E (τx |Z ) Versus E X =x (Y | Z )] Note the distinction between the two Z-condi-
tional expectations E (τx |Z ) and E (τx ′ |Z ) on one side and the two Z-conditional expecta-

tions E X =x (Y | Z ) and E X =x ′ (Y | Z ) on the other side. The difference between the first two
conditional expectations is a causal Z-conditional total effect variable, that is,

CTE Z ; xx ′ (Z ) =P E (τx − τx ′ |Z ) =P E (τx |Z ) − E (τx ′ |Z ).   (5.34)

In contrast, the conditional expectations E X =x (Y | Z ) and E X =x ′ (Y | Z ), and their differ-
ence have no causal meaning unless they are unbiased (see Def. 6.13). The difference

E X =x (Y | Z ) − E X =x ′ (Y | Z ) is just a Z -conditional prima facie effect variable, which can be
seriously misleading if erroneously interpreted as a causal conditional total effect variable.

Remark 5.43 [Coarsening the True Total Effect Variable τx − τx ′ ] With E (τx |Z ) we coarsen
(or re-aggregate) the true outcome variable τx = E X =x (Y |D X ). Conditioning on the global
potential confounder D X we control for all potential confounders of X . Therefore the con-

ditional expectations E X =x (Y |D X ) and E X =x ′ (Y |D X ) inform us how Y depends on the val-
ues x and x ′ controlling for all potential confounders of X . Hence, considering the Z-con-
ditional expectation of the difference variable τx − τx ′ does not introduce bias. As already
stated in Remark 5.41, it just coarsens the most fine-grained total effects to causal total
effects that are less fine-grained. In contrast, considering the conditional expectations

E X =x (Y | Z ) and E X =x ′ (Y | Z ) and their difference, we only control for Z , possibly neglecting
important potential confounders. In chapter 6 we will define E X =x (Y | Z ) to be unbiased or
biased depending on whether or not E X =x (Y | Z ) =P E (τx |Z ) (see again Def. 6.13). ⊳
Remark 5.44 [Values of a Causal Conditional Total Effect Variable] Note that the compo-
sition CTE Z ; x x ′ (Z ) of the conditional total effect function CTE Z ; x x ′ and Z is a random vari-
able on (Ω, A, P ), and according to RS-Equation (4.18),

∀ ω ∈ Ω: CTE Z ; xx ′ (Z )(ω) = E (τx − τx ′ |Z )(ω)   (5.35)
= E (τx − τx ′ |Z =z) = CTE Z ; xx ′ (z), if ω ∈ {Z =z}.

Hence, if z is a value of Z such that the assumptions of Definition 5.38 (ii) are satisfied, then
CTE Z ; x x ′ (z) is uniquely defined and it is identical to the (Z =z)-conditional expectation
value of τx − τx ′ given the event {Z =z} = {ω ∈ Ω: Z (ω) = z }. Furthermore, according to RS-
Equation (3.35),

CTE Z ; xx ′ (z) = E (τx − τx ′ |Z =z) = E (τx |Z =z) − E (τx ′ |Z =z). (5.36)



That is, a value of the causal conditional total effect variable CTE Z ; xx ′ (Z ) is the difference
between the (Z =z)-conditional expectation values of τx and τx ′ . ⊳
Remark 5.45 [Average Total Effect With Respect to P Z=z ] In Definition 5.38 (ii) it is as-
sumed that P (Z =z) > 0 and that τx and τx ′ are P Z=z-unique. Therefore, the causal (Z =z)-
conditional total effect CTE Z ; x x ′ (z) on Y comparing x to x ′ is identical to the causal aver-
age total effect on Y comparing x to x ′ with respect to the measure P Z=z . That is,

CTE Z ; xx ′ (z) = E (τx − τx ′ |Z =z) = E Z=z (τx − τx ′ ). (5.37)


Remark 5.46 [Causal Conditional Versus Causal Average Total Effects] A causal condi-
tional total effect variable is more informative than the causal average total effect. If the
values z of Z are pretest scores that assess the ‘same’ attribute (e. g., life satisfaction) as the
outcome variable Y (the post-test), but prior to the onset of the treatment, then compar-
ing the conditional total effects CTE Z ; xx ′ (z) and CTE Z ; xx ′ (z ′ ) shows if these conditional
total effects are different for different values z and z ′ of this pretest. If they are, then the
numbers CTE Z ; xx ′ (z) and CTE Z ; x x ′ (z ′ ) may inform us about the differential indication of
the treatment. That is, they answer questions such as “Which treatment is good for which
kind of persons?” ⊳

5.4.2 Causal (X =x ∗ )-Conditional Total Effect

A special case of a (Z =z)-conditional effect is the (X =x ∗ )-conditional effect comparing x


to x ′ . In this case, the random variable X does not only play the role of the focused putative
cause variable, but also of the variable on whose values we condition. Note that x ∗ can be
identical to x, x ′ , or to a third value of X . For Z =X and z =x ∗, Definition 5.38 (ii) yields

CTE X ; x x ′ (x ∗ ) = E (τx − τx ′ |X =x ∗), (5.38)

the causal (X =x ∗)-conditional total effect on Y comparing x to x ′ .

Remark 5.47 [Substantive Meaning] Suppose X represents a treatment variable in an ex-


periment or in a quasi-experiment. If there are two treatment conditions, treatment (X =1)
and control (X =0), then we may consider CTE X ; 10 (1), the (X =1)-conditional total effect
comparing treatment (X =1) to control (X =0), and CTE X ; 10 (0), the (X =0)-conditional to-
tal effect comparing treatment (X =1) to control (X =0). These effects are also known as
the ‘average effect on the treated’ and the ‘average effect on the untreated’, respectively. ⊳

Remark 5.48 [Pre-Facto Perspective] At first sight, the concept of an (X =x ∗)-conditional


total effect comparing x to x ′ seems strange. If, for example, x ′ represents ‘no treatment’,
how can we talk about the causal average (or conditional) total treatment effect on the
untreated ? However, remember that we are not talking about data that resulted from an
experiment — an interpretation that is suggested by the term ‘treatment effect on the un-
treated’. Instead, we are considering a random experiment that is still to be conducted, that
is, we look at the random experiment from the pre facto perspective. This is what proba-
bilistic theories are about: a random experiment that is not yet conducted. Talking about
the probability of an event does not make sense for an event that already occurred, unless
we do as if it did not yet occur, that is, unless we take the pre facto perspective. Hence, we

can talk about a causal individual total effect although the individual is not yet treated and
even if it will never be treated, just in the same way as we can talk about the probability of
flipping ‘heads’, even if the coin is never flipped. ⊳

Remark 5.49 [Causal (X =x ∗)-Conditional Total Treatment Effects] Causal conditional


total effects given a specific value x ∗ of the treatment variable X are often more informa-
tive than the causal average total effect, especially, if the X -conditional expectations of the
true outcome variables τx and τx ′ actually depend on the values of X . However, if τx and
τx ′ are mean-independent from X , that is, if

E (τx |X ) =P E (τx ) and E (τx ′ |X ) =P E (τx ′ ),   (5.39)

then

CTE X ; xx ′ (X ) =P E (τx − τx ′ |X ) =P E (τx |X ) − E (τx ′ |X ) =P E (τx ) − E (τx ′ ) = ATE xx ′ .   (5.40)

Hence, if Proposition (5.39) holds, then the causal (X =x ∗)-conditional total treatment ef-
fects CTE X ; x x ′ (x ∗ ) are identical for all values x ∗ of X for which P (X =x ∗) > 0. A sufficient
condition of Proposition (5.39) is stochastic independence of X and the global potential
confounder D X (see Exercise 5-12), a condition that is created in the randomized experi-
ment. (For more details see ch. 8.) ⊳
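A minimal Python sketch can illustrate this point with the Table 5.1 probabilities, assuming for illustration that randomization sets P (X =1|U =u ) = 1/2 for both persons (this assignment probability is an assumption, not a value from the original table):

    # Randomized assignment: X independent of U, so the prima facie effect
    # E(Y|X=1) - E(Y|X=0) coincides with the causal average total effect.
    p_u = {"Joe": 0.5, "Ann": 0.5}
    g = {(0, "Joe"): 0.7, (1, "Joe"): 0.8, (0, "Ann"): 0.2, (1, "Ann"): 0.4}
    p_x1_u = {"Joe": 0.5, "Ann": 0.5}   # assumed: does not depend on u

    def e_y_given_x(x):
        p_x_u = {u: (p_x1_u[u] if x == 1 else 1 - p_x1_u[u]) for u in p_u}
        p_x = sum(p_x_u[u] * p_u[u] for u in p_u)
        return sum(g[(x, u)] * p_x_u[u] * p_u[u] / p_x for u in p_u)

    ate = sum((g[(1, u)] - g[(0, u)]) * p_u[u] for u in p_u)
    print(round(e_y_given_x(1) - e_y_given_x(0), 2), round(ate, 2))  # 0.15 0.15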

Remark 5.50 [CTE X ; xx ′ (x) Versus CTE X ; xx ′ (x ′ )] Suppose we are interested in the effects
of a treatment (represented by X =x ) compared to a control (represented by X =x ′ ) with
respect to the outcome variable Y, say well-being, and assume that there is no random
assignment of persons to treatments. In this case, the persons that tend to take the treat-
ment may differ in their well-being before treatment and in other pre-treatment variables
from those who tend to be in the control condition. In this case, there might be large dif-
ferences between the causal (X =x )-conditional total effect CTE X ; xx ′ (x) compared to the
causal (X =x ′ )-conditional total effect CTE X ; xx ′ (x ′ ). In this scenario the causal average
total effect ATE xx ′ would not be of much interest. The causal (X =x )-conditional effect
CTE X ; xx ′ (x) helps us evaluate how good the treatment is on average for those who tend
to take this treatment. In contrast, CTE X ; x x ′ (x ′ ) informs us about the average effect of the
treatment on those who tend not to take the treatment — under the side conditions under
which the random experiment is to be conducted. Hence, if the causal conditional total
effect CTE X ; xx ′ (x) is smaller than the causal conditional total effect CTE X ; x x ′ (x ′ ), one may
raise the question whether or not it would be worthwhile to change the regime of assigning
units to treatment, provided, of course, that this regime is under our control. ⊳

Remark 5.51 [Multivariate Random Variable Z ] Also note that the concept of a causal
(Z =z)-conditional total effect is not restricted to a univariate random variable Z . Instead,
Z = (Z 1 , . . . , Z m ) may also be an m-variate random variable on (Ω, A, P ) such that a value
z = (z1 , . . . , zm ) of Z is an m-tuple of values of the random variables Z 1 , . . . , Z m . ⊳

5.4.3 Complete Re-Aggregation

In this section we consider complete re-aggregation of conditional effects, that is, we con-
sider the expectation of a causal total effect variable CTE Z ; xx ′ . According to Theorem 5.52,
this expectation is identical to the causal average total effect on Y comparing x to x ′ .

Theorem 5.52 [Complete Re-Aggregation of a Conditional Total Effect Variable]


Let the Assumptions 5.1 (a) to (e) and (g) hold, and let CTE Z ; xx ′ (Z ) denote a causal
Z-conditional total effect variable comparing x to x ′ with respect to Y. Then
E (CTE Z ; xx ′ (Z )) = ATE xx ′ .   (5.41)
(Proof p. 147)

Remark 5.53 [Expectation With Respect to the Distribution of Z ] According to Theorem


5.52, the expectation of a causal Z-conditional total effect variable is identical to the aver-
age total effect. To emphasize, the expectation is with respect to the distribution of Z , that
is,
E (CTE Z ; xx ′ (Z )) = E Z (CTE Z ; xx ′ ) = ATE xx ′ ,   (5.42)
which immediately follows from RS-Equation (3.11). Taking this expectation means to re-
aggregate the (Z =z)-conditional total effects to a single number, the causal average total
effect. ⊳
Remark 5.54 [The Proper Way of Re-Aggregation] Inserting the definition of CTE Z ; xx ′ (Z )
[see Eq. (5.30)] into Equation (5.41) yields
ATE xx ′ = E (CTE Z ; xx ′ (Z )) [(5.41)]
= E (E (τx − τx ′ |Z )) [(5.30)]
= E (τx − τx ′ ) [RS-Box 4.1 (iv)]   (5.43)
= E (τx ) − E (τx ′ ). [RS-Box 3.1 (ix)]
Hence, complete re-aggregation actually yields the average causal effect, that is, the ex-
pectation of the true effect function τx − τx ′ [see Def. 5.18 (i)].
If Y is binary, then this implies that we take the expectation of the D X -conditional prob-
abilities
P X =x (Y =1|D X ) = E X =x (Y |D X ) = τx and P X =x ′ (Y =1|D X ) = E X =x ′ (Y |D X ) = τx ′ ,
and not of their log odds ratios or other transformations of these probabilities. In general,
a re-aggregation of such transformed probabilities does not yield the causal average total
effect. In fact, the expectation of log odds ratios may result in negative ‘effects’ although
the causal average effect is positive and vice versa (see Exercise 5-13). To emphasize, if Y
is an indicator variable with values 0 and 1, then Equations (5.43) yield

ATE xx ′ = E (P X =x (Y =1|D X )) − E (P X =x ′ (Y =1|D X )).   (5.44)
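The following minimal Python sketch illustrates the difference between the two kinds of re-aggregation, using the Table 5.1 probabilities as a purely illustrative case (in this particular example there is no sign reversal, but the two quantities clearly differ):

    import math

    tau0 = {"Joe": 0.7, "Ann": 0.2}   # P^{X=0}(Y=1 | U=u)
    tau1 = {"Joe": 0.8, "Ann": 0.4}   # P^{X=1}(Y=1 | U=u)
    p_u = {"Joe": 0.5, "Ann": 0.5}

    # Proper re-aggregation, Eq. (5.44): expectation of the probabilities
    ate = sum((tau1[u] - tau0[u]) * p_u[u] for u in p_u)

    # Expectation of person-specific log odds ratios: a different quantity
    def log_or(p1, p0):
        return math.log((p1 / (1 - p1)) / (p0 / (1 - p0)))

    mean_log_or = sum(log_or(tau1[u], tau0[u]) * p_u[u] for u in p_u)

    print(round(ate, 2))          # 0.15, the causal average total effect
    print(round(mean_log_or, 2))  # 0.76, not an effect on the probability scale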

5.4.4 Partial Re-Aggregation

Now we turn to a less rigorous re-aggregation of a causal total effect variable CTE Z ; x x ′ (Z ),
considering a W-conditional expectation of CTE Z ; xx ′ (Z ), which, according to Theorem
5.55, is a causal W-conditional total effect variable CTE W ; xx ′ (W ), provided that W is Z -
measurable, that is, provided that σ(W ) ⊂ σ(Z ). Furthermore, the conditional expectation
value E (CTE Z ; xx ′ (Z ) | W =w) is identical to the causal (W =w)-conditional total effect on
Y comparing x to x ′ , if we assume P (W =w) > 0.

Theorem 5.55 [Partial Re-Aggregation of a Conditional Total Effect Variable]


Let the assumptions of Theorem 5.52 hold and assume that W is a Z-measurable ran-
dom variable on (Ω, A, P ).
(i) Then
E (CTE Z ; xx ′ (Z ) | W ) =P CTE W ; xx ′ (W ).   (5.45)

(ii) If w ∈ W (Ω) is a value of W such that P (W =w) > 0, then

E (CTE Z ; xx ′ (Z ) | W =w) = CTE W ; xx ′ (w).   (5.46)
(Proof p. 147)

Remark 5.56 [Partial Re-Aggregation] According to Theorem 5.55 (i), the W-conditional
expectation of a causal Z-conditional total effect variable is P -almost surely identical to
a causal W-conditional total effect variable, provided that W is Z -measurable. If W is Z-
measurable, then σ(W ) ⊂ σ(Z ), and if also σ(W ) ≠ σ(Z ), then the conditional expectation
E (CTE Z ; xx ′ (Z ) | W ) may be called a partial re-aggregation of the original causal total effect
variable CTE Z ; xx ′ (Z ). It is tantamount to coarsening the original causal total effect variable
CTE Z ; x x ′ (Z ) to a less fine-grained causal total effect variable CTE W ; x x ′ (W ). ⊳

Remark 5.57 [The Proper Way of Partial Re-Aggregation] Inserting the definition of the
causal conditional total effect variable CTE Z ; xx ′ (Z ) [see Eq. (5.30)] into Equation (5.45)
yields
CTE W ; xx ′ (W ) =P E (CTE Z ; xx ′ (Z ) | W ) [(5.45)]
=P E (E (τx − τx ′ |Z ) | W ) [(5.30)]
=P E (τx − τx ′ |W ) [RS-Box 4.1 (xiii)]   (5.47)
=P E (τx |W ) − E (τx ′ |W ). [RS-Box 4.1 (xviii)]

Hence, partial re-aggregation actually yields the causal W -conditional effect variable, that
is, the W -conditional expectation of the true effect variable τx − τx ′ [see Def. 5.18 (i)].
If Y is binary, then we take a W-conditional expectation of Z-conditional probabilities
P X =x (Y =1|D X ) = E X =x (Y |D X ) = τx and P X =x ′ (Y =1|D X ) = E X =x ′ (Y |D X ) = τx ′ ,

and not of their log odds ratios or other transformations of these probabilities. In general,
a partial re-aggregation of such transformed probabilities does not yield a causal W -con-
ditional total effect variable. ⊳

5.5 Example: Joe and Ann With Bias at the Individual Level

Now we illustrate the various causal total effects by an example in which there are two
treatment variables. Such a two-factorial experiment has already been discussed at an in-
formal level in section 2.3 and at a structural level in section 4.2.2. In contrast to the ex-
ample in section 4.2.2, now the outcome variable is not binary any more. In this example,

we also exemplify that bias can occur at the individual level if we just control for the per-
son variable but not for the second treatment variable that is simultaneous to the focused
putative cause variable.

The Random Experiment and the Probability Measure

The parameters displayed in Table 5.2 refer to a random experiment in which


(a) a person is sampled from the set ΩU = { Joe , Ann } of persons,
(b) the sampled person may (yes ) or may not (no ) receive treatment a (represented by
Z ) and, at the same time, may (yes ) or may not (no ) receive treatment b (represented
by X ), and
(c) After an appropriate time a real-valued outcome variable Y (e. g., well-being, life
satisfaction, or a score on a symptom checklist) is observed.
The set of possible outcomes of this random experiment is

Ω = Ω1 × Ω2 × Ω3 = ΩU × (ΩZ × ΩX ) × ΩY ,

where
Ω1 = ΩU := { Joe, Ann },
Ω2 = ΩZ × ΩX := { (no , no ), (no , yes ), (yes , no ), (yes , yes ) },
and ΩY is the set of possible observations based on which the score of the outcome vari-
able Y is computed.
If Ω3 = ΩY is finite or countably infinite, then we may choose A = P (Ω) as the σ-al-
gebra on Ω, where P (Ω) denotes the power set of Ω. However, if ΩY = R , then A is the
product σ-algebra A = P (ΩU )⊗P (ΩZ ×ΩX )⊗B, where B denotes the Borel σ-algebra on
R (see RS-Rem. 1.14).
The probability measure P on (Ω, A ) is known only in that part which can be com-
puted from the parameters displayed in Table 5.2. For example, the conditional prob-
abilities for the two kinds of treatments are as follows: Joe receives treatment a (Z =1)
with probability P (Z =1|U =Joe ) = 1/2 and he receives treatment b (X =1) with probability
P (X =1|U =Joe, Z = 0) = 3/4 if he does not receive treatment a (Z = 0), and with probability
P (X =1|U =Joe, Z =1) = 1/4 if he receives treatment a (Z =1) as well. Similarly, Ann receives
treatment b (X =1) with probability P (X =1|U =Ann, Z = 0) = 3/4 if she does not receive
treatment a (Z = 0), and with probability P (X =1|U =Ann, Z =1) = 1/4 if she also receives
treatment a (Z =1). This is a realistic scenario if the probability of a person getting treat-
ment b is fixed by design depending on whether or not this person receives treatment a.
(Availability of resources might be a reason for such a design.)
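As a computational illustration, the following minimal Python sketch builds the joint distribution P (U =u , Z =z , X =x ) from the probabilities just described (P (U =u ) = 1/2, P (Z =z |U =u ) = 1/2, and the design probabilities P (X =1|U =u , Z =z) of Table 5.2); all variable names are merely illustrative:

    from itertools import product

    persons = ["Joe", "Ann"]
    p_u = {u: 0.5 for u in persons}                         # P(U=u)
    p_z_u = {(u, z): 0.5 for u in persons for z in (0, 1)}  # P(Z=z | U=u)
    p_x1_uz = {}
    for u in persons:
        p_x1_uz[(u, 0)] = 0.75   # P(X=1 | U=u, Z=0), Table 5.2
        p_x1_uz[(u, 1)] = 0.25   # P(X=1 | U=u, Z=1), Table 5.2

    joint = {}                   # P(U=u, Z=z, X=x)
    for u, z, x in product(persons, (0, 1), (0, 1)):
        p_x = p_x1_uz[(u, z)] if x == 1 else 1 - p_x1_uz[(u, z)]
        joint[(u, z, x)] = p_u[u] * p_z_u[(u, z)] * p_x

    p_x1 = sum(p for (u, z, x), p in joint.items() if x == 1)
    print(round(p_x1, 10))       # 0.5, as also noted later in this section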

Specifying the Filtration in A

We specify the filtration (Ft )t ∈T , T = {1, 2, 3}, as follows: F1 = σ(π1 ) = σ(U ), F2 = σ(π1, π2 ) =
σ(U , Z , X ), and F3 = σ(π1, π2, π3 ) = σ(U , Z , X , Y ), presuming that the two treatment vari-
ables X and Z are simultaneous. Of course, treatments are processes in time and they
could be represented by a more fine-grained filtration that would allow us to study de-
pendencies within the treatment process. However, for our present purpose the filtration
(Ft )t ∈T specified above will suffice.

Table 5.2. Joe and Ann with bias at the individual level

Person u | P (U =u ) | Group therapy z | P (Z =z |U =u ) | P (X =1|U =u , Z =z) | E X =0 (Y |U =u , Z =z) | E X =1 (Y |U =u , Z =z) | P X =0 (Z =z |U =u ) | P X =1 (Z =z |U =u )
Joe | 1/2 | 0 | 1/2 | 3/4 | 68 | 82 | 1/4 | 3/4
Joe | 1/2 | 1 | 1/2 | 1/4 | 96 | 100 | 3/4 | 1/4
Ann | 1/2 | 0 | 1/2 | 3/4 | 80 | 98 | 1/4 | 3/4
Ann | 1/2 | 1 | 1/2 | 1/4 | 104 | 106 | 3/4 | 1/4

Focused Treatment Variable and Potential Confounders

There are several true total causal effects we might look at. In principle, we might be inter-
ested in

(a1 ) the causal individual total effect on Y of treatment a (Z =1) compared to not treat-
ment a (Z = 0) given that Joe (Ann) also receives treatment b (X =1),
(b 1 ) the causal individual total effect on Y of treatment a (Z =1) compared to not treat-
ment a (Z = 0) given that Joe (Ann) does not receive treatment b (X =0), and
(c 1 ) the average of these causal individual total effects, averaging over the two values of
X (representing treatment b and not treatment b, respectively).

Of course, the causal average effect is certainly less informative than the two conditional
effects.
Similarly, we may also be interested in

(a2 ) the causal individual total effect on Y of treatment b (X =1) compared to not treat-
ment b (X =0) given that Joe (Ann) also receives treatment a (Z =1),
(b 2 ) the causal individual total effect of treatment b (X =1) compared to not treatment b
(X =0) on Y given that Joe (Ann) does not receive treatment a (Z = 0), and
(c 2 ) the average of these causal individual total effects, averaging over the two values of
Z (representing treatment a and not treatment a, respectively).
Looking at the effects (a1 ) to (c 1 ), we consider treatment b to be a (qualitative) covariate
and treatment a to be the putative cause variable asking for the causal conditional effects
of treatment a given treatment b and their average, the ‘main effect’ of treatment a. In
contrast, looking at the effects (a2 ) to (c 2 ), we consider treatment a to be a (qualitative)
covariate and treatment b to be the putative cause variable.
In principle, both treatment variables, X and Z , may take the role of a covariate (and
potential confounder), depending on which treatment effects we are studying, the causal
effects of X on Y or the causal effects of Z on Y . In this example, we focus on X as a
putative cause variable of Y, that is, we consider the regular probabilistic causality setup
((Ω, A, P ), (Ft )t ∈T , C, DC , X , Y ). In this setup, the index sets in Definition 4.11 are J = {1, 2}
and K = {2}, implying

C = σ(π2j , j ∈ K ) = σ(π22) = σ(X )

and

DC = σ(π2j , j ∈ J \ K ) = σ(π1, π21 ) = σ(U , Z ) .

In this setup, Z is a potential confounder of X because σ(Z ) ⊂ DC [see Def. 4.11 (iv)]. Fur-
thermore, the bivariate random variable (U , Z ) is a global potential confounder of X be-
cause σ(U , Z ) = σ(π1, π21 ) [see Def. 4.11 (iii)].

True Outcome Variables and True Total Effects

As stated above, D X = (U , Z ) is a global potential confounder of X . Hence, choosing X as


putative cause, the true outcome variables are

τ0 = E X=0 (Y |D X ) = E X=0 (Y |U , Z ) and τ1 = E X =1 (Y |D X ) = E X =1 (Y |U , Z ). (5.48)

The values of τ0 and τ1 , denoted E X=0 (Y |U =u , Z =z) and E X =1 (Y |U =u , Z =z), are dis-
played in Table 5.2. According to RS-Equation (5.26),

E X=0 (Y |U =u , Z =z) = E (Y | X =0,U =u , Z =z)

and
E X =1 (Y |U =u , Z =z) = E (Y | X =1,U =u , Z =z).
The true total effect variable is

CTE (U ,Z ); 10 (U , Z ) = τ1 − τ0 .

It is (U , Z )-measurable. According to Table 5.2, its values are as follows: If Joe does not re-
ceive treatment a (Z = 0), then his causal total effect of treatment b compared to ¬b is

CTE (U ,Z ); 10 ( Joe , 0) = E (Y | X =1,U = Joe , Z = 0) − E (Y | X =0,U = Joe , Z = 0)


(5.49)
= 82 − 68 = 14.

In contrast, if he does receive treatment a (Z =1), then it is

CTE (U , Z ); 10 ( Joe , 1) = E (Y | X =1,U = Joe , Z =1) − E (Y | X =0,U = Joe , Z =1)


(5.50)
= 100 − 96 = 4.

Similarly, Ann’s total effect of treatment b compared to ¬b is

CTE (U ,Z ); 10 (Ann , 0) = E (Y | X =1,U = Ann , Z = 0) − E (Y | X =0,U = Ann , Z = 0)


(5.51)
= 98 − 80 = 18,

if she does not receive treatment a (Z = 0), whereas it is

CTE (U ,Z ); 10 (Ann , 1) = E (Y | X =1,U = Ann , Z =1) − E (Y | X =0,U = Ann , Z =1)


(5.52)
= 106 − 104 = 2,

if she does (Z =1). Hence, all these (U =u , Z =z)-conditional total effects are positive, and
they are the true total effects on Y comparing treatment b to treatment ¬b [see Def. 5.18
(iii)]. However, in this example, these effects are not identical to the individual effects
CTE U ;10 (u), as will be shown later on in this section. The individual total effect CTE U ;10 (u)
is an attribute of the person u, whereas the true total effect CTE (U , Z ); 10 (u, z) is an attribute
of the person u in treatment z.

Causal Average Total Effect

Using the true total effects [see Eqs. (5.49) to (5.52)], the causal average total effect of treat-
ment b (X =1) compared to ¬b (X =0) can be computed by
¡ ¢ XX
E CTE (U ,Z ); 10 (U , Z ) = CTE (U ,Z ); 10 (u, z) · P (U =u , Z =z)
u z
1 1 1 1
= 14 · + 4 · + 18 · + 2 · = 9.5.
4 4 4 4
BecauseCTE (U ,Z ); 10 (U , Z ) denotes the composition of (U , Z ) and the true total effect func-
tion CTE (U ,Z ); 10 [see Def. 5.38 (i)], in these computations we used RS-Equation (3.13), and

P (U =u , Z =z) = P (Z =z |U =u ) · P (U =u ) = 1/2 · 1/2 = 1/4,

for all values (u, z) of the global potential confounder (U , Z ). Note that in other examples
the probabilities P (U =u , Z =z) may not be identical for all pairs (u, z) of values of U and
Z.

(U =u )-Conditional Prima Facie Effects

Now we compute the (U =u )-conditional prima facie effects, that is, the differences

E (Y | X =1,U =Joe ) − E (Y | X =0,U =Joe)

and
E (Y | X =1,U =Ann) − E (Y | X =0,U =Ann) .
These differences may also be called the individual or person-specific prima facie effects.
They are not identical to the causal individual total effects, that is, they are not the values
of the causal U -conditional total effect function, which will be computed in the next sub-
section. Hence, in this example, these individual prima facie effects are biased (see ch. 6
for more details).
In order to compute the conditional expectation values of Y given treatment and unit,
we use the equation
E (Y | X =x ,U =u ) = Σz E (Y | X =x ,U =u , Z =z) · P (Z =z | X =x ,U =u ),   (5.53)

which is always true if Z is discrete with P (X =x,U =u, Z =z) > 0 for all values of Z [see
RS-Box 3.2 (ii)]. Both kinds of parameters occurring on the right-hand side of this equa-
tion are displayed in Table 5.2. This does not only include the conditional expectation
values E (Y | X =x,U =u, Z =z) = E X =x (Y |U =u, Z =z), but also the conditional probabilities
P (Z =z | X =x,U =u) = P X =x (Z =z |U =u), which have been computed via:

P (Z =z | X =x ,U =u ) = P (X =x |U =u , Z =z) · P (Z =z |U =u ) / Σz [P (X =x |U =u , Z =z) · P (Z =z |U =u )]   (5.54)

(see Exercises 5-14 and 5-15).


Hence, using Equation (5.53), the (X =x ,U =Joe )-conditional expectation values for Joe
are
E (Y | X = 0,U =Joe ) = 68 · 1/4 + 96 · 3/4 = 89,
E (Y | X =1,U =Joe ) = 82 · 3/4 + 100 · 1/4 = 86.5,
and his individual prima facie effect is

E (Y | X =1,U =Joe) − E (Y | X = 0,U =Joe) = 86.5 − 89 = −2.5. (5.55)

In this example, the individual prima facie effect of treatment b compared to not treat-
ment b is negative, namely −2.5, although all (U =Joe , Z =z)-conditional effects are posi-
tive, namely 14 for U =Joe and Z = 0 (i. e., given Joe and not treatment a) and 4 for U =Joe
and Z =1 (i. e., given Joe and treatment a).
Similarly, using Equation (5.53), the (X =x ,U =Ann)-conditional expectation values for
Ann are
E (Y | X = 0,U =Ann ) = 80 · 1/4 + 104 · 3/4 = 98,
E (Y | X =1,U =Ann ) = 98 · 3/4 + 106 · 1/4 = 100,
and her individual prima facie effect of X on Y is

E (Y | X =1,U =Ann) − E (Y | X = 0,U =Ann) = 100 − 98 = 2. (5.56)

This prima facie effect does not have a causal interpretation. It is not identical to the causal
(U =Ann)-conditional total effect of X on Y , which is computed in the following subsec-
tion.
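A minimal Python sketch of these prima facie computations, using Equations (5.53) and (5.54) with the Table 5.2 values, may help to check the numbers (the function and variable names are merely illustrative):

    e_y = {("Joe", 0): (68, 82), ("Joe", 1): (96, 100),
           ("Ann", 0): (80, 98), ("Ann", 1): (104, 106)}   # (X=0, X=1) values
    p_z_u = {0: 0.5, 1: 0.5}     # P(Z=z | U=u), identical for Joe and Ann
    p_x1_z = {0: 0.75, 1: 0.25}  # P(X=1 | U=u, Z=z), identical for Joe and Ann

    def e_y_given_xu(x, u):
        # weights P(Z=z | X=x, U=u) from Eq. (5.54)
        w = {z: (p_x1_z[z] if x == 1 else 1 - p_x1_z[z]) * p_z_u[z] for z in (0, 1)}
        total = sum(w.values())
        return sum(e_y[(u, z)][x] * w[z] / total for z in (0, 1))

    for u in ("Joe", "Ann"):
        print(u, e_y_given_xu(1, u) - e_y_given_xu(0, u))  # Joe -2.5, Ann 2.0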

Causal (U =u )-Conditional Total Effects

Because D X = (U , Z ) is a global potential confounder in the example presented in Table


5.2, for Joe the causal (U =u )-conditional (or individual) total effects of treatment b (X =1)
compared to not treatment b (X =0) can be computed by
CTE U ;10 ( Joe ) = E (CTE (U ,Z ); 10 (U , Z ) |U =Joe )
= E (E X =1 (Y |U , Z ) − E X =0 (Y |U , Z ) |U =Joe )
= Σu Σz [E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z)] · P (U =u , Z =z |U =Joe )
= (82 − 68) · 1/2 + (100 − 96) · 1/2 + (98 − 80) · 0 + (106 − 104) · 0 = 9

[see Eqs. (5.31), (5.48), and RS-Eq. (3.28)]. This is an example of partial re-aggregation (see
Th. 5.55). For Ann, the corresponding causal total individual effect is

CTE U ;10 (Ann ) = E (CTE (U ,Z ); 10 (U , Z ) |U =Ann )
= E (E X =1 (Y |U , Z ) − E X =0 (Y |U , Z ) |U =Ann )
= Σu Σz [E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z)] · P (U =u , Z =z |U =Ann )
= (82 − 68) · 0 + (100 − 96) · 0 + (98 − 80) · 1/2 + (106 − 104) · 1/2 = 10.

Hence, in this example, the two causal individual total effects for Joe and Ann are both
positive.
Comparing the causal individual effect CTE U ;10 ( Joe ) = 9 to the corresponding prima fa-
cie effect −2.5 [see Eq. (5.55)] shows that the individual prima facie effect E (Y | X =1,U = Joe )
− E (Y | X = 0,U = Joe ) strongly differs from its causal counterpart, and the same applies to
the individual prima facie effect of Ann. This is evident if we compare her prima facie effect
E (Y | X =1,U =Ann) − E (Y | X = 0,U =Ann) = 2 to her causal individual effect CTE U ;10 (Ann )
= 10.
According to Equation (5.41), the expectation of these causal individual effects,

E (CTE U ;10 (U )) = E (E (E X =1 (Y |U , Z ) − E X =0 (Y |U , Z ) |U ))
= Σu CTE U ;10 (u) · P (U =u ) = 9 · 1/2 + 10 · 1/2 = 9.5,

is the causal average total effect ATE 10 of treatment b compared to not treatment b.
Causal individual total effects are more informative than the causal average total ef-
fect and usually more informative than causal conditional total effects given a value of a
pre-test or a second treatment variable. However, note again that causal individual [i. e.,
(U =u )-conditional] total effects are not necessarily the most fine-grained causal total ef-
fects. In this example, there is a second treatment variable, denoted Z , that contributes
to the variation of the outcome variable Y beyond the individual level. This is exem-
plified comparing the causal (U =u , Z =z)-conditional total effects to the causal (U =u )-
conditional total effects.
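A minimal Python sketch of this partial re-aggregation to the person level, using the true total effects computed above and P (Z =z |U =u ) = 1/2 from Table 5.2:

    # Causal individual total effects CTE_{U;10}(u) = E(delta_10 | U=u)
    cte_uz = {("Joe", 0): 14, ("Joe", 1): 4, ("Ann", 0): 18, ("Ann", 1): 2}
    p_z_given_u = 0.5   # P(Z=z | U=u) for all u and z (Table 5.2)

    cte_u = {u: sum(cte_uz[(u, z)] * p_z_given_u for z in (0, 1))
             for u in ("Joe", "Ann")}
    ate = sum(v * 0.5 for v in cte_u.values())  # complete re-aggregation, P(U=u)=1/2

    print(cte_u)  # {'Joe': 9.0, 'Ann': 10.0}
    print(ate)    # 9.5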

Causal (Z =z)-Conditional Total Effects

In the example presented in Table 5.2, the causal (Z = 0)-conditional (i. e., given not treat-
ment a) total effect of treatment b (X =1) compared to not treatment b (X =0) can be com-
puted by

CTE Z ;10 (0) = E (CTE (U ,Z ); 10 (U , Z ) | Z = 0)
= E (E X =1 (Y |U , Z ) − E X =0 (Y |U , Z ) | Z = 0)
= Σu Σz [E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z)] · P (U =u , Z =z | Z = 0)
= (82 − 68) · 1/2 + (100 − 96) · 0 + (98 − 80) · 1/2 + (106 − 104) · 0 = 16

[see Eqs. (5.31), (5.48), and RS-Eq. (3.28)]. The corresponding causal (Z =1)-conditional
(i. e., given treatment a) total effect is
CTE Z ;10 (1) = E (CTE (U ,Z ); 10 (U , Z ) | Z =1)
= E (E X =1 (Y |U , Z ) − E X =0 (Y |U , Z ) | Z =1)
= Σu Σz [E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z)] · P (U =u , Z =z | Z =1)
= (82 − 68) · 0 + (100 − 96) · 1/2 + (98 − 80) · 0 + (106 − 104) · 1/2 = 3.
According to Equation (5.41), taking the expectation
E (CTE Z ; 10 (Z )) = Σz E (CTE Z ; 10 (Z ) | Z =z) · P (Z =z) = 16 · 1/2 + 3 · 1/2 = 9.5   (5.57)
again yields the average total effect. In this equation, we used the theorem of total prob-
ability in order to compute P (Z =z) = Σu P (Z =z |U =u ) · P (U =u ) (see RS-Th. 1.38), which
yields P (Z = 0) = P (Z =1) = 1/2.

Causal (X =x )-Conditional Total Effects

Consider again the example presented in Table 5.2. Because D X = (U , Z ), the causal
(X =0)-conditional total effect of treatment b (X =1) compared to not treatment b (X =0)
can be computed by
CTE X ;10 (0) = E (CTE (U ,Z );10 (U , Z ) | X =0)
= Σu Σz [E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z)] · P (U =u , Z =z | X =0)
= (82 − 68) · 1/8 + (100 − 96) · 3/8 + (98 − 80) · 1/8 + (106 − 104) · 3/8 = 6.25
[see again Eqs. (5.31), (5.48), and RS-Eq. (3.28)].
In contrast,
CTE X ; 10 (1) = E (CTE (U ,Z ); 10 (U , Z ) | X =1)
= Σu Σz [E (Y | X =1,U =u , Z =z) − E (Y | X =0,U =u , Z =z)] · P (U =u , Z =z | X =1)
= (82 − 68) · 3/8 + (100 − 96) · 1/8 + (98 − 80) · 3/8 + (106 − 104) · 1/8 = 12.75
yields the (X =1)-conditional total effect of treatment b (X =1) compared to not treatment
b (X =0). In these equations, we used
P (U =u , Z =z | X =x ) = P (X =x |U =u , Z =z) · P (U =u , Z =z) / P (X =x ).   (5.58)
According to Equation (5.41), taking the expectation
E (CTE X ; 10 (X )) = Σx CTE X ;10 (x) · P (X =x ) = 6.25 · 1/2 + 12.75 · 1/2 = 9.5
yields the causal average total effect. In this equation, we again used the theorem of total
probability, that is, P (X =x ) = Σu Σz P (X =x |U =u , Z =z) · P (U =u , Z =z) (see RS-Th. 1.38),
which yields P (X =0) = P (X =1) = 1/2.
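A minimal Python sketch of these (X =x )-conditional effects, re-aggregating the true total effects with the weights P (U =u , Z =z | X =x ) of Equation (5.58):

    cte_uz = {("Joe", 0): 14, ("Joe", 1): 4, ("Ann", 0): 18, ("Ann", 1): 2}
    p_uz = {uz: 0.25 for uz in cte_uz}                 # P(U=u, Z=z)
    p_x1_uz = {("Joe", 0): 0.75, ("Joe", 1): 0.25,
               ("Ann", 0): 0.75, ("Ann", 1): 0.25}     # P(X=1 | U=u, Z=z)

    def cte_x(x):
        p_x_uz = {uz: (p if x == 1 else 1 - p) for uz, p in p_x1_uz.items()}
        p_x = sum(p_x_uz[uz] * p_uz[uz] for uz in p_uz)   # P(X=x) = 1/2
        return sum(cte_uz[uz] * p_x_uz[uz] * p_uz[uz] / p_x for uz in cte_uz)

    print(cte_x(0), cte_x(1))               # 6.25 12.75
    print(0.5 * cte_x(0) + 0.5 * cte_x(1))  # 9.5, the causal average total effect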

Causal (X =x , Z =z )-Conditional Total Effects

Because, in the example presented in Table 5.2, (U , Z ) is a global potential confounder,


the causal conditional total effect CTE (X ,Z ); 10 (x, z) of treatment b (X =1) compared to not
treatment b (X =0) given X =x and Z =z can be computed by

CTE (X ,Z ); 10 (x, z) = E (CTE (U ,Z );10 (U , Z ) | X =x , Z =z) = E (δ10 | X =x , Z =z)   (5.59)
= Σu Σz∗ [E X =1 (Y |U =u , Z =z ∗ ) − E X =0 (Y |U =u , Z =z ∗ )] · P (U =u , Z =z ∗ | X =x , Z =z)

[see Eqs. (5.31), (5.48), and RS-Eq. (3.28)]. In this equation, the summation is over all
values u of U and all values z ∗ of Z . In the term P (U =u , Z =z ∗ |X =x , Z =z) we con-
dition on a fixed value x of X and a fixed value z of Z . The conditional probabilities
P (U =u , Z =z ∗ |X =x , Z =z) can be computed via
P (U =u , Z =z ∗ | X =x , Z =z) = P (U =u | X =x , Z =z), if z ∗ = z, and 0, if z ∗ ≠ z,   (5.60)

because, if z = z ∗, then

P (U =u , Z =z ∗ | X =x , Z =z) = P (U =u , Z =z ∗, X =x , Z =z) / P (X =x , Z =z)
= P (U =u , X =x , Z =z) / P (X =x , Z =z) = P (U =u | X =x , Z =z)   (5.61)
= P (X =x |U =u , Z =z) · P (U =u , Z =z) / [P (X =x | Z =z) · P (Z =z)],

where
P (X =x | Z =z) = Σu P (X =x |U =u , Z =z) · P (U =u | Z =z),   (5.62)

with P (U =u |Z =z) = P (Z =z |U =u ) · P (U =u )/P (Z =z). In this example, P (U =u |Z =z) =


1/2, for all values of U and Z . Hence, Equation (5.62) yields

P (X =0 | Z = 0) = 1/4 · 1/2 + 1/4 · 1/2 = 1/4,

and using Equation (5.60) we receive:

P (U =u , Z = 0 | X =0, Z = 0) = (1/4 · 1/4) / (1/4 · 1/2) = 1/2,
for u =Joe and for u =Ann. Hence, the equation for E (CTE (U ,Z ); 10 (U , Z ) | X =x , Z =z) yields

E (CTE (U ,Z ); 10 (U , Z ) | X =0, Z = 0) = E (δ10 | X =0, Z = 0)
= (82 − 68) · 1/2 + (100 − 96) · 0 + (98 − 80) · 1/2 + (106 − 104) · 0 = 16.
For X =1 and Z = 0, we receive
¡ ¯ ¢
E CTE (U ,Z ); 10 (U , Z ) ¯ X =1, Z = 0 = E (δ10 | X =1, Z = 0)
1 1
= (82 − 68) · + (100 − 96) · 0 + (98 − 80) · + (106 − 104) · 0 = 16,
2 2
for X =0 and Z =1, we receive
¡ ¯ ¢
E CTE (U ,Z );10 (U , Z ) ¯ X =0, Z =1 = E (δ10 | X =0, Z =1)
1 1
= (82 − 68) · 0 + (100 − 96) · + (98 − 80) · 0 + (106 − 104) · = 3,
2 2
and for X =1 and Z =1,
¡ ¯ ¢
E CTE (U ,Z );10 (U , Z ) ¯ X =1, Z =1 = E (δ10 | X =1, Z =1)
1 1
= (82 − 68) · 0 + (100 − 96) · + (98 − 80) · 0 + (106 − 104) · = 3.
2 2
Hence, in this example, the conditional total effects E (δ10 | Z =z) and E (δ10 | X =x , Z =z) are
identical.
According to Equation (5.41), taking the expectation
    E(E(CTE_{(U,Z);10}(U, Z) | X, Z)) = E(E(δ10 | X, Z))
        = Σ_x Σ_z E(δ10 | X=x, Z=z) · P(X=x, Z=z)
        = 16 · 1/4 + 3 · 1/4 + 16 · 1/4 + 3 · 1/4 = 9.5,
yields again the causal average total effect ATE 10 . In this equation, we used P (X =x , Z =z) =
P (X =x | Z =z) · P (Z =z), where the conditional probabilities P (X =x |Z =z) are obtained via
Equation (5.62).
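As a numerical companion to Equations (5.60) to (5.62), the following R sketch recomputes P(X=0 | Z=0), P(U=u, Z=0 | X=0, Z=0), and E(δ10 | X=0, Z=0) from the parameter values quoted above. The assignment of the two Z=0 value pairs to Joe and Ann is immaterial here, because both cells receive the weight 1/2.

# Illustrative R sketch for Eqs. (5.60) to (5.62), Table 5.2 parameter values assumed.
p_U_given_Z0   <- c(Joe = 1/2, Ann = 1/2)   # P(U=u | Z=0)
p_X0_given_UZ0 <- c(Joe = 1/4, Ann = 1/4)   # P(X=0 | U=u, Z=0)
p_Z0 <- 1/2                                 # P(Z=0)
p_X0_given_Z0 <- sum(p_X0_given_UZ0 * p_U_given_Z0)          # Eq. (5.62): 1/4
p_UZ0_given_X0Z0 <- (p_X0_given_UZ0 * p_U_given_Z0 * p_Z0) /
                    (p_X0_given_Z0 * p_Z0)                   # Eq. (5.61): 1/2 for each unit
# By Eq. (5.60) the cells with z* != 0 get probability 0, so only the Z=0 cells contribute:
sum(c(82 - 68, 98 - 80) * p_UZ0_given_X0Z0)                  # E(delta_10 | X=0, Z=0) = 16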

5.6 Summary and Conclusions

In this chapter, we introduced the concept of a true outcome variable of the value x of a pu-
tative cause variable X , which was then used to define various causal total effects. Assum-
ing P (X =x ) > 0, a true outcome variable τx has been defined such that its values are the
conditional expectation values E (Y | X =x , D X =d ) of the outcome variable Y holding con-
stant X at the value x and D X at a value d, where D X denotes a global potential confounder
of X . These conditional expectation values are uniquely defined if P (X =x , D X =d ) > 0.
Note that this requirement is not necessary in the definition of a true outcome variable
itself. Also note, in this definition we only consider total effects.
Based on the concept of a true outcome variable τx = E X =x(Y |D X ), we defined several
kinds of causal total effects of treatment x compared to another treatment x ′ using the
true total effect variable

δxx ′ = τx − τx ′ .

The definitions of a causal average total effect ATE xx ′ and of a causal Z-conditional total
effect variable CTE Z ; xx ′ (Z ) (see Box 5.1) are based on the assumption that the two true

Box 5.1 Glossary of new concepts


Let ((Ω, A, P), (F_t)_{t∈T}, C, D_C, X, Y) be a regular probabilistic causality setup, let D_X denote a global potential confounder of X, let Y be real-valued with Var(Y) > 0, let x ∈ X(Ω) be a value of X, and assume P(X=x) > 0.

τx                      True outcome variable of Y given the value x of X. It is a version of the D_X-conditional expectation of Y with respect to the probability measure P^{X=x}, that is, τx = E^{X=x}(Y | D_X). This is a random variable on (Ω, A, P) that is defined only if P(X=x) > 0. With a global potential confounder D_X, we condition on all potential confounders of X.

Additionally, let also x′ ∈ Ω′_X, assume P(X=x′) > 0, and let τx and τx′ denote true outcome variables of Y given the values x and x′ of X, respectively.

ATE_{xx′}               Causal average total effect comparing x to x′. If τx and τx′ are P-unique, then it is defined by ATE_{xx′} := E(τx − τx′).

Additionally, let also Z be a random variable on (Ω, A, P).

CTE_{Z;xx′}(Z)          Causal Z-conditional total effect variable comparing x to x′. If τx and τx′ are P-unique, then CTE_{Z;xx′}(Z) := E(τx − τx′ | Z). It is the composition of Z and CTE_{Z;xx′}, and therefore a random variable on (Ω, A, P).

CTE_{Z;xx′}             Causal Z-conditional total effect function comparing x to x′. It is a factorization of E(τx − τx′ | Z). Hence, it is a random variable on (Ω′_Z, A′_Z, P_Z).

CTE_{Z;xx′}(z)          Causal (Z=z)-conditional total effect comparing x to x′. If P(Z=z) > 0 and τx, τx′ are P^{Z=z}-unique, then CTE_{Z;xx′}(z) := E(τx − τx′ | Z=z).

CTE_{U;xx′}(u)          Causal individual total effect comparing x to x′ for unit u.

CTE_{X;xx′}(x*)         Causal (X=x*)-conditional total effect comparing x to x′.

CTE_{(X,Z);xx′}(x*, z)  Causal (X=x*, Z=z)-conditional total effect comparing x to x′.

outcome variables τx and τx ′ are P-unique. Defining the causal (Z =z)-conditional total ef-
fect CTE Z ; x x ′ (z) we only assume that τx and τx ′ are P Z=z-unique. The term ‘total’ is used in
order to distinguish these effects from direct and indirect effects, which are not considered
in this volume.
While D X is a global potential confounder of X on which we condition in order to con-
trol for all potential confounders of X , the variable Z may be used to re-aggregate the
(D X =d )-conditional total effects in order to consider less fine-grained causal conditional
total effects. Examples of Z are the observational-unit variable U , a pre-treatment variable
Z , and a treatment variable X .

Causal Average Total Effect

Often we have to content ourselves with the causal average total effect or with causal con-
ditional total effects. Note, however, that there might be cases in which half of the units
have positive causal individual total effects and the other half negative ones. The causal av-
erage total effect can then be zero. This is not a paradox but the nature of an average. Also
remember that a causal average total effect is informative for causal inference, whereas
an ordinary true mean difference E (Y | X =1) − E (Y | X =0), the prima facie effect, is not.
These prima facie effects have no causal interpretation at all, unless they are identical to
the causal average total effect. This will be detailed in chapter 6.

Main Effects Versus Conditional Effects in Analysis of Variance

Conceptually, the causal average total effect is what is tested in a t -test for two independent
groups, provided that the data are sampled in a perfect randomized experiment. Similarly,
in this case, a test of the main effect of the ‘treatment factor’ in orthogonal analysis of vari-
ance is a simultaneous test of several causal average total effects if there are more than
two treatment conditions. Furthermore, if Z is a qualitative covariate of X , then it is con-
sidered a second ‘factor’ in analysis of variance. In this case, the (Z =z)-conditional total
effects are often called the ‘simple main effects’ (see, e. g., Woodward & Bonett, 1991).
Note that causal average total effects are uniquely defined even if there are inter-indivi-
dual differences in the causal individual total effects, and even if there is interaction be-
tween X and a covariate Z of X in the sense that the effect of X depends on the values of
Z . However, only in the randomized experiment can we be sure that, with the main effects
in analysis of variance, we test the causal average total effects.
Of course, the causal conditional effects given the values of a covariate are usually more
informative than their average, that is, than the causal average total effect; but sometimes
averaging is useful in order to avoid information overload, and sometimes we may be able
to estimate precisely enough only the causal average effect, but not the causal conditional
effects, for example, because of small sample sizes.

Pre-Facto Versus Post-Facto or ‘Counterfactual’ Perspective

Note that our definitions of the various kinds of total effects solely use concepts of prob-
ability theory. No concepts have to be borrowed from philosophy or any other science,
although the basic idea goes back at least to Mill (1843/1865). We do not take a counter-
factual but a pre-facto perspective, which is the perspective taken in every application of
probability theory. Causal total effects are parameters, just in the same way as the proba-
bility of flipping ‘heads’ is a parameter about which we can talk before the coin is flipped
and even if the coin is never flipped. It is even meaningful to talk about the causal condi-
tional total effect of a treatment given control [see Eq. (5.38) and Rem. 5.47].

Theoretical Parameters Versus Data

Note that all concepts introduced in this chapter such as the causal average total effects,
causal conditional total effects, causal individual total effects, and so on, are of a purely
theoretical nature. This does not mean that they are irrelevant for practical research. On

the contrary, they explicate what exactly we are looking for when we ask for the causal total
effects, for example, comparing two values of a treatment variable with respect to an out-
come variable Y . It is these effects that we have to estimate if we want to evaluate (a) if and
how dangerous an infection is, and (b) the overall effects of medical, psychological, social,
or political interventions on a specified criterion. And this includes their (undesired) side
effects.
The examples treated in chapters 4 and 5 exemplify what we mean by ‘purely theoretical
nature’. For example, Table 5.2 does not show data that might be obtained in a data sam-
ple. Instead, it contains the theoretical parameters we would like to estimate from sample
data. Data serve to estimate theoretical parameters, including the various causal effects.
Defining these parameters is necessary if we want to study the conditions under which
these parameters can in fact be estimated.

Causal Versus Noncausal Theoretical Parameters

Not all theoretical parameters have a causal meaning. In terms of the metaphor presented
in the preface, the causal total effects are the size of the invisible man. In contrast, in chap-
ter 1, we only dealt with (a) ordinary conditional expectation values E (Y |X =x ) of an out-
come variable Y given treatment x, (b) conditional expectation values E (Y |X =x ,Z =z) of
the outcome variable given treatment x and value z of another random variable Z , (c)
differences between these (conditional) expectation values, the (conditional) prima facie
effects, and (d) averages over these conditional prima facie effects.
The conditional expectation values E (Y |X =x ) and E (Y |X =x ,Z =z) are easily estimated
under the usual assumptions made for a sample, such as the assumption of independent
and identically distributed observations. However, they are only like the length of the in-
visible man’s shadow; depending on the angle of the sun, they can be seriously biased if
mistaken for the size of the invisible man itself.

Limitations

A limitation of the concept of a true outcome variable and the definitions of causal effects
based thereon is that they are defined only for values x of a putative cause variable X that
have a positive probability. This is not restrictive as long as we confine ourselves to con-
sidering only experiments and quasi-experiments. True outcome variables are restrictive,
however, if we study causal dependencies among continuous random variables and causal
dependencies on latent variables. In these cases, the concept of a true outcome variable
does not apply. In chapter 8 we will treat a class of causality conditions that also apply
if the putative cause variable X is a continuous random variable. Furthermore, the class
of causality conditions presented in chapter 9 also applies if X is continuous or a latent
variable. In these cases causal effects and causal dependencies have to be defined with-
out true outcome variables. Nevertheless, true outcome variables are important for many
applications.
Another limitation is that we did not consider potential mediators, that is, variables that
are between X and Y . However, focussing only on causal total effects does not imply that
we deny that there are variables mediating these effects. This is one of the virtues of true
outcome theory: Given treatment x and person u, only the expectation of the outcome
variable Y is fixed, not the value of Y itself. In contrast, in Rubin’s potential outcome ap-

proach it is assumed that the value of a potential outcome variable is fixed if we condition
on a treatment x and a person u. Such a determinism is at odds with the idea that potential
mediators might also affect the outcome variable Y . True outcome theory remedies this
deficiency. Nevertheless, defining potential mediators, direct, and indirect causal effects
would necessitate a filtration (Ft )t ∈T with more than three σ-algebras Ft .

5.7 Proofs

Proof of Lemma 5.37


Define the event A := {ω ∈ Ω: P(X=x | D_X)(ω) > 0}. Then

    ∀ z ∈ Z(Ω): τx is P^{Z=z}-unique
    ⇔  ∀ z ∈ Z(Ω): P(X=x | D_X) >_{P^{Z=z}} 0                              [RS-Th. 5.27]
    ⇔  ∀ z ∈ Z(Ω): P^{Z=z}(A) = 1                                          [def. of A, (5.27)]
    ⇒  Σ_{z ∈ Z(Ω)} P(A | Z=z) · P(Z=z) = Σ_{z ∈ Z(Ω)} 1 · P(Z=z)          [RS-(5.1)]
    ⇒  P(A) = 1                                                            [RS-(1.38), Σ_z P(Z=z) = 1]
    ⇔  P(X=x | D_X) >_P 0                                                  [def. of A, (5.14)]
    ⇔  τx is P-unique.                                                     [RS-Th. 5.27]

The reverse implication immediately follows from RS-Box 5.1 (v).

Proof of Theorem 5.52

    E(CTE_{Z;xx′}(Z)) = E(E(τx − τx′ | Z))        [(5.30)]
                      = E(τx − τx′)               [RS-Box 4.1 (iv)]
                      = ATE_{xx′}.                [Def. 5.26]

Proof of Theorem 5.55

Proposition (i).
    E(CTE_{Z;xx′}(Z) | W) =_P E(E(τx − τx′ | Z) | W)        [(5.30)]
                          =_P E(τx − τx′ | W)               [RS-Box 4.1 (xiii)]
                          =_P CTE_{W;xx′}(W).               [(5.30)]

Proposition (ii).

    E(CTE_{Z;xx′}(Z) | W) =_P CTE_{W;xx′}(W)                                [(5.45)]
    ⇔  E(CTE_{Z;xx′}(Z) | W) =_P E(τx − τx′ | W)                            [Def. 5.38 (i)]
    ⇒  E(CTE_{Z;xx′}(Z) | W=w) = E(τx − τx′ | W=w)                          [RS-(2.68), RS-(4.17)]
    ⇔  E(CTE_{Z;xx′}(Z) | W=w) = CTE_{W;xx′}(w).                            [Def. 5.38 (ii)]

5.8 Exercises

⊲ Exercise 5-1 What is the conceptual framework in which we can define a true outcome variable?

⊲ Exercise 5-2 What does it mean that a true outcome variable τx is P-unique?

⊲ Exercise 5-3 Which are the values of the true outcome variable τ0 and of the conditional expec-
tation E (Y | X ,U ) for ω4 = (Joe ,yes ,+) in the example presented in Table 5.1?

⊲ Exercise 5-4 Compute the values of the true outcome variable τ0 = E X=0 (Y |U ) in the example
presented in RS-Table 2.1.

⊲ Exercise 5-5 Suppose that X is a binary treatment variable and Y an outcome variable. Why are
the conditional expectation values E (Y |X =0), E (Y |X =1), and their difference, the prima facie effect
E (Y |X =1) −E (Y |X =0), often useless in the evaluation of the causal total effect?

⊲ Exercise 5-6 Suppose that X is a treatment variable and Y an outcome variable. If the conditional
expectation values E (Y |X =x ) and their differences E (Y |X =x ) −E (Y |X =x ′ ) do not represent the
treatment effects we are interested in, then what are the treatment effects we would like to study?

⊲ Exercise 5-7 What is the difference between the causal average total effect ATE xx ′ and the prima
facie effect PFE xx ′ ?

⊲ Exercise 5-8 What is the causal conditional total effect CTE Z ; x x ′ (z) on Y comparing x to x ′ given
the value z of a random variable Z ?

⊲ Exercise 5-9 What is the causal conditional total effect CTE X ; x x ′ (x ∗ ) on Y comparing x to x ′ given
the value x ∗ of the putative cause variable X ?

⊲ Exercise 5-10 What is the causal conditional total effect CTE (X ,Z ); xx ′ (x ∗, z) on Y comparing x to
x ′ given treatment x ∗ and the value z of a random variable Z ?

⊲ Exercise 5-11 Use RS-Theorem 1.38 to compute the probability P(X =1) for the example dis-
played in Table 5.2.

⊲ Exercise 5-12 Show that Proposition (5.39) follows from independence of X and D X .

⊲ Exercise 5-13 Open your RStudio, go to www.causal-effects.de, tools, Aggregation[0,1]Xplorer,


shinyapplication.r, download, open the file, Run App. Then choose the example ‘Treatment effect, X
and Z are dependent [Inversion of SME and MEM]’, and click on ‘Compute aggregated effects and
visualizations’.

⊲ Exercise 5-14 Compute the probability P X =1(U =Ann, Z = 0) in the example of Table 5.2.

⊲ Exercise 5-15 Compute the conditional probabilities P X =x (Z=z |U =u ) in Table 5.2.

⊲ Exercise 5-16 Compute the causal average total effect ATE 10 for the random experiment pre-
sented in Table 5.2.

⊲ Exercise 5-17 Compute the causal conditional total effect CTE Z ; 10 (0) given no group therapy for
the random experiment presented in Table 5.2.

⊲ Exercise 5-18 Let Z represent sex with values m (males) and f (females). Furthermore, suppose
CTE Z ;10 (m) = 11, CTE Z ; 10 ( f ) = 5, P(Z =m) = 1/3, and P(Z = f ) = 2/3. What is the causal average
total effect ATE 10?

Solutions

⊲ Solution 5-1 First, we assume that there is a probability space (Ω,A,P). (In an empirical applica-
tion, this probability space represents the concrete random experiment considered.) Second, there
are two random variables on (Ω,A,P), say X and Y, where X represents the putative cause variable
and Y the outcome variable. Third, there is a filtration (Ft )t ∈T in A in which X is prior to Y (see Box
3.1). Fourth, we assume that P (X =x ) > 0. Fifth, we assume that D X is a global potential confounder
of X . By definition, τx = E X =x (Y |D X ) is P X =x -unique. In this definition, we do not assume that the
true outcome variable τx = E X =x (Y |D X ) is P-unique (see Exercise 5-2).

⊲ Solution 5-2 By definition of a true outcome variable τx = E X =x (Y |D X ), there may be different


versions of a true outcome variable. In general, two such versions τx and τx∗ are identical almost
surely with respect to the probability measure P X =x. This is what we mean saying that τx is P X =x -
unique (see Rem. 5.16). If we assume that τx is P-unique, then two versions τx and τx∗ are identical
almost surely with respect to the probability measure P (see Rem. 5.17).
⊲ Solution 5-3 The value τ0 (ω4 ) of the true outcome variable τ0 is E (Y | X =0,U =Joe ) = .7. In con-
trast, the value E (Y | X ,U )(ω4 ) of the conditional expectation E (Y | X ,U ) is E (Y | X =1,U =Joe ) = .8
(see the fourth row in Table 5.1).
⊲ Solution 5-4 The values of the true outcome variable τ0 = E X=0 (Y |U ) are the two conditional ex-
pectation values E (Y | X =0,U =Joe ) and E (Y | X =0,U =Ann ). Because Y is binary, E (Y | X =0,U =u )
= P(Y =1| X =0,U =u ). These conditional probabilities can be computed from the probabilities of
the elementary events presented in RS-Table 2.1 as follows:

    P(Y=1 | X=0, U=Joe) = P(Y=1, X=0, U=Joe) / P(X=0, U=Joe) = .35 / (.15 + .35) = .7

and

    P(Y=1 | X=0, U=Ann) = P(Y=1, X=0, U=Ann) / P(X=0, U=Ann) = .06 / (.24 + .06) = .2.
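A minimal R check of these two values, using only the four elementary-event probabilities for X = 0 quoted above (the X = 1 cells of RS-Table 2.1 are not needed here):

# Minimal R check, using only the X = 0 cell probabilities quoted above.
p_Y1_X0 <- c(Joe = .35, Ann = .06)       # P(Y=1, X=0, U=u)
p_Y0_X0 <- c(Joe = .15, Ann = .24)       # P(Y=0, X=0, U=u)
p_Y1_X0 / (p_Y1_X0 + p_Y0_X0)            # values of tau_0 = E^{X=0}(Y | U): .7 (Joe), .2 (Ann)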
⊲ Solution 5-5 A potential confounder W of X may determine the probabilities of being treated
[i. e., P(X=x | W) ≠ P(X=x), x ∈ {0,1}] and the (X=x)-conditional expectation values of the out-
come variable Y [i. e., E^{X=x}(Y | W) ≠ E^{X=x}(Y), x ∈ {0,1}]. In this case, there are examples in which
the difference E (Y |X =1) − E (Y |X =0) is not identical to the treatment effects to be studied. Simp-
son’s paradox presented in chapter 1 is such an example. Another example of such a potential con-
founder is W = severity of symptoms. If there is self-selection or if there is systematic selection to
treatment by experts that is also determined by the severity of the symptoms, then W will affect the
treatment probability and the conditional expectation values of the outcome variable (e. g., severity
of symptoms after treatment).
⊲ Solution 5-6 The basic idea is to consider the true total effect variable, that is, the difference

τx − τx′ = E^{X=x}(Y | D_X) − E^{X=x′}(Y | D_X), where we condition on a global potential confounder D_X of X. This means controlling for all potential confounders. If we take the expectation of the difference τx − τx′ (over the distribution of D_X), then this yields the causal average total effect on Y comparing x to x′. Note that taking this expectation necessitates assuming that τx and τx′ are P-unique.

This assumption implies that the expectation E (τx −τx ′ ) is identical for all versions of τx and τx ′ (see
Remarks 5.27 to 5.29).

⊲ Solution 5-7 The causal average total effect ATE x x ′ comparing treatment x to treatment x ′ has
been defined by Equation (5.21) (see also the solution to Exercise 5-6). It is this causal average total
effect that is of interest in the empirical sciences if our goal is to evaluate the treatment conditions x
and x ′ with respect to the outcome variable Y by a single number. In contrast, the prima facie effect
PFE x x ′ comparing x to x ′ is usually not of interest for the evaluation of such a treatment effect be-
cause it can be biased. The two terms can differ from each other because PFE x x ′ = E (Y | X =x ) −E (Y | X =x ′ )
is not necessarily identical to ATE xx ′ = E (τx ) −E (τx ′ ). Note, however, that there are conditions under
which PFE xx ′ = ATE x x ′ . Such conditions, which are called causality conditions, are studied in some
detail in the next chapters.

⊲ Solution 5-8 The causal conditional total effect (on the outcome variable Y ) comparing x to x ′
given the value z of a random variable Z is the (Z=z)-conditional expectation value of the true total
effect variable δxx ′ = τx −τx ′ , that is,

CTE Z ; x x ′ (z) = E (δxx ′ | Z=z).

It is presumed that P(Z=z ) > 0 and that τx and τx ′ are P Z=z-unique. These assumptions imply that
CTE Z ; x x ′ (z) is a uniquely defined number.

⊲ Solution 5-9 The causal conditional total effect (on Y ) comparing x to x ′ given the value x ∗ of the
putative cause variable X is the (X =x ∗)-conditional expectation value of δxx ′ = τx −τx ′ , that is,

CTE X ; x x ′ (x ∗ ) = E (δxx ′ | X =x ∗ ),


where we presume P(X=x*) > 0 and that τx and τx′ are P^{X=x*}-unique. If X represents a treatment
variable, x ∗= x , and x ′ = 0 represents a control group, then CTE X ; x0 (x) is the causal conditional total
effect comparing treatment x to control given treatment x. If x ∗= x ′ and x ′ = 0 represents a control
group, then CTE X ; x0 (0) is the causal conditional total effect comparing treatment x to control given
control. Although this sounds paradoxical, the term CTE X ; x0 (0) is meaningful and well-defined. Like
all other causal effects it refers to a random experiment to be conducted in the future. This means
that these concepts are well-defined even if the experiment is not yet conducted, or will never be
conducted (see sect. 5.4.2 for more details).

⊲ Solution 5-10 The causal conditional total effect CTE (X ,Z ); xx ′ (x ∗, z) (on Y ) comparing x to x ′
given treatment x ∗ and value z of Z is the (X =x ∗, Z=z)-conditional expectation value of the true-
effect variable, that is,
CTE (X ,Z ); x x ′ (x ∗, z) = E (δxx ′ | X =x ∗, Z=z),

where we presume P((X, Z)=(x*, z)) > 0 and that τx and τx′ are P^{X=x*, Z=z}-unique. If X represents
a treatment variable, x* = x, the value 0 of X represents a control group, and m (male) the value
of Z = sex, then CTE (X ,Z ); x0 (x,m) is the causal conditional total effect comparing treatment x to
control given treatment x and the person to be sampled is male. If x ∗= 0, then CTE (X ,Z ); x0 (0,m) is
the causal conditional total effect comparing treatment x to control given control and the person to
be sampled is male.

⊲ Solution 5-11 Note that the four pairs (u, z) of values of U and Z are disjoint and all these pairs
of values have positive probabilities. Hence, we can apply the theorem of total probability [see RS-
Eq. (1.38)]:
    P(X=1) = Σ_u Σ_z P(X=1 | U=u, Z=z) · P(U=u, Z=z)
           = (3/4 + 1/4 + 3/4 + 1/4) · 1/4 = 1/2.

Also note that P(X =1) = E (1X =1 ) = E [E (1X =1 |U , Z )] [see RS-Box 4.1 (iv)], and that the proba-
bilities P(X =1|U =u , Z=z) are the values of the conditional expectation E (1X =1 |U , Z ). Then using
RS-Equation (3.13) yields the same formula. This second way makes clear that the unconditional
probability P(X =1) is the expectation of the conditional probability P(X =1|U , Z ) [see again RS-Box
4.1 (iv)].

⊲ Solution 5-12 Independence of X and D X implies that also X and E X =x (Y |D X ) are independent
because E X =x (Y |D X ) is D X -measurable [see RS-Box 2.1 (iv)]. Hence,

    E(τx | X) =_P E(E^{X=x}(Y | D_X) | X)        [τx = E^{X=x}(Y | D_X), RS-Box 4.1 (xiv)]
              =_P E(E^{X=x}(Y | D_X))            [RS-Box 4.1 (v)]
              =_P E(τx).                         [τx = E^{X=x}(Y | D_X), RS-Box 3.1 (v)]

The proof for τx′ is analogous.

⊲ Solution 5-13 No solution provided. See what happens with the various techniques of re-aggre-
gating conditional effects, for example, re-aggregating the log odds ratios and compare it to re-ag-
gregating conditional effects according to Equation (5.44). Play with other parameter constellations.

⊲ Solution 5-14 We have to use the equation

    P^{X=x}(U=u, Z=z) = P(X=x, U=u, Z=z) / P(X=x) = P(X=x | U=u, Z=z) · P(U=u, Z=z) / P(X=x)

[see RS-Eq. (5.1)]. For U=Ann, Z=0, and X=1, this equation yields

    P^{X=1}(U=Ann, Z=0) = P(X=1 | U=Ann, Z=0) · P(U=Ann, Z=0) / P(X=1) = (3/4 · 1/4) / (1/2) = 3/8.
⊲ Solution 5-15 If P(X =x ) > 0, then, applying RS-Equation (5.1) several times, yields

    P^{X=x}(Z=z | U=u) = P^{X=x}(Z=z, U=u) / P^{X=x}(U=u)
                       = [P(Z=z, U=u, X=x)/P(X=x)] / [P(U=u, X=x)/P(X=x)]
                       = P(Z=z | X=x, U=u).

For both units u, the complementary conditional probability to P(X =1|U =u , Z = 0) = 3/4 (see
Table 5.2) is P(X =0 |U =u , Z = 0) = 1/4. Similarly the complementary conditional probability to
P (X =1|U =u , Z =1) = 1/4 (see again Table 5.2) is P(X =0 |U =u , Z =1) = 3/4. Now we can use the
equation
    P(Z=z | X=x, U=u) = P(X=x | U=u, Z=z) · P(Z=z | U=u) / Σ_z P(X=x | U=u, Z=z) · P(Z=z | U=u),
which follows from Bayes’ Theorem (see RS-Th. 1.39) using P(Z=z | X =x ,U =u ) = P U =u (Z=z | X =x )
[see RS-Eq. (5.26)]. For example, the probability of not receiving group therapy (Z = 0), if Ann is
drawn (U =Ann) and does not receive individual therapy (X =0), is

    P(Z=0 | X=0, U=Ann) = P(X=0 | U=Ann, Z=0) · P(Z=0 | U=Ann) / Σ_z P(X=0 | U=Ann, Z=z) · P(Z=z | U=Ann),        (5.63)

where P(X =0 |U =Ann, Z = 0) = 1/4, P(Z = 0 |U =Ann ) = 1/2, and


    Σ_z P(X=0 | U=Ann, Z=z) · P(Z=z | U=Ann)
        = P(X=0 | U=Ann, Z=0) · P(Z=0 | U=Ann) + P(X=0 | U=Ann, Z=1) · P(Z=1 | U=Ann)
        = 1/4 · 1/2 + 3/4 · 1/2 = 1/2.
Inserting this result into Equation (5.63) yields the conditional probability

P(Z = 0 | X =0,U =Ann ) = P X=0(Z = 0 |U =Ann ) = 1/4.

Using the same procedure for all values of U , X , and Z leads to the other conditional probabilities
displayed in the last two columns of Table 5.2.
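The Bayes'-theorem step can also be checked with a few lines of R, again using the parameter values quoted above for U = Ann:

# R check of the Bayes' theorem step, parameter values for U = Ann as quoted above.
p_X0_given_Z <- c("Z=0" = 1/4, "Z=1" = 3/4)    # P(X=0 | U=Ann, Z=z)
p_Z_given_U  <- c("Z=0" = 1/2, "Z=1" = 1/2)    # P(Z=z | U=Ann)
num   <- p_X0_given_Z["Z=0"] * p_Z_given_U["Z=0"]
denom <- sum(p_X0_given_Z * p_Z_given_U)
num / denom                                    # P(Z=0 | X=0, U=Ann) = 1/4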
⊲ Solution 5-16 In this example, (U , Z ) is a global potential confounder of X . Hence, according to
RS-Equation (3.14), Equations (5.20) and (5.21), using the parameters shown in Table 5.2 results in
    ATE_10 = Σ_u Σ_z [E(Y | X=1, U=u, Z=z) − E(Y | X=0, U=u, Z=z)] · P(U=u, Z=z)
           = [(82 − 68) + (100 − 96) + (98 − 80) + (106 − 104)] · 1/4 = 9.5.
⊲ Solution 5-17 In this example, (U , Z ) is a global potential confounder of X . Hence, according to
RS-Equation (3.14) and Equation (5.31), E (δ10 |Z = 0) = E Z = 0 (δ10 ), using the parameters displayed
in Table 5.2 results in
    CTE_{Z;10}(0) = Σ_u Σ_z [E(Y | X=1, U=u, Z=z) − E(Y | X=0, U=u, Z=z)] · P(U=u, Z=z | Z=0)
                  = Σ_u Σ_z [E(Y | X=1, U=u, Z=z) − E(Y | X=0, U=u, Z=z)] · P^{Z=0}(U=u, Z=z)
                  = (82 − 68) · 1/2 + (100 − 96) · 0 + (98 − 80) · 1/2 + (106 − 104) · 0 = 16.
⊲ Solution 5-18 Using Equation (5.41), we can compute the causal average total effect as follows:

    ATE_10 = CTE_{Z;10}(m) · 1/3 + CTE_{Z;10}(f) · 2/3 = 11 · 1/3 + 5 · 2/3 = 7.
Part III

Causality Conditions
Chapter 6

Unbiasedness and Identification of Causal Effects

In chapter 4, we introduced the concepts of a regular probabilistic causality setup, a po-


tential confounder or covariate of X , and a global potential confounder of X . In chapter 5,
we turned to the concepts of a true outcome variable, a true total effect variable, a causal
average total effect, a causal conditional total effect variable, and a causal conditional total
effect. All these parameters and variables are of a theoretical nature. It is not evident how
they can be computed (identified) from parameters of the joint distribution of observable
random variables such as X , Y, and a (possibly multivariate) covariate Z . That is, it is not
yet evident how causal effects can be computed from those parameters that can be esti-
mated in a data sample.

Tackling this problem, in this chapter we introduce and study unbiasedness of various
conditional expectation values, conditional expectations, prima facie effects, and prima
facie effect functions. In particular, we study how and under which conditions these pa-
rameters and random variables can be used to identify the corresponding causal effects
and effect functions. Hence, in this chapter we provide the link between causal effects and
causal effect functions on one side and parameters and functions that can empirically be
estimated on the other side. The unbiasedness conditions are the first and logically weak-
est kind of causality conditions, which, together with the structural components listed in
a regular probabilistic causality setup, distinguish causal stochastic dependencies from
ordinary stochastic dependencies that have no causal meaning.

We start with the concept of unbiasedness of a conditional expectation value E (Y |X =x ),


unbiasedness of the corresponding prima facie effects, that is, of the differences E (Y |X =x )
− E (Y |X =x ′ ), and unbiasedness of a conditional expectation E (Y |X ). We also treat a
first way to identify the causal average total effect ATE x x ′ . Then we consider an addi-
tional random variable Z that is assumed to be a covariate of X , and define unbiased-
ness of a conditional expectation value E (Y |X =x ,Z =z), unbiasedness of the conditional
expectations E X =x (Y |Z ), E Z=z (Y |X ), and E (Y |X, Z ), as well as the prima facie effect vari-

ables E^{X=x}(Y | Z) − E^{X=x′}(Y | Z). We also show how to identify the causal average total ef-
fect ATE x x ′ as well as a causal conditional total effect function CTE Z ; xx ′ and a causal
conditional total effect CTE Z ; x x ′ (z). Next, we illustrate these concepts by some numer-
ical examples. Finally, we show that unbiasedness can be accidental, presenting an ex-
ample in which the conditional expectation values E (Y |X =x ) are unbiased, whereas the
conditional expectation values E (Y |X =x ,Z =z) are not, although Z is a covariate of X .
This example emphasizes the limitations of unbiasedness, in particular if compared to the
causality conditions presented in chapters 8 and 10.

Requirements

Reading this chapter we assume again that the reader is familiar with the contents of the
first five chapters of Steyer (2024), referred to as RS-chapters 1 to 5. Furthermore, we as-
sume familiarity with chapters 4 and 5 of the present book.
In this chapter we will often refer to the following notation and assumptions.

Notation and Assumptions 6.1


(a) Let ((Ω, A, P), (F_t)_{t∈T}, C, D_C, X, Y) be a regular probabilistic causality setup and
let D X denote a global potential confounder of X .
(b) Let (ΩX′ , AX′ ) denote the value space of X , let x ∈ ΩX′ , let {x } ∈ AX′ , and let 1X =x
denote the indicator of the event {X =x } = {ω ∈ Ω: X (ω) = x }.
(c) Let Y be real-valued with positive variance.
(d) Assume P (X =x ) > 0, define the probability measure P X =x : A → [0, 1] by
P X =x (A) = P (A | X =x ), for all A ∈ A, let τx = E X =x (Y |D X ) denote a (version of
the) true outcome variable of Y given x, and E X =x (Y |D X ) the set of all such ver-
sions.
(e) Let Z be a covariate of X , that is, let σ(Z ) ⊂ σ(D X ), and let (ΩZ′ , AZ′ ) denote its
value space.
(f ) Let z ∈ ΩZ′ be a value of Z , let {z } ∈ AZ′ , assume P (Z =z) > 0, and define the pro-
bability measure P Z=z : A → [0, 1] by P Z=z (A) = P (A | Z =z), for all A ∈ A.
(g) Let x ′ ∈ ΩX′ and {x ′ } ∈ AX′ , let 1X =x ′ denote the indicator of the event {X =x ′ } =

{ω ∈ Ω: X(ω) = x′}, assume 0 < P(X=x′) < 1, define the measure P^{X=x′} analo-
gously to P^{X=x} in Assumption (d), let τx′ = E^{X=x′}(Y | D_X) denote a (version of the)
true outcome variable given x ′, and define δx x ′ := τx − τx ′ .
(h) Let X (Ω) = {0, 1, . . . , J } denote the image of Ω under X , for all x ∈ X (Ω), let
{x } ∈ AX′ and assume 0 < P (X =x ) < 1. Finally, let τ := (τ0 , τ1 , . . . , τJ ) denote the
(J + 1)-variate random variable of the true outcome variables τx , x ∈ X (Ω).

6.1 Unbiasedness of E (Y |X ) and Its Values E (Y |X =x )

Under the Assumptions 6.1 (a) to (d) and (g), and assuming that τx = E X =x (Y |D X ) and

τx′ = E^{X=x′}(Y | D_X) are P-unique, we defined the average total effect by ATE x x ′ = E (δxx ′ )
(see sect. 5.3). Inserting the definition of a true total effect variable δxx ′ = τx − τx ′ and the
definition of a true outcome variable yields

    ATE_{xx′} = E(δxx′) = E(τx − τx′)
              = E(τx) − E(τx′) = E(E^{X=x}(Y | D_X)) − E(E^{X=x′}(Y | D_X)).        (6.1)

Under the assumptions mentioned above, including P -uniqueness of τx and τx ′ , all terms
in these equations are uniquely defined numbers [see RS-Th. 5.27 (v)].

Remark 6.2 [A Condition Equivalent to P -Uniqueness] Remember, according to RS-The-


orem 5.27 (ii), P-uniqueness of τx is equivalent to P(X=x | D_X) >_P 0, which is defined by

    P(X=x | D_X) >_P 0  :⇔  P({ω ∈ Ω: P(X=x | D_X)(ω) > 0}) = 1.        (6.2)

Hence, assuming P(X=x | D_X) >_P 0 means that the probability of P(X=x | D_X) taking on a value greater than zero is one. ⊳

Definition 6.3 [Unbiasedness of E (Y |X =x ) and E (Y |X )]


Let the Assumptions 6.1 (a) to (d) hold.
(i) We call the conditional expectation value E (Y |X =x ) unbiased and denote it by
E (Y |X =x ) ⊢ D X if the following two conditions hold:
(a) τx is P -unique
(b) E (Y |X =x ) = E (τx ).
(ii) Let the Assumptions 6.1 (a) to (d) and (h) hold. Then we call the conditional
expectation E (Y |X ) unbiased , denoted E (Y |X ) ⊢ D X if, for all x ∈ X (Ω), the
conditional expectation values E (Y |X =x ) are unbiased.

Remark 6.4 [Unbiased With Respect to DC ] Note that Definition 6.3 refers to total effects
true outcome variables. Considering these true outcome variables τx = E X =x (Y |D X ), we
condition on a global potential confounder D X of X , and with it on its generated σ-algebra
σ(D X ) = DC (see RS-Def. 5.4). This is also the reason why DC occurs in the shortcuts for
unbiasedness. ⊳

Example 6.5 [No Treatment for Joe] In the example displayed in RS-Table 2.1, we assume
that the regular causality space is the same as specified in Example 4.10, and again, X , Y ,
and U take the roles of the putative cause variable, the outcome variable, and the global
potential confounder D X . Furthermore, the co-domain ΩX′ of X is any subset of R (includ-
ing R itself) containing the elements 0 and 1. The values of the two true outcome variables

τx = P X =x (Y =1|U ) = E X =x (Y |U ), x = 0, 1,

are displayed in the table. Whereas τ0 is the only element of the set E X=0 (Y |U ) (which
means that τ0 is uniquely defined and therefore P-unique), τ1 is not the only element

in the set E^{X=1}(Y |U) because the random variable τ1* = P^{X=1}(Y=1 |U) displayed in the
last column of RS-Table 2.1 is also an element of E^{X=1}(Y |U). Furthermore, τ1 =_P τ1* does
not hold. Instead, τ1(ω) ≠ τ1*(ω) for ω ∈ {ω1, . . . , ω4} and P({ω1, . . . , ω4}) = .5. Hence, in this
example, the conditional expectation value E(Y |X=0) is unbiased, whereas the conditions
of the definition of unbiasedness hold neither for E(Y |X=1) nor for the conditional
expectation E(Y |X) because τ1 is not P-unique (see Def. 6.3). In fact, in this example,
E(τ1) ≠ E(τ1*), which is easily seen by looking at the last two columns of RS-Table 2.1
and the probabilities P({ωi}) displayed in the first numerical column. ⊳

Remark 6.6 [Unbiased Estimators Versus Unbiased Parameters] Unbiasedness in statis-


tics usually refers to estimators of a parameter. In the framework of the theory of causal ef-
fects, we say that the conditional expectation value E (Y |X =x ) is unbiased if it is identical
to E (τx ), which is the parameter of interest in causal inference. Of course, this parameter
has to be uniquely defined. This is secured by requiring P -uniqueness of τx [see Def. 6.3
and RS-Th. 5.27 (v)]. ⊳

Remark 6.7 [Expectation With Respect to the Measure P X =x ] If P (X =x ) > 0, then

E (Y |X =x ) = E X =x (Y ) (6.3)

[see RS-Eq. (3.24)]. Hence, if P (X =x ) > 0, then the conditional expectation value E (Y |X =x )
is unbiased if and only if the expectation E X =x (Y ) of Y with respect to the conditional
probability measure P X =x [see again RS-Eq. (5.1)] is unbiased, that is, if and only if τx is
P-unique and

    E^{X=x}(Y) = E(τx) = E(E^{X=x}(Y | D_X)).        (6.4)


Remark 6.8 [Identification of E (τx )] Unbiasedness of E (Y |X =x ) is important because it
gives us access to the expectation E (τx ) of the true outcome variable τx . If E (Y |X =x ) is
unbiased, then, according to Definition 6.3 (i), an estimate of E (Y |X =x ) is also an estimate
of E (τx ). In contrast to E (τx ), the conditional expectation value E (Y |X =x ) can often be
estimated from a data sample of the random variables X and Y. The sample mean of the
values of Y that are observed together with the value x of the putative cause variable X is
such an estimate of E (Y |X =x ) (see RS-Exercise 3-4). If X is a treatment variable, then this
is the sample mean of the observed values of Y in treatment x. ⊳
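As a minimal sketch of this estimation step (assuming a data frame d with columns x and y holding an i.i.d. sample of (X, Y); both the data frame and its column names are hypothetical):

# Minimal R sketch; d is an assumed data frame with columns x and y from an i.i.d. sample.
# The subgroup sample mean estimates the conditional expectation value E(Y | X = x).
estimate_EY_given_x <- function(d, x) mean(d$y[d$x == x])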

Equivalent Conditions of Unbiasedness of E (Y |X ) and E (Y |X =x )

In the following theorem, we present three conditions that are equivalent to unbiasedness
of E (Y |X =x ). Each of these conditions involves a true outcome variable τx . Reading this
theorem, remember

    τx  1_{X=x}  ⇔  E(τx | 1_{X=x}) =_P E(τx)        (6.5)

[see RS-Prop. (4.35)]. Hence, we read τx  1X =x as τx is mean-independent from 1X =x , the


indicator variable of the event {X =x } = {ω ∈ Ω: X (ω) = x } that X takes on the value x.

Theorem 6.9 [Equivalent Conditions of Unbiasedness of E (Y |X =x )]


Let the Assumptions 6.1 (a) to (d) hold and assume that τx is P-unique.
(i) Then each of the following two equations is equivalent to E (Y |X =x ) ⊢ D X .

    E^{X=x}(τx) = E(τx)                      (6.6)
    E(τx | 1_{X=x}) =_P E(τx).               (6.7)

(ii) If, additionally,

    εx := τx − E(τx | 1_{X=x}),              (6.8)

then each of the following two equations is also equivalent to E(Y |X=x) ⊢ D_X

    E^{X=x}(εx) = E(εx)                      (6.9)
    E(εx | 1_{X=x}) =_P E(εx).               (6.10)
(Proof p. 187)

Remark 6.10 [Sufficient Conditions of Unbiasedness] Later it will be shown that E X =x (τx )
= E (τx ) as well as τx  1X =x follow from independence of τx and 1X =x (see Th. 7.7), which
itself follows from D X ⊥⊥ 1X =x , that is, from independence of a global potential confounder
D X of X and the indicator 1X =x [see Th. 8.22 (i) for X = 1X =x ]. Note that, in an experi-
ment in which X is the treatment variable and the person variable U takes the role of a
global potential confounder of X , the independence condition D X ⊥⊥ 1X =x can be created
by randomized assignment of the observational unit to treatment x (see the examples in
RS-Table 1.2 and in Table 6.3). ⊳
Remark 6.11 [Dichotomous X ] If X is dichotomous and x is one of the two values, then

    E(τx | X) =_P E(τx | 1_{X=x}) =_P E(τx | 1_{X≠x}),        (6.11)

because, if X is dichotomous, then the σ-algebras generated by X, 1_{X=x}, and 1_{X≠x} are identical. Remember, the σ-algebras σ(X), σ(1_{X=x}), and σ(1_{X≠x}) play a crucial role in the definition of the conditional expectations E(τx | X), E(τx | 1_{X=x}), and E(τx | 1_{X≠x}) (see RS-Def. 4.4). Hence, if X is dichotomous and the conditional expectations E^{X=x}(Y | D_X) and E^{X≠x}(Y | D_X) are P-unique, then

    τx  X  ⇔  τx  1_{X=x}  ⇔  E(Y | X) ⊢ D_X,        (6.12)

where E(Y | X) ⊢ D_X denotes unbiasedness of E(Y | X). Remember, if X is dichotomous, then unbiasedness of E(Y | X) is defined by unbiasedness of E(Y | X=x) and E(Y | X≠x) [see Def. 6.3 (ii)], that is, in this case

    E(Y | X) ⊢ D_X  ⇔  E(Y | X=x) ⊢ D_X  ∧  E(Y | X≠x) ⊢ D_X.        (6.13)


Remark 6.12 [Unbiasedness of E (Y |X ) in Quasi-Experiments] In empirical applications
in which there is no randomized assignment of the observational unit to one of the treat-
ment conditions, unbiasedness of E (Y |X ) or E (Y |X =x ) is not very likely. However, if we
additionally consider a (uni- or multivariate) covariate Z of X and the conditional expecta-
tion values E (Y |X =x , Z =z), then unbiasedness of these parameters and of the conditional
expectation E (Y |X, Z ) is much more realistic, even beyond experiments with randomized
assignment of the unit to a treatment condition. This motivates the following section. ⊳

6.2 Unbiasedness of E (Y |X , Z ) and Related Terms

Now we extend the concept of unbiasedness to conditioning on a (possibly multivariate)


covariate Z of X or on one of its values z. That is, we assume σ(Z ) ⊂ σ(D X ). Note that
Z := (Z 1 , . . . , Z m ) is a covariate of X if and only if Z 1 , . . . , Z m are covariates of X (see RS-sect.
2.1).

6.2.1 Definition and First Properties

Remember that a true outcome variable τx denotes a version of the D X -conditional expec-
tation of Y with respect to the probability measure P X =x . That is, τx denotes an element of
the set E X =x (Y |D X ) (see Def. 5.4 and Rem. 5.5). Furthermore, remember

P X =x (Z =z) > 0 ⇔ P Z=z (X =x ) > 0 ⇔ P (X =x , Z =z) > 0, (6.14)

and that P (X =x , Z =z) > 0 not only implies P X =x (Z =z) > 0 and P (Z =z) > 0, but also

E X =x (Y |Z =z) = E Z=z(Y |X =x ) = E (Y |X =x , Z =z) (6.15)

[see RS-Eq. (5.26)]. All terms appearing in Proposition (6.14) and Equation (6.15) have been
introduced in RS-section 5.1.
Furthermore, remember, under the Assumptions 6.1 (a) to (f),

    τx is P^{Z=z}-unique  :⇔  ∀ τx, τx* ∈ E^{X=x}(Y | D_X): P^{Z=z}({ω ∈ Ω: τx(ω) = τx*(ω)}) = 1,        (6.16)
and that this property follows from P -uniqueness of τx [see RS-Box 5.1 (v)]. Also note that
P^{Z=z}-uniqueness of τx is equivalent to P(X=x | D_X) >_{P^{Z=z}} 0 (see RS-Th. 5.27), which is defined by

    P(X=x | D_X) >_{P^{Z=z}} 0  :⇔  P^{Z=z}({ω ∈ Ω: P(X=x | D_X)(ω) > 0}) = 1.        (6.17)

If P(X=x | D_X) >_{P^{Z=z}} 0, then we also say that P(X=x | D_X) is positive, P^{Z=z}-almost surely.

Definition 6.13 [Unbiasedness of E X =x (Y |Z ) and E X =x (Y |Z =z)]


Let the Assumptions 6.1 (a) to (e) hold.
(i) Then E X =x (Y |Z ) is called unbiased , denoted E X =x (Y |Z ) ⊢ D X , if

(a) τx is P-unique
(b) E^{X=x}(Y | Z) =_P E(τx | Z).

(ii) If we additionally assume 6.1 (f ) and P X =x (Z =z) > 0, then E X =x (Y |Z =z) is


called unbiased , denoted E X =x (Y |Z =z) ⊢ D X , if

(a) τx is P Z=z-unique
(b) E X =x (Y |Z =z) = E (τx |Z =z).

Under the assumptions of Definition 6.13 (ii), the (Z =z)-conditional expectation value
E (τx |Z =z) is uniquely defined. Furthermore, if τx is P Z=z-unique for all z ∈ Z (Ω), then,
according to Lemma 5.37, it is also P-unique and
∀ z ∈ Z (Ω): E (τx |Z )(ω) = E (τx |Z =z), if ω ∈ {Z =z } . (6.18)

That is, if τx is P Z=z-unique for all z ∈ Z (Ω), then the (Z =z)-conditional expectation values
E (τx |Z =z) are the uniquely defined values of the conditional expectation E (τx |Z ).
Remark 6.14 [Unbiasedness of E Z=z(Y |X =x ) and E (Y |X =x , Z =z)] Let the assumptions
of Definition 6.13 (ii) hold. Then Proposition (6.15) allows us to define

E Z=z(Y |X =x ) ⊢ D X :⇔ E X =x (Y |Z =z) ⊢ D X (6.19)

and

E (Y |X =x , Z =z) ⊢ D X :⇔ E X =x (Y |Z =z) ⊢ D X . (6.20)

Hence, unbiasedness of E Z=z(Y |X =x ) and of E (Y |X =x , Z =z) are equivalent to unbiased-


ness of E X =x(Y |Z =z) and to each other. ⊳

Remark 6.15 [Identification of E (τx |Z =z) and E (τx |Z )] According to Definition 6.13 (i), if
E X =x (Y |Z ) is unbiased, then we can identify the Z -conditional expectation E (τx |Z ) by
E X =x (Y |Z ). Hence, an estimate of E X =x (Y |Z ) is also an estimate of E (τx |Z ) provided that
E X =x (Y |Z ) is unbiased. Analogously, if E X =x (Y |Z =z) is unbiased, then, according to Defi-
nition 6.13 (ii), we can identify the (Z =z)-conditional expectation E (τx |Z =z) of a true out-
come variable τx by the (Z =z)-conditional expectation E X =x(Y |Z =z) of Y with respect to
the conditional probability measure P X =x . Hence, an estimate of E X =x(Y |Z =z) is also an
estimate of E (τx |Z =z), provided that E X =x(Y |Z =z) is unbiased. In contrast to E (τx |Z ), the
conditional expectation E X =x (Y |Z ) can often be estimated from a data sample of the ran-
dom variables X , Y, and Z using the values of Y and Z observed together with the value x
of the putative cause variable X (see Exercise 6-10). ⊳
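A corresponding minimal sketch for the Z-conditional case (again assuming a hypothetical data frame d, now with columns x, z, and y from an i.i.d. sample) estimates the values of E^{X=x}(Y | Z) by the cell means of Y within the (X = x, Z = z) cells:

# Minimal R sketch; d is an assumed data frame with columns x, z, and y from an i.i.d. sample.
# The mean of y within the cell (X = x, Z = z) estimates E(Y | X = x, Z = z) = E^{X=x}(Y | Z = z).
cell_means <- aggregate(y ~ x + z, data = d, FUN = mean)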

Theorem 6.16 [An Implication of Unbiasedness of E X =x (Y |Z ) For E (Y |X =x , Z =z)]


Let the Assumptions 6.1 (a) to (f ) hold and assume P (X =x , Z =z) > 0. Then

E X =x (Y |Z ) ⊢ D X ⇒ E (Y |X =x , Z =z) ⊢ D X . (6.21)
(Proof p. 188)

In the following theorem we treat a sufficient condition for unbiasedness of a condi-


tional expectation E X =x (Y |Z ) with respect to the probability measure P X =x .

Theorem 6.17 [Sufficient Condition for Unbiasedness of E X =x (Y |Z )]


Let the Assumptions 6.1 (a) to (f ) hold and for all z ∈ Z (Ω) assume P X =x (Z =z) > 0. Then

∀ z ∈ Z (Ω): E X =x (Y |Z =z) ⊢ D X ⇒ E X =x (Y |Z ) ⊢ D X . (6.22)


(Proof p. 188)

In the following definition we introduce unbiasedness of E (Y |X, Z ) and of E Z=z (Y |X ),


building on the terms introduced in Definition 6.13. The latter term can be defined only
if we assume P (X =x , Z =z) > 0 for all values x ∈ X (Ω) of X . Among other things, this as-
sumption implies P (Z =z) > 0 and P (X =x ) > 0 for all x ∈ X (Ω).

Definition 6.18 [Unbiasedness of E (Y |X, Z ) and E Z=z (Y |X )]


Let the Assumptions 6.1 (a) to (e) and (h) hold.
(i) Then E (Y |X, Z ) is called unbiased , denoted E (Y |X, Z ) ⊢ D X , if

∀ x ∈ X (Ω): E X =x (Y |Z ) ⊢ D X . (6.23)

(ii) If we additionally assume 6.1 (f ) and P Z=z (X =x ) > 0, for all x ∈ X (Ω), then
E Z=z (Y |X ) is called unbiased , denoted E Z=z (Y |X ) ⊢ D X , if

∀ x ∈ X (Ω): E Z=z(Y |X =x ) ⊢ D X . (6.24)

Under the assumptions of Definition 6.18 (ii),



∀ x ∈ X (Ω): E Z=z(Y |X =x ) ⊢ D X ⇔ ∀ x ∈ X (Ω): E X =x(Y |Z =z) ⊢ D X (6.25)


⇔ ∀ x ∈ X (Ω): E (Y |X =x , Z =z) ⊢ D X (6.26)

[see Props. (6.19) and (6.20)]. Hence, we can replace Proposition (6.24) by each of the
propositions on the right-hand sides of (6.25) and (6.26).
In the next theorem we treat a condition that is equivalent to unbiasedness of a condi-
tional expectation E Z=z (Y |X ).

Theorem 6.19 [A Condition Equivalent to Unbiasedness of E Z=z (Y |X )]


Let the Assumptions 6.1 (a) to (f ) and (h) hold. Furthermore, for all x ∈ X (Ω), assume
P Z=z (X =x ) > 0 and that τx is P Z=z-unique. Then

    (∀ x ∈ X(Ω): E^{Z=z}(Y | X=x) = E^{Z=z}(τx))  ⇔  E^{Z=z}(Y | X) ⊢ D_X.        (6.27)
(Proof p. 188)

6.2.2 Equivalent Conditions of Z -Conditional Unbiasedness

Now we turn to some conditions that are equivalent to unbiasedness of a conditional ex-
pectation E X =x (Y |Z ). These conditions are also used in the proofs of sufficient conditions
of unbiasedness (see chs. 8 to 10). Note again, in this chapter we assume that Z is a covari-
ate of X . That is, in contrast to chapter 5 where we defined a causal Z -conditional total
effect variable, now we exclude that a putative cause variable X can take the role of Z .

Theorem 6.20 [Conditions Equivalent to Unbiasedness of E X =x (Y |Z )]


Let the Assumptions 6.1 (a) to (e) hold and assume that τx is P-unique.

(i) Then

    E^{X=x}(Y | Z) ⊢ D_X  ⇔  E^{X=x}(τx | Z) =_P E(τx | Z)            (6.28)
                          ⇔  E(τx | 1_{X=x}, Z) =_P E(τx | Z).        (6.29)

(ii) If, additionally, for all τx ∈ E X =x (Y |D X ), we define

εx := τx − E (τx | 1X =x , Z ), (6.30)

then

    E^{X=x}(Y | Z) ⊢ D_X  ⇔  E^{X=x}(εx | Z) =_P E(εx | Z)            (6.31)
                          ⇔  E(εx | 1_{X=x}, Z) =_P E(εx | Z).        (6.32)
(Proof p. 189)

Hence, if we assume that Z is a covariate of X and τx is P-unique, then Theorem 6.20


provides four equations that are equivalent to E^{X=x}(Y | Z) =_P E(τx | Z). The first two use the
the conditional expectation E (τx | 1X =x , Z ) [see Eq. (6.30)]. According to Equation (6.29), the

conditional expectation E X =x (Y |Z ) is unbiased if and only if τx is Z -conditionally mean-


independent from the indicator variable 1X =x .
In the next corollary we present some conditions that are equivalent to unbiasedness
of the conditional expectation E (Y |X , Z ), provided that Z is a covariate of X and all true
outcome variables τx are P-unique. This corollary immediately follows from Theorem 6.20
and Definition 6.18 (i).

Corollary 6.21 [Conditions Equivalent to Unbiasedness of E (Y |X , Z )]


Let the Assumptions 6.1 (a) to (e) and (h) hold, and assume that for all x ∈ X (Ω), τx is
P-unique. Then

    E(Y | X, Z) ⊢ D_X  ⇔  ∀ x ∈ X(Ω): E^{X=x}(τx | Z) =_P E(τx | Z)            (6.33)
                       ⇔  ∀ x ∈ X(Ω): E(τx | 1_{X=x}, Z) =_P E(τx | Z).        (6.34)

If we additionally assume that εx is the residual defined by Equation (6.30), then

    E(Y | X, Z) ⊢ D_X  ⇔  ∀ x ∈ X(Ω): E^{X=x}(εx | Z) =_P E(εx | Z)            (6.35)
                       ⇔  ∀ x ∈ X(Ω): E(εx | 1_{X=x}, Z) =_P E(εx | Z).        (6.36)

Hence, according to Equation (6.34), the conditional expectation E (Y |X , Z ) is unbiased


if and only if, for all x ∈ X (Ω), the true outcome variable τx is Z -conditionally mean-
independent from the indicator variable 1X =x . Furthermore, E (Y |X , Z ) is also unbiased
if and only if, for all x ∈ X (Ω), the residual εx is Z -conditionally mean-independent from
1X =x [see Eq. (6.36)].
In the next theorem we present two conditions each of which is equivalent to unbiased-
ness of E X =x (Y |Z =z). According to Equation (6.15), these conditions are also equivalent
to unbiasedness of E (Y |X =x , Z =z).

Theorem 6.22 [Condition Equivalent to Unbiasedness of E X =x (Y |Z =z)]


Let the Assumptions 6.1 (a) to (f ) hold. If τx is P Z=z-unique, then

E X =x(Y |Z =z) ⊢ D X ⇔ E X =x (τx |Z =z) = E (τx |Z =z). (6.37)

If we additionally assume that εx is defined by Equation (6.30), then

E X =x (Y |Z =z) ⊢ D X ⇔ E X =x (εx |Z =z) = E (εx |Z =z). (6.38)


(Proof p. 190)

6.3 Unbiasedness of Prima Facie Effects

Now we introduce the concept of unbiasedness of a prima facie effect

PFE x x ′ := E (Y | X =x ) − E (Y | X =x ′ ). (6.39)

Furthermore, for a covariate Z of X , we also extend the concept of unbiasedness to a


Z-conditional prima facie effect function PFE Z ; x x ′ : ΩZ′ → R , which satisfies

    PFE_{Z;xx′}(Z) =_P E^{X=x}(Y | Z) − E^{X=x′}(Y | Z),        (6.40)

where PFE Z ; x x ′ (Z ) denotes the composition of Z and PFE Z ; x x ′ . While PFE Z ; x x ′ assigns to
each value z ∈ ΩZ′ a (Z =z)-conditional prima facie effect of x compared to x ′, the composi-
tion PFE Z ; x x ′ (Z ) is a random variable on (Ω, A, P ) assigning values to each ω ∈ Ω. We call
the composition PFE Z ; x x ′ (Z ) a Z-conditional prima facie effect variable. Finally, presum-
ing P (X =x , Z =z), P (X =x ′, Z =z) > 0, we define the (Z =z)-conditional prima facie effect

PFE Z ; x x ′ (z) := E (Y | X =x , Z =z) − E (Y | X =x ′, Z =z). (6.41)
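Prima facie effects are functions of the joint distribution of X, Z, and Y alone, so they can be computed (and estimated) directly; whether they equal the causal effects of chapter 5 is exactly what the unbiasedness conditions below address. As a numerical sketch, assuming the parameter values of the Table 5.2 example of chapter 5 (as used in the computations there), the following R code contrasts PFE_10 with ATE_10 = 9.5; with these assumed values the prima facie effect even turns out slightly negative.

# Illustrative R sketch, assuming the Table 5.2 parameters of the chapter-5 example.
cells <- data.frame(
  EY1 = c(82, 100, 98, 106),    # E(Y | X=1, U=u, Z=z)
  EY0 = c(68,  96, 80, 104),    # E(Y | X=0, U=u, Z=z)
  p   = rep(1/4, 4),            # P(U=u, Z=z)
  pX1 = c(3/4, 1/4, 3/4, 1/4)   # P(X=1 | U=u, Z=z)
)
ATE10 <- with(cells, sum((EY1 - EY0) * p))                          # causal average total effect: 9.5
pX1_marg <- with(cells, sum(pX1 * p))                               # P(X=1) = 1/2
EY_X1 <- with(cells, sum(EY1 * pX1 * p)) / pX1_marg                 # E(Y | X=1)
EY_X0 <- with(cells, sum(EY0 * (1 - pX1) * p)) / (1 - pX1_marg)     # E(Y | X=0)
PFE10 <- EY_X1 - EY_X0      # prima facie effect; far from ATE10 without a causality condition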

Definition 6.23 [Unbiasedness of a Prima Facie Effect]


Let the Assumptions 6.1 (a) to (d) and (g) hold.
(i) Then the prima facie effect PFE x x ′ is called unbiased , denoted PFE x x ′ ⊢ DC , if
(a) τx and τx ′ are P-unique
(b) PFE x x ′ = E (τx − τx ′ ).
(ii) If we additionally assume 6.1 (e), then PFE Z ; x x ′ and PFE Z ; x x ′ (Z ) are called
unbiased , denoted PFE Z ; x x ′ ⊢ DC , if
(a) τx and τx ′ are P-unique
(b) PFE_{Z;xx′}(Z) =_P E(τx − τx′ | Z).

(iii) If we additionally assume 6.1 (f ), P (X =x , Z =z) > 0, and P (X =x ′, Z =z) > 0,


then PFE Z ; x x ′ (z) is called unbiased , denoted PFE Z ; x x ′ (z) ⊢ DC , if

(a) τx and τx ′ are P Z=z-unique


(b) PFE Z ; x x ′ (z) = E (τx − τx ′ |Z =z).

Note again that all three concepts of unbiasedness refer to total effects and that Z de-
notes a covariate of X .
Now we show that unbiasedness of the conditional expectation values E (Y |X =x ) and
E (Y |X =x ′ ) implies unbiasedness of the prima facie effect PFE x x ′ .

Theorem 6.24 [Unbiasedness of the Prima Facie Effect PFE x x ′ ]


Let the Assumptions 6.1 (a) to (d) and (g) hold. Then
    (E(Y | X=x) ⊢ D_X  ∧  E(Y | X=x′) ⊢ D_X)  ⇒  PFE_{xx′} ⊢ D_C.        (6.42)
(Proof p. 190)

Hence, under the Assumptions 6.1 (a) to (d) and (g), unbiasedness of E (Y |X =x ) and
E (Y |X =x ′ ) implies that the prima facie effect PFE x x ′ is unbiased.
Next, we show that unbiasedness of the Z-conditional expectations E^{X=x}(Y | Z) and E^{X=x′}(Y | Z) implies unbiasedness of the Z-conditional prima facie effect function PFE Z ; x x ′ .

Theorem 6.25 [Unbiasedness of Prima Facie Effect Function PFE Z ; x x ′ ]


Let the Assumptions 6.1 (a) to (e), and (g) hold. Then

    (E^{X=x}(Y | Z) ⊢ D_X  ∧  E^{X=x′}(Y | Z) ⊢ D_X)  ⇒  PFE_{Z;xx′} ⊢ D_C.        (6.43)
(Proof p. 190)

Hence, under the Assumptions 6.1 (a) to (e) and (g), unbiasedness of E^{X=x}(Y | Z) and E^{X=x′}(Y | Z) implies that the Z -conditional prima facie effect function PFE Z ; x x ′ and the
composition PFE Z ; x x ′ (Z ) of PFE Z ; x x ′ and Z , the Z -conditional prima facie effect variable,
are unbiased [see Def. 6.23 (ii)].
In the following theorem, we explicate the relationship between unbiasedness of the

(Z =z)-conditional expectation values E^{X=x}(Y | Z=z), E^{X=x′}(Y | Z=z), and unbiasedness of
the (Z =z)-conditional prima facie effect PFE Z ; x x ′ (z).

Theorem 6.26 [Unbiasedness of a (Z =z)-Conditional Prima Facie Effect PFE Z ; x x ′ (z)]


Let the Assumptions 6.1 (a) to (f ) and (g) hold. Then

    (E^{X=x}(Y | Z=z) ⊢ D_X  ∧  E^{X=x′}(Y | Z=z) ⊢ D_X)  ⇒  PFE_{Z;xx′}(z) ⊢ D_C.        (6.44)
(Proof p. 191)

Hence, under the Assumptions 6.1 (a) to (f) and (g), unbiasedness of conditional expec-

tation values E X =x (Y |Z =z) and E X =x (Y |Z =z) implies that the (Z =z)-conditional prima
facie effect PFE Z ; x x ′ (z) is unbiased.
Box 6.1 summarizes the definitions of unbiasedness of various conditional expecta-
tions, their values, prima facie effect functions, and prima facie effects.

Remark 6.27 [Estimability of Conditional Expectations] While true outcome variables τx


and their values are not directly estimable in a data sample unless rather restrictive as-
sumptions are introduced, the conditional expectation values E (Y |X =x ), E (Y |X =x ,Z =z),
and the conditional expectations E X =x (Y |Z ) and E (Y |X, Z ) can be estimated under real-
istic assumptions, and the same is true for the conditional and unconditional prima facie
effects. As we shall see, this implies that causal average total effects and causal (Z =z)-
conditional total effects can be estimated as well, provided that we assume that the con-
ditional expectations E (Y |X ), E X =x (Y |Z ), and E (Y |X, Z ) mentioned above are unbiased
(see sect. 6.4 for details). ⊳

Remark 6.28 [Unbiasedness and Randomization] In chapter 8 we show that unbiased-


ness of the conditional expectations, their values, and the prima facie effects can be cre-
ated by randomized assignment of the observational unit to one of several treatment con-
ditions. Unbiasedness can also be strived for by covariate selection. That is, we may try
to select covariates Z 1 , . . . , Z m of X such that unbiasedness of the conditional expectations
E X =x (Y |Z ) holds for the m-variate covariate Z := (Z 1 , . . . , Z m ) and all values x of X . ⊳

Remark 6.29 [Unbiasedness and Covariate Selection] Unfortunately, unbiasedness can-


not be used as a criterion for covariate selection. The reason is that it cannot be tested
empirically because the definitions involve the true outcome variables τx . These variables

Box 6.1 Unbiasedness

Unbiasedness of various conditional expectations, their values, prima facie effect functions,
and prima facie effects is symbolized and defined as follows:

E(Y | X=x) ⊢ D_X        E(Y | X=x) is unbiased. Under the Assumptions 6.1 (a) to (d) it is defined by P-uniqueness of τx and E(Y | X=x) = E(τx).

E(Y | X) ⊢ D_X          E(Y | X) is unbiased. Under the Assumptions 6.1 (a) to (d) and (h) it is defined by unbiasedness of E(Y | X=x) for all values x ∈ X(Ω).

PFE_{xx′} ⊢ D_C         The prima facie effect PFE_{xx′} = E(Y | X=x) − E(Y | X=x′) is unbiased. Under the Assumptions 6.1 (a) to (d) and (g) it is defined by P-uniqueness of τx, τx′ and PFE_{xx′} = E(τx − τx′).

E^{X=x}(Y | Z) ⊢ D_X    E^{X=x}(Y | Z) is unbiased. Under the Assumptions 6.1 (a) to (e) it is defined by P-uniqueness of τx and E^{X=x}(Y | Z) =_P E(τx | Z).

E(Y | X, Z) ⊢ D_X       E(Y | X, Z) is unbiased. Under the Assumptions 6.1 (a) to (e) and (h) it is defined by unbiasedness of E^{X=x}(Y | Z) for all values x ∈ X(Ω).

PFE_{Z;xx′} ⊢ D_C       The prima facie effect function PFE_{Z;xx′}: Ω′_Z → R and the prima facie effect variable PFE_{Z;xx′}(Z) = E^{X=x}(Y | Z) − E^{X=x′}(Y | Z) are unbiased. Under the Assumptions 6.1 (a) to (e) and (g) it is defined by P-uniqueness of τx, τx′, and PFE_{Z;xx′}(Z) =_P E(τx − τx′ | Z).

E^{X=x}(Y | Z=z) ⊢ D_X  E^{X=x}(Y | Z=z) is unbiased. Under the Assumptions 6.1 (a) to (f) and P(X=x, Z=z) > 0 it is defined by P^{Z=z}-uniqueness of τx and E^{X=x}(Y | Z=z) = E(τx | Z=z).

E^{Z=z}(Y | X) ⊢ D_X    E^{Z=z}(Y | X) is unbiased. Under the Assumptions 6.1 (a) to (f), (h), and P(X=x, Z=z) > 0 for all x ∈ X(Ω), it is defined by unbiasedness of E^{X=x}(Y | Z=z) for all values x ∈ X(Ω).

PFE_{Z;xx′}(z) ⊢ D_C    The prima facie effect PFE_{Z;xx′}(z) = E^{X=x}(Y | Z=z) − E^{X=x′}(Y | Z=z) is unbiased. Under the Assumptions 6.1 (a) to (g) and P(X=x, Z=z), P(X=x′, Z=z) > 0, it is defined by P^{Z=z}-uniqueness of τx, τx′ and PFE_{Z;xx′}(z) = E(τx − τx′ | Z=z).

These variables cannot even be estimated unless very restrictive assumptions are introduced. This has been discussed in some detail by Holland (1986) and has been called the "fundamental problem of causal inference" (see also the preface). However, in chapters 7 to 10 we introduce other causality conditions that can be tested empirically and that imply unbiasedness. It is those causality conditions that can be used for covariate selection in empirical causal research. ⊳
6.4 Identification of Causal Total Effects

In chapter 5 we introduced causal average total effects and causal conditional total effect functions, which, in the first place, are of a purely theoretical nature. They just define what we are interested in, for example, in studies evaluating the causal effects of a treatment, an intervention, or an exposition. Now we study how causal total effects can be identified, that is, how they can be computed from parameters that can be estimated in samples, and how causal conditional total effect functions can be identified by functions that can be estimated in samples.
6.4.1 Identification of the Causal Average Total Effect

In Definition 5.26 we introduced the causal average total effect

   ATE_{xx′} = E(τ_x − τ_{x′}),                                              (6.45)

presuming that τ_x and τ_{x′} are P-unique. By taking its expectation, the true total effect variable δ_{xx′} = τ_x − τ_{x′} is coarsened to a single number. With such a coarsening, we often lose information. However, the resulting causal average total effect is still unbiased. In this context, instead of coarsening, we also use the term aggregation or re-aggregation. To emphasize: re-aggregation does not mean ignoring the potential confounders of a putative cause variable X. By definition, a potential confounder of X is a random variable on the probability space considered that is measurable with respect to the global potential confounder D_X [see Def. 4.11 (iii)]. Re-aggregation only means coarsening and losing information about more fine-grained conditional effects. It does not reintroduce bias. Instead, re-aggregation maintains causal interpretability.

Remark 6.30 [Basic Idea of the True-Outcome Theory of Causal Effects]  In the construction of the theory of causal effects, we first condition on a global potential confounder D_X in order to control for all potential confounders. Doing this, we obtain the most fine-grained causal total effect variable CTE_{D_X;xx′}(D_X) = τ_x − τ_{x′} [see Def. 5.18 (i)]. Then we re-aggregate it and obtain a coarsened causal effect function or effect parameter that can be computed from an empirically estimable function or parameter. ⊳
In Definition 6.23 we defined unbiasedness of the prima facie effect PFE_{xx′} by P-uniqueness of τ_x, τ_{x′}, and PFE_{xx′} = E(τ_x − τ_{x′}). Hence, the definitions of ATE_{xx′} and unbiasedness of the prima facie effect PFE_{xx′} immediately yield the following corollary.

Corollary 6.31 [Identifying the Causal Average Total Effect via PFE_{xx′}]

Let the Assumptions 6.1 (a) to (d) and (g) hold, and assume that PFE_{xx′} is unbiased. Then

   ATE_{xx′} = PFE_{xx′} .                                                   (6.46)
Remark 6.32 [Ignoring Potential Confounders of X]  In contrast to re-aggregation of the true total effect variable τ_x − τ_{x′}, considering the prima facie effect E(Y|X=x) − E(Y|X=x′), and in this sense ignoring potential confounders of X, may lead to a completely wrong conclusion about the causal average total effect (of x compared to x′ on Y), unless the prima facie effect E(Y|X=x) − E(Y|X=x′) is unbiased. Even a reversal of effects is possible. That is, there may be a positive prima facie effect while the causal average total effect is negative, and vice versa. This is exemplified in the following example. ⊳
Example 6.33 [Joe and Ann With Self-Selection]  Table 6.1 displays the crucial parameters of a random experiment in which the effect of treatment 1 compared to treatment 0 is reversed if the person variable U is ignored. This table has already been shown in a similar form in Table 1.2. However, now it is written in the terms introduced in chapters 5 and 6. In this example, the person variable U is a global potential confounder of X. The causal average total effect is

   ATE_10 = E(E^{X=1}(Y|U) − E^{X=0}(Y|U)) = E(CTE_{U;10}(U)) = E(τ_1 − τ_0)
          = E(τ_1) − E(τ_0)
          = Σ_u E^{X=1}(Y|U=u) · P(U=u)  −  Σ_u E^{X=0}(Y|U=u) · P(U=u)
          = (.8 · 1/2 + .4 · 1/2) − (.7 · 1/2 + .2 · 1/2) = .6 − .45 = .15

[see Box 6.2 (ii)]. In contrast, the corresponding prima facie effect is

   PFE_10 = E(Y|X=1) − E(Y|X=0)
          = E(τ_1|X=1) − E(τ_0|X=0)
          = Σ_u E^{X=1}(Y|U=u) · P(U=u|X=1)  −  Σ_u E^{X=0}(Y|U=u) · P(U=u|X=0)
          = (.8 · 1/20 + .4 · 19/20) − (.7 · 4/5 + .2 · 1/5) = .42 − .60 = −.18

[see Box 6.2 (i)]. Considering Box 6.2 and comparing Equations (i) and (ii) to each other reveals why such a reversal of effects (.15 vs. −.18) can occur: Computing the conditional expectation value E(Y|X=x), we weight the conditional expectation values E^{X=x}(Y|U=u) by the conditional probabilities P(U=u|X=x) [see Eq. (i) in Box 6.2], whereas computing the expectation E(τ_x) of the true outcome variable, we weight them by the unconditional probabilities P(U=u) [see Eq. (ii) in that box]. [For a proof of Equations (i) to (iv) of Box 6.2, see Exercise 6-11.] ⊳
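For readers who wish to re-check this arithmetic, the following minimal Python sketch computes ATE_10 and PFE_10 directly from the fundamental parameters of Table 6.1, using Equations (i) and (ii) of Box 6.2. The dictionary layout and variable names are ours and not part of the theory; exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

# Fundamental parameters of Table 6.1 (Joe and Ann with self-selection):
# P(U=u), P(X=1 | U=u), E^{X=0}(Y | U=u), E^{X=1}(Y | U=u)
units = {
    "Joe": dict(pU=F(1, 2), pX1=F(4, 100), mu0=F(7, 10), mu1=F(8, 10)),
    "Ann": dict(pU=F(1, 2), pX1=F(76, 100), mu0=F(2, 10), mu1=F(4, 10)),
}

# E(tau_x) = sum_u E(Y | X=x, U=u) * P(U=u)                  [Box 6.2 (ii)]
E_tau = {x: sum(d["mu" + str(x)] * d["pU"] for d in units.values()) for x in (0, 1)}

# P(X=x) and P(U=u | X=x), needed for Box 6.2 (i)
pX = {1: sum(d["pX1"] * d["pU"] for d in units.values())}
pX[0] = 1 - pX[1]
pU_given_X = {
    (u, x): (d["pX1"] if x == 1 else 1 - d["pX1"]) * d["pU"] / pX[x]
    for u, d in units.items() for x in (0, 1)
}

# E(Y | X=x) = sum_u E(Y | X=x, U=u) * P(U=u | X=x)          [Box 6.2 (i)]
E_Y_given_X = {
    x: sum(units[u]["mu" + str(x)] * pU_given_X[(u, x)] for u in units) for x in (0, 1)
}

print("ATE_10 =", float(E_tau[1] - E_tau[0]))                # 0.15
print("PFE_10 =", float(E_Y_given_X[1] - E_Y_given_X[0]))    # -0.18
```

The two print statements reproduce the reversal of sign discussed above.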
According to the following theorem, we can also identify the causal average total effect ATE_{xx′} if the prima facie effect PFE_{xx′} is not unbiased. It suffices to assume that the Z-conditional prima facie effect variable PFE_{Z;xx′}(Z) =_P E^{X=x}(Y|Z) − E^{X=x′}(Y|Z) is unbiased. Reading this theorem, note that Z may be a multivariate covariate of X such that Z = (Z_1, ..., Z_m) consists of m univariate covariates Z_i, i = 1, ..., m.

Theorem 6.34 [Identifying the Causal Average Total Effect via PFE_{Z;xx′}]

Let the Assumptions 6.1 (a) to (e) and (g) hold, and assume that PFE_{Z;xx′} is unbiased. Then

   ATE_{xx′} = E(PFE_{Z;xx′}(Z)) .                                           (6.47)

(Proof p. 191)

In Equation (6.47) we re-aggregate the prima facie effect function PFE_{Z;xx′} to obtain a single number. If PFE_{Z;xx′} is unbiased, then this does not mean ignoring the potential confounders of X.
Table 6.1. Joe and Ann with self-selection to treatment

Person u   P(U=u)   P(X=1|U=u)   E^{X=0}(Y|U=u)   E^{X=1}(Y|U=u)   CTE_{U;10}(u)   P(U=u|X=0)   P(U=u|X=1)
Joe        1/2      .04          .7               .8               .1              4/5          1/20
Ann        1/2      .76          .2               .4               .2              1/5          19/20

                       x=0     x=1
E(τ_x):                .45     .6       ATE_10 =  .15
E(Y|X=x):              .6      .42      PFE_10 = −.18
Instead, we just re-aggregate (coarsen) the causal Z-conditional total effect variable to obtain a single number, the causal average total effect ATE_{xx′}.
Remark 6.35 [Foundation for the Analysis of Causal Total Effects]  Theorem 6.34 is the theoretical foundation for the analysis of causal average total effects beyond the simple randomized experiment. The crucial assumption is unbiasedness of PFE_{Z;xx′}, and this assumption may also hold in observational studies. Of course, finding a (possibly multivariate) covariate Z of X for which PFE_{Z;xx′} is unbiased is often a challenge for empirical research. Also note that if PFE_{Z;xx′} is unbiased, then it is identical to the more fine-grained causal total effect variable CTE_{Z;xx′}(Z) (see Cor. 6.37), which is much more informative than the causal average total effect ATE_{xx′}. In the chapters to come we will learn more about sufficient conditions for unbiasedness of PFE_{Z;xx′}. ⊳
Remark 6.36 [Z-Adjusted (X=x)-Conditional Expectation Value of Y]  If we insert Equation (6.40) into the right-hand side of Equation (6.47), then we obtain

   ATE_{xx′} = E(E^{X=x}(Y|Z)) − E(E^{X=x′}(Y|Z)) .                          (6.48)

The first term on the right-hand side of this equation is called the Z-adjusted (X=x)-conditional expectation value of Y, and the second term the Z-adjusted (X=x′)-conditional expectation value of Y. Again, we presume that Z is a covariate of X. ⊳
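As a small numerical illustration of Equation (6.48), the following sketch computes the two Z-adjusted conditional expectation values for the conditionally randomized experiment of Table 6.4 in section 6.5, where PFE_{Z;10} is unbiased, Z is the covariate sex, and P(Z=m) = 2/3, P(Z=f) = 1/3. The helper function adjusted_mean and the variable names are ours.

```python
from fractions import Fraction as F

def adjusted_mean(cond_means, z_probs):
    """Z-adjusted (X=x)-conditional expectation value of Y:
    E(E^{X=x}(Y | Z)) = sum_z E^{X=x}(Y | Z=z) * P(Z=z)."""
    return sum(cond_means[z] * z_probs[z] for z in z_probs)

# Conditional expectation values taken from Table 6.4 (section 6.5),
# where E(Y | X=x, Z=z) is unbiased.
p_Z = {"m": F(2, 3), "f": F(1, 3)}
E_Y_x1_given_Z = {"m": F(185, 2), "f": 122}     # 92.5 and 122
E_Y_x0_given_Z = {"m": 83, "f": 111}

adj1 = adjusted_mean(E_Y_x1_given_Z, p_Z)       # 102.333... = E(tau_1)
adj0 = adjusted_mean(E_Y_x0_given_Z, p_Z)       #  92.333... = E(tau_0)
print(float(adj1), float(adj0), float(adj1 - adj0))   # difference = ATE_10 = 10
```

The two adjusted values, 102.333 and 92.333, equal E(τ_1) and E(τ_0) of that example, and their difference is the causal average total effect ATE_10 = 10, although the unadjusted prima facie effect there is −3.086.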
6.4.2 Identification of a Causal Conditional Total Effect Function

In Definition 5.38 we introduced the causal conditional total effect function CTE_{Z;xx′} and the composition CTE_{Z;xx′}(Z) by

   CTE_{Z;xx′}(Z) =_P E(τ_x − τ_{x′}|Z),                                     (6.49)

assuming that τ_x and τ_{x′} are P-unique and that Z is a random variable on (Ω, 𝒜, P). In Definition 6.23 (ii) we defined unbiasedness of the prima facie effect function PFE_{Z;xx′} by
Box 6.2  Conditional expectation values in the examples of this chapter

Consider the examples presented in Tables 6.1 to 6.4, let x ∈ X(Ω) denote a value of the treatment variable X, and let U denote the observational-unit variable. In these examples, U = D_X, P(X=x, U=u) > 0 for all values u ∈ U(Ω), and τ_x = E^{X=x}(Y|U) ∈ ℰ^{X=x}(Y|U). Then:

   E(Y|X=x) = E(τ_x|X=x) = Σ_u E(Y|X=x, U=u) · P(U=u|X=x) ,                  (i)

whereas

   E(τ_x) = Σ_u E(Y|X=x, U=u) · P(U=u) .                                     (ii)

Additionally, let Z be a covariate of X and let z ∈ Z(Ω). For the examples in Tables 6.1 to 6.4, in which U is finite, this implies P(X=x, Z=z) > 0 and

   E(Y|X=x, Z=z) = E(τ_x|X=x, Z=z) = Σ_u E(Y|X=x, U=u) · P(U=u|X=x, Z=z) ,   (iii)

whereas

   E(τ_x|Z=z) = Σ_u E(Y|X=x, U=u) · P(U=u|Z=z) .                             (iv)
   PFE_{Z;xx′}(Z) =_P E(τ_x − τ_{x′}|Z),                                     (6.50)

assuming that Z is a covariate of X as well as P-uniqueness of τ_x and τ_{x′}. Because the covariate Z of X is a random variable on (Ω, 𝒜, P), these two definitions immediately imply the following corollary, according to which a conditional prima facie effect function PFE_{Z;xx′} is a causal conditional total effect function CTE_{Z;xx′} if it is unbiased.
Corollary 6.37 [Identifying a Causal Z-Conditional Effect Function via PFE_{Z;xx′} I]

Let the Assumptions 6.1 (a) to (e) and (g) hold, and assume that PFE_{Z;xx′} is unbiased. Then

   CTE_{Z;xx′}(Z) =_P PFE_{Z;xx′}(Z) =_P E^{X=x}(Y|Z) − E^{X=x′}(Y|Z) .      (6.51)
Remark 6.38 [A Measurability Assumption for Z]  Under the assumptions of Theorem 6.34, the expectation of an unbiased prima facie effect variable PFE_{Z;xx′}(Z) is identical to the causal average total effect ATE_{xx′}. This implies a complete re-aggregation of a causal Z-conditional total effect function to a single number. In Theorem 6.39 we extend this result to a partial re-aggregation that yields a causal V-conditional total effect variable CTE_{V;xx′}(V). In this theorem we do not only assume PFE_{Z;xx′}(Z) to be unbiased (and with it that Z is a covariate of X) and P-uniqueness of the true outcome variables τ_x and τ_{x′}, but also

   σ(V) ⊂ σ(1_{X=x}, 1_{X=x′}, Z).                                           (6.52)

This measurability assumption holds, for example, if one of the following conditions applies:

(a) σ(V) ⊂ σ(Z)
(b) σ(V) ⊂ σ(1_{X=x})
(c) σ(V) ⊂ σ(1_{X=x′}).

It is also implied by σ(V) ⊂ σ(1_{X=x}, Z) as well as by σ(V) ⊂ σ(1_{X=x′}, Z). Conditions implying (a) are found in RS-Lemma 2.35 and RS-Corollary 2.36. ⊳
Theorem 6.39 [Identifying a Causal V-Conditional Effect Function via PFE_{Z;xx′} II]

Let the Assumptions 6.1 (a) to (e) and (g) hold, let V be a random variable on (Ω, 𝒜, P), and assume that Proposition (6.52) holds and PFE_{Z;xx′}(Z) is unbiased. Then

   CTE_{V;xx′}(V) =_P E(PFE_{Z;xx′}(Z) | V)                                  (6.53)
                  =_P E(E^{X=x}(Y|Z) | V) − E(E^{X=x′}(Y|Z) | V) .           (6.54)

(Proof p. 191)
Remark 6.40 [Comparing PFE_{V;xx′}(V) to E(PFE_{Z;xx′}(Z)|V)]  Even if V is Z-measurable, the right-hand side of Equation (6.53) is not necessarily almost surely identical to the V-conditional prima facie effect variable

   PFE_{V;xx′}(V) =_P E^{X=x}(Y|V) − E^{X=x′}(Y|V) .

The two terms PFE_{V;xx′}(V) and E(PFE_{Z;xx′}(Z)|V) are functions of V, which implies that they are V-measurable (see RS-Lem. 2.35). However, if PFE_{Z;xx′}(Z) is unbiased and (6.52) holds for V, then E(PFE_{Z;xx′}(Z)|V) is almost surely identical to a causal conditional total effect variable CTE_{V;xx′}(V) [see Eq. (6.53)]. In contrast, even if V is Z-measurable, PFE_{V;xx′}(V) does not have a causal meaning unless it is unbiased. Unbiasedness of PFE_{V;xx′}(V) does not follow from the assumptions of Theorem 6.39. Hence, Theorem 6.39 offers a way to identify CTE_{V;xx′}(V) even if PFE_{V;xx′}(V) is biased. The two crucial assumptions implying Equation (6.53) are unbiasedness of PFE_{Z;xx′}(Z) and the measurability assumption (6.52). ⊳
Now we extend the result of Theorem 6.39 on the identification of a causal V-conditional total effect function CTE_{V;xx′}. In Theorem 6.39 we assumed σ(V) ⊂ σ(1_{X=x}, 1_{X=x′}, Z). In contrast, in Theorem 6.41 we assume

   σ(V) ⊂ σ(X, Z).                                                           (6.55)

This assumption holds, for example, if one of the following conditions applies:

(a) σ(V) ⊂ σ(Z)
(b) σ(V) ⊂ σ(X).

More general conditions implying Proposition (6.55) are found in RS-Lemma 2.35 and RS-Corollary 2.36.

If σ(X) ⊄ σ(1_{X=x}, 1_{X=x′}), then Theorem 6.41 extends the results of Theorem 6.39 to a larger set of random variables taking the role of V. However, the price is to assume P-uniqueness of τ_x and τ_{x′} as well as Z-conditional mean-independence of τ_x and τ_{x′} from X, that is,

   τ_x ⊥ X | Z   and   τ_{x′} ⊥ X | Z .                                      (6.56)
Remember, under P-uniqueness of τ_x and τ_{x′}, these assumptions are equivalent to

   E(τ_x|X, Z) =_P E(τ_x|Z)   and   E(τ_{x′}|X, Z) =_P E(τ_{x′}|Z),          (6.57)

respectively. Some sufficient conditions for the equations in Proposition (6.57) will be treated in chapter 7 (see, e.g., Table 7.2), which is devoted to the Rosenbaum-Rubin causality conditions.

Under P-uniqueness of τ_x and τ_{x′}, assuming (6.57) implies unbiasedness of E^{X=x}(Y|Z) and E^{X=x′}(Y|Z), that is, it implies

   E^{X=x}(Y|Z) ⊢ D_X   and   E^{X=x′}(Y|Z) ⊢ D_X                            (6.58)

(see Exercise 6-12). Finally, note that unbiasedness of E^{X=x}(Y|Z) and E^{X=x′}(Y|Z) is equivalent to τ_x ⊥ 1_{X=x} | Z and τ_{x′} ⊥ 1_{X=x′} | Z, provided that τ_x and τ_{x′} are P-unique [see Th. 6.20 (i)].
Theorem 6.41 [Identifying a Causal V-Conditional Effect Function via PFE_{Z;xx′} III]

Let the Assumptions 6.1 (a) to (e) and (g) hold, and let V be a random variable on (Ω, 𝒜, P) satisfying σ(V) ⊂ σ(X, Z). Furthermore, assume that τ_x and τ_{x′} are P-unique and that Equations (6.57) hold. Then

   CTE_{V;xx′}(V) =_P E(PFE_{Z;xx′}(Z) | V)                                  (6.59)
                  =_P E(E^{X=x}(Y|Z) | V) − E(E^{X=x′}(Y|Z) | V) .           (6.60)

(Proof p. 192)
Remark 6.42 [Foundation for the Analysis of Causal Conditional Total Effects]  Corollary 6.37, Theorem 6.39, and Theorem 6.41 are the theoretical foundations for the analysis of causal conditional total effect functions. The crucial assumptions in Theorem 6.39 are unbiasedness of PFE_{Z;xx′} and Proposition (6.52). In contrast, the crucial assumptions in Theorem 6.41 are that τ_x and τ_{x′} are P-unique, that σ(V) ⊂ σ(X, Z), and that the Equations (6.57) hold. In the chapters to come we will study various sufficient conditions for these requirements. ⊳
6.4.3 Identification of a Causal Conditional Total Effect

The causal conditional total effect CTE_{Z;xx′}(z) has been defined by

   CTE_{Z;xx′}(z) = E(τ_x − τ_{x′}|Z=z),                                     (6.61)

assuming that τ_x and τ_{x′} are P^{Z=z}-unique [see Def. 5.38 (ii)]. Furthermore, assuming P(X=x, Z=z) > 0 and P(X=x′, Z=z) > 0, unbiasedness of the conditional prima facie effect PFE_{Z;xx′}(z) has been defined by the conjunction of

   PFE_{Z;xx′}(z) = E(τ_x − τ_{x′}|Z=z)                                      (6.62)

and P^{Z=z}-uniqueness of τ_x and τ_{x′} [see Def. 6.23 (iii)]. The assumption that PFE_{Z;xx′}(z) is unbiased comprises the assumption that τ_x and τ_{x′} are P^{Z=z}-unique. This assumption has already been explained in more detail in Remarks 5.34 and 5.36.

Equations (6.61) and (6.62) immediately imply the following corollary.
Corollary 6.43 [Identifying a Causal (Z=z)-Conditional Total Effect by PFE_{Z;xx′}(z)]

Let the Assumptions 6.1 (a) to (g) hold. Furthermore, assume that P(X=x, Z=z) > 0, P(X=x′, Z=z) > 0, and that PFE_{Z;xx′}(z) is unbiased. Then

   CTE_{Z;xx′}(z) = PFE_{Z;xx′}(z) = E^{X=x}(Y|Z=z) − E^{X=x′}(Y|Z=z) .      (6.63)

Hence, under the assumptions of Corollary 6.43, the (Z=z)-conditional causal total effect on Y comparing x to x′, that is, CTE_{Z;xx′}(z), is identical to the corresponding prima facie effect PFE_{Z;xx′}(z).
Now we consider again re-aggregation of a causal Z-conditional effect variable, assuming CTE_{Z;xx′}(Z) =_P PFE_{Z;xx′}(Z), that is, assuming unbiasedness of PFE_{Z;xx′}(Z). Theorems 6.39 and 6.41 imply the following corollary about the identification of the causal (V=v)-conditional total effect CTE_{V;xx′}(v), which is a uniquely defined number because we assume P(V=v) > 0 (see RS-Rem. 4.26).
Corollary 6.44 [Identifying a Causal (V=v)-Conditional Total Effect via PFE_{Z;xx′}]

Let the assumptions of Theorem 6.39 or Theorem 6.41 hold and assume P(V=v) > 0. Then

   CTE_{V;xx′}(v) = E(PFE_{Z;xx′}(Z) | V=v)                                  (6.64)
                  = E(E^{X=x}(Y|Z) | V=v) − E(E^{X=x′}(Y|Z) | V=v) .         (6.65)

(Proof p. 192)

Hence, under the assumptions of Corollary (6.44), the (V =v)-conditional causal total
effect on Y comparing x to x ′ , that is, CTE V ; xx ′ (v), is identical to the (V =v)-conditional
expected value of the prima facie effect variable PFE Z ; x x ′ (Z ). Note that the terms on the
right-hand sides of these equations are estimable in an appropriate sampling model. Also
note, the true outcome variables τx and τx ′ are implicitly involved in the term CTE V ; xx ′ (v)
because CTE V ; xx ′ (v) = E (τx |V =v) − E (τx ′ |V =v). In contrast, true outcome variables are
not involved in the terms on the right-hand sides of Equations (6.64) and (6.65).

Remark 6.45 [Unbiasedness of PFE_{Z;xx′}(z) vs. Unbiasedness of PFE_{V;xx′}(v)]  Again, note that even if V is Z-measurable, CTE_{V;xx′}(v) is not necessarily identical to

   PFE_{V;xx′}(v) = E^{X=x}(Y|V=v) − E^{X=x′}(Y|V=v) .

Comparing the right-hand side of this equation to the right-hand side of Equation (6.65) reveals the difference. However, CTE_{V;xx′}(v) = PFE_{V;xx′}(v) if PFE_{V;xx′}(v) is unbiased. Note that unbiasedness of PFE_{V;xx′}(v) is not implied by the assumptions of Corollary 6.44. Hence, Corollary 6.44 offers a way to identify CTE_{V;xx′}(v) even if PFE_{V;xx′}(v) is biased. The crucial assumptions are mentioned in Theorems 6.39 and 6.41. ⊳
Remark 6.46 [Some Special Cases of V]  A special case of Corollary 6.44 is V = X = 1_{X=1}.
Box 6.3  Identification of causal total effects and causal conditional effect functions

Various causal total effects and causal total effect functions can be identified ...

... by the equations                          ... under the Assumptions

ATE_{xx′} = PFE_{xx′}                         6.1 (a) to (d), (g), and PFE_{xx′} ⊢ 𝒟_C.

ATE_{xx′} = E(PFE_{Z;xx′}(Z))                 6.1 (a) to (e), (g), and PFE_{Z;xx′} ⊢ 𝒟_C.

CTE_{Z;xx′}(Z) =_P PFE_{Z;xx′}(Z)             6.1 (a) to (e), (g), and PFE_{Z;xx′} ⊢ 𝒟_C.

CTE_{Z;xx′}(z) = PFE_{Z;xx′}(z)               6.1 (a) to (g), P(X=x, Z=z) > 0, P(X=x′, Z=z) > 0, and PFE_{Z;xx′}(z) ⊢ 𝒟_C.

CTE_{V;xx′}(V) =_P E(PFE_{Z;xx′}(Z) | V)      6.1 (a) to (e), (g), PFE_{Z;xx′} ⊢ 𝒟_C, and V is a random variable on (Ω, 𝒜, P) satisfying σ(V) ⊂ σ(1_{X=x}, 1_{X=x′}, Z). An alternative set of assumptions under which this identification equation holds is: 6.1 (a) to (e), (g), Equations (6.57), and V is a random variable on (Ω, 𝒜, P) satisfying σ(V) ⊂ σ(X, Z).

CTE_{V;xx′}(v) = E(PFE_{Z;xx′}(Z) | V=v)      CTE_{V;xx′}(V) =_P E(PFE_{Z;xx′}(Z) | V) and P(V=v) > 0.

See Box 6.1 for the definitions of the unbiasedness assumptions such as PFE_{xx′} ⊢ 𝒟_C.
Hence, if the assumptions of Theorem 6.39 hold, then

   CTE_{X;10}(0) = E(PFE_{Z;10}(Z) | X=0)                                    (6.66)
                 = E(E^{X=1}(Y|Z) | X=0) − E(E^{X=0}(Y|Z) | X=0)             (6.67)

and

   CTE_{X;10}(1) = E(PFE_{Z;10}(Z) | X=1)                                    (6.68)
                 = E(E^{X=1}(Y|Z) | X=1) − E(E^{X=0}(Y|Z) | X=1) .           (6.69)
In empirical applications in which X is an indicator variable representing treatment (X=1) and control (X=0), the term CTE_{X;10}(0) is known as the causal total effect on the untreated of treatment 1 compared to treatment 0. Similarly, CTE_{X;10}(1) is known as the causal total effect on the treated of treatment 1 compared to treatment 0. Note, however, that this wording is not in line with the pre facto perspective of probability theory (see Remarks 5.47 and 5.48). Hence, it is better to replace 'on the untreated' and 'on the treated' by 'given no treatment' and 'given treatment', respectively. Also note that, in particular under self-selection to treatment, the (X=1)-conditional distributions of potential confounders of X may differ from their (X=0)-conditional distributions. In informal terms, the subjects who select treatment are likely to differ before treatment from those who do not in important attributes such as 'severity of symptoms', 'motivation for treatment', and so on, which often leads to bias of prima facie effects.
Another special case of Corollary 6.44 is V = (1_{X=1}, W), where W is a Z-measurable random variable. Hence, if the assumptions of Theorem 6.39 hold, X is a putative cause variable, and P(X=x, W=w) > 0 for x = 0, 1, then

   CTE_{XW;10}(0, w) = E(PFE_{Z;10}(Z) | X=0, W=w)                           (6.70)
                     = E(E^{X=1}(Y|Z) | X=0, W=w) − E(E^{X=0}(Y|Z) | X=0, W=w)   (6.71)

is the causal conditional total effect (comparing treatment to control) given control and the value w of W. Correspondingly,

   CTE_{XW;10}(1, w) = E(PFE_{Z;10}(Z) | X=1, W=w)                           (6.72)
                     = E(E^{X=1}(Y|Z) | X=1, W=w) − E(E^{X=0}(Y|Z) | X=1, W=w)   (6.73)

is the causal conditional total effect given treatment and the value w of W. ⊳
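To make Equations (6.66) and (6.68) concrete, the following sketch computes the causal total effects given no treatment and given treatment for the conditionally randomized experiment of Table 6.4 in section 6.5, in which PFE_{Z;10} is unbiased and Z = sex. Everything is derived from the fundamental parameters of that table; the function names are ours and merely illustrative.

```python
from fractions import Fraction as F

# Fundamental parameters of Table 6.4 (conditionally randomized assignment):
# sex, P(U=u), P(X=1 | U=u), E^{X=0}(Y | U=u), E^{X=1}(Y | U=u)
units = {
    "Tom": ("m", F(1, 6), F(3, 4), 68, 81),
    "Tim": ("m", F(1, 6), F(3, 4), 78, 86),
    "Joe": ("m", F(1, 6), F(3, 4), 88, 100),
    "Jim": ("m", F(1, 6), F(3, 4), 98, 103),
    "Ann": ("f", F(1, 6), F(1, 4), 106, 114),
    "Sue": ("f", F(1, 6), F(1, 4), 116, 130),
}

def p_joint(x, z):                      # P(X=x, Z=z)
    return sum(pU * (pX1 if x == 1 else 1 - pX1)
               for (s, pU, pX1, m0, m1) in units.values() if s == z)

def e_y(x, z):                          # E(Y | X=x, Z=z), Box 6.2 (iii)
    num = sum(pU * (pX1 if x == 1 else 1 - pX1) * (m1 if x == 1 else m0)
              for (s, pU, pX1, m0, m1) in units.values() if s == z)
    return num / p_joint(x, z)

# Z-conditional prima facie effects (unbiased in this example, see Table 6.4)
pfe_z = {z: e_y(1, z) - e_y(0, z) for z in ("m", "f")}        # 9.5 and 11

# P(Z=z | X=x) and Equations (6.66)/(6.68): CTE_{X;10}(x) = E(PFE_{Z;10}(Z) | X=x)
for x in (0, 1):
    pX = p_joint(x, "m") + p_joint(x, "f")
    cte_x = sum(pfe_z[z] * p_joint(x, z) / pX for z in ("m", "f"))
    print("CTE_X;10(%d) = %.3f" % (x, float(cte_x)))          # 10.400 and 9.714
```

Weighting the two conditional effects by P(X=0) = 5/12 and P(X=1) = 7/12 recovers the causal average total effect ATE_10 = 10 of that example.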
6.5 Three Examples

Tables 6.2 to 6.4 show parameters pertaining to fictitious random experiments such as the single-unit trials described in chapter 2. Among these parameters are the individual expectation values E(Y|X=x, U=u) given the treatment conditions and the individual treatment probabilities P(X=1|U=u). The parameters presented in the tables can be used to generate sample data that would result if the random experiments to which the tables refer were conducted n times.¹
The Regular Probabilistic Causality Space in All Examples

Remark 6.47 [The Probability Space]  For simplicity, we consider random experiments in which no fallible covariates are observed and in which there is neither a second treatment variable nor any other variable that is simultaneous to the treatment variable. In this case, the set

   Ω = Ω_1 × Ω_2 × Ω_3 = Ω_U × Ω_X × ℝ                                       (6.74)

suffices to describe the set of possible outcomes of the random experiment, where Ω_U = {Tom, Tim, Joe, Jim, Ann, Sue} and Ω_X = {treatment, control}. Furthermore, we consider the product σ-algebra 𝒜 = 𝒫(Ω_U) ⊗ 𝒫(Ω_X) ⊗ ℬ, where ℬ denotes the Borel σ-algebra on ℝ, the set of real numbers (see RS-Rem. 1.14). The probability measure P on (Ω, 𝒜) is only partly known. Looking at Table 6.2, for example, we only know the conditional expectation E(Y|X, U). In contrast, we do not know the conditional distribution P_{Y|X,U}, which would be known only if additional information were added, such as 'Y is conditionally normally distributed given (X, U)' with a specified conditional variance of Y given (X, U). However, for our purpose, the conditional distribution of Y is not relevant, because we only consider causal total effects that are defined in terms of the conditional expectation E(Y|X, U) and its values E(Y|X=x, U=u) (see ch. 5). ⊳

¹ Although the focus of this book is on theory and not on data analysis, we also provide sample data for each table on the home page of this book: www.causal-effects.de. These and other examples of this type, as well as a data sample generated by these examples, can easily be created with the PC program CausalEffectsXplorer that is also provided on www.causal-effects.de, together with an extensive help file providing the most important concepts and formulas.
Table 6.2. Self-selection to treatment

Fundamental parameters

Person u  Sex z  P(U=u)  P(X=1|U=u)  E^{X=0}(Y|U=u)  E^{X=1}(Y|U=u)  CTE_{U;10}(u)  P(U=u|X=0)  P(U=u|X=1)
Tom       m      1/6     6/7         68              81              13             1/21        6/21
Tim       m      1/6     5/7         78              86              8              2/21        5/21
Joe       m      1/6     4/7         88              100             12             3/21        4/21
Jim       m      1/6     3/7         98              103             5              4/21        3/21
Ann       f      1/6     2/7         106             114             8              5/21        2/21
Sue       f      1/6     1/7         116             130             14             6/21        1/21

                          x=0        x=1
E(τ_x):                   92.333     102.333    ATE_10 = 10
E(Y|X=x):                 100.286    94.429     PFE_10 = −5.857
E(τ_x|Z=m):               83         92.5       CTE_{Z;10}(m) = 9.5
E(Y|X=x, Z=m):            88         90.278     PFE_{Z;10}(m) = 2.278
E(τ_x|Z=f):               111        122        CTE_{Z;10}(f) = 11
E(Y|X=x, Z=f):            111.455    119.333    PFE_{Z;10}(f) = 7.879
Remark 6.48 [Filtration and Global Potential Confounder]  The filtration (ℱ_t)_{t∈T}, T = {1, 2, 3}, consists of the σ-algebras

   ℱ_1 := σ(π_1),   ℱ_2 := σ(π_1, π_2),   ℱ_3 := σ(π_1, π_2, π_3),           (6.75)

where π_1, π_2, and π_3 are the projections π_t: Ω → Ω_t, t ∈ T (see Def. 4.6). ⊳
Remark 6.49 [Random Variables]  In all examples of this section, we consider the person variable U: Ω → Ω_U with value space (Ω_U, 𝒫(Ω_U)), the treatment variable X: Ω → Ω′_X = {0, 1} with value space (Ω′_X, 𝒫(Ω′_X)), where X takes on the value 1 for treatment and 0 for control, and the real-valued outcome variable Y: Ω → ℝ with value space (ℝ, ℬ). ⊳
Remark 6.50 [Cause and Confounder σ-Algebras]  Because there are no other variables that are simultaneous to the treatment variable X, the index sets J and K used in Definition 4.6 are J = K = {1}. Hence, the cause σ-algebra is 𝒞 = σ(π_{2j}, j ∈ K) = σ(π_2) and the confounder σ-algebra is 𝒟_C = σ(π_1, π_{2j}, j ∈ J \ K) = σ(π_1).

Note that U = π_1, which is a global potential confounder of X [see Def. 4.11 (iii)]. Furthermore, U is prior in (ℱ_t)_{t∈T} to X and Y, and X is prior to Y (see sect. 3.1). ⊳
[Figure 6.1. The person variable U, the function g_1, and their composition, the true outcome variable τ_1 = g_1(U). The diagram shows U: Ω → Ω_U = {Tom, Tim, ..., Sue} and g_1: Ω_U → ℝ, with τ_1 = E^{X=1}(Y|U) = g_1(U).]
Hence, for all examples of this section, we specified the regular probabilistic causality setup

   ((Ω, 𝒜, P), (ℱ_t)_{t∈T}, 𝒞, 𝒟_C, X, Y)

[see Def. 4.11 (v)] and asserted that U is a global potential confounder of X.
True Outcome Variables and Individual Treatment Probabilities

Remark 6.51 [True Outcome Variables]  In all examples of this chapter, P(X=x, U=u) > 0 for all pairs (x, u) of values of X and U. Therefore, and because σ(U) = 𝒟_C, the two true outcome variables are

   τ_x := E^{X=x}(Y|D_X) = E^{X=x}(Y|U),   x ∈ {0, 1}.                       (6.76)

According to Equation (5.4), the true outcome variable τ_x can also be written as a function of the person variable U. More specifically, it can be written as the composition of U and the function g_x: Ω_U → ℝ defined by

   g_x(u) = E(Y|X=x, U=u),   for all u ∈ Ω_U.                                (6.77)

Note that in this definition x is a fixed value of X, and τ_x = g_x(U) is the composition of U and g_x, that is,

   ∀ ω ∈ Ω:  τ_x(ω) = g_x(U(ω)) = g_x(u),   if ω ∈ {U=u}.                    (6.78)

This implies that the values of the conditional expectation E^{X=x}(Y|U) are identical to the conditional expectation values E(Y|X=x, U=u) [see Eq. (5.3)]. Figure 6.1 illustrates these equations for treatment x=1. ⊳
Remark 6.52 [Individual Treatment Probabilities]  In all examples of this chapter,

   P(X=1|D_X) = P(X=1|U).                                                    (6.79)

Note that, by definition, P(X=1|D_X) = E(X|D_X) and P(X=1|U) = E(X|U) (see RS-Rem. 4.12), provided that X is dichotomous with values 0 and 1. In all examples of this chapter, the D_X-conditional treatment probability is uniquely defined and identical to the U-conditional treatment probability P(X=1|U), whose values are denoted by P(X=1|U=u) and are also called the individual treatment probabilities. ⊳
Description of the Examples

In the first example (see Table 6.2), the individual treatment probabilities are different for each and every person, and they strongly depend on the individual expectation values of the outcomes under control, that is, they depend on the true outcome variable τ_0 and also on τ_1. The conditional expectation values E(Y|X=1) and E(Y|X=0) are biased. In fact, the prima facie effect PFE_10 is negative, whereas the causal average total effect ATE_10 is positive. Furthermore, the conditional expectation values E(Y|X=1, Z=z) and E(Y|X=0, Z=z) are biased as well. Although the causal (Z=z)-conditional total effects and the causal average total effect are defined (and can be computed from the parameters displayed in the upper left part of the table), they cannot be estimated from empirically estimable parameters such as the conditional expectation values E(Y|X=1) and E(Y|X=0) or E(Y|X=1, Z=z) and E(Y|X=0, Z=z).
In the second example (see Table 6.3), the treatment probabilities are identical for all persons, implying that X and U are independent, which has many implications that are studied in detail in chapter 8. Among these implications are that the conditional expectation E(Y|X) and its values E(Y|X=x) as well as the conditional expectation E(Y|X, Z) and its values E(Y|X=x, Z=z) are unbiased.

In the third example (see Table 6.4), the treatment probabilities differ between males and females. Furthermore, males (m) and females (f) also differ in their conditional expectation values of the true outcome variable τ_0, that is, E(τ_0|Z=m) ≠ E(τ_0|Z=f). However, given the value m or f of Z, the treatment probabilities do not differ from each other. This implies that X and U are Z-conditionally independent. In this example, in which P(X=x, U=u) > 0 for all pairs of values of X and U, implying that the true outcome variables τ_0 and τ_1 are P-unique (see Rem. 5.17), we can conclude that E(Y|X, Z) is unbiased (see Th. 8.31). Hence, in this third example, the conditional expectation values E(Y|X=x) are biased, whereas the conditional expectation values E(Y|X=x, Z=z) are unbiased. Again, this case is studied extensively in chapter 8.
Tables 6.2, 6.3, and 6.4 display the true outcomes and the individual treatment probabilities. According to Equation (6.76), the values of τ_x are also the individual conditional expectation values E(Y|X=x, U=u), and, according to Equation (6.79), the values of the conditional probability P(X=1|D_X) = P(X=1|U) are identical to the individual treatment probabilities P(X=1|U=u). The tables also display the values of Z := sex, which is a covariate of X because it is U-measurable.

Looking at the first four numerical columns, the three tables differ only in the treatment probabilities P(X=1|U=u). All other entries in these first four columns, such as the true outcomes, are the same. However, if we look at the other parameters, the three tables differ in important aspects.
(X=x)-Conditional Expectation Values

We start by computing the (X=x)-conditional expectation values of Y and check whether or not they are unbiased. In the first example (see Table 6.2), E(Y|X=0) = 100.286, whereas E(τ_0) = 92.333; that is, E(Y|X=0) is much larger than E(τ_0). In contrast, E(Y|X=1) = 94.429, whereas E(τ_1) = 102.333; that is, E(Y|X=1) is much smaller than E(τ_1). Hence, according to Definition 6.3 (i), the conditional expectation values E(Y|X=x) are biased, and according to Definition 6.3 (ii) this is also true for the conditional expectation E(Y|X).
Table 6.3. Randomized assignment of the person to a treatment

Fundamental parameters

Person u  Sex z  P(U=u)  P(X=1|U=u)  E^{X=0}(Y|U=u)  E^{X=1}(Y|U=u)  CTE_{U;10}(u)  P(U=u|X=0)  P(U=u|X=1)
Tom       m      1/6     3/4         68              81              13             1/6         1/6
Tim       m      1/6     3/4         78              86              8              1/6         1/6
Joe       m      1/6     3/4         88              100             12             1/6         1/6
Jim       m      1/6     3/4         98              103             5              1/6         1/6
Ann       f      1/6     3/4         106             114             8              1/6         1/6
Sue       f      1/6     3/4         116             130             14             1/6         1/6

                          x=0        x=1
E(τ_x):                   92.333     102.333    ATE_10 = 10
E(Y|X=x):                 92.333     102.333    PFE_10 = 10
E(τ_x|Z=m):               83         92.5       CTE_{Z;10}(m) = 9.5
E(Y|X=x, Z=m):            83         92.5       PFE_{Z;10}(m) = 9.5
E(τ_x|Z=f):               111        122        CTE_{Z;10}(f) = 11
E(Y|X=x, Z=f):            111        122        PFE_{Z;10}(f) = 11
These expectations and conditional expectations are easy to compute from the parameters displayed in Table 6.2. The expectation E(τ_x) of the true outcome variable τ_x is obtained using the unconditional probabilities P(U=u) as weights [see Box 6.2 (ii)]. In contrast, the corresponding conditional expectation values E(Y|X=x) are identical to the (X=x)-conditional expectation values E(τ_x|X=x) of the true outcome variables, using as weights the conditional probabilities P(U=u|X=x) [see Box 6.2 (i)].
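The following sketch reproduces these computations for Table 6.2, again using Box 6.2 (i) and (ii) with exact fractions; the code structure and names are ours, not part of the theory.

```python
from fractions import Fraction as F

# Fundamental parameters of Table 6.2 (self-selection):
# P(U=u), P(X=1 | U=u), E^{X=0}(Y | U=u), E^{X=1}(Y | U=u)
units = {
    "Tom": (F(1, 6), F(6, 7), 68, 81),
    "Tim": (F(1, 6), F(5, 7), 78, 86),
    "Joe": (F(1, 6), F(4, 7), 88, 100),
    "Jim": (F(1, 6), F(3, 7), 98, 103),
    "Ann": (F(1, 6), F(2, 7), 106, 114),
    "Sue": (F(1, 6), F(1, 7), 116, 130),
}

def mu(x, row):                      # E(Y | X=x, U=u)
    return row[3] if x == 1 else row[2]

def p_x_given_u(x, row):             # P(X=x | U=u)
    return row[1] if x == 1 else 1 - row[1]

# Box 6.2 (ii): E(tau_x), weighting with P(U=u)
E_tau = {x: sum(mu(x, r) * r[0] for r in units.values()) for x in (0, 1)}

# Box 6.2 (i): E(Y | X=x), weighting with P(U=u | X=x)
pX = {x: sum(p_x_given_u(x, r) * r[0] for r in units.values()) for x in (0, 1)}
E_Y_given_X = {x: sum(mu(x, r) * p_x_given_u(x, r) * r[0] / pX[x]
                      for r in units.values()) for x in (0, 1)}

print("E(tau_0), E(tau_1):   %.3f  %.3f" % (float(E_tau[0]), float(E_tau[1])))
print("E(Y|X=0), E(Y|X=1):   %.3f  %.3f" % (float(E_Y_given_X[0]), float(E_Y_given_X[1])))
print("ATE_10 = %.2f,  PFE_10 = %.3f" % (float(E_tau[1] - E_tau[0]),
                                         float(E_Y_given_X[1] - E_Y_given_X[0])))
```

Running it prints 92.333, 102.333, 100.286, 94.429, ATE_10 = 10, and PFE_10 = −5.857, that is, exactly the sign reversal discussed next.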
If used for the evaluation of the total treatment effect, the (X=x)-conditional expectation values would lead to completely wrong conclusions. First, the direction of the prima facie effect

   PFE_10 = E(Y|X=1) − E(Y|X=0) = −5.857

is reversed if compared to

   ATE_10 = E(CTE_{U;10}(U)) = E(τ_1) − E(τ_0) = 10.

And second, it is also reversed if compared to each and every individual total effect [see the column CTE_{U;10}(u) in Table 6.2]. All individual total effects are positive in this example, ranging between 5 and 14. The bias in this example is due to strong inter-individual differences in the true outcomes and to the fact that the individual treatment probabilities P(X=1|U=u) heavily depend on the true outcome variables and, therefore, on the person variable U. For instance, Tom has a true outcome under control of E^{X=0}(Y|U=Tom) = 68 and a treatment probability of 6/7, while Sue has a true outcome under control of E^{X=0}(Y|U=Sue) = 116 and a treatment probability of 1/7. Such a constellation is to be expected under self-selection of subjects to treatments, if the subjects base their decision to take treatment on the severity of their dysfunction before treatment and if the severity of their dysfunction after treatment is assessed as the outcome variable.
Table 6.4. Conditionally randomized assignment of the person to a treatment

Fundamental parameters

Person u  Sex z  P(U=u)  P(X=1|U=u)  E^{X=0}(Y|U=u)  E^{X=1}(Y|U=u)  CTE_{U;10}(u)  P(U=u|X=0)  P(U=u|X=1)
Tom       m      1/6     3/4         68              81              13             1/10        3/14
Tim       m      1/6     3/4         78              86              8              1/10        3/14
Joe       m      1/6     3/4         88              100             12             1/10        3/14
Jim       m      1/6     3/4         98              103             5              1/10        3/14
Ann       f      1/6     1/4         106             114             8              3/10        1/14
Sue       f      1/6     1/4         116             130             14             3/10        1/14

                          x=0        x=1
E(τ_x):                   92.333     102.333    ATE_10 = 10
E(Y|X=x):                 99.8       96.714     PFE_10 = −3.086
E(τ_x|Z=m):               83         92.5       CTE_{Z;10}(m) = 9.5
E(Y|X=x, Z=m):            83         92.5       PFE_{Z;10}(m) = 9.5
E(τ_x|Z=f):               111        122        CTE_{Z;10}(f) = 11
E(Y|X=x, Z=f):            111        122        PFE_{Z;10}(f) = 11
This example once again drastically demonstrates the necessity of distinguishing between a difference E(Y|X=1) − E(Y|X=0) of conditional expectation values and the causal average total effect E(τ_1) − E(τ_0). Obviously, only the latter is of interest if we want to evaluate the treatment.

For the second example, presented in Table 6.3, the situation is completely different. Although the true outcome variables are the same as in Table 6.2, here the conditional expectation values E(Y|X=x) and the expectations E(τ_x) of the true outcome variables τ_x = E^{X=x}(Y|U) are identical to each other, and this applies to both values x=0 and x=1 of X. Hence, in this example, the conditional expectation values E(Y|X=x) are unbiased and can be used for the evaluation of the treatment effect. This is due to the fact that the individual treatment probabilities do not depend on the persons. This constellation occurs in a perfect randomized experiment, in which the experimenter decides that each person is in treatment 1 with probability P(X=1) and in treatment 0 with probability 1 − P(X=1), provided, of course, that the persons comply with the experimenter's decisions. In our second example, P(X=1) = 3/4. Note, however, that P(X=1) could be any number strictly between 0 and 1. The only important point is that the individual treatment probabilities do not differ between persons, that is, P(X=1|U=u) = P(X=1) for all persons u ∈ Ω_U. Such a randomized assignment may be performed by drawing a ball from an urn with three black balls and one white ball, adopting the rule that the subject is treated if a black ball is drawn.
In the third example (see Table 6.4), the conditional expectation values E(Y|X=x) are biased again. Here, E(Y|X=0) = 99.8, whereas E(τ_0) = 92.333. Again, E(Y|X=0) is much larger than E(τ_0). In contrast, E(Y|X=1) = 96.714, whereas E(τ_1) = 102.333; that is, again E(Y|X=1) is much smaller than E(τ_1). Hence, in this example, the conditional expectation values E(Y|X=x) are strongly biased as well. However, in contrast to the first example, the conditional expectation values E(Y|X=x, Z=z) are unbiased. In this example, the treatment probability is 3/4 for all male units, while it is 1/4 for all female units. The crucial point is that these probabilities are identical given the value m or the value f of the covariate Z, that is, P(X=1|Z=z, U=u) = P(X=1|Z=z) for each person u and both values z of the covariate Z. This constellation holds in a perfect conditionally randomized experiment in which we assign the sampled person to treatment with probability P(X=1|Z=m) if the person is male and with probability P(X=1|Z=f) if the sampled person is female.
(X=x, Z=z)-Conditional Expectation Values

The conditional expectation values E(Y|X=x, Z=z) can be computed from the parameters displayed in Table 6.4 by applying Equation (iii) of Box 6.2. For this purpose we also need the formula

   P(U=u|X=x, Z=z) = [P(X=x|U=u) · P(U=u, Z=z)] / [P(X=x|Z=z) · P(Z=z)],     (6.80)

where

   P(X=x|Z=z) = [Σ_u P(X=x|U=u) · P(U=u, Z=z)] / P(Z=z)                      (6.81)

(see Exercise 6-13). Note that in the three examples P(X=x|U=u, Z=z) = P(X=x|U=u), because in these examples Z is U-measurable. Intuitively speaking, this means that Z (sex) does not contain any information that is not already contained in U (the person variable). All terms on the right-hand side of Equation (6.80) are displayed in Table 6.4 or can be computed from the parameters displayed in this table.²
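A minimal Python sketch of Equations (6.80), (6.81), and Box 6.2 (iii), applied to the parameters of Table 6.4, is given below; the helper functions are ours and not part of the theory.

```python
from fractions import Fraction as F

# Fundamental parameters of Table 6.4: sex, P(U=u), P(X=1 | U=u), E^{X=0}, E^{X=1}
units = {
    "Tom": ("m", F(1, 6), F(3, 4), 68, 81),
    "Tim": ("m", F(1, 6), F(3, 4), 78, 86),
    "Joe": ("m", F(1, 6), F(3, 4), 88, 100),
    "Jim": ("m", F(1, 6), F(3, 4), 98, 103),
    "Ann": ("f", F(1, 6), F(1, 4), 106, 114),
    "Sue": ("f", F(1, 6), F(1, 4), 116, 130),
}

def p_x_u(x, u):                              # P(X=x | U=u)
    p1 = units[u][2]
    return p1 if x == 1 else 1 - p1

def p_uz(u, z):                               # P(U=u, Z=z); Z is U-measurable
    return units[u][1] if units[u][0] == z else F(0)

def p_z(z):                                   # P(Z=z)
    return sum(p_uz(u, z) for u in units)

def p_x_z(x, z):                              # Equation (6.81)
    return sum(p_x_u(x, u) * p_uz(u, z) for u in units) / p_z(z)

def p_u_xz(u, x, z):                          # Equation (6.80)
    return p_x_u(x, u) * p_uz(u, z) / (p_x_z(x, z) * p_z(z))

def e_y_xz(x, z):                             # Box 6.2 (iii)
    return sum((units[u][4] if x == 1 else units[u][3]) * p_u_xz(u, x, z)
               for u in units)

for z in ("m", "f"):
    for x in (0, 1):
        print("E(Y | X=%d, Z=%s) = %.1f" % (x, z, float(e_y_xz(x, z))))
# prints 83.0, 92.5, 111.0, 122.0
```

Because the treatment probabilities are constant within each value of Z, the resulting values coincide with E(τ_x|Z=z) of Table 6.4, reflecting the unbiasedness of E(Y|X, Z) in this example.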
Remark 6.53 [How Realistic Are These Examples?]  In empirical applications, assuming D_X = U is correct if (a) there is neither a second treatment variable nor another variable that is simultaneous to X and (b) no fallible covariate is observed. In this case, u signifies the observational unit at the onset of treatment. If, however, a fallible covariate of X is observed and u represents the observational unit at the time at which the covariate is assessed, then there may very well be covariates that are not measurable with respect to U, which affect the outcome variable Y and/or the treatment probability (see section 2.2). Hence, in this case, D_X = U would not hold. Instead, a global potential confounder would be the multivariate random variable (U, Z), where Z = (Z_1, ..., Z_m) consists of the fallible covariates Z_i, i = 1, ..., m, to be assessed before treatment. That is, in this case D_X = (U, Z) is a global potential confounder of X [see Def. 4.11 (iii)]. ⊳

² An alternative is using the CausalEffectsXplorer provided at www.causal-effects.de, the home page of this book.
Conditional Total Effects

Comparing the conditional prima facie effects to the causal conditional total effects reveals that the conditional prima facie effects are still biased with respect to total effects in the random experiment presented in Table 6.2, but not in the examples displayed in Tables 6.3 and 6.4. Hence, in the first example, the conditional prima facie effect and the causal conditional total effect for males are not identical, while they are identical in the second and third examples, and the same applies to the corresponding conditional prima facie effects for females.

The bias of the conditional prima facie effects in the example presented in Table 6.2 is no surprise, because there are still individual differences within the two sets of males and females with respect to (a) the true outcomes under treatment and under control, as well as (b) the individual treatment probabilities P(X=1|U=u). In contrast, in the second and third examples, the individual treatment probabilities are all the same within each of the two sets of males and females.
Average Total Effect

In all three examples, the average over the individual total effects is equal to the causal average total effect. [Remember, the causal average total effect is defined as the expectation of the true total effect variable CTE_{D_X;10}(D_X) = δ_10 (see Def. 5.26), and in these examples U = D_X.] However, only in the second and third examples is the expectation of the Z-conditional prima facie effects equal to the causal average total effect. Because this is no coincidence, this fact can be used for causal inference even in those cases in which the unconditional prima facie effects are biased, provided that the conditional prima facie effects are unbiased, that is, provided that PFE_{Z;10}(z) = E(δ_10|Z=z) for each value z of the covariate Z (see Th. 6.34).
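A quick computational check of this statement uses the Z-conditional prima facie effects reported in Tables 6.2 and 6.4 together with P(Z=m) = 2/3 and P(Z=f) = 1/3. The exact fractions below are our back-computed values of the rounded table entries 2.278, 7.879, 9.5, and 11; the function name is ours.

```python
from fractions import Fraction as F

p_Z = {"m": F(2, 3), "f": F(1, 3)}

def expected_conditional_pfe(pfe_z):
    """E(PFE_{Z;10}(Z)) = sum_z PFE_{Z;10}(z) * P(Z=z)  (cf. Theorem 6.34)."""
    return sum(pfe_z[z] * p_Z[z] for z in p_Z)

# Table 6.2 (self-selection): the conditional prima facie effects are still biased.
pfe_table_6_2 = {"m": F(41, 18), "f": F(260, 33)}     # 2.278 and 7.879
# Table 6.4 (conditional randomization): the conditional prima facie effects are unbiased.
pfe_table_6_4 = {"m": F(19, 2), "f": 11}              # 9.5 and 11

print(float(expected_conditional_pfe(pfe_table_6_2)))  # 4.145, not the ATE_10 = 10
print(float(expected_conditional_pfe(pfe_table_6_4)))  # 10.0, equal to ATE_10 = 10
```

Only in the unbiased case does the expectation of the conditional prima facie effects recover the causal average total effect.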
Whether or not a causal average total effect is meaningful if there are different causal conditional total effects (some of which may even be negative while others are positive) requires judgment regarding content in the specific application considered. In some applications it might be meaningful, in others it might not. Clearly, causal conditional total effects give more specific information than the causal average total effect. However, causal average total effects also have advantages. First, they provide a summary evaluation of a treatment in a single number, and different treatments may be compared to each other with respect to this number. Second, in samples of limited size, causal average effects can be estimated with more accuracy than the plenitude of causal conditional effects. And third, one should keep in mind that even causal conditional total effects are only (conditional) average total effects (see, e.g., Table 6.4). Hence, it is always a matter of case-specific judgment how fine-grained the analysis should be.
First Conclusions

The three examples show that conditioning on a covariate of X does not necessarily yield unbiasedness given the values of the covariate. While there is no bias at all in the second example, the third example shows that conditioning may remove bias. Comparing Tables 6.3 and 6.4 to each other shows that unbiasedness of the conditional expectation values E(Y|X=x, Z=z) relies on specific conditions. In these two tables, P(X=1|U) = P(X=1|Z), that is, in these two tables there are equal individual treatment probabilities for all units with an identical value z of Z. Such conditions implying unbiasedness of E(Y|X) or of E(Y|X, Z) and their values are called causality conditions. Note, however, that there are several such causality conditions that do not involve the U-conditional treatment probabilities (see ch. 9).
6.6 An Example With Accidental Unbiasedness

Now we treat an example demonstrating that there can be unbiasedness of the conditional expectation E(Y|X) and at the same time bias of the conditional expectation E(Y|X, Z), where Z is a covariate of X. This example shows that unbiasedness can be accidental, that is, there are cases in which unbiasedness is not a logical consequence of experimental design but an 'accident of numbers'. In chapter 8 we show that the experimental design technique of randomized assignment always induces unbiasedness of the conditional expectations E(Y|X) and E(Y|X, Z), whenever Z is a covariate of X, that is, whenever Z is measurable with respect to D_X.³

Table 6.5 displays the relevant parameters. We assume that it is a simple experiment having the same structure as the examples treated in section 6.5. That is,

   ((Ω, 𝒜, P), (ℱ_t)_{t∈T}, 𝒞, 𝒟_C, X, Y),

as specified in section 6.5, is the regular probabilistic causality setup. The only difference is that Ω_1 = Ω_U = {Joe, Jim, Ann, Sue} now consists of four (instead of six) persons. Again, D_X = U is a global potential confounder of X.
Conditional Expectation Values E(Y|X=x)

In this specific example, the causal individual total effects are the same for all persons, namely 5, implying that the causal average total effect is also 5. The prima facie effect can be computed from the difference between the two conditional expectation values E(Y|X=0) and E(Y|X=1). In this example, Box 6.2 (i) yields

   E(Y|X=0) = Σ_u E(Y|X=0, U=u) · P(U=u|X=0)
            = 95 · 3/16 + 65 · 1/16 + 80 · 7/16 + 50 · 5/16 = 72.5

and

   E(Y|X=1) = Σ_u E(Y|X=1, U=u) · P(U=u|X=1)
            = 100 · 5/16 + 70 · 7/16 + 85 · 1/16 + 55 · 3/16 = 77.5.

Using Equation (ii) of Box 6.2, the corresponding expectations of the true outcome variables are

   E(τ_0) = Σ_u E(Y|X=0, U=u) · P(U=u)
          = 95 · 1/4 + 65 · 1/4 + 80 · 1/4 + 50 · 1/4 = 72.5

and

   E(τ_1) = Σ_u E(Y|X=1, U=u) · P(U=u)
          = 100 · 1/4 + 70 · 1/4 + 85 · 1/4 + 55 · 1/4 = 77.5.
³ Note again that unbiasedness does not refer to a sample and that there is no (successful) randomized assignment if there is systematic attrition, that is, if persons do not comply with the assignments of the experimenter.
Table 6.5. Accidental unbiasedness

Fundamental parameters

Person u  Sex z  P(U=u)  P(X=1|U=u)  E^{X=0}(Y|U=u)  E^{X=1}(Y|U=u)  CTE_{U;10}(u)  P(U=u|X=0)  P(U=u|X=1)
Joe       m      1/4     5/8         95              100             5              3/16        5/16
Jim       m      1/4     7/8         65              70              5              1/16        7/16
Ann       f      1/4     1/8         80              85              5              7/16        1/16
Sue       f      1/4     3/8         50              55              5              5/16        3/16

                          x=0        x=1
E(τ_x):                   72.5       77.5       ATE_10 = 5
E(Y|X=x):                 72.5       77.5       PFE_10 = 5
E(τ_x|Z=m):               80         85         CTE_{Z;10}(m) = 5
E(Y|X=x, Z=m):            87.5       82.5       PFE_{Z;10}(m) = −5
E(τ_x|Z=f):               65         70         CTE_{Z;10}(f) = 5
E(Y|X=x, Z=f):            67.5       62.5       PFE_{Z;10}(f) = −5
Hence, the conditional expectation values E(Y|X=0) and E(Y|X=1) are unbiased, because they are identical to the corresponding expectations E(τ_0) and E(τ_1) of the true outcome variables and because the two true outcome variables are P-unique.
Conditional Expectation Values E(Y|X=x, Z=z)

The conditional expectation values E(Y|X=1, Z=z) and E(Y|X=0, Z=z) can be computed from the parameters displayed in Table 6.5 using Equation (iii) of Box 6.2. This equation holds because, in this example, the random variable Z is measurable with respect to U (see RS-Cor. 2.36). While the individual expected outcomes E(Y|X=x, U=u) are displayed in Table 6.5, the conditional probabilities P(U=u|X=x, Z=z) have to be computed via Equation (6.80) (see Exercise 6-13).

For Z=m (males), Equation (iii) of Box 6.2 yields
   E(Y|X=0, Z=m) = Σ_u E(Y|X=0, U=u) · P(U=u|X=0, Z=m)
                 = 95 · 9/12 + 65 · 3/12 + 80 · 0 + 50 · 0 = 87.5

and

   E(Y|X=1, Z=m) = Σ_u E(Y|X=1, U=u) · P(U=u|X=1, Z=m)
                 = 100 · 5/12 + 70 · 7/12 + 85 · 0 + 55 · 0 = 82.5.

In contrast, using Equation (iv) of Box 6.2, the (Z=m)-conditional expectation values of the true outcome variables are

   E(τ_0|Z=m) = Σ_u E(Y|X=0, U=u) · P(U=u|Z=m)
              = 95 · 1/2 + 65 · 1/2 + 80 · 0 + 50 · 0 = 80

and

   E(τ_1|Z=m) = Σ_u E(Y|X=1, U=u) · P(U=u|Z=m)
              = 100 · 1/2 + 70 · 1/2 + 85 · 0 + 55 · 0 = 85.

For Z=f (females), Equation (iii) of Box 6.2 yields

   E(Y|X=0, Z=f) = Σ_u E(Y|X=0, U=u) · P(U=u|X=0, Z=f)
                 = 95 · 0 + 65 · 0 + 80 · 7/12 + 50 · 5/12 = 67.5

and

   E(Y|X=1, Z=f) = Σ_u E(Y|X=1, U=u) · P(U=u|X=1, Z=f)
                 = 100 · 0 + 70 · 0 + 85 · 3/12 + 55 · 9/12 = 62.5.

In contrast, using Equation (iv) of Box 6.2, the (Z=f)-conditional expectation values of the true outcome variables are

   E(τ_0|Z=f) = Σ_u E(Y|X=0, U=u) · P(U=u|Z=f)
              = 95 · 0 + 65 · 0 + 80 · 1/2 + 50 · 1/2 = 65

and

   E(τ_1|Z=f) = Σ_u E(Y|X=1, U=u) · P(U=u|Z=f)
              = 100 · 0 + 70 · 0 + 85 · 1/2 + 55 · 1/2 = 70.

Obviously, the conditional expectation values E(Y|X=x, Z=z) of the outcome variable Y are not identical to the corresponding conditional expectation values E(τ_x|Z=z) of the true outcome variables. Hence, the E(Y|X=x, Z=z) are biased, although the conditional expectation values E(Y|X=x) are unbiased.
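The complete comparison for this example can be generated with a few lines of Python; the helper functions below simply implement Equations (i) to (iv) of Box 6.2 for the parameters of Table 6.5 and are ours, not part of the theory.

```python
from fractions import Fraction as F

# Fundamental parameters of Table 6.5: sex, P(U=u), P(X=1 | U=u), E^{X=0}, E^{X=1}
units = {
    "Joe": ("m", F(1, 4), F(5, 8), 95, 100),
    "Jim": ("m", F(1, 4), F(7, 8), 65, 70),
    "Ann": ("f", F(1, 4), F(1, 8), 80, 85),
    "Sue": ("f", F(1, 4), F(3, 8), 50, 55),
}

def mu(u, x):      return units[u][4] if x == 1 else units[u][3]
def p_u(u):        return units[u][1]
def p_x_u(x, u):   return units[u][2] if x == 1 else 1 - units[u][2]
def in_z(u, z):    return units[u][0] == z

def E_Y(x, z=None):        # E(Y | X=x) or E(Y | X=x, Z=z), Box 6.2 (i)/(iii)
    sel = [u for u in units if z is None or in_z(u, z)]
    denom = sum(p_x_u(x, u) * p_u(u) for u in sel)
    return sum(mu(u, x) * p_x_u(x, u) * p_u(u) for u in sel) / denom

def E_tau(x, z=None):      # E(tau_x) or E(tau_x | Z=z), Box 6.2 (ii)/(iv)
    sel = [u for u in units if z is None or in_z(u, z)]
    denom = sum(p_u(u) for u in sel)
    return sum(mu(u, x) * p_u(u) for u in sel) / denom

for z in (None, "m", "f"):
    for x in (0, 1):
        print("z=%s x=%d:  E(Y|...) = %5.1f   E(tau|...) = %5.1f"
              % (z, x, float(E_Y(x, z)), float(E_tau(x, z))))
# Unconditionally the two columns agree (72.5 and 77.5: accidental unbiasedness),
# but given Z=m or Z=f they do not (87.5 vs. 80, 82.5 vs. 85, 67.5 vs. 65, 62.5 vs. 70).
```

The printout reproduces all the values derived above in a single pass.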
Remark 6.54 [Methodological Implications]  This example shows that unbiasedness of the conditional expectation values E(Y|X=x) does not imply unbiasedness of the conditional expectation values E(Y|X=x, Z=z), even if Z is a covariate (or potential confounder) of X. Hence, this example shows that unbiasedness can be accidental, that is, it may be a fortunate coincidence, an 'accident of numbers', not a logical consequence of experimental design. In chapter 8, however, we will show that experimental design techniques such as randomized assignment of the unit to one of the treatment conditions always lead to unbiasedness of the conditional expectation values E(Y|X=x) and E(Y|X=x, Z=z), if Z is a covariate (or potential confounder) of X. Note, however, that this beneficial implication of randomized assignment only applies to unbiasedness with respect to total effects. Unfortunately, it does not apply to unbiasedness with respect to direct effects (see, e.g., Mayer et al., 2014). ⊳
6.7 Summary and Conclusions

In this chapter we introduced the concepts of unbiasedness of conditional expectations such as E(Y|X), E(Y|X, Z), and E^{X=x}(Y|Z). We also treated unbiasedness of the prima facie effects E(Y|X=x) − E(Y|X=x′) and E(Y|X=x, Z=z) − E(Y|X=x′, Z=z), and of the conditional prima facie effect variables E^{X=x}(Y|Z) − E^{X=x′}(Y|Z). In these expressions, X is the putative cause variable, Y the outcome variable, and Z a covariate of X.
Unbiasedness

Unbiasedness is a first kind of causality condition, which, together with the additional structural components listed in a regular probabilistic causality space, distinguishes a conditional expectation that has a causal meaning from an ordinary conditional expectation. Several kinds of conditional expectations and their values, as well as their differences, can be unbiased (see Box 6.1). The general insight of this chapter is that comparing conditional expectation values (true means) does not allow us to draw any conclusions about the effects of a treatment or intervention unless they are unbiased. In terms of the metaphor discussed in the preface, conditional expectation values and their differences are like the shadow of the invisible man. The length of this shadow is identical to the height of the invisible man only under very specific conditions concerning, in particular, the angle between the sun and the surface of the earth at the point on which the man stands.
Identification

The unbiasedness conditions are the weakest assumptions under which we can identify causal average total effects as well as causal conditional total effects and effect functions. Box 6.3 summarizes the identification equations. Note that the right-hand sides of these equations are empirically estimable parameters or empirically estimable functions that can be computed from the conditional expectations E(Y|X) or E(Y|X, Z). That is, only the putative cause variable X, the outcome variable Y, and the random variables Z and V are involved. All other causality conditions that will be treated in the chapters to come imply unbiasedness, provided that true-outcome theory applies. Some of these other causality conditions are empirically testable, at least in the sense of falsifiability. Unfortunately, this does not apply to unbiasedness itself.
Limitations

Hence, a first limitation of unbiasedness is that it cannot be tested empirically. Another drawback of unbiasedness has been exemplified by the numerical example displayed in Table 6.5. This example shows that, even if the conditional expectation values E(Y|X=x) are unbiased and Z is a covariate of X, the conditional expectation values E(Y|X=x, Z=z) and the (Z=z)-conditional prima facie effects can be biased (see also Greenland & Robins, 1986). In contrast, the sufficient conditions for unbiasedness treated in chapters 8 and 9 are less volatile, that is, they generalize to conditioning on a covariate Z of X. Generalizability and falsifiability are two important virtues of these alternative causality conditions.
6.8 Proofs

Proof of Theorem 6.9

In the proofs of all four equations, we assume that τ_x is P-unique.

Equation (6.6).

   E(Y|X=x) ⊢ D_X
   ⇔ E(Y|X=x) = E(τ_x)                                       [Def. 6.3 (i)]
   ⇔ E^{X=x}(Y) = E(τ_x)                                     [(6.3)]
   ⇔ E^{X=x}(E^{X=x}(Y|D_X)) = E(τ_x)                        [RS-Box 4.1 (iv)]
   ⇔ E^{X=x}(τ_x) = E(τ_x).                                  [(5.1)]

Equation (6.7).

   E(Y|X=x) ⊢ D_X
   ⇔ E^{X=x}(τ_x) = E(τ_x)                                   [(6.6)]
   ⇔ E(τ_x|X=x) = E(τ_x)                                     [(6.3)]
   ⇔ τ_x ⊥ 1_{X=x}                                           [RS-Th. 4.38 (ii)]
   ⇔ E(τ_x|1_{X=x}) =_P E(τ_x).                              [RS-(4.35)]

Equation (6.9).

   E(Y|X=x) ⊢ D_X
   ⇔ E^{X=x}(τ_x) = E(τ_x)                                                   [(6.6)]
   ⇔ E^{X=x}(ε_x + E(τ_x|1_{X=x})) = E(ε_x + E(τ_x|1_{X=x}))                 [(6.8)]
   ⇔ E^{X=x}(ε_x) + E^{X=x}(E(τ_x|1_{X=x})) = E(ε_x) + E(E(τ_x|1_{X=x}))     [RS-Box 3.1 (vii)]
   ⇔ E^{X=x}(ε_x) + E^{X=x}(E(τ_x)) = E(ε_x) + E(τ_x)                        [(6.7), RS-Box 4.1 (iv)]
   ⇔ E^{X=x}(ε_x) + E(τ_x) = E(ε_x) + E(τ_x)                                 [RS-Box 3.1 (i)]
   ⇔ E^{X=x}(ε_x) = E(ε_x).

Equation (6.10).

   E(Y|X=x) ⊢ D_X
   ⇔ E^{X=x}(ε_x) = E(ε_x)                                   [(6.9)]
   ⇔ E(ε_x|X=x) = E(ε_x)                                     [(6.3)]
   ⇔ ε_x ⊥ 1_{X=x}                                           [RS-Th. 4.38 (ii)]
   ⇔ E(ε_x|1_{X=x}) =_P E(ε_x).                              [RS-(4.35)]
Proof of Theorem 6.16

Let g_x denote the factorization of E^{X=x}(Y|Z) [see RS-Eq. (5.23)] and g denote the factorization of E(τ_x|Z) [see RS-Eq. (4.14)]. Furthermore, note that P(X=x, Z=z) > 0 implies P(Z=z) > 0.

   E^{X=x}(Y|Z) ⊢ D_X
   ⇔ E^{X=x}(Y|Z) =_P E(τ_x|Z)                               [Def. 6.13 (i)]
   ⇔ g_x(Z) =_P g(Z)                                         [RS-(4.14), RS-(5.23)]
   ⇒ g_x(z) = g(z)                                           [P(Z=z) > 0, RS-(2.68)]
   ⇔ E^{X=x}(Y|Z=z) = E(τ_x|Z=z)                             [RS-(5.24), RS-(4.17)]
   ⇔ E^{X=x}(Y|Z=z) ⊢ D_X                                    [Def. 6.13 (ii)]
   ⇔ E(Y|X=x, Z=z) ⊢ D_X.                                    [(6.20)]
Proof of Theorem 6.17

Again, let g x denote the factorization of E X =x (Y |Z ) [see RS-Eq. (5.23)] and g denote the
factorization of E (τx |Z ) [see RS-Eq. (4.14)].

   ∀ z ∈ Z (Ω): E X =x (Y |Z =z) ⊢ D X
⇔ ∀ z ∈ Z (Ω): E X =x (Y |Z =z) = E (τx |Z =z)             [Def. 6.13 (ii)]
⇔ ∀ z ∈ Z (Ω): g x (z) = g (z)                             [RS-(5.24), RS-(4.17)]
⇒ g x (Z ) =_P g (Z )                                      [RS-(2.69)]
⇒ E X =x (Y |Z ) =_P E (τx |Z )                            [RS-(5.23), RS-(4.14)]
⇔ E X =x (Y |Z ) ⊢ D X .                                   [Def. 6.13 (i)]

Proof of Theorem 6.19


   ∀ x ∈ X (Ω): E Z=z (Y |X =x ) = E Z=z (τx )
⇔ ∀ x ∈ X (Ω): E X =x (Y |Z =z) = E (τx |Z =z)             [(6.15), RS-(3.24)]
⇔ ∀ x ∈ X (Ω): E X =x (Y |Z =z) ⊢ D X                      [Def. 6.13 (ii)]
⇔ ∀ x ∈ X (Ω): E Z=z (Y |X =x ) ⊢ D X                      [(6.25)]
⇔ E Z=z (Y |X ) ⊢ D X .                                    [Def. 6.18 (ii)]

Proof of Theorem 6.20

Proposition (6.28). If Z is a covariate of X , then σ(Z ) ⊂ σ(D X ) [see Def. 4.11 (iv) and
Rem. 4.16]. Hence,

   E X =x (Y |Z ) ⊢ D X
⇔ E X =x (Y |Z ) =_P E (τx |Z )                            [Def. 6.13 (i)]
⇔ E X =x (E X =x (Y |D X ) | Z ) =_P E (τx |Z )            [RS-Box 4.1 (xiii)]
⇔ E X =x (τx |Z ) =_P E (τx |Z ).                          [(5.1)]

Proposition (6.29).

   E X =x (Y |Z ) ⊢ D X
⇔ E X =x (τx |Z ) =_P E (τx |Z )                           [(6.28)]
⇔ τx  1X =x | Z                                            [RS-Th. 5.50]
⇔ E (τx | 1X =x , Z ) =_P E (τx |Z ).                      [RS-(4.48)]

Proposition (6.31).

   E X =x (Y |Z ) ⊢ D X
⇔ E X =x (τx |Z ) =_P E (τx |Z )                           [(6.28)]
⇔ E X =x (εx + E (τx | 1X =x , Z ) | Z ) =_P E (εx + E (τx | 1X =x , Z ) | Z )    [RS-Box 4.1 (xiv), (6.30)]
⇔ E X =x (εx |Z ) + E X =x (E (τx | 1X =x , Z ) | Z ) =_P E (εx |Z ) + E (E (τx | 1X =x , Z ) | Z )    [RS-Box 4.1 (xvii)]
⇔ E X =x (εx |Z ) + E X =x (E (τx |Z ) | Z ) =_P E (εx |Z ) + E (E (τx |Z ) | Z )    [(6.29)]
⇔ E X =x (εx |Z ) + E (τx |Z ) =_P E (εx |Z ) + E (τx |Z )    [RS-Box 4.1 (xi)]
⇔ E X =x (εx |Z ) =_P E (εx |Z ).

Proposition (6.32).

   E X =x (Y |Z ) ⊢ D X
⇔ E X =x (εx |Z ) =_P E (εx |Z )                           [(6.31)]
⇔ εx  1X =x |Z                                             [RS-Th. 5.50]
⇔ E (εx | 1X =x , Z ) =_P E (εx |Z ).                      [RS-(4.48)]

Proof of Theorem 6.22

Equation (6.37). If Z is a covariate of X , then σ(Z ) ⊂ σ(D X ) [see Def. 4.11 (iv) and
Rem. 4.16], which implies {Z =z } = {ω ∈ Ω: Z (ω) = z } ∈ σ(D X ). Also note that
E X =x (Y |D X ) =_P E X =x (Y |DC ) [see Def. 4.11 (iii) and RS-Def. 4.4]. Hence,

   E X =x (Y |Z =z) ⊢ D X
⇔ E X =x (Y |Z =z) = E (τx |Z =z)                          [Def. 6.13 (ii)]
⇔ E X =x (Y |{Z =z }) = E (τx |Z =z)                       [RS-(3.23)]
⇔ E X =x (E X =x (Y |D X ) | {Z =z }) = E (τx |Z =z)       [{Z =z } ∈ σ(D X ), RS-Box 4.1 (xii)]
⇔ E X =x (τx |Z =z) = E (τx |Z =z).                        [(5.1), RS-(3.23)]

Equation (6.38).

   E X =x (Y |Z =z) ⊢ D X
⇔ E X =x (τx |Z =z) = E (τx |Z =z)                         [(6.37)]
⇔ E X =x (εx + E (τx | 1X =x , Z ) | Z =z) = E (εx + E (τx | 1X =x , Z ) | Z =z)    [(6.30)]
⇔ E X =x (εx |Z =z) + E X =x (E (τx | 1X =x , Z ) | Z =z) = E (εx |Z =z) + E (E (τx | 1X =x , Z ) | Z =z)    [RS-Box 3.1 (vii)]
⇔ E X =x (εx |Z =z) + E X =x (E (τx | 1X =x , Z ) | {Z =z }) = E (εx |Z =z) + E (E (τx | 1X =x , Z ) | {Z =z })    [RS-(3.23)]
⇔ E X =x (εx |Z =z) + E X =x (τx |Z =z) = E (εx |Z =z) + E (τx |Z =z)    [{Z =z } ∈ σ(1X =x , Z ), RS-Box 4.1 (xii), RS-(3.23)]
⇔ E X =x (εx |Z =z) = E (εx |Z =z).                        [(6.37)]

Proof of Theorem 6.24

E (Y |X =x ) ⊢ D X ∧ E (Y |X =x ′ ) ⊢ D X
⇔ E (Y |X =x ) = E (τx ) ∧ E (Y |X =x ′ ) = E (τx ′ ) ∧ τx , τx ′ are P-unique [Def. 6.3 (i)]
⇒ E (Y | X =x ) − E (Y | X =x ′ ) = E (τx ) − E (τx ′ ) ∧ τx , τx ′ are P-unique
⇔ PFE x x ′ = E (τx ) − E (τx ′ ) ∧ τx , τx ′ are P-unique [(6.39)]
⇔ PFE x x ′ ⊢ DC . [Def. 6.23 (i)]

Proof of Theorem 6.25


   E X =x (Y |Z ) ⊢ D X ∧ E X =x ′ (Y |Z ) ⊢ D X
⇔ E X =x (Y |Z ) =_P E (τx |Z ) ∧ E X =x ′ (Y |Z ) =_P E (τx ′ |Z ) ∧ τx , τx ′ are P-unique    [Def. 6.13 (i)]
⇒ E X =x (Y |Z ) − E X =x ′ (Y |Z ) =_P E (τx |Z ) − E (τx ′ |Z ) ∧ τx , τx ′ are P-unique    [SN-Rem. 2.76 (ii)]
⇔ PFE Z ; x x ′ (Z ) =_P E (τx |Z ) − E (τx ′ |Z ) ∧ τx , τx ′ are P-unique    [(6.40)]
⇔ PFE Z ; x x ′ ⊢ DC .                                     [Def. 6.23 (ii)]

Proof of Theorem 6.26


   E X =x (Y |Z =z) ⊢ D X ∧ E X =x ′ (Y |Z =z) ⊢ D X
⇔ E X =x (Y |Z =z) = E (τx |Z =z) ∧ E X =x ′ (Y |Z =z) = E (τx ′ |Z =z) ∧ τx , τx ′ are P Z=z-unique    [Def. 6.13 (ii)]
⇒ E X =x (Y |Z =z) − E X =x ′ (Y |Z =z) = E (τx |Z =z) − E (τx ′ |Z =z) ∧ τx , τx ′ are P Z=z-unique
⇔ PFE Z ; x x ′ (z) = E (τx − τx ′ |Z =z) ∧ τx , τx ′ are P Z=z-unique    [RS-(3.35), (6.41)]
⇔ PFE Z ; x x ′ (z) ⊢ DC .                                 [Def. 6.23 (iii)]

Proof of Theorem 6.34

E (PFE Z ; x x ′ (Z )) = E (E (τx − τx ′ |Z ))             [Def. 6.23 (ii)]
                      = E (τx − τx ′ )                     [RS-Box 4.1 (iv)]
                      = ATE x x ′ .                        [(5.20), (5.21)]

Proof of Theorem 6.39

Equation (6.53).

E (PFE Z ; x x ′ (Z ) | V )
=_P E (E (τx − τx ′ |Z ) | V )                             [Def. 6.23 (ii)]
=_P E (E (τx |Z ) − E (τx ′ |Z ) | V )                     [RS-Box 4.1 (xviii)]
=_P E (E (τx | 1X =x , Z ) − E (τx ′ | 1X =x ′ , Z ) | V )    [(6.29)]
=_P E (E (τx | 1X =x , Z ) | V ) − E (E (τx ′ | 1X =x ′ , Z ) | V )    [RS-Box 4.1 (xviii)]
=_P E (τx |V ) − E (τx ′ |V )    [σ(V ) ⊂ σ(1X =x , Z ), σ(V ) ⊂ σ(1X =x ′ , Z ), RS-Box 4.1 (xiii)]
=_P E (τx − τx ′ |V )                                      [RS-Box 4.1 (xviii)]
=_P CTE V ; x x ′ (V ).                                    [(5.34)]

Equation (6.54).

CTE V ; x x ′ (V ) =_P E (PFE Z ; x x ′ (Z ) | V )         [(6.53)]
                  =_P E (E X =x (Y |Z ) − E X =x ′ (Y |Z ) | V )    [(6.40)]
                  =_P E (E X =x (Y |Z ) | V ) − E (E X =x ′ (Y |Z ) | V ).    [RS-Box 4.1 (xviii)]

Proof of Theorem 6.41

Equation (6.59).

E (PFE Z ; x x ′ (Z ) | V )
=_P E (E (τx − τx ′ |Z ) | V )                             [(6.58), Def. 6.23 (ii)]
=_P E (E (τx |Z ) − E (τx ′ |Z ) | V )                     [RS-Box 4.1 (xviii)]
=_P E (E (τx |X , Z ) − E (τx ′ |X , Z ) | V )             [(6.57)]
=_P E (E (τx |X , Z ) | V ) − E (E (τx ′ |X , Z ) | V )    [RS-Box 4.1 (xviii)]
=_P E (τx |V ) − E (τx ′ |V )                              [σ(V ) ⊂ σ(X , Z ), RS-Box 4.1 (xiii)]
=_P E (τx − τx ′ |V )                                      [RS-Box 4.1 (xvii)]
=_P CTE V ; x x ′ (V ).                                    [(5.30)]

Equation (6.60).

CTE V ; x x ′ (V )
=_P E (PFE Z ; x x ′ (Z ) | V )                            [(6.59)]
=_P E (E X =x (Y |Z ) − E X =x ′ (Y |Z ) | V )             [(6.40)]
=_P E (E X =x (Y |Z ) | V ) − E (E X =x ′ (Y |Z ) | V ).   [RS-Box 4.1 (xvii)]

Proof of Corollary 6.44

Both sets of assumptions, those of Theorem 6.39 and those of Theorem 6.41, yield
CTE V ; x x ′ (V ) =_P E (PFE Z ; x x ′ (Z ) | V ).

Because both sides are compositions of V and some numerical functions, RS-Remark 2.55
implies Equation (6.64). The same kind of argument proves Equation (6.65).

6.9 Exercises

⊲ Exercise 6-1 What is the difference between the two terms E (τx ) and E (Y |X =x )?

⊲ Exercise 6-2 Compute the probabilities P(Z=z) occurring in Equation (6.81) for both values of Z
in the example displayed in Table 6.2.

⊲ Exercise 6-3 What are the probabilities P(U =Tom , Z =m) and P(U =Ann, Z =m) occurring in
Equation (6.81) for the example displayed in Table 6.2?

⊲ Exercise 6-4 Compute the two conditional probabilities P(U =u |Z =m) for u =Tom and u = Ann
displayed in Table 6.2.

⊲ Exercise 6-5 Use RS-Theorem 1.38 to compute the probability P(X =1) for the example displayed
in Table 6.2.

⊲ Exercise 6-6 Compute the probabilities P(U =u |X = 0) and P(U =u |X =1) displayed in Table 6.2
for all six persons.

⊲ Exercise 6-7 Compute the conditional probabilities P(U =u |X =1, Z =m) occurring in Equation
(iii) of Box 6.2 for the example of Table 6.2.

⊲ Exercise 6-8 Compute the conditional expectation values E (Y |X =0) and E (Y |X =1) for the ex-
ample in Table 6.4.

⊲ Exercise 6-9 Compute the conditional expectation values E (τ1 |Z = f ) and E (τ0 |Z = f ) displayed
in Table 6.2.

⊲ Exercise 6-10 Download Kbook Table 8.4.sav from www.causal-effects.de. This data set has been
generated from Table 6.4 for a sample of size N = 10,000. Estimate the conditional expectations
E X=0 (Y | Z ) and E X =1 (Y | Z ) and use them to compute the four conditional expectation values
E (Y |X =x , Z=z ) displayed in Table 6.4.

⊲ Exercise 6-11 Prove the four equations presented in Box 6.2.

⊲ Exercise 6-12 Show that τx  X | Z implies unbiasedness of E X =x (Y |Z ) if τx is P-unique.

⊲ Exercise 6-13 Show that Equation (6.80) holds.

Solutions

⊲ Solution 6-1 The term E (τx ) denotes the expectation of a true outcome variable τx , where x is a
value of a putative cause variable. It is these true outcome variables that are of interest in the em-
pirical sciences because the difference between τx and τx ′ is the conditional effect function of x
compared to x ′ controlling for all potential confounders of X . This implies that the true outcome
variables cannot be biased, and this also applies to their expectations E (τx ). In causal research, we
often aim at estimating the differences E (τx ) −E (τx ′ ). In contrast, the differences between the con-
ditional expectation values E (Y |X =x ) and E (Y |X =x ′ ) of the outcome variable Y are not of interest
in causal research because they do not have a causal interpretation unless E (Y |X =x ) = E (τx ) and
E (Y |X =x ′ ) = E (τx ′ ), that is, unless E (Y |X =x ) and E (Y |X =x ′ ) are unbiased.
⊲ Solution 6-2 The events that U takes on the value u i and that U takes on the value u j , i ≠ j , are
disjoint. Therefore, we can use the theorem of total probability (see RS-Th. 1.38):

P(Z =m) = P(Z =m,U =Tom ) + ... + P(Z =m,U =Sue )
        = 1/6 + 1/6 + 1/6 + 1/6 + 0 + 0 = 4/6.

P(Z = f ) = P(Z = f ,U =Tom ) + ... + P(Z = f ,U =Sue )
        = 0 + 0 + 0 + 0 + 1/6 + 1/6 = 2/6.

⊲ Solution 6-3 P(U =Tom , Z =m) = 1/6 and P(U =Ann, Z =m) = 0.
⊲ Solution 6-4

P(U =Tom | Z =m) = P(U =Tom , Z =m) / P(Z =m) = (1/6)/(4/6) = 1/4.
P(U =Ann | Z =m) = P(U =Ann, Z =m) / P(Z =m) = 0/(4/6) = 0.
⊲ Solution 6-5 The events {U =Tom },... ,{U =Sue } are disjoint and all these events have positive
probabilities. Hence we can apply the theorem of total probability (see RS-Th. 1.38):

P(X =1) = P(X =1|U =Tom ) · P(U =Tom ) + ... + P(X =1|U =Sue ) · P(U =Sue )
        = 6/7 · 1/6 + 5/7 · 1/6 + 4/7 · 1/6 + 3/7 · 1/6 + 2/7 · 1/6 + 1/7 · 1/6
        = 21/42 = 1/2.
⊲ Solution 6-6 We have to use the equation

P(U =u |X =x ) = [P(X =x |U =u ) · P(U =u )] / P(X =x ).

For u =Tom and x =1 this equation yields:

P(U =Tom | X =1) = [P(X =1|U =Tom ) · P(U =Tom )] / P(X =1)
                 = (6/7 · 1/6) / (1/2) = 6/21.                 [Exercise 6-5]

Using the same procedure, we obtain 5/21, 4/21, ... , 1/21, the corresponding probabilities for the
other five persons Tim , ... , Sue , respectively. For u =Tom and x = 0, we obtain

P(U =Tom | X = 0) = [P(X = 0 |U =Tom ) · P(U =Tom )] / P(X = 0)
                  = (1/7 · 1/6) / (1/2) = 1/21.                [Exercise 6-5]

Using the same procedure, we obtain 2/21, 3/21, ... , 6/21 for the other five persons Tim , ... , Sue ,
respectively.
⊲ Solution 6-7 According to Equation (6.80) we need the conditional probabilities P(X =1|U =u )
displayed in Table 6.2. The other probabilities occurring in this equation can be computed from the
probabilities displayed in the table.
One of these other probabilities that needs some computation is P(X =1| Z =m). For x =1 and
z=m in the example of Table 6.2, Equation (6.81) results in:
P(X =1| Z =m) = [∑u P(X =1|U =u ) · P(U =u , Z =m)] / P(Z =m)
              = [(6/7) · (1/6) + ... + (3/7) · (1/6) + (2/7) · 0 + (1/7) · 0] / (4/6) = 27/42.

Using this result, Equation (6.80) yields:

P(U =Tom | X =1, Z =m) = [P(X =1|U =Tom ) · P(U =Tom , Z =m)] / [P(X =1| Z =m) · P(Z =m)]
                       = [(6/7) · (1/6)] / [(27/42) · (4/6)] = (6/7) / [(27/42) · 4] = 6/18,

as well as 5/18, 4/18, and 3/18 for the corresponding conditional probabilities for u =Tim , u =Joe ,
and u =Jim . The conditional probabilities P(U =Ann |X =1, Z =m) and P(U =Sue | X =1, Z =m) are
zero.
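
The computations of Solutions 6-5 to 6-7 can also be reproduced numerically. The following Python sketch uses the parameters of Table 6.2 as they are quoted in the solutions above (P (U =u ) = 1/6, the treatment probabilities P (X =1|U =u ), and the values of Z ); the function names are ours.

```python
# Parameters of Table 6.2 as quoted in Solutions 6-2, 6-5, and 6-7.
persons  = ["Tom", "Tim", "Joe", "Jim", "Ann", "Sue"]
p_u      = {u: 1/6 for u in persons}                             # P(U=u)
p_x1_u   = dict(zip(persons, [6/7, 5/7, 4/7, 3/7, 2/7, 1/7]))    # P(X=1|U=u)
sex      = dict(zip(persons, ["m", "m", "m", "m", "f", "f"]))    # Z(u)

def p_z(z):
    """P(Z=z) by the theorem of total probability."""
    return sum(p_u[u] for u in persons if sex[u] == z)

def p_x1_given_z(z):
    """P(X=1|Z=z) as in Equation (6.81)."""
    return sum(p_x1_u[u] * p_u[u] for u in persons if sex[u] == z) / p_z(z)

def p_u_given_x1_z(u, z):
    """P(U=u|X=1, Z=z) as in Equation (6.80)."""
    p_uz = p_u[u] if sex[u] == z else 0.0     # P(U=u, Z=z), cf. Eq. (6.83)
    return p_x1_u[u] * p_uz / (p_x1_given_z(z) * p_z(z))

print(p_x1_given_z("m"))            # 27/42 ≈ 0.643
print(p_u_given_x1_z("Tom", "m"))   # 6/18  ≈ 0.333
```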
⊲ Solution 6-8 According to Equation (i) of Box 6.2,

E (Y |X = 0) = ∑u E (Y | X = 0,U =u ) · P(U =u |X = 0)
             = (68 + 78 + 88 + 98) · 1/10 + (106 + 116) · 3/10
             = 33.2 + 66.6 = 99.8,

and

E (Y |X =1) = ∑u E (Y | X =1,U =u ) · P(U =u |X =1)
            = (81 + 86 + 100 + 103) · 3/14 + (114 + 130) · 1/14
            ≈ 79.286 + 17.429 ≈ 96.715.

⊲ Solution 6-9 Remember again, in this example, D X = U . Therefore, the values of the true outcome
variable τx are the conditional expectation values E X =x (Y |U =u ) = E (Y | X = x,U =u ). Hence,
E (τ0 |Z = f ) = ∑u E (Y | X = 0,U =u ) · P(U =u |Z = f )
              = 68 · 0 + ... + 98 · 0 + 106 · 1/2 + 116 · 1/2 = 111.

E (τ1 |Z = f ) = ∑u E (Y | X =1,U =u ) · P(U =u |Z = f )
              = 81 · 0 + ... + 103 · 0 + 114 · 1/2 + 130 · 1/2 = 122.
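
As a cross-check, the two conditional expectation values of Solution 6-9 can be obtained with a few lines of Python, using the true outcomes E (Y |X =x ,U =u ) and the probabilities P (U =u |Z = f ) quoted in Solutions 6-8 and 6-9.

```python
# True outcomes E(Y|X=x, U=u) and P(U=u|Z=f) as quoted in Solutions 6-8 and 6-9.
persons = ["Tom", "Tim", "Joe", "Jim", "Ann", "Sue"]
tau = {
    0: dict(zip(persons, [68, 78, 88, 98, 106, 116])),    # tau_0(u) = E(Y|X=0, U=u)
    1: dict(zip(persons, [81, 86, 100, 103, 114, 130])),  # tau_1(u) = E(Y|X=1, U=u)
}
p_u_given_zf = dict(zip(persons, [0, 0, 0, 0, 1/2, 1/2]))  # P(U=u|Z=f)

def e_tau_given_zf(x):
    """E(tau_x | Z=f) = sum_u E(Y|X=x, U=u) * P(U=u|Z=f)."""
    return sum(tau[x][u] * p_u_given_zf[u] for u in persons)

print(e_tau_given_zf(0), e_tau_given_zf(1))   # 111.0 122.0
```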
⊲ Solution 6-10 Because Z is dichotomous, one way to estimate the conditional expectations
E X=0 (Y | Z ) and E X =1 (Y | Z ) is to estimate the linear regressions of Y on the indicator 1Z =m in treat-
ments 0 and 1, that is, within the data subsamples with x = 0 and x =1, respectively.
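
A possible implementation of this estimation strategy in Python is sketched below. The column names x, z, and y are assumptions about the data file and may have to be adapted; reading .sav files with pandas requires the pyreadstat package.

```python
import pandas as pd

# Sketch only: the column names "x", "z", "y" are assumed and may differ in
# the actual file; pd.read_spss needs pyreadstat to be installed.
df = pd.read_spss("Kbook Table 8.4.sav")

# Because Z is dichotomous, the linear regression of Y on the indicator of
# Z = m within each treatment group is saturated, so its fitted values are
# simply the group means, i.e., the estimates of E(Y | X=x, Z=z):
est = df.groupby(["x", "z"])["y"].mean()
print(est)
```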
⊲ Solution 6-11 (i). We only have to prove E (Y |X =x ) = E (τx |X =x ) because the second equation is
Equation (ii) of RS-Box 3.2. Hence, for τx ∈ E X =x (Y |U ),

E (Y |X =x ) = E X =x (Y )                                 [RS-(3.24)]
            = E X =x (E X =x (Y |U ))                      [RS-Box 4.1 (iv)]
            = E X =x (τx )                                 [τx = E X =x (Y |U )]
            = E (τx |X =x ).                               [RS-(3.24)]

(ii). Note that there is a measurable mapping g x such that E X =x (Y |U ) = g x (U ) [see Rem. 5.11].
Furthermore, under the assumptions of Box 6.2, in particular P(X =x ,U =u ) > 0, for all u ∈ U (Ω), the
true outcome variable τx is P-unique. Hence, according to RS-Box 3.1 (v), for τx ∈ E X =x (Y |U ),

E (τx ) =_P E (E X =x (Y |U ))                             [τx = E X =x (Y |U )]
        =_P E (g x (U ))                                   [E X =x (Y |U ) = g x (U )]
        = ∑u g x (u) · P(U =u )                            [RS-(3.13)]
        = ∑u E X =x (Y |U =u ) · P(U =u )                  [RS-(5.24)]
        = ∑u E (Y |X =x ,U =u ) · P(U =u ).                [RS-(5.26)]

(iii). If Z is measurable with respect to U and Ω′Z is finite, then, according to RS-Corollary 2.36,
there is a measurable mapping f : (ΩU , P (ΩU )) → (Ω′Z , A′Z ) such that Z = f (U ). Hence,

   P(X =x ,U =u ) > 0
⇔ P X =x (U =u ) > 0                  [P X =x (U =u ) = P(X =x ,U =u )/P(X =x )]
⇒ P X =x (Z =z) > 0                   [P X =x (Z =z) = ∑u : f (u)=z P X =x (U =u )]
⇔ P(X =x , Z =z) > 0.                 [P X =x (Z =z) = P(X =x , Z =z)/P(X =x )]

Furthermore,

   E X =x (Y |Z ) =_P E X =x (E X =x (Y |U ) | Z )                        [RS-Box 4.1 (xiii)]
⇒ E X =x (Y |Z =z) = E X =x (E X =x (Y |U ) | Z =z)                      [P X =x (Z =z) > 0, RS-Rem. 2.55]
⇔ E X =x (Y |Z =z) = E X =x (g x (U ) | Z =z)                            [RS-(5.23)]
⇔ E X =x (Y |Z =z) = ∑u g x (u) · P X =x (U =u |Z =z)                    [RS-(3.28)]
⇔ E X =x (Y |Z =z) = ∑u E X =x (Y |U =u ) · P X =x (U =u |Z =z)          [RS-(5.24)]
⇔ E (Y |X =x , Z =z) = ∑u E (Y |X =x ,U =u ) · P(U =u |X =x , Z =z).     [RS-(5.26)]

Note that, in Box 6.2, we assume τx = E X =x (Y |U ). Therefore,

E X =x (E X =x (Y |U ) | Z =z) =_P E X =x (τx |Z =z) = E (τx |X =x , Z =z),

which, together with the equivalence propositions above, proves the first equation of (iii).
(iv).

E (τx |Z =z) =_P E (E X =x (Y |U ) | Z =z)                 [τx = E X =x (Y |U )]
            = E (g x (U ) | Z =z)                          [RS-(5.23)]
            = ∑u g x (u) · P(U =u |Z =z)                   [RS-(3.28)]
            = ∑u E X =x (Y |U =u ) · P(U =u |Z =z)         [RS-(5.24)]
            = ∑u E (Y |X =x ,U =u ) · P(U =u |Z =z).       [RS-(5.26)]

⊲ Solution 6-12

   τx  X | Z
⇔ E (τx |X , Z ) =_P E (τx |Z )                                              [RS-Def. 4.41]
⇒ E (E (τx |X , Z ) | 1X =x , Z ) =_P E (E (τx |Z ) | 1X =x , Z )            [RS-Box 4.1 (xiv)]
⇔ E (τx | 1X =x , Z ) =_P E (τx |Z )                                         [RS-Box 4.1 (xiii), (xi)]
⇒ E X =x (Y |Z ) ⊢ D X .                                                     [τx is P-unique, Th. 6.20 (i)]

⊲ Solution 6-13 In the examples presented in Tables 6.2 to 6.4, Z (sex) is U -measurable. According
to RS-Corollary 2.36, there is a mapping g : ΩU → {m, f } such that Z is the composite function of U
and g , that is, Z = g (U ). Therefore,
{U =u , Z =z} = {U =u },   if g (u) = z,                                     (6.82)
              = Ø,         otherwise.

According to Equation (6.82), the event to sample person u and that the sampled person is male is
identical to the event to sample person u, if that person is male [i. e., if g (u) = m]. Correspondingly,
the event to sample person u and that the sampled person is female [i. e., g (u) = f ] is identical to
the event to sample person u, if that person is female. In contrast, the event to sample a male person
u and to observe Z (ω) = g (U (ω)) = f is the empty set. The same applies to the event to sample a
female person u and to observe Z (ω) = g (U (ω)) = m. Hence, Equation (6.82) implies

P(U =u , Z =z) = P(U =u ),   if g (u) = z,                                   (6.83)
               = 0,          otherwise.

Therefore, we can conclude that, in these examples,

P(X =x |U =u , Z =z) = P(X =x ,U =u , Z =z) / P(U =u , Z =z)
                     = P(X =x ,U =u ) / P(U =u )                             (6.84)
                     = P(X =x |U =u ),   if P(U =u , Z =z) > 0.

Furthermore, in our examples, P(X =x , Z =z) > 0. Therefore, if P(U =u , Z =z) > 0, then

P(U =u |X =x , Z =z) = P(X =x ,U =u , Z =z) / P(X =x , Z =z)
                     = [P(X =x |U =u , Z =z) · P(U =u , Z =z)] / [P(X =x | Z =z) · P(Z =z)]
                     = [P(X =x |U =u ) · P(U =u , Z =z)] / [P(X =x | Z =z) · P(Z =z)],    [Eq. (6.84)]

which is Equation (6.80). If P(U =u , Z =z) = 0, then P(X =x ,U =u , Z =z) = 0. Hence, if P(U =u , Z =z) = 0, then

P(U =u |X =x , Z =z) = P(X =x ,U =u , Z =z) / P(X =x , Z =z) = 0.

The same result is also obtained applying Equation (6.80).


Chapter 7
Rosenbaum-Rubin Conditions

In chapter 6, we introduced unbiasedness of various conditional expectations, conditional
expectation values, prima facie effects, and prima facie effect functions. We also studied
how various causal total effects and total effect functions can be identified by empirically
estimable parameters and functions if we can assume that certain terms are unbiased.
Those unbiasedness conditions are a first of several kinds of causality conditions, which,
together with the structural components listed in a regular probabilistic causality setup,
distinguish causal total effects from differences between conditional expectation values
that have no causal meaning.
In this chapter we introduce some other causality conditions, all of them dealing with
the relationship between the true outcome variables and the focused putative cause vari-
able (the treatment, intervention, exposition variable). For simplicity, these causality con-
ditions will summarily be referred to as the Rosenbaum-Rubin conditions. They include
strong ignorability that has been introduced by Rosenbaum and Rubin (1983b), which
is the most restrictive of all causality conditions treated in this chapter. Note, however,
that we adapted the original Rosenbaum-Rubin condition by replacing their (determinis-
tic) potential outcome variables by the (probabilistic) true outcome variables, which have
been introduced in chapter 5.

Requirements

Reading this chapter we assume that the reader is familiar with the concepts treated in
all chapters of Steyer (2024). Chapters 4 to 6 are now crucial, dealing with the concepts of
a conditional expectation, a conditional expectation with respect to a conditional proba-
bility measure, and conditional independence. Furthermore, we assume familiarity with
chapters 4 to 6 of the present book.
In the present chapter we will often refer to the following notation and assumptions.

Notation and Assumptions 7.1


(a) Let ((Ω, A, P ), (Ft )t ∈T , C, DC , X , Y ) be a regular probabilistic causality setup and
let D X denote a global potential confounder of X .
(b) Let (ΩX′ , AX′ ) denote the value space of X , assume that the image X (Ω) of Ω under
X is finite or countable. Let x ∈ ΩX′ , {x } ∈ AX′ , and let 1X =x denote the indicator
variable of the event {X =x } = {ω ∈ Ω: X (ω) = x }.
(c) Let Y be real-valued with positive variance. Assume that P (X =x ) > 0, define the
probability measure P X =x : A → [0, 1] by P X =x (A) = P (A | X =x ), for all A ∈ A,
and let τx = E X =x (Y |D X ) denote a true outcome variable of Y given x.

(d) Assume that τx is P-unique.


(e) Let X (Ω) = {0, 1, . . . , J } denote the image of Ω under X , for all x ∈ X (Ω), let
{x } ∈ AX′ and assume 0 < P (X =x ) < 1. Finally, let τ := (τ0 , τ1 , . . . , τJ ) denote
the (J + 1)-variate random variable consisting of the true outcome variables τx ,
x ∈ X (Ω).
(f ) Assume that all τx , x ∈ X (Ω), are P-unique.
(g) Let Z be a random variable on (Ω, A, P ) and let (ΩZ′ , AZ′ ) denote its value space.
(h) Let z ∈ ΩZ′ be a value of Z , let {z } ∈ AZ′ , assume P (Z =z) > 0, and define the pro-
bability measure P Z=z : A → [0, 1] by P Z=z (A) = P (A | Z =z), for all A ∈ A.

7.1 RR-Conditions for E (Y |X =x ) and E (Y |X )

In chapter 6 we introduced unbiasedness of E (Y |X =x ), denoted E (Y |X =x ) ⊢ D X and de-
fined by the conjunction of P -uniqueness of τx and E (Y |X =x ) = E (τx ). Presuming P -
uniqueness of τx , we also showed that E (Y |X =x ) ⊢ D X is equivalent to mean-indepen-
dence of τx from 1X =x , denoted τx  1X =x and defined by E (τx | 1X =x ) = E (τx ) [see Th. 6.9 (i)].
In this section, we introduce more causality conditions for E (Y |X =x ) and E (Y |X ) involv-
ing the true outcome variables τx . These causality conditions are presented in Box 7.1 and
the implication relations between them are summarized in Table 7.1.

7.1.1 Mean-Independence Conditions

The general concept of mean-independence has already been introduced in RS-Definition
4.36. However, in order to apply it to a true outcome variable τx we have to assume that
τx = E X =x(Y |D X ) is P-unique (see RS-Rem. 5.14, and RS-Th. 5.27). Hence, under the As-
sumptions 7.1 (a) to (d) we define

τx  1X =x :⇔ E (τx | 1X =x ) =_P E (τx )                                     (7.1)

and call it mean-independence of τx from 1X =x . Correspondingly, mean-independence of τx


from X is defined by

τx  X :⇔ E (τx |X ) =_P E (τx ).                                             (7.2)

These are the first two causality conditions listed in Box 7.1.
Furthermore, under the Assumptions 7.1 (a) to (f) we define

∀x : τx  1X =x :⇔ ∀ x ∈ X (Ω): E (τx | 1X =x ) =_P E (τx )                   (7.3)

and call it mean-independence of τx from 1X =x for all x . Under the same assumptions,
mean-independence of all τx from X is defined by

∀x : τx  X :⇔ ∀ x ∈ X (Ω): E (τx |X ) =_P E (τx ).                           (7.4)

These are conditions (v) and (vi), respectively, of Box 7.1.



Remark 7.2 [Consequences of P -Uniqueness of the True Outcome Variables] If τx , τx∗ are
two versions of a true outcome variable, then P -uniqueness of τx does not only imply
that the two versions are P -equivalent, that is, τx =_P τx∗ (see RS-Def. 2.46), but also that
their expectations are identical and their X -conditional expectations are P -equivalent
[see RS-Box 4.1 (xiv)]. Without P -uniqueness of τx , neither τx =_P τx∗, nor E (τx ) = E (τx∗),
nor E (τx |X ) =_P E (τx∗ |X ) is guaranteed. Hence, if a true outcome variable τx is not P-unique,
then assuming E (τx |X ) =_P E (τx ) does not make sense because E (τx ) would not be uniquely
defined. ⊳

Remark 7.3 [τx Is the Composition of D X and a Function g x ] At first sight, mean-indepen-
dence of τx from X seems paradoxical; however, it is not. The reason is that τx = E X =x (Y |D X )
is a function of the global potential confounder D X . More precisely, τx is a composition
g x (D X ) of D X and a function g x : Ω′D X → R [see RS-Eq. (5.23)], where (Ω′D X , A′D X ) denotes
the value space of D X . Although τx refers to a specific value x of X , a true outcome variable
τx is not a function of X . Therefore, postulating E (τx |X ) =_P E (τx ) does make sense, provided
that τx is P-unique. ⊳

Remark 7.4 [Mean-Independence of τx From X Implies Unbiasedness of E (Y |X =x )] If


the Assumptions 7.1 (a) to (d) hold, then

τx  X ⇒ τx  1X =x (7.5)
⇔ E (Y |X =x ) ⊢ D X (7.6)

(see Exercise 7-1). Hence, mean-independence of τx from X implies mean-independence


of τx from 1X =x , which itself is equivalent to unbiasedness of E (Y |X =x ) [see Th. 6.9 (i)].
However, unless X is dichotomous, τx  X is not equivalent to τx  1X =x . More precisely,
τx  1X =x does not imply τx  X unless E (τx |X ) =_P E (τx | 1X =x ). ⊳

Remark 7.5 [Mean-Independence of All τx From X Implies Unbiasedness of E (Y |X )] If


the Assumptions 7.1 (a) to (f) hold, then

∀x : τx  X ⇒ ∀ x ∈ X (Ω): τx  1X =x (7.7)
⇔ ∀ x ∈ X (Ω): E (Y |X =x ) ⊢ D X (7.8)
⇔ E (Y |X ) ⊢ D X (7.9)

(see Exercise 7-2). Hence, mean-independence of all τx from X implies unbiasedness of all
conditional expectation values E (Y |X =x ), and under the Assumptions 7.1 (a) to (f), this is
equivalent to unbiasedness of the conditional expectation E (Y |X ). ⊳

7.1.2 Independence Conditions

In this section, we treat some independence conditions involving the true outcome vari-
ables τx . These conditions also imply unbiasedness of E (Y |X =x ) and E (Y |X ). Remem-
ber, X ⊥ ⊥Y denotes independence of two random variables X and Y (see RS-Def. 2.59).
Furthermore, τ = (τ0 , τ1 , . . . , τJ ), that is, τ is a J + 1-variate random variable consisting of
the true outcome variables τx = E X =x(Y |D X ), x ∈ X (Ω) = {0, 1, . . . , J }.

Box 7.1 Rosenbaum-Rubin conditions for E (Y |X =x ) and E (Y |X )

RR-Conditions implying unbiasedness of E (Y |X =x )

τx  1X =x       Mean-independence of τx from 1X =x . If the Assumptions 7.1 (a) to (d)
                hold, then it is defined by
                     E (τx | 1X =x ) =_P E (τx ).                              (i)

τx  X           Mean-independence of τx from X . If the Assumptions 7.1 (a) to (d) hold,
                then it is defined by
                     E (τx |X ) =_P E (τx ).                                   (ii)

τx ⊥⊥ 1X =x     Independence of τx and 1X =x . If the Assumptions 7.1 (a) to (c) hold, then
                it is equivalent to
                     P(X =x |τx ) =_P P(X =x ).                                (iii)

τx ⊥⊥ X         Independence of τx and X . If the Assumptions 7.1 (a) to (c) hold, then it is
                equivalent to
                     ∀ x ′ ∈ X (Ω): P(X =x ′ |τx ) =_P P(X =x ′ ).             (iv)

The last two conditions are well-defined under the Assumptions 7.1 (a) to (c). However, only
in conjunction with Assumption 7.1 (d), each of conditions (i) to (iv) implies E (Y |X =x ) ⊢ D X .

RR-Conditions implying unbiasedness of E (Y |X ) and all E (Y |X =x )

∀x : τx  1X =x     Mean-independence of τx from 1X =x for all x. If the Assumptions 7.1 (a)
                   to (f ) hold, then it is defined by
                        ∀ x ∈ X (Ω): E (τx | 1X =x ) =_P E (τx ).              (v)

∀x : τx  X         Mean-independence of all τx from X . If the Assumptions 7.1 (a) to (f )
                   hold, then it is defined by
                        ∀ x ∈ X (Ω): E (τx |X ) =_P E (τx ).                   (vi)

∀x : τx ⊥⊥ 1X =x   Independence of τx and 1X =x for all x. If the Assumptions 7.1 (a) to (c)
                   and (e) hold, then it is equivalent to
                        ∀ x ∈ X (Ω): P(X =x |τx ) =_P P(X =x ).                (vii)

∀x : τx ⊥⊥ X       Independence of τx and X for all x. If the Assumptions 7.1 (a) to (c) and
                   (e) hold, then it is equivalent to
                        ∀ x, x ′ ∈ X (Ω): P(X =x ′ |τx ) =_P P(X =x ′ ).       (viii)

τ⊥⊥ X              Independence of τ and X . If the Assumptions 7.1 (a) to (c) and (e) hold,
                   then it is equivalent to
                        ∀ x ∈ X (Ω): P(X =x |τ) =_P P(X =x ).                  (ix)

The last three conditions are well-defined under the Assumptions 7.1 (a) to (c) and (e). However,
only in conjunction with Assumption 7.1 (f ), each of conditions (v) to (ix) implies unbiasedness
of E (Y |X ) and all E (Y |X =x ), x ∈ X (Ω).

Remark 7.6 [Equivalent Formulations of the Independence Conditions] These indepen-


dence conditions and their symbols are listed in Box 7.1 [see conditions (iii), (iv), and (vii)
to (ix)]. The assumptions specified in this box include that the image X (Ω) of Ω under
X is finite or countable, and in this case the conditions specified in the table and on the
right-hand sides of Propositions (7.10) to (7.14) are equivalent to the corresponding in-
dependence condition (see RS-Cors. 6.17 and 6.18). For convenience, we repeat these in-
dependence conditions and the conditions that are equivalent to them, provided that the
image X (Ω) of Ω under X is finite or countable:

τx ⊥⊥ 1X =x       ⇔ P (X =x |τx ) =_P P (X =x )                              (7.10)
τx ⊥⊥ X           ⇔ ∀ x ′ ∈ X (Ω): P (X =x ′ |τx ) =_P P (X =x ′ )           (7.11)
∀x : τx ⊥⊥ 1X =x  ⇔ ∀ x ∈ X (Ω): P (X =x |τx ) =_P P (X =x )                 (7.12)
∀x : τx ⊥⊥ X      ⇔ ∀ x, x ′ ∈ X (Ω): P (X =x ′ |τx ) =_P P (X =x ′ )        (7.13)
τ⊥⊥ X             ⇔ ∀ x ∈ X (Ω): P (X =x |τ) =_P P (X =x ).                  (7.14)

Note again that independence of two random variables X and Y is also defined if nei-
ther X nor Y are finite or countable (see RS-Def. 2.59). However, because we assume that X
is finite or countable, the propositions on the right-hand sides above are more convenient
and more intuitive than the general definition. ⊳
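
For a finite example, these conditions can be checked by direct computation. The following Python sketch uses a made-up joint distribution of (τx , X ); the numbers are purely illustrative and chosen so that mean-independence [cf. condition (ii) of Box 7.1] holds although independence [cf. condition (iv)] fails, in line with the implications discussed above.

```python
from itertools import product

# Made-up joint probabilities P(tau_x = t, X = x'); illustrative numbers only.
joint = {
    (0, 0): 0.2, (0, 1): 0.1,
    (10, 0): 0.1, (10, 1): 0.3,
    (20, 0): 0.2, (20, 1): 0.1,
}
t_vals = sorted({t for t, _ in joint})
x_vals = sorted({x for _, x in joint})
p_x = {x: sum(joint[(t, x)] for t in t_vals) for x in x_vals}   # P(X=x')
p_t = {t: sum(joint[(t, x)] for x in x_vals) for t in t_vals}   # P(tau_x=t)

def mean_independent():
    """Checks E(tau_x | X=x') = E(tau_x) for all x' [cf. (7.2)]."""
    e_tau = sum(t * p_t[t] for t in t_vals)
    return all(
        abs(sum(t * joint[(t, x)] / p_x[x] for t in t_vals) - e_tau) < 1e-9
        for x in x_vals
    )

def independent():
    """Checks P(X=x' | tau_x=t) = P(X=x') for all t, x' [cf. (7.11)]."""
    return all(
        abs(joint[(t, x)] / p_t[t] - p_x[x]) < 1e-9
        for t, x in product(t_vals, x_vals)
    )

print(mean_independent(), independent())   # True False for these numbers
```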

7.1.3 Implications Among RR-Conditions for E (Y |X =x ) and E (Y |X )

Table 7.1 displays the implications among the causality conditions listed in Box 7.1. Note
that the propositions summarized in this table are special cases of the corresponding
propositions presented in Table 7.2. (For proofs see Exercises 7-3 and 7-4.)
In the sequel we study some consequences of some of the independence conditions,
starting with the consequences of τx ⊥ ⊥ 1X =x , that is, of independence of the true outcome
variable τx and the indicator variable 1X =x . According to Theorem 7.7, under the Assump-
tions 7.1 (a) to (d), τx ⊥
⊥ 1X =x implies mean-independence of τx from 1X =x , which itself is
equivalent to unbiasedness of E (Y |X =x ).

Theorem 7.7 [τx ⊥⊥ 1X =x Implies Unbiasedness of E (Y |X =x )]
Let the Assumptions 7.1 (a) to (d) hold. Then

τx ⊥⊥ 1X =x ⇒ τx  1X =x                                                    (7.15)
            ⇔ E (Y |X =x ) ⊢ D X .                                          (7.16)
(Proof p. 220)

Hence, under the assumptions of this theorem, independence of τx and 1X =x implies


mean-independence of τx from 1X =x , which is equivalent to unbiasedness of the condi-
tional expectation value E (Y |X =x ).
According to the next theorem, independence of the true outcome variable τx and the
putative cause variable X implies τx ⊥ ⊥ 1X =x . If we additionally assume P -uniqueness of τx ,
then it also implies mean-independence of τx from X and from the indicator variable 1X =x ,
which is equivalent to unbiasedness of E (Y |X =x ) [see Prop. (7.16)].

Table 7.1. Implications among RR-conditions for E (Y |X =x ) and E (Y |X )

Column conditions (from left to right): τx  1X =x , τx  X , τx ⊥⊥ 1X =x , τx ⊥⊥ X ,
∀x : τx  1X =x , ∀x : τx  X , ∀x : τx ⊥⊥ 1X =x , ∀x : τx ⊥⊥ X

Row condition          Entries
τx  X                  (a)-(d)
τx ⊥⊥ 1X =x            (a)-(d)
τx ⊥⊥ X                (a)-(d)   (a)-(d)   (a)-(c)
∀x : τx  1X =x         (a)-(f)
∀x : τx  X             (a)-(f)   (a)-(f)
∀x : τx ⊥⊥ 1X =x       (a)-(e)   (a)-(e)   (a)-(c),(e)   (a)-(f)
∀x : τx ⊥⊥ X           (a)-(e)   (a)-(e)   (a)-(c),(e)   (a)-(c),(e)   (a)-(f)   (a)-(f)   (a)-(c),(e)
τ⊥⊥ X                  (a)-(e)   (a)-(e)   (a)-(c),(e)   (a)-(c),(e)   (a)-(f)   (a)-(f)   (a)-(c),(e)   (a)-(c),(e)

Note: An entry such as (a)-(d) means that the condition in the row implies the condition in the
column, provided that the Assumptions 7.1 (a) to (d) hold. The symbols involving  or ⊥⊥ are
explained in Box 7.1. Trivial equivalences such as τx ⊥⊥X ⇔ τx ⊥⊥X are omitted. The first three
conditions imply unbiasedness of E (Y |X =x ), provided that the Assumptions 7.1 (a) to (d) hold
and, under the Assumptions 7.1 (a) to (f ), the last five imply unbiasedness of E (Y |X ) and all
E (Y |X =x ), x ∈ X (Ω).

Theorem 7.8 [Consequences of τx ⊥ ⊥X ]


Let the Assumptions 7.1 (a) to (c) hold. Then:

τx ⊥⊥X ⇒ τx ⊥⊥ 1X =x .                                                      (7.17)

If we additionally assume 7.1 (d), then

τx ⊥⊥X ⇒ τx  X                                                              (7.18)
        ⇒ τx  1X =x                                                         (7.19)
        ⇔ E (Y |X =x ) ⊢ D X .                                               (7.20)
(Proof p. 221)

Remark 7.9 [τ⊥⊥X implies τx ⊥⊥ 1X =x ] Remember, according to RS-Corollary 6.18,

P (X =x |τx ) =_P P (X =x ) ⇔ τx ⊥⊥ 1X =x .                                  (7.21)

As mentioned before, even if we assume τx ⊥⊥X for all x ∈ X (Ω), then this is less restrictive
than τ⊥⊥X . More precisely, if the Assumptions 7.1 (a) to (c) and (e) hold, then

τ⊥⊥X ⇒ (∀ x ∈ X (Ω): τx ⊥⊥X ),                                               (7.22)

which follows from σ(τx ) ⊂ σ(τ) and RS-Box 2.1 (iv). Note that the term on the right-hand
side of Proposition (7.22) does not imply τ⊥⊥X . ⊳

The following theorem summarizes some important consequences of τ⊥ ⊥ X . These


consequences include unbiasedness of E (Y |X ) and all its values E (Y |X =x ). Note that this
theorem is a special case of Theorem 7.21 for Z being a constant map.

Theorem 7.10 [Consequences of τ⊥ ⊥X ]


If the Assumptions 7.1 (a) to (c) and (e) hold, then

τ⊥⊥X ⇔ ∀ x ∈ X (Ω): τ⊥⊥ 1X =x                                               (7.23)
     ⇒ ∀ x ∈ X (Ω): τx ⊥⊥X .                                                 (7.24)

If we additionally assume 7.1 (f ), then

∀ x ∈ X (Ω): τx ⊥⊥X ⇒ ∀ x ∈ X (Ω): τx  X                                    (7.25)
                    ⇒ ∀ x ∈ X (Ω): τx  1X =x                                (7.26)
                    ⇔ ∀ x ∈ X (Ω): E (Y |X =x ) ⊢ D X                        (7.27)
                    ⇔ E (Y |X ) ⊢ D X .                                      (7.28)

Remark 7.11 [Methodological Consequences] In empirical studies, there is neither a di-


rect way to create one of the causality conditions treated in this section, nor to check if
these conditions hold, unless the values of the true outcome variables are known. Hence,
from a practical perspective, these conditions are useless. However, in chapter 8, we will
treat another causality condition, D X ⊥ ⊥X , that implies τ⊥ ⊥X and P -uniqueness of all true
outcome variables τx , x ∈ X (Ω). The condition D X ⊥ ⊥X can be created via randomized as-
signment of the observational unit (e. g., the person) to a treatment condition x. Hence,
there is an indirect way to create τ⊥ ⊥X , and with it, unbiasedness of the conditional ex-
pectation E (Y |X ) and all its values E (Y |X =x ), x ∈ X (Ω). ⊳
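
A small simulation may illustrate what randomized assignment achieves: if the assignment probability is the same for every unit, then P (X =1|U =u ) = P (X =1), which is the kind of independence that randomization creates. The setup of the Python sketch below (six persons, assignment probability 1/2) is made up for illustration.

```python
import random

random.seed(1)

# Single-unit trial with randomized assignment: the unit is sampled, and the
# treatment is assigned with a probability that does not depend on the unit.
persons = ["Tom", "Tim", "Joe", "Jim", "Ann", "Sue"]
n = 200_000
counts = {u: [0, 0] for u in persons}        # counts[u][x]

for _ in range(n):
    u = random.choice(persons)               # P(U=u) = 1/6
    x = 1 if random.random() < 0.5 else 0    # P(X=1|U=u) = 1/2 for every u
    counts[u][x] += 1

for u in persons:
    share_treated = counts[u][1] / sum(counts[u])
    print(u, round(share_treated, 3))        # ≈ 0.5 for every u
```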

7.2 RR-Conditions for E Z=z (Y |X =x ) and E Z=z (Y |X )

Remember, τ⊥⊥X means independence of τ and X with respect to the probability mea-
sure P on the measurable space (Ω, A ). A more explicit notation is τ ⊥⊥_P X . Hence, all
propositions of Theorem 7.10 hold with respect to P . For example, τx  X denotes mean-
independence of τx from X with respect to P , which may also be denoted by τx _P X . Such
an explicit notation is necessary if we want to use Theorem 7.10 for another probability
measure on (Ω, A ) such as P Z=z : A → [0, 1] [see Ass. 7.1 (h)].

Remark 7.12 [(Z =z)-Conditional Independence of X and Y ] Hence, if X , Y , Z are ran-


dom variables on (Ω, A, P ) and P (Z =z) > 0, then

X ⊥⊥Y |(Z =z) :⇔ X ⊥⊥_{P Z=z} Y                                              (7.29)
              ⇔ P Z=z (A ∩ B ) = P Z=z (A) · P Z=z (B ), ∀ A, B ∈ σ(X ) × σ(Y )

defines (Z =z)-conditional independence of X and Y . By definition, it is equivalent to inde-


pendence of X and Y with respect to the probability measure P Z=z on (Ω, A ). ⊳

Remark 7.13 [(Z =z)-Conditional Mean-Independence of Y From X ] Furthermore, if Y


is numerical or with finite expectation E Z=z (Y ) with respect to the measure P Z=z , then

Y  X | Z =z :⇔ Y _{P Z=z} X ⇔ E Z=z (Y |X ) =_{P Z=z} E Z=z (Y )           (7.30)

defines (Z =z)-conditional mean-independence of Y from X , which, by definition, is equiv-


alent to mean-independence of Y from X with respect to the probability measure P Z=z . ⊳

Using this notation, Theorem 7.10 implies the following corollary about the conse-
quences of (Z =z)-conditional independence of τ and X . Reading this corollary, note that
P -uniqueness of τx implies P Z=z -uniqueness of τx (see RS-Box 5.1) and that unbiasedness
of the terms E Z=z (Y |X ) and E Z=z(Y |X =x ), x ∈ X (Ω), is defined only if Z is a covariate of
X.

Corollary 7.14 [Consequences of (Z =z)-Conditional Independence of τ and X ]


If the Assumptions 7.1 (a) to (c), (e), (g), and (h) hold, then

τ⊥⊥X |(Z =z) ⇔ ∀ x ∈ X (Ω): τ⊥⊥ 1X =x |(Z =z)                               (7.31)
             ⇒ ∀ x ∈ X (Ω): τx ⊥⊥X |(Z =z).                                  (7.32)

If we additionally assume that all τx , x ∈ X (Ω), are P Z=z -unique, then

∀ x ∈ X (Ω): τx ⊥⊥X |(Z =z) ⇒ ∀ x ∈ X (Ω): τx  X |(Z =z)                    (7.33)
                            ⇒ ∀ x ∈ X (Ω): τx  1X =x |(Z =z).                (7.34)

If we additionally assume that Z is a covariate of X , then

∀ x ∈ X (Ω): τx  1X =x |(Z =z) ⇔ ∀ x ∈ X (Ω): E Z=z (Y |X =x ) ⊢ D X        (7.35)
                               ⇔ E Z=z (Y |X ) ⊢ D X .                       (7.36)

Hence, according to this corollary, (Z =z)-conditional independence of τ and X im-
plies, among other things, unbiasedness of the X -conditional expectation of Y with re-
spect to the probability measure P Z=z [see Prop. (7.36)] and unbiasedness of all its values
E Z=z(Y |X =x ) = E (Y |X =x , Z =z), x ∈ X (Ω) [see Prop. (7.35)], provided that Z is a covariate
of X .

7.3 RR-Conditions for E X =x (Y |Z ) and E (Y |X , Z )

Now we generalize the causality conditions treated in section 7.1, conditioning on a ran-
dom variable Z on (Ω, A, P ). Under the appropriate assumptions, which include that Z
is a covariate of X , these conditions imply unbiasedness of the conditional expectation
E (Y |X, Z ) and all Z -conditional expectations E X =x (Y |Z ), x ∈ X (Ω). These causality con-
ditions can indirectly be created by conditionally randomized assignment of the observa-
tional unit to a treatment condition, but also via covariate selection (see chs. 8 to 10).

Box 7.2 lists all causality conditions treated in this section, including their symbols and
definitions, and Table 7.2 summarizes the implications among them. The most restric-
tive of these causality conditions is Z -conditional independence of a (J + 1)-variate true
outcome variable τ and X , which is the translation (into true outcome theory) of strong
ignorability (Rosenbaum & Rubin, 1983b).

7.3.1 Conditional Mean-Independence Conditions

The first condition in Box 7.2 is Z -conditional mean-independence of τx from 1X =x , de-
noted τx  1X =x | Z . Under the Assumptions 7.1 (a) to (d) and (g), it is defined by

τx  1X =x | Z :⇔ E (τx | 1X =x , Z ) =_P E (τx |Z ).                         (7.37)

According to Theorem 6.20 (i), this condition is equivalent to unbiasedness of E X =x (Y |Z ),


provided that τx is P-unique and Z is a covariate of X .
The second condition in Box 7.2 is Z -conditional mean-independence of τx from X , de-
noted τx  X | Z . Under the Assumptions 7.1 (a) to (d) and (g), it is defined by

τx  X | Z :⇔ E (τx |X , Z ) =_P E (τx |Z ).                                  (7.38)

Remark 7.15 [Dichotomous X ] If the Assumptions 7.1 (a) to (d) and (g) hold, then X be-
ing dichotomous implies

E (τx |X , Z ) =_P E (τx | 1X =x , Z ) =_P E (τx | 1X ≠x , Z )               (7.39)

because σ(X , Z ) = σ(1X =x , Z ) = σ(1X ≠x , Z ) (see RS-Def. 4.4). Hence, under these assump-
tions, X being dichotomous implies

τx  X | Z ⇔ τx  1X =x | Z ⇔ τx  1X ≠x | Z .                                  (7.40)

If we additionally assume that Z is a covariate of X , then X being dichotomous also implies

τx  X | Z ⇔ E (Y |X, Z ) ⊢ D X . (7.41)

7.3.2 Conditional Independence Conditions

Conditions three and four in Box 7.2 are Z -conditional independence of τx and 1X =x , de-
noted τx ⊥⊥ 1X =x |Z , and Z -conditional independence of τx and X , denoted τx ⊥⊥ X |Z . Remem-
ber that the general concept of conditional independence of two random variables X and
Y given a random variable Z , denoted X ⊥⊥Y | Z , has been introduced in RS-Definition 6.2.
However, under the Assumptions 7.1 (a) to (c) and (g),

τx ⊥⊥ 1X =x |Z ⇔ P (X =x | Z , τx ) =_P P (X =x | Z ),                       (7.42)

and

τx ⊥⊥X |Z ⇔ ∀ x ′ ∈ X (Ω): P (X =x ′ |τx , Z ) =_P P (X =x ′ | Z ).          (7.43)

Box 7.2 Conditional Rosenbaum-Rubin conditions

RR-Conditions implying unbiasedness of E X =x (Y |Z )

τx  1X =x | Z      Z -conditional mean-independence of τx from 1X =x . Under the Assump-
                   tions 7.1 (a) to (d) and (g), it is defined by
                        E (τx | 1X =x , Z ) =_P E (τx |Z ).                    (i)

τx  X | Z          Z -conditional mean-independence of τx from X . Under the Assump-
                   tions 7.1 (a) to (d) and (g), it is defined by
                        E (τx |X , Z ) =_P E (τx |Z ).                         (ii)

τx ⊥⊥ 1X =x |Z     Z -conditional independence of τx and 1X =x . Under the Assumptions 7.1
                   (a) to (c) and (g), it is equivalent to
                        P(X =x |τx , Z ) =_P P(X =x | Z ).                     (iii)

τx ⊥⊥ X |Z         Z -conditional independence of τx and X . Under the Assumptions 7.1 (a)
                   to (c) and (g), it is equivalent to
                        ∀ x ′ ∈ X (Ω): P(X =x ′ |τx , Z ) =_P P(X =x ′ | Z ).  (iv)

If we additionally assume 7.1 (d) and Z is a covariate of X , then each of conditions (i) to (iv)
implies E X =x (Y |Z ) ⊢ D X .

RR-Conditions implying unbiasedness of E (Y |X, Z ) and all E X =x (Y |Z )

∀x : τx  1X =x | Z    Z -conditional mean-independence of τx from 1X =x for all x. Under the
                      Assumptions 7.1 (a) to (g), it is defined by
                           ∀ x ∈ X (Ω): E (τx | 1X =x , Z ) =_P E (τx |Z ).    (v)

∀x : τx  X |Z         Z -conditional mean-independence of all τx from X . Under the Assump-
                      tions 7.1 (a) to (g), it is defined by
                           ∀ x ∈ X (Ω): E (τx |X , Z ) =_P E (τx |Z ).         (vi)

∀x : τx ⊥⊥ 1X =x |Z   Z -conditional independence of τx and 1X =x for all x. Under the Assump-
                      tions 7.1 (a) to (c), (e) and (g), it is equivalent to
                           ∀ x ∈ X (Ω): P(X =x |τx , Z ) =_P P(X =x | Z ).     (vii)

∀x : τx ⊥⊥ X |Z       Z -conditional independence of τx and X for all x. Under the Assump-
                      tions 7.1 (a) to (c), (e) and (g), it is equivalent to
                           ∀ x, x ′ ∈ X (Ω): P(X =x ′ |τx , Z ) =_P P(X =x ′ | Z ).  (viii)

τ⊥⊥ X |Z              Z -conditional independence of τ and X (strong ignorability). Under the
                      Assumptions 7.1 (a) to (c), (e) and (g), it is equivalent to
                           ∀ x ∈ X (Ω): P(X =x |τ, Z ) =_P P(X =x | Z ).       (ix)

If we additionally assume 7.1 (f ) and Z is a covariate of X , then each of conditions (v) to (ix)
implies E (Y |X, Z ) ⊢ D X and E X =x (Y |Z ) ⊢ D X , for all x ∈ X (Ω).

Also note that τx ⊥⊥X |Z implies τx ⊥ ⊥ 1X =x |Z but not vice versa, unless X is dichotomous.
Conditions (v) to (viii) of Box 7.2 postulate that conditions (i) to (iv) hold for all values
x of X . Under the appropriate assumptions including that Z is a covariate of X , these
conditions imply unbiasedness of E (Y |X, Z ) and E X =x (Y |Z ), for all values x ∈ X (Ω). Note
that condition (viii) has been proposed by Porta (2014, p. 142).
Finally, the last condition in Box 7.2 is Z -conditional independence of τ and X , denoted
τ⊥⊥X |Z , where τ = (τ0 , τ1 , . . . , τ J ) is a J + 1-dimensional random variable consisting of the
true outcome variables τx , x ∈ X (Ω). Under the Assumptions 7.1 (a) to (c), (e) and (g), it is
equivalent to

τ⊥⊥X |Z ⇔ ∀ x ∈ X (Ω): P (X =x | Z , τ) =_P P (X =x | Z )                    (7.44)

(see RS-Theorem 6.5). Hence, assuming Z-conditional independence of τ and X is equiva-


lent to assuming Z-conditional independence of the events {X =x } from τ, for all x ∈ X (Ω).

Remark 7.16 [Strong Ignorability] Note that τ⊥⊥X |Z is the translation of Rosenbaum
and Rubin’s strong ignorability into true outcome theory. This condition has been pre-
sented by Rosenbaum and Rubin (1983b) and plays a crucial role in Rubin’s potential out-
come approach to causal effects. Also note that the additional assumption P (X =x | Z ) >_P 0,
for all x ∈ X (Ω), of Rosenbaum and Rubin (1983b) follows from P -uniqueness of τx , for all
x ∈ X (Ω), which is Assumption 7.1 (f) (see Exercise 7-6). As mentioned before, P -unique-
ness of a true outcome variable τx is equivalent to

P (X =x |D X ) >_P 0                                                          (7.45)

[see RS-Th. 5.27 (ii)]. If we assume that Z is a covariate of X , then P (X =x | D X ) >_P 0 im-
plies P (X =x | Z ) >_P 0. However, P (X =x | Z ) >_P 0 does not imply P (X =x |D X ) >_P 0, also not in
conjunction with τx ⊥⊥X |Z or τ⊥⊥X |Z , unless P (X =x | Z ) =_P P (X =x |D X ). ⊳

7.3.3 Implications Among RR-Conditions for E X =x (Y |Z ) and E (Y |X , Z )

Table 7.2 displays the implications among the causality conditions listed in Box 7.2. In this
section, we present some theorems in which most of these implications are proved. The
solution to Exercise 7-4 provides a guide to the proofs of all these implications.
In the following theorem, we present some propositions about the consequences of
τx  X | Z concerning unbiasedness.

Theorem 7.17 [Consequences of Conditional Mean-Independence of τx From X ]


If the Assumptions 7.1 (a) to (d) and (g) hold, then

τx  X | Z ⇒ τx  1X =x | Z . (7.46)

If we additionally assume that Z is a covariate of X , then

τx  1X =x | Z ⇔ E X =x (Y |Z ) ⊢ D X . (7.47)
(Proof p. 221)

Hence, under the assumptions of Theorem 7.17, if τx is Z -conditionally mean-indepen-


dent from X , then it is also Z -conditionally mean-independent from an indicator 1X =x for
one of the values x of the putative cause variable X , which itself is equivalent to unbiased-
ness of E X =x (Y |Z ), provided that we also assume that Z is a covariate of X .
In the following corollary we consider the consequences of assuming that, for all
x ∈ X (Ω), the true outcome variable τx is Z -conditionally mean-independent from X .

Corollary 7.18 [Consequences of Conditional Mean-Independence of τx From 1X =x ]


If the Assumptions 7.1 (a) to (d) and (g) hold, then

∀ x ∈ X (Ω): τx  X | Z ⇒ ∀ x ∈ X (Ω): τx  1X =x | Z . (7.48)

If we additionally assume that Z is a covariate of X , then

∀ x ∈ X (Ω): τx  1X =x | Z ⇔ E (Y |X, Z ) ⊢ D X . (7.49)


(Proof p. 221)

Hence, if we assume Z -conditional mean-independence of τx from X for all values x of


X , and Z is a covariate of X , then E (Y |X, Z ) is unbiased, provided that all true outcome
variables τx , x ∈ X (Ω), are P-unique.
Now we consider conditional (stochastic) independence of a true outcome variable τx
from an indicator variable 1X =x for a value x of the putative cause variable X . According
to the following theorem, if the Assumptions 7.1 (a) to (d) and (g) hold, then τx ⊥ ⊥ 1X =x |Z
implies Z -conditional mean-independence of τx from 1X =x , and if Z is a covariate of X ,
then it also implies unbiasedness of E X =x (Y |Z ).

Theorem 7.19 [Consequences of Conditional Independence of τx From 1X =x ]


If the Assumptions 7.1 (a) to (d) and (g) hold, then

τx ⊥⊥ 1X =x |Z ⇒ τx  1X =x | Z .                                             (7.50)

If we additionally assume that Z is a covariate of X , then

τx  1X =x | Z ⇔ E X =x (Y |Z ) ⊢ D X . (7.51)
(Proof p. 222)

Again, note that Proposition (7.50) holds for any value x of X for which τx is P-unique.
Hence, if τx is P-unique for all x ∈ X (Ω), then Proposition (7.50) holds for all x ∈ X (Ω). Cor-
respondingly, if τx is P-unique for all x ∈ X (Ω) and Z is a covariate of X , then Proposition
(7.51) holds for all x ∈ X (Ω).
In Theorem 7.19 we considered Z -conditional independence of τx and an indicator
variable 1X =x . In the following theorem we turn to Z -conditional independence of τx and
X , which may take on more than just two different values.
Table 7.2. Implications among RR-conditions for E X =x (Y |Z ) and E (Y |X, Z )

Column conditions (from left to right): τx  1X =x | Z , τx  X | Z , τx ⊥⊥ 1X =x |Z , τx ⊥⊥ X |Z ,
∀x : τx  1X =x | Z , ∀x : τx  X |Z , ∀x : τx ⊥⊥ 1X =x |Z , ∀x : τx ⊥⊥ X |Z

Row condition             Entries
τx  X | Z                 (a)-(d), (g)
τx ⊥⊥ 1X =x |Z            (a)-(d), (g)
τx ⊥⊥ X |Z                (a)-(d), (g)   (a)-(d), (g)   (a)-(c), (g)
∀x : τx  1X =x | Z        (a)-(g)
∀x : τx  X |Z             (a)-(g)   (a)-(g)
∀x : τx ⊥⊥ 1X =x |Z       (a)-(e), (g)   (a)-(c), (e), (g)   (a)-(g)
∀x : τx ⊥⊥ X |Z           (a)-(e), (g)   (a)-(e), (g)   (a)-(c), (e), (g)   (a)-(c), (e), (g)   (a)-(g)   (a)-(g)   (a)-(c), (e), (g)
τ⊥⊥ X |Z                  (a)-(e), (g)   (a)-(e), (g)   (a)-(c), (e), (g)   (a)-(c), (e), (g)   (a)-(g)   (a)-(g)   (a)-(c), (e), (g)   (a)-(c), (e), (g)

Note: An entry such as (a)-(g) means that the condition in the row implies the condition in the column, provided that we assume 7.1 (a) to (g).
The symbols involving  or ⊥⊥ are explained in Box 7.2. Trivial equivalences such as τx ⊥⊥X |Z ⇔ τx ⊥⊥X |Z are omitted. If we additionally assume
that Z is a covariate of X , then the first three conditions listed in the first column of the table imply unbiasedness of E X =x (Y |Z ), provided that the
Assumptions 7.1 (a) to (d) and (g) hold. The last five conditions imply unbiasedness of E (Y |X, Z ) and all E X =x (Y |Z ), x ∈ X (Ω), if Z is a covariate of X
and we assume 7.1 (a) to (g).

Theorem 7.20 [Consequences of τx ⊥ ⊥X |Z ]


If the Assumptions 7.1 (a) to (d) and (g) hold, then

τx ⊥⊥X |Z ⇒ τx  X | Z .                                                      (7.52)

Furthermore,

τx ⊥⊥X |Z ⇒ τx ⊥⊥ 1X =x |Z                                                   (7.53)
          ⇒ τx  1X =x | Z .                                                  (7.54)
(Proof p. 222)

The following theorem summarizes some important consequences of τ⊥ ⊥X |Z . If Z is


a covariate of X , then these consequences include unbiasedness of the conditional ex-
pectation E (Y |X, Z ), which is equivalent to unbiasedness of all conditional expectations
E X =x (Y |Z ), x ∈ X (Ω).

Theorem 7.21 [Consequences of Conditional Independence of τ and X and More]


If the Assumptions 7.1 (a) to (c), (e), and (g) hold, then

τ⊥⊥X |Z ⇔ ∀ x ∈ X (Ω): τ⊥⊥ 1X =x |Z                                          (7.55)
        ⇒ ∀ x, x ′ ∈ X (Ω): τx ⊥⊥ 1X =x ′ |Z                                 (7.56)
        ⇔ ∀ x ∈ X (Ω): τx ⊥⊥X |Z                                             (7.57)
        ⇒ ∀ x ∈ X (Ω): τx ⊥⊥ 1X =x |Z .                                      (7.58)

Furthermore, if the Assumptions 7.1 (a) to (g) hold, then

∀ x ∈ X (Ω): τx ⊥⊥X |Z ⇒ ∀ x ∈ X (Ω): τx  X | Z                              (7.59)
                       ⇒ ∀ x ∈ X (Ω): τx  1X =x | Z                          (7.60)
and

∀ x ∈ X (Ω): τx ⊥⊥ 1X =x |Z ⇒ ∀ x ∈ X (Ω): τx  1X =x | Z .                   (7.61)

If we additionally assume that Z is a covariate of X , then

∀ x ∈ X (Ω): τx  1X =x | Z ⇔ ∀ x ∈ X (Ω): E X =x (Y |Z ) ⊢ D X               (7.62)
                           ⇔ E (Y |X, Z ) ⊢ D X .                            (7.63)
(Proof p. 222)

Remark 7.22 [Methodological Consequences] What has been said in Remark 7.11 about
the causality conditions for E (Y |X ) and its values E (Y |X =x ) also applies to the causal-
ity conditions for E (Y |X, Z ) and for the conditional expectations E X =x (Y |Z ). There is no
direct way to create the conditions such as τ⊥ ⊥X |Z or ∀x : τx  X |Z . There is also no di-
rect way to select the (possibly multivariate) random variable Z such that these condi-
tions hold. However, in chapters 8 to 10 we will treat other causality conditions that imply
τ⊥⊥X |Z and ∀x : τx  X |Z . These conditions can be created via conditionally randomized
assignment of the observational unit (person) to a treatment condition, but also via an ap-
propriate selection of Z . Hence, there are indirect ways to create τ⊥⊥ X |Z and ∀x : τx  X |Z ,
and with them unbiasedness of the conditional expectations E (Y |X, Z ) and the condi-
tional expectations E X =x (Y |Z ), x ∈ X (Ω). Remember, it is these conditional expectations
that can be estimated in data samples. ⊳

7.4 Examples

In this section, we study two examples. In the first one, there is Z -conditional mean-inde-
pendence of τx from 1X =x for all values x of X . In the second one, Z -conditional indepen-
dence of τ from X holds.

7.4.1 Z -Conditional Mean-Independence of τx From 1 X =x for All x

Table 7.3 displays the parameters of a random experiment in which Z -conditional mean-
independence of τx from 1X =x holds for each value x of X . However, neither Z -conditional
mean-independence of all τx from X holds nor Z -conditional independence of τ from X .
We assume that this random experiment has the same structure as the random experi-
ments treated in section 6.5. That is,
((Ω, A, P ), (Ft )t ∈T , C, DC , X , Y ),

as specified in section 6.5, is the regular probabilistic causality setup. The only difference
is that Ω2 = ΩX = {treatment 0, treatment 1, treatment 2} now consists of three (instead of
two) treatment conditions. Again, U is a global potential confounder of X .
The upper left part of Table 7.3 displays the true outcomes under treatments 0, 1, and
2, the probabilities P (U =u ) for each person u to be sampled, as well as the conditional
probabilities P (X =1|U =u ) and P (X =2 |U =u ) to be assigned to treatment 1 and treat-
ment 2, respectively. All other parameters, such as the associated individual causal total
effects CTE U ;10 (u) and CTE U ; 20 (u) or the conditional probabilities P (U =u |X =x ), can be
computed from those ‘fundamental parameters’. The table also displays the values of the
covariate (potential confounder) Z =sex. To emphasize, the table does not contain sample
data; it displays the parameters describing the laws of a random experiment, the single-
unit trial, which consists of (a) sampling a person from the set of (the six) persons, (b)
assigning or registering the (self-) assignment of the person to one of the three treatment
conditions, and (c) observing the value of the outcome variable. (For more details on such
single-unit trials see ch. 2).

Z -Conditional Mean-Independence of τx From 1X =x

In this random experiment, the true treatment probabilities and true outcomes are such
that τx  1X =x |Z holds for each of the three values x of X . By definition, τx  1X =x |Z is equiv-
alent to
E (τx | 1X =x , Z ) =_P E (τx |Z ),                                           (7.64)

[see Eq. (7.37)]. In this example, U =D X , τx = E X =x (Y |U ), and τx is P-unique. Because


Z (Ω) = {m, f } and P (X =x , Z =z) > 0 for all values of X and Z , according to RS-Corollary
Table 7.3. Z -conditional mean-independence of τx from 1X =x for all values x of X

Fundamental parameters

Person u  Sex z  P(U=u)  P(X=1|U=u)  P(X=2|U=u)  E X=0(Y|U=u)  E X=1(Y|U=u)  E X=2(Y|U=u)  CTE U;10(u)  CTE U;20(u)  P(U=u|X=0)  P(U=u|X=1)  P(U=u|X=2)
Tom       m      1/6     3/5         1/3         75            87            97            12           22           4/63        12/59       1/6
Tim       m      1/6     1/2         1/3         70            80            92            10           22           10/63       10/59       1/6
Joe       m      1/6     3/5         1/3         65            73            107           8            42           4/63        12/59       1/6
Ann       f      1/6     1/2         1/3         106           114           90            8            −16          10/63       10/59       1/6
Sue       f      1/6     1/4         1/3         116           130           120           14           4            25/63       5/59        1/6
Eva       f      1/6     1/2         1/3         126           146           126           20           0            10/63       10/59       1/6

                           x =0       x =1       x =2                        x =1      x =2
E (τx ):                   93         105        105.333     ATE x0 :        12        12.333
E (Y |X =x ):              102.857    101.186    105.333     PFE x0 :        −1.671    2.476
E (τx |Z =m):              70         80         98.667      CTE Z ; x0 (m): 10        28.667
E (Y |X =x , Z =m):        70         80         98.667      PFE Z ; x0 (m): 10        28.667
E (τx |Z = f ):            116        130        112         CTE Z ; x0 (f): 14        −4
E (Y |X =x , Z = f ):      116        130        112         PFE Z ; x0 (f): 14        −4

5.51, RS-Equations (5.49) and (5.50), Equation (7.64) holds if and only if

E (τx |X =x , Z =z) = E (τx |Z =z), ∀ z ∈ {m, f }. (7.65)

The values of the true outcome variable τx = E X =x (Y |U ) = g x (U ) [see RS-Eq. (5.23)] are
the conditional expectation values E X =x(Y |U =u ) = E (Y |X =x ,U =u ). Now,
E (τx |X =x , Z =z) = E (g x (U ) | X =x , Z =z)
                    = ∑u g x (u) · P (U =u |X =x , Z =z)                      (7.66)
                    = ∑u E X =x (Y |U =u ) · P (U =u |X =x , Z =z), ∀ z ∈ {m, f }.

In contrast,

E (τx |Z =z) = E (g x (U ) | Z =z)
             = ∑u g x (u) · P (U =u |Z =z)                                    (7.67)
             = ∑u E X =x (Y |U =u ) · P (U =u |Z =z), ∀ z ∈ {m, f }.

According to Equation (7.66), the conditional expectation value E (τ1 |X =1, Z =z) can be
computed from Tables 7.3 and 7.4 by
Table 7.4. Conditional probabilities P(U =u |X =x , Z =z) supplementing Table 7.3

Person u  P(U=u|X=0,Z=m)  P(U=u|X=1,Z=m)  P(U=u|X=2,Z=m)  P(U=u|X=0,Z=f)  P(U=u|X=1,Z=f)  P(U=u|X=2,Z=f)
Tom       2/9             6/17            1/3             0               0               0
Tim       5/9             5/17            1/3             0               0               0
Joe       2/9             6/17            1/3             0               0               0
Ann       0               0               0               2/9             2/5             1/3
Sue       0               0               0               5/9             1/5             1/3
Eva       0               0               0               2/9             2/5             1/3

E (τ1 |X =1, Z =m) = ∑u E X =1 (Y |U =u ) · P (U =u |X =1, Z =m)
                   = 87 · 6/17 + 80 · 5/17 + 73 · 6/17 = 80.

This is exactly the same result as obtained for

E (τ1 |Z =m) = ∑u E X =1 (Y |U =u ) · P (U =u |Z =m)
             = (87 + 80 + 73) · 1/3 = 80.
Hence,

E (τ1 |X =1, Z =m) = E (τ1 |Z =m) = 80,

and the corresponding equations hold for the second value of Z , namely

E (τ1 |X =1, Z = f ) = E (τ1 |Z = f ) = 130.

Therefore, according to RS-Corollary 5.51, we proved τ1  1X =1 | Z . In the same way it can


be shown that τ0  1X=0 | Z and τ2  1X =2 | Z hold as well.
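
The check just carried out for τ1 can be reproduced with exact arithmetic in Python; the true outcomes and conditional probabilities are those of Tables 7.3 and 7.4.

```python
from fractions import Fraction as F

# tau_1(u) = E(Y|X=1, U=u) for the males and the conditional probabilities
# from Tables 7.3 and 7.4 for Z = m.
tau1      = {"Tom": 87,       "Tim": 80,       "Joe": 73}
p_u_x1_zm = {"Tom": F(6, 17), "Tim": F(5, 17), "Joe": F(6, 17)}  # P(U=u|X=1, Z=m)
p_u_zm    = {"Tom": F(1, 3),  "Tim": F(1, 3),  "Joe": F(1, 3)}   # P(U=u|Z=m)

e_tau1_x1_zm = sum(tau1[u] * p_u_x1_zm[u] for u in tau1)   # E(tau_1|X=1, Z=m)
e_tau1_zm    = sum(tau1[u] * p_u_zm[u]    for u in tau1)   # E(tau_1|Z=m)

print(e_tau1_x1_zm, e_tau1_zm)   # 80 80
```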

Z -Conditional Mean-Independence of All τx From X

Now we show that τx  X |Z does not hold in the example of Table 7.3 for at least one value
x of X . Of course, this implies that ∀x : τx  X |Z does not hold either. We start gathering
the relevant equations. By definition, ∀x : τx  X |Z is equivalent to

E (τx |X , Z ) =_P E (τx |Z ), ∀ x ∈ X (Ω)                                    (7.68)

[see Eq. (7.38)]. As mentioned before, in this example U is a global potential confounder
of X , the image of Ω under X is X (Ω) = {0, 1, 2}, and τx = E X =x (Y |U ) is P-unique for all
x ∈ X (Ω). Hence, because P (X =x , Z =z) > 0 for all pairs (x, z) of values of (X , Z ), Equation
(7.68) is equivalent to

E (τx |X =x ′, Z =z) = E (τx |Z =z), ∀ (x, x ′, z) ∈ {0, 1, 2}² × {m, f }      (7.69)
[see RS-Th. 4.42 (ii)].
The values of the true outcome variable τx = E X =x (Y |U ) = g x (U ) are the conditional
expectation values E X =x(Y |U =u ) = E (Y |X =x ,U =u ). Hence, applying RS-Equation (3.28)
to the left-hand side of Equation (7.69) yields
E (τx |X =x ′, Z =z) = E (g x (U ) | X =x ′, Z =z)
                     = ∑u g x (u) · P (U =u |X =x ′, Z =z)                    (7.70)
                     = ∑u E X =x (Y |U =u ) · P (U =u |X =x ′, Z =z), ∀ (x, x ′, z) ∈ {0, 1, 2}² × {m, f }.

In contrast, applying RS-Equation (3.28) to the right-hand side of Equation (7.69) yields

E (τx |Z =z) = E (g x (U ) | Z =z)
             = ∑u g x (u) · P (U =u |Z =z)                                    (7.71)
             = ∑u E X =x (Y |U =u ) · P (U =u |Z =z), ∀ (x, z) ∈ {0, 1, 2} × {m, f }.

In order to show that ∀x : τx  X |Z does not hold in this example, it suffices to show that
there is a triple (x, x ′, z) ∈ {0, 1, 2}² × {m, f } for which E (τx |X =x ′, Z =z) ≠ E (τx |Z =z) [see
Eq. (7.69)].
According to Equation (7.70), the conditional expectation value E (τ2 |X =1, Z = f ) can
be computed from Tables 7.3 and 7.4 by

E (τ2 |X =1, Z = f ) = ∑u E X =2 (Y |U =u ) · P (U =u |X =1, Z = f )
                     = 90 · 2/5 + 120 · 1/5 + 126 · 2/5 = 110.4.

In contrast,

E (τ2 |Z = f ) = ∑u E X =2 (Y |U =u ) · P (U =u |Z = f )
              = 90 · 1/3 + 120 · 1/3 + 126 · 1/3 = 112.

Hence, E (τ2 |X =1, Z = f ) ≠ E (τ2 |Z = f ), and this proves that ∀x : τx  X |Z does not hold in
this example.
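
Again, the counterexample can be verified with exact arithmetic; the values below are those of Tables 7.3 and 7.4 for Z = f .

```python
from fractions import Fraction as F

# tau_2(u) = E(Y|X=2, U=u) for the females and the conditional probabilities
# from Tables 7.3 and 7.4 for Z = f.
tau2      = {"Ann": 90,       "Sue": 120,      "Eva": 126}
p_u_x1_zf = {"Ann": F(2, 5),  "Sue": F(1, 5),  "Eva": F(2, 5)}   # P(U=u|X=1, Z=f)
p_u_zf    = {"Ann": F(1, 3),  "Sue": F(1, 3),  "Eva": F(1, 3)}   # P(U=u|Z=f)

e_tau2_x1_zf = sum(tau2[u] * p_u_x1_zf[u] for u in tau2)   # E(tau_2|X=1, Z=f)
e_tau2_zf    = sum(tau2[u] * p_u_zf[u]    for u in tau2)   # E(tau_2|Z=f)

print(float(e_tau2_x1_zf), float(e_tau2_zf))   # 110.4 vs 112.0
```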
To summarize: Whereas, in this example, τx  1X =x |Z holds for all values x of X — and
with it, unbiasedness of the conditional expectations E (Y |X, Z ), E X =x (Y |Z ), and the con-
ditional expectation values E (Y |X =x , Z =z) — the more restrictive condition ∀x : τx  X |Z
does not hold. Hence, τ⊥ ⊥ X |Z (strong ignorability), which implies ∀x : τx  X |Z , does not
hold as well in this example (see Exercise 7-5).

7.4.2 Z -Conditional Independence of τ and X

Table 7.5 displays the parameters of a random experiment in which there is Z -conditional
independence of τ and X . Again we assume that this random experiment has the same
structure as the random experiments treated in section 6.5. That is,

Table 7.5. Z-conditional independence of X and τ

Person u  Sex z1  College z2  P(U=u)  P(X=1|U=u)  P(X=1|Z=(z1,z2))  E^{X=0}(Y|U=u)  E^{X=1}(Y|U=u)  CTE_{U;10}(u)  P(U=u|X=0)  P(U=u|X=1)
Tom       m       no          1/6     7/8         6/8               72              83              11             1/22        7/26
Tim       m       no          1/6     5/8         6/8               72              83              11             3/22        5/26
Joe       m       yes         1/6     5/8         5/8               95              100             5              3/22        5/26
Jim       m       yes         1/6     5/8         5/8               100             105             5              3/22        5/26
Ann       f       yes         1/6     2/8         2/8               106             114             8              6/22        2/26
Sue       f       yes         1/6     2/8         2/8               116             130             14             6/22        2/26

                                x=0       x=1
E(τx):                          93.5      102.5     ATE_10 = 9
E(Y |X=x):                      100.227   96.5      PFE_10 = −3.727
E(τx | Z=(m, no)):              72        83        CTE_{Z;10}(m, no) = 11
E(Y |X=x, Z=(m, no)):           72        83        PFE_{Z;10}(m, no) = 11
E(τx | Z=(m, yes)):             97.5      102.5     CTE_{Z;10}(m, yes) = 5
E(Y |X=x, Z=(m, yes)):          97.5      102.5     PFE_{Z;10}(m, yes) = 5
E(τx | Z=(f, yes)):             111       122       CTE_{Z;10}(f, yes) = 11
E(Y |X=x, Z=(f, yes)):          111       122       PFE_{Z;10}(f, yes) = 11

Note: In this table, Z = (Z1, Z2) is a two-dimensional random variable.

((Ω, A, P), (F_t)_{t∈T}, C, D_C, X, Y),

as specified in section 6.5, is the regular probabilistic causality setup. This also includes
the set Ω2 = ΩX = {control, treatment }, represented by the values 0 and 1 of the treatment
variable X . And again, U is a global potential confounder of X .
The upper left part of Table 7.5 displays the true outcomes under treatment and under
control as well as the probabilities for each person to be assigned to treatment condition
1. In the random experiment presented in this table, the true treatment probabilities and
true outcomes are such that Z -conditional independence of τ and X (i. e., Z -conditional
strong ignorability) holds.
We check if τ⊥
⊥X |Z actually holds via checking

P(X=1 | Z, τ) =_P P(X=1 | Z)   (7.72)

[see Prop. (7.44)]. According to RS-Theorem 6.6, Equation (7.72) is equivalent to

P(X=0 | Z, τ) =_P P(X=0 | Z)   (7.73)

because X is binary. According to the same theorem, this equation is also equivalent to
τ⊥⊥1X =1 | Z and because, in this example, 1X =1 = X , it is equivalent to τ⊥
⊥X |Z as well. Fur-
thermore, in this example, P (Z =z, τ=t ) > 0 for all pairs (z, t ) ∈ Z (Ω) × τ(Ω). Therefore,
Equation (7.72) is also equivalent to

P (X =1| Z =z, τ=t ) = P (X =1| Z =z), ∀(z, t ) ∈ Z (Ω) × τ(Ω) (7.74)

[see RS-Th. 4.42 and RS-Eq. (3.26)]. Note that, in this example, Z = (Z 1 , Z 2) and τ = (τ0 , τ1 )
are two-dimensional random variables.
In the sequel, we use
P(X=x |V=v) = Σ_u P(X=x |V=v, U=u) · P(U=u |V=v),   (7.75)

which is always true if P (V =v,U =u ) > 0 for all values of U [see RS-Box 3.2 (ii) for
Y = 1X =x ]. Using Equation (7.75) for the parameters displayed in Table 7.5 with Z = (Z 1 , Z 2)
taking the role of V and considering x =1 and z=(m, no) yields
P(X=1 | Z=(m, no))
 = Σ_u P(X=1 | Z=(m, no), U=u) · P(U=u | Z=(m, no))
 = 7/8 · 1/2 + 5/8 · 1/2 = 6/8.

In this case, we only have to sum over the first two persons displayed in Table 7.5 because, for the other four persons, the probabilities P(U=u | Z=(m, no)) are zero. Applying Equation (7.75) to V = (Z, τ) and the combination of values z = (m, no) and t = (72, 83) yields exactly the same probability

P(X=1 | Z=(m, no), τ=(72, 83))
 = Σ_u P(X=1 | Z=(m, no), τ=(72, 83), U=u) · P(U=u | Z=(m, no), τ=(72, 83))
 = 7/8 · 1/2 + 5/8 · 1/2 = 6/8

(see the first two rows of Table 7.5). Hence, we have shown

P(X=1 | Z=(m, no)) = P(X=1 | Z=(m, no), τ=(72, 83)) = 6/8.

Again using Equation (7.75) with Z taking the role of V and considering the case x =1
and z =(m, yes) yields
P(X=1 | Z=(m, yes))
 = Σ_u P(X=1 | Z=(m, yes), U=u) · P(U=u | Z=(m, yes))
 = 5/8 · 1/2 + 5/8 · 1/2 = 5/8.

In this case, we only have to sum over persons three and four displayed in Table 7.5; for the other four persons, the conditional probabilities P(U=u | Z=(m, yes)) are zero.
Applying Equation (7.75) to V = (Z, τ) and the combination of values Z = (m, yes) and τ = (95, 100) yields exactly the same conditional probability
P(X=1 | Z=(m, yes), τ=(95, 100))
 = Σ_u P(X=1 | Z=(m, yes), τ=(95, 100), U=u) · P(U=u | Z=(m, yes), τ=(95, 100))
 = 5/8 · 1 = 5/8

(see the third row of Table 7.5). The same result is obtained if we apply Equation (7.75) to V = (Z, τ) and the combination of values z = (m, yes) and t = (100, 105) (see the fourth row of Table 7.5). Hence we have shown

P(X=1 | Z=(m, yes)) = P(X=1 | Z=(m, yes), τ=(95, 100)) = P(X=1 | Z=(m, yes), τ=(100, 105)) = 5/8.

Finally, the analog procedure yields

P(X=1 | Z=(f, yes)) = P(X=1 | Z=(f, yes), τ=(106, 114)) = P(X=1 | Z=(f, yes), τ=(116, 130)) = 2/8.

This proves that Proposition (7.74), and with it, Equation (7.72) — which is equivalent to
τ⊥
⊥X |Z — holds in this example.
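As a cross-check, the joint distribution of (Z, τ, X) can be rebuilt from the fundamental parameters of Table 7.5 and Proposition (7.74) can then be verified by complete enumeration. The following Python sketch does this for the present example only; it is a minimal illustration, not a general tool.

```python
from fractions import Fraction as F
from collections import defaultdict

# Fundamental parameters of Table 7.5: sex, college, P(X=1 | U=u), tau_0, tau_1;
# every person has probability P(U=u) = 1/6.
persons = {
    "Tom": ("m", "no",  F(7, 8),  72,  83),
    "Tim": ("m", "no",  F(5, 8),  72,  83),
    "Joe": ("m", "yes", F(5, 8),  95, 100),
    "Jim": ("m", "yes", F(5, 8), 100, 105),
    "Ann": ("f", "yes", F(2, 8), 106, 114),
    "Sue": ("f", "yes", F(2, 8), 116, 130),
}

joint = defaultdict(F)   # P(Z=z, tau=t, X=x), obtained by summing over persons
for sex, college, p1, t0, t1 in persons.values():
    z, t = (sex, college), (t0, t1)
    joint[z, t, 1] += F(1, 6) * p1
    joint[z, t, 0] += F(1, 6) * (1 - p1)

marg = defaultdict(F)    # P(Z=z, X=x)
for (z, t, x), p in joint.items():
    marg[z, x] += p

# Z-conditional strong ignorability: P(X=1 | Z=z, tau=t) = P(X=1 | Z=z)
for (z, t, x) in sorted(joint):
    if x == 1:
        lhs = joint[z, t, 1] / (joint[z, t, 0] + joint[z, t, 1])
        rhs = marg[z, 1] / (marg[z, 0] + marg[z, 1])
        print(z, t, lhs, rhs, lhs == rhs)   # equal in every row
```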

7.5 Summary and Conclusions

In this chapter, we treated some causality conditions, all of which involve the true out-
come variables τx . The simple (i. e., the unconditional) ones are listed in Box 7.1, the con-
ditional ones in Box 7.2. The implication relations among the simple causality conditions
are listed in Table 7.1, whereas the implications among the conditional ones are found in
Table 7.2. According to the last row of Table 7.2, Z -conditional independence of τ and X ,
the translation of Rosenbaum and Rubin’s Z -conditional strong ignorability into true out-
come theory, is the strongest, that is, the most restrictive condition among those causality
conditions in which we condition on a covariate Z of X ; it implies all other causality con-
ditions listed in that table. The same applies to independence of τ and X , which implies
all other causality conditions listed in Table 7.1.
Note that there are no implications between the simple causality conditions summa-
rized in Table 7.1 and the conditional ones listed in Table 7.2. Even the strongest (most
restrictive) condition τ⊥ ⊥X does not imply any of the causality conditions listed in Table
7.2 or vice versa. This implies, for example, that ∀x : τx  1X =x , which is equivalent to unbi-
asedness of E (Y |X ), does not imply ∀x : τx  1X =x | Z , which is equivalent to unbiasedness
of E (Y |X, Z ) (see also the example described in section 6.6). Furthermore, τ⊥ ⊥X |Z does
not imply τ⊥ ⊥X or any of the simple causality conditions listed in Box 7.1.

Limitations

The causality conditions treated in the present chapter have three important limitations:
They are not generalizable, they can only indirectly be created via experimental design
techniques, and they are not testable in empirical applications.
The term not generalizable refers to the fact mentioned above that, for example, τ⊥ ⊥X
does not imply τ⊥⊥X |Z , even not if Z is a covariate of X . This has serious disadvantages

for the interpretation of the conditional expectation E (Y |X, Z ). Although τ⊥ ⊥X implies


that E (Y |X ) is unbiased, we cannot conclude that E (Y |X, Z ) is unbiased as well. Suppose
that the putative cause variable X is binary and Z represents the covariate sex with values
m and f . Then, even if we know the causal total effects of treatment condition x compared
to x ′ on the outcome Y, then we do not know anything about the conditional causal total
effects for males and for females. This also means that the difference between the con-
ditional expectation values E (Y | X =1, Z =m) and E (Y | X =0, Z =m) may not have a causal
meaning, and the same applies to the corresponding difference for females.
The second limitation is that these conditions can only indirectly be created via exper-
imental design techniques such as randomized assignment or conditionally randomized
assignment of the observational unit (person) to a treatment condition. In chapters 8 to
10 we will treat other causality conditions that imply all conditions treated in the present
chapter and that are created via randomized assignment of the observational unit to a
treatment condition. Randomized assignment of a unit to a treatment is a way to cre-
ate independence of the global potential confounder D X and X , which implies τ⊥ ⊥X and
τ⊥ ⊥X |Z , and with them all causality conditions treated in the present chapter, including
unbiasedness of E (Y |X ) and E (Y |X, Z ), provided that Z is a covariate of X (see Rem. 4.16).
In this case, E (Y |X =1) − E (Y |X =0) and E (Y | X =1, Z =z) − E (Y | X =0, Z =z) are unbiased
and have a causal interpretation if P (X =x , Z =z) > 0 for the values (x, z) of X and Z . In
contrast, Z -conditionally randomized assignment of a unit to a treatment is a way to cre-
ate Z -conditional independence of D X and X , which implies τ⊥ ⊥X |Z , and with it unbi-
asedness of E (Y |X, Z ), which is sufficient for computing conditional and average causal
total effects (for more details see Box 6.3).
The third disadvantage mentioned above is testability. Just like unbiasedness, all causal-
ity conditions treated in this chapter, including Z -conditional independence of X and τ
(i. e., strong ignorability), cannot be tested empirically without unrealistic additional as-
sumptions because, unlike in the numerical examples treated in this chapter, in empiri-
cal applications, the values of the true outcome variables τx are unknown and cannot be
estimated for more than one single value x of X (see the fundamental problem of causal
inference described in the Preface and in Rem. 5.13). Therefore, in contrast to the causality
conditions treated in the chapters to come, the causality conditions treated in the present
chapter cannot be used for a selection of a covariate Z . Nevertheless, the Rosenbaum-
Rubin conditions are of theoretical interest because they are implied by other causality
conditions that are empirically testable (see chs. 8 to 10).

7.6 Proofs

Proof of Theorem 7.7

The assumptions include that τx is P-unique. Under this assumption,

τx⊥⊥1_{X=x} ⇒ τx  1_{X=x}   [RS-Th. 4.40]
 ⇔ E(τx | 1_{X=x}) =_P E(τx)   [RS-Def. 4.36]
 ⇔ E(Y |X=x) ⊢ D_X.   [Th. 6.9 (i)]

Proof of Theorem 7.8

Proposition (7.17). This proposition immediately follows from the fact that σ(1X =x ) ⊂
σ(X ) and RS-Box 2.1 (iv).
Propositions (7.18) to (7.20). With 7.1 (d) we assume that τx is P-unique. Under this
assumption,

τx ⊥
⊥X ⇒ τx  X [RS-Th. 4.40]
⇒ τx  1X =x [σ(1X =x ) ⊂ σ(X ), RS-(4.45)]
⇔ E (Y |X =x ) ⊢ D X . [(7.16)]

Proof of Theorem 7.17

Proposition (7.46). According to Assumption 7.1 (d), the true outcome variable τx is
P-unique. Hence,

τx  X | Z
 ⇔ E(τx |X, Z) =_P E(τx |Z)   [(7.38)]
 ⇒ E(E(τx |X, Z) | 1_{X=x}, Z) =_P E(E(τx |Z) | 1_{X=x}, Z)   [RS-Box 4.1 (xiv)]
 ⇔ E(τx | 1_{X=x}, Z) =_P E(E(τx |Z) | 1_{X=x}, Z)   [σ(1_{X=x}, Z) ⊂ σ(X, Z), RS-Box 4.1 (xiii)]
 ⇔ E(τx | 1_{X=x}, Z) =_P E(τx |Z)   [σ(E(τx |Z)) ⊂ σ(1_{X=x}, Z), RS-Box 4.1 (xi)]
 ⇔ τx  1_{X=x} | Z.   [(7.37)]

Proposition (7.47).

τx  1_{X=x} | Z ⇔ E(τx | 1_{X=x}, Z) =_P E(τx |Z)   [(7.37)]
 ⇔ E^{X=x}(Y |Z) ⊢ D_X.   [Th. 6.20 (i)]

Proof of Corollary 7.18

Proposition (7.48).

∀ x ∈ X (Ω): τx  X | Z ⇒ ∀ x ∈ X (Ω): τx  1X =x | Z . [(7.46)]

Proposition (7.49).

∀ x ∈ X (Ω): τx  1X =x | Z ⇔ ∀ x ∈ X (Ω): E X =x (Y |Z ) ⊢ D X [(7.47)]


⇔ E (Y |X, Z ) ⊢ D X . [Def. 6.18 (i)]

Proof of Theorem 7.19

Proposition (7.50). We assume that τx is P-unique. Hence,

τx ⊥
⊥ 1X =x |Z ⇒ τx  1X =x | Z . [RS-Box 4.1 (xiv), RS-Th. 6.8]

Proposition (7.51). We assume that Z is a covariate of X . Hence,

τx  1X =x | Z ⇔ E X =x (Y |Z ) ⊢ D X . [(7.47)]

Proof of Theorem 7.20

Proposition (7.52). We assume that τx is P-unique. Hence,

τx ⊥
⊥X |Z ⇒ τx  X | Z . [RS-Th. 6.8]

Propositions (7.53) and (7.54).

τx ⊥
⊥X |Z ⇒ τx ⊥
⊥ 1X =x |Z [σ(1X =x ) ⊂ σ(X ), RS-Box 6.1 (vi)]
⇒ τx  1X =x | Z . [τx is P-unique, RS-Th. 6.8]

Proof of Theorem 7.21

Propositions (7.55) to (7.58). Under Assumptions 7.1 (a) to (c), (e), and (g),

τ⊥⊥X |Z
 ⇔ ∀ x ∈ X(Ω): τ⊥⊥1_{X=x}|Z   [RS-(6.8)]
 ⇒ ∀ x, x′ ∈ X(Ω): τx⊥⊥1_{X=x′}|Z   [σ(τx) ⊂ σ(τ), RS-Box 6.1 (vi)]
 ⇔ ∀ x ∈ X(Ω): τx⊥⊥X |Z   [RS-(6.8)]
 ⇔ ∀ x, x′ ∈ X(Ω): τx⊥⊥1_{X=x′}|Z   [RS-(6.8)]
 ⇒ ∀ x ∈ X(Ω): τx⊥⊥1_{X=x}|Z.   [x ∈ X(Ω)]

Propositions (7.59) and (7.60). Under the Assumptions 7.1 (a) to (g),

∀ x ∈ X (Ω): τx ⊥
⊥X |Z
⇒ ∀ x ∈ X (Ω): τx  X | Z [τx is P-unique, RS-Th. 6.8]
⇒ ∀ x ∈ X (Ω): τx  1X =x | Z . [σ(1X =x ) ⊂ σ(X ), RS-(4.52)]

Proposition (7.61). Under the Assumptions 7.1 (a) to (g),

∀ x ∈ X (Ω) : τx ⊥
⊥1X =x |Z
⇒ ∀ x ∈ X (Ω) : τx  1X =x | Z . [τx is P-unique, RS-Th. 6.8]

Propositions (7.62) and (7.63). If we additionally assume that Z is a covariate of X , then

∀ x ∈ X (Ω): τx  1X =x | Z
⇔ ∀ x ∈ X (Ω) : E X =x (Y |Z ) ⊢ D X [Th. 6.20 (i), (7.38)]
⇔ E (Y |X, Z ) ⊢ D X . [Def. 6.18 (i)]

7.7 Exercises

⊲ Exercise 7-1 Prove Propositions (7.5) and (7.6).

⊲ Exercise 7-2 Prove Propositions (7.7) to (7.9).

⊲ Exercise 7-3 Prove that Proposition (7.14) implies (7.13).

⊲ Exercise 7-4 Check the implications listed in Table 7.2 and find their proofs in this chapter.

⊲ Exercise 7-5 Change a single number in Table 7.3 so that in this modified example τ⊥
⊥X |(Z = f )
holds. Use the Causal Effects Xplorer to check this condition.

⊲ Exercise 7-6 Show that P(X=x |D_X) >_P 0 implies P(X=x | Z) >_P 0, if Z is a covariate of X.

Solutions

⊲ Solution 7-1 If the Assumptions 7.1 (a) to (d) hold, then

τx  X ⇔ E(τx |X) =_P E(τx)   [(7.2)]
 ⇔ ∀ x′ ∈ X(Ω): E(τx | 1_{X=x′}) =_P E(τx)   [RS-(4.40)]
 ⇔ ∀ x′ ∈ X(Ω): τx  1_{X=x′}   [RS-Def. 4.36]
 ⇒ τx  1_{X=x}   [x ∈ X(Ω)]
 ⇔ E(Y |X=x) ⊢ D_X.   [(6.5), Th. 6.9 (i)]

⊲ Solution 7-2 Under the Assumptions 7.1 (a) to (f ), Propositions (7.7) to (7.8) immediately follow
from Propositions (7.5) to (7.6), and Proposition (7.9) follows from Definition 6.3 (ii).
⊲ Solution 7-3

τ⊥⊥X ⇔ ∀ x ∈ X(Ω): P(X=x |τ) =_P P(X=x)   [(7.14)]
 ⇔ ∀ x′ ∈ X(Ω): E(1_{X=x′} |τ) =_P E(1_{X=x′})   [RS-(4.10), RS-(3.9)]
 ⇒ ∀ x, x′ ∈ X(Ω): E(E(1_{X=x′} |τ) | τx) =_P E(E(1_{X=x′}) | τx)   [RS-Box 4.1 (xiv)]
 ⇔ ∀ x, x′ ∈ X(Ω): E(1_{X=x′} |τx) =_P E(1_{X=x′})   [RS-Box 4.1 (xiii), (i)]
 ⇔ ∀ x, x′ ∈ X(Ω): P(X=x′ |τx) =_P P(X=x′)   [RS-(4.10), RS-(3.9)]
 ⇔ ∀x: τx⊥⊥X.   [(7.13)]

⊲ Solution 7-4 The propositions of Table 7.2 are considered row wise.
Row 1

(1) Under the Assumptions 7.1 (a) to (d) and (g): τx  X | Z ⇒ τx  1X =x | Z .


This is Proposition (7.46).
Row 2
(2) ⊥ 1X =x |Z ⇒ τx  1X =x | Z .
Under the Assumptions 7.1 (a) to (d) and (g): τx ⊥
This is Proposition (7.50).
Row 3
(3) ⊥X |Z ⇒ τx  1X =x | Z .
Under the Assumptions 7.1 (a) to (d) and (g): τx ⊥
This is Proposition (7.54).
(4) ⊥X |Z ⇒ τx  X | Z .
Under the Assumptions 7.1 (a) to (d) and (g): τx ⊥
This is Proposition (7.52).
(5) Under the Assumptions 7.1 (a) to (c) and (g): τx ⊥
⊥X |Z ⇒ τx ⊥
⊥ 1X =x |Z .
This is Proposition (7.53).
Row 4
¡ ¢
(6) Under the Assumptions 7.1 (a) to (g): ∀ x ∈ X (Ω): τx  1X =x | Z ⇒ τx  1X =x | Z .
This proposition immediately follows from x ∈ X (Ω).
Row 5
¡ ¢
(7) Under the Assumptions 7.1 (a) to (g): ∀ x ∈ X (Ω): τx  X | Z ⇒ τx  1X =x | Z .
This immediately follows from Proposition (7.60).
¡ ¢
(8) Under the Assumptions 7.1 (a) to (g): ∀ x ∈ X (Ω): τx  X | Z ⇒ τx  X | Z .
This proposition immediately follows from x ∈ X (Ω).
Row 6
¡ ¢
(9) Under the Assumptions 7.1 (a) to (e) and (g): ∀ x ∈ X (Ω): τx ⊥ ⊥ 1X =x |Z ⇒ τx  1X =x | Z .
This immediately follows from Proposition (7.61).
¡ ¢
(10) Under the Assumptions 7.1 (a) to (c), (e) and (g): ∀ x ∈ X (Ω): τx ⊥⊥ 1X =x |Z ⇒ τx ⊥⊥ 1X =x |Z .
This proposition immediately follows from x ∈ X (Ω).
¡ ¢ ¡ ¢
⊥ 1X =x |Z ⇒ ∀ x ∈ X (Ω): τx  1X =x | Z .
(11) Under the Assumptions 7.1 (a) to (g): ∀ x ∈ X (Ω): τx ⊥
This follows from Proposition (7.54).
Row 7
¡ ¢
(12) Under the Assumptions 7.1 (a) to (e) and (g): ∀ x ∈ X (Ω): τx ⊥⊥X |Z ⇒ τx  1X =x | Z .
This follows from Propositions (7.59) and (7.60).
¡ ¢
(13) Under the Assumptions 7.1 (a) to (e) and (g): ∀ x ∈ X (Ω): τx ⊥⊥X |Z ⇒ τx  X | Z .
This immediately follows from Proposition (7.59).
¡ ¢
(14) Under the Assumptions 7.1 (a) to (c), (e) and (g): ∀ x ∈ X (Ω): τx ⊥
⊥X |Z ⇒ τx ⊥⊥ 1X =x |Z .
This follows from Proposition (7.53).
¡ ¢
(15) Under the Assumptions 7.1 (a) to (c), (e) and (g): ∀ x ∈ X (Ω): τx ⊥
⊥X |Z ⇒ τx ⊥⊥X |Z .
This proposition immediately follows from x ∈ X (Ω).
¡ ¢ ¡ ¢
⊥X |Z ⇒ ∀ x ∈ X (Ω): τx  1X =x | Z .
(16) Under the Assumptions 7.1 (a) to (g): ∀ x ∈ X (Ω): τx ⊥
This follows from Propositions (7.53) and (7.54).
¡ ¢ ¡ ¢
⊥X |Z ⇒ ∀ x ∈ X (Ω): τx  X | Z .
(17) Under the Assumptions 7.1 (a) to (g): ∀ x ∈ X (Ω): τx ⊥
This follows from Proposition (7.52).
(18) Under
¡ ¢ 7.1¡ (a) to (c), (e) and (g): ¢
the Assumptions
∀ x ∈ X (Ω): τx ⊥
⊥X |Z ⇒ ∀ x ∈ X (Ω): τx ⊥ ⊥ 1X =x |Z . This follows from Proposition (7.53).
Row 8
(19) Under the Assumptions 7.1 (a) to (e) and (g): τ⊥ ⊥X |Z ⇒ τx  1X =x | Z .
This follows from Propositions (7.55) to (7.60).
⊥X |Z ⇒ τx  X | Z .
(20) Under the Assumptions 7.1 (a) to (e) and (g): τ⊥
This immediately follows from Propositions (7.55) to (7.59).

(21) Under the Assumptions 7.1 (a) to (c), (e) and (g): τ⊥
⊥X |Z ⇒ τx ⊥
⊥ 1X =x |Z .
This follows from Propositions (7.55) and (7.58).
(22) Under the Assumptions 7.1 (a) to (c), (e) and (g): τ⊥ ⊥X |Z ⇒ τx ⊥
⊥X |Z .
This follows from Propositions (7.55) to (7.57).
¡ ¢
(23) Under the Assumptions 7.1 (a) to (g): τ⊥⊥X |Z ⇒ ∀ x ∈ X (Ω): τx  1X =x | Z .
This follows from Propositions (7.55) to (7.60).
¡ ¢
(24) Under the Assumptions 7.1 (a) to (g): τ⊥⊥X |Z ⇒ ∀ x ∈ X (Ω): τx  X | Z .
This immediately follows from Propositions (7.55) to (7.59).
¡ ¢
(25) Under the Assumptions 7.1 (a) to (c), (e) and (g): τ⊥ ⊥X |Z ⇒ ∀ x ∈ X (Ω): τx ⊥
⊥ 1X =x |Z .
This follows from Propositions (7.55) to (7.58).
¡ ¢
(26) Under the Assumptions 7.1 (a) to (c), (e) and (g): τ⊥ ⊥X |Z ⇒ ∀ x ∈ X (Ω): τx ⊥
⊥X |Z .
This follows from Propositions (7.55) to (7.57).

⊲ Solution 7-5 Change E X =2 (Y |U =Eva ) from 126 to 150.


⊲ Solution 7-6 Remember, if Z is a covariate of X, then σ(Z) ⊂ σ(D_X) (see Rem. 4.16). Hence,

P(X=x |D_X) >_P 0 ⇒ E(P(X=x |D_X) | Z) >_P 0   [RS-(4.13)]
 ⇔ E(E(1_{X=x} |D_X) | Z) >_P 0   [RS-(4.10)]
 ⇔ E(1_{X=x} |Z) >_P 0   [SN-(2.40), σ(Z) ⊂ σ(D_X), RS-Box 4.1 (xiii)]
 ⇔ P(X=x | Z) >_P 0.   [RS-(4.10)]
Chapter 8
Fisher Conditions

In chapter 6, we treated unbiasedness of the conditional expectations E (Y |X ), E X =x (Y |Z ),


and E (Y |X, Z ) and unbiasedness of the conditional expectation values E (Y |X =x ) and
E (Y |X =x , Z =z). Unbiasedness of these terms is a first kind of causality conditions jus-
tifying causal interpretations of the dependencies described by conditional expectations
and conditional expectation values. For example, if E (Y | X =x ) and E (Y | X =x ′ ) are unbi-
ased, then the prima facie effect PFE x x ′ = E (Y | X =x ) − E (Y | X =x ′ ) is unbiased as well and
it is identical to the causal average (total) effect ATE xx ′ . Therefore, under unbiasedness,
an estimate of PFE x x ′ is also an estimate of the ATE x x ′ . Similarly, if E (Y | X =x , Z =z) and
E (Y | X =x ′, Z =z) are unbiased, then the (Z =z)-conditional prima facie effect PFE Z ; x x ′ (z)
= E (Y | X =x , Z =z) − E (Y | X =x ′, Z =z) is unbiased and PFE Z ; x x ′ (z) is identical to the causal
(Z =z)-conditional (total) effect CTE Z ; xx ′ (z).
In chapter 7, we treated some other causality conditions, the Rosenbaum-Rubin condi-
tions, which involve the true outcome variables τx and imply unbiasedness of the condi-
tional expectations E (Y |X ), E X =x (Y |Z ), E (Y |X, Z ), and their values. However, these con-
ditions as well as unbiasedness itself cannot be tested empirically. Therefore we say that
they cannot be falsified. Consequently, none of these conditions can be used for selecting
the random variable Z with respect to which conditional independence of the multivari-
ate true outcome variable τ and X (i. e., τ⊥ ⊥X |Z ) or conditional mean-independence of
the true outcome variables τx and X (i. e., τx  X | Z ) hold. Furthermore, unbiasedness can
be accidental in the sense that it may hold for E (Y |X ) but not for E (Y |X, Z ), where Z is a
covariate of X . This also applies, for example, to the strong ignorability condition τ⊥ ⊥X |Z .
If it holds for a specified covariate Z , then this does not imply that τ⊥ ⊥X |(Z ,W ) holds as
well, even if Z and W are covariates of X . This is one of the reasons why, in the present and
some of the next chapters, we study other causality conditions that are less volatile (i. e.,
that are generalizable) and falsifiable, therefore lending themselves for covariate selection
in quasi-experiments and observational studies.
The conditions introduced in this chapter will be referred to as the Fisher conditions
(for total effects). This name is chosen to recognize the contributions of Sir R. A. Fisher
to understanding the relevance of the experimental design technique of randomization
for causal inference (see, e. g., Fisher, 1925/1946). The Fisher conditions can be created
by randomly assigning (e. g. by a coin flip) the observational unit to one of several treat-
ment conditions represented by the values x of the putative cause variable X . In contrast
to the causality conditions treated in the previous chapters, the Fisher conditions are also
falsifiable. Hence, they can be tested in samples and can be used for covariate selection.
Furthermore, they are not accidental in the sense described above, that is, they are gener-
alizable.
Finally, the Fisher conditions may also hold if the putative cause variable X is a contin-
uous random variable. Remember, all causality conditions described in chapters 6 and 7

are defined only if the values x of X have a positive probability P (X =x ). For example, if
X is normally distributed, then P (X =x ) = 0 for all values x of X , and this also applies if X
has any other continuous distribution. Even in the definitions of causal effects treated in
chapter 5 we presumed that X is discrete and has at least two values x and x ′ having a pos-
itive probability. Hence, although the Fisher conditions have many useful implications on
unbiasedness and true outcome variables, they have far reaching consequences beyond
true outcome theory.

Requirements

Reading this chapter we assume that the reader is familiar with the concepts treated in all
chapters of Steyer (2024). Again, chapters 4 to 6 of that book are now crucial. They deal
with the concepts of a conditional expectation, a conditional expectation with respect to a
conditional probability measure, and conditional independence. Furthermore, we assume
familiarity with chapters 4 to 7 of the present book.
In this chapter, we will often refer to the following assumptions and notation.

Notation and Assumptions 8.1


(a) Let ((Ω, A, P), (F_t)_{t∈T}, C, D_C, X, Y) be a regular probabilistic causality setup, let
D X denote a global potential confounder of X , and WX the set of all potential
confounders of X , that is, the set of all random variables on (Ω, A, P ) satisfying
σ(W ) ⊂ σ(D X ).
(b) Let (ΩX′ , AX′ ) denote the value space of X , let x ∈ ΩX′ , let {x } ∈ AX′ , and let 1X =x
denote the indicator of the event {X =x } = {ω ∈ Ω: X (ω) = x }.
(c) Let Y be real-valued with positive variance, assume 0 < P (X =x ) < 1, define the
probability measure P X =x : A → [0, 1] by P X =x (A) = P (A | X =x ), for all A ∈ A,
and let τx = E X =x (Y |D X ) denote a true outcome variable of Y given x.
(d) Let Z be a random variable on (Ω, A, P ) and let (ΩZ′ , AZ′ ) denote its value space.
(e) Let z ∈ ΩZ′ be a value of Z , let {z } ∈ AZ′ , assume P (Z =z) > 0, and define the pro-
bability measure P Z=z : A → [0, 1] by P Z=z (A) = P (A | Z =z), for all A ∈ A.
(f ) Let x ′ ∈ X (Ω) = {0, 1, . . . , J } and {x ′ } ∈ AX′ , let 1X =x ′ denote the indicator of the
event {X =x ′ } = {ω ∈ Ω: X (ω) = x ′ }, and assume 0 < P (X =x ′ ) < 1. Furthermore,

let τ_{x′} = E^{X=x′}(Y |D_X) denote a true outcome variable of Y given x′.
(g) For all x ∈ X (Ω) = {0, 1, . . . , J }, let {x } ∈ AX′ and assume 0 < P (X =x ) < 1. Further-
more, let τ := (τ0 , τ1 , . . . , τJ ) denote the (J +1)-variate random variable consisting
of the true outcome variables τ0 , τ1 , . . . , τJ of Y.
(h) Let Z be a covariate of X , that is, let σ(Z ) ⊂ σ(D X ).

8.1 F-Conditions

In this section, we present the causality conditions summarily referred to as the simple
(or unconditional) and the conditional Fisher (F) conditions. In the simple F-conditions,
we assume independence of a putative cause variable X and a global potential confounder
D X of X , or independence of D X and an indicator variable 1X =x , where x denotes a value of

X . Among other things, these simple F-conditions imply unbiasedness of the conditional
expectation E (Y |X ) and the conditional expectation value E (Y |X =x ), respectively. Such a
simple F-condition is equivalent to independence of all potential confounders of X on one
side and X (or 1X =x ) on the other side. In contrast, the conditional F-conditions postulate
Z -conditional independence of D X and X (or 1X =x ), where Z denotes a covariate of X .
Among other things, these conditional F-conditions imply unbiasedness of E (Y |X, Z ) and
E X =x (Y |Z ), respectively. We also explicitly consider F-conditions in which we condition
on a single value z of Z .
From a methodological point of view, it should be noted that the simple F-conditions
are the mathematical foundation of the randomized experiment. In contrast, the condi-
tional F-conditions are the mathematical foundation of the conditionally randomized ex-
periment. However, the conditional F-conditions can also be used for covariate selection
aiming at establishing conditional independence of D X and X given the (possibly multi-
variate) covariate Z (for more details, see sect. 8.6).

8.1.1 Simple F-Conditions

In RS-section 2.4 we already introduced the concept of independence of two random vari-
ables with respect to a probability measure P . This concept is used repeatedly in Box 8.1, in
which the definitions of all F-conditions are gathered. Reading the definitions in this box,
remember that σ(D X ) and σ(X ) denote the σ-algebras generated by the random variables
D X and X , respectively (see RS-Def. 2.12).

Remark 8.2 [Independence of D X and 1X =x ] Box 8.1 (i) presents the definition of inde-
pendence of a global potential confounder D X of X and an indicator variable 1X =x for the
event {X =x }. In this definition, we require the Assumptions 8.1 (a) and (b). The notation
is D_X⊥⊥1_{X=x}. According to RS-Corollary 6.18,

D_X⊥⊥1_{X=x} ⇔ P(X=x |D_X) =_P P(X=x)   (8.1)
 ⇔ P(1_{X=x}=1 |D_X) =_P P(1_{X=x}=1)   (8.2)
 ⇔ P(1_{X=x}=0 |D_X) =_P P(1_{X=x}=0)   (8.3)
 ⇔ P(X≠x |D_X) =_P P(X≠x).   (8.4)

Note that Propositions (8.1) to (8.4) also hold if D X is neither discrete nor numerical and if
P (X =x ) = 0. Hence, the condition D X ⊥ ⊥1X =x can be used beyond true outcome theory of
causal effects (see ch. 5), and this applies to all other conditions presented in Box 8.1. ⊳

Remark 8.3 [Independence of D X and X ] The second definition [see Box 8.1 (ii)] is inde-
pendence of the putative cause variable X and a global potential confounder D X of X . The
notation is D X ⊥
⊥X . This concept also applies if neither X nor D X are discrete or numerical.
We only require Assumptions 8.1 (a). However, if X is discrete (see RS-Def. 2.62), then

D_X⊥⊥X ⇔ ∀ x ∈ X(Ω): P(X=x |D_X) =_P P(X=x)   (8.5)
 ⇔ ∀ x ∈ X(Ω): D_X⊥⊥1_{X=x}   (8.6)

(see RS-Cor. 6.17). Hence, if X is discrete, then according to Proposition (8.5), indepen-
dence of D X and X (with respect to the probability measure P ) implies that the D X -con-
ditional probabilities of the events {X =x } = {ω ∈ Ω: X (ω) = x } do not depend on the global

Box 8.1 F-conditions

Simple F-conditions

D_X⊥⊥1_{X=x}   Independence of D_X and 1_{X=x}. Under Assumptions 8.1 (a) and (b), it is defined by

  ∀ (A, B) ∈ σ(D_X) × σ(1_{X=x}): P(A ∩ B) = P(A) · P(B).   (i)

  Under Assumptions 8.1 (a) to (c), it implies E(Y |X=x) ⊢ D_X.

D_X⊥⊥X   Independence of D_X and X. Under Assumptions 8.1 (a), it is defined by

  ∀ (A, B) ∈ σ(D_X) × σ(X): P(A ∩ B) = P(A) · P(B).   (ii)

  Under Assumptions 8.1 (a), (c), and (g), it implies E(Y |X) ⊢ D_X.

Z-conditional F-conditions

D_X⊥⊥1_{X=x}|Z   Z-conditional independence of D_X and 1_{X=x}. Under Assumptions 8.1 (a), (b), and (d), it is defined by

  ∀ (A, B) ∈ σ(D_X) × σ(1_{X=x}): P(A ∩ B | Z) =_P P(A | Z) · P(B | Z).   (iii)

  Under Assumptions 8.1 (a) to (d) and that Z is a covariate of X, it implies E^{X=x}(Y |Z) ⊢ D_X.

D_X⊥⊥X|Z   Z-conditional independence of D_X and X. Under Assumptions 8.1 (a) and (d), it is defined by

  ∀ (A, B) ∈ σ(D_X) × σ(X): P(A ∩ B | Z) =_P P(A | Z) · P(B | Z).   (iv)

  Under Assumptions 8.1 (a), (c), (d), (g), and that Z is a covariate of X, it implies E(Y |X, Z) ⊢ D_X.

(Z=z)-conditional F-conditions

D_X⊥⊥1_{X=x}|(Z=z)   (Z=z)-conditional independence of D_X and 1_{X=x}. Under Assumptions 8.1 (a), (b), (d), and (e), it is defined by

  ∀ (A, B) ∈ σ(D_X) × σ(1_{X=x}): P^{Z=z}(A ∩ B) =_{P^{Z=z}} P^{Z=z}(A) · P^{Z=z}(B).   (v)

  Under Assumptions 8.1 (a) to (e) and that Z is a covariate of X, it implies E^{Z=z}(Y |X=x) ⊢ D_X.

D_X⊥⊥X|(Z=z)   (Z=z)-conditional independence of D_X and X. Under Assumptions 8.1 (a), (d), and (e), it is defined by

  ∀ (A, B) ∈ σ(D_X) × σ(X): P^{Z=z}(A ∩ B) =_{P^{Z=z}} P^{Z=z}(A) · P^{Z=z}(B).   (vi)

  Under Assumptions 8.1 (a) to (e), (g), and that Z is a covariate of X, it implies E^{Z=z}(Y |X) ⊢ D_X.

Note: The proofs that the six F-conditions imply unbiasedness of the specified conditional expectations are found in the theorems and corollaries of section 8.3.

potential confounder D X . Furthermore, according to Proposition (8.6), if X is discrete,


then D X ⊥⊥X is also equivalent to independence of D X and all indicator variables 1X =x ,
x ∈ X(Ω), where X(Ω) denotes the image of Ω under the map X. ⊳

Example 8.4 [First Examples of Independence of D X and X ] Examples of independence


of a global potential confounder D X and the putative cause variable X have already been
presented in RS-Table 1.2 and in Table 6.3 of the present book. In both examples, the per-
son variable U takes the role of a global potential confounder D X . ⊳

Remark 8.5 [Randomized Assignment of a Unit to a Treatment] As already mentioned in


the introduction of this chapter, D X ⊥ ⊥X can be created by randomized assignment (e. g.,
via a coin flip) of a unit to a treatment because, (a) by definition of randomized assign-
ment, X deterministically depends on the coin flip whose outcome does not depend on
D X , and (b) D X is prior or simultaneous to the treatment variable X (see Th. 4.31). This
means that X cannot cause D X and there is no common cause of X and Y. Note that ‘ran-
domization creates D X ⊥ ⊥X ’ is a substantive theory. It involves a term that refers to a design
technique that can be applied in empirical research. In contrast to ‘independence of D X
and X ’, the term ‘randomized assignment’ is not a mathematical concept. ⊳
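The effect of randomized assignment described in this remark can be illustrated by a small simulation. The sketch below uses hypothetical unit names and a fair coin flip; because the assignment probability is the same constant for every unit, the relative treatment frequencies are (up to sampling error) identical across units, which is the empirical fingerprint of D_X⊥⊥X.

```python
import random

random.seed(1)
units = ["Tom", "Tim", "Joe", "Ann", "Sue", "Eva"]   # hypothetical values of U
counts = {u: [0, 0] for u in units}                  # [n(X=0), n(X=1)] per unit

for _ in range(100_000):
    u = random.choice(units)      # drawing the observational unit
    x = random.randint(0, 1)      # randomized assignment by a fair coin flip
    counts[u][x] += 1

for u in units:
    n0, n1 = counts[u]
    print(u, round(n1 / (n0 + n1), 3))   # all close to P(X=1) = 0.5
```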

8.1.2 Z -Conditional F-Conditions

The general concept of conditional independence of two random variables given a ran-
dom variable has been treated in some detail in RS-chapter 6. For a more detailed presen-
tation of conditional independence of random variables see SN-chapter 16.

Remark 8.6 [Z -Conditional Independence of D X and 1X =x ] The third causality condition


in Box 8.1 (iii), is Z-conditional independence of D X and 1X =x , denoted D X ⊥
⊥1X =x |Z . In this
definition, in which we require the Assumptions 8.1 (a), (b), and (d), P (A | Z ) := E (1A | Z ) de-
notes the Z -conditional probability of the event A (see RS-Rem. 4.12). This F-condition is
tantamount to assuming that 1X =x on one side and all potential confounders on the other
side are Z -conditionally independent. According to RS-Theorem 6.6,

D_X⊥⊥1_{X=x}|Z ⇔ P(X=x |D_X, Z) =_P P(X=x |Z)   (8.7)
 ⇔ P(1_{X=x}=1 |D_X, Z) =_P P(1_{X=x}=1 | Z)   (8.8)
 ⇔ P(1_{X=x}=0 |D_X, Z) =_P P(1_{X=x}=0 | Z)   (8.9)
 ⇔ P(X≠x |D_X, Z) =_P P(X≠x |Z).   (8.10)

Note that Propositions (8.7) to (8.10) still hold if P (X =x ) = 0. ⊳

Remark 8.7 [Z-Conditional Independence of D X and X ] The fourth causality condition,


defined in Box 8.1 (iv), is Z-conditional independence of D X and X , denoted D X ⊥ ⊥X |Z .
For this definition, we require the Assumptions 8.1 (a) and (d). This causality condition is
tantamount to the assumption that X on one side and all potential confounders of X on
the other side are Z -conditionally independent. ⊳

Remark 8.8 [The Putative Cause Variable Does Not Have to Be Discrete] Just as D X ⊥
⊥X ,
the condition D X ⊥
⊥X |Z is also defined and may hold if X is continuous. In this case,

true outcome theory of causal effects is not applicable because there we presume posi-
tive probabilities P (X =x ) for all x ∈ X (Ω), which does not hold if X is continuous. Nev-
ertheless, if Z is a covariate of X , then D X ⊥ ⊥X |Z is still a causality condition, implying
that E (Y |X, Z ) describes a causal Z -conditional dependence of Y on X . However, in this
volume we refrain from generalizing the theory to continuous putative cause variables. ⊳

Remark 8.9 [D_X⊥⊥X|Z if X is Discrete] If X is discrete (see RS-Def. 2.62), then

D_X⊥⊥X|Z ⇔ ∀ x ∈ X(Ω): P(X=x |D_X, Z) =_P P(X=x | Z)   (8.11)
 ⇔ ∀ x ∈ X(Ω): D_X⊥⊥1_{X=x}|Z   (8.12)

(see RS-Th. 6.5). To emphasize, the right-hand sides of these propositions are equivalent
to D X ⊥
⊥X |Z only if X is discrete. However, in contrast to the definition presented in Box
8.1 (iv), they do not hold anymore, if X is continuous. Also note that Proposition (8.5) is a
special case of (8.11) for Z being a constant map, that is, for σ(Z ) = {Ω, Ø}. ⊳
Remark 8.10 [Consequences of Z Being a Covariate of X ] If we assume that Z is a covari-
ate of X , then, according to Definition 4.11 (iv) and Remark 4.16, σ(Z ) ⊂ σ(D X ) holds for
the σ-algebras generated by these two random variables. This implies σ(D X ) = σ(Z , D X )
[see RS-Prop. (2.19)] and

P(X=x |D_X, Z) =_P P(X=x |D_X)   (8.13)

[see RS-Def. 4.4 and RS-Eq. (4.10)]. Hence, if X is discrete and Z is a covariate of X , then
Proposition (8.11) can also be written as

D_X⊥⊥X|Z ⇔ ∀ x ∈ X(Ω): P(X=x |D_X) =_P P(X=x | Z).   (8.14)

Correspondingly, Propositions (8.7) to (8.10) simplify to

D_X⊥⊥1_{X=x}|Z ⇔ P(X=x |D_X) =_P P(X=x |Z)   (8.15)
 ⇔ P(1_{X=x}=1 |D_X) =_P P(1_{X=x}=1 | Z)   (8.16)
 ⇔ P(1_{X=x}=0 |D_X) =_P P(1_{X=x}=0 | Z)   (8.17)
 ⇔ P(X≠x |D_X) =_P P(X≠x |Z).   (8.18)


Example 8.11 [A First Example of Z -Conditional Independence] An example of Z -con-
ditional independence of the putative cause variable X and a global potential confounder
D X of X has already been presented in Table 6.4. In this example, U takes the role of a
global potential confounder of X . In section 8.5 we will treat several examples in more
detail. ⊳

8.1.3 (Z =z)-Conditional F-Conditions

Now we turn to conditional independence of a putative cause variable X (or 1X =x ) and a


global potential confounder D X of X given that Z takes on the value z. In these definitions,
we assume P (Z =z) > 0 and refer to the probability measure P Z=z defined in Assumptions
8.1 (e).

Remark 8.12 [(Z =z)-Conditional Independence of D X And X ] The fifth causality condi-
tion [see Box 8.1 (v)] is (Z =z)-conditional independence of D X and 1X =x . It is denoted by
DX⊥ ⊥1X =x |(Z =z). The required assumptions are 8.1 (a), (b), (d), and (e). This condition
means that 1X =x on one side and all potential confounders of X on the other side are
(Z =z)-conditionally independent. The definition shown in Box 8.1 (v) reveals that (Z =z)-
conditional independence of D X and 1X =x is equivalent to independence of D X and 1X =x
with respect to the probability measure P Z=z [see Assumptions 8.1 (e)]. That is,

D_X⊥⊥1_{X=x}|(Z=z) :⇔ D_X ⊥⊥_{P^{Z=z}} 1_{X=x}.   (8.19)

Furthermore, according to RS-Corollary 6.18,

D_X⊥⊥1_{X=x}|(Z=z) ⇔ P^{Z=z}(X=x |D_X) =_{P^{Z=z}} P^{Z=z}(X=x)   (8.20)
 ⇔ P^{Z=z}(1_{X=x}=1 |D_X) =_{P^{Z=z}} P^{Z=z}(1_{X=x}=1)   (8.21)
 ⇔ P^{Z=z}(1_{X=x}=0 |D_X) =_{P^{Z=z}} P^{Z=z}(1_{X=x}=0)   (8.22)
 ⇔ P^{Z=z}(X≠x |D_X) =_{P^{Z=z}} P^{Z=z}(X≠x).   (8.23)


Remark 8.13 [(Z =z)-Conditional Independence of D X and X ] The sixth causality condi-
tion, introduced in Box 8.1 (vi), is (Z =z)-conditional independence of D X and X , denoted
DX⊥ ⊥X |(Z =z). The required assumptions are 8.1 (a), (d), and (e). This condition means
that the putative cause variable X on one side and all potential confounders of X on the
other side are (Z =z)-conditionally independent. According to the definition in Box 8.1 (vi),
(Z =z)-conditional independence of D X and X is equivalent to independence of D X and X
with respect to the probability measure P Z=z . That is,

D_X⊥⊥X|(Z=z) :⇔ D_X ⊥⊥_{P^{Z=z}} X.   (8.24)


Remark 8.14 [(Z =z)-Conditional Independence of D X and a Finite X ] Suppose that X is
finite or countable. Then, according to Remark 8.3 and Proposition (8.24),

D_X⊥⊥X|(Z=z) ⇔ ∀ x ∈ X(Ω): P^{Z=z}(X=x |D_X) =_{P^{Z=z}} P^{Z=z}(X=x)   (8.25)
 ⇔ ∀ x ∈ X(Ω): D_X⊥⊥1_{X=x}|(Z=z).   (8.26)

Hence, if X is finite or countable, then (Z =z)-conditional independence of D X and X is


equivalent to (Z =z)-conditional independence of D X and 1X =x for all values x of X . ⊳
Remark 8.15 [Z -Conditional Randomized Assignment] If Z is prior in (Ft )t ∈T to X [see
Def. 3.3 (ii)] and therefore a covariate of X (see Cor. 4.33), then we can conduct exper-
iments in which we create D X ⊥ ⊥X |Z without D X ⊥ ⊥X . For example, conditional indepen-
dence of a treatment variable X and a global potential confounder D X of X given a covari-
ate Z of X may be created via Z -conditional randomization. In this technique of experi-
mental design, assignment to treatment is arranged such that the conditional probabilities
P (X =x | Z =z) may differ for different values z of Z (representing, e. g., different degrees of

severity of the disorder before treatment), but D X ⊥


⊥X |Z still holds. The reason is that, for
all x ∈ X (Ω), this assignment procedure secures that no other potential confounder than
Z determines the probabilities of being assigned to treatment x (see Rem. 8.59 for more details). ⊳
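The design technique described in this remark can also be sketched in a few lines of code. The numbers below are hypothetical; the essential point is that the assignment probability is a function of the covariate Z alone, so no potential confounder other than Z can influence who receives which treatment condition.

```python
import random

random.seed(2)
p_x1_given_z = {"mild": 0.25, "severe": 0.75}   # chosen by the experimenter

def assign(z):
    """Return 1 (treatment) with probability P(X=1 | Z=z), else 0 (control)."""
    return 1 if random.random() < p_x1_given_z[z] else 0

# Units differ in Z and in an unobserved characteristic d; only Z is used.
units = [{"z": z, "d": d} for z in ("mild", "severe") for d in range(5000)]
for unit in units:
    unit["x"] = assign(unit["z"])

for z in ("mild", "severe"):
    xs = [u["x"] for u in units if u["z"] == z]
    print(z, round(sum(xs) / len(xs), 2))   # close to 0.25 and 0.75
```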

8.2 Implications Among F-Conditions

In this section we study the implication structure among the F-conditions. A summary of
all these implications is presented in Box 8.1. Proofs are found in the theorems treated in
the present section and in the solution to Exercise 8-2.

Remark 8.16 [First Consequences of D X ⊥


⊥X and D X ⊥
⊥X |Z ] Under the Assumptions 8.1
(a) and (b),

DX ⊥
⊥X ⇒ DX ⊥
⊥1X =x (8.27)

because σ(1X =x ) ⊂ σ(X ) [see RS-Box 2.1 (iv)]. Correspondingly, and for the same reason,
under the Assumptions 8.1 (a), (b), and (d),

DX⊥
⊥X |Z ⇒ DX⊥
⊥1X =x |Z (8.28)

[see RS-Box 6.1 (vi)]. ⊳


In the following theorem, we consider an implication that we call generalizability of
independence of a putative cause variable X and a global potential confounder D X of X .

Theorem 8.17 [Generalizability of D X ⊥


⊥X ]
Let the Assumptions 8.1 (a) hold.
(i) Then
DX ⊥
⊥X ⇒ ∀W ∈ WX : D X ⊥
⊥X |W . (8.29)

(ii) If we additionally assume 8.1 (b), then


DX ⊥
⊥1X =x ⇒ ∀W ∈ WX : D X ⊥
⊥1X =x |W . (8.30)
(Proof p. 256)

Hence, under the Assumptions 8.1 (a) and that W is a potential confounder of X , in-
dependence of a putative cause variable X and a global potential confounder D X of X
implies W -conditional independence of X and D X . This proposition is called generaliz-
ability of D X ⊥
⊥X . Its methodological implications are discussed in more detail in Remark
8.63. Note that assuming W to be a potential confounder of X is crucial for this implica-
tion. Also note that, under the Assumptions 8.1 (a) and (b), Propositions (8.27) and (8.30)
imply
DX ⊥
⊥X ⇒ ∀W ∈ WX : D X ⊥
⊥1X =x |W . (8.31)

In the following theorem, we extend Theorem 8.17 considering generalizability of Z -


conditional independence of a putative cause variable X and a global potential con-
founder D X of X .

Table 8.1. Implications among the F-conditions

Column conditions: (1) D_X⊥⊥1_{X=x}|(Z=z)   (2) D_X⊥⊥X|(Z=z)   (3) D_X⊥⊥1_{X=x}|Z   (4) D_X⊥⊥X|Z   (5) D_X⊥⊥1_{X=x}

                        (1)                  (2)              (3)           (4)      (5)
D_X⊥⊥X|(Z=z)            (a),(b),(d),(e)
D_X⊥⊥1_{X=x}|Z          (a),(b),(d),(e)
D_X⊥⊥X|Z                (a),(b),(d),(e)      (a),(d),(e)      (a),(b),(d)
D_X⊥⊥1_{X=x}            (a),(b),(d),(e),(h)                   (a),(b),(h)
D_X⊥⊥X                  (a),(b),(d),(e),(h)  (a),(d),(e),(h)  (a),(b),(h)   (a),(h)  (a),(b)

Note: An entry such as (a), (b) means that the condition in the row implies the condition in the column, provided that the Assumptions 8.1 (a) and (b) hold. Trivial equivalences such as D_X⊥⊥1_{X=x} ⇔ D_X⊥⊥1_{X=x} are omitted.

Theorem 8.18 [Generalizability of D X ⊥⊥X |Z ]


Let the Assumptions 8.1 (a) and (d) hold.
(i) Then
DX⊥
⊥X |Z ⇒ ∀W ∈ WX : D X ⊥
⊥X |(Z ,W ) . (8.32)

(ii) If we additionally assume 8.1 (b), then


DX⊥
⊥1X =x |Z ⇒ ∀W ∈ WX : D X ⊥
⊥1X =x |(Z ,W ) . (8.33)
(Proof p. 257)

Hence, under the assumptions of Theorem 8.18, Z -conditional independence of a pu-


tative cause variable X and a global potential confounder D X of X implies (Z ,W )-condi-
tional independence of X and D X , provided that W is a potential confounder of X . This is
what we call generalizability of D X ⊥
⊥X |Z (see Rem. 8.64) for a discussion of the method-
ological implications). Again, assuming W to be a potential confounder of X is crucial for
this implication. Also note that, under the Assumption 8.1 (a), (b), (d), Propositions (8.28)
and (8.33) imply
DX⊥
⊥X |Z ⇒ ∀W ∈ WX : D X ⊥
⊥1X =x |(Z ,W ) . (8.34)

In the next corollary we consider a consequence of D X ⊥


⊥X |Z that involves conditioning
on an event {Z =z } = {ω ∈ Ω: Z (ω) = z } for which we assume P (Z =z) > 0. This corollary
immediately follows from RS-Propositions (6.47) and (6.48).

Corollary 8.19 [D_X⊥⊥X|Z Implies (Z=z)-Conditional Independence of D_X and X]
If the Assumptions 8.1 (a), (d), and (e) hold, then

DX⊥
⊥X |Z ⇒ DX⊥
⊥X |(Z =z). (8.35)

Hence, under the assumptions of Corollary 8.19, Z -conditional independence of a pu-


tative cause variable X and a global potential confounder of X implies (Z =z)-conditional
independence of X and D X .
Propositions (8.29) and (8.35) immediately yield the following corollary.

Corollary 8.20 [D_X⊥⊥X Implies (Z=z)-Conditional Independence of D_X and X]
Let the Assumptions 8.1 (a), (d), and (e) hold, and let Z be a covariate of X . Then

DX ⊥
⊥X ⇒ DX⊥
⊥X |(Z =z) . (8.36)

Hence, under the assumptions of this corollary, independence of a putative cause variable
X and a global potential confounder of X implies (Z =z)-conditional independence of X
and D X .

Remark 8.21 [A Consequence of D_X⊥⊥X|(Z=z)] Under the Assumptions 8.1 (a), (b), (d),
and (e),

DX⊥
⊥X |(Z =z) ⇒ DX⊥
⊥1X =x |(Z =z) (8.37)

[see solution (1) to Exercise 8-2]. ⊳

Note that the assumptions of the theorems and the other propositions stated above
neither include P (X =x ) > 0 nor that Y is real-valued, which are prerequisites for the true
outcome variables τx to be defined. Hence, these propositions also hold beyond true out-
come theory. This remark also applies to all other propositions summarized in Table 8.1.
(For the proofs of these propositions, see the solution to Exercise 8-2.)

8.3 Implications of F-Conditions on RR-Conditions and Unbiasedness

In this section we treat the consequences of the Fisher conditions on the Rosenbaum-
Rubin conditions and on unbiasedness. Now our assumptions include that the putative
cause variable X is finite with values in the set X (Ω) = {0, 1, . . . , J }, P (X =x ) > 0 for all values
of X , and that Y is real-valued. These assumptions are prerequisites for the definitions of
the true outcome variables τx , x ∈ X (Ω).
Table 8.2 summarizes the consequences of the Fisher conditions on the Rosenbaum-
Rubin conditions. These consequences are proved in the theorems treated in this section
or in Exercise 8-3. We also prove the implications of the Fisher conditions on unbiased-
ness.

8.3.1 Consequences of Independence of D X and X

In the first theorem of this section, we consider consequences of D X ⊥ ⊥X for the multi-
variate true outcome variable τ = (τ0 , τ1 , . . . , τJ ) and unbiasedness of the conditional ex-
pectation E (Y |X ) and its values E (Y |X =x ). In this theorem, we confine ourselves to those
consequences that do not involve conditioning on a covariate of X .

Theorem 8.22 [Consequences of D X ⊥ ⊥X for True Outcomes and Unbiasedness]


Let the Assumptions 8.1 (a), (c), and (g) hold. Then D X ⊥
⊥X implies
(i) τ⊥⊥X
(ii) ∀ x ∈ X (Ω): τ⊥ ⊥1X =x
(iii) ∀ x ∈ X (Ω): τx is P-unique
(iv) ∀ x ∈ X (Ω): E (Y |X =x ) ⊢ D X
(v) E (Y |X ) ⊢ D X .
(Proof p. 257)

Remark 8.23 [D X ⊥ ⊥X Implies That All τx Are P -Unique] Under the assumptions of The-
orem 8.22, D X ⊥ ⊥X implies that all true outcome variables τx , x ∈ X (Ω), are P-unique [see
Prop. (iii) of Th. 8.22]. Hence, P -uniqueness of the true outcome variables τx is not an ad-
ditional assumption in Proposition (v) of Theorem 8.22, according to which independence
of D X and X implies unbiasedness of E (Y |X ). ⊳

Remark 8.24 [D X ⊥ ⊥X Implies the Rosenbaum-Rubin Conditions] Note that D X ⊥ ⊥X does


not only imply τ⊥ ⊥ X and P -uniqueness of all true outcome variables τx [see Props. (i) and
(iii) of Th. 8.22], but also all other simple Rosenbaum-Rubin conditions (see the last row
of Table 7.1). ⊳

In the next theorem we consider some consequences of independence of an indicator


1X =x for a value x of the putative cause variable X and a global potential confounder D X
of X . Reading this theorem, remember that E X =x(Y |D X ) is the more explicit notation of a
true outcome variable τx (see Def. 5.4).

Theorem 8.25 [Consequences of D X ⊥ ⊥1X =x ]


Let the Assumptions 8.1 (a) to (c) hold. Then D_X⊥⊥1_{X=x} implies
(i) E^{X=x}(Y |D_X)⊥⊥1_{X=x} and E^{X≠x}(Y |D_X)⊥⊥1_{X≠x}
(ii) E^{X=x}(Y |D_X) and E^{X≠x}(Y |D_X) are P-unique
(iii) E(Y |X=x) ⊢ D_X and E(Y |X≠x) ⊢ D_X
(iv) E(Y | 1_{X=x}) ⊢ D_X.
(Proof p. 257)

Hence, according to this theorem, D X ⊥ ⊥ 1X =x implies that the true outcome variable
E X =x(Y |D X ) and the indicator 1X =x are independent. Correspondingly, D X ⊥ ⊥ 1X =x implies
that E^{X≠x}(Y |D_X) and the indicator 1_{X≠x} are independent. It also implies that the true outcome variables E^{X=x}(Y |D_X) and E^{X≠x}(Y |D_X) are P-unique. Finally, D_X⊥⊥1_{X=x} implies that E(Y | 1_{X=x}) and its values E(Y |X=x) and E(Y |X≠x) are unbiased.

Table 8.2. Implications of the Fisher conditions on some Rosenbaum-Rubin conditions

Column conditions: (1) τ⊥⊥1_{X=x}|(Z=z)   (2) τ⊥⊥X|(Z=z)   (3) τ⊥⊥1_{X=x}|Z   (4) τ⊥⊥X|Z   (5) τ⊥⊥1_{X=x}   (6) τ⊥⊥X

                        (1)          (2)          (3)              (4)              (5)          (6)
D_X⊥⊥1_{X=x}|(Z=z)      (a)-(e),(g)
D_X⊥⊥X|(Z=z)            (a)-(e),(g)  (a)-(e),(g)
D_X⊥⊥1_{X=x}|Z          (a)-(e),(g)               (a)-(d),(g)
D_X⊥⊥X|Z                (a)-(g)      (a)-(e),(g)  (a)-(d),(g)      (a)-(d),(g)
D_X⊥⊥1_{X=x}            (a)-(g),(h)               (a)-(d),(g),(h)                   (a),(c),(g)
D_X⊥⊥X                  (a)-(g),(h)  (a)-(g),(h)  (a)-(d),(g),(h)  (a)-(d),(g),(h)  (a),(c),(g)  (a),(c),(g)

Note: An entry such as (a)-(g) means that the condition in the row implies the condition in the column, provided that the Assumptions 8.1 (a) to (g) hold. Consequences of the Rosenbaum-Rubin conditions displayed in the columns of the table are found in Tables 7.1 and 7.2.

Remark 8.26 [Further Consequences of D X ⊥


⊥ 1X =x ] Because σ(1X =x ) = σ(1X 6=x), Theorem
8.25 (i) immediately yields

E^{X=x}(Y |D_X)⊥⊥1_{X≠x} and E^{X≠x}(Y |D_X)⊥⊥1_{X=x}.   (8.38)

In the next theorem we consider some consequences of D X ⊥ ⊥X involving condition-


ing on a covariate Z of X . These consequences include unbiasedness of the conditional
expectations E X =x (Y |D X ), x ∈ X (Ω), and E (Y |X, Z ).

Theorem 8.27 [Consequences of D X ⊥ ⊥X Involving Conditioning on a Covariate]


Let the Assumptions 8.1 (a) to (c) hold and Z be a covariate of X . Then D X ⊥
⊥X implies
(i) τ⊥⊥X |Z
(ii) ∀ x ∈ X (Ω): τ⊥ ⊥1X =x |Z
(iii) ∀ x ∈ X (Ω): E X =x (Y |Z ) ⊢ D X
(iv) E (Y |X, Z ) ⊢ D X .
(Proof p. 258)

Remark 8.28 [D X ⊥ ⊥X Implies τ⊥ ⊥X |Z ] Under the assumptions of Theorem 8.27, D X ⊥ ⊥X


implies Z -conditional independence of the multivariate true outcome variable τ and the
putative cause variable X . Remember, according to Theorem 8.22 (iii), D X ⊥ ⊥X implies P -
uniqueness of all true outcome variables τx , x ∈ X (Ω). Under the assumptions of Theo-
rem 8.27, D X ⊥ ⊥X also implies all consequences of τ⊥ ⊥X |Z listed in the last row of Table
7.2. Hence, under these assumptions, D X ⊥ ⊥X does not only imply that E (Y |X ) and each
E (Y |X =x ), x ∈ X (Ω), are unbiased, but it also implies unbiasedness of E (Y |X, Z ) and each

Z -conditional expectation E X =x (Y |Z ), x ∈ X (Ω). To emphasize, under these assumptions


and D X ⊥⊥X , an additional assumption of P -uniqueness of the true outcome variables τx
is not necessary because this property already follows from D X ⊥ ⊥X . ⊳
Now we turn to some implications of D X ⊥ ⊥X that involve conditioning on a value z
of Z , that is, conditioning on the event {Z =z }. According to the next theorem, D X ⊥ ⊥X
implies (Z =z)-conditional independence of τ and X , provided that Z is a covariate of
X . Assuming P (Z =z) > 0 and P (X =x ) > 0, for all values x of X , conditional indepen-
dence of D X and X also implies unbiasedness of each conditional expectation value
E Z=z(Y |X =x ) = E (Y |X =x , Z =z), x ∈ X (Ω), and unbiasedness of the X -conditional expec-
tation E Z=z (Y |X ).

Theorem 8.29 [D_X⊥⊥X Implies (Z=z)-Conditional Independence of τ and X]
Let the Assumptions 8.1 (a) to (g) hold and assume that Z is a covariate of X . Then
DX ⊥⊥X implies
(i) ∀ x ∈ X (Ω): P (X =x , Z =z) > 0
(ii) τ⊥
⊥X |(Z =z)
(iii) ∀ x ∈ X (Ω): τ⊥
⊥1X =x |(Z =z)
(iv) ∀ x ∈ X (Ω): E Z=z(Y |X =x ) ⊢ D X
(v) E Z=z (Y |X ) ⊢ D X .
(Proof p. 259)

Remark 8.30 [Other Consequences of D X ⊥ ⊥X Involving an Event {Z =z }] Hence, under


the assumptions of Theorem 8.29, D X ⊥ ⊥X implies that τ and X are (Z =z)-conditionally
independent. Corollary 7.14 lists a number of consequences that follow from τ⊥ ⊥ X |(Z =z),
provided that all true outcome variables τx are P Z=z-unique. Remember that D X ⊥ ⊥X im-
plies that all true outcome variables τx , x ∈ X (Ω), are P-unique [see Th. 8.22 (iii)], which in
turn implies that they are also P Z=z-unique [see RS-Box 5.1 (v)]. Hence, because of Propo-
sitions (i) and (ii) of Theorem 8.29, all the consequences listed in Corollary 7.14 also follow
from D X ⊥ ⊥X , provided that Z is a covariate of X . No additional assumption of P -unique-
ness of the true outcome variables τx , x ∈ X (Ω), is required because it already follows from
DX ⊥ ⊥X . The most important consequences of D X ⊥ ⊥X are unbiasedness of E Z=z (Y |X ) and
Z=z
the conditional expectation values E (Y |X =x ), x ∈ X (Ω), provided, of course, that the
assumptions of Theorem 8.29 hold. ⊳

8.3.2 Consequences of Z -Conditional Independence of D X and X

Now we turn to the consequences of Z -conditional independence of D X and X . Reading


the second part of this theorem, note that P -uniqueness of a true outcome variable τx
follows from the conjunction of D_X⊥⊥X|Z and P(X=x | Z) >_P 0 (see Th. 8.34).

Theorem 8.31 [Consequences of D X ⊥ ⊥X |Z ]


Let the Assumptions 8.1 (a) to (d) and (g) hold. Then D X ⊥
⊥X |Z implies
(i) τ⊥
⊥X |Z

(ii) ∀ x ∈ X (Ω): τ⊥ ⊥1X =x |Z


(iii) ∀ x, x ′ ∈ X (Ω): τx ⊥
⊥1X =x ′ |Z .
If, additionally, all true outcome variables τx , x ∈ X (Ω), are P-unique and Z is a covari-
ate of X , then D X ⊥
⊥X |Z also implies
(iv) ∀ x ∈ X (Ω): E X =x (Y |Z ) ⊢ D X
(v) E (Y |X, Z ) ⊢ D X .
(Proof p. 260)

To emphasize, Z -conditional independence of D X and X alone is not sufficient for


unbiasedness of E (Y |X, Z ), even if Z is a covariate of X . This will be exemplified in Ex-
ample 8.33. However, in Theorem 8.34 we show that the conjunction of D X ⊥ ⊥X |Z and
P(X=x |Z) >_P 0, for all x ∈ X(Ω), implies P-uniqueness of all true outcome variables τx,
x ∈ X (Ω).

Remark 8.32 [D X ⊥ ⊥X |Z Implies the Rosenbaum-Rubin Conditions] Note that D X ⊥ ⊥X |Z


does not only imply τ⊥ ⊥X |Z but, if all τx , x ∈ X (Ω), are P-unique, also all other conditional
Rosenbaum-Rubin conditions [see Prop. (i) of Th. 8.31 and the last row of Table 7.2]. Fur-
thermore, if Z is a covariate of X , then D X ⊥ ⊥X |Z also implies unbiasedness of E (Y |X, Z )
and all conditional expectations E X =x (Y |Z ), x ∈ X (Ω). ⊳

Example 8.33 [No Treatment For Males] Table 8.3 displays the parameters of a random
experiment in which males have a zero probability to be treated. We assume that this ran-
dom experiment has the same structure as the random experiments treated in section 6.5.
That is,
((Ω, A, P), (F_t)_{t∈T}, C, D_C, X, Y),

as specified in section 6.5, is the regular probabilistic causality setup. Again, U takes the
role of a global potential confounder D X of X and Z = sex is a covariate of X because
σ(Z ) ⊂ σ(U ). This implies

P(X=1 |U, Z) =_P P(X=1 |U)

because σ(U , Z ) = σ(U ) [see RS-Prop. (2.19), RS-Def. 4.4, and RS-Rem. 4.12].
In this example, there is Z -conditional independence of U and X but the true outcome
variable τ1 is not P-unique. Therefore, unbiasedness of E X =1 (Y | Z ) and unbiasedness of
E (Y |X, Z ) are not defined in this example.
We start checking if U ⊥ ⊥X | Z holds. According to RS-Theorem 6.6, if X is binary, then

U⊥⊥X | Z ⇔ P(X=1 |U, Z) =_P P(X=1 |Z).

Because, in this example, σ(Z) ⊂ σ(U), this simplifies to

U⊥⊥X | Z ⇔ P(X=1 |U) =_P P(X=1 |Z).

Inspecting the column headed P (X =1|U =u ) in Table 8.3 shows that



Table 8.3. No treatment for males: Z-conditional independence of X and D_X

Fundamental parameters

Person u  Sex z  P(U=u)  P(X=1|U=u)  E^{X=0}(Y|U=u)  E^{X=1}(Y|U=u)  CTE_{U;10}(u)  P(U=u|X=0)  P(U=u|X=1)
Joe       m      1/4     0           68              999*            ndef           2/5         0
Jim       m      1/4     0           78              −999*           ndef           2/5         0
Ann       f      1/4     3/4         106             114             8              1/10        1/2
Sue       f      1/4     3/4         116             130             14             1/10        1/2

                        x=0     x=1
E(τx):                  92.0    111*     ATE_10 = ndef
E(Y |X=x):              80.6    122      PFE_10 = 41.4
E(τx | Z=m):            73      0*       CTE_{Z;10}(m) = ndef
E(Y |X=x, Z=m):         73      888*     PFE_{Z;10}(m) = 815*
E(τx | Z=f):            111     122      CTE_{Z;10}(f) = 11
E(Y |X=x, Z=f):         111     122      PFE_{Z;10}(f) = 11

Note: An asterisk * indicates that this number is arbitrary and could be any other real number because, in this example, the corresponding term is not uniquely defined (see RS-sect. 4.4 for more details). Furthermore, ndef means that this term is not defined in this example.

P(X=1 |U)(ω) = P(X=1 |Z)(ω) = { 0 if ω ∈ {Z=m};  3/4 if ω ∈ {Z=f} },   (8.39)

which proves U ⊥ ⊥X | Z .
Now we show that, in this example, the true outcome variable τ1 is not P-unique.
Because U is a global potential confounder of X , P -uniqueness of τ1 is equivalent to
P(X=1 |U) >_P 0 (see RS-Th. 5.27), which in turn is equivalent to

P({ω ∈ Ω: P(X=1 |U)(ω) > 0}) = 1.

[see RS-Eq. (4.12)]. Looking at the columns headed P(U=u) and P(X=1 |U=u) in Table 8.3 shows that

P({ω ∈ Ω: P(X=1 |U)(ω) > 0}) = 1/4 + 1/4 = 1/2.

Therefore, P(X=1 |U) >_P 0, and with it P-uniqueness of τ1 = E^{X=1}(Y |U), does not hold (see Exercise 8-4). Hence, there is at least one other version τ1* of the U-conditional expectation of Y with respect to the measure P^{X=1} for which τ1 =_P τ1* does not hold (see Exercise 8-5).
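Both facts just derived, Z-conditional independence of U and X and the failure of P-uniqueness of τ1, can be checked numerically from the fundamental parameters of Table 8.3. The following sketch is again only an illustration for this example.

```python
from fractions import Fraction as F

# Fundamental parameters of Table 8.3; P(U=u) = 1/4 for every person
persons = {
    "Joe": {"z": "m", "p_x1": F(0)},
    "Jim": {"z": "m", "p_x1": F(0)},
    "Ann": {"z": "f", "p_x1": F(3, 4)},
    "Sue": {"z": "f", "p_x1": F(3, 4)},
}

# P(X=1 | Z=z): average of P(X=1 | U=u) over the two persons with Z=z
p_x1_z = {}
for z in ("m", "f"):
    group = [p["p_x1"] for p in persons.values() if p["z"] == z]
    p_x1_z[z] = sum(group) / len(group)

# U and X are Z-conditionally independent in this example ...
print(all(p["p_x1"] == p_x1_z[p["z"]] for p in persons.values()))   # True

# ... but P({P(X=1|U) > 0}) = 1/2 < 1, so tau_1 is not P-unique
print(sum(F(1, 4) for p in persons.values() if p["p_x1"] > 0))      # 1/2
```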


For the second part of Theorem 8.31 we need the additional assumption that all true
outcome variables τx , x ∈ X (Ω), are P-unique. In the following theorem we present a con-
dition under which Z -conditional independence of D X and X implies that a true outcome
variable τx is P-unique.

Theorem 8.34 [A Condition Under Which D X ⊥ ⊥X |Z Implies P -Uniqueness of τx ]


Let the Assumptions 8.1 (a) to (d) hold and assume that Z is a covariate of X . Then

D_X⊥⊥1_{X=x}|Z ∧ P(X=x |Z) >_P 0 ⇒ τx is P-unique.   (8.40)

If we additionally assume 8.1 (g), then


D_X⊥⊥X|Z ∧ (∀ x ∈ X(Ω): P(X=x |Z) >_P 0) ⇒ ∀ x ∈ X(Ω): τx is P-unique.   (8.41)
(Proof p. 260)

Remark 8.35 [D_X⊥⊥1_{X=x}|Z Transfers Positivity From P(X=x |Z) to P(X=x |D_X)] Accord-
ing to RS-Theorem 5.27,

P (X =x |Z ) >
P
0 ⇔ E X =x (Y |Z ) is P-unique.

However, according to Proposition (8.40), if Z is a covariate of X , then the conjunction


DX⊥ ⊥1X =x |Z ∧ P (X =x |Z ) >P
0 implies that not only E X =x (Y |Z ) is P-unique, but also a true
X =x
outcome variable E (Y |D X ) = τx . RS-Theorem 5.27 provides some other conditions that
are equivalent to P (X =x |Z ) > P
0. ⊳

Theorems 8.31 and 8.34 immediately imply the following corollary.

Corollary 8.36 [Consequences of D X ⊥⊥ X |Z For Unbiasedness of E (Y |X, Z )]

Let the Assumptions 8.1 (a) to (d) and (g) hold, and assume that Z is a covariate of X .
Then

     D X ⊥⊥ X |Z  ∧  (∀ x ∈ X (Ω): P (X =x | Z ) >_P 0)
          ⇒  ∀ x ∈ X (Ω): E X =x (Y |Z ) ⊢ D X                                             (8.42)
          ⇔  E (Y |X, Z ) ⊢ D X .                                                          (8.43)

Note that D X ⊥⊥ X |Z is not sufficient for unbiasedness of the conditional expectations
E X =x (Y |Z ) and E (Y |X, Z ). However, in conjunction with P (X =x | Z ) >_P 0, ∀ x ∈ X (Ω), it is,
provided that Z is a covariate of X . In contrast, if Z is a covariate of X and P (X =x ) > 0
for all x ∈ X (Ω), then D X ⊥⊥ X is sufficient for unbiasedness of the conditional expectations
E X =x (Y |Z ), x ∈ X (Ω), and E (Y |X, Z ) (see also Exercises 8-6 and 8-7).
In the following theorem, we consider consequences of Z -conditional independence
of a putative cause variable X and a global potential confounder D X of X on (Z =z)-con-
ditional independence of X and the multivariate true outcome variable τ.

Theorem 8.37 [D X ⊥⊥ X |Z Implies (Z =z)-Conditional Independence of τ And X ]

Let the Assumptions 8.1 (a) to (e) and (g) hold. Then D X ⊥⊥ X |Z implies

(i)   ∀ x ∈ X (Ω): τx is P Z=z-unique
(ii)  τ ⊥⊥ X |(Z =z)
(iii) ∀ x ∈ X (Ω): τ ⊥⊥ 1X =x |(Z =z).

If we additionally assume that Z is a covariate of X , then

(iv)  ∀ x ∈ X (Ω): E Z=z(Y |X =x ) ⊢ D X
(v)   E Z=z (Y |X ) ⊢ D X .
(Proof p. 260)

Hence, under the assumptions of Theorem 8.37, D X ⊥ ⊥X |Z implies that all true out-
come variables τx , x ∈ X (Ω), are P Z=z-unique and that τ = (τ0 , τ1 , . . . , τJ ) and X are (Z =z)-
conditionally independent. Furthermore, under the additional assumption that Z is a co-
variate of X , D X ⊥⊥X |Z also implies unbiasedness of all conditional expectation values
E Z=z(Y |X =x ), x ∈ X (Ω), and of the X-conditional expectation E Z=z (Y |X ) of Y with respect
to the measure P Z=z .

Remark 8.38 [Consequences of τ⊥ ⊥X |(Z =z)] Corollary 7.14 lists some other consequences
that follow from the conjunction of τ⊥ ⊥X |(Z =z) and P Z=z -uniqueness of the true out-
come variables. Because of Proposition (ii) of Theorem 8.37, these consequences also fol-
low from D X ⊥⊥X |Z , provided that the assumptions of Theorem 8.37 hold. ⊳

In Theorem 8.37 and Remark 8.38, we only consider a single value z of the random
variable Z and assume that P (Z =z) > 0. In contrast, in the following theorem we assume
P (X =x , Z =z) > 0 for all pairs (x, z) of values of X and Z , which implies P (X =x ) > 0 for all
values x of X and P (Z =z) > 0 for all values z of Z .

Theorem 8.39 [More Consequences of D X ⊥⊥ X |Z ]

Let the Assumptions 8.1 (a) to (e) hold and assume that P (X =x , Z =z) > 0 for all pairs
(x, z) ∈ X (Ω)×Z (Ω). Then D X ⊥⊥ X |Z implies

(i)   ∀ x ∈ X (Ω): τx is P-unique.

If we additionally assume that Z is a covariate of X , then

(ii)  ∀ x ∈ X (Ω): E X =x (Y |Z ) ⊢ D X
(iii) E (Y |X, Z ) ⊢ D X .
(Proof p. 261)

Hence, under the assumptions of Theorem 8.39, we can conclude that all true outcome
variables τx , x ∈ X (Ω), are P-unique and, if Z is a covariate of X , then it also follows that all
conditional expectations E X =x (Y |Z ), x ∈ X (Ω), as well as E (Y |X, Z ) are unbiased.

Remark 8.40 [Methodological Consequences] Theorem 8.39 has important methodolog-
ical consequences. The two crucial assumptions under which the conditional expectation
E (Y |X, Z ) is unbiased can often be created by the experimenter. The first is the assump-
tion that P (X =x , Z =z) > 0 for all pairs (x, z) of values of X and Z . If, for example, X is
a binary treatment variable and Z the binary covariate sex, then the experimenter simply
has to make sure that there is a positive probability for each of the four pairs (x, z) of values
of X and Z under which the outcome variable Y is observed. The second is Z -conditional
independence of D X and X , which can be created by conditional randomization (see Rem.
8.52 and 8.59). ⊳

8.3.3 Consequences of (Z =z)-Conditional Independence of D X and X

Now we turn to some consequences of (Z =z)-conditional independence of D X and X ,


which has been introduced in Remark 8.12 by independence of D X and X with respect to
the probability measure P Z=z .

Theorem 8.41 [Consequences of D X ⊥⊥ X |(Z =z)]

Let the Assumptions 8.1 (a) to (e) and (g) hold. Then D X ⊥⊥ X |(Z =z) implies

(i)   τ ⊥⊥ X |(Z =z)
(ii)  ∀ x ∈ X (Ω): τ ⊥⊥ 1X =x |(Z =z).

If we additionally assume P Z=z (X =x ) > 0 for all x ∈ X (Ω), then D X ⊥⊥ X |(Z =z) implies

(iii) ∀ x ∈ X (Ω): τx is P Z=z-unique.

If we additionally assume P Z=z (X =x ) > 0 for all x ∈ X (Ω) and Z is a covariate of X ,
then D X ⊥⊥ X |(Z =z) also implies

(iv)  ∀ x ∈ X (Ω): E Z=z(Y |X =x ) ⊢ D X
(v)   E Z=z (Y |X ) ⊢ D X .
(Proof p. 261)

Hence, under the assumptions of Theorem 8.41, D X ⊥⊥ X |(Z =z) implies that the multi-
variate true outcome variable τ and X are (Z =z)-conditionally independent. This implies
that τ and each indicator variable 1X =x , x ∈ X (Ω), are (Z =z)-conditionally independent. If
we additionally assume P Z=z (X =x ) > 0 for all x ∈ X (Ω), then D X ⊥⊥ X |(Z =z) also implies
that all true outcome variables τx , x ∈ X (Ω), are P Z=z-unique. Finally, if Z is a covariate of
X , then D X ⊥⊥ X |(Z =z) also implies that the conditional expectation E Z=z (Y |X ) and
all the conditional expectation values E Z=z(Y |X =x ) = E (Y |X =x , Z =z), x ∈ X (Ω), are un-
biased. Other implications can be found in the last row of Table 7.2.
In the next theorem we consider some consequences of (Z =z)-conditional indepen-
dence of a global potential confounder D X of X and an indicator 1X =x for a value x of the
putative cause variable X .

Theorem 8.42 [Some Consequences of D X ⊥⊥ 1X =x |(Z =z)]

Let the Assumptions 8.1 (a) to (e) hold. Then D X ⊥⊥ 1X =x |(Z =z) implies

(i)   E X =x(Y |D X ) ⊥⊥ 1X =x |(Z =z) and E X ≠x(Y |D X ) ⊥⊥ 1X ≠x |(Z =z).

If we additionally assume 0 < P Z=z (X =x ) < 1, then D X ⊥⊥ 1X =x |(Z =z) implies

(ii)  E X =x(Y |D X ) and E X ≠x(Y |D X ) are P Z=z-unique.

If, additionally, 0 < P Z=z (X =x ) < 1 and Z is a covariate of X , then D X ⊥⊥ 1X =x |(Z =z)
also implies

(iii) E Z=z(Y |X =x ) ⊢ D X and E Z=z (Y | X ≠x) ⊢ D X
(iv)  E Z=z (Y | 1X =x ) ⊢ D X .
(Proof p. 262)

Hence, according to Theorem 8.42, (Z =z)-conditional independence of the indica-
tor 1X =x and a global potential confounder D X of X implies that the true outcome vari-
able E X =x (Y |D X ) and the indicator 1X =x as well as E X ≠x(Y |D X ) and the indicator 1X ≠x
are (Z =z)-conditionally independent. If we additionally assume 0 < P Z=z (X =x ) < 1, then
D X ⊥⊥ 1X =x |(Z =z) also implies that E X =x (Y |D X ) and E X ≠x(Y |D X ) are P Z=z-unique. Finally,
if we also add the assumption that Z is a covariate of X , then D X ⊥⊥ 1X =x |(Z =z) implies that
E Z=z (Y | 1X =x ) and the conditional expectation values E Z=z(Y |X =x ) and E Z=z (Y | X ≠x) are
unbiased.

8.4 Unbiasedness of Prima Facie Effects and Effect Functions

In section 6.4 we showed how to identify average and conditional causal total effects and
effect functions from unbiased prima facie effects and unbiased prima facie effect func-
tions. Now we turn to sufficient conditions for unbiasedness of conditional and uncon-
ditional prima facie effects and effect functions. Hence, this section is also crucial for the
identification of conditional and average causal total effects.

8.4.1 Unbiasedness of the Prima Facie Effect

In the following theorem we specify the conditions under which the prima facie effect

PFE x x ′ = E (Y |X =x ) − E (Y |X =x ′ ) (8.44)

is unbiased, that is, under which PFE x x ′ ⊢ DC . Hence, according to Definition 6.23 (i), we
specify conditions under which τx and τx ′ are P-unique and the prima facie effect PFE x x ′
is identical to the average causal total effect

ATE x x ′ = E (τx − τx ′ ) = E (τx ) − E (τx ′ ). (8.45)

Theorem 8.43 [An F-Condition Implying Unbiasedness of the Prima Facie Effect]

Let the Assumptions 8.1 (a) to (c), and (f ) hold. Then

     (D X ⊥⊥ 1X =x  ∧  D X ⊥⊥ 1X =x ′ )   ⇒   PFE x x ′ ⊢ DC .                             (8.46)
(Proof p. 263)

Remark 8.44 [Identification of ATE x x ′ ] Hence, under the assumptions of Theorem 8.43,
the conjunction of (a) independence of the indicator 1X =x and a global potential con-
founder D X of X and (b) independence of the indicator 1X =x ′ and D X implies P -unique-
ness of τx and τx ′ as well as

PFE x x ′ = E (Y |X =x ) − E (Y |X =x ′ ) = ATE x x ′ = E (τx ) − E (τx ′ ). (8.47)

That is, the conjunction D X ⊥ ⊥1X =x ∧ D X ⊥ ⊥1X =x ′ implies that the difference between the
conditional expectation values E (Y |X =x ) and E (Y |X =x ′ ) is identical to the causal aver-
age total effect of x compared to x ′ . In this context we also say that the ATE x x ′ is identified
by that difference. Note that P -uniqueness of τx follows from D X ⊥ ⊥1X =x and P -uniqueness
of τx ′ follows from D X ⊥
⊥1X =x ′ . It is not an additional assumption. ⊳

Remark 8.45 [D X ⊥⊥ X Implies PFE x x ′ = ATE x x ′ ] Also note that

     D X ⊥⊥ X   ⇒   (D X ⊥⊥ 1X =x  ∧  D X ⊥⊥ 1X =x ′ ),                                    (8.48)

which follows from σ(1X =x ), σ(1X =x ′ ) ⊂ σ(X ) and RS-Box 2.1 (iv). Hence, under the as-
sumptions of Theorem 8.43,

     D X ⊥⊥ X   ⇒   PFE x x ′ ⊢ DC                                                         (8.49)
               ⇒   PFE x x ′ = ATE x x ′ .                                                 (8.50)

This means, under D X ⊥⊥ X , the average causal total effect ATE x x ′ is identified by the prima
facie effect PFE x x ′ . ⊳
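
A small simulation may help to see what Propositions (8.49) and (8.50) mean in practice. The following Python sketch uses a made-up data-generating process (not the example of Table 8.3): under randomized assignment, where D X ⊥⊥ X holds by construction, the prima facie effect E (Y |X =1) − E (Y |X =0) recovers the average total effect of 10, whereas under assignment that depends on the unit's true outcome it does not. The distributions and the logistic selection rule are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    # individual true outcomes tau_0(u), tau_1(u); the individual effect varies across units
    tau0 = rng.normal(100, 10, size=n)
    tau1 = tau0 + rng.normal(10, 5, size=n)            # ATE = E(tau1 - tau0) = 10

    # randomized assignment: the same treatment probability 1/4 for every unit
    x_rand = rng.random(n) < 0.25
    y_rand = np.where(x_rand, tau1, tau0)
    print(y_rand[x_rand].mean() - y_rand[~x_rand].mean())   # approx. 10 (unbiased PFE)

    # confounded assignment: units with high tau_0 are treated more often
    p_conf = 1 / (1 + np.exp(-(tau0 - 100) / 5))
    x_conf = rng.random(n) < p_conf
    y_conf = np.where(x_conf, tau1, tau0)
    print(y_conf[x_conf].mean() - y_conf[~x_conf].mean())   # clearly deviates from 10 (biased PFE)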

8.4.2 Unbiasedness of a Z -Conditional Prima Facie Effect Function

In the next theorem we specify two conditions under each of which a Z -conditional prima
facie effect variable

     PFE Z ; x x ′ (Z ) =_P E X =x (Y |Z ) − E X =x ′ (Y |Z )                               (8.51)

is unbiased. Remember, in Definition 6.23 (ii), we introduced the notation

     PFE Z ; x x ′ ⊢ DC  ⇔  (τx and τx ′ are P-unique  ∧  PFE Z ; x x ′ (Z ) =_P CTE Z ; x x ′ (Z ))   (8.52)

for unbiasedness of PFE Z ; x x ′ (Z ), where

     CTE Z ; x x ′ (Z ) =_P E (τx − τx ′ |Z )                                              (8.53)

denotes a causal Z -conditional total effect variable.

Theorem 8.46 [Unbiasedness of a Z -Conditional Prima Facie Effect Variable]

If the Assumptions 8.1 (a) to (d), and (f ) hold, P (X =x | Z ), P (X =x ′ | Z ) >_P 0, and Z is a
covariate of X , then each of the following conditions implies PFE Z ; x x ′ ⊢ DC :

(i)  D X ⊥⊥ 1X =x |Z ∧ D X ⊥⊥ 1X =x ′ | Z
(ii) D X ⊥⊥ X |Z .
(Proof p. 264)

Remark 8.47 [Identification of CTE Z ; xx ′ (Z )] Hence, under the assumptions of Theorem
8.46, the conjunction of Z -conditional independence of a global potential confounder D X
of X and the indicator 1X =x and of D X and 1X =x ′ implies that the Z -conditional prima facie
effect variable PFE Z ; x x ′ (Z ) is unbiased.
   Under the same assumptions, the same conclusion can be drawn from Z -conditional
independence of D X and X . Because unbiasedness of PFE Z ; x x ′ (Z ) implies

     PFE Z ; x x ′ (Z ) =_P CTE Z ; x x ′ (Z )

[see Prop. (8.52)], we also say that the causal Z -conditional total effect variable CTE Z ; xx ′ (Z )
is identified by PFE Z ; x x ′ (Z ), provided that the assumptions of Theorem 8.46 hold. ⊳
Remark 8.48 [PFE Z ; x x ′ ⊢ DC And the Identification of the Causal Average Total Effect] If
PFE Z ; x x ′ is unbiased, then we can not only identify the causal Z -conditional total effect
variable CTE Z ; xx ′ (Z ) (see Rem. 8.47), but also the causal average total effect ATE x x ′ , even
if D X ⊥
⊥X and Equation (8.47) do not hold. For details see Theorem 6.34 and the example
presented in Table 6.4. ⊳
According to the following theorem, independence of X and a global potential con-
founder D X of X is a sufficient condition for unbiasedness of a Z -conditional prima facie
effect function PFE Z ; x x ′ , and it does not require the additional assumptions P (X =x | Z ) >_P 0
and P (X =x ′ | Z ) >_P 0 that are indispensable in Theorem 8.46. Instead, these additional as-
sumptions already follow from D X ⊥⊥ X .

Theorem 8.49 [Again Unbiasedness of a Z -Conditional Prima Facie Effect Variable]

If the Assumptions 8.1 (a) to (d) and (f ) hold, and Z is a covariate of X , then each of the
following conditions implies PFE Z ; x x ′ ⊢ DC :

(i)  D X ⊥⊥ 1X =x ∧ D X ⊥⊥ 1X =x ′
(ii) D X ⊥⊥ X .
(Proof p. 264)

8.4.3 Unbiasedness of a (Z =z)-Conditional Prima Facie Effect

Now we consider a single value z of Z and specify conditions under which a (Z =z)-con-
ditional prima facie effect

     PFE Z ; x x ′ (z) = E X =x (Y |Z =z) − E X =x ′ (Y |Z =z)                              (8.54)

is unbiased. Note that PFE Z ; x x ′ (z) is the value of the effect function PFE Z ; x x ′ : ΩZ′ → R . Also
remember that PFE Z ; x x ′ (z) ⊢ DC denotes unbiasedness of PFE Z ; x x ′ (z), which is defined by

     PFE Z ; x x ′ (z) ⊢ DC  ⇔  (PFE Z ; x x ′ (z) = CTE Z ; xx ′ (z)  ∧  τx , τx ′ are P Z=z-unique)   (8.55)

[see Def. 6.23 (iii)]. Presuming P Z=z -uniqueness of τx and τx ′ ,

CTE Z ; x x ′ (z) = E (τx |Z =z) − E (τx ′ |Z =z) (8.56)

[see Eq. (5.31)] is the causal (Z =z)-conditional total effect and CTE Z ; xx ′ (z) is the value of
the causal Z -conditional total effect function CTE Z ; x x ′ : ΩZ′ → R .

Theorem 8.50 [Unbiasedness of the (Z =z)-Conditional Prima Facie Effect]

Let the Assumptions 8.1 (a) to (f ) hold, let P (X =x | Z =z), P (X =x ′ | Z =z) > 0, and let Z
be a covariate of X . Then each of the following conditions implies PFE Z ; x x ′ (z) ⊢ DC :

(i)   D X ⊥⊥ 1X =x |(Z =z) ∧ D X ⊥⊥ 1X =x ′ |(Z =z)
(ii)  D X ⊥⊥ 1X =x |Z ∧ D X ⊥⊥ 1X =x ′ | Z
(iii) D X ⊥⊥ X |Z
(iv)  D X ⊥⊥ X |(Z =z).
(Proof p. 264)

Remark 8.51 [Identification of CTE Z ; xx ′ (z)] Given the assumptions of Theorem 8.50, each
of the four conditions listed in this theorem implies that the causal (Z =z)-conditional
total effect is identical to the difference between the two conditional expectation values
E (Y |X =x , Z =z) and E (Y |X =x ′, Z =z). That is, each of these four conditions implies

CTE Z ; x x ′ (z) = PFE Z ; x x ′ (z) = E (Y |X =x , Z =z) − E (Y |X =x ′, Z =z) . (8.57)

In this context we also say that CTE Z ; xx ′ (z) is identified by the (Z =z)-conditional prima
facie effect PFE Z ; x x ′ (z), that is, it is identified by the difference between the conditional
expectation values E (Y |X =x , Z =z) and E (Y |X =x ′, Z =z). ⊳
Remark 8.52 [Conditional Randomization] Note that D X ⊥ ⊥X |Z [see Th. 8.50 (iii)] can be
created by conditional randomization, that is, by randomized assignment of the unit to a
treatment x, conditional on the values z of a covariate Z . In this case, the treatment prob-
abilities P (X =x | Z =z) can be fixed by the experimenter and may differ between different
values z of the covariate Z (see Rem. 8.15). ⊳
Remark 8.53 [Covariate Selection] Also note that we may try to select the (possibly multi-
variate) covariate Z = (Z 1 , . . . , Z m ) in such a way that D X ⊥
⊥X |Z holds. For instance, severity
of the disorder, knowing about the treatment, and availability of the treatment often are
candidates for such covariates. However, there is no guarantee that D X ⊥ ⊥X |Z holds for a
specified (univariate or multivariate) covariate Z . This is why the conditions specified in
the following theorem are of utmost importance. ⊳

Theorem 8.54 [Again Unbiasedness of a (Z =z)-Conditional Prima Facie Effect]

If the Assumptions 8.1 (a) to (f ) hold and Z is a covariate of X , then each of the following
conditions implies PFE Z ; x x ′ (z) ⊢ DC :

(i)  D X ⊥⊥ 1X =x ∧ D X ⊥⊥ 1X =x ′
(ii) D X ⊥⊥ X .
(Proof p. 265)

Remark 8.55 [Randomization] Note that D X ⊥ ⊥X [see Th. 8.54 (ii)] can be created in an
experiment by randomization, that is, by randomized assignment of the sampled unit to a
treatment condition (represented by a value x of X ). Also note that Z can be any covariate
of X and remember, according to Theorem 4.31, Definition 4.11 (iv), and Remark 4.16,
every random variable on (Ω, A, P ) that is prior in (Ft )t ∈T to X is a covariate of X . ⊳

8.5 Examples

Now we illustrate the causality conditions treated in this chapter by some examples. In
RS-chapter 1 we introduced an example with independence of X and a global potential
confounder D X (see RS-Table 1.2). A second example has already been presented in chap-
ter 6 (see Table 6.3), and in the same chapter, there is also an example with Z -conditional
independence of D X and X (see Table 6.4). The structure of these random experiments has
already been treated in section 6.5, that is,
     ((Ω, A, P ), (Ft )t ∈T , C, DC , X , Y ),

as specified in section 6.5, is the regular probabilistic causality setup. Furthermore, the
observational-unit variable U is a global potential confounder, that is, D X = U , implying

     E (Y | X , D X ) =_P E (Y | X ,U )

and

     P (X =1|D X ) =_P P (X =1|U ).
In this equation, P (X =1|U ) denotes the individual treatment probability function, whose
values are the individual treatment probabilities P (X =1|U =u ) (see RS-Remarks 4.12 and
4.25). If the values u of U represent the observational units at the onset of treatment, then
D X = U will hold in empirical applications if
(a) no fallible covariate is assessed, and
(b) there is no other variable that is simultaneous to the treatment variable X (such as
a second treatment variable).
If a fallible covariate is assessed, then this covariate, say Z , is not measurable with respect
to U , which implies that it is not identical to a composition f (U ) of U and some map f .
In this case, D X = (U , Z ) is a global potential confounder of X , unless there are still other
fallible covariates of X .

Example 8.56 [Independence of X and U ] Table 6.3 displays an example in which D X ⊥⊥ X
holds, where D X = U . In this table, it is easily seen that the individual treatment probabili-
ties are the same for all units, that is,

     P (X =1|U ) = P (X =1) = 3/4.

Note that U ⊥⊥ X implies independence of X and all random variables on (Ω, A, P ) that are
measurable with respect to U [see RS-Box 2.1 (iv)].
   In the same example, D X ⊥⊥ X |Z holds as well, where Z = sex, because D X = U and Z is
measurable with respect to U . Furthermore,

     P (X =1|U , Z ) =_P P (X =1| Z ) =_P P (X =1) = 3/4.
Note that U ⊥ ⊥X | Z implies that X and all random variables that are measurable with re-
spect to U are also Z -conditionally independent [see RS-Box 6.1 (vi)].
If D X =U , then, according to Theorem 8.22 (v), independence of U and X implies unbi-
asedness of the conditional expectation E (Y |X ). Furthermore, because, in this example,

X is binary and P (X =0) > 0, P (X =1) > 0, independence of X and U also implies unbiased-
ness of the conditional expectation values E (Y |X =0) and E (Y |X =1) [see Th. 8.22 (iv)] as
well as unbiasedness of the prima facie effect

PFE 10 := E (Y |X =1) − E (Y |X =0) = 102.333 − 92.333 = 10

(see Th. 8.43), and this implies PFE 10 = ATE 10 .


As already stated above, U ⊥ ⊥X | Z holds as well for Z = sex. Because D X =U and Z is U -
measurable, according to Theorem 8.27 (iv), the conditional expectation E (Y |X, Z ) is un-
biased. Furthermore, because P (X =0, Z =z), P (X =1, Z =z) > 0 for both values z of Z , this
implies unbiasedness of all conditional expectation values E Z =z (Y |X =x ) = E (Y |X =x ,Z =z)
[see Th. 8.29 (iv)], and that the (Z =z)-conditional prima facie effects

PFE Z ;10 (m) = E (Y | X =1, Z =m) − E (Y | X = 0, Z =m) ≈ 92.50 − 83.00 ≈ 9.50

and
PFE Z ; 10 ( f ) = E (Y | X =1, Z = f ) − E (Y | X = 0, Z = f ) = 122 − 111 = 11
are unbiased (see Th. 8.50).
Also note that
     PFE 10 = E [PFE Z ; 10 (Z )] = 9.50 · 4/6 + 11 · 2/6 = 10.
This means that the prima facie effect is the expectation of the corresponding conditional
prima facie effect variable. And because PFE Z ; 10 (Z ) is unbiased, it is identical to the causal
Z -conditional total effect variable CTE Z ;10 (Z ), and its expectation is identical to the causal
average total effect ATE 10 (see Th. 6.34). ⊳
Example 8.57 [Z -Conditional Independence of X and U ] In Table 6.4 we already treated
an example in which D X ⊥ ⊥X |Z holds, whereas D X ⊥ ⊥X does not. Again, D X =U . As is easily
seen in this table, the individual treatment probabilities are the same for all units within
each of the two subsets of males and females, that is,

     for each male unit u :    P (X =1|U =u , Z =m) = P (X =1| Z =m) = 3/4

and

     for each female unit u :  P (X =1|U =u , Z = f ) = P (X =1| Z = f ) = 1/4.
Hence, the individual treatment probabilities are 3/4 for males and 1/4 for females. These
individual treatment probabilities differ for different values of the covariate Z , but they are
invariant given a value of the covariate Z . Note that the conditional treatment probability
P (X =1| Z =m) = 3/4 is also the individual treatment probability P (X =1|U =u ) for the four
male units, and the conditional treatment probability P (X =1| Z = f ) = 1/4 is also the indi-
vidual treatment probability P (X =1|U =u ) for the two female units. This follows from the
fact that P (X =1|U , Z ) and P (X =1|Z ) are conditional expectations [see RS-Eq. (4.10)] and
     P (X =1|U ) =_P E ( P (X =1|U , Z ) | U )      [σ(U , Z ) ⊂ σ(U ), RS-Box 4.1 (xiii)]
                 =_P E ( P (X =1| Z ) | U )         [P (X =1|U , Z ) =_P P (X =1| Z )]
                 =_P P (X =1| Z ).                  [σ(P (X =1| Z )) ⊂ σ(U ), RS-Box 4.1 (xi)]

In Table 6.4 there is only Z -conditional independence of U and the treatment variable
X , that is, U ⊥
⊥X | Z , whereas U ⊥⊥X does not hold. Therefore, in this table, the prima facie
effect
PFE 10 = E (Y |X =1) − E (Y |X =0) ≈ 96.715 − 99.800 ≈ −3.085
is biased, because the average total effect in this example is ATE 10 = 10 (see Exercise 8-9).
However, the (Z =z)-conditional prima facie effects are unbiased. In fact, they are the same
as in Table 6.3 (see Example 8.56). Hence, we can use the conditional prima facie effects to
compute the average total effect. If the conditional prima facie effects are unbiased, that is,
if they are equal to the corresponding causal conditional total effects, then the expectation
of the conditional prima facie effect variable is equal to the causal average total effect (see
Th. 6.34). In our example, this expectation is,
¡ ¢ 4 2
ATE 10 = E PFE Z ;10 (Z ) = PFE Z ;10 (m) · + PFE Z ; 10 ( f ) ·
6 6
4 2
= 9.50 · + 11 · = 10.00.
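
The standardization step used in this example is easy to reproduce numerically. The following Python fragment is a check of the numbers reported above and in Table 6.4 (not a general estimation procedure): it computes the expectation of the unbiased conditional prima facie effect variable and thereby the average total effect ATE 10 , although the unconditional prima facie effect (about −3.085) is biased.

    pfe_z = {"m": 9.50, "f": 11.0}     # unbiased (Z=z)-conditional prima facie effects
    p_z   = {"m": 4/6,  "f": 2/6}      # P(Z=z) in Table 6.4

    ate_10 = sum(pfe_z[z] * p_z[z] for z in pfe_z)   # E[PFE_{Z;10}(Z)]
    print(ate_10)                                    # 10.0 = ATE_10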
6 6

8.6 Methodological Consequences

Now we discuss the conclusions from the theory treated in this chapter for the design and
analysis of experiments and quasi-experiments. Theorems 8.22 and 8.27 are the theoreti-
cal foundation of the experimental design technique of randomization and of the analysis
of causal conditional and causal average total treatment effects in experiments by compar-
ing (unadjusted) means between treatment conditions. Theorem 8.31 is the theoretical
foundation for the experimental design technique of conditional randomization and of
the analysis of causal conditional total treatment effects. Together with Theorem 6.34, this
theorem can also be used for the analysis of causal average total effects. Finally, Theorems
8.31 and 8.34, and Corollary 8.36 can be used in (nonrandomized) quasi-experiments for
covariate selection and the analysis of causal conditional and causal average total effects.
This will now be explained in more detail.

Remark 8.58 [Randomization] It is well-known at least since Fisher (1925/1946) that


randomization plays a crucial role in ruling out bias in comparisons of means. In a ran-
domized experiment, there is always an observational-unit variable U whose value u de-
notes the observational unit that is sampled if the random experiment considered is ac-
tually conducted. The definition of a global potential confounder D X [see Def. 4.11 (iii)]
guarantees that U is measurable with respect to a global potential confounder D X . In fact,
in a simple experiment, in which we do not observe any fallible pretests and in which
there is only one single treatment variable X , the random variable U itself is a global po-
tential confounder of X . Such a global potential confounder of X comprises all potential
confounders of X in the sense that they are measurable with respect to the global poten-
tial confounder. In any case, if X represents a discrete treatment variable, then D X ⊥
⊥X is
equivalent to

     P (X =x |D X ) =_P P (X =x |U ) =_P P (X =x ),   ∀ x ∈ X (Ω).                         (8.58)

The last one of these two equations implies that all units u ∈ U (Ω) have the same probabil-
ity P (X =x |U =u ) = P (X =x ) to be assigned to treatment x, and this holds for all x ∈ X (Ω).
Remember that we are talking about a single-unit trial. A simple example of such a
single-unit trial consists of sampling a single observational unit from a set of units, pos-
sibly assessing a number of covariates, assigning the unit (or observing its assignment) to
one of the treatment conditions and observing the outcome variable (see ch. 2 for more
details and other kinds of single-unit trials).
Note that Equation (8.58) does not imply that the probabilities P (X =x ) are the same for
all treatment conditions x. If, for example, there are two treatment conditions, say 0 and 1,
then the two treatment probabilities might be P (X =1) = 1/4 and P (X = 0) = 3/4. However,
DX ⊥ ⊥X implies P (X =1|U ) = P (X =1) because σ(U ) ⊂ σ(D X ). In other words, D X ⊥ ⊥X im-
plies that the individual treatment probabilities P (X =x |U =u ) are identical for all units,
and they are equal to the (unconditional) treatment probability P (X =x ). Hence, in a per-
fect randomized experiment we ensure D X ⊥ ⊥X . If P (X =x ) > 0 for all x ∈ X (Ω), then this
condition implies that the conditional expectation values E (Y |X =x ), x ∈ X (Ω), are unbi-
ased (see Th. 8.22) and that a prima facie effect E (Y |X =x ) − E (Y |X =x ′ ) is identical to the
average causal total effect ATE x x ′ (see Th. 8.43).
It is important to note that through a perfect randomized experiment we do not only
create D X ⊥⊥X but also D X ⊥ ⊥X |W for each potential confounder W of X (see Th. 8.17). If
P (X =x ) > 0 for all x ∈ X (Ω), then this condition implies P (X =x |W ) >_P 0 for all x ∈ X (Ω),
which in turn implies that each conditional expectation E X =x(Y |W ), x ∈ X (Ω), the condi-
tional expectation E (Y | X ,W ), and each prima facie effect variable

     PFE W ; x x ′ (W ) = E X =x (Y |W ) − E X =x ′ (Y |W ),   x, x ′ ∈ X (Ω),

are unbiased (see Cor. 8.36 and Th. 8.46). ⊳

Remark 8.59 [Conditional Randomization] In the single-unit trial of an experiment or


quasi-experiment, in which a unit u is sampled and a value z of a covariate Z is assessed
prior to treatment, conditional randomization refers to randomized assignment of the unit
u to the treatment condition x with probability P (X =x | Z =z), x ∈ X (Ω). In this procedure
we have to make sure that treatment assignment only depends on the values z of Z but
not on any other attribute of the units or any other covariate. In more formal terms, we
create P (X =x |D X ) =_P P (X =x | Z ) for all values x ∈ X (Ω), where X (Ω) denotes the image of
Ω under X , that is, the set of all values of the treatment variable X .
We distinguish two cases. First, if Z is assessed without error, then each value z of Z
represents an attribute of the observational units. In this case, there is a map f such that
Z = f (U ) is the composition of U and f . This implies that U is D X -measurable (see RS-
Cor. 2.36). Table 6.4 contains an example with conditional independence of U and treat-
ment variable X given that the sampled person is male or is female, the two values of Z .
Second, if the covariate Z is assessed with error, that is, Z = f (U )+ε for some non-constant
random variable ε, then a value z of Z does not represent an attribute of the units be-
cause z is not only determined by u but also by other factors and/or chance. Nevertheless,
treatment can be assigned such that the treatment probability deterministically depends
on the observed score z, and not on the attribute f (u ) of the unit u itself. In both cases,
Z = f (U ) and Z = f (U ) + ε, conditional randomized assignment of a unit u to one of the
treatment conditions given a value z of Z secures that a global potential confounder D X
satisfies D X ⊥
⊥X |Z , that is, Z -conditional independence of D X and X . In the first case,

D X = U and in the second case D X = (U , Z ), provided there are no other covariates of X


than Z that are assessed with error. (Covariates that are assessed without error are maps
of U and therefore U -measurable.)
Conditionally randomized assignment allows units with different values z1 and
z2 of Z to have different probabilities of being assigned to treatment x. Thus it is possible to
assign a unit for which we observe a covariate value z1 (e. g., high need) to a treatment x
with a higher probability than a unit with covariate value z2 (e. g., low need). In this way
we may respect ethical and/or other requirements without compromising the validity of
causal inference. Remember, unconditionally randomized assignment means that each
unit has the same probability of being assigned to a given treatment, regardless of his or
her need and any other attribute of the unit.
For simplicity, suppose there are just two treatment conditions, X = 0 (control) and X =1
(treatment). Then conditional randomization consists of:
(a) fixing the conditional treatment probabilities P (X =1| Z =z ) for all values z of the
covariate Z of X ,
(b) sampling a unit u and assessing the value z of the covariate Z , and
(c) randomly assigning the unit with probability P (X =1| Z =z) to treatment 1.
If, for example, the covariate has three values, say high, medium, and low need, we may
toss a die and assign a unit with high need to treatment if the die shows less than six dots,
and assign it to control otherwise. A unit with medium need might be assigned to treat-
ment if the die shows less than four dots, and to control otherwise. Finally, a unit with low
need might be assigned to treatment if the die shows one dot, and to control otherwise.
If the covariate Z has a finite number of values, then the conditional treatment proba-
bilities P (X =1| Z =z) may be fixed in a table by assigning a treatment probability to each
value z that seems appropriate with respect to ethical and other requirements. In the ex-
ample above, these were the values 5/6 for high need, 3/6 for medium need and 1/6 for low
need. If the covariate is univariate continuous, then we may also use a function, such as
     P (X =1| Z ) = exp(λ0 + λ1 · Z ) / (1 + exp(λ0 + λ1 · Z ))
with real-valued coefficients λ0 and λ1 that seem appropriate for the experiment consid-
ered.
If P (X =x , Z =z), P (X =x ′, Z =z) > 0, then Z -conditional randomization implies that the
(Z =z)-conditional prima facie effect PFE Z ; x x ′ (z) = E (Y |X =x ,Z =z) − E (Y |X =x ′, Z =z) is
identical to the causal conditional total effect given a value z of the covariate Z (see
Th. 8.50 for the mathematical details). Hence, studying conditional prima facie effects in
a perfect conditionally randomized experiment is tantamount to studying causal condi-
tional total effects. And once we know the causal conditional total effects given the values
z of a covariate Z , then we can also compute the causal average total effect (see Th. 6.34).
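
The procedure (a) to (c) can be summarized in a few lines of code. The following Python sketch implements conditional randomized assignment for the need covariate of the die example above; the probability table, the function name, and the example value are illustrative choices, not prescriptions.

    import random

    p_treat = {"high": 5/6, "medium": 3/6, "low": 1/6}   # fixed P(X=1 | Z=z)

    def assign_treatment(z, rng=random):
        """Return 1 (treatment) or 0 (control) for a unit with observed covariate value z."""
        return 1 if rng.random() < p_treat[z] else 0

    # single-unit trial: sample a unit, assess z, then randomize conditionally on z only
    z_observed = "medium"                 # e.g., the assessed need of the sampled unit
    x = assign_treatment(z_observed)

The crucial design feature is that the assignment depends on the observed value z alone, so that the treatment probabilities P (X =1| Z =z) are fixed by the experimenter, exactly as required for D X ⊥⊥ X |Z .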

Remark 8.60 [Covariate Selection in Quasi-Experiments] Covariate selection is the first
step in a number of techniques for the analysis of causal conditional and causal average to-
tal effects in quasi-experiments, in which by definition of the quasi-experiment, the exper-
imenter cannot fix the true treatment probabilities himself. The steps to follow distinguish
different techniques of analysis. By definition, in a quasi-experiment, there is no random-
ization and no conditional randomization. However, even under initial randomization,

systematic attrition of subjects may invalidate the F-condition D X ⊥⊥X (see, e. g., Abraham
& Russell, 2004; Fichman & Cummings, 2003; Graham & Donaldson, 1993; Shadish et al.,
2002). In this case, we will say that randomization failed and the initially randomized ex-
periment turned into a quasi-experiment.
In a quasi-experiment, selecting the covariates in the covariate vector Z := (Z 1 , . . . , Z m )
for which we can hope that D X ⊥ ⊥X |Z holds is a useful strategy in the analysis of causal
conditional and causal average total treatment effects. However, note that there might be
many covariates determining the treatment probabilities. For instance, the severity of the
disorder, knowing about the treatment, and availability of the treatment are candidates for
such covariates. To emphasize, there is no guarantee that D X ⊥ ⊥X |Z holds for a specified
(univariate or multivariate) covariate Z , unless the conditional probabilities P (X =x | Z ),
x ∈ X (Ω), are fixed by the experimenter. ⊳

Remark 8.61 [Testability] As already mentioned before, in contrast to the causality con-
ditions treated in chapters 6 and 7, the causality conditions D X ⊥ ⊥X and D X ⊥
⊥X |Z can
be tested in empirical applications, at least in the sense that some consequences of these
conditions can be checked. If at least one of these consequences does not hold, then we
say that the corresponding causality condition is falsified.
Let us briefly outline how we can test the assumption that D X ⊥⊥ X holds. Remember, if
X (Ω) is finite or countable, then D X ⊥⊥ X is equivalent to

     P (X =x |D X ) =_P P (X =x ),   ∀ x ∈ X (Ω).                                          (8.59)

This implies that

     P (X =x |W ) =_P P (X =x ),   ∀ x ∈ X (Ω)                                             (8.60)
holds for each random variable W that is measurable with respect to D X (see again Ex-
ercise 8-8). Examples for such random variables W are sex, educational status, but also
fallible pretests, that is, variables that are assessed with measurement error before the on-
set of treatment.
Similarly, if X (Ω) is finite or countable, then D X ⊥⊥ X |Z is equivalent to

     P (X =x |D X ) =_P P (X =x | Z ),   ∀ x ∈ X (Ω).                                      (8.61)

(Remember that Z denotes a covariate of X , which, by definition, is D X -measurable.) This
equation implies

     P (X =x |W, Z ) =_P P (X =x | Z ),   ∀ x ∈ X (Ω),                                     (8.62)
for each potential confounder of X , that is, for each random variable W that is measur-
able with respect to D X (see Exercise 8-13). Equations (8.60) and (8.62) can be tested using
standard procedures for the analysis of conditional probabilities such as logistic regres-
sion (see, e. g., Agresti, 2007; Bonney, 1987; Green, 2003; Hosmer & Lemeshow, 2000) or
probit regression (see, e. g., McCullagh & Nelder, 1989; Borooah, 2001; Liao, 1994). ⊳
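
As an illustration of such a test, the following Python sketch fits two logistic regression models, one for P (X =1| Z ) and one for P (X =1| Z ,W ), and compares them by a likelihood-ratio test. It presupposes a data set df, assumed here to be a pandas data frame with a binary treatment variable X, a covariate Z, and a further potential confounder W, all numerically coded; the variable names and the use of statsmodels are illustrative assumptions, not part of the theory. A small p-value speaks against Equation (8.62) and therefore against D X ⊥⊥ X |Z ; a large p-value does not prove it.

    import statsmodels.api as sm
    from scipy.stats import chi2

    def lr_test(df):
        """Likelihood-ratio test of whether W adds to Z in predicting treatment."""
        y = df["X"]
        small = sm.Logit(y, sm.add_constant(df[["Z"]])).fit(disp=0)        # model for P(X=1 | Z)
        large = sm.Logit(y, sm.add_constant(df[["Z", "W"]])).fit(disp=0)   # model for P(X=1 | Z, W)
        lr = 2 * (large.llf - small.llf)           # likelihood-ratio statistic
        p_value = chi2.sf(lr, df=1)                # one additional parameter for W
        return lr, p_value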
Remark 8.62 [The Covariate Selection Process] If Equation (8.62) does not hold for spec-
ified D X -measurable random variables W and Z , then D X ⊥ ⊥X |Z , which implies Equation
(8.62), cannot hold as well. In this case we may define Z ∗ := (Z ,W ) and select a new D X -
measurable random variable W ∗, and check if

     P (X =x |W ∗, Z ∗ ) =_P P (X =x | Z ∗ ),   ∀ x ∈ X (Ω),                               (8.63)

holds. This process can be continued as long as there is doubt that

     P (X =x |D X ) =_P P (X =x | Z ∗ ),   ∀ x ∈ X (Ω).                                    (8.64)

Of course, such a procedure does not guarantee that we find a (possibly multivariate) co-
variate Z ∗ such that Equation (8.64) holds. Instead, in a quasi-experiment, Equation (8.64)
always remains an assumption. However, this assumption can always be falsified, which
has a positive and a negative side. The negative side is that we can never be sure that this
assumption holds. The positive side is that this assumption is empirically testable and, in
this sense, it is not just a matter of belief (cf. Popper, 2005). ⊳
Remark 8.63 [Generalizability of D X ⊥ ⊥X ] If D X ⊥
⊥X holds, then, according to Theorem
8.17, D X ⊥⊥X |W holds as well, provided that W is a potential confounder of X , or synony-
mously, a covariate of X . Under the assumptions of Theorem 8.22, this implies: If D X ⊥ ⊥X
holds, then E (Y |X ) is unbiased, and under the assumptions of Theorem 8.27, which in-
clude that Z is a covariate of X , it also implies unbiasedness of E (Y |X, Z ). Conditioning on
a covariate Z of X may be meaningful in order to obtain a causal conditional total effect
function CTE Z ; x x ′ that is more fine-grained and therefore provides more specific infor-
mation than the causal average total effect ATE x x ′ .
If, for example, Z denotes the covariate sex with values m (male) and f (female), and
P (X =x , Z =z), P (X =x ′, Z =z) > 0 for both values z of Z , then D X ⊥
⊥X does not only imply
PFE x x ′ = ATE x x ′ but also

PFE Z ; x x ′ (z) = E (Y | X =x , Z =z) − E (Y | X =x ′, Z =z) = CTE Z ; xx ′ (z),

where CTE Z ; xx ′ (z) denotes the causal (Z =z)-conditional total effect comparing x to x ′ .
This applies to males (z =m) and to females (z = f ). Even if PFE x x ′ = PFE Z ; x x ′ (m) =
PFE Z ; x x ′ ( f ), this would be important information, extending the substantive inter-
pretation of the causal average total effect ATE x x ′ because then it would be identical to
the causal conditional total effects CTE Z ; x x ′ (m) and CTE Z ; x x ′ ( f ).
In order to emphasize the importance of this point for substantive research, imagine
an empirical study in which we could interpret E (Y | X =x ) − E (Y | X =x ′ ) as
the causal average total treatment effect while at the same time the corresponding differ-
ences E (Y | X =x , Z =z) − E (Y | X =x ′, Z =z) had no causal interpretation! ⊳
Remark 8.64 [Generalizability of D X ⊥ ⊥X |Z ] Now assume that D X ⊥ ⊥X |Z holds, where Z
is a random variable on (Ω, A, P ), but not necessarily a covariate of X . If W is a poten-
tial confounder of X , then, according to Theorem 8.18, this implies that we also have
DX⊥ ⊥X |(Z ,W ), that is, (Z ,W )-conditional independence of D X and X . Hence, if we con-
dition on Z and D X ⊥ ⊥X |Z holds, then there is no need to control for further potential con-
founders, at least not for the purpose of establishing unbiasedness. However, controlling
for W may still be meaningful in order to obtain a more fine-grained total effect function
CTE Z ,W ; x x ′ that contains more specific information than CTE Z ; x x ′ .
Under the Assumptions 8.1 (a) to (d) and (g), and provided that P (X =x | Z ,W ) >_P 0 for all x ∈ X (Ω)
and that Z is a covariate of X , the condition D X ⊥⊥ X |Z implies unbiasedness not only of E (Y |X, Z )
but, via D X ⊥ ⊥X |(Z ,W ), also of E (Y |X , Z ,W ) (see Cor. 8.36), no matter which other poten-
tial confounder W of X we consider. Controlling, additionally to Z , for another potential
confounder W of X may still be meaningful in order to obtain a causal conditional total
effect function CTE Z ,W ; xx ′ that is more fine-grained than CTE Z ; xx ′ . ⊳

8.7 Summary and Conclusions

In chapter 6 we showed that unbiasedness of the conditional or unconditional prima fa-


cie effects is crucial for computing causal conditional and unconditional total effects from
(the empirically estimable) conditional or unconditional prima facie effects. In chapter 7
we treated several causality conditions that imply unbiasedness, which cannot be tested
in empirical applications. The most restrictive of these conditions is Rosenbaum and Ru-
bin’s strong ignorability. In the present chapter, we introduced a first kind of causality con-
ditions that are empirically testable. These causality conditions, which are called Fisher
conditions, are summarized in Box 8.1.
The implication relations among the causality conditions treated in this chapter are
listed in Table 8.1 and the implications for the Rosenbaum-Rubin conditions are summa-
rized in Table 8.2. This table also includes τ⊥ ⊥ X and the strong ignorability condition
τ⊥ ⊥X | Z that have been treated in chapter 7. The consequences of these two conditions
on other causality conditions are summarized in the last rows of Table 7.1 and Table 7.2,
respectively. Hence, putting these tables together yields long chains of implications. Obvi-
ously, D X ⊥⊥X is the most powerful causality condition. If P (X =x ) > 0 for all x ∈ X (Ω) and Z
is a covariate of X , then it does not only imply unbiasedness of all E (Y |X =x ), E X =x (Y |Z ),
x ∈ X (Ω), as well as unbiasedness of E (Y |X ) and E (Y |X, Z ), but it also implies all other
conditions treated until now (including those dealt with in ch. 7). It even implies that all
true outcome variables are P-unique. In contrast, assuming D X ⊥⊥ X |Z but not D X ⊥⊥ X ,
we need the additional assumptions that Z is a covariate of X and P (X =x | Z ) >_P 0 in order
to show that all E X =x (Y |Z ), x ∈ X (Ω), as well as E (Y |X, Z ) are unbiased.
In experiments, the condition D X ⊥ ⊥X can be created by randomized assignment of the
observational unit to one of the treatment conditions represented by a value x of X . Simi-
larly, D X ⊥
⊥X |Z can be created by conditionally randomized assignment based on the val-
ues z of covariate Z , provided of course, that Z is a pretreatment variable.
Outside the randomized experiment, and even in cases in which X does not denote a
treatment variable that is manipulated by an experimenter, D X ⊥ ⊥X |Z may also be valid if
the covariate Z is selected appropriately (see Rem. 8.60 to 8.62).
Unlike unbiasedness and the other causality conditions treated in chapter 7, indepen-
dence of X and a global potential confounder D X of X as well as Z -conditional inde-
pendence of X and D X are empirically testable. Suppose that X is discrete. In order to
test D X ⊥⊥X , we simply have to select a potential confounder W of X and investigate if
P (X =x |W ) =_P P (X =x ) holds for all values x of X . Similarly, if X is discrete and Z is a co-
variate of X , then we can test D X ⊥⊥ X |Z by selecting a potential confounder W of X and
checking if P (X =x | Z ,W ) =_P P (X =x | Z ) holds for all values x of X .

8.8 Proofs

Proof of Theorem 8.17

Proposition (8.29). By definition of a potential confounder of X , σ(W ) ⊂ σ(D X ), and
this implies σ(W, D X ) = σ(D X ) [see RS-Prop. (2.19)]. Therefore,

     D X ⊥⊥ X  ⇔  ∀W ∈ WX : (W, D X ) ⊥⊥ X                    [σ(W, D X ) = σ(D X )]
              ⇒  ∀W ∈ WX : D X ⊥⊥ X |W .                      [RS-Box 6.1 (ix)]

Proposition (8.30). Correspondingly,

     D X ⊥⊥ 1X =x  ⇔  ∀W ∈ WX : (W, D X ) ⊥⊥ 1X =x            [σ(W, D X ) = σ(D X )]
                   ⇒  ∀W ∈ WX : D X ⊥⊥ 1X =x |W .             [RS-Box 6.1 (ix)]

Proof of Theorem 8.18

Proposition (8.32). By definition of a potential confounder of X , σ(W ) ⊂ σ(D X ), and
this implies σ(W, D X ) = σ(D X ) [see RS-Prop. (2.19)]. Therefore,

     D X ⊥⊥ X |Z  ⇔  ∀W ∈ WX : (W, D X ) ⊥⊥ X | Z             [σ(W, D X ) = σ(D X )]
                  ⇒  ∀W ∈ WX : D X ⊥⊥ X |(Z ,W ).             [RS-Box 6.1 (viii)]

Proposition (8.33). Analogously,

     D X ⊥⊥ 1X =x |Z  ⇔  ∀W ∈ WX : (W, D X ) ⊥⊥ 1X =x |Z      [σ(W, D X ) = σ(D X )]
                      ⇒  ∀W ∈ WX : D X ⊥⊥ 1X =x |(Z ,W ).     [RS-Box 6.1 (viii)]

Proof of Theorem 8.22

Proposition (i). This follows from σ(τ) ⊂ σ(D X ) and RS-Box 2.1 (iv).

Proposition (ii).

     D X ⊥⊥ X  ⇒  τ ⊥⊥ X                                       [(i)]
              ⇔  ∀ x ∈ X (Ω): τ ⊥⊥ 1X =x .                     [(7.23)]

Proposition (iii).

     D X ⊥⊥ X  ⇔  ∀ x ∈ X (Ω): P (X =x |D X ) =_P P (X =x )     [(8.5)]
              ⇒  ∀ x ∈ X (Ω): P (X =x |D X ) >_P 0              [P (X =x ) > 0, SN-(2.40)]
              ⇔  ∀ x ∈ X (Ω): τx is P-unique.                   [RS-Th. 5.27]

Propositions (iv) and (v).

     D X ⊥⊥ X  ⇒  τ ⊥⊥ X ∧ (∀ x ∈ X (Ω): τx is P-unique)        [(i), (iii)]
              ⇒  ∀ x ∈ X (Ω): E (Y |X =x ) ⊢ D X                [Th. 7.10]
              ⇔  E (Y |X ) ⊢ D X .                              [(7.28)]

Proof of Theorem 8.25

Note that 0 < P (X =x ) < 1 implies 0 < P (X ≠x) < 1 because P (X ≠x) = 1 − P (X =x ).

Proposition (i). Because σ(E X =x(Y |D X )) ⊂ σ(D X ), RS-Box 2.1 (iv) implies

     D X ⊥⊥ 1X =x  ⇒  E X =x(Y |D X ) ⊥⊥ 1X =x .

Correspondingly, because σ(E X ≠x(Y |D X )) ⊂ σ(D X ) and σ(1X =x ) = σ(1X ≠x ), RS-Box 2.1 (iv)
yields

     D X ⊥⊥ 1X ≠x  ⇒  E X ≠x(Y |D X ) ⊥⊥ 1X ≠x .

Proposition (ii).

     D X ⊥⊥ 1X =x  ⇔  P (X =x |D X ) =_P P (X =x )              [(8.1)]
                   ⇒  P (X =x |D X ) >_P 0                       [P (X =x ) > 0, SN-(2.40)]
                   ⇔  E X =x (Y |D X ) is P-unique.              [RS-Th. 5.27]

Correspondingly,

     D X ⊥⊥ 1X =x  ⇔  P (X ≠x |D X ) =_P P (X ≠x)                [(8.4)]
                   ⇒  P (X ≠x |D X ) >_P 0                       [P (X ≠x) > 0, SN-(2.40)]
                   ⇔  E X ≠x (Y |D X ) is P-unique.              [SN-Cor. 14.48]

Proposition (iii).

     D X ⊥⊥ 1X =x
     ⇒  E X =x(Y |D X ) ⊥⊥ 1X =x ∧ E X =x (Y |D X ) is P-unique   [(i), (iii)]
     ⇒  E (Y |X =x ) ⊢ D X .                                      [E X =x (Y |D X ) = τx , Th. 7.8]

     D X ⊥⊥ 1X =x
     ⇒  E X ≠x(Y |D X ) ⊥⊥ 1X ≠x ∧ E X ≠x(Y |D X ) is P-unique    [(i), (iii)]
     ⇒  E X ≠x(Y |D X )  1X ≠x ∧ E X ≠x(Y |D X ) is P-unique      [RS-Th. 4.40]
     ⇒  E (E X ≠x(Y |D X ) | 1X ≠x ) =_P E (E X ≠x(Y |D X )) ∧ E X ≠x(Y |D X ) is P-unique
                                                                   [RS-Def. 4.36]
     ⇔  E (Y |X ≠x) ⊢ D X .                                       [Th. 6.9 (i), (6.13)]

Proposition (iv). This immediately follows from Propositions (iii) and (6.13).

Proof of Theorem 8.27

Proposition (i).

     D X ⊥⊥ X  ⇒  D X ⊥⊥ X |Z                                   [Th. 8.17]
              ⇒  τ ⊥⊥ X |Z .                                     [σ(τ) ⊂ σ(D X ), RS-Box 6.1 (vi)]

Proposition (ii).

     D X ⊥⊥ X  ⇒  τ ⊥⊥ X |Z                                      [(i)]
              ⇔  ∀ x ∈ X (Ω): τ ⊥⊥ 1X =x |Z .                    [RS-(6.8)]

Propositions (iii) and (iv).

     D X ⊥⊥ X  ⇒  τ ⊥⊥ X |Z ∧ (∀ x ∈ X (Ω): τx is P-unique)      [(i), Th. 8.22 (iii)]
              ⇒  ∀ x ∈ X (Ω): E X =x (Y |Z ) ⊢ D X               [Th. 7.21]
              ⇔  E (Y |X, Z ) ⊢ D X .                            [(7.63)]

Proof of Theorem 8.29

Proposition (i).

     D X ⊥⊥ X
     ⇒  Z ⊥⊥ X                                                   [σ(Z ) ⊂ σ(D X ), RS-Box 2.1 (iv)]
     ⇒  ∀ x ∈ X (Ω): P (X =x , Z =z) = P (X =x ) · P (Z =z)       [{X =x } ∈ σ(X ), {Z =z } ∈ σ(Z ), Box 8.1 (ii)]
     ⇒  ∀ x ∈ X (Ω): P (X =x , Z =z) > 0.                         [P (Z =z) > 0, ∀ x ∈ X (Ω): P (X =x ) > 0]

Proposition (ii).

     D X ⊥⊥ X  ⇒  τ ⊥⊥ X |Z                                       [Th. 8.27 (i)]
              ⇒  τ ⊥⊥_{P Z=z} X                                   [P (Z =z) > 0, RS-(6.37)]
              ⇔  τ ⊥⊥ X |(Z =z).                                  [RS-(6.47)]

Proposition (iii).

     D X ⊥⊥ X  ⇒  τ ⊥⊥ X |(Z =z)                                  [(ii)]
              ⇔  τ ⊥⊥_{P Z=z} X                                   [RS-(6.47)]
              ⇔  ∀ x ∈ X (Ω): τ ⊥⊥_{P Z=z} 1X =x                  [RS-(6.27)]
              ⇔  ∀ x ∈ X (Ω): τ ⊥⊥ 1X =x |(Z =z).                 [RS-(6.47)]

Propositions (iv) and (v).

     D X ⊥⊥ X  ⇒  τ ⊥⊥ X |(Z =z) ∧ (∀ x ∈ X (Ω): τx is P-unique)  [(ii), Th. 8.22 (iii)]
              ⇒  ∀ x ∈ X (Ω): E Z=z(Y |X =x ) ⊢ D X               [Cor. 7.14]
              ⇔  E Z=z (Y |X ) ⊢ D X .                            [(7.36)]

Proof of Theorem 8.31

Propositions (i) to (iii).

     D X ⊥⊥ X |Z
     ⇒  τ ⊥⊥ X |Z                                                [σ(τ) ⊂ σ(D X ), RS-Box 6.1 (vi)]
     ⇔  ∀ x ∈ X (Ω): τ ⊥⊥ 1X =x |Z                               [RS-Th. 6.5]
     ⇒  ∀ x, x ′ ∈ X (Ω): τx ⊥⊥ 1X =x ′ |Z .                     [σ(τx ) ⊂ σ(τ), RS-Box 6.1 (vi)]

Propositions (iv) and (v). These propositions follow from Proposition (i) and Theorem
7.21.

Proof of Theorem 8.34

Proposition (8.40).

     D X ⊥⊥ 1X =x |Z ∧ P (X =x |Z ) >_P 0
     ⇔  P (X =x |D X ) =_P P (X =x |Z ) ∧ P (X =x |Z ) >_P 0      [(8.7), (8.13)]
     ⇒  P (X =x |D X ) >_P 0                                      [SN-(2.40)]
     ⇔  τx is P-unique.                                           [RS-Th. 5.27]

Proposition (8.41).

     D X ⊥⊥ X |Z ∧ ∀ x ∈ X (Ω): P (X =x |Z ) >_P 0
     ⇔  ∀ x ∈ X (Ω): (D X ⊥⊥ 1X =x |Z ∧ P (X =x |Z ) >_P 0)       [(8.12)]
     ⇒  ∀ x ∈ X (Ω): τx is P-unique.                              [(8.40)]

Proof of Theorem 8.37

Proposition (i).

     D X ⊥⊥ X |Z  ⇒  D X ⊥⊥_{P Z=z} X                            [P (Z =z) > 0, RS-(6.37)]
                  ⇒  ∀ x ∈ X (Ω): τx is P Z=z-unique.             [Th. 8.22 (iii)]

Proposition (ii).

     D X ⊥⊥ X |Z  ⇒  τ ⊥⊥ X |Z                                    [Th. 8.31]
                  ⇒  τ ⊥⊥_{P Z=z} X                               [P (Z =z) > 0, RS-(6.37)]
                  ⇔  τ ⊥⊥ X |(Z =z).                              [RS-(6.47)]

Proposition (iii).

     D X ⊥⊥ X |Z  ⇒  τ ⊥⊥ X |(Z =z)                               [(ii)]
                  ⇔  τ ⊥⊥_{P Z=z} X                               [RS-(6.47)]
                  ⇔  ∀ x ∈ X (Ω): τ ⊥⊥_{P Z=z} 1X =x              [(7.12), (7.14)]
                  ⇔  ∀ x ∈ X (Ω): τ ⊥⊥ 1X =x |(Z =z).

Proposition (iv). We assume that Z is a covariate of X , that is, σ(Z ) ⊂ σ(D X ).

     D X ⊥⊥ X |Z
     ⇒  τ ⊥⊥ X |(Z =z) ∧ (∀ x ∈ X (Ω): τx is P Z=z-unique)        [(i), (ii)]
     ⇒  ∀ x ∈ X (Ω): (τx ⊥⊥ X |(Z =z) ∧ τx is P Z=z-unique)       [(7.32)]
     ⇒  ∀ x ∈ X (Ω): E Z=z(Y |X =x ) ⊢ D X .                      [σ(Z ) ⊂ σ(D X ), (7.35)]

Proposition (v). Again, we assume σ(Z ) ⊂ σ(D X ). Hence,

     D X ⊥⊥ X |Z  ⇒  ∀ x ∈ X (Ω): E Z=z(Y |X =x ) ⊢ D X           [(iv)]
                  ⇔  E Z=z (Y |X ) ⊢ D X .                        [Def. 6.18 (ii)]

Proof of Theorem 8.39

Proposition (i).

     D X ⊥⊥ X |Z ∧ (∀(x, z) ∈ X (Ω)×Z (Ω): P (X =x , Z =z) > 0)
     ⇔  D X ⊥⊥ X |Z ∧ (∀(x, z) ∈ X (Ω)×Z (Ω): P Z=z (X =x ) > 0)   [def. of P Z=z (X =x )]
     ⇒  D X ⊥⊥ X |Z ∧ (∀ x ∈ X (Ω): P (X =x | Z ) >_P 0)           [∀ z ∈ Z (Ω): P (Z =z) > 0]
     ⇒  ∀ x ∈ X (Ω): τx is P-unique.                               [Th. 8.34]

Proposition (ii). We additionally assume that Z is a covariate of X . Hence,

     D X ⊥⊥ X |Z  ⇒  ∀(x, z) ∈ X (Ω)×Z (Ω): E Z=z(Y |X =x ) ⊢ D X   [Th. 8.37 (iv)]
                  ⇔  ∀(x, z) ∈ X (Ω)×Z (Ω): E X =x (Y |Z =z) ⊢ D X  [(6.19)]
                  ⇒  ∀ x ∈ X (Ω): E X =x (Y |Z ) ⊢ D X .            [(i), Th. 6.17]

Proposition (iii). This follows from Proposition (ii) and Definition 6.18 (i).

Proof of Theorem 8.41

Proposition (i).

     D X ⊥⊥ X |(Z =z)  ⇔  D X ⊥⊥_{P Z=z} X                        [(8.24)]
                       ⇒  τ ⊥⊥_{P Z=z} X                           [σ(τ) ⊂ σ(D X ), RS-Box 2.1 (iv)]
                       ⇔  τ ⊥⊥ X |(Z =z).                          [RS-(6.47)]

Proposition (ii).

     D X ⊥⊥ X |(Z =z)
     ⇒  τ ⊥⊥ X |(Z =z)                                             [(i)]
     ⇔  τ ⊥⊥_{P Z=z} X                                             [RS-(6.47)]
     ⇒  ∀ x ∈ X (Ω): τ ⊥⊥_{P Z=z} 1X =x                            [∀ x ∈ X (Ω): σ(1X =x ) ⊂ σ(X ), RS-Box 2.1 (iv)]
     ⇔  ∀ x ∈ X (Ω): τ ⊥⊥ 1X =x |(Z =z).                           [RS-(6.47)]

Proposition (iii).

     D X ⊥⊥ X |(Z =z)
     ⇔  D X ⊥⊥_{P Z=z} X                                           [(8.24)]
     ⇔  ∀ x ∈ X (Ω): P Z=z (X =x |D X ) =_{P Z=z} P Z=z (X =x )     [(8.25)]
     ⇒  ∀ x ∈ X (Ω): P Z=z (X =x |D X ) >_{P Z=z} 0                 [P Z=z (X =x ) > 0, SN-(2.40)]
     ⇔  ∀ x ∈ X (Ω): τx is P Z=z-unique.                           [RS-Th. 5.27 (i), (ii)]

Propositions (iv) and (v). Now we also assume that Z is a covariate of X . Hence,

     D X ⊥⊥ X |(Z =z)  ⇒  τ ⊥⊥ X |(Z =z)                           [(i)]
                       ⇒  ∀ x ∈ X (Ω): τx ⊥⊥ X |(Z =z)             [(7.32)]
                       ⇒  ∀ x ∈ X (Ω): E Z=z(Y |X =x ) ⊢ D X       [(iii), (7.35)]
                       ⇒  E Z=z (Y |X ) ⊢ D X .                    [(iii), (7.36)]

Proof of Theorem 8.42

The proof is analogous to the proof of Theorem 8.25. We only have to replace the probability
measure P on (Ω, A, P ) by the measure P Z=z defined in Assumption 8.1 (e). Also note that

     E X =x (Y |D X ) =_P E 1X =x =1 (Y |D X )   and   E X ≠x(Y |D X ) =_P E 1X =x =0 (Y |D X ).

Hence, E X =x (Y |D X ) and E X ≠x(Y |D X ) are versions of the two true outcome variables per-
taining to the indicator variable 1X =x (see Def. 5.4).

Proposition (i).

     D X ⊥⊥ 1X =x |(Z =z)
     ⇔  D X ⊥⊥_{P Z=z} 1X =x                                       [(8.19)]
     ⇒  E X =x (Y |D X ) ⊥⊥_{P Z=z} 1X =x                          [σ(E X =x (Y |D X )) ⊂ σ(D X ), RS-Box 2.1 (iv)]
     ⇔  E X =x (Y |D X ) ⊥⊥ 1X =x |(Z =z).                         [RS-Prop. (6.47)]

The proof of E X ≠x(Y |D X ) ⊥⊥ 1X ≠x |(Z =z) is analogous. Just replace E X =x(Y |D X ) by E X ≠x(Y |D X )
and 1X =x by 1X ≠x .

Proposition (ii).

     D X ⊥⊥ 1X =x |(Z =z)
     ⇔  P Z=z (X =x |D X ) =_{P Z=z} P Z=z (X =x )                  [(8.20)]
     ⇒  P Z=z (X =x |D X ) >_{P Z=z} 0                              [P Z=z (X =x ) > 0, SN-(2.40)]
     ⇔  E X =x (Y |D X ) is P Z=z-unique.                           [RS-Th. 5.27]

Analogously,

     D X ⊥⊥ 1X =x |(Z =z)
     ⇔  P Z=z (X ≠x |D X ) =_{P Z=z} P Z=z (X ≠x)                   [(8.23)]
     ⇒  P Z=z (X ≠x |D X ) >_{P Z=z} 0                              [P Z=z (X ≠x) > 0, SN-(2.40)]
     ⇔  E X ≠x(Y |D X ) is P Z=z-unique.                            [SN-Cor. 14.48]

Proposition (iii). Now we additionally assume that Z is a covariate of X .

     D X ⊥⊥ 1X =x |(Z =z)
     ⇔  D X ⊥⊥_{P Z=z} 1X =x                                        [(8.19)]
     ⇒  E X =x(Y |D X ) ⊥⊥_{P Z=z} 1X =x ∧ E X =x (Y |D X ) is P Z=z-unique   [(i), (iii)]
     ⇒  E Z=z(Y |X =x ) ⊢ D X .                                     [E X =x (Y |D X ) = τx , Th. 7.8, (7.20)]

The proof of E Z=z (Y | X ≠x) ⊢ D X is almost analogous.

     D X ⊥⊥ 1X =x |(Z =z)
     ⇔  D X ⊥⊥_{P Z=z} 1X =x                                        [(8.19)]
     ⇔  D X ⊥⊥_{P Z=z} 1X ≠x                                        [σ(1X =x ) = σ(1X ≠x ), RS-Def. 2.59]
     ⇒  E X ≠x(Y |D X ) ⊥⊥_{P Z=z} 1X ≠x ∧ E X ≠x(Y |D X ) is P Z=z-unique    [(i), (iii)]
     ⇒  E Z=z (Y | X ≠x) ⊢ D X .                                    [Th. 7.8, (7.20)]

Proposition (iv). This immediately follows from Propositions (iii) and (6.13).

Proof of Theorem 8.43

     D X ⊥⊥ 1X =x ∧ D X ⊥⊥ 1X =x ′
     ⇒  E (Y |X =x ) ⊢ D X ∧ E (Y |X =x ′ ) ⊢ D X                    [Th. 8.25 (iii)]
     ⇔  E (Y |X =x ) = E (τx ) ∧ E (Y |X =x ′ ) = E (τx ′ ) ∧ τx , τx ′ are P-unique
                                                                      [Def. 6.3 (i)]
     ⇒  E (Y |X =x ) − E (Y |X =x ′ ) = E (τx ) − E (τx ′ ) ∧ τx , τx ′ are P-unique
     ⇔  PFE x x ′ ⊢ DC .                                              [Def. 6.23 (i)]

Proof of Theorem 8.46

Proposition (i).

     D X ⊥⊥ 1X =x |Z ∧ D X ⊥⊥ 1X =x ′ | Z
     ⇒  E 1X =x =1 (Y | Z ) ⊢ DC ∧ E 1X =x ′ =1 (Y | Z ) ⊢ DC         [Th. 8.31 (iv), (8.40)]
     ⇔  E X =x (Y |Z ) ⊢ D X ∧ E X =x ′ (Y |Z ) ⊢ D X
                               [P (1X =x = 1) = P (X =x ), P (1X =x ′ = 1) = P (X =x ′ )]
     ⇒  PFE Z ; x x ′ ⊢ DC .                                          [Th. 6.25]

Proposition (ii). This follows from Propositions (i) and (8.12).

Proof of Theorem 8.49

Proposition (i).

     D X ⊥⊥ 1X =x ∧ D X ⊥⊥ 1X =x ′
     ⇒  D X ⊥⊥ 1X =x |Z ∧ D X ⊥⊥ 1X =x ′ | Z ∧ τx , τx ′ are P-unique
                                                                      [Th. 8.17 (ii), Th. 8.25 (ii)]
     ⇒  E 1X =x =1 (Y | Z ) ⊢ DC ∧ E 1X =x ′ =1 (Y | Z ) ⊢ DC         [Th. 8.31 (iv)]
     ⇔  E X =x (Y |Z ) ⊢ D X ∧ E X =x ′ (Y |Z ) ⊢ D X
                               [P (1X =x = 1) = P (X =x ), P (1X =x ′ = 1) = P (X =x ′ )]
     ⇒  PFE Z ; x x ′ ⊢ DC .                                          [Th. 6.25]

Proposition (ii). This follows from Propositions (i) and (8.6).

Proof of Theorem 8.50

Proposition (i).

     D X ⊥⊥ 1X =x |(Z =z) ∧ D X ⊥⊥ 1X =x ′ |(Z =z)
     ⇒  E Z=z(Y |X =x ) ⊢ D X ∧ E Z=z (Y |X =x ′ ) ⊢ D X              [Th. 8.42 (iii)]
     ⇔  E X =x (Y |Z =z) ⊢ D X ∧ E X =x ′ (Y |Z =z) ⊢ D X             [(6.19)]
     ⇒  PFE Z ; x x ′ (z) ⊢ DC .                                      [Th. 6.26]

Proposition (ii).

     D X ⊥⊥ 1X =x |Z ∧ D X ⊥⊥ 1X =x ′ | Z
     ⇒  D X ⊥⊥ 1X =x |(Z =z) ∧ D X ⊥⊥ 1X =x ′ |(Z =z)                 [RS-(6.37), RS-(6.47)]
     ⇒  PFE Z ; x x ′ (z) ⊢ DC .                                      [(i)]

Proposition (iii).

     D X ⊥⊥ X |Z
     ⇒  D X ⊥⊥ 1X =x |Z ∧ D X ⊥⊥ 1X =x ′ | Z           [σ(1X =x ), σ(1X =x ′ ) ⊂ σ(X ), RS-Box 6.1 (vi)]
     ⇒  PFE Z ; x x ′ (z) ⊢ DC .                                      [(ii)]

Proposition (iv).

     D X ⊥⊥ X |(Z =z)
     ⇔  D X ⊥⊥_{P Z=z} X
     ⇒  D X ⊥⊥_{P Z=z} 1X =x ∧ D X ⊥⊥_{P Z=z} 1X =x ′  [σ(1X =x ), σ(1X =x ′ ) ⊂ σ(X ), RS-Box 2.1 (iv)]
     ⇔  D X ⊥⊥ 1X =x |(Z =z) ∧ D X ⊥⊥ 1X =x ′ |(Z =z)                 [RS-(6.47)]
     ⇒  PFE Z ; x x ′ (z) ⊢ DC .                                      [(i)]

Proof of Theorem 8.54

Proposition (i).

     D X ⊥⊥ 1X =x ∧ D X ⊥⊥ 1X =x ′
     ⇒  E Z=z(Y |X =x ) ⊢ D X ∧ E Z=z (Y |X =x ′ ) ⊢ D X              [Th. 8.29 (iv)]
     ⇒  PFE Z ; x x ′ (z) ⊢ DC .                                      [Th. 6.26]

Proposition (ii). This follows from Propositions (i) and (8.6).

8.9 Exercises

⊲ Exercise 8-1 Prove: If D X is a global potential confounder and W a potential confounder of X ,


then D X ⊥⊥ X implies W ⊥⊥ X .

⊲ Exercise 8-2 Check if and where the implications listed in Table 8.1 have been proven in this
chapter and prove those that have not. Use and specify the appropriate choice of assumptions listed
in the Assumptions 8.1.

⊲ Exercise 8-3 Check if and where the implications listed in Table 8.2 have been proven in this
chapter and prove those that have not. Use and specify the appropriate choice of assumptions listed
in the Assumptions 8.1.

⊲ Exercise 8-4 Consider Table 8.3 and show that P(X =1|U )(ω) = P(X =1|Z )(ω), for all ω ∈ {Z =m} ∪
{Z = f }, implies P(X =1|U ) =_P P(X =1|Z ).

⊲ Exercise 8-5 Consider Table 8.3 and specify the four values of a second version of the U -condi-
tional expectation of Y with respect to the measure P X =1.

⊲ Exercise 8-6 Let the Assumptions 8.1 (a) to (c) and (g) hold and let Z be a covariate of X . Which
terms are then unbiased if D X ⊥⊥ X holds?

⊲ Exercise 8-7 Let the Assumptions 8.1 (a) to (d) and (g) hold. Which terms are then unbiased if we
also assume D X ⊥⊥ X |Z and P(X =x | Z ) >_P 0?

⊲ Exercise 8-8 Show that P(X =x |D X ) =_P P(X =x ) implies

     P(X =x |W ) =_P P(X =x ),

provided that the random variable W is measurable with respect to D X . Assume that P(X =x ) > 0.

⊲ Exercise 8-9 Compute the conditional expectation value E (Y |X =0) and the expectation E (τ0 ) in
the example presented in Table 6.4 and compare these numbers to each other.

⊲ Exercise 8-10 Describe randomized assignment of a unit to one of two treatment conditions!

⊲ Exercise 8-11 Describe conditionally randomized assignment of a unit to one of two treatment
conditions given a covariate Z ! For simplicity, assume that Z is finite with P(Z=z) > 0 for all its
values z ∈ Z (Ω) and that X is dichotomous with values 0 and 1.

⊲ Exercise 8-12 Show that E [P(X =x |U )] = P(X =x ).

⊲ Exercise 8-13 Show that P(X =x |D X ) =_P P(X =x | Z ) implies

     P(X =x | Z ,W ) =_P P(X =x | Z ),

if the random variable W is measurable with respect to D X . Assume that P(X =x ) > 0.

Solutions

⊲ Solution 8-1 Definitions 4.11 (iv) and (iii) of a potential confounder and a global potential con-
founder of X imply σ(W ) ⊂ σ(D X ). Therefore, RS-Proposition (6.5) yields the proposition.

⊲ Solution 8-2 We check the implications listed in Table 8.1 row-wise.

(1) Under the Assumptions 8.1 (a), (b), (d), and (e): D X ⊥⊥ X |(Z =z) ⇒ D X ⊥⊥ 1X =x |(Z =z).

         D X ⊥⊥ X |(Z =z)  ⇔  D X ⊥⊥_{P Z=z} X              [(8.24)]
                           ⇒  D X ⊥⊥_{P Z=z} 1X =x          [(8.6)]
                           ⇔  D X ⊥⊥ 1X =x |(Z =z).         [(8.19)]

(2) Under the Assumptions 8.1 (a), (b), (d), and (e): D X ⊥⊥ 1X =x |Z ⇒ D X ⊥⊥ 1X =x |(Z =z).
    This immediately follows from RS-Propositions (6.47) and (6.48).
(3) Under the Assumptions 8.1 (a), (b), (d), and (e): D X ⊥⊥ X |Z ⇒ D X ⊥⊥ 1X =x |(Z =z).

         D X ⊥⊥ X |Z  ⇒  D X ⊥⊥ 1X =x |Z                    [σ(1X =x ) ⊂ σ(X ), RS-Box 6.1 (vi)]
                      ⇒  D X ⊥⊥ 1X =x |(Z =z).              [(2)]

(4) Under the Assumptions 8.1 (a), (d), and (e): D X ⊥⊥ X |Z ⇒ D X ⊥⊥ X |(Z =z).
    This is Proposition (8.35).
(5) Under the Assumptions 8.1 (a), (b), and (d): D X ⊥⊥ X |Z ⇒ D X ⊥⊥ 1X =x |Z .
    This immediately follows from σ(1X =x ) ⊂ σ(X ) and RS-Box 6.1 (vi).
(6) Under the Assumptions 8.1 (a), (b), (d), (e), and (h): D X ⊥⊥ 1X =x ⇒ D X ⊥⊥ 1X =x |(Z =z).

         D X ⊥⊥ 1X =x  ⇒  D X ⊥⊥ 1X =x |Z                   [(8.30)]
                       ⇒  D X ⊥⊥ 1X =x |(Z =z).             [(2)]

(7) Under the Assumptions 8.1 (a), (b), and (h): D X ⊥⊥ 1X =x ⇒ D X ⊥⊥ 1X =x |Z .
    This is Proposition (8.30).
(8) Under the Assumptions 8.1 (a), (b), (d), (e), and (h): D X ⊥⊥ X ⇒ D X ⊥⊥ 1X =x |(Z =z).

         D X ⊥⊥ X  ⇒  D X ⊥⊥ 1X =x                          [(8.27)]
                   ⇒  D X ⊥⊥ 1X =x |(Z =z).                 [(6)]

(9) Under the Assumptions 8.1 (a), (d), (e), and (h): D X ⊥⊥ X ⇒ D X ⊥⊥ X |(Z =z).
    This is Proposition (8.36).
(10) Under the Assumptions 8.1 (a), (b), and (h): D X ⊥⊥ X ⇒ D X ⊥⊥ 1X =x |Z .
    This is Proposition (8.31).
(11) Under the Assumptions 8.1 (a) and (h): D X ⊥⊥ X ⇒ D X ⊥⊥ X |Z .
    This is Proposition (8.29).
(12) Under the Assumptions 8.1 (a) and (b): D X ⊥⊥ X ⇒ D X ⊥⊥ 1X =x .
    This is Proposition (8.27).

⊲ Solution 8-3 We check the implications listed in Table 8.2 row-wise.

(1) Under the Assumptions 8.1 (a) to (e) and (g): D_X ⊥⊥ 1_{X=x} | (Z=z) ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
    This is Theorem 8.41 (i) for 1_{X=x} taking the role of X.
(2) Under the Assumptions 8.1 (a) to (e) and (g): D_X ⊥⊥ X | (Z=z) ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
    This immediately follows from Theorem 8.41 (ii).
(3) Under the Assumptions 8.1 (a) to (e) and (g): D_X ⊥⊥ X | (Z=z) ⇒ τ ⊥⊥ X | (Z=z).
    This is Theorem 8.41 (i).
(4) Under the Assumptions 8.1 (a) to (e) and (g): D_X ⊥⊥ 1_{X=x} | Z ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
    This is Theorem 8.37 (ii) for 1_{X=x} taking the role of X.
(5) Under the Assumptions 8.1 (a) to (d) and (g): D_X ⊥⊥ 1_{X=x} | Z ⇒ τ ⊥⊥ 1_{X=x} | Z.
    This is Theorem 8.31 (i) for 1_{X=x} taking the role of X.
(6) Under the Assumptions 8.1 (a) to (g): D_X ⊥⊥ X | Z ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
    This immediately follows from Theorem 8.37 (iii).
(7) Under the Assumptions 8.1 (a) to (e) and (g): D_X ⊥⊥ X | Z ⇒ τ ⊥⊥ X | (Z=z).
    This is Theorem 8.37 (ii).
(8) Under the Assumptions 8.1 (a) to (d) and (g): D_X ⊥⊥ X | Z ⇒ τ ⊥⊥ 1_{X=x} | Z.
    This is Theorem 8.31 (ii).
(9) Under the Assumptions 8.1 (a) to (d) and (g): D_X ⊥⊥ X | Z ⇒ τ ⊥⊥ X | Z.
    This is Theorem 8.31 (i).
(10) Under the Assumptions 8.1 (a) to (g) and (h): D_X ⊥⊥ 1_{X=x} ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
    This is Theorem 8.29 (ii) for 1_{X=x} taking the role of X.
(11) Under the Assumptions 8.1 (a) to (d), (g), and (h): D_X ⊥⊥ 1_{X=x} ⇒ τ ⊥⊥ 1_{X=x} | Z.
    This is Theorem 8.27 (i) for 1_{X=x} taking the role of X.

(12) Under the Assumptions 8.1 (a), (c), and (g): D_X ⊥⊥ 1_{X=x} ⇒ τ ⊥⊥ 1_{X=x}.
    This is Theorem 8.22 (i) for 1_{X=x} taking the role of X.
(13) Under the Assumptions 8.1 (a) to (g) and (h): D_X ⊥⊥ X ⇒ τ ⊥⊥ 1_{X=x} | (Z=z).
    This immediately follows from Theorem 8.29 (iii).
(14) Under the Assumptions 8.1 (a) to (g) and (h): D_X ⊥⊥ X ⇒ τ ⊥⊥ X | (Z=z).
    This is Theorem 8.29 (ii).
(15) Under the Assumptions 8.1 (a) to (d), (g), and (h): D_X ⊥⊥ X ⇒ τ ⊥⊥ 1_{X=x} | Z.
    This immediately follows from Theorem 8.27 (ii).
(16) Under the Assumptions 8.1 (a) to (d), (g), and (h): D_X ⊥⊥ X ⇒ τ ⊥⊥ X | Z.
    This is Theorem 8.27 (i).
(17) Under the Assumptions 8.1 (a), (c), and (g): D_X ⊥⊥ X ⇒ τ ⊥⊥ 1_{X=x}.
    This immediately follows from Theorem 8.22 (ii).
(18) Under the Assumptions 8.1 (a), (c), and (g): D_X ⊥⊥ X ⇒ τ ⊥⊥ X.
    This is Theorem 8.22 (i).
⊲ Solution 8-4 In Example 8.33 we specified the regular causality setup to be the same as in section 6.5. Hence, Ω = Ω_U × Ω_X × ℝ [see Eq. (6.74)]. Furthermore, according to Equation (8.39), the values of P(X=1 | U) and P(X=1 | Z) are identical for all elements of the set

    {Z=m} ∪ {Z=f} = ({Joe, Jim} × Ω_X × ℝ) ∪ ({Ann, Sue} × Ω_X × ℝ) = Ω.

Hence, the set A used in RS-Definition 2.46 is Ø, the empty set. Because P(Ø) = 0 and

    ∀ ω ∈ Ω \ Ø: P(X=1 | U)(ω) = P(X=1 | Z)(ω)

holds [see RS-Eq. (2.61)], this proves P(X=1 | U) =_P P(X=1 | Z).

⊲ Solution 8-5 The four values of a second version τ_1^* of the U-conditional expectation of Y with respect to the measure P^{X=1} are

    τ_1^*(ω) = 1000, if ω ∈ {U=Joe}
    τ_1^*(ω) = 2000, if ω ∈ {U=Jim}
    τ_1^*(ω) = 114,  if ω ∈ {U=Ann}
    τ_1^*(ω) = 130,  if ω ∈ {U=Sue}.

Note that, instead of 1000 and 2000, we could have chosen any other two numbers. The crucial point is: τ_1^* =_{P^{X=1}} τ_1, because P^{X=1}(U=Joe) = P^{X=1}(U=Jim) = 0 (see Table 8.3).
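As a purely illustrative aside, the role of the P^{X=1}-null sets in this solution can also be checked numerically with the following Python sketch. Only P^{X=1}(U=Joe) = P^{X=1}(U=Jim) = 0 and the values 114 and 130 are taken from the solution above; the 0.5/0.5 split between Ann and Sue and the Joe/Jim values of the first version τ_1 are hypothetical placeholders. Any two versions that differ only on a P^{X=1}-null set have the same expectation with respect to P^{X=1}:

    # Values of two versions of the U-conditional expectation of Y w.r.t. P^{X=1}.
    # The Joe/Jim values of tau1 are arbitrary placeholders (hypothetical).
    tau1      = {"Joe": 0,    "Jim": 0,    "Ann": 114, "Sue": 130}
    tau1_star = {"Joe": 1000, "Jim": 2000, "Ann": 114, "Sue": 130}

    # P^{X=1}(U=u): Joe and Jim have probability 0 under P^{X=1} (Table 8.3);
    # the split between Ann and Sue is a hypothetical placeholder.
    p = {"Joe": 0.0, "Jim": 0.0, "Ann": 0.5, "Sue": 0.5}

    e1      = sum(tau1[u] * p[u] for u in p)
    e1_star = sum(tau1_star[u] * p[u] for u in p)
    assert abs(e1 - e1_star) < 1e-12   # identical expectations: the versions differ only on a null set
    print(e1)                          # 122.0 under the assumed probabilities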
⊲ Solution 8-6 Under these assumptions D X ⊥ ⊥X implies that E (Y |X ), E (Y |X, Z ), all E (Y |X =x ),
and all E X =x (Y |Z ), x ∈ X (Ω), are unbiased. This also implies that the prima facie effects PFE x x ′ and
the prima facie effect variables PFE Z ; x x ′ (Z ), x, x ′ ∈ X (Ω), are unbiased as well.
⊲ Solution 8-7 Under these assumptions D X ⊥ ⊥X |Z implies that E (Y |X, Z ) and all E X =x (Y |Z ),
x ∈ X (Ω), are unbiased. It also implies that the prima facie effect variables PFE Z ; x x ′ (Z ), x, x ′ ∈ X (Ω),
are unbiased as well. Also note that the causal average total effects ATE x x ′ can be computed from
PFE Z ; x x ′ (Z ) (see Th. 6.34).

⊲ Solution 8-8 If P(X=x | D_X) =_P P(X=x), then

    P(X=x | W) =_P E(1_{X=x} | W)                      [RS-(4.10)]
               =_P E( E(1_{X=x} | D_X) | W )            [RS-Box 4.1 (xiii)]
               =_P E( P(X=x | D_X) | W )                [RS-(4.10)]
               =_P E( P(X=x) | W )                      [P(X=x | D_X) =_P P(X=x), RS-Box 4.1 (xiv)]
               =_P P(X=x).                              [RS-Box 4.1 (i)]

⊲ Solution 8-9 According to RS-Box 3.2 (ii), the conditional expectation value E(Y | X=0) can be computed via

    E(Y | X=0) = Σ_u E(Y | X=0, U=u) · P(U=u | X=0)
               = (68 + 78 + 88 + 98) · (1/10) + (106 + 116) · (3/10) = 99.8.

In contrast, according to RS-(3.13) and Equations (6.77), (6.78), the expectation E(τ_0) can be computed via

    E(τ_0) = E[E^{X=0}(Y | U)] = E(g_0(U)) = Σ_u g_0(u) · P(U=u)
           = Σ_u E(Y | X=0, U=u) · P(U=u)
           = (68 + 78 + 88 + 98 + 106 + 116) · (1/6) = 92.3333.

Comparing E(Y | X=0) = 99.8 to E(τ_0) = 92.3333 shows that E(Y | X=0) is strongly biased [see Def. 6.3 (i)].
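As a purely illustrative aside, the arithmetic of this solution can be replicated in a few lines of Python; the six conditional expectation values E(Y | X=0, U=u) and the two sets of unit probabilities are simply copied from the computation above:

    # E(Y | X=0, U=u) for the six units (copied from the computation above).
    values = [68, 78, 88, 98, 106, 116]
    p_u_given_x0 = [1/10, 1/10, 1/10, 1/10, 3/10, 3/10]   # P(U=u | X=0)
    p_u = [1/6] * 6                                        # P(U=u)

    # E(Y | X=0) = sum over u of E(Y | X=0, U=u) * P(U=u | X=0)
    e_y_given_x0 = sum(v * p for v, p in zip(values, p_u_given_x0))

    # E(tau_0) = sum over u of E(Y | X=0, U=u) * P(U=u)
    e_tau0 = sum(v * p for v, p in zip(values, p_u))

    print(round(e_y_given_x0, 4), round(e_tau0, 4))   # 99.8 and 92.3333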
⊲ Solution 8-10 In a random experiment in which a unit u is sampled and assigned to one of two treatment conditions, we may assign the unit by coin toss, for instance. This ensures

    P(X=1 | D_X) =_P P(X=1),

that is, the treatment probabilities do not depend on the global potential confounder D_X of X [see Prop. (8.5)] and therefore not on any potential confounder of X (see Rem. 8.61). If D_X is specified such that U is measurable with respect to D_X, then P(X=1 | D_X) =_P P(X=1) implies that each unit u has the same probability P(X=1 | U=u) = P(X=1) of being assigned to treatment 1. If we assume that X is dichotomous with values 0 and 1, then this also implies that each unit u has the same probability P(X=0) of being assigned to treatment 0 as well, because

    P(X=0 | U=u) = 1 − P(X=1 | U=u) = 1 − P(X=1) = P(X=0).
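The following Python sketch merely illustrates this description of randomized assignment; the unit names are those of Table 8.3, and the 1/2 assignment probability is an arbitrary choice. Whichever unit is sampled, it is assigned to treatment 1 with the same probability, so the estimated P(X=1 | U=u) is approximately equal for all units:

    import random

    units = ["Joe", "Jim", "Ann", "Sue"]

    def assign_by_coin_toss():
        # Randomized assignment: P(X=1 | U=u) = P(X=1) = 1/2 for every unit u.
        return 1 if random.random() < 0.5 else 0

    counts = {u: [0, 0] for u in units}    # [draws of unit u, assignments to treatment 1]
    for _ in range(100_000):
        u = random.choice(units)           # sampling a unit
        x = assign_by_coin_toss()          # treatment assignment by coin toss
        counts[u][0] += 1
        counts[u][1] += x

    for u, (n, n1) in counts.items():
        print(u, round(n1 / n, 2))         # all close to 0.5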
⊲ Solution 8-11 We consider a random experiment in which a unit u is sampled and a value z of the covariate Z is assessed before the unit is assigned to one of the two treatment conditions. We also assume P(Z=z) > 0 for all values z ∈ Z(Ω) and 0 < P(X=1 | Z) < 1. Then Z-conditional randomized assignment of a unit to one of the two treatment conditions refers to assigning the unit u to treatment 1 with probability

    P^{Z=z}(X=1 | D_X) =_{P^{Z=z}} P^{Z=z}(X=1) = P(X=1 | Z=z).

This equation implies

    P^{Z=z}(X=1 | U) =_{P^{Z=z}} E^{Z=z}( P^{Z=z}(X=1 | D_X) | U )    [RS-Box 4.1 (xiii)]
                     =_{P^{Z=z}} E^{Z=z}( P^{Z=z}(X=1) | U )
                     =_{P^{Z=z}} P^{Z=z}(X=1)                          [RS-Box 4.1 (i)]
                     = P(X=1 | Z=z)                                    [RS-(3.24), RS-(3.26)]

and

    P^{Z=z}(X=1 | U)(ω) = P^{Z=z}(X=1) = P(X=1 | Z=z), if ω ∈ {Z=z}

[see RS-Eq. (4.18)]. Hence, all units with the same value z of the covariate Z have the same positive probability of being assigned to treatment 1. This assignment procedure allows for different treatment probabilities for units for which we observe different values z of the covariate Z. It creates D_X ⊥⊥ X | Z and 0 < P(X=1 | Z) < 1, which implies that E(Y | X, Z) and all E^{X=x}(Y | Z), x ∈ X(Ω), are unbiased (see Cor. 8.36).
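Analogously, and again only as an illustration, Z-conditional randomized assignment can be sketched in Python; the unit names and the values m and f of Z are those of Table 8.3, whereas the assignment probabilities P(X=1 | Z=z) are hypothetical. Units sharing a value z of Z share the same treatment probability, while units with different values of Z may be assigned with different probabilities:

    import random

    covariate = {"Joe": "m", "Jim": "m", "Ann": "f", "Sue": "f"}   # Z as a function of the unit
    p_x1_given_z = {"m": 0.3, "f": 0.7}                            # hypothetical P(X=1 | Z=z)

    def conditionally_randomized_assignment(z):
        # Assign treatment 1 with probability P(X=1 | Z=z).
        return 1 if random.random() < p_x1_given_z[z] else 0

    counts = {u: [0, 0] for u in covariate}
    for _ in range(100_000):
        u = random.choice(list(covariate))           # sampling a unit
        z = covariate[u]                             # Z is observed before assignment
        x = conditionally_randomized_assignment(z)
        counts[u][0] += 1
        counts[u][1] += x

    for u, (n, n1) in counts.items():
        print(u, covariate[u], round(n1 / n, 2))     # about 0.3 for Z=m, about 0.7 for Z=f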

⊲ Solution 8-12

    E( P(X=x | U) ) = E( E(1_{X=x} | U) )    [RS-(4.10)]
                    = E(1_{X=x})              [RS-Box 4.1 (iv)]
                    = P(X=x).                 [RS-(3.9)]

⊲ Solution 8-13 If P(X=x | D_X) =_P P(X=x | Z), then

    P(X=x | Z, W) =_P E(1_{X=x} | Z, W)                  [RS-(4.10)]
                  =_P E( E(1_{X=x} | D_X) | Z, W )        [RS-Box 4.1 (xiii)]
                  =_P E( P(X=x | D_X) | Z, W )            [RS-(4.10)]
                  =_P E( P(X=x | Z) | Z, W )
                  =_P P(X=x | Z).                         [RS-Box 4.1 (xi)]
Chapter 9
Suppes-Reichenbach Conditions

In chapter 8, we introduced the Fisher conditions, a first class of empirically testable causality conditions. We emphasized that D_X ⊥⊥ X implies D_X ⊥⊥ X | Z, provided that Z is a covariate of X. We also showed that D_X ⊥⊥ X | Z implies the Rosenbaum-Rubin conditions, if we assume P(X=x | Z) >_P 0 for all x ∈ X(Ω). The Fisher conditions focus on (conditional) independence of the putative cause variable X and all potential confounders of X.
In contrast, the causality conditions introduced in the present chapter focus on conditional independence or conditional mean-independence of the outcome variable Y and all potential confounders of X. These conditions will summarily be referred to as the Suppes-Reichenbach conditions, honoring two pioneers who made early contributions to probabilistic causality (see, e.g., Reichenbach, 1956; Suppes, 1970). These conditions are also empirically testable, some of them also apply if X is continuous, and if the true outcome variables are P-unique, then the Suppes-Reichenbach conditions have implications for the Rosenbaum-Rubin conditions.
We start by introducing these causality conditions, then treat their consequences and discuss their methodological implications. In particular, we discuss their role in covariate selection in the empirical analysis of causal conditional and average total effects.

Requirements

Reading this chapter we assume that the reader is familiar with the concepts treated in
all chapters of Steyer (2024). Again, chapters 4 to 6 of that book are now crucial, dealing
with the concepts of a conditional expectation, a conditional expectation with respect to a
conditional probability measure, and conditional independence. Furthermore, we assume
familiarity with chapters 4 to 8 of the present book.
In this chapter we often refer to the following notation and assumptions.

Notation and Assumptions 9.1


(a) Let ((Ω, A, P), (F_t)_{t∈T}, C, D_C, X, Y) be a regular probabilistic causality setup, let
D X denote a global potential confounder of X , and WX the set of all potential
confounders of X , that is, the set of all random variables on (Ω, A, P ) satisfying
σ(W ) ⊂ σ(D X ).
(b) Let X (Ω) = {0, 1, . . . , J } denote the image of Ω under X , let x ∈ X (Ω), let {x } ∈ AX′ ,
and let 1X =x be the indicator of the event {X =x } = {ω ∈ Ω: X (ω) = x }. Further-
more, assume P (X =x ) > 0 and define the probability measure P X =x : A → [0, 1]
by P X =x (A) = P (A | X =x ), for all A ∈ A.
(c) Let Y be real-valued with positive variance.

(d) For x ∈ X (Ω), let τx = E X =x (Y |D X ) denote a (version of the) true outcome variable
of Y given (the value) x (of X ).
(e) For all x ∈ X (Ω) = {0, 1, . . . , J }, let {x } ∈ AX′ and assume 0 < P (X =x ) < 1. Further-
more, let τ := (τ0 , τ1 , . . . , τJ ) denote the (J +1)-variate random variable consisting
of the true outcome variables τx , x ∈ X (Ω).
(f ) Let x ′ ∈ ΩX′ and {x ′ } ∈ AX′ , let 1X =x ′ denote the indicator of the event {X =x ′ } =
{ω ∈ Ω: X(ω) = x′} and assume that 0 < P(X=x′) < 1. Furthermore, let τ_{x′} = E^{X=x′}(Y | D_X) denote a (version of the) true outcome variable of Y given x′.
(g) Let Z be a random variable on (Ω, A, P ) and let (ΩZ′ , AZ′ ) denote its value space.
(h) Let Z be a covariate of X , that is, let Z be a random variable on (Ω, A, P ) satisfy-
ing σ(Z ) ⊂ σ(D X ).
(i) Assume that τx is P-unique.
(j) Assume that all τx , x ∈ X (Ω), are P-unique.
(k) Assume that τx ′ is P-unique.

9.1 SR-Conditions

In this section, we introduce eight causality conditions, summarily referred to as the Suppes-Reichenbach (SR) conditions. They include conditional independence and conditional mean-independence among certain random variables. Remember, conditional mean-independence of a numerical random variable from another random variable given a third random variable has been treated in RS-section 4.7, and conditional independence of two random variables given a third one in RS-section 6.1.2. In this chapter, we use these concepts for the definitions and notation of all SR-conditions. Later, their most important implications are studied, including their implications for the Rosenbaum-Rubin conditions and for unbiasedness.

9.1.1 Simple SR-Conditions

Box 9.1 presents all eight Suppes-Reichenbach conditions discussed in this chapter. We start by commenting on those conditions that only involve conditioning on X or on one of its values x, but not on another random variable Z.
Remark 9.2 [Independence of Y and D X With Respect to P X =x ] The very first condition
displayed in Box 9.1 (i), Y ⊥⊥D X |(X =x ), is equivalent to independence of Y and D X with
respect to the measure P X =x . This measure has been specified in Assumptions 9.1 (b).
Hence,

    Y ⊥⊥ D_X | (X=x)  ⇔  Y ⊥⊥_{P^{X=x}} D_X,    (9.1)

where

    Y ⊥⊥_{P^{X=x}} D_X  :⇔  ∀ (A, B) ∈ σ(Y) × σ(D_X): P^{X=x}(A ∩ B) = P^{X=x}(A) · P^{X=x}(B).    (9.2)

This implies that we can utilize all properties of independence of two random variables
with respect to a probability measure. In this case, it is the measure P X =x instead of P .

Box 9.1 Suppes-Reichenbach conditions

Y ⊥⊥ D_X | (X=x)      (X=x)-conditional independence of Y and D_X. Under Assumptions 9.1 (a) and (b), it is defined by

        ∀ (A, B) ∈ σ(Y) × σ(D_X): P(A ∩ B | X=x) = P(A | X=x) · P(B | X=x).    (i)

Y  D_X | (X=x)      (X=x)-conditional mean-independence of Y from D_X. Under Assumptions 9.1 (a) to (c), it is defined by

        E^{X=x}(Y | D_X) =_{P^{X=x}} E^{X=x}(Y).    (ii)

Under Assumptions 9.1 (a) to (d), each of the above two conditions implies E(Y | X=x) ⊢ D_X. If, additionally, Z is a covariate of X, then each of them also implies E^{X=x}(Y | Z) ⊢ D_X.

Y ⊥⊥ D_X | X      X-conditional independence of Y and D_X. Under Assumptions 9.1 (a), it is defined by

        ∀ (A, B) ∈ σ(Y) × σ(D_X): P(A ∩ B | X) =_P P(A | X) · P(B | X).    (iii)

Y  D_X | X      X-conditional mean-independence of Y from D_X. Under Assumptions 9.1 (a) and (c), it is defined by

        E(Y | X, D_X) =_P E(Y | X).    (iv)

Under Assumptions 9.1 (a) to (e), each of the last two conditions implies E(Y | X) ⊢ D_X. If, additionally, Z is a covariate of X, then each of them also implies E(Y | X, Z) ⊢ D_X.

Y ⊥⊥ D_X | (X=x, Z)      (X=x, Z)-conditional independence of Y and D_X. Under Assumptions 9.1 (a), (b), and (g), it is defined by

        ∀ (A, B) ∈ σ(Y) × σ(D_X): P^{X=x}(A ∩ B | Z) =_{P^{X=x}} P^{X=x}(A | Z) · P^{X=x}(B | Z).    (v)

Y  D_X | (X=x, Z)      (X=x, Z)-conditional mean-independence of Y from D_X. Under Assumptions 9.1 (a) to (c) and (g), it is defined by

        E^{X=x}(Y | Z, D_X) =_{P^{X=x}} E^{X=x}(Y | Z).    (vi)

Under Assumptions 9.1 (a) to (d) and if Z is a covariate of X, then each of the last two conditions implies E^{X=x}(Y | Z) ⊢ D_X.

Y ⊥⊥ D_X | (X, Z)      (X, Z)-conditional independence of Y and D_X. Under Assumptions 9.1 (a) and (g), it is defined by

        ∀ (A, B) ∈ σ(Y) × σ(D_X): P(A ∩ B | X, Z) =_P P(A | X, Z) · P(B | X, Z).    (vii)

Y  D_X | (X, Z)      (X, Z)-conditional mean-independence of Y from D_X. Under Assumptions 9.1 (a), (c), and (g), it is defined by

        E(Y | X, Z, D_X) =_P E(Y | X, Z).    (viii)

Under Assumptions 9.1 (a) to (e) and that Z is a covariate of X, each of the last two conditions implies E(Y | X, Z) ⊢ D_X and E^{X=x}(Y | Z) ⊢ D_X for all x ∈ X(Ω).

Note that the condition Y ⊥⊥D X |(X =x ) does not apply if X is continuous because we as-
sume P (X =x ) > 0 [see Assumptions 9.1 (b) and RS-Rem. 2.77]. Independence of random
variables (with respect to a probability measure) and some of its properties have been
treated in RS-section 2.4. (More details are provided, for example, in SN-chapter 16. In
particular, equivalent conditions for Y ⊥⊥D X |X in terms of conditional distributions are
found in SN-section 17.6.) ⊳
Remark 9.3 [Mean-Independence of Y and D_X With Respect to P^{X=x}] The causality condition Y  D_X | (X=x) defined in Box 9.1 (ii) is equivalent to mean-independence of Y from D_X with respect to P^{X=x} (see RS-Def. 4.36), which is denoted Y _{P^{X=x}} D_X. Hence,

        Y  D_X | (X=x)  ⇔  Y _{P^{X=x}} D_X,    (9.3)

where

        Y _{P^{X=x}} D_X  :⇔  E^{X=x}(Y | D_X) =_{P^{X=x}} E^{X=x}(Y).    (9.4)

Again note that Y  D_X | (X=x) does not apply if X is continuous because it is defined only if P(X=x) > 0. ⊳
Remark 9.4 [X -Conditional Independence of Y and D X ] The third causality condition
defined in Box 9.1 (iii) is X-conditional independence of the outcome variable Y and D X ,
denoted by Y ⊥ ⊥D X |X . This condition implies that the distribution of the outcome vari-
able Y does not depend on the global potential confounder D X , once we condition on
the putative cause variable X . That is, we implicitly postulate that P Y | X is a version of the
conditional distribution P Y | X , D X (see SN-ch. 17). This condition also applies if X is contin-
uous. More details about conditional independence are found in RS-chapter 6 and about
conditional distributions in SN-chapter 16. ⊳
Remark 9.5 [X -Conditional Mean-Independence of Y From D X ] The next causality con-
dition, defined in Box 9.1 (iv), is X-conditional mean-independence of Y and D X , denoted
Y  D X |X. With this condition we postulate that the (X , D X )-conditional expectation of Y
actually does not depend on the global potential confounder D X of X , once we condi-
tion on X . That is, we postulate that E (Y |X ) is a version of the conditional expectation
E (Y | X , D X )(see RS-ch. 4). Again, note that Y  D X |X also applies if X is continuous. ⊳
Remark 9.6 [A Caveat Concerning Mediators] Note that none of the causality conditions
treated so far excludes that mediators (and other variables that are between X and Y ) de-
termine the distribution of Y. For example, Y ⊥ ⊥D X |X only implies that the conditional
distribution P Y | X is a version of the conditional distribution P Y |X ,W whenever W is a po-
tential confounder of X . Remember, a potential confounder of X is D X -measurable and
therefore prior or simultaneous in (Ft )t ∈T to X (see Cor. 4.33). ⊳
In the following theorem we present a condition that is equivalent to Y  D X |X , pro-
vided that, for all values x of X , P (X =x ) > 0.

Theorem 9.7 [A Condition Equivalent to Y  D X |X if X is Discrete]


Let the Assumptions 9.1 (a) to (c) hold and assume P (X =x ) > 0 for all x ∈ X (Ω). Then

        Y  D_X | X  ⇔  ∀ x ∈ X(Ω): E^{X=x}(Y | D_X) =_{P^{X=x}} E^{X=x}(Y).    (9.5)
(Proof p. 292)

Remark 9.8 [Constant True Outcome Variables] Let the Assumptions 9.1 (a) to (e) hold, which include that all τ_x, x ∈ X(Ω), are P-unique. Then

        Y  D_X | X  ⇔  τ =_P ( E^{X=0}(Y), E^{X=1}(Y), . . . , E^{X=J}(Y) ),    (9.6)

where τ := (τ_0, τ_1, . . . , τ_J) is a multivariate random variable consisting of J + 1 true outcome variables. Hence, under these assumptions Y  D_X | X holds if and only if

        τ_x =_P E^{X=x}(Y),  ∀ x ∈ X(Ω) = {0, 1, . . . , J}.    (9.7)

Also remember that, according to RS-Remark 3.20, if P(X=x) > 0, then E^{X=x}(Y) = E(Y | X=x) [see RS-Eq. (3.24)]. ⊳

Intuitively speaking, according to Proposition (9.5), Y  D_X | X means that X is the only cause of Y: no other random variable that is prior or simultaneous to X determines the expectation of Y. Of course, this is unrealistically restrictive in many applications. In contrast, the generalizations of these conditions presented in the next section, where we additionally condition on a covariate of X, are realistic.

9.1.2 Conditional SR-Conditions

Now we turn to those Suppes-Reichenbach conditions that involve conditioning on another random variable Z on (Ω, A, P). Note that Z = (Z_1, . . . , Z_m) may be a multivariate random variable on (Ω, A, P) consisting of m random variables.

Remark 9.9 [(X=x, Z)-Conditional Independence of Y and D_X] The first one, which is defined in Box 9.1 (v), is called (X=x, Z)-conditional independence of Y and D_X, and denoted Y ⊥⊥ D_X | (X=x, Z) or Y ⊥⊥_{P^{X=x}} D_X | Z. Hence,

        Y ⊥⊥ D_X | (X=x, Z)  ⇔  Y ⊥⊥_{P^{X=x}} D_X | Z,    (9.8)

that is, the two symbols denote the same kind of Z-conditional independence of Y from D_X given a value x of X. ⊳

Remark 9.10 [(X =x , Z )-Conditional Mean-Independence of Y from D X ] The next condi-


tion, defined in Box 9.1 (vi), is (X =x , Z )-conditional mean-independence of the outcome
variable Y from D X , which is denoted by Y  D X |(X =x , Z ). With this condition we pos-
tulate that the D X -conditional expectation of the outcome variable Y with respect to the
measure P X =x does not depend on the global potential confounder D X , once we condi-
tion on the random variable Z . If we additionally assume that Z is a covariate of X , then
σ(Z ) ⊂ σ(D X ) [see Def. 4.11 (iv) and Rem. 4.16] and σ(D X , Z ) = σ(D X ) [see RS-Prop. (2.19)].
In this case,

        E^{X=x}(Y | D_X) =_{P^{X=x}} E^{X=x}(Y | Z, D_X)    (9.9)

[see RS-Def. 5.4] and

        Y  D_X | (X=x, Z)  ⇔  E^{X=x}(Y | D_X) =_{P^{X=x}} E^{X=x}(Y | Z).    (9.10)

If Z is discrete and P^{X=x}(Z=z) > 0 for all values z ∈ Z(Ω), then

        Y  D_X | (X=x, Z)  ⇔  ∀ ω ∈ Ω: E^{X=x}(Y | D_X)(ω) = E^{X=x}(Y | Z=z), if ω ∈ {Z=z}    (9.11)

(see Exercise 9-5). ⊳
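For finite probability spaces, condition (vi) of Box 9.1 can be checked directly from a joint distribution table. The following Python sketch is only an illustration with assumed numbers: U plays the role of the global potential confounder D_X, Z is a covariate of X (a function of U), and the conditional expectation values E(Y | X=x, U=u) are chosen such that they depend on u only through Z(u), which is exactly what Equation (9.11) expresses. The sketch compares E^{X=x}(Y | U=u) with E^{X=x}(Y | Z=z) for every unit u in {Z=z}:

    # Hypothetical finite example; all numbers are made up for illustration only.
    covariate = {"Joe": "m", "Jim": "m", "Ann": "f", "Sue": "f"}      # Z as a function of U
    p_u = {"Joe": 0.25, "Jim": 0.25, "Ann": 0.25, "Sue": 0.25}        # P(U=u)
    p_x1_given_u = {"Joe": 0.4, "Jim": 0.6, "Ann": 0.2, "Sue": 0.8}   # P(X=1 | U=u)

    # E(Y | X=x, U=u), depending on u only through Z(u).
    table = {("m", 0): 90, ("m", 1): 100, ("f", 0): 110, ("f", 1): 130}
    e_y_given_xu = {(u, x): table[(covariate[u], x)] for u in p_u for x in (0, 1)}

    def p_x_given_u(u, x):
        return p_x1_given_u[u] if x == 1 else 1 - p_x1_given_u[u]

    def e_y_given_xz(x, z):
        # E^{X=x}(Y | Z=z) = sum over u in {Z=z} of E(Y | X=x, U=u) * P(U=u | X=x, Z=z)
        group = [u for u in p_u if covariate[u] == z]
        den = sum(p_u[u] * p_x_given_u(u, x) for u in group)
        num = sum(e_y_given_xu[(u, x)] * p_u[u] * p_x_given_u(u, x) for u in group)
        return num / den

    # Check Eq. (9.11): on {U=u} with Z(u)=z, E^{X=x}(Y | U=u) equals E^{X=x}(Y | Z=z).
    for x in (0, 1):
        for z in ("m", "f"):
            e_z = e_y_given_xz(x, z)
            for u in (v for v in p_u if covariate[v] == z):
                assert abs(e_y_given_xu[(u, x)] - e_z) < 1e-12

    print("Condition (vi) of Box 9.1 holds in this example.")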


Remark 9.11 [(X , Z )-Conditional Independence of Y and D X ] The last but one causality
condition, which is defined in Box 9.1 (vii), is (X , Z )-conditional independence of the out-
come variable Y from D X . It is denoted by Y ⊥ ⊥D X |(X , Z ). With this condition we implic-
itly postulate that the (X , D X )-conditional distribution of the outcome variable Y does not
depend on the global potential confounder D X once we condition on the putative cause
variable X and the random variable Z . That is, we implicitly postulate that P Y |X , Z is a ver-
sion of the conditional distribution P Y | X , D X (see again SN-ch. 17). ⊳
Remark 9.12 [(X , Z )-Conditional Mean-Independence of Y From D X ] The last causality
condition defined in Box 9.1 (viii) is (X , Z )-conditional mean-independence of Y from D X ,
denoted Y  D X |(X , Z ). With this condition we postulate that the (X , D X )-conditional ex-
pectation of Y actually does not depend on the global potential confounder D X of X , once
we condition on X and Z . That is, we postulate that E (Y |X, Z ) is a version of the condi-
tional expectation E (Y | X , D X )(see RS-ch. 4). ⊳
Remark 9.13 [An Equivalent Formulation of Y  D_X | (X, Z)] If we assume that Z is a covariate of X, then σ(Z) ⊂ σ(D_X) [see Def. 4.11 (iv) and Rem. 4.16] and σ(D_X) = σ(D_X, Z). This implies

        σ(X, D_X, Z) = σ(X, D_X)    (9.12)

and

        E(Y | X, Z, D_X) =_P E(Y | X, D_X)    (9.13)

(see RS-Def. 4.4). Hence, if Z is a covariate of X, then

        Y  D_X | (X, Z)  ⇔  E(Y | X, D_X) =_P E(Y | X, Z).    (9.14)


In the following theorem we present a condition that is equivalent to Y  D X |(X , Z ) if X
is discrete. This theorem is a generalization of Theorem 9.7.

Theorem 9.14 [A Condition Equivalent to Y  D X |(X , Z ) if X is Discrete]


Let the Assumptions 9.1 (a) to (c) and (g) hold and assume P (X =x ) > 0 for all x ∈ X (Ω).
Then

        Y  D_X | (X, Z)  ⇔  ∀ x ∈ X(Ω): E^{X=x}(Y | Z, D_X) =_{P^{X=x}} E^{X=x}(Y | Z).    (9.15)
(Proof p. 293)

Remark 9.15 [True Outcome Variables as Functions of Z ] Under the Assumptions 9.1 (a)
to (e), which includes P -uniqueness of the true outcome variables τx , x ∈ X (Ω),

        Y  D_X | (X, Z)  ⇔  τ =_P ( E^{X=0}(Y | Z), E^{X=1}(Y | Z), . . . , E^{X=J}(Y | Z) ),    (9.16)
