
Why Experimenters Might Not Always Want to Randomize, and What They Could Do Instead
Author(s): Maximilian Kasy
Source: Political Analysis , Summer 2016, Vol. 24, No. 3 (Summer 2016), pp. 324-338
Published by: Cambridge University Press on behalf of the Society for Political
Methodology

Stable URL: https://www.jstor.org/stable/26349740


Advance Access publication June 6, 2016 Political Analysis (2016) 24:324-338
doi:10.1093/pan/mpw012

Why Experimenters Might Not Always Want to Randomize, and What They Could Do Instead

Maximilian Kasy
Department of Economics, Harvard University, 1805 Cambridge Street,
Cambridge, MA 02138, USA
e-mail: [email protected] (corresponding author)

Edited by R. Michael Alvarez

Suppose that an experimenter has collected a sample as well as baseline information about the units in the sample. How should she allocate treatments to the units in this sample? We argue that the answer does not involve randomization if we think of experimental design as a statistical decision problem. If, for instance, the experimenter is interested in estimating the average treatment effect and evaluates an estimate in terms of the squared error, then she should minimize the expected mean squared error (MSE) through choice of a treatment assignment. We provide explicit expressions for the expected MSE that lead to easily implementable procedures for experimental design.

1 Introduction

Experiments, and in particular randomized experiments, are the conceptual reference point that gives empirical content to the notion of causality. In recent years, actual randomized experiments have become increasingly popular elements of the methodological toolbox in a wide range of social science disciplines. Examples from the recent political science literature abound. Blattman, Hartman, and Blair (2014), for instance, provided training in "alternative dispute resolution practices" to residents of a random set of towns in Liberia. Kalla and Broockman (2016) study whether the possibility of scheduling a meeting with a congressional office changes depending on whether it is revealed that the person seeking a meeting is a political donor. Findley, Nielson, and Sharman (2015) study the effect of varying messages to incorporation services in different countries on the possibility of establishing (illegal) anonymous shell corporations.

Researchers conducting such field experiments in political science (as well as in fields such as development economics, medicine, and other social and biomedical sciences) are often confronted with a version of the following situation (cf. Morgan and Rubin 2012). They have selected a random sample from some population and have conducted a baseline survey for the individuals in this sample. Then a discrete treatment is assigned to the individuals in this sample, usually based on some randomization scheme. Finally, outcomes are realized, and the data are used to perform inference on the average treatment effect.

A key question for experimenters is how to use covariates from the baseline survey in the assignment of treatments. Intuition and the literature suggest to use stratified randomization conditional on covariates, also known as blocking. Moore (2012), for instance, makes a compelling argument that blocking on continuous as well as discrete covariates is better than full randomization or blocking only on a small number of discrete covariates. We analyze this question

Author's note: I thank Alberto Abadie, Ivan Canay, Gary Chamberlain, Raj Chetty, Nathaniel Hendren, Larry Katz, Gary King, Michael Kremer, and Don Rubin, as well as seminar participants at the Harvard retreat; the Harvard Labor Economics Workshop; the Harvard Quantitative Issues in Cancer Research seminar; the Harvard Applied Statistics Seminar; the UT Austin, Princeton, Columbia, and Northwestern econometrics seminars; RAND; and the 2013 CEME Conference at Stanford for helpful discussions. Replication data are available at the Harvard Dataverse at http://dx.doi.org/10.7910/DVN/I5KCWI. See Kasy (2016). Supplementary materials for this article are available on the Political Analysis Web site.

© The Author 2016. Published by Oxford University Press on behalf of the Society for Political Methodology.
All rights reserved. For permissions, please email: [email protected]


—how to use baseline covariates—as a decision problem. The experimenter's problem is to choose a treatment assignment and an estimator, given knowledge of the covariates. The objective is to minimize risk based on a loss function such as the squared error of the estimator. The decision criteria considered are Bayesian and minimax risk.
We show, first, that experimenters might not want to randomize in general. While surprising at
first, the basic intuition for this result is simple and holds for any statistical decision problem. The
conditional expected loss of an estimator is a function of the covariates and the treatment assignment. The treatment assignment that minimizes conditional expected loss is in general unique if
there are continuous covariates, so that a deterministic assignment strictly dominates all
randomized assignments.1
We next discuss how to implement optimal and near-optimal designs in practice, where near-optimal designs might involve some randomization. The key problem is to derive tractable expressions for the expected MSE of estimators for the average treatment effect, given a treatment assignment. Once we have such expressions, we can numerically search for the best assignment, or for
specify a prior distribution for the conditional expectation of potential outcomes given covariates.
We provide simple formulas for the expected MSE for a general class of nonparametric priors.
Our recommendation not to randomize raises the question of identification, cf. the review in
Keele (2015). We show that conditional independence of treatment and potential outcomes given
covariates still holds for the deterministic assignments considered, under the usual assumption of
independent sampling. Conditional independence only requires a controlled trial (CT), not a
randomized controlled trial (RCT).
To gain some intuition for our non-randomization result, note that in the absence of covariates
the purpose of randomization is to pick treatment and control groups that are similar before they
are exposed to different treatments. Formally, we would like to pick groups that have the same
(sample) distribution of potential outcomes. Even with covariates observed prior to treatment
assignment, it is not possible to make these groups identical in terms of potential outcomes. We
can, however, make them as similar as possible in terms of covariates. Allowing for randomness in
the treatment assignment to generate imbalanced distributions of covariates can only hurt the
balance of the distribution of potential outcomes. Whatever the conditional distribution of unobservables given observables is, having differences in observables implies greater differences in the distribution of unobservables relative to an assignment with no differences in observables. The analogy to estimation might also be useful to understand our non-randomization result. Adding random (mean 0) noise to an estimator does not introduce any bias. But it is never going to reduce the MSE of the estimator.
The purpose of discussing tractable nonparametric priors—and one of the main contributions of
this article—is to operationalize the notion of "balance." In general, it will not be possible to obtain
exactly identical distributions of covariates in the treatment and control groups. When picking an
assignment, we have to trade off balance across various dimensions of the joint distribution of
covariates. Picking a prior distribution for the conditional expectation of potential outcomes, as
well as a loss function, allows one to calculate an objective function (Bayesian risk) that performs
this trade-off in a coherent and principled way.

2 A Motivating Example and Some Intuitions

2.1 Setup

Before we present our general results and our proposed procedure, let us discuss a simple
motivating example. The example is stylized to allow calculations "by hand," but the intuitions

1 If experimenters have a preference for randomization for reasons outside the decision problem considered in the present article, a reasonable variant of the procedure suggested here would be to randomize among a set of assignments that are "near-minimizers" of risk. If we are worried about manipulation of covariates, in particular, a final coin flip that possibly switches treatment and control groups might be helpful. I thank Michael Kremer for this suggestion.


Table 1. Comparing bias, variance, and MSE across treatment assignments and designs

Assignment       Designs           Model 1                Model 2
d1 d2 d3 d4     1 2 3 4 5    bias   var   MSE      bias   var   MSE     EMSE
 0  1  1  0     1 1 1 1 1     0.0   1.0   1.0       2.0   1.0   5.0     1.77
 1  0  0  1     1 1 1 1 1     0.0   1.0   1.0      -2.0   1.0   5.0     1.77
 1  0  1  0     1 1 1 1 0    -1.0   1.0   2.0       3.0   1.0  10.0     2.05
 0  1  0  1     1 1 1 1 0     1.0   1.0   2.0      -3.0   1.0  10.0     2.05
 1  1  0  0     1 1 1 0 0    -2.0   1.0   5.0       6.0   1.0  37.0     6.51
 0  0  1  1     1 1 1 0 0     2.0   1.0   5.0      -6.0   1.0  37.0     6.51
 0  1  0  0     1 1 0 0 0    -0.7   1.3   1.8       3.3   1.3  12.4     2.49
 0  0  1  0     1 1 0 0 0     0.7   1.3   1.8      -0.7   1.3   1.8     2.49
 1  1  0  1     1 1 0 0 0    -0.7   1.3   1.8       0.7   1.3   1.8     2.49
 1  0  1  1     1 1 0 0 0     0.7   1.3   1.8      -3.3   1.3  12.4     2.49
 1  0  0  0     1 1 0 0 0    -2.0   1.3   5.3       4.7   1.3  23.1     6.77
 1  1  1  0     1 1 0 0 0    -2.0   1.3   5.3       7.3   1.3  55.1     6.77
 0  0  0  1     1 1 0 0 0     2.0   1.3   5.3      -7.3   1.3  55.1     6.77
 0  1  1  1     1 1 0 0 0     2.0   1.3   5.3      -4.7   1.3  23.1     6.77
 0  0  0  0     1 0 0 0 0      -     -     ∞         -     -     ∞       -
 1  1  1  1     1 0 0 0 0      -     -     ∞         -     -     ∞       -

MSE model 1 (designs 1-5):   ∞    3.2   2.7   1.5   1.0
MSE model 2 (designs 1-5):   ∞   20.6  17.3   7.5   5.0

Notes: Each row of this table corresponds to one possible treatment assignment (d_1, ..., d_4). The columns for "model 1" correspond to the model Y_i^d = x_i + d + ε_i, and the columns for "model 2" to the model Y_i^d = -x_i² + d + ε_i, where the ε_i have mean 0 and variance 1. Each row shows the bias, variance, and MSE of β̂ for the given assignment and model. The designs 1-5 correspond to uniform random draws from the assignments whose rows are marked by an entry of 1. Design 1 randomizes over all rows, design 2 over rows one through fourteen, etc. The last column (EMSE) shows the Bayesian expected MSE of each assignment for the squared exponential prior discussed below. For details, see Section 2.

from this example generalize. Suppose an experimenter has a sample of four experimental units i = 1, ..., 4, and she observes a continuous covariate X_i for each of them, where it so happens that (X_1, ..., X_4) = (x_1, ..., x_4) = (0, 1, 2, 3). She assigns every unit to one of two binary treatments, d_i ∈ {0, 1}.2

Our experimenter wants to estimate the (conditional) average treatment effect of treatment D across these four units,

β = (1/4) Σ_i E[Y_i^1 - Y_i^0 | X_i].    (1)

The experimenter plans to estimate this treatment effect by calculating the difference in means across treatment and comparison groups in her experiment, that is,

β̂ = (1/n_1) Σ_i D_i Y_i - (1/n_0) Σ_i (1 - D_i) Y_i.    (2)

Since there are four experimental units, there are 2^4 = 16 possible treatment assignments. The sixteen rows of Table 1 correspond to these assignments.3 In the first row, (d_1, ..., d_4) = (0, 1, 1, 0); in the second row, (d_1, ..., d_4) = (1, 0, 0, 1); etc.

2 Consider for instance the setting of Nyhan and Reifler (2014), where legislators i received letters (D_i = 1) or did not (D_i = 0). The outcome of interest Y_i in this case is the future fact-checking rating of a legislator, and an important covariate X_i might be their past rating.
3 The code producing Table 1 is available online at Kasy (2016). At this address, we also provide code implementing our proposed approach in practice.


Assume now that potential outcomes are determined by the following model ("model 1"):

Y_i^d = x_i + d + ε_i,    (3)

where the ε_i are independent given X and have mean 0 and variance 1 (compare the notes to Table 1). The average treatment effect in this model is equal to β = 1. For every treatment assignment (d_1, ..., d_n), we can calculate the corresponding bias, variance, and MSE of β̂, where

Var(β̂) = 1/n_1 + 1/n_0,    (4)

with n_1 being the number of units i receiving treatment d_i = 1, and n_0 being the number of units receiving d_i = 0. The bias is given by

Bias = E[β̂] - β = (1/n_1) Σ_i d_i x_i - (1/n_0) Σ_i (1 - d_i) x_i    (5)

for our model.

For the assignment in the first row of Table 1, we get Var(β̂) = 1/2 + 1/2 = 1; for the assignment in row 7, Var(β̂) = 1 + 1/3 ≈ 1.33; and similarly for the other rows. The bias for the first row is given by Bias = E[β̂] - β = (1/2)·(x_2 + x_3 - x_1 - x_4) = 0, for the third row by Bias = (1/2)·(x_1 + x_3 - x_2 - x_4) = -1, etc. The MSE for each row is given by

MSE(d_1, ..., d_n) = E[(β̂ - β)²] = Bias² + Var.    (6)
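The calculations behind Table 1 are mechanical enough to script. The following is a small sketch in Python (not the article's replication code, which is available at the Dataverse address in the author's note) that recomputes the bias, variance, and MSE columns for both models:

```python
# Reproduce the bias, variance, and MSE columns of Table 1 for the
# difference-in-means estimator under models 1 and 2.
from itertools import product

x = [0.0, 1.0, 2.0, 3.0]  # observed covariates x_1, ..., x_4
n = len(x)

def bias_var_mse(d, f):
    """Bias, variance, and MSE of the difference-in-means estimator for
    assignment d, under Y_i^t = f(x_i) + t + eps_i with Var(eps_i) = 1."""
    n1, n0 = sum(d), n - sum(d)
    if n1 == 0 or n0 == 0:
        return None, None, float("inf")  # no comparison group: infinite MSE
    # Bias = E[beta_hat] - beta, where beta = 1 is the constant treatment effect
    bias = (sum(f(x[i]) for i in range(n) if d[i]) / n1
            - sum(f(x[i]) for i in range(n) if not d[i]) / n0)
    var = 1.0 / n1 + 1.0 / n0  # sigma^2 = 1
    return bias, var, bias ** 2 + var

model1 = lambda xi: xi         # f(x) = x
model2 = lambda xi: -xi ** 2   # f(x) = -x^2

for d in product([0, 1], repeat=4):
    b1, v1, m1 = bias_var_mse(d, model1)
    b2, v2, m2 = bias_var_mse(d, model2)
    print(d, (b1, v1, m1), (b2, v2, m2))
```

The first row of the table, for example, corresponds to d = (0, 1, 1, 0), which is unbiased under both models.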

2.1.1 Forgetting the covariates

Suppose now for a moment that the information about covariates got lost—somebody deleted the column of X_i in your spreadsheet. Then every i looks the same before treatment is assigned. The variance of potential outcomes Y_i^d now includes both the part due to ε_i and the part due to X_i, and it is equal to Var(Y_i^d) = Var(X_i) + Var(ε_i) = Var(X_i) + 1. Since units i are indistinguishable in this case, treatment assignments are effectively only distinguished by the number of units treated. When we observe no covariates and have random sampling, there is no bias (even ex post),4 and the MSE of any assignment with n_1 treated units is equal to

MSE(d_1, ..., d_n) = Var(β̂) = (1/n_1 + 1/n_0)·(Var(X_i) + 1).    (7)

Any procedure that randomly assigns n_1 = n/2 = 2 units to treatment 1 is optimal in this case. This is, of course, the standard recommendation. A similar argument applies when we observe a discrete covariate, with several observations i for each value of the covariate. In this latter case, randomization conditional on X_i is optimal; this is what is known as a blocked design.

2.2 Randomized designs

Let us now, and for the rest of our discussion, assume again that we observe the covariates X_i. The calculations we have done so far for this case are for a fixed (deterministic) assignment of treatments. What if we use randomization? Any randomization procedure in our setting can be described by the probabilities p(d_1, d_2, d_3, d_4) it assigns to the different rows of Table 1. The MSE of such a procedure is given by the corresponding weighted average of MSEs for each row:

MSE = Σ_{d_1, ..., d_n} p(d_1, ..., d_n) · MSE(d_1, ..., d_n).    (8)

4 This is the ex post bias, given covariates and treatment assignment. This is the relevant notion of bias for us. It is different from the ex ante bias, which is the expected bias when we do not know the treatment assignment yet.


We can now compare various randomization procedures in terms of their average (expected) MSE. Let us first consider the completely randomized design, which assigns each unit to treatment or control by an independent coin flip. Such a procedure puts positive probability on all possible treatment assignments. Since there is a chance of assigning all units to treatment, or all units to control—the last two assignments in Table 1—the variance of an estimated treatment effect under this procedure is infinite. This case corresponds to the first design in Table 1.

Any reasonable experimenter would restrict randomization to eliminate the last two rows of Table 1. There remains, however, a sizable chance of drawing an assignment with a large MSE, such as for the assignments in rows 11 through 14. The resulting randomization procedure, labeled "design 2," is equivalent to a uniform random draw from rows 1 through 14.

Now, again, most experimenters would not accept such assignments, and would instead reduce randomness further by only allowing for assignments which have treatment and control groups of equal size. This eliminates rows 7 through 14 of the assignments considered. The advantage of doing so is that it eliminates the variability of the estimator due to unequal group sizes. Correspondingly, the average MSE falls to 2.7 for model 1 (design 3).

So far we have ignored the information contained in the covariates X_i. We can use such information by blocking, that is, by grouping observations with similar covariate values and randomizing within these groups. A special case is pairwise randomization, with blocks of size 2. In our case, we could group observations 1 and 2, as well as observations 3 and 4, and randomly assign one unit within each group to treatment and one to control. This gives the fourth design and leaves us with random assignment over rows 1 through 4, resulting in a further reduction of average MSE to 1.5.

This procedure, which already has more than halved the expected MSE relative to randomization over all possible assignments, can still be improved upon. The assignment 0, 1, 0, 1 has systematically higher values of X among the treated relative to the control, resulting in a bias of 1 in our setting; the assignment 1, 0, 1, 0 has systematically lower values of X among the treated, resulting in a bias of -1. Eliminating these two leaves us with the last design, which only allows for the first two treatment assignments. Since these two have the same MSE, we might as well randomize between them. This is what the present article ultimately recommends. This yields an average MSE of 1 in the present setting.
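This sequence of design improvements can be checked directly from equation (8): a randomized design's MSE is the average of the deterministic MSEs in its support. A short sketch (hypothetical code, with the rows ordered as in Table 1):

```python
# Average MSE of designs 1-5, each a uniform draw over a set of rows of Table 1.
x = [0.0, 1.0, 2.0, 3.0]

def mse(d, f):
    """MSE of the difference-in-means estimator for assignment d,
    under Y_i^t = f(x_i) + t + eps_i with Var(eps_i) = 1."""
    n1, n0 = sum(d), 4 - sum(d)
    if n1 == 0 or n0 == 0:
        return float("inf")
    bias = (sum(f(xi) for xi, di in zip(x, d) if di) / n1
            - sum(f(xi) for xi, di in zip(x, d) if not di) / n0)
    return bias ** 2 + 1.0 / n1 + 1.0 / n0

rows = [  # the sixteen assignments, in the order of Table 1
    (0, 1, 1, 0), (1, 0, 0, 1), (1, 0, 1, 0), (0, 1, 0, 1),
    (1, 1, 0, 0), (0, 0, 1, 1), (0, 1, 0, 0), (0, 0, 1, 0),
    (1, 1, 0, 1), (1, 0, 1, 1), (1, 0, 0, 0), (1, 1, 1, 0),
    (0, 0, 0, 1), (0, 1, 1, 1), (0, 0, 0, 0), (1, 1, 1, 1),
]
designs = {1: rows, 2: rows[:14], 3: rows[:6], 4: rows[:4], 5: rows[:2]}

for k, support in designs.items():  # uniform randomization over the support
    avg = sum(mse(d, lambda xi: xi) for d in support) / len(support)
    print(f"design {k}: average MSE under model 1 = {avg:.2f}")
```

The computed averages for model 1 are infinity, 3.17, 2.67, 1.5, and 1.0, which round to the values ∞, 3.2, 2.7, 1.5, and 1.0 reported in Table 1.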

2.3 Other data-generating processes

Let us now take a step back and consider what we just did. We first noted that the MSE of any randomized procedure is given by the weighted average of the MSEs of the deterministic assignments it is averaging over. By eliminating assignments with larger MSE from the support of the randomization procedure, we could systematically reduce the MSE. In the limit, we would only randomize between the two assignments with the lowest MSE.

The careful reader will have noted that all our calculations of the MSE depend on the assumed model determining potential outcomes. So let us consider a different model ("model 2") where the covariate affects outcomes in a different way,

Y_i^d = -x_i² + d + ε_i.    (9)

Relative to model 1, we have flipped the sign of the effect of x_i and made the relationship nonlinear. Table 1 shows the corresponding bias, variance, and MSE for this model. It turns out that the ranking over the alternative randomized designs remains the same (see the row "MSE model 2"), even though the values of the MSE change. This suggests a comforting robustness of the optimal procedure.

The careful reader will now again note that we have only considered two particular models, but one might be able to construct models for which the ranking of the designs is partially reversed. That is indeed true, and in general we do not know features of the underlying data-generating process in advance. What we need, then, is a way to evaluate


performance across alternative data-generating processes. This can be done explicitly using a (nonparametric) Bayesian prior, which allows us to calculate the expected MSE, where the expectation averages over data-generating processes. That is the approach we will propose in Section 4, and it yields an expected MSE for each assignment (d_1, ..., d_n). The last column of Table 1 shows this expected MSE for each deterministic treatment assignment; randomized procedures are again evaluated in terms of weighted averages of the deterministic MSEs.5 We could then either pick the assignment minimizing the expected MSE, or randomize over a set of assignments with low MSE. Before developing this approach, let us take a step back and briefly review general statistical decision theory. As we shall see, the arguments against randomization, or for randomization over near-optimal assignments only, which follow the intuition of our little example, hold under very general conditions for any statistical decision problem.

3 Decision Theory Tells Us We Should Not Randomize

3.1 Brief review of decision theory

Our argument requires that we think of experimental design as a decision problem. Statistical decision theory provides one of the conceptual foundations of statistics. Statistical decision theory was pioneered by Abraham Wald; a classic treatment can be found in Berger (1985). We shall provide a brief review of its basic concepts.

The basic idea of decision theory is to think of statistics as the problem of choosing some action a (picking an estimate, deciding whether to reject a hypothesis, an experimental treatment allocation d, ...). The choice is made based on the observed data X, so that a = δ(X). The action will ultimately be evaluated by a loss function L(a, θ) that also depends on the unknown state of the world θ. An estimate might, for instance, be evaluated by how much it deviates from the true parameter value. Loss can be thought of as the negative of a utility function, as familiar from economic models.

Since we do not know what the true state of the world is, we cannot evaluate a procedure based on its actual loss. We can, however, consider various notions of average loss. If we average loss over the randomness of a sampling scheme, and possibly over the randomness of a treatment allocation scheme, holding the state of the world θ fixed, we obtain the frequentist risk function:

R(δ, θ) = E[L(δ(X), θ) | θ].    (10)

We can calculate this frequentist risk function. If loss is the squared deviation of an estimate from the true parameter β, then the frequentist risk function is equal to the MSE we calculated in the example of Section 2 for each row of Table 1. The models 1 and 2 of that example correspond to two values of θ.
The risk function by itself does not yet provide an answer to the question of which decision procedure to pick, unfortunately. The reason is that one procedure δ might be better in terms of R(δ, θ) for one state of the world θ, while another procedure δ' is better for another state θ' of the world. Returning again to Table 1, we have for instance that the assignment in row 10 is better than row 3 for model 1, but worse for model 2.

We thus have to face a trade-off across states θ. There are different ways to do this. One way is to focus on the worst-case scenario, that is, on the state of the world with the worst value of the risk

5 The prior used to calculate this EMSE assumes that

C((x_1, d_1), (x_2, d_2)) = 10·exp(-(||x_1 - x_2||² + (d_1 - d_2)²)/10);

for details, see Section 4.


function. This yields the minimax decision criterion. Alternatively, we can assign weights (prior probabilities) to different states of the world and average risk across them; this yields Bayesian procedures.

3.2 Randomization in general decision problems

Let us now return to the question of randomization. Might it ever be strictly optimal to randomize? Suppose we allow our decision to additionally depend on a randomization device U, so that a = δ(U, X). U does not tell us anything about the state of the world θ, and it does not enter the loss function L. As a consequence, the risk function of such a randomized procedure is simply a weighted average of the risk functions of the deterministic procedures that our random procedure might pick. An example is the fully randomized design, which puts positive probability on all rows of Table 1, including the last two rows.
Formally, let δ^u(x) = δ(u, x) be the decision made when U = u and X = x, and let R(δ^u, θ) be the risk function of the procedure that picks δ^u(x) whenever X = x. Then we have

R(δ, θ) = Σ_u R(δ^u, θ) · P(U = u),    (11)

where we have assumed for simplicity of notation that U is discrete.


To get to a global assessment of δ, we need to trade off risk across values of θ. If we do so using the prior distribution π, we get the Bayes risk

R^π(δ) = ∫ R(δ, θ) dπ(θ)
       = Σ_u ∫ R(δ^u, θ) dπ(θ) · P(U = u)    (12)
       = Σ_u R^π(δ^u) · P(U = u).

In terms of Table 1, Bayes risk averages the MSE for a given assignment δ^u and data-generating process θ, R(δ^u, θ), both across the rows, using the randomization device U, and across different models for the data-generating process, using the prior distribution π. If, alternatively, we evaluate each action based on the worst-case scenario for that action, we obtain minimax risk

R^mm(δ) = Σ_u (sup_θ R(δ^u, θ)) · P(U = u).

In terms of Table 1, minimax risk evaluates each row (realized assignment δ^u) in terms of the worst-case model for this assignment, sup_θ R(δ^u, θ), and evaluates randomized designs in terms of the weighted average across the worst-case risk for each assignment.
Letting R*(δ) denote either Bayes or minimax risk, we have

R*(δ) = Σ_u R*(δ^u) · P(U = u).    (13)

The risk of any randomized procedure is given by the weighted average of risk across the deterministic procedures it averages over. Now consider an alternative decision procedure such that

δ* ∈ argmin_δ R*(δ),    (14)

where the argmin is taken over non-randomized decision functions a = δ(X). It follows that

R*(δ*) ≤ min_u R*(δ^u) ≤ R*(δ).    (15)

The second inequality is strict, unless δ only puts positive probability on the minimizers of

R*(δ^u). If the optimal deterministic δ* is unique, R*(δ*) < R*(δ) unless δ = δ* with probability 1, that is, unless δ does not involve randomization. We have thus shown:

Proposition 1 (Optimality of non-randomized decisions). Consider a general decision problem as discussed in Section 3.1. Let R* be either Bayes risk or minimax risk. Then:

1. The optimal risk R*(δ*) when considering only deterministic procedures is no larger than the optimal risk when allowing for randomized procedures.

2. If the optimal deterministic procedure δ* is unique, then it performs strictly better than any non-trivial randomized procedure.

3.3 Experimental design

We can think of an experiment as a decision problem that proceeds in several stages:

1. Sample observations i from the population of interest, and observe their baseline covariates X_i.

2. Using the baseline covariates X and possibly a randomization device U, assign the treatments D_i.

3. After running the experiment, collect data on outcomes Y_i, and use them to calculate estimators β̂ of the objects β that we are interested in, such as the (conditional) average treatment effect.

4. Finally, loss is realized, L = L(β̂, β). A common choice is the squared error, (β̂ - β)².6

Since this article is about experimental design, we are interested in stage 2. Taking everything else (sampling and the matrix of baseline covariates X, the object of interest β, the loss function L) as given, we can ask what is the optimal way to assign treatments, using the information in the baseline covariates X, and possibly a randomization device U.

In order to answer this question, we first calculate the risk function for a deterministic treatment allocation δ = (d_1, ..., d_n), which only depends on the covariates X. If loss is the squared estimation error (β̂ - β)², this gives the MSE

R(δ, θ) = MSE(d_1, ..., d_n) = E[(β̂ - β)² | X, θ].    (16)


This is the calculation we did in Section 2 to get the MSE columns for models 1 and 2. The two models can be thought of as two values of θ at which we evaluate the risk function.

As in the general decision problem, we get that the risk function of a randomized procedure is given by the weighted average of the risk functions for deterministic procedures, R(δ, θ) = Σ_u R(δ^u, θ) · P(U = u). Consequently, the same holds for Bayes risk and minimax risk. This implies that optimal deterministic procedures perform at least as well as any randomized procedures.

But do they perform better? Do we lose anything by randomizing? As stated in Proposition 1, the answer to this question hinges on whether the optimal deterministic assignment is unique. Any assignment procedure, random or not, is optimal if and only if it assigns positive probability only to assignments (d_1, ..., d_n) that minimize risk among deterministic assignments.
If we were to observe no covariates, then permutations of treatment assignments would not change risk, since everything "looks the same" after the permutation. In that case, randomization is perfectly fine, as long as the sizes of the treatment and control groups are fixed. Suppose next that we
observe a covariate with small support, so that we get several observations for each value of the
covariate. In this case, we can switch treatment around within each block defined by a covariate

6 Squared error loss is the canonical loss function in the literature on estimation. It has a lot of convenient properties allowing for tractable analytic results, and in keeping with the literature we will focus on squared error loss. Other objective functions are conceivable, such as for instance expected welfare of treatment assignments based on experimental evidence; see for instance Kasy (2014).


value without changing risk—again, everything looks the same after the switch. In this case, stratified (blocked) randomization is optimal. Suppose now, however, that we observe continuous covariates, or covariates with large support. In that case, in general, no two units have the same covariate values, so that we cannot just switch treatment around without changing risk. The optimal deterministic assignment will in general be unique (possibly up to a "flip" of the treatment and control groups). This implies that randomizing comes with a cost in terms of risk relative to the optimal deterministic assignment.

4 Our Proposed Procedure

There is one key question we need to answer before we can operationalize this approach: how to aggregate the risk function R(δ, θ) into a global objective. We will consider a Bayesian approach, using a nonparametric prior. The key object that we need a prior for is the conditional expectation of potential outcomes given covariates, since it is this function that determines the expected MSE of estimators for the average treatment effect. Denote this function by

f(x, d) = E[Y_i^d | X_i = x, θ].    (17)

In model 1 of Section 2, we had f(x, d) = x + d; in model 2, we had f(x, d) = -x² + d. We will assume that the prior for f is such that expectations and variances exist. We denote

E[f(x, d)] = μ(x, d), and

Cov(f(x_1, d_1), f(x_2, d_2)) = C((x_1, d_1), (x_2, d_2))    (18)

for a covariance function C. Assume further that we have a prior over the conditional variance of Y, which satisfies

E[Var(Y_i | X_i, D_i, θ) | X_i, D_i] = σ²(X_i, D_i) = σ².    (19)


Assuming a prior centered on homoskedasticity is not without loss of generality, but it is a natural baseline.
How should the prior moments μ and C be chosen? We provide an extensive discussion in the Online Appendix; a popular choice in the machine-learning literature is so-called squared exponential priors, where μ = 0 and

C((x_1, d_1), (x_2, d_2)) = exp(-(||x_1 - x_2||² + (d_1 - d_2)²)/l).    (20)

The length scale parameter l determines the assumed smoothness of f for such priors. This is the prior we used to calculate the expected MSE in Table 1, choosing a length scale of l = 10. Another popular set of priors builds on linear models. For such priors, we show below that expected MSE corresponds to balance as measured by differences in covariate means.
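To make the role of the length scale concrete, here is a minimal sketch of the squared exponential covariance of equation (20); the specific evaluation points are illustrative, not from the article:

```python
# Prior correlation of f at pairs of (x, d) points under the squared
# exponential covariance function of equation (20). A large length scale
# implies a smooth f: nearby covariate values remain highly correlated.
import math

def sqexp_cov(x1, d1, x2, d2, length_scale):
    return math.exp(-((x1 - x2) ** 2 + (d1 - d2) ** 2) / length_scale)

for l in (1.0, 10.0):
    corrs = [sqexp_cov(0.0, 1, dx, 1, l) for dx in (0.0, 1.0, 2.0, 3.0)]
    print(f"l = {l}: prior correlations of f(0, 1) with f(x, 1):",
          [round(c, 3) for c in corrs])
```

With l = 1 the prior correlation decays to essentially zero within the covariate range of the example; with l = 10 the prior treats f as nearly linear over that range.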
Recall that we are ultimately interested in estimating the conditional average treatment effect,

β = (1/n) Σ_i [f(X_i, 1) - f(X_i, 0)].    (21)

The conditional average treatment effect is the object of interest if we want to learn about treatment effects for units, in the population from which our sample was drawn, which look similar in terms of covariates. We might be interested in this effect both for scientific reasons and for policy reasons (deciding about future treatment allocations). One more question has to be settled before we can give an expression for the Bayes risk of any given treatment assignment: How is β going to be estimated? We will consider two possibilities. The first possibility is that we use an estimator that is optimal in the Bayesian sense, namely the posterior best linear predictor of β. The second possibility is that we estimate β using a simple difference in means, as in the example of Section 2.


4.1 Bayes optimal estimation

It is useful to introduce the following notation for the prior moments of the observed outcomes and of β:

μ_i = μ(X_i, d_i)
C_ij = C((X_i, d_i), (X_j, d_j))
μ_β = (1/n) Σ_i [μ(X_i, 1) - μ(X_i, 0)]    (22)
C_i = (1/n) Σ_j [C((X_i, d_i), (X_j, 1)) - C((X_i, d_i), (X_j, 0))].

Let μ and C be the corresponding vector and matrix with entries μ_i and C_ij. Note that both μ and C depend on the treatment assignment (d_1, ..., d_n). Using this notation, the prior mean and variance of Y are equal to μ and C + σ²I, and the prior mean of β equals μ_β.

Let us now consider the posterior best linear predictor, which is the best estimator (in the Bayesian risk sense) among all estimators linear in Y.7 The posterior best linear predictor is equal to the posterior expectation if both the prior for f and the distribution of the residuals Y - f are multivariate normal.

Proposition 2 (Posterior best linear predictor and expected loss). The posterior best linear predictor for the conditional average treatment effect is given by

β̂ = μ_β + C̄' · (C + σ²I)⁻¹ · (Y − μ), (23)

and the corresponding MSE (Bayes risk) equals

MSE(d_1, …, d_n) = Var(β|X) − C̄' · (C + σ²I)⁻¹ · C̄, (24)

where Var(β|X) is the prior variance of β.

The proof of this proposition follows from standard characterizations of best linear predictors and can be found in Appendix A. The expression for the MSE provided by equation (24) is easily evaluated for any choice of (d_1, …, d_n). Since our goal is to minimize the MSE, we can in fact ignore the Var(β|X) term, which does not depend on (d_1, …, d_n).
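The d-dependent part of equation (24) can be sketched in a few lines of Python; the function and argument names below are ours, and the prior kernel `cov(x1, t1, x2, t2)` is supplied by the user:

```python
import numpy as np

def bayes_mse_part(X, d, cov, sigma2):
    """d-dependent part of equation (24): -Cbar' (C + sigma^2 I)^{-1} Cbar.
    The Var(beta|X) term is omitted since it does not depend on d."""
    n = len(d)
    # C_ij = C((X_i, d_i), (X_j, d_j)), as in equation (22)
    C = np.array([[cov(X[i], d[i], X[j], d[j]) for j in range(n)]
                  for i in range(n)])
    # Cbar_i = (1/n) sum_j [C((X_i,d_i),(X_j,1)) - C((X_i,d_i),(X_j,0))]
    Cbar = np.array([np.mean([cov(X[i], d[i], X[j], 1)
                              - cov(X[i], d[i], X[j], 0)
                              for j in range(n)]) for i in range(n)])
    return -Cbar @ np.linalg.solve(C + sigma2 * np.eye(n), Cbar)
```

Comparing this quantity across candidate assignments ranks them by Bayes risk; since the omitted Var(β|X) term is a constant, the ranking is unaffected.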

4.2 Difference in means estimation

Let us next consider the alternative case where the experimenter uses the simple difference in means estimator. We will need the following notation:

μ_i^d = μ(X_i, d)

C_ij^{d1,d2} = C((X_i, d1), (X_j, d2)), (25)

for d, d1, d2 ∈ {0, 1}. We collect these terms in the vectors μ^d and matrices C^{d1,d2}, which are in turn collected as

μ = (μ⁰, μ¹)

C = ( C⁰⁰  C⁰¹
      C¹⁰  C¹¹ ). (26)

7This class of estimators includes all standard estimators of β under unconfoundedness, such as those based on matching, inverse probability weighting, regression with controls, kernel regression, series regression, splines, etc. Linearity of the estimator is unrelated to any assumption of linearity in X; we are considering the posterior BLP of β in Y rather than the BLP of Y_i in X_i.

334 Maximilian Kasy

In contrast to the objects defined in Section 4.1, μ and C do not depend on (d_1, …, d_n); the earlier vector and matrix are a subvector and submatrix of the ones defined here, selected according to the realized assignment. Stacking the values f(X_i, d) analogously to μ, we get E[f] = μ and Var(f) = C.

Proposition 3 (Expected MSE for designs, where β is estimated using the difference in means). Assume β is estimated using the difference in means estimator

β̂ = (1/n_1) Σ_i d_i Y_i − (1/n_0) Σ_i (1 − d_i) Y_i, (27)

where n_1 = Σ_i d_i and n_0 = n − n_1. Then the expected MSE of the treatment assignment (d_1, …, d_n) equals

MSE(d_1, …, d_n) = σ² (1/n_1 + 1/n_0) + (w' · μ)² + w' · C · w, (28)

where w = (−w⁰, w¹), with

w_i¹ = d_i/n_1 − 1/n,
(29)
w_i⁰ = (1 − d_i)/n_0 − 1/n.

The expression for the MSE in this proposition has three terms. The first term is the variance of
the estimator. The second and third terms together are the expected squared bias. This splits in turn
into the square of the prior expected bias, and the prior variance of the bias.
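Equation (28) can be evaluated directly for any candidate assignment. The following sketch (function names are ours) takes user-supplied prior mean and covariance functions `mu(x, t)` and `cov(x1, t1, x2, t2)`, and stacks f as (f⁰, f¹), matching equation (26):

```python
import numpy as np

def dim_expected_mse(X, d, mu, cov, sigma2):
    """Expected MSE of the difference in means estimator, equation (28):
    sigma^2 (1/n1 + 1/n0) + (w' mu)^2 + w' C w, weights as in eq. (29)."""
    d = np.asarray(d)
    n = len(d)
    n1 = int(d.sum())
    n0 = n - n1
    w1 = d / n1 - 1.0 / n            # eq. (29): weights on f(X_i, 1)
    w0 = (1 - d) / n0 - 1.0 / n      # eq. (29): weights on f(X_i, 0)
    w = np.concatenate([-w0, w1])    # w = (-w^0, w^1)
    mu_vec = np.concatenate([[mu(x, 0) for x in X], [mu(x, 1) for x in X]])
    def block(t1, t2):
        return np.array([[cov(x1, t1, x2, t2) for x2 in X] for x1 in X])
    C = np.block([[block(0, 0), block(0, 1)], [block(1, 0), block(1, 1)]])
    return sigma2 * (1.0 / n1 + 1.0 / n0) + (w @ mu_vec) ** 2 + w @ C @ w
```

As a check, plugging in the linear-model kernel cov(x1, t1, x2, t2) = x1'·Σ·x2 with μ = 0 reproduces the balance formula of equation (33) below.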
It is interesting to note that we recover the standard notion of balance if, in addition to the assumptions of Proposition 3, we impose a linear, separable model for f, that is,

f(x, d) = x' · γ + d · β, (30)

where γ has prior mean 0 and variance Σ. For this model, the bias of the difference in means estimator equals

E[Δ|f] = (X̄¹ − X̄⁰)' · γ, (31)

where X̄^d is the sample mean of covariates in experimental arm d, so that the expected squared bias is equal to

(X̄¹ − X̄⁰)' · Σ · (X̄¹ − X̄⁰), (32)

and the MSE equals

MSE(d_1, …, d_n) = σ² (1/n_1 + 1/n_0) + (X̄¹ − X̄⁰)' · Σ · (X̄¹ − X̄⁰). (33)

Risk is thus minimized by choosing treatment and control arms of equal size, and optimizing balance as measured by the difference in covariate means (X̄¹ − X̄⁰).
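Equation (33) is short enough to state in code form; the sketch below (names are ours) makes explicit that under the linear separable prior the design matters only through arm sizes and the difference in covariate means:

```python
import numpy as np

def linear_model_mse(X, d, Sigma, sigma2):
    """Risk under the linear separable model of equation (30), eq. (33):
    sigma^2 (1/n1 + 1/n0) + (Xbar1 - Xbar0)' Sigma (Xbar1 - Xbar0)."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d)
    n1 = int(d.sum())
    n0 = len(d) - n1
    # difference in covariate means between treatment and control arms
    diff = X[d == 1].mean(axis=0) - X[d == 0].mean(axis=0)
    return sigma2 * (1.0 / n1 + 1.0 / n0) + diff @ Sigma @ diff
```

Minimizing this criterion over assignments reproduces the familiar advice: equal arm sizes and covariate-mean balance.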

4.3 Discrete optimization

We now have almost all the ingredients for a procedure that can be used by practitioners. One important element is missing: How do we find the optimal assignment d? Or how, at least, do we find a set of assignments that are close to optimal in terms of expected risk? This question matters, since solving the problem by brute force is generally not feasible. We could do so for our example in Section 2, since in this example there were only 2⁴ = 16 possible treatment assignments. In general, there are 2ⁿ assignments for a sample of size n, a number that very quickly becomes prohibitive to search by brute force.


So what to do? One option, which is close to a common practice in field experiments, is to rerandomize a feasible number of times, and to pick the best among the assignments so picked. A quick calculation shows that this procedure performs well: if we rerandomize k times and pick d as the best of the k assignments drawn, the probability that d is better than 99% of all assignments, say, is equal to 1 − 0.99^k. For k ≥ 459 this probability is itself larger than 99%, and significantly larger k are feasible.

Implementation of our design procedure using rerandomization can then be summarized as follows: First, pick a prior. For our example in Section 2, we used μ = 0 and C((x_1, d_1), (x_2, d_2)) = 10 · exp(−(‖x_1 − x_2‖² + (d_1 − d_2)²)/10). Calculate the corresponding prior moment vectors and matrices, and then iterate the following:
iterate the following:

1. Draw a random treatment assignment.

2. Calculate the objective function of equation (28).

3. Compare it to the best MSE obtained thus far. If the new MSE is better than the best obtained so far, store the new treatment assignment and MSE.

4. Iterate for some prespecified number of times k.
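The four steps above can be sketched as follows (a minimal illustration; the function names are ours, and `objective` stands for any expected-MSE criterion such as equation (28)):

```python
import numpy as np

def rerandomization_search(objective, n, k, seed=None):
    """Draw k random assignments and keep the one with the smallest
    expected MSE, as in steps 1-4 of the text."""
    rng = np.random.default_rng(seed)
    best_d, best_mse = None, np.inf
    for _ in range(k):
        d = rng.integers(0, 2, size=n)       # step 1: random assignment
        if d.sum() in (0, n):                # skip degenerate one-arm draws
            continue
        mse = objective(d)                   # step 2: evaluate the objective
        if mse < best_mse:                   # step 3: keep it if it is better
            best_d, best_mse = d.copy(), mse
    return best_d, best_mse                  # step 4: best of k draws

# The 'quick calculation' in the text: the best of k draws beats 99% of all
# assignments with probability 1 - 0.99^k, which exceeds 99% once k >= 459.
assert 1 - 0.99 ** 459 > 0.99
```

Any of the risk criteria derived above can be plugged in as `objective`; with a cheap objective, k in the tens of thousands is unproblematic on current hardware.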

More sophisticated alternative optimization methods are discussed in the Online Appendix.

5 Arguments for Randomization

There are a number of arguments that can be made for randomization and against the framework considered in this article. We shall discuss some of these, and to what extent they appear to be justified.

5.1 Randomization inference requires randomization

That is correct. How does this relate to our argument? The arguments of Section 3 show that procedures that are optimal in a decision theoretic sense do not rely on randomization. Randomization inference cannot be rationalized as an optimal procedure under the conceptual framework of decision theory, however. As a consequence, the arguments of Section 3 do not apply to it.

One could take the fact that randomization inference is not justified by decision theory as an argument against randomization inference. But one could also consider a compromise approach that is based on randomization among assignments that have a low expected MSE. Such partially randomized assignments are still close to optimal and allow the construction of randomization-based tests.

5.2 Is identification assuming conditional independence still valid without randomization?

Yes, it is. Selection on observables holds, as the name suggests, for any treatment assignment that is a function of observables. Put differently, conditional independence is guaranteed by any controlled trial, whether randomized or not, as stated in the following proposition.

Proposition 4 (Conditional independence). Suppose that (X_i, Y_i⁰, Y_i¹) are i.i.d. draws from the population of interest, which are independent of the randomization device U. Then, any treatment assignment of the form D_i = d_i(X_1, …, X_n, U) satisfies conditional independence:

(Y_i⁰, Y_i¹) ⊥ D_i | X_i. (34)

This is true, in particular, for deterministic treatment assignments, which do not depend on U.
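A small simulation makes the content of Proposition 4 concrete. The data-generating process and numbers below are ours, purely for exposition: the assignment is fully deterministic (treat every other unit along the ranks of X, a function of X_1, …, X_n alone), yet the difference in means recovers the treatment effect because overlap in X is preserved by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
X = rng.uniform(size=n)
order = np.argsort(X)
d = np.zeros(n, dtype=int)
d[order[::2]] = 1            # deterministic assignment d_i(X_1, ..., X_n)
# illustrative outcome model: f(x, t) = x + 0.5 t, i.i.d. noise
Y = X + 0.5 * d + rng.normal(scale=1.0, size=n)
ate_hat = Y[d == 1].mean() - Y[d == 0].mean()
```

Because treated and control units alternate along the covariate, the two arms have nearly identical covariate distributions, and ate_hat is close to the true effect of 0.5 despite the complete absence of randomization.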

5.3 The Bayesian approach to experimental design requires a prior

That is correct, as far as the ranking of assignments in terms of expected risk is concerned. A given assignment might be optimal for a particular prior and objective (such as the mean squared error (MSE) of an estimator for the average treatment effect (ATE)), but not for others.


This does not imply, however, that the subsequent analysis of the experimental data has to rely on the same prior as the one used to choose the design. From the perspective of a researcher analyzing the data, it does not matter that the treatment was assigned based on observables, nor how exactly the treatment was assigned, as long as conditional independence holds, which is guaranteed by Proposition 4. The design does not preclude the use of standard estimators, such as the difference in means, which do not require the specification of a prior.

5.4 Randomized assignments are perceived as fair

A possible objection to the practical feasibility of the designs proposed here is the perception that randomized assignments are fair, while assignments based on covariates are not fair, in a similar way that "tagging" in taxation is perceived as unfair. Such perceptions might impose a constraint on feasible experiments.

Note, however, that optimal designs balance the covariate distribution across treatments, leading to a more equal treatment of demographic or other groups, relative to randomized assignments.

5.5 Does the proposed approach rely on parametric assumptions?

No, it does not. Nowhere have we imposed parametric restrictions on the conditional expectation of potential outcomes or on the distribution of covariates, beyond the assumption of i.i.d. sampling. The approach of Section 4, in particular, is nonparametric.9

6 Conclusion

In this article, we discuss the question of how information from baseline covariates should be used when assigning treatment in an experiment. In order to give a well-grounded answer to this question, we adopt a decision theoretic and nonparametric framework. The nonparametric perspective and the consideration of continuous covariates distinguish this article from much of the previous experimental design literature.

A number of conclusions emerge from our analysis. First, randomization is in general not optimal. Rather, we should pick a risk-minimizing treatment assignment, which is generically unique in the presence of continuous covariates. Second, we can consider nonparametric priors that yield tractable estimators and expressions for Bayesian risk (MSE). The general form of the expected MSE for such priors is Var(β|X) − C̄' · (C + σ²I)⁻¹ · C̄, where C̄ and C are the appropriate covariance vector and matrix from the prior distribution, cf. Section 4. We suggest picking the treatment assignment that minimizes this prior risk. Finally, conditional independence between potential outcomes and treatments given covariates does not require random assignment. It only requires conducting controlled trials, and does not rely on randomized controlled trials. Matlab code to implement our proposed approach is available online at http://dx.doi.org/10.7910/DVN/I5KCWI.

Conflict of interest statement. None declared.

I thank Larry Katz for pointing this out.


9To be precise, the support of these priors is the closure of the corresponding Reproducing Kernel Hilbert Space. For further discussion, see the Online Appendix.


Appendix A: Proofs
Proof of Proposition 2. By the general characterization of best linear predictors,

β̂ = E[β|X, D] + Cov(β, Y|X, D) · Var(Y|X, D)⁻¹ · (Y − E[Y|X, D]). (A1)

Using the notation introduced before the proposition,

E[β|X, D] = μ_β
Cov(β, Y|X, D) = C̄'
Var(Y|X, D) = C + σ²I (A2)
E[Y|X, D] = μ,

which yields equation (23). We furthermore have, by the general properties of best linear predictors, that

MSE(d_1, …, d_n) = E[(β̂ − β)²|X, D] = Var(β̂ − β|X, D),

and Cov(β̂, β − β̂|X, D) = 0, so that

Var(β|X, D) = Var(β̂|X, D) + Var(β − β̂|X, D).

This immediately implies equation (24). □

Proof of Proposition 3. Let ε_i = Y_i − f(X_i, d_i). We can write

Δ := β̂ − β = Σ_i [d_i/n_1 − (1 − d_i)/n_0] · (f(X_i, d_i) + ε_i) − (1/n) Σ_i [f(X_i, 1) − f(X_i, 0)]

and

MSE(d_1, …, d_n) = E[Δ²] = Var(Δ) + E[Δ]²

= E[Var(Δ|f)] + Var(E[Δ|f]) + E[Δ]².

The first term is equal to

Var(Δ|f) = σ² (1/n_1 + 1/n_0).

The second term is equal to the variance of

E[Δ|f] = w' · f.

The third term is equal to the square of

E[Δ] = E[E[Δ|f]]

= E[w' · f]
= w' · E[f].

The claim follows once we recall E[f] = μ and Var(f) = C. □

Proof of Proposition 4. The assumption of independent sampling implies that

(X_i, Y_i⁰, Y_i¹) ⊥ (X_1, …, X_{i−1}, X_{i+1}, …, X_n, U), (A10)

thus

(Y_i⁰, Y_i¹) ⊥ (X_1, …, X_{i−1}, X_{i+1}, …, X_n, U) | X_i, (A11)

and therefore

(Y_i⁰, Y_i¹) ⊥ d_i(X_1, …, X_n, U) | X_i. (A12)

References

Berger, J. 1985. Statistical decision theory and Bayesian inference. New York: Springer.
Blattman, C., A. C. Hartman, and R. A. Blair. 2014. How to promote order and property rights under weak rule of law? An experiment in changing dispute resolution behavior through community education. American Political Science Review 108:100–120.
Findley, M. G., D. L. Nielson, and J. Sharman. 2015. Causes of noncompliance with international law: A field experiment on anonymous incorporation. American Journal of Political Science 59(1):146–61.
Kalla, J. L., and D. E. Broockman. 2015. Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science. doi:10.1111/ajps.12180.
Kasy, M. 2014. Using data to inform policy. Working Paper.
———. 2016. Matlab implementation for: Why experimenters might not always want to randomize, and what they could do instead. Harvard Dataverse. http://dx.doi.org/10.7910/DVN/I5KCWI.
Keele, L. 2015. The statistics of causal inference: A view from political methodology. Political Analysis. doi:10.1093/pan/mpv007.
Moore, R. T. 2012. Multivariate continuous blocking to improve political science experiments. Political Analysis 20(4):460–79.
Morgan, K. L., and D. B. Rubin. 2012. Rerandomization to improve covariate balance in experiments. Annals of Statistics 40(2):1263–82.
Nyhan, B., and J. Reifler. 2014. The effect of fact-checking on elites: A field experiment on U.S. state legislators. American Journal of Political Science.

