0% found this document useful (0 votes)

11 views

Arkhangelsky-SyntheticDifferenceinDifferences-2021

Uploaded by

xadobo7465

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views

Arkhangelsky-SyntheticDifferenceinDifferences-2021

Uploaded by

xadobo7465

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 32

American Economic Association

Synthetic Difference-in-Differences
Author(s): Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg, Guido W. Imbens and
Stefan Wager
Source: The American Economic Review , DECEMBER 2021, Vol. 111, No. 12 (DECEMBER
2021), pp. 4088-4118
Published by: American Economic Association

Stable URL: https://www.jstor.org/stable/10.2307/27086719

REFERENCES
Linked references are available on JSTOR for this article:
https://www.jstor.org/stable/10.2307/27086719?seq=1&cid=pdf-
reference#references_tab_contents
You may need to log in to JSTOR to access the linked references.

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
https://about.jstor.org/terms

American Economic Association is collaborating with JSTOR to digitize, preserve and extend access
to The American Economic Review

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
American Economic Review 2021, 111(12): 4088–4118
https://doi.org/10.1257/aer.20190159

Synthetic Difference-in-Differences†

By Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg,

Guido W. Imbens, and Stefan Wager*

We present a new estimator for causal effects with panel data that
builds on insights behind the widely used difference-in-differences
and synthetic control methods. Relative to these methods we find,
both theoretically and empirically, that this “synthetic differ-
ence-in-differences” estimator has desirable robustness properties,
and that it performs well in settings where the conventional estima-
tors are commonly used in practice. We study the asymptotic behav-
ior of the estimator when the systematic part of the outcome model
includes latent unit factors interacted with latent time factors, and
we present conditions for consistency and asymptotic normality.
(JEL C23, H25, H71, I18, L66)

Researchers are often interested in evaluating the effects of policy changes using
panel data, i.e., using repeated observations of units across time, in a setting where
some units are exposed to the policy in some time periods but not others. These pol-
icy changes are frequently not random—neither across units of analysis, nor across
time periods—and even unconfoundedness given observed covariates may not be
credible (e.g., Imbens and Rubin 2015). In the absence of exogenous variation
researchers have focused on statistical models that connect observed data to unob-
served counterfactuals. Many approaches have been developed for this setting but,
in practice, a handful of methods are dominant in empirical work. As documented
by Currie, Kleven, and Zwiers (2020), difference-in-differences (DID) methods
have been widely used in applied economics over the last three decades; see also
Ashenfelter and Card (1985); Bertrand, Duflo, and Mullainathan (2004); and Angrist

* Arkhangelsky: CEMFI, Madrid (email: [email protected]); Athey: Graduate School of Business, Stanford
University, SIEPR, and NBER (email: [email protected]); Hirshberg: Department of Quantitative Theory
and Methods, Emory University (email: [email protected]); Imbens: Graduate School of Business
and Department of Economics, Stanford University, SIEPR, and NBER (email: [email protected]); Wager:
Graduate School of Business, and of Statistics (by courtesy), Stanford University (email: [email protected]).
Thomas Lemieux was the coeditor for this article. We are grateful for helpful comments and feedback from ref-
erees, as well as from Alberto Abadie, Avi Feller, Paul Goldsmith-Pinkham, Liyang Sun, Erik Sverdrup, Yiqing
Xu, Yinchu Zhu, and seminar participants at several venues. This research was generously supported by ONR
grant N00014-17-1-2131 and the Sloan Foundation. The R package for implementing the methods developed here
is available at https://github.com/synth-inference/synthdid. The associated vignette is at https://synth-inference.
github.io/synthdid/.
†
Go to https://doi.org/10.1257/aer.20190159 to visit the article page for additional materials and author
disclosure statements.

4088

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
VOL. 111 NO. 12 ARKHANGELSKY ET AL.: SYNTHETIC DIFFERENCE IN DIFFERENCES 4089

and Pischke (2008). More recently, synthetic control (SC) methods, introduced in a
series of seminal papers by Abadie and coauthors (Abadie and Gardeazabal 2003;
Abadie, Diamond, and Hainmueller 2010; 2015; Abadie and L’Hour 2016), have
emerged as an important alternative method for comparative case studies.
Currently these two strategies are often viewed as targeting different types of
empirical applications. In general, DID methods are applied in cases where we have
a substantial number of units that are exposed to the policy, and researchers are will-
ing to make a “parallel trends” assumption that implies that we can adequately con-
trol for selection effects by accounting for additive unit-specific and time-specific
fixed effects. In contrast, SC methods, introduced in a setting with only a single (or
small number) of units exposed, seek to compensate for the lack of parallel trends
by reweighting units to match their p re-exposure trends.
In this paper, we argue that although the empirical settings where DID and SC
methods are typically used differ, the fundamental assumptions that justify both
methods are closely related. We then propose a new method, synthetic difference in
differences (SDID), that combines attractive features of both. Like SC, our method
reweights and matches pre-exposure trends to weaken the reliance on parallel trend
type assumptions. Like DID, our method is invariant to additive u nit-level shifts,
and allows for valid large-panel inference. Theoretically, we establish consistency
and asymptotic normality of our estimator. Empirically, we find that our method is
competitive with (or dominates) DID in applications where DID methods have been
used in the past, and likewise is competitive with (or dominates) SC in applications
where SC methods have been used in the past.
To introduce the basic ideas, consider a balanced panel with N units and T
time periods, where the outcome for unit iin period tis denoted by Y it, and expo-
sure to the binary treatment is denoted by W it ∈ {0, 1}. Suppose moreover that
co (control) units are never exposed to the treatment, while the last N
the first N tr
= N − Nco (treated) units are exposed after time T pre.1 Like with SC methods, we
start by finding weights ω ˆ   sdidthat align pre-exposure trends in the outcome of unex-
posed units with those for the exposed units, e.g., ∑ Ni=1 sdid
co
  ωˆ   i  Yit ≈ N  −1
tr  ∑ i=Nco
N
+1 Yit
for all t = 1, …, Tpre. We also look for time weights λ    t  that balance p re-exposure
ˆ sdid

time periods with postexposure ones (see Section I for details). Then we use these
weights in a basic two-way fixed effects regression to estimate the average causal
effect of exposure (denoted by τ ):2

{i=1 t=1 }
N T
(τˆ   sdid, μˆ , αˆ , βˆ ) =
(1)    ∑  ∑ (Yit  − μ − αi  − βt  − Wit τ)  2 ωˆ   sdid
arg min i  λ   t  .
ˆ sdid
τ,μ,α,β

In comparison, DID estimates the effect of treatment exposure by solving the same
two-way fixed effects regression problem without either time or unit weights:

{i=1 t=1 }
N T
( τˆ   did, μˆ , αˆ , βˆ ) = arg min
(2)    ∑  ∑ (Yit  − μ − αi  − βt  − Wit τ)  2 .
α,β,μ,τ

1
Throughout the main part of our analysis, we focus on the block treatment assignment case where Wit = 1({i >
Nco, t > Tpre}). In the closely related staggered adoption case (Athey and Imbens 2021) where units adopt the
treatment at different times, but remain exposed after they first adopt the treatment, one can modify the methods
developed here. See the Appendix for details.
2
This estimator also has an interpretation as a DID of weighted averages of observations. See equations (7)
and (8) below.

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4090 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

The use of weights in the SDID estimator effectively makes the two-way fixed effect
regression “local,” in that it emphasizes (puts more weight on) units that on average
are similar in terms of their past to the target (treated) units, and it emphasizes peri-
ods that are on average similar to the target (treated) periods.
This localization can bring two benefits relative to the standard DID estimator.
Intuitively, using only similar units and similar periods makes the estimator more
robust. For example, if one is interested in estimating the effect of anti-smoking
legislation on California (Abadie, Diamond, and Hainmueller 2010), or the effect of
German reunification on West Germany (Abadie, Diamond, and Hainmueller 2015),
or the effect of the Mariel boatlift on Miami (Card 1990, Peri and Yasenov 2019),
it is natural to emphasize states, countries or cities that are similar to California,
West Germany, or Miami respectively relative to states, countries, or cities that are
not. Perhaps less intuitively, the use of the weights can also improve the estima-
tor’s precision by implicitly removing systematic (predictable) parts of the outcome.
However, the latter is not guaranteed: If there is little systematic heterogeneity in
outcomes by either units or time periods, the unequal weighting of units and time
periods may worsen the precision of the estimators relative to the DID estimator.
Unit weights are designed so that the average outcome for the treated units is
approximately parallel to the weighted average for control units. Time weights are
designed so that the average posttreatment outcome for each of the control units
differs by a constant from the weighted average of the p retreatment outcomes for the
same control units. Together, these weights make the DID strategy more plausible.
This idea is not far from the current empirical practice. Raw data rarely exhibit paral-
lel time trends for treated and control units, and researchers use different techniques,
such as adjusting for covariates or selecting appropriate time periods to address this
problem (e.g., Abadie 2005, Callaway and Sant’anna 2020). Graphical evidence that
is used to support the parallel trends assumption is then based on the adjusted data.
SDID makes this process automatic and applies a similar logic to weighting both
units and time periods, all while retaining statistical guarantees. From this point of
view, SDID addresses pretesting concerns recently expressed in Roth (2018).
In comparison with the SDID estimator, the SC estimator omits the unit fixed
effect and the time weights from the regression function:

{i=1 t=1 }
N T
(τˆ   sc, μˆ , βˆ ) =
(3)  ∑  ∑ (Yit  − μ − βt  − Wit τ)  2 ωˆ   sc
 
arg min i  .
μ,β,τ

The argument for including time weights in the SDID estimator is the same as the
argument for including the unit weights presented earlier: The time weight can both
remove bias and improve precision by eliminating the role of time periods that are
very different from the posttreatment periods. Similar to the argument for the use of
weights, the argument for the inclusion of the unit fixed effects is twofold. First, by
making the model more flexible, we strengthen its robustness properties. Second,
as demonstrated in the application and simulations based on real data, these unit
fixed effects often explain much of the variation in outcomes and can improve pre-
cision. Under some conditions, SC weighting can account for the unit fixed effects
on its own. In particular, this happens when the weighted average of the outcomes
for the control units in the pretreatment periods is exactly equal to the average of
outcomes for the treated units during those pretreatment periods. In practice, this

This content downloaded from

equality holds only approximately, in which case including the unit fixed effects
in the weighted regression will remove some of the remaining bias. The benefits of
including unit fixed effects in the SC regression (3) can also be obtained by apply-
ing the SC method after centering the data by subtracting, from each unit’s trajec-
tory, its p retreatment mean. This estimator was previously suggested in Doudchenko
and Imbens (2016) and Ferman and Pinto (2019). To separate out the benefits of
allowing for fixed effects from those stemming from the use of time weights, we
include in our application and simulations this synthetic control with intercept DIFP
(Doudchenko-Imbens Ferman-Pinto) estimator.

I. An Application

To get a better understanding of how τˆ   did, τˆ   sc, and τˆ   sdidcompare to each other,
we first revisit the California smoking cessation program example of Abadie,
Diamond, and Hainmueller (2010). The goal of their analysis was to estimate the
effect of increased cigarette taxes on smoking in California (based on the data from
Orzechowski & Walker 2005). We consider observations for 39 states (including
California) from 1970 through 2000. California passed Proposition 99 increas-
ing cigarette taxes (i.e., is treated) from 1989 onwards. Thus, we have Tpre = 19
pretreatment periods,
Tpost = T − Tpre = 12 posttreatment periods, Nco = 38
unexposed states, and Ntr = 1exposed state (California).

A. Implementing SDID

Before presenting results on the California smoking case, we discuss in detail how
we choose the SC type weights ωˆ   sdid and λˆ   sdidused for our estimator as specified in
(1). Recall that, at a high level, we want to choose the unit weights to roughly match
pretreatment trends of unexposed units with those for the exposed ones, ∑Ni=1 sdid
co
  ωˆ   i 
Yit ≈ N  −1
 tr ∑Ni=Nco +1 Yitfor all t = 1, …, Tpre, and similarly we want to choose the
time weights to balance pre- and postexposure periods for unexposed units.
In the case of the unit weights ω ˆ   sdid, we implement this by solving the optimiza-
tion problem

(ωˆ 0, ωˆ   sdid) =   arg min

(4)   ℓunit(ω0, ω),
ω0∈ℝ,ω∈Ω

where

( 0 )
Tpre Nco N
2

ℓ unit(ω0, ω) =  ∑  ω

  +  ∑  ωi Yit   −  _1     ∑ Y it    + ζ  2 Tpre
∥ω∥  22   ,
t=1 i=1 N
tr i=Nco+1

{ω   + 1, …, N},
Nco

Ω =
∈ ℝ  N+  :  ∑  ωi = 1, ωi = N  −1
tr    for all i =
Nco
i=1

where ℝ+
denotes the positive real line. We set the regularization parameter ζ as
Tpre−1
Nco
(Ntr Tpost
)  σˆ  with σˆ   =  _    ∑   ∑ (Δit   − Δ )  ,
–
(5)
ζ =   
1/4 2 1 2

Nco(Tpre  − 1)i=1 t=1

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4092 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

where
Tpre−1
Nco
Δ  =  _    ∑    ∑ Δit .
–
Δit = Yi(t+1)  − Yit, and
  
1
(Tpre  − 1)i=1
Nco t=1

That is, we choose the regularization parameter ζ to match the size of a typical
o ne-period outcome change Δitfor unexposed units in the pre-period, multiplied by
a theoretically motivated scaling (Ntr Tpost )  1/4. The SDID weights ω
ˆ   sdidare closely
related to the weights used in Abadie, Diamond, and Hainmueller (2010), with
two minor differences. First, we allow for an intercept term ω0 , meaning that the
weights ωˆ   sdidno longer need to make the unexposed p re-trends perfectly match the
exposed ones; rather, it is sufficient that the weights make the trends parallel. The
reason we can allow for this extra flexibility in the choice of weights is that our
use of fixed effects αiwill absorb any constant differences between different units.
Second, following Doudchenko and Imbens (2016), we add a regularization pen-
alty to increase the dispersion, and ensure the uniqueness, of the weights. If we
were to omit the intercept ω 0 and set ζ = 0, then (4) would correspond exactly to
a choice of weights discussed in Abadie, Diamond, and Hainmueller (2010) in the
case where N tr = 1.
We implement this for the time weights λˆ   sdidby solving 3

(6) (λˆ 0, λˆ   sdid) =  arg min

  ℓtime(λ0, λ),
λ0∈ℝ,λ∈Λ

where
Nco
Tpre 2

i=1 ( +1 )
T
ℓ t ime( λ0, λ) =  ∑  λ0  +  ∑  λt Yit  −  _
1     ∑ Y it   ,
t=1 post t=Tpre
T

Λ = {λ Tpre  + 1, …, T}.

Tpre
∈ ℝ  T+  :  ∑  λt = 1, λt = T  −1
post for all t =
t=1

The main difference between (4) and (6) is that we use regularization for the former
but not the latter. This choice is motivated by our formal results, and reflects the fact
we allow for correlated observations within time periods for the same unit, but not
across units within a time period, beyond what is captured by the systematic compo-
nent of outcomes as represented by a latent factor model.
We summarize our procedure as Algorithm 1.4 In our application and simula-
tions we also report the SC and DIFP estimators. Both of these use weights solv-
ing (4) without regularization. The SC estimator also omits the intercept ω 0.5

3
The weights λˆ  sdidmay not be uniquely defined, as ℓtime can have multiple minima. In principle our results hold
for any argmin of ℓtime . These tend to be similar in the setting we consider, as they all converge to unique ‘oracle
weights’ λ̃  sdidthat are discussed in Section IIIB. In practice, to make the minimum defining our time weights
unique, we add a very small regularization term ζ 2Nco ∥ λ ∥2to ℓtime
, taking ζ = 10−6σˆ for σˆ as in (5).
4
Some applications feature time-varying exogenous covariates Xit ∈ ℝp. We can incorporate adjustment for
these covariates by applying SDID to the residuals Yres it = Yit− Xitβof the regression of Yiton Xit.
ˆ
5
Like the time weights λ ˆ  sdid, the unit weights for the SC and DIFP estimators may not be uniquely defined.
To ensure uniqueness in practice, we take ζ = 10−6σˆ , not ζ = 0, in ℓ unit . In our simulations, SC and DIFP with
this minimal form of regularization outperform more strongly regularized variants with ζas in (5). We show this
comparison in Table 6.

This content downloaded from

Algorithm 1—SDID

Data: Y, W
Result: Point estimate τˆ   sdid
1. Compute regularization parameter ζ using (5);
2. Compute unit weights ω ˆ   sdid via (4);
3. Compute time weights λ ˆ   sdid via (6);
4. Compute the SDID estimator via the weighted DID regression

{i=1 t=1 }
N T
(τˆ   sdid, μˆ , αˆ , βˆ ) = arg min
     ∑  ∑  (Yit  − μ − αi  − βt  − Wit τ)  2 ωˆ   sdid
i  λ   t  ;
ˆ sdid
τ,μ,α,β

Table 1

SDID SC DID MC DIFP

Estimate −15.6 −19.6 −27.3 −20.2 −11.1
Standard error (8.4) (9.9) (17.7) (11.5) (9.5)

Notes: Estimates for average effect of increased cigarette taxes on California per capita ciga-
rette sales over 12 posttreatment years, based on SDID, SC, DID, MC, DIFP, along with esti-
mated standard errors. We use the “placebo method” standard error estimator discussed in
Section IV.

Finally, we report results for the matrix completion (MC) estimator proposed by
Athey et al. (2021), which is based on imputing the missing Yit(0)using a low rank
factor model with nuclear norm regularization.

B. The California Smoking Cessation Program

The results from running this analysis are shown in Table 1. As argued in Abadie,
Diamond, and Hainmueller (2010), the assumptions underlying the DID estimator
are suspect here, and the −27.3 point estimate likely overstates the effect of the
policy change on smoking. SC provides a reduced (and generally considered more
credible) estimate of −19.6. The other methods, our proposed SDID, the DIFP and
the MC estimator are all smaller than the DID estimator with the SDID and DIFP
estimator substantially smaller than the SC estimator. At the very least, this differ-
ence in point estimates implies that the use of time weights and unit fixed effects
in (1) materially affects conclusions, and, throughout this paper, we will argue that
when τˆ   scand τˆ   sdiddiffer, the latter is often more credible. Next, and perhaps surpris-
ingly, we see that the standard errors obtained for SDID (and also for SC, DIFP, and
MC) are smaller than those for DID, despite our method being more flexible. This is
a result of the local fit of SDID (and SC) being improved by the weighting.
To facilitate direct comparisons, we observe that each of the three estimators can
be rewritten as a weighted average difference in adjusted outcomes δˆ ifor appropri-
ate sample weights ωˆ i :
Nco
N
τˆ  = δˆ tr  −  ∑  ωˆ i δˆ i
(7) where δˆ tr =  _1     ∑ δˆ  .
i=1 Ntr i=Nco+1 i

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4094 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

DID uses constant weights ω i  =

ˆ   did N  −1
co 
, while the construction of SDID and SC
weights is outlined in Section IA. For the adjusted outcomes δˆ i, SC uses unweighted
treatment period averages, DID uses unweighted differences between average treat-
ment period and pretreatment outcomes, and SDID uses weighted differences of the
same:
T
δˆ   i  =  
(8)
sc
_ 1     ∑ Y it ,
Tpost t=Tpre
+1

T Tpre

δˆ   did _ 1     ∑ Y it  −  _ 1    ∑  Y ,
i  =  
Tpost
t=Tpre
+1
T pre t=1 it

T Tpre
δˆ   sdid _ 1     ∑ Y it  −  ∑  λˆ   sdid
i  =   t  Yit .
Tpost t=Tpre
+1 t=1

The top panel of Figure 1 illustrates how each method operates. As is well-known
(Ashenfelter and Card 1985), DID relies on the assumption that cigarette sales in
different states would have evolved in a parallel way absent the intervention. Here,
preintervention trends are obviously not parallel, so the DID estimate should be con-
sidered suspect. In contrast, SC reweights the unexposed states so that the weighted
of outcomes for these states match California p reintervention as close as possible,
and then attributes any p ostintervention divergence of California from this weighted
average to the intervention. What SDID does here is reweight the unexposed control
units to make their time trend parallel (but not necessarily identical) to California
preintervention, then apply a DID analysis to this reweighted panel. Moreover,
because of the time weights, we only focus on a subset of the p reintervention time
periods when carrying out this last step. These time periods were selected so that
the weighted average of historical outcomes predicts average treatment period out-
comes for control units, up to a constant. It is useful to contrast the d ata-driven SDID
approach to selecting the time weights to both DID, where all p retreatment periods
are given equal weight, and to event studies where typically the last pretreatment
period is used as a comparison and so implicitly gets all the weight (e.g., Borusyak
and Jaravel 2016; Freyaldenhoven, Hansen, and Shapiro 2019).
The lower panel of Figure 1 plots δˆ tr  − δˆ ifor each method and for each unex-
posed state, where the size of each point corresponds to its weight ωˆ i; observations
with zero weight are denoted by an × symbol. As discussed in Abadie, Diamond,
and Hainmueller (2010), the SC weights ωˆ   scare sparse. The SDID weights ω ˆ   sdid are
also sparse—but less so. This is due to regularization and the use of the intercept ω0,
which allows greater flexibility in solving (4), enabling more balanced weighting.
Observe that both DID and SC have some very high influence states, that is, states
with large absolute values of ωˆ i( δ ˆ tr  − δˆ i) (e.g., in both cases, New Hampshire). In
contrast, SDID does not give any state particularly high influence, suggesting that
after weighting, we have achieved the desired “parallel trends” as illustrated in the
top panel of Figure 1 without inducing excessive variance in the estimator by using
concentrated weights.

This content downloaded from

SDID
DID SC
160
Cigarette consumption

120
(packs/year)

40
Control California

00
70

0
8

9
20

20
19

19
Difference in consumption

40
(packs/year)

−40

−80
0 0.1 0.2 0.3
Pennsylvania
South Carolina
South Dakota
Tennessee
Texas
Vermont
West Virginia
Wyoming

Pennsylvania
South Carolina
South Dakota
Tennessee
Texas
Vermont
West Virginia
Wyoming

Pennsylvania
South Carolina
South Dakota
Tennessee
Texas
Vermont
West Virginia
Wyoming
Arkansas

Delaware

Illinois
Iowa
Kansas
Louisiana
Maine
Minnesota
Missouri
Montana

North Carolina
North Dakota
Ohio
Oklahoma
Rhode Island

Arkansas
Utah
Virginia

Delaware

Illinois
Iowa
Kansas
Louisiana
Maine
Minnesota
Missouri
Montana

Arkansas

Delaware

Illinois
Iowa
Kansas
Louisiana
Maine
Minnesota
Missouri
Montana
Wisconsin

North Carolina
North Dakota
Ohio
Oklahoma
Rhode Island

Utah
Virginia
Wisconsin

North Carolina
North Dakota
Ohio
Oklahoma
Rhode Island

Utah
Virginia
Wisconsin
Alabama
Colorado

Georgia
Idaho
Indiana

Kentucky

Nebraska
Nevada
New Hampshire
New Mexico

Alabama
Colorado

Georgia
Idaho
Indiana

Alabama
Kentucky

Nebraska
Nevada
New Hampshire

Colorado
New Mexico

Georgia
Idaho
Indiana

Kentucky

Nebraska
Nevada
New Hampshire
New Mexico
Connecticut

Mississippi

Connecticut

Mississippi

Connecticut

Mississippi
Figure 1. A Comparison between DID, SC,
and SDID Estimates for the Effect of California Proposition 99
on Per-Capita Annual Cigarette Consumption (in Packs/Year)

Notes: In the first row, we show trends in consumption over time for California and the relevant weighted average of
control states, with the weights used to average pretreatment time periods at the bottom of the graphs. The estimated
effect is indicated by an arrow. In the second row, we show the state-by-state adjusted outcome difference δˆ tr  − δˆ i as
specified in (7) and (8), with the weights ω ˆ iindicated by dot size and the weighted average of these differences: the
estimated effect—indicated by a horizontal line. States are ordered alphabetically. Observations with zero weight
are denoted by an × symbol.

II. Placebo Studies

So far, we have relied on conceptual arguments to make the claim that SDID
inherits good robustness properties from both traditional DID and SC methods, and
shows promise as a method that can be used in settings where either DID and SC
would traditionally be used. The goal of this section is to see how these claims play
out in realistic empirical settings. To this end, we consider two carefully crafted sim-
ulation studies, calibrated to datasets representative of those typically used for panel
data studies. The first simulation study mimics settings where DID would be used in
practice (Section IIA), while the second mimics settings suited to SC (Section IIB).
Not only do we base the outcome model of our simulation study on real datasets,
we further ensure that the treatment assignment process is realistic by seeking to
emulate the distribution of real policy initiatives. To be specific, in Section IIA, we
consider a panel of US states. We estimate several alternative treatment assignment
models to create the hypothetical treatments, where the models are based on the
state laws related to minimum wages, abortion, or gun rights.

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4096 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

In order to run such a simulation study, we first need to commit to an econo-

metric specification that can be used to assess the accuracy of each method. Here,
we work with the following latent factor model (also referred to as an “interactive
fixed-effects model” in Xu 2017; see also Athey et al. 2021),

it = γi υ  ⊤
(9)
Y t    + τ Wit
  + εit ,

where γiis a vector of latent unit factors of dimension R, and υ

tis a vector of latent
time factors of dimension R. In matrix form, this can be written

(10)
Y = L + τW + E where L = Γ ϒ  ⊤.

We refer to Eas the idiosyncratic component or error matrix, and to Las the sys-
tematic component. We assume that the conditional expectation of the error matrix
Egiven the assignment matrix W and the systematic component L is zero. That is,
the treatment assignment cannot depend on E. However, the treatment assignment
may in general depend on the systematic component L (i.e., we do not take W to be
randomized). We assume that E iis independent of E
i′ for each pair of units i, i′ , but
we allow for correlation across time periods within a unit. Our goal is to estimate
the treatment effect τ.
The model (10) captures several qualitative challenges that have received consid-
erable attention in the recent panel data literature. When the matrix L takes on an
it = αi  + βt, then the DID regression will consistently recover
additive form, i.e., L
τ. Allowing for interactions in L is a natural way to generalize the fi xed-effects
specification and discuss inference in settings where DID is misspecified (Bai 2009;
Moon and Weidner 2015, 2017). In our formal results given in Section III, we show
how, despite not explicitly fitting the model (10), SDID can consistently estimate τ
in this design under reasonable conditions. Finally, accounting for correlation over
time within observations of the same unit is widely considered to be an import-
ant ingredient to credible inference using panel data (Angrist and Pischke 2008;
Bertrand, Duflo, and Mullainathan 2004).
In our experiments, we compare DID, SC, SDID, and DIFP, all implemented
exactly as in Section I. We also compare these four estimators to an alternative that
estimates τby directly fitting both Land τ in (10); specifically, we consider the MC
estimator recommended in Athey et al. (2021) that uses nuclear norm penalization
to regularize its estimate of L. In the remainder of this section, we focus on com-
paring the bias and root-mean-squared error (RMSE) of the estimator. We discuss
questions around inference and coverage in Section IV.

A. Current Population Survey Placebo Study

Our first set of simulation experiments revisits the landmark placebo study of
Bertrand, Duflo, and Mullainathan (2004) using the Current Population Survey
(CPS). The main goal of Bertrand, Duflo, and Mullainathan (2004) was to study
the behavior of different standard error estimators for DID. To do so, they randomly
assigned a subset of states in the CPS dataset to a placebo treatment and the rest to
the control group, and examined how well different approaches to inference for DID

This content downloaded from

estimators covered the true treatment effect of zero. Their main finding was that only
methods that were robust to serial correlation of repeated observations for a given
unit (e.g., methods that clustered observations by unit) attained valid coverage.
We modify the placebo analyses in Bertrand, Duflo, and Mullainathan (2004) in
two ways. First, we no longer assigned exposed states completely at random, and
instead use a n onuniform assignment mechanism that is inspired by different policy
choices actually made by different states. Using a nonuniformly random assignment
is important because it allows us to differentiate between various estimators in ways
that completely random assignment would not. Under completely random assign-
ment, a number of methods, including DID, perform well because the presence of
L in (10) introduces zero bias. In contrast, with a nonuniform random assignment
(i.e., treatment assignment is correlated with systematic effects), methods that do
not account for the presence of Lwill be biased. Second, we simulate values for the
outcomes based on a model estimated on the CPS data, in order to have more control
over the data generating process.

The Data Generating Process.—For the first set of simulations we use as the start-
ing point data on wages for women with positive wages in the March outgoing rota-
tion groups in the CPS for the years 1979 to 2019. We first transform these by taking
logarithms and then average them by state/year cells (we use data from National
Bureau of Economic Research). Our simulation design has two components, an out-
come model and an assignment model. We generate outcomes via a simulation that
seeks to capture the behavior of the average by state/year of the logarithm of wages
for those with positive hours worked in the CPS data as in Bertrand, Duflo, and
Mullainathan (2004). Specifically, we simulate data using the model (10), where
the rows Eiof Ehave a multivariate Gaussian distribution E i ∼  (0, Σ), and we
choose both Land Σ to fit the CPS data as follows. First, we fit a rank four factor
model for L :

(11)
L ≔   arg min
   ∑    (Y  ∗it  − 
  Lit)  2,
L:rank(L)=4 it

where Y  ∗it   denotes the true state/year average of log wage in the CPS data. We then
estimate Σby fitting an AR(2) model to the residuals of Y   ∗it  − 
  Lit. For purpose of
interpretation, we further decompose the systematic component Linto an additive
(fixed effects) term Fand an interactive term M, with
T N
1   ∑ L   +  _
Fit = αi  + βt =  _
(12) 1   ∑ L   −  _
1  ∑  L ,
T l=1 il N j=1 jt NT it it
it = Lit  − Fit .

M

This decomposition of Linto an additive two-way fixed effect component Fand an

interactive component Menables us to study the sensitivity of different estimators
to the presence of different types of systematic effects.
Next we discuss generation of the treatment assignment. Here, we are designing
a “null effect” study, meaning that treatment has no effect on the outcomes and all
methods should estimate zero. However, to make this more challenging, we choose

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4098 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

the treated units so that the assignment mechanism is correlated with the systematic
component L. We set W it = Di 1t>
T0, where D
iis a binary exposure indicator gen-
erated as

i | Ei, αi, Mi ∼ Bernoulli(πi),

(13)
D
exp(ϕα αi  + ϕM Mi)
πi = π(αi, Mi; ϕ) =  _________________
  
      .
1 + exp(ϕα αi  + ϕM Mi)
In particular, the distribution of D imay depend on α
i and M
i; however, Diis inde-
pendent of Ei, i.e., the assignment is strictly exogenous.6 To construct probabilities
{πi}for this assignment model, we choose ϕas the coefficient estimates from a
logistic regression of an observed binary characteristic of the state Dion Miand αi .
We consider three different choices for Di, relating to minimum wage laws, abortion
rights, and gun control laws.7 As a result, we get assignment probability models that
reflect actual differences across states with respect to important economic variables.
In practice the α iand Mithat we construct predict a sizable part of variation in D i,
with R   2varying from 15 percent to 30 percent.

Simulation Results.—Table 2 compares the performance of the four aforemen-

tioned estimators in the simulation design described above. We consider various
choices for the number of treated units and the treatment assignment distribu-
tion. Furthermore, we also consider settings where we drop various components
of the outcome-generating process, such as the fixed effects For the interactive
component M, or set the noise correlation matrix Σ to be diagonal. In the baseline
simulation design _ (the first row of Table
_ 2) these components have the following
______
sizes: ∥F∥F/√ NT = 0.100, and √
NT = 0.992, ∥ M∥F/√ tr(Σ)/T  = 0.098. The
covariance matrix Σis based on an AR(2) process with autoregressive coefficients
(ρ−1, ρ−
2 ) = (0.01, − 0.06).
At a high level, we find that SDID has excellent performance relative to the
benchmarks—both in terms of bias and RMSE. This holds in the baseline simula-
tion design and over a number of other designs where we vary the treatment assign-
ment (from being based on minimum wage laws to gun laws, abortion laws, or
completely random), the outcome (from average of log wages to average hours and
unemployment rate), and the maximal number of treated units (from ten to one) and
the number of exposed periods (from ten to one). We find that when the treatment
assignment is uniformly random, all methods are essentially unbiased, but SDID is
more precise. Meanwhile, when the treatment assignment is not uniformly random,
SDID is particularly successful at mitigating bias while keeping variance in check.
In the second panel of Table 2 we provide some additional insights into the
superior performance of the SDID estimator by sequentially dropping some of
the components of the model that generates the potential outcomes. If we drop
the interactive component Mfrom the outcome model (“No M”), so that the fixed

6
In the simulations below, we restrict the maximal number of treated units (either to ten or one). To achieve
this, we first sample Diindependently and accept the results if the number of treated units satisfies the constraint.
If it does not, then we choose the maximal allowed number of treated units from those selected in the first step
uniformly at random.
7
See the Appendix for details.

This content downloaded from

Table 2

RMSE Bias
SDID SC DID MC DIFP SDID SC DID MC DIFP
1. Baseline 0.28 0.37 0.49 0.35 0.32 0.10 0.20 0.21 0.15 0.07

Outcome model
2. No corr 0.28 0.38 0.49 0.35 0.32 0.10 0.20 0.21 0.15 0.07
3. No M 0.16 0.18 0.14 0.14 0.16 0.01 0.04 0.01 0.01 0.01
3. No F 0.28 0.23 0.49 0.35 0.32 0.10 0.04 0.21 0.15 0.07
4. Only noise 0.16 0.14 0.14 0.14 0.16 0.01 0.01 0.01 0.01 0.01
5. No noise 0.06 0.17 0.47 0.04 0.11 0.05 0.04 0.20 0.00 0.01

Assignment process
6. Gun law 0.26 0.27 0.47 0.36 0.30 0.08 −0.03 0.15 0.15 0.09
7. Abortion 0.23 0.31 0.45 0.31 0.27 0.04 0.16 0.03 0.02 0.01
8. Random 0.24 0.25 0.44 0.31 0.27 0.01 −0.01 0.02 0.01 −0.00

Outcome variable
9. Hours 1.90 2.03 2.06 1.85 1.97 1.12 −0.49 0.85 1.00 1.00
10. U-rate 2.25 2.31 3.91 2.96 2.30 1.77 1.73 3.60 2.63 1.69

Assignment block size

= 1
11. Tpost 0.50 0.59 0.70 0.51 0.54 0.20 0.17 0.38 0.21 0.12
12. Ntr = 1 0.63 0.73 1.26 0.81 0.83 0.03 0.15 0.11 0.05 −0.02
13. Tpost = Ntr = 1 1.12 1.24 1.52 1.07 1.16 0.14 0.24 0.33 0.16 0.11

Notes: Simulation results for CPS data. The baseline case uses state minimum wage laws to simulate treat-
ment assignment, and generates outcomes using the full data-generating process described in Section IIA, with
Tpost = 10posttreatment periods and at most N
tr  = 10treatment states. In subsequent settings, we omit parts of
the data-generating process (rows 2–6), consider different distributions for the treatment exposure variable Di (rows
7–9) and different distributions for the outcome variable (rows 10 and 11), and vary the number of treated cells
(rows 12–14). The full dataset has N = 50, T = 40, and outcomes are normalized to have mean zero and unit vari-
ance. All results are based on 1,000 simulation replications and are multiplied by ten for readability.

effect specification is correct, the DID estimator performs best (alongside MC). In
contrast, if we drop the fixed effects component (“No F”) but keep the interactive
component, the SC estimator does best. If we drop both parts of the systematic com-
ponent, and there is only noise, the superiority of the SDID estimator vanishes and
all estimators are essentially equivalent. On the other hand, if we remove the noise
component so that there is only signal, the increased flexibility of the SDID estima-
tor allows it (alongside MC) to outperform the SC and DID estimators dramatically.
Next, we focus on two designs of interest: one with the assignment probability
model based on parameters estimated in the minimum wage law model and one
where the treatment exposure D iis assigned uniformly at random. Figure 2 shows
the errors of the DID, SC, and SDID estimators in both settings, and reinforces our
observations above. When assignment is not uniformly random, the distribution of
the DID errors is visibly off-center, showing the bias of the estimator. In contrast,
the errors from SDID are nearly centered. Meanwhile, when treatment assignment
is uniformly random, both estimators are centered but the errors of DID are more
spread out. We note that the right panel of Figure 2 is closely related to the simu-
lation specification of Bertrand, Duflo, and Mullainathan (2004). From this per-
spective, Bertrand, Duflo, and Mullainathan (2004) correctly argue that the error
distribution of DID is centered, and that the error scale can accurately be recovered

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4100 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

Minimum wage assignment Random assignment

15 DID
SC
SDID

10
Density

−0.2 −0.1 0 0.1 0.2 −0.2 −0.1 0 0.1 0.2

Error

Figure 2. Distribution of the Errors of SDID, SC, and DID in the Setting of the
“Baseline” (i.e., with Minimum Wage) and Random Assignment Rows of Table 2

using appropriate robust estimators. Here, however, we go further and show that this
noise can be substantially reduced by using an estimator like SDID that can exploit
predictable variation by matching on pre-exposure trends.
Finally, we note that Figure 2 shows that the error distribution of SDID is nearly
unbiased and Gaussian in both designs, thus suggesting that it should be possible
to use τˆ   sdidas the basis for valid inference. We postpone a discussion of confidence
intervals until Section IV, where we consider various strategies for inference based
on SDID and show that they attain good coverage here.

B. Penn World Table Placebo Study

The simulation based on the CPS is a natural benchmark for applications that
traditionally rely on DID-type methods to estimate the policy effects. In contrast,
SC methods are often used in applications where units tend to be more hetero-
geneous and are observed over a longer timespan as in, e.g., Abadie, Diamond,
and Hainmueller (2015). To investigate the behavior of SDID in this type of
setting, we propose a second set of simulations based on the Penn World Table
(Feenstra, Inklaar, and Timmer 2015). This dataset contains observations on
annual real GDP for N = 111countries for T = 48consecutive years, starting
from 1959; we end the dataset in 2007 because we do not want the treatment
period to coincide with the Great Recession. We construct the outcome and the
assignment model following the same procedure outlined in the previous subsec-
tion. We select log(realGDP)as the primary outcome. As with the CPS dataset,
the two-way fixed effects explain most of the variation; however, the interactive
component plays a larger role in determining outcomes for this dataset than for the
CPS data. We again derive treatment assignment via an exposure variable D i, and
consider both a uniformly random distribution for D ias well as two nonuniform
ones based on predicting Penn World Table indicators of democracy and education
respectively.

This content downloaded from

Table 3

RMSE Bias
SDID SC DID MC DIFP SDID SC DID MC DIFP
Democracy 0.31 0.38 1.97 0.58 0.39 −0.05 −0.04 1.75 0.43 −0.07
Education 0.30 0.53 1.72 0.49 0.39 −0.03 0.25 1.62 0.40 −0.05
Random 0.37 0.46 1.29 0.63 0.45 −0.02 −0.11 −0.06 −0.04 −0.04

Notes: Simulation results based on the Penn World Table dataset. We use log(GDP)as the outcome, with N tr = 10
out of N = 111treatment countries, and Tpost
  = 10out of T = 48treatment periods. In the first two rows we con-
sider treatment assignment distributions based on democracy status and education metrics, while in the last row
the treatment is assigned completely at random. All results are based on 1,000 simulations and multiplied by ten
for readability.

Results of the simulation study are presented in Table 3. At a high level, these
results mirror the ones above: SDID again performs well in terms of both bias and
RMSE and across all simulation settings dominates the other estimators. In par-
ticular, SDID is nearly unbiased, which is important for constructing confidence
intervals with accurate coverage rates. The main difference between Tables 2 and 3
is that DID does substantially worse here relative to SC than before. This appears to
be due to the presence of a stronger interactive component in the Penn World Table
dataset, and is in line with the empirical practice of preferring SC over DID in set-
tings of this type. We again defer a discussion of inference to Section IV.

III. Formal Results

In this section we discuss the formal results. For the remainder of the paper, we
assume that the data generating process follows a generalization of the latent factor
model (10),

(14)
Y = L + W ∘ τ + E, where ( W ∘ τ)it = Wit τit .

The model allows for heterogeneity in treatment effects τi t, as in de Chaisemartin
and D’Haultfœuille (2020). As above, we assume block assignment Wit
= 1({i > Nco, t > Tpre})
, where the subscript “ ” stands for control group,
co
“tr” stands for treatment group, “pre” stands for pretreatment, and “post” stands
for posttreatment. It is useful to characterize the systematic component Las a fac-
tor model L = Γ ϒ  ⊤as in (10), where we define factors Γ = UD  1/2and ϒ  ⊤
= D  1/2 V  ⊤in terms of the singular value decomposition L = UDV  ⊤. Our target
estimand is the average treatment effect for the treated units during the periods they
were treated, which under block assignment is
N T
τ =  _
(15) 1     ∑   ∑  τ .
Ntr Tpost
i=Nco+1 t=Tpre+1 it

For notational convenience, we partition the matrix Y as

Yco,pre Yco,post
( Ytr,pre )
Y =   
      ,
Ytr,post

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4102 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

with Yco,pre, an N
co
  × Tprematrix; Y co,post
, an N
co
  × Tpostmatrix; Y
tr,pre
, an N
tr   × Tpre
matrix; and Ytr,post, an Ntr  × Tpostmatrix; and similar for L, W, τ, and E. Throughout
our analysis, we will assume that the errors Ei.are homoskedastic across units (but
not across time), i.e., that var[Ei⋅] = Σ ∈ ℝ  T×Tfor all units i = 1, …, n. We par-
tition Σ as
pre,pre Σpre,post
Σ
(Σpost,pre Σpost,post)
Σ =   
    .

Given this setting, we are interested in guarantees on how accurately SDID can
recover τ .
A simple, intuitively appealing approach to estimating τ in (14) is to directly fit
both L and τvia methods for low-rank matrix estimation, and several variants of this
approach have been proposed in the literature (e.g., Athey et al. 2021, Bai 2009, Xu
2017, Agarwal et al. 2019). However, our main interest is in τ and not in L , and so
one might suspect that approaches that provide consistent estimation of L may rely
on assumptions that are stronger than what is necessary for consistent estimation of
τ.
SC methods address confounding bias without explicitly estimating L in (14).
Instead, they take an indirect approach more akin to balancing as in Zubizarreta
(2015) and Athey, Imbens, and Wager (2018). Recall that the SC weights ω ˆ   sc
seek to balance out the preintervention trends in Y. Qualitatively, one might hope
that doing so also leads us to balance out the unit factors Γ from (10), render-
ing ∑i=Nco +1 ω i  Γi⋅  − ∑i=1  ω i  Γi⋅ ≈ 0
N
ˆ   sc Nco sc
ˆ   . Abadie, Diamond, and Hainmueller
(2010) provide some arguments for why this should be the case, and our for-
mal analysis outlines a further set of conditions under which this type of phe-
nomenon holds. Then, if ωˆ   scin fact succeeds in balancing out the factors in Γ,
the SC estimator can be approximated as τˆ   sc ≈ τ + ∑Ni=1(2Wi  − 1) ωˆ   sc i  ε i with ε i
– –

post ∑ t=Tpre
= T  −1 T
+1εit ; in words, SC weighting has succeeded in removing the bias
associated with the systematic component Land in delivering a nearly unbiased
estimate of τ.
Much like the SC estimator, the SDID estimator seeks to recover τ in (14) by
reweighting to remove the bias associated with L. However, the SDID estimator
takes a two-pronged approach. First, instead of only making use of unit weights ωˆ 
that can be used to balance out Γ , the estimator also incorporates time weights λ ˆ  that
seek to balance out ϒ. This provides a type of double robustness property, whereby
if one of the balancing approaches is effective, the dependence on Lis approxi-
mately removed. Second, the use of t wo-way fixed effects in (1) and intercept terms
in (4) and (6) makes the SDID estimator invariant to additive shocks to any row or
column; i.e., if we modify L it ← Lit  + αi  + βtfor any choices α
iand β
tthe esti-
mator τˆ   sdidremains unchanged. The estimator shares this invariance property with
DID (but not SC).8
The goal of our formal analysis is to understand how and when the SDID weights
succeed in removing the bias due to L . As discussed below, this requires assumptions

8
More specifically, as suggested by (3), SC is invariant to shifts in β
tbut not α
i. In this context, we also note
that the DIFP estimator proposed by Doudchenko and Imbens (2016) and Ferman and Pinto (2019) that center each
unit’s trajectory before applying the SC method is also invariant to shifts in αi.

This content downloaded from

on the signal to noise ratio. The assumptions require that Edoes not incorporate too
much serial correlation within units, so that we can attribute persistent patterns in
Yto patterns in L ; furthermore, Γshould be stable over time, particularly through
the treatment periods. Of course, these are nontrivial assumptions. However, as dis-
cussed further in Section V, they are considerably weaker than what is required in
results of Bai (2009) or Moon and Weidner (2015, 2017) for methods that require
explicitly estimating L in (14). Furthermore, these assumption are aligned with
standard practice in the literature; for example, we can assess the claim that we
balance all components of Γby examining the extent to which the method succeeds
in balancing preintervention periods. Historical context may be needed to justify the
assumption that there were no other shocks disproportionately affecting the treat-
ment units at the time of the treatment.

A. Weighted D
ouble-Differencing Estimators

We introduced the SDID estimator (1) as the solution to a weighted two-way

fixed effects regression. For the purpose of our formal results, however, it is conve-
nient to work with the alternative characterization described in equation (16). For
any weights ω ∈ Ωand λ ∈ Λ, we can define a weighted double-differencing
estimator 9

τˆ (ω, λ) =
(16) ω  ⊤
tr  Ytr,postλpost  − ω  co    Yco,postλpost  − ω  tr  Ytr,preλpre+ ω  co  Yco,pre λpre .
⊤ ⊤ ⊤

One can verify that the basic DID estimator is of the form (16), with constant
weights ωtr = 1 / Ntr , etc. The proposed SDID estimator (1) can also be written as
(16), but now with weights ωˆ   sdid and λˆ   sdid solving (4) and (6) respectively. When
there is no risk of ambiguity, we will omit the SDID superscript from the weights
and simply write ωˆ and λˆ .
Now, note that for any choice of weights ω ∈ Ωand λ ∈ Λ, we have ωtr ∈
ℝ  Ntrand λpost ∈ ℝ  Tpostwith all elements equal to 1 / Ntr and 1 / Tpost respectively,
and so ω tr  τtr,post λpost = τ
  ⊤ . Thus, we can decompose the error of any weighted
double-differencing estimator with weights satisfying these conditions as the sum
of a bias and a noise component:

τˆ (ω, λ)  − τ
(17)

      
= ω  tr  Ltr,post
⊤
λpost
  − ω  ⊤
co    Lco,post
λ    
  − ω  ⊤
post tr  Ltr,pre
λpre
  + ω  ⊤
co    Lco,pre
λpre


bias B(ω,λ)

+       
ω  ⊤
tr  Etr,post
λpost
  − ω  ⊤
co    Eco,post
λ    
  − ω ⊤
post tr  Etr,pre
λpre
  + ω  ⊤
co    Eco,pre
λpre
.

noise ε(ω,λ)

9
This weighted double-differencing structure plays a key role in understanding the behavior of SDID. As dis-
cussed further in Section V, despite relying on a different motivation, certain specifications of the recently proposed
“augmented synthetic control” method of B en-Michael, Feller, and Rothstein (2018) also result in a weighted
double-differencing estimator.

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4104 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

In order to characterize the distribution of τˆ   sdid − τ, it thus remains to carry out two
tasks. First, we need to understand the scale of the errors B (ω, λ)and ε (ω, λ), and
second, we need to understand how data adaptivity of the weights ωˆ and λˆ  affects
the situation.

B. Oracle and Adaptive Synthetic Control Weights

To address the adaptivity of the SDID weights ωˆ and λˆ chosen via (4) and (6),
we construct alternative “oracle” weights that have similar properties to ω ˆ and λˆ  in
terms of eliminating bias due to L , but are deterministic. We can then further decom-
pose the error of τ into the error of a weighted double-differencing estimator with
ˆ   sdid

the oracle weights and the difference between the oracle and feasible estimators.
Under appropriate conditions, we find the latter term negligible relative to the error
of the oracle estimator, opening the door to a simple asymptotic characterization of
the error distribution of τˆ   sdid.
We define such oracle weights ω ̃ by minimizing the expectation of the
̃ and λ
objective functions ℓunit( ⋅ )and ℓt ime( ⋅ )used in (4) and (6) respectively, and set

(ω ̃ 0, ω̃ ) =   arg min

(18) (λ̃ 0, λ̃ ) =   arg min
  E[ℓu nit(ω0, ω)],   E[ℓt ime(λ0, λ)].
ω0∈ℝ,ω∈Ω λ0∈ℝ,λ∈Λ

In the case of our model (14) these weights admit a simplified characterization

(ω̃ 0, ω̃ ) =   arg min

(19)   ||ω0  + ω  ⊤
co    Lco,pre  − ω  tr  Ltr,pre||  2 
⊤ 2
ω0∈ℝ,ω∈Ω

+ (tr(Σpre,pre) + ζ  2 Tpre) ||ω||  22   ,

( λ̃ 0, λ̃ ) =   arg min

(20) 0  + Lco,pre λpre  − Lco,post λpost||  22  + ||Σ̃ λ||  22   ,
  ||λ
λ0∈ℝ,λ∈Λ

where
Σpre,pre − Σpre,post
(− Σpost,pre Σ post,post )
Σ̃  =   
        .

The error of the SDID estimator can now be decomposed as follows,

(21)
τˆ   sdid  − τ =   , λ̃ )+  B
ε(ω̃       , λ̃ )  τ
(ω̃  +  ˆ (  
   τˆ (ω̃ , λ̃ ) ,
ωˆ , λˆ ) − 
 
⏟
oracle noise oracle ⏟
confounding bias deviation from oracle

and our task is to characterize all three terms.

First, the oracle noise term tends to be small when the weights are not
too concentrated, i.e., when ∥ ω̃  ∥2and ∥ λ̃  ∥2are small, and we have a suf-
ficient number of exposed units and time periods. In the case with Σ
= , i.e., without any c ross-observation correlations, we note that v ar[ε(ω̃ , λ̃ )]
σ  2 IT×T
= σ  2(N  −1 ∥ λ̃  ∥  22  ). When we move to our asymptotic analysis
tr    + ∥ ω̃  ∥  2   )(T  post +
2 −1

below, we work under assumptions that make this oracle noise term dominant rela-
tive to the other error terms in (21).

This content downloaded from

Second, the oracle confounding bias will be small either when the pre-exposure ora-
cle row regression fits well and generalizes to the exposed rows, i.e., ω̃ 0  + ω̃   ⊤ co    Lco,pre

≈ ω̃   tr  Ltr,pre
⊤
and ω ̃ 0  + ω̃   co    Lco,post
⊤
≈ ω̃   tr  Ltr,post
⊤
, or when the unexposed oracle col-
umn regression fits well and generalizes to the exposed columns, λ̃ 0  + Lco,pre λ̃ pre
≈ Lco,post λ̃ postand λ ̃ 0  + Ltr,pre λ̃ pre ≈ Ltr,post λ̃ post. Moreover, even if neither model
generalizes sufficiently well on its own, it suffices for one model to predict the gen-
eralization error of the other:

B( ω, λ) =
(ω  tr  Ltr,post
⊤
  − ω  ⊤
co    Lco,post
) λpost
  − (ω  ⊤
tr  Ltr,pre
  − ω  ⊤ ) λpre
co    Lco,pre

tr  (
= ω  ⊤
Ltr,post λpost  − Ltr,pre λpre) − ω  ⊤
co (
  Lco,post λpost  − Lco,pre λpre).

The upshot is even if one of the sets of weights fails to remove the bias from the
presence of L, the combination of weights ω̃ and λ̃ can compensate for such failures.
This double robustness property is similar to that of the augmented inverse proba-
bility weighting estimator, whereby one can trade off between accurate estimates of
the outcome and treatment assignment models (Ben-Michael, Feller, and Rothstein
2018; Scharfstein, Rotnitzky, and Robins 1999).
We note that although poor fit in the oracle regressions on the unexposed rows
and columns of L will often be indicated by a poor fit in the realized regressions
on the unexposed rows and columns of Y , the assumption that one of these regres-
sions generalizes to exposed rows or columns is an identification assumption with-
out clear testable implications. It is essentially an assumption of no unexplained
confounding: any exceptional behavior of the exposed observations, whether due to
exposure or not, can be ascribed to it.
Third, our core theoretical claim, formalized in our asymptotic analysis, is
that the SDID estimator will be close to the oracle when the oracle unit and time
weights look promising on their respective training sets, i.e, when ω ̃ 0  + ω̃   ⊤
co    Lco,pre
≈ ω̃   tr  Ltr,preand ∥ ω̃  ∥2is not too large and λ̃ 0  + Lco,pre λ̃ pre ≈ Lco,post λ̃ post and
⊤

∥ λ̃  ∥2is not too large. Although the details differ, as described above these qualita-
tive properties are also criteria for accuracy of the oracle estimator itself.
Finally, we comment briefly on the behavior of the oracle time weights λ̃ in the
presence of autocorrelation over time. When Σis not diagonal, the effective regular-
ization term in (20) does not shrink λ ̃ pretowards zero, but rather toward an autore-
gression vector

(22)  
ψ = arg min
T
Σ̃ ( λ

v
)
post =
Σ−1
v∈ℝ 
∥
  pre,pre Σpre,post

pre
∥
λpost
.

Here λpost
is the T
post
-component column vector with all elements equal to 1 / Tpost
and ψ is the population regression coefficient in a regression of the average of the
posttreatment errors on the pretreatment errors. In the absence of autocorrelation,
ψis zero, but when autocorrelation is present, shrinkage toward ψreduces the vari-
ance of the SDID estimator—and enables us to gain precision over the basic DID
estimator (2) even when the t wo-way fixed effects model is correctly specified. This
explains some of the behavior noted in the simulations.

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4106 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

C. Asymptotic Properties

To carry out the analysis plan sketched above, we need to embed our problem into
an asymptotic setting. First, we require the error matrix Eto satisfy some regularity
properties.

ASSUMPTION 1 (Properties of Errors): The rows 𝐄 iof the noise matrix are inde-
pendent and identically distributed Gaussian vectors and the eigenvalues of its
covariance matrix Σare bounded and bounded away from zero.

Next, we spell out assumptions about the sample size. At a high level, we want
the panel to be large (i.e., N, T → ∞), and for the number of treated cells of the
panel to grow to infinity but slower than the total panel size. We note in particular
that we can accommodate sequences where one of Tpostor Ntris fixed, but not both.

ASSUMPTION 2 (Sample Sizes): We consider a sequence of populations where

(i ) the product Ntr Tpost goes to infinity, and both Nco and Tpre go to infinity,

(ii ) the ratio Tpre / Nco is bounded and bounded away from zero,

co / (Ntr Tpost max(Ntr, Tpost)log  2(Nco)) → ∞.

(iii ) N

We also need to make assumptions about the spectrum of L ; in particular, L

cannot have too many large singular values, although we allow for the possibility
of many small singular values. A sufficient, but not necessary, __________ condition for the
assumption below is that the rank of Lis less than √min(    Tpre, Nco) . Notice that
we do not assume any lower bounds for nonzero singular values of L; in fact can
accommodate arbitrarily many nonzero but very small singular values, much like,
e.g., Belloni, Chernozhukov, and Hansen (2014) can accommodate arbitrarily many
nonzero but very small signal coefficients in a high-dimensional inference prob-
__________
lem. We need the √    Tpre, Nco) t h singular value of L
min( co,preto be sufficiently small.
Formally, we have the following result.

ASSUMPTION 3 (Properties of L) : Letting σ 1(Γ), σ2(Γ), …denote the singu-

values of the matrix Γ in decreasing order and R
lar__________ the largest integer less than
√min(
   , Nco
Tpre )  ,

σR( Lco,pre
(23) ) / R = o(min{N  −1/2
tr  log  −1/2(Nco
), T  −1/2
post log 
−1/2
(Tpre)}).

The last—and potentially most interesting—of our assumptions concerns the

relation between the factor structure L and the assignment mechanism W . At a high
level, it plays the role of an identifying assumption, and guarantees that the oracle
weights from (19) and (20) that are directly defined in terms of Lare able to ade-
quately cancel out Lvia the weighted double-differencing strategy. This requires
that the optimization problems (19) and (20) accommodate reasonably dispersed

This content downloaded from

weights, and that the treated units and after periods not be too dissimilar from the
control units and the before periods respectively.

ASSUMPTION 4 (Properties of Weights and L

): The oracle unit weights ω̃  satisfy

∥ ω ̃ co ∥2 = o([(Ntr Tpost )

−1/2
(24) )log(Nco
)] 

and

∥ ω ̃ 0  + ω̃   ⊤

co    Lco,pre
  − ω̃   ⊤
tr  Ltr,pre
∥
2

= o(N  1/4 log  −1/2(Nco)),

−1/4
co  (Ntr Tpost max(Nco, Tpost)) 

the oracle time weights λ̃  satisfy

∥ λ̃ pre  − ψ ∥ = o([(Ntr Tpost)log(Nco)]  )

−1/2
(25)
2

and

∥ λ̃ 0  + Lco,pre λ̃ pre  − Lco,post λ̃ post ∥ = o(N  1/4

co  (Ntr Tpost) 
−1/8
),
2

and the oracle weights jointly satisfy

(26) ̃ ̃ ̃ ̃
tr  Ltr,post λ post  − ω  ̃ co    Lco,post λ post  − ω̃   tr  Ltr,pre λ pre  + ω̃   co    Lco,pre λ pre
ω̃   ⊤ ⊤ ⊤ ⊤

= o((Ntr Tpost)  −1/2).

Assumptions 1 –4 are substantially weaker than those used to establish asymp-

totic normality of comparable methods.10 We do not require that double differ-
encing alone removes the individual and time effects as the DID assumptions do.
Furthermore, we do not require that unit comparisons alone are sufficient to remove
the biases in comparisons between treated and control units as the SC assumptions
do. Finally, we do not require a low rank factor model to be correctly specified, as is
often assumed in the analysis of methods that estimate L explicitly (e.g., Bai 2009,
Moon and Weidner 2015, 2017). Rather, we only need the combination of the three
bias-reducing components in the SDID estimator, (i ) double differencing, (ii ) the
unit weights, and (iii ) the time weights, to reduce the bias to a sufficiently small
level.
Our main formal result states that under these assumptions, our estima-
tor is asymptotically normal. Furthermore, its asymptotic variance is optimal,

10
In particular, note that our assumptions are satisfied in the w ell-specified two-way fixed effect setting model.
Suppose we have Lit = α i+ βtwith uncorrelated and homoskedastic errors, and that the sample size restrictions
in Assumption 2 are satisfied. Then Assumption 1 is automatically satisfied, and the rank condition on L from
Assumption 3 _ is satisfied with R = 2. Next, we see that the oracle unit weights satisfy ω̃ co,i = 1_ / Nco so that
∥ ω̃ ∥2 = 1 / √ N co, and the oracle time weights satisfy λ̃ pre,i = 1 / Tpreso that ∥ λ ̃ − ψ ∥2 = 1 / √
N co. Thus if
the restrictions on the rates at which the sample sizes increase in Assumption 2 are satisfied, then (24) and (25)
are satisfied. Finally, the additive structure of L implies that, as long as the weights for the controls sum to one,
ω̃ ⊤ λ̃ post
trLtr,post − ω̃ ⊤ coL
co,post
λ̃ post
= 0, and ω̃ ⊤ ̃ + ω̃ ⊤
trLtr,preλpre λ̃ pre = 0, so that (26) is satisfied.
coLco,pre

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4108 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

coinciding with the variance we would get if we knew Land Σa priori and
could therefore estimate τby a simple average of τ itplus unpredictable noise,
tr  ∑ i=Nco+1[T  post ∑ t=Tpre+1(τit
N  −1 N −1 T
  + εit) − Ei,pre
ψ].

THEOREM 1: Under the model (14) with 𝐋and 𝐖taken as fixed, suppose that we run
the SDID estimator (1) with regularization parameter ζ satisfying (Ntr Tpost
)  1/2log(Nco
)
= o(ζ  ). Suppose moreover that Assumptions 1–4 hold. Then,
2

Ntr i=Nco +1( Tpost t=Tpre+1 it )

N T
_
τˆ   sdid  − τ =  
(27) 1     ∑ _   1     ∑ ε   − Ei,pre ψ + op((Ntr Tpost
)  −1/2),

and consequently

(τˆ   sdid  − τ) / V  1/2

(28) τ  ⇒  (
0, 1),

where

Ntr [ Tpost t=Tpre+1 it ]

T
Vτ =  _   1     ∑ ε   − Ei,pre ψ .
1   var _

Here Vτ is on the order of 1 / (Ntr Tpost), i.e., Ntr Tpost Vτ is bounded and bounded away
from zero.

IV. Large-Sample Inference

The asymptotic result from the previous section can be used to motivate practical
methods for large-sample inference using SDID. Under appropriate conditions, the
estimator is asymptotically normal and z ero centered; thus, if these conditions hold
and we have a consistent estimator for its asymptotic variance Vτ, we can use con-
ventional confidence intervals
__
τ ∈ τˆ   sdid  ± zα /2 √Vˆ τ 
(29)

to conduct asymptotically valid inference. In this section, we discuss three approaches

to variance estimation for use in confidence intervals of this type.
The first proposal we consider, described in detail in Algorithm 2, involves a clus-
tered bootstrap (Efron 1979) where we independently resample units. As argued in
Bertrand, Duflo, and Mullainathan (2004), u nit-level bootstrapping presents a nat-
ural approach to inference with panel data when repeated observations of the same
unit may be correlated with each other. The bootstrap is simple to implement and,
in our experiments, appears to yield robust performance in large panels. The main
downside of the bootstrap is that it may be computationally costly as it involves run-
ning the full SDID algorithm for each bootstrap replication, and for large datasets
this can be prohibitively expensive.
To address this issue we next consider an approach to inference that is more
closely tailored to the SDID method and only involves running the full SDID algo-
rithm once, thus dramatically decreasing the computational burden. Given weights ωˆ 
and λˆ used to get the SDID point estimate, Algorithm 3 applies the jackknife (Miller

This content downloaded from

Algorithm 2—Bootstrap Variance Estimation

Data: Y, W, B
Result: Variance estimator Vˆ   cb
τ 
1. for i ← 1 to B do
2. Construct a bootstrap dataset (Y  (b), W  (b))by sampling N rows of
3. (Y, W) with replacement.
4. if the bootstrap sample has no treated units or no control units then
5. Discard and resample (go to 2)
6. end
7. Compute the SDID estimator τ  (b)based on (Y  (b), W  (b))
8. end
9. Define Vˆ   bτ  =  _
1
B
  (τˆ   (b)  −  _
  ∑Bb=1 1
B
   ∑   τˆ   (b))  2 ;
Bb=1

Algorithm 3—Jackknife Variance Estimation

Data: ωˆ , λˆ , Y, W, τˆ 

Result: Variance estimator Vˆ τ
1. for i ← 1 to N do
2. Compute τˆ   (−i)  : arg minτ,{αj,βt}j≠i,t
 ∑ j≠i,t (Yjt  − αj  − βt  − τ Wit)  ω
2
ˆ j λˆ t
3. end
4. τ  = (N − 1) N
Compute Vˆ   jack   −1 ∑Ni=1 (τˆ   (−i)  − τˆ )  2;

1974) to the weighted SDID regression (1), with the weights treated as fixed. The
validity of this procedure is not implied directly by asymptotic linearity as in (27);
however, as shown below, we still recover conservative confidence intervals under
considerable generality.

THEOREM 2: Suppose that the elements of 𝐋 are bounded. Then, under the condi-
tions of Theorem 1, the jackknife variance estimator described in Algorithm 3 yields
conservative confidence intervals, i.e., for any 0 < α < 1,
____
(30) lim inf Pr[τ ∈
τˆ   sdid ± zα/2 ˆ
√V  τ   ] ≥ 1 − α.
jack

Moreover, if the treatment effects τit = τare constant11 and

Tpost N  −1
(31) tr  ‖λ 0  + Ltr,pre λ pre  − Ltr,post λ post‖  2   →p  0,
ˆ ˆ ˆ 2

that is, the time weights λ

ˆ  are predictive enough on the exposed units, then the jack-
knife yields exact confidence intervals and (30) holds with equality.

11
When treatment effects are heterogeneous, the jackknife implicitly treats the estimand (15) as random
whereas we treat it as fixed, thus resulting in excess estimated variance; see Imbens (2004) for further discussion.

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4110 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

Algorithm 4—Placebo Variance Estimation

Data: Yco,⋅ , Ntr , B
Result: Variance estimator Vˆ   placebo
τ 
1. for b ← 1 to B do
2. Sample Ntrout of the Ncocontrol units without replacement to ‘receive the placebo’;
3. Construct a placebo treatment matrix W  (b) co,⋅ for the controls;
4. Compute the SDID estimator τˆ   (b)based on (Yco,⋅ co,⋅ );
, W  (b)
5. end
6. Define Vˆ   placebo
τ  =  _
1
B
  (τˆ   (b)  −  _
  ∑Bb=1 1
B
   ∑   τˆ   (b))  2 ;
Bb=1

In other words, we find that the jackknife is in general conservative and is exact
when treated and control units are similar enough that time weights that fit the con-
trol units generalize to the treated units. This result depends on specific structure of
the SDID estimator, and does not hold for related methods such as the SC estimator.
In particular, an analogue to Algorithm 3 for SC would be severely biased upwards,
and would not be exact even in the w ell-specified fixed effects model. Thus, we do
not recommend (or report results for) this type of jackknifing with the SC estimator.
We do report results for jackknifing DID since, in this case, there are no random
weights ωˆ or λˆ and so our jackknife just amounts to the regular jackknife.
Now, both the bootstrap- and jackknife-based methods discussed so far are
designed with the setting of Theorem 1 in mind, i.e., for large panels with many
treated units. These methods may be less reliable when the number of treated units
Ntris small, and the jackknife is not even defined when Ntr = 1. However, many
applications of SCs have N tr = 1, e.g., the California smoking application from
Section I. To this end, we consider a third variance estimator that is motivated by
placebo evaluations as often considered in the literature on SCs (Abadie, Diamond,
and Hainmueller 2010,;2015), and that can be applied with N tr = 1. The main idea
of such placebo evaluations is to consider the behavior of SC estimation when we
replace the unit that was exposed to the treatment with different units that were not
exposed.12 Algorithm 4 builds on this idea, and uses placebo predictions using only
the unexposed units to estimate the noise level, and then uses it to get Vˆ τand build
confidence intervals as in (29). See Bottmer et al. (2021) for a discussion of the
properties of such placebo variance estimators in small samples.
Validity of the placebo approach relies fundamentally on homoskedasticity across
units, because if the exposed and unexposed units have different noise distributions
then there is no way we can learn Vτfrom unexposed units alone. We also note
that n onparametric variance estimation for treatment effect estimators is in gen-
eral impossible if we only have one treated unit, and so homoskedasticity across
units is effectively a necessary assumption in order for inference to be possible

12
Such a placebo test is closely connected to permutation tests in randomization inference; however, in many
SC applications, the exposed unit was not chosen at random, in which case placebo tests do not have the formal
properties of randomization tests (Firpo and Possebom 2018, Hahn and Shi 2016), and so may need to be inter-
preted via a more qualitative lens.

This content downloaded from

Table 4

Bootstrap Jackknife Placebo

SDID SC DID SDID SC DID SDID SC DID
1. Baseline 0.96 0.93 0.89 0.93 — 0.92 0.95 0.88 0.96
2. Gun law 0.97 0.96 0.92 0.94 — 0.93 0.94 0.95 0.93
3. Abortion 0.96 0.94 0.93 0.93 — 0.95 0.97 0.91 0.96
4. Random 0.96 0.96 0.92 0.93 — 0.94 0.96 0.96 0.94
5. Hours 0.92 0.96 0.94 0.89 — 0.95 0.91 0.90 0.96
6. Urate 0.78 0.74 0.38 0.71 — 0.42 0.74 0.77 0.41
7. Tpost
= 1 0.93 0.94 0.84 0.92 — 0.88 0.92 0.90 0.92
8. Ntr = 1 — — — — — — 0.97 0.95 0.96
9. Tpost
= Ntr = 1 — — — — — — 0.96 0.94 0.94
10. Resample, N = 200 0.94 0.96 0.92 0.95 — 0.93 0.96 0.95 0.94
11. Resample, N = 400 0.95 0.91 0.96 0.96 — 0.95 0.96 0.90 0.96
12. Democracy 0.93 0.96 0.55 0.94 — 0.59 0.98 0.97 0.79
13. Education 0.95 0.95 0.30 0.95 — 0.34 0.99 0.90 0.94
14. Random 0.93 0.95 0.89 0.96 — 0.91 0.95 0.94 0.91

Notes: Coverage results for nominal 95 percent confidence intervals in the CPS and Penn World Table simulation
setting from Tables 2 and 3. The first three columns show coverage of confidence intervals obtained via the clus-
tered bootstrap. The second set of columns show coverage from the jackknife method. The last set of columns show
coverage from the placebo method. Unless otherwise specified, all settings have N = 50and T = 40cells, of which
at most N tr = 10units and T
post = 10periods are treated. In rows 7–9, we reduce the number of treated cells. In
rows 10 and 11, we artificially make the panel larger by adding rows, which makes the assumption that the number
of treated units is small relative to the number of control units more accurate. (We set N trto 10percents of the total
number of units.) We do not report jackknife and bootstrap coverage rates for N tr  = 1because the estimators are
not well-defined. We do not report jackknife coverage rates for SC because, as discussed in the text, the variance
estimator is not well justified in this case. All results are based on 400 simulation replications.

here.13 Algorithm 4 can also be seen as an adaptation of the method of Conley

and Taber (2011) for inference in DID models with few treated units and assuming
homoskedasticity, in that both rely on the empirical distribution of residuals for
placebo-estimators run on control units to conduct inference. We refer to Conley and
Taber (2011) for a detailed analysis of this class of algorithms.
Table 4 shows the coverage rates for the experiments described in Section IIA
and IIB, using Gaussian confidence intervals (29) with variance estimates obtained
as described above. In the case of the SDID estimation, the bootstrap estimator
performs particularly well, yielding nearly nominal 95 percent coverage, while both
placebo and jackknife variance estimates also deliver results that are close to the
nominal 95 percent level. This is encouraging, and aligned with our previous obser-
vation that the SDID estimator appeared to have low bias. That being said, when
assessing the performance of the placebo estimator, recall that the data in Section IIA
was generated with noise that is both Gaussian and homoskedastic across units—
which were assumptions that are both heavily used by the placebo estimator.
In contrast, we see that coverage rates for DID and SC can be relatively low, espe-
cially in cases with significant bias such as the setting with the state unemployment
rate as the outcome. This is again in line with what one may have expected based
on the distribution of the errors of each estimator as discussed in Section IIA, e.g., in

13
In Theorem 1, we also assumed homoskedasticity. In contrast to the case of placebo inference, however,
it’s likely that a similar result would also hold without homoskedasticity; homoskedasticity is used in the proof
essentially only to simplify notation and allow the use of concentration inequalities which have been proven in the
homoskedastic case but can be generalized.

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4112 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

Figure 2: If the point estimates τˆ from DID and SC are dominated by bias, then we
should not expect confidence intervals that only focus on variance to achieve coverage.

V. Related Work

Methodologically, our work draws most directly from the literature on SC meth-
ods, including Abadie and Gardeazabal (2003); Abadie, Diamond, and Hainmueller
(2010, 2015); Abadie and L’Hour (2016); Doudchenko and Imbens (2016); and
Ben-Michael, Feller, and Rothstein (2018). Most methods in this line of work can
be thought of as focusing on constructing unit weights that create comparable (bal-
anced) treated and control units, without relying on any modeling or weighting
across time. Ben-Michael, Feller, and Rothstein (2018) is an interesting exception.
Their augmented SC estimator, motivated by the augmented inverse-propensity
weighted estimator of Robins, Rotnitzky, and Zhao (1994), combines SC weights
with a regression adjustment for improved accuracy. (See also Kellogg et al. 2020
which explicitly connects SC to matching). They focus on the case of Ntr = 1
exposed units and Tpost = 1postexposure periods, and their method involves fitting
a model for the conditional expectation m ( ⋅ )for YiTin terms of the lagged outcomes
Yi,pre, and then using this fitted model to “augment” the basic SC estimator as
follows:

( i=1 ( ))
N−1 N−1
τˆ asc = YN T  −    ∑ωˆ   sc
(32) ˆ (YN ,pre)  −   ∑ωˆ   sc
i  YiT  +  m ˆ (Yi,pre) .
i  m
i=1

Despite their different motivations, the augmented SC and SDID methods share an
interesting connection: with a linear model m ( ⋅ ), τ ˆ sdidand τˆ ascare very similar. In
fact, had we fit ω ˆ   sdid
without intercept, they would be equivalent for m ˆ ( ⋅ )fit by least
squares on the controls, imposing the constraint that its coefficients are nonnegative
and to sum to one, that is, for m ˆ (Yi,pre) = λˆ   sdid
0  + Yi,pre λ   pre. This connection sug-
ˆ sdid
gests that weighted two-way b ias-removal methods are a natural way of working
with panels where we want to move beyond simple DID approaches.
We also note recent work of Roth (2018) and Rambachan and Roth (2019), who
focus on valid inference in DID settings when users look at past outcomes to check
for parallel trends. Our approach uses past data not only to check whether the trends
are parallel, but also to construct the weights to make them parallel. In this setting,
we show that one can still conduct valid inference, as long as Nand Tare large
enough and the size of the treatment block is small.
In terms of our formal results, our paper fits broadly in the literature on panel
models with interactive fixed effects and the matrix completion literature (Athey
et al. 2021; Bai 2009; Moon and Weidner 2015, 2017; Robins 1985; Xu 2017).
Different types of problems of this form have a long tradition in the economet-
rics literature, with early results going back to Ahn, Lee, and Schmidt (2001);
Chamberlain (1992); and Holtz-Eakin, Newey, and Rosen (1988) in the case of
finite-horizon panels (i.e., in our notation, under asymptotics where T is fixed and only
N → ∞). More recently, Freyberger (2018) extended the work of Chamberlain
(1992) to a setting that’s closely related to ours, and emphasized the role of
the past outcomes for constructing moment restrictions in the fixed-T setting.

This content downloaded from

Freyberger (2018) attains identification by assuming that the errors Eitare uncor-
related, and thus past outcomes act as valid instruments. In contrast, we allow for
correlated errors within rows, and thus need to work in a large-T setting.
Recently, there has considerable interest in models of type (10) under asymptotics
where both N and Tget large. One popular approach, studied by Bai (2009) and Moon
and Weidner (2015, 2017), involves fitting (10) by “least squares,” i.e., by minimizing
squared-error loss while constraining L ˆ to have bounded rank R. While these results
do allow valid inference for τ, they require strong assumptions. First, they require the
to be known a priori (or, in the case of Moon and Weidner 2015, require
rank of L
a known upper bound for its rank), and second, they require a βm in-type condition
whereby the normalized n onzero singular values of L are well separated from zero. In
contrast, our results require no explicit limit on the rank of L and allow for Lto have
to have positive singular values that are arbitrarily close to zero, thus suggesting that
the SDID method may be more robust than the least squares method in cases where
the analyst wishes to be as agnostic as possible regarding properties of L.14
Athey et al. (2021); Amjad, Shah, and Shen (2018); Moon and Weidner (2018;,
and Xu (2017) build on this line of work, and replace the fi xed-rank constraint with
data-driven regularization on L ˆ . This innovation is very helpful from a computa-
tional perspective; however, results for inference about τthat go beyond what was
available for least squares estimators are currently not available. We also note recent
papers that draw from these ideas in connection to SC type analyses, including Chan
and Kwok (2020) and Gobillon and Magnac (2016). Finally, in a paper contem-
poraneous to ours, Agarwal et al. (2019) provide improved bounds from principal
component regression in an errors-in-variables model closely related to our setting,
and discuss implications for estimation in SC type problems. Relative to our results,
however, Agarwal et al. (2019) still require assumptions on the behavior of the small
singular values of L , and do not provide methods for inference about τ.
In another direction, several authors have recently proposed various methods that
implicitly control for the systematic component Lin models of time (10). In one
early example, Hsiao, Ching, and Ki Wan (2012) start with a factor model similar
to ours and show that under certain assumptions it implies the moment condition
N−1
YNt = a +  ∑ βj Yjt  + ϵNt, E εN t | {Yjt}  N−1 = 0,
[ ]
(33)
j=1 j=1

for all t = 1, …, T. The authors then estimate βj by (weighted) ordinary least
squares. This approach is further refined by Li and Bell (2017), who additionally
propose to penalizing the coefficients βj using the lasso (Tibshirani 1996). In a
recent paper, Chernozhukov, Wüthrich, and Zhu (2018) use the model (33) as a
starting point for inference.
While this line of work shares a conceptual connection with us, the formal setting
is very different. In order to derive a representation of the type (33), one essentially
needs to assume a random specification for (10) where both Land Eare stationary

14
By analogy, we also note that, in the literature on high-dimensional inference, methods that do no assume
a uniform lower bound on the strength of n onzero coefficients of the signal vector are generally considered more
robust than ones that do (e.g., Belloni, Chernozhukov, and Hansen 2014; Zhang and Zhang 2014).

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4114 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

in time. Li and Bell (2017) explicitly assumes that the outcomes Ythemselves are
weakly stationary, while Chernozhukov, Wüthrich, and Zhu (2018) makes the same
assumption to derive the results that are valid under general misspecification. In our
results, we do not assume stationarity anywhere: Lis taken as deterministic and the
errors Emay be nonstationary. Moreover, in the case of most SC and DID analyses,
we believe stationarity to be a fairly restrictive assumption. In particular, in our model,
stationarity would imply that a simple p re-post comparison for exposed units would
be an unbiased estimator of τand, as a result, the only purpose of the unexposed units
would be to help improve efficiency. In contrast, in our analysis, using unexposed
units for double differencing is crucial for identification.
Ferman and Pinto (2019) analyze the performance of SC estimator using essen-
tially the same model as we do. They focus on the situations where N is small, while
Tpre (the number of control periods) is growing. They show that unless time factors
have strong trends (e.g., polynomial) the SC estimator is asymptotically biased.
Importantly Ferman and Pinto (2019) focus on the standard SC estimator, without
time weights and regularization, but with an intercept in the construction of the
weights.
Finally, from a statistical perspective, our approach bears some similarity to the
work on “balancing” methods for program evaluation under unconfoundedness,
including Athey, Imbens, and Wager (2018); Graham, Pinto, and Egel (2012);
Hirshberg and Wager (2017); Imai and Ratkovic (2014); Kallus (2020); Zhao
(2019); and Zubizarreta (2015). One major result of this line of work is that, by
algorithmically finding weights that balance observed covariates across treated and
control observations, we can derive robust estimators with good asymptotic prop-
erties (such as efficiency). In contrast to this line of work, rather than balancing
observed covariates, we here need to balance unobserved factors Γand ϒ in (10) to
achieve consistency; and accounting for this forces us to follow a different formal
approach than existing studies using balancing methods.

Appendix. Staggered Adoption

In the paper so far we have focused on the case where some units start receiving the
treatment at a common point in time, what Athey et al. (2021) call block assignment.

⎜ ⎟
Under block assignment the N × Tmatrix of treatment assignments W has the form
like the following matrix, where units 3 –6 all adopt the treatment in period 5:

1 2 3 4 5 6 7 ⎞⎛
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
W =  3   
    0  0  0  0  1  1  1    .
4 0 0 0 0 1 1 1
5 0 0 0 0 1 1 1
⎝ 6 0 0 0 0 1 1 1 ⎠

This is a common setting, but there are other settings that are of interest. Another
important special case is that of staggered adoption (e.g.,Athey and Imbens 2021)
with multiple dates at which the treatment is started. For example, in the following

This content downloaded from

⎜ ⎟
assignment matrix units 5 and 6 adopt the treatment in period 3, and units 3 and 4
adopt the treatment in period 5 (and units 1 and 2 never adopt the treatment):

1 2 3 4 5 6 7 ⎞ ⎛
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
W =  3   
    0  0  0  0  1  1  1    .
4 0 0 0 0 1 1 1
5 0 0 1 1 1 1 1
⎝ 6 0 0 1 1 1 1 1 ⎠

With staggered adoption the weighted DID regression approach in SDID does not
work directly. However, there are various alternatives. Here we discuss a simple
modification to estimate the average treatment effect for the treated in that setting by
applying the SDID estimator repeatedly, once for every adoption date. An alterna-
tive is the procedure developed in Ben-Michael, Feller, and Rothstein (2019). In the
example above with two adoption dates, we can create two assignment matrices, W  1
and W  2, that both fit into the block assignment setting. We can then apply the SDID
estimator to both samples, and calculate a weighted average of the two estimators,
with the weight equal to the fraction of treated unit/time-period pairs in each of the
two samples. In the above example, the first sample would consist of units 1, 2, 5
and 6, and the second sample would consist of units 1, 2, 3, and 4, as illustrated in
the two assignment matrices below:

⎜ ⎟
⎛
1 2 3 4 5 6 7 ⎞
1 0 0 0 0 0 0 0
  1 =  2   
W
    0  0  0  0  0  0  0    ,
5 0 0 1 1 1 1 1
⎝ 6 0 0 1 1 1 1 1 ⎠

⎜ ⎟
⎛
1 2 3 4 5 6 7 ⎞
1 0 0 0 0 0 0 0

W  2 =  2   
    0  0  0  0  0  0  0    .
3 0 0 0 0 1 1 1
⎝ 4 0 0 0 0 1 1 1 ⎠

Alternatively we can create the two samples by splitting the data up by time periods.

⎜ ⎟ ⎜ ⎟
In that case the first sample would consist of time periods 1, 2, 3, and 4, and the
second sample would consist of time periods 1, 2, 5, 6, and 7, as illustrated below:

⎛ 1 2 3 4 ⎞ ⎛ 1 2 5 6 7 ⎞
1 0 0 0 0 1 0 0 0 0 0
2 0 0 0 0 2 0 0 0 0 0
W
    0  0  0  0 ,
  1 =   3       0  0  1  1  1   .
W  2 =   3  
4 0 0 0 0 4 0 0 1 1 1
5 0 0 1 1 5 0 0 1 1 1
⎝ 6 0 0 1 1 ⎠ ⎝ 6 0 0 1 1 1 ⎠

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4116 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

REFERENCES

Abadie, Alberto. 2005. “Semiparametric Difference-in-Differences Estimators.” Review of Economic

Studies 72 (1): 1–19.
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. “Synthetic Control Methods for Com-
parative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of
the American Statistical Association 105 (490): 493–505.
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2015. “Comparative Politics and the Syn-
thetic Control Method.” American Journal of Political Science 59 (2): 495–510.
Abadie, Alberto, and Javier Gardeazabal. 2003. “The Economic Costs of Conflict: A Case Study of the
Basque Country.” American Economic Review 93 (1): 113–32.
Abadie, Alberto, and Jérémy L’Hour. 2016. “A Penalized Synthetic Control Estimator for Disaggre-
gated Data.” Unpublished.
Agarwal, Anish, Devavrat Shah, Dennis Shen, and Dogyoon Song. 2019. “On Robustness of Principal
Component Regression.” arXiv preprint arXiv:1902.10920.
Ahn, Seung Chan, Young Hoon Lee, and Peter Schmidt. 2001. “GMM Estimation of Linear Panel Data
Models with Time-Varying Individual Effects.” Journal of Econometrics 101 (2): 219–55.
Amjad, Muhammad, Devavrat Shah, and Dennis Shen. 2018. “Robust Synthetic Control.” Journal of
Machine Learning Research 19 (22): 1–51.
Angrist, Joshua D., and Jörn-Steffen Pischke. 2008. Mostly Harmless Econometrics: An Empiricist’s
Companion. Princeton: Princeton University Press.
Arkhangelsky, Dmitry, Susan Athey, David A. Hirshberg, Guido W. Imbens, and Stefan Wager. 2021.
“Replication Data for: Synthetic Difference-in-Differences.” American Economic Association
[publisher], Inter-university Consortium for Social and Political Research [distributor]. https://doi.
org/10.3886/E146381V1.
Ashenfelter, Orley, and David Card. 1985. “Using the Longitudinal Structure of Earnings to Estimate
the Effect of Training Programs.” Review of Economics and Statistics 67 (4): 648–60.
Athey, Susan, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, and Khashayar Khosravi. 2021.
“Matrix Completion Methods for Causal Panel Data Models.” Journal of the American Statistical
Association 116. https://doi.org/10.1080/01621459.2021.1891924.
Athey, Susan, and Guido W. Imbens. 2021. “Design-Based Analysis in Difference-in-Differ-
ences Settings with Staggered Adoption.” Journal of Econometrics. https://doi.org/10.1016/j.
jeconom.2020.10.012.
Athey, Susan, Guido W. Imbens, and Stefan Wager. 2018. “Approximate Residual Balancing: Debiased
Inference of Average Treatment Effects in High Dimensions.” Journal of the Royal Statistical Soci-
ety: Series B (Statistical Methodology) 80 (4): 597–623.
Bai, Jushan. 2009. “Panel Data Models with Interactive Fixed Effects.” Econometrica 77 (4): 1229–79.
Barrios, Thomas, Rebecca Diamond, Guido W. Imbens, and Michal Kolesár. 2012. “Clustering, Spa-
tial Correlations, and Randomization Inference.” Journal of the American Statistical Association
107 (498): 578–91.
Belloni, Alexandre, Victor Chernozhukov, and Christian Hansen. 2014. “Inference on Treatment
Effects after Selection among High-Dimensional Controls.” Review of Economic Studies 81 (2):
608–50.
Ben-Michael, Eli, Avi Feller, and Jesse Rothstein. 2018. “The Augmented Synthetic Control Method.”
arXiv preprint arXiv:1811.04170v1.
Ben-Michael, Eli, Avi Feller, and Jesse Rothstein. 2019. “Synthetic Controls and Weighted Event Stud-
ies with Staggered Adoption.” arXiv preprint arXiv:1912.03290v1.
Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. “How Much Should We Trust
Differences-in-Differences Estimates?” Quarterly Journal of Economics 119 (1): 249–75.
Borusyak, Kirill, and Xavier Jaravel. 2016. “Revisiting Event Study Designs.” Unpublished.
Bottmer, Lea, Guido Imbens, Jann Spiess, and Merrill Warnick. 2021. “A Design-Based Perspective
on Synthetic Control Methods.” arXiv preprint arXiv:2101.09398v1.
Callaway, Brantly, and Pedro H. C. Sant’anna. 2020. “Difference-in-Differences with Multiple Time
Periods.” Journal of Econometrics. https://doi.org/10.1016/j.jeconom.2020.12.001.
Card, David. 1990. “The Impact of the Mariel Boatlift on the Miami Labor Market.” Industrial and
Labor Relation 43 (2): 245–57.
Chamberlain, Gary. 1992. “Efficiency Bounds for Semiparametric Regression.” Econometrica 60 (3):
567–96.
Chan, Mark K., and Simon Kwok. 2020. “The PCDID Approach: Difference-in-Differences When
Trends Are Potentially Unparallel and Stochastic.” Unpublished.

This content downloaded from

Chernozhukov, Victor, Kaspar Wuthrich, and Yinchu Zhu. 2018. “Inference on Average Treatment
Effects in Aggregate Panel Data Settings.” arXiv preprint arXiv:1812.10820v1.
Conley, Timothy G., and Christopher R. Taber. 2011. “Inference with ‘Difference in Difference’ with
a Small Number of Policy Changes.” Review of Economics and Statistics 93 (1): 113–25.
Currie, Janet, Henrik Kleven, and Esmée Zwiers. 2020. “Technology and Big Data Are Changing Eco-
nomics: Mining Text to Track Methods.” AEA Papers and Proceedings 110: 42–48.
de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2020. “Two-Way Fixed Effects Estimators with
Heterogeneous Treatment Effects.” American Economic Review 110 (9): 2964–96.
Doudchenko, Nikolay, and Guido W. Imbens. 2016. “Balancing, Regression, Difference-in-Differ-
ences and Synthetic Control Methods: A Synthesis.” NBER Working Paper 22791.
Efron, B. 1979. “Bootstrap Methods: Another Look at the Jackknife.” Annals of Statistics 7 (1): 1–26.
Feenstra, Robert C., Robert Inklaar, and Marcel P. Timmer. 2015. “The Next Generation of the Penn
World Table.” American Economic Review 105 (10): 3150–82. Available for download at www.
ggdc.net/pwt.
Ferman, Bruno, and Cristine Pinto. 2019. “Synthetic Controls with Imperfect Pre-treatment Fit.”
arXiv preprint arXiv:1911.08521v1.
Firpo, Sergio, and Vitor Possebom. 2018. “Synthetic Control Method: Inference, Sensitivity Analysis
and Confidence Sets.” Journal of Causal Inference 6 (2): Article 20160026.
Freyaldenhoven, Simon, Christian Hansen, and Jesse M. Shapiro. 2019. “Pre-event Trends in the Panel
Event-Study Design.” American Economic Review 109 (9): 3307–38.
Freyberger, Joachim. 2018. “Non-parametric Panel Data Models with Interactive Fixed Effects.”
Review of Economic Studies 85 (3): 1824–51.
Gobillon, Laurent, and Thierry Magnac. 2016. “Regional Policy Evaluation: Interactive Fixed Effects
and Synthetic Controls.” Review of Economics and Statistics 98 (3): 535–51.
Graham, Bryan S., Cristine Campos De Xavier Pinto, and Daniel Egel. 2012. “Inverse Probability
Tilting for Moment Condition Models with Missing Data.” Review of Economic Studies 79 (3):
1053–79.
Hahn, Jinyong, and Ruoyao Shi. 2016. “Synthetic Control and Inference.” Unpublished.
Hirshberg, David A. 2021. “Least Squares with Error in Variables.” arXiv preprint arXiv:2104.08931v1.
Hirshberg, David A., and Stefan Wager. 2017. “Augmented Minimax Linear Estimation.” arXiv pre-
print arXiv:1712.00038.
Holtz-Eakin, Douglas, Whitney Newey, and Harvey S. Rosen. 1988. “Estimating Vector Autoregres-
sions with Panel Data.” Econometrica 56 (6): 1371–95.
Hsiao, Cheng, H. Steve Ching, and Shui Ki Wan. 2012. “A Panel Data Approach for Program Evalu-
ation: Measuring the Benefits of Political and Economic Integration of Hong Kong with Mainland
China.” Journal of Applied Econometrics 27 (5): 705–40.
Imai, Kosuke, and Marc Ratkovic. 2014. “Covariate Balancing Propensity Score.” Journal of the Royal
Statistical Society: Series B (Statistical Methodology) 76 (1): 243–63.
Imbens, Guido. 2004. “Nonparametric Estimation of Average Treatment Effects under Exogeneity: A
Review.” Review of Economics and Statistics 86 (1): 4–29.
Imbens, Guido W., and Donald B. Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical
Sciences. New York: Cambridge University Press.
Kallus, Nathan. 2020. “Generalized Optimal Matching Methods for Causal Inference.” Journal of
Machine Learning Research 21 (62): 1–54.
Kellogg, Maxwell, Magne Mogstad, Guillaume Pouliot, and Alexander Torgovitsky. 2020. “Combining
Matching and Synthetic Control to Trade Off Biases from Extrapolation and Interpolation.” NBER
Working Paper 26624.
Li, Kathleen T., and David R. Bell. 2017. “Estimation of Average Treatment Effects with Panel Data:
Asymptotic Theory and Implementation.” Journal of Econometrics 197 (1): 65–75.
Miller, Rupert G. 1974. “The Jackknife - A Review.” Biometrika 61 (1): 1–15.
Moon, Hyungsik Roger, and Martin Weidner. 2017. “Dynamic Linear Panel Regression Models with
Interactive Fixed Effects.” Econometric Theory 33 (1): 158–95.
Moon, Hyungsik Roger, and Martin Weidner. 2015. “Linear Regression for Panel with Unknown
Number of Factors as Interactive Fixed Effects.” Econometrica 83 (4): 1543–79.
Moon, Hyungsik Roger, and Martin Weidner. 2018. “Nuclear Norm Regularized Estimation of Panel
Regression Models.” arXiv preprint arXiv:1810.10987v1.
National Bureau of Economic Research. 2021. “Merged Outgoing Rotation Groups (MORG).” https://
data.nber.org/morg/annual (accessed August 1, 2021).
Orzechowski, & Walker. 2005. The Tax Burden on Tobacco. Historical Compilation, Vol. 40, Arling-
ton, VA: Orzechowski & Walker.

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms
4118 THE AMERICAN ECONOMIC REVIEW DECEMBER 2021

Peri, Giovanni, and Vasil Yasenov. 2019. “The Labor Market Effects of a Refugee Wave: Synthetic
Control Method Meets the Mariel Boatlift.” Journal of Human Resources 54 (2): 267–309.
Rambachan, Ashesh, and Jonathan Roth. 2019. “An Honest Approach to Parallel Trends.” Unpub-
lished.
Robins, James M., Andrea Rotnitzky, and Lue Ping Zhao. 1994. “Estimation of Regression Coef-
ficients When Some Regressors Are Not Always Observed.” Journal of the American Statistical
Association 89 (427): 846–66.
Robins, Philip K. 1985. “A Comparison of the Labor Supply Findings from the Four Negative Income
Tax Experiments.” Journal of Human Resources 20 (4): 567–82.
Roth, Jonathan. 2018. “Pre-test with Caution: Event-Study Estimates after Testing for Parallel Trends.”
Unpublished.
Scharfstein, Daniel O., Andrea Rotnitzky, and James M. Robins. 1999. “Adjusting for Nonignorable
Drop-Out Using Semiparametric Nonresponse Models.” Journal of the American Statistical Asso-
ciation 94 (448): 1096–1120.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal
Statistical Society: Series B (Methodological) 58 (1): 267–88.
Vershynin, Roman. 2018. High-Dimensional Probability: An Introduction with Applications in Data
Science. Cambridge, UK: Cambridge University Press.
Xu, Yiqing. 2017. “Generalized Synthetic Control Method: Causal Inference with Interactive Fixed
Effects Models.” Political Analysis 25 (1): 57–76.
Zhang, Cun-Hui, and Stephanie S. Zhang. 2014. “Confidence Intervals for Low Dimensional Param-
eters in High Dimensional Linear Models.” Journal of the Royal Statistical Society: Series B (Sta-
tistical Methodology) 76 (1): 217–42.
Zhao, Qingyuan. 2019. “Covariate Balancing Propensity Score by Tailored Loss Functions.” Annals
of Statistics 47 (2): 965–93.
Zubizarreta, José R. 2015. “Stable Weights That Balance Covariates for Estimation with Incomplete
Outcome Data.” Journal of the American Statistical Association 110 (511): 910–22.

This content downloaded from

141.161.91.14 on Thu, 26 Dec 2024 20:56:24 UTC
All use subject to https://about.jstor.org/terms

1P Seedy Tavern
No ratings yet
1P Seedy Tavern
1 page
Year 10 Mathematics Quadratics Test: Section 1 Multiple Choice
No ratings yet
Year 10 Mathematics Quadratics Test: Section 1 Multiple Choice
5 pages
LedSync820C User Manual VLS PDF
No ratings yet
LedSync820C User Manual VLS PDF
17 pages
SDID
No ratings yet
SDID
37 pages
2503.13323v1
No ratings yet
2503.13323v1
75 pages
Callaway & SantAnna
No ratings yet
Callaway & SantAnna
31 pages
01 Introduction
No ratings yet
01 Introduction
28 pages
Journal - Generalized Synthetic Control Method - Causal Inference With Interactive Fixed Effects Models
No ratings yet
Journal - Generalized Synthetic Control Method - Causal Inference With Interactive Fixed Effects Models
20 pages
Generalized Synthetic Control Method
No ratings yet
Generalized Synthetic Control Method
33 pages
Balancing, Regression, Difference-In-Difference and Synthetic Control Methods
No ratings yet
Balancing, Regression, Difference-In-Difference and Synthetic Control Methods
38 pages
What's Trending in Difference-In-Differences
No ratings yet
What's Trending in Difference-In-Differences
27 pages
Did, Iv
No ratings yet
Did, Iv
42 pages
Diff - Simplifying The Estimation of Difference-In-difference Treatment Effects
No ratings yet
Diff - Simplifying The Estimation of Difference-In-difference Treatment Effects
20 pages
13_dind
No ratings yet
13_dind
58 pages
Dynamic DiD Regression Li Strezhnev June 25 2024
No ratings yet
Dynamic DiD Regression Li Strezhnev June 25 2024
112 pages
Chapter_13
No ratings yet
Chapter_13
14 pages
2023 Roth Santanna Bilinski
No ratings yet
2023 Roth Santanna Bilinski
58 pages
DiD Review Paper
No ratings yet
DiD Review Paper
54 pages
Difference-in-Differences With Interference: Ruonan Xu
No ratings yet
Difference-in-Differences With Interference: Ruonan Xu
65 pages
2024 DiD Handout
No ratings yet
2024 DiD Handout
4 pages
DID topics
No ratings yet
DID topics
86 pages
Xu GeneralizedSyntheticControl 2017
No ratings yet
Xu GeneralizedSyntheticControl 2017
21 pages
2019-Impact Evaluation Using DiD
No ratings yet
2019-Impact Evaluation Using DiD
14 pages
01_Introduction
No ratings yet
01_Introduction
53 pages
Evaluating the Impact of Health Policies Using a Difference-Indifferences Approach
No ratings yet
Evaluating the Impact of Health Policies Using a Difference-Indifferences Approach
6 pages
2025 More on Panels
No ratings yet
2025 More on Panels
17 pages
DID
No ratings yet
DID
28 pages
BorusyakJaravelSpiess (2024) ReviewOfEconomicStudies
No ratings yet
BorusyakJaravelSpiess (2024) ReviewOfEconomicStudies
33 pages
1 s2.0 S0304405X22000204 Main
No ratings yet
1 s2.0 S0304405X22000204 Main
26 pages
A_Tutorial_on_Applying_the_Difference-in-Differenc (1)
No ratings yet
A_Tutorial_on_Applying_the_Difference-in-Differenc (1)
11 pages
Lect - 10 - Difference-in-Differences Estimation PDF
No ratings yet
Lect - 10 - Difference-in-Differences Estimation PDF
19 pages
Lect 10 Diffindiffs 230305 014504
No ratings yet
Lect 10 Diffindiffs 230305 014504
20 pages
Difference-In-Differences Estimation Under Non-Parallel Trends
No ratings yet
Difference-In-Differences Estimation Under Non-Parallel Trends
33 pages
Guido Imbens
No ratings yet
Guido Imbens
62 pages
Differences in Differences
No ratings yet
Differences in Differences
78 pages
cách chọn nhóm đối chứng
No ratings yet
cách chọn nhóm đối chứng
13 pages
L_II_3 (2)
No ratings yet
L_II_3 (2)
37 pages
The Estimation of Causal Effects by Difference-In
No ratings yet
The Estimation of Causal Effects by Difference-In
56 pages
Distribution Regression Difference-in-Differences
No ratings yet
Distribution Regression Difference-in-Differences
49 pages
CH13 Wooldridge 7e+PPT 2pp
No ratings yet
CH13 Wooldridge 7e+PPT 2pp
14 pages
utaa001
No ratings yet
utaa001
17 pages
D - D: E, I V T: Ifference IN Iscontinuities Stimation Nference AND Alidity Ests
No ratings yet
D - D: E, I V T: Ifference IN Iscontinuities Stimation Nference AND Alidity Ests
27 pages
Designing Difference in Difference Studies- Best Practices for Public Health Policy Research
No ratings yet
Designing Difference in Difference Studies- Best Practices for Public Health Policy Research
21 pages
Chaisemartind'Haultfoeuille (2023) EconometricsJournal
No ratings yet
Chaisemartind'Haultfoeuille (2023) EconometricsJournal
30 pages
Panel3 DID
No ratings yet
Panel3 DID
36 pages
w29691 PDF
No ratings yet
w29691 PDF
30 pages
Lee Wooldridge 20230720
No ratings yet
Lee Wooldridge 20230720
45 pages
Introduction To DiD Design
No ratings yet
Introduction To DiD Design
4 pages
Difference in Difference For Impact Evaluation
No ratings yet
Difference in Difference For Impact Evaluation
18 pages
Financial Econometrics Homework 6
No ratings yet
Financial Econometrics Homework 6
20 pages
11_Diff-n-diff
No ratings yet
11_Diff-n-diff
44 pages
Research Paper - Econometrics - TWFE
No ratings yet
Research Paper - Econometrics - TWFE
35 pages
2108.12419v5
No ratings yet
2108.12419v5
67 pages
Introduction To The Difference-In-Differences Regression Model (2021)
No ratings yet
Introduction To The Difference-In-Differences Regression Model (2021)
2 pages
正在发送邮件 wk-08-slides
No ratings yet
正在发送邮件 wk-08-slides
96 pages
Micro-Econometrics ECO 6175: Abel Brodeur
No ratings yet
Micro-Econometrics ECO 6175: Abel Brodeur
34 pages
Lecture Note 11 Panel Analysis
No ratings yet
Lecture Note 11 Panel Analysis
11 pages
Explanation DID
No ratings yet
Explanation DID
8 pages
Handout 6 Causality
No ratings yet
Handout 6 Causality
16 pages
Applied Economics DD Lecture Notes
No ratings yet
Applied Economics DD Lecture Notes
76 pages
Chapter 1
No ratings yet
Chapter 1
7 pages
The Quarterly Review of Economics and Finance: Orkideh Gharehgozli
No ratings yet
The Quarterly Review of Economics and Finance: Orkideh Gharehgozli
12 pages
Co-Clustering: Models, Algorithms and Applications
From Everand
Co-Clustering: Models, Algorithms and Applications
Gérard Govaert
No ratings yet
Inventories, Lumpy Trade and Large Devaluations
No ratings yet
Inventories, Lumpy Trade and Large Devaluations
37 pages
Gerard - 2021 - Assortative Matching or Exclusionary Hiring The I
No ratings yet
Gerard - 2021 - Assortative Matching or Exclusionary Hiring The I
40 pages
Are Emily and Greg More Employable than Lakisha and Jamal
No ratings yet
Are Emily and Greg More Employable than Lakisha and Jamal
6 pages
IV_Quantile_Regression_for_Group_Level_Treatments
No ratings yet
IV_Quantile_Regression_for_Group_Level_Treatments
10 pages
Training Flyer - AGA
No ratings yet
Training Flyer - AGA
1 page
General Organic and Biological Chemistry An Integrated Approach 4th Edition Kenneth W. Raymond All Chapter Instant Download
No ratings yet
General Organic and Biological Chemistry An Integrated Approach 4th Edition Kenneth W. Raymond All Chapter Instant Download
50 pages
Insect Pest of Rice
No ratings yet
Insect Pest of Rice
9 pages
Beyond Realism Naturalist Film In Theory And Practice Robert Singer download
100% (1)
Beyond Realism Naturalist Film In Theory And Practice Robert Singer download
72 pages
Meenakshi Amman Temple
No ratings yet
Meenakshi Amman Temple
6 pages
(Ebook) Stoicism (Ancient Philosophies) by John Sellars ISBN 9780520249080, 0520249089 instant download
100% (1)
(Ebook) Stoicism (Ancient Philosophies) by John Sellars ISBN 9780520249080, 0520249089 instant download
51 pages
CSI0601
No ratings yet
CSI0601
2 pages
MSC 4 Sem Geology Fuel Geology (Coal Petroleum and Nuclear) Paper 4 Summer 2018
No ratings yet
MSC 4 Sem Geology Fuel Geology (Coal Petroleum and Nuclear) Paper 4 Summer 2018
1 page
Black and Yellow Modern Construction Engineering Presentation
No ratings yet
Black and Yellow Modern Construction Engineering Presentation
25 pages
002perspective and Orthographic Projections 1
No ratings yet
002perspective and Orthographic Projections 1
40 pages
Kjellberg Finsterwalde Brochure Automation Solutions
No ratings yet
Kjellberg Finsterwalde Brochure Automation Solutions
44 pages
Role of Salesperson: Diagnostic
No ratings yet
Role of Salesperson: Diagnostic
13 pages
461 - APH302 Non-Ruminant Production Lecture Note 1
No ratings yet
461 - APH302 Non-Ruminant Production Lecture Note 1
32 pages
Schneider-Contactor-Catalogue_142
No ratings yet
Schneider-Contactor-Catalogue_142
1 page
Combat System Maintenance Status: David H. Olwell
No ratings yet
Combat System Maintenance Status: David H. Olwell
14 pages
Anatomy and Physiology
No ratings yet
Anatomy and Physiology
3 pages
Cylinders Shall Be Produced As Per IS 7285 and Approved by Chief Controller of Explosive
100% (1)
Cylinders Shall Be Produced As Per IS 7285 and Approved by Chief Controller of Explosive
1 page
Baking Pans
No ratings yet
Baking Pans
2 pages
The Sea Is at Our Gates The History of The Canadian Navy PDF
100% (3)
The Sea Is at Our Gates The History of The Canadian Navy PDF
400 pages
Topoisomerases Structure Function and Mechanism
No ratings yet
Topoisomerases Structure Function and Mechanism
15 pages
Characterization of Common-Mode Choke For Automotive Ethernet Networks Enabling 100 Mbit S
No ratings yet
Characterization of Common-Mode Choke For Automotive Ethernet Networks Enabling 100 Mbit S
6 pages
Worksheet Integration
No ratings yet
Worksheet Integration
11 pages
I Show Off When Others Try Their Best But Am I Rated S-Level 1-98 S1-98
No ratings yet
I Show Off When Others Try Their Best But Am I Rated S-Level 1-98 S1-98
312 pages
Units 6-7 Vocab
No ratings yet
Units 6-7 Vocab
2 pages
GeoHazards 92 - Vol 1 - Geotechnique and Natural Hazards
No ratings yet
GeoHazards 92 - Vol 1 - Geotechnique and Natural Hazards
417 pages
Chapter 2 - The Gods and Goddesses
No ratings yet
Chapter 2 - The Gods and Goddesses
8 pages
Science 6-7TH Monthly Test-2024
No ratings yet
Science 6-7TH Monthly Test-2024
2 pages

Arkhangelsky-SyntheticDifferenceinDifferences-2021

Uploaded by

Arkhangelsky-SyntheticDifferenceinDifferences-2021

Uploaded by

American Economic Association

Stable URL: https://www.jstor.org/stable/10.2307/27086719

This content downloaded from

By Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg,

This content downloaded from

This content downloaded from

This content downloaded from

​(​​ωˆ ​​0​​, ​​ωˆ ​​​ sdid)​ ​ = ​ arg min​

ℓ​ ​unit​​​(​ω0​​​, ω)​ = ​ ∑ ​​ ​​​ ω

​Nc​o​​​(​T​pre​​ − 1)​i=1 t=1

This content downloaded from

(6) ​​(​​λˆ ​​0​​, ​​λˆ ​​​ sdid​)​ = ​ arg min​

Λ = ​{λ T​pre​​ + 1, …, T}​​.

This content downloaded from

SDID SC DID MC DIFP

B. The California Smoking Cessation Program

This content downloaded from

DID uses constant weights ​​ω i​ ​ = ​

This content downloaded from

II. Placebo Studies

This content downloaded from

In order to run such a simulation study, we first need to commit to an econo-

where ​​γi​​​​is a vector of latent unit factors of dimension ​R​, and ​υ

A. Current Population Survey Placebo Study

This content downloaded from

This decomposition of ​L​into an additive ­two-way fixed effect component ​F​and an

This content downloaded from

​ i​​​ | ​Ei​​​, ​α​i​​, ​Mi​​​ ∼ Bernoulli​(​πi​​​)​,

Simulation Results.—Table 2 compares the performance of the four aforemen-

This content downloaded from

Assignment block size

This content downloaded from

Minimum wage assignment Random assignment

−0.2 −0.1 0 0.1 0.2 −0.2 −0.1 0 0.1 0.2

B. Penn World Table Placebo Study

This content downloaded from

III. Formal Results

For notational convenience, we partition the matrix ​Y​ as

This content downloaded from

This content downloaded from

We introduced the SDID estimator (1) as the solution to a weighted t­wo-way

This content downloaded from

B. Oracle and Adaptive Synthetic Control Weights

(​​​ω ​​̃ 0​​, ​ω̃ ​)​ = ​ arg min​

(​​​ω̃ ​​0​​, ​ω̃ ​)​ = ​ arg min​

+ ​(tr​(​Σp​re,pre​​)​ + ​ζ​​ 2​ ​Tp​re​​)​ ​||ω||​ 22​​ ​ ,

(​ ​​λ̃ ​​0​​, ​λ̃ ​)​ = ​ arg min​

The error of the SDID estimator can now be decomposed as follows,

and our task is to characterize all three terms.

This content downloaded from

This content downloaded from

ASSUMPTION 2 (Sample Sizes): We consider a sequence of populations where

​​ ​co​​ / (​Nt​r​​ ​Tp​ost​​ max(​Nt​r​​, ​T​post​​)l​og​​ 2​(​Nc​o​​)) → ∞​.

We also need to make assumptions about the spectrum of L ​ ​; in particular, ​L​

ASSUMPTION 3 (Properties of ​L)​ : Letting σ ​​ ​1​​(Γ), ​σ2​​​(Γ), …​denote the singu-

The last—and potentially most interesting—of our assumptions concerns the

This content downloaded from

ASSUMPTION 4 (Properties of Weights and L

∥​ ​​ω ​​̃ co​​ ∥​2​​ = o​(​​[​(​Ntr​ ​​ ​Tpost ​)​ ​

∥​ ​​ω ​​̃ 0​​ + ​​ω̃ ​​ ⊤

= o​(​N​ 1/4 ​ ​log​​ −1/2​​(​Nc​o​​)​)​​,

the oracle time weights ​​λ̃ ​​ satisfy

​∥ ​​λ̃ ​​pre​​ − ψ ∥​ ​​ = o​(​​[​(​Nt​r​​ ​Tp​ost​​)​log​(​Nc​o​​)​]​​​ ​)​ ​

​∥ ​​λ̃ ​​0​​ + ​L​co,pre​​ ​​λ̃ ​​pre​​ − ​L​co,post​​ ​​λ̃ ​​post​​ ∥​ ​​ = o​(​N​ 1/4

and the oracle weights jointly satisfy

= o​(​​(​Nt​r​​ ​Tp​ost​​)​​​ −1/2​)​​.

Assumptions 1­ –4 are substantially weaker than those used to establish asymp-

This content downloaded from

​Nt​r​​ i=​Nco​ ​​+1( ​T​post​​ t=​T​pre​​+1 it )

​(​​τˆ ​​​ sdid​ − τ)​ / ​V​ 1/2

​Nt​r​​ [ ​T​post​​ t=​T​pre​​+1 it ]

to conduct asymptotically valid inference. In this section, we discuss three approaches

This content downloaded from

Algorithm 2—Bootstrap Variance Estimation

Algorithm 3—Jackknife Variance Estimation

Data: ​​ωˆ ​​, ​​λˆ ​​, Y, W, ​​τˆ ​​

(ωˆ 0, ωˆ   sdid) =   arg min

ℓ unit(ω0, ω) =  ∑  ω

Nco(Tpre  − 1)i=1 t=1

(6) (λˆ 0, λˆ   sdid) =  arg min

Λ = {λ Tpre  + 1, …, T}.

DID uses constant weights ω i  =

where γiis a vector of latent unit factors of dimension R, and υ

This decomposition of Linto an additive two-way fixed effect component Fand an

i | Ei, αi, Mi ∼ Bernoulli(πi),

For notational convenience, we partition the matrix Y as

We introduced the SDID estimator (1) as the solution to a weighted two-way

(ω ̃ 0, ω̃ ) =   arg min

(ω̃ 0, ω̃ ) =   arg min

+ (tr(Σpre,pre) + ζ  2 Tpre) ||ω||  22   ,

( λ̃ 0, λ̃ ) =   arg min

co / (Ntr Tpost max(Ntr, Tpost)log  2(Nco)) → ∞.

We also need to make assumptions about the spectrum of L ; in particular, L

ASSUMPTION 3 (Properties of L) : Letting σ 1(Γ), σ2(Γ), …denote the singu-

∥ ω ̃ co ∥2 = o([(Ntr Tpost )

∥ ω ̃ 0  + ω̃   ⊤

= o(N  1/4 log  −1/2(Nco)),

the oracle time weights λ̃  satisfy

∥ λ̃ pre  − ψ ∥ = o([(Ntr Tpost)log(Nco)]  )

∥ λ̃ 0  + Lco,pre λ̃ pre  − Lco,post λ̃ post ∥ = o(N  1/4

= o((Ntr Tpost)  −1/2).

Assumptions 1 –4 are substantially weaker than those used to establish asymp-

Ntr i=Nco +1( Tpost t=Tpre+1 it )

(τˆ   sdid  − τ) / V  1/2

Ntr [ Tpost t=Tpre+1 it ]

Data: ωˆ , λˆ , Y, W, τˆ 

Moreover, if the treatment effects τit = τare constant11 and