Extreme Changes in Changes
Abstract
Policy analysts are often interested in treating the units with extreme outcomes,
such as infants with extremely low birth weights. Existing changes-in-changes (CIC)
estimators are tailored to middle quantiles and do not work well for such subpopu-
lations. This paper proposes a new CIC estimator to accurately estimate treatment
effects at extreme quantiles. With its asymptotic normality, we also propose a method
of inference. Based on simulation studies, we propose to use our extreme CIC estimator for
extreme quantiles, such as those below 5% and above 95%, while the conventional CIC
estimator should be used for intermediate
quantiles. Applying the proposed method, we study the effects of income gains from
the 1993 EITC reform on infant birth weights for those in the most critical conditions.
∗
We thank the editor, associate editor, two anonymous referees, Alfonso Flores-Lagunes, Hilary Hoynes,
Doug Miller, and David Simon for useful advice about our empirical application. We benefited from dis-
cussions with Jon Roth. All remaining errors are ours. A Stata command, ecic (extreme changes in
changes), associated with this paper can be installed from SSC archive with the following command line:
ssc install ecic.
†
Associate professor of economics, Vanderbilt University. Email: [email protected]
‡
Assistant professor of economics, Syracuse University. Email: [email protected].
1 Introduction
program evaluation in the presence of policy events in time. The common DID methods
critically depend on parallel trend assumptions and focus on identifying (conditional) average
effects. An alternative empirical strategy is the changes in changes (CIC) method proposed
by Athey and Imbens (2006). At the cost of alternative distributional assumptions, the
CIC gets around the common trend assumption and further can identify distributions of
counterfactual outcomes as opposed to just their averages. Thus, CIC can be used to analyze
As is the case with other quantile-based estimands, however, the existing CIC estimator
works only for intermediate quantiles in theory. In practice, such an estimator is accurate
for intermediate quantile levels such as q ∈ (0.05, 0.95), that is, between the fifth and
ninety-fifth percentiles. This limitation of the existing CIC estimator rules out
causal inference for individuals at the extreme top and extreme bottom quantiles.
Yet, it is sometimes rather at extreme quantiles that treatment effects are more relevant
to social policy analysis. For instance, policymakers often care about treating economically
The treatment effect for these tail subpopulations could be substantially larger than that
for the mid-sample subpopulations, and hence it is imperative for such policymakers to have
methods with which they can accurately assess treatment effects at the subpopulations in
the tail.
In this paper, we propose an alternative CIC estimator that more accurately estimates the
treatment effects at the tails, technically in the limits as q → 0 and q → 1. We also develop
asymptotic normality for this estimator and propose an easy-to-construct confidence interval.
Based on our simulation studies, we provide the following practical recommendation. For the
intermediate quantiles, use the existing estimator by Athey and Imbens (2006) along with
its standard error. For the extreme quantiles, on the other hand, use our proposed estimator
along with its standard error. We suggest using the log-log plot to choose the switching
point and demonstrate a combined use of both estimators in our empirical application.
With the proposed econometric method, we revisit the study by Hoynes, Miller, and Si-
mon (2015) in which they use the 1993 event of EITC reform to evaluate the effects of income
gains on infant birth weights. While they analyze average effects via the DID, we focus on
the effects at the low quantiles to see if such income gains can improve infant birth weights,
particularly for those at the most critical birth weight conditions. This empirical question
is of interest because low infant birth weight is known to have long-lasting impacts on the
health and economic well-being in adulthood (e.g., Currie, 2011) as well as an immediate
Literature. In contrast to the now extensive body of literature on DID, the literature
on CIC is relatively thin. Since its first proposal by Athey and Imbens (2006), the
CIC framework has been extended to fuzzy treatment assignments (de Chaisemartin and
D’Haultfœuille, 2014), models with covariates (Melly and Santangelo, 2015), continuous
treatments (D’Haultfœuille, Hoderlein, and Sasaki, 2022), and correction of attrition bias
(Ghanem, Hirshleifer, Kedagni, and Ortiz-Becerra, 2022). To the best of our knowledge, however,
no preceding paper investigates extreme quantiles in the context of CIC, despite the
aforementioned policy relevance. On the other hand, there are a few papers that investigate
treatment effects at extreme quantiles outside the context of CIC – see Chernozhukov (2005),
Chernozhukov and Fernández-Val (2011), D’Haultfœuille, Maurel, and Zhang (2018), Zhang
(2018), and Deuber, Li, Engelke, and Maathuis (2021) to list but a few. None of the existing
method, and Section 4 derives asymptotic properties. Section 5 discusses some practical
issues, and Section 6 extends the proposed method to allow for covariates. Section 7 shows
simulation studies, and Section 8 presents the empirical application. Section 9 presents
additional simulation results calibrated to the empirical dataset, and Section 10 concludes.
Stata Command. This paper is accompanied by a Stata command, ecic (extreme changes
in changes). The package can be installed from SSC archive with the following command
line: ssc install ecic. After the installation, run help ecic for usage of the command.
This section briefly reviews the CIC estimator following Athey and Imbens (2006). The goals
here are to introduce the data-generating model and the treatment parameter of interest, as
the control (respectively, treatment) group. Each individual is observed in one of the two
time periods T i ∈ {0, 1}. For each draw i = 1, ..., n from the population, the group identity
Gi and time period T i are treated as random variables. Letting Y i denote a continuous
The underlying structure to generate Y i is as follows. Let YNi (respectively, YIi ) denote
the potential outcome for individual i under no treatment (respectively, under treatment).
YN = h (U, T ) , (1)
where U represents unobserved characteristics, h (·, t) is strictly increasing for each t ∈ {0, 1},
and U ⊥ T |G. Let I i ∈ {0, 1} indicate that individual i receives a treatment. In the two-
$$Y_i = Y_i^N (1 - I_i) + Y_i^I \cdot I_i.$$
We now introduce the following short-hand notations:
YgtN ∼ YN |G = g, T = t
YgtI ∼ YI |G = g, T = t
Ygt ∼ Y |G = g, T = t.
For any distribution function F , we define its left-inverse F −1 by F −1 (q) = inf{y : F (y) ≥
q}. In this setup and with these notations, Athey and Imbens (2006, Theorem 3.1) establish
For each quantile q ∈ (0, 1), the quantile effect of the treatment is thus identified by
The conventional estimator (Athey and Imbens, 2006, page 464) for τqCIC performs well at
middle quantiles, such as q ∈ (0.05, 0.95), but may perform less desirably at extreme quantiles
(e.g., q ∈ (0.00, 0.05]∪[0.95, 1.00)), as is the case with common quantile estimators. Indeed,
the asymptotic theory for the conventional estimator rules out extreme values of q. In this
section, we present our proposed method of estimating τqCIC as q = qn → 1 in the right tail.
A symmetric argument applies to the limit on the other side of the distribution in the left
tail as q → 0. To stress the drifting sequence of limiting parameters of our interest, we use
the notation τqeCIC for extreme CIC. In other words, ‘e’ in “eCIC” is used to remind readers
Suppose that the distribution function FYgt of Ygt has regularly varying tails for each
$(g, t) \in \{0, 1\}^2$. Specifically, we assume
$$\frac{1 - F_{Y_{gt}}(ty)}{1 - F_{Y_{gt}}(t)} \to y^{-\alpha_{gt}} \quad \text{as } t \to \infty$$
for each $(g, t) \in \{0, 1\}^2$. Here, the parameter $\alpha_{gt} > 0$ is referred to as the Pareto exponent.
Our extreme CIC estimation is built on estimating the Pareto exponent. We emphasize that
this assumption is quite mild and most of the common families of parametric distributions
as well as a large class of nonparametric distributions satisfy it. For example, the Student-
t distribution with ν degrees of freedom satisfies this condition with ν being the Pareto
exponent. See, for example, de Haan and Ferreira (2007, Chapter 1) and Resnick (2007,
group {g, t}, where $n_{gt}$ denotes the subsample size in this group. Choose the largest $k_{gt} + 1$
of them, that is,
$$Y_{gt}^{(1)} \ge Y_{gt}^{(2)} \ge \cdots \ge Y_{gt}^{(k_{gt}+1)}.$$
As q → 1, $F_{Y_{gt}}^{-1}(q)$ is estimated by
$$\hat F_{Y_{gt}}^{-1}(q) = Y_{gt}^{(k_{gt}+1)} \left( \frac{k_{gt}}{n_{gt}(1-q)} \right)^{1/\hat\alpha_{gt}}. \tag{4}$$
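As an illustration of the extrapolation formula (4), the following Python sketch estimates a Pareto exponent and an extreme quantile from simulated Pareto draws. It assumes a Hill-type estimator of the Pareto exponent computed from the largest k + 1 order statistics. The function names hill_alpha and extreme_quantile, the sample size, and the values of k and q are hypothetical choices made for illustration only and are not part of the ecic command.

```python
import math
import random


def hill_alpha(sample, k):
    """Hill-type estimate of the Pareto exponent (assumed form of the exponent estimator)."""
    top = sorted(sample, reverse=True)[:k + 1]   # Y^(1) >= ... >= Y^(k+1)
    threshold = top[k]                           # Y^(k+1)
    mean_log_excess = sum(math.log(y / threshold) for y in top[:k]) / k
    return 1.0 / mean_log_excess


def extreme_quantile(sample, k, q):
    """Formula (4): Y^(k+1) * (k / (n (1 - q)))^(1 / alpha_hat)."""
    n = len(sample)
    threshold = sorted(sample, reverse=True)[k]
    alpha_hat = hill_alpha(sample, k)
    return threshold * (k / (n * (1.0 - q))) ** (1.0 / alpha_hat)


# Illustrative data: exact Pareto draws with 1 - F(y) = y^(-alpha) for y >= 1.
random.seed(0)
alpha = 3.0
draws = [random.paretovariate(alpha) for _ in range(20000)]
k = 400      # hypothetical number of order statistics
q = 0.999    # hypothetical extreme quantile level

true_quantile = (1.0 - q) ** (-1.0 / alpha)
print("alpha_hat:", hill_alpha(draws, k))
print("estimated quantile:", extreme_quantile(draws, k, q))
print("true quantile:", true_quantile)
```

In this design the true quantile is known in closed form, so the printed values give a rough sense of how the extrapolation behaves beyond the range where few observations are available.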
By combining the identifying formula (2) with the component estimators (3)–(5), we
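$$\begin{aligned}
\hat\tau_q^{eCIC} &= \hat F_{Y_{11}}^{-1}(q) - \hat F_{Y_{01}}^{-1} \circ \hat F_{Y_{00}} \circ \hat F_{Y_{10}}^{-1}(q) \\
&= Y_{11}^{(k_{11}+1)} \left( \frac{k_{11}}{n_{11}(1-q)} \right)^{1/\hat\alpha_{11}} - Y_{01}^{(k_{01}+1)} \left( \frac{k_{01}}{n_{01} \left[ 1 - \hat F_{Y_{00}} \circ \hat F_{Y_{10}}^{-1}(q) \right]} \right)^{1/\hat\alpha_{01}} \\
&= Y_{11}^{(k_{11}+1)} \left( \frac{k_{11}}{n_{11}(1-q)} \right)^{1/\hat\alpha_{11}} - Y_{01}^{(k_{01}+1)} \left( \frac{k_{01} n_{00}}{n_{01} k_{00}} \left[ \frac{Y_{10}^{(k_{10}+1)} \left( \frac{k_{10}}{n_{10}(1-q)} \right)^{1/\hat\alpha_{10}}}{Y_{00}^{(k_{00}+1)}} \right]^{\hat\alpha_{00}} \right)^{1/\hat\alpha_{01}}. \tag{6}
\end{aligned}$$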
We thus propose (6) as the extreme CIC estimator, which is quite simple to implement. The
next section presents asymptotic properties based on kgt → ∞ as ngt → ∞ for all g and t.
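The composition in (6) can be sketched in Python as follows. The sketch assumes a Hill-type exponent estimator and the Pareto-tail survival approximation obtained by inverting the quantile formula (4); it uses a common k for all four groups for simplicity. All function names, the common k, and the toy data design (treatment doubles the outcome) are hypothetical and serve only to illustrate the formula.

```python
import math
import random


def tail_fit(sample, k):
    """Return (threshold Y^(k+1), Hill-type exponent estimate, n, k) for one group."""
    top = sorted(sample, reverse=True)[:k + 1]
    threshold = top[k]
    alpha_hat = k / sum(math.log(y / threshold) for y in top[:k])
    return threshold, alpha_hat, len(sample), k


def tail_quantile(fit, q):
    """Extrapolated quantile: Y^(k+1) * (k / (n (1 - q)))^(1 / alpha_hat)."""
    threshold, alpha_hat, n, k = fit
    return threshold * (k / (n * (1.0 - q))) ** (1.0 / alpha_hat)


def tail_survival(fit, y):
    """Pareto-tail approximation of 1 - F(y), the inverse of tail_quantile."""
    threshold, alpha_hat, n, k = fit
    return (k / n) * (y / threshold) ** (-alpha_hat)


def ecic(y00, y01, y10, y11, k, q):
    """Extreme CIC estimate at a right-tail quantile level q."""
    f00, f01, f10, f11 = (tail_fit(s, k) for s in (y00, y01, y10, y11))
    treated_q = tail_quantile(f11, q)
    # Level of F_00 evaluated at the q-quantile of group (1, 0).
    p = 1.0 - tail_survival(f00, tail_quantile(f10, q))
    counterfactual_q = tail_quantile(f01, p)
    return treated_q - counterfactual_q


# Toy design: untreated outcomes are Pareto(3) in every cell; treatment doubles the outcome.
random.seed(1)
n, k, q = 20000, 400, 0.999
y00 = [random.paretovariate(3.0) for _ in range(n)]
y01 = [random.paretovariate(3.0) for _ in range(n)]
y10 = [random.paretovariate(3.0) for _ in range(n)]
y11 = [2.0 * random.paretovariate(3.0) for _ in range(n)]

estimate = ecic(y00, y01, y10, y11, k, q)
placebo = ecic(y00, y00, y00, y00, k, q)   # identical cells imply a zero effect
print(estimate, placebo)
```

Passing the same sample in all four cells is a simple sanity check: the two terms of (6) then coincide, so the estimate should be zero up to floating-point error.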
We close this section with a discussion of the identifiability of the extreme CIC, τqeCIC .
While we informally reviewed the identification result of Athey and Imbens (2006, Theorem
3.1) in Section 2, we should emphasize that it relies on a common support condition (Athey
and Imbens, 2006, Assumption 3.4). Namely, for the identifying equality (2) to hold for all
(Athey and Imbens, 2006, Corollary 3.1). Such an unidentified region of q generally contains
extreme quantiles. Hence, the common support condition is crucial especially in the context
of extreme quantiles. If FY11N is bounded away from zero and one on the support of Y |G =
0, T = 1, then we can deduce that the common support condition may be violated.
4 Asymptotic Theory
In this section, we derive a limit distributional property for the proposed extreme CIC
estimator. This result paves a way for statistical inference about the extreme CIC.
Let $\{Y_{gt}^i\}_{i=1}^{n_{gt}}$ denote the subsample of observed outcomes in group g and time t. We state
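Conditions
1. $Y_{gt}^i$ is i.i.d. across i within each g and t. $\{Y_{gt}^1, \ldots, Y_{gt}^{n_{gt}}\}$ are independent across g and t.
2. $F_{gt}(\cdot)$ is regularly varying at infinity with Pareto exponent $\alpha_{gt}$. Moreover, for some
constant $\rho_{gt} > 0$, $1 - F_{gt}(y) = c_1 y^{-\alpha_{gt}} + c_2 y^{-\alpha_{gt} - \alpha_{gt}\rho_{gt}} (1 + o(1))$ as $y \to \infty$.
3. $n_{11}/n_{gt} \to \eta_{11/gt} \in (0, \infty)$ and $k_{11}/k_{gt} \to \lambda_{11/gt} \in (0, \infty)$ for each $(g, t) \in \{0, 1\}^2$.
4. $k_{gt} \to \infty$ and $k_{gt} = o\left( n_{gt}^{2\rho_{gt}/(1+2\rho_{gt})} \right)$ for each $(g, t) \in \{0, 1\}^2$.
5. $n_{gt}(1 - q) = o(k_{gt})$ and $\log[n_{gt}(1 - q)] = o\left( \sqrt{k_{gt}} \right)$ for each $(g, t) \in \{0, 1\}^2$.
6. $F_{Y_{11}}^{-1}(q) \big/ F_{Y_{01}}^{-1} \circ F_{Y_{00}} \circ F_{Y_{10}}^{-1}(q) \to \varsigma \in (0, \infty)$.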
We provide some discussions about these conditions. Following Athey and Imbens (2006),
Condition 1 assumes random sampling within each time and treatment group, and indepen-
dence across time periods and groups. Thus, it presumes repeated cross sections rather
than panel data. Condition 2 imposes the regularly varying tail condition on all four
conditional distributions of the outcome. More generally, the regularly varying tail condition
is equivalent to the underlying distribution belonging to the domain of attraction of the
extreme value distribution with a positive tail index. See, for example, de Haan and
Ferreira (2007, Chapter 1). Since we derive the convergence rate, a second-order Pareto tail
approximation is needed. The second-order parameter ρgt governs the distance between
the true underlying distribution and the Pareto one. As remarked previously, this condition
imposes a rather mild restriction and also satisfies the common support condition. Condition
3 requires that the sample sizes of all subsamples are asymptotically of the same order of
magnitude.
Condition 4 specifies the order of the tail thresholds used in estimation. For simplicity
of illustration, we select $k_{gt}$ to be of a smaller order than $n_{gt}^{2\rho_{gt}/(1+2\rho_{gt})}$ so that the estimators
incur negligible asymptotic biases relative to variances. This requirement is similar in spirit
asymptotic bias involves the second-order parameter ρgt (e.g., de Haan and Ferreira, 2007,
Chapter 3). Estimation of this parameter is challenging since it requires further restrictions
on the underlying distribution (e.g., Cheng and Peng, 2001; Haeusler and Segers, 2007;
Carpentier and Kim, 2014), which are hard to interpret and hard to justify. Furthermore,
such bias estimators entail slower rates of convergence. Given these limitations, we focus on
Condition 5 imposes restrictions on the rate at which the quantile level q under investigation
tends to one in the drifting sequence. In particular, q should tend to one sufficiently
fast so that the quantile under investigation is extreme. Otherwise, the q quantile is not in
the tail and can be better estimated by the standard CIC method. This condition is also
common in the extreme quantile literature (e.g., de Haan and Ferreira, 2007, Chapter 4).
Note that this condition allows for $n_{gt}(1 - q) \to 0$. When this happens, the other part
of this condition implicitly imposes a lower bound on 1 − q, or equivalently, that the
extrapolation cannot be pushed too far into the right tail (e.g., de Haan and Ferreira, 2007,
Remark 4.3.4). To see this, observe that the condition $\log[n_{gt}(1 - q)] = o\left( \sqrt{k_{gt}} \right)$ implies
Condition 6 requires that the limit of the counterfactual outcome ratio is finite as q tends
to one. For simplicity, we consider ς ∈ (0, ∞). If ς is 0 or ∞, however, the estimator
$\hat F_{Y_{gt}}^{-1}(\cdot)$ has a different convergence rate across g and t, and consequently, we could ignore the
The following theorem establishes the asymptotic normality for the extreme CIC estima-
In finite samples, the asymptotic variance can be estimated by substituting α̂gt , λ̂11/gt =
formula of the asymptotic variance Ω provided in the statement of Theorem 1. Under the
same conditions, this estimator of Ω is also consistent. The 95% confidence interval is then
constructed as
2 1/2
2
−1/2
F̂Y−1
11
−2
(q) α̂11 + F̂Y−1
01
◦ F̂Y00 ◦ F̂Y−1
10
(q)
τ̂qeCIC ± 1.96k11 log d11 . (7)
2 h i 2
λ̂11/10 α̂00
× η̂11/10
λ̂11/00 + λ̂11/10 + λ̂11/01 α̂210 α̂201
positive value of the logarithm in (7). We set d = 10 in the subsequent simulation studies
in the special case where αg,t is the same across {g, t}, say αg,t = α for all {g, t}, although
we do not impose this restriction in the subsequent numerical analyses. This could happen
if the treatment effect is a constant shift of the outcome. Given that α̂g,t is asymptotically
independent across g and t, we can perform the standard t-test for their equivalence.
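A test of this kind can be sketched in Python as follows. The sketch assumes Hill-type exponent estimators, for which the standard asymptotic variance of α̂ is α²/k, and it uses the asymptotic independence across groups stated above. The function name and the numerical inputs are hypothetical.

```python
import math


def exponent_equality_t(alpha_a, k_a, alpha_b, k_b):
    """t-statistic for H0: alpha_a = alpha_b, using Var(alpha_hat) ~ alpha^2 / k per group."""
    standard_error = math.sqrt(alpha_a ** 2 / k_a + alpha_b ** 2 / k_b)
    return (alpha_a - alpha_b) / standard_error


# Hypothetical estimates from two (g, t) cells.
t_stat = exponent_equality_t(3.1, 400, 2.9, 350)
reject_at_5_percent = abs(t_stat) > 1.96
print(t_stat, reject_at_5_percent)
```

With these illustrative inputs the difference in exponents is small relative to its standard error, so equality is not rejected at the 5% level.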
5 Practical Issues
The number kgt of order statistics is the key tuning parameter in our method. We propose to
use the empirical choice rule proposed by Guillou and Hall (2001). We present the detailed
Since the identical algorithm applies to each pair of g and t, we suppress these subscripts
first sort them descendingly and denote the order statistics by Y (1) ≥ Y (2) ≥ . . . ≥ Y (n) .
tail approximation performs well, Tk should have its mean close to zero and variance close to
one. Accordingly, we can minimize the following criteria based on a moving average of Tk2 :
$$C_k = \left( (2\lfloor k/2 \rfloor + 1)^{-1} \sum_{j=-l}^{\lfloor k/2 \rfloor} T_{k+j}^2 \right)^{1/2}.$$
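The moving-average criterion can be computed as in the following Python sketch. The statistics T_k are taken as given inputs, and the sketch assumes l = ⌊k/2⌋ so that the window contains 2⌊k/2⌋ + 1 terms. The function name moving_rms and the toy inputs are hypothetical.

```python
import math


def moving_rms(t_values, k):
    """C_k = sqrt( (2l + 1)^(-1) * sum_{j=-l}^{l} T_{k+j}^2 ), assuming l = floor(k / 2).

    t_values maps each index k (starting at 1) to the statistic T_k.
    """
    l = k // 2
    window = [t_values[k + j] for j in range(-l, l + 1)]
    return math.sqrt(sum(t * t for t in window) / (2 * l + 1))


# Toy inputs: if every T_k equals one, the criterion equals one for any k.
constant_t = {i: 1.0 for i in range(1, 50)}
print(moving_rms(constant_t, 10))

# Candidate values of k can then be compared by their criterion values.
increasing_t = {1: 1.0, 2: 2.0, 3: 3.0}
print(moving_rms(increasing_t, 2))
```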
5.2 Extreme Quantiles
We now discuss how to define the domain [q, 1) of q on which one may use this extreme
CIC estimator, as opposed to the conventional CIC estimator. We suggest making a scatter
plot of $\{\log Y_{gt}^{(i)}\}_{i=1}^{n_{gt}}$ against $\{\log i\}_{i=1}^{n_{gt}}$, called the log-log plot. This plot is linear near small
values of i if the tail is approximately Pareto, and our estimator is accurate where it appears
linear. In this light, one can choose the boundary point q such that this log-log plot appears
linear for $i \in \{1, \ldots, \lfloor n_{gt}(1 - q) \rfloor\}$. We concretely illustrate this procedure in our empirical
application in Section 8.
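The coordinates of the log-log plot can be computed as in the following Python sketch, which also fits a least-squares line through the first m points as a rough numerical complement to visual inspection. Under an approximately Pareto tail with exponent α, the slope should be close to −1/α. The function names, the simulated Pareto data, and the choice of m are hypothetical, and the slope check is our illustration rather than part of the proposed procedure.

```python
import math
import random


def loglog_points(sample, m):
    """Return [(log i, log Y^(i))] for the m largest observations (all must be positive)."""
    top = sorted(sample, reverse=True)[:m]
    return [(math.log(i), math.log(y)) for i, y in enumerate(top, start=1)]


def ols_slope(points):
    """Least-squares slope of the second coordinate on the first."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


random.seed(2)
sample = [random.paretovariate(3.0) for _ in range(20000)]
points = loglog_points(sample, 500)   # the points one would plot
slope = ols_slope(points)
print(slope)   # compare with -1/3 for this simulated design
```

For a left-tail analysis, the same computation can be applied to the negated outcomes, provided the values entering the logarithm are positive.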
6 Extension: Covariates
Our proposed method can be easily extended to allow for covariates. Similarly to Athey and
Imbens (2006, pages 465-466), we first regress the outcome variable on the covariates and
then apply the proposed extreme CIC estimator to the regression residuals. We formalize
where $W_{gt}^i$ denotes the outcome variable for the i-th individual in group g and time t, and
$X_{gt}^i$ denotes the covariate vector. The coefficient $\beta_{gt}$ can be different across g and t, and
hence the above regression can be conducted separately for each g and t. For notational
simplicity, we continue using $Y_{gt}^i$ to denote the error term, which is now unobserved. Given
an estimate $\hat\beta_{gt}$, we treat the residuals $\hat Y_{gt}^i = W_{gt}^i - X_{gt}^{i\prime} \hat\beta_{gt}$ as effective observations and
construct the proposed extreme CIC estimator based on them. Specifically, we order the
residuals as
$$\hat Y_{gt}^{(1)} \ge \hat Y_{gt}^{(2)} \ge \cdots \ge \hat Y_{gt}^{(k_{gt}+1)}$$
and replace $\{Y_{gt}^{(i)}\}_{i=1}^{k_{gt}+1}$ with $\{\hat Y_{gt}^{(i)}\}_{i=1}^{k_{gt}+1}$ in (3)–(6).
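The residual step can be sketched in Python as follows for a single (g, t) cell. For brevity the sketch assumes one covariate plus an intercept, fitted by least squares; the function names and the toy data are hypothetical. The resulting top residuals would then be passed to the tail estimators in place of the raw outcomes.

```python
def residuals_one_covariate(w, x):
    """Least-squares residuals of w on an intercept and a single covariate x."""
    n = len(w)
    mean_x = sum(x) / n
    mean_w = sum(w) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxw = sum((xi - mean_x) * (wi - mean_w) for xi, wi in zip(x, w))
    slope = sxw / sxx
    intercept = mean_w - slope * mean_x
    return [wi - intercept - slope * xi for wi, xi in zip(w, x)]


def top_order_statistics(values, k):
    """Return the k + 1 largest values in descending order."""
    return sorted(values, reverse=True)[:k + 1]


# Toy cell: outcomes are an exact linear function of the covariate plus known errors.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
errors = [0.5, -1.0, 2.0, -2.0, 1.0, -0.5]
w = [2.0 + 3.0 * xi + ei for xi, ei in zip(x, errors)]

residuals = residuals_one_covariate(w, x)
print(top_order_statistics(residuals, 2))
```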
Condition
7. $\max_{1 \le i \le n_{gt}} \sqrt{k_{gt}} \, \dfrac{|\hat Y_{gt}^i - Y_{gt}^i|}{1 + |Y_{gt}^i|} = o_p(1)$ for all g and t.
study an estimator of tail features in a more general setup. This condition is mild and is
satisfied by the least squares estimator in the linear model (9) (cf. Girard et al., 2021, Section
3.1). In particular, when $X_{gt}^i$ has a compact support and the regression estimator $\hat\beta_{gt}$ is
$\sqrt{n}$-consistent, $|\hat Y_{gt}^i - Y_{gt}^i|$ is bounded by $\|X_{gt}^i\| \cdot \|\hat\beta_{gt} - \beta_{gt}\| = O_p(n_{gt}^{-1/2})$. Then Condition 7 follows
from $k_{gt}/n_{gt} \to 0$. In summary, this condition requires that the estimation error is
sufficiently small, so that the CIC estimator based on $\{\hat Y_{gt}^{(i)}\}_{i=1}^{k_{gt}+1}$ is asymptotically
the same as that based on $\{Y_{gt}^{(i)}\}_{i=1}^{k_{gt}+1}$.
Corollary 1 Consider the linear regression model (9). If Conditions 1-7 are satisfied, then
the estimator $\hat\tau_q^{eCIC}$ based on $\{\hat Y_{gt}^{(i)}\}_{i=1}^{k_{gt}+1}$ has the same asymptotic distribution as in Theorem
1.
7 Simulations
We use the following data generating design based on our baseline model. Generated first
To allow for the endogenous dependence between the group Gi and the unobservables U i ,
Here, we use the uniform distribution under Gi = 1 for ease of analytic tractability of both
the Pareto exponent and the quantile treatment effects and for the purpose of accurate
evaluations of simulation results with analytically known true parameter values. We also
remark that the conditional independence assumption U i ⊥ T i |Gi of Athey and Imbens
where $F_{t_\alpha}^{-1}$ denotes the quantile function of the Student-t distribution with α degrees of
$$Y_i = Y_i^N (1 - I_i) + Y_i^I \cdot I_i.$$
There are three notable features in this data generating process. First, FY11N and FY11I in
(10)–(11) have Pareto exponents of α. Second, the second term U on the right-hand side of
(11), but not of (10), causes heterogeneous treatment effects characterized as follows
Finally, we remark that the monotonicity assumption of Athey and Imbens (2006) for the
We evaluate the finite sample performance of our proposed extreme CIC estimator τ̂qeCIC
given in (6) along with its standard error estimator (7). The numbers kgt of order statistics are chosen
based on Guillou and Hall (2001) for each subsample (g, t) as described in Section 5.1.
We also present simulation results for the conventional estimator τ̂qCIC of Athey and Imbens
(2006, page 464) with its standard error estimator (Athey and Imbens, 2006, pages 464-465).
For the standard error estimation for τ̂qCIC , we use the Epanechnikov kernel and Silverman’s rule
of thumb for bandwidth selection. Before presenting the results, we want to stress that we
focus on the extreme quantiles q ∈ [0.90, 1.00) on which comparisons are necessarily unfair
for the estimator τ̂qCIC of Athey and Imbens (2006), which presumes intermediate quantiles
in theory. We confirm and acknowledge that the estimator τ̂qCIC of Athey and Imbens (2006)
Figure 1 shows Monte Carlo averages and inter-quartile ranges of the estimates under the
design with (πG , πT , πA , πB , α) = (0.1, 0.5, 1.0, 2.0, 10). The dashed curves on the left column
of the figure indicate the average estimates based on the conventional estimator τ̂qCIC . The
dotted curves on the right column of the figure indicate the average estimates based on
our proposed estimator τ̂qeCIC . In each panel, the shaded regions indicate the inter-quartile
ranges of the estimates by the respective methods. The results are shown at the extreme
quantiles q ∈ [0.90, 1.00) and for sample sizes N ∈ {2500, 5000}. The solid curves indicate
the true treatment effects. Observe that the conventional estimator τ̂qCIC tends to give biased
estimates as q → 1. On the other hand, our proposed estimator τ̂qeCIC yields substantially
less biased estimates even in the limit q → 1. We ran many additional simulations with
varying design parameter values (πG , πT , πA , πB , α), and the results indicate similar patterns
Figure 2 shows Monte Carlo frequencies that the true treatment effects are covered by the
95% confidence intervals. The dashed curves indicate the results based on the conventional
estimator τ̂qCIC and the dotted curves indicate the results based on our proposed estimator
Figure 1: Monte Carlo averages and inter-quartile ranges (shaded) of the estimates based
on the conventional estimator τ̂qCIC (dashed curves on the left column) and our proposed
estimator τ̂qeCIC (dotted curves on the right column) at the extreme quantiles q ∈ [0.90, 1.00)
under the design with (πG , πT , πA , πB , α) = (0.1, 0.5, 1.0, 2.0, 10). The true treatment effects
are indicated by the solid curves.
Figure 2: Monte Carlo frequencies of coverage of the true treatment effects by the 95%
confidence intervals at the extreme quantiles q ∈ [0.90, 1.00) under the design with
(πG , πT , πA , πB , α) = (0.1, 0.5, 1.0, 2.0, 10). The dashed and dotted curves indicate the results
based on the conventional estimator τ̂qCIC and our proposed estimator τ̂qeCIC , respectively.
τ̂qeCIC . The results are shown at the extreme quantiles q ∈ [0.90, 1.00) and for sample sizes
N ∈ {2500, 5000}. Observe that the coverage frequency based on the conventional method
deviates from the nominal probability of 0.95 as q → 1. In contrast, the coverage
frequency based on our proposed method is close to the nominal probability of 0.95 at each
point q ∈ [0.90, 1.00) in the extreme quantiles. We remark again that we ran many additional
simulations with varying design parameter values (πG , πT , πA , πB , α), and the results indicate
Use the conventional estimator τ̂qCIC of Athey and Imbens (2006, page 464) along with its
standard error estimator (Athey and Imbens, 2006, pages 464-465) for intermediate quantiles.
On the other hand, use our proposed estimator τ̂qeCIC in (6) along with the standard error
estimator (7) for extreme quantiles. The switching point can be chosen by using the log-
log plot described in Section 5.2. We also follow this practical guideline for the empirical
Health economics research has a long history of studying the causes and prevention of low
infant birth weight. It is an important topic from a policy viewpoint because low infant birth
weight has been found to have long-lasting impacts on health and economic well-being
in adulthood (e.g., Currie, 2011), as well as an immediate impact on
infant mortality. Some economic and behavioral factors affecting infant birth weight include
maternal smoking (e.g., Almond, Chay, and Lee, 2005; Currie, Neidell, and Schmieder, 2009),
maternal stress (e.g., Aizer, Stroud, and Buka, 2009; Camacho, 2008; Evans and Garthwaite,
2014), and economic resources (e.g., Hoynes et al., 2015), among others.
Because most existing empirical studies focus on average effects, it remains
unknown whether these causal factors would have positive impacts on the most vulnerable
subpopulation, namely those infants born with extremely low birth weights. There are a few papers
(Chernozhukov and Fernández-Val, 2011; Sasaki and Wang, 2022) that study extreme
quantiles of infant birth weights, but causal interpretations of their estimation results require
assuming exogeneity of the explanatory variable of interest conditional on other observed co-
can handle flexible endogeneity in the treatment choice and study treatment effects for the
most vulnerable subpopulation at the extremely low quantiles using the method proposed
in this paper.
Hoynes et al. (2015) use the difference-in-differences (DID) design based on the EITC reform
(Omnibus Reconciliation Act of 1993, OBRA93) to evaluate the effects of income gains
through the EITC on infant health outcomes. They find significant average effects of income
shocks on the incidence of low birth weight and the average infant birth weight. In this paper,
we aim to complement the work of Hoynes et al. (2015) by analyzing the heterogeneous effects
of the income gains through the EITC on infant birth weight at extremely low quantiles, as
Following the prior work by Hoynes et al. (2015), we use the U.S. Vital Statistics Natality
Data, 1989–1999. We also adopt their DID design for our extreme CIC analysis by following
their two key assumptions. First, the effects of the EITC on infant birth weights run through
the cash available to the family which arrives through tax refunds and the cash is spent over
the subsequent 12 months. Second, we focus on the effects during the sensitive development
stage in the three months prior to birth. Consequently, following the cash-in-hand assignment
rule of Hoynes et al. (2015, Table 1), we include births in May 1994 or after in the “Post”
group (T = 1) associated with the policy event of OBRA93. The eligibility criteria for the
EITC include the requirement that a taxpayer has a qualifying child. In this light, we
include all the second- or higher-order live births as the treatment group (G = 1). The
sample sizes are n00 = 2372001, n01 = 1287185, n10 = 2652321, and n11 = 1325598.
Hoynes et al. (2015) define subpopulations by year, state, parity, education, race, and
mother’s age. Then, they treat such a subpopulation as a unit of observation, and use the
average birth weight within a subpopulation as the outcome value for the unit. However, this
procedure will not allow us to analyze individual heterogeneity with the quantile treatment
effect because aggregation eliminates individual heterogeneity. Hence, we use each birth
as a unit of observation unlike Hoynes et al. (2015). Otherwise, we follow their empirical
approach as follows. First, we use year and state fixed effects. Since Hoynes et al. (2015) use
parity, education, race, and mother’s age to define their subgroups of aggregation, we instead
use this list of variables as covariates in our analysis. To accommodate these covariates, the
extended method introduced in Section 6 is employed. Second, we focus on single women
To determine the switching point q between our extreme CIC estimator and the conven-
tional CIC estimator, we draw the log-log plots for −Ŷ00 , −Ŷ01 , −Ŷ10 , and −Ŷ11 in Figure 3.
Observe in each plot that the curve is reasonably linear up to around the 2.5th or 5th
percentile, and thereafter starts to curve downward. In light of the discussion in Section 5.2 and
noting that our current focus is on the left tail, we choose the switching point q such that the
log-log plot is linear for $i \in \{1, \ldots, \lfloor n_{gt} q \rfloor\}$. To guarantee a good Pareto tail approximation,
Figure 4 illustrates estimates and confidence intervals for τqCIC . The estimates by our pro-
posed method for the extreme quantiles q ∈ (0.000, 0.025] are indicated by dotted curves, and
the estimates by Athey and Imbens (2006) for the intermediate quantiles q ∈ (0.025, 0.200]
are indicated by the dashed curves. The gray shades indicate pointwise 95 percent confidence
intervals.
Observe that the point estimates are unambiguously positive for all the quantiles q ∈
(0.000, 0.200). Furthermore, these income effects are statistically significant at each quantile
q ∈ (0.000, 0.200]. Therefore, we can conclude that income gains will causally improve the
While Hoynes et al. (2015) discover positive effects of the EITC income gains on average,
we further find positive effects at the low quantiles in particular. This progress in empirical
research is important as causal effects for extremely low infant birth weights are more relevant
to policy analysis. Low infant birth weight is known to have long-lasting impacts on
health and economic well-being in adulthood (e.g., Currie, 2011), as well as an
immediate impact on infant mortality. Our findings focusing on the low quantiles imply
that income support during pregnancy may help mitigate these adverse health and economic
outcomes. We want to stress that, for us to reach this important empirical conclusion, both
Figure 3: The log-log plots for −Ŷ00 , −Ŷ01 , −Ŷ10 , and −Ŷ11 .
Figure 4: Estimates and 95 percent confidence intervals for τqCIC of infant birth weight
for q ∈ (0.000, 0.200]. The sample consists of infants born between 1989 and 1999 from
unmarried black mothers who have completed 12 years of education. The results for the
extreme quantiles q ∈ (0.000, 0.025] are based on the proposed method. The results for the
middle quantiles q ∈ (0.025, 0.200] are based on Athey and Imbens (2006).
the conventional estimator τ̂qCIC by Athey and Imbens (2006) and our proposed estimator
Section 7 presents simulation studies based on data generated from an artificial design. In
this section, we present additional simulation studies with resamples from the empirical data
each g ∈ {0, 1}, we draw a one-percent subsample of size $\lfloor 0.01 \cdot n_{g0} \rfloor$ from $\{\hat Y_{00}^i\}_{i=1}^{n_{00}} \cup \{\hat Y_{10}^i\}_{i=1}^{n_{10}}$
with replacement, and define this subsample as a simulated sample of $Y_{g0}$. Similarly, for
each g ∈ {0, 1}, we draw a one-percent subsample of size $\lfloor 0.01 \cdot n_{g1} \rfloor$ from $\{\hat Y_{01}^i\}_{i=1}^{n_{01}} \cup \{\hat Y_{11}^i\}_{i=1}^{n_{11}}$
with replacement, and define this subsample as a simulated sample of $Y_{g1}$. Since we pool the
source samples between the control and the treatment groups for each t, the true quantile
treatment effect τqCIC is zero for all q by construction. Recall from Section 8 that the original
sample sizes are n00 = 2372001, n01 = 1287185, n10 = 2652321, and n11 = 1325598. Hence,
the simulation sample sizes are $\lfloor 0.01 \cdot n_{00} \rfloor = 23720$, $\lfloor 0.01 \cdot n_{01} \rfloor = 12871$, $\lfloor 0.01 \cdot n_{10} \rfloor = 26523$,
and $\lfloor 0.01 \cdot n_{11} \rfloor = 13255$. Under this empirical Monte Carlo design, we run the same set of
estimation and inference as in Section 7, except that we focus on the left tail q ∈ (0.00, 0.10]
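The pooled resampling scheme described above can be sketched in Python as follows. The cell sizes are those reported in the text; the function name, the random seed, and the small placeholder pools are hypothetical, since the residuals themselves are not reproduced here.

```python
import random

# Cell sizes n_gt reported in the text, keyed by (g, t).
cell_sizes = {(0, 0): 2372001, (0, 1): 1287185, (1, 0): 2652321, (1, 1): 1325598}

# One-percent simulation sizes, floor(0.01 * n_gt), computed with integer division.
simulation_sizes = {cell: n // 100 for cell, n in cell_sizes.items()}
print(simulation_sizes)


def simulated_sample(pool_control, pool_treated, size, rng):
    """Draw `size` values with replacement from the pooled control and treated residuals."""
    pooled = pool_control + pool_treated
    return rng.choices(pooled, k=size)


# Placeholder pools standing in for the period-0 residuals of the two groups.
rng = random.Random(3)
pool_00 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
pool_10 = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Both groups draw from the same pooled source, so the true effect is zero by construction.
sample_00 = simulated_sample(pool_00, pool_10, 50, rng)
sample_10 = simulated_sample(pool_00, pool_10, 50, rng)
```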
The top row of Figure 5 shows Monte Carlo averages and inter-quartile ranges of the
estimates, analogously to Figure 1 in Section 7. The dashed curve on the left panel indicates
the average estimates based on the conventional estimator τ̂qCIC . The dotted curve on the
right panel indicates the average estimates based on our proposed estimator τ̂qeCIC . In each
panel, the shaded region indicates the inter-quartile ranges of the estimates by the respective
methods. The solid curves indicate the true treatment effects. Since the true treatment
effects are homogeneously zero for all q under the current data generating design, there is
little bias in either estimator. Therefore, the inter-quartile ranges are nicely symmetric
for both estimators. This feature of the results contrasts with that in Section 7, where
non-trivial biases exist for the conventional estimator τ̂qCIC at the extreme quantiles.
The bottom row of Figure 5 shows the Monte Carlo frequencies with which the true treatment effects are covered by the 95% confidence intervals, analogously to Figure 2 in Section 7. The dashed curve indicates the results based on the conventional estimator $\hat{\tau}_q^{CIC}$ and the dotted curve indicates the results based on our proposed estimator $\hat{\tau}_q^{eCIC}$. The results are shown at the extreme quantiles $q \in (0.00, 0.10]$. Although the conventional estimator does not suffer from bias under the current design, its statistical inference still suffers from size distortions. Our proposed extreme CIC estimator $\hat{\tau}_q^{eCIC}$ yields substantially smaller size distortions than the conventional estimator.
In this paper, we propose a new CIC estimator to accurately estimate the treatment effects at extreme (tail) quantiles. We also derive its asymptotic normality for statistical inference. Our proposal of these new methods is motivated by the fact that policy analysts are often interested in treating subpopulations near the tails of the distributions of outcome variables (e.g., extremely poor individuals and infants with extremely low birth weights).
Simulation studies demonstrate that the new extreme CIC estimator, along with its standard error estimator, performs better than the conventional method in the tails. Based on these results, we recommend using our extreme CIC estimator for extreme quantiles, while the conventional CIC estimator should be used for intermediate quantiles.
Applying the proposed method to U.S. Vital Statistics Natality Data, we study the effects of income gains from the 1993 EITC reform on infant birth weights for those in the most critical conditions. We find significant positive effects of the income gains on infant birth weights.

Figure 5: Top: Monte Carlo averages and inter-quartile ranges (shaded) of the estimates based on the conventional estimator $\hat{\tau}_q^{CIC}$ (dashed curves in the left column) and our proposed estimator $\hat{\tau}_q^{eCIC}$ (dotted curves in the right column) at the extreme quantiles $q \in (0.00, 0.10]$. The true treatment effects are indicated by the solid curves. Bottom: Monte Carlo frequencies of coverage of the true treatment effects by the 95% confidence intervals at the extreme quantiles $q \in (0.00, 0.10]$. The dashed and dotted curves indicate the results based on the conventional estimator $\hat{\tau}_q^{CIC}$ and our proposed estimator $\hat{\tau}_q^{eCIC}$, respectively.

Finally, we remind the readers that this paper is accompanied by a Stata command, ecic (extreme changes in changes). The package can be installed from the SSC archive with the following command line: ssc install ecic. After the installation, run help ecic for usage of the command.
References
Aizer, A., L. Stroud, and S. Buka (2009): “Maternal stress and child well-being:
Almond, D., K. Y. Chay, and D. S. Lee (2005): “The costs of low birth weight,”
Camacho, A. (2008): “Stress and birth weight: evidence from terrorist attacks,” American
Cheng, S. and L. Peng (2001): “Confidence intervals for the tail index,” Bernoulli, 7,
751–760.
Chernozhukov, V. and I. Fernández-Val (2011): “Inference for extremal conditional
quantile models, with an application to market and birthweight risks,” Review of Economic
Currie, J. (2011): “Inequality at birth: some causes and consequences,” American Eco-
Currie, J., M. Neidell, and J. F. Schmieder (2009): “Air pollution and infant health:
Unpublished Manuscript.
Deuber, D., J. Li, S. Engelke, and M. H. Maathuis (2021): “Estimation and infer-
ence of extremal quantile treatment effects for heavy-tailed distributions,” arXiv preprint
arXiv:2110.06627.
of Econometrics, forthcoming.
sions for selection models and the black–white wage gap,” Journal of Econometrics, 203,
129–142.
Policy, 6, 258–90.
Ghanem, D., S. Hirshleifer, D. Kedagni, and K. Ortiz-Becerra (2022): “Cor-
Guillou, A. and P. Hall (2001): “A diagnostic for selecting the threshold in extreme
value analysis,” Journal of the Royal Statistical Society: Series B (Statistical Methodology),
63, 293–305.
Haeusler, E. and J. Segers (2007): “Assessing confidence intervals for the tail index by
Hill, B. M. (1975): “A simple general approach to inference about the tail of a distribu-
Hoynes, H., D. Miller, and D. Simon (2015): “Income, the earned income tax credit,
Sasaki, Y. and Y. Wang (2022): “Fixed-k inference for conditional extremal quantiles,”
Zhang, Y. (2018): “Extremal quantile treatment effects,” Annals of Statistics, 46, 3707–
3740.
Appendix
A Proof of Theorem 1
Proof. For succinctness, we use the short-hand notation $F_{gt}(\cdot)$ for $F_{Y_{gt}}(\cdot)$, and accordingly
for all $(g, t) \in \{0, 1\}^2$; see Hill (1975). Moreover, under Conditions 1, 2, 4, and 5, we have
\[
\frac{\sqrt{k_{gt}}}{\log d_{gt}} \left( \frac{\hat{F}_{gt}^{-1}(q)}{F_{gt}^{-1}(q)} - 1 \right) \equiv \Lambda_{gt} \stackrel{d}{\to} N\left(0, \alpha_{gt}^{-2}\right), \tag{13}
\]
for all $(g, t) \in \{0, 1\}^2$ by Theorem 4.3.8 in de Haan and Ferreira (2007), where $d_{gt} \equiv k_{gt} / (n_{gt}(1 - q))$. Given the independence of $\{Y_{gt}\}$ across $g$ and $t$ under Condition 1, $\{\hat{F}_{Y_{gt}}^{-1}(q), \hat{\alpha}_{gt}\}$ are also independent across $(g, t)$. Thus, it suffices to derive the limit of the second item in
and we are going to linearize $\hat{A}_n / A_n - 1$ around zero. First, note that we have
\[
\sqrt{k_{gt}} \left( \frac{Y_{gt}^{(k_{gt}+1)}}{F_{gt}^{-1}(1 - k_{gt}/n_{gt})} - 1 \right) \equiv \Delta_{gt} \stackrel{d}{\to} N\left(0, \alpha_{gt}^{-2}\right) \tag{14}
\]
from Theorem 2.4.8 in de Haan and Ferreira (2007) and our Condition 4. Second, we
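\[
\begin{aligned}
&+ \left[ \frac{\hat{\alpha}_{00}}{\hat{\alpha}_{01}} \log\left( \frac{Y_{10}^{(k_{10}+1)}}{Y_{00}^{(k_{00}+1)}} \right) - \frac{\alpha_{00}}{\alpha_{01}} \log\left( \frac{F_{10}^{-1}(1 - k_{10}/n_{10})}{F_{00}^{-1}(1 - k_{00}/n_{00})} \right) \right] \\
&+ \left( \frac{1}{\hat{\alpha}_{01}} - \frac{1}{\alpha_{01}} \right) \log\left( \frac{k_{01} n_{00}}{n_{01} k_{00}} \right) \\
&+ \left( \frac{\hat{\alpha}_{00}}{\hat{\alpha}_{10} \hat{\alpha}_{01}} - \frac{\alpha_{00}}{\alpha_{10} \alpha_{01}} \right) \log\left( \frac{k_{10}}{n_{10}(1 - q)} \right) \\
&\equiv I_{1n} + I_{2n} + I_{3n} + I_{4n}.
\end{aligned}
\]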
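\[
\begin{aligned}
I_{2n} &= \left( \frac{\hat{\alpha}_{00}}{\hat{\alpha}_{01}} - \frac{\alpha_{00}}{\alpha_{01}} \right) \log\left( \frac{F_{10}^{-1}(1 - k_{10}/n_{10})}{F_{00}^{-1}(1 - k_{00}/n_{00})} \right) \\
&\quad + \frac{\hat{\alpha}_{00}}{\hat{\alpha}_{01}} \left[ \log\left( \frac{Y_{10}^{(k_{10}+1)}}{Y_{00}^{(k_{00}+1)}} \right) - \log\left( \frac{F_{10}^{-1}(1 - k_{10}/n_{10})}{F_{00}^{-1}(1 - k_{00}/n_{00})} \right) \right] \\
&= \left[ \frac{\hat{\alpha}_{00} - \alpha_{00}}{\hat{\alpha}_{01}} - \frac{\alpha_{00}}{\hat{\alpha}_{01} \alpha_{01}} \left( \hat{\alpha}_{01} - \alpha_{01} \right) \right] \log\left( \frac{F_{10}^{-1}(1 - k_{10}/n_{10})}{F_{00}^{-1}(1 - k_{00}/n_{00})} \right) \\
&\quad + \frac{\hat{\alpha}_{00}}{\hat{\alpha}_{01}} \left[ \log\left( \frac{Y_{10}^{(k_{10}+1)}}{F_{10}^{-1}(1 - k_{10}/n_{10})} \right) - \log\left( \frac{Y_{00}^{(k_{00}+1)}}{F_{00}^{-1}(1 - k_{00}/n_{00})} \right) \right] \\
&= \left[ \frac{k_{00}^{-1/2} \Gamma_{00}}{\alpha_{01} + O_p(k_{01}^{-1/2})} - \frac{\alpha_{00}}{\alpha_{01}^2 + O_p(k_{01}^{-1/2})} k_{01}^{-1/2} \Gamma_{01} \right] \log\left( \frac{F_{10}^{-1}(1 - k_{10}/n_{10})}{F_{00}^{-1}(1 - k_{00}/n_{00})} \right) \\
&\quad + \left[ \frac{\alpha_{00}}{\alpha_{01}} + O_p\left( k_{00}^{-1/2} + k_{01}^{-1/2} \right) \right] \left[ k_{10}^{-1/2} \Delta_{10} - k_{00}^{-1/2} \Delta_{00} + o_p\left( k_{10}^{-1/2} + k_{00}^{-1/2} \right) \right]
\end{aligned}
\]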
by (12) and (14). For term $I_{3n}$, we rewrite it as
\[
I_{3n} = \left[ k_{01}^{-1/2} \Gamma_{01} + o_p\left( k_{01}^{-1/2} \right) \right] \log\left( \frac{k_{01} n_{00}}{n_{01} k_{00}} \right)
\]
by (12). For term $I_{4n}$, we rewrite it as
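\[
I_{4n} = \left[ \frac{k_{00}^{-1/2} \Gamma_{00}}{\alpha_{10} \alpha_{01}} - \frac{\alpha_{00}}{\alpha_{10}^2 \alpha_{01}} k_{10}^{-1/2} \Gamma_{10} - \frac{\alpha_{00}}{\alpha_{10} \alpha_{01}^2} k_{01}^{-1/2} \Gamma_{01} + o_p\left( k_{00}^{-1/2} + k_{10}^{-1/2} + k_{01}^{-1/2} \right) \right] \log\left( \frac{k_{10}}{n_{10}(1 - q)} \right)
\]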
by (12).
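The three coefficients in the expression for $I_{4n}$ are those of a first-order (delta-method) expansion. The following sketch assumes, as in the treatment of $I_{2n}$, that (12) gives $\hat{\alpha}_{gt} - \alpha_{gt} = k_{gt}^{-1/2} \Gamma_{gt}$; the function $h$ is notation introduced here only for illustration.

```latex
\begin{aligned}
h(a_{00}, a_{10}, a_{01}) &\equiv \frac{a_{00}}{a_{10} a_{01}}, \\
\frac{\partial h}{\partial a_{00}} &= \frac{1}{a_{10} a_{01}}, \qquad
\frac{\partial h}{\partial a_{10}} = -\frac{a_{00}}{a_{10}^2 a_{01}}, \qquad
\frac{\partial h}{\partial a_{01}} = -\frac{a_{00}}{a_{10} a_{01}^2}, \\
h(\hat{\alpha}_{00}, \hat{\alpha}_{10}, \hat{\alpha}_{01}) - h(\alpha_{00}, \alpha_{10}, \alpha_{01})
&= \frac{k_{00}^{-1/2} \Gamma_{00}}{\alpha_{10} \alpha_{01}}
 - \frac{\alpha_{00}}{\alpha_{10}^2 \alpha_{01}} k_{10}^{-1/2} \Gamma_{10}
 - \frac{\alpha_{00}}{\alpha_{10} \alpha_{01}^2} k_{01}^{-1/2} \Gamma_{01}
 + o_p\left( k_{00}^{-1/2} + k_{10}^{-1/2} + k_{01}^{-1/2} \right).
\end{aligned}
```

Multiplying the last line by $\log\left( k_{10} / (n_{10}(1 - q)) \right)$ reproduces the stated form of $I_{4n}$.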
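Conditions 4 and 5 imply that $d_{gt} = k_{gt} / [n_{gt}(1 - q)] \to \infty$ for all $(g, t) \in \{0, 1\}^2$. Moreover, Condition 2 implies that $F_{gt}^{-1}(1 - k_{gt}/n_{gt}) = O((k_{gt}/n_{gt})^{-1/\alpha_{gt}})$ for all $(g, t)$. Then using
\[
\begin{aligned}
&= O\left( -\frac{\alpha_{10}^{-1} \log(k_{10}/n_{10})}{\log(k_{11}/n_{11}) - \log(1 - q)} + \frac{\alpha_{00}^{-1} \log(k_{00}/n_{00})}{\log(k_{11}/n_{11}) - \log(1 - q)} \right) \\
&= o(1).
\end{aligned}
\]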
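Now, combining $I_{1n}$, $I_{2n}$, $I_{3n}$, and $I_{4n}$, and using the fact that $\exp(x) = 1 + x + O(x^2)$ as $x \to 0$, we obtain
\[
\frac{\sqrt{k_{11}}}{\log(d_{11})} \left( \frac{\hat{A}_n}{A_n} - 1 \right)
= \frac{\log(d_{10})}{\log(d_{11})} \left[ \left( \frac{k_{11}}{k_{00}} \right)^{1/2} \frac{\Gamma_{00}}{\alpha_{10} \alpha_{01}} - \left( \frac{k_{11}}{k_{10}} \right)^{1/2} \frac{\alpha_{00} \Gamma_{10}}{\alpha_{10}^2 \alpha_{01}} - \left( \frac{k_{11}}{k_{01}} \right)^{1/2} \frac{\alpha_{00} \Gamma_{01}}{\alpha_{10} \alpha_{01}^2} \right] + o_p(1)
\]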
2 2 2
d λ11/10 α00 α00 α00
→ N 0, λ11/00 2 2 + λ11/10 2 2 + λ11/01 2 2 .
η11/10 α10 α01 α01 α01 α10 α01
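\[
\begin{aligned}
&= \frac{k_{11}^{1/2}}{\log d_{11}} \left( \frac{\hat{F}_{11}^{-1}(q)}{F_{11}^{-1}(q)} - 1 \right)
 - \frac{k_{11}^{1/2}}{\log d_{11}} \left( \frac{\hat{A}_n}{A_n} - 1 \right) \frac{A_n}{F_{11}^{-1}(q)} \\
&\stackrel{d}{\to} N(0, \Omega),
\end{aligned}
\]
where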
2
λ11/10 2 α2
−2 1
Ω= α11 + λ11/00 + λ11/10 + λ11/01 2 002 .
ς η11/10 α10 α01
This completes the proof.
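As an informal numerical check on the kind of tail-index asymptotics the proof builds on, the sketch below simulates a Hill-type estimator on exact Pareto data. It is an illustration under assumptions, not the paper's estimator: the function names, the exact Pareto design, and the tuning values are hypothetical, and the benchmark used is the standard result that $\sqrt{k}(\hat{\alpha} - \alpha)$ is approximately $N(0, \alpha^2)$ in that design.

```python
import math
import random
import statistics


def hill_alpha(sample, k):
    """Hill-type estimate of the Pareto tail index alpha from the k largest observations."""
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    gamma_hat = sum(math.log(y / threshold) for y in ordered[:k]) / k
    return 1.0 / gamma_hat


def simulate(alpha=2.0, n=2000, k=200, reps=300, seed=0):
    """Return the Monte Carlo mean and standard deviation of sqrt(k) * (alpha_hat - alpha)."""
    rng = random.Random(seed)
    scaled_errors = []
    for _ in range(reps):
        sample = [rng.paretovariate(alpha) for _ in range(n)]
        scaled_errors.append(math.sqrt(k) * (hill_alpha(sample, k) - alpha))
    return statistics.fmean(scaled_errors), statistics.stdev(scaled_errors)


if __name__ == "__main__":
    mean, sd = simulate()
    print("mean of scaled error:", round(mean, 3))
    print("sd of scaled error:", round(sd, 3), "(benchmark: alpha = 2.0)")
```

In this design the standard deviation of the scaled error should be close to $\alpha$, with a small positive mean that vanishes as $k$ grows.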
B Proof of Corollary 1
Proof. The proof follows once we establish (12)–(14). Our Condition 7 is the same as Girard et al. (2021, eq. (2)). Our Condition 2 is sufficient for their second-order Pareto tail condition $C_2(\gamma, \rho, A)$. Then (12) and (14) follow directly from their Corollary 2.1. By the same argument as in the proof of Theorem 4.3.8 in de Haan and Ferreira (2007), (13) further follows from (12),