11_Diff-n-diff
Difference-in-differences (DiD)
Deepak Singhania
HS 649
17-10-2024
Main reference: World Bank (2016). Impact Evaluation in Practice.
Objective
To understand the difference-in-differences method and its application.
Before And After
What is before and after comparison?
- Pre-assessment and post-assessment
[Figure: timeline with a "Before intervention" assessment and an "After intervention" assessment on either side of the INTERVENTION; label: "Connected to intervention and outcome"]
An example of the Before And After Approach
You started a remedial education program for school-going children in Std 2 to improve their reading skills of grade 1 level text.
The remedial education classes are run after school hours.
The outcome of interest is the ability to read grade 1 level text.
You know that there are no other remedial education programs/after-school support running in the village.
[Chart: % of children who can read grade 1 level text, Before vs After; bar labels 53% and 38%]
Cross-Sectional comparison
Comparing two groups at the same point in time. What is the underlying assumption for causal identification?
Counterfactual!
Difference-in-differences
It is a mix of “cross-sectional” and “before and after” comparison. One of the most widely used methods.
Includes:
• cross-sectional comparison
• before and after comparison
D-n-D basically compares changes in outcomes over time for one group with changes over time for the other group. In other words, it compares the trends. Look at the table.

MEANS        Before Treatment   After Treatment   Difference
Treated      A                  B                 B-A
Control      C                  D                 D-C
Difference   A-C                B-D               (B-A) - (D-C) = (B-D) - (A-C)

The change in the control group (D-C) is acting as the counterfactual. The only difference is that the treated group got the treatment.
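The arithmetic in the table can be sketched in a few lines of Python. The function names and the cell means below are hypothetical and chosen only for illustration; A, B, C and D follow the table's labels (treated before and after, control before and after).

```python
def did_estimate(a, b, c, d):
    """Difference-in-differences from the four cell means.

    a, b: treated group mean before and after treatment
    c, d: control group mean before and after treatment
    """
    treated_change = b - a   # before-and-after comparison within the treated group
    control_change = d - c   # change in the control group (the counterfactual trend)
    return treated_change - control_change


def did_estimate_cross_sectional(a, b, c, d):
    """Same estimate, computed as the change in the cross-sectional gap."""
    gap_before = a - c
    gap_after = b - d
    return gap_after - gap_before


# Hypothetical cell means, for illustration only (not from the slides).
A, B, C, D = 40.0, 55.0, 45.0, 50.0
print(did_estimate(A, B, C, D))                  # (55-40) - (50-45) = 10.0
print(did_estimate_cross_sectional(A, B, C, D))  # (55-50) - (40-45) = 10.0
```

Both routes give the same number, which is why the bottom-right cell of the table can be written either way.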
Difference-in-Difference Approach, with example
[Figure: learning levels before and after the intervention for the treated school and the control school, marking the before-after impact, the cross-sectional impact, and the D-n-D impact]
Key Identifying Assumption
In the absence of the intervention, the outcome levels would have increased by the same amounts in the treatment and control groups.
Parallel trends assumption: it is important to show parallel trends before the intervention to support the impact estimated through this method.
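One simple way to look at parallel trends, assuming group means are available for several periods before the intervention, is to compare the period-to-period changes of the two groups. The sketch below uses hypothetical numbers and hypothetical function names. Gaps close to zero are consistent with parallel pre-trends, but they cannot show that the trends would have stayed parallel afterwards.

```python
def period_changes(series):
    """Change in the group mean between consecutive periods."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def pre_trend_gaps(treated_pre, control_pre):
    """Treated-minus-control difference in each pre-period change.

    Values near zero are consistent with parallel pre-trends.
    """
    treated_changes = period_changes(treated_pre)
    control_changes = period_changes(control_pre)
    return [t - c for t, c in zip(treated_changes, control_changes)]


# Hypothetical pre-intervention group means (illustration only).
treated_pre = [30.0, 33.0, 36.0]
control_pre = [40.0, 43.0, 46.0]
print(pre_trend_gaps(treated_pre, control_pre))  # [0.0, 0.0]
```

Here the two groups differ in levels by 10 points throughout, yet their changes are identical, which is the situation the assumption describes.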
With an untreated group, we are closing both the time and
the cross-sectional back doors.
1. Isolate the within variation for both the treated group and untreated group.
Because we have isolated within variation, we are controlling for group
differences and closing the back door through Group (the “differences”)
2. Compare the within variation in the treated group to the within variation in
the untreated group. Because the within variation in the untreated group is
affected by time, doing this comparison controls for time differences and
closes the back door through Time (the “difference” in those differences)
Example: Snow 1855
Most famous and oldest example of DiD
Snow 1855
▪ demonstrated to the world that cholera was spread by fecally-contaminated
water and not via the air
The entire plan behind a difference-in-differences design is to use the change in the untreated group to represent all non-treatment changes in the treated group.
So, you can’t have a control / untreated group that is having its own kind of changes or trend that is specifically different for that group.
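In mathematical terms…
• The difference between pre-treatment and post-treatment in the treated group is Effect of Treatment + Other Treated Group Changes
• The difference between pre-treatment and post-treatment in the untreated group is Other Untreated Group Changes
• DiD is the first difference minus the second: Effect of Treatment + Other Treated Group Changes - Other Untreated Group Changes
• So, essentially, for DiD to correctly identify the effect of treatment, the last two terms in the final expression above should exactly cancel each other out.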
So what to look for?
1. There’s no particular reason to believe the untreated group would suddenly change around the time of treatment.
[Figure: annotated with the coefficients 𝛽2, 𝛽3 and 𝛽2 + 𝛽3]
Some key aspects of D-n-D to remember
❑ D-n-D identifies the effect of a treatment from the change in the outcome, not from the overall level of the outcome.
❑ Showing parallel trends is a must to argue that you have clearly identified the effect of x on y, or that your analysis is causal.
❑ Baseline differences, in levels, could be a cause of concern, but if you show valid enough parallel trends then you don’t need to worry about them.
❑ If you have data for many periods, you need to be careful about running a regular diff-n-diff regression. (Synthesis of D-n-D method progression)
Exercise
o T and C are treatment and control groups.
o In the baseline (i.e. t=0), the average unemployment rate is 10% for T, and in the endline (i.e. t=1) it is 12%.
o For the control group, these numbers are 8% in the baseline and 10% in the endline.
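The slide does not state the question. Assuming it asks for the before-and-after, cross-sectional and D-n-D estimates, the arithmetic can be worked through as below. The variable names are illustrative; the four rates are the ones given in the exercise.

```python
# Unemployment rates (%) from the exercise.
t_baseline, t_endline = 10.0, 12.0   # treatment group T
c_baseline, c_endline = 8.0, 10.0    # control group C

before_after = t_endline - t_baseline      # change within T
cross_section = t_endline - c_endline      # T vs C at endline
control_change = c_endline - c_baseline    # change within C (counterfactual trend)
did = before_after - control_change        # difference-in-differences

print(before_after, cross_section, did)    # 2.0 2.0 0.0
```

The before-and-after and cross-sectional comparisons each suggest a 2 percentage point increase, while the D-n-D estimate is zero because the control group changed by the same amount.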
The moratorium on splitting and the timing of elections were plausibly exogenous (Burgess et al. 2011, Bazzi and Gudgeon 2014) and hence provide a robust causal estimate.
Simpler specification (Effect of Splitting)
$Y_{idt} = \alpha + \beta_1 Post_t + \beta_2 Split_d + \beta_3 (Split_d \times Post_t)$
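To see why the coefficient on the interaction term is the D-n-D estimate, one can write out the expected outcome in each of the four Split-by-Post cells. This sketch labels the interaction coefficient $\beta_3$ and assumes that any error term has mean zero within each cell; both are assumptions of the sketch, not statements from the slides.

```latex
\begin{align*}
E[Y \mid Split = 0, Post = 0] &= \alpha \\
E[Y \mid Split = 0, Post = 1] &= \alpha + \beta_1 \\
E[Y \mid Split = 1, Post = 0] &= \alpha + \beta_2 \\
E[Y \mid Split = 1, Post = 1] &= \alpha + \beta_1 + \beta_2 + \beta_3 \\[4pt]
\underbrace{\big[(\alpha + \beta_1 + \beta_2 + \beta_3) - (\alpha + \beta_2)\big]}_{\text{change in split units}}
- \underbrace{\big[(\alpha + \beta_1) - \alpha\big]}_{\text{change in non-split units}}
&= \beta_3
\end{align*}
```

The change over time is $\beta_1 + \beta_3$ for split units and $\beta_1$ for non-split units, so subtracting one from the other leaves $\beta_3$.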
Full Specification (in my paper)
D-n-D in Stata
* OLS of elec_hh_pc on year, jst_splt_dmy and splt_post_dmy; standard errors clustered by knkab
reg elec_hh_pc year jst_splt_dmy splt_post_dmy, cluster(knkab)
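For readers without Stata, the point estimate from a two-group, two-period regression of this kind can be sketched in plain Python. This is not the author's code: the data are hypothetical, the `ols` helper is written only for this illustration, and it computes coefficients alone (no clustered standard errors, which the Stata command above does provide).

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Hypothetical outcomes by (split, post) cell; illustration only.
cells = {
    (0, 0): [10.0, 12.0],
    (0, 1): [13.0, 15.0],
    (1, 0): [20.0, 22.0],
    (1, 1): [28.0, 30.0],
}
X, y = [], []
for (split, post), outcomes in cells.items():
    for value in outcomes:
        # Columns: constant, Post, Split, Split*Post
        X.append([1.0, float(post), float(split), float(split * post)])
        y.append(value)

alpha, b_post, b_split, b_did = ols(X, y)
print(round(b_did, 6))  # 5.0, i.e. (29 - 21) - (14 - 11) from the cell means
```

Because the model is saturated in the two dummies, the interaction coefficient equals the difference-in-differences of the four cell means.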
Simple and interesting read on diff-n-diff with two time periods and
multiple time periods
Next
Impact of MGNREGA using diff-n-diff method.