DiD Regression

NUS BT2101 DiD Regression

Uploaded by

datnt21413ca
BT2101 AY 23/24 Semester 2

Assignment 4

DiD Regression
Load the data from Wooldridge

Select Kentucky Data

Please use data only from Kentucky.
Question (a)

• Estimate the impact of the policy based on a difference-in-differences (DiD) regression without
including any other control variables.

• Guidelines:
o Use durat as the dependent variable.
o Your specification should include three terms: the control/treatment group dummy, the before/after
intervention time dummy, and a term capturing the interaction between control/treatment and
before/after the intervention.

• Please interpret all the coefficient estimates in your regression table.
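
The three-term specification described in the guidelines can be written as a single regression equation. This is a sketch in notation assumed here: the coefficient symbols and the error term are not from the assignment, and afchnge and highearn are the variable names that appear in the coefficient interpretation.

```latex
\[
\text{durat}_i = \beta_0 + \beta_1\,\text{afchnge}_i + \beta_2\,\text{highearn}_i
  + \beta_3\,(\text{afchnge}_i \times \text{highearn}_i) + u_i
\]
```

Under this notation, \beta_3 is the DiD estimate: the change for the treatment group minus the change for the control group.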

Coefficient interpretation
• Intercept = 6.272 means the average duration is 6.272 weeks for low earners (highearn = 0) before the policy
change. This is statistically significant at the 5% level, since p-value < 2e-16.

• Coefficient of afchnge = 0.766 is the expected change in duration, in weeks, for low earners after the policy
change, ceteris paribus. The coefficient is statistically insignificant at the 5% level since p-value = 0.314.
Hence, we cannot reject the null hypothesis that the duration of low earners did not change after the policy
change.

• Coefficient of highearn = 4.905 means that, before the policy change, the duration of high earners is expected to
be 4.905 weeks longer than that of low earners, ceteris paribus. This is statistically significant at the 5% level,
since p-value = 1.3e-09.

• Coefficient of afchnge*highearn = 0.951 is the DiD estimate: after the policy change, duration rose by 0.951 weeks
more for high earners than for low earners. The coefficient is statistically insignificant at the 5% level since
p-value = 0.414. Hence, we cannot reject the null hypothesis that the policy change had no additional effect on
the duration of high earners relative to low earners.
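
To see why the interaction coefficient is the DiD estimate, the four group means implied by the reported coefficients can be rebuilt and differenced. This is a minimal arithmetic sketch; the variable names are chosen here for illustration, and only the four coefficient values come from the regression table.

```python
# Coefficients reported in the regression table for question (a).
b0, b_after, b_high, b_inter = 6.272, 0.766, 4.905, 0.951

# Predicted mean duration (weeks) in each of the four group/period cells.
low_before = b0
low_after = b0 + b_after
high_before = b0 + b_high
high_after = b0 + b_after + b_high + b_inter

# DiD: change for high earners minus change for low earners.
did = (high_after - high_before) - (low_after - low_before)

print(f"low earners:  {low_before:.3f} -> {low_after:.3f}")
print(f"high earners: {high_before:.3f} -> {high_after:.3f}")
print(f"DiD estimate: {did:.3f}")
```

The difference of the two changes recovers the interaction coefficient, because the intercept and both main effects cancel out.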

Question (b)
• Change the dependent variable to ldurat, and repeat the same DiD regression as in
question (a). Please interpret all the coefficient estimates in this regression table.

Log-transformation Recap
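
As a reminder, the standard interpretation of a regression with a logged dependent variable is sketched below. The symbols y, x and \beta_1 are notation assumed for this sketch, not taken from the slides.

```latex
\[
\ln(y) = \beta_0 + \beta_1 x + u
\quad\Longrightarrow\quad
\%\Delta y \approx 100\,\beta_1\,\Delta x \ \ \text{(approximation for small changes)},
\qquad
\%\Delta y = 100\,\bigl(e^{\beta_1 \Delta x} - 1\bigr) \ \ \text{(exact)}
\]
```

For a dummy variable, \Delta x = 1, so the coefficient times 100 is the approximate percentage difference between the two groups.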

• Intercept = 1.126 means the average log duration is 1.126 for low earners (highearn = 0)
before the policy change. This is statistically significant at the 5% level, since p-value < 2e-16.

• Coefficient of afchnge = 0.0077 means the duration of low earners is expected to increase by about
0.77% after the policy change, ceteris paribus. This is statistically insignificant at the 5% level,
since p-value = 0.864 is large.

• Coefficient of highearn = 0.2565 means that, before the policy change, the duration of high earners is
expected to be about 25.65% longer than that of low earners (log approximation), ceteris paribus.
This is statistically significant at the 5% level, since p-value = 6.72e-08.

• Coefficient of afchnge*highearn = 0.1906 is the DiD estimate: the policy change raised duration by
about 19.06% more for high earners than for low earners (log approximation), ceteris paribus. This is
statistically significant at the 5% level, since p-value = 0.00542.
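
Reading a log-model coefficient as a percentage is an approximation that becomes looser as the coefficient grows. The sketch below compares it with the exact change, 100*(exp(b) - 1), for the three reported slope coefficients. The helper name exact_pct is introduced here for illustration.

```python
import math

# Slope coefficients reported in the regression table for question (b).
coefs = {"afchnge": 0.0077, "highearn": 0.2565, "afchnge*highearn": 0.1906}

def exact_pct(beta):
    """Exact percentage change in the outcome implied by a log-model coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)

# Map each coefficient to (approximate %, exact %).
results = {name: (100.0 * b, exact_pct(b)) for name, b in coefs.items()}

for name, (approx, exact) in results.items():
    print(f"{name}: approx {approx:.2f}%, exact {exact:.2f}%")
```

For the interaction term the exact figure is roughly 21% rather than 19.06%, so the approximation understates the effect somewhat, while for the small afchnge coefficient the two are nearly identical.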

Question (c)

• Using ldurat as the dependent variable, and the independent variables already used in
the previous question, now add more control variables: male, married, and the full set of
industry and injury type dummy variables.

• How does the coefficient of interaction term change when these other factors are
controlled? Is the estimate still statistically significant? Please explain the changes, if any.

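The coefficient of the interaction term increased from 0.190601 to 0.230877.

It is still statistically significant, since its p-value is < 0.05.

The increase is modest. A possible reason is that the added control variables absorb
variation in ldurat that is unrelated to the policy, so the interaction effect is
estimated with less noise. This shows up as a much lower p-value, although it was
already very low (a gain from complicating the model). The shift in the point estimate
itself suggests that the groups also differed somewhat in composition, for example
by industry or injury type, which the controls now account for.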

Question (c) – Alternative way to include dummy variables

Alternatively, the dummy variables can be created with as.factor().

Question (d)
• Your colleague argues that we cannot draw a causal inference due to the small
magnitude of the R-squared and adjusted R-squared in question (c).

• How will you respond to this argument? Explain.

R-squared is 0.0412 and adjusted R-squared is 0.0387.

The small multiple R-squared indicates that the covariates in the regression model
explain only about 4% of the variance observed in the dependent variable ldurat.
However, this does not make the estimate useless or biased. Model fit measures how
well the model predicts ldurat; it does not determine the statistical significance of
the impact we aim to measure. That is determined by the p-value of the interaction
coefficient, which remains significant.
Moreover, Difference-in-Differences (DiD) with controls is effective in addressing
concerns about endogeneity. The causal interpretation therefore rests on the DiD
assumptions, above all parallel trends, and not on the size of the R-squared.
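
The point that a low R-squared does not prevent recovering the effect can be illustrated with simulated data. This is a sketch under stated assumptions: the true effect, noise level and sample size are all hypothetical values chosen for the illustration, not estimates from the Kentucky data. It relies on the fact that in a 2x2 DiD regression without controls the fitted values are the four cell means.

```python
import random

random.seed(0)

TRUE_EFFECT = 0.2    # hypothetical interaction effect
NOISE_SD = 1.3       # hypothetical individual-level noise
N_PER_CELL = 20000   # hypothetical sample size per group/period cell

# Simulate outcomes for each (after, high) cell.
rows = []
for after in (0, 1):
    for high in (0, 1):
        for _ in range(N_PER_CELL):
            y = 1.1 + 0.25 * high + TRUE_EFFECT * after * high
            y += random.gauss(0, NOISE_SD)
            rows.append((after, high, y))

def cell_mean(after, high):
    ys = [y for a, h, y in rows if a == after and h == high]
    return sum(ys) / len(ys)

means = {(a, h): cell_mean(a, h) for a in (0, 1) for h in (0, 1)}

# DiD estimate from the four cell means.
did = (means[1, 1] - means[0, 1]) - (means[1, 0] - means[0, 0])

# R-squared of the saturated DiD model (fitted value = cell mean).
grand_mean = sum(y for _, _, y in rows) / len(rows)
ss_total = sum((y - grand_mean) ** 2 for _, _, y in rows)
ss_resid = sum((y - means[a, h]) ** 2 for a, h, y in rows)
r_squared = 1 - ss_resid / ss_total

print(f"DiD estimate: {did:.3f} (true effect {TRUE_EFFECT})")
print(f"R-squared:    {r_squared:.3f}")
```

In this setup most of the variation in the outcome is individual noise, so the R-squared is small, yet the DiD estimate stays close to the true effect because the noise averages out within each cell.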
Question (e)

• What is the most critical assumption of the difference-in-differences model? Even if you
cannot provide conclusive proof, can you use the data to offer some qualitative
support/opposition to the validity of this assumption in this dataset? Using your own
words, discuss what plots and/or statistics would help you support/oppose this
assumption. Construct/compute these plots/statistics and make a concluding statement
describing your support/opposition to the validity of this critical assumption in this dataset.

The most critical assumption is that the control (baseline) and treatment groups
follow parallel trends: in the absence of the policy change, ldurat would have
evolved in the same way in both groups.

To assess parallel trends, we would plot a line graph of average ldurat over time for
each group. However, with the current dataset we are unable to plot such a graph,
because it does not record how ldurat changes over time. We only have sufficient
data to compare ldurat just before and after implementation of the policy.
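
If several pre-policy periods were available, a simple numerical check would be to see whether the gap between the two groups stays roughly constant before the policy. The sketch below uses invented group averages purely to show the calculation. None of these numbers come from the Kentucky data, and the period labels are hypothetical.

```python
# HYPOTHETICAL average ldurat by pre-policy period (illustration only,
# not taken from the Kentucky dataset).
control = {-3: 1.10, -2: 1.12, -1: 1.13}
treated = {-3: 1.36, -2: 1.38, -1: 1.39}

periods = sorted(control)

# Gap between treatment and control group in each pre-policy period.
gaps = [treated[p] - control[p] for p in periods]

# Change in the gap between consecutive periods; values near zero are
# consistent with parallel pre-trends.
gap_changes = [round(later - earlier, 4) for earlier, later in zip(gaps, gaps[1:])]

for p, g in zip(periods, gaps):
    print(f"period {p}: gap = {g:.2f}")
print("changes in gap:", gap_changes)
```

A stable gap before the policy would support the assumption qualitatively; a gap that widens or narrows before the policy would count against it.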

What will happen if I introduce fixed effects to the DiD analysis with the
treatment group indicator? (Extra insights)

The treatment group indicator records whether an observation belongs to the treatment group.

What happens to the treatment group indicator if we add fixed effects to model_b above?

Recall: a fixed effect captures the effect of fixed characteristics that vary by unit but not by
time. Here the unit is each person (or individual).

If the model above includes unit fixed effects, the treatment group indicator will
be omitted automatically from the coefficient estimation.
Including Fixed-effect VS DiD
In Difference-in-Differences (DiD) analyses, the treatment group indicator may sometimes be omitted.
This happens when fixed effects (FE) are introduced, because they can cause perfect multicollinearity.
Introducing unit fixed effects (recall from the last tutorial: a fixed effect varies across units but not
over time) gives each unit its own intercept. In this example the unit is each person, i.e. every
individual gets a unique intercept.
In a DiD analysis, the coefficient on the treatment group indicator gives the treatment group and the
control group different intercepts. So, from the intercept perspective, unit fixed effects and the DiD
indicator capture intercepts at different levels (unit fixed effects capture individual intercepts,
whereas the treatment group indicator captures the intercept of the treatment/control group).
So if I include unit fixed effects and let the model estimate individual intercepts, the model will not
estimate the coefficient of the treatment group indicator, because the individual intercepts already
contain the intercept shift from being in the treatment or control group. In other words, the
treatment group indicator no longer provides any new information: individual-level variation
already covers the variation due to an individual being in the treatment group or not.
Including Fixed-effect VS DiD
Another way to see why introducing fixed effects makes the treatment group
indicator redundant, so that it is automatically omitted from estimation, is this:
unit fixed effects net out the impact of all fixed characteristics that vary
across units but not over time. The treatment group indicator, which records
whether an observation is in the treatment group or the control group, is one such
fixed characteristic: it varies across units (different individuals may be allocated
to either group) but not over time (individuals stay in the same group at every
time point).
Therefore, once unit fixed effects are introduced, the effect of the treatment group
indicator is netted out automatically. The model no longer needs to estimate the
effect of the indicator and must omit it, because keeping the redundant variable
would cause perfect multicollinearity.
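
The netting-out can be seen directly by applying the within transformation, which subtracts each unit's own mean from its observations and is one common way unit fixed effects are estimated. The sketch below uses a tiny hypothetical panel; the unit labels and group assignments are invented for illustration.

```python
from collections import defaultdict

# HYPOTHETICAL panel rows: (unit id, period, treatment group indicator).
# Each unit keeps the same indicator value in every period.
panel = [
    ("A", 0, 1), ("A", 1, 1),
    ("B", 0, 0), ("B", 1, 0),
    ("C", 0, 1), ("C", 1, 1),
]

# Mean of the indicator within each unit.
by_unit = defaultdict(list)
for unit, _, treated in panel:
    by_unit[unit].append(treated)
unit_mean = {unit: sum(vals) / len(vals) for unit, vals in by_unit.items()}

# Within transformation: subtract the unit mean from each observation.
demeaned = [treated - unit_mean[unit] for unit, _, treated in panel]

print(demeaned)
```

Every demeaned value is zero, so the transformed indicator has no variation left and its coefficient cannot be estimated. The interaction with the time dummy, by contrast, does vary within treated units and remains estimable.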
Thank you!
