Chapter 7 Special cases

This chapter discusses three quasi-experimental methods used in economics: synthetic control, regression discontinuity, and instrumental variables, emphasizing their application in contexts where randomized controlled trials are not feasible. It highlights the importance of reconstructing counterfactuals to evaluate policy impacts, particularly through the synthetic control method illustrated by California's tobacco control policy. Additionally, the chapter outlines the regression discontinuity design, which compares outcomes around a specific eligibility threshold, providing a robust evaluation strategy for policies with clear cut-off criteria.


Impact Evaluation, Quasi-experimental Methods, Special Cases
Aurélia Lépine

Introduction
This chapter will focus on three quasi-experimental methods that are widely used in
economics:

• Synthetic control

• Regression discontinuity

• Instrumental variables

These methods are called special cases because, as you will see, each of them can only be
applied in a specific context.

Motivation for using quasi-experimental methods


We know by now that evaluating policies or interventions requires a counterfactual. A
counterfactual is information on what the level of outcome in the treatment group would have
been in the absence of the intervention. Randomised controlled trials are considered the gold
standard for evaluations because the level of outcome in the control group can be used as a
counterfactual, and the effect of the intervention is given by the difference in the level of
outcome in the treatment and in the control group.
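The comparison described above can be sketched in a few lines of Python. This is only an illustration: the outcome values and the function name `difference_in_means` are invented for the example and do not come from any study discussed in this chapter.

```python
from statistics import mean

# Hypothetical outcome data (invented for illustration only).
treatment_outcomes = [5.0, 6.0, 7.0, 6.0, 6.0]
control_outcomes = [4.0, 5.0, 4.0, 5.0, 4.5]


def difference_in_means(treated, control):
    """Effect estimate under randomisation: mean(treated) - mean(control).

    The control group mean stands in for the counterfactual, i.e. the
    outcome the treatment group would have had without the intervention.
    """
    return mean(treated) - mean(control)


effect = difference_in_means(treatment_outcomes, control_outcomes)
print(f"Estimated effect of the intervention: {effect:.2f}")
```

The estimate is only credible as a causal effect because randomisation makes the two groups comparable; with a non-random control group the same subtraction would mix the policy effect with pre-existing differences.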

The issue with randomised controlled trials is that they are not always feasible. In some
contexts there is no control group at all, for example, when
policies are implemented at a national level. Sometimes, randomisation is not acceptable or
ethical, for example, in humanitarian projects. It is also very common that an evaluation is
not planned prior to the implementation of the policy. Sometimes we do have a control group,
but the control group is imperfect because it does not have exactly the same characteristics as
the treatment group. For example, imagine a policy that is implemented in areas with specific
characteristics, such as cities. It is very hard to use rural areas as a control group because we
know that individuals who live in cities and rural areas have different characteristics.

In order to overcome this issue of the absence of a control group, we use quasi-experimental
methods that allow us to reconstruct the counterfactual.

Synthetic Control
This is the most recently developed of the three quasi-experimental methods. To explain how
the method works, let’s use its most famous application, which is the example given by the
people who developed the method in their seminal paper: the evaluation of the effect of
tobacco control policy in California¹.

The authors wanted to know whether a policy introduced in California in 1988 to reduce the
consumption of tobacco had been effective in reducing cigarette sales about a decade after
its introduction, in the year 2000. The policy is called Proposition 99 and it had two
components. The first was an increase in the tax on tobacco. Assuming that the demand for
tobacco responds to price, we expect that an increase in price should lead to a reduction in
demand. The second component is that the revenues generated by the tax were used to finance
antismoking campaigns in Californian schools and counties.

The difficulty in evaluating the policy arises from the fact that it was implemented
everywhere in California. Therefore, there was no natural control group: there is
only one California in the US and it was treated. The absence of a control group makes the
evaluation of the impact of the policy quite difficult.

If we look at the trends in cigarette sales in California over time, as shown in Figure 1, we
can see that there has been a decrease in cigarette sales; a decrease that was occurring prior to
the implementation of Proposition 99 in 1988 and continued after the introduction of the
policy.

Figure 1 - Trend of per-capita cigarette sales in California

We cannot really say anything about the effect of Proposition 99 from this graph. If we were
implementing an interrupted time series design to evaluate its effect (which you will learn
about next week), we would probably conclude that there is no effect because we cannot see
any interruption in the trend at the time of the introduction of the policy.

¹ Abadie, A., et al. (2010). ‘Synthetic control methods for comparative case studies: Estimating the effect of
California’s tobacco control program.’ Journal of the American Statistical Association 105(490): 493-505.

Assuming that the dashed line in Figure 2 is the trend in the counterfactual, i.e. the level of
outcome in the absence of the intervention, we would conclude that the introduction of the
policy did decrease cigarette sales because the level of cigarette sales we observed with the
policy is lower than the one we would have observed without the policy. In this case, the
effect would be a reduction of 26 packs per capita per year in 2000, which is the difference
in cigarette sales between the counterfactual and the level of outcome that we observed.

Figure 2 – Hypothetical counterfactual for the trend in per-capita cigarette sales in California

But maybe the true counterfactual shows no effect at all (i.e. the dashed line and the
solid line coincide), in which case we would conclude that the introduction of the policy
was ineffective in reducing cigarette sales because there is no difference between the
trend in the counterfactual and the trend that we observed in the presence of the policy.

Which control could we use?

We could, for example, use one of the other US states that has not implemented such a
policy. Of course, we would need to ensure that this state shared very similar
characteristics with California before the policy was introduced.

Another option, and this is what the authors present in their paper, would be to compare
California to the rest of the US. Let’s say we take the average of cigarette sales in the other
US states and compare it with the level of cigarette sales in California, as seen in
Figure 3.

Figure 3 – Trends in per-capita cigarette sales: California vs. the rest of the United States

The level of outcome is lower in California than in the rest of the US, but it would be hard to
conclude that this is due to Proposition 99 because the trend in California and the trend in the
rest of the US were different before the implementation of the policy.

The aim of the synthetic control method is to use a data-driven procedure to reconstruct a
synthetic California that has, as closely as possible, the same trend prior to the
intervention as the one that we observe in the real California. We can then be more confident
that the level of outcome we observe in the synthetic California is the one we would have
observed in the real California in the absence of Proposition 99.

We are going to reconstruct a synthetic control, in this case, a synthetic California, by
reweighting other US states so that the trend of the synthetic California overlaps with the
trend of the real California.

Figure 4 shows the results that the authors present in their paper.

Figure 4 - Trends in per-capita cigarette sales: California vs. synthetic California

The authors have managed to reconstruct a synthetic California whose characteristics are very
close to those of the real California. This is shown by the near-perfect overlap between the
two in the pre-intervention trend (level of cigarette sales per capita before the introduction
of the policy). Table 1 shows the similarities in other characteristics as well (e.g. GDP per
capita, age distribution, retail price, beer consumption per capita). Thus, we can be more
confident that the divergence observed after 1988 between the level of outcome in the real
California and the one in the synthetic California is due to the introduction of the policy.

Table 1- Cigarette sales predictor means

The effects of the policy can also be presented by the gap, which is the difference in the level
of outcome between California and the synthetic California, as seen in Figure 5.

Figure 5- Per-capita cigarette sales gap between California and synthetic California

Because the overlap is almost perfect in the pre-intervention period, the gap is
close to zero before the introduction of the policy (i.e. no difference between the level of
outcome in California and the synthetic California before the intervention), which suggests
that we have managed to reconstruct a control group with very similar characteristics
to California. We can see that over time, after the implementation of the policy, there is a
large decrease in cigarette sales in California compared to the level we would observe in the
counterfactual, i.e. in the absence of the policy.

So what is the synthetic California? Synthetic California is constructed by a weighting
procedure whereby the characteristics of the other US states are used to determine the weight
that each state is given. Table 2 shows the weights given to the other US states. A higher
weight was given to the states whose characteristics are closest to California's, while some
states were given a weight of zero because they were simply too different from California. We
can see that the synthetic California is actually made of five states: Colorado, which accounts
for about 16% of the synthetic California; Connecticut for about 7%; Montana for about 20%;
Nevada for about 23%; and Utah for about 33%, which is the largest share of the synthetic
control.
Table 2 - State weights in the synthetic California
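The weighting idea can be illustrated with a minimal Python sketch. It is not the authors' procedure: their method also matches on predictors such as those in Table 1 and uses a proper optimiser, whereas this toy version simply searches a coarse grid of non-negative weights that sum to one and keeps the combination with the smallest pre-intervention mean squared prediction error (MSPE). The three donor units "A", "B", "C" and all sales figures are invented for the example.

```python
from itertools import product

# Hypothetical per-capita sales for three donor units (invented numbers).
donors_pre = {
    "A": [120.0, 118.0, 115.0, 111.0],
    "B": [100.0, 99.0, 97.0, 96.0],
    "C": [140.0, 135.0, 133.0, 128.0],
}
donors_post = {"A": [108.0, 104.0], "B": [95.0, 93.0], "C": [125.0, 121.0]}

# Hypothetical treated unit: pre-period, then post-period outcomes.
treated_pre = [118.0, 115.7, 113.2, 109.9]
treated_post = [95.0, 88.0]


def synthetic_series(weights, donors):
    """Weighted average of the donor series, period by period."""
    names = list(donors)
    n_periods = len(donors[names[0]])
    return [sum(weights[k] * donors[k][t] for k in names) for t in range(n_periods)]


def mspe(a, b):
    """Mean squared prediction error between two series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_weights(treated, donors, steps=10):
    """Grid search over weights >= 0 that sum to 1 (a toy stand-in for an optimiser)."""
    names = list(donors)
    best_w, best_err = None, float("inf")
    for combo in product(range(steps + 1), repeat=len(names)):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to one
        w = {k: c / steps for k, c in zip(names, combo)}
        err = mspe(treated, synthetic_series(w, donors))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


weights, pre_mspe = fit_weights(treated_pre, donors_pre)
synthetic_post = synthetic_series(weights, donors_post)
# Gap = observed outcome minus synthetic (counterfactual) outcome.
gaps = [obs - syn for obs, syn in zip(treated_post, synthetic_post)]
print("weights:", weights)
print("post-intervention gaps:", [round(g, 1) for g in gaps])
```

A negative gap in the post-intervention periods corresponds to the treated unit having lower sales than its synthetic counterpart, which is how the effect is read off Figure 5.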

Is the effect statistically significant? Because this methodology is based on a data-driven
procedure, we do not obtain a standard error. We estimate the effect by looking
at the difference in the level of outcome between the real California and the synthetic
California. The estimate we obtained is quite large (a reduction of
about 26 packs per person, per year) but we do not know whether it is statistically
significant. The question, therefore, is whether the result could have arisen by chance.

In order to answer this question, we are going to use placebo tests, also called permutation
tests.

We are going to apply the synthetic control method to each of the states in the donor
pool in turn, as if it had been treated, with California now moved into the donor pool. By
donor pool we mean all the states that have not implemented the policy. We are going to test
whether the effect of the policy that we obtained in California is larger than the effects we
observe in the donor states. This test helps us check that the effect we observe in California
has not occurred by chance. We obtain the placebo trends shown in grey in Figure 6.
Figure 6 - Per-capita cigarette sales gaps in California and placebo gaps in 34 control states (discards states
with pre-Proposition 99 MSPE twenty times higher than California’s).

We can see that in some donor pool states, or placebo states, we observe an increase in the
level of outcome, and in others we observe a decrease in the level of outcome. We do not
really know the reason for these effects in the donor pool. There are many things that can
affect cigarette sales in the states, but what is sure is that such variations are not due to the
implementation of the policy because there was no policy implemented in those states.

What we want to check is that the effect we observe in California lies outside the distribution
of effects that we obtained using the placebo test. At the upper end of this distribution, the
95th percentile is an increase of about 26 packs, and the 5th percentile is a reduction of about
18 packs. Therefore, we can conclude that the reduction that we observe in California is
larger than the reductions observed among the donor pool states.

From this distribution of effects, we can calculate a p-value, which is the rank of the treated
state in the distribution of effects divided by the total number of states in the distribution.
In the original paper, California has the largest effect among 39 states (California plus the
38 states in the donor pool), therefore its rank is 1 and the p-value can be calculated as
1/39 ≈ 0.026. As this is below the conventional 0.05 threshold, we can conclude that
the effect is statistically significant.
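The rank-based p-value can be sketched as follows. The function name `placebo_p_value` and the 38 placebo gaps are hypothetical, made up only so that the arithmetic can be followed; ranking by the absolute size of the effect is also an assumption of this sketch rather than something stated in the text.

```python
def placebo_p_value(treated_effect, placebo_effects):
    """Rank of the treated unit (by absolute effect size) divided by the number of units.

    Rank 1 means no placebo unit shows an effect at least as large as the
    treated unit's effect.
    """
    n_units = 1 + len(placebo_effects)
    rank = 1 + sum(1 for e in placebo_effects if abs(e) >= abs(treated_effect))
    return rank / n_units


# Hypothetical gaps (packs per capita): the treated state and 38 placebo states.
treated_gap = -26.0
placebo_gaps = [float(g) for g in range(-18, 20)]  # 38 invented values, all smaller in size

p_value = placebo_p_value(treated_gap, placebo_gaps)
print(f"p-value: {p_value:.3f}")
```

With 39 units in total, the smallest p-value this procedure can produce is 1/39, so the number of donor states limits how strong the evidence from a placebo test can be.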

This is only an introduction to the method. For more information, please read the referenced
paper.

Regression Discontinuity
This method was developed in the 1960s in the field of educational psychology. It is a very
powerful method but it is not widely adopted in the epidemiology literature. Scholars consider
it an evaluation strategy whose results can be as compelling as the estimates we can derive
from a randomised experiment: studies comparing the two approaches have found that this method
can give results very close to those of a randomised controlled trial. It is important to
understand when we can use this evaluation strategy and also to understand how it works in
order to apply it correctly.

When can we use regression discontinuity?

You will see that this method is a ‘special case’ method because we can only use it when a
programme is assigned based on a transparent rule, i.e. the programme is implemented in
places that meet a clear cut-off value rather than at the discretion of administrators.

Imagine we are implementing a policy targeting the poorest households and only those who
earn less than £1,000 are eligible. The assignment variable here is income. The idea
behind the regression discontinuity design is that we are going to compare the outcomes in
households that fall within a narrow range on either side of the £1,000 cut-off value. For
example, we are going to compare households that are eligible for the programme and have
earnings of £999 with households that are not eligible but are very close to the cut-off,
such as those with earnings of £1,001. We can expect the households with earnings of
£1,001 to share very similar characteristics with the eligible households with earnings of
£999, but they are not eligible for the programme because they fall just above the cut-off.
Therefore, the allocation of the programme based on an arbitrary cut-off value may be as
good as a random allocation. An illustration of this is shown in Figure 7.
Figure 7 – Regression discontinuity illustration

There are many examples of policies that use a clear cut-off in public health. As in the case of
our example, many policies in high- and low-income countries are assigned based on income.
We can also evaluate the effect of certain drugs that are often given depending on a cut-off.
For example, antiretroviral treatment drugs are usually prescribed when CD4 counts are
lower than 200 cells/mm³. Therefore, we could estimate the effect of antiretroviral treatment
on mortality or on other health outcomes by comparing the outcomes of people who are eligible,
with a CD4 count just below 200, and those who are not eligible, with a CD4 count just
above 200. The same logic can be applied to hypertension medications that are prescribed
depending on the level of blood pressure, and to statins that are usually prescribed
depending on a risk score. Another example is policies implemented depending on age. For
instance, most countries in Africa offer free care to children under five, so we could estimate
the effect of free care on healthcare utilisation and health outcomes of children by looking at
the difference in outcomes between eligible and non-eligible children within a narrow range
around five years of age.

Let’s take another example, which you can find in the referenced book².

This is an evaluation conducted by the World Bank of a programme that was assigned to
low-income households by using a poverty index. All low-income households were receiving
health insurance and the evaluation was to find out whether there was a strong effect of
health insurance on health expenditure. The idea was that providing health insurance can
improve financial protection, and we can expect that people who are insured will have lower
health expenditure. The evaluation of this programme was possible with a regression
discontinuity design because there was a clear definition of eligibility in this programme:
only households with a poverty index below 58 received the programme. Therefore,
households who were just below and just above the cut-off of 58 were compared to see
if there was a difference in financial protection. The results are shown in Figure 8.

² Gertler P, Martinez S, Premand P, Rawlings L, and Vermeersch C (2011). Chapter 5. Regression discontinuity
design. Impact Evaluation in Practice. Washington DC, World Bank.

Figure 8 – Poverty index and health expenditures – The health insurance subsidy program: baseline and two
years later

If we look at the data at baseline, we can see that there is a positive relationship between
the poverty index and health expenditure, which is expected because better-off households
spend more money on health, and that there is no discontinuity in this relationship. However,
if we look at the effect of health insurance two years after its implementation, the households
that were eligible (point B) have much lower health expenditure than the households who
were not eligible (point A) and who are just above the poverty index of 58. Here, the
difference between health expenditure for eligible and non-eligible households at the cut-off
is the effect of having health insurance. If we use a simple multivariate linear regression we
can see that the programme reduced health expenditure by about $9 per household. This method,
in contrast to synthetic control, provides a standard error, so we can see directly that this
effect is statistically significant.
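The comparison at the cut-off can be sketched in Python. The sketch assumes a simple approach that the text does not spell out: fit a separate straight line on each side of the cut-off, within a chosen bandwidth, and take the difference between the two fitted values at the cut-off. The data are simulated with a built-in jump of 9, chosen only to mirror the size of the effect quoted above; the function names, the bandwidth and the data-generating rule are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(index, outcome, cutoff, bandwidth):
    """Jump at the cut-off: fitted outcome just below minus fitted outcome just above.

    The assignment variable is centred on the cut-off, so each fitted
    intercept is the predicted outcome at the cut-off itself.
    """
    below = [(x - cutoff, y) for x, y in zip(index, outcome)
             if cutoff - bandwidth <= x < cutoff]
    above = [(x - cutoff, y) for x, y in zip(index, outcome)
             if cutoff <= x <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_below - intercept_above  # eligible minus non-eligible


# Simulated data (invented): eligibility when the poverty index is below 58.
CUTOFF = 58.0
poverty_index = [48.0 + 0.5 * i for i in range(41)]
health_expenditure = []
for i, x in enumerate(poverty_index):
    wiggle = 0.2 * ((i % 3) - 1)            # small deterministic noise
    effect = -9.0 if x < CUTOFF else 0.0    # built-in programme effect
    health_expenditure.append(5.0 + 0.3 * x + wiggle + effect)

estimate = rd_estimate(poverty_index, health_expenditure, CUTOFF, bandwidth=10.0)
print(f"Estimated jump at the cut-off: {estimate:.2f}")
```

The choice of bandwidth is a trade-off: a narrow window keeps the two groups comparable but leaves few observations, while a wide window adds observations that are further from the cut-off.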

In addition to being very robust, this method is also very economical because it does not
usually require new data collection: we can often use secondary data to apply this method. It
is therefore much cheaper than doing a randomised controlled trial.

Limitations

One limitation of this method is that it requires many observations close to the threshold,
so a large sample is usually needed to obtain precise estimates.

Another limitation is that we can only estimate the effects of the policy around the cut-off
value. For example, it would have been interesting to measure the effect of providing health
insurance on the poorest households. However, we are not able to do that. We are only able to
conclude that providing health insurance is effective for households who are very close to the
cut-off, which are the richest households in the pool of poor households.

Finally, a main assumption of this method is that the only reason for the discontinuity is the
policy effect. We need to make sure that there are no other policies that are being provided
based on the same cut-off. Take the example of the free health care policy
for children under five. The issue that arises is that in many developing countries there are
several other policies that are available to children under five. Therefore, we have to make
sure that the policy we are evaluating is the only one that is given based on this specific
cut-off, otherwise we cannot conclude that the difference in the level of outcome is due to the
policy itself.

Instrumental variables
The instrumental variable approach is a statistical method used to recover the causal effect
of one variable on another; it is not exclusively used for the purpose of policy
evaluation.

The endogeneity issue

You will see many papers on the endogeneity problem. This means that we are often
able to report a correlation between two variables, but we are unable to draw conclusions
about any causal relationship that may exist between those two variables.

Let’s assume that we want to estimate the effect of X on Y:

Y = βX + U

We have X, which is a covariate, and Y is our dependent variable, or the outcome of interest.
We want to estimate β (beta), the coefficient of the covariate.

In this regression we also have an error term, which we call ‘U’. This includes all the
variables that have an effect on Y but are not observed, either because they have not been
measured or because we do not have this information in our data.

The problem of endogeneity occurs if this error term U is correlated with X:

Cov(X, U) ≠ 0

There are many reasons why we would have this correlation between the error term and the
variable of interest. It can be due to confounding factors correlated with X. It can be due to
reverse causality: we expect X to have an effect on Y but maybe Y also has an effect on
X. There can also be measurement error, where X is not perfectly measured. Many variables
are hard to measure, such as income. We do not directly observe income, so we rely on what
respondents tell us. In some cases, we measure income by looking at asset ownership or
consumption or expenditure, which is not a perfect measure and can result in this problem of
endogeneity because of measurement error.

If we have any one of these problems then the estimate of β, the coefficient of X, will be
biased if we use basic regression techniques. We would only be able to estimate a correlation
between X and Y, but we could not conclude that there is a causal relationship between X and Y.

In order to recover this causal effect we can use an instrumental variable. An instrumental
variable creates exogenous variation in the variable of interest X. We will look for an
instrumental variable, and call it Z.

Z is going to generate exogenous variation in X, but it is not going to have any direct effect
on Y. In other words, Z is only going to affect the outcome through its effect on the variable
of interest, and it will not be correlated with the error term U. Using only the variation in X
that is generated by Z removes the problem of correlation between X and U and allows us to
recover the causal effect of X on Y.
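A small simulation can make the contrast between a basic regression and an instrumental variable estimate concrete. Everything here is an assumption made for illustration: the data are simulated, the true β is set to 2, the unobserved confounder plays the role of the part of U that is correlated with X, and the instrumental variable estimate uses the simple ratio Cov(Z, Y) / Cov(Z, X), which is the standard estimator when there is one instrument and one variable of interest.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Basic regression slope of Y on X: Cov(X, Y) / Var(X)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Instrumental variable estimate: Cov(Z, Y) / Cov(Z, X)."""
    return cov(z, y) / cov(z, x)


rng = random.Random(0)  # fixed seed so the simulation is reproducible
n = 20000
TRUE_BETA = 2.0

# Z: instrument, assigned as if at random and unrelated to the confounder.
z = [rng.choice([0.0, 1.0]) for _ in range(n)]
# Unobserved confounder: affects both X and Y, so it ends up in the error term U.
confounder = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [zi + ci + rng.gauss(0.0, 1.0) for zi, ci in zip(z, confounder)]
y = [TRUE_BETA * xi + 3.0 * ci + rng.gauss(0.0, 1.0) for xi, ci in zip(x, confounder)]

beta_ols = ols_slope(x, y)
beta_iv = iv_slope(z, x, y)
print(f"true beta = {TRUE_BETA}, OLS estimate = {beta_ols:.2f}, IV estimate = {beta_iv:.2f}")
```

In this simulated setting the basic regression overstates β because X and U move together, while the instrumental variable estimate uses only the variation in X that comes from Z and should land close to the true value.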

Let’s take an example of an instrumental variable. The approach behind the instrumental
variable method was used in some ways by John Snow in order to establish that cholera was a
waterborne disease. At that time, he did not know if water impurity was the cause of cholera.
The equation that John Snow was, in effect, trying to estimate was this:

Incidence of cholera = β water impurity + u

He wanted to know if water impurity was not only associated with the incidence of cholera but
was really affecting it. What he was trying to estimate, in some ways, was the coefficient
associated with water impurity, to be sure that water impurity was the main factor influencing
the incidence of cholera.

Here, we have this problem of endogeneity. We can consider that water impurity is also
correlated with the error term, which includes all the variables that we do not observe but that
may also be affecting the incidence of cholera. There are many reasons why we could think
that water impurity is endogenous, for example, people who drank impure water at this time
may have been more likely to be poor. They may have also been more likely to live in
crowded tenements or in an environment that was contaminated in other ways.

What could serve as an instrument in order to recover the causal effect of water impurity on
the incidence of cholera? We need a variable that strongly affects the consumption
of impure water but does not have a direct effect on the incidence of cholera.

At that time, there were two different water companies supplying households in London with
water: the Lambeth company and the Southwark and Vauxhall company. The Lambeth
water company drew water from the river upstream of the main sewage discharge, while the
water provided by the Southwark and Vauxhall water company was contaminated.

John Snow noticed that the water company supplying a household with drinking water was a
main determinant of whether the household consumed impure water. Thus, there is a strong
correlation between our instrumental variable, the water company, and water impurity. This is
the first assumption we want to test when we are using an instrumental variable. We can see in
Table 3 that there were indeed many more deaths among the households that were receiving their
water from Southwark and Vauxhall compared to the households that were receiving their water
from the Lambeth company.

Table 3 – Cholera deaths by water company supplier

The second question is whether this instrumental variable, the water company, affects the
outcome, the incidence of cholera, only through water impurity. There is evidence that at that
time Londoners were subscribed to one of the two water companies in a close to random manner.
John Snow wrote that no fewer than 300,000 people of both sexes, of every age and occupation,
and of every rank and station, from gentlefolks down to the very poor, were divided into two
groups without their choice and, in most cases, without even knowing which company supplied
them. One group, the customers of Southwark and Vauxhall, was supplied with water containing
the sewage of London, and the other group, the customers of the Lambeth company, was
receiving water quite free from such impurity.

In this case we can conclude that the water company is a plausibly valid instrumental variable
because it is strongly correlated with water impurity but is unlikely to be related to the
incidence of cholera through any other channel, because the allocation to the water company
was close to random.

This method is very appealing because most of the time we are interested in causal effect
rather than correlation. However, in practice, it is very hard to find valid instrumental
variables. While it is easy to show that the instrumental variable is strongly correlated with
the endogenous variable, or the variable of interest, it is never possible to argue with
certainty that the instrumental variable affects the outcome only through that variable, i.e.
that it is uncorrelated with the error term. This is why, in practice, situations where you
would be able to use this method are rare.

Conclusion
The methods introduced here differ in the contexts in which they can be applied
but share the fact that they are able to reconstruct the counterfactual and thus assess the
causal effect of a policy on an outcome. However, we also refer to them as ‘special cases’
because they are not always feasible to use.

Requirements:

The synthetic control requires several pre-intervention periods because the logic behind this
method is to reconstruct a pre-intervention trend that overlaps with the trend we observe in
the treated unit. Therefore, it requires a lot of data in the pre-intervention period. It also
requires several untreated units, which was the case in the example of California. Finally, it
requires undertaking a placebo test in order to determine whether or not the effect we observe
in the treatment unit is by chance.

The main requirement for the application of regression discontinuity is to have a clear cut off
and that the policy or intervention was really implemented using this cut off. We also have to
be certain that there are no other policies that are being provided based on the same cut off.

Using an instrumental variable to determine causal effects requires that the instrumental
variable has two properties: it needs to be strongly correlated with the variable of interest,
and it must affect the outcome only through that variable, i.e. be uncorrelated with the error
term.
