How To Conduct Propensity Score Matching - An Introduction
by Simon Moss
Introduction
Many studies are designed to examine the effect of some intervention or treatment. If
possible, to explore the impact of this intervention, researchers should conduct a randomized
control trial in which participants are randomly allocated to conditions. However, in many
circumstances, researchers cannot randomly allocate participants to conditions. In these
circumstances, they can instead use a statistical technique called propensity score matching.
Propensity score matching is often more effective than alternative techniques such as ANCOVA or
multiple regression, because it remains effective even if the covariates are not linearly related
to the outcome.
To appreciate the importance of propensity score matching, you need to understand the
distinction between randomized control trials, sometimes called experiments, and
quasi-experimental designs. To illustrate, suppose you wanted to examine whether caffeine improves
marks on exams. If the researcher conducts a randomized control trial:
• half the individuals would be randomly assigned to the condition in which they receive a caffeine
pill
• the remaining individuals would be randomly assigned to the condition in which they receive a
placebo pill instead, perhaps a sugar pill
• all individuals would then complete the exam, and their marks would be recorded
In a quasi-experimental design, by contrast, participants are not randomly allocated to
conditions. For example:
• perhaps some people are allergic to caffeine—and, therefore, these individuals need to receive
the placebo
• or perhaps, rather than ask people to consume either caffeine or a placebo, the researchers
simply ask the participants whether they have consumed caffeine—such as coffee—this morning
[Figure: Receives caffeine then completes exam: Average 85%]
• as this figure shows, even before the study commenced, the participants who received caffeine
differed from the participants who received placebos
• for example, the participants who received caffeine might have been older or more intelligent
• their higher score on this exam could thus be ascribed to age or intelligence rather than caffeine
Therefore, as this simple example demonstrates, the results derived from quasi-experimental
designs are not as compelling as the results derived from randomized control trials.
In practice, the difference between the intervention and control conditions on these covariates is
not usually as pronounced. Therefore, to estimate the probability that each person would have been
assigned to the intervention condition, the software
Participant  Age  IQ  Motivation  Propensity score  Condition
2            31   97  8           .12               Control
3            28   99  6           .87               Intervention
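The estimation step can be sketched as follows. This is a minimal illustration in Python rather than the output of any particular package. It assumes that the propensity scores come from a logistic regression of condition on the covariates (age, IQ, and motivation), fitted here by plain gradient ascent. The participants, the function names standardize, propensity, and fit_logistic, and the settings rate and steps are all hypothetical.

```python
import math

# Hypothetical participants: (age, IQ, motivation, condition), 1 = intervention.
participants = [
    (34, 108, 8, 1), (29, 104, 7, 1), (31, 101, 9, 1), (27, 99, 6, 1), (24, 96, 5, 1),
    (24, 96, 5, 0), (26, 98, 6, 0), (22, 95, 4, 0), (31, 97, 8, 0), (33, 106, 8, 0),
]

def standardize(rows):
    """Rescale each covariate to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
           for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def propensity(weights, x):
    """Logistic model: probability of being in the intervention condition."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, rate=0.1, steps=5000):
    """Estimate the intercept and slopes by gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for x, label in zip(X, y):
            error = label - propensity(weights, x)
            gradient[0] += error
            for j, v in enumerate(x):
                gradient[j + 1] += error * v
        weights = [w + rate * g / len(X) for w, g in zip(weights, gradient)]
    return weights

X = standardize([row[:3] for row in participants])
y = [row[3] for row in participants]
weights = fit_logistic(X, y)
scores = [propensity(weights, x) for x in X]  # one propensity score per participant

for row, score in zip(participants, scores):
    print(row, round(score, 2))
```

Each printed value is the estimated probability that a participant with those covariates would be in the intervention condition, regardless of the condition the participant was actually in. Two participants with identical covariates therefore receive the same score.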
So, how does the software utilize these probabilities—called propensity scores? To answer
this question, consider the following table. In this table, the first column represents the propensity
scores of individuals in the intervention group. The second column represents the propensity scores
of individuals in the control group.
Intervention  Control
.84           .82
.43           .45
.59           .55
.15           .17
.65           .63
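As this table shows, each propensity score in the intervention group is close to a propensity
score in the control group.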
In short, if the propensity scores in the intervention condition match the propensity scores in the
control condition, the design theoretically resembles a randomized control trial. Differences
between the conditions can be more confidently ascribed to caffeine rather than to the other
measured covariates, such as age, IQ, and motivation.
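As a rough illustration of this logic, the sketch below takes the five pairs of propensity scores from the table, checks how closely each pair is matched, and then averages the difference in exam marks across the pairs. The propensity scores come from the table; the exam marks are invented purely for illustration, and the variable names are hypothetical.

```python
# Propensity scores from the table: (intervention, control) for each matched pair.
pairs = [(0.84, 0.82), (0.43, 0.45), (0.59, 0.55), (0.15, 0.17), (0.65, 0.63)]

# Hypothetical exam marks (%) for the same pairs, invented for illustration only.
marks = [(85, 80), (78, 75), (82, 79), (70, 69), (88, 81)]

# How far apart are the propensity scores within the worst-matched pair?
largest_gap = max(abs(t - c) for t, c in pairs)

# Average difference in marks between each intervention participant and
# the matched control participant.
mean_difference = sum(t - c for t, c in marks) / len(marks)

print(round(largest_gap, 2))      # 0.04
print(round(mean_difference, 1))  # 3.8
```

Because the propensity scores within each pair are so similar, the average difference in marks is less likely to reflect the measured covariates and more likely to reflect the intervention.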
But how does the software match the conditions? The software can apply one of several
principles.
This method is straightforward in principle, but not especially helpful in practice. The problem is
that this method will exclude many participants, reducing statistical power and thus the likelihood
of significant results. Instead, software can utilize other procedures, such as interval matching,
Mahalanobis metric matching, and weighting. In principle, these methods are similar, but utilize
more complex algorithms to increase statistical power.
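To make the matching step more concrete, the sketch below pairs participants using a greedy nearest-neighbour rule with a caliper, that is, a maximum permitted gap between two propensity scores. This rule is an assumption chosen for illustration; it is not one of the procedures named above, and statistical packages typically offer more refined algorithms. The function match_nearest, the participant labels, the scores, and the caliper of 0.05 are all hypothetical.

```python
def match_nearest(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts that map a participant label to a propensity score.
    Returns a list of (treated_label, control_label) pairs. A treated participant
    is left unmatched if no unused control lies within the caliper.
    """
    available = dict(controls)  # controls that have not been used yet
    pairs = []
    for t_label, t_score in treated.items():
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_label = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_label] - t_score) <= caliper:
            pairs.append((t_label, c_label))
            del available[c_label]  # each control is used at most once
    return pairs

treated = {"T1": 0.84, "T2": 0.43, "T3": 0.59, "T4": 0.15, "T5": 0.65, "T6": 0.97}
controls = {"C1": 0.17, "C2": 0.63, "C3": 0.82, "C4": 0.55, "C5": 0.45, "C6": 0.30}

pairs = match_nearest(treated, controls)
matched = {t for t, _ in pairs}
unmatched = [t for t in treated if t not in matched]

print(pairs)
print(unmatched)
```

In this example, participant T6 has no control within the caliper and is therefore excluded, which illustrates how matching can reduce the sample and, consequently, statistical power.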
Software to conduct propensity score matching
So far, this document has merely introduced the rationale that underpins propensity score
matching. To actually conduct this technique, you need to use the relevant software.
The free statistical package R can be used to conduct propensity score matching. If you
have yet to download R, you should:
• Go to https://cran.rstudio.com
• Click the “Download R” option that is relevant to your computer—such as Mac or Windows
• Click the options to download the latest version of R
• Install R on your computer by following the instructions—just like you would install any software
• Go to https://www.rstudio.com
• Click “Download RStudio”
• Click the RStudio option associated with your computer under the heading “Installers”
• Install this software and follow the various prompts
• RStudio is designed to simplify the use of R
Finally, search the web for instructions on how to use R to conduct propensity score
matching.
Stata
If you use Stata, you can utilize the psmatch2 command. Specifically
SPSS
You can conduct propensity score matching in SPSS. However, before you start, you need to
download R and an accompanying package, both of which are free. The following table illustrates
how you can achieve this goal and then conduct propensity score matching.
Activity: Install R on your computer
Details: First, decide which version of R to download
For SPSS Version 21, install R 2.14.0
For SPSS Version 22, install R 2.15.0
For later versions of SPSS, usually you should install the
latest versions of R
To install R
• Go to https://cran.rstudio.org
• Click the “Download R” option that is relevant to your
computer—such as the Mac or Windows version
• Then, click the options to download R
• Install R on your computer by following the instructions—
just like you would install any software.
• Usually, you can just press Next after each instruction.
Activity: Download IBM SPSS Statistics - Essentials for R
Details:
• Google “IBM Tools for SPSS products”
• Then click the link that resembles “IBM Tools for SPSS Products 2019/01/16 17:26:18”
• Press “register here” to register an account
The output that you generate will depend on which statistical package you utilize. This
section, however, outlines some principles you should consider to interpret the output.
Measures of balance
• Most packages will present some statistics that indicate the extent to which the intervention and
control conditions were matched successfully; that is, the degree to which the conditions are
similar on the covariates.
• One example is the reduction in pseudo R² from the logistic model. A large reduction indicates the
conditions are considerably more similar to each other after the propensity scores were matched.
Likewise, if the pseudo R² from the logistic model, re-estimated on the matched sample, is close to
zero, the matching has been successful.
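Balance can also be checked one covariate at a time. The sketch below uses the standardized mean difference, which is a different statistic from the pseudo R² described above but is widely reported for the same purpose. The ages are invented for illustration, and the function name standardized_mean_difference is hypothetical.

```python
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical ages before matching (all participants).
age_treated_before = [34, 29, 31, 27, 30, 36]
age_control_before = [24, 26, 22, 31, 30, 23]

# Hypothetical ages after matching (matched participants only).
age_treated_after = [29, 31, 27, 30]
age_control_after = [26, 31, 30, 28]

smd_before = standardized_mean_difference(age_treated_before, age_control_before)
smd_after = standardized_mean_difference(age_treated_after, age_control_after)

print(round(smd_before, 2), round(smd_after, 2))
```

A value closer to zero after matching than before matching suggests the conditions have become more similar on that covariate.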
Further reading
Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of
confounding in observational studies. Multivariate Behavioral Research, 46, 399-424.
Austin, P. C. (2014). A comparison of 12 algorithms for matching on the propensity score. Statistics
in Medicine, 33, 1057-1069.
Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity
score matching. Journal of Economic Surveys, 22, 31-72.
Winger, D. G., & Nason, K. S. (2016). Propensity-score analysis in thoracic surgery: when, why, and
an introduction to how. The Journal of Thoracic and Cardiovascular Surgery, 151, 1484-1487.