Introduction to Propensity Score Analysis
Matthew VanEseltine
Assistant Professor
Bowling Green State University
Outline
1. Introduction
2. Matching and Counterfactuals
3. The Propensity Score Matching Process
a. Estimate Propensity Scores
b. Match Cases
c. Estimate Treatment Effects
d. Demonstrate Robustness
4. Elaborations and Complications
5. End
A Practical Introduction
This workshop is a practical introduction to propensity score analysis (PSA), a relatively
new approach to estimating treatment effects with nonexperimental data.
Whereas regression models attempt to balance data by including controls, PSA involves
matching cases based on their predicted likelihood to experience values of the
independent variable of interest.
The simplest forms of PSA use discrete treatments (e.g., imprisoned or not; became
married or remained unmarried) and are best suited to studies of longitudinal data with
few moderating or mediating variables.
The workshop will cover various types of matching, strengths and weaknesses of the
approach, and tests of robustness, with examples in Stata.
It’s Here
Uses and Mentions of “Propensity Score,” 2003–2013
[Figure: one entry per article, grouped by year from 2003 to 2013 and labeled by journal: AJS, ASR, SF, Crim, or JMF.]
What if we could figure out everything that ‘matters’ and just match people together?
Statistically construct sufficiently identical groups
• Exact matching has been around a long time – but handles very few covariates.
• Propensity score analysis instead extracts the relevant information from those
covariates (likelihood to receive treatment) to make its matches.
Propensity Scores
A propensity score is the probability of being assigned to a treatment.
• Randomized experiments build this into the design. A coin flip: P(Wi = 1) = 0.5
• With observational data, we are instead going to try to estimate it.
• An estimated likelihood, given a vector of observed covariates: P(Wi = 1|Xi)
• Matching cases on the propensity score will approximately balance treated and untreated cases on the observed covariates.
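As a rough illustration of estimating P(Wi = 1|Xi), the sketch below fits a logistic regression by plain gradient ascent in standard-library Python. The function names (`fit_logit`, `propensity`), the learning rate, the step count, and the toy data are all assumptions made for this example; in practice the Stata packages discussed in this workshop would fit the logit or probit for you.

```python
import math

def fit_logit(X, w, lr=0.1, steps=5000):
    """Fit P(W=1|X) by logistic regression using plain gradient ascent.

    X: list of covariate rows; w: list of 0/1 treatment indicators.
    Returns coefficients with the intercept first.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, wi in zip(X, w):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = wi - p                      # score contribution of this case
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, xi):
    """Predicted probability of treatment for one covariate row."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: one covariate, treatment more likely at higher x.
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
w = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logit(X, w)
scores = [propensity(beta, xi) for xi in X]
```

Each entry of `scores` is an estimated propensity score for the corresponding case; these are the values that the later matching steps operate on.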
Model Covariates
Which covariates to include? Remember the goal: predicting selection into treatment.
As with regression, a lot comes down to this, and quality matters more than quantity.
Common Support
Limit inferences to a range with information on both treated and untreated cases.
• Throw out all treated with higher propensity scores than the highest untreated.
• Throw out all untreated with lower propensity scores than the lowest treated.
And now is a good time to look at the distribution of your propensity score.
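The two trimming rules above can be sketched in a few lines. This is a minimal illustration only: the function name `common_support` and the example scores are hypothetical, and real packages may apply other common-support rules.

```python
def common_support(scores, treated):
    """Return indices of cases kept under the two trimming rules.

    scores: propensity scores; treated: matching list of 0/1 indicators.
    """
    max_untreated = max(s for s, w in zip(scores, treated) if w == 0)
    min_treated = min(s for s, w in zip(scores, treated) if w == 1)
    keep = []
    for i, (s, w) in enumerate(zip(scores, treated)):
        if w == 1 and s > max_untreated:
            continue  # treated case above the highest untreated score
        if w == 0 and s < min_treated:
            continue  # untreated case below the lowest treated score
        keep.append(i)
    return keep

# Hypothetical scores for eight cases.
scores  = [0.10, 0.25, 0.30, 0.45, 0.60, 0.70, 0.85, 0.95]
treated = [0,    0,    1,    0,    1,    0,    1,    1]
kept = common_support(scores, treated)
```

Here the two lowest untreated cases and the two highest treated cases fall outside the region of overlap and are dropped.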
Match Cases
Matching Algorithms
[Figures: treated (T) and control (C) cases arrayed by propensity score from .3 to .9, illustrating in turn nearest-neighbor matching, nearest-neighbor matching within a caliper, Gaussian kernel matching, and uniform kernel matching.]
A range of matching protocols… so how do you decide?
For us, availability probably matters. See references for software links.
In Stata, pscore supports nearest-neighbor, kernel, and radius matching.
And psmatch2 adds Mahalanobis to that list.
Optimal matching is available in R, but not Stata (yet).
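To make the nearest-neighbor and caliper ideas concrete, here is a small sketch of greedy 1:1 matching without replacement. It is not how pscore or psmatch2 are implemented; the function name `nn_match`, the case ids, and the scores are hypothetical, and the greedy approach depends on the order in which treated cases are processed.

```python
def nn_match(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping a case id to its propensity score.
    Returns {treated_id: control_id}. If a caliper is given, treated cases
    whose nearest available control is farther away stay unmatched.
    """
    available = dict(controls)
    pairs = {}
    for t_id, t_score in treated.items():
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue
        pairs[t_id] = c_id
        del available[c_id]  # each control is used at most once
    return pairs

# Hypothetical propensity scores.
treated  = {"t1": 0.32, "t2": 0.50, "t3": 0.72}
controls = {"c1": 0.30, "c2": 0.33, "c3": 0.55, "c4": 0.90}

plain = nn_match(treated, controls)
within_caliper = nn_match(treated, controls, caliper=0.1)
```

Without a caliper, t3 is paired with c4 even though their scores differ by 0.18; with a caliper of 0.1, t3 is left unmatched, trading sample size for closer matches.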
Check for Balance
Did the propensity score successfully balance the data on observed covariates? Compare
differences between treated and untreated for each covariate before and after matching.
• Can use standardized bias (Rosenbaum and Rubin 1985b): under 10% as a rule of thumb
▫ Available for Stata in pstest.
• Can use a general guideline: less than 25% of the SD of X (Ho et al. 2007)
• No hard rules. You’ll see t-tests being used, though they’re debatable here.
• In Stata, pscore can test for balance by strata: don’t make it angry.
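The standardized bias for one covariate can be sketched as the difference in group means divided by the square root of the average of the two group variances, times 100. The function name and the covariate values below are hypothetical, and implementations differ on which sample's variances go in the denominator, so treat this as an illustration rather than a reproduction of pstest.

```python
import math
from statistics import mean, variance

def standardized_bias(x_treated, x_control):
    """Percent standardized bias: 100 * mean difference / pooled SD."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    return 100 * (mean(x_treated) - mean(x_control)) / pooled_sd

# Hypothetical covariate values.
x_treated         = [1, 2, 3, 4, 5]
x_control_all     = [3, 4, 5, 6, 7]      # all untreated cases, before matching
x_control_matched = [1, 2, 3, 4, 5.5]    # matched untreated cases only

bias_before = standardized_bias(x_treated, x_control_all)
bias_after = standardized_bias(x_treated, x_control_matched)
```

In this made-up example the absolute bias is far above 10 before matching and below 10 afterward, which is the pattern the rule of thumb looks for.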
The most commonly used packages in Stata are pscore and psmatch2.
Becker, Sascha O. and Andrea Ichino. 2002. “Estimation of Average Treatment Effects
Based on Propensity Scores.” The Stata Journal 2(4): 358–377.
Alternatives to Matching
Stratification on the propensity score.
• Bin the sample into quintiles (or finer) by propensity score.
• Five subclasses are expected to remove about 90% of the bias due to modeled covariates (Rosenbaum and Rubin 1984).
• Favored not for the overall estimate as much as the substantive value.
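A stratified estimate can be sketched as follows: sort cases by propensity score, cut them into equal-sized bins, take the treated-minus-untreated difference in mean outcomes within each bin, and average those differences weighted by bin size. The function name, the weighting choice, and the data are assumptions for illustration only.

```python
from statistics import mean

def stratified_effect(rows, n_strata=5):
    """rows: list of (score, treated, outcome) tuples.

    Returns the size-weighted average of within-stratum mean differences,
    skipping strata that lack either treated or untreated cases.
    """
    rows = sorted(rows, key=lambda r: r[0])
    n = len(rows)
    total, used = 0.0, 0
    for k in range(n_strata):
        stratum = rows[k * n // n_strata:(k + 1) * n // n_strata]
        y1 = [y for _, w, y in stratum if w == 1]
        y0 = [y for _, w, y in stratum if w == 0]
        if not y1 or not y0:
            continue  # no comparison possible in this stratum
        total += len(stratum) * (mean(y1) - mean(y0))
        used += len(stratum)
    if used == 0:
        raise ValueError("no stratum contains both treated and untreated cases")
    return total / used

# Hypothetical data: 5 strata of 4 cases; treatment is more common at high
# scores, and the outcome rises with the score plus a true effect of 2.
pattern = [(0.1, 1), (0.3, 1), (0.5, 2), (0.7, 3), (0.9, 3)]  # (score, n treated)
rows = []
for score, n_treated in pattern:
    for j in range(4):
        w = 1 if j < n_treated else 0
        rows.append((score, w, 2.0 * w + 10.0 * score))

naive = mean(y for _, w, y in rows if w == 1) - mean(y for _, w, y in rows if w == 0)
stratified = stratified_effect(rows)
```

In this constructed example the naive difference in means is inflated because treated cases have higher scores, while the stratified estimate recovers the built-in effect of 2.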
Stratification
[Figure: treated (T) and control (C) cases arrayed by propensity score from .3 to .9 and divided into five strata.]
Demonstrate Robustness
Rosenbaum Bounds
Rosenbaum (2002): A sensitivity analysis for observational studies should ask “what the
unmeasured covariate would have to be like to alter the conclusions of the study.”
• In effect, this shows what happens when the ignorable treatment assignment assumption is violated.
• The standard formal sensitivity analysis for propensity score matching in sociology.
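One simple way to see the logic of Rosenbaum bounds is with a sign test on matched pairs. Under hidden bias of magnitude Γ (gamma), the chance that the treated member of a pair has the higher outcome, absent any treatment effect, is at most Γ/(1+Γ), so the worst-case one-sided p-value is a binomial tail probability. The sketch below assumes the sign test as the test statistic, which is a simplification and may not be the statistic a given package uses; the function name and the counts are hypothetical.

```python
from math import comb

def sign_test_upper_p(n_pairs, n_positive, gamma):
    """Upper bound on the one-sided sign-test p-value under hidden bias gamma.

    n_pairs: matched pairs with untied outcomes.
    n_positive: pairs in which the treated case has the higher outcome.
    gamma = 1 corresponds to no hidden bias.
    """
    p = gamma / (1.0 + gamma)
    return sum(comb(n_pairs, k) * p**k * (1 - p)**(n_pairs - k)
               for k in range(n_positive, n_pairs + 1))

# Hypothetical result: the treated case is higher in 35 of 50 matched pairs.
bounds = {g: sign_test_upper_p(50, 35, g) for g in (1.0, 1.5, 2.0)}
```

Reading down the values of gamma shows how strong an unmeasured covariate's influence on treatment assignment would have to be before the finding is no longer statistically significant.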
What About…
Sample size?
• At least 1,000–1,500 is recommended (Shadish 2013).
Missing data?
• Some disagreement. Listwise is most common. Multiple imputation might be okay.
Clustered data?
• Still an open issue. Multilevel matching is on the way. (Within-group or within-person.)
Weighting?
• You can use population weights on the final estimates if it makes sense.
• Sampling weights are generally out.
End
For longer lists, see Apel and Sweeten (2010:559–560), Guo and Fraser (2010:321–326).
Key Features
The reasons I find propensity score matching especially advantageous or compelling:
• Counterfactual framework.
• Common support and “apples to apples” comparisons.
• Nonparametric estimates of treatment effects.
• Rosenbaum bounds and sensitivity testing.
• Generally promotes good habits.
Morgan, Winship, and Harding (2007): "Matching represents an intuitive method for
addressing causal questions, primarily because it pushes the analyst to confront the
process of causal exposure as well as the limitations of available data."
Acknowledgements
Special thanks are due to certain Penn State faculty. My training in this methodology was
founded formally by D. Wayne Osgood, and more informally by Jeremy Staff and Michael
Massoglia (now at Wisconsin).
This work was supported by the Center for Family and Demographic Research, Bowling
Green State University, which has core funding from the Eunice Kennedy Shriver
National Institute of Child Health and Human Development (R24HD050959-09).
My Stata example uses data from Add Health, a program project directed by Kathleen
Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan
Harris at the University of North Carolina at Chapel Hill, and funded by grant
P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human
Development, with cooperative funding from 23 other federal agencies and foundations.
Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in
the original design. Information on how to obtain the Add Health data files is available on
the Add Health website. No direct support was received from grant P01-HD31921 for this
analysis.
Examples in Handout
Apel, Robert, Arjan A.J. Blokland, Paul Nieuwbeerta, and Marieke van Schellen. 2009. “The Impact of
Imprisonment on Marriage and Divorce: A Risk Set Matching Approach.” Journal of Quantitative
Criminology 26:269–300. Sensitivity, null treatment comparison.
Frisco, Michelle L., Chandra Muller, and Kenneth Frank. 2007. “Parents’ Union Dissolution and
Adolescents’ School Performance: Comparing Methodological Approaches.” Journal of Marriage
and Family 69:721–741. Nearest neighbor, kernel, OLS comparison, rare treatment.
Yanovitzky, Itzhak, Elaine Zanutto, and Robert Hornik. 2005. “Estimating Causal Effects of Public
Health Education Campaigns Using Propensity Score Methodology.” Evaluation and Program
Planning 28:209–220. Pre- and post-adjustment, stratification by quintile, dosage.
Meier, Ann M. 2007. “Adolescent First Sex and Subsequent Mental Health.” American Journal of
Sociology 112:1811–1847. Full adjustment table, score distribution, many subgroups.
King, Ryan D., Michael Massoglia, and Ross Macmillan. 2007. “The Context of Marriage and Crime:
Gender, the Propensity to Marry, and Offending in Early Adulthood.” Criminology 45:33–65.
OLS comparison, gender subgroups, stratification within gender.
References
Introduction and Overview
Apel, Robert J. and Gary Sweeten. 2010. “Propensity Score Matching in Criminology and
Criminal Justice” in Handbook of Quantitative Criminology, edited by Alex R. Piquero
and David Weisburd. New York, NY: Springer.
Steiner, Peter M. and David Cook. 2013. “Matching and Propensity Scores” in The Oxford
Handbook of Quantitative Methods, edited by Todd L. Little. New York, NY: Oxford
University Press.
Full-Scale Coverage
Guo, Shenyang and Mark W. Fraser. 2010. Propensity Score Analysis: Statistical
Methods and Applications. Thousand Oaks, CA: Sage.
Morgan, Stephen L., and Christopher Winship. 2007. Counterfactuals and Causal
Inference. New York, NY: Cambridge University Press.
Elizabeth Stuart maintains a page with information on propensity score procedures
available to researchers. This will be a good place to look if you go beyond Stata:
http://www.biostat.jhsph.edu/~estuart/propensityscoresoftware.html
Stata:
Becker, Sascha O. and Andrea Ichino. 2002. “Estimation of Average Treatment Effects
Based on Propensity Scores.” The Stata Journal 2(4): 358–377. [pscore]
Leuven, Edwin and Barbara Sianesi. 2003. “PSMATCH2: Stata module to perform full
Mahalanobis and propensity score matching, common support graphing, and covariate
imbalance testing.” [psmatch2]
R:
Ho, Daniel, Kosuke Imai, Gary King, and Elizabeth Stuart. MatchIt.
http://gking.harvard.edu/matchit
Some – Not All – Core Technical Work
Rosenbaum, Paul R. and Donald B. Rubin. 1983. “The Central Role of the Propensity
Score in Observational Studies for Causal Effects.” Biometrika 70: 41–55.
Rosenbaum, Paul R. and Donald B. Rubin. 1984. “Reducing Bias in Observational
Studies Using Subclassification on the Propensity Score.” Journal of the American
Statistical Association 79: 516–524.
Rosenbaum, Paul R. and Donald B. Rubin. 1985a. “Constructing a Control Group Using
Multivariate Matched Sampling Methods that Incorporate the Propensity Score.” The
American Statistician 39: 33–38.
Rosenbaum, Paul R. and Donald B. Rubin. 1985b. “The Bias Due to Incomplete
Matching.” Biometrics 41: 103–116.
Rosenbaum, Paul R. 2002. Observational Studies, 2nd Ed. New York, NY: Springer.
Rubin, Donald B. 1997. “Estimating Causal Effects from Large Data Sets Using Propensity
Scores.” Annals of Internal Medicine 127: 757–763.
Materials from Other Presentations
Chen, Vivien W., and Krissy Zeiser. 2007. “Implementing Propensity Score Matching
Causal Analysis with Stata” at the Population Research Institute, Penn State University,
February 27, 2008. Slides:
http://help.pop.psu.edu/help-by-statistical-method/propensity-matching/Intro%20to%20P-score_Sp08.pdf
Guo, Shenyang. 2010. “Overview of Propensity Score Matching” at Children and Family
Futures, September 1, 2010. Slides:
http://cffutures.com/files/webinar-handouts/Presentation%20Propensity%20Scoring%20Session%20I.pdf
Massoglia, Michael. 2012. “Propensity Score Models” at Indiana University, November
30, 2012. Slides:
http://www.indiana.edu/~wim/docs/11_30_2012_massoglia_Propensity%20Score%20Models-IU.pptx
Stuart, Elizabeth. 2011. “The Why, When, and How of Propensity Score Methods for
Estimating Causal Effects” at Society for Prevention Research, May 31, 2011. Slides:
http://www.preventionresearch.org/wp-content/uploads/2011/07/SPR-Propensity-pc-workshop-slides.pdf