Appendix: PS Matching in R (With Attached Dataset and Code)
Example 1
• Question: what is the effect of ever smoking on the odds of lung cancer, laryngeal cancer, or COPD, as compared with never smoking?
• Data: 1987 National Medical Expenditures Survey
• Persons 40+ with complete covariate data
• Exposure: ever smoking
• Control: never smoking
• Outcome: lung cancer, laryngeal cancer, or COPD
• N = 11,587
Variables
• eversmk (exposure): 1/0, ever smoker / never smoker
• lc5 (outcome): lung / laryngeal CA / COPD
• LASTAGE (age): in years
• MALE (sex): 1/0, male / female
• RACE3 (race): other / African American / Caucasian
• beltuse (seatbelt use): rare / some / always
• educate: college grad / some college / HS grad / other
• marital: married / widowed / divorced / separated / never married
• SREGION (census region): NE / MW / S / W
• POVSTALB (poverty status): poor / near poor / low income / middle income / high income
1. Estimate the PS
• Goal: to achieve covariate balance on confounders so that they
cannot bias results
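As a minimal sketch of this step, a logistic regression PS can be fitted with base R's glm(). The data frame below is simulated purely for illustration and is not the NMES data; only the variable names (eversmk, LASTAGE, MALE) are borrowed from the slides, and the exposure mechanism is an assumption made so that the example has some signal.

```r
# Simulated stand-in for the analysis data frame `a` (NOT the NMES data).
set.seed(1)
n <- 500
a <- data.frame(
  LASTAGE = round(runif(n, 40, 90)),  # age in years
  MALE    = rbinom(n, 1, 0.5)         # 1 = male, 0 = female
)
# Hypothetical exposure mechanism, assumed only for this illustration.
a$eversmk <- rbinom(n, 1, plogis(-0.5 + 0.8 * a$MALE - 0.01 * (a$LASTAGE - 65)))

# Logistic regression PS model: P(eversmk = 1 | covariates).
ps.fit <- glm(eversmk ~ LASTAGE + MALE, data = a, family = binomial)

# Fitted values of a binomial glm are predicted probabilities, i.e. the PS.
a$PS <- fitted(ps.fit)
summary(a$PS)
```

The resulting column can then be supplied to a matching routine as the distance measure.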
Machine learning
• Can use machine learning methods to estimate the PS (Westreich, Lessler, and Funk 2010)
• E.g., neural nets, classification and regression trees (CART)
• R: to implement boosted CART (treatment stands for the 0/1 exposure variable, e.g., eversmk):
• library(twang)
• ps.model <- ps(treatment ~ LASTAGE + MALE + educate + POVSTALB, data=data)
• data$PS <- ps.model$ps[, 1]
Estimating the PS
• Model 1: logistic regression model
• Model 2: boosted CART model
Matching attempt #1
• For logistic regression estimated PS (Model 1)…
• library(MatchIt)
• nn1 <- matchit(eversmk ~ LASTAGE + MALE + educate + beltuse +
POVSTALB + marital + RACE3 + SREGION, data=a, distance=a$PS,
method="nearest")
• nn1.data <- match.data(nn1)
• summary(nn1, standardize=T)
• Note on distance: this option is not always necessary. If it is left out, MatchIt automatically calculates the PS from a logistic regression with the linear terms shown in the formula. For more complex PS models, e.g., with nonlinearities, estimate the PS beforehand and specify the resulting PS as the distance measure, as shown here.
• Note on standardize: set standardize=T so that summary() shows standardized balance.
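To make the idea behind method="nearest" concrete, the following base R sketch performs greedy 1:1 nearest-neighbor matching on the PS without replacement. It is an illustration under stated assumptions rather than MatchIt's implementation: the scores are simulated, and the processing order (highest PS first) is a choice made for this sketch.

```r
set.seed(2)
n <- 200
d <- data.frame(treat = rbinom(n, 1, 0.3))
# Made-up propensity scores; treated units tend to score higher.
d$ps <- plogis(rnorm(n, mean = ifelse(d$treat == 1, 0.3, -0.3)))

treated   <- which(d$treat == 1)
available <- which(d$treat == 0)   # controls not yet used

# Process treated units from highest to lowest PS (assumed ordering).
treated <- treated[order(d$ps[treated], decreasing = TRUE)]

pair <- rep(NA_integer_, n)        # pair id per row; NA means unmatched
for (k in seq_along(treated)) {
  if (length(available) == 0) break
  i <- treated[k]
  # Closest remaining control on the PS scale.
  j <- available[which.min(abs(d$ps[available] - d$ps[i]))]
  pair[c(i, j)] <- k
  available <- setdiff(available, j)   # without replacement
}

matched <- d[!is.na(pair), ]
table(matched$treat)
```

Because each control is used at most once, the matched sample contains the same number of treated and control units.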
N's           Control (C)   Treated (T)
All           5023          6564
Matched       5023          5023
Not matched   0             1541
Discarded     0             0
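The standardized balance reported by summary(nn1, standardize=T) can be approximated by hand for a single covariate. The sketch below uses one common definition of the standardized mean difference, with the SD of the treated group as the denominator; whether a given MatchIt version uses exactly this denominator is an assumption to check against its documentation. The data are simulated.

```r
# Standardized mean difference for one covariate:
# (mean in treated - mean in control) / SD in the treated group.
smd <- function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}

set.seed(3)
n <- 400
treat <- rbinom(n, 1, 0.5)
# Simulated age: treated units are older on average.
age <- rnorm(n, mean = 60 + 5 * treat, sd = 10)

smd(age, treat)
```

Computing this before and after matching for each covariate shows how much the matching improved balance; values near zero indicate similar covariate distributions in the two groups.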
Matching attempt #2
• For boosted CART estimated PS (Model 2)…
• library(MatchIt)
• nn2 <- matchit(eversmk ~ LASTAGE + MALE + educate + beltuse +
POVSTALB + marital + RACE3 + SREGION, data=a, distance=a$PS2,
method="nearest", exact="MALE", discard="both")
• nn2.data <- match.data(nn2)
• summary(nn2, standardize=T)
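The two options added in this attempt can be sketched in base R. The code below is an illustration on simulated data, not MatchIt's internals: it assumes that discard="both" drops treated and control units whose PS falls outside the range observed in the other group, and that exact="MALE" restricts matches to units with the same value of MALE.

```r
set.seed(4)
n <- 300
d <- data.frame(treat = rbinom(n, 1, 0.4), MALE = rbinom(n, 1, 0.5))
d$ps <- plogis(rnorm(n, mean = 0.5 * d$treat))   # made-up scores

# Common support: keep a unit only if its PS lies within the
# range of PS values observed in the opposite group.
rng.t <- range(d$ps[d$treat == 1])
rng.c <- range(d$ps[d$treat == 0])
keep <- ifelse(d$treat == 1,
               d$ps >= rng.c[1] & d$ps <= rng.c[2],
               d$ps >= rng.t[1] & d$ps <= rng.t[2])
d.cs <- d[keep, ]
c(discarded = sum(!keep), kept = sum(keep))

# Exact matching on MALE: split into strata so that any subsequent
# PS matching only pairs units within the same stratum.
strata <- split(d.cs, d.cs$MALE)
sapply(strata, function(s) table(factor(s$treat, levels = 0:1)))
```

Both options trade sample size for comparability, which is why fewer units end up matched than in the first attempt.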
N's           Control (C)   Treated (T)
All           5023          6564
Matched       4320          4320
Discarded     26            31
Matching attempt #3
• For boosted CART estimated PS (Model 2)…
• library(MatchIt)
• nn3 <- matchit(eversmk ~ LASTAGE + MALE + educate + beltuse +
POVSTALB + marital + RACE3 + SREGION, data=a, distance=a$PS2,
method="nearest", exact="MALE", discard="both", caliper=0.2)
• nn3.data <- match.data(nn3)
• summary(nn3, standardize=T)
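A caliper limits how far apart two matched units may be on the PS. The base R sketch below assumes the caliper of 0.2 is expressed in standard deviations of the PS and uses the SD of the whole simulated sample; which SD a particular MatchIt version uses is an assumption to verify. Treated units with no control inside the caliper stay unmatched.

```r
set.seed(5)
n <- 200
d <- data.frame(treat = rbinom(n, 1, 0.3))
# Made-up scores with limited overlap so that the caliper matters.
d$ps <- plogis(rnorm(n, mean = ifelse(d$treat == 1, 0.8, -0.8)))

width <- 0.2 * sd(d$ps)            # caliper width in PS units (assumed SD)

treated   <- which(d$treat == 1)
available <- which(d$treat == 0)   # controls not yet used
match.of  <- rep(NA_integer_, n)   # row index of the matched control, per treated row

for (i in treated) {
  if (length(available) == 0) break
  gap <- abs(d$ps[available] - d$ps[i])
  if (min(gap) <= width) {         # accept only matches inside the caliper
    j <- available[which.min(gap)]
    match.of[i] <- j
    available <- setdiff(available, j)
  }
}

n.pairs <- sum(!is.na(match.of))
c(treated = length(treated), matched.pairs = n.pairs)
```

A tighter caliper gives closer pairs but leaves more treated units unmatched, which is consistent with the smaller matched sample in this attempt.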
N's           Control (C)   Treated (T)
All           5023          6564
Matched       4075          4075
Discarded     26            31
References
• Austin PC, Mamdani MM. A comparison of propensity score methods: a case-study estimating the effectiveness of post-AMI statin use. Stat Med. 2006;25(12):2084-2106.
• Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Sturmer T. Variable selection for propensity score models. Am J Epidemiol. 2006;163(12):1149-1156.
• Cole SR, Hernan MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol. 2008;168(6):656-664.
• Hayes JR, Groner JI. Using multiple imputation and propensity scores to test the effect of car seats and seat belt usage on injury severity from trauma registry data. J Pediatr Surg. 2008;43(5):924-927.
• Howe CJ, Cole SR, Westreich DJ, Greenland S, Napravnik S, Eron JJ Jr. Splines for trend analysis and continuous confounder control. Epidemiology. 2011;22(6):874-875.
• Imai K, van Dyk DA. Causal inference with general treatment regimes: generalizing the propensity score. J Am Stat Assoc. 2004;99(467):854-866.
• Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat Med. 2010;29(3):337-346.
• Lee BK, Lessler J, Stuart EA. Weight trimming and propensity score weighting. PLoS One. 2011;6(3):e18174.
• Mansson R, Joffe MM, Sun W, Hennessy S. On the estimation and use of propensity scores in case-control and case-cohort studies. Am J Epidemiol. 2007;166(3):332-339.
• Stuart EA. Matching methods for causal inference: a review and a look forward. Stat Sci. 2010;25(1):1-21.
• Westreich D, Lessler J, Funk MJ. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J Clin Epidemiol. 2010;63(8):826-833.
• Westreich D, Cole SR, Funk MJ, Brookhart MA, Sturmer T. The role of the c-statistic in variable selection for propensity score models. Pharmacoepidemiol Drug Saf. 2010 Dec 9.