Propensity Score Estimation

Practical Propensity Score Methods Using R

Before propensity scores can be estimated, any missing data in the covariates must be handled. I recommend multiple imputation because it has been shown to outperform other methods of handling missing data, such as listwise deletion, pairwise deletion, and single imputation (for example, hot deck imputation and regression imputation).

There are two main approaches to multiple imputation: joint modeling and multiple imputation by chained equations (MICE). I use MICE because it does not require the specification of a joint distribution of covariates.

In the video below, I review R code for multiple imputation as well as single imputation of covariates prior to propensity score estimation using MICE.
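
As a rough illustration of both options, the sketch below uses the mice package on simulated data. The data set and the variable names (treat, age, income, female) are hypothetical and are not taken from the chapter's example file, and the default imputation methods chosen by mice are assumed to be acceptable.

```r
library(mice)

# Simulated example data; all variable names are hypothetical
set.seed(2024)
n <- 300
dat <- data.frame(
  treat  = rbinom(n, 1, 0.4),
  age    = rnorm(n, mean = 40, sd = 10),
  income = rnorm(n, mean = 50, sd = 15),
  female = factor(rbinom(n, 1, 0.5))
)

# Introduce missing values in two covariates
dat$age[sample(n, 30)]    <- NA
dat$income[sample(n, 45)] <- NA

# Multiple imputation by chained equations: m = 5 completed data sets.
# With the default settings, numeric variables are imputed by
# predictive mean matching.
imp_multiple <- mice(dat, m = 5, seed = 123, printFlag = FALSE)
imputed_list <- complete(imp_multiple, action = "all")  # list of 5 data frames

# Single imputation: the same call with m = 1
imp_single <- mice(dat, m = 1, seed = 123, printFlag = FALSE)
dat_single <- complete(imp_single, action = 1)

# Check that no missing values remain
sapply(imputed_list, anyNA)
anyNA(dat_single)
```

With multiple imputation, propensity scores would then be estimated within each of the completed data sets in imputed_list.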

PROPENSITY SCORE ESTIMATION WITH LOGISTIC REGRESSION
The most common method to estimate propensity scores is logistic regression, because it is a parametric model that is familiar to many researchers. Although there are many advanced data mining methods that can potentially outperform logistic regression, I recommend that researchers use logistic regression first because it frequently produces propensity scores that result in adequate covariate balance. If you are able to achieve covariate balance using the propensity scores estimated with logistic regression, it is not necessary to use advanced data mining methods.

In the video below, I review R code for propensity score estimation with logistic regression.
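
A minimal sketch of this workflow in base R is shown below. The simulated data, the covariate names (x1, x2, x3), and the balance check based on inverse probability weights and standardized mean differences are assumptions made for illustration; they are not the chapter's own example.

```r
# Simulated example data; covariate names are hypothetical
set.seed(42)
n <- 500
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rbinom(n, 1, 0.4)
)
dat$treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x1 - 0.5 * dat$x2 + 0.6 * dat$x3))

# Logistic regression of treatment assignment on the covariates
ps_model <- glm(treat ~ x1 + x2 + x3,
                family = binomial(link = "logit"),
                data = dat)

# Propensity score: predicted probability of receiving treatment
dat$ps <- predict(ps_model, type = "response")

# Inverse probability weights for the average treatment effect
# (one possible use of the scores, assumed here for the balance check)
dat$w_ate <- ifelse(dat$treat == 1, 1 / dat$ps, 1 / (1 - dat$ps))

# Weighted standardized mean difference for one covariate
smd <- function(x, treat, w) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  pooled_sd <- sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  (m1 - m0) / pooled_sd
}

covs <- c("x1", "x2", "x3")
smd_unweighted <- sapply(dat[covs], smd, treat = dat$treat, w = rep(1, n))
smd_weighted   <- sapply(dat[covs], smd, treat = dat$treat, w = dat$w_ate)

round(rbind(smd_unweighted, smd_weighted), 3)
```

If the weighted standardized mean differences are close to zero for all covariates, the logistic regression model may be adequate and more complex estimation methods may not be needed.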

Code for Chapter 2 Propensity Score Estimation

R Code for Propensity Score Estimation (R script, 15 kb)

Data for Example of Propensity Score Estimation

R Data for Propensity Score Estimation Example (RData file, 294 kb)

Many data mining methods can be used to estimate propensity scores, such as generalized boosted modeling, random forests, and neural networks. In this video, I show how to estimate propensity scores using generalized boosted modeling with the twang package of R.
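
The following is a minimal sketch of generalized boosted modeling with the ps() function of twang. The simulated data, the covariate names, and the tuning values (number of trees, interaction depth, shrinkage, stopping rule) are illustrative assumptions rather than recommended settings.

```r
library(twang)

# Simulated example data; covariate names are hypothetical
set.seed(42)
n <- 500
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rbinom(n, 1, 0.4)
)
dat$treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x1 - 0.5 * dat$x2 + 0.6 * dat$x3))

# Generalized boosted model for treatment assignment.
# The treatment indicator must be coded 0/1.
gbm_fit <- ps(treat ~ x1 + x2 + x3,
              data = dat,
              n.trees = 2000,           # kept small here so the sketch runs quickly
              interaction.depth = 3,
              shrinkage = 0.01,
              estimand = "ATE",
              stop.method = "es.mean",  # stop where the mean effect size is smallest
              verbose = FALSE)

# Estimated propensity scores (one column per stopping rule)
dat$ps_gbm <- gbm_fit$ps[, 1]

# Corresponding weights and the balance tables before and after weighting
dat$w_gbm <- get.weights(gbm_fit, stop.method = "es.mean")
bal.table(gbm_fit)
```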

In the video below, I show how to estimate propensity scores with random forests using the party package of R.
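
A minimal sketch with the cforest() function of party is given below. The simulated data, the covariate names, and the settings (500 trees, mtry = 2, out-of-bag predictions) are assumptions chosen for illustration.

```r
library(party)

# Simulated example data; covariate names are hypothetical
set.seed(42)
n <- 500
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rbinom(n, 1, 0.4)
)
dat$treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x1 - 0.5 * dat$x2 + 0.6 * dat$x3))

# cforest() needs a factor outcome to fit classification trees
dat$treat_f <- factor(dat$treat, levels = c(0, 1))

# Random forest of conditional inference trees
rf_fit <- cforest(treat_f ~ x1 + x2 + x3,
                  data = dat,
                  controls = cforest_unbiased(ntree = 500, mtry = 2))

# Out-of-bag class probabilities: a list with one element per observation,
# ordered by factor level ("0" first, "1" second)
prob_list <- predict(rf_fit, type = "prob", OOB = TRUE)

# Propensity score: estimated probability of the treated level
dat$ps_rf <- unname(sapply(prob_list, function(p) p[2]))

summary(dat$ps_rf)
```

Out-of-bag predictions are used so that each score comes only from trees that did not see that observation, which may reduce overfitting of the propensity scores.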

Related Research:
Leite, W. L., Aydin, B., & Cetin-Berber, D. D. (2021). Imputation of Missing Covariate Data Prior to Propensity Score Analysis: A Tutorial and Evaluation of Robustness of Practical Approaches. Evaluation Review. https://doi.org/10.1177/0193841X211020245
Code for the paper

Collier, Z. K., & Leite, W. L. (2021). A Tutorial on Artificial Neural Networks in Propensity Score Analysis. Journal of Experimental Education. https://doi.org/10.1080/00220973.2020.1854158

Collier, Z. K., Leite, W. L., & Zhang, H. (2021). Estimating propensity scores using neural networks and traditional methods: a comparative simulation study. Communications in Statistics - Simulation and Computation. https://doi.org/10.1080/03610918.2021.1963455

Collier, Z. K., Leite, W. L., & Karpyn, A. (2021). Neural Networks to Estimate Generalized Propensity Scores for Continuous Treatment Doses. Evaluation Review. https://doi.org/10.1177/0193841X21992199