Before propensity scores can be estimated, any missing data in the covariates must be handled. I recommend multiple imputation because it has been shown to outperform other methods of handling missing data, such as listwise deletion, pairwise deletion, and single imputation (e.g., hot deck imputation, regression imputation).

There are two main approaches to multiple imputation: joint modeling and multiple imputation by chained equations (MICE). I use MICE because it does not require specifying a joint distribution for the covariates; instead, each covariate with missing values is imputed from its own conditional model.
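To make the idea concrete, here is a minimal sketch of imputing covariates with MICE in R. It is not the code from the video. It assumes the mice package is installed, and the simulated data frame `covariates` and its columns (`age`, `income`, `female`) are hypothetical names chosen for illustration. The imputation methods are left at the package defaults.

```r
# Minimal sketch: impute covariates with MICE before propensity score estimation.
# Assumes the 'mice' package is installed; all data and column names are hypothetical.
library(mice)

set.seed(2024)
n <- 200
covariates <- data.frame(
  age    = rnorm(n, mean = 40, sd = 10),
  income = rnorm(n, mean = 50, sd = 15),
  female = factor(rbinom(n, 1, 0.5))
)

# Introduce missing values completely at random (for illustration only)
covariates$age[sample(n, 20)]    <- NA
covariates$income[sample(n, 30)] <- NA

# Multiple imputation: m = 5 completed data sets, default methods per column
imp <- mice(covariates, m = 5, seed = 2024, printFlag = FALSE)

# Stack the completed data sets in long format; the .imp column identifies each one
long_data <- complete(imp, action = "long")

# Single imputation: the same call with m = 1, then extract the one completed data set
imp_single  <- mice(covariates, m = 1, seed = 2024, printFlag = FALSE)
single_data <- complete(imp_single, action = 1)
```

With multiple imputation, later steps would be repeated on each completed data set (each value of `.imp`), whereas single imputation yields one data set to carry forward.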

In the video below, I review R code for multiple imputation as well as single imputation of covariates prior to propensity score estimation using MICE.

PROPENSITY SCORE ESTIMATION WITH LOGISTIC REGRESSION

The most common method to estimate propensity scores is logistic regression, because it is a parametric model that is familiar to many researchers. Although there are many advanced data mining methods that can potentially outperform logistic regression, I recommend that researchers use logistic regression first because it frequently produces propensity scores that result in adequate covariate balance. If you are able to achieve covariate balance using the propensity scores estimated with logistic regression, it is not necessary to use advanced data mining methods.
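As a rough illustration of this workflow, the sketch below fits a logistic regression with base R's `glm()` and takes the fitted probabilities as propensity scores. It is not the code from the video. The simulated data and the variable names (`treat`, `age`, `income`) are hypothetical. The balance check at the end uses inverse probability weights and standardized mean differences, which is one possible choice made here for illustration; the text above does not prescribe a particular way of using the scores.

```r
# Minimal sketch: propensity scores from a logistic regression (base R only).
# Simulated data; 'treat', 'age', and 'income' are hypothetical names.
set.seed(2024)
n <- 500
dat <- data.frame(
  age    = rnorm(n, mean = 40, sd = 10),
  income = rnorm(n, mean = 50, sd = 15)
)
# Treatment assignment depends on the covariates
dat$treat <- rbinom(n, 1, plogis(-2 + 0.03 * dat$age + 0.015 * dat$income))

# Logistic regression of treatment on the covariates
ps_model <- glm(treat ~ age + income, data = dat,
                family = binomial(link = "logit"))

# Propensity score = predicted probability of receiving treatment
dat$pscore <- fitted(ps_model)

# Illustrative balance check (an assumption, not the author's stated method):
# inverse probability weights and weighted standardized mean differences
dat$w <- ifelse(dat$treat == 1, 1 / dat$pscore, 1 / (1 - dat$pscore))

smd <- function(x, treat, w) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  pooled_sd <- sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  (m1 - m0) / pooled_sd
}

balance <- sapply(dat[c("age", "income")], smd, treat = dat$treat, w = dat$w)
print(balance)
```

If the standardized mean differences are close to zero for all covariates, the logistic regression model may be adequate; otherwise, adding interactions or nonlinear terms is a natural next step before turning to more complex methods.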

In the video below, I review R code for propensity score estimation with logistic regression.

