Before propensity scores can be estimated, it is usually necessary to handle any missing data in the covariates. Selecting a missing data method requires making an assumption about the missing data mechanism. It is usually not safe to assume the data are missing completely at random (MCAR), which means that missingness is unrelated to any observed or unobserved values, so the observed data are a random sample of the complete data. The more reasonable and common assumption is that data are missing at random (MAR), which means that the probability of a value being missing depends only on observed variables in the dataset. Under the MAR assumption, imputation methods, particularly multiple imputation, perform best for propensity score analysis. Multiple imputation preserves the within-dataset variability and adds between-dataset variability to the standard errors, which accounts for the uncertainty about the missing values.
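To make the variance statement concrete, here is a minimal sketch in base R of how estimates from several imputed datasets are commonly combined with Rubin's rules. The estimates and squared standard errors are invented purely for illustration and do not come from any study.

```r
# Hypothetical treatment effect estimates and squared standard errors
# from m = 5 imputed datasets (numbers invented for illustration only)
est <- c(0.42, 0.47, 0.39, 0.45, 0.44)
se2 <- c(0.010, 0.011, 0.009, 0.010, 0.012)
m <- length(est)

pooled_est  <- mean(est)  # pooled point estimate
within_var  <- mean(se2)  # average within-imputation variance
between_var <- var(est)   # variance of the estimates across imputations

# Rubin's rules: total variance = within + (1 + 1/m) * between
total_var <- within_var + (1 + 1 / m) * between_var
pooled_se <- sqrt(total_var)

print(c(estimate = pooled_est, se = pooled_se))
```

The pooled standard error is never smaller than the square root of the average within-imputation variance, because the between-imputation term adds the uncertainty due to the missing values.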
There are two main approaches to multiple imputation: joint modeling and multiple imputation by chained equations (MICE). I use MICE because it does not require the specification of a joint distribution of the covariates. In particular, I recommend the mice package in R, using predictive mean matching (PMM) as the univariate imputation method. Kleinke (2017) has shown that PMM performs well under violations of assumptions.
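A minimal sketch of this approach with the mice package follows. It assumes simulated data with hypothetical variable names (treat, x1, x2, x3), an arbitrary choice of m = 5 imputations, and a logistic regression propensity score model. Including the treatment indicator in the imputation model and estimating the scores separately within each imputed dataset are assumptions of this sketch, not necessarily the procedure shown in the video.

```r
library(mice)

set.seed(2021)
n <- 500

# Simulated covariates and treatment (hypothetical data for illustration)
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rbinom(n, 1, 0.4)
treat <- rbinom(n, 1, plogis(-0.3 + 0.5 * x1 - 0.4 * x2 + 0.3 * x3))
dat <- data.frame(treat, x1, x2, x3)

# Make x1 and x2 missing with probabilities that depend only on
# fully observed variables (a MAR mechanism)
dat$x1[runif(n) < plogis(-1.5 + 0.8 * x3 + 0.5 * treat)] <- NA
dat$x2[runif(n) < plogis(-1.8 + 0.6 * x3)] <- NA

# Multiple imputation by chained equations, using PMM for each
# incomplete variable and producing m = 5 imputed datasets
imp <- mice(dat, m = 5, method = "pmm", maxit = 10,
            seed = 123, printFlag = FALSE)

# Extract the imputed datasets as a list and estimate propensity
# scores separately within each one
completed <- complete(imp, action = "all")
ps_list <- lapply(completed, function(d) {
  fitted(glm(treat ~ x1 + x2 + x3, family = binomial, data = d))
})
```

Each element of ps_list holds the estimated propensity scores for one imputed dataset, so later steps such as matching or weighting can be repeated per dataset and the resulting treatment effect estimates pooled.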
Although multiple imputation has better theoretical properties than single imputation (because it adds between-imputation variability to standard errors), I have found through a simulation study (Leite, Aydin & Cetin-Berber, 2021) that single imputation of covariates with PMM prior to propensity score analysis (PSA) performs well, resulting in unbiased treatment effect estimates and standard errors.
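Single imputation with PMM can be sketched with the same package by requesting one imputed dataset. The simulated data, the variable names (treat, x1, x2), and the logistic propensity score model are assumptions made for illustration, not the exact setup of the simulation study.

```r
library(mice)

set.seed(45)
n <- 300

# Simulated data (hypothetical, for illustration only)
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.4 * x1 - 0.3 * x2))
dat <- data.frame(treat, x1, x2)

# x1 is missing with a probability that depends on the observed x2 (MAR)
dat$x1[runif(n) < plogis(-1 + 0.7 * x2)] <- NA

# Single imputation: same chained-equations call, but with m = 1
imp1 <- mice(dat, m = 1, method = "pmm", maxit = 10,
             seed = 45, printFlag = FALSE)
dat_imp <- complete(imp1, 1)

# Estimate propensity scores once, on the single completed dataset
ps_model <- glm(treat ~ x1 + x2, family = binomial, data = dat_imp)
dat_imp$ps <- fitted(ps_model)
```

With a single completed dataset, the rest of the analysis proceeds as if the data had no missing values, which is what makes this option simpler than pooling results over several datasets.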
In the video below, I review R code for multiple imputation as well as single imputation of covariates prior to propensity score estimation using MICE.
Publications about missing data in propensity score analysis
Leite, W. L., Aydin, B., & Cetin-Berber, D. D. (2021). Imputation of Missing Covariate Data Prior to Propensity Score Analysis: A Tutorial and Evaluation of the Robustness of Practical Approaches. Evaluation Review, 45(1-2), 34–69. https://doi.org/10.1177/0193841x211020245
Lee, Y., & Leite, W. L. (2024). A comparison of random forest-based missing imputation methods for covariates in propensity score analysis. Psychological Methods. https://doi.org/10.1037/met0000676