Comparison of Methods for Handling Covariate Missingness in Propensity Score Estimation with a Binary Exposure
Background: Causal effect estimation with observational data is subject to bias due to confounding, which is often controlled for using propensity scores. One unresolved issue in propensity score estimation is how to handle missing values in covariates.
Method: Several approaches have been proposed for handling covariate missingness, including multiple imputation (MI), multiple imputation with missingness pattern (MIMP), and treatment mean imputation. Other potentially useful approaches have not yet been evaluated: single imputation (SI) with prediction error (PE); SI+PE with parameter uncertainty (PU); and Generalized Boosted Modeling (GBM), a nonparametric approach to propensity score estimation that handles missing values automatically through a surrogate split method. A simulation study was conducted to evaluate the performance of these approaches.
Results: SI+PE, SI+PE+PU, MI, and MIMP performed almost equally well, and better than treatment mean imputation and GBM, in terms of bias; however, MI and MIMP also account for the additional uncertainty introduced by imputing the missing values.
Conclusions: Applying GBM to the incomplete data and relying on the surrogate split approach resulted in substantial bias. Imputation prior to implementing GBM is recommended.
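To make the SI+PE idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It reads "single imputation + prediction error" as a regression prediction plus a random residual draw, then fits a logistic propensity score model on the completed data. The simulated data, the function names `impute_with_prediction_error` and `fit_propensity`, and the choice to impute from a single fully observed covariate are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data (not taken from the paper).
n = 2000
z = rng.normal(size=n)                      # fully observed covariate
x = 0.5 * z + rng.normal(size=n)            # covariate that will have missing values
true_ps = 1.0 / (1.0 + np.exp(-(0.4 * x + 0.3 * z)))
treatment = rng.binomial(1, true_ps).astype(float)

x_obs = x.copy()
x_obs[rng.random(n) < 0.3] = np.nan         # assumption: ~30% missing completely at random
missing = np.isnan(x_obs)


def impute_with_prediction_error(x_obs, z, rng):
    """Fill missing x with a regression prediction plus a random residual draw."""
    miss = np.isnan(x_obs)
    design = np.column_stack([np.ones(len(x_obs)), z])
    # Fit a linear imputation model on the complete cases.
    beta, *_ = np.linalg.lstsq(design[~miss], x_obs[~miss], rcond=None)
    resid = x_obs[~miss] - design[~miss] @ beta
    dof = (~miss).sum() - design.shape[1]
    sigma = np.sqrt(resid @ resid / dof)
    x_imp = x_obs.copy()
    # Prediction error: add N(0, sigma^2) noise to each predicted value.
    x_imp[miss] = design[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    return x_imp


def fit_propensity(covariates, treatment, n_iter=25):
    """Logistic regression via Newton-Raphson; returns estimated propensity scores."""
    design = np.column_stack([np.ones(len(treatment)), covariates])
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ coef))
        grad = design.T @ (treatment - p)
        hess = (design * (p * (1.0 - p))[:, None]).T @ design
        coef += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-design @ coef))


x_imp = impute_with_prediction_error(x_obs, z, rng)
ps = fit_propensity(np.column_stack([x_imp, z]), treatment)
```

Under this reading, SI+PE+PU would additionally draw the imputation-model parameters (`beta`, `sigma`) from their estimated sampling distribution before imputing, and MI would repeat the whole procedure several times and pool the resulting effect estimates. Whether the imputation model should also include the exposure is a modeling choice this sketch does not address.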
Figure 1
Posted 12 Jun, 2020