Background: Early identification of pregnant women at risk for preterm birth (PTB), a major cause of infant mortality and morbidity, has significant potential to improve prenatal care. However, we lack effective predictive models that can accurately forecast PTB and complement these predictions with appropriate interpretations for clinicians. In this work, we introduce a clinical prediction model (PredictPTB) that combines variables (medical codes) readily accessible through electronic health records (EHRs) to accurately predict the risk of preterm birth at 1, 3, 6, and 9 months prior to delivery.
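Predicting "k months prior to delivery" implies that only EHR visits recorded before a cutoff date may be used as input. The sketch below shows one way such a cutoff could be applied. It is an illustration under stated assumptions, not PredictPTB's actual preprocessing: the (date, codes) visit format, the hypothetical helper `visits_before_cutoff`, the placeholder codes, and the approximation of a month as 30 days are all assumptions.

```python
from datetime import date, timedelta

# Assumption: one "month" is approximated as 30 days, for illustration only.
DAYS_PER_MONTH = 30


def visits_before_cutoff(visits, delivery_date, months_before):
    """Keep visits dated at least `months_before` months (30-day units) before delivery.

    `visits` is a list of (visit_date, list_of_codes) tuples (hypothetical format).
    """
    cutoff = delivery_date - timedelta(days=DAYS_PER_MONTH * months_before)
    return [(d, codes) for d, codes in visits if d <= cutoff]


# Toy timeline with placeholder (hypothetical) code labels.
visits = [
    (date(2020, 1, 10), ["dx:A"]),
    (date(2020, 4, 2), ["dx:B", "med:C"]),
    (date(2020, 7, 15), ["proc:D"]),
]
delivery = date(2020, 9, 1)

for k in (1, 3, 6, 9):
    kept = visits_before_cutoff(visits, delivery, k)
    print(k, [d.isoformat() for d, _ in kept])
```

The further ahead the prediction point, the fewer visits survive the cutoff, which is one intuitive reason performance can drop at longer horizons.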
Methods: The architecture of PredictPTB employs recurrent neural networks (RNNs) to model patients' longitudinal EHR visits and exploits a single code-level attention mechanism to improve predictive performance while providing temporal code-level and visit-level explanations for the prediction results. We compare performance across different combinations of prediction time points, data modalities, and data windows. We also present a case study of our model's interpretability, illustrating how clinicians can gain transparency into its predictions.
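To illustrate how attention over codes can both feed an RNN and yield per-code explanations, here is a toy forward pass in plain Python. It is a hedged sketch, not PredictPTB's exact architecture: the class name `TinyAttentionRNN`, the dot-product scoring against a single query vector, the plain tanh (Elman) recurrence standing in for whatever RNN cell the model uses, and the random untrained weights are all assumptions. Each visit's codes are weighted by softmax attention, pooled into a visit vector, passed through the recurrence, and the final hidden state is mapped to a risk score. It covers code-level weights only; how PredictPTB derives its visit-level explanations is not reproduced here.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    return [dot(row, v) for row in W]


class TinyAttentionRNN:
    """Toy model: code-level attention -> visit vectors -> tanh RNN -> sigmoid risk.

    Hypothetical illustration; weights are random and untrained, so the risk
    value is meaningless and only the data flow is of interest.
    Assumes every visit contains at least one distinct code.
    """

    def __init__(self, vocab, dim=4, seed=0):
        rng = random.Random(seed)

        def rand_vec():
            return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

        self.dim = dim
        self.emb = {code: rand_vec() for code in vocab}  # code embeddings
        self.query = rand_vec()  # attention query vector (assumed form)
        self.W_x = [rand_vec() for _ in range(dim)]  # input-to-hidden weights
        self.W_h = [rand_vec() for _ in range(dim)]  # hidden-to-hidden weights
        self.w_out = rand_vec()  # hidden-to-logit weights

    def forward(self, visits):
        h = [0.0] * self.dim
        explanations = []
        for codes in visits:
            vecs = [self.emb[c] for c in codes]
            # Code-level attention: one weight per code, summing to 1 per visit.
            alphas = softmax([dot(self.query, v) for v in vecs])
            explanations.append(dict(zip(codes, alphas)))
            # Visit vector = attention-weighted sum of the visit's code embeddings.
            visit_vec = [sum(a * v[i] for a, v in zip(alphas, vecs)) for i in range(self.dim)]
            pre = [x + y for x, y in zip(matvec(self.W_x, visit_vec), matvec(self.W_h, h))]
            h = [math.tanh(p) for p in pre]
        risk = 1.0 / (1.0 + math.exp(-dot(self.w_out, h)))
        return risk, explanations


model = TinyAttentionRNN(vocab=["dx:A", "dx:B", "med:C", "proc:D"])
risk, expl = model.forward([["dx:A", "med:C"], ["dx:B", "proc:D", "dx:A"]])
print(round(risk, 3))
for t, weights in enumerate(expl):
    print("visit", t, {c: round(w, 3) for c, w in weights.items()})
```

Because the per-code weights are produced at each visit, they can be laid out along the patient's timeline, which is the kind of temporal, code-level evidence the abstract describes.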
Results: Leveraging a large cohort of 222,436 deliveries comprising a total of 27,100 unique clinical concepts, our model predicted preterm birth with ROC-AUCs of 0.82, 0.79, and 0.78 and PR-AUCs of 0.40, 0.31, and 0.24 at 1, 3, and 6 months prior to delivery, respectively. Results also confirm that observational data modalities (such as diagnoses) are more predictive of preterm birth than interventional data modalities (e.g., medications and procedures).
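The two reported metrics behave differently on an imbalanced outcome such as PTB: ROC-AUC is insensitive to class prevalence, whereas a random classifier's PR-AUC roughly equals the positive rate, so PR-AUC values well below ROC-AUC values are expected. The sketch below computes both on made-up toy data. ROC-AUC uses the pairwise (Mann-Whitney) definition; PR-AUC is estimated with average precision, which is one common estimator among several. The function names and data are illustrative assumptions, not the paper's evaluation code.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise PR-AUC estimate: mean precision at the rank of each positive.

    Assumes no tied scores (ties would make the ranking order arbitrary).
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    precisions = []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / hits


# Toy data: 2 positives out of 10 (20% prevalence), made up for illustration.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(roc_auc(labels, scores))                     # 0.9375
print(round(average_precision(labels, scores), 4))  # 0.8333
```

Here the baseline for PR-AUC would be about 0.2 (the positive rate), while the baseline for ROC-AUC is 0.5 regardless of prevalence.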
Conclusions: Our results demonstrate that PredictPTB can deliver accurate and scalable predictions of preterm birth, complemented by explanations that directly highlight evidence in the patient's EHR timeline.