Background and Purpose: Preterm birth (PTB) is the leading cause of infant mortality in the U.S. and globally. This study aims to improve understanding of PTB risk factors that are present early in pregnancy by applying statistical and machine learning (ML) techniques to large-scale data.
Methods: The 2016 U.S. birth records were obtained and combined with two area-level datasets, the Area Health Resources File and the County Health Rankings. We then applied multiple ML techniques to a cohort of 3.6 million singleton deliveries to identify generalizable PTB risk factors.
Results: The most important predictors of PTB are gestational and chronic hypertension, interval since last live birth, and history of a previous preterm birth, which respectively explain 14.91%, 6.92%, and 6.50% of the area under the ROC curve (AUC). Parental education is also among the influential variables in predicting PTB, explaining 10.5% of the AUC. The relative importance of race declines when parents are more educated or have received adequate prenatal care. Gradient boosting machines outperformed the other ML techniques, with an AUC of 0.75 (recall: 0.64, specificity: 0.73) on the validation dataset.
Conclusions: Applying ML techniques improved the performance measures in the prediction of PTB. The results highlight socioeconomic factors such as parental education as being among the most important indicators of PTB. More research is needed on the mechanisms through which socioeconomic factors affect biological responses.
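For readers less familiar with the performance measures reported above, the following is a minimal sketch of how recall, specificity, and AUC can be computed from predicted risk scores. The labels and scores are made-up toy values, not study data. The 0.5 decision threshold and the function names `recall_specificity` and `auc` are illustrative assumptions, not the authors' pipeline.

```python
# Toy illustration only: labels and scores below are invented, not study data.

def recall_specificity(y_true, y_score, threshold=0.5):
    """Recall (sensitivity) and specificity at an assumed decision threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    recall = tp / (tp + fn)        # share of preterm births correctly flagged
    specificity = tn / (tn + fp)   # share of term births correctly cleared
    return recall, specificity


def auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# 1 = preterm, 0 = term (hypothetical outcomes and model scores)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

rec, spec = recall_specificity(y_true, y_score)
print(f"recall={rec:.2f} specificity={spec:.2f} auc={auc(y_true, y_score):.2f}")
# prints: recall=0.67 specificity=0.80 auc=0.80
```

Recall and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the three figures are reported together.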
Figure 1
Figure 2
Figure 3
No competing interests reported.
Posted 03 Mar, 2021
Received 11 Mar, 2021
On 11 Mar, 2021
Invitations sent on 06 Mar, 2021
On 01 Mar, 2021
On 22 Feb, 2021