Nomogram to Early Screen Multiparous Women for Preterm Birth in a Cohort Study

Background

Although preterm birth (PTB) prevalence varies widely among countries, it is generally estimated at between 3 and 13% of all pregnancies [1, 2]. PTB is also among the leading causes of morbidity and mortality in children under 5 years of age, particularly in countries with a large share of low- to middle-income households, notably in parts of Asia and Africa [3].

Screening for PTB remains difficult in the absence of specific tests that identify mothers at high risk of preterm birth, although cervical length and cervicovaginal fetal fibronectin measurements, among others, have been used with some success [4]. Hence, most prediction studies have relied on maternal factors associated with PTB. Some are considered non-modifiable, such as a history of PTB, extremes of maternal age (< 19 and > 35 years) [1], multiple pregnancies, short cervical length, uterine abnormalities, and genetic factors [5]. Others, related to nutrition, socioeconomic status, low body mass index (BMI), obesity, poor pregnancy weight gain, smoking, substance abuse, short inter-pregnancy interval, periodontal disease, bacterial vaginosis, late or no prenatal care, untreated antenatal depression, and the use of assisted reproductive technologies [3], can be prevented or managed with close medical surveillance.

These maternal factors have been used to develop models that predict preterm birth [6]. The models range from traditional logistic regression, which identifies risk factors and estimates odds ratios, to more recent machine learning algorithms including neural networks [7]. Although neural network algorithms have been shown to give very high preterm prediction results, it is difficult to derive from them a simple version that physicians can use, especially in developing countries where the gynecologist takes all the decisions. In contrast, the linear coefficients of a logistic regression model have been used in nomograms and spreadsheets to deliver prediction tools that any physician can use [8].

Early detection of PTB can help lower the risks for infants and mothers through corticosteroid administration, cervical cerclage, and other effective treatments [9]. However, because of the low prevalence of PTB, there is a need to screen women first and to select those who should undergo more specific tests for potential PTB, especially in developing countries with limited resources.

Our retrospective data on 1996 women showed that the 922 multiparous women had a preterm prevalence (8%) more than double that of the nulliparous women (3%). Reports on the association between PTB incidence and multiparity have been inconsistent and variable [10]. Although there are many models to screen for preterm birth, there is a need for a more focused analysis of multiparous women, especially because indicators from past pregnancies are available.

The main objective of this project was to develop a valid and easy-to-use tool for physicians to screen non-nulliparous pregnant women for preterm birth risk based on routinely collected data such as medical history, demographic, and weight parameters. We improved the prediction by training the models on resampled datasets (up-sampling) to mitigate the problem of the low prevalence of preterm birth. We also used logistic regression models regularized with LASSO (Least Absolute Shrinkage and Selection Operator) to help analyze and select the covariates for the best possible preterm risk evaluation.

Methods

Source of data: Data were obtained from the medical records in five hospitals in North-Lebanon (private and public Islamic hospitals, Sayyidet Zgharta hospital, governmental hospital of Akkar, and governmental hospital of Tripoli).

Participants: In addition to the aforementioned collection of data from medical records, we also collected data directly from 688 women. The participants were chosen in concert with many local gynecologists.

Outcome: The objective was to develop a model that predicts the risk of spontaneous preterm birth for multiparous women and that can also be expressed as a nomogram that is easy for physicians to use.

Predictors: The cohort study included binary responses to 15 variables, with the positive class defined as follows: Age (25–35 years), BMI (obese), Education-husband (high: university degree), Education-mom (high: university degree), Pre-Cesarean (presence in last pregnancy), Pre-Diabetes (presence in last pregnancy), Preeclampsia (presence in last pregnancy), Pre-Hemorrhage (presence in last pregnancy), Pre-Induction (presence in last pregnancy), Residence (city), preterm (spontaneous presence in last pregnancy), smoking (smoker), Social-status (high), Weight-gain (excess), Work-husband (external job), and Work-mom (external job).

The Body Mass Index (BMI) of each woman was calculated with the formula: weight (kg) / [height (m)]². Women were divided into obese and non-obese groups based on WHO guidelines [11] (BMI above or below 30). The underweight group was discarded because it had a negligible number of representatives.
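As an illustration of this binary coding, the following minimal sketch computes the BMI and the obese/non-obese indicator. The function names and the example values are hypothetical, and treating a BMI of exactly 30 as obese is an assumption (the usual WHO convention), since the text only says "below or above 30".

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg: float, height_m: float, cutoff: float = 30.0) -> int:
    """Binary BMI predictor: 1 if BMI >= cutoff (assumed convention), else 0."""
    return int(bmi(weight_kg, height_m) >= cutoff)


# Hypothetical woman: 85 kg and 1.62 m, giving a BMI of about 32.4 (coded 1).
example_bmi = bmi(85.0, 1.62)
example_flag = is_obese(85.0, 1.62)
```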

Missing data: There were no missing data because samples with incomplete data were discarded from the study, as were women aged under 17 or above 35 and women suspected to have fetuses with congenital malformations.

Sample size: The data used in this work were part of a program to evaluate pregnancy and fetal complications in Northern Lebanon. There were 922 multiparous women among a total of 1996 women who gave birth between January 2014 and January 2016. We divided the multiparous data into two files. The first, called testdata, was composed of 706 profiles originating from the medical records and was used to fit the models. The second file, used for model validation, comprised the 216 multiparous profiles, out of the 922, that we collected directly from the women.

Statistical analysis methods: All the predictors were coded as binary variables. The first model (glm) was a logistic regression fitted on the testdata file. The second model (glmup) was also a logistic regression, fitted on a new file generated from the testdata file by up-sampling. This file, called upsampleddata, included 1258 profiles representing 649 profiles of non-preterm women and 649 profiles of women with a preterm birth, randomly generated by the up-sampling algorithm. The third model (glmnetup) was a LASSO-penalized logistic regression. The final model (glmglmnetup) was a logistic regression using only the predictors selected by the LASSO-penalized regression, trained on the upsampleddata.
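The up-sampling step can be sketched as follows. The text does not say which implementation was used, so this example assumes plain random over-sampling with replacement of the minority class until both classes have the same size; the function name, the field names, and the toy counts are hypothetical and do not reproduce the study files.

```python
import random


def upsample_minority(rows, label_key="preterm", seed=0):
    """Random over-sampling: draw minority-class rows with replacement
    until both classes contain the same number of rows."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        raise ValueError("both classes must be present")
    # Extra copies drawn at random from the minority class.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 90 non-preterm and 10 preterm profiles.
toy = [{"id": i, "preterm": 0} for i in range(90)] + \
      [{"id": 90 + i, "preterm": 1} for i in range(10)]
balanced = upsample_minority(toy)
n_pos = sum(r["preterm"] for r in balanced)
n_neg = len(balanced) - n_pos
```

In this sketch the majority class is left untouched and only the minority class grows, so a model fitted on the balanced file sees as many preterm as non-preterm profiles.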

All models were validated first using the testdata and then using the validation data set (validationset). The models were compared in terms of statistically significant predictors along with the percentages of true positives and false positives. A woman was classified as positive when her predicted risk (probability) was higher than 50%. We also compared the risk distribution profiles given by each model.
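The comparison criteria can be sketched in a few lines. This is an illustration under stated assumptions, not the code used in the study: the function names and the six example risks are hypothetical, false positives are expressed as a percentage of all women (as in Table 3), and the AUC is computed as the probability that a randomly chosen preterm case receives a higher risk than a randomly chosen non-preterm case.

```python
def screen_metrics(y_true, risk, threshold=0.5):
    """Percent of preterm cases detected and percent of all women
    falsely flagged, for a given risk threshold."""
    flagged = [p > threshold for p in risk]
    tp = sum(1 for y, f in zip(y_true, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(y_true, flagged) if y == 0 and f)
    n_pos = sum(y_true)
    return {
        "preterm_detected_pct": 100.0 * tp / n_pos,
        "false_positive_pct_of_total": 100.0 * fp / len(y_true),
    }


def auc(y_true, risk):
    """AUC as the share of (positive, negative) pairs in which the positive
    has the higher risk; ties count one half."""
    pos = [p for y, p in zip(y_true, risk) if y == 1]
    neg = [p for y, p in zip(y_true, risk) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for six women (1 = preterm).
y = [1, 1, 0, 0, 0, 0]
r = [0.80, 0.40, 0.60, 0.30, 0.20, 0.10]
m = screen_metrics(y, r)
a = auc(y, r)
```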

The Chi-square test, Fisher test, and Principal Component Analysis for categorical variables were performed using SPSS. The logistic regression modeling, up-sampling, and LASSO penalization were carried out using R version 3.6.1. The nomogram was created using the lrm function of the rms package in R version 3.6.1.

Results

The multiparous women of the sample represented 46% of the total retrospective data. Despite some overlap, the multiparous women formed a distinct group characterized by a relatively lower social status and a higher incidence of gynecological complications, as shown by the projection on the first two principal components (Fig. 1).

The 922 multiparous women were mostly urban, relatively older, working women with a high level of education living in households with a good income (Table 1). Most had a university-level education (79%), as did their husbands (81%). About 65% reported having a job. Almost all the husbands reported having a job (96%), and 71% of the households had a good social level (high income). Most women were in the age bracket of 25 to 35 (64%) and resided in the city (76%). About 33% of the women had an obese BMI, and 47% presented an excessive weight gain during the pregnancy.

Table 1

Percentage of each characteristic for the multiparous women.
Characteristic	Percentage (positives/total)
Age(25–35 years)	64
BMI(obese)	33
Education_husband(high)	81
Education_mom(high)	79
Pre_Cesarean(presence)	35
Pre_Diabetes(presence)	5
Pre_Eclampsia(presence)	4
Pre_Hemmorrhage(presence)	29
Pre_Induction(presence)	31
Residence(city)	76
preterm (presence)	8
smoking(smoker)	13
Social_status(high)	71
Weight_gain(excess)	47
Work_husband(external job)	96
Work_mom(external job)	65

The percentage of mothers who smoked during pregnancy was 13%. The gynecological complications reported for past pregnancies included diabetes (5%) and pre-eclampsia (4%). Approximately 31% of the women had had an induction and 29% a hemorrhage.

There were 75 spontaneous preterm cases among the 922 multiparous women, a PTB prevalence of around 8%, which is more than double the prevalence for nulliparous women. The percentage of women with PTB was slightly higher in the validationset, at about 9.7% (21 among 216), than in the testdata, at 7.6% (54 among 706).

The covariates that presented the highest difference of percentage within the PTB positive and the negative class were Pre-hemorrhage, Weight gain, Age, BMI, and Social status (Fig. 2). The Chi-square test revealed that most of these variables were statistically significant at least at the 5% level (Fig. 2). Smaller, non-statistically significant, differences were observed for pre-diabetes, work husband, and pre-eclampsia. Pre-eclampsia and Pre-diabetes were discarded from further modeling analysis because they gave a low prevalence reaching even 0 for the positive class. It is most likely that women with these indicators were already surveilled for PTB, which may explain their low prevalence.

Table 2

Linear coefficients of each logistic regression model (significant at the level 5% *, 1% ** and 0.1% ***).
Factors	glm^a	glmup^a	glmnetup^a	glmglmnetup^a
Intercept	-4.56**	-1.39**	-3.72	-1.97***
Age1	0.54	0.86***	0.33	0.68***
BMI1	1.07**	0.75***	0.35	0.70***
Education_hus1	-0.52	-0.02	.	.
Education_mom1	-0.01	0.12	.	.
Pre_Cesarean1	-0.29	-0.52**	.	.
Pre_Hemmorrhage1	1.98***	2.11***	1.62	1.93***
Pre_Induction1	-0.12	0.12	.	.
Residence1	1.27*	1.30***	0.47	1.11***
smoking1	0.12	0.24	.	.
Social_status1	-1.42**	-1.82***	-1.04	-1.79***
Weight_gain1	1.03*	1.06***	0.76	1.07***
Work_hus1	-0.28	-0.64	.	.
Work_mom1	-0.14	0.09	.	.
^aglm: logistic regression on the original data; glmup: logistic regression on the up-sampled data; glmnetup: LASSO regression on the up-sampled data; glmglmnetup: logistic regression with the LASSO-selected variables on the up-sampled data. A dot (.) marks a variable not retained by the LASSO.

The logistic regression analysis of the original dataset (glm) identified almost the same significant variables as the Chi-square test, except that Age was not significant while Residence was added to the list of significant covariates (Table 2). Despite a high AUC (area under the curve) of 0.84, this logistic model detected few PTB cases: no more than 16% in the training set and 12% in the validation dataset. The women of the majority non-PTB class were classified correctly, which explains the high AUC observed (accuracy higher than 92% for both the training and validation datasets).
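The gap between the high accuracy and the low PTB detection of the glm model can be checked with simple arithmetic. The sketch below is an approximation: it rebuilds the counts from the rounded percentages reported for glm in Table 3 and from the 54 preterm cases among the 706 training profiles, so the figures are indicative only.

```python
n_total, n_preterm = 706, 54          # training file counts reported in the Results
detected_pct, false_pos_pct = 16, 1   # glm row of Table 3 (rounded percentages)

tp = round(n_preterm * detected_pct / 100)   # preterm cases flagged
fp = round(n_total * false_pos_pct / 100)    # non-preterm women flagged
tn = (n_total - n_preterm) - fp              # non-preterm women correctly cleared
accuracy = (tp + tn) / n_total

# Accuracy obtained by predicting "no PTB" for every woman.
baseline = (n_total - n_preterm) / n_total
```

Under these assumptions about 9 of the 54 preterm cases are detected, and the accuracy (about 0.926) is barely above the 0.924 reached by classifying every woman as non-PTB, which is why accuracy alone says little for a low-prevalence outcome.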

Table 3

Preterm detection and false positives (non-preterm women predicted as preterm) for the different models.
Models*	Preterm, test set (% of total preterm)	Preterm, validation set (% of total preterm)	False positives (% of total)	AUC
glm	16	12	1	0.841
glmup	78	92	25	0.846
glmnetup	80	92	25	0.837
glmglmnetup	76	88	21	0.84
*glm: logistic regression on the original data; glmup: logistic regression on the up-sampled data; glmnetup: LASSO regression on the up-sampled data; glmglmnetup: logistic regression with the LASSO-selected variables on the up-sampled data.

In contrast, after creating a balanced sample with the up-sampling algorithm and fitting the logistic model (glmup) on it, PTB prediction improved notably (Table 3). PTB detection ranged from 78% for the training set to 92% for the validation dataset, although the number of misclassified non-PTB women rose from a few cases for the first model (glm) to about 25% of the total number of pregnant women. Comparable results were obtained with the LASSO regularized model (glmnetup) and with the logistic regression on the variables selected by the LASSO regularization (glmglmnetup). The latter gave the lowest number of false positives among the models fitted on up-sampled data (about 21%) while maintaining high PTB prediction (Table 3), but its accuracy still decreased to around 79%.

The comparison of the PTB risk distribution estimated by each model with the original data (Fig. 3) showed that logistic regression before up-sampling (glm) and the LASSO model (glmnetup) generally underestimated the probabilities relative to the other models. Even the last logistic model, using the LASSO-selected variables, slightly underestimated those probabilities. However, both logistic regressions on up-sampled data, before or after LASSO regularization, gave a risk distribution closer to the original data than the other models (Fig. 3).

Along with the improvement in preterm prediction, the number of statistically significant covariates (at least at the 5% level) increased from 5 for glm to 10 for glmup, whereas glmnetup reduced this number to 6 (Table 2). The regression model using the LASSO-selected variables (glmglmnetup) was used to develop a nomogram (Fig. 4). The validation of this nomogram on the data of this study showed that a reasonably accurate PTB risk can be obtained for a multiparous woman from her levels of Social status, Residence, Pre-hemorrhage, Age, BMI, and Weight gain.
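A nomogram is a graphical reading of the logistic model, so the same risk can be sketched numerically from the glmglmnetup coefficients of Table 2. The function names and the example profile below are hypothetical, and because the coefficients were fitted on up-sampled data the result is best read as a screening score under that assumption rather than as a definitive individual probability.

```python
import math

# glmglmnetup column of Table 2 (fitted on the up-sampled data).
INTERCEPT = -1.97
COEF = {
    "Age": 0.68,              # 1 = 25-35 years
    "BMI": 0.70,              # 1 = obese
    "Pre_Hemmorrhage": 1.93,  # 1 = hemorrhage in a previous pregnancy
    "Residence": 1.11,        # 1 = city
    "Social_status": -1.79,   # 1 = high
    "Weight_gain": 1.07,      # 1 = excess
}


def ptb_risk(profile):
    """Predicted probability of the logistic model: 1 / (1 + exp(-lp)),
    where lp is the intercept plus the coefficients of the positive factors."""
    lp = INTERCEPT + sum(COEF[k] * profile.get(k, 0) for k in COEF)
    return 1.0 / (1.0 + math.exp(-lp))


def odds_ratio(name):
    """Adjusted odds ratio of a binary predictor: exp(coefficient)."""
    return math.exp(COEF[name])


# Hypothetical woman: aged 25-35, not obese, previous hemorrhage,
# living in the city, high social status, no excess weight gain.
woman = {"Age": 1, "BMI": 0, "Pre_Hemmorrhage": 1, "Residence": 1,
         "Social_status": 1, "Weight_gain": 0}
risk = ptb_risk(woman)
```

For this profile the linear predictor is -0.04, which gives a risk just under the 50% threshold; the same woman with a low social status would have a linear predictor of 1.75 and would be flagged.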

Discussion

This work led to a significant improvement in early preterm birth prediction, reaching up to 88%, for multiparous women using routinely collected social, demographic, and health indicators. The model that gave the best PTB prediction together with the lowest number of false positives was used to draw a graphical nomogram that physicians can easily use to screen for high PTB risk. Nevertheless, physicians will need to place about 21% of the total number of multiparous women (those at risk of PTB plus false positives) under stricter medical surveillance.

To achieve this level of PTB prediction, the initial sample was augmented with an up-sampling algorithm. Hence, it is probable that the low PTB prediction of the logistic regression model based on the original data was at least partially due to the low prevalence of preterm birth. This model still predicted the majority class of non-PTB women at levels comparable to reported data on preconception PTB modeling [8].

However, using logistic regression to predict low-prevalence events may lead to meaningless outcomes [12]. Data augmentation by up-sampling randomly increases the number of positive preterm birth profiles in the newly generated dataset without changing the other class, which comprises the women not presenting PTB [13]. This technique has been used successfully in investigations with low or very low prevalence, including with machine learning techniques such as convolutional neural networks [7].

The logistic regression model on low-prevalence data clearly underestimates the general probability [14]. A similar phenomenon was also observed for the LASSO-based model, albeit with a significantly smaller underestimation. Furthermore, the regressions on up-sampled data included a higher number of significant variables to explain the model. The significant variables of the logistic regression corresponded almost exactly to the variables selected by LASSO regularization. However, the final model using the 6 variables selected by LASSO regularization decreased the number of false positives and hence gave the best results for PTB prediction.

The selected covariates that seem to significantly affect PTB in this study were Social status, Pre-hemorrhage, Residence, Weight gain, BMI, and Age. These variables were used to draw a nomogram that can be used to screen multiparous women for PTB. Hence, it seems that access to adequate medical care through a high income and the avoidance of weight problems are key factors to decrease PTB incidence for this group of multiparous women. Nevertheless, although residing in the city may grant easier access to medical care than residing in villages, urban women presented a slightly higher PTB risk. A higher PTB risk in urban areas has also been reported in China [15]. Indicators of excess weight, in terms of BMI or pregnancy weight gain, increase preterm risk, especially when coupled with older pregnancy age [16, 17]. It is noteworthy that, besides social status, the high incidence of hemorrhage in this group of women, which reached 29% and is higher even than in some countries of lower national income [18], led to the highest adjusted odds ratio for PTB, of 6.88 to 10.24 (95% interval).

However, this study presents many limitations. It would be improved by a larger number of women in the sample. On top of the low number of cases, the sample was fairly homogeneous because data are better kept in hospitals that treat a larger number of patients of high social status. We hope that this type of work will encourage health authorities to establish public databases on births in low- to middle-income countries of this type. Pre-eclampsia and diabetes were not used in the models because their very low prevalence affected the interpretation of the models. More variables could be added, such as past PTB, the number of children, stressful work, anxiety, and planned pregnancies, among others. Measurements such as cervical length and cervicovaginal fetal fibronectin should be added to the screening model or at least carried out on the group of women screened by the nomogram.

References

[1] World Health Organization. Born too soon: the global action report on preterm birth. World Health Organization. (2012). https://apps.who.int/iris/handle/10665/44864.

[2] Chawanpaiboon S., Vogel J.P., Moller A.-B., Lumbiganon P., Petzold M., Hogan D., Landoulsi S., Jampathong N., Kongwattanakul K., Laopaiboon M., et al. Global, regional, and national estimates of levels of preterm birth in 2014, a systematic review and modelling analysis. Lancet Glob. Health. 2018;7:e37–e46. doi: 10.1016/S2214-109X(18)30451-0.

[3] Katz, J., Lee, A. C., Kozuki, N., Lawn, J. E., Cousens, S., Blencowe, H., Ezzati, M., Bhutta, Z. A., Marchant, T., Willey, B. A., Adair, L., Barros, F., Baqui, A. H., Christian, P., Fawzi, W., Gonzalez, R., Humphrey, J., Huybregts, L., Kolsteren, P., Mongkolchati, A., … CHERG Small-for-Gestational-Age-Preterm Birth Working Group (2013). Mortality risk in preterm and small-for-gestational-age infants in low-income and middle-income countries: a pooled country analysis. Lancet (London, England), 382(9890), 417–425. https://doi.org/10.1016/S0140-6736(13)60993-9.

[4] Kaplan, Zeynep & Ozgu-Erdinc, A.Seval. Prediction of Preterm Birth: Maternal Characteristics, Ultrasound Markers, and Biomarkers: An Updated Overview. Journal of Pregnancy. 2018. 1-8.

[5] Goldenberg, R. L., Culhane, J. F., Iams, J. D., & Romero, R. Epidemiology and causes of preterm birth. Lancet (2008), 371, 75–84.

[6] Kleinrouweler CE, Cheong-See FM, Collins GS, Kwee A, Thangaratinam S, Khan KS, Mol BW, Pajkrt E, Moons KG, Schuit E. Prognostic models in obstetrics: available, but far from applicable. Am J Obstet Gynecol. 2016 Jan;214(1):79-90.e36. doi: 10.1016/j.ajog.2015.06.013.

[7] Włodarczyk T., Płotka S., Rokita P., Sochacki-Wójcicka N., Wójcicki J., Lipa M., Trzciński T. Spontaneous Preterm Birth Prediction Using Convolutional Neural Networks. In: Hu Y. et al. (eds) Medical Ultrasound, and Preterm, Perinatal and Paediatric Image Analysis. ASMUS 2020, PIPPI 2020. Lecture Notes in Computer Science, vol 12437. Springer, Cham. https://doi.org/10.1007/978-3-030-60334-2_27.

[8] Mehta-Lee, S.S., Palma, A., Bernstein, P.S. et al. A Preconception Nomogram to Predict Preterm Delivery. Matern Child Health J 21, 118–127 (2017). https://doi.org/10.1007/s10995-016-2100-3.

[9] Eleje, G. U., Ikechebelu, J. I., Eke, A. C., Okam, P. C., Ezebialu, I. U., & Ilika, C. P. Cervical cerclage in combination with other treatments for preventing preterm birth in singleton pregnancies. The Cochrane Database of Systematic Reviews, 2017(11), CD012871. https://doi.org/10.1002/14651858.CD012871.

[10] Koullali, B., van Zijl, M.D., Kazemier, B.M. et al. The association between parity and spontaneous preterm birth: a population based study. BMC Pregnancy Childbirth (2020). 20, 233. https://doi.org/10.1186/s12884-020-02940-w.

[11] World Health Organization (WHO), Global Strategy on Diet, Physical Activity and Health [cited 2020 Aug 31], https://www.who.int/dietphysicalactivity/childhood_what/en/.

[12] Doerken S, Avalos M, Lagarde E, Schumacher M. Penalized logistic regression with low prevalence exposures beyond high dimensional settings. PLOS ONE (2019), 14(5): e0217057. https://doi.org/10.1371/journal.pone.0217057.

[13] Gao Cheng, Sarah Osmundson, Digna R. Velez Edwards, Gretchen Purcell Jackson, Bradley A. Malin, You Chen. Deep learning predicts extreme preterm birth from electronic health records, Journal of Biomedical Informatics (2019), Volume 100, 103334, ISSN 1532-0464, https://doi.org/10.1016/j.jbi.2019.103334.

[14] Giordano Francesco, Marcella Niglio & Marialuisa Restaino. A new procedure for variable selection in presence of rare events. Journal of the Operational Research Society (2020). DOI: 10.1080/01605682.2020.1740620.

[15] Li, L., Ma, J., Cheng, Y. et al. Urban–rural disparity in the relationship between ambient air pollution and preterm birth. Int J Health Geogr 19, 23 (2020). https://doi.org/10.1186/s12942-020-00218-0

[16] Masho, S.W., Bishop, D.L. & Munn, M. Pre-pregnancy BMI and weight gain: where is the tipping point for preterm birth?. BMC Pregnancy Childbirth 13, 120 (2013). https://doi.org/10.1186/1471-2393-13-120

[17] Fuchs, F., Monet, B., Ducruet, T., Chaillet, N., & Audibert, F. Effect of maternal age on the risk of preterm birth: A large cohort study. PloS one (2018), 13(1), e0191002. https://doi.org/10.1371/journal.pone.0191002.

[18] Kebede, B. A., Abdo, R. A., Anshebo, A. A., & Gebremariam, B. M. Prevalence and predictors of primary postpartum hemorrhage: An implication for designing effective intervention at selected hospitals, Southern Ethiopia. PloS one (2019), 14(10), e0224579. https://doi.org/10.1371/journal.pone.0224579.
