Nomogram to Early Screen Multiparous Women for Preterm Birth in a Cohort Study

DOI: https://doi.org/10.21203/rs.3.rs-111878/v1

Abstract

Background: Preterm birth (PTB) can negatively affect the health of mothers as well as infants. Prediction of this pregnancy complication remains difficult, especially in low- and middle-income countries, because of limited access to specific tests and scarce data collection. Multiparous women in our study presented a higher PTB prevalence than nulliparous women.

Methods: In a cohort study of 1996 women from Northern Lebanon, 922 were multiparous, with a PTB prevalence of 8%. We analyzed the personal, demographic, and health indicators available for this group of women. We compared four logistic regression models, including variants using up-sampling and LASSO penalization, to develop a nomogram that can screen multiparous women for preterm birth. The models were trained and validated on different data sets.

Results: The best PTB prediction by a logistic regression model reached around 88%. This was obtained using a logistic regression model trained on up-sampled data and restricted to the variables selected by LASSO (Least Absolute Shrinkage and Selection Operator) penalization. The regression coefficients of the 6 selected variables (Pre-hemorrhage, Social status, Residence, Age, BMI, and Weight gain) were used to create a nomogram to screen multiparous women for PTB risk.

Conclusions: The nomogram based on readily available indicators for multiparous women reasonably predicted most of the women at risk of PTB. This tool will allow physicians to screen women at high risk of spontaneous preterm birth and to run adequate additional tests, leading to better medical surveillance that can reduce PTB incidence.

Plain English Abstract

Preterm birth (PTB) is still one of the pregnancy complications that negatively affect the health of mothers as well as infants. Prediction of preterm birth remains difficult, especially in low- and middle-income countries, because of scarce data collection and limited resources to perform advanced clinical tests. In our study, women with at least one child presented a higher PTB prevalence than women in their first pregnancy. In the absence of specific preterm clinical tests, we used collected data on past pregnancy to develop a graphical tool, a nomogram, which can also be used in a spreadsheet to evaluate the risk that these women will undergo a PTB. The evaluation of the nomogram showed promising results, screening 88% of women at risk of PTB using information easily available at the beginning of pregnancy, including past hemorrhage or diabetes, high social status, residence in a city, age higher than 25 years, obese BMI, and excessive weight gain.

Background

Although preterm birth (PTB) prevalence varies widely among countries, it is generally estimated to be between 3 and 13% of total pregnancies [1, 2]. PTB is also among the leading causes of morbidity and mortality in children under 5 years old, particularly in countries with a large number of low- to middle-income households, especially in some Asian and African countries [3].

Screening for PTB remains difficult in the absence of specific tests that would identify potential mothers at high risk of preterm birth, although cervical length and cervicovaginal fetal fibronectin measurements, among others, have been used with some success [4]. Hence, most prediction studies have used maternal factors associated with PTB. Some of these are considered non-modifiable, such as a history of PTB, extremes in maternal age (< 19 and > 35 years) [1], multiple pregnancies, short cervical length, uterine abnormalities, and genetic factors [5]. Factors related to nutrition, socioeconomic status, low body mass index (BMI), obesity, poor pregnancy weight gain, smoking, substance abuse, short inter-pregnancy interval, periodontal disease, bacterial vaginosis, late or no prenatal care, untreated antenatal depression, and the use of assisted reproductive technologies [3] can be preventable with close medical surveillance.

These maternal factors were used to develop models that predict preterm birth [6]. The models range from traditional logistic regression, used to identify the risk factors and estimate odds ratios, to more recent machine learning algorithms including neural networks [7]. Although neural network algorithms have been shown to lead to very high preterm prediction results, it is difficult to develop a simple version that can be used by physicians, especially in developing countries where the gynecologist takes all the decisions. The linear coefficients of logistic regression models have been used in nomograms and spreadsheets to deliver prediction tools that can be used by all physicians [8].

Early detection of PTB can help lower the risks for infants and mothers through corticosteroid administration, cervical cerclage, and other effective treatments [9]. However, because of the low prevalence of PTB, there is a need to screen for the women who should undergo further adequate tests for potential PTB, especially in developing countries with limited resources.

Our retrospective data of 1996 women showed that the 922 multiparous women had a preterm prevalence (8%) more than double that of nulliparous women (3%). Reports on the incidence of PTB and multiparity have been inconsistent and variable [10]. Although there are many models to screen for preterm birth, there is a need for more focused analysis of multiparous women, especially because of the availability of indicators from past pregnancies.

The main objective of this project was to develop a valid and easy-to-use tool for physicians to screen non-nulliparous pregnant women for preterm birth risk based on routinely collected data such as medical history, demographic, and weight parameters. We improved the prediction by training the models on resampled datasets (up-sampling) to mitigate the problem of the low prevalence of preterm birth. We also used logistic regression models regularized with LASSO (Least Absolute Shrinkage and Selection Operator) to help analyze and select the different covariates for the best possible preterm risk evaluation.

Methods

Source of data: Data were obtained from the medical records in five hospitals in North-Lebanon (private and public Islamic hospitals, Sayyidet Zgharta hospital, governmental hospital of Akkar, and governmental hospital of Tripoli).

Participants: In addition to the aforementioned collection of data from medical records, we also collected data directly from 688 women. The participants were chosen in concert with many local gynecologists.

Outcome: The objective was to develop a model that can be used to predict spontaneous preterm risk for multiparous women and that can also be expressed in the form of a nomogram that is easy for physicians to use.

Predictors: The cohort study included binary responses to 15 variables, with the positive class as follows: Age (25–35 years), BMI (obese), Education-husband (high: university degree), Education-mom (high: university degree), Pre-Cesarean (presence in last pregnancy), Pre-Diabetes (presence in last pregnancy), Preeclampsia (presence in last pregnancy), Pre-Hemorrhage (presence in last pregnancy), Pre-Induction (presence in last pregnancy), Residence (city), preterm (spontaneous presence in last pregnancy), smoking (smoker), Social-status (high), Weight-gain (excess), Work-husband (external job), and Work-mom (external job).

The Body Mass Index (BMI) of each woman was calculated using the formula: Weight (kg) / Height² (m²). Women were divided into obese and non-obese weight groups based on WHO guidelines [11] (BMI below or above 30). The underweight group was discarded due to a negligible number of representatives.
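The BMI coding described above can be illustrated with a minimal sketch. The function names and the example weight and height are hypothetical, and treating a BMI of exactly 30 as obese is an assumption, since the text only states "below or above 30".

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI: weight in kilograms divided by the square of height in meters."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# WHO obesity cut-off. Counting a BMI of exactly 30 as obese is an assumption.
OBESITY_CUTOFF = 30.0


def is_obese(weight_kg: float, height_m: float) -> bool:
    """Binary BMI indicator (True = obese), as used for the BMI predictor."""
    return body_mass_index(weight_kg, height_m) >= OBESITY_CUTOFF


# Hypothetical woman: 85 kg, 1.62 m.
example_bmi = body_mass_index(85.0, 1.62)
print(round(example_bmi, 1), is_obese(85.0, 1.62))
```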

Missing data: There were no missing data because records with incomplete data, as well as women aged under 17 or above 35 or suspected of carrying fetuses with congenital malformations, were discarded from the study.

Sample size: The data used in this work were part of a program to evaluate fetal complications of pregnancy in Northern Lebanon. The number of multiparous women was 922 among a total of 1996 who gave birth between January 2014 and January 2016. We divided the multiparous data into two files. The first, called testdata, was composed of 706 profiles originating from the medical records. The second file, which we used for model validation (validationset), comprised the 216 multiparous women, among the 922 profiles, whose data we collected directly from the women.

Statistical analysis methods: All the predictors were coded as binary variables. The first model (glm) was a logistic regression fitted on the testdata file. The second model (glmup) was also a logistic regression, fitted on a new file generated from the testdata file using up-sampling. This file, called upsampleddata, included 1258 profiles representing 649 profiles of non-preterm women and 649 profiles randomly generated by the up-sampling algorithm for women with a preterm birth. The third model (glmnetup) was a LASSO-penalized logistic regression fitted on the upsampleddata file. The final model (glmglmnetup) was a logistic regression, trained on the upsampleddata file, using only the predictors selected by the LASSO-penalized regression.
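As an illustration of the up-sampling idea, the sketch below balances a toy data set by resampling the minority class with replacement. It is not the authors' R code: the function name, the `preterm` and `id` keys, the toy profiles, and the choice to keep the original minority profiles alongside the resampled copies are all assumptions.

```python
import random


def upsample_minority(profiles, label_key="preterm", seed=0):
    """Return a balanced copy of profiles by resampling the minority class.

    Minority profiles are drawn with replacement until both classes have
    the same size; the majority class is left unchanged.
    """
    rng = random.Random(seed)
    positives = [p for p in profiles if p[label_key] == 1]
    negatives = [p for p in profiles if p[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        raise ValueError("both classes need at least one profile")
    extra = rng.choices(minority, k=len(majority) - len(minority))
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 10 profiles, 2 of them preterm.
profiles = [{"id": i, "preterm": 1 if i < 2 else 0} for i in range(10)]
balanced = upsample_minority(profiles)
print(len(balanced), sum(p["preterm"] for p in balanced))
```

Only the training data would be resampled in this way; validation data keep their original class proportions so that the reported detection rates reflect the real prevalence.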

All models were validated first using the testdata and then using the validation data set (validationset). The models were compared in terms of statistically significant predictors along with the percentage of true positives and false positives. Women were classified as at risk of PTB when the predicted risk (probability) was higher than 50%. We also compared the risk distribution profiles given by each model.
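The comparison criteria can be sketched as follows, assuming the preterm percentage is taken over all preterm cases and the false-positive percentage over all women. The function name, the dictionary keys, and the toy risks and outcomes are hypothetical.

```python
def screening_summary(risks, outcomes, threshold=0.5):
    """Summarize screening results at a given risk threshold.

    risks: predicted PTB probabilities; outcomes: 1 = preterm, 0 = term.
    A woman is flagged when her predicted risk is strictly above the threshold.
    """
    flagged = [risk > threshold for risk in risks]
    true_pos = sum(1 for f, y in zip(flagged, outcomes) if f and y == 1)
    false_pos = sum(1 for f, y in zip(flagged, outcomes) if f and y == 0)
    n_total = len(outcomes)
    n_preterm = sum(outcomes)
    return {
        # Share of all preterm cases that the model flags.
        "preterm_detected_pct": 100.0 * true_pos / n_preterm,
        # Share of all women wrongly flagged.
        "false_positive_pct_of_total": 100.0 * false_pos / n_total,
        # Share of all women who would be referred for closer follow-up.
        "flagged_pct_of_total": 100.0 * (true_pos + false_pos) / n_total,
    }


# Hypothetical toy predictions for 8 women, 3 of whom delivered preterm.
toy_risks = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1, 0.3, 0.55]
toy_outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
summary = screening_summary(toy_risks, toy_outcomes)
print(summary)
```

Lowering the `threshold` argument flags more women, which raises both the detected preterm percentage and the false-positive percentage.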

The Chi-square test, Fisher test, and Principal Component Analysis for categorical variables were performed using SPSS. The logistic regression modeling, up-sampling, and LASSO penalization were carried out using R version 3.6.1. The nomogram was created using the lrm function of the rms package in R version 3.6.1.

Results

The multiparous women of the sample represented 46% of the total retrospective data. Despite some overlap, the multiparous women form a distinct group characterized by a relatively lower social status and a higher incidence of gynecological complications, as shown on the projection of the first two principal components (Fig. 1).

This group of 922 multiparous women consisted mostly of urban, rather older, working women with high education in good-income households (Table 1). Most had a university-level education (79%), as did their husbands (81%). About 65% reported having a job. Almost all the husbands reported having a job (96%), and 71% of the households had a high social status (high income). Most women were in the age bracket of 25 to 35 years (64%) and resided in the city (76%). About 33% of the women had an obese BMI, and 47% presented an excessive weight gain during the pregnancy.

Table 1

Percentage of each characteristic for the multiparous women.

Characteristic                 Percentage (positives/total)
Age(25–35 years)               64
BMI(obese)                     33
Education_husband(high)        81
Education_mom(high)            79
Pre_Cesarean(presence)         35
Pre_Diabetes(presence)         5
Pre_Eclampsia(presence)        4
Pre_Hemmorrhage(presence)      29
Pre_Induction(presence)        31
Residence(city)                76
preterm (presence)             8
smoking(smoker)                13
Social_status(high)            71
Weight_gain(excess)            47
Work_husband(external job)     96
Work_mom(external job)         65

The percentage of mothers who smoked during pregnancy was 13%. Gynecological complications during past pregnancies included diabetes (5%) and pre-eclampsia (4%). Approximately 31% of the women had had an induction and 29% a hemorrhage.

There were 75 spontaneous preterm cases among the 922 multiparous women, a PTB prevalence of around 8%, which is more than double the prevalence for nulliparous women. The percentage of women with PTB was slightly higher in the validationset, at about 9.7% (21 among 216), than in the testdata, at 7.6% (54 among 706).

The covariates that presented the largest difference in percentage between the PTB positive and negative classes were Pre-hemorrhage, Weight gain, Age, BMI, and Social status (Fig. 2). The Chi-square test revealed that most of these variables were statistically significant at least at the 5% level (Fig. 2). Smaller, non-statistically significant differences were observed for pre-diabetes, work husband, and pre-eclampsia. Pre-eclampsia and Pre-diabetes were discarded from further modeling analysis because they gave a low prevalence, reaching even 0 for the positive class. It is most likely that women with these indicators were already under surveillance for PTB, which may explain their low prevalence.

 

Table 2

Linear coefficients of each logistic regression model (significant at the level 5% *, 1% ** and 1‰ ***).

Factors              glm         glmup       glmnetup    glmglmnetup
Intercept            -4.56**     -1.39**     -3.72       -1.97***
Age1                 0.54        0.86***     0.33        0.68***
BMI1                 1.07**      0.75***     0.35        0.70***
Education_hus1       -0.52       -0.02       .
Education_mom1       -0.01       0.12        .
Pre_Cesarean1        -0.29       -0.52**     .
Pre_Hemmorrhage1     1.98***     2.11***     1.62        1.93***
Pre_Induction1       -0.12       0.12        .
Residence1           1.27*       1.30***     0.47        1.11***
smoking1             0.12        0.24        .
Social_status1       -1.42**     -1.82***    -1.04       -1.79***
Weight_gain1         1.03*       1.06***     0.76        1.07***
Work_hus1            -0.28       -0.64       .
Work_mom1            -0.14       0.09        .

Models: glm: logistic regression on original data; glmup: logistic regression on up-sampled data; glmnetup: LASSO regression on up-sampled data; glmglmnetup: logistic regression with selected LASSO variables on up-sampled data.

The logistic regression analysis of the original dataset (glm) led to almost the same significant variables as the Chi-square test, except that Age was not significant while Residence was added to the list of significant covariates (Table 2). Despite presenting a high AUC (Area Under the Curve) of 0.84, this logistic model gave a low prediction of PTB that did not exceed 16% for the training set and 12% for the validation dataset. The women of the majority class of non-PTB were classified correctly, which explains the high AUC observed (accuracy higher than 92% for the training and validation datasets).
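The gap between high accuracy and low PTB detection follows from the class imbalance, as a short calculation shows. The sketch below uses the counts reported for the testdata file (54 preterm among 706) and a hypothetical classifier that labels every woman as non-preterm; the function name is illustrative.

```python
def accuracy_and_sensitivity(true_pos, false_pos, true_neg, false_neg):
    """Return (accuracy, sensitivity) from the four confusion-matrix counts."""
    total = true_pos + false_pos + true_neg + false_neg
    accuracy = (true_pos + true_neg) / total
    sensitivity = true_pos / (true_pos + false_neg)
    return accuracy, sensitivity


# Counts reported for the testdata file: 54 preterm births among 706 women.
n_total, n_preterm = 706, 54
n_term = n_total - n_preterm

# Hypothetical classifier that predicts "non-preterm" for every woman.
accuracy, sensitivity = accuracy_and_sensitivity(
    true_pos=0, false_pos=0, true_neg=n_term, false_neg=n_preterm
)
print(f"accuracy = {accuracy:.3f}, sensitivity = {sensitivity:.3f}")
```

Such a classifier is right for every term birth yet detects no preterm case, so accuracy alone says little about screening performance at this prevalence.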

Table 3

Values of preterm and non-preterm (false positives) prediction for the different models.

Models*        Preterm (percent in total preterm)    False Positives    AUC
               Test set       Validation set         (percent total)
glm            16             12                     1                  0.841
glmup          78             92                     25                 0.846
glmnetup       80             92                     25                 0.837
glmglmnetup    76             88                     21                 0.84

*glm: logistic regression on original data; glmup: logistic regression on up-sampled data; glmnetup: LASSO regression on up-sampled data; glmglmnetup: logistic regression with selected LASSO variables on up-sampled data.

In contrast, after creating a balanced sample using the up-sampling algorithm and running the logistic model (glmup) on these datasets, the results were notably improved for the PTB prediction (Table 3). Indeed, PTB prediction ranged from 78% for the training set to 92% for the validation dataset, although the number of misclassified non-PTB women increased markedly, from a few cases for the first model (glm) to about 25% of the total number of pregnant women for this regression model. Comparable results were obtained with the LASSO regularized model (glmnetup) and with the logistic regression using the variables selected by the LASSO regularization (glmglmnetup). The latter gave the lowest number of false positives among the up-sampled models (about 21%) while maintaining high PTB prediction (Table 3), but the accuracy still decreased to around 79%.

The comparison of the distribution of the PTB risk estimated by each model with the original data (Fig. 3) showed that logistic regression before up-sampling (glm) and the LASSO model (glmnetup) generally underestimated the probabilities in comparison to the other models. Even the last logistic model using the LASSO-selected variables slightly underestimated those probabilities. However, both logistic regressions on up-sampled data, before or after LASSO regularization, gave a risk distribution closer to the original data than the other models (Fig. 3).

Along with the improvement of preterm prediction, the number of statistically significant covariates (at least at the 5% level) also increased from 5 for glm to 10 for glmup, but glmnetup reduced this number to 6 (Table 2). The regression model using the variables selected by LASSO (glmglmnetup) was used to develop a nomogram (Fig. 4). The validation of this nomogram using the data of this study showed that it can give a reasonably accurate risk of PTB for a multiparous woman, given her levels of Social status, Residence, Pre-hemorrhage, Age, BMI, and Weight gain.
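A nomogram of this kind is a graphical form of the logistic formula, so the same risk can be computed directly, for example in a spreadsheet. The sketch below uses the glmglmnetup coefficients reported in Table 2. The function name and the example profile are hypothetical, and because the model was trained on up-sampled data, the output is best read as a screening score compared against the 50% threshold rather than as a calibrated population risk.

```python
import math

# glmglmnetup coefficients as reported in Table 2 (binary indicators, 1 = positive class).
INTERCEPT = -1.97
COEFFICIENTS = {
    "Age": 0.68,
    "BMI": 0.70,
    "Pre_Hemmorrhage": 1.93,
    "Residence": 1.11,
    "Social_status": -1.79,
    "Weight_gain": 1.07,
}


def ptb_risk(profile):
    """Return the predicted PTB probability for a dict of 0/1 indicators.

    Indicators missing from the profile are treated as 0 (an assumption
    made for this illustration).
    """
    unknown = set(profile) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown indicators: {sorted(unknown)}")
    linear = INTERCEPT + sum(
        coef * profile.get(name, 0) for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical profile: city resident, high social status, past hemorrhage.
example_profile = {"Pre_Hemmorrhage": 1, "Residence": 1, "Social_status": 1}
example_risk = ptb_risk(example_profile)
print(f"risk = {example_risk:.2f}, flagged = {example_risk > 0.5}")
```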

Discussion

The results of this work led to a significant improvement of early preterm birth prediction, reaching up to 88%, for multiparous women using routinely collected social, demographic, and health indicators. The model that led to the best result for PTB prediction with the lowest number of false positives was used to draw a graphical nomogram that could be easily used by physicians to screen for high-risk PTB. Nevertheless, physicians would need to place about 21% of the total number of multiparous women (those at risk of PTB plus false positives) under stricter medical surveillance.

To achieve this level of PTB prediction, data augmentation of the initial sample through up-sampling algorithms was used. Hence, it is probable that the low PTB prediction of the logistic regression model based on the original data was at least partially due to the low prevalence of preterm birth. This model still predicted the majority class of non-PTB women with levels comparable to reported data on preconception PTB modeling [8].

However, using logistic regression to predict low prevalence events may lead to meaningless outcomes [12]. Data augmentation by up-sampling randomly increases the number of positive preterm birth profiles in the newly generated dataset without changing the other class comprising women not presenting PTB [13]. This technique has been successfully used in investigations with low or very low prevalence, including some machine learning techniques such as convolutional neural networks [7].

The logistic regression model on low prevalence data clearly under-estimates the general probability [14]. A similar phenomenon was also observed for the Lasso based model, albeit with significantly smaller under-estimation. Furthermore, the regressions on up-sampled data included a higher number of significant variables to explain the model. The number of significant variables by logistic regression almost exactly corresponded to the variables selected by Lasso regularization. However, the final model using the 6 selected variables from Lasso regularization decreased the number of false positives and hence gave the best results for PTB prediction.
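How LASSO regularization selects variables can be illustrated with a small, self-contained sketch. It fits an L1-penalized logistic regression by proximal gradient descent, a generic technique chosen for readability rather than the algorithm used by the R software in the study. The function names, the penalty value, and the toy data are hypothetical.

```python
import math


def soft_threshold(value, penalty):
    """Shrink value toward zero; values within the penalty become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso_logistic(rows, labels, penalty, step=0.5, iterations=3000):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n_rows, n_features = len(rows), len(rows[0])
    intercept = 0.0
    weights = [0.0] * n_features
    for _ in range(iterations):
        grad_intercept = 0.0
        grad_weights = [0.0] * n_features
        for features, label in zip(rows, labels):
            linear = intercept + sum(w * x for w, x in zip(weights, features))
            error = 1.0 / (1.0 + math.exp(-linear)) - label
            grad_intercept += error
            for j, x in enumerate(features):
                grad_weights[j] += error * x
        # Gradient step on the mean log-loss; the intercept is not penalized.
        intercept -= step * grad_intercept / n_rows
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        weights = [
            soft_threshold(w - step * g / n_rows, step * penalty)
            for w, g in zip(weights, grad_weights)
        ]
    return intercept, weights


# Hypothetical toy data as (x0, x1, label, count): x0 is informative,
# x1 is balanced within every (x0, label) cell and carries no signal.
cells = [(1, 0, 1, 4), (1, 1, 1, 4), (1, 0, 0, 1), (1, 1, 0, 1),
         (0, 0, 1, 1), (0, 1, 1, 1), (0, 0, 0, 4), (0, 1, 0, 4)]
rows, labels = [], []
for x0, x1, label, count in cells:
    rows += [[x0, x1]] * count
    labels += [label] * count

intercept, weights = fit_lasso_logistic(rows, labels, penalty=0.05)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(intercept, weights, selected)
```

In this toy setting the uninformative feature is expected to be shrunk to exactly zero, so only the informative one is kept. The same mechanism would explain a model retaining 6 of the candidate covariates.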

The selected covariates that seem to significantly affect PTB in this study were Social status, Pre-hemorrhage, Residence, Weight gain, BMI, and Age. These variables were used to draw a nomogram that can be used to screen multiparous women for PTB. Hence, it seems that access to adequate medical care through a high income and the avoidance of weight problems are key factors to decrease PTB incidence for this group of multiparous women. Nevertheless, although residing in the city may grant easier access to medical care than residing in a village, urban women presented a slightly higher PTB risk. A higher PTB risk in urban areas has also been reported in China [15]. Indicators of excess weight in terms of BMI or pregnancy weight gain, especially when coupled with older pregnancy age, increase preterm risk [16, 17]. It is noteworthy that, besides social status, the high incidence of hemorrhage in this group of women, which reached 29% and is higher even than in some countries of lower national income [18], led to the highest adjusted odds ratio for PTB, of 6.88 to 10.24 (95% interval).
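For a binary indicator in a logistic regression, the adjusted odds ratio is the exponential of its coefficient. The sketch below applies this to the rounded glmglmnetup coefficients in Table 2; the variable names are illustrative, and small differences from the figures quoted in the text can come from rounding of the published coefficients.

```python
import math

# glmglmnetup coefficients as reported in Table 2 (rounded to two decimals).
coefficients = {
    "Age": 0.68,
    "BMI": 0.70,
    "Pre_Hemmorrhage": 1.93,
    "Residence": 1.11,
    "Social_status": -1.79,
    "Weight_gain": 1.07,
}

# Odds ratio = exp(coefficient); values below 1 indicate a lower odds of PTB.
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda item: -item[1]):
    print(f"{name:16s} {ratio:5.2f}")
```

With these rounded inputs, Pre_Hemmorrhage gives an odds ratio of roughly 6.9, in line with the lower figure quoted above, while the ratio below 1 for Social_status corresponds to its negative coefficient.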

However, this study presents many limitations. It would be improved with a higher number of women in the sample. On top of the low number of cases, the sample was fairly homogeneous because data are better kept in hospitals treating a larger number of high social status patients. We hope that this type of work will encourage health authorities to establish public databases on births in low- to middle-income countries of this type. Pre-eclampsia and Diabetes were not used in the models because their very low prevalence affected the interpretation of the models. More variables could be added, such as past PTB, the number of children, stressful work, anxiety, and planned pregnancies, among others. Measurements such as cervical length and cervicovaginal fetal fibronectin should be added to the screening model or at least carried out on the group of women screened by the nomogram.

Conclusion

Using readily available information from past pregnancy along with social and weight indicators, we developed a nomogram that can be used to screen for PTB risk in multiparous women. The best logistic regression model, which was used to develop the nomogram, showed that a group representing about 1/5 of the total number of women included 88% of the women at high PTB risk. This group, which was identified based on a risk threshold higher than 50%, should undergo additional tests or at least closer medical surveillance for PTB. The number of women could be adjusted as a function of the health care capacity by decreasing or increasing the probability threshold using the nomogram. The nomogram uses the binary response to 6 covariates: Social status, Pre-hemorrhage, Residence, Weight gain, BMI, and Age.

To achieve a reasonably high prediction for PTB, the logistic regression was trained on a sample augmented by up-sampling, and LASSO penalization was used to help select the final covariates. These methods have proven their effectiveness in diseases or health complications that present low or very low prevalence.

List of Abbreviations

PTB: PreTerm Birth

AUC: Area Under the Curve

BMI: Body Mass Index

LASSO: Least Absolute Shrinkage and Selection Operator

Declarations

Ethics approval and consent to participate: The data for the study were collected in Lebanon. We had the approval of the Ethics Committee of the Global University, Faculty of health science composed of:

Dr. Ahmad Chatila

Dr. Nisreen Adada

Dr. Mohammed Kanaan

Ms. Zahira Manasfi

Consent for publication: Not applicable because the data used were non-nominal.

Availability of data and materials: Data are available on request from Mrs. Traboulsi Mayssa.

Competing interests: The authors declare that they do not have any conflict of interest regarding the data published in this work.

Funding: No special funding was used for the study.

Authors' contributions:

Mrs Traboulsi Mayssa: is a Ph.D. candidate that collected the data, participated in the design and write up of this work.

Pr. Zainab E. El Alaoui- Talibi: is the Ph.D. main advisor, participated in the design and write up of this work.

Pr. Boussaid Abdellatif: is the Ph.D. co-advisor, participated in the design and write up of this work. Executed and helped in the interpretation of the statistical analyses.

Acknowledgements: The authors thank the Hospitals that helped us collect the data for this work.

References

[1] World Health Organization. Born too soon: the global action report on preterm birth. World Health Organization. (2012). https://apps.who.int/iris/handle/10665/44864.

[2] Chawanpaiboon S., Vogel J.P., Moller A.-B., Lumbiganon P., Petzold M., Hogan D., Landoulsi S., Jampathong N., Kongwattanakul K., Laopaiboon M., et al. Global, regional, and national estimates of levels of preterm birth in 2014, a systematic review and modelling analysis. Lancet Glob. Health. 2018;7:e37–e46. doi: 10.1016/S2214-109X(18)30451-0. 

[3] Katz, J., Lee, A. C., Kozuki, N., Lawn, J. E., Cousens, S., Blencowe, H., Ezzati, M., Bhutta, Z. A., Marchant, T., Willey, B. A., Adair, L., Barros, F., Baqui, A. H., Christian, P., Fawzi, W., Gonzalez, R., Humphrey, J., Huybregts, L., Kolsteren, P., Mongkolchati, A., … CHERG Small-for-Gestational-Age-Preterm Birth Working Group (2013). Mortality risk in preterm and small-for-gestational-age infants in low-income and middle-income countries: a pooled country analysis. Lancet (London, England), 382(9890), 417–425. https://doi.org/10.1016/S0140-6736(13)60993-9.

[4] Kaplan, Zeynep & Ozgu-Erdinc, A.Seval. Prediction of Preterm Birth: Maternal Characteristics, Ultrasound Markers, and Biomarkers: An Updated Overview. Journal of Pregnancy. 2018. 1-8.

[5] Goldenberg, Robert; Culhane, Jennifer; Iams, Jay; Romero, Roberto. Epidemiology and Causes of Preterm Birth. Lancet (2008). 371:75-84.

[6] Kleinrouweler CE, Cheong-See FM, Collins GS, Kwee A, Thangaratinam S, Khan KS, Mol BW, Pajkrt E, Moons KG, Schuit E. Prognostic models in obstetrics: available, but far from applicable. Am J Obstet Gynecol. 2016 Jan;214(1):79-90.e36. doi: 10.1016/j.ajog.2015.06.013.

[7] Włodarczyk Tomasz, Płotka Szymon, Rokita Przemysław, Sochacki-Wójcicka Nicole, Wójcicki Jakub, Lipa Michał, Trzciński Tomasz. Spontaneous Preterm Birth Prediction Using Convolutional Neural Networks. In: Hu Y. et al. (eds) Medical Ultrasound, and Preterm, Perinatal and Paediatric Image Analysis. ASMUS 2020, PIPPI 2020. Lecture Notes in Computer Science, vol 12437. Springer, Cham. https://doi.org/10.1007/978-3-030-60334-2_27.

[8] Mehta-Lee, S.S., Palma, A., Bernstein, P.S. et al. A Preconception Nomogram to Predict Preterm Delivery. Matern Child Health J 21, 118–127 (2017). https://doi.org/10.1007/s10995-016-2100-3.

[9] Eleje, G. U., Ikechebelu, J. I., Eke, A. C., Okam, P. C., Ezebialu, I. U., & Ilika, C. P. Cervical cerclage in combination with other treatments for preventing preterm birth in singleton pregnancies. The Cochrane Database of Systematic Reviews, 2017(11), CD012871. https://doi.org/10.1002/14651858.CD012871.

[10] Koullali, B., van Zijl, M.D., Kazemier, B.M. et al. The association between parity and spontaneous preterm birth: a population based study. BMC Pregnancy Childbirth (2020). 20, 233. https://doi.org/10.1186/s12884-020-02940-w.

[11] World Health Organization (WHO), Global Strategy on Diet, Physical Activity and Health [cited 2020 Aug 31], https://www.who.int/dietphysicalactivity/childhood_what/en/.

[12] Doerken S, Avalos M, Lagarde E, Schumacher M Penalized logistic regression with low prevalence exposures beyond high dimensional settings. PLOS ONE (2019) .14(5): e0217057. https://doi.org/10.1371/journal.pone.0217057.

[13] Gao Cheng, Sarah Osmundson, Digna R. Velez Edwards, Gretchen Purcell Jackson, Bradley A. Malin, You Chen. Deep learning predicts extreme preterm birth from electronic health records, Journal of Biomedical Informatics (2019), Volume 100, 103334, ISSN 1532-0464, https://doi.org/10.1016/j.jbi.2019.103334.

[14] Giordano Francesco, Marcella Niglio & Marialuisa Restaino A new procedure for variable selection in presence of rare events, Journal of the Operational Research Society, (2020). DOI: 10.1080/01605682.2020.1740620.

[15] Li, L., Ma, J., Cheng, Y. et al. Urban–rural disparity in the relationship between ambient air pollution and preterm birth. Int J Health Geogr 19, 23 (2020). https://doi.org/10.1186/s12942-020-00218-0

[16] Masho, S.W., Bishop, D.L. & Munn, M. Pre-pregnancy BMI and weight gain: where is the tipping point for preterm birth?. BMC Pregnancy Childbirth 13, 120 (2013). https://doi.org/10.1186/1471-2393-13-120

[17] Fuchs, F., Monet, B., Ducruet, T., Chaillet, N., & Audibert, F. Effect of maternal age on the risk of preterm birth: A large cohort study. PloS one (2018), 13(1), e0191002. https://doi.org/10.1371/journal.pone.0191002.

[18] Kebede, B. A., Abdo, R. A., Anshebo, A. A., & Gebremariam, B. M. Prevalence and predictors of primary postpartum hemorrhage: An implication for designing effective intervention at selected hospitals, Southern Ethiopia. PloS one (2019), 14(10), e0224579. https://doi.org/10.1371/journal.pone.0224579.