Modeling The Predictors of Preterm Delivery Based On The Pregnancy Care Data

Introduction: Effective reduction of prenatal mortality, comorbid complications and associated expenses requires determining the causative agents and identifying the risk of preterm delivery early in pregnant women. Methodology: A cross-sectional study was conducted on the cases of 5651 mothers. Data were collected from the information recorded in the patient files during 2016-2020 in the community health centers of Gonabad. A logistic regression model was employed to determine the factors associated with preterm delivery. The external validity of the model was analyzed independently on data collected from 100 postpartum women. Findings: Preterm delivery occurred in 11.5% of the deliveries studied (649 deliveries). The multiple logistic regression model included 7 variables for predicting preterm birth. The developed model was checked for internal and external validity using accuracy, specificity, sensitivity, positive predictive value, negative predictive value and the area under the ROC curve. In the training data, accuracy, specificity, sensitivity, positive predictive value, negative predictive value and the area under the ROC curve were 65.4%, 65.8%, 66.0%, 19.3%, 92.9% and 0.681, respectively. Conclusion: This model uses the 7 most important independent predictors of preterm delivery, and its performance is relatively good. Using the variables of this study, health care workers can identify mothers at risk of preterm delivery and take the necessary measures to protect them. The results of this research can initiate further studies in this field.


Introduction
For pregnant women, preterm delivery is considered an unpredictable and inevitable event (1). It is defined as the presence of uterine contractions of sufficient frequency and intensity to cause progressive cervical effacement and dilatation before the 37th week (259 days) of gestation (2,3). In 2016, the global number of preterm deliveries was estimated at approximately 15 million a year (4). The reported prevalence of preterm delivery in Iran differs between studies conducted in different regions of the country, and according to several studies, it has been 8.7% in ... Preterm delivery is an obstetrical complication with complex causes, in which a set of individual, behavioral and psychological, environmental, biological and genetic factors, medical conditions, and infertility treatment play a major role (9). The numerous complications and long-term consequences of preterm delivery include respiratory distress syndrome, necrotizing enterocolitis, retinopathy, and long-term hearing and vision impairment in preterm infants, leading to increased neonatal mortality (10,11).
Prediction is the process of analyzing quantitative and qualitative data and making decisions about future events based on past and present inputs. Prediction methods are categorized into two major groups: qualitative and quantitative (univariate and multivariate) (12). Predictive models provide patients and gynecologists with accurate information to discuss the expected outcomes and how to manage each situation (13). Because many factors contribute to preterm delivery, accurate prediction of preterm delivery remains elusive (14).
Using PopBayes, which is based on a Bayesian filtering algorithm, and a previously collected dataset that included demographic, clinical and genetic factors, Lee et al. (2011) developed a spontaneous preterm delivery prediction model. They concluded that this instrument is a useful and adjustable tool for predicting preterm deliveries. That study did not examine important sociodemographic factors such as education, occupation, income and smoking (15). Another study conducted in 2018 in Canada concluded that although combining socioeconomic status with personal predictors improves overall preterm delivery prediction compared to using personal predictors alone, the detection rate does not suffice for public health or clinical use. The authors suggested that a model with better predictive ability is needed to effectively find women at risk of preterm delivery (16).
Complications related to preterm delivery can be a heavy burden on limited health resources. If mothers at risk of preterm delivery are identified before their uterine activity begins, pregnancy outcomes can be improved through proper management. More accurate prediction is necessary to identify high-risk individuals and take the necessary measures to reduce preterm delivery complications, which requires further studies. This study aims to investigate the simultaneous effect of different variables in predicting preterm delivery using an appropriate statistical model, so that its results can be used to improve the efficiency of health systems in predicting preterm delivery.

Methodology
Research environment: This is a cross-sectional study conducted in the medical school of Gonabad University of Medical Sciences in Iran. The main purpose of this study was to build a model of preterm delivery predictors based on pregnancy care data. This study was approved by the Department of Health and the Vice Chancellor for Health of Gonabad University of Medical Sciences.

Data source:
The study covered community health centers in urban and rural areas served by Gonabad University of Medical Sciences in eastern Iran. These centers are the main providers of prenatal care in the Iranian health network system. Reproductive care coverage for Iranian women in these community health centers is complete. Since the introduction of the SIB system, information on the use of reproductive care in these centers has been recorded electronically. Data were collected from the SIB system (Integrated Health System) of the Department of Health, in which pregnancy, delivery and postpartum care information for mothers was recorded online during 2016-17 in Gonabad community health centers. Using a sequential sampling method and based on the criteria of the research unit's selection form, the authors identified the cases eligible for inclusion in the study. The required information was collected using the checklist of preterm delivery predictive factors.

Inclusion and exclusion criteria:
Pregnancies terminated due to abortion, fetal death or placental abruption, and mothers with medical diseases (epilepsy, heart failure, gastrointestinal diseases, kidney diseases, diabetes, hypertension), intrauterine growth restriction (IUGR), placental abnormalities (previa, increta, percreta and accreta) or cervical cerclage were excluded from the study. In addition, 1464 cases were excluded due to poor or incomplete registration of the reported pregnancy information.

Predictors:
Spontaneous preterm delivery is defined as delivery before the 37th week of gestation, which includes all deliveries resulting from the onset of spontaneous contractions with or without premature rupture of the amniotic membranes. Gestational age was determined based on the last menstrual period or via ultrasound measurement of crown-rump length at a gestational age of 12 weeks.
The potential predictor variables of pregnancy care found in the SIB system included maternal age (younger than 18 years, 18-35 years and older than 35 years), mother's education, place of residence, maternal parity, orthopnea, the rate of multiple births, gestational diabetes, urine culture, amniotic fluid volume (AFV), the interval between the last health care session and delivery, concurrence of pregnancy with contraception, maternal gestational weight gain (GWG), rapid weight gain, assisted reproduction techniques (ARTs), antiphospholipid antibodies, abdominal and flank pains, premature rupture of fetal membranes, the final score of thromboembolism screening, frequency of health care, the score of pregnancy complications in the third trimester of pregnancy, a history of stillbirths, a history of abortions, a history of preeclampsia, a history of babies weighing less than 2500 g, a history of ectopic pregnancy, and taking supplements during pregnancy.

Model construction and validation:
The data collection tool was a researcher-made checklist based on scientific texts and papers. It was provided to 10 experts in the field to assess content validity and determine the content validity ratio (CVR) and content validity index (CVI). The final checklist data were entered in SPSS (version 16). Chi-square or Fisher's exact tests were employed to evaluate the relationship between preterm delivery and qualitative variables, and the Mann-Whitney test was used to examine the relationship with quantitative variables. Because preterm delivery was a binary response variable, logistic regression was employed to determine the factors related to preterm delivery. First, the data were randomly divided into two parts: 70% (training data) were used to build the model, and 30% (experimental, or test, data) were used for internal evaluation of the model. Then, the relationship between each independent variable and preterm delivery was investigated separately through a simple logistic regression model. The variables that were significant at the 0.05 level were entered into the multiple logistic regression model, and the association between these variables and preterm delivery was investigated at the 0.05 significance level. Model evaluation was reported based on accuracy, specificity, sensitivity, positive predictive value, negative predictive value and the area under the ROC curve. Finally, the preterm delivery predictors model was externally validated on separate data of 100 women who had given birth and whose pregnancy information was recorded in the SIB system of the Department of Health.
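The split and the threshold-based evaluation measures described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the SPSS procedure the authors ran; the function names `split_70_30` and `classification_metrics`, the random seed and the toy labels are all assumptions made for the example.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of the records and split it into 70% / 30% parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred):
    """Threshold-based metrics for binary labels (1 = preterm, 0 = term)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # 5651 record indices stand in for the mothers' records.
    train, test = split_70_30(range(5651))
    print(len(train), len(test))

    # Toy labels, invented purely for illustration.
    truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    predicted = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
    print(classification_metrics(truth, predicted))
```

Predicted labels are obtained by applying a probability cut-off to the model output, so all five measures depend on the chosen cut-off, whereas the area under the ROC curve does not.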

Findings
A total of 5651 mothers who had given birth and whose pregnancy information was recorded in the SIB system of the Ministry of Health in Gonabad community health centers during the 2016-2017 period were studied (Fig. 1). Preterm delivery occurred in 649 cases (11.5%). Table 1 summarizes the characteristics of the women included in our study.
Based on the results of the multiple logistic regression model, seven variables were identified as preterm delivery predictors. The odds of preterm birth in mothers aged over 35 years were 1.5 times those of mothers aged 18-35 years (P=0.018). In multiple births, the odds of preterm delivery were 45.1 times those in singleton deliveries (P<0.001). In mothers with gestational diabetes, the odds of preterm delivery were 1.7 times those of mothers without gestational diabetes (P=0.002). In pregnant mothers with decreased amniotic fluid volume, the odds of preterm delivery were 6.2 times those of mothers with normal amniotic fluid volume (P=0.006). In pregnant women whose interval between the last health care session and delivery was seven days or more, the odds of preterm delivery were 2.6 times those of women with a shorter interval (P<0.001). Pregnant mothers whose weight gain during pregnancy was less than recommended had higher odds of preterm birth than mothers who had gained as much weight as recommended (OR=1.4, P=0.036). Also, for every unit increase in the pregnancy complication score in the third trimester, the odds of preterm delivery increased 2.4-fold (P=0.004). The results of the logistic regression model for the variables that were significant at the 0.05 level in the simple and multiple regression models are presented in Table 2. In the training data, accuracy was 65.4%, specificity was 65.8%, sensitivity was 66.0%, positive predictive value was 19.3%, negative predictive value was 92.9% and the area under the ROC curve was 0.681 (Fig. 2). In the experimental data, accuracy was 65.3%, specificity was 66.0%, sensitivity was 60.2%, positive predictive value was 17.6%, negative predictive value was 93.2% and the area under the ROC curve was 0.682 (Fig. 3).
Also, in the external validation data, accuracy was 66.0%, specificity was 65.6%, sensitivity was 70.0%, positive predictive value was 18.4%, and negative predictive value was 95.2%.
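To make the reported area under the ROC curve easier to interpret, the sketch below computes it through its rank-based definition: the proportion of (preterm, term) pairs in which the preterm case receives the higher predicted probability, with ties counted as one half. Under this reading, an area of about 0.68 means the model ranks the preterm case higher in roughly 68% of such pairs. The function name `roc_auc` and the example scores are illustrative assumptions, not the authors' SPSS output.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    y_true: binary labels (1 = preterm, 0 = term).
    scores: predicted probabilities or any monotone risk score.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # preterm case ranked above term case
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Four invented cases: two term (0) and two preterm (1) deliveries.
    labels = [0, 0, 1, 1]
    risk = [0.10, 0.40, 0.35, 0.80]
    print(roc_auc(labels, risk))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a dataset of a few thousand records; a sort-based version would be preferable for much larger data.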

Discussion
In the univariate logistic regression model, maternal age of less than 18 years and more than 35 years, multiparity, multiple births, gestational diabetes, positive urine culture, decreased amniotic fluid volume, an interval of 7 days or more between the last health care session and delivery, maternal weight gain less than recommended and the score of pregnancy complications in the third trimester of pregnancy had a statistically significant relationship with preterm delivery.
In the multiple regression model, maternal age of more than 35 years, multiple births, gestational diabetes, decreased amniotic fluid volume, an interval of 7 days or more between the last health care session and delivery, maternal weight gain less than recommended and the score of pregnancy complications in the third trimester of pregnancy were effective in predicting preterm delivery. These results are in line with, and confirm, the results of previous studies. For example, in (17-22), age over 35 years was an independent risk factor for preterm delivery. In our study, age over 35 years was an independent risk factor for preterm delivery with an odds ratio of 1.5 (P=0.018). In this age group, the increase in preterm delivery is due to maternal and fetal causes such as premature and preterm rupture of fetal membranes and maternal diseases (23). In (19,24-26), multiple births were an independent risk factor for preterm delivery. In our study, the odds of preterm delivery in multiple births were 45.1 times those in single births (P<0.001). Excessive uterine distension leads to early activation of the placental-fetal endocrine cascade. Premature increases in distension and endocrine activity may trigger events that alter the timing of uterine activity, including premature cervical ripening (27). Gestational diabetes was an independent risk factor for preterm delivery in (28-30). In our study, gestational diabetes was an independent risk factor for preterm delivery with an odds ratio of 1.7 (P=0.002). In mothers with gestational diabetes, medically indicated preterm deliveries due to obstetric or medical complications are more likely to occur (27). In (31,32), decreased amniotic fluid volume was a significant independent risk factor for preterm delivery. In our study, the odds of preterm delivery in pregnancies with reduced amniotic fluid volume were 6.2 times those in pregnancies with normal amniotic fluid volume (P=0.006).
In pregnancies with reduced amniotic fluid volume, fetal growth restriction and nonreassuring fetal heart rate (FHR) patterns are more likely, and as a result, preterm delivery is also more likely to occur (27). In this study, the interval between the last health care session and delivery was significant: in women whose last health care session was 7 days or more before delivery, the odds of preterm delivery were 2.6 times those of women whose last session was less than 7 days before delivery (P<0.001). Appropriate and adequate prenatal care and identification of risk factors can reduce the chance of preterm delivery (33). In this study, pregnancy complications in the third trimester were significant, and the odds of preterm delivery in women with third-trimester complications were 2.4 times those of women without such complications (P=0.004). Premature and preterm rupture of fetal membranes is one of the major causes of preterm delivery; 30-35% of preterm deliveries occur after preterm rupture of the membranes. Abdominal and flank pains before preterm delivery can be due to contractions at the onset of labor or inflammation due to a urinary tract infection (UTI) (27). In several studies (34-38), maternal weight gain less than recommended was a significant independent risk factor for preterm delivery. In our study, the odds of preterm delivery in mothers with weight gain lower than the recommended amount were 1.4 times those of mothers who gained the recommended weight during pregnancy (P=0.036).
Weight gain less than recommended during pregnancy can indicate abnormal pregnancy physiology, maternal stress, depression, or inadequate or excessive nutrient intake, which can play a significant role in the occurrence of preterm labor (39).
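The odds ratios discussed above combine multiplicatively in a logistic regression model, because the model is additive on the log-odds scale. The sketch below illustrates this arithmetic with the adjusted odds ratios reported in the Findings. It assumes a model without interaction terms, and because the model intercept is not reported here, the baseline probability used in the example is purely hypothetical; the dictionary keys and function names are also invented for illustration.

```python
import math

# Adjusted odds ratios as reported in the Findings section.
ODDS_RATIOS = {
    "age_over_35": 1.5,
    "multiple_birth": 45.1,
    "gestational_diabetes": 1.7,
    "decreased_amniotic_fluid": 6.2,
    "last_visit_7_days_or_more": 2.6,
    "low_gestational_weight_gain": 1.4,
    "third_trimester_complication_score": 2.4,  # per unit of the score
}


def combined_odds_ratio(profile):
    """Combine odds ratios by summing log-odds (assumes no interactions).

    profile maps a predictor name to its value: 1 for a present binary
    factor, or the number of score units for the complication score.
    """
    log_or = sum(math.log(ODDS_RATIOS[name]) * value
                 for name, value in profile.items())
    return math.exp(log_or)


def probability_from_odds_ratio(baseline_prob, odds_ratio):
    """Apply an odds ratio to a baseline probability and return a probability."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # A mother over 35 with gestational diabetes: 1.5 * 1.7 = 2.55.
    profile = {"age_over_35": 1, "gestational_diabetes": 1}
    ratio = combined_odds_ratio(profile)
    print(round(ratio, 2))

    # HYPOTHETICAL baseline risk of 8% for a mother with no risk factors.
    print(round(probability_from_odds_ratio(0.08, ratio), 3))
```

An odds ratio is not a risk ratio: with the hypothetical 8% baseline, a combined odds ratio of 2.55 raises the probability to about 18%, not to 2.55 times 8%.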
In the implemented regression model, accuracy is 65.4%, sensitivity is 66.0%, specificity is 65.8%, positive predictive value is 19.3%, negative predictive value is 92.9% and the area under the ROC curve is 0.681, which indicates relatively good performance of the model. In a study by Mercer et al. (1996), sensitivity was 24.2%, specificity was 92.1%, positive predictive value was 30.8% and negative predictive value was 89.4% for multiparous women, and sensitivity was 18.2%, specificity was 95.4%, positive predictive value was 33.3% and negative predictive value was 90.2% for nulliparous women (40). In that study, the history of preterm birth was evaluated and applied in the model. Preterm delivery history is a very important factor in predicting preterm delivery, but it is not recorded for the mothers whose pregnancy information is registered in the SIB system of the Department of Health, and therefore it has not been included in our research model. Although the specificity and positive predictive value of Mercer's models were higher than those of our study, their sensitivity was lower, and the sensitivity and negative predictive value of our model were higher than those of both of Mercer's models.
Celik et al. (2008) employed cervical length and obstetric history variables, and the area under the curve of their severe, early, moderate and mild preterm delivery models was 0.919, 0.836, 0.819 and 0.650, with a sensitivity of 80.6%, 58.5%, 53.0% and 28.6%, respectively (41). In that study, cervical length and preterm delivery history were evaluated and used in the model. Cervical length is also a very important factor in predicting preterm delivery, but it is not recorded in all pregnancy care files of mothers, and thus it has not been included in the present study. The area under the curve of our model was higher than that of the mild preterm delivery model of that study, and the sensitivity of our model was higher than that of its mild, moderate and early preterm delivery models as well. In Schaaf et al. (2012), the area under the curve was 0.63, the positive predictive value was 19.4%, the negative predictive value was 96.3%, the sensitivity was 4.2% and the specificity was 99.3% for the cut-off point of 0.1, and the positive predictive value was 25.8% and the negative predictive value was 96.2% for the cut-off point of 0.4 (42). That model also used preterm delivery history as a variable. Our model has a higher area under the curve and higher sensitivity than Schaaf's model at the cut-off point of 0.1, and the positive predictive values of the two models at that cut-off point are almost the same.
In Lee et al. (2011), in model one for gestational age of less than 37 weeks, the sensitivity was 68.8%, the specificity was 85.0%, the positive predictive value was 50.8% and the negative predictive value was 92.4%; in model two for gestational age of less than 34 weeks, the sensitivity was 57.1%, the specificity was 86.2%, the positive predictive value was 19.0% and the negative predictive value was 97.3%; and in model three for gestational age of less than 32 weeks, the sensitivity was 64.3%, the specificity was 95.1%, the positive predictive value was 26.5% and the negative predictive value was 99.0% (15). The sensitivity of the present model is greater than that of the second and third models, its positive predictive value is higher than that of the second model, and its negative predictive value is higher than that of the first model. Dabi et al. (2017) studied two groups: the first group of pregnant women had a single birth, and the second group delivered twins. The area under the ROC curve was 0.88 in the first group and 0.71 in the second, the sensitivity was 80% and 69%, the specificity was 82% and 73%, the positive predictive value was 39.8% and 48.5%, and the negative predictive value was 94.8% and 91.3%, respectively (43). That study also measured cervical length and used it in the model. In the present study, it was not possible to measure cervical length in all mothers. The negative predictive value of the present model was higher than that of the second group.
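When comparing predictive values across the studies above, it helps to recall that positive and negative predictive values depend on the prevalence of preterm delivery in the studied population as well as on sensitivity and specificity. The sketch below applies Bayes' rule to the sensitivity and specificity reported for the training data and the overall prevalence of 11.5%. The function name is an assumption, and because the prevalence within the training split itself is not reported, the result is only expected to be close to, not identical with, the published 19.3% and 92.9%.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Training-data sensitivity and specificity, overall prevalence of 11.5%.
    ppv, npv = predictive_values(0.660, 0.658, 0.115)
    print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")

    # The same test characteristics in a hypothetical population where half
    # of the deliveries are preterm give a much higher PPV.
    ppv_half, npv_half = predictive_values(0.660, 0.658, 0.5)
    print(f"PPV = {ppv_half:.3f}, NPV = {npv_half:.3f}")
```

With a prevalence near 11.5%, a positive predictive value of about 20% is what a sensitivity and specificity of roughly 66% imply, so the low positive predictive value reflects the rarity of the outcome as much as the model itself.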

Strengths and limitations of the present study
A strength of the present study is that it uses the routinely collected variables most closely related to preterm delivery; hence, model development imposed no additional cost. This study used a logistic regression model to predict preterm delivery, the advantages of which are its simplicity of implementation and ease of interpretation. At the same time, one disadvantage of logistic regression is that it cannot capture nonlinear or more complex relationships between the predictors and the logit unless they are specified explicitly. Therefore, it is recommended to study preterm delivery predictors using data mining and machine learning methods that can overcome this disadvantage. The limitations of this study include its retrospective nature and the insufficient certainty about the accuracy and precision of the data found in the records. Also, some important variables such as preterm delivery history, cervical length and income are not included in the pregnancy care data, so complete recording of these variables and of preterm delivery data is recommended to improve the quantity and quality of future studies.

Conclusions
It can be concluded that a multivariate logistic regression model built on the most important independent predictors of preterm delivery in the Iranian female population works relatively well in predicting preterm delivery. These predictors are maternal age over 35 years, multiple births, gestational diabetes, decreased amniotic fluid volume, an interval of 7 days or more between the last health care session and delivery, maternal weight gain less than recommended in pregnancy and the pregnancy complication score in the third trimester of pregnancy. This study showed that the available antenatal care data contain a large amount of information and variables from the beginning to the end of pregnancy that are useful for research on modeling preterm delivery predictors and for other related studies.

Declarations
All authors confirm that all methods were carried out in accordance with relevant guidelines and regulations (Declaration of Helsinki).

Acknowledgments
This article is the result of a master's thesis. We would like to extend our gratitude to the Vice Chancellor

[Figure: ROC curve for the logistic regression evaluation on the training data]