Predictive Value of Mid-Trimester Cervical Measurement Data Combined With Maternal Characteristics for Twin Preterm Birth at < 32 Weeks: A Retrospective Analysis and External Validation Study

Objective The purpose of this study was to develop a dynamic model to predict the risk of spontaneous preterm birth at < 32 weeks in twin pregnancy. Methods Women with twin pregnancies were followed up from January 2017 to December 2019 in two tertiary medical centres; data from one were used to construct the model, and data from the other were used to evaluate the model. Data on maternal demographic characteristics, transvaginal cervical length and funnelling during 20–24 weeks were extracted. The prediction model was constructed with independent variables determined by logistic regression analyses. After applying specified exclusion criteria, an algorithm with maternal and biophysical factors was developed based on 92 twin pregnancies with a preterm birth < 32 weeks and 672 twin pregnancies with a delivery ≥ 32 weeks. It was then evaluated among 36 pregnancies with a preterm birth < 32 weeks and 261 pregnancies with a delivery ≥ 32 weeks in a second tertiary centre without specific training. The model reached a sensitivity of 78.26%, specificity of 88.84%, false positive rate of 11.16% and negative predictive value of 96.76%; ROC characteristics proved that the model was superior to any single parameter with an AUC of 0.856. We developed and validated a dynamic nomogram model to predict the individual probability of early preterm birth in order to better represent the complex aetiology of twin pregnancies and hopefully improve the prediction and indication of interventions.


Introduction
Complications of preterm birth (PTB) are the primary cause of death among children in the first 5 years of life, accounting for approximately 35% of deaths among newborns and 18% of all paediatric deaths. 1 Twin gestations are increasing in number and currently account for 3% of all live births and approximately 15-20% of all PTBs. 2,3 Women with twin pregnancies are more likely to face the loss of one or both twins or to be parents of one or two handicapped children. [4][5][6] To date, strategies for the prevention of PTB in twin pregnancy, such as the use of vaginal progesterone, cervical pessary and cervical cerclage, remain controversial or are considered to have limited effects. [7][8][9][10][11][12][13][14][15] This is partly because RCTs that used inadequate care models produced negative results for cervical cerclage, vaginal progesterone or cervical pessary. To address the growing need for better guidance for clinical practice, it is necessary to distinguish patients who are at greater risk of early PTB from among the whole twin-pregnancy population.
There are discrepant opinions on how precisely the risk of spontaneous preterm birth (SPTB) in twin pregnancies can be determined. More importantly, preterm birth is a complex syndrome with many causes and phenotypes. In twins, there is an additional pre-existing risk due to overdistension and the effect on the cervix but possibly also due to increased uterine irritation and subclinical inflammation after ART or physical and psychological maternal stress factors. 16,17 In addition, the great variety in PTB rates signifies that there are epigenetic transgenerational stress factors and determinants from the social environment and the health care system. [18][19][20] Previous studies have demonstrated the association between SPTB in twin pregnancies and specific clinical indicators, such as ethnic origin, age, nulliparity, chorionicity, body mass index (BMI), tobacco usage, history of previous preterm delivery, cervical length and funnelling. [21][22][23][24][25][26][27][28][29][30][31][32] Different combinations of clinical variables might indicate different likelihoods of SPTB. The purpose of this study is to synthesize an array of maternal demographic factors and clinical variables and develop a practical algorithm to calculate the risk of SPTB for twin pregnancies, similar to the first trimester genetic disease screening tools or the Framingham heart disease score. 31

Characteristics of the development and external validation groups
In total, 1065 twin pregnancies were eligible for the study, of which 764 collected from the Fujian Maternity and Child Health Hospital were assigned to the training group, while 301 from the Fujian Provincial Hospital were assigned to the external validation group (Fig. 1). In the whole study population, the numbers of positive cases of SPTB at < 28, 32, 34 and 37 weeks were 33 (3.1%), 128 (12.0%), 221 (20.4%) and 639 (58.9%), respectively.
There were no significant differences in maternal demographic and clinical characteristics between the training and validation groups (all P > 0.05), indicating that the features of the training and external validation groups were similar and that subsequent external validation would be representative (Table 1).

Predictive factors associated with SPTB at < 32 weeks
In the training group, we conducted univariate and multivariate regression analyses to detect the correlations between clinical variables and probabilities of preterm delivery before 28 weeks, 32 weeks, and 34 weeks by applying the AIC-based backward procedure (Table 2). Then, we constructed three ROC curves for predicting SPTB according to the results of multivariate analysis. By comparing the AUCs, we found that the predictive value for SPTB at < 32 weeks was the highest (Fig. 2). After comprehensively considering the predictive power and the number of positive cases of SPTB before the three gestational weeks, we chose to establish a predictive model for predicting PTB at < 32 weeks. Multivariate logistic regression analysis (< 32 weeks) showed that nulliparity, monochorionicity, lower prepregnancy BMI, previous preterm birth or late abortion, cervical funnelling and shorter cervical canal were independent risk factors for SPTB at < 32 weeks.

Development and validation of a dynamic nomogram for SPTB at < 32 weeks
Based on meaningful independent factors in multivariate regression analysis, we developed a nomogram to predict SPTB probability at < 32 weeks (Fig. 3). Each point could be determined based on the intersection of the vertical line from the variable to the point axis. Then, the total risk score was calculated by adding each variable point. The possibility of twin SPTB at < 32 weeks could be read on the total point axis.
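The point-summing procedure of a nomogram can be illustrated in code. The following is a minimal sketch, assuming an underlying logistic regression model; the intercept, the coefficients, the axis ranges and all function names are hypothetical placeholders chosen for illustration and are not the fitted values of this study, which are presented only graphically in Fig. 3.

```python
import math

# Hypothetical log-odds coefficients for illustration only;
# these are NOT the fitted values from the study.
INTERCEPT = -1.0
COEFS = {
    "nulliparity": 0.6,                 # 1 = nulliparous, 0 = parous
    "monochorionic": 0.7,               # 1 = monochorionic, 0 = dichorionic
    "prior_ptb_or_late_abortion": 1.2,  # 1 = yes, 0 = no
    "funnelling": 1.5,                  # 1 = funnel present, 0 = absent
    "bmi": -0.08,                       # per kg/m^2 of prepregnancy BMI
    "cervical_length_mm": -0.07,        # per mm of cervical length
}
# Assumed axis ranges for the continuous predictors (binary ones use 0..1).
RANGES = {"bmi": (15.0, 35.0), "cervical_length_mm": (5.0, 50.0)}


def contribution_bounds(name):
    """Smallest and largest log-odds contribution of one predictor."""
    lo, hi = RANGES.get(name, (0.0, 1.0))
    a, b = COEFS[name] * lo, COEFS[name] * hi
    return min(a, b), max(a, b)


# The predictor with the widest contribution span defines the 0-100 point axis.
MAX_SPAN = max(hi - lo for lo, hi in map(contribution_bounds, COEFS))


def points(name, value):
    """Nomogram points for one predictor value."""
    lo, _ = contribution_bounds(name)
    return 100.0 * (COEFS[name] * value - lo) / MAX_SPAN


def total_points(patient):
    """Sum of points over all predictors (patient must supply every predictor)."""
    return sum(points(name, value) for name, value in patient.items())


def probability(total):
    """Map total points back to a predicted probability via the logistic function."""
    base = INTERCEPT + sum(contribution_bounds(name)[0] for name in COEFS)
    linear_predictor = base + total * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


patient = {
    "nulliparity": 1,
    "monochorionic": 0,
    "prior_ptb_or_late_abortion": 0,
    "funnelling": 1,
    "bmi": 20.0,
    "cervical_length_mm": 20.0,
}
score = total_points(patient)
print(round(score, 1), round(probability(score), 3))
```

Because points are a linear rescaling of each predictor's log-odds contribution, reading the probability from the total point axis is equivalent to evaluating the logistic model directly.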
Furthermore, a user-friendly dynamic predictive nomogram was established and is available online (https://zhanwenqiang.shinyapps.io/DynNomapp/). The dynamic nomogram conveniently provides the individual probability of SPTB, which is calculated automatically from the input parameters of each subject (Fig. 4; to facilitate readers' understanding, a video demonstrating how to use the model is provided in the attachment). Harrell's concordance index of the nomogram model in the training group was 0.856 (95% CI: 0.813-0.899). In the external validation group, Harrell's concordance index was 0.808 (95% CI: 0.751-0.865). The calibration curves indicated that the probabilities predicted by the nomogram were in good agreement with the actual probabilities in both the internal cohort and the external cohort (Fig. 5).
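The idea behind a calibration curve can be sketched as a simple binned comparison of predicted and observed risk. This is a simplified illustration under stated assumptions, not the procedure used to produce Fig. 5: the function name `calibration_bins`, the equal-size binning and the toy data are all hypothetical.

```python
def calibration_bins(probs, outcomes, n_bins=4):
    """Group subjects by predicted probability and compare the mean
    prediction with the observed event rate in each group."""
    pairs = sorted(zip(probs, outcomes))
    size = len(pairs) / n_bins
    rows = []
    for i in range(n_bins):
        chunk = pairs[round(i * size): round((i + 1) * size)]
        if not chunk:
            continue  # skip empty bins when there are few subjects
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_predicted, observed_rate, len(chunk)))
    return rows


# Toy data (not study data): 10 subjects predicted at 0.1, 10 at 0.9.
toy_probs = [0.1] * 10 + [0.9] * 10
toy_outcomes = [1] + [0] * 9 + [1] * 9 + [0]
for mean_predicted, observed_rate, n in calibration_bins(toy_probs, toy_outcomes, n_bins=2):
    print(f"predicted={mean_predicted:.2f} observed={observed_rate:.2f} n={n}")
```

A well-calibrated model yields bins in which the mean predicted probability is close to the observed event rate, which corresponds to points lying near the diagonal of a calibration plot.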

Model performance test and risk stratification
Next, the restricted cubic spline curve showed that the risk escalated continuously with the increasing scores obtained from the nomogram, which supports the reliability of the model (Fig. 6). In the training group and external validation group, the AUC of the nomogram predicting the probability of SPTB at < 32 weeks was 0.856 (95% CI: 0.813-0.899) and 0.791 (95% CI: 0.751-0.865), respectively. In both groups, the prediction accuracy of the nomogram was superior to that of any single predictor (all P < 0.05) (Fig. 7). With the ROC curve of the training group, the optimal cut-off value of the risk score (124.76) was calculated based on the maximum Youden index. Then, the cut-off value categorized the training population into the low-risk group (163 twin pregnancies with risk score ≤ 124.76) and the high-risk group (601 twin pregnancies with risk score > 124.76) (OR = 17.21, 95% CI 10.30-28.76, P < 0.001). The model reached a sensitivity of 78.26%, specificity of 88.84%, false positive rate of 11.16% and negative predictive value of 96.76%. Applying the same cut-off value to the external validation group also demonstrated the advantage of the nomogram (Table 3). Thus, we observed that the probability of SPTB in the high-risk group was significantly higher than that in the low-risk group (HR = 0.537, 95% CI (0.382-0.756), P < 0.001), and gestational age at delivery was significantly earlier in the high-risk group (Fig. 8).
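The reported performance measures and the Youden-based cut-off can be illustrated with a short sketch. The 2×2 counts used here (72 true positives, 20 false negatives, 597 true negatives, 75 false positives) are not stated in the text; they are an assumption, back-calculated from the reported percentages and the training group sizes (92 and 672). The function names and the toy scores are hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard measures derived from a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fp / (tn + fp),
        "npv": tn / (tn + fn),
        "youden_j": sensitivity + specificity - 1.0,
    }


def best_cutoff(scores, outcomes):
    """Return (cutoff, J) maximizing Youden's J, where a score strictly
    above the cutoff is classified as high risk."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best = (None, -1.0)
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if s > cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, outcomes) if s > cutoff and y == 0)
        j = tp / n_pos - fp / n_neg  # equals sensitivity + specificity - 1
        if j > best[1]:
            best = (cutoff, j)
    return best


# Counts inferred from the reported percentages (an assumption, see above).
metrics = diagnostic_metrics(tp=72, fn=20, tn=597, fp=75)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")

# Toy risk scores (not study data) to show the cut-off search.
print(best_cutoff([10, 20, 30, 40, 50, 60], [0, 0, 0, 1, 1, 1]))
```

Under these inferred counts, the four percentages reported for the training group are reproduced to two decimal places.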

Discussion
In our retrospective analysis and external validation study, we developed a predictive model of SPTB at < 32 weeks based on maternal characteristics and sonographic cervical measurements to provide an accurate and comprehensive risk estimation, which can serve as an assessment tool to help physicians make judicious treatment decisions about further management of twin pregnancy. Moreover, external validation and restricted cubic splines supported the predictive performance.
The reason we comprehensively considered all the above factors when building the model was that the predictive performance of a single maternal factor or cervix geometry (including length) is not satisfactory, primarily due to poor sensitivity. [36][37][38][39][40] The mechanism of SPTB involves various mechanical stimuli (two continuously growing foetuses and the expanding uterus) and biochemical stimuli (inflammatory factors, fetoplacental signals and steroid hormones). 41,42 Compared to that in singleton pregnancies, the mechanism of SPTB in twin pregnancies is predominantly determined by overdistension, whereas the role of inflammation and microbiologic invasion of the amniotic cavity (MIAC) is relatively minor. 16 Overdistension of the lower uterine segment and smooth muscle stretch in the human cervix provokes proinflammatory cytokine secretion, and research on changes in the cervical microstructure has been published by Vink et al. 17,43,44 Jose Villar et al. proposed the use of a phenotypic classification system of PTB that does not force any PTB into a predefined phenotype but instead relies on a new conceptual framework in which a maternal clinical phenotype of PTB potentially related to a certain perinatal outcome is characterized by all relevant conditions observed during pregnancy. 18 A series of common clinical characteristics, such as age, race, BMI, history of PTB, previous uterine surgeries, and tobacco usage, may indicate the initial states and variations in the structure and function of the cervix, which contributes to the risk of cervical insufficiency. 19,20,24,45,46 All these risk factors have interconnected effects and a computational framework for changing and remodelling the cervix. Our study is concordant with existing research indicating that nulliparity, lower prepregnancy BMI, history of PTB or late abortion, chorionicity, cervical funnelling and shorter cervical canal increase the possibility of SPTB in twin pregnancies.
However, there is no risk calculation yet for PTB after 32 weeks, which still represents a population with a 10-fold increased risk for perinatal mortality compared to twins at term. 47 Our research incorporated maternal characteristics and biophysical tests of both cervical length and funnelling to develop a dynamic nomogram model that may better indicate clinical strategies, such as therapy decision-making and follow-up schedules, and may reduce complications for clinicians related to excessive monitoring and administration resulting from an undefined or inherently subjective risk assessment. Thus, the ability to generate a risk assessment and present it in the form of a percentage for each patient will enable caregivers to schedule more frequent follow-ups or administer targeted interventions, such as antenatal corticosteroids and tocolytic therapy as well as transfer to a tertiary medical centre for patients at higher risk, while reducing overtreatment and unnecessary hospitalization for those at lower risk. On the other hand, in the study design for the negative trials regarding PTB intervention, only a few researchers screened out and followed high-risk twin pregnancies, which may introduce confusion regarding indications for the interventions and result in bias when comparing outcomes. 7,10,11,48 To some extent, a lack of good care during surveillance frequently makes the difference in RCTs. It would be interesting in the future to determine whether the use of this tool to assess the indications for interventions and stratify patients according to risk could improve outcomes.
Our study has some limitations. Most importantly, it is limited by its retrospective design. There is a possibility of confounding bias: patients with unmeasured or unobservable factors who were excluded may represent patients at higher risk, so that our study might ignore the most clinically interesting population. Second, the study population in the two centres is limited to our own population (Asian), which limits generalizability to people of different races. For example, in many high resource countries, the risk of PTB is associated with obesity and not underweight. 24,49 However, this potential limitation may also be considered a strength. All women included in the study were followed up and treated only in the two tertiary medical centres, which limits the confounding factors associated with the heterogeneity in provider bias, such as clinicians' experience, and differences in the process of monitoring and management for offering the intervention. Based on the model, researchers in other countries can make use of their own data on demographic characteristics to justify the odds for their population. The last limitation is that because of the incomplete data for cervical length before 20 weeks, our model may poorly predict very early PTBs since we adopted cervical measurements during 20-24 weeks and applied the system relatively late for the high-risk population. 50 In the future, we should concentrate on earlier evaluation of our algorithm to prevent early mortality and severe morbidity.
In summary, we developed and validated a dynamic nomogram model to predict the individual probability of early preterm birth; this nomogram better represents the complex aetiology of twin pregnancies and hopefully improves our understanding of the indications for interventions and, therefore, our ability to predict when they will be needed.

Study population
We retrospectively collected data from 1448 consecutive women with twin pregnancies in the Fujian Maternity and Child Health Hospital (with an annual delivery number of more than 20,000 and a specified preterm birth clinic for ambulatory patients) and the Fujian Provincial Hospital (with 2398 beds and an annual delivery number of more than 5000) from January 2017 to December 2019. This retrospective study was performed with approval from the Ethics Committee of the Fujian Maternity and Child Health Hospital and the Fujian Provincial Hospital (Ethical approval number: 2019-014). The data were anonymous, and the requirement for informed consent was therefore waived. The study was conducted and reported in accordance with STROBE guidelines.
Subjects with any of the following conditions were excluded: uncertain pregnancy date, genetic or structural abnormalities of either foetus, stillbirth of one or two foetuses, gestational age at birth < 20 weeks, twin birth weight < 500 g, monoamniotic or monochorionic twin pregnancy complicated by twin transfusion syndrome (TTTS) or twin anaemia-polycythaemia sequence (TAPS), placement of cervical cerclage, maternal or foetal indications for iatrogenic PTB at < 32 weeks, or delivery at a medical centre other than ours. Women who gave birth before 20 weeks were excluded because in most cases, these women were likely to represent a unique subgroup of women whose cervical changes would have been detected very early and would be extremely obvious. Additionally, these women would not have had their cervical measurement at the indicated gestational stage in our study period, which was a major part of our research. As a result, we excluded 383 patients who met the exclusion criteria, and thus, 1065 patients met the inclusion criteria.
We assigned 764 samples collected from the Fujian Maternity and Child Health Hospital as the training group and 301 samples collected from the Fujian Provincial Hospital as the external validation group. All samples were reassessed by two obstetricians according to the inclusion and exclusion criteria (the flowchart showing the derivation of the development cohort and validation cohort is presented in Fig. 1).

Data collection
Data were extracted from the medical charts. Demographic characteristics included maternal age, prepregnancy body mass index (prepregnancy BMI), nulliparity, history of previous cervical surgery and history of tobacco usage. Clinical data included validation of gestational age by first trimester ultrasound, chorionicity, history of previous preterm birth or late abortion (during 12-28 weeks), complications during pregnancy, use of assisted reproductive technology, cervical length (20-24 weeks) and cervical funnelling, and gestational age at delivery.
Gestational age was calculated from the last menstrual period (LMP) and confirmed by the foetal crown-rump length measurement at the first trimester ultrasonic scan. If a discrepancy of more than 7 days was observed, the sonographic gestational age was followed. Chorionicity was confirmed by identifying lambda and T signs with ultrasound imaging between 11 + 0 and 13 + 6 weeks of gestation. 33 All patients underwent transvaginal cervical length (TVCL) measurements between 20 and 24 weeks, when the optimal image of the cervix was relatively easy to capture. The TVCL measurements of all subjects were performed by experienced sonographers at our ultrasound units. The ultrasound assessment was performed with patients in the lithotomy position with an empty bladder to measure the length of the cervical canal from the internal os to the external os and to observe whether cervical funnelling was present. The measurement was repeated under gentle fundal pressure or the Valsalva manoeuvre unless severe cervical shortening was observed. Each examination was performed for at least 3 minutes as an evaluation period to detect the development of a "funnel", which was defined as the protrusion of the amniotic membrane of 3 mm or more into the internal os as measured along the lateral border of the funnel. 34,35 To ensure that measurements were made appropriately and consistently, all the ultrasonic reports of every subject were reviewed by a single investigator who was blinded to the pregnancy outcome.
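The dating rule described above (follow the sonographic gestational age when it differs from LMP dating by more than 7 days) can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the crown-rump-length-based gestational age at the scan is taken as an input because the conversion formula from crown-rump length is not specified in the text.

```python
from datetime import date, timedelta


def gestational_age_days(lmp, scan_date, crl_ga_days_at_scan, on_date,
                         max_discrepancy_days=7):
    """Gestational age in days on `on_date`.

    LMP dating is used unless it differs from the first trimester
    sonographic estimate by more than `max_discrepancy_days`, in which
    case the sonographic estimate is followed.
    """
    lmp_ga_at_scan = (scan_date - lmp).days
    if abs(lmp_ga_at_scan - crl_ga_days_at_scan) > max_discrepancy_days:
        ga_at_scan = crl_ga_days_at_scan  # follow sonographic dating
    else:
        ga_at_scan = lmp_ga_at_scan       # keep LMP dating
    return ga_at_scan + (on_date - scan_date).days


def format_ga(days):
    """Format gestational age as 'weeks + days'."""
    return f"{days // 7} + {days % 7}"


# Hypothetical dates for illustration only.
lmp = date(2019, 1, 1)
scan = date(2019, 3, 26)
visit = scan + timedelta(days=70)
print(format_ga(gestational_age_days(lmp, scan, 80, visit)))  # small discrepancy
print(format_ga(gestational_age_days(lmp, scan, 74, visit)))  # discrepancy > 7 days
```

Expressing gestational age in days avoids rounding ambiguity when checking whether a cervical measurement falls inside the 20-24 week window.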

Statistical analysis
Model development
Quantitative data are expressed as the median (interquartile range, IQR), and qualitative data are expressed as the number (percentage). The Wilcoxon-Mann-Whitney test or Fisher's exact test was performed to measure the distribution differences of variables between the development and external validation groups. Univariate and multivariate logistic regression analyses were used to detect the correlation between clinical variables and preterm birth at 28 weeks, 32 weeks, and 34 weeks by applying a backward procedure based on the Akaike information criterion (AIC). By drawing the ROC curve of the predicted probabilities of SPTB before three gestational weeks (28, 32, 34 weeks) with multivariate meaningful variables, the prediction power for SPTB before the three gestational weeks was compared. Based on these results, a nomogram model with higher predictive performance was established.

Model validation
The performance of the nomogram model in identification and calibration was evaluated. The discriminative ability and predictive ability of the model were evaluated through Harrell's C-index, and an external cohort was used to further evaluate the predictive value of the model. The calibration curve was analysed by plotting the probability predicted by the nomogram against the actual occurrence of SPTB. Restricted cubic splines were used to evaluate the correlation between the model's predicted score and the risk of SPTB. Kaplan-Meier curves were generated to compare the pregnancy outcomes in the two groups with different risk stratifications. ROC curve analysis was used to evaluate the prediction performance of the nomogram model and that of each meaningful parameter.
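For a binary outcome, Harrell's C-index is equal to the area under the ROC curve: both are the proportion of case/non-case pairs in which the case receives the higher predicted score, with ties counted as one half. The following sketch computes it by direct pair counting; the function name `c_index` and the toy data are hypothetical, and the analyses in this study were performed in R rather than with this code.

```python
def c_index(scores, outcomes):
    """Concordance index for a binary outcome (equivalent to the ROC AUC).

    Counts, over all (case, non-case) pairs, how often the case has the
    higher score; tied scores contribute one half.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                concordant += 1.0
            elif case_score == non_case_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Toy data (not study data): two non-cases followed by two cases.
print(c_index([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from non-cases.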
Statistical analyses were all performed with R 3.6.0 software (R Foundation, Vienna, Austria). A two-sided P-value < 0.05 was considered to indicate statistical significance.