Simple Model to Predict Gestational Diabetes in Nulliparous Women in Early Pregnancy

Background The aim of this study was to develop a simple tool using anthropometric, clinical, and analytical variables to predict the risk of gestational diabetes mellitus in the first trimester of pregnancy. Methods A historical cohort study was conducted with 1,946 Caucasian nulliparous pregnant women at the University Hospital of Puerto Real (Cádiz/Spain). The predictive model was a multivariate logistic regression evaluated by 10-fold cross-validation with five iterations. Receiver-operating characteristic and precision-recall curves were plotted with the predictions of the model. Optimal cut-off points of the receiver-operating characteristic curve were estimated using the Youden index and the minimum distance to the point (1,1). Results The model used variables that are commonly assessed in clinical practice. It showed an accuracy of 0.93 and a receiver-operating characteristic area under the curve of 0.791.


Background
Gestational diabetes mellitus (GDM) is defined as glucose intolerance that is diagnosed for the first time during pregnancy and usually resolves soon after delivery. 1 The rising incidence of obesity and type 2 diabetes mellitus has led to an increase in the incidence of GDM. 2,3 GDM is associated with adverse perinatal outcomes, including macrosomia, shoulder dystocia, cesarean delivery, preeclampsia, and neonatal hypoglycemia. 4,5 Moreover, GDM is a marker of susceptibility to health problems later in life for both the mother and her child, including obesity, type 2 diabetes, metabolic syndrome, and cardiovascular disease. 6
Currently, GDM is mostly diagnosed in the third trimester of pregnancy, when the deleterious fetal and maternal effects have already started. This is because, in most cases, the diagnosis is established in women who return a positive screening test result (O'Sullivan's test), and this test is usually performed between 24 and 28 weeks of gestation. 7
There are several models for predicting the risk of GDM. The performance of these models varies widely, with receiver-operating characteristic area under the curve (ROC-AUC) 8 results ranging from 0.7 to 0.9. 9-14 Most of these models are produced using multivariate logistic regression, 15 but some are developed using machine learning techniques. 16,17 No single technique has been shown to be clearly superior to the others for general diabetes or for GDM. 18 Consequently, there is no common model used in clinical practice. This may be due to limitations inherent in the models, such as the inclusion of too many variables to be practical for clinical use. In addition, other models include variables that are not measured in the management of a normal pregnancy. 19,20 Moreover, some models perform considerably worse in certain groups, such as primiparous women 10 or women with a negative gestational diabetes history. 19 These are the groups with the most uncertainty and the greatest need for risk assessment. Therefore, a simple model with variables regularly assessed in clinical practice and high predictive capability for most groups is needed.
If we were able to determine in the first trimester which pregnant women are at risk of developing GDM, we could implement interventions aimed at minimizing the maternal and fetal impacts of diabetes before the disease is established, as suggested by some authors. 21-23 We hypothesized that it is possible to predict the risk of GDM in the first trimester of gestation using anthropometric, clinical, and analytical variables. The aim of this study was to develop a predictive model using these variables.

Methods
A historical cohort study was conducted at the Department of Obstetrics and Gynecology of the University Hospital of Puerto Real (Cádiz/Spain). Medical records of all consecutive singleton births that occurred between January 2014 and December 2019 were retrieved from our clinical information system. This study was approved by the Ethics Committee of Biomedical Research of Cádiz.
Pregnant women identified as having preexisting diabetes mellitus were excluded. Additionally, pregnancies with deliveries before gestational week 32 or after gestational week 42 were excluded. Only nulliparous Caucasian women were included in the study.
Categorical data were summarized as counts and percentages. The distributions of continuous data were assessed using the Shapiro-Wilk test. Continuous data with a normal distribution were summarized as mean and standard deviation. In contrast, when the data showed a non-normal distribution, the median and the interquartile range were used.
The predictive model was a multivariate logistic regression evaluated with 10-fold cross-validation and five iterations, with accuracy as the metric to be improved during training, using R software version 4.0.5. 24 The normality and homogeneity of the model residuals were tested using the Shapiro-Francia and Levene tests, respectively.
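The resampling scheme described above can be illustrated with a short sketch. The analysis in this study was run in R; the Python fragment below is only an illustration of how 10 folds repeated five times produce 50 train/test splits. The function name `repeated_kfold_indices`, the fixed seed, and the plain (non-stratified) shuffling are assumptions made for the example, not details reported in the study.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: plain shuffling is assumed, without stratifying
    by outcome.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repetition
        for fold in range(n_folds):
            # Every n_folds-th shuffled index forms the held-out fold.
            test_idx = indices[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, test_idx


# 1,946 is the cohort size reported in the abstract.
splits = list(repeated_kfold_indices(n_samples=1946))
print(len(splits))  # number of model fits: folds x repetitions
```

Each woman is held out exactly once per repetition, so every performance estimate is computed on records the model did not see during fitting.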
Variable selection was performed using a backward elimination procedure. The inclusion criterion for the variables was statistical significance at the 95% level for both dichotomous and quantitative variables. To maintain model consistency, a polytomous variable was included when at least one of its subcategories was significant.
Quantitative variables were transformed into categorical variables, and the predictive impact on the model was tested using an F test and a chi-square test with a null hypothesis (H0) of no predictive change in the model.
We modeled all pairwise interactions between variables. If the new variable originating from an interaction was statistically significant, we evaluated the predictive impact of the interaction. This evaluation consisted of a MANOVA using Roy's largest root and Pillai's trace. The ROC-AUC and precision-recall (PR)-AUC obtained with the interaction included were also analyzed.
ROC and PR curves were plotted with the predictions of the model. Optimal cut-off points of the ROC curve were estimated by the Youden index and by the minimum distance to the point (1,1) (sensitivity and specificity equal to 1). Additionally, we estimated the cut-off point needed to obtain a sensitivity of 80% as a rule-in alternative to diagnose GDM.
An online calculator was developed using the Shiny package 25 for the R programming language. This tool is available at https://obgynreference.shinyapps.io/calcdiabgest/.
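The three cut-off rules can be made concrete with a small sketch. It is written in Python for illustration only (the study used R), the toy labels and scores are invented for the example and are not study data, and the function names are hypothetical. A case is treated as positive when its predicted risk is at or above the threshold.

```python
import math


def roc_points(y_true, y_score):
    """Return (threshold, sensitivity, specificity) for each distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for thr in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(points):
    # Youden index J = sensitivity + specificity - 1; pick the maximum.
    return max(points, key=lambda p: p[1] + p[2] - 1)


def closest_to_perfect_cutoff(points):
    # Euclidean distance to the point where sensitivity = specificity = 1.
    return min(points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))


def cutoff_for_sensitivity(points, target=0.80):
    # Highest threshold that still reaches the target sensitivity,
    # which keeps specificity as high as possible.
    eligible = [p for p in points if p[1] >= target]
    return max(eligible, key=lambda p: p[0])


# Toy data (illustrative only): 1 = GDM, 0 = no GDM.
y_true = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
y_score = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.10, 0.15, 0.30]

points = roc_points(y_true, y_score)
print("Youden:", youden_cutoff(points))
print("Closest to (1,1):", closest_to_perfect_cutoff(points))
print("Sensitivity >= 80%:", cutoff_for_sensitivity(points))
```

The first two rules weigh sensitivity and specificity equally, whereas the third fixes sensitivity in advance and accepts whatever specificity results.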

Results
The initial study population involved 7,505 pregnancies. Following application of the exclusion criteria, 1,946 nulliparous women were included in the analysis. Table 1 shows the epidemiological characteristics of the study population. A family history of diabetes was found in 20% of pregnant women. Body mass index (BMI) percentages were adjusted to the expected values for a Spanish female population. 26
Based on the multivariate logistic regression model, GDM was predicted in nulliparous women with an accuracy of 0.93 and a ROC-AUC of 0.791. These predictive values were achieved using independent variables that are commonly assessed in clinical practice, such as weight and height, thyroid-stimulating hormone (TSH) levels, family history of diabetes mellitus or hypertension, and the presence of chronic hypertension (Table 2).
There are some points worth noting regarding the features of the independent variables of the model. The most impactful variable was expected to be either a family history of diabetes or obesity. However, chronic hypertension and twin pregnancy increased the risk of GDM considerably more.
There was a statistically significant interaction between age and chronic hypertension. This interaction shows that young women with chronic hypertension are more prone to develop GDM. In our population, the incidence of hypertension was higher in older women (Figure 2).
Although age does not have remarkable importance when comparing two pregnant women with a small age difference, it can become the most impactful factor when comparing women with an age difference of 19 years, and this surpasses the impact of twin pregnancy. For a pregnant woman aged 40 years, the impact of chronic hypertension was ameliorated to an extent to be lower than the impact of family history of diabetes, and this was remarkable considering that a model without this variable interaction still showed chronic hypertension as the most impactful variable.
The model reached an accuracy of 0.93. The Shapiro-Francia test indicated that the model residuals did not follow a normal distribution; thus, the Levene test was performed on several different splits of the data, and it showed homogeneity.
The prediction capability of the model was tested with an ROC curve (Figure 3). The ROC-AUC of the model was 0.791. Since the model had a considerable ROC-AUC, we calculated a PR curve for this model to properly compare it with other results. The PR curve can be seen in Figure 4. For the final model, the PR-AUC was 0.30.
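As an illustration of the two summary measures, the following Python sketch computes a ROC-AUC from pairwise rankings and a PR-AUC as average precision. It is not the code used in this study (which was run in R), the toy data are invented, and average precision is only one of several ways to summarize a PR curve, so it may differ from the estimator behind the values reported here.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive case outranks a random negative
    case (ties count one half); this equals the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve.

    Assumes no tied scores, which holds for the toy data below.
    """
    ranked = sorted(zip(y_score, y_true), reverse=True)
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at each new recall level
    return total / n_pos


# Toy data (illustrative only): 1 = GDM, 0 = no GDM.
y_true = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
y_score = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.10, 0.15, 0.30]

print(round(roc_auc(y_true, y_score), 3))
print(round(average_precision(y_true, y_score), 3))
```

A PR-AUC should be read against the prevalence of the outcome, because a non-informative classifier attains a PR-AUC equal to the prevalence, whereas its ROC-AUC is 0.5 regardless of prevalence.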

Discussion
We found that the risk of developing GDM increased with age, BMI, twin pregnancy, chronic hypertension, family history of diabetes, family history of hypertension, and TSH level, and that there was an interaction between age and hypertension.
The relation between these features and GDM is already known. However, we identified only one previous model that considered chronic hypertension as a risk factor for GDM, 11 and no model had considered TSH levels or family history of hypertension.
Based on our findings, we developed a model to predict GDM in nulliparous women with a ROC-AUC of 0.79, which provided 80% sensitivity and 59% specificity at a predicted-risk threshold of 4.3%.
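To show what this operating point implies in practice, the sketch below converts sensitivity and specificity into expected counts per 1,000 screened women. The sensitivity (80%), specificity (59%), and threshold (4.3%) are the values reported above; the prevalence of 10% is a purely hypothetical input chosen for the example, not a figure from this study, and the function names are illustrative.

```python
def flag_high_risk(predicted_risk, threshold=0.043):
    """Rule-in decision: flag a woman whose predicted risk reaches the threshold."""
    return predicted_risk >= threshold


def screening_counts(n_women, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts and predictive values."""
    with_gdm = n_women * prevalence
    without_gdm = n_women - with_gdm
    tp = sensitivity * with_gdm      # women with GDM who are flagged
    fn = with_gdm - tp               # women with GDM who are missed
    tn = specificity * without_gdm   # women without GDM, not flagged
    fp = without_gdm - tn            # women without GDM who are flagged
    return {
        "tp": tp, "fn": fn, "tn": tn, "fp": fp,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# prevalence=0.10 is an assumption for illustration only.
counts = screening_counts(n_women=1000, prevalence=0.10,
                          sensitivity=0.80, specificity=0.59)
for name, value in counts.items():
    print(name, round(value, 3))

print(flag_high_risk(0.05), flag_high_risk(0.02))
```

Under these assumptions most flagged women would not go on to develop GDM, which is the expected trade-off when a threshold is chosen to favor sensitivity.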
Untreated GDM carries significant risk of perinatal morbidity at all severity levels. Prompt treatment can enhance outcomes in newborns. Our model provides a tool to assess the risk of GDM in the first trimester of pregnancy, allowing early intervention and minimization of the maternal and fetal impact of diabetes before the disease manifests.
Although there are many resources available for prediction of GDM, they all have limitations. The GDM models that we are aware of are shown in Table 3. However, direct comparison between some models might not be appropriate owing to the difference in GDM prevalence in different populations.
Savvidou et al. 19 reported a 0.86 ROC-AUC, but this dropped when the model was used to predict the GDM risk in women with a negative GDM history. They also used variables that are not measured in common clinical practice, making clinical application of their model more difficult. Models such as those of Van Leeuwen et al. 23 or Zheng et al. 13 are simple to administer, but our model has a better AUC.
Sweeting et al. 9 developed a model that performs better, with a 0.91 ROC-AUC, although their population size was small. Moreover, when this model was used in nulliparous women, the AUC dropped to 0.76 when using biomarkers and to 0.70 without biomarkers, which is considerably less than the 0.79 ROC-AUC found for our model. In addition, the model itself uses a high number of variables, and many of them are not currently measured in common clinical practice, especially lipocalin-2.
Artzi et al. 16  Other models, such as that of Syngelaki et al. 11 or Nanda et al., 12 have good predictive capability, the former having an ROC-AUC of 0.84 without including previous history of GDM. Based on the variables used in these models, the ROC-AUC drop may be less severe. However, these models did not incorporate data on nulliparous women, or data from another cohort.
One of the strengths of our model is its high predictive capability. Another strength is its simplicity. As such, the model can be used by most current clinics, since the variables it contains are commonly determined in regular clinical practice. Moreover, since those variables are evaluated in the first trimester, the model allows prediction of the development of GDM before it appears. Another point is that our study covers a group of great interest, nulliparous women, for whom clinicians do not have information about previous pregnancies.
Our model does have limitations. First, some potentially impactful variables, such as glucose levels, were not studied owing to the limitations of the database or current practice. Second, this study was performed on Caucasian women; thus, we should be careful about generalizing the results to women of other ethnicities. Third, since the model was designed for nulliparous women, it is expected to underperform in women with previous pregnancies.
Although the results obtained by cross-validation are good, an external validation study would need to be carried out before our model is used in another population. We also intend to improve the model in the future using some variables that were not studied in this retrospective study, such as glucose levels and ethnicity.
Furthermore, a future study in multiparous pregnant women, including those with a history of GDM as a variable, would be desirable.

This study provides a tool to predict the onset of GDM in nulliparous women during the first trimester. It is more accurate than any of the other currently available tools and allows earlier treatment intervention.

Abbreviations
GDM: gestational diabetes mellitus, ROC-AUC: receiver-operating characteristic area under the curve

Declarations

Ethics approval and consent to participate
This study was approved by the Ethics Committee of Biomedical Research of Cádiz on 13 July 2020. Owing to the retrospective nature of the study, no informed consent was required.

Consent for publication
Not applicable

Availability of data and material
The data that support the findings of this study are available from the corresponding author (Fernandez Alba JJ) upon reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
Not applicable; no funding was received for the development of this study.

Boxplot showing the relationship between age (in years) and chronic hypertension. Women with chronic hypertension are significantly older than women without chronic hypertension.