Participants
The dataset used in this study was derived from a cohort of pregnant women established in Qingdao between November 2017 and December 2019. The study was conducted at three women and child health care centers and a university-affiliated hospital. The university-affiliated hospital is a treatment center for critical and difficult cases on the Jiaozhou Peninsula in eastern China, with 4500–5000 deliveries annually.
Information about participants’ socio-demographic characteristics and medical history, including age (identified from the identity card), height, pre-pregnancy body weight, and family history of diabetes, was collected through face-to-face interviews and self-completed questionnaires. The participants’ own birth weight, and their mothers’ body weight at the time of delivering them, were also collected. Information about reproductive characteristics (gravidity, parity, multiple birth (yes/no), and pregnancy complications) and laboratory test results, including hemoglobin (Hb), urine ketones (U-Ket), fasting plasma glucose (FPG), triglyceride (TG), total cholesterol (TC), and high-density lipoprotein (HDL), was extracted from the participants’ medical records.
The Medical Ethics Committee of the first author’s university approved the study (ethics approval number: QYFYKYLL411311920). All participants were informed of the aims and plan of the study, and written consent was obtained. Throughout the research process, participants’ names were anonymized, and a unified numbering system was used to identify them.
Women were eligible for inclusion if they 1) were aged 18 years or above, 2) planned to give birth in the study hospital, and 3) had a singleton pregnancy. Women were not eligible to participate in the study if they 1) had previously been diagnosed with type Ⅰ or type Ⅱ diabetes mellitus, 2) had their first pregnancy visit later than the 28th week of gestation, so that antenatal examination data from early pregnancy could not be obtained, or 3) had cognitive or communication impairments, such as being unable to hear or speak. The diagnosis of GDM was based on the results of the oral glucose tolerance test (OGTT) administered at the 24th to 28th week of gestation. Participants were diagnosed with GDM if their blood glucose level reached or exceeded any of the following thresholds: 5.1 mmol/L at fasting, 10.0 mmol/L at 1 h, or 8.5 mmol/L at 2 h after taking sugar [21].
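The diagnostic rule can be illustrated with a short sketch. The function and variable names below are hypothetical, and the rule is read as "any one value at or above its threshold", following the "or" in the text.

```python
# Thresholds in mmol/L, as quoted in the text [21].
OGTT_THRESHOLDS = {"fasting": 5.1, "one_hour": 10.0, "two_hour": 8.5}


def is_gdm(fasting, one_hour, two_hour):
    """Return True if any OGTT glucose value reaches or exceeds its threshold."""
    values = {"fasting": fasting, "one_hour": one_hour, "two_hour": two_hour}
    return any(values[name] >= limit for name, limit in OGTT_THRESHOLDS.items())


# Illustrative values only, not study data.
print(is_gdm(4.8, 9.9, 8.4))   # all values below their thresholds
print(is_gdm(4.5, 10.2, 7.0))  # 1 h value above 10.0
```

A single elevated value is enough for a positive label under this reading, so a participant with a normal fasting value can still be classified as GDM.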
Prediction methods
To train the model and evaluate its accuracy systematically, the train_test_split function was used to randomly divide the data set into a training set (75%) and a test set (25%). Each model was first trained on the training data and then evaluated on the test data. Using Python-based tools, four machine learning algorithms (LR, RF, SVM, and ANN) were used to model the original data, and the prediction abilities of the resulting models were compared.
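A minimal sketch of this workflow follows. It assumes scikit-learn (the text names train_test_split but not the library), uses synthetic data from make_classification in place of the cohort, and picks hyperparameters and feature scaling for illustration only. MLPClassifier stands in for the ANN; the study's actual network is not described here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cohort (assumption): 400 women, 10 features,
# roughly 20% positive (GDM) labels.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# 75% training set, 25% test set, split at random.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The four model families named in the text; settings are illustrative.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                       random_state=0)),
}

# Train each model on the training set and score it on the held-out test set.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

for name, value in accuracy.items():
    print(f"{name}: test accuracy = {value:.3f}")
```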
A single machine learning model that improves the true positive rate may also increase the false positive rate. To limit this, a New-Stacking algorithm based on the ensemble method was used. The GDM prediction models established by LR, RF, SVM, and ANN served as the primary decision makers, and a Decision Tree (DT), a basic classification algorithm, made the secondary decision to improve the prediction results. The process consisted of two stages. In the first stage, the entire dataset was randomly divided into a training set and a test set, and N different models were fit to the training set using K-fold cross-validation. For each model, fitting K times in turn, each time predicting the held-out fold, gave a prediction for every sample in the training set. Each of the K fitted models also predicted every sample in the test set, and the K values were averaged to give the prediction for the test set. Applying this to the four models (LR, RF, SVM, and ANN; N = 4) generated two output matrices, of dimensions (nrow(Train), N) and (nrow(Test), N), which were passed to the second stage of the Stacking method. In the second stage, a DT model was fit to the first-stage results for the training set and was then used to make predictions for the test set.
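The two stages can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it uses scikit-learn, synthetic data, K = 5, predicted probabilities as first-stage outputs, and a depth-limited tree, none of which are specified in the text. The helper name out_of_fold_features is hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def out_of_fold_features(models, X_train, y_train, X_test, k=5, seed=0):
    """Stage 1: build the (nrow(Train), N) and (nrow(Test), N) matrices."""
    folds = KFold(n_splits=k, shuffle=True, random_state=seed)
    train_meta = np.zeros((X_train.shape[0], len(models)))
    test_meta = np.zeros((X_test.shape[0], len(models)))
    for j, model in enumerate(models):
        for fit_idx, hold_idx in folds.split(X_train):
            fitted = clone(model).fit(X_train[fit_idx], y_train[fit_idx])
            # Each training sample is predicted once, by the model that
            # did not see it during fitting.
            train_meta[hold_idx, j] = fitted.predict_proba(X_train[hold_idx])[:, 1]
            # Each test sample gets K predictions, which are averaged.
            test_meta[:, j] += fitted.predict_proba(X_test)[:, 1] / k
    return train_meta, test_meta


# Synthetic stand-in for the cohort (assumption).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

base_models = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    RandomForestClassifier(n_estimators=100, random_state=0),
    make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    make_pipeline(StandardScaler(),
                  MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                random_state=0)),
]

train_meta, test_meta = out_of_fold_features(
    base_models, X_train, y_train, X_test, k=5)

# Stage 2: a decision tree makes the secondary decision.
meta_model = DecisionTreeClassifier(max_depth=3, random_state=0)
meta_model.fit(train_meta, y_train)
stacked_pred = meta_model.predict(test_meta)
print("stacked test accuracy:", (stacked_pred == y_test).mean())
```

Because every first-stage training prediction comes from a model that never saw that sample, the decision tree is fit on outputs that resemble what the base models produce for unseen data.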
Model evaluation
The performance of each model was evaluated using the area under the receiver operating characteristic curve (AUC), diagnostic accuracy, sensitivity, and specificity. When a woman in the test set with a normal pregnancy was predicted by the model to have a normal pregnancy, the result was marked as a True Negative (TN); when she was predicted to have GDM, it was marked as a False Positive (FP). Similarly, when a GDM patient in the test set was predicted to be normal, the result was marked as a False Negative (FN); when she was correctly predicted to have GDM, it was marked as a True Positive (TP). Diagnostic accuracy was defined as the proportion of all participants whose GDM status was correctly predicted (Eq. 1).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
Sensitivity was defined as the percentage of GDM patients whose GDM status was successfully detected (Eq. 2).
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$
Specificity was defined as the proportion of normal pregnancies that were correctly identified as normal (Eq. 3).
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{3}$$
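The three definitions can be checked on a small worked example. The labels below are invented for illustration (1 = GDM, 0 = normal), and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = GDM, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Illustrative labels: 3 GDM cases and 5 normal pregnancies.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)  # TP = 2, FN = 1, TN = 4, FP = 1
```

Here accuracy is 6/8 = 0.75, sensitivity is 2/3, and specificity is 4/5 = 0.8.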
The receiver operating characteristic (ROC) curve describes how well a model separates two classes across all decision thresholds. The horizontal axis represents the false positive rate (1 – Specificity) and the vertical axis represents the Sensitivity. The closer the curve lies to the upper left corner, the higher the Sensitivity and the lower the false positive rate of the model. The AUC summarizes this quantitatively: the larger the AUC, the better the prediction accuracy of the model.
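To make the construction concrete, the following sketch builds the ROC points by sweeping the decision threshold from high to low and integrates them with the trapezoidal rule. The scores and labels are invented, the function names are hypothetical, and both classes are assumed to be present.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, sensitivity) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Illustrative data: 3 GDM cases (1) and 3 normal pregnancies (0).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
auc = area_under_curve(points)
print(points)
print(auc)
```

In this example 8 of the 9 possible (GDM, normal) pairs are ranked correctly, so the AUC equals 8/9; a model that ranks every GDM case above every normal case would reach 1.0.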
Data analysis
The collected data were entered into Excel 2016, and all categorical variables were coded as 0/1 variables. The outcome variable was whether GDM was diagnosed by the OGTT at the 24th to 28th week of gestation: it was coded as 1 if GDM was diagnosed and as 0 if the OGTT was normal.
Some indexes in the original data set were inherently correlated, such as BMI and body weight, or weight gain and body weight. To remove the redundancy between the original indexes and the composite indexes derived from them, principal component analysis (PCA) was used to reduce the dimension of the data. PCA extracts a series of principal components from the original data and projects the high-dimensional data onto a lower-dimensional space. These principal components are linear combinations of the original variables, which approximately reflect the characteristics of the original data whilst reducing the impact of noise. In this study, PCA was used to extract the global features of the original data.
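The projection step can be sketched with NumPy. This is an illustration under assumptions: the data are synthetic, the variables are standardized before the decomposition, and two components are kept; the text does not state the study's preprocessing or the number of components retained. The function name pca_project is hypothetical.

```python
import numpy as np


def pca_project(X, n_components):
    """Project standardized data onto its leading principal components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    explained = singular_values**2 / np.sum(singular_values**2)
    return Z @ components.T, components, explained[:n_components]


# Synthetic, correlated indexes (assumption): BMI is computed from weight.
rng = np.random.default_rng(0)
height = rng.normal(1.62, 0.06, size=200)      # m
weight = rng.normal(58.0, 8.0, size=200)       # kg
bmi = weight / height**2                       # kg/m^2
weight_gain = rng.normal(12.0, 4.0, size=200)  # kg

X = np.column_stack([weight, bmi, weight_gain])
scores, components, explained = pca_project(X, n_components=2)
print("explained variance ratio:", explained)
print("correlation between components:", np.corrcoef(scores.T)[0, 1])
```

The resulting component scores are uncorrelated with each other, which is the property that removes the redundancy between indexes such as BMI and body weight.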