This is the first study to predict the risk of GDM from the elemental content of fingernails using a machine learning algorithm. A similar approach has been used to evaluate the risk of GDM based on urinary metabolites22. Conventional statistical models have been widely applied to evaluate the association between elements and GDM13,23,24. However, no study has applied machine learning to elemental data for this purpose. In the present study, we first used conventional statistical models and found significant associations of Be, Ni, Se, Sn, Sb, Cu and Hg with GDM (Table 2 to Table 4). To our knowledge, this is the first report of a significant association between Be concentration and GDM. Nevertheless, associations with elements alone do not allow statistical models to determine the risk of GDM conclusively. According to Senat et al., many other basic characteristics, including age, pre-pregnancy BMI, and family history of diabetes, are general risk factors for GDM25. Passive smoking23 and parity24 have also been reported as potential risk factors. Machine learning can take these many factors into account. Hence, machine learning analysis was implemented to find hidden patterns in multi-factorial data collected from pregnant women with and without GDM, and then to predict the risk of GDM with the trained models.
Numerous machine learning models for the prediction of GDM have been reported26,27; however, there is no consensus as to which one is best. As shown in Supplementary Fig. S1, the prediction performances of 15 machine learning algorithms were compared using the training data set. Ensemble models and SVM models yielded similar AUCs, but the ensemble subspace model was more reproducible, suggesting that it would be more reliable for the data in the present study. The major advantage of ensemble models over SVM is that an ensemble combines multiple single models into a new model. As a result, the prediction performance of an ensemble algorithm is usually better than that of a single algorithm28. After selecting the algorithms, different combinations of elements and basic characteristics were used to train models in order to obtain the highest accuracy.
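The idea of an ensemble subspace model can be illustrated with a minimal sketch in which each base learner sees only a random subset of the features and the ensemble score is the fraction of learners voting for the positive class. This is not the implementation used in the study: the nearest-centroid base learner, the function names, the parameter values and the four-sample toy data set are all assumptions made purely for illustration.

```python
import random

def nearest_centroid_fit(X, y, features):
    """Fit a nearest-centroid base learner restricted to the given feature indices."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return features, centroids

def nearest_centroid_predict(model, x):
    """Return the label whose centroid is closest to x in the learner's subspace."""
    features, centroids = model
    def sq_dist(centroid):
        return sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def fit_random_subspace(X, y, n_learners=25, subspace_size=2, seed=0):
    """Train n_learners base models, each on a random subset of the features."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_features = len(X[0])
    return [
        nearest_centroid_fit(X, y, rng.sample(range(n_features), subspace_size))
        for _ in range(n_learners)
    ]

def predict_score(ensemble, x, positive=1):
    """Fraction of base learners that vote for the positive class."""
    votes = sum(nearest_centroid_predict(m, x) == positive for m in ensemble)
    return votes / len(ensemble)

# Synthetic toy data (NOT study data): three features, label 1 = case, 0 = control.
X = [[1.0, 5.0, 0.2], [1.2, 4.8, 0.1], [3.0, 2.0, 0.9], [3.2, 2.1, 1.0]]
y = [0, 0, 1, 1]

ensemble = fit_random_subspace(X, y)
print(predict_score(ensemble, [3.1, 2.0, 0.95]))  # 1.0
print(predict_score(ensemble, [1.1, 4.9, 0.15]))  # 0.0
```

Because every learner is trained on a different feature subset, the averaged vote tends to be less sensitive to any single noisy feature than one model trained on all features would be, which is one plausible reason for the reproducibility noted above.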
The model was first trained on the content of a single element in fingernails. The results of the single-element models (Supplementary Table S5) show that only the ensemble model trained on the Cu level in fingernails provided acceptable prediction performance. Multiple studies have reported that multi-element exposure is significantly associated with GDM6,7. Hence, we also evaluated the performance of models trained on multiple elements, ranging from two-element to four-element combinations. Table 5 shows that the prediction performance of the trained model increased with the number of elements. The trained model was validated with an external testing data set. Figure 1 shows that the trained model achieved acceptable accuracy, suggesting that the concentrations of Cu, Ni and Se were important predictors of GDM. In the present study, adding the basic characteristics of pregnant women did not improve the prediction performance of the machine learning models. This suggests that the models used in the present study worked better with numerical variables than with categorical variables33.
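The search over one- to four-element combinations described above can be sketched as an exhaustive enumeration followed by a ranking. The sketch below is illustrative only: the candidate list is limited to the seven elements named earlier in this section, and `toy_score` is a hypothetical placeholder with arbitrary weights (not estimates from the study) chosen only so that the example has a unique answer. In practice the score would be a performance measure such as a cross-validated AUC computed on the training data.

```python
from itertools import combinations

# Candidate elements for illustration; the study's actual search space may differ.
ELEMENTS = ["Be", "Ni", "Se", "Sn", "Sb", "Cu", "Hg"]

def candidate_subsets(elements, min_size=1, max_size=4):
    """Yield every combination of elements with min_size..max_size members."""
    for k in range(min_size, max_size + 1):
        yield from combinations(elements, k)

def best_subset(elements, score, min_size=1, max_size=4):
    """Return the combination with the highest value of score(subset)."""
    return max(candidate_subsets(elements, min_size, max_size), key=score)

def toy_score(subset):
    """Hypothetical placeholder score with arbitrary weights (NOT study results)."""
    weights = {"Cu": 0.3, "Ni": 0.2, "Se": 0.2}
    # Any element without a weight is treated as adding a small penalty.
    return sum(weights.get(e, -0.05) for e in subset)

subsets = list(candidate_subsets(ELEMENTS))
print(len(subsets))                      # 98 = C(7,1) + C(7,2) + C(7,3) + C(7,4)
print(best_subset(ELEMENTS, toy_score))  # ('Ni', 'Se', 'Cu')
```

The number of candidate subsets grows quickly with the number of elements, so selecting the best combination on the training data alone can be optimistic; this is why validation on a separate testing data set matters.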
The elements used to train the predictive model were similar to those examined in most other studies. The correlation between circulating Cu level and GDM has been summarized using data from 14 published studies, and it was concluded that high serum Cu was positively associated with the risk of GDM, especially among Asians during the third trimester29. Multiple systematic reviews and meta-analyses have focused on the association between Se and GDM. Those studies consistently concluded that Se concentrations were lower in women with GDM than in women without GDM, whereas the present study shows the opposite trend. The studies included in those reviews determined serum Se levels in either the second or the third trimester30–32, whereas in this study Se levels were measured in the first trimester. Studies reporting the correlation of blood or urinary Ni with GDM are limited, and their conclusions are inconsistent. The present study found a significant negative association between fingernail Ni level and GDM, whereas previous studies reported no significant association7 or a positive association6. Our results suggest that the correlation between fingernail elements and GDM differs from that observed for blood and urine.
Although the trained model in the present study did not include basic characteristics as predictors, our models highlighted fingernail Cu, Ni and Se concentrations as potential predictors of GDM. To the best of our knowledge, this is the first study to demonstrate the prediction of GDM from elemental contents using machine learning. Our model outperformed models trained on serum triglyceride and fasting plasma glucose levels (AUC: 0.68)16. Although our trained model had a lower AUC than another model trained on cytosine-phosphate-guanine levels in blood (AUC: 0.82)18, the two remained broadly comparable. Although excellent prediction models based on putrescine and microRNA, with AUCs of 0.95 and 0.91, respectively, have been reported, the studies using those models did not include external validation with a testing data set17,26. Our prediction model was validated with a testing data set and achieved an AUC of 0.71, which indicates acceptable performance.
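For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen GDM case receives a higher predicted score than a randomly chosen control, so 0.5 corresponds to random guessing and 1.0 to perfect ranking. The following is a minimal sketch of that pairwise definition; the function name and the small example vectors are illustrative assumptions and are not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for cases, 0 for controls; scores: predicted risk scores.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Illustrative values only (NOT study data).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(roc_auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5]))   # 0.5 (uninformative scores)
print(roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))   # 1.0 (perfect ranking)
```

Computing this quantity on a testing data set that played no part in training, rather than on the training data, is what distinguishes an externally validated AUC from an apparent one.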
Another major highlight of the present study is that fingernail samples were collected in the first trimester. To date, many studies involving nail samples have used nail clippings collected either during a later stage of pregnancy or postpartum34. Information obtained from nail samples represents exposure from a few weeks to a few months before collection35. As a result, the associations observed using those samples relate mainly to the middle to later stages of pregnancy. In contrast, the fingernail samples used in this study represent exposure during the first few weeks of gestation, if not before, which is much earlier than the identification of GDM. This is precisely what prediction means: anticipating a problem before it develops. The model used in this pilot study highlights the ability of fingernail Cu, Ni and Se levels to predict GDM, because the risk was predicted before GDM developed.
In the present work, we collected both urine and fingernail samples from the same individuals and predicted the risk of GDM from their elemental contents through machine learning analysis. One of the major advantages of using fingernails rather than urine is that the elemental detection rate is higher in fingernails than in urine. The elemental analysis detected more than 90% of the 24 elements in fingernail samples, whereas the same analysis detected only 9 elements in urine samples. For fingernails, it should be pointed out that although the detection rates of Be and Hg were relatively low, both elements were significantly associated with the risk of GDM. In terms of the prediction performance of the trained model, prediction from fingernail elemental contents provided acceptable predictive accuracy for the testing data set, whereas prediction from urinary elemental contents was similar to random guessing: the AUC was 0.49 for the external validation of the urine prediction model (Fig. 1)36. This was mainly due to the low elemental detection rate and the absence of significant differences in elemental concentrations between control and GDM patients for urine samples (Supplementary Table S3). Although urine samples are expected to remain dominant in HBM studies, this pilot study highlights fingernails as a potential alternative sample for predicting the risk of GDM.
However, several important limitations should be considered when interpreting the results of the present study. Firstly, the sample size was relatively small. A larger sample (more than 1000 pregnant women in total) will be used in a future study to compare the prediction performance of the models with that of other studies37. Secondly, it is not known why the correlations between some of the elements and GDM found in this study were inconsistent with those of past studies. For example, As content in urine or blood is well known for its correlation with GDM, but no significant association was observed in the present study38,39. To date, only one study has reported As content in toenails in association with GDM; it found that As content in toenails collected 2 weeks postpartum was significantly associated with GDM13. Our study used fingernails collected in the first trimester. The influence of the type of nail and the specific stage of pregnancy needs to be thoroughly examined in future studies, and other reasons for these inconsistencies need to be explored. Thirdly, the urinary elemental detection rates in the present study were low, which affected the results of the machine learning analysis.