Prediction of fasting plasma glucose and glycated haemoglobin using machine learning based on tongue features

Given tongue features and basic features, this study aimed to develop and assess a non-invasive machine learning model for regression prediction of fasting plasma glucose (FPG) and glycated haemoglobin (HbA1c), which could help optimize diabetes risk warning. We collected the basic features, tongue features and blood features of the subjects. Using machine learning algorithms to analyze these data, we built models to predict FPG and HbA1c. The performance of the models was then evaluated through 5-fold cross-validation results and test set results. The resulting diabetes risk prediction model predicts FPG and HbA1c and can provide timely risk warning to help prevent or delay the onset of diabetes.


Background
Given tongue features and basic features, this study aimed to develop and assess a non-invasive machine learning model for regression prediction of fasting plasma glucose and glycated haemoglobin, which could help optimize diabetes risk warning.

Methods
We collected the basic features, tongue features and blood features of the subjects. Using machine learning algorithms to analyze these data, we built models to predict fasting plasma glucose and glycated haemoglobin. Then the performance of the models was evaluated through 5-fold cross-validation results and test set results.

Results
The results of cross validation on the training set showed that given non-invasive input features, the minimum average mean square error of fasting plasma glucose and glycated haemoglobin prediction

Conclusions
We developed an effective non-invasive method for estimating fasting plasma glucose and glycated haemoglobin from tongue features and basic features combined, which may help identify individuals at high risk for diabetes.

Background
Diabetes takes a significant toll on health care systems and public health [1]. The global prevalence is currently 9.3%, with 463 million people aged 20-79 living with diabetes mellitus (DM). This number is estimated to reach 578.4 million by 2030. China has as many as 116.4 million diabetic patients, the highest number in the world. In 2019 alone, the medical cost of diabetic patients in China was as high as $109 billion, second only to the United States [2]. China is the country most severely affected by diabetes. Diabetes affects multiple organ systems and is associated with multiple vascular and non-vascular complications [3]. Excess morbidity and premature death place a heavy burden on, and have a far-reaching impact on, individuals, families and societies [4].
Effective interventions to prevent or delay diabetes and its complications have the potential to save significant health care costs and improve the health of a population in the long run [5,6]. However, early intervention should not be undertaken for people at a low risk of diabetes, and hence it is vital to identify high-risk individuals. To reduce the influence of diabetes and lower its morbidity, non-invasive risk models are needed to detect individuals at a high risk. Moreover, non-invasive risk models are more economical and more suitable for large-scale screening than invasive risk models [7,8].
There is growing evidence that diabetes is associated with changes in the tongue image. Yellow tongue coating is associated with a high prevalence of diabetes and also with prediabetes among Asian people [9]. As non-invasive and readily available features, purple tongue, thick tongue coating, and yellow tongue coating can be used for early screening of diabetes [10]. At present, studies that use machine learning to analyze tongue image features and establish a diabetes risk prediction model are mostly qualitative, binary classification studies [11,12].
We hypothesize that we could train a machine learning model to identify the subtle features acquired from a standard tongue image that are due to glucose metabolism level changes associated with a history of hyperglycemia. To the best of our knowledge, this is the first attempt to use tongue features to quantitatively predict fasting plasma glucose (FPG) and glycated haemoglobin (HbA1c).

Methods
We use machine learning to establish the mapping relation between various types of features and the glucose metabolism index, make predictions for FPG and HbA1c in undiagnosed subjects based on the mapping relation, and finally achieve the prediction of diabetes risk (Fig. 1).

Data Collection
We included all subjects aged 18 years or older with at least one normal digital tongue image acquired at Shu Guang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine (SHUTCM) between Apr 13, 2015, and Nov 15, 2019. A total of 2,500 subjects participated in the study. Due to incomplete data or incorrect data recording, 320 subjects were excluded, and eventually 2,180 subjects were enrolled in the study. Basic information and laboratory test results were provided by Shu Guang Hospital. Our study was approved by the ethics committee of SHUTCM. All subjects were willing to participate in the study and provided signed informed consent.

Tongue Features Extraction
Tongue images are captured by the TFDA-1 Tongue Diagnosis Instrument (Fig. 2). The key component of TFDA-1 is a stable light source, which has a color temperature of 5000 K and a color rendering index of 97. Given the stable light source and white diffuse reflection coating, TFDA-1 can ensure a standard and stable environment for acquiring tongue images [13].
After tongue images are acquired by TFDA-1, features are extracted by the Tongue Diagnosis Analysis System (TDAS) V2.0. TDAS first segments the tongue area from the original image, separates the tongue body (TB) and tongue coating (TC) by the "chrominance-threshold method" [14], and automatically calculates the color features and texture features (Fig. 3). R, G, B represent the three components of the RGB color space; L, a, b represent the three components of the Lab color space; H, S, I represent the three components of the HSI color space; Y, Cr, Cb represent the three components of the YCrCb color space. perAll is the ratio of the tongue coating to the entire tongue surface, and perPart is the ratio of the tongue coating to the tongue surface without the tongue coating. Texture features include contrast (CON), angular second moment (ASM), entropy (ENT), and mean [15,16].
Both TFDA-1 and TDAS V2.0 were developed by the intelligent diagnostic laboratory of SHUTCM.

Handling Missing and Abnormal Values
Missing data may weaken the representativeness of the samples and complicate the analysis [17]. To ensure the reliability of the experiment, samples with missing values were deleted directly. Outliers were identified with the Tukey method [18]: a value X of a feature is an outlier if it lies above the upper whisker or below the lower whisker,

$$X > Q_3 + 1.5 \times \mathrm{IQR} \quad \text{or} \quad X < Q_1 - 1.5 \times \mathrm{IQR},$$

where X denotes the original value of the feature, Q3 denotes the upper quartile, Q1 denotes the lower quartile, IQR denotes the difference between Q3 and Q1, the upper whisker is Q3 plus 1.5 times IQR, and the lower whisker is Q1 minus 1.5 times IQR. A sample containing two or more outliers was deleted directly. The few remaining abnormal values were replaced with the mean value of the feature.
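The outlier rule can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual code: the function names, the toy data, the quartile interpolation method, and the choice to compute the replacement mean over in-range values only are all assumptions.

```python
import statistics

def tukey_fences(values):
    """Return (lower whisker, upper whisker) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def clean_samples(samples):
    """samples: list of dicts mapping feature name -> numeric value."""
    features = list(samples[0])
    fences = {f: tukey_fences([s[f] for s in samples]) for f in features}

    def is_outlier(sample, f):
        low, high = fences[f]
        return not (low <= sample[f] <= high)

    # Assumption: the replacement mean is taken over in-range values only.
    means = {
        f: statistics.mean(s[f] for s in samples if not is_outlier(s, f))
        for f in features
    }
    cleaned = []
    for s in samples:
        flagged = [f for f in features if is_outlier(s, f)]
        if len(flagged) >= 2:
            continue  # drop samples with two or more outlying features
        # a single outlying value is replaced by the feature mean
        cleaned.append({f: means[f] if f in flagged else s[f] for f in features})
    return cleaned

# Toy data (hypothetical): the last sample has two outliers, the one before it has one.
ages = [40, 41, 42, 43, 44, 45, 46, 47, 48, 200]
weights = [60, 61, 62, 63, 64, 65, 66, 67, 300, 400]
samples = [{"age": a, "weight": w} for a, w in zip(ages, weights)]
cleaned = clean_samples(samples)
```

In this toy run the last sample is dropped and the single abnormal weight in the second-to-last sample is replaced by the mean of the in-range weights.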

Data Scaling and Normalization
Many machine learning models work best when the feature values are on a similar scale and approximately normally distributed. Each feature was therefore standardized as

$$X' = \frac{X - \bar{X}}{\mathrm{Std}},$$

where X denotes the original value of the feature, X̄ denotes the mean value of the feature, and Std denotes the standard deviation of the feature. Normalization is applied to each observation so that the values in a row have a unit norm.
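A minimal Python sketch of the two operations follows. It assumes the population standard deviation and the Euclidean (L2) norm, neither of which is specified in the text, and the function names and sample values are illustrative only.

```python
import math
import statistics

def standardize(column):
    """Column-wise z-score: (X - mean) / Std (population Std is an assumption)."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    return [(x - mean) / std for x in column]

def unit_norm(row):
    """Row-wise normalization so the row has Euclidean norm 1 (L2 is an assumption)."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]

# Hypothetical systolic blood pressure values in mmHg
sbp = [110.0, 120.0, 130.0, 140.0, 150.0]
sbp_z = standardize(sbp)
row = unit_norm([3.0, 4.0])
```

Standardization acts on one feature across all subjects, whereas the unit-norm step acts on one subject across all features, so the two are not interchangeable.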
Basic features included in this study were Age, Weight, systolic blood pressure (SBP) and diastolic blood pressure (DBP). Age was measured in years, Weight in kg, and SBP and DBP in mmHg.

Statistical analysis
Normally distributed variables were presented as mean ± standard deviation. Skewed variables were presented as the median with the 25th and 75th percentiles. Continuous data were tested for normality using the D'Agostino-Pearson test and for homogeneity of variance using Bartlett's test. The two-sample t-test, the separate-variance t-test and the Wilcoxon rank-sum test were used to compare continuous variables between the normal group and the hyperglycemia group. Correlations were assessed using Pearson's and Spearman's methods. All tests were two-sided, and statistical significance was assumed at P < 0.05. All analyses were performed with Python, version 3.7.4. Statistical analysis was used to compare the basic and blood features of the normal and hyperglycemia groups (Table 1), and the correlations between the features and the glucose metabolism index were analyzed.

Data Partition
We divided the data set into a training set and a test set at a ratio of 8:2. The training set consisted of 80% of the original database (1744 individuals), and the test set consisted of the remaining 20% (436 individuals). We performed 5-fold cross-validation on the training set and selected the model through cross-validation, i.e., we used 80% of the training set for fitting, left 20% as the validation set, and rotated five times. The test set did not participate in the training process, and the actual performance of the model was checked on the test set.
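The partition arithmetic can be checked with a short index-based sketch. The shuffling seed, the function names and the interleaved fold assignment are assumptions made for illustration; the study's actual splitting procedure is not described beyond the ratios.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle subject indices and hold out a fixed fraction as the test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold(indices, k=5):
    """Yield (training, validation) index lists, rotating the validation fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

train_idx, test_idx = train_test_split_indices(2180)
folds = list(k_fold(train_idx, k=5))
```

With 2,180 enrolled subjects this yields 1,744 training and 436 test indices, and each of the five rotations holds out roughly one fifth of the training set for validation.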

Feature Selection
In this paper, we selected Age, Weight, BP, RBC, HGB, WBC, TCHO, TG, HDL, LDL, ALT, TB-L, TB-a, TB-b, TC-L, TC-a, and TC-b according to research practice. Age is an important risk factor for the diagnosis, development and prognosis of type 2 diabetes; however, HbA1c tends to decline with age and only increases in people older than 90 years [21]. Weight is highly correlated with the occurrence of diabetes. After the diagnosis of diabetes, weight loss of more than 5% can improve the level of HbA1c and reduce the risk of cardiovascular disease within 10 years [22]. There is no causal relationship between hypertension and diabetes, but diabetes often leads to increased blood pressure [23]. Hyperglycemia can affect blood indicators, which are recognized as risk factors for complications. Due to the chronic inflammatory state of diabetes, pro-inflammatory factors can promote the differentiation and maturation of white blood cells. Diabetes changes the surface charge and aggregation of red blood cells, reducing the number of red blood cells and the content of hemoglobin [24]. Fat accumulation in the body leads to excessive saturated fatty acids remaining in the cells. These saturated fatty acids can be toxic to liver and pancreatic islet cells. Therefore, it is very important to evaluate lipid toxicity by detecting TCHO, TG, HDL, LDL and ALT [25].
Tongue diagnosis has been widely used in the diagnosis of diabetes in traditional Chinese medicine. In clinical practice, 80% of diabetic patients do not show typical symptoms, yet their tongue signs have already changed. For example, the tongue of diabetic patients is often red and yellow [26]. Studies have shown that yellow tongue coating is associated with a high incidence of diabetes [9]. Given the above research, we chose the related features to train the machine learning regression model.

Model Construction
In our study, we used multiple supervised learning models for regression of the FPG and HbA1c of subjects.
We used not only the popular artificial neural network (ANN) model, but also three ensemble models: Gradient Boosting Decision Tree (GBDT), Random Forest (RF) and eXtreme Gradient Boosting Tree (XGBT).
A simple neural network generally includes three layers: an input layer, a hidden layer, and an output layer. The neurons are connected to each other. The data enters the neural network at the input layer and propagates forward to the output layer to get the solution. There is a corresponding weight at the input of each neuron to control the strength of the input [27]. RF [28] is an ensemble tree algorithm based on the idea of bagging. RF builds each base decision tree from a different bootstrap sample of the data. Each node is split using the best among a subset of predictors randomly chosen at that node. The double randomness of feature selection and sample selection can not only improve the generalization ability of the model, but also reduce overfitting.
GBDT and XGBT [29] have been widely used in a number of data mining and machine learning challenges. They are tree ensemble models which use K additive functions to predict the output. However, XGBT controls the complexity of the model by adding regularization terms to the objective function to avoid overfitting. We constructed models for prediction of FPG and HbA1c with machine learning algorithms (Fig. 5). In the current experiment, to study the contribution of various types of features in modeling, we combined different types of features, including basic features, blood features and tongue features.
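To make the idea of "K additive functions" concrete, the sketch below fits a toy gradient boosting model for squared error, where each added function is a one-split regression stump fitted to the current residuals. It is an illustration only and not the GBDT or XGBT implementation used in the study: it handles a single feature, omits XGBT's regularization terms, and all names, hyperparameters and data are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single-threshold split of a 1-D feature that minimizes squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the maximum so both sides are non-empty
        left, right = residual[x <= threshold], residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]  # (threshold, left value, right value)

def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def fit_boosting(x, y, k=50, learning_rate=0.1):
    """Fit an additive model: constant + learning_rate * (sum of K stumps)."""
    base = float(y.mean())
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(k):
        stump = fit_stump(x, y - prediction)  # fit the current residuals
        prediction = prediction + learning_rate * stump_predict(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

# Toy data (hypothetical, not from the study)
x = np.linspace(0.0, 1.0, 40)
y = 5.0 + 2.0 * np.sin(6.0 * x)
model = fit_boosting(x, y)
mse_mean_only = float(np.mean((y - y.mean()) ** 2))
mse_boosted = float(np.mean((y - predict_boosting(model, x)) ** 2))
```

Each round adds one function to the sum, so the training error of the boosted model falls below that of the constant predictor it starts from.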

Fusion Strategy
The machine learning based model blending approach is a very common fusion technique [30]. The idea of model fusion is to use many independent models to calculate initial predictions, and then mix the initial predictions to achieve a better final prediction result. Four independent models were used to fit the training set and to predict the validation set and the test set. The prediction results of the validation set were combined into a new training matrix, and the prediction results of the test set were combined into a new test matrix. A second-layer linear regression was used to fit the new training matrix, predict the new test matrix, and get the final test result (Fig. 6). The stacking and generalization operation of the blending approach uses different data to avoid the leakage of learning information.
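The data flow of the blending approach can be sketched as follows. The base learners here are deliberately simple stand-ins (linear regressions on single columns), not the ANN, RF, GBDT and XGBT models of the study, and the synthetic data, split sizes and function names are all assumptions.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear regression with an intercept term."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def make_base_model(columns):
    """Stand-in base learner: linear regression on a subset of columns."""
    def fit(X, y):
        return fit_linear(X[:, columns], y)
    def predict(coef, X):
        return predict_linear(coef, X[:, columns])
    return fit, predict

def blend(base_models, X_train, y_train, X_val, y_val, X_test):
    val_preds, test_preds = [], []
    for fit, predict in base_models:
        fitted = fit(X_train, y_train)             # first layer sees only the training set
        val_preds.append(predict(fitted, X_val))   # one column of the new training matrix
        test_preds.append(predict(fitted, X_test)) # one column of the new test matrix
    new_train = np.column_stack(val_preds)
    new_test = np.column_stack(test_preds)
    meta = fit_linear(new_train, y_val)            # second-layer linear regression
    return predict_linear(meta, new_test)

# Synthetic data (hypothetical, not from the study)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 5.6 + X @ np.array([1.0, -0.5, 0.3, 0.8]) + rng.normal(scale=0.1, size=300)
X_train, X_val, X_test = X[:180], X[180:240], X[240:]
y_train, y_val, y_test = y[:180], y[180:240], y[240:]
base_models = [make_base_model([i]) for i in range(4)]
final_pred = blend(base_models, X_train, y_train, X_val, y_val, X_test)
blend_mse = float(np.mean((y_test - final_pred) ** 2))
```

Because the second layer is fitted on validation-set predictions rather than on training-set predictions, the first-layer models never see the targets that the second layer learns from.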

Evaluation Criteria
Mean squared error (MSE) and the coefficient of determination (R-squared) were used to evaluate the performance of our prediction models. MSE is simple to calculate, has a clear meaning, and is a commonly used evaluation index in statistical analysis [31]. We should consider not only the accuracy of the prediction model but also its influence on clinical decisions. Consequently, Clarke's Error Grid Analysis (EGA) [32,33] was used to determine whether the error of the predicted FPG value relative to the actual value was acceptable. The more values that appear in Zones A and B, the more accurate the model is in terms of clinical utility [34]. A scatter plot was used to assess the performance of the HbA1c prediction model [35]. The closer the slope of the fitted line is to 1 and the closer the intercept is to 0, the better the model performance.
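The metrics above can be written out directly. The sketch below computes MSE, R-squared and the share of predictions in Zone A of the Clarke grid, using the commonly stated Zone A rule (prediction within 20% of the reference value, or both values below 70 mg/dL). It assumes glucose in mg/dL, which this section does not specify, and leaves out Zones B to E; the function names and example values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between reference and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def zone_a_fraction(reference, predicted, hypo_limit=70.0):
    """Fraction of points in Clarke Zone A (values assumed to be in mg/dL)."""
    in_zone = sum(
        1
        for r, p in zip(reference, predicted)
        if abs(p - r) <= 0.2 * r or (r < hypo_limit and p < hypo_limit)
    )
    return in_zone / len(reference)

# Hypothetical FPG values in mg/dL
y_true = [90.0, 100.0, 110.0, 120.0]
y_pred = [95.0, 100.0, 105.0, 130.0]
fpg_mse = mse(y_true, y_pred)
fpg_r2 = r_squared(y_true, y_pred)
fpg_zone_a = zone_a_fraction(y_true, y_pred)
```

If FPG is recorded in mmol/L, the 70 mg/dL limit would need converting (roughly 3.9 mmol/L) before applying the Zone A rule.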

Cross Validation
To estimate the performance of the models, we performed five-fold cross validation on the training set (Table 3; Fig. 8 and Table 4). The lowest MSE and highest R-squared were obtained by the machine learning models applied to the prediction of FPG and HbA1c using non-invasive features, that is, basic features and tongue features.
Combining tongue features with basic features decreased MSE and increased R-squared compared with FPG and HbA1c prediction with basic features alone. Combining tongue features with blood features decreased MSE and increased R-squared compared with FPG and HbA1c prediction with blood features alone. Combining basic features, blood features and tongue features decreased MSE and increased R-squared compared with FPG and HbA1c prediction using the model that combined basic features and blood features (Fig. 9 and Table 5; Fig. 10 and Table 6; Fig. 12 and Table 7; Fig. 13 and Table 8). The EGA results of four models on the test set are presented: the RF model with non-invasive features, the XGBT model with non-invasive features, the blending model with non-invasive features and the blending model with full features. The non-invasive RF model achieved the highest R-squared and lowest MSE, but the non-invasive XGBT and blending models obtained better EGA results than the non-invasive RF model. The best EGA results were obtained by the blending model with full features.
Given non-invasive features, 89.68% of the results predicted by the RF model appear in zone A and 9.63% in zone B; 90.14% of the results predicted by the XGBT model appear in zone A and 9.17% in zone B; 90.83% of the results predicted by the blending model appear in zone A and 8.49% in zone B. The EGA results obtained by the blending model with non-invasive input features closely agreed with those of the best model with full features (Fig. 11). Scatter plots of HbA1c from five models are shown. The GBDT model, which had the best MSE and R-squared, achieved a lower slope and intercept than the XGBT model and the RF model. The blending model with non-invasive input features did not produce better results. Based on non-invasive features, the XGBT model obtained the highest slope and lowest intercept, which was better than the blending model with full features (Fig. 14).

Discussion
Obesity, physical inactivity, and smoking have a tremendously harmful effect on the health of patients with diabetes [36]. Recent research shows that timely lifestyle-modifying interventions can prevent or delay the onset of diabetes [37]. Identifying high-risk subjects early and intervening actively as soon as possible can prevent people with prediabetes from developing diabetes and other diseases [38]. Furthermore, people who already have diabetes can be diagnosed as early as possible, receive treatment, and avoid complications [39-41]. Therefore, to reduce the morbidity and mortality of diabetes, there are strong reasons to use diabetes risk prediction models for population screening. Using machine learning methodologies, we established a diabetes risk prediction regression model.
Machine learning is a branch of artificial intelligence that can extract the inherent relationships in data, such as decision rules and patterns [43]. Several diabetes risk prediction models in previous studies were based on machine learning algorithms [44]. Choi et al. [45] developed two effective machine learning models for predicting pre-diabetes. The input features included age, gender, family history of diabetes, hypertension, alcohol intake, BMI, smoking status, waist circumference, and physical activity. A recent cross-sectional study conducted in the USA showed that machine learning models based on survey questionnaires are able to identify individuals at high risk of diabetes [46]. In another study [32], a deep learning model for dynamically predicting blood glucose was developed, which is conducive to self-management of diabetes.
Machine learning algorithms have become a key process for mining the internal relationship between clinical data and diabetes or pre-diabetes [47]. In our study, classical machine learning models with different structures were used to explore the linear relationship between tongue features and glucose metabolism indicators. Moreover, we attempted to enhance this relationship through model fusion.
Current research mainly uses statistical methods to explore the relationship between tongue features and diabetes or pre-diabetes. Given tongue features, diabetes risk prediction models established by machine learning methods mainly perform qualitative classification prediction [11,12]. However, several potential limitations of this study should be mentioned. Despite our best efforts, the sample size included in the study was still relatively small. In addition, our study requires independent evaluation on data collected from other sources to further validate the performance of the model.

Conclusion
We discovered a linear relationship between tongue features and glucose metabolism index by means of machine learning algorithms.

Figure 6
Fusion strategy of blending approach.

Figure 7
Average MSE of various FPG prediction models with different types of features using cross validation.

Figure 8
Average MSE of various HbA1c prediction models with different types of features using cross validation.

Figure 9
MSE of various FPG prediction models with different types of features on the test set.

Figure 10
R-squared of various FPG prediction models with different types of features on the test set.

Figure 11
The EGA results of (a) RF model with non-invasive features, (b) XGBT model with non-invasive features, (c) blending model with non-invasive features and (d) blending model with full features on the test set.
Non-invasive features refer to tongue features and basic features combined.

Figure 12
MSE of various HbA1c prediction models with different types of features on the test set.

Figure 13
R-squared of various HbA1c prediction models with different types of features on the test set.
