We used machine learning to establish the mapping between various types of features and the glucose metabolism indices, used this mapping to predict FPG and HbA1c in undiagnosed subjects, and finally derived predictions of diabetes risk (Fig. 1).
Data Collection
We included all subjects aged 18 years or older with at least one digital, normal tongue capture acquired at Shu Guang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine (SHUTCM) between Apr 13, 2015, and Nov 15, 2019. A total of 2,500 subjects participated in the study. After 320 subjects were excluded because of incomplete or incorrectly recorded data, 2,180 subjects were enrolled. Basic information and laboratory test results were provided by Shu Guang Hospital. The study was approved by the ethics committee of SHUTCM. All subjects participated voluntarily and signed informed consent.
Tongue Features Extraction
Tongue images were captured by the TFDA-1 Tongue Diagnosis Instrument (Fig. 2). The key component of TFDA-1 is a stable light source with a color temperature of 5000 K and a color rendering index of 97. Given the stable light source and white diffuse reflection coating, TFDA-1 can ensure a standard and stable environment for acquiring tongue images[13].
After the tongue images were acquired by TFDA-1, features were extracted by the Tongue Diagnosis Analysis System (TDAS) V2.0. TDAS first segments the tongue area from the original image, separates the tongue body (TB) and tongue coating (TC) by the "chrominance-threshold method"[14], and automatically calculates the color features and texture features (Fig. 3). R, G, B represent the three components of the RGB color space; L, a, b represent the three components of the Lab color space; H, S, I represent the three components of the HSI color space; Y, Cr, Cb represent the three components of the YCrCb color space. perAll is the ratio of tongue coating to the entire tongue surface; perPart is the ratio of tongue coating to the tongue surface without the tongue coating. Texture features include contrast (CON), angular second moment (ASM), entropy (ENT), and mean (MEAN)[15, 16].
Both TFDA-1 and TDAS V2.0 were developed by the intelligent diagnostic laboratory of SHUTCM.
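To illustrate how color features in different color spaces relate to one another, the following minimal sketch converts an RGB color to YCrCb and computes a coating ratio in the sense of perAll. It assumes the common full-range ITU-R BT.601 (JPEG) conversion; the formulas actually used by TDAS are not stated in this section, and the function names and pixel counts are hypothetical.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert 8-bit RGB to full-range YCrCb (BT.601/JPEG convention, an assumption)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return y, cr, cb


def per_all(coating_pixels, tongue_pixels):
    """Ratio of tongue-coating area to the entire tongue surface (perAll)."""
    if tongue_pixels <= 0:
        raise ValueError("tongue_pixels must be positive")
    return coating_pixels / tongue_pixels


# Hypothetical mean tongue-body color and pixel counts, for illustration only.
y, cr, cb = rgb_to_ycrcb(162, 100, 105)
ratio = per_all(coating_pixels=4140, tongue_pixels=10000)
print(round(y, 1), round(cr, 1), round(cb, 1), ratio)
```

A pure white pixel maps to Y = 255 with Cr = Cb = 128 under this convention, which is a quick sanity check on the coefficients.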
Handling Missing and Abnormal Values
Missing data may weaken the representativeness of the samples and complicate the analysis[17]. To ensure the reliability of the experiment, samples with missing values were deleted directly. Outliers were identified with the Tukey method[18]: a value X is an outlier if X > Q3 + 1.5 × IQR (the upper whisker) or X < Q1 - 1.5 × IQR (the lower whisker), where X denotes the original value of the feature, Q3 denotes the upper quartile, Q1 denotes the lower quartile, and IQR denotes the difference between Q3 and Q1. A sample containing two or more outliers was deleted directly; the few remaining abnormal values were replaced with the mean value of the feature.
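The outlier rule can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function name `tukey_clean` is hypothetical, the quartiles are computed once on the full data, and the replacement value is assumed to be the feature mean over the retained, non-outlying values.

```python
import numpy as np


def tukey_clean(X, k=1.5, max_outliers=1):
    """Drop rows with more than `max_outliers` Tukey outliers; mean-impute the rest."""
    X = np.asarray(X, dtype=float)
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    lower = q1 - k * iqr  # lower whisker
    upper = q3 + k * iqr  # upper whisker
    is_outlier = (X < lower) | (X > upper)

    # Samples with two or more outlying features are removed outright.
    keep = is_outlier.sum(axis=1) <= max_outliers
    X_kept, out_kept = X[keep], is_outlier[keep]

    # Assumption: replace with the feature mean over non-outlying kept values.
    masked = np.where(out_kept, np.nan, X_kept)
    feature_mean = np.nanmean(masked, axis=0)
    X_clean = np.where(out_kept, feature_mean, X_kept)
    return X_clean, keep
```

The returned boolean mask `keep` can be applied to the target vectors (FPG, HbA1c) so that features and labels stay aligned.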
Data Scaling and Normalization
Many machine learning models work best when the feature values are on a similar scale and approximately normally distributed. Each feature was therefore standardized as X' = (X - X̄) / Std,
where X denotes the original value of the feature, X̄ denotes the mean value of the feature, and Std denotes the standard deviation of the feature. Normalization is applied to each observation so that the values in a row have a unit norm.
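The two operations named in this section can be sketched as follows. The sketch assumes the population standard deviation and the Euclidean (L2) norm, neither of which is specified in the text; the function names and the sample rows are hypothetical.

```python
import numpy as np


def standardize(X):
    """Column-wise z-score: (X - mean) / Std for each feature."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant features map to zero
    return (X - mean) / std


def normalize_rows(X):
    """Scale each observation (row) to unit Euclidean norm."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)  # leave all-zero rows unchanged
    return X / norms


# Hypothetical rows of (Age, Weight, SBP), for illustration only.
X = np.array([[38.0, 64.0, 119.0],
              [55.0, 71.0, 133.0],
              [47.0, 73.0, 131.0]])
Z = standardize(X)
U = normalize_rows(X)
```

In a train/test workflow, the mean and standard deviation would normally be estimated on the training set and reused for the test set.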
Variable Definition
The hyperglycemia group was defined by the following thresholds: FPG > 6.0 mmol/L (108 mg/dL)[19]; HbA1c >= 38.801 mmol/mol (5.7%)[20].
Basic features included in this study were Age, Weight, systolic blood pressure (SBP) and diastolic blood pressure (DBP). Age was recorded in years, Weight in kg, and SBP and DBP in mmHg.
Blood features included in this study were white blood cell (WBC), red blood cell (RBC), hemoglobin (HGB), total cholesterol (TCHO), triglyceride (TG), low density lipoprotein (LDL), high density lipoprotein (HDL), and alanine aminotransferase (ALT). WBC was recorded in 10^9/L, RBC in 10^12/L, HGB in g/L, TCHO, TG, LDL and HDL in mmol/L, and ALT in U/L.
Tongue features included in this study were perAll, perPart, TC-CON, TC-ASM, TC-ENT, TC-MEAN, TC-R, TC-G, TC-B, TC-L, TC-a, TC-b, TC-H, TC-I, TC-S, TC-Y, TC-Cr, TC-Cb, TB-CON, TB-ASM, TB-ENT, TB-MEAN, TB-R, TB-G, TB-B, TB-L, TB-a, TB-b, TB-H, TB-I, TB-S, TB-Y, TB-Cr, TB-Cb.
Tongue features included in the machine learning model were TC-L, TC-a, TC-b, TB-L, TB-a, TB-b.
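The paired units in the thresholds above follow from standard conversions, sketched here. The glucose factor of 18 mg/dL per mmol/L is the usual rounded value, and the HbA1c conversion uses the IFCC-NGSP master equation; the function names are hypothetical.

```python
def fpg_mmol_to_mgdl(fpg_mmol_l):
    """Glucose: 1 mmol/L is about 18 mg/dL (rounded molar-mass factor)."""
    return fpg_mmol_l * 18.0


def hba1c_percent_to_mmol_mol(hba1c_percent):
    """NGSP (%) to IFCC (mmol/mol): IFCC = 10.93 * NGSP - 23.50."""
    return 10.93 * hba1c_percent - 23.50


print(fpg_mmol_to_mgdl(6.0))
print(round(hba1c_percent_to_mmol_mol(5.7), 3))
```

With these conversions, 6.0 mmol/L corresponds to 108 mg/dL and 5.7% to 38.801 mmol/mol, matching the values quoted in the definition.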
Non-invasive features were the fusion of tongue features and basic features.
Full features were the fusion of tongue features, basic features and blood features.
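The feature groups defined in this section can be written down as simple lists, which makes the two fused sets explicit. The variable names and the dictionary layout are illustrative choices, not part of the original pipeline.

```python
BASIC = ["Age", "Weight", "SBP", "DBP"]
BLOOD = ["WBC", "RBC", "HGB", "TCHO", "TG", "LDL", "HDL", "ALT"]
# Tongue features that enter the machine learning model (Lab color space only).
TONGUE = ["TC-L", "TC-a", "TC-b", "TB-L", "TB-a", "TB-b"]

FEATURE_SETS = {
    "basic": BASIC,
    "blood": BLOOD,
    "tongue": TONGUE,
    "non_invasive": TONGUE + BASIC,   # tongue + basic
    "full": TONGUE + BASIC + BLOOD,   # tongue + basic + blood
}


def select_features(record, feature_set):
    """Return the values of one subject (a dict) in the order of the chosen set."""
    return [record[name] for name in FEATURE_SETS[feature_set]]
```

Under these definitions the non-invasive set has 10 features and the full set has 18.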
Statistical Analysis
Normally distributed variables were presented as mean ± standard deviation. Skewed variables were presented as the median with the 25% and 75% quartiles. Continuous data were tested for normality using the D’Agostino-Pearson test and for homogeneity of variance using Bartlett’s test. The two-sample t-test, the separate-variance (Welch's) t-test and the Wilcoxon rank-sum test were used to compare continuous variables between the normal group and the hyperglycemia group. Correlations were computed using Pearson's and Spearman’s methods. All tests were two-sided and statistical significance was assumed at P < 0.05. All analyses were performed with Python software, version 3.7.4.
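A minimal sketch of how these tests could be chained with SciPy is shown below. The decision rule (normality first, then equal variance) is an assumption about how the tests were combined, since the text lists the tests but not the rule; the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b, alpha=0.05):
    """Pick a two-sample test from normality and equal-variance checks."""
    normal = (stats.normaltest(a).pvalue >= alpha
              and stats.normaltest(b).pvalue >= alpha)
    if not normal:
        return "wilcoxon_rank_sum", stats.ranksums(a, b).pvalue
    equal_var = stats.bartlett(a, b).pvalue >= alpha
    name = "t_test" if equal_var else "welch_t_test"
    return name, stats.ttest_ind(a, b, equal_var=equal_var).pvalue


def correlate(x, y, alpha=0.05):
    """Pearson if both variables look normal, otherwise Spearman."""
    normal = (stats.normaltest(x).pvalue >= alpha
              and stats.normaltest(y).pvalue >= alpha)
    if normal:
        r, p = stats.pearsonr(x, y)
        return "pearson", r, p
    rho, p = stats.spearmanr(x, y)
    return "spearman", rho, p
```

`stats.normaltest` implements the D'Agostino-Pearson omnibus test, and `ttest_ind(..., equal_var=False)` gives the separate-variance t-test.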
Table 1 Statistical analysis was used to compare the differences between the basic and blood features of the normal and hyperglycemia group, and the correlations between the features and glucose metabolism index were analyzed.
| Feature | Normal (N=1546) | Hyperglycemia (N=634) | p value^a | CC^a (FPG) | p value^b | CC^b (HbA1c) | p value^c |
|---|---|---|---|---|---|---|---|
| Age (Year) | 38.0(30.0-47.0) | 55.0(46.0-63.75) | <0.001 | 0.373 | <0.001 | 0.536 | <0.001 |
| Weight (Kg) | 64.0(55.0-73.0) | 71.0(63.0-78.0) | <0.001 | 0.257 | <0.001 | 0.25 | <0.001 |
| SBP (mmHg) | 119.0(109.0-131.0) | 133.0(121.25-145.0) | <0.001 | 0.397 | <0.001 | 0.354 | <0.001 |
| DBP (mmHg) | 75.0(68.0-83.0) | 80.0(73.0-88.0) | <0.001 | 0.264 | <0.001 | 0.224 | <0.001 |
| WBC (10^9/L) | 5.8(5.1-6.8) | 6.3(5.4-7.4) | <0.001 | 0.1 | <0.001 | 0.158 | <0.001 |
| RBC (10^12/L) | 4.76(4.45-5.13) | 4.86(4.57-5.14) | <0.001 | 0.11 | <0.001 | 0.078 | <0.001 |
| HGB (g/L) | 145.0(134.0-156.0) | 150.0(140.0-157.0) | <0.001 | 0.163 | <0.001 | 0.075 | <0.001 |
| TCHO (mmol/L) | 4.875(4.33-5.42) | 5.2(4.62-5.868) | <0.001 | 0.113 | <0.001 | 0.239 | <0.001 |
| TG (mmol/L) | 1.04(0.75-1.487) | 1.46(1.07-2.09) | <0.001 | 0.293 | <0.001 | 0.333 | <0.001 |
| HDL (mmol/L) | 1.37(1.17-1.578) | 1.2(1.06-1.42) | <0.001 | -0.267 | <0.001 | -0.228 | <0.001 |
| LDL (mmol/L) | 2.82(2.41-3.32) | 3.14(2.63-3.695) | <0.001 | 0.134 | <0.001 | 0.246 | <0.001 |
| ALT (U/L) | 17.0(12.0-24.0) | 21.0(16.0-29.0) | <0.001 | 0.212 | <0.001 | 0.211 | <0.001 |

p value^a: Significance level for the difference between the normal and the hyperglycemia group.
CC^a: Correlation coefficient between features and FPG.
p value^b: Significance level of correlation coefficient between features and FPG.
CC^b: Correlation coefficient between features and HbA1c.
p value^c: Significance level of correlation coefficient between features and HbA1c.
Table 2 Statistical analysis was used to compare the differences between the tongue features of the normal and hyperglycemia group, and the correlations between the tongue features and the glucose metabolism index were analyzed.
| Feature | Normal (N=1546) | Hyperglycemia (N=634) | p value^a | CC^a (FPG) | p value^b | CC^b (HbA1c) | p value^c |
|---|---|---|---|---|---|---|---|
| perAll | 0.414(0.314-0.563) | 0.454(0.354-0.792) | <0.001 | 0.302 | <0.001 | 0.137 | <0.001 |
| perPart | 1.152(1.067-1.289) | 1.1(1.023-1.252) | <0.001 | -0.208 | <0.001 | -0.144 | <0.001 |
| TB-CON | 60.946(43.702-83.004) | 62.968(43.366-86.508) | 0.675 | -0.074 | 0.001 | 0.027 | 0.214 |
| TB-ASM | 0.085(0.072-0.101) | 0.084(0.071-0.102) | 0.606 | 0.068 | 0.002 | -0.028 | 0.198 |
| TB-ENT | 1.173(1.098-1.244) | 1.179(1.092-1.254) | 0.759 | -0.078 | <0.001 | 0.024 | 0.271 |
| TB-MEAN | 0.024(0.02-0.028) | 0.024(0.02-0.028) | 0.655 | -0.07 | 0.001 | 0.028 | 0.195 |
| TC-CON | 75.281(49.555-102.545) | 85.722(59.51-114.987) | <0.001 | 0.058 | 0.007 | 0.138 | <0.001 |
| TC-ASM | 0.073(0.062-0.092) | 0.068(0.058-0.084) | <0.001 | -0.056 | 0.009 | -0.133 | <0.001 |
| TC-ENT | 1.226(1.131-1.297) | 1.256(1.172-1.324) | <0.001 | 0.06 | 0.005 | 0.138 | <0.001 |
| TC-MEAN | 0.027(0.022-0.031) | 0.028(0.024-0.033) | <0.001 | 0.056 | 0.009 | 0.135 | <0.001 |
| TB-R | 162.0(154.0-168.0) | 157.0(149.0-164.0) | <0.001 | -0.153 | <0.001 | -0.191 | <0.001 |
| TB-G | 100.0(93.0-108.0) | 97.0(89.25-106.0) | <0.001 | -0.014 | 0.517 | -0.127 | <0.001 |
| TB-B | 105.0(98.0-115.0) | 105.0(97.0-116.0) | 0.86 | 0.145 | <0.001 | -0.018 | 0.394 |
| TC-R | 152.102±15.38 | 150.278±16.516 | 0.017 | -0.051 | 0.016 | -0.071 | 0.001 |
| TC-G | 110.0(101.0-120.0) | 111.0(101.0-123.0) | 0.141 | 0.101 | <0.001 | 0.019 | 0.381 |
| TC-B | 114.0(104.0-125.0) | 117.0(106.0-134.0) | <0.001 | 0.2 | <0.001 | 0.088 | <0.001 |
| TB-L | 104.988(102.774-107.829) | 104.07(101.362-106.988) | <0.001 | -0.038 | 0.076 | -0.138 | <0.001 |
| TB-a | 21.372(19.351-23.381) | 21.824(19.66-23.626) | 0.024 | -0.069 | 0.001 | 0.037 | 0.087 |
| TB-b | 5.182(2.674-6.671) | 4.46(-2.886-6.318) | <0.001 | -0.276 | <0.001 | -0.135 | <0.001 |
| TC-L | 107.233±5.185 | 107.52±5.421 | 0.247 | 0.077 | <0.001 | 0.006 | 0.795 |
| TC-a | 14.393±2.651 | 13.917±2.762 | <0.001 | -0.198 | <0.001 | -0.081 | <0.001 |
| TC-b | 3.527(1.179-5.206) | 2.925(-3.833-4.75) | <0.001 | -0.251 | <0.001 | -0.13 | <0.001 |
| TB-H | 356.037(352.221-358.425) | 354.791(339.386-357.969) | <0.001 | -0.24 | <0.001 | -0.132 | <0.001 |
| TB-I | 121.0(115.0-130.0) | 119.0(112.0-127.0) | <0.001 | -0.014 | 0.504 | -0.122 | <0.001 |
| TB-S | 0.181±0.025 | 0.189±0.031 | <0.001 | 0.023 | 0.287 | 0.113 | <0.001 |
| TC-H | 356.62(350.777-360.0) | 354.924(330.0-360.0) | <0.001 | -0.212 | <0.001 | -0.125 | <0.001 |
| TC-I | 125.0(116.0-135.0) | 126.0(117.0-138.0) | 0.088 | 0.096 | <0.001 | 0.021 | 0.331 |
| TC-S | 0.125±0.024 | 0.124±0.025 | 0.698 | -0.104 | <0.001 | -0.015 | 0.492 |
| TB-Y | 117.672(112.442-124.673) | 115.605(109.439-122.641) | <0.001 | -0.043 | 0.043 | -0.14 | <0.001 |
| TB-Cr | 154.292(151.371-156.631) | 153.123(149.379-155.681) | <0.001 | -0.278 | <0.001 | -0.147 | <0.001 |
| TB-Cb | 120.853(119.551-123.211) | 121.596(119.981-128.45) | <0.001 | 0.282 | <0.001 | 0.147 | <0.001 |
| TC-Y | 121.972(113.952-130.15) | 121.851(113.796-132.073) | 0.409 | 0.071 | 0.001 | 0.002 | 0.908 |
| TC-Cr | 146.131(143.219-148.418) | 145.058(140.772-147.693) | <0.001 | -0.286 | <0.001 | -0.145 | <0.001 |
| TC-Cb | 122.801(121.473-125.023) | 123.394(121.764-130.292) | <0.001 | 0.252 | <0.001 | 0.132 | <0.001 |

p value^a: Significance level for the difference between the normal and hyperglycemia group.
CC^a: Correlation coefficient between features and FPG.
p value^b: Significance level of correlation coefficient between features and FPG.
CC^b: Correlation coefficient between features and HbA1c.
p value^c: Significance level of correlation coefficient between features and HbA1c.
Data Partition
We divided the data set into a training set and a test set at a ratio of 8:2. The training set consisted of 80% of the original database (1744 individuals), and the test set consisted of the remaining 20% (436 individuals). We performed 5-fold cross-validation on the training set for model selection, i.e., in each round we trained on 80% of the training set and held out the remaining 20% as the validation set, rotating five times. The test set did not participate in the training process, and the actual performance of the model was checked on the test set.
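The partition sizes can be reproduced with a short NumPy sketch. The shuffling seed and the function names are assumptions for illustration; the original split procedure is not described beyond the ratios.

```python
import numpy as np


def train_test_indices(n_samples, test_ratio=0.2, seed=0):
    """Shuffle sample indices and hold out `test_ratio` of them as the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_ratio))
    return order[n_test:], order[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index arrays; each fold is the validation set once."""
    folds = np.array_split(indices, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val


train_idx, test_idx = train_test_indices(2180)
fold_sizes = [(len(tr), len(va)) for tr, va in k_fold_indices(train_idx)]
print(len(train_idx), len(test_idx), fold_sizes)
```

With 2,180 subjects this gives 1,744 training and 436 test indices, and each of the five folds validates on roughly one fifth of the training set.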
Feature Selection
In this paper, we selected Age, Weight, BP, RBC, HGB, WBC, TCHO, TG, HDL, LDL, ALT, TB-L, TB-a, TB-b, TC-L, TC-a, and TC-b according to research practice. Age is an important risk factor for the diagnosis, development and prognosis of type 2 diabetes; however, HbA1c tends to decline with age and increases only after 90 years of age[21]. Weight is highly correlated with the occurrence of diabetes. After the diagnosis of diabetes, weight loss of more than 5% can improve the level of HbA1c and reduce the risk of cardiovascular disease within 10 years[22]. There is no causal relationship between hypertension and diabetes, but diabetes often leads to increased blood pressure[23]. Hyperglycemia can affect blood indicators, which are recognized risk factors for complications. Because diabetes is a chronic inflammatory state, pro-inflammatory factors can promote the differentiation and maturation of white blood cells. Diabetes changes the surface charge and aggregation of red blood cells, which reduces the number of red blood cells and the hemoglobin content[24]. Fat accumulation in the body leaves excessive saturated fatty acids in the cells, and these saturated fatty acids can be toxic to liver and pancreatic islet cells. Therefore, it is very important to evaluate lipid toxicity by measuring TCHO, TG, HDL, LDL and ALT[25].
Tongue diagnosis has been widely used in the diagnosis of diabetes in traditional Chinese medicine. In clinical practice, 80% of diabetic patients do not show typical symptoms, while their tongue signs have changed accordingly. For example, the tongue of diabetic patients is often red and yellow[26]. Studies have shown that yellow tongue coating is associated with a high incidence of diabetes[9]. Given the above research, we chose the related features to train the machine learning regression models.
Model Construction
In our study, we used multiple supervised learning models for regression of the subjects' FPG and HbA1c. We used not only the popular artificial neural network (ANN) model, but also three ensemble models: Gradient Boosting Decision Tree (GBDT), Random Forest (RF) and eXtreme Gradient Boosting Tree (XGBT).
A simple neural network generally includes three layers: an input layer, a hidden layer, and an output layer. The neurons are connected to each other. The data enter the neural network at the input layer and propagate forward to the output layer to produce the prediction. Each neuron input has a corresponding weight that controls the strength of that input[27].
RF[28] is an ensemble tree algorithm based on the idea of bagging. RF builds each base decision tree on a different bootstrap sample of the data, and each node is split using the best among a subset of predictors randomly chosen at that node. This double randomness in feature selection and sample selection can not only improve the generalization ability of the model, but also help avoid overfitting.
GBDT and XGBT[29] have been widely used in a number of data mining and machine learning challenges. They are tree ensemble models which use K additive functions to predict the output. In addition, XGBT controls the complexity of the model by adding regularization terms to the objective function to avoid overfitting.
We constructed models for the prediction of FPG and HbA1c with machine learning algorithms (Fig. 5). To study the contribution of the various types of features in modeling, we combined different types of features, including basic features, blood features and tongue features.
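The forward propagation described above can be sketched in a few lines of NumPy. The layer sizes, the ReLU activation and the random weights are assumptions chosen for illustration; they are not the configuration used in the study.

```python
import numpy as np


def forward(x, W1, b1, W2, b2):
    """One hidden layer with ReLU activation and a linear output for regression."""
    hidden = np.maximum(0.0, x @ W1 + b1)  # weighted inputs to the hidden neurons
    return hidden @ W2 + b2                # weighted sum at the output neuron


rng = np.random.default_rng(0)
n_features, n_hidden = 6, 8  # hypothetical sizes
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

x = rng.normal(size=(4, n_features))  # four synthetic samples
y_hat = forward(x, W1, b1, W2, b2)
```

Training adjusts `W1`, `b1`, `W2` and `b2` so that `y_hat` approaches the measured target values.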
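As a rough sketch of how several regressors can be compared under 5-fold cross-validation, the example below uses scikit-learn on synthetic data. The hyperparameters and the data are placeholders, not the study's settings, and XGBT is omitted because it lives in the separate `xgboost` package (its `XGBRegressor` could be added to the dictionary in the same way).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # synthetic stand-in for six tongue features
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

models = {
    "ANN": MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    "RF": RandomForestRegressor(n_estimators=50, random_state=0),
    "GBDT": GradientBoostingRegressor(random_state=0),
}

cv_mse = {}
for name, model in models.items():
    # scikit-learn reports negated MSE so that larger is better; flip the sign.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[name] = -scores.mean()
```

Repeating the loop with different column subsets (basic, blood, tongue, and their combinations) is one way to study the contribution of each feature type.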
Fusion Strategy
Model blending based on machine learning is a very common fusion technique[30]. The idea of model fusion is to use several independent models to compute initial predictions, and then to combine these initial predictions to achieve a better final prediction. Four independent models were fitted on the training set and used to predict the validation set and the test set. The predictions on the validation set were combined into a new training matrix, and the predictions on the test set were combined into a new test matrix. A second-layer linear regression was fitted on the new training matrix and used to predict the new test matrix, which gave the final test result (Fig. 6). The stacking and generalization steps of the blending approach use different data to avoid the leakage of learning information.
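The two-layer blending procedure can be sketched with scikit-learn as follows. For brevity the sketch uses two base models instead of four, synthetic data, and a single hold-out validation split; all of these are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))  # synthetic features
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

X_trval, X_test, y_trval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trval, y_trval, test_size=0.2, random_state=0)

base_models = [
    RandomForestRegressor(n_estimators=50, random_state=0),
    GradientBoostingRegressor(random_state=0),
]

# Layer 1: fit on the training part, then predict the validation and test parts.
for model in base_models:
    model.fit(X_train, y_train)
meta_train = np.column_stack([m.predict(X_val) for m in base_models])
meta_test = np.column_stack([m.predict(X_test) for m in base_models])

# Layer 2: linear regression on the validation predictions gives the final output.
blender = LinearRegression().fit(meta_train, y_val)
final_pred = blender.predict(meta_test)
```

Because the second layer is fitted only on predictions for data the base models did not see during training, the base models' training error does not leak into the blender.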
Evaluation Criteria
Mean Squared Error (MSE) and the coefficient of determination (R-squared) were used to evaluate the performance of our prediction models. MSE is simple to calculate, has a clear meaning, and is a commonly used evaluation index in statistical analysis[31]. We should consider not only the accuracy of the prediction model, but also its influence on clinical decisions. Consequently, Clarke's Error Grid Analysis (EGA)[32, 33] was used to determine whether the error of the predicted FPG relative to the actual value was clinically acceptable. The more values that appear in Zones A and B, the more accurate the model is in terms of clinical utility[34]. A scatter plot was used to assess the performance of the HbA1c prediction model[35]. The closer the slope of the fitted line is to 1 and the closer the intercept is to 0, the better the model performance.
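The numeric criteria can be sketched with NumPy as below: MSE, R-squared, and the slope and intercept of the least-squares line through the predicted-versus-actual scatter. The function names and sample values are hypothetical, and the error grid analysis is not reproduced here.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def fitted_line(y_true, y_pred):
    """Least-squares slope and intercept of predicted vs. actual values."""
    slope, intercept = np.polyfit(y_true, y_pred, deg=1)
    return float(slope), float(intercept)


actual = np.array([5.0, 6.0, 7.0, 8.0])     # hypothetical measured values
predicted = np.array([5.2, 5.9, 7.1, 7.8])  # hypothetical model outputs
print(mse(actual, predicted), r_squared(actual, predicted),
      fitted_line(actual, predicted))
```

A perfect model would give an MSE of 0, an R-squared of 1, a slope of 1 and an intercept of 0.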