A Study On Fatigue Classification By Logistic Regression Method: Based On Data of Tongue And Pulse

Purpose: Fatigue is a subjective symptom that is hard to quantify. It is prevalent in both sub-health and disease populations, yet there is still no accurate and stable method to distinguish disease fatigue from sub-health fatigue. Tongue and pulse diagnoses reflect the overall state of the body, and modern research on tongue and pulse diagnoses has made great progress. This study aims to explore the distribution rules of tongue and pulse data in disease fatigue and sub-health fatigue populations, and to evaluate the contribution rate of tongue and pulse data to fatigue diagnosis through modeling. Methods: The Tongue and Face Diagnosis Analysis-1 instrument and the Pulse Diagnosis Analysis-1 instrument were used to collect tongue images and pulse sphygmograms from the sub-health fatigue group (n=252) and the disease fatigue group (n=1160). We analyzed the tongue and pulse characteristics and constructed classification models based on the logistic regression method. Results: Sub-health fatigue people and disease fatigue people had different tongue and pulse characteristics, and the logistic regression model based on tongue and pulse data showed a good classification effect. The accuracies of the healthy controls & sub-health fatigue, sub-health fatigue & disease fatigue, and healthy controls & disease fatigue models were 65.70%, 65.10%, and 78.90%, and the AUCs were 0.678, 0.834, and 0.879, respectively. Conclusion: This study provides a new non-invasive method for fatigue diagnosis from the perspective of objective tongue and pulse data, and modern tongue and pulse diagnoses have good application prospects.


Introduction
Fatigue refers to physical tiredness with lack of energy, or mental exhaustion with lack of concentration. It can be divided into physical fatigue and mental fatigue [20]. Fatigue is the leading cause of sub-health and one of the most common symptoms in primary care, and it is experienced by many patients with chronic hepatitis [6,22], depression [3], and various types of cancer [4]. Sub-health and a wide variety of diseases are associated with different degrees of fatigue, which negatively affects people's lives. With the improvement of general medical care and living standards, people are increasingly aware of fatigue. However, owing to the lack of objective diagnostic evidence, there is still no reliable and stable evaluation method to distinguish disease fatigue from sub-health fatigue.
A large number of clinical practices and studies have shown that the tongue and pulse can reflect the overall state of the body [24]. Intelligent diagnosis in traditional Chinese medicine (TCM) is a new research field of recent years, in line with the trend of TCM diagnostic methods developing gradually towards intelligence and potential clinical application [19,10]. In recent research on tongue and pulse diagnoses, new diagnosis systems have been adopted to collect and analyze disease-related clinical data, and machine learning methods such as Artificial Neural Network [16,23], Support Vector Machine (SVM) [26,9] and k-nearest neighbors (KNN) [25] have been used to establish corresponding diagnosis models, which can effectively assist doctors in diagnosing disease. A growing number of studies address fatigue on the basis of tongue and pulse diagnoses [15,21,17].
Based on modern research on tongue and pulse diagnoses, this study aims to explore the distribution rules of tongue and pulse data in disease fatigue and sub-health fatigue, and to evaluate the contribution rate of these data to fatigue diagnosis through modeling, so as to provide a new reference for convenient and non-invasive methods of fatigue diagnosis. If an objective evaluation method based on tongue and pulse data can be established, it will play an important role in the clinical diagnosis of fatigue.

Study design
A total of 7,025 subjects were recruited from January 2015 to December 2018 at the medical examination center of Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, and their Western medicine physical examination indexes and TCM tongue and pulse data were collected. From the 7,025 subjects, healthy controls (n = 799), a sub-health fatigue group (n = 361) and a disease fatigue group (n = 1529) were identified. After excluding outliers with extreme values in tongue or pulse data, there were 551, 252, and 1,160 subjects in the healthy controls, sub-health fatigue group and disease fatigue group, respectively. The overall flow diagram of the study is shown in Fig. 1.

Diagnostic criteria
Health and sub-health state of each individual were determined using the Health Status Assessment Questionnaire H20 Scale [8]. The diagnostic criteria for the diseases were as follows:
Diabetes: fasting blood glucose ≥ 7.0 mmol/L and/or blood glucose at any point ≥ 7.8 mmol/L and/or blood glucose at two hours after meal ≥ 11.1 mmol/L
Hypertension [18]: systolic blood pressure ≥ 140 mmHg and/or diastolic blood pressure ≥ 90 mmHg
Hyperlipidemia [2]: TC ≥ 6.2 mmol/L and/or LDL-C ≥ 4.1 mmol/L and/or HDL-C ≥ 4.9 mmol/L and/or TG ≥ 2.3 mmol/L and/or non-HDL-C ≥ 1.55 mmol/L
Fatty liver disease [5]: ultrasound examination
Disease was diagnosed by four well-trained clinicians according to the above diagnostic criteria of Western medicine. After excluding the disease population, subjects with a score between 60 and 79 on the H20 scale formed the sub-health population, and subjects with a score between 80 and 100 formed the healthy controls. Finally, the Information Record Form of Four Diagnosis of TCM and the H20 scale were used to select the fatigue population.
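The grouping rule described above can be sketched in code. The following is a minimal illustration only, not the procedure actually used by the clinicians: the function names are hypothetical, only the hypertension and fasting-glucose criteria are encoded, and the label "unclassified" for scores below 60 is an assumption, since the text does not say how such subjects were handled.

```python
def has_listed_disease(sbp, dbp, fasting_glucose):
    """Check two of the quoted criteria (hypothetical helper).

    sbp, dbp: systolic/diastolic blood pressure in mmHg.
    fasting_glucose: fasting blood glucose in mmol/L.
    """
    hypertension = sbp >= 140 or dbp >= 90
    diabetes = fasting_glucose >= 7.0
    return hypertension or diabetes


def health_status(h20_score, sbp, dbp, fasting_glucose):
    """Assign a health status from the H20 score after excluding disease."""
    if has_listed_disease(sbp, dbp, fasting_glucose):
        return "disease"
    if 80 <= h20_score <= 100:
        return "healthy"
    if 60 <= h20_score < 80:
        return "sub-health"
    # Assumption: the source does not define a group for scores below 60.
    return "unclassified"


print(health_status(85, 120, 80, 5.0))
print(health_status(70, 120, 80, 5.0))
print(health_status(90, 150, 80, 5.0))
```

A complete version would also need the remaining glucose, lipid and ultrasound criteria, plus the fatigue screening step based on the Information Record Form.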

Tongue diagnosis and Pulse diagnosis Instruments
The TFDA-1 tongue and face diagnosis instrument [13] and the PDA-1 pulse diagnosis instrument [11], shown in Fig. 2 and Fig. 3, were used for data collection. The tongue image indexes were derived from the RGB, HSI, Lab and YCrCb color spaces. The prefix TB denotes a tongue body index, and TC denotes a tongue coating index. Each pulse index has its own meaning [19].
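To illustrate how a color-space index such as TB-Y, TB-Cb or TB-Cr might be obtained from RGB pixels, the sketch below converts pixels to YCbCr and averages them over a region. It assumes the full-range JPEG (JFIF) form of the BT.601 conversion and a simple regional mean; the source does not state which conversion or aggregation the TFDA-1 instrument uses, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF / BT.601)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def mean_ycbcr(pixels):
    """Average Y, Cb and Cr over a list of (r, g, b) pixels.

    Assumption: a region index is the plain mean over the region's pixels.
    """
    converted = [rgb_to_ycbcr(r, g, b) for r, g, b in pixels]
    n = len(converted)
    return tuple(sum(c[i] for c in converted) / n for i in range(3))


# Hypothetical tongue-body region made of three reddish pixels.
region = [(200, 90, 100), (190, 85, 95), (210, 100, 110)]
y_mean, cb_mean, cr_mean = mean_ycbcr(region)
print(round(y_mean, 2), round(cb_mean, 2), round(cr_mean, 2))
```

A redder region yields a higher mean Cr, which is the kind of difference an index such as TB-Cr would capture between groups.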

Data Analysis
SPSS (Version 23.0) software was used for statistical analysis of the data. Normally distributed measurement data were expressed as "X ± S" (mean ± standard deviation). Non-normally distributed data were expressed as "median (upper quartile, lower quartile)". Analysis of Variance (ANOVA) was performed for data satisfying normality and homogeneity of variance among groups, the Kruskal-Wallis H test was performed for non-normally distributed data, and GraphPad Prism Version 8.0 was used to draw violin plots. All tests were two-tailed with a test level of α = 0.05, and a difference was considered statistically significant when P < 0.05.
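For readers unfamiliar with the Kruskal-Wallis H test, the sketch below computes the tie-corrected H statistic in plain Python. It is an illustration of the test itself, not of the SPSS procedure used in the study; the function names and the toy data are hypothetical. The p-value helper relies on the fact that the chi-square survival function with 2 degrees of freedom equals exp(-H/2), so it applies only to a three-group comparison.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties get the mean rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h / correction


def p_value_three_groups(h):
    """Chi-square approximation with df = 2 (valid for three groups only)."""
    return math.exp(-h / 2.0)


# Toy data for three hypothetical groups, not values from the study.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat, p_value_three_groups(h_stat))
```

With more or fewer than three groups, the p-value would need a general chi-square distribution with k - 1 degrees of freedom, for example from a statistics library.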

Modeling
Logistic regression analysis was performed on factors found to be statistically significant by ANOVA or the rank sum test. Logistic regression is often used in data mining, automatic disease diagnosis, economic prediction and other fields, and the accuracy of its decisions can be improved by adjusting the parameters of the regression model [1,27]. Accuracy, sensitivity and specificity were used to evaluate the performance of the models. The area under the receiver operating characteristic (ROC) curve (AUC) was also used; it generally takes a value between 0.5 and 1, and the larger the value, the better the classification. Accuracy is the most common evaluation index: the ratio of the number of samples correctly classified by the model to the total number of samples. The higher the accuracy, the better the performance of the classifier. Sensitivity is the true positive rate, that is, the percentage of people with the disease who are correctly diagnosed. Specificity, also known as the true negative rate, reflects the ability of a test to identify non-patients.
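Accuracy, sensitivity, and specificity were defined as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}$$
In the above formulas, TP, TN, FP and FN represent the numbers of true positives, true negatives, false positives and false negatives, respectively.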
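As a worked illustration of these evaluation indexes, the sketch below computes accuracy, sensitivity and specificity from confusion-matrix counts, and the AUC as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The function names, counts and scores are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and predicted probabilities.
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(metrics)
print(auc)
```

The pairwise form of the AUC is quadratic in the sample size; it is used here for clarity, and a rank-based computation would be preferable for large groups.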

General Result
The diseases in the disease fatigue group mainly included hypertension, diabetes, hyperlipidemia and fatty liver; their distribution is shown in Fig. 4. Table 2 shows the general results of the healthy controls, the sub-health fatigue group and the disease fatigue group. Table 3 shows the statistical analysis of the distribution of the characteristic parameters of tongue body and tongue coating among the three groups. Table 4 shows the statistical analysis of the distribution of pulse characteristic parameters among the three groups; in the tables, # denotes P < 0.05 and ## denotes P < 0.01 vs. the sub-health fatigue group. Figure 6 shows the violin plots of selected pulse characteristic parameters with statistical significance.

Statistical Analysis of Pulse indexes
The main results for the pulse feature parameters were as follows. t1, t2, t3, t4, h1, h4, h5, w1, w2, w1/t, w2/t, h1/t1, h3/h1, As and Ad differed significantly between the disease fatigue group and the healthy controls (P < 0.05, P < 0.01). t4 differed significantly between the sub-health fatigue group and the healthy controls (P < 0.05). t1, h1, h4, h5, h1/t1, Ad, w1, w2, w1/t and w2/t differed significantly between the sub-health fatigue group and the disease fatigue group (P < 0.05, P < 0.01). The main characteristic of the results was that the sub-health fatigue and disease fatigue groups showed a gradually increasing tendency in each parameter compared with the healthy controls, reflecting a consistent tendency in the pulse changes of the two fatigue groups. In addition, the changes in pulse features were more significant in the disease fatigue group than in the sub-health fatigue group.

Results of Modeling and Model Evaluation
Logistic regression was used to establish classification models based on the tongue and pulse data of subjects from the three groups (250 healthy controls, 242 in the sub-health fatigue group and 215 in the disease fatigue group) who had complete Body Mass Index (BMI) data. First, multiple logistic regression analysis was used to classify the three groups based on tongue data, pulse data and BMI data; Table 5 shows the classification result. Binary logistic regression analysis was then performed; Table 6 shows the results of the classification models between the disease fatigue group, the sub-health fatigue group and the healthy controls. The ROC curves are shown in Fig. 7.
In addition, the classification model was reconstructed by adding BMI data to the tongue and pulse data. Table 7 shows the classification result of the reconstructed model. The ROC curves are shown in Fig. 8.
The results showed that objective tongue and pulse data had a good classification effect on disease fatigue, followed by sub-health fatigue. After adding BMI data, both the model accuracy and the ROC curve improved, except for the sub-health fatigue versus healthy controls model. BMI is a convenient and non-invasive measure, which suggests that BMI could be combined with tongue and pulse data to improve the diagnostic accuracy of fatigue.
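The binary modeling step can be illustrated with a minimal logistic regression fitted by batch gradient descent. This is a sketch under stated assumptions, not the SPSS procedure used in the study: the data are synthetic (two Gaussian features standing in for hypothetical tongue or pulse indexes), the learning rate and number of epochs are arbitrary, and all names are illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each sample."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Synthetic data: feature 0 separates the classes, feature 1 is pure noise.
rng = random.Random(0)
X = [[rng.gauss(-1.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
X += [[rng.gauss(1.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
y = [0] * 200 + [1] * 200

w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
train_accuracy = sum((p >= 0.5) == (yi == 1) for p, yi in zip(probs, y)) / len(y)
print(w, b, train_accuracy)
```

Adding an informative feature, as BMI appears to be for the disease fatigue comparisons, would enlarge the feature vector passed to `fit_logistic`; a feature that does not differ between groups, like the noise feature here, contributes little to accuracy.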

Discussion
In this study, the distribution trends of the objective tongue data differed between the sub-health fatigue population and the disease fatigue population. TB-B, TB-R, TB-G, TC-B, TB-I, TB-Y, TB-L, TB-Cb, TB-Cr, TB-a and TB-a were in ascending order across the sub-health fatigue group, the healthy controls and the disease fatigue group. This indicated that people with disease fatigue generally had a more purple or red-purple tongue body and a more white-greasy tongue coating. The tongue parameters of the sub-health fatigue population were lower than those of the healthy controls, while those of the disease fatigue population were higher; this result might be partly related to the fact that the subjects in this study came from a physical examination center. Certain differences were found in the tongue parameters of the fatigue groups compared with the healthy controls: subjects in the disease fatigue group had a darker tongue body and a more yellow or yellowish-brown tongue coating, which is more associated with sthenia syndrome, whereas subjects in the sub-health fatigue group had a light-colored tongue with white coating, which is more associated with deficiency syndrome. This finding is consistent with the TCM theory that sub-health manifests as decreased vitality, function and adaptability, whereas disease is mostly due to the hyperactivity of pathogenic factors, or to organ dysfunction caused by phlegm, dampness, blood stasis and other pathological products. The result could help to distinguish sub-health fatigue from disease fatigue.
In our study, the pulse analysis of the three groups showed that the fatigue state can directly affect sphygmogram parameters, and that the changes followed a consistent trend. The indexes of the disease fatigue group were more abnormal and differed more significantly from those of the healthy controls, whereas only w2/t differed statistically between the sub-health fatigue group and the healthy controls; several indexes differed significantly between the sub-health fatigue group and the disease fatigue group. In terms of the distribution trend of the pulse indexes, the sub-health fatigue group lay between the healthy controls and the disease fatigue group. Studies have shown that the pulse can directly reflect various cardiovascular functional states. The results of this study indicate, to a certain extent, that patients with disease fatigue had more severe functional decline and other abnormal changes in cardiovascular function, such as left ventricular function, peripheral resistance, great artery compliance, wall elasticity and blood viscosity. Since fatigue in the most serious cases can cause sudden cardiac death, using the sphygmogram to detect fatigue is of great practical value for diagnosing cardiovascular disease and guiding early intervention.
BMI is an index of obesity that is closely related to health state. Studies have shown that BMI combined with circumference level can be used to assess the risk of coronary heart disease in Japanese diabetic patients [14]. In this study, the accuracy of the model for the sub-health fatigue group versus the healthy controls remained unchanged after adding BMI. The reason may be that there was no difference in BMI between the two groups, so its contribution to the accuracy of the model was not significant. However, there were statistically significant differences in BMI between patients with disease fatigue and those with sub-health fatigue; therefore, BMI combined with tongue and pulse data had a positive effect on the modeling.

Conclusion
In this study, we analyzed the characteristics and distribution trends of tongue and pulse data in fatigue and healthy populations, and showed that logistic regression modeling can, to a certain extent, realize the diagnosis of disease fatigue and sub-health fatigue. This provides a non-invasive, data-driven differential diagnosis method for evaluating different fatigue states based on tongue and pulse data.
Figure 1 Overall Flow Diagram
Figure 4 Distribution of main diseases in the group of disease fatigue
Figure 5 Violin Plots of the tongue characteristic parameters of the three groups
Figure 6 Violin Plots of the pulse characteristic parameters of the three groups