Study on the Syndrome Characteristics and Classification Model of Non-Small Cell Lung Cancer Based on Tongue and Pulse Data

Background: Lung cancer is a common malignant tumor that seriously affects people's health. Traditional Chinese medicine (TCM) is one of the effective methods for treating advanced lung cancer, and accurate TCM syndrome differentiation is essential to treatment. When symptoms are not obvious, traditional symptom-based syndrome differentiation cannot be carried out. Syndromes are closely related to the indexes of Western medicine, and combining micro indexes with macro symptoms can effectively assist syndrome differentiation.
Methods: Tongue and pulse data of non-small cell lung cancer (NSCLC) patients with Qi deficiency syndrome (n=163), patients with Yin deficiency syndrome (n=174) and healthy controls (n=185) were collected with the intelligent Tongue and Face Diagnosis Analysis-1 instrument and the Pulse Diagnosis Analysis-1 instrument, respectively. The characteristics of the tongue and pulse data were analyzed, and a correlation analysis was performed on the tongue and pulse data. Four machine learning methods, namely Random Forest, Logistic Regression, Support Vector Machine and Neural Network, were used to establish classification models based on symptoms, tongue & pulse data, and symptoms & tongue & pulse data, respectively.
Results: The tongue diagnosis indexes that differed significantly between Qi deficiency syndrome and Yin deficiency syndrome were TB-a, TB-S, TB-Cr, TC-a, TC-S, TC-Cr, perAll and the tongue coating texture indexes TC-Con, TC-ASM, TC-MEAN, and TC-ENT. The pulse diagnosis indexes that differed significantly were t4 and t5. The classification performance of the models based on the different data sets ranked as follows: model of tongue & pulse data < model of symptom < model of symptom & tongue & pulse data. The Neural Network model had the better classification performance on the symptom & tongue & pulse data, with an area under the ROC curve of 0.9401 and an accuracy of 0.8806.
Conclusions: This study explored the characteristics of tongue and pulse data of NSCLC with Qi deficiency syndrome and Yin deficiency syndrome, and established a syndrome classification model. It was feasible to use tongue and pulse data as one of the objective diagnostic indexes for Qi deficiency syndrome and Yin deficiency syndrome of NSCLC.


Introduction
Lung cancer is a common malignant tumor with some of the highest morbidity and mortality rates in the world. Deaths from lung cancer are estimated to account for about 24% of all cancer deaths in the United States [1,2]. A World Health Organization report shows that lung cancer causes approximately 1.76 million deaths worldwide each year, accounting for 18.7% of all cancer deaths [3]. Non-small cell lung cancer (NSCLC) is the most common histological type of lung cancer, accounting for more than 80% of primary lung cancers, with very high morbidity and mortality [4]. About 60% of NSCLC cases have metastasized at the time of diagnosis, and the 5-year survival rate for advanced NSCLC is less than 5%, so early diagnosis of lung cancer is an important opportunity to reduce mortality [5,6]. Current treatment methods for NSCLC mainly include surgery, radiotherapy, chemotherapy and targeted therapy [7,8]. Chemotherapy is the most common treatment, but patients in poor health often have a low tolerance to conventional treatment and tend to develop drug resistance [9]. Traditional Chinese medicine (TCM) has a long history and rich experience in the treatment of lung cancer, and is one of the main methods of comprehensive lung cancer treatment in China. A systematic evaluation showed that TCM combined with radiotherapy, chemotherapy and targeted therapy had certain advantages in alleviating symptoms, stabilizing tumors, improving quality of life and prolonging survival [10]. TCM has been shown to be an effective method for the treatment of advanced lung cancer. On the basis of accurate syndrome differentiation, TCM plays an active role in each stage of the occurrence and development of lung cancer [11,12].
Syndrome differentiation and treatment is the basic principle by which TCM diagnoses and treats diseases. It is a process of comprehensive judgment on the information from the four diagnostic methods, based on TCM theory combined with the doctor's experience [13]. Accurate syndrome differentiation provides a basis for treatment and is the foundation of clinical efficacy. Traditional syndrome differentiation and treatment inevitably suffers from subjectivity and ambiguity, which hinders the development of TCM. Micro-syndrome differentiation builds on macroscopic syndrome differentiation by using modern technology to understand and differentiate syndromes at the microscopic level of the body. It can be used to guide disease differentiation and syndrome differentiation, explore cause and pathogenesis, evaluate efficacy and guide prognosis [14]. Previous studies have verified that there is a close relationship between different syndromes and physical and chemical indexes, and that combining micro indexes with macro symptoms can effectively assist syndrome differentiation.
With the rapid development of modern research on tongue and pulse diagnosis, a variety of tongue and pulse diagnosis instruments are widely used in clinical practice, generating a large amount of objective tongue and pulse data, which are also microscopic indexes in a sense. In recent years, research based on tongue and pulse diagnosis data has been increasing, and many researchers have applied machine learning and data mining methods to image recognition, target detection, natural language processing and other fields [15][16][17][18]. In addition, studies have demonstrated that accurate detection, identification and multi-dimensional quantitative analysis based on tongue and pulse data have gradually been applied to disease diagnosis. Constructing the diagnostic relationship between tongue and pulse data and health status not only saves medical resources, but also greatly improves the efficiency of diagnosis and treatment [19][20][21][22]. Qi deficiency syndrome and Yin deficiency syndrome are the two main syndromes of NSCLC. When symptoms are not obvious, traditional symptom-based syndrome differentiation cannot be carried out. Modern research on tongue and pulse diagnosis provides a good data basis for TCM syndrome differentiation. Therefore, this study aims to explore the syndrome differentiation of NSCLC based on tongue & pulse diagnosis data, using machine learning methods to establish syndrome classification models based on macro symptom data, objective tongue & pulse data, and macro symptom data combined with objective tongue & pulse data, and to evaluate the contribution of objective tongue and pulse data to syndrome differentiation.

Study design and subjects
A total of 337 patients were selected from the Oncology Department of Yueyang Hospital of Integrated Traditional Chinese and Western Medicine from January 2018 to October 2020, including 163 patients with Qi deficiency syndrome and 174 patients with Yin deficiency syndrome. All patients were pathologically or cytologically confirmed to have NSCLC. A total of 184 healthy individuals were randomly selected from Shuguang Hospital of Shanghai University of Traditional Chinese Medicine from January 2018 to October 2020 as the healthy controls.
The research flowchart is shown in Fig. 1.

Diagnostic Criteria
Diagnostic criteria of Western medicine: the clinical practice guidelines for lung cancer screening issued by the National Comprehensive Cancer Network (NCCN) [23] and the fourth edition of the lung cancer histological classification standards, "Classification of Lung Tumors" [24,25], issued by the World Health Organization (WHO). TCM syndrome differentiation standard: the "Technical Guidelines for Clinical Research of New Drugs of Syndromes" [26], the Syndrome Part of TCM Clinical Diagnosis and Treatment Terms [27], and the textbook Common Diseases and Symptoms in Internal Medicine of Traditional Chinese Medicine.

Inclusion And Exclusion Criteria
Inclusion criteria: (1) Meet the above diagnostic criteria. (2) Confirmed by pathology or cytology. [28].
All tongue and pulse data collection and inquiry were completed by professional personnel of TCM or integrated TCM and Western medicine who had received standardized training. Each patient was assessed by at least two professional researchers, and the interpretation of all patient syndromes was completed by three senior doctors to ensure the consistency and authenticity of data collection and interpretation and to minimize deviation. The TFDA-1 digital tongue diagnosis instrument and its corresponding tongue image analysis system are shown in Fig. 2.
The PDA-1 digital pulse diagnosis instrument and its corresponding sphygmogram were shown in Fig. 3.
Tongue diagnosis parameters mainly come from the three color spaces Lab, HIS and YCrCb [29][30][31][32], and each parameter of tongue and pulse diagnosis has its corresponding medical significance [31,33].

Statistical analysis
SPSS 25.0 was used for statistical analysis. Count data were expressed as N (%). Measurement data that followed a normal distribution were expressed as "mean ± standard deviation", and those that did not were expressed as "median (upper quartile, lower quartile)". Measurement data were compared with analysis of variance (ANOVA) or the rank sum test (Kruskal-Wallis H test). The correlation heat map was made with GraphPad Prism 8.0. All tests were two-tailed with a test level of α = 0.05, and a difference was considered statistically significant when P < 0.05.
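As an illustration of the rank sum comparison named above, the following minimal sketch computes the Kruskal-Wallis H statistic for several groups in plain Python. The study itself used SPSS 25.0, so this is not the authors' procedure; the function names and any sample values are hypothetical. The p-value helper relies on the chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2).

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    offset = 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else float("nan")


def p_value_three_groups(h):
    """Chi-square approximation for exactly three groups (df = 2)."""
    return math.exp(-h / 2)
```

With three groups, a p-value below the test level of 0.05 would indicate that at least one group differs; pairwise follow-up comparisons are outside the scope of this sketch.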

Classification By Machine Learning Approach
Orange (3.26.0) software was used with four machine learning methods, namely Neural Network, Random Forest, Support Vector Machine (SVM) and Logistic Regression. The ratio of training set to test set was set at 8:2. The parameters of each model were adjusted to establish classification and diagnosis models of Qi deficiency syndrome and Yin deficiency syndrome of NSCLC based on "symptom", "tongue & pulse" and "symptom & tongue & pulse" data, respectively. Accuracy, Precision, F1-score (F1), Sensitivity, Specificity and AUC were used as evaluation indexes of predictive performance. AUC is the area under the ROC curve, with a value between 0.5 and 1; the larger the value, the better the classification effect of the classifier. The indexes were calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$

In the above formulas, True Positive (TP) is a positive sample predicted by the model as positive. True Negative (TN) is a negative sample predicted by the model as negative. False Positive (FP) is a negative sample predicted by the model as positive. False Negative (FN) is a positive sample predicted by the model as negative.
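The evaluation indexes follow directly from the four confusion-matrix counts TP, TN, FP and FN. The snippet below is a minimal sketch using the standard definitions, not the Orange implementation; the function name and the example counts are assumptions chosen only for demonstration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute evaluation indexes from confusion-matrix counts.

    Ratios with a zero denominator are returned as 0.0 so that a model
    which never predicts the positive class still yields a result.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts for a held-out test set (not taken from the study).
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Which syndrome is treated as the positive class is a modelling choice; swapping the classes exchanges sensitivity and specificity.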

Characteristics of participants
The basic statistics of the three groups are shown in Table 1. In terms of sex ratio, the proportion of females was slightly higher than that of males in the Qi deficiency syndrome group, and the proportion of males was slightly higher than that of females in the Yin deficiency syndrome group. Compared with the healthy controls, the ages of patients with Qi deficiency syndrome and Yin deficiency syndrome differed significantly, while the age difference between Qi deficiency syndrome and Yin deficiency syndrome was not statistically significant.

Statistical Analysis Of Tongue Data
The statistical analysis results of the tongue diagnosis data in the three groups are shown in Table 2. The results showed that: (1) Yin deficiency syndrome had more indexes that differed significantly from the healthy controls than Qi deficiency syndrome did. (2) Among the indexes that differed significantly between Yin deficiency syndrome and healthy controls, apart from the tongue coating texture indexes, there were more tongue body indexes than tongue coating indexes. That is, the changes in the tongue body indexes of Yin deficiency syndrome were more pronounced than those in the tongue coating indexes. (3) The tongue diagnosis indexes that differed significantly between Qi deficiency syndrome and Yin deficiency syndrome were TB-a, TB-S, TB-Cr, TC-a, TC-S, TC-Cr, perAll and the tongue coating texture indexes TC-Con, TC-ASM, TC-MEAN and TC-ENT. Among them, TB-a, TB-Cr, TC-a, TC-S, TC-Cr, and TC-ASM of Yin deficiency syndrome were all higher than those of Qi deficiency syndrome, while perAll, TC-Con and TC-ENT of Yin deficiency syndrome were all lower than those of Qi deficiency syndrome.

Statistical Analysis Of Pulse Data
The statistical analysis result of pulse diagnosis data in the three groups was shown in Table 3.

Correlation Analysis Of Tongue And Pulse Data
Further correlation analysis was conducted on the tongue and pulse data that showed statistical significance for Qi deficiency syndrome and Yin deficiency syndrome. The correlation heat map was made with GraphPad Prism 8.0, and all tests were two-tailed. A difference was considered statistically significant when P < 0.05. The heat map for Qi deficiency syndrome is shown in Fig. 4.
The correlation analysis results of tongue and pulse data in Qi deficiency syndrome are shown in Table 4. The results showed that: (1) There was a strong correlation among the tongue coating texture parameters, and the color space parameters of the tongue coating and tongue body were also correlated. The correlation between the tongue coating texture parameters and the color space parameters was weaker than that of the pulse parameters. The heat map for Yin deficiency syndrome is shown in Fig. 5.
The correlation analysis results of tongue and pulse data in Yin deficiency syndrome are shown in Table 5. The results showed that: (1) Similar to Qi deficiency syndrome, the tongue coating texture parameters of Yin deficiency syndrome were strongly correlated, and the color space parameters of the tongue coating and tongue body were also strongly correlated. The correlation between tongue coating texture parameters and color space parameters was weaker than that of the pulse parameters. (2) There was a certain correlation between the pulse parameter t4 and the tongue parameters TC-ASM and TC-a; the correlation coefficients were both −0.14, but the correlations were not statistically significant (P > 0.05). Comparing the tongue and pulse correlation results of the two syndromes, the tongue and pulse correlation intensity of Yin deficiency syndrome was significantly stronger than that of Qi deficiency syndrome. In Yin deficiency syndrome, the correlation between t4 and the tongue parameters was significantly reduced, while the correlation between t5 and the tongue parameters was significantly increased.
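The text does not state which correlation coefficient underlies the heat maps. Assuming Pearson's r purely for illustration, a pairwise correlation matrix over tongue and pulse parameters could be sketched as follows; the function names are hypothetical and the feature values passed in would come from the instrument exports, which are not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(features):
    """Map each pair of feature names to its Pearson r.

    `features` is a dict such as {"TC-a": [...], "t4": [...]}, where every
    list holds one value per patient in the same patient order.
    """
    names = list(features)
    return {a: {b: pearson_r(features[a], features[b]) for b in names}
            for a in names}
```

If the parameters are not normally distributed, a rank-based coefficient such as Spearman's would be the usual alternative; it can be obtained by applying the same function to the ranks of the values.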

Machine Learning Results
The modeling results for Qi deficiency syndrome and Yin deficiency syndrome, obtained with four machine learning methods (Neural Network, Random Forest, SVM and Logistic Regression) on symptom data, tongue & pulse data, and symptom & tongue & pulse data, are shown in Table 6. The ROC curves of the models based on symptom, tongue & pulse, and symptom & tongue & pulse data are shown in Fig. 6, Fig. 7
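For readers who want to reproduce an area under the ROC curve from predicted scores, the sketch below uses the rank interpretation of AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name, the label coding (1 for the positive syndrome, 0 for the other) and the example scores are assumptions for illustration and are not taken from Table 6.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for four test cases.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of a few dozen patients; a sort-based version would be preferable for large data sets.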

Discussion
Treatment based on syndrome differentiation is the basic principle by which TCM recognizes and treats diseases, and it runs through the whole process of medical care, from prevention to rehabilitation. Syndrome differentiation means recognizing the disease and determining the syndrome; treatment means establishing treatment methods and prescriptions based on the results of syndrome differentiation. Syndrome differentiation is therefore the prerequisite and basis for treatment, and only accurate syndrome differentiation can lead to a good therapeutic effect. Qi deficiency syndrome and Yin deficiency syndrome are two common syndromes in TCM. According to the basic theory of TCM syndrome differentiation, Qi deficiency syndrome refers to a lack of vitality of the body and decreased function of the visceral organs. The main manifestations are fatigue, lack of energy, reluctance to speak, and weak pulse. Yin deficiency syndrome refers to a lack of yin fluid in the human body, so that its nourishing functions are reduced, or Yin fails to control Yang and Yang becomes hyperactive. The main manifestations are dry mouth and pharynx, dysphoria with feverish sensation in the chest, palms and soles, tidal fever and night sweating. In the classification of lung cancer syndromes, the main manifestations of Qi deficiency syndrome are: cough, white or foamy phlegm, small amounts of hemoptysis, chest tightness, shortness of breath, low fever, spontaneous sweating, lack of energy, pale complexion, poor appetite, loose stools, pale red tongue with tooth marks, thin white coating, and thin pulse. Yin deficiency syndrome is mainly manifested as: cough without phlegm or with scanty, sticky phlegm, phlegm with blood, shortness of breath and dull chest pain, low fever, dry mouth, night sweats, restlessness and insomnia, red tongue with little or no coating, and thin and rapid pulse. According to the principle of TCM syndrome differentiation and treatment, the treatment method for Qi deficiency syndrome is to invigorate the spleen and replenish qi, and the corresponding prescription is Sijunzi decoction. The treatment method for Yin deficiency syndrome is to nourish Yin and clear the lung, and the corresponding prescription is Shashen Maidong decoction.
Statistical Analysis of tongue and pulse data of Qi deficiency and Yin deficiency
TCM is a promising and effective adjuvant therapy in the treatment of lung cancer. Compared with chemotherapy and radiotherapy, it has the advantages of availability, effectiveness and low toxicity [33], and its various mechanisms deserve further study [34][35]. In this study, the tongue parameters TB-a, TC-a, TB-Cr and TC-Cr all represent the redness of the tongue body and tongue coating: the larger the value, the redder or more magenta the tongue. In Yin deficiency syndrome, TB-a, TC-a, TB-Cr, and TC-Cr were all higher than those in Qi deficiency syndrome, indicating that the tongue in Yin deficiency syndrome was redder or more magenta. S stands for saturation: the higher the value of S, the brighter the tongue color. TC-S in Yin deficiency syndrome was higher than that in Qi deficiency syndrome, indicating that the tongue color in Yin deficiency syndrome was brighter. perAll is the ratio of tongue coating area to total tongue area. It has a high diagnostic value for thick coating: the higher the value, the thicker the tongue coating. perAll in Yin deficiency syndrome was lower than that in Qi deficiency syndrome, indicating that the tongue coating was thinner in Yin deficiency syndrome. Among the four texture parameters Con, ASM, ENT, and MEAN, smaller values of Con, ENT, and MEAN together with a larger ASM reflect a more delicate tongue texture or a greasier tongue coating. In this study, TC-Con and TC-ENT of Yin deficiency syndrome were significantly lower than those of Qi deficiency syndrome, while TC-ASM was higher, indicating that the tongue coating in Yin deficiency syndrome was greasier.
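To make the direction of the texture parameters concrete, the sketch below computes contrast (Con), angular second moment (ASM), entropy (ENT) and a mean value from a grey-level co-occurrence matrix, using the textbook Haralick-style definitions. This is an assumption: the exact formulas implemented in the TFDA-1 analysis system are not given in the text, and the MEAN definition in particular is only one common choice. The function names and the tiny example images are hypothetical.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for one pixel offset.

    `image` is a list of rows of integer grey levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for the chosen offset")
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Con, ASM, ENT and MEAN of a normalised co-occurrence matrix `p`."""
    size = len(p)
    cells = [(i, j, p[i][j]) for i in range(size) for j in range(size)]
    return {
        "Con": sum((i - j) ** 2 * v for i, j, v in cells),
        "ASM": sum(v * v for _, _, v in cells),
        "ENT": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "MEAN": sum(i * v for i, _, v in cells),
    }


smooth = [[0, 0], [0, 0]]   # perfectly uniform patch
rough = [[0, 1], [1, 0]]    # alternating patch
print(texture_features(glcm(smooth, levels=2)))
print(texture_features(glcm(rough, levels=2)))
```

In this toy comparison the uniform patch has lower Con and ENT and higher ASM than the alternating patch, which matches the direction described above for a more delicate texture.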
Among the pulse parameters, t4 is the time from the starting point of the sphygmogram to the descending isthmus (dicrotic notch), corresponding to the systolic period of the left ventricle, and t5 is the time from the dicrotic notch to the end point of the sphygmogram, corresponding to the diastolic period of the left ventricle. t4 and t5 of Yin deficiency syndrome were smaller than those of Qi deficiency syndrome, indicating that the systolic and diastolic times in Yin deficiency syndrome were shorter than those in Qi deficiency syndrome. The pulsation cycle t of Yin deficiency syndrome also showed a decreasing trend, indicating that the pulse wave velocity of Yin deficiency syndrome was slightly higher. In addition, the dicrotic notch height h4 was elevated in Yin deficiency syndrome. In Qi deficiency syndrome, h3/h1, h1/t1, and t1 were prolonged, reflecting that the pulse force of Qi deficiency syndrome was soft and weak; the amplitude of the main wave h1 was reduced and the area under the sphygmogram was smaller, indicating that the pulse body was thin and small. All in all, in Qi deficiency syndrome the tongue was pale and the pulse was weak, while in Yin deficiency syndrome the tongue body was redder or crimson, the tongue color was brighter, the tongue coating was thinner and greasier, and the pulse was finer.
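The time parameters described above can be derived from three landmarks of a single pulse cycle. The following sketch assumes that the onset, dicrotic notch and end times of one cycle have already been located (landmark detection is not shown); the function name, the field names and the example times are hypothetical. The pulse rate of 60/t beats per minute is added only to show how a shorter cycle corresponds to a faster pulse.

```python
def pulse_time_parameters(onset_s, notch_s, end_s):
    """Derive t4, t5 and t (in seconds) from landmark times of one cycle.

    t4: onset to dicrotic notch (left ventricular systole)
    t5: dicrotic notch to end of cycle (left ventricular diastole)
    t:  full pulsation cycle, equal to t4 + t5
    """
    if not onset_s < notch_s < end_s:
        raise ValueError("landmarks must satisfy onset < notch < end")
    t4 = notch_s - onset_s
    t5 = end_s - notch_s
    t = end_s - onset_s
    return {"t4": t4, "t5": t5, "t": t, "rate_bpm": 60.0 / t}


# Hypothetical landmark times in seconds (not measured values).
print(pulse_time_parameters(onset_s=0.0, notch_s=0.32, end_s=0.80))
```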
Tongue and pulse data modeling analysis of Qi deficiency syndrome and Yin deficiency syndrome
In recent years, with the rapid development of computer technology, different recognition algorithms and machine learning methods, such as Logistic Regression [36], SVM [22,37], Random Forest [38], neural networks [15,39] and other data mining technologies, have been widely used in medical research. The quantitative analysis of diagnostic information through various mathematical models has promoted the development of TCM informatization. In this study, symptom data and tongue and pulse data were used to classify syndromes. The results showed that the classification performance of models based on different data sets ranked as follows: model of tongue & pulse data < model of symptom < model of symptom & tongue & pulse data, indicating that tongue and pulse data contributed to syndrome classification to some extent. TCM data are massive and complex, combining quantitative and qualitative, subjective and objective, deterministic and fuzzy, and linear and nonlinear information. TCM syndromes have complex multidimensional characteristics and are associated with multiple micro indexes. Therefore, especially when symptoms are not evident, exploring the relationship between different syndromes and physical and chemical indexes can effectively assist syndrome differentiation. Research also shows that it is reasonable to combine micro indexes with macro symptoms. Using machine learning or data mining methods to build TCM syndrome or disease diagnosis models can make the process of syndrome differentiation and treatment more objective, standardized and intelligent [40][41][42].
A limitation of this study is that the sample size is insufficient. In future studies, large-scale, multi-center studies with large samples will be helpful for further exploration. In addition, this study only analyzed the tongue and pulse characteristics and classification model of Qi deficiency syndrome and Yin deficiency syndrome of NSCLC. In the future, syndrome differentiation studies covering more kinds of NSCLC syndromes are needed.

Limitations And Future Work
This research is based on a real-world investigation, and the results basically conform to the syndrome distribution of NSCLC in the clinic. However, the study also has some limitations. First, due to limitations of time and place, the sample size was not large enough. Second, the basic data of the subjects are not comprehensive enough: statistics on height, weight, body mass index (BMI), history of present illness and past medical history are lacking, which may affect the results. Last but not least, this study mainly focused on the common NSCLC syndromes of Qi deficiency and Yin deficiency, and other syndromes were not explored. In the future, large-scale, multicenter epidemiological investigations should be carried out, the collection of four-diagnosis information and basic characteristics needs to be more standardized and complete, and further research based on more comprehensive syndrome differentiation results is needed.

Conclusions
To sum up, objective tongue and pulse data of NSCLC are useful for the classification of TCM syndromes and can improve the accuracy of syndrome classification to a certain extent. Tongue and pulse diagnosis parameters can provide new ideas and methods for TCM syndrome differentiation of Qi deficiency syndrome and Yin deficiency syndrome of NSCLC.

Figure 2 The TFDA-1 digital tongue diagnosis instrument and its corresponding tongue image analysis system

Figure 4 Heat map of tongue and pulse correlation analysis of Qi deficiency syndrome

Figure 5 Heat map of tongue and pulse correlation analysis of Yin deficiency syndrome