Study design and subjects
From July 2020 to March 2022, data were collected from 324 patients with lung cancer treated in the Department of Oncology, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine. Their case information, including medical record number, name, gender, medical history, and diagnosis, was recorded separately. Ethical approval was obtained from the Ethics Committee of Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine (registration number 2020LCSY083). Professionally trained graduate students collected standardized tongue image and tumor marker data. A total of 219 patients with NSCLC were included in this study: 109 patients with stage I, II, or III disease formed the non-advanced NSCLC group, and 110 patients with stage IV disease formed the advanced NSCLC group. All patients were informed about the study and signed informed consent after receiving a definitive pathological diagnosis. The research flow chart is shown in Figure 1.
Diagnostic criteria
Lung cancer was diagnosed according to the "Clinical Practice Guidelines for Lung Cancer Screening" issued by the National Comprehensive Cancer Network (NCCN) [18], and histological classification followed the fourth edition of the World Health Organization (WHO) "Classification of Lung Tumors" [19; 20].
Inclusion and Exclusion Criteria
Inclusion criteria: (1) NSCLC diagnosed by pathology or cytology; (2) age 18 to 90 years; (3) a definite pathological staging diagnosis; (4) a complete tongue image; and (5) signed informed consent.
Exclusion criteria were: (1) failure to meet the inclusion criteria; (2) pregnancy or breastfeeding; (3) another concurrent malignant tumor; (4) a concurrent systemic acute or chronic infection; and (5) mental illness, unwillingness to cooperate, or poor study compliance.
Collecting Clinical Data
TFDA-1 Intelligent tongue diagnosis instrument
The Tongue Face Diagnosis Analysis-1 (TFDA-1) digital tongue and face diagnosis instrument, developed by the project team of the National Key Research and Development Program "TCM Intelligent Tongue Diagnosis System Research and Development" (No. 2017YFC17033301), was used to collect patients' tongue images, and the tongue image analysis system TDAS was used to analyze the images and obtain objective tongue features. The TFDA-1 instrument is shown in Figure 2 (A) and Figure 2 (B), and the corresponding analysis system TDAS is shown in Figure 3.
All tongue images were collected by researchers with standardized training to ensure consistent and accurate acquisition. The collection procedure was as follows: (1) set the shooting parameters and disinfect the instrument with alcohol; (2) instruct the subject to place the chin on the chin rest of the instrument, relax, open the mouth, and extend the tongue with the tongue body relaxed, the tongue surface flat, and the tip pointing downward, positioning the tongue at the center of the camera view before capturing the image; (3) check the captured image to ensure that the tongue body is complete and not tense and that there is no fogging, light leakage, overexposure, or underexposure; images that do not meet these requirements must be retaken.
Introduction to features of tongue diagnosis
The tongue color indexes were derived from four color spaces: RGB, HSI, Lab, and YCrCb. R (red), G (green), and B (blue) are the three primary colors, each ranging from 0 to 255. In HSI, H (hue) is an angle in the range [0, 2π], with red at 0, green at 2π/3, and blue at 4π/3; S denotes saturation and I denotes intensity. In Lab, L (lightness) ranges from 0 (pure black) to 100 (pure white), a is the green-red axis, and b is the blue-yellow axis, each ranging from -128 to 127. In YCrCb, Y denotes luminance and ranges from 16 to 235, while Cr and Cb denote chrominance and range from 16 to 240: Cr is the difference between the red component of the RGB input signal and its luminance, that is, how far the current color is shifted toward red, and Cb is the difference between the blue component and the luminance, that is, how far the current color is shifted toward blue.

CON (contrast), ASM (angular second moment), ENT (entropy), and MEAN are the tongue texture indexes. perAll and perPart are the tongue coating indexes: perAll is the ratio of the coating area to the total tongue area, and perPart is the ratio of the coating area to the uncoated tongue area. In this study, the prefix "TB-" denotes the tongue body and "TC-" denotes the tongue coating. Because hue is an angle, reddish colors just above 0 and just below 2π are adjacent in color but far apart numerically; to keep the data continuous and make regularities and real differences easier to detect, TB-H and TC-H were rotated by 180° and the rotated values were used as H.
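The color and texture indexes above can be illustrated with a short Python sketch. It is not the TDAS implementation, whose formulas the text does not give; it assumes the common geometric RGB-to-HSI conversion, the ITU-R BT.601 studio-range YCrCb formulas (which match the 16 to 235 and 16 to 240 ranges quoted above), and that CON, ASM, ENT, and MEAN are the usual gray-level co-occurrence matrix (GLCM) statistics computed with a one-pixel horizontal offset and a base-2 logarithm. Lab is omitted for brevity, and all function names are hypothetical.

```python
import math


def rgb_to_hsi(r, g, b):
    """8-bit RGB -> (H in radians [0, 2*pi), S in [0, 1], I in [0, 1]); assumed formula."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        return 0.0, s, i  # gray pixel: hue undefined, reported as 0 here
    theta = math.acos(max(-1.0, min(1.0, num / den)))  # clamp rounding error
    h = theta if b <= g else 2 * math.pi - theta
    return h, s, i


def rotate_hue(h):
    """Rotate hue by 180 degrees (pi radians) so reddish hues near 0/2*pi sit mid-range."""
    return (h + math.pi) % (2 * math.pi)


def rgb_to_ycrcb(r, g, b):
    """ITU-R BT.601 studio range: Y in [16, 235], Cr and Cb in [16, 240] (assumed convention)."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    y = 16 + 65.481 * r + 128.553 * g + 24.966 * b
    cr = 128 + 112.0 * r - 93.786 * g - 18.214 * b
    cb = 128 - 37.797 * r - 74.203 * g + 112.0 * b
    return y, cr, cb


def glcm(gray, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of integer gray levels in [0, levels) for offset (dx, dy)."""
    p = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(gray), len(gray[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                p[gray[y][x]][gray[y2][x2]] += 1
                total += 1
    return [[v / total for v in row] for row in p]


def texture_features(p):
    """Hypothetical GLCM versions of CON, ASM, ENT (base 2), and MEAN."""
    con = asm = ent = mean = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            if v > 0:
                con += (i - j) ** 2 * v   # contrast: weights pairs by gray-level difference
                asm += v * v              # angular second moment: uniformity
                ent -= v * math.log2(v)   # entropy: randomness of the texture
                mean += i * v             # GLCM mean of the reference gray level
    return {"CON": con, "ASM": asm, "ENT": ent, "MEAN": mean}


def coating_ratios(coating_px, tongue_px):
    """perAll = coating / whole tongue; perPart = coating / uncoated tongue, as defined above."""
    return coating_px / tongue_px, coating_px / (tongue_px - coating_px)
```

For example, `rotate_hue` maps a hue of 0.05 rad to about π + 0.05 and a hue of 2π - 0.05 to about π - 0.05, so two nearly identical red tongue colors end up close together instead of at opposite ends of the scale.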
In addition, patients' tumor marker results were obtained from the Hospital Information System (HIS); the markers included CA50, CA242, AFP, NSE, CA72-4, CYFRA21-1, SCC, CEA, CA125, CA15-3, and CA19-9.
Statistical Analysis
SPSS 25.0 was used for statistical analysis. Count data were expressed as N (%) and compared between groups using Pearson's χ² test or Fisher's exact test. Measurement data that followed a normal distribution were expressed as mean ± SD, and those that did not were expressed as median (P25, P75). The independent-samples t-test was used for data that were normally distributed with homogeneous variance; otherwise, the independent-samples Mann-Whitney U test (equivalent to the Kruskal-Wallis test for two groups) was used. Correlation heat maps were drawn with GraphPad Prism 8.0. All tests were two-tailed with a significance level of α = 0.05, and P < 0.05 was considered statistically significant.
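To make the nonparametric branch of this workflow concrete, the sketch below computes the median (P25, P75) summary, a two-sample Mann-Whitney U test, and a 2×2 Pearson χ² test using only the Python standard library (3.8+ for `statistics.quantiles`). It is an illustration, not SPSS's implementation: the U test uses the normal approximation with a tie correction and no continuity correction, the χ² test omits the continuity correction, and the normality, variance-homogeneity, and Fisher's exact steps are not shown. The "exclusive" quantile method interpolates at rank (n + 1)p, which is assumed here to correspond to SPSS's default percentile definition. All function names are hypothetical.

```python
import math
import statistics


def median_p25_p75(values):
    """Median (P25, P75); 'exclusive' quantiles interpolate at rank (n + 1) * p."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return statistics.median(values), q1, q3


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """U for sample x, z-score, and two-sided p (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = sum(t ** 3 - t for t in (pooled.count(v) for v in set(pooled)))
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var == 0:  # every value identical: no evidence of a difference
        return u1, 0.0, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, z, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]], df = 1."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # P(chi2 with 1 df > x) = erfc(sqrt(x / 2))
```

When any expected cell count in the 2×2 table is small (a common rule of thumb is below 5), Fisher's exact test is typically used instead of the χ² approximation.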
Modeling with machine learning methods
Orange is an open-source, component-based data mining and machine learning toolkit that can be used through a Python library or a graphical user interface (GUI). It provides a large number of standard and non-standard machine learning and data mining algorithms together with routines for reading, writing, and manipulating data, and complex analyses can be carried out by composing workflows from predefined widgets for data visualization, retrieval, preprocessing, and modeling. Orange 3.26.0 was used in this study, and five of its modeling components (logistic regression, support vector machine (SVM), random forest, naive Bayes, and neural network) were used to build diagnostic models for different clinical stages of NSCLC. Classification models were built on four data sets from patients with different clinical stages of NSCLC: "tongue features", "tumor markers", "tongue features & tumor markers", and "tongue features & tumor markers & baseline data". To reduce the error caused by an unrepresentative choice of test set and to avoid relying on a single modeling run, random resampling was repeated 10 times; in each repetition, the data were randomly divided into training and test sets at a 7:3 ratio, and the model parameters were tuned for each data set. Each model was therefore built 10 times on each data set, and the mean (standard deviation) of the 10 classification results was used to describe its classification performance.
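The repeated 7:3 random-split procedure can be sketched independently of Orange as follows. This is a minimal illustration under stated assumptions, not the study's Orange workflow: the split is unstratified, accuracy stands in for the full set of evaluation indexes, and `nearest_centroid_fit_predict` is a hypothetical stand-in classifier rather than any of the five models used in the study. Any function with the same `(train_X, train_y, test_X) -> predictions` shape could be passed in its place.

```python
import random
import statistics


def nearest_centroid_fit_predict(train_X, train_y, test_X):
    """Hypothetical stand-in classifier: assign each test row to the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda lab: dist2(x, centroids[lab])) for x in test_X]


def repeated_holdout(X, y, fit_predict, n_repeats=10, train_frac=0.7, seed=0):
    """Repeat a random train/test split and return mean and SD of test accuracy."""
    rng = random.Random(seed)  # fixed seed so the 10 splits are reproducible
    idx = list(range(len(y)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)                      # new random split each repetition
        cut = round(train_frac * len(idx))    # 70% training, 30% test
        tr, te = idx[:cut], idx[cut:]
        pred = fit_predict([X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te])
        scores.append(sum(p == y[i] for p, i in zip(pred, te)) / len(te))
    return statistics.mean(scores), statistics.stdev(scores)
```

Reporting the mean and standard deviation over the ten splits, rather than a single split, shows both the typical performance and how sensitive it is to which patients happen to land in the test set.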
Python 3.7 was used to draw the ROC curve of each model and to calculate each model's evaluation indexes from its predicted-probability matrix after parameter tuning. The evaluation indexes were the area under the ROC curve (AUC), accuracy, precision, F1-score, sensitivity, and specificity. AUC ranges from 0 to 1; a value of 0.5 corresponds to random classification, and the closer the value is to 1, the better the classification performance. Sensitivity, also known as the true positive rate or recall, measures how well the diagnostic method detects the disease; the higher the sensitivity, the lower the likelihood of a missed diagnosis. Specificity, also known as the true negative rate, measures how well the method identifies cases without the condition; the higher the specificity, the lower the likelihood of a misdiagnosis. Accuracy is the proportion of correctly classified test instances among all test instances. Precision is the proportion of instances classified as positive that are truly positive. The F1-score is the harmonic mean of precision and recall (sensitivity) and thus evaluates both together. The evaluation indexes were calculated with the following formulas:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Sensitivity (Recall)} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
TP (true positive) is the number of positive samples predicted as positive by the model, TN (true negative) is the number of negative samples predicted as negative, FP (false positive) is the number of negative samples predicted as positive, and FN (false negative) is the number of positive samples predicted as negative.
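The following sketch shows how these indexes and an ROC curve can be computed from predicted labels and probabilities in plain Python. It is not the authors' script: it assumes a binary task with the positive class given by the `positive` argument (for example, advanced NSCLC, which is an assumption), that predicted labels come from thresholding the probabilities (commonly at 0.5), and it builds the ROC curve by sweeping the threshold over the distinct scores and integrating with the trapezoidal rule. Function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def evaluation_indexes(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity, specificity, and F1 from confusion counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def roc_points_and_auc(y_true, scores, positive=1):
    """ROC points (FPR, TPR) from a descending threshold sweep; AUC by trapezoidal rule."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases sharing a score cross the threshold together (diagonal ROC step).
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc
```

For instance, with true labels `[1, 0, 1, 0]` and predicted probabilities `[0.9, 0.8, 0.7, 0.1]`, three of the four positive-negative pairs are ranked correctly, and `roc_points_and_auc` returns an AUC of 0.75.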