Lung cancer is the leading cause of cancer-related death worldwide (Sung et al., 2021), and it is the cancer with the highest incidence and mortality in China (Cao et al., 2021; Siegel et al., 2021). Early lung cancer has no overt symptoms, and roughly 80% of patients are already at an advanced stage at initial diagnosis, which prevents them from receiving the best possible care. Most patients with advanced lung cancer survive for less than a year (Zeng et al., 2015). Nodules are among the most frequent pathological changes in the lungs, and malignant lung nodules are one of the most important manifestations of early-stage lung cancer. Accurately diagnosing the type and nature of lung nodules is therefore of great clinical importance for early lung cancer screening. Early differentiation of benign and malignant lung nodules is crucial for increasing survival rates, improving the prognosis of patients with lung cancer, and reducing overdiagnosis and overtreatment of patients with benign lung nodules (Schabath & Cote, 2019).
The traditional classification of lung nodules as benign or malignant relies mainly on pathological biopsy at the cellular or molecular level, which is an invasive examination. Biopsy is not suitable for small lung nodules, which makes early diagnosis difficult. Patients with lung nodules frequently forgo the examination, allowing the lesions to progress until they develop into lung cancer. In recent years, with the rapid development of modern information technologies such as image processing, big data, and artificial intelligence, breakthroughs have been made in the informatization and digitalization of traditional Chinese medicine (TCM) tongue diagnosis. Through the research and development of numerous modern tongue diagnosis instruments and technologies, the standardization and digitization of tongue image information have been initially realized, which provides an important foundation for intelligent tongue diagnosis and has good prospects for development and application (Kamarudin et al., 2017; Li et al., 2021; Wang & Zhang, 2013; Zhang et al., 2019; Zhang et al., 2017).
Digital image processing technology is frequently employed in contemporary tongue diagnosis research to extract color and texture features from tongue images, and it has produced good results. Among the many machine learning techniques, the support vector machine (SVM) (Liu & Cheng, 2018; Yue et al., 2010), convolutional neural network (CNN) (Li et al., 2019), U-Net (Alom et al., 2019), and logistic regression (Schober & Vetter, 2021; Zhang et al., 2018) are frequently employed. Xu et al. (2020) used a multi-task joint learning (MTL) method to segment and classify tongue images, fusing two deep neural network variants (U-Net and Discriminative Filter Learning (DFL)) into the MTL method to perform these two tasks; their experimental results showed that the joint method outperformed the tongue characterization methods available at the time. Wang et al. (2020) used a deep convolutional neural network based on the ResNet34 architecture to extract features and classify tongue images in order to recognize tooth-marked tongues. The models achieved an overall accuracy above 90% and generalized well. Their study offers a practical and objective computer-aided tongue diagnosis method for monitoring disease progression and evaluating the effects of medication from an informatics perspective.
Using objective tongue image data, basic data, and serological indexes, Jiang et al. (2021) applied computerized tongue image analysis to the diagnosis of nonalcoholic fatty liver disease (NAFLD). A diagnostic model of NAFLD was established using machine learning techniques including logistic regression, SVM, random forest, gradient boosting decision tree, adaptive boosting (AdaBoost), naive Bayes, and neural networks. The findings demonstrated that computerized tongue image analysis can increase the accuracy of NAFLD diagnosis and may provide a convenient technical reference for the development of early NAFLD screening methods. To establish the link between tongue images and the nine TCM constitution types, deep neural networks such as ResNet-50, Inception-V3, and VGG-16 have been used, which improved classification accuracy (Ma et al., 2019).
Standardized, digitized tongue diagnosis generates a large amount of data. Developing a differential diagnosis model for benign lung nodules and lung cancer using machine learning, data mining, and related techniques therefore has significant practical value: it can aid the differential diagnosis of benign lung nodules and lung cancer, support early screening, diagnosis, and treatment of lung cancer, and help increase patient survival rates. This study used computer technology to extract objective diagnostic features from the tongue images of patients with benign lung nodules and patients with lung cancer. It then conducted a correlation analysis of the tongue image features of the two groups and mined the differences between them, in order to provide a reference for later research on classification models based on machine learning techniques.
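As a rough illustration of what extracting objective features and relating them to the two groups might look like, the sketch below computes mean color values over a tongue region and correlates each feature with a binary group label. It is a minimal example under stated assumptions and not the procedure used in this study: the choice of mean R, G, B values as features, the point-biserial (Pearson) correlation, the function names `mean_color_features` and `point_biserial`, and the synthetic images are all hypothetical.

```python
import numpy as np


def mean_color_features(image, mask):
    """Return the mean R, G, B values over the masked region.

    image: (H, W, 3) array of pixel values.
    mask:  (H, W) boolean array, True where the pixel belongs to the tongue.
    """
    pixels = image[mask].astype(float)  # shape (n_pixels, 3)
    return pixels.mean(axis=0)


def point_biserial(features, labels):
    """Pearson correlation between each feature column and a 0/1 group label."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return np.array(
        [np.corrcoef(features[:, j], labels)[0, 1] for j in range(features.shape[1])]
    )


# Synthetic placeholder data (NOT real tongue images): group 1 is given
# a stronger red channel purely so that the example has a visible effect.
rng = np.random.default_rng(0)


def synthetic_image(red_shift):
    img = rng.integers(80, 160, size=(32, 32, 3)).astype(float)
    img[..., 0] += red_shift
    return img


mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True  # hypothetical tongue region

labels = np.array([0] * 10 + [1] * 10)  # 0 and 1 stand for the two patient groups
features = np.array(
    [mean_color_features(synthetic_image(20 * label), mask) for label in labels]
)
correlations = point_biserial(features, labels)
print(correlations)  # one coefficient per color channel
```

In practice the mask would come from a tongue segmentation step, and the feature set would typically include texture and coating measures in addition to color; the statistical test would also need to match the distribution of each feature.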