Background Traditional Chinese medicine (TCM) describes physiological and pathological changes inside and outside the human body by the application of four methods of diagnosis. One of the four methods, tongue diagnosis, is widely used by TCM physicians, since it allows direct observations that prevent discrepancies in the patient’s history and, as such, provides clinically important, objective evidence. The clinical significance of tongue features has been explored in both TCM and modern medicine. However, TCM physicians may have different interpretations of the features displayed by the same tongue, and therefore intra- and inter-observer agreements are relatively low. If an automated interpretation system could be developed, more consistent results could be obtained, and learning could also be more efficient. This study will apply a recently developed deep learning method to the classification of tongue features, and indicate the regions where the features are located.
Methods A large number of tongue photographs with labeled fissures were used. Transfer learning was conducted using the ImageNet-pretrained ResNet50 model to determine whether tongue fissures were identified on a tongue photograph. Often, the neural network model lacks interpretability, and users cannot understand how the model determines the presence of tongue fissures. Therefore, Gradient-weighted Class Activation Mapping (Grad-CAM) was also applied to directly mark the tongue features on the tongue image.
Results Only 6 epochs were trained in this study and no graphics processing units (GPUs) were used. It took less than 4 minutes for each epoch to be trained. The correct rate for the test set was approximately 70%. After the model training was completed, Grad-CAM was applied to localize tongue fissures in each image. The neural network model not only determined whether tongue fissures existed, but also allowed users to learn about the tongue fissure regions.
Conclusions This study demonstrated how to apply transfer learning using the ImageNet-pretrained ResNet50 model for the identification and localization of tongue fissures and regions. The neural network model built in this study provided interpretability and intuitiveness, (often lacking in general neural network models), and improved the feasibility for clinical application.