This study demonstrated that the most effective deep learning model for classifying histopathological images of oral squamous cell carcinoma was VGG16 trained with a learning rate scheduler and the SAM optimizer. It also showed that deep-learning-assisted diagnosis, in which oral pathologists consider the predictions of this best-performing classifier, improves their diagnostic accuracy.
This study first identified an optimized CNN model for the considered dataset. As mentioned above, the best model combined VGG16 with the SAM optimizer and a learning rate scheduler. SAM has recently been reported as a deep learning optimization method that performs well on publicly available datasets10 and on classifiers of medical images11,12. Similar results were obtained for the other deep learning classifiers examined in this study. Although they did not perform quite as well as SAM, for each CNN model trained with SGDM as the optimizer, introducing a learning rate scheduler was effective in improving performance within a limited number of epochs. Comparing the VGG16 and ResNet50 architectures, VGG16 performed better on the present dataset and hyperparameters. VGG16 is a CNN architecture that has been demonstrated to improve robustness depending on the model environment13, and this was also observed in this study.
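The SAM update rule can be illustrated independently of any particular network. The sketch below is not the study's training code: it applies the two-step SAM rule (ascend to a nearby worst-case point, then descend using the gradient computed there) together with a simple step-decay learning rate schedule on a toy quadratic. All names, constants, and the schedule parameters are illustrative assumptions.

```python
import math

def sam_step(w, grad_fn, lr, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) update.

    1) Ascend to a nearby worst-case point w + eps, where
       eps = rho * g / ||g|| is a small step along the gradient.
    2) Take the usual descent step using the gradient at w + eps.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]  # ascent step
    g_sam = grad_fn(w_adv)                                  # sharpness-aware gradient
    return [wi - lr * gi for wi, gi in zip(w, g_sam)]       # descent step

def step_decay(lr0, epoch, step_size=10, gamma=0.5):
    """Step-decay schedule: multiply the rate by gamma every step_size epochs."""
    return lr0 * (gamma ** (epoch // step_size))

# Toy problem: minimize f(w) = w1^2 + w2^2, whose gradient is 2w.
grad = lambda w: [2 * wi for wi in w]
w = [1.0, -1.0]
for epoch in range(30):
    w = sam_step(w, grad, lr=step_decay(0.1, epoch))
print(w)  # both coordinates are driven close to 0
```

The ascent step makes the update prefer flat minima, which is the property credited for SAM's generalization gains; the decaying rate lets the same loop take large steps early and settle within a limited number of epochs.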
In recent years, studies have applied classifiers based on deep learning techniques to pathological tissue images of the head and neck region. Various verification methods have been used, and the images vary between public data and facility-specific data14, which makes cross-sectional comparisons of classification accuracy difficult. Previous studies using CNN classifiers for the histopathological diagnosis of oral squamous cell carcinoma have reported accuracies of 77.9–90.1%14,15,16. Most studies have divided images into two classes, such as normal or benign tissue versus malignant tumor. In this study, three categories were used: normal tissue, oral squamous cell carcinoma, and inflammatory response. Additionally, we targeted all cropped images that contained cells; hence, many factors made diagnosis difficult. Despite these complex conditions, the proposed CNN model achieved high diagnostic performance in the multiclass classification of this complex dataset.
We analyzed the effectiveness of deep learning-assisted diagnosis for oral pathologists using ROC curves and AUC values. We considered both macro and micro averaging: the macro average weights all classes equally, whereas the micro average pools the individual decisions and therefore reflects the amount of data in each class. Both the macro-average and micro-average AUC evaluations showed statistically significant differences. In other words, deep-learning-assisted diagnosis substantially improved the diagnostic performance of the oral pathologists.
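The distinction between the two averages can be made concrete with a small pure-Python sketch. The labels and probabilities below are invented for illustration and are not the study's data: each one-vs-rest AUC is computed as a rank statistic, the macro average is the unweighted mean of the per-class AUCs, and the micro average pools every (sample, class) decision before computing a single AUC.

```python
def binary_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative
    (the Mann-Whitney statistic), counting ties as one half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_micro_auc(y_true, probs, n_classes):
    """One-vs-rest multiclass AUC.

    macro: unweighted mean of the per-class AUCs (all classes equal);
    micro: AUC over the pooled (sample, class) binary decisions,
           so majority classes dominate the average.
    """
    per_class = [
        binary_auc([int(y == c) for y in y_true], [p[c] for p in probs])
        for c in range(n_classes)
    ]
    macro = sum(per_class) / n_classes
    flat_true = [int(y == c) for y in y_true for c in range(n_classes)]
    flat_score = [p[c] for p in probs for c in range(n_classes)]
    return macro, binary_auc(flat_true, flat_score)

# Invented 3-class example: true labels and softmax-style probabilities.
y = [0, 1, 2, 0]
p = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7], [0.15, 0.5, 0.35]]
macro, micro = macro_micro_auc(y, p, 3)
print(macro, micro)  # macro ≈ 0.917, micro = 0.84375
```

On this toy data the two averages diverge because the one misclassified sample hurts the pooled (micro) ranking more than the per-class (macro) mean, mirroring why both views were reported in the study.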
Each image segmented from the WSI was classified into one of the three categories. In general, pathologists use a single specimen slide to make an overall diagnosis, and they also consider the condition of the surrounding tissue before making a final decision. This makes it difficult to reach confident diagnostic decisions from a single segmented image. In this study, we posited that deep learning-assisted diagnosis positively affects the confidence of pathologists.
Importantly, we statistically demonstrated the effectiveness of deep learning diagnostic aids. This is the first study to demonstrate improved diagnostic performance of pathologists using ROC-AUC evaluation methods. In addition, we reported the effect size associated with the deep learning-assisted diagnosis17. Effect sizes may be used to determine the number of observers needed in future similar studies. The results of this study may provide a basis for the application of reliable deep learning methods in histopathological diagnosis.
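As an illustration of how such an effect size can be computed, the sketch below derives Cohen's d for paired samples (often written d_z) from per-observer AUCs with and without assistance. The per-pathologist numbers are hypothetical and do not reproduce the study's data; this is one common choice of effect size for a paired reader study, not necessarily the exact statistic used in the paper.

```python
import math

def cohens_dz(before, after):
    """Cohen's d for paired samples (d_z): the mean of the per-observer
    differences divided by the sample standard deviation of those differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var)

# Hypothetical per-pathologist AUCs without / with deep-learning assistance.
auc_alone = [0.82, 0.79, 0.85, 0.80, 0.83]
auc_assisted = [0.90, 0.82, 0.91, 0.89, 0.85]
d = cohens_dz(auc_alone, auc_assisted)
print(round(d, 2))  # a large paired effect size on this toy data
```

A d_z of this kind feeds directly into standard power calculations, which is how an effect size can inform the number of observers required for a follow-up reader study.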
This study has several limitations. First, only a few CNN models were verified, and many other optimizers and learning rate schedulers were not investigated. Sufficient computational resources would be needed to verify more complex CNN models. Second, the pathological tissue images were obtained from a single facility; confirming the robustness of deep learning-assisted diagnosis will require external validation using data from other institutions.