Recently, computer-aided diagnosis (CAD) using AI models has been developed in various clinical fields to compensate for the shortage of pathologists.15,16 This shortage also hinders the wider adoption of ROSE in EUS-FNA.7 If the ROSE-AI system is put into practical use and becomes widespread, it will reduce the burden on cytopathologists and the differences in diagnostic ability between facilities. It is also expected to reduce patients’ physical and financial burdens from reexamination and readmission.
The ROSE-AI system is expected to be widely used in place of conventional, labor-intensive ROSE. Previous studies have reported the effectiveness of deep learning in the pathological classification of pancreatic solid masses in EUS-FNA.9,10 One study using deep learning methods reported a diagnostic accuracy of 83.4% in internal validation and 88.7% in external validation using 467 digitized images.9 However, in that study, the number of images was imbalanced across categories because of a shortage of images. Another study reported an AUC of 0.958 for cancer diagnosis using a deep learning-based AI.10 That system also showed generalizable and robust performance on internal datasets, external datasets, and subgroup analyses. In addition, its performance was superior to that of trained endoscopists and comparable to that of cytopathologists on the testing datasets. Nevertheless, an AI system with excellent performance does not always achieve general clinical use in the real world, because training on insufficient datasets can introduce bias. Data comprehensiveness is essential to ensure the generalizability of the ROSE-AI system, and this requires large datasets. Obtaining sufficient training data to improve diagnostic ability remains a major challenge in creating the ROSE-AI system. Indeed, creating a training dataset from EUS-FNA slides requires significant effort for pathological re-evaluation.
Data-augmentation has been reported as a valuable technique for achieving efficient learning in AI training.11,12 It has the advantage of increasing the amount of training data by creating new images from the original images. AI training with data-augmentation has been reported as useful for creating AI systems that detect Barrett’s esophagus and colorectal polyps in gastroenterology.17,18 However, few reports focus on the differences among data-augmentation techniques in deep learning approaches for CAD. One study in the field of hematology compared rotation, scaling, and distortion for peripheral blood leukocyte recognition, and rotation was the only technique that improved diagnostic ability.19 Another study optimized data-augmentation and CNN hyperparameters with respect to validation accuracy for detecting coronavirus disease 2019 from chest radiographs.20 That study evaluated augmentation techniques common in the chest radiograph classification literature (resize value, resize method, rotate, zoom, warp, light, flip, and normalize), recently proposed methods (mixup and random erasing), and combinations of these methods. Individual data-augmentation methods yielded slightly increased task performance, and “Rotate” showed the highest AUC (0.965). In addition, the combination of the optimized methods significantly improved performance. Therefore, optimal data-augmentation techniques are effective for making efficient use of AI training datasets.
To our knowledge, this is the first report to compare various data-augmentation techniques for a ROSE-AI system. Our study demonstrates that data-augmentation influences the performance of an AI system. Data-augmentation may provide a solution for ensuring data comprehensiveness. However, the compatibility between a data-augmentation technique and the content of AI training appears to vary. Therefore, we should select the most effective technique from among those available. Cytological diagnosis is based on the characteristics of cells or cell clusters, such as nuclear enlargement, variability in nuclear size, and irregularity of nuclear margins. Therefore, techniques such as color space transformations and kernel filtering may render the morphology unclear and work unfavorably in cellular diagnosis. Geometric transformations may be more useful than other data-augmentation techniques in the ROSE-AI system and in other morphological diagnostic AI systems, such as those for endoscopic and computed tomography (CT)/magnetic resonance imaging (MRI) images. One study demonstrated that geometric transformations help detect prostate cancer in diffusion-weighted MRI using a CNN.21 However, that study reported that the effect of the noise augmentation method was insignificant. Geometric transformations comprise various techniques, including perspective transformation, rotation, flip, noise, and crop, and the effectiveness of each may vary in terms of improving diagnostic performance. We tried to select the most effective technique for the ROSE-AI system from among various data-augmentation techniques. However, because advances in this field are much faster than we expected, new data-augmentation techniques other than those we used may become more suitable.
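As a minimal illustration of why rotations and flips leave cellular morphology intact, the following Python sketch applies them to a toy image. The nested-list image representation and all function names are assumptions made for illustration; this is not the implementation used in this study.

```python
from typing import List

# Hypothetical toy representation: an image is a list of rows of grayscale pixels.
Image = List[List[int]]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def augment(img: Image) -> List[Image]:
    """Return the original image plus three rotated and two flipped copies."""
    variants = [img]
    rotated = img
    for _ in range(3):  # 90, 180, and 270 degrees
        rotated = rotate90(rotated)
        variants.append(rotated)
    variants.append(flip_horizontal(img))
    variants.append(flip_vertical(img))
    return variants


tile = [[1, 2], [3, 4]]  # illustrative 2 x 2 "image"
augmented = augment(tile)
```

Each variant contains exactly the same pixel values as the original, only rearranged in space, whereas color space transformations or kernel filtering would change the pixel values themselves.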
This study has some limitations. First, the training and evaluation of the AI system were performed using data collected retrospectively from only one institution. The detailed methods for preparing pathological slides vary among facilities. Therefore, it is unclear whether the ROSE-AI system can be used with the same diagnostic ability at other facilities if the training data are collected from only one facility. Second, our analysis relied on five-fold cross-validation, and we could not perform a validation study using new images. In the future, prospective studies should be conducted to further evaluate the value of the ROSE-AI system in the diagnostic yield of EUS-FNA in clinical practice.
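The five-fold cross-validation scheme mentioned above can be sketched as follows. This is a generic illustration and not the code used in this study; the function name `k_fold_indices`, the sample count of 23, and the fixed random seed are hypothetical.

```python
import random
from typing import List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, validation) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]  # fold i is held out for validation
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


splits = k_fold_indices(n_samples=23, k=5)  # 23 is an arbitrary example size
```

Every image appears in exactly one validation fold, so all data contribute to the evaluation. The validation images still come from the same institution as the training images, which is why validation with new images remains necessary.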
In conclusion, geometric transformation was the most useful data-augmentation technique for training the ROSE-AI system. We believe that an efficient ROSE-AI system will soon assist endoscopists in performing ROSE where cytopathologists are unavailable.