Computer-aided diagnosis of serrated colorectal lesions using non-magnified white-light endoscopic images

Computer-aided diagnosis systems for polyp characterization are commercially available but cannot recognize subtypes of serrated lesions. This study aimed to develop a computer-aided diagnosis system to characterize polyps using non-magnified white-light endoscopic images. A total of 2249 non-magnified white-light images from 1030 lesions, including 534 tubular adenomas, 225 sessile serrated adenoma/polyps, and 271 hyperplastic polyps in the proximal colon, were consecutively extracted from an image library and divided into training and testing datasets (4:1) based on the date of colonoscopy. Using ResNet-50 networks, we developed one classifier (1) to differentiate adenomas from serrated lesions and another classifier (2) to differentiate sessile serrated adenoma/polyps from hyperplastic polyps. Diagnostic performance was assessed using the testing dataset. The computer-aided diagnosis system generated a probability score for each image, and a probability score for each lesion was calculated as the weighted mean with a log10 transformation. Two experts (E1, E2) read the same testing dataset, and their readings were converted to probability scores. The area under the curve of classifier (1) for adenomas was equivalent to that of E1 and superior to that of E2 (classifier 86%, E1 86%, E2 69%; classifier vs. E2, p < 0.001). In contrast, the area under the curve of classifier (2) for sessile serrated adenoma/polyps was inferior to that of both experts (classifier 55%, E1 68%, E2 79%; classifier vs. E2, p < 0.001). Classifier (1), developed using white-light images alone, compares favorably with experts in differentiating adenomas from serrated lesions. However, classifier (2) for identifying sessile serrated adenoma/polyps is inferior to experts.


Introduction
Colorectal carcinogenesis through a serrated pathway has been demonstrated, and it is widely accepted that 20-30% of colorectal cancers originate from serrated lesions [1,2]. According to the World Health Organization (WHO) 2019 classification [3], serrated colorectal lesions are classified into hyperplastic polyps, sessile serrated lesions (SSLs), and traditional serrated adenomas. In the proximal colon, SSLs, which were referred to as sessile serrated adenomas/polyps (SSAPs) in the WHO 2010 classification [4], play a major role in this serrated pathway. On colonoscopy, serrated lesions typically appear as flat, pale, mucus-covered lesions, although their appearance varies widely [5]. For small (≤ 10 mm) polyps, endoscopic differentiation of serrated lesions from conventional adenomas is a challenge for colonoscopists [6]. Previous reports [7-9] showed that image-enhanced endoscopy, e.g., narrow band imaging (NBI) or blue laser imaging, facilitated this differentiation and had excellent diagnostic ability to discriminate serrated lesions from conventional adenomas. In contrast, the diagnostic ability using white-light images (WLIs) alone was not acceptable [10,11], although the overwhelming majority of colonoscopies in community hospitals are performed with WLIs and ordinary colonoscopes.
With progress in computer science, artificial intelligence (AI) is increasingly being applied to the interpretation of medical images. In colonoscopy, computer-aided diagnosis (CADx) systems for polyp characterization have been developed and validated [12-15]. Unfortunately, to achieve high diagnostic accuracy for real-time prediction of pathology, these CADx systems require enhanced images, i.e., NBI [13,14] or endocytoscopic images [15]. In a recent report, however, the CADx system developed by Zachariah et al. showed excellent diagnostic accuracy for polyps irrespective of whether NBI or WLI was used [16]. This report implies that a CADx system for polyp characterization could be developed without magnification or image enhancement. In the present study, we developed CADx systems for polyp characterization using non-magnified WLIs alone. First, a classifier to differentiate conventional adenomas from serrated lesions was developed. Next, a classifier to differentiate SSAPs from hyperplastic polyps was developed; this is a difficult problem because even expert endoscopists cannot accurately predict the histology of serrated lesions using magnified NBI. Finally, the diagnostic ability of these two classifiers was compared directly with that of expert colonoscopists.

Study design
We conducted a single-center, retrospective study. This study was approved by the Institutional Review Board of Fukushima Medical University (registration no. 2020-039). All data were collected by September 2020. The STROBE (Strengthening the reporting of observational studies in epidemiology) guidelines were followed in reporting this study (Supplementary Table 1).

Endoscopic images
To protect patient privacy, only deidentified endoscopic still images, labeled with basic endoscopic information, were extracted from existing image libraries in the medical information system of Aizu Medical Center Hospital. The initial search was limited to images of tubular adenomas (TAs) with low-grade dysplasia, SSAPs, and microvesicular type hyperplastic polyps (MVHPs), all located in the proximal colon (cecum, ascending and transverse colon). Adenomatous lesions were collected from August 2018 to March 2020, and serrated lesions (SSAPs and MVHPs) were collected from April 2013 to March 2020. Images with invasive cancers, villous adenomas, adenomas with high-grade dysplasia, and other miscellaneous lesions, e.g., inflammatory polyps and goblet cell-rich hyperplastic polyps, were excluded, because this study focused on classifiers for TAs with low-grade dysplasia, SSAPs, and MVHPs, and these minor lesion types could confound development of the CADx system. Only non-magnified WLIs were used; images enhanced with dye spray, NBI, or blue laser imaging were excluded. Lesions with pedunculated morphology were also excluded. Basic information on each lesion included pathological diagnosis (TA, SSAP, MVHP), size, and morphology (flat or polypoid). Pathological diagnosis was made by one expert pathologist (Professor Hiroshi Hojo) based on the WHO 2010 criteria [4]. Morphology was classified as polypoid (0-Is) or flat (0-IIa, 0-IIa + IIc, 0-IIc) based on the Paris classification [17]. All endoscopic images were digitized at high resolution (1280 × 1024) using equipment from two endoscope manufacturers (Fujinon 71%: EC-590MP, EC-590ZW, and EC-600ZP; Olympus 29%: CFH260AI, CF-HQ290, and PCF-H290ZI).

Allocation into training and testing datasets
A total of 2249 non-magnified WLIs from 1030 lesions were collected, including 534 TAs, 225 SSAPs, and 271 MVHPs. Characteristics of the lesions are shown in Table 1 and Supplementary Table 2. The lesions were allocated into a training dataset (TA 447, SSAP 184, MVHP 184) and a testing dataset (TA 87, SSAP 41, MVHP 87) based on the date of colonoscopy, as illustrated in Fig. 1. No lesion contributed images to both the training and testing datasets. Lesions detected earlier were assigned to the training dataset and more recent lesions to the testing dataset, with the ratio of training to testing images set to approximately 4:1. Lesion size, anatomical location, and endoscope manufacturer were distributed almost evenly between the two datasets, but the median number of images per lesion, lesion histology, and morphology differed significantly between them (Table 1).
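The allocation rule above can be illustrated as a lesion-level chronological split. The following is a minimal Python sketch, not the study's code: it assumes a hypothetical `Lesion` record (ID, colonoscopy date, image count) and a hypothetical greedy rule that fills the training set with the earliest lesions until about 80% of all images are used. The exact cutoff procedure used in the study is not described.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Lesion:
    """Hypothetical lesion record: one lesion may have several still images."""
    lesion_id: str
    colonoscopy_date: date
    n_images: int


def chronological_split(lesions, train_fraction=0.8):
    """Split at the lesion level by colonoscopy date (hypothetical rule).

    The earliest lesions go to training until roughly `train_fraction` of all
    images are used; every later lesion goes to testing. Because whole lesions
    are assigned, no lesion contributes images to both datasets.
    """
    ordered = sorted(lesions, key=lambda les: (les.colonoscopy_date, les.lesion_id))
    total_images = sum(les.n_images for les in ordered)
    train, test = [], []
    train_images = 0
    for les in ordered:
        # Once the first test lesion is assigned, all later lesions are test lesions.
        if not test and train_images < train_fraction * total_images:
            train.append(les)
            train_images += les.n_images
        else:
            test.append(les)
    return train, test
```

For example, ten lesions with two images each, seen on consecutive days, yield eight training lesions and two testing lesions. Splitting by lesion rather than by image matters because images of the same lesion are highly correlated; an image-level split would leak near-duplicates into the testing dataset.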

Training methods
We used a ResNet-50 convolutional neural network as the backbone for extracting image features for classification [18]. ResNet-50 was pretrained on the ImageNet Large Scale Visual Recognition Challenge 2012 classification dataset, which consists of 1.2 million training images in 1000 object classes [19]. The CADx systems were trained using the training dataset. First, we developed a binary classifier (CADx1) to differentiate TAs from serrated lesions. Next, we developed another binary classifier (CADx2) to differentiate SSAPs from MVHPs. Before training, lesions were annotated and cropped from the endoscopic images to increase the versatility of the training dataset and remove any influence from background structures; this facilitates spatial invariance in the analysis of polyps. Data augmentation, including rotation, saturation adjustment, resizing, and exposure adjustment, was performed to increase the number of training images. Image features extracted by ResNet-50 were passed to a fully connected layer and a focal loss layer for classification [20]. The CADx systems were trained using stochastic gradient descent with momentum, with an initial learning rate of 0.85, a learning rate drop factor of 0.88, a minibatch size of 30, and an L2 regularization coefficient of 0.0005. Focal loss [21] was employed to counteract the imbalance in the number of images among classes, with alpha set to 0.1 and gamma set to 2. The system was developed using MATLAB 2020a (MathWorks Inc., US) with the Deep Learning Toolbox™ and Parallel Computing Toolbox™ on a workstation with two NVIDIA GeForce GTX TITAN X graphics processing units.
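To make the loss and schedule settings concrete, the sketch below implements the textbook binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), with the reported α = 0.1 and γ = 2, and a step-decay learning rate starting at 0.85 with drop factor 0.88. It is a minimal illustration in plain Python, not the study's MATLAB code. Two assumptions are labeled: the class-weighting convention for α follows Lin et al.'s binary form (MATLAB's focal loss layer may apply α differently), and the drop period `drop_period` is hypothetical because the study does not report it.

```python
import math


def binary_focal_loss(p, y, alpha=0.1, gamma=2.0, eps=1e-7):
    """Focal loss for one example (Lin et al. binary form).

    p: predicted probability of the positive class; y: true label, 1 or 0.
    alpha weights the positive class and (1 - alpha) the negative class
    (assumed convention); (1 - p_t) ** gamma down-weights easy examples.
    """
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, alpha=0.1, gamma=2.0):
    """Average focal loss over a minibatch."""
    losses = [binary_focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


def step_decay_lr(epoch, initial_lr=0.85, drop_factor=0.88, drop_period=1):
    """Piecewise learning rate: multiply by drop_factor every drop_period epochs.

    drop_period is a hypothetical value; it is not reported in the study.
    """
    return initial_lr * drop_factor ** (epoch // drop_period)
```

With γ = 0 and α = 0.5 the loss reduces to half the ordinary cross-entropy; with γ = 2, a correctly classified example with p_t = 0.9 contributes only 1% of that, which is how focal loss shifts training toward hard, often minority-class, examples.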

Testing methods
After training, diagnostic performance was assessed using the testing dataset. CADx1 generated a probability score for TA for each image, and the probability score for each lesion was calculated as the weighted mean of the log10-transformed image probability scores [22]. A lesion probability score > 0.5 was classified as TA. CADx2, which differentiates SSAPs from MVHPs, was tested in the same way.
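The lesion-level aggregation can be sketched as follows. The exact weights and back-transformation of reference [22] are not given here, so this minimal Python sketch makes two labeled assumptions: the weights default to equal (a placeholder), and the weighted mean of log10(p) is back-transformed with 10**x so that the lesion score stays on a 0-1 scale comparable with the 0.5 threshold. Under these assumptions the lesion score is a weighted geometric mean of the image scores.

```python
import math


def lesion_score(image_probs, weights=None, eps=1e-6):
    """Aggregate image-level probabilities into one lesion-level score.

    Assumed reading of "weighted mean of log10-transformed scores":
    average log10(p) with the given weights, then back-transform with 10**x
    (a weighted geometric mean). Equal weights are a placeholder assumption.
    """
    if weights is None:
        weights = [1.0] * len(image_probs)
    logs = [math.log10(min(max(p, eps), 1.0)) for p in image_probs]  # clip to avoid log10(0)
    mean_log = sum(w * v for w, v in zip(weights, logs)) / sum(weights)
    return 10.0 ** mean_log


def classify_lesion(image_probs, weights=None, threshold=0.5):
    """Call the lesion TA when its aggregated score exceeds the threshold."""
    return "TA" if lesion_score(image_probs, weights) > threshold else "serrated"


print(round(lesion_score([0.9, 0.1]), 3))  # 0.3 (the arithmetic mean would be 0.5)
print(classify_lesion([0.9, 0.8]))  # TA
```

One consequence of averaging on the log scale is that a single low-scoring image pulls the lesion score down more strongly than an arithmetic mean would, so a lesion is called TA only when its images agree fairly consistently.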

Readings by endoscopists
Two expert endoscopists (YH, TY) from other institutions read the testing dataset so that their diagnostic performance could be compared with that of the CADx systems. Each expert has performed over 10,000 colonoscopic examinations and is certified by the Japan Gastroenterological Endoscopy Society. The same endoscopic images were presented in random order, and the experts rated the histology with a confidence level (high, medium, low), blinded to the proportion of each histology. First, the experts rated whether the polyp histology was TA or a serrated lesion. Two months later, they rated whether the histology of the serrated lesions was SSL or MVHP. Each rating was converted into a probability score (Supplementary Table 3); for instance, when discriminating TAs from serrated lesions, an expert diagnosis of TA with low confidence was converted to a probability score of 0.6, and an expert diagnosis of a serrated lesion with high confidence to a probability score of 0. After the reading test, the probability score for each lesion was calculated as the weighted mean of the log10-transformed image probability scores, as for the CADx systems.

Outcome measurements and statistical methods
Area under the curve (AUC) analysis of the receiver operating characteristic curve was performed based on the lesion-level weighted mean probability score.

Diagnostic performance to differentiate TAs from serrated lesions (Table 2)
The AUC was 86% for CADx1, 86% for expert 1, and 69% for expert 2 (Fig. 2A). A significant difference was observed between CADx1 and expert 2 (p = 0.0003). The accuracy showed a tendency similar to the AUC. Diagnostic performance of CADx1 by probability score is shown in Supplementary Table 3A. The sensitivity of CADx1 was significantly higher than that of both experts, whereas the specificity of CADx1 was significantly lower than that of expert 1. The positive predictive value tended to be higher for expert 1, and the negative predictive value was significantly lower for expert 2. Examples are shown in Fig. 3.
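The lesion-level measures reported in these results can be computed as in the minimal Python sketch below. The AUC uses the Mann-Whitney formulation (the probability that a randomly chosen positive lesion scores higher than a randomly chosen negative one, ties counting one half), and the threshold-based metrics use the study's rule that a score > 0.5 is called positive. The positive class is assumed to be TA for CADx1 and SSAP for CADx2. The confidence intervals and the test used to compare AUCs are not described in this section, so they are not reproduced here.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    Mann-Whitney formulation; tied pairs count 0.5. labels are 1 or 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _ratio(num, den):
    """Safe division: undefined metrics (empty denominator) become NaN."""
    return num / den if den else float("nan")


def threshold_metrics(scores, labels, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV and NPV when score > threshold
    is called positive (e.g. TA for CADx1)."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s > threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

The AUC summarizes ranking quality across all thresholds, whereas sensitivity, specificity, PPV, and NPV depend on the chosen cutoff; this is why a classifier can have a reasonable AUC yet very unbalanced sensitivity and specificity at 0.5.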

Diagnostic performance to differentiate SSAP from MVHP (Table 3)
The AUC was 55% [44-66] for CADx2, 68% [58-77] for expert 1, and 79% [70-87] for expert 2 (Fig. 2B). A significant difference was observed between CADx2 and expert 2 (p = 0.0002). Diagnostic performance of CADx2 by probability score is shown in Supplementary Table 3B. The accuracy was similar for all readers. The sensitivity of CADx2 was significantly lower than that of both experts, whereas its specificity was significantly higher than that of both experts. The positive predictive value showed no apparent trend. The negative predictive value tended to be lower for CADx2, with a significant difference between CADx2 and expert 2. Examples are shown in Fig. 4.

Subgroup analyses (Table 4)
In differentiating TAs from serrated lesions, diagnostic performance tended to be better for larger (≥ 2 cm) lesions than for smaller lesions for all readers, and this trend was particularly apparent in the AUC of CADx1. In subgroup analyses by morphology, the AUC of CADx1 tended to be higher for polypoid lesions, but this trend was not observed for accuracy. In differentiating SSAPs from MVHPs, the diagnostic performance of CADx2 was inferior for larger (≥ 2 cm) lesions, and this trend was statistically significant for accuracy (p = 0.008). In contrast, neither expert showed these trends. A statistical comparison of AUC by morphology was not possible because all 8 polypoid lesions were MVHPs, and comparison of accuracy showed no trend. Subgroup analysis by endoscope manufacturer is shown in Supplementary Table 4. The AUC of CADx2 for Fujifilm endoscopes was significantly higher than that for Olympus endoscopes (p = 0.001).

Diagnostic performance based on image type
The overall diagnostic performance of CADx1 for TA versus serrated lesions and of CADx2 for SSAP versus MVHP is summarized in Supplementary Table 5.

Discussion
This study demonstrates that the diagnostic ability of CADx1 to differentiate TAs (tubular adenomas with low-grade dysplasia) from serrated lesions (SSAPs + MVHPs) is equivalent to that of expert 1 and superior to that of expert 2. These results may be attributed to the collection of a relatively large number of endoscopic images of serrated lesions, including SSAPs, for deep learning. Recent AI techniques such as focal loss and over-sampling may have overcome the data imbalance, resulting in good diagnostic performance [20,21]. In subgroup analysis, the AUC of CADx1 increased with lesion size, reaching 94% [88-100] in the largest (> 10 mm) lesion group. The AUC was also as high as 94% [87-100] for polypoid lesions. CADx1 may therefore be practical for lesion classification in clinical use, particularly for large polypoid lesions. In contrast, the diagnostic ability of CADx2 to differentiate SSAPs from MVHPs was inferior to that of both experts, although the experts' own diagnostic ability was not high. In previous reports, expert endoscopists were unable to make accurate optical diagnoses of serrated lesion subtypes even with magnified NBI [23]. Because it is generally difficult for CADx to surpass human diagnostic ability with current technology, these poor results may not be unexpected. Developing a CADx system that accurately characterizes lesions, including the subtype of serrated lesions, is a technological challenge, and most previous studies did not include serrated lesions in either the training or testing datasets [11-15]. Inconsistent histological assessment may also have contributed to these unsatisfactory results; in a recent report, the overall interobserver agreement for the histological assessment of serrated lesions was moderate (κ = 0.44) [24].
Recent minor changes in the histological criteria for SSAP/SSL, under which some MVHPs are reclassified as SSLs, suggest that SSL is not a distinct disease entity. Recent evidence has established that MVHP is a precursor of SSL, suggesting that both lesions lie on the same disease spectrum. In clinical practice, a lesion > 1 cm in the proximal colon is an indication for polypectomy regardless of histology, whereas small (< 1 cm) serrated lesions other than SSLs are not good candidates for endoscopic resection. CADx2 achieved its highest accuracy (73%) in the smallest lesion group, approaching the diagnostic ability of the experts. Application of CADx2 might therefore be limited to small and polypoid lesions.
In this study, we did not target lesions located in the distal colon or rectum. The distal colon and rectum are generally sites where many MVHPs, a small number of traditional serrated adenomas (TSAs), and very few SSAPs develop, the SSAPs mostly showing atypical histological features. Typical MVHPs in the distal colon or rectum are known to have a diminutive, pale, hemispherical appearance, which makes them relatively easy to discriminate from the relatively large TSAs or SSAPs. In contrast, a variety of serrated lesions develop in the proximal colon, and SSAPs frequently arise there. This is the main reason the study was limited to proximal lesions. To evaluate the diagnostic performance of the CADx systems, we selected the AUC as the primary outcome because comparisons can be made regardless of the discrimination threshold. For each image, the CADx system generates a probability score expressing its prediction of lesion histology, and we set 0.5 as the discrimination threshold in this study. However, this threshold can be adjusted to optimize diagnostic performance. At a threshold of 0.5, the sensitivity of CADx2 was extremely low (17%) and its specificity high (85%), implying that this threshold was not optimal for CADx2. A lower threshold may classify serrated lesions more accurately.
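One common way to choose a non-default cutoff is to maximize Youden's J statistic (sensitivity + specificity - 1) over candidate thresholds. The study did not report using this criterion; the minimal Python sketch below only illustrates the idea, assuming lesion-level scores with SSAP as the positive class (label 1) and the observed scores as candidate thresholds.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    Candidate thresholds are the observed scores; a score >= threshold is
    called positive. labels are 1 (positive, e.g. SSAP) or 0.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict: ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j


# Toy data: positives score 0.3 and 0.4, so a 0.5 cutoff misses both.
threshold, j = youden_threshold([0.1, 0.2, 0.3, 0.4, 0.6], [0, 0, 1, 1, 0])
print(threshold, round(j, 3))  # 0.3 0.667
```

In this toy example the fixed 0.5 cutoff gives a sensitivity of 0, whereas the Youden threshold of 0.3 detects both positives at the cost of one false positive. To avoid an optimistic estimate, such a threshold should be selected on training or validation data and then applied unchanged to the testing dataset.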
This study has several limitations. First, it is a retrospective study, and a prospective clinical trial should be performed to confirm the performance and reliability of the CADx systems. Second, although the datasets were extracted based on predefined inclusion and exclusion criteria, image selection relied mainly on subjective assessment of image quality, which may have introduced selection bias. In addition, all lesions in this study were located in the proximal colon, which may limit the generalizability of these findings. Third, the training data were still images selected by endoscopists, which limited the ability of the deep neural networks to extract features for classification in the training phase. A future study should use digital videos, incorporating temporal images with sequence recognition. Fourth, all data came from a single medical center. The proposed method should be trained and tested using data from multiple institutions to improve its performance and reliability. The TRIPOD Statement [25] strongly recommends evaluating model performance on data from participants other than those used for model development (an external validation cohort). Validation in such an external cohort would be needed to confirm the robustness of CADx1.
In conclusion, we developed two CADx systems trained with non-magnified WLIs. CADx1 was comparable to experts in differentiating TAs from serrated lesions in the proximal colon. CADx2 failed to demonstrate sufficient diagnostic ability in discriminating SSAPs from MVHPs, which is the next issue to be addressed. A future study may use digital video images, incorporating temporal images with sequence recognition.