This study was approved by the Institutional Review Board of Chang Gung Memorial Hospital (IRB No. 202001328B0). Images of colon polyps were collected from the Lin-Kou Chang Gung Memorial Hospital database between 2016 and 2019. Polyps were detected by ordinary white-light endoscopy. We collected 257 images of HP, 423 images of SSA, and 60 images of TA under white light, and 238 images of HP, 284 images of SSA, and 71 images of TA under NBI. Two experienced gastrointestinal pathologists reviewed the pathology of the colon polyps, and experienced endoscopists classified all images and verified the classification results against pathology. Images with blurred surfaces and poor focal lengths were discarded. The resolution of the retained images was 150 × 150 pixels.

It is worth noting that our data collection does not aim to assemble a dataset that reflects the actual proportions of SSA, TA, and HP in clinical practice. Instead, it aims to allow the AI model to learn the distinguishing characteristics of SSA, TA, and HP so that it can classify them. We therefore intentionally collected the data using unequal sampling, a commonly used statistical method that increases the proportion of defects or target items in a sample and enables a statistical model to extract characteristics from the target group. In this study, unequal sampling increased the proportion of SSA images and enabled the AI model to extract features from them.

Our deep learning heuristic, built with TensorFlow (https://www.tensorflow.org/) and a convolutional neural network (CNN), consists of three parts: the data augmentation algorithm, the deep learning framework, and the CNN model. At the end of this section, we present our statistical validation methods and operational environment.
2.1 Image Preprocessing
To address the small sample size, we developed a procedure to increase the number of images. Data augmentation is an effective and commonly applied method for defect detection [8]. We adopted a similar idea to design our image preprocessing algorithm. The collected images first pass through the preprocessing algorithm (Algorithm 1), and the deep learning model is then built as described in Heuristic 1.
The image preprocessing algorithm
Indices
i = the ith deep learning dataset
k = the kth randomly divided subset in the k-fold cross-validation method.
Algorithm 1
Step 1: Process the images into the correct input format for TensorFlow.
Step 2: Augment the images with rotation. For example, rotating each image by 45°, 90°, and 180° produces three additional usable images per original. The images are also enhanced by image enhancement software (see Fig. 2).
Step 3: Collect and randomly divide the images into four equal-sized subsets within each type of polyp. Assign each subset an index number k. For example, in 4-fold cross-validation there are four subsets, k = 1, 2, 3, and 4.
Step 4: Construct the deep-learning datasets. Every dataset consists of a training, a validation, and a testing set and is named after the subsets generated in Step 3. In 4-fold cross-validation, there are 12 heterogeneous deep-learning datasets, that is, i = 1, ..., 12.
Step 5: Output the deep-learning datasets to Heuristic 1. End Algorithm 1.
Note that we named each deep learning dataset i based on the subset indices k. For example, if the training set consists of subsets 1 and 2, the validation set of subset 3, and the test set of subset 4, the deep learning dataset is named 1234.
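Step 2 of Algorithm 1 can be sketched with numpy. Note that this is only an illustration: np.rot90 is restricted to multiples of 90°, so the 45° rotation mentioned above would require an interpolating library such as Pillow; the function name is ours.

```python
import numpy as np

def augment_with_rotations(image: np.ndarray) -> list:
    """Return the original image plus its 90-, 180-, and 270-degree
    rotations (k = 0 is the unrotated original)."""
    return [np.rot90(image, k) for k in range(4)]

# Example: one dummy 150 x 150 RGB image yields four usable images.
dummy = np.zeros((150, 150, 3), dtype=np.uint8)
augmented = augment_with_rotations(dummy)
```

Rotation preserves the 150 × 150 resolution, so the augmented images need no further resizing before entering TensorFlow.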
2.2 Framework of TensorFlow
In this study, TensorFlow was run in the Anaconda environment using Jupyter Notebook and Python. We used a CNN model, Inception V4, which includes softmax, dropout, average pooling, inception, and reduction layers. The basic idea of an inception module is to combine multiple convolution layers, average pooling layers, and activation functions such as the rectified linear unit (ReLU). Dropout was used to prevent model overfitting, and softmax produces the final class probabilities. The convolution layers extract characteristics from the image. Weights were initialized with a small standard deviation, and a small value was added to the bias [8]. The activation functions generate a nonlinear combination of the convolution layers, which activates the neurons and avoids dead neurons. The pooling layers retain significant characteristics and help avoid overfitting [9]. Three parameters must be optimized during the deep learning process: the learning rate, the batch size, and the number of epochs. Detailed information on Inception V4 can be found in the study by Szegedy et al. [10]. We implemented Inception V4 as the convolutional neural network model because of its consistency and performance in our preliminary modeling experiments. The deep learning model-building procedure is summarized in Heuristic 1.
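The roles of these building blocks can be sketched in plain numpy. This is a toy illustration of the operations, not the Inception V4 implementation itself:

```python
import numpy as np

def relu(x):
    """ReLU activation: zeroes negative responses, making the
    combination of convolution outputs nonlinear."""
    return np.maximum(0.0, x)

def softmax(x):
    """Softmax turns raw class scores into probabilities summing to 1."""
    e = np.exp(x - np.max(x))  # shift by the max for numerical stability
    return e / e.sum()

def average_pool_2x2(x):
    """2x2 average pooling: summarizes each 2x2 patch at half the
    resolution, retaining the dominant characteristics of the map."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3))

feature_map = relu(np.array([[-1.0, 2.0], [3.0, -4.0]]))
probs = softmax(np.array([2.0, 1.0, 0.1]))  # e.g. scores for SSA, TA, HP
pooled = average_pool_2x2(np.ones((4, 4)))
```

In Inception V4 these operations appear many times inside the inception and reduction blocks; the sketch only shows what each operation does to a small array.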
The Heuristic of the deep learning model building
Heuristic 1
Step 1: Collect data and mark the true state of the images.
Step 2: Input data into Algorithm 1.
Step 3: Index the deep learning datasets from i = 1, ..., 12 in the 4-fold cross-validation method. For convenience, we label the subsets within each deep learning dataset as j = 1, 2, 3, 4. Note that j represents the order of the subsets within the deep learning dataset and is not equal to k.
Step 4: Initialize the heuristic by setting i = 1 and go to Step 5.
Step 5: Input deep learning dataset i. Go to step 6.
Step 6: Input subsets j = 1, 2, and 3 into the TensorFlow model. Subsets j = 1 and 2 are the training sets, and subset j = 3 is the validation set. Find the best parameters (learning rate, batch size, and epoch) for the deep learning model, and output the model to Step 7.
Step 7: Input subset j = 4 into the deep learning model built in Step 6 to test its accuracy. Record the testing results for deep learning dataset i.
Step 8: Collect the model testing results. If i = 12, stop and output all testing results from the deep learning models. Otherwise, set i = i + 1 and return to Step 5.
Step 9: Collect all the testing results for statistical analysis.
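The 12 heterogeneous deep-learning datasets of Step 3, together with the naming convention described under Algorithm 1, can be enumerated with the standard library. The function name and dictionary layout are ours, used only for illustration:

```python
from itertools import combinations, permutations

def enumerate_datasets(k_indices=(1, 2, 3, 4)):
    """List every assignment of the four subsets to roles: choosing 2 of
    the 4 subsets for training and ordering the remaining two as
    validation/test gives C(4,2) * 2 = 12 datasets."""
    datasets = []
    for train in combinations(k_indices, 2):
        rest = [k for k in k_indices if k not in train]
        for val, test in permutations(rest):
            name = "".join(map(str, train + (val, test)))
            datasets.append({"name": name, "train": train,
                             "val": val, "test": test})
    return datasets

datasets = enumerate_datasets()
# e.g. dataset "1234": train on subsets 1 and 2, validate on 3, test on 4
```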
In Fig. 3 below, we present our overall AI modeling framework.
2.3 Statistical Analysis
A confidence interval analysis was implemented to benchmark the consistency of the deep learning model. To highlight the classification power of our deep learning model, discriminability indicators such as sensitivity, specificity, and the area under the curve (AUC) were calculated. We also present a confusion matrix to summarize these indicators. All statistical analyses were conducted using Python 3.7.
The confusion matrix defines a classification as correct if the model identifies whether an image contains an HP or an adenoma (SSA or TA). That is, if our model classifies an SSA image as TA, or a TA image as SSA, it is still recorded as a true positive. In contrast, if an SSA or TA image is classified as HP, it is recorded as a false negative; conversely, an HP image classified as SSA or TA is recorded as a false positive (see Table 1).
Table 1

|                   | True: SSA/TA   | True: HP       |
| Predicted: SSA/TA | True Positive  | False Positive |
| Predicted: HP     | False Negative | True Negative  |
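The grouping in Table 1 collapses the three pathology labels into a binary decision. A minimal sketch of counting the four cells follows; the function name and label strings are illustrative:

```python
def confusion_counts(true_labels, predicted_labels):
    """Count TP/FP/FN/TN, treating SSA and TA jointly as the positive
    class and HP as the negative class, as in Table 1. An SSA image
    predicted as TA (or vice versa) still counts as a true positive."""
    positive = {"SSA", "TA"}
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(true_labels, predicted_labels):
        if t in positive and p in positive:
            counts["TP"] += 1
        elif t == "HP" and p in positive:
            counts["FP"] += 1
        elif t in positive and p == "HP":
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

# Four toy images: SSA predicted as TA still lands in the TP cell.
counts = confusion_counts(["SSA", "TA", "HP", "SSA"],
                          ["TA", "TA", "HP", "HP"])
```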
From the confusion matrix, sensitivity and specificity can be calculated as in equations (1) and (2), respectively. Sensitivity, specificity, and the area under the curve (AUC) serve as indicators of the model's discriminability.
$$Sensitivity=\frac{\text{True Positive}}{\text{True Positive}+\text{False Negative}} \tag{1}$$

$$Specificity=\frac{\text{True Negative}}{\text{True Negative}+\text{False Positive}} \tag{2}$$
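Equations (1) and (2) translate directly into code. The counts below are illustrative only, not results from the study:

```python
def sensitivity(tp, fn):
    """Eq. (1): true positives over all actually positive cases."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Eq. (2): true negatives over all actually negative cases."""
    return tn / (tn + fp)

# Illustrative counts: 90 of 100 adenomas and 80 of 100 HPs correct.
sens = sensitivity(tp=90, fn=10)
spec = specificity(tn=80, fp=20)
```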
As described above, we used the classical 4-fold method to check for overfitting in the deep learning model. The 4-fold method is a commonly adopted cross-validation method for deep learning; a 5-fold or, more generally, k-fold method could also be used, but we adopted the 4-fold method for its simplicity and efficiency. As described in Heuristic 1, the augmented data were randomly split into four equal-sized subsets: two subsets were used to train the model, one subset served as the validation set to tune the parameters, and one subset served as the test set to assess the accuracy of the final model. The results were recorded and assessed based on accuracy, sensitivity, specificity, and AUC.
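The random split into four equal-sized subsets can be sketched with numpy. Dropping the remainder to force equal sizes is our simplification, and the function name is ours:

```python
import numpy as np

def split_into_folds(n_images, n_folds=4, seed=0):
    """Shuffle image indices and split them into n_folds equal-sized
    subsets, one per index k as in Step 3 of Algorithm 1."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_images)
    usable = n_images - (n_images % n_folds)  # drop remainder for equal sizes
    return np.split(indices[:usable], n_folds)

folds = split_into_folds(n_images=100, n_folds=4)
```

In the study the split is performed within each polyp type, so each subset preserves the same mix of SSA, TA, and HP images.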
Since we aimed to design a method that healthcare institutes can use to build their own deep learning models, we designed it to be easy to build and execute. The execution environment is summarized in Table 2. As can be seen, the execution environment requirements are affordable for any healthcare institution, which is another advantage of the proposed method.
Table 2
The hardware and software environment of the deep learning model

|          | Item                  | Content                    |
| Hardware | GPU                   | Tesla P100                 |
|          | GPU RAM               | 16.0 GB                    |
| Software | Windows               | Windows 10 Pro             |
|          | Operating system type | x64 processor              |
|          | NVIDIA driver         | NVIDIA 441.22              |
|          | CUDA                  | CUDA 10.2                  |
|          | Language environment  | Anaconda3 Jupyter Notebook |
|          | Language              | Python 3.7                 |
|          | TensorFlow version    | TensorFlow-GPU 1.14        |