In this study framework, a diabetes care database was established by retrospectively collecting fundus photographic images of patients with Type II diabetes from the Department of Endocrinology and Metabolism of Kaohsiung Datong Hospital (KMTTH) and the Affiliated Hospital of Kaohsiung Medical University (KMUH). All images were acquired digitally, and all private information was removed in accordance with the local Patient Health Information (PHI) policy.
To unify these initial image data sets and perform preliminary quality control, the patients' fundus images were first re-examined to exclude over-exposed and under-exposed images of no clinical value. The remaining qualified images were further cleaned by removing text information and noisy background. The contrast-limited adaptive histogram equalization (CLAHE) algorithm was adopted as the pre-processing method to improve the contrast of the acquired images. The pre-processed images were then divided into training, validation, and test sets. To make the models statistically robust, we also rotated the data sets by positive and negative 45° to generate extra images. These augmented data sets, together with the originals, were fed into AI models such as VGGNet, ResNet, and Transformer, as well as deep learning architectures such as ConvMixer, to build predictive models. The RAdam (Rectified Adam) optimizer was used for model training to achieve a robust parameter update process. Indicators such as accuracy (ACC), area under the ROC curve (AUC), and positive predictive value (PPV) were applied to evaluate the robustness of each model. The research flow chart is shown in Fig. 1.
The research was coded and analyzed in Python (version 3.8.12; Python Software Foundation, Beaverton, OR, USA), using its image preprocessing and data visualization packages. TensorFlow (version 2.7.1; Google LLC, Mountain View, CA, USA), together with the Keras library, was applied as the framework for implementing the deep learning models.
Within this study, two source data sets were collected from the clinical diabetes care records generated by Kaohsiung Datong Hospital (KMTTH) and the Endocrinology and Metabolism Clinic of the Affiliated Hospital of Kaohsiung Medical University (KMUH). The collection time span was from 2013 to 2017. During this five-year period, for patients with Type II diabetes who had undergone routine clinical treatment and routine ophthalmoscopy, severity scores of peripheral neuropathy were generated and documented according to the nerve conduction velocity (NCV) standard. The severity of diabetic peripheral neuropathy (DPN) was classified after nerve conduction testing as follows: Class 0: patients without DPN symptoms; Class 1: patients with mild DPN symptoms; Class 2: patients with moderate DPN symptoms; Class 3: patients with severe DPN symptoms. Retinal fundus images of the four DPN severity levels are shown in Fig. 2.
The retinal fundus images of the selected clinical diabetic patients were collected by the two institutes. All images were taken with Canon CR-2 digital fundus cameras (Canon Medical Systems, Tustin, CA, USA) supplied by Colin Instruments Co., Ltd. The CR-2 model offers a high resolution of up to 18 megapixels per frame. The Canon Retinal Imaging Control Software (RICS) was used as the accompanying image control software to process and store patient fundus images in DICOM and JPEG formats for subsequent image processing.
Characteristics of the patients whose images were acquired under this protocol are shown in Table 1. A total of 751 patients were admitted: 317 cases from Kaohsiung Datong Hospital and 434 cases from the Affiliated Hospital of Kaohsiung Medical University. This study was approved by the Institutional Review Board (IRB) of the Affiliated Hospital of Kaohsiung Medical University. Images were collected and analyzed in accordance with relevant regulations and the IRB trial guidelines set by the hospitals (approval number KMUHIRB-E(I)-20190448). Informed consent was waived by the ethics committee.
Since the number of patients with severe peripheral neuropathy was significantly lower than in the other three categories, we decided to combine the moderate and severe cases into one category. Left uncorrected, this imbalance would introduce bias that could prevent the successful establishment of AI deep learning models. Therefore, data from patients with severe DPN symptoms (Raw_Class 3) were incorporated into the data of patients with moderate DPN symptoms (Raw_Class 2). The Class 2 definition in our study was thus redefined to represent patients with either moderate or severe DPN symptoms.
Table 1
Datasheet of patient characteristics

| State Features | | Class Total, n = 751 (100%) | Class 0, n = 275 (37%) | Class 1, n = 246 (33%) | Raw_Class 2, n = 176 (23%) | Raw_Class 3, n = 54 (7%) |
| --- | --- | --- | --- | --- | --- | --- |
| age | average value | 60.70 | 59.05 | 62.00 | 61.79 | 59.70 |
| | range | 21–93 | 21–85 | 27–93 | 32–90 | 31–81 |
| gender | male | 416 | 122 | 148 | 103 | 43 |
| | female | 335 | 153 | 98 | 73 | 11 |

Note: Class Total: all patients with Type II diabetes whose fundus images were included; Class 0: Type II diabetics without peripheral neuropathy; Class 1: Type II diabetics with mild peripheral neuropathy; Raw_Class 2: Type II diabetics with moderate peripheral neuropathy; Raw_Class 3: Type II diabetics with severe peripheral neuropathy; Class 2: the combined moderate and severe (M&S) group, n = 230 (30%), i.e., Raw_Class 2 plus Raw_Class 3.
In this study, the age characteristic was analyzed with a one-way analysis of variance (ANOVA); the P value of this feature across the DPN severity categories was calculated as 0.00866, below the 0.05 statistical significance threshold of the common two-tailed test. In addition, a chi-squared test was used to analyze the association between sex and DPN severity. The calculated chi-square statistic was 6.9, which exceeds 5.991, the critical value covering a right-tail probability of 0.05 under the chi-square distribution with two degrees of freedom. Both age and sex were therefore statistically significantly associated with DPN severity.
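The chi-square decision rule described above can be sketched in a few lines of NumPy. The function below computes the Pearson chi-square statistic for any observed contingency table (the tables in the asserts are illustrative, not the study's data), and the comparison against the df = 2 critical value 5.991 mirrors the test reported in the text.

```python
import numpy as np

def chi_square_stat(table):
    """Pearson chi-square statistic for an observed contingency table."""
    table = np.asarray(table, dtype=float)
    row = table.sum(axis=1, keepdims=True)   # row totals
    col = table.sum(axis=0, keepdims=True)   # column totals
    expected = row * col / table.sum()       # expected counts under independence
    return float(((table - expected) ** 2 / expected).sum())

# Decision rule from the text: with 2 degrees of freedom the 0.05
# critical value is 5.991, so the reported statistic of 6.9 is significant.
critical_value = 5.991
reported_stat = 6.9
significant = reported_stat > critical_value
```

A perfectly independent table yields a statistic of 0, and any statistic above the critical value rejects independence at the 0.05 level.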
Pre-screening was a preliminary selection process for the quality of the retinal fundus images in this study. After excluding incomplete retinal images from the databases, such as under-exposed images with extremely low exposure and over-exposed retinal images, the remaining 948 retinal fundus images were selected for analysis. The filtered image data sets are shown in Fig. 3. The images were then screened sequentially for de-identification and background removal so that only the retinal fundus details were preserved. Following the CLAHE approach suggested by Swati C. et al. [11], each of the RGB channels was targeted individually to sharpen subtle features in the image data sets. Finally, the new retinal fundus images with the three equally scaled, superimposed RGB channels were adopted as the model training data.
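The channel-wise enhancement idea can be illustrated with a simplified NumPy sketch. Full CLAHE additionally tiles the image and clips each tile's histogram before interpolating (OpenCV's `cv2.createCLAHE` is a common implementation); the sketch below shows only plain histogram equalization applied to each RGB channel independently, as an illustration of the per-channel principle rather than the exact algorithm used.

```python
import numpy as np

def equalize_channel(channel):
    """Global histogram equalization of one uint8 channel via its CDF.
    (CLAHE would do this per tile, with histogram clipping, and then
    bilinearly interpolate between tiles.)"""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                        # first non-zero CDF value
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0)
    return lut.clip(0, 255).astype(np.uint8)[channel]

def equalize_rgb(img):
    """Equalize each RGB channel independently, then restack them."""
    return np.dstack([equalize_channel(img[..., c]) for c in range(3)])
```

Applied to a low-contrast image, the equalized output spreads the pixel values across the full 0–255 range, which is the contrast improvement the pre-processing step aims for.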
In this study, the 948 retinal fundus images were augmented by adding ±45-degree rotations of the original image data sets. This operation tripled the original image count, so 2844 images were finally used in this study. The purpose of this operation was to diversify the original images and thereby solidify and strengthen the learning capability of our AI algorithms. The total image data set was further divided in a 7:2:1 ratio into training, validation, and test sets for establishing the AI models, so that the training process was optimized with sufficient, but not excessive, data points.
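The augmentation and splitting arithmetic can be outlined as follows. This is a minimal bookkeeping sketch with an assumed random seed; the rotations themselves would be applied with an image library (e.g. `scipy.ndimage.rotate`), which is only indicated here.

```python
import numpy as np

n_original = 948
angles = (0, +45, -45)                 # original plus two rotated copies
n_total = n_original * len(angles)     # 948 * 3 = 2844 images

# 7:2:1 split of the shuffled image indices into train/validation/test.
rng = np.random.default_rng(42)        # assumed seed, for reproducibility
indices = rng.permutation(n_total)
n_train = int(n_total * 0.7)
n_val = int(n_total * 0.2)
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]
```

Shuffling before slicing keeps the three subsets disjoint while preserving the 7:2:1 proportions over all 2844 images.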
The batch size has a certain impact on model training. According to the study by N. S. Keskar et al. [12], a smaller batch size is more likely to reduce model loss and can navigate toward a better search direction for a more generalized model. Under this assumption, a small batch size can help the AI model train more efficiently. For these reasons, and taking into account the limitations of the hardware utilized, we set the batch size of the models to 21. Our study used a single NVIDIA GeForce RTX 3080 graphics card (NVIDIA Corp., Santa Clara, CA, USA) with 10 GB of memory for computing.
Model parameters were optimized with the Rectified Adam (RAdam) optimizer, published by L. Liu et al. in 2019 [13], in place of the common Adam optimizer. Their study of deep learning optimizers [14] concluded that using a relatively low learning rate during a warm-up period is quite effective in reducing the variance caused by the adaptive learning rate in the early stage of training. Adjusting the learning rate through the warm-up period stabilizes the optimization gradient and achieves a better, more stable training process.
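The rectification idea can be sketched in NumPy. The update below follows the published RAdam rule: while the variance-rectification term ρ_t is small, the adaptive denominator is skipped, which acts as an automatic warm-up; once ρ_t exceeds 4, the rectified adaptive step is used. The hyperparameter values and the single-parameter toy problem are illustrative, not the study's actual training configuration.

```python
import numpy as np

def radam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999,
                   eps=1e-8, steps=200):
    """Minimize a scalar function with a single-parameter RAdam sketch."""
    x, m, v = float(x0), 0.0, 0.0
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # first moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second moment estimate
        m_hat = m / (1 - beta1 ** t)
        rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)
        if rho_t > 4:                            # variance is tractable
            v_hat = np.sqrt(v / (1 - beta2 ** t))
            r = np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                        / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
            x -= lr * r * m_hat / (v_hat + eps)
        else:                                    # un-adapted (warm-up) step
            x -= lr * m_hat
    return x

# Minimizing f(x) = x**2 (gradient 2x) from x0 = 3.0 drives x toward 0.
x_final = radam_minimize(lambda x: 2.0 * x, 3.0)
```

In practice the optimizer would be used via a library implementation (e.g. the `RectifiedAdam` class in TensorFlow Addons) rather than hand-coded.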
During the model training period, to slow the unwanted drift into overfitting, the image data generator provided by the Keras library was used. This generator was applied to each training batch of every model fitted in this study. Because the generator transforms the original images in a random manner, the overlap between the trained images in each iteration is greatly reduced, which largely prevents the model from learning from the same image across multiple training iterations. This application effectively lowers the chance of the model accidentally entering overfitting and losing efficiency in the machine learning process.
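The effect of such a generator can be imitated with a small NumPy sketch. This is a stand-in for Keras's `ImageDataGenerator`, not its actual implementation: each batch is drawn with fresh random transformations (here only a horizontal flip), so the network rarely sees an identical image twice across iterations.

```python
import numpy as np

def batch_generator(images, labels, batch_size=21, seed=None):
    """Yield endlessly sampled batches with random horizontal flips,
    imitating on-the-fly augmentation (Keras's ImageDataGenerator
    offers many more transformations)."""
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        idx = rng.choice(n, size=batch_size, replace=False)
        batch = images[idx].copy()
        flips = rng.random(batch_size) < 0.5     # per-image random flip
        batch[flips] = batch[flips, :, ::-1]     # mirror along width axis
        yield batch, labels[idx]
```

Because the transformation is redrawn for every batch, repeated epochs present varied versions of the same underlying images, which is the regularizing effect the text describes.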
The PT-Attention model was built on the Inception v3 model [15] as a pre-trained backbone, while "self-attention" layers [16] were added to the original model architecture as a final layer, which was retrained to build a robust predictive model for DPN analysis.
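A self-attention layer of the kind appended here computes scaled dot-product attention over its input features [16]. A minimal NumPy sketch, with assumed toy dimensions rather than the study's actual layer sizes:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a set of feature vectors.
    x: (n, d) input features; w_q, w_k, w_v: (d, d_k) projection weights."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v, weights
```

Each output vector is a convex combination of the value vectors, with weights determined by query-key similarity; the attention rows sum to 1 by construction.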
Our study utilized the ConvMixer architecture proposed by A. Trockman et al. in 2022 [17] to establish a DPN prediction model. This predictive model's architecture is ideologically similar to ViT (Vision Transformer) [18] and to the more basic MLP-Mixer (multilayer perceptron mixer) [19]. ConvMixer uses standard convolutional layers to achieve the image recognition capabilities of the Transformer architecture [16].
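The core of a ConvMixer block is a depthwise ("spatial mixing") convolution with a residual connection, followed by a pointwise 1×1 ("channel mixing") convolution, each followed by an activation [17]. A single-image NumPy sketch with toy sizes, omitting batch normalization for brevity:

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x ** 3)))

def depthwise_conv(x, kernel):
    """'Same'-padded depthwise convolution; x: (H, W, C), kernel: (k, k, C).
    Each channel is filtered by its own k x k kernel slice."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + k, j:j + k] * kernel).sum(axis=(0, 1))
    return out

def convmixer_block(x, k_depth, w_point):
    """x: (H, W, C); k_depth: (k, k, C) depthwise kernel; w_point: (C, C)."""
    h = gelu(depthwise_conv(x, k_depth)) + x   # spatial mixing + residual
    return gelu(h @ w_point)                   # 1x1 channel mixing
```

The depthwise step mixes information spatially within each channel, while the pointwise step mixes across channels, which is the convolutional analogue of the token/channel mixing split in MLP-Mixer.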
ResNet (Residual Network) was proposed in 2016 by Kaiming He et al. [20]. The residual architecture proposed in their study is built on "identity mapping" [21]. This architecture alleviates the vanishing or weakening gradient problem inside deep neural network structures [20].
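The identity-mapping idea can be shown in a few lines of NumPy (a conceptual sketch, not the actual ResNet layer stack): the shortcut adds the input unchanged to the learned residual F(x), so gradients can flow through the addition even when F contributes little.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is a two-layer transform and the
    shortcut is the identity; x: (n, d), w1 and w2: (d, d)."""
    f = relu(x @ w1) @ w2       # the learned residual F(x)
    return relu(f + x)          # identity shortcut: add input unchanged
```

With zero weights the residual F(x) vanishes and the block reduces to relu(x), i.e., the signal passes through essentially untouched, which is exactly why stacking many such blocks does not degrade the gradient.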
The VGG16 applied in this study consists of 13 convolutional layers and 3 fully connected end layers [22]. The theoretical foundation of VGG16 served as the basis for "transfer learning" [23]. In our research, VGG16 was applied to a model taking the retinal fundus images of diabetic patients as input, thereby successfully establishing a robust classification model from the prepared input data sets.
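Transfer learning of this kind keeps the pretrained convolutional layers frozen and trains only a new classification head on the extracted features. A minimal NumPy sketch, with a random frozen feature extractor standing in for the VGG16 backbone and synthetic labels (everything here is illustrative, not the study's data or model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_head(features, y, lr=0.5, steps=200):
    """Train a logistic-regression head on frozen features by gradient
    descent on the binary cross-entropy loss."""
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        p = sigmoid(features @ w)
        w -= lr * features.T @ (p - y) / len(y)   # gradient step on the head only
    return w

rng = np.random.default_rng(3)
x = rng.standard_normal((100, 10))                   # stand-in input images
w_frozen = rng.standard_normal((10, 6))              # frozen "backbone" weights
features = np.maximum(x @ w_frozen, 0.0)             # frozen feature extraction
y = (features[:, 0] > features[:, 1]).astype(float)  # synthetic binary labels
w_head = train_head(features, y)
accuracy = np.mean((sigmoid(features @ w_head) > 0.5) == y)
```

Only `w` changes during training while `w_frozen` stays fixed, mirroring how a new dense head is fitted on top of frozen VGG16 features.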