AI has brought about a revolutionary transformation in research, optimizing and streamlining processes. The abundance of patient information stored in electronic medical records (EMRs) presents a challenge in converting this data into actionable knowledge that can benefit clinicians, researchers, and healthcare systems in providing precise, personalized, and efficient care. Fortunately, AI has emerged as a solution, enabling the development of computer-generated algorithms and automated systems that expedite data collection [35].
The field of ophthalmology, with its access to ophthalmic imaging and objective markers, is particularly well-suited for leveraging large datasets. These datasets not only facilitate the identification of correlations but also enable multicenter studies, support multimodal analyses, unveil novel imaging patterns, and enhance the statistical power of research. Often referred to as "big data," this extensive collection of medical information serves as an ideal foundation for the application of AI, machine learning (ML), and deep learning (DL) algorithms on an unprecedented scale [36].
In this study, we introduce a transfer-learning approach that extracts fundus image features with a pre-trained DenseNet-121 model. The extracted features are fed into a stack of dense layers, with a softmax activation function at the final layer, enabling multiclass classification of retinal images into three distinct classes: Healthy Eye, Glaucoma, and Diabetic Retinopathy.
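A minimal sketch of this pipeline in Keras could look as follows; the dense-layer sizes, dropout rate, and input resolution here are illustrative assumptions, not the exact configuration used in this study:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(num_classes=3, weights="imagenet"):
    """DenseNet-121 feature extractor with a dense classification head.

    Layer sizes and dropout are illustrative, not the exact
    configuration used in the study.
    """
    base = tf.keras.applications.DenseNet121(
        include_top=False, weights=weights,
        input_shape=(224, 224, 3), pooling="avg")
    base.trainable = False  # transfer learning: freeze pre-trained features
    model = models.Sequential([
        base,
        layers.Dense(256, activation="relu"),             # dense stack on top
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),  # 3-way output
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Freezing the backbone keeps the ImageNet-learned filters fixed, so only the small dense head is trained on the retinal images.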
State of the art
In this section, we present an overview of current advancements, techniques, methodologies, and research findings relevant to our study. This review provides a comprehensive snapshot of the state of the art in this research area.
In 2018, Lam et al. developed Convolutional Neural Network (CNN) models for the purpose of recognizing and staging diabetic retinopathy. Their study involved the utilization of two fundoscopic image datasets: the Kaggle Diabetic Retinopathy Images dataset and the Messidor-1 dataset [37].
In order to enhance the learning of challenging features, the researchers transitioned to a smaller but more suitable dataset. To achieve this, they supplemented the Messidor dataset with a Kaggle partition called "MildDR," which comprises 550 images. By combining these datasets, they created a new dataset of 1,200 retinal images classified into four distinct classes [37].
During their research, the authors made an intriguing discovery regarding the preprocessing technique known as contrast limited adaptive histogram equalization (CLAHE) in the context of a three-class classifier. They found that applying CLAHE increased the sensitivity in the mild class from 0 to 29.4%, while the impact on the remaining two classes was negligible [37].
For their binary model, the researchers employed three popular CNN architectures: AlexNet, VGG16, and GoogLeNet. Among these models, GoogLeNet demonstrated the highest sensitivity of 95% and specificity of 96%. However, when training 3-ary and 4-ary classifiers using the GoogLeNet model on the Kaggle dataset, the researchers were unable to achieve significant sensitivity levels for the mild class. Interestingly, their findings aligned with our own research, as we also encountered challenges related to the misclassification of mild disease as normal. This was primarily attributed to CNN’s limited ability to detect subtle disease features [37].
In a study conducted by Li et al. in 2019, the GoogLeNet model was utilized for the task of diabetic retinopathy grading. They collected a dataset consisting of 13,676 retinal images and evaluated the performance of various models. Interestingly, the ResNet-18 model exhibited superior performance compared to other models, achieving an average accuracy of 76.59% on the test set. This surpassed the average accuracy of 74.98% achieved by the GoogLeNet model [38].
ResNet, short for Residual Neural Network, is a deep learning architecture that was introduced by Kaiming He et al. in 2015 and won the ImageNet 2015 competition. Its primary goal was to address the challenge of training very deep neural networks, which had previously been difficult due to the problem of vanishing gradients. The key innovation of ResNet lies in its use of residual blocks, which employ shortcut connections or skip connections within the network [39].
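The skip connection can be illustrated with a hypothetical Keras residual block; filter counts and layer ordering are simplified relative to the published ResNet architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    # Main path: two 3x3 convolutions with batch normalization
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Shortcut: identity if channel counts match, else a 1x1 projection
    if x.shape[-1] != filters:
        x = layers.Conv2D(filters, 1, padding="same")(x)
    out = layers.Add()([x, y])  # the skip connection: output = F(x) + x
    return layers.ReLU()(out)
```

Because gradients flow through the identity shortcut unchanged, stacking many such blocks avoids the vanishing-gradient problem that limited earlier very deep networks.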
ResNets have been widely utilized for the classification task of DR. Zhang et al. (2019) conducted a study using ResNet on 3,833 macula-centered retinal fundus images. The original dataset was larger, comprising 13,767 images from 1,872 patients and including fluorescence contrast images and postoperative fundus images; the authors excluded these, removing 9,934 images from 1,229 patients [40].
In their study, Zhang et al. compared the performance of various deep learning models, including ResNet50, DenseNets, InceptionV3, InceptionV2, and Xception, for DR classification. They reported classification accuracies of 94.94%, 93.34%, 94.55%, 94.16%, and 92.09%, respectively, for these models. Among them, Xception exhibited the lowest classification accuracy, while ResNet achieved the highest accuracy. Interestingly, in the identification task, Xception achieved the highest accuracy, while ResNet had the lowest accuracy in this study [40].
Gulshan et al. conducted a study where they employed a pre-trained InceptionV3 model to classify diabetic retinopathy (DR) using a dataset of 128,175 images. The images were retrospectively collected from EyePACS in the US and three eye hospitals in India. The study also included two clinical validation sets: EyePACS-1 and Messidor-2, containing 9,963 and 1,748 images, respectively [41].
For the high-specificity operating point, the algorithm achieved a sensitivity of 90.3% and 87.0% and a specificity of 98.1% and 98.5% in EyePACS-1 and Messidor-2, respectively. Referable diabetic retinopathy was defined as moderate or worse diabetic retinopathy or referable macular edema, as determined by a panel of at least seven US board-certified ophthalmologists. At the high-sensitivity operating point, the algorithm demonstrated a sensitivity of 97.5% and 96.1% and a specificity of 93.4% and 93.9% in the two validation sets, respectively [41].
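Selecting high-sensitivity and high-specificity operating points from an ROC curve can be sketched as follows; the function name and the sensitivity/specificity floors are illustrative, not those used by Gulshan et al.:

```python
import numpy as np
from sklearn.metrics import roc_curve

def operating_points(y_true, scores, min_sens=0.95, min_spec=0.98):
    # roc_curve returns thresholds in decreasing order; tpr is non-decreasing
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    specificity = 1 - fpr
    # High-sensitivity point: highest threshold whose sensitivity meets
    # the floor (the most specific among adequately sensitive points)
    hi_sens = thresholds[np.argmax(tpr >= min_sens)]
    # High-specificity point: lowest threshold that still meets the
    # specificity floor (the most sensitive among adequately specific points)
    ok = np.where(specificity >= min_spec)[0]
    hi_spec = thresholds[ok[-1]] if ok.size else thresholds[0]
    return hi_sens, hi_spec
```

A screening deployment would favor the high-sensitivity threshold (few missed cases), while a confirmatory setting would favor the high-specificity one (few false referrals).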
The development of new technologies has increasingly shown their relevance in the field of glaucoma diagnosis and treatment. In particular, ML techniques have emerged as essential tools for achieving significant research outcomes. Creating a CNN model for glaucoma detection using retina fundus images offers numerous advantages, including early detection, objectivity, scalability, and enhanced diagnostic accuracy [42].
Li et al. conducted a study in which they developed a deep learning system specifically designed for classifying glaucomatous optic neuropathy using color fundus photographs. The research team recruited 21 trained ophthalmologists who could classify the photographs. A total of 48,116 fundus images were included in the study, out of which 39,745 images met the criteria for training and validation [43].
The developed deep convolutional neural network consisted of 22 layers. When evaluated on the validation dataset, the system achieved an impressive accuracy of 92.9% in classifying referable glaucomatous optic neuropathy. Additionally, it demonstrated a sensitivity of 95.6% and a specificity of 92.0%, indicating its ability to effectively identify cases of glaucomatous optic neuropathy [43].
Fu et al. proposed a novel approach called Disc-aware Ensemble Network (DENet) for automated glaucoma screening. The DENet model was designed to consider both global and local image levels to enhance its screening performance [44].
At the global level, the DENet model consisted of two streams. The first stream utilized a standard classification network based on ResNet. The second stream employed a segmentation-guided network adapted from the U-shape convolutional network. This segmentation-guided network helped in identifying and localizing important regions within the image. The DENet model achieved promising results with an accuracy of 84.29%, a specificity of 83.80%, and a sensitivity of 84.78% on SCES Dataset [44].
Raghavendra et al. conducted a study to develop a glaucoma detection method utilizing an 18-layer CNN architecture. The architecture included convolutional and max-pool layers, followed by a classification layer employing the logarithmic soft-max activation function. The method was evaluated using a dataset of 1,426 fundus images, comprising 589 normal images and 837 glaucoma images. Remarkably, the proposed method achieved the highest accuracy of 98.13% in classifying the fundus images into normal and glaucoma categories [45].
In 2018, Christopher et al. conducted a study in which they recruited participants from two longitudinal studies focused on assessing optic nerve structure and visual function in glaucoma: the African Descent and Glaucoma Evaluation Study (ADAGES) and the University of California, San Diego (UCSD) Diagnostic Innovations in Glaucoma Study (DIGS). The datasets comprised stereo fundus images obtained from these studies. The dataset included 14,822 individual optic nerve head (ONH) images from 4,363 eyes of 2,329 individuals [46].
Three deep learning architectures, VGG16, Inception, and ResNet50, were evaluated using both native and transfer learning approaches. The models were trained, validated, and tested using a 10-fold cross-validation setup. The best performing model was determined based on the area under the receiver operating characteristic curve (AUC) and was found to be the transfer learning ResNet50 model, achieving an AUC of 0.91 [46].
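A 10-fold cross-validated AUC evaluation of this kind can be sketched with scikit-learn; here a logistic regression stands in for the CNNs evaluated in the study, purely to keep the example self-contained:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def cross_validated_auc(X, y, n_splits=10, seed=0):
    # Stratified folds preserve the class balance in every split
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in skf.split(X, y):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
    return float(np.mean(aucs))  # mean AUC over the 10 held-out folds
```

Averaging the AUC over held-out folds, rather than reporting a single split, is what makes the 0.91 figure above robust to how the data happened to be partitioned.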
The model created in this study classifies fundus images into three classes, Diabetic Retinopathy, Glaucoma, and Healthy Eye, with an average accuracy of 84.78%, a precision of 84.75%, and a recall of 84.76%. When the model was instead trained on a dataset with the mild DR cases omitted, these metrics increased significantly, to an accuracy of 97.97%, a precision of 97.97%, and a recall of 97.96%.
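The averaged precision and recall reported here correspond to macro averaging over the three classes, which can be computed with scikit-learn:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

def summarize(y_true, y_pred):
    # Macro averaging weights each class equally, regardless of its size
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
    }
```

Macro averaging is the appropriate choice here because it prevents the largest class from dominating the reported precision and recall.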