Nine hundred patients with a DM diagnosis and no known DR diagnosis who presented to the Akdeniz University Endocrinology and Metabolic Diseases Department were included in the study. The Ethics Committees of Akdeniz University and the Ministry of Health of the Republic of Türkiye approved the study.
The inclusion criteria for participation in the study were as follows:
- Diagnosed with DM and followed by the Endocrinology Department
- Older than 18 years of age
- No previous diagnosis of DR
- No history of intravitreal injection, laser photocoagulation, or DR-related surgery
- No history of intraocular surgery, including cataract surgery
- Signed the Informed Consent Form
- No media opacity that could affect the photographic appearance of the retina and optic disc
Posterior pole images were obtained from the patients using three non-mydriatic fundus cameras: the Canon CR2, Topcon NW400, and Optomed Aurora. For each eye, two images were obtained: one centered on the macula (Fig. 1) and one centered on the optic disc (Fig. 2).
Nurses took optic disc-centered and macula-centered fundus images with each of these cameras without dilation. The images were imported into the EyeCheckup client software and evaluated by the EyeCheckup AI software for the presence of DR. The EyeCheckup AI software detects pathological findings (hard exudates, microaneurysms, intraretinal hemorrhages, soft exudates, venous beading, neovascular vessels in the retina and optic disc, preretinal hemorrhage, and vitreous hemorrhage) in the patient's fundus images. Based on the detected findings, patients were graded as Mild Non-proliferative DR (NPDR), Moderate NPDR, Severe NPDR, or Proliferative DR (PDR), as recommended by the American Academy of Ophthalmology.8 Patients with a DR severity greater than mild NPDR were graded as "more than mild DR (mtmDR)", and patients with severe NPDR or PDR were graded as "vision-threatening DR (vtDR)".6 The mtmDR group represents patients who should be referred to an ophthalmologist within 6 months; the vtDR group includes those at risk of serious vision loss, who should therefore be referred as soon as possible (1–2 months). The presence of a hard exudate in the macula and/or a microaneurysm in the fovea or parafovea was considered suspicious for CSDME, and patients with suspected CSDME were also classified as vtDR.
The subjects then underwent dilation, and in addition to the existing two images, four wide-field fundus images (4W), one per quadrant, were taken using a Canon CR2 45-degree fundus camera to match the seven standard-field fundus photography of the ETDRS protocol. A recent clinical study showed substantial agreement between ETDRS 7-field (7F) and 4-widefield (4W) digital imaging in evaluating diabetic retinopathy severity, demonstrating that the two imaging protocols are interchangeable; both can be used for assessing ETDRS levels of DR, even in populations with minimal diabetic retinopathy.16
The four-quadrant wide-field images showing the periphery and the two previous non-mydriatic images were evaluated by retina specialists (MED, YA, MB), and a consensus was reached for each eye of each patient. These diagnoses were accepted as the ground truth for the definitive patient diagnosis. Patient-level classification was based on the more severe eye. The diagnoses produced by EyeCheckup and the ground truth established by the retina specialists were compared, and sensitivity and specificity were calculated for each camera and each DR severity level. Minimum success thresholds for clinical validation were set at 85% for sensitivity and 82% for specificity.
Labeling of the pathologic findings
For the detection of anomalies in fundus images, an object detection model was developed and trained on annotated data. Anomalies were labeled using "Doctor Says" labeling software, an open-source tool designed for bounding box labeling, classification, and segmentation (Fig. 3). This software uses a specialized JSON format for labeling and provides a user-friendly interface for annotators. The labeling process involved identifying specific features in the fundus images that distinguish normal regions from abnormal regions. These features included color, shape, and texture.
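The exact JSON schema used by "Doctor Says" is not reproduced here; the sketch below shows, with entirely hypothetical field names, what a bounding-box annotation record of this kind might look like:

```python
import json

# Hypothetical annotation record: the field names and classes are
# illustrative, not the actual "Doctor Says" schema.
annotation = {
    "image": "fundus_0001.jpg",
    "labels": [
        {"class": "microaneurysm", "bbox": [512, 340, 530, 358]},  # x1, y1, x2, y2
        {"class": "hard_exudate", "bbox": [700, 410, 760, 455]},
    ],
}

# Annotations of this shape serialize losslessly to JSON for storage.
serialized = json.dumps(annotation)
restored = json.loads(serialized)
classes = [label["class"] for label in restored["labels"]]
```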
A team of retina experts manually annotated the images using the "Doctor Says" software, and the accuracy of the labeling process was verified by cross-validation. The "Doctor Says" labeling software was developed using a combination of programming languages, including Java, JavaScript, HTML, and CSS.
It provides several unique features that set it apart from other labeling software. For example, it allows annotators to adjust the zoom level and brightness of the image to improve the accuracy of the labeling process. The labeled fundus images and corresponding annotations were then processed and converted to the TFRecord format, providing training and storage advantages. TFRecord enables faster data access and processing times, making it a preferred format for large datasets.
Preprocessing
Preprocessing is a crucial step in computer vision that involves manipulating, enhancing, and refining raw input data so that meaningful information can be extracted effectively. It comprises techniques designed to optimize images or videos for subsequent analysis, interpretation, and decision-making. In this context, several quality checks were conducted on the fundus images to ensure they were suitable for analysis.
In the preprocessing phase of the study, several quality checks were employed to ensure that the fundus images were suitable for analysis. Firstly, a chromaticity test was implemented to verify whether the images were in color, discarding those that did not meet this criterion. Secondly, a size threshold check was conducted, retaining only images that met or exceeded 1024x1024 pixels. A novel dynamic cropping algorithm, which considered the retinal borders and automatically computed an appropriate crop offset, was developed to rectify any irregularity in image dimensions. Only images that maintained a post-crop size greater than or equal to 1024x1024 pixels were deemed valid for further processing.
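The quality checks described above can be sketched as follows; the threshold values and the simple border-based crop are assumptions for illustration, not the study's actual dynamic cropping algorithm:

```python
import numpy as np

def is_color(img: np.ndarray) -> bool:
    """Chromaticity check: a grayscale image stored as RGB has identical channels."""
    if img.ndim != 3 or img.shape[2] != 3:
        return False
    return not (np.array_equal(img[..., 0], img[..., 1])
                and np.array_equal(img[..., 1], img[..., 2]))

def crop_to_retina(img: np.ndarray, threshold: int = 10) -> np.ndarray:
    """Crop away the dark background surrounding the circular retinal area.
    The brightness threshold is an assumed value for this sketch."""
    mask = img.max(axis=2) > threshold        # non-background pixels
    rows, cols = np.any(mask, axis=1), np.any(mask, axis=0)
    r0, r1 = rows.argmax(), len(rows) - rows[::-1].argmax()
    c0, c1 = cols.argmax(), len(cols) - cols[::-1].argmax()
    return img[r0:r1, c0:c1]

def passes_size_check(img: np.ndarray, min_side: int = 1024) -> bool:
    """Retain only images at least min_side x min_side after cropping."""
    return img.shape[0] >= min_side and img.shape[1] >= min_side
```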
Additionally, an optic disc/fovea (ODF) detection model was employed to identify the ODF in each image. This model also provided vital contextual information regarding image orientation, i.e., whether the image pertained to the right or left eye. Potential artifacts that could adversely impact model performance, such as partial blurring or darkening, eyelashes in the frame, and blurring due to incorrect focus, were also addressed. A quality scoring model was designed and trained to evaluate each image's suitability for the study, and only images that met a minimum quality score were incorporated into the training and testing datasets.
EyeCheckup Artificial Intelligence Training Process
TensorFlow Object Detection is a popular framework for building and deploying computer vision models that identify and locate objects within images and videos. The framework provides a set of pre-trained models and tools that make it easy to train custom models for specific use cases. Object detection is critical in computer vision, with numerous applications in healthcare, transportation, and security. Approximately 350,000 fundus photographs were collected and screened by the quality evaluation model, and ophthalmologists annotated the DR-related anomalies using only the photographs that passed this screening. The labeled photographs were then converted to the format required by the models, and the models were trained with the appropriate architectures. During training, architectures were varied and parameters optimized according to model performance, and the most successful model was selected based on the training scores.
TensorFlow Object Detection models are based on deep learning techniques that use convolutional neural networks (CNNs) to extract features from input images or videos. These models typically consist of two stages: a region proposal stage that generates candidate regions of interest and a classification stage that predicts the class of the objects within these regions. TensorFlow Object Detection models can be categorized into two types: single-stage detectors and two-stage detectors. Single-stage detectors are faster but less accurate than two-stage detectors, which require more computational resources. Recent advancements in object detection research have led to the development of novel single-stage detector models that eliminate the need for anchor boxes and predefined bounding boxes to generate candidate regions of interest. These models use a keypoint-based approach to predict the location and size of objects directly, making them simpler and more efficient than traditional object detection models.
EyeCheckup Artificial Intelligence Test Process
To comprehensively evaluate the effectiveness of the proposed method, the following metrics were adopted: Sensitivity, Specificity, and average precision.
True positives (TP) are positive samples correctly identified as positive; true negatives (TN) are negative samples correctly identified as negative; false positives (FP) are negative samples misidentified as positive; and false negatives (FN) are positive samples misidentified as negative. The Intersection-over-Union (IoU) reflects the degree of overlap between the model's detection box and the original ground-truth (GT) box.
A predicted bounding box is considered a True Positive (TP) if it meets the following criteria: it has a confidence score higher than the specified threshold, it belongs to the same class as the ground truth (GT) bounding box, and it has an Intersection over Union (IoU) value greater than or equal to the specified threshold with the corresponding GT bounding box. Similarly, a predicted bounding box is considered a True Negative (TN) if it does not overlap with any GT bounding box of the same class with an IoU value greater than the specified threshold and has a confidence score lower than the specified threshold.
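A minimal sketch of the IoU computation and the TP criteria described above (the threshold values are illustrative defaults, not the study's operating points):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def is_true_positive(pred_cls, gt_cls, score, iou_value,
                     score_thr=0.5, iou_thr=0.5):
    """TP criteria from the text: confidence, class match, and IoU all pass."""
    return score >= score_thr and pred_cls == gt_cls and iou_value >= iou_thr
```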
In this study, the primary objective was to determine the patient's DR level. The predicted anomalies were converted to a DR level using the DR Disease Severity Scale recommended by the American Academy of Ophthalmology, where DR0, DR1, DR2, DR3, and DR4 generated by the software refer to No Apparent Retinopathy, Mild NPDR, Moderate NPDR, Severe NPDR, and PDR, respectively, in the AAO PPP.
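As an illustration only, a simplified mapping from detected findings to a DR0–DR4 grade might look like the following; the exact grading rules of the EyeCheckup software follow the AAO scale and are not reproduced here, so the finding-to-grade thresholds below are assumptions:

```python
# Illustrative severity scale; the mapping rules are simplified assumptions,
# not the actual EyeCheckup or AAO PPP grading criteria.
SEVERITY_NAMES = {0: "No Apparent Retinopathy", 1: "Mild NPDR",
                  2: "Moderate NPDR", 3: "Severe NPDR", 4: "PDR"}

def grade_eye(findings: set) -> int:
    """Map a set of detected findings in one eye to a DR0-DR4 level."""
    if {"neovascularization", "preretinal_hemorrhage", "vitreous_hemorrhage"} & findings:
        return 4                      # proliferative disease
    if "venous_beading" in findings:
        return 3                      # severe NPDR marker (simplified)
    if {"intraretinal_hemorrhage", "hard_exudate", "soft_exudate"} & findings:
        return 2
    if "microaneurysm" in findings:
        return 1
    return 0

def grade_patient(left: set, right: set) -> int:
    """Patient-level grade takes the more severe eye, as in the study."""
    return max(grade_eye(left), grade_eye(right))
```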
Sensitivity (true positive rate) is the probability of a positive test, conditioned on truly being positive; it is calculated as sensitivity = TP/(TP + FN). Specificity (true negative rate) is the probability of a negative test, conditioned on truly being negative; it is calculated as specificity = TN/(TN + FP).
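These definitions, together with the study's minimum validation thresholds (85% sensitivity, 82% specificity), can be expressed directly:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

def meets_validation_thresholds(se: float, sp: float) -> bool:
    """The study's minimum clinical validation thresholds."""
    return se >= 0.85 and sp >= 0.82
```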
Statistical Analysis
Thirty-five of the 900 patients who participated in the study (3.89%) were excluded based on the exclusion criteria; fundus photographs from the remaining 865 patients (96.11%) were included. The participants were almost equally distributed between males (50.89%) and females (49.11%). Most had Type 2 diabetes (98.44%), with only a small proportion having Type 1 diabetes (1.56%). On average, participants had been living with diabetes for almost ten years (9.78 years), and their average age was 58 years. The average weight was 81.52 kg and the average height 165.28 cm, yielding an average BMI of 29.87. Understanding the demographic characteristics of the study population is crucial for interpreting the results and generalizing them to the larger population.
Study Demographics
Statistical Results
This study aimed to evaluate the diagnostic accuracy of three cameras (Optomed Aurora, Canon CR2 AF, and Topcon NW400) in detecting three DR referral categories: mtmDR, vtDR, and CSDME. The Optomed Aurora camera was used for 875 individuals, the Canon CR2 AF camera for 704 individuals, and the Topcon NW400 camera for 585 individuals. The sensitivity and specificity values for each camera and disease category were calculated based on the presence or absence of disease as determined by the reference standard and are presented in the following section. These results provide valuable insights into the diagnostic performance of these cameras and their potential use in clinical settings for screening and diagnosis of ocular diseases.
Two-sided 95% confidence intervals were calculated using the Clopper-Pearson exact binomial method in RStudio via the binomial test function.
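The same exact interval can be reproduced outside R; the sketch below uses the standard beta-quantile form of the Clopper-Pearson interval (shown here in Python with SciPy as an illustration, not the study's actual analysis code):

```python
from scipy.stats import beta

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Two-sided exact (Clopper-Pearson) binomial confidence interval
    for k successes out of n trials, via beta distribution quantiles."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi
```

For example, a sensitivity of 85/100 yields an exact 95% interval that always contains the point estimate 0.85.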