A literature search was performed up to August 31st, 2021 using the following online databases: PubMed, Embase, Cochrane Library, and Google Scholar. Article screening was performed by the senior author (RD). The inclusion criteria were as follows: (1) deep learning, and (2) ultra-widefield imaging. The exclusion criteria were as follows: (1) articles published in any language other than English, (2) articles not peer-reviewed (usually preprints), (3) no full-text availability, and (4) articles using machine learning algorithms other than deep learning. No study design was excluded from consideration. The detailed search string was as follows: ("deep learning" OR "artificial intelligence" OR "machine learning") AND ("Ultra-Widefield" OR "UWF" OR "UWFI" OR "Optos").
A total of 36 studies were included. A full listing of included studies, authors, and their respective digital object identifiers (DOIs) is provided in Table 1. A full listing of included studies and their respective architectures, datasets, and experimental results is provided in Table 2. A chart detailing the number of included publications by year is shown in Figure 2. A map highlighting the number of publications by country is provided in Figure 3.
Disease Detection and Classification
Disease detection and classification have been the most thoroughly investigated uses for UWF imaging with DL. Specifically, DL has been used for disease detection and classification of diabetic retinopathy (DR), retinal detachment (RD), glaucoma, age-related macular degeneration (AMD), retinitis pigmentosa (RP), pachychoroid, retinal vein occlusion (RVO), idiopathic macular hole (IMH), retinal hemorrhage, and sickle cell retinopathy (SCR).
At the time of writing, six published peer-reviewed articles have explored DL with UWF imaging in DR patients.[32, 37–42] Five have specifically used UWFIs for the detection and classification of diabetic retinopathy.[32, 38–40, 42]
Wang et al. first used UWFIs to train a DL model for the detection of referrable diabetic retinopathy in 2018. A total of 754 UWFIs were graded by ophthalmologists, of which 643 were gradable and input into the algorithm. The study set a threshold of moderate non-proliferative diabetic retinopathy (NPDR) or higher (i.e. level 2 or higher on the International Clinical Diabetic Retinopathy scale) as sufficient to warrant a referral to an ophthalmologist. The study used the proprietary, closed-source EyeArt algorithm to automatically detect and quantify DR lesions, such as hemorrhages, microaneurysms, lipid exudates, and cotton wool spots. The EyeArt algorithm, designed for standard flash colour images, was applied here to UWFIs.
The algorithm found that 21.22% of the images contained referral-warranted DR, while the graders determined that 30.77% did. When both eyes were used to grade DR, the algorithm achieved a 91.7% sensitivity, 50.0% specificity, and 0.873 AUROC. When individual eyes were used, the algorithm achieved a 90.3% sensitivity, 53.6% specificity, and 0.851 AUROC. While the authors were able to achieve high sensitivity, the low specificity indicates a high number of false positives using the EyeArt algorithm. While the results were promising, a full understanding of the ML methods cannot be obtained, as the algorithm is closed-source. As EyeArt was designed for flash colour images, algorithms designed and trained directly on UWFIs would be expected to be more effective.
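As a point of reference for the performance figures cited throughout this review, sensitivity, specificity, and AUROC can be computed from raw model outputs with standard tooling; a minimal sketch using scikit-learn on synthetic labels and scores (not data from any of the included studies):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Synthetic ground-truth labels (1 = referral-warranted DR) and model scores.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(y_true * 0.6 + rng.normal(0.3, 0.25, size=200), 0, 1)
y_pred = (y_score >= 0.5).astype(int)  # decision threshold chosen for illustration

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)            # true-positive rate (recall)
specificity = tn / (tn + fp)            # true-negative rate
auroc = roc_auc_score(y_true, y_score)  # threshold-independent ranking quality

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} AUROC={auroc:.3f}")
```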
Nagasawa et al. published a study in 2019 which used DL for detecting treatment-naïve proliferative diabetic retinopathy (PDR) from UWFIs. The authors trained a deep convolutional neural network (DCNN) on PDR and non-PDR images.
The authors used the VGG-16 DCNN, which automatically learns the local features of an image and generates a classification model. They trained 40 deep learning models over 40 learning cycles and chose the model with the highest accuracy on the test data as the DL model for the study. The selected DCNN achieved 94.7% sensitivity, 97.2% specificity, and 0.969 AUROC. Gradient-weighted class activation mapping (Grad-CAM) was utilized to visualize the image features used by the DCNN to classify images as containing referrable PDR.
The authors specifically used treatment-naïve PDR, which may have improved their results relative to Wang et al. Nonetheless, they achieved both high sensitivity and high specificity, indicating that DCNN approaches trained on UWFIs may be superior to applying algorithms designed for colour fundus images (i.e. the EyeArt algorithm) to UWFIs for DR detection.
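Several studies in this section fine-tune an ImageNet-pre-trained VGG-16 for binary fundus classification. A minimal PyTorch sketch of that general recipe follows; the batch contents, class count, and hyperparameters are illustrative assumptions, not the published training setup:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load VGG-16 with ImageNet weights and replace the final classifier layer
# with a 2-class head (e.g., PDR vs. non-PDR).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# One illustrative training step on a dummy batch (3-channel 224x224 input).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```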
Using UWF-FA from PDR patients, Bawany et al. utilized DL in 2020 to correlate automated vessel density measurements with visual acuity. While not focusing on the detection of DR generally, the goal of the study was to take a DL-quantified measure (retinal vessel density) and determine whether it correlated with an outcome (visual acuity) known to be affected by PDR. Retinal blood vessels were first detected using a deep neural network (DNN) organized in the U-Net architecture. The study authors trained the model on UWF-FA images with corresponding ground-truth vessel maps generated using a human-in-the-loop procedure first demonstrated by Ding et al. The output of the DNN was a vessel map in which pixel intensity indicated the likelihood of a pixel being a vessel. The trained DNN achieved a 0.930 AUPRC.
Vessel density was measured by calculating the percentage of vessel pixels in a circular area centered on the fovea. To study the correlation between vessel density and best corrected visual acuity (BCVA), UWF-FAs were analyzed using the trained model. The study found a statistically significant positive correlation between vessel density and BCVA (r = 0.4071, p = 0.0075), but no statistically significant correlation between vessel density and central retinal thickness (CRT).
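Vessel density as defined here, the percentage of vessel pixels inside a circle centered on the fovea, is straightforward to compute from a binary vessel map; a NumPy sketch in which the fovea location and ROI radius are hypothetical:

```python
import numpy as np

def vessel_density(vessel_map: np.ndarray, center: tuple, radius: float) -> float:
    """Percentage of vessel pixels inside a circular ROI.

    vessel_map: binary array where 1 marks a vessel pixel.
    center: (row, col) of the fovea.
    radius: ROI radius in pixels.
    """
    rows, cols = np.ogrid[:vessel_map.shape[0], :vessel_map.shape[1]]
    roi = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2
    return 100.0 * vessel_map[roi].mean()

# Example on a synthetic 1000x1000 map with an assumed fovea at (500, 500).
vm = (np.random.default_rng(1).random((1000, 1000)) > 0.9).astype(np.uint8)
print(f"vessel density: {vessel_density(vm, (500, 500), 300):.2f}%")
```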
Tang et al. published a study in 2021 which used DL to detect vision-threatening DR (VTDR) and referrable DR (RDR) from UWFIs. Using UWFIs, they trained three CNNs to develop a pipeline for disease detection: the first classified images as gradable or ungradable, the second detected VTDR, and the third detected RDR. The study used transfer learning, applying ResNet50 models pre-trained on ImageNet. Finally, the authors generated Class Activation Mapping (CAM) heatmaps for each result type (true positive, true negative, false positive, and false negative) to assess DL performance.
The first CNN, which determined gradeability, achieved an 86.5% sensitivity, 82.1% specificity, and 0.923 AUROC on the primary dataset. The RDR detection CNN achieved a 0.981 AUROC, 94.9% sensitivity, and 95.1% specificity. The VTDR CNN achieved a 0.966 AUROC, 87.2% sensitivity, and 95.8% specificity. On four external datasets, the gradeability CNN achieved AUROCs above 0.82, sensitivities above 79.6%, and specificities above 70.4%, while the RDR and VTDR CNNs achieved AUROCs above 0.9 and accuracies above 80%.
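The CAM heatmaps used by Tang et al. weight the final convolutional feature maps by the classifier weights of the predicted class. A minimal sketch for a torchvision ResNet50 follows; it uses ImageNet weights and a random input as stand-ins, not the study's fine-tuned model:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

feats = {}
# Capture the output of the last convolutional stage (before global pooling).
model.layer4.register_forward_hook(lambda m, i, o: feats.update(x=o))

img = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed UWFI
with torch.no_grad():
    logits = model(img)
cls = logits.argmax(dim=1).item()

# CAM: weight the 2048 feature maps by the fc weights of the predicted class.
fmap = feats["x"][0]                 # (2048, 7, 7)
weights = model.fc.weight[cls]       # (2048,)
cam = F.relu(torch.einsum("c,chw->hw", weights, fmap))
cam = F.interpolate(cam[None, None], size=(224, 224), mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```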
Oh et al. published a study investigating the early detection of DR using DL and UWFIs. They compared the ability of CNNs to classify ETDRS 7SF images versus optic disk- and macula-centered ETDRS F1-F2 images as containing diabetic retinopathy. They first trained a U-Net model with ResNet-18 for optic disk detection on the publicly available REFUGE dataset of colour fundus images, and then used size and distance thresholds to determine macula locations. UWFIs were then input into this trained model to detect the optic disk and macula center, and from these detected locations the UWFIs were segmented into ETDRS 7SF and F1-F2 images. ETDRS 7SF images contain seven fields of 30 degrees each, while F1-F2 images contain only the 30-degree overlapping circles centered on the optic disk and macula.
The authors then trained a ResNet-34 model pre-trained on ImageNet and optimized it using their dataset. In doing so, they achieved a 0.915 AUROC, 83.38% sensitivity, and 83.41% specificity on 7SF images, versus a 0.8867 AUROC, 80.60% sensitivity, and 80.61% specificity on F1-F2 images. The 7SF results were significantly greater for all three measures (p < 0.001) compared to those of F1-F2 images.
While the authors demonstrate that DL classification systems are more accurate using 7SF images than F1-F2 images, the AUROCs, sensitivities, and specificities achieved in previously published studies using whole UWFIs have been greater. This indicates the greater utility of whole UWFIs over 7SF and F1-F2 segmentations of the fundus.
Nagasawa et al. published a second study on DR using DCNNs and UWFIs in April 2021, comparing the accuracy of DL-based DR staging from UWFIs and OCTA images. UWFIs and OCTA en face images of the superficial plexus, deep plexus, outer retina, choriocapillaris, and density map were extracted, with OCTA scans of a 6x6 mm region acquired for each patient. The OCTA images and UWFIs were also combined into a single image file to form a third “imaging modality.” The severity of DR for each patient and their associated images was graded by three retinal specialists.
The study authors then trained two VGG-16 DCNNs, the first to classify images as containing DR and the second to detect PDR. Each DCNN was tested on UWF, OCTA, and combined UWF-OCTA datasets. In detecting DR, the first DCNN achieved AUCs of 0.790, 0.883, and 0.847 after training on the UWF, OCTA, and UWF-OCTA images respectively. In detecting PDR, the second DCNN achieved AUCs of 0.981, 0.928, and 0.964 respectively. This study demonstrates the ability of DL systems to detect DR and PDR, but found no additive benefit from combining imaging modalities (UWF and OCTA) for disease classification.
Five studies have been published on retinal detachment detection from UWFIs.[33, 45–48] The first, from Ohsugi et al., used UWFIs to detect rhegmatogenous retinal detachment (RRD). The study used a CNN with three convolutional layers, each followed by a ReLU activation layer and finished with two fully connected layers; the final output layer performed binary classification using a softmax function. The trained model achieved a 0.988 AUROC, 97.6% sensitivity, and 96.5% specificity.
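An architecture of the kind Ohsugi et al. describe, three convolution+ReLU blocks feeding two fully connected layers with a softmax output, can be sketched in a few lines of PyTorch; the filter counts, pooling layers, and input size here are assumptions:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Three conv+ReLU blocks, two fully connected layers, softmax output."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        # Softmax yields class probabilities; for training with
        # CrossEntropyLoss, the raw logits would be used instead.
        return torch.softmax(self.classifier(self.features(x)), dim=1)

probs = SmallCNN()(torch.randn(1, 3, 224, 224))  # e.g., P(RRD) vs. P(non-RRD)
```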
In 2019, Li et al. developed a DL system for identifying specific characteristics of retinal detachment from UWFIs. They developed a DL system for detecting notable peripheral retinal lesions (NPRLs), such as lattice degeneration and retinal breaks, which often lead to RRD. The authors used and compared four CNNs: InceptionResNetV2, InceptionV3, ResNet50, and VGG-16. With each CNN, the authors explored three methods for improving the DL algorithm: i) no data augmentation, ii) data augmentation with brightness shifts, 45-degree rotation, and horizontal and vertical flipping, and iii) data augmentation with histogram brightness equalization, 45-degree rotation, and horizontal and vertical flipping. This led to 12 trained and compared models. The study found that InceptionResNetV2 with the second data augmentation method achieved the greatest performance, with 98.7% sensitivity, 99.2% specificity, and 99.1% total accuracy. This was significantly greater than the performance of the ophthalmologists in the study: a general ophthalmologist with 5 years' experience had a 97.6% accuracy, 93.6% sensitivity, and 98.7% specificity, while one with 3 years' experience had a 94.5% accuracy, 85.9% sensitivity, and 96.8% specificity. These results are very promising for the continued use of DL in identifying NPRLs, given its greater accuracy in comparison to trained ophthalmologists.
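The second augmentation regime described above maps directly onto standard torchvision transforms; a sketch with illustrative parameter values:

```python
from torchvision import transforms

# Augmentation pipeline mirroring regime (ii): brightness shifts,
# 45-degree rotations, and horizontal/vertical flips.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.2),   # random brightness shift
    transforms.RandomRotation(degrees=45),    # rotate within +/- 45 degrees
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# `augment` can be passed as the `transform` argument of a torchvision dataset.
```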
In 2020, Li et al. applied an InceptionResNetV2-based DL model to detect RD and discern macular status using UWFIs. The model for detecting RD achieved a 96.1% sensitivity, 99.6% specificity, and 0.989 AUROC. A retina specialist with 3 years' experience achieved 94.4% sensitivity and 99.1% specificity, while a specialist with 5 years' experience achieved 95.4% sensitivity and 99.8% specificity. The RD images were then used as the dataset for a second DL model for macular status classification, which achieved 93.8% sensitivity, 90.9% specificity, and 0.975 AUROC. The ophthalmologist with 3 years of training achieved a sensitivity and specificity of 86.3% and 87.1%, while the more senior ophthalmologist achieved 91.3% and 92.4% respectively. The gap between the DL model and the ophthalmologists was greater for discerning macular status than for RD detection. As macular status is an indication for emergency surgery, this difference is significant in demonstrating the utility of DL in ophthalmology.
Zhang et al. developed a DL system for detecting lattice degeneration, retinal breaks, and RD in tessellated eyes, testing two image pre-processing techniques with the SE-ResNeXt50 CNN. The first technique resized all images to 512x512, and the model output a positivity score for each lesion per image. The second technique cropped patches of labelled lesions; the model assigned a positivity score to each patch and output the maximum score over all of an image's patches. They trained three distinct models for detecting lattice degeneration, retinal breaks, and RD respectively, for a total of six tested models. In detecting lattice degeneration, the resizing method achieved a 0.888 AUROC while the cropping method achieved 0.841. For retinal breaks, the resizing method achieved a 0.843 AUROC while the cropping method achieved 0.953. For RD, the resizing and cropping methods achieved AUROCs of 1.00 and 0.979 respectively. The use of the full image led to greater accuracy in all cases except retinal breaks, where the cropping method was found to be superior.
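The cropping-based method, scoring each lesion patch and taking the maximum over an image's patches, amounts to max-pooling patch-level probabilities; a sketch with a toy stand-in for the SE-ResNeXt50 patch classifier:

```python
import torch
import torch.nn as nn

def image_score(patches: torch.Tensor, patch_model: nn.Module) -> float:
    """Image-level lesion score = max of per-patch positive-class probabilities.

    patches: (N, 3, H, W) tensor of lesion-candidate crops from one image.
    """
    with torch.no_grad():
        probs = torch.softmax(patch_model(patches), dim=1)[:, 1]  # P(lesion)
    return probs.max().item()

# Placeholder classifier standing in for SE-ResNeXt50; 12 patches from one image.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))
print(image_score(torch.randn(12, 3, 64, 64), toy_model))
```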
In 2021, Antaki et al. published a study exploring the use of automated machine learning (AutoML) technologies for classifying RD, RP, and RVO from UWFIs. They trained a DL model through the Google Cloud AutoML platform using RD and normal UWFIs. The binary classification of RD achieved an 89.77% sensitivity and 78.72% specificity when the confidence level of the system was set to 0.8. This model also achieved an AUPRC of 0.921.
Two studies have investigated glaucoma with UWFIs.[49, 50] In 2018, Masumoto et al. applied a DL classifier to UWFIs to detect glaucoma in a patient dataset stratified by disease severity. The study authors first categorized glaucoma patients into early (better than -6 dB), moderate (-6 to -12 dB), and severe (-12 dB or worse) disease based on visual field damage from Humphrey Field Analyzer measurements. In classifying any glaucoma, the DL model achieved a mean 0.872 AUROC. For early, moderate, and severe glaucoma, the DL model achieved AUROCs of 0.830, 0.864, and 0.934 respectively. The DL model was likewise most sensitive and most specific in classifying severe glaucoma versus healthy UWFIs. While the results are promising, the overall AUROC did not reach the 0.9 threshold, a weakness acknowledged by the study authors.
Li et al. used DL for automated glaucomatous optic neuropathy (GON) detection from UWFIs in 2020. They trained a CNN based on the InceptionResNetV2 neural network. All UWFIs were classified by glaucoma specialists as containing GON or not, based on a vertical cup-to-disc ratio ≥ 0.7, rim width ≤ 0.1 of disc diameter, retinal nerve fiber layer defects, or disc splinter hemorrhages. On the primary dataset, the model achieved an AUROC, sensitivity, and specificity of 0.999, 97.5%, and 98.4% respectively. Across the primary and four external datasets, AUROCs ranged from 0.983 to 0.999, sensitivities from 97.5% to 98.2%, and specificities from 94.3% to 98.4%. The methods demonstrated by Li et al. achieved significantly greater outcomes in detecting and classifying glaucoma on UWFIs than those of Masumoto et al., likely due to the larger primary dataset.
At the time of writing, two studies using DL with UWFIs to diagnose or detect AMD or its complications have been published.[51, 52] Matsuba et al. published a study in 2018 using DL to detect AMD on UWFIs. In this study, they trained a DCNN on UWFIs of healthy patients (no visible fundus disease) and patients with exudative (wet) AMD. The DCNN achieved a 0.976 average AUROC, with 100% average sensitivity and 97.31% average specificity in detecting wet AMD. Six ophthalmologists classified images correctly 81.9% of the time, with 71.4% sensitivity and 92.5% specificity. The study ophthalmologists averaged 11 minutes and 23.54 seconds for classification, while the DL model averaged 26.29 seconds.
The second study, published in 2021, comes from Li et al., who used DL for the automated detection of retinal exudates and drusen from UWFIs. Images were labelled as containing retinal exudates and/or drusen (RED) or as non-RED by four retina specialists. Two external datasets were then used for validation of the InceptionResNetV2 CNN model. On the primary dataset, the model achieved a 0.994 AUROC, with 94.2% sensitivity and 97.4% specificity. On the external datasets, it achieved 0.972 and 0.988 AUROCs, with 94.9% and 95.1% sensitivities and 96.5% and 97.3% specificities respectively.
Masumoto et al. trained a DCNN on UWFIs of retinitis pigmentosa (RP) in 2018. Using UWF and UWF-FAF images from RP and healthy patients, they trained a DCNN (VGG-16) to classify images based on whether they contained RP. The UWF DCNN achieved a 0.998 AUROC while the UWF-FAF DCNN achieved a 1.00 AUROC. The UWF DCNN achieved 99.3% sensitivity and 99.1% specificity, while the UWF-FAF DCNN achieved 100% sensitivity and 99.5% specificity. There were no statistically significant differences between the sensitivities and specificities of the UWF and UWF-FAF DCNNs.
Antaki et al. published a study exploring the use of AutoML technologies for classifying RD, RP, and RVO from UWFIs. They trained a DL model through the Google Cloud AutoML platform using RP and normal UWFIs. The binary classification of RP achieved an 88.0% sensitivity and 100% specificity when the confidence level of the system was set to 0.5, along with a 0.942 AUPRC. When repeated using the data from Masumoto et al., the system achieved an AUPRC of 1.0, with sensitivity, specificity, and PPV all increasing to 100% and no misclassifications made by the AutoML model.
A single peer-reviewed study on using UWFIs to detect pachychoroid disease has been published. Kim et al. used an AutoML platform to classify UWFIs based on the presence of pachychoroid disease. Specifically, the authors trained Google AutoML Vision on UWF indocyanine green angiography (UWF-ICGA) images from healthy and pachychoroid patients. They trained two models: the first used all images in their original orientation, while the second horizontally flipped left-eye images such that all images were of the same laterality. The first model achieved precision, accuracy, sensitivity, and specificity values of 0.8182, 0.8367, 0.8182, and 0.8519 respectively, while the second model achieved 0.8636, 0.8776, 0.8636, and 0.8889 respectively. However, the mean precision, accuracy, sensitivity, and specificity scores of three retina specialists were 0.9048, 0.9388, 0.9500, and 0.9643. These results indicate that training the AutoML model with images of the same laterality led to better results, but that the model did not reach the precision or recall of retina specialists.
Three peer-reviewed studies exist on using DL with UWFIs in RVO, two published by Nagasato et al. and the third by Antaki et al.[48, 55, 56] The first, published in 2018, used UWFIs to detect and classify central retinal vein occlusion (CRVO). The study used UWFIs from CRVO patients and non-CRVO healthy subjects. A VGG-16-based DNN was trained on the dataset and fine-tuned using parameters pre-trained on ImageNet. After comparing 40 DL models obtained over 40 learning cycles, the authors used the model with the highest accuracy for evaluation. The model achieved a 0.989 AUROC, 98.4% sensitivity, and 97.9% specificity. They similarly tested a support vector machine (SVM) algorithm for detecting CRVO from UWFIs, which achieved a 0.895 AUROC, 84.0% sensitivity, and 87.5% specificity. The DL model achieved significantly greater results on all measures compared to the SVM (p < 0.001).
In 2019, the same group completed a similar study using a DL model on UWFIs of branch retinal vein occlusion (BRVO) patients. In this study, they used the same model (VGG-16) and DNN parameters on a BRVO dataset, training the DNN on BRVO and non-BRVO healthy UWFIs, and similarly tested an SVM model. The DNN achieved a 0.976 AUROC, 94.0% sensitivity, and 97.0% specificity, while the SVM achieved a 0.857 AUROC, 80.5% sensitivity, and 84.3% specificity. The authors demonstrated the ability of a DNN to accurately detect BRVO and its superiority over SVMs for this task.
In 2021, Antaki et al. published a study exploring the use of the Google Cloud AutoML platform for classifying RD, RP, and RVO from UWFIs. The binary classification of RVO achieved an 84.9% sensitivity and 100% specificity when the confidence level of the system was set to 0.5, along with a 0.967 AUPRC. While the sensitivity was lower than that of Nagasato et al., the AutoML model achieved comparable specificity.[55, 56]
Myopia
In 2020, Shi et al. published a study investigating the ability of a DL system to detect myopia using UWFIs. For this task, they used a custom CNN known as the Myopia Detection Network (MDNet), which combined dense connections and residual Squeeze-and-Excitation attention for detecting myopia. The CNN combined attention dense blocks, transition blocks, convolutional layers, max-pooling layers, and a dense layer to make full use of shallow features and improve information flow.
They trained the CNN on left and right UWFIs. The study defined severe myopia as a spherical equivalent (SE) less than -6 diopters (D), moderate myopia as an SE between -6 D and -3 D, and mild myopia as an SE between -3 D and 0 D. Images were then cropped to a 400x400-pixel region of interest centered on the optic disk and including the macula.
In evaluating the model, the study authors used mean absolute error (MAE) as the main evaluation index, along with root-mean-square error (RMSE) and mean absolute percent error (MAPE). At its optimum, the CNN achieved an MAE of 1.1150 D, an RMSE of 1.4520 D, and a MAPE of 24.99%. These results show that myopia can be detected within reasonable error using DL and UWFIs.
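MAE, RMSE, and MAPE are simple functions of predicted versus actual spherical equivalents; a NumPy sketch on synthetic values:

```python
import numpy as np

# Synthetic actual vs. predicted spherical equivalents in diopters.
actual = np.array([-5.5, -2.0, -7.25, -1.0, -4.5])
pred = np.array([-4.8, -2.6, -6.0, -1.9, -5.2])

err = pred - actual
mae = np.mean(np.abs(err))                   # mean absolute error (D)
rmse = np.sqrt(np.mean(err ** 2))            # root-mean-square error (D)
mape = 100 * np.mean(np.abs(err / actual))   # mean absolute percent error

print(f"MAE={mae:.3f} D  RMSE={rmse:.3f} D  MAPE={mape:.2f}%")
```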
Idiopathic Macular Hole
A single peer-reviewed study on detecting idiopathic macular hole (IMH) using UWFIs and DL has been published. In 2018, Nagasawa et al. trained a DCNN on normal and IMH images. The CNN achieved a 0.9993 AUROC, 100% accuracy, 100% sensitivity, and 99.5% specificity, classifying a series of 50 test images at an average speed of 32.80±7.36 seconds. They similarly tested the ability of human ophthalmologists to detect IMH from the same UWF test images. The ophthalmologists achieved an 80.6±5.9% accuracy, 69.5±15.7% sensitivity, and 95.2±4.3% specificity, and required an average of 838±199.16 seconds to classify the same 50 images. From this study, it is clear that IMH is diagnosed more accurately and rapidly by CNNs than by trained ophthalmologists.
Retinal Hemorrhage
Li et al. published a study using a DL system to screen for retinal hemorrhage (RH) from a dataset of RH and non-RH UWFIs. The study used InceptionResNetV2, with weights pre-trained for ImageNet classification used for CNN initialization. On the primary dataset, the CNN achieved a 0.999 AUROC, 98.9% sensitivity, 99.4% specificity, and 99.3% accuracy. Two external datasets were used for further testing, on which the CNN achieved 0.998 and 0.997 AUROCs, 96.7% and 97.6% sensitivities, 98.7% and 98.0% specificities, and 98.4% and 98.0% accuracies respectively. On an external dataset, an ophthalmologist with five years of training achieved a 95.9% sensitivity and 99.5% specificity, while an ophthalmologist with three years of training achieved 92.6% and 98.9% respectively. Here, the ophthalmologists' sensitivities were lower than that of the trained CNN, while their specificities were comparable.
Sickle Cell Retinopathy
A single study, published in 2020, explores using DL with UWFIs to diagnose sickle cell retinopathy. Specifically, the study from Cai et al. explored the detection of sea fan neovascularization (SFN) from UWFIs of patients with sickle cell hemoglobinopathy. The study notes that the detection of potentially asymptomatic SFN provides the opportunity for prophylactic scatter laser photocoagulation, which can help to reduce vision loss from proliferative sickle cell retinopathy (PSR). An InceptionV4 CNN, pre-trained on the ImageNet dataset, was trained on the image set for 100 iterations. After training, the CNN achieved a 0.988 AUROC, 97.0% accuracy, 97.4% sensitivity, and 97.4% specificity. Only a single image received a false-negative classification from the CNN, due to a severe lid artifact obscuring the retinal vasculature.
Image Quality Assessment
Three studies have been published on using DL methods for quality assessment of UWFIs.[60–62] The first, published in 2020 by Calderon-Auza et al., focused on using CNNs as a teleophthalmology support system to determine the quality of the images provided. The proposed system uses four steps to determine UWFI quality: it detects the optic disc (OD), performs quality analysis on the OD, detects obstructions (e.g., eyelash shadows) in the region of interest (ROI), and then segments the vessels of the image. For OD detection, a Faster R-CNN (FR-CNN) for feature extraction, combined with the AlexNet CNN architecture, was used; on their dataset, this configuration achieved an accuracy, sensitivity, and specificity of 0.9254, 0.9643, and 0.4424 respectively. Images determined to contain an OD were then used as the dataset for the OD quality analysis step, for which VGG-16 was used, achieving a 0.8612 accuracy, 0.9113 sensitivity, and 0.8064 specificity in classifying ODs by quality. For obstruction analysis in the ROI, centered on the optic disk and macula, a SegNet-trained CNN achieved a 1.0 accuracy due to the low number of artifacts in the training and test sets. Finally, for vessel segmentation in the ROI, a SegNet architecture with VGG-16 proposed by the authors achieved a 0.9784 accuracy, 0.7169 sensitivity, and 0.9816 specificity on their dataset.
While the study from Calderon-Auza et al. is a proof of concept of the uses of DL for detecting low-quality UWFIs, it also demonstrates these techniques in practice, offering readers a clear implementation of a multi-step DL process for tele-ophthalmology.
Li et al. proposed and designed a classification system using a “U-Net-style” CNN and UWF-FAF in 2020. UWF-FAF images were graded as ungradable, poor, good, or best by ophthalmologists. The CNN achieved 90.5% sensitivity and 87.0% specificity for distinguishing between gradable and ungradable images, and 78.9% sensitivity and 94.1% specificity for distinguishing between optimal-quality (good, best) and limited-quality (poor, ungradable) images. The authors calculated the overall accuracy of the classifier as 89.0% for gradable versus ungradable classification and 89.3% for optimal versus limited quality. The model also achieved a 0.920 AUROC.
In 2020, Li et al. proposed a DL-based image filtering system (DLIFS) to filter out poor-quality UWFIs in an automated fashion, such that only images of sufficient quality would be passed to subsequent AI diagnostic systems. Images were identified as poor- or good-quality by four retina specialists. Image quality was categorized as poor if more than one third of the fundus was obscured, if macular vessels could not be identified or more than 50% of the macular area was obscured, or if the vessels within one disc diameter of the OD margin could not be identified. From this dataset, they trained InceptionResNetV2 with weights pre-trained on ImageNet to classify the quality of each input UWFI. The trained DLIFS achieved a 0.996 AUROC, 96.9% sensitivity, and 96.6% specificity. Two external datasets were used for testing, on which the DLIFS achieved 0.994 and 0.997 AUROCs, 95.6% and 96.6% sensitivities, and 97.9% and 98.8% specificities respectively.
Segmentation and Localization
Five peer-reviewed studies have been published on segmentation and localization using UWFIs, all of which focus on vessel segmentation.[43, 63–66] The first, from Ding et al., presented a method to detect retinal vessels in UWF fluorescein angiography (UWF-FA). In this study, the authors developed a method to produce vessel segmentation maps without previously labelled ground-truth datasets, relying primarily on cross-modality transfer and human-in-the-loop (HITL) learning. The HITL approach, a form of semi-supervised learning, allowed the DL system to predict the vessels and respond to human feedback regarding whether it had segmented them correctly and accurately; over multiple iterations, this led to complete segmentation of the vessels. The authors reduced manual annotation effort by first using morphological analysis to segment the vessels in a preliminary fashion. This was followed by a cross-modality approach that transferred vessel maps from UWF colour images to UWF-FA using robust chamfer alignment in an Expectation-Maximization framework. These were combined with the HITL iterative DL process for detection of retinal vessels.
The first step in the pipeline, relying on cross-modality transfer, trained a DNN on a dataset of ground-truth colour UWFIs with UWF-FA from the same patient eye taken at the same time. Specifically, the DNN was trained on existing labelled UWFIs to extract vessel maps from unlabelled UWFIs. These detected vessel maps were then geometrically aligned and transferred to the UWF-FA, where they served as the approximate ground truth for training a DNN for vessel detection in UWF-FA images. From this point, the vessel segmentation DNN was run iteratively, starting from the approximate ground truth derived from the UWF colour images, until it no longer produced maps with new changes or additional vessels segmented.
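The iterative refinement described, retraining on the model's own cleaned-up predictions until the vessel maps stop changing, can be outlined as a self-training loop. The sketch below is schematic, with placeholder train/predict functions standing in for the DNN and for the human feedback step:

```python
import numpy as np

def train(model_state, images, labels):
    """Placeholder for one round of DNN training on (image, vessel-map) pairs."""
    return model_state + 1  # stand-in: a real step would update network weights

def predict(model_state, images):
    """Placeholder for DNN inference; returns binary vessel maps."""
    rng = np.random.default_rng(model_state)
    return (rng.random(images.shape) > 0.8).astype(np.uint8)

images = np.zeros((10, 128, 128))
maps = predict(0, images)  # approximate ground truth from cross-modality transfer
state = 0
for it in range(20):
    state = train(state, images, maps)
    new_maps = predict(state, images)
    changed = np.mean(new_maps != maps)  # fraction of pixels that flipped
    # In the HITL version, a human would accept or correct `new_maps` here.
    maps = new_maps
    if changed < 0.01:  # stop once the maps stabilize
        break
# (With these random placeholders the loop simply runs all 20 iterations.)
```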
The production of vessel maps was framed as a task well suited to a generative adversarial network (GAN), producing an output image of vessel segmentation from an input UWF-FA image.
The authors then evaluated their method of reducing the annotation burden by calculating the number of pixels added and removed at each iteration. After seven iterations, approximately 19,300 (2.0%) new pixels were added and 14,100 (1.4%) pixels were removed. In validating their approach on an external dataset, they achieved a maximum 0.987 AUROC, with significant improvements over traditional morphological techniques for vessel segmentation.
The same team of Ding et al. then published a method to segment vessels from colour UWFIs via iterative multi-modal registration and learning. In this project, they similarly utilized concurrently captured UWF-FA to segment the vessels from UWFIs. The first step requires multi-modal registration of the vessels, first segmented from UWF-FA using a pre-trained DNN, to the UWFIs using parametric chamfer alignment. The second step utilizes a learning method to mitigate the noisy labels caused by the differences between the UWF-FA and UWFI modalities. The detected UWFI vessel maps are then used for registration in the following iteration, allowing for iterative improvement until the segmented vessel maps are accurate. After this training, the DNN can detect vessels from UWFIs without concurrently captured UWF-FA. The authors evaluated their model on an external dataset of UWFIs, achieving an area under the precision-recall curve (AUPRC) of 0.886.
Nunez do Rio et al. published a study in 2020 that explored the use of DL-based segmentation for quantification of retinal capillary non-perfusion (CNP), a metric useful in determining retinal ischemia, using UWF-FA. They chose UWF-FA because it is a high-resolution modality with clearly defined retinal vasculature. They trained a U-Net-style CNN on 75 UWF-FA images that had been manually graded for CNP to segment and extract the vasculature, with 20 images also segmented manually by an expert grader. To standardize the CNP measurement, a circular grid of rings of increasing radius was centered on the foveal avascular zone (FAZ). The segmentation model achieved a 0.82 AUROC. Between the manually graded and automatically segmented images, an inter-grader Dice similarity coefficient (DSC) of 65.51 was achieved, and in comparing the assessment of CNP between the CNN model and the grader, a kappa score of 0.55 was achieved. The authors conclude that this method allows for DL-based segmentation and a quantifiable measurement of CNP from UWF-FA.
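The Dice similarity coefficient and kappa statistic used for evaluation here are standard agreement measures; a sketch comparing a synthetic automatic non-perfusion mask against a manual one (note the study computed kappa on CNP assessments rather than raw pixels):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

rng = np.random.default_rng(2)
manual = (rng.random((256, 256)) > 0.7).astype(np.uint8)  # grader's CNP mask
auto = manual.copy()
auto[rng.random((256, 256)) > 0.9] ^= 1                   # perturb ~10% of pixels

print(f"DSC = {100 * dice(manual, auto):.2f}")            # expressed as a percentage
print(f"kappa = {cohen_kappa_score(manual.ravel(), auto.ravel()):.2f}")
```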
Wang, Z. et al. published a study in 2020 that utilized a multi-task Siamese network for separating retinal arteries from retinal veins using deep convolution. They did so on a fundus image dataset (DRIVE), a UWFI dataset (WIDE), and an OCT dataset (INSPIRE). Using these datasets, they first segmented the vessels using a CNN-based approach, followed by skeletonization of the vessels. They then built a graph representing the vascular network by finding branch and end points on the skeleton map, and removed errors such as twinborn nodes produced by overlapping vessels via morphological analysis of the skeleton, producing a refined vascular graph. They then used Convolution Along Vessel (CAV) to extract visual features, by convolving the image along the vessel segments, and geometric features, by tracking the direction of blood flow in the vessels. Following this, the Siamese network was trained to classify vessel types from the visual features of vessel segments and to estimate the similarity of every two connected segments by comparing their visual and geometric features, separating the vasculature into individual trees of arteries and veins. On the WIDE dataset of UWFIs, they achieved an accuracy of 94.5%.
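The graph-building step described, skeletonizing the segmented vessels and then locating branch and end points, can be sketched with scikit-image and a neighbor-counting convolution; this is a simplified stand-in for the authors' pipeline, shown on a toy mask:

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.morphology import skeletonize

# Toy binary vessel mask: a horizontal and a vertical bar crossing.
mask = np.zeros((64, 64), dtype=bool)
mask[30:34, 5:60] = True
mask[5:60, 30:34] = True

skel = skeletonize(mask)

# Count 8-connected skeleton neighbors of each skeleton pixel.
kernel = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
neighbors = convolve(skel.astype(int), kernel, mode="constant")

end_points = skel & (neighbors == 1)     # degree-1 nodes (vessel endpoints)
branch_points = skel & (neighbors >= 3)  # degree-3+ nodes (bifurcations)
print(end_points.sum(), branch_points.sum())
```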
In 2021, Sevgi et al. published a study that explored the ability to extract cumulative retinal vessel area (RVA) from UWF-FA images using CNN-based DL segmentation. For this study, they extracted the RVA from the available UWF-FA image frames. The frame containing the maximum RVA was considered the optimum early-phase frame, while a frame taken ≥ 4 minutes later whose RVA closely mirrored that of the early frame was considered the late-phase frame. Image analysts then evaluated the selected pairs. A total of 1578 UWF-FA sequences from 66 sessions were used to create cubic splines, and 13,980 UWF-FA sequences from 462 sessions were used for evaluation. Appropriate images for both phases were successfully identified in 85.2% of the sessions, with 90.7% of early and 94.6% of late frames successfully identified.
Generative Image Synthesis using GANs
At the time of writing, four studies involving generative adversarial networks (GANs) and UWFIs have been published.[67–70] GANs, introduced in 2014 by Goodfellow et al., are ML frameworks that utilize two competing neural networks to generate new data. The first neural network, the “generator”, generates candidate data. The second, the “discriminator”, is trained on the data that is to be modelled; as the generator produces data, the discriminator rejects synthesized samples that do not sufficiently represent the training data source. Through this iterative process, the generator becomes better at producing data that represents the target distribution, until synthetic data close to the ground-truth dataset is produced. These approaches have been effective at generating realistic-looking data such as human faces.
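The generator/discriminator interplay described above is captured by the standard adversarial training loop; a minimal PyTorch sketch on toy vector data, with architectures far smaller than those used in the studies below:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))  # generator
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    real = torch.randn(32, 32) + 2.0  # stand-in for real image data
    z = torch.randn(32, 16)           # random noise fed to the generator

    # Discriminator: score real data as 1, generated data as 0.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(G(z).detach()), torch.zeros(32, 1))
    loss_d.backward()
    opt_d.step()

    # Generator: fool the discriminator into scoring fakes as real.
    opt_g.zero_grad()
    loss_g = bce(D(G(z)), torch.ones(32, 1))
    loss_g.backward()
    opt_g.step()
```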
Ju et al. published a study in 2020 in which they utilized GANs to produce labelled datasets of UWFIs from labelled fundus image (FI) datasets. They noted that due to the differences between FIs and UWFIs, labelled FI datasets could not be used directly for UWFIs; for this reason, they used a GAN to generate synthetic UWFIs for training. Using a consistency regularization method, they ensured that the pathologies present in labelled FIs were similarly present in the corresponding generated UWFIs. The first step in this process used target UWFIs to train a target-task model, which helped regulate the quality of the generated data. Pseudo-labels were then generated for the generated UWFIs. Finally, the original UWFI samples and the generated samples were used together to train the target-task model. To test that the generated UWFIs were properly pseudo-labelled and carried the disease pathology of interest, they classified the images containing DR using a ResNet50-based residual neural network, and similarly validated their synthetic UWFIs on vessel segmentation and lesion detection tasks. The study authors succeeded in producing high-quality UWFIs that mirrored FIs in pathology and mirrored “natural” UWFIs in image quality and complexity.
In 2020, Xie et al. published a study in which they proposed a GAN using an attention encoder (AE) and a generation flow network to build a UWFI classifier for retinal pathologies found in patients under the age of eighteen (i.e. Coats disease, FEVR, morning glory syndrome, retinitis pigmentosa, and diabetic retinopathy). The goal of this project was to harness the adversarial learning between the generator and discriminator to build robustness into the classification model. Their proposed method achieved higher classification accuracies (84.75% and 97.25%) than classifiers based on a standard CNN architecture such as ResNet-50 (77.35% and 87.95%).
In 2020, Yoo et al. used GANs in a manner opposite to Ju et al.[68, 70] In their study, they utilized a GAN architecture, specifically CycleGAN, to produce synthetic FIs from UWFIs, translating UWFIs to FIs while maintaining the structure, pathology, and lesions of the original UWFIs without introducing new or artificial features into the FI. The authors began with a dataset of UWFIs and FIs, which were reviewed by ophthalmologists for image quality. The GAN was trained on this dataset and then tested on a test dataset of UWFIs to generate synthetic FIs. Image registration was applied to crop the region of interest on the input UWFIs, focused on the optic disk and fovea, for conversion into an FI. After training the CycleGAN model for 40 epochs, the model was able to translate UWFIs to FIs with high fidelity to the original UWFI structure and pathologies. For example, UWFIs with diabetic retinopathy microaneurysms and blot hemorrhages, glaucomatous optic nerves, retinal detachment, CRVO, drusen, and retinal atrophy all had their specific lesions transferred to FIs successfully. Finally, they calculated structural similarity (SSIM) indices between the generated FIs and the ground-truth FIs, achieving an average SSIM of 0.802, indicating strong similarity between the produced and ground-truth images.
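The structural similarity index used in this evaluation is available in scikit-image; a sketch comparing a synthetic generated image against its ground truth:

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(3)
truth = rng.random((256, 256))  # stand-in for a grayscale ground-truth FI
generated = np.clip(truth + rng.normal(0, 0.1, truth.shape), 0, 1)

ssim = structural_similarity(truth, generated, data_range=1.0)
print(f"SSIM = {ssim:.3f}")  # 1.0 would indicate identical images
```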
UWF imaging is also being used in conjunction with DL for the prediction of non-ocular and neurological factors. As UWF imaging is a rich image format, exploratory studies have been conducted to determine if retinal changes can be associated with features like an individual’s age, vascular changes, and neurological status.
Age and Brachial-Ankle Pulse-Wave Velocity
Nagasato et al. published a study in 2020 demonstrating the ability to predict both patient age and brachial-ankle pulse-wave velocity (baPWV) from UWFIs using DL. For each patient included in the study, baPWV was also recorded. The UWFIs were processed into three forms: the entire image (the total image), a cropped region containing the optic disk and macula (the central region), and the total image with the central region covered in black pixels (the peripheral region). Each of these processed image sets was used as a separate dataset, and the study authors compared their performance for DL prediction of age and baPWV using a VGG-16-based CNN. The results showed that the total, central, and peripheral images were all able to predict the age and baPWV of a patient with statistical significance: the correlations between predicted and actual age and baPWV were both significant at p < 0.001 for all three datasets. The authors conclude that UWFIs can be used to make specific predictions of a patient's age and baPWV, the latter itself a marker of vascular health.
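Correlations between predicted and actual values of the kind reported here are typically assessed with Pearson's r; a SciPy sketch on synthetic age predictions (the study's exact statistical procedure is not reproduced here):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
actual_age = rng.uniform(20, 80, size=100)
predicted_age = actual_age + rng.normal(0, 8, size=100)  # synthetic model output

r, p = pearsonr(actual_age, predicted_age)
print(f"r = {r:.3f}, p = {p:.2e}")  # p < 0.001 indicates a significant correlation
```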
Alzheimer's Disease
In 2020, Wisely et al. used multiple imaging modalities to train a DL model to identify symptomatic Alzheimer's disease (AD). The authors used UWFIs, UWF-FAFs, colour maps of ganglion cell-inner plexiform layer (GC-IPL) thickness, and superficial capillary plexus (SCP) en face optical coherence tomography angiography (OCTA) images for training, drawn from eyes of cognitively healthy subjects and patients with symptomatic AD. The DL model took these imaging modalities as input, along with OCT and OCTA numerical data and patient data. The model used a shared-weight image feature extractor to extract modality-agnostic features, which were then processed by modality-specific fully connected layers. After training, the authors tested the model on each imaging modality individually as well as on combinations of the data. UWFIs alone yielded a 0.450 AUROC and UWF-FAF alone a 0.618 AUROC, while OCTA alone achieved a 0.582 AUROC and GC-IPL alone a 0.809 AUROC. All images together achieved a 0.829 AUROC, all images plus quantitative data a 0.830 AUROC, and all images plus all data a 0.836 AUROC, while GC-IPL maps, quantitative data, and patient data together achieved the highest AUROC of 0.841. These findings indicate that GC-IPL thickness has the strongest individual predictive value for symptomatic AD, and that the inclusion of additional imaging modalities (i.e. OCTA, UWF-FAF, and UWF) did not improve predictive value in this case. Moreover, the predictive value of UWF imaging alone for symptomatic AD is low.
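The architecture described, a shared-weight image encoder applied across modalities followed by modality-specific fully connected processing and fusion with tabular data, can be sketched as follows; the layer sizes, fusion scheme, and input shapes are assumptions rather than the published design:

```python
import torch
import torch.nn as nn

class MultiModalNet(nn.Module):
    def __init__(self, n_modalities: int = 4, n_tabular: int = 10):
        super().__init__()
        # Shared-weight extractor applied to every imaging modality.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One fully connected head per modality (the modality-specific function).
        self.heads = nn.ModuleList([nn.Linear(32, 16) for _ in range(n_modalities)])
        self.classifier = nn.Linear(16 * n_modalities + n_tabular, 1)

    def forward(self, images: list, tabular: torch.Tensor) -> torch.Tensor:
        feats = [head(self.encoder(img)) for head, img in zip(self.heads, images)]
        return self.classifier(torch.cat(feats + [tabular], dim=1))  # AD logit

# e.g., UWF, UWF-FAF, GC-IPL map, and OCTA inputs plus OCT/patient data.
imgs = [torch.randn(2, 3, 128, 128) for _ in range(4)]
logit = MultiModalNet()(imgs, torch.randn(2, 10))
```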