Aim of the study
The aim of this study was to develop and validate a deep learning algorithm capable of distinguishing small choroidal melanomas from nevi in both wide- and standard-field fundus photographs.
Data sets
Fundus photographs were collected from the Ocular Oncology Service at St. Erik Eye Hospital, Stockholm, Sweden, a center receiving images from multiple institutions across Sweden. The photographs were taken with either an ultra-widefield camera covering 200° of the fundus (Optos, Inc, Dunfermline, UK) or a standard-field retinal camera covering 45° of the fundus (Canon Medical Systems Europe B.V., Amstelveen, the Netherlands); examples of the collected images are provided in Fig. 1A. The collection prioritized small pigmented choroidal lesions, excluding large melanomas, as these are relatively easy to distinguish from nevi. Inclusion criteria were:
- Photo taken after January 1st, 2010, marking the period in which medical records were digitized, which facilitated control over follow-up.
- Diagnosis of either choroidal melanoma (International Classification of Diseases, 10th revision (ICD-10) C69.3) or choroidal nevus (ICD-10 D31.3).
- Diagnosis established by a subspecialized ocular oncologist.
- For lesions diagnosed as nevi at the time of photography, at least 5 years of follow-up without re-diagnosis as melanoma. Lesions that were diagnosed as melanoma at a later point in time (e.g. due to growth) were considered melanomas in this study. This criterion was introduced to facilitate the algorithm's detection of early signs of malignancy at a time when a small melanoma is hard to distinguish from a nevus.
Exclusion criteria were:
- Photos of low quality (issues with focus, movement artifacts, over- or underexposure, reflections, etc.).
- Photos in which our assessment determined that less than half of the lesion was visible, acknowledging the limitation in precisely estimating the size of the portion not visible in the photograph.
- Lesion obscured by retinal detachment, vitreous hemorrhage, or similar.
Out of 866 images evaluated, 112 were excluded based on the above criteria. An additional 2 images were excluded due to containing sensitive personal information (patient names and personal identification numbers), leaving 752 images for the study. These images were randomly assigned to a training cohort (n = 495), a validation cohort (n = 168), and a test cohort (n = 89). For each image in the training and validation sets, a mask of the lesion was created, and each image was labeled with the diagnosis (melanoma or nevus). The study was approved by the Swedish Ethical Review Authority (reference 2022-06210-02) and adhered to the tenets of the Declaration of Helsinki. The requirement for informed consent was waived due to the study's retrospective nature, relying solely on previously collected data, including clinical records and images. This research did not involve any new treatments, interventions, tests, analysis of biological samples, or collection of additional sensitive information. Additionally, we followed the Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Modeling Studies, details of which are provided in a supplementary file.10
Clinical diagnosis of nevi and melanomas
St. Erik Eye Hospital in Stockholm holds the national responsibility for diagnosing uveal melanoma. Although ophthalmologists from other Swedish institutions may detect potential choroidal tumors and refer patients to our center, a definitive diagnosis of uveal melanoma is made only after a comprehensive examination at our facility, which is equipped with specialized diagnostic tools and expertise. We advise healthcare professionals, including optometrists and nurses, to initially refer patients to general ophthalmologists for a preliminary evaluation before considering a referral to our institution.
At St. Erik Eye Hospital, we conduct a comprehensive review of each patient’s medical history, including previous diagnoses, current medication regimens, and records of past ocular examinations. Our diagnostic protocol encompasses a range of procedures: assessment of best corrected visual acuity (BCVA) and intraocular pressures (IOP); wide or standard field fundus photographs with autofluorescence; OCT; slit-lamp biomicroscopy; and A- and B-scan ultrasonography. Following this evaluation, we are able to confirm a diagnosis of uveal melanoma in the vast majority of cases. On the rare occasion where clinical examinations are inconclusive, we perform either transvitreal or transscleral biopsies (Fig. 1B).11, 12 Patients with small choroidal nevi with absence of risk factors do not need to come to our institution, but may be monitored in their home clinics with periodic examinations and photo documentation. If growth is observed, or other features develop, the patient is typically sent to us for evaluation.
For this study, lesions were also assessed using the MOLES and TFSOM-UHHD criteria.13, 14 MOLES assigns a score of 0, 1, or 2 to each of the well-established predictors Mushroom shape, Orange pigment, Large size, Enlarging tumor, and Subretinal fluid, based on their absence, borderline presence, or presence. Lesions are classified as common nevi, low-risk nevi, high-risk nevi, or probable melanomas, based on their total score being 0, 1, 2, or more than 2, respectively. TFSOM-UHHD stands for "To Find Small Ocular Melanoma Using Helpful Hints Daily": Thickness greater than 2 mm, presence of subretinal Fluid, Symptoms, Orange pigment, tumor Margin within 3 mm of the optic disc, Ultrasonographic Hollowness, and the absence of Halo and Drusen. Lesions exhibiting none of these factors have a 3% likelihood of growth over 5 years, suggesting they are most likely choroidal nevi. Those displaying one factor have a 38% chance of growth, while lesions with two or more factors have a growth probability exceeding 50% at 5 years.15
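The MOLES classification rule described above reduces to a simple mapping from the five sub-scores to a risk category. A minimal sketch (function and argument names are our own, not from any published implementation):

```python
def moles_category(mushroom, orange_pigment, large_size, enlarging, subretinal_fluid):
    """Map the five MOLES predictors (each scored 0 = absent,
    1 = borderline, 2 = present) to a risk category per the published rule."""
    total = mushroom + orange_pigment + large_size + enlarging + subretinal_fluid
    if total == 0:
        return "common nevus"
    if total == 1:
        return "low-risk nevus"
    if total == 2:
        return "high-risk nevus"
    return "probable melanoma"

# A lesion with borderline orange pigment and definite subretinal fluid (total = 3):
print(moles_category(0, 1, 0, 0, 2))  # probable melanoma
```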
Data preprocessing and model architecture
In the preprocessing stage, each fundus photograph was resized to a resolution of 1024×1536 pixels and adjusted to include three channels (RGB) to maintain color information. To standardize brightness across the dataset, we normalized the images based on the average brightness of the training dataset, scaling the pixel values to a range of [0, 1].
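The brightness normalization admits more than one reading; the sketch below shows one plausible interpretation, in which 8-bit pixel values are scaled to [0, 1] and then rescaled relative to the training-set mean brightness, with clipping at 1. The constants and names are illustrative, not taken from the study code:

```python
TRAIN_MEAN = 0.42   # hypothetical mean brightness of the training set (0-1 scale)
TARGET_MEAN = 0.5   # brightness level images are mapped toward (assumption)

def normalize(image):
    """Scale 8-bit RGB values to [0, 1], then rescale brightness relative
    to the training-set mean, clipping to keep values in range.
    `image` is a rows x cols x 3 nested list of ints in [0, 255]."""
    scale = TARGET_MEAN / TRAIN_MEAN
    return [[[min(1.0, (channel / 255.0) * scale) for channel in px]
             for px in row] for row in image]

img = [[[64, 128, 255]]]   # a 1x1 RGB "image"
print(normalize(img))      # values now in [0, 1]
```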
We implemented a U-net architecture for our model, characterized by three downsampling layers and eight base filters, employing the Rectified Linear Unit (ReLU) as the activation function.16 This model was specifically designed to perform as a segmentation tool, with its output subsequently applied to the task of classification. The rationale behind opting for a segmentation approach, rather than a direct classification framework, lies in the enhanced interpretability it offers: it allows for clearer visualization of which pixels activate the network. Additionally, by segmenting nevi and melanomas, we provide the network with more detailed information during the training phase, potentially improving the model's learning efficiency and accuracy.
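The stated architecture fixes the spatial bookkeeping: with three 2x downsampling stages, a 1024x1536 input reaches a 128x192 bottleneck. A small sketch of that arithmetic, assuming the common U-net convention that filter counts double at each level (the text specifies only eight base filters and three downsampling layers, so the doubling is our assumption):

```python
def unet_shapes(height, width, depth, base_filters):
    """Feature-map size and filter count at each encoder level of a U-Net
    with `depth` 2x-downsampling stages, assuming filters double per level."""
    return [(height >> level, width >> level, base_filters << level)
            for level in range(depth + 1)]

for h, w, f in unet_shapes(1024, 1536, 3, 8):
    print(f"{h} x {w}, {f} filters")
# 1024 x 1536, 8 filters
# 512 x 768, 16 filters
# 256 x 384, 32 filters
# 128 x 192, 64 filters
```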
Model training
Our model training process utilized two distinct U-net models within the Expligences Explipipe training framework: one aimed at identifying the area of the lesion, and another tasked with classifying whether the lesion is a melanoma. Both models were trained with brightness and rotation augmentation to enhance their robustness.
For the first model, which focuses on detecting the nevus area, categorical cross-entropy served both as the loss function and the evaluation metric. The process involved calculating the weighted central point of the model's output. Subsequently, a bounding box of dimensions 488×488 pixels was centered around this point, which then served as the input for the second network.
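The centering-and-cropping step above can be sketched as follows. The text does not say how the 488x488 box is handled near image borders, so the clamping below (shifting the box to stay inside the image) is an assumption, and all names are our own:

```python
def weighted_center(prob_map):
    """Probability-weighted centroid (row, col) of a 2-D segmentation
    output, i.e. the 'weighted central point' of the first model."""
    total = sum(p for row in prob_map for p in row)
    r = sum(i * p for i, row in enumerate(prob_map) for p in row) / total
    c = sum(j * p for row in prob_map for j, p in enumerate(row)) / total
    return round(r), round(c)

def crop_box(center, img_h, img_w, size=488):
    """size x size box centered on `center`, shifted (assumption) so it
    stays fully inside an img_h x img_w image. Returns (top, left, bottom, right)."""
    half = size // 2
    top = min(max(center[0] - half, 0), img_h - size)
    left = min(max(center[1] - half, 0), img_w - size)
    return top, left, top + size, left + size

# Toy 3x3 map with all probability mass at (row 2, col 1):
print(weighted_center([[0, 0, 0], [0, 0, 0], [0, 1, 0]]))  # (2, 1)
print(crop_box((100, 100), 1024, 1536))  # (0, 0, 488, 488)
```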
The second model, designed for melanoma classification, also utilized categorical cross-entropy as its loss function. However, the Area Under the Curve (AUC) was the chosen evaluation metric for identifying the most effective network iteration. For AUC calculation, the pixel exhibiting the highest melanoma probability within the segmentation was considered the output. During the training of this second network, only melanoma segmentation masks were used, whereas nevus images were paired with empty segmentation masks. The best-performing network was obtained at epoch 1189, with an AUC of 83.4% (Fig. 1C).
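The max-pixel scoring and AUC evaluation described above can be sketched in a few lines (here with a rank-based AUC that counts ties as half; function names are our own):

```python
def max_pixel_score(prob_map):
    """Image-level melanoma score: the highest per-pixel melanoma
    probability in the segmentation output, as described in the text."""
    return max(p for row in prob_map for p in row)

def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen melanoma
    (label 1) scores higher than a randomly chosen nevus (label 0),
    counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

maps = ([[0.1, 0.9]], [[0.2, 0.3]], [[0.8, 0.4]], [[0.1, 0.2]])
scores = [max_pixel_score(m) for m in maps]   # [0.9, 0.3, 0.8, 0.2]
print(auc(scores, [1, 0, 1, 0]))  # 1.0
```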
In the final step, a shallow random forest classifier was trained on the sorted probability outputs, applying a weighting factor of 10 to melanoma images. This strategy aimed to increase specificity, minimize false negatives, and leverage the entirety of the output data, not just the pixel with the highest probability. Incorporating this method raised the AUC to 88.5% on the validation set (Fig. 1D).
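The data preparation for this final classifier can be sketched as follows: each image is represented by its pixel probabilities sorted in descending order (so the forest sees the whole output distribution rather than only the maximum), and melanoma images receive a sample weight of 10. Only this preparation step is shown; the forest itself could be fit with, for example, scikit-learn's RandomForestClassifier, passing the weights via its `sample_weight` argument. Names are our own:

```python
def rf_features_and_weights(prob_maps, labels, melanoma_weight=10):
    """Build random-forest inputs from per-image segmentation outputs:
    sorted (descending) pixel probabilities as features, and a sample
    weight of `melanoma_weight` for melanoma images (label 1), 1 otherwise."""
    features = [sorted((p for row in m for p in row), reverse=True)
                for m in prob_maps]
    weights = [melanoma_weight if y == 1 else 1 for y in labels]
    return features, weights

feats, w = rf_features_and_weights([[[0.2, 0.9]], [[0.4, 0.1]]], [1, 0])
print(feats)  # [[0.9, 0.2], [0.4, 0.1]]
print(w)      # [10, 1]
```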
Validation of the algorithm
Statistical analysis and performance comparison
Statistical analysis was performed to compare the sensitivities and specificities of human observers (resident ophthalmologists, n = 6; consultant ophthalmologists, n = 3; and ocular oncologists, n = 3) against the gold standard diagnoses of choroidal melanoma or nevus. During the testing phase of fundus photograph assessment, both human evaluators and the algorithm were blinded to any additional patient and lesion information, including clinical diagnoses and follow-up histories. The Kruskal-Wallis test was used to assess overall differences among the groups for both sensitivity and specificity. Post-hoc pairwise comparisons were conducted using Dunn's test, with Bonferroni correction of P values. Mann-Whitney U tests were employed to compare the algorithm's performance with the aggregated sensitivities and specificities of human observers. The AUC of the algorithm was compared with the AUCs of the MOLES and TFSOM-UHHD scores using pairwise DeLong's tests. Bonferroni correction was applied to multiple comparisons. P values of less than 0.05 were considered to indicate statistical significance, and all P values were two-sided. Statistical significance and confidence intervals were calculated using SciPy (version 0.15.1) and R (version 4.2.2) with the stats, PMCMRplus, pROC, dunn.test, and dplyr packages.
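Of the adjustments above, the Bonferroni correction is simple enough to state exactly: each raw P value is multiplied by the number of comparisons and capped at 1. A minimal sketch:

```python
def bonferroni(p_values):
    """Bonferroni correction: multiply each P value by the number of
    comparisons and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

print(bonferroni([0.125, 0.25, 0.5, 0.3]))  # [0.5, 1.0, 1.0, 1.0]
```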