The training/validation dataset included 750 RNFL OCT B-scans from 250 eyes (60 eyes with NAION, 50 eyes with ON, and 140 control eyes). These study eyes were split into training (80% of the sample) and validation (20%) datasets.
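Because the split was made over eyes rather than individual scans, all B-scans of one eye should land in the same partition. The following is a minimal sketch of such an eye-level split, not the procedure actually used in the study: the function name `split_by_eye`, the scan and eye identifiers, the seed, and the assumption of exactly three scans per eye are all hypothetical, and the text does not say whether the split was stratified by diagnosis.

```python
import random

def split_by_eye(scan_to_eye, val_fraction=0.2, seed=0):
    """Split scans into training/validation sets at the eye level.

    scan_to_eye: dict mapping a scan ID to the ID of the eye it came from.
    All scans of a given eye end up in the same partition.
    """
    eyes = sorted(set(scan_to_eye.values()))
    rng = random.Random(seed)              # fixed seed for reproducibility
    rng.shuffle(eyes)
    n_val = round(len(eyes) * val_fraction)
    val_eyes = set(eyes[:n_val])
    train = [s for s, e in scan_to_eye.items() if e not in val_eyes]
    val = [s for s, e in scan_to_eye.items() if e in val_eyes]
    return train, val

# Hypothetical data: 250 eyes with 3 scans each (750 scans in total).
scans = {f"eye{e:03d}_scan{k}": f"eye{e:03d}" for e in range(250) for k in range(3)}
train_scans, val_scans = split_by_eye(scans)
print(len(train_scans), len(val_scans))    # 600 150
```

Splitting by eye rather than by scan avoids having near-identical scans of the same eye in both partitions, which would inflate validation performance.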
Performance of our algorithm was evaluated in 370 scans acquired in 132 eyes from the test set. There was no overlap between the training and testing sets.
After excluding 10 scans due to centration and quality issues, the final test set consisted of 194 OCT B-scans from 70 healthy eyes, 82 scans from 28 NAION eyes, and 84 scans from 29 ON eyes. The mean of the manually corrected average RNFL thickness (ground truth) was significantly lower in NAION (69.7 ± 19.3 μm) and ON (76.0 ± 16.2 μm) eyes than in control eyes (100.2 ± 10.8 μm) (P < 0.001, Kruskal-Wallis). There was no difference in average RNFL thickness between ON and NAION eyes (P = 0.96). NAION scans had lower OCT quality scores than ON scans (23.7 ± 4.5 vs 25.7 ± 4, P = 0.03, Kruskal-Wallis) and control scans (26.8 ± 4, P < 0.001). Across all datasets, the mean age at OCT scan was 62.8 ± 9.4 years in the NAION group, 30.8 ± 10.1 years in the ON group, and 42.3 ± 18.4 years in the controls.
We obtained our results in two steps: first, we evaluated the U-Net segmentation itself; second, we assessed the RNFL thickness measurements derived from the U-Net algorithm in the three groups (control, ON, NAION) on the test images [11,12].
Our U-Net model performed well on both the validation and test images. On the validation data set, the sensitivity and specificity of the proposed model were 0.91 and 0.90, respectively; on the test set, they were 0.88 and 0.86. The Dice coefficient between our proposed segmentation and manual segmentation by an expert was 0.90 for the validation data set and 0.87 for the test images.
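For reference, the three segmentation metrics can be computed from the confusion counts of a predicted and a reference mask. The sketch below assumes pixel-wise evaluation on binary masks (1 = RNFL, 0 = background); the text does not specify how the metrics were aggregated, and the function names and toy masks are illustrative only.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over two binary masks."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def segmentation_metrics(pred, truth):
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "sensitivity": tp / (tp + fn),        # share of RNFL pixels recovered
        "specificity": tn / (tn + fp),        # share of background pixels kept
        "dice": 2 * tp / (2 * tp + fp + fn),  # overlap between the two masks
    }

# Toy masks (hypothetical): rows are depth, columns are A-scans.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
metrics = segmentation_metrics(pred, truth)
print(metrics)  # {'sensitivity': 0.75, 'specificity': 0.875, 'dice': 0.75}
```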
We also compared the RNFL thickness estimates in seven sectors (average, nasal, temporal, superior-temporal, superior-nasal, inferior-temporal, inferior-nasal) obtained by three different methods (our U-Net algorithm, conventional OCT machine data, and the manually segmented best estimate determined by the ophthalmologist) in the three groups (Table 1):
Table 1
Comparison of RNFL thickness estimates (µm) in seven sectors by three different methods in the control, non-arteritic anterior ischemic optic neuropathy (NAION), and demyelinating optic neuritis (ON) groups.
| Parameter | Control: Ground truth | Control: OCT machine | Control: U-Net | NAION: Ground truth | NAION: OCT machine | NAION: U-Net | ON: Ground truth | ON: OCT machine | ON: U-Net |
|---|---|---|---|---|---|---|---|---|---|
| Average | 100.3 ± 10.9 | 100.2 ± 11.2 | 100.1 ± 10.8 | 69.7 ± 19.3 | 64.4 ± 21.3 | 70.5 ± 19.4 | 76.1 ± 16.2 | 75.8 ± 16.6 | 77.1 ± 16.1 |
| Nasal | 75.6 ± 12.6 | 75.4 ± 12.9 | 76.3 ± 12.3 | 54.5 ± 16.9 | 51.3 ± 23.9 | 55.5 ± 16.7 | 57.8 ± 14.6 | 57.8 ± 14.5 | 58.9 ± 14.5 |
| Temporal | 68.4 ± 10.8 | 68.4 ± 10.8 | 68.9 ± 10.9 | 51.9 ± 18.2 | 49.6 ± 21.1 | 53.1 ± 18.2 | 47.4 ± 16.2 | 47.3 ± 15.9 | 48.3 ± 15.9 |
| Nasal inferior | 117.1 ± 22.8 | 116.9 ± 23.1 | 117.8 ± 22.9 | 89.9 ± 40.0 | 89.1 ± 42.1 | 90.6 ± 39.7 | 91.5 ± 24.4 | 92.6 ± 24.3 | 92.6 ± 24.3 |
| Temporal inferior | 145.1 ± 22.2 | 145.1 ± 22.2 | 145.8 ± 22.0 | 101.7 ± 43.2 | 100.3 ± 45.7 | 102.9 ± 43 | 108.4 ± 33.1 | 108.3 ± 33 | 109.6 ± 32.9 |
| Nasal superior | 114.8 ± 20.8 | 115.1 ± 20.9 | 114.8 ± 22.4 | 70.7 ± 20.7 | 67 ± 27.1 | 71.9 ± 20.6 | 93 ± 23.2 | 92.6 ± 24.1 | 94.1 ± 23.1 |
| Temporal superior | 137.2 ± 18.1 | 137.2 ± 18.1 | 137.8 ± 18.1 | 82.3 ± 34.9 | 77.3 ± 36.2 | 83 ± 35.2 | 105.8 ± 25.8 | 105.4 ± 26.7 | 106.7 ± 25.8 |
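Sector values like those in Table 1 are derived from a segmented circle scan by converting the RNFL mask to a thickness profile and averaging it over angular sectors. The sketch below illustrates the idea under stated assumptions and is not the study's implementation: the axial pixel spacing (`AXIAL_UM_PER_PIXEL`), the sector boundaries in `SECTORS`, the choice of 0° at the temporal midline, and the toy mask are all hypothetical and would need to match the actual device and scan protocol.

```python
AXIAL_UM_PER_PIXEL = 3.87  # assumption: axial pixel spacing in µm; device-specific

# Assumption: sector boundaries in degrees along the circle scan, with 0° at the
# temporal midline. The layout used in the study is not given in the text.
SECTORS = {
    "temporal": (315, 45),
    "temporal superior": (45, 90),
    "nasal superior": (90, 135),
    "nasal": (135, 225),
    "nasal inferior": (225, 270),
    "temporal inferior": (270, 315),
}

def column_thickness_um(mask):
    """mask[r][c] is 1 where pixel (depth r, A-scan c) is labeled RNFL."""
    n_cols = len(mask[0])
    return [sum(row[c] for row in mask) * AXIAL_UM_PER_PIXEL for c in range(n_cols)]

def in_sector(angle, start, end):
    if start < end:
        return start <= angle < end
    return angle >= start or angle < end   # sector wraps through 0°

def sector_means(thickness):
    """Average the per-A-scan thickness globally and within each sector."""
    n = len(thickness)
    result = {"average": sum(thickness) / n}
    for name, (start, end) in SECTORS.items():
        vals = [t for i, t in enumerate(thickness)
                if in_sector(360.0 * i / n, start, end)]
        result[name] = sum(vals) / len(vals)
    return result

# Toy mask: 30 depth rows x 8 A-scans; the RNFL is 20 pixels thick in the
# first six A-scans and 10 pixels thick in the last two.
mask = [[1 if r < (20 if c < 6 else 10) else 0 for c in range(8)] for r in range(30)]
means = sector_means(column_thickness_um(mask))
for name, value in means.items():
    print(f"{name}: {value:.1f} µm")
```

A real circle scan has far more A-scans than this toy example, so every sector contains many columns; with very few columns a sector could be empty and the division would fail.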
1. RNFL thickness in the normal control group:
There was no significant difference in average RNFL thickness among the three methods of measurement (P = 0.69, ANOVA). The mean average RNFL thickness using our U-Net algorithm segmentation was not different from the manually segmented best estimate (ground truth) (101.1 ± 10.8 µm vs 100.2 ± 10.8 µm; P > 0.99). Similarly, there was no significant difference between the OCT machine average RNFL thickness (100.2 ± 11.2 µm) and the manually segmented value (P > 0.99). RNFL thickness from both the U-Net algorithm segmentation and the conventional OCT machine data was strongly correlated with RNFL thickness obtained from manual segmentation (r² = 0.99 and 0.98, respectively), with no significant difference between the two correlations (P = 0.33) (Figure 3). The mean absolute error (MAE) of the average RNFL thickness was 1.04 ± 0.74 μm and 0.18 ± 1.23 µm for the U-Net algorithm segmentation and the conventional OCT machine data, respectively. There was no significant difference between the two MAE values (P = 0.93).
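The agreement statistics reported here can be illustrated with a short sketch. It assumes that the MAE is the mean of the per-scan absolute differences from the ground truth, that the value after "±" is the sample standard deviation of those absolute differences, and that the correlation is Pearson's r; none of this is stated explicitly in the text, and the thickness values below are made up for illustration.

```python
import math

def mean_absolute_error(estimates, reference):
    """Return (mean, sample SD) of the per-scan absolute errors."""
    errors = [abs(e - r) for e, r in zip(estimates, reference)]
    mean = sum(errors) / len(errors)
    var = sum((x - mean) ** 2 for x in errors) / (len(errors) - 1)
    return mean, math.sqrt(var)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical average RNFL thickness values (µm) for four scans.
ground_truth = [100.0, 95.0, 110.0, 88.0]
unet = [101.0, 96.5, 110.5, 89.0]

mae, mae_sd = mean_absolute_error(unet, ground_truth)
r = pearson_r(unet, ground_truth)
print(f"MAE = {mae:.2f} ± {mae_sd:.2f} µm, r = {r:.3f}")
```

Note that a small difference between group means does not by itself imply a small MAE, because positive and negative per-scan errors cancel in the mean but not in the MAE.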
2. RNFL thickness in the NAION group:
The Kruskal-Wallis test showed a significant difference in average RNFL thickness among the three methods of measurement (P = 0.02). While the mean average RNFL thickness using our U-Net algorithm was not different from the manually segmented value (ground truth) (70.5 ± 19.4 µm vs 69.7 ± 19.3 µm, respectively; P > 0.99), the OCT machine RNFL thickness (64.4 ± 21.3 µm) was lower than the manually segmented value (P = 0.04). Furthermore, a significant difference was also found between the U-Net RNFL thickness and the OCT machine thickness (P = 0.009). The correlation between the manually segmented RNFL thickness and the U-Net average RNFL thickness (r = 0.99) was stronger than the correlation between the manually segmented RNFL thickness and the OCT machine RNFL thickness (r = 0.95) (P = 0.02) (Figures 3 and 4). The MAE of the average RNFL thickness was 1.18 ± 0.69 μm and 6.65 ± 5.37 μm for the U-Net algorithm segmentation and the conventional OCT machine data, respectively. There was a significant difference between the two MAE values (P = 0.0001). Specifically, the MAEs for the nasal, nasal superior, and temporal superior RNFL thicknesses with U-Net segmentation were 1.48 ± 1.26 μm, 1.64 ± 2.19 μm, and 1.69 ± 1.38 μm, respectively. The MAEs for the corresponding sectors with the conventional OCT machine were 6.16 ± 10.02 μm, 5.08 ± 10.47 μm, and 6.26 ± 13.16 μm, respectively.
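As a quick arithmetic check of the direction of the bias, the NAION group means reported in Table 1 can be subtracted from one another. This is only a sketch over the published group means (the dictionary and function names are illustrative); it yields the difference of means per sector, which is not the same quantity as the per-scan MAE quoted above.

```python
# Group means (µm) for NAION eyes, copied from Table 1:
# (ground truth, OCT machine, U-Net).
naion_means = {
    "average": (69.7, 64.4, 70.5),
    "nasal": (54.5, 51.3, 55.5),
    "temporal": (51.9, 49.6, 53.1),
    "nasal inferior": (89.9, 89.1, 90.6),
    "temporal inferior": (101.7, 100.3, 102.9),
    "nasal superior": (70.7, 67.0, 71.9),
    "temporal superior": (82.3, 77.3, 83.0),
}

def mean_differences(table):
    """Return (machine - ground truth, U-Net - ground truth) per sector."""
    return {
        sector: (round(machine - gt, 1), round(unet - gt, 1))
        for sector, (gt, machine, unet) in table.items()
    }

diffs = mean_differences(naion_means)
for sector, (d_machine, d_unet) in diffs.items():
    print(f"{sector}: OCT machine {d_machine:+.1f} µm, U-Net {d_unet:+.1f} µm")
# average: OCT machine -5.3 µm, U-Net +0.8 µm
```

In every sector the OCT machine mean lies below the ground-truth mean, while the U-Net mean lies about 1 µm above it.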
3. RNFL thickness in the ON group:
The average RNFL thickness was not significantly different among the three methods of measurement (P = 0.66, Kruskal-Wallis). The mean average RNFL thickness was 76.1 ± 16.2 µm with manual segmentation, 77.1 ± 16 µm with U-Net algorithm segmentation, and 75.9 ± 16.6 µm with the OCT machine. The average RNFL thicknesses from both U-Net algorithm segmentation and the conventional OCT machine data were strongly correlated with the RNFL thickness obtained from manual segmentation, without a significant difference between them (r = 0.99 and 0.99, respectively; P = 0.33) (Figure 3). The MAE of the average RNFL thickness was 0.2 ± 1 μm with the OCT machine and 1.2 ± 0.71 μm with U-Net segmentation, without a significant difference between them (P = 0.93).
This study investigated quantification of peripapillary RNFL thickness on OCT with deep learning (U-Net) in NAION eyes, ON eyes, and controls. We compared the manually segmented (ground-truth) estimate of RNFL thickness with our U-Net algorithm and conventional OCT machine data. First, we showed high performance of our U-Net model in the test and validation images. The Dice coefficient between our model and manual segmentation was 0.90 for the validation data set and 0.87 for the test images. Second, the RNFL thickness from both U-Net algorithm segmentation and the conventional OCT machine data was similar to the RNFL thickness obtained from manual segmentation in control and ON eyes. However, in NAION eyes, the mean average RNFL thickness using the OCT machine was different from that of manual segmentation (Figure 5). In these eyes, the RNFL thickness from our U-Net algorithm was not different from the manually segmented value (ground truth).
Errors in segmentation of the RNFL and in estimates of its thickness are not uncommon and could lead to disease misdiagnosis. Several studies have demonstrated high rates of errors in peripapillary RNFL segmentation in glaucomatous eyes.[4-6,13,14] Mansberger et al.[5] found that automated OCT machine data resulted in a 1.6 μm thinner RNFL thickness than the ground-truth measurements determined by manual refinement. Manual refinement changed 8.5% of scans to a different global glaucoma classification, and 23.7% of borderline classifications became normal. A few studies have shown RNFL segmentation problems in neuro-ophthalmology and their impact on disease follow-up.[7] In this study we showed that the mean global RNFL thickness obtained from OCT machine automated segmentation had an MAE of 6.65 ± 5.37 μm in NAION eyes compared to ground-truth manual segmentation. However, this error was significantly lower in ON eyes and controls. The reasons for segmentation error in NAION eyes are multifactorial. Another study found three common sources of RNFL imaging artifacts: posterior vitreous detachments, high myopia, and epiretinal membranes, with the third being the most common culprit.[13] Our NAION eyes, which were older than ON and control eyes, likely had a higher frequency of vitreous detachment and epiretinal membrane. Other studies also indicated that the difference between the automated and ground-truth thickness increased with older age, thinner RNFL thickness, and lower scan quality.[5] Miki et al.[14] also showed a 20.7% segmentation failure rate in glaucoma eyes, which was significantly correlated with low signal strength index and large disc area. Of note, our NAION scans had lower scan quality than ON and control scans. It seems that sub-optimal scan quality reduces the accuracy of automated segmentation.
A reduction in signal strength from a media opacity such as dry eye, corneal opacities, cataract, or vitreous opacities can result in artifacts in layer segmentation and interpretation.[7]
Several studies have used deep learning algorithms for retinal segmentation in normal and age-related macular degeneration eyes.[15-17] In optic nerve head scans, Devalla et al.[18] developed a DL algorithm which achieved good accuracy when compared to manual segmentation. Jammal et al.[9] also developed a DL algorithm that detects errors in RNFL segmentation in glaucoma and normal eyes. In that study, the test sample consisted of scans with at least one RNFL segmentation error and scans without error as defined by a human grader, and the algorithm was trained to output the probability of a segmentation error in test data. For a probability cut point of 0.5, the DL algorithm was 95.0% sensitive and correctly identified 1,172 of the 1,234 scans that had any segmentation error(s) in the test sample. The same group in another study predicted the RNFL thickness from raw unsegmented scans using DL.[19] In images without segmentation errors, they found a high correlation of segmentation-free DL RNFL predictions with conventional OCT RNFL thickness calculations. In low-quality images with segmentation errors, segmentation-free DL predictions had higher correlation with the best available estimate compared to those from the conventional OCT machine. The MAE was 4.98 ± 5.85 μm for DL RNFL estimates and 8.59 ± 11.26 μm for OCT machine estimates.[19] However, the ground truth in their study was the best available estimate from a good quality scan of the conventional OCT, rather than manually segmented data as in our work. We found that the MAE of the average RNFL thickness in NAION eyes, which had lower scan quality, was 1.18 ± 0.69 μm and 6.65 ± 5.37 μm for the U-Net algorithm segmentation and the OCT machine data, respectively.
Interestingly, in ON eyes and controls, the average RNFL thicknesses from both U-Net algorithm segmentation and the conventional OCT machine data were strongly correlated with the RNFL thickness obtained from manual segmentation, without a significant difference between them.
Our study had several limitations. First, our data set was smaller than those in glaucoma studies, which is expected in light of the relative rarity of other optic neuropathies compared to glaucoma. In addition, we do not know whether the segmentation performance would improve if the model were trained on a larger data set. Second, our U-Net was trained with images from the Spectralis OCT machine, and therefore we cannot extrapolate our algorithm to scans from other OCT devices. Finally, our supervised DL algorithm can only be as good as the manual segmentation by an ophthalmologist on which it was trained, and that segmentation is itself subject to bias.
Overall, using U-Net, we were able to segment the RNFL in three groups of eyes. When trained and tested on compensated images, there was good correlation with manual segmentation in control eyes, ON eyes, and NAION eyes. In contrast, conventional OCT machine segmentations were prone to errors in NAION eyes, resulting in inaccurate RNFL thickness measurements. In addition, in lower quality scans, our U-Net segmentation performance was similar to ground truth, and therefore this algorithm may provide robust RNFL thickness estimates both in good quality images and in those that are prone to segmentation errors, such as may occur in NAION eyes. Such an algorithm could be helpful in clinical practice for assessing RNFL thickness in NAION eyes as well as ON eyes.