3.1. Data
We evaluated our proposed model on a dataset of glaucoma and normal images collected at the Singapore Eye Research Institute, a specialized eye care institution in Singapore. The study adhered to the ethical guidelines of the Declaration of Helsinki and was approved by the SingHealth Centralized Institutional Review Board. All participants gave written consent before taking part in the study. Wide-field OCT imaging was performed with a prototype swept-source system (PlexElite SS-OCT, Zeiss Meditec, Dublin, CA), which has a field of view of up to 56° and an imaging depth of 3 mm, to acquire volumetric OCT scans using a 15 mm x 9 mm scan protocol. Only OCT volumes with a signal strength of 6 or above and no substantial artefacts, as determined by a manual grader, were included. Each volume, as depicted in Fig. 1, contained 834 cross-sectional B-scan images, each with a resolution of 500 x 512 pixels. Because the retinal layer structure differs at the optic nerve head, B-scans that included the optic nerve area were excluded.
The acquired OCT dataset comprised 74 volumetric scans: 19 volumes without glaucoma from 19 healthy participants and 55 volumes with glaucoma from 49 glaucoma patients. No other ocular diseases were present in the participants. A summary of the participant characteristics can be found in Supplementary Table S1. Of the 74 volumes, ground truth labels were available for those from 13 healthy eyes and 9 glaucoma eyes. The ground truth for these volumes was established through manual annotation by a single grader masked to the disease status, who labelled six distinct layers: the retinal nerve fiber layer (RNFL), the combined ganglion cell and inner plexiform layers (GCL-IPL), the combined inner nuclear and outer plexiform layers (INL-OPL), the outer nuclear layer (ONL), the interface from inner segment/outer segment to retinal pigment epithelium (IS/OS-RPE), and the choroidal layer. Detailed illustrations of each layer are presented in Fig. 1(c). The labelled dataset was further split into a training subset of 12 volumes (8 healthy, 4 glaucoma), a validation subset of 6 volumes (3 healthy, 3 glaucoma), and a test subset of 4 volumes (2 healthy, 2 glaucoma). An illustration of the distribution is provided in Supplementary Figure S1. The cross-sectional B-scans from the labelled dataset, totalling 11,750 images, were split at the participant level, such that all images from one participant remained in the same training, validation, or test subset. In total, there were 6,172 images in the training subset, 3,184 images in the validation subset, and 2,394 images in the test subset. The remaining 52 volumes formed an unlabelled dataset of 29,016 B-scans.
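The participant-level split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `(participant_id, image_id)` record layout, and the toy identifiers are all hypothetical.

```python
from collections import defaultdict

def split_by_participant(records, assignment):
    """Group B-scan records into subsets so no participant spans two subsets.

    records: iterable of (participant_id, image_id) pairs (hypothetical layout).
    assignment: dict mapping participant_id -> "train" | "val" | "test".
    """
    subsets = defaultdict(list)
    for participant_id, image_id in records:
        subsets[assignment[participant_id]].append((participant_id, image_id))
    return dict(subsets)

def participants_are_disjoint(subsets):
    """Return True if no participant appears in more than one subset."""
    seen = {}
    for name, items in subsets.items():
        for participant_id, _ in items:
            # setdefault records the first subset a participant was seen in.
            if seen.setdefault(participant_id, name) != name:
                return False
    return True

# Toy example with made-up identifiers (not the study data).
records = [("p1", "a"), ("p1", "b"), ("p2", "c"), ("p3", "d"), ("p3", "e")]
assignment = {"p1": "train", "p2": "val", "p3": "test"}
subsets = split_by_participant(records, assignment)
```

Assigning whole participants rather than individual B-scans avoids leakage, since neighbouring B-scans from the same eye are highly similar.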
To compare our segmentation results with clinical measurements, we acquired an independent set of 15 glaucomatous eyes and 15 normal eyes that were imaged with both the SS-OCT system and a clinically used spectral-domain OCT (SD-OCT) system (Cirrus SD-OCT, Zeiss Meditec, Dublin, CA). Characteristics of the data used for this comparison are provided in Supplementary Table S2. For the SD-OCT system, a 200 x 200 macular protocol, covering a 6 mm x 6 mm region centered on the macula, was used. Using the built-in review software in the SD-OCT system, we extracted measurements of the RNFL and GCL-IPL based on an annulus with an inner diameter of 1 mm vertically and 1.2 mm horizontally, and an outer diameter of 4 mm vertically and 4.8 mm horizontally, centered on the macula (Supplementary Figure S2) [50]. Measurements of superior, inferior, and global thicknesses were obtained. The same measurement protocol was applied to the segmented SS-OCT images of the same eyes. The locations of the macula centers in the SS-OCT scans were determined manually by an experienced grader by assessing the anatomical structure in the B-scans.
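As an illustration of the annulus-based measurement, the sketch below averages a thickness map over an elliptical annulus with the dimensions given above. It is a simplified example under stated assumptions: the thickness map is a 2D list with rows ordered from superior to inferior, the pixel spacing is known, and the function names and toy values are hypothetical rather than taken from the SD-OCT review software or the authors' pipeline.

```python
def in_annulus(dx_mm, dy_mm):
    """True if an offset from the macula center lies in the elliptical annulus.

    Semi-axes are half the diameters in the text: inner 0.6 mm (horizontal)
    by 0.5 mm (vertical), outer 2.4 mm (horizontal) by 2.0 mm (vertical).
    """
    inner = (dx_mm / 0.6) ** 2 + (dy_mm / 0.5) ** 2
    outer = (dx_mm / 2.4) ** 2 + (dy_mm / 2.0) ** 2
    return inner >= 1.0 and outer <= 1.0

def regional_means(thickness_map, center, spacing_mm):
    """Mean thickness in the superior half, inferior half, and whole annulus.

    thickness_map: 2D list, rows ordered superior to inferior (assumed).
    center: (row, col) index of the macula center.
    spacing_mm: (row_spacing, col_spacing) in mm per pixel.
    """
    sums = {"superior": [0.0, 0], "inferior": [0.0, 0], "global": [0.0, 0]}
    for r, row in enumerate(thickness_map):
        for c, value in enumerate(row):
            dy = (center[0] - r) * spacing_mm[0]  # positive = superior
            dx = (c - center[1]) * spacing_mm[1]
            if not in_annulus(dx, dy):
                continue
            regions = ["global"]
            if dy > 0:
                regions.append("superior")
            elif dy < 0:
                regions.append("inferior")
            for name in regions:
                sums[name][0] += value
                sums[name][1] += 1
    return {k: (s / n if n else float("nan")) for k, (s, n) in sums.items()}

# Toy 51 x 51 map at 0.1 mm spacing: thicker above the center row than below.
toy_map = [[100.0 if r < 25 else 90.0 if r == 25 else 80.0 for _ in range(51)]
           for r in range(51)]
means = regional_means(toy_map, center=(25, 25), spacing_mm=(0.1, 0.1))
```

Pixels on the horizontal meridian count toward the global mean only; how the clinical software treats that boundary is not stated in the text.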
3.2. Experimental results
Our study implemented a semi-supervised learning approach using cross-teaching to segment OCT images into six distinct layers: RNFL, GCL-IPL, INL-OPL, ONL, IS/OS-RPE, and the choroidal layer. In this setup, segmentation is performed by two distinct networks, a CNN-based UNet and a Transformer-based Swin-UNet, trained on a Tesla V100 DGXS 32GB GPU using PyTorch. Training used the SGD optimizer with a batch size of 4 for up to 200k iterations. The initial learning rate was set to 10⁻⁴ and adjusted using a poly learning rate strategy. Training incorporated both labelled and unlabelled data. Labelled data were supervised with their individual ground truth, with losses computed using cross-entropy and Dice loss. For the unlabelled data, model predictions were converted to pseudo labels through the argmax function and used to compute the cross-teaching loss. The overall loss combines the supervised loss with the cross-teaching loss, modulated by a Gaussian warm-up function.
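The two schedules mentioned above can be sketched in plain Python. This is a hedged illustration, not the authors' training code: the poly exponent of 0.9, the ramp-up form exp(-5(1 - t)²), the ramp-up length, and the maximum consistency weight are common choices assumed here, and the text does not state which values were used. The function names are hypothetical.

```python
import math

def poly_lr(base_lr, iteration, max_iterations, power=0.9):
    """Polynomial ("poly") learning-rate decay; power=0.9 is an assumption."""
    return base_lr * (1.0 - iteration / max_iterations) ** power

def gaussian_rampup(iteration, rampup_iterations):
    """Gaussian warm-up weight rising from exp(-5) to 1.

    Uses the widely used form exp(-5 * (1 - t)^2); the constant 5 and the
    ramp-up length are assumptions, not values from the text.
    """
    if rampup_iterations <= 0:
        return 1.0
    t = min(max(iteration / rampup_iterations, 0.0), 1.0)
    return math.exp(-5.0 * (1.0 - t) ** 2)

def total_loss(supervised_loss, cross_teaching_loss, iteration,
               rampup_iterations, max_weight=0.1):
    """Supervised loss plus the warm-up-weighted cross-teaching loss."""
    weight = max_weight * gaussian_rampup(iteration, rampup_iterations)
    return supervised_loss + weight * cross_teaching_loss

# Learning rate halfway through 200k iterations, starting from 1e-4.
lr_mid = poly_lr(1e-4, 100_000, 200_000)
```

Weighting the cross-teaching term with a warm-up keeps unreliable early pseudo labels from dominating the loss before the networks have learned from the labelled data.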
We evaluated the efficacy of our approach by comparing it with supervised UNet and Swin-UNet baseline models. The supervised models were trained only on the labelled dataset, using 12 volumes for training, 6 for validation, and 4 for testing. To enhance the training process, we introduced an additional 60 unlabelled volumes through the cross-teaching approach. For each layer, the model with the lowest validation loss was selected as the best model. The performance of the segmentation models was evaluated using the Intersection over Union (IoU) metric, which quantifies the extent of overlap between the model predictions and the ground truth mask. It is calculated using the formula:
$$IoU(P,Y) = \frac{|P \cap Y|}{|P \cup Y|} \tag{1}$$
where P denotes the set of pixels in the predicted segmentation and Y denotes the set of pixels in the ground truth segmentation.
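The IoU definition above can be computed for a single layer as in the following sketch. It assumes binary masks stored as 2D lists of 0/1; the function name and the handling of two empty masks are choices made for this illustration only.

```python
def iou(pred_mask, true_mask):
    """Intersection over Union for two binary masks given as 2D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, y in zip(pred_row, true_row):
            intersection += 1 if (p and y) else 0
            union += 1 if (p or y) else 0
    # Both masks empty: define IoU as 1.0 (a convention assumed here).
    return intersection / union if union else 1.0

pred = [[0, 1, 1],
        [0, 1, 0]]
true = [[0, 1, 0],
        [1, 1, 0]]
score = iou(pred, true)  # intersection = 2 pixels, union = 4 pixels
```

For a multi-layer segmentation, the same function would be applied to the binary mask of each layer in turn.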
Figure 2 shows the performance of cross-teaching SSL between UNet and Swin-UNet for RNFL layer segmentation as the amount of labelled training data varies. Initially, the model was trained with 3 labelled volumes and 60 unlabelled volumes, achieving an IoU score of 0.799 ± 0.046. Increasing the number of labelled volumes to 6, 9, and finally 12 led to significant improvements in IoU scores (P-value < 0.001), reaching 0.826 ± 0.042, 0.841 ± 0.037, and 0.857 ± 0.037, respectively. Because these IoU scores were measured on the test set, the increase with more labelled data indicates improved model generalization rather than overfitting.
Table 1 presents the results of an ablation study on RNFL layer segmentation, which compares different semantic segmentation models and training approaches. The supervised baselines are UNet and Swin-UNet. The study also evaluates semi-supervised learning with UNet and Swin-UNet backbones, and cross-teaching with semi-supervised learning between the UNet and Swin-UNet models (CT-SSL). Among these, CT-SSL achieved the highest IoU score, indicating superior accuracy in segmenting the RNFL layer.
Encouraged by these findings, we extended the cross-teaching SSL approach to segment all six OCT layers using the 12 labelled training volumes. The IoU scores in Table 2 demonstrate the effectiveness of the CT-SSL model compared with the baseline models, UNet and Swin-UNet, with all layers showing significant improvements (P-values < 0.001). The choroid layer achieved the highest IoU score, increasing from 0.914 ± 0.062 (UNet) and 0.918 ± 0.032 (Swin-UNet) to 0.945 ± 0.033 with our CT-SSL model. These results underscore the ability of the CT-SSL model to segment different retinal layers within the OCT images. Figure 3 illustrates the segmentation results achieved by our model for the six OCT layers: RNFL, GCL-IPL, INL-OPL, ONL, IS/OS-RPE, and the choroid. For each layer, the 3D views in Fig. 3(a) are generated from the segmented results derived from a consistent set of B-scans. Figure 3(b) displays a side-by-side comparison between our model's segmentations and the corresponding ground truth data: the left column presents the 'Ground Truth' for each layer, serving as a reference, while the right column shows the 'Predictions' made by our model, with IoU scores provided alongside the detected layers. A detailed 3D rotation and orientation of the OCT scan and the segmented layers can be found in Supplementary Video S1. Additionally, the thickness maps for this volumetric scan, associated with the RNFL and GCL-IPL layers, are provided in Supplementary Figure S3.
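The model comparisons reported here rely on paired t-tests. As a minimal sketch of how the test statistic is formed from matched scores, the code below computes t and the degrees of freedom in plain Python. The pairing unit (for example, per-image IoU) is an assumption, since the text does not specify it; the function name and sample values are made up for illustration, and converting t to a P-value would additionally require the t-distribution, which is not shown.

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on matched scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the paired differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / math.sqrt(var_diff / n)
    return t, n - 1

# Made-up IoU values for illustration only (not the study data).
model_a = [0.86, 0.88, 0.84, 0.90, 0.87]
model_b = [0.80, 0.85, 0.79, 0.86, 0.84]
t_value, dof = paired_t_statistic(model_a, model_b)
```

A paired test is appropriate here because both models are scored on the same test images, so the per-image differences remove between-image variability.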
Table 1
Ablation study for RNFL layer segmentation

| Model | IoU | P-value* |
| --- | --- | --- |
| UNet | 0.747 ± 0.134 | < 0.001 |
| Swin-UNet | 0.837 ± 0.033 | < 0.001 |
| UNet with SSL | 0.839 ± 0.048 | < 0.001 |
| Swin-UNet with SSL | 0.831 ± 0.035 | < 0.001 |
| Cross-Teaching (UNet and Swin-UNet) with SSL (CT-SSL) | 0.857 ± 0.037 | - |

SSL: Semi-Supervised Learning, IoU: Intersection over Union
Values represent mean IoU and standard deviation
*P-value represents significance testing using a paired t-test comparing CT-SSL with the other models
Table 2
Performance evaluation for each layer with respect to manual ground truth annotation

| Layer | CT-SSL | UNet | P-value* | Swin-UNet | P-value** |
| --- | --- | --- | --- | --- | --- |
| RNFL | 0.857 ± 0.037 | 0.747 ± 0.134 | < 0.001 | 0.837 ± 0.033 | < 0.001 |
| GCL-IPL | 0.858 ± 0.043 | 0.782 ± 0.119 | < 0.001 | 0.830 ± 0.036 | < 0.001 |
| INL-OPL | 0.882 ± 0.029 | 0.824 ± 0.087 | < 0.001 | 0.858 ± 0.022 | < 0.001 |
| ONL | 0.914 ± 0.018 | 0.893 ± 0.053 | < 0.001 | 0.905 ± 0.015 | < 0.001 |
| IS/OS-RPE | 0.925 ± 0.025 | 0.890 ± 0.058 | < 0.001 | 0.904 ± 0.019 | < 0.001 |
| Choroid | 0.945 ± 0.033 | 0.914 ± 0.062 | < 0.001 | 0.918 ± 0.032 | < 0.001 |

CT-SSL: Cross-Teaching Semi-Supervised Learning
Values represent mean IoU and standard deviation
*P-value of paired t-test comparing CT-SSL and UNet
**P-value of paired t-test comparing CT-SSL and Swin-UNet
Table 3 compares, for the same eyes, the RNFL and GCL-IPL thicknesses obtained from our deep learning model with those from a clinical SD-OCT system. The measurements are divided into three regions within the elliptical annulus (superior, inferior, and global) and reported separately for normal and glaucomatous eyes. Scatter plots of our measurements against those from the SD-OCT system for the global RNFL and GCL-IPL thicknesses are shown in Figure 4. The left plot shows a moderate correlation for RNFL (Pearson's r = 0.67, P-value < 0.001), while the right plot shows a stronger correlation for GCL-IPL (Pearson's r = 0.91, P-value < 0.001), highlighting the model's accuracy in identifying this layer, which plays an important role in the clinical diagnosis and management of glaucoma.
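Pearson's r between two sets of paired thickness measurements can be computed as in the sketch below. The function name and the sample thickness values are hypothetical and are not the study measurements.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up global thicknesses in micrometres, one value per eye.
ss_oct_thickness = [80.0, 75.0, 70.0, 62.0]
sd_oct_thickness = [82.0, 74.0, 73.0, 60.0]
r = pearson_r(ss_oct_thickness, sd_oct_thickness)
```

Note that r measures linear association only: two devices can correlate strongly while still differing by a systematic offset in absolute thickness.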
Table 3: Correlation of Segmented RNFL and GCL-IPL Layer Mean Thickness with Clinical Measurements from SD-OCT
RNFL: Retinal Nerve Fiber Layer; GCL-IPL: Ganglion Cell Layer to Inner Plexiform Layer
Pearson’s correlation values of the segmented mean thickness with respect to the clinical measurements in the corresponding regions are provided