Study Population and Data Source
We conducted a series of retrospective studies using cases from the ophthalmology department of the Second Affiliated Hospital of Harbin Medical University, collected from September 2014 to January 2021 and including early POAG eyes, suspected POAG eyes, and healthy eyes. This research followed the tenets of the Declaration of Helsinki.
All subjects met the following inclusion criteria: best corrected visual acuity ≥ 0.5 logMAR, refractive error between +3 and −6 diopters, open angles on gonioscopy, and reliable SAP exams with false-positive errors, false-negative errors, and fixation losses each below 15%. Patients with other retinal diseases, secondary glaucoma, other causes of optic neuropathy, a history of intraocular surgery, or systemic diseases such as diabetes and hypertension were excluded.
The inclusion criteria for the early POAG group were: 24-hour intraocular pressure (IOP) fluctuation ≥ 8 mmHg before the use of IOP-lowering drugs; glaucomatous optic disc lesions (defined as a cup-to-disc ratio > 0.6, cup-to-disc ratio asymmetry > 0.2, focal defects of the neuroretinal rim, or disc hemorrhage); and glaucomatous visual field damage with an MD value better than −6 dB.
The inclusion criteria for the POAG suspects group were: a suspicious increase in cup-to-disc ratio and/or IOP > 21 mmHg measured at least twice (after corneal thickness correction), with RNFL thinning suggested on OCT nerve fiber layer scans but no visual field defects characteristic of glaucoma. The exclusion criteria were the same as for the early POAG group.
The inclusion criteria for the healthy group were: intraocular pressure between 10 and 21 mmHg (after corneal thickness correction), no family history of glaucoma, a normal visual field, and a normal optic disc.
All participants underwent a comprehensive ophthalmologic examination, including visual acuity, optometry, slit-lamp microscopy, gonioscopy, central corneal thickness, IOP measurement, SAP (SITA Fast 24-2, size III stimulus, Allergan Humphrey 750, Carl Zeiss Inc., Dublin, CA), and OCT (Spectralis, Heidelberg Engineering, Germany). IOP was measured three times with a non-contact tonometer and the values were averaged. The IOP of POAG patients was measured while they were on IOP-lowering medication.
Fundus photographs were taken without mydriasis (Canon CR4-45 NM, Japan). All participants underwent SAP testing at least twice, with the first session serving as a trial; only reliable visual field results were used. The SAP parameters obtained were the visual field index (VFI), mean deviation (MD), and pattern standard deviation (PSD). All OCT scans were performed, without mydriasis, by ophthalmologists with more than 5 years of experience in OCT scanning, along a circle 3.46 mm in diameter centered on the optic disc. RNFL thickness (RNFLt) values were obtained for 6 sectors (temporal, nasal, superior temporal, superior nasal, inferior temporal, and inferior nasal). All examinations for the same patient were completed within 1 week.
The DL-ML Framework
Three glaucoma specialists (each with over 5 years of experience) manually marked the OD and OC in each photograph with patient information hidden, and a senior specialist with over 10 years of experience in glaucoma then reviewed the annotations. A deep learning method was used to segment the optic nerve head based on the ophthalmologists' annotations (Fig. 1). The DL model then computed the vertical cup-to-disc ratio (VCDR) and the horizontal cup-to-disc ratio (HCDR), and determined whether the inferior > superior > nasal > temporal (ISNT) rule was violated. MLCs were used to obtain the final classification results. From the examinations above, 15 features were used to test and verify the MLCs, including age, the 6 sectoral RNFLt values, VFI, MD, PSD, IOP, HCDR, VCDR, and the rim comparison results obtained by FP segmentation (Fig. 2).
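The VCDR/HCDR computation and the ISNT check can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the binary-mask representation and the `rim` dictionary of sectoral rim widths are assumptions made for the example.

```python
import numpy as np

def cup_disc_ratios(disc_mask, cup_mask):
    """Vertical and horizontal cup-to-disc ratios from binary (rows, cols) masks."""
    def extent(mask, axis):
        # indices of rows (axis=1) or columns (axis=0) containing the structure
        idx = np.flatnonzero(mask.any(axis=axis))
        return idx.max() - idx.min() + 1
    vcdr = extent(cup_mask, 1) / extent(disc_mask, 1)   # vertical spans
    hcdr = extent(cup_mask, 0) / extent(disc_mask, 0)   # horizontal spans
    return vcdr, hcdr

def isnt_violated(rim):
    """ISNT rule: inferior >= superior >= nasal >= temporal rim width."""
    return not (rim["I"] >= rim["S"] >= rim["N"] >= rim["T"])
```

For example, a disc mask spanning 10 rows with a cup mask spanning 4 of them gives a VCDR of 0.4; a rim profile thinnest inferiorly would violate the ISNT rule.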
The overall software environment was Python 3.6 (Anaconda, Inc.). The segmentation framework was implemented in Python with Keras on the TensorFlow backend. We used the Adam optimizer with an initial learning rate of 1e-3. Two NVIDIA GeForce RTX 2080 GPUs supported the computation with a mini-batch size of 4, and the segmentation network was trained for 200 epochs in each fold. The classifiers were programmed mainly with the scikit-learn (sklearn) package. In the first stage, the main segmentation code was based on Wang's research. We used the DeepLabv3+ segmentation network with a MobileNetV2 backbone. In the encoder, we applied channel attention before the skip-add branch in every block; this strengthens informative features and suppresses uninformative ones, making the network focus on the information most useful for the segmentation task. In the decoder, to make full use of both low-level and high-level information and to reduce harmful influence from the encoder, we reduced the number of channels of the low-level features and added channel attention after the features were combined, so that the more important information was transmitted to the deeper modules for better segmentation. The region of interest (ROI) was defined as the region containing the optic disc; we located and cropped it to 512 × 512 pixels. The ROI images were fed into the segmentation network to train the DLC and produce masks of the ROI images. The FP sets were tested with 8-fold cross-validation (training : validation : test = 6 : 1 : 1). Finally, we drew the mask edges on the original images to show the segmentation performance.
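The text does not give the exact form of the channel attention, so the following NumPy sketch shows the common squeeze-and-excitation variant (global average pooling, a bottleneck MLP, and a sigmoid gate) to convey the idea; the weight shapes and reduction ratio here are illustrative assumptions, not the authors' values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation gating on an (H, W, C) feature map.

    w1: (C, C//r) squeeze weights; w2: (C//r, C) excitation weights."""
    z = feat.mean(axis=(0, 1))                 # squeeze: global average pool -> (C,)
    s = sigmoid(np.maximum(z @ w1, 0.0) @ w2)  # excite: ReLU bottleneck + sigmoid gate
    return feat * s                            # rescale channels (broadcast over H, W)

# toy feature map with C = 8 channels and reduction ratio r = 2
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 4, 8))
w1 = rng.normal(size=(8, 4))
w2 = rng.normal(size=(4, 8))
out = channel_attention(feat, w1, w2)
```

Because the gate lies in (0, 1), each channel is rescaled by its learned importance, which is what lets the network emphasize features useful for segmentation.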
In the second stage, we trained and tested the following algorithms: Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and Multilayer Perceptron (MLP). More concretely, the MLP had one hidden layer of 512 units. All 4 machine learning methods were tested with 5-fold cross-validation.
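The second stage can be sketched with scikit-learn as follows, using synthetic data in place of the clinical features; apart from the 512-unit hidden layer stated above, the hyperparameters are placeholder defaults, not the authors' settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# synthetic stand-in for the 15-feature, 3-class (healthy / suspect / early POAG) data
X, y = make_classification(n_samples=300, n_features=15, n_informative=8,
                           n_classes=3, random_state=0)

classifiers = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(hidden_layer_sizes=(512,),  # one hidden layer, 512 units
                         max_iter=500, random_state=0),
}

cv_scores = {}
for name, clf in classifiers.items():
    # standardize features, then 5-fold cross-validated accuracy
    pipe = make_pipeline(StandardScaler(), clf)
    cv_scores[name] = cross_val_score(pipe, X, y, cv=5)
```

Wrapping each classifier in a pipeline ensures the scaler is fit only on each fold's training split, avoiding leakage into the validation folds.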
Dice, accuracy, sensitivity, specificity, F1-score, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUROC) were calculated as evaluation metrics for each method. The formulas and meanings of these metrics are explained in Supplementary File 1.
An AUROC of 1 indicates a perfect model; AUROC was calculated by averaging the performance over the 5-fold cross-validation. Accuracy was obtained by summing the number of correctly classified samples across the three categories and dividing by the total number of samples. To calculate sensitivity, specificity, F1-score, and MCC, the data were binarized as follows: group 0 set POAG suspects as positive samples, with early POAG and healthy eyes as negative samples; group 1 set healthy eyes as positive, with early POAG and POAG suspects as negative; group 2 set early POAG as positive, with POAG suspects and healthy eyes as negative. Three confusion matrices were generated, and the results were calculated by averaging the performance over these three groups. Feature importance was assessed with RF.
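The grouping above is a one-vs-rest binarization. A minimal sketch of how the averaged sensitivity, specificity, and MCC could be computed, assuming the illustrative label encoding 0 = POAG suspect, 1 = healthy, 2 = early POAG:

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, and MCC with `positive` as the positive class."""
    t = np.asarray(y_true) == positive
    p = np.asarray(y_pred) == positive
    tp = np.sum(t & p);  tn = np.sum(~t & ~p)
    fp = np.sum(~t & p); fn = np.sum(t & ~p)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, mcc

# toy predictions over the three classes
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 0, 2, 0]
# one binarized grouping per class (0, 1, 2), then average over the three groups
per_group = [binary_metrics(y_true, y_pred, c) for c in (0, 1, 2)]
mean_sens, mean_spec, mean_mcc = np.mean(per_group, axis=0)
```

Each call yields one confusion matrix (the "three matrices" in the text), and the macro-average over the three groupings gives the reported per-method figures.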