1. Patients
Pseudopapilledema (PPE) was defined as an elevated optic nerve head with blurred disc margins, accompanied by normal visual acuity (Snellen visual acuity > 0.8), visual field, color vision, and pupillary reflexes. Only patients whose optic nerve head appearance and visual function had remained unchanged for more than one year were included in the present study. The optic neuropathies group comprised 177 cases of ischemic optic neuropathy, 48 of optic neuritis, 17 of diabetic optic neuropathy, 22 of papilledema, and 31 of retinal disorders such as central retinal vein occlusion or posterior uveitis (Figure 1-a). Normal controls were enrolled from routine examinations and had no abnormal findings or visual complaints.
2. Data Preparation
Fundus photographs of patients and normal controls were collected from Kim’s Eye Hospital. Fundus photography was performed using a non-mydriatic auto fundus camera (AFC-330, Nidek, Japan). A total of 1,369 images were obtained, including 295 images of optic neuropathies, 295 of PPE, and 779 of normal controls. The obtained images were scaled to a fixed width of 500 pixels while keeping the aspect ratio constant. To remove variations in lighting and brightness, the local average color, estimated by Gaussian filtering, was subtracted from each image.8 Finally, the pixels of each image were normalized to zero mean and unit standard deviation. To produce the fixed-size input required by the models, each photograph was cropped around the region of the optic nerve head to a fixed size in pixels. Figure 1-b shows a schematic view of the image pre-processing step. The entire set of 1,369 images was split into a training dataset of 876 images for training the model, a validation dataset of 274 images for monitoring the model during training, and a test dataset of 219 images for evaluating the final model. The validation dataset was generated by randomly splitting off 20% of the entire dataset; the test dataset was generated by randomly splitting off 20% of the remaining images after the validation split (Table 1). Normal and PPE patients exhibited normal findings on red-free RNFL photography (Vx-10; Kowa Optimed, Inc., Tokyo, Japan), OCT (Cirrus HD-OCT; Carl Zeiss Meditec Inc., Dublin, CA and Heidelberg), and visual field testing (Humphrey 740 visual field analyzer; Carl Zeiss Meditec Inc., Dublin, CA).
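A minimal sketch of the pre-processing described above (background subtraction by Gaussian filtering, then zero-mean/unit-variance normalization). This is not the authors' code; the filter width `sigma` and the assumption of a float RGB array are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess(img, sigma=10.0):
    """Subtract the local average color and normalize an RGB fundus image.

    img: float array of shape (H, W, 3) with values in [0, 1]
    (resizing to a 500-pixel width and cropping the optic nerve head
    region would precede this step and are omitted here).
    """
    # Estimate the local average color per channel via Gaussian blurring
    blurred = np.stack(
        [gaussian_filter(img[..., c], sigma) for c in range(3)], axis=-1
    )
    out = img - blurred  # remove lighting/brightness variation
    # Normalize to zero mean and unit standard deviation
    return (out - out.mean()) / (out.std() + 1e-8)
```

After this step every image has comparable intensity statistics, which stabilizes training regardless of camera exposure.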
3. Convolutional Neural Network
I. Data Augmentation
Because the dataset was small, we applied data augmentation to each image to mitigate overfitting. Each image was cropped at all four corners as well as at the center, generating five crops of a fixed size; this process was repeated after horizontally flipping the image, yielding 10 images per photograph. Data augmentation helps counter overfitting by presenting each image to the network from multiple views.9
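The ten-crop scheme above (four corners + center, then the same five crops of the horizontal flip) can be sketched as follows; the crop size is a parameter, since the paper's exact pixel dimensions are not given here:

```python
import numpy as np

def ten_crop(img, size):
    """Return the 10 augmented crops of an image.

    img: array of shape (H, W, C); size: side length of the square crop.
    Produces 5 crops (four corners + center) of the image and 5 of its
    horizontal flip, stacked into shape (10, size, size, C).
    """
    h, w = img.shape[:2]
    s = size
    corners = [
        (0, 0), (0, w - s), (h - s, 0), (h - s, w - s),   # four corners
        ((h - s) // 2, (w - s) // 2),                     # center
    ]
    crops = [img[y:y + s, x:x + s] for y, x in corners]
    flipped = np.fliplr(img)                              # horizontal flip
    crops += [flipped[y:y + s, x:x + s] for y, x in corners]
    return np.stack(crops)
```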
II. Training Model
We constructed a convolutional neural network using Google’s TensorFlow deep-learning framework as the backend.10 To produce the best-performing model, an optimal set of hyper-parameters is needed. These hyper-parameters include the learning rate, activation function, patch size, filter size, number of fully connected layers, and number of hidden nodes in each fully connected layer. However, trying all possible combinations of hyper-parameters is time consuming and computationally expensive. Many methods have been proposed for hyper-parameter tuning, such as grid search, random search,11 genetic algorithms,12 and Bayesian optimization.13 We implemented Bayesian optimization for our hyper-parameter tuning using the Python package Scikit-Optimize. Seven hyper-parameters were tuned: the number of convolution layers, number of convolution filters, convolution patch size, number of fully connected layers, number of hidden nodes in each fully connected layer, activation function (rectified linear unit, exponential linear unit, or hyperbolic tangent), and learning rate. A max-pooling layer with stride 2 was fixed after every convolutional layer, and a dropout layer with rate 0.5 after every fully connected layer. A Matérn kernel was used for Bayesian optimization, with expected improvement as the acquisition function. The best hyper-parameters were selected after 100 rounds of updating the Gaussian process model. Figure 2 shows a schematic view of the hyper-parameter tuning process. Training was then repeated with the selected hyper-parameters, using the Adam optimizer14 and cross-entropy as the loss function, until the average validation loss per epoch started to increase.
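The core loop of this tuning strategy (a Gaussian process surrogate with a Matérn kernel and expected-improvement acquisition) can be illustrated with a minimal from-scratch sketch; the paper used Scikit-Optimize, whereas this version uses scikit-learn directly and a cheap toy objective standing in for the (expensive) validation loss of a trained network:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization: how much a candidate is
    expected to improve on the best observed loss."""
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(objective, bounds, n_init=5, n_iter=15, seed=0):
    """Bayesian optimization with a Matérn-kernel GP surrogate.
    bounds: array of shape (n_dims, 2) with (low, high) per dimension."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, len(bounds)))
    y = np.array([objective(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)  # update the Gaussian process model
        # score random candidates by expected improvement, evaluate the best
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(256, len(bounds)))
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X[np.argmin(y)], y.min()

# Toy 1-D objective: pretend validation loss is minimized at log10(lr) = -3
best_x, best_y = bayes_opt(lambda x: (x[0] + 3.0) ** 2,
                           np.array([[-6.0, -1.0]]))
```

In the real search the objective would train a network with the candidate hyper-parameters and return its validation loss, and the loop would run for 100 rounds over the seven-dimensional search space.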
III. Transfer learning
We also conducted transfer learning,15 which involved training our data on existing pre-trained models, using three well-known convolutional neural networks: GoogleNet Inception v3,16 the 19-layer Very Deep Convolutional Network from the Visual Geometry Group (VGG), and the 50-layer deep residual network (ResNet).17,18 These networks were originally trained on approximately 1.2 million images from the ImageNet Large-Scale Visual Recognition Challenge. We modified the fully connected layers of the three networks to fit our classification task. Bayesian optimization was again used to tune the hyper-parameters; four were tuned: the number of fully connected layers, number of hidden nodes, activation function, and learning rate. Dropout layers with rate 0.5 were fixed after every fully connected layer. After hyper-parameter tuning, fine-tuning was conducted with the Adam optimizer and cross-entropy as the loss function. Training was considered finished when the average validation loss per epoch started to increase.
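A sketch of the head replacement for one of the three networks, using the Keras API. This is not the authors' code: the hidden-layer width (256), ReLU activation, and initial freezing of the convolutional base are placeholder choices standing in for the values the paper selected via Bayesian optimization, and `weights=None` is used here only to avoid downloading the ImageNet weights that the real experiment would load:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Pre-trained backbone without its original 1000-class classifier head
# (the paper would use weights="imagenet"; weights=None builds offline)
base = tf.keras.applications.InceptionV3(
    weights=None, include_top=False, input_shape=(299, 299, 3)
)
base.trainable = False  # freeze the convolutional base before fine-tuning

# Replacement fully connected layers for the 3-class task
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)  # width/activation: tuned in the paper
x = layers.Dropout(0.5)(x)                   # dropout rate 0.5, as in the paper
out = layers.Dense(3, activation="softmax")(x)

model = Model(base.input, out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Fine-tuning would then unfreeze some or all of the base and continue training at a small learning rate until the validation loss stops improving.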
4. Evaluation
The model takes an image as input and outputs the probability that the image represents a photograph of a normal subject, or one with PPE or papilledema. Because we used augmented data (10 images per photograph), we generated 10 probability vectors from a single image; by averaging these values, we obtained a single probability that the image is normal, PPE, or papilledema (Fig 3-b). Using this strategy, we evaluated our model as well as the transferred GoogleNet Inception v3, VGG, and ResNet models. We also calculated the micro-averaged sensitivity and specificity of each model and generated receiver operating characteristic (ROC) curves, which indicate how well the models classify images into the three groups (normal, PPE, papilledema).
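The averaging and micro-averaged scoring steps can be sketched as follows; the array shapes are illustrative, and scikit-learn's micro-averaged ROC AUC is used here as the summary statistic derived from the ROC curve:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

def aggregate_probs(probs_aug):
    """Average the 10 per-crop softmax outputs into one probability
    vector per photograph.  probs_aug: shape (n_images, 10, 3)."""
    return probs_aug.mean(axis=1)

def micro_auc(y_true, y_score, n_classes=3):
    """Micro-averaged ROC AUC over the three classes: the one-hot
    labels and scores are flattened so every (image, class) pair
    counts as one binary decision."""
    y_bin = label_binarize(y_true, classes=list(range(n_classes)))
    return roc_auc_score(y_bin, y_score, average="micro")
```

The same flattened binary decisions yield the micro-averaged sensitivity and specificity at any chosen probability threshold.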