Computer-aided Age-related Macular Degeneration Diagnosis with the Fusion of Both Color Fundus and Fluorescein Angiography

Background: Age-related macular degeneration (AMD) is one of the most severe vision-threatening diseases, and fundus fluorescein angiography (FFA) is the gold standard for AMD diagnosis. In recent years, many AMD computer-aided diagnosis (CAD) systems have been developed based on either color fundus images or OCT images. However, no CAD technique has so far integrated FFA with other ophthalmic imaging. Methods: To improve the performance of AMD CAD systems, we propose a CAD pipeline that combines color fundus and FFA photography. This pipeline is the first work that incorporates FFA with any other modality. Six deep neural networks (ResNet-18, ResNet-50, ResNet-101, Inception-V3, Inception-ResNetV2, and DenseNet-201) were used to extract feature vectors, which were then passed to five classifiers (Random Forest, K-Nearest Neighbor, and Support Vector Machine with Linear, Gaussian, and Quadratic kernel functions) for AMD diagnosis. The pipeline was validated on 664 pairs of color fundus and FFA images using 10-fold cross-validation. Results and conclusion: The accuracy and area under the curve (AUC) reached 93.8% and 0.97, respectively. The results demonstrate that combining color fundus images and FFA images in a CAD system is beneficial for AMD diagnosis, indicating promising potential for clinical practice in the future.


Background
Population aging has become a global problem: the population aged over 60 will increase from the current 901 million to 1.4 billion by 2030 [1]. Elderly people now expect a longer and better life [2]; nevertheless, age-related diseases still threaten these expectations. Population aging and age-related diseases place a heavier burden on healthcare providers. Among various age-related diseases, age-related macular degeneration (AMD) is one of the most severe vision-threatening diseases in people aged 50 and older in developed countries [3][4][5]. In the United States, there are two million advanced AMD patients and more than eight million mid-term AMD patients, and this number is expected to increase by 50% by 2020 [6]. As society ages, the number of AMD patients will increase significantly in the coming decades, which decreases the quality of life of elderly people, affects their relatives, and places an increasing burden on economies [2].
AMD is a chronic eye disease that mainly affects the macular area of the retina. The macular area is responsible for most visual functions and visual acuity, so AMD influences central vision [7]. AMD is of two types: non-neovascular (i.e., dry) and neovascular (i.e., wet). Dry AMD can be further divided into an early stage, a middle stage, and a late stage (geographic atrophy). At the early stage, drusen, which are yellow deposits of different sizes, shapes, and distributions, can be observed under the retina [8]. As the disease progresses, the number and size of drusen increase. At the late stage, geographic atrophy results in progressive atrophy of the retinal pigment epithelium (RPE), capillaries, and photoreceptor cells. In wet AMD, choroidal neovascularization breaks through the RPE layer and reaches the retina, causing leakage of fluid, lipid, and blood, and finally leads to the formation of a fibrous scar.
There was no effective treatment for AMD until the discovery of anti-vascular endothelial growth factor (anti-VEGF) therapies. With anti-VEGF therapies, the prognosis of AMD has changed dramatically: nearly 95% of patients can avoid vision loss, and 40% of patients experience improved vision [9][10][11]. Since early detection and early treatment are critical to preventing severe visual impairment, correct clinical diagnosis and classification of AMD are crucial and directly affect the prognosis of patients.
To diagnose AMD early, techniques such as color fundus photography, optical coherence tomography (OCT), and fundus fluorescein angiography (FFA) are widely used to obtain comprehensive information on the clinical manifestations in the posterior pole, such as drusen, geographic atrophy, and choroidal neovascularization [12,13]. Among all the retinal imaging methods used for AMD detection, color fundus photography is the most convenient. However, some lesions (e.g., leakages) cannot be clearly observed on color fundus images, which makes it difficult for ophthalmologists to diagnose AMD from them. In fact, FFA is the gold standard for AMD diagnosis, since it can precisely detect dye leakage (hyperfluorescence), neovascularization, drusen, and other lesions. Nevertheless, differential diagnosis of AMD takes a lot of time, and the conclusion is often subjective [14]. CAD systems can therefore help ophthalmologists reduce their workload and improve diagnostic accuracy. In recent years, many AMD CAD systems based on color fundus images as well as some other modalities have been developed [15][16][17][18][19][20][21][22][23]. In these pipelines, various feature extraction approaches combined with different classifiers have been shown to achieve promising results. Mookiah et al. [16] applied Local Configuration Pattern (LCP) features and a Support Vector Machine (SVM) classifier to AMD screening. In their related studies [17] and [18], the discrete wavelet transform (DWT) and empirical mode decomposition (EMD), respectively, were examined as feature extractors for AMD diagnosis. They also surveyed [19] the performance of greyscale features including various entropies, Higher Order Spectra (HOS), bispectra features, Fractional Dimension (FD), and Gabor wavelets.
Classifiers including Naive Bayes (NB), k-Nearest Neighbour (k-NN), Probabilistic Neural Network (PNN), Decision Tree (DT), and Support Vector Machine (SVM) were also examined and evaluated in [19]. Acharya et al. [20] proposed an AMD CAD tool that uses pyramid of histogram of oriented gradients (PHOG) features to distinguish normal, dry AMD, and wet AMD, which is the first study on three-class AMD color fundus images. In their related research [21], the Radon Transform (RT), DWT, and Locality Sensitive Discriminant Analysis (LSDA) were used as feature extractors, and DT, SVM, PNN, and k-NN were applied as classifiers for AMD identification. They [22] also used the Bidimensional Empirical Mode Decomposition (BEMD) technique and various entropy methods as feature extractors for the diagnosis of Posterior Segment Eye Diseases (PSED), which include diabetic retinopathy (DR), glaucoma, and AMD. Feature extraction is considered essential but difficult in applied machine learning, so extracting adequate features remains a persistent problem for researchers. Some researchers have applied convolutional neural networks (CNNs) to AMD diagnosis. Tan et al. [23] proposed an AMD CAD system with a 14-layer CNN model, achieving 95.45% accuracy with ten-fold cross-validation, which is the first work to apply a deep learning method to AMD CAD. Serener et al. applied a ResNet model to quickly classify dry and wet AMD based on OCT images [24]. Nevertheless, none of these studies used FFA imaging. In addition, some of their datasets are small, which limits the generalization of their results.
Deep neural networks, represented by CNNs, have shown significant advantages in automatic feature extraction. It is worth mentioning that well-extracted features can boost the performance of classifiers such as SVM and random forests, which can then even outperform fully connected deep neural networks. In particular, the computational cost is markedly reduced. Therefore, we propose a new AMD CAD pipeline that employs deep neural networks for representation learning to extract features, combined with binary classifiers. Both color fundus and FFA images were used to train the CAD systems for AMD diagnosis. In this study, six pre-trained CNN models (ResNet-18, ResNet-50, ResNet-101, Inception-V3, Inception-ResNetV2, and DenseNet-201) were used to extract features from 664 pairs of color fundus and FFA images, and five binary classifiers (Random Forest, K-Nearest Neighbor, and Support Vector Machine with Linear, Gaussian, and Quadratic kernel functions) were investigated using 10-fold cross-validation.

Evaluation Metrics
The AMD CAD task is evaluated as an image classification problem in terms of sensitivity (Sen), specificity (Spe), and accuracy (Acc). These evaluation indexes are computed from the following equations, given here in their standard form, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively:
$$\mathrm{Sen} = \frac{TP}{TP + FN}, \qquad \mathrm{Spe} = \frac{TN}{TN + FP}, \qquad \mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}.$$
Furthermore, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) are used to evaluate the AMD classification results under 10-fold cross-validation.
Detailed results are listed in Table 2 and Table 3. The results in Table 2 demonstrate that, for all CNN models, using dual-modality images for diagnosis is the most competitive in terms of accuracy, sensitivity, specificity, and AUC, i.e., it performs better than using only color fundus or only FFA images. This result also reveals the potential advantage of combining feature vectors of different modalities extracted by deep neural networks. Among all the state-of-the-art models, ResNet-50 achieved the best performance, suggesting that its feature representation is an effective off-the-shelf descriptor for the AMD screening task. In addition, SVM with a quadratic kernel function outperformed all the other examined classifiers, such as K-NN and Random Forest. The dual-modality CAD system achieved 93.8% accuracy, 97% sensitivity, and 74% specificity, which outperforms both the single color fundus CAD system and the single FFA system. We can observe in figure 1 that the AUC of single color fundus is 0.86, that of single FFA is 0.95, and that of dual modality is 0.97. Multi-modality images, here color fundus and FFA images, contain complementary information that boosts the classification accuracy.
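As an illustration of the evaluation indexes used in this section, the following minimal Python sketch computes Sen, Spe, and Acc from binary labels, and estimates the AUC as the fraction of (AMD, normal) pairs that are ranked correctly by the classifier score. The function names and the toy labels are hypothetical and are not taken from the study; the sketch also assumes that both classes are present in the labels.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = AMD, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sen_spe_acc(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy).

    Assumes at least one positive and one negative label.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return sen, spe, acc


def auc_from_scores(y_true, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels, not data from the study).
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6]
toy_sen, toy_spe, toy_acc = sen_spe_acc(toy_true, toy_pred)
toy_auc = auc_from_scores(toy_true, toy_scores)
```

In a cross-validated setting, one plausible choice is to pool the out-of-fold predictions before calling these functions; whether the study pooled or averaged per fold is not stated here.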

Discussion
To the best of our knowledge, this is the first study to combine color fundus and FFA images for AMD CAD with deep learning methods. A previous study has shown that combining color fundus photography with OCT can improve the accuracy of AMD diagnosis compared with using either modality alone [28]. As the gold standard modality for diagnosing AMD, FFA is effective for detecting the presence of choroidal neovascularization (CNV). In FFA screening, a contrast agent is given by intravenous injection in order to capture images of the vasculature, and the existence and extent of CNV can then be determined by observing the presence or absence of contrast agent in the vascular morphology [29,30]. FFA is effective in determining the classification of the CNV (classic or occult) and the boundaries, composition, and location of the neovascular complex, and it can guide laser or anti-VEGF therapy [31]. The lack of proper labels for dry and wet AMD images is a limitation of our research.
The identification of AMD types (i.e., dry AMD and wet AMD) is also an important research problem. In some cases, distinguishing between dry AMD and wet AMD is difficult even for a well-trained ophthalmologist. Our research addresses a binary classification problem that focuses on the identification of AMD. After AMD has been identified, distinguishing dry from wet AMD can be treated as another binary classification problem. If a dataset with dry and wet AMD labels becomes available, we will apply our pipeline to it for further classification of dry and wet AMD.

Conclusion
In this study, we proposed a new AMD computer-aided diagnosis system that uses both color fundus and FFA images. This system merges the feature vectors extracted from the two modalities using deep neural networks and then performs diagnosis with an SVM classifier. The evaluation results show that this dual-modality CAD system outperforms both the single fundus and the single FFA CAD systems. The proposed method achieved 93.8% accuracy and an AUC of 0.97, which can help ophthalmologists reduce their workload and improve the accuracy of diagnosis. The results also indicate that ResNet-50 as the feature extractor with a Quadratic SVM as the classifier achieved the best performance in this study; this combination is therefore suggested for this CAD pipeline and appears feasible for clinical application. In future work, along with color fundus and fluorescein angiography imaging, optical coherence tomography (OCT) and optical coherence tomography angiography (OCTA) are also suggested as sources of multi-modality features. These two modalities could be incorporated into our pipeline to further improve the feature representation and classification results.

Datasets
We

Pre-processing
AMD screening is mainly based on the lesions in the macular area. In addition, the high brightness of the optic disc can affect feature extraction in the macular area. Therefore, we crop the color fundus and FFA images and keep only the macular area, so that we retain and focus on the information that is closely related to AMD. We calculate the centroid of the optic disc with a region growing method. Using the optic disc centroid and the papilla diameter, the centroid of the macula can then be located indirectly.
A square macular area with a radius of 2.5 papilla diameters is used as input to extract universal features.
According to the corresponding coordinate positions, the paired macular area of the FFA images can also be cropped. Since the distance between the center of the macular area and the center of the optic disc is about 2.5 times the papilla diameter, the macular areas of the color fundus and FFA images were located and cropped by batch processing according to the corresponding coordinate positions. Since some macular areas were difficult to locate (for example, some images were collected from patients with serious disease, whose macular areas were almost entirely covered by lesions), manual cropping was needed to extract the exact macular area patches. The resolution of the macular area patches ranged from 300×300 to 400×400, and all these macular images were subsequently resized to 224×224 or 299×299 to match the input size of the different Convolutional Neural Networks (CNNs).

Methods
The Convolutional Neural Network (CNN) is a powerful machine learning technique from the field of deep learning. CNNs are trained using large collections of diverse images. From these large collections, CNNs can learn rich feature representations for a wide range of images. These feature representations often outperform hand-crafted features such as HOG, LBP, or SURF [32]. An easy way to leverage the power of CNNs is to use a pre-trained CNN as a feature extractor. To fuse the features of the two modalities, pre-trained CNNs were used with an off-the-shelf strategy to extract features. The ResNet-18, ResNet-50, ResNet-101 [33], Inception-V3 [34], Inception-ResnetV2 [35], and DenseNet-201 [36] CNN models, which achieved good performance in the ImageNet competitions, were used in this study. The pipeline of our study is shown in figure 3. We also used several fully connected network (FCN) layers of these pre-trained CNN models as feature extractors.
The extracted features are then used to train different classifiers. The names of the six examined pre-trained CNN models and their corresponding FCN layers are listed in Table 1.
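The fusion of the two modalities' features can be illustrated with a short sketch. It assumes that "merging" means concatenating, for each image pair, the feature vector extracted from the color fundus image with the one extracted from the FFA image; the text does not spell out the fusion operator, so concatenation and the function name `fuse_features` are assumptions.

```python
def fuse_features(fundus_features, ffa_features):
    """Concatenate per-pair feature vectors from the two modalities.

    fundus_features -- list of feature vectors, one per color fundus image
    ffa_features    -- list of feature vectors, one per paired FFA image
    Returns one fused vector per image pair (fundus features first).
    """
    if len(fundus_features) != len(ffa_features):
        raise ValueError("each fundus image needs exactly one paired FFA image")
    return [list(fundus) + list(ffa)
            for fundus, ffa in zip(fundus_features, ffa_features)]


# Toy example with made-up 3-dimensional and 2-dimensional features.
fused = fuse_features([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
                      [[1.0, 2.0], [3.0, 4.0]])
```

With this choice the fused vector length is the sum of the two single-modality lengths, and the downstream classifier is trained on the fused vectors exactly as it would be on single-modality features.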

Figure 3
The pipeline of the Dual Modality AMD computer-aided diagnosis

Deep Feature Extraction
Models pre-trained on ImageNet were employed in our study. Each pair of fundus and FFA images carries a binary label, where 0 and 1 mean that the images were taken from normal people (without AMD) and AMD patients, respectively. Here we briefly describe the pre-trained CNN models that were examined in our research.

2) Inception-V3
Since its success in ILSVRC2014, GoogLeNet has been widely applied in image classification tasks. Nevertheless, some disadvantages of the network, such as the complexity of its "Inception" structure, limited the further development of GoogLeNet. Szegedy et al. therefore reconsidered the Inception module in GoogLeNet and presented some general principles by which the Inception network can be modified and optimized [34]. According to these principles, they proposed a new network architecture named Inception-V3. Inception-V3 has a depth of 48 layers, consisting of six convolutional layers, two pooling layers, three Inception structures, one linear layer, and one softmax classifier. The network has 23.9 million parameters in total. In the off-the-shelf pre-trained Inception-V3 model, the image input size is 299×299.

3) Inception-ResnetV2
Inception-ResnetV2 is a combined model of Inception-V3 and residual network [35]. This model was

4) DenseNet-201
Unlike the models above, which focus more on extending the depth and width of networks, DenseNet "densely" connects each block to achieve feature reuse (shown in figure 4(b)), which makes the model relatively easy to fit and to converge [36]. In addition, a bottleneck was introduced to further reduce the computational burden. More specifically, the feature maps of all previous layers are treated as separate inputs by concatenating them into a single tensor $[x_0, x_1, \ldots, x_l]$, while each layer's own feature maps are passed as input to all subsequent layers. Layer $l + 1$ receives the feature maps of all previous layers and can be expressed as $x_{l+1} = H_{l+1}([x_0, x_1, \ldots, x_l])$, where $H_{l+1}$ denotes the transformation applied by layer $l + 1$. The total number of parameters in this network is 20.0 million, and the image input size is 224×224.

Classifiers
For the purpose of quantitatively assessing the performance of classifiers, we tested and compared Support Vector Machine (SVM) with different kernel functions, K-Nearest Neighbor (K-NN), and Random Forest. We also applied tenfold cross-validation to better reflect data distribution. In the SVM methods, we examined three kernel functions (linear function, Gaussian function, and Quadratic function).
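The tenfold cross-validation mentioned above can be sketched in plain Python as follows. The sketch assumes a stratified split (each fold keeps roughly the class proportions of the whole dataset) and a fixed random seed; the text only states that tenfold cross-validation was used, so the stratification, the seed, and the function name are assumptions.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k stratified folds.

    Samples of each class are shuffled and dealt round-robin into the folds,
    so every fold receives a near-equal share of each class.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)

    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(i for f in range(k) if f != held_out
                           for i in folds[f])
        yield train_idx, test_idx


# Toy example with made-up labels: 20 normal (0) and 30 AMD (1) samples.
toy_labels = [0] * 20 + [1] * 30
toy_splits = list(stratified_kfold(toy_labels, k=10, seed=0))
```

Each classifier would then be trained on `train_idx` and evaluated on `test_idx` in turn, so that every image pair is used for testing exactly once.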

1) Random Forest
Random forest (RF), also known as random decision forest, is a learning-based method that has been widely applied to tasks such as classification and regression. This algorithm was first proposed by Ho in 1995 [37] and subsequently developed by Breiman in 2001 [38]. In essence, a random forest combines multiple decorrelated decision trees to improve accuracy. Although the decision tree is a popular learning-based method, its accuracy is limited by overfitting. A sampling and averaging algorithm called bootstrap aggregating (or bagging) is introduced to reduce the variance and thereby mitigate overfitting. In our implementation of the random forest, the number of trees (ntree) was set to 200.
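The bagging idea behind the random forest can be illustrated with a deliberately simplified sketch. Instead of full decision trees, each ensemble member below is a one-split "stump" trained on a bootstrap sample and a randomly chosen feature, and the ensemble predicts by majority vote. This is only a toy stand-in for a real random forest: the stump learner, the function names, and the toy data are hypothetical, and only the ensemble size of 200 comes from the text.

```python
import random
from collections import Counter


def fit_stump(xs, ys, feature):
    """Fit a one-split tree on a single feature by minimizing training errors."""
    best = None
    for threshold in sorted({x[feature] for x in xs}):
        left = [y for x, y in zip(xs, ys) if x[feature] <= threshold]
        right = [y for x, y in zip(xs, ys) if x[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(xs, ys, n_trees=200, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_samples, n_features = len(xs), len(xs[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n_samples) for _ in range(n_samples)]
        feature = rng.randrange(n_features)
        forest.append(fit_stump([xs[i] for i in sample],
                                [ys[i] for i in sample], feature))
    return forest


def predict_forest(forest, x):
    """Majority vote over all stumps."""
    votes = Counter(left_label if x[feature] <= threshold else right_label
                    for feature, threshold, left_label, right_label in forest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two well-separated classes in two dimensions.
toy_x = [(0.1, 1.0), (0.2, 1.5), (0.3, 1.2), (0.8, 3.0), (0.9, 3.5), (1.0, 3.2)]
toy_y = [0, 0, 0, 1, 1, 1]
toy_forest = fit_forest(toy_x, toy_y, n_trees=200, seed=0)
```

Individual stumps trained on an unlucky bootstrap sample can be wrong, but averaging many of them stabilizes the vote, which is the variance reduction that bagging is meant to provide.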

2) K-Nearest Neighbor
The K-NN approach is a well-known non-parametric machine learning method. For an unknown sample, it finds the group of k training samples that are nearest to it and assigns the label that is most common among these neighbors. For this classifier in our study, the parameter k was set to 2 because we were performing a binary classification task.
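A minimal K-NN sketch with k = 2 is given below. With an even k, the two neighbors can disagree, so some tie-breaking rule is needed; the rule used here (fall back to the label of the closest neighbor) and the Euclidean distance are assumptions, since the text does not state how ties were handled or which distance was used.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=2):
    """Predict the label of `query` from its k nearest training samples.

    Uses Euclidean distance. On a tied vote, the label of the closest
    neighbor among the tied labels is returned (assumed tie-break rule).
    """
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    votes = Counter(nearest_labels)
    top_count = max(votes.values())
    tied = {label for label, count in votes.items() if count == top_count}
    for label in nearest_labels:  # ordered from closest to farthest
        if label in tied:
            return label


# Toy example with made-up 2-D features: label 0 near the origin, label 1 far away.
toy_train_x = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
toy_train_y = [0, 0, 1, 1]
toy_prediction = knn_predict(toy_train_x, toy_train_y, (0.4, 0.0), k=2)
```

With k = 2 and this tie-break, a disagreement between the two neighbors reduces the decision to that of the single nearest neighbor, which is worth keeping in mind when interpreting the K-NN results.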

3) Support Vector Machine
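SVM is a classifier that finds the optimal separating hyperplane (OSH) to separate data into, normally, two classes. The prototype of SVM was originally proposed in 1963 and was continuously improved until 1995, when it took its current well-known form [39,40]. The data points nearest to the OSH are called "support vectors", and the gap between the OSH and these points is the margin. Assume that each point in the dataset can be defined as $x_i \in \mathbb{R}^m$, $i = 1, 2, \ldots, n$, and belongs to a class $y_i \in \{-1, 1\}$.
Thus, for a separating hyperplane $w \cdot x + b = 0$, the distance between the support vectors of the two classes is defined as
$$d = \frac{2}{\|w\|}.$$
When $\|w\|$ is minimized, the distance $d$ is maximized and the separating hyperplane is optimized, i.e., it becomes the OSH. This problem can be represented as a constrained convex quadratic programming problem:
$$\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, n.$$
Using the Lagrangian function
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right],$$
the problem is converted into the dual form of convex quadratic programming:
$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0, \quad i = 1, 2, \ldots, n.$$
This problem has a single optimal solution. Let $\alpha^*$ be the optimal solution; then
$$w^* = \sum_{i=1}^{n} \alpha_i^* y_i x_i.$$
The support vectors are all the $x_i$ whose $\alpha_i^*$ is not 0. The bias $b^*$ can in turn be solved from the constraints. Thus, the optimal classifying function is
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i^* y_i (x_i \cdot x) + b^* \right).$$
SVM uses non-linear transforms to map the input space into a high-dimensional space, in which solving for the OSH can be easier. The non-linear transform $\phi$ is performed implicitly by kernel functions
$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j).$$
In our implementation, the penalty coefficient (C) and the kernel width parameter (γ) are optimized by a grid search method.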
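To make the role of the kernel concrete, the sketch below implements the three kernel types examined in this study (linear, Gaussian, and quadratic) and the kernelized form of the SVM decision function, in which the inner product between a support vector and the input is replaced by a kernel value. The exact parameterizations are assumptions: the Gaussian kernel is written as exp(-γ‖x − z‖²) and the quadratic kernel as (1 + x·z)², and the support vectors, multipliers, and bias in the example are hand-picked toy values rather than the result of training.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2); gamma is the kernel width parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def quadratic_kernel(x, z):
    """K(x, z) = (1 + x . z)^2 (assumed form of the quadratic kernel)."""
    return (1.0 + linear_kernel(x, z)) ** 2


def svm_decision(x, support_vectors, labels, alphas, bias, kernel):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    score = sum(alpha * y * kernel(sv, x)
                for sv, y, alpha in zip(support_vectors, labels, alphas))
    return 1 if score + bias >= 0 else -1


# Toy linear example: two support vectors on either side of the line x1 = 0.
# With alpha = 0.5 for both, w = sum_i alpha_i * y_i * x_i = (1, 0) and b = 0.
toy_svs = [(1.0, 0.0), (-1.0, 0.0)]
toy_labels = [1, -1]
toy_alphas = [0.5, 0.5]
toy_class = svm_decision((2.0, 3.0), toy_svs, toy_labels, toy_alphas,
                         bias=0.0, kernel=linear_kernel)
```

A grid search over C and γ would wrap a trained SVM of this form in two loops over candidate values and keep the pair with the best cross-validated score; the training step itself is omitted here.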