CNN with Multiple Inputs for Automatic Glaucoma Assessment Using Fundus Images

In ophthalmology, glaucoma affects an increasing number of people and is a major cause of blindness. Early detection avoids severe ocular complications such as glaucoma, cystoid macular edema, or diabetic proliferative retinopathy. Artificial intelligence has proven beneficial for glaucoma assessment. In this paper, we describe an approach to automate glaucoma diagnosis using fundus images. The proposed framework proceeds as follows. First, the Bi-dimensional Empirical Mode Decomposition (BEMD) algorithm is applied to decompose the Regions of Interest (ROI) into components (BIMFs + residue). Next, the VGG19 CNN architecture is used to extract features from the decomposed BEMD components. Then, we fuse the features of the same ROI in a bag of features. Because these bags are very long, Principal Component Analysis (PCA) is used to reduce the feature dimensions. The resulting bags of features are the inputs of a classifier based on the Support Vector Machine (SVM). To train the models, we used two public datasets, ACRIMA and REFUGE. To test them, we used a part of ACRIMA and REFUGE plus four other public datasets: RIM-ONE, ORIGA-light, Drishti-GS1, and sjchoi86-HRF. Using the model trained on REFUGE, overall accuracies of 98.31%, 98.61%, 96.43%, 96.67%, 95.24%, and 98.60% are obtained on the ACRIMA, REFUGE, RIM-ONE, ORIGA-light, Drishti-GS1, and sjchoi86-HRF datasets, respectively. Using the model trained on ACRIMA, accuracies of 98.92%, 99.06%, 98.27%, 97.10%, 96.97%, and 96.36% are obtained on the same datasets, respectively. The experimental results obtained on the different datasets demonstrate the efficiency and robustness of the proposed approach. A comparison with recent work in the literature shows that our proposal is a significant advance.


Introduction
Glaucoma is an acute or chronic optic neuropathy, characterized by the progressive death of the ganglion cells of the retina, which damages the optic nerve and the visual field. The final consequence of glaucoma is permanent, irreversible blindness [1]. It is one of the most serious eye diseases in ophthalmology and the leading cause of incurable blindness and low vision in adults worldwide. It is a progressive chronic disease that affects between 8-10% of people worldwide. The disease generally affects people over 40 years of age, although it also occurs in young people and even in childhood. In numbers, 65 million people worldwide were living with glaucoma in 2010. This number rose to 79.6 million in 2020 and is expected to reach 111.8 million in 2040 [2], [3]. Blindness has risen alongside it, from 6.7 million people in 2000 to 8.5 million in 2010, and an estimated 11 million in 2020 [3]. The constant increase in blindness and morbidity due to this disease, in a context of socio-economic change, makes the glaucoma problem urgent and calls for better systems for its detection, its treatment, and the monitoring of patients. This applies in particular to the most frequently diagnosed form, primary open-angle glaucoma [4][5][6][7][8][9]. To make prevention, early detection, and treatment of glaucoma more effective, and to reduce disability, it is necessary to study automatic decision-support tools. In this context, several research efforts treat this subject as a main research axis. In the last decade, artificial intelligence (AI) has become a promising and essential field for developing methods for health diagnostics, treatment, and the interpretation of databases and medical images [2].
Machine learning is a class of artificial intelligence methods used to analyze complex data and find patterns and interdependencies without explicit programming [5,6].
Machine learning algorithms analyze the features of the input data and, through a series of repeated operations spread over several layers, produce linear and nonlinear predictions that characterize the signals, classify the patterns, and predict the results [3][4][5][6]. There are many classic machine learning methods, namely: random forests, association rule learning, decision trees, inductive logic programming, the nearest neighbor method, support vector machines, artificial neural networks, etc. [5], [7][8][9]. In the last few years, convolutional neural networks have taken an important place, and they improve from year to year. Typically, a neural network is made up of layers of interconnected nodes. Each node computes a weighted sum of its inputs, passes it through an activation function, and forwards the result to the nodes of the next layer. The weights change dynamically during the training phase. There are three types of layers:
• Input layer: receives the input data;
• Hidden layers: highlight patterns in the data;
• Output layer: transmits the results of the processing.
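The node computation described above can be illustrated with a minimal sketch. The sigmoid activation, the weights, and the network shape are assumptions chosen only for illustration; they are not taken from the proposed model.

```python
import math

def sigmoid(z):
    """Logistic activation; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias):
    """One node: weighted sum of its inputs plus a bias, then the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def layer_output(inputs, weight_rows, biases):
    """One layer: every node sees the same inputs but has its own weights."""
    return [node_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical network: 2 inputs, one hidden layer with 2 nodes, 1 output node.
x = [0.5, -1.0]
hidden = layer_output(x, [[1.0, 2.0], [-1.0, 0.5]], [0.0, 1.0])
output = layer_output(hidden, [[1.0, -1.0]], [0.0])
```

During training, only the values in the weight rows and biases would change; the structure of the computation stays the same.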
To overcome diagnostic difficulties, many automated methods have been developed for glaucoma diagnosis. These include methods based on feature extraction and selection [10][11][12][13] and methods based on deep learning [14][15][16][17][18][19]. These methods have improved diagnostic quality, but their performance has not yet reached the desired accuracy and is not yet stable. As a result, researchers are still developing new techniques to improve diagnosis. In this work, we propose a new method to automate glaucoma assessment using fundus images. The proposed method proceeds as follows. First, the BEMD algorithm is applied to decompose the image into components (BIMF levels + residue). Then, the VGG19 CNN architecture is used to extract features from the decomposed BEMD components. Next, we fuse the extracted features of the same image in a bag of features. Because this bag is very long, Principal Component Analysis (PCA) is used to reduce the feature dimensions. The selected bags of features are the inputs of a classifier based on the SVM. To train the models, we used two public datasets, ACRIMA and REFUGE. To test them, we used a part of ACRIMA and REFUGE plus four other public datasets: RIM-ONE, ORIGA-light, Drishti-GS1, and sjchoi86-HRF. The proposed method shows a significant advance.
The rest of this paper is organized as follows. Section 2 presents an overview of related previous work. Section 3 introduces the materials and the proposed methods. The experiments are presented in section 4 and discussed in section 5. Section 6 concludes the paper.

Related Previous Works
In recent years, machine vision has made spectacular leaps forward, notably thanks to breakthroughs in optimization and the explosion in computing power. What has been achieved for facial recognition, for example, the research community is now trying to replicate in the medical field. Artificial intelligence (AI) can process thousands of images in seconds and detect, with great precision, important information that a radiologist would have taken months to find. For this reason, many computer-aided diagnosis systems have been developed to help physicians improve diagnostic results in several medical fields, such as breast cancer [20], [21], [22], brain cancer [23], diabetic retinopathy [24], [25], etc. In this paper, we have implemented BEMD and a CNN architecture to automate glaucoma assessment using fundus images.

Bi-dimensional Empirical Mode Decomposition
Empirical Mode Decomposition (EMD) [26] is an approach suited to non-stationary time-frequency analysis. It has been applied in several fields, namely biology [27], the marine environment [28], structural diagnosis [29], mechanical fault diagnosis [30], [31], and other domains based on one-dimensional signals. Nunes et al. [32], [33] introduced EMD into image processing (two-dimensional signals) and developed the algorithm for bi-dimensional empirical mode decomposition (BEMD). This algorithm has caught the attention of several researchers and has been applied in image denoising [34], image compression [35], [36], image segmentation [37], [38], image scaling [39], image feature extraction [40], texture synthesis [41], and image texture classification [42]. Many methods for decomposing signals and images (DCT, wavelets, Fourier, etc.) assume a decomposition on a basis given a priori. The major advantage of EMD is that it defines decompositions of images that do not depend on the choice of a particular basis. In addition, EMD is particularly well suited to the study of non-stationary signals. We present the original EMD algorithm, whose efficiency is recognized through the applications it supports. Like EMD for one-dimensional signals, BEMD for two-dimensional images relies on the extrema that exist in the original image, or that are obtained from its first or higher-order derivatives, to decompose the image signal. Distances between extrema may provide information describing the image on intrinsic length scales. The EMD method is an adaptive decomposition that splits any signal into a redundant set of intrinsic mode functions; in the two-dimensional case these are denoted BIMFs [43].
In this paper, we have implemented BEMD to describe the regions of interest. Having several components for the same region makes the classification phase easier.
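Since BEMD relies on the extrema of the image, a minimal sketch of extrema detection may help make this concrete. It assumes a strict 8-neighbour test on interior pixels, which is one common choice and not necessarily the detector used in this work. In a full sifting step these extrema would then be interpolated into upper and lower envelope surfaces whose mean is subtracted from the image; that part is omitted here.

```python
def local_extrema(image):
    """Return (maxima, minima) as lists of (row, col) for interior pixels
    that are strictly greater (or strictly smaller) than all 8 neighbours."""
    rows, cols = len(image), len(image[0])
    maxima, minima = [], []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = image[r][c]
            neighbours = [image[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if all(centre > n for n in neighbours):
                maxima.append((r, c))
            elif all(centre < n for n in neighbours):
                minima.append((r, c))
    return maxima, minima

# Hypothetical 5x5 patch with one bright spot and one dark spot.
patch = [
    [5, 5, 5, 5, 5],
    [5, 9, 5, 5, 5],
    [5, 5, 5, 5, 5],
    [5, 5, 5, 1, 5],
    [5, 5, 5, 5, 5],
]
maxima, minima = local_extrema(patch)
```

On this patch the bright pixel at (1, 1) is the only maximum and the dark pixel at (3, 3) the only minimum; flat regions yield no extrema under the strict test.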

Glaucoma diagnosis
This section reviews recently developed methods for classifying retinal abnormalities. Numerous promising approaches are emerging to solve the problem of retinal abnormality classification in fundus images.
In the past few years, considerable effort has been devoted to developing automated systems that assist practitioners in detecting and diagnosing retinal abnormalities using fundus images. Computer-aided diagnosis has become increasingly relevant to medical decision support, and artificial intelligence methods have recently demonstrated significant advances in the medical area. In this section, we give a brief overview of previous and current work on artificial intelligence tools and their applications in the retinopathy field. These tools help ophthalmologists diagnose retinal abnormalities in general, and glaucoma specifically, at an early stage, and help assess their potential impact on glaucoma care. Diaz-Pinto et al. [14] developed an approach for automatic glaucoma assessment using five different ImageNet-trained models (VGG16, VGG19, InceptionV3, ResNet50, and Xception) and five different fundus image databases. In that work, the authors applied CNNs with different architectures directly, without a segmentation step or hand-crafted feature extraction, to automate glaucoma assessment. Their proposal gives good results in general (an average AUC of 0.9605, an average specificity of 0.8580, and an average sensitivity of 0.9346), but gives modest and unstable results when the testing set changes (an AUC of 0.7678, an accuracy of 0.7021, a sensitivity of 0.6893, and a specificity of 0.7020 on the ACRIMA dataset). Chakravarty and Sivaswamy [10] addressed glaucoma classification with the classical pipeline: segmentation of the regions of interest, feature extraction, then classification. The novelty of that work is that the authors used different features and fused them for a richer description of the ROIs. The results obtained on the test set are modest (a sensitivity of 80%, a specificity of 60%, an accuracy of 76.77%, and an AUC of 78%) using DRISHTI-GS1.
Maheshwari et al. [11] presented a method to automate glaucoma diagnosis using the Empirical Wavelet Transform (EWT) on fundus images. In that paper, the authors applied the EWT to decompose the image, and then extracted features from the decomposed EWT components. The selected features were ranked using the t-value feature selection algorithm. The resulting features were used to classify images as normal or glaucoma using a Least Squares Support Vector Machine (LS-SVM) classifier. The LS-SVM was implemented with Morlet wavelet, Mexican-hat wavelet, and Radial Basis Function (RBF) kernels. The method achieves classification accuracies of 98.33% and 96.67% using three-fold and ten-fold cross-validation respectively. This work achieved high performance, but the databases used are private, so their proposal cannot be compared with existing work in the literature.

Materials and methods
In this section and its subsections, we introduce the materials and methods used. First, we present the datasets (ACRIMA, REFUGE, RIM-ONE, Drishti-GS1, ORIGA-light, and sjchoi86-HRF), followed by the proposed approach.

Used databases
To train, validate, and test the proposed approach, we used six publicly available databases: ACRIMA [14], REFUGE [15], RIM-ONE [44], Drishti-GS1 [45], ORIGA-light [46], and sjchoi86-HRF [47]. These are the reference databases for the majority of research work in the field. Table 1 below presents a description of, and statistics about, the databases used.

K-fold cross-validation
Learning the parameters of a prediction function and testing it on the same data is a methodological error: a model that simply repeated the labels of the samples it had just seen would have a perfect score but could not predict anything useful on unseen data. This situation is called overfitting. To avoid it, it is common when running a (supervised) machine learning experiment to hold out part of the available data as a test set. In this work, we therefore used 5-fold cross-validation to avoid overfitting.
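As a minimal sketch of how 5-fold cross-validation partitions the data, the following splits sample indices so that each sample is held out exactly once. The function name, the shuffling seed, and the sample count are illustrative assumptions, not details from the experiments.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices once and yield (train, test) index lists,
    so that every sample is used for testing in exactly one fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical dataset of 20 images split into 5 folds of 4 test images each.
splits = list(k_fold_indices(n_samples=20, k=5))
```

The model is then trained and evaluated once per fold, and the per-fold scores are averaged, which gives a less optimistic estimate than scoring on the training data.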

Proposed Methodology
The framework of the proposed method is shown in Figure 2. The steps are as follows. First, we apply the BEMD algorithm to decompose each ROI into components (BIMFs + residue); these components are the inputs of the VGG19 CNN architecture. Then, we extract the features of the input components. Next, we fuse the features of the components of the same ROI in a bag of features. Because this bag is very long, PCA is used to reduce the feature dimension and select the optimal features. Finally, we perform binary classification with an SVM to classify the ROIs of fundus images as normal or glaucoma. The following subsections discuss the proposed method in detail.
Fig. 3 Architecture description of the proposed approach for glaucoma diagnosis, by using (1) CNN-based deep learning to conduct feature learning, and (2) an SVM for the final prediction (Normal or glaucoma)

BEMD applied
Glaucoma changes the structure of the optic disc and of the optic cup, so the corresponding images change as well, which results in changes of intensity, texture, and even shape. Sometimes these changes are not clearly visible. Some details are hidden in digital images; hence, BEMD is useful for extracting more detail. First, the BEMD algorithm is applied to decompose each ROI into a list of BIMFs, corresponding to different intensity levels, plus a residue. The BIMFs and the residue are called components. Each component is considered as a CNN input. Figure 3 to

Feature Extraction by CNN
After the BEMD algorithm has decomposed the image into components (BIMFs + residue), the VGG19 CNN architecture is used to extract features from them. Extracting and selecting features that describe the candidate regions is a crucial task, because it directly affects classification accuracy. Convolutional neural networks (CNNs) have recently proven very useful for designing powerful filters that extract the image features relevant to classification, which has largely resolved this problem. In this paper, we implement the CNN to classify healthy and glaucomatous eyes. We build a set of 12 parallel VGG19 models, each with a different input. The input of the first model is the original candidate region; the inputs of models 2 to 11 are the BIMFs of the ROI, BIMF1 to BIMF10 respectively; and the input of the last model is the ROI

Feature Fusion and Dimensionality Reduction
After feature extraction, the feature vectors of the same region are concatenated to form a higher-dimensional bag of features. Generally, these features are correlated and carry much redundant information, and higher-dimensional features lead to higher computational complexity. Therefore, we applied principal component analysis (PCA) for dimensionality reduction and feature selection. In this paper, we define the eigenvalue statistical rate as the ratio of the number of principal components (eigenvalues) retained by PCA to the number of all components. Extracting features from the image helps to learn the attributes needed for a detailed description of its rich internal information, but some of these features are unnecessary or redundant and add computational complexity. Dimensionality reduction is therefore desirable. It yields a compact description of the features, which lowers computational complexity and improves image detection and classification performance.
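The fusion and reduction steps can be sketched as follows, assuming NumPy is available. The feature sizes, the random data, and the number of retained components are hypothetical. The sketch also reports the fraction of variance kept, which is a common diagnostic and is not the same quantity as the eigenvalue statistical rate defined above.

```python
import numpy as np

def fuse_features(per_component_features):
    """Concatenate the feature vectors of all components of one ROI."""
    return np.concatenate(per_component_features)

def pca_reduce(X, n_components):
    """Project the rows of X onto the first n_components principal axes."""
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centred, full_matrices=False)
    explained = singular_values ** 2
    variance_kept = explained[:n_components].sum() / explained.sum()
    return X_centred @ Vt[:n_components].T, variance_kept

rng = np.random.default_rng(0)
# Hypothetical data: 30 ROIs, 12 components each, 8 features per component.
bags = np.stack([
    fuse_features([rng.normal(size=8) for _ in range(12)])
    for _ in range(30)
])
reduced, variance_kept = pca_reduce(bags, n_components=10)
```

Here each bag has 12 x 8 = 96 entries and is reduced to 10; the reduced features are uncorrelated across the dataset, which removes the redundancy mentioned above.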

Binary Classification Using SVM
The traditional SVM is a non-probabilistic linear classifier. In practice, it goes through two phases: a learning phase and a test phase. The learning phase constructs a model that will classify future unknown data. In the test phase, the model built previously is used to classify unseen samples, from which the accuracy of the classifier is calculated.
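The two phases can be illustrated with a small linear SVM trained by sub-gradient descent on the regularised hinge loss. This is a simplified stand-in for illustration, not the SVM implementation used in this work; the hyperparameters and the toy data are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Learning phase: minimise lam/2 * ||w||^2 + mean hinge loss by
    full-batch sub-gradient descent. Labels in y must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Test phase: the sign of the decision function gives the class."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical 2-D features: +1 stands for glaucoma, -1 for normal.
X_train = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y_train = [1, 1, 1, -1, -1, -1]
X_test, y_test = [[2.5, 2.5], [-2.5, -2.5]], [1, -1]

w, b = train_linear_svm(X_train, y_train)
predictions = [predict(w, b, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
```

In the proposed pipeline the inputs would be the PCA-reduced bags of features rather than these two-dimensional toy points.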

Experimental result on the used dataset
Early glaucoma diagnosis remains a challenging task, although CAD systems for early glaucoma diagnosis using fundus images have been developed to help ophthalmologists distinguish between glaucomatous and normal retinas. To assess the effectiveness of the proposed approach on large-scale datasets, we applied it to several different datasets.

Train model on the used dataset
In this section, we present the models trained using the ACRIMA and REFUGE datasets. We trained the proposed models for 50 and 100 epochs respectively, with 5-fold cross-validation to address the overfitting problem. For each dataset, 60% of the images are randomly selected for training, 20% for validation, and the remaining 20% for testing. Because the datasets are not very large, we enlarged them with rotation-based data augmentation. Below, we detail the training of the model on the ACRIMA dataset. Figure 9 and Figure 11 show the last parts of the training processes of the proposed model with 50 and 100 epochs respectively. Figure 8 and Figure 10 show the accuracies and losses obtained in the different cases. We then retrained the model in the same way on the REFUGE dataset, with the goal of verifying the robustness of the proposed approach. Table 2 summarises the quantitative performance obtained with the two models: the first trained on the ACRIMA dataset and the second trained on the REFUGE dataset. Tables 2 and 3 below show the quantitative results obtained in the testing phase using the two trained models, namely the model trained on the ACRIMA dataset and the model trained on the REFUGE dataset, respectively. In this work, we trained the proposed method on two different datasets to study the stability of our proposal. Table 3 reports the results obtained using the model trained on the ACRIMA dataset and tested on the ACRIMA, REFUGE, RIM-ONE, ORIGA-light, Drishti-GS1, and sjchoi86-HRF datasets. Table 4 reports the results obtained using the model trained on the REFUGE dataset and tested on ACRIMA, REFUGE, RIM-ONE, ORIGA-light, Drishti-

Discussion
We tested the proposed approach on 3512 retina images, of which 1049 are glaucomatous and 2463 are normal. The images were obtained from six publicly available datasets: ACRIMA, REFUGE, RIM-ONE, ORIGA-light, Drishti-GS1, and sjchoi86-HRF. We trained and tested two models, the first using the ACRIMA dataset and the second using the REFUGE dataset. We built two different models to confirm the stability of the proposed approach. In the testing phase, we used different datasets to ensure that the result of the glaucoma assessment is not biased by a change of dataset between the training and testing phases. The receiver operating characteristic (ROC) curve, sensitivity, specificity, and balanced accuracy were used as performance metrics. The two models were trained with 50 and 100 epochs. Why? To demonstrate that obtained results in the validation phase with 50 epochs  4. We found that the proposed method gives similar performance with the different trained models. Thanks to the diversity of the data, the proposed method has shown excellent performance on the various training, testing, and validation datasets. Therefore, it is stable and robust.
Many similar studies on CAD systems for glaucoma diagnosis have recently been published [10], [11], [14][15][16][17][18][19], [47], [48]. These studies allow us to make a comparison in order to assess the performance of the proposed approach. Table 5 below compares the results obtained in this work with the results of previous works. Our approach is slightly better.
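For reference, the threshold-based metrics named above can be computed from the confusion counts as in the sketch below. The labels and predictions are made-up values for illustration only and are not results from this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Fraction of glaucomatous images correctly detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of normal images correctly recognised as normal."""
    return tn / (tn + fp)

def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2

# Made-up example: 1 = glaucoma, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
bal_acc = balanced_accuracy(tp, fp, tn, fn)
```

Balanced accuracy is informative here because the test images are imbalanced (1049 glaucomatous against 2463 normal), so plain accuracy alone could be dominated by the normal class.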

Conclusion
In this work, we have addressed the problem of glaucoma diagnosis using digital fundus images. The objective is to automate the diagnosis with the proposed method using digital fundus images only, without resorting to additional examinations. The framework of our method is as follows. First, the BEMD algorithm is applied to decompose the ROI under consideration into intensity scales (BIMFs + residue). Second, the VGG19 CNN architecture is used for feature extraction, and the extracted features are concatenated into a bag of features. Third, PCA is adopted to reduce the feature dimensionality. Finally, the SVM classifier is used for the classification task. The proposal has shown robustness despite the diversity of the datasets used for training and for testing. The results obtained have also been compared with others recently published in the literature, and they show a significant improvement in the diagnostic performance for glaucoma.