Using Transfer Learning for Diabetic Retinopathy Stages Classification

Purpose – Diabetes is a chronic disease that damages many systems of the body. One of its dangerous complications is diabetic retinopathy. Frequent screening for diabetic retinopathy is essential to recognize patients at risk of visual impairment. The disease grade must be tracked to manage its progress and to make the appropriate treatment decision in time. Effective automated methods for the detection of diabetic retinopathy and the classification of its severity stage are therefore necessary. They also help to decrease the burden on ophthalmologists and reduce diagnostic contradictions among manual readers. Methods – In this research, a convolutional neural network (CNN) is used on color retinal fundus images for the detection of diabetic retinopathy (DR) and the classification of its stages. A CNN can recognize sophisticated features on the retina and so provide an automatic diagnosis. The pre-trained CNN model of the Visual Geometry Group (VGG) is applied to DR data using a transfer learning approach to utilize the parameters already learnt from the 1,000,000 images and 1,000 classes of ImageNet. Results – By conducting experiments with different class settings, the built models achieved promising results. The best achieved accuracies for 2-ary, 3-ary, 4-ary, and 5-ary classification are 85.99%, 80.5%, 61.28%, and 71%, respectively.


Introduction
Diabetes is a chronic disease caused by the inability of the pancreas to produce a sufficient amount of insulin, the hormone that regulates blood sugar, or by the inability of the body to use the produced insulin effectively. High blood sugar is a prevalent result of uncontrolled diabetes and eventually affects many systems of the body, such as the blood vessels and nerves. It is therefore a main cause of blindness, heart attacks, lower limb amputation, strokes, and kidney failure. The number of diabetic people has increased significantly, from 108 million in 1980 to 422 million in 2014, with the rise most pronounced in low- and middle-income countries. In 2016, when a 5% increase in premature mortality from diabetes since 2000 was noticed, diabetes caused a total of 1.6 million deaths. Diabetes can be managed with physical activity, diet, medication, frequent screening, and treatment for complications [1].
Diabetic retinopathy is considered one of the serious complications of diabetes and is responsible for 2.6% of overall blindness. DR is caused by an extreme increase in the level of glucose in the blood. Higher levels of blood sugar destroy the blood vessels in the retina, which raises the probability of fluid leakage and bleeding and results in dangerous vision problems that might lead to blindness [2]. To decrease the dangerous effects of diabetic retinopathy, early detection, precise diagnosis, and appropriate treatment are required [3,4]. Therefore, an intelligent automated method for the early and accurate detection of diabetic retinopathy is required to manage the progress of the disease and thus guarantee appropriate treatment and a reduction of the risk factors of diabetic complications.
Classification of DR involves weighting many features and finding their positions. This is an exhausting, time-consuming task for ophthalmologists, and it is prone to mistakes. Ophthalmologists can therefore be supported by computer-aided diagnosis (CAD) systems, which can detect abnormalities and classify the severity of different cases, often faster than manual grading. Such systems can decrease the load on ophthalmologists and reduce inconsistencies between manual readers. The automated detection and classification of DR is an active area of research in computer science. Considerable work has been done on detecting DR automatically using traditional methods such as Naive Bayes, k-Nearest Neighbor, and support vector machine classifiers, which depend on hand-crafted feature extraction and then classify different cases according to the set of selected features [5,6]. In contrast, with deep learning, features can be learnt automatically from the original images during the training phase [7].
The advancement of deep learning has motivated researchers to use it in medical image analysis, where it has proved successful.
Convolutional Neural Networks (CNNs) are a type of deep learning network specialized in image analysis applications. The layers nearer to the input of the model learn low-level features such as lines, the layers in the middle learn more abstract features that integrate the lower-level features, and the layers closer to the output interpret the extracted features in the context of the classification task [8]. CNNs have been applied in numerous recent image classification tasks and achieved high performance. These high-performing CNN models can be imported and reused for another image classification task through a transfer learning approach.
Transfer learning utilizes a pre-trained model to train a new model. The pre-trained model is trained on a large dataset to solve a problem similar to the one that needs to be solved. The traits learned by pre-training on the large dataset can be transferred to the new network, where only the classification component is trained on the new, smaller dataset. Transfer learning thus applies the knowledge obtained while solving one problem to a different but relevant problem. A pre-trained CNN uses the features learned in one domain to fine-tune on data from another. It may be used as-is to classify new images, or as a feature extraction model, where the output of the layer before the model's output layer serves as input to a new classifier. Transfer learning saves considerable time otherwise spent developing and training a deep CNN model [9].
There are many high-performing pre-trained models that can be imported and used for image recognition.
Most of these models have been developed as part of the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The competition has resulted in several innovations in the architecture of CNNs, and these models can be used as the basis for transfer learning in image classification. Examples from the published literature are the Visual Geometry Group network (VGG) [10], inception modules (GoogLeNet) [11,12], the Residual Neural Network (ResNet) [13], and the Neural Architecture Search Network (NasNetLarge) [14]. These models were trained on the ImageNet dataset, which consists of 1,000,000 images with 1,000 classes, so they have learned to detect generic features, and their learnt weights are provided and reused in similar problems. They achieved state-of-the-art performance and remain effective when used to develop other image recognition tasks [15,16].
In this research, a transfer learning approach using the pre-trained model VGG was utilized to detect diabetic retinopathy and classify its stages based on retinal fundus images. The remainder of the paper is organized as follows: related research articles about the detection and diagnosis of diabetic retinopathy are reviewed in Sect. 2. The proposed method for the detection and classification of diabetic retinopathy is introduced in Sect. 3. In Sect. 4, experimental results of the proposed model are presented. Section 5 contains the discussion and a comparison with the literature. Finally, conclusions and future work are drawn in Sect. 6.

Related Work
Many systems have been proposed in the literature for the detection and diagnosis of diabetic retinopathy using various machine learning techniques (MLTs). These systems are based either on conventional MLTs, which depend on hand-crafted feature extraction, or on deep learning, where the features are extracted automatically during training. In this section, some of these systems are reviewed.
Seoud et al. [17] used random forests to develop a computer-aided system that automatically classifies fundus images according to diabetic retinopathy grade. The proposed system detects red lesions, then extracts 35 features from these lesions; finally, the random forest classifier uses these features for classification. They applied their system to the Messidor dataset and achieved an accuracy of 74.1%.
The authors of [18] used a machine learning bagging ensemble classifier (ML-BEC) for DR diagnosis. It includes two stages: the first stage extracts the important objects from the retinal fundus images, such as neural tissue, blood vessels, optic disc size, and the optic nerve, which are considered features for the diagnosis of DR. The second stage classifies the cases using an ensemble classifier based on the extracted features.
Cao et al. [19] used three classifiers, namely a neural network, a random forest, and a support vector machine, to classify DR into 5 severity grades based on patches collected from images to recognize microaneurysms. Random forest feature importance and principal component analysis (PCA) were used to select the important features. The proposed classifiers were applied to fundus images in the DIAbetic RETinopathy DataBase.
Mansour [20] proposed a CAD system to classify retinal fundus images according to their grade levels. It consists of a sequence of tasks: region segmentation using a Gaussian mixture, feature extraction using the AlexNet CNN, feature selection using linear discriminant analysis (LDA) and PCA, and finally classification using a support vector machine.
Gadekallu et al. [21] proposed a model based on PCA, a deep neural network (DNN), and the Grey Wolf Optimization (GWO) algorithm. The proposed model is a binary classifier that classifies the features extracted from the dataset into affected with DR or not. The process of building the model consists of standard-scaler normalization to standardize the dataset, then PCA to reduce the dimensionality, followed by GWO to select the optimal parameters, and finally training the DNN model on the DR Debrecen dataset from the UCI machine learning repository.
Rahim et al. [22] proposed an automatic method to detect diabetic retinopathy and maculopathy in eye fundus images using fuzzy image processing techniques. The fuzzy techniques were used in different tasks, some in the preprocessing stage such as filtering and histogram equalization, and also in the detection of four retinal structures.
Pratt et al. [23] proposed a CNN to classify DR using digital fundus images. The data was augmented, then a CNN was applied to learn the sophisticated features of the retina and so automatically diagnose DR. The CNN was trained on the Kaggle dataset and achieved an accuracy of 75%.
Lam et al. [24] used CNNs on color fundus images to detect the stage of DR. Transfer learning based on the pre-trained GoogLeNet and AlexNet models was applied to the Kaggle and Messidor-1 datasets.

The Proposed Method
In this section, the data used and the proposed models are described. First, the dataset used for developing the CNN models is presented. Then, the full process, which consists of two phases, "Pre-processing the Data" and "Developing Transfer Learning-based CNN Architecture", is explained. In the first phase the data is prepared for the development of the CNN architecture based on the transfer learning approach in the second phase.

The Used Datasets
In this research, the proposed CNN models based on the transfer learning approach were developed using data obtained from a publicly available benchmark dataset, the Kaggle dataset [27]. The dataset contains color fundus images with highly diverse levels of illumination. From the Kaggle dataset, a set of 35,126 retinal images is used to train the models.
Kaggle dataset images are in PNG format, and they are re-sized to 224×224 pixels. Each image is labeled as left or right eye. Each image is categorized according to the level of severity into one of 5 class labels (0-4) representing the (normal, mild, moderate, severe, proliferate_DR) stages. Figure 1 shows different samples from the Kaggle dataset representing different stages, where Fig. 1(a) is a normal sample and samples (b)-(e) represent different stages of severity.

Pre-processing the Data
To develop the proposed CNN models based on the transfer learning approach, pre-processing steps are applied to the retinal fundus images to prepare them for the learning phase. The pre-processing steps can be summarized as follows: 1. The retinal fundus image region is cropped automatically from each image to remove the background and unwanted regions. Figure 2(a) shows a sample of an original image from the Kaggle dataset, while Fig. 2(b) shows the same image after removing the unwanted region.
2. One of the most important challenges in the development of a deep learning model is unbalanced and limited data. In this research, the data does not suffer from data limitation, especially since the adopted transfer learning approach largely overcomes the data limitation problem. However, as is clear from Table 1, the representation of the classes is unbalanced. Classes 3 and 4 do not have enough representation compared with the other classes, which is an obstacle to detecting the cases belonging to these classes. Therefore, augmentation has been applied to the poorly represented classes 3 and 4 to solve the balancing problem: each training image belonging to these classes has been rotated by three angles of 90°, 180°, and 270° and then flipped to enlarge the representation of these classes in the dataset. Column 4 in Table 1 shows the number of images in these classes after augmentation. Figure 2(c) shows the augmentation of the sample in Fig. 2(b), which belongs to proliferate_DR (class 4) in the Kaggle dataset.
3. All images are resized to the same dimensions, since a CNN requires equally sized input images.
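As a rough sketch, the rotation-and-flip augmentation described above could be implemented as follows. The helper function name and the exact set of variants returned are assumptions; the paper does not publish its augmentation code.

```python
import numpy as np

def augment_minority(image):
    """Generate extra samples for an under-represented class by rotating
    the image 90, 180, and 270 degrees and adding a horizontal flip.

    Hypothetical helper; the exact variants are an assumption, not the
    authors' code.
    """
    variants = [np.rot90(image, k) for k in (1, 2, 3)]  # 90/180/270 degrees
    variants.append(np.fliplr(image))                   # horizontal flip
    return variants

# A blank 224x224 RGB array stands in for a fundus photograph.
sample = np.zeros((224, 224, 3), dtype=np.uint8)
extra = augment_minority(sample)
print(len(extra))  # 4 additional images per original
```

Each minority-class training image thus contributes four additional samples, which is roughly the enlargement factor needed to bring classes 3 and 4 closer to the size of the better-represented classes.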

Developing Transfer Learning-based Architecture
Transfer Learning (TL) utilizes a pre-trained model to train another model: it applies the knowledge obtained while solving one problem to a different but relevant problem. Thus, CNN models pre-trained on ImageNet, such as VGG, ResNet, and Inception, can be utilized to solve another problem. The developers of these models released them publicly to enable more research on the use of these representations in computer vision. Since these pre-trained models contain many millions of parameters, training them from scratch requires a very long computational time and a very large number of input images. Transfer learning, which exploits pre-trained models to solve other problems, is therefore the best solution for problems such as the one presented in this research.
The transfer learning architecture used is shown in Fig. 3. As shown in the figure, the pre-trained CNN model was trained on the ImageNet dataset, which contains 1,000,000 images in 1,000 classes. This pre-trained network has been utilized to train on the retinal fundus dataset after applying the preprocessing described above.
The top 2 layers of the pre-trained model, which are employed to classify the 1,000 ImageNet classes, are removed and replaced by an output layer with a softmax activation function as a classifier with 5 nodes to produce the 5 output classes representing the stages of DR. The 5 nodes can be changed to 2, 3, or 4 according to the applied model, which specifies the required output. The remaining components of the CNN are treated as a feature extractor for the new dataset, while the pre-trained model weights are kept unchanged. The new network was re-trained on the retinal fundus images dataset with a learning rate of 0.001 and the Adam optimizer. The new models were trained for 10 to 20 epochs.
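A minimal Keras sketch of this setup follows, assuming hyperparameters stated in the text (Adam optimizer, learning rate 0.001, softmax head). Variable names and the pooling choice are assumptions, and `weights=None` is used here only so the sketch runs without downloading the ImageNet weights the paper relies on (`weights="imagenet"`).

```python
# Sketch of the transfer-learning model described above, not the authors'
# exact implementation.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # reduced to 2, 3, or 4 in the other experiments

base = tf.keras.applications.VGG16(
    weights=None,               # the paper uses "imagenet" weights
    include_top=False,          # drop the 1000-class ImageNet head
    input_shape=(224, 224, 3),
    pooling="avg",
)
base.trainable = False          # keep pre-trained weights unchanged

model = models.Sequential([
    base,
    layers.Dense(NUM_CLASSES, activation="softmax"),  # new DR classifier
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_images, train_labels, epochs=10)  # 10-20 epochs per the text
```

Freezing the base means only the new dense layer is updated during training, which matches the paper's use of the pre-trained network purely as a feature extractor.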
In this research, transfer learning-based models are developed to classify the retinal fundus images into the 5 stages of DR severity. The three most common pre-trained models, VGG, ResNet, and Inception, are used to classify the Kaggle dataset into its five severity levels. Since VGG achieved the best result on this task, it was adopted in this research and further investigated through several experiments that classify the dataset into different combinations of classes.
VGG is composed of 16 to 19 weight layers. The input to VGG is an image of size 224×224. The network contains a set of convolutional filters of size 3×3. A stride of 1 pixel is used for all convolution filters, and the padding is 1 pixel for the 3×3 convolutional filters. The rectified linear unit (ReLU) activation function is used for all hidden layers. Five of the convolutional layers are followed by 2×2 max-pooling layers with a stride of 2. Finally, two fully-connected (FC) layers with 4096 channels each are applied, followed by an output layer of 1000 channels (one for each class) with a softmax activation function.

Experimental Results
This section presents the analysis and the experimental results of the proposed architecture. To validate the efficiency of the proposed architecture and to compare the results with others, a benchmark dataset was used for implementation. The Keras Python deep learning library on top of the TensorFlow framework was used to implement the different models built on VGG with 16 layers (VGG16), on a machine with an Intel® Core™ i7 CPU @ 3.6 GHz, 32 GB of RAM, and a Titan X Pascal Graphics Processing Unit (GPU). Extensive experiments were conducted to find the setting that achieves the best results.
The data was randomly split into training and test sets, where the training set represents 70% of the whole data. The different classification models were implemented according to the proposed architecture previously described using the training dataset and tested on the test data. As mentioned before, the proposed architecture was applied using the three most common pre-trained models, ResNet, Inception, and VGG, to classify the retinal fundus images of the Kaggle dataset into the five severity levels. The achieved accuracies were 66.24%, 63.41%, and 71% for ResNet50, Inception, and VGG16, respectively. Table 2 shows the achieved accuracies of the models used and the input shape required for each model. Since VGG16 achieved the best result, it was adopted for further experiments in which it classifies the dataset into different combinations of classes, as shown below. First, to test the capability of the VGG16 model to detect abnormality in general, experiment #1 was conducted. It is a binary classification task that classifies the cases into normal and abnormal, where the abnormal class merges the other 4 classes {mild, moderate, severe, proliferate_DR} into one class. The achieved accuracy in detecting abnormality is 75.5%.
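The 70/30 split can be sketched as below. The function name, seed, and use of a shuffled index permutation are assumptions; the paper only states the split ratio.

```python
import numpy as np

def split_indices(n_samples, train_frac=0.7, seed=0):
    """Shuffle sample indices and split them 70/30 into train and test.

    Illustrative only; the paper does not publish its splitting code or seed.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    cut = int(train_frac * n_samples)
    return idx[:cut], idx[cut:]

train_idx, test_idx = split_indices(35126)  # size of the Kaggle set used
print(len(train_idx), len(test_idx))        # 24588 10538
```

Splitting by index rather than by copying the images keeps memory use low when the dataset, as here, contains tens of thousands of fundus photographs.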
As mentioned before, each image in the Kaggle dataset is categorized into one of the 5 classes (0-4) according to the level of severity, representing the (normal, mild, moderate, severe, proliferate_DR) stages. To test the capability of the model to classify the cases into the 5 different severity levels, experiment #2 was conducted. The achieved accuracy is 71%.
According to the consulted ophthalmologists, the mild cases in the Kaggle database do not form an obvious class: some could be classified as normal while others were more likely to belong to moderate. It was therefore suspected that the model might not be able to distinguish them from the normal and moderate classes. Likewise, the severe and proliferative cases were not easily distinguishable. Therefore, in experiment #3 the "normal" and "mild" cases, and the "severe" and "proliferative" cases, were merged, while in experiment #4 the "mild" and "moderate" cases, and the "severe" and "proliferative" cases, were merged. The accuracies improved to 75.03% and 80.5%, respectively, which indicates that the differentiating traits of the mild class are not evident and that the class labeling of the dataset is not precise. The higher accuracy of experiment #4 compared with experiment #3 clarifies that the mild class is closer to the moderate class, which may be interpreted as meaning that more mild cases incline toward the moderate class than toward the normal class.
In another approach, the aim was to determine the severity level among the abnormal cases only, neglecting the normal cases. In experiment #5, the model classified the 4 abnormal classes, achieving an accuracy of 61.28%. Due to the mentioned convergence of classes, merging was applied again: in experiment #6, mild was merged with moderate, while severe was merged with proliferate_DR. The achieved accuracy improved to 81.60%. By consulting ophthalmologists, we found that the 4 stages of abnormality can be mainly categorized by severity level into Proliferative Diabetic Retinopathy (PDR) and Non-Proliferative Diabetic Retinopathy (NPDR). Therefore, experiment #7 was conducted by classifying cases into proliferate_DR and Non-Proliferative {mild, moderate, severe}. The achieved accuracy was 85.99%. Table 3 shows the accuracies of the different experiments using the various models built with transfer learning based on VGG16. Figure 4 illustrates the comparison among the accuracies achieved by the different models built using VGG16.
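The class-merging experiments described above amount to remapping the original 0-4 severity labels before training. The tables below are inferred from the text, not taken from the authors' code, and the dictionary and function names are hypothetical.

```python
# Hypothetical label-remapping tables for the class-merging experiments.
# Original labels: 0=normal, 1=mild, 2=moderate, 3=severe, 4=proliferate_DR.
MERGES = {
    "exp1_normal_vs_abnormal":  {0: 0, 1: 1, 2: 1, 3: 1, 4: 1},
    "exp3_merge_normal_mild":   {0: 0, 1: 0, 2: 1, 3: 2, 4: 2},
    "exp4_merge_mild_moderate": {0: 0, 1: 1, 2: 1, 3: 2, 4: 2},
    "exp7_npdr_vs_pdr":         {1: 0, 2: 0, 3: 0, 4: 1},  # normals excluded
}

def remap(labels, table):
    """Apply a merge table to a sequence of severity labels."""
    return [table[y] for y in labels]

print(remap([0, 1, 2, 3, 4], MERGES["exp1_normal_vs_abnormal"]))  # [0, 1, 1, 1, 1]
```

Expressing each experiment as a lookup table makes it easy to reuse one training pipeline for every class combination, changing only the table and the size of the softmax output layer.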

Discussion
Detecting diabetic retinopathy and classifying its severity stages is one of the biggest challenges for ophthalmologists. The contribution of this work is an architecture that helps to detect DR and classify its different stages. Since the number of cases available in different DR datasets is relatively limited, transfer learning is a suitable approach for the proposed work, employing pre-trained models to build models that can classify the DR cases using the available data.
To evaluate the proposed work, it was compared with previous works that used the same Kaggle dataset, for a fair comparison. Table 4 illustrates this comparison. Chowdhury et al. [25] used GoogLeNet to classify DR into 2, 3, and 5 classes and achieved accuracies of 61.3%, 60.3%, and 37.7%, respectively. Lam et al. [24] applied a TL approach with AlexNet and GoogLeNet, and stated that GoogLeNet achieved the better accuracies: 74.5%, 68.8%, and 57.2% for 2, 3, and 4 classes, respectively. Although the studies [24] and [25] used the same model (GoogLeNet) and the same dataset, their results differ; this may result from different preprocessing steps and changes in the settings of the TL network. Pratt et al. [23] built a CNN using the Kaggle dataset to classify the cases into the 5 classes and achieved an accuracy of 75%. As shown, the results of the proposed model outperform the two works that used the same dataset and TL approach but with GoogLeNet. The third work is better than the proposed model at classifying the cases into 5 classes, the only model applied in that research: it achieved an accuracy of 75%, while the proposed model achieved 71%.
It is worth noting that, when the proposed architecture was applied without augmentation, the results of the different models were better, but they suffered from overfitting. This was evident because the accuracies of the different models remained unchanged across all epochs, even when using different models, namely VGG16, ResNet50, and Inception. For example, they all achieved 75% in classifying DR into the 5 severity stages (experiment #2) and 91.93% for Proliferative Diabetic Retinopathy (PDR) versus Non-Proliferative (experiment #7).

Conclusion And Future Work
Recently, the number of diabetes patients has increased dramatically, and consequently so has the number of diabetic retinopathy patients. To help detect the disease and classify its grade stages, deep learning has been used in this research. A transfer learning approach was adopted, applying the pre-trained CNN model VGG16 to a retinal fundus images dataset. In the proposed architecture, the pre-trained VGG16 model was used for feature extraction, and the top 2 layers were replaced by a new output layer with a softmax activation function, whose size was changed to between 2 and 5 classes according to the experiment.

Figure 2
Pre-processing steps for one image from the proliferate_DR images of the Kaggle dataset.

Figure 3
Transfer learning architecture