Detection of Choroidal Neovascularization (CNV) in Retina OCT Images Using VGG16 and DenseNet CNN

In this study, we aim to diagnose Choroidal Neovascularization (CNV) in retinal Optical Coherence Tomography (OCT) images using deep learning architectures. OCT images can be used to differentiate between a healthy eye and an eye with CNV. In CNV, the retinal pigment epithelial layer undergoes changes in several properties that can be identified with the help of OCT images. This paper proposes a technique for finding CNV in OCTA images. Among the attributes of CNV, the larger turning points of the blood vessels are a relatively clear feature, so we use this property to determine whether CNV is present in an OCTA image. The DenseNet and VGG16 deep learning architectures were used in the study, and the hyperparameters of both architectures were adjusted to diagnose the disease properly. After detection, the diseased OCT images are segmented from the background for Region of Interest detection using the Python OpenCV library for image processing. The results show that VGG16 detected the disease better than DenseNet, with an accuracy of 97.53%, about 0.66 percentage points higher than DenseNet (96.87%).


Introduction
Optical Coherence Tomography (OCT) is one of the most widely used imaging methods for diagnosing retinal diseases. The OCT machine produces an image that provides enough detail to judge whether qualitative or quantitative changes have occurred in the retinal vessels captured on the OCT film. Increases or decreases in the retinal layers, and their measurements, are the main evaluation criteria in clinical trials of disease detection. Regular retinal OCT scanning helps in the early detection of retinal diseases, so that they can be avoided at older ages [1]. Age-related macular degeneration (AMD) is a leading cause of vision loss and irreversible blindness. AMD is classified as neovascular based on the presence of CNV, a pathological condition in which new vessels grow from the choroid into the outer retina. CNV often results in vision loss because it can lead to subretinal hemorrhage, lipid exudation, subretinal fluid, intraretinal fluid, or the formation of fibrotic scars. Fluorescein angiography and indocyanine green angiography (ICGA) are traditionally used for CNV identification and visualization, but dye-based angiography has drawbacks: it provides only two-dimensional visualization of vascular networks, the invasive intravenous contrast dye can cause nausea and hypersensitivity reactions, and long acquisition times make high-volume and repeated follow-up angiograms impractical. A promising alternative is optical coherence tomographic angiography (OCTA), which measures flow signal in vivo by evaluating motion contrast between successive OCT B-scans at the same location. In contrast to conventional dye-based imaging modalities, OCTA is non-invasive, has rapid acquisition, is high-resolution, and produces three-dimensional datasets. However, OCTA is susceptible to several imaging artifacts.
Projection artifacts cause spurious flow signal in deeper anatomical layers. When a retinal disease is detected, many diseases of the brain, eye, and cardiovascular system may already be on the way. Various other diseases can also be detected from OCT scans: a person with diabetes has a high chance of developing diabetic retinopathy, and any kind of macular edema is also visible in retinal OCT images. The main focus of this study is Choroidal Neovascularization (CNV), which is one of the major causes of blindness in developed countries.
In layman's terms, Choroidal Neovascularization can be defined as the growth of extra blood vessels in the choroid layer of the eye. The innermost layer of the choroid is known as Bruch's membrane (BM); any damage to this membrane can cause Choroidal Neovascularization and lead to loss of sight in the future.
Choroidal Neovascularization (CNV) is an age-related disease associated with the degeneration of macular tissue. This degeneration causes a sharp drop in central vision as age advances. Hence it is important to identify the changes caused by CNV for the successful detection of the disease. The use of deep learning to classify diseased and non-diseased medical images has increased in recent years. Deep learning techniques such as CNNs have proved to be of great use in object detection, image recognition, and segmentation. This demonstrates the importance of analyzing OCT images for signs of disease using deep learning. In this study, the prediction of diseased images is compared between two state-of-the-art deep learning architectures, VGG16 and DenseNet. The diseased images are then segmented to highlight enhanced blood vessels with hollow formations in the retinal layer affected by Choroidal Neovascularization [2]. This research contributes to improving retina-based diagnosis, and the performance of the CNNs makes accurate detection possible. It also motivates researchers in medical applications to evaluate CNN performance [3][4][5][6] with VGG16 and DenseNet layers.

Background and Related Work
Retinal OCT datasets have come to light in recent years, and a great deal of research on the features of such datasets is under way. One study that inspired this project is by Sertkaya et al. [7]. That study works on the same dataset and predicts two more diseases than ours, but it does not perform segmentation of the dataset. It also uses four architectures with almost the same type of layer structure, two of them being VGG16 and VGG19. Each architecture in that study was trained for 200 epochs; a similar amount of GPU resources could have been used to study deeper networks such as ResNet or DenseNet, with around 50-60 hidden layers, for better results.
A similar study in an Elsevier journal, by Motozawa et al. [8], shows the detection of macular edema using LeNet and transfer learning. It shows how an already trained model is trained again on the OCT dataset and applies the knowledge gained earlier to the new dataset to predict disease. The only drawback of the study is that it uses a very small amount of data to train the CNN models, which could therefore be considered undertrained.
Various studies have used Convolutional Neural Networks, but for the detection of other diseases such as macular edema [1,2,9,10]. These studies also use transfer learning, but for transfer learning to work the model must first be pre-trained on one dataset and then trained and tested again on the main dataset [1,2,8-16]; our dataset was not sufficient for this process. The segmentation studies also taught us a great deal about how segmentation breaks down the layers present in a normal retina and how those layers are affected by various diseases. One study, by Rocio Armor [9], goes beyond human retinal layer segmentation: it works on rat retina OCT and uses an encoder-decoder technique to segment the retinal layers. The only drawback of this study is that the same technique has yet to be applied to human retina OCT images.

Methodology
Deep learning is a branch of artificial intelligence that can act like a human brain: it creates patterns, recognizes them, learns them, and then differentiates among them. Basically, deep learning is a subset of machine learning.
This study aims to use deep learning to obtain better results across the dataset and to compare two well-trained CNN architectures, VGG16 and DenseNet. The word "deep" in deep learning refers to the interconnected layers of the network.
Layers play a key role in the working of deep learning architectures. There are mainly three types of layers in a deep learning network: the input layer, the hidden layers, and the output layer.
The input layer is passive, which means it only takes the input and cannot modify it, whereas the hidden and output layers are active and can transform the input images according to the weights set in each node. As Fig. 1 shows, the input layer sends its output to the hidden layer. Each hidden node multiplies the inputs it receives by its weights and sums the products into a single value; before the output leaves the node it is passed through a sigmoid function, so that the output value lies between 0 and 1 [8].
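The computation performed by a single hidden node can be sketched in a few lines of Python. This is only an illustration of the weighted sum followed by a sigmoid; the input values, weights, and the optional bias term are hypothetical and are not taken from the study.

```python
import math

def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_node_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)

# Hypothetical inputs and weights, chosen only for illustration.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
print(hidden_node_output(inputs, weights))
```

Whatever the inputs and weights are, the value leaving the node always lies strictly between 0 and 1.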
CNNs are ordinary feedforward neural networks that apply the backpropagation (BP) algorithm to adjust the parameters (weights and biases) of the network in order to reduce the value of the cost function. Convolutional neural networks, like any neural network model, are computationally expensive. However, that is more a drawback than a weakness, and it can be overcome with better computing hardware such as GPUs and neuromorphic chips. The layers are very important for a network to produce better output. Any number of hidden layers can be used to enhance the model, but they should be added in a measured and structured way so that the model is neither overtrained nor undertrained. VGG16 and DenseNet also have hidden layers that are repeated over and over as filters on the input to recognize the anomaly. These layers work as follows:

Convolutional Layer
The convolutional layer is one of the most basic and widely used layers for extracting features from the input. It has a filter matrix that is much smaller than the input image. This filter is convolved over the input image, and an activation map is computed for the neurons. In layman's terms, the filter is superimposed on and slid over the input image; at each position the dot product of the image patch and the filter matrix is computed and saved in an activation map, which is used as input for the next hidden layer, whether convolutional or of another type [8,11,12].
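The sliding dot product can be sketched in plain Python as follows. The sketch assumes no padding and a stride of 1, and, as deep learning libraries usually do, it does not flip the filter. The toy image and the 2 × 2 filter are hypothetical and chosen only to make the effect visible.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the activation map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    activation_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the filter and the image patch beneath it.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        activation_map.append(row)
    return activation_map

# Toy 4x4 image with a vertical edge and a hypothetical 2x2 edge filter.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d_valid(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The activation map is non-zero only in the column where the filter straddles the dark-to-bright transition, which is the sense in which a filter "extracts" a feature.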

Max Pooling Layer
The max pooling layer takes the activation map produced by the filter and the input image and keeps only the maximum value within each pooling window of that feature map. Pooling layers have an advantage over convolutional layers in that they are easier and faster to compute: they reduce the amount of calculation, since they only find maxima, and they extract the sharpest features of the image [11][12][13].
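A minimal sketch of max pooling, assuming a 2 × 2 window and a stride of 2 (the feature-map values are hypothetical):

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window of `feature_map`."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]
print(max_pool2d(feature_map))  # [[4, 5], [3, 7]]
```

The 4 × 4 map shrinks to 2 × 2, so later layers have a quarter as many values to process.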

Weights in Deep Learning
Weights are the parameters set in the nodes of the hidden layers of a deep learning architecture. The weights are multiplied by the input matrix to produce the output of the weighted node, also known as a perceptron, which is the neural network's analogue of a neuron in the human brain. Weights for well-known architectures have been tuned by researchers and are available online.

Activation Function
An activation function introduces non-linearity into a deep neural network model. It converts the input signal of a layer into an output signal, which is forwarded to the next layer as its input. If no activation function is applied, the output signal is a linear function of the input; such a model is easier to solve but has less power to learn complex mappings. We need the activation function so that the network can learn from complex data and represent its non-linear form [10].
The activation functions used in our study are:
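The loss of expressive power without an activation function can be checked numerically. The sketch below uses two hypothetical one-dimensional linear layers: stacked directly, they collapse into a single linear layer, whereas placing a ReLU (one of the activation functions described in this section) between them breaks that equivalence.

```python
def linear(a, b):
    """Return the function x -> a * x + b."""
    return lambda x: a * x + b

def relu(x):
    """Rectified linear unit: pass positive values, clamp the rest to 0."""
    return max(0.0, x)

# Hypothetical layer parameters, chosen only for illustration.
layer1 = linear(2.0, -1.0)
layer2 = linear(-3.0, 0.5)

# Two stacked linear layers equal one linear layer: a = a2 * a1, b = a2 * b1 + b2.
collapsed = linear(-3.0 * 2.0, -3.0 * -1.0 + 0.5)

for x in (-2.0, 0.0, 1.5):
    stacked = layer2(layer1(x))
    with_relu = layer2(relu(layer1(x)))
    print(x, stacked, collapsed(x), with_relu)
```

For every input the stacked and collapsed values agree, while the ReLU version differs whenever the first layer produces a negative value.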

ReLU Function
ReLU stands for Rectified Linear Unit. It outputs the input unchanged if the input is positive and converts it to zero if it is negative. It is widely used nowadays because most neural networks have achieved higher performance with it.
ReLU is a piecewise linear function, and it has become the default activation function for many types of neural network because a model that uses it is easier to train and often achieves better performance. The sigmoid and hyperbolic tangent activation functions cannot easily be used in networks with many layers because of the vanishing gradient problem. The rectified linear activation function overcomes the vanishing gradient problem, allowing models to learn faster and perform better.
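A small sketch of ReLU and of why it is less prone to vanishing gradients than the sigmoid. The depth of 10 layers is a hypothetical value, and multiplying one gradient per layer is a deliberate simplification of backpropagation that ignores the weights.

```python
import math

def relu(x):
    """Return x when it is positive, otherwise 0."""
    return x if x > 0 else 0.0

def relu_gradient(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (0 is chosen at x == 0)."""
    return 1.0 if x > 0 else 0.0

def sigmoid_gradient(x):
    """Slope of the sigmoid, which never exceeds 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

values = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([relu(v) for v in values])  # [0.0, 0.0, 0.0, 0.5, 2.0]

# Simplified view of a gradient passing back through `depth` activations.
depth = 10  # hypothetical number of layers
print(sigmoid_gradient(0.0) ** depth)  # shrinks towards zero
print(relu_gradient(1.0) ** depth)     # stays at 1.0 for active units
```

Even in the best case for the sigmoid (slope 0.25 at zero), ten layers scale the gradient by less than one millionth, whereas an active ReLU unit passes it through unchanged.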

Sigmoid Function
The sigmoid function is also one of the most widely used activation functions in deep learning. It maps the output signal to a value between 0 and 1, which makes predictions easier. If the model has to distinguish between multiple classes, the softmax activation function gives better results [10].
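The difference between the two output functions can be sketched as follows. The scores are hypothetical, and the 0.5 decision threshold is an assumption for illustration rather than a setting reported in the study.

```python
import math

def sigmoid(z):
    """Map a score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Binary case (for example normal vs CNV): one sigmoid output, thresholded at 0.5.
probability = sigmoid(1.2)
label = 1 if probability >= 0.5 else 0
print(probability, label)

# Multiclass case: hypothetical scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)
```

A single sigmoid output is enough for a two-class problem such as the one in this study; softmax generalizes the same idea to any number of classes.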

Dataset
The dataset consists of retinal OCT images and is the result of many years of work by institutions. It is made up of two classes: normal eye OCT images, free of any disease, and CNV-infected eye OCT images. The whole dataset consists of 10,000 images of infected and non-infected retina OCT, which are divided into training and testing folders so as to train the networks (Table 1).
Each image is placed in the CNV or normal folder and named accordingly; images with CNV are labelled with "CNV", an image ID, and an image number ranging from 1 to 10. Table 2 shows the basic features of the dataset images used in the study, which were the only features provided before the start of the study. The principal signs of CNV activity in OCT are the presence of intraretinal or subretinal fluid, pigment epithelial detachments (PEDs), and RPE tears. These OCT findings should be assessed biomicroscopically for the presence of fibrosis in disciform scars, which are the final and irreversible stages of the disease and can occasionally be seen on OCT.

VGG16
VGG16 is a convolutional neural network model proposed by Simonyan and Zisserman [1,2,7-15] from the University of Oxford in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of more than 14 million images with 1000 classes. VGG16 is one of the most researched models, and its weights have also been published in various studies by deep learning experts. Figure 2 shows the architecture of a VGG16 convolutional neural network [17].
A pre-trained classification CNN such as VGG16 is composed of several convolutional blocks, each containing 2 or 3 convolution (Conv2D) layers. It uses 13 convolutional layers and 3 fully connected layers. The convolutional layers in VGG16 are all 3 × 3 convolutions with a stride of 1 and "same" padding, and the pooling layers are all 2 × 2 pooling layers with a stride of 2. Features of each image in the training set are computed using a hidden layer of the pretrained VGG16 network. A single fully connected feed-forward layer is then trained on those features.
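The effect of these layer settings on the size of the feature maps can be traced with simple arithmetic. The sketch below assumes the standard VGG16 block layout and a 224 × 224 input, which is the usual ImageNet input size; the input size actually used in this study is not stated, so that value is an assumption.

```python
# (number of 3x3 conv layers, output channels) for the five VGG16 conv blocks.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution; 3x3, stride 1, padding 1 keeps it unchanged."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, kernel=2, stride=2):
    """Spatial size after a pooling layer; 2x2 with stride 2 halves it."""
    return (size - kernel) // stride + 1

def trace_vgg16(input_size=224):
    """Return the (height, width, channels) shape after each conv block."""
    size = input_size
    shapes = []
    for num_convs, channels in VGG16_BLOCKS:
        for _ in range(num_convs):
            size = conv_output_size(size)
        size = pool_output_size(size)
        shapes.append((size, size, channels))
    return shapes

print(sum(n for n, _ in VGG16_BLOCKS))  # 13 convolutional layers
print(trace_vgg16())
```

With these assumptions the spatial size halves after every block (224, 112, 56, 28, 14, 7), and the final 7 × 7 × 512 feature map is what the fully connected layers receive.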

DenseNet
DenseNet is a state-of-the-art deep learning architecture that has gained popularity among researchers in recent years. Recent studies have indicated that convolutional networks can be substantially deeper, more accurate, and more efficient to train if they contain shorter connections between layers close to the input and those close to the output. DenseNet can be seen as an enhanced version of ResNet, in which element-wise addition is used in the hidden layers [18], whereas DenseNet concatenates the feature maps instead. A DenseNet consists of dense blocks, each followed by a transition layer. In DenseNet, each layer obtains additional inputs from all preceding layers and passes on its own feature maps to all subsequent layers. As Fig. 3 shows, the layers in these dense blocks receive additional inputs from the preceding layers, which has proved a successful way of enriching the feature maps that every layer gets as input.
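Because every layer in a dense block receives the feature maps of all preceding layers, concatenated along the channel axis, the number of input channels grows steadily through the block. The sketch below counts those channels; the initial channel count, growth rate, and number of layers are illustrative values, not settings reported in this study.

```python
def dense_block_input_channels(initial_channels, growth_rate, num_layers):
    """Return the input channels seen by each layer and the block's output channels."""
    channels = initial_channels
    seen = []
    for _ in range(num_layers):
        seen.append(channels)
        # The layer's `growth_rate` new feature maps are concatenated onto the stack.
        channels += growth_rate
    return seen, channels

# Illustrative values only: 64 incoming channels, growth rate 32, 6 layers.
inputs_per_layer, output_channels = dense_block_input_channels(64, 32, 6)
print(inputs_per_layer)  # [64, 96, 128, 160, 192, 224]
print(output_channels)   # 256
```

The transition layer that follows a dense block is what brings the channel count back down before the next block.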
This study is based entirely on detecting Choroidal Neovascularization in retinal OCT images and comparing the detection accuracy of two state-of-the-art deep learning models [19][20][21][22]. VGG16 is implemented using the Keras module of TensorFlow, an open-source library for implementing deep learning models. The dataset, divided as described, was provided to the architecture as it is, without any augmentation, and training was done with a batch size of 100. Because of GPU restrictions, training was limited to four epochs. Similarly, the DenseNet model is implemented using the Keras module of TensorFlow; the data is loaded into the model using the Keras ImageDataGenerator, which forms batches of 32 images each for training. Both models are trained on 7000 infected and non-infected retinal OCT images, validated on a mixture of 3000 retinal OCT images, and tested for prediction.
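The batch sizes translate directly into the number of weight updates per epoch. The following sketch works this out for the 7000 training images, assuming the last, partially filled batch is still used.

```python
import math

def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to show every image once."""
    return math.ceil(num_images / batch_size)

TRAIN_IMAGES = 7000  # training set size given in the text

print(steps_per_epoch(TRAIN_IMAGES, 100))  # 70 steps with a batch size of 100
print(steps_per_epoch(TRAIN_IMAGES, 32))   # 219 steps with a batch size of 32
```

With the smaller batch size the model performs roughly three times as many updates per epoch, which is one reason the two training runs are not directly comparable epoch for epoch.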
After the predictions are made and saved, the diseased images detected in the test dataset are segmented using the OpenCV library for Python. The two methods used for segmenting the enhanced retinal blood vessels in the OCT image are Canny edge detection and contour marking [23]. The Canny filter is a multi-stage edge detector. It uses a filter based on the derivative of a Gaussian to compute the intensity of the gradients; the Gaussian reduces the effect of noise present in the image. The intensity gradients of the image are then found. After the Gaussian filter has been applied, gradient magnitude thresholding or lower-bound cut-off suppression is applied to get rid of spurious responses to edge detection.
Canny edge detection and contour marking are widely used methods for marking objects and recognizing lines and patterns in images. Using these segmentation techniques therefore helps segment the diseased image and reveals recognizable differences between a normal retinal OCT image and one with Choroidal Neovascularization. Figure 4 shows a visual representation of the method used in the study to train and test each model and finally compare their accuracies.

VGG16 Model Results
The VGG16 model was trained on a dataset of 7000 normal and CNV-infected OCT images taken in random order as input. The data was fed in batches of 150, meaning that 150 of the 7000 images were provided to the model at a time, over a learning period of 4 epochs. The training accuracy reached 99.03% at the final epoch, and when the model was tested on a dataset of 3000 mixed normal and diseased images it achieved an accuracy of 97.53%, a reasonable result for this amount of data. The images predicted as diseased are then separated by checking the model prediction, which is either '0' or '1', and passed to the segmentation code, where contour marking and Canny edge detection provide a reasonable distinction between a normal eye and a CNV-infected eye, which has an increased number of blood vessels due to the infection (Fig. 5).

DenseNet Model Results
The DenseNet model was trained on the same dataset and gave a different set of results. The data was fed in batches of 32 images using the image data generator. When tested on the 3000 test images, the DenseNet model achieved a prediction accuracy of 96.87%, which is 0.66 percentage points lower than that of VGG16. The diseased images are therefore taken from the model with the better prediction, VGG16, and the enhancement of the blood vessels is then segmented to show the difference between normal and CNV-infected retinal OCT images (Fig. 6).
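To put the two test accuracies in concrete terms, the sketch below converts them into approximate counts of correctly classified images. It assumes that the test set contains exactly 3000 images and that the reported accuracy is the plain fraction of correct predictions.

```python
TEST_IMAGES = 3000  # size of the test set reported in the text

def correct_predictions(accuracy_percent, num_images=TEST_IMAGES):
    """Approximate number of correctly classified images for a given accuracy."""
    return round(accuracy_percent / 100 * num_images)

vgg16_correct = correct_predictions(97.53)
densenet_correct = correct_predictions(96.87)

print(vgg16_correct, densenet_correct, vgg16_correct - densenet_correct)  # 2926 2906 20
print(round(97.53 - 96.87, 2))  # gap of 0.66 percentage points
```

Under these assumptions the difference between the two models amounts to about 20 of the 3000 test images.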

Segmentation of Blood Vessels
The infected blood vessels can be seen and marked distinctly using Canny edge detection and contour marking on the image.

Canny Edge Detection
Canny edge detection is an edge detection technique that marks the boundaries in a given image. With this method the vessels can be detected and marked along their boundaries, giving a segmented diseased area. If the same method is used to detect the edges of a normal retinal OCT image and the result is compared with the segmented image of an infected one, a clear difference is visible, as shown in Fig. 7.

Contour Marking
Contour marking is a technique used to find and mark contours in an image; we only need to assign the color of the contour boundary when drawing it. Contours are formed wherever there is a shape with a closed boundary, but if the image is too noisy the number of contours increases. The contours in an image can be counted and are visible on the image. Contour marking on diseased images showed a regular pattern of a large number of contours, ranging between 70 and 120, whereas normal, uninfected retinal OCT images with regularly shaped blood vessels have comparatively few contours, ranging between 20 and 40. As shown in Fig. 8, an infected image can easily be distinguished from a normal image by contour marking. This difference in the number of contours arises from the enhancement of blood vessels in the infected retina, which increases the size of the vessels and changes their shape, leading to the formation of new contours.
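The reported contour ranges can be turned into a simple check, sketched below. The function `describe_contour_count` is a hypothetical helper for illustration and not part of the study's code; counts that fall between or outside the two reported ranges are deliberately left undecided.

```python
DISEASED_RANGE = (70, 120)  # contour counts reported for CNV images
NORMAL_RANGE = (20, 40)     # contour counts reported for normal images

def describe_contour_count(count):
    """Say which reported range, if any, a contour count falls into."""
    if DISEASED_RANGE[0] <= count <= DISEASED_RANGE[1]:
        return "consistent with CNV"
    if NORMAL_RANGE[0] <= count <= NORMAL_RANGE[1]:
        return "consistent with normal"
    return "outside reported ranges"

for count in (25, 55, 90):
    print(count, describe_contour_count(count))
```

Because the two ranges do not overlap, the contour count acts as a visual cross-check on the classifier's prediction rather than as a classifier in its own right.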

Findings and Discussion
In our study we find that the VGG16 model provided better accuracy than the DenseNet model. The DenseNet model nevertheless showed high accuracy, even with an extra number of epochs, and its lower result could be due to the loss in the model while it was being tested. In the comparison, VGG16 provided the better predictions, so its predictions were used as input to the segmentation algorithms, which showed a clear difference between the images predicted as diseased and the normal ones. It is believed that contour marking is more helpful because it counts the number of contours, and a large increase in the count indicates that the shape of the organ has changed in one way or another. Canny edge detection is used to detect the edges in an image and also does a good job in our case: it shows the change in the boundary between a non-infected image and an infected image, making it easier to check the predicted image. Contour marking has still helped more, because it is more distinctive and provides a count.

Conclusion and Future Work
Our study concludes that VGG16 provided better accuracy than the DenseNet model on this dataset with few features. Since DenseNet also provided good results, it can be tested in future on more medical datasets that have few features. The dense connectivity of DenseNet, in which each layer reuses the feature maps of all preceding layers, gave slightly lower prediction accuracy on this dataset but was not a disappointment overall. The image data generator could improve the prediction accuracy by augmenting the images and creating a new dataset as a whole, although this would require more GPU resources. Canny edge detection and contour formation show a good level of difference between a normal image and a diseased image, and hence make it possible to differentiate the two.
Future work on this study includes improving the segmentation of the images using other algorithms, such as Hough transforms or heat map variance, to detect changes relative to a normal retinal OCT image, and comparing the results with the Canny edge and contour marking outputs. Another deep learning concept that could be used in future work is transfer learning, a relatively new area of deep learning in which a model that has already been trained and fine-tuned on one dataset is trained again on another dataset, enhancing its learning power to give better predictions. DenseNet is one of the deep learning models that has recently shown good results in a few studies when used with transfer learning. Such work is welcome and can be researched thoroughly for the future of this study.

Anuja Vaishnava obtained her bachelor's degree in Software Engineering from SRM Institute of Science and Technology, Chennai, India. She is currently working at a software company in Chennai as a full-stack developer. She is inclined towards development using recent technologies and frameworks. Her current research interests are Deep Learning and Data Science.