Dilated Deep Neural Architectures for Improving Retinal Vessel Extraction

The retinal vascular region is a promising anatomical region for the diagnosis of several common diseases, including cardiovascular disease and diabetes. In this paper we propose two novel deep neural architectures, a dilated fully convolved convolutional neural network (FCNN) and a dilated depth concatenated neural network (DCNN), to segment the retinal blood vessels. Both proposed architectures are evaluated with and without dilation, and the obtained results show that dilation enhances network performance. To eliminate the effects of non-uniform illumination and low contrast, preprocessed images are used for training the architectures. The proposed methodologies are evaluated on two publicly available databases, DRIVE and STARE. The proposed dilated FCNN architecture obtains an accuracy of about 95.39%, higher than that of the non-dilated FCNN. Similarly, the dilated DCNN architecture obtains an accuracy of about 96.16%, higher than that of the DCNN. The experimental results reveal the significance of the dilation operation in improving the semantic segmentation of retinal blood vessels.


Introduction
The human eye is an essential sensory organ for vision, and eyesight depends on the blood flow in the retinal vessels. Diseases such as diabetic retinopathy, hypertension and arteriosclerosis change the branching pattern and thickness of the retinal blood vessels, which can lead to blindness. Hence the primary requirement for screening and diagnosis of retinal diseases is the segmentation of the retinal blood vessels. Two-dimensional or three-dimensional colour fundus images are used for segmenting the retinal vessels. In the early days, analysts or experts manually segmented the retinal vessels from the fundus images, a time-consuming process that is prone to error. Automatic segmentation of retinal blood vessels reduces this time and helps in the early detection and diagnosis of diseases. Improvements in computer-aided systems have led to the development of various methods for retinal vessel segmentation. Several supervised and unsupervised methods have already been proposed. Supervised methods rely on manually annotated ground truth for segmentation of the retinal vessels, whereas unsupervised methods do not depend on manual annotation.
Salazar-Gonzalez et al. proposed an unsupervised graph-cut-based retinal vessel segmentation method [1] in which the input images are preprocessed by adaptive histogram equalization and a distance transform. Recently, unsupervised retinal vessel segmentation methodologies based on the Frangi enhancement filter [2], ICA enhancement [3], line detectors [4], Gaussian filtering with morphological processing [5] and probabilistic tracking [6] have been proposed. Retinal fundus images typically exhibit a similar intensity range for the background and the vascular region, together with non-uniform brightness and contrast. These issues lead unsupervised methods to detect background regions as vascular regions. Supervised methods can overcome this drawback by handcrafting features that describe the vessel regions. Zhu et al. proposed a supervised method [7] that extracts thirty-nine discrete features for each fundus image, comprising morphological features, Hessian features and so on. The extracted features, together with the ground truth, are given as input to an extreme learning machine classifier, and the image is classified into vessel and non-vessel regions. The high efficiency of neural-network-based methods has led to their adoption in medical image processing. Marin et al. applied a feed-forward neural network as a classifier for retinal vessel extraction using gray-level and moment-invariant features [8]. In many approaches, feature selection is a strenuous task, as the segmentation result depends on the features selected. The influence of the significant features to be considered for vessel extraction is discussed in the optimization-based feature selection method [9] for retinal vessel segmentation.
In [10], Sumathi et al. proposed a robust algorithm for segmentation of retinal vessels. The input images are preprocessed to normalize the illumination and contrast, and thirteen-dimensional feature vectors are extracted. A probability map is formed using a neural network classifier, and the images are classified into vessel and non-vessel regions. In [11], Shelhamer et al. proposed a transfer learning approach for retinal vessel segmentation using AlexNet; the conventional fully connected and classification layers at the output are modified to form a fully convolutional neural network.
The strengths of both supervised and unsupervised methods are combined in some works [11, 12] to obtain improved performance at the cost of an increased computational burden. In [12], a radius-based clustering algorithm is enhanced with a partial supervision strategy to segment blood vessels of small diameter and thickness. Thin vessel extraction is also attempted through a mean matting method [13] that combines supervised and unsupervised techniques.
Convolutional neural networks process images similarly to humans by learning features automatically. Hence, convolutional neural networks have been applied as eminent feature extractors in several works [14][15][16][17][18][19]. Orlando et al. [14] segmented retinal colour fundus images using a convolutional neural network for feature extraction and a support vector machine for classification; the input images are preprocessed to eliminate contrast and illumination problems. Wang et al. proposed a convolutional neural network as a hierarchical feature extractor [15] with a random forest as the classifier for retinal vessel extraction. In [16], Liskowski et al. proposed a deep-neural-network-based approach in which the input images are preprocessed with global contrast normalization; the network is trained with four thousand samples augmented using geometric transformations and gamma correction. In addition to the above surveys, several other deep-learning-based methods [17][18][19] have been proposed for retinal blood vessel extraction.
Fu et al. proposed a neural-network-based retinal vessel segmentation method [20] in which a convolutional neural network learns the features and generates a probability map, and the vessels are segmented by modelling pixel co-occurrence with a conditional random field. Zhexin et al. proposed a transfer-learning-based approach [21] for retinal vessel segmentation in which the obtained results are post-processed to improve the accuracy of the segmentation. In [22], Olaf et al. proposed a neural network architecture for segmenting medical images; the architecture consists of multiple hidden layers, and the output of each hidden stage is upsampled and concatenated with the output of the corresponding earlier layer. Dasgupta et al. [23] proposed a method in which the green channel of each fundus image is extracted to remove non-uniform illumination, the image is divided into patches and trained using a fully convolved neural network, and the pixels are then classified by thresholding the probability map.
In [24], Qiaoliang et al. proposed a cross-modality approach for retinal vessel segmentation in which a mapping function is learned to obtain the corresponding vessel map. The neural network consists of five layers, viz. one input layer, one output layer and three hidden layers with sigmoid activation, each consisting of 756 units. The network is trained on green-channel images, which are transformed into vessel labels.
Oliveira et al. [25] proposed a method in which multiscale analysis is performed by combining the stationary wavelet transform with a fully convolutional neural network. The green channel of each image is taken, and its mean and variance are normalized. The remaining channels are obtained by the stationary wavelet transform, and the result is used to train the neural network.
Fully convolved neural networks have been successful in the semantic segmentation of medical images. As far as retinal vessel segmentation is concerned, the extraction of thin, infinitesimal vessels is highly challenging. Several papers [26][27][28][29][30] focus on modifying fully convolved networks to enhance thin vessel extraction; the modifications include adding residual blocks, multiple paths or dropout, or varying the number of layers. Thin vessel extraction is thereby achieved at the cost of an increased computational burden. Hence, it is essential to improve thin vessel extraction without increasing the computational burden.
The background intensity range coincides with that of the thin vessel regions, making them highly challenging to extract. The dilation operation enlarges the field of view of the convolution operation [31], and increasing the field of view of the convolutional filter helps distinguish vessel regions from the background. Even when the field of view of the convolutional filter is increased, the computational complexity remains the same. In the proposed work, the retinal vessels are segmented by two dilated convolutional neural network architectures derived from [22] and [32], respectively. The contribution of the proposed work is the development of dilated fully convolved neural network and dilated depth concatenated neural network architectures for retinal vessel extraction. To verify the performance improvement brought by the dilation operation, retinal vessel extraction with a dilation factor of 1 is also evaluated.
The proposed method has the following contributions and novelty:
• Development of dilated neural network architectures for retinal vessel segmentation.
• Enhancing the thin vessel extraction by the dilated convolution.
• Performance improvement is achieved without increasing the computational burden.
• Performance improvement is verified by comparing the results of the dilated convolutional networks with those of their non-dilated counterparts.
The paper is organized as follows: Sect. 2 describes the proposed methodology, covering the outline, preprocessing, dilated convolution, the proposed architectures and the class imbalance problem. Section 3 presents the results and discussion, including the database description, evaluation metrics and experimental results. Section 4 concludes and outlines future work.

Outline
In the proposed work, two different neural network architectures are proposed: one derived from the SegNet architecture [32], named the fully convolved convolutional neural network (FCNN), and the other derived from the U-Net architecture [22], named the depth concatenated neural network (DCNN). Each network consists of an input layer, hidden layers for feature extraction and an output layer for classification. Both proposed architectures consist of five stages of encoder and decoder layers, and all the convolutional layers are replaced by dilated convolutional layers. The ultimate significance of dilated convolution is its large field of observation; it is inferred from our experimentation that computing the convolution over non-local neighbouring pixels improves the classification performance. In this work, the input images are preprocessed to eliminate non-uniform illumination and the low contrast between vessels and background. Features are extracted using the multiscale CNN and classified into vessel and non-vessel regions by pixel classification. Because the vessel region is much smaller than the non-vessel region, the pixel counts of the two classes are imbalanced. This imbalance is handled by a class-weighted cross entropy loss function, thereby reducing the misclassification of classes.

Preprocessing
Non-uniform illumination and the low contrast between vessels and non-vessels lead to improper segmentation of the vessels. The retinal colour fundus images are therefore preprocessed to eliminate the contrast and illumination problems. The green channel of the fundus image provides better contrast and reduces the illumination problem, hence it is taken as the input image. The extracted green-channel fundus images are further preprocessed for vessel enhancement using contrast limited adaptive histogram equalization (CLAHE). CLAHE is a local enhancement technique that limits the amplification of the histogram by clipping it at a predefined value; it is the most commonly used technique for making the vascular region more visible in medical images. The preprocessed fundus image is shown in Fig. 1.
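To make the preprocessing concrete, the sketch below (our illustration, not the code used in the experiments) extracts the green channel and applies a simplified, global contrast-limited histogram equalization. Real CLAHE operates on local tiles with bilinear interpolation between tile mappings, for which a library routine such as OpenCV's `createCLAHE` would normally be used; only the clipping-and-redistribution idea is shown here.

```python
import numpy as np

def clip_limited_equalize(gray, clip_limit=0.01, n_bins=256):
    """Simplified, GLOBAL contrast-limited histogram equalization.

    Real CLAHE works on local tiles with interpolation between tile
    mappings; this global variant only illustrates the clipping step.
    """
    hist, _ = np.histogram(gray.ravel(), bins=n_bins, range=(0, 256))
    limit = max(1, int(clip_limit * gray.size))        # clip threshold (pixel counts)
    excess = np.maximum(hist - limit, 0).sum()         # histogram mass removed by clipping
    hist = np.minimum(hist, limit) + excess // n_bins  # redistribute excess uniformly
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min() + 1e-12)  # normalize to [0, 1]
    return (cdf[gray.astype(np.int64)] * 255).astype(np.uint8)

def preprocess(rgb):
    """Extract the green channel (best vessel contrast) and enhance it."""
    green = rgb[..., 1]
    return clip_limited_equalize(green)
```

The clip limit caps how much any single gray level can be amplified, which is what keeps contrast-limited equalization from over-enhancing noise in the near-uniform retinal background.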

Dilated Convolution
Dilated convolution differs from a conventional convolution layer by inserting spaces between the kernel elements during the convolution operation [31]. Dilation enlarges the receptive field without increasing the number of parameters or the computation. The dilated convolution operation is given in Eq. 1.
In dilated convolution, a kernel of size k × k is effectively enlarged to size (k + (k − 1)(r − 1)) × (k + (k − 1)(r − 1)), where r is the dilation rate. It keeps the same resolution while allowing adaptable aggregation of multiscale information. The working principle of dilation with a 3 × 3 filter is illustrated in Fig. 2.
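The enlargement of the field of view can be checked numerically: inserting r − 1 zeros between the taps of a k × k kernel yields a kernel of side (k − 1)r + 1 = k + (k − 1)(r − 1) with exactly the same number of learnable parameters. A minimal NumPy sketch (illustrative only, not the paper's implementation):

```python
import numpy as np

def dilate_kernel(k, r):
    """Insert r-1 zeros between kernel taps: a k x k kernel becomes
    (k + (k-1)*(r-1)) square, matching the enlarged field of view,
    while the count of nonzero (learnable) taps stays the same."""
    kk = np.zeros(((k.shape[0] - 1) * r + 1, (k.shape[1] - 1) * r + 1))
    kk[::r, ::r] = k
    return kk

def conv2d_valid(x, k):
    """Plain 'valid' 2-D correlation, enough to illustrate the idea."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out
```

Convolving with the zero-inserted kernel is equivalent to sampling the input on a sparse grid, which is why the computational cost tracks the number of nonzero taps rather than the enlarged kernel size.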

Fully Convolutional Neural Network
The proposed FCNN architecture is derived from the SegNet architecture [32], which consists of five stages of encoder and decoder containing twenty-six convolutional layers, with two convolutional layers in each of the first two stages of the encoder and decoder and three in each of the remaining three stages, all with a constant kernel size of 7 × 7. The proposed FCNN architecture likewise consists of five stages of encoder and decoder, as shown in Fig. 3. Every encoder stage consists of two dilated convolutional layers, each followed by batch normalization and a rectified linear unit. The convolution operation in the proposed FCNN uses a dilated 3 × 3 filter with stride and padding of one. The batch normalization layers increase the training speed, and the rectified linear unit is used as the activation function. The output of each encoder hidden stage is unpooled and convolved twice in the decoder. The encoder and decoder layer specifications are listed in Tables 1 and 2, respectively.
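A single encoder stage of the kind described above, dilated convolution followed by batch normalization, ReLU and 2 × 2 max pooling, can be sketched in NumPy as follows. This is an inference-style illustration with arbitrary kernels and a single channel, not the trained network; the size-preserving padding is our simplification of the stride/padding scheme.

```python
import numpy as np

def dilated_conv_same(x, k, r):
    """k x k dilated convolution with zero padding that preserves the size."""
    pad = (k.shape[0] - 1) * r // 2
    xp = np.pad(x, pad)
    span = (k.shape[0] - 1) * r + 1          # effective field of view
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # sample the padded input on a sparse grid with step r
            out[i, j] = np.sum(xp[i:i + span:r, j:j + span:r] * k)
    return out

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize activations to zero mean / unit variance, then scale and shift."""
    return gamma * (x - x.mean()) / np.sqrt(x.var() + eps) + beta

def relu(x):
    return np.maximum(x, 0.0)

def encoder_stage(x, k1, k2, r=2):
    """Two (dilated conv -> batch norm -> ReLU) blocks, then 2x2 max pooling."""
    for k in (k1, k2):
        x = relu(batch_norm(dilated_conv_same(x, k, r)))
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```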

Depth Concatenated Neural Network
The proposed DCNN architecture is modified from the U-Net architecture [22]. The hidden layers of the DCNN architecture consist of dilated convolutional layers, max-pooling layers, ReLU layers, batch normalization layers, depth concatenation layers and up-convolutional layers. The proposed architecture is shown in Fig. 4. The hidden layers form five encoder stages and five decoder stages; each encoder stage consists of two convolutional layers and a max-pooling layer. The depth concatenation layer in the proposed work concatenates the output of each encoder stage with the corresponding decoder stage; the layer joins its two inputs along the channel dimension, and hence the channel count for the fifth stage is 512. The image size is halved at each encoder stage, and the images are then upsampled by the decoder. The convolution operation in the proposed work uses a dilation factor of 2, and the pooling operation is 2 × 2 maximum pooling. The output layer of the DCNN is the same as that of the FCNN. The encoder and decoder layer specifications are listed in Tables 3 and 4, respectively.
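The depth concatenation step itself is a simple channel-wise join of the encoder skip features with the upsampled decoder features. A minimal channels-first sketch follows; the 512-channel figure at the fifth stage is taken from the text, while the channels-first layout and shapes are our assumptions for illustration.

```python
import numpy as np

def depth_concat(decoder_feat, encoder_feat):
    """Join encoder skip features with decoder features along the channel
    axis (axis 0, channels-first layout assumed for this sketch)."""
    if decoder_feat.shape[1:] != encoder_feat.shape[1:]:
        raise ValueError("spatial dimensions must match before concatenation")
    return np.concatenate([decoder_feat, encoder_feat], axis=0)
```

Because the join happens along the channel axis, the spatial resolution is untouched while the channel depth becomes the sum of the two inputs, which is why the decoder sees both coarse semantic features and fine encoder detail.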

Class Imbalance problem
In the pixel classification layer, all the pixels in the input image are classified as vessels or non-vessels. The vessel pixels occupy less than one fourth of the total pixel count, which leads to the misclassification of vessels as non-vessels. To avoid this class imbalance problem, the class weight of the vessel class is increased. The images are classified into vessel and non-vessel classes, whose corresponding weights are denoted as CW = {CW(1), CW(2)}. The total class weight T(CW) is given by Eq. (2):

T(CW) = CW(1) + CW(2)    (2)

The class frequencies are obtained by dividing each class weight by the total class weight; the class frequency F is given by Eq. (3):

F(i) = CW(i) / T(CW)    (3)

The pixel classification layer is updated with the class weights so that more weight is given to the vessel region than to the non-vessel region. With the inclusion of class weights, the network learns well by avoiding the class imbalance problem, and the segmentation result is enhanced.
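A hedged sketch of how such class weights might be derived and applied is given below. The inverse-frequency weighting shown is a common balancing choice and our assumption, as the text describes the weights and frequencies without fixing the exact formula.

```python
import numpy as np

def class_weights(labels):
    """Inverse-frequency class weights for a binary vessel mask.

    The rarer vessel class receives the larger weight, counteracting the
    vessel / non-vessel pixel-count imbalance. (Assumed scheme, for
    illustration only.)
    """
    counts = np.bincount(labels.ravel().astype(int), minlength=2).astype(float)
    freq = counts / counts.sum()   # per-class frequency
    return freq.mean() / freq      # rarer class -> larger weight

def weighted_cross_entropy(p_vessel, labels, w):
    """Pixel-wise binary cross entropy with per-class weights applied."""
    p = np.clip(p_vessel, 1e-7, 1 - 1e-7)
    ce = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return np.mean(np.where(labels == 1, w[1], w[0]) * ce)
```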

Database
The proposed methodology is evaluated on two popular databases, DRIVE (Digital Retinal Images for Vessel Extraction) [33] and STARE (STructured Analysis of the REtina) [34]. The DRIVE images were captured with a Canon CR5 camera with a 45° field of view. The database comprises 40 images in total, half for training and half for testing. Each image is of size 565 × 584 and is available with its corresponding ground truth and mask image. For testing, two ground truths developed by two experts are provided, and each testing image also has one mask image. The STARE images were obtained using a TopCon TRV-50 fundus camera with a 35° field of view. The images are of size 605 × 700, and for each image two independent manual segmentations are offered as ground truth.

Evaluation Metric
The proposed methodology is evaluated using sensitivity, specificity and accuracy as performance metrics. Each classified pixel falls into one of the following cases: True Positive (TP), True Negative (TN), False Positive (FP) or False Negative (FN). A TP is a vessel pixel correctly predicted as vessel, while a vessel pixel wrongly predicted as non-vessel is counted as an FN. A TN is a non-vessel pixel correctly predicted as non-vessel, and a non-vessel pixel wrongly predicted as vessel is counted as an FP.
The sensitivity (Se), specificity (Sp) and accuracy (Acc) are calculated as Se = TP/(TP + FN), Sp = TN/(TN + FP) and Acc = (TP + TN)/(TP + TN + FP + FN).
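These three metrics are straightforward to compute from a pair of binary masks. The sketch below (illustrative, not the paper's evaluation code) counts the four outcomes and applies the standard ratios:

```python
import numpy as np

def metrics(pred, gt):
    """Sensitivity, specificity and accuracy from binary prediction / ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)      # vessel predicted as vessel
    tn = np.sum(~pred & ~gt)    # non-vessel predicted as non-vessel
    fp = np.sum(pred & ~gt)     # non-vessel predicted as vessel
    fn = np.sum(~pred & gt)     # vessel predicted as non-vessel
    se = tp / (tp + fn)                       # Se = TP / (TP + FN)
    sp = tn / (tn + fp)                       # Sp = TN / (TN + FP)
    acc = (tp + tn) / (tp + tn + fp + fn)     # Acc = (TP + TN) / total
    return se, sp, acc
```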

Experimental Results
The proposed dilated FCNN and DCNN architectures are trained using the twenty training images of the DRIVE database and evaluated on the remaining twenty test images.
In addition, to validate the performance of the trained architectures, we tested twenty abnormal images from the STARE database. To show the improvement in accuracy for the proposed dilated architectures, both architectures are evaluated with and without dilation. The entire experimentation is carried out on an Intel i5 core system with 32 GB of memory using Matlab 2018b. The number of epochs is a parameter that defines the number of times the training algorithm passes over the entire training dataset. Both proposed architectures are trained until the improvement in training accuracy and loss reaches an optimum, that is, until further improvement is negligible. The training accuracy and training loss for the FCNN architecture are 95.71% and 0.001, respectively, while the DCNN architecture reaches a training accuracy of about 92.35% and a loss of 0.131. Even though the training accuracy is higher for the FCNN architecture, the overall accuracy is better for the DCNN architecture. It is also observed that when all the convolutional layers are replaced by dilated convolutional layers, the overall accuracy increases for both the FCNN and DCNN architectures. The improvement in overall accuracy for the dilated architectures is due to the reduction of false positives. The segmented results for the DRIVE database using the FCNN and DCNN with and without dilation are shown in Fig. 5.
To verify the data independence of the proposed architectures, the vessels extracted from images with different levels of contrast and illumination are presented. In Fig. 5, the first row shows the result for a low-contrast image, the second row for a highly illuminated image, and the third row for a highly illuminated, low-contrast image, using the proposed architectures with and without dilation. The evaluation metrics for the twenty DRIVE test images, namely sensitivity, specificity and accuracy, are tabulated in Table 5 for the non-dilated architectures and in Table 6 for the dilated ones. The first human observer's ground truth is used for the metric calculation.
The evaluation metrics obtained for the DCNN architecture are better than those for the FCNN architecture; the accuracy of the DCNN is nearly one percentage point higher than that of the FCNN. From Tables 5 and 6, it is observed that both architectures with dilated convolution layers have improved accuracy and sensitivity. The visibility of the extracted vessels is good for the dilated DCNN architecture owing to the avoidance of artifacts arising from the retinal image capturing process. The maximum sensitivity, specificity and accuracy over the twenty DRIVE images for the dilated FCNN are 87.57%, 97.97% and 95.92%, respectively; for the dilated DCNN they are 89.44%, 98.31% and 97.24%. The segmented images of the two dilated architectures are compared in Fig. 6, from which it is clearly seen that the results obtained using the DCNN with dilation are better than those of the FCNN with dilation. Thin vessels are correctly distinguished from the background by the dilated convolutional layers of the DCNN.
Cross validation of the proposed FCNN and DCNN is performed using the STARE database, i.e., the networks are trained on the DRIVE database and tested on the STARE database. The abnormal images from the STARE database are purposely selected for cross validation. The segmented results for the STARE database are shown in Fig. 7; the results for the abnormal images are also found to be good for the dilated DCNN compared to the other architectures.
From Fig. 7, it is observed that abnormalities such as red and bright lesions present in the input images are eliminated well by the proposed architectures. The evaluation metrics for the twenty STARE test images using both dilated architectures, namely sensitivity, specificity and accuracy, are tabulated in Tables 7 and 8. The maximum sensitivity, specificity and accuracy on the STARE database for the FCNN with dilation are 95.58%, 98.55% and 96.31%, respectively. To verify the significance of dilation in the convolution layers, the proposed architectures are compared with some existing deep neural network (DNN) architectures and other recent methods. The performance comparison on the DRIVE database with existing methodologies is listed in Table 7, and it shows that the proposed depth concatenated dilated architecture performs better than most of the existing methods. We also examine the data independence of the proposed architectures: the dilated architectures trained on normal images from the DRIVE database are tested on selected abnormal images from the STARE database. The sensitivity achieved by the depth concatenated dilated architecture is about 89.25%, which, as far as we know, is the maximum sensitivity obtained for retinal vessel extraction on the STARE database. The cross-trained STARE results also compare favourably with the existing methodologies, as listed in Table 8. The accuracy and sensitivity of the dilated DCNN architecture show better performance than the other methods (Tables 9 and 10).

Conclusion
The extraction of fine vessels from retinal fundus images is still a challenging task. In this work, we proposed two novel deep neural networks based on dilated convolution layers. Dilated convolution performs well in the vessel detection task irrespective of vessel thickness. It is also inferred that, for pixel-level classification, depth concatenation combined with dilated convolution improves the result. The proposed architectures perform optimally owing to the inclusion of the class-balanced cross entropy loss function, and the dilated architectures optimize the segmentation of the retinal blood vessels. The proposed methodology is evaluated on the DRIVE and STARE databases using the two proposed architectures. Fine vessels are detected better by the DCNN with dilation than by the FCNN architecture, and the proposed dilated architectures show superior fine vessel detection even in the presence of abnormalities. In future, the work can be further improved by enhancing the encoder and decoder layers. In conclusion, based on the computed performance measures and also on visual inspection, the proposed architectures outperform the state-of-the-art methods.