Method
The data set consists of images that are fed into the GAN. The generator network synthesizes realistic images from the available original data, while the discriminator network simultaneously learns from the real and generated images; its output for each sample is termed a prediction. Throughout the learning process the network parameters are updated and fine-tuned. The final segmentation results are produced by a SegNet initialized by transferring these learned weights.
The proposed system, shown in Fig. 1, generates images using a generative adversarial network. The generated images are then used to train a SegNet, whose parameters are fine-tuned using real data. Through parameter transfer learning, the SegNet is adapted for image segmentation, and the results are verified on a test data set.
Augmented image generation using GAN
Instead of traditional square patches, we use horizontal and vertical patches, which preserve both global and local appearance features. Horizontal patches are 16x64 pixels and vertical patches 64x16 pixels. The mean intensity and variance of all the patches in the training set are computed, and every patch is normalized to zero mean and unit variance before being fed into the autoencoder for training.
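As a minimal sketch of this patch pipeline (the CT slice, stride, and normalization over pooled training-set statistics are illustrative assumptions, not specified in the text):

```python
import numpy as np

ct_slice = np.random.rand(512, 512)  # placeholder for a real CT slice

def extract_patches(image, patch_h, patch_w, stride):
    """Slide a patch_h x patch_w window over a 2-D slice."""
    rows = range(0, image.shape[0] - patch_h + 1, stride)
    cols = range(0, image.shape[1] - patch_w + 1, stride)
    return np.stack([image[r:r + patch_h, c:c + patch_w]
                     for r in rows for c in cols])

# Horizontal (16x64) and vertical (64x16) patches, as in the text.
horizontal = extract_patches(ct_slice, 16, 64, stride=16)
vertical = extract_patches(ct_slice, 64, 16, stride=16)

# Mean and variance over all training patches, then zero-mean,
# unit-variance normalization before the autoencoder sees the patches.
values = np.concatenate([horizontal.ravel(), vertical.ravel()])
mu, sigma = values.mean(), values.std()
horizontal = (horizontal - mu) / sigma
vertical = (vertical - mu) / sigma
```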
The idea is implemented using a deep learning based generative model. Generative Adversarial Networks (GANs) are adopted here to improve system performance through adversarial training. A GAN consists of two opposing networks, one generative and one discriminative, which compete towards the learning target to obtain optimized results. Such networks can be trained to generate data following any distribution. Random inputs are used to generate the initial set of images. The discriminator attempts to distinguish the generated images from the real ones and, through its feedback, drives the generator to reduce this discrepancy. Guided by the feedback, the generator adjusts its parameters to produce better outputs. This is achieved through the loss functions: the generator and discriminator losses are updated throughout training, yielding generated images that closely resemble the real ones. The network thereby supplies high quality data for the process. CT images can be generated for various applications using such generative models [10]-[15].
The standard GAN loss function is given below. It comprises the discriminator loss and the generator loss: the generator tries to minimize the function, while the discriminator works in the opposite direction and tries to maximize it.
Loss function = E_x[log(D(x))] + E_z[log(1 - D(G(z)))] (1)
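Equation (1) translates directly into code. A minimal PyTorch sketch (the tensor values below are toy examples; d_real and d_fake stand for discriminator outputs in (0, 1) on real and generated samples):

```python
import torch

def discriminator_loss(d_real, d_fake):
    # D maximizes E_x[log D(x)] + E_z[log(1 - D(G(z)))],
    # i.e. minimizes the negative of Eq. (1).
    return -(torch.log(d_real).mean() + torch.log(1.0 - d_fake).mean())

def generator_loss(d_fake):
    # G minimizes E_z[log(1 - D(G(z)))], the second term of Eq. (1).
    return torch.log(1.0 - d_fake).mean()

d_real = torch.tensor([0.9, 0.8])   # toy discriminator scores on real images
d_fake = torch.tensor([0.2, 0.3])   # toy scores on generated images
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```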
Generative models can synthesize new realistic data with high fidelity by sampling from the probability distribution of the existing dataset. GANs fall into this category and are used here to generate realistic data. Performance can be further improved by integrating multiple generators and discriminators into the system; a GAN trained in such a distributed fashion is referred to as a Multi-Discriminator GAN.
In the GAN shown in Fig. 2, random inputs are given to the generator, which produces sample images intended to be indistinguishable from the real images. These samples, together with real image samples, are fed into the discriminator, whose function is to distinguish generated images from real ones. The discriminator's outputs define the two loss functions, the generator loss and the discriminator loss. The signal provided by the discriminator acts as feedback to the generator, which uses it to update its weights. By updating its weights periodically through backpropagation from the discriminator, the generator reproduces images with ever greater similarity to the real samples.
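The alternating update described above can be sketched as follows, reusing the loss helpers defined after Eq. (1). The toy generator, discriminator, batch size, and latent dimension are placeholder assumptions for illustration:

```python
import torch
import torch.nn as nn

z_dim = 16
G = nn.Sequential(nn.Linear(z_dim, 64), nn.Tanh())   # toy generator
D = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())    # toy discriminator
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

for step in range(100):
    real_batch = torch.randn(32, 64)   # placeholder for real image samples

    # 1. Discriminator update on real and generated samples.
    fake_batch = G(torch.randn(32, z_dim)).detach()  # block G's gradients
    loss_D = discriminator_loss(D(real_batch), D(fake_batch))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # 2. Generator update: the discriminator's score on fresh fakes is the
    #    feedback signal; gradients flow back through D into G's weights.
    loss_G = generator_loss(D(G(torch.randn(32, z_dim))))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```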
SegNet
Semantic segmentation models broadly comprise an encoding network and a decoding network. The encoder is a pre-trained classification network composed of convolution layers; it performs feature extraction, drawing features out of the input through convolution. The decoder, composed of deconvolution layers, maps the low resolution feature maps back to full input resolution for dense, pixel-wise classification. The purpose of the deconvolution layers is upsampling: features whose dimensions were reduced by the encoder are upsampled back to the image size.
Figure 3 shows the SegNet system flow from input to output. In SegNet, a set of encoder layers encodes the input images using deep convolutions, and corresponding decoder layers decode them. The decoded images are forwarded to a classification layer, which acts as a pixel-based classifier. The input image is sliced into horizontal and vertical patches, which are passed through the encoder; a feature fusion layer then combines the features collected from the patches. Based on these features, the decoder layers produce the output, which is the lung nodule segmentation result shown in Fig. 3.
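A minimal SegNet-style encoder-decoder in PyTorch may clarify the flow. This is a toy sketch: layer sizes are illustrative, and the feature fusion layer of Fig. 3 is omitted for brevity. SegNet's characteristic trick, reusing max-pooling indices for upsampling, is included:

```python
import torch
import torch.nn as nn

class MiniSegNet(nn.Module):
    """Toy SegNet: the encoder stores max-pooling indices and the
    decoder reuses them for upsampling (MaxUnpool2d)."""
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1),
                                 nn.BatchNorm2d(64), nn.ReLU())
        self.pool = nn.MaxPool2d(2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(2)
        self.dec = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1),
                                 nn.BatchNorm2d(64), nn.ReLU(),
                                 nn.Conv2d(64, n_classes, 1))  # pixel classifier

    def forward(self, x):
        f = self.enc(x)
        f, idx = self.pool(f)      # encode: downsample, remember indices
        f = self.unpool(f, idx)    # decode: upsample to input resolution
        return self.dec(f)         # per-pixel class scores

logits = MiniSegNet()(torch.randn(1, 1, 64, 64))  # -> shape (1, 2, 64, 64)
```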
Transfer Learning Process
Traditional machine learning does not accumulate information: knowledge learned previously is not carried over into a new task, so traditional learning is an isolated, single-task process. In transfer learning, by contrast, the learning process builds on previously learned information. The transfer learning process is therefore faster and requires less training data.
Transfer learning starts from a network previously trained for another task. The previously learned information is reused in the new training process, making it faster and more accurate; the method amounts to reusing a pre-trained model. In practice, this is done by transferring the weights learned by the network: a problem can be solved more easily using the knowledge gained from solving a related problem. Figure 4 illustrates the difference between the traditional and transfer learning processes.
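A minimal sketch of the weight transfer step, assuming the MiniSegNet toy model from above. The "pretrained" model stands in for a SegNet trained on GAN-generated images; in practice its weights would come from a saved checkpoint, and freezing the encoder during fine-tuning is one common choice, not necessarily the authors':

```python
import torch

# Stand-in for a SegNet pre-trained on GAN-generated images.
pretrained = MiniSegNet()

# Transfer the learned weights into a fresh model for fine-tuning on real data.
model = MiniSegNet()
model.load_state_dict(pretrained.state_dict())

# Freeze the encoder so only the decoder is fine-tuned on real images.
for p in model.enc.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```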
The procedure followed throughout the investigation is described below.
A. Database
Many public databases provide data sets of internal-organ images for research purposes. Lung Nodule Analysis 2016 (LUNA16) is a widely known data set of lung CT images. Another source is the Decathlon lung data set, which contains a large amount of lung image data, and the NSCLC Radiogenomics data set is also useful for this purpose. LUNA16 provides three dimensional CT images intended for lung nodule detection, while Decathlon includes three dimensional CT images together with segmentation labels. Images from the Decathlon lung data set, without pre-processing, are used to prepare three training data sets. The NSCLC Radiogenomics data set includes CT and PET/CT images of non-small cell lung cancer; only its images with segmentation labels can be used, as test images, in this system.
B. Setup
The LUNA16 lung database consists of 888 CT scans with slice thickness less than 2.5 mm. In the LIDC/IDRI database, the annotation process categorizes scans into three classes: no nodule, nodule smaller than 3 mm, and nodule larger than 3 mm. A GAN can be used to generate images for lung nodule detection, and the same method can generate a dataset for nodule segmentation. Because labels for large true nodules are scarce, generating such data directly is not practical; therefore, three dimensional CT images corresponding to small lung nodules are used to generate lung images containing cancerous nodules.
C. Training and Optimization
Figure 5 shows examples of benign and malignant nodules found in lung tissue; the first row shows benign nodules and the second row malignant ones. All images are taken from the LUNA16 database of 888 CT scans. The GAN is used to generate further images from this dataset.
A total of 100,1000 images are generated for training the SegNet architecture for nodule segmentation; both real and generated images are used in the training process. The details of the training data for the GAN and the SegNet are given in Table 1.
Table 1. Original and augmented images used for training the GAN and SegNet.

| Training | Benign (original, augmented) | Malignant (original, augmented) |
|----------|------------------------------|---------------------------------|
| SegNet   | 450, 1280                    | 430, 1148                       |
| GAN      | 225, 640                     | 215, 574                        |
The original data from LUNA16 is used to generate further benign and malignant samples with the GAN. A total of 440 original images (225 benign and 215 malignant) and their augmented versions are used to train the GAN, which then generates the 100,1000 images.
D. Evaluation
There is a significant difference in segmentation quality when GAN-generated images are used, as is evident from Table 2. Three parameters are used for evaluation: the Dice Similarity Coefficient (DSC), the Positive Predictive Value (PPV), and Sensitivity. DSC measures the overlap between the ground truth and the segmented result; Sensitivity is the proportion of actual positives that are identified correctly; and PPV is the proportion of predicted positives that are true positives. These parameters are computed from the confusion matrix as follows:
DSC = 2TP / (FP + 2TP + FN) (2)
PPV = TP / (TP + FP) (3)
Sensitivity = TP / (TP + FN) (4)
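Equations (2)-(4) are straightforward to compute from confusion-matrix counts; the counts in the usage example below are made up for illustration:

```python
def segmentation_metrics(tp, fp, fn):
    """Compute DSC, PPV and sensitivity from confusion-matrix counts,
    following Eqs. (2)-(4)."""
    dsc = 2 * tp / (fp + 2 * tp + fn)
    ppv = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return dsc, ppv, sensitivity

# Example: 90 true positives, 10 false positives, 20 false negatives.
print(segmentation_metrics(90, 10, 20))  # (0.857..., 0.9, 0.818...)
```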
The receiver operating characteristic (ROC) curve of the SegNet, trained with and without GAN-generated data, is also evaluated, and the area under the curve (AUC) is computed from the ROC curve.
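A minimal sketch of the ROC/AUC computation using scikit-learn (the label and score arrays are toy placeholders for ground-truth pixel labels and the SegNet's predicted nodule probabilities):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1])            # ground-truth pixel labels
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # predicted nodule probabilities

fpr, tpr, _ = roc_curve(y_true, y_score)
print(auc(fpr, tpr))  # area under the ROC curve
```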