In this section, we outline the proposed technique, describe the WP-UNet architecture, and present the experiments.
3.1 Weighted Pruning (WP) with Depthwise Separable Convolutions
WP-UNet builds on the standard U-Net by modifying its regular convolutions. In this work, to minimize the number of parameters and the computation required by the U-Net model, the regular convolution layers are replaced with depthwise separable convolution layers [21], and weighted pruning (WP) is applied to the U-Net's applicable layers. As a result, WP-UNet achieves a smoother loss curve during training and helps increase model accuracy.
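The parameter saving from the depthwise separable replacement can be illustrated with simple counting. The sketch below compares the weight counts of a standard convolution and a depthwise separable convolution; the kernel size and channel widths are illustrative assumptions, not values from the paper.

```python
# Parameter counts for a standard vs. a depthwise separable convolution.
# Illustrative arithmetic only; layer sizes below are assumed, not from the paper.

def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """One depthwise k x k filter per input channel, then a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution mapping 64 -> 128 channels.
standard = conv_params(3, 64, 128)             # 73,728 weights
separable = separable_conv_params(3, 64, 128)  # 8,768 weights
print(standard, separable, round(standard / separable, 1))
```

For this layer the separable variant uses roughly 8x fewer weights, which is where the reduction in parameters and computation comes from.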
3.2 WP-UNet (Proposed Architecture)
With a few changes, WP-UNet (Fig. 5) follows an architecture similar to U-Net's. All convolution layers are depthwise separable convolutions, except for the first convolution layer, which is a regular convolution. The encoding path is made up of five blocks with weighted pruning [27]. The WP-UNet architecture is designed as follows:
- Block 1: Initial block with a regular convolution layer, a ReLU activation function, and batch normalization
- Blocks 2, 3, and 4: Each is a WP-UNet block (Fig. 4) composed of two depthwise separable convolution layers [21], two activation layers, and one normalization layer
- Block 5: A final depthwise separable layer [21] with a dropout layer [17]
In the decoding path, upsampling is performed with a scale factor of two to restore the size of the segmentation map. The WP-UNet's decoding path is made up of a mixture of standard convolution blocks and WP-UNet blocks, and it contains the same number of blocks as the encoding path.
- Block 1: A depthwise separable convolution layer whose features are concatenated with those of the dropout layer [17] from Block 4 of the encoding path
- Blocks 2, 3, and 4: A WP-UNet block and a depthwise separable layer [21], concatenated with the matching blocks from the encoding path
- Block 5: Two WP-UNet blocks, with the last one serving as the final layer, and two depthwise separable layers
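The shape bookkeeping implied by the five-block encoder and the scale-two upsampling in the decoder can be sketched as follows. This is a minimal sketch under assumed settings: 'same' padding inside blocks, 2x2 downsampling between encoder blocks, and illustrative channel widths that the paper does not specify.

```python
# Spatial/channel shapes along the WP-UNet encoding path described above.
# Assumptions (not from the paper): 'same' padding, 2x2 pooling between
# blocks, and the channel widths listed below.

def encoder_shapes(h, w, widths=(32, 64, 128, 256, 512)):
    """Return the (height, width, channels) output of each of the five encoder blocks."""
    shapes = []
    for i, c in enumerate(widths):
        shapes.append((h, w, c))      # a block keeps spatial size ('same' padding)
        if i < len(widths) - 1:       # downsample by 2 between blocks
            h, w = h // 2, w // 2
    return shapes

print(encoder_shapes(256, 256))
```

The decoder mirrors this list in reverse: each scale-two upsampling restores one halving, which is why its features can be concatenated with the matching encoder block.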
3.3 Configuration
Training was based on Keras with a TensorFlow backend, using Google Colab as the deep learning environment with an NVIDIA T4 GPU (12 GB memory) and a high-memory VM.
3.4 Dataset (KiTS Challenge Dataset)
The KiTS challenge dataset for kidney tumor segmentation is used to assess the performance of WP-UNet. The proposed deep network model is applied to the KiTS dataset [5]. It consists of 210 high-contrast CT scans of patients, collected in the preoperative arterial phase from a cohort of subjects who underwent partial or radical nephrectomy [26] for one or more kidney tumors at the University of Minnesota Medical Center and were candidates for inclusion in this database between 2010 and 2018. The included volumes have in-plane resolutions ranging from 0.437 to 1.04 mm, with slice thicknesses ranging from a minimum of 0.5 mm to a maximum of 5.0 mm.
The dataset also provides ground-truth masks of both healthy kidney tissue and tumor tissue (Fig. 6) for each included case. Under the guidance of experienced radiologists, a group of medical students manually generated the labels using only the axial projections of the CT scan images. A detailed description of the ground-truth segmentation strategy is given in [5]. The KiTS challenge dataset is provided in the standard NIfTI format with shape (num slices, height, width).
Figure 6 Sample CT scan images and ground-truth labels from the Kidney and Kidney Tumor Segmentation (KiTS) dataset.
3.5 Data Preprocessing
The resolution of the image stacks in the KiTS challenge dataset was originally 512 x 512 but, because of technical limitations, was resized to 256 x 256. To reduce disk usage, the data stacks, available in the standard NIfTI format, were converted into TFRecords. Owing to the small number of available training images, data augmentation techniques were used: with too few images, a trained model can overfit, performing very well on training data but poorly on new test data. The augmentation techniques used were horizontal flips, a zoom range, and height and width shift ranges. After augmentation, the number of image stacks grew to 120. Center cropping and data normalization were also applied to ensure zero mean and unit variance, and the original 3D volumes were converted into 2D slices for training and testing of the U-Net [19] with separable convolution and ReLU layers. The proposed WP-UNet with the ReLU activation function is trained on 44,175 images and validated on 17,030 images.
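The center-cropping and normalization steps above can be sketched in a few lines. This is a minimal sketch of that preprocessing, not the paper's exact pipeline; the crop size matches the stated 256 x 256 target, while the epsilon and the random test slice are assumptions.

```python
import numpy as np

# Sketch of the preprocessing described above: center-crop a CT slice to
# 256 x 256 and normalize it to zero mean and unit variance.

def center_crop(img, size=256):
    """Crop a (H, W) slice to size x size around its center."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def normalize(img, eps=1e-8):
    """Shift and scale to zero mean and unit variance (eps avoids division by zero)."""
    return (img - img.mean()) / (img.std() + eps)

# A stand-in for one 512 x 512 slice from the dataset.
slice_512 = np.random.rand(512, 512).astype(np.float32)
out = normalize(center_crop(slice_512))
print(out.shape)
```

The same two functions would be applied to every 2D slice before it is serialized into a TFRecord.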
3.6 Optimization
The Adam optimization algorithm [16] was used to train the network model on the KiTS CT scan image dataset, with a learning rate ranging from 0.0001 to 0.00001. The training loss on the KiTS dataset was a weighted sum of the negative Dice loss and the binary cross-entropy loss.
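The weighted-sum loss above can be sketched as follows. The smoothing term, clipping epsilon, and the equal weights `w_dice` and `w_bce` are assumptions for illustration; the paper does not state the exact values.

```python
import numpy as np

# Sketch of the combined training loss: a weighted sum of the negative Dice
# coefficient and binary cross-entropy. Weights and smoothing are assumed.

def dice_coeff(y_true, y_pred, smooth=1.0):
    """Dice overlap between a binary mask and a predicted probability map."""
    inter = np.sum(y_true * y_pred)
    return (2.0 * inter + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy, with clipping for numerical stability."""
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def combined_loss(y_true, y_pred, w_dice=0.5, w_bce=0.5):
    """Weighted sum of negative Dice and binary cross-entropy."""
    return w_dice * (-dice_coeff(y_true, y_pred)) + w_bce * bce(y_true, y_pred)
```

Minimizing the negative Dice term directly rewards mask overlap, while the cross-entropy term keeps per-pixel gradients well behaved early in training.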
3.7 Performance Metrics
The key performance metrics used in measuring WP-UNet performance on the CT scan dataset are explained in detail in this section.
Accuracy measures the percentage of correct predictions and is given by:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP = predicted positive and it is true, TN = predicted negative and it is true, FP = predicted positive and it is false, and FN = predicted negative and it is false.
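The accuracy computation from the counts defined above is a one-line function; the counts used in the example are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions (true positives + true negatives) out of all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts:
print(accuracy(tp=40, tn=50, fp=5, fn=5))  # → 0.9
```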
Mean Intersection Over Union (Mean IOU)
Mean IoU [28] is a popular evaluation metric for semantic segmentation that first computes the IoU for each semantic class and then averages over the classes. The mean IoU is defined as:

Mean IoU = (1/N) Σ_c TP_c / (TP_c + FP_c + FN_c)

where N is the number of classes and TP_c, FP_c, and FN_c are the true positives, false positives, and false negatives for class c.
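The per-class-then-average computation can be sketched directly from per-class counts; the two-class counts in the example are hypothetical, chosen only to illustrate the averaging.

```python
def mean_iou(per_class_counts):
    """Mean IoU from a list of (TP, FP, FN) tuples, one tuple per semantic class."""
    ious = [tp / (tp + fp + fn) for tp, fp, fn in per_class_counts]
    return sum(ious) / len(ious)

# Hypothetical counts for two classes (e.g. kidney, tumor):
print(round(mean_iou([(80, 10, 10), (30, 10, 20)]), 2))  # → 0.65
```

Averaging over classes keeps a small class (such as the tumor) from being swamped by a large one, unlike plain pixel accuracy.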
Floating Point Operations (FLOPs)
Floating point operations (FLOPs) are essentially a count of the floating point multiplications and additions to be performed by the processor of the computation device. For a neural network, such floating point operation counts are used to estimate the complexity of the proposed model.
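For convolution layers, this count can be written down in closed form: each output element of a k x k convolution needs k·k·c_in multiply-add pairs. The sketch below applies that rule to a standard and a depthwise separable layer; the layer sizes are illustrative assumptions.

```python
# Rough FLOP counts for one convolution layer: each output element needs
# k*k*c_in multiply-accumulate (MAC) operations, and each MAC is counted
# as two FLOPs (one multiplication + one addition). Sizes are assumed.

def conv_flops(h_out, w_out, k, c_in, c_out):
    """FLOPs of a standard k x k convolution producing an h_out x w_out x c_out map."""
    macs = h_out * w_out * c_out * k * k * c_in
    return 2 * macs

def separable_conv_flops(h_out, w_out, k, c_in, c_out):
    """FLOPs of a depthwise k x k pass plus a 1 x 1 pointwise pass."""
    depthwise = h_out * w_out * c_in * k * k
    pointwise = h_out * w_out * c_in * c_out
    return 2 * (depthwise + pointwise)

# Example: a 3x3 layer on a 256 x 256 map, 64 -> 128 channels.
print(conv_flops(256, 256, 3, 64, 128))
print(separable_conv_flops(256, 256, 3, 64, 128))
```

Summing such per-layer counts over the network gives the model-complexity estimate reported for WP-UNet.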