The images used in this study were taken from a database on the Brigham Hospital website [12], which contains MRI images of 230 patients of different weights. Only a small number of these images were usable for image processing, so the set of images approved by the physician was enlarged. All patients registered on this site have prostate cancer, and the benign or malignant type of the disease has been confirmed by a reputable physician. In this study, the number of T2W prostate MRI images, labeled by the physician as benign or malignant, was increased using popular noise-removal methods in image processing. Then, to separate the prostate gland from the rest of the image, the efficient Faster R-CNN method was used for region segmentation. Next, a convolutional neural network with different architectures was used for feature extraction and automatic classification. Finally, these architectures are compared in terms of efficiency, and the results are compared with those of similar methods, which shows the efficiency of this method in response speed and relative accuracy. Figure 2 displays a general overview of the presented method.
2.1 Data Preprocessing and Augmentation
In this study, T2W MRI images of 31 prostate cancer patients were used. This number of images is too small for a deep learning-based training system. It can be increased using common data augmentation techniques such as rotation, scaling, cropping, translation, and color augmentation, followed by preprocessing that applies noise-removal filters [1, 13]. Here, to minimize execution time, data augmentation was performed using popular image processing methods. Morphology was also used to increase the number of images because of its valuable characteristics in extracting image features. Morphology investigates structural and formal characteristics such as shape, color, and patterns. The basic morphological operators are erosion, dilation, opening, and closing; they are used for filling holes, connecting objects, and similar tasks. In morphological processing, the input and output images have the same size. The dilation operator adds pixels to the borders of objects, while erosion removes pixels from the borders of objects. The opening and closing operators are combinations of the other two. Because of their characteristics, dilation and erosion are more widely used [14], and this article uses these two common operators as well.
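The behaviour of the dilation and erosion operators described above can be sketched directly in NumPy. This is a minimal illustration on a binary image, not the code used in the study:

```python
import numpy as np

def dilate(img, k=3):
    """Binary dilation: a pixel becomes 1 if any pixel in its k x k
    neighbourhood is 1, so object borders grow outward."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=0)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def erode(img, k=3):
    """Binary erosion: a pixel stays 1 only if every pixel in its
    k x k neighbourhood is 1, so object borders shrink."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=1)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out

# A 5x5 image with a single centre pixel: dilation grows it to a 3x3 square,
# and erosion of that result (a closing) recovers the single pixel.
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 1
print(dilate(img).sum())         # 9
print(erode(dilate(img)).sum())  # 1
```

Applying different combinations of these operators to each image yields additional training samples with the same size as the input, which is how morphology serves as an augmentation step here.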
Gaussian filters [15], histogram equalization [16], and the median filter are among the popular filtering methods in image processing for removing the noise common in MRI images [17, 18]. Hence, more realistic images are obtained for processing. Moreover, combining the preprocessing stage with data augmentation, rather than running preprocessing as a separate stage, increases the response speed. Therefore, in addition to the original image, five different types of masks were applied to each image, bringing the total number of images to 186.
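As a minimal illustration (not the study's preprocessing code), two of the filters named above, the median filter and histogram equalization, can be written in a few lines of NumPy:

```python
import numpy as np

def median_filter(img, k=3):
    """Replace each pixel by the median of its k x k neighbourhood;
    effective against the impulse noise common in MRI images."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

def equalize_hist(img, levels=256):
    """Histogram equalization: remap grey levels so the cumulative
    distribution becomes approximately uniform, raising contrast."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) * (levels - 1) // max(cdf.max() - cdf.min(), 1)
    return cdf[img].astype(np.uint8)

# A flat image corrupted by one bright impulse: the median filter removes it.
img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255
print(median_filter(img)[2, 2])  # 100
```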
2.2 Segmentation
Zoning, or separating, means isolating one part of the image from the rest, which can be done using region segmentation methods. Currently, one of the popular methods for this purpose is Faster R-CNN [19]. This network has several advantages for region segmentation: the features of each convolutional layer can be used to predict the related region proposals, so there is no need for an algorithm such as selective search, and the model gains speed by avoiding repetitive computations. Faster R-CNN consists of two modules: an RPN [20] and Fast R-CNN [21]. The input image first passes through the convolutional layers of a CNN to produce a feature map, which then enters the RPN. The RPN generates region proposals, i.e., areas likely to contain the object in question. This network has many advantages over one-stage systems such as YOLO [22] and SSD [23]: among other things, the computational cost of generating the proposed regions is significantly reduced, the network places no limit on the size of the input image, and it can be applied to real images.
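The RPN judges its candidate boxes by their overlap with the ground-truth region, measured as intersection-over-union (IoU). The sketch below illustrates this measure with a box format of [x1, y1, x2, y2]; it is a generic illustration, not the paper's implementation:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2].
    An RPN uses this overlap to label candidate regions as
    object or background."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A proposal overlapping half of a 10x10 ground-truth box scores 1/3:
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))  # 0.3333...
```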
Figure 3 shows how the Faster R-CNN network operates on the image and depicts the process of extracting the prostate region. After the region proposal network selects candidate regions from the extracted features, these regions are pooled to the same size (ROIs) and transferred to the fully connected layers, where they are classified by the Fast R-CNN head.
In this research, 80% of the images are given to Faster R-CNN for training. Training is initialized with the pre-trained AlexNet network [24], and the remaining 20% of the images are used to assess the network [25, 26]. In this way, the prostate region is separated from the rest of the MRI as the target area.
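The 80/20 split can be sketched as follows; this is a generic illustration with a fixed random seed, not the exact partition used in the study:

```python
import random

def split_dataset(items, train_frac=0.8, seed=0):
    """Shuffle reproducibly, then split into training and evaluation sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

# With the 186 augmented images described above: 148 for training, 38 for testing.
train, test = split_dataset(range(186))
print(len(train), len(test))  # 148 38
```

Fixing the seed makes the partition reproducible across runs, which matters when comparing several architectures on the same held-out set.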
2.3 Feature Extraction and Classification by Deep Learning Network
In recent years, deep learning networks have helped physicians diagnose diseases. Using improved algorithms, deep learning methods can provide precise and professional predictions about an image. This is achieved by building a training model from a separate data set, whose validity is later assessed. In unsupervised learning, the software identifies patterns unassisted, whereas in supervised learning a model is trained on a data set with correct labels. Today, machine learning systems are mostly based on supervised learning, and pre-trained networks are commonly used: each layer is first trained independently, and the whole network is then fine-tuned in an integrated manner [27]. Some features need only shallow networks, while others produce better responses with deeper ones. A main aim of medical image processing is to segment images into several regions sharing the same features. CNNs are well suited to this challenge because feature extraction and classification are performed simultaneously. In this research, different CNN architectures were compared to diagnose the two types, benign and malignant, of prostate cancer [18].
Much work has been done on using CNNs [28] to diagnose prostate cancer and the extent of the disease [19]. In this study, different CNN architectures for diagnosing prostate cancer as benign or malignant are compared, and the results show the acceptable performance of the proposed design.
2.3.1 ResNet Neural Network
Residual neural networks ranked first in the ILSVRC 2015 challenge. The ResNet architecture has been designed with up to 152 layers for training and learning, and it outperforms earlier architectures such as AlexNet and GoogLeNet [29]. In this structure, the network is built from stacked residual blocks. Architectures of various depths have been introduced, including ResNet18, which uses 18 layers for training and learning; ResNet34, ResNet50, and ResNet101 are 34, 50, and 101 layers deep, respectively [30]. The more layers, the deeper the network and, in principle, the better the response; but when higher response speed with relative accuracy is desired, shallow networks should also be considered.
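The residual block underlying all ResNet variants can be illustrated on a plain feature vector. This is a simplified NumPy sketch (one skip connection over two weight layers, no convolutions or batch normalization):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(x, w1, w2):
    """One residual block: the input x is added back to the transformed
    signal (identity skip connection), so very deep stacks of blocks can
    still propagate useful gradients."""
    return relu(x + w2 @ relu(w1 @ x))

# With zero weights the block reduces to the identity for non-negative x,
# which is what makes adding extra layers "safe" in ResNet: a block can
# always learn to do nothing.
x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
print(residual_block(x, w_zero, w_zero))  # [1. 2. 3.]
```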
2.3.2 MobileNetv2 Neural Network
MobileNetV2 is a convolutional neural network 53 layers deep. A version pre-trained on more than one million images from the ImageNet database is available; this pre-trained network can classify images into 1000 object categories, such as keyboards, mice, pencils, and many animals. The MobileNetV2 architecture is based on an inverted residual structure, in which the input and output of the residual blocks are thin bottleneck layers. It also uses lightweight depthwise convolutions to filter the features of the expansion layer and removes non-linearities in the thin layers. High speed, a low number of parameters, and acceptable accuracy are strong features of MobileNetV2; its convolution, which differs from the standard one, is the reason for its minimal parameter count [31].
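The parameter saving from MobileNet-style depthwise-separable convolutions, the "convolution different from the standard one" mentioned above, can be checked with simple arithmetic. Biases and batch-norm parameters are ignored in this sketch:

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k=3):
    """Depthwise k x k filter per input channel, followed by a 1x1
    pointwise convolution that mixes channels - the factorisation
    MobileNet architectures use to cut parameters."""
    return c_in * k * k + c_in * c_out

# For a 3x3 convolution from 32 to 64 channels:
std = conv_params(32, 64)                 # 18432
sep = depthwise_separable_params(32, 64)  # 2336
print(std, sep, round(std / sep, 1))      # roughly an 8x reduction
```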
The internal structure of the CNN network is shown in Fig. 4, which illustrates how features are extracted and the data classified. The images from the previous stage pass through the convolutional and pooling layers, and the extracted features are then divided by the fully connected layers into two categories, benign and malignant.
2.4 Performance Assessment Criteria
One of the main stages after designing the proposed model is assessing its performance and accuracy. Sensitivity and specificity are two vital indices for the statistical evaluation of performance and of the results of classification tests; the quality of the proposed method can be measured and described using them [32]. Post-analysis data are classified as true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
Figure 5 shows the location of the TP, FP, TN, and FN criteria in the confusion matrix.
In this research, malignancy is the first group, labeled positive, and benignancy is the second group, labeled negative. Thus, if TP is close to 100%, malignant tumors have been correctly diagnosed, while a large FP means a large percentage of cases wrongly diagnosed as malignant. To show all criteria simultaneously, a confusion matrix is drawn; this matrix is usually used for supervised algorithms. The criteria derived from the confusion matrix are calculated as follows:
Accuracy indicates the correct diagnosis for the whole set, which is calculated by formula 1.
Accuracy = (TP + TN)/(TP + TN + FP + FN) (1)
Sensitivity shows the correct diagnosis of the first group, which is calculated by formula 2.
Sensitivity = TP/(TP + FN) (2)
Specificity shows the correct diagnoses of the second group, which is calculated by formula 3.
Specificity = TN/(TN + FP) (3)
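Formulas 1-3 can be expressed directly in code. The counts below are hypothetical and serve only to illustrate the calculation, not to report the study's results:

```python
def accuracy(tp, tn, fp, fn):
    """Formula 1: correct diagnoses over the whole set."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Formula 2: correctly detected positives (malignant cases)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Formula 3: correctly detected negatives (benign cases)."""
    return tn / (tn + fp)

# Hypothetical counts: 18 malignant found, 2 missed, 15 benign found, 3 false alarms.
tp, fn, tn, fp = 18, 2, 15, 3
print(round(accuracy(tp, tn, fp, fn), 3))  # 0.868
print(sensitivity(tp, fn))                 # 0.9
print(round(specificity(tn, fp), 3))       # 0.833
```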
Another criterion used for measurement is the receiver operating characteristic (ROC) curve, where the vertical axis is the true positive rate for the first group and the horizontal axis is the false positive rate for the first group [33].
One more criterion used for performance assessment in machine learning is the area under the curve (AUC). The AUC is obtained from the ROC curve and yields a number between 0 and 1; the closer this number is to 1, the more accurate the diagnoses [34].
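The AUC can be computed without plotting the ROC curve, using its equivalent rank-statistic interpretation. This is a generic sketch with made-up scores, not the study's results:

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count half) - equivalent to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up scores: malignant cases (label 1) mostly, but not always,
# score higher than benign ones, so the AUC falls just short of 1.
scores = [0.9, 0.3, 0.35, 0.8, 0.2, 0.1]
labels = [1,   1,   0,    1,   0,   0]
print(auc(scores, labels))  # 0.888...
```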