The main objective of this research is to design a hybrid system that calculates breast cancer risk scores using a deep neuro-fuzzy system, and to build a simple rule-based classifier that assigns class labels from the calculated risk scores. Such a hybrid model offers the health sector an alternative to the traditional methods of diagnosing breast cancer. Because the proposed method can diagnose the disease more accurately, the number of unnecessary biopsies is reduced, along with the negative effects of performing biopsies, such as anxiety and cost.
Figure 1 shows the architecture of our proposed deep neuro-fuzzy system. This hybrid system consists of two separate parts: a deep neural network model and a rule-based fuzzy system. In the first stage, we train a deep neural network model using the BCDR-D02 dataset and the merged dataset, which combines images from the BCDR-D02 and mini-MIAS datasets with images obtained from our university hospital. We obtain two output variables from the deep neural network model. The first output is the probability that the mammogram image belongs to the benign class (NN output 1), and the second output is the probability that the mammogram image belongs to the malignant class (NN output 2). In the second stage, these output variables are used as inputs to the rule-based fuzzy system, which uses a set of fuzzy rules and fuzzy sets to evaluate the risk of breast cancer. This risk value is then used to classify the mammogram image more accurately. For example, if the computed breast cancer risk is 90%, the risk is high and the mammogram image is therefore classified as malignant.
The classical CNN and ResNet50 deep learning models trained on the Breast Cancer Digital Repository (BCDR-D02) dataset and the merged dataset are used in the proposed system. We evaluate how well the models detect breast abnormalities in mammogram images using only the images with calcification abnormalities from both datasets.
A. Deep Learning
Deep learning has recently been widely used to diagnose breast cancer. In this study, we utilize deep learning methods to generate input variables for the fuzzy rule-based system. We start by training two different architectures, a classical CNN and ResNet50, on the BCDR-D02 dataset and the merged dataset. We chose these architectures because CNNs have been successfully used in breast cancer detection and the ResNet50 model is fast and provides higher accuracy in classifying mammograms [12],[25].
A simple CNN architecture, shown in Fig. 2, is used to extract knowledge from mammogram images, and we compare its results with those of the ResNet50 model. This classical CNN model consists of 3 convolutional blocks, a global average pooling layer, and a sigmoid layer. Each block, referred to below as convolutional layer 1, 2, or 3, consists of 2 convolutional layers and 1 max-pooling layer. As shown in Fig. 2, a mammogram image is passed to convolutional layer 1 as an input matrix to extract features from the image. Convolutional layer 1 generates 2 feature maps using a 32x32 filter. Then the max-pooling layer reduces the dimensionality of the feature maps using a 2x2 matrix. Convolutional layer 2 takes the output of convolutional layer 1 as input and generates feature maps using a 64x64 filter, after which max pooling again reduces the dimensionality of the feature maps. Similarly, convolutional layer 3 takes the output of convolutional layer 2 as input and generates feature maps using a 32x32 filter, followed by a max-pooling layer that reduces the dimension of the feature maps. The global average pooling layer takes the output of convolutional layer 3, computes the mean of each feature map, and passes it to a sigmoid layer. The sigmoid layer produces a vector in which each element represents the probability of one class, and outputs the class with the higher probability.
The dataset used in this study contains only a few data instances (see subsection C), which are not sufficient to accurately train a deep neural network. Therefore, we use the ResNet50 model with transfer learning, because transfer learning is a very effective method when the training dataset is small. As with the classical CNN model, we train the ResNet50 model on the BCDR-D02 dataset and the merged dataset to generate the input for the fuzzy rule-based system. As shown in Fig. 3, we use the original architecture of the ResNet50 model except for the last fully connected layer, where the number of nodes depends on the number of classes in the dataset. To adapt the ResNet50 architecture to our breast cancer dataset, we updated this last fully connected layer.
B. Fuzzy rule-based system
In this paper, the fuzzy system is used to determine the risk scores of mammography images. This expert system helps us to model how a doctor determines the risk of breast cancer. The fuzzy rule-based system consists of two input variables which are the outputs of the neural network model and one output variable which is the breast cancer risk.
Table I. The numerical range of the probability of belonging to malign/benign class for each neural network output

| NN output (class label) | Max | Min | Mean |
|---|---|---|---|
| NN output 1 (Benign) | 0.99 | 0.39 | 0.69 |
| NN output 2 (Benign) | 0.59 | 0.0001 | 0.30 |
| NN output 2 (Malign) | 0.99 | 0.53 | 0.76 |
| NN output 1 (Malign) | 0.38 | 0.008 | 0.19 |
First, the parameters affecting the risk calculations for breast cancer diagnosis were determined and the input/output variables were defined. Table 1 shows the maximum and minimum probability values calculated by the deep neural network for the malignant and benign classes. For example, according to the first row in Table 1, NN output 1 lies between 0.39 and 0.99 for images in the benign class. The ranges in Table 1 were considered when defining the input/output variables of the rule-based fuzzy system. The system has two inputs and one output: "probability of belonging to benign class (NN output 1)" and "probability of belonging to malignant class (NN output 2)" are the inputs, while "breast cancer risk" is the output.
The output of the fuzzy system (breast cancer risk) is in the range of [0, 100] and the inputs of the fuzzy system are in the range of [0, 1]. Since the triangular and trapezoidal membership functions are the most used and have the simplest shapes, they are selected as membership functions. As shown in Fig. 4, we use both triangular and trapezoidal membership functions to characterize NN-output 1 and NN-output 2. Similarly, both triangular and trapezoidal membership functions are used for breast cancer risk, as shown in Fig. 5. These membership functions are used to fuzzify the input and output variables of the rule-based fuzzy system.
The purpose of the fuzzy rule-based system is to reduce the number of unnecessary breast biopsies by calculating the risk of breast cancer, thereby eliminating the negative consequences of unnecessary breast biopsies such as cost and anxiety. First, appropriate numerical ranges are created for the system’s inputs and outputs. For each range, suitable linguistic expressions are found and assigned. All these ranges are validated with the help of a domain expert. As can be seen in Table 2, for each input there are three different linguistic variables "low", "medium" and "high" whose minimum and maximum values for benign and malignant classes are defined in the table.
Table II. Distribution of probability values obtained from the neural network model

| Probability of each class | Min | Max | Mean |
|---|---|---|---|
| NN output 1 (Benign) Low | 0.1 | 0.4 | 0.25 |
| NN output 1 (Benign) Medium | 0.2 | 0.8 | 0.5 |
| NN output 1 (Benign) High | 0.6 | 0.9 | 0.75 |
| NN output 2 (Malign) Low | 0.1 | 0.4 | 0.25 |
| NN output 2 (Malign) Medium | 0.2 | 0.8 | 0.5 |
| NN output 2 (Malign) High | 0.6 | 0.9 | 0.75 |
Training the neural network models yields two outputs: the probability that the image belongs to the benign class (NN output 1) and the probability that it belongs to the malignant class (NN output 2). As listed in Table 2, there are three important points for NN output 2: 0.1 (lower bound), 0.5 (middle point), and 0.9 (upper bound). We therefore define three linguistic variables for NN output 2: low, medium, and high. The low, medium, and high linguistic variables cover the ranges 0.1 to 0.4, 0.2 to 0.8, and 0.6 to 0.9, respectively. The distribution of probabilities also contains values smaller than 0.1 and larger than 0.9. We assign values smaller than 0.1 to the low linguistic variable and values larger than 0.9 to the high linguistic variable by using trapezoidal membership functions. Thus, the degree of membership in low is 1 for probabilities between 0 and 0.1 and decreases as the probability grows beyond 0.1. Similarly, the degree of membership in high is 1 for all probabilities between 0.9 and 1. We use a triangular membership function for the medium linguistic variable to cover the range between 0.2 and 0.8, because the degree of membership in medium reaches 1 at a probability of 0.5. While the mean of 0.1 and 0.4 establishes the lower bound of the triangular membership function of the medium linguistic variable, the mean of 0.6 and 0.9 yields the upper bound for the medium linguistic variable.
Therefore, as shown in Fig. 4, the trapezoidal membership function for the low linguistic variable, defined by a lower bound of 0.1 and an upper bound of 0.4, is calculated by using Eq. (1).
$$\mu_{low}(x) = \begin{cases} 0, & x > 0.4 \\ \dfrac{0.4 - x}{0.3}, & 0.1 \leqslant x \leqslant 0.4 \\ 1, & x < 0.1 \end{cases} \tag{1}$$
Similarly, the trapezoidal membership function for the high linguistic variable in Fig. 4 is defined by a lower bound of 0.6 and an upper bound of 0.9, and it is calculated by using Eq. (2).
$$\mu_{high}(x) = \begin{cases} 0, & x < 0.6 \\ \dfrac{x - 0.6}{0.3}, & 0.6 \leqslant x \leqslant 0.9 \\ 1, & x > 0.9 \end{cases} \tag{2}$$
The triangular membership function of the medium linguistic variable in Fig. 4 is defined by a lower bound of 0.2, an upper bound of 0.8, and a peak value m where 0.2 < m < 0.8. We calculate the medium degree of membership by using Eq. (3):
$$\mu_{medium}(x) = \begin{cases} 0, & x \leqslant 0.2 \\ \dfrac{x - 0.2}{m - 0.2}, & 0.2 < x \leqslant m \\ \dfrac{0.8 - x}{0.8 - m}, & m < x < 0.8 \\ 0, & x \geqslant 0.8 \end{cases} \tag{3}$$
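The three input membership functions in Eqs. (1) to (3) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this study: the function names and the `fuzzify` helper are hypothetical, and the peak of the medium set is assumed to be m = 0.5, following the statement that the medium degree of membership is 1 at a probability of 0.5.

```python
def mu_low(x: float) -> float:
    """Eq. (1): membership is 1 below 0.1 and falls linearly to 0 at 0.4."""
    if x < 0.1:
        return 1.0
    if x <= 0.4:
        return (0.4 - x) / 0.3
    return 0.0


def mu_high(x: float) -> float:
    """Eq. (2): membership is 0 below 0.6 and rises linearly to 1 at 0.9."""
    if x < 0.6:
        return 0.0
    if x <= 0.9:
        return (x - 0.6) / 0.3
    return 1.0


def mu_medium(x: float, m: float = 0.5) -> float:
    """Eq. (3): triangle on (0.2, 0.8) with its peak at m (assumed 0.5)."""
    if x <= 0.2 or x >= 0.8:
        return 0.0
    if x <= m:
        return (x - 0.2) / (m - 0.2)
    return (0.8 - x) / (0.8 - m)


def fuzzify(x: float, m: float = 0.5) -> dict:
    """Hypothetical helper: membership degree of x in each linguistic variable."""
    return {"low": mu_low(x), "medium": mu_medium(x, m), "high": mu_high(x)}


# Degrees of membership for two example network outputs.
print(fuzzify(0.8))
print(fuzzify(0.2))
```

The same functions apply to both NN output 1 and NN output 2, since Table 2 lists identical ranges for the two inputs.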
In Fig. 5, we define the risk parameter with five linguistic variables: very low, low, medium, high, and very high. A risk of 50% is the midpoint that separates low from high breast cancer risk. When the risk is lower than 50%, we express the range [0, 50] with the linguistic variables low and very low. Similarly, when the risk is higher than 50%, we express the range [50, 100] with the linguistic variables high and very high. When the risk is 50%, we describe it with the medium linguistic variable. Table 3 shows the range of each linguistic variable for the risk score. We use trapezoidal membership functions to include values between 0 and 10 and between 90 and 100. Triangular membership functions are used for risk values between 10 and 90.
Table III. Ranges for breast cancer risk score

| Risk | Min | Max | Mean |
|---|---|---|---|
| Very Low | 0 | 10 | 5 |
| Low | 10 | 40 | 25 |
| Medium | 25 | 75 | 50 |
| High | 60 | 90 | 75 |
| Very High | 90 | 100 | 85 |
Algorithm 1 Fuzzy rules for breast cancer risk
Finally, the risk of breast cancer is determined as "very low", "low", "medium", "high", or "very high" by applying the rules in Alg. 1. As shown in Alg. 1, this expert system uses 9 rules to calculate the risk values for breast cancer. These fuzzy rules were determined with the help of a domain expert, a physician. In our system, we give NN output 1 and NN output 2 as inputs to the fuzzy inference system, which calculates the risk values using the fuzzy rules in Alg. 1. Then, defuzzification is applied to convert the fuzzy risk variable into a numerical value. We calculate numerical risk values by using the centre of area defuzzification method, which finds the vertical line that divides the area under the membership curve into two equal areas. The defuzzified (numerical) risk value \(x^*\) is calculated by applying Eq. (4), where \(\alpha = \min\{x \mid x \in X\}\) and \(\beta = \max\{x \mid x \in X\}\).
$$\int_{\alpha}^{x^*} \mu_A(x)\,dx = \int_{x^*}^{\beta} \mu_A(x)\,dx \tag{4}$$
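Eq. (4) can be solved numerically by accumulating the area under the membership curve until half of the total is reached. The sketch below is a hedged illustration rather than the code used in this study: `bisector` and `mu_medium_risk` are hypothetical names, the midpoint-rule integration and step count are arbitrary choices, and the medium risk set is assumed to be a triangle on [25, 75] with its peak at 50, as suggested by Table III.

```python
def bisector(mu, alpha: float, beta: float, steps: int = 100_000) -> float:
    """Return x* that splits the area under mu on [alpha, beta] into two equal parts."""
    dx = (beta - alpha) / steps
    # Midpoint-rule area of each thin slice under the curve.
    areas = [mu(alpha + (i + 0.5) * dx) * dx for i in range(steps)]
    half = sum(areas) / 2.0
    running = 0.0
    for i, area in enumerate(areas):
        if running + area >= half:
            # Interpolate linearly inside the slice where the half area is crossed.
            frac = (half - running) / area if area > 0 else 0.0
            return alpha + (i + frac) * dx
        running += area
    return beta


def mu_medium_risk(r: float) -> float:
    """Assumed medium risk set: triangle on [25, 75] with its peak at 50."""
    if r <= 25 or r >= 75:
        return 0.0
    return (r - 25) / 25 if r <= 50 else (75 - r) / 25


x_star = bisector(mu_medium_risk, 0.0, 100.0)
print(round(x_star, 2))
```

For a symmetric set such as this triangle, the dividing line falls at the peak of the set.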
The following simple example shows how risk values are obtained with the fuzzy rules. Suppose the probability values computed by the deep neural network are as follows: benign probability = 0.8 and malignant probability = 0.2. According to Fig. 6, the benign probability is in the range of the high linguistic variable, so the membership degree of the benign input is calculated by using Eq. (2) as follows:
$$\mu_{high}(0.8) = (0.8 - 0.6)/0.3 = 2/3$$
According to Fig. 7, the malignant probability is in the range of the low linguistic variable, so its membership degree is calculated by using Eq. (1) as \(\mu_{low}(0.2) = (0.4 - 0.2)/0.3 = 2/3\).
In this case, Rule 7 in Alg. 1 is applied, which reads, "If benign is high and malignant is low, the risk is very low." Since the fuzzy rule contains the operator AND, the membership degree for very low is the minimum value of the membership degrees of the inputs benign and malignant, which is calculated as follows:
$$\mu_{very\text{-}low} = \min(2/3,\ 2/3) = 2/3$$
$$\begin{aligned}
A_1 &= 15 \times 2/3 = 10, & A_2 &= 10 \times 2/3 \times 1/2 = 10/3 \\
X_1 &= 15/2 = 7.5, & X_2 &= 15 + (10 \times 1/3) = 55/3 \\
Y_1 &= 2/3 \times 1/2 = 1/3, & Y_2 &= 1/3 \times 2/3 = 2/9 \\
X &= \frac{A_1 X_1 + A_2 X_2}{A_1 + A_2} = \frac{75 + 550/9}{40/3} \approx 10.2 \\
Y &= \frac{A_1 Y_1 + A_2 Y_2}{A_1 + A_2} = \frac{10/3 + 20/27}{40/3} \approx 0.3
\end{aligned}$$
Therefore, the risk score is 10.2.
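The arithmetic of this example can be checked numerically. The sketch below clips an assumed "very low" risk set at the rule strength of 2/3 and computes the area-weighted horizontal centre of the clipped shape, mirroring the A, X decomposition used in the example. The shape of the set (1 on [0, 10], falling linearly to 0 at 25) is an assumption inferred from the rectangle of width 15 and the triangle of width 10 in the calculation; the function names are hypothetical.

```python
def mu_very_low(r: float) -> float:
    """Assumed 'very low' risk set: 1 on [0, 10], falling linearly to 0 at 25."""
    if r <= 10:
        return 1.0
    if r < 25:
        return (25 - r) / 15
    return 0.0


def clipped_centroid(mu, strength: float, lo: float = 0.0,
                     hi: float = 100.0, steps: int = 100_000) -> float:
    """Horizontal centroid of the set mu after clipping it at the rule strength."""
    dx = (hi - lo) / steps
    area = 0.0
    moment = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx          # midpoint of the slice
        y = min(mu(x), strength)         # clip the set at the firing strength
        area += y * dx
        moment += x * y * dx
    return moment / area


# Rule 7: benign is high AND malignant is low, so the strength is the minimum.
strength = min(2 / 3, 2 / 3)
risk = clipped_centroid(mu_very_low, strength)
print(round(risk, 1))
```

Under these assumptions the script reproduces the risk score of about 10.2 obtained by hand.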
After obtaining the numerical risk score, we apply our simple rule-based classifier to determine the class label of the image. Our classifier labels an image as benign if the risk score is less than or equal to 50; otherwise, the image is classified as malignant. We developed this classification rule with the help of an expert. According to our classifier, the toy example above, whose risk score is 10.2, is classified as benign.
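The classification rule is a single threshold test, which can be sketched as follows. The function name `classify` and the `threshold` parameter are hypothetical; the cut-off of 50 and the treatment of a score of exactly 50 as benign follow the rule stated above.

```python
def classify(risk_score: float, threshold: float = 50.0) -> str:
    """Label an image from its risk score: benign up to and including the threshold."""
    return "benign" if risk_score <= threshold else "malignant"


print(classify(10.2))  # risk score from the toy example
print(classify(90.0))  # a high risk score
```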
C. Datasets
BCDR-D02 Dataset:
In this research, we use the Breast Cancer Digital Repository (BCDR-DM) to evaluate the performance of the proposed system. The BCDR-DM consists of images from Portuguese patients. We used the BCDR-D02 dataset to classify only calcification abnormalities. The BCDR-D02 database, downloaded from [14], consists of 397 benign and 42 malignant mammogram images. The number of images in the benign class is larger than in the malignant class. We balance the number of instances in each class by applying undersampling [26]. So, we randomly select 42 benign images to have the same number of instances in each class. In addition, we increase the number of mammogram images by combining flipping with rotation transformations of 0, 90, 180, and 270 degrees [27]. In this way, we create 8 new instances for each resulting mammogram image and end up with 336 malignant and 336 benign images.
Merged Dataset:
The merged dataset contains images from BCDR-D02, mini-MIAS, and our hospital dataset. The hospital dataset was collected from patients' mammogram images at the Department of General Surgery, Faculty of Medicine, Cukurova University. Collecting this dataset required considerable effort, and doctors and experts at the hospital helped us create it. We obtained 74 malignant and 32 benign mammogram images.
The merged dataset also includes images from the mini-MIAS dataset [29], which contains 322 mammogram images; from these, we use the 20 images with calcifications (10 malignant and 10 benign).
In the merged dataset we have a total of 126 malignant and 439 benign images. We balance the number of instances in each class by applying undersampling [26]. Thus, we randomly select 126 benign images to have the same number of instances in each class. We also increase the number of mammogram images by combining flipping with rotation transformations of 0, 90, 180, and 270 degrees [27]. In this way, we create 8 new instances for each resulting mammogram image and eventually obtain 1008 malignant and 1008 benign images.
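The balancing and augmentation steps can be illustrated with a small, dependency-free sketch. It is not the preprocessing code used in this study: images are represented as plain lists of lists, the helper names are hypothetical, and the random seed is arbitrary. Combining the four rotations with an optional horizontal flip gives 8 variants per image, one of which is the unchanged original.

```python
import random


def undersample(majority: list, n: int, seed: int = 0) -> list:
    """Randomly keep n items of the majority class (the seed is an arbitrary choice)."""
    return random.Random(seed).sample(majority, n)


def rotate90(img: list) -> list:
    """Rotate a 2-D list-of-lists image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: list) -> list:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def augment(img: list) -> list:
    """Return 8 variants: rotations of 0, 90, 180, 270 degrees, each with and without a flip."""
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


toy_image = [[1, 2], [3, 4]]            # stand-in for a mammogram
variants = augment(toy_image)
benign_ids = undersample(list(range(397)), 42)

print(len(variants))                     # variants per image
print(len(benign_ids) * len(variants))   # BCDR-D02 images per class
print(126 * len(variants))               # merged dataset images per class
```

The per-class counts follow from multiplying the balanced class size by the number of variants, which matches the 336 and 1008 images reported for the two datasets.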
D. Evaluation Metrics
We use accuracy, precision, recall, and F-score metrics for evaluating the proposed method. These metrics are computed by using the confusion matrix [28] given in Table 4, which consists of the values for true positive (TP), true negative (TN), false positive (FP), and false negative (FN).
Table IV. A Confusion Matrix

| Actual class | Predicted class = YES | Predicted class = NO |
|---|---|---|
| Class = YES | True Positive (TP) | False Negative (FN) |
| Class = NO | False Positive (FP) | True Negative (TN) |
According to Table 4, TP is the number of samples of the positive class (class = YES) that were correctly predicted, TN is the number of samples of the negative class (class = NO) that were correctly predicted, FP is the number of instances of the negative class that were predicted to be in the positive class, and FN is the number of instances of the positive class that were predicted to be in the negative class.
Accuracy is the ratio of correctly predicted observations to total observations, and it is calculated according to Eq. (5). Precision is the ratio of correctly predicted positive observations to total predicted positive observations, and Eq. (6) is used to compute it. Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class, and it is calculated according to Eq. (7). F-score, which is the harmonic mean of precision and recall, is given in Eq. (8).
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
$$Precision = \frac{TP}{TP + FP} \tag{6}$$
$$Recall = \frac{TP}{TP + FN} \tag{7}$$
$$F\text{-}score = \frac{2 \times (Recall \times Precision)}{Recall + Precision} \tag{8}$$
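Eqs. (5) to (8) translate directly into code. The sketch below is illustrative only: the function name is hypothetical, the counts in the usage example are made up and are not results from this study, and returning 0.0 when a denominator is zero is an assumed convention that the equations themselves do not specify.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)                   # Eq. (5)
    precision = tp / (tp + fp) if (tp + fp) else 0.0             # Eq. (6)
    recall = tp / (tp + fn) if (tp + fn) else 0.0                # Eq. (7)
    if precision + recall:
        f_score = 2 * (recall * precision) / (recall + precision)  # Eq. (8)
    else:
        f_score = 0.0  # assumed convention when both precision and recall are zero
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=45, tn=40, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```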