The processing and analysis of breast mammogram images play a significant role in the early diagnosis of breast cancer. This section reviews the most influential and relevant existing work on early breast cancer diagnosis using digital image processing. Some of these studies classify the input image as normal or abnormal, while others further classify abnormal images as benign or malignant. The primary challenge in this field of research is improving the accuracy of breast cancer detection.
In 2013, S. Srivastava et al. introduced a computer-based system for early breast cancer diagnosis using digital mammographic images [9]. Contrast-limited histogram equalization was used for enhancement, and three-class fuzzy C-means was used for segmentation. Geometric (shape) features and texture features, such as wavelet and Gabor features, were then extracted. Finally, support vector machine (SVM), k-nearest neighbor (kNN), and artificial neural network (ANN) classifiers were used to classify normal or abnormal cells, with SVM giving better results than kNN and ANN. The system achieved an accuracy rate of 85.57% on the Mammographic Image Analysis Society (MIAS) dataset.
In 2015, V. Vishrutha et al. suggested a strategy for combining wavelet and texture information to increase the accuracy rate of a CAD system for early breast cancer diagnosis [10]. The mammogram images were first pre-processed using a median filter. The label and the black background were then removed on the basis of the sum of each column’s intensities: if the total intensity of a column fell below a certain threshold, the column was removed. The pre-processed images were passed to a region growing technique that determined the region of interest (ROI) as the segmentation step. The discrete wavelet transform (DWT) was used to extract features from the segmented regions. Finally, an SVM classifier categorized the mammogram images as normal or abnormal with an accuracy rate of 92% on the Mini-MIAS dataset.
In 2017, S. Pashoutan et al. developed a CAD system for early breast cancer diagnosis [11]. In the pre-processing step, the images were cropped using the coordinates and estimated radius of any artifacts introduced into the images, in order to obtain the ROI where masses and abnormal tissues are found. Histogram equalization and a median filter were then used to enhance the contrast of the images. Edge-based and region-based methods were used for segmentation. Four feature extraction techniques were used: the wavelet transform, the Gabor wavelet transform, Zernike moments, and the gray-level co-occurrence matrix (GLCM). On the MIAS dataset, this system reached an accuracy rate of 94.18%.
In 2018, V. Hariraj et al. suggested a CAD system for breast cancer detection [12]. In the pre-processing step, a fuzzy multi-layer technique was used to eliminate background information such as labels and wedges from the images. Thresholding then converted the grayscale image to a binary image, and a morphological operation was applied to the binary image to remove small unwanted objects. k-means clustering was used for segmentation. For feature extraction, certain shape features were extracted, such as diameter, perimeter, compactness, mean, standard deviation, entropy, and correlation. Among the classifiers tested, SVM achieved the best accuracy rate, 97.8%, on the Mini-MIAS dataset.
In 2019, S. Sarosa et al. designed a computer-based system for breast cancer diagnosis [13]. Histogram equalization was used to enhance the images as a pre-processing step, and GLCM was then used to extract features from the pre-processed images. Finally, a backpropagation neural network (BPNN) classifier determined whether the input image was normal or abnormal. The study was evaluated on the MIAS dataset and achieved an accuracy rate of 90%.
In 2019, A. Arafa et al. introduced a CAD system for breast cancer detection [14]. In the pre-processing step, the area containing the breast region was automatically selected, and artifacts as well as the pectoral muscle were removed. A Gaussian mixture model (GMM) was used to segment the ROI. Texture, shape, and statistical features were then extracted from the ROI. GLCM was used for the texture features; the shape features included circularity, brightness, compactness, and volume; and the statistical features were mean, standard deviation, correlation, skewness, smoothness, kurtosis, energy, and histogram. Finally, an SVM classifier classified the segmented ROI as normal or abnormal. The study was evaluated on the MIAS dataset and achieved an accuracy of 92.5%.
In 2020, A. Farhan et al. developed a CAD system for classifying input mammogram images as normal or abnormal [15]. First, contrast-limited adaptive histogram equalization (CLAHE) was used to enhance the mammogram images. The histogram of oriented gradients (HOG), GLCM, and local binary pattern (LBP) techniques were then used to extract features. Finally, SVM and kNN classifiers were used for classification. The highest accuracy rate, 90.3% on the Mini-MIAS dataset, was obtained with the combination of GLCM and kNN.
In 2021, E. Hussein et al. designed a computer-based system to aid radiologists by providing a second opinion when diagnosing mammograms [16]. In the pre-processing step, a median filter was used to remove noise and minor artifacts. The Hybrid Bounding Box and Region Growing (HBBRG) algorithm was used to segment the ROI. Two types of features were extracted: 1) statistical features such as mean, standard deviation, skewness, and kurtosis, and 2) texture features such as LBP and GLCM. An SVM then categorized the mammography images as normal or abnormal. The study was evaluated on the MIAS dataset and obtained an accuracy rate of 83.45%.
In 2021, H. Mujizah et al. suggested a CAD system for breast cancer diagnosis [17]. First, pre-processing techniques such as a Gaussian filter and Canny edge detection were applied to enhance the visual quality of the input images. Thresholding was used for segmentation. For feature extraction, GLCM was used for texture features, and area, perimeter, metric, and eccentricity were extracted as shape features. Finally, an SVM was used for classification, and an accuracy rate of 98.14% was obtained on the Mini-MIAS dataset.
In 2022, L. Kanya et al. introduced a CAD system for the early detection of breast cancer that classifies the input image as normal or abnormal [18]. In the pre-processing step, CLAHE was applied to improve the contrast of the images. The Advanced Gray Level Co-Occurrence Matrix (AGLCM) technique was used to extract features. Finally, kNN, ANN, and SVM classifiers were used, and SVM provided the highest accuracy rate, 95.6%, on the MIAS dataset.
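
Because GLCM texture features recur in many of the studies above, a minimal sketch of the idea may be useful. It is not the implementation of any cited study: the function names `glcm` and `texture_features`, the single horizontal offset, the symmetric counting, and the toy four-level image are all assumptions made for illustration. The sketch counts how often pairs of gray levels occur at a fixed pixel offset, normalizes the counts to probabilities, and derives four commonly used descriptors.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one offset (dx, dy >= 0)."""
    img = np.asarray(image, dtype=np.intp)
    rows, cols = img.shape
    ref = img[0:rows - dy, 0:cols - dx]   # reference pixels
    nbr = img[dy:rows, dx:cols]           # neighbours at the chosen offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Accumulate one count per (reference level, neighbour level) pair.
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    if symmetric:
        counts = counts + counts.T        # count each pair in both directions
    return counts / counts.sum()

def texture_features(p):
    """Contrast, energy, homogeneity and correlation of a normalized GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    return {
        "contrast": np.sum(p * (i - j) ** 2),
        # Sum of squared entries (angular second moment); some libraries
        # report the square root of this value as "energy".
        "energy": np.sum(p ** 2),
        "homogeneity": np.sum(p / (1.0 + (i - j) ** 2)),
        # Undefined (division by zero) for a perfectly uniform image.
        "correlation": np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j),
    }

# Toy 4-level image; a real mammogram would first be quantized to `levels` bins.
toy = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
features = texture_features(glcm(toy, levels=4))
```

In practice, such descriptors are usually computed for several offsets and angles and concatenated into the feature vector passed to a classifier such as SVM or kNN.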
All of the studies discussed above classified the input images as normal or abnormal. The input images in the subsequent studies, on the other hand, are classified as normal, benign, or malignant.
In 2020, A. Chalampuente et al. designed a CAD system for detecting breast cancer [19]. To eliminate labels and undesired characteristics, the images were first smoothed using Gaussian blurring. Gabor wavelets were then used to extract features. Because feature extraction generally produces a large amount of data that is difficult to interpret, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) were employed to reduce the dimensionality of the data. Finally, a kNN classifier classified the mammography images as normal, benign, or malignant, and an accuracy rate of 89.3% was achieved on the Mini-MIAS dataset.
In 2020, J. Jebamony et al. introduced a computer-based technique for breast cancer diagnosis [20]. Histogram equalization was performed to enhance image quality as a pre-processing step. Otsu segmentation was applied to the enhanced image to segment the ROI, and energy features were extracted from the segmented ROI. Several classifiers, namely ANN, SVM, Fuzzy Support Vector Machine (FSVM), and Core Vector Machine (CVM), were then used to classify the input images as normal, benign, or malignant. The CVM classifier provided the highest accuracy rate, 95.89%, on the MIAS dataset.
In 2021, S. Suradi et al. developed a CAD system for breast cancer diagnosis [21]. Otsu thresholding was applied to segment the ROI, and features such as area, circularity, and solidity were then extracted from the ROI. Finally, an SVM classifier classified the images as normal, benign, or malignant, and an accuracy rate of 96.6% was obtained on the MIAS dataset.
In 2021, L. bachiret et al. proposed a computer-based system for early breast cancer diagnosis [22]. In the pre-processing step, a median filter and contrast-limited adaptive histogram equalization were applied for noise removal and contrast enhancement, respectively. The segmentation process employed a combination of Otsu thresholding and k-means. Texture features such as contrast, energy, correlation, and homogeneity were extracted from the segmented ROI, together with shape features such as perimeter, area, circularity, eccentricity, and Euler number. Finally, an SVM classifier determined whether the mammography image was normal, benign, or malignant. The study achieved an accuracy rate of 94.2% on the MIAS dataset.
In 2022, Al-Fahaidy et al. used digital mammographic images to design a computer-based system for early breast cancer diagnosis [23]. In the pre-processing step, a median filter was used for noise removal, and a morphological operation was used to eliminate the background. Seeded region growing (SRG) was then used to segment the ROI. Features such as first-order statistics, second-order statistics, shape, fractal dimension, and wavelet features were extracted from the ROI. Sequential forward selection (SFS) was then performed to select the most important and relevant features. Finally, an SVM classified the input image as normal, benign, or malignant, and an accuracy rate of 87.1% was obtained on the MIAS dataset.
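
Otsu thresholding, which several of the three-class studies above use to segment the ROI, chooses the gray level that maximizes the between-class variance of the resulting foreground and background. The sketch below is only an illustration under stated assumptions, not the procedure of any cited study: the function name `otsu_threshold`, the 8-bit input, and the synthetic test image are hypothetical.

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Gray level t maximizing between-class variance; pixels > t are foreground.

    Assumes an integer-valued image with values in [0, levels).
    """
    img = np.asarray(image)
    hist = np.bincount(img.ravel(), minlength=levels)
    w0 = np.cumsum(hist)                      # pixels with level <= t
    w1 = hist.sum() - w0                      # pixels with level > t
    s0 = np.cumsum(hist * np.arange(levels))  # intensity sum of class 0
    valid = (w0 > 0) & (w1 > 0)               # both classes must be non-empty
    mean0 = s0[valid] / w0[valid]
    mean1 = (s0[-1] - s0[valid]) / w1[valid]
    # Between-class variance up to a constant factor of 1 / N^2.
    sigma_b = np.zeros(levels, dtype=np.float64)
    sigma_b[valid] = w0[valid] * w1[valid] * (mean0 - mean1) ** 2
    return int(np.argmax(sigma_b))

# Synthetic example: a bright 4x4 square on a darker background.
image = np.full((8, 8), 50, dtype=np.uint8)
image[2:6, 2:6] = 200
t = otsu_threshold(image)
mask = image > t          # binary ROI mask
area = int(mask.sum())    # simple shape feature: ROI area in pixels
```

Shape features such as area can be read directly from the binary mask, while features such as circularity or solidity additionally require the region's perimeter or convex hull.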
The remainder of this paper is concerned with extending and further refining the use of digital image processing to increase the accuracy rate of early breast cancer diagnosis.