Classification System for Early Breast Cancer Diagnosis using Machine Learning

doi:10.21203/rs.3.rs-4118274/v1


Research Article


This work is licensed under a CC BY 4.0 License

Version 1


Abstract

Breast cancer is one of the most common diseases affecting women worldwide, and it frequently leads to morbidity and even mortality. Manual diagnosis of breast cancer by a radiologist is challenging, expensive, and time-consuming. An automatic, computer-based diagnosis can be regarded as an alternative that overcomes these drawbacks of manual diagnosis. Computer-based diagnostic systems use image processing techniques to process breast images from mammograms. This study proposes a computer-based diagnostic system for breast cancer that uses machine learning to classify the input mammography image into three classes: normal, benign, and malignant. The proposed system comprises several steps. The input image is first pre-processed to remove labels and enhance image quality using a median filter and adaptive histogram equalization. Next, the threshold segmentation technique is applied to segment the cancer cells and isolate the region of interest (ROI). The Gray Level Run Length Matrix (GLRLM) feature extraction technique is then used to extract texture features from the segmented ROI. On the basis of the extracted features, Support Vector Machine (SVM) and k-Nearest Neighbor (kNN) classifiers are employed to classify the segmented region as normal, benign, or malignant. The performance of the proposed system was examined via extensive experiments conducted on the well-known Mammographic Image Analysis Society (MIAS) dataset of mammography images. The experimental findings show that the proposed system outperforms the compared existing systems, attaining an accuracy rate of 98.6%.

Keywords: Medical image, GLRLM, ROI, machine learning, classification

1. Introduction

In recent years, technology has become one of the most widely used tools available to humans for a wide variety of applications. Digital image processing (DIP) is a critical component of technology, particularly in computer science, and it is important in a variety of fields including medical image processing, image in-painting, pattern recognition, biometrics, content-based image retrieval, image de-hazing, and multimedia security [1][2]. The medical and healthcare industries have been looking for novel medical therapies and treatment procedures that can be combined with technological advancements in computer and hardware development. Nowadays, one of the most significant concerns in healthcare is cancer. Cancer is a disease characterized by the proliferation of abnormal cells within the human body, and it has emerged as a significant contributor to morbidity and mortality rates worldwide. In general, a physician must have adequate expertise, skills, and the ability to recall multiple cases involving various illnesses in order to diagnose and identify diverse cancers from medical images. Additionally, diagnosis by a physician is tedious and costly. Thus, a computer-based diagnostic system can provide an accurate decision in a timelier manner and at a lower cost [3]. In the field of medical imaging, a computer-based diagnostic system, also known as computer-aided diagnosis (CAD), assists doctors in making precise decisions [4]. Medical imaging deals with information in images that medical practitioners and doctors must review and interpret in a short period of time. The analysis of images is a critical task in the medical profession, since imaging is a primary modality for diagnosing disease at the earliest stage [5].

Breast cancer is a significant public health concern worldwide. The breast has different parts: 1) lobules, the glands that produce breast milk; 2) ducts, small canals that emerge from the lobules and carry the milk to the nipple, and the most common place for breast cancer to start; and 3) the nipple, an opening in the skin of the breast where the ducts come together and turn into larger ducts so the milk can leave the breast [6]. Figure 1 illustrates normal breast tissue [4].

The purpose of developing automated CAD systems is to accurately detect the targeted diseases. In general, each CAD system has four essential steps: pre-processing, segmentation, feature extraction, and classification. Figure 2 depicts the general steps of a CAD-based breast cancer diagnosis system, as well as the CAD system proposed in this study.

Image pre-processing is essential in medical image processing to achieve the optimal outcomes in subsequent steps of a CAD system, such as segmentation and feature extraction. Pre-processing aims to remove noise as well as flaws introduced during the image capture procedure, resize the image, and enhance its intensity. Image segmentation is critical for pattern recognition and computer vision. The success or failure of the automated system is heavily dependent on segmentation accuracy. The selection of segmentation algorithms in medical imaging is strongly influenced by the application and imaging modality [7]. Feature extraction is the process of extracting feature descriptors from an image in order to reduce data volume. Features are aspects of the overall image or region of interest. Finally, classification is the last step in CAD systems that distinguishes and labels abnormalities [8]. In medicine, classification systems serve an important role in diagnosis.

The aim of this study is to propose a computer-based system for the early diagnosis of breast cancer. The main contribution of the proposed system is to segment the breast region correctly and extract the most relevant information, leading to an increase in diagnostic accuracy. The remainder of the paper is organized as follows: Section 2 reviews previous attempts from the literature. The proposed system is presented in Section 3. The findings of the experiments are shown in Section 4. Section 5 gives the conclusion.

2. Literature Review

The processing and analysis of breast mammogram images play a significant role in the early diagnosis of breast cancer. This section reviews the most influential and relevant existing work on early breast cancer diagnosis using digital image processing. Some of these studies classify the input image as normal or abnormal, while others further classify the abnormal images as benign or malignant. The primary challenge in this field of research is improving the accuracy rate of breast cancer detection.

S. Srivastava et al., in 2013, introduced a computer-based system for early breast cancer diagnosis using digital mammographic images [9]. The contrast-limited histogram equalization technique was used for enhancement, and three-class fuzzy C-means was used for segmentation. Geometric/shape features as well as texture features such as wavelet and Gabor were then extracted. Finally, support vector machine (SVM), k-nearest neighbor (kNN), and artificial neural network (ANN) classifiers were used to classify cells as normal or abnormal. SVM provided better results than kNN and ANN. This technique achieved an accuracy rate of 85.57% on the Mammographic Image Analysis Society (MIAS) dataset.

V. Vishrutha et al., in 2015, suggested a strategy for combining wavelet and texture information to increase the accuracy rate of a CAD system for early breast cancer diagnosis [10]. The mammogram images were first pre-processed using a median filter. Then, the label and the black background were removed on the basis of the sum of each column’s intensities: if the total intensity of a column falls below a certain threshold, the column is removed. The images resulting from the pre-processing step were used as input to the region growing technique, which determines the region of interest (ROI) as the segmentation step. The Discrete Wavelet Transform (DWT) was used to extract features from the segmented regions. Finally, an SVM classifier was used to categorize the mammogram images as normal or abnormal, with an accuracy rate of 92% on the Mini-MIAS dataset.

In 2017, S. Pashoutan et al. developed a CAD system for early breast cancer diagnosis [11]. In the pre-processing step, the images were cropped using the coordinates and an estimated radius of any artifacts introduced into the images in order to obtain the ROI where mass and aberrant tissues are found. Histogram equalization and a median filter were used to enhance the contrast of the images. Edge-based and region-based segmentation methods were used for segmentation. Four different techniques were used for feature extraction: wavelet transform, Gabor wavelet transform, Zernike moments, and the Gray-Level Co-occurrence Matrix (GLCM). On the MIAS dataset, this technique reached an accuracy rate of 94.18%.

V. Hariraj et al., in 2018, suggested a CAD system for breast cancer detection [12]. In the pre-processing step, the Fuzzy Multi-layer technique was used to eliminate background information such as labels and wedges from the images. Thresholding was used to transform the grayscale image into a binary image, and a morphological technique was applied to the binary image to remove undesirable tiny objects. For the segmentation step, k-means clustering was used. For feature extraction, certain shape features were extracted, such as diameter, perimeter, compactness, mean, standard deviation, entropy, and correlation. Finally, the SVM classifier achieved the best accuracy rate among the tested classifiers, 97.8%, on the Mini-MIAS dataset.

S. Sarosa et al., in 2019, designed a computer-based system for breast cancer diagnosis [13]. Histogram equalization was used to enhance the images as a pre-processing step. GLCM was then used to extract features from the pre-processed images. Finally, a backpropagation neural network (BPNN) classifier was used to determine whether the input image is normal or abnormal. The study was evaluated on the MIAS dataset and achieved an accuracy rate of 90%.

In 2019, A. Arafa et al. introduced a CAD system for breast cancer detection [14]. In the pre-processing step, the area including the breast region was automatically selected, and artifacts as well as the pectoral muscle were removed. The Gaussian Mixture Model (GMM) was used to segment the ROI. Texture, shape, and statistical features were then extracted from the ROI. GLCM was used for the texture features. The shape features included circularity, brightness, compactness, and volume. The statistical features were mean, standard deviation, correlation, skewness, smoothness, kurtosis, energy, and histogram. Finally, an SVM classifier was used to classify the segmented ROI as normal or abnormal. The study was evaluated on the MIAS dataset and achieved an accuracy of 92.5%.

In 2020, A. Farhan et al. developed a CAD system for classifying input mammogram images as normal or abnormal [15]. First, the contrast limited adaptive histogram equalization (CLAHE) method was used to enhance the mammogram images. Then, the histogram of oriented gradient (HOG), GLCM, and local binary pattern (LBP) techniques were used to extract features. Finally, SVM and kNN classifiers were used for classification. The highest accuracy rate, 90.3% on the Mini-MIAS dataset, was obtained when GLCM and kNN were used.

E. Hussein et al., in 2021, designed a computer-based system to aid radiologists by providing a second opinion when diagnosing mammograms [16]. In the pre-processing step, a median filter was used to remove noise and minor artifacts. The Hybrid Bounding Box and Region Growing (HBBRG) algorithm was used to segment the ROI. Two types of features were extracted: 1) statistical features such as mean, standard deviation, skewness, and kurtosis, and 2) texture features such as LBP and GLCM. SVM was then used to categorize mammography images as normal or abnormal. The study used the MIAS dataset to evaluate performance and obtained an accuracy rate of 83.45%.

H. Mujizah et al., in 2021, suggested a CAD system for breast cancer diagnosis [17]. First, pre-processing techniques, namely a Gaussian filter and Canny edge detection, were applied to enhance the visual quality of the input images. The thresholding method was used for segmentation. GLCM was used as the texture feature, and area, perimeter, metric, and eccentricity were extracted as shape features. Finally, SVM was used for classification, and an accuracy rate of 98.14% was obtained on the Mini-MIAS dataset.

In 2022, L. Kanya et al. introduced a CAD system for the early detection of breast cancer that classifies the input image as normal or abnormal [18]. In the pre-processing step, the CLAHE technique was applied to improve the contrast of the images. The Advanced Gray Level Co-Occurrence Matrix (AGLCM) technique was used to extract features. Finally, kNN, ANN, and SVM classifiers were used, and SVM provided the highest accuracy rate, 95.6%, on the MIAS dataset.

All of the studies discussed above classified the input images as normal or abnormal. The input images in the subsequent studies, on the other hand, are classified as normal, benign, or malignant.

In 2020, A. Chalampuente et al. designed a CAD system for detecting breast cancer [19]. To eliminate labels and undesired characteristics, the images were first smoothed using Gaussian blurring. A Gabor wavelet was then used to extract features. Feature extraction generally produces a large amount of data that is difficult to interpret; therefore, principal components analysis (PCA) and t-distributed stochastic neighbor embedding (TSNE) were employed in this study to reduce the dimensionality of the information. Finally, the kNN classifier was used to classify mammography images from the Mini-MIAS dataset as normal, benign, or malignant, and an accuracy rate of 89.3% was achieved.

J. Jebamony et al., in 2020, introduced a computer-based technique for breast cancer diagnosis [20]. Histogram equalization was performed to enhance image quality as a pre-processing step. The Otsu segmentation technique was applied to the enhanced image to segment the ROI, and energy features were extracted from the segmented ROI. Several classifiers, namely ANN, SVM, Fuzzy Support Vector Machines (FSVM), and Core Vector Machine (CVM), were then used to classify the input images as normal, benign, or malignant. The CVM classifier provided the highest accuracy rate, 95.89%, on the MIAS dataset.

S. Suradi et al., in 2021, developed a CAD system for breast cancer diagnosis [21]. The Otsu thresholding technique was applied to segment the ROI, and features such as area, circularity, and solidity were extracted from the ROI. Finally, an SVM classifier was used to classify the images as normal, benign, or malignant, and an accuracy rate of 96.6% was obtained on the MIAS dataset.

L. Bachir et al., in 2021, proposed a computer-based system for early breast cancer diagnosis [22]. In the pre-processing step, a median filter and contrast limited adaptive histogram equalization were applied for noise removal and contrast enhancement, respectively. The segmentation process employed a combination of Otsu and k-means. Texture features such as contrast, energy, correlation, and homogeneity were extracted from the segmented ROI. In addition, shape features such as perimeter, area, circularity, eccentricity, and Euler number were extracted. Finally, the SVM classifier was used to determine whether the mammography image is normal, benign, or malignant. This study achieved an accuracy rate of 94.2% on the MIAS dataset.

Al-Fahaidy et al., in 2022, used digital mammographic images to design a computer-based system for early breast cancer diagnosis [23]. In the pre-processing step, a median filter was used for noise removal, and a morphological operation was used to eliminate the background. Seeded region growing (SRG) was then used to segment the ROI. Features such as first-order statistics, second-order statistics, shape, fractal dimension, and wavelet were extracted from the ROI. The sequential forward selection (SFS) feature selection technique was applied to select the most important and relevant features. Finally, SVM was used to classify the input image as normal, benign, or malignant, and an accuracy rate of 87.1% was obtained on the MIAS dataset.

The remainder of this paper is concerned with extending and further refining the strategy of using digital image processing to increase the accuracy rate of early breast cancer diagnosis.

3. Methodology

CAD systems have been developed to aid in the diagnosis of a wide range of illnesses, including cancers such as breast cancer and lung cancer, as well as other tumors. This study discusses how mammography images can be used to diagnose breast cancer early by establishing an effective CAD system. A mammogram depicts the breast in gray and white, while the background is usually black. A lump or tumor may sometimes appear as a concentrated region of white. Tumors may be cancerous (malignant) or non-cancerous (benign). The most crucial task that a CAD system must accomplish in order to detect breast cancer is to isolate the ROI from the rest of the mammography image. This section describes the steps of the proposed CAD system, beginning with various pre-processing operations that enhance the input image. Next, a segmentation technique is used to separate the ROI from the rest of the input image. Finally, several critical features are extracted from the segmented ROI, and classifiers are employed to determine whether the input images are normal, benign, or malignant.

3.1 Pre-Processing

The first and most important step in developing any CAD system is pre-processing, which has a significant impact on the segmentation and feature extraction steps: it facilitates segmentation and improves feature extraction from the segmented ROI. This step uses region-props to remove the label from the input image, a median filter to reduce noise, and adaptive histogram equalization (AHE) to enhance contrast, see Fig. 3.
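The noise-reduction part of this step can be illustrated with a small, dependency-free Python sketch of a median filter. This is not the authors' MATLAB implementation: the 3 x 3 window size, the edge-replication border handling, and the function name median_filter are assumptions made for illustration, and label removal and AHE are not covered here.

```python
from statistics import median


def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D list of gray levels.

    Border pixels are handled by clamping coordinates to the image edge
    (edge replication); this is an assumed convention, not one stated
    in the paper.
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighborhood around (r, c), clamped to the image.
            window = [
                image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in range(-half, half + 1)
                for dc in range(-half, half + 1)
            ]
            out[r][c] = median(window)
    return out


# Toy image with one bright and one dark outlier (impulse noise).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 0],
]
smoothed = median_filter(noisy)
```

In this toy example both outliers are replaced by the surrounding gray level, which is the behavior that makes the median filter suitable for removing isolated noisy pixels while preserving edges better than an averaging filter.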

3.2 Segmentation

Segmentation is an important step in the development of a CAD system, since the segmented cells are used to extract relevant features that are later employed in the classification process. In this study, the threshold-based segmentation technique is applied to the pre-processed image, image (d) in Fig. 3, to isolate the ROI from the other portions of the image. The output is a binary image, see Fig. 4.

Threshold-based segmentation is one of the most efficient segmentation techniques. It separates an image according to the intensity value of each pixel: a single intensity value, the threshold T, is used to generate a binary image in which black represents the background and white represents the objects. Depending on the quality of the image, the threshold T can be chosen manually or automatically. T = 0.8 was chosen for the proposed approach in this study since it yields the best accuracy results.
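The thresholding described above can be sketched in a few lines of Python. The sketch assumes that T is applied to intensities normalized to the range [0, 1] and that pixels strictly above T are treated as ROI; neither detail is stated in the text, and the function name threshold_segment is hypothetical.

```python
def threshold_segment(image, t=0.8, max_level=255):
    """Return a binary mask: 1 where the normalized intensity exceeds t, else 0.

    Assumption: 8-bit gray levels are normalized by max_level before
    comparison with the threshold t.
    """
    return [[1 if pixel / max_level > t else 0 for pixel in row] for row in image]


# Toy "enhanced" image with a bright 2 x 2 region in the middle.
enhanced = [
    [12, 40, 90, 30],
    [25, 210, 230, 60],
    [18, 220, 250, 45],
    [10, 35, 70, 20],
]
mask = threshold_segment(enhanced, t=0.8)

# Gray levels of the pixels that fall inside the segmented region.
roi_pixels = [p for row, mrow in zip(enhanced, mask) for p, m in zip(row, mrow) if m]
```

With T = 0.8 the cut-off corresponds to a gray level of 204 on an 8-bit scale, so only the four bright central pixels are kept as the ROI in this toy example.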

3.3 Feature Extraction

This step extracts the most significant and effective features from the ROI. These extracted features will be used in the next step to distinguish between normal, benign, and malignant cells. The extracted features involve: 1) statistical features including skewness, mean, entropy, and standard deviation; and 2) texture features such as Gray Level Run Length Matrix (GLRLM) that includes short run emphasis (SRE), long run emphasis (LRE), gray level non-uniformity (GLN), run percentage (RP), run length non-uniformity (RLN), low gray level run emphasis (LGRE), and high gray level run emphasis (HGRE). Furthermore, for classification purposes, all extracted features are fused. Table 1 summarizes the extracted features.
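As an illustration of the statistical features named above, the following Python sketch computes the mean, standard deviation, skewness, and entropy of a list of ROI gray levels. The exact definitions used by the authors are not given in the text; population (not sample) moments and base-2 Shannon entropy of the gray-level histogram are assumed here, and statistical_features is a hypothetical name.

```python
import math
from collections import Counter


def statistical_features(pixels):
    """Compute first-order statistics of a flat list of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    # Population variance (divide by n); an assumed convention.
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    # Skewness is the third standardized moment; defined as 0 for constant data.
    skewness = (
        sum((p - mean) ** 3 for p in pixels) / n / std ** 3 if std > 0 else 0.0
    )
    # Shannon entropy (in bits) of the empirical gray-level distribution.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "std": std, "skewness": skewness, "entropy": entropy}


stat_feats = statistical_features([210, 230, 220, 250, 220, 210, 230, 220])
```

The returned dictionary can be converted to a fixed-order feature vector before being combined with the texture features.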

Table 1

Extracted Features
Type of Feature	Name of Feature
Statistical features	Skewness; Mean; Entropy; Standard Deviation
Texture features: Gray Level Run Length Matrix (GLRLM)	Short Run Emphasis (SRE); Long Run Emphasis (LRE); Gray Level Non-Uniformity (GLN); Run Percentage (RP); Run Length Non-Uniformity (RLN); Low Gray Level Run Emphasis (LGRE); High Gray Level Run Emphasis (HGRE)
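The GLRLM features in Table 1 can be sketched as follows. A run is a maximal sequence of consecutive pixels with the same gray level, and the matrix counts how many runs exist for each (gray level, run length) pair. The sketch below uses the commonly cited definitions of the seven features and makes several assumptions that the text does not specify: only the horizontal direction is scanned, gray levels are already quantized to integers starting at 1 (so that LGRE never divides by zero), and the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def glrlm_horizontal(image):
    """Count horizontal runs: (gray_level, run_length) -> number of runs."""
    runs = Counter()
    for row in image:
        for level, group in groupby(row):
            runs[(level, len(list(group)))] += 1
    return runs


def glrlm_features(image):
    """Compute the seven run-length features from a quantized 2-D image."""
    runs = glrlm_horizontal(image)
    n_runs = sum(runs.values())
    n_pixels = sum(len(row) for row in image)
    per_level = Counter()   # number of runs for each gray level
    per_length = Counter()  # number of runs for each run length
    for (level, length), count in runs.items():
        per_level[level] += count
        per_length[length] += count
    return {
        "SRE": sum(c / length ** 2 for (_, length), c in runs.items()) / n_runs,
        "LRE": sum(c * length ** 2 for (_, length), c in runs.items()) / n_runs,
        "GLN": sum(c ** 2 for c in per_level.values()) / n_runs,
        "RLN": sum(c ** 2 for c in per_length.values()) / n_runs,
        "RP": n_runs / n_pixels,
        "LGRE": sum(c / level ** 2 for (level, _), c in runs.items()) / n_runs,
        "HGRE": sum(c * level ** 2 for (level, _), c in runs.items()) / n_runs,
    }


# Toy ROI quantized to three gray levels (1, 2, 3).
quantized = [
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [3, 3, 3, 3],
    [1, 1, 1, 2],
]
glrlm_feats = glrlm_features(quantized)
```

In practice the run-length matrix is often computed in several directions (for example 0, 45, 90, and 135 degrees) and the features are averaged; whether the proposed system does so is not stated, so the single-direction version above should be read only as an illustration of the definitions.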

3.4 Classification

Finally, SVM and kNN classifiers are used to classify the input images as normal, benign, or malignant. The extracted features are fed into the classifiers, each of which builds a model that can distinguish between normal, benign, and malignant cases. Both classifiers are evaluated using k-fold cross-validation with k = 5, 10, 15, and 20. Figure 5 illustrates the block diagram of the proposed CAD system.
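To make the classification step concrete, the following Python sketch fuses two feature vectors by concatenation and labels a query with a plain k-nearest-neighbor vote. It is a toy illustration only: the feature values are invented, the number of neighbors (n_neighbors = 3) and the Euclidean distance are assumptions, and the paper does not state which settings its kNN or SVM classifiers used. Note that the number of neighbors is unrelated to the k of k-fold cross-validation.

```python
import math
from collections import Counter


def fuse(statistical, texture):
    """Fuse two feature vectors by simple concatenation (assumed fusion rule)."""
    return list(statistical) + list(texture)


def knn_predict(train_x, train_y, query, n_neighbors=3):
    """Label a query vector by majority vote among its nearest neighbors."""
    ranked = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled feature vectors: two statistical + one texture value.
train_x = [
    fuse([0.1, 0.1], [0.2]), fuse([0.2, 0.1], [0.1]), fuse([0.1, 0.2], [0.1]),
    fuse([0.5, 0.5], [0.6]), fuse([0.6, 0.5], [0.5]), fuse([0.5, 0.6], [0.5]),
    fuse([0.9, 0.9], [0.8]), fuse([0.8, 0.9], [0.9]), fuse([0.9, 0.8], [0.9]),
]
train_y = ["normal"] * 3 + ["benign"] * 3 + ["malignant"] * 3

prediction = knn_predict(train_x, train_y, fuse([0.85, 0.9], [0.85]))
```

Because kNN relies on distances, features with very different ranges (for example mean gray level versus run percentage) would normally be scaled to a common range before fusion; the toy values above are assumed to be scaled already.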

4. Experimental Results

The main objective of the proposed CAD system is to classify mammography images as normal, benign, or malignant. Extensive experiments are carried out in this section to determine how well the proposed strategy performs in terms of accuracy rate. In addition, the proposed system is compared with existing related studies.

4.1 Dataset

Experiments are conducted on the public and well-known dataset of images known as Mammographic Image Analysis Society (MIAS) to evaluate the performance of the proposed CAD system. The MIAS dataset contains 322 images, 206 of which are normal, 64 of which are benign, and 52 of which are malignant [24]. The images all have the same resolution of 1024 by 1024 pixels.

4.2 Results

The proposed CAD system was implemented in MATLAB. The accuracy rate for each set of extracted features is assessed using two classifiers, SVM and kNN. Tables 2 and 3 show the accuracy rates obtained for the statistical features and the GLRLM features individually. The assessment uses k-fold cross-validation: the dataset is divided into k groups (folds), and the model is repeatedly trained on k - 1 folds and tested on the remaining fold, so that it is always evaluated on data it has not seen during training. In this study, the accuracy of the classifiers is measured with k = 5, 10, 15, and 20. The accuracy rate was calculated using the formula below [25]:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

where TP, TN, FP, and FN refer to true positive, true negative, false positive, and false negative, respectively.
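The evaluation procedure can be sketched in Python as follows. The accuracy function implements Eq. (1) directly, and k_fold_indices shows one simple way to split sample indices into k folds. The contiguous, unshuffled split is an assumption made for brevity (the paper does not say whether the folds were shuffled or stratified), and both function names are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy as in Eq. (1): correct decisions divided by all decisions."""
    return (tp + tn) / (tp + tn + fp + fn)


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Illustrative counts only, not results from the paper.
example_accuracy = accuracy(tp=45, tn=50, fp=3, fn=2)

# Splitting 322 images (the size of the MIAS dataset) into 5 folds.
folds = list(k_fold_indices(322, 5))
test_sizes = [len(test) for _, test in folds]
```

In a full experiment, a classifier would be trained on each train split and scored on the matching test split, and the k accuracy values would be averaged. For the three-class problem considered here, overall accuracy is simply the fraction of images whose predicted class matches the true class.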

Table 2

Accuracy Rate for the Extracted Features using SVM
Features	5 k-fold	10 k-fold	15 k-fold	20 k-fold	Average
Statistical	97.9%	98%	97.8%	97.8%	97.8%
GLRLM	98.1%	98.1%	98%	98%	98.1%

Table 3

Accuracy Rate for the Extracted Features using kNN
Features	5 k-fold	10 k-fold	15 k-fold	20 k-fold	Average
Statistical	96.9%	96.5%	96%	96%	96.3%
GLRLM	97%	96.9%	97%	96.6%	96.8%

Tables 2 and 3 show that the extracted statistical features and the extracted GLRLM features have almost the same impact on the accuracy rate. Furthermore, SVM provides a higher accuracy rate than kNN.

Further study has been undertaken by fusing both extracted statistical and extracted GLRLM features, as shown in Table 4.

Table 4

Accuracy Rate of the Proposed CAD System
Classifier Technique	5 k-fold	10 k-fold	15 k-fold	20 k-fold	Average
SVM	98.6%	98.5%	98.6%	98.7%	98.6%
kNN	96.8%	96.9%	97.1%	97.1%	96.9%

Table 4 shows that fusing the extracted features increases the accuracy rate for SVM (from 98.1% with GLRLM features alone to 98.6% on average), while the kNN accuracy remains at a similar level. As before, the accuracy rate of the SVM classifier is higher than that of kNN.

More investigation has been conducted to determine the appropriate thresholding T value to use for segmentation purposes, see Table 5.

Table 5

Investigating the Optimum Value for Thresholding T
T value	kNN	SVM
0.1	90.5%	91.4%
0.2	90.1%	90.4%
0.3	92.3%	93.4%
0.4	92.5%	94.2%
0.5	93.6%	94.5%
0.6	94.4%	95.1%
0.7	95.8%	96.6%
0.8	96.9%	98.6%
0.9	96.4%	98.1%

It is clear from the findings shown in Table 5 that T = 0.8 produced the highest accuracy rate for both classifiers.
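Selecting the threshold from Table 5 amounts to a simple parameter sweep: each candidate T is evaluated and the one with the highest accuracy is kept. The short Python sketch below reproduces that selection from the SVM and kNN columns of Table 5; the variable names are illustrative.

```python
# Accuracy rates (in percent) copied from Table 5, keyed by threshold T.
svm_accuracy = {
    0.1: 91.4, 0.2: 90.4, 0.3: 93.4, 0.4: 94.2, 0.5: 94.5,
    0.6: 95.1, 0.7: 96.6, 0.8: 98.6, 0.9: 98.1,
}
knn_accuracy = {
    0.1: 90.5, 0.2: 90.1, 0.3: 92.3, 0.4: 92.5, 0.5: 93.6,
    0.6: 94.4, 0.7: 95.8, 0.8: 96.9, 0.9: 96.4,
}

# Pick the threshold with the highest accuracy for each classifier.
best_t_svm = max(svm_accuracy, key=svm_accuracy.get)
best_t_knn = max(knn_accuracy, key=knn_accuracy.get)
```

Both classifiers peak at the same threshold, which is consistent with the choice of T = 0.8 in Section 3.2.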

Finally, Tables 6 and 7 compare the results of the proposed CAD system with those of three existing systems that also classify images as normal, benign, or malignant. Two of these systems employed the SVM classifier, and the third employed kNN.

Table 6

Accuracy Rate of the Tested CAD Systems using kNN
CAD Systems	Accuracy Rate
[19]	89.3%
Proposed	96.9%

Table 7

Accuracy Rate of the Tested CAD Systems using SVM
CAD Systems	Accuracy Rate
[22]	94.2%
[23]	87.1%
Proposed	98.6%

5. Conclusion

The development of medical image processing applications in the healthcare sector has enhanced the quality and accuracy of disease and cancer diagnosis. Manually diagnosing a disease or cancer and choosing a treatment is expensive, time-consuming, and requires the assistance of competent experts, whereas medical image processing techniques may be able to identify target diseases and tumors accurately and economically. Early detection of breast cancer is critical in order to reduce mortality, and modern machine learning algorithms can be expected to help detect breast cancer in its early stages. The primary purpose of developing a CAD system for mammography images is to help doctors and diagnostic professionals by providing a second opinion, which promotes trust in the diagnostic method. The proposed CAD system contains multiple steps, beginning with a pre-processing step that enhances image quality by employing a median filter and adaptive histogram equalization. The thresholding segmentation technique is then applied to distinguish the ROI from the other parts of the input image. After the ROI has been segmented from the mammography image, statistical and texture features are extracted. Two classifiers, SVM and kNN, are then used to determine whether the segmented region should be classified as normal, benign, or malignant based on the extracted features. Based on the obtained results, the proposed CAD system outperformed the compared existing systems, achieving an accuracy rate of 98.6%.

Conflict of Interest

Availability of data and materials

Competing interests

Not Applicable

Funding

Not Applicable

Author Contribution

Miran carried out all practical work and the methodology. Alan wrote and reviewed the manuscript. In addition, the idea of the contribution was suggested by Alan.

Acknowledgements

This work was supported by the University of Sulaimani.

References

[1] Shalaw, F., Salih, Abdulla, A.A.: An improved content-based image retrieval technique by exploiting bi-layer concept. UHD J. Sci. Technol. 5(1), 1–12 (2021)
[2] Alan, A., Abdulla, Ahmed, M.W.: An improved image quality algorithm for exemplar-based image inpainting. Multimedia Tools and Applications, Springer, vol. 80, no. 9, pp. 13143–13156 (2021)
[3] Arimura, H., Magome, T., Yamashita, Y., Yamamoto, D.: Computer-aided diagnosis systems for brain diseases in magnetic resonance images. Algorithms. 2(3), 925–952 (2009)
[4] American Cancer Society: Breast Cancer: What is breast cancer? [Online]. Available: http://www.cancer.org/cancer/breast-cancer/about/what-is-breast-cancer.html (2022)
[5] Kunio, D.: Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Computerized Medical Imaging and Graphics, Elsevier, vol. 31, no. 4, pp. 198–211 (2007)
[6] Amit, A., Bhadra, M., Jain, Shidnal, S.: Detection of eye diseases. International Conference on Wireless Communications, Signal Processing and Networking, IEEE, pp. 1341–1345 (2016)
[7] Zhang, Y.-J.: An overview of image and video segmentation in the last 40 years. Adv. Image Video Segmentation, pp. 1–16 (2006)
[8] Kurani, A.S., Xu, D.H., Furst, J., Raicu, D.S.: Co-occurrence matrices for volumetric data. J. Heart. 27, 25 (2004)
[9] Srivastava, S., Sharma, N., Singh, S.K., Srivastava, R.: Design, analysis and classifier evaluation for a CAD tool for breast cancer detection from digital mammograms. Int. J. BioMed. Eng. Technol. 13(3), 270–300 (2013)
[10] Vishrutha, V., Ravishankar, M.: Early Detection and Classification of Breast Cancer. Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications, Springer, vol. 1, pp. 413–419 (2015)
[11] Pashoutan, S., Shokouhi, S.B., Pashoutan, M.: Automatic Breast Tumor Classification Using a Level Set Method and Feature Extraction in Mammography. International Conference on Biomedical Engineering, IEEE, pp. 1–6 (2018)
[12] Hariraj, V., Khairunizam, W., Vijean, V., Ibrahim, Z.: Fuzzy multi-layer SVM classification of breast cancer mammogram images. Int. J. Mach. Eng. Technol. 9(8), 1281–1299 (2018)
[13] Sarosa, S., Utaminingrum, F., Bachtiar, F.A.: Breast cancer classification using GLCM and BPNN. Int. J. Adv. Softw. Comput. its Appl. 11(3), 157–172 (2019)
[14] Arafa, A., El-Sokary, N., Asad, A., Hefny, H.: Computer-Aided Detection System for Breast Cancer Based on GMM and SVM. Arab. J. Nucl. Sci. Appl. 52(2), 142–150 (2019)
[15] Farhan, A., Kamil, M.Y.: Texture Analysis of Breast Cancer via LBP, HOG, and GLCM techniques. IOP Conference Series: Materials Science and Engineering, vol. 928, no. 7 (2020)
[16] Hussein Saeed, E., Saleh, H.A., Khalel, E.A.: Classification of mammograms based on features extraction techniques using support vector machine. Comput. Sci. Inform. Technol. 2(3), 121–131 (2021)
[17] Mujizah, H., Novitasari, D.C.R.: Comparison of the histogram of oriented gradient, GLCM, and shape feature extraction methods for breast cancer classification using SVM. Jurnal Teknologi dan Sistem Komputer. 9(3), 150–156 (2021)
[18] Kanya Kumari, L., Naga Jagadesh, B.: A Robust Feature Extraction Technique for Breast Cancer Detection using Digital Mammograms based on Advanced GLCM Approach. EAI Endorsed Trans. Pervasive Health Technol. 8, 1–10 (2022)
[19] Andrade, C., Hermel, L.: Universidad De Investigación De Tecnología Experimental Yachay, pp. 61 (2020)
[20] Jebamony, J., Jacob, D.: Classification of Benign and Malignant Breast Masses on Mammograms for Large Datasets using Core Vector Machines. Curr. Med. Imaging. 16(6), 703–710 (2020)
[21] Suradi, S., Abdullah, K.A., Mat, N.A.: Automated Classification of Breast Cancer Lesions for Digitised Mammograms via Computer-Aided Diagnosis System. J. Appl. Sci. Process. Eng. 8(2), 892–902 (2021)
[22] Bachir, L., Daoudi, I., Tallal, S.: Automatic computer-aided diagnosis system for mass detection and classification in mammography. Multimedia Tools and Applications, Springer, vol. 80, no. 6, pp. 9493–9525 (2021)
[23] Al-Fahaidy, B., Al-Fuhaidi, I., Al-Darouby, F., Al-Abady, M., Al-Qadry, Al-Gamal, A.: A Diagnostic Model of Breast Cancer Based on Digital Mammogram Images Using Machine Learning Techniques. Appl. Comput. Intell. Soft Comput. (2022)
[24] Suckling, J., et al.: The Mammographic Image Analysis Society Digital Mammogram Database. Excerpta Med. Int. Congress Ser., pp. 375–378 (1994)
[25] Murtirawat, R., Panchal, S., Singh, V.K., Panchal, Y.: Breast Cancer Detection Using K-Nearest Neighbors, Logistic Regression and Ensemble Learning. International Conference on Electronics and Sustainable Communication Systems (ICESC), IEEE, pp. 534–540 (2020)

No competing interests reported.
