SELF: A Stacked-based Ensemble Learning Framework for Breast Cancer Classification

DOI: https://doi.org/10.21203/rs.3.rs-2013877/v2

Abstract

Breast cancer is today among the most prevalent and dangerous diseases in women, second only to lung cancer. Over the past few decades, a substantial number of cancer cases have been reported throughout the world, and breast cancer remains a widely acknowledged category of the disease in women, due in part to a lack of awareness. According to the 2020 world cancer survey report, about 2.3 million new cases and 685,000 deaths were reported worldwide. Because the patient-to-doctor ratio (PDR) is very high, there is a pressing need for a machine-based smart breast cancer diagnosis system that can detect cancer at an early stage, when it can be treated far more effectively. The broader goal is to bring together researchers from both the medical and machine learning fields to advance this clinical application. This paper presents SELF, a stacked-based ensemble learning framework that classifies breast cancer at an early stage from histopathological images of tumor cells using computer-aided diagnosis tools. For the performance evaluation, we use the BreakHis dataset, comprising 7,909 histopathological images collected from 82 patients, and the Wisconsin Breast Cancer Database (WBCD) with 569 instances. We trained several distinct classifiers on both datasets, and SELF selected the five best common classifiers, on the basis of their accuracy, to build the ensemble. Using the stacking ensemble technique, we take the Extra Trees, Random Forest, AdaBoost, Gradient Boosting, and KNN (k = 9) classifiers as base learners and a logistic regression model as the final estimator. SELF achieves testing accuracies of approximately 95% on the BreakHis dataset and 99% on WBCD. On the BreakHis dataset, the framework also outperforms on the other performance parameters, with F1-score, ROC, and MCC values of 94.17%, 89.41%, and 80.81%, respectively.
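The stacking architecture described above (five base learners whose predictions feed a logistic regression final estimator) can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration, not the authors' implementation: it uses the WBCD data as shipped with scikit-learn, and all hyperparameters other than k = 9 for KNN, as well as the train/test split, are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# WBCD: 569 instances, binary labels (malignant / benign)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)  # split ratio assumed

# Five base learners named in the abstract; default hyperparameters assumed
base_learners = [
    ("extra_trees", ExtraTreesClassifier(random_state=42)),
    ("random_forest", RandomForestClassifier(random_state=42)),
    ("adaboost", AdaBoostClassifier(random_state=42)),
    ("grad_boost", GradientBoostingClassifier(random_state=42)),
    # KNN benefits from feature scaling, hence the pipeline
    ("knn9", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=9))),
]

# Logistic regression stacks the base learners' cross-validated predictions
model = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```

On this tabular dataset the stacked model typically scores in the high nineties, consistent with the ~99% WBCD accuracy reported; reproducing the BreakHis result would additionally require extracting features from the histopathological images first.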