Purpose: Selecting relevant features provides a better understanding of the data and improves prediction performance. The main aim of this paper is therefore to identify the risk factors of breast cancer.
Methods: Focusing on two different datasets, Breast Cancer Surveillance Consortium (BCSC) and Breast Cancer Coimbra (BCC), we perform a comparative study of three families of feature selection methods: filter methods, wrapper methods and embedded methods. In addition, this work investigates the stability of these techniques when the datasets are perturbed. Artificial Neural Network, Random Forest, Support Vector Machine (SVM), Logistic Regression and Decision Tree are used for classification.
Results: The results obtained with all the features are compared with those obtained with only the top-ranked features. The classification performances are comparable in both cases. Furthermore, we found that invasive, glucose, resistin, insulin, leptin, age, adiponectin, BMI and HOMA are the most relevant risk factors for breast cancer.
Conclusion: Our findings demonstrate that the feature selection methods studied can efficiently determine the risk factors of breast cancer.
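As an illustration of what measuring the stability of a feature selection method under dataset perturbation can look like, the following is a minimal sketch and not the procedure used in this paper. It assumes a simple filter criterion (absolute Pearson correlation with the label), bootstrap resampling as the perturbation, and the mean pairwise Jaccard similarity of the selected top-k sets as the stability score. The data are synthetic, and all function names (`top_k_features`, `selection_stability`, `make_synthetic`) are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no ranking information
    return abs(cov) / (var_x * var_y) ** 0.5


def top_k_features(rows, labels, k):
    """Filter-style selection: keep the k features most correlated with the label."""
    n_features = len(rows[0])
    scores = [pearson_abs([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity between two non-empty sets of feature indices."""
    return len(a & b) / len(a | b)


def selection_stability(rows, labels, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard similarity of the top-k sets over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(rows)
    subsets = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        subsets.append(
            top_k_features([rows[i] for i in idx], [labels[i] for i in idx], k)
        )
    return mean(jaccard(a, b) for a, b in combinations(subsets, 2))


def make_synthetic(n=200, seed=1):
    """Synthetic data (assumption): features 0 and 1 are informative, 2 to 5 are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        informative = [y + rng.gauss(0, 0.5), y + rng.gauss(0, 0.5)]
        noise = [rng.gauss(0, 1) for _ in range(4)]
        rows.append(informative + noise)
        labels.append(y)
    return rows, labels


rows, labels = make_synthetic()
selected = top_k_features(rows, labels, k=2)
stability = selection_stability(rows, labels, k=2)
print(sorted(selected), round(stability, 3))
```

A stability score close to 1 means the same features are selected regardless of the resample, while a score close to 0 means the selection depends heavily on which samples are drawn. The same loop could wrap a wrapper or embedded selector in place of `top_k_features`.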