Risk Factors of Breast Cancer Determination: a Comparative Study on Different Feature Selection Techniques

doi:10.21203/rs.3.rs-2120645/v1


Research Article


https://doi.org/10.21203/rs.3.rs-2120645/v1

This work is licensed under a CC BY 4.0 License

Version 1


Purpose: Choosing the relevant features is important to provide a better understanding of the data and improve the prediction performance. Thus, the main aim of this paper is to identify the risk factors of breast cancer.

Methods: Focusing on two datasets, Breast Cancer Surveillance Consortium (BCSC) and Breast Cancer Coimbra (BCC), we perform a comparative study of three families of feature selection methods: filter methods, wrapper methods and embedded methods. In addition, this work investigates the stability of these techniques when the datasets are perturbed. Artificial Neural Network, Random Forest, SVM, Logistic Regression and Decision Tree are used for classification.

Results: We compare the results obtained with all the features against those obtained with only the top-ranked features. The classification performance is comparable in both cases. Furthermore, we found that invasive, glucose, resistin, insulin, leptin, age, adiponectin, BMI and HOMA are the most relevant risk factors for breast cancer.
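To make the idea of selection stability under perturbation concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a simple filter method (ranking features by absolute Pearson correlation with the label), bootstrap resampling as the perturbation, and the mean pairwise Jaccard index as the stability score. The feature names `f_signal` and `f_noise`, the synthetic data, and all function names are hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def top_k_features(rows, labels, names, k):
    """Assumed filter method: keep the k features with the largest |correlation| to the label."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two sets (1.0 means identical selections)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(rows, labels, names, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard index of the subsets selected on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(rows)
    subsets = []
    for _ in range(n_resamples):
        # Perturb the dataset by sampling rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        subsets.append(top_k_features(sample_rows, sample_labels, names, k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Synthetic illustration only: one feature tracks the label, the other does not.
names = ["f_signal", "f_noise"]
labels = [i % 2 for i in range(60)]
rows = [[10 * labels[i] + (i % 5), (i * 7) % 11] for i in range(60)]

selected = top_k_features(rows, labels, names, k=1)
stability = selection_stability(rows, labels, names, k=1)
print(selected, round(stability, 3))
```

A score close to 1 indicates that the same features are selected regardless of which resample is used, while a score close to 0 indicates that the selection depends heavily on the particular sample. Other perturbations (for example, added noise) and other stability measures could be substituted without changing the overall structure.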

Conclusion: Our findings demonstrate that the identified feature selection methods can efficiently determine the risk factors of breast cancer.

Feature selection

Stability analysis

Classification

Breast cancer

No competing interests reported.
