Background: Breast cancer is the most common cancer and the fifth leading cause of death in women worldwide. Identifying genes that are unique to particular cancers has become an area of active interest. The aim of this study was to identify genes unique to five molecular subtypes of breast cancer in women using penalized logistic regression models.
Methods: Microarray data from five independent GEO datasets were combined. The combined dataset contains genetic information from 324 women with breast cancer and 12 healthy women. Lasso logistic regression and adaptive lasso logistic regression were used to extract unique genes. The biological processes of the extracted genes were evaluated with the open-source GOnet web application. The models were fitted in R version 3.6.0 with the glmnet package.
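For orientation, the two penalized models named above can be written in their standard textbook form. This is a sketch under assumptions and not necessarily the exact specification used in the study: the symbols (binary group label y_i, expression vector x_i, n samples, p genes, tuning parameter λ, exponent γ, initial estimate β̃) are introduced here for illustration, and the abstract does not state how λ or the adaptive weights were chosen.

```latex
% Lasso logistic regression for one pairwise comparison (y_i in {0,1})
\hat{\beta}^{\mathrm{lasso}}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\frac{1}{n}\sum_{i=1}^{n}
        \left[ y_i\,(\beta_0 + x_i^{\top}\beta)
               - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
      + \lambda \sum_{j=1}^{p} |\beta_j|
    \right\}

% Adaptive lasso: the same loss with gene-specific penalty weights
\hat{\beta}^{\mathrm{adapt}}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\frac{1}{n}\sum_{i=1}^{n}
        \left[ y_i\,(\beta_0 + x_i^{\top}\beta)
               - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
      + \lambda \sum_{j=1}^{p} w_j\,|\beta_j|
    \right\},
\qquad
w_j = \frac{1}{|\tilde{\beta}_j|^{\gamma}}, \quad \gamma > 0
```

Genes with a nonzero estimated coefficient are the ones selected for a given comparison. In glmnet, weights of this kind can be passed through the `penalty.factor` argument; whether the study did so is an assumption.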
Results: In total, 119 genes were extracted across fifteen pairwise comparisons. Seventeen genes (14%) overlapped between comparison groups. Of the 27 genes involved in positive regulation of cellular processes, one belonged exclusively to this biological process. Of the 46 genes involved in negative regulation of cellular processes, 6 belonged exclusively to it. Of the 50 genes that were significant in regulation of metabolism, 4 belonged exclusively to it. Of the 32 genes related to response to stress, 4 belonged exclusively to it.
Conclusions: Most of the genes selected by lasso logistic regression and adaptive lasso logistic regression were involved in negative regulation of cellular processes.