Accumulating evidence has demonstrated that aberrant DNA methylation plays a crucial role in promoting breast cancer development by regulating the expression of a set of genes termed methylation-driven genes (MDGs). In this study, we identified 288 MDGs in breast cancer and constructed a risk signature based on 19 prognosis-associated MDGs, which showed a moderate ability to predict patients’ overall survival. Furthermore, we explored the molecular features of immune regulation, mutation patterns, and drug sensitivity in patients at different risk levels. We also established a nomogram that combines the risk signature with clinicopathological characteristics and is suitable to serve as a predictive biomarker in clinical applications. Finally, we built a ceRNA network of 73 lncRNA-miRNA-mRNA pairs to uncover the molecular mechanisms underlying the regulation of the proposed risk signature.
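To illustrate how an expression-based risk signature of this kind is commonly applied, the following minimal sketch computes a risk score as a coefficient-weighted sum of gene expression values and splits patients at the median score. This formulation is an assumption for illustration only; the gene names, coefficients, expression values, and the median cutoff are hypothetical placeholders and are not taken from this study.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical coefficients and expression values (placeholders, not study data).
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 3.0, "GENE_B": 2.0},
    "P4": {"GENE_A": 6.0, "GENE_B": 0.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

With a cutoff chosen this way, roughly half of the cohort falls into each group; other cutoffs (for example, an optimized threshold) would change the group sizes.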
In recent years, with the development of cancer genomics, many studies have proposed predictive signatures for breast cancer based on clusters of methylated genes and methylation sites. Chuntao Tao et al. constructed a novel prognostic biomarker for breast cancer with 7 DNA methylation sites[7]. In a similar study conducted by Guoyang et al., 4 dmCpG sites were identified and included in an OS-associated promoter CpG site signature for predicting breast cancer prognosis[8]. Xiongdong Zhong et al. screened out 10 prognosis-associated methylation-driven genes in breast cancer and constructed a predictive nomogram with 3 of the 10 genes and several clinical features[9]. Yinqi Gao et al. established a prognostic signature in the TCGA-TNBC cohort with 5 differentially methylated sites (DMSs)[10]. Ming Zhang et al. reported a predictive model built with 7 DMSs, which could be used to distinguish breast cancer from normal tissues[11]. In contrast to previous predictive models constructed from the abundance of DNA methylation, our predictive signature was based on the mRNA expression of prognosis-associated methylation-driven genes, and we verified the efficiency of the model with multiple validations. Additionally, because previous studies only proposed predictive models with several DMSs, our study further explored the underlying mechanisms and investigated potential drugs. Moreover, we constructed a predictive nomogram and an mRNA-miRNA-lncRNA network that have important implications for clinical application and for identifying novel biomarkers for further research. Our study not only established a new method for classifying breast cancer, but also shed light on new perspectives for further molecular investigation and potential treatments for breast cancer.
In our study, all the methylation-driven genes involved in the risk signature were significantly associated with breast cancer prognosis. Among these genes, 12 out of 19 were previously reported to be associated with breast cancer, a result that could be confirmed via molecular investigation. Previous studies have shown that KRT19[15], CCND2[16], SFRP1[17], C2orf40[18], and NDRG2[19] can act as tumor suppressors in breast cancer, while TNFRSF18[20], BATF[21], CXCL14[22], and MAL2[23] can promote the progression of breast cancer. In addition, previous studies have identified that NRG1[24] and STAT5A[25] play dual roles as both candidate oncogenes and tumor suppressors, whereas the function of NT5E remains controversial[26, 27]. We observed that some of the target genes, such as NT5E, NRG1, NDRG2, CCND2[16, 28], and C2orf40, might promote cancer development through altered methylation. In line with previous reports, a significant correlation between methylation level and mRNA expression of NDRG2, CCND2, and C2orf40 was also observed in our study. Aside from tumor cell regulation, some of the genes, including CXCL14, MAL2[29], NT5E, and TNFRSF18[20, 30], have been demonstrated to regulate breast cancer through tumor immunity and to be potential targets of immunotherapy.
The tumor’s immune microenvironment plays a key role in tumor development, clinical therapeutic efficacy, and the overall survival of diagnosed patients. Therefore, we speculated that tumor immunity might be the mechanism linking the risk signature to patients’ prognoses. Through immune analysis with the CIBERSORT algorithm and data extracted from the TISIDB database, distinct subpopulations of immune cells were observed in the low- and high-risk groups. The levels of M0 and M2 macrophages were positively correlated with the risk score and were significantly higher in the high-risk group, while the proportions of CD8+ T cells and naïve B cells were higher in the low-risk group. Previous studies have demonstrated that tumor-associated M0 and M2 macrophages are generally associated with poor prognosis[31], while CD8+ T cells and B cells are correlated with better breast cancer prognosis[32, 33]. The survival advantage of low-risk patients and the survival disadvantage of high-risk patients indicate that the subpopulations of infiltrating immune cells might be factors that affect prognosis and might be developed into effective immunotherapeutic targets in breast cancer classified with our risk signature.
Aside from tumor immunity, tumor mutation is also pivotal to cancer development. In our research, we found that TP53 mutation was more frequent in patients from the high-risk group, which also had a higher proportion of M0 and M2 macrophages. In contrast, the proportion of tumors with PIK3CA mutation was higher in patients from the low-risk group. Our conclusions regarding the relationship between tumor mutation and immune cells are consistent with the results of Lin Li et al., who reported that TP53-mutated cancers have higher levels of antitumor immune signatures[34]. Moreover, previous reports have shown that TP53 mutation is commonly observed in triple-negative breast cancer (TNBC)[35], and that PIK3CA mutation is frequently detected in Luminal and HER2 breast cancer[36, 37]. These findings, combined with the results of our analysis, reinforce the idea that high-risk patients are more prone to develop the molecular features of TNBC, whereas low-risk patients are more likely to have Luminal features.
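As an illustration of how a difference in mutation frequency between two risk groups could be tested, the following minimal sketch implements a two-sided Fisher's exact test on a 2x2 table (mutated vs. wild-type, high-risk vs. low-risk). The choice of this test is an assumption for illustration and is not stated to be the method used in this study; the counts are hypothetical placeholders, not study results.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )


# Hypothetical counts: rows = high-risk / low-risk, columns = mutated / wild-type.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

For realistic cohort sizes, an established statistics library would normally be preferred over a hand-written implementation.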
Apart from the molecular investigation, we also sought to screen for candidate therapeutic targets and suitable drugs for patients at different risk levels. We observed higher TMB and neoantigen load in patients from the high-risk group, many of whom were also more sensitive to cisplatin. Because previous studies have suggested that high TMB and neoantigen load are positively correlated with sensitivity to immunotherapy[38], we concluded that the high-risk patients included in our study may benefit from immunotherapy and cisplatin-based chemotherapy. In contrast, patients in the low-risk group were sensitive to common taxane chemotherapy drugs (paclitaxel and docetaxel) rather than to the anthracycline doxorubicin. Moreover, low-risk patients were also more sensitive to some antineoplastic and anti-HER2 drugs that have been proven to affect breast cancer but have not been widely used in clinical treatment, such as cytarabine, gemcitabine, gefitinib, lapatinib, and methotrexate.
It is widely known that lncRNAs may act as ceRNAs to regulate mRNA expression; therefore, we performed a ceRNA network analysis to uncover the potential regulators of our risk signature. In our analysis, we identified 38 miRNAs and 9 lncRNAs associated with the target MDGs, and some of these lncRNAs have previously been reported to play a role in cancer development. Studies have shown that HOXA-AS2[39] and LINC00707[40] can promote tumor invasion and metastasis. Other related studies have also clarified that high expression of MAGI2-AS3[41, 42], ADAMTS9-AS2[43], and LINC01140[44] can inhibit the proliferation of breast cancer cells. To date, no studies have identified a relationship between SOX9-AS1, LINC01354, or LINC01198 and breast cancer, but their oncogenic roles in other tumors (e.g., liver cancer, lung cancer, and colorectal cancer) have been well demonstrated.
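The structure of such a ceRNA network can be sketched as a join of lncRNA-miRNA pairs with miRNA-mRNA pairs on their shared miRNA, which yields lncRNA-miRNA-mRNA triples. The sketch below is a minimal illustration under that assumption; all identifiers are hypothetical placeholders and do not correspond to interactions reported in this study, and the databases or filters used to obtain the input pairs are outside its scope.

```python
from collections import defaultdict


def build_cerna_triples(lnc_mirna_pairs, mirna_mrna_pairs):
    """Join lncRNA-miRNA and miRNA-mRNA pairs on the shared miRNA."""
    targets_by_mirna = defaultdict(set)
    for mirna, mrna in mirna_mrna_pairs:
        targets_by_mirna[mirna].add(mrna)

    triples = set()
    for lncrna, mirna in lnc_mirna_pairs:
        # A miRNA with no known mRNA target contributes no triple.
        for mrna in targets_by_mirna.get(mirna, ()):
            triples.add((lncrna, mirna, mrna))
    return sorted(triples)


# Hypothetical placeholder interactions (not study data).
lnc_mirna_pairs = [("lncRNA_1", "miR_a"), ("lncRNA_1", "miR_b"), ("lncRNA_2", "miR_c")]
mirna_mrna_pairs = [("miR_a", "mRNA_X"), ("miR_a", "mRNA_Y"), ("miR_d", "mRNA_Z")]

triples = build_cerna_triples(lnc_mirna_pairs, mirna_mrna_pairs)
for triple in triples:
    print(triple)
```

Only miRNAs that appear in both input lists produce triples, so the size of the resulting network depends on the overlap between the two sets of predicted interactions.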
Our study constructed a risk signature with 19 methylation-driven genes, and the signature retained its predictive ability across multiple validations. However, our study still has several limitations. First, we only performed bioinformatic analysis without experimental validation. Second, further prospective studies are needed to evaluate the value of the proposed risk signature and nomogram in clinical practice. Third, experimental exploration is still needed to elucidate the mechanism underlying the risk signature and to identify effective therapeutic targets.