Identification of differently expressed CR-related genes
We collected 870 CRs-related genes from previous studies. Next, we obtained 707 common CRs-related genes from the intersection list of GSE42568 and GSE45255 genesets. A total of 107 BC and 17 normal breast tissue samples were analyzed to identify differentially expressed genes (DEGs) from GSE42568. Using p < 0.05 and |logFC| > 1 as criteria, we identified 168 differentially expressed CRs-related genes including 136 upregulated and 32 downregulated (Additional file 1). The profiles of differentially expressed genes were presented in the heatmap plot (Fig. 1).
GO and KEGG analysis
The GO and KEGG pathway analyses were used to explore the potential functions of the 168 DEGs (136 upregulated and 32 downregulated). The GO analysis included three categories. For BP, genes were significantly enriched in histone modification, peptidyl-lysine modification, chromatin organization, histone methylation, and histone-lysine methylation. For MF, genes were significantly enriched in transcription coregulator activity, histone binding, transcription corepressor activity, histone methyltransferase activity, and transcription coactivator activity. For CC, genes were significantly enriched in methyltransferase complex, histone methyltransferase complex, histone deacetylase complex, PcG protein complex, and heterochromatin. These results suggested that the functions of DEGs were mainly enriched in histone modification and methylation. Moreover, the KEGG pathway analysis indicated that these DEGs were mainly involved in lysine degradation, Notch signaling pathway, cell cycle, viral life cycle-HIV-1, glucagon signaling pathway, thyroid hormone signaling pathway, transcriptional misregulation in cancer, viral carcinogenesis, and FoxO signaling pathway (Fig. 2).
Establishment and validation of the CRs-based signature
Further, we used univariate Cox regression analysis to investigate the prognostic value of differently expressed CR genes. The 12 top genes were selected with p < 0.001 (Fig. 3A). Then, we successfully built a prognostic signature with four genes (MORF4L1, NCOA4, TTK and JMJD4 ) based on the LASSO Cox regression analysis:
Risk score = (-0.55636281 × MORF4L1 expression) + (-0.08847054 × NCOA4 expression) + (0.04836153 × TTK expression) + (0.27976355 × JMJD4 expression) .
The cut-off value was 0.156, and the prognosis was significantly worse in the high-risk group (p < 0.001) (Fig. 3B and C). The time-dependent ROC analysis showed that the AUCs in the GSE42568 dataset were 0.794, 0.784, and 0.837 at 1, 3, and 5 years, respectively (Fig. 3D), representing the good prognostic value. We found similar results in the GSE45255 dataset (Fig. 4A and B). The time-dependent ROC curve showed that the AUCs were 0.78, 0.716, and 0.73 at 1, 3, and 5 years (Fig. 4C).
Expression of the four genes in BC specimens
To validate the above results, we evaluated the expression of the four genes in 21 paired tumor and normal BC samples by RT-qPCR. MORF4L1 and NCOA4 were downregulated (p < 0.05), and TTK was upregulated (p < 0.05) in BC tissues compared with normal tissues. However, JMJD4 was shown none difference in breast cancer tissues compared with normal tissues (p > 0.05) (Fig. 5).
The CR-based signature contributed to BC prognosis
Univariable and multivariable Cox analyses were used to identify the predictive ability of this signature. In the univariate analysis, the tumor grade, T stage, lymph node (LN) status and risk score were associated with the survival of BC patients (p < 0.05) (Fig. 6A). The multivariate analysis showed that the LN status and risk score were dependent prognostic factors (p < 0.05) (Fig. 6B), demonstrating the favorable prognostic ability of the signature.
Association between the signature and clinical characteristics
The boxplot demonstrated that there were prominent differences between the two groups for tumor grade (p < 0.001), T stage (p < 0.001), LN status (p < 0.05), and breast cancer types (p < 0.05) (Fig. 7A), while no statistical significance was detected for age and ER status (p > 0.05). We further investigated the prognostic performance of the CRs-based signature using stratification analysis in subgroups. The signature performed well for patients with age > 60 (p < 0.001), G1-2 (p < 0.05), LN metastasis (p < 0.001), and T2-3 (p < 0.001) (Fig. 7B). On the other hand, it presented a poor performance for those with age ≤ 60, G3, no LN metastasis, T1 stage, luminal subtype, HER2 overexpressing subtype and basal-like breast subtype (p > 0.05).
Nomogram construction
According to the above results, we structured a nomogram with LN status and risk score to graphically assess the survival probability of BC patients. The 3- and 5-year survival rates of BC patients were predicted using this nomogram (Fig. 8A). The C-index of the nomogram was 0.8265, and the 3- and 5-year correction curve demonstrated its good prediction ability (Fig. 8B and C).
Immune infiltration analysis
Since it is widely considered that immune cell infiltration is closely related to the prognosis and treatment of BC, we further explored the immune infiltration differences between the two groups and found that activated T cells were more prevalent in the high-risk group. In contrast, resting mast cells had higher proportions in the low-risk group (Fig. 9A). Additionally, we found that the expression of PDL-1, LAG3 and CD274 were significantly elevated in the high-risk group, demonstrating that immunotherapy might be effective for these patients (Fig. 9B).