Extraction of MRGs from the TCGA database and identification of DE-MRGs
We downloaded a total of 1222 cases from the TCGA database and extracted 927 MRGs from the TCGA gene expression matrix.After adjusting with GSE20685 expression profile,861 MRGs was finally selected as candidate MRGs.After screening, we identified 315 DE-MRGs in the TCGA database under the screening criteria (|logFC|>0.5 and the P-value<0.05), including 163 up-regulated MRGs and 152 down-regulated MRGs. The top 10 differentially up-regulated MRGs and the top 10 differentially down-regulated MRGs were demonstrated in Table 1.The heatmaps of these top differentially up-regulated and down-regulated MRGs were demonstrated in Figure 1.
The potential drug treatments for breast cancer
From the prediction of the Cmap dataset,6 candidate drugs that scored ≤‑0.75 were considered as potential drugs for BC treatment.Strong negative correlationwas found between BC and LY-294002, tanespimycin, apigenin, omeprazole, liothyronine, sirolimus.Strong positive correlation was found between BC and adiphenine,viomycin,Prestwick-692,genistein,isoflupredone and atractyloside(Table 2).These drugs might have therapeutic effects on BC.The tomographes of the top 3 associated molecule drugs were investigated in Pubchem database (Figure 2a-c).
Functional enrichment of DE-MRGs
Function annotation analyses of the 315 DE-MRGs were performed. A total of 791 GO terms (including 593 biological processes,28 cellular components,and 170 molecular functions) were enriched.
The GO analysis includes three categories (biological process, molecular function, and cellular component),and the 10 significant enrichment terms were displayed for each category (Figure 3A).From the biological
process,it was observed that DEGs were predominantly associated with small molecule catabolic process,purine-containing compound metabolic process and nucleoside phosphate biosynthetic process.For the cellular component,these DEGs were enriched in mitochondrial matrix,organelle outer membrane and outer membrane.In molecular function,these DEGs were associated with coenzyme binding,oxidoreductase activity,acting on CH-OH group of donors and oxidoreductase activity.
Furthermore, analysis of the KEGG pathways of these metabolic genes showed that 32 KEGG pathways were enriched,mainly including purine metabolism,Carbon metabolism,Fatty acid degradation, Tryptophan metabolism and Pyruvate metabolism (Fig. 3B)
Construction and of validation the prognostic metabolic gene signature
Univariate Cox regression analysis and lasso‐penalized Cox regression analysis (15) were performed to identify the prognosis related metabolic genes and construct the prognostic gene signature.P<0.01 in univariate Cox regression analysis were considered statistically significant.LASSO analysis identified six DE-MRGs(NT5E,PAICS,PFKL,PLA2G2D,QPRT and SHMT2) which were included in the classifier. The riskscore = 0.0697*expression of NT5E + 0.0182*expression of PAICS + 0.0220*expression of PFKL - 0.1127*expression of PLA2G2D + 0.0239*expression of QPRT + 0.0068*expression of SHMT2. All patients were assigned to high-(n = 519) and low-risk(n = 520) groups based on median risk scores. The AUC of ROC to the risk score was the best in both training set and testing set and the K-M survival analysis indicated significantly different survival time between the two groups (Figure 4).In both univariate and multivariate Cox regressions, the hazard ratio of risk score was maximal compared with other clinical features.The univariate Cox regression focused on the individual variable but may be affected by the confounding factors.The multivariate Cox regression avoided this limitation. These analyses complemented each other and indicated that the risk score could be a definitely independent risk factor for the prognosis of BC (Figure 5).
Constructing and validating a predictive nomogram in the TCGA and GEO cohort
Nomograms was created for OS based on their common clinical traits.The prognostic nomogram for 3- ,5- and 10-year OS of the TCGA and GEO cohort were shown in Figure 6A,B.In order to identify the discriminating superiority of nomograms,various methods were used in this study, including calibration curves,C-index values and DCA curves.Calibration plots showed that the performance of the nomogram was best in predicting 3- ,5- and 10-year OS (Figure 6C,D). The C-index of the nomogram for the prediction of OS were 0.757 in training group,and 0.728 invalidation group.The DCA curves showed some net benefit for predicting OS, especially for 3- ,5- and 10-year survival both in training group and validation group(Figure 6E,F).
External Validation of the Prognostic Signature.
All these five genes have been confirmed that they were significantly differentially expressed between BC and control tissues in the TIMER2 database, with significant differences in expression (Figure 7), which is consistent with our results.Through the Human Protein Atlas Database (https://www.proteinatlas.org/),we compared the protein expression levels in breast cancer and normal tissues(Figure 8A).In the cBioportal database,the mutation rate of QPRT is the highest, 4% in the sample, and other genes also show changes(Figure 8B).
Gene set enrichment analyses
GSEA was performed and found a great majority of the enriched pathways were metabolism related (such as KEGG_PYRIMIDINE_METABOLISM, KEGG_GALACTOSE_METABOLISM, KEGG_PURINE_METABOLISM, KEGG_FRUCTOSE_AND_MANNOSE_METABOLISM, KEGG_GLUTATHIONE_METABOLISM) in the high-risk group,Besides,the metabolism-related pathways(such as KEGG_ETHER_LIPID_METABOLISM, KEGG_NITROGEN_METABOLISM, KEGG_GLYCEROPHOSPHOLIPID_METABOLISM, KEGG_INOSITOL_PHOSPHATE_METABOLISM) were enriched in the low‐risk group, based on the criterion of a NOM p-value <0.05 (Figure 9).
Correlation with Tumor Microenvironment.
Compared with the high-risk group, the Stromal score, Immune score and ESTIMATEscore of the samples in the low-risk group were higher, and the tumor purity in the high-risk group was higher with significant differences(Figure 10).
The relationship was calculated between the signature and the level of immune cell infiltration through the CIBERSORT system.A total of 11 subtypes of immune cells(B cells naive, B cells memory,T cells CD8,T cells CD4 memory activated,T cells regulatory(Tregs),T cells gamma delta,NK cells activated,Macrophages M0, Macrophages M2,Dendritic cells activated,Neutrophils) have a clear correlation between infiltration level and risk score(Figure 11).