Correlation between metabolic characteristics and breast cancer subtypes and prognosis

(BRCA) between In this study, we ran dimension reduction for The Cancer Genome Atlas (TCGA) training set from 943 metabolic genes to 100 principle components (PCs) by principle component analysis (PCA). The Pearson correlations were calculated between PCs and clinical information. The least absolute shrinkage and selection operator (LASSO) Cox regression was used to sort out prognostic PCs. The top-loading genes of PCs validated successfully in TCGA validation set were further used for gene enrichment analysis and the abundance of immune cells between high- and low-PC BRCA patients was compared. Finally, the key genes and prognostic PCs in our result were validated by using Joel dataset.


Conclusions
Our study revealed that the metabolic PCs were related to subtypes and prognosis of patients. The key pathways and genes of these PCs were potential BRCA biomarkers, which may provide a new direction for the future treatment of BRCA.

Background
Breast cancer (BRCA) is the most common cancer in women around the world [1]. Although the mortality of BRCA has decreased, BRCA is still a leading cause of cancer death in females aged 20-59 years [2]- [4]. During malignant transformation of BRCA, metabolic reprogramming was the main characteristic and was different among BRCA subtypes [5]- [8]. Recent studies also revealed that abnormal metabolism was closely associated with dysfunctional immunity in various cancer types[6] [9]. However, abnormal metabolism of BRCA in large samples has not been reported yet. In addition, the relationship is also largely unknown between metabolic features and the prognosis of BRCA.
Here, we analyzed 1039 BRCA samples in The Cancer Genome Atlas (TCGA) by principal component analysis (PCA). We identi ed metabolic PCs that were correlated with histological type, PAM50 subtypes, ER status, PR status, HER2 status and prognosis of BRCA. Moreover, we sorted out the hub genes of these PCs, which could be biomarkers for prognosis and treatment of BRCA.

Acquisition of information from patients with BRCA
We obtained TCGA RNA sequencing data (expected count of genes) and relevant clinical information of BRCA patients from the UCSC Xena [10]. To reduce statistical bias in this analysis, BRCA patients without overall survival (OS) or age information were excluded.
The microarray data from Joel's research [11] was downloaded from GEO (GEO data set GSE10886) for additional validation. Because the follow-up time of patients in the Joel dataset was relatively short (max 118 days), we chose Relapse Free Survival (RFS) instead of OS as a measure of patient's survival. BRCA patients without RFS or age information were excluded. PAM50 subtypes in Joel dataset were predicted according to the codes provided in their research [11].

Selection of metabolic-related gene sets
Four metabolic-related gene sets were downloaded from the "Molecular Signatures Database" module of the GSEA website [12]. The gene sets were as follows: GOBP carbohydrate derivative biosynthetic process, GOBP carbohydrate derivative catabolic process, GOCC lipid droplet, and KEGG fatty acid metabolism (Table S1). A total of 943 metabolic genes in TCGA dataset were used for further analysis.

Preprocess for expression data of TCGA dataset and Joel dataset
The TCGA dataset of 943 metabolic genes was randomly divided into training set (75%) and validation set (25%). The expression matrix of the training and validation sets was normalized by counts per million (CPM). Then normalized data was used to run dimension reduction in the training set to 100 PCs by principal component analysis (PCA) using the "prcomp" function in R [13].
For some genes which were not included in Joel dataset but in the 943 metabolic genes, we all replaced them with zero in Joel dataset. Then Joel dataset was scaled by library size normalization and per gene median normalization. The loadings of 100 PCs from training set were used to predict PCs for validation set and Joel dataset.
Correlation between PCs and subtypes in BRCA Finally, the Pearson Correlation Coe cients between PCs and clinical information of BRCA patients were calculated in all datasets. In this process, all categorical variables of clinical information were assigned different levels and converted to the one-hot encoding format. In ltrating ductal carcinoma, in ltrating lobular carcinoma, and mucinous carcinoma were regarded as level 1 to level 3, respectively. Estrogen receptor (ER)/ progesterone receptor (PR)-positive and HER2-negative were considered as level 1 while ER/PR-negative and HER2-positive were level 2. Patients with indeterminate ER/PR status or equivocal HER2 status were ltered out. For classi cation of the prediction analysis of microarray 50 (PAM50), Luminal A (LumA), Basal, Luminal B (LumB), HER2, and Normal subtypes were regarded as level 1 to level 5, respectively. The levels of these factors above were determined by the results of multivariate Cox regression in the BRCA training set. Then PCs with signi cant correlation (cor > 0.2 or cor <-0.2) in both training set and validation set were sorted out. Analysis of Variance (ANOVA) was used to determine the signi cance of the different BRCA subtypes for these PCs.

Comprehensive prognostic model based on the prognostic metabolic PCs
The 100 PCs and clinical variables in the training set were rstly normalized by Z-score normalization. LASSO Cox regression was used to eliminate variables unrelated to the survival of BRCA using the R package glmnet [14]. Age, histological type, and 13 prognostic PCs were sorted out to establish the comprehensive risk model by multivariate Cox regression. The training set was utilized to construct a prognostic model while validation set and Joel dataset were applied to validate this established model.
The following formula was used to calculate the risk scores: Risk score = coef (PC3) x PC3 + coef (PC4) x PC4 + … + coef (PCn) x PCn + coef (Age) x Age + coef (histological type) x histological type, where coef indicated the coefficient and PCn indicated the PC correlated with survival. According to the median risk scores, patients were divided into low-and high-risk groups. For Joel dataset, "coef (histological type) x histological type" was considered to be zero because of the missing of histological type.

Kaplan-Meier analyses between two groups
Patients were divided into two groups according to different criteria, such as risk score of the prognostic model, CPM (Counts Per Million) of gene expression, and values predicted by PCs. R package survival [15] and survminer [16] were used for Kaplan-Meier analyses in this study. The p values were calculated by log-rank test.
Forest plot R package survminer [16] was used to draw the forest plot for the comprehensive 13-PCs Cox proportional risk model.

Functional analysis
Generally, genes carried different loadings when they were used for predicting PCs. For each PC, KEGG enrichment analysis and Gene Ontology (GO) enrichment analysis were performed for both positive and negative top 20 loading genes with the R package GDCRNATools [17], KEGG website [18] as a supplementary. P <0.01 indicated that the pathway was significantly enriched. The top two signi cant pathways were shown in Venn Diagrams. Only the uppermost pathway was retained if the genes in the two pathways were too similar. If no signi cant pathway was enriched, top 30 loading genes were involved in the functional analysis.
Exploration of the prognostic PCs in the immune microenvironment The ImmuCellAI [19] was used to estimate cell enrichment score. Wilcoxon rank-sum test was used to analyze the difference of cell enrichment score between the high-and low-PC groups. We also tested whether the gene expression was signi cantly different between the groups of high-and low-Tr1 cell abundance.
Comparing the expression of the metabolic genes among the different subtypes of BRCA ANOVA was used to test whether the gene expression was signi cantly different among the different subtypes of BRCA.

Results
Principal component analysis (PCA) of the BRCA expression data of metabolic genes The entire work ow was shown in Figure S1. TCGA BRCA expression data (TCGA dataset) was downloaded from UCSC Xena and separated into training set (75%) and validation set (25%). PCA was run in the training set using 943 genes related to carbohydrate metabolism and lipid metabolism. Gene loadings (Table S2) of the top 100 principal components (PCs) in the training set were used to predict PCs for the validation set and Joel dataset. To con rm that PCs could represent the difference among different samples, the rst three PCs were used to display the three-dimensional distance. Expectedly, different PAM50 subtypes were separated ( Figure 1A-B), indicating the heterogeneity and biological signi cance of metabolic PCs among BRCA patients.

Metabolic PCs correlated with clinical information of BRCA patients
The Pearson Correlation Coe cients were calculated between the clinical information and PCs. Most PCs were associated with histological type, PAM50 subtype, PR status, ER status, and HER2 status in both the training and validation set while no PCs were correlated with tumor stage, age, or tumor metastasis (Table S3). PC1 and PC12 were correlated with histological type, with p-value less than 2.2e-16 and 4.1e-6, respectively ( Figure 2A). The Z-score of in ltrating lobular, in ltrating ductal and mucinous subtype increased in order in PC1. On the other hand, the Z-scores of in ltrating lobular and mucinous subtypes were similar but higher than in ltrating ductal subtype in PC12.
We divided patients in Joel dataset into PAM50 subtypes according to Joel's method. Then Joel dataset was used for validating the correlation between PCs and PAM50 subtypes. PC1 (p < 2.2e-16), PC2 (p < 2.2-16), and PC4 (p < 2.2-16) were signi cantly different ( Figure 2B-C) among different PAM50 subtypes in both TCGA dataset and Joel dataset. In PC1, the normal subtype was the lowest while LumB subtype was the highest. More importantly, LumB and LumA subtypes could also be distinguished ( Figure 2B-C). In PC2, the basal subtype was the largest, followed by HER2 and normal subtypes, and the two luminal subtypes had the least PC2 ( Figure 2B-C). In addition, PR-positive and negative patients could be distinguished by PC2 ( Figure 2D, p < 2.2e-16), which was capable of separating ER-positive and negative patients (p < 2.2e-16, Figure 2E). In PC4, there was a little difference between TCGA dataset and Joel dataset. The Z-score of normal subtype was not the highest in TCGA dataset ( Figure 2B) but was the highest in Joel dataset. The Z-score of basal subtype was the highest in TCGA dataset and the second highest in Joel dataset ( Figure 2C). In both two datasets, HER2 was the lowest (Figure 2B-C). Correspondingly, the Z-score in PC4 was lower in HER2-positive patients compared with the counterparts (p = 1.9e-07, Figure 2F). Conversely, HER2-positive patients showed higher Z-scores in PC6 (p < 2.2e-16) and higher PC78 (p = 3.3e-05) ( Figure 2F).

Metabolic PCs correlated with survival of BRCA patients
Based on the top 100 PCs and clinical information in the training set, the univariate least absolute shrinkage and selection operator (LASSO) model was built. 13 prognostic PCs were sorted out and used to build a multivariate Cox regression model. The Hazard ratios and p values of 13 PCs in multivariate Cox regression were shown in Figure 3A. Based on this prognostic model, prognostic risk scores were calculated for patients in the training and validation sets. Then the patients were divided into high-and low-risk groups according to the median of the prognostic risk scores in each dataset. A signi cant difference in survival rates was found between high-and low-risk patients in both training ( Figure 3B, P < 0.0001) and validation sets ( Figure 3C, p = 0.003).
The multivariate Cox regression and forest plot were conducted in the validation set using the PCs in the prognostic model. The p values were signi cant in PC7, PC13, and PC29, with p =0.0119, 0.0012, and 0.0056, respectively ( Figure S2). We further divided the patients into high-and low-PC groups and investigated whether the three PCs were independently correlated with the survival of BRCA patients. The results showed that only PC13 was signi cant in survival analysis in both training set (p = 3e-04, Figure  S3C) and validation set (p = 0.0063, Figure S3D).

GO and KEGG enrichment for PCs
To depict the biological characteristics of these PCs, the top 20 positive-and 20 negative-loading genes in these PCs were used for KEGG enrichment and GO enrichment. PC7, PC13, and PC29, which were signi cant in both training and validation sets by multivariate Cox regression ( Figure 3A, Figure S2), were considered as prognostic PCs and used to run enrichment analysis. All enrichment results were shown in Table S4. For PCs without signi cantly enriched KEGG pathway (FDR < 0.01), such as PC4, PC6, PC13, PC14, and PC29, top 30 positive-and 30 negative-loading genes were used for enrichment.
The enrichment of positive loading genes of PC1 included the PPAR signaling pathway, regulation of lipolysis in adipocytes, lipid droplet, and lipid catabolic process ( Figure 4A, Table S4). On one hand, mucinous subtype and LumB subtype patients had the high activity of these metabolic features with higher Z-scores in PC1. On the other hand, pyrimidine metabolism, carbohydrate derivative catabolic process, fructose, and mannose metabolism, enriched by the negative loading genes, were up-regulated in in ltrating lobular and normal subtypes, both of which had low PC1 values ( Figure 4B, Table S4).
According to the enrichment of genes in PC12, chemokine signaling pathway, other types of O-glycan biosynthesis, and cAMP metabolic process were up-regulated in in ltrating lobular and mucinous subtypes while fatty acid metabolism and ferroptosis were up-regulated in in ltrating ductal subtype ( Figure 2A, Figure 5C-D, Table S4).
Top-loading genes of PC2 were positively enriched in glycosphingolipid biosynthesis and glycosaminoglycan biosynthesis ( Figure 4C). Upregulation of these pathways was a metabolic feature of the basal subtype because of its high PC2 values. Regulation of lipolysis in adipocytes, endocrine resistance pathway, and sphingolipid biosynthetic process were enriched in negative loading genes of PC2 ( Figure 4D Table S4), whose Z-scores were higher in basal subtype (Figure 2B-C), but lower in HER2 positive BRCA ( Figure 2F).
To investigate the effect of top-loading genes of PCs on the survival of BRCA patients, we ran the Kaplan-Meier survival analysis by comparing patients with high and low expression of each top-loading gene. The p values of survival difference (Table S5) between the two groups were calculated by log-rank test. Genes were marked with colors if the p values were less than 0.05. Accordingly, some genes in these PCs were independently associated with a better prognosis of patients, such as CCL19 in PC4, MT3, HYAL3, UQCC3 in PC1, MMP12, RAMP1 in PC2, FUT7, TYMP, and CCR7 in PC4, while B3GNT5 in PC2, MUCL1 in both PC1 and PC4 were associated with a worse prognosis ( Figure 4A-F).
HER2 subtype seemed to be more complicated. PC4, PC6, and PC78 were correlated with HER2-positive BRCA patients ( Figure 2F). PC6, correlated with higher expression of lipid droplet pathway, PPAR signaling pathways, and ferroptosis, was also relevant to lower expression of genes enriched in glycoprotein metabolic process and other types of O−glycan biosynthesis ( Figure 5A-B, Table S4). PC78 was correlated with higher activity of fatty acid metabolism, nucleoside bisphosphate biosynthetic process, and glycoprotein metabolic process, but with lower activity of endocrine resistance, mannose type O-glycan biosynthesis, glycoprotein biosynthetic process, and glycosaminoglycan metabolic process ( Figure 6C-D). Prognostic genes in PC6 and PC78 included MUCL1 and MT3 in glycoprotein metabolic process, ACSL and ALOX15 in ferroptosis, EHHADH in fatty acid metabolism, PAPSS2 in nucleoside bisphosphate biosynthetic process, FUT9 in both glycoprotein biosynthetic process and mannose type Oglycan biosynthesis ( Figure 5A-B, Figure 5C-D). But ADCY8 in PC6 and PC78 was not enriched in any pathway.
PR-positive patients had lower PC2 values ( Figure 2D) while ER-positive patients tended to have lower values in both lower PC2 and PC14 ( Figure 2E). The glycosphingolipid biosynthesis − lacto and neolacto pathway was enriched in both PC2 and PC14 ( Figure 4C, Figure 6B), indicating the important role of this pathway in ER and PR status.
The pathways in PC7, PC13, and PC29 were correlated to the better prognosis of patients. Chemokine signaling pathway, cytidine deaminase activity, and glycoprotein biosynthetic process were enriched in negative loading genes of PC7 ( Figure 7B, Table S4). Glycoprotein metabolic process, positive regulation of innate immune response, and fatty acid elongation were enriched by the negative loading genes of PC13 ( Figure 7D, Table S4). Aminoglycan catabolic process and protein glycosylation were enriched by the positive loading genes of PC29 ( Figure 7F, Table S4).
In contrast, several pathways were correlated with poor survival of patients, such as glycoprotein metabolic process, other types of O−glycan biosynthesis, mucin-type O−glycan biosynthesis and purine metabolism in PC7, retrograde endocannabinoid signaling, membrane lipid metabolic process, glycosaminoglycan biosynthesis-heparan sulfate/heparin, and protein glycosylation in PC13, purine metabolism, other types of O−glycan biosynthesis, and mucopolysaccharide metabolic process in PC29 ( Figure 7A, C, E, Table S4). It was noteworthy that purine metabolism was correlated with poor survival of patients in both PC7 and PC29 ( Figure 7C, E), indicating that purine metabolism may be an important factor in the survival of BRCA patients.
Hub genes correlated to the survival of BRCA patients To know whether the top-loading genes of subtype-related PCs were correlated with the corresponding subtypes, we selected several prognostic genes for validation. B3GNT5, a positive loading gene in PC2, had high expression in the basal subtype ( Figure S4A). RAMP1, a negative loading gene in PC2, was signi cantly upregulated in LumA and LumB ( Figure S4B) while PC2 was lower in LumA and LumB subtypes ( Figure 2B-C). CCL19, negative loading gene of PC4, was not lowly expressed in basal subtype but was lowly expressed in LumB subtype ( Figure S4C) while PC4 was the highest in basal subtype and LumB subtype was the second higher ( Figure 2B-C). The expression of ALOX15 was higher than basal subtype while PC6 was high in HER2 positive patient ( Figure S4D). ADCY1, ADCY5, and BCL2, negative loading genes of PC2, were all signi cantly highly expressed in ER-positive BRCA patients ( Figure S4E-G) while PC2 was low in ER positive patients ( Figure 2E). PAPSS2 in PC78 positive loading genes was highly expressed in HER2-positive patients ( Figure S4I), but EHHADH in PC78 positive loading genes showed no difference between HER2-positive and negative patients ( Figure S4H), while PC78 was high in HER2 positive patients ( Figure 2F). Therefore, the relationship between PCs and clinical information did not mean that the top-loading genes of PCs were related to the clinical information and it could only imply the possible relationship.
Next, to sort out the hub prognostic genes that may have therapeutic potential, we examined the toploading genes in PCs correlated with survival of BRCA patients and enriched in more than one pathway ( Figure 7, Table S3). CCR7 and CCL19, PC7's negative top-loading genes, were correlated with good prognosis and enriched in the CCR7 chemokine axis, chemokine signaling pathway, and glycoprotein biosynthetic process ( Figure 7A). Speci cally, CCL19 was a negative top-loading gene in two survivalrelated PCs, PC7 and PC13 ( Figure 7B, Figure 7D), and was also downregulated in LumB subtype ( Figure   S4B). APOBEC3D and APOBEC3G, PC7's negative loading genes, were enriched in the cytidine catabolic process and correlated with a better prognosis of patients ( Figure 7A). The CCR7 chemokine axis was comprised of CCL21 and CCL19 [20]. The CCR7 chemokine axis played an important role in orchestrating the in ammatory and immunological responses [21] as well as cancer progression [20], [22]- [25]. Since PC7's top negative loading genes that were enriched in the CCR7 chemokine axis were correlated with a worse outcome of BRCA patients, this axis could be a good indicator for the prognosis of BRCA patients. As genes in other pathways might interact with those in the chemokine axis, we sorted out patients with high expression of genes in this axis and investigated whether other top-loading genes in PC7 affected the survival of these patients. As expected, patients had a better prognosis with high expression of CCL21 if other negative loading genes in PC7 were upregulated, such as APOBEC3D, APOBEC3H, or FUT7 ( Figure   S6 A-C, Table S6). Conversely, they would have a worse prognosis if the positive loading genes in PC7 were overexpressed, such as MUCL1, GPC6, or GALNT15 (Figure S6 D-F, Table S6). Similarly, patients with high expression of CCL19 also had a poor prognosis if GPC6 or GALNT15 was highly expressed ( Figure  S6 G-H, Table S6). MUCL1, which was related to the poor survival of patients with BRCA, was the top-loading gene in three prognostic PCs ( Figure 7A, D, and E). MMP12 was associated with both glycoprotein metabolic process and positive regulation of innate immune response, and its upregulation indicated a better prognosis.
ADCY6 and PAPSS2 in PC13 and PC29 were enriched in purine metabolism and they were correlated with poor survival of BRCA patients ( Figure 7C, E). ELOVL2, an important component in PC2, PC7, PC13, PC29, and PC78, seemed to be an indicator for good prognosis in early time but bad prognosis in late time after 3000 to 4000 days ( Figure S5A-B).

Cell abundance analysis for patients with high prognosis-related PC values
Patients were divided into high and low groups according to the three validated prognostic PCs. Cell abundance for patients was calculated and compared between the two groups. Patients with high value in PC7 had less abundance in exhausted T (Tex) (Figure 8A), but showed more abundance in Natural Killer T (NKT) cells, CD8 naïve cells, T helper 17 (Th17) cells, dendritic cells (DC), neutrophil cells, monocyte cells and macrophage cells ( Figure 8B). These patients seemed to have less cytotoxic T cell in ltration, consistent with enrichment of negative top-loading genes of PC7 in the CCR7 chemokine axis. Figure 8C). Interestingly, cell abundance of Tr1 was also higher in patients with high values in PC29 ( Figure 8D), who had worse prognostic outcomes (p=0.068, Figure S7A). The purine metabolism, enriched by the positive top-loading genes in both PC13 and PC29, might play an important role in BRCA patients with high Tr1 cell abundance. Therefore, we compared the expression of purine metabolismrelated genes between high and low Tr1 cell abundance groups. The result showed that ADCY7 and PAPSS2 but not ADCY6 were correlated with cell abundance of Tr1 ( Figure S7C-D), indicating that metabolic-related PCs and genes were correlated with immunity environment and survival of BRCA patients.

Validation for hub genes and prognostic model by using Joel dataset
Hub genes included in both TCGA dataset and Joel dataset were used for validation. As shown in Figure  9A-E, the Expression of B3GNT5, RAMP1 and BCL2 had signi cant difference between different PAM50 subtype in Joel dataset (p < 2.2e−16, p = 2e−04, p = 8e−05, respectively). The median expression of PAPSS2 and ALOX15 were high in HER2 subtype although the p values of difference between PAM50 subtypes were not signi cant (p=0.13 and p=0.098 respectively).
For the reason of short follow-up time, we used RFS instead of OS for validation in Joel dataset. Although the p values were not all signi cant, we could still get the trends (Figure 9 F-L).

Discussion
Abnormal metabolism was an essential hallmark of breast cancer [6], [26], [27], whose genomic characteristics of different histological types had been documented [28], [29]. But the relationship between abnormal metabolism and survival of BRCA patients was still unclear. The metabolic difference among different histological types or PAM50 subtypes was also scarcely reported in BRCA. In our study, several metabolic PCs correlated with subtypes and prognosis of patients were sorted out.
Of note, although the basal-like subtype of BRCA had a poor prognosis than LumB subtype at 5-years of follow-up, lumB subtype had worse outcome at 10-years [30]. We found that PPAR signaling pathway, regulation of lipolysis in adipocytes, lipid droplet, and lipid catabolic process were highly expressed in LumB subtype. In addition, the expression of CCL19 was lower in LumB subtype compared with other subtypes of BRCA. It had been reported that regulation of lipolysis in adipocytes promoted metabolic remodeling and stimulated the invasiveness of BRCA [31]. PPAR signaling pathway was down-regulated in BRCA compared with normal tissue [32] [12]. BRCA patients with low CCL19 expression tended to have a worse prognosis [36]. These ndings were consistent with our results.
The basal-like subtype, which accounted for 80% of triple-negative breast cancer [33], was characterized by high expression of proliferation-related genes, keratins typically expressed in skin, moderate expression of HER2-related genes, and low expression of luminal-related genes [33], [34]. The basal subtype had a higher PC2 value, thus, the corresponding cohort had the following metabolic features: lower activity in the regulation of lipolysis in adipocyte, endocrine resistance, and sphingolipid biosynthetic process; higher activity in glycosphingolipid biosynthesis − lacto and neolacto series and glycosaminoglycan biosynthesis − keratan sulfate, glycoprotein metabolic process, and protein glycosylation. On the contrary, these metabolic features were unremarkable in ER-positive, PR-positive, LumA, and LumB patients. Among these pathways, glycosphingolipid biosynthesis − lacto and neolacto series and glycosaminoglycan synthesis were reported to be correlated with LumB and basal-like subtype [35]. We found that B3GNT5 and RAMP1 were potential key prognostic genes for the basal-like subtype. B3GNT5, highly expressed in the basal subtype, was associated with a poor prognosis whereas the lowly expressed gene RAMP1 was contrarily associated.
It was reported that basal-like subtype only contained a small number of hormone receptor-positive BRCA patients while LumA and LumB both accounted for a large number [33], which could explain why PC2 values were high in basal subtype but low in ER/PR, LumA, and LumB subtype in our result. Notably, endocrine resistance was enriched by the PC2 negative loading genes, such as ADCY1, ADCY5, and BCL2, which indicated that endocrine resistance occurred in ER-or PR-positive patients who had received endocrine therapy.
HER2-positive subtype represented approximately 15-20% in BRCA and was characterized by ampli cation or overexpression of ERBB2/neu [36]. With the advent of trastuzumab, the prognosis of HER2-positive BRCA patients had considerably improved [37]. Here, we found that HER2-positive patients had higher activity in lipid droplet pathway, PPAR signaling pathways, fatty acid metabolism, nucleoside bisphosphate biosynthetic process, gluconeogenesis pathway, and low activity of endocrine resistance, and other types of O − glycan biosynthesis. These unreported metabolic pathways might contribute to the further biological study of HER2-positive BRCA. Notably, PAPSS2 in nucleoside bisphosphate biosynthetic was a prognostic gene and hub metabolic gene exclusively expressed in the HER2-positive subtype. It was correlated with metastasis of BRCA by sulfation pathway [41].
Basal-like subtype was reported to be more susceptible to ferroptosis while HER2-positive and ER-positive BRCA were resistant to ferroptosis [38]. We found that ACSL1 and ALOX15 in the ferroptosis pathway were important metabolic characteristic genes in HER2-positive and in ltrating ductal subtypes, but correlated with poor survival of BRCA patients. ACSL1 was associated with tumor growth by regulation of GM-CSF [38].
Prognostic PCs and genes were sorted out to reveal novel biomarkers or therapeutic targets of BRCA. PC13 and PC29 were correlated with poor survival of BRCA patients and high Tr1 cell abundance. Purine metabolism was enriched in both PC13 and PC29. In this pathway, ADCY6 and PAPSS2 were indicators for a poor prognosis while ADCY7 and PAPSS2 were correlated with cell abundance of Tr1. This result was consistent with previous studies. Tr1 was correlated with the impaired immune response in hepatocellular carcinoma [39]. ADCY6 was associated with a worse prognosis involving the DNA methylation-regulated immune processes in luminal-like BRCA [39]. PAPSS2 was correlated with BRCA metastasis in the sulfation pathway [40] while ADCY7 expression was strongly associated with immune cell in ltration [41]. All these results indicated the relationship among purine metabolism pathway, Tr1 abundance, and prognosis of BRCA, which could provide a new therapy direction for BRCA.
Using the genes in the chemokine signaling pathway as biomarkers of BRCA cancer, other PC7's toploading genes, such as APOBEC3D, APOBEC3H, FUT7, MUCL1, GPC6, and GALNT15, should be comprehensively considered. APOBEC3A was a major cytidine deaminase in breast cancer cells and was possibly a prominent contributor to the APOBEC mutation signature [42], which was promoted by the immune microenvironment [43]. APOBEC3G in the cytidine catabolic process was related to human CD4 Recently, it was reported that ELOVL2 was a novel tumor suppressor attenuating tamoxifen resistance in BRCA. Down-regulation of ELOVL2 induced reprogramming of lipid metabolism and contributed to the malignant phenotypes [39]. Consistent with their results, we found that ELOVL2 was a negative toploading gene in three prognostic PCs and that it was an indicator for good prognosis in early survival time but bad prognosis in long term. This indicated that ELOVL2 was a potentially sensitive indicator.
In summary, our research had identi ed metabolic PCs which were related to subtypes and prognosis of BRCA. By analyzing these PCs, we discovered metabolic pathways and hub genes that may provide new biomarkers for the guidance of BRCA treatment in the future. The mechanism that impacted on survival and expansion of BRCA should be further clari ed. Our research had certain shortcomings and limitations. Firstly, external validation with other clinical datasets would be bene cial. Secondly, the biological signi cance of PCs had not been fully elucidated. Last but not least, more pathways should be included so that we could reveal the whole picture of the interaction between metabolic PCs and other PCs.

Conclusions
Based on 1039 TCGA samples, our study revealed several metabolic PCs which were related to subtypes and prognosis of BRCA patients. The key pathways and genes within these PCs were described in detail, containing potential biomarkers that were worth studying in the future. These results also provided a new direction for the treatment of different subtypes of BRCA. Availability of data and materials All datasets used are publicly available and referenced in the methods section. Y.L. conducted datasets searches, model discovery and validation analyses. L.Y. and H.L. provided expertise on BRCA. Y.L., Z.C. and L.Y. wrote the paper. All authors read and approved the nal manuscript.   subtype in TCGA dataset. P value was calculated from ANOVA.
(C) Boxplots showed the Z-score difference of PC1, PC2 and PC4 among patients with different PAM50 subtype in Joel dataset. P value was calculated from ANOVA. (D) Boxplots showed the Z-score difference of PC2 among patients with different PR status in TCGA dataset. P value was calculated from ANOVA.
(E) Boxplots showed the Z-score difference of PC2 and P14 among patients with different ER status in TCGA dataset. P value was calculated from ANOVA.
(F) Boxplots showed the Z-score difference of PC4, PC6 and PC78 among patients with different HER2 status in TCGA dataset. P value was calculated from ANOVA.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.