Comprehensive multi-omics analysis identifies an immune related predicting model in early stage breast cancer

Breast cancer is known the highest incidence cancer in women. Tumor-infiltrating immune cells were reported closely related to cancers’ fate but it’s function still not very clear in breast cancer. The aim of our study is to build a infiltrating immune cells based nomogram model to better predict patients survival and explore its relationship with immune features. We first use CIBERSORT to analyze 22 immune cells’ status in two unrelated breast cancer cohorts (TCGA and METRABRIC). The univariate and multivariate Cox analyses were used to analyze the prognostic ability of immune cells and other clinicopathological factors. Nomogram model were built by using independent prognostic factors. Different immune signatures and gene enrichment analysis were performed in different nomogram risk groups. Multivariate cox analysis showed that Macrophages M2 with HR of 1.733 (95% CI: 1.013-2.966) in TCGA cohort and 1.334 (95% CI: 1.125-1.581) in METABRIC cohort is the only independent prognostic factor among the 22 immune cells in early stage breast cancer. Macrophages M2, age, TNM stage and molecular types were used to build the nomogram model. The AUC of the ROC of nomogram reached 0.732 in TCGA cohort and 0.702 in METABRIC cohort. Using the nomogram score can better classified patients to low and high risk group. High risk group showed higher malignant signatures and predicted immunotherapy response rates than low risk group which consistent with the gene entrenchment analysis. In study, we revealed that M2 macrophages could predict the OS of breast cancer patients. Based on M2 and other clinical features we established a nomogram model which were significantly different in immune features that can better assess the OS risk or be a predictor for the immunotherapy response in breast cancer for further research.


Background
Breast cancer is known the highest incidence cancer in women. Tumor-infiltrating immune cells were reported closely related to cancers' fate but it's function still not very clear in breast cancer. The aim of our study is to build a infiltrating immune cells based nomogram model to better predict patients survival and explore its relationship with immune features.

Methods
We first use CIBERSORT to analyze 22

Conclusion
In this study, we revealed that M2 macrophages could predict the OS of breast cancer patients. Based on M2 and other clinical features we established a nomogram model which were significantly different in immune features that can better assess the OS risk or be a predictor for the immunotherapy response in breast cancer for further research. Background 3 Breast cancer has the highest incidence rate and second highest mortality rate among female in US [1,2]. The overall 5-year survival rate are better for Stage I and II with 98% and 92% respectively.
While the survival rates are decreased to 75% in stage III and 27% for stage IV. The treatment pattern choice were different in different TNM stage patients as in stage I and II breast-conserving surgery plus radiotherapy account for the highest proportion, while in stage III and IV chemotherapy plus radiotherapy were important in America patients [2]. Breast cancer's outcome improved in recent years contributed to the advents of more and more treatment especially the chemotherapy. In most cases, immunotherapy included in the chemotherapy, but their effect still controversy in breast cancer [3,4]. Found the immune condition of the cancer and better classify the breast cancer patients into different risk stages can not only help manage patients better but also help found some patients that may benefit most from immunotherapies.
In breast cancer, tumor-infiltrating lymphocytes (TILs) are the most studied immune cells and it has been shown to be potentially predict the prognosis in specific subtypes of breast cancer especially in HER2 positive and triple negative patients. Large retrospective studies have shown that higher levels of TILs can increase patients' response to neoadjuvant treatment and benefits more from adjuvant treatment. What's more, higher TILs in breast cancer tissues are associated with higher overall survival (OS) and fewer recurrences rates regardless of therapy [5][6][7][8]. Low quantity of TILs in breast cancers were identified as a reason for little efficacy in treatments against PD-1 and PD-L1[9].
However, higher TILs do not always predict good response to treatment or survival, that may relate to the various other immune cells in tumor microenvironment (TME). Besides TILs, tumor-associated macrophages (TAMs) were also ubiquity in cancer. In breast cancer patients, the TAMs have been found associated with different outcomes of treatment response or survival mostly because TAMs contained three subtypes that are M0, M1 and M2, those different subtypes have different inflammation and immunologic functions. The interconversion of them can result in different fate in breast cancer [10][11][12]. Other immune cells like NK cells, mast cells and several immune genes have also been found related to breast cancer outcomes [13][14][15][16]. As more and more immunotherapy being invented, there is growing interest in developing the immunotherapy with other therapies such as molecularly targeted therapies in breast cancer [9]. Therefore, better estimate the status of immune cells in the TME and probe the distribution and function of various immune cells are important in cancers that may improve the efficacy of immunotherapy or found new treatment methods.
CIBERSORT, the abbreviation of "Cell type Identifcation By Estimating Relative Subsets Of RNA Transcripts", is a gene-base algorithm using the method of deconvolution to transform the signature matrix of 547 marker genes into 22 infiltration immune cell types' fraction in tumor tissues that has been successfully validated in some cancers like breast, lung and liver cancers [17][18][19][20]. The Cancer Genome Atlas (TCGA) database is an America project and METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) database is a Canada-UK project [21,22]. Both two databases contents large amount of transcriptome data and correspondent clinicopathologic information of breast cancer and make researchers more convenient in implementing researches. By using the information from TCGA and METABRIC database, the CIBERSORT can count the infiltration immune cells of each breast cancer patients. The study of us aimed to explore the infiltrating immune cells status in early stage breast cancer and their relationship with clinical information and survival outcomes. A nomogram model will be built by using immune cells and clinical factors to better distinguish patients into good and bad risk groups. Furthermore, we also want to explore the relationship of nomogram risk and immune features and found some hub genes and possible network that may related with patients' risk and immunotherapy response.

Data acquisition
The RNA-seq data of TCGA breast cancer and normal control were downloaded from the TCGA data portal (https://tcga-data.nci.nih. gov/tcga/), and were normalized by the "limma" R package. The corresponding survival and clinicopathological information of TCGA cohort were also downloaded from the TCGA data portal. Normalized gene expression data, clinical data and overall survival data of breast cancer patients in METABRIC cohort were downloaded from cBioPortal website (https://www.cbioportal.org/). We excluded patients in both TCGA and METABRIC cohorts with primary metastasis and those with information such as estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2(HER2), lymph node status and tumor size missing and the patients with overall survival (OS) follow-up time less than 30 days were also excluded in the clinical associated analysis.

Tumor infiltration immune cells (TIICs) count
The subpopulations of TIICs were identified by CIBERSORT algorithm. With the help of gene matrix The immune signatures such as SNV Neoantigens, Aneuploidy score BCR Shannon and TCR Shannon of each TCGA breast cancer patients got from the study of Thorsson[23]. To explore the relationship between the nomogram risk groups and the immunotherapeutic response, a web named Tumor Immune Dysfunction and Exclusion (TIDE) (http://tide.dfci.harva rd.edu) was used to predict the response of each patients to anti-PD1 or anti-CTLA4 immunotherapeutic based on the transcriptome information of the TCGA cohort [24].
Identification of hub genes and transcription factor-immune gene networks 6 The selection of different expression genes (DEGs) between nomogram high risk group and low risk group patients in TCGA cohort were used edgeR package in R software. The cutoff values of false discovery rate (FDR) < 0.05 and a log2 |fold change| > 1 were used. Differentially expressed genes were then use Online analysis tool STRING (http://string-db.org) and Cytoscape software to found the hub genes. By the help of package CytoHubba in Cytoscape, five different algorithm (Maximal Clique Density of Maximum Neighborhood Component (DMNC) and Degree) were used and the top20 important hub genes in each algorithm were selected. The overlapped hub genes from these 5 algorithm methods were showed by Venn diagram. The immune genes and transcription factor genes were got from the web IMMPORT(https://www.immport.org/home) and Cistrome Project (http://www.cistrome.org/) respectively and the network were built by Cytoscape. The gene ontology (GO) of Biological processes enrichment analysis were used the analysis function of STRING.

Statistical analyses
Overall survival (OS) was defined as the time of pathologically confirmed disease until death for any reason or the last follow-up. T test or one-way ANOVA was used to compare the immune signatures in different nomogram risk groups. The analyses were performed in SPSS 25.0 or R software 3.6.1, and two-tailed p values < 0.05 were considered significant.

Survival prognosis analyze of TIICs and clinicopathology factors
We first investigated the fractions of tumor infiltrated immune cells (TIICs) between tumor and nontumorous tissues in TCGA cohort and METABRIC cohort. After the exclusion of clinicopathology data missing cases, a total of 602 patients diagnosed with breast cancer were registered in the TCGA cohort as well as 1082 breast cancer patients from METABRIC cohort were included in our study for analysis. The clinical information of both two cohorts are listed in Table 1. To assess the predictive value of TIICs in breast cancer, we examined patient overall survival according to different infiltrated immune cells in TCGA and METABRIC cohort. In TCGA cohort, as shown in Table 2, univariate Cox analysis showed that the Macrophages M2 and Monocyte were act as risk factors with p value lower than 0.05, while other factors showed no significant difference in univariate Cox analysis (Macrophages M2(HR 1.771, 95% CI: 1.155-2.717); Monocyte (HR 1.602, 95% CI: 1.038-2.472) ).
Kaplan-Meier survival analyses showed that high fraction of Macrophages M2 and Monocyte cells were significantly associated with worse OS in breast cancer ( Fig. 1A and B). Univariate Cox analysis also showed that age, TNM type and molecular type of patients were associated with OS (

Construction of nomogram model based on Macrophages M2
As Macrophages M2 performed as an independent prognostic factor in breast cancer, we explored its potential to improve the prognostic accuracy with other clinicopathological prognostic factors by building a nomogram predictive model using the TCGA cohort and verified by METABRIC cohort ( Fig. 2A). After incorporating with Macrophages M2, the nomogram's accuracy to predict OS of breast cancer patients were improved as the AUC of the ROC curve reached to 0.732 (Fig. 2B, table 3) and the C-index reached to 0.751 (Table 3). The calibration plot indicated that the nomogram showed good performance for OS (Fig. 2C). By using the ROC curve and best Youden index, we found the best cut-off of the risk is 112.5. Patients with total points lower than 112.5 were included in low risk group and those with higher than 112.5 were included in high risk group. The METABRIC cohort were used as validation cohort and the accuracy to predict OS of breast cancer patients were also improved by using the nomogram model. The AUC of ROC curve was 0.702 (Fig. 2D, Table 3) and the C-index was 0.715 (Table 3) and the calibration plot also demonstrate good consistency of prediction with reality ( Fig. 2E). The low and high risk group showed significant difference survival in breast cancer patients ( Fig. 3A (TCGA), Fig. 3B (METABRIC)), and it can further distinguish each TNM stage patients into a low or high risk level which were proved in both TCGA (Fig. 3(C, E and G)) and METABRIC cohort (Fig. 3(D, F and H)).

Nomogram model shows significant association with immune features
We compared the TCGA nomogram low and high risk groups according to immune subtypes and other dominant sample characteristics. The proportion of Lymphocytes were higher in low risk group while the proportion of Macrophages were higher in high risk group with p value less than 0.05. Other immune subtype showed no difference between the low and high risk groups (Fig. 4A). High risk group showed higher tumor mutation burden (TMB) (Fig. 4B, p = 0.016) and SNV Neoantigens (Fig. 4C, p = 0.023). The aneuploidy score which is an indicator for DNA damage were significantly higher in high risk group (Fig. 4D, p < 0.0001). T cell receptor (TCR) and B cell receptor (BCR) analysis showed that low risk group had higher TCR Shannon score than high risk group while the BCR Shannon score showed no difference between high and low risk groups (BCR Shannon: Fig. 4E, P = 0.486; TCR Shannon: Fig. 4F, p = 0.024). Because the low and high risk group were highly associated with the immune signatures, we then tested whether they could predict the immunotherapeutic response. By using the TIDE web program, we found that patients in high risk group had a lower TIDE score ( Fig. 4G, p = 0.001) and higher predict clinical response to immune checkpoint blockade than low risk group patients (Fig. 4H, p = 0.002).
Identification of the hub genes and transcription factor-immune gene networks To explore the differential expressed genes (DEGs) within nomogram high and low risk groups in breast cancer, we examined the transcriptional microarray in the 602 patients from TCGA cohort.
Based on risk level, 310 DEGs were found with 192 genes up-regulated and 118 genes downregulated in the nomogram high risk group than the low risk group (Fig. 5A). We then choice the top20 hub genes by using five methods in Cytohubba module by using the 310 DEGs (Supplementary Table 1) and after overlapping analysis, twenty genes were existed in at least 3 methods as shown in Venn picture in Fig. 5B. The expression of the 20 hub genes of each patients were shown in Fig. 5C and fourteen out of the twenty hub genes were immune related genes as framed in red line. We then analyzed the transcription factor and immune genes regulatory network in low and high risk group separately. The survival related transcription factors and immune genes with correlation p value less than 0.05 were selected to build the network. In low risk group, 3 transcription factors BCL11A, AFF4 and POLR3D were positive regulate the protective immune gene MAPK3 and negative regulate 3 hazard immune genes MC4R, IL23R and SFTPD (Fig. 5D). In high risk group, two transcription factors TAT and PPARD with statistical significance in survival were oppositely regulate some hazard immune genes such as CTLA4 and protective immune genes such as ITGAV respectively (Fig. 5E). Biological processes enrichment analysis with gene ontology (GO) in high risk group are showed in Fig. 5F. The most significant biological process was the immune response, and others such as response to external stimulus and cell population proliferation were also significantly enriched in high risk group.

Discussion
Though breast cancer was not though to be immunogenic cancer types in previous compared with other cancer types, the role of specific immune cells in clinical and prognostic of breast cancer still in controversial. In the present research, we used two different breast cancer cohort TCGA and METABRIC with confirmed clinicopathology and RNA-seq expression data to analysis the immune cells expression pattern and their relationship with overall survival. The density of 22 main TIICs in breast cancer were evaluated by CIBERSORT algorithm and we found that though some immune cells shows prognostic ability in TCGA and METABRIC cohorts, only macrophage M2 were testified as an independent prognostic factor by multivariate Cox analysis in both two cohort. Based on the independent prognostic factors including M2 macrophages we build a nomogram model to further improve the prognostic ability in breast cancer. By using the nomogram risk, we can best separate patients into high and low risk groups in all patients as well as in patients in different TNM stages. The nomogram low and high risk groups showed significant difference in some immune features and the predicted clinical response to immune checkpoint blockade were higher in high risk group patients.
The TME comprised by various cells and the predominant cells recruited to and activated in it are immune cells [25,26]. Immune cells in TME are closely related to tumor processes such as growth, angiogenesis and metastasis [27][28][29]. The tumor progression may be effected by imbalanced host immune response, and inverse, the quantity and phenotype of immune cells can be also effected by tumor cells [30,31]. Therefore, testing the immune status and found the possible way to change the infiltrated immune cells type in the TME may help us dope out strategies to improve the response rate of immunotherapy. Tumor infiltrating lymphocytes (TILs) account for most of immune cells in TME which included T cells, B cells and NK cells are highly heterogeneous in tumors and in some cases showed opposite functions and effects on survival [32]. In TCGA cohort, those TILs showed no significant difference in overall survival, while in METABRIC cohort the T cells CD4 memory activated,

T cells regulatory, NK cells resting and another TILs T cells gamma delta showed prognostic ability.
The T cells, which reported to produce some regulatory cytokines such as TGF-β and IL-10 and can be recruited to the cancer environment via CCL22/CCR4, were able to escape from immune surveillance, result in tumor immune tolerance and ultimately reduced tumor survival [33][34][35]. TILs play different functions in breast cancer and more research should be done in the future to help us take advantage of it.
Macrophages are also important in infiltration immune cells in cancer and it can be divided into M0, M1 and M2 subtypes. M0 subtype, which is inactivated and not have inflammatory or tumor-related function, can through different activation pathways polarized to two distinctly immunoregulatory imparity subtypes M1(pro-inflammatory) and M2(anti-inflammatory). M2 macrophages can secrete cytokines to inhibit inflammation and inhibit the proliferation and differentiation of T cells, it can also promote tumor proliferation and some cancer associated process such as angiogenesis [36][37][38][39][40]. TCGA cohort exhibit that the density fraction of macrophage M0 and M1 were higher in tumor tissues than normal tissues, while M2 were on the opposite. In breast cancer tissues the fraction of M2 were higher than M1, which means in breast cancer the macrophage M0 tend to differentiate towards M2 subtype.
We found that only M2 macrophage neither M0 nor M1 macrophages is an independent risk factor to breast cancer patients. Many studies also report macrophages were associated with shorter survival in breast cancer patients and that may due to high M2 subtype polarization [11,12,41]. Based on the shift of macrophages, several approaches such as eliminate the macrophages, blockade the M2 subtype shift, reprogramming the macrophages shift to M1 subtype and inhibit the recruitment of monocyte into cancer may try to as possible anti-tumor therapies in cancer [42].
We build a nomogram model including the density level of macrophages M2 to better predict the OS in breast cancer. The patients were appropriate separated into high risk group and low risk group according to the nomogram score. We found that the low and high risk group were different in some immune signatures. The content of immune cell subtypes were different between low and high risk groups and as Thorsson demonstrate that the tumor with macrophage dominated especially with higher M2 macrophage content and low lymphocytes infiltrate had poor prognosis [23]. DNA aneuploidy, as reported were related to cell proliferation and poor tumor differentiation [43]. In our study, the high risk group patients had higher Aneuploidy score, proliferation and worse prognosis which consistent with the results from a large meta-analysis that aneuploid breast cancer patients had poor DFS and OS [44]. The immunotherapy is an attempt in breast cancer patients when there is no other way out and the current immunotherapy response were relatively poor[3]. It is very important to seek out the potential molecular mechanism of immunotherapy responsiveness and those who are suitable to receive immunotherapy [45]. Tumor mutation burden (TMB) which was calculated as total count of variants divided by the whole length of exons is a novel biomarker for predicting immunotherapy effect. In small-cell lung cancer patients, high TMB means better treatment prognosis either in single or combined immunotherapy [46]. In breast cancer TMB were associated with prognostic immune subclasses and those patients with high TMB and favorable immune-infiltrate subtypes had better prognosis [47]. Neoantigens are a kind of peptides that are generated from somatic mutations and it's immunotherapy response were still contradict in tumors [48]. In order to release tumor-specific immune responses, the Neoantigens can result T cells recognizing cancer cells through T-cell receptors (TCR) interacted with major histocompatibility complex (MHC) [48,49]. Our high risk group patients had higher Neoantigens but lower TCR score than low risk group patients so to improve specificity and sensitivity of neoantigen-MHC complexes may more helpful in those patients. The TIDE score that can indicate tumor immune escape and predict response to immune checkpoint blockade such as anti-PD1 and anti-CTLA4 is mainly used for melanoma and non-small cell lung cancer, but can still try to applied in other tumors [24]. The high risk group patients had lower TIDE score means they experience less immune evasion and may more likely to benefit from immune checkpoint blockade therapy. From the above, we found that though the high risk group patients had worse prognosis they may benefit more from immunotherapy.
Twenty gene selected by multiple methods were supposed to be hub genes between high risk and low risk groups. Fourteen out of the twenty genes were immune genes and among the 14 immune hub genes, CXCR2 is the most studied one in cancer and immunology. Knockdown . Further study could try to focus on these hub genes to regulate tumor microenvironment and immune status, which may help to found some new therapeutic methods for breast cancer.
transcription factors and immune genes network in low and high risk group showed significant difference. In high risk group, higher transcription factor TAT expression co-upregulated with hazard immune genes and co-downregulated with protective immune genes while the transcription factor PPARD were in adverse. Methods that inhibit the gene expression of TAT or improve the gene expression of PPARD may change the immune gene expression and improve the high risk group patients survival as well. Further GO analysis of these survival related immune genes in high risk group showed that they were enriched in cell population proliferation and immune-relevant biological pathways, such as immune response and response to stimulus, this can partly explain the high risk group had relatively poor survival but may have better immune response than low risk group patients.
Though we have used two different cohort and large patients to get and verity our results, there are still some limitations in our study. The first is data missing, as the immune cells infiltration data were calculated by CIBERSORT algorithm and some patients were excluded because of the p value of predictive accuracy higher than 0.05. The second is the population race imbalance. The TCGA is an America project and the METABRIC is a Canada-UK project, as most of the patients included in our study were white, the results should be further researched in people of other races. The last is the study failed to use experiment to prove the results and exploring the underlying mechanisms of the results. Therefore, future research needs to do to better testify our results.

Conclusions
In conclusion, the study of us demonstrate that higher M2 macrophages infiltration in breast cancer tissues were the poor prognostic factor for OS. The nomogram model incorporated in macrophages

Availability of data and materials
The datasets generated and/or analyzed during the current study are available in the TCGA repository[https://www.cancer.gov/] and METABRIC [https://www.cbioportal.org/].

Competing interests
The authors declare that they have no competing interests