Construction and Validation of an Immune-related Gene Pairs Prognostic Model for Breast Cancer

Background: Immunotherapy plays an increasingly important role in the treatment of advanced female breast cancer, which has the highest mortality rate among malignant tumors. The purpose of this study was to identify immune-related genes associated with breast cancer prognosis as possible targets of immunotherapy, and their related biological processes and signaling pathways. Methods: Clinical data and gene expression proles of patients with breast cancer were extracted from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases and divided into training (n = 1053) and verication (n = 508) groups. CIBERSORT was used to predict differences in immune cell inltration in patient subsets stratied according to risk. Gene Ontology (GO) enrichment analysis was used to identify pathways associated with immune-related genes in patient subsets stratied according to risk. Results: The prognostic model composed of 27 immune-related gene pairs signicantly distinguished between high- and low-risk patients. Univariate and multivariate analyses indicated that the model was an independent prognostic factor for breast cancer. Among the identied genes, APOBEC3G, PLXNB1, and C3AR1 had not been previously studied in breast cancer and warrant further exploration. CCR chemokine receptor binding, regulation of leukocyte-mediated cytotoxicity, T cell migration, T cell receptor complex, and other pathways were signicantly enriched in low-risk patients. M2 and M0 macrophages were more highly expressed in high-risk than in low-risk patients. CD8+ T cells and naïve B cells were more abundant in low-risk than in high-risk patients. Conclusion: The immune-related gene pairs prognostic model developed in the current study can help assess breast cancer prognosis and provides a potential target and research direction for breast cancer immunotherapy in the future. a tumor-associated antigen on the surface of the target cell These results indicate that immune-related pathways may play an important role in the treatment of breast cancer in the future. Consistent with these conclusions, our study determined that upregulation of these pathways was closely associated with low-risk patients. CCR chemokine receptor binding, immunoglobulin complex circulation, and regulation of cell killing pathways not been in in vivo and in vitro studies.

heavier than that of luminal-type tumors 7,8 . Similarly, the proportion of tumor-in ltrating lymphocytes (TILs) in HER2 + tumors and TNBC is much higher than that in HR + tumors, in which the increased proportion of TILs indicates better prognosis 9,10 . In the TRYPHAENA trial, TILs were not associated with the PCR but with improved event-free survival (EFS) 11 . These data indicate the immunogenicity of breast cancer.
Although breast cancer is not considered an in ammatory tumor, the immunogenicity of TNBC and HER2 + tumors is higher than that of luminal A-subtype breast cancer 12 . Immunotherapy has become the rst treatment choice for patients diagnosed with advanced programmed death ligand (PD-L)1+ (TNBC) 13 .
Increasing numbers of studies have investigated immunotherapy treatments for breast cancer.
Traditional immunotherapy for HER2 + breast cancer comprises mainly the HER2-blocking antibody trastuzumab. Although immune checkpoint inhibitors (ICIs) have been successfully used in the treatment of melanoma and lung cancer 14 , the rst effective ICI in the treatment of metastatic TNBC was the PD-L1 antibody atezolizumab combined with nab-paclitaxel 15 . In neoadjuvant therapy, patients treated with atezolizumab had a higher pathological complete response rate (PCR) than controls 16 . However, only PD-L1 was found to predict the e cacy of atezolizumab treatment 13 , suggesting that the role of tumor immunity needs to be better understood to identify additional immune biomarkers and potential therapeutic targets. Single-center or single-ICI studies provide limited clinical and genetic data; more data are needed to provide evidence for combination treatments with ICIs. High-throughput technology and large-scale gene databases can enable the discovery of unique immune-related genes, resulting in more exible immunotherapy treatments for each tumor type.
PD-L1 expression in immunocytes was used to evaluate the e cacy of a combination of atezolizumab plus nab-paclitaxel in the Impassion130 trial 13 . However, another study revealed no clear correlation between PD-L1 expression and clinical e cacy 16 . To date, a predictive biomarker for the e cacy of ICI in breast cancer therapy has not been identi ed. Therefore, we aimed to identify powerful biomarkers for the prediction of immune-checkpoint blockade (ICB) responsiveness using data extracted from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Further, we combined these data with those in the immunology database and analysis portal ImmPort to investigate relevant molecular mechanisms and immune cell relationships.

Materials And Methods:
Obtaining breast cancer gene expression data This was a retrospective study of gene expression and corresponding clinical data based on two independent datasets obtained from publicly available databases In total, data from 1561 patients were analyzed. Expression of 56,737 genes and survival outcome data of 1053 patients were obtained from TCGA (https://portal.gdc.cancer.gov/repository). Gene expression and disease-free survival (DFS) of 508 patients were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE25066).

Construction of immune-related gene pairs prognostic model
In order to construct an immune-related gene prognostic model, 2498 immune-related genes were obtained from the ImmPort database (https://www.immport.org/home) on May 30, 2020. This gene platform includes a list of immunologically relevant genes curated with functions and Gene Ontology terms. The ImmuneRegulation web-based tool identi ed regulators of immune system-speci c genes of interest, and the Immcantation framework analyzed high-throughput adaptive immune receptor repertoire sequencing (AIRR-Seq) datasets characterizing B cell and T cell receptors. In this study, we retained only immune-related genes identi ed in both the GEO and TCGA datasets with a median absolute deviation > 0.5 17 . Relative expression within each immune-related gene pair was compared for each patient in the TCGA dataset. In each pair, if expression of one gene was larger than that of the other, the value of the gene pair was considered 1; otherwise, the value was considered 0. After removing immune-related gene pairs with relatively small variations in expression within the pair (< 20%), least absolute shrinkage and selection operator (Lasso) regression was performed for 1000 simulations, and a prognostic model containing 27 immune-related gene pairs was obtained. The risk value of each patient in the TCGA dataset was calculated using the model. A receiver operating characteristic (ROC) curve was established using the risk values, and an optimal cut-off value was determined to distinguish between low-and highrisk patients.

Veri cation of immune-related gene pairs prognostic model
To further verify the immune-related gene pairs prognostic model, the GEO dataset was used as the validation group. The risk value of each patient in the GEO dataset was calculated using the model, and the cut-off value obtained for the training group was used to stratify the GEO patients into high-and lowrisk groups. Univariate and multivariate Cox proportional hazards analyses were used to verify whether the model could be used as an independent prognostic factor relative to other clinical features such as age, HR, HER2, and American Joint Committee on Cancer (AJCC) stage in the GEO and TCGA datasets.

Immune cell in ltration associated with immune-related gene pairs prognostic model
The CIBERSORT algorithm was used to estimate differences in immune cell in ltration using gene expression data in the high-and low-risk TCGA groups 18 . This algorithm can predict the proportion of 22 types of tumor-in ltrating immune cells, such as T cells, B cells, macrophages, and natural killer cells, using gene expression data.

Enrichment analysis by Gene Ontology (GO)
Enrichment analysis of the identi ed immune-related genes was performed using g:Pro ler 19,20 . All GO gene sets were downloaded from the Gene Set Enrichment Analysis (GSEA) website (https://www.gsea-msigdb.org/gsea/index.jsp), and gene sets in high-and low-risk TCGA groups were compared using the Bioconductor "fgsea" package in R. After 10,000 cycles, signi cant enrichment pathways were obtained and sequenced. Gene sets with statistical signi cance were selected with a false discovery rate (FDR)adjusted P < 0.05.

Statistical analysis
All statistical analyses were performed using R Statistical Software version 3.6.3. Comparison between groups was performed using a t-test. The Kaplan-Meier method was used for survival analysis, and "survival" package in R was used for survival curve analysis. Cox proportional hazards regression analysis was used for univariate and multivariate analyses of OS. The Wilcoxon test was used to compare differences in immune cell in ltration. P < 0.05 was considered statistically signi cant. Statistical differences were recorded as follows: *P < 0.05, **P < 0.01, ***P < 0.001.

Results:
Construction of immune-related gene pairs prognostic model A total of 56,735 genes and 2498 unique immune-related genes were obtained from the TCGA and ImmPort databases, respectively. Among them, 1653 immune-related genes were shared in the data obtained from both databases. Then, the immune-related genes from ImmPort and the genes obtained from the GEO dataset were intersected to nd the same genes. Among 606 shared immune-related genes, 31,896 immune-related gene pairs were found after removing gene pairs with relatively small variations within the pair. The immune-related gene pairs from the TCGA dataset were combined with the corresponding clinical data, revealing 69 immune-related gene pairs signi cantly associated with prognosis. Then, the Lasso method for Cox proportional hazards regression analysis was used to construct the immune-related gene pairs prognostic model for the training group. Finally, 27 immunerelated gene pairs using 43 immune-related genes were selected in the model ( Table 1). The model was used to calculate the risk value of each patient in the training group. A time-dependent ROC curve was employed to distinguish between high-and low-risk patients using the optimal cut-off value (Fig. 1). The prognostic model signi cantly distinguished between high-and low-risk patients related to OS in the TCGA dataset (P < 0.001); the OS of high-risk patients was signi cantly lower than that of low-risk patients (Fig. 2a). Univariate and multivariate Cox proportional hazards regression analyses were used to analyze the corresponding clinical data in the TCGA dataset. The immune-related gene pairs prognostic model and AJCC stage were independent prognostic factors in the TCGA dataset (Figs. 3a, 3b). In order to verify the predictive value of the immune-related gene pairs prognostic model, we applied the model to the GEO dataset and strati ed the patients into high-and low-risk groups. The DFS values of the two validation groups were similar to those of the training groups; the DFS of the high-risk group was signi cantly lower than that of the low-risk group (Fig. 2b, P = 0.026). Univariate and multivariate Cox proportional hazards regression analyses were used to analyze the corresponding clinical data in the GEO dataset. The immune-related gene pairs prognostic model was an independent prognostic factor in the GEO dataset (Fig. 3c).
Immune cell in ltration in different risk groups CIBERSORT, which has been applied in many tumor microenvironments 21,22 , was used to predict the in ltration of 21 different immune cell types in the high-and low-risk TCGA groups (Fig. 4a), including M0 and M2 macrophages, CD8 T cells, and resting dendritic cells. M0 (Fig. 4b, P < 0.001) and M2 (Fig. 4c, P < 0.001) macrophages were highly expressed in the high-risk group, while CD8 T cells (Fig. 4d, p < 0.001) and naive B cells (Fig. 4e, p < 0.001) were highly expressed in the low-risk group.

Functional evaluation of immune-related gene pairs
In order to determine the biological processes and signaling pathways related to the immune-related gene pairs in the prognostic model, we used GO enrichment to analyze the identi ed immune-related genes in the TCGA dataset and identi ed pathways with signi cant differences between high-and low-risk patients (Fig. 5). We found that CCR chemokine receptor binding, regulation of leukocyte-mediated cytotoxicity, T cell migration, T cell receptor complex, and other pathways were signi cantly enriched in low-risk patients (Fig. 6a-6f). The enrichment of these pathways in low-risk patients con rmed the importance of immune cells in the treatment and prognosis of breast cancer. Discussion: In the current study, we demonstrated that a prognostic model constructed with 27 immune-related gene pairs from 43 independent immune-related genes predicted OS and DFS in breast cancer. Many of the identi ed immune-related genes have been previously investigated in breast cancer research. The application of gene expression pro les to construct prognostic models has been studied; however, owing to the heterogeneity of organisms and differences in technology platforms, the accuracy of the analysis method requires veri cation 23 . Relative sequencing and pairing of genes to preprocess an immunerelated gene pairs prognostic model has provided reliable results for many tumors 24,25 .
Breast cancer stem-like cells were rendered as sensitive to γδTc cytotoxicity as non-stem-like cells by preventing increased MICA shedding via ADAM inhibitor GW280264X 26 . This approach, using MICA shedding to enhance γδTc targeting of cancer types, was considered an immune evasion mechanism 26 .
Genetic knockdown and pharmacological blockade of Rac1/Rac2 upregulated claudin-1 and prohibited the strong metastatic potential of TNBC characterized by a mesenchymal-like phenotype 27 . RNF144A is known to be downregulated in a subset of primary breast tumors and suppresses breast cancer growth and metastasis through, at least in part, targeting HSPA2 for ubiquitination and degradation in a ubiquitin ligase activity-dependent manner 28 . HSPA2 has been shown to be upregulated in various types of human cancers and promotes cancer cell growth, angiogenesis, migration, invasion, and metastasis through distinct mechanisms 28 . PPRP4 via NEDD4 downregulates ROBO1, which is implicated as a tumor suppressor in various cancers, resulting in activation of SRC and FAK, thus promoting breast cancer metastasis 29 . PLXNB3 is reportedly overexpressed in breast cancer tissues 30 , and CD74 is associated with poor breast cancer prognosis 31 . MIF can activate NF-kB through CD74 to control the dynamics and stability of mitochondria, thus promoting carcinogenesis by preventing apoptosis 32 . CD74 expression in the membrane and cytoplasm of breast cancer cells was found to be higher than that in normal breast tissue. Downregulation of CD74 was shown to decrease the invasion and migration of MDA-MB-231 breast cancer cells 33 . Finally, in ER + breast cancer cells, interaction between CRABP2 and LATS1 promotes LATS1 ubiquitination and inactivates the Hippo signaling pathway, thus promoting invasion and metastasis of ER + breast cancer 34 .
Other identi ed immune-related genes in the current study have been investigated in relation to immunotherapy for other cancers. As an agonist of Toll-like receptors TLR7 and TLR8, R848 is the driving factor of the M1 anti-tumor macrophage phenotype in vitro. R848-loaded nanoparticles effectively delivered drugs to tumor-associated macrophages in vivo 35 . A CCR1 antagonist has been shown to reduce T cell transportation to the omentum and liver in obesity-related cancer 36 . CXCL14 inhibits human papillomavirus-associated head and neck cancer by restoring expression of MHC-I in tumor cells and promoting antigen-speci c CD8 + T cell response 37 . IL18 can inhibit CD4 + CD25 + Foxp3 + T cells through accumulation of pre-mNK cells and memory-type CD8 + T cells and enhanced the therapeutic effects of ICIs on the peritoneal dissemination of malignant tumors or melanoma metastasis via tail vein injection 38 . The CXCL14 / ACKR2 pathway of autocrine broblasts is a clinically relevant stimulator for epithelial-mesenchymal transition (EMT), tumor cell invasion, and metastasis. ACKR2 is a newly identi ed mediator of CXCL14 function, which is a potential pathway related to drug targets 39 . Taken together, these studies support that immune-related genes identi ed in the current study play important roles in the occurrence and development of tumors through immune cells. Immune-related genes have thus gradually become targets of immunotherapy or the means to enhance the effects of immunotherapy. However, some of the identi ed immune-related genes, such as APOBEC3G, PLXNB1, and C3AR1, have not been previously studied in breast cancer and warrant further exploration.
Additionally, the study ndings revealed that M2 and M0 macrophages were highly expressed in high-risk breast cancer patients, while CD8 T cells and naïve B cells were highly expressed in low-risk breast cancer patients. M2 and M0 macrophages are associated with poor prognosis in colon cancer 22 . Some studies have demonstrated that M2 macrophages might impart outgrowth and M1 macrophages may contribute to dormancy behaviors in metastatic breast cancer cells. Research has also demonstrated that EMT and secondary metastatic sites are regulated by selected macrophage phenotypes in the liver metastatic microenvironment. These results indicate that macrophages could be a potential therapeutic target for limiting death due to malignant metastases in patients with breast cancer 40 . The presence of CD8 + T cells in breast cancer is associated with a signi cant reduction in the relative risk of death from disease in both ER-and ER + subtypes of HER2 + breast cancer. Thus, inclusion of TILs may improve risk strati cation in patients with breast cancer classi ed into these subtypes 41 . Intratumor in ltration of B cells is signi cantly impaired during hepatocellular carcinoma progression. High densities of tumorin ltrating B cells imply a better clinical outcome. Therapies designed to target B cells may be a novel strategy for hepatocellular carcinoma 42 . Therefore, different immune cells have different effects on breast cancer prognosis. Macrophages may promote the metastasis of breast cancer, while CD8 + T cells and B cells may inhibit the growth of breast cancer. Their individual effects and related molecular mechanisms need to be further investigated.
In this study, the identi ed immune-related genes were associated with multiple pathways related to immune cell in ltration and migration and immune checkpoint enhancement. Among them, regulation of the leukocyte-mediated cytotoxicity pathway is associated with tumor progression and decreased CD8 + in ltration in pancreatic cancer 43 . The T cell migration pathway can enhance tumor immunity and increase the e cacy of ICI in preclinical breast cancer models 44 . This pathway can enhance T cells into tumors and intratumoral T-cell diversity 45 . A previous study identi ed the chemokine signal regulator FROUNT as a target to control tumor-associated macrophages by the chemokine receptor binding pathway 46 . Anti-PD-L1 combined with Liposomal-AMD3100, a CXCR4 antagonist, exerted an increased antitumor effect and prolonged survival in a murine TNBC model compared with monotherapies, measured by chemokine response 47 . CD3 bispeci c antibody promoted tumor cell killing by cross-linking the CD3 component of the T cell receptor complex pathway with a tumor-associated antigen on the surface of the target cell 48 . These results indicate that immune-related pathways may play an important role in the treatment of breast cancer in the future. Consistent with these conclusions, our study determined that upregulation of these pathways was closely associated with low-risk patients. CCR chemokine receptor binding, immunoglobulin complex circulation, and regulation of cell killing pathways have not been studied in tumors but warrant further in vivo and in vitro studies.
It is necessary to gather information regarding the expression of multiple immune-related genes in the application of our prognostic model. The reliability of our model requires further veri cation in a large clinical patient population. Our two datasets originated from retrospective studies; thus, our prognostic model needs to be more widely validated in prospective cohort studies. In addition, the immune-related genes and related pathways associated with our prognostic model, and their molecular mechanisms, require further veri cation in vitro and in vivo.

Conclusion:
We established a prediction model based on immunogenomics that reliably predicted the prognosis of breast cancer patients using retrospective data. We also identi ed several key immune-related genes, including MICA, RAC2, HSPA2, NEDD4, PLXNB3, APOBEC3G, PLXNB1, and C3AR1, which may play important roles in the treatment of breast cancer in the future and provide new targets for immunotherapy. Further, we determined that M2 and M0 macrophages were highly expressed in high-risk breast cancer patients, while CD8 T cells and naïve B cells were highly expressed in low-risk breast cancer patients. Speci c immune cell in ltration combined with ICI is the future direction of breast cancer treatment research. Abbreviations: