Immune Microenvironment-Related Genes Contribute to Clinical Prognosis in Patients with Triple-Negative Breast Cancer

Background: Triple-negative breast cancer (TNBC) is a high-risk breast cancer subtype, which accounts for 15% to 20% of all breast cancers and generally has a poor prognosis. TNBC patients have high recurrence post-surgery and high risk of metastasis to other organs. The tumor microenvironment (TME) plays important roles in the carcinogenesis, development, and metastasis of tumors. Methods: This study aimed to investigate the effects of immune and stromal cell-related genes on TNBC prognosis. The ESTIMATE algorithm was used to calculate the immune/stromal scores of TNBC samples from the GEO database. The samples were divided into high- and low- score groups and the differential expression genes were identied using the limma package within R. Functional enrichment and protein-protein interaction (PPI) network analyses revealed that these genes are primarily involved in immune responses. Results: Survival analysis in both GSE21653 and KM-plotter website showed that 36 genes were signicantly related to disease-free survival (DFS) of TNBC. Another survival analysis by R package survival in GSE58812 indicated that 14 genes of 36 were greatly interrelated to DFS of TNBC. Moreover, the expression levels of some of these genes were veried through immunohistochemical staining and RT-qPCR. Finally, four genes importantly associated with TNBC prognosis were identied with Cox-LASSO analysis. Time-dependent receiver-operating characteristic (ROC) analysis displayed an area under the curve (AUC) of 0.95 for one-year survival rate, indicating the four genes performed very well for prognosis prediction. Conclusions: associated ESTIMATE and immune scores are closely related to the survival of TNBC. of ESTIMATE, TNBC samples.


Introduction
Breast cancer is the cancer with the highest morbidity and mortality among women [1]. Globally, there were estimated 2.1 million newly diagnosed female breast cancer cases in 2018, accounting for nearly a quarter of female cancer cases [2]. TNBC was diagnosed via immunohistochemical methods through the lack of ampli ed expression of three receptors -estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) [3]. TNBC accounts for 15-20% of all breast cancers and has a more obvious pattern of metastasis than other subtypes of the disease, with poor prognosis in patients [4,5]. Many efforts have been made to understand the molecular mechanism of TNBC. The Cancer Genome Atlas (TCGA) project contributes to a comprehensive understanding of breast cancer-speci c molecular heterogeneity and driver mutations, including TNBC [6]. So far, the etiology and molecular mechanism of TNBC have not been well explained. Therefore, it is particularly important to nd the prognostic biomarkers in TNBC patients and explore the molecular mechanisms underlying the high morbidity and mortality of TNBC. TME refers to the internal and external environment of tumor cells that are closely associated with the occurrence, growth, and metastasis of tumors. In the TME, tumor cells are able to adapt and proliferate with greatly reduced detection and eradication by host immune surveillance [7]. In addition to tumor cells, the TME mainly includes two non-tumor components immune cells and stromal cells, which are considered to be of great signi cance for tumor diagnosis and prognosis. Currently, the most promising way to activate therapeutic anti-tumor immunity is blocking immune checkpoints to reduce immunosuppression. Cancer immunotherapies targeting programmed cell death protein 1 (PD-1) and programmed cell death 1 ligand 1 (PD-L1) are changing traditional tumor treatment. Moreover, anti-PD-1 / PD-L1 antibodies have been proven to be of great clinical signi cance in more than 15 types of cancers, including melanoma, non-small cell lung cancer (NSCLC), and renal cell carcinoma (RCC) [8,9]. The diverse immune microenvironment in TNBC greatly in uences the risk of relapse, response to chemotherapy, and applications of immunotherapy. An immune checkpoint inhibitor is now ready for use in the study of neoadjuvant and adjuvant therapies for TNBC [10].
ESTIMATE algorithm is a tool that uses gene expression pro le characteristics to predict the proportion of stromal and immune cells in tumors, and infer the purity of tumors in tissues [11]. Current ESTIMATE analysis has shown that stromal/immune cell in ltration is associated with improved prognosis in patients with various types of tumors, including glioblastoma and cutaneous melanoma [12,13].
Nonetheless, there is currently no detailed analysis of TNBC immune/stromal scores.
In this study, in order to better understand the effects of immune and stromal cell-related genes on TNBC prognosis, we systematically analyzed gene expression pro les and excavated TME-related genes with poor prognosis to explore potential regulatory mechanisms.

Materials And Methods
Raw data RNA-seq data and corresponding clinicopathological information of TNBC patients in GSE21653 and GSE58812 were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Immune/stromal/estimate scores of the samples in GSE21653 were calculated using the ESTIMATE algorithm (https://r-forge.r-project.org) [11]. We selected 85 samples containing both DFS information and ESTIMATE scores for further analysis. The E-MTAB-365 dataset from KM-plotter website (http://kmplot.com/analysis/index.php?p=service&cancer=breast), containing 48 TNBC samples, was used to further con rm the survival analysis [14]. Survival curve parameters were provided in Table 1. The GSE58812 dataset was used for validation, Cox-Lasso regression analysis and ROC curve. The Human Protein Atlas (http://www.proteinatlas.org) was used to validate the immunohistochemistry of genes with prognostic values. Survival Analysis The survival curves used for GSE21653 and GSE58812 were plotted using the R package survival to analyze the relationship between patients' disease-free survival (DFS) rate (months) and high-low score group or high-low gene expression group, and compared using a Log Rank test whether there are differences in survival curves between the two groups of patients.

Protein-protein Interaction Analysis (Ppi)
PPI analysis was conducted via STRING [16]. Then PPI network was reconstructed using Cytoscape software [17], which showed a network of more than 10 nodes. The node size was related to the node degree, with the thickness of the edges re ecting a score of the degree of interaction between the nodes, and the color of the nodes re ecting the differential expression degree.

Rt-qpcr
Total RNA was extracted with Trizol reagent according to the manufacturer's instructions (Invitrogen). Potential DNA contamination was avoided with RNase-free DNase treatment (Promega). cDNA was prepared with MMLV Reverse Transcriptase (Roche). Relative quantitation of gene expression was performed using the ABI PRISM 7500 sequence detection system (Applied Biosystems) measuring realtime SYBR green uorescence. The results were acquired using the comparative Ct method (2 −△△ Ct) with GAPDH as an internal control. This experiment was performed at least thrice independently. The primer sequences used are listed in Supplementary Table 1.

Statistical analysis
We t Ten-fold cross-validated Cox Survival Analysis and Least Absolute Shrinkage and Selection Operator Regression (Cox-LASSO regression) model, as implemented in R package glmnet and survival. The corresponding hazard ratio (HR), 95% con dence interval (CI), and p value were collected. The forest plot was drawn using the R package survminer.
Time-dependent receiver operating characteristic (ROC) curves were carried out to estimate the predictive accuracy for prognosis and describe the associated AUCs at years 1 ~ 10 based on the risk score by the R package pROC. From the curves, sensitivity, speci city, likelihood ratio, predictive value and their respective 95% con dence intervals can be seen. p-value < 0.05 were considered statistically signi cant.

ESTIMATE and immune scores are closely related to the survival of TNBC
We downloaded clinical information of TNBC samples (GSE21653) from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). We analyzed the samples using ESTIMATE, immune and stromal scores. The distribution of scores of TNBC samples were shown in Fig. 1A.
Then we divided the TNBC samples into high-and low-score groups based on the median, and conducted survival analysis using the clinical follow-up information corresponding to the three scores for each group. As shown in Fig. 1B, the survival curve showed that the high-score group of the ESTIMATE score has a higher survival rate than the low-score group (p = 0.0028 in log-rank test), indicating the ESTIMATE score was signi cantly associated with disease-free survival (DFS) of TNBC samples from the GEO database (p < 0.05). Similar phenomena were observed in the high-and low-score groups of the immune score ( Fig. 1C-1D).

Identi cation of differentially expressed genes based on the TNBC ESTIMATE score
Since the ESTIMATE score is a comprehensive evaluation of the immune and stromal score, we further explored the genes associated with TNBC prognosis based on the ESTIMATE score. Differentially expressed genes (DEGs) were rst screened between the high-and low-ESTIMATE score groups. A total of 278 DEGs (276 upregulated and 2 downregulated) were identi ed (|log2FC|>1 and p < 0.05) (Supplementary Table 2). The DEGs in the high-and low-score groups were shown in Fig. 2A.
In order to study the function of these DEGs, we performed functional enrichment analysis on the 276 upregulated and 2 downregulated genes via the STRING database, including GO (molecular function, biological process and cellular component) and KEGG pathway analyses. The top ten enrichment terms for each section of GO and KEGG pathway were shown in Fig. 2B -2E (sorted by -log10 of Q value). GO functions indicated that these genes are mainly enriched in protein binding, immune system process and immune response and membrane part ( Fig. 2B-2D). Moreover, cytokine-cytokine receptor interaction, chemokine signaling pathway were also obtained from KEGG pathways analysis (Fig. 2E).
Screening and functional analysis of genes related to TNBC prognosis In order to screen genes associated with prognosis of TNBC, we performed survival analysis of all the DEGs, among which 171 genes were signi cantly related to DFS (p < 0.05) (Supplementary Table 3). The survival curves of six genes with the lowest p values from the 171 genes were shown in Fig. 3A, including SH2D1A, CST7, GPR18, LCP2, CLIC2 and ITK.
To reveal the potential role of DEGs related to DFS of TNBC, we performed PPI analysis of the 171 genes above via STRING (Fig. 3B). The core proteins included CD2, SELL, CCR5, IL10RA, and LCP2.
Subsequently, the GO enrichment and KEGG pathway analyses on the genes mined by survival analysis were carried out. The data showed these genes are mainly enriched in the TME and immune-related pathway (Figs. 3C-3F).
We integrated the protein interaction of the 171 DFS-related genes from the PPI network module via MCODE analysis, the top four of the eight modules obtained were selected for further investigation. The protein interaction of these four modules were shown in Fig. 4, with the core nodes having higher degrees. Module 1 contained a total of 24 nodes and 245 edges. Among them, SELL, ITGAL, CD8A, CD52, and CD2 have been con rmed as cell adhesion markers, which are involved in a series of important physiological and pathological processes, such as immune response, tumor metastasis, and wound healing (hsa04514) [18][19][20]. Module 2 contained a total of 20 nodes and 70 edges. Among them, C1QB, HLA-DRA, C3, and HLA-DPA1 have been con rmed to be associated with S. aureus infection and systemic lupus erythematosus, indicating that these genes are closely related to immune response (hsa05150, hsa05322) [21]. Module 3 contained a total of 20 nodes and 64 edges. Among them, CASP1, GBP4, and GBP5 have been con rmed to be related to Nod-like receptor signaling pathway, which is an important way for eukaryotes to recognize pathogens (hsa04621) [22]. Module 4 contained a total of 12 nodes and 28 edges, four (CD4, CCR5, CD3D, and ITK) of which have been reported to be closely related to immune response (hsa04658, hsa04060, hsa04660) [23,24]. As shown in Fig. 4, SELL, ITGAL, CD69, and HLA-DRA had higher node degrees, indicating that they might be important immune microenvironment-related genes of TNBC. Among these 171 DFS-related genes, 11 genes have been reported to be signi cantly related with breast cancer prognosis, including CD3D, CD8A, CORO1A, GZMB, LCK, TRBC1, HLA-DRA, ACSL5, EOMES, IRF4, IRF8. In addition, CD3D,CD8A CORO1A GZMB EOMES and IRF8 have been reported to be related with the prognosis of TNBC [25][26][27][28][29].The association of the remaining genes with TNBC prognosis, including CD247, CD3E, LAX1, LPXN, PRKCB, and SIRPG, have not been reported. These genes may be potential immune microenvironment-related prognostic markers of TNBC.

Relationships Between Immune Microenvironment-related Genes And Tnbc Patient Prognosis
DFS-related genes (n = 171) were further analyzed in KM-plotter website (http://kmplot.com/analysis/index.php?p=service&cancer=breast) and the expression levels of 36 genes were signi cantly correlated with DFS in E-MTAB-365 dataset (p < 0.01). Detailed information of the 36 genes was provided in Table 2. Additionally, GSE58812 dataset with clinical information of TNBC samples were downloaded from the GEO database and were used to verify the correlation of the above 36 genes with prognosis of TNBC samples. These genes were further analyzed by univariate Cox regression using the R-package "survival". The detailed information of univariate log-rank was provided in Supplementary Table 4 and only 14 genes were signi cantly correlated with DFS. Kaplan-Meier survival analysis showed that patients with higher expression of these genes have better disease-free survival (Fig. 5). The detailed information of the 14 genes was provided in Table 3. Among these genes, SELL, GZMB, IL2RB, LCP2, and CD8A were involved in Module 1, CASP1 and TRIM22 were involved in Module 3, EOMES and ITK were involved in Module 4. These genes may be potential genes for the poor prognosis of TNBC and may provide therapeutic value for TNBC in the future. The genes in normal font are reported genes related to breast cancer prognosis; the genes marked with an asterisk (*) are reported genes related to TNBC prognosis; the genes in bold font have not been reported to be related to breast cancer prognosis.  (Fig. 6C). All the results showed that these immune microenvironment-related genes might be good prognostic biomarkers of TNBC.

Discussion
In this paper, we used the ESTIMATE algorithm [11] to calculate the Estimate/immune/stromal scores of gene expression data of TNBC samples from GEO database GSE21653. These samples were divided into high-and low-score groups based on the ESTIMATE scores. By analyzing the differential expression genes between the high-and low-score groups, we acquired 278 DEGs. After functional enrichment analysis of upregulated and downregulated genes, many genes were found to be related to TME and immunity. The results were consistent with previous studies, which reported the important role of immune cells and stromal cells in TME [12,[30][31][32]. A total of 171 core genes were extracted from the survival analysis and analyzed by PPI network analysis and modules. Functional enrichment analysis of these genes indicated that they were related to immune response. We then performed a KM plotter analysis of the 171 DFS-related genes, further analyzed them, and identi ed 36 genes signi cantly associated with DFS, which could be used as TNBC prognosis biomarkers. These 36 genes were reanalyzed by univariate Cox regression analysis in GSE58812 dataset. Only 14 genes of 36 were signi cantly related to DFS of TNBC. Subsequent immunohistochemical staining and RT-qPCR veri cation revealed that these genes were likely to be prognostic markers for TNBC. Finally, Cox LASSO regression assay revealed that BIRC3, CD8A, GNLY and TRIM22, might be potential immune microenvironment-related prognostic markers of TNBC.
BIRC3 (baculoviral IAP repeat containing 3) participates in immunity activities by regulating NF-κB signaling and other in ammatory signals. It also acts as an E3 ubiquitin protein ligase in the TME in mice. The TNFa-TNFR2-BIRC3-TRAF1 signaling pathway has been shown to promote metastases in mice, and the activation of this pathway is associated with the poor prognosis of gastrointestinal stromal tumor patients [33].
CD8A (CD8 antigen) is a cell surface glycoprotein on most cytotoxic T lymphocytes, which mediates effective cell-to-cell interactions in the immune system. It functions as a coreceptor with T cell receptors on T lymphocytes and recognizes antigens displayed by antigen-presenting cells in class I MHC molecules. As previously reported that, CD8A was predictive for increased pathologic complete response (pCR) in the neoadjuvant GeparSixto trial [19]. The CD8A gene is associated with an improved outcome in several public breast cancer datasets. High expression of CD8A is related with a better relapse free survival in TNBC [26]. Our results further supports the role of CD8A in TNBC.
GNLY (granulysin) is part of the saposin-like protein (SAPLIP) family and is located in the cytotoxic granules of T cells and released after antigen stimulation. GNLY may tempt ER stress-mediated apoptosis [34]. It is correlated to the ability of NK-extracellular vesicles (EV) to induce cytotoxicity [35]. Additionally, serum GNLY may be a potential biomarker for nasopharyngeal carcinoma, the early stage of colorectal adenocarcinoma and muscle-invasive bladder cancer [36][37][38]. The function of GNLY in TNBC immune microenvironment and immunotherapies deserves further research.
TRIM22 (Stimulated Trans-Acting Factor of 50 kDa, Staf-50) is a E3 ubiquitin-ligase and a member of the C-IV group of tripartite motif (TRIM) family, which is strongly induced by interferon stimulation and takes part in innate immunity of cells. Besides the antiviral effects, TRIM22 is a potential therapeutic target and prognosis marker for NSCLC [39]. We will further explore the expression and function of TRIM22 in TNBC.
Our results were due to the publicly available data and ESTIMATE algorithm. Multiple datasets were used to verify that the selected genes were reliable and common. The method "CIBERSORT" was used to analyze microarray data and not for TCGA RNA-seq data [40]. The method "TIMER" has limited sample size and relevance for distinguishing the location of immune cells in the stroma or tumor or capturing tumor cell heterogeneity [41]. The ESTIMATE algorithm is applicable in microarray expression data sets, new microarray and RNA-seq-based transcriptome pro les. The predictive ability of this method has been veri ed in large independent data sets. This method provides a good indication of biomarkers for tumor prognosis, provides a basis for the next step of research, links the tumor microenvironment and tumor prognosis together, and contributes to the better resolution of tumor prognosis.
This study has certain limitations. The ROC curve analysis shows that these 4 genes can predict the prognostic effect of 1 year, but there is a limitation when considering 3 or 5 years or long-term prognosis, and other predictive markers may be needed.
In conclusion, based on the ESTIMATE algorithm, we systematically analyzed the expression pro le of TNBC and screened out some genes related to TME and prognosis. These genes may be potential prognostic markers of the TNBC immune microenvironment. Future work will be focused on comprehensively exploring the pathogenic mechanism of these genes in the immune microenvironment, and the potential associations between TME and TNBC prognosis. Our ndings have greatly contributed to a better understanding of the complex regulatory network of TNBC and of immune and stromal cellrelated genes on TNBC progression. These ndings may provide new promising biomarkers for TNBC therapy.

Declarations
Ethics approval and consent to participate The study was authorized by the Ethics Committee of Capital Medical University, and the written informed consent was signed by each patient.

Supplementary Files
Supplemental tables are not available with this version. were divided into high-and low-scoring groups (median as a standard). Survival analysis was performed using the clinical follow-up data corresponding to each sample. As shown in the gure, the ESTIMATE score was signi cantly correlated with the DFS of TNBC samples in the GEO database (p<0.05).
The samples are sorted according to the ESTIMATE score from high to low and from left to right. The blue group on the left is a sample from the ESTIMATE high score group, and the right pink group is a sample from the ESTIMATE low score group. The genes were ranked according to the p value of the differential expression analysis from low to high. Red represents high gene expression and blue represents low gene expression. The darker the red or blue color, the greater the degree of difference in gene expression.   Relationships between immune microenvironment-related genes and TNBC patient prognosis. Kaplan-Meier survival analysis of the relationship between survival time and 14 genes expression signatures by R package survival. These 14 genes are likely to be prognostic markers for the immune microenvironment of TNBC. Figure 6