Gene Expression Patterns Associated With Tumor-Inltrating CD4+ and CD8+ T Cells In Invasive Breast Carcinomas

Objective This study investigated the gene expression patterns associated with tumor-inltrating CD4 + and CD8 + T cells in invasive breast carcinomas. Methods The gene expression data and corresponding clinical phenotype data from the Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) were downloaded. The stromal and immune score were calculated using ESTIMATE. The differentially expressed genes (DEGs) with a high vs. low stromal score and a high vs. low immune score were screened and then functionally enriched. The tumor-inltrating immune cells were investigated using the Cibersort algorithm, and the CD4 + and CD8 + T cell-related genes were identied using a Spearman correlation test of inltrating abundance with the DEGs. Moreover, the miRNA-mRNA pairs and lncRNA-miRNA pairs were predicted to construct the competing endogenous RNAs (ceRNA) network. Kaplan-Meier (K-M) survival curves were also plotted. Results In total, 478 DEGs with a high vs. low stromal score and 796 DEGs with a high vs. low immune score were identied. In addition, 39 CD4 + T cell-related genes and 78 CD8 + T cell-related genes were identied; of these, 14 genes were signicantly associated with the prognosis of BRCA patients. Moreover, for CD4 + T cell-related genes, the chr22-38_28785274-29006793.1–miR-34a/c-5p–CAPN6 axis was identied from the ceRNA network, whereas the chr22-38_28785274-29006793.1–miR-494-3p–SLC9A7 axis was identied for CD8 + T cell-related genes. chr22-38_28785274-29006793.1–miR-34a/c-5p–CAPN6 chr22-38_28785274-29006793.1–miR-494-3p–SLC9A7


Background
Breast carcinoma is one of the most common tumors in women, accounting for approximately 25% of all cancers; it is the second leading cause of cancer-related death in women worldwide [1]. Breast carcinomas can be classi ed into two main categories, including in situ carcinomas and invasive carcinomas [2]. Invasive breast carcinoma (BRCA) refers to those tumors that show tumor cell invasion to adjacent tissues of the mammary ducts and display a trend of regional lymph node metastasis [2]. The clinical course and outcome of breast carcinoma differs with diverse immunohistochemical characteristics and histopathological subtypes [3]. Despite the great progress in diagnosis and therapy of breast carcinomas, clinical outcomes remain unsatisfactory, and the 5-year survival rate of female patients with metastatic breast cancer is 27% [4].
Recently, the therapeutic prospects of malignancies, including breast carcinomas, have been remarkably altered with a deeper understanding of tumor-immune interactions and the development of immunological checkpoint inhibitors [5,6]. Immune checkpoints are de ned as various inhibitory pathways mediating self-tolerance and immune responses to limit collateral tissue damage [7]. These pathways can be utilized by tumor cells to escape detection and elimination by the immune system [8].
Programmed death 1 (PD-1) is an immune checkpoint; it suppresses the biological functions of effector T cells. Tumor cells can express PD-L1, a ligand of PD-1, which can inhibit the antitumor immune response by binding to PD-1 [9,10]. Tumor-in ltrating lymphocytes (TILs) are immune cells that have migrated to the tumor tissue microenvironment, indicating an antitumor immune response [6,11]. Shi et al. revealed that the distribution of TILs differs in diverse subtypes; patients with triple-negative breast carcinomas show more PD-1 + exhausted TILs, suggesting an immunosuppressive microenvironment [12]. CD4 + and CD8 + T cells are two major lymphocyte cell types. Matsumoto et al. suggested that increased CD4 + and CD8 + T cell in ltration indicates good prognosis in triple-negative breast carcinomas [13]. Additionally, Su et al. indicated that immunosuppression in breast carcinomas can be reversed by blocking the recruitment of naive CD4 + T cells, which might be a promising strategy for anticancer immunotherapy [14].
The growth and metastasis of breast carcinomas involves dynamic processes that are affected by the tumor-immune microenvironment, in which the gene expression of resident cells exhibit obvious alterations [8,15]. Genetic alterations in tumor progression facilitate the ectopic expression of normally silent genes in tumors, with potential carcinogenesis [16]. Masjedi et al. demonstrated that olfactory receptor genes are upregulated in BRCA and are implicated in the proliferation and progression of BRCA [17]. The role of tumor-immune interactions in breast carcinomas has received more attention. Nevertheless, the major molecular changes in the immune response accompanying breast carcinoma progression, especially in CD4 + and CD8 + T cells, is still largely unknown. Speci c TIL subsets were indicated to be clinically signi cant and could be used to predict treatment responses. Cytotoxic CD8 + T lymphocytes could alternatively identify and eliminate tumor cells. While it could be deactivated by T reg cells, and the role of co-stimulatory proteins on antigen-presenting cells (APC) would subsequently decrease. That is to say, CD8 + T lymphocytes can be inhibited by T reg cells [18]. Breast carcinomas can generate substances that affect APCs and alter T cell types [19]. Disis et al. showed that the cause of robust TILs in triple-negative breast carcinomas might be the elevated mutations producing a neoantigen signature [20]. Therefore, we explored the gene expression pattern associated with tumor-in ltrating CD4 + and CD8 + T cells of invasive breast carcinomas. The results are expected to provide therapeutic targets for clinical treatment of BRCA.

Data acquisition
The RNA-sequencing (RNA-seq) Fragments per Kilobase of transcript per Million mapped reads (FPKM) and corresponding clinical phenotype data from The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data collection were downloaded from the University of California Santa Cruz (UCSC, https://xenabrowser.net/) Genome Browser database. There were 1194 samples, including 1082 BRCA samples and 112 normal samples (data acquisition on 07-18-2019). Those data were analyzed according to the presupposed work ow (Fig. 1).

Differentially expressed gene (DEG) screening and functional enrichment
The probes were rst annotated according to the annotation les, and were ltered based on whether they matched the gene symbol. The mean value was selected when multiple probes matched to the same gene. The stromal score and immune score of tumor tissue were calculated using the ESTIMATE (Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data) package [21]. Then, samples were split into high stromal/immune score and low stromal/immune score groups according to the median value of the stromal/immune score, respectively. Then, the DEGs with a high stromal/immune score vs. a low stromal/immune score were screened using the Limma package [22] (Version 3.10.3) with |log fold change (FC)| > 0.263 and P value < 0.05. Finally, the overlapping DEGs between the two groups were selected and used in the following analysis.

Functional enrichment analysis
The biological processes terms of the Gene Ontology annotation and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were analyzed to investigate the functions of the upregulated and downregulated overlapping DEGs using clusterPro ler [23] (Version 3.2.11) in R package. The number of enriched genes was set as: count ≥ 2. The terms with P < 0.05 were considered to be signi cant results.
Identi cation of CD4 + and CD8 + T cell-related genes Based on the Cibersort algorithm [24], the tumor-in ltrating immune cells in BRCA were investigated to estimate the in ltrating abundance of six immune cell types, including B cells, CD4 + T cells, CD8 + T cells, neutrophils, macrophages, and dendritic cells. Then, the correlation coe cient (r) between overlapping DEGs and in ltrating abundance of immune cells (CD4 + and CD8 + T cells) was calculated by using the Spearman correlation test, and CD4 + and CD8 + T cell-related genes were identi ed with |r| > 0.15.

Construction of the protein-protein interaction (PPI) network
The interactions between CD4 + T cell-related genes were retrieved from the STRING database [25] with a PPI score setting as 0.15 (low con dence), and the species was set as human. Based on the retrieved PPIs, CD4 + T cell-related PPI network visualization was performed using Cytoscape [26] (version: 3.2.0). The CD8 + T cell-related PPI network was also constructed using the same method.

Construction of the competing endogenous RNAs (ceRNA) network
The prediction of miRNA-mRNA interactions was conducted for CD4 + T cell-related genes using miRWalk 3.0 [27], and the species was set as human. The miRNA-mRNA interactions with a score > 0.95, and those that existed in both the TargetScan and miRDB databases were selected. Next, based on DIANA-LncBase v.2 [28], the prediction of lncRNA-miRNA interactions was conducted and the interactions with score = 1 were selected. The construction of a CD4 + T cell-related ceRNA network was completed by integrating the obtained lncRNA-miRNA interactions and miRNA-mRNA interactions. Similarly, a CD8 + T cell-related ceRNA network was also constructed using the same method.

Construction of a chemical-target network
The genes and chemicals associated with breast neoplasms were explored from the comparative toxicogenomics database (CTD) [29] using "breast neoplasms" as the search keywords. Then, the overlapping genes between breast neoplasm-associated genes and genes in the CD4 + T cell-related ceRNA network were selected and used to screen chemical-target pairs. Then, the CD4 + T cell-related chemical-target network was visualized using Cytoscape. Similarly, the CD8 + T cell-related chemicaltarget network was also constructed using the same method.

Survival analysis
The overall survival (OS) and OS status in the clinical phenotype data were used to perform survival analysis. Brie y, the immune cell-related genes (CD4 + and CD8 + T cell-related genes) were split into highexpression and low-expression groups based on the median expression value, together with a log-rank statistical test. The cut-off was set as a P value < 0.05 to select the genes signi cantly associated with prognosis, and then Kaplan-Meier (K-M) survival curves were plotted.

Differences in gene expression between high and low stromal scores
The stromal score was calculated to predict the non-tumor cell in ltration [21]. Based on the stromal score, the samples were split into high and low stromal score groups, and a total of 478 genes were signi cantly differentially expressed in the two groups, including 104 upregulated and 374 downregulated genes ( Fig. 2A). These genes were considered to be associated with the stroma in tumor tissue.

Differences in gene expression between high and low immune scores
The immune score refers to the in ltration of immune cells in tumor tissue, which is considered an available indicator of prognosis in tumors [30]. Based on the immune score, the samples were split into high and low immune score groups; 796 genes were found to be signi cantly differentially expressed in the 2 groups (Fig. 2B). Of these, 503 genes were upregulated whereas 293 genes were downregulated, suggesting that these genes might be implicated in the immune status in the tumor microenvironment.
Overlapping genes between the stroma score-related and immune score-related genes The overlapping genes between the stroma score-related and immune score-related genes were identi ed by VENN analysis; 167 genes were obtained, including 58 upregulated and 109 downregulated genes (Fig. 2C). The functions of these overlapping genes were further investigated, and the downregulated genes were found to be signi cantly implicated in one KEGG pathway (hsa05146, amoebiasis) and ve biological process terms, for example: GO:0006885 ~ regulation of pH (Table 1).

Immune cell in ltration in BRCA
The in ltration abundance of six immune cell types in BRCA was analyzed using the Cibersort algorithm. From the bar charts of immune cell subset proportions (Fig. 3), CD8 + T cells and activated memory CD4 + T cells accounted for a large proportion of immune cell in ltration in BRCA. Therefore, CD8 + and CD4 T + cells were selected in the following analysis.

Identi cation of CD4 + and CD8 + T cell-related genes
The in ltrating abundance of CD4 + and CD8 + T cells is related to prognosis in BRCA [13]. The in ltrating abundance of CD4 + and CD8 + T cells was calculated using the Cibersort algorithm. Then, the immune cell-related genes were identi ed using the Spearman correlation test between overlapping genes and the in ltrating abundance of immune cells. A total of 39 CD4 + T cell-related genes and 78 CD8 + T cellrelated genes were identi ed. These genes were regarded as crucial genes involved in CD4 + and CD8 + T cell in ltration.

PPI network of CD4 + and CD8 + T cell-related genes
Proteins and their functional interactions form the backbone of cellular machinery, and connectivity networks are conducive to fully understand biological phenomena [31]. For CD4 + T cell-related genes, the PPI network contained 13 genes and 11 interactions (Fig. 4A). Of the 13 genes, 5 genes were upregulated whereas 8 genes were downregulated. For CD8 + T cell-related genes, the PPI network consisted of 56 genes (16 upregulated and 40 downregulated) and 76 interactions (Fig. 4B).
Breast neoplasm-related chemicals target mRNAs in the ceRNA network The etiology of many diseases involves the interactions between environmental chemicals and genes that regulate physiological processes [32]. The CTD provides information about chemical-gene/proteindisease relationships [32]. In this study, the chemical-gene interactions involving breast neoplasms were identi ed from the CTD, and were ltered using the mRNAs in the ceRNA network. A total of 31 chemicals were found to interact with the ve genes in the CD4 + T cell-related ceRNA network (Fig. 6A, Supplemental Table 3). For example, atrazine (CAS, 1912-24-9) was predicted to result in the increased expression of CAPN6 mRNA.
Similarly, 57 chemicals were found to interact with the 12 genes in the CD8 + T cell-related ceRNA network (Fig. 6B, Supplemental Table 4). For example, arsenic (CAS, 7440-38-2) was predicted to affect the methylation of the SLC9A7 gene.
CD4 + and CD8 + T cell-related genes associated with prognosis of BRCA To explore the prognostic value of CD4 + and CD8 + T cell-related genes, survival analysis was performed.
In total, 14 genes were found to be signi cantly related to prognosis of BRCA patients; of these, eight genes were commonly related to both CD4 + and CD8 + T cells (Fig. 7, Table 2), for example, MUC2. Among the 14 genes, 6 genes were found to be speci c to CD4 + or CD8 + T cells; METTL5 and MSTN were CD4 + T cell-related genes, whereas NMNAT3, GNAL, ZBED4, and RDH12 were CD8 + T cell-related genes. (a) Prognosis associated genes from the CD4 + T cells related genes; (b) Prognosis associated genes from the CD8 + T cells related genes; The genes marked in red represent the overlapped genes from both CD4 + T cells related genes and CD8 + T cells related genes.

Discussion
In this study, the RNA-seq data and clinical phenotype data from TCGA-BRCA were used to investigate the gene expression pattern in the immune response in BRCA progression. A total of 478 DEGs with a high vs. low stromal score, and 796 DEGs with a high vs. low immune score were identi ed, and the overlapping DEGs were found to be implicated in ion transmembrane transport, regulation of pH, and extrinsic apoptotic signaling pathways. In addition, a total of 39 CD4 + T cell-related genes and 78 CD8 + T cellrelated genes were identi ed, of which 14 genes were related to prognosis of BRCA patients, for example, MUC2. Moreover, for CD4 + T cell-related genes, the chr22-38_28785274-29006793.1-miR-34a/c-5p-CAPN6 axis was identi ed in the ceRNA network and for CD8 + T cell-related genes, the chr22-38_28785274-29006793.1-miR-494-3p-SLC9A7 axis was identi ed.
MUC2, also termed mucin-2, belongs to the mucin protein family, which are high molecular weight glycoproteins secreted to form an insoluble mucous barrier that protects the gut lumen [33]. MUC2 was found to be expressed in breast mucinous carcinomas, and Matsukita et al. indicated that overexpression of MUC2 in mucinous carcinoma might serve as a barrier to attenuate tumor aggressiveness [34]. Astashchanka et al. showed that MUC2 participates in the regulation of proliferation, apoptosis, and metastasis of breast carcinoma cells, indicating roles of MUC2 in therapy and in clinical outcome prediction in breast carcinoma [35]. In our study, the expression of MUC2 was correlated with both CD4 + and CD8 + T cell in ltration and was associated with prognosis of BRCA patients. Therefore, we speculated that MUC2 might regulate the aggressiveness of BRCA; this process is probably accompanied by T cell in ltration.
For CD4 + T cell-related genes, the chr22-38_28785274-29006793.1-miR-34a/c-5p-CAPN6 axis was identi ed from the ceRNA network. The miR-34 family (miR-34a/b/c) is composed of tumor suppressors, functioning in inhibiting proliferation, migration and inducing apoptosis, and are also important mediators of the p53 signaling pathway [36,37]. It has been reported that miR-34a serves as a tumor suppressor in triple-negative breast carcinomas by targeting the proto-oncogene c-SRC [38]. Similarly, Xiao et al. found that miR-34a can inhibit glycolysis and proliferation of breast carcinoma cells in breast carcinomas by directly targeting lactate dehydrogenase A, whose overexpression is associated with tumor growth and metastasis [39]. CAPN6, also termed calpain-6, is a member of an intracellular cysteine protease family, and this family is found to be abnormally expressed in malignant tumors [40]. Calpains are reported to be involved in various cellular activities in breast carcinomas, including cellular survival, apoptosis and migration [41]. MacLeod et al. suggested that calpain 1 and 2 play a pro-tumorigenic role in HER2 + breast cancer, whereas tumorigenesis can be delayed by disrupting calpain 1 and 2 expression [42]. Calpain 6 has been reported to relate to tumorigenesis and unfavorable prognosis in head and neck squamous cell carcinoma [40], and is considered to be a possible target in the treatment of sarcomas [43]. The role of calpain 6 in breast carcinomas was rarely reported. A previous study showed that calpain 1 and miR-34a/c were associated with kanamycin-induced inner ear cell apoptosis [44]. In our study, CAPN6/calpain 6 was predicted to be a target of miR-34a/c-5p, which was regulated by the lncRNA chr22-38_28785274-29006793.1. The role of the lncRNA chr22-38_28785274-29006793. 1 has not yet been reported. We speculated that the chr22-38_28785274-29006793.1-miR-34a/c-5p-CAPN6 axis might regulate cellular activities associated with CD4 + T cell in ltration in BRCA.
The abnormal expression of miR-494 has been reported in various cancers. However, the role of miR-494 in carcinogenesis is contradictory, including a tumor suppressor role [45,46] and an oncogenic role [47,48]. Zhan et al. revealed that miR-494 can inhibit the progression and metastasis of breast carcinomas by targeting P21 (RAC1) activated kinase 1 [49]. The proliferation and migration of MDA-MB-231 and MDA-MB-468 breast carcinoma cells can be promoted by highly expressing miR-183 or miR-494 [50].
Although several novel ndings were found in this study, there were some limitations. (1) Our study preliminarily analyzed the gene expression pattern of tumor-in ltrating CD4 + and CD8 + T cells of BRCA. However, further experiments are needed to con rm the expression of the DEGs and the predicted ceRNA axes. (2) The correlations between prognosis and the 14 identi ed genes should be further investigated using clinical trials. (3) The predicted chemical-gene interactions should be con rmed to provide research topics for the treatment of BRCA.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable. The work ow of the study.   Protein-protein interaction networks Protein-protein interaction network of CD4+ T cell-related genes (A) and CD8+ T cell-related genes (B). Red and green nodes represent upregulated and downregulated genes, respectively.