Tumor microenvironmental prognosis analysis and molecular labeling of immune-related genes in Luminal B type breast cancer

Background: Tumor microenvironment (TME) cells is one of the important elements, constitute the tumor tissues and in predicting tumor clinical results and treatment effect has the important significance of clinical pathology. However, a detailed map for prognostic landscape of TME in Luminal B breast cancer is still lacking. Methods: ownloaded the expression profile and clinical follow-up information of luminal B breast cancer from TCGA and GEO, and used CIBERSORT to evaluate the infiltration mode of TME in 209 patients, and constructed the molecular subtype of luminal B breast cancer based on TME, further evaluated the relationship between molecular subtype and prognosis. DESeq2 was used to analyze the differentially expressed genes among molecular subtypes of luminal B breast cancer, and cox multivariate regression analysis was used for feature selection to construct TME signature. Results: ased on the median value of TMEscore, the samples were divided into High TMEscore and Low TMEscroe, and the relationship with prognosis, clinical features, immune gene expression and genomic variation was further evaluated. Different TME cells have significant correlation in luminal B breast cancer. These TME cells stratified different clinical results of luminal B breast cancer, and further TMEscore was established by Cox regression analysis. High TME score is associated with poor prognosis. Meanwhile, this study showed that CXCL10, CXCL9, GZMB and other immune activation genes and PDCD1LG2 and other immune checkpoint genes have higher expression levels in high TME score, and the mutation frequency of TP53 gene was lower in risk-H than that in risk-L. Conclusion: In this study, we systematically evaluated the TME infiltration patterns of 209 TCGA luminal B breast cancer patients and developed the TME score approach. It was found that TME score is a powerful prognostic biomarker and predictor of response to immunocheckpoint inhibitors and provides a new strategy for luminal B breast cancer immunotherapy. predict checkpoint immunotherapy we elucidate the comprehensive landscape of interactions between Luminal B breast cancer clinical characteristics and infiltrating TME cells, help of several computational algorithms, we established a method to quantify the infiltrating pattern of TME -TMEScore.


Introduction
Breast cancer is one of the most common malignant tumors in the female population, and its incidence has leapt to the top of female malignant tumors. The latest global cancer data estimate that in 2018, there were 8.6 million new cancer cases and 4.2 million deaths among women 4 worldwide, including 2.1 million new cases of breast cancer, accounting for 24.2% of the total, and 630,000 deaths, accounting for 15% of the total, both ranked first (1). The continuous advancement of gene detection technology has promoted the research on the molecular level of breast cancer. In 2000, Perou et al. first proposed the concept of molecular typing (2), Molecular typing is determined by the intrinsic gene expression of breast cancer (2,3). Currently, breast cancer is roughly divided into four subtypes, namely luminal type A, luminal type B, HER-2 overexpression type and basal type (4,5). Although most women with luminal B-like breast cancer use endocrine therapy alone, some of them have fatal recurrences, so understanding the development of luminal B-like breast cancer and developing new prognostic factors is an urgent problem to be solved.
Different tumor microenvironments (TMEs) are formed in different stages of tumor development, and have multiple capabilities to induce adverse and beneficial consequences of tumorigenesis, immune cells can be activated to promote tumor growth and progression, most likely to be affected by the tumor microenvironment (6). More and more research shows that TME plays a vital role in cancer progression and therapeutic response (7,8), For example, differences in the compositions of resident cell types within the TME, including cytotoxic T cells, helper T cells, dendritic cells (DCs), tumorassociated macrophages, mesenchymal stem cells, and associated inflammatory pathways, have been reported in patients with cancer (9)(10)(11)(12). Detection of TME at the time of diagnosis can reflect immune response and chemotherapy benefits (13), and changes in the number of infiltrating CD8 + T cells, CD4 + T cells, macrophages, and fibroblasts in TME are associated with clinical and prognosis of multiple malignancies (12,(14)(15)(16). There are several algorithms that can estimate the abundance of immune cells and other cells in TME (17)(18)(19). Although some studies have used these methods to explore the relationship between TME infiltration and clinical outcomes of tumors (20,21), a comprehensive landscape of TME infiltration has not been explored in luminal B breast cancer.
In this study, based on the clinical sample annotation of luminal B breast cancer gene expression profile, CIBERSORT algorithm was used to estimate 22 types of immune cells and the proportion of cancer associated fibroblasts, 209 TME infiltration mode of luminal B breast cancer patientsn were assessed, and systematically the TME phenotype and luminal B sample characteristics and clinical 5 pathologic features associated with breast cancer genome. Therefore, we established a method to quantify the penetration pattern of TME (TME score). TME score was found to be a powerful prognostic biomarker and predictor of response to immunocheckpoint inhibitors.

Data collection and processing
We download expression profile data of TCGA Affymetrix HT Human Genome U133a microarray platform from GDC API database and the prognostic information on August 14, 2019. The samples without clinical data and follow-up time less than 30 days were removed, and 209 luminal B breast cancer samples were included in the final study as training set. Similarly, we downloaded two sets of chip data sets GSE1456 and GSE21653 from the Gene Expression Omnibus (GEO) database (22), which integrates the express expression data and clinical follow-up information of 65 luminal B breast cancer samples from Affymetrix Human Genome U133A Array, Affymetrix Human Genome U133 Plus 2.0 Array, Affymetrix Human Genome U95 Version 2 Array, with a follow-up time of more than 30 days. The sample informations are shown in Table 1. For the probe data, the R package hgu133plus2.db was used to map the probe to the GeneSymbol, and the median was obtained when the multiple probes correspond to the expression of one gene, and the probes that match multiple genes were removed.  Living  165  16  23  Dead  33  7  19   pathologic_T   T 1  37  --T 2  127  --T 3  24  --T 4 10 -- Consensus clustering to obtain molecular subtypes associated with TME-infiltrating cells Consensus clustering was performed using the ConcensusClusterPlus (23) package in R to determine subgroups of GBM based on the TME-infiltrating cells. As zhang et al. (24) evaluated the optimal number of clusters between k = 2-10, repeated 1000 times to ensure the stability of the results, and used the R software package pheatmap for visualization.
Differentially expressed genes (DEG) associated with the TME phenotype To identify genes associated with TME cell-infiltrating patterns, the linear model was used to analyze the gene expression differences among phenotypic related TME subgroups. Specifically, the R software package DESeq2 (25) was used to calculate the differential genes, and FDR < 0.05 was selected. In order to include more candidate genes, Foldchange was not restricted. TME phenotype related differential gene re-clustering Nonnegative matrix factorization (NMF) is an unsupervised clustering method that is widely used in the discovery of genomics-based tumor molecular subtypes (26,27). In order to further observe the relationship between the expression of TME phenotype-related differential genes and phenotype, we used the NMF method to re-clusters the samples based on the expression profiles of TME phenotyperelated differential genes, and analyzed the clinical features of the re-clustered samples. NMF The method selects the standard "brunet" for 50 iterations, the number of clusters k is set to 2 to 10, and the average profile width of the common member matrix is calculated using the R package NMF (28), and the minimum member of each subclass is set to 10.

Dimension reduction and generation of TME gene signatures
In order to obtain robust TME gene signatures, the random forest algorithm is further used to assess the importance of these DEGs. Specifically, the copth function of survival of R software package was used for univariate survival analysis, and the selection threshold was 0.05, and the genes with significant prognosis were included for randomForest feature selection using randomForest of R software package, and mtry of each segmentation set as 1-235 and ntree = 500, and the mtry value with the lowest error rate was selected as the optimal mtry value of the randomForest algorithm.
Then ntree = 100 was selected according to the error rate of random forest. Finally, each DEG was ranked according to its importance, and 95% DEGs with cumulative importance > 95% were selected as candidate feature genes. k-means (29) was used to divide these genes into 5 categories. Psych R package was used for principal component analysis of the expression profiles of 5 categories of genes, and the first principal component was extracted as signature score after 100 iterations. This approach has the advantage of focusing the score on the set with the largest block of well-correlated (or anticorrelated) genes in the set, while downweighting contributions from genes that do not track with other set members. For gene type j, signature score formula of samples is as follows: j represents the jth class of the five types of genes, wherein n j represents the number of genes of the jth gene, Pc1 i represents the first principal component coefficient of the ith gene of the jth gene, and Exp i indicates the first The ith gene expression level of the j-type gene.
PCA analysis was conducted on signature G1, signature G2 and signature G3 using R's psych package. For each gene signature, 100 iterations were conducted to obtain the optimal number of 8 principal components (PCs), and then the respective PCs scores were calculated. PC1 scores of G1, G2 and G3 were selected as the final score. Cox multivariate regression was used to obtain the signature score risk coefficients of each G1, G2 and G3. For any sample, the TMEScore formula is as follows: Where, is the multivariate regression coefficient of each signature, and PC1 is the PC1 score of each signature.
Relationship between TME score and clinical features To observe the relationship between TMEScore and clinical phenotype, the samples were divided into two groups based on the median TMScore of the samples, which compared the prognostic differences between high TMEScore and low TMEScore, respectively. Similarly, the relationship between high and low TMEScore and age and gender is analyzed.
Relationship between TME Score and tumor genomic variation To observe differences in genomic variation between high TMEScore and low TMEScore samples, we downloaded SNP data from TCGA, removed intron and silent mutations, and used fisher's exact test to analyze the difference in the mutations between the two samples, selected the threshold p < 0.05.

Statistical Analysis
The normality of the variables was tested by the Shapiro-Wilk normality test (33), unless otherwise specified. For the comparison of the two groups, the statistical significance of the normal distribution variables was estimated by the unpaired Student t test, and the non-normal distribution variables were analyzed by the Mann-Whitney U test. For comparisons between the two groups, the Kruskal-Wallis test and the one-way ANOVA were used as non-parametric and parametric methods, respectively (34). Correlation coefficients were calculated by Spearman and distance correlation analysis. The two-sided Fisher exact test was used to analyze the contingency table, and we used the Benjamini-Hochberg method to convert the P value to FDR. The Kaplan-Meier method was used to generate survival curves for the subgroups in each data set, and the logrank test was used to determine the statistical significance of the differences, with significance being defined as P < 0.05.
All of these analyses were performed in R 3.4.3, and all analyses were based on default parameters unless otherwise stated.

TME landscape in Luminal B breast cancer
To investigate the association of various immune cells in TME in Luminal B breast cancer, we used Macrophages M1 are associated with a better prognosis (log rank p < = 0.05, HR < 1) (Fig. 1B). First, the scores of 6 immune cells with significant correlation with prognosis were selected for consensus clusterPlus, and the optimal number of clusters between k = 2-10 was evaluated. 1000 times was repeated, and k = 3 was selected as the optimal number of clusters according to the CDF value and Delta area (S1_Figure). TME scores are grouped into 3 categories, defined as TMEC1-TMEC3, According to the clustering results, immune cells such as B cells naive and T cells regulatory (Tregs) scored significantly higher in TMEC1, while T cells CD4 resting and Macrophages M1 scored 10 significantly higher in TMEC2. Macrophages M2 and NK cells resting have relatively high scores in TMEC3 (Fig. 1C). The prognosis analysis of OS between TMEC showed that there was a significant difference in OS prognosis between TMEC (log rank p < 0.0001), and TMEC3 had the worst prognosis, TMEC2 had the best prognosis, and TMEC1 was between the two (Fig. 1D). Furthermore, we found that there were significant prognostic differences between TMEC3 and TMEC1 and TMEC2, respectively, while there were no significant differences between TMEC1 and TMEC2 (S2_Figure). We also analyzed the prognostic differences of the 22 immune cell scores among the three types of samples, 12 of which (54.54%) had significant differences (Fig. 1E). In conclusion, TME score may be closely related to the development of luminal type B breast cancer.

Differential expression and functional analysis of genes between TMEC
To investigate differences in gene expression patterns between different TMECs, TCGA data was selected for TMEC differential expression analysis. The differentially expressed genes (DEGs) between TMEC3 and TMEC1, TMEC2, which had the worst prognosis, were analyzed using the DESeq2 tool, and 241 DEGs shared between TMEC3/TMEC1, TMEC3/TMEC2 for subsequent analysis ( Fig. 2A). Based on a total of 241 DEGs, we first filtered out the genes with a value of 0 in 50% of the samples, and finally obtained 235 genes for subsequent analysis. NMF was used for reclassification analysis of TCGA samples to obtain two stable samples, which were defined as GeneC1 and GeneC2, respectively ( Fig. 2B). OS prognostic analysis showed that there was a significant difference in total survival between GeneC1 and GeneC2 (Fig. 2C). Comparing the distribution of GeneC1 and GeneC1 on 22 immune cell scores, GeneC1 and GeneC1 were significantly different in various TMEs. For example, GeneC1 with the worst prognosis scored significantly higher on Macrophages M0, Macrophages M1 and Macrophages M2 than GeneC2 (Fig. 2D).

Construction of the TME signature
To further screen the 235 DEGs shared between TMECs, we evaluated the importance of these 235 DEGs using the R package random Forest. According to the plot of the random forest, ntree = 100 was selected (S3_FigureA), and 24 candidate genes were selected by selecting DEGs with cumulative importance ˃ 95% (S3_FigureB, S3_FigureC), and these 24 genes are mainly enriched in T cell activation, NF − kappa B signaling pathway, Primary immunodeficiency and other immune-related GO and KEGG Pathway (Fig. 3A, B). K-means algorithm was used to cluster them, and the number of clustering was 3 (Fig. 3C), defined as signature G1, signature G2, signature G3, and contains 10, 6 and 8 genes, respectively (Fig. 3C), among G3 was in the high expression group, G2 was in the low expression group, and G1 was in the middle (Fig. 3D).
As described in the Methods section, TMEScore is built to calculate the TMEScore for each sample in the training set. Comparing TME score of Gene C, we found that the worst prognosis of GeneC1 was significantly higher than the best prognosis of GeneC2 (Fig. 3E, F). With the median of TME score (0.3657), the samples were divided into two categories: risk-h and risk-l. The prognosis of OS in the risk-h group and the risk-l group was significantly different (log rank p = 0.0024, HR = 2.928) (Fig. 3G).

Relationship between clinical characteristics of TMEscore and immune gene expression
Furthermore, we assessed the relationship between TME scores and clinical characteristics, and found no significant difference ( Figure S4). In order to study the relationship between different TME and immune state, the relationship between the expression of immune activation genes (CXCL10, CXCL9, GZMA, GZMB, PRF1, IFNG, TBX2, TNF, CD8A) and TMEC, GeneC and TMEScore was analyzed, and there were different expression patterns in different GeneC, TMEScore and TMEC, in which the expression of CXCL10 and CXCL9 in the risk-H group with poor prognosis was significantly higher than that in the risk-L group (Fig. 4A). The relationship between the expression of immune checkpoint genes (PDCD1, CTLA4, LAG3, IDO1, CD274, PDCD1LG2 and HAVCR2) and TMEC, GeneC and TMEScore was analyzed, although the expression level was not high, different GeneC, TMEScore and TMEC also had different expression trends, among which the expression of HAVCR2 and IDO1 in the risk-H group with poor prognosis was significantly higher than that in the risk-L group (Fig. 4B). The expression levels of TGF/EMT pathway genes VIM, ACTA2, COL4A1, TGFBR2, ZEB1, CLDN3, SMAD9 and TWIST1 in TMEC, GeneC and TMEScore were analyzed, and found that there was little difference in the expression patterns of different GeneC, TMEScore and TMEC, only TGFBR2 was significantly higher in the risk-h group with poor prognosis than in the risk-l group (Fig. 4C). Similarly, the same phenomenon was observed in three groups of TME samples ( Figure S5, S6). In conclusion, different TMEScore is closely related to the expression of immune genes.

TME characteristics of tumor genome
Patients were divided into risk-h and risk-l groups according to TME score. We compared the relationship between TME score and genomic variation and screened out a group of important genes related to TME score. Fisher test was used to compare the genes with significant difference in mutation frequency between risk-h and risk-l groups (intron and silent mutations were removed), and a total of 14 genes with prominent mutation were obtained (Fig. 5). The results showed that the mutation frequency of TP53 gene in risk-h was lower than that in risk-l, which might indicate that these genes were significantly correlated with TME in luminal B breast cancer.

Comparison of risk models with other models
Three prognostic-related risk models were selected: 4-gene signature (Luthra) Fig. 6A-D). Similarly, the prognostic KM curves of the three models were observed, and found that no significant differences in the Risk-H and Risk-L group samples (Fig. 6E-G). Finally, in order to compare the predictive performance of these models on Luminal B breast samples, the rms package in R was used to calculate the concordance index (Cindex) of the three models and the TMEscore model. The TMEscore model of the four models has the highest C-index (Fig. 6H), indicating that the overall performance of the model is better than the other three.

Discussion
Up to now, malignant tumor is still a major threat to human health. In the past, research has tended to focus on the tumor cells themselves. In 1989, Paget (38) put forward the theory of "seed and soil", 13 which received widespread attention once it was put forward, and thus the tumor microenvironment entered the researchers' vision. With the development of the research, it is found that the tumor is a complete tissue with its own homeostasis (39), which makes the treatment plan of the tumor gradually shift from focusing solely on the tumor itself to simultaneously intervening in the tumor microenvironment. In recent years, immunocheckpoint inhibitors have shown significant efficacy in the treatment of some advanced cancer patients (40).  (42). Therefore, it is important to develop biomarkers to predict the benefits of checkpoint immunotherapy, and the latest data support the view that TME plays a key role in checkpoint inhibitor immunotherapy (43). The role of PD-L1 blockade is enhanced by regulating the immune microenvironment in small cell lung cancer (44). Multi-functional nano-regulator reshapes immune microenvironment and enhances immune memory of tumor immunotherapy (45). Therefore, the development of biomarkers to predict checkpoint immunotherapy is extremely important. Here, we have elucidate the comprehensive landscape of interactions between Luminal B breast cancer clinical characteristics and infiltrating TME cells, and with the help of several computational algorithms, we have established a method to quantify the infiltrating pattern of TME -TMEScore.
14 Comprehensive analysis showed that TMEScore was a prognostic biomarker for Luminal type B breast cancer, and high TMEScore was associated with poor prognosis. An immune-related gene expression pattern was identified in liver tissues of patients with early-stage HCC that associates with risk of HCC development (46). Zemek RM et al. found that activation of a STAT1/NK axis in the tumor microenvironment could predicts response to immune checkpoint blockade, which suggest a biomarker-driven approach to patient management to establish whether a patient would benefit from treatment with sensitizing therapeutics before immune checkpoint blockade (47). In this study, it was found that CXCL10, CXCL9, GZMB and other immune activation genes as well as PDCD1, CTLA4 and other immune checkpoint genes were highly expressed in high TMEScore, which indicated the potential of TMEScore as a marker for the benefit of immune checkpoint blocking therapy. TP53 mutation is one of the factors of breast cancer and plays an important role in clinical treatment and prognosis (48). The frequency of TP53 mutations in high TMEScore samples was significantly lower than that in low TMEScore samples. These data can provide a new perspective for studying the mechanism of TME formation and explore the role of individual mutations in Luminal B breast cancer immunity and immunotherapy.
Although we identified potential candidate immune gene markers involved in Luminal type B breast cancer in a large sample using bioinformatics techniques, further validation should be performed in a prospective cohort of immunotherapies to more fully define cut-off values to be used. Second, given the heterogeneity of different tumor area, on the edge of the core and invade the tumor to evaluate the immune system infiltration is appropriate, because not all of the patients with high TMEScore immunotherapy has greater benefits, so it should be more clinical factors into the model to improve accuracy, in the end, only the results obtained through bioinformatics analysis is inadequate, experimental verification is needed to confirm these results. Therefore, further genetic and experimental studies with a larger sample size and experimental validation are needed.

Conclusions
In summary, in this study, we systematically evaluated TME infiltration patterns from 209 Luminal B breast cancer patients and developed methods for TME infiltration patterns. TME score was found to 15 be a strong prognostic biomarker and predictor of response to immunocheckpoint inhibitors.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
All authors agree with publication.

Data Availability
The data used for supporting the results of the study are included within the article.

Conflicts of Interest
The authors declare no conflicts of interest.

Fund
This work was supported by grants from National Nature Science Foundation of China (NCFS No.81860469).

Authors' contributions
QL designed the study. FJG and KH contributed to the literature search. XML and NJ contributed to date acquisition; QYW and YL analyzed data; MGX and SHS interpretated data; QL was responsible for writing the initial draft of the manuscript. The manuscript has been read and approved by all authors.       The relationship between TME and genomic mutation characteristics. The horizontal axis represents samples, the vertical axis represents genes, the black rectangle represents mutations, and the gray represents no mutations.
26 Figure 5 The relationship between TME and genomic mutation characteristics. The horizontal axis represents samples, the vertical axis represents genes, the black rectangle represents mutations, and the gray represents no mutations.