Mine TCGA Database for Tumor Microenvironment-Related Genes of Prognostic value in lung squamous cell carcinoma

Background: Tumor microenvironment (TME) plays a significant role in the development of cancer. However, the roles of TME in lung squamous cell carcinoma (LUSC) are not well studied. In our study, we aimed to identify differentially expressed tumor microenvironment-related genes as biomarker for predicting the prognosis of LUSC. Methods: We combined The Cancer Genome Atlas (TCGA) and Estimation of Stromal and Immune cells in Malignant Tumor tissue using Expression data (ESTIMATE) datasets to identified differentially expressed genes in lung squamous cell carcinoma microenvironment. Then, functional enrichment analysis and protein-protein interaction (PPI) network were conducted. The top six genes in the PPI network were regarded as tumor microenvironment-related hub genes. Finally, the relationship between hub genes and tumor-infiltrating immune cells was deciphered using TIMER. Results: Our study revealed that immune and stromal scores are associated with specific clinicopathologic variables in LUSC. These variables include gender, age, distant metastasis and prognosis. In addition, a total of 874 upregulated and 72 downregulated genes were identified. Functional enrichment analysis demonstrated a correlation between DEGs and the tumor microenvironment, tumor immune cells differentiation and activation. C3AR1, CSF1R, CCL2, CCR1, TYROBP, CD14were selected as the hub genes. A positive correlation was obtained between the expression of hub genes and the abundance of six immune cells. Conclusions: The results of the present study showed that ESTIMATE algorithm-based stromal and immune scores may be a reference indicator of cancer prognosis. We identified five TME-related genes, which could be used to predict the prognosis of LUSC patients. leukocyte migration, leukocyte proliferation, leukocyte cell-cell adhesion, lymphocyte proliferation, mononuclear cell proliferation, regulation of lymphocyte activation, positive regulation of cell activation, protein complex, cytokine binding, immunoglobulin binding, chemokine activity, chemokine receptor binding, cytokine activity, CCR chemokine receptor binding. receptor and protein binding. immune-related typical pathway: cytokine-cytokine receptor interaction, cell adhesion molecules (CAMs), intestinal immune network for IgA production, Phagosome, chemokine pathway.


Conclusions:
The results of the present study showed that ESTIMATE algorithm-based stromal and immune scores may be a reference indicator of cancer prognosis. We identified five TME-related genes, which could be used to predict the prognosis of LUSC patients.

Background
Lung cancer is the deadliest malignant cancer and the leading cause of cancer death in the world with about 2.09 million new cases and 1.76 million deaths each year [1,2]. According to histological examination, approximately 85% of lung cancer are non-small cell lung cancer (NSCLC), of which lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are the most common subtypes [3,4]. In different types of lung cancer, LUSC has the second highest incidence of cell carcinoma and mortality, which have been increasing in recent years [5].Platinum-based dual chemotherapy has been the cornerstone of first-line treatment in patients with advanced NSCLC, however, this pattern changed in 2016 with the outcome of the iconic KEYNOTE-024 trial, which improved overall survival(OS) by using pembrolizumab [6,7].Targeted therapies against some mutated genes, such as EGFR, ALK, ROS1, HER2, AKT1, RET,BRAF have proved to bring better clinical outcomes to adenocarcinoma patients, but the prognostic biomarker for LUSC patients is little known [8,9]. Therefore, efforts towards to the identification of LUSC are still needed, which are important for the development of targeted therapies.
Tumor microenvironment (TME) not only is an attractive target to develop novel strategies for cancers but also plays a key role during tumor initiation, progression and metastasis [10]. There are many kinds of cells in TME including various cell types (endothelial cells, immune cells, etc.) and extracellular cells (cytokines, growth factors, extracellular matrix, etc.) that are surrounding tumor cells and nourished by vascular networks [11]. Immune cells and stromal cells play a critical role in tumor message transfer, immune surveillance and tumor niche information [12]. The prospect of effective immunotherapies for cancer is now becoming a clinical reality [13]. Immune score as a prognostic factor has become a part of tumor diagnosis and prognosis [14]. Study has predicted prognosis and reflect the immune microenvironment of LUAD [15]. The ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data) algorithm is a new method for predicting tumor purity, and the presence of infiltrating stromal/immune cells in tumor tissues by analysis gene expression data [16]. The ESTIMATE algorithm has been applied to estimate the diagnostic and prognostic values of immune and stromal cells in many tumor types, including renal, thyroid cancer and glioblastoma [17][18][19].
However, the proportion of stromal/immune cells in LUSC has not been explored using the ESTIMATE algorithm.
The Cancer Genome Atlas(TCGA) is an open project, contain genomic, epigenomic, transcriptomic, and proteomic data [20].In this study, RNA expression patterns and clinical data including gender, age, tumor stage and survival state of LUSC patients were extracted from the TCGA database. The relationship between the ESTIMATE scores and clinical data was explored. Meanwhile, the relationship between tumor microenvironment related genes and the outcomes of LUSC patients was also examined. Besides, the hub genes were identified and their correlation with immune cell infiltration were evaluated.

Microarray data
The gene expression profiles of lung squamous cell carcinoma (LUSC) patients were extracted from the TCGA database (https://portal.gdc.cancer.gov/), including 502 LUSC tissues and 49 non-LUSC tissues. FPKM data was downloaded for differential analysis. Extracted clinical data included age, gender, tumor stage, survival state, and outcome. After the data were downloaded and transposed, the Estimation of Stromal and Immune cells in Malignant Tumor tissue using Expression data (ESTIMATE) algorithm was performed, and immune and stromal scores were calculated [16].

Identification of differentially expressed genes
Evidence for differential gene expression was analyzed using the "limma R" package (http://www.bioconductor.org/packages/release/bioc/html/limma.html) [21]. Differentially expressed genes (DEGs) were screened and obtained using the cut-off values of fold change (log FC) > 1 and adjusted the false discovery rate (FDR) to a P value < 0.05. Based on the cut-off values, patients were divided into low-and highstromal/immune score groups. Group comparisons of stromal/immune scores between different clinical indexes were performed by R package. P value 0.05 was considered statistically significantly different.

Heatmaps and clustering analysis
Clustering analysis and the construction of immune and stromal heatmaps were done using the pheatmap R package(https://cran.r-project.org/web/packages/pheatmap/index.html).
The overlapping DEGs would be used for further analysis. Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were used to perform the functional annotation of DEGs. GO analysis was done for assessment of biological processes (BP), molecular functions (MF), and cellular components (CC). GO, and KEGG pathway analyses results were processed by the "clusterProfiler"(http://www.bioconductor.org/packages/release/bioc/html/clusterProfiler.html),"enrichplot"(http://www. and "ggplot2" packages. A P value of less than 0.05 was used regarded as statistically significant.

Identification of hub genes
The PPI network of overlapping DEGs would be obtained from STRING (https://string-db.org/ ver11.0, update Jan 2019) [22]. Active interaction sources were set as follows: Textmining, Experiments, Databases, Co-expression, Neighborhood, Gene Fusion and Co-occurrence. The required minimum interaction score was set at 0.4. The barplot were generated by the R software (https://www.r-project.org/ver 3.6.2). The top six DEGs were selected as the tumor microenvironment-related hub genes.

Overall survival Analysis
The prognostic value of hub genes was evaluated using Gene Expression Profiling Interactive Analysis (GEPIA(http://gepia.cancer-pku.cn/)). GEPIA is a newly developed interactive web server for cancer and normal gene expression profiling and analyses [23]. The P value 0.05 was set as the cut-off value.

The Human Protein Atlas
All the data in the Human Protein Atlas(https://www.proteinatlas.org/) are open and composed of six independent parts, the Tissue Atlas shows the distribution of protein in all major tissues and organs of the human body [24]. We analyzed the hub gene protein expression in LUSC via immunohistochemistry (IHC) analysis.

Hub genes and tumor-infiltrating immune cells
The TIMER (https://cistrome.shinyapps.io/timer/) is a web resource for systematical evaluations of the clinical impact of different immune cells in diverse cancer types. The relationship between hub genes and tumorinfiltrating immune cells (B cell, CD4 T cell, CD8 T cell, neutrophil, macrophage and dendritic cell) was deciphered using TIMER [25].

Patient characteristics
The data on gene expression profiles and complete clinicopathologic variables of 484 LUSC case were extracted from the TCGA database in Jan 2020. The detail demographic and baseline characteristic of 484 LUSC patients are described in Table 1. Table 1 The baseline characteristics of all patients.   (Fig. 1d). These results indicate that immune scores, stromal scores and ESTIMATE scores are associated with specific clinicopathologic variables.
Kaplan-Meier survival curves were plotted to find out the potential correlation of overall survival with immune, stromal, and ESTIMATE scores. Firstly, 484 cases of LUSC patients were divided into high and low score groups based on their scores. Our results of ESTIMATE scores demonstrate a more prolonged overall survival of LUSC patients in the low score group compared with the high score group (Fig. 2a, median 539 d VS 587 d, P = 0.05 in log-rank test). Consistently, stromal (Fig. 2b, median 513.5 d VS 585 d ,P = 0.066 ) and immune (Fig. 2c, median 535 d VS 581 d ,P = 0.219) score results showed longer median overall survival of LUSC patients in the lower score group compared to the higher score group, although the differences were not statistically significant.

Comparison of gene expression profiles with immune scores and stromal scores in LUSC
We compared the data of 484 LUSC patients extracted from the TCGA database to reveal the correlation of gene expression profiles in immune scores and stromal scores. The stromal score based heatmaps are presented in Fig. 2d, which demonstrated that 1195 and 136 genes were up-downregulated in the high score than low score groups, respectively (fold change > 1, P < 0.05). The immune score based heatmaps are presented in Fig. 2e, which showed 1187 and 176 up-downregulated genes in the high than low score groups, respectively (fold change > 1, P < 0.05).Moreover, Venn diagrams (Fig. 2f,g) showed that 874 genes were commonly upregulated, and 72 genes were commonly downregulated.

Functional enrichment analysis of DEGs
Functional enrichment analyses by applying the clusterProfiler R package were conducted to determine the functions of the DEGs. Figure 3a shows the top 10 biological processes (BP), the top 10 cellular components (CC) and the top 10 molecular functions (MF) identified following GO analyses. The ten enriched BPs include T cell activation, leukocyte migration, leukocyte proliferation, leukocyte cell-cell adhesion, lymphocyte proliferation, mononuclear cell proliferation, regulation of lymphocyte activation, positive regulation of cell activation, regulation of T cell activation, positive regulation of cytokine production. All the above BPs are involved in immune cells differentiation and activation. The ten enriched CCs include external side of plasma membrane, secretory granule membrane, collagen-containing extracellular matrix, MHC class II protein complex, tertiary granule, tertiary granule membrane, MHC protein complex, ficolin-1-rich granule membrane, clathrin-coated endocytic vesicle, specific granule. All the above BPs are involved in extracellular matrix and membrane. The ten enriched MFs include carbohydrate binding cytokine receptor activity, cytokine binding, immunoglobulin binding, glycosaminoglycan binding, chemokine activity, chemokine receptor binding, cytokine activity, CCR chemokine receptor binding, heparin binding. Also, the above MFs are associated with surface receptor activity and protein binding. Notably, KEGG analysis (Fig. 3b) suggested that most DEG-related pathway were significantly linked to immune response.

PPI network and identification of hub genes
The present study used the online STRING tool to explore protein-protein interaction (PPI) networks among significant target genes. The functional network contained 153 nodes and 281 edges (Fig. 3c). According to the PPI network, the top six significant genes (C3AR1, CSF1R, CCL2, CCR1, TYROBP, CD14) were selected as hub genes (Fig. 3d).

The roles of hub genes in overall survival in LUSC
This study plotted Kaplan-Meier survival curves to identify the underlying functions of hub genes in the overall survival of LUSC patients. The log-rank test of six hub genes showed remarkably predict poor overall survival in LUSC except TYROBP (Fig. 4a, P < 0.05). And all six genes expression was significant (Fig. 4b, P < 0.05).

Immunohistochemistry of the five hub genes
Immunohistochemistry of the five hub genes were collected based on the Human Protein Atlas. The protein levels of 4 genes were higher in tumor tissues compared with normal tissues. Immunohistochemistry of CCR1were not found. (Fig. 5a) Protein levels of C3AR1 in normal tissue and tumor tissue. (Fig. 5b) Protein levels of CSF1R in normal tissue and tumor tissue. (Fig. 5c) Protein levels of CCL2 in normal tissue and tumor tissue. (Fig. 5d) Proteins level of CD14 in normal tissue and tumor tissue.The correlation between hub genes and immune cell infiltration The present study further analyzed the correlation between hub genes and immune cell infiltration. Our results revealed positive correlations between hub genes expression and immune cell infiltration (C3AR1 Fig. 6a, CSF1R Fig. 6b, CCL2 Fig. 6c, CCR1 Fig. 6d, CD14 Fig. 6e. Details are given in the Table 3. Table 3 The correlation between hub genes and immune cell infiltration.

Discussion
Worldwide, lung cancer remains a highly lethal disease and the most common cause of cancer death [26]. NSCLC remains a major cause of cancer-related mortality in both men and women with survival rate of 19.3% [27].In American, the rate of decline in lung cancer mortality has accelerated, with men dropping from 3% a year during 2008 through 2013 to 5% during 2013 through 2018 and women from 2% to nearly 4% ,spurring the biggest one-year drop in overall cancer mortality of 2.2% from 2016 to 2017.However, lung cancer still has more death in 2017 than breast, prostate, colorectal and brain cancers combined [2]. LUSC and LUAD are the two major subtypes of lung cancer, in which LUSC progresses faster than LUAD of the same stage [28]. Tumor-related prognosis or predictive markers in the immune microenvironment significantly changed the treatment prospects of lung cancer and significantly improved the prognosis of some patients, especially those with adenocarcinoma [29][30][31][32][33].
The tumor microenvironment consists of a complex tissue system through which immune and stromal cells can regulate anti-tumor immunogenicity. Recently, many studies have shown prognostic markers in tumor microenvironment for tumor prognosis. Zeng et al provides potential TME-related biomarker for the therapy and prognosis of RCC [34].Zeng et al describe the comprehensive features of gastric cancer TME characteristics and provide new strategies for cancer treatment [35].A present study found that many elements of TME other than tumor epithelial cells influence the progression of HNSCC [36]. Another study highlights the favorable prognostic impact of an immune-inflamed TME in LUAD [37]. There is also studies analyzed the immune microenvironment of stage I LUAD and found that the high expression of IL-7R and the lower expression of IL-12Rβ2 were all independent factors of poor prognosis [38].Yue et al identified three TME-related DEGs signature which could be used to predict the prognosis of LUAD patients [39].Chen et al revealed the differentiated regulation of immuneresponse related genes between LUAD and LUSC [28]. However, genome changes in squamous cell lung cancer have not been fully characterized and no molecular targeted drugs have been specifically developed for the treatment [40]. Moreover, prognostic biomarkers related to TME in LUSC is still lacking. In this study, we explore differentially expressedTME-related genes and reveal prognostic biomarkers for predicting the diagnosis and prognosis of LUSC.
The present study extracted data from the TCGA database to explore TME-related genes that affect the overall survival and tumorigenesis of LUSC. Subsequently, we identified prognostic and therapeutic biomarkers for LUSC. This study first assessed the relationship between clinicopathological variables and immune and stromal scores in LUSC. Our results suggest that immune score and stromal score are associated with specific clinicopathological variables (age, gender, tumor stage). In addition, immune and stromal score showed great potential in prognostic prediction with LUSC patients.
After distinguishing the high and low immune/stromal scores groups, we identified 946 differentially expressed genes (DEGs). The functional enrichment analyses demonstrated that these DEGs are involved in immune response, immune cells differentiation and activation, extracellular matrix and membrane. For example, GO analysis revealed that DEGs enriched T cell activation, leukocyte migration, leukocyte proliferation, leukocyte cell-cell adhesion, plasma membrane, secretory granule membrane, collagen-containing extracellular matrix, MHC class II protein complex, cytokine binding, immunoglobulin binding, chemokine activity, chemokine receptor binding, cytokine activity, CCR chemokine receptor binding. Also, the above MFs are associated with surface receptor activity and protein binding. This study also showed that DEGs were most enriched in immunerelated typical pathway: cytokine-cytokine receptor interaction, cell adhesion molecules (CAMs), intestinal immune network for IgA production, Phagosome, chemokine signaling pathway. Our results confirmed previous reports on genomic alterations and tumor microenvironment impact cancer proliferation and invasion and roles of immune cells and their interactions with cancer cells in LUSC [41][42][43].
In addition, we further clarified the interplay among DEGs by using the STRING tool to construct protein-protein interaction (PPI) networks. The results demonstrated that all the significant genes are involved in immune response, and the top six genes (C3AR1, CSF1R, CCL2, CCR1, TYROBP, CD14)were identified as hub genes.One research suggested that the exclusion of CD8 T cells from the tumor islets was associated with poorer clinical outcomes and lower lymphocyte activity in LUSC patients, but CSF1R blockade enhanced the migration and infiltration of CD8 T cells to the tumor islets [44]. CSF1R inhibitors represent a new immune-modulatory in tumor therapy [45]. A previous study reported that CCL2 overexpression was associated with total survival and progression-free survival in patients with LUSC [46]. Wang et al showed that CCR1 knockdown suppresses NSCLC cell invasion [47]. Although reports about C3AR1, and CD14 are lacking, the between those genes and immuneassociated disease and other tumors has been reported [48,49]. These data suggest that our results based on the TCGA database have predictive value. Notably, most of the genes we identified have not been previously reported in LUSC, suggesting that these genes should de further validated.
Increasing evidence suggested a significant correlation between immune cell infiltration and patient outcomes [50,51]. The current study revealed positive correlations between hub genes expression and immune cell infiltration (C3AR1, CSF1R, CCL2, CCR1, TYROBP, CD14). Therefore, these genes may provide more information about immune cell infiltration and clinical outcomes in LUSC patients.
Many studies have explored the correlation between gene expression and tumorigenesis and prognosis of NSCLC or LUAD in tumor/immune microenvironment. Recently, there are several treatment methods for targeting TMErelated genes and have got good results in clinical or clinical trial. We firstly used the ESTIMATE algorithm to explored the association between immune/stromal scores and LUSC clinicopathologic and got TME-related genes as the novel potential biomarkers for LUSC. Although our study has achieved very valuable results, this study was conducted in small cohorts and has significant limitations and clinical trials still need.

Conclusions
In conclusion, targeted therapy and immunotherapy progress rapidly, however, the overall progression-free survival rate in LUSC patients remains unsatisfactory. To improve the clinical outcomes of LUSC patients, there is an urgent need to monitor biomarkers of targeted treatment response. Using bioinformatics methods, we focused on TME-related genes of LUSC based on TCGA database and screened five potential prognostic genes (C3AR1, CSF1R, CCL2, CCR1, CD14), two of which (C3AR1 and CD14) lack relevant research in LUSC. Although the five prognostic markers identified in our study may still need lots of clinical trials to validation, these genes may provide prospects for prognostic evaluation of LUSC patients.  Association between immune/stromal scores and clinical characteristics in LUSC. a A significant association was revealed between gender and the level of immune scores (P=2.886e-04) and ESTIMATE scores (P=0.007). b A significant association was revealed between age (1 means age 602 means age ≥60) and the level of immune scores (P=0.009), stromal scores (P=0.017), and ESTIMATE scores (P=0.007) . c LUSC patients with distant metastases had higher immune scores (P=0.88, median scores 1034.73 VS 1016.71), stromal scores (P=0.307, median scores 56.57 VS -95.23), and ESTIMATE scores (P=0.666, median scores 1253.27 VS 948.14), although statistically not significant. d Stage IV cases have higher stromal scores (median scores -25.95 VS -65.06) and ESTIMATE scores (median scores 1038.32 VS 956.59) compared with stage I, although statistically not significant.

Figure 2
Overall survival and comparison of gene expression profiles with immune scores and stromal scores in LUSC. a Overall survival in LUSC patients with high ESTIMATE score is poorer than that of patients with low ESTIMATE scores (p=0.05). b Overall survival in LUSC patients with high stromal scores is poorer than that of patients with   Correlation of expression of hub genes in overall survival in LUSC. a Kaplan-Meier survival curves were generated for hub genes extracted from the comparison of groups of high (red line) and low (blue line) gene expression. b shown Expression on Box Plots of tumor (red) and normal (grey).

Figure 5
Immunohistochemistry of the five hub genes a Protein levels of C3AR1 in normal tissue and tumor tissue. b Protein levels of CSF1R in normal tissue and tumor tissue. c Protein levels of CCL2 in normal tissue and tumor tissue. d Proteins level of CD14 in normal tissue and tumor tissue.