The tumor environment immune phenotype of LUSC by genome-wide analysis

Background To compare the landscape of tumor microenvironment (TME) of lung squamous carcinoma (LUSC) in different immune pattern and explore potential factors on immune therapy and prognosis


Background
Lung cancer was one of the most common cancer worldwide (12% of all cancers), which caused enormous economic and medical burdens every year. 1 .About 80-85% of lung cancers are non-small cell carcinoma (NSCLC) 2 and squamous cell carcinoma is one of the major subtypes of NSCLC (more than 30% 3 ).Strategies targeting the immune responses have demonstrated signi cant antitumor activity across a range of solid tumors, including NSCLC.Several immune therapy drugs including immune checkpoint inhibitors such as pembrolizumab and nivolumab have been widely applied in the treatment of patients with LUSC and showed excellent results [4][5][6] .
The tumor microenvironment (TME) could signi cantly impact the response of immunotherapy.
Microenvironment-mediated drug resistance can be induced by soluble factors secreted by a tumor or stromal cells, while different immune cells can also improve or obstruct therapeutic e cacy and may vary in their activation status within the TME.Therefore, evaluating the TME status can help us to predict the potential response to immunotherapy.
In our study, we collected 728 LUSC patients and classi ed them according to the immune cells' in ltration abundance in the tumor samples calculated by GSEA.Then we integrally compared the genomic signatures, clinical characteristics and immune microenvironment including immune checkpoints factors between patients with different immune cells' in ltration abundance and seek valuable potential biomarkers to predict the response to immunotherapy.We hope this research will offer some important insights into the complex relationship between the heterogeneity of intratumoral immune cells, tumor molecular subtypes, and disease progression in LUSC.

Method And Materials
Acquisition and preprocessing of LUSC data sets Discovery cohort: Gene expression data of LUSC patients (FPKM normalized) and corresponding clinical information of The Cancer Genome Atlas (TCGA) were downloaded from the UCSC Xena browser (GDC hub: https://gdc.xenahubs.net) 7.The data with unclear prognosis information including outcome status and survival time was removed.
Validation cohort one: Gene Expression Omnibus (GEO) data of LUSC patient's data was composed by the datasets including GSE3141, GSE8894, GSE19188, GSE29013, GSE30219, GSE37745, GSE43580, GSE50081 and GSE115457 from the same chip platform (Affymetrix Human Genome U133 Plus 2.0 Array) after systematic screening [8][9][10][11][12][13][14][15] .The expression data and clinical data were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo).The probe sets of Affymetrix Human Genome U133 Plus 2.0 chips were annotated to gene names based on the annotation platforms GPL570.We have eliminated the batch effect due to the heterogeneity among different studies by the COMBAT empirical Bayes method in the gsva package and performed the data normalization with the limma package 16,17 .
Validation cohort two 44 samples were obtained from patients with lung squamous-cell carcinoma who underwent surgical resection between July 2012 and December 2012 at Zhongshan Hospital, Fudan University.Every patient was provided informed consent to conduct genomic studies in accordance with the ethical principles of the Declaration of Helsinki 18 .The study was approved by the ethical committees of Zhongshan Hospital (No. 2011-219(2)).All pulmonary resections were performed by experienced thoracic surgeons and quali ed pathologists to con rm the diagnosis of LUSC.Patients with metastasis, history of chemoradiotherapy or immunotherapy, and unclear survival information were excluded.Annual follow-up was performed by call or clinic.RNA sequencing for all tumor samples was performed using Illumina Hiseq 2500 and BGI-500RNAseq platforms.

Classi cation based on microenvironment immune cell abundance
We have used the gene signatures proposed by Bindea et al. to detect the relative abundances of the 24 immune cell populations by GSVA algorithm 19 , which consist of 585 genes representing 24 microenvironment cell subsets from both innate and adaptive immunity (Supplement Table 1).GSVA algorithm was an unsupervised gene set enrichment method that computes an enrichment score by integrating the collective expression of a given gene set relative to the other genes in the sample 20 .A matrix containing the enrichment score ranging from − 1 to 1 for each cell type in each tumor sample was produced by GSVA function, representing the relative abundances of the immune cell populations used for further analyses.

Cluster analysis and construction of the random forest classi er model
Samples were grouped based on the immune cells in ltration pattern by an unsupervised clustering method based on Euclidean distance and Ward's linkage and the optimal number of "TME clusters" was determined based on the percentage of the variance of the data using the ConsensusClusterPlus package with 1000 repeats 21,22 .Then we calculated the correlation matrix by Spearman's test.The TME cluster classi er model was established using the random forest algorithm by randomForest package in R, a machine learning dimension reduction strategy based on the construction of thousands of classi cation or regression trees 23,24 .Patients in the TCGA database were randomly assigned to training and testing groups at a ratio of 7:3 25 .

Differentially expressed genes and functional analysis
The expressed genes in different TME clusters were evaluated by the limma package in R and the threshold for identi cation of DEGs was set as p < 0.05 and logFC (fold change) > 0.5 26 .Functional enrichment analysis of DEGs was performed by the clusterPro ler package in R. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms were completed in R and the cutoff was adjusted P < 0.01 and false discovery rate (FDR) < 0.05.Single-sample GSEA analysis was carried out using the GSVA Bioconductor package according to the gene signatures associated with tumor development and immune response 20 .We also identi ed pathways that were up-or downregulated in TME clusters with GSEA.Gene sets were obtained from the MSigDB database of the Broad Institute, and c5.bp.symbols gene sets were selected to perform quanti cation of pathway activity 27 .

Comparison of somatic mutations and copy number variation among TME clusters
Somatic mutation and copy number variation data were obtained from the Xena browser.We have compared the somatic mutations and copy number variation in TME clusters by Kruskal-Wallis test and the results were shown with the oncoplot function in the maftools package.

Statistical analysis
Categorical variables were compared using Fisher's exact test and Pearson's c2 test and continuous variables were compared using Student's t-test and the Wilcoxon test.Kaplan-Meier survival estimates were generated and visualized by the ggplot2 package.Multivariable Cox regression analyses were used to test the independent prognostic value of the immune clusters using the R package survival and the coxph function.All analyses were performed in R version 3.6.1.Unless otherwise stated, results were considered statistically signi cant, if p-value < 0.05.The p-values were all two-sided.

Result
The landscape of the microenvironment phenotype in LUSC Our study included 493 LUSC patients in TCGA, 370 LUSC patients in GEO and 44 LUSC patients in our hospital.The design of the study was shown in supplement Fig. 1A.Patients from TCGA were divided into the discovery cohort while the patients from GEO and our hospital were divided into the validation cohort.We used the GSVA enrichment score of the 24 immune cell populations to de ne the relevant abundance in each tumor sample to describe the immune in ltration phenotype in LUSC patients (supplement Table 2).We employed the unsupervised consensus clustering base on the GSVA score of the 24 cell populations by the ConsensusClusterPlus package in R, and after k-means clustering, we found the best separation was a dichotomy (supplement Fig. 1B).We divided the LUSC patients from TCGA into two heterogeneous clusters named TME cluster A and TME cluster B, which meant the different immune in ltration patterns of the 24 immune cells.(shown in Fig. 1A).The TME cluster A had 243 patients while the TME cluster B covered 250 patients.The difference between 2 TME clusters was the immune in ltration pattern.We also found the different 5-year OS and MST between two TME clusters (Fig. 1B, A vs. B, 43.9% vs. 52.2%,p = 0.028) (MST, A vs. B, 49.0 months vs. 61.9months).TME cluster A was marked with blue and the B group was marked by red, which meant the different immune cell in ltration abundance.Red meant higher in ltration of the innate and adaptive immune response cell including pDCs, Tgd cells, macrophages, B cells, T cells, cytotoxic cells, and Treg cells.There was a signi cant difference between the two TME clusters groups (Fig. 1C) and the relevance between every immune cell and prognosis was shown in Fig. 1D.We found there was a signi cant association between most immune in ltration cells and survival.According to the baseline analysis (Fig. 1E), we also found only sex was associated with immune cell in ltration (p = 0.017) while the age, stage, and location were not related to immune cell in ltration.Male patients had higher immune cell in ltration after logistic analysis (p = 0.046).

Construction and validation of the random forest classi er model
We have established a new classi cation method through random decision forests based on Bootstrap aggregating and random subspace method to sort patients with LUSC based on the different immune in ltration relative abundant.We divided randomly the TCGA patients into 2 cohorts including 345 patients in train cohorts and 148 patients in the test cohort.And the relative abundance of 24 immune cell populations was input variate while the unsupervised TME cluster (Fig. 1A and supplement Fig. 1C) was set as the response factor.The most suitable mtry and ntrees were selected as 4 and 3000 and we use the random forest model to verify the training and internal testing cohorts.The accuracy rate was 100% and 99.3%, so we could recognize it was a stable model and the separation of the scatter diagram also showed the validity of the model.We could nd there was nearly no overlapping area in the scatter diagram (supplement Fig. 5A).Therefore, we could use the random forest model to verify the patients in GEO and our hospital.The classi ed results of the GEO cohort and our hospital were shown in Fig. 2A-2B.And we found the immune cell in ltration in GEO and hospital patients were similar to patients in TCGA, which also certi ed the validity of the random forest model again.The TME cluster B in the GEO cohort had more immune cell in ltration (shown in Fig. 2C) and the TME cluster B had a higher 5 years OS rate (shown in Fig. 2E).Patients in our hospital were also classi ed into 2 groups by the model and the TME cluster B had more immune in ltration relative abundant (Fig. 2D), but there was no survival difference between 2 groups in our hospital (Fig. 2F, 78.3% vs. 84.0%,p = 0.640), which might be due to the lack of samples.

DEGs and functional annotation
The 5-year survival difference in different tumor microenvironment phenotypes may be related to the DEGs.So, we have examined the expression pro les from TCGA and we found 1828 signi cant different genes including 301 up-regulated genes in TME cluster A and 1527 up-regulated genes in TME cluster B (supplement Table 3).The result was shown in Fig. 3A and we simultaneously labeled the top 10 upregulated or down-regulated genes.The KEGG and GO enrichment analysis was operated by clusterPro ler in R. We sought out the enrichment pathways on immune cell activation and adhesion in TME cluster B by KEGG and GO analysis.The enrichment pathways in TME cluster B included the cytokine-cytokine receptor interaction pathway and other metabolism signal pathways (Fig. 3B-3C).The above DEGs enrichment results were consistent with the immune in ltration pattern in TME cluster B. At the same time, the GSEA algorithm was performed to further verify the enrichment pathways and showed similar results (supplement Fig. 3).Besides, we also have utilized the ssGSEA to compare the different patterns in those TME subgroups and the results showed that immune-related pathways were upregulated in the TME cluster B group, which was consistent with the KEGG/GO result (supplement Table 4).Several important signal pathways including EMT, TGF-β, RAS and so on were also up-regulated in TME cluster B, which meant those signal pathways may play an important role in immune in ltration (supplement Fig. 4A).The DEGs and functional annotation analysis of data from GEO and our hospital was shown in supplement Fig. 4B-4G.

Immune microenvironment traits associated with TME clusters
The difference of 2 TME clusters was immune in ltration pattern, so we have compared the expression level of the immune-related genes, cytokines, and miRNA of LUSC patients in TCGA to explore the tumor environment difference characteristics.Pervious a POPLAR trial 6,28,29 had mentioned a 7-gene sign including (CD8A, GZMA, GZMB, IFNγ, EOMES, CXCL9, CXCL10, and TBX21) which was associated with activated T cells, immune cytolytic activity, and interferon-γ expression.Another article quanti ed the cytolytic activity of the local immune in ltrate and devised a simple and quantitative measure of immune cytolytic activity ('CYT') based on transcript levels of two key cytolytic effectors, granzyme A (GZMA) and perforin (PRF1), which are dramatically upregulated upon CD8 + T cell activation.In our study, the 8 genes associated with the cytotoxic function was up-regulated in TME cluster B (p < 0.05), which meant the TME B had higher immune cytotoxic (Fig. 4A).Besides, we also found some genes related to innate immunity including MYD88, TLR3 and so on were up-regulated in TME cluster B (p < 0.05) but the MAVS gene was down-regulated (Fig. 4B).Meanwhile, the TME cluster B had a more MHC-associated antigen molecular expression than cluster A. We still compared the miRNA between in 2 clusters (shown in supplement Fig. 5D).According to the heat map, we found the TME cluster B had more immune cells including activated cells and suppressor cells such as Treg cells or iDCs.Therefore, we have explored the immune activation level by calculating the ratio of CD8 + T cells/ Treg cells and aDC/iDC.The results were shown that the TME cluster B group had a higher ratio, which meant the TME cluster B had a more activated immune environment including effect cells and antigen-presenting cells (Fig. 4C).The expression of immunoregulation factors in 2 clusters was shown in Fig. 4D-4E.We have compared 15 immune checkpointing molecules and found 12 higher-expression molecules in the cluster B group (p < 0.001, Fig. 4D).There were also 14 signi cant higher-expression co-stimulated factors in cluster B. (p < 0.001, Fig. 4E).important immune checkpoints molecules such as CTLA4, PDCD1 (PD-1), LAG3, and CD274 (PD-L1) were up-regulated in cluster B (Fig. 4F).We could also observe the similar trend of immune-related factors of patients from GEO and our hospital even though some immune molecules were no different (supplement Fig. 6).We also have built a correlation between immune checkpoints (Fig. 4G).

Tumor genomic alterations associated with TME clusters
Patients in different immune in ltration patterns had a different genomic mutation, which could cause survival differences.Some studies suggested there was a signi cant relation to immunotherapy effects and genomic mutation.We have compared the mutation load and copy number variation (CNV) between two TME clusters and found the TME cluster B had higher mutation load (243.97 vs 257.20, p = 0.023) and CNV (1629 vs 2419, p < 0.001), which suggested the number of somatic mutations may play an important role in immune in ltration (Fig. 5A and supplement Fig. 5B-5C).

Prognostic value of TME clusters
We have compared the survival rates of patients between different TME clusters.Patients in TME cluster B from TCGA had a better 5-year OS (A vs. B, 43.9% vs. 52.2%,p = 0.028) and the TME cluster B group from GEO also had a higher 5-year survival (A vs. B, 33.1% vs. 46.4% p = 0.0073).There was no signi cant survival difference between different TME cluster patients in our hospital (A vs. B, 78.3% vs. 84.0%,p = 0.640).The cox multivariable analysis also showed the TME cluster B had a better prognosis in patients from TCGA (B vs. A, HR = 0.673, p = 0.006) and GEO (B vs. A, HR = 0.662 p = 0.005).We have built the predictive model of prognosis by nomogram and the c-index of the model was 0.654 and 0.731, which suggested the TME cluster was a valuable predictive factor (supplement Fig. 2).

Discussion
In recent decades, studies focused on the immune landscape have obtained a lot of attention in cancer biological and clinical research.Immune markers such as CD3 + or CD8 + have suggested the predictive value of prognosis in several cancer types 30 .The immune landscape is an important factor in the immune response in cancer.Therefore, to explore genomic and clinical characteristics in different immune in ltration pattern of LUSC patients, we collected 728 LUSC patients from common datasets and our institution and divided them into 2 groups named "TME cluster" based on the abundant of immune cells by GSEA method.Then we systemically compared the genomic signi cance, clinical characteristic and immune in ltration pattern in 2 TME clusters and found TME cluster was an important predictive factor in prognosis.
Several studies have suggested that the immune in ltration status in the tumor samples was signi cantly correlated with the prognosis in breast cancer, renal cancer, and neck cancer [31][32][33] .In our results, we also found that TME cluster A, a subgroup with lower immune cell abundant, was associated with poor survival.,Some studies have reported the abundance of immune cells in tumor tissue could affect the response to immune checkpoints inhibitors such as nivolumab, which could explain the non-ideal OS in some advanced NSCLC patients with immune therapy.However, it also has been reported the recruitment of Tregs could promote the immune escape of tumors in NSCLC 34 , which might be con ictive to our founding.Therefore, we speculated the co-in ltration of Treg cells and the effect of T cells in cluster B could maintain the immune balance 35,36 .The anti-tumor effect of the immune microenvironment was decided by the ratio of effect cells and suppressor cells.We have calculated the ratio of CD8 + T cell/Treg cell and aDC/iDC and found the TME cluster B had higher activated immune cells.
The co-expression immune-related genes of de ning affect T cells and IFN-γ expression including CD8A, GZMA, GZMB, IFNγ, EOMES, CXCL9, CXCL10, and TBX21 were both up-regulated in TME cluster B, which was consistent with the T cells in ltration pattern.It has been reported the high-expression of immune-related genes could increase the response rate to immunotherapy in NSCLC cancer and improve the overall survival, which might suggest TME cluster B was an immunotherapy-sensitive subgroup 37,38 .At the same time, the cytolytic activity score which was associated with OS as some studies reported based on the PRF1 and GZMA which re ected the anti-tumor effect also was higher in cluster B 39,40 .Sumana Narayanan et al 41 suggested a higher CYT score was associated with higher expression of immune checkpoint molecules and higher mutation load in colon cancer.It was further veri ed in LUSC that cluster B had a higher mutation load in our study.Tumor mutation burden was an important predictive factor in immune therapy.Several signi cant mutation DNA repair genes including TP53, ATR, and LIG1 which were considered to be associated with the effect of PD-1 inhibitors 42 were both up-regulated in the TME B cluster.Key targeted gene aberrations in LUSC such as PI3K, ROS and EGFR mutation were more common in TME cluster B, which means LUSC patients with those mutations may also be suitable for immunotherapy.Besides, it has been reported higher CNV could increase the expression of the cancerrelated genes, which was consistent with our study 13 .
The low immune in ltration pattern was considered being associated with distinct tumor immune escape mechanisms 43 including reduced expression of major histocompatibility complex (MHC) class I, adhesion and costimulatory molecules, loss of antigens, and increased expression of immunosuppressive components such as HLA-G, HLA-E, and PD-L1 and of other immunosuppressive factors such as cytokine and metabolites that contribute to the escape from immune recognition 44 .The expression of adhesion (CADM1) 45 and costimulatory molecules (CD80, CD86) were increased in cluster B while the immune checkpoint molecules such as PD-L1 and CTLA-4 were up-regulated in cluster B, which was coincident with GO analysis.TGF-β, another factor that has been identi ed as a key driver of tumor plasticity 46,47 , was up-regulated in cluster B, which might be an important therapy site.
Rachel Rosenthal et al suggested the diverse TME impact upon neoantigen presentation, as well as the tumor-speci c mechanisms leading to immune escape 48 .The disruption of tumor antigen presentation was a signi cant pathway to promote immune escape.Antigen presentation cells (APC) such as dendritic cells and macrophages were up-regulated in TME cluster B and the ratio of activated DC/inhibited DC was also higher in cluster B, so, we speculated the cluster with lower in ltration pattern could lead to immune escape via the down-regulation of active APC.Some speci c genes in the APM process including PSMB5, PSMB6, PSMB7, PSMB8, PSMB9, PSMB10, TAP1, TAP2, ERAP2, CANX, CALR, PDIA3, TAPBP, B2M, HLA-A, HLA-B, and HLA-C could predict the e ciency of this antigen-processing and presenting steps based on previous studies (46).Therefore, identi cation of such genes might lead to further insight into the complex interaction between tumor cells and the immune system and thus facilitate the development of personalized immune therapeutic regimens and enhance the response rate of immune checkpoint inhibitors in LUSC patients.
There were several limitations to our study.First, accessible gene-expression data of patients who received immunotherapy is insu cient currently, so we can't validate our ndings in patients who received immunotherapy.Second, most of the tumor samples from TCGA and GEO were single-loci.Given the spatial heterogeneity of intratumor immunoreactivity, the exploration of the multi-foci sample is still warranted.Besides, the study was a retrospective descriptive study.In summary, by machine learning methods and multi-omics pro ling, our large-cohort study still described a comprehensive landscape of LUSC immune in ltration patterns and integrated biomarkers associated with distinct immunophenotypes, thus explored the interaction between tumors and immune microenvironment, which may guide a more precise and personalized immune therapeutic strategy for LUSC patients.

Conclusions
Our study has described the microenvironment landscape of LUSC in different immune in ltration patterns and systemically analyzed genomic and clinical characteristics with distinct immunophenotypes, thus partly revealed the interaction between tumors and immune microenvironment, which may guide a more precise and personalized immune therapeutic strategy for LUSC patients.The immune in ltration pattern could be a prognosis predictor in LUSC.

Table 1 TCGA
Patient Characteristics According to the Status of TME cluster

Table 2
The result of UVA and MVA Cox proportional hazards regression in TCGA data