Analysis of prognostic immune and stromal- related genes in melanoma microenvironment

Increasing evidence has showed the immune response is an important indicator for predicting therapeutic ecacy, and the prognosis value of the inltration degree of immune and stromal cells within the tumor microenvironment has been extensively explored. It is necessary to better understand the prognosis value of genes concerning the immune and stromal cells in melanoma patients. A total of 714 up-regulated genes in the high immune and stromal score groups compared with low immune and stromal score groups. Then, a total of 104 hub genes of these differential genes were screened by protein-protein interaction network, and their functional enrichment and signal pathway enrichment analysis showed that they mainly affected tumor microenvironment by cytokine and cytokine receptors interaction, chemokine signaling pathway and T cell receptor signaling pathway involved in regulating Th1, Th2, Th17 differentiation. a single factor risk analysis of survival was performed and found 59 low-risk genes. On this basis, multi-factor risk analysis was carried out, and 10 genes risk model was constructed in TCGA-SKCM. Compared with clinically relevant parameters, the risk models established by these 10 genes showed that these 10 genes could be used as independent factors to predict the prognosis of SKCM patients. The CIBERSORT algorithm was used to calculate immune cell subtypes. In the high-risk group, the inltration of B cells naïve, monocytes and Macrophages M2 was signicantly higher than that in the low-risk group. Finally, we validated these 10 gene in a different melanoma cohort extracted from the GEO database. we identied a list of melanoma immune-related genes that might predict outcomes of melanoma patients. These ndings indicated that monitoring and manipulating the melanoma microenvironment could be a promising strategy for precision immunotherapy.


Introduction
Melanoma, arising from a malignant transformation of the melanocytes, is the fth most common type of cancer and has the highest mortality rate than other form of skin cancer in adults [1,2]. The incidence has been increasing in the United States and worldwide, with an estimated 100,350 adults (60,190 male and 40160 female) diagnosed in the United States in 2020, taking up 5.6% of all new cancer cases and leading to 6850 deaths (1.1% of all estimated deaths) [3]. Once melanoma has spread, this type of cancer rapidly becomes life-threatening [2]. In recent years, the melanoma research eld has seen an unprecedented advances due to the increasing number of researches concerning the mechanisms of gene and immune regulation in melanoma [1]. Speci cally, immune checkpoint inhibitors have dramatically changed the treatment options for melanoma [4]. However, the intrinsic and acquired resistance puts a fundamental limitation in extending the bene ts to all patients [5]. Therefore, it is necessary to nd the molecules that affect immune response and prognosis, which may promote drug development and improve the therapeutic effect.
The tumor microenvironment (TME) is a complex milieu in which the tumor is located, consisting of immune, mesenchymal, and endothelial cells, cytokines, chemokines, and extracellular matrix (ECM) [6,7]. These cells and molecules are in a dynamic process and contribute to tumor immune escape, tumor growth and metastasis [8]. Immune and stromal cells in the TME were valuable for diagnosis and prognosis of cancer [9]. Therefore, understanding the molecular features in the melanoma microenvironment not only provides novel insight into the melanoma occurrence, development and immune escape, but also is crucial for effectively manage cancer progression and immune response. The ESTIMATE algorithm, designed by Yoshihara et al, was helpful in calculating the immune and stromal scores to evaluate the non-tumor cells in ltration by analyzing the transcriptome data which contains a set of immune and stromal genes [10], and the Cutoff Finder was used to identify the cutoffs of the ESTIMATE results [11]. Researchers have used it to performed prognostic assessments in prostate cancer [12], glioblastoma [13], colon cancer [14], and clear cell renal cell carcinoma [15]. However, the value of molecule features for melanoma remains to be further elucidated.
In the present work, to investigate the potential immune features for melanoma patients, we used the TCGA-SKCM database and applied the ESTIMATE algorithm to calculate the immune and stromal scores in the SKCM, and the TME-related hub genes which can predict the prognosis and treatment risk of SKCM patients in the SKCM tumor microenvironment were extracted. Moreover, this was validated in a different melanoma cohort available from the GEO database. We hypothesized that these identi ed genes were correlated with favorable prognosis of melanoma patients and might be helpful in developing the potential immune therapeutic strategies by providing novel insights into the molecular mechanisms of melanoma.

Materials And Methods
Data collection TCGA-SKCM is available on the TCGA Data Portal (https://portal.gdc.cancer.gov/) and the patients' clinical characters has been also downloaded from the TCGA Data Portal, including gender, age, clinical stage, survival time and survival status. Among the inclusion criteria were (a) Tumor purity has been estimated by image analysis of haematoxylin and eosin stain slides by the Nationwide Children's Hospital Biospecimen Core Resource and (b) clinical data with survival time and outcome, Exclusion criteria included: (a) clinical data with survival time and outcome less than 90 days, and (b) samples with less than 50% tumor cells were removed. For the external validation, clinical data about GSE65904 (datasets with analyzable sample sizes n = 189) gene expression pro les, survival time and status were retrieved from the GEO Data Portal (https://www.ncbi.nlm.nih.gov/geo/query). Since the data was obtained from a public database, approval from the Ethics committee or written informed consent from patients was not required.

Identi cation of differentially expressed genes (DEGs)
The immune and stromal scores were calculated by the ESTIMATE package on R version 4.0.2. The immune scores and stromal scores were divided into high and low groups by the cutoff value that was generated by the method, Survival: signi cance (log-rank test) in Cutoff Finder website (http://molpath.charite.de/cutoff/assign.jsp). Venn diagrams analysis of the number of the cases in immune and stromal score groups by VennDiagram package version 1.6.20. The extracted data was analyzed by using the limma package version 3.44.3 in Bioconductor on the R version. Fold change > 4 and adj. P < 0.001 were set as the cutoffs to screen the signi cantly DEGs. Heatmaps were generated by the pheatmap package version 1.0.12.

Construction of protein-protein interaction (PPI) network
The PPI was obtained from the Search Tool for the Retrieval of Interacting Genes (STRING) database ( https://string-db.org/, version 11.0) and was reconstructed via Cytoscape version 3.7.1. The interaction score > 0.9 was set as the criteria. The Molecular COmplex DEtection (MCODE) plugin of Cytoscape was utilized to identify the most signi cant clusters. The multivariate Cox regression model was constructed as previous report [16]. In brief, the TME-related genes were put in a stepwise multivariate Cox proportional regression model to nd the best predictive model. Risk score (RS) staging model was constructed via the survival package version 3.1. RS= (n = the number of prognostic genes, β = the coe cient of the genes, X = the relative expression value of the corresponding gene).

Univariate Cox regression and Multivariate Cox regression analysis
To nd the relationship between hub genes expression level and melanoma patient survival, the univariate Cox regression analysis was performed by applying the survival package of R version 3.1. A total of 36 hub genes with log-rank test P < 0.05 was considered as signi cance. Multivariate Cox regression analysis, which combines the expression values of signi cant hub genes weighted by their estimated regression coe cients was used to identify the prognosis signature of the genes.
Overall survival curve and Receiver operating characteristic curve (ROC) analysis Samples were divided into different groups according to their risk scores, and Kaplan-Meier analysis was performed for the survival curves. The curves were generated via the R "survminer" package version 0.4.8.
The survival signi cance was determined by two-sided log-rank tests. The ROC was dawn using the R package of "survival ROC" version 1.0.3 and the area under the curve (AUC) was used to evaluate the predictive value of the gene-based prognostic model.
Comparison of 22 immune cell subtypes between high RS and low RS groups To explore the differences of immune cell subtypes, CIBERSORT package was used to assess the proportions of 22 immune cell subtypes based on expression le. The perm was set at 1000. Samples with P < 0.05 in CIBERSORT analysis result were used in further analysis. Mann-Whitney U test was used to compare differences in immune cell subtypes in the high RS and low RS groups.

Statistical analysis
Kaplan-Meier survival analysis was performed by log-rank test. The data was processed using the PERL programming language (Version 5.32.0 http://www.perl.org). All statistical analyses were performed using the R software (version 4.0.2, https://www.r-project.org/). P < 0.05 was regarded as statistically signi cant.

Results
Comparison of gene expression pro les with immune and stromal scores in TCGA-SKCM The transcriptional expression pro les of 471 SKCM cases were downloaded from the TCGA database.
The 443 samples that meet the set criteria were evaluated by the ESTIMATE algorithm. Survival analysis based on the ESTIMATE score group shows that the survival time of the high group is signi cantly longer than the low group (5.562 Years vs. 9.764 Years, log rank test, P < 0.0001) ( Figure. 1A). In order to explore the relationship between the overall survival and immune or stromal scores, we used the method Survival: signi cance (log-rank test) in Cutoff Finder website to nd the proper cutoff value, and divided 443 SKCM cases into high scores and low scores groups according to their scores and performed the Kaplan-Meier analysis. The survival curve showed that the median overall survival of patients with low immune score was shorter than those with high score group (5.233 Years vs. 12.35Years, log rank test, P < 0.0001) ( Figure. 1B), and signi cant results were also found in stromal scores group (5.756 Years vs. 8.019, logrank test, P = 0.0422) (Fig. 1C). To nd out the relationship between overall gene expression pro les and immune/stromal scores, the Venn diagrams indicated that 175 cases were up-regulated in immune and stromal high score groups ( Fig. 1D), while 161 cases were down-regulated in both immune and stromal low score groups (Fig. 1D). Comparing the high-and low-immune/stromal score groups, 714 upregulated genes and 2 down-regulated genes were identi ed (fold change > 4, P < 0.001) (Fig. 1E). Therefore, we focused on the 716 DEGs for further analysis.

Screening of hub genes in SKCM microenvironment via protein-protein interactions
For better understanding the interactions of these identi ed DEGs, the protein-protein interaction (PPI) networks were acquired by applying the online STRING tool, and the combined score > 0.9 was considered to be signi cant in an interaction. The PPI networks of DEGs contained 386 nodes and 4518 edges.
The SKCM tumor microenvironment risk model was constructed Combined with the survival time and survival status in TCGA-SKCM data, 104 SKCM TME hub genes were analyzed by Univariate risk ratio analysis and ltered by P < 0.01. A total of 59 low-risk TME-related genes were identi ed (Fig. 4A), and they were analyzed by multivariate Cox regression to construct a prognostic model containing SKCM TME genes of 10 genes (Risk score= (-0.493) × P2RY13 + 0.845 × FCGR1A + 0.712 × PDCD1LG2 + 0.447 × IRF1 +(-0.207)×GBP2 + -0.977 ×CD247 + -0.155 ×HLA-DQB2 + 0.388 × HLA-DRA + -0.680 ×CD8A + 0.516 × CD8B (Fig. 4B, C and D). We plotted survival curves of these 10 genes to explore their prognostic value. The results suggested that up-regulated of these ten SKCM TME-related genes was associated with a longer survival period (Fig. S1). The multivariate risk analysis results showed that only these 10 genes constructing the risk models with the area under the ROC curve reached 0.771 could be used as independent risk factors (Fig. S2).
The speci city and sensitivity of tumor microenvironment-related gene risk scores To explore the relationship between the risk scores of the 10 TME-related genes and the TME stromal and immune scores, we applied the correlation analysis. The results showed that risks scores had a strong negative correlation with melanoma immune and stromal score, and the correlation coe cients were − 0.5013 and − 0.3535, respectively ( Fig. 5A and B). The univariate risk analysis showed the 10 genes risk scores and tumor staging in the TME, T1-4 and N0-3 could be used as a risk factor for melanoma patients (Fig. 5C). However, the multivariate risk analysis results showed that only the 10 genes risk scores in the TME, T1-4 and N0-3 could be used as independent risk factors for melanoma patients (Fig. 5D). Receiver operating characteristics (ROC) analysis was used to evaluate the e ciency of the prognosis signature, T1-4 and N0-3. The data indicated that the robustness of the 10 genes risk scores in the SKCM TME is the highest (Fig. 5E). Kaplan-Meier survival curves were obtained for selecting and the results suggested that high risk was associated with poor overall survival (P = 2.665e-15 in Log-Rank test) (Fig. 5F).

Immune cell subtypes between high and low RS group
The 22 immune cell in ltration of high and low RS groups are shown in Fig. 6. T cells CD8, T cells CD4 memory activated, T cells gamma delta and Macrophages M1 in ltration in the low-risk group were signi cantly higher than those in the high-risk group. In the high-risk group, the in ltration of B cells naïve, monocytes and Macrophages M2 was signi cantly higher than that in the low-risk group.

GSE65904 for validation data set in GEO database
To understand whether the melanoma TME-related risk genes identi ed from the TCGA-SKCM dataset are also prognostic in other melanoma cases, we downloaded the GSE65904 dataset from the GEO database and analyzed gene expression pro les of the independent case cohort. Cox multivariate was used to analyze the 10 melanoma TME-related risk genes, and we found that ten genes including CD247, CD8A, FCGR1A, GBP2, HLA-DQB2, HLA-DRA, IRF1, P2RY13, PDCD1LG2 and CD8B participated in the cohort (Fig. 7A, B and C). Survival analysis showed signi cant differences in survival between the low-and highrisk groups in the GSE65904 dataset (P = 2.377e-09 (Fig. 7E). ROC curve was drawn, and the area under the ROC curve was found to be 0.742 (Fig. 7D).

Discussion
In the TME, the immune and stromal cells which are the main types of non-tumor components, which are considered to be valuable for tumor diagnosis and prognosis and therapeutic strategy selection [17,18]. For example, tumor immune pro ling studies revealed that metastatic melanoma patients with partially exhausted CD8 + T cells within tumor site were predicted to have a high likelihood to response to anti-PD-1 therapy [19,20]. And one single-cell RNA-seq study of melanoma also indicated that the interaction between immune cell and stromal molecule may be important factor for melanoma diagnosis and therapeutic strategies [21]. Since the genomic landscape of tumor tissues is an essential factor that determine the TME [22], it is necessary to use various methods to explore effective prognostic biomarkers which may remarkably affect the cancer treatment response. Therefore, we explored the TME of melanoma and sought signi cant genes with prognostic value that contribute to melanoma patients' survival.
We calculated the immune and stromal scores of the TME of 336 patients in SKCM based on transcriptome data in the TCGA database. Comparing the differences in gene expression levels between the high and low immune/stromal scores, we identi ed 59 low risk genes associated with the prognosis of SKCM patients. Multivariate risk analysis of these 59 low-risk genes constructed a risk grouping model based on 10 genes. P2RY13, FCGR1A, PDCD1LG2, IRF1, CD274, GBP2, CD247, HLA-DQB2, CD8A and CD8B and divided SKCM into low-and high-risk groups. These 10 genes were highly expressed in the lowrisk group, indicating that they are related with favorable prognosis of melanoma patients. These 10 molecules can be divided into the following categories: TCR/CD3 complex related molecules: CD8A and CD8B; G protein coupled receptors and their ligands: P2RY13; Immune checkpoint related proteins: PDCD1LG2, CD274; Interferon gamma signaling pathway proteins: GBP2 and IRF1; T cell activation molecules, integrin and IgG receptor related molecules: FCGR1A; Type II major histocompatibility complex: HLA-DQB2 and HLA-DRA.
The CD3 complex is essential in the immune response. The protein encoded by CD8A and CD8B constitute the CD8, which can mediate effective cell-to-cell interactions in the immune system on cytotoxic T lymphocytes. This mechanism enables CTL to recognize and eliminate infected cells and tumor cells [23]. P2RY13 (Purinergic receptor P2Y13) is activated by ADP and may play a role in hematopoietic and immune systems [24]. PDCD1LG2 (programmed cell death 1 ligand 2) involves in costimulatory signals, and the interaction with PDCD1 inhibits T cell proliferation by preventing cell cycle progression and cytokine production [25]. GBP2 (Guanylate binding protein 2) is reported to be associated with better prognosis in breast cancer and represents an e cient T cell response [26]. IRF1 (Interferon Regulatory Factor 1) is a transcriptional regulator and tumor suppressor. It directly affects NK maturation and activity [27], M1 macrophage production [28], and its priming of the TEXs (tumor-derived exosomes) enhances antitumor immune redsponse [29]. FCGR1A (Fc fragment of IgG receptor Ia) is a high-a nity Fc-γ receptor that plays a role in both innate and adaptive immune responses [30]. HLA-DQB2 and HLA-DRA belong to HLA II. They can bind peptides from antigens. These peptides enter the endocytic pathway of antigen-presenting cells (APC) and present them to the cell surface for CD4 T cell recognition [31]. Whether they determine the speci city of peptide binding in melanoma microenvironment remains to be determined.
The above analysis shows that the expression of 10 genes mainly affects the immune in ltration score. This indicates that the prognostic value of the risk score is related to the immune system of melanoma.
To further explore immunity and risk scoring, we use the CIBERSORT algorithm to calculate immune cell subtypes. The results showed that the two risk score groups expressed different immune cell subtypes. In the high-risk group, T cells CD8, T cells CD4 memory activated, T cells gamma delta and Macrophages M1 levels were lower, while B cells naïve, monocytes and Macrophages M2 levels were higher. This implies that the imbalance of the ratio of T cells CD8, CD4, T cells gamma delta and M macrophages in the high-risk group may reduce the survival rate of patients. These results may provide clues for decoding the complex interaction between tumor and TME in melanoma. A lot of valuable information remains to be explored and mined.
In summary, by analyzing of TCGA database using ESTIMATE algorithm and functional analysis of the genes, we extracted 59 genes related to TME with low risk, and an effective risk prediction model composed of the 10 genes was constructed. The risk models constructed by these low-risk genes was con rmed in the GEO-GSE65904 cohort. In addition, we nd that these genes are conducive to the overall survival of patients. Therefore, further exploration of these genes may provide novel insight into the potential biomarkers in TME for melanoma prognosis and therapy.
Declarations interpreted the results. Jun-Mei Yang and Jia-Hui Guo co-worked on associated data collection and helped to revise the manuscript. Chen Chen and Shan-Shan Yin helped to perform the reference collection. All authors read and approved the nal manuscript.

Funding
This work was supported in part by the National Natural Science Foundation of China (82002922 and 81572796), Science and Technology Committee of Shanghai (14401901500).

Availability of data and materials
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Ethics approval and consent to participate         Tumor microenvironment-related gene risk scores were compared with clinical pathological parameters. (A) Correlation analysis between TME-related gene risk scores and TME immune scores. (B) Correlation analysis between TME-related gene risk scores and TME stromal scores. (C and D) Univariate and multivariate risk analysis of TME-related gene risk scores, tumor stage, tumor size, distant metastasis, and lymph node in ltration. (E) The 10 TME-related gene risk assessment scores, tumor stage and TNM ROC curve and area under the curve in TCGA-SKCM tumors. (F) Kaplan-Meier survival curves were generated for selected the 10 TME-related gene risk assessment scores from the comparison of groups of high (red line) and low (blue line) score groups.

Figure 5
Tumor microenvironment-related gene risk scores were compared with clinical pathological parameters.
(A) Correlation analysis between TME-related gene risk scores and TME immune scores. (B) Correlation analysis between TME-related gene risk scores and TME stromal scores. (C and D) Univariate and multivariate risk analysis of TME-related gene risk scores, tumor stage, tumor size, distant metastasis, and lymph node in ltration. (E) The 10 TME-related gene risk assessment scores, tumor stage and TNM ROC curve and area under the curve in TCGA-SKCM tumors. (F) Kaplan-Meier survival curves were generated for selected the 10 TME-related gene risk assessment scores from the comparison of groups of high (red line) and low (blue line) score groups.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.