Identication of Prognostic Signature and Immune Inltration in the Skin Cutaneous Melanoma Microenvironment

Skin cutaneous melanoma (SKCM), characterized by high immunogenicity, has an increasing incidence in recent years. The development of immunotherapy recently offered a promising treatment for patients with SKCM. Unfortunately, not all patients derive benet from such treatment, so we still face considerable challenges. Hence, it is imperative to develop novel prognostic signature and identify immunotherapeutic targets. In the present study, patients in high immune scores group presented a higher survival rate, while no statistical difference was observed in groups with different stromal scores. 493 DEGs were identied for functional analysis, which were enriched in function related to immune regulation such as lymphocyte activation and cytokine-cytokine receptor interaction. Subsequently, 84 DEGs intersected from Univariate Cox regression analysis and top 100 hub genes in PPI network were identied for the construction of prognostic model. Finally, a prognostic signature including 3 genes (HLA-DQB2, CD80 and GBP4) was established in TCGA training dataset, which was effectively validated in test dataset. Moreover, the model was considered as an independent prognostic factor via univariate and multivariate analysis. Besides, CIBERSORT and correlation analysis demonstrated that the expression level of risk scores was signicantly correlated to inltration levels of immune cells in SKCM. Above all, our study developed a novel prognostic signature, serving as potential prognostic biomarker for SKCM patients. A closely correlation between the prognostic model and tumor immune microenvironment was conrmed, offering a novel insight for the pathogenesis and potential immunotherapy for SKCM. stage and risk scores were selected for univariate and multivariate Cox regression analysis to identify independent prognostic factors for SKCM. P<0.05 was considered signicant. Chi-square test was conducted to investigate the association between risk level of prognostic gene signature and clinical phenotype. P<0.05 was considered statistically signicant. A nomogram was formulated to predict survival rates of SKCM patients.


Introduction
Skin Cutaneous Melanoma (SKCM), a type of solid tumor, is a highly heterogenous disease characterized by highly aggressive phenotype. It is one of the most common cancer worldwide, whose morbidity is increasing year upon year [1,2]. Besides, it is characterized by poor outcomes, resulting in high mortality [3,4]. Patients with early stage acquire good prognosis with the surgical treatment. However, the prognosis for most of the patients suffered from advanced melanoma is poor due to the high incidence of metastasis [5].
An increasing number of evidence documented that the tumor microenvironment (TME) played an important role in the process of tumor. TME, a complex system, consist of mutated cancer cells and immune cells and non-immune cells such as stromal cells extra-cellular matrix (ECM), chemokines and cytokines. In various types of cancer, alternative components in tumor microenvironment were implicated in immunotherapy and resistance to chemotherapy. Especially, immune cells and stromal cells were demonstrated to closely associated with the prognosis of cancer. For example, in several tumors, such as clear cell renal cell carcinoma (ccRCC) and acute myeloid leukemia (AML), immune-and stromal-related genes were determined to predict overall survival of patients [6,7]. The dynamic interaction of different component in TME changed the traditional understanding of tumor. Notably, perceptions of TME promoted the development of e cacious immunotherapy of towards SKCM. In recent years, the prognosis of patients with SKCM was substantially improved ascribed to the development of targeted therapies and immunotherapies including the cytotoxic T-lymphocyte antigen 4 (CTLA-4), programmed death ligand 1 (PD-L1) and programmed cell death receptor 1 (PD-1) [8,9], implying the great promise of immunotherapy. However, the clinical bene ts of such treatment for most patients are still limited. So, it is of great signi cance to develop more reliable and e cient targets for immunotherapy. The components of immune cells in tumor microenvironment remain the foundation of tumor invasion and metastasis. In several studies, it has been demonstrated that recognition of the immune cells components helped to predict e ciency of immunotherapy [10,11]. For example, the activation of CD8+ T cells was correlated with immunotherapy response [12]. Another study determined that lymphocytic in ltration was a strong independent predictor for prognostic prediction [13].Therefore, given the signi cance of immune cells component in the tumor process, gaining deeper insight into TME is essential.
Several algorithms were performed to estimate immune in ltration. ESTIMATE is an algorithm developed to compute the stromal scores and immune scores using ssGSEA, representing the distribution of stromal cells and immune cells [14]. Actually, the reliability of the algorithm has been proven in a variety of tumor such as lung adenocarcinoma [15], acute myeloid leukemia[6, 16] and clear cell renal cell carcinoma [17].
Therefore, in our study, ESTIMATE algorithm was applied to investigate the correlation between TME and clinical prognosis. We also identi ed a immune-related signature for prognostic prediction. In addition, CIBERSORT, a method to characterize immune cell fractions, was used to estimate immune cells compositions [18]. Moreover, accumulating evidence suggested that the composition of immune cells including T Lymphocytes, NK cells, Dendritic Cells and Macrophages was closely correlated to cancer treatment and prognosis [19,20]. Therefore, we also investigated the association between risk scores and immune cells.

Data collection
The RNA-seq data of patients with SKCM, including expression data normalized by Fragments Per

TME analysis
Immune and stromal microenvironment in ltration, presenting by immune and stromal scores, were calculated by ESTIMATE algorithm using estimate R package. The sum of immune and stromal scores re ected the tumor purity, presenting as estimate scores. Relationship between microenvironment in ltration and clinical features were evaluated by Wilcox test.
Differentially expressed genes (DEGs) identi cation and functional analysis As groups divided by different stomal scores have no signi cance in the survival probability, we identi ed DEGs in groups divided by the median of immune scores. Genes with |log FC|>2 and adjust P <0.05 were identi ed as DEGs. The heatmap of top 100 DEGs was drawn by R package. GO and KEGG enrichment analysis of 492 DEGs were conducted to analyze the potential function by clusterPro ler R package [21].
Terms with adjust P <0.05 were regarded as enriching signi cantly. In order to identify the hub genes in DEGs, PPI network was performed by STRING database with high con dence (0.9) and re-analyzed by Cytoscape (version 3.7). The DEGs were ranked by the numbers of interaction nodes.

Prognostic model development and validation
In our study, univariate Cox and Kaplan-Meier analysis were used to screen the prognosis-related genes with P< 0.05. TCGA dataset were randomly split into training and test datasets (70% vs 30%). Intersection of prognosis-related genes and top 100 hub genes in PPI network were selected for Lasso-penalized Cox regression analysis to narrow the range of prognosis-related genes in the training dataset. Subsequently, a prognostic model with 3 prognostic genes was constructed by multivariate Cox regression. The calculation formula of risk scores was as follows: Risk scores = expression of HLA-DQB2 * (-0.012311896) + expression of CD80 * (-0.628923031) + expression of GBP4 * (-0.017960845). All patients with tumor were divided into two groups according to the median of risk scores. Overall survival (OS) in groups of low and high risk scores was compared by Kaplan-Meier analysis. The receiver operating characteristic (ROC) curve was utilized for evaluating the diagnostic values. In addition, we validated the e ciency of the prognostic model in the test dataset.
Identification of independent factor for SKCM Characteristics including gender, age, TMN stage and risk scores were selected for univariate and multivariate Cox regression analysis to identify independent prognostic factors for SKCM. P<0.05 was considered signi cant. Chi-square test was conducted to investigate the association between risk level of prognostic gene signature and clinical phenotype. P<0.05 was considered statistically signi cant. A nomogram was formulated to predict survival rates of SKCM patients.
Identi cation of prognostic genes signature-related immune cells

Estimation of immune in ltration
CIBERSORT was applied to estimate the abundance of immune cells using R software. Totally, 260 samples with p<0.05 were selected for further analysis. The correlation matrix of 21 immune cells proportions. Correlation matrix was established to characterize the correlation between immune cells. Furthermore, Mann-Whitney U test was employed to determine the immune cells proportion between low risk and high risk groups. Additionally, scatter plots of correlation between the relative content of immune cells and risk scores were delineated (p<0.05). Finally, Venn diagram was applied to de ne the common immune in ltration associated with risk scores.

Results
Immune scores was signi cantly associated with prognosis and clinical characteristics of SKCM The information of 471 patients with SKCM, including gene expression and clinical characteristics such as survival time and status, age, gender and tumor stage, were downloaded from TCGA. ESTIMATE algorithm method was applied to evaluate the immune scores and stromal scores, representing the proportion of immune cells and stromal cells respectively. To investigate the relationship between scores and overall survival, Kaplan-Meirer analysis was performed to analyze the survival probability of all tumor samples, which were split into two groups according to the median of scores. As a result, compared with low immune scores group, a higher survival probability was observed in high immune scores group Identi cation and functional enrichment of DEGs According to the above results, we considered that immune components was important for prognostic diagnosis of SKCM. Hence, we divided all the tumor samples into low and high immune scores groups to differentiate immune related genes. As a result, 493 DEGs were identi ed, including 486 upregulated and 7 downregulated DEGs. All of the DEGs were applied to annotate their potential function by GO and KEGG enrichment analysis. GO enrichment analysis ascribed for DEGs mostly enriched in immune-related GO terms, including T cell activation, regulation of T cells and lymphocyte activation (Fig. 2B). In addition, KEGG enrichment results showed that DEGs mapped to cytokine-cytokine receptor interaction, Graftversus-host disease and hematopoietic cell lineage (Fig. 2C). Hence, these results suggested a predominant role of immune factors, which was worth investigating further.
Intersection of top-ranked DEGs in PPI network and DEGs responsible for prognosis PPI network was performed by STRING database with high con dence (0.9). Re-analysis of PPI network by MCODE of in cytoscape software differentiated top two important modules (Fig S1). Additionally, the top 30 DEGs with more nodes were depicted in the bar plot (Fig. 2D). Univariate COX regression and Kaplan-Meier survival analysis for survival probability were conducted to discriminate signi cant genes from 493 DEGs. Subsequently, intersection analysis of 100 top-ranked DEGs in PPI network and DEGs with signi cant p-value (P < 0.05) in univariate COX regression and Kaplan-Meier survival analysis was performed to screen signi cant prognostic factors (Fig. 2E). In total, 84 DEGs were identi ed for further analysis.

Construction of immune-related genes prognostic model
Firstly, all samples were randomly split into a training dataset and a test dataset (7:3). Lasso Cox regression analysis was applied to establish the prognostic signature model in training dataset. Subsequently, 9 genes were selected for further multivariate Cox regression analysis. 3 genes including HLA Class II Histocompatibility Antigen, DQ Beta 2 Chain (HLA-DQB2), CD80 Molecule (CD80), Guanylate Binding Protein 4 (GBP4) were de ned to develop a prognostic model for OS (Table 1). Risk scores = expression of HLA-DQB2 * (-0.012311896) + expression of CD80 * (-0.628923031) + expression of GBP4 * (-0.017960845). According to the formula, the risk scores of patients in train or test dataset were calculated. Additionally, patients were classi ed into low-and high-groups based on the median of risk scores. The distribution of risk scores and expression patterns 3 prognostic genes in training and test datasets was displayed in Fig. 3C and Fig. 3D respectively. Kaplan-Meier curves in both training and test datasets demonstrated that patients in the low-risk group had a better prognosis (P < 0.01) ( Fig. 3E and 3F).

Establishment of nomogram for survival probability prediction
Next, ROC curves for training and test dataset were depicted to evaluate the prognostic value of immunerelated prognostic model. The AUCs ascribed for 3-, and 5-year OS in training dataset were 0.718 and 0.722 respectively, while that in test dataset were 0.670 and 0.674 ( Fig. 4A and 4B). As shown in Fig. 4C and 4D, the results analyzed by Univariate Cox and Multiple Cox revealed that T and N stage classi cation, as well as our prognostic model were independent prognostic factors in SKCM patients. In addition, the statistical difference of clinicopathological factors including survival status, age, gender and TMN stage classi cation between low-and high-risk groups was investigated by chi-square test. The results in Fig. 4E determined that survival status, T and stage classi cation were signi cantly different in low-and high-risk groups. Finally, in order to predict OS for SKCM patients, a visual nomogram model was established in Fig. 4F.
Correlation between risk scores and immune cells CIBERSORT algorithm was applied to calculate the proportion of 21immune cells in all tumor samples (Fig. 5A). And the correlation of 21 immune cells was depicted as a matrix in Fig. 5B. T cells CD8 exhibited negative correlation with Macrophages M0 and T cells CD4 memory resting, while it was positively related to T cells CD4 memory activated and T cells follicular helper. The expression of immune cells was different in low and high risk scores groups ( Fig. 6A and 6B). To identify the signi cant survivalrelated immune cells, the intersection analysis of difference and correlation analysis was performed (Fig. 6C).

As a result, 7 immune cells such as T cells CD8, T cells CD 4 memory activated, Macrophages
M1 and NK cells activated presented negative correlation with risk scores, while 6 immune cells such as T cells CD4 memory activated, Macrophages M0 and Macrophages M2 were positively associated with risk scores.
Discussion TME, which was found to implicate in initiation, progression and metastasis of tumors, has received increased attention in recent years. Nevertheless, TME play distinct roles in different cancers. In particular, recent study demonstrated that the immunological tumor microenvironment contributed to cancer immunotherapy [22][23][24]. Cutaneous melanoma is one of the highly malignant tumor in skin tumors, with strong immunogenicity. Of note, it has been demonstrated that components of immune in ltrate supplied signature for survival prediction in patients with SKCM [25], indicating the importance of immune in ltrate. Hence, we performed this study to investigate the characteristics of immunological TME in SKCM and establish prognostic signature to guide individual therapy.
Firstly, we investigated the correlation between TME and OS. Our results showed that immune cells component contributed to patient survival, while stromal cells presented no signi cance. Particularly, immune scores were negatively correlated with T classi cation, indicating immune cells mediated inhibition of tumor invasion and metastasis. Subsequently, 493 DEGs were identi ed from low-and high immune scores and functional analysis showed enrichments in lymphocyte activation and cytokine-cytokine receptor interaction, revealing a promising role of immune factors in the tumor immune microenvironment. Subsequently, 84 DEGs intersected from prognosis-related genes and top 100 hub genes in PPI network were identi ed for construction of a prognostic model. Finally, a prognostic signature including 3 genes (HLA-DQB2, CD80 and GBP4) was established through lasso Cox and multivariate Cox regression analysis in TCGA training dataset. All of these three genes presented positive effects on the prognosis of SKCM patients. HLA-DQB2 was encoded HLA-DQ gene cluster located on chromosome 6, which engaged in the immune regulation [26]. Of note, previous studies reported that CD 80 displayed antitumor immunity via inhibiting programmed death ligand-1 (PD-1) mediated immune repression [27][28][29]. Recent study demonstrated that CD80 mediated tumor T cells in ltration and tumor growth inhibition in Cutaneous Melanoma [30]. Research performed by Lai Xu et al reported that GBP4 contribute to prognosis of breast cancer, colorectal cancer, stomach carcinoma and SKCM [31]. Additionally, favorable prognosis was observed with high expression of GBP4 in patients with SKCM [32]. Hence, both CD 80 and GBP4 were certi ed to correlate with development of SKCM. However, the role of HLA-DQB2 in SKCM remains to be further clari ed. OS of patients in low-and high-risk divided by risk scores were e ciently strati ed in both training dataset and test dataset. Validation of e ciency and multivariate Cox analysis upon the prognostic signature revealed that it was an effective and independent prognostic predictor for SKCM. Moreover, a nomogram was delineated to predict OS for patients with SKCM, supplying a tailored risk prediction. Our results revealed that our model could consider as potential prognostic signature for SKCM.
As the genes of prognosis signature were involved in immune regulation, CIBERSORT analysis was performed to explore whether the potential mechanism was related to immune cell subpopulations.
In ltrating lymphocytes including CD4 + and CD8 + T cells was considered to relate to favorable prognosis due to its potent anti-tumor response [33,34]. Tumor-in ltrating cells such as lymphocytes, Dendritic cells, Macrophages and Natural killer and Natural killer T cells played a pivotal role in tumor immune response [25]. Our results found that the level of risk scores was negatively associated with CD4 + T cells and CD8 + T cells, evidencing by high T lymphocytes proportion in the low risk group. T cells lymphocytes, a pivotal component of cellular immunity, played a major role in anti-tumor immune response. Many studies have demonstrated that high numbers of tumor-in ltrating T cells indicated a better prognosis in multiple tumor types such as breast cancer [35] and non-small cell lung carcinoma (NSCLC) [36]. Previous study demonstrated that tumor in ltrating lymphocytes has signi cant prognostic value in primary cutaneous melanoma [37]. Additionally, Thomas et al detected pathological slides utilizing hematoxylin and eosin staining to investigate lymphocyte grade [38]. Their results revealed that the fewer the lymphocyte in ltrate, the shorter disease-free survival (DFS) is, which were consistent with our results. Previous studies reported that macrophages such as M0, M1 and M2 showed functional plasticity in TME, exhibiting both promotional and repressive function in tumor [39]. Our results showed that the assay of macrophages M1 was negatively related to risk scores, while macrophages M0 and M2 are the opposite, indicating the different immune microenvironment between low-and high-risk groups. Totally, our results revealed that high degree of immune in ltration help to form local immunity, resulted in a high survival rate of patients. Several studies have reported that the degree of immune in ltration informed valuable prediction for immunotherapy [40][41][42]. Hence, our study provides prognostic signature for the OS as well as a potential target for immunotherapy for SKCM. However, our study presents several limitations. Firstly, although we veri ed the internal validity of the prognostic signature in test dataset, the prospective survey should be conducted in further study. Additionally, a detail validation via biological experiments was not performed.
In summary, we developed a reliable prognostic model for SKCM, evidencing by ROC analysis and survival analysis. Additionally, we also con rmed that the degree of immune in ltration was different in patients with different risk levels, conferring new insight into the pathogenesis and potential treatment for SKCM.

Declarations
Ethics approval and consent to participate: Not applicable.

Consent for publication:
Not applicable.
Availability of data and materials: The datasets generated and analysed during the current study are available in TCGA database (https://portal.gdc.cancer.gov).