Identication of a Prognostic Signature of Malignant Melanoma Based on Tumor Immune Microenvironment Exploration

differentially (DEGs) Cancer Genome (TCGA) effects of (IRGs) Cox multivariate Cox and risk of Operating Characteristic (ROC) analyses were to model using TCGA Gene Expression Omnibus (GEO) datasets, testing datasets low-risk multivariate


Background
Melanoma is a malignant neoplasm of melanocytes that accounts for the majority of skin cancer deaths despite comprising less than 5% of all cutaneous malignancies. Its incidence has increased faster than that of any other cancer over the past half-century and the annual costs of treatment in the United States alone have risen rapidly. The average 5-year survival rate is approximately 90, 70, 40, and less than 10% for patients with stage I, II, III, and IV malignant melanoma, respectively [1][2][3][4]. One of the standard methods of diagnosing and treating melanoma is represented by surgical resection (melanoma is frequently located in the skin, so excision is still widely applied at present for patients with early-stage melanoma due to its facile accessibility).
Although the majority of primary melanomas are cured with local excision, metastatic melanoma historically carries a grim prognosis with a median survival of 9 months and a long-term survival rate of 10% [1,5]. Except for melanoma in situ, the surgical procedure is generally not enough when it is performed as monotherapy [4], local recurrence, and / or metastases develop in most patients after a variable period of time.
In contrast to identi cation of well-de ned oncogenic alterations like BRAF mutations for MM patient strati cation, effective selection of predictive biomarkers remains a challenge in the era of checkpoint blockade. Tumor cellular diversity poses both challenges and opportunities for cancer therapy. This is exempli ed by the varied clinical e cacy achieved in malignant melanoma with targeted therapies and immunotherapies. Immune checkpoint inhibitors (ICIs) can produce clinical responses in some patients with metastatic melanomas [6,7], but the genomic and molecular determinants of response to these agents remain incompletely understood. Although tumor neoantigens and PD-L1 expression have been demonstrated clearly correlated with this response [8,9], other factors from subsets of malignant cells, the microenvironment, and tumor-in ltrating lymphocytes (TILs) may also play essential roles [10] . Classi cation of the immune landscape in tumors depends on both PD-L1 expression and the presence of T cell in ltration, referred to as tumor immunity in microenvironment (TIME) classi cation, which have important implications for the success of anti PD-1/PD-L1 therapy [2]. Therefore, it is critical to de ne and validate an effective gene signature model for the prognosis of MM. Because of the differences in genetic backgrounds and therapeutic e cacy of immunotherapy, specialized therapeutic strategies for these subtypes of melanoma should be established in the future.
In this study, we selected signi cant mRNAs for the prognosis of metastasis MM patients via data mining, and the data were validated in independent datasets. This may provide effective candidate genes for the prediction and treatment of metastasis group in MM. To validate the effectiveness of extrapolation of the signatures obtained by the above mentioned approaches. The validation dataset, containing 44 metastatic melanoma samples with RNA sequencing information from three groups were included for validation, including GSE19234 [11].

Identi cation of genes
The gene expression differences between the primary group and metastasis group compared using the "limma" package in the R software 3.3.1 [12] , and genes with |log fold change (FC)| >1 and Benjamini-Hochberg (BH)-adjusted P<0.01 were considered to be signi cant DEGs.

Acquisition of immune-related genes
A comprehensive list of IRGs was downloaded from the Immunology Database and Analysis Portal (ImmPort) database [13]. The list comprised a total of 2,498 IRGs, covering 17 immune categories.
Construction of MM immune-related prognostic model Using the "survival" package in R, we employed univariate cox regression on IRGs and Observed Survival, OBS [14]OBS of metastasis MM in the TCGA database to identify survival-associated IRGs. OBS was de ned as the time interval from TCGA sampling to patient death or last follow-up. After testing for collinearity, multivariate cox regression analysis was performed to establish the IRG-derived risk signature in metastasis MM. The following formula based on a combination of cox coe cient and gene expression was used to calculate the risk score [15]: where k, β i , S i represent the number of signature genes, the coe cient index, and the gene expression level, respectively.
To stratify metastasis patients into low-and high-risk groups, the optimum cutoff value for the risk score was determined using the "survminer" package in R. Next, the Kaplan Meier survival curve and log-rank test was performed to evaluate the survival rates between low-and high-risk groups. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated using the "survival ROC" package in R [16], and the AUC was calculated to measure the predictive accuracy of this prognostic signature for time-dependent cancer death. For ROC curve analysis, the AUC was calculated and for Kaplan-Meier (KM) survival analysis, the p-value of a two-side log-rank test was calculated by the "survminer" package [17].

Functional enrichment analysis of DEGs
Gene ontology (GO) analysis was performed via an R clusterPro ler package [18] for screening of differential methylation of genes was conducted to search the vital functions related to the differential gene methylation in MM [19]. The p-value was adjusted into false discovery rate (FDR) by Benjaminiand Hochberg study [20]. FDR <0.01 was considered as the cutoff value. P-value <0.05 was regarded as the cut off threshold value. By this strategy, the screened genes with methylation differences were then assigned to different gene functions including cellular component (CC), molecular function (MF) and biological process (BP).

In ltrating immune cells of MM analysis in CIBERSORTx database
CIBERSORTx is a suite of machine learning tools for the assessment of cellular abundance and cell typespeci c gene expression patterns from bulk tissue transcriptome pro les [21]. The gene expression data was uploaded to CIBERSORTx's online platform (https://cibersortx.stanford.edu). A validated leukocyte gene signature matrix (LM22) was used to identify 22 human hematopoietic cell subsets in bulk tissues, including tumors. We utilized Bulk-mode batch correction and 1000 permutations for the analysis in relative mode. Quantile normalization was disabled as the dataset was generated from RNA-seq. The lter criteria of each sample is set as the CIBERSORTx calculation of P-value < 0.05 (statistical signi cance of the deconvolution result across all cell subsets to ensure adequate goodness-of-t).

Calculation of Stromal and Immune Scores
ESTIMATE is an algorithm that infers the in ltration situation of immune cells and stromal cells in tumor tissue according to the transcriptome data of TME-related genes which contain a set of immune and stromal signature genes [22]. The expression data of these TME-related genes were extracted from TCGA MM dataset. ESTIMATE outputs stromal and immune scores by performing single-sample gene setenrichment analysis [23,24]. For tumor samples of MM cohort, rst, gene expression values were ranknormalized and rank ordered. Then, the empirical cumulative distribution functions were calculated for genes in the signature and the remaining genes. Finally, a statistic was calculated by an integration of the difference between the empirical cumulative distribution function.
Validation of the gene signature in the testing dataset The predictive performance of the IRGs was further validated in GEO dataset. Samples in the testing datasets were divided into high-and low-risk group according to the formula of risk score derived from the training dataset, respectively. KM survival analysis and ROC curve were used to evaluate the predicting power of the gene signature.

Evaluation of immune status between high-risk and low-risk groups strati ed by prognostic model
To explore the potential relationship between immune system and IRGs, we analyzed the immune status of the high-risk and low-risk samples. First, using six-IRGs, we quanti ed the immune activities between high-risk and low-risk samples by the ESTIMATE algorithm was used to calculate corresponding immune scores, stromal scores and tumor purities [22], and the difference of tumor purities between high-risk and low-risk samples was further analyzed.

Design of work ow
For investigating the biomarkers in MM, a hypothesis was proposed, which assumes that differential genes related to immunity may be related to the prognosis of metastatic patients. Thus, the prognosis was explored in MM patients with OBS from TCGA database. The work ow was listed in Figure1 Patient demographics and clinical factors Our study sample comprised 448 patients. Patient characteristics of the discovery and validation set are shown in Table 1 and Table 2. The TCGA sample form 448 patients as the discovery set (96 patients of primary, 352 patients of metastasis) was analyzed, and the GSE19234 (44 patients with above stage III of MM) samples of MM patients as the validation set (Table 2)  Establishment of the immune-related genes risk signature Seven predominant clinical and prognostic factors, including age, gender, race, clark level value, pathologic TNM, stage, Breslow depth value and IRGs were evaluated using univariate and after testing for collinearity, the results showed that 16 IRGs and two clinical factors (Clerk scores and pathologic T) were selected (Table. 3), for univariable Cox regression, the coe cients of clerk scores (HR was 1.261 with 95% CI [1.026, 1.55], and p-value = 0.0275 *) and pathologic T stage (HR was 1.101 with 95% CI [1.013, 1.197], and p-value = 0.024 *) were statistically signi cant. Then, the stepwise multivariate Cox regression analysis was performed to establish the IRG-derived risk signature in metastasis MM. After stepwise multivariate cox proportional hazard regression, only 6 IRGs for OBS were included in the model (Table 4). The data showed that the pathologic T stage continued to be in uencing factors of the prognosis of MM patients in TCGA, with p-values of 0.025. For every MM patient, a PI as a risk score was calculated according to equation (1). The PI was signi cantly associated with OBS in metastasis group.
As, the PI was the most signi cant with the smallest p-value (1.921×10 -13 **) and the largest HR (1.614 [1.420, 1.833]), which thereby implied that the PI was superior to conventional prognostic markers, including T stage in predicting patient outcome.
The time-dependent ROC curve was created in Figure. 3B. The ability and e ciency of each model to predict metastasis patient's outcome was estimated by calculating the AUC of ROC curve, which was conducted using the survival ROC package in R software. The time-dependent ROC curve was created, the AUC of ROC curve were 0.700 (5 year Figure. 3A) show that the p-value of the log-rank test was 2.01Í10 -10 , which implied that the PI could signi cantly separate patients in a high-risk and low-risk group.
Therefore, it can be concluded that the PI calculated via 6 IRGs had a good predictive value of prognosis of MM patients in the TCGA database.

GO functional enrichment analysis
We analyzed the GO enrichment and pathways in which gene-signature involved in. Of these IRGs models, the number of genes in a gene signature model is so small that it is di cult to enrich in GO analysis. ClusterPro ler package was employed to analyze the models. The above package can compare the results of biological process, cellular component and molecular function (Figure. 4). The results of biological process suggested that the 172 differential genes share many processes ( Figure.  Subsequently, as shown in the box plots ( Figure. 5D), the in ltration levels of plasma cells, naive B cells and Tfh cells were signi cantly differential in primary group and metastasis group in TCGA datasets (pvalue < 0.05). The naïve B cells and plasma cells were all signi cantly higher in metstasis group than that in primary group in (p-value < 0.05). And conversely, for Tfh cells, there is a lower in ltration level in metastasis group than primary group.
Correlation of the risk score with tumor-in ltrating immune cells As we known the immune checkpoint gene expression, which is associated with treatment responses of immune check point inhibitors, was also analyzed. We evaluated nine genes previously reported to be targets of ICIs: PDCD1 (PD-1), CD274 (PD-L1), CD276 (B7-H3), CTLA-4, IDO1, LAG3, VTCN1, HAVCR2 and TNFRSF4 (OX40). The correlation analysis showed that correlationbetween 22 types of tumor immune cells both and 9 immune-checkpoint genes (ICGs) and 6-IRGs were very different in primary tissues and metastatic tissues of MM. (Figure. 6A,C and 7A,C, Supplement le. 3,5 and 7,9). Except for CXCR4 and VTCNI, the correlation between 6 genes and 9-ICGs is very similar. In the primary group, CXCR4 and VTCNI showed a negative correlation (r = -0.18, p-value = 0.01) in the metastasis group, while there was no correlation in the primary group. The transfer group CD79A (r = -0.28, p-value <= 0.001) and LYZ (r = -0.189, p-value <= 0.001) showed a negative correlation with CD276, but there was no between CD79A, LYZ, and CD276 in the primary group correlation (p-value = 1), ( Figure. 6B and 7B, Supplement le. 4 and 8). The correlation between 6-IRGs and tumor purity showed that ( Figure. 6D and 7D, Supplement le.6 and 10), fourth IRGs were negativities correlated with the tumor purity both in primary group and metastasis group. The correlation analysis results of the 6-genes and the differential tumor in ltrating cells in the primary and metastatic groups showed that ( Figure. 6E-G and 7E-G): in the metastatic group, LYZ (r =0.309, p-value <0.0001) and CXCR4(r =0.331, p-value < 0.0001) were highly correlated with plasma cells, and CCL19(r =0.303, p-value < 0.0001), CD79A (r =0.241, p-value < 0.0001 ) and the naïve B cells were correlated. Not only that, CXCR4 is also highly correlated with Tfh (r =-0.175, p-value < 0.0001). Perhaps these genes are related to the transfer of primary tumor.

Validation set prognosis analysis for MM in GEO cohort
We screened six IRGs and veri ed them with the GEO data set again (GSE19234), which can distinguish the prognosis of high-risk and low-risk patients. The validation result is shown in Figure 8. CCL19 showed a signi cant positive correlation with T cells CD4 memory activated (0.534, p-value =0.02), and a signi cant negative correlation with NK cells activated (r= -0.683, p-value <0.001) and Dendritic cells resting (r= -0.535, p-value =0.02). It is worth noting that the correlation analysis results of 6 IRGS and TME showed that 6 IRGS showed a signi cant negative correlation with tumor purity (p-value <0.001), and a signi cant positive correlation with other immune microenvironment scores (p-value <0.001) (except SLP1 and S100A7) ( Figure 9B, supplement le.12). From Figure 9C (supplement le. 13), we can see the correlation between 6 IRGs and ITLS, 6 IRGs and PDL1 have a strong positive correlation (except SLP1 and S1007A), which also provides a certain amount for future immunotherapy Ideas. From Figure 9D, we can see the correlation between 6-IRGs and three type ITLSs, verifying the results of the TCGA dataset.
Evaluation the immune status between low-risk and high-risk groups To further explore the relationship between six-gene signature and TME, ESTMATE method was used to assess the overall immune status of high-risk and low-risk groups by analyzing the expression pro les of the 29 immune signature gene sets. The low-risk group in TCGA and GSE19234 ( Figure.10) showed more immune activities than that of high-risk group. Tumor purity of low-risk group in TCGA and GSE19234 were signi cantly higher than that of high-risk group, which suggested more in ltrated immune and stromal cells in the TME of low-risk group.

Discussion
Immune checkpoint blockade represents a major advancement in cancer therapy for advanced melanoma. However, durable clinical responses are seen in only a minority of patients treated with singleagent CTLA-4 (1) or PD-1 blockade [27,28]. Although higher response rates are achieved when CTLA-4 and PD-1 inhibitors are administered concurrently, this regimen also has greatly increased toxicity [28,29]. There is a clinical need to predict who will bene t from immunotherapy and to understand mechanisms of therapeutic resistance to improve patient management and outcomes.
The cancer immune microenvironment has been intensively studied in the past few decades, paving the way for the recent clinical application of immunotherapies targeting immune checkpoints such as cytotoxic T-lymphocyte associated protein 4 (CTLA4), programmed cell death 1 (PDCD1/PD1) and programmed cell death 1 ligand 1 (CD274/PDL1) [30,31]. Various immunogenomic approaches have been applied to analyze tumor-immune cell interactions [32,33] and accumulating evidence supports the impact of host immunity on cancer progression and response to immunotherapy [34][35][36][37]. Immunotherapy for the treatment of advanced melanoma and other cancer types have become a primary treatment in the clinic. Checkpoint inhibitors block natural pathways that dampen or inhibit an immune response to stimulus.
These pathways include programmed cell death 1 receptor/programmed death-ligand 1 and cytotoxic T lymphocyte antigen-4 [38]. Although the genetic and epigenetic changes in tumor cells are crucial to the oncogenesis and progress of tumors, accumulating evidence shows that the interaction between the tumor cells and its surrounding normal cells also plays an indispensable role [39].
Based on the important role of TME in the development of tumors, we identi ed the immune-related DEGs, constructed a six-IRGs and validated its association with OS in a GEO test dataset to identify prognostic biomarkers associated with TME in MM's samples. The results indicated that the six-IRGs can well classi ed the MM patients of training and testing datasets into high-and low-risk groups, and the high-risk groups were associated with poorer prognosis. The univariate and multivariate cox analysis con rmed that the six-IRGs can be used as independent prognostic factor for predicting the patients' outcome. The ROC analyses using training, testing datasets also proved the robusticity of the prognostic signature.
For there is now abundant evidence that both tumor [40][41][42][43][44] and host-related factors [45][46][47][48] can in uence heterogeneous response and resistance to immune checkpoint blockade. GO analysis was conducted to study the potential molecular mechanism of prognostic effects of gene signature and the results showed that the expression changes of genes in the prognostic model mainly affected the cell-cell junction and the pathways related to IL-17signaling pathway. To explore the state of immunity in MM, ESTIMATE method was used to evaluate the overall immune status and tumor purity in metastasis TME in MM. Consistent with the KEGG pathway analysis, the overall immune activity of most low-risk groups was higher than that of the high-risk group. Correspondingly, the tumor purity was lower than that of the high-risk group, suggesting that more stromal cells and immune cells were in ltrated in the TME [38].
Immune and stromal cells in ltrated in TME are composed of many different types of cells. In this study, using transcriptome pro ling data, we estimated immune cell composition in clinically tumor-primary tissue from MM patients and metastasis tissue from MM individuals. Correlations between immune in ltration, immune activity and patient survival were also investigated. As expected, signi cantly altered immune cell composition was seen in metastasis tumors compared to primary samples. However, we also found elevated in ltration of CD8 + T cells, naive B cells and CD4 + memory T cells in metastasis tumor compared to primary samples from MM individuals, whereas in ltration of master cells resting and activated NK cells showed a decrease.
Immune scores and stromal scores calculated based on the ESTIMATE algorithm could facilitate the quanti cation of the immune and stromal components in a tumor. To better understand the effects of genes involved in immune and stromal cells on prognosis, we identi ed the immune-related differential expression genes of MM primary samples and metastatic samples in the TCGA database, and the expression of the genes was signi cantly related to the prognosis of MM patients. The immune/matrix scores of patients with different prognostic risk groups were also identi ed. The overall immune score was also higher in primary samples compared with metastasis samples. This result has also been con rmed in other tumor studies [49].
We suggested that the level of LYZ, CCL19, CXCR4 and CD79A expression was negatively correlated to the tumor purity in metastatic MM. The expression of immune check point, a potential therapeutic target of MM, was also analyzed in this study. The immune check point signaling pathway related PD-L1, CTLA-4 and IDO1 were consistently upregulated in highly immune-activated MM. These ndings indicated that expression of new immunotherapy targets was correlated to expression of PD-L1 and CTLA-4; therefore, it is suggested that the MM showed clinical response to anti-PD-1 or PD-L1 antibody might also respond well to new immune check point inhibitors, such as IDO1 inhibitor. Moreover, these results suggested that a combination of immune check point inhibitors can be an effective therapeutic strategy in metastatic MM.
Of the 6 genes, ve genes(LYZ,S100A7, CCL19, CXCR4 and CD79A) have been reported to be associated with MM's progression or signi cant in MM survival prediction [50][51][52][53][54], indicating that our big data analysis based on ESTIMATE algorithm has prognostic values in the MM cohort of TCGA database. The other two genes have never been reported to correlate with MM development and prognosis before and can be perceived as novel potential biomarkers for MM. Secretory leukocyte protease inhibitor (SPLI) is a member of whey acidic protein four-disul de core family and plays important role in tumorigenesis and cancer progression and metastasis. SLPI expression is highly upregulated in lung cancer, ovarian cancer, pancreatic carcinoma, papillary thyroid cancer and uterine cervix cancer [55,56], while downregulated in bladder tumors, prostatic neoplasms and some breast carcinoma [57]. As for the function of SLPI in cancers, Wagenblast et al. reported that SLPI not only drives the formation of extra-vascular networks but also acting as anticoagulants to ensure tumor perfusion [58]. In breast cancer (BC) SLPI may inhibit cell proliferation and promote cancer metastasis [59]. However, the underlying mechanisms have not been fully elucidated in melanoma.
Previously study found the expression of these S100 family genes to be signi cantly correlated with both lymphatic and distant melanoma metastasis, as well as with American Joint Committee on Cancer grade but not with Clark's grade, age, or sex. This suggests that expression of these genes may be related to the degree of tumor invasion. Although further validation through basic and clinical trials is needed, our results suggest that the S100 family genes have the potential to play an important role in the diagnosis of melanoma. S100 expression may be related to tumor invasion and facilitate the early diagnosis of melanoma, allowing for a more accurate prognosis. S100-targeted therapies are also potentially viable strategies in the context of melanoma [60].
The study showed that stage IV melanoma was characterized by an increased frequency of CCR7 + CD56 bright NK cells as well as high serum concentrations of the CCR7 ligand CCL19. CCR7 expression and CCL19 secretion were also observed in melanoma cell lines [61].
Melanoma is at the forefront of development of systemic therapeutics with both molecular targeted therapies and immune checkpoint inhibitors as cornerstones of treatment. Additionally, programmeddeath-1 (PD-1) inhibitors have revolutionized the treatment of melanoma and are set to pave future improvements in other solid tumors. Combinations of PD-1 inhibitors with novel immune checkpoints or with molecularly targeted therapies are under investigation and may make considerable progress. This study has identi ed an IRGs that may facilitate the development of therapeutic strategies for metastatic MM.

Conclusions
Our analysis has uncovered a 6-gene signature with signi cant difference between primary and metastasis samples in melanoma and thus can be used to predict the prognosis for patients with malignant melanoma.
Furthermore, we identi ed the 6-gene signature linked to IRGs to targeted therapies in metastatic melanomas and validated the outcomes in a number of melanoma samples using different datasets. Our data suggested that the 6-gene signature can be developed as potential biomarker for distinguishing prognosis and selecting patients for immune checkpoint blockade in patients with malignant melanoma.

Consent for publication
All authors have read and approved the content and agree to submit for consideration for publication in the journal.

Competing interests
The authors declare that they have no competing interests.

Funding
None. All authors read and approved the nal version of the manuscript.