An Immune-Related Gene Pairs Signature Predicts Overall Survival in Melanoma

Background: The incidence rate and mortality rate of melanoma have been increasing in recent decades. Increasing evidence has depicted the correlation between melanoma prognosis and immune signature. Therefore, the aim of this study is to develop a robust prognostic immune-related gene pairs (IRGPs) signature for estimating overall survival (OS) of melanoma. Methods: Gene expression proling and clinical information of melanoma patients were derived from two public data sets, divided into training and validation cohorts. Immune genes signicantly associated with prognosis were selected. Results: Among 1,646 immune genes, a 25 IRGPs signature was built which was signicantly associated with OS in the training cohort (P=1.80×10 −22 ; hazard ratio [HR] =9.50 [6.04, 14.93]). In the validation datasets, the IRGPs signature signicantly divided patients into high- vs low- risk groups considering their prognosis (P=2.47×10 −4 ; HR =2.99 [1.66, 5.38]) and was prognostic in multivariate analysis. Functional analysis showed that several biological processes, including keratinization and pigment phenotype-related pathways, enriched in the high-risk group. Macrophages M0, NK cells resting and T cells gamma delta were signicantly higher in the high-risk group compared with the low-risk group. Conclusions: providing new insights into post-operational


Introduction
Melanoma is one of the most aggressive cancers seen in humans, with approximately 300,000 new cases diagnosed worldwide each year. [1] It is currently predicted that one in 34 males and one in 53 women will be diagnosed with melanoma during their lives. [2] The overall ve-year survival rate (15-20%) has not changed much over the past few decades. [3] Most patients are usually diagnosed as late stage due to confusion of normal nevus and lack of highly speci c biomarkers. [4] Recently, despite advances in treatment, the prognosis for melanoma remains grim, with more than 60% of patients with metastatic melanoma having or developing brain metastases during their onset. [5] So far, the standard treatment is a combination of surgery, chemotherapy and immunotherapy. However, these treatments are largely limited by an advanced cancer diagnosis. [6] Conventional clinical features, such as tumor grade, histopathological classi cation, dermoscopy, and imaging examination, do not provide an accurate prognosis prediction for melanoma patients. [7] Meanwhile, patients with the same clinical or pathological conditions tend to have different clinical manifestations. The intrinsic genetic heterogeneity of patients has the greatest effect on the clinical and molecular diversity of melanoma. [8] Recently, researchers have established numerous molecular subtyping of melanoma patients based on gene expression pro les and have constructed prognostic multigene expression features that can divide patients into different risk groups. [9,10] However, due to the over tting of small sample training data sets and the lack of su cient validation data sets, these data sets have not been applied to daily clinical practice, and these defects may also reduce the e ciency and robustness of statistical conclusions.
However, the diversity of data also poses great challenges to the effective use of large-scale gene expression data. Considering the technical deviation and biological heterogeneity, it is di cult to standardize the gene expression pro les produced by different platforms. [11] In recent years, a new method based on relative ranking of gene expression level has been proposed to eliminate the shortcomings of data normalization and scaling in gene expression data processing, and reliable results have been obtained in various researches. [12][13][14] There is growing evidence that the immune system plays a key role in the development and progression of melanoma. [15] In 2014, nivolumab and pembrolizumab, two checkpoint inhibitors binding to PD-1, were approved for the treatment of metastatic melanoma.
[16] Given their prognostic potential in melanoma, the molecular characteristics of immune interactions should be studied in depth. In this study, immune genes signi cantly associated with prognosis were selected. Based on these genes, individualized prognostic signature was constructed and validated by integrating immune-related gene pairs (IRGPs) for melanoma.

Ethical approval
The researchers were granted approval to conduct the research by Departmental Research Ethics Committee at the Xiangya Hospital, Changsha, China. The study protocol was approved by the institutional review board of Xiangya Hospital. All the procedures were performed in accordance with the Declaration of Helsinki and relevant policies in China.

Sources of melanoma patients
This was a retrospective study using public data for comprehensive analysis. In total, two independent data-sets from different high-throughput platforms were used in this study including TCGA (n=476), and Cabrita (GSE65904, n=214). [17] Gene expression data together with corresponding clinical information were accessed from Gene Expression Omnibus (GEO http://www.ncbi.nlm.nih.gov/geo/) and Firehose Broad GDAC portal (http://gdac.broadinstitute.org/, accessed on Mar 8, 2021). Totally, 684 samples were available for analysis.
Gene expression data processing The original microarray data le (GSE65904) was normalized using the robust multi-array average (RMA) method from the "affy" package. For each data set, the expression pro le was converted from probe level to corresponding gene symbol based on each set of annotation les without further standardization; probe sets were selected on the basis of overall highest overall expression for each gene. For all datasets, this study only selected patients with complete overall survival (OS) information.
Construction of prognostic immune related gene pairs (IRGPs) signature The identi cation of prognostic IRGPs was performed as described previously. [11] The TCGA cohort were used for discovery and training the model. To construct immune related prognostic signature, we obtained 13,996 immune related genes (IRGs) from the ImmPort database (https://immport.niaid.nih.gov) [18] on 3/8/2021. A variety of immune-related genes were included including signal transduction, cell communication and immune response. IRGs was measured only on all platforms involved in this study and had relatively high variability (determined by median absolute deviation >0.5). Each IRGP was computed by pairwise comparison the gene expression level in a speci c sample or pro le. More speci cally, in a pairwise comparison, the output is 1 if the rst element is larger than the later one and 0 for the different order. After removing IRGPs with a small variation and imbalanced distribution (MAD =0), the remaining IRGPs were left and selected as initial candidate IRGPs for prognosis prediction. From the initial candidate IRGPs, an IRGP index (IRGPI) was constructed using Lasso Cox proportional hazards regression with 10-fold cross validation (glmnet package, version: 2.0-18) and 25 gene pairs were used to de ne the nal model. To stratify patients into low or high-risk groups, the optimal cut off of the IRGPI was determined using time-dependent receiver operating characteristic (ROC) curve analysis (survivalROC, version 1.0.3) at 1 year in the training cohort for overall survival.

Validation of the IRGPs signature
The IRGPI model was further evaluated in the validation of melanoma patients by the log rank test. We then assessed the IRGPI based on other clinical factors in the univariate and multivariate cox proportional-hazards analysis.

Immune cells in ltration in bulk tumor GEPs
In order to study the enrichment of immune cells in different risk groups, we used CIBERSORT, [19] a machine learning method for predicting immune cell in ltration of bulk tumor GEPs. CIBERSORT used support vector regression to estimate the enrichment of various immune cell types in GEPs. For each sample, CIBERSORT quanti ed the relative abundance of 22 in ltrating immune cells, including T cells, Bcell macrophages and natural killer cells. Using Monte Carlo sampling, CIBERSORT calculates the deconvolution p value of each sample to provide the estimated con dence. The TCGA GEPs were uploaded to the online analytical platform CIBERSORT portal website (http://cibersort.stan ford.edu/), using the default signature matrix at 1,000 permutations.
Gene ontology (GO) and gene set enrichment analysis (GSEA) GO analysis was performed for the prognostic immune signature using FunRich (http://www.funrich.org) [20]. Gene set enrichment analysis [21] was conducted using the Bioconductor package "fgsea" with 10,000 permutations. The log2 fold change between the gene expression pro les of high-and low-risk groups was taken. Gene sets from high-vs low-risk groups were compared. Biological processes involved in this study were downloaded from the Molecular Signature Database (MSigDB C2 and C5 databases, version 7.2) (http://www.broadinstitute.org/gsea/msigdb/collections.jsp). FDR-adjusted P<0.05 or nominal (NOM) P<0.05 was used to select statistically signi cant gene sets.

Statistical analysis
All the statistical tests were performed using R (version 3.6.3, www.r-project.org). Student's t-tests were used to compare the differences among groups. Cumulative survival time was calculated using the Kaplan-Meier method and the differences in survival curves were analyzed using the log-rank test from "survival" package (version: 2.42.3). Hazard ratios were calculated using the "survcomp" package [22] (version: 1.40.0). Univariate and multivariate analyses were conducted using the Cox proportional hazards regression model. For all tests, a pvalue of less than 0.05 indicated a statistically signi cant difference. Statistical signi cance is shown as *P<0.05, **P<0.01, ***P<0.001.

Results
Prognostic IRGPs signature construction The TCGA cohort (n=476) gene expression data was used as discovery cohort and genes with relatively high variation were kept as candidates, evaluated by median absolute deviation (MAD) >0.5. The ltered discovery dataset was subjected to construct the prognostic model by using 13,996 unique immunerelated genes (IRGs), which were obtained from the ImmPort database (accessed on 3/8/2021). After removing IRGPs with relatively small variation (MAD =0), 2215 IRGPs were left and selected as initial candidate IRGPs. Then we de ned IRGP index (IRGPI) using Lasso Cox proportional hazards regression on the training set and selected 25 IRGPs in the nal risk scoring model ( Table 1). The IRGPI includes a panel of 41 unique IRGs and 7.3 percent of them chemokine activity related molecules (Figure 1). We then used the IRGPI to calculate the risk score for each patient in the training group. The optimal cut-off of the IRGPI for dividing patients into the high-or low-immune risk groups was set at−0.175 using timedependent ROC curve analysis (Figure 2). The IRGPI signi cantly strati ed training set patients into low and high-risk groups in terms of overall survival (OS). Our data suggested that the high-risk group Validation of the IRGPI signature for survival prediction To validate whether the IRGPI had consistent prognostic value in different risk groups, we applied the IRGPI signature to an independent validation cohort from Cabrita (GSE65904, n=214) [17] as external validation. Patients in the validation group were divided into low-and high-risk groups, and then survival was compared. Similar to the results obtained from the training cohort, patients in the high-risk group had a shorter OS than those in the low-risk group (P=2.47×10 −4 ; HR =2.99 [1.66, 5.38]) ( Figure 5).

Immune cells in ltration within different risk groups
We applied CIBERSORT to estimate the relative proportion of 22 different immune cells for each patient within different risk groups. Figure 6A depicted a comparative summary of CIBERSORT output resulting from these two risk groups. Immune cells, such as T cells CD8, T cells CD4 memory activated, Macrophages M0, Macrophages M1, T cells follicular helper, T cells gamma delta and NK cells resting were enriched in different risk groups ( Figure 4A). We found that Macrophages M0, NK cells resting and T cells gamma delta were signi cantly highly expressed in the high-risk group (P<0.05), while the percentage of T cells CD8, T cells CD4 memory activated, T cells follicular helper and Macrophages M1 were signi cantly higher in the low-risk group (Figure 6 B-H).

Functional assessment of the IRGPI signature
To explore the biological processes and signaling pathways altered by the IRGPI signature, we carried out GO analysis and GSEA. GO analysis showed that the IRGPI signature genes in the training cohort were mostly involved in positive regulation of defense response (Figure 7). GSEA was carried out between high-and low-risk groups to investigate the pathways that were signi cantly altered. Keratinization and pigment phenotype-related pathways, including keratinocyte differentiation, corni cation, molting cycle, pigment biosynthetic process, were highly enriched for the high-risk groups (false discovery rate (FDR) <0.01) (Figure 8 A-D). The enrichment of keratinization and pigment phenotype-related pathways provided evidence of molecular mechanisms affected by the IRGPI signature and thus predicts the prognosis of melanoma.

Discussion
Melanoma is an aggressive form of skin cancer, and a worldwide problem with increasing incidence. [23] Currently, clinical features, such as tumor grade, histopathological classi cation, dermoscopy, and imaging examination are still the most common ways to evaluate the risk of melanoma patients, [7] and besides, a lot of multigene prognostic signatures [9,24] have been developed for melanoma patients, but the accuracy of their prognosis predictions remains uncertain. Therefore, there is an urgent to nd robust prognostic biomarkers to predict the survival of melanoma patients.
In view of the inherent biological heterogeneity of tumor and the technical deviation caused by sequencing platform, the previous prognostic risk models need to properly standardize gene expression pro les, which is the di culty of data analysis. In order to achieve the robustness of prediction, we use a new method for data analysis, regardless of the technical deviation of different platforms. [11] The newly established prognostic features do not need data preprocessing, such as scaling and standardization, but are achieved by relative sequencing and paired comparison of gene expression values. This method has been reported to have reliable results in many studies, including molecular classi cation of cancer. [12][13][14] In the present study, we identi ed an immune-related gene pair signature to predict overall survival for melanoma patients. The prognostic signature consists of 25 immune-related gene pairs (IRGPs) containing 41 unique IRGs. Most of the genes involved in the immune signature are cytokines and cytokine receptors that play key roles in positive regulation of defense response. Within these 41 immune genes, The HLA-DQB1 allele was present in 56% of melanoma patients vs. 27% of 200 local Caucasian controls, which suggests that the HLA-DQB1 may play an important role in determining the risk of development and the prognosis of melanoma. [25]. In clinical samples, the cumulative (tumor and host) CCL8 expression was lower in the group in which human primary melanoma formed lung metastasis compared to non-metastatic primary tumors.Increased migration of the examined tumor cell lines was observed when CCL8 was applied as a chemoattractant.
[26] The tumor microenvironment-associated protein S100A8/A9 serves as a novel prognostic marker for metastasis and survival of metastatic melanoma patients and predicts response to immunotherapy with pembrolizumab. [27] Elevated APOBEC3G expression is signi cantly correlated with better prognosis in skin cutaneous melanoma patients. [28]. In addition, immune cells have been shown to be related to poor prognosis in a variety of cancer types. [29] We found signi cantly increased in ltration levels of Macrophages M0, NK cells resting and T cells gamma delta in the immune high-risk group.
Moreover, in this study, we also found that keratinization and pigment phenotype-related pathways, including keratinocyte differentiation, corni cation, molting cycle, pigment biosynthetic process, were related to the IRGPI signature. These pathways are well-known critical factors that affect tumor metastasis. [30][31][32][33] Our ndings suggested that a high-risk score correlates with up-regulation of keratinization and pigment phenotype-related pathways, consistent with the above knowledge. These results suggest the important role of the IRGPI signature in tumor invasion in melanoma patients.
Our research has some limitations. First of all, prognostic markers are gene expression pro les based on microarray platform. Due to its high price, long transformation cycle and requirements for bioinformatics expertise, it is di cult to promote in daily clinical application. Second, the training data sets used to establish immune signals were from retrospective studies, including fresh frozen samples. Therefore, the stability and effectiveness of these samples are still in doubt. More data sets with different sample attributes are needed for more extensive veri cation.

Conclusion
In summary, we systematically investigated the prognostic value of the immune-related gene pair signature, which could provide risk assessment for melanoma patients management. Our immune-related signature was associated with prognosis of melanoma patients. These immune signature genes and biological functions of the IRGPI provide a further understanding of the role of our prognostic signature in the development of melanoma. This signature will be a useful predictive tool for identifying patients who may bene t from immunotherapy.

Declarations
Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Availability of data and materials: The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Competing interests: The authors declare that they have no competing interests. Authors' contributions: DP designed and implemented the project. QO gave methodological support. PW carried out data collation. FY conducted data visualization. JT was a major contributor in writing the manuscript. All authors read and approved the nal manuscript.  Figure 1 Molecular function for IRGs. The percentage of genes is shown on the right side of the graph.

Figure 2
Time-dependent ROC curve for IRGPI in the training cohort. IRGPI score of -0.175 which was used as cutoff for IRGPI to stratify patients into low-or high-risk groups. Abbreviations: ROC, receiver operating characteristic; IRGPI, immune-related gene pairs index.   Immune-related signature genes analysis. Gene ontology (GO) of the 41 unique immune signature genes.
"GeneRatio" is the percentage of total differential genes in the given GO term.