A Comprehensive Data Analysis of Differentially Regulated Genes in Melanoma


 Background: Melanoma is the most deadly tumor in skin tumors and is prone to distant metastases. The incidence of melanoma has increased rapidly in the past few decades, and current trends indicate that this growth is continuing. This study was aimed to explore the molecular mechanisms of melanoma pathogenesis and discover underlying pathways and genes associated with melanoma.Methods: We used high-throughput expression data to study differential expression profiles of related genes in melanoma. The differentially expressed genes (DEGs) of melanoma in GSE15605, GSE46517, GSE7553 and the Cancer Genome Atlas (TCGA) datasets were analyzed. Differentially expressed genes (DEGs) were identified by paired t-test. Then the DEGs were performed cluster and principal component analyses and protein–protein interaction (PPI) network construction. After that, we analyzed the differential genes through bioinformatics and got hub genes. Finally, the expression of hub genes was confirmed in the TCGA databases and collected patient tissue samples.Results: Total 144 up-regulated DEGs and 16 down-regulated DEGs were identified. A total of 17 gene ontology analysis (GO) terms and 11 pathways were closely related to melanoma. Pathway of pathways in cancer was enriched in 8 DEGs, such as junction plakoglobin (JUP) and epidermal growth factor receptor (EGFR). In the PPI networks, 9 hub genes were obtained, such as loricrin (LOR), filaggrin (FLG), keratin 5 (KRT5), corneodesmosin (CDSN), desmoglein 1 (DSG1), desmoglein 3 (DSG3), keratin 1 (KRT1), involucrin (IVL) and EGFR. The pathway of pathways in cancer and its enriched DEGs may play important roles in the process of melanoma. The hub genes of DEGs may become promising melanoma candidate genes. Five key genes FLG, DSG1, DSG3, IVL and EGFR were identified in the TCGA database and melanoma tissues.Conclusions: The results suggested that FLG, DSG1, DSG3, IVL and EGFR might play important roles and potentially be valuable in the prognosis and treatment of melanoma.

screening and the introduction of new therapies, melanoma remains a major public health problem [6,7]. In 2018, there were approximately 280,000 new deaths and 60,712 deaths worldwide [8]. There are currently an estimated 1.2 million melanoma survivors in the United States alone [9]. While many previous studies have examined factors associated with survival [10][11][12][13], exhaustive research on the pathogenic genes and mechanisms of melanoma pathogenicity remain scarcely. Data on the pathogenesis, metastasis, and prognosis of melanoma can generate important information that can guide treatment, monitoring plans, and point the way for future melanoma research.
In recent years, microarrays and high-throughput sequencing technologies that detect the expression levels of tens of millions of genes in humans have been widely used to predict potential targets for melanoma treatment [14,15]. But most studies focus on a single genetic event of melanoma or results from a single cohort study [16][17][18]. Therefore, there is no clear and effective treatment [19][20][21].
In this study, we have compiled the GEO and TCGA databases in order to explore the key genes and pathogenesis of melanoma as comprehensively as possible.

Data searches and DEG identification
We conducted a search of Gene Expression Omnibus (GEO: https://www.ncbi.nlm.nih.gov/geo/) for high-throughput functional genomics experiments of melanoma. We used the following search terms: melanoma, primary melanoma, metastasis melanoma and skin cutaneous melanoma. Datasets were screened for dataset record following the criteria: (1) samples contained melanoma and normal skin tissue, (2) study type was restricted to expression profiling by array, (3) organism was restricted to Homo sapiens, (4) original data were accessible. We excluded studies of less than five samples in each group. The gene expression profiles meeting inclusion criteria were selected from GEO database and TCGA (https://cancergenome.nih.gov/abouttcga/overview) database. The DEGs between the melanoma and the normal controls were analyzed using the web tool GEO2R (https://www.ncbi.nlm.nih.gov/geo/info/geo2r.html). Differences in gene expression values were evaluated by t-test. Multiple testing correction was performed using the Benjamin Hochberg method.
Finally, the DEGs with (|logFC fold change|) >2 and adjusted p < 0.05 were screened [22]. In addition, RNASeqV2 data for cutaneous melanoma can be downloaded from the TCGA database. This study reanalyzed the microarray data downloaded from the public database and processed the raw data of each datasets with R statistical software (version 3.6.0). Thus, commonly changed DEGs from the datasets were integrated using Venn analysis.

Gene ontology and pathway enrichment analyses
Gene ontology analysis (GO, http://www.geneontology.org/) was used to identify characteristic biological attributes for DEGs [23]. Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) enrichment analysis was performed to identify functional attributes for DEGs [24]. We used online database for annotation, visualization and integrated discovery (DAVID)[25] and KEGG to access GO and pathway enrichment analysis. p < 0.05 was set as the cut-off criterion.

PPI network construction
Functional interactions between proteins can provide support for elucidating the molecular mechanisms of disease processes. In our study, the search tool for the retrieval of interacting gene (STRING) [26] database was utilized to construct PPI network. In addition, Cytoscape software was applied to construct protein interaction relationship network [27].

Hub genes identification
The Cytoscape was performed to scale degree and closeness of the PPI network. The degree of a node is the average number of edges (interactions) incident to this node. The genes at the top of the degree distribution (≥ 95% percentile) in the significantly perturbed networks were defined as hub genes.

Module analysis of the PPI network
Module analysis of the PPI network was performed with the parameters of minimum size > 3 and P-Value < 0.01 using ClusterONE, a Cytoscape plugin that unified different clustering techniques and displayed them in a single interface.

Hub genes expression level and survival analysis
Gene-level correlations with patient survival were featured in UALCAN (http://ualcan.path.uab.edu/analysis.html) [28]. Available TCGA patient survival data were used for Kaplan-Meier survival analysis and to generate overall survival plots. The two scores for each slide were then combined to produce a final grade of PTEN expression: 0, total score = 0; 1+, total score = 1-2; 2+, total score = 3-4; 3+, total score = 5-6. The average score is used when there exists a difference between the two pathologists. This study was approved by the local ethics committee and written informed consent was obtained from each patient.

Search results and study characteristics
According to the inclusion criteria, three GEO datasets and TCGA dataset were obtained in our study: GSE15605, GSE46517, GSE7553 and TCGA skin cutaneous melanoma data. 1078, 407, 892 and 2148 DEGs from the expression profile datasets GSE15605, GSE46517, GSE7553 and TCGA dataset were extracted, respectively. 160 consistently expressed genes were identified by Venn analysis (Figure 1).
Among them, 144 genes were up-regulated while 16 were down-regulated compared to normal skin tissue (Table 1).

Gene ontology analysis
Gene ontology describes gene function and relationships between these concepts. P < 0.01 was used as the cut-off criterion. DEGs were classified into the biological process: pathways and larger processes made up of the activities of multiple gene products. After GO enrichment analysis, we found that 160 DEGs were enriched in 17 GO terms (biological process). Among them, the most enriched GO terms were epidermis development (p=1.88E-17), keratinocyte differentiation (p = 9.90E-10), keratinization (p = 3.40E-06) and establishment of skin barrier (p = 1.49E-05) ( Figure 2).

Pathway enrichment analysis
Pathway enrichment analysis was carried out by online websites of KEGG, a database was applied to assign sets of DEGs to specific pathways. p < 0.05 was used as the cut-off criterion. After pathway enrichment analysis, we found that 160 DEGs were enriched in 11 pathways. Among them, DEGs were mainly enriched in the pathways in cancer (p = 0.03), transcriptional misregulation in cancer (p = 0.01), Rap1 signaling pathway (p = 0.02) and Ras signaling pathway (p = 0.03) ( Figure 3).

PPI network analysis
We obtained a PPI network from STRING to describe protein interactions (Figure 4). Based on the information obtained from the STRING database, a PPI framework with 160 nodes and 385 edges was generated, and its local clustering coefficient was 0.45. The results of computed hub genes were shown in the Table 2, including LOR, FLG, KRT5, CDSN, DSG1, DSG3, KRT1, IVL and EGFR.

Module analysis of the PPI network
To study and identify the function of the overlapping DEGs in detail, cluster analysis of the PPI network was conducted based on the ClusterONE Cytoscape plugin, an important tool for the analysis of densely connected and possibly overlapping regions within the Cytoscape network, which would contribute to the classification of protein network and relevant analysis. There were a total of 23 functional modules given and the most significant module (node = 29, density = 0.4335, p-value = 1.01E-8, Figure 5A) was selected for further analysis of functions and pathways to deeply understand the melanoma progression. To further verify the accuracy of this inference, the module genes were submitted into DAVID to perform the KEGG pathway enrichment analysis. The results showed that they were significantly enriched in the renin secretion signaling pathway, epidermal development, keratinocyte differentiation, peptide cross-linking, keratinization and single biological cell adhesion, among other processes shown, p < 0.05 ( Figure 5B).

Hub genes expression level and survival analysis
Using UALCAN, we verified gene expression level of hub genes in 1 normal tissue, 104 primary tissues, and 368 metastasis tissues from TCGA database. Through this analysis, we found that LOR,

Detection of hub genes related protein expression by immunohistochemistry
In order to determine the expression of hub genes in human melanoma, we used immunohistochemical methods to detect the expression of the protein corresponding to hub genes.
We detected the expression level of hub genes in 63 pairs of melanoma specimens (melanoma and adjacent normal tissue) by immunohistochemistry (Figure 7). The results showed that the expressions of FLG, DSG1, DSG3, IVL and EGFR were considerably higher than those of adjacent normal tissues (P <0.05) ( Figure 7B, 7E, 7F, 7H and 7I).

Discussion
Melanoma is a progressive disease that requires effective prognostic indicators for diagnosis and treatment. In recent years, effective computational models have been constructed to identify diseaserelated mRNAs. However, most research has focused on using cell lines or animal models to intervene at the level of a single gene, protein, or miRNA. In our study, we used high-throughput expression data to study differential expression profiles of related genes in melanoma. We analyzed tumor and normal skin samples from patients in the GEO and TCGA databases to explore abnormally expressed genes in melanoma. The results showed that 160 differentially expressed genes were selected, including 144 up-regulated genes and 16 down-regulated genes. Later, we identified hub genes and pathways in melanoma based on the use of bioinformatics methods. We integrated 4 original microarray datasets and identified 160 frequently changed DEGs. DEGs were mainly enriched in 17 biological processes by GO terms, of which epidermis development, keratinocyte differentiation, keratinization, and establishment of skin barrier were the most obvious. KEGG pathway enrichment analysis showed that DEGs were mainly enriched in 5 signaling pathways, of which pathways in cancer, transcriptional misregulation in cancer, Rap1 signaling pathway and Ras signaling pathway were the most significant. In particular, the pathway of pathways in cancer was enriched by 8 DEGs, such as EGFR and JUP. EGFR (degree = 21) and JUP (degree = 19) were important key node genes in the PPI network. The results of our study suggested that these genes and pathways may play critical showed that the expressions of FLG, DSG1, DSG3, IVL and EGFR were markedly higher than those of adjacent normal tissues. These studies indicated that we got the key genes FLG, DSG1, DSG3, IVL, and EGFR that could affect melanoma development. We further discussed hub genes expressed in melanoma patient tissues. Filaggrin, a highly abundant protein of the stratum corneum, draw considerable attention after the discovery of its role in the aetiology of atopic dermatitis [34].

Conclusion
In summary, we believe that by identifying the specific molecular genetic signature associated with the pathogenesis of melanoma will allow for better diagnosis and optimization of treatment modalities. This study is the first analysis of differential genes in melanoma and matched adjacent normal tissue samples. We have identified several differentially regulated proteins FLG, DSG1, DSG3, IVL, and EGFR in melanoma, which can be used as new biomarkers for simple, early detection of diseases, while providing novel therapeutic targets, pending experimental validation. Our investigation demonstrates a comprehensive screening of mRNA signature in publicly available GEO and TCGA resources. Our approach seeks to maximize the utilization of established datasets to understand the biological role of DEGs of melanoma in tumorigenesis, disease prognosis and treatment outcome. To improve the accuracy of the results of this study, the authors will continue to collect samples with related mutations for further research.     PPI network of DEGs. Each node is a differentially expressed gene (protein). A red node represents an up-regulated gene (protein). A green node represents a down-regulated gene (protein). The node size reflects node degree: the bigger the degree value, the bigger the node size is. The left ordinate of histogram represents the gene counts, and the right represents the pvalue. BP stands for biological process; and Pathway stands for cell signaling pathway.