Identi cation and Validation of Key Genes in Giant Congenital Melanocytic Nevi by RNA Sequencing Analysis


 Background

Giant congenital melanocytic nevi (GCMNs) are melanotic lesion present at birth, which often cause severe psychological and financial burden to patients and their families.However, the pathogenes is still unclear.
Objective

We aim to identify key genes and biological processes that related to the development of GCMN.
Methods

We sequenced ten pairs of GCMN tissues and adjacent normal tissues by high-throughput RNA-seq, then used GO and KEGG analysis to find inportment pathways, and used MCODE,Cluego and Cytohubba plugin of Cytoscape software to identify hub genes.
Results

A total of 1163 differentially expressed genes were identified. 29 BPs, 18 CCs, and 17 MFs were significantly enriched in GO analysis and no pathway was significantly enriched in KEGG analysis. PPI Visual Network consisted of 779 nodes and 2359 edges,which was be divided into 25 functional modules by MCODE. We discovered most of the hub genes were located in module 5,and the top 3 hub genes (PTGS2,EGF,SOX10) in module 5 were involved in GO and KEGG enrichment pathways --“Arachidonic acid metabolism”,”glycosaminoglycan biosynthetic process”,”developmental pigmentation” respectively.
Conclusion

 PTGS2, EGF and SOX10 are thought to be the three most important hub genes and may play essential roles in the development of GCMN.


Introduction
Giant congenital melanocytic nevi (GCMNs) are benign melanocytic skin tumors that originate from the neural crest and present at birth 1 .
Generally, GCMN is de ned by the maximal diameter of its projected adult size (PAS)>40cm 2 . The clinical importance of GCMNs is the increased risk of melanomas 3 and neurocutaneous melanosis (NCM) 4 ,both of which are associated with extremely poor prognosis. The great impairment on appearance and severe complications often cause serious psychological and economic burden to patients and their families. Currently, the mainstay of GCMN treatment is surgical excision. However, the therapeutic effect is not ideal due to the complexity of the disease. Its root cause lies in the the lack of understanding of the disease,therefore, further exploring the pathogenesis of the disease is essential.
With the development of the sequencing technology, bioinformatics analysis provides a new approach to the deep investigation of complicated diseases. Several scholars studied the molecular characterization of GCMN patients by next-generation sequencing and found approximately 80% of patients harbor somatic mutations. NRAS,BRAF,KRAS,APC,MET are common mutated genes and are involved in RAS/RAF/MEK/ERK, PI3K/Akt and JAK/STAT pathways 5,6 . Rouille, et al. 7 reported that inhibition of MEK and/or PI3K pathways could signi cantly reduce the proliferative ability of melanocytes in vivo and in vitro, which provides a new direction for the treatment of GCMN.
Other molecular events may also intervene in the occurrence of GCMN 8 . Therefore, it is crucial to further elucidate the underlying mechanisms of GCMNs and pave the way for novel therapeutic targets.
The mRNA contains abundant genetic information and is necessary for protein translation, however the molecular function and clinical value of transcriptomic expression pro les in GCMN remains unclear. RNA sequencing (RNA-seq) presents a convenient method to obtain comprehensive RNA expression information of speci c tissues in a certain state 9 .In this study, we applied the RNA-Seq technology to identify all differentially expressed genes (DEGs) between GCMN tissues and paired normal tissues.Gene Ontology(GO),Kyoto Gene and Genomic Encyclopedia(KEGG) pathway analyses and protein-protein interaction (PPI) network analysis were performed in order to explore key genes and molecular processes.

Sample collection and preparation
Ten pairs of GCMN and adjacent normal tissues for RNA-seq and Hematoxylin-Eosin staining(H&E staining) were collected from January 2020 to June 2020, and paired normal tissues were taken from surgically dissected tissues 5 mm away from the lesion edge. All patients met the diagnostic criteria for GCMN, and none of them had combined melanoma and neurocutaneous melanosis at the time of treatment. The dissected skin samples were rapidly washed with physiological normal saline and divided into two parts, one of which was instantly frozen in liquid nitrogen and thereafter stored at -80C° for RNA extraction and the other was xed in 10% neutral buffered formalin and para n-embedded, and stained with hematoxylin and eosin. Then, the samples were observed under light microscope for picture collection and analysis to ensure the quality of samples and the effectiveness of the results (Fig. 1A.1B). Another 14 GCMN tissues and 14 adjacent normal tissues were obtained from Xiang'an Hospital for use in RT-qPCR.
All subjects/ subjects' guardians gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Xiang'an Hospital of Xiamen University.

RNA extraction and Illumina sequencing
Total RNA was extracted using RNeasy Fibrous Tissue Mini Kit (Qiagen, Valencia CA,USA) according to the manufacturer's instructions, The purity and integrity of RNA was checked using the NanoPhotometer spectrophotometer (IMPLEN, CA, USA) and the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA, USA) respectively. The RNA integrity number (RIN) values of all samples were above 7.
A total amount of 1 µg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext UltraTM RNA Library Prep Kit for Illumina (NEB, USA). mRNA was puri ed from total RNA using poly-T oligo-attached magnetic beads, then clean data (clean reads) were obtained by removing reads containing adapter, reads containing ploy-N and low quality reads from raw data.

Data Analysis
Index of the reference genome was built and paired-end clean reads were aligned to the reference genome using Hisat2 v2.0.5 10 . After data homogenization and quality control, FPKM(fragments per kilobase) of each gene was calculated based on the length of the gene and reads count mapped to this gene. Genes with fold change > 2 and adjusted P-value <0.01 (P-values were adjusted using the Benjamini and Hochberg's approach to control the false discovery rate) determined by DESeq2 were considered as differentially expressed 11

GO and KEGG enrichment analysis of differentially expressed genes
To understand the biological function of DEGs, GO term and KEGG (http://www.genome.jp/kegg/) pathway analysis was implemented by the clusterPro ler R package 12 . Gene Ontology (GO) can annotate target genes and perform analysis from three levels, biological processes, cellular components and molecular function 13 . KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource for understanding high-level functions and utilities of the biological system 14 . A corrected P value less than 0.05(padj<0.05) was considered statistical signi cantly.

PPI Network Construction and Hub Gene Identi cation
A PPI network of DEGs was generated based on the STRING database (https://string-db.org/) with a combined score > 0.4 and further analyzed and visualized by Cytoscape software (version 3.7.1). The Cytoscape app MCODE were used to divide clusters of the above network with the default parameters: Degree Cutoff = 2, Node Score Cutoff = 0.2, K-Core = 2 and Max.Depth = 100.Next ,the ClueGO plugin of Cytoscape was employed for Biological process analysis upon Cluster 5, with a level of signi cance set for 0.05 and Kappa Score of 0.4.with a score value greater than 5. Then we explored hub genes using 12 scoring methods(MCC, DMNC, MNC, Degree, EPC, BottleNeck, EcCentricity, Closeness, Radiality, Betweenness, Stress, Clustering Coe cient available in another Cytoscape plugin cytoHubba. The top 10 genes from each of these algorithms were choosen. Any gene that appeared more than one time and also presented in top 6 MCODE clusters was considered as a hub gene.

RT-qPCR analysis
RT-qPCR(Reverse transcription-quantitative polymerase chain reaction) was performed to con rm the expression of hub genes in GCMN tissues and adjacent normal tissues. Total RNA was extracted from each sample using TRIzol reagent (Qiagen,Valencia, CA,U.S.A.) and rst-strand cDNA was synthesized with the cDNA Synthesis kit (TOYOBO, Osaka, Japan). Next, ampli cation was performed using the Power SYBR Green real-time PCR Master Mix (TOYOBO, Osaka, Japan). All procedures were conducted in three technical replicates. The relative expression of each target gene was normalized to GAPDH,and data were analyzed by the 2 − ΔΔCt method.

Statistical analysis
All statistical analyses were conducted using GraphPad Prism 7.0 and the results were expressed as the means ± standard deviation.
Student's t-test was performed to compare differences between two groups,The χ2 test and Fisher's exact test were used for GO and KEGG analysis. P value less than 0.05 was considered statistically signi cant.

Identi cation of DEGs
To systematically analyze that mRNAs may play a role in GCMN, we performed RNA-Seq experiments on 10 pairs of GCMN tissues and their matched adjacent normal tissues. Approximately 45 million reads were sequenced for each sample. In this study, a total of 1163 differentially expressed genes (DEGs) were identi ed with the cut-off criterion as p <0.01 and | log2fc | > 1, among which 519 genes were up-regulated and 644 genes were down-regulated (Fig. 1C).

PPI Network and Hub Gene Identi cation
We used Cytoscape software to visualize the PPI network, which consists of 779 nodes and 2359 edges. Cytoscape plug-in MCODE was applied for the module analysis, and a total of 25 functional modules were obtained from the entire network. Detailed informations of top six modules are listed in Table1. Subsequently, hub genes were identi ed using 12 types of calculation methods in cytoHubba (Fig. 3A). detailed information about these hub genes is presented in Table2. We discovered most of the hub genes were located in module 5, thus Module5 was chosen for further evaluation through hierarchical clustering analysis (Fig. 3B)and Cluego analysis (Fig. 3C). According to the result, Module5 was primarily involved in melanin biosynthetic process,and most genes were upregulated. The top 3 hub genes (PTGS2,EGF,SOX10) with the most frequent occurrences were selected for further experimental veri cation.

Validation of hub genes using RT-qPCR
To validate the results observed in RNA-seq, we undertook RT-qPCR to quantify the relative expression of 3 hub genes selected above.
Results show that compared with adjacent normal tissues, SOX10 was signi cantly up-regulated and PTGS2,EGF were signi cantly downregulated in GCMN tissues, which is consistent with the RNA-Seq data (Figure.4).The expression patterns of selected genes in qRT-PCR were consist with RNA-seq.

Discussion
The lesion area of GCMN becomes larger and the number of satellite nevi increases as the child grows. In addition,relevant complications may seriously affect the quality of life and even longevity 15 . These problems cannot be solved completely by surgical removal and new medical options based on the key disease-associated genes seem to be promising. Previously published studies mainly focused on the association between somatic mutations and clinical characteristics. In this study, we employed RNA-Seq to screen out 1163 DEGs, and found several novel biological processes and key genes that have not been investigated in GCMN.
GO classi cation and enrichment analysis revealed that several biological processes related to melanin synthesis were signi cantly enriched and the genes involved were up-regulated, which is consistent with the increased number of melanocytes and darker skin color of lesions 16 .In addition, primary biological processes of the DEGs were involved in the regulation of epidermis development, skin development, and keratinocyte differentiation. The thickened cuticle was also observed in diseased tissues with the HE staining results obtained, indicating that keratinocytes are strikingly different between pathological and normal tissues in terms of quantity and gene expression. A previous study suggested that melanocytes transfered melanin to keratinocytes through melanosomes, and the number of melanosomes in keratinocytes determined the degree of skin pigmentation 17 . We hypothesized that the two types of cells interact through direct contacting and cytokines secreting, and jointly participate in the progression of disease. Extracellular matrix (ECM) can provide the tissue with the mechanical support and mediate the cell-microenvironment interactions. Meanwhile, the extracellular structural changes may play facilitating roles in tumor progression 18 . Our results showed that GO analysis were linked to extracellular matrix synthesis, including collagen trimer and glycosaminoglycan biosynthetic process. ECM not only directly regulates cellular processes, such as adhesion, migration, proliferation and cell differentiation 19 , but also acts in binding, guiding and regulating signal molecules 20 . Tannous, et al. 15 demonstrated that the structure of ECM in the lesion site was disordered and melanocytes were extensively in ltrated. This consistency implies that ECM is involved in the formation and development of GCMN by affecting the behavior of melanocytes.
Although no pathway was signi cantly enriched in KEGG analysis, pathway"Arachidonic acid metabolism"(padj=0.277) attracted our attention. Gledhill, et al. 21 reported that arachidonic acid metabolism pathway exists in both keratinocytes and melanocytes, which can stimulate melanocyte proliferation and melanin production by secreting PGE2 and other regulatory factors. Furthermore, the arachidonic acid metabolic pathway has been proved to be activated or inhibited in some pigmented diseases, such as vitiligo 22 and solar lentigo 23 .
The role of over-expression of this pathway in GCMN lesions need to be further explored.
From the PPI network analysis, 10 genes were recognized as hub genes. Among them, PTGS2, EGF, SOX10 appeared most frequently in the top ten rankings and were involved in the mentioned biological processes. PTGS2, also known as COX-2, serves as an important role in prostaglandin production from arachidonic acid, and is highly expressed in in ammatory sites and melanoma 24 . Both animal experiments and cell experiments have con rmed that the growth, migration and invasion ability of melanoma cells were inhibited after the knockout of PTGS2 gene 24,25 . However ,it should be noted that compared with the normal group, PTGS2 gene was down-regulated in the diseased group in our study.
Epidermal growth factor (EGF) is commonly used to treat chronic ulcers of skin, such as diabetic foot ulcers 26 and severe burn injuries 27 ,as it can promote wound healing by inducing granulation tissue growth, angiogenesis and epidermal remodeling. In recent years, some studies have demonstrated that EGF has the inhibitory effects of skin pigmentation by blocking the activity of tyrosinase 28 . In addition, EGF is a typical anti-in ammatory and antioxidant cytokine, which could also reduce melanin production 26,29 .Kim, et al. 30 proposed EGFcontaining ointment had anti-pigmentation effects through performing a randomized controlled trial. In our study, it has been identi ed that the expression level of EGF was dramatically decreased in the lesions and may be related to local over-pigmentation.
As an essential transcription factor, Sox10 was the rst gene in SOX family which was found to be involved in neural crest development 31 .
By synergising with PAX3, SOX10 is capable of up-regulating the expression of the MITF(melanocyte master transcription factor) 32 and further activating dopachrome tautomerase and tyrosinase 33 ,the chain reaction can vastly promote melanocyte proliferation and synthesis of melanin. Numerous studies have been reported that SOX10 is highly expressed in tumors, especially melanoma, and plays an essential role in the proliferation, migration, and invasion of melanoma cells 34 .Kaufman, et al. 35 also found that overexpression of SOX10 in melanocytes signi cantly accelerated the formation of melanoma. Therefore, the overexpression of SOX10 in GCMN tissues is very likely to be linked to abnormal proliferation and high canceration of melanocytes and thus exploring the mechanisms of SOX10 in the occurrence and development of GCMN may be of great signi cance.
Several limitations exist in the present study because the sample size is not large enough. further experiments in vivo and in vitro are required to validate our ndings and to explore the role of the hub genes and biological processes.
In summary, we identi ed some key genes and biological processes that may be associated with the occurrence and development of GCMN through RNA-seq analysis, among which PTGS2, EGF and SOX10 are thought to be the three most important hub genes. Besides relating to melanin synthesis, these three genes are involved in "Arachidonic acid metabolism","glycosaminoglycan biosynthetic process","developmental pigmentation" respectively. These new ndings may help to elucidate the pathophysiology of GCMN.

Declarations
Data availability The RNA sequencing data presented in this study can be found in SRA database,The accession numbers are: PRJNA730598.