Analysis of small nucleolar RNAs (snoRNAs) associated co-expression and construction of ceRNA regulatory networks in esophageal carcinoma

Purpose: This study aimed to investigate the role of small nucleolar RNAs (snoRNAs) and their host genes (SNHGs) in tumorigenesis and progression of esophageal carcinoma (EC). Methods: RNA-seq data and clinical information for EC patients were obtained from The Cancer Genome Atlas database. Differentially expressed snoRNAs (DE-snoRNAs) and SNHGs (DE-SNHGs) between EC samples and normal controls were screened with the edgeR package, followed by the trend analysis with STEM. Survival analysis was performed using the Kaplan–Meier method with the log-rank test. Screening of co-expressed genes and functional enrichment analyses were performed with Pearson’s correlation analysis and enrichment analysis, respectively. Lastly, the co-expression and ceRNA networks were constructed. Results: A total of 19 DE-snoRNAs and nine DE-SNHGs were screened between EC samples and normal controls. Moreover, two snoRNA clusters related to the clinic stage were analyzed through trend analysis. Survival analysis results demonstrated that SNORA70D was signicantly associated with the overall survival of EC patients (P=0.034). In addition, a co-regulation network that included six snoRNAs and 29 mRNAs was constructed. A ceRNA network comprised of ve SNHGs, 11 mRNAs, and 38 miRNAs was constructed based on the top 50 relationship pairs. KEGG pathway enrichment analysis revealed that SNORA14B, SNORA47, SNORA71C, SNORD12B, and SNORD14E in the co-expression network were mainly enriched in the NOD−like receptor signaling pathway, and neuroactive ligand−receptor interaction pathway. SNHG23, SNHG3, SNHG17, SNHG4, and SNHG8 in the ceRNA regulation network were mainly enriched in calcium signaling pathway, cytokine−cytokine receptor interaction, and the extracellular matrix−receptor interaction pathway. Conclusion: The data revealed a series of snoRNAs, SNHGs, and pathways involved in EC carcinogenesis, which will provide a basis for understanding the underlying molecular mechanism of EC development.


Introduction
Esophageal carcinoma (EC), one of the most aggressive cancers, has been the sixth leading cause of cancer deaths worldwide, representing the 3.2% of all cancers [1,2]. Traditionally, EC is subdivided into two histologic types, including esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC), of which 88% of the cases are ESCC [3]. Accumulating evidence shows that the risk factors of EC include smoking, alcoholic beverages, Barrett's esophagus, gastric re ux and obesity [4,5]. Till now, despite the improvements of multidisciplinary treatments, such as surgery, chemotherapy, and radiotherapy, the prognosis of EC patients still remains poor, with the 5-year survival rate remains only approximately 20% [6]. To our knowledge, EC is a multistep process involving a series of genetic or epigenetic alterations; hence, to investigate the molecular and pathogenic mechanisms underlying the carcinogenesis of EC is still necessary.
Small nucleolar RNAs (snoRNAs), comprising of 60-300 nucleotides in length, are a conserved class of non-coding RNAs that contribute to ribosome biogenesis and RNA splicing by modifying ribosomal RNA and spliceosomal RNAs. The role of snoRNA in tumor development was rst reported in 2002 [7]. In the ensuing decade, an increasing number of studies have indicated the involvement of snoRNAs in different tumors, including lymphoma, breast cancer, melanoma, and myeloma [8][9][10][11][12]. It has been reported that snoRNAs are regulated in a was complex manner by host genes, copy number variation, and DNA methylation [13]. As the host genes for snoRNAs, small nucleolar RNA host genes (SNHGs) include coding genes and non-coding genes. Long non-coding small nucleolar host genes are one of the classes of SNHGs. Increasing lnc-SNHG members continue to be identi ed in cancers. For instance, it has been reported that SNHGs are involved in the development of various diseases, including cancer progression, cell apoptosis, and survival [14]. SNHG8 may play oncogenic roles in the malignancy of ESCC, and offer potential biomarkers for the diagnosis and prognosis of ESCC [15]. LncRNA SNHG1 may be a potential predictor of prognosis in ESCC patients and a novel target for ESCC treatment [16].
With the development of microarray pro les and RNA-sequencing, a series of public gene expression analyses were published, including The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/) and Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/gds) datasets. In particular, some snoRNAs and SNHGs play important roles in biological processes of disease. However, few studies have addressed the presence of these snoRNAs or SNHGs in EC. Hence, in the present study, we conducted a bioinformatics analyses to screen the candidate snoRNAs, SNHGs, and pathways involved in EC development as shown in Supplementary Figure 1.

Data collection and preprocessing
Public RNA-sequencing data and survival data of patients with EC were downloaded from the UCSC Xena [17] ( https://xenabrowser.net/). The data are open to the public under certain guidelines. All expression pro les data were classi ed into two groups: normal and tumor. Clinical information, including age at diagnosis, gender, site, type and body mass index was summarized in Table 1. After the extraction of snoRNAs and SNHGs data, the initial log (counts+1) value was converted to the counts.

Differential snoRNAs and SNHGs expression analysis
Based on the read count, differentially expressed snoRNAs (DE-snoRNA) and SNHGs (DE-SNHGs) were analyzed using the TMM (trimmed mean of M values) normalization method of edgeR software package [18,19], followed by difference analysis using quasi-likelihood (QL) F-test method. The cutoff criteria of DE-snoRNA was P<0.05 and |logFC|>1(foldchange>1.2), and that for DE-SNHGs screening was P<0.05 and |logFC|>0.585(foldchange>1.5). Hierarchical clustering analysis was performed for the DE-snoRNA and DE-SNHGs using the R package.

Trend analysis
According to the diagnosis classi cation of tumor stage, EC was divided into four stages, including stage i, stage ii, stage iii, and stage iv. The trend analysis was performed with STEM [20] (version 1.3.11, http://www.cs.cmu.edu/~jernst/stem/).

Screeningof the host genes of the snoRNA and functional annotation
The host genes of the DE-snoRNAs and snoRNAs related to clinical stage of EC were extracted by using the snoopy tool (http://snoopy.med.miyazaki-u.ac.jp/) [21]. For functional annotation of those host genes of snoRNA, GO and KEGG pathway annotations were conducted with the cluster pro ler package [22]. Signi cant pathways with a threshold of P<0.05 were selected.

Survival analysis between snoRNA and methylation
According to the TCGA datasets, Kaplan-Meier analysis was used to screen the genes showing clinical relevance in tumor patients. Brie y, the samples were divided into two groups according to the median expression of each gene: high expression and low expression groups. The log-rank test was used to analyze the differences between groups, and the signi cantly prognostic value was selected with a threshold of P<0.05. Statistical analyses were performed using R package ("survival"). To construct the ceRNA regulation network, Pearson correlation analysis was performed to determine the correlation of the DE-SNHG and DEGs (threshold: P<0.05). The top 5 members of the lncRNA-mRNA relationships were selected for further study. Subsequently, the targeted miRNAs of lncRNAs and mRNAs were predicted based on the MiRanda software [5]. Next, the ceRNA network was constructed based on the lncRNA-miRNA and lncRNA-mRNA relationship pairs with the threshold of -sc 140 and -en -20. Similarly, the enrichment analysis of the genes in the network was done using the cluster pro ler package [22].

Statistical analyses
All statistical analyses were performed using GraphPad Prism 7.0 (GraphPad Inc., La Jolla, CA), and all data are expressed as the mean ± standard deviation (SD). Statistical comparisons between groups of normalized data were performed using the Student's t-test. P<0.05 was considered to be statistically signi cant difference.

Screening of DE-snoRNA and DE-SNHGs
Based on the fold change of P<0.05 and |logFC|>1, a total of 19 snoRNAs were revealed to be differentially expressed between EC and normal controls. Among these DE-snoRNAs, 16 were upregulated and 3 were down-regulated in EC compared with normal samples (Supplementary Table 1). In addition, compared with those in the normal, there were 9 DE-SNHGs, including 4 up-and 56 downregulated SNHGs in the EC samples, with the threshold of P<0.05 and |logFC|>0.585. The heat map of hierarchical clustering showed that the DE-snoRNAs or DE-SNHGs could be signi cantly distinguished between two groups ( Figure 1) Screening of snoRNAs and SNHGs related to clinical stage As shown in Figure 2A, the expressed trends of snoRNAs related to clinical stage of EC were further analyzed. Two snoRNA clusters, including cluster 6 and cluster 7 were screened. A total of 22 and 169 snoRNAs were found in cluster 6 and cluster 7, respectively (Supplementary File 1). The expressed trends of the snoRNAs in two clusters are illustrated in Figure 2B and C. However, no SNHG cluster was signi cantly related to the clinical stage of EC.

Functional annotation
To analyze the functions of the DE-snoRNAs and snoRNAs related to clinical stage of EC, the host genes were screened ( Table 2). As shown in Figure 3A, the host genes, including SNORA14A, SNORD11B, SNORD15A, SNORD53, and SNORD72, were mainly enriched in the GO BP terms related to regulation of steroid hormone biosynthetic process, positive regulation of cholesterol metabolic process, and carnitine metabolic process. Furthermore, KEGG pathway enrichment analysis revealed that SNORD11B, SNORD15A, SNORD53 and SNORD72 were involved in pathways that included Ribosome biogenesis in eukaryotes, Ribosome and n Pathogenic Escherichia coli infection ( Figure 3B). The host genes of clinical stage-related snoRNAs (SCARNA10, SCARNA12, SNORA1, and SNORA11) were mainly enriched in pathways that included mitotic chromosome condensation, chromosome condensation, and mitotic sister chromatid segregation ( Figure 3C).

Correlation analysis between snoRNAs and methylation
To predict the relationship between snoRNAs and methylation, correlation analysis was performed based on the methylation levels of the EC samples from the TCGA. The results showed 7 DE-sonRNAs and 5 snoRNAs related to clinical stage were found signi cantly correlated with the methylation levels (all, P<0.05; Figure 4 and Supplementary table 2), respectively.
Survival analysis of DE-snoRNAs and snoRNAs related to clinical stage of EC In order to screen the snoRNAs, which may show clinical relevance with tumor patients, we performed the survival analysis on the DE-snoRNAs and snoRNAs related to clinical stage using TCGA datasets.
Kaplan-Meier curve showed that patients with high expression of SNORA70D had shorter overall survival compared to those with low expression (P=0.034; Figure 5), suggesting SNORA70D is associated with the prognosis of EC patients.
Construction of snoRNA -mRNA co-expression network A total of 38 and 3 DEGs were respectively found to be associated with DE-sonRNAs and clinical staging related snoRNAs based on the Pearson correlation coe cient analysis. The co-regulation network of snoRNA-mRNA was shown in Figure 6A, which included 6 snoRNAs and 29 mRNAs. Among these snoRNAs, SNORD12B was found to have the highest degree (degree=12). Furthermore, KEGG pathway enrichment analysis of snoRNA -mRNA relation pairs (top 5) revealed that SNORA14B, SNORA47, SNORA71C, SNORD12B, and SNORD14E mainly enriched in transcriptional misregulation in cancer, NOD−like receptor signaling pathway, neuroactive ligand−receptor interaction ( Figure 6B).

Construction of the ceRNA regulation networks
According to the miranda software, lncRNA-miRNA and mRNA-miRNA relationship pairs were obtained, respectively (Supplementary le 2). Next, the ceRNA network was constructed using the top 50 relationship pairs. As illustrated in Figure 7A, the ceRNA network was composed of 5 SNHG nodes, 11mRNA nodes, and 38 miRNA nodes. Subsequently, KEGG pathway enrichment analysis of SNHGs (top 5: SNHG23, SNHG3, SNHG17, SNHG4 and SNHG8) revealed a series of pathways, such as calcium signaling pathway, IL−17 signaling pathway, cytokine−cytokine receptor interaction and ECM−receptor interaction ( Figure 7B).

Discussion
With the advances in high throughput sequencing, an increasing number of snoRNAs have been screened and are emerging as important RNAs, thereby attracting the attention of researchers. Studies have shown that some snoRNAs play important roles in biological processes, and dysfunction of snoRNAs may lead to oncogenesis [16]. It has also been reported that snoRNAs could serve as biomarkers in several diseases, including cancers [23]. Concerning the host gene for snoRNAs, SNHG genes may have diverse regulatory effects on cellular processes in some cancers [24,25]. In the present study, a total of 19 DE-snoRNAs and 9 DE-SNHGs were screened between EC samples and normal controls, among which SNORA14B, SNORA47, SNORA71C, SNORD12B, and SNORD14E were selected as the key members due to their topological characteristics in the snoRNA-mRNA co-expression network; Similarly, SNHG23, SNHG3, SNHG17, SNHG4, and SNHG8 was selected as the key members in the ceRNA regulation network. SNORA47 is involved in lung cancer tumorigenesis [26] and promotes tumorigenesis by regulating EMT markers in hepatocellular carcinoma [27]. SNORA71C is overexpressed in breast cancer brain metastases [28]. Furthermore, SNHG3 facilitates cell proliferation and migration in oral squamous cell carcinoma via targeting nuclear transcription factor Y subunit gamma [29]. It has been also been reported to that SNHG3-related ceRNAs can be potential research targets to explore the molecular mechanisms of HCC [30]. Moreover, SNHG17 is involved in gastric cancer, non-small-cell lung cancer, or melanoma through regulation of pathways, such as the phosphoinositide 3-kinase (PI3K)-AKT pathway [31-33]. However, until, little has been published of these snoRNAs or SNHGs in EC.
Oncogenesis is a complex process that is involved in the interaction of various genes and signaling pathways. Hence, the analysis of candidate snoRNAs or SNHGs and pathways related to EC could provide the cognitive basis for disease development. The present data reveal the key snoRNAs or SNHGs in co-expression and ceRNA networks mainly enriched in pathways, such as the NOD-like receptor signaling pathway and neuroactive ligand-receptor interaction, as well as calcium signaling pathway, cytokine-cytokine receptor interaction, and ECM-receptor interaction. Actually, cytokine-cytokine receptor interaction, NOD-like receptor signaling pathway, and ECM-receptor interaction are chief contributors to EC progression [34]. Zeng et al. [35] reported that the calcium signaling pathway and the neuroactive ligand-receptor interaction are the most relevant pathways in EC diagnosis using miRNA-seq and RNA-seq data. In accordance with previous studies, our data suggest the involvement of snoRNAs or SNHGs in the progression of EC by the aforementioned pathways.
Recent studies have suggested that snoRNAs can be a novel prognostic biomarkers and therapeutic targets for cancers. For instance, Zhu et al. [36] reported that SNORD89 deleteriously affects the prognosis of ovarian cancer patients by regulating the Notch1-c-Myc pathway. Another study described that the inhibition of SNORA7B expression impaired cell growth, proliferation, migration, and invasion by inducing apoptosis [37]. In this study, based on data from TCGA, patients with high expression of SNORA70D displayed shorter overall survival compared to those with low expression. However, the role of SNORA70D has not been reported in any disease, and it will be necessary to further validate the clinical prognostic value of SNORA70D EC. In addition to their host genes, snoRNAs can participate in the regulation of methylation. In EC, several genes with aberrant DNA methylation have been selected as potential disease biomarkers [38, 39] But until now, few studies have investigated the interaction between snoRNAs and methylation in EC. In our study, the SNORD11B and SNORD15A were positively correlated with the levels of methylation sites, such as cg16620283 and cg27122942. However, this study only analyzed the correlation between snoRNAs and methylation sites, and the detailed regulatory mechanism between them needs to be further clari ed.
There were some limitations in this study. For example, all results in this study were predicted by the bioinformatics methods, and hence, further experiments and more samples are needed to validate these ndings.

Conclusion
The present study used bioinformatics analyses to screen DE-snoRNAs and DE-SNHGs in EC based on the TCGA dataset. Brie y, DE-snoRNAs of SNORA14B, SNORA47, SNORA71C, SNORD12B, and SNORD14E; DE-SNHGs of SNHG23, SNHG3, SNHG17, SNHG4, and SNHG8, as well as pathways of NODlike receptor signaling pathway, neuroactive ligand-receptor interaction, calcium signaling pathway, cytokine-cytokine receptor interaction, and ECM-receptor interaction, which may be involved the EC development, were obtained. These ndings could improve the understanding of the pathogenesis and molecular mechanisms of EC, which will provide effective targets for the treatment of EC.

Supplemental Information Note
Supplementary Figure 1 The main work ow of this study. Figure 1 Screening of the differentially expressed snoRNA and SNHGs between esophageal carcinoma and normal controls. The heat map of hierarchical clustering analysis indicating the differential snoRNA (A)
Page 18/20 Figure 5 Survival analysis of SNORA70D. Kaplan-Meier curve showed that patients with high expression of SNORA70D had shorter overall survival compared to those with low expression.

Figure 6
The snoRNA -mRNA co-expression network and KEGG enrichment analysis. A: Yellow and blue of oval represent up-and down-regulated mRNA. The rectangle represents snoRNA. B: The horizontal axis represents snoRNAs, and the vertical axis represents enriched pathways.

Figure 7
The ceRNA network and KEGG enrichment analysis. A: The red, green and blue circle represents SNHGs, mRNA, and miRNA, respectively. B: The horizontal axis represents SNHGs, and the vertical axis represents enriched pathways.The ceRNA network and KEGG enrichment analysis. A: The red, green and blue circle represents SNHGs, mRNA, and miRNA, respectively. B: The horizontal axis represents SNHGs, and the vertical axis represents enriched pathways.