Identification and analysis of small nucleolar RNAs (snoRNAs) associated co-expression and ceRNA regulatory networks in esophageal carcinoma

Background: This study aimed to investigate the role of small nucleolar RNAs (snoRNAs) and their host genes (SNHGs) in and (EC). Methods: RNA-seq data and clinical information for EC patients were obtained from The Cancer Genome Atlas database. Differentially expressed snoRNAs (DE-snoRNAs) and SNHGs (DE-SNHGs) between EC samples and normal controls were screened with the edgeR package, followed by the trend analysis with STEM. Survival analysis was performed using the Kaplan–Meier method with the log-rank test. Identification of co-expressed genes and functional enrichment analyses were performed with Pearson’s correlation analysis and enrichment analysis, respectively. Lastly, the co-expression and ceRNA networks were constructed. Results: A total of 19 DE-snoRNAs and nine DE-SNHGs were identified between EC samples and normal controls. Moreover, two snoRNA clusters related to the clinic stage were identified through trend analysis. Survival analysis results demonstrated that SNORA70D was significantly associated with the overall survival of EC patients (P=0.034). In addition, a co-regulation network that included six snoRNAs and 29 mRNAs was constructed. A ceRNA network comprised of five SNHGs, 11 mRNAs, and 38 miRNAs was constructed based on the top 50 relationship pairs. KEGG pathway enrichment analysis revealed that SNORA14B, SNORA47, SNORA71C, SNORD12B, and SNORD14E in the co-expression network were mainly enriched in the NOD−like receptor signaling pathway, and neuroactive ligand−receptor interaction pathway. SNHG23, SNHG3, SNHG17, SNHG4, and SNHG8 in the ceRNA regulation network were mainly enriched in calcium signaling pathway, cytokine−cytokine receptor interaction, and the extracellular matrix−receptor interaction pathway. Conclusion: The data revealed a series of snoRNAs, SNHGs, and pathways involved in EC carcinogenesis, which will provide a basis for understanding the underlying molecular mechanism of EC development. were obtained. These findings could improve the understanding of the pathogenesis and molecular mechanisms of EC, which will provide effective targets for the treatment of EC.

into two histologic types, including esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC), of which 88% of the cases are ESCC [3]. Accumulating evidence shows that the risk factors of EC include smoking, alcoholic beverages, Barrett's esophagus, gastric reflux and obesity [4,5]. Till now, despite the improvements of multidisciplinary treatments, such as surgery, chemotherapy, and radiotherapy, the prognosis of EC patients still remains poor, with the 5-year survival rate remains only approximately 20% [6]. To our knowledge, EC is a multistep process involving a series of genetic or epigenetic alterations; hence, to investigate the molecular and pathogenic mechanisms underlying the carcinogenesis of EC is still necessary.
Small nucleolar RNAs (snoRNAs), comprising of 60-300 nucleotides in length, are a conserved class of non-coding RNAs that contribute to ribosome biogenesis and RNA splicing by modifying ribosomal RNA and spliceosomal RNAs. The role of snoRNA in tumor development was first reported in 2002 [7].
In the ensuing decade, an increasing number of studies have indicated the involvement of snoRNAs in different tumors, including lymphoma, breast cancer, melanoma, and myeloma [8][9][10][11][12]. It has been reported that snoRNAs are regulated in a was complex manner by host genes, copy number variation, and DNA methylation [13]. As the host genes for snoRNAs, small nucleolar RNA host genes (SNHGs) include coding genes and non-coding genes. Long non-coding small nucleolar host genes are one of the classes of SNHGs. Increasing lnc-SNHG members continue to be identified in cancers. For instance, it has been reported that SNHGs are involved in the development of various diseases, including cancer progression, cell apoptosis, and survival [14]. SNHG8 may play oncogenic roles in the malignancy of ESCC, and offer potential biomarkers for the diagnosis and prognosis of ESCC [15].
LncRNA SNHG1 may be a potential predictor of prognosis in ESCC patients and a novel target for ESCC treatment [16].
With the development of microarray profiles and RNA-sequencing, a series of public gene expression analyses were published, including The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/) and Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/gds) datasets. In particular, some snoRNAs and SNHGs play important roles in biological processes of disease. However, few studies have addressed the presence of these snoRNAs or SNHGs in EC. Hence, in the present study, we conducted a bioinformatics analyses to identify the candidate snoRNAs, SNHGs, and pathways involved in EC development as shown in Supplementary Fig. 1.

Materials And Methods Data collection and preprocessing
Public RNA-sequencing data and survival data of patients with EC were downloaded from the UCSC Xena [17] ( https://xenabrowser.net/). The data are open to the public under certain guidelines. All expression profiles data were classified into two groups: normal and tumor. Clinical information, including age at diagnosis, gender, site, type and body mass index was summarized in Table 1. After the extraction of snoRNAs and SNHGs data, the initial log (counts + 1) value was converted to the counts.

Differential snoRNAs and SNHGs expression analysis
Based on the read count, differentially expressed snoRNAs (DE-snoRNA) and SNHGs (DE-SNHGs) were analyzed using the TMM (trimmed mean of M values) normalization method of edgeR software package [18,19], followed by difference analysis using quasi-likelihood (QL) F-test method. The cutoff criteria of DE-snoRNA was P < 0.05 and |logFC|>1(foldchange > 1.2), and that for DE-SNHGs screening was P < 0.05 and |logFC|>0.585(foldchange > 1.5). Hierarchical clustering analysis was performed for the DE-snoRNA and DE-SNHGs using the R package.

Trend analysis
According to the diagnosis classification of tumor stage, EC was divided into four stages, including stage i, stage ii, stage iii, and stage iv. The trend analysis was performed with STEM [20] (version 1.3.11, http://www.cs.cmu.edu/~jernst/stem/).

Identification of the host genes of the snoRNA and functional annotation
The host genes of the DE-snoRNAs and snoRNAs related to clinical stage of EC were extracted by using the snoopy tool (http://snoopy.med.miyazaki-u.ac.jp/) [21]. For functional annotation of those host genes of snoRNA, GO and KEGG pathway annotations were conducted with the cluster profiler package [22]. Significant pathways with a threshold of P < 0.05 were selected.

Survival analysis between snoRNA and methylation
According to the TCGA datasets, Kaplan-Meier analysis was used to identify the genes showing clinical relevance in tumor patients. Briefly, the samples were divided into two groups according to the median expression of each gene: high expression and low expression groups. The log-rank test was used to analyze the differences between groups, and the significantly prognostic value was selected with a threshold of P < 0.05. Statistical analyses were performed using R package ("survival").

Correlation analysis between snoRNA and methylation
Methylation has been regarded as a key factor in the abnormal expression of snoRNA in cancers. The correlation analysis of the methylation level and DE-snoRNA was analyzed by the SNORic data portal ( http://bioinfo.life.hust.edu.cn/SNORic/) [22].
Construction of snoRNA-mRNA co-expression network and ceRNA regulation network Based on the expression values of snoRNA or trend-related snoRNA, as well as the expression values of differentially expressed mRNAs corresponding to diseases, a correlation analysis was conducted to obtain the mRNAs corresponding to snoRNAs, with the threshold of P < 0.05 and |r| >0.4.
Furthermore, the enrichment analysis of the snoRNA-mRNA co-expression pairs were further carried out using the cluster profiler package [22].
To construct the ceRNA regulation network, Pearson correlation analysis was performed to determine the correlation of the DE-SNHG and DEGs (threshold: P < 0.05). The top 5 members of the lncRNA-mRNA relationships were selected for further study. Subsequently, the targeted miRNAs of lncRNAs and mRNAs were predicted based on the MiRanda software [5]. Next, the ceRNA network was constructed based on the lncRNA-miRNA and lncRNA-mRNA relationship pairs with the threshold of -sc 140 and -en -20. Similarly, the enrichment analysis of the genes in the network was done using the cluster profiler package [22].

Statistical analyses
All statistical analyses were performed using GraphPad Prism 7.0 (GraphPad Inc., La Jolla, CA), and all data are expressed as the mean ± standard deviation (SD). Statistical comparisons between groups of normalized data were performed using the Student's t-test. P < 0.05 was considered to be statistically significant difference.

Identification of DE-snoRNA and DE-SNHGs
Based on the fold change of P < 0.05 and |logFC|>1, a total of 19 snoRNAs were revealed to be differentially expressed between EC and normal controls. Among these DE-snoRNAs, 16 were upregulated and 3 were down-regulated in EC compared with normal samples (Supplementary Table 1).
In addition, compared with those in the normal, there were 9 DE-SNHGs, including 4 up-and 56 down-

Functional annotation
To identify the functions of the DE-snoRNAs and snoRNAs related to clinical stage of EC, the host genes were identified ( Table 2). As shown in Fig. 3A, the host genes, including SNORA14A, SNORD11B, SNORD15A, SNORD53, and SNORD72, were mainly enriched in the GO BP terms related to regulation of steroid hormone biosynthetic process, positive regulation of cholesterol metabolic process, and carnitine metabolic process. Furthermore, KEGG pathway enrichment analysis revealed that SNORD11B, SNORD15A, SNORD53 and SNORD72 were involved in pathways that included Ribosome biogenesis in eukaryotes, Ribosome and n Pathogenic Escherichia coli infection (Fig. 3B).

Correlation analysis between snoRNAs and methylation
To predict the relationship between snoRNAs and methylation, correlation analysis was performed

Construction of the ceRNA regulation networks
According to the miranda software, lncRNA-miRNA and mRNA-miRNA relationship pairs were obtained, respectively (Supplementary file 2). Next, the ceRNA network was constructed using the top 50 relationship pairs. As illustrated in Fig. 7A, the ceRNA network was composed of 5 SNHG nodes, 11mRNA nodes, and 38 miRNA nodes. Subsequently, KEGG pathway enrichment analysis of SNHGs (top 5: SNHG23, SNHG3, SNHG17, SNHG4 and SNHG8) revealed a series of pathways, such as calcium signaling pathway, IL − 17 signaling pathway, cytokine − cytokine receptor interaction and ECM − receptor interaction (Fig. 7B).

Discussion
With the advances in high throughput sequencing, an increasing number of snoRNAs have been identified and are emerging as important RNAs, thereby attracting the attention of researchers.
Studies have shown that some snoRNAs play important roles in biological processes, and dysfunction of snoRNAs may lead to oncogenesis [16]. It has also been reported that snoRNAs could serve as biomarkers in several diseases, including cancers [23]. Concerning the host gene for snoRNAs, SNHG genes may have diverse regulatory effects on cellular processes in some cancers [24,25]. In the present study, a total of 19 DE-snoRNAs and 9 DE-SNHGs were identified between EC samples and normal controls, among which SNORA14B, SNORA47, SNORA71C, SNORD12B, and SNORD14E were selected as the key members due to their topological characteristics in the snoRNA-mRNA coexpression network; Similarly, SNHG23, SNHG3, SNHG17, SNHG4, and SNHG8 was selected as the key members in the ceRNA regulation network. SNORA47 is involved in lung cancer tumorigenesis [26] and promotes tumorigenesis by regulating EMT markers in hepatocellular carcinoma [27]. SNORA71C is overexpressed in breast cancer brain metastases [28]. Furthermore, SNHG3 facilitates cell proliferation and migration in oral squamous cell carcinoma via targeting nuclear transcription factor Y subunit gamma [29]. It has been also been reported to that SNHG3-related ceRNAs can be potential research targets to explore the molecular mechanisms of HCC [30]. Moreover, SNHG17 is involved in gastric cancer, non-small-cell lung cancer, or melanoma through regulation of pathways, such as the phosphoinositide 3-kinase (PI3K)-AKT pathway [31][32][33]. However, until, little has been published of these snoRNAs or SNHGs in EC.
Oncogenesis is a complex process that is involved in the interaction of various genes and signaling pathways. Hence, the analysis of candidate snoRNAs or SNHGs and pathways related to EC could provide the cognitive basis for disease development. The present data reveal the key snoRNAs or SNHGs in co-expression and ceRNA networks mainly enriched in pathways, such as the NOD-like receptor signaling pathway and neuroactive ligand-receptor interaction, as well as calcium signaling pathway, cytokine-cytokine receptor interaction, and ECM-receptor interaction. Actually, cytokinecytokine receptor interaction, NOD-like receptor signaling pathway, and ECM-receptor interaction are chief contributors to EC progression [34]. Zeng et al. [35] reported that the calcium signaling pathway and the neuroactive ligand-receptor interaction are the most relevant pathways in EC diagnosis using miRNA-seq and RNA-seq data. In accordance with previous studies, our data suggest the involvement of snoRNAs or SNHGs in the progression of EC by the aforementioned pathways.
Recent studies have suggested that snoRNAs can be a novel prognostic biomarkers and therapeutic targets for cancers. For instance, Zhu et al. [36] reported that SNORD89 deleteriously affects the prognosis of ovarian cancer patients by regulating the Notch1-c-Myc pathway. Another study described that the inhibition of SNORA7B expression impaired cell growth, proliferation, migration, and invasion by inducing apoptosis [37]. In this study, based on data from TCGA, patients with high expression of SNORA70D displayed shorter overall survival compared to those with low expression.
However, the role of SNORA70D has not been reported in any disease, and it will be necessary to further validate the clinical prognostic value of SNORA70D EC. In addition to their host genes, snoRNAs can participate in the regulation of methylation. In EC, several genes with aberrant DNA methylation have been selected as potential disease biomarkers [38,39] But until now, few studies have investigated the interaction between snoRNAs and methylation in EC. In our study, the SNORD11B and SNORD15A were positively correlated with the levels of methylation sites, such as cg16620283 and cg27122942. However, this study only analyzed the correlation between snoRNAs and methylation sites, and the detailed regulatory mechanism between them needs to be further clarified.
There were some limitations in this study. For example, all results in this study were predicted by the bioinformatics methods, and hence, further experiments and more samples are needed to validate these findings.

Conclusion
The present study used bioinformatics analyses to identify DE-snoRNAs and DE-SNHGs in EC based on the TCGA dataset. Briefly, DE-snoRNAs of SNORA14B, SNORA47, SNORA71C, SNORD12B, and SNORD14E; DE-SNHGs of SNHG23, SNHG3, SNHG17, SNHG4, and SNHG8, as well as pathways of NOD-like receptor signaling pathway, neuroactive ligand-receptor interaction, calcium signaling pathway, cytokine-cytokine receptor interaction, and ECM-receptor interaction, which may be involved the EC development, were obtained. These findings could improve the understanding of the pathogenesis and molecular mechanisms of EC, which will provide effective targets for the treatment of EC. Supplementary Figure Legend Supplementary Figure 1 The main workflow of this study. Tables   Tables 1 and 2 are not available for this version of the manuscript.      The snoRNA -mRNA co-expression network and KEGG enrichment analysis. A: Yellow and blue of oval represent up-and down-regulated mRNA. The rectangle represents snoRNA. B:

Abbreviations
The horizontal axis represents snoRNAs, and the vertical axis represents enriched pathways.

Figure 7
The ceRNA network and KEGG enrichment analysis. A: The red, green and blue circle