Small Nucleolar RNA and Small Nucleolar RNA Host Gene Signatures as Biomarkers for Pancreatic Cancer

Objective: The objective of this study was to identify key molecules including small nucleolar RNAs (snoRNAs) and small nucleolar RNA host genes (SNHGs) involved in pancreatic cancer. Methods: First, we screened the differentially expressed snoRNAs (DEsnoRNAs) and trend-related snoRNAs based on the cancer genome atlas (TCGA) dataset for pancreatic cancer, and then performed methylation correlation analysis, survival analysis, and extraction of snoRNA host genes for Gene ontology (GO) functional and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Next, DESNHGs and trend-related SNHGs were screened according to the TCGA dataset for pancreatic cancer, and a competing endogenous RNA (ceRNA) network was constructed for pathway and functional enrichment analysis. Results: A total of eight DEsnoRNAs and 93 trend-related snoRNAs were extracted. Then, ten host genes of the snoRNAs were identied. Functional analysis suggested that the ten host genes were signicantly enriched in several GO terms including mitotic chromosome condensation and endocytosis pathway. SNORA38B was considered to associate with survival and prognosis. The SNORD17 and SNORA11 were considered to negatively correlate with methylation. In addition, two trend-related SNHGs were extracted. Additionally, a ceRNA network was constructed with 11 miRNAs, one lncRNAs, and one mRNA. SNHG24 mainly correlated with GnRH secretion and neuroactive ligand-receptor interaction in pancreatic cancer. Conclusion: The identied snoRNAs and SNHGs could serve as potential markers for the early detection of pancreatic cancer.


Introduction
Pancreatic cancer is one of the most common malignancies of the digestive system. It has a poor prognosis due to its short disease course, high degree of malignancy, and challenges in early detection and treatment [1]. According to statistics, only about 24% of the patients survive for one year, and 9% of the patients survive for more than ve years following the diagnosis of pancreatic cancer [2]. According to GLOBOCAN (Global cancer epidemic statistics) estimates, 458,918 new cases of pancreatic cancer were diagnosed in 2018 (accounting for 2.5% of all cancers), that resulted in 438,242 deaths, accounting for 4.5% of all cancer-related deaths [3]. At present, due to the low e ciency of surgical resection and chemotherapy, and the incidence and mortality rate of pancreatic cancer continue to be on the rise [4]. Therefore, pancreatic cancer is one of the most signi cant cancers affecting the health of the people in China and worldwide, which has caused serious economic burden to the families of the patients [5,6]. Therefore, new methods are needed to overcome the invasion, metastasis, and drug resistance of pancreatic cancers. Several studies have investigated the possible molecular mechanisms of pancreatic cancer initiation and development to identify novel therapeutic targets for pancreatic cancer [7,8]. Notably, noncoding RNA transcripts such as microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) have been reported to play signi cant roles in the molecular mechanism of cancers [9,10]. According to the competing endogenous RNA (ceRNA) hypothesis, different RNA transcripts participate in pathological processes mainly through competing for the binding sites of shared miRNAs. Fu et al. [11] analyzed the relationship between the expression of the LncRNA, HOTTIP, and overall or disease-free survival of 90 patients with pancreatic cancer following radical resection. They found that HOTTIP modulates cancer stem cell properties in human pancreatic cancer by regulating HOXA9. Cheng et al. [12] investigated the functional effects and the possible mechanisms of the small nucleolar RNA host gene 7 (SNHG7) in pancreatic cancer, and found that SNHG7 was overexpressed in pancreatic cancer tissues, and knockdown of SNHG7 suppressed pancreatic cancer cell proliferation, migration, and invasion through the miR-342-3p/ID4 axis.
Small nucleolar noncoding RNAs (snoRNAs) are generally considered to be housekeeping molecules of ribosome function. In addition, subsets of snoRNAs were shown to be dysregulated in some cancers, including SNORA74B, SNORA21, U50, SNORD47, and SNORD62 [13,14]. Alexey et al. [15] investigated the expression of six miRNAs isolated from formalin-xed para n-embedded samples of pancreatic adenocarcinomas, and found that the snoRNA, U91, was a new internal control for accurate microRNA quanti cation in pancreatic cancer.
In this study, we screened differentially expressed snoRNAs (DEsnoRNAs) and trend-related snoRNAs based on the cancer genome atlas (TCGA) dataset for pancreatic cancer, and then performed methylation correlation analysis, survival analysis, and extraction of snoRNA host genes for Gene ontology (GO) functional and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. Then, the DESNHGs and trend-related SNHGs were screened based on the TCGA dataset for pancreatic cancer, and a ceRNA network was constructed for pathway and functional enrichment analysis. A ow chart depicting the schematic of this study is shown in Fig. 1.
This study aims to provide better understanding and promising therapeutic targets for pancreatic cancer.

snoRNA Data preprocessing
The TCGA dataset for pancreatic cancer and the corresponding comment les were downloaded from UCSC Xena [16]. First, the data was re-annotated to extract the expression data for snoRNAs. Then the downloaded data was transformed to get the count values, and the count values were sorted to obtain the expression matrix of the data.
Lastly, details about the pancreatic cancer of the sample and the stage grouping were obtained. The grouping of pancreatic cancer included tumor and normal time-series analysis of the stage.

Identi cation of differentially expressed snoRNAs
The differential analysis compared the tumor group with the normal. First, based on the read count data of the snoRNAs, the TMM (trimmed mean of m values) normalization provided by the R software package edge [17,18] was used to preprocess the read counts. Then the quasi-likelihood (QL) F-test in the edgeR package was applied for screening the differentially expressed snoRNAs (DEsnoRNAs), with the screening criteria set as p < 0.05, |logFC| > 1 (foldchange > 1.2). Based on the DEsnoRNA values obtained from the screening, a heat map was drawn to observe the clustering of samples, and a volcano plot was drawn to show the difference.

Identi cation of trend-related snoRNAs
Based on the classi cation of the tumor stage diagnosis, including stage I, stage II, stage III, and stage IV, the data were analyzed using the STEM [19] software (version 1.3.11, http://www.cs.cmu.edu/~jernst/stem/) for trend analysis. The data were processed with 'No normalization/add 0' and screened with p < 0.05.

Identi cation of host genes of DEsnoRNAs and trend-related snoRNAs
The host genes of DEsnoRNAs and trend-related snoRNAs in the data were extracted using the snoopy [20] tool (http://snoopy.med.miyazaki-u.ac.jp/).

GO functional and pathway enrichment analysis
According to the host genes of the snoRNAs, the clusterpro ler tool [21] was used to perform GO functional and KEGG pathway analysis. The GO functional analysis included biological processes (BP), cellular components (CC), and molecular functions (MF).

Survival analysis
Survival analysis was conducted on the DEsnoRNA and trend-related snoRNA data. p < 0.05 was considered to be an association with survival and prognosis. During the analysis, samples with survival information were screened and those with survival time longer than one month were selected.

Methylation correlation analysis
The methylation correlation analysis was conducted using the DEsnoRNAs, trend-related snoRNAs, and the data of TCGA methylation with corresponding pancreatic cancer. Then the methylation correlation results were obtained using the SNORic tool [22] ( http://bioinfo.life.hust.edu.cn/SNORic/) .

Data preprocessing
The TCGA dataset for pancreatic cancer and the corresponding comment les were downloaded from UCSC Xena [16]. First, the data was re-annotated to extract the expression data for small nucleolar RNA host gene (SNHG), and the mRNA expression data corresponding to each dataset was extracted. The downloaded data were transformed to get the count values, and the count values were sorted to obtain the expression matrix of the data. Finally, details about the pancreatic cancer of the sample and the stage grouping were obtained. The grouping of pancreatic cancer included tumor and normal time-series analysis of stage analysis.
Identi cation of differentially expressed SNHGs Differential analysis was used to compare the tumor and normal groups. First, according to the read count data, the TMM (trimmed mean of m values) normalization provided by the R software package edge [17,18] was used to preprocess the read count. Then, the quasi-likelihood (QL) F-test in the edgeR package was applied for screening the DESNHGs, and the following screening criteria were used p < 0.05, |logFC| > 0.585 (foldchange > 1.5).

Identi cation of trend-related SNHGs
Based on the classi cation of tumor stage diagnosis, including stage I, stage II, stage III and stage IV, the data were processed using the STEM [19] software (version 1.3.11, http://www.cs.cmu.edu/~jernst/stem/ ) for trend analysis. First, the data were processed with 'No normalization/add 0' and screened with p < 0.05 parameter. Then the data were processed according to the parameter 'normalize data' and screened with p < 0.1.

Correlation analysis
Based on the results of the differential and trend analyses, the SNHGs related to the pancreatic cancer trend were selected and combined with the mRNA expression value of corresponding datasets. LncRNA and its corresponding mRNA in each group of data were analyzed to obtain the lncRNA-mRNA interaction pairs. The screening threshold was set at p < 0.05.

Prediction of lncRNA-miRNA interaction pairs
The Miranda [23] software was used to extract the lncRNA sequence in the lncRNA-mRNA interaction pairs, and to predict the target miRNA. Then the results were screened with the screening parameters -sc 140, -en -20, and the lncRNA-miRNA interaction pairs were obtained.

Prediction of mRNA-miRNA interaction pairs
The Miranda [23] software was used to extract the mRNA sequence in the lncRNA-mRNA interaction pairs, and to predict the target miRNA. Then the results were screened with the screening parameters -sc 140, -en -20, and the mRNA-miRNA interaction pairs were obtained.

Construction of ceRNA network
The interaction pairs of lncRNA-mRNA, lncRNA-miRNA, and mRNA-miRNA were integrated to construct the ceRNA network and to obtain the ceRNA interaction pairs.

GO functional and pathway enrichment analysis in ceRNA network
The genes in the ceRNA network were used to perform KEGG pathway annotation using the clusterpro ler tool [21].
Then the pathways that may be affected by the SNHGs were obtained, and the mechanism of its effect on pancreatic cancer was analyzed.
Statistical analysis SPSS 22.0 statistical software was used for statistical processing. Continuous variables are expressed as mean ± standard deviation (mean ± SD). The enumeration data consistent with normal distribution between two groups were compared using the t test, and the comparison among groups of enumeration data were tested using chi-square test. Statistical difference with p < 0.05 was considered signi cant.

Results
Clinical information of pancreatic cancer data samples As shown in Table 1, the sites of pancreatic cancer included the body of pancreas, head of pancreas, overlapping lesion of pancreas, pancreas, NOS, and tail of pancreas. There was a signi cant difference between stage I, II, III, and IV (p < 0.05 ). The type and age of pancreatic cancer between stage I, II, III, and IV showed no signi cant differences.

Snorna
Identi cation of differentially expressed snoRNAs and trendrelated snoRNAs A total of eight DEsnoRNAs were identi ed using the method described above with p < 0.05 and |logFC| > 1 (foldchange > 1.2) and consisted of two upregulated and six downregulated snoRNAs. The heatmap and volcano plot of the DEsnoRNAs are shown in Fig. 2. After the data were processed with 'No normalization/add 0' and screened with p < 0.05, a total of 93 trend-related snoRNAs were obtained (Fig. 3, Table 2). The top 10 trend-related snoRNAs were selected for subsequent analysis. Identi cation of host genes of DEsnoRNAs and trend-related snoRNAs In total, four host genes of the DEsnoRNAs in the data were obtained, including NCAPD2, C1orf79, NAP1L4, and SNX5. A total of six host genes of the trend-related snoRNAs in data were identi ed, including NCAPD2, PHB2, MAGED2, CWF19L1, C5orf26, and C1orf79 (Table 3). Functional analysis of host genes of DEsnoRNAs and trend-related snoRNAs The GO enrichment analysis of the host genes of DEsnoRNAs revealed that the signi cantly enriched GO-BP terms included mitotic chromosome condensation, chromosome condensation, and mitotic sister chromatid segregation.
The condensin complex, DNA packaging complex, and condensed chromosome were three markedly enriched GO-CC terms. Furthermore, we also found that these genes played essential roles in histone binding, nucleosome binding, and actin lament binding according to the GO-MF analysis. The KEGG pathway analysis suggested that these genes were responsible for endocytosis pathway. The GO enrichment analysis of the host genes of trend-related snoRNAs revealed that the signi cantly enriched GO-BP terms included mitotic chromosome condensation, chromosome condensation, and mitotic sister chromatid segregation. The condensin complex, DNA packaging complex, and condensed chromosome were three markedly enriched GO-CC terms. Furthermore, we also found that these genes played essential roles in histone binding, nucleosome binding, and estrogen receptor binding according to the GO-MF analysis. The KEGG pathway analysis revealed that these genes were responsible for the endocytosis pathway ( Fig. 4 and Supplementary le 1).
Survival analysis of DEsnoRNAs and trend-related snoRNAs snoRNA survival analysis was performed based on the DEsnoRNAs and trend-related snoRNAs. As shown in the

Methylation correlation analysis of DEsnoRNAs and trend-related snoRNAs
Methylation correlation analysis was performed based on the DEsnoRNAs and trend-related snoRNAs. As illustrated in Fig. 6 and Table 4, SNORD17, a DEsnoRNA with the p = 0.0412 and correlation coe cient = -0.2077, was found to be negatively correlated with methylation. SNORA11, a trend-related snoRNA, was also found to be negatively correlated with methylation (p = 0.0336 and correlation coe cient = -0.2159). Based on p < 0.05, |logFC| > 0.585 (foldchange > 1.5), no DESNHG was identi ed. After the data were processed with 'No normalization/add 0' and screened with p < 0.05, there were no signi cant trend-related SNHGs. Then the data were processed based on the parameter 'normalize data' and screened with p < 0.1, and two SNHGs correlated with trends, including SNHG23 and SNHG24 (Fig. 7, Table 5). The expression values of these two SNHGs were selected for subsequent analysis. Prediction of lncRNA-miRNA and mRNA-miRNA interaction pairs With the screening parameters of -sc 140 and -en -20, a total of 127 lncRNA-miRNA interaction pairs were obtained.
The lncRNA-miRNA interaction pairs are shown in Table 6. With the screening parameters of -sc 140 and -en -20, a total 99 mRNA-miRNA interaction pairs were identi ed. The mRNA-miRNA interaction pairs are shown in Table 7.    Construction of ceRNA network Based on the interaction pairs of lncRNA-miRNA, mRNA-miRNA, and lncRNA-mRNA, a ceRNA network was constructed with 11 miRNAs, one lncRNA, and one mRNA, including SNHG24-has-miR-4739-KISS1, SNHG24-has-miR-4734-KISS1, and SNHG24-has-miR-135b-3p-KISS1 (Fig. 8). These RNA interactions may contribute to the progression and metastasis of pancreatic cancer.

GO functional and pathway enrichment analysis in ceRNA network
Based on the lncRNA-mRNA interaction pairs in the ceRNA network, KEGG pathway annotation was performed. The results revealed that SNHG24 mainly correlated with GnRH secretion and neuroactive ligand-receptor interaction in pancreatic cancer (Fig. 9).

Discussion
Pancreatic cancer is the most malignant tumor of the digestive system, and it is also known as the 'king of cancers' [24]. According to the latest statistical data, the incidence of pancreatic cancer ranks eighth among all malignant tumors in China, and the cumulative 5-year survival rate is only 7.2% [25]. Although growing number of studies have focused on dissecting and evaluating the underlying molecular etiology of pancreatic cancer over the past few years, the precise mechanisms of pancreatic cancer initiation and development are still poorly understood. In our study, eight DEsnoRNAs and 93 trend-related snoRNAs were identi ed. Ten host genes of snoRNAs were also identi ed, and functional analyses revealed that the ten host genes were signi cantly enriched in GO terms including mitotic chromosome condensation and endocytosis pathway. SNORA38B was found to be associated with survival and prognosis. SNORD17 and SNORA11 were found to be negatively correlated with methylation. Two trend-related SNHGs were also extracted. Additionally, a ceRNA network was constructed with the 11 miRNAs, one lncRNAs, and one mRNA. SNHG24 was mainly correlated with GnRH secretion and neuroactive ligand-receptor interaction in pancreatic cancer.
SNORD17, also known as HBI-43, has been shown to be involved in the differentiation of different cancer subtypes [26]. A recent report showed that SNORD17 was functionally related to a speci c active gene TRIM25 [27]. Wang et al. [28] demonstrated that the control of TRIM25 RNA by an interplay between IGF2BP3 and miR-3614-3p represents a mechanism for breast cancer cell proliferation. More notably, Li et al. [29] reported that suppression of TRIM25 expression using high levels of miR-873 dictates MTA1 protein upregulation in hepatocellular carcinoma. Therefore, it is quite reasonable to speculate that as the regulator of TRIM25, the expression pattern of SNORD17 may also be sensitive to distinguish the different cancer subtypes. Interestingly, Davanian et al. [30] rst identi ed SNORA11 as a distinct ncRNA signature of ameloblastoma. Herein, our methylation correlation analysis showed that SNORD17 and SNORA11 were also negatively correlated with methylation. Taken together, our results suggest that SNORD17 and SNORA11 are likely associated with pancreatic cancer occurrence and progression. In addition, our functional analysis showed that SNORD17 participated in some BPs, especially in endocytosis. Numerous studies have reported that endocytosis serves as a key regulatory mechanism in carcinogenesis [31,32], which further implies that the molecular interactions between SNORD17 and endocytosis may be a novel therapeutic target for pancreatic cancer treatment. In this study, we also found SNORA38B to be associated with survival and prognosis of pancreatic cancer. Until now, SNORA38B has not been associated with any other neoplasms, thus making it unique in this respect.
Clorf79, also known as SNHG12, is a novel lncRNA identi ed to be upregulated in several cancer cells, such as human osteosarcoma cell [33], nasopharyngeal carcinoma cell [34], and human endometrial carcinoma [35]. Zhao et al. [36] veri ed that SNHG12 promotes angiogenesis after ischemic stroke through the miR-150/VEGF pathway, that further clari ed the mechanism of angiogenesis after ischemic stroke, and provided a target for the treatment of ischemic stroke. Wang et al. [37] suggested that SNHG12 contributes to the oncogenic potential of triple-negative breast cancer and may be a promising therapeutic target. Wang et al. [38] illustrated that SNHG12 promoted cell growth and inhibited cell apoptosis in CRC cells, indicating that SNHG12 might be a useful biomarker for colorectal cancer. In the current study, C1orf79 was identi ed as the host gene of DEsnoRNAs and trend-related snoRNAs. C1orf79 is a new biomarker that to the best of our knowledge has not yet been reported to be associated with pancreatic cancer.
SNHG24, a prominent regulator of the ceRNA network, was found to be closely linked with the process of GnRH secretion and neuroactive ligand-receptor interaction, and exhibited signi cant correlation with KISS1. Reports suggest that the lncRNA SNHG24 can regulate TNF-α expression, and TNF-α and its associated lncRNA SNHG24 represent potential therapeutic targets for bone-invasive pituitary adenomas in the future [39]. Eckstein et al. [40] highlighted a strong correlation between GnRH secretion and the risk of prostate cancer, which directly supports our ndings. Reports suggest that the pathway of neuroactive ligand-receptor interaction is associated with a variety of cancers [41], which further implies that the molecular interactions between SNHG24, GnRH secretion, and neuroactive ligand-receptor interaction may be a novel therapeutic target for pancreatic cancer treatment. In addition, Wang et al. [42] showed that an increase in the expression of KiSS1 could inhibit invasion of pancreatic cancer cells without affecting cell proliferation. McNally et al. [43] suggested that the induction of KISS1 expression has the potential of adjuvant therapy in pancreatic cancer. More interestingly, the ceRNA-related with SNHG24-KISS1 network analysis found that SNHG24-KISS1 had a strong relationship with miRNAs such as has-miR-4739, has-miR-4734, and has-miR-135b-3p. Increasing evidence suggest that miRNA serve as functional modulators in tumor development [44,45]. Taken together, we speculate that miRNAs such as has-miR-4739, has-miR-4734, and has-miR-135b-3p contribute to pancreatic cancer progression possibly by targeting the SNHG24-KISS1 axis, which still needs to be con rmed through further bioinformatics analysis and experimental evidence.
Although we explored the potential molecular mechanisms of pancreatic cancer occurrence and development using a bioinformatics approach, there were still several limitations in this study. Firstly, no DESNHGs were obtained in this study. In addition, relevant experiments including cell biology assays, and animal and clinical studies need to be performed to verify the multiple candidate targets and signaling pathways identi ed from our bioinformatics analyses.

Conclusion
The snoRNAs and SNHGs identi ed in our study may serve as potential markers for the early detection of pancreatic cancer. However, our ndings still need to be validated through further studies.

Availability of data and materials
The datasets generated during the current study are not publicly available but obtained from corresponding authors on reasonable request. Figure 1 The ow chart of this study Figure 2 Identi cation of differentially expressed snoRNAs (DEsnoRNAs) and trend-related snoRNAs To screen the differentially expressed snoRNAs (DEsnoRNAs) between the tumor and normal groups based on the read count data of the snoRNAs, the TMM (trimmed mean of m values) normalization provided by the R software package edge was used to preprocess the read count. Then the quasi-likelihood (QL) F-test in the edgeR package was applied for screening the DEsnoRNAs, and the screening criteria was p < 0.05, |logFC| > 1(foldchange > 1.2). A total of eight DEsnoRNAs, including two upregulated and six downregulated snoRNAs were identi ed. A: Heatmap of DEsnoRNAs;

Figures
B: volcano plot of DEsnoRNAs    Identi cation of trend-related SNHGs To obtain the trend-related SNHGs based on the classi cation of tumor stage diagnosis, including stage I, stage II, stage III, and stage IV, the data was processed using STEM software for trend analysis. First, the data was processed with 'No normalization/add 0' and screened with p < 0.05. Then the data was  Construction of ceRNA network To identify the ceRNA interaction pairs, the interaction pairs of lncRNA-mRNA, lncRNA-miRNA, and mRNA-miRNA were integrated to construct the ceRNA network. The ceRNA network was constructed with 11 miRNAs, one lncRNA and one mRNA. The red circle represents SNHGs, the green circle represents mRNA, and the blue circle represents miRNA.