Gene modules and non-coding RNAs involved in pancreatic tumorigenesis through acinar ductal metaplasia

Background: Acinar ductal metaplasia (ADM) is a recently identied precursor lesion that can progress through pancreatic ductal intraepithelial neoplasia (PanIN) to pancreatic ductal adenocarcinoma (PDAC). However, the genetic alterations and the transcriptional regulators at work during the process of ADM-driven PDAC tumorigenesis are largely unknown. We applied a multidimensional integration strategy to unveil the gene modules and non-coding RNAs that drive the ADM-PanIN-PDAC process. Methods: GSE40895 and the microarray datasets were integrated to unmask the regulators linked to ADM, PanIN and PDAC. Based on the differentially expressed genes and protein–protein interaction (PPI) networks for each stage, overlapping and crosstalk gene modules in ADM-PanIN-PDAC were identied using the search tool for the retrieval of interacting genes (STRING) and Cytoscape. The functions of these modules were elucidated by gene ontology (GO) analysis. The expression levels of hub genes and survival analysis were investigated in human PDAC via gene expression proling interactive analysis (GEPIA). The MiRDB database was used to predict potential non-coding RNAs (ncRNAs) capable of regulating overlap and crosstalk genes. Results: We found several bridging ADM gene modules (e.g. SMARCA1 and H2AFZ), PanIN gene modules (e.g. HDAC11 and SMARCA2) and PDAC gene modules (e.g. OLFR239 and CLIP3). They were enriched in nucleosome assembly, chromatin organization and G-protein coupled receptor signalling pathways by GO analysis. MicroRNAs (e.g. mmu-miR-335-5p and mmu-miR-669n) and lncRNAs (e.g. H19 and Gm14207) took part in this ample crosstalk by regulating the gene expression. Conclusions: SMARCA1, SMARCA2 and CLIP3 were identied as novel crosstalk genes and potential prognostic biomarkers for ADM-driven PDAC carcinogenesis. After validation in clinical and functional studies,

Overlapping gene modules between samples of ADM and PanIN were selected by hypergeometric test with the following formula [20]: M, n, N and m represent the total number of genes in the ADM module, PanIN module, STRING database, and overlap modules, respectively. Overlapping gene modules between samples of PDAC and PanIN were also selected by the formula mentioned above. In this case M and n represent the number of genes in PanIN and PDAC, respectively, while the other values remain the same as above. A P value less than 0.05 was considered statistically signi cant. The same test was run again on these two overlapping gene modules. We nally obtained the signi cant overlapping gene modules of ADM-PanIN-PDAC. Cytoscape was used to visualize the interaction network of the overlapping gene modules.

Crosstalk gene modules in ADM, PanIN and PDAC
Whether the crosstalk of two gene modules is statistically signi cant depends on the number of interactions between two certain PPI networks and random computation [20]. One pair of sub-network modules of ADM and PanIN (or PanIN and PDAC) had m times of participation interaction in actual conditions. The original PPI network was randomized 1000 times by maintaining the degree of distribution of the unchanged nodes. The two sub-network modules with the same size as the original network modules were randomly screened. We computed the interaction times in random sub-network modules in the same pair of ADM and PanIN (or PanIN and PDAC) modules. The p value for the signi cance of the interaction between a single pair of sub-network modules was calculated as the randomized simulation computation of interaction times larger than the actual participation interaction times divided by 1000 times.
Interactive sub-network modules with a p value less than 0.05 were considered to be signi cant interactive sub-networks.

Gene ontology analysis of overlapping and crosstalk gene modules
The function of signi cant overlapping and crosstalk gene modules was investigated by GO analysis through the database for annotation, visualization and integrated discovery (DAVID) (https://david.ncifcrf.gov). Fisher's exact test was applied and a P value less than 0.05 was de ned as statistically signi cant.
MicroRNA and lncRNA network regulating the overlapping and crosstalk gene modules The MiRDB database was used to predict those microRNAs capable of regulating signi cant overlapping and crosstalk gene modules with the bioinformatic tool MirTarget. All of the mouse lncRNA information was obtained from deepbase (http://biocenter.sysu.edu.cn/deepBase/browser.php). Co-expression of lncRNA and potential target genes was acquired based on Pearson correlation by R. Only those with both a P value less than 0.05 and a correlation value more than 0.7 were considered as predicted lncRNA.
Differentially expressed and potentially prognostic overlapping or crosstalk genes in human PDAC To explore the expression level of overlapping and crosstalk genes in human PDAC samples, we used the GEPIA tool [21] to perform differential expression analysis in 171 normal pancreatic tissue samples and 179 PDAC tissue samples produced by The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) projects. Genes with absolute log 2 Fold Change (|log 2 FC|) >1 and P <0.05 are considered to be differentially expressed. Overall survival analysis was also performed by the GEPIA tool. It classi ed the high-expression and low-expression cohorts according to the median for log-rank test (Mantel-Cox test). Hazard ratios (HR) and con dence intervals (CI) were given. Genes with P <0.05 were considered to be signi cant potential prognostic biomarkers for PDAC. Results with statistical signi cance were reported.

DEGs and PPI networks in ADM-PanIN-PDAC
In the GSE40895 dataset, a total of 1658 DEGs in ADM (972 upregulated, 686 downregulated), 557 DEGs in PanIN (254 upregulated, 303 downregulated) and 705 DEGs in PDAC (355 upregulated, 350 downregulated) were identi ed among samples compared with matched non-precursor lesions or non-tumor tissue samples, respectively. A heatmap of the DEG expression was also created in the corresponding ADM, PanIN and PDAC groups ( Figure S1). The names of the genes used for the heatmap are presented in Tables S1-S3. To investigate the regulatory correlations of the DEGs in ADM, PanIN and PDAC, a PPI network of the DEGs was mined. 935 DEGs from a total of 1658 DEGs in ADM ( Fig. 1) were ltered into the DEG PPI network complex, including 6278 edges and 935 nodes. The degree value of the genes was indicated by the size of the nodes. The top 10 largest nodes in ADM included Ehmt1, Hdac3, Cecr2, Cdk6, Ranbp2, Bptf, Actg1, Acta1, Ncbp1 and Cul3. We then clari ed 639 PPI pairs among 261 ltered DEGs in PanIN and found 10 nodes with a higher degree (Rhob, Hist1h3g, Hist1h4j, Hist4h4, Hdac11 and Cdk17, Fig. 1c). Among all the 705 DEGs in PDAC, 320 DEGs were ltered into the PPI network with 320 nodes and 777 edges. Rhoq, Actr8, Frk, Zap70, Mras, Akt3, Lyn, Rab25, Pik3c3 and Jup were the top 10 largest nodes in the PPI network of PDAC.
Signi cant overlapping gene modules in ADM-PanIN-PDAC In total, twenty-six, seven and nineteen gene modules were selected from the PPI networks of ADM, PanIN and PDAC, respectively, based on the MCODE plugin of Cytoscape software. Using hypergeometric tests of the genes above, we found one pair of modules containing 4 signi cant overlapping genes between were enriched in nucleosome assembly, positive epigenetic regulation and initiation of DNA template transcription by GO analysis. Interestingly, DEGs in PanIN module-1 were enriched in similar functions including epigenetic regulation of gene expression, DNA-template transcription, and DNA methylation. 75% of all the genes annotated in both ADM and PanIN modules were members of the histone family, strongly indicating its potential role in promoting ADM to PanIN.
Coincidently, all of the 4 overlapping genes (Hist1h2an, Hist1h4c, Hist1h4m and Hist4h4) belong to the core histone family, H2A and H4, suggesting that they may play critical roles in promoting ADM to PanIN. To further understand the major functions of the overlapping genes, we performed a GO analysis based on the DAVID database (Table 1). No signi cant overlapping genes were found between PanIN and PDAC modules. This result indicates that overlapping genes are highly involved in the initial stages of PDAC (ADM and PanIN), rather than the PanIN-PDAC stage. However, whether overlapping genes exist between PanIN-PDAC based on other databases remains to be explored. Therefore, the mechanisms for regulating ADM-PanIN-PDAC progression may rely mostly on crosstalk, especially at the PanIN-PDAC stage.

Crosstalk modules between ADM and PanIN
Three pairs of PPI sub-networks between ADM and PanIN with signi cant crosstalk modules were found. There were 63, 35

Crosstalk modules in ADM-PanIN-ADM
We nally combined all the 10 crosstalk modules above to select crosstalk modules among the three different modules (ADM, PanIN and PDAC). Two crosstalk sub-networks of ADM-PanIN-PDAC were discovered ( Fig.3K and Fig.3L). Detailed statistical information on the combined crosstalk genes are shown in Table S6. Interestingly, most genes involved in the crosstalk of ADM-PanIN-PDAC are from the histone family. Mutation and modi cation of histones may promote ADM progression to PDAC through multiple pathways including nucleosome assembly, microtubule-based processes, and chromatin modi cation, as predicted by GO analysis ( Table 2). The top 10 genes regulating ADM-PanIN-PDAC with the highest degree were Chd1, Bptf, Smarca1, Cecr2 H2afz in ADM module-1, Smarca2 in PanIN module-4, and Frk, Clip3, Kcnmb2, Ccng1 in PDAC module-9 (Fig.3K). Another crosstalk network focuses on the OR superfamily (Fig.3L). GO analysis shows that these genes mainly function through the G-protein coupled receptor (GPCR) signaling pathway. So far, the role of ORs as GPCRs has been underappreciated. Some reports have shown that certain ORs inhibit the proliferation of lung and prostate cancer cell lines [23,24].
Importantly, we mapped all the signi cant overlapping and crosstalk genes into human homologies to investigate their expression level in PDAC and their potential relationships with prognosis using the GEPIA tool. We found that SMARCA1, SMARCA2, CLIP3, BPTF and H2AFZ were signi cantly overexpressed in PDAC compared with normal pancreatic tissue (Fig. 4A). KCNMB2 and H2AFP (human homology of mouse HIST1H2AN), SMARCA1, CLIP3 and SMARCA2 are putative prognostic factors for PDAC (Fig. 4B). Information on the expression and prognosis of other overlapping and crosstalk genes are provided in Fig. S1.

Discussion
This study unveiled the potential genetic alterations and their regulators that drive ADM to PanIN and subsequently PDAC. We identi ed 45 genes in the ADM stage, 7 genes in PanIN and 28 genes in PDAC that interact with each other. Overlapping and crosstalk genes in ADM-PanIN-PDAC contain several bridging ADM gene modules (e.g. Hist2h2ab and smarca1), PanIN gene modules (e.g. Hdac11 and smarca2) and PDAC gene modules (e.g. Olfr239 and Hdac6). GO analysis shows that these genes mainly play roles in nucleosome assembly, chromatin organization and the G-protein coupled receptor signaling pathway. Our results show that the expression of cancer-related genes is affected by epigenetic dysregulation via DNA methylation and histone modi cation and by small non-coding regulatory microRNAs (e.g. mmu-miR-335-5p) and lncRNAs (e.g. H19).
Overlapping genes and most crosstalk genes belong to the histone family. The histone is a highly conserved protein with extensive cellular functions. It constitutes two functional parts: core histones (H2A, H2B, H3 and H4) and linker histones (H1 and H5) [25]. In our study, most overlapping genes are core histones, including HIST1H2AN, also known as H2AC22 (homology of human H2AC11), and three members of Histone H4 (Hist1h4c, Hist1h4m and Hist4h4). Crosstalk genes including H2AFZ (synonym of H2A.Z) were annotated in ADM-PanIN-PDAC. Histones are associated with cancer-predisposing in ammation or even trans-differentiation [26]. Histone H3 has the potential to be a biomarker for evaluating the severity of acute pancreatitis due to caerulein-triggered extensive pancreatic acinar cell death in animal models [27]. Histone H4 binds to smooth muscle cells and triggers arterial tissue in ammation [28]. Histone variant H2A.J accumulates with aging in speci c tissues and may contribute to chronic in ammation, aging-associated disease and cancers [29]. H2AFZ has been reported to be upregulated in breast, liver, bladder and lung cancer and has oncogenic properties in prostate cancer [30]. Our study is the rst to demonstrate the potential involvement of H2AFZ in ADM driven PDAC carcinogenesis. Apart from the differential expression level of histone variants, modi cation of histones could also exert effects on carcinogenesis [31]. We found several histone deacetylases (HDAC) including HDAC3 in ADM module-1, HDAC11 in PanIN module-4 and HDAC6 in PDAC module-9 that are important in the network of ADM-PanIN-PDAC. High expression of HDAC3 in precursor lesions of prostate cancer has been reported, indicating a critical role in the initiation stage of tumorigenesis [32]. Aberrant expression of HDAC6 also plays critical roles in cell differentiation, apoptosis and cell cycle control [33]. Hdac11 has been identi ed as a novel target in antitumor therapy [34]. Loss of histone trimethyl transferase, H3K36, facilitates ADM formation through epigenetic dysregulation and leads to extracellular matrix (ECM) production in PDAC [35].
Many genes predicted to function in ADM-PanIN-PDAC regulation were found to interact with histones to different extents. Several genes annotated in our study have been previously validated in PDAC. Absence of ATRX in adult mice with oncogenic KRAS mutation, a subgroup of SWI/SNF complex (Switch/Sucrose Non-Fermentable chromatin remodeling), resulted in increased ADM and progressive PanIN lesions [36]. Deletion of ARID1A, a SWI/SNF component, in the pancreas exaggerated ADM formation, diminished regeneration after injury and lead to intraductal papillary mucinous neoplasm and PanIN when it cooperated with mutant KRAS [37]. We found that another family member, ARID4A, predicted in the ADM module, may also play a role in PDAC initiation. The crosstalk gene, SMARCA1, has been linked with in ammation related diseases and cancer cell proliferation, migration, growth, death and DNA damage [38,39]. Two other members of the SMARCA class have been successfully linked to PDAC. SMARCA2, annotated in the PanIN module, was correlated with cancer growth, chemoresistance and poor survival of pancreatic cancer patients [40]. SMARCA4-de cient mice in cooperation with oncogenic Kras promoted PDAC precursor lesions [41]. SMARCA1 may act upstream of p53 (a very common mutation in PDAC) by regulating the expression of p53 gene through the Wnt pathway, thus it could represent a prognostic biomarker for PDAC and warrants further research.
Another crosstalk gene, CHD1, is a chromatin remodeling factor which speci cally binds to methylated histone H3 lysine 4 residue (H3K4me3), and is involved in nuclear shuttling in pancreatic cancer cells [42]. Similarly, BPTF (bromodomain PHD transcription factor) could also bind with H3K4me3 to stabilize the NURF (nucleosome remodeling factor) complex on chromatin, resulting in transcriptional regulation [43]. BPTF was expressed at intermediate levels in PDACderived cell lines and is responsible for cancer cell proliferation and PDAC initiation and maintenance by interacting with c-MYC [44]. FRK plays a role in pancreatic cancer cell migration and proliferation, and was suggested to be a critical therapeutic target of pancreatic cancer [45]. CLIP3, a known antiin ammatory regulator involved in TNF-α signaling and injury healing [46],was identi ed in our study as a novel target of PDAC in the crosstalk gene network. CECR2 functions with SMARCA1 in the CERF complex (CECR2-containing remodeling factor) to regulate cell differentiation and development [47]. Interestingly, SMARCA1, SMARCA2 and CLIP3 were overexpressed in human PDAC and their expression level predicts the patient's overall survival.
We show that olfactory receptor (OR/OLFR) superfamily genes are relatively independent modules involved in PDAC formation. Emerging data has shown that ORs are correlated with cell invasiveness [48] and could function as putative drivers of cancer. Olfactory receptor (OR/OLFR) superfamily genes belong to the G protein-coupled receptor (GPCR) superfamily which are involved in the in ammatory response and cancer development via NF-kB signaling [49]. Our ndings on the 22 ORs linked to ADM-PanIN-PDAC support the notion that the olfactory transduction pathway is associated with an elevated risk of pancreatic cancer [50]. Another OR member, prostate-speci c G-protein-coupled receptor (PSGR/OR51E2), was upregulated in prostate cancer and could allegedly induce prostatitis at early age in the mouse and promote prostatic intraepithelial neoplasia [51]. MiRNA-374a and miRNA-410, putative regulators of cancer-related genes, were inversely correlated with PSGR overexpression [52]. Recently, overexpression of Olfr544, Olfr543 and Olfr1349 at the mRNA level were found to play a role in regulating glucagon secretion in pancreatic cells [53]. ORs are linked to H2AFZ and the histone superfamily through 4 lncRNAs (Gm9866, 4833422C13Rik, BC016548 and 2810407A14Rik). Exploring the relationship between the spectrum of OR and cancer is invaluable for kindling new therapeutic targets.
It is well known that long noncoding RNAs (lncRNAs) and small noncoding RNAs (miRNA) facilitate tumor initiation and progression through regulating tumor suppressor genes or oncogenes. Our analysis revealed a complex network among miRNAs, lncRNAs and their targeted genes during PDAC tumorigenesis. Consistently, one of our strongest candidates, Gas5, was recently shown to regulate pancreatic cancer metastasis via PTEN [54]. MiR-143, reported to be suppressed in colorectal neoplasia [55], was predicted to regulate ADM-PanIN-PDAC via targeting of ACTR10. MiR-193b, validated as a tumor suppressor in various cancer types, was found to regulate CLIP3 and interact with oncogenic lncRNA (MIR31HG) in our network [56]. Another tumor suppressor Meg3, predicted to interact with Hist1h2bn in our study, was found to be involved in pancreatic cancer [57]. H19, identi ed as an oncogenic lncRNA, plays a role in pancreatic cancer invasion and metastasis [55]. In line with previous ndings, we identi ed H19 as the top lncRNA in crosstalk with ADM, PanIN and PDAC. miR-335-5p, miR-7646-5p and miR-669n target both Hist4h4 and Hist1h4m. In the literature there is only one study that suggests that MiR-335-5p may attenuate pancreatic cancer development trough modulating the downstream oncogene [58]. The newly identi ed biological functions of these lncRNA/miRNAs in PDAC remain to be validated in further mechanistic and functional studies. Other tumor suppressors were also annotated in our network.
Although it has been well documented in the literature that persistent ADM could progress to PanIN, and nally to invasive PDAC [7,[59][60][61], PDAC may arise from ADM without the need for PanIN as an intermediate step. The results derived from this study do not apply to those PDAC arising directly from ADM or other precursors such a mucinous cystic neoplasia or intraductal papillary mucinous neoplasm. Furthermore, this bioinformatic analysis is primarily based on animal data, and validation in patient samples and functional studies are required to determine whether these ndings will be clinically useful.

Conclusion
In summary, we revealed potential molecular mechanisms that could regulate the progression of ADM-PanIN-PDAC at both genetic and epigenetic (differentially expressed genes, miRNA and lncRNA) levels. SMARCA1, SMARCA2 and CLIP3 were not only signi cant crosstalk genes, but also differentially expressed genes and potential prognostic factors for human PDAC. These newly TGF-β: transforming growth factor-β.

Declarations
Ethics approval and consent to participate: Not applicable.
Consent for publication Not applicable.
Availability of data and material: All data generated or analysed during this study are included in this published article [and its supplementary information les].
Competing interests: The authors declare that they have no competing interests.