Identification of novel candidate biomarkers and efficacious small molecule drugs for pancreatic adenocarcinoma based on comprehensive bioinformatics technology

Background: Pancreatic adenocarcinoma (PAAD) is considered as one of most serious solid tumors with high mortality as well as incidence. We performed this present study to screen out novel biomarkers and potential efficacious small drugs in PAAD using integrated bioinformatics analyses. Methods: Expression profiles derived from TCGA and GTEx datasets were integrated by following bioinformatics methods: DEGs analysis, WGCNA, KEGG and GO enrichment analyses. Kaplan-Meier method was conducted to assess the prognostic performance of genes, and the predictive value of identified genes was determined by the ROC analysis. CMap analysis was utilized to identify the related small molecule drugs in PAAD. Results: A total of 4777 DEGs containing 2536 up-regulated and 2241 down-regulated genes were identified. WGCNA identified six modules that were related with clinical characteristics, and functional enrichment analyses showed that all genes of blue module were enriched in EMT, PI3K/Akt signaling pathway and other important biological processes and pathways. Moreover, three genes ( ITGB1 , ITGB5 , and OSMR ) of blue module were determined as key genes with higher expression levels and significant prognostic value. ROC analysis revealed that these three genes had excellent diagnostic efficiency for PAAD. These three key genes were highly regulated in PAAD tissues compared with normal controls. A panel of small molecule drugs including thapsigargin, viomycin, adiphenine, and CP-690334-01 were screened out to treat PAAD. Conclusion: In conclusion, our findings identified three key genes implicated in PAAD and screened out several potential small drugs to treat it, which may shed novel insights into the development of PAAD and its treatment. Background

4 for identification of candidate key genes, sheds novel insights for cancer treatments, and has been employed in multiple cancers, such as stomach adenocarcinoma [8], clear cell renal cell carcinoma [9], and non-small-cell lung cancer [10]. Furthermore, five hub miRNAs have been identified as potential diagnostic and prognostic candidates for pancreatic ductal adenocarcinoma [11]. Herein, in this present study, we established a co-expression network using DEGs correlated with PAAD development through accessing to TCGA and Oncomine databases, and determined the important gene sets that closely linked with PAAD using WGCNA. KEGG and GO enrichment analyses were performed to better explore the functional roles of identified DEGs. Overall survival of these DEGs was detected using Kaplan-Meier method and small molecule drugs related to them were screened out for the potential use in the treatment of PAAD.

Data collection of gene expression profiles
To identify the DEGs in PAAD, the RNA-Seq data of PAAD were downloaded from the TCGA database (https://cancergenome.nih.gov/), which contains 178 tumor samples and 4 normal samples. Due to there were only 4 normal samples of adjacent tissue samples, we integrated the RNA-Seq data of 171 normal pancreas specimens from the Genotype-Tissue Expression (GTEx) dataset (https://gtexportal.org/).

Data preprocessing and screening for DEGs
The "limma" package of R software was used to standardize and transform the RNA-Seq data and screen the DEGs between PAAD samples and normal pancreas tissue specimens [12]. |logFC (fold change)| > 1 and false discovery rate (FDR) < 0.05 were considered as the threshold for DEGs screening.
Weighted co-expression network analysis As previously mentioned, weight gene co-expression network was established by "WGCNA" R package [13]. The input data of WGCNA establishment consists of 4777 DEGs that we had obtained and 176 PAAD samples with pathological staging. At first, clustering the samples and selecting a soft threshold value β based on the standard scale-free networks. Second, the adjacencies among all filtered genes were determined using the power adjacent function to Pearson correlation matrix and then transformed these data into topological overlap matrixs (TOMs). Average linkage hierarchical clustering was performed after calculating the dissimilarity (1-TOM). Finally, in order to further analyze the module, the dissimilarity of module eigengenes was calculated and modules with highly similarity were merged together as the cut-off value of 5.

KEGG and GO enrichment analyses
The biological functions of the genes in blue module were predicted using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses on the basis of the on-line platform of Database for Annotation, Visualization, and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/). Adjusted P < 0.05 was regarded as the cut-off criteria.

Validation of key genes
To further investigate the possible roles of module genes in the incidence and progression of PAAD, we conducted Kaplan-Meier analysis on the genes of PI3K/Akt signaling pathway with the most abundant genes in blue module to screen the key genes that have significant prognostic significance in PAAD. The clinical specimens were derived from TCGA database. The receiver operating characteristic (ROC) curve was plotted to compare the sensitivity and specificity of key genes we obtained. The data from Oncomine datasets were utilized to verify the significant difference of identified key genes.
Identification of DEmiRs was performed using the miR-Seq expression profiles that were derived from TCGA (a total of 182 samples including 178 tumor samples and 4 normal specimens). The specific steps were the same as the screening of DEGs, as above described. TargetScan website was used to predict the targeted miRs of key genes.
Identification of potential small molecules The Connectivity Map (CMap, http://www.broadinstitute.org/cmap/) was employed to screen the potential small molecule drugs that were closely correlated with indicated disease [14]. The DEGs that we identified were contrasted with those genes related with the small molecule drugs treatment in CMap database to screen the potential small molecule drugs linked with DEGs. The range of connectivity value is from − 1 to 1: the positive value represents that the small drug can activate the PAAD cells, while negative value represents that these small molecules can be used to reverse the status of PAAD cells.

Screening of DEGs in PAAD
After integrated bioinformatics analyses for TCGA-PAAD and GTEx datasets, a total of 4777 protein-coding genes were found to be differentially expressed with the threshold of |logFC| > 1 and FDR < 0.05. The volcano plot of DEGs in PAAD was presented in Fig. 1.
Among these DEGs, 2536 DEGs were up-regulated and 2241 DEGs were down-regulated.
Weighted co-expression network establishment and key modules identification The 4777 DEGs and 176 PAAD samples with the data of pathological stage and grade were enrolled to construct the WGCNA. Then constructed phylogenetic tree and removed outliers, finally retained 162 samples. After assessing the quality of expression matrix of TCGA and GTEx, "WGCNA" package was used in R. The value of β = 9 was chosen to ensure the scale-free network. Next, we set MEDissThres as 5 to merge similar modules and finally identified a total of 12 modules (Fig. 2). Black module included 127 genes, blue module included 553 genes, brown module included 131 genes, green module included 215 genes, green-yellow module included 58 genes, grey module included 2142 genes, magenta module included 106 genes, pink module included 123 genes, purple module included 89 genes, red module included 177 genes, tan module included 42 genes, turquoise module included 562 genes, and yellow module included 249 genes. Genes contained in grey module were identified as not co-expressed and deleted in the further analyses.
Interaction relationship among modules and identification of pivotal modules The interaction relationship of 12 modules was analyzed using the network heatmap ( Fig. 3A). This analysis revealed that every module was independent of each other, which suggested that these 12 modules posed high-scale independence degree and significant independence of genes expression in every module. Afterwards, correlation analysis was implemented between each module and clinical features and the findings indicated that tan and red modules were significantly correlated with the Nude (N) stage of PAAD

Enrichment analyses of blue module
On the basis of the correlation between identified modules and clinicopathological characteristics, we performed functional enrichment analyses to investigate the biological functions of DEGs that were included in the above mentioned modules (Fig. 3B-G). The results of enrichment analyses revealed that many important biological processes and pathways were enriched in blue module. In Fig. 4A, Go analysis showed that DEGs of the blue module were mainly related with cell-substrate adhesion, cellular response to growth factor stimulus, epithelial to mesenchymal transition (EMT) and cell migration.
Furthermore, KEGG enrichment analysis disclosed that these DEGs were significantly associated with ECM-receptor interaction, PI3K/Akt signaling pathway, TGF-β signaling pathway, Hippo signaling pathway and so on (Fig. 4B).

Key genes identification and validation
Based on the results of KEGG enrichment analysis in blue module, we found that the most genes were enriched in PI3K/Akt signaling pathway (Fig. 4B). To identify the key genes, Kaplan-Meier method was performed to plot the survival curves of these genes for prognostic significance assessment and a total of three genes (ITGB1, ITGB5, and OSMR) were observed to be significantly associated with prognosis of PAAD patients. As shown in Similarly, ITGB1, ITGB5, and OSMR were also negatively associated with the disease free survival of PAAD patients ( Fig. 5D-F, P < 0.05). Furthermore, the expression levels of these three genes were remarkably increased in PAAD tissue samples in contrast to normal pancreas specimens (Fig. 5G-I, P < 0.0001). In addition, based on the TCGA database, ROC curves were plotted to examine the diagnostic effects of ITGB1, ITGB5, and OSMR. The AUC revealed that ITGB1, ITGB5, and OSMR played pivotal effects on the diagnosis of PAAD ( Fig. 6A-C, P < 0.0001). Afterwards, the Oncomine datasets were utilized to verify the expression levels of these three key genes and discovered that compared with the pancreas tissues, the PAAD tumor tissues exhibited high expression levels of ITGB1, ITGB5, and OSMR ( Fig. 6D-G, P < 0.05). Collectively, these observations showed that ITGB1, ITGB5, and OSMR could serve as important biomarkers for the prognosis and diagnosis of PAAD. miRs that could target the key genes To explore the potential molecular mechanism of these key genes, we firstly screened the DEmiRs in PAAD based on the TCGA database. After downloaded the miR-Seq expression profiles from TCGA database, the limma package of R software was applied to conduct difference analysis in accordance with the threshold (|logFC| > 1 and FDR < 0.05) and a total of 17 DEmiRs were obtained including 2 up-regulated miRs and 15 down-regulated miRs. Then, the miRs that have potential interactions with ITGB1, ITGB5, and OSMR were predicted using TargetScan website. The predicted miRs of these three key genes and DEmiRs that were decreased in PAAD were intersected, whose results indicated that ITGB1 and ITGB5 had a common regulatory miR, namely miR-16. Therefore, we speculated that miR-16, functioned as a crucial factor, might inhibit the progression of PAAD by mediating the ITGB1/ITGB5/PI3K/Akt signaling pathway. To further validate the hypothesis, we subsequently detected the expression level of miR-16 in PAAD tissues and found that miR-16 expression was significantly declined in PAAD tissue samples compared with normal controls (Fig. 7A, P < 0.0001). Figure 7B suggested that miR-16 was positively correlated with survival rates of PAAD patients (P < 0.05). In summary, these findings demonstrated that miR-16 might affect the development of PAAD via regulating key genes ITGB1/ITGB5.

Screening of related small molecule drugs
In order to screen out the candidate small molecule drugs of PAAD, CMap was implemented on the basis of the analysis of consistent differently expressed probesets between PAAD tumor samples and normal specimens. The 29 related small molecule drugs with highly significant correlations were filter by P < 0.01 and listed in Table 1. Among these small molecule drugs, thapsigargin, viomycin, adiphenine, and CP-690334-01 revealed higher obviously negative correlations and the potential to reverse the PAAD tumor status.

Discussion
PAAD is a highly aggressive tumor with dismal prognoses which is resulted by the lack of reliable biomarkers and therapeutic targets [1]. In our exploration, we conducted comprehensive bioinformatics analyses to identify key genes related with the progression of PAAD and finally found three key genes as well as several potential small molecule drugs to treat PAAD.
The three key genes OSMR, ITGB1, and ITGB5 play promoting effects on the progression of PAAD, and their high expression could induce poor prognoses in PAAD patients. These three genes have not been identified as crucial targets in PAAD. OSMR gene is located in chromosome 5p13.1 and encodes an important protein OSMRβ, which can be heterodimerized with IL-6 and then forming the type II OSMR [15]. It has been reported that type II OSMR was correlated with cell viability, differentiation, inflammatory responses and metastasis [16]. The association between the polymorphism of OSMR gene and bladder cancer was determined, which would affect the recurrence rate, overall survival and tumor grade [17]. Xu et al. have discovered that OSMR might be a prognostic biomarker in glioblastoma and overexpression of OSMR can regulate the immune response of glioblastoma [18,19]. Integrins are heterodimeric cell surface receptors composed of 18 α-subunits and 8 β-subunits, and modulate a variety of biological functions in tumorigenesis [20]. As previously described, ITGB1 is considered as a key adhesion receptor for ECM components and can contribute to cell proliferation, invasion and metastasis in various cancers [21,22]. High-regulation of ITGB1 can stimulate the proliferation, invasion and migration in colorectal cancer cells [23]. ITGB1 is crucial for fascin-mediated breast cancer stem cell function and disease progression [24]. Lin et al.
demonstrated that miR-185/ITGB5 played an essential role in the development of HCC through mediating β-catenin pathway [25]. In breast cancer cells, the tumorigenic ability could be facilitated due to the ITGB5-mediated EMT induced by transforming growth factor β [26]. Wortzel et al. found that ITGB5 was enriched in the exosomes of liver metastatic pancreatic carcinoma [27]. To further validate our results, we utilized ROC analysis and Oncomine datasets to detect their diagnostic significances and the expression levels in PAAD. All data showed that ITGB1, ITGB5, and OSMR had significant diagnostic potentials in PAAD, and all were remarkably increased in PAAD tissue samples compared with normal controls. These findings suggested that ITGB1, ITGB5, and OSMR might serve as important prognostic and diagnostic biomarkers in PAAD.
In order to explore how three key genes are regulated in PAAD, we used miR-Seq profiles from TCGA database to identify the DEmiRs, and a total of 2 up-regulated miRs and 15 down-regulated miRs were screened out. Then, integrated the above miRs with the upstream miRs of ITGB1, ITGB5, and OSMR predicted by TargetScan website, miR-16 was selected due to it is the common regulator of ITGB1 and ITGB5. Previous researches have indicated that miR-16 was involved in a variety of tumors, including cervical cancer [28], bladder cancer [29], and lung cancer [30]. Our results revealed that miR-16 was significantly decreased in PAAD tissues and patients with low miR-16 expression had poor outcomes. Therefore, we speculated that miR-16 might inhibit the invasion and metastasis of PAAD through regulating ITGB1/ITGB5-mediated PI3K-Akt pathway. Furthermore, miR-16 has been indicated as a potential diagnostic target involved in the development of PAAD [31]. Therefore, further investigation of miR-16/ITGB1/ITGB5 axis in PAAD may be of great importance.
CMap is an useful tool for identifying the correlation between small molecule drugs and diseases based on the gene expression profiles [32]. To identify small molecule drugs with potential roles to treat PAAD, we conducted CMap analysis and observed that thapsigargin, viomycin, adiphenine, and CP-690334-01 may have potential therapeutic efficacy against PAAD. Among these, thapsigargin and adiphenine have been proven to posses anti-cancer effects while the role of the other two small drugs has not been studied. Thapsigargin is considered as an endoplasmic-reticulum Ca(2+)-ATPase pump inhibitor and correlated with the activation of NF-kappaB translocation in pancreatic cancer [33]. Moreover, thapsigargin can directly induce ER stress and decline the stemness [34]. Adiphenine has been widely used in clinical practice of local anesthetics and it has been reported as an adjuvant small drug to against ovarian cancer [35].
Therefore, we speculated that these identified small molecule drugs might have potential to treat PAAD.
However, there are some limitations in our present study. First, the effects of identified key genes on the phenotype of PAAD are insufficient and the possible molecular mechanism also needs to be further verified in the future. Second, this was only a retrospective investigation based on the public available databases, further in vivo and in vitro experiments should be performed to confirm our current results.

Conclusion
In summary, this present study identified three key genes linked with prognosis and diagnosis of PAAD, and four candidate small molecule drugs that posed the potential to