An Integrative Data Mining for the Identification and Validation of Oncogenic Biomarkers in Pancreatic Carcinoma


 Background: Pancreatic carcinoma (PC) is a severe disease associated with high mortality. Although strategies for cancer therapy made great progress, outcomes of pancreatic ductal adenocarcinoma patients remain extremely poor. Therefore, it is urgent to find novel biomarkers and therapeutic targets to improve outcomes of patients. Methods: To identify general applicable targets for early diagnosis and therapy, we selected related microarray data which including three mRNA microarray datasets GSE62165, GSE15471, GSE32676 and two miRNA datasets GSE24279, GSE32678, and combinative analysis was performed by GEO2R. Functional and pathway enrichment analysis were performed using the DAVID database. MiRTarBase, miRWalk, Diana Tools and TBtools were used to get keys. TCGA database, HPA database and western blot experiments were used to verify diagnostic and prognostic value of key genes.Results: By integrating mRNA and miRNA expression profiles, we identified 114 differentially expressed genes (DEGs) and 114 differentially expressed miRNAs (DEMs), respectively. Furthermore, three overlapping key genes, RUNX2, LAMC2 and FBXO32, were found by compared with DEMs target genes and DEGs. In detail, deregulation of 8 key miRNAs were closely related to poor outcomes and participated in crucial genes regulation which may contribute to build a miRNA biomarker panel for prognosis. Moreover, we confirmed that RUNX2 showed a potential property for distinguishing PC and normal people. We also demonstrated that aberrant over-expression of LAMC2 was associated with poor prognosis of PC patients as well as human tumor status and subtypes. The protein levels of RUNX2 and LAMC2 in PC patients were further verified by IHC from Human Protein Atlas and western blot experiments. Conclusions: In summary, our current study identified that RUNX2 and LAMC2 may be promising targets for early diagnosis and therapy of PC patients.

In this study, we applied three mRNA pro ling datasets (GSE62165 [5] , GSE15471 [6] and GSE32676 [7] ) and two microRNA (miRNA) pro ling datasets (GSE24279 [8] and GSE32678 [7] ), which were downloaded from the Gene Expression Omnibus (GEO) database, to obtain DEGs and DEMs between PC and normal tissue samples. Subsequently, DEMs targeted genes (DEMTGs) were predicted, and by overlapping analysis between DEMTGs and DEGs to lter potential genes and miRNAs. We further clari ed differentially expression of key genes through The Cancer Genome Atlas (TCGA) database. Finally, the diagnostic and prognostic value of these key genes and miRNAs were evaluated and predicted by receiver operating characteristic (ROC) and survival analysis.
The GEO2R web tool was used to screen for DEGs and DEMs between the PC and normal tissue samples in each dataset. The log2FoldChange (logFC) was calculated and the P-values were adjusted to correct for the occurrence of false-positive results by using the default method. Then, P-value < 0.05 and |log2FC| > 1.5 as the cut-off criteria for signi cant DEGs, and P-value < 0.05 and |log2FC| > 0.5 for signi cant DEMs was settled. Subsequently, volcano plots of DEGs and DEMs were used to quickly identify differences and Venn analysis was performed to get overlapping up or down regulated DEGs/DEMs in all mRNA/miRNA datasets, respectively [9] . By adding common shared up regulated DEGs/DEMs with down regulated DEGs/DEMs, obtained datasets were named as tDEGs and tDEMs in current study.

Survival analysis
databases. The expression of key genes was analyzed in various tumor and non-tumor tissues using GEPIA, and the comparison of the key genes expression at different stages of PC was also performed. The clinical information of patients was also downloaded from TCGA for further data validation.

Data validation
Blood and saliva GEO data veri cation: We investigated a validation by comparing the key genes mRNA standardized values in ve independent GEO datasets: four of which were from blood samples (GSE74629 [16], GSE49641 [17], GSE49515 [18] and GSE15932), and the last one was from saliva samples (GSE14245 [19]).
Immunohistochemical (IHC) veri cation: The IHC staining images of PC and normal tissue were obtained from the Human Protein Atlas [20] (HPA, www.proteinatlas.org) to validate expression of key genes. The staining, intensity, quantity and location of IHC images present in HPA was calculated to each gene.
Western Blot (WB) veri cation: A total of 6 pancreatic tissue samples of PC patients were obtained by surgical resection and further divided into the tumor group and the adjacent non-cancerous group (normal). The study was approved by the Institutional Ethics Committee of Wenzhou Medical University and written informed consent was obtained from each patient before their enrollment. WB protocol was performed according to our previously described procedures [21]. The following primary antibodies were used: RUNX2 (ABclonal, cat. no. A2851), LAMC2 (ABclonal, cat. no. A1869) and β-Actin (Abmart, cat. no. P30002). The protein expression levels in samples were quanti ed by Image J software.

Statistical Analysis
The discriminative ability of key genes and miRNAs in the GEO datasets was calculated by ROC analysis, the pROC [22] R package was performed in R 3.6.2 (http://www.R-project.org/). The area under the curve (AUC), 95% CI and P-value for assessing the predictive accuracy and speci city of ROC were calculated using SPSS version 23.0 (IBM, Chicago, IL, USA). All scatter and bar plots were generated by Graph Pad Prism 7 Software (GraphPad Software, Inc.). Comparisons between means were performed by ANOVA. Pvalue < 0.05 was considered statistically signi cant.

Initial identi cation of tDEGs, tDEMs and tDEMTGs in pancreatic cancer
To nd key genes which expressed differentially between PC patients and healthy people,we decided to lter them out in two sides: mRNA and miRNA targeted method. Firstly, we selected three expression array pro ling datasets (GSE62165, GSE15471 and GSE32676) and two non-coding RNA array pro ling datasets (GSE24279 and GSE32678) from GEO database. As for mRNA, 1398, 607 and 1157 DEGs were extracted from GSE62165, GSE15471 and GSE32676. Among them, there were 997, 562, 669 (103 shared genes) up-regulated and 401, 45, 488 (11 shared genes) down-regulated DEGs were identi ed. Taking advantage of Venn analysis in up and down regulated DEGs respectively, we captured 114 total shared DEGs, which we termed as tDEGs in this study, where 103 were high-expression and 11 were lowexpression genes (Fig. 1A&B). As for miRNA, the same operations were also performed. 417 and 569 DEMs were extracted from GSE24279 and GSE32678. Two miRNA datasets totally shared 66 DEMs, which we termed as tDEMs, including 24 were up-regulated and 42 were down-regulated miRNAs ( Fig. 1C&D).
The target genes of tDEMs were predicted by miRTarBase, miRWalk and Diana Tools databases. And the 114 consistent expression genes, which termed as tDEMTGs, were found by Venn analysis (Fig. 1E).

GO and pathway enrichment analysis
The GO term and KEGG pathway analysis were performed via DAVID. Firstly, the results of GO term enrichment analysis varied a lot between tDEGs and tDEMTGs (Fig. 2A&B). Biological process (BP) analysis of GO showed tDEGs were signi cantly enriched in extracellular matrix associated part, such as cell adhesion, extracellular matrix organization and disassembly, collagen catabolism and organization. As for tDEMTGs, top 2 clusters of genes were enriched in the regulation of transcription and proliferation, which showed a similar result in tDEGs, but other genes were mainly responsible for the regulation of cell death associated part. For GO cell component (CC) analysis, the tDEGs were signi cantly enriched in extracellular part, such as extracellular region, extracellular exosome, extracellular space, extracellular matrix and proteinaceous extracellular matrix. However, the results of tDEMTGs were focused on plasma membrane part, cytosol, organelle lumen, membrane-enclosed lumen and intracellular organelle lumen.
The same differences also showed in molecular function (MF) analysis, tDEGs were mainly enriched in protein binding, but tDEMTGs were enriched in nucleotide binding, transcription regulator activity, ribonucleotide binding, purine ribonucleotide binding and purine nucleotide binding.
Additionally, the differences between tDEGs and tDEMTGs were also con rmed KEGG pathway analysis. The result disclosed that tDEGs were involved in PI3K-Akt signaling pathway, focal adhesion, ECMreceptor interaction, amoebiasis and pathways in cancer (Fig. 2C). However, tDEMTGs were signi cantly enriched in pathways in cancer and partial enriched in pancreatic cancer (Fig. 2D).

Identi cation of key genes and miRNAs
To identify potential key genes, tDEMTGs were aligned with tDEGs to obtain the intersection part, which we termed as key genesfor further analysis (Fig. 1F). Importantly, the result showed runt related transcription factor 2 (RUNX2), laminin subunit gamma-2 (LAMC2) and F-box protein 32 (FBXO32) were commonly shared and listed in Table 1. Subsequently, we screened candidate miRNAs involved in the regulation of key genes from 66 tDEMs, and found 16 up-regulated and 19 down-regulated key miRNAs (Table 2). Table 1 The differential expression of key genes in different GEO datasets.  To test the clinical applicability of the screening results, we explored the diagnostic signi cance of candidate miRNAs by ROC curve analysis (Fig. 3A). However, the mean AUC values of them were all lower than 0.85 (Fig. 3B). Then, to further evaluate the prognostic value, survival analysis was applied and the results showed 10 up-regulated key miRNAs indicated a signi cantly lower OS rate: hsa-miR-519c-3p, hsa-miR-1265, hsa-miR-1825, hsa-miR-1227-3p, hsa-miR-1233-3p, hsa-miR-1289, hsa-miR-621, hsa-miR-1248, hsa-miR-199b-5p and hsa-miR-588 (Fig. 4A). Correspondingly, three down-regulated candidate miRNAs were positively related with OS rate: hsa-miR-488-3p, hsa-miR-30b-3p and hsa-miR-628-3p (Fig. 4B). These data indicated that these 13 candidate miRNAs could be potential for clinical application as prognostic markers.

Diagnostic and prognostic value of key genes
We then compared the transcriptional levels of key genes in multiple cancers with those in normal samples by using TCGA database via GEPIA (Fig. 5A&B&C). The mRNA expression levels of RUNX2, LAMC2 and FBXO32 showed signi cantly differences between cancer and adjacent non-cancerous tissues especially in pancreatic adenocarcinoma (PAAD, Fig. 5.D). ROC and survival analysis were also performed to ensure the application prospect of key genes. All the three key genes indicated a good discriminating ability,of which the mean AUC values of each key gene in three GEO datasets were all greater than 0.85 (Fig. 5E). Furthermore, the mean AUC values of LAMC2 and FBXO32 both exceeded 0.95 which indicated that these two molecules might be promising biomarkers to PC diagnosis. The results of survival analysis showed only LAMC2 expression level affected OS rate signi cantly (HR = 3.06, P = 0.00011, Fig. 5F). It suggested that LAMC2 may be an important molecule which participate in the development of PC and also act as a potential prognostic biomarker.

Veri cation of key genes
In addition to high accuracy and speci city, the diagnostic methods should also be painless and woundless. Blood and saliva are easily obtained samples that we can get with little harm for the body.
Thus, we veri ed the diagnostic possibility of key genes in 4 blood cells GEO datasets (GSE74629, GSE49641, GSE49515 and GSE15932) and 1 saliva cells GEO dataset (GSE14245). Interestingly, we found only RUNX2 had signi cant reduction compared with normal controls in saliva and blood cells (GSE14245 and GSE49641) which suggested that RUNX2 could be considered as one of promising candidate biomarkers of PC. However, LAMC2 and FBXO32 were not signi cantly different between the control and PC samples.
Then, to further clarify the expression of RUNX2 and LAMC2 in PC tissue, we studied the IHC experiments of these two genes in HPA (Fig. 6B, Supplementary Fig. 1). First, the result of LAMC2 was consistent with previous analysis. The staining of anti-LAMC2 antibodies were all above medium in PC patients, which were hardly detected in normal pancreas tissues. Although staining quantity were varying (5.5/12 were > 75%, 2.5/12 were 75%-25% and 4/12 were < 25% stained), almost all of their staining intensity were strong. Besides, LAMC2 were visualized in cytoplasmic/membranous while RUNX2 were not corroborating with predicted. Though a slight up-regulation of RUNX2 could be found in some samples (17%), most of them which just like normal samples could not be detected. Consistently, immune blot was performed to detect the level of RUNX2 which showed a similar result with IHC (Fig. 6C). These data indicated that LAMC2 may play a crucial role in the PC therapy.
In a second validation study, we examined RUNX2 and LAMC2 performance in different clinical stage, sex, age, tumor subtype and personal tumor status groups (Fig. 7). We found RUNX2 was only signi cantly up-regulated in different stage groups. However, LAMC2 were signi cantly high-expressed not only in advanced stage PC patients, but also in ductal type PC group (P = 0.03) and PC patients (P = 0.02).
According to above evidences, we further found LAMC2 might be the main factor in in uencing median survival time of PC patients by combination analysis of RUNX2 and LAMC2 (Fig. 8).

Discussion
Recently, increased morbidity and mortality of PC were concerned as a severe challenge for human heathy. Despite great advantages in past years, the early diagnosis and e cient therapeutic strategies of PC are still huge problems owing to lacking of deep understanding about PC carcinogenesis. Currently, due to development of bioinformatics technology, it is much easier to identify the general genetic alterations in diseases which may shed light on determination of hub targets for clinical utility. Studies of miRNA biology have expanded considerably since rst discovery, and have been considered as attractive targets because of their crucial roles in modulation of gene expression under healthy, in amed and carcinogenic pathological state. Therefore, identifying general diagnostic and prognostic biomarkers in cancer is vital to analyze the changes of gene expression in combination with miRNAs.
In the present study, we initially screened out 114 tDEGs and 114 tDEMs from 3 and 2 GEO datasets respectively. By GO function and KEGG analysis, we obtained an in-depth understanding of these genes attached to PC initiation and progression. Further, intending to investigate the general pro les of molecular alterations in PC, we then identi ed 3 key genes and 35 important miRNAs by aligning tDEMTGs with tDEGs. We investigated diagnostic and prognostic value of them by ROC and survival tests. And we also con rmed protein expression with IHC and WB assays. According to the data of current study, we suggested that LAMC2 and RUNX2 could serve as potential candidates for the clinical application of PC in the future.

MicroRNAs (miRNAs) are a class of non-coding RNAs which can bond to 3'-untranslated region (3'-UTR)
of targeted mRNA to regulate their proteins expression levels [23] . Multiple studies have demonstrated that miRNAs participate in the management of all cancer hallmarks. It is commonly considered that miRNAs could be important molecular tools for noninvasive diagnosis and prognosis of cancer. Therefore, it is of great importance to screen most commonly suitable DEGs which could be used as treatment targets or diagnostic markers by discussing the interaction of miRNAs and mRNAs. In this study, we nally obtained 8 key miRNAs from candidate miRNAs by further screening through ROC and survival analysis, which enriched miRNA pro les of PC and may help to highlight the diagnostic and therapeutic potential of miRNAs cluster in PC (Fig. 9).
RUNX2 belongs to RUNX family, and is known as one of the major determinants of osteoblast differentiation and bone formation. Recent studies found RUNX2 was overexpressed in several tumor tissues and may play a vital role in tumor initiation, progression, invasion and metastasis [24][25][26][27][28] . Our results demonstrated that although overexpression of RUNX2 was related to malignant behaviours, it didn't signi cantly affect the survival time of the PC patients which might be a bit inconsistent with previous research. Intriguingly, we found that RUNX2 presented a high diagnostic ability in PC. Histological analysis is an invasive examination that is not suitable for the early diagnosis of PDAC, in terms of safety. Therefore, blood or body uid testing may provide a good alternative method to assist clinicians with diagnosis. Unlike high-expression of RUNX2 in pancreatic tissue, it was down-regulated in blood datasets, and its expression in peripheral blood mononuclear cell (PBMC, GSE49641) was signi cantly lower in PC group than in healthy control samples. Saliva samples (GSE14245) showed a similar result which suggesting its application possibility as a noninvasive diagnostic biomarker.
However, there were no signi cant differences detected in peripheral blood (GSE74629 and GSE15932). In addition to these, RUNX2 had been described that could regulate multiple carcinogenesis genes and molecules in PC, including VEGF (vascular endothelial growth factor), matrix metalloproteinases and survivin [29] . Combined with the results of transcription factors prediction, we believed that the role of RUNX2 in PC progression might concentrate more on regulating the target genes, such as LAMC2.
LAMC2 is a member of laminins family which are the main structural component of basement membranes and participate in a wide variety of biological processes including cell adhesion, differentiation, migration and metastasis [30,31] . In this study, results of the enrichment analysis indicate that LAMC2 are closely related with the process of cell adhesion, extracellular matrix organization/disassembly and positive regulation of cell proliferation. We found LAMC2 expression was elevated in PC tumor tissues through GEO database, IHC and WB analysis. Additionally, PC patients who harbored high expression of LAMC2 had a poor prognosis. According to these data, we hypothesized that LAMC2 expression may lead to an increased risk of tumor recurrence. Pancreatic ductal adenocarcinoma (PDAC) accounts for most human PC cases (more than 95%) [32] , we found that the expression of LAMC2 is mainly up-regulated in PDAC, which indicates that it is not only suitable for most PC patients, but it may also help to identify the subtypes of PC. Because carbohydrate antigen 19 − 9 (CA19-9) which acts as the most commonly used PC marker is not accurate for the diagnosis of PDAC [33] , combining with the good discriminating ability that LAMC2 showed in the AUC curve, we suggest that it may be a potential indicator in the auxiliary diagnosis of PC.

Conclusions
In summary, we con rmed that RUNX2 and LAMC2 are key genes that facilitate the progression of PC through bioinformatics and experimental analysis. Eight key miRNAs (miR-588, miR-199b, miR-1227, miR-628, miR-488, miR-1825, miR-519 and miR1265) which participated in the regulation of key genes expression might enrich the speci c miRNA pro les of PC. These ndings provide a new perspective on the underlying molecular mechanism of PC, suggesting that LAMC2 and RUNX2 may be valuable biomarkers and therapeutic targets for PC patients and may also offer powerful evidence and clues for the future genomic individualized treatment of PC.

Declarations
Ethics approval and consent to participate The study was approved by the Institutional Ethics Committee of Wenzhou Medical University and written informed consent was obtained from each patient before their enrollment.

Consent for publication
Not applicable.

Availability of data and material
The datasets used during the current study are available from the corresponding author on reasonable request.

Competing interests
The Authors declare that there is no con ict of interest.       The correlation of RUNX2 and LAMC2 expression with the clinico-pathological characteristics of patients.  Survival analysis of combined RUNX2 and LAMC2 in TCGA PAAD cohort.

Figure 9
The key genes and key miRNA interaction network.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. sm.docx