ROC Curves and AUC Analysis
The ROC curves demonstrated that hsa-miR-574-5p showed the highest differentiation between PDAC and control with an AUC 0.88 (Fig. 1a). Hsa-miR-608 had the second highest AUC of 0.81 for PDAC vs control but had the highest AUC (0.88) for differentiating pancreatitis vs control (Fig. 1b). These two miRNA formed group 1.
Up/Down Regulated DEM
The most significant down-regulated miRNA in PDAC vs control consist of hsa-miR-146b-3p, 27b, 100-3p, 487b, 28-3p, 320d, 192-3p, 181a-5p, and 532-5p formed group 2 (p<0.05).
The most significant up-regulated miRNA in PDAC vs control consisted of hsa-miR-1250, 608, 126-5p, 885-5p, 595, 302d, and 574-5p, and formed group 3 (p<0.05) (Fig. 2a, 2b).
Correlation Analysis
Pearson correlation coefficients showed the top two pairs of correlated miRNA to be hsa-miR-574-5p and hsa-miR-595; hsa-miR-532-5p, and hsa-miR-181a-5p (p<0.0005, correlation 0.6 and 0.56 respectively) (Fig. 3). These formed group 4.
MiRNET Interaction Network
MiRNET showed 4 DEM with the closest interactions based on target genes and downstream pathways: 181a and 126 and their most abundant mature forms, 181a-5p and 126-5p which formed group 5 (Fig. 4). The highest interactions were found between these DEM and their target genes (Supplementary File 3). The most significantly enriched pathway of these DEMs was the neurotrophin signaling pathway (Supplementary File 3).
Cytoscape MCODE Clusters
1542 target genes were achieved for the 23 DEM with a top 1% cutoff from the MiRDIP. Target gene protein-protein interaction network of these target genes from STRING was uploaded to Cytoscape MCODE plug-in which identified 3 clusters with the strongest interactions of all target genes (Fig. 5). Cluster 1 (33 nodes, 528 edges) main pathway was ubi-conjugation and ubiquitin pathway; cluster 2 (20 nodes, 160 edges) was mRNA splicing/processing/binding; cluster 3 (16 nodes, 120 edges) was mainly endocytosis. The shared miRNA associated with these clusters hub genes and the original 23 miRNA formed group 6 (hsa-miR-28-3p, hsa-miR-320b, hsa-miR-320c, hsa-miR-532-5p, hsa-miR-320d, hsa-miR-423-5p).
Decision Tree Model
A decision tree was trained and tested for PDAC and control was analyzed to obtain 5 most important parameters to form group 7 with AUC 0.92 (Fig. 6). A confusion Matrix and ROC curve was plotted using the same decision tree model (Fig. 7a, 7b).
For the group 7 of miRNA, the top 5 most important parameters in the decision tree were found as hsa-miR-574-5p, hsa-miR-126-5p, hsa-miR-1250-5p, hsa-miR-151-3p, hsa-miR-487b-3p.
Random Forest Training Dataset
Random Forest SMOTE model was used to extract top 5 important features of the original 22 DEM to form group 8. F1 is the harmonic mean of the model's precision and recall and is the most reliable predictor for imbalanced data. The most predictive group was the original 22 microRNA group (mean F1 0.992, mean recall 0.996, mean precision 0.988, mean AUC 1.000). The second most predictive group was the down-regulated group 2 (mean F1 0.983, mean recall 0.99, mean precision 0.977, mean AUC 0.998, with the top 5 most important features being 320d, 146b-3p, 100-3p, 487b-3p, and 27b-5p). The third most predictive group was the up-regulated group 3 (mean F1 0.983, mean recall 1.0, mean precision 0.967, mean AUC 0.998, with the top most important features being 574-5p, 595, 608, 126-5p, and 1250-5p) (Fig. 8a, 8b).
Random forest SMOTE showed F1 scores increased with the number of miRNA taken in the group and was highest for the original unfiltered group of 22 miRNA.
Validation Set
The fitted random forest SMOTE model from the training dataset was applied to predict pancreatic cancer in the combined validation dataset. The most predictive group remained the 22 original microRNA group (mean F1 0.976, mean recall 0.996, mean precision 0.958, mean AUC 0.999). The top 3 subsequent groups for the validation set included the MCODE group 6 (mean F1 0.968, mean recall 0.99, mean precision 0.947, mean AUC 0.995), the down-regulated group 2 (mean F1 0.962, mean recall 0.986, mean precision 0.939, mean AUC 0.99), and the random forest group 8 (mean F1 0.954, mean recall 0.977, mean precision 0.932, mean AUC 0.986) (Fig. 9a, 9b).
DISCUSSION
Group 1 included hsa-miR-574-5p and 608
Hsa-miR-574-5p, is known to be involved in fatty acid elongation, base excision repair, hippo signaling pathway, lysine degradation, purine metabolism, and viral carcinogenesis [15]. It is known to be involved in lung adenocarcinoma, small cell lung cancer, breast cancer, gastric cancer, and nasopharyngeal cancer [16, 17, 18, 19, 20, 21]. It is also involved in other inflammatory pathways including diabetes, asthma, and cardiac remodeling [22, 23, 24, 25]. It has not been found as a differentiating signature in pancreatic cancer previously.
Mir-608 was shown to promote apoptosis via BRD4 downregulation in PDAC [26]. It is also involved in metabolism of xenobiotics by cytochrome P450, transcriptional misregulation in cancer, and base excision repair [15]. It has a role in regulation of apoptosis in non-small cell lung cancer, as well as other multiple types of cancer [27, 28].
Group 2 included miR-146b-3p, 27b, 100-3p, 487b, 28-3p, 320d, 192-3p, 181a-5p, 532-5p
The most significant pathways for the downregulated group were steroid biosynthesis, hippo signaling pathway, ECM-receptor interaction, adherens junction, proteoglycans in cancer, lysine degradation, and viral carcinogenesis(p<0.005). These are also known to be involved in prostate cancer, colorectal cancer, endometrial cancer, and non-small cell lung cancer [15]. MiRNA 146b-3p is also involved in hepatocellular carcinoma, pancreatic cancer, and thyroid cancer [29, 30, 31]. 146b-3p induces apoptosis and blocks proliferation in pancreatic cancer stem cells by targeting the MAP3K10 gene.
MiRNA 100-3p is also known to be involved in esophageal and gastric cancers, vulvar carcinoma, and bladder cancer [32, 33, 34]. MiRNA 27b-5p is involved in oral cancer, ovarian carcinoma, and gastric cancer [35, 36, 37]. MiRNA 487b-3p is involved in colon cancer, osteosarcoma, and anaphylactic reactions [38, 39]. MiRNA 28-3p is involved in Alzheimer’s, nasopharyngeal cancer, gastric cancer, thyroid cancer, and esophageal squamous cell carcinoma [40, 41, 42]. MiRNA 532-5p is involved in breast cancer, glioma, gastric cancer, ovarian cancer, renal carcinoma, and ischemic stroke [27, 43, 44, 45, 46, 47]. MiRNA 320d is involved in hepatocellular carcinoma, aortic dissection, and diffuse large B-cell lymphoma [26, 48, 49]. 320d is most associated with colorectal cancer [50]. MiRNA 181a-5p is involved in atherosclerosis, bladder cancer, glioblastoma, prostate cancer, endometrial cancer, and breast cancer [51, 52, 53, 54, 55, 56]. MiRNA 192-3p is involved in renal disease and gastric cancer [57, 23].
Group 3 included miR-1250, 608, 126-5p, 885-5p, 595, 302d, and 574-5p
The most significant (p<0.005) pathways for the upregulated DEM were proteoglycans in cancer, hippo signaling pathway, lysine degradation, viral carcinogenesis, base excision repair, metabolism of xenobiotics by cytochrome P450, and transcriptional misregulation in cancer. They were also known to be involved in non-small cell lung cancer, colorectal cancer, chronic myeloid leukemia, and pancreatic cancer [15].
MiRNA 126-5p is involved in ovarian cancer, acute myocardial infarction/atherosclerosis, endometriosis, and cervical cancer [58, 59, 60, 61, 62]. 126-5p was noted to differentiate severe acute pancreatitis from mild acute pancreatitis [20]. MiRNA 302d-3p is involved in endometrial cancer, cervical squamous cell carcinoma, glaucoma, osteoarthritis, gastric cancer, and breast cancer [63, 64, 65, 66, 67, 68]. MiRNA 885-3p is involved in clear cell renal carcinoma and gastric cancer [69, 70]. MiRNA 1250-5p is a tumor suppressive miRNA, which is silenced by DNA methylation of AATK gene in non-Hodgkin’s lymphoma [71]. MiRNA 595 is involved in hepatocellular carcinoma, ovarian cancer, glioblastoma, and inflammatory bowel disease [72, 73, 74].
Group 4 included pairs hsa-miR-574-5p and 595; 532-5p, and 181a-5p
Group 4 had the most significant (p<0.001) pathways as hippo signaling pathway, lysine degradation, proteoglycans in cancer, viral carcinogenesis, and TGF-beta signaling pathway. These miRNA were also involved in glioma, endometrial cancer, colorectal cancer, non-small cell lung cancer, prostate cancer, thyroid cancer, and pancreatic cancer [15].
Group 5 included hsa-miR-181a, 126, 181a-5p, and 126-5p
The most significant (p<0.0001) shared pathways of this group were neurotrophin signaling pathway, proteoglycans in cancer, viral carcinogenesis, and signaling pathways regulating pluripotency of stem cells. They were also involved in glioma, endometrial cancer, non-small cell lung cancer, colorectal cancer, prostate cancer, pancreatic cancer, renal cell carcinoma, and chronic myeloid leukemia [15].
Group 6 included hsa-miR-28-3p, 320b, 320c, 532-5p, 320d, and 423-5p
For these group 6 miRNA, the most significant (p<0.006) pathways were fatty acid biosynthesis, adherens junction, hippo signaling pathway, proteoglycans in cancer, lysine degradation, viral carcinogenesis, and fatty acid metabolism. They were also associated with glioma, Huntington’s disease, pancreatic cancer, and non-small cell lung cancer [15].
MiRNA 320b is involved in COPD (chronic obstructive pulmonary cancer), osteosarcoma, glioma, and atherosclerosis [75, 76, 77, 78]. 320b suppresses pancreatic cancer cell proliferation by targeting the FOXM1 gene [79]. MiRNA 320c is involved in pulmonary disease/asthma, cervical cancer, breast cancer, bladder cancer, colorectal cancer, myelodysplasia, and osteoarthritis [80, 26, 81, 82, 83, 84, 85]. 320c regulates the resistance to gemcitabine through SMARCC1 [86]. MiRNA 423-5p is involved in osteosarcoma, prostate cancer, glioblastoma, ovarian cancer, thyroid cancer, colorectal cancer, pulmonary tuberculosis, and many other cancers [87, 88, 89, 90, 91, 92, 93].
Group 7 included hsa-miR-574-5p, 126-5p, 1250-5p, 151-3p, and 487b-3p
For the group 7 of miRNA found as the top 5 most important parameters in the decision tree, the most significant (p<0.009) pathways were proteoglycans in cancer, viral carcinogenesis, biosynthesis of unsaturated fatty acids, and hippo signaling pathway. These miRNA have also been associated with non-small cell lung cancer, colorectal cancer, glioma, endometrial cancer, renal cell carcinoma, chronic myeloid leukemia, and pancreatic cancer [15].
MiRNA 151-3p is also involved in breast cancer, osteosarcoma, myocardial infarction, cholangiocarcinoma, nasopharyngeal carcinoma, and gastric cancer [94, 95, 96, 97, 98].
Group 8 included hsa-miR-574-5p, 608, 1250-5p, 595, and 320d
For group 8, the most significant (p<0.05) pathways were hippo signaling pathway, base excision repair, transcriptional misregulation in cancer, metabolism of xenobiotics by cytochrome P450, TGF-beta signaling pathway, and adherens junction.
A validation dataset that had only pancreatic cancer and control but not pancreatitis, was chosen for two reasons: 1. There was no other publicly available data containing pancreatic cancer, pancreatitis, and control that hadn’t already been used for training (which was already a small dataset); 2. To evaluate the consistency of the best performing mRNA group from the training data when applied to a dataset to differentiate pancreatic cancer and control without the benefit of pancreatitis data. There would be no data to confirm if predicted pancreatic cancer evolved from an earlier episode of pancreatitis in this validation dataset. However, the f1 scores could point to validity of the chosen miRNA group in predicting pancreatic cancer in patients with either no history of pancreatitis or with history of un-recalled or sub-clinical pancreatitis in the past, purely based on a very strong correlation of this miRNA group with PDAC (90% of pancreatic cancer).
Animal model studies have revealed that pancreatic cancer cells metastasize to the liver before the primary site of origin is even detected. This rapid tumor progression is thought to be secondary to epithelial to mesenchymal transition (EMT). The most common signaling pathways affected in pancreatic cancer are the TGF-beta signaling pathway in EMT, wnt/beta-catenin signaling pathway, notch signaling pathway, snail transcription factors, zeb transcription factors, and basic helix loop helix transcription factors (bHLH) [99, 100].
All of the miRNA panels showed good performance with AUC>0.92 and F1 scores >0.85. Almost all of the microRNA panel groups included in the study involved nearly all of the known established pathways in pancreatic cancer. These included Hippo signaling, proteoglycans in cancer, neurotrophin signaling pathways, lysine degradation, TGF-beta signaling, viral carcinogenesis, fatty acid biosynthesis and metabolism, adherens junction, and ECM-receptor interaction (KEGG fig pathway).
The hippo signaling pathway in pancreatic cancer is executed by two major proteins, YES-associated protein (YAP) and transcriptional coactivator with PDZ-binding motif (TAZ). These promote a strong stromal reaction in the pancreatic tumor microenvironment (TME), even in the absence of KRAS [101]. Proteoglycans are involved in the P13K-Akt signaling pathway, MAPK, Wnt signaling pathways, focal adhesion, VEGF and TGF-beta signaling pathways [102].
Perineural invasion, although present in several tumors, has the highest prevalence in PDAC, ranging from 70-90%, including in early-stage and microscopic PDAC, suggesting that it could represent an early event in tumor progression. Neurotrophins are growth factors which increase growth, proliferation, and nerve-cancer affinity in perineural invasion [103]. Neurotrophins affect downstream pathways such as MAPK signaling pathway, ubiquitin mediated proteolysis, and apoptosis [104].
Ubiquitination and acetylation are common lysine modifications. Ubiquitination was a common pathway in cluster 1 target hub genes in this study. Downstream associated pathways include cell-cell adhesion, nucleoplasm, and RNA binding. Lysine modification related mutations are associated with worse survival [105].
Helicobacter pylori and hepatitis viruses have been linked to pancreatic cancer, possibly through inflammatory signaling pathways including proinflammatory cytokines, Toll-like receptor (TLR)/MyD88 (myeloid differentiation primary response gene 88) pathway, nuclear factor-kappa B (NF-κB), up-regulating transcription factors involved in EMT regulation [44]. Pathways involved in hepatitis viral carcinogenesis include MAPK, P13K-Akt, Jak-STAT, p53, NF-kappaB, and apoptosis [106].
Group 6 had a prominent role in fatty acid synthesis and metabolism. Many enzymes involved in cholesterol synthesis are up-regulated in pancreatic cancer [107]. Fatty acid metabolism is regulated by oncogenic signal transduction pathways, such as P13K-Akt-mTOR signaling. Fatty acids also participate in remodeling the tumor microenvironment [108].
Adhesion pathways and ECM interactions may play a role in the evolution of pancreatitis to pancreatic ductal cancer. Loosening of cell-cell adhesion between pancreatic cells disrupts structure and promotes permeability, inflammatory cell migration, and interstitial edema. Oxidative stress in pancreatitis leads to up-regulation of adhesion molecules, such as P-selectin and ICAM-1. These are thought to play a role in the pathological features of acute and chronic pancreatitis, which include inflammatory cell infiltration, stroma formation, and fibrosis. At adherens junctions, tyrosine phosphorylation, of the cadherin-catenin complex, regulates cell contacts. Upregulation of E-cadherin, an adhesion protein, is associated with promotion of the repair of cell-cell-adhesions and protective response [109]. However, E-Cadherin down-regulation is a critical component of EMT, such that it has even been considered as a marker for EMT [110]. Adherens junction is also involved in Wnt, MAPK, and TGF-beta signaling pathways [111].
Stromal cell-derived ECM (Extracellular Matrix) proteins were found to be non-specific, but tumor-cell derived ECM proteins were correlated with poor prognosis. Incidence of PanIN (Pancreatic Intraepithelial Neoplasia) increases to 60% in pancreatitis. Collagens were the most important group of proteins in PDAC progression and pancreatitis. Stromal matrix changes in pancreatitis are a subset of the changes in PDAC, however, PDAC, compared with PanIN and pancreatitis, up-regulates the largest portion of matrisome proteins, thus representing the most fibrotic state. Wnt (Wingless-Related Integration Site) proteins may be active in progression of PanIN to PDAC, but not relevant in pancreatitis. Proteoglycans and focal adhesion are involved in ECM receptor interaction [112].
It is significant to note that the best performance in both the training and validation sets was garnered by the panel that had the highest no. of included miR, which was the original set of miR (n=22). The second best performance in the validation set was by group 6, which had one of the higher no. of miR (n=6). The third best performance in the validation set was by the down-regulated Group 2 (n=9), which was also the group with the second best performance in the training dataset. A larger group of miRNA may have greater predictive ability secondary to diversity of signaling pathways included. Similarly, although 574-5p came up often in many groups and was the most important feature in the decision tree model (group 7) and the random forest model (group 8), it was likely less specific than a combination of grouped miRNA which likely represent diverse pathways in multifactorial pancreatic cancer.
Previous studies found miRNA biomarker panels in plasma such as miR-18a [113], 16, 196a, CA19-9 [7], 22, 642b, 885 [114]. Serum miRNA panels included 20a, 21, 24, 25, 99a, 185, 191, 1290 [115], 125a, 4294, 4476, 4530, 6075, 6799, 6836, 6880 [116], 373 [117], 133a [118], 663a, 642b, 5100, 8073 [119], 1290, 1246, CA19-19 [120], 16, 18a, 20a, 24, 25, 27a, 29c, 30a-5p, 191, 323-3p, 345 and 483-5p [121]. Blood miRNA panels included 26b, 34a, 122, 126*, 145, 150, 223, 505, 636, 885-5p [122]. Of the above, serum panels of 20a, 21, 24, 25, 99a, 185, 191, and plasma panel of 16, 196a, and CA19-9 also differentiated from chronic pancreatitis and PDAC. 181d differentiated from Auto-immune pancreatitis and PDAC [123].
The lack of any significant miRNA shared between the different studies and our study is likely due to different sample preparation protocols and detection methodologies as well as the type of sample itself (plasma vs. serum vs. whole blood) [119]. The functional pathways associated with the microRNA panels of the prior studies did have much in common with the pathways elucidated in this study. However, when compared to this study, the difference is at least partly intentional due to extraction of microRNA that were common to both pancreatitis and PDAC in the current study, as opposed to previous studies that tried to differentiate the miRNA groups for pancreatitis and PDAC. Despite this, the original 22 group and the down-regulated group 2 demonstrate good prediction of PDAC in datasets that both include and exclude information regarding pancreatitis origin of pancreatic cancer.
There were some limitations in the study. Imbalanced datasets (35% PDAC in training dataset, 8% pancreatic cancer in validation set, with the rest being controls) were addressed with a model designed to oversample the minority label and cross validated. Smaller training dataset (n=254) may have affected results. The training dataset had pancreatic ductal adenocarcinoma, while the larger validation dataset combining 6 GSE datasets had pancreatic cancer. Since PDAC constitutes over 90% of pancreatic cancer, the discrepancy is limited but present. Many publicly available databases and studies including TCGA (The Cancer Genome Atlas) also include exclusively “pancreatic cancer” (with no specification if this constituted PDAC or non-PDAC). Given different molecular pathways and far worse prognosis of PDAC compared to other less common pancreatic cancers such as neuroendocrine tumors, the inclusion under a common umbrella of pancreatic cancer may skew data analytic results [124].