DOI: https://doi.org/10.21203/rs.3.rs-2603098/v1
Purpose
Pancreatitis is one of the most important risk factors for pancreatic ductal adenocarcinoma (PDAC). PDAC is a silent, aggressive malignancy with a 5-year survival rate of less than 5%. Detection at an early stage and resection of PDAC significantly improve survival. A differentially expressed microRNA panel was sought that could predict the risk of progression to PDAC from pancreatitis.
Methods
Differentially expressed microRNA (DEM) in serum that were common between pancreatitis and PDAC were extracted from two microarray GSE datasets containing pancreatitis, PDAC, and control samples. Eight groups of DEM were derived from multiple bioinformatics methods such as differential expression, miRNA interaction networks, target gene prediction tools, functional enrichment analysis, and machine learning models. The functionally enriched pathways of these groups were identified.
These groups were trained on the original datasets and were used to predict pancreatic cancer in a validation set consisting of six other GSE datasets containing pancreatic cancer and controls. The miRNA panel with the highest precision and recall was the group derived from the target hub genes with the highest interaction (hsa-miR-28-3p, 320b, 320c, 320d, 532-5p, and 423-5p, with a mean F1 of 0.968, mean recall of 0.99, mean precision of 0.947, and mean AUC of 0.995).
These results provide a potential biomarker to identify and follow individuals at high risk for pancreatic cancer after pancreatitis.
Pancreatic Cancer and Pancreatitis
Pancreatic cancer is the 3rd most common cause of cancer related deaths and is projected to become the 2nd leading cause of cancer death by 2030 even as it comprises only 3.2% of all cancer cases [1]. Pancreatic ductal adenocarcinoma (PDAC) comprises 90-95% of all pancreatic cancer [2]. Five year survival for pancreatic ductal adenocarcinoma remains below 5%, with 80% of patients surgically unresectable at the time of presentation. The survival for surgically resectable pancreatic cancer is 17.4% at five years [3, 4]. The most important predictor of survival in pancreatic cancer is resection of early stage cancer [5]. Currently, screening for early detection of pancreatic cancer via annual MRI or endoscopic ultrasound (EUS) is recommended only in the approximately 10% of individuals with hereditary or genetic syndromes [6, 7]. Risk factors include smoking, aging, diabetes, obesity, alcohol, pancreatitis, and genetic factors [7].
Per 100,000 people in the general population, the yearly global incidence of acute pancreatitis is 34 cases, and chronic pancreatitis is 10 cases. The global transition rate from the first episode of acute pancreatitis to a recurrent episode is ~20% and, from recurrent acute pancreatitis to chronic pancreatitis, the rate is ~35% [8]. Pancreatic cancer risk increases 20 times during the first two years after acute pancreatitis (inflammation of the pancreas), and remains double that of the general population after five years [9]. There is an increasing prevalence of pancreatitis and associated years lived with disability [10]. Acute pancreatitis may be the first manifestation of chronic pancreatitis, especially in the setting of persistent triggers such as alcohol. Chronic pancreatitis has a 15-16 fold higher risk of developing pancreatic cancer over the general population [2].
Pancreatic Cancer Blood Biomarkers
KRAS (Kirsten rat sarcoma virus), p16, TP53 (Tumor protein p53), and SMAD4 (Mothers against decapentaplegic homolog 4) gene abnormalities are found in most PDAC, although they are non-specific and are involved in multiple other cancers [4]. An optimal biomarker would need to be sensitive, reasonably specific, and easily accessible, such as through blood. Currently, CA19-9 is the only clinically used blood biomarker. Due to limited sensitivity and specificity, it is only used to detect recurrence of pancreatic adenocarcinoma. Blood biomarkers that have been studied for early diagnosis of pancreatic cancer include CA19-9, peptide panels, tumor-associated autoantibodies, and microRNAs [7].
MicroRNA are single-stranded non-coding RNA that are involved in RNA silencing and regulation of gene expression. MicroRNA was chosen as a potential biomarker for this study due to the ease of detection of relatively small numbers of molecules, and its stability compared to mRNA [11]. High-throughput analyses such as DNA microarray and next-generation sequencing allow access to all of the microRNA in the sample. MicroRNA was the predominant type of blood biomarker available for pancreatitis and PDAC in available public datasets. A few different microRNA panels have also been validated as blood biomarkers for pancreatic cancer in previous studies [7]. Many of the prior biomarker studies aimed to differentiate pancreatic cancer precursors (pancreatic intraepithelial neoplasia (PanIN), intraductal papillary mucinous neoplasm, or mucinous cystic tumors) and pancreatitis from pancreatic cancer, and some found specific panels that differentiated chronic pancreatitis from pancreatic cancer [7]. However, there have been no studies on common biomarkers in pancreatitis and PDAC that may predict evolution from the former to the latter.
This study aims to identify, compare, and extract a differentially expressed microRNA (DEM) panel in serum that could predict the risk of progression to PDAC from pancreatitis. If a high risk of PDAC could be predicted early in patients who have had pancreatitis, by identifying specific microRNA that tend to be common to pancreatitis and PDAC, these patients could then undergo annual MRI screening to detect early stage cancer, given that resection of early stage cancer carries the best prognosis. Downstream and upstream target pathways could also be targets for the development of therapeutics. DEM panels were obtained from up-regulated and down-regulated miRNA, ROC curves with AUC and Pearson correlation analysis, miRNET interaction analysis, Cytoscape MCODE clusters, and machine learning models such as decision tree and random forest. The DEM panel with the highest precision and recall was obtained from testing on a separate, larger validation set.
MicroRNA Expression Datasets and DEM Extraction
NCBI GEO microarray datasets GSE31568 and GSE61741 containing pancreatitis, control, and PDAC samples for peripheral blood microRNA were chosen using the keywords 'pancreatic', 'serum', and 'homo sapiens' and the non-coding RNA profiling by array filter. The differentially expressed miRNA (DEM) of pancreatitis vs control and of PDAC vs control for each GEO dataset were obtained from GEO2R. The common differentially expressed microRNA (n=23) of the two GSE datasets were extracted through a Venn diagram [12]. Expression values for these 23 miRNA in each dataset were combined through the GEOquery R package. There were 90 PDAC, 75 pancreatitis, and 164 control samples.
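The Venn diagram step amounts to a set intersection. The sketch below is a minimal Python illustration, assuming that "common" means present in every comparison list of both datasets; the miRNA names and list contents are hypothetical placeholders, not the GEO2R output.

```python
# Toy DEM lists (hypothetical contents); the real lists come from GEO2R.
dem_lists = {
    ("GSE31568", "pancreatitis_vs_control"): {"miR-A", "miR-B", "miR-C", "miR-D"},
    ("GSE31568", "PDAC_vs_control"): {"miR-A", "miR-B", "miR-C", "miR-E"},
    ("GSE61741", "pancreatitis_vs_control"): {"miR-A", "miR-B", "miR-D", "miR-F"},
    ("GSE61741", "PDAC_vs_control"): {"miR-A", "miR-B", "miR-E", "miR-F"},
}


def common_dem(lists):
    """Return the miRNA found in every comparison (the centre of the Venn diagram)."""
    return set.intersection(*(set(names) for names in lists.values()))


common = sorted(common_dem(dem_lists))
print(common)  # ['miR-A', 'miR-B']
```

Only miRNA that survive all four comparisons are retained, which is why the shared panel is much smaller than any single GEO2R list.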
ROC Curves and AUC Analysis
Expression values of the total DEM were normalized and log-transformed through limma R package. ROC curves and AUC were used to determine the ability of each DEM to differentiate pancreatitis vs control and PDAC vs control. The top miRNA with AUC >0.8 formed group 1.
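As an illustration of the per-miRNA AUC filter, the following Python sketch computes the AUC from its rank interpretation (the probability that a randomly chosen case has a higher value than a randomly chosen control). It is not the code used in the study: the expression values are toy numbers, and treating a down-regulated marker as equally informative by taking max(AUC, 1 - AUC) is an assumption made here.

```python
def auc_score(case_values, control_values):
    """AUC = P(case > control) + 0.5 * P(case == control) over all case/control pairs."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def select_by_auc(expression, labels, threshold=0.8):
    """expression: {miRNA: [value per sample]}; labels: 1 for case, 0 for control."""
    selected = {}
    for mirna, values in expression.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        auc = auc_score(cases, controls)
        # Assumption: direction does not matter, so a marker that is lower
        # in cases is scored by 1 - AUC.
        auc = max(auc, 1.0 - auc)
        if auc > threshold:
            selected[mirna] = auc
    return selected


# Toy data (hypothetical): three cases followed by three controls.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "miR-A": [5, 6, 7, 1, 2, 3],  # higher in cases
    "miR-B": [1, 5, 2, 4, 3, 6],  # weak separation
    "miR-C": [1, 2, 3, 7, 8, 9],  # lower in cases
}
selected = select_by_auc(expression, labels)
print(sorted(selected))  # ['miR-A', 'miR-C']
```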
Up and Down-Regulated DEM
The significantly down-regulated and up-regulated DEM in the total dataset for PDAC vs control were analyzed through edgeR in R to form groups 2 and 3 respectively. The edgeR package uses TMM (Trimmed Mean of M-Values) normalization, a negative binomial distribution for the read counts, and an exact test for differential expression. These were plotted with the statmod library. The expression values of the up/down-regulated DEM were then used to execute hierarchical clustering with the method parameter set to ‘complete’. The result was then visualized as a heatmap through the gplot package.
Correlation Analysis
Using R, corrplot, and the RColorBrewer package, Pearson correlation coefficients were obtained and visualized as a Corrplot correlation map. The top 4 most correlated miRNA formed group 4.
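The pairwise ranking behind the correlation map can be sketched in Python as follows. This is an illustrative sketch on toy values, not the R/corrplot workflow described above; ranking pairs by the absolute value of the coefficient is an assumption.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_correlated_pairs(expression, k=2):
    """Return the k miRNA pairs with the largest |r| as (name_a, name_b, r)."""
    scored = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        scored.append((abs(r), a, b, r))
    scored.sort(reverse=True)
    return [(a, b, r) for _, a, b, r in scored[:k]]


# Toy expression values (hypothetical).
expression = {
    "miR-A": [1, 2, 3, 4],
    "miR-B": [2, 4, 6, 8],
    "miR-C": [4, 3, 2, 1.5],
}
top = top_correlated_pairs(expression, k=1)
print(top[0][0], top[0][1])  # miR-A miR-B
```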
MiRNET miR Interaction Network
MiRNET links miRNA to their targets and other correlated molecules. Correlated DEMs and their target genes, as well as their functional annotations, were obtained in the MiRNET miRNA interaction tool using the hypergeometric algorithm and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, with a degree cut-off of 2. The closest miRNA formed group 5.
MiRDIP Target Prediction
MiRDIP is a microRNA data integration portal which supplies numerous miRNA target predictions. Predicted target gene lists of each DEM were acquired based on an integrative score of confidence (Supplementary File 1).
STRING Interaction Network, Cytoscape MCODE, and Functional Enrichment Analysis
Target gene interaction networks were predicted with the STRING database, with the confidence interaction score set to greater than 0.7 [13]. The protein-protein interaction networks were uploaded to Cytoscape [14]. The top network modules were selected by MCODE (Molecular Complex Detection) plug-in in Cytoscape. The degree cutoff was set to 2, the node score cutoff as 0.2, k-core as 2, and maximum depth as 100. The average degree of the MCODE score and nodes were chosen as the cutoff score, with >4 and >12 used for MCODE scores and hub nodes respectively. Functional enrichment analysis was then performed using DAVID functional annotation tool for all target genes and top modules (Supplementary File 2). Reverse MiRDIP was used to find the miRNA associated with the top module target genes. Any shared miRNA between these and the original 23 DEM formed group 6.
Machine Learning Analysis
The expression values of the 23 DEM were processed using min-max normalization with the pandas and NumPy packages in Python 3.7. MiRNA 720 was removed from the 23 common miRNA, as Schopman et al. showed that the sequence annotated as miR-720 is likely to be a fragment of a tRNA (RNA Biol. 2010 7:573-576), leaving 22 DEM. Using the sklearn package in Python, a decision tree model (max depth = 10) was trained and tested on PDAC and control samples. ROC curves were plotted and AUC scores were determined based on this model. A confusion matrix was visualized using pyplot from matplotlib. The top 5 most important features of the decision tree were extracted to form group 7. All the groups were analyzed through mirpath for functional pathway involvement. A Random Forest SMOTE model with RepeatedStratifiedKFold (n_splits = 10, n_repeats = 3) was used to train on the imbalanced data. SMOTE oversamples the minority label of an imbalanced dataset. Evaluation metrics, including mean F1 score, mean precision, mean recall, and AUC, were obtained. The features (DEM) were ranked by importance based on this SMOTE model.
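To make the two preprocessing ideas concrete, the sketch below shows min-max scaling and the core interpolation step of SMOTE in plain Python. It is a simplified illustration under stated assumptions, not the study's pipeline: the function names, the toy expression vectors, and the choice of k = 3 neighbours are all hypothetical, and a real analysis would apply the oversampling only inside each training fold.

```python
import random


def min_max(column):
    """Scale a list of values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the other minority samples by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy, already scaled expression vectors (hypothetical values).
pdac = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.15], [0.85, 0.05]]
control = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.7], [0.05, 0.85],
           [0.3, 0.9], [0.25, 0.75], [0.1, 0.6], [0.2, 0.95]]

new_pdac = smote_oversample(pdac, n_new=len(control) - len(pdac))
balanced_pdac = pdac + new_pdac
print(len(balanced_pdac), len(control))  # 8 8
```

Because each synthetic point lies on a segment between two real minority samples, the minority class is enlarged without simply duplicating rows.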
Validation
Six datasets containing pancreatic cancer and control samples (4360 controls and 360 pancreatic cancer) were combined and processed with GEO2R and the GEOquery and limma packages in RStudio. The fitted SMOTE random forest model from the training data was used to predict pancreatic cancer in this validation set, with evaluation metrics similar to those for the training dataset. Of note, miRNA 885-3p and 320d-1, which were part of the original 22 DEMs, were only available as the precursor 885-5p and 320d in the validation dataset. 5p indicates the microRNA from the 5 prime arm of the hairpin and 3p indicates the microRNA from the 3 prime arm.
ROC Curves and AUC Analysis
The ROC curves demonstrated that hsa-miR-574-5p showed the highest differentiation between PDAC and control with an AUC 0.88 (Fig. 1a). Hsa-miR-608 had the second highest AUC of 0.81 for PDAC vs control but had the highest AUC (0.88) for differentiating pancreatitis vs control (Fig. 1b). These two miRNA formed group 1.
Up/Down Regulated DEM
The most significantly down-regulated miRNA in PDAC vs control consisted of hsa-miR-146b-3p, 27b, 100-3p, 487b, 28-3p, 320d, 192-3p, 181a-5p, and 532-5p, and formed group 2 (p<0.05).
The most significant up-regulated miRNA in PDAC vs control consisted of hsa-miR-1250, 608, 126-5p, 885-5p, 595, 302d, and 574-5p, and formed group 3 (p<0.05) (Fig. 2a, 2b).
Correlation Analysis
Pearson correlation coefficients showed the top two pairs of correlated miRNA to be hsa-miR-574-5p and hsa-miR-595, and hsa-miR-532-5p and hsa-miR-181a-5p (p<0.0005, correlation 0.6 and 0.56 respectively) (Fig. 3). These formed group 4.
MiRNET Interaction Network
MiRNET showed 4 DEM with the closest interactions based on target genes and downstream pathways: 181a and 126 and their most abundant mature forms, 181a-5p and 126-5p which formed group 5 (Fig. 4). The highest interactions were found between these DEM and their target genes (Supplementary File 3). The most significantly enriched pathway of these DEMs was the neurotrophin signaling pathway (Supplementary File 3).
Cytoscape MCODE Clusters
A total of 1542 target genes were obtained for the 23 DEM with a top 1% cutoff in MiRDIP. The protein-protein interaction network of these target genes from STRING was uploaded to the Cytoscape MCODE plug-in, which identified the 3 clusters with the strongest interactions among all target genes (Fig. 5). The main pathway of cluster 1 (33 nodes, 528 edges) was ubi-conjugation and the ubiquitin pathway; that of cluster 2 (20 nodes, 160 edges) was mRNA splicing/processing/binding; and that of cluster 3 (16 nodes, 120 edges) was mainly endocytosis. The miRNA shared between those associated with these clusters' hub genes and the original 23 miRNA formed group 6 (hsa-miR-28-3p, hsa-miR-320b, hsa-miR-320c, hsa-miR-532-5p, hsa-miR-320d, hsa-miR-423-5p).
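The reported node and edge counts allow a quick check of how tightly connected each cluster is. The Python sketch below computes plain graph density, assuming simple undirected networks without self-loops; this is an illustrative calculation and not necessarily the exact score that MCODE reports.

```python
def density(nodes, edges):
    """Fraction of all possible undirected edges (no self-loops) that are present."""
    possible = nodes * (nodes - 1) / 2
    return edges / possible


# Node and edge counts as reported for the three MCODE clusters.
clusters = {"cluster 1": (33, 528), "cluster 2": (20, 160), "cluster 3": (16, 120)}

for name, (n, e) in clusters.items():
    print(name, round(density(n, e), 3))
# cluster 1 1.0
# cluster 2 0.842
# cluster 3 1.0
```

Under these assumptions, clusters 1 and 3 are fully connected (every gene interacts with every other gene in the cluster), while cluster 2 contains about 84% of its possible edges.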
Decision Tree Model
A decision tree was trained and tested on PDAC and control samples to obtain the 5 most important parameters, which formed group 7 with AUC 0.92 (Fig. 6). A confusion matrix and ROC curve were plotted using the same decision tree model (Fig. 7a, 7b).
For the group 7 of miRNA, the top 5 most important parameters in the decision tree were found as hsa-miR-574-5p, hsa-miR-126-5p, hsa-miR-1250-5p, hsa-miR-151-3p, hsa-miR-487b-3p.
Random Forest Training Dataset
Random Forest SMOTE model was used to extract top 5 important features of the original 22 DEM to form group 8. F1 is the harmonic mean of the model's precision and recall and is the most reliable predictor for imbalanced data. The most predictive group was the original 22 microRNA group (mean F1 0.992, mean recall 0.996, mean precision 0.988, mean AUC 1.000). The second most predictive group was the down-regulated group 2 (mean F1 0.983, mean recall 0.99, mean precision 0.977, mean AUC 0.998, with the top 5 most important features being 320d, 146b-3p, 100-3p, 487b-3p, and 27b-5p). The third most predictive group was the up-regulated group 3 (mean F1 0.983, mean recall 1.0, mean precision 0.967, mean AUC 0.998, with the top most important features being 574-5p, 595, 608, 126-5p, and 1250-5p) (Fig. 8a, 8b).
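The relationship between the reported metrics can be checked with the harmonic-mean formula F1 = 2PR / (P + R). The Python sketch below applies it to the mean precision and recall values given above. Note the assumption: a mean of per-fold F1 scores need not equal the F1 of the mean precision and recall, so this is a consistency check rather than a reproduction of the analysis.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (mean precision, mean recall) as reported for the training dataset.
reported = {
    "original 22 miRNA": (0.988, 0.996),
    "group 2 (down-regulated)": (0.977, 0.99),
    "group 3 (up-regulated)": (0.967, 1.0),
}

for name, (p, r) in reported.items():
    print(name, round(f1(p, r), 3))
# original 22 miRNA 0.992
# group 2 (down-regulated) 0.983
# group 3 (up-regulated) 0.983
```

In all three cases the value agrees with the reported mean F1 to three decimal places.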
Random forest SMOTE showed F1 scores increased with the number of miRNA taken in the group and was highest for the original unfiltered group of 22 miRNA.
Validation Set
The fitted random forest SMOTE model from the training dataset was applied to predict pancreatic cancer in the combined validation dataset. The most predictive group remained the 22 original microRNA group (mean F1 0.976, mean recall 0.996, mean precision 0.958, mean AUC 0.999). The top 3 subsequent groups for the validation set included the MCODE group 6 (mean F1 0.968, mean recall 0.99, mean precision 0.947, mean AUC 0.995), the down-regulated group 2 (mean F1 0.962, mean recall 0.986, mean precision 0.939, mean AUC 0.99), and the random forest group 8 (mean F1 0.954, mean recall 0.977, mean precision 0.932, mean AUC 0.986) (Fig. 9a, 9b).
DISCUSSION
Group 1 included hsa-miR-574-5p and 608
Hsa-miR-574-5p is known to be involved in fatty acid elongation, base excision repair, hippo signaling pathway, lysine degradation, purine metabolism, and viral carcinogenesis [15]. It is known to be involved in lung adenocarcinoma, small cell lung cancer, breast cancer, gastric cancer, and nasopharyngeal cancer [16, 17, 18, 19, 20, 21]. It is also involved in other inflammatory pathways including diabetes, asthma, and cardiac remodeling [22, 23, 24, 25]. It has not been found as a differentiating signature in pancreatic cancer previously.
MiR-608 was shown to promote apoptosis via BRD4 downregulation in PDAC [26]. It is also involved in metabolism of xenobiotics by cytochrome P450, transcriptional misregulation in cancer, and base excision repair [15]. It has a role in regulation of apoptosis in non-small cell lung cancer, as well as multiple other types of cancer [27, 28].
Group 2 included miR-146b-3p, 27b, 100-3p, 487b, 28-3p, 320d, 192-3p, 181a-5p, 532-5p
The most significant pathways for the downregulated group were steroid biosynthesis, hippo signaling pathway, ECM-receptor interaction, adherens junction, proteoglycans in cancer, lysine degradation, and viral carcinogenesis (p<0.005). These are also known to be involved in prostate cancer, colorectal cancer, endometrial cancer, and non-small cell lung cancer [15]. MiRNA 146b-3p is also involved in hepatocellular carcinoma, pancreatic cancer, and thyroid cancer [29, 30, 31]. 146b-3p induces apoptosis and blocks proliferation in pancreatic cancer stem cells by targeting the MAP3K10 gene.
MiRNA 100-3p is also known to be involved in esophageal and gastric cancers, vulvar carcinoma, and bladder cancer [32, 33, 34]. MiRNA 27b-5p is involved in oral cancer, ovarian carcinoma, and gastric cancer [35, 36, 37]. MiRNA 487b-3p is involved in colon cancer, osteosarcoma, and anaphylactic reactions [38, 39]. MiRNA 28-3p is involved in Alzheimer’s, nasopharyngeal cancer, gastric cancer, thyroid cancer, and esophageal squamous cell carcinoma [40, 41, 42]. MiRNA 532-5p is involved in breast cancer, glioma, gastric cancer, ovarian cancer, renal carcinoma, and ischemic stroke [27, 43, 44, 45, 46, 47]. MiRNA 320d is involved in hepatocellular carcinoma, aortic dissection, and diffuse large B-cell lymphoma [26, 48, 49]. 320d is most associated with colorectal cancer [50]. MiRNA 181a-5p is involved in atherosclerosis, bladder cancer, glioblastoma, prostate cancer, endometrial cancer, and breast cancer [51, 52, 53, 54, 55, 56]. MiRNA 192-3p is involved in renal disease and gastric cancer [57, 23].
Group 3 included miR-1250, 608, 126-5p, 885-5p, 595, 302d, and 574-5p
The most significant (p<0.005) pathways for the upregulated DEM were proteoglycans in cancer, hippo signaling pathway, lysine degradation, viral carcinogenesis, base excision repair, metabolism of xenobiotics by cytochrome P450, and transcriptional misregulation in cancer. They were also known to be involved in non-small cell lung cancer, colorectal cancer, chronic myeloid leukemia, and pancreatic cancer [15].
MiRNA 126-5p is involved in ovarian cancer, acute myocardial infarction/atherosclerosis, endometriosis, and cervical cancer [58, 59, 60, 61, 62]. 126-5p was noted to differentiate severe acute pancreatitis from mild acute pancreatitis [20]. MiRNA 302d-3p is involved in endometrial cancer, cervical squamous cell carcinoma, glaucoma, osteoarthritis, gastric cancer, and breast cancer [63, 64, 65, 66, 67, 68]. MiRNA 885-3p is involved in clear cell renal carcinoma and gastric cancer [69, 70]. MiRNA 1250-5p is a tumor suppressive miRNA, which is silenced by DNA methylation of AATK gene in non-Hodgkin’s lymphoma [71]. MiRNA 595 is involved in hepatocellular carcinoma, ovarian cancer, glioblastoma, and inflammatory bowel disease [72, 73, 74].
Group 4 included pairs hsa-miR-574-5p and 595; 532-5p and 181a-5p
Group 4 had the most significant (p<0.001) pathways as hippo signaling pathway, lysine degradation, proteoglycans in cancer, viral carcinogenesis, and TGF-beta signaling pathway. These miRNA were also involved in glioma, endometrial cancer, colorectal cancer, non-small cell lung cancer, prostate cancer, thyroid cancer, and pancreatic cancer [15].
Group 5 included hsa-miR-181a, 126, 181a-5p, and 126-5p
The most significant (p<0.0001) shared pathways of this group were neurotrophin signaling pathway, proteoglycans in cancer, viral carcinogenesis, and signaling pathways regulating pluripotency of stem cells. They were also involved in glioma, endometrial cancer, non-small cell lung cancer, colorectal cancer, prostate cancer, pancreatic cancer, renal cell carcinoma, and chronic myeloid leukemia [15].
Group 6 included hsa-miR-28-3p, 320b, 320c, 532-5p, 320d, and 423-5p
For these group 6 miRNA, the most significant (p<0.006) pathways were fatty acid biosynthesis, adherens junction, hippo signaling pathway, proteoglycans in cancer, lysine degradation, viral carcinogenesis, and fatty acid metabolism. They were also associated with glioma, Huntington’s disease, pancreatic cancer, and non-small cell lung cancer [15].
MiRNA 320b is involved in COPD (chronic obstructive pulmonary disease), osteosarcoma, glioma, and atherosclerosis [75, 76, 77, 78]. 320b suppresses pancreatic cancer cell proliferation by targeting the FOXM1 gene [79]. MiRNA 320c is involved in pulmonary disease/asthma, cervical cancer, breast cancer, bladder cancer, colorectal cancer, myelodysplasia, and osteoarthritis [80, 26, 81, 82, 83, 84, 85]. 320c regulates the resistance to gemcitabine through SMARCC1 [86]. MiRNA 423-5p is involved in osteosarcoma, prostate cancer, glioblastoma, ovarian cancer, thyroid cancer, colorectal cancer, pulmonary tuberculosis, and many other cancers [87, 88, 89, 90, 91, 92, 93].
Group 7 included hsa-miR-574-5p, 126-5p, 1250-5p, 151-3p, and 487b-3p
For the group 7 of miRNA found as the top 5 most important parameters in the decision tree, the most significant (p<0.009) pathways were proteoglycans in cancer, viral carcinogenesis, biosynthesis of unsaturated fatty acids, and hippo signaling pathway. These miRNA have also been associated with non-small cell lung cancer, colorectal cancer, glioma, endometrial cancer, renal cell carcinoma, chronic myeloid leukemia, and pancreatic cancer [15].
MiRNA 151-3p is also involved in breast cancer, osteosarcoma, myocardial infarction, cholangiocarcinoma, nasopharyngeal carcinoma, and gastric cancer [94, 95, 96, 97, 98].
Group 8 included hsa-miR-574-5p, 608, 1250-5p, 595, and 320d
For group 8, the most significant (p<0.05) pathways were hippo signaling pathway, base excision repair, transcriptional misregulation in cancer, metabolism of xenobiotics by cytochrome P450, TGF-beta signaling pathway, and adherens junction.
A validation dataset that had only pancreatic cancer and control, but not pancreatitis, was chosen for two reasons: 1. There was no other publicly available data containing pancreatic cancer, pancreatitis, and control that hadn’t already been used for training (which was already a small dataset); 2. To evaluate the consistency of the best performing miRNA group from the training data when applied to a dataset to differentiate pancreatic cancer and control without the benefit of pancreatitis data. There would be no data to confirm if predicted pancreatic cancer evolved from an earlier episode of pancreatitis in this validation dataset. However, the F1 scores could point to validity of the chosen miRNA group in predicting pancreatic cancer in patients with either no history of pancreatitis or with a history of un-recalled or sub-clinical pancreatitis in the past, purely based on a very strong correlation of this miRNA group with PDAC (90% of pancreatic cancer).
Animal model studies have revealed that pancreatic cancer cells metastasize to the liver before the primary site of origin is even detected. This rapid tumor progression is thought to be secondary to epithelial to mesenchymal transition (EMT). The most common signaling pathways affected in pancreatic cancer are the TGF-beta signaling pathway in EMT, wnt/beta-catenin signaling pathway, notch signaling pathway, snail transcription factors, zeb transcription factors, and basic helix loop helix transcription factors (bHLH) [99, 100].
All of the miRNA panels showed good performance with AUC>0.92 and F1 scores >0.85. Almost all of the microRNA panel groups included in the study involved nearly all of the known established pathways in pancreatic cancer. These included Hippo signaling, proteoglycans in cancer, neurotrophin signaling pathways, lysine degradation, TGF-beta signaling, viral carcinogenesis, fatty acid biosynthesis and metabolism, adherens junction, and ECM-receptor interaction (KEGG fig pathway).
The hippo signaling pathway in pancreatic cancer is executed by two major proteins, YES-associated protein (YAP) and transcriptional coactivator with PDZ-binding motif (TAZ). These promote a strong stromal reaction in the pancreatic tumor microenvironment (TME), even in the absence of KRAS [101]. Proteoglycans are involved in the PI3K-Akt signaling pathway, MAPK, Wnt signaling pathways, focal adhesion, VEGF and TGF-beta signaling pathways [102].
Perineural invasion, although present in several tumors, has the highest prevalence in PDAC, ranging from 70-90%, including in early-stage and microscopic PDAC, suggesting that it could represent an early event in tumor progression. Neurotrophins are growth factors which increase growth, proliferation, and nerve-cancer affinity in perineural invasion [103]. Neurotrophins affect downstream pathways such as MAPK signaling pathway, ubiquitin mediated proteolysis, and apoptosis [104].
Ubiquitination and acetylation are common lysine modifications. Ubiquitination was a common pathway in cluster 1 target hub genes in this study. Downstream associated pathways include cell-cell adhesion, nucleoplasm, and RNA binding. Lysine modification related mutations are associated with worse survival [105].
Helicobacter pylori and hepatitis viruses have been linked to pancreatic cancer, possibly through inflammatory signaling pathways including proinflammatory cytokines, the Toll-like receptor (TLR)/MyD88 (myeloid differentiation primary response gene 88) pathway, and nuclear factor-kappa B (NF-κB), up-regulating transcription factors involved in EMT regulation [44]. Pathways involved in hepatitis viral carcinogenesis include MAPK, PI3K-Akt, Jak-STAT, p53, NF-kappaB, and apoptosis [106].
Group 6 had a prominent role in fatty acid synthesis and metabolism. Many enzymes involved in cholesterol synthesis are up-regulated in pancreatic cancer [107]. Fatty acid metabolism is regulated by oncogenic signal transduction pathways, such as PI3K-Akt-mTOR signaling. Fatty acids also participate in remodeling the tumor microenvironment [108].
Adhesion pathways and ECM interactions may play a role in the evolution of pancreatitis to pancreatic ductal cancer. Loosening of cell-cell adhesion between pancreatic cells disrupts structure and promotes permeability, inflammatory cell migration, and interstitial edema. Oxidative stress in pancreatitis leads to up-regulation of adhesion molecules, such as P-selectin and ICAM-1. These are thought to play a role in the pathological features of acute and chronic pancreatitis, which include inflammatory cell infiltration, stroma formation, and fibrosis. At adherens junctions, tyrosine phosphorylation of the cadherin-catenin complex regulates cell contacts. Upregulation of E-cadherin, an adhesion protein, is associated with promotion of the repair of cell-cell adhesions and a protective response [109]. However, E-cadherin down-regulation is a critical component of EMT, such that it has even been considered as a marker for EMT [110]. Adherens junction is also involved in Wnt, MAPK, and TGF-beta signaling pathways [111].
Stromal cell-derived ECM (Extracellular Matrix) proteins were found to be non-specific, but tumor-cell derived ECM proteins were correlated with poor prognosis. Incidence of PanIN (Pancreatic Intraepithelial Neoplasia) increases to 60% in pancreatitis. Collagens were the most important group of proteins in PDAC progression and pancreatitis. Stromal matrix changes in pancreatitis are a subset of the changes in PDAC, however, PDAC, compared with PanIN and pancreatitis, up-regulates the largest portion of matrisome proteins, thus representing the most fibrotic state. Wnt (Wingless-Related Integration Site) proteins may be active in progression of PanIN to PDAC, but not relevant in pancreatitis. Proteoglycans and focal adhesion are involved in ECM receptor interaction [112].
Notably, the best performance in both the training and validation sets was achieved by the panel with the highest number of included miRNA, which was the original set (n=22). The second best performance in the validation set was by group 6, which had one of the higher numbers of miRNA (n=6). The third best performance in the validation set was by the down-regulated group 2 (n=9), which was also the group with the second best performance in the training dataset. A larger group of miRNA may have greater predictive ability owing to the diversity of signaling pathways included. Similarly, although 574-5p appeared in many groups and was the most important feature in the decision tree model (group 7) and the random forest model (group 8), it was likely less specific than a combination of grouped miRNA, which likely represent diverse pathways in multifactorial pancreatic cancer.
Previous studies found miRNA biomarker panels in plasma such as miR-18a [113]; 16, 196a, and CA19-9 [7]; and 22, 642b, and 885 [114]. Serum miRNA panels included 20a, 21, 24, 25, 99a, 185, 191, and 1290 [115]; 125a, 4294, 4476, 4530, 6075, 6799, 6836, and 6880 [116]; 373 [117]; 133a [118]; 663a, 642b, 5100, and 8073 [119]; 1290, 1246, and CA19-9 [120]; and 16, 18a, 20a, 24, 25, 27a, 29c, 30a-5p, 191, 323-3p, 345, and 483-5p [121]. Blood miRNA panels included 26b, 34a, 122, 126*, 145, 150, 223, 505, 636, and 885-5p [122]. Of the above, the serum panel of 20a, 21, 24, 25, 99a, 185, and 191, and the plasma panel of 16, 196a, and CA19-9 also differentiated chronic pancreatitis from PDAC. 181d differentiated autoimmune pancreatitis from PDAC [123].
The lack of any significant miRNA shared between the different studies and our study is likely due to different sample preparation protocols and detection methodologies as well as the type of sample itself (plasma vs. serum vs. whole blood) [119]. The functional pathways associated with the microRNA panels of the prior studies did have much in common with the pathways elucidated in this study. However, when compared to this study, the difference is at least partly intentional due to extraction of microRNA that were common to both pancreatitis and PDAC in the current study, as opposed to previous studies that tried to differentiate the miRNA groups for pancreatitis and PDAC. Despite this, the original 22 group and the down-regulated group 2 demonstrate good prediction of PDAC in datasets that both include and exclude information regarding pancreatitis origin of pancreatic cancer.
There were some limitations in the study. Imbalanced datasets (35% PDAC in the training dataset, 8% pancreatic cancer in the validation set, with the rest being controls) were addressed with a model designed to oversample the minority label, and were cross-validated. The small size of the training dataset (n=254) may have affected results. The training dataset had pancreatic ductal adenocarcinoma, while the larger validation dataset combining 6 GSE datasets had pancreatic cancer. Since PDAC constitutes over 90% of pancreatic cancer, the discrepancy is limited but present. Many publicly available databases and studies, including TCGA (The Cancer Genome Atlas), also include exclusively “pancreatic cancer” (with no specification of whether this constituted PDAC or non-PDAC). Given the different molecular pathways and far worse prognosis of PDAC compared to other less common pancreatic cancers such as neuroendocrine tumors, the inclusion under a common umbrella of pancreatic cancer may skew data analytic results [124].
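The class proportions quoted above follow directly from the sample counts given in the methods. A minimal Python check, using only those reported counts (the dictionary keys are labels chosen here for illustration):

```python
# Sample counts as reported in the methods.
training = {"PDAC": 90, "control": 164}
validation = {"pancreatic cancer": 360, "control": 4360}


def share(counts, label):
    """Fraction of samples carrying the given label."""
    return counts[label] / sum(counts.values())


print(sum(training.values()))                                # 254
print(round(100 * share(training, "PDAC")))                  # 35
print(round(100 * share(validation, "pancreatic cancer")))   # 8
```

The validation set is therefore considerably more imbalanced than the training set, which is worth keeping in mind when comparing precision between the two.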
A new serum biomarker panel of 22 microRNA predicting evolution of pancreatitis to pancreatic ductal adenocarcinoma, and its associated pathways, has been identified; it also performed very well in distinguishing pancreatic cancer (with or without pancreatitis as a risk factor) from control. A smaller panel of 9 microRNA (hsa-miR-146b-3p, 27b, 100-3p, 487b, 28-3p, 320d, 192-3p, 181a-5p, and 532-5p) had the second best performance in the training dataset. The goal of identifying common microRNA between pancreatitis and PDAC in a patient who has had pancreatitis is to use those biomarkers as a screening test to identify those patients with pancreatitis who would benefit from undergoing annual MRI screening. Potential early stage PDAC could thereby be discovered and resected, enabling the best chance of cure.
The progression from inflammation to tumor, and its implication in the discovery of modern day biomarkers, is a potential target for future studies. Larger case control and cohort studies, with standardized sequencing protocols, would be helpful. Sample collection (blood vs. serum vs. plasma) would benefit from standardization, with an eye towards accuracy and ease of processing. Specification of pancreatic ductal adenocarcinoma (vs pancreatic cancer) would be needed to avoid skewing data with the more benign types of pancreatic cancer, such as neuroendocrine tumors. Applicability to other tumors should be expanded upon, given many common signaling pathways. Eventually, prospective experimental studies would be needed. Serial acquisition of common biomarkers from the first episode of pancreatic pathology could predict evolution from pancreatitis and other precursors to PDAC. Given many common target pathways, these biomarkers may also incidentally detect other cancers, such as lung and gastrointestinal cancers.
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
The authors have no relevant financial or non-financial interests to disclose.
Author Contributions
Material preparation, data collection and analysis were performed by Mira Nuthakki. The first draft of the manuscript was written by Mira Nuthakki and Vivian Utti commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics Approval
The research conducted was purely virtual and utilized publicly available data from the NCBI database. Dr. Serena McCalla of the iResearch Institute also determined that no ethics approval was needed.