2.1 Datasets
Transcriptome sequencing, single nucleotide variant (SNV), and clinical data for pancreatic cancer were obtained from The Cancer Genome Atlas (TCGA) database. Differentially expressed genes (DEGs) in pancreatic cancer were extracted from the GEPIA database using analysis of variance, with significance thresholds of p-value < 0.05 and log2 fold change (log2FC) > 1. Transcriptome sequencing data for the validation cohorts were downloaded from the International Cancer Genome Consortium (ICGC) and Gene Expression Omnibus (GEO) databases. Gene sets associated with five types of regulated cell death (apoptosis, ferroptosis, necroptosis, entotic cell death, and pyroptosis) were retrieved from the MSigDB database.
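The DEG thresholds above amount to a simple filter. The following is a minimal Python sketch, not the pipeline used in this study: the record layout (`gene`, `log2fc`, `p_value`), the function name, and the example values are hypothetical. The cutoff is applied as stated (log2FC > 1); the `use_absolute` switch is an added assumption that shows how down-regulated genes could also be retained, which the text does not specify.

```python
from typing import Dict, List

P_CUTOFF = 0.05      # significance threshold stated in the text
LOG2FC_CUTOFF = 1.0  # log2 fold-change threshold stated in the text


def filter_degs(rows: List[Dict], use_absolute: bool = False) -> List[str]:
    """Return the names of genes passing the p-value and log2FC thresholds.

    `use_absolute` is an illustrative switch (an assumption, not from the
    text) that also keeps down-regulated genes with log2FC < -1.
    """
    selected = []
    for row in rows:
        effect = abs(row["log2fc"]) if use_absolute else row["log2fc"]
        if row["p_value"] < P_CUTOFF and effect > LOG2FC_CUTOFF:
            selected.append(row["gene"])
    return selected


# Hypothetical values, for illustration only.
example = [
    {"gene": "GENE_A", "log2fc": 2.3, "p_value": 0.001},
    {"gene": "GENE_B", "log2fc": 0.4, "p_value": 0.010},
    {"gene": "GENE_C", "log2fc": -1.8, "p_value": 0.020},
    {"gene": "GENE_D", "log2fc": 1.5, "p_value": 0.200},
]
up_only = filter_degs(example)
both_directions = filter_degs(example, use_absolute=True)
```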
2.2 Identification of Shared Genes in Regulated Cell Deaths
Weighted Gene Co-expression Network Analysis (WGCNA), an algorithm that identifies modules of genes with similar expression patterns, was used to investigate genes shared across the regulated cell death pathways that are specific to pancreatic cancer. First, a co-expression matrix was constructed using Pearson correlation. The co-expression matrix was then converted into an adjacency matrix by raising the correlations to a soft-threshold power (β), so that the resulting network approximates the scale-free topology typical of biological systems: a small number of nodes, known as hubs, have many connections, while most nodes have only a few. To determine a suitable β, we performed a linear regression of the log-transformed frequency of node connectivity, p(ki), on the log-transformed connectivity ki, where ki is the sum of the connection strengths of gene i. An R^2 value greater than 0.8 was required for the network to be considered approximately scale-free, and β was selected accordingly. To reduce the influence of spurious pairwise connections, the Topological Overlap Matrix (TOM) was computed from the adjacency matrix (7). Several gene modules were identified through TOM-based analysis of the DEGs; each module comprises genes with comparable expression patterns or functional connections in pancreatic cancer. Enrichment scores for the regulated cell death gene sets were calculated by single-sample gene set enrichment analysis (ssGSEA) using the pancreatic cancer expression data. Pearson correlation tests were performed to evaluate the association between gene modules and the enrichment scores. The gene module with a p-value less than 0.05 and a correlation greater than 0.5 was selected for further investigation.
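For reference, the quantities involved in this step can be written out as follows. This is a sketch using the standard WGCNA definitions for an unsigned network; whether a signed or unsigned network was used is not stated in the text, and the symbols c and γ (intercept and slope of the log-log fit) are introduced here only for illustration.

```latex
% Adjacency from the Pearson correlation via soft-thresholding (unsigned network assumed)
a_{ij} = \left| \operatorname{cor}(x_i, x_j) \right|^{\beta}

% Connectivity of gene i
k_i = \sum_{j \neq i} a_{ij}

% Scale-free criterion: linear fit on the log-log scale, accepted when R^2 > 0.8
\log_{10} p(k) \approx c - \gamma \, \log_{10} k

% Topological overlap between genes i and j
\mathrm{TOM}_{ij} = \frac{\sum_{u \neq i,j} a_{iu}\, a_{uj} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}
```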
Within the selected module, hub genes were identified separately for apoptosis, ferroptosis, necroptosis, entotic cell death, and pyroptosis (geneTraitSignificance > 0.4 and geneModuleMembership > 0.7). Finally, shared genes were identified as the intersection of the hub genes across apoptosis, ferroptosis, necroptosis, entotic cell death, and pyroptosis.
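The hub-gene selection and the final sharing step can be sketched as below. This is an illustrative Python sketch under two assumptions: the thresholds are applied to the raw values as written (WGCNA analyses often use absolute values instead), and the final step is read as a set intersection across cell death types. The function names, data layout, and scores are hypothetical.

```python
from functools import reduce
from typing import Dict, Set, Tuple

GS_CUTOFF = 0.4  # geneTraitSignificance threshold from the text
MM_CUTOFF = 0.7  # geneModuleMembership threshold from the text

# gene -> (geneTraitSignificance, geneModuleMembership)
Scores = Dict[str, Tuple[float, float]]


def hub_genes(scores: Scores) -> Set[str]:
    """Genes exceeding both thresholds for one cell death type."""
    return {
        gene
        for gene, (gs, mm) in scores.items()
        if gs > GS_CUTOFF and mm > MM_CUTOFF
    }


def shared_hub_genes(per_trait: Dict[str, Scores]) -> Set[str]:
    """Intersection of the hub-gene sets across all cell death types."""
    hub_sets = [hub_genes(scores) for scores in per_trait.values()]
    return reduce(set.intersection, hub_sets) if hub_sets else set()


# Hypothetical scores for three of the five cell death types.
per_trait = {
    "apoptosis": {"GENE_A": (0.55, 0.82), "GENE_B": (0.45, 0.75), "GENE_C": (0.30, 0.90)},
    "ferroptosis": {"GENE_A": (0.50, 0.82), "GENE_B": (0.35, 0.75), "GENE_C": (0.60, 0.90)},
    "pyroptosis": {"GENE_A": (0.41, 0.82), "GENE_B": (0.48, 0.75), "GENE_C": (0.70, 0.90)},
}
shared = shared_hub_genes(per_trait)
```

In this toy input only GENE_A passes both thresholds for every cell death type, so it is the only shared gene.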
2.3 Functional and Pathway Enrichment Analysis
The biological functions and signaling pathways of the shared genes were examined using Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Set Enrichment Analysis (GSEA). Analyses were conducted using the R packages "clusterProfiler" and "org.Hs.eg.db", which provide functional enrichment tools and human gene annotation, respectively.
2.4 Establishment of a Prognostic Signature for Pancreatic Cancer
Shared genes with prognostic relevance were identified through univariate Cox regression analysis, using a significance threshold of p < 0.05. A signature based on these shared genes was constructed using 15 machine learning algorithms: Gradient Boosting Machine (GBM), Random Survival Forest (RSF), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge, Elastic Net, stepwise Cox (Step-Cox), survival Support Vector Machine (survival-SVM), Support Vector Machine Recursive Feature Elimination (SVM-RFE), CoxBoost, principal component analysis, partial least squares regression for Cox (PLSR), XGBoost, CatBoost, AdaBoost, and deep learning. A total of 167 cross-validation combinations of these 15 algorithms were generated to create a predictive signature based on the TCGA-PAAD, GSE28735, and ICGC-PACA-CA datasets. The signature with the highest C-index was chosen for additional analysis. Patients were classified as high-risk or low-risk using the median risk score as the cutoff. We assessed the predictive ability of the signature using the "survival" and "timeROC" R packages to perform Kaplan-Meier (KM) survival analysis and to compute time-dependent receiver operating characteristic (ROC) area under the curve (AUC) values at 1 to 5 years.
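The two selection rules in this section, ranking signatures by C-index and splitting patients at the median risk score, can be illustrated with a short Python sketch. It implements Harrell's concordance index in its basic pairwise form, which is an assumption about the variant used; the study itself relied on R packages. Assigning patients exactly at the median to the low-risk group is also an assumption, and all data values are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Fraction of comparable patient pairs ordered correctly by risk score.

    A pair (i, j) is comparable when patient i had an event (events[i] == 1)
    strictly before the follow-up time of patient j. Tied risks count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def split_by_median(risks: Sequence[float]) -> List[str]:
    """Label each patient 'high' or 'low' relative to the median risk score."""
    cutoff = median(risks)
    return ["high" if r > cutoff else "low" for r in risks]


# Hypothetical follow-up times, event indicators (1 = death), and risk scores.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]

c_index = harrell_c_index(times, events, risks)  # 4 of 5 comparable pairs concordant
groups = split_by_median(risks)
```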
2.5 Validation of the Prognostic Signature in Independent Cohorts
The prognostic signature's viability and transferability were evaluated by validating it with independent cohorts from the International Cancer Genome Consortium (ICGC) database, including ICGC-PACA-AU and ICGC-PACA-CA, as well as with the GSE28735 and GSE78229 datasets from the Gene Expression Omnibus (GEO) database. The predictive signature derived from the training cohort was leveraged to generate KM survival analysis and time-dependent ROC AUC values (1–5 years).
2.6 Construction of a Nomogram
Independent risk factors were identified through Cox regression analysis, and a nomogram incorporating clinical factors was generated with the "rms" R package to predict 1-, 2-, and 3-year overall survival in pancreatic cancer. The nomogram consisted of the risk score, residual tumor, and TNM stage. Its performance was evaluated using the C-index and AUC. Calibration curves were used to assess the accuracy of the nomogram's predictions by visually comparing the predicted survival probabilities with the observed survival rates.
2.7 Mutation Landscape Map
Single nucleotide variant (SNV) data from The Cancer Genome Atlas (TCGA) were analyzed with the "maftools" R package to compare mutation frequencies between the high-risk and low-risk groups. The top ten mutated genes in the two groups were identified and presented in a waterfall plot, and differences in mutation types between the groups were explored. In addition, survival differences were investigated between wild-type and mutated versions of KRAS and TP53, two genes known for their role in pancreatic cancer development.
2.8 Relationship between the Risk Score and Immune Microenvironment
Tumor-infiltrating immune cells in the high-risk and low-risk groups were estimated with the "EPIC", "ESTIMATE", "MCPcounter", and "xCell" algorithms, and the statistical significance of the differences was determined by the Wilcoxon test. To predict immunotherapy response, the Tumor Immune Dysfunction and Exclusion (TIDE) algorithm was applied through the TIDE online analysis tool (hosted at harvard.edu), and the TIDE score was compared between the high-risk and low-risk groups. We also investigated the association between the expression of the shared genes and the expression of immune checkpoint genes, specifically PDCD1 (PD-1), CD274 (PD-L1), CTLA4, CD47, BTLA, TIGIT, TNFRSF4, TNFRSF9, and VTCN1, using the Wilcoxon test.
2.9 Relationship between the Risk Score and Chemotherapy
Gemcitabine, docetaxel, paclitaxel, and oxaliplatin are frequently used in the treatment of pancreatic cancer. The half-maximal inhibitory concentration (IC50) of these drugs in pancreatic cancer was estimated using data from the CellMiner database in order to relate chemotherapy sensitivity to the risk score.
2.10 MYOF expression and tumor microenvironment
Multivariate Cox regression analysis was performed on RUNX1, NTAN1, FNDC3B, VCAN, MYOF, ANO6, MXRA5, SRPX2, CORO1C, and LIMS1. MYOF showed a significant association (p < 0.05) and was selected for further investigation. MYOF expression was then compared between pancreatic carcinoma and normal tissue in GSE28735, GSE62454, and ICGC-PACA-CA. Additionally, pancreatic cancer samples were divided into two groups based on MYOF expression for Kaplan-Meier survival analysis. We then investigated the association of MYOF expression with the enrichment scores of regulated cell death and with the tumor microenvironment as estimated by the xCell algorithm.
2.11 Proteome analysis for the shared genes
The protein expression levels of RUNX1, NTAN1, FNDC3B, VCAN, MYOF, ANO6, MXRA5, SRPX2, CORO1C, and LIMS1 were extracted from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) database. Differential expression analysis was then conducted between pancreatic cancer and the corresponding normal controls.
2.12 Samples, Western blot, and real-time quantitative PCR
Tumor tissues and paired adjacent tissues from patients diagnosed with pancreatic adenocarcinoma were procured after surgery from the First Affiliated Hospital of Chongqing Medical University and stored at -80°C. Total RNA was extracted from the tissues using Trizol reagent, following the manufacturer's instructions. The extracted RNA was then reverse transcribed into complementary DNA (cDNA) using RT primers and a reverse transcription reaction mix comprising RNase inhibitor (Sangon Biotech, Shanghai, China), MMLV RT enzyme (P7040L, Enzymatics, USA), buffer solution (B7040L, Enzymatics, USA), and dNTPs (7DN1, HyTest Ltd, Finland). Gene expression in pancreatic cancer and adjacent tissues was quantified by quantitative reverse transcription PCR (qRT-PCR) using qPCR Master Mix on a GeneAmp PCR System 9700. The gene-specific forward and reverse primers for the target gene MYOF and the reference gene GAPDH are listed in Supplementary Table 1. Relative expression of the target gene was calculated using the 2^-ΔΔCt method.
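As a worked illustration of the 2^-ΔΔCt calculation, the following Python sketch computes the relative expression of a target gene (here MYOF) normalized to a reference gene (GAPDH) in tumor versus adjacent tissue. The function name and the Ct values are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of the target gene in the sample relative to the control.

    delta Ct       = Ct(target) - Ct(reference), computed per tissue
    delta-delta Ct = delta Ct(sample) - delta Ct(control)
    fold change    = 2 ** (-delta-delta Ct)
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: tumor (sample) versus adjacent tissue (control).
fold_change = relative_expression(
    ct_target_sample=24.0,   # MYOF in tumor
    ct_ref_sample=18.0,      # GAPDH in tumor
    ct_target_control=26.0,  # MYOF in adjacent tissue
    ct_ref_control=18.0,     # GAPDH in adjacent tissue
)
# delta Ct is 6 in tumor and 8 in adjacent tissue, so delta-delta Ct is -2
# and the fold change is 2 ** 2 = 4.
```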
2.13 Cell culture and reagents
The PANC-1 cell line was obtained from the Cell Bank of the Type Culture Collection in Shanghai. Cells were cultured according to standard protocols in DMEM (Gibco) supplemented with 10% FBS and 1% penicillin/streptomycin, and were maintained at 37°C in a humidified incubator with 5% CO2. For overexpression, cells were transduced with MYOF-overexpressing lentiviruses (GeneCopoeia, Guangzhou, China) following the manufacturer's guidelines. For knockdown, we used MYOF-specific small interfering RNAs (siRNAs) obtained from Shanghai Genechem Co.
2.14 CCK-8, Wound Healing Assay and Transwell Invasion Assay
We examined the effect of MYOF downregulation on PANC-1 cell proliferation using the Cell Counting Kit-8 (CCK-8) assay. Control and si-MYOF PANC-1 cells were seeded separately into 96-well plates at a density of 8 × 10^3 cells per well and incubated until they adhered. At 0, 24, 48, and 72 hours, 10 µL of CCK-8 reagent was added to each well in both groups. The cells were incubated for a further two hours, and the absorbance of each well was then measured at 450 nm using a spectrophotometer.
The Transwell assay was used to assess the invasion ability of PANC-1 cells with different levels of MYOF expression. Four groups were tested: PANC-1 cells with MYOF downregulation and their controls without downregulation, and PANC-1 cells overexpressing MYOF and their controls without overexpression. For each group, 5 × 10^4 cells were seeded into the upper chamber of Matrigel-coated Transwell chambers (BD Biosciences, Sparks, MD, USA). After a 24-hour incubation period, cells that had penetrated the Matrigel and reached the lower part of the chamber were fixed with 4% formaldehyde and stained with 0.2% crystal violet for approximately 20 minutes. Excess staining solution and non-invading cells were removed from the upper portion of the chamber by carefully wiping the inner surface with a cotton swab. Stained invading cells on the underside of the chamber were counted under an inverted microscope, and their number served as a measure of the invasion capacity of PANC-1 cells under the different MYOF expression conditions.
For the wound healing assay, PANC-1 cells with downregulated or unaltered MYOF expression were cultured in separate wells of a 6-well plate until they formed a nearly confluent monolayer. A wound was then created by manually scraping the monolayer with a sterile 200 µL pipette tip. The resulting gap was imaged with an inverted microscope at 0 and 24 hours after wounding, and the wound area at each time point was measured in ImageJ to determine the extent of wound closure. A decrease in wound area indicates that cells have migrated into the gap.
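One common way to express the extent of wound closure from the two ImageJ area measurements is as a percentage of the initial wound area. The text does not state the exact formula used, so the following Python sketch is an assumption, and the area values are hypothetical.

```python
def wound_closure_percent(area_0h: float, area_24h: float) -> float:
    """Percentage of the initial wound area that has closed after 24 hours."""
    if area_0h <= 0:
        raise ValueError("initial wound area must be positive")
    return (area_0h - area_24h) / area_0h * 100.0


# Hypothetical wound areas (arbitrary units, e.g. pixels measured in ImageJ).
closure = wound_closure_percent(area_0h=800.0, area_24h=200.0)
```

With these values, three quarters of the initial gap has closed, giving a closure of 75%.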
2.15 Statistical analysis
All statistical analyses were performed in R version 4.3.1, with statistical significance defined as a P value of less than 0.05. The machine learning algorithms were implemented with the following R packages: RSF ("randomForestSRC"), SVM-RFE ("e1071"), GBM ("gbm"), LASSO, Ridge, and Elastic Net ("glmnet"), XGBoost ("xgboost"), CatBoost ("catboost"), AdaBoost ("adabag"), deep learning ("h2o"), supervised principal components ("superpc"), survival-SVM ("survivalsvm"), PLSR for Cox ("plsRcox"), and CoxBoost ("CoxBoost"). Stepwise Cox regression was also used.