2.1 Datasets
A total of 178 pancreatic ductal adenocarcinoma (PDAC) samples were obtained from the Cancer Genome Atlas (TCGA) database (14), 455 PDAC samples were sourced from the International Cancer Genome Consortium (ICGC) database (15), and 91 PDAC samples were downloaded from the Gene Expression Omnibus (GEO) database (16) with the FPKM format in the bulk transcriptome level. Differentially expressed genes (DEGs) in PDAC were extracted from the Gene Expression Profiling Interactive Analysis (GEPIA) database (17), with the significance threshold at p-value < 0.05 and fold change (log2FC) > 1. A total of 788 genes that regulate the 5 kinds of PCD mutually were extracted, including 200 genes of apoptosis, 159 of pyroptosis, 110 of ferroptosis, 184 of entotic cell death, and 135 of necroptosis from the MSigDB database.
2.2 Identification of PCD-related hub genes in PDAC
The weighted gene co-expression network analysis (WGCNA) was utilized to investigate the hub genes in several PCD pathways in PDAC, which is a machine learning algorithm to construct gene modules based on similarly expressed genes (18). Initially, Pearson correlation among DEGs was calculated and then transformed into an adjacency matrix by indexing to generate a scale-free network. In such a network, a subset of hub genes exhibits high connectivity with other genes, whereas the majority of genes show limited connectivity, reflecting more closely the biological conditions. We then aimed to identify the appropriate exponent (soft-threshold power) by performing a linear regression analysis between the frequency of adjacency matrix values and the corresponding adjacency matrix values. This approach is believed to maximize overall connectivity and enhance the strength of the regression analysis. Additionally, the adjacency matrix was then transformed into topological overlap matrix (TOM) to minimize bias from other genes (19). Eventually, gene modules were constructed based on topological overlap matrix.
Enrichment scores of PCD pathways including apoptosis, ferroptosis, necroptosis, entotic cell death, and pyroptosis were calculated by single-sample gene set enrichment analysis (ssGSEA) (20). Pearson correlation analyses were then performed between gene modules and these enrichment scores. The PCD-related gene module was selected for further investigation if it had a p-value less than 0.05 and a relevance score greater than 0.5. The genes with geneTraitSignificance greater than 0.4 and geneModuleMembership greater than 0.7 were identified. Ultimately, PCD-related hub genes were selected based on their interactions across apoptosis, ferroptosis, necroptosis, entotic cell death, and pyroptosis pathways.
2.3 Functional and pathway enrichment analysis
The biological functions and signaling pathways of the PCD-related hub genes were examined using Gene Ontology (GO), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Set Enrichment Analysis (GSEA). These analyses were conducted with the R packages "clusterProfiler" and "org.Hs.eg.db".
2.4 Establishment of the PCD-related prognostic model in PDAC
The study identified PCD-related hub genes with prognostic relevance using univariate-cox regression analysis via the “survival” R package with the criteria of p < 0.05. Based on prognostic PCD-related hub genes, a total of 167 algorithmic combinations consisted of 15 machine learning methods were utilized within the cohorts in our research, including “Gradient Boosting Machine (GBM)”, “Random Survival Forest (RSF)”, “Least Absolute Shrinkage and Selection Operator (LASSO)”, “Ridge”, “Elastic network”, “Step-Cox”, “Support Vector Machine (SVM)”, “Support Vector Machine Recursive Feature Elimination (SVM-RFE)”, “Coxboost”, “Principle component analysis”, “partial least squares regression for COX (PLSR)”, XGBoost, CatBoost, AdaBoost and Deep learning. The average C-index of each algorithmic combination across the entire training, test, and validation sets was used as a criterion to assess the superiority of the combinations. The combination with the highest average C-index was chosen for subsequent model construction. Consequently, the Deep Learning-Random Survival Forest (RSF) combination was selected. Using the features identified via Deep Learning, we applied RSF to establish our PCD-related prognostic model by calculating the Cell Death Index (CDI). Patients were classified as high-risk or low-risk based on the median CDI cutoff. The predictive ability of the model was assessed using the "survival" and "timeROC" R packages, generating Kaplan-Meier survival analyses and time-dependent receiver operating characteristic (ROC) area under the curve (AUC) values over a period of 1 to 5 years in TCGA-PDAC, ICGC-PACA-AU, ICGC-PACA-CA, GSE28735 and GSE78229.
2.5 Construction of the Nomogram in PDAC
In this study, integrating various clinical information (sex, age, maximum tumor dimension, residual tumor, TNM stage, tobacco, alcohol, diabetes, and chronic pancreatitis), univariate-Cox regression analysis was performed to determine whether CDI could serve as an independent prognostic factor in PDAC patients. Subsequently, a clinical nomogram incorporating CDI and the aforementioned clinical features was constructed using the "rms" R package. The reliability of the nomogram was evaluated using the C-index, AUC, and calibration curves.
2.6 Assessment of gene mutation frequency in PDAC
For this study, single nucleotide variants (SNVs) in PDAC were downloaded from TCGA database and cleaned by R package "maftools". The variant classification, variant type, and gene mutation frequency in PDAC were analyzed, and differences across groups were calculated using the chi-square test. Survival differences between KRAS, TP53, and CDKN2A mutant and wild-type PDAC were identified by K-M analysis.
2.7 Assessment of tumor microenvironment and immune infiltration analysis
Several machine learning algorithms of immune infiltration, including "EPIC", "ESTIMATE", "MCPcounter", and "Xcell" were utilized to identify immune microenvironment via the corresponding methods in the R package “IOBR”, and differences across groups were calculated using the Wilcoxon test. The significance threshold was set at p-value < 0.05. To predict immunotherapy response, the Tumor Immune Dysfunction and Exclusion (TIDE) algorithm (an online analysis tool) was employed. The expression differences of immune checkpoint genes between high-risk and low-risk groups, including PDCD1 (PD-1), PD-L1 (CD274), CTLA4, CD47, BTLA, TIGIT, TNFRSF4, TNFRSF9, and VTCN1, were investigated using the Wilcoxon test.
2.8 Exploration of the drug sensitivity in PDAC patients
Gemcitabine, docetaxel, paclitaxel, and oxaliplatin are frequently utilized in the treatment of PDAC. This study aims to estimate the half-maximal inhibitory concentration (IC50) of these drugs for PDAC using data from the CellMiner database and the expression levels of PCD-related hub genes.
2.9 Validation of the significant prognostic PCD-related gene (MYOF)
Multi-variate Cox regression analysis was performed to identify independent risk factor, including UNX1, NTAN1, FNDC3B, VCAN, MYOF, ANO6, MXRA5, SRPX2, CORO1C, and LIMS, and MYOF was selected for further investigation. The expression of MYOF between PDAC and normal tissue in GSE28735, GSE62454, and ICGC-PACA-CA cohorts were conducted. Then, survival differences between PDAC with high MYOF expression and those with low MYOF expression were calculated using Kaplan-Meier analysis.
2.10 Samples and real time quantitative PCR
Tumor tissue and paired adjacent tissues from patients diagnosed with pancreatic adenocarcinoma after surgery were procured from the First Affiliated Hospital of Chongqing Medical University and stored at -80°C. Reverse transcription-PCR (qRT-PCR) was used for quantitative analysis of gene expression in both pancreatic cancer and adjacent tissues after total RNA extraction. Trizol reagent was utilized to extract RNA from the tissues, following the manufacturer's instructions. The extracted RNA was then reverse transcribed into complementary DNA (cDNA), using RT primers and a reverse transcription reaction mix. The reaction mixture comprised of RNAse inhibitors (Sangon Biotech, Shanghai, China), MMLV RT enzyme (P7040L, Enzymatics, USA), buffer solution (B7040L, Enzymatics, USA), and dNTPs (7DN1, HyTest Ltd, Finland). The cDNA samples underwent qRT-PCR analysis utilizing qPCR Master Mix on a Gene Amp PCR System 9700. Gene-specific primers were designed for both the target gene MYOF and the reference gene GAPDH (Supplementary Table 1). The sequences for the forward and reverse primers were provided. The expression of the targeted genes was relatively quantified using the 2-ΔΔCt method.
2.11 Cell culture and regents
The PANC-1 cell line was obtained from the Cell Bank of the Type Culture Collection in Shanghai. Standard protocols were used for cell culture, with DMEM from Gibco supplemented with 10% FBS and 1% penicillin/streptomycin. The cells were maintained at 37°C in a humidified incubator with a 5% CO2 concentration. To achieve lentiviral overexpression, we followed the manufacturer's guidelines for transducing MYOF-overexpressing lentiviruses from GeneCopoeia in Guangzhou, China. For knockdown, we employed MYOF-specific short hairpin RNAs (siRNAs) obtained from Shanghai Genechem Co.
2.12 CCK-8, Wound Healing Assay and Transwell Invasion Assay
We examined the proliferative effects of PANC-1 cells with and without MYOF downregulation using the Cell Counting Kit-8 (CCK-8) assay. PANC-1 cells, including both control and si-MYOF cells, were placed individually into 96-well plates, each group seeded at a density of 8 × 10^3 cells per well. The cells were then incubated until adherence was achieved, followed by the addition of 10 µL of CCK-8 reagent to each well at 0, 24, 48, and 72-hour intervals. The CCK-8 reagent was applied to all wells in both the control and si-MYOF groups. Subsequently, the cells were incubated for two hours after the introduction of the CCK-8 reagent. The spectrophotometer then assessed the absorbance of each well at a wavelength of 450 nm following a two-hour incubation period.
The Transwell assay is used to assess the invasion ability of PANC-1 cells with diverse levels of MYOF expression. Matrigel-coated Transwell chambers from BD Sciences in Sparks, MD, USA are employed, and each of the two groups in the upper chamber is seeded with 5 × 10^4 cells. The groups consist of PANC-1 cells exhibiting downregulation of MYOF expression, PANC-1 cells that lack MYOF downregulation, PANC-1 cells overexpressing MYOF, and PANC-1 cells without MYOF overexpression. After a 24-hour incubation period, cells that penetrate the Matrigel and descend to the lower part of the chamber are fixed with a 4% formaldehyde solution. Following fixation, the fixed cells are treated with a 0.2% crystal violet solution for approximately 20 minutes. To remove excess staining solution and non-invading cells from the upper portion of the Transwell chamber, carefully wipe the inner surface with a cotton swab. Count the number of invading cells that are stained on the underside of the Transwell chamber by using an inverted microscope.
The quantity of invading cells serves as a gauge for the invasion capacity of PANC-1 cells under various MYOF expression conditions. PANC-1 cells with downregulated or unaltered MYOF expression are cultured in separate wells of a 6-well plate, achieving complete confluency to create a nearly full monolayer before evaluation. Once the cells reach the desired level of confluence, a wound is created by manually scraping the cell monolayer using a sterile pipette tip that is 200 µL in size. This procedure produces a gap, or "wound," in the cell layer that is imaged and monitored at two different time points: 0 and 24 hours after the wound creation. Images are taken using an inverted microscope. The wound area at 0 and 24 hours is then analyzed using ImageJ to determine the extent of wound closure. The decrease in wound area indicates that cells have migrated into the gap and have the capability to effectively promote wound healing.
2.13 Statistical analysis
R version 4.3.1 software was utilized for all static analysis, with statistical significance defined as a P value of less than 0.05. The following algorithms were utilized: GBM), RSF, LASSO, Ridge, Elastic network, Step-Cox, Support Vector Machine (SVM), SVM-RFE, Coxboost, PCA, PLSR, XGBoost, CatBoost, AdaBoost and Deep learning. The algorithms were executed using various R packages, including "randomForestSRC", "e1071", "gbm", "glmnet", "xgboost", "catboost", "adabag", and "h2o". The implemented machine learning algorithms in R were superpc, survivalsvm, gbm, PLSR, COXBoost, Ridge, Lasso, ElasticNet, stepwise Cox, and RSF. The packages "superpc", "survivalsvm", "gbm", "plsRcox", "CoxBoost", "glmnet", and "randomForestSRC" were utilized as they provide the necessary functions and implementations for the aforementioned algorithms.