ScRNA-seq identified immune related DEGs for PDAC
To identify transcriptional profile difference between tumor cells and normal ductal cells in PDAC, we conducted scRNA-seq for cancerous and adjacent tissues from patients with PDAC in our institution, and downloaded scRNA-seq datasets from Genome Sequence Archive (GSA) and China National GeneBank DataBase (CNGBdb) databases. The t-distributed stochastic neighbor embedding (t-SNE) was used to delineate cell atlas for different datasets (Supplementary Fig. S1A-C). Most of cell subpopulations in PDAC and normal pancreatic tissues, including T Cell, Fibroblast, Macrophage, Ductal, Endothelial, Stellate, Acinar, B Cell, Plasma, Mast Cell, Neutrophils, Schwann, and Endocrine Cell, were identified by scRNA-seq analysis. Our previous study isolated ductal cell subpopulations, inferred somatic large-scale chromosomal CNVs and calculated CNV scores according to reference normal cells, such as endothelial cells, macrophages, and t cells. Ductal cell subpopulations with high CNV scores were defined as malignant ductal cells subpopulations, therefore were marked as Tumor 1/2/3/4/5 (Supplementary Fig. S1D-F). We compared normal and malignant ductal subpopulations, and identified many DEGs (Fig. 1B-D). Furthermore, common DEGs (Tumor 1/2/3/4/5 vs. normal ductal subpopulations) were identified for each scRNA-seq dataset (Fig. 1E-G). The venn diagram showed the intersection of common DEGs among three datasets (Fig. 1H). A total of 28 common DEGs were found, which represented the most significant differences between tumor cells and normal ductal cells in PDAC. Seven DEGs were related to immune regulation according to the Immunology Database and Analysis Portal (ImmPort) database. The correlation analysis was performed to select the most relevant LncRNAs using TCGA_PAAD dataset. Finally, 7 immune related DEGs (IRDEGs) and 70 LncRNAs were included for developing prognostic model.
Construction and internal validation of prognostic model for PDAC
The gene expression matrix and clinical follow-up data were downloaded and integrated using TCGA_PAAD dataset. Univariate cox regression analysis was conducted to screen out prognosis related signatures. A total of 30 OS related IRDEGs and LncRNAs were found. To avoid the prognostic model overfitting, LASSO-penalized cox regression analysis was performed to further select 10 signatures for model construction (Fig. 1I-J). Then, 177 subjects in TCGA_PAAD dataset were randomly divided into train set and validation set in 2:1. Multivariate cox regression analysis was used to construct prognostic model in the train set (Supplementary Fig. S1G). Five immune-related signatures, including SPP1, LINC00683, SNHG10, LINC00237, CASC19 were used to develop risk score formula to predict the overall survival of PDAC patients. The risk score of each subject was calculated as follows: risk score (t) = h0 (t) * exp (0.1775 * SPP1 – 0.3990 * LINC00683 – 0.3215 * SNHG10 – 0.3744 * LINC00237 + 0.1537 * CASC19).
Subjects were classified as low risk group or high risk group according to the median risk score (Fig. 2C). The Kaplan–Meier curve (KM) indicated that subjects with higher risk scores had significantly worse prognosis than those with lower risk scores (P = 2.877e - 06) (Fig. 2A). Time-dependent receiver operating characteristic curve (ROC) was utilized to assess the validity of risk score model to predict 1-year, 1.5-year and 2-year OSs, and the area under curve (AUC) values were 0.709, 0.77, and 0.787 respectively (Fig. 2B). Risk curve showed that higher risk scores subjects had, the shorter OS they were inclined to have (Fig. 2D). Next, the internal validation was performed. In line with results in the train set, subjects in high risk group had worse OS in the validation set (P = 1.361e – 02; AUC: 1-/1.5-/2-year OSs: 0.667/0.726/0.818) and all set (P = 1.361e – 02; AUC: 1-/1.5-/2-year OSs: 0.681/0.751/0.798) (Fig. 2E-L).
To further test the performance of established prognostic model in different age and N stage subgroups, subjects were divided into two subgroups (age > 65 and age ≤ 65; N0 and N1). Consistent with previous results, subjects in high risk group had worse OS than those in the low risk group for subgroup analyses. For age > 65 subgroup, p value was 4.362e – 03 in the KM curve, and 1-/1.5-/2-year AUC values were 0.637/0.71/0.756 (Supplementary Fig. S2A-D). For age ≤ 65 subgroup, p value was 8.972e – 05 in the KM curve, and 1-/1.5-/2-year AUC values were 0.718/0.791/0.818 (Supplementary Fig. S2E-H). For N0 subgroup, p value was 3.49e – 02 in the KM curve, and 1-/1.5-/2-year AUC values were 0.639/0.764/0.803 (Supplementary Fig. S2I-L). For N1 subgroup, p value was 2.318e – 03 in the KM curve, and 1-/1.5-/2-year AUC values were 0.663/0.669/0.718 (Supplementary Fig. S2M-P).
Nomogram to predict the survival of patients with PDAC
The clinicopathological characteristics and risk score of each subject in TCGA_PAAD were combined. Univariate cox regression analysis indicated that risk score (HR = 2.16; 95% CI: 1.50 – 3.09, P = 0.0000), margin status (HR = 1.72; 95% CI: 1.02 – 2.92, P = 0.0431), and N stage (HR = 1.95; 95% CI: 1.03 – 3.67, P = 0.0398) were significantly associated with the OS (Fig. 2M). However, multivariate cox regression analysis showed that risk score was the only independent prognostic factor of PDAC (HR = 2.01; 95% CI: 1.4 – 2.9, P = 0.0002) (Fig. 2N). Moreover, an easy-to-use prognostic nomogram, including N stage, margin status, risk score, was established (Fig. 2O). The subject with higher total points was predicted to have shorter 2-year and 3-year OSs. There was a good correlation between nomogram predicted OSs and actual OSs, showing the accuracy of established nomogram (Fig. 2N).
The prognostic model was associated with tumor immune infiltration status
The type of tumor immune infiltration was a good predictor for prognosis and immunotherapy response [18, 19]. To explore the correlation between established prognostic model and tumor immune infiltration status in PDAC, immune infiltration score of each subject was calculated. Subjects in high risk group had higher immune infiltration scores compared with low risk group (P = 0.003) (Fig. 3A). The KM curve showed that subjects with higher immune infiltration scores had significantly worse OS (1.241e - 02) (Fig. 3B). PDAC tissues from the high risk group had higher proportion of Tregs (P < 0.05), which might contribute to the dismal prognosis (Fig. 3C). In addition, multiple deconvolution-based analyses for the proportion of cell subpopulations in PDAC and normal pancreatic tissues showed that there was significant difference between PDAC and normal pancreatic tissues in immune cell compositions, and the proportions of Tregs and M2_macrophage were significantly higher in PDAC than normal pancreatic tissues for CIBERSORT and quan TIseq, indicating that PDAC underwent the process of suppressive immune regulation (Supplementary Fig. S3A-C). PDAC also had more cancer associated fibroblasts (CAFs) for EPIC.
Pan-cancer landscape of SPP1 was delineated
The SPP1, a key signature in the established prognostic model, plays a pivotal role in tumor immune microenvironment and tumor progression [20-22]. The SPP1 is also a promising biomarker for various tumors [23-25]. Pan-cancer landscape of SPP1 for diagnosis, prognosis, and tumor immune infiltration was investigated. SPP1 was significantly upregulated in various malignancies, such as breast cancer, cholangio carcinoma, glioblastoma, kidney carcinoma, pancreatic cancer, brain lower grade glioma et al. (Fig. 3D). The KM curves showed that patients with higher expression of SPP1 had worse OSs in many tumors, including adrenocortical carcinoma, cervical carcinoma, head and neck carcinoma, lower grade glioma, liver hepatocellular carcinoma and pancreatic cancer (Fig. 3E). Furthermore, the expression level of SPP1 was related to the abundance of tumor-infiltrating Tregs and macrophage (Fig. 3F-G). There was significant difference of expression of SPP1 between responders and non-responders in various checkpoint blockade therapy datasets (Fig. 3H). The expression of SPP1 was also correlated to multiple immunoinhibitors (CSF1R, HAVCR2, IL10, PDCD1LG2, TGFB1) and immunostimulators (CD80, CD86, IL2RA, TNFSF4/9) (Supplementary Fig. S3D-E).
ScRNA-seq identified the expression of SPP1 in various cell subpopulations of PDAC
Recently, scRNA-seq showed great advantages in uncovering transcriptional profiles of tissues in single cell resolution. Multiple human and mouse scRNA-seq datasets for PDAC were retrieved from online databases. In our previous, we have described tissue landscapes and identified cell subpopulations identifications. Here, the expression level of SPP1 among various cell subpopulations was showed by violin plot. We found that many cell subpopulations, especially macrophage and ductal cells, had enriched expression of SPP1 in human and mouse PDAC tissues (Fig. 4A-B). In addition, PDAC had the higher expression of SPP1 than normal pancreatic tissue (Fig. 4C). However, no significant difference was found in the expression of SPP1 for PDAC patients with different stages (Fig. 4D). Notably, M2_macrophage in PDAC had higher expression of SPP1 than those in normal pancreatic tissue. In summary, SPP1 in M2_macrophage might be a better diagnostic and prognostic biomarker for PDAC.
To further explore the expression of SPP1 in distinct macrophage subpopulations, the Gene-Cell matrix of macrophages in our scRNA-seq dataset was isolated for t-SNE analysis. The eight original clusters were found, then integrated into four macrophage subpopulations: macrophage 1-4 (Fig. 5A-B, D). There was a significant difference in gene expression profiles among four macrophage subpopulations (Fig. 5C). The macrophage 2 had enriched expression of many chemokines and ligands, including CCL4, CCL4L2, CCL3, CCL3L1, which were involved in recruiting immune cells to promote tumor development [26, 27]. Next, we tested the expression of markers of different macrophage types among macrophage 1-4 subpopulations (Fig. 5E). The common macrophage marker AIF1 was expressed in all isolated macrophages, identifying their cell identification. The M2_macrophages and CD169+ macrophages were found based on the expression of IL4R and SIGLEC1. The SPP1+ macrophages were distributed among macrophage 1-4 subpopulations. There were few macrophages that expressed both IL4R and SPP1.
The distribution of SPP1 in PDAC was verified by multicolor IHC stating
To further verify the distribution of SPP1 in PDAC, we tested the expression of SPP1 in PDAC cell lines. The RT-qPCR results showed that many PDAC cell lines, except for MIA PaCa-2, expressed the higher levels of SPP1 compared with HPNE (Fig. 6A). Similar results were observed in western blot (Fig. 6B). The upregulation of SPP1 was found in the MIA PaCa-2 and BxPC-3 under hypoxic condition (Fig. 6C). In addition, the paired cancerous and adjacent tissues of PDAC patients were examined. We found cancerous tissues had significantly enriched expression of SPP1 than adjacent tissues (Fig. 6D). The results of multicolor IHC showed that SPP1 was expressed in various cell types in PDAC tissues, and PanCK+/SPP1+ and CD68+/SPP1+ cells were observed, indicating that SPP1 was enriched in ductal cells and macrophages.