Weighted gene co-expression network analysis (WGCNA) identifies eight cell-marker-gene modules
Pancreatic ductal adenocarcinoma (PDAC) is preponderated by intricate tumor micro-environmental (TME) cells. To evaluate the prognostic importance of TME variations and key molecule events underlying tumor cell-TME interactions, a set of cell-cluster-marker genes was defined based on PDAC single-cell RNA-sequencing data. Then, the cell-cluster-marker genes were used in gauging TME status in bulk sequencing data to mine prognostic prediction systems in PDAC patients. (Figure 1A). A single-cell RNA-seq dataset CRA001160 was consecutively analyzed by Seurat software and dimension reduced by UMAP methods to obtain consensus clusters (Figure 1B). According to the expression of a handful of well-documented cell type specific genes (Supplementary figure 1A), we distinguished cellular identity of each cluster and integrated them into eight major components including B cells, cytotoxic cells, endothelial cells, fibroblasts, myeloid cells, acinar cells, islet cells and tumor cells (Figure 1C, Supplementary figure 1B). To acquire the Cell Component Marker Genes (CCMGs), the conserved markers were determined by the non-overlapped genes with Fold Change>2 and P value < 0.05, in each cluster (Supplementary table 1, Supplementary figure 1C). The genes representing B cell, T cell, myeloid cell, endothelial cell and fibroblast were then picked up and defined as Tumor Micro-Environment Marker genes (MEMGs).
To generate a training set for deciphering how tumor-microenvironment status links prognosis, we merged three cohorts of PDAC bulk RNA-sequencing datasets (PACA-CA, PACA-AU and GSE79668) into one combined PDAC RNA-seq dataset, using 'Combat' function to eliminate batch effect (Supplementary figure 1D). The single-cell data derived CCMGs were subjected as input genes to WGCNA analysis in the combined PDAC dataset under soft threshold 0.85, resulting in eight non-grey gene modules (Figure 1D, Supplementary figure 2A-B). When each gene in every module was traced back to the represented cell type, the results showed that gene module-distribution across cell types (Figure 1E), and the cell type constitutes among gene modules (Figure 1F) are both highly heterogeneous. Tumor cell-derived marker genes were enriched and dominant in blue and green models, while the turquoise model was mainly contributed by endothelial cells and fibroblast-derived genes. Each CCMGs were also tested for their prognostic potential respectively, by Cox regression analyses. According to Cox regression results, The CCMGs were classified into good (HR<1, p<0.05) and poor (HR>1, p<0.05) outcome associated genes (Supplementary table 2). Considering the attribution of prognosis-significant genes to respective gene modules (Figure 1G) and cell types (Figure 1H), we found that the majority of genes within most of the gene modules and cell types indicated good survival. alternatively, the green gene module was preoccupied by the genes indicating deteriorated prognosis.
Censuses clustering PDAC samples by MEMGs into prognostic-related subtypes
Superior to calculating the prognostic value of a single gene, it is meaningful to evaluate the overall effects of one gene module to obtain the impact of gene co-expression networks. To this end, the Module Eigengene (ME) values were calculated and used for survival analysis for each module (Supplementary table 3). The Cox regression analyses and Kaplan-Meier analyses under optimal cutoff values revealed that high MEgreen values and low MEblue values inferred unfavorite outcomes (Figure 2A-B). To unwind the gene co-expression networks in green and blue modules, we considered the genes with both high Module membership (MM) value and intramodular connectivity value (MM>0.5 & Connectivity>0.5) as the hub genes (Figure 2C, Supplementary table 4). The results indicated that all 63 hub genes, excluding 2 endothelial genes, in both two modules derived from tumor cells, in agreement of the notion that tumor cells provide the the driving force in shaping tumor microenvironment.
To categorize PDAC patients in respect to tumor microenvironmental heterogeneity, we picked up MEMGs in green and blue modules for consensus clustering of PDAC patients. After filtering samples using silhouette score (silhouette width>0), the remaining samples were assigned to three TME classes (Figure 2D). The vast majority of patients fell into C1 and C2 classes. Most poor-prognostic MEMGs and green module derived MEMGs were highly expressed in TME 2 classes. When analyzing the expression of hub genes in the TME classes, the expression of hub genes in green and blue modules were remarkably segregated along with C1 and C 2 classes (Figure 2E). The green module was correlated with worse survival, however, by contrast, the blue module was associated with benefited survival. Unexpectedly, patients in C1 and C2 were effectively separated by Kaplan-Meier curves (Figure 2F). To extend this finding, we referred the same consensus clustering method in TCGA-PAAD cohort, resulting in a similar result (Supplementary figure 3). These results demonstrated that distinct tumor-microenvironmental status, reflecting by a MEMG subset, was related to patients' outcomes. To compare tumor microenvironmental components between main TME classes, we used the 'quantiseq' and 'TIDE' deconvolution methods to estimate the abundance of some crucial tumor-infiltrating cells. The results showed that the C1 class were infiltrated by more cytotoxic cells, such as NK cells and CD8+ T cells. In contrary, the C2 class were enriched of cancer-associated fibroblasts (CAF) and immunosuppressive cells MDSC (Figure 2G). To assess the relationship of genetic variations with TME classes, we mapped the gene variation landscape in different TME classes. The results showed that the occurrence of concurrent mutations, KRAS and TP53, were more frequently in C2 class than C1 class, which may lead to tumor microenvironment remodeling.
Neural network model and risk score in predicting overall survival and chemo-responsiveness
To utilize the prognosis-significant gene modules in predicting overall survival of PDAC patients, we used PyTorch platform to construct a deep neural network (DNN) model based on MEMGs in blue and green modules [24]. The DNN model contains five layers as shown in Figure 3A. Firstly, the DNN model was trained using randomly selected 2/3 samples in the combined PDAC dataset. Then, the remaining 1/3 samples were used as internal testing set (AUC = 0.90). Since the median survival time is closed to one year, we applied this model to predict one-year survival. Consequently, we got an AUC = 0.88 in the training set, and AUC = 0.9 in the testing set (Figure 3B). When using the DNN predictor-derived probability score in Kaplan-Meier survival analysis, the patients with higher score indeed had shorter survival time (Figure 3C). In another external testing set, the TCGA-PAAD dataset, the DNN model was also successful in predicting patients' outcomes (AUC = 0.81; Kaplan-Meier survival analysis, p < 0.0001) (Figure 3D-E). Furthermore, the meta-analysis was performed to review the prognostic efficiency of DNN probability score in independent datasets. The results showed that DNN score associated with poor prognosis in all tested PDAC datasets. Meta-analysis through DL (DerSimonian and Laird) model resulted in a positive hazard ratio (HR=3.076, p<0.00001), corroborating the general prognostic effect (Figure 3F). We also utilize the DNN model to predict chemo-sensitivity of PDAC. From the ICGC datasets, we selected the PDAC patients receiving chemotherapy for training DNN model. Then, the DNN model accurately predicted chemo-responsiveness (Figure 3G-H). Concurrently, the DNN probability score level stratified therapeutic success in PDAC patients (Figure 3I).
In addition to DNN model, we also developed a MEMG-based risk score model in prognosis prediction. The risk score was defined as weighted average expression of MEMGs. The Cox coefficient was used as the weight. In multiple independent PDAC datasets (Supplementary table 5), high risk score positively correlated with unfavorite prognosis (Figure 4A). Meta-analysis showed that the risk score had a positive hazard ratio in most individual datasets and in overall testing by DL model (Figure 4B). Moreover, the risk score related to the outcomes after chemotherapy (Figure 4C). When the risk scores were dimidiated by maxstat-derived cutoff value, the risk score level was significantly correlated with actual chemo-responses (Fisher's exact test, P=0.017) (Figure 4D). To exam the relationship between risk score and immunotherapeutic response, we showed that MEMG-based risk score positively correlated with TIDE-estimated MDSC abundance, but negatively correlated with Dysfunction score (Figure 4E). Importantly, the risk score level was correlated with TIDE-estimated immune checkpoint blockage response (Fisher's exact test, P=0.007) (Figure 4F).
Cell junction molecule-mediating cell-cell communications govern the prognosis-related gene-network
To understand the molecular basis of prognosis-related gene networks, we performed functional enrichment analysis of hub genes in REACTOME database. The results revealed that "cell junction" and "cell communication" were on the top of enriched molecular function terms (Figure 5A, Supplementary table 6). We selected thirteen cell junction hub genes (in green and blue modules), which serve as cell communication ligand/receptor, as central cell-cell communication mediators (Figure 5B). As shown in Figure 5C, the selected central cell communication mediators weaved a co-expression correlation network with convoluted tumor cell-tumor cell or tumor cell-TME connections. The ligand-receptor pairs involving the above cell-cell communication mediators were extracted from the connectomeDB2020 database (Figure 5D) and subjected for following assays. The cell-cell communication analysis was performed in the single-cell RNA-seq data (CRA001160) through NATMI algorithm. Because certain ligand/receptor molecule may be expressed in more than one cell type. One ligand-receptor pair may modulate communications between several cell pairs. on the other hand, one cell pairs could also be bridged by multiple ligand-receptor pairs (Figure 5E). The cell pair with the largest average expression value was considered as the top cell-cell connection for each ligand-receptor pair (Supplementary table 7). By assigning the ligands/receptors to sending cell types and target cell types according to the top connections, we got an cells-ligand-receptor-targets alluvial plot, in which molecules were weighted by average expression and cell types were weighted by sum average expression of contributing molecules. This network figure out that tumor cells, fibroblasts and endothelial cells were the most vital cell types engaged in cell-cell communication in pancreatic cancers (Figure 5F).
Integrins are key mediators critical for tumor cell to micro-environment communications
To interrogate the prognostic effect of cell-cell communication linked by particular ligand-receptor pair, we calculated the cell-cell communication score in bulk RNA-seq data for Cox regression analyses in the combined PDAC and TCGA-PAAD cohorts (Figure 6A). The strength of most ligand-receptor-induced cell-cell connection remarkably correlated with poor prognosis (HR>1, P<0.05). Integrin-mediated tumor cell-fibroblast communication and tumor cell-endothelial cell communication were prior in the ordered hazard ratio ranks. Besides, the majority of cell-cell communication scores were correlated with DNN model-generated probability score (correlation coefficient > 0, P<0.05), capable of indicting patient's outcome (Figure 6B). By integrating the cell-cell communication networks in both combined PDAC dataset and TCGA-PAAD dataset, we found that tumor cell, fibroblast and endothelial cells were the most significant cellular component, meanwhile, integrins such as ITGA2, ITGA6 were main contributing molecules (Figure 6C). Correlation analyses showed that the mean expression of the integrins in hub genes was largely in parallel with cell-cell communication score, as well as DNN probability score and risk score (Figure 6D, Supplementary figure 4). More than that, the PDAC samples highly expressing the hub integrins were accumulated in TME C1 class, and possessed more driver genetic variations (Figure 6E). These results revealed that integrins may key mediators participated in the dialogue between tumor cell and micro-environmental cells, potentiating the aggressive phenotype of PDAC.
Pharmacological blocking of ITGA2 orchestrates micro-environmental changes and limits PDAC initiation
Intrigued by the pivotal roles of integrins in intercellular network, we seek out to choose the most significant integrin, ITGA2, testing for the protein expression and druggable potential. In the tissue microarray containing 66 PDAC samples, we performed immunohistochemical analysis against ITGA2. ITGA2 was shown to be clearly expressed on the membrane of tumor cells (Figure 7A). In accordance with the bioinformatic analyses, the PDAC patients highly expressing ITGA2 implied inferior prognosis (Figure 7B). To explore whether and to what extend targeting ITGA2 influence PDAC development, we employed Pdx1-Cre+, KrasG12D (KC) mice for in vivo inhibitor management. The mice were orally treated with ITGA2 inhibitor E7820[27], and then, the effected pancreas area was quantified. It appeared that E7820 essentially diminished pancreatic lesions without profoundly influence body weights (Figure 7C, Supplementary Figure 5A). The efficiency of E7820 treatment was also revealed by the detection of ductal biomarker CK19, mucin content (Alcian blue staining) and proliferation marker Ki67 (Figure 7D). In cultured PDAC cells, E7820 indeed suppressed the mRNA and protein expression of ITGA2 (Supplementary Figure 5B-C), which in accordance with the molecular mechanism of this inhibitor[28], but it does not significantly alter the proliferation rate of PDAC cells in vitro (Supplementary Figure 5D). Furthermore, in mice model, the lesioned area within pancreas from mice receiving E7820 contained less αSMA-positive fibroblasts, CD31-positive micro-vessels and Gr1-positive MDSCs (Figure 7E). Collectively, our in vivo pharmacology experiment claimed that targeting the key molecule in the cell communication network reshaped the tumor-benefiting micro-environment to decelerate PDAC growth.