Identification of prostate cancer subtypes based on immune signature scores in bulk and single-cell transcriptomes

Prostate cancer (PC) is heterogeneous in the tumor immune microenvironment (TIME). Subtyping of PC based on the TIME could provide new insights into intratumor heterogeneity and its correlates of clinical features. Based on the enrichment scores of 28 immune cell types in the TIME, we performed unsupervised clustering to identify immune-specific subtypes of PC. The clustering analysis was performed in ten different bulk tumor transcriptomic datasets and in a single-cell RNA-Seq (scRNA-seq) dataset, respectively. We identified two PC subtypes: PC immunity high (PC-ImH) and PC immunity low (PC-ImL), consistently in these datasets. Compared to PC-ImL, PC-ImH displayed stronger immune signatures, worse clinical outcomes, higher epithelial-mesenchymal transition (EMT) signature, tumor stemness, intratumor heterogeneity (ITH) and genomic instability, and lower incidence of TMPRSS2-ERG fusion. Tumor mutation burden (TMB) showed no significant difference between PC-ImH and PC-ImL, while copy number alteration (CNA) was more significant in PC-ImL than in PC-ImH. PC-ImH could be further divided into two subgroups, which had significantly different immune infiltration levels and clinical features. In conclusion, “hot” PCs have stronger anti-tumor immune response, while worse clinical outcomes versus “cold” PCs. CNA instead of TMB plays a crucial role in the regulation of TIME in PC. TMPRSS2-ERG fusion correlates with decreased anti-tumor immune response while better disease-free survival in PC. The identification of immune-specific subtypes has potential clinical implications for PC immunotherapy.


TCGA
The

Background
Prostate cancer (PC) is the second most common cancer in men worldwide, with more than 1,275,000 new diagnoses and 350,000 deaths annually. Abundant evidence has shown that PC is highly heterogeneous in genomic and phenotypic characteristics [1]. By analyzing 333 primary PCs, The Cancer Genome Atlas (TCGA) revealed that more than 70% of PCs belonged to one of seven subtypes in terms of specific gene fusions (ERG, ETV1/4, and FLI1) or mutations (SPOP, FOXA1, and IDH1) [2]. Salami et al. demonstrated transcriptomic heterogeneity in PC by analyzing 156 multifocal, low-grade, and high-grade PC samples [3]. Wilkinson et al. showed that specific molecular features, e.g., loss of chromosome 10q and TP53 alterations, correlated with poor response to neoadjuvant androgen deprivation therapy [4]. PC treatments include two categories: local treatments and systemic treatments. Local treatments refer to surgery and radiation therapy, while systemic treatments include hormonal therapy, targeted therapy, chemotherapy, and immunotherapy. In particular, immunotherapies, such as immune checkpoint inhibitors (ICIs), have recently achieved success in treating various malignancies, including refractory and metastatic tumors [5]. Currently, several ICIs, included sipuleucel-T [6], abiraterone acetate [7], enzalutamide [8], cabazitaxel [9], radium-223 [10], and apalutamide [11], have been approved by the Food and Drug Administration (FDA) to treat androgen depletion therapy-resistant PCs.
Nevertheless, current immunotherapeutic strategies are beneficial to only a subset of cancer patients. Thus, discovery of effective markers for immunotherapeutic responses is urgently in need. Several such markers have been identified and are being used in clinical practice, including PD-L1 expression [12], DNA mismatch repair deficiency [13], and tumor mutation burden (TMB) [14]. Besides, the tumor immune microenvironment (TIME) plays a crucial role in immunotherapeutic responses [15]. Overall, the "hot" tumors with dense T-cell infiltration are more likely to respond to immunotherapy, compared to the "cold" tumors with sparse T-cell infiltration [16]. Thus, identification of "hot" and "cold" tumors may aid stratification of cancer patients responsive to immunotherapy. In a previous study [17], we have developed an unsupervised machine learning method to identify "hot" and "cold" tumor subtypes based on immune signature scores.
In this study, we performed unsupervised clustering based on the enrichment scores of 28 immune cell types in the TIME to identify immune-specific subtypes of PC. To demonstrate the reproducibility of this method, we performed the clustering analysis in ten different transcriptomic datasets for PC bulk tumors. Consistently, we identified two PC subtypes which exhibited significantly different molecular and clinical features. Finally, we validated our method and results in a single-cell RNA-Seq (scRNA-seq) dataset. Our identification of immune-specific subtypes may provide new insights into associations between the TIME and molecular and clinical features in PC as well as potential clinical implications for the immunotherapy of this disease.

Calculation of enrichment scores of immune signatures, pathways, and phenotypic features
We calculated the enrichment score of an immune signature, pathway, or phenotypic feature in a tumor sample by the single-sample gene-set enrichment analysis (ssGSEA) [19] of the expression profiles of its gene set. We collected the gene sets representing immune signatures, pathways, or phenotypic features from KEGG [20] or related publications, as shown in Supplementary Table S2.

Identification of PC subtypes
We identified PC subtypes based on the enrichment scores (ssGSEA scores) of 28 immune cell types by hierarchical clustering. The 28 immune cell types included CD56-bright natural killer (NK) cells, effector memory CD4 T cells,  eosinophil, CD56-dim NK cells, type 17 T helper cells, activated B cells, monocytes, memory B cells, activated CD4  T cells, type 2 T helper cells, plasmacytoid dendritic cells,  neutrophils, macrophages, effector memory CD8 T cells,  myeloid-derived suppressor cell (MDSC), immature B cells,  T follicular helper cells, NK cells, immature dendritic cells,  mast cells, type 1 T helper cells, activated dendritic cells,  central memory CD4 T cells, gamma delta T cells, central  memory CD8 T cells, regulatory T cells, activated CD8 T cells, and natural killer T cells [21]. Before clustering, we normalized the ssGSEA scores by z-score and transformed them into distance matrices by the R function "dist" with the parameter: method = "euclidean". We performed hierarchical clustering using the function "hclust" in the R package "Stats" with the parameters: method = "ward.D2" and members = NULL. We also calculated Silhouette scores which indicates consistency within clusters of data.

Calculation of tumor immune scores
We calculated immune scores of tumors by ESTIMATE [22] with the input of the expression profiles of immune genes. The immune score reflects the tumor immune infiltration level.

Survival analysis
We compared disease-free survival (DFS) rates between PC subtypes using the Kaplan-Meier (K-M) model [23]. K-M curves were used to show the survival time differences, and log-rank tests were utilized to evaluate the significance of survival time differences.

Pathway analysis
We first identified differentially expressed genes (DEGs) between PC subtypes using Student's t test with a threshold of adjusted P-value < 0.05 and fold change of mean expression levels > 1.5. By input of the upregulated DEGs in a PC subtype into GSEA [24], we obtained the KEGG pathways highly enriched in the subtype with a threshold of adjusted P-value < 0.05.

Calculation of TMB and SCNAs
We determined a tumor sample's TMB as the total count of its somatic mutations. We calculated G-scores in tumors by GISTIC2 [25] with the input of "SNP6" files.

Class prediction
We first normalized attribute values (ssGSEA scores of immune cells) by Z score and transformed all attribute values into the range between − 3 and 3 by setting an attribute value as 3 if it was greater than 3 and setting an attribute value as -3 if it was less than -3. We used the Random Forest (RF) algorithm [26] to predict PC subtypes. In the RF model, the number of trees was set to 500, and the features were the 28 immune cells. We reported prediction accuracies, sensitivities, and specificities. The class prediction was performed using the R package "randomForest".

Statistical analysis
We used the Mann-Whitney U test to compare two classes of non-normally distributed data and Student's t test for normally distributed data. We used the Spearman method to correlate immune scores and other variables. The Fisher's exact test was utilized to evaluate the association between two categorical variables. To adjust for P values in multiple tests, we used the Benjamini-Hochberg method [27] to calculated the false discovery rate (FDR). We performed all statistical analyses in the R programming environment (version 4.0.3).

Identification of PC subtypes based on immune cell enrichment scores
Based on the enrichment scores of 28 immune cell types [21], we identified PC subtypes by hierarchical clustering. We performed the clustering analysis in ten PC datasets (TCGA-PRAD, DKFZ2018, GSE21034, GSE46602, GSE54460, GSE70770, GSE107299, GSE116918, GSE141551, and GSE157547), respectively. We compared the Silhouette scores between two clusters and three clusters in each of the ten datasets, and found the scores were consistently higher in the case of two clusters in all the datasets. Thus, we preferred to define two clusters (PC subtypes): PC immunity high (PC-ImH) and PC immunity low (PC-ImL), consistently in the ten datasets (Fig. 1A). To verify the significantly higher immunity in PC-ImH versus PC-ImL, we compared the enrichment scores of both immunostimulatory signatures (CD8+ T cells, immune cytolytic activity, and interferon (IFN) response) and immunosuppressive signatures (CD4+ regulatory T cells, M2 macrophages, T-cell exhaustion, and myeloidderived suppressor cells (MDSCs)) between PC-ImH and PC-ImL. Interestingly, all these immune signatures displayed significantly higher enrichment scores in PC-ImH than in PC-ImL (one-tailed Mann-Whitney U test, P < 0.01) in the ten datasets (Fig. 1B). Also, the ratios of immunostimulatory to immunosuppressive signatures (CD8+ /CD4+ regulatory T cells), which were the base-2 log-transformed values of the geometric mean expression levels of all marker genes of CD8+ T cells divided by those of CD4+ regulatory T cells, were significantly higher in PC-ImH than in PC-ImL (two-tailed Student's t test, P < 0.05) in most of these datasets (Fig. 1B). These results confirmed that PC-ImH had significantly stronger immune signatures than PC-ImL.

Clinical and phenotypic features of the PC subtypes
We compared disease-free survival (DFS) time between PC-ImH and PC-ImL in four datasets (TCGA-PRAD, GSE116918, GSE46602, and DKFZ2018) with survival data available. In these datasets, PC-ImL were likely to show Fig. 1 Hierarchical clustering of prostate cancer (PC) in ten transcriptomic datasets based on the enrichment scores of 28 immune cell types. A There are two PC subtypes identified: PC immunity high (PC-ImH) and PC immunity low (PC-ImL). The enrichment score of an immune cell type in a tumor sample was calculated by the single-sample gene-set enrichment analysis (ssGSEA) [19] of the expression profiles of its marker gene set. B Comparisons of the enrichment scores of immunostimulatory signatures (CD8+ T cells, immune cytolytic activity, and interferon (IFN) response), immunosuppressive signatures (CD4+ regulatory T cells, M2 macrophages, T-cell exhaustion, and myeloid-derived suppressor cells (MDSCs)), and the ratios of immunostimulatory to immunosuppressive signatures (CD8+ /CD4+ regulatory T cells), between PC-ImH and PC-ImL in TCGA-PRAD  T cell  Central memory CD8 T cell  Effector memeory CD8 T cell  Activated CD4 T cell  Central memory CD4 T cell  Effector memeory CD4 T cell  T follicular helper cell  Gamma delta T cell  Type 1 T helper cell  Type 17 T helper cell  Type 2 T helper cell  Regulatory T cell  Activated B cell  Immature B

DKFZ2018
Activated CD8 T cell  Central memory CD8 T cell  Effector memeory CD8 T cell  Activated CD4 T cell  Central memory CD4 T cell  Effector memeory CD4 T cell  T follicular helper cell  Gamma delta T cell  Type 1 T helper cell  Type 17 T helper cell  Type 2 T helper cell  Regulatory T cell  Activated B cell  Immature B  better DFS than PC-ImH (log-rank test, P = 0.014, 0.078, and 0.117, respectively) ( Fig. 2A). To support that the survival difference between both subtypes has an association with their different immune infiltration levels, we compared DFS between high-immune-score (> mean) and lowimmune-score (< mean) PC patients. The immune score represents the immune infiltration level in the tumor, which was calculated by the ESTIMATE algorithm [22]. We observed Fig. 1 (continued) that low-immune-score patients tended to have a better DFS than high-immune-score patients (Fig. 2B). It supports that the survival difference between both PC subtypes correlates with their different tumor immune microenvironment.
In TCGA-PRAD, we compared several clinical features between both subtypes, including blood level of prostatespecific antigen (PSA), Gleason score, pathological stage, lymph node staging, and metastasis. The elevated level of prostate-specific antigen (PSA) is often associated with PC. We found that PSA levels were significantly higher in PC-ImH than in PC-ImL (one-tailed Mann-Whitney U test, P = 0.009) (Fig. 2C). Tumor grade indicates how abnormal the tumor cells and the tumor tissue look under a microscope compared to normal cells and how quickly a tumor is likely to grow and spread. The Gleason score, ranging from 6 to 10, is the classical grading system for PC. We found that PC-ImH harbored a significantly higher proportion of high-grade (Gleason score of 9 to 10) tumors than PC-ImL (Fisher's exact test, P = 0.013; odds ratio (OR) = 1.68) (Fig. 2D). The pathological stage assesses how pervasive the cancer cells are within and around the prostate. We found that PC-ImH harbored a higher proportion of latestage (T3-4) tumors than PC-ImL (P = 0.063; OR = 1.43) (Fig. 2D). Lymph node staging indicates whether cancer cells are present in nearby lymph nodes. We observed that PC-ImH harbored a significantly higher proportion of patients whose cancer cells present in nearby lymph nodes than PC-ImL (P = 0.030; OR = 1.78) (Fig. 2D). Finally, we found that PC-ImH involved a significantly higher proportion of patients with metastasis than PC-ImL (P = 0.015; OR = 3.52) (Fig. 2D). Similar results were also observed in the other datasets. For example, in DKFZ2018, high-grade tumors had a significantly higher proportion in PC-ImH than in PC-ImL (P = 0.009; OR = 5.65).
We further compared two tumor phenotypes between PC-ImH and PC-ImL, including epithelial-mesenchymal transition (EMT) signature and intratumor heterogeneity (ITH). These phenotypes are associated with tumor progression, metastasis, and drug resistance [28][29][30]. We found that EMT signature scores were significantly higher in PC-ImH than Fig. 2 Comparisons of clinical and phenotypic features between the PC subtypes. Kaplan-Meier curves to compare DFS time between the PC subtypes (A) and between high-immune-score (> mean) and low-immune-score (< mean) PC patients (B). The log-rank test P values are shown in (A, B). Comparisons of PSA levels (C), proportion of high-grade (Gleason score of 9 to 10) tumors, proportion of late-stage (T3-4) tumors proportion of patients with cancer cells present in nearby lymph nodes, and proportion of patients with metastasis (D), scores of EMT signature (E), stemness (F), and ITH (G) between the PC subtypes. The one-tailed Mann-Whitney U test P values are shown in (C, E, F), and the Fisher's exact test P values and odds ratios are shown in (D). (H) Cox proportional hazards regression analysis with the response variable "DFS time" and the predictor variables "age", "PSA level", "Gleason score" and "subtype PC-ImH". The scores of EMT signature and stemness were the ssGSEA scores of their marker gene sets, and ITH scores were calculated by the DEPTH algorithm [29]. The immune scores calculated by ESTI-MATE [19] represent immune infiltration levels in tumors. DFS: disease-free survival. PSA: prostate-specific antigen. EMT: epithelialmesenchymal transition. ITH: intratumor heterogeneity. HR: hazard ratio. CI: confidence interval. *P < 0.05, **P < 0.01, ***P < 0.001, ns P ≥ 0.05. It also applies to the following figures ▸ in PC-ImL in the ten datasets (P < 0.01) (Fig. 2E); Stemness scores were significantly higher in PC-ImH than in PC-ImL in six of the ten datasets (P < 0.05) (Fig. 2F); ITH scores, which were evaluated by the DEPTH algorithm [29], were also significantly higher in PC-ImH than in PC-ImL in two datasets (P < 0.05) (Fig. 2G).
To investigate whether the worse DFS in PC-ImH versus PC-ImL was influenced by other confounding variables, we performed multivariate survival analysis with the multivariate Cox proportional hazards model. In the model, the response variable was DFS time, and the predictor variables included age, PSA level, Gleason score, and subtype. We found that the subtype PC-ImH was still a risk factor (P = 0.049; hazard ratio (HR) = 1.603 and 95% confidence interval (CI): [1.002, 2.560]) (Fig. 2H). As expected, both Gleason score (P < 0.001; HR = 2.261 and 95% CI: [1.773, 2.884]) and PSA level (P < 0.001; HR = 1.065 and 95% CI: [1.036, 1.094]) were also significant risk factors for DFS in PC. However, age showed no significant correlation with DFS time in PC (P = 0.506).
Taken together, these results suggest a worse prognosis in PC-ImH versus PC-ImL.

Genomic features of the PC subtypes
Tumor genomic features, such as TMB and CNA, have been associated with anti-tumor immune response in cancer [31]. However, TMB showed no significant difference between PC-ImH and PC-ImL (two-tailed Mann-Whitney U test, P = 0.98) (Fig. 3A). In contrast, CNA were significantly different between both subtypes. For example, homologous recombination deficiency (HRD) may contribute to aneuploidy (i.e., CNA) in cancer [32]. The HRD scores [32] were significantly higher in PC-ImH than in PC-ImL (one-tailed Mann-Whitney U test, P = 0.04) (Fig. 3B). The G-score calculated by GISTIC2 [33] represents the amplitude of the CNA and the frequency of its occurrence across a group of samples. We found that the G-scores of copy number amplifications and deletions were significantly higher in PC-ImL than in PC-ImH (Fig. 3C). Altogether, these results suggest: (1) PC-ImH has a higher level of genomic instability than PC-ImL; and (2) CNA rather than TMB has a significant impact on the TIME in PC.
We further compared activities (enrichment scores) of nine DNA damage repair (DDR) pathways between PC-ImH and PC-ImL. The nine pathways included mismatch repair, base excision repair, nucleotide excision repair, the Fanconi anemia (FA) pathway, homology-dependent recombination, non-homologous DNA end joining, direct damage reversal/ repair, translesion DNA synthesis, and damage sensor. Notably, the enrichment scores of these pathways tended to be higher in PC-ImL and PC-ImH (Fig. 3D). It could explain why PC-ImL is more genomically stable than PC-ImH in terms of DNA repair deficiencies being a primary factor responsible for genomic instability in cancer [34].

Pathways highly enriched in the PC subtypes
We identified KEGG pathways highly enriched in PC-ImH and PC-ImL by GSEA [24]. The pathways highly enriched in PC-ImH and PC-ImL were obtained by geneset enrichment analysis of significantly upregulated genes in PC-ImH and PC-ImL, respectively. Using a threshold of adjusted P-value < 0.05, we identified 67 pathways upregulated in PC-ImH and zero in PC-ImL. The pathways highly enriched in PC-ImH were mainly involved in to immune, stromal and oncogenic signatures (Fig. 4A). The immunerelated pathways included chemokine signaling, cytokinecytokine receptor interactions, Toll-like receptor signaling, natural killer cell-mediated cytotoxicity, leukocyte transendothelial migration, antigen processing and presentation, complement and coagulation cascades, NOD-like receptor signaling, Jak-STAT signaling, Fc epsilon RI signaling, Fc gamma R-mediated phagocytosis, cytosolic DNA-sensing, T-cell receptor, B cell receptor, and TGF-β signaling. The stromal signature-related pathways included cell adhesion molecules, regulation of actin cytoskeleton, focal adhesion, ECM-receptor interaction, adherens junction, and gap junction. The oncogenic pathways MAPK signaling, apoptosis, VEGF signaling, p53, and Wnt signaling. In addition, several pathways involved in neural regulation were also highly enriched in PC-ImH, including axon guidance, neuroactive ligand receptor interaction, and neurotrophin signaling. The upregulation of numerous immune-rated pathways in PC-ImH confirmed the stronger immune signatures in this subtype versus PC-ImL. Furthermore, we observed that the enrichment scores of the oncogenic pathways enriched in PC-ImH tended to have positive correlations with immune scores in the ten datasets (Spearman correlation, P < 0.05) (Fig. 4B). It indicates that the upregulation of these pathways may activate the inflammatory TME in PC.

Protein expression profiles of the PC subtypes
We found 17 proteins differentially expressed between PC-ImH and PC-ImL in TCGA-PRAD (two-tailed Student's t test, FDR < 0.05) (Fig. 6A). Among these proteins, six displayed significant higher expression levels in PC-ImH than in PC-ImL, including Stathmin, PREX1, Cyclin_D1, HSP70, Lck, and Syk. In fact, many of these proteins have been associated with PC. For example, Stathmin, a member  (A, B, D). TMB is the total number of somatic mutations. The G-scores were calculated by GISTIC2 [33]. TMB: tumor mutation burden. HRD: homologous recombination deficiency. CNA: copy number alteration of microtubule-destabilizing protein family, plays a critical role in the regulation of mitosis and is overexpressed in various cancers [35]. This protein has been suggested as a target for the treatment of PC [35]. PREX1, a guanine nucleotide exchange factor for the RHO family of small GTP-binding proteins, is overexpressed in various cancers and its overexpression promotes PC metastasis [36]. Cyclin_D1 is a key regulator of cell cycle and its overexpression is associated with PC metastasis [37]. Hsp70, a stress-inducible protein, has been indicated as a potential biomarker and therapeutic target in PC [38,39]. SYK, a member of the Syk family of tyrosine kinases, has been implicated as a candidate therapeutic target for the therapy of advanced PC [40]. Overall, previous studies have indicated that overexpression of these proteins is associated with PC advancement or metastasis, consistent with their upregulation in the subtype with worse prognosis. Notably, most of these proteins displayed positive expression correlations with immune scores (Spearman correlation, P < 0.05) (Fig. 6B), consistently with their upregulation in PC-ImH.
The 11 proteins having higher expression levels in PC-ImL than in PC-ImH included FASN, Claudin-7, ACC1, E-Cadherin, ACC_pS79, β-Catenin, VEGFR2, p70S6K, mTOR, SCD1, and Bak. In contrast with the proteins Fig. 4 Pathways enriched in the PC subtypes. A The KEGG pathways involved in immune, stromal, and oncogenic signatures highly enriched in PC-ImH relative to PC-ImL in TCGA-PRAD. These pathways were identified by GSEA [24]. B Spearman correlations between the oncogenic pathways' scores and immune scores. FDR: false discovery rate. The Spearman correlation coefficients (ρ) and P values are shown upregulated in PC-ImH, downregulation of many of these proteins is associated with PC progression, such as E-Cadherin [41], mTOR [42], and Bak [43]. As expected, most of these proteins showed negative expression correlations with immune scores (Spearman correlation, P < 0.05) (Fig. 6B), consistently with their upregulation in PC-ImL.

Prediction of the PC subtypes
To demonstrate the predictability of the immune signature scores-based classification method, we performed the class prediction of the PC subtypes based on the enrichment scores of the 28 immune cell types by the Random Forest (RF) algorithm [26]. We used a dataset as the training set and the other nine datasets as test sets by turns. When TCGA-PRAD was the training set, its tenfold crossvalidation (CV) accuracy, sensitivity, and specificity was 91.8%, 92.6%, and 92.2%, respectively. In seven of the Fig. 5 Genes differentially mutated between the PC subtypes in TCGA-PRAD. A 8 and 25 genes with significantly lower and higher mutation frequencies in PC-ImH than in PC-ImL, respectively (Fisher's exact test, P < 0.05). B The mutations of the 8 genes with lower mutation frequencies in PC-ImH correlate with lower immune scores in PC. The onetailed Mann-Whitney U test P values are shown nine test sets, the prediction accuracies were higher than 80% (higher than 90% in two test sets); in all the nine test sets, the sensitivities were higher than 80% (higher than 90% in six test sets); in five test sets, the specificities were higher than 80% (higher than 90% in three test sets) (Fig. 7). We obtained similar results when the other datasets were the training set (Fig. 7). These results indicate the predictability of the immune signature scores-based classification method.

Validation of the subtyping method in scRNA-seq data
Using the immune signature scores-based clustering method, we analyzed a scRNA-seq dataset (GSE141445) [18]. By the Fig. 6 Proteins differentially expressed between the PC subtypes in TCGA-PRAD. A) Heatmap showing 6 and 11 proteins with significant higher and lower expression levels in PC-ImH than in PC-ImL, respectively (two-tailed Student's t test, FDR < 0.05). B) Spear-man correlations between the expression levels of these proteins and immune scores in PC. The Spearman correlation coefficients (ρ) and P values are shown t-distributed stochastic neighbor embedding (tSNE) algorithm [44], we clustered 36,424 single cells (Fig. 8A). The malignant and non-malignant cells were clearly separated. Moreover, the malignant cells were clearly separated based on different patients. For each PC patient, we hierarchically clustered its tumor cells based on their enrichment scores of four immune-rated pathways, including antigen processing and presentation, PD-L1 expression pathway in cancer, apoptosis, and JAK-STAT signaling. We used the four immune signatures instead of the 28 immune cell types for clustering bulk tumors because these immune signatures are likely expressed in tumor cells themselves [45][46][47][48]. Consistently, we classified all tumor cells in a patient into two subgroups, also termed PC-ImH and PC-ImL (Fig. 8B).
Interestingly, we found that PC-ImH tumor cells were obviously closer to T cells than PC-ImL tumor cells (Fig. 8C). Based on the single-cell clustering results, we defined a PC patient as PC-ImH if most (> 50%) of its tumor cells belonged to the immunity high group (PC-ImH), otherwise as PC-ImL. According to this rule, seven and six PC patients were classified into PC-ImH and PC-ImL, respectively (Fig. 8D). Likewise, we compared clinical, phenotypic, and molecular features between both subtypes in this dataset. Consistent with previous results, PSA levels were significantly higher in PC-ImH than in PC-ImL (P = 0.009) (Fig. 8E). The stemness scores, and cell cycle activity (ssGSEA scores of the cell cycle pathway) were also significantly higher in PC-ImH than in PC-ImL (P < 0.05) (Fig. 8E). These results supported that PC-ImH had a worse prognosis than PC-ImL.

Identification of two subgroups in PC-ImH
In TCGA-PRAD and GSE116918, there were significantly different survival prognosis between PC-ImH and PC-ImL ( Fig. 2A). We further analyzed both datasets by the graph learning-based dimensionality reduction analysis [49]. We found that PC-ImH could be further divided into two subgroups, termed PC-ImH-A and PC-ImH-B (Fig. 9A). Compared to PC-ImH-B, PC-ImH-A showed stronger immune signatures (Fig. 9B) but worse DFS (Fig. 9C); PC-ImH-A harbored a higher proportion of late-stage (T3-4) tumors than PC-ImH-B (P < 0.001; OR = 2.48) (Fig. 9D). Moreover, PC-ImH-A had higher stemness scores, ITH levels and MKI67 expression levels than PC-ImH-B (Fig. 9E). Overall, these results are consistent with the previous results of the comparisons between PC-ImH and PC-ImL, supporting the negative association between immune activities and clinical outcomes in PC.

Association between TMPRSS2-ERG fusion and immune signatures in PC
TMPRSS2-ERG fusion has been indicated as a biomarker for PC [50]. Interestingly, in two datasets (TCGA-PRAD and DKFZ2018) with TMPRSS2-ERG fusion data available, TMPRSS2-ERG fusion was consistently more frequent in PC-ImL than in PC-ImH (Fisher's exact test, P < 0.05; OR > 1.5) (Fig. 10A). It indicates a negative association between TMPRSS2-ERG fusion and antitumor immune response. Indeed, in both datasets, the tumors with TMPRSS2-ERG fusion displayed significantly lower immune signature scores than those without TMPRSS2-ERG fusion (Fig. 10B). Moreover, the tumors with TMPRSS2-ERG fusion showed better DFS than those without TMPRSS2-ERG fusion (Fig. 10C). It suggests that TMPRSS2-ERG fusion could be a positive prognostic factor for PC patients, while it is a biomarker for PC tumorigenesis.

Discussion
We identified two immune-specific subtypes of PC based on the enrichment scores of 28 immune cell types. Compared to PC-ImL, PC-ImH displayed higher levels of immune infiltration and stromal contents, worse prognosis and tumor progression, higher scores of EMT signature, tumor stemness, and ITH, and higher level of genomic instability. Our data suggest that "hot" PCs have stronger anti-tumor immune response, while worse clinical outcomes compared to "cold" PCs. This finding is consistent with that from a previous study [51]. The main reason could be that the strong antitumor immune response is in fact the tumor progression-promoting inflammation in PC [52]. The negative association between anti-tumor immune response and clinical outcomes have also been indicated in some cancer types, such as gliomas [53]. However, in many other cancer types, such as head and neck squamous cell cancer [54], gastric cancer [55], and triple-negative breast cancer [56], elevated anti-tumor immune response is associated with better clinical outcomes. Thus, the association between anti-tumor immune response and clinical outcomes is cancer type dependent.
Interestingly, although high TMB is a predictive biomarker for the active response to immunotherapy for its increased production of neoantigens to incite anti-tumor immune response, TMB showed no significant difference between PC-ImH and PC-ImL. However, CNA, an immunosuppressive marker [31], is more significant in PC-ImL than in PC-ImH. It suggests that CNA instead of TMB plays a crucial role in the regulation of TIME in PC. PD-L1, an immunosuppressive signature, also showed higher mRNA expression levels in PC-ImH than in PC-ImL in eight of the ten datasets (two-tailed Student's t test, P < 0.05). The possible reason could be that PD-L1 is also expressed on immune cells, which are more abundant in PC-ImH. Indeed, by analyzing the scRNA-seq dataset (GSE141445) [18], we found that PD-L1 was more highly expressed in immune cells than in tumor cells (P < 0.001). Because both PD-L1 expression [57] and high level of immune infiltration [16] are indicators of an active response to ICIs, PC-ImH would respond better to ICIs than PC-ImL. This hypothesis needs to be approved by clinical data, whereas they are not currently publicly available.
In conclusion, the identification of immune-specific subtypes has potential clinical implications for the prognosis and immunotherapy of PC.