Identification of BLCA specific PDEARGs. We obtained 3126 differentially expressed genes (DEGs) based on the TCGA-BLCA dataset, among which 1223 genes were downregulated, and 1903 genes were upregulated in tumor tissues compared with normal tissues (false discovery rate (FDR) < 0.05, |log2 FC | > 1; Fig 2A). Then, 34 BLCA-specific differentially expressed ARGs (DEARGs) were identified (Fig 2B), and five prognostic DEARGs (PDEARGs) were found to be significantly associated with the OS of BLCA patients (P < 0.05; Fig 2C). Among the five PDEARGs, APOL1 with hazard ratio ≤ one was regarded as a protective gene, while the other four genes (DIRAS3, NAMPT, P4HB, and SPHK1) were identified as high-risk genes predicting a poor prognosis of BLCA.
Construction of PDEARGs-based prognostic signature. The regression coefficients were employed to construct a five PDEARGs-based prognostic risk model (Table 1). Subsequently, the risk score for each patient was calculated with the following computational formula:
Risk score = (-0.0019 × expression of APOL1) + (0.0775 × expression of DIRAS3) + (0.0127 × expression of NAMPT) + (0.0030 × expression of P4HB) + (0.0120 × expression of SPHK1).
Individuals were sorted into a high-risk group (n = 185) and a low-risk group (n = 186) by the median risk score of 1.02. Survival analysis indicated that the prognosis was poorer in the high-risk group than in the low-risk group (P < 0.001; Fig 3A). Precisely, the five-year OS rate in the high-risk group was 33.2%, while it was 59.5% in the low-risk group. Then, we analyzed the distribution of each patient's risk score and survival status, and a large amount of death existed in the high-risk group (Fig 3B). As well, a heat map and a series of box plots were generated to depict the expression level of the five selected PDEARGs, among which the protective gene (APOL1) was downregulated, and the other four risk genes were upregulated in patients with high-risk score (Fig 3B and D). The area under the curve (AUC) was 0.724, which was much higher than other clinical parameters, suggesting that the present model was more accurate in predicting BLCA patients' OS (Figure 3C).
To validate the RNA-seq data, we analyzed the expression level of five selected PDEARGs using qRT-PCR. As shown in Figure 3E, the protective gene (APOL1) was downregulated, while the other four risk genes were upregulated in tumor tissues, compared with the adjacent, which were consistent with the RNA-seq data above.
Independent prognostic value of present signature. As shown in Figure 4A, the variables of pathological stage, tumor-node-metastasis (TNM) status, and risk score were associated with the prognosis of BLCA patients (P < 0.05). Multivariate analysis showed that the risk score was an independent prognostic factor for OS (P < 0.01; Fig 4B). The hazard ratio for the risk score was 1.695, indicating a high-risk score would predict a bad prognosis. The commonly used clinical variables, such as age, gender, pathological stage, and TNM status, were insufficient to serve as independent prognostic predictors (P > 0.05).
Clinical utility of present signature. The relationship between present model and clinical variables was analyzed (Table 2). High-expression of APOL1 was associated with decreased pathological stage, N status, and M status (P < 0.05; Fig 5B-D). Contrarily, as the value of risk score or the expression of the other four genes (DIRAS3, NAMPT, P4HB, and SPHK1) increased, the histological grade, pathological stage, T status, or N status of BLCA patients increased (P < 0.05; Fig 5E-U). Furthermore, low-expression of APOL1 and increased risk score were observed in patients with age > 65 (P < 0.05; Fig 5A-Q). Survival analysis showed that high expression of the protective gene APOL1 indicated a good prognosis (P < 0.01; Fig 6A), while increased expression of risk gene P4HB resulted in a poor prognosis (P < 0.05; Fig 6D). The other three risk genes (DIRAS3, NAMPT, and SPHK1) had no significant effect on survival outcome (P > 0.05; Fig 6B, C, and E).
Correlation between risk score and TME. As shown in Figure 7A-D, with the increase of risk score generated by present prognostic signature, tumor-infiltrating stromal cells (i.e., stromal score) and estimate score (sum of stromal score and immune score) significantly increased, meanwhile tumor purity significantly decreased (P < 0.05). Additionally, 22 types of tumor-infiltrating immune cells (TIICs) in TCGA-BLCA were analyzed with CIBERSORT, and the content of neutrophils, macrophages M0 and M2 increased with risk score (P < 0.05; Fig 7E-H). Furthermore, a high proportion of macrophages M0 could lead to a poor prognosis, accompanied by increased pathological stage, N status, and M status of BLCA patients (P < 0.05; Fig 7I-L).
GO enrichment analysis. Gene Ontology (GO) enrichment analysis showed that the five selected PDEARGs were associated with 52 biological processes (BPs), six molecular functions (MFs), and nine cellular components (CCs), among which immunologic process-related GO terms were significantly enriched, such as the activation of microglial cell, leukocyte, and macrophage as well as inflammatory response. Besides, oxidative stress and relevant apoptotic signal pathway were significantly enriched (FDR < 0.05; Fig 8A). To better explain the relationship between PDEARGs and GO terms, a chord plot was constructed, and three essential genes (ARGs SPHK1, P4HB, and NAMPT) were founded (Fig 8B).
Construction of PDEARGs-based systematic signature. To promote clinical application and predictive accuracy of the model mentioned above, we constructed a systematic prognostic signature by integrating seven clinical variables and the five PDEARGs-based signatures. The clinical variables (i.e., age, histological grade, and TNM status) correlated highly with the aforementioned risk score were deleted to avoid overfitting. Then, the regression coefficients were used to weight selected variables, and the systematic risk of each patient was calculated with the following computational formula (Table 3):
Systematic risk = (-0.5755 × gender) + (0.6253 × pathological stage) + (0.5276 × risk score generated from aforementioned model); gender: female = 1, male = 0.
The median of the systematic risk was 0.96, and with the increase of systemic risk, a bad prognosis of BLCA patients was observed (P < 0.001; Fig 9A and C). Precisely, the three-year and five-year OS rate was 49.0% and 37.6% in the high-risk group, while it was 77.6% and 69.0% in the low-risk group, respectively. A nomogram was constructed accordingly, and the AUC value for systematic prognostic signature was 0.791, which was more accurate than the pure PDEARGs-based model (Fig 9B and D).
External validation of both signatures. Both PDEARGs-based prognostic signature and systematic signature were validated in external datasets. As shown in Figure 10A and B, a significantly different survival outcome was observed between the two risk groups (P < 0.005, log-rank test). The OS rate at 5-year was 15.49% and 37.50% for PDEARGs-based high-risk and low-risk groups. The corresponding rate for systematic signature was 20.83% and 32.39%, indicating that both signatures could precisely predict BLCA prognosis regardless of internal and external patients. Besides, the prognostic value of five selected PDEARGs was estimated through survival analysis. Except for APOL1, the other four genes could effectively separate BLCA patients with different survival outcomes (P < 0.05, Fig 10C). Though these results were not exactly consistent with Figure 6, the general trend of survival curve between the high expressive and low expressive groups was similar.