Biological functions of 6,762 DEGs and 36 DE-NMRGs
In TCGA-GBM samples, totally 6,762 DEGs were screened, of which 3,392 up-regulated and 3,370 down-regulated genes (Figure 1A-B). The biologically relevant functions of DEGs were explored by enrichment analysis, GO analysis showed that these DEGs were related to the synaptic membrane, presynapse and postsynaptic specialization (Figure 1C). In KEGG analysis, DEGs were significantly enriched in the GABAergic synapse, coronavirus disease-COVID-19, glutamatergic synapse and neuroactive ligand-receptor interaction (Figure 1D). Furthermore, the 6,762 DEGs were intersected with 97 NMRGs to obtain 36 DE-NMRGs (Figure 1E). The biological functions of DE-NMRGs were analyzed using proteomaps, as showed that DE-NMRGs were predominantly enriched in purine metabolism and pyrimidine metabolism (Figure 1F-G).
NUDT1, CDA, UPP1 and ADSL were treated as the primary biomarkers
The prognostic impact of DE-NMRGs on GBM patients was predicted by constructing prognostic model. In the univariate Cox analysis, a total of five candidate characterized genes with higher prognostic ability, namely TYMP, NUDT1, CDA, UPP1 and ADSL (Figure 2A). When “lambda” was taken 12, five candidate characterized genes were regarded as effective characterized genes to construct the optimal model (Figure 2B-C). Then, the NUDT1, CDA, UPP1 and ADSL were screened by multivariate Cox analysis and stepwise regression analysis (Figure 2D). The prognostic model constructed from the above characterized genes was subjected to PH assumption test, which showed that each characterized gene was not statistically significant (P > 0.05). Thus NUDT1, CDA, UPP1 and ADSL were treated as the primary biomarkers, and the prognostic model satisfied PH assumption (Figure 2E).
The higher the risk score, the lower the survival rate
In the high- /low-risk groups of TCGA-GBM and CGGA-GBM, the higher the risk score, the lower the survival rate of GBM patients (Figure 3A-D). ROC analysis revealed that AUC were all greater than 0.6, demonstrating that the prognostic model was appropriately accurate for predicting the overall survival (OS) in GBM. The specific values of the AUC were shown in Figure 3E-F. KM survival curves indicated that the GBM low-risk patients were lived longer (P < 0.001) (Figure 3G-H). In the PCA analysis, as shown in the Figure 3I-J, there was clear clustering difference, so the prognostic genes were considered to have the ability to differentiate between high- /low-risk patients.
Risk scores were effective in predicting survival in almost all subgroups of clinical characteristics
The correlation between risk score and clinical characteristics were analyzed, as shown that in the two risk groups, high-risk patients were more likely to be significantly associated with older age, wild-type IDH mutation status, demethylated MGMT promoter status and Mesenchymal subtype (Figure 4A-E). Meanwhile, the stratified survival analysis was used to analyze whether the risk score could be applied to different clinical characteristics, revealing that the risk score could effectively predict OS in almost all subgroups from different clinical characteristics (Figure 4F-O).
IDH mutation status, MGMT promoter status and riskScore were independent prognostic factors
In TCGA-GBM samples, correlation between clinical characteristics and the prognostic model was further investigated. The univariate Cox analysis showed that the age, IDH mutation status, MGMT promoter status and riskScore were significant correlation with the prognostic model (P < 0.01) (Figure 5A), and all of them fulfilled the multivariate Cox analysis (P = 5.3814e-08, C-index = 0.71) (Figure 5B). In PH assumption test, except for age, the remaining clinical characteristics were correlated with the prognostic model (P > 0.05) (Figure 5C), and all of them fulfilled the multivariate Cox analysis (P = 2.3763e-07, C-index = 0.68) (Figure 5D). Therefore, IDH mutation status, MGMT promoter status and riskScore were selected as independent prognostic factors. Subsequently, the clinical characteristics were studied by drawing the nomogram, finding that the higher the score, the lower the survival rate (Figure 5E). Also, the validity of nomogram was verified with calibration curves, and AUCs were 0.765, 0,895 and 0.871 at 1-, 2- and 3-years, respectively (Figure 5F-G). All of which proved that the nomogram had good predictive ability for the survival rate of CRC patients. Finally, the survival analysis indicated that the survival rate was significantly lower in the high-scoring group (P < 0.05) (Figure 5H).
Biological pathways associated with four biomarkers
In TCGA-GBM samples, the expression analysis of biomarkers showed that the expression of biomarkers in tumor tissues was significantly higher (P < 0.05) (Figure 6A). To clarify the enriched regulatory pathways and related functions of the biomarkers, using UPP1 as an example, UPP1 was significantly associated with adaptive immune response, leukocyte chemotaxis and lymphocyte mediated immunity in the single-gene GSEA GO analysis (Figure 6B). In the single-gene GSEA KEGG analysis, the pathways positively correlated with UPP1 were hematopoietic cell lineage, ribosome and cytokine-cytokine receptor interaction (Figure 6C). The specific results of the univariate GSEA analysis for the remaining three biomarkers were shown in Additional file 1.
Acquisition of four immune cells associated with biomarkers
The role of biomarkers in immune infiltration was further explored in high- /low-risk samples of TCGA-GBM. The distribution of 22 immune cells abundances was shown in Figure 7A. Among them, the cell abundance of activated memory CD4+ T cell, activated natural killer (NK) cell, M1 macrophage and neutrophil were significantly different between two risk groups (P < 0.05) (Figure 7B). UPP1 was the highest positive correlation with activated memory CD4+ T cell (|cor| = 0.35); whereas NUDT1 was the highest negative correlation with monocyte and (|cor| = -0.31). (Figure 7C). Then, the expression of CD274, HAVCR2, LGALS9, PDCD1 and PDCD1LG2 was obtained by intersecting the immune checkpoints and TCGA-GBM dataset, and the expression and clinical information were presented (Figure 7D). However, only CD274 and PDCD1 were significantly different between two risk groups, and their expression was higher in the high-risk group (P < 0.05) (Figure 7E).
The overall TMB was higher in the high-risk group
In TCGA-GBM, the four biomarkers had the highest proportion of missense mutation, in which the type of variant was mainly SNP. SNP mutation mode was mainly manifested as C to T. The average number of variants in the sample was 43, and the top-10 mutated genes were listed here, namely TTN, TP53, EGFR, PTEN, MUC16, NF1, SPTA1, LRP2, RYR2 and FLG (Figure 8A). Then, the five genes (TTN, TP53, EGFR, PTEN and MUC16) with the highest mutation proportions between two groups were demonstrated, as shown that TP53 was higher mutation proportions between the two risk groups, but the overall missense mutations were more frequent in the high-risk group (Figure 8B). TMB analysis revealed that the overall TMB was higher in the high-risk group (Figure 8C-D). In the low-risk group, TP53 and EGFR were significant mutual exclusivity, but EGFR and PCLO had significant co-occurrence (Figure 8E). In the high-risk group, TP53 and MUC16 were mutually exclusive, while TTN and GLYR1 showed significant co-occurrence (Figure 8F). At last, protein expression validation showed that ADSL was highly expressed in tumor samples compared to the normal samples (Figure 8G-N).