Database Queue Information
The GBM case cohort contains 155 cases of GSE16011 in the training set, 209 cases of GSE108474 in the test set, and 518 cases of TCGA. In Table 1 we obtained the median age of the patients as 55 years. The number of male and female cases in the training set was 105 and 50, respectively, and because GBM is a malignant tumor with high mortality, the number of deceased cases in the training set was 147 and 8 survived.
Correlation of immune infiltration and GBM in the tumor microenvironment
Cellular characterization was performed by the CIBERSORT algorithm, and we found that tumor-associated macrophages and T cells were the most represented TME-infiltrating cells (Figure 1A). The infiltration abundance of immune cells was assessed using the xCell algorithm. The correlation network plots reflected the correlation between different types of immune cells (Figure 1B). Figure 1C shows the immune infiltration correlation heat map. Subsequently, 155 cases in GSE16011 were analyzed by both ESTIMATE and xCell algorithms to assess the infiltration status of immune cells and stromal cells in the tumor microenvironment in GBM patients (Figure 1D). The figure shows the cumulative distribution curves of the scores calculated by the two algorithms. It can be seen that the score of immune cells is higher than that of stromal cells through two different algorithms, indicating that immune infiltration plays a major role in it.
Four types of scores were significantly associated with GBM subtypes
Immune, stromal, estimate and tumorpurity scores were calculated using ESTIMATE on the GSE16011 gene expression profile. We divided patients into high and low groups by the optimal breakpoint of scores. Significant differences were shown by the survival curves for both high and low scoring subgroups of the four scores. The overall survival time and scores for immune (p=0.0015), stromal (p=0.0015) and estimate scores (p=0.0038) were inversely proportional, with a better prognosis for low scores (Figure 2A, B and C). Interestingly, the tumor purity score and overall survival time were positively correlated, with higher scores having a better prognosis (p=0.0038) (Figure 2D).
Identification of tumor microenvironmental cells in the scRNA-seq dataset
As shown in Figure 3A, after quality control, we obtained a final set of 11,762 cells. we conducted correlation analysis and found no association between sequencing depth and mitochondrial gene sequences. However, there is a significant positive correlation between sequencing depth and total intracellular sequences in our analysis (Figure 3B). In our PCA analysis, we selected 20 principal components based on a significance threshold of p-value < 0.05 (Figure 3C, D). Subsequently, we successfully applied the UMAP method to cluster the cells, resulting in the identification of 11 distinct clusters (Figure 3E). From these 11 clusters, we determined 11126 marker genes, and a subset of these marker genes was visualized using a cluster heatmap (Figure 3F). This comprehensive analysis enabled us to characterize the cellular heterogeneity and identify key genes associated with different cell clusters.
Establishment of risk score (RS) model in GBM patients
In the bulk RNA-seq data, we identified 750 highly expressed genes by applying the criteria of |log2 FC| > 1 and p < .05. Additionally, we selected 2000 variable genes from a pool of 18575 genes in the scRNA-seq data (Figure 4A). Among them, 235 genes were found to be common in the differentially expressed genes (DEGs) identified from both the GBM scRNA-seq and TCGA-GBM datasets (Figure 4B). Subsequently, we performed univariate Cox analysis on these genes and selected 22 genes for random forest analysis. Finally, we identified 9 genes that were significantly associated with patient prognosis (Figure 4C). Finally, nine genes (AK5, ATP2B1, CNTN2, GABARAPL1, HK2, IFI44, PLP2, S100A11 and ST18) were identified by permuting the obtained genes to construct the risk assessment model:
Nine genes risk score model prognostic analysis
The GSE16011 dataset is used as the training dataset, and the GSE108474 and TCGA datasets are used as external validation datasets to evaluate the robustness and effectiveness of our risk score prognostic model. In Figure 5A, C, and E, the survival curves demonstrated that the high-risk group exhibited a significantly worse prognosis compared to the low-risk group (p < .01) in all these datasets.
Furthermore, we conducted time-dependent ROC curve analysis to predict the 1-, 3-, and 5-year survival rates. The area under the curve (AUC) values for 1-, 3-, and 5-year overall survival (OS) in the GSE16011 dataset were 0.661, 0.7093, and 0.726 (Figure 5B). In the GSE108474 dataset, the AUC values for 1-, 3-, and 5-year OS were 0.579, 0.600, and 0.713, respectively (Figure 5D). Additionally, in Figure 5F, the AUC values for 1-, 3-, and 5-year OS in the TCGA dataset were 0.541, 0.641, and 0.687.
Taken together, these findings highlight the effective predictive capability of the risk score prognostic model developed through integrated analysis of scRNA-seq and bulk RNA-seq) datasets.
Integrated analysis of scRNA-seq and bulk RNA-seq data with clinical survival information
We performed cell type identification on different clusters and identified seven distinct cell types, namely B cells, CD14 monocytes, CD56-dim natural killer cells, macrophages, oligodendrocytes, oligodendrocyte precursor cells, and T cells (Figure 6A, C). Subsequently, we employed SCISSOR to investigate the association between scRNA-seq and bulk RNA-seq data with patient survival. In the tumor microenvironment, B cells, CD14 monocytes, CD56-dim natural killer cells, and T cells exhibited a negative correlation with patient survival, while macrophages demonstrated a similar impact on patient survival. Among the identified cells, 8,093 were found to be unrelated to survival, 2,748 showed a negative correlation, and 921 showed a positive correlation with survival (Figure 6B, D).
The spatial distribution and expression of prognostic-related genes in both scRNA-seq and bulk RNA-seq datasets
Subsequently, we presented the distribution of the nine prognostic-related genes in scRNA-seq. We observed that these genes were predominantly expressed in macrophages, CD14, and T cells (Figure 7A). Furthermore, we calculated risk scores for the training set separately and generated correlation heat maps in bulk RNA-seq. As depicted in Figure 7B, higher risk scores in the training set were associated with increased expression of AK5, ATP2B1, CNTN2, HK2, IFI44, PLP2, S100A11, and ST18, while showing decreased expression of GABARAPL1. Cox analysis results also demonstrated that GABARAPL1 exhibited low expression in the high-risk group, suggesting its potential as a protective factor (Table 2).
Mutation analysis
To investigate the relationship between mutations and the tumor immune microenvironment, we analyzed mutation data from the TCGA-GBM cohort and showed the top 20 frequently mutated genes in individuals. From the oncoplot, we found that four genes with mutation frequencies higher than 20% were PTNT, TP53, TTN and EGFR (Figure 8A). We also plotted the gene cloud, where the size of each gene is proportional to the total number of samples in which it is mutated, and it was also evident that four genes have the highest mutation frequency in the sample (Figure 8B). Compared with a single mutations, we also detected co-mutations of the top 20 mutated genes by pair-wise Fisher's Exact test. The results of green are mutated genes that tend to coexist; yellow is mutually exclusive, and the depth of color indicate significance (Figure 8C), providing a theoretical basis for clinical treatment. Figure 8D shows the variant allele frequency (VAF), VAF can be used to infer tumor heterogeneity and tumor purity. In addition, high or low VAF may affect the prognosis of cancer. We also enriched oncogenic pathways, with RTK-RAS, PI3K and TP53 being the top 3 pathways in the cases (Figure 8E). Finally, we selected the set of mutated genes of greatest prognostic significance (TP53 and ATRX) for analysis, and interestingly, the set of mutated genes were associated with a good prognosis (Figure 8F).
IFI44 promoted the proliferation, migration, and invasion ability of glioma cells
we focused on one specific gene, IFI44, which is established as biomarkers in a variety of tumor types. Their functional significance in glioma remains understudied. To further validate the potential tumor inhibiting role of IFI44 in glioma, we explored its function in the glioma cell line. We first transfected siIFI44 and IFI44 into U87and U251 cells, and the efficiencies were checked by Western blot (Figure 9A), we analyzed the effect of IFI44 on cell viability using CCK-8 assay and observed increases in the viabilities of IFI44 overexpression in U87and U251 cells (Figure 9B). Correspondingly, the knockdown of IFI44 led to impaired cell viability. We also investigated the effect of IFI44 on the migration and invasion of glioma cells. The results indicated that IFI44 overexpression significantly promoted the migration and invasion of glioma cells, whereas IFI44 knockdown attenuated the migration and invasion of U87 and U251 cells (Figure 9C and 9D). Through analysis of the timer database, Higher levels of IFI44 expression are associated with poorer survival rates (Figure 9E). Together, these findings revealed that changes in IFI44 expression affected the malignant phenotype of glioma cells.