Background：Glioblastoma multiforme (GBM) is one of the most common malignancies in the world and many studies have used traditional high-throughput RNA sequencing (bulk RNA-seq) data to explore its potential prognostic markers. However, it is unable to detect specific cellular and molecular changes in tumour cells. The aim of this study was to use single cell RNA-seq (scRNA-seq) to discover marker genes in endothelial cells and to construct a prognostic model for GBM patients in combination with traditional RNA-seq data.
Methods：Bulk RNA-seq data were downloaded from The Cancer Genome Atlas (TCGA) and The China Glioma Genome Atlas (CGGA) databases. 10x scRNA-seq data for GBM were obtained from the Gene Expression Omnibus (GEO) database. The uniform manifold approximation and projection (UMAP) was used for downscaling and cluster identification. Key modules and differentially expressed genes (DEGs) were identified by weighted gene correlation network analysis (WGCNA). Non-negative matrix decomposition (NMF) algorithm was used to identify different subtypes based on DEGS. And Cox regression analysis was used to build prognostic models. Finally, differences in mutational landscape, immune cell abundance, immune checkpoint inhibitors (ICIs)-associated genes, immunotherapy effects and enriched pathways were also investigated between different risk groups.
RESULT：The analysis of scRNA-seq data from eight samples revealed 13 clusters and four cell types. After applying differential analysis, endothelial cells were identified as the most important significant cell type. In addition, we explored potential neoangiogenic pathways in GBM after controlling for phenotypic differences between GBM and normal brain tissue ECs at the single cell level. Overall, through differential analysis, WGCNA and screening of endothelial cell marker genes, we identified 157 DEGs for the construction of prognostic models. In addition, based on DEGs,the NMF algorithm identified two clusters with different prognostic and immunological features observed. We finally built a prognostic model based on the expression levels of four of these key genes. Higher risk scores were significantly associated with poorer survival outcomes and were associated with low mutation rates in IDH genes and upregulation of immune checkpoints such as PD-L1 and CD276. The CGGA cohort served as an external validation cohort for our findings.
Conclusion：We built and validated a 4-gene signature for GBM using 10 scRNA-seq and bulk RNA-seq data in this work. We believe our findings will provide greater insight into the characteristics of ECs in GBM and provide potential prognostic biomarkers to design rational treatment plans.
Due to its confined and locally aggressive growth, glioblastoma multiforme is one of the most prevalent malignant tumors in the world, with a significant morbidity and fatality rate. It is the most common primary intracranial tumor. And among the most dangerous brain tumors, glioblastoma multiforme (GBM) accounts for about half of them. The prognosis of GBMs is extremely discouraging, with less than 5% of observations surviving more than 5 years at the time of diagnosis. With the development of research, remarkable results have been achieved in exploring the molecular pathogenesis of glioma, such as isocitrate dehydrogenase (IDH) statusand O6-methylguanine DNA methyltransferase promoter (MGMTp) methylation. Diagnoses, categorization systems, and precise therapy have all improved as a result of these findings. Despite the fact that IDH mutations help individuals with gliomas live longer, gliomas with IDH mutations return frequently. Further research is therefore essential for the identification of new molecular targets, prognostic assessment work and the development of therapeutic options. The US Food and Drug Administration (FDA) has authorized just four medications for the treatment of GBM: Bevacizumab, temozolomide, lomustine, and carmustine. Although these adjuvant drugs and surgical treatments have improved the prognosis of glioma patients to some extent, the overall survival (OS) of patients is still very low. This is partly due to the fact that the mechanisms of tumor microenvironment and immune evasion are not fully understood and that high-grade gliomas are spatially and temporally heterogeneous. Different cells with different mutational characteristics. Most important is the presence of the blood-brain barrier (BBB), a dynamic interface between blood and brain tissue that selectively prevents the passage of substances. The effectiveness of antitumor chemotherapeutic agents is hampered by the blood-brain barrier, which strictly regulates the homeostasis of the central nervous system.
In recent years, a growing number of studies have used traditional bulk RNA sequencing data to explore potential prognostic markers for GBM and to improve our understanding of tumorigenesis and progression. For example, a prognostic model was developed based on 5 iron death-related genes to predict survival and response to immunotherapy in GBM patients. There is also a prognostic model of GBM constructed based on three angiogenesis-related bases. These prognostic characteristics are based on traditional RNA-seq, and because GBM is a highly heterogeneous tumor, these approaches are unable to detect exact cellular and molecular alterations because bulk RNAseq mostly represents the "average" expression of all cells in the sample.
Endothelial cells (ECs) regulate vascular functions, such as permeability, endostasis and angiogenesis. Abnormal vascular proliferation and vascular system abnormalities are the most characteristic features of GBM. Vascular abnormalities promote tumor cell invasion by inducing hypoxia, thereby exacerbating GBM progression. In addition, GBM vascular leakage can lead to edema. In the 1970s, Professor Folkman proposed that tumor growth and metastasis are dependent on angiogenesis,Inhibition of angiogenesis can be a therapeutic strategy for tumor treatment. Meanwhile, endothelial cells (EC) are key cellular components of the blood-brain barrier (BBB), and abnormal vascular development in gliomas is associated with their unique gene expression. In recent years, targeting pro-angiogenic genes has become a research hotspot for tumor treatment and prevention of tumor expansion.
Single-cell RNA-seq (scRNA-seq) is nowadays used as a new technology for sequencing genes in different cell types, and is able to dig deeper into cell-specific information. Currently, single cell sequencing has been widely used in the fields of tumor heterogeneity, immune microenvironment, neuroscience, embryonic development, cell differentiation, etc. Given the advantages of single-cell sequencing, we can identify markers of GBM endothelial cells by integrating scRNA-seq and conventional RNA-seq, and use them to construct prognostic models of GBM patients with external validation cohorts to verify their risk stratification ability. In addition, we explored potential neoangiogenic pathways in GBM after controlling for phenotypic differences between ECs of GBM and normal brain tissue ECs at the single-cell level. Finally, through the constructed 4-gene signature, we observed the prognostic and immunological characteristics of different risk subgroups of the population. We believe our findings will provide deeper insights into the characteristics of ECs in GBM and provide potential prognostic biomarkers to design rational treatment regimens and optimized drugs.
We downloaded 10X of scRNA-seq data from the GSE162631 dataset, with 10446, 11821, 15352, 16750, 21415, 15008, 13653 and 15122 cells per sample. Due to the large number, we extracted one-tenth of these cells for subsequent studies such as pathways and cellular communication. a large number of RNAseq data, mutation data and clinicopathological features of TCGA-GBM were downloaded from the UCSC Xena website. In addition, we downloaded normal brain tissue expression data for GTEx from the UCSC Xena website (https://xena.ucsc.edu/). The final externally validated gene expression profiles and clinical data of patients with glioblastoma multiforme were obtained from the China Glioma Genome Atlas (CGGA) data portal (http://www.cgga.org.cn/). Detailed clinical characteristics of patients in the TCGA and CGGA databases are summarized in Supplementary Table S1.
We processed the 10× scRNA-seq data as follows: A The R package "Seurat" was used to convert the 10× scRNA-seq data into Seurat objects[19, 20]; B Original counts are checked for quality by calculating the proportion of mitochondrial or ribosomal genes and eliminating cells with low quality; C After quality control, "FindVariableFeatures" function was used to screen the top 2000 highly variable genes; D Based on 2000 genes for principal component analysis (PCA), uniform manifold approximation and projection (UMAP)  was used for dimensionality reduction and cluster identification; E Using the "Find All Markers" function, log2 [Foldchange (FC)] was set to 0.3 and min.pct was set to 0.25 to identify markers in different clusters; F The "SingleR" package is used for clustering annotations to identify different cell types. In addition, we used the R package "ReactomeGSA" to perform functional enrichment analysis of the identified hub cell types. We used the "analyze_sc_clusters" function to perform the enrichment analysis and extracted the results by "pathways". The "monocle" package examines cell trajectories and pseudotime distributions and reduces dimensionality using the "DDRTree" approach. The contribution of genes to cell growth was then calculated using the BEAM statistical approach, and the top 100 genes were chosen for display. Cell-cell communication analysis and network visualization were finally performed using the "CellChat"and "patchwork" software packages.
Weighted gene co-expression network analysis (WGCNA, Weighted correlation network analysis) is a systems biology method for identifying gene relationship patterns across samples. In our study, we used the 'WGCNA' package in R to construct an expression data map of TCGA-GBM differential genes, and used it to identify gene sets that vary synergistically in the GBM cohort, and to identify biomarker genes based on gene set endogeneity and association between gene sets and phenotypes. The appropriate soft threshold (power) for the TCGAGBM cohort was determined using the function "pickSoftThreshold" at 7. Next, aij = |Sij|β(aij: adjacency matrix between gene i and gene j, Sij: similaritymatrix which is done by Pearson correlation of all gene pairs,β: softpower value) to calculate the matrix composed of weighted correlation values between genes and genes, i.e., the adjacency matrix matrix ). Finally, a hierarchical clustering dendrogram of dissimilarity (1-TOM) matrix is produced to compute the correlation between modules, and modules with strong correlation coefficients are selected candidates for correlation with clinical features, and further analysis is undertaken. Studies with a more detailed description of the WGCNA method have been reported.
Non-negative matrix decomposition (NMF) can classify GBM patients into different subtypes: firstly, the sample is clustered using the R package "NMF" package, and then patients are classified into different subtypes based on parameters such as copulas and dispersions. Finally, a consensus heat map was generated based on the above optimal number of clusters to see the distribution characteristics among different subtypes. Then, we also explored the relationship between different subgroups and OS. In addition, the MCPcounter algorithm was used to estimate the infiltration of immune cells between different subgroups. We also investigated the association between subpopulations and the six immune subtypes identified in previously published studies.
Differential expression analysis of differentially expressed genes (DEGs) was performed for the TCGA cohort and different subgroups using the R software "limma" package with log2FC|>1.0 and FDR < 0.05 as thresholds. We performed GSEA using Gene Set Enrichment Analysis (GSEA) software 4.1.0 (http://www.gsea-msigdb.org/gsea/index.jsp) to identify significantly enriched pathways between the low- and high-risk groups. values of P < 0.05 and FDR < 0.25 were considered as thresholds for statistical significance. Results were visualized by the "gridExtra", "grid" and "ggplot2" R software packages. In addition to this, functional enrichment analysis was performed by the "clusterProfiler" package in the R software, including Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analysis.
The 169 GBM samples with survival information from the TCGA dataset were used as the training set for constructing the prognostic risk model, and the 388 GBM samples with survival information from the CGGA dataset were used for external validation. We also used the R package "sva" to eliminate batch effects between the TCGA and CGGA data to build an accurate model. We then matched the differential gene mRNA expression profiles from the TCGA-GTEx cohort to the WGCNA results. Genes affecting overall survival (OS) in GBM patients were screened by R package 'survival' and using a one-way Cox analysis method (p < 0.05).
Lasso Cox regression analysis was used using the "glmnet" R package to minimize over-fitting of prognostic characteristics and to narrow down the genes that predict OS. Stepwise multifactorial cox regression analysis was used to analyze the genes discovered using the Lasso method. The expression of each gene and the accompanying regression coefficients were used to create risk scores for each patient, and risk models for important genes were built in the TCGA cohort by weighting the estimated cox regression coefficients. The risk score formula was Ʃ (ð × Expi), where ð is the corresponding regression coefficient and Expi represents the expression value of each gene. Based on the risk score formula, patients were divided into low risk and high risk groups using the median risk score as the cut-off point. The optimal cut-off point for survival analysis was determined using the 'survcutpoint' function in the R package 'survminer'. A log-rank test was used to evaluate the difference in survival rates between the two groups and Kaplan-Meier (K-M) survival curves were plotted. In addition, the nomogram model was created using the R package "rms".
The receiver operating characteristic (ROC) curve was completed using the R package "survival ROC" and the corresponding area under the ROC curve (AUC) was measured to assess the sensitivity and specificity of the relevant characteristics.
Two waterfall plots were generated using the 'oncoplot' function in the R package 'maftools' to explore the detailed mutations between the high-risk and low-risk groups. The immune fraction and stromal fraction of each glioma sample was assessed using the R package 'ESTIMATE', which indicates how many immune and stromal components are present in vivo. The tumor-infiltrating immune cells dataset was downloaded from TIMER 2.0 (http://timer.cistrome.org). TIMER, CIBERSORT, quantTIseq, MCP-counter, xCELL, and EPIC algorithms were also compared. In addition, we investigated the correlation between risk scores and immune checkpoint genes, and tumor mutational load (TMB), respectively, visualized using the R software 'ggplot2' package.
The Immune Cell Abundance Identifier (ImmuCellAI) is a computational method released in 2020 to predict immune checkpoint responses based on the abundance of immune cells, specifically different T-cell subpopulations. TCIA (Cancer Immunome Atlas) is an online program that gives full immunogenomic analysis findings. Immunophenotype Score is a quantitative measure of tumor immunogenicity that ranges from 0 to 10. (IPS). Immune checkpoint inhibitor(ICI) response can be predicted using IPS. The "prophytic" R package was used to compute the half-maximal inhibitory concentration (IC50) of samples from the high and low risk score groups to analyze the risk score for predicting responsiveness to chemotherapy and molecular medicines. The Wilcoxon signed rank test was used to compare the IC50 of the high and low expression groups.
All analyses were performed using R version 4.1.1, 64-bit6 and its support package. To calculate prognostic values and to compare patient survival in different subgroups in each dataset, Kaplan-Meier survival analysis and the log-rank test were used. The non-parametric Wilcoxon rank sum test was used to test the relationship between the two groups for continuous variables. LASSO regression and Cox regression analyses were used for predictive model development. Clinical characteristics of the high and low risk groups were screened for prognostic variables using univariate and multivariate Cox regression (R package 'survival'). Correlation coefficients were examined using spearman correlation analysis. In all statistical investigations, P < 0.05 was considered statistically significant. The nomogram model was created using the R package "rms", the ROC curves for the column plots were plotted using the R package "survivalROC", and the "pec" R package was used to The Concordance Index (C-index) is calculated.
We downloaded 10X scRNA-seq data from the GSE162631 dataset for four GBM and four normal samples (single cell suspensions of CD 31 + cells enriched for magnetically activated cell sorting (MACS)), and a total of 102412 cells were identified after cell quality control (QC) (Supplementary Fig. S1A). The first 2000 highly variable genes are shown in Supplementary FigS1B. After PCA and UMAP downscaling analysis, we identified 13 different cell clusters (Fig. 1A,B). We then used the "SingleR" package for cluster annotation and UMAP to visualise the downscaled cell types and we identified four cell types including endothelial cells, monocytes, macrophages and neutrophils (Fig. 1C). Of these, endothelial cells were the most important cell type, and ReactomeGSA functional enrichment analysis showed that these cell types were mainly involved in classical antibody-mediated complement activation, transmembrane transport and serotonin receptor and cardiolipin synthesis (CL) (Fig. 1D). We then used the "monocle" R package to determine the cell trajectories and pseudotime distributions (pseudotime) of two cell types that differ significantly in tumour and normal samples. We observed neutrophils corresponding to state 1 and endothelial cells corresponding to states 2 and 3 (Fig. 1E-G). Finally we calculated the contribution of genes during cell development and selected the top 100 genes for visualization (Supplementary Fig. S2A). We inferred cell-cell communication networks by calculating the likelihood of communication (Supplementary Fig. S2B). In addition, we predicted the cell-cell communication network for the relevant ligand receptors. We found that MSTN-TGFBR1-ACVR2A (Supplementary Fig. S2C), WNT7B-FZD4-LRP6 (Supplementary Fig. S2D) and others play a crucial role in the communication network of endothelial cells.
We performed differential analysis of tumour and normal tissues from the TCGA-GBM and GTEx cohorts and identified a total of 3911 differential genes (DEGs). Of these, 2021 genes were up-regulated in expression in tumours and 1901 genes were down-regulated (Fig. 2A).
Next, we used WGCNA to identify DEGs involved in the development of GBM associated with the TCGA cohort. during co-expression network construction, we observed a soft threshold power β of 7 when the fit index R2 of the scale-free topology reached 0.90 (Fig. 3B). We identified eight modules based on the average linkage hierarchical clustering and the soft thresholding power (Fig. 3C).Eight modules were identified based on the average linkage hierarchical clustering and the soft thresholding power (Fig. 2D ). Based on correlation coefficients and p-values, we observed that turquoise modules and brown modules were significantly associated with GBM development. We eventually took the intersection of marker genes of endothelial cells and module genes of WGCNA and selected 157 genes to construct an expression matrix for further analysis (Fig. 2E).
Finally we performed univariate Cox regression analysis to identify potential prognostic factors for GBM in the TCGA cohort. 28 genes were identified as being prognostically associated (Fig. 2F).
Based on the results of the univariate analysis, all patients were divided into two groups using the NMF algorithm (Fig. 3A; Supplementary Figure S3). Sankey plots were used to investigate the relationship between different immune subtypes and groupings. The results showed that all patients in group 1 were classified as immune C4 (Lymphocyte Depleted) subtype, while patients in group 2 were also almost categorized as immune C4 subtype, but with a specific minority as C1 (Wound Healing) subtype and immune C6 (TGF-beta Dominant) subtype (Fig. 3B). The results showed that patients in group 2 had better OS compared to patients in group 1 (Fig. 3D).The MCPcounter algorithm was used to estimate the infiltration of immune cells in different clusters. We found that the level of infiltration of Cytotoxic lymphocytes was significantly higher in cluster 2, however, the level of infiltration of fibroblasts was higher in group 1 (Fig. 3C).
After differential analysis of gene expression between the two groups, 56 genes were expressed down-regulated in cluster 2 and 12 genes were expressed up-regulated in cluster 2 (Fig. 3E). Finally, the R package "clusterProfiler" was used to perform GO and KEGG enrichment analysis in these DEGs, which are associated with a variety of items, including "wound healing" and "negative regulation of hydrolase activity" in the biological processes (BP) category, "collagen- containing extracellular matrix" in the cellular component (CC) category, and "endopeptidase inhibitor activity" in the molecular function (MF) category (Fig. 3F). It is also associated with HIF-1, P53 and signaling pathways in diabetes (Fig. 3G).
We performed LASSO regression analysis on 28 prognostic genes to reduce the number of DEGs in the final risk model, and seven genes were identified by this step (Fig. 4A). Ultimately, by multivariate Cox analysis, four genes were considered independent prognostic factors, including TUBA1C,RPS4X,KDELR2 and SLC40A1.Based on their coefficients, we calculated risk scores using the following formula: risk score = expression level of TUBA1C*0.41 + expression level of RPS4X*- 0.65 + expression level of KDELR2*0.60 + expression level of SLC40A1*-0.39. All patients were divided into high and low risk groups according to the median value of the risk score. Survival curves showed that patients in the high-risk group had worse OS compared to those in the low-risk group (Fig. 4B). Furthermore, it was shown that the risk score performed well in predicting OS for these individuals in the TCGA cohort (AUCs for 1, 3 and 5 year OS: 0.655, 0.774 and 0.955; Fig. 4C). Similar results were observed in the CGGA cohort (Fig. 4B, C). Risk plots show detailed survival outcomes for each patient in the TCGA cohort and the CGGA external validation cohort, and heat maps demonstrate differences in the expression of the four genes in the model in the risk groups (Fig. 4D).
We then performed principal component analysis (PCA) to further validate the grouping ability of the four degs. PCA was performed to demonstrate the differences between the high and low risk groups based on the prognostic characteristics of the whole gene expression profile and the expression profile classification of the four model genes. The distribution of the high- and low-risk groups was relatively dispersed (Fig. 4E). However, our signature yielded results indicating that the high- and low-risk groups had different distributions (Fig. 4F). These results suggest that prognostic characteristics can distinguish between high and low risk groups.
We performed univariate and multifactorial Cox analyses to determine whether risk scores could be an independent prognostic factor for GBM patients compared to other common clinicopathological parameters. We observed that risk score could be an independent prognostic factor for these individuals (Fig. 5A). Based on a risk regression model for the TCGA cohort, we included all factors such as age, gender, race and riskscore in the nomogram column line graph. We constructed calibration plots to assess the agreement between the predicted overall survival (OS) of the prognostic model and the actual overall survival, and the results showed that the predictions of the column line plots were reliable (Fig. 5B). The area under the curve (AUC) for the risk class was higher than the AUC for other clinicopathological features (Fig. 5C), and the temporal c-index values for the risk class were similarly higher than for other features (Fig. 5D). The results suggest that the prognostic features of the 4-gene signature are quite reliable. The bar chart shows an extremely strong relationship between IDH mutation status and risk grouping only (Fig. 5E).
Also, GBM patients were grouped by age, gender and IDH mutation status to investigate the relationship between risk characteristics and prognosis of GBM patients in these clinicopathological variables. For different staging, patients in the low-risk group of the TCGA and CGGA cohorts had significantly longer OS than those in the high-risk group (Fig. 6A-L). The differential results for the TCGA > 60 years group and the female subgroup may be due to poor prognosis in GBM and the limited number of patients. These results suggest that predictive characteristics may also predict the prognosis of GBM patients of different ages, genders, and IDH status.
Afterwards, we generated two waterfall plots to explore the detailed gene mutation profiles between the high- and low-risk groups. We identified TP53, TTN and PTEN as the most commonly mutated genes in both the high- and low-risk groups (Fig. 7A, B). Using immune cell infiltration data from the TCGA database downloaded from TIMER 2.0 (infiltration_estimation_for_tcga)
Using Spearman correlation analysis, we found a correlation between the risk score and the abundance of immune cells in the GBM tumour microenvironment obtained by various algorithms, e.g. B cells in CIBERSORT, XCELL and TIMER results all had a negative correlation with the risk score (Fig. 7C). Using correlation heat maps, we investigated the correlation between the four genes and risk scores in the model and the expression levels of genes associated with common ICIs, respectively. The results showed that higher risk scores were significantly associated with upregulation of CD276, CD274 and CD44 (Fig. 7D). We examined the correlation between risk score and tumour mutational load (TMB) and the difference in TMB between risk groups (Supplementary Fig. 3A), and the results showed no significant association between risk score and TMB. Using the R package "estimate", we found no significant differences between stromal and immune scores in the high- and low-risk groups (Supplementary Fig. 3B).
We applied TCIA to predict the susceptibility of patients with high and low risk scores to immunotherapy. As shown in the figure, neither programmed cell death protein 1 (PD-1) nor cytotoxic t lymphocyte antigen 4 (CTLA4) was significant for treatment in the risk group (Supplementary Fig. 3C), probably due to the very poor prognosis of GBM This may be due to the very poor prognosis of patients with GBM. We predicted the IC50 of all chemotherapeutic agents in the high- and low-risk score groups. we found that most of the agents such as AKT inhibitors and pabucirib exhibited a higher IC50 in patients with high risk scores. suggesting that patients with high risk scores may be more sensitive to these agents (p all < 0.05; Supplementary Fig. 3D). We performed GSEA enrichment analysis between the TCGA high and low risk datasets to assess the biological function of these genes. Using the gene set database MSigDB Collections (c2.cp.kegg.v7.4.symbols.gmt), we selected the eight most significant enriched signalling pathways based on normalised enrichment scores (NES) and P values (< 0.001) (Fig. 7E). p53 signalling pathway, cell cycle, DNA repair and regulation of the actin cytoskeleton were enriched in the high-risk group, whereas the low-risk group had higher levels of Parkinson's disease, ribosomal, Alzheimer's disease and neuroactive ligand receptor interactions.
The process of angiogenesis is the growth of new capillaries from pre-existing vessels. Glioblastoma (GBM) is a highly vascularised tumour and the growth of glioma is extremely dependent on the formation of new blood vessels. Endothelial cells (ECs) dynamically modify their behavior during angiogenesis, resulting in changes in differentiation, proliferation, migration, polarity, metabolism, and cell-cell communication. These modifications are assumed to integrate many external inputs, but they also govern ECs' ability to respond to environmental stimuli, such as up- or down-regulation of surface receptor expression. In recent studies, TAM-derived factor (sema4d) has been shown to promote pericyte recruitment in neovascularization and cellular communication between glioma stem cell-derived perivascular cells and endothelial cells, directly contributing to vascular stability in gliomas. FAK proteins may increase angiogenesis in gliomas by triggering endothelial cell migration, according to research on endothelial cells and angiogenesis in gliomas. High-grade gliomas have higher FAK expression compared to low-grade gliomas and are associated with poorer survival. As a result, there has been interest in anti-angiogenic therapies targeting endothelial cells, which include inhibiting the proliferation of gliomas through the use of angiogenesis-inhibiting factors and drugs to inhibit the formation of new tumor blood vessels.
Characterization of ECs in normal brain tissue and GBM based on bulk RNAseq data is often limited. In studies of endothelial cells, it is often impossible to infer the effects of other cell types because of GBM cell heterogeneity. In this study we characterised the brain and GBM endothelial cells in more detail by integrating 10 × scRNA-seq and bulk RNA-seq data, and used the mark gene of endothelial cells to build a prognostic model for GBM patients. We found that the constructed prognostic model could effectively classify patients in the TCGA and CGGA cohorts into high- and low-risk groups. In addition, we explored survival status, clinical relevance, mutational status and tumour immune infiltration in the different groups. Our study showed that higher risk scores were associated with poorer prognosis, lower frequency of IDH mutations and upregulation of immune checkpoints such as PD-L1 in patients. We therefore suggest that patients with higher risk scores may be more likely to receive immunotherapy. In addition, we identified two different subtypes using the NMF algorithm. All patients in cluster 1 were immune C4 subtypes, which were associated with a worse prognosis. We observed that the different subtypes had different prognostic and TME components. Group 1 was associated with a poorer clinical outcome and high infiltration levels of fibroblasts, whereas group 2 was associated with a better clinical outcome and high infiltration levels of cytotoxic lymphocytes. Fibroblasts can support tumor growth by depleting glucose.
We first identified mark genes in endothelial cells by means of single-cell sequencing, followed by LASSO and Cox regression analysis to identify four hub genes, including TUBA1C, RPS4X, KDELR2 and SLC40A1 to model prognosis. TUBA1C is an isoform of alpha-microtubule protein that serves as a core component of the eukaryotic cytoskeleton and plays a cell division, formation, motility and intracellular trafficking [39, 40]. In addition, the biological functions of microtubule proteins have been linked to cancer development, neurodevelopment and neurodegenerative diseases. In a recent study, TUBA1C expression was significantly higher in gliomas than in normal brain tissue and indicated a poorer prognosis. In addition, knockdown of TUBA1C also inhibited proliferation and migration of glioma cells, leading to apoptosis and G2/M phase arrest. Studies on the oncogenic ribosomal protein S4 X-linked (RPS4X) have found that RPS4X increases cisplatin resistance after depletion of specific small interfering rna's. RPS4X is associated with ovarian cancer stage and its low expression is also associated with poor survival and disease progression, but there are no reports on RPS4X in glioma. In hepatocellular carcinoma, RPS4X is required for SLFN11 inactivation in the mTOR signalling pathway. Interestingly, the KDEL receptor (KDELR2) can also target and promote the growth of HIF1a through the mTOR signaling pathway to guide glioblastoma. In addition, KDELR2 knockdown reduces cell viability, promotes G1 phase cell cycle arrest and induces apoptosis. kDELR2 can regulate cellular function in glioma cells by targeting CCND1. Solute carrier family 40 member 1 (SLC40A1) is a gene encoding an iron transporter protein. Studies in multiple myeloma and ovarian cancer have shown that SLC40A1 inhibits tumour cell growth and reduces resistance to chemotherapy[47, 48]. Only one recent bioinformatics study has suggested that the ferroptosis suppressor SLC40A1 is associated with immunosuppression in gliomas and that acetaminophen may exert antitumor effects in GBM by modulating SLC40A1-induced death. The CGGA external validation cohort was also employed to confirm its predictive capacity, and both cohorts showed similar results. The findings revealed that the prognostic model developed exhibited independent predictive power in predicting OS in GBM patients. The status of genetic mutations and immunological function in different risk categories were then studied. We found no significant differences in gene mutations such as TP53 and PTEN between the high-risk group and the low-risk group, yet the high-risk group did not have any IDH mutations. 2016 WHO classification clearly indicates a significant difference between IDH mutant GBM and IDH wild-type GBM, while IDH wild-type GBM has a poorer prognosis. This further validates the reliability of our model. In addition, we also investigated the relationship between risk score and TMB values and PD-L1 expression levels. Disappointingly, higher risk scores did not correlate significantly with higher TMB values (figure S3 A). A key mediator of immunosuppression in GBM is PD-L1, and although only a fraction of GBM cells express PD-L1, the tumour microenvironment is deficient in the expression of PD-L1.
Finally, immune checkpoint blockade treatment may be more effective in individuals with higher risk ratings. The developed prognostic model might be used as a predictive biomarker for immunotherapy patients. The CGGA external validation cohort was also employed to confirm the accuracy of the model in predicting OS in these individuals. Our research does, however, have certain inevitable flaws. To begin with, all of these findings are based on bioinformatic studies and will require further experimental confirmation. To corroborate our findings, we created an endothelial cell-based biomarker that will need to be tested in large-scale clinical studies.
This study constructed and validated a prognostic model for GBM by integrating 10× scRNA-seq and bulk RNA-seq data. Higher risk scores were significantly associated with poorer survival outcomes, with almost zero IDH mutation rates and upregulation of immune checkpoints such as PD-L1 and CD276. Our prognostic model may therefore be a potential biomarker for risk stratification and treatment response prediction in GBM patients.
This work was supported by Wuxi Taihu Lake Talent Plan, Supports for Leading Talents in Medical and Health Profession（2020THRC-DJ-SNW）, General project of Wuxi commission of Health(2020ZHYB16, MS201933, T202120), Youth project of Wuxi commission of Health(Q202133).
Availability of data and materials
The datasets analyzed for this study were obtained from the UCSC Xena website (https://xenabrowser.net/datapages/) , the CGGA dataset (mRNAseq_325) (http://www.cgga.org.cn/index.jsp) and the NCBI website (https://www.ncbi.nlm.nih.gov/).
Junfei S and Chao C conceived the study and participated in the study design, performance, coordination and manuscript writing. SYZ, WJ, YFS, YSF,JH and HH performed the literature review and graphics production. Junfei S and Chao C revised the manuscript. All authors reviewed and approved the final manuscript.
Ethical approval and Consent to participate
This article does not contain any studies with human participants or animals performed by any of the authors.
The authors declare no conflicts of interest.