Comprehensive Prognostic Assessment by Integrating Single-Cell and Bulk RNA-seq Signatures in Glioblastoma

doi:10.21203/rs.3.rs-4128581/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Background

Glioblastoma (GBM) is among the most challenging of all malignancies. The immune response in the tumor microenvironment has an important impact on the prognosis of GBM patients. It is therefore critical to relate tumors to the immune response in their microenvironment and to screen for immune microenvironment-associated genes of potential prognostic value.

Methods

We first evaluated the tumor microenvironment in bulk RNA-seq data using the xCell and ESTIMATE algorithms, followed by an integrated analysis of single-cell and bulk RNA-seq data from the GEO database, with a special focus on GBM-related datasets. From this analysis, we identified a set of differentially expressed genes (DEGs) that were consistently observed in scRNA-seq and bulk RNA-seq datasets. We then performed random forest analysis on these DEGs to identify core genes for our prognostic model. Findings regarding the function of IFI44 in glioma cell lines were validated by siRNA knockdown, overexpression, and transwell experiments.

Results

We ultimately identified 235 DEGs that were consistently observed in both single-cell and bulk RNA-seq datasets. Through Cox regression and random forest analysis, we further identified nine genes, namely AK5, ATP2B1, CNTN2, GABARAPL1, HK2, IFI44, PLP2, S100A11 and ST18, which exhibited a strong association with GBM prognosis. Notably, these genes were predominantly expressed in macrophages, CD14 monocytes, and T cells within the single-cell dataset. Patients classified as low-risk had significantly better prognoses than those classified as high-risk. Importantly, these findings were reproduced in the test dataset. IFI44 promoted both proliferation and migration of glioma cells in vitro, and higher levels of IFI44 expression were associated with poorer survival.

Conclusions

We identified nine genes as prognostic biomarkers in GBM. These results may provide valuable insights into the molecular mechanisms underlying GBM progression.

glioblastoma

tumor microenvironment

nomogram

prognosis

model

From 1990 to 2016, the prevalence of cancers of the central nervous system (CNS) increased worldwide. In 2016, approximately 330,000 cases of and 227,000 deaths from tumors of the brain and other CNS sites were recorded ¹. Among CNS tumors, glioblastoma (GBM) is considered the most prevalent primary malignant brain cancer ^{2, 3}. Several treatment options are available for GBM, but its prognosis remains poor, with a 5-year survival rate of only 5.5% ^{4, 5}. Therefore, mining potential GBM-related biomarkers through expression profiling of clinical data is important for improving prognosis, optimizing individual therapeutic regimens, and improving the survival rate of patients.

A patient's tumor region is composed not only of tumor cells but also of resident and recruited host cells (cancer-associated stromal cells and immune cells), among others ^{6, 7}. Tumor cells interact with components of their microenvironment in different ways, some promoting and some inhibiting tumor growth, at different stages of tumor development including tumorigenesis and metastasis ^{8, 9}. Cancer cells can be recognized and eliminated by immune cells within the tumor microenvironment (TME) through different immune mechanisms ^{10, 11}. However, cancer cells can avoid recognition by immune cells through various immunosuppressive mechanisms, such as recruiting immunosuppressive cell populations and downregulating tumor immunogenicity ^{12, 13}. With the growing use of cancer gene expression profiling, cancer prognostic models have been developed ^{14, 15}. The infiltration of cancer-associated normal cells affects prognostic modeling based on gene expression information ^{16, 17}. Methods for calculating the proportion of tumor cells in the tumor microenvironment have been developed, in which two categories of cancer-associated normal cells are important for prognostic modeling: immune cells and stromal cells ¹⁸. Computational methods to predict tumor proportions based on gene expression data from databases have also been developed. For example, Yoshihara et al. ¹⁹ designed an algorithm called ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data). This method scores the infiltration of nontumor cells by analyzing gene expression features specific to immune and stromal cells. The xCell ²⁰ algorithm calculates cell type enrichment scores based on the association between expression profiles and features. The ESTIMATE and xCell algorithms have been applied to lung adenocarcinoma ^{21, 22}, colon cancer ^{23, 24}, and adrenal cortical cancer ^{25, 26}. These applications across multiple cancers illustrate the broad applicability of both algorithms.

With the widespread adoption of high-throughput sequencing technologies, bulk RNA-seq data from relevant cancer tissues have become an indispensable part of transcriptome analysis ²⁷. However, bulk RNA-seq overlooks cellular heterogeneity, which limits our understanding of gene expression from a single-cell perspective. The emergence of single-cell sequencing technologies has addressed this issue, enabling gene expression to be analyzed at the level of individual cells. scRNA-seq techniques allow a more comprehensive analysis of patient prognosis, particularly from the perspective of the tumor microenvironment and its associated cell populations ^{28–30}. This approach promises to enhance our understanding of cancer progression and may lead to more accurate prognostic assessments for cancer patients.

In this study, we performed a comprehensive analysis of the tumor microenvironment by integrating bulk RNA-seq with scRNA-seq data from GBM samples. Leveraging both bulk and single-cell transcriptomics, we aimed to construct a prognostic model that correlates with patient outcomes.

Data collection and pre-processing

GBM-related gene expression profiles were obtained from public databases. From the GEO database (https://www.ncbi.nlm.nih.gov/geo/), we downloaded two bulk RNA-seq datasets, GSE16011³¹ and GSE108474³², for subsequent analysis. From the TCGA database (https://portal.gdc.cancer.gov), we downloaded the RNA-sequencing cohort TCGA-GBM³³. In all cohorts, probe IDs were mapped to gene symbols via annotation files; when a gene symbol corresponded to multiple probes, the average value was taken as the final gene expression value. The scRNA-seq dataset, derived from GBM and comprising 11,762 cells, was obtained from the GEO database under the series record GSE108474 and was generated with 10x Genomics technology on an Illumina NextSeq 500.
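The probe-to-symbol averaging described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R code used in the study; the function name `collapse_probes`, the probe IDs, and the expression values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_symbol):
    """Average probe-level values that map to the same gene symbol.

    probe_values: dict of probe ID -> list of expression values (one per sample)
    probe_to_symbol: dict of probe ID -> gene symbol (unmapped probes are skipped)
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_values.items():
        symbol = probe_to_symbol.get(probe_id)
        if symbol:  # drop probes without an annotated symbol
            grouped[symbol].append(values)
    # zip(*rows) walks the samples column by column
    return {
        symbol: [mean(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Toy data: probe IDs and values are made up for illustration.
probe_values = {
    "p1": [2.0, 4.0],
    "p2": [4.0, 8.0],
    "p3": [1.0, 1.5],
}
probe_to_symbol = {"p1": "IFI44", "p2": "IFI44", "p3": "HK2"}
gene_values = collapse_probes(probe_values, probe_to_symbol)
```

Here the two probes annotated to the same symbol are averaged sample by sample, while a symbol with a single probe keeps its original values.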

Processing of scRNA-seq dataset

Using the Seurat package in the R programming environment, we performed quality control, statistical analysis, and exploration of the scRNA-seq dataset. The percentage of mitochondrial genes was calculated using the PercentageFeatureSet function, followed by correlation analysis to examine the association between sequencing depth, mitochondrial gene sequences, and total intracellular sequences. Gene expression in the remaining cells was normalized using the LogNormalize method. Variance analysis was then performed to identify the top 2,000 highly variable genes. In addition, principal component analysis (PCA) was performed on the expression profiles of these genes to identify significant dimensions (p < 0.05).
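To make the two per-cell quantities above concrete, the following Python sketch mirrors what they compute: the share of counts from mitochondrial genes, and log-normalization (counts divided by the cell total, multiplied by a scale factor, then transformed with log1p). It is not Seurat code. The "MT-" prefix and the scale factor of 10,000 are assumed defaults that the text does not state, and the toy counts are invented.

```python
import math


def percent_mito(counts):
    """Percentage of a cell's counts that come from mitochondrial (MT-) genes."""
    total = sum(counts.values())
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    return 100.0 * mito / total if total else 0.0


def log_normalize(counts, scale_factor=10_000):
    """Divide by the cell total, multiply by scale_factor, then take log1p."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}


# One toy cell with 10 counts in total, 1 of them mitochondrial.
cell = {"IFI44": 3, "HK2": 1, "MT-CO1": 1, "PLP2": 5}
pct = percent_mito(cell)
norm = log_normalize(cell)
```

Because every cell is rescaled to the same total before the log transform, normalized values are comparable across cells with different sequencing depth.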

The filtered matrix was subjected to Uniform Manifold Approximation and Projection (UMAP) and clustered in Seurat with default parameters. Marker genes for each cluster were obtained under the criteria of log2 fold change (FC) > 0.25 and adjusted p-value < 0.05. The clusters were subsequently annotated from these marker genes using the "scMayoMap" package.

Immune infiltration analysis

The relative or absolute abundance of immune and stromal cell populations in the samples was calculated using three algorithms: xCell, ESTIMATE, and CIBERSORT³⁴. Immune and stromal proportions were calculated for each patient using the R packages “ESTIMATE” and “xCell”, and the proportions of 22 immune cell types in the tumor microenvironment of the samples were calculated by uploading the data to the CIBERSORT website (http://cibersort.stanford.edu/).

Data processing and acquisition of DEGs

Data were normalized and processed in the R environment (version 4.2.1, https://www.r-project.org/). The “limma” package³⁵ was used for the acquisition of DEGs in the GSE16011 profile. The screening conditions for the DEGs were p-value < 0.05 and |log₂FC| ≥ 2. To demonstrate cellular heterogeneity, we selected highly variable genes at the single-cell level for joint analysis with differentially expressed genes from bulk RNA-seq. Common genes were selected for subsequent analysis.
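The screening and intersection step can be illustrated with a short Python sketch. It assumes per-gene statistics (log2 fold change and p-value) are already available, for example exported from limma. The function name `select_degs`, the placeholder genes `GENE_A` and `GENE_B`, and all numbers are hypothetical.

```python
def select_degs(stats, lfc_cutoff=2.0, p_cutoff=0.05):
    """Return genes with p < p_cutoff and |log2FC| >= lfc_cutoff.

    stats: dict of gene -> (log2 fold change, p-value)
    """
    return {
        gene
        for gene, (log2fc, p_value) in stats.items()
        if p_value < p_cutoff and abs(log2fc) >= lfc_cutoff
    }


# Toy statistics, invented for illustration only.
bulk_stats = {
    "IFI44": (2.4, 0.001),
    "HK2": (-2.1, 0.010),
    "GENE_A": (0.6, 0.001),   # significant but small effect: excluded
    "GENE_B": (3.0, 0.200),   # large effect but not significant: excluded
}
highly_variable = {"IFI44", "GENE_A", "PLP2"}  # stand-in for the scRNA-seq gene list

degs = select_degs(bulk_stats)
common = degs & highly_variable  # genes carried forward to the survival analysis
```

Using a set intersection keeps only genes supported by both data types, which is the sense in which the common genes are selected here.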

Construction of risk scoring model

First, we classified the patients into high- and low-score groups based on the ESTIMATE score. Then, the limma package was used to identify differentially expressed genes. Using the coxph function of the R package "survival", hazard ratios (HRs) and P-values were calculated for each differentially expressed gene. Candidate genes with P-values less than 0.05 were then used as input to the random forest, and we finally identified nine genes for risk score (RS) model construction:

$$RS=\sum_{i=1}^{N}\left(\mathrm{Expression}_{i}\times \mathrm{Coefficient}_{i}\right)$$

where N is the number of genes, Expression_i is the expression value of gene i, and Coefficient_i is the Cox coefficient of gene i.
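As a worked instance of the formula, the risk score is a weighted sum of expression values. The Python sketch below uses placeholder gene names and coefficients; they are not the fitted coefficients of the nine-gene model, which are not reproduced in this text.

```python
def risk_score(expression, coefficients):
    """Sum of expression * Cox coefficient over the genes in the signature."""
    return sum(expression[gene] * beta for gene, beta in coefficients.items())


# Placeholder coefficients (hypothetical): a positive value raises risk with
# higher expression, a negative value lowers it.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}

# Genes outside the signature (GENE_C) are ignored.
patient = {"GENE_A": 4.0, "GENE_B": 2.0, "GENE_C": 9.0}

rs = risk_score(patient, coefficients)  # 0.5 * 4.0 + (-0.25) * 2.0 = 1.5
```

A gene with a negative coefficient lowers the score as its expression rises, which is how a protective gene enters such a model.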

RS prognostic analysis

We divided patients into high- and low-risk groups using the R package “maxstat” ³⁶ to determine the optimal cutpoint. Kaplan‒Meier survival curves were used to assess survival differences between the GBM patient groups. Multivariate Cox regression analysis was performed to evaluate the significance of each variable for survival risk. Time-dependent concordance index (C-index) and time-dependent receiver operating characteristic (tROC) analyses were performed using the R package "timeROC" to compare survival predictive power across conditions.
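For readers unfamiliar with the Kaplan‒Meier estimator used for the survival curves, a minimal Python sketch follows. It computes the product-limit estimate for one group and is a generic textbook version, not the R "survival" implementation; the follow-up times and event indicators are invented.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death was observed.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy group of five patients (months of follow-up, event indicator).
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimate steps down only at observed deaths.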

Mutation analysis

Mutation data of GBM patients were obtained from the TCGA database and stored in mutation annotation format (MAF) files, and a series of analyses of the mutation data were performed using the R package “maftools” ³⁷.

Cell culture

The human glioma cell lines were purchased from the American Type Culture Collection. U87 and U251 cells were cultured in DMEM supplemented with 10% fetal bovine serum (FBS) and 1% penicillin–streptomycin at 37°C with 5% CO2.

Cell invasion assays

Cell invasion assays were conducted using Corning Matrigel Invasion Chambers with 8.0 µm polyethylene terephthalate membranes (CORNING, 354480). Cells were resuspended in DMEM without FBS and seeded into the upper chamber wells, while DMEM with 10% FBS was placed in the lower chamber as the chemoattractant. Following a 24-hour incubation for invasion assays and a 12-hour incubation for migration assays, cells were rinsed with PBS, fixed with 4% paraformaldehyde, and stained with 0.1% crystal violet solution. Cells were imaged using an inverted microscope, and the invaded cells were counted with ImageJ software. All experiments were carried out in triplicate.

Wound healing assay

Equal amounts of cells were plated and allowed to grow to 90% confluence. Culture inserts for live cell analysis (Ibidi) were used to make a wound in the cell monolayer. The wound areas were marked and photographed at 0 and 24 h with a phase-contrast microscope. ImageJ software was used to calculate the area of wound healing. All assays were performed in triplicate.

Cell counting kit-8 (CCK-8) assay

Cells were initially seeded in 96-well plates at a density of 1 × 10⁴ cells per well in 100 µL of cell culture medium and incubated for 24 hours. The cells were transiently transfected with the designated plasmids and short interfering RNA. Cell viability was assessed using the CCK-8 assay kit (Dojindo, Japan). After a specified incubation period, 10 µL of CCK-8 solution was added to each well of the 96-well plate and incubated for 3 hours in a controlled environment. Subsequently, the optical density was measured at 450 nm, and a proliferation curve was constructed based on the correlation between time and absorbance.

Western blotting analysis

The U87 and U251 cells were harvested by treatment with 0.05% trypsin, followed by cold PBS washing, and lysed in ice-cold lysis buffer containing protease inhibitors. Subsequently, the proteins were separated by gel electrophoresis using 10% Tris-glycine gels and transferred to polyvinylidene fluoride (PVDF) membranes. Blocking of nonspecific binding sites was achieved by incubating the membranes in 5% nonfat milk in TBST buffer, followed by incubation with primary antibodies. The membranes were then probed with peroxidase-conjugated secondary antibodies, and specific protein bands were visualized using enhanced chemiluminescence reagents. The primary antibodies used in this study (anti-IFI44, ab236657) were acquired from Abcam. All experiments were conducted in triplicate.

Data analysis and processing

GEO and TCGA data analyses were performed in the R environment (version 4.2.1). KM and Cox analyses were performed using the R package "survival", where log-rank tests and univariate Cox proportional hazards regression generated P values and HRs with 95% confidence intervals (CIs). The package "clusterProfiler" was used to perform functional annotation. Statistical analyses of the experimental data were conducted using GraphPad Prism 8.00. Experimental data are presented as the mean ± SD (standard deviation) of three or more independent experiments, as specified in the corresponding figure legends and methods. The normality of the data distribution was assessed using the Shapiro–Wilk test. Differences among three groups were assessed using one-way ANOVA or the Kruskal–Wallis test (nonparametric). p < 0.05 was considered statistically significant.

Dataset cohort information

The GBM case cohort comprises 155 cases from GSE16011 as the training set, 209 cases from GSE108474 as the test set, and 518 cases from TCGA. As shown in Table 1, the median patient age was 55 years. The training set included 105 male and 50 female cases and, because GBM is a malignant tumor with high mortality, 147 deceased and 8 surviving cases.

Correlation of immune infiltration and GBM in the tumor microenvironment

Cellular characterization was performed by the CIBERSORT algorithm, and we found that tumor-associated macrophages and T cells were the most represented TME-infiltrating cells (Figure 1A). The infiltration abundance of immune cells was assessed using the xCell algorithm. The correlation network plots reflected the correlation between different types of immune cells (Figure 1B). Figure 1C shows the immune infiltration correlation heat map. Subsequently, 155 cases in GSE16011 were analyzed by both ESTIMATE and xCell algorithms to assess the infiltration status of immune cells and stromal cells in the tumor microenvironment in GBM patients (Figure 1D). The figure shows the cumulative distribution curves of the scores calculated by the two algorithms. It can be seen that the score of immune cells is higher than that of stromal cells through two different algorithms, indicating that immune infiltration plays a major role in it.

Four types of scores were significantly associated with GBM subtypes

Immune, stromal, ESTIMATE, and tumor purity scores were calculated using ESTIMATE on the GSE16011 gene expression profile. We divided patients into high- and low-score groups at the optimal cutpoint of each score. The survival curves showed significant differences between the high- and low-score subgroups for all four scores. Overall survival time was inversely associated with the immune (p=0.0015), stromal (p=0.0015), and ESTIMATE (p=0.0038) scores, with a better prognosis for low scores (Figure 2A, B and C). Interestingly, the tumor purity score was positively correlated with overall survival time, with higher scores having a better prognosis (p=0.0038) (Figure 2D).

Identification of tumor microenvironmental cells in the scRNA-seq dataset

As shown in Figure 3A, after quality control, we obtained a final set of 11,762 cells. Correlation analysis found no association between sequencing depth and mitochondrial gene sequences, but a significant positive correlation between sequencing depth and total intracellular sequences (Figure 3B). In the PCA, we selected 20 principal components based on a significance threshold of p < 0.05 (Figure 3C, D). Subsequently, we applied the UMAP method and clustered the cells, identifying 11 distinct clusters (Figure 3E). From these 11 clusters, we determined 11,126 marker genes, a subset of which was visualized using a cluster heatmap (Figure 3F). This analysis enabled us to characterize the cellular heterogeneity and identify key genes associated with different cell clusters.

Establishment of risk score (RS) model in GBM patients

In the bulk RNA-seq data, we identified 750 differentially expressed genes (DEGs) by applying the criteria of |log2 FC| > 1 and p < 0.05. Additionally, we selected 2,000 highly variable genes from a pool of 18,575 genes in the scRNA-seq data (Figure 4A). Of these, 235 genes were common to the gene sets identified from the GBM scRNA-seq and TCGA-GBM datasets (Figure 4B). Subsequently, we performed univariate Cox analysis on these genes and selected 22 genes for random forest analysis. Finally, nine genes significantly associated with patient prognosis (AK5, ATP2B1, CNTN2, GABARAPL1, HK2, IFI44, PLP2, S100A11 and ST18) were identified by permuting the obtained genes (Figure 4C) and were used to construct the risk assessment model.

Nine genes risk score model prognostic analysis

The GSE16011 dataset is used as the training dataset, and the GSE108474 and TCGA datasets are used as external validation datasets to evaluate the robustness and effectiveness of our risk score prognostic model. In Figure 5A, C, and E, the survival curves demonstrated that the high-risk group exhibited a significantly worse prognosis compared to the low-risk group (p < .01) in all these datasets.

Furthermore, we conducted time-dependent ROC curve analysis to predict 1-, 3-, and 5-year survival. The area under the curve (AUC) values for 1-, 3-, and 5-year overall survival (OS) in the GSE16011 dataset were 0.661, 0.709, and 0.726, respectively (Figure 5B). In the GSE108474 dataset, the AUC values for 1-, 3-, and 5-year OS were 0.579, 0.600, and 0.713, respectively (Figure 5D). In the TCGA dataset, the AUC values for 1-, 3-, and 5-year OS were 0.541, 0.641, and 0.687, respectively (Figure 5F).

Taken together, these findings highlight the predictive capability of the risk score prognostic model developed through integrated analysis of scRNA-seq and bulk RNA-seq datasets.

Integrated analysis of scRNA-seq and bulk RNA-seq data with clinical survival information

We performed cell type identification on the clusters and identified seven distinct cell types: B cells, CD14 monocytes, CD56-dim natural killer cells, macrophages, oligodendrocytes, oligodendrocyte precursor cells, and T cells (Figure 6A, C). Subsequently, we employed SCISSOR to investigate the association of scRNA-seq and bulk RNA-seq data with patient survival. In the tumor microenvironment, B cells, CD14 monocytes, CD56-dim natural killer cells, and T cells exhibited a negative correlation with patient survival, and macrophages showed a similar association. Of the 11,762 cells, 8,093 were unrelated to survival, 2,748 showed a negative correlation, and 921 showed a positive correlation with survival (Figure 6B, D).

The spatial distribution and expression of prognostic-related genes in both scRNA-seq and bulk RNA-seq datasets

Subsequently, we examined the distribution of the nine prognosis-related genes in the scRNA-seq data. These genes were predominantly expressed in macrophages, CD14 monocytes, and T cells (Figure 7A). Furthermore, we calculated risk scores for the training set and generated correlation heat maps from the bulk RNA-seq data. As depicted in Figure 7B, higher risk scores in the training set were associated with increased expression of AK5, ATP2B1, CNTN2, HK2, IFI44, PLP2, S100A11, and ST18 and with decreased expression of GABARAPL1. Cox analysis results also showed that GABARAPL1 exhibited low expression in the high-risk group, suggesting its potential as a protective factor (Table 2).

Mutation analysis

To investigate the relationship between mutations and the tumor immune microenvironment, we analyzed mutation data from the TCGA-GBM cohort and displayed the 20 most frequently mutated genes. The oncoplot showed four genes with mutation frequencies higher than 20%: PTEN, TP53, TTN and EGFR (Figure 8A). We also plotted a gene cloud, in which the size of each gene is proportional to the total number of samples in which it is mutated, and four genes were again evident as having the highest mutation frequency (Figure 8B). Beyond single mutations, we tested the top 20 mutated genes for co-mutation using pair-wise Fisher's exact tests. Green indicates mutated genes that tend to co-occur, yellow indicates mutual exclusivity, and the depth of color indicates significance (Figure 8C), providing a theoretical basis for clinical treatment. Figure 8D shows the variant allele frequency (VAF), which can be used to infer tumor heterogeneity and tumor purity. In addition, high or low VAF may affect the prognosis of cancer. We also performed enrichment analysis of oncogenic pathways, with RTK-RAS, PI3K and TP53 being the top three pathways in the cases (Figure 8E). Finally, we selected the set of mutated genes of greatest prognostic significance (TP53 and ATRX) for analysis; interestingly, mutations in this gene set were associated with a good prognosis (Figure 8F).
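The pair-wise test for co-mutation can be illustrated with a small, self-contained Python sketch of a two-sided Fisher's exact test on a 2×2 table of sample counts. This is a generic implementation for illustration, not the maftools code, and the counts in the example are invented rather than taken from TCGA-GBM.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a: samples with both genes mutated, b: only gene 1 mutated,
    c: only gene 2 mutated, d: neither gene mutated.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x co-mutated samples with margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= observed * (1 + 1e-9)
    )


def odds_ratio(a, b, c, d):
    """Odds ratio > 1 suggests co-occurrence, < 1 suggests mutual exclusivity."""
    return (a * d) / (b * c) if b * c else float("inf")


# Invented counts for one gene pair across 16 samples.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

In a co-mutation plot of this kind, the direction of the odds ratio would determine the color (co-occurrence versus exclusivity) and the p-value its intensity.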

IFI44 promoted the proliferation, migration, and invasion ability of glioma cells

We focused on one specific gene, IFI44, which is an established biomarker in a variety of tumor types but whose functional significance in glioma remains understudied. To further validate the potential tumor-promoting role of IFI44 in glioma, we explored its function in glioma cell lines. We first transfected siIFI44 and IFI44 into U87 and U251 cells and checked the efficiencies by Western blot (Figure 9A). We then analyzed the effect of IFI44 on cell viability using the CCK-8 assay and observed increased viability in U87 and U251 cells overexpressing IFI44 (Figure 9B). Correspondingly, knockdown of IFI44 impaired cell viability. We also investigated the effect of IFI44 on the migration and invasion of glioma cells. The results indicated that IFI44 overexpression significantly promoted the migration and invasion of glioma cells, whereas IFI44 knockdown attenuated the migration and invasion of U87 and U251 cells (Figure 9C and 9D). Analysis of the TIMER database showed that higher levels of IFI44 expression are associated with poorer survival (Figure 9E). Together, these findings revealed that changes in IFI44 expression affected the malignant phenotype of glioma cells.

Gliomas are highly malignant primary tumors. In addition to traditional surgical resection, chemotherapy and radiotherapy, immunotherapy is increasingly being developed. Avoidance of immune destruction is a hallmark of cancer. Tumor development requires not only the acquisition of intrinsic features; avoidance of immune killing is an equally important external factor. That is, while a cancer develops at the cellular level, its microenvironment is altered to help it evade recognition by immune cells. Therefore, the study of cancer should not be limited to cancer cells; it is also important to analyze the surrounding tumor microenvironment.

In this study, we analyzed the proportions of 22 immune cell types in the GBM tumor microenvironment using CIBERSORT. The correlation among immune cells was obtained with xCell, and the immune score, stromal score, tumor purity score and ESTIMATE score were calculated with ESTIMATE. Cumulative curves were used to compare the immune and stromal scores, and the cumulative immune score was higher than the stromal score in the tumor immune microenvironment, indicating a crucial role of immune infiltration. Subsequently, patients were divided into high and low groups according to each of the four scores and its optimal cutoff point. Survival analysis showed that all four scores had significant prognostic effects. We focused on the prognosis associated with the immune score and found that the prognosis in the low immune score group was good, so we selected the low immune score group for subsequent analysis.

To construct a prognostic model related to GBM tumor microenvironment, we integrated 750 DEGs from bulk RNA-seq and 2000 highly variable genes from scRNA-seq data, resulting in a set of 235 overlapping genes. Subsequently, we employed univariate Cox regression and random forest analysis to identify a prognostic gene signature consisting of AK5, ATP2B1, CNTN2, GABARAPL1, HK2, IFI44, PLP2, S100A11, and ST18. This gene signature was used to establish a risk scoring method for GBM patients, demonstrating significant prognostic power. Then, we used SCISSOR to correlate the survival-related clinical information from scRNA-seq and bulk RNA-seq data, enabling us to analyze the impact of different tumor microenvironments on patient survival. Furthermore, we conducted additional analysis to explore the mutations associated with this gene signature.

We developed a prognostic model using nine genes (AK5, ATP2B1, CNTN2, GABARAPL1, HK2, IFI44, PLP2, S100A11, and ST18) based on the training dataset. AK5 is localized in the cytoplasm and only expressed in the brain, which has been found to significantly affect the prognosis of GBM, consistent with the findings of Yang et al³⁸. Zhang et al. demonstrated that ATP2B1 overexpression can activate immune signaling and prompt cold tumor response, suggesting that it is closely related to the tumor immune microenvironment³⁹. Yu et al. found that CNTN2 is highly expressed in high-grade glioma, and upregulation of CNTN2 can promote the proliferation of glioma cells, inhibit the differentiation of glioma cells, and activate RTK/Ras/MAPK pathway⁴⁰. GABARAPL1 is involved in the autophagy pathway in cancer, and Su et al. determined that GABARAPL1 inhibits cancer metastasis by inhibiting the PI3K/Akt pathway, which is consistent with our findings on its protective role in GBM⁴¹. HK2 is associated with increased rates of glycolysis observed in rapidly growing cancer cells, and Huang et al. also reported its significant correlation with malignant tumor growth⁴².

Pan et al. found that IFI44 is overexpressed in head and neck squamous cell carcinoma (HNSC) samples compared with normal tissues, and that the expression of IFI44 was positively correlated with the infiltration of CD4⁺ cells, macrophages, and neutrophils in HNSC⁴³. Our study identifies IFI44 as a gene associated with glioma cell proliferation and migration. Specifically, we observed a close correlation between the expression level of IFI44 and the survival of GBM patients. Higher levels of IFI44 expression are associated with poorer survival, suggesting that IFI44 may have important implications as a potential biomarker or therapeutic target for GBM. Feng et al. demonstrated that increased expression of PLP2 can promote GBM cell growth, highlighting its critical role in GBM⁴⁴. Tu et al. found that overexpression of S100A11 promotes GBM cell growth, epithelial-mesenchymal transition (EMT), migration, invasion, and the generation of glioblastoma stem cells (GSCs), while its knockout inhibits these activities. Importantly, they demonstrated that the S100A11/ANXA2/NF-κB positive feedback loop promotes GBM progression⁴⁵. However, the function of ST18 remains unknown.

Our study introduces a new approach to construct a prognostic model by integrating scRNA-seq and bulk RNA-seq data. Through a series of screening steps, we identified nine key genes that demonstrated excellent prognostic performance in both the training and test cohorts. Through functional validation in cells, we discovered that IFI44 promotes the proliferation and invasion of glioma cells, and high expression of IFI44 conferred a worse prognosis in glioma patients. Moreover, our analysis of the distribution of prognostic-related genes on single-cell UMAP plot revealed their enrichment in macrophages, CD14 cells, and T cells. Notably, we found that the prognosis of GBM patients was negatively correlated with the presence of B cells, CD14 monocytes, CD56-dim natural killer cells, and T cells, emphasizing the close relationship between GBM prognosis and the tumor microenvironment. These findings underscore the robust prognostic power of our model and reveal complex interactions between GBM and its microenvironment.

In conclusion, our study proposes a refined prognostic model that integrates scRNA-seq and bulk RNA-seq data, providing a more comprehensive analysis of the impact of the tumor microenvironment on patient outcome. By examining the distribution of genes across cell types, we go beyond traditional gene expression analysis to gain a deeper understanding of the complex interaction between the tumor microenvironment and prognosis. In addition, our use of SCISSOR analysis to associate survival-related clinical information from bulk RNA-seq with scRNA-seq data can provide new explanations, at the single-cell level, for the impact of different cell types in the tumor microenvironment on patient prognosis.

Acknowledgment

We acknowledge TCGA and GEO database for providing their platforms and contributors for uploading their meaningful datasets.

Competing Interest

The authors have no competing interest.

Availability of Data and Materials

The datasets analyzed during the current study are available in the TCGA (https://portal.gdc.cancer.gov) and GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16011, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108474) database.

Author Contributions

CL conceived and designed the research. QF and WL performed the data analysis and interpretation. QF and JG wrote and reviewed the manuscript. All authors read and approved the final manuscript.

Funding

Not applicable.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

References

Uddin MS, Mamun AA, Alghamdi BS, et al. Epigenetics of glioblastoma multiforme: From molecular mechanisms to therapeutic approaches. Semin Cancer Biol. 2022;83:100–20.
Davis ME. Glioblastoma: Overview of Disease and Treatment. Clin J Oncol Nurs. 2016;20:S2–8.
Wang J, Leavenworth JW, Hjelmeland AB, et al. Deletion of the RNA regulator HuR in tumor-associated microglia and macrophages stimulates anti-tumor immunity and attenuates glioma growth. Glia. 2019;67:2424–39.
Alexander BM, Cloughesy TF. Adult Glioblastoma. J Clin Oncol. 2017;35:2402–9.
Perry JR, Laperriere N, O'Callaghan CJ, Brandes AA, Menten J. Short-Course Radiation plus Temozolomide in Elderly Patients with Glioblastoma. N Engl J Med. 2017;376:1027–37.
Junttila MR, de Sauvage FJ. Influence of tumour micro-environment heterogeneity on therapeutic response. Nature. 2013;501:346–54.
Lei X, Lei Y, Li JK, et al. Immune cells within the tumor microenvironment: Biological functions and roles in cancer immunotherapy. Cancer Lett. 2020;470:126–33.
Mantovani A, Marchesi F, Malesci A, Laghi L, Allavena P. Tumour-associated macrophages as treatment targets in oncology. Nat Rev Clin Oncol. 2017;14:399–416.
Gonzalez H, Hagerling C, Werb Z. Roles of the immune system in cancer: from tumor initiation to metastatic progression. Genes Dev. 2018;32:1267–84.
Rentschler M, Braumuller H, Briquez PS, Wieder T. Cytokine-Induced Senescence in the Tumor Microenvironment and Its Effects on Anti-Tumor Immune Responses. Cancers (Basel). 2022; 14.
Ahmad A. Tumor microenvironment and immune surveillance. Microenvironment Microecology Res. 2022; 4.
Schreiber RD, Old LJ, Smyth MJ. Cancer immunoediting: integrating immunity's roles in cancer suppression and promotion. Science. 2011;331:1565–70.
Mergener S, Pena-Llopis S. A new perspective on immune evasion: escaping immune surveillance by inactivating tumor suppressors. Signal Transduct Target Ther. 2022;7:15.
Verhaak RG, Wouters BJ, Erpelinck CA, et al. Prediction of molecular subtypes in acute myeloid leukemia based on gene expression profiling. Haematologica. 2009;94:131–4.
Verhaak RG, Hoadley KA, Purdom E, Wang V. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell. 2010;17:98–110.
Pan JH, Zhou H, Cooper L, et al. LAYN Is a Prognostic Biomarker and Correlated With Immune Infiltrates in Gastric and Colon Cancers. Front Immunol. 2019;10:6.
Zhao J, Cheng M, Gai J, Zhang R, Du T, Li Q. SPOCK2 Serves as a Potential Prognostic Marker and Correlates With Immune Infiltration in Lung Adenocarcinoma. Front Genet. 2020;11:588499.
Carter SL, Cibulskis K, Helman E, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol. 2012;30:413–21.
Yoshihara K, Shahmoradgoli M, Martinez E, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun. 2013;4:2612.
Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017;18:220.
Xu ZY, Zhao M, Chen W, et al. Analysis of prognostic genes in the tumor microenvironment of lung adenocarcinoma. PeerJ. 2020;8:e9530.
Wu J, Li L, Zhang H, et al. A risk model developed based on tumor microenvironment predicts overall survival and associates with tumor immunity of patients with lung adenocarcinoma. Oncogene. 2021;40:4413–24.
Alonso MH, Ausso S, Lopez-Doriga A, et al. Comprehensive analysis of copy number aberrations in microsatellite stable colon cancer in view of stromal component. Br J Cancer. 2017;117:421–31.
Tokumaru Y, Oshi M, Patel A, et al. Organoids Are Limited in Modeling the Colon Adenoma-Carcinoma Sequence. Cells. 2021; 10.
Jin Y, Wang Z, He D, et al. Analysis of m6A-Related Signatures in the Tumor Immune Microenvironment and Identification of Clinical Prognostic Regulators in Adrenocortical Carcinoma. Front Immunol. 2021;12:637933.
Lin X, Gu Y, Su Y, et al. Prediction of Adrenocortical Carcinoma Relapse and Prognosis with a Set of Novel Multigene Panels. Cancers (Basel). 2022; 14.
Stark R, Grzelak M, Hadfield J. RNA sequencing: the teenage years. Nat Rev Genet. 2019;20:631–56.
Kinker GS, Greenwald AC, Tal R, et al. Pan-cancer single-cell RNA-seq identifies recurring programs of cellular heterogeneity. Nat Genet. 2020;52:1208–18.
Patel AP, Tirosh I, Trombetta JJ, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344:1396–401.
Peng J, Sun BF, Chen CY, et al. Single-cell RNA-seq highlights intra-tumoral heterogeneity and malignant progression in pancreatic ductal adenocarcinoma. Cell Res. 2019;29:725–38.
Liu HJ, Hu HM, Li GZ, et al. Ferroptosis-Related Gene Signature Predicts Glioma Cell Death and Glioma Patient Progression. Front Cell Dev Biol. 2020;8:538.
Yin W, Tang G, Zhou Q, et al. Expression Profile Analysis Identifies a Novel Five-Gene Signature to Improve Prognosis Prediction of Glioblastoma. Front Genet. 2019;10:419.
Zhao B, Wang Y, Wang Y, et al. Systematic identification, development, and validation of prognostic biomarkers involving the tumor-immune microenvironment for glioblastoma. J Cell Physiol. 2021;236:507–22.
Chen B, Khodadoust MS, Liu CL, Newman AM, Alizadeh AA. Profiling Tumor Infiltrating Immune Cells with CIBERSORT. Methods Mol Biol. 2018;1711:243–59.
Diboun I, Wernisch L, Orengo CA, Koltzenburg M. Microarray analysis after RNA amplification can detect pronounced differences in gene expression using limma. BMC Genomics. 2006;7:252.
Sheng W, Li X, Li J, Mi Y, Li F. Evaluating prognostic value and relevant gene signatures of tumor microenvironment characterization in esophageal carcinoma. J Gastrointest Oncol. 2021;12:1228–40.
Mayakonda A, Lin DC, Assenov Y, Plass C, Koeffler HP. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018;28:1747–56.
Yang W, Warrington NM, Taylor SJ, et al. Sex differences in GBM revealed by analysis of patient imaging, transcriptome, and survival data. Sci Transl Med. 2019; 11.
Zhang X, He Y, Ren P, et al. Low expression and Hypermethylation of ATP2B1 in Intrahepatic Cholangiocarcinoma Correlated With Cold Tumor Microenvironment. Front Oncol. 2022;12:927298.
Yan Y, Jiang Y. RACK1 affects glioma cell growth and differentiation through the CNTN2-mediated RTK/Ras/MAPK pathway. Int J Mol Med. 2016;37:251–7.
Su W, Li S, Chen X, et al. GABARAPL1 suppresses metastasis by counteracting PI3K/Akt pathway in prostate cancer. Oncotarget. 2017;8:4449–59.
Huang Y, Ouyang F, Yang F, et al. The expression of Hexokinase 2 and its hub genes are correlated with the prognosis in glioma. BMC Cancer. 2022;22:900.
Pan H, Wang X, Huang W, et al. Interferon-Induced Protein 44 Correlated With Immune Infiltration Serves as a Potential Prognostic Indicator in Head and Neck Squamous Cell Carcinoma. Front Oncol. 2020;10:557157.
Feng Z, Zhou W, Wang J, et al. Reduced expression of proteolipid protein 2 increases ER stress-induced apoptosis and autophagy in glioblastoma. J Cell Mol Med. 2020;24:2847–56.
Tu Y, Xie P, Du X, et al. S100A11 functions as novel oncogene in glioblastoma via S100A11/ANXA2/NF-kappaB positive feedback loop. J Cell Mol Med. 2019;23:6907–18.

Table 1 Clinical information of the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) datasets.
Characteristic	GSE16011	GSE108474	TCGA
Age (years)
> 55	78	-	306
≤ 55	77	-	212
Sex
Female	50	-	204
Male	105	-	314
Vital status
Alive	8	23	77
Dead	147	187	441

Table 2 Prognosis of the nine genes in the signature.
ENSEMBL ID	Symbol ID	Gene name	Coef	P-value	Prognostic indicator
ENSG00000154027	AK5	Adenylate Kinase 5	0.45	< 0.01	high
ENSG00000070961	ATP2B1	ATPase Plasma Membrane Ca2+ Transporting 1	0.41	= 0.01	high
ENSG00000184144	CNTN2	Contactin 2	0.55	< 0.01	high
ENSG00000139112	GABARAPL1	GABA Type A Receptor Associated Protein Like 1	-0.45	< 0.01	low
ENSG00000159399	HK2	Hexokinase 2	0.45	< 0.01	high
ENSG00000137965	IFI44	Interferon Induced Protein 44	0.60	< 0.01	high
ENSG00000102007	PLP2	Proteolipid Protein 2	0.45	< 0.01	high
ENSG00000163191	S100A11	S100 Calcium Binding Protein A11	0.49	< 0.01	high
ENSG00000147488	ST18	ST18 C2H2C-Type Zinc Finger Transcription Factor	0.36	= 0.03	high
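
As an illustration of how a signature like the one in Table 2 could be turned into a per-patient risk score, the following minimal Python sketch computes a weighted sum of expression values and splits a cohort at the median score. It rests on assumptions that the table itself does not confirm: that the Coef column can be used directly as linear weights, that expression values are already normalized, and that the median is the cutoff. The function names `risk_score` and `stratify` and the sample dictionaries are hypothetical and are not the authors' implementation.

```python
from statistics import median

# Coefficients copied from Table 2. Using them as linear weights is an
# assumption made for this sketch, not a method stated in the table.
COEF = {
    "AK5": 0.45, "ATP2B1": 0.41, "CNTN2": 0.55, "GABARAPL1": -0.45,
    "HK2": 0.45, "IFI44": 0.60, "PLP2": 0.45, "S100A11": 0.49, "ST18": 0.36,
}


def risk_score(expression):
    """Return the weighted sum of expression over the nine signature genes."""
    missing = set(COEF) - set(expression)
    if missing:
        raise KeyError(f"missing genes: {sorted(missing)}")
    return sum(COEF[gene] * expression[gene] for gene in COEF)


def stratify(samples):
    """Label each sample 'high' or 'low' relative to the cohort median score.

    The median cutoff is an assumption; samples at the cutoff are labeled 'low'.
    """
    scores = {sid: risk_score(expr) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    return {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}
```

Calling `stratify` with a dictionary that maps hypothetical sample IDs to per-gene expression dictionaries returns one risk label per sample; survival curves for the two groups could then be compared with a separate survival analysis tool.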
