3.1 Aberrant Expression of SOX family genes in Glioma
Nineteen SOX family genes were retrieved using the ONCOMINE database, and their transcriptional levels in tumor tissues were first examined(14). Compared with normal tissues, many SOX family genes were aberrantly expressed in a wide range of tumor types, especially in Colorectal Cancer, Breast Cancer, Brain, and CNS cancer (Fig. 1A). Of note, the average expression scores of all SOX family genes in both GBM and LGG patient samples were elevated according to the analysis of bulk level data from TCGA and GTEx datasets (Fig. 1B). A plate-based high-quality scRNA-seq dataset was used to validate the reliability of the results to exclude interference from non-tumor cells(14). After an unsupervised graph-based clustering by using Leiden(30), eight major cell clusters were identified, which were microglia (MG), malignant cells (Malignant), proliferative malignant cells (Proliferation), normal myelin oligodendrocytes (OL), bone marrow-derived macrophage (BMM), polymorphonuclear neutrophils (PMN), lymphocytes (Lymphocyte) and lung cancer cells (Lung Cancer) (Fig. 1C, D). The lung cancer cell cluster was from a control sample in the original dataset and used for quality control. The mRNA levels of SOX2, 3, 4, 5, 6, 8, 9, 11, and 15 were significantly higher, while that of SOX10 was lower in malignant cells (Supplementary Fig. 1A). Overall, aberrant expression of SOX family genes in various human cancer tissues, including GBM, is a common occurrence.
3.2 Establishment of the SOX gene-based prognostic signature model for Glioma
To find the most prognostically relevant genes in the SOX family, we used a machine learning approach to shrink the candidate genes further. The TCGA-LGG and -GBM cohort (n = 663) was used as the training set, and two cohorts from CGGA (n1 = 222, n2 = 404) as the validation set. After the univariate analysis, differential genes (8 genes) with P-value < 0.05 were included in the lasso-penalized Cox model (Supplementary Fig. 1B & C). Moreover, a four-gene signature (SOX3, SOX6, SOX8, SOX9) being the most significant to OS was established (Fig. 2A & B). Risk scores based on the signature were calculated for each patient, and an optimal cut-off value to classify the patients into either high- or low-risk group was selected (Fig. 2C). The Kaplan-Meier analysis and time-dependent ROC curve were performed to validate the prognostic efficacy using the TCGA and CCGA cohorts. The Area under the ROC curves (AUCs) for 1-year, 3-year, and 5-year OS were 0.82, 0.84, 0.802 for the TCGA cohort, and 0.712, 0.591, 0.602, and 0.773, 0.702, 0.673 for the two CGGA cohorts, respectively, indicating the high sensitivity and specificity of this model (Fig. 2D-G). Clinical parameters and risk scores of the prognostic model were further included for (a?) the multivariate Cox regression model. The risk score was identified as an independent factor for OS, showing HR 1.636, and 95% CI 1.108–2.417 (with P-value < 0.001) (Fig. 2H). Together, the 4-gene signature model had an effective and independent prognostic ability for OS in glioma.
3.3 The high-risk population is characterized by a high proliferative capacity of tumor cells and a complex immune microenvironment
There are significant differences between the high- and low-risk groups (Supplementary Fig. 2A). To explore their underlying biological processes, we analyzed the differentially expressed genes between the high- and low-risk groups using single-sample gene set enrichment analysis (ssGSEA). In the Molecular Signature Database (MSigDB) “hallmark” collection of major biological categories, the high-risk group gliomas showed pathway enrichment in (1) Remodeling of the TME (“Angiogenesis”, Hypoxia”), (2) Inflammation (“Inflammatory Response”, “Allograft Rejection”), (3) Proliferation and metastasis (“G2M checkpoint”, “E2F Targets”, “Epithelial-Mesenchymal Transition”), and (4) Immune cell activation states (“TNF⍺ Signaling via NF-κB”, “Interferon ⍺ Response”, “Interferon γ Response”) (Supplementary Fig. 2B). Similar results were obtained from the Gene Set Enrichment Analysis (GSEA) (Fig. 3A). Moreover, the Ingenuity Pathways Analysis (IPA) analysis was employed to examine the upstream regulators of known pathway structures using genes with differential expression ( > = 2-fold, p-value < 0.05). The results from IPA revealed a significant activation of the TME-associated network (Supplementary Fig. 2C and D and E). Of note, in the high-risk group, the expression of VEGF-suppression molecules SOX1, SOX3, and GMNN was remarkably downregulated, while that of VEGF-promoting molecules TNF, IFNG, IL6 was upregulated (Supplementary Fig. 2D and E). Collectively, these findings indicated that the high-risk group displayed a more pronounced feature of malignancy.
3.4 High risk of glioma associates with a suppressive immune microenvironment
Immune checkpoint treatment, such as the PD1-PD1 ligand 1 (PD-L1) axis and cytotoxic T lymphocyte-associated antigen 4 (CTLA4), has proffered dramatic successes against various solid tumors(36, 37). While immune checkpoint inhibitors (ICIs) have revolutionized cancer treatment, unfortunately, they have had little therapeutic success in GBM(8, 38). To assess the relationship between glioma risk and the landscape of immune cell infiltration, we used Estimation of STromal and Immune cells in MAlignant Tumour tissues using Expression data (ESTIMATE) to evaluate the presence of infiltrating stromal and immune cells in glioma tissues(23). Significant differences were found for immune score (p < 0.001) and stroma score (p < 0.001, Fig. 3B) between the two groups of the disease, revealing a more complex tumor microenvironment in the high-risk group. In addition, CIBERSORT was used to infer relative frequencies of immune cells in the tumors(22). Surprisingly, no significant difference was observed in the distribution of CD8 + T cells between the high- and low-risk groups. In contrast, suppressive immune cells such as regulatory T cells were significantly enriched in the high-risk group, but most of these regulatory T cells exhibited a resting phenotype (Fig. 3C). Overall, the high-risk group showed a more complex microenvironment than the low-risk group, with the majority of effector cells in a resting state, suggesting the presence of significant immunosuppressive activities.
3.5 High-risk group associates with recruitment of bone marrow derived myeloid cells
To clarify the biological role of the 4 SOX signature genes in shaping an immunosuppressive microenvironment in glioma, we created an immune gene list that was strongly correlated with risk score by Pearson correlation analysis (Pearson |R| > 0.4). The heat map demonstrates a large number of immune genes strongly associated with the SOX signature, again suggesting a complex immune microenvironment in the high-risk group (Fig. 3D). Gene ontology (GO) enrichment analysis suggested that activation of myeloid cells was the most typical event among them (Fig. 3E).
Given that the immune cell types of the central nervous system are different from those of the peripheral blood, the LM22 signature matrix built-in CIBERSORT may result in bias. We, therefore, calculated the relative abundance of bone marrow-derived macrophages (BMM) and microglia (MG) using the signature genes by the GSVA algorithm(9). These results revealed that the high-risk group gliomas displayed a significant increase in BMM cells instead of MG cells compared to the low-risk gliomas (Fig. 3F). In addition, the high-risk group gliomas also had dramatically higher expression of ICPs, such as CD274, PDCD1LG2, CTLA4, PDCD1, HAVCR2, TIGIT, compared to the low-risk group gliomas, which could cause immune tolerance and immune escape (Fig. 3G).
Based on these analyses, our results highlight the importance of BMM cells in glioma tumors, which may be involved in the formation of a suppressive immune microenvironment.
3.6 SOX-based risk model could identify more malignant cells well at single cell level
ScRNA-seq can provide detailed information on the transcriptional state of cancer cells with single cell resolution(39). In order to better explore the biological significance of the model, we further validated the model's performance based on scRNA-seq analysis. SOX2-positive tumor cells and MOG-positive normal glial cells were included in the analysis; these cells were further clustered into eight subgroups (Fig. 4A). Among the four genes that made up the SOX-based risk model, SOX3, SOX6, and SOX8 were highly expressed in MOG + normal OLs and OLIG2 + stem-like tumor cells, whereas SOX9 was highly expressed in AQP1 + and TNFRSF12A + tumor cells (Supplementary Fig. 3A). We then constructed a transcriptional trajectory to explore the relationship among these three tumor cell subtypes on a pseudo-time scale using Monocle(32). This analysis showed that the OLIG2 + stem-like tumor cells were at the beginning of the trajectory path, while the two branches of the trajectory were populated by TNFRSF12A + cells for fate one and AQP1 + cells for fate two (Fig. 4B). We next investigated the transcriptional changes associated with the transition states and observed that the expression of SOX6/SOX8/OLIG2 was decreased with pseudo time, and the expression of SOX9/TNFRSF12A/AQP1 was increased with pseudo time (Fig. 4C). Interestingly, at the bulk level, OLIG2 level also correlated with SOX3, SOX6, and SOX8 levels (Supplementary Fig. 3D), while TNFRSF12A and AQP1 levels correlated with SOX9 level (Supplementary Fig. 3E).
Notably, AQP1 and TNFRSF12A were enriched in the high-risk group, whereas OLIG2 was highly expressed in the low-risk group, which was consistent with the data at the bulk level (Supplementary Fig. 3B). Of note, high TNFRSF12A and AQP1 levels and low OLIG2 levels also predicted poor prognosis in the TCGA glioma set (Supplementary Fig. 3C).
Taken together, we have uncovered the relationship between specific cell types and the risk genes: High levels of SOX3, SOX6, and SOX8 are markers for relatively benign cell types, and upregulation of SOX9 is more likely a marker in the much more malignant cell type. The SOX-based risk score can reflect the proportion of relatively benign and malignant cells in the tumor and the evolution process of tumor malignancy.
3.7 SOX9 + malignant cells have more substantial proliferative and invasive potential
We delved further into the underlying biological processes enriched in the OLIG2+, TNFRSF12A+, and AQP1 + cell types and explored the reasons for their different prognosis. TNFRSF12A + cells were characterized by the increased proliferation-related pathway activity, while AQP1 + cells were characterized by EMT-related pathway activity (Fig. 4D). To verify the proliferation ability of different tumor sub-clusters, we used a weighted vote classifier based on the reference cell identities to deconvolute the composition of the MKI67 + proliferation cells. Consistent with the enrichment analysis results, 60% of these MKI67 + cycling cells were contributed from the TNFRSF12A + cells showing the most vigorous proliferation activity, (Fig. 4E). Of note, in the TNFRSF12A + population, proliferating cells are twice as many as resting cells (Fig. 4E). It is also worth noting that the AQP1 + cells hardly participated in proliferation, but their EMT activity was the strongest (Fig. 4E and 4F), which suggested that AQP1 + cells could be the origin of glioma metastasis.
3.8 High SOX9 level associates with poor prognosis in glioma.
Our RNA-seq data analyses have demonstrated that high level of SOX9 correlates with malignancy and poor prognosis, probably by promoting cell proliferation and immunosuppressive microenvironment. To further investigate the clinical application value of SOX9, we performed IHC on 140 glioma specimens. The distribution and positivity of SOX9 varied in different tumor tissues; but notably, there were no tumor tissues with complete SOX9 positivity, further suggesting that SOX9 + tumor cells might be differentiated from tumor stem cells after tumorigenesis (Fig. 5A). The expression of SOX9 correlated with age, DFS, EGFR, and Ki-67 levels, indicating the proliferative ability of SOX9 positive cells (Fig. 5B), which further validated our bioinformatics analysis. Finally, when patients were stratified according to SOX9 expression intensity, those with high SOX9 expression had a significantly poorer prognosis than those with low SOX9 expression (Fig. 5D).
3.9 SOX9 + cells recruit BMM to shape the suppressive immune microenvironment
Angiogenesis and inflammatory-related pathways were activated in both TNFRSF12A + and AQP1 + cells (Fig. 4D), suggesting that they could play an important role in the remodeling of the tumor microenvironment by interaction with other cells. Cellular communications were evaluated by using CellChat, which associates each interaction with a probability value to identify differentially expressed ligands and receptors(34). Significant interactions were detected between BMM-TNFRSF12A+, BMM-AQP1+, MG-TNFRSF12A+, or between MG-AQP1 + cells, representing CX3C and CSF signaling pathways (Fig. 6A). Specifically, high levels of CX3CR1 and CSF1R were detected in BMM and MG cells, and their corresponding ligands CX3CL1 and CSF1 were highly expressed in TNFRSF12A + and AQP1 + cells, suggesting the cell communication between immune cells and tumor cells (Fig. 6B).
We further used multicolor immunofluorescence to explore the infiltration of myeloid cells. CD68 was used to label myeloid cells, CD163 to label BMM, and TMEM119 to label microglia (MG). Significant BMM infiltration occurred in tumors with high SOX9 expression, whereas there was little difference in MG infiltration regardless of high or low SOX9 expression (Fig. 6E-G). These observations suggested the widespread occurrence of the immune-regulatory interactions among BMM, MG cells, and TNFRSF12A+, AQP1 + cells in glioma, implying an overall immune suppressive microenvironment in SOX9-expressing glioma.
In general, the analysis at the single-cell level explained the results of the bulk level well and made new expansions. The high-risk group's high proliferation and invasion characteristics are represented by the TNFRSF12A + and AQP1 + cells, respectively. Besides, they can all produce inflammatory chemokines and attract myeloid cells to form a complex inhibitory microenvironment.