Short Survival-Related Genes Harbor EMT Processes and Dendritic Cell Inltration in Glioblastoma

Background: Glioblastoma is an aggressive primary tumour with the lowest survival time among brain tumours. Tumour-inltrating immune cells (TIICs) are involved in tumour progression and determine the prognosis, while the association of immune cell inltration with glioblastoma is rarely unknown. This study aimed to screen survival-related (SR) genes and major biological processes through bioinformatic analysis and to identify the relationship between SR genes and TIICs. Methods: SR genes were screened by comparing the long-term (>36 months) and short-term (<12 months) survivors in the database GSE53733. Gene set enrichment analysis (GSEA) was applied to compare the differences in biological processes between long-term survivors and short-term survivors. The SR genes were identied using the limma package of R. Gene Ontology (GO) analysis was conducted through Metascape. The protein-protein interaction (PPI) network of the SR genes was established through the Search Tool for the Retrieval of Interacting Genes (STRING) website and further analysed by the Molecular Complex Detection (MCODE) algorithm. UALCAN and GlioVis were employed to analyse the expression levels and prognostic value of hub genes. The correlation of hub genes with immune cell ltration was estimated by the Tumor Immune Estimation Resource (TIMER). The gene-drug interaction network was constructed using the Comparative Toxicogenomics Database (CTD). Results: The functions of the detected genes were mainly enriched in epithelial mesenchymal transition (EMT) and oxidative phosphorylation. Of the detected genes, a total of 220 SR genes were identied, including 78 upregulated genes and 142 downregulated genes in long-term survivors. The upregulated genes were mainly related to neuron projection morphogenesis, extracellular matrix, and cation channel activity. The downregulated genes were mainly related to extracellular matrix organization and angiogenesis. The PPI network for SR genes was constructed with 65 edges and 195 nodes, and two signicant modules were selected. The results indicated that COL1A2, COL6A2, COL8A1, and COL8A2 were hub SR genes. In addition, they were correlated with immune cell inltration, dendritic cell inltration. Conclusions: These results revealed that collagens accounted for the progression and prognosis of glioblastoma. In addition, DC inltration is a risk factor for glioblastoma patients. The expression of collagen protein COL6A2 was signicantly correlated with the DC inltration level and poor prognosis. Further, potential drugs that affect the function of COL6A2 could improve the outcomes of glioblastoma.


Background
Glioblastoma, sometimes called glioblastoma multiforme (GBM), is the most aggressive primary brain tumour, with a 5-year survival rate of 6.8%, and accounts for approximately 14.6% of all brain and other central nervous system tumours and 48.3% of malignant gliomas [1]. Despite the application of computed tomography (CT) and magnetic resonance imaging (MRI) for diagnosis and the advances in multimodality therapy, the overall prognosis of GBM is still poor, and the long-term survival remains the lowest among brain and other central nervous system tumours, with a median survival of only 14 months [2].
The survival of cancer patients is mainly associated with molecular factors and clinical features including age, tumour site and therapies. In GBM, numerous studies have indicated that epidermal growth factor receptor activates the RTK/RAS/PI3K pathway, leading to increased proliferation and is associated with poorer survival [3][4][5]. Furthermore, methylation of O6-methylguanine-DNA methyltransferase (MGMT), mutation of isocitrate dehydrogenase 1/2 (IDH1/2), PTEN and immune cell in ltration also affect the prognosis of GBM [6,7]. It seems that studying survival-related factors is meaningful for improving the outcomes of cancer patients.
In solid tumours, the tumour microenvironment (TME) is the soil for tumour progression and consists of cancer cells, endothelial cells, cancer-associated broblasts (CAFs), immune cells and noncancer stromal cells [8]. Immune cells are the guardians of the human body. Both innate and adaptive immune cells can interact with cancer cells via expressed antigens on the surface and affect the proliferation and invasion of carcinoma [9]. Firm evidence has indicated that tumours with high expression of programmed death ligand 1 (PD-L1) and indoleamine-2,3-dioxygenase (IDO) display high in ltration of regulatory T cells (Tregs) and have a poor prognosis [10]. Currently, a few drugs targeting PD-L1 have been approved for cancer immunotherapy to improve clinical outcomes [11,12]. Interestingly, during the evolutionary trajectories of cancer, the remodelling of the TME by alteration of immune cell in ltration leads to diverse tumour behaviour, including immune escape and tolerance [13,14]. Thus, it is important to clarify the relationship of tumour-related molecular expression with immune cell in ltration, especially in SR molecules.
In this study, we identi ed collagens as negative survival-related (NSR) genes through bioinformatic analysis of gene expression pro ling and analysed the correlation of collagens with immune cell in ltration. Furthermore, we screened potential small molecular drugs that improve clinical outcomes by

Identi cation of SR genes
The limma package [15] in R was used to choose the SR genes in long-term survivors compared with those in short-term survivors. While a gene symbol may correspond to multiple probes (expression values), the max value is the nal expression value for that mRNA. The cutoff criteria were p-value < 0.05 and |log fold change (FC)| >0.5. The ggplot2 and pheatmap packages of R were applied for the visualization of SR genes in a volcano plot and a heatmap.

Functional enrichment analysis
Gene set enrichment analysis (GSEA, version 4.1.0) is a computational method that determines whether an a priori de ned set of genes shows statistically signi cant, concordant differences between two biological states [16]. In this study, the hallmark gene set of Molecular Signatures Database v7.2 of GSEA was used to explore the enrichment of gene sets among the detected genes.
To obtain deep insight into the biological functions of critical genes, Gene Ontology (GO) annotation of SR genes was performed with Metascape [17] (http://metascape.org/). GO functional analysis contains biological processes (BP), molecular functions (MF), and cellular components (CC). The human genome (Homo sapiens) was selected as the background variable. Gene count ≥ 3 and p-value < 0.05 were set as the threshold. The results were visualized using Prism 8.

Protein-protein interaction network (PPI) construction
The PPI network of SR genes was constructed using the Search Tool for the Retrieval of Interacting Genes (STRING) database (https://string-db.org/). STRING is a biological database designed to construct a PPI network of SR genes based on known and predicted PPIs and then analyse the functional interactions between proteins [18]. The criterion was set at con dence score greater than 0.7. The Molecular Complex Detection (MCODE) plug-in of Cytoscape (version 3.9.0) was applied to screen the key modules of the PPI network [19]. The advanced options were set as degree cutoff = 3, K-Core = 2, and node score cutoff = 0.2.

Expression and survival analysis of collagen genes
The analysis of the relative expression of the four collagen genes was performed using UALCAN (http://ualcan.path.uab.edu), a user-friendly, interactive web resource for analysing cancer transcriptome data [20]. Furthermore, the ggplot2 and ggpubr packages of R were applied for the visualization of collagen gene expression between long-term survivors and short-term survivors in dataset GSE53733.
GlioVis [21] (http://gliovis.bioinfo.cnio.es/) is a user-friendly web application for data visualization and analysis used to explore brain tumour expression datasets, which is mainly based on the Chinese Glioma Genome Atlas (CGGA) and The Cancer Genome Atlas (TCGA) databases and other researcher's data. In this study, we analysed the relation of OS with the expression of collagen genes through Kaplan-Meier (KM) survival estimates in the TCGA-GBM dataset.

Correlation of collagens with immune cell in ltration
The Tumor Immune Estimation Resource (TIMER) database (https://cistrome.shinyapps.io/timer/) is an online tool for systematic analysis of immune cell in ltration across diverse cancer types from TCGA [22]. TIMER uses a deconvolution algorithm to estimate the abundance of tumour-in ltrating immune cells (TIICs) based on gene expression pro les [23]. First, we evaluated the association between collagens and tumour purity and the association between clinical outcome and the abundance of immune in ltrates in GBM. Then, we explored the correlation of collagen gene expression with the abundance of immune cell in ltration.

Gene-drug interaction network analysis
The gene-drug interaction network was constructed using the Comparative Toxicogenomics Database [24] (CTD) for chemotherapeutic drugs that could decrease or affect the mRNA, protein expression or methylation of genes.

Statistical analysis
Statistical analysis was performed with R (version 4.0.1). Student's t-tests were utilized for the comparison of two sample groups. Differences were considered statistically signi cant when p < 0.05.

SR genes between long-term survivors and short-term survivors in GBM
First, we analysed the physiological processes involving the detected genes using GSEA. Epithelial mesenchymal transition (EMT), TNF-α signalling via the NF-κB and the G2M checkpoint were enriched in short-term survivors, and oxidative phosphorylation was enriched in long-term survivors (Fig. 1a).
According to the cutoff of a logFC value of 0.5, a total of 220 SR genes were identi ed, including 78 upregulated genes in long-term survivors, which were de ned as positive SR genes (PSR genes), and 142 downregulated genes in long-term survivors, which were de ned as NSR genes (Fig. 1b&c). Since EMT is a vital process associated with cancer prognosis [25] and had the highest score in GSEA, we analysed the genes involved in NSR and PSR genes. As shown in Fig. 1d, 11 NSR genes were enriched in the EMT gene set, while no PSR genes were enriched. The 11 enriched genes included COL1A2, COL6A2, COL8A2, FBN2, FMOD, LAMA1, PCOLCE, PLOD2, SERPINE1, TFPI2, and VEGFA.

Functional enrichment analysis of SR genes
To further determine the signi cance of SR genes in GBM, gene enrichment analysis was performed based on the PSR genes and NSR genes. The NSR genes were mainly enriched in embryonic organ development, embryonic morphogenesis, and skeletal system development in the BP terms; extracellular matrix (ECM), collagen-containing ECM, and basement membrane in the CC terms; and ECM structural constituent, proximal promoter sequence-speci c DNA binding, and ECM structural constituent conferring tensile strength in the MF terms. Accordingly, the PSR genes were mainly associated with neuron projection morphogenesis, plasma membrane bounded cell projection morphogenesis, cell projection morphogenesis in BP gene sets; extracellular matrix, axon, neuronal cell body in the CC gene sets; and potassium ion transmembrane transporter activity, voltage-gated potassium channel activity, cation channel activity in the MF gene sets (Fig. 2b). These results indicated that EMT-related processes are the core mechanism by which NSR genes affect GBM prognosis. To further con rm these ndings, a proteinprotein interaction (PPI) network derived from NSR genes and PSR genes was mapped using the STRING database (Fig. 2c). Interestingly, among the resulting nodes, two nodes with higher scores were screened out by MCODE (Fig. 2d, e). COL1A2, COL6A2, COL8A1, COL8A2 and PLOD2 comprised module 1 with a score of 5.0 (Fig. 2d). HOXB3, HOXA5, HOXB5, and HOXB7 comprised module 2 with a score of 4.0 (Fig. 2e). All the genes of module 1 and module 2 were derived from NSR genes.

Expression and survival analysis of collagen genes
Noting that COL1A2, COL6A2 and COL8A2 are EMT and signi cant PPI node participants, we further explored whether these genes predict poor outcomes in GBM. As shown in Fig. 3a, the expression of the three mentioned collagen proteins was markedly downregulated in short-term survivors (p < 0.05).
Moreover, expression analysis based on TCGA-GBM RNAseq datasets shown that these genes are overexpressed in primary tumour tissues compared to normal tissues (Fig. 3b). Since it belongs to the collagen proteins, COL8A1 was also assessed. High expression of these genes was signi cantly associated with poor prognosis (Fig. 3c), which was consistent with their inclusion in the NSR gene set.

The expression of collagen genes is correlated with immune cell in ltration in GBM
The process of EMT is closely related to tumour heterogeneity, which contributes to resistance and treatment failure [26]. To explore whether the screened collagen genes are associated with the tumour heterogeneity of GBM, we examined the correlation of the tumour purity of GBM tissues and collagen proteins using TIMER. Although no signi cant difference in tumour purity was found between the shortterm and long-term samples ( Supplementary Fig. 1a), the expression of three screened collagen genes, COL1A2 (r = -0.320, p < 0.05), COL6A2 (r = -0.261, p < 0.05), and COL8A2 (r = -0.369, p < 0.05), was negatively correlated with tumour purity, suggesting that these three collagen genes are involved in tumour heterogeneity (Fig. 4a). The aggregation of TIICs is a key component of the tumour microenvironment. TIICs interact with tumour cells and domesticate each other, forming a community to maintain the malignant phenotype and heterogeneity [27]. We analysed the association between immune cells and survival in GBM. As shown in Fig. 4b, only dendritic cell (DC) in ltration was signi cantly associated with survival. Lower DC in ltration predicts better survival outcomes. Furthermore, the expression of COL1A2 (r = 0.392, p < 0.05), COL6A2 (r = 0.461, p < 0.05), and COL8A2 (r = 0.350, p < 0.05) was positively correlated with the in ltration level of DCs (Fig. 4c). To further con rm the ndings, the correlation of somatic copy number of collagen proteins and DC in ltration was analysed. As shown in Fig. 4d, arm-level gain of COL1A2 and arm-level deletion of COL6A2 were associated with DC in ltration. The immune functions of tumour-in ltrating DCs are performed by activated DCs. Moreover, the expression of COL6A2 was negatively correlated with myeloid DC activation (ρ=-0.182, p < 0.05).
Collectively, these data suggested that DC in ltration predicts poor outcomes in GBM patients and that the elevation of COL6A2 possibly regulates activated DC in ltration.

COL6A2-drug interaction network analysis
Based on the results of in ltration level analysis, COL6A2 is suggested to be the main regulator of DCmediated immune anergy in GBM. Overall, we screened 9 drugs that can reduce the expression or affect secretion of COL6A2 protein, including erianin, sodium uoride and triclosan. The screening results also shown that 31 drugs can reduce the expression of COL6A2 mRNA, including cisplatin, clo brate and cytarabine. In addition, there were 5 drugs that increase or affect the methylation of COL6A2, including valproic acid, sodium arsenite and a atoxin B2 (Fig. 5), indicating that these drugs could potentially improve survival for GBM patients.

Discussion
In the present study, we identi ed GBM SR genes and uncovered that NSR genes regulate EMT processes in glioblastoma. Among the NSR genes, collagen genes, including COL1A2, COL6A2, and COL8A2, predict high-risk GBM with shorter survival time and are related to enhanced tumour heterogeneity and DC in ltration.
Tumour prognosis involves multiple mechanisms. EMT is an essential process associated with metastasis and drug resistance in cancer [28,29]. In this study, we found that genes upregulated in shortterm survivors were enriched in EMT, which had the highest normalized enrichment score (NES). EMT involves cell-cell and cell-extracellular matrix interactions [30]. Furthermore, we screened 220 SR genes and found that 11 EMT gene set members were integrated with NSR genes, including three matrix proteins. Among SR genes, the NSR genes were mainly associated with multicellular organism development and ECM organization. The ECM is a crucial structure for tumours that can promote the growth, survival, and invasion of tumours and modify broblast and immune cell behaviour [31,32]. Moreover, the matrix and stromal cells can also modulate the e cacy of therapy, and methods that reshape the tumour matrix could improve the outcomes of patients [33,34]. Thus, the ECM might be an interesting direction for exploring hub SR genes. In this study, the PPI network was constructed and modules were identi ed with the MCODE plug-in of Cytoscape. The results implied that the collagen proteins COL1A2, COL6A2, COL8A1, and COL8A2 were hub SR genes, and further investigations were carried out.
The collagen superfamily is the most important group of ECM proteins and is characterized by three signature features and comprises 28 members. Studies have revealed that collagens promote the proliferation, metastasis and invasion of cancers [35,36]. In addition, overexpression of collagens can enhance the resistance of cancers to chemotherapy drugs, leading to poor prognosis [37,38]. In this study, we found that COL1A2, COL6A2, COL8A1, and COL8A2 were overexpressed in GBM compared to normal tissue and that they were downregulated in long-term survivors compared to short-term survivors, which con rmed that the four mentioned collagen proteins are oncogenes and SR genes. Collagens interact with cancer cells through receptors and play a crucial role [39]. Discoidin domain receptors (DDRs) are a subfamily of tyrosine kinases that are activated by collagens. Research has shown that the activation of DDR2 by COL1 regulates SNAIL1 stability and promotes breast cancer cell invasion and migration [40]. Others further reported that the binding of COL11A1 to DDR2 activates Src-PI3K/Akt-NF-kB signalling and inhibits cisplatin-induced apoptosis in ovarian cancer cells [41]. In addition to DDRs, integrin is also a receptor of collagens, and the binding of integrin to COL1 might enhance the proliferation and invasion of squamous cell carcinoma cells via the MEK/ERK signalling pathway [42]. Collectively, it seems that collagens promote the progression of tumours via multiple approaches.
Collagens bind to receptors that correlate with cancer behaviours and prognosis. Collagens combine with immune cells and comprise the main component of the TME. However, the correlation of collagen expression with immune cell in ltration in GBM is rarely unknown. Studies have shown that immune cell in ltration is highly relevant to antitumor responses and prognosis [43,44]. In this study, we estimated the association of six immune in ltrates with patient survival. We found that tumour-in ltrating DCs (TIDCs) are signi cantly associated with clinical outcome in GBM, and a high level of DC in ltration predicts poor cumulative survival. In general, higher survival rates were found in patients harbouring larger populations of DCs and T lymphocytes [45,46]. However, due to the presence of suppressive immune cells, especially Tregs, tumour cells are able to escape the immune system [47,48]. In DC-mediated tumour immunity, the prognostic impact is related to the TIDC phenotype; mature DCs have been considered immune stimulatory, whereas immature DCs have been considered suppressive and tolerogenic [49]. In the current study, we found that the expression of COL6A2 was negatively correlated with the abundance of activated DCs. DC activation is suppressed by tumour-derived molecules, including PD-L1 and Tim3 [50,51]. A recent study shown that VEGF can impair the migration capacity and immune function of mature DCs and contribute to immunosuppression [52]. In our study, we found that the expression of COL1A2, COL6A2, and COL8A2 was positively correlated with the DC in ltration level. A previous study demonstrated that collagens could promote DC survival and promote the maturation of monocyte-derived DCs via osteoclast-associated receptors [53]. In this study, COL6A2 increased the level of DC in ltration but decreased the level of activated DC in ltration. Thus, it might be the main factor that leads immune escape and poor prognosis.
The above conclusions indicate that COL1A2, COL6A2, COL8A1, and COL8A2 are oncogenes and SR genes. Recent studies have shown that depleting collagens improves the therapeutic e cacy of antitumour drugs [54][55][56]. In this study, we found that cisplatin, clo brate and cytarabine can decrease the expression of COL6A2 mRNA, while valproic acid can increase the methylation of COL6A2. Cisplatin is a rst-line chemotherapy drug and is used in various tumours, and valproic acid is a selective inhibitor of histone deacetylase. Barneh's research shown that valproic acid inhibits stromal cell function and exerts anticancer effects [57]. In GBM, valproic acid is considered a favourable SR drug that acts through multiple mechanisms [58][59][60][61].
In conclusion, our results highlight the key roles of collagens in GBM prognosis and immune cell in ltration. Furthermore, we reveal the potential mechanism through which valproic acid regulates GBM progression via COL6A2 methylation. Future studies will aim to investigate additional mechanisms involved in collagen methylation and tumorigenesis.

Conclusions
In this study, we screened SR genes by comparing the long-term and short-term survivors in the database GSE53733. Through multiple bioinformatic analysis, it determined that short-survival genes were mainly associated with EMT associated process. We identi ed three collagen genes were most signi cantly associated with EMT and short survival regulation in GBM. Further analysis indicated that COL1A2, COL6A2 and COL8A2 negatively correlated to tumour purity and DCs in ltration. Survival analysis uncovered that DCs in ltration is a risk factor for GBM, and A novel nding is that COL6A2 was signi cant correlation with DCs in ltration and leading adverse prognosis.      Gene-drug interaction network constructed with COL6A2 and chemotherapeutic drugs. Overview of drugs that decrease protein and mRNA expression and affect methylation of COL6A2. Drugs that decrease the mRNA expression were exhibited for 15 terms.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. supplement guresNI.pdf