Somatic mutational phenotypes in glioblastoma and IDH1 mutation frequencies in pan-cancer.
Somatic mutational data were used to descript the genomic mutational landscape of 20 genes with the highest mutation frequency in GBM. Missense mutation is the most common alteration. There are 6.6% of GBM patients with IDH1 mutation. The IDH1 mutation frequency in pan-cancer revealed a highest mutation frequency in diffuse glioma and a fourth mutation frequency in GBM. It has been demonstrated that IDH1 mutation is crucial for the diagnosis and prognosis of GBM. Here, we performed the survival analysis to validate the predictive function of IDH1 mutation in TCGA and CGGA. The Kaplan-Meier curves showed that IDH1MUT had a higher survival probability than IDHWT in both TCGA and CGGA, log-rank p value < 0.05 was considered as significant (Fig. 1).
Common Differentially Expressed Genes in TCGA and CGGA
Due to the significant roles of IDH1 mutation in GBM, this study aimed to investigate the molecular mechanism differences between IDH1MUT and IDH1WT GBM and managed to construct a robust prognostic model for patients with GBM. First, the ‘limma’ package in R was used to identify the DEGs in IDH1MUT compared to IDHWT. There were 3106 DEGs in TCGA and 2298 DEGs in CGGA. The blue circles and red circles in volcano plots represent the down-regulated DEGs and up-regulated DEGs (Fig. 2A-B). The Venn diagram showed 1049 common DEGs between two datasets, among which 838 DEGs were down-regulated, 208 DEGs were up-regulated and 3 DEGs were reversely regulated in two datasets (Fig. 2C).
GO and KEGG pathway analysis of common DEGs
838 common down-regulated DEGs and 208 common up-regulated DEGs were used to perform the GO and KEGG pathway analysis. The top5 enrichment GO terms in common down-regulated DEGs were cytokine-cytokine receptor interaction, PI3K-Akt signaling pathway, focal adhesion, pathways in cancers and proteoglycans in cancer. The top5 enrichment KEGG pathways in common down-regulated DEGs were signal transduction, cell adhesion, inflammatory response, immune response and positive regulation of cell proliferation. The top5 enrichment GO terms in common up-regulated DEGs were neuroactive ligand-receptor interaction, cAMP signaling pathway, glutamatergic pathway, basal cell carcinoma and Steroid hormone biosynthesis. The top5 enrichment KEGG pathways in common up-regulated DEGs were negative regulation of transcription from RNA polymerase II promoter, cell adhesion, nervous system development, positive regulation of transcription (DNA templated) and chemical synaptic transmission (Fig. 2D-E). All these pathways are associated with the biological process of cancer.
Determination of the hub genes
In order to further investigate the molecular mechanism differences between IDH1MUT and IDH1WT GBM, the starBase v2.0 was used to identify the target genes of the lncRNAs in common DEGs. As a result, twenty-nine lncRNAs in DEGs targeted 697 unique genes. There were 16 common genes in 697 target genes and 1049 common DEGs. The 16 common genes were marked as candidate genes in the following analysis. Survival analysis were performed between high expression group and low expression group of 16 candidate genes in the GBM cohort in TCGA and CGGA. MICA was the only one whose p value less than 0.05 in both datasets (Fig. 3). In this study, MICA was the potential target of lncRNA HCP5. Therefore, we speculated that HCP5/MICA axis may play a significant role in GBM.
Expression of HCP5 and MICA
The expression levels of HCP5 and MICA were analyzed in 33 types of tumor and the corresponding normal tissue samples. The expression data of tumor samples were obtained from TCGA, the majority of normal tissue were obtained from GTEx due to the few normal samples in TCGA. The violin plots showed that the expression of HCP5 is higher in most tumor types than that in normal tissue and that MICA is differentially expressed in different types of tumor and normal tissue, but they are both higher in GBM than that in normal brain tissues (Fig. 4A-B). Then we analyzed the expression of HCP5 and MICA in IDH MUT and IDH1WT GBM, showing a significant higher expression of both genes in IDH WT GBM than that in IDH MUT GBM (Fig. 4C-D). Considering that the subcellular localization of lncRNA implies the underlying mechanism, we utilized the lncLocator[10] to predict the subcellular localization of HCP5 and MICA. As shown in Fig. 4G, HCP5 was mainly located in cytoplasm and MICA was mainly located in cytoplasm and cytosol, which suggested that HCP5 may improve the expression of MICA (Fig. 4E). Analysis of expression correlation of HCP5 and MICA in GBM revealed that the expression of MICA was strongly correlated with the expression of HCP5 (Fig. 4F-G). Then we analyzed the interactions between HCP5 and MICA using starBase v2.0. There are 9 interaction regions in HCP5 and MICA and three interactions with lowest free energy are shown in Fig. 4H. Their interactions were validated in 3 three supported experiments[12, 13]. These data indicated that HCP5 can improve the expression of MICA in GBM.
Correlation analysis between MICA expression and immune infiltration and immune checkpoints.
Immune infiltration analysis revealed the correlation between MICA expression and scores of six types of immune cell in GBM. MICA expression was significantly correlated with the immune infiltration of macrophage (R = 0.174, p = 0.0327) and dendritic (R = 0.25, p = 0.00195) in GBM (Fig. 5A). In order to explore the roles of MICA in GBM immune microenvironment, the correlation between MICA expression and immune score, estimate immune score and stromal score were analyzed. As shown in Fig. 5B-D, MICA expression was positively correlated with immune score (R = 0.278, p < 0.001), estimate immune score (R = 0.291, p < 0.001) and stromal score (R = 0.303, p < 0.001) in GBM. Furthermore, we analyzed the correlation between MICA expression and the expression levels of 47 immune checkpoint genes in 33 types of tumor. MICA expression was significantly correlated with expression levels of immune check point genes in pan-cancer, especially in LGG (low grade glioma), OV and LIHC. In GBM, there were 21 immune check point genes correlated with the MICA expression (Fig. 5E).
Methylation analysis of MICA in GBM
To further unravel the roles of MICA in GBM, we also explored the correlation of methylation sites of MICA and the survival of GBM patients. The heatmap derived from UCSC Xena displayed the survival days, survival status and 74 methylation sites of MICA in 151 patients with GBM (Fig. 6A). For each methylation site, 151 patients with GBM were divided into high methylation and low methylation groups according to their beta values. Survival analysis between high methylation and low methylation groups determined three methylation sites (cg16203549, cg18188079, cg09210225) that were associated with the OS of GBM. High methylation of cg16203549 and low methylation of cg18188079 and cg09210225 are the favorable prognostic factors in GBM (Fig. 6B-D). Then we analyzed the correlation between MICA expression and the beta value of methylation. As shown in Fig. 6E-G, MICA expression was negatively correlated with the methylation of cg16203549 (R=-0.28, p = 0.025), and is positively correlated with the methylation of cg18188079 (R = 0.43, p < 0.01) and cg09210225 (R = 0.41, p < 0.01). These data suggested that the methylation of cg16203549, cg18188079 and cg09210225 regulated the MICA expression and eventually influence the survival time of GBM patients.
The MICA-associated gene signature in GBM
Ten MICA-associated genes were determined by String database and five protein-protein associations existed in MICA and the proteins encoded by its associated genes (Fig. 7A). Then the expression data of these 11 genes in GBM in TCGA were used to perform the LASSO cox regression. Six genes with non-zero coefficients were determined by the best penalty parameter λ. Four genes (MICA, PDIA6, ULBP2, KLRD1) had positive coefficients, whereas two genes (KLRK1, MICB) had negative coefficients (Fig. 7B-D). We calculated the risk score for each patient according to the formula mentioned in the method. Survival analysis revealed that low risk score group had a superior OS than the high risk score group in GBM, and it was validated in the CGGA dataset (Fig. 7E-F). Therefore, the MICA-associated gene signature consisting of six genes may be a potential prognosis biomarker for GBM.
Prognostic model for GBM
The development and progress of GBM involves intricate mechanism and the prognosis of patients with GBM can be affected by various factors. Here, we subjected the risk score, age, gender, IDH1 mutation, TP53 mutation, radiation and chemotherapy into the nomogram to construct a prognostic model for individual GBM patient. The point for each variable ranges from 0 to 10. Risk score is the variable taking the largest share in the total points. The favorable prognostic factors for variables are younger age, female, IDH1 mutant, TP53 mutant, receiving temozolomide chemoradiation and temozolomide chemotherapy, low risk score. Given a specific total point of a patient with GBM, the survival probabilities at 1-year, 2-year and 5-year can be predicted from the nomogram (Fig. 8). The C-index of the nomogram is 0.76.