Gene expression analysis data
The relationship between human CBX3 (NM_007276.4 for mRNA or NP_009207.2 for protein) and pan-cancer was explored from many aspects in the study. We analyzed the distribution and expression of CBX3 in different tumor cells and tissues based on the combination of the TCGA, GTEx, CCLE and HPA datasets.
The TIMER2.0 tool was applied to analyze the expression of CBX3 in cancers of TCGA. As shown in Figure 1a, CBX3 in most tumor tissues, including BLCA, BRCA, CHOL, COAD, ESCA, GBM, HNSC, KIRC, KIRP, LIHC, LUAD, LUSC, PRAD, STAD, THCA(P<0.001), CESC, UCEC (P<0.01) has a higher expression level, compared with that in normal tissue. While, the KICH (P<0.01), and PCPG (P<0.05) have lower expression level of CBX3 than the corresponding control tissues.
With the help of GEPIA 2 web server, we combined TCGA and GTEx database to compare the expression of gene in tumor tissues and normal ones. The expression difference of CBX3 in SKCM, GBM, LAML, DLBC, LGG, PAAD, TGCT, UCS and THYM (Figure 1b, P<0.05) were further evaluated. However, for other tumors, such as ACC, SARC, OV and CESC (Figure S1), it showed no significant difference.
In order to analyze the mRNA expression of CBX3 in cancer patients, the ONCOMINE database was used. As were shown in Figure 1c, mRNA expressions of CBX3 in various types of cancers were measured and compared to normal tissues. Significantly higher mRNA expression of CBX3 was found in multiple tumor tissues.
We visualized the distribution and expression of CBX3 in normal and tumor human tissues via the GEPIA2 website. As shown in Figure 1d, the body map, dot plot and bar plot indicate that CBX3 is distributed in multiple tissues of the human body. In addition, in general, tumor tissues have a higher expression level of CBX3 compared with that in normal ones, apart from KICH, PCPG and LAML. The result is consistent with that from Figure 1a, and further confirms that CBX3 is lowly expressed in KICH, PCPG compared with normal controls.
Considering that tissue-based RNA expression detection might be complicated by the non-tumor tissues that are adjacent to tumor cells, we further analyzed the expression and distribution of CBX3 in 1457 cell lines from 26 tumor types based on the CCLE database. The box plot in Figure 1e demonstrates that the cell lines from burkitt lymphoma, neuroblastoma and lung_small_cell were the top ones expressing the highest levels of CBX3 mRNA. However, the cell lines from hodgkin lymphoma showed relatively lower CBX3 mRNA expression.
In order to clarify the expression level of CBX3 in pathological stages of different tumors, we applied the “Pathological Stage Plot” module of GEPIA2 tool. Among them, the cancers including LIHC (P<0.001), ACC (P<0.01), THCA, TGCT, KICH, OV (P<0.05) showed difference between gene expression and pathological stages (Figure 1f).
The results of the CPTAC dataset showed compared with the normal tissues, the primary tumors of breast cancer, colon cancer, ovarian cancer, clear cell RCC, LUAD and UCEC had higher expression of CBX3 total protein (Figure 1g, P<0.001). Moreover, the additional analysis results in the ONCOMINE database (Figure 1c) further confirmed that CBX3 is highly expressed in breast cancer, cervical cancer, lung cancer, kidney cancer, colorectal cancer and ovarian (P<0.01) compared with normal controls.
We tried to explore the protein expression patterns of CBX3 in cancers by the Human Protein Atlas. As was shown in Figure 1h, CBX3 proteins were not detected in normal liver and testis tissues, whereas low expressions of them were observed in normal skin and pancreas tissues. In addition, medium protein expressions of CBX3 were expressed in normal tissues of breast, lung, cervix, ovary and colon. while high protein expressions in their corresponding cancers were observed. Taken together, the results showed that genic and proteinic expressions of CBX3 were over-expressed in most cancers.
Data from the multiple databases showed that CBX3 mRNA and protein expression in tumor tissues was significantly higher, compared to that in normal tissues, indicating that it might function as an oncogenic molecule in the development of diverse tumors.
Survival analysis data
Next, we evaluated the effect of CBX3 expression on prognostic value for the patients in pan-cancer. According to the expression levels of CBX3, the cancer cases were divided into high-expression and low-expression groups. Notably, CBX3 expression was significantly correlated with OS in 11 types of cancer (ACC, CESC, HNSC, KICH, LGG, LIHC, LUAD, MESO, PAAD, THYM and UVM) (Figure 2a). In addition, except as a protective factor in THYM (P =1.7e-02, HR =0.98), CBX3 appeared to be a risk factor in other cancer types. Specifically, the results were shown in Figure2b, ACC (P =2.7e−07, HR =1.01), CESC (P =1.4e-02, HR =1), HNSC (P =2.0e-02, HR =1), KICH (P =1.7e-04, HR =1.04), LGG (P =4.6e-07, HR =1.01), LIHC (P =3.6e-07, HR =1.01), LUAD (P =3.4e-03, HR =1), MESO (P = 2.8e-02, HR =1.01), PAAD (P =2.8e-04, HR =1.01) and UVM (P =6.3e-03, HR =1.01). Subsequently, we investigated the relationship between CBX3 expression and DFI and found that increased gene expression was correlated with poor prognosis in ACC (P= 0.0021, HR= 1.01), CESC (P =0.0400, HR =1.01), KIRP (P =0.0140, HR =1.01), LIHC (P =0.0130, HR =1.01) and PAAD (P=0.0280, HR =1.01), but with favorable prognosis in OV (P= 0.0031, HR =0.99) (Figure 2c, d). Next, the relationship between CBX3 expression and PFI in 33 cancers from TCGA was analyzed. Results showed CBX3 expression impacted patients’ PFI in 12 cancer types (Figure 2e). Specifically, the Kaplan-Meier curves of all the cancers (Figure 2f) showed that high expression of CBX3 was significantly correlated with poor prognosis of patients in ACC (P =9.9e-10, HR =1.01), CESC (P =5.0e-03, HR =1.01), HNSC (P =1.8e-03, HR =1), KICH (P =3.1e-04, HR =1.04), KIRP (P =1.1e-02, HR =1.01), LGG (P =3.0e-07, HR =1), LIHC (P =1.5e-04, HR =1.01), LUAD (P =1.7e-03, HR =1), MESO (P =1.8e-02, HR =1.01), PAAD (P =4.2e-04, HR =1.01), PRAD (P =3.5e-03, HR =1.01) and UVM (P =1.1e-04, HR =1.02). Ultimately, we assessed the relationship between CBX3 expression and DSS to exclude non-tumor death factors. The results showed that high expression of CBX3 affected DSS unfavorably in ACC (P =3.1e-07, HR =1.01), CESC (P =3.9e-03, HR =1.01), HNSC (P =1.1e-02, HR =1), KICH (P =6.8e-05, HR =1.05), LGG (P =6.1e-07, HR =1.01), LIHC (P =5.8e-04, HR =1.01), LUAD (P =1.9e-02, HR =1), MESO (P =1.4e-02, HR =1.01), PAAD (P =1.4e-03, HR =1.01), THCA (P =2.0e-02, HR =1.03), UVM (P =6.0e-03, HR =1.01) (Figure 2g, h). In conclusion, these results indicated that CBX3 expression level is a key influencing factor for the prognosis of patients, especially those with ACC, CESC, LIHC and PAAD. Moreover, the above data mainly from the datasets of TCGA revealed that elevated expression of CBX3 is associated with the poor prognosis of cases in most tumors.
DNA methylation analysis data
The DNMIVD tool was used to investigate the potential association between CBX3 DNA methylation and the prognosis of pan-cancers based on TCGA and GEO databases. As shown in the TABLE 1, 21 methylation sites of CBX3 and related information were summarized. The correlation between promoter methylation in target cancers of KIRC, LUSC and UCEC and CBX3 gene expression were shown in Figure 3a, c, e, respectively. Except in UCEC, we observed a significant positive correlation of CBX3 DNA methylation and gene expression in KIRC and LUSC. Furthermore, the Figure 3b, 3d and 3f displayed the relationship between gene methylation and prognosis in KIRC, LUSC and UCEC, respectively. In contrast to that in LUSC, the high methylation expression of CBX3 in KIRC showed the worse prognosis, compared with the low expression.
Genetic alteration analysis data
The genetic variation status of CBX3 in various cancer samples were observed via the cBioportal tool. As shown in Figure 4a, the patients with UCEC occupied the highest alteration frequency of CBX3, nearly 5%. Furthermore, the primary type of alteration performed “mutation” (~3% frequency). The “amplification” type of CNA was the primary type in the esophageal cancer cases, which show an alteration frequency of ~4% (Figure 4a). In addition, all sarcoma cancer cases with genetic alteration (~3% frequency) presented “amplification” alteration (Figure 4a). As displayed in Figure 4b, the types, sites and case number of the CBX3 genetic alteration are presented clearly. We found that missense mutation of CBX3 was the main type of genetic alteration, and K14Nfs*23 alteration in the Chromo, which was detected in 2 cases of UCEC, 2 cases of STAD and 1 case of COAD (Figure 4b). We presented the 3D structure of CBX3 protein (Figure 4c). Additionally, we explored the potential association between genetic alteration of CBX3 and the clinical survival prognosis of cases with different types of cancer. The data of Figure 4d indicated that STAD cases with altered CBX3 showed poorer prognosis in disease-free survival (P=4.046e-03), compared with cases without CBX3 alteration. However, no significant differences were observed in overall (P=0.835), disease-specific (P=0.490) and progression-free (P=0.945) survival.
Protein phosphorylation analysis data
The CBX3 phosphorylation sites from SMART web were summarized in the pattern diagram of Figure 5a. The differences in protein phosphorylation levels of CBX3 between normal tissues and primary tumor ones were compared from the CPTAC dataset via UALCAN tool. Among them, we analyzed six types of tumors, including breast cancer, LUAD, clear cell RCC, ovarian cancer, colon cancer and UCEC. The S93 locus of CBX3 exhibits a significant difference on phosphorylation level in five primary tumor tissues except clear cell RCC, compared with normal tissues (Figure 5c-g, all P <0.05). In contrast with the ovarian cancer, the other four cancers expressed higher phosphorylation level than corresponding controls. The second change was an increased phosphorylation level of the S95 locus for LUAD (Figure 5c, P=3.1e-02), colon cancer (Figure 5e, P=2.1e-08) and UCEC (Figure 5g, P=5.8e-10). While, clear cell RCC (Figure 5f, P=5.2e-07) in the S95 locus displayed decreased level. We also analyzed and summarized the expression difference in other phosphorylation sites. As shown in Figure 5, most of the primary tumors displayed a higher phosphorylation level than the corresponding normal tissues.
TME analysis data
The TME contains various cells, yet infiltrating immune cells account for a large proportion [22, 23]. As prominent components of TME, these cells were believed to be closely related with the initiation, progression or metastasis of cancer [24]. It was reported that the TILs and CAFs participated in modulating the function of various tumor-infiltrating immune cells [25]. Therefore, it is necessary to evaluate the correlation between CBX3 and the characters of immune infiltration in pan-cancer. In the part, we investigated the potential relationship between the infiltration level of different immune cells and CBX3 gene expression or clinical outcome in diverse cancer types of TCGA.
We obtained the scores of 6 infiltrating immune cells from TCGA database and analyzed the correlation with CBX3. As shown in Figure 6a, results revealed that CBX3 expression was appreciably positively correlated with the infiltration levels of 6 immune cells in KIRC and LIHC, while negatively in LUSC. For the StromalScore, the top 3 cancers with significant difference all showed negatively correlated with CBX3 expression in TGCT, BRCA and LUSC. The increased gene expression was correlated with decreased Est_ImmuneScore in BRCA, UCEC and SKCM. In addition, CBX3 expression was negatively correlated with the ESTIMATEScore in BRCA, UCEC and LUSC (Figure 6b). Subsequently, we analyzed the correlation between CBX3 expression and that of 40 common immune checkpoint genes. The heat plot showed CBX3 expression was correlated with more than 20 immune checkpoint markers in UVM and KIRC. While, in LIHC, it had even close to 40 checkpoint markers (Figure 6c). To be conclude, these results strongly suggest that CBX3 plays a vital role in tumor immunity.
Moreover, we observed a statistical positive correlation of CBX3 expression and the estimated infiltration value of CAFs for the CESC, ESCA, HNSC-HPV-, KICH, LGG, LIHC, PAAD, and THCA, but noted a negative correlation for TGCT and THYM (Figure 6d) based on the TIDE, MCP-COUNTER and EPIC algorithms. The scatterplot data of the above tumors produced using one algorithm are presented in Figure 6e. For instance, the CBX3 expression level in KICH is positively correlated with the infiltration level of CAFs (cor=0.486, P=4.06e-05) based on the TIDE algorithm. Similarly, as shown in the Figure 6f and 6h, we revealed the potential relationship between the level of cancer associated fibroblast and the clinical outcomes of age or stage in various cancer types in the form of heat map, respectively. In addition, the cumulative survival graphs on the cancer with statistical difference were displayed in the Figure 6g and 6i. For example, as the age of LGG patients with high infiltration levels increases, the survival outcome is worse. To sum up, the results indicated that CBX3 expression was closely correlated with the TME in cancers.
Correlation analysis of CBX3-binding hub genes
To further investigate the molecular mechanism of the CBX3 gene in the occurrence and development of tumors, we attempted to screen out and analyze the CBX3-binding hub genes. Based on the STRING tool, we obtained 50 CBX3-binding genes, which were supported by experimental evidence (Figure 7a). Afterwards,10 hub genes (CBX3, TRIM24, HIST1H1E, HIST1H1C, H2AFY, CHD1L, POLA1, EHMT2, L3MBTL1, HIST1H1B) were further screened with the help of Cytohubba module (Figure 7b).
In the protein-chemical interactions network, hubs were associated with the chemical substances, Aflatoxin B1, Valproic Acid, Formaldehyde, Copper Sulfate, Cyclosporine (Figure 7c). These toxic chemicals, as a predisposing factor, may induce cancer by affecting CBX3 or its binding genes.
We presented hubs-TFs interactions (Figure 7d) and hubs-miRNAs interactions (Figure 7e) in the form of network diagram, respectively. We further detected central regulatory biomolecules (TFs and miRNAs) according to the set topological parameters. Seven TFs (GATA3, E2F1, TFAP2A, GATA2, FOXC1, JUN, CREB1) and four miRNAs (Hsa-mir-182-5p, Hsa-mir-320a, Hsa-mir-24-3p, Hsa-mir-1-3p) were detected from the hubs-TFs and hubs-miRNAs interaction networks, respectively. The TFs and miRNAs participated the tumorigenesis by regulating CBX3-binding hub genes in transcription level and post-transcriptional level.
Enrichment analysis of CBX3-related partners
Similarly, the CBX3 expression-correlated genes were also screened out for a series of pathway enrichment analyses to further investigate the molecular mechanism from another point of view. The GEPIA2 tool was used to screen the top 100 genes that correlated with CBX3 expression from all tumor expression data of TCGA. The scatter plots (Figure 8a) displayed the positive correlation between CBX3 expression level and that of HNRNPA2B1(R=0.67), PLEKHA8(R=0.64), LSM5(R=0.64), FTSJ2(R=0.63) and DHX9(R=0.62) genes (all P <0.001). Moreover, the corresponding heatmap further confirmed the positive association between CBX3 and the five genes in the majority of tumors (Figure 8b). We did an intersection analysis of the correlated gene group and interacted gene group, then obtained one common member, centromere protein A CENPA (Figure 8c). Finally, the KEGG and GO enrichment analyses were performed with the combined two datasets via R language, respectively.
The GO enrichment analysis data were presented in three parts, including BP (Biological Process), CC (Cellular Component) and MF (Molecular Function). The results indicated that most of these genes are linked to the cellular biology of DNA. For example, in the BP part, DNA conformation change and protein-DNA subunit organization played a major role. In the CC part, nuclear chromatin and chromosomal region were main enrichment position. The change in molecular function of nucleosome binding and chromatin DNA binding is linked to the pathways (Figure 8d).
The bubble plot of KEGG data suggested that the two pathways of “alcoholism” and “neutrophil extracellular trap formation” might be involved in the tumor pathogenesis of CBX3 to a large extent (Figure 8e).