Pan-cancer analysis of BLM as a prognostic and immunological marker in human tumors

Background: Bloom syndrome helicase (BLM) is critical for DNA replication, repair, transcription, telomere maintenance and other cellular metabolic processes. Studies reported that BLM is closely related to the development of some types of tumors, however there is no pan-cancer analysis of BLM. Results: BLM was over-expressed in almost all types of cancers and associated with overall survival and disease-free survival in multiple cancers . We found a stage-specific expression of BLM among them. A correlation of BLM expression with the estimated infiltration value of cancer-associated fibroblasts, endothelial cells, immunostimulator, and immunoinhibitor was discovered in part types of cancers. Finally, we found that BLM was associated with cell cycle, p53 signaling pathway, and other mechanisms. Conclusion: Our study provided a relatively comprehensive understanding of BLM as an oncogene and predictor of prognosis in various tumor types. Methods: We analyzed data from TCGA (The cancer genome atlas), GEO (Gene expression omnibus), and GTEx databases by using bioinformatic tools, to evaluate the expression difference, prognostic value, genetic alterations, immune infiltrates, and biological functions of BLM in 33 types of cancers.


Gene Expression Analysis Data
We applied the TIMER2 tool to analyze the expression status of BLM across all cancer types of TCGA. As shown in Fig. 1A Next, we further evaluated the expression difference of BLM between the normal tissues and tumor tissues after including the normal tissue of the GTEx dataset. As shown in Fig. 1B, except for the types of tumors mentioned above, the expression level of BLM is up-regulated in tumor tissue of ACC (Adrenocortical carcinoma), DLBC (Lymphoid neoplasm diffuse large B-cell lymphoma), LAML (Acute myeloid leukemia), LGG (Brain lower grade glioma), OV, PAAD (Pancreatic adenocarcinoma), TGCT (Testicular germ cell tumors), THYM (Thymoma), and UCS (Uterine carcinosarcoma). However, we did not observe a significant difference for the remnant tumors, such as KICH (Kidney chromophobe) and SARC (Sarcoma).
Next, we analyzed the total protein expression of BLM by using the CPTAC dataset, which showed that BLM is highly expressed in the tumor tissues of clear cell RCC (p<0.001), LUAD (p<0.05), and UCEC (p<0.01) than in normal tissues, whereas BLM is lower expressed in the tumor tissues of OV (p<0.05) (Fig. 1C).
We also used GEPIA2 approach to analyze the correlations between BLM expression and the pathological stages of cancers, which showed that a stage-specific expression of BLM in ACC, BRCA, KICH, KIRC, KIRP, LIHC, LUAD, and THCA (Fig. 1D, all P<0.05) but not others (Fig. S1).

Survival analysis data
In order to evaluate the prognostic value of BLM, we classified the cancer cases into high-expression and low-expression groups according to the expression levels of BLM and investigated the correlation of BLM expression with the prognosis of patients with various types of tumors based on the datasets of TCGA and GEO. As shown in Fig. 2A, high expression of BLM was associated with poor prognosis of OS for cancers of ACC (P<0.001), KIRC (P=0.0045), KIRP (P=0.0017), LGG (P<0.001), LUAD (P=0.039), MESO (P<0.001), PAAD (P=0.044), SARC (P=0.031), and SKCM (P=0.016). In contrast, the high expression of BLM was associated with favor prognosis of OS for CESC (P=0.021), LUSC (P=0.018), and OV (P=0.029).With regard to DFS, the data (Fig. 2B) showed a correlation between high BLM expression and poor prognosis for the cancers including ACC (P=0.0039), KIRP (P=0.0064), LGG (P=0.001), LIHC (P=0.0048), MESO (Mesothelioma) (P=0.038), PAAD (P=0.0062), and PRAD (P<0.001). Using univariate Cox regression analysis, we found that increased BLM expression was a risk factor for ACC, CHOL, KICH, KIRC, LGG, MESO, and PAAD. In contrast, decreased BLM expression was a risk factor for CESC and THYM (Fig. 3).

Genetic alteration analysis data
Using cBioportal tool, we observed the genetic alteration status of BLM in different tumor samples of the TCGA cohorts. As shown in Fig. 3A, the highest alteration frequency of BLM (almost 8%) appears for patients with undifferentiated stomach adenocarcinoma with "mutation" as the only type. The "amplification" type of CNA was the primary type in the cases of esophagogastric adenocarcinoma, which show an alteration frequency of~5% (Fig. 4A). Besides, we found that all sarcoma cases with genetic alteration (>2% frequency) had amplification of BLM (Fig. 5A).
The types, sites and case number of the BLM genetic alteration are further presented in Fig. 4B. We found that missense mutation of BLM was the main type of genetic alteration, and N515Mfs*16 alteration was detected in 3 cases of colorectal adenocarcinoma, 2 cases of STAD and 1 case of UCEC (Fig. 5B).
Additionally, we analyzed the potential association between genetic alteration of BLM and the clinical survival prognosis of cases with different types of cancer. As shown in Fig.  5C, it indicated that cases with altered BLM showed better prognosis in disease-free (P=0.0318) survival, but not overall (P=0.383) and disease-specific (P=0.639) and progression-free (P=0.141) survival when compared with cases without BLM alteration.

Immune Infiltration Analysis Data
Tumor-infiltrating immune cells, as pivotal components of the tumor microenvironment, were tightly associated with the initiation, progression or metastasis of cancer [22,23]. Cancer-associated fibroblasts in the stroma of the tumor microenvironment were reported to participate in modulating the function of various tumor-infiltrating immune cells [24,25]. Thus, we applied the TIMER, CIBERSORT, CIBERSORT-ABS, TIDE, XCELL, MCPCOUNTER, QUANTISEQ and EPIC algorithms to explore the correlation between the infiltration level of cancer-associated fibroblasts and endothelial cells and BLM expression in multiple tumor types of TCGA. Interestingly, we discovered a negative correlation of BLM expression and the estimated infiltration value of cancer-associated fibroblasts for the BRCA, BRCA-Her2, LUSC, STAD, TGCT, and THYM, whereas BLM expression was positively correlated with that in KIRP (Fig. 6). Meanwhile, we found a negative correlation of BLM expression and the endothelial cell infiltration for the BRCA, COAD, KIRC, LUAD, STAD, TGCT, THCA, and THYM, while a positive correlation was found between the endothelial cells and BLM expression in the tumors of GBM ( Figure S2).
By using TISIDB, we also explored the association of BLM expression with immune-related chemokines, immunostimulator, and immunoinhibitor. The analysis of immune-related chemokines revealed that BLM was associated with most of the chemokines in KIRC, TGCT, and THCA. Among all tumor types, CCL14 and CX3CL1 were the most common chemokines which correlated with BLM expression (Fig. S3A). With regard to immunostimulator and immunoinhibitor, we found that BLM was positively correlated with most of them in KIRC, TGCT, and THCA, whereas BLM was negatively correlated with that in GBM and UCS( Fig. S3B-S3C).

Enrichment analysis of BLM-related partners
To further distinguish the molecular mechanism of the BLM gene in tumorigenesis, we attempted to filter out the targeting BLM-binding proteins and the BLM expression-correlated genes for pathway enrichment analyses. By using STRING tool, we obtained a total of 50 BLM-binding proteins, which were originated from experimental evidence. The interaction network of these proteins were shown at Fig. 7A. Meanwhile, we used the GEPIA2 tool to combine all tumor expression data of TCGA and acquired the top 100 genes that correlated with BLM expression. An intersection analysis by Jvenn tool of the above two groups showed 11 common genes, including BRCA1 (Breast cancer type 1 susceptibility protein), CHAF1A (Chromatin assembly factor 1 subunit A), CHEK1 (Serine/threonine-protein kinase Chk1), ERCC6L (DNA excision repair protein ERCC-6-like), EXO1 (Exonuclease 1), FANCA (Fanconi anemia group A protein), FANCD2 (Fanconi anemia group D2 protein), FEN1 (Flap endonuclease 1), RAD51 (DNA repair protein RAD51 homolog 1), RAD54L (DNA repair and recombination protein RAD54-like), and TIPIN (TIMELESS-interacting protein) (  Fig. 7D showed the corresponding heatmap data of a positive correlation between BLM and the above 11 genes in the majority of detailed cancer types. Finally, we combined the two datasets to perform KEGG and GO enrichment analyses. The KEGG data of Fig. 7E suggest that "cell cycle", "mismatch repair", "cellular senescence", "p53 signaling pathway", "microRNAs in cancer", and "platinum drug resistance" might be involved in the effect of BLM on tumor pathogenesis. In addition, the GO enrichment analysis indicated that most of these genes are linked to "DNA replication" in the BP category, chromosomal region in the CC category, and ATPase activity in the MF category( Fig. 7F-7H).

Discussion
Dysregulation of DNA damage repair plays a fundamental role in carcinogenesis. BLM has important roles in the initiation and regulation of homologous recombination repair of DNA double-strand breaks. [2,3,4]. Therefore, it is reasonable to extrapolate that an optimal level of BLM is necessary to maintain genome stability. Both high and low levels or loss of BLM may lead to genomic instability and may eventually promote tumorigenesis. [15,6]Although recent evidence shows BLM mRNA to be over expressed in various cancers including colon, breast and prostate when compared with the normal samples. [16,18,13] However, the roles of BLM in human pan-cancer are not well understood. In this study, we identified the relationship of the BLM in multiple cancer tumorigenesis models via a pan-cancer analysis of TCGA, CPTAC, TIMER2, and GTEx databases.
According to the TCGA and GTEx dataset, the gene expression analysis showed that significantly higher expression level of BLM occurred almost in all types of malignances except KICH and SARC, as compared with corresponding normal tissues. Moreover, a stage-specific expression of BLM were found in ACC, BRCA, KICH, KIRC, KIRP, LIHC, LUAD, and THCA, which indicated role of BLM in tumorigenesis in various types of cancer. Although BLM was highly expressed in most tumors, the survival prognosis analysis data suggested distinct conclusions for different tumors. BLM was over-expressed and associated with worse OS in ACC, KIRC, KIRP, LGG, LUAD, MESO, PAAD and SKCM. In contrast, the high expression of BLM was associated with favor OS in CESC, LUSC and OV. However, BLM expression were not necessarily associated with prognosis in other types of cancer. Along with that, the results of cox regression showed that BLM expression was independent risk factor for OS in ACC, CESC, KIRC, LGG, MESO, and PAAD.
By now, several researches have testified the impact of BLM in several kinds of cancers. Ruan et al found that over-expressed BLM can either promote cell proliferation of PC3 cell lines in vitro study, or promote tumor growth by directly interacting with EZH2 in vitro and vivo study [14]. Otherwise, they also validated that BLM was highly expressed in tumor tissue when compared with normal and hyperplasia tissues, which was accordingly to the result of our pan-cancer analysis. Another study investigated the mRNA and protein expression of BLM in a huge amount of patients with breast cancer, which showed that a high expression of BLM was correlated with poor breast cancer specific survival. However, the relationship between BLM expression and overall survival was not displayed [18]. In addition, the role of BLM gene in cholangiocarcinoma was studied by Du et a [21]. This study showed that the expression of BLM was much higher in tumor tissue and cells than in non-tumor tissue and cells by using Immunohistochemical staining and qRT-PCR. After BLM was knockdown by siRNA, the proliferation and migration ability of CCA cells were inhibited, while the cell cycle was arrested. This is highly consistent with our current research.
Gene mutations play an important role in the pathogenesis of some cancers [22]. In this study, the highest alteration frequency of BLM (almost 8%) appears for patients with undifferentiated stomach adenocarcinoma with "mutation" as the only type. We found that missense mutation of BLM was the main type of genetic alteration. These missense mutations have been shown to abolish the ATPase and DNA binding activity with some of them losing ATP binding activity, thus rendering the BLM protein catalytically inactive [23,24]. In literatures, BLM mutation that confers low-to-moderate penetrance risk for developing CRC [25,7] Our study indicated that cases with altered BLM showed better prognosis in disease-free survival. Specific gene mutations may predict patient prognosis and treatment response. However, the precise biological mechanism is not yet fully investigated.
The tumor microenvironment (TME) is a tumor promoting setting that tumor cells use to evade immune surveillance. The presence of the TME significantly influences therapeutic response and clinical outcome. Previous studies have identified various components that participate in the formation of the TME, including cancer-associated fibroblasts (CAF), endothelial cells, and so on [26,27,28,29] Surprisingly, we discovered a negative correlation of BLM expression and the estimated infiltration value of CAF and endothelial cell for a plethora of cancer. This result demonstrated the tumor suppressive function of BLM once again. CAF and endothelial cells serve pro-tumorigenic roles in the TME via secretion of various growth factors, cytokines, and chemokines and via degradation of the extracellular matrix [30,31]. This association between BLM and TME might be another reason for the prognostic implications of BLM in various cancers.
We characterized the function of differentially expressed BLM via GO enrichment analysis and KEGG pathway enrichment analysis. We found that differentially expressed BLM was mainly associated with regulation of cell cycle, mismatch repair, cellular senescence, p53 signaling pathway, microRNAs in cancer, and platinum drug resistance. The forementioned studies of prostate cancer and cholangiocarcinoma both proved that BLM was associated with p53 signaling pathway and cell cycle. This, in turn, confirms our pan-cancer analysis. Whether BLM participated in the mentioned pathways in other types of cancer needs further investigation.

Conclusion
Regarding genomic instability and mutations is one of the hallmarks of cancer which derived result from impairment of DNA damage repair, there has been increasingly more attention for BLM because of its critical role on DNA replication, repair, transcription, telomere maintenance. By using bioinformatic analysis, our study provided a relatively comprehensive understanding of BLM as an oncogene and predictor of prognosis in various tumor types.

Gene Expression Analysis
In this study, we used TIMER2 (tumor immune estimation resource, version 2, http://timer.cistrome.org/) to analyze the expression differences of BLM between tumor types and adjacent normal tissues. For partial types of tumors without or only with limited normal tissue [e.g., TCGA-DLBC (Lymphoid neoplasm diffuse large B-cell lymphoma), TCGA-PAAD (Pancreatic adenocarcinoma), etc.], we analyzed the BLM expression based on combining the data for normal tissues from the GTEx (Genotype-Tissue Expression) database with data from The Cancer Genome Atlas (TCGA). All expression data were normalized via log2 conversion. The GEPIA2 (Gene Expression Profiling Interactive Analysis, version 2) tool (http://gepia2.cancer-pku.cn/#analysis) was used to analyze the BLM expression in different pathological stages of all TCGA cancers. The log2 [TPM (Transcripts per million) +1] transformed expression data were applied for the box or violin plots. Additionally, we used UALCAN tool (http://ualcan.path.uab.edu/analysis prot.html) to conduct the protein expression analysis of the CPTAC (Clinical proteomic tumor analysis consortium) dataset based on cancer Omics data (15). Expression level of the total protein of BLM has been compared between primary and normal tissues from available datasets of five tumors,which included BRCA (breast cancer), OV (ovarian cancer), KIRC (Kidney renal clear cell carcinoma), LUAD (Lung adenocarcinoma), and UCEC (Uterine corpus endometrial carcinoma).

Survival Prognosis Analysis
We used GEPIA2 tool to obtain the OS (Overall Survival) and DFS (Disease-Free Survival) significance map data and survival plots of BLM across all TCGA tumors. We used cutoff-high (50%) and cutoff low (50%) values as the expression thresholds to split the high-expression and low-expression cohorts (16). The log-rank test was used in the hypothesis testing. For survival analysis, the HR and p value were calculated using univariate Cox regression analysis.

DNA methylation analysis data
We also used the UALCAN tool to investigate the potential association between BLM DNA methylation and the pathogenesis of different tumors in the TCGA project.

Genetic Alteration Analysis
We used cBioPortal tool (https://www.cbioportal.org/) to obtain the data of alteration frequency, mutation type, mutated site information, and CNA (Copy number alteration) across all TCGA tumors. Survival data, including the overall, disease-free, progression-free, and disease-free survival differences were compared for all the TCGA cancer types, with or without BLM genetic alteration. Kaplan-Meier plots with log-rank P-value were generated as well.

Immune Infiltration Analysis
TIMER2 tool was also used to investigate the relationship between BLM expression and immune infiltrates across all TCGA tumors. Cancer-associated fibroblast and endothelial cell were selected for detailed analysis. The TIMER, TIDE, CIBERSORT, CIBERSORT-ABS, QUANTISEQ, XCELL, MCPCOUNTER and EPIC algorithms were applied for estimations. The data were visualized as a heatmap and a scatter plot. Additionally, we used TISIDB website (http://cis.hku.hk/TISIDB/index.php) for the analysis of immune-related chemokines, immunostimulator, and immunoinhibitor. The P-values and partial correlation (cor) values were obtained via the purity-adjusted Spearman's rank correlation test.

BLM-Related Gene Enrichment Analysis
We firstly used STRING website (https://string-db.org/) for the analysis of protein-protein interaction network. The main parameters were: minimum required interaction score ["Low confidence (0.150)"], meaning of network edges ("evidence"), max number of interactors to show ("no more than 50 interactors" in 1st shell) and active interaction sources ("experiments"). Secondly, we used GEPIA2 tool to obtain the top 100 BLM-correlated genes based on the datasets of all TCGA tumor and normal tissues. Then we used Jvenn, an interactive Venn diagram viewer [19], to conduct an intersection analysis to compare the BLM-binding and interacted genes. After we obtained the overlapping molecules from Jvenn, we conducted a pairwise gene-gene Pearson correlation analysis between BLM and the selected genes by using TIMER2 tool. The P-values and the correlation coefficient (R) were calculated and are indicated in the corresponding figure panels. Heatmap representation of the expression profile for the selected genes contains the partial correlation (cor) and P-value in the purity adjusted Spearman's rank correlation test. We merged and filtered the two sets of data to perform KEGG (Kyoto encyclopedia of genes and genomes) pathway analysis. The enriched pathways were visualized as cnetplots with the "tidyr" and "ggplot2" R packages. In addition, we applied the "clusterProfiler" R package to conduct GO (Gene ontology) enrichment analysis. The data for BP (Biological process), CC (Cellular component), and MF (Molecular function) were visualized as bubble charts. The R language software [R-3.6.3, 64-bit] (https://www.r-project.org/) was used in this analysis. Two-tailed P<0.05 was considered to be statistically significant (17).             Mutation feature of BLM in different tumors of TCGA. We analyzed the mutation features of BLM for the TCGA tumors by using the cBioPortal tool. The alteration frequency with mutation type (A) and mutation site (B) are shown. The potential correlation between mutation status and overall, disease-speci c, disease-free and progression-free survival (C) by using the cBioPortal tool are displayed.

Figure 6
Correlation analysis between BLM expression and immune in ltration of cancer-associated broblasts.
Different algorithms explored the potential correlations of (A) expression of the BLM gene and (B) in ltration of cancer-associated broblasts across all types of cancer in TCGA.