MYC overexpression predicts poor prognosis and correlates with immune infiltration in chronic lymphocytic leukemia


 Background: Chronic lymphocytic leukemia (CLL) is a highly genetically heterogeneous disease that represents 30% of adult leukemia. We aimed to identify survival-associated key genes in CLL and investigate potential immunotherapy targets.Methods: The survival information and microarray mRNA expression profiles of CLL patients were downloaded from the Gene Expression Omnibus (GEO) and the International Cancer Genome Consortium (ICGC) database. Hub genes were uncovered in CLL. The bioinformatics technology was applied to identify key genes and performe functional analysis. Finally, we explored the role of this key gene and the situation of immune infiltration in the bone marrow.Results: We uncovered 223 differentially expressed genes (DEGs) as hub genes associated with CLL. MYC was identified as the key gene in CLL. High MYC expression was associated with poor prognosis. The biological pathways such as "Hypoxia", "Myc Targets V2", "P53 pathway" etc, were remarkably linked with MYC high expression phenotype. The expression level of MYC was positively correlated with infiltration levels of Eosinophils and follicular helper T cells, while negatively correlated with infiltration levels of regulatory T cells.Conclusion: Overexpression of MYC predicts poor prognosis and correlates with immune infiltration. MYC could serve as a potential immunotherapeutic target for CLL.


Introduction
Chronic lymphocytic leukemia (CLL) is a clonal and proliferative tumor of mature B lymphocytes, characterized by the accumulation of CD5+ and CD19+ B cells in peripheral blood, bone marrow, spleen and lymph nodes [1]. As a highly heterogeneous disease, CLL has signi cant individual differences in pathogenesis, disease progression, therapeutic effect, and survival period [2]. The disease has an insidious onset and unknown etiology. Recent studies have shown that the tumor microenvironment plays an essential role in the progression of CLL disease. Therefore, how the microenvironment in CLL can promote the survival of leukemia cells deserves further study.
Immune in ltration refers to the abnormal appearance of human immune cells in diseased tissues. The speci c mechanism is not completely clear at present. The immune imbalance is an essential part of the pathogenesis of CLL, and there are few studies on the regulation mechanism of immune in ltration in CLL. In recent years, the awareness of immune in ltration has gradually increased. Immune cells in ltrate tumor tissues to constitute the tumor microenvironment. In ltration of immune cells has been found in various solid tumor tissues, such as lung cancer, breast cancer, colorectal cancer [3][4][5][6][7], and the type of in ltrating immune cells is strongly correlated with the clinical characteristics of these solid tumors, affecting the clinical prognosis of tumor patients [8][9]. Therefore, some researchers have proposed that the in ltration of different immune cells in tumor tissues can be used for tumor risk strati cation [10] and becomes a potential drug target to improve patients' survival rate [11].
MYC gene is a nucleoprotein proto-oncogene, mainly including c-myc, n-myc, l-myc families. It is involved in the activation of the PTEN/PI3K/AKT pathway, and its encoded product can speci cally bind with DNA to promote the opening of cell proliferation-related genes and produce cell proliferation effects [12]. MYC gene plays an essential role in the formation, development and prognosis evaluation of lymphoma [13].
Studies have shown that MYC protein overexpression accounts for 29%-47% in diffuse large B-cell lymphoma (DLBCL) [14][15][16]. MYC protein overexpression is a poor prognostic factor for DLBCL patients [12,15]. Kluk MJ found that a lower overall survival rate in patients with MYC-positive DLBCL treated with RCHOP regimen [16] . This research uses bioinformatics tools to obtain genes that are most closely related to the occurrence and development of CLL, analyze the immune in ltration of bone marrow, andexplore the role of immune in ltration and immune regulation in the development of the disease. This study can provide the theoretical basis for targeted therapy of CLL [16][17].

DEG screening and co-expression analysis
To screen the DEGs between CLL and normal B cells, we carried out differentially expression analysis in GSE31048. Figure 1A showed the volcano plot of the differential mRNA expression analysis. Finally, a total of 396 DEGs were found. Using the "WGCNA" R package, hub modules and relevant genes were identi ed among these genes. First, the adjacency matrix was converted into a topological overlap matrix (TOM), hence clustering all genes into different modules. In GSE31048, a soft-thresholding power was set at 3 (scale-free R 2 = 0.90), cut height as 0.25, and minimal module size as 50. As a result, the brown module, with the closest association with CLL formation, was identi ed ( Figure 1B).
2. Identi cation of hub genes in CLL and the enrichment of these genes As shown in the Venn diagram, 223 intersecting genes of the brown module and DEGs were considered hub genes ( Figure 2A). To investigate their biological roles, we carried out two types of enrichment assessments: (a) Gene Ontology(GO)enrichment assessment and (b) Kyoto Encyclopedia of Genes and Genomes(KEGG) enrichment assessment. Figure 2B indicates the top 10 enrichment ndings in GO. The hub genes under biological processes (BP) terms were mostly linked with leukocyte cell-cell adhesion, B cell activation, regulation of leukocyte cell-cell adhesion, T cell activation, and cell-cell adhesion regulation. As for KEGG pathway enrichment, the hub genes were enriched in Hematopoietic cell lineage, PI3K-Akt signaling pathway, Acute myeloid leukemia, Th17 cell differentiation and Transcriptional misregulation in cancer ( Figure 2C).

Determining of key genes in the PPI network
We imported the 223 hub genes linked to CLL into the STRING web resource to construct aprotein protein interaction (PPI) network. After ltering the PPI complex, 78 nodes and 93 pairs of PPI associations were obtained ( Figure 2D). The data were then uploaded into Cytoscape 3.7.2 to compute the network and each node's topological features. According to the three algorithms in the cytoHubba plug-in, we selected the top 10 hub genes graded by each algorithm in the PPI network. These genes were located at the central position, serving an essential function in the disease's onset and progression. Seven genes (IL6, CTLA4, CD38, GNB4, SMAD3, PRKACB, and MYC) were linked to CLL ( Figure 3A). Further survival analyses were employed to evaluate their effects on the survival of CLL. Brie y, only one gene, MYC, was related to the prognosis of patients. Kaplan-Meier survival analysis discovered that CLL patients with high MYC expression had a shorter overall survival based on the best cutoff 2.24 ( Figure 3B). This cutoff value was employed in subsequent subgroup analyses.

MYC -related signaling pathways based on GSEA and GSVA
We performed Gene Set Enrichment Analysis (GSEA) analysis to investigate the potential mechanism of MYC in CLL prognosis. As shown in Figure 4, genes in the high expression group were enriched in biological pathways, including "Hypoxia", "MYC Targets V2", "TNFA signaling via NFKB", "mTORC1 signaling", "G2Mcheckpoint" and "P53 pathway" (Figure 4). Gene set variation analysis (GSVA) con rmed that they were also signi cantly up-regulated in the high expression group, further suggesting their importance in CLL progression ( Figure 5).

Immune cell in ltration analysis
To explore the function of MYC in CLL, we set 298 samples in International Cancer Genome Consortium (ICGC). Firstly, due to the signi cance of immune cells in cancer formation, we evaluated the situation of immune in ltration in bone marrow between low and high MYC groups using CIBERSORT ( Figure 6A

Discussion
CLL has signi cant clinical prognostic heterogeneity. The early parameters used to evaluate the prognosis of CLL mainly includedisease stage (Rai/Binet stage), peripheral blood lymphocyte count, lymphocyte doubling time, bone marrow in ltration pattern, serum β2-microspheres Protein (β2-MG), lactate dehydrogenase (LDH), etc. With the advancesin immunology, cellular/molecular genetics, and molecular biology, the research on CLL prognostic factors has progressed rapidly. Studies have shown that patients with del (17p13) genetic abnormality, immunoglobulin heavy chain variable region (IGVH) mutation status alternative marker ZAP-70 high expression and Binet stage Chave a short survival period. They are essential factors affecting the prognosis of CLL [25]. Besides, many studies have shown that mutations of TP53 (tumorproteinp53) gene and genetic abnormalities such as chromosome 13q deletion (13q-), 11q deletion (11q-) and 17p deletion (17p-) are closely related to the prognosis of CLL [26][27]. With the development of next-generation sequencing and gene chip technologies, we have a profound understanding of molecular mechanisms, so it is vital to explore new biological markers that closely link disease progression and clinical prognosis [28].
We used STRING web resource to construct a PPI network and screened seven key genes related to CLL, such as IL6, CTLA4, CD38, GNB4, SMAD3, PRKACB and MYC. Among them, only MYC was related to the prognosis of patients. Survival analysis indicated that CLL patients with high MYC expression had a shorter overall survival. MYC is a proto-oncogene that is abnormally activated in hematological tumors and solid tumors, leading to cell proliferation and growth [29]. It plays a key role in regulating critical biological processes such as the proliferation, differentiation, and apoptosis of hematopoietic cells and malignant leukemia cells, suggesting that it participates in many processes such as the occurrence, development, and evolution of leukemia. The oncogene C-MYC is the most important member of the MYC family. As a potential target for tumor therapy, C-MYC is abnormally highly expressed in a variety of human solid tumors and hematological neoplastic disease [30]. Tumors with high C-MYC expression are characterized by a high proliferation rate, aggressive phenotype, drug resistance, and poor prognosis. C-MYC protein has a high expression level in most patients with malignant tumor diseases in hematological tumors, and the prognosis is poor [31]. The abnormal increased proliferation and differentiation disorder of hematopoietic stem/progenitor cells caused by overexpression may be a signi cant cause of most leukemias [32][33]. Current studies have pointed out that C-MYC can promote the transformation of chronic myeloid leukemia, and its high expression may be one of the mechanisms of disease progression and acute change [34]. In addition, high expression of C-MYC may cause chronic myeloid leukemia cells to reduce imatinib's sensitivity, so the target C-MYC may become a new breakthrough point for the treatment of TKI-resistant chronic myelogenous leukemia [35][36]. The positive expression rate of C-MYC protein, myeloid-derived suppressor cell, and Th17 cell proportion was signi cantly increased in multiple myeloma patients and was positively correlated with the clinical ISS stage [37].
The GO and KEGG enrichment analysis of hub genes was carried out in this study, suggesting that upregulated genes' biological process is closely related to immune regulation. Leukocyte cell-cell adhesion, B cell activation, regulation of cell-cell adhesion, T cell activation, and regulation of cell-cell adhesion and other biological processes are mainly involved, which further illustrates that CLL is an immune process involving a variety of immune cells, cytokines and complex signal transduction. We performed GSEA and GSVA analyses to explore the potential mechanism of MYC in CLL prognosis. MYC-related signaling pathways are up-regulated, indicating their importance in CLL progression.
Immune in ltration in the tumor microenvironment plays a key role in tumor occurrence and development and affects cancer patients' clinical prognosis [11]. In the development of human solid tumors, immune cells exist in various forms, ranging from in ltrating to overt in ammation. Lymphocytes are usually the largest component of the immune in ltrating site, so they are called " tumor in ltrating lymphocytes (TILs) ". TILs mainly include T lymphocytes, B lymphocytes, NK cells and other immune cells. The phenotypes of these immune lymphocytes can promote or inhibit tumor occurrence and development, which is of great signi cance in the prognosis of tumors. Studies have found that B lymphocytes in TILs are essential biomarkers for tumor treatment and prognosis [38][39], and invasive NK cells have synergistic anti-tumor effects [40]. A retrospective study con rmed that M2 macrophage in ltration was signi cantly associated with poor breast cancer prognosis [41].
The CLL tumor microenvironment contains a variety of cells such as T lymphocytes, dendritic cells, stromal cells, endothelial cells, etc [42]. Together with lymphocytes and their progenitor cells, these cells form " pseudofollicular " cells, which are distributed in the patient's bone marrow, spleen, and lymph nodes, etc. To further explore the role of MYC and immune cells in CLL, we used the CIBERSORT deconvolution method to assess immune cell subtype distribution. Analysis of the in ltration matrix of immune cells showed that the proportion of follicular helper T cells, Monocytes, resting dendritic cells,

Conclusion
In summary, bioinformatics analysis revealed that MYC was highly expressed in CLL, suggesting that it plays a role as an oncogene in CLL. Further analysis demonstrated that MYC overexpression was a useful predictor of poor prognosis of CLL and a molecular marker for prognosis of CLL, suggesting that MYC could be a potential diagnostic and therapeutic target. Another important aspect of this study is that MYC expression is related to immune in ltration. MYC may play a role by regulating the in ltration of immune cells in the tumor microenvironment to provide an effective basis for precise clinical immunotherapy.

Patients and Datasets
GSE31048, including the RNA sequencing (RNA-seq) data of 188 CLL samples and 33 normal B cell samples, was downloaded from the GEO web resource (http://www.ncbi.nlm.nih.gov/geo). The RNA-seq data and corresponding clinical information of 298 CLL samples were obtained from the ICGC database (https://dcc.icgc.org/projects/CLLE-ES). The raw data from GSE31048 were abstracted and processed to normalize in R software (ver. 3.6.0). The limma R tool was employed to identify DEGs [18]. We uncovered the DEGs with a |log2 fold change (FC)| > 1 and adjusted p-value < 0.05.

Construction of co-expression network
The R package was used to lter the genes based on gene expression and variance (standard deviation ≤0.5), and 3716 genes were screened out from all genes in GSE31048. Sample clustering was carried out to plot the sample tree and uncover the outliers. After that, Pearson's correlation analyses were conducted for pair-wise genes and determined the soft thresholding power β value with the pickSoftThreshold tool of the weighted correlation network analysis (WGCNA) [19].

Hub genes identi cation
Firstly, the hub modules were de ned as the module with the absolute value of the Pearson's correlation of module membership being> 0.8 and p-value < 0.05. Moreover, the DEGs in hub modules were de ned as hub genes.

Enrichment analysis
We utilized the clusterPro ler package in R to accomplish functional enrichment analysis consisting of GO and KEGG pathway assessments [20]. GO enrichment constitutes molecular functions, cell components, and BP. P-value of < 0.05 served as the threshold criteria to determine the signi cant GO terms, as well as KEGG cascades, which were visualized using the bubble diagram or "GOplot" in the R package.

Development of Protein-Protein Interaction (PPI) network
We imported the genes linked to the clinical progression into the STRING online web platform (https://string-db.org). Then, a PPI network was established with a minimum level of con dence> 0.9 [21]. The plug-in cytoHubba in Cytoscape v3.6.1 was employed to determine the top 10 genes in the network as the core genes via 3 algorithms: Betweenness, Maximal Clique Centrality, and Degree. We intersected the genes obtained by various algorithms to obtain the key genes.
6. Gene set enrichment analysis and Gene set variation analysis GSEA is a computational method for exploring whether an a priori de ned set of genes shows statistically signi cant, concordant differences between two biological states. It has also been applied to analyze the slight changes in the expressions of genes that belong to a key pathway. GSEA was carried out to analyze significant survival differences between MGLL high and low groups on GSEA software [22]. By running GSEA, normalized enrichment scores and nominal p-value (NOM p-value) were generated for each phenotype's pathways enrichment analysis. Gene sets were considered signi cantly enriched with NOM p-value<0.05, FDR q-value <0.25. Additionally, the GSVA R package was utilized to detect the pathways that are related to each phenotype [23]. An adjusted P value less than 0.05 was considered statistically signi cant. The reference gene set "h.all.v7.0.symbols" was downloaded from the Molecular Signature Database (http://software.broadinstitute.org/gsea/msigdb/index.jsp).

Assessment of immune cell subtype distribution
CIBERSORT constitutes a type of deconvolution algorithm for transforming the standardized gene expression matrix into the invading immune cell composition [24]. During the CIBERSORT computation, quanti cation of the enrichment of particular types of cells in complex tissues was performed, and the CIBERSORT ndings have been veri ed by uorescence-activated cell sorting. We employed LM22 as the reference expression signature set at 1000 permutations. CIBERSORT output of P-value < 0.05 de ned a more accurate prediction of immune cell composition. After that, we selected samples satisfying constraints for further analyses. Herein, 298 samples in ICGC were utilized to estimate the invaded

Availability of data and materials
The datasets generated during the current study are available in the GEO database(http://www.ncbi.nlm.nih.gov/geo) and the ICGC database (https://dcc.icgc.org/projects/CLLE-ES).     GSVA con rmed that the six pathways were also signi cantly upregulated in high MYC group.