Genetic Map of Relapsed and Refractory Diffuse Large B-cell Lymphoma: A Systemic Review and Association Analysis

Diffuse large B-cell lymphoma (DLBCL) is the most common histological subtype of non-Hodgkin lymphoma (NHL). In recent years, a deeper understanding of the genetic subtypes of diffuse large B-cell lymphoma has been reached, and these advances have also been applied to research on relapsed and refractory diffuse large B-cell lymphoma (RRDLBCL). We screened 1495 documents, compiled the whole-exome sequencing data of several studies, formed a data set including 92 observations, and performed association analysis on the high-frequency mutations among them. The most common mutations in the data set include TTN (34/92, 37.0%), KMT2D (29/92, 31.5%), TP53 (25/92, 27.2%), IGLL5 (25/92, 27.2%), CREBBP (21/92, 22.8%), BCL2 (21/92, 22.8%), MYD88 (20/92, 21.7%), and SOCS1 (19/92, 20.7%). Among these, CREBBP, KMT2D, and BCL2 have a strong association with each other, and SOCS1 has a strong association with genes such as ACTB, CIITA, and GNA13. There is also a strong association between SOCS1 and STAT6. Though TP53 and MYD88 lack significant associations with most genes, the association between MYD88 and PIM1 is significant. Through SOM clustering and expression-level analysis of common gene mutations, we believe that RRDLBCL can be divided into four main types: (1) JAK-STAT-related type, including STAT6, SOCS1, ITPKB, CIITA, and B2M. The expression lineage is similar to PMBL and cHL. (2) EZB type: BCL2 and EZH2 are the main types of mutations. Epigenetic mutations such as KMT2D and CREBBP are more common in this type, and are often accompanied by BCL2 mutations. (3) MCD type, including MYD88, CD79B and PIM1. These genes are involved in the BCR signaling pathway and related pathways, and are connected by the common NF-κB pathway. (4) Undefined type (Sparse Mutation type). These patients are mainly individuals with sparse mutations, including some patients with TP53 mutations (30.3%, 10/33), but who generally lack characteristic mutations.
Among the common gene mutations, the expression changes in BCL2, PIM1, STAT6, ITPKB, and GNA13 have more significant prognostic significance. We also reviewed the literature from recent years concerning the previously mentioned common gene mutations.


Introduction
Diffuse large B-cell lymphoma (DLBCL) is the most common histologic subtype of non-Hodgkin lymphoma (NHL). For patients who are refractory to initial treatment or relapse after an initial response, only a small percentage will experience prolonged disease-free survival with salvage chemoimmunotherapy alone [1]. In recent years, our understanding of the classification of diffuse large B-cell lymphomas has deepened, and these advances have also been applied to the study of relapsed and refractory diffuse large B-cell lymphomas (RRDLBCL). Among recent advances, the multi-platform genome analysis performed by Schmitz et al. in 2018, based on gene expression classification of DLBCL, adds genetic classification that may be helpful for understanding the pathogenesis of DLBCL [2]. This milestone study identified four prominent genetic subtypes in DLBCL, termed MCD (based on the co-occurrence of MYD88 L265P and CD79B mutations), BN2 (based on BCL6 fusions and NOTCH2 mutations), N1 (based on NOTCH1 mutations), and EZB (based on EZH2 mutations and BCL2 translocations). Recently, Sha et al. defined a class of molecular high-grade diffuse large B-cell lymphoma (MHG) in their research and concluded that different treatment strategies were required for this group of patients [3].
With the contributions of these researchers, our understanding of the heterogeneity of DLBCL has reached a critical point. Since DLBCL includes various genetic subtypes, it is necessary to understand what role these components play in relapsed and refractory DLBCL. In the study by Schmitz et al., the results of survival analysis showed that MCD and N1 have the worst prognosis in the activated B-cell (ABC) subtype, while EZB has the worst prognosis in the germinal center B-cell (GCB) subtype. As clinicians, we pay particular attention to the composition of each genetic subtype in RRDLBCL. At the same time, whether there are undisclosed subtypes remains unclear. Further findings in this area can guide clinical design of new treatment strategies to improve the prognosis of patients with RRDLBCL.
At present, there are not enough large-scale studies to summarize the genetic characteristics of this particular population of RRDLBCL patients. To address this, we screened 1495 documents, from which we summarized whole-exome sequencing data from 4 studies of RRDLBCL [4][5][6][7]. By conducting an association study of whole-exome sequencing results for a total of 92 patients, we observed the most common mutations in RRDLBCL and attempted to cluster them. In addition, for the highest-frequency genes, we also used another data set (GSE10846) [8] to study the correlation between expression levels. Finally, we reviewed the research progress on these genes and speculated on the possible molecular biological mechanism of RRDLBCL.

Study selection
A systematic literature search for all relevant articles from January 2005 through August 2019 was conducted in MEDLINE. The search strategy combined the Medical Subject Headings and key words with terms for diffuse large B-cell lymphoma (DLBCL) and for whole-exome sequencing (WES). No language restrictions were imposed. Two investigators (T.L. and W.J.) reviewed all potentially relevant articles independently. We included both prospective and retrospective studies and, critically, required complete whole-exome sequencing results of RRDLBCL. One reviewer (T.L.) extracted whole-exome sequencing data and clinical information from each eligible study, and another reviewer (W.J.) confirmed the data. In the end, we selected 4 articles from 1495 articles (Table 1) [4][5][6][7], and the data came from 5 independent clinical studies. We extracted the following information from eligible studies: first author, year of publication, journal, pathological subtypes of RRDLBCL, and whole-exome sequencing results (mutated genes, mutation types, etc.) corresponding to RRDLBCL patients.

Data processing and analysis
We standardized the whole-exome sequencing data extracted from different studies (standardization of mutation-type tags) and adopted partial exclusion criteria to avoid serious bias: (1) exclude mutations found in only one data set (highly suspected false positives); (2) exclude genes mutated in three or fewer patients; (3) adopt the following mutation type labels: Truncating SNV, Splice site SNV, Missense SNV, Indel (insertion or deletion), Synonymous SNV, Multiple types (refers to the presence of multiple types of mutations in the same patient sample); (4) use consistent gene names. After the above processing, we obtained a data set consisting of 92 observations and 128 variables (genes). We then conducted an association analysis [Apriori and self-organizing map (SOM) algorithms, both implemented in R packages] on the data set to analyze the strength of the association between genes and to attempt to cluster them. For genes with strong associations, we relied on another data set (GSE10846) to analyze the correlation of their expression. For the specific method, please refer to Supplemental S1. Visualization of relevant results was achieved using R packages and Gephi 0.9.2.
These genes are also frequent item sets in association analysis (Figure 1a). After K-means clustering the data set, the data was divided into 2 groups (Figure S1). Cluster 1 consists mainly of patients with SOCS1 mutations. The three most common mutations in the group are SOCS1 MYD88, and SOCS1 mutation is 0 (Figure 1b). We used the Apriori algorithm to observe the association between genes, and found that CREBBP, KMT2D, and BCL2 have a strong association with each other, and SOCS1 has a strong association with ACTB, CIITA, GNA13, and other genes (Figure S2a&b). There is a strong association between SOCS1 and STAT6. Among patients with SOCS1 mutations, 42.1% (8/19) also have STAT6 mutations (lift = 2.75).
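The lift statistic quoted above can be illustrated with a minimal sketch. For a rule A → B, confidence is the fraction of A-mutated patients who also carry B (8/19 for SOCS1 → STAT6 in the text), and lift is that confidence divided by the overall frequency of B, which equals P(A and B) / (P(A) × P(B)). The function name `pairwise_lift`, the `min_patients` parameter, and the ten-patient toy cohort below are hypothetical and are not taken from the study data; the authors used the Apriori implementation in R, which this sketch does not reproduce.

```python
from itertools import combinations


def pairwise_lift(samples, min_patients=4):
    """Compute lift for every pair of sufficiently frequent genes.

    samples: list of sets, one set of mutated gene names per patient.
    min_patients: keep genes mutated in at least this many patients
    (the text excludes genes mutated in three or fewer patients).
    """
    n = len(samples)
    counts = {}
    for genes in samples:
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    kept = sorted(g for g, c in counts.items() if c >= min_patients)

    result = {}
    for a, b in combinations(kept, 2):
        both = sum(1 for genes in samples if a in genes and b in genes)
        # lift = P(A and B) / (P(A) * P(B)) = both * n / (count_a * count_b)
        result[(a, b)] = both * n / (counts[a] * counts[b])
    return result


# Hypothetical toy cohort of 10 patients (NOT the study data).
toy = [
    {"SOCS1", "STAT6", "TP53"},
    {"SOCS1", "STAT6"},
    {"SOCS1", "STAT6"},
    {"SOCS1"},
    {"STAT6", "TP53"},
    {"TP53"},
    {"TP53"},
    {"TP53"},
    {"RARE_GENE"},
    set(),
]

lifts = pairwise_lift(toy)
for pair, value in lifts.items():
    print(pair, round(value, 3))
```

In this toy cohort the SOCS1/STAT6 pair has a lift above 1 (co-occurring more often than expected by chance), whereas the SOCS1/TP53 pair falls below 1 and the STAT6/TP53 pair sits at exactly 1, the value expected under independence.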
TP53 and MYD88 lack a meaningful association with most genes (lift ≤ 1 indicates no positive association) (Figure S2c&d). Among genes with a relatively high incidence (n≥8), MYD88 is only strongly associated with PIM1, CD79B, HLA-A, USH2A, and UBR4 (lift between 1.59 and 2.27). Figure 1c shows the 300 rules with the highest lift value in the Apriori algorithm, displayed based on the Fruchterman-Reingold layout. Genes such as SOCS1, BCL2, KMT2D, and CREBBP have more association rules with higher lift values, so the level of in-degree is higher. Although the incidence of TTN, IGLL5, DMD, and other genes is also high, the strength of association with other genes is relatively weak.

Gene clustering based on SOM self-organizing mapping neural network model
In order to further group the gene mutations in RRDLBCL, we took the lift of genes in the dataset relative to SOCS1, STAT6, KMT2D, CREBBP, PIM1, and MYD88 as variables, and clustered them by SOM algorithm. Figure 2 shows the results of clustering, where the size of the slice reflects the influence of variables on the objects in the cluster. The mean distance of objects to their closest code book vectors is 1.596, which indicates good mapping quality (Figure S3 shows the iterative process). By evaluating the generated SOM clusters, a set of ideal clusters were produced (Figure S4 from left to right and from bottom to top, clusters 1-16 respectively). Evaluation shows that all clusters except cluster 6 are ideal. Through cluster analysis based on a SOM self-organizing network, we found that from the perspective of inter-gene association, there are at least two major gene clusters in the genetic mutations of RRDLBCL. One cluster is a group of mutations including SOCS1 and STAT6, which are strongly associated with the SOCS1 gene and may be related to the JAK-STAT pathway. The other cluster is a group of genes, including KMT2D and CREBBP, related to epigenetic changes and strongly associated with BCL2.
High-frequency genes such as MYD88 and TP53 are relatively independent. PIM1 has a broad association with many genes, but it is only strongly associated with MYD88 in high-frequency mutations.
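The SOM step described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: a 4 × 4 grid (matching the 16 clusters mentioned), six input variables (standing in for lift relative to SOCS1, STAT6, KMT2D, CREBBP, PIM1, and MYD88), a Gaussian neighbourhood, and linearly decaying learning rate and radius. The function names, hyperparameters, and the synthetic toy profiles are hypothetical; the authors used an R package whose settings are not given in the text.

```python
import math
import random


def best_unit(codebook, x):
    """Index of the codebook vector closest to x (the best-matching unit)."""
    return min(range(len(codebook)), key=lambda u: math.dist(codebook[u], x))


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM and return its codebook vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each unit from a randomly chosen observation (copied).
    codebook = [list(rng.choice(data)) for _ in range(rows * cols)]
    coords = [(i // cols, i % cols) for i in range(rows * cols)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # decaying radius
            br, bc = coords[best_unit(codebook, x)]
            for u, (r, c) in enumerate(coords):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # neighbourhood weight
                w = codebook[u]
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return codebook


def mean_quantization_distance(codebook, data):
    """Mean distance of observations to their closest codebook vector."""
    return sum(math.dist(codebook[best_unit(codebook, x)], x) for x in data) / len(data)


# Synthetic lift profiles for 20 hypothetical genes (NOT the study data).
rng = random.Random(1)
centers = {
    "jak_stat_like": [2.5, 2.5, 0.8, 0.8, 1.0, 1.0],
    "epigenetic_like": [0.8, 0.8, 2.5, 2.5, 1.0, 1.0],
}
toy_profiles = [
    [c + rng.gauss(0.0, 0.1) for c in center]
    for center in centers.values()
    for _ in range(10)
]

codebook = train_som(toy_profiles)
qe = mean_quantization_distance(codebook, toy_profiles)
assignments = [best_unit(codebook, x) for x in toy_profiles]
print("mean distance to closest codebook vector:", round(qe, 3))
print("unit assigned to each gene:", assignments)
```

`mean_quantization_distance` corresponds to the mapping-quality figure reported in the text (mean distance of objects to their closest codebook vectors); its absolute value depends on the scale of the input variables, so values from this toy example are not comparable with the reported 1.596.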
Analysis of correlation between genes by expression array data and grouping of RRDLBCL mutant genes
In order to further analyze the relationship between hot-spot genes in RRDLBCL revealed by whole-exome sequencing, we used another public dataset based on the Affymetrix Human Genome U133 Plus 2.0 whole-genome expression array. Based on the framework provided by SOM self-organizing network clustering analysis, we performed linear analysis between the expression data of hot-spot genes. We found that STAT6 and SOCS1 not only showed a strong association in association analysis, but their expression levels were also highly correlated (P <0.01). Among genes strongly associated with STAT6 and SOCS1, there are also genes such as ITPKB, CIITA, FAT4, and ACTB, for which expression levels are highly correlated with both genes. Figure 3 shows the relationship of some hot-spot genes. The figure shows that the expression levels of ITPKB and CIITA are not only related to STAT6 and SOCS1, but also to each other. These genes constitute the most closely related group of hot-spot genes. In addition, B2M, RYR2, LOXHD1, and PKHD1L1 are closely related to STAT6. Although they are also strongly associated with SOCS1 in the mutation data, they are not related to SOCS1 at the expression level.
As mentioned, KMT2D, CREBBP and BCL2 are strongly related. However, analysis suggests that BCL2 is not related to KMT2D or CREBBP in terms of expression level. However, the expression levels of CREBBP and STAT6 are linearly related. MYD88 lacks closely related genes; among the high-frequency genes, only PIM1 has a strong correlation, linearly correlating with MYD88 in expression level (P <0.01). PIM1 and CD79B also showed a strong correlation (lift = 3.04, P <0.01). PIM1, CD79B, and PRDM15 constitute a closely related group. Although TP53 is one of the most high-frequency mutations, there is no high-frequency gene either strongly associated with it or showing a meaningful linear correlation in expression level.
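A linear relationship between the expression levels of two genes can be checked with a Pearson correlation, sketched below. The text does not state which test produced the reported P values, so the permutation test here is a stand-in chosen only to keep the example dependency-free; the function names and the eight-sample expression vectors are hypothetical and are not drawn from GSE10846.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P value for the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical log-expression values for two genes in 8 samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gene_b = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

r = pearson_r(gene_a, gene_b)
p = permutation_p(gene_a, gene_b)
print("r =", round(r, 4), "permutation P =", round(p, 4))
```

With real array data, one value per sample for each probe set would replace the toy vectors, and a correction for multiple testing would be advisable when many gene pairs are screened.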
Through the above analysis, we believe that there may be several combinations in RRDLBCL that summarize most of the mutation types: (1) mutations represented by STAT6 and SOCS1 that may be related to the JAK-STAT pathway; (2) KMT2D, CREBBP, and BCL2 mutations; (3) MYD88, CD79B, and other mutations that show strong correlation with PIM1. Based on this classification, we classified 92 patients (Supplemental S1) to observe the situation of each sub-category through Figure 4; patients on the left side of the heat map mainly carry SOCS1 and STAT6 mutations, and also include mutations considered to be closely related to SOCS1 and STAT6, such as ITPKB, FAT4, and B2M. In the middle of the heat map, there are mainly KMT2D, CREBBP, and BCL2 mutations. These three mutations are concentrated in some RRDLBCL patients. To the right of the KMT2D-CREBBP-BCL2 patient group on the heat map, there are patients with MYD88 or CD79B mutations, and about 1/4 (6/23) of them also carry PIM1 mutations. However, PIM1 mutations also appear in both JAK-STAT6 and KMT2D-CREBBP-BCL2. There is some overlap between these categories, and we call them complex types of RRDLBCL patients. Complex type 1 shown in the figure has characteristics of KMT2D-CREBBP-BCL2 and JAK-STAT6; complex type 2 has characteristics of KMT2D-CREBBP-BCL2 and PIM1-MYD88-CD79B, and of these, one case has all three types of features.

Survival analysis based on expression of high-frequency genes
We used data from the same expression array to conduct survival analysis to evaluate the research value of the above genes. Patients were grouped according to expression levels of the corresponding genes (expression upregulation, expression downregulation, significant upregulation, significant downregulation).
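The grouping-then-survival step can be sketched as follows. This assumes a Kaplan-Meier estimator, which the text does not name explicitly, and simplifies the four expression groups described above to a two-way median split. The function names, follow-up times, and event flags are hypothetical and are not taken from GSE10846.

```python
import statistics


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cut = statistics.median(expression)
    return ["high" if value > cut else "low" for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data for six patients (months, event flag).
toy_times = [5, 8, 12, 12, 20, 25]
toy_events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(toy_times, toy_events)

groups = split_by_median([1.0, 2.0, 3.0, 4.0])
print(curve)
print(groups)
```

In practice, one curve would be estimated per expression group within each treatment arm (CHOP, R-CHOP) and the groups compared with a test such as the log-rank test, which is omitted here.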
Results showed that expression levels of BCL2, PIM1, STAT6, ITPKB, and GNA13 had significant prognostic value. Among these, the expression level of BCL2 has relatively significant value in patients receiving the drug combination treatments CHOP (P <0.01) and R-CHOP (P = 0.1) (Figure 5a
suggest that PIM1 is mainly expressed in the nucleus, while MYD88 staining is largely cytoplasmic, with almost no nuclear staining. Compared with the control group, the positive expression rates of PIM1 and MYD88 were higher in PCNSL, and their expression levels were positively correlated (r = 0.581, P = 2.0 × 10^-6) [19].
PRDM15 encodes a novel DNA-binding protein that regulates expression of key activators and repressors of the WNT and MAPK-ERK pathways at the chromatin level to safeguard naive pluripotency [20]. A recent study showed that PRDM15 regulates the transcription of key effectors of NOTCH and WNT/PCP pathways to preserve early midline structures in the developing embryo [21]. The correlation analysis in this study suggests that PRDM15 is related to PIM1 and CD79B. However, in most previous studies, PRDM1 is closely related to MCD rather than PRDM15. Nevertheless, the MCD subtypes in RRDLBCL are notable, and undoubtedly a subtype with poor prognosis in DLBCL.
BCL2, KMT2D and CREBBP: the only protagonist of EZB
In a study published in 2019, a class of molecular high-grade diffuse large B-cell lymphoma (MHG) was defined, which had significantly higher mutation frequencies than GCB in KMT2D, BCL2, MYC, or DDX3X.
The progression-free survival rate at 36 months after R-CHOP in the MHG group was 37% (95% CI, 24-55%) compared with 72% (95% CI, 68-77%) for others. A 2019 NGS study on DHL and THL showed that the most frequently mutated genes were CREBBP (16/20 cases), followed by BCL2 (
CREBBP is a histone acetyltransferase that mediates H3K27 acetylation and is important for gene enhancer activation. Evidence suggests that CREBBP mutations are an early event in lymphoma because mutations in this gene are also found in hematopoietic stem cells [25]. CREBBP-mutant lymphomas have decreased expression of genes involved in germinal center exit, those responsible for plasma cell differentiation, and those associated with antigen presentation by MHC class II, suggesting that CREBBP deficiencies contribute to lymphomagenesis by blocking B-cell differentiation and facilitating immune escape [26]. In our research, we found that KMT2D and CREBBP mutations often appear at the same time, but there is no expression correlation between the two. At present, there are no other studies to prove that there is a straightforward mechanistic relationship between the two genes.
Abnormal BCL2 activation is usually driven by genetic abnormalities in BCL2 itself, the most important of which is the t(14;18)(q32;q21) translocation [27]. In addition, BCL2 gain or amplification is also related to BCL2 overexpression, which occurs almost exclusively in the ABC subtype (~14%
and STAT6 with BCL2, KMT2D, and CREBBP is still relatively rare. These two sets of mutations are a manifestation of DLBCL heterogeneity, and it is necessary to distinguish them. We discuss further below.

STAT6 and SOCS1: trunk of the JAK-STAT subtype
The JAK2-STAT signaling pathway is a survival pathway involved in the proliferation and differentiation of B cells. Deregulated JAK2-STAT signaling participates in the pathogenesis of subtypes of lymphoma [28]. In general, activation of the JAK2-STAT signaling pathway appears more in HL and PMBL [29][30]. HL is frequently associated with a 9p24.1 genomic amplification that includes the JAK2 locus, as well as with a cytokine-enriched tumor microenvironment. Thus, activation of the JAK2-STAT signaling pathway may promote tumor growth in HL [31][32]. As for PMBL, it shares molecular features with HL [33]. However, Schmitz et al. state that Janus kinase-signal transducer and activator of transcription (JAK-STAT) signaling may have been promoted in 49% of cases by a STAT6 mutation or amplification or by a mutation or deletion targeting SOCS1 [2].
SOCS1 encodes one of eight members of the suppressor-of-cytokine-signaling (SOCS) protein family. These proteins contain an SH2 domain and a "SOCS box" that play pivotal roles in the down-regulation of JAK activity [34]. SOCS1 mutations are found primarily in patients with PMBL or HL. These mutations were first identified in 42% of patients with cHL, and subsequently in a similar proportion of those patients with PMBL or nodular lymphocyte-predominant Hodgkin's lymphoma (NLPHL) [35][36][37]. In DLBCL, the SOCS1 mutation has previously been shown to be associated with good survival [38]. A 2016 study also showed that the SOCS1 mutations were exclusive to non-recurrent primary DLBCL and were completely absent in cases of relapsing DLBCL, which also supported a good prognosis [39]. However, our research shows that SOCS1 mutation still occurs in a considerable proportion of RRDLBCL, affecting about 20% of patients, which should not be ignored. These patients likely did not show a good prognosis because there are a large number of other mutations that are clearly associated with SOCS1, such as STAT6, ITPKB, and CIITA. This shows that the role of SOCS1 in DLBCL is likely to be more complicated than previously recognized.
STAT6 belongs to the STAT family of both adaptor proteins and transcription factors. STAT family members display a shared protein structure that is instrumental for their activation and functions [40]. This family plays a key role in the proliferation and survival of B lymphocytes and is often dysregulated in lymphomas [41]. In particular, expression and/or activation of STAT6 as well as amplification of the locus encoding STAT6 on chromosome 12 have been detected in more than 50% of PCNSL specimens [42]. In addition to PCNSL, similar to SOCS1, STAT6 mutations are also found in PMBL. An earlier study suggested that 20 of 55 (36%) PMBL cases harbor heterozygous missense mutations in the STAT6 DNA-binding domain, whereas no mutation in the gene was found in 25 diffuse large B-cell lymphoma samples [43]. Some studies have shown that STAT6 is associated with poor prognosis, but the sample size is small in these cases [44].
ITPKB, CIITA, B2M, FAT4, GNA13: branches of JAK-STAT subtype
Some genes with relatively few background studies have been screened in this study. Among them, ITPKB, GNA13, CIITA, B2M and FAT4 are closely related to STAT6 and/or SOCS1. ITPKB encodes a kinase that converts the second messenger inositol trisphosphate (IP3) to IP4, a soluble antagonist of the AKT-activating PI3K product PIP3 [45]. Studies have shown that ITPKB may be associated with Alzheimer's disease, Parkinson's disease, and common variable immunodeficiency [46][47][48]. In a 2018 study, the coding genomes of cHL were analyzed in tumor and normal cells individually isolated from biopsied tissue from 34 patients. The study found that ITPKB is one of the most frequently mutated genes in cHL (16% of total cases), and other mutated genes include STAT6 (32%), GNA13 (24%), and XPO1 (18%) [49]. In the data we compiled, ITPKB mutations affected 16.3% of patients (15/92), of which only one case was PMBL (GHE0645). One third of these patients also carried GNA13 mutations (5/15, 33.3%), which shows that a segment of RRDLBCL has a mutational gene spectrum highly similar to cHL.
CIITA encodes a protein with an acidic transcriptional activation domain, 4 LRRs (leucine-rich repeats), and a GTP-binding domain. Research into the role of this gene in lymphoma has focused on gray zone lymphoma (GZL) [50]. In a 2019 study of GZL, 139 cases were classified into cHL-like GZL and LBCL-like GZL. A continuous morphologic and immunophenotypic spectrum was observed within these 2 GZL categories. The majority of cases presented genetic immune escape features with CD274/PDCD1LG2 and/or CIITA structural variants by fluorescence in situ hybridization. Cases with LBCL-like morphology had more PD-L1/PD-L2 or CIITA rearrangements than cHL-like cases [51].
Beta-2-microglobulin (B2M) encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. B2M is the most commonly altered gene in HRS cells. Studies have found that 7/10 of cHL cases have inactive mutations. The study also indicated that wild-type B2M expression in a cHL cell line restored MHC-I expression [52]. In addition, recent studies have suggested that B2M is also altered frequently in PMBL [53].
There is little published research on FAT atypical cadherin 4 (FAT4). The protein encoded by this gene is a member of the protocadherin family. This gene may play a role in regulating planar cell polarity (PCP). Previous studies have found frequent mutations in FAT4 in spleen marginal lymphomas and extranodal nasal NK/T-cell lymphomas. Because of their potential role in lymphoma formation, there is value in further studying this gene in the context of neoplastic growth [54][55].
The GNA13 mutation affected 13% (12/92) of RRDLBCL patients and showed a clear prognostic significance in the survival analysis of expression data in this study. Previous studies suggest that GNA13 is the most common mutant gene in germinal center (GC)-derived B-cell lymphoma, including nearly a quarter of Burkitt's lymphoma and GC-derived diffuse large B-cell lymphoma cases [56]. GNA13 is involved in Gna13 signaling, which restricts germinal center B cells to the germinal center, suggesting that GNA13 mutations are related to cell migration [57]. Some researchers have modeled the GNA13-deficient state exclusively in GC B cells by crossing the Gna13 conditional knockout mouse strain with the GC-specific AID-Cre transgenic strain. GNA13 deficiency, combined with conditional MYC transgene expression in mouse GC B cells, ultimately promoted lymphomagenesis [58]. In the association analysis conducted in our study, the association between GNA13 and SOCS1 and STAT6 was prominent, but the expression level was only linearly correlated with STAT6 (P=0.06).
Of these genes that are potentially linked to the JAK-STAT pathway, some are related to immune escape (B2M, CIITA), some are related to cell migration (GNA13), and some require further study. We found that the expression levels of ITPKB and GNA13 are related to prognosis, which previous studies have also suggested [59]. If SOCS1 and STAT6 represent the significance of the JAK-STAT pathway in the pathogenesis of DLBCL, then these genes related to JAK-STAT may affect the remission and prognosis of DLBCL on multiple levels. The role of the entire JAK-STAT pathway in RRDLBCL may be more complicated than we imagined, suggesting that we should treat it independently from EZB as represented by BCL2 and CREBBP.

RRDLBCL classification and precision medicine
Our research relies on part of the published research data, and there is an inevitable lack of information. For example, the NOTCH2 mutation was discarded in the process of dimensionality reduction because of a significant false positive in one of the data sources. However, overall, the results of this study are still beneficial for the classification and treatment of RRDLBCL. In terms of classification, the so-called MCD and EZB types cover about 2/3 of RRDLBCL. In EZB classification, we believe that it is meaningful to distinguish mutations related to the JAK-STAT pathway from mutations represented by BCL2 and CREBBP; this refines classification of RRDLBCL and provides guidance for further research. In sum, RRDLBCL is classified into four types, and the pathways and mechanisms involved are shown in Figure 7:
(1) JAK-STAT related type: including STAT6, SOCS1, ITPKB, CIITA, and B2M. The expression lineage is similar to PMBL and cHL. This type involves many mechanisms, such as cell migration and immune escape. However, in general, there is a close relationship with the JAK-STAT pathway.
(2) EZB type: BCL2 and EZH2 are the main types of mutations. Epigenetic mutations such as KMT2D and CREBBP are more common in this type, and are often accompanied by BCL2 mutations.
(3) MCD type: including MYD88, CD79B, and PIM1. These genes are involved in the BCR signaling and related pathways, and are connected with the common NF-κB pathway. This is the most independent category, with a more conservative mutant gene lineage.
(4) Undefined type (Sparse Mutation type): These patients are mainly individuals with sparse mutations, including some patients with TP53 mutations (30.3%, 10/33), and they generally lack characteristic mutations. The most frequent mutations other than TP53 are found in TTN, IGLL5, and DMD, but these mutations are also very common in other patients. Of the 128 genes we used as variables, the Sparse Mutation type has an average of 11.65 gene mutations per person. In contrast, each of the other types of patients has an average of 13.8 genes mutated.
Two points bear consideration: (1) There are complex types of mutations. That is, a patient may have two or more types of mutations in the above categories 1-3; (2) For RRDLBCL of Undefined type, the mechanism of relapse and refractoriness may be difficult to analyze by genetic mutation, which means that there may be other reasons (we have not observed a meaningful mutation pattern in this group of patients

Availability of data and materials
Data and materials supporting the conclusion of this study have been included within the article and the additional files. Some datasets generated and/or analyzed during the current study are available in the NCBI database (GSE10846).

Consent for publication
The authors declare that they have no competing interests.

Funding
Not applicable.

Figure 2
Cluster analysis of SOM self-organizing network for 128 genes. Each circle represents a cluster, and the size of the sector represents the strength of the association of several major genes with genes within the cluster.

Figure 4
Clustering heat map based on association analysis results. Num_mutations refers to the number of gene mutation types carried by a patient. Complex type 1 shown in the figure has characteristics of KMT2D-CREBBP-BCL2 and JAK-STAT6. Complex type 2 has characteristics of KMT2D-CREBBP-BCL2 and PIM1-MYD88-CD79B.

Figure 5
Survival analysis grouped according to several gene expression levels. a. Survival curve in CHOP group according to BCL2 expression level; b. Survival curve in R-CHOP group according to BCL2 expression level; c. Survival curve in CHOP group according to PIM1 expression level; d. Survival curve in R-CHOP group according to PIM1 expression level; e. Survival curve in CHOP group according to STAT6 expression level; f. Survival curve in R-CHOP group according to STAT6 expression level; g. Survival curve in CHOP group according to ITPKB expression level; h. Survival curve in R-CHOP group according to ITPKB expression level; i. Survival curve in CHOP group according to GNA13 expression level; j. Survival curve in R-CHOP group according to GNA13 expression level.

Figure 6
Survival analysis of the cumulative points. Analysis was performed based on the expression of three combinations of genes,