A novel molecular classication of diffuse large B cell lymphoma based on Metabolism-related genes

Purpose About 30–40% of patients with diffuse large B-cell lymphoma (DLBCL) relapse or fail to respond to rst-line treatment. The molecular heterogeneity is considered to be the main factor affecting the therapeutic response of DLBCL. The existing classication methods can not fully explain these heterogeneity, so we try to explain DLBCL heterogeneity by dening DLBCL subtypes from the perspective of metabolism. In this study, we integrated ve DLBCL data sets (GSE10846, GSE11318, GSE53786, GSE87371 and GSE23501) (n = 742) from geo database, screened 106 metabolic related genes (MAD > 0.5, Cox P < 0.001), and identied dlbcl2 subclasses (nmftype1, nmftype2) by non-negative matrix factorization clustering (NMF).


Results
nmftype1 showed low metabolic activity ,while nmftype2 showed high metabolic activity. Compared with the two subtypes of immune in ltration, it was found that nmftype1 was mainly in ltrated by B cells, and nmftype2 was mainly in ltrated by T cells and macrophages, and the high expression of nmftype2 was more in immune checkpoint. The difference of metabolic subtype OS was statistically signi cant, and the overall survival (OS) of nfmtype1 was worse than that of nmftype2. The combination of metabolic subtypes and ABCGCB subtypes can predict the prognosis of DLBCL patients better than the existing ABCGCB subtypes. Finally, 34 gene classi ers were identi ed. The consistency results were veri ed by GSE31312 (n = 470), and a new classi cation of DLBCL based on metabolic gene expression pro le was established.

Conclusions
We have obtained a new DLBCL typing method, which has prognostic signi cance. It has a certain correlation with immune escape and can guide individualization application of immunotherapy and metabolic therapy.

Background
Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma (NHL), accounting for 30% − 40% of all NHL cases [1,2]. Most DLBCL patients can be cured with current rst-line chemoimmunotherapy based on a combination of anthracyclines and anti-CD20 antibodies (such as R-CHOP). However, about 30-40% of patients still have relapse or ineffective rst-line treatment [3].
Previous studies have shown that DLBCL is a malignant tumor with great heterogeneity in gene mutation, copy number (CN) change, structural variation, clinical presentation and prognosis, etc. [4][5][6], and molecular heterogeneity of DLBCLs is considered to be the main factor in uencing the response to R-CHOP treatment. [7,8]Elucidating the underlying molecular mechanism of DLBCL heterogeneity and discovering different subsets to subdivide DLBCL is essential for guiding clinical treatment and predicting prognosis.
DLBCL can be divided into germinal center B cell like (GCB) and activated B cell like (ABC) according to the cells of origin. About 10%-20% of the cases cannot be classi ed [9].although the prognosis of ABC subtype has been con rmed to be worse than that of GCB subtype. subtypes [7], Heterogeneity still exists within GCB and ABC subtypes,and there are better and worse prognostic subgroups in each group [10].
Schmitz et al. Sequenced the whole exome of 372 genes, RNA sequence and gene copy number 574 DLBCL patients. and classi ed DLBCL patients into four subclasses: MCD (MYD88L265P and CD79B comutated), BN2 (BCL6 fusions or NOTCH2 mutated), N1 (NOTCH1 mutated), and EZB (EZH2 mutated or BCL2 translocated) [6]. However, this classi cation method has several drawbacks:1. The sample size of subclass is small, especially N1 type, and the sample size is only one digit after strati cation. 2. 55% of DLBCL specimens were not classi ed by this method. 3. Because the samples of ABC and unclassi ed subtypes were arti cially enriched in this study, they could not represent the true subtype distribution. (Dubois and Jardin, 2018). Chapuy et al. conducted a comprehensive genetic analysis on 304 DLBCL samples and identi ed 5 DLBCL subclasses(C1-C5) with prominent genetic characteristics. However, this classi cation method requires in-depth genomic analysis, which certain overlaps with the subtypes obtained by Schmitz [3,5]. Therefore, larger sample size and simpler classi cation methods and better heterogeneous subtype identi cation are needed to guide clinical practice [3].
Changes in metabolic activity enable cancer cells to acquire and maintain malignant characteristics [11,12]. In 2005, Stefano Monti et al. Clustered gene expression pro les of 176 DLBCL samples to obtain three subsets of oxidative phosphorylation (OxPhos), B-cell receptor/proliferation, host response (HR).
Genes involved in electron transport chain (ETC) complexes, OxPhos metabolism, and other mitochondrial functions were enriched in the OxPhosz subtype. BCR/proliferation subtypes express higher levels of multiple components of the B cell receptor (BCR) signaling cascade and other B cell speci c or essential transcription factors than the other two subtypes [13]. HR subset had T cell enrichment, and increased expression of complement and in ammatory mediators, etc [14]. These studies indicate that metabolism is associated with tumorigenesis and Cancer progression, and DLBCL has metabolic heterogeneity. These studies suggest the possibility of prospective classi cation of DLBCL from metabolism.
we annotate the data set by converting the probe ID to a gene symbol and extract the mRNA expression data of DLBCL sample in each data set. and got a total of 1024 sample data, The number of samples contributed by each dataset is shown in Table 1( Table 1). Using intersect function, we found that there were 114 identical samples in GSE53786 and GSE 10846, 41 identical samples in GSE 53786 and GSE 1318, and 163 identical samples in GSE 10846 and GSE 11318 ( Figure S1). After removing the repeated samples, 747 samples were obtained, including 300 cases of GSE10846, 40 cases of GSE11318, 65 cases of GSE23501, 119 cases of GSE53786 and 223 cases of GSE87371. After removing the samples without survival data (including 3 cases of GSE11318 and 2 cases of GSE87371), the metadata set of 742 samples was obtained. pathway were extracted. we calculated the median absolute deviation (MAD) of each gene and used a univariate cox regression to evaluate the correlation between the expression of each gene and the overall survival (OS). We used MAD > 0.5 and P < 0.001 as criteria to screen important prognostic metabolic genes. MAD > 0.5 and P < 0.001 were used as standard screen metabolic genes with signi cant prognosis. The mRNA expression data of metabolic genes with signi cant prognosis were clustered unsupervised by NMR package [19]. The K value which the apparent correlation coe cient begins to decrease is selected as the optimal cluster number [20]. The same method was applied to the GSE31312 dataset [21]. GENEPATTER's submap module(Https://cloud.genepattern.org/) compare the two datasets to detect the consistency of the subsets. Then principal component analysis (PCA) was used to verify the distribution of the above metabolic gene mRNA expression data.

Characterization of DLBCL subclasses
Mftype1 was set as the experimental group and mftype2 as the control group. The difference of mRNA expression between the two subtypes was analyzed by limma package. The expression differential genes were obtained according to the standard of log fold change (logFC) > 1.5, P < 0.05. Then, the KEGG and go enrichment analysis of the differentially expressed genes were carried out by using the cluster pro ler package, and the enrichment analysis results were obtained by selecting p.adjust < 0.05 as the standard.

Gene Set Enrichment Analysis(GSEA)
According to the results of limma package, the genes were sorted according to the size of logFC,. Taking the c2.cp.kegg.v7.0.symbols le downloaded from the GSEA o cial website as a reference, GSEA analysis was performed with GSEABase R package, and enrichment pathways were screened according to the criteria of p.adjust < 0.05. Because nmftype2 was taken as the control group, the positive value of enriched normalized enrichment score (NES) was regarded as the speci c metabolic pathway of nmftype1, and the negative value of enriched NES was regarded as the speci c metabolic pathway of nmftype2.

Evaluation of immune in ltration
CIBERSORT was used to calculate the in ltration score of immune cells in the sample. According to the threshold value of P < 0.05, the sample with credible in ltration results. Wilcox.test was used to compare the in ltration scores of immune cells in the two groups. Based on current drug inhibitors in clinical trials or drugs that have been approved for speci c cancer types, 10 potential targeted immune checkpoint genes selected(CTLA-4/CCL2/ CD276/CD4/CXCR4/IL1A/IL6/LAG3/PDL1(CD274)/ TGFB1). The t test was used to compare the differences in the expression of immune checkpoint genes between the two metabolic subtypes [22].

Comparison of clinical characteristics
GSE10846 with complete clinical information was selected for clinical feature analysis. Chi-square test was used to compare the clinical characteristics of different subtypes.

Prognostic analysis of the two subtypes
The prognosis of the two subtypes was compared by log-rank test in each data set and combined data of ve data sets. ABC subtype samples were extracted from the combined data of ve data sets and this data were used to compare the prognosis of the two metabolic subtypes in the Patients with ABC subtype. The same procedure was performed in patients with GCB subtype. The same analysis was performed on GSE31312 to verify the correlation between metabolic subtypes and prognosis.

Comparison of metabolic subtypes and GEP subtypes
The distribution of GEP subtypes in the two metabolic subtypes was compared with ggstatsplot package.
In the combined data of ve data sets, the metabolic subtype prognostic model, GEP subtype prognostic model, and combined prognosis model were constructed by using the coxh function of survival package, The concordance index (c-index), The Akaike information criterion(AIC) value and log-rank χ 2 value of the three models were calculated, and the predictive ability of the three models for prognosis was compared through the three values. The log rank χ 2 value is larger, which indicates that the prediction ability of the model is better. C-index is used to quantify the prediction performance of the model, and AIC is used to compare the model tting [23].
2.9 Classi er generation and performance veri cation 34 genes which play a major role in NMF clustering analysis were selected to establish a subtype prediction classi er. The 34-gene-Classi er was used to predict the metabolic subtypes of GSE31312 dataset. The consistency between the prediction results and those obtained by NMF before GSE 31312 was compared.

Statistical analysis methods:
In this article, R x64 3.5.2 version was used for calculation and statistical analysis. Unpaired t-test was used to compare two groups with normally distributed variables. wilcox.test was used to compare two groups with non-normally distributed variables. Contingency table variables were analyzed by Chi-square test or Fisher's exact test. The Kaplan-Meier method was used for survival analysis, and the log-rank test was used for prognostic comparison. A univariate Cox proportional hazards regression model was used to estimate the hazard ratios for univariate analyses. A two-tailed P value < 0.05 was statistically signi cant.

Results
3.1 Identify two metabolism-related subtypes according to metabolic genes A owchart was developed to systematically describe our study (Fig. 1A). The Canonical Pathways gene set from the KEGG database were downloaded from the GSEA website(https://www.gsea-msigdb.org/). According to this table, the key words of "metabolism" were used as screening conditions to extract the genes related to metabolic pathways, and 948 metabolism-related genes were obtained (Table S1). Among them, 883 genes were expressed in the ve data sets. The expression data of these 883 metabolic genes were extracted from the 5 data sets, and The ComBat function of the sav package was used to remove the batch effect( Figure S2), and the merge function merges the data, The metadata was used for subsequent non-negative matrix decomposition (NMF) clustering.
Before NMF is executed, the ltering process is performed. MAD function of DescTools package was used to calculate MAD of each gene in the combined data of 5 data sets. 778 eligible genes were selected according to MAD > 0.5. Then, Cox regression was performed to evaluate the association between all candidate genes and overall survival (OS) by using r-package "survival". 106 metabolic genes with strong prognosis were obtained (Table S2). Finally, 106 genes with high variability (MAD > 0.5) and signi cant prognostic value (P < 0.001) were used for sample clustering.
( Fig. 1D) (Table S4) 3.2 Correlation of the DLBCL subclasses with metabolism-associated signatures In this study, NMF classi cation was based on metabolism-related gene expression levels, for which we further studied the metabolic characteristics of the two subclasses. Nmftype1 was set as the experimental group and nmftype 2 as the control group. The expression data of the two subtypes were compared by limma package, The gene logFC sequence was obtained by comparing the gene expression difference between the two subgroups. We downloaded C2.CP. Kegg. V7.0.symbols document from GSEA's o cial website as a reference and used GSEABase R package for GSEA analysis [24]. According to the standard of adjust.p < 0.05, 47 enrichment pathways were obtained, Among them, there are 6 pathways related to metabolism, If NES was positive, the pathway was considered as nmftype1 speci c metabolic pathway. If NES was negative, the pathway was considered as nmftype2 speci c metabolic pathway. It was found that there was a speci c metabolic pathway in nmftype1, namely pyrimidine related metabolic pathway. There are 5 speci c metabolic pathways in nmftype2, including the metabolic process of CYTOCHROME P450, ARACHIDONIC ACID, DRUG METABOLISM CYTOCHROME_P450, TRYPTOPHAN, etc. (Fig. 1E) (Table S5) 3.3 Transcriptome comparison of the two subtypes The difference of mRNA expression between the two subtypes was analyzed according to limma package, 556 differentially expressed genes (482 down-regulated genes and 74 up-regulated genes) were obtained with the threshold of logFC > 1 and P < 0.05 (Table S6). Up-regulated gene sets were regarded as nmftype1 speci c gene set, and down-regulated gene sets were regarded as nmftype2 speci c gene set.Then the ClusterPro ler package was used to analyze KEGG and GO enrichment of up-regulated and down-regulated genes respectively, and the enrichment analysis results were obtained using P.ADjust < 0.05 as the screening criteria. The enrichment analysis results were obtained using adjust.P < 0.05 as the screening criteria( Figure S5). GO enriched nmftype1 was associated with genetic material. Nmftype2 is related to extracellular matrix. The results of KEGG enrichment showed that nmftype1 had no speci c enrichment, and the rst three speci c enrichment pathways of nmftype2 were ECM-receptor interaction, Focal adhesion, PI3K-Akt signaling pathway. (Table S7) (Table S8) 3.4 Evaluation of immune in ltration CIBERSORT was used to obtain the in ltration score of immune cells in the samples (Table S9). Samples with credible in ltration results were selected according to the threshold value of P < 0.05, and the number of samples changed from 742 to 737 (nmftype 1 samples changed from 469 to 466, nmftype 2 samples changed from 273 to 271). The corresponding heat map of immune cell in ltration was shown in the gure( Fig. 2A). Wilcox.test was used to compare the in ltration scores of each immune cell in the two groups, There were signi cant differences between the two groups in in ltration of multiple immune cells. B-cell in ltration of nmftype1 subtype was signi cantly higher than that of nmftype2 subtype, while T-cell and macrophage in ltration of nmftype2 subtype was higher than that of nmftype1 subtype (Fig. 2B). The relationship between subtypes and gene expression of 10 potential targeted immune checkpoints was further investigated(CTLA-4/CCL2/ CD276/CD4/CXCR4/IL1A/IL6/LAG3/ PDL1(CD274)/ TGFB1). These genes were selected based on current drug inhibitors in clinical trials or drugs that have been approved for speci c cancer types. The results showed that there were differences among the 9 immune targets, among which 7 genes (CTLA-4/CCL2/ CD276/ CXCR4/ IL6/LAG3/PDL1 (CD274)) were highly expressed in nmftype2 compared with nmftype1 (P < 0.05). The 2 immune checkpoint genes of nmftype1 had higher expression levels than nmftype2. (Fig. 2C)

Relationship between subtypes and clinical characteristics
To further study the relationship between subtypes and clinical features. GSE10846 with complete clinical information was selected as the case set for clinical feature analysis ( Table 2). Chi-square test was used to compare the clinical characteristics of different subtypes. It was found that Extranodalsites (P = 0.040) and GEP subtype(P < 0.001) were correlated with the metabolic subtype in the GSE10846 dataset, but not with age(P = 0.779) gender (P = 0.536), stage(P = 1), ECOG(P = 0.372) in the GSE10846 dataset. (Fig. 3A) 3.6 Comparison of prognostic characteristics of the two subtypes signi cant prognostic difference was observed in combined data set (log-rank test P < 0.0001), the overall survival of nfmtype1 was worse than that of nmftype2. And there were signi cant differences in the prognosis among the four data sets GSE10846(OS P < 0.0001) GSE11318(OS P = 0.00085) GSE53786(OS P = 0.011) GSE87371(OS P < 0.016). The combined samples were strati ed by ABC and GCB, and the nmftype1 suggested a worse prognosis than nmftype2 in GCB samples (P = 0.0049), The same analysis was performed on GSE31312 data set, and consistent results were found in GSE31312 (overall data OS:P = 0.00021)(GCB data OS:P < 0.0001) (Fig. 3B) 3.7 Comparison of metabolic subtype classi cation and existing GEP subtype classi cation Comparing the distribution of the two metabolic subtypes and GEP subtypes with ggstatplot package, it can be seen that ABC subtype accounts for 46%, GCB subtype accounts for 35% in nmftype1 subtype, ABC subtype accounts for 21% and GCB subtype accounts for 52% in nmftype2 subtype. The original unclassi ed subtype is also classi ed according to metabolic subtypes. (Fig. 4A). Three prognostic model indexes [23] were calculated by log-rank test and cox prognostic model. ABC and GCB subtypes: OS(Chisq = 45.6, p = 1e-11) cox prognostic model (C-index: 0.619 ± 0.018), AIC = 2352.155); nmftype1 and nmftype2 subtypes: OS (Chisq = 23.8, P = 1E-06) COX prognostic model (C-index:0.574 ± 0.015,AIC = 3054.996); The two subtypes were combined to obtain four groups of ABC-nmftype1 and ABC-nmftype2, GCB-nmftype1 and GCB-nmftype2: OS (Chisq = 50.7, p = 6e-11)cox prognostic model(C-index:0.637 ± 0.018,AIC = 2347.423). According to the results, the prediction ability of the nmftype subtype was weaker than that of the ABCGCB subtype. However, the new classi cation method combined with the existing subtype classi cation method had better prediction effect than the existing ABCGCB subtype. (Fig. 4B).

3.8Building a gene classi er
In order to build a simpli ed classi er for clinical use, 34 genes which contributed to NMF clustering were selected to construct the gene classi er. (Fig. 4C) (Table S10). This gene classi er was used to predict the metabolic subclass of GSE31312 and the predicted grouping results of this classi er were compared with the original NMF classi cation, (Table S11) We observed a consistency of 73.2% for nmftype1 subclass and 86.9% for nmftype2 subclass. The results indicate that the DLBCL classi cation can be determined repeatably with 34 gene signatures. (Fig. 4B).

Discussion
Although several DLBCL classi cations based on gene expression and mutation have been proposed in recent years, there are still problems such as insu cient sample size, complexity, and inability to classify samples completely. In this study, ve DLBCL data sets (GSE10846, GSE11318, GSE53786, GSE87371 and GSE23501) from the GEO database GPL570 platform were combined to obtain 742 DLBCL samples, and two metabolic subtypes, nmftype1 and nmftype2 were identi ed. The study found that nmftype2 had obvious metabolic characteristics, high expression of immune checkpoint gene, mainly in ltration of T cells and macrophages, and a good prognosis. The metabolic signal of nmftype1 was lower than that of nmftype2, most of the immune checkpoint genes were low in expression, and b-cell in ltration was predominant, indicating a worse prognosis. In general, this study classi ed DLBCL from the perspective of metabolism, and obtained two clusters with active and failing metabolic activities. Moreover, further analysis of metabolic subtypes found that DLBCL has different characteristics of immune in ltration and prognosis.
In  [25] In this study, two subtypes with different metabolic characteristics were also found. Nmftype1 showed low metabolic activity. There was only one pathway related to the metabolism of pyrimidine, and nmftype2 showed high metabolic activity. The abundance of metabolic characteristics indicates that nmftype2 patients are more likely to bene t from metabolic therapy. For example, RNA interference or drug interference with tigecycline (Tigecyl), an FDA approved inhibitor, has selective toxicity to OxPhos-DLBCL cell line tumors. [25]In 2019, Johanna chiche et al. Found that DLBCL with low GAPDH expression was metabolized by OXPHOS and depended on mTORC1 signal transduction and glutamine decomposition. Lymphoma with low GAPDH expression showed poor response to R-CHOP treatment and was sensitive to mitochondrial metabolic inhibitors.
[26]These studies provide insights into the prediction of potential responses to metabolic therapy. The identi cation of two subtype speci c metabolic pathways is expected to provide metabolic therapy for some metabolic processes, providing an alternative for chemotherapy resistant patients.
In complex tumor microenvironment, metabolic reprogramming is mainly guided by the serine/threonine kinase mTOR (mammalian target of rapamycin). [27]The enrichment of nmftype1 by go was related to genetic material, Nmftype2 is associated with extracellular matrix.The results of KEGG enrichment showed that nmftype1 had no speci c enrichment, and the rst three speci c enrichment pathways of nmftype2 were ECM-receptor interaction, Focal adhesion, PI3K-Akt signaling pathway. Focal adhesion signal inhibitor E7123 can induce caspase dependent cell death in DLBCL.
[28] The e cacy of E7123 in the treatment of DLBCL has been veri ed in a mouse model of diffuse large B-cell lymphoma with central nervous system involvement. [29] ROR1 signi cantly promotes the tumorigenesis of DLBCL by regulating PI3K / Akt / mTOR signaling pathway. Targeting ROR1 may provide a promising strategy for the treatment of DLBCL. [30] In rare cases, patients with relapsed and refractory DLBCL have achieved complete response, which suggests that some B-lymphomas may be extremely sensitive to rapalogs, possibly because they show active mTORC1 signal transduction [27].focal adhesion signal inhibitor E7123 and targeted ROR1 may be therapeutic strategies for nmftype2, but may not be effective for nmftype1. When PI3K and Akt inhibitors were used in cell lines and mouse models, the phosphatidylinositol-3-kinase (PI3K)α/δ(PI3Kα/δ) Inhibitor AZD8835 showed signi cant effect in ABC DLBCL model, while azd5363, a protein kinase B (Akt) inhibitor AZD5363 induced apoptosis in PTEN de cient DLBCLs. [31] the enrichment of nmftype2 on PI3K Akt signaling pathway suggests that nmftype2 is more likely to bene t from the treatment of PI3K and Akt inhibitors. The discovery of two speci c pathways of DLBCL metabolic subtypes can reveal the molecular pathogenesis and potential heterogeneity of the two DLBCL metabolic subtypes, and provide a reference for the development of new therapeutic schemes and the screening of clinically bene cial patients.
At present, many immunotherapies have shown encouraging results in patients with recurrent or refractory DLBCL [32]. Metabolic changes in tumor microenvironment (TME) can inhibit anti-tumor immunity such as immune cell in ltration by producing immunosuppressive metabolites. Metabolic disorders of cancer cells will further affect the expression of cell surface markers, thus interfering with immune monitoring [33]. the two metabolic subtypes showed different characteristics of immune in ltration, among which nmftype1 subtype showed signi cant B cell in ltration, and nmftype2 subtype T cell and macrophage in ltration were signi cant. Insu cient in ltration of T cells and / or NK cells indicates low survival rate [34], which may be related to the poor prognosis of nmftype1. Seven potential immunotherapeutic targets or immune checkpoint gene of high metabolic nmftype2 subtype were higher than those of low metabolic nmftype1 (CTLA-4 / CCL2 / PD-1 (CD276) / CXCR4 / / IL6 / Lag3 / PDL1 (CD274)) (P < 0.05). The expression level of IL1A / TGFB in nmftype1 was higher than that of nmftype2 subtypes (P < 0.05). High expression of PD-L1 / PD-L2 in DLBCL cells is associated with good prognosis [35], which is consistent with our conclusion. Because tumor microenvironment promotes immune escape, the clinical bene ts of immunosuppression are still small.
[36] the combination of immunotherapy and antimetabolism therapy may change this situation. previous studies have shown that the targeted blocking effect of CD73 signi cantly enhances the therapeutic activity of anti PD-1 and anti CTLA-4 monoclonal antibodies, which con rms this conclusion.
[37] the two metabolic subtypes have different potential targets for immunotherapy, which provides guidance for the combination of immunotherapy and antimetabolic drugs. Nmftype2 is characterized by high metabolism and high expression of immunotherapy target, which is more likely to bene t from immunotherapy and metabolic therapy. The high expression of IL1A / TGFB1 in nmftype1 may provide possibilities to improve the poor prognosis of nmftype1 subtypes.
In order to further guide the clinical practice, we also studied the relationship between the classi cation of the two subtypes and the clinical features and prognosis. Through the analysis of geo10846 data set, we found that the two subtypes were only correlated with Extranodalsites (p = 0.040) and ABCGCB subtype (p < 0.001). Considering the incomplete clinical feature collection, this problem can be further studied. The results showed that the classi cation of metabolic subtypes was closely related to prognosis (OS).
Nfmtype1 showed worse prognosis (OS). In the strati ed analysis of ABC and GCB subtypes, nmf1 subtype still indicated worse prognosis (OS) in GCB samples (P = 0.0049), and was veri ed in GSE31312.
Compared with the original GEP subtype distribution, this classi cation method can better de ne the unclassi ed subtypes. The model method for distinguishing metabolic subtypes is not a simple repetition of GCB and ABC classi cation, and has new clinical signi cance. Although the ability of metabolic subtypes to predict prognosis is weaker than ABCGCB subtypes, the combination of the metabolic model classi cation and the original ABCGCB classi cation has a better predictive effect on the prognosis than ABCGCB alone. GCB-nmftype2, GCB-nmftype1, ABC-nmftype2 and ABC-nmftype1 in turn indicate the possibility of worse prognosis. With this model, the existing subtypes that can not be classi ed can be de ned, the heterogeneity of ABCGCB can be solved to a certain extent, and a better prognosis prediction model than the existing ABCGCB subtypes is established. In order to further simplify and guide the clinical application of the metabolic classi cation model, we constructed 34 gene classi ers to predict which metabolic subtype the samples belong to. By verifying the metadata set itself, it is found that the grouping result of the classi er is consistent with that of the original NMF classi cation method.
In the existing studies de ning DLBCL subtypes, the sample size of this study is at the forefront [3]. This study provides a new insight into the heterogeneity of DLBCL from the perspective of metabolism. Two metabolic subtypes with different characteristics in transcription, metabolism, immunity and prognosis have been identi ed. We provide data on the role of immune landscape in DLBCL and how Metabolism-Related DLBCL subtypes affect the cellular composition and immune function of tumor microenvironment. New knowledge may lead to promising treatments in the near future. A new prognostic model combining metabolic subtypes and ABCGCB subtypes has been proposed for the rst time, and the prediction ability of this model for DLBCL is better than the existing ABCGCB subtypes.
Finally, we studied the 34 gene simpli ed classi er to predict the metabolic subtype model, which provides the possibility for clinical application. This study provides insights into the potential response to metabolic therapy. By revealing the characteristics of metabolism, malignant tumor molecular subsets with unique biomarkers can be identi ed, which brings opportunities for new treatment opportunities. Through better understanding of metabonomic changes, these targets can be transformed into drug treatable tumor metabolites and developed anti metabolic therapy for high-risk DLBCL.

Conclusions
In conclusion, we classi ed DLBCL by metabolic genes and got a new typing method. This classi cation has prognostic signi cance and is helpful to guide the individualized application of immunotherapy and metabolic therapy. This study provides a new understanding of the heterogeneity of DLBCL from the perspective of metabolism.  Immune characteristics of two subclasses in the metadata dataset. The violin gure of the abundance of immune distinguished by different subclasses, The statistical difference was compared through the wilcox.test, the P values are labeled above each the violin gure. (C) Expression levels of 10 immune checkpoint genes in two DLBCL subclasses (normalized count).Wilcox. test was used to compare the statistical differences. P values were marked with an asterisk above each boxplot (NS means no signi cance, *P < 0.05, **P < 0.01, ***P < 0.001).

Figure 3
Page 22/23 (A) clinical features of the DLBCL subclass in the GSE10846 dataset (B) the differences in OS between the two subtypes (nmftype1, nmftype2) in the metadata set and ve separate datasets(GSE23501 GSE53786 GSE10846 GSE87371 GSE11318)were veri ed by GSE31312, and the statistical signi cance of the differences was determined by logarithmic rank test.

Figure 4
Page 23/23 (A) distribution of GEP subtypes in the two metabolic subtypes. The predictive ability of the three prognostic models(metabolic subtype, GEP subtypes, joint metabolic subtypes and GEP subtypes) for OS was compared. Chi-square value was calculated by log-rank test, and C-index and AIC were calculated by

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.