3.1 Identify two metabolism-related subtypes according to metabolic genes
A flowchart was developed to systematically describe our study (Fig. 1A). The Canonical Pathways gene set from the KEGG database were downloaded from the GSEA website(https://www.gsea-msigdb.org/). According to this table, the key words of "metabolism" were used as screening conditions to extract the genes related to metabolic pathways, and 948 metabolism-related genes were obtained(Table S1). Among them, 883 genes were expressed in the five data sets. The expression data of these 883 metabolic genes were extracted from the 5 data sets, and The ComBat function of the sav package was used to remove the batch effect(Figure S2), and the merge function merges the data, The metadata was used for subsequent non-negative matrix decomposition (NMF) clustering.
Before NMF is executed, the filtering process is performed. MAD function of DescTools package was used to calculate MAD of each gene in the combined data of 5 data sets. 778 eligible genes were selected according to MAD > 0.5. Then, Cox regression was performed to evaluate the association between all candidate genes and overall survival (OS) by using r-package “survival”. 106 metabolic genes with strong prognosis were obtained (Table S2). Finally, 106 genes with high variability (MAD > 0.5) and significant prognostic value (P < 0.001) were used for sample clustering.
Using the NMF function of the NMF package, according to the expression profiles of the 106 candidate genes mentioned above, the NMF clustering was performed on the combined sample set of five data sets. The K-value at which the cophenetic coefficient begins to decline was selected as the optimal decomposition point, and k = 2 was finally obtained as the optimal clustering number.[20] (Fig. 1B). The 34 genes that made major contributions to the NMF were obtained via the NMF package's extractFeatures function.( "NNMT", "PTGDS", "HNMT", "ASAH1", "ENPP2", "DPYD", "OAT", "ALDH2", "DEGS1", "TDO2", "KYNU", "PLA2G7", "GUCY1A1", "PAPSS2", "GUCY1B1", "PLPP3", "CMPK2", "PIK3CG", "PTGIS", "KMO", "PLPP1", "MTMR6", "AKR1C3", "PAPSS1", "PRPS2", "NMRK1", "NANS" , "IMPDH2", "PAICS", "SRM", "PFKP", "GLO1", "POLD2", "PNP"). NMF analysis was performed again with the major contributing genes, and two subclasses were obtained through NMF, which were 469 cases of nmftype1 and 273 cases of nmftype2.(Figure S3) (Table S3) In order to verify the distribution of subclasses, we also performed PCA to reduce the dimension of features, and found that the subtypes are basically consistent with the two-dimensional PCA distribution pattern (Fig. 1C). Two subtypes were obtained by performing NMF typing on GSE31312(Figure S4). The submap of GENEPATTER was used to determine whether the subclasses identified in the two datasets were related. The results show that the nmftype1 and nmftype2 subclasses in the metadata set are highly correlated with the corresponding subclasses in GSE31312. (Fig. 1D) (Table S4)
3.2 Correlation of the DLBCL subclasses with metabolism-associated signatures
In this study, NMF classification was based on metabolism-related gene expression levels, for which we further studied the metabolic characteristics of the two subclasses. Nmftype1 was set as the experimental group and nmftype 2 as the control group. The expression data of the two subtypes were compared by limma package, The gene logFC sequence was obtained by comparing the gene expression difference between the two subgroups. We downloaded C2.CP. Kegg. V7.0.symbols document from GSEA's official website as a reference and used GSEABase R package for GSEA analysis [24]. According to the standard of adjust.p < 0.05, 47 enrichment pathways were obtained, Among them, there are 6 pathways related to metabolism, If NES was positive, the pathway was considered as nmftype1 specific metabolic pathway. If NES was negative, the pathway was considered as nmftype2 specific metabolic pathway. It was found that there was a specific metabolic pathway in nmftype1, namely pyrimidine related metabolic pathway. There are 5 specific metabolic pathways in nmftype2, including the metabolic process of CYTOCHROME P450, ARACHIDONIC ACID, DRUG METABOLISM CYTOCHROME_P450, TRYPTOPHAN, etc. (Fig. 1E) (Table S5)
3.3 Transcriptome comparison of the two subtypes
The difference of mRNA expression between the two subtypes was analyzed according to limma package, 556 differentially expressed genes (482 down-regulated genes and 74 up-regulated genes) were obtained with the threshold of logFC > 1 and P < 0.05 (Table S6). Up- regulated gene sets were regarded as nmftype1 specific gene set, and down-regulated gene sets were regarded as nmftype2 specific gene set.Then the ClusterProfiler package was used to analyze KEGG and GO enrichment of up-regulated and down-regulated genes respectively, and the enrichment analysis results were obtained using P.ADjust < 0.05 as the screening criteria. The enrichment analysis results were obtained using adjust.P < 0.05 as the screening criteria(Figure S5). GO enriched nmftype1 was associated with genetic material. Nmftype2 is related to extracellular matrix. The results of KEGG enrichment showed that nmftype1 had no specific enrichment, and the first three specific enrichment pathways of nmftype2 were ECM-receptor interaction, Focal adhesion, PI3K-Akt signaling pathway. (Table S7)(Table S8)
3.4 Evaluation of immune infiltration
CIBERSORT was used to obtain the infiltration score of immune cells in the samples (Table S9). Samples with credible infiltration results were selected according to the threshold value of P < 0.05, and the number of samples changed from 742 to 737 (nmftype 1 samples changed from 469 to 466, nmftype 2 samples changed from 273 to 271). The corresponding heat map of immune cell infiltration was shown in the figure(Fig. 2A). Wilcox.test was used to compare the infiltration scores of each immune cell in the two groups, There were significant differences between the two groups in infiltration of multiple immune cells. B-cell infiltration of nmftype1 subtype was significantly higher than that of nmftype2 subtype, while T-cell and macrophage infiltration of nmftype2 subtype was higher than that of nmftype1 subtype(Fig. 2B). The relationship between subtypes and gene expression of 10 potential targeted immune checkpoints was further investigated(CTLA-4/CCL2/ CD276/CD4/CXCR4/IL1A/IL6/LAG3/
PDL1(CD274)/ TGFB1). These genes were selected based on current drug inhibitors in clinical trials or drugs that have been approved for specific cancer types. The results showed that there were differences among the 9 immune targets, among which 7 genes (CTLA-4/CCL2/ CD276/ CXCR4/ IL6/LAG3/PDL1 (CD274)) were highly expressed in nmftype2 compared with nmftype1 (P < 0.05). The 2 immune checkpoint genes of nmftype1 had higher expression levels than nmftype2. (Fig. 2C)
3.5. Relationship between subtypes and clinical characteristics
To further study the relationship between subtypes and clinical features. GSE10846 with complete clinical information was selected as the case set for clinical feature analysis (Table 2). Chi-square test was used to compare the clinical characteristics of different subtypes. It was found that Extranodalsites (P = 0.040) and GEP subtype(P < 0.001) were correlated with the metabolic subtype in the GSE10846 dataset, but not with age(P = 0.779) gender (P = 0.536), stage(P = 1), ECOG(P = 0.372) in the GSE10846 dataset. (Fig. 3A)
Table 2
Relationship between GSE10846 subtypes and clinical data
Characteristic
|
GSE10846
|
nfmtype1
|
nfmtype2
|
X2
|
P
|
年龄
|
|
|
|
|
|
<=60
|
188
|
113
|
75
|
|
|
> 60
|
226
|
140
|
86
|
0.079
|
0.779
|
性别
|
|
|
|
|
|
男
|
224
|
130
|
94
|
|
|
女
|
172
|
106
|
66
|
0.383
|
0.536
|
stage
|
|
|
|
|
|
Ⅰ-Ⅱ
|
188
|
116
|
72
|
|
|
Ⅲ-Ⅳ"
|
218
|
134
|
84
|
0
|
1
|
ECOG
|
|
|
|
|
|
༜2
|
296
|
180
|
116
|
|
|
≥ 2
|
93
|
62
|
31
|
0.798
|
0.372
|
Extranodalsites
|
|
|
|
|
|
༜2
|
353
|
226
|
127
|
|
|
≥ 2
|
30
|
13
|
17
|
4,201
|
0.040
|
subtype
|
|
|
|
|
|
ABC
|
167
|
124
|
43
|
|
|
GCB
|
183
|
95
|
88
|
17.66
|
< 0.001
|
3.6 Comparison of prognostic characteristics of the two subtypes
significant prognostic difference was observed in combined data set (log-rank test P < 0.0001), the overall survival of nfmtype1 was worse than that of nmftype2. And there were significant differences in the prognosis among the four data sets GSE10846(OS P < 0.0001) GSE11318(OS P = 0.00085) GSE53786(OS P = 0.011) GSE87371(OS P < 0.016). The combined samples were stratified by ABC and GCB, and the nmftype1 suggested a worse prognosis than nmftype2 in GCB samples (P = 0.0049), The same analysis was performed on GSE31312 data set, and consistent results were found in GSE31312 (overall data OS:P = 0.00021)(GCB data OS:P < 0.0001)(Fig. 3B)
3.7 Comparison of metabolic subtype classification and existing GEP subtype classification
Comparing the distribution of the two metabolic subtypes and GEP subtypes with ggstatplot package, it can be seen that ABC subtype accounts for 46%, GCB subtype accounts for 35% in nmftype1 subtype, ABC subtype accounts for 21% and GCB subtype accounts for 52% in nmftype2 subtype. The original unclassified subtype is also classified according to metabolic subtypes. (Fig. 4A). Three prognostic model indexes[23] were calculated by log-rank test and cox prognostic model. ABC and GCB subtypes: OS(Chisq = 45.6, p = 1e-11) cox prognostic model (C-index: 0.619 ± 0.018), AIC = 2352.155); nmftype1 and nmftype2 subtypes: OS (Chisq = 23.8, P = 1E-06) COX prognostic model (C-index:0.574 ± 0.015,AIC = 3054.996); The two subtypes were combined to obtain four groups of ABC-nmftype1 and ABC-nmftype2, GCB-nmftype1 and GCB-nmftype2: OS (Chisq = 50.7, p = 6e-11)cox prognostic model(C-index:0.637 ± 0.018,AIC = 2347.423). According to the results, the prediction ability of the nmftype subtype was weaker than that of the ABCGCB subtype. However, the new classification method combined with the existing subtype classification method had better prediction effect than the existing ABCGCB subtype. (Fig. 4B).
3.8Building a gene classifier
In order to build a simplified classifier for clinical use, 34 genes which contributed to NMF clustering were selected to construct the gene classifier. (Fig. 4C)(Table S10). This gene classifier was used to predict the metabolic subclass of GSE31312 and the predicted grouping results of this classifier were compared with the original NMF classification, (Table S11) We observed a consistency of 73.2% for nmftype1 subclass and 86.9% for nmftype2 subclass. The results indicate that the DLBCL classification can be determined repeatably with 34 gene signatures. (Fig. 4B).