3.1 The Distribution Landscape of 547 Samples Across the 8 GEO Cohorts
Eight NASH datasets were retrieved from the GEO database, specifically GSE135251, GSE89632, GSE115193, GSE126848, GSE48452, GSE61260, GSE126848, and GSE130970.The number of normal and NASH samples contained in these 8 GEO cohorts is displayed in Fig. 2A and 2B. The GSE135251 and GSE48452 datasets were merged after batch effect removal (Figs. 2C and 2D). The merged dataset serves as the training set, comprising 51 normal samples and 173 NASH samples (Fig. 2E).
Figure 2 The processing of samples from the 8 GEO cohorts. (A) Number of normal samples in the 8 cohorts. (B) Number of NASH samples in the 8 cohorts. (C) Training set before batch effect removal. (D) Training set after batch effect removal. (E) PCA plot of the distribution of normal and NASH samples in the training set.
3.2 Fifteen MRGs Enrich in Metabolic Pathways
Using a criterion of |logFC| > 0.5 and adjusted p < 0.05 in the training dataset, a total of 197 DEGs were filtered (Fig. 3A). Among these, 78 DEGs were underexpressed, and 119 DEGs were overexpressed in the training dataset (Fig. 3B). By intersecting the 197 DEGs with 2030 MRGs, 15 mitochondria-related differential genes were ultimately obtained (Fig. 3C). PCK1, MTHFD2L, P4HA1, IGF1, and NR4A1 were downregulated in the training dataset, while FDPS, ACE2, E2F1, ME1, FADS1, TYMS, FASN, MMP9, TREM2, and AKR1B10 showed the opposite trend (Fig. 3D). By analyzing the biological pathways associated with these 15 genes, we observed their significant enrichment in various metabolic pathways including mitochondrial tissue regulation, cholesterol metabolism, fatty acid metabolism, and monocarboxylic acid metabolism (Fig. 3E, 3F). Furthermore, these 15 genes are closely associated with diseases such as fatty liver, hepatitis, and liver injury (Fig. 3G).
Figure 3 Identification of 15 mitochondria-related differential genes. (A) Volcano plot. DEGs with |logFC| > 1.0 are highlighted with gene names. (B) 78 downregulated DEGs and 119 upregulated DEGs in the training dataset. (C) Intersection of DEGs and MRGs results in 15 genes. (D) 5 downregulated DEGs and 10 upregulated DEGs. (E) 9 most significant biological pathways associated with the 15 DEGs in the Metascape database. (F) Gene and PPI network graph related to the 15 DEGs in the GeneMania database. (G) The top enriched disease for the 15 DEGs is fatty liver.
3.3 Construction of a Predictive Model for Three Mitochondrial-Related Genes
In further screening key genes for constructing a diagnostic model in NASH, a total of 101 machine learning combination algorithms were employed. The Random Forest (RF) algorithm emerged as the optimal choice, as it successfully identified three crucial genes, namely AKR1B10, TYMS, and TREM2, in the training set (Merge cohort) and across seven validation datasets (GSE164760, GSE130970, GSE61260, GSE115193, GSE126848, GSE89632, and Meta-Cohort). The Mitochondrial-Related Model, comprised of these three genes, demonstrated excellent diagnostic performance with ROC values of 0.999, 0.874, 0.935, 0.944, 1.000, 0.974, 0.991, and 0.933 across the eight cohorts (Fig. 4A). Additionally, we investigated the ROC values of these three genes individually for NASH diagnosis in the same eight cohorts (Figs. 4B-4I). Surprisingly, the Mitochondrial-Related Model exhibited superior performance in diagnosing NASH compared to AKR1B10, TYMS, and TREM2.
Furthermore, a comparative analysis of the expression levels of AKR1B10, TYMS, and TREM2 across different subgroups was conducted. Compared to healthy individuals, AKR1B10, TYMS, and TREM2 exhibited significant upregulation in NASH patients (Figs. 5A-5H). Moreover, AKR1B10 and TYMS were associated with liver fibrosis progression, as they showed markedly higher expression in patients with liver fibrosis stages F3-F4 (Figs. 5I-5M). Additionally, in NAFLD-HCC patients, AKR1B10 and TYMS were also significantly upregulated (Fig. 5N). These findings suggest the involvement of AKR1B10 and TYMS in the progression from normal liver to non-alcoholic fatty liver disease, liver fibrosis, and ultimately to liver cancer.
Figure 4 Constructing NASH diagnostic models with machine learning algorithms.
(A) ROC values for the diagnosis of NASH using 113 machine learning combination algorithms in the training set (Merge cohort) and validation sets (GSE164760, GSE130970, GSE61260, GSE115193, GSE126848, GSE89632, Meta-Cohort). (B-I) ROC values for the diagnosis of NASH in the training set (Merge cohort) and validation sets (GSE164760, GSE130970, GSE61260, GSE115193, GSE126848, GSE89632, Meta-Cohort).
Figure 5 The expression differential landscape of three model genes among different subgroups. (A) The expression differences of AKR1B10, TYMS, and TREM2 between the normal group and NASH group are shown in 8 datasets (Merge, GSE164760, GSE130970, GSE61260, GSE115193, GSE126848, GSE89632, Meta-Cohort). (B) The expression differences of AKR1B10, TYMS, and TREM2 between different liver fibrosis stage groups are presented in 5 datasets (Merge, GSE130970, GSE89632, GSE49541, GSE213621). (C) The expression differences of AKR1B10, TYMS, and TREM2 between the NAFLD group and NAFLD-HCC group are depicted in the GSE164441 dataset. * P < 0.05, ** P < 0.01, *** P < 0.001.
3.4 Upregulation of Three Model Genes Indicates a More Severe NASH
Next, the enriched biological pathways and mitochondrial pathways in the NASH group were further investigated. NASH patients were significantly enriched in “Nitrogen metabolism”, “Cysteine and methionine metabolism”, and “Nicotinate and nicotinamide metabolism”, which are associated with the progression of NASH (Fig. 6A). Furthermore, the mitochondrial pathways “OXPHOS”, “Complex IV”, “CIV subunits”, “Complex I”, “Fe − S cluster biosynthesis”, “Tetrahydrobiopterin synthesis”, “CI subunits”, “Iron homeostasis”, “Lipoate insertion”, “fMet processing”, and “CII assembly factors” were significantly upregulated in the NASH group (Fig. 6B). These mitochondrial pathways are typically associated with mitochondrial function, metabolic regulation, oxidative stress, and inflammation in various biological processes. We also investigated the significant variations in immune function scores and immune cell populations between the NASH group and the normal group “HLA” and “Inflammation-promoting” showed significant upregulation in the NASH group (Fig. 6C). And They are linked to inflammation promotion “Macrophages M1” have the capability to release pro-inflammatory cytokines, and their abundance is noticeably higher in the NASH group (Fig. 6D). Conversely, “Macrophages M2” can synthesize and discharge anti-inflammatory cytokines, and their quantities are reduced in the NASH group.
In NASH patients, we investigated the significantly enriched biological pathways associated with the upregulation of TREM2, TYMS, and AKR1B10, respectively. These were notably enriched in “Lysine degradation” and “Glycine serine and threonine metabolism” both of which are related to the metabolic abnormalities and lipid metabolism in NASH (Figs. 7A-7C). TREM2, TYMS, and AKR1B10 are significantly positively correlated with pro-inflammatory genes (CCL2, IL1B, CSF1, HLA-DRA, IL10, PDGFA, TGFB1, TGFB2, TGFB3, TNF) and liver fibrosis genes (COL1A1, COL3A1) (Fig. 7D). TREM2 and TYMS are significantly positively correlated with the lipid synthesis gene (PPARG), while TREM2, TYMS, and AKR1B10 are significantly negatively correlated with genes related to fatty acid beta-oxidation (PPARA). TREM2, TYMS, and AKR1B10 are significantly associated with top-ranked NASH genes from the GeneCard database (PNPLA3, GPT, CYP2E1, ATG7, MSR1, KRT18, PRMT7, SREBF1, TM6SF2, LEPR, LPL) (Fig. 7E). Among these, PNPLA3 is the top-ranked NASH-related gene, which is associated with TYMS. TREM2, TYMS, and AKR1B10 are significantly positively correlated with various immune signatures, especially “Inflammation-promoting”, “CCR”, “HLA”, “MHC class I”, and “Parainflammation” (Fig. 7F). TREM2, TYMS, and AKR1B10 are significantly associated with monocytes and macrophages. They are significantly positively correlated with pro-inflammatory M1 macrophages and significantly negatively correlated with anti-inflammatory M2 macrophages. AKR1B10, TYMS, and TREM2 are significantly positively correlated with NAS (NAFLD Activity Score) scores (Fig. 7G).
Figure 6 Differential pathways and immune signatures between the normal group and the NASH group. (A) Upregulated and downregulated biological pathways in the NASH group are shown in GSVA plots. (B) Upregulated and downregulated mitochondrial pathways in the NASH group are depicted in GSVA plots. (C) Distinct immune signatures between the normal group and the NASH group. * P < 0.05, ** P < 0.01, *** P < 0.001. (D) Differential immune cell populations between the normal group and the NASH group. * P < 0.05, ** P < 0.01, *** P < 0.001.
Figure 7 TREM2, TYMS, and AKR1B10's potential roles in NASH. (A-C) Enriched biological pathways associated with upregulation and downregulation of TREM2, TYMS, and AKR1B10. (D) Upregulation of TREM2, TYMS, and AKR1B10 exacerbates inflammation (CCL2, IL1B, CSF1, HLA-DRA, IL10, PDGFA, TGFB1, TGFB2, TGFB3, TNF), lipid deposition (PPARG), and liver fibrosis (COL1A1, COL3A1). (E) Significant associations of TREM2, TYMS, and AKR1B10 with NASH genes from the GeneCard database (with NASH relevance scores > 10). (F) Correlations of TREM2, TYMS, and AKR1B10 with 13 immune signatures and 19 immune cells. (G)The relationship between AKR1B10, TYMS, and TREM2 and NAS scores. * P < 0.05, ** P < 0.01, *** P < 0.001.
3.5 The Mitochondrial Characteristics of the Three Model Genes
TREM2, TYMS, and AKR1B10 are significantly enriched in “Lysine metabolism” and “Glycine metabolism” (Fig. 8A). “Lysine metabolism” involves the metabolism of lysine and is associated with liver fibrosis, while “Glycine metabolism” is involved in the metabolism of glycine and is related to abnormalities in liver fat metabolism and sugar metabolism. Furthermore, AKR1B10, TYMS, and TREM2 are significantly associated with respiratory chain complex (I-V) genes (Figs. 8D-8H).
Figure 8 Mitochondrial characteristics of AKR1B10, TYMS, and TREM2. (A-C) Enriched mitochondrial pathways associated with the upregulation and downregulation of AKR1B10, TYMS, and TREM2. (D-H) Significant associations of AKR1B10, TYMS, and TREM2 with mitochondrial respiratory chain complex (I-V) genes. * P < 0.05, ** P < 0.01, *** P < 0.001.
3.6 AKR1B10 and TREM2 are enriched in M1 macrophages
A total of 30,038 single cells were obtained from the GSE129516 dataset. The resolution was set to 1.5 for dimensionality reduction of the corrected data (Fig. 9A). Single-cell data were classified into 28 cell clusters, which were subsequently auto-annotated into 8 cell types (Fig. 9B). The distribution landscape of these 8 cell types is shown in Fig. 9C. Considering the limitations of the “SingleR” package, manual annotation was performed again. Using immune cell surface markers for re-annotation of single-cell data, the markers for the 8 immune cell types are displayed in Fig. 9D. Single-cell data were re-annotated as M1 macrophages, M2 macrophages, fibroblasts, CD8 + T cells, CD4 + T cells, neutrophils, and B cells (Fig. 9E). AKR1B10 and TREM2 were significantly enriched in M1 macrophages associated with inflammation (Fig. 9F).
Figure 9 Single-cell annotation figures. (A) Dendrogram. Single-cell data were classified into different clusters at different resolutions. (B) Automatic annotation plot using the “SingleR” package. (C) Single-cell data were automatically annotated into 8 cell types. (D) Expression of 7 immune cell surface markers across 28 clusters. (E) Manual annotation of single-cell data into 7 immune cell types. (F) AKR1B10 and TREM2 are significantly enriched in M1 macrophages.
3.7 The Expression Pattern of Mitochondrial-Related Genes in NASH
NASH patients were divided into two subgroups based on the expression levels of three model genes (Fig. 10A). Principal component analysis. The effective separation of NASH patients into two distinct subgroups was confirmed by the principal component analysis (Fig. 10B). AKR1B10, TYMS, and TREM2 genes were significantly upregulated in the C1 group (Fig. 10C). Additionally, we observed a higher proportion of patients with NAS scores and liver fibrosis stages F3-F4 in the C1 subgroup (Figs. 10D, 10E). In the C1 subgroup of patients, the expression levels of pro-inflammatory genes (CCL2, IL1B, CSF1, HLA-DRA, PDGFA, TGFB1, TGFB3, TNF) and liver fibrosis genes (COL1A1, COL3A1) were higher (Fig. 10F). Additionally, lipid synthesis gene (PPARG) was significantly upregulated in C1 subgroup patients, while the expression of fatty acid oxidation gene (PPARA) was lower. Immune function scores with pro-inflammatory properties, such as “CCR”, “Cytolytic activity”, “HLA”, “Inflammation-promoting”, “MHC class I”, and “Parainflammation” were significantly upregulated in the C1 subgroup (Fig. 9G). Pro-inflammatory components (Neutrophils and M1 macrophages) were more abundant in the C1 subgroup, while anti-inflammatory components (NK cells resting and M2 macrophages) showed the opposite trend (Fig. 9H). “KEGG_ALANINE_ASPARTATE_AND_GLUTAMATE_METABOLISM”, “KEGG_GLYCINE_SERINE_AND_THREONINE_METABOLISM”, and “KEGG_CYSTEINE_AND_METHIONINE_METABOLISM” are significantly upregulated in the C1 group, and these pathways are associated with inflammatory infiltration (Fig. 10I). The mitochondrial pathways “Lipoate insertion” and “Glycine metabolism” were significantly upregulated in Group C1 (Fig. 10J).
Due to the significant differences between the two groups, we performed WGCNA analysis to screen for differentially expressed genes between the two groups (Fig. 11A, 11B). The yellow module contains 217 genes and is the module most strongly positively correlated with the C1 subgroup (Fig. 11C). Differentially expressed genes are significantly enriched in inflammation-related pathways and fibrosis-related pathways, including “Chemokine receptors bind chemokines”, “IL-18 signaling pathway”, “regulation of response to wounding”, and “cellular response to tumor necrosis factor” (Figs. 11D, 11E). Furthermore, “Inflammation”, “Chronic liver disease”, and “Fibrosis” are among the top-ranked diseases enriched with differentially expressed genes (Fig. 11F).
Figure 10 The clinical and biological differences between the two groups of NASH patients. (A) NASH patients were divided into two groups. (B) NASH patients between the two groups were distinctly separated. (C) TREM2, TYMS, and AKR1B10 were significantly upregulated in Group C1. (D) Group C1 had significantly higher NAS scores. (E) Group C1 had a higher proportion of patients in liver fibrosis stages F3-F4. (F) Pro-inflammatory genes (CCL2, IL1B, CSF1, HLA-DRA, PDGFA, TGFB1, TGFB3, TNF), lipid synthesis genes (PPARG), and liver fibrosis-related genes (COL1A1, COL3A1) were significantly upregulated in Group C1. (G) Immune functional scores associated with inflammation promotion were upregulated in Group C1. (H) Differences in immune cell content between Group C1 and Group C2. (I) Significantly upregulated and downregulated biological pathways in Group C1. (J) Significantly upregulated and downregulated mitochondrial pathways in Group C1.
Figure 11 271 differentially expressed genes between the two groups were identified by WGCNA analysis. (A) The soft threshold was set to 2. (B) All genes were grouped into different gene modules. (C) The 271 genes contained within the yellow module were the most significantly different genes between the two groups. (D-E) Biological pathways significantly enriched with the 271 differentially expressed genes. (F) Diseases significantly enriched with the 271 differentially expressed genes.
3.8 The upregulation of three model genes in NASH
According to q-PCR analysis, we can observe that AKR1B10, TYMS, and TREM2 are significantly upregulated in NASH mouse (Figs. 12A-12C).
Figure 12q-PCR analysis. (A-C) AKR1B10, TREM2, and TYMS are significantly upregulated in the liver tissues of NASH mouse. * P < 0.05, *** P < 0.001, **** P < 0.0001.