Microanatomy of the metabolic associated fatty liver disease (MAFLD) by single-cell transcriptomics

Abstract Background Metabolic-associated fatty liver disease (MAFLD) is a major cause of liver disease worldwide and comprises non-alcoholic fatty liver (NAFL) and non-alcoholic steatohepatitis (NASH). Due to the high prevalence and poor prognosis of NASH, it is critical to identify and treat patients at risk. However, the aetiology and mechanisms remain largely unknown, warranting further analysis. Methods We first identified differential genes in NASH by single-cell analysis of the GSE129516 dataset and conducted expression profiling data analysis of the GSE184019 dataset from the Gene Expression Omnibus (GEO) database. Then single-cell trajectory reconstruction and analysis, immune gene score, cellular communication, key gene screening, functional enrichment analysis, and immune microenvironment analysis were carried out. Finally, cell experiments were performed to verify the role of key genes in NASH. Results We conducted transcriptome profiling of 30,038 single cells, including hepatocytes and non-hepatocytes from normal and steatosis adult mouse livers. Comparative analysis of hepatocytes and non-hepatocytes revealed pronounced heterogeneity as non-hepatocytes acted as major cell-communication hubs. The results showed that Hspa1b, Tfrc, Hmox1 and Map4k4 could effectively distinguish NASH tissues from normal samples. The results of scRNA-seq and qPCR indicated that the expression levels of hub genes in NASH were significantly higher than in normal cells or tissues. Further immune infiltration analysis showed significant differences in M2 macrophage distribution between healthy and metabolic-associated fatty liver samples. Conclusions Our results suggest that Hspa1b, Tfrc, Hmox1 and Map4k4 have huge prospects as diagnostic and prognostic biomarkers for NASH and may be potential therapeutic targets for NASH.


Introduction
Metabolic-associated fatty liver disease (MAFLD) is an umbrella term for hepatic abnormalities, including steatosis (fat accumulation, NAFL) and non-alcoholic steatohepatitis (NASH), which is NAFL plus hepatic injury, inflammation, and fibrosis [1]. MAFLD has a prevalence of ~25% worldwide and results from the inability of the liver to maintain lipid homeostasis, leading to the accumulation of triglyceride (TG), the major energy-storage molecule in mammals. This study used in silico methods to analyse MAFLD single-cell and expression profile datasets from public databases and screened the key genes related to MAFLD.
Given that obesity, insulin resistance, and diabetes mellitus have been reported to be major drivers of MAFLD, it is not surprising that the mechanistic target of rapamycin (mTOR), which sits at the crossroads of nutrient signalling [2], plays a critical role in its aetiology. Gosis et al. [3] reported that selective inhibition of a noncanonical arm of mTOR complex 1 (mTORC1) signalling could inhibit hepatic de novo lipogenesis (DNL) and protect mice from MAFLD mediated by activation of the canonical mTORC1-S6 kinase (S6K) pathway, leading to 'feedback' downregulation of FLCN and suppression of mTORC1-mediated phosphorylation of the transcription factor TFE3 by a noncanonical pathway. This mechanism results in the translocation of unphosphorylated TFE3 into the nucleus, increasing insulin-induced gene 2 (Insig2) expression, inhibiting the proteolytic processing of SREBP-1c in the endoplasmic reticulum and Golgi apparatus necessary for its transcriptional activity. This finding is consistent with the finding that activation of mTORC1 by disruption of TSC1 and TSC2 is insufficient for stimulation of DNL and requires suppression of Insig2. Another study focussed on mTORC2 and the dual-specificity tyrosine phosphorylation-regulated kinase (DYRK1B), which has been linked to metabolic syndrome [4]. Upregulated DYRK1B expression in the livers of mice fed high-fat, high-sucrose diets increased hepatic DNL, FA uptake, and TG secretion, along with the development of hyperlipidaemia and NASH. Disruption of mTORC2 reversed these abnormalities. However, these biomarkers exhibit limited value for treatment of MAFLD.
In recent years, MAFLD has been extensively studied. A study by Chutian Wu et al. [5], which included 26 simple steatosis (SS), 34 non-alcoholic steatohepatitis (NASH), and 13 healthy controls (HC) identified 6 up-and 19 downregulated genes in SS and 13 up-and 19 downregulated genes in NASH compared with HC. Moreover, intersected pathways between SS and NASH included the PI3K-Akt signalling pathway and pathways in cancer. High-throughput sequencing of the liver mRNA of Mongolian gerbils with MAFLD and further bioinformatics analysis obtained 8 hub genes Cd44, App, Cdc42, Cd68, Cxcr4, Csf1r, Adgre1 and Fermt3 involved in inflammation, fibrosis and hepatic cell carcinoma (HCC), and qRT-PCR showed that the above genes were upregulated in different periods of modelling process [6]. A comprehensive bioinformatics analysis of 3 comprehensive gene expression datasets in MAFLD tissues showed that CD24, COL1A1, LUM, THBS2 and EPHA3 were upregulated, while PZP was downregulated, and 'Glycolysis/gluconeogenesis' , 'p53 signalling pathway' , 'glycine, serine and threonine metabolism' were 3 common pathways related to the MAFLD process [7]. However, these bioinformatic studies on MAFLD have not fully elucidated the pathogenesis of MAFLD, and no in silico analysis has hitherto been performed on MAFLD single-cell sequencing datasets.
In this study, the non-alcoholic steatohepatitis (NASH) single-cell (scRNA-seq) dataset of non-parenchymal cells of mouse liver GSE129516 and mouse non-alcoholic fatty liver expression profiling dataset GSE184019 from GEO database underwent bioinformatics analysis which yielded 4 key genes. We further conducted immune cell infiltration analysis and used qRT-PCR to validate significant differences in the expression of these four genes in mouse fatty liver cells.

Data collection and preprocessing
The single cell (scRNA-seq) dataset GSE129516 of non-parenchymal cells of healthy and NASH (Non-alcoholic Steatohepatitis) mice liver were obtained from the GEO database (https://www.ncbi.nlm. nih.gov/geo/) [8]. The dataset contained samples from Mus musculus. The detection platform was Illumina NextSeq 500, Illumina HiSeq 4000, Illumina NovaSeq 6000. Three healthy samples and three NASH samples in the dataset were included in this study.

Quality control of single-cell data
First, we installed R (version 4.1) and Seurat (version 4.0.5) R packages [9] to create Seurat objects for single-cell datasets. It is well-established that the proportion of mitochondrial genes to all genetic material may indicate whether the cell is in a steady state. Accordingly, when the proportion of mitochondrial genes in a cell is higher than all genes, it may be in a state of stress. As a result, we filtered cells with a mitochondrial gene content of more than 25%. Considering the potential presence of diploidy, we filtered cells with a UMI count of less than 200 or more than 8500, yielding 30,038 cells.
After data standardisation, highly variable genes were identified in a single cell using a log-normalization method, and the relationship between average expression and dispersion was assessed. Then, principal component analysis (PCA) [10] was conducted, taking the variable genes as the input to identify significant principal components based on the jackStraw function. A total of 7 principal components were selected as statistically significant inputs of t-distribution random neighbourhood embedding (umap). We examined the distribution of umap and PCA among these samples. We compared the average gene expression among samples and found a significant correlation between them. The cells were clustered by the FindClusters function and divided into 16 clusters.

Reconstruction and analysis of the single-cell trajectory
Cell differentiation was inferred using the Monocle (version 2.22.0) R package and the default parameters recommended by the developer. The integrated gene expression matrix from each cell type was first derived from Seurat to Monocle to construct the cell dataset. The variable genes were defined by the dispersion Table  function, and the Filter function was used to sort the cells. Finally, the DDR Tree method was used to reduce the dimension, and the order Cells function was used to estimate the arrangement of cells along the trajectory. Based on the clustering characteristics and marker gene analysis, the trajectory map of the differentiation time of immune cells in the single-cell data set was obtained. For the analysis of each trajectory, we adopted a standard scheme with default parameters.

The score of immune genes in each cell population was calculated by AUCell
Immune-related genes (IRG) were obtained from Innate DB (https:// www.innatedb.com/index.jsp) [11], and the enrichment of IRG in cells was scored with AUCell (version 1.16.0) R package. The AUC score of each cell was based on gene set enrichment analysis. According to the AUC value of each cell for IRG, each cell's ranking of gene expression was generated to estimate the proportion of highly expressed gene sets in each cell. The cells that expressed more genes had higher AUC values. The 'AUCell explore Thresholds' function was used to determine the threshold for identifying the active cells of the gene set. Then, the AUC score of each cell was mapped to UMAP embedding using the ggplot2 (version 3.3.5) R package to visualise the active cluster [12].

Single-cell data enrichment analysis
Gene ontology (Gene Ontology, GO) enrichment analysis is commonly used to study the functional enrichment of genes in terms of biological process (BP), molecular function (MF) and cellular component (CC). Kyoto Encyclopaedia of Genes and Genomes (KEGG) is a widely used data library for storing information about genomes, biological pathways, diseases and drugs. The differentially expressed genes between NASH and healthy groups were identified by the Wilcoxon rank sum test in DESeq2 (version 1.34.0) R package using the criteria log2FC > 1 and p < 0.05. Using the clusterProfiler (version 4.2.0) R package [13], the differential genes identified between NASH and healthy groups underwent GO annotation and KEGG pathway analysis. The enrichment results were visualised in the form of a bar chart and bubble plot, and the significance threshold of enrichment analysis was set to an adjusted p-value < 0.05.

Cell communication analysis
The interaction intensity of intercellular communication was calculated by using CellChat (http://www.cellchat.org/) R package [14]), combining single-cell expression profiles with known ligands, receptors and their cofactors. Furthermore, the ligand-receptor pairs with significant interaction were identified by ligand-receptor interaction probability and disturbance test. Then the cell-cell communication network is integrated by adding the number or intensity of ligand-receptor pairs with significant interactions between cell types. In addition, we visualised the intercellular communication mediated by multiple pairs of ligand-receptor or signal pathways through bubble plots to explore the characteristics of ligand-receptor interaction intensity or ligand-receptor gene expression level between cell types, including similarities or differences. Finally, the cell-cell communication network was systematically analysed, including visualisation of the centrality score of the network and determining the pathways that contribute most to the output or input signals of some cell groups. Then, the determined pathways were verified by network centrality analysis.

Screening of key genes from Bulk RNA data
To study the expression characteristics of differential genes found in Kupffer cells between NASH and healthy groups at the tissue level of non-alcoholic fatty liver, we analysed the Bulk RNA sequencing dataset GSE184019. We screened the differential genes between NASH and healthy groups in the single cell data set Kupffer cell from the GSE184019 dataset and analysed the differences using DESeq2 (version 1.34.0) R package to extract the differential genes (p-value < 0.05& | log2FoldChange | > 0.5). The ClusterProfiler (version 4.2.0) R package was used to annotate the GO function of differential genes to identify the biological process of significant enrichment. The enrichment results were visualised in a bar plot, and the significance threshold of enrichment analysis was based on an adjusted p-value < 0.05. The enrichment analysis results were visualised in a bar plot.
The interaction between marker genes was analysed by STRING online database (https://string-db.org/), and then the PPI network was constructed by Cytoscape (version 3.9.0) software to map the functional interactions of proteins expressed by genes, including direct physical interactions and indirect functional correlations. Then the cytoHubba plug-in was used to assign values to each gene through the topological network algorithm, and the key genes and subnetworks were found by sequencing.

Estimation of the fraction of immune cell types
CIBERSORT (https://cibersortx.stanford.edu/) [15] analysis tool) is a bioinformatics algorithm for accurately calculating immune cell infiltration. The hypothetical immune cell abundance was based on a reference set of 22 immune cells, and the number of permutations was set to 1000. A heatmap was generated to visualise the composition of infiltrating immune cells in each sample. The CorrplotR package was used to analyse and visualise the correlation of 22 kinds of infiltrating immune cells. The ggplot2R package was used to generate a box plot to visualise the difference in immune cell infiltration.

qRT-PCR analysis
We used human hepatoma cells (HepG2) for the in vitro experiments. The human hepatoma cells were cultured and treated with a free fatty acid mixture to establish an in vitro steatosis model, and the steatosis of cells was validated by oil red O staining. The hepatocyte steatosis model was induced by drug treatment of the human hepatoma cells, followed by cell collection, RNA extraction, nucleic acid concentration determination, reverse transcription, RNA electrophoresis, and qPCR. Cells from the hepatocyte steatosis model group were used to validate the expression of hub genes by qRT-PCR. Briefly, the total RNA of the hepatocyte steatosis sample was collected using TRIzol reagent. The collected RNA purity was determined by the Quantus fluorometer. For this purpose, total RNA was reverse transcribed. The amplified cDNA samples were mixed with a one-step SYBR PrimeScript PLUS RTPCR kit. Finally, the reaction was performed on AriaMx HRM. The positive control for this reaction was GAPDH, and each sample was calculated using the comparative Ct method (Table 1).

scRNA-seq data was used for cell type annotation and functional enrichment analysis
Using the UMAP (Uniform Manifold Approximation and Projection) algorithm, mouse non-alcoholic fatty liver cells were successfully classified into 16 independent clusters ( Figure 1B). We used marker genes for each cell type to identify 16 clusters ( Figure 1A). Clusters 0, 4 and 16 were annotated as endothelial cells, accounting for 32.25% (9690 cells) of the total number of cells. Clusters 1, 2 and 15 were annotated as T cells, with a relative proportion of 22.43%. Kupffer cells corresponded to Clusters 3 and 7, with a relative proportion of 14.7%. Clusters 5 and 8 were annotated as B cells (3517, 11.70%). Cluster 6 was annotated as macrophages (2176, 7.24%). Clusters 9 and 12 were annotated as dendritic cells (1533, 5.10%), while clusters 10, 11, 13, and 14 corresponded to bile duct cells (704, 2.34%), hepatocytes (624, 2.07%), plasma cells (333, 1.10%) and hepatic stellate cells (305, 1.01%), respectively. The distribution of cells in NASH and control groups is shown in Figure  1(C). It was found that cells such as macrophages, DCs, Kupffer cells, and cholangiocytes were significantly present in NASH. We used the FindAllMarkers function to identify the differentially expressed genes between each type of cell and generated a dot plot of the first two genes with the most significant differentially expressed genes ( Figure 1D). We use the differentially expressed genes among these cell types to generate a heat map, reflecting the expression of differential genes. Then these differential genes were used for functional pathway enrichment analysis based on the GO database ( Figure 1E). It could be seen that the differential genes of each cell type were enriched in the pathway that matched their cell function.

Reconstruction and analysis of the single-cell trajectory
We extracted immune cell subsets (T cell, B cell, DC, Kupffer cell, Macrophage, Plasma B cell) from the single-cell dataset and analysed them with the Monocle R package to construct the trajectory of immune cell subsets in pseudo-time. The cell transfer trajectory showed one root and two branches in pseudo-time, and the pseudo-timing diagram was coloured based on three aspects: cell type (Figure 2A), cell cluster ( Figure  2B) and pseudo-time process ( Figure 2C). We observed that most cells at the beginning of pseudo-time (0) were clustered on two branches for the Kupffer cell, T cell and B cell, respectively. However, Macrophages, DCs, and Plasma B cells showed high invasive potential. Since the reprogramming trajectory was divided into two branches in the later stage, we try to clarify the molecular dynamics that distinguish the two branches ( Figure 2D). Three major gene clusters were identified to explain these differences. GO analysis of three gene clusters showed that cluster 1 was significantly enriched in immune cell activation, such as monocyte differentiation, regulation of T cell activation and lymphocyte differentiation, and cluster 2 was significantly enriched in positive regulation of cell adhesion, cytokine-mediated signal pathway, leukocyte migration and other functions related to cell migration. Cluster 3 was significantly enriched in immune cell activation, such as B cell activation, monocyte differentiation and regulation of B cell activation (see Table 1).

Immune gene scoring
We use the AUCell R package to calculate the enrichment of the immune-related genes of each cell. The AUCell R package was used to determine the IRG activity of each cell line ( Figure 3A). Cells expressing more differential genes showed higher AUC values, mainly present in Kupffer cells (yellow) with AUC values above 0.12 ( Figure 3B). We then analysed the differences between the NASH and healthy groups, obtained the differential genes which subsequently underwent GO/KEGG analysis ( Figure 3C and 3D). Significantly enriched GO terms were related to endocytosis and autophagy, and enriched KEGG pathways were associated with the metabolic pathway of non-alcoholic fatty liver disease, indicating that Kupffer cells played an important role in the pathogenesis of non-alcoholic fatty liver.

Cellular communication
A total of 33 signalling pathways, including MIF, SPP1, GALECTIN, CCL, and GAS, were detected in 10 cell types annotated by CellChat. We generated a graph of the total interaction intensity between cell types ( Figure 4A) and the intensity of outgoing signals from cell types ( Figure 4B, left) and incoming signals to cell types ( Figure 4B, right) for each pathway. The two cell types, Cholangiocyte and Kupffer cell exhibited the highest interaction strength compared with other cell types when they acted as signal sources and signal receivers. We then counted all the important receptor-ligand pairs when the Cholangiocyte ( Figure 4C) and the Kupffer cell ( Figure 4D) received the signal, using these two cells as signal transmitters and transmitters, respectively. They were identified as the sender/receiver important signalling pathway (SPP1/MIF) verified by network centrality analysis ( Figure 4E and 4F). We found that cholangiocytes and Kupffer cells acted as the main sender and receiver to coordinate signal transmission between cell types.

Screening of key genes from Bulk RNA data
To study the expression characteristics of differential genes found in Kupffer cells between NASH and healthy groups at the tissue level of non-alcoholic fatty liver, we analysed the Bulk RNA sequencing data set GSE184019. In the Bulk RNA dataset, the differentially expressed genes between NASH and healthy groups in Kupffer cells were analysed by DESeq2. The main differentially expressed genes were Cd9, CS, Dlat, Map4k4, Midn, C2, Lipa and so on ( Figure 5A). Then we used these differential genes for functional enrichment analysis, which revealed significant enrichment in cell catabolism, regulation of cell adhesion, bone marrow cell homeostasis and bone marrow cell differentiation ( Figure 5B). We used STRING database to analyse the interaction between different genes, and used Cytoscape software to analyse the results. Then, the cytoHubba plug-in was used to extract the top 10 key genes with the greatest interaction between different genes ( Figure 5D and 5F) and finally selected four hub genes (Hspa1b, Hmox1, Cd86, Tfrc).

Estimation of the fraction of immune cell types
We use the CIBERSORT algorithm to calculate the type and distribution of each immune cell in the Bulk RNA data, remove the immune cells whose abundance is 0 in more than half of the samples, and 13 types of immune cells: B cells naive, Dendritic cells resting, T cells CD4 memory resting, NK cells resting, Monocytes, Neutrophils, Plasma cells, T cells CD8, NK cells activated, Macrophages M1, Mast cells resting, T cells follicular helper, Macrophages M2. We performed correlation analysis on 13 immune cells ( Figure 6A). We also counted the proportion of immune cell types in the transcriptional group samples ( Figure 6B), from which we can see the proportion of immune cells in each sample. Box plot comparison showed a significant difference in M2 macrophage distribution between healthy and non-alcoholic fatty liver samples ( Figure 6C).

The four key genes screened underwent qRT-PCR analysis
Through single-cell sequencing bioinformatics analysis, we screened out four key genes related to metabolic-related fatty liver disease and then performed cell experiments on the four key genes: Hspa1b, Tfrc, Hmox1, Map4k4, and used QPCR technology to analyse the differential expression of the four key genes. The screened biomarkers of the hepatocyte steatosis model group (Hspa1b, Tfrc, Hmox1, Map4k4) were verified using qRT-PCR. The results indicated significant upregulation in the expression of Hspa1b, Tfrc and Map4k4 in the hepatocyte steatosis model group compared to the control group, while the expression of Hmox1 was not significantly different in the hepatocyte steatosis model group compared to the control group (Figure 7).

Discussion
Metabolic-associated fatty liver disease remains a prominent risk factor for many chronic diseases, including obesity, diabetes, cardiovascular disease, and cancer [16]. Moreover, MAFLD can progress to steatohepatitis or cirrhosis. However, there is no definite target and therapeutic mechanism for MAFLD. Single-cell sequencing has guiding significance for screening potential therapeutic targets and molecular mechanisms of MAFLD. In the present study, we conducted transcriptome profiling of 30,038 single cells, including hepatocytes and non-hepatocytes, from normal and steatosis adult mouse livers. Comparative analysis of hepatocytes and non-hepatocytes revealed significant heterogeneity, and non-hepatocytes acted as major cell communication hubs. Systematic analysis of cellular compositions and cell-cell interaction networks showed that hepatocyte metabolism was significantly correlated with changes in liver function. We also uncovered the active involvement of non-hepatocyte cells in regulating the behaviour of hepatocytes, exemplified by Kupffer cells, which could preserve liver function after steatosis.
In this study, bioinformatics analysis of single-cell and transcriptome datasets was performed, and four key genes were screened, including Hspa1b, Hmox1, Cd86 and Tfrc. It is widely acknowledged that Hspa1b encodes a 70 kDa heat shock protein, a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilises existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The Hspa1b gene is Figure 5. Combination with single-cell data to find the differential genes (a) between non-alcoholic fatty liver and healthy samples in Bulk rna data. DeSeqs difference analysis was carried out on the Bulk rna data set, and the volcano map was drawn. (B) The bar chart shows the enrichment results of the functional pathway of go. (C) Heat map expression showed the expression of different genes in each sample (P10802T127, P10802T128, P10802T207, and P10802Y208 as experimental samples, P10802T117, P10802T118, P10802Y119, P10802T120, and P10802Y202 as control samples). The gene interaction intensity map was calculated by Betweenness (D), BottelBeck (e) and MnC (f) calculation methods in the cytoHubba plug-in. located in the major histocompatibility complex class III region in a cluster with two closely related genes encoding similar proteins. Y Guan et al. found that the upregulation of HSPA1A/HSPA1B/ HSPA7 and the downregulation of HSPA9 were related to poor survival in colon cancer [17]. It is well-established that haem oxygenase, an essential enzyme in haem catabolism, cleaves haem to form biliverdin, which is converted to bilirubin by biliverdin reductase and carbon monoxide, a putative neurotransmitter. Haem oxygenase activity is induced by its substrate haem and various nonheme substances. Haem oxygenase occurs as 2 isozymes, an inducible haem oxygenase-1 and a constitutive haem oxygenase-2. HMOX1 and HMOX2 belong to the haem oxygenase family. A study by Z. Meng et al. showed that diabetes significantly increased HMOX1 levels and ferroptosis, promoting atherosclerosis [18]. Moreover, diseases associated with CD86 (a protein-coding gene) include acute myocarditis and plague [19], related to pathways, including the ICos-ICosL Pathway in T-helper cells and RET signalling. GO terms related to this gene included signalling receptor binding and coreceptor activity. An important paralog of this gene is CD80. A study by J Zhao et al. found that injection of CD86 macrophages instead of liver partition could accelerate liver regeneration after portal vein ligation in rats [20]. Tfrc encodes a cell surface receptor necessary for cellular iron uptake by the process of receptor-mediated endocytosis. This receptor is required for erythropoiesis and neurologic development. Multiple alternatively spliced variants have been identified. Research has shown that exosomes derived from hepatitis B virus-infected hepatocytes promote liver fibrosis via the miR-222/TFRC axis [21]. However, in existing studies, these four key genes have not been associated with MAFLD. Although liver metabolism-related genes and pathways play an important role in the pathogenesis of metabolization-related fatty liver disease, few liver metabolism-related genes were identified in this study. Indeed, hepatic astrocytes and cholangiocytes are crucial in metabolic-associated fatty liver disease pathogenesis, but these were not observed in this study, which may be related to differences among studies. Current evidence suggests that bile acid homeostasis dysregulation is an important mechanism for the development of fatty liver, and the homeostasis of bile acids is important regulated by the farnesol X receptor (FXR) activated by bile acids, which can protect the liver from MAFLD by regulating bile acid homeostasis [22]. There is a growing consensus that hypericin may improve MAFLD by regulating cholesterol metabolism, basal metabolism, and excretion [23]. Similarly, it has been reported that MyD88 could induce osteopontin (OPN) secretion in hepatic stellate cells (HSCs) and cholangiocytes, which consequently resulted in the activation of AKT signalling pathway and accumulation of fat in hepatocytes [24].
In the constructed molecular interaction network, the genes closely related to four key protein-coding genes were: hspa1a, cd63, cd9, and ago2. HSPA1A (Heat Shock Protein Family A (Hsp70) Member 1A) is a gene associated with transient cerebral ischaemia and carotid artery occlusion [25]. Among its related pathways are the HIV Life Cycle and the Proteolysis Role of Parkin in the Ubiquitin-Proteasomal Pathway [26]. Significantly enriched GO terms related to this gene include RNA binding and ubiquitin protein ligase binding. An important paralog of this gene is HSPA1B. CD63 is also a protein-coding gene associated with diseases, including Hermansky-Pudlak Syndrome and Storage Pool Platelet Disease [27]. Among its related pathways are the innate immune system and response to elevated platelet cytosolic Ca2 + [28]. An important paralog of this gene is TSPAN3. CD9 is associated with Diphtheria and Human Immunodeficiency Virus [29]. Among its related pathways are a6b1 and a6b4 Integrin signalling and HIV Life Cycle [30], while enriched GO terms include integrin binding. An important paralog of this gene is TSPAN2. Moreover, AGO2 (Argonaute RISC Catalytic Component 2) is associated with AGO2 include Lessel-Kreienkamp Syndrome and Non-Specific Syndromic Intellectual Disability [31], related pathways include Pre-NOTCH Expression and Processing and Mitotic Prophase [32], GO annotations include nucleic acid binding and RNA binding and an important paralog is AGO1. These related genes have hitherto not been reported to be associated with MAFLD, which is inconsistent with the four key genes found in this single-cell sequencing bioinformatics analysis, indicating the innovative nature of our research.
Next, we used the DEGs between different cell types to generate a heat map, reflecting the expression of differential genes. Subsequently, these DEGs underwent functional pathway enrichment analysis based on the GO database ( Figure 1E). It was found that the differential genes of each cell type were enriched in pathways that matched their cellular functions: regulation of actin filament-based process, regulation of actin cytoskeleton organisation, and fatty acid metabolic process. At the same time, the GO function of the four key genes included positive regulation of the cellular catabolic process, regulation of cell-cell adhesion, regulation of peptidase activity, myeloid cell differentiation, and myeloid cell homeostasis ( Figure 5B). An increasing body of evidence suggests that regulation of actin filament-based process is associated with right ventricular developmental trajectory and alcoholic cardiomyopathy [33,34]. In addition, it is associated with kidney ageing and related renal disease [35]. Regulation of actin cytoskeleton organisation is associated with prostate cancer invasion and metastasis [36]. Cells precisely control the formation of dynamic actin cytoskeleton networks to coordinate fundamental processes, including motility, division, endocytosis and polarisation. To support these functions, actin filament networks must be assembled, maintained and disassembled at the correct time and place and with proper filament organisation and dynamics. Regulation of the extent of filament network assembly and filament network organisation has been largely attributed to the coordinated activation of actin assembly factors through signalling cascades [37]. Fatty acid metabolic processing is one of the most critical metabolic pathways in the human body associated with glioma and acute kidney injury [38,39]. These studies are consistent with the conclusions of this study, validating our data mining results.
However, this study has the following limitations. First, although the key genes screened out have been verified in combination with wet experiments, no in-depth functional mechanism research or corresponding clinical relevance studies have been carried out. Besides, this study did not use any available gene set analysis of scRNA-seq in patients with NASH, which can make gene set data analysis less clinically meaningful. Indeed, this study has not demonstrated the expression of these key genes in human hepatocytes. The differential expression of these key genes in clinical samples of fatty liver is expected to be confirmed in later studies. Moreover, liver metabolism-related genes and pathways play an important role in metabolic-associated fatty liver disease pathogenesis, but almost no liver metabolism-related genes have been screened in this study. Interestingly, hepatic astrocytes and cholangiocytes have been established to be crucial in metabolic-associated fatty liver disease pathogenesis, but these have not been found in this study and may be related to individual differences and sample bias. In addition, in the in silico analysis part of this article, we screened out the candidate genes through the liver KC cell dataset, but in the cell experiment part, our team selected hepatocytes to verify the differential expression of the candidate genes. Although previous studies have shown that it is possible to verify the differential expression of candidate genes in different cell types in the same organ [40], it is undoubtedly more meaningful to verify the differential expression of candidate genes with liver KC cells, which is also a shortcoming of our research. Last but not least, we evaluated the expression of these genes in a model of drug-induced steatosis in a tumoral cell. It is widely thought that palmitate or oleic-acid-induced steatosis are better in vitro models of steatosis. However, there is evidence from previous literature that the cell model of steatosis used in this study is internationally accepted and reliable [41]. In our future cell experiments on fatty liver disease, palmitic acid or oleic acid will be used to induce steatosis, given that these in vitro steatosis models are more acceptable.
Overall, we conducted a single-cell bioinformatics analysis and comprehensively explored the pathogenesis and potential molecular targets of fatty liver. Importantly, four key genes closely related to fatty liver were identified and verified by qPCR. Nonetheless, the specific pathogenesis and molecular targets remain largely unclear, warranting further research.