Single-cell RNA sequencing identifies the specific cell types of GWAS genes associated with liver disease


Background The liver is an important digestive organ with a variety of physiological functions, and liver dysfunction may indicate liver disease. Genome-wide association studies (GWAS) have identified many genetic variants associated with liver disease. However, it is not clear whether the genes harboring these variants show cell-type specificity. Methods To investigate the association between liver cell types and liver disease, we integrated the genes associated with liver diseases identified by GWAS with single-cell RNA sequencing (scRNA-seq) data. We used scRNA-seq data from the GEO database, which after reclassification comprised 20 cell types from human liver and 9 cell types from mouse liver. The susceptibility genes for liver diseases were downloaded from the GWAS catalog and matched to the liver scRNA-seq results. Results Most susceptibility genes for chronic hepatitis B virus (HBV) infection were expressed in human B cells. The susceptibility genes for biomarkers of liver dysfunction showed similar cell-type specificity in human and mouse. Finally, primary liver disease phenotypes may be due to mutations in multiple cell types. Conclusions Collectively, this study localized the susceptibility genes of liver diseases to specific cell types and provides clues for the in-depth study of liver diseases at the transcriptome level.


Background
The liver is the largest internal organ in humans and comprises 5% of body mass in mammals [1]. As an important organ of the digestive system, the liver has a variety of physiological functions, such as secreting essential clotting factors and regulating glucose, protein, and lipid metabolism. Its ability to perform so many functions depends on the cooperation of the various cell types in the liver, and disease occurs when liver cells become dysfunctional. In 2018, liver cancer was the sixth most commonly diagnosed cancer and the fourth leading cause of cancer death worldwide, with about 841,000 new cases and 782,000 deaths annually [2]. The incidence of liver cancer in men is two to four times that in women [3]. Some chronic liver diseases can also cause irreversible changes in the body. Primary biliary cirrhosis is a slowly progressive autoimmune disease of the liver that results in liver failure [4]. With obesity becoming more common, nonalcoholic fatty liver disease is now receiving greater attention because of its potential to progress to cirrhosis and liver failure [5,6]. However, it is unclear whether the occurrence and development of these diseases show cell-type specificity. In addition, when viruses such as hepatitis B virus (HBV) or hepatitis C virus (HCV) infect the liver, do specific liver cell types respond?
With the prominence of the Human Cell Atlas [7] and the development of scRNA-seq technology, classifying the cell types of organs by their transcriptomes has been accepted by more and more scientists. Han X et al. [8] mapped the Mouse Cell Atlas by Microwell-seq, analysing >400,000 single cells covering all major mouse organs, including 4,685 mouse liver cells. Sonya A. MacParland et al. [9] performed single-cell RNA sequencing of the human liver and classified its cell types in detail. These studies provide a valuable reference for studying the cell types of the liver. However, the association between liver cell types and liver disease remains unclear.
Although GWAS have discovered thousands of genetic variants that influence the risk of complex human traits and diseases during the past decade, the interpretation and functional understanding of these variants lag far behind [10][11][12]. Some scholars suggest that studies of genetic-epigenetic interactions have become a major focus in the post-GWAS era [13]. Traditionally, the genes closest to a single nucleotide polymorphism (SNP) have been regarded as the most likely susceptibility genes [14]. We therefore integrated the genes associated with liver diseases identified by GWAS with scRNA-seq data, looking for correlations between diseases and cell types. The GWAS genes cover primary liver disease, chronic hepatitis virus infection, and biomarkers of liver dysfunction. To compare the similarity across species, we used both human and mouse single-cell sequencing data. These results provide information for understanding how different cell types are associated with liver diseases.

scRNA-seq data of human and mouse liver
Human liver scRNA-seq data were acquired from GEO Database GSE115469 [9], and mouse liver scRNA-seq data were acquired from GEO Database GSE108097 [8].
Susceptibility genes for liver disease identified through GWAS
All significant SNPs for liver disease were downloaded from the GWAS catalog [15] (downloaded 18 March 2019). First, we grouped the GWAS of liver diseases into HBV infection, HCV infection, albumin measurement, liver enzyme measurement (including aspartate aminotransferase measurement, alanine aminotransferase measurement, alkaline phosphatase measurement and γ-glutamyl transferase measurement), nonalcoholic fatty liver disease, hepatocellular carcinoma and primary biliary cirrhosis.
Using the above as keywords, we searched the GWAS catalog and downloaded the SNP lists. We filtered out SNPs whose P value was greater than 5 × 10⁻⁸. The remaining SNPs were mapped to candidate genes, which we retained for subsequent matching to the cell types defined by scRNA-seq (Tables S1-S10).
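The filtering step above can be sketched in a few lines. The following is a minimal illustration in Python (the study itself worked in R), assuming each catalog row carries an SNP identifier, a P value and a mapped-gene field; the record layout, the comma-separated gene field and the example rows are hypothetical, and only the 5 × 10⁻⁸ threshold comes from the text.

```python
from typing import NamedTuple

GENOME_WIDE_P = 5e-8  # significance threshold stated in the text


class Association(NamedTuple):
    snp: str          # SNP identifier (placeholder values below)
    p_value: float    # association P value
    mapped_gene: str  # gene(s) mapped to the SNP, comma-separated (assumed layout)


def candidate_genes(rows, threshold=GENOME_WIDE_P):
    """Drop SNPs with P > threshold and return the sorted set of mapped genes."""
    genes = set()
    for row in rows:
        if row.p_value > threshold:
            continue  # SNP is not genome-wide significant
        for gene in row.mapped_gene.split(","):
            gene = gene.strip()
            if gene:
                genes.add(gene)
    return sorted(genes)


# Hypothetical rows, for illustration only.
example = [
    Association("rs_demo_1", 3e-12, "GENE_A"),
    Association("rs_demo_2", 2e-6, "GENE_B"),          # removed: P > 5e-8
    Association("rs_demo_3", 4e-9, "GENE_A, GENE_C"),  # one SNP, two genes
]
print(candidate_genes(example))
```

Returning a set of genes rather than a list of SNPs reflects the fact that several SNPs can map to the same gene, as with the 17 HBV-associated SNPs that map to 14 genes.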
Susceptibility genes converted to mouse homologous genes
Because the susceptibility genes were identified by GWAS in humans, they had to be converted to their mouse homologs before they could be matched to the mouse scRNA-seq results. biomaRt is an R package that provides a flexible, reproducible basis for bioinformatic data integration and gene mapping across species [16]. We used biomaRt (version 2.36.1) to convert the candidate genes identified by GWAS into mouse homologous genes.
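The logic of the conversion can be illustrated without biomaRt. The sketch below, in Python, assumes an ortholog table has already been exported as a lookup from human symbol to mouse symbols; the function name, the table and the placeholder gene GENE_X are hypothetical, and the code does not reproduce the biomaRt interface.

```python
def to_mouse_orthologs(human_genes, ortholog_table):
    """Split human gene symbols into those with and without a mouse ortholog.

    ortholog_table maps a human symbol to a list of mouse symbols
    (a gene may have zero, one or several orthologs).
    """
    mapping = {}
    unmapped = []
    for gene in human_genes:
        mouse = ortholog_table.get(gene, [])
        if mouse:
            mapping[gene] = list(mouse)
        else:
            unmapped.append(gene)  # keep track of genes lost in conversion
    return mapping, unmapped


# Toy lookup table; GENE_X is a placeholder with no ortholog entry.
toy_table = {"GCKR": ["Gckr"], "PNPLA3": ["Pnpla3"]}
mapping, unmapped = to_mouse_orthologs(["GCKR", "PNPLA3", "GENE_X"], toy_table)
print(mapping)
print(unmapped)
```

Reporting the unmapped genes separately matters for the cross-species comparison, because a gene missing from the mouse results may simply lack an annotated ortholog rather than lack expression.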
Using Seurat for liver scRNA-seq data analysis
We used R (version 3.5.1, https://www.r-project.org/) and the Seurat R package [17] (version 2.3.4, https://satijalab.org/seurat/). The Seurat tutorial and code can be found at https://satijalab.org/seurat/v2.4/pbmc3k_tutorial.html. Briefly, we processed the human and mouse liver scRNA-seq data downloaded previously. For human liver data, based on the median number of genes and the percentage of mitochondrial genes, cells with <200 or >3,000 detected genes (the latter being potential cell doublets) or with a mitochondrial gene percentage of >10% were removed. For mouse liver data, cells with <200 or >1,500 detected genes or with a mitochondrial gene percentage of >10% were removed. After this step, we obtained 8,227 human liver cells and 4,628 mouse liver cells.
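The quality-control thresholds above amount to a simple per-cell predicate. The following Python sketch illustrates it under the assumption that the number of detected genes and the mitochondrial percentage have already been computed for each cell; the CellQC record and the example cells are hypothetical, and the study applied the filter within Seurat.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str     # cell identifier (placeholder values below)
    n_genes: int     # number of genes detected in the cell
    pct_mito: float  # percentage of counts from mitochondrial genes


def passes_qc(cell, min_genes=200, max_genes=3000, max_pct_mito=10.0):
    """Keep cells with min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito."""
    return (min_genes <= cell.n_genes <= max_genes
            and cell.pct_mito <= max_pct_mito)


# Hypothetical cells, for illustration only.
cells = [
    CellQC("cell_1", 150, 2.0),    # too few genes
    CellQC("cell_2", 1200, 4.5),   # passes both settings
    CellQC("cell_3", 3400, 3.0),   # too many genes (possible doublet)
    CellQC("cell_4", 900, 12.0),   # mitochondrial percentage too high
    CellQC("cell_5", 2000, 5.0),   # passes human, fails mouse upper bound
]

human_kept = [c.barcode for c in cells if passes_qc(c)]                  # upper bound 3,000
mouse_kept = [c.barcode for c in cells if passes_qc(c, max_genes=1500)]  # upper bound 1,500
print(human_kept, mouse_kept)
```

Only the upper gene-count bound differs between the two species, so a single function with a configurable max_genes covers both data sets.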
After data normalisation, highly variable genes were identified after controlling for the relationship between average expression and dispersion. Genes with average expression below the low cut-off of 0.0125 or above the high cut-off of 3 were excluded, and a z-score cut-off of 0.5 was applied, which identified 2,996 highly variable genes in human and 9,147 highly variable genes in mouse. We performed principal component analysis (PCA) with the variable genes as input and identified significant principal components (PCs) with the jackStraw function. For human liver, 20 statistically significant PCs were selected as input for t-distributed stochastic neighbor embedding (t-SNE); for mouse liver, 10 PCs were selected. This step is explained in detail in the Seurat tutorial (https://satijalab.org/seurat/v2.4/pbmc3k_tutorial.html). Briefly, the "JackStrawPlot" function in Seurat helps to determine statistically significant PCs. We calculated p-values for PCs 1-30 with this function and then selected the PCs that showed a strong enrichment of genes with low p-values.
All cells were clustered with the FindClusters function and classified into 20 clusters for human and 9 clusters for mouse. The average expression of each gene in each cluster was then calculated with the AverageExpression function (Tables S11, S12). To facilitate analysis, the average expression values were converted to z-scores (Tables S13, S14).
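The two summary steps, per-cluster averaging followed by z-score conversion, can be sketched for a single gene as follows. This Python illustration makes two assumptions that the text does not specify: the z-score is computed for each gene across clusters, and the sample standard deviation is used. It takes a plain arithmetic mean and does not reproduce the internals of the AverageExpression function; the cell names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def average_by_cluster(expr, labels):
    """expr maps cell -> expression of one gene; labels maps cell -> cluster."""
    grouped = defaultdict(list)
    for cell, value in expr.items():
        grouped[labels[cell]].append(value)
    return {cluster: mean(values) for cluster, values in grouped.items()}


def zscore_across_clusters(avg):
    """Standardise one gene's cluster averages (assumed: across clusters, sample SD)."""
    values = list(avg.values())
    if len(values) < 2:
        return {cluster: 0.0 for cluster in avg}
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return {cluster: 0.0 for cluster in avg}  # gene is flat across clusters
    return {cluster: (value - mu) / sd for cluster, value in avg.items()}


# Hypothetical expression of one gene in six cells from three clusters.
expr = {"c1": 2.0, "c2": 4.0, "c3": 0.0, "c4": 0.0, "c5": 1.0, "c6": 1.0}
labels = {"c1": "Hepatocytes", "c2": "Hepatocytes",
          "c3": "B cells", "c4": "B cells",
          "c5": "T cells", "c6": "T cells"}

avg = average_by_cluster(expr, labels)
z = zscore_across_clusters(avg)
print(avg)
print(z)
```

Standardising each gene separately puts genes with very different absolute expression on a common scale, so a single cut-off can be applied to all of them.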
Integration of specific cell types and susceptibility genes associated with liver disease
First, we classified liver diseases into 10 different traits (chronic hepatitis B infection, chronic hepatitis C infection, albumin measurement, aspartate aminotransferase measurement, alanine aminotransferase measurement, alkaline phosphatase measurement, γ-glutamyl transferase measurement, nonalcoholic fatty liver disease, hepatocellular carcinoma and primary biliary cirrhosis). Their susceptibility genes were identified in the previous steps (Tables S1-S10). Subsequently, for each trait, we examined the expression of these susceptibility genes in the different cell types of human and mouse liver.
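One way to picture the integration step is as a lookup of each trait's susceptibility genes in the gene-by-cell-type z-score table, followed by a tally of the cell types in which they stand out. The Python sketch below is illustrative only: the z-score > 1.5 cut-off is the one quoted in the figure legends, while the function names, the tallying step and the toy values are assumptions rather than the procedure used in the study.

```python
from collections import Counter

Z_CUTOFF = 1.5  # cut-off quoted in the figure legends


def cell_types_per_gene(genes, zscores, cutoff=Z_CUTOFF):
    """For each trait gene found in the table, list cell types with z-score > cutoff."""
    hits = {}
    for gene in genes:
        row = zscores.get(gene)
        if row is None:
            continue  # gene absent from the scRNA-seq table
        hits[gene] = [cell_type for cell_type, z in row.items() if z > cutoff]
    return hits


def tally_cell_types(hits):
    """Count how many trait genes are highly expressed in each cell type."""
    return Counter(ct for cell_types in hits.values() for ct in cell_types)


# Hypothetical z-score table and trait gene list.
zscores = {
    "GENE_A": {"Hepatocytes": 2.1, "B cells": -0.4, "T cells": 0.2},
    "GENE_B": {"Hepatocytes": -0.3, "B cells": 1.9, "T cells": 1.6},
    "GENE_C": {"Hepatocytes": 0.1, "B cells": 1.7, "T cells": -0.5},
}
trait_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # GENE_D is not in the table

hits = cell_types_per_gene(trait_genes, zscores)
tally = tally_cell_types(hits)
print(hits)
print(tally)
```

A cell type that accumulates a large share of a trait's genes in such a tally is a candidate for cell-type specificity, which is the pattern reported for HBV susceptibility genes in human B cells.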

Results
scRNA-seq profiling and unbiased clustering of human and mouse liver cells
Building on previous studies [8,9], we used their scRNA-seq data to reclassify liver cells and calculated the average expression of each gene in each cell type with Seurat [17] (https://satijalab.org/seurat/) (Methods). According to the marker genes of each cell type provided by a previous study [9] (Fig. 1c), human liver cells were classified into clusters 1-20 (Fig. 1a). In contrast to the previous study, CD3+ T cells were divided into IL7R+ and IL7R- subpopulations. Given the transcriptome differences among hepatocytes, they were divided into five subtypes. According to the marker genes (Fig. 1d), mouse liver cells were classified into clusters 1-9, corresponding to Endothelial cells, Non-inflammatory Macrophage, Erythroid cells, Inflammatory Macrophage, Immune cells (ICs), T cells, Hepatocytes, B cells and Cholangiocytes.

Susceptibility genes of chronic HBV infection showed cell-type specificity in human
We focused on the susceptibility genes for chronic hepatitis virus infection (HBV and HCV) provided by the GWAS catalog [15] (downloaded 18 March 2019). SNPs with a P value greater than 5 × 10⁻⁸ were filtered out. In total, 17 SNPs closely related to chronic HBV infection were identified, and they mapped to 14 different genes (Table S1).
Interestingly, we found that 10 of the 14 genes were highly expressed in human liver B cells (Mature B cells and Plasma cells) (Fig. 2a). Some genes were also expressed in Non-inflammatory Macs and CD3+ IL7R- T cells, but less prominently than in B cells. In contrast, the genes associated with chronic HCV infection did not show obvious cell-type specificity (Fig. 2b). We therefore hypothesized that HBV infection plays a role in altering gene expression in B cells.
To investigate whether the susceptibility genes associated with chronic HBV infection also show cell-type specificity in mouse, we integrated them with the mouse scRNA-seq data. First, the genes were converted into mouse homologous genes (Methods). We found that the genes associated with chronic HBV infection were not highly expressed in any particular mouse cell type (Fig. 3a), and that the genes associated with chronic HCV infection were highly expressed in multiple cell types (Fig. 3b).

Susceptibility genes of biomarkers of liver dysfunction showed similar cell type specificity in human and mouse
We classified biomarkers of liver dysfunction into several traits: albumin measurement, aspartate aminotransferase measurement, alanine aminotransferase measurement, alkaline phosphatase measurement and γ-glutamyl transferase measurement. These indexes are biomarkers that reflect liver function. The associated SNPs were obtained from the GWAS catalog (downloaded 18 March 2019) and filtered as described above, yielding candidate susceptibility genes for each trait (Tables S3-S7). We found that the majority of susceptibility genes for albumin measurement and alanine aminotransferase measurement were highly expressed in hepatocytes (Fig. 4a, 4c). Most susceptibility genes for aspartate aminotransferase measurement were concentrated in hepatocytes and endothelial cells (zone 1 sinusoidal, zone 2/3 sinusoidal and portal) (Fig. 4b). Most susceptibility genes for alkaline phosphatase measurement and γ-glutamyl transferase measurement were highly expressed in hepatocytes, endothelial cells and cholangiocytes (Fig. 4d, 4e). Changes in these biomarkers may therefore involve multiple cell types.
Interestingly, we found similar results in mouse liver. Most susceptibility genes for albumin measurement and alanine aminotransferase measurement were highly expressed in hepatocytes and cholangiocytes (Fig. 5a, 5c). Most susceptibility genes for aspartate aminotransferase measurement and γ-glutamyl transferase measurement were concentrated in hepatocytes, cholangiocytes and endothelial cells (Fig. 5b, 5e). Notably, in addition to hepatocytes and cholangiocytes, the susceptibility genes for alkaline phosphatase measurement were expressed in some immune cells, such as B cells and Macs (Fig. 5d). These results indicate that single-cell transcriptome analysis highlights the specific cell types associated with specific forms of liver dysfunction, in both human and mouse.
Discrete primary liver disease phenotypes can be due to mutations in multiple cell types
We next explored the relationship between primary liver disease and liver cell types. Because many GWAS have been conducted on primary liver disease, we analyzed nonalcoholic fatty liver disease, hepatocellular carcinoma and primary biliary cirrhosis. We found that the susceptibility genes for nonalcoholic fatty liver disease identified by GWAS were expressed in various types of human and mouse liver cells (Fig. 6a, 7a). However, there was some heterogeneity between the human and mouse transcriptomes: GCKR was highly expressed in both human and mouse hepatocytes, whereas PNPLA3 was highly expressed only in human hepatocytes. In addition, the susceptibility genes for hepatocellular carcinoma were also expressed in various cell types in human and mouse (Fig. 6b, 7b).
Primary biliary cirrhosis is often considered a typical autoimmune disease [18], so we asked whether immune cells play an important role in it. We found that the susceptibility genes for primary biliary cirrhosis were expressed not only in human Macs, T cells and B cells, but also in hepatocytes and cholangiocytes (Fig. 6c). In mouse, these genes were expressed in B cells, T cells, ICs and cholangiocytes (Fig. 7c).
These results indicated that primary biliary cirrhosis could be associated with multiple cell interactions.

Discussion
Chronic hepatitis virus infection is a common infectious disease of the liver, mainly caused by HBV and HCV. Based on a study in 2016, HBV affects approximately 250 to 340 million people worldwide [19]. HBV is a DNA virus for which humans are the only known natural host [20]. It has been reported that, upon HBV infection, the human immune system uses different mechanisms for dealing with free extracellular virus and virus-infected cells [21]. In a previous study, patients with chronic HBV tended to have late, transient T-cell responses [22]. However, previous studies could not readily reveal the relationship between HBV and liver cell types at single-cell resolution. In our study, although some susceptibility genes for HBV identified by GWAS were expressed in human T cells, most were highly expressed in human B cells (Fig. 2a). B cells play an important role in humoral immunity, and their role in HBV infection may be of interest to many researchers. We therefore hypothesized that HBV infection may change the normal physiology of B cells and humoral immunity. However, this result did not hold in mouse liver, which may reflect species heterogeneity between human and mouse. Caution is therefore needed when using mouse models to study this disease.
Albumin is produced in circular polysomes on the rough endoplasmic reticulum by hepatocytes [23]. The common clinical albumin abnormality is hypoalbuminaemia, which is often caused by one or more factors, such as malnutrition or changes in the rates of synthesis and degradation of albumin [23]. In this study, we found that the majority of susceptibility genes for albumin measurement identified by GWAS were highly expressed in human hepatocytes, as were those for alanine aminotransferase measurement. Similar results were obtained in mouse, which reflects the homology between the human and mouse transcriptomes. This homology was also observed for aspartate aminotransferase measurement, alkaline phosphatase measurement and γ-glutamyl transferase measurement: the cell types in which the susceptibility genes were highly expressed were similar in human and mouse.
Nonalcoholic fatty liver disease is a common liver disease. Its prevalence in the general population is estimated at 25% [24]. At present, the risk of progression to severe fibrosis and cirrhosis is well recognized in patients with nonalcoholic steatohepatitis [25]. However, GWAS of nonalcoholic fatty liver disease have identified only a few SNPs with a P value less than 5 × 10⁻⁸ (Table S8). In this study, these SNPs mapped to only five different genes, which made it hard to estimate whether the susceptibility genes were specific to one or more cell types. On the basis of this limited result, genetic mutations in hepatocytes and T cells may be associated with nonalcoholic fatty liver disease (Fig. 6a).
Primary biliary cirrhosis is a chronic cholestatic disease that may have an autoimmune pathogenesis, characterized by inflammation and injury of the small ducts in the liver that eventually lead to cirrhosis [26]. The disease is thought to result from a combination of genetic predisposition and environmental triggers [27]. Accordingly, there are many GWAS of primary biliary cirrhosis, and many susceptibility genes have been found. Combined with scRNA-seq data, most of these genes can be localized to immune cells, such as Macs, T cells and B cells, in both human and mouse (Fig. 6c, 7c). This result also supports the hypothesis of an autoimmune pathogenesis.

Conclusions
In summary, we localized the susceptibility genes of liver diseases to specific cell types by integrating GWAS and scRNA-seq data. We discovered that most susceptibility genes for HBV infection were expressed in human liver B cells. The susceptibility genes for biomarkers of liver dysfunction showed similar cell-type specificity in human and mouse. In addition, some primary liver disease phenotypes may be due to mutations in multiple cell types. Collectively, our study provides new clues for the in-depth study of liver diseases at the transcriptome level.

Figure 1
scRNA-seq reveals the cell populations of liver. a t-SNE plot representation of 8,227 liver cells from human; clusters were colored and distinctively labeled. b t-SNE plot representation of liver cells from mouse; clusters were colored and distinctively labeled. c Heat map showing the marker genes of each cluster, highlighting the selected marker genes for each cluster in human liver. The x-axis represents the cell type from 1 to 20 and the y-axis represents gene expression. d Heat map showing the top 5 expressed genes of each cluster in mouse liver. The x-axis represents the cell type from 1 to 9 and the y-axis represents gene expression.

Figure 4
The genes of biomarkers of liver dysfunction identified by GWAS were integrated into human liver scRNA-seq data. a GWAS genes of albumin measurement expressed in liver cell types. b GWAS genes of aspartate aminotransferase measurement expressed in liver cell types. c GWAS genes of alanine aminotransferase measurement expressed in liver cell types. d GWAS genes of alkaline phosphatase measurement expressed in liver cell types. e GWAS genes of γ-glutamyl transferase measurement expressed in liver cell types.

Figure 5
The genes of biomarkers of liver dysfunction identified by GWAS were integrated into mouse liver scRNA-seq data. a GWAS genes of albumin measurement expressed in liver cell types. The map showed genes with z-score>1.5. b GWAS genes of aspartate aminotransferase measurement expressed in liver cell types. c GWAS genes of alanine aminotransferase measurement expressed in liver cell types. The map showed genes with z-score>1.5. d GWAS genes of alkaline phosphatase measurement expressed in liver cell types. The map showed genes with z-score>1.5. e GWAS genes of γ-glutamyl transferase measurement expressed in liver cell types. The map showed genes with z-score>1.5.

Figure 6
The genes of primary liver disease identified by GWAS were integrated into human liver scRNA-seq data. a GWAS genes of nonalcoholic fatty liver disease expressed in liver cell types. b GWAS genes of hepatocellular carcinoma expressed in liver cell types. c GWAS genes of primary biliary cirrhosis expressed in liver cell types. The map showed genes with z-score>1.5.

Figure 7
The genes of primary liver disease identified by GWAS were integrated into mouse liver scRNA-seq data. a GWAS genes of nonalcoholic fatty liver disease expressed in liver cell types. b GWAS genes of hepatocellular carcinoma expressed in liver cell types. c GWAS genes of primary biliary cirrhosis expressed in liver cell types. The map showed genes with z-score>1.5.
