Partitioned polygenic risk scores identify distinct types of metabolic dysfunction-associated steatotic liver disease

Metabolic dysfunction-associated steatotic liver disease (MASLD) encompasses an excess of triglycerides in the liver, which can lead to cirrhosis and liver cancer. While there is solid epidemiological evidence of MASLD coexisting with cardiometabolic disease, several leading genetic risk factors for MASLD do not increase the risk of cardiovascular disease, suggesting no causal relationship between MASLD and cardiometabolic derangement. In this work, we leveraged measurements of visceral adiposity and identified 27 novel genetic loci associated with MASLD. Among these loci, we replicated 6 in several independent cohorts. Next, we generated two partitioned polygenic risk scores (PRS) based on the mechanism of genetic association with MASLD encompassing intra-hepatic lipoprotein retention. The two PRS suggest the presence of at least two distinct types of MASLD, one confined to the liver resulting in a more aggressive liver disease and one that is systemic and results in a higher risk of cardiometabolic disease.


Introduction
Paralleling the obesity epidemic, metabolic dysfunction-associated steatotic liver disease (MASLD) is a growing burden worldwide. MASLD is a spectrum of conditions encompassing an excess of triglycerides in the liver progressing to inflammation, fibrosis and ultimately to cirrhosis and liver cancer [1]. MASLD is a heterogeneous disease that epidemiologically coexists with a metabolic derangement, including visceral adiposity, insulin resistance, dyslipidemia, hypertension and hyperglycemia. This metabolic derangement ultimately increases the risk of cardiovascular events including heart failure and kidney disease [2][3][4]. Indeed, cardiovascular disease is the most frequent cause of death in individuals with MASLD, whereas hepatic failure leading to liver-related events is a less frequent complication. However, it is a common clinical observation, yet to be understood, that some individuals develop a severe and rapidly progressing liver disease despite similar or even less marked metabolic derangement.
MASLD has a strong inherited component, with variants that increase primarily liver triglyceride accumulation by impairing hepatocyte lipid droplet remodeling and lipoprotein secretion involved in its development and progression [5]. However, contrary to the epidemiological evidence, these variants result in a protection against cardiovascular disease and no association with hypertension [5][6][7] and heart failure, suggesting no causal relationship between MASLD and cardiometabolic derangement [5].
Over the last 15 years, genome-wide association studies (GWAS) identified several genetic loci associated with chronic liver disease or proxies for increased liver triglyceride content [8][9][10][11][12][13]. Studies have also shown that excess adiposity amplifies the effect size of these variants [14], likely by increasing ectopic visceral fat. However, anthropometric measures of adiposity (body mass index, BMI) and body fat distribution (waist circumference, WC) fail to provide an accurate quantification of visceral adiposity, which is most closely related to insulin resistance and metabolic alterations. Indeed, imaging (e.g., visceral adipose volume) and bioelectrical impedance analysis (e.g., whole-body fat mass) are more accurate measurements of body composition and are better predictors of MASLD [15].
Here, we leverage these direct measurements of adiposity to identify novel genetic loci associated with MASLD. We identified and replicated 6 novel loci and generated two partitioned polygenic risk scores (PRS) based on the mechanism of genetic association with MASLD encompassing intra-hepatic lipoprotein retention. The two PRS suggest the presence of at least two distinct types of MASLD, one confined to the liver and one entwined in the systemic cardiometabolic syndrome.

Results
Visceral adipose tissue, whole-body fat mass and BMI are independent predictors of liver triglyceride content and inflammation
We started by examining the pairwise correlation among different measures of adiposity and a) liver triglyceride content measured by MRI-derived proton density fat fraction (PDFF), and b) liver inflammation/fibrosis measured by liver iron corrected T1 (cT1, Fig. 1A). The strongest correlation with liver outcomes was observed with visceral adipose tissue volume (VAT) followed by BMI, waist-to-hip ratio (WHR) and whole-body fat mass (WFM). As expected, there was a high correlation between PDFF and cT1. Due to the presence of high multicollinearity among the adiposity indices, we carried out a model selection approach using 3 penalized regression models (see Methods) to dissect the predictive contribution of these measures of adiposity to PDFF and cT1. The minimum mean squared error (MSE) on the validation set showed that the Ridge regression model outperformed the LASSO and Elastic Net models. The standardized coefficients from the Ridge regression showed that VAT was the strongest independent predictor of PDFF and cT1, followed by WFM and BMI for PDFF (Fig. 1B). Interestingly, due to high collinearity between WFM and BMI, the L2 regularization term resulted in a negative standardized coefficient for BMI in predicting PDFF. As opposed to the pairwise correlation, in the penalized regression analysis WHR and impedance of whole body (IWB) had almost no independent predictive power in determining PDFF or cT1. Therefore, we used WFM, BMI and VAT for the following genetic association studies.
Identification of 17 novel loci for liver triglyceride content and 9 for liver inflammation by the multi-adiposity-adjusted GWAS
To capitalize on the independent contribution of indices of adiposity to liver lipid content and inflammation, we performed three GWAS, each adjusted for a specific index of adiposity (VAT, BMI, WFM), and one unadjusted GWAS, collectively referred to as the multi-adiposity-adjusted GWAS. All the GWAS tested the association between 9,356,431 common genetic variants (MAF > 0.01) and PDFF (N = 36,394) and cT1 (N = 30,481) in Europeans from the UK Biobank and were adjusted for age, sex, age×sex, age² and age²×sex, the first 10 genomic principal components and array batch [16]. We estimated genetic correlation and heritability using linkage disequilibrium (LD) score regression analysis [17][18][19]. For PDFF, WFM adjustment explained the largest genetic variability followed by BMI and VAT, explaining in the best-case scenario approximately 6% more heritability compared to the model unadjusted for any adiposity index (Supplementary Table 1). For cT1, all adiposity measurements yielded similar results to the unadjusted model. Intercepts from the LD score regression analysis did not show any sign of substantial confounding bias (Supplementary Table 1). These data suggest that while liver triglyceride content is dependent on adiposity, the presence of inflammation is less closely correlated to it. Furthermore, genetic correlations among different adiposity adjustments showed that BMI and WFM adjustments shared the largest overlap for both PDFF and cT1 (Fig. 2A, Supplementary Table 2), which is consistent with the epidemiological correlation.
To identify statistically independent genetic loci for each adiposity adjusted GWAS, we performed linkage disequilibrium (LD) clumping [20] (r² > 0.01 in a window of 1 Mb), followed by conditional and joint multiple-SNP analysis (GCTA-COJO) [21]. Next, to identify independent genetic loci from all 4 adiposity adjustments we used pleiotropic analysis [22], by examining linkage disequilibrium among independent lead variants from the 4 adiposity adjustments and kept the strongest independent association as the lead variant. In this context, pleiotropic analysis refers to genetic loci that are shared among more than two adiposity adjustments (Supplementary Table 3).

To examine whether the newly identified genetic loci were previously reported, we searched the NHGRI-EBI GWAS Catalog database [23] in a window of 1 Mb around each lead variant. We found 17 and 9 novel genetic loci associating with PDFF and cT1, respectively (Tables 1 and 2, Supplementary Table 5). Among the new and the previously identified loci, 4 (PNPLA3, TM6SF2, GPAM, HFE/SLC17A3) were associated with both traits with at least one adiposity adjustment. However, only PNPLA3 and TM6SF2 loci were associated with both traits at a genome-wide level irrespective of the adjustment (Fig. 2B).
Interestingly, a missense variant on ADH1B (rs1229984) was in the first credible set with a PIP of 1 at the ADH1B, MTTP and RP11-766F14.2 loci, suggesting that the observed effect from all three loci may derive from the same putative causal variant. Although SuSiE identified 2 credible sets at the MTTP locus, the second credible set contained 49 variants with a highest PIP of only 0.08. In fact, ADH1B rs1229984 and MTTP rs11937107 have a D' of 1 in Europeans [26]. Similarly, fine-mapping identified 3 credible sets at the MAST3 locus, but TM6SF2 rs58542926 was the only variant in the first credible set, with a PIP of 0.98. We observed a similar finding at the CKM locus with 2 credible sets, with a large PIP (> 0.99) for APOE rs429358 in the first credible set. Of note, an upstream gene variant near CEBPA had a PIP of 0.84 at the CEBPG locus, which was recently found to be associated with chronic ALT levels in a multi-ancestry GWAS [9]. For other loci, independent variants from GWAS were among the fine-mapped credible sets, but with a relatively low probability of being the putative causal variant at the respective loci (Supplementary Table 6). SuSiE failed to identify any credible set for the FAM101A and CELA2B loci, possibly due to a high purity filter (r² < 0.25).

Functional analyses of independent loci associated with liver triglyceride content and inflammation
Independent genetic loci for adiposity adjusted PDFF and cT1 were mapped to genes using multiple approaches (see Methods). We performed positional and eQTL mapping in a window of 50 kb, and chromatin interaction mapping using FUMA [30]. A gene-based association analysis of common variants was performed in MAGMA [31] to find potential coding genes associating with PDFF or cT1. We further used the V2G score for each leading variant at each locus from Open Target Genetics, along with the set of genes found via colocalization and fine-mapping. To rank the potential genes at each locus, we summed the evidence for each approach (Supplementary Table 8). This approach allowed us to map 724 unique genes with at least one piece of evidence for PDFF, and 469 for cT1.
Of the 37 and 18 independent loci for PDFF and cT1, the nearest gene had the highest rank at the majority of loci (31 and 12, respectively; Supplementary Table 8). For the remaining loci, multiple candidate genes were found. Specifically, while 3 genes, GID4, MYO15A and ATPAF2, had the highest rank at the GID4 locus, in a recent report [8] this locus has been attributed to SREBF1 with at least one significant eQTL association and chromatin interaction (FUMA), and a significant association at the gene level (MAGMA). At the FAM101A locus, CCDC92 had the highest rank. Whole-body knockout of this gene in a mouse model has been found to be protective against obesity and diabetes [32]. In addition, SLC2A4, an insulin regulated transporter of glucose, and ACADVL, an acyl-CoA dehydrogenase catalysing the first step of mitochondrial fatty acid beta-oxidation, had the highest rank at the DLG4 locus. LETMD1 also had the highest rank at the TFCP2 locus. This gene plays a role in thermogenesis of brown adipose tissue, and mice lacking this gene were susceptible to diet-induced adiposity and impaired glucose disposal [33].
To gain a deeper understanding of the biological implications of genome-wide significant loci, we conducted functional gene-set enrichment analysis using gene ontology (GO) biological processes, Reactome metabolic pathways, and ARCHS4 tissues. We focused on the mapped genes with the highest evidence rank at each locus (Supplementary Tables 9A and B). Mapped genes for PDFF were enriched in genes mostly expressed in liver and they were involved in triglyceride and lipid metabolism (Supplementary Table 9A and Supplementary Fig. 1A). Conversely, mapped genes for liver iron corrected T1 were enriched in metal ion metabolism and biosynthesis of blood groups (Supplementary Table 9B and Supplementary Fig. 1B).

Most genetic variants were associated with both liver triglyceride content and inflammation
Given the causal relationship between liver triglyceride content and inflammation, we examined the association of the novel variants identified for PDFF with cT1 and vice versa (Fig. 3A). Interestingly, the majority of the variants were associated with both traits and were directionally concordant (Fig. 3A). This is consistent with the notion that liver triglyceride content causes inflammation [34]. A total of 5 (29%) and 4 (44%) loci were associated with only PDFF or only cT1, respectively, suggesting a specificity of the effect on lipid and inflammation pathways.

Association between novel genetic loci and liver and metabolic traits
Next, we examined the association between these novel variants and indices of liver damage, fibrosis and liver disease (Fig. 3B and Supplementary Table 10).
As expected, more than 80% of the variants associated with PDFF were also associated with ALT, while more than 50% of those associated with cT1 were also associated with AST.
Polygenic risk scores of liver triglyceride content and inflammation explain approximately 6% of variation of these traits
We generated polygenic risk scores (PRS) for PDFF and cT1 using 1) all variants identified in the multi-adiposity-adjusted GWAS, 2) only previously known genetic variants, and 3) novel variants identified in the present study. We calculated the goodness-of-fit of each PRS using the overall variance explained (% R², Supplementary Table 11). The full PRS explained approximately 5.6% and 7% of the phenotypic variance for PDFF and cT1, respectively. As expected, this variance was mostly accounted for by previously known variants, whereas the novel PRS conferred an improvement in prediction of less than 1% of phenotypic variance for both traits (Supplementary Table 11).
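As an illustration of the quantities reported above, the sketch below computes a PRS as a weighted sum of allele dosages and the percentage of variance explained as the squared Pearson correlation between score and phenotype. This is a minimal sketch under stated assumptions: the text above does not specify the scoring formula or any covariate adjustment, and the function names, variant labels and numbers are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual.

    `weights` maps a variant label to its per-allele effect size (assumed to
    come from the discovery GWAS); `dosages` maps the same labels to dosages.
    """
    return sum(weights[v] * dosages[v] for v in weights)


def variance_explained(scores, phenotype):
    """Percent R^2 of a univariable linear fit (squared Pearson correlation)."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_p = sum(phenotype) / n
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(scores, phenotype))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_p = sum((p - mean_p) ** 2 for p in phenotype)
    return 100.0 * cov * cov / (var_s * var_p)


# Toy, made-up values used only to exercise the functions.
toy_weights = {"var_A": 0.5, "var_B": -0.2}
toy_person = {"var_A": 2, "var_B": 1}
toy_score = polygenic_score(toy_person, toy_weights)
toy_r2 = variance_explained([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In practice the explained variance is usually reported as the gain over a covariate-only model; the univariable version here is kept deliberately simple.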
The association between six novel loci and liver triglyceride content was replicated in independent cohorts
Based on the strong genetic correlation between PDFF and cT1, to validate the novel SNPs, we meta-analysed the association between all 26 novel variants and liver triglyceride content in 3,903 individuals of European ancestry from four independent cohorts (Fig. 4 and Supplementary Table 12). We were able to replicate the association between six of the novel loci (CEBPG, TSC22D2, ABO, GUSB, TECTB, TFCP2) and liver triglyceride content. The direction of the association in the replication cohorts was consistent with the discovery cohort.
Partitioned polygenic risk scores dissect a liver-specific from a cardiometabolic component of steatotic liver disease
Triglyceride secretion is a key mechanism regulating intracellular hepatocyte triglyceride homeostasis. Triglyceride secretion is mediated by very low-density lipoprotein (VLDL) secretion, which in fasting conditions is proxied by circulating triglyceride levels. Variants in genes hampering lipid incorporation into VLDL and, directly, VLDL secretion, including APOB, MTTP, TM6SF2 and PNPLA3, cause retention of liver triglycerides, cholesterol and other lipid species, mirrored by lower circulating triglycerides and low-density lipoprotein cholesterol. Carriers of these variants have an increased risk of the entire spectrum of steatotic liver disease and at the same time a lower risk of cardiovascular disease due to the lower circulating lipoproteins [35][36][37]. Moreover, MTTP inhibition is currently used as a strategy to reduce cardiovascular risk in genetically determined hypercholesterolemia.
Based on this fundamental mechanism, we clustered the newly replicated and the previously known variants in two groups, and generated two partitioned PRS of PDFF-circulating triglycerides: a) a cluster with discordant association between PDFF and circulating triglycerides (n = 10), where the primary cause of higher liver triglyceride content is likely to be retention, and b) a cluster with concordant direction (n = 13), where the primary cause may be an increase in uptake and synthesis of energy substrates. Variants at three genetic loci (SLC17A3, TOR1B and MAST3) did not associate with circulating triglycerides and were not included (Fig. 5 and Supplementary Table 13). While goodness-of-fit measures of clustered PDFF-circulating triglycerides showed a strong association for the two PRS, the explained variance of the discordant PRS was comparatively higher than that of the concordant PRS. This was expected, as the discordant PRS contains the PNPLA3 and TM6SF2 variants, the strongest genetic predictors of liver triglyceride content (Supplementary Table 11).
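The partitioning rule described above can be sketched in a few lines. This is a hedged illustration, not the pipeline used in the study: effects are assumed to be aligned to the same allele for both traits, the significance threshold `p_threshold` for "associated with circulating triglycerides" is an assumption (the text does not state it), and the variant labels and numbers are made up.

```python
def partition_variants(variants, p_threshold=0.05):
    """Split variants by the sign of their PDFF and triglyceride effects.

    Each variant is a dict with hypothetical keys 'id', 'beta_pdff',
    'beta_tg' (effect on circulating triglycerides) and 'p_tg'.
    """
    discordant, concordant, excluded = [], [], []
    for v in variants:
        if v["p_tg"] >= p_threshold:
            # No evidence of association with circulating triglycerides.
            excluded.append(v["id"])
        elif v["beta_pdff"] * v["beta_tg"] < 0:
            # Opposite signs: higher liver fat with lower circulating
            # triglycerides, compatible with hepatic retention.
            discordant.append(v["id"])
        else:
            # Same sign for both traits.
            concordant.append(v["id"])
    return discordant, concordant, excluded


# Toy, made-up summary statistics.
toy_variants = [
    {"id": "var_A", "beta_pdff": 0.30, "beta_tg": -0.05, "p_tg": 1e-9},
    {"id": "var_B", "beta_pdff": 0.04, "beta_tg": 0.03, "p_tg": 1e-6},
    {"id": "var_C", "beta_pdff": 0.05, "beta_tg": 0.01, "p_tg": 0.40},
]
toy_discordant, toy_concordant, toy_excluded = partition_variants(toy_variants)
```

Each group of variants would then be combined into its own PRS, so that opposite effects on circulating lipids are not averaged out in a single score.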
Discordant PRS was associated with an increased risk of the entire spectrum of steatotic liver diseases with the largest association being with hepatocellular carcinoma.The concordant PRS had an overall similar effect on the risk of chronic liver disease.However, the difference in effect size with the discordant PRS was decreasing with increasing severity of liver disease to cirrhosis and cancer (Fig. 6A, Supplementary Fig. 2, and Supplementary Tables 14 and 15).
Interestingly, the discordant PRS, but not the concordant, was also associated with autoimmune liver disease.
The discordant PRS was associated with a decreased risk of cardiovascular disease. On the contrary, the concordant PRS was associated with a substantially increased risk of cardiovascular disease and heart failure. When examining diabetes, both the discordant and the concordant PRS conferred an increased predisposition to this condition, suggesting that hepatic fat accumulation predisposes to diabetes irrespective of the underlying mechanism. Conversely, the larger effect size of the concordant PRS for diabetes despite the lower effect on liver triglyceride content would suggest that the association of diabetes with the concordant PRS is not mediated by liver damage. In the case of hypertension and chronic kidney failure, the discordant PRS showed no association, whereas the concordant PRS increased the risk of both diseases. However, when we adjusted for hypertension, the association with chronic kidney failure was no longer significant while the other associations remained (Supplementary Table 15). Further adjustment for diabetes, total cholesterol and alcohol intake did not change the results (Supplementary Table 15). When we examined the prospective risk conferred by the partitioned PRS to develop liver and cardiometabolic disease, we found virtually identical results (Fig. 6B, Supplementary Table 15). Functional gene-set enrichment analysis of mapped genes for the discordant and concordant PRS also revealed a distinct metabolic pattern. While gene sets of the discordant PRS were mostly enriched in lipid and triglyceride homeostasis (Supplementary Table 9C, Supplementary Fig. 3), concordant PRS gene sets were enriched in insulin receptor signalling and glucose homeostasis pathways, overall consistent with an impact on stimulation of de novo lipogenesis (Supplementary Table 9D, Supplementary Fig. 3).
mRNA expression of loci from the liver-specific PRS is more abundant in the liver
We further examined the mRNA expression of mapped genes within the concordant and discordant PRS using paired bulk RNA-Seq of liver (n = 244) and visceral adipose tissue (VAT, n = 261) from participants with obesity from the MAFALDA cohort. Interestingly, mapped genes of the discordant, but not the concordant, PRS showed a significant overlap with upregulated differentially expressed genes in the liver (one-sided Fisher's exact test p-value = 0.007, Fig. 7). Given the tight interplay between VAT and liver in SLD, this finding suggests a liver-specific nature of the discordant PRS compared to its metabolic counterpart, the concordant PRS.
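For readers unfamiliar with the overlap test mentioned above, a one-sided Fisher's exact test can be written directly from the hypergeometric distribution. The sketch below is illustrative only: how the 2×2 table was built in the study (for example, which gene universe was used) is not described here, so the table layout and the toy counts are assumptions.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test for the table [[a, b], [c, d]].

    Assumed layout: a = PRS genes that are upregulated, b = PRS genes that are
    not, c = other genes that are upregulated, d = other genes that are not.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # number of PRS genes
    col1 = a + c          # number of upregulated genes
    n = a + b + c + d     # size of the gene universe
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy counts, not taken from the study.
toy_p = fisher_exact_greater(3, 1, 1, 3)
```

A small p-value indicates that the overlap between the two gene lists is larger than expected by chance given the list sizes.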

Discussion
The main findings of this study are: a) the identification of novel loci associated with steatotic liver disease and b) the identification of two distinct types of MASLD, namely a liver-specific type and a systemic cardiometabolic type. BMI, as a proxy of adiposity, amplifies the genetic predisposition to SLD given by common variants [14,38]. However, BMI does not consider body fat distribution and body composition. Based on this consideration, to identify novel genetic loci associated with MASLD, we compared a range of instrumental and anthropometric measurements of adiposity. We found that visceral adipose tissue volume, whole-body fat mass and BMI were the best independent predictors of liver triglyceride content and inflammation measured by PDFF and cT1, respectively.
Next, we performed multi-adiposity-adjusted GWAS on PDFF and iron corrected T1, as measures of liver triglyceride content and inflammation, respectively. We identified a total of 17 novel genetic loci for liver triglyceride content and 9 for liver inflammation and replicated 6 of these loci in four independent cohorts. Among the previously known loci, we found loci associated with lipid droplet homeostasis in hepatocytes (PNPLA3, TM6SF2, MBOAT7, MARC1, GPAM and APOE).
The heritability of liver triglyceride content was influenced by the multi-adiposity adjustment, explaining in the best-case scenario approximately 6% more heritability compared to the unadjusted model. However, for inflammation this was not the case, suggesting that heritability of inflammation is not directly influenced by adiposity. Liver triglyceride accumulation is causally associated with liver inflammation [34]. Consistently, approximately 80% of the genetic loci were associated with both liver triglyceride content and inflammation.
When we examined the association among liver traits and the novel variants, we found that 80% of those associated with liver triglyceride content were associated with alanine transaminase (ALT), a clinical biomarker of liver triglyceride accumulation, and 50% of those associated with liver cT1 were associated with aspartate transaminase (AST), a marker of liver damage [39]. Interestingly, approximately 50% of the variants associated with cT1 were also associated with hepatocellular carcinoma (HCC), while only 6% of those associated with liver triglyceride content were associated with this cancer. This is consistent with cT1 measuring inflammation and liver damage, which may indicate a more advanced disease stage, as opposed to PDFF measuring purely triglyceride content.
Intrahepatocyte triglyceride homeostasis is governed by three fundamental pathways: triglyceride synthesis, lipoprotein secretion, and energy substrate utilization. While all cells in the body are capable of synthesizing and utilizing triglycerides, lipoprotein secretion during fasting is specific to the hepatocyte. Lipoprotein secretion consists of the incorporation of triglycerides into very low-density lipoproteins as a gateway for partitioning lipids to the adipose and muscle tissues and, in fasting conditions, it is highly correlated with circulating triglycerides.
Hindering lipoprotein secretion causes liver triglyceride accumulation by retention. Indeed, rare loss of function mutations in APOB segregate in families with liver steatosis and hepatocellular carcinoma and are enriched in case studies with this cancer [40]. Moreover, loss of function variants in TM6SF2 and PNPLA3, the strongest common genetic predictors of SLD, cause liver triglyceride retention by reducing lipoprotein secretion [41,42]. Counter-intuitively, carriers of these variants have a lower risk of cardiovascular disease due to the lower circulating lipoproteins [35,43].
Therefore, we decided to generate two partitioned PRS: one composed of variants in which the associations with liver triglyceride content and circulating triglycerides were discordant and one in which they were concordant. "Partitioned" or "process-specific" polygenic scores define specific pathways, elucidating disease pathogenesis and identifying opportunities for drug target identification. These partitioned scores may also capture the specific signature driving the individual progression from health to disease, hence providing a framework for tailored therapeutic interventions [44].
The discordant PRS was composed of genetic variants primarily causing hepatic triglyceride retention, whereas the concordant PRS was composed of variants not affecting liver secretion but, presumably, the other two pathways, namely triglyceride synthesis and utilization. Moreover, the two PRS primarily confer a large risk for the entire spectrum of MASLD, where the severity of liver disease was accompanied by a larger genetic risk. However, the discordant PRS confers a substantially larger effect size than the concordant PRS.
To the best of our knowledge, the concordant PRS is the first PRS predicting the entire spectrum of cardiometabolic disease, namely MASLD, diabetes, heart failure, and cardiovascular disease. On the contrary, the discordant PRS is liver-specific and associates with a more aggressive liver disease mirrored by protection from cardiovascular disease due to lipoprotein retention, despite a marginal increase in the risk of diabetes. The liver specificity of the discordant PRS is further supported by a higher mRNA expression of the genes composing this score in liver versus visceral adipose paired biopsies from individuals with obesity.
Our data suggest the presence of at least two distinct types of MASLD with specific disease-causing molecular mechanisms: one more aggressive and specific to the liver, and one associated with a milder risk of liver events, but systemic and entwined with the cardiometabolic syndrome (Fig. 8). Understanding the molecular mechanisms underlying these components may allow effective treatments to be found for MASLD and the cardiometabolic syndrome. Clinically, these entities reflect the presence of individuals progressing fast into the later stages of MASLD and those with a persistent, but slowly progressing, MASLD associated with the entire cardiometabolic syndrome. The presence of these MASLD subtypes may account for the disease heterogeneity and may help to explain why several drugs have failed in clinical trials to treat MASLD.
Currently, Mendelian randomization studies are performed by selecting variants associated with a trait and using them to infer the causal relationship with a different trait. In this study, the two PRS had opposite effects on cardiovascular risk, indicating that if we had pooled all the variants together, we may have nullified the association. Therefore, our findings support the notion that partitioned PRS constructed by integrating genetic variants into the physiological pathways contributing to the trait phenotype, as compared to an overall PRS based only on the association between variants and a trait, may better clarify the heterogeneity of disease pathogenesis at the individual level. Ultimately, this will lead to precision medicine, improving outcome prediction and therapy.
A strength of this study is that the partitioning of the PRS was a hypothesis-driven approach based on a solid knowledge of intracellular lipid homeostasis.
While the finding on cardiovascular disease may be expected, although the adjustment for circulating total cholesterol did not change the results, the association with heart failure, hypertension and diabetes was not a given. An unsupervised clustering approach derived from several phenome-wide associations may be used for PRS partitioning. However, this approach may be considered somewhat of a self-fulfilling prophecy because it uses traits to generate clusters that are subsequently used to predict diseases deriving from the same traits used in the clustering.
In conclusion, by leveraging human genetics and multi-adiposity adjustment we identified 6 novel loci associated with SLD and two distinct types of steatotic liver disease, namely one that is liver-specific and confers risk of a more aggressive liver disease and another that is milder and entwined with the full spectrum of cardiometabolic syndrome.

Methods
UK Biobank. The UK Biobank study recruited over 500,000 participants aged between 40 and 69 years across the UK between 2006 and 2010, with extensive phenotypic and genetic data [45,46]. The UK Biobank received ethical approval from the National Research Ethics Service Committee North West Multi-Centre Haydock (reference 16/NW/0274). Data used in this study were obtained under application number 37142. European ancestry was defined as described before [12] by removing outliers using genomic principal components. Additionally, subjects were excluded if they fell into any of these categories: 1) more than 10 putative 3rd degree relatives, 2) a mismatch between self-reported and genetically inferred sex, 3) putative sex chromosome aneuploidy, 4) heterozygosity and missingness outliers, and 5) withdrawn consent [45,46].
Genotypes and imputation. UK Biobank participants were genotyped using 2 highly similar (> 95% overlap) genotyping arrays, which were then imputed centrally by the UK Biobank based on the 1000 Genomes Phase 3, UK 10K haplotype, and Haplotype Reference Consortium reference panels. Starting from approximately 97 million variants, we only kept 9,356,431 variants with a minor allele frequency (MAF) > 1%, imputation quality (INFO) score > 0.8, and Hardy-Weinberg equilibrium P > 1E-10 [12,46].
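The three variant-level filters listed above amount to a simple conjunction of thresholds, sketched below. The thresholds are those stated in the text; the `Variant` record, its field names and the example values are hypothetical and serve only to make the snippet self-contained.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary used only for this illustration."""
    rsid: str
    maf: float     # minor allele frequency
    info: float    # imputation quality (INFO) score
    hwe_p: float   # Hardy-Weinberg equilibrium P value


def passes_qc(v, min_maf=0.01, min_info=0.8, min_hwe_p=1e-10):
    """Keep a variant only if it exceeds all three thresholds."""
    return v.maf > min_maf and v.info > min_info and v.hwe_p > min_hwe_p


# Made-up variants: one passing, three failing a single filter each.
toy_panel = [
    Variant("var_ok", maf=0.12, info=0.97, hwe_p=0.30),
    Variant("var_rare", maf=0.004, info=0.99, hwe_p=0.50),
    Variant("var_poorly_imputed", maf=0.20, info=0.55, hwe_p=0.80),
    Variant("var_hwe_fail", maf=0.30, info=0.95, hwe_p=1e-14),
]
kept = [v.rsid for v in toy_panel if passes_qc(v)]
```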
Definition of traits. We used adiposity measures directly provided by UK Biobank, including visceral adipose tissue volume (VAT, data-field 22407), whole body fat mass (WFM, data-field 23100) and impedance of whole body (IWB, data-field 23106). Waist-to-hip ratio (WHR) was calculated by dividing waist circumference by hip circumference. MRI-derived proton density fat fraction (PDFF) and liver iron corrected T1 (cT1) were provided directly by UK Biobank (data-fields 40061 and 40062). The details on liver MRI protocols can be found elsewhere [47]. Briefly, individuals were scanned using a Siemens 1.5T Magnetom Aera. Two sequences were then used for data acquisition, a multiecho spoiled gradient-echo sequence and a modified Look-Locker inversion sequence (ShMOLLI) for PDFF and cT1, respectively [47]. The definition of binary traits can be found in Supplementary Table 16.
Phenotypic prediction models. To address the multicollinearity between different measures of adiposity and to verify their contribution in predicting PDFF and cT1 values, we fit penalized linear regression models and carried out a model selection in a 10-fold nested cross-validation (CV) using Least Absolute Shrinkage and Selection Operator (LASSO), Ridge and Elastic Net. LASSO penalizes the regression model using the L1-norm, effectively reducing the influence of non-contributing features to zero. On the other hand, Ridge regression utilizes the L2-norm, allowing it to shrink regression coefficients toward zero without setting them exactly to zero. Elastic Net combines elements of both LASSO and Ridge by incorporating both L1 and L2 penalties through a mixing parameter α.
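For reference, the three penalties can be written out explicitly. The notation below is an assumption introduced for illustration (y is the outcome vector, X the matrix of standardized predictors, β the coefficient vector, n the sample size, λ ≥ 0 the shrinkage parameter and α the mixing parameter); the intercept and covariates are omitted, and the scaling constants follow one common parameterization that may differ between software packages.

```latex
\begin{aligned}
\hat{\beta}_{\mathrm{LASSO}} &= \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\hat{\beta}_{\mathrm{Ridge}} &= \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \\
\hat{\beta}_{\mathrm{ElasticNet}} &= \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2
  + \lambda \left( \alpha \lVert \beta \rVert_1 + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right),
  \qquad 0 < \alpha \le 1
\end{aligned}
```

With α = 1 the Elastic Net objective reduces to LASSO, while smaller values of α move it toward a Ridge-like penalty.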
To conduct the CV process, the dataset was initially divided into training (80%) and test (20%) sets. Within the training set, the outer CV assessed the performance of each model, while the inner CV was utilized for hyperparameter tuning. This tuning was accomplished by minimizing the mean squared error (MSE) across a grid of α and shrinkage values in each fold of the outer CV. The best-performing model, i.e., the one with the lowest MSE, was then trained on the entire training set within a 10-fold CV framework. Subsequently, its performance was evaluated using the remaining test set. Finally, the model with the optimal set of hyperparameters, determined in the previous step, was fitted to the entire dataset for final evaluation [48]. Adiposity indices were standardized before the training, while PDFF and cT1 values were rank-based inverse normal transformed. All models were adjusted for age, sex, age², age×sex and age²×sex. All analyses were performed in MATLAB (Mathworks) R2023a.
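The rank-based inverse normal transformation mentioned above can be sketched as follows. This is a minimal illustration, not the code used in the study (which was run in MATLAB): the Blom offset c = 3/8 and the use of average ranks for ties are assumptions, since the text does not state which variant of the transformation was applied.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8):
    """Map values to normal quantiles of their ranks: z = Phi^-1((r - c) / (n - 2c + 1))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank within the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Toy, made-up measurements; the middle value maps to 0 and the extremes are symmetric.
toy_z = rank_inverse_normal([2.0, 10.0, 5.0])
```

The transformation preserves the ordering of individuals while forcing an approximately standard normal outcome distribution, which limits the influence of extreme values on the regression.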
Genome-wide association analysis. The association between 9 million imputed common variants and PDFF or cT1 was tested under different adiposity adjustments and an additive genetic model using the whole-genome regression model implemented in REGENIE (v3.2.8) [16]. The analysis was adjusted for age at MRI, sex, age², age×sex, age²×sex, the first 10 PCs of ancestry, genotyping array and adiposity index, where the adiposity index was VAT, WFM, BMI or no adiposity adjustment.
Similarly, we tested the association between independent lead variants from the multi-adiposity-adjusted GWAS of PDFF and cT1 and other binary or continuous metabolic traits using either a logistic or linear whole-genome regression model in REGENIE, adjusting for the same set of covariates including consistent adiposity adjustments. Individuals with available PDFF or cT1 measurements were excluded prior to the association analysis. In cases where the trait was measured at baseline, we used waist-to-hip ratio (WHR) instead of VAT adjustment, since the latter was not available at baseline. To fit the whole-genome regression model in step 1 of REGENIE, a subset of directly genotyped common variants (MAF > 1%) was used. After excluding variants in long-range linkage disequilibrium (LD) and major histocompatibility complex regions, variants with a missingness < 0.01 and with Hardy-Weinberg equilibrium P > 1E-15 were retained. Finally, 146,833 markers remained following LD pruning with a window of 500,000 base pairs and pairwise r² < 0.1 20. Continuous traits were rank-based inverse normal transformed before the analyses.
Pleiotropy analysis. We evaluated whether the independent genome-wide significant loci, adjusted for different adiposity measures, were specific to each adiposity measure, common between PDFF and cT1 GWAS, or shared across both. To this end, if two independent lead variants within 1 Mb of each other were in LD (r² > 0.2), they were assigned the same locus ID (Supplementary Table 3). Circular Manhattan plots were visualized using Circos 50.
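The locus assignment rule can be sketched as a small union-find grouping. This is an illustration only: the function name, the input layout (variant ID, chromosome, position) and the pairwise r² lookup are hypothetical, and merging transitively (A with B and B with C places all three in one locus) is an assumption not stated in the text.

```python
from itertools import combinations


def assign_locus_ids(variants, r2, max_dist=1_000_000, min_r2=0.2):
    """variants: list of (variant_id, chromosome, position) tuples.
    r2: dict mapping frozenset({id_a, id_b}) to pairwise LD (r squared)."""
    parent = {variant_id: variant_id for variant_id, _, _ in variants}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, chrom_a, pos_a), (b, chrom_b, pos_b) in combinations(variants, 2):
        if chrom_a != chrom_b or abs(pos_a - pos_b) > max_dist:
            continue  # not within 1 Mb on the same chromosome
        if r2.get(frozenset((a, b)), 0.0) > min_r2:
            parent[find(a)] = find(b)  # in LD: merge the two groups

    locus_of_root = {}
    locus_ids = {}
    for variant_id, _, _ in variants:
        root = find(variant_id)
        # Number loci in order of first appearance, starting at 1.
        locus_ids[variant_id] = locus_of_root.setdefault(root, len(locus_of_root) + 1)
    return locus_ids
```

Lead variants from the different adiposity adjustments and from both traits can be pooled into one input list, so that a shared locus ID directly indicates a signal common to several models.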
Functionally-informed fine-mapping. Functionally-informed genetic fine-mapping was performed using PolyFun and Sum of Single Effects (SuSiE, v0.11.92) 24,25. PolyFun was used to estimate per-SNP heritability using an L2-regularized extension of stratified LD score regression (S-LDSC) and the baseline-LD model v2.2 containing 187 annotations 17-19,24,51. The estimated per-SNP heritabilities were used as prior causal probabilities in SuSiE with a maximum of 10 causal variants in each region. A subset of 337,000 unrelated white British individuals from the UK Biobank was used for the in-sample LD structure. After excluding the MHC region on chromosome 6, fine-mapping of each locus was performed in a window of 1.5 Mb around the lead genetic variants (P < 5E-8).
Colocalization. Colocalization was performed between independent genetic loci identified by COJO-GCTA and summary statistics of gene expression quantitative trait loci (eQTL) of 49 tissues in GTEx (v8) from the eQTL Catalogue release 4 27,28. The coordinates of GWAS summary statistics were first converted from Build 37 to 38 using the liftOver function of the rtracklayer R package (v.1.54.0) 52. We performed colocalization using COLOC-SuSiE assuming the presence of multiple causal variants (coloc R package 29,53, v.5.1.0) with default priors, and considered variants in a window of 1.5 Mb around the index variant at each locus.
We considered only genes with at least one significant variant (FDR P < 0.1, eGenes), and performed the colocalization in a window of 1.5 Mb around each eGene. If SuSiE did not converge after 1000 iterations, conventional (single causal variant) coloc was used. Finally, an H4 posterior probability (PP) > 0.8 was considered strong evidence that both traits share the same causal variant.
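To illustrate how the H4 posterior probability arises, the sketch below combines per-variant log approximate Bayes factors under the conventional single-causal-variant coloc model, not COLOC-SuSiE. The function names are hypothetical, and the prior values shown (p1 = p2 = 1E-4, p12 = 1E-5) are assumed here to correspond to the package defaults; this is not the coloc implementation itself.

```python
from math import exp, log, log1p


def _logsumexp(xs):
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + log(sum(exp(x - m) for x in xs))


def _logdiffexp(a, b):
    # log(exp(a) - exp(b)); requires a > b.
    return a + log1p(-exp(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Return [PP.H0, PP.H1, PP.H2, PP.H3, PP.H4] for one region.
    labf1, labf2: per-variant log Bayes factors for the two traits,
    in the same variant order; at least two variants are required."""
    s1 = _logsumexp(labf1)
    s2 = _logsumexp(labf2)
    s12 = _logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                            # H0: no association
        log(p1) + s1,                                   # H1: trait 1 only
        log(p2) + s2,                                   # H2: trait 2 only
        log(p1) + log(p2) + _logdiffexp(s1 + s2, s12),  # H3: distinct variants
        log(p12) + s12,                                 # H4: shared variant
    ]
    denominator = _logsumexp(log_h)
    return [exp(h - denominator) for h in log_h]
```

A locus-gene-tissue combination would then be retained when the last element of the returned list exceeds 0.8.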
Gene mapping and functional enrichment analysis. To map and prioritize potential candidate genes for independent genetic loci, we employed multiple approaches: 1) the SNP2GENE module of FUMA v1.5.4 30 was used for positional mapping of lead variants to genes with a maximum distance of 50 kb, 2) eQTL mapping using FUMA by considering only genes with at least 1 significant eQTL association (FDR < 0.05), 3) 3D chromatin interaction mapping using FUMA by considering only significant interactions (FDR < 1E-6) within 250 and 500 bp upstream and downstream of TSS, respectively, 4) Multi-marker Analysis of GenoMic Annotation (MAGMA, v1.08) 31 analysis implemented in FUMA was carried out to perform genome-wide gene association analysis using 19,535 curated protein coding genes. Only genes with a P-value below the Bonferroni threshold of 0.05/19,535 = 2.56E-6 were kept for gene mapping. Variants within the MHC region were excluded prior to the analysis. 5) Colocalized genes from colocalization analysis with at least one tissue and an H4 posterior probability (PP) > 0.8, 6) Nearest gene(s) to the fine-mapped variants with the maximum PP per each locus, and 7) Gene with the highest overall V2G score at each locus based on Open Targets Genetics 55. Finally, to prioritize the mapped genes, we calculated an unweighted ranking score by summing over the evidence from the abovementioned approaches. By using the set of genes with the maximum ranking score at each locus, we performed functional gene-set enrichment analysis using the Enrichr tool 56 against ARCHS4 tissues 57, Reactome biological pathways 58 and Gene Ontology (GO) Biological Processes. Significant terms with a Benjamini-Hochberg FDR corrected P-value < 0.05 per each database were reported. For visualization, both adjusted P-values and Enrichr combined scores (-log(P)×odds ratio) were used.
Dallas Heart Study. In this study, only 828 European Americans from the Dallas Heart Study (DHS-1) were used. The DHS is a population-based sample study of Dallas County, Texas, USA, where liver triglyceride content was measured by magnetic resonance spectroscopy (MRS). Details of this study can be found elsewhere 13.
Meta-analysis. The association between novel independent loci for PDFF and cT1 and MRS liver fat (Dallas Heart Study and NEO studies) or CAP measurement (MAFALDA and Liver BIBLE) was tested using linear regression adjusted for age, sex, age², age×sex, age²×sex and BMI after a rank-based inverse normal transformation of the response. An inverse-variance-weighted meta-analysis was then performed with a fixed-effect model using the meta R package (v6.5.0). For genetic variants not available in the replication cohorts, a proxy variant was used instead: variants in LD (r² > 0.4) with the lead variant in the UK Biobank within a window of 1.5 Mb. If no such variant was found in the UK Biobank, the LDproxy tool with Europeans from the 1000 Genomes Project was used instead 26. In case of multiple proxy variants, the one with the highest LD and functional consequence was selected.
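The inverse-variance-weighted fixed-effect pooling can be sketched in a few lines. This is an illustrative Python version of the standard estimator, not the meta R package used in the analysis; the function name and the normal approximation for the two-sided P-value are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_meta(betas, standard_errors):
    """Pool per-cohort effect estimates with inverse-variance weights.
    Returns (pooled_beta, pooled_se, z_score, two_sided_p)."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se
    # Two-sided P-value from the standard normal distribution.
    two_sided_p = 2.0 * (1.0 - NormalDist().cdf(abs(z_score)))
    return pooled_beta, pooled_se, z_score, two_sided_p
```

For example, two hypothetical cohorts with effects 0.10 (SE 0.05) and 0.20 (SE 0.10) receive weights of 400 and 100, giving a pooled effect of 0.12 that sits closer to the more precise estimate.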
RNA-seq analysis. Total RNA for 264 paired liver and visceral adipose tissue samples was isolated using the miRNeasy Advanced Mini kit (Qiagen, Hulsterweg, Germany). Library preparation and RNA sequencing were performed in paired-end 150 bp mode using the Illumina NovaSeq PE150 (Novogene, China). Reads were aligned to the GRCh38 reference genome by STAR 63 (v2.7.10a) after quality check (FastQC software, Babraham Bioinformatics, Cambridge, UK) and trimming of low-quality reads and potential contaminating adapters by Trimmomatic 64 (v0.39). Gene-level read counts were quantified by the RSEM 65 (v1.3.3) tool against Ensembl (release 107). Samples with insufficient mapping specificity (ratio of uniquely mapped to total mapped reads < 0.7) were excluded before the analysis. Finally, a paired differential expression analysis of 261 VAT and 244 liver samples was carried out using DESeq2 66 (v.1.38.3), while adjusting for RNA Integrity Number (RIN), individual ID, and 5 surrogate variables detected by surrogate variable analysis 67.
Follow-up analysis. The longitudinal association of PRS with the occurrence of the outcomes was tested through Cox proportional hazards regression and expressed as hazard ratios with 95% confidence intervals. The median follow-up was 14.5 years, and individuals with any of the diagnoses at baseline were excluded prior to the analysis (Supplementary Table 16). The proportional hazards assumption was checked using Schoenfeld residuals, and no violations were detected. Prospective associations were performed in R v4.0.2 (R Foundation for Statistical Computing, Vienna, Austria).
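For orientation, the reported quantities relate to the fitted model as follows. The notation is illustrative: z stands for whatever adjustment covariates were included, which are not listed in this section, and the 1.96 multiplier assumes a Wald-type interval.

```latex
h(t \mid \mathrm{PRS}, z) = h_0(t)\,\exp\bigl(\beta\,\mathrm{PRS} + \gamma^{\top} z\bigr),
\qquad
\mathrm{HR} = e^{\hat{\beta}},
\qquad
95\%\ \mathrm{CI} = \exp\bigl(\hat{\beta} \pm 1.96\,\mathrm{SE}(\hat{\beta})\bigr)
```

The proportional hazards assumption corresponds to β being constant over follow-up time, which is what the Schoenfeld residual check evaluates.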

Figures
Figure 3 (A) A consistent reciprocal trait association between novel loci and liver triglycerides (PDFF, yellow dot) and inflammation (purple dot, cT1) and (B) novel genetic loci were associated with liver disease and metabolic traits. (A) Association was calculated by a whole-genome regression analysis and the colours of the squares represent: no adjustment (red square), adjusted for BMI (black square), for whole-body fat mass (WFM, green square) and visceral adipose tissue (VAT, orange square). The edge colours denote the direction of the association with the effect (risk) allele, and their thickness corresponds to the strength of the association (-log10 P-value). (B) Heatmap of the Z-score of associations for the effect (risk) allele between novel genetic loci and liver or metabolic-related traits (columns) in n=397,780 UKBB participants after excluding individuals with available PDFF or liver iron corrected T1 (n=36,748). Upper and lower boxes correspond to liver iron corrected T1 and PDFF genetic loci, respectively. Full summary statistics have been reported in Supplementary Table 10. P-values were not corrected for multiple hypothesis testing. VAT: visceral adipose tissue; WFM: whole-body fat mass (kg/m²); cT1: liver iron corrected T1; PDFF: proton density fat fraction; CLD: chronic liver disease.
The association between 6 novel loci and hepatic triglyceride content was replicated in independent cohorts. Meta-analysis of the associations between independent novel genetic loci and hepatic triglyceride content in four independent European cohorts. Proxy variants were used for variants not available in the replication cohorts (r² > 0.4 within a window of 1.5 Mb around each lead variant in the UK Biobank) as marked with an asterisk. Pooled effect estimates were calculated using inverse-variance-weighted fixed-effect meta-analysis. Genomic loci in bold are those with a P-value < 0.05 in the fixed-effect model. Full summary statistics have been reported in Supplementary Table 12. P-values are two-sided and not adjusted for multiple testing.
Association between 26 previously known and 6 novel replicated genetic loci and circulating triglycerides in the UK Biobank. The heatmap shows the Z-score of associations for the effect (risk) allele in Europeans (n=397,780) after excluding individuals with available PDFF or liver iron corrected T1 (n=36,748). Upper and lower boxes correspond to liver iron corrected T1 and PDFF genetic loci, respectively. Novel replicated genetic loci have been marked in blue. Full summary statistics have been reported in Supplementary Table 13. P-values were not corrected for multiple hypothesis testing.
Replication cohorts.
NEO. The Netherlands Epidemiology of Obesity Study (NEO) is a population-based cohort study in men and women aged 45 to 65 years, with oversampling of individuals having a BMI over 27 kg/m² from Leiden and surrounding areas in the Netherlands. At baseline, 6,671 participants were included and around 35% of the NEO participants were randomly selected to undergo hepatic triglyceride content (HTGC) measurements by MRS. Genotyping was performed using the Illumina HumanCoreExome-24 BeadChip and imputed to the TOPMed reference genome panel 59. In the present work, a total of 1,822 individuals of European ancestry with an available HTGC were used.
Liver-BIBLE. The Liver-BIBLE-2022 cohort comprises 1,144 healthy middle-aged individuals (40-65 years) with metabolic dysfunction (at least three criteria for metabolic syndrome among BMI ≥ 35 kg/m², arterial hypertension ≥ 135/80 mmHg or therapy, fasting glucose ≥ 100 mg/dl or diabetes, low HDL < 45/55 mg/dl in M/F and high triglycerides ≥ 150 mg/dl) who presented for blood donation from June 2019 to February 2021 at the Transfusion Medicine unit of Fondazione IRCCS Ca' Granda Hospital (Milan, Italy) 60. Hepatic fat content was estimated non-invasively by controlled attenuation parameter (CAP) with the FibroScan® device (Echosens, Paris, France). Genotyping was performed using the Illumina GlobalScreeningArray (GSA)-24 v3.0 plus Multidisease Array (Illumina, San Diego, CA), and further imputed to the TOPMed reference genome panel 61. At the time of analysis, genomic data passing quality control with an available CAP measure were available for 1,081 patients of European ancestry.
MAFALDA. The "Molecular Architecture of FAtty Liver Disease in patients with obesity undergoing bAriatric surgery (MAFALDA)" study started in May 2020 and ended in April 2022. It comprises a total of 468 consecutive participants with morbid obesity (BMI ≥ 35 kg/m²) who underwent bariatric surgery at Campus Bio-Medico University of Rome, Italy, in whom SLD diagnosis was assessed by liver histology or vibration-controlled transient elastography including CAP measurement with FibroScan® (Echosens, Paris, France) 62. Liver fat content estimated by CAP was available for 172 individuals. Genotyping was performed in the same manner as that of the Liver-BIBLE cohort.

Figure 7 mRNA expression of loci from the liver-specific (discordant) polygenic risk score is more abundant in the liver compared to the visceral adipose tissue (VAT). Differential expression analysis of paired liver and VAT bulk RNA-Seq data for mapped gene sets of concordant and discordant PRS. The lower right bar plot shows the fraction of upregulated differentially expressed (DE) genes in the liver compared to VAT. The enrichment of PRS clusters with upregulated DE genes in the liver was calculated using a one-sided Fisher's exact test. VAT: visceral adipose tissue; FC: fold change; FDR: false discovery rate.

Table 1
Genome-wide significant loci for multi-adiposity-adjusted PDFF and liver iron corrected T1 in the UK Biobank. The association between common genetic variants and PDFF or cT1 under different adiposity adjustments was performed using REGENIE adjusting for adiposity index, age, sex, age×sex, age² and age²×sex, the first 10 genomic principal components and genotyping array batch. Each adiposity adjustment has been shown in parentheses. Genomic loci in bold represent the novel loci identified in the present work. Locus column shows the nearest gene to the lead variant (from COJO analysis) at each locus. MAGMA column shows significant gene-based associations at each locus exceeding the Bonferroni threshold (P < 2.56E-6). P-values were not corrected for multiple testing among 4 different models (unadjusted, adjusted for BMI, WFM and VAT). PDFF: proton density fat fraction, cT1: liver iron corrected T1, WFM: whole-body fat mass (kg/m²), VAT: visceral adipose tissue.