Transaminase concentrations cannot separate NAFL and NASH in morbidly obese patients independent of histological algorithm

Background: Prevalence of non-alcoholic fatty liver disease (NAFLD) in the German population is 20–30%. Liver biopsy to detect non-alcoholic steatohepatitis (NASH), let alone to monitor NAFLD, is not feasible. Current practice regards elevated serum concentrations of liver enzymes as an indicator for NAFLD or NASH. In this study we analyzed whether an adjustment of the upper limit of normal (ULN) for serum liver enzymes can improve their diagnostic accuracy. Methods: Data from 363 morbidly obese patients (42.5 ± 10.3 years old; mean BMI: 52 ± 8.5 kg/m²) who underwent bariatric surgery were retrospectively analyzed. All patients had histologically confirmed NAFL or NASH (NAS and SAF). Results: In 121 women (45%) and 45 men (46%) elevated values for at least one serum parameter (ALT, AST, γGT) were present. The serum concentrations of ALT (p < 0.0001), AST (p < 0.0001) and γGT (p = 0.0023) differed significantly between NAFL and NASH, independent of classification method (NAS, SAF). Concentrations of all three serum parameters correlated significantly positively with the NAS and the SAF score, with correlation coefficients between 0.33 (ALT/NAS) and 0.40 (γGT/SAF). The AUROCs to separate NAFL and NASH by liver enzymes achieved a maximum of 0.70 (ALT applied to NAS-based classification). For 95% specificity the ULN for ALT would be 47.5 U/l; for 95% sensitivity, the ULN for ALT would be 17.5 U/l, resulting in 62% uncategorized patients. Conclusion: ALT, AST and γGT are unsuitable for non-invasive screening or diagnosis of NAFL or NASH. Utilizing liver enzymes as an indicator for NAFLD/NASH should generally be questioned.


Introduction
Nonalcoholic fatty liver disease (NAFLD) is a highly prevalent disease occurring in 20-25% of the global population and is currently the main chronic liver disease in the Western world 1. Since obesity and insulin resistance (IR) are crucial elements in the pathophysiology of NAFLD, a new term has recently been proposed for this disease: metabolic-associated fatty liver disease (MAFLD) 2,3. This is important because the textbook definition of NAFLD strictly excludes relevant alcohol consumption and other liver-related diseases. In reality, humans are diverse in habits and diet, and cases of NAFLD/MAFLD co-existing with alcohol consumption or, e.g., hepatitis C infection are not uncommon.
Due to the increasing prevalence of obesity in the last five decades, detection and characterization of NAFLD has become more and more important for two main reasons. Firstly, NAFLD is the hepatic manifestation of the metabolic syndrome and can therefore be part of the risk stratification for cardiovascular diseases and other conditions associated with the metabolic syndrome 4,5. Secondly, nonalcoholic steatohepatitis (NASH), the progressive form of NAFLD with inflammation and ballooning of hepatocytes in the liver, is associated with a substantial risk of morbidity and mortality 6-8. It is debated whether NAFL (liver steatosis without relevant inflammation) and NASH carry a similar risk for cardiovascular and other metabolic comorbidities and whether both progress to significant fibrosis 9-12. It is clear, however, that NASH is associated with a greater risk of cirrhosis and of developing hepatocellular carcinoma (HCC), with or without cirrhosis 13,14. The risk of liver-related mortality is also greater in NASH.
A common method for the detection of NAFLD is ultrasound. However, while NAFLD is histologically defined by the presence of steatosis in at least 5% of hepatocytes, a liver lipid content of usually 20% (depending on operator experience and the examination conditions) is necessary for the detection of liver steatosis by ultrasound 15. Elevated serum liver enzymes (ALT, AST, γGT) are the most common finding prompting further examinations. However, up to 75% of NAFLD patients do not show elevated liver enzymes when current reference values are applied 16. This situation leads to severe underdiagnosis of NAFLD and NASH in real-life practice.
Liver biopsy with histological assessment is still considered the gold standard for the diagnosis of NAFLD and the separation of its subgroups non-alcoholic fatty liver (NAFL) and NASH 15. The diagnosis is based on a semi-quantitative combination of the histological features steatosis, ballooning, and lobular inflammation. Two scoring systems have been established to date: the NAFLD activity score (NAS) and the steatosis, activity, fibrosis score (SAF score). Whereas the NAS is the unweighted sum of the semi-quantitative grades for steatosis, ballooning, and lobular inflammation (with the likelihood of NASH increasing with the number of points), the SAF score requires each one of these criteria to be met for the diagnosis of NASH 15,17. This gold standard is limited in applicability: there is a small risk of complications (about 1%), sampling variability and intra-observer differences can be substantial, since one biopsy represents only about 1/50000 of the liver tissue 18,19, and an effective screening of 20% of the population cannot be performed via liver biopsy.
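The difference between the two scoring principles described above can be illustrated with a minimal Python sketch. It is not the scoring software used in this study: the class `Histology` and the function names are hypothetical, a single set of grades is used for both systems as a simplification, and the NAS cut-off of 5 is the simplified cut-off applied in the present study (NAS 1-4 as NAFL, NAS ≥ 5 as NASH).

```python
from dataclasses import dataclass


@dataclass
class Histology:
    """Semi-quantitative grades of one biopsy (hypothetical container)."""
    steatosis: int      # graded 0-3
    ballooning: int     # graded 0-2
    inflammation: int   # lobular inflammation, one grading used for both systems (simplification)


def nas(h: Histology) -> int:
    """NAS: unweighted sum of the three grades."""
    return h.steatosis + h.ballooning + h.inflammation


def classify_by_nas(h: Histology, nash_cutoff: int = 5) -> str:
    """Simplified two-group rule of this study: NAS >= 5 is NASH, otherwise NAFL."""
    return "NASH" if nas(h) >= nash_cutoff else "NAFL"


def classify_by_saf(h: Histology) -> str:
    """SAF-type rule: NASH only if every one of the three features is present."""
    all_present = min(h.steatosis, h.ballooning, h.inflammation) >= 1
    return "NASH" if all_present else "NAFL"


# Severe steatosis and inflammation without ballooning: high sum, but one criterion missing.
no_ballooning = Histology(steatosis=3, ballooning=0, inflammation=2)
# Mild expression of all three features: low sum, but every criterion met.
mild_all = Histology(steatosis=1, ballooning=1, inflammation=1)

for case in (no_ballooning, mild_all):
    print(nas(case), classify_by_nas(case), classify_by_saf(case))
```

The two hypothetical cases show how a summation rule and an "all criteria met" rule can classify the same biopsy differently in either direction.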
Due to the increasing global prevalence of NAFLD and the limitations of the diagnostic methods described above, non-invasive screening methods are urgently needed. The aim of the present study was to test whether serum liver enzyme concentrations can be applied to detect NAFLD and/or separate NAFL from NASH, independent of current reference values.

Differences of demographic data between NAFL and NASH depend on assessment method
In total, data of 363 morbidly obese patients undergoing bariatric surgery were analyzed. The whole cohort comprised 270 women (74%) and 93 men (26%) with a mean age of 42.5 ± 10.3 years and a mean BMI of 52 ± 8.5 kg/m². The mean ALT serum concentration before surgery was 37.2 ± 26.4 U/l and the mean AST serum concentration was 31.3 ± 19.0 U/l. The median NAS was 4 (1-8) and the median SAF score 6 (1-12).
Applying the SAF score, 106 patients (29%) were categorized as NAFL (78% female, 22% male) and 257 patients (71%) were categorized as NASH (73% female, 27% male). Based on the SAF classification, the patients with NAFL were significantly younger than the patients with NASH (41.0 ± 11.1 vs. 43.2 ± 9.9 years; p = 0.048), but there was no significant difference in BMI (51.2 ± 8.8 kg/m² vs. 52.3 ± 8.4 kg/m²) or sex. The two classification systems disagreed in 20% of all patients, with 64 patients categorized as NASH (by SAF) instead of NAFL and 10 patients categorized as NAFL (by SAF) instead of NASH. Accordingly, the distribution into NAFL and NASH differed significantly between NAS and SAF (p < 0.0001, Fisher's exact test).
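The counts reported above are sufficient to reconstruct the cross-tabulation of the two classifications. The following short Python sketch does this arithmetic; it assumes that "NASH (by SAF) instead of NAFL" means NAFL by NAS and vice versa, and the variable names are illustrative only.

```python
# Counts taken from the text above.
total = 363
saf_nafl, saf_nash = 106, 257
nash_by_saf_only = 64   # NASH by SAF, NAFL by NAS (assumed reading)
nafl_by_saf_only = 10   # NAFL by SAF, NASH by NAS (assumed reading)

# Concordant cells follow by subtraction from the SAF margins.
both_nafl = saf_nafl - nafl_by_saf_only   # NAFL in both systems
both_nash = saf_nash - nash_by_saf_only   # NASH in both systems

# NAS margins follow from the reconstructed cells.
nas_nafl = both_nafl + nash_by_saf_only
nas_nash = both_nash + nafl_by_saf_only

discordant = nash_by_saf_only + nafl_by_saf_only
disagreement_pct = round(100 * discordant / total, 1)

print(f"NAFL in both: {both_nafl}, NASH in both: {both_nash}")
print(f"NAS-based groups: NAFL {nas_nafl}, NASH {nas_nash}")
print(f"Discordant: {discordant} of {total} ({disagreement_pct}%)")
```

Under this reading, 74 of 363 patients are classified differently by the two systems, which matches the reported disagreement of 20%.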

Serum concentrations of liver enzymes differ significantly between NAFL and NASH
The majority of patients did not show an elevation of liver enzyme serum concentrations above the current upper limit of normal (AST/ALT: 35 U/l, γGT: 40 U/l for women; AST/ALT: 50 U/l, γGT: 60 U/l for men). In 121 women (45%) and 45 men (46%) elevated concentrations were found for at least one of the factors ALT, AST or γGT. Serum concentrations of ALT (p < 0.0001), AST (p < 0.0001) and γGT (p = 0.0023) differed significantly between NAFL and NASH, regardless of classification by NAS (Fig. 1) or SAF score (Fig. 2).

Liver enzyme serum concentrations are correlated with NAS and SAF score
As all liver enzyme concentrations differed significantly between NAFL and NASH, we investigated whether there was a direct correlation between histological scores and serum concentrations. Calculation of Spearman rank correlation coefficients resulted in significantly positive correlations (all p < 0.0001) of all three serum parameters (ALT, AST, γGT) with the NAS as well as the SAF score. Correlation coefficients ranged between 0.33 (ALT/NAS) and 0.40 (γGT/SAF), with a detailed overview given in Table 2. Overall, correlations of the serum concentrations with the NAS were slightly, but not significantly, weaker than those with the SAF score. The highest correlation coefficients were reached by γGT, with both the NAS and the SAF score.

[…] specificity or 95% sensitivity were calculated (Table 3). For example, to achieve 95% sensitivity, the threshold for ALT would have to be 17.5 U/l, and a threshold of 47.5 U/l would result in 95% specificity. The diagnostic gap between these thresholds leaves 62% of patients uncategorized. With the same two-threshold approach (each at 95% specificity/sensitivity), 91% of the patients would remain uncategorized for AST and 77% of the patients for γGT.
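The two-threshold approach can be sketched in a few lines of Python. This is an illustration of the principle only, not the analysis code of this study: the function names are hypothetical, the ALT values are synthetic and were made up for the example, and the rule used here (rule out NASH below the lower cut-off, rule in NASH above the upper cut-off) is an assumption about how such thresholds are typically applied.

```python
def two_thresholds(nafl, nash, pct=95):
    """Return (lower, upper) cut-offs.

    lower: at least pct% of NASH values are >= lower (sensitivity target).
    upper: at least pct% of NAFL values are <= upper (specificity target).
    """
    nafl_sorted, nash_sorted = sorted(nafl), sorted(nash)
    # Integer arithmetic avoids floating-point rounding of the allowed error counts.
    allowed_missed_nash = (100 - pct) * len(nash_sorted) // 100
    allowed_false_pos = (100 - pct) * len(nafl_sorted) // 100
    lower = nash_sorted[allowed_missed_nash]
    upper = nafl_sorted[len(nafl_sorted) - allowed_false_pos - 1]
    return lower, upper


def uncategorized_fraction(values, lower, upper):
    """Fraction of values in the diagnostic gap between the two cut-offs."""
    in_gap = [v for v in values if lower <= v <= upper]
    return len(in_gap) / len(values)


# Synthetic ALT values in U/l (NOT patient data from this study).
NAFL_ALT = [12, 14, 15, 17, 18, 19, 20, 21, 22, 23,
            24, 25, 26, 28, 30, 32, 35, 38, 42, 55]
NASH_ALT = [16, 19, 22, 24, 26, 28, 30, 32, 34, 36,
            38, 40, 43, 46, 50, 55, 60, 68, 80, 95]

lower, upper = two_thresholds(NAFL_ALT, NASH_ALT)
gap = uncategorized_fraction(NAFL_ALT + NASH_ALT, lower, upper)
print(f"rule-out below {lower} U/l, rule-in above {upper} U/l, "
      f"uncategorized: {gap:.1%}")
```

When the two distributions overlap broadly, as in these made-up values, most patients fall between the two cut-offs, which is the pattern described above for all three enzymes.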

Discussion
With a global prevalence of NAFLD of 20-30% and a similar estimated prevalence in the German population 1, it has become more and more important to enhance diagnostic efficiency for this disease. Liver biopsy is still considered the gold standard for confirmation of NAFLD and for risk assessment of liver cirrhosis and HCC, and it is the only method to detect and diagnose NASH 15. Given the limitations of this method and the very low proportion of patients who actually undergo liver biopsy, it is no surprise that NAFLD and in particular asymptomatic progressive liver disease (i.e. NASH with compensated cirrhosis) remain severely underdiagnosed 7. As elevations of serum liver enzyme concentrations are still the most common finding resulting in further clinical workup and referral to hepatologists 20 […] 27-30. Based on these and further studies, current AASLD guidelines recommend a hepatitis B screening at ALT serum concentrations above 35 U/l for men and 25 U/l for women 31. However, to achieve a 95% sensitivity for the detection of NASH in our cohort, the cut-off would have to be even lower (AST: 15.5 U/l; γGT: 19.5 U/l). Lowering the upper limit of normal to the point where adequate sensitivity is reached would lead to a very high rate of false-positive results for NAFLD/NASH when used for screening in the general population. In our opinion, these findings of other groups and our own rule out any applicability of serum liver enzyme values for diagnostic or screening purposes in NAFLD.
Another result of this study is that histological evaluation by NAS and by SAF differed in a substantial proportion (20%) of the study population. NAS and SAF were designed for different purposes, and the NAS classification is usually broken down into three groups, with 1-2 points as definitely no NASH, 3-4 as borderline NASH (requiring further examination or follow-up) and ≥ 5 as definite NASH. To reduce ambiguity and allow more robust statistical evaluation, we applied a simplified classification, interpreting a NAS of up to 4 as NAFL. This might be one reason why 64 more patients were classified as NASH by SAF than by NAS. However, 10 patients were classified as NAFL by SAF and as NASH by NAS. A study with a more diverse patient population, different outcome scenarios, and rigorous histological work-up would be required to investigate whether one scoring system is superior in identifying patients at higher health risk. With our current understanding 32-34, the observed differences between NAS and SAF classification might be a matter of interpretation in borderline NASH cases.
A direct comparison of serum liver enzyme concentrations yielded a significant difference between NAFL and NASH, independent of classification by NAS or SAF. The effect size was small and the ranges of values overlapped broadly. Furthermore, all three serum parameters correlated significantly positively with the NAS and the SAF score, with slightly stronger correlations with the SAF score. Overall, the rank correlation coefficients indicate a modest correlation between the histologically assessed liver injury and the serum concentrations of the enzymes. However, due to this positive correlation and the small but significant difference between NAFL and NASH, serum liver enzymes might support surveillance. Once the diagnosis of NAFLD is established (i.e. histologically or by ultrasound), an increase of serum liver enzymes, even within the current normal range, would indicate disease progression. This, however, has to be tested in an appropriately designed study. Our finding that the serum parameters exhibited slightly stronger correlations with the SAF score than with the NAS can only be attributed to the addition of the fibrosis grade to the SAF score. This indicates that serum liver enzymes reflect not only injury of the hepatic parenchyma but also fibrogenesis. It has been shown multiple times that fibrogenesis and progression to advanced fibrosis occur in a similar proportion of NAFL and NASH patients 10,35-37, albeit possibly at a different pace. The hypothesis that serum liver parameters reflect liver injury and fibrogenesis in parallel would explain the limited clinical benefit for screening or diagnostic purposes in NAFLD, as progression of fibrosis is rather rare in this disease.
In summary, our data reinforce findings of the past 15 years that elevation of the serum liver enzymes ALT, AST and γGT is not a reliable sign of NASH or progressive NAFLD. Conversely, serum concentrations of these factors in the normal range do not exclude NASH. An adaptation of normal ranges would probably dramatically increase false-positive results without enhancing clinical risk assessment. Serum liver enzymes might still be of use in disease surveillance once NAFLD or NASH has been established by other diagnostic measures. It is time to move on from serum liver enzyme elevations in NAFLD detection or risk assessment and to focus studies on other markers, in particular those related to adipose tissue dysfunction, glucose metabolism, and IR.

Patients
Morbidly obese patients undergoing bariatric surgery were recruited at the Alfried-Krupp-Krankenhaus Essen, Essen, Germany. Dietary and exercise counselling was offered to patients for 6 months prior to surgery; no calorie restriction was imposed. A blood sample was collected for assessment of serum-derived factors on the day of surgery (prior to surgery) and liver tissue was sampled during bariatric surgery.
In liver tissue, steatosis, ballooning, lobular inflammation, and fibrosis were assessed by two independent pathologists. NAFLD and its severity were diagnosed histologically according to Kleiner et al. 40. The SAF score was calculated according to Bedossa et al. 17. All data shown were recorded on the day of surgery.
Overall, 694 patients were recruited from 2004 until 2017. For this data analysis, patients were eligible when measurements of alanine and aspartate aminotransferase serum concentrations (ALT, AST) and liver histology were available, with a NAS of at least 1. To limit the number of groups for a robust analysis, a NAS of 1-4 was categorized as NAFL and a NAS ≥ 5 was categorized as NASH for the present study. 363 patients were eligible for the retrospective study; detailed demographic and clinical information of this cohort is given in Table 1.

Figure 1
Serum liver enzyme concentrations differ between NAFL and NASH according to classification by NAS.

Figure 2
Serum liver enzyme concentrations differ between NAFL and NASH according to classification by SAF. Morbidly obese NAFLD patients were grouped as NAFL or NASH according to the SAF score (Bedossa et al. 17). The serum concentrations of all three liver enzymes, alanine aminotransferase (ALT; A), aspartate aminotransferase (AST; B) and gamma-glutamyltransferase (γGT; C), differed significantly between NAFL and NASH. Statistical significance was tested by Mann-Whitney U test (not normally distributed data) and significance was assumed at p < 0.05.

ROC curves for performance of serum liver enzyme concentrations to separate NAFL and NASH defined by SAF. For each serum parameter, alanine aminotransferase (ALT; A), aspartate aminotransferase (AST; B) and gamma-glutamyltransferase (γGT; C), ROC curves were calculated to separate NAFL from NASH based on SAF classification (Bedossa et al. 17). The areas under the curve (AUC) were calculated for all curves, resulting in poor to moderate discriminatory performance of 0.67 (ALT) to 0.69 (ALT).