External validation and comparison of simple tools to screen for nonalcoholic fatty liver disease in Chinese community population

Background Various noninvasive tools based on anthropometric indicators, blood lipids, liver enzymes, and other simple measures have been developed to screen for nonalcoholic fatty liver disease (NAFLD), with diagnostic performance and cutoff values that differ among studies. We aimed to validate and compare eight NAFLD-related models built from simple indicators and to define their cutoff values in a Chinese community population. Methods A cross-sectional study was conducted in a health examination cohort of 3259 people. NAFLD was diagnosed by ultrasonography. General, anthropometric, and biochemical data were collected. Fatty liver index (FLI), fatty liver disease index (FLD), Zhejiang University index (ZJU), lipid accumulation product (LAP), regression formula of controlled attenuation parameter (CAP), waist-to-height ratio (WHtR), triglyceride and glucose index (TyG), and visceral adiposity index (VAI) were calculated. The accuracy and cutoff points for detecting NAFLD were evaluated by the area under the receiver operating characteristic curve (AUC) and the maximum Youden index, respectively. Head-to-head comparisons between these models and decision curve analysis (DCA) were conducted. Results Among the eight noninvasive models, the AUCs of FLI and FLD for NAFLD were higher than those of the other models in the whole (0.852 and 0.852), male (0.826 and 0.824), and female (0.897 and 0.888) populations, respectively. DCA showed that FLI, FLD, and ZJU had higher net benefit in screening for NAFLD than the other models. Conclusions FLI and FLD could be the most accurate and applicable of the eight models for the noninvasive diagnosis of NAFLD in both male and female groups.


Introduction
Nonalcoholic fatty liver disease (NAFLD) is becoming increasingly prevalent, affecting more than one-quarter of adults in the world, with prevalence of 20.09% in China [1]. The spectrum of NAFLD covers from simple steatosis or nonalcoholic fatty liver (NAFL) to nonalcoholic of FLI were <30 and >60 for total population to rule out and in NAFLD in the primary study [10], and it was <10 for female and <25 for male to rule out NAFLD, ≥20 for female and ≥35 for male to rule in NAFLD in a validation study in Taiwan [17]. Although some of the models have been validated independently, their diagnostic performances are difficult to compare, as they have been designed and validated against different standards (liver biopsy, ultrasonography, or magnetic resonance spectroscopy) [8,18].
This study aimed to validate and compare the current noninvasive models developed by simple biomarkers, so as to find suitable screening methods and determine the cutoff values for NAFLD in the eastern Chinese community population.

Study population
We conducted a cross-sectional study. Participants were from a community in Nanjing, eastern China, and received health check-up services from July to September 2018. All participants signed informed consent. The study design was approved by the Institutional Ethics Review Committee of Nanjing Medical University and was in accordance with the ethical standards of the Declaration of Helsinki.
Subjects were excluded if they: (1) had incomplete clinical data; (2) had significant alcohol consumption (men >140 g or women >70 g per week in the past 12 months); (3) had viral or autoimmune hepatitis, drug-induced liver damage, or other liver diseases; (4) had undergone gastrointestinal surgery; or (5) had a history of mental disorders.

Measurement
All participants received a physical examination, laboratory examination, and upper abdominal ultrasonography. Weight, height, and waist circumference (WC) were measured with participants wearing minimal clothing and no socks, by the same nurses using the same equipment. BMI was calculated as: BMI = weight (kg)/height (m)². Whole blood samples were collected after a 10-h overnight fast, and serum samples were separated for immediate analysis. Fasting plasma glucose (FPG), triglycerides (TG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), alanine aminotransferase (ALT), aspartate aminotransferase (AST), and γ-glutamyl transferase (GGT) were measured with a biochemical analyzer (Mindray BS-860, Shenzhen Mindray Bio-Medical Electronics Co., Ltd, Shenzhen, China). Abdominal ultrasonography was performed with a LOGIQ-E9 ultrasound system (General Electric Healthcare, Milwaukee, Wisconsin, USA).

Diagnosis and classification of nonalcoholic fatty liver disease
The diagnosis of NAFLD was based on ultrasonography after excluding alcohol abuse and other liver diseases, in accordance with the Guidelines for Diagnosis and Treatment of NAFLD issued by the Fatty Liver and Alcoholic Liver Disease Study Group of the Chinese Liver Disease Association in 2010 and the updated version in 2018 [19,20]. Two experienced ultrasound experts performed the examination, and NAFLD was diagnosed by the presence of the following findings: (1) The near-field echo of the liver is diffusely increased and stronger than that of the kidney and the spleen, and the far-field echo of the liver is gradually attenuated. (2) The intrahepatic duct structure is not clear. (3) The liver is mildly to moderately enlarged, with round and blunt edges. (4) Color Doppler flow imaging shows that the blood flow signal in the liver is reduced or difficult to display, but the intrahepatic blood vessels are normal. (5) The echo of the capsule of the right liver lobe and the septum is unclear or incomplete. Participants with item 1 and one of items 2-4 were classified as having mild fatty liver; those with item 1 and two of items 2-4 as having moderate fatty liver; and those with item 1, two of items 2-4, and item 5 as having severe fatty liver.
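The grading rule above can be sketched as a small function. This is only an illustration: the function name and the "unclassified" label are hypothetical, and two readings are assumptions rather than statements from the guideline text, namely that "two of items 2-4" means "at least two", and that item 5 only raises the grade when that condition is already met.

```python
def grade_fatty_liver(item1, item2, item3, item4, item5):
    """Grade ultrasound findings (booleans for items 1-5) as described in the text.

    Assumptions (not stated in the source): "two of items 2-4" is read as
    "at least two", and combinations the text does not cover are reported
    as "unclassified" instead of being forced into a grade.
    """
    if not item1:
        return "none"  # item 1 is required for every grade
    secondary = sum([item2, item3, item4])  # number of items 2-4 present
    if secondary >= 2 and item5:
        return "severe"
    if secondary >= 2:
        return "moderate"
    if secondary == 1:
        return "mild"
    return "unclassified"  # item 1 alone is not covered by the rule


if __name__ == "__main__":
    print(grade_fatty_liver(True, True, False, False, False))
    print(grade_fatty_liver(True, True, True, False, True))
```

Returning "unclassified" for item 1 alone keeps the sketch from silently assigning a grade the text does not define.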

Calculation of predictive models
Eight models were validated and compared in this study. The formulas of the models were verified against the original publications (Table 1). TG in FLI and TyG, and FPG in TyG, are in mg/dL. TG in FLD, LAP, and VAI; FPG in ZJU; and HDL-C in VAI are in mmol/L. WC and height are in centimeters. BMI is in kg/m², and AST, ALT, and GGT are in IU/L.
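Because the units differ between models, a short sketch may help show how four of the indices are computed. The coefficients are the commonly published forms of FLI, TyG, WHtR, and LAP, not a copy of Table 1, so they should be checked against the original publications before use. The function names and the conversion constants (88.57 for triglycerides, 18.0 for glucose) are assumptions introduced here.

```python
import math

# Approximate conversion factors (assumed here, not taken from the study)
TG_MGDL_PER_MMOLL = 88.57   # triglycerides: mmol/L -> mg/dL
GLU_MGDL_PER_MMOLL = 18.0   # glucose: mmol/L -> mg/dL


def fli(tg_mgdl, bmi, ggt_iu_l, wc_cm):
    """Fatty liver index (0-100); TG in mg/dL, GGT in IU/L, WC in cm."""
    lin = (0.953 * math.log(tg_mgdl) + 0.139 * bmi
           + 0.718 * math.log(ggt_iu_l) + 0.053 * wc_cm - 15.745)
    return 100.0 / (1.0 + math.exp(-lin))  # logistic transform scaled to 0-100


def tyg(tg_mgdl, fpg_mgdl):
    """Triglyceride and glucose index; both inputs in mg/dL."""
    return math.log(tg_mgdl * fpg_mgdl / 2.0)


def whtr(wc_cm, height_cm):
    """Waist-to-height ratio; both inputs in the same unit."""
    return wc_cm / height_cm


def lap(wc_cm, tg_mmoll, male):
    """Lipid accumulation product; WC in cm, TG in mmol/L."""
    return (wc_cm - (65.0 if male else 58.0)) * tg_mmoll


if __name__ == "__main__":
    tg_mmoll = 1.7  # hypothetical participant
    tg_mgdl = tg_mmoll * TG_MGDL_PER_MMOLL
    print(fli(tg_mgdl, bmi=25.0, ggt_iu_l=30.0, wc_cm=90.0))
    print(tyg(tg_mgdl, fpg_mgdl=5.0 * GLU_MGDL_PER_MMOLL))
    print(lap(90.0, tg_mmoll, male=True))
```

Keeping the unit in each parameter name makes it harder to pass a mmol/L triglyceride value into a formula that expects mg/dL.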

Statistical analysis
Continuous variables are presented as mean ± SD or median (interquartile range), and categorical variables as numbers and percentages. The Student's t-test or Mann-Whitney U test for continuous variables, and the chi-squared test or Kruskal-Wallis test for categorical variables, were used to compare parameters between the NAFLD and non-NAFLD groups. Receiver operating characteristic (ROC) curves were constructed to assess the ability of each model to predict the presence of NAFLD. The areas under the ROC curves (AUCs) were compared pairwise between models using the method of DeLong et al. [21]. Decision curve analysis (DCA) was also conducted to compare these models. The Youden index and the discriminant ability at each cutoff value were used to determine the optimal cutoff value of each NAFLD-related model for diagnosing NAFLD in the total, male, and female populations. Descriptive analyses, correlation analyses, and ROC analyses were performed using SPSS (version 23.0, IBM Corp., Chicago, Illinois, USA). The ROC comparison by the method of DeLong et al. was performed with MedCalc (version 11.4.2.0, MedCalc Inc., Mariakerke, Belgium), and DCA was performed with R software (version 3.6.2, https://www.r-project.org/). A two-sided P < 0.05 was considered statistically significant.
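For readers unfamiliar with the two summary measures, the following minimal sketch shows what an AUC and a DCA net benefit represent. It is not the SPSS, MedCalc, or R workflow used in the study, and it does not implement the DeLong test. The function names are hypothetical, and the net-benefit function assumes each index has already been converted to a predicted probability (for example by logistic regression), which the text does not describe.

```python
def auc_pairwise(scores_pos, scores_neg):
    """AUC as the probability that a case scores higher than a non-case.

    Ties count as 0.5. This is the Mann-Whitney formulation of the AUC.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def net_benefit(y_true, risk, threshold):
    """Net benefit at one threshold probability (0 < threshold < 1).

    y_true holds 1 for NAFLD and 0 otherwise; risk holds predicted
    probabilities (an assumption: raw index values must be converted first).
    """
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    print(auc_pairwise([3, 4], [1, 2]))
    print(net_benefit([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.1], 0.5))
```

A decision curve plots the net benefit over a range of threshold probabilities for each model, alongside the "screen all" and "screen none" strategies.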

Characteristics of the study population
After excluding 391 participants according to our exclusion criteria [significant alcohol consumption (n = 257), positive hepatitis C virus (n = 19), use of medication associated with fatty liver (n = 103), and mental disorders (n = 12)], a total of 3259 subjects were included in the analysis. Table 2 shows the overall characteristics of the population together with the values of the eight models (FLI, FLD, ZJU, TyG, CAP, LAP, WHtR, and VAI). Seventy-three percent of the study population was male, and the mean age of participants was 40.59 ± 9.30 years. NAFLD was present in 1169 of 3259 participants (35.9%). Of these, mild fatty liver accounted for 69.8% (n = 816), moderate for 24.0% (n = 280), and severe for 6.2% (n = 73). The NAFLD group had higher values of the NAFLD-related indexes than the non-NAFLD group.
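The counts reported above are internally consistent, which a few lines of arithmetic confirm. The variable names are illustrative only; every number is taken from the paragraph.

```python
# Counts as reported in the text
excluded = {"alcohol": 257, "hepatitis_c": 19, "medication": 103, "mental": 12}
included = 3259
nafld_by_grade = {"mild": 816, "moderate": 280, "severe": 73}

total_excluded = sum(excluded.values())
nafld_total = sum(nafld_by_grade.values())

# Prevalence and grade shares, rounded to one decimal place as in the text
prevalence_pct = round(100.0 * nafld_total / included, 1)
grade_pct = {g: round(100.0 * n / nafld_total, 1) for g, n in nafld_by_grade.items()}

if __name__ == "__main__":
    print(total_excluded, nafld_total, prevalence_pct, grade_pct)
```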

Cutoff values of models for the detection of nonalcoholic fatty liver disease
The optimal cutoff values were determined by the maximum Youden index. The sensitivity and specificity of the NAFLD-related models at their optimal cutoff values are shown in Table 4. ZJU had the same optimal cutoff point (>33.7) in the total, male, and female populations. FLI >20.6 (sensitivity = 85.3% and specificity = 75.1%) was the optimal cutoff point for NAFLD screening in the total population. The optimal cutoff values differed by sex: >25.3 in the male population (sensitivity = 78.4% and specificity = 72.4%) and >8.4 in the female population (sensitivity = 88.1% and specificity = 78.2%). The diagnostic performance of FLI at 30 and 60 was also analyzed. At these cutoffs, specificity was much higher than sensitivity (in the total population, sensitivity = 66.1% and 24.9%, and specificity = 84.9% and 96.9%, respectively). The optimal cutoff points of FLD were >28.7 (sensitivity = 79.6% and specificity = 75.2%) in the whole population, >29.0 (sensitivity = 80.0% and specificity = 70.0%) in the male population, and >26.1 (sensitivity = 88.2% and specificity = 75.6%) in the female population. In addition, the cutoff values of FLI and FLD for 90% sensitivity and 90% specificity in the different populations were analyzed and are shown in Table S2, Supplemental Digital Content 1, http://links.lww.com/EJGH/A759. In the total population, FLI >13.5 had a sensitivity of 90.0% and FLI >37.6 had a specificity of 90.0%. The cutoff values of FLD for 90% sensitivity and 90% specificity in the total population were >27.2 and >30.9, respectively.
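The maximum Youden index approach can be sketched as follows. The function name is hypothetical, and the sketch assumes a participant is classified as positive when the score is strictly greater than the cutoff, matching the ">" notation used for the cutoffs above. It is a plain illustration, not the SPSS procedure used in the study.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity, J) with the largest J.

    J = sensitivity + specificity - 1. labels holds 1 for NAFLD, 0 otherwise.
    A score is called positive when it is strictly greater than the cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in sorted(set(scores)):  # every observed value is a candidate cutoff
        tp = sum(1 for s, y in zip(scores, labels) if s > c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s <= c and y == 0)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[3]:  # keep the first cutoff on ties
            best = (c, sens, spec, j)
    return best


if __name__ == "__main__":
    print(youden_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))
```

Applied to the reported values for FLI in the total population, the Youden index is 0.853 + 0.751 - 1 = 0.604 at >20.6, compared with 0.510 at 30 and 0.218 at 60, which is why the lower cutoff is preferred under this criterion.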

Discussion
More than a dozen models have been proposed to detect NAFLD, and most of them performed well in their original studies, such as the SteatoTest [22], FLI [10], FLD index [14], hepatic steatosis index [23], Framingham steatosis index [24], homeostasis model assessment of insulin resistance (HOMA-IR) [25], and index of NASH (ION) [26]. However, some indicators in these models are costly and not available in most laboratories in less developed countries, such as α2-macroglobulin and apolipoprotein A-I in the SteatoTest, and the insulin test in HOMA-IR and ION. To identify a simple, accurate, and cost-effective model for large-scale NAFLD screening, we selected NAFLD-related models built from simple indicators. We validated eight NAFLD-related models (FLI, FLD, ZJU, LAP, CAP, WHtR, TyG, and VAI) and compared their performance in an eastern Chinese community population. Our study showed that all of these models have moderate discrimination, and that FLI and FLD perform better than the other models in this population.
FLI, consisting of TG, GGT, BMI, and WC, was developed by Bedogni et al. [10] in northern Italy in 2006. It has been widely validated and has been shown to correlate with insulin resistance, coronary heart disease, and early atherosclerosis [27,28]. Studies have shown that FLI has moderate discrimination (AUC, 0.83-0.88) in Taiwan and in the northern and western Chinese mainland [17,29,30]. Our study confirmed the feasibility of FLI in the eastern Chinese mainland and showed that FLI has good applicability in the community population. FLD, based on BMI, TG, ALT to AST ratio, and FPG, was proposed by Fuyan et al. in eastern China with an AUC of 0.82 [14]. It was validated by Zhu et al. with an AUC of 0.87 in western China, but it was no better than FLI (AUC, 0.88) [29]. Our study supported that FLD has the same diagnostic performance as FLI. In addition, the DCA suggested that FLI and FLD have higher net benefit than the other models. The population in Zhu et al.'s study was western Chinese, whereas ours was eastern Chinese, which may lead to substantial differences. There are a variety of differences in culture and lifestyle between western and eastern China, which can cause heterogeneity among populations. For instance, different dietary patterns can change dietary nutrient intake, resulting in different anthropometric data and biochemical measurements [31,32]. The prevalence of NAFLD in this study is higher than in Zhu et al.'s study, and the proportion of lean participants in our study is larger. ZJU also performed well in this study, but not as well as FLD in the whole population. ZJU was developed based on BMI, FPG, TG, ALT to AST ratio, and sex with an AUC of 0.82 in eastern China, and in the validation cohort using pathological data, the ZJU index had good accuracy (AUC, 0.896) for the detection of steatosis [15].
Some recent studies have validated ZJU and compared it with other models, including FLI, in large populations, with conflicting results [29,33]. The underlying reason may be differences in NAFLD prevalence and population characteristics. LAP, CAP, WHtR, TyG, and VAI in our study had AUCs of over 0.75 in all populations. WC, a good surrogate parameter of visceral fat [34,35], is a common component of FLI, LAP, WHtR, and VAI. Visceral adiposity is significantly associated with increased free fatty acids, which can be transported to the liver and expose it to fat accumulation, hepatic insulin resistance, and inflammation [36]. CAP was developed based on 141 histologically diagnosed NAFLD patients and has higher accuracy than FibroScan [16]. The sample size in the primary study may be too small to develop a model that applies to the general population. To our knowledge, CAP has not been externally validated. TyG correlates strongly with the severity of NAFLD, but its AUC for detecting NAFLD does not match the strength of that correlation. TyG has been used to reflect insulin resistance, which is very important in the development of NAFLD [37]. Apart from insulin resistance, transaminases and anthropometric indicators also play vital roles in the prediction of NAFLD. This may be why the diagnostic performance of TyG is not in line with its relationship to NAFLD and is not as good as that of FLI and FLD.
The cutoff value is an important concern when applying models to a specific population, so we also determined the cutoff values of these models in this population. The cutoff values of FLI and FLD differ by sex and are inconsistent with previous studies. The optimal cutoff points of FLI in the present study were 20.6, 25.3, and 8.4 in the total, male, and female populations, respectively. Western countries mostly identified FLI <30 as non-NAFLD and >60 as NAFLD without distinguishing by sex [10]. Li et al. determined the optimal cutoff point of FLI as 20 in the north Chinese population, which was similar to our findings [30], but they did not examine sex differences in the cutoff values. A validation study of FLI in Taiwan also showed lower cutoff values for NAFLD than in Western populations [17]. Body size, body composition, and fat distribution differ among races and ethnic groups because of environment, nutritional factors, and culture [38,39], which leads to diversity in anthropometric and serological measurements. Women had a higher percentage of fat mass and extremity fat, and lower lean mass, than men of the same age and BMI [40]. This may explain the lower cutoff values in female subjects. The optimal cutoff values of FLD across populations are more stable (28.7, 29.0, and 26.1 for total, male, and female, respectively) than those of FLI, although the cutoff value for female subjects is also lower than that for male subjects. It is therefore essential to apply the corresponding cutoff values to each population.
Our results showed that FLI and FLD can be used to screen for NAFLD in the eastern Chinese community population. The cost of these models in eastern China has not been studied. Zhu et al. compared the cost of several NAFLD-related models in western China [29]. FLI costs 20 Yuan per capita, which is lower than FLD and ZJU, so FLI may have advantages in cost and accessibility over the other models. The cost-effectiveness of these models should be further studied.
The strengths of this study include the large population, the comprehensive analysis of the variables, and the head-to-head comparison of the included models. There are also some limitations. Ultrasonography has limited sensitivity as a diagnostic method for NAFLD [41], although it is a preferable method for large-scale screening of asymptomatic individuals such as a community population. Another limitation is that we did not adjust for other factors such as physical activity, diet, and smoking, which may be correlated with the risk of metabolic symptoms. Additionally, our study was retrospective; further prospective studies are needed to evaluate the practical value of these models.

Conclusion
In conclusion, we validated and compared eight NAFLD-related models for the noninvasive diagnosis of NAFLD in an eastern Chinese community population. FLI and FLD were the most accurate and applicable models in our study. Cutoff values should be applied according to sex. The cost-effectiveness of NAFLD-related models should be further evaluated.