The NIMBLE Stage 1 Study Validates Diagnostic Circulating Biomarkers for Nonalcoholic Steatohepatitis

Background: There are no approved noninvasive tests (NITs) for the diagnosis of nonalcoholic steatohepatitis (NASH) and its histological phenotypes.
Methods: The FNIH-NIMBLE consortium tested 5 serum-based NIT panels for the following intended uses: NIS4 for at-risk NASH, a composite of NASH with NAFLD activity score (NAS) ≥ 4 and fibrosis stage ≥ 2; OWLiver for NASH and NAS ≥ 4; and enhanced liver fibrosis (ELF), PROC3 and Fibrometer VCTE for fibrosis stages ≥ 2, ≥ 3 or 4. Aliquots from a single blood sample obtained within 90 days of histological confirmation of NAFLD were tested. The prespecified performance metrics were a diagnostic AUROC greater than 0.7 and superiority to ALT for diagnosis of NASH or NAS ≥ 4 and to FIB-4 for fibrosis.
Results: A total of 1073 adults were studied, including those with NASH (n = 848), at-risk NASH (n = 539) and fibrosis stages 0–4 (n = 222, 114, 262, 277 and 198 respectively). The AUROC of NIS4 for at-risk NASH was 0.81 and superior to ALT and FIB-4 (p < 0.001 for both). OWLiver diagnosed NASH with sensitivity and specificity of 77.3% and 66.8% respectively. The AUROCs (95% CI) of ELF, PROC3 and Fibrometer VCTE respectively for fibrosis were as follows: ≥ stage 2 fibrosis [0.82 (0.8–0.85), 0.8 (0.77–0.83), and 0.84 (0.79–0.88)], ≥ stage 3 [0.83 (0.8–0.86), 0.76 (0.73–0.79), 0.85 (0.81–0.9)], stage 4 [0.85 (0.81–0.89), 0.81 (0.77–0.85), 0.89 (0.84–0.95)]. ELF and Fibrometer VCTE were significantly superior to FIB-4 for all fibrosis endpoints (p < 0.01 for all).
Conclusions: These data support the further development of NIS4, ELF and Fibrometer VCTE for their intended uses.


Introduction
Nonalcoholic fatty liver disease (NAFLD) is a leading cause of liver-related morbidity and mortality [1]. The presence of nonalcoholic steatohepatitis (NASH), an active form of NAFLD, together with fibrosis stage 2 or higher is linked to increased risk of liver outcomes and death [2][3][4]. Identification of such individuals and targeting them for therapeutic intervention is a cornerstone of clinical assessment and inclusion in clinical trials [5].
Histological evaluation of liver biopsy sections is the reference standard for assessment of NASH but requires an invasive liver biopsy with its associated risks, which has limited its widespread use [6-8]. This has spurred much work to establish non-invasive tests (NITs) to diagnose NASH and fibrosis, yet none have met the evidence burden needed for regulatory approval. Validation of such NITs to regulatory standards remains a major unmet need for the field.
The Non-Invasive BioMarkers of MetaBolic Liver DiseasE (NIMBLE) consortium was established by the Foundation for the National Institutes of Health (FNIH) to generate evidence to support advanced regulatory qualification of one or more NITs for the evaluation of nonalcoholic fatty liver disease (NAFLD) [9]. The current study represents a collaborative effort between the NIMBLE consortium and the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) NASH Clinical Research Network (CRN). The objective was to perform a cross-sectional study to rigorously define the sensitivity and specificity of five serum-based NIT panels for the diagnosis of one or more of the following: NASH, high histological NAFLD activity and subpopulations with clinically relevant stages of fibrosis including cirrhosis. The panels were pre-selected on the basis of their analytic robustness, available literature and potential for scale-up for widespread use. The final results of this study are presented below.

Materials And Methods
Serum samples collected from adult participants with NAFLD in a non-interventional registry (databases 1 and 2; DB1 and DB2) and baseline samples from clinical trials (PIVENS and FLINT) across 12 NIDDK NASH CRN clinical sites (supplemental Table 1) were analyzed. Participants were enrolled across these studies from 2004-2017 and provided informed consent at enrollment; the use of de-identified samples and meta-data was considered exempt from additional consenting requirements. The investigators have analyzed the data and take responsibility for the contents of this manuscript. The studies were done in accordance with STARD guidance and reported using the TRIPOD statement [10,11].

Context of Use
The intended context of use was, in individuals with NAFLD or with risk factors for NAFLD, to serve as a diagnostic enrichment tool for the identification of various histological phenotypes of NAFLD, intended for selection for participation in NAFLD/NASH clinical trials and/or drug treatment. Those who were overweight or obese, or had other features of metabolic syndrome, were considered to be at risk for NAFLD [12]. The specific phenotypes to be diagnosed were: at-risk NASH (NASH with NAFLD activity score (NAS) ≥ 4 and fibrosis stage 2 or higher) [9]; nonalcoholic steatohepatitis (borderline or definite); clinically significant fibrosis (fibrosis stage ≥ 2); advanced fibrosis (stage 3 or 4); and cirrhosis (stage 4).

Study Design
A: Study population
The study population was curated from the CRN patient base to ensure a sufficient number of individuals with and without the histological phenotypes of interest and a balanced distribution of fibrosis stages to avoid spectrum bias. The current analysis included aliquots from a serum sample obtained within 90-180 days of an evaluable liver biopsy demonstrating NAFLD. For Fibrometer VCTE (vibration-controlled transient elastography), a liver stiffness measurement was required within 180 days of the biopsy.
Exclusion criteria included pregnancy at the time of sample collection or biopsy, co-morbid liver diseases, use of drugs known to cause steatosis, non-availability of minimum required serum, bariatric surgery within 3 years prior to biopsy, prior liver transplant and known primary or secondary malignancy of the liver.
B: Biomarker panels tested and their intended context of use
Serum biomarker panels selected by the NIMBLE circulating workstream were reviewed and approved by the project team and the NASH CRN ancillary study and steering committees. These included NIS4, OWLiver, the ELF test, PROC3 and Fibrometer VCTE. The intended use of NIS4 was to diagnose at-risk NASH and its components, whereas the OWLiver panel's intended use was to diagnose the presence of NASH (supplemental Table 2). The intended uses of the ELF test, PROC3 and Fibrometer VCTE were to diagnose clinically significant fibrosis (≥ stage 2 fibrosis), advanced fibrosis (≥ stage 3 fibrosis) or cirrhosis (stage 4 fibrosis).

C: Study Approach
The study plan was summarized in a letter of intent approved by the US Food and Drug Administration [9,18]. De-identified, bar-coded frozen aliquots of the same serum sample from each participant, without any prior freeze-thaw, were released to the individual laboratories. These laboratories generated panel scores, which were provided to the independent statistical team (Cytel), who deposited them in the CRN data warehouse. The CRN then released the meta-data linked to the bar codes to Cytel, who implemented the prespecified statistical analysis plan without involvement of the individual vendors whose panels were tested. The NIMBLE circulating workstream and statistical team then jointly reviewed the results and interpreted the data.

D: Histological Examination
The pathology committee of the NASH CRN performed the histological assessment, masked to clinical and laboratory data, using an established and validated protocol [19,20]. The key measures included the presence of steatohepatitis and individual severity grades for steatosis (0-3), lobular inflammation (0-2), hepatocellular ballooning (0-2) and fibrosis stage (0-4). The NAS was computed from the scores for steatosis, ballooning and inflammation, while "at risk" NASH was computed from the presence of its components [9,20].
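The derivation of the NAS and the composite "at risk" NASH phenotype from the component scores can be sketched as follows. This is a minimal illustration, not the consortium's code: the `Histology` record and its field names are hypothetical, and the steatohepatitis flag is assumed to come from the pathologists' read (borderline or definite).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Histology:
    """Hypothetical per-biopsy record; field names are illustrative only."""
    steatosis: int             # severity grade
    lobular_inflammation: int  # severity grade
    ballooning: int            # severity grade
    fibrosis_stage: int        # 0-4
    steatohepatitis: bool      # pathologist's diagnosis (borderline or definite)


def nas(h: Histology) -> int:
    """NAFLD activity score: sum of steatosis, inflammation and ballooning grades."""
    return h.steatosis + h.lobular_inflammation + h.ballooning


def is_at_risk_nash(h: Histology) -> bool:
    """At-risk NASH: NASH with NAS >= 4 and fibrosis stage >= 2."""
    return h.steatohepatitis and nas(h) >= 4 and h.fibrosis_stage >= 2


def is_significant_fibrosis(h: Histology) -> bool:
    """Clinically significant fibrosis: stage >= 2."""
    return h.fibrosis_stage >= 2


def is_advanced_fibrosis(h: Histology) -> bool:
    """Advanced fibrosis: stage 3 or 4."""
    return h.fibrosis_stage >= 3


def is_cirrhosis(h: Histology) -> bool:
    """Cirrhosis: stage 4."""
    return h.fibrosis_stage == 4
```

Each phenotype is a separate binary target, so one biopsy can be a positive case for several intended uses at once (for example, a stage 4 biopsy also counts as stage ≥ 2 and ≥ 3).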

Statistical Plan
There were two pre-specified performance metrics which formed the basis for hypothesis-testing. First, the area under the receiver operating characteristic curve (AUROC) for each panel would be 0.7 or higher for its intended use, with 95% confidence limits that would not intersect 0.5. Second, the biomarker performance would be superior to commonly used blood-based laboratory aids for their intended use. The AUROC of each panel was therefore compared to that of ALT for diagnosis of NASH or NAS ≥ 4, and to that of FIB-4, a commonly used laboratory aid based on age, AST, ALT and platelet counts, for diagnosis of fibrosis severity [21,22]. The sensitivity and specificity were computed at the Youden cut-point. The sensitivity was further estimated keeping specificity fixed at 90%, and conversely specificity was measured keeping the sensitivity fixed at 90%. Finally, the positive and negative predictive values were computed at various prevalences of specific NAFLD phenotypes. Missing data were assumed to be missing at random, as they resulted from sample handling and laboratory issues independent of the relationship between biomarkers and histology; a complete-case analysis was done.
The sample size was estimated to detect a difference of at least 0.05 between the AUROC of FIB-4 or ALT and that of the relevant biomarker panel with a power of at least 80% at a one-sided significance level of 0.025. It was assumed that the AUROC for FIB-4 would be 0.8 for fibrosis. Additionally, because of potential correlation between FIB-4 or ALT and the biomarker panels, adjustments were made assuming correlation coefficients ranging from 0.5-0.8. Based on these assumptions, the number of participants needed with NASH and fibrosis stage 2 or 3 versus stage 0 or 1 was 400 each. For analysis of cirrhosis, 180 individuals with cirrhosis were needed.
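To make the two core quantities concrete, the sketch below computes an empirical AUROC and a Youden cut-point in plain Python. It is an illustration under stated assumptions rather than the study's analysis code: the AUROC uses the Mann-Whitney (pairwise comparison) form, a score at or above the cutoff is treated as a positive test, the function names are hypothetical, and confidence intervals and the between-test comparison are not shown.

```python
def split_by_label(scores, labels):
    """Separate scores of participants with (1) and without (0) the phenotype."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    return pos, neg


def auroc(scores, labels):
    """Empirical AUROC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos, neg = split_by_label(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' is a positive test."""
    pos, neg = split_by_label(scores, labels)
    sensitivity = sum(p >= cutoff for p in pos) / len(pos)
    specificity = sum(n < cutoff for n in neg) / len(neg)
    return sensitivity, specificity


def youden_cutpoint(scores, labels):
    """Observed score that maximizes J = sensitivity + specificity - 1.

    Ties are resolved in favor of the lowest qualifying cutoff.
    """
    def youden_j(cutoff):
        sensitivity, specificity = sens_spec(scores, labels, cutoff)
        return sensitivity + specificity - 1

    return max(sorted(set(scores)), key=youden_j)


# Toy data (hypothetical scores, not study data).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]
toy_auc = auroc(toy_scores, toy_labels)
toy_cut = youden_cutpoint(toy_scores, toy_labels)
```

The pairwise form is quadratic in sample size, which is acceptable for a cohort of about a thousand participants; a rank-based implementation would be the usual choice for larger data.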

Results
The NASH CRN cohort had 4094 participants (Figure 1). A total of 2479 individuals were excluded because of age, lack of samples or lack of evaluable liver biopsies. Of the remaining individuals, consecutive patients for each stage of disease were selected to ensure that enough patients were available to meet sample size estimates and to have a relatively balanced spectrum of fibrosis severity (stages 0-4, n = 222, 114, 262, 277 and 198 respectively). A total of 1073 individuals meeting eligibility criteria were thus included in this analysis, with 90% of individuals having a serum sample within 90 days of the liver biopsy (Table 1).
The mean age of the cohort was 54 years, and the cohort was predominantly female and white. NAFL was present in 225 individuals, while 835 had NASH and 13 an indeterminate NAFLD phenotype. Those without fibrosis were younger, mainly had NAFL and had a lower NAS compared to those with fibrosis stage 2 or higher. The study population for Fibrometer VCTE was a smaller subset of the larger population (n = 396) because a VCTE examination within 6 months of the liver biopsy was not available in many individuals. The baseline features of this subset were similar to those of the larger cohort (supplemental Table 3).

At-risk NASH
NIS4 was the only panel with an intended use to diagnose the underlying composite phenotype of "at risk" NASH (n = 539) as defined above. The AUROC was 0.815, and the sensitivity and specificity at the optimal cut-point were 78.1% and 73.6% respectively (Table 2). The AUROC was superior (p < 0.001 for both) to those of both ALT (AUROC 0.726) and FIB-4 (AUROC 0.704).

NASH Diagnosis
NIS4 and the OWLiver tests had an intended use to diagnose NASH (supplemental Table 2). NIS4 (Youden cut-point 0.539) had an AUROC of 0.83 (95% CI 0.8-0.86) and was superior to ALT (AUROC 0.67) for this intended use (Table 2, Figure 2). The sensitivity and specificity were 77.7% and 76.2% respectively at this cut-point. NIS4 had a specificity of 47.7% and a sensitivity of 54.4% when sensitivity and specificity were constrained at 90% respectively (supplemental Table 4), and both were significantly superior to ALT (p < 0.001 for both). The OWLiver test provided results in a categorical format which did not permit generation of an AUROC; it diagnosed NASH with a sensitivity of 77.3% and a specificity of 66.8%.

Clinically significant fibrosis (Fibrosis stage ≥ 2)
NIS4, ELF, PROC3 and Fibrometer VCTE had an intended use to identify clinically significant fibrosis in those with NAFLD. The AUROCs of NIS4, ELF, PROC3 and Fibrometer VCTE were 0.874, 0.828, 0.8 and 0.841 respectively. Their respective sensitivity and specificity at their Youden index are provided in Table 2. FIB-4 had an AUROC of 0.798, very close to the expected AUROC of 0.8 [22]. NIS4 (p < 0.001), ELF (p < 0.01) and Fibrometer VCTE (p < 0.001) were all superior to FIB-4. Similar data were obtained when the performance of these panels was evaluated with sensitivity and specificity constrained at 90% (supplemental Table 4).
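The constrained operating points reported above (sensitivity with specificity held at 90%, and specificity with sensitivity held at 90%) can be illustrated with the following sketch. It assumes that only observed scores are candidate cutoffs, that a score at or above the cutoff is a positive test, and that the constraint is read as "at least 90%"; the function names and data are hypothetical and the study's exact procedure may differ.

```python
def _split(scores, labels):
    """Separate scores of participants with (1) and without (0) the phenotype."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    return pos, neg


def sensitivity_at_specificity(scores, labels, target=0.90):
    """Return (cutoff, sensitivity, specificity) with specificity >= target.

    Specificity rises and sensitivity falls as the cutoff rises, so the
    lowest cutoff meeting the target keeps the most sensitivity.
    """
    pos, neg = _split(scores, labels)
    # float("inf") calls every test negative, so the loop always returns.
    for cutoff in sorted(set(scores)) + [float("inf")]:
        specificity = sum(n < cutoff for n in neg) / len(neg)
        if specificity >= target:
            sensitivity = sum(p >= cutoff for p in pos) / len(pos)
            return cutoff, sensitivity, specificity


def specificity_at_sensitivity(scores, labels, target=0.90):
    """Return (cutoff, sensitivity, specificity) with sensitivity >= target.

    The highest cutoff meeting the target keeps the most specificity; the
    lowest observed score always qualifies, so the loop always returns.
    """
    pos, neg = _split(scores, labels)
    for cutoff in sorted(set(scores), reverse=True):
        sensitivity = sum(p >= cutoff for p in pos) / len(pos)
        if sensitivity >= target:
            specificity = sum(n < cutoff for n in neg) / len(neg)
            return cutoff, sensitivity, specificity


# Toy data (hypothetical scores, not study data).
toy_neg = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.75]
toy_pos = [0.42, 0.55, 0.60, 0.80, 0.90]
toy_scores = toy_neg + toy_pos
toy_labels = [0] * len(toy_neg) + [1] * len(toy_pos)
rule_in = sensitivity_at_specificity(toy_scores, toy_labels)
rule_out = specificity_at_sensitivity(toy_scores, toy_labels)
```

The high-specificity cutoff corresponds to a rule-in threshold and the high-sensitivity cutoff to a rule-out threshold; reporting both shows how much accuracy is given up on the other axis when one error rate is capped.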

Discussion
The regulatory path for approval of a diagnostic test requires rigorously established sensitivity and specificity in a study cohort that is both powered and balanced with respect to the presence or absence of the condition being studied. The current study establishes this first step and is the foundation for the use of specific cut-points in relevant populations in the next stage towards regulatory approval of these diagnostic enrichment tools for NASH [18].
The study has several methodological strengths. The time from biopsy to blood draw was short, and all analyses including the comparators were made on the same blood sample. Further, all samples were drawn, aliquoted, stored and analyzed without multiple freeze-thaw cycles using prespecified protocols. Histology was read independently using a rigorous pre-specified protocol by the pathology committee of the NASH CRN, masked to clinical and laboratory data. The distribution of fibrosis stages in the cohort avoided spectrum bias. Finally, for each of the phenotypes studied, the sample included enough individuals with and without the phenotype to assure power for both sensitivity and specificity.
The practical application of these data has to be considered in the context of how the tests are used. In primary care, where the prevalence of advanced fibrosis is 1%, positive tests are likely to be false positives, and even with excellent sensitivity and specificity the PPV will be low [23]. Using these tests to identify patients for clinical trials in such settings is likely to yield many false positives, resulting in high screen fail rates. The NPV for FIB-4 as well as all of the biomarker panels ranged from 98-99.7% when the population prevalence of advanced fibrosis was 1% (supplemental Table 5). These tests can therefore be applied for exclusion of this phenotype, both for clinical management and for screening for trials of at-risk NASH in a primary care setting.
The prevalence of at-risk NASH or its subsets, NASH with advanced fibrosis or cirrhosis, in hepatology clinic settings is higher and ranges from 10-40% [2,12,24]. It is encouraging to note that the high NPV in settings with low prevalence was maintained at these ranges, while the positive predictive values approached 80% at 40% prevalence when the Youden cut-point was used (supplemental Table 5). In clinical trial settings, these data should allow exclusion of those without these phenotypes while limiting overdiagnosis compared to a primary care setting. Additional enhancement of certainty for ruling in disease by using the cut-point for 90% specificity will, however, be associated with a loss of sensitivity and attendant misclassification.
Further improvement is likely to require an algorithmic approach using multiple panels or the use of imaging-based tests for greater precision in identification of this population. Recently, MR-elastography with FIB-4 or AST has been shown to identify those with NASH and advanced fibrosis or at-risk NASH respectively and may provide such tools [25-27]. The current data cannot, however, be directly compared to these because of methodological differences.
For those with advanced fibrosis or cirrhosis, a mistaken diagnosis of absence of these phenotypes may cause patients to be followed without surveillance for hepatocellular cancer or varices. The overall high NPVs suggest that these risks are in general low. Conversely, overdiagnosis due to the modest PPVs may result in futile additional testing, including liver biopsies with their attendant risks. ELF and Fibrometer VCTE can identify 82-94% of true positive cases of cirrhosis but may also over-diagnose cirrhosis in some patients in clinics with a high prevalence of cirrhosis (Table 5). The risks of overdiagnosis have to be considered in the context of the risks of missing advanced fibrosis or cirrhosis altogether in specific populations, both in clinical practice and for consideration for inclusion in trials.
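The dependence of predictive values on prevalence follows directly from Bayes' rule and can be sketched as below. The function name is hypothetical, and the 80% sensitivity and specificity used in the example are illustrative values, not results for any panel in this study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence.

    All arguments are proportions in (0, 1). Computed from the expected
    fractions of true/false positives and negatives in the population.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative only: a hypothetical test with 80% sensitivity and specificity,
# evaluated at a primary-care-like and a hepatology-clinic-like prevalence.
for prevalence in (0.01, 0.10, 0.40):
    ppv, npv = predictive_values(0.80, 0.80, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

With these assumed inputs the PPV is only a few percent at 1% prevalence while the NPV stays above 99%, which is the pattern the paragraph above describes: at low prevalence such tests are better suited to ruling out disease than to ruling it in.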
This study also has some limitations. The NASH CRN is based at tertiary care centers, which generates ascertainment bias. The study population is predominantly white, and the data are not generalizable to other races. Despite these limitations, the current study provides a rigorous evidence base to establish the sensitivity and specificity of these biomarker panels and serves as a critical step in the ultimate regulatory approval of some of these panels as diagnostic enrichment tools in clinical practice and to determine eligibility for inclusion in clinical trials for NASH.

All statistics presented are means (standard deviations), unless otherwise specified.
*For 109 (10%) of the cohort, the time between the liver biopsy and study enrollment was 92-183 days.

FNIH-NIMBLE project: Foundation for the National Institutes of Health (FNIH) Biomarkers Consortium,

Non-Invasive Biomarkers for Metabolic Liver Disease (NIMBLE) Project
The FNIH creates and manages alliances with public and private institutions in support of the mission of the NIH, the world's premier medical research agency. The FNIH works with its partners to accelerate biomedical research and strategies for addressing diseases and health concerns in the United States and across the globe. The FNIH organizes and administers research projects; supports education and training of new researchers; organizes educational events and symposia; and administers a series of funds supporting a wide range of health issues. Established by Congress in 1990, the FNIH is a not-for-profit 501(c)(3) charitable organization.
The NIMBLE Project is a comprehensive multi-year pre-competitive, public-private partnership conducted under the auspices of the FNIH-Biomarkers Consortium. The Biomarkers Consortium embraces government, industry, patient advocacy groups, and not-for-profit organizations, each of which has a stake in the identification, development, and the seeking of regulatory approval for biomarkers. The NIMBLE project is sponsored by the FNIH and is a public-private partnership supported by multiple entities including AbbVie, Amgen, AstraZeneca, Boehringer Ingelheim, Bristol Myers Squibb, Echosens, GE Healthcare, Genentech, Gilead Sciences, Intercept Pharmaceuticals, Novo Nordisk, Pfizer Inc, Regeneron Pharmaceuticals and Takeda Development Center Americas. We acknowledge support from the Global Liver Institute, USA, and the US FDA.
The FNIH NIMBLE study was conducted by the NASH CRN and FNIH Investigators and supported by NIDDK. The biospecimens from the NASH CRN reported here were supplied by the NIDDK Central Repository. This manuscript was not prepared in collaboration with the NIDDK Central Repository and does not necessarily reflect the opinions or views of the NIDDK Central Repository.
Key collaborators who provide in-kind support to NIMBLE include AMRA Medical, Canon Medical Systems USA, Echosens, GENFIT, GE Healthcare, Nordic Bioscience, OWL Metabolomics, Philips Ultrasound, P-Value, Hologic SuperSonic Imagine, Siemens Healthineers and Siemens Medical Solutions USA. The laboratories that performed the chemical analyses and reported raw data to the NIMBLE data coordinating center were as follows:

Figure 1 Sample derivation from the NASH CRN cohort and their use for various tests
Footnotes: *NAS = NAFLD Activity Score †Subgroup definitions: 0=NAS<4, NAFL only, fibrosis stage<2; 1=NAS<4, any NASH, fibrosis stage<2; 3=NAS=4+, any NASH, fibrosis stage=2, 3; 3=cirrhosis

Figure 2 Sensitivity and specificity of key NIT panels for their respective intended uses are shown as a function of the cutoff scores for the NIT. The top panels demonstrate changes in sensitivity and specificity at varying NIS4 cutoff scores for the diagnosis of at-risk NASH (panel A) and its key subcomponents, diagnosis of NASH (panel B) and stage 2 or greater fibrosis (panel C). The middle panels show similar data for the ELF