Study population and study design
This was a post hoc analysis of a prospective cohort study at the Prince of Wales Hospital, Hong Kong. Adult patients (age ≥ 18 years) with biopsy-proven NAFLD were recruited. Clinical events were censored on 1 October 2021. We excluded patients with other liver diseases (e.g., positive hepatitis B surface antigen or anti-hepatitis C virus antibody, histologic features of an alternative diagnosis), secondary causes of hepatic steatosis (e.g., use of systemic steroids or methotrexate), excessive alcohol consumption (> 30 g/day in men or > 20 g/day in women), or a history of HCC or other cancers. No patient underwent liver transplantation. The study was approved by the Ethics Committee of the Chinese University of Hong Kong, and all patients provided written informed consent.
All patients underwent liver biopsy. Anthropometric and clinical data were recorded at the time of the liver biopsy, including age, sex, weight, height, waist circumference, and hip circumference. Body mass index (BMI) was calculated as body weight (kg) divided by body height (m) squared.
Histology
One experienced pathologist (AWHC) scored all histologic slides using the Non-alcoholic Steatohepatitis (NASH) Clinical Research Network system.14 The NAFLD activity score (NAS) was the numerical sum of the scores for steatosis (0–3), lobular inflammation (0–3), and hepatocellular ballooning (0–2). NASH was defined as a score of at least 1 for each of the three NAS components. Patients not fulfilling the histologic criteria of NASH were considered to have non-alcoholic fatty liver (NAFL). Liver fibrosis was staged as follows: stage 0, absence of fibrosis; stage 1, perisinusoidal or periportal fibrosis; stage 2, perisinusoidal and portal/periportal fibrosis; stage 3, bridging fibrosis; and stage 4, cirrhosis. Patients who had NAS ≥ 4 with at least 1 point in each component and fibrosis stage 2–4 were considered to have at-risk NASH. Patients with stage 3–4 fibrosis were considered to have advanced fibrosis.
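The histologic definitions above can be written as a small set of rules. The following Python sketch is illustrative only; the function name, argument names, and the returned dictionary keys are hypothetical and were not part of the study's analysis code.

```python
def classify_histology(steatosis, lobular_inflammation, ballooning, fibrosis_stage):
    """Apply the histologic definitions from the text to one biopsy.

    Grades follow the ranges stated in the text: steatosis 0-3,
    lobular inflammation 0-3, ballooning 0-2, fibrosis stage 0-4.
    """
    if not (0 <= steatosis <= 3 and 0 <= lobular_inflammation <= 3
            and 0 <= ballooning <= 2 and 0 <= fibrosis_stage <= 4):
        raise ValueError("histologic grade or stage out of range")

    # NAS is the sum of the three component scores.
    nas = steatosis + lobular_inflammation + ballooning
    # NASH requires at least 1 point in every NAS component.
    nash = min(steatosis, lobular_inflammation, ballooning) >= 1
    return {
        "nas": nas,
        "nash": nash,
        # At-risk NASH: NAS >= 4, every component >= 1, fibrosis stage 2-4.
        "at_risk_nash": nash and nas >= 4 and fibrosis_stage >= 2,
        # Advanced fibrosis: stage 3-4.
        "advanced_fibrosis": fibrosis_stage >= 3,
    }
```

For example, a biopsy graded steatosis 2, lobular inflammation 1, ballooning 1 with stage 2 fibrosis has a NAS of 4 and meets the at-risk NASH definition, whereas the same grades with stage 1 fibrosis do not.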
Blood tests
Blood samples were taken after at least 8 hours of fasting for measurement of complete blood cell count, international normalized ratio (INR) of the prothrombin time, fasting plasma glucose, haemoglobin A1c, total cholesterol, high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, triglycerides, alanine aminotransferase (ALT), aspartate aminotransferase (AST), and gamma-glutamyltransferase (GGT).
Transient elastography and other non-invasive algorithms
Transient elastography examination was performed using the FibroScan 502 machine (Echosens, Paris, France) within one week before liver biopsy by experienced operators according to the manufacturer's training and instructions.15 Patients with ≥ 10 valid liver stiffness measurements and an interquartile range-to-median ratio of the measurements of ≤ 0.3 were considered to have valid results. The median measurement was taken as representative of the liver stiffness. Fibrosis-4 (FIB-4) index was calculated with age (years), platelet count (10⁹/L), and the serum level of AST (U/L) and ALT (U/L).16 NAFLD fibrosis score (NFS) was calculated with age (years), BMI (kg/m²), presence of impaired fasting glucose or diabetes (YES/NO), platelet count (10⁹/L), and the serum level of AST (U/L) and ALT (U/L).17 AST to platelet ratio index (APRI) was calculated with platelet count (10⁹/L) and the serum level of AST (IU/L).18
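As an illustration of the validity rule and two of the scores above, the following Python sketch uses the standard published definitions of FIB-4 and APRI. Several details are assumptions not stated in the text: the function names, the quartile method used for the interquartile range, and the AST upper limit of normal of 40 U/L used as a default for APRI.

```python
import math
import statistics


def valid_elastography(measurements_kpa):
    """Validity rule from the text: >= 10 valid measurements and IQR/median <= 0.3.

    The quartile method ("inclusive") is an assumption; the text does not
    say how the device or analysis computed the interquartile range.
    """
    if len(measurements_kpa) < 10:
        return False
    median = statistics.median(measurements_kpa)
    q1, _, q3 = statistics.quantiles(measurements_kpa, n=4, method="inclusive")
    return (q3 - q1) / median <= 0.3


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4 = age x AST / (platelet count x sqrt(ALT))."""
    return age_years * ast_u_l / (platelets_1e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, platelets_1e9_l, ast_uln_u_l=40.0):
    """APRI = (AST / upper limit of normal) x 100 / platelet count.

    The default upper limit of normal (40 U/L) is an assumed value;
    it is laboratory-specific and is not given in the text.
    """
    return (ast_u_l / ast_uln_u_l) * 100.0 / platelets_1e9_l
```

For a 50-year-old patient with AST 40 U/L, ALT 25 U/L, and a platelet count of 200 × 10⁹/L, this gives a FIB-4 of 2.0 and, under the assumed upper limit of normal, an APRI of 0.5.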
Protein biomarkers
Serum and plasma samples from male and female patients with biopsy-proven NASH or NAFLD were used to measure the levels of exploratory and known biomarkers of liver fibrosis, inflammation, and disease. Procollagen III N-Terminal Propeptide (PIIINP, catalogue # EHP0667, ABclonal, Woburn, MA), Osteopontin (OPN, catalogue # K151HJC, Meso Scale Discovery, Rockville, MD), Growth Differentiation Factor 15 (GDF-15, catalogue # DGD150, R&D Systems, Minneapolis, MN), Fibronectin (FN, catalogue # DFBN10, R&D Systems, Minneapolis, MN), Cytokeratin 18 (CK-18, catalogue # 10011, PEVIVA, Sundbyberg, Sweden), Tissue Inhibitor of Metalloproteinases 1 (TIMP-1, catalogue # DTM100, R&D Systems, Minneapolis, MN), Hyaluronan (HA, catalogue # DHYAL0, R&D Systems, Minneapolis, MN), Matrix Metalloproteinase 7 (MMP-7, catalogue # DMP700, R&D Systems, Minneapolis, MN), Chitinase 3-like 1 (YKL-40, catalogue # DC3L10, R&D Systems, Minneapolis, MN), N-terminal pro-peptide of type III collagen (ProC3, Nordic Bioscience, Herlev, Denmark), C-terminal pro-peptide of type V collagen (ProC5, Nordic Bioscience, Herlev, Denmark), C-terminal of released C5 domain of type VI collagen α3 chain (ProC6, Nordic Bioscience, Herlev, Denmark), and Neo-epitope of MMP9-mediated degradation of type III collagen (C3M, Nordic Bioscience, Herlev, Denmark) levels were determined by enzyme-linked immunosorbent assay (ELISA). Frozen samples were aliquoted for each assay on wet ice and run in duplicate, and the averages of the duplicates were reported.
Statistical analysis
Based on previous studies, the prevalence of NAFLD and NASH in Asia ranges from 15–40% and 2–3%, respectively;19 a sample size of 281 in the NAFLD cohort would provide a 95% confidence interval of 2.78. Continuous variables were presented as mean ± standard deviation (SD) or median (interquartile range [IQR]) and compared between two groups using Student's t test or the Mann-Whitney U test. Categorical variables were compared using the Chi-squared test or Fisher’s exact test. The overall accuracy of the biomarkers in predicting at-risk NASH or advanced fibrosis was expressed by the area under the receiver-operating characteristics curve (AUROC). Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated to evaluate the diagnostic accuracy of the model. To compare the diagnostic accuracy of non-invasive tests, we used the published cut-offs of the following tests: FIB-4 (low cut-off 1.63, high cut-off 2.78), NFS (low cut-off −0.678, high cut-off 1.17), APRI (low cut-off 0.57, high cut-off 0.84), and liver stiffness (8 kPa).8, 20–22 The Kaplan-Meier method was used to estimate the cumulative incidence of liver-related events during follow-up. A P value < 0.05 was considered statistically significant for all tests. For missing values of biomarkers, stochastic regression imputation was performed in the model building stage. All data were analysed using IBM SPSS 26 for macOS and R software version 4.0.0.
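The accuracy measures named above follow directly from a 2 × 2 table and from pairwise comparisons of scores. The Python sketch below is a minimal illustration, not the SPSS or R procedures used in the study; the function names are hypothetical, and how a score exactly equal to a cut-off is handled in the dual cut-off rule is an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2 x 2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auroc(scores, labels):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dual_cutoff(score, low, high):
    """Three-way reading of a test with a low and a high cut-off.

    Assumption: scores equal to either cut-off are treated as indeterminate.
    """
    if score < low:
        return "rule-out"
    if score > high:
        return "rule-in"
    return "indeterminate"
```

With the FIB-4 cut-offs quoted above, `dual_cutoff(2.0, 1.63, 2.78)` falls in the indeterminate zone between the two thresholds.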
Protein biomarker selection by the machine learning algorithm
Improving the accuracy of predictions by identifying a subset of features on the grounds of statistical learning is known as ‘feature selection’. For a dataset D having d variables, the feature set F can be expressed as:
\(F = \{f_1, f_2, \ldots, f_d\}\), (1)
where F stands for the feature set. The objective is to identify an optimum subset of features F′, where \(F' \subseteq F\). The objective function of variable selection is formulated to optimize two terms: (1) the goodness-of-fit (to be maximized), and (2) the number of variables (to be minimized).
We used embedded methods to perform the feature selection during training and testing, together with the bootstrap resampling technique to improve the stability of feature selection (Supplementary Fig. 1 shows the framework). The workflow was as follows. First, we randomly split the whole dataset into a training dataset (70%) and a testing dataset (30%). Second, we built the predictive model from the training dataset using 5-fold cross validation. The model parameters were tuned to select the best model. Finally, we tested the model performance and ranked the features included in the models. The whole process was repeated 200 times on bootstrap-resampled datasets to aggregate the selected features. A single run of feature selection tends to select a local optimum, whereas this ensemble feature selection has a better chance of finding good features through the aggregation of multiple feature selection processes.
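The aggregation step of this workflow can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the study's R pipeline: the embedded, cross-validated model is replaced by a hypothetical stand-in selector (`top_mean_difference`) that keeps the feature whose class means differ most, the held-out 30% is split off but not scored, and all function names and the toy data are invented for the example.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rows[rng.randrange(len(rows))] for _ in rows]


def split_train_test(rows, rng, train_frac=0.7):
    """Shuffle a copy of the rows and split it 70/30 by default."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def top_mean_difference(train, k=1):
    """Hypothetical stand-in selector: keep the k features whose class means
    differ most on the training rows. Not the embedded method of the study."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        return []

    def gap(name):
        return abs(sum(x[name] for x in pos) / len(pos)
                   - sum(x[name] for x in neg) / len(neg))

    return sorted(pos[0], key=gap, reverse=True)[:k]


def ensemble_feature_selection(rows, select=top_mean_difference,
                               n_runs=200, seed=0):
    """Repeat resample -> split -> select, and return how often each
    feature was selected across the runs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        train, _test = split_train_test(bootstrap_sample(rows, rng), rng)
        counts.update(select(train))
    return {name: n / n_runs for name, n in counts.most_common()}


def make_toy_rows(n=60, seed=1):
    """Toy data: 'signal' tracks the label, 'noise' does not."""
    rng = random.Random(seed)
    return [({"signal": 2.0 * (i % 2) + rng.gauss(0, 0.5),
              "noise": rng.gauss(0, 1.0)}, i % 2) for i in range(n)]
```

Calling `ensemble_feature_selection(make_toy_rows())` returns selection frequencies per feature; features that are selected in most resamples are the stable ones, which is the property the ensemble procedure is meant to expose.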
Machine learning algorithms for multivariate analysis
To derive the final model, we employed the machine learning algorithms XGBoost, random forest, and support vector machines to create at-risk NASH classifiers trained with the protein biomarkers as input features.23–25 A classifier produces a probability that a given object belongs to a class (e.g. that at-risk NASH is true). The threshold (or “cut-off”) is the specified probability (often 0.5) beyond which the prediction is considered positive. XGBoost stands for Extreme Gradient Boosting. It is a scalable, tree-based ensemble machine learning algorithm for tree boosting that uses approximate split-finding to search for the best tree model. Random forest, as the name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest outputs a class prediction, and the class with the most votes becomes the model’s prediction. Support vector machines have become popular in an increasingly wide variety of biological applications. The objective of the support vector machine algorithm is to find a hyperplane in N-dimensional space (where N is the number of features) that distinctly classifies the data points. The open-source machine learning packages XGBoost, randomForest, and e1071 in R were used in the analysis.
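Two of the ideas in this paragraph, the probability threshold and the random forest vote, are simple enough to show directly. The Python sketch below is illustrative only and does not use the R packages named above; the function names are hypothetical, and treating a probability exactly equal to the threshold as negative is an assumption.

```python
from collections import Counter


def predict_positive(probability, threshold=0.5):
    """Call the prediction positive when the probability exceeds the threshold."""
    return probability > threshold


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees.

    Ties are broken in favour of the class that appears first in the list.
    """
    return Counter(tree_predictions).most_common(1)[0][0]
```

For instance, a predicted probability of 0.62 is positive at the default threshold of 0.5 but negative at a stricter threshold of 0.7, and five trees voting `[1, 0, 1, 1, 0]` yield class 1.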