Machine Learning Soft Voting Algorithm for Prediction and Detection of Nonalcoholic Fatty Liver Disease

DOI: https://doi.org/10.21203/rs.3.rs-2025654/v1

Abstract

Nonalcoholic fatty liver disease (NAFLD) is one of the most commonly diagnosed chronic liver diseases in the world and has become an important public health problem. Machine learning algorithms were introduced to evaluate the best predictive clinical model for NAFLD. This paper proposes a machine learning Voting algorithm that combines a Genetic Algorithm, a Neural Network, Random Forest, and Logistic Regression for NAFLD detection and diagnosis. First, 2,522 of the 10,508 samples met the diagnostic criteria for NAFLD. The distribution of missing values was visualized, and the KNN algorithm was used to fill the missing values. The Kolmogorov-Smirnov Z test was performed and a heatmap of 19 variables was drawn. The PPFS feature selection method was used to perform feature selection, and 11 features were retained. Alanine aminotransferase (ALT), body mass index (BMI), triglycerides (TG), γ-glutamyl transpeptidase (γGT), and low-density lipoprotein cholesterol (LDL) were the top 5 features contributing to NAFLD. Ten basic machine learning algorithms were used, and the four with the highest accuracy were the Genetic Algorithm, Neural Network, Random Forest, and Logistic Regression. These four algorithms were fused into the proposed Voting algorithm through the soft voting method of ensemble learning, and 10-fold cross-validation was used in the classification. To verify the proposed Voting algorithm, it was compared with the 10 basic machine learning algorithms. It achieved accuracy, recall, precision, \({F}_{1}\) score, and AUC of up to 0.846212, 0.573248, 0.725806, 0.640569, and 0.894010, respectively. According to these results, the proposed Voting algorithm demonstrated the best performance.

1. Introduction

Non-alcoholic fatty liver disease (NAFLD), a disease caused by fat accumulating in the liver, is one of the most commonly diagnosed chronic liver diseases in the world and has become an important public health problem [1, 2]. NAFLD comprises a broad clinical spectrum of progression, including simple steatosis, nonalcoholic steatohepatitis (NASH), and fibrosis. Simple steatosis is considered to have a benign course, but NASH may develop into fibrosis, which is a result of chronic liver injury and can further progress to cirrhosis and hepatocellular carcinoma [3]. It is crucial to obtain an early diagnosis, which will lead to better prevention and management of NAFLD.

Liver biopsy is the gold standard for NAFLD detection and fibrosis staging, but it is costly and limited by sampling errors and the risk of complications [4]. Much attention has therefore been focused on whether noninvasive methods can identify NAFLD patients with an elevated risk of progressive disease. Bedogni et al. proposed the Fatty Liver Index (FLI), a combination based on triglycerides, body mass index (BMI), gamma-glutamyl transpeptidase (γGT), and waist circumference (WC), which is broadly adopted as a biomarker index for NAFLD [5]. Wang et al. proposed the ZJU index, which is a good predictor of NAFLD in the Chinese population [6]. Lee et al. proposed the hepatic steatosis index (HSI), consisting of ALT, AST, BMI, gender, and history of diabetes, which can effectively identify NAFLD [7]. Ultrasonography is noninvasive, reasonably accurate, and widely used in the clinical diagnosis of NAFLD; however, it is not sensitive enough to detect mild steatosis [8].

Machine learning is a field of artificial intelligence that applies statistical approaches to classifying data. Several machine learning techniques have been applied in clinical settings to predict diseases and have shown higher diagnostic accuracy than classical methods [9-12]. Here, we propose a new machine learning Voting algorithm to build a useful predictive model for NAFLD; 10 basic machine learning algorithms are also compared. For more thorough validation, the proposed study evaluates not only accuracy but also other indicators such as precision, recall, \({F}_{1}\) score, and AUC.

The paper is organized as follows. Section 2 presents the materials and method, including a description of the dataset used in the research, the PPFS feature selection method, and an introduction of the proposed Voting algorithm and the evaluation indicators. Section 3 provides the results, including missing data filling, correlation analysis, feature selection using PPFS, machine learning performance assessment, and a comparison with existing studies. Section 4 provides the discussion. Finally, Section 5 presents the conclusions of the work.

2. Materials And Method

2.1. Dataset Used in Research

Data were obtained from 10,508 participants who attended the 2010 annual health examination at the First Affiliated Hospital of Zhejiang University School of Medicine, China [9]. Informed consent was obtained from all subjects involved in the study. The study was approved by the Ethics Committee of the Guilin University of Technology and complied with the Declaration of Helsinki. All methods were performed in accordance with the approved guidelines. The variables consisted mainly of 4 basic characteristics of the subjects and 15 biochemical indicators of the subjects' blood. The four basic characteristics were age, gender, height, and weight. The biochemical variables included liver enzymes, lipids, uric acid, and glucose. The diagnosis of NAFLD was based on criteria from the Chinese Liver Disease Association [13]. Ultrasound examinations were performed by trained sonographers. When the variable Ultrasound equals 1, NAFLD is present; when it equals 0, NAFLD is absent. The detailed variables and descriptions are shown in Table 1.

Table 1

The variables and descriptions of the data.

| Num. | Variable | Description | Num. | Variable | Description |
| --- | --- | --- | --- | --- | --- |
| 1 | Age | Age | 11 | IB | Indirect bilirubin |
| 2 | Gender | Gender | 12 | TC | Total cholesterol |
| 3 | Height | Height | 13 | TG | Triglycerides |
| 4 | Weight | Weight | 14 | HDL | High-density lipoprotein cholesterol |
| 5 | ALT | Alanine aminotransferase | 15 | LDL | Low-density lipoprotein cholesterol |
| 6 | AST | Glutamic oxaloacetic transaminase | 16 | Bun | Blood urea nitrogen |
| 7 | ALP | Alkaline phosphatase | 17 | Cr | Creatinine |
| 8 | γ-GT | γ-Glutamyl transferase | 18 | Glu | Fasting plasma glucose |
| 9 | TB | Total bilirubin | 19 | Uric | Serum uric acid |
| 10 | DB | Direct bilirubin | 20 | Ultrasound | Ultrasound |

2.2. Feature Selection

Predictive Permutation Feature Selection (PPFS) [14] is a novel feature selection algorithm based on the concept of the Markov Blanket (MB), illustrated in Fig. 1. An MB contains all the information related to the target node, so non-MB nodes can be safely discarded to achieve feature selection. The green G node is the target node, the pink nodes form an MB of the G node, and the G node is independent of any node outside the rectangle [15].

The PPFS algorithm selects a subset of features based on their performance both individually and as a group; it automatically decides how many features to keep and tries to find the optimal combination of features [14]. The PPFS algorithm is implemented using the PPIMBC function in the PyImpetus package for Python.
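The published implementation lives in the PyImpetus package; as a rough, self-contained illustration of the underlying permutation idea only (not the full Markov-Blanket procedure of PPFS), the following sketch uses sklearn's permutation_importance on synthetic data, with the 0.01 threshold chosen arbitrarily:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for the health-check data: 8 features, 3 informative.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=2, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Permutation importance: shuffle one feature at a time and measure the
# drop in score; features whose shuffling barely hurts can be discarded.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
selected = [i for i, imp in enumerate(result.importances_mean) if imp > 0.01]
```

Unlike this one-shot scoring, PPFS additionally tests features conditionally as a group, which is what lets it recover a Markov Blanket rather than a ranked list.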

2.3. Proposed Voting Algorithm

Ten machine learning algorithms are used, namely, Logistic Regression (LR) [16], Random Forest (RF) [17], Support Vector Machine (SVM) [18], Decision Tree (DT) [19], LightGBM [20], CatBoost [21], Neural Network (NN) [22], K-Nearest Neighbor (KNN) [23], Bayesian Network (BN) [24], and Genetic Algorithm (GA) [25]. The four most accurate algorithms (GA, LR, NN, RF) were combined into a Voting algorithm [26] by the ensemble learning soft voting method.

The flowchart of the proposed Voting algorithm is shown in Fig. 2. The framework consists of three phases: data preparation, model construction, and model prediction. In the data preparation phase, missing data are filled, and correlation analysis and feature selection are performed. In the model construction phase, the four most accurate machine learning algorithms (GA, LR, NN, RF) are combined into a Voting model by the ensemble learning soft voting method. In the model prediction phase, the proposed Voting algorithm is applied to predict whether a new patient will progress to NAFLD.

GA is a randomized search optimization algorithm that simulates the crossover, mutation, and selection phenomena that occur in natural selection and genetics. Beginning with a random initial population, a population of individuals better suited to the environment is generated by random selection, crossover, and mutation operations [27].

Ensemble learning is a machine learning technique that is broadly used in classification and regression problems. It combines multiple identical or different machine learning algorithms to solve a single problem. Voting-based ensemble learning builds multiple models and applies basic statistical methods to combine their predictions. Voting algorithms comprise hard voting and soft voting; soft voting uses the class probabilities output by each algorithm, and the predicted outcome is the class with the largest sum of probabilities across all voters [28].
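The soft-voting combination described above can be sketched with sklearn's VotingClassifier. This minimal example combines LR, RF, and NN on synthetic data; the GA base learner (TPOT) is omitted for brevity, and the hyperparameters are placeholders rather than the tuned values of Table 4:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=11, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# voting="soft" averages the class probabilities of the base models and
# predicts the class with the largest summed probability.
clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=200)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nn", MLPClassifier(max_iter=300, random_state=0)),
    ],
    voting="soft",
).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)  # averaged class probabilities per sample
acc = clf.score(X_te, y_te)
```

With voting="hard" the same class would instead be chosen by majority of the base models' label predictions, discarding their confidence.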

The packages and functions used by the 11 algorithms are shown in Table 2.

Table 2

The packages and functions used by the 11 algorithms.

| Algorithm | Package | Function |
| --- | --- | --- |
| LR | sklearn | LogisticRegression |
| RF | sklearn | RandomForestClassifier |
| SVM | sklearn | SVC |
| DT | sklearn | DecisionTreeClassifier |
| LightGBM | lightgbm | LGBMClassifier |
| CatBoost | catboost | CatBoostClassifier |
| NN | sklearn | MLPClassifier |
| KNN | sklearn | KNeighborsClassifier |
| BN | pyAgrum | BNClassifier |
| GA | tpot | TPOTClassifier |
| Voting | sklearn | VotingClassifier |

2.4. Evaluation Indicators

Based on the method used in a previous study [29], we calculated the accuracy, precision, recall, \({F}_{1}\) score, and AUC to evaluate the performance of the different algorithms.

$$\left\{\begin{array}{l}\text{TP} = \text{True positive}\\ \text{FP} = \text{False positive}\\ \text{FN} = \text{False negative}\\ \text{TN} = \text{True negative}\end{array}\right\}$$
1

Accuracy represents the number of correctly classified test instances as a percentage of the total number of test instances and is calculated as [30]

$$Accuracy =\frac{TP+TN}{TP+FP+FN+TN}$$
2

Recall represents the ratio of the number of correctly classified positive cases to the actual number of positive cases and is calculated as [31]

$$\text{Recall}\text{ }=\frac{TP}{TP+FN}$$
3

Precision represents the ratio of the number of correctly classified positive instances to the number of instances classified as positive and is calculated as [32]

$$\text{Precision }=\frac{TP}{TP+FP}$$
4

The \({F}_{1}\) score is based on the harmonic mean of Recall and Precision, which evaluates Recall and Precision together and is calculated as [33]

$${F}_{1}\text{ }=\frac{2*\text{ Recall }*\text{ Precision }}{\text{ Recall }+\text{ Precision }}$$
5

The true positive rate (TPR) indicates the percentage of all actually positive samples that are correctly identified as positive. The false positive rate (FPR) indicates the rate at which actually negative samples are incorrectly identified as positive. TPR and FPR are calculated as [34]

$$\text{TPR }=\frac{TP}{TP+FN}$$
6
$$\text{FPR }=\frac{FP}{FP+TN}$$
7

The receiver operating characteristic (ROC) [34] curve plots FPR on the X-axis against TPR on the Y-axis. The area under the curve (AUC) [35] is the area under the ROC curve and gives the average performance value of the classifier.
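The indicators defined by Eqs. (2)-(7) can be computed directly from the four confusion-matrix counts; the counts in the following sketch are illustrative only, not the paper's experimental results:

```python
def metrics(tp, fp, fn, tn):
    """Compute the evaluation indicators of Eqs. (2)-(7) from raw counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)          # also the TPR of Eq. (6)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    fpr = fp / (fp + tn)             # Eq. (7)
    return accuracy, recall, precision, f1, fpr

# Illustrative counts only.
acc, rec, prec, f1, fpr = metrics(tp=80, fp=20, fn=30, tn=170)
```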

3. Results

3.1. Missing data filling

First, the missingno package in Python is used to visualize the distribution of missing values in the data; the result is shown in Fig. 3. The two values on the left axis are the beginning and end of the sample range (from 1 to 10508). The number 3 on the right indicates that 3 columns have no missing values, and the number 20 at the bottom right indicates that there are 20 columns in total. More white lines mean more missing values. The three variables Age, Gender, and Ultrasound have no white lines, which means they have no missing values.

We can also calculate the missing rate of the data; the result is shown in Fig. 4. The missing rate of each variable is low: Height and Weight, the two variables with the highest missing rates, are both under 0.5%. Age, Gender, and Ultrasound have no missing values, so their missing rate is 0. The KNN algorithm in Python's fancyimpute package is used to fill the missing values. We then construct the body mass index (BMI), calculated as weight divided by height squared, which is used as a standard for diagnosing overweight and obesity.
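fancyimpute's KNN filler can be approximated with sklearn's KNNImputer, which implements the same nearest-neighbour filling idea; the toy height/weight records below are illustrative only:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy height (cm) / weight (kg) records with missing entries (np.nan).
data = np.array([
    [170.0, 65.0],
    [165.0, np.nan],
    [np.nan, 80.0],
    [180.0, 85.0],
    [160.0, 55.0],
])

# Each missing value is replaced by the mean of its k nearest neighbours,
# measured on the observed coordinates.
filled = KNNImputer(n_neighbors=2).fit_transform(data)

# BMI = weight (kg) / height (m)^2, used to flag overweight and obesity.
bmi = filled[:, 1] / (filled[:, 0] / 100.0) ** 2
```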

3.2. Correlation analysis

The data are divided into two categories by the Ultrasound variable, where an Ultrasound of 1 means NAFLD is present and an Ultrasound of 0 means NAFLD is absent. The data characteristics of the two types are viewed by the describe function in python. The result is shown in Table 3.


 
Table 3

The data characteristics of the two types.

| Variable | NAFLD present (n = 2522) | NAFLD absent (n = 7986) | Kolmogorov-Smirnov Z value | Kolmogorov-Smirnov P value |
| --- | --- | --- | --- | --- |
| Age (year) | 50.86 (12.75) | 47.00 (14.96) | 5.768 | < 0.001 |
| Gender (male/female) | 1907/615 | 4971/3015 | 5.853 | < 0.001 |
| BMI (kg/m²) | 26.02 (2.74) | 22.48 (2.72) | 21.99 | < 0.001 |
| ALT (U/L) | 23.00 (16.00–34.00) | 13.00 (10.00–19.00) | 17.451 | < 0.001 |
| AST (U/L) | 23.00 (19.00–30.00) | 20.00 (16.00–24.00) | 11.35 | < 0.001 |
| ALP (U/L) | 83.00 (71.00–99.00) | 77.00 (64.25–91.00) | 5.837 | < 0.001 |
| γ-GT (U/L) | 31.00 (22.00–47.00) | 17.00 (13.00–26.00) | 18.006 | < 0.001 |
| TB (µmol/L) | 12.90 (10.20–16.40) | 12.20 (9.60–16.10) | 2.93 | < 0.001 |
| DB (µmol/L) | 4.10 (3.50–5.10) | 3.90 (3.20–4.80) | 4.935 | < 0.001 |
| IB (µmol/L) | 8.80 (6.70–11.50) | 8.60 (6.30–11.30) | 2.026 | 0.001 |
| TC (mmol/L) | 5.08 (4.51–5.72) | 4.72 (4.17–5.30) | 7.579 | < 0.001 |
| TG (mmol/L) | 1.63 (1.18–2.23) | 0.96 (0.71–1.36) | 18.113 | < 0.001 |
| HDL (mmol/L) | 1.34 (1.18–1.53) | 1.53 (1.32–1.78) | 11.252 | < 0.001 |
| LDL (mmol/L) | 2.85 (2.35–3.35) | 2.60 (2.14–3.08) | 6.456 | < 0.001 |
| Bun (mmol/L) | 4.98 (4.24–5.85) | 4.93 (4.18–5.82) | 1.33 | 0.058 |
| Cr (mmol/L) | 68.00 (59.00–77.00) | 66.00 (56.00–75.00) | 3.314 | < 0.001 |
| Glu (mmol/L) | 5.11 (4.75–5.65) | 4.88 (4.57–5.24) | 8.037 | < 0.001 |
| Uric (µmol/L) | 367.32 (80.48) | 312.36 (53.31) | 12.196 | < 0.001 |

From Table 3, we can see that there are 2522 samples with NAFLD present, accounting for about a quarter of the total, so the data are not severely unbalanced. When there is only one number in parentheses, it is the standard deviation and the number outside is the mean; for example, when NAFLD was present, the mean BMI was 26.02 with a standard deviation of 2.74. When there are two numbers in parentheses, they are the lower and upper quartiles and the number outside is the median; for instance, when NAFLD was absent, the median ALT was 13, with lower and upper quartiles of 10 and 19. When NAFLD was present, there were more males than females, and the medians of all other variables were larger, except for HDL.

Performing the Kolmogorov-Smirnov Z test on the data, we found a Z value of 1.33 and a P value of 0.058 (> 0.05) for the Bun variable, indicating no significant difference between the presence and absence of NAFLD for Bun. The P values for the remaining variables were less than 0.05, indicating significant differences between the presence and absence of NAFLD for those variables.
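A two-sample Kolmogorov-Smirnov comparison of this kind can be reproduced with scipy's ks_2samp; the sketch below uses synthetic BMI-like values loosely based on the means and standard deviations of Table 3, not the actual study data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Synthetic stand-in: BMI-like values for NAFLD-present vs. NAFLD-absent
# groups, using roughly the Table 3 means and standard deviations.
bmi_present = rng.normal(26.0, 2.7, size=500)
bmi_absent = rng.normal(22.5, 2.7, size=1500)

# Two-sample KS test: a small P value means the two empirical
# distributions differ significantly.
stat, p_value = ks_2samp(bmi_present, bmi_absent)
```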

Using the heatmap function in Python's seaborn package, correlation analysis was then conducted on the 19 variables. The heatmap of the 19 variables is shown in Fig. 5.

We chose the Pearson correlation coefficient, which ranges from −1 to 1. The larger the absolute value of the coefficient, and the darker the color in the heatmap, the stronger the correlation. From Fig. 5, we can see that the correlations among the three variables IB, DB, and TB are strong: the correlation between IB and TB is 0.98, between DB and TB is 0.87, and between IB and DB is 0.78. LDL and TC were highly correlated, with a correlation of 0.89. The correlation between ALT and AST was also very high, reaching 0.81.
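The Pearson coefficients that the heatmap visualizes can be computed with numpy's corrcoef; the sketch below uses synthetic bilirubin-like columns (with TB constructed as IB + DB) simply to show why the three bilirubin variables correlate strongly:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic columns mimicking related analytes: indirect (IB) and direct
# (DB) bilirubin, with total bilirubin (TB) as the sum of its fractions.
ib = rng.normal(8.8, 2.0, size=1000)
db = rng.normal(4.1, 0.8, size=1000)
tb = ib + db

# np.corrcoef treats each row as a variable and returns the symmetric
# Pearson correlation matrix (ones on the diagonal).
corr = np.corrcoef(np.vstack([ib, db, tb]))
```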

3.3. Feature Selection using PPFS

Variable selection is applied to the 18 independent variables (with Ultrasound as the dependent variable) using the PPFS algorithm. The PPFS algorithm automatically decides how many features to keep and attempts to find the best feature combination. Finally, 11 variables were retained; their feature importance is shown in Fig. 6.

Using the pairplot function in Python's seaborn package, a pairplot of the five most important variables is obtained, as shown in Fig. 7. Both the distribution plots on the diagonal and the class-colored scatter plots show that the distributions of the five variables ALT, BMI, TG, γGT, and LDL differ clearly between classes. In other words, these attributes help identify NAFLD.

3.4. Machine Learning

The data are normalized by the StandardScaler function in sklearn's preprocessing module. Then, the train_test_split function in the sklearn package randomly assigns 70% of the data to the training set and the remaining 30% to the testing set.
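A minimal sketch of this preprocessing step, on stand-in data for the 11 selected features (as one common practice, the scaler here is fitted on the training split only to avoid leaking test statistics):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 11))   # stand-in for the 11 selected features
y = rng.integers(0, 2, size=1000)

# 70% training / 30% testing split, as in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the scaler on the training set only, then apply it to both splits.
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
```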

Ten basic machine learning models were used to classify the data. The cross_val_score and GridSearchCV functions in the sklearn package were used for parameter tuning to determine the optimal parameters achieving the highest score. GA, LR, RF, and NN had the highest accuracy among the 10 basic algorithms, so these four algorithms were integrated by the ensemble learning Voting method, using soft voting. The parameters of the GA, LR, RF, and NN algorithms are shown in Table 4.


Table 4

The parameters of the GA, LR, RF, and NN algorithms.

| Algorithm | Item | Parameter |
| --- | --- | --- |
| GA | generations | 5 |
| | population_size | 50 |
| | verbosity | 2 |
| LR | penalty | l1 |
| | solver | liblinear |
| | C | 0.9 |
| | max_iter | 200 |
| RF | n_estimators | 110 |
| | min_samples_split | 100 |
| | max_depth | 15 |
| | min_samples_leaf | 100 |
| NN | max_iter | 300 |
| | learning_rate | constant |
| | learning_rate_init | 0.001 |
| | alpha | 0.001 |
| | activation | relu |
| | solver | adam |
| | batch_size | auto |
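The tuning step can be sketched with GridSearchCV; the grid below covers only a few illustrative LR settings around the values of Table 4, scored by 10-fold cross-validation on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=11, random_state=0)

# Exhaustive search over a small parameter grid, with each candidate
# scored by 10-fold cross-validated accuracy.
grid = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=200),
    param_grid={"penalty": ["l1", "l2"], "C": [0.5, 0.9, 1.5]},
    cv=10,
    scoring="accuracy",
).fit(X, y)

best_params = grid.best_params_   # e.g. the winning penalty and C
best_score = grid.best_score_     # its mean cross-validated accuracy
```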

Using the plot_confusion_matrix function in the sklearn package, the confusion matrices of the 11 algorithms were obtained; they are shown in Fig. 8.

The evaluation metrics of each algorithm were calculated from its confusion matrix. The results for the 11 machine learning algorithms are shown in Table 5.


Table 5

The results of the evaluation metrics for the 11 machine learning algorithms.

| Algorithm | Accuracy | Recall | Precision | F1 | AUC |
| --- | --- | --- | --- | --- | --- |
| LR | 0.839741 | 0.517516 | 0.733634 | 0.606909 | 0.884480 |
| RF | 0.833270 | 0.500000 | 0.716895 | 0.589118 | 0.884459 |
| SVM | 0.811953 | 0.713376 | 0.587927 | 0.644604 | 0.866256 |
| DT | 0.778074 | 0.552548 | 0.534669 | 0.543461 | 0.713757 |
| LightGBM | 0.826037 | 0.585987 | 0.651327 | 0.616932 | 0.874366 |
| CatBoost | 0.791397 | 0.824841 | 0.541841 | 0.654040 | 0.890978 |
| NN | 0.843167 | 0.571656 | 0.715139 | 0.635398 | 0.888196 |
| KNN | 0.811572 | 0.527070 | 0.625709 | 0.572169 | 0.822654 |
| BN | 0.778835 | 0.804140 | 0.524403 | 0.634821 | 0.867345 |
| GA | 0.840502 | 0.511146 | 0.741339 | 0.605090 | 0.891156 |
| Voting | 0.846212 | 0.573248 | 0.725806 | 0.640569 | 0.894010 |

The algorithms performed differently. Among the 11 algorithms, the proposed Voting algorithm achieves the best accuracy (0.846212) and the best AUC (0.894010). GA achieves the best precision (0.741339), while CatBoost achieves the best recall (0.824841) and the best \({F}_{1}\) score (0.654040). The AUC is the most important evaluation metric, as further explained in the discussion section. Of the 11 machine learning algorithms, the proposed Voting algorithm demonstrated the best overall performance, achieving accuracy, recall, precision, \({F}_{1}\) score, and AUC of up to 0.846212, 0.573248, 0.725806, 0.640569, and 0.894010, respectively.

From the ROC curves in Fig. 9, the DT algorithm performs worst, followed by the KNN algorithm. When the ROC curves of classifiers are close to each other, the curves alone do not clearly indicate which classifier is better, so the AUC, the area under the ROC curve, is used as the evaluation criterion: the larger the AUC, the better the classifier. From the AUC values in the lower right corner of Fig. 9, the highest AUC among the single classification algorithms is that of GA (0.8912), while the proposed Voting algorithm achieves the highest AUC overall (0.8940).
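The ROC curve and its AUC can be computed with sklearn's roc_curve and auc; the labels and predicted probabilities below are illustrative only:

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Illustrative labels and predicted positive-class probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9, 0.6, 0.5]

# roc_curve sweeps the decision threshold and returns the (FPR, TPR)
# points of the curve; auc integrates the area under it.
fpr, tpr, _ = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)
```

For binary problems, roc_auc_score(y_true, y_score) computes the same quantity in one call.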

A randomly selected sample from the test data is input into the Voting model, and the LimeTabularExplainer function of Python's lime package is used to obtain interpretation information about the model. The interpretation of the Voting model is shown in Fig. 10. The positive predictive probability of this sample was 0.84 (> 0.5), so it was classified as NAFLD present. The top four features contributing to classifying this sample as positive were BMI, ALT, TG, and γGT, and the feature contributing most to classifying it as negative was LDL.

3.5. Comparison with Existing Studies

The proposed study was compared with relevant studies to demonstrate its reliability in the screening diagnosis of NAFLD, and Table 6 shows this comparison. According to the comparison results, the proposed Voting algorithm demonstrated the best performance.


Table 6

Comparison with existing studies.

| Year | Reference | Result |
| --- | --- | --- |
| 2018 | [9] | The highest accuracy is 0.8341, for the LR algorithm |
| 2019 | [36] | The AUC of the LR algorithm is 0.73 |
| 2020 | [37] | The highest AUC is 0.824809, for the RF algorithm |
| 2021 | [38] | The accuracy of the SVM algorithm is 0.71 |
| 2021 | [39] | The accuracy of the NN algorithm is 0.77; its AUC is 0.82 |
| 2022 | [40] | The highest accuracy is 0.79 (RF); the highest AUC is 0.84 (ElasticNet) |
| | Proposed Voting algorithm | The accuracy is 0.846212 and the AUC is 0.8940 |

4. Discussion

We used 11 of the most advanced machine learning algorithms to evaluate the best clinical prediction model for NAFLD. Based on the PPFS variable weight scores, the top 5 most discriminative features were ALT, BMI, TG, γGT, and LDL; therefore, we can focus more on these 5 features. The machine learning prediction results show that the proposed Voting algorithm has the best performance. It achieved accuracy, recall, precision, \({F}_{1}\) score, and AUC of up to 0.846212, 0.573248, 0.725806, 0.640569, and 0.894010, respectively.

Machine learning methods can identify patterns from data and build accurate predictive models for classification. Ensemble learning is a new machine learning method that can integrate multiple identical or different machine learning algorithms to obtain an optimal classification prediction model. Based on soft voting, four machine learning models, GA, LR, RF, and NN, are integrated to obtain the optimal Voting model. Nevertheless, our models have some limitations, such as the lack of model interpretability.

In this research, NAFLD was diagnosed by ultrasonography. Ultrasonography does not assess the severity of NAFLD and is not the gold standard for its diagnosis. Despite these limitations, ultrasonography is the most commonly used method and is reasonably accurate. In a future study, we intend to validate the predictive ability of the machine learning model against biopsy results.

5. Conclusions

The new machine learning methods can provide good screening and prediction of NAFLD. The application of these machine learning methods may enhance empirically based clinical decision-making, improve early diagnosis rates and reduce terminal complications.

Declarations

Acknowledgements: This research was funded by the National Natural Science Foundation of China (61763008, 71762008, 62166015) and the Guangxi Science and Technology Planning Project (2018GXNSFAA294131, 2018GXNSFAA050005).

Author Contributions: G. C. contributed to the conception and design of the work. G. C. and H. Z. contributed to the acquisition, analysis, and interpretation of the data and drafted the manuscript. All authors reviewed the manuscript.

Competing Interests: The authors declare no competing interests.

Institutional Review Board Statement: The study was approved by the Ethics Committee of the Guilin University of Technology, China.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The data used to support the findings of this study are available from the corresponding author upon request.

References

  1. Chalasani N, Younossi Z, Lavine J E, et al. The diagnosis and management of non‐alcoholic fatty liver disease: Practice Guideline by the American Association for the Study of Liver Diseases, American College of Gastroenterology, and the American Gastroenterological Association[J]. Hepatology, 2012, 55(6): 2005-2023.
  2. Williams C D, Stengel J, Asike M I, et al. Prevalence of nonalcoholic fatty liver disease and nonalcoholic steatohepatitis among a largely middle-aged population utilizing ultrasound and liver biopsy: a prospective study[J]. Gastroenterology, 2011, 140(1): 124-131.
  3. Sanyal A J, Brunt E M, Kleiner D E, et al. Endpoints and clinical trial design for nonalcoholic steatohepatitis[J]. Hepatology, 2011, 54(1): 344-353.
  4. Estes C, Anstee Q M, Arias-Loste M T, et al. Modeling nafld disease burden in china, france, germany, italy, japan, spain, united kingdom, and united states for the period 2016–2030[J]. Journal of hepatology, 2018, 69(4): 896-904.
  5. Bedogni G, Bellentani S, Miglioli L, et al. The Fatty Liver Index: a simple and accurate predictor of hepatic steatosis in the general population[J]. BMC gastroenterology, 2006, 6(1): 1-7.
  6. Wang J, Xu C, Xun Y, et al. ZJU index: a novel model for predicting nonalcoholic fatty liver disease in a Chinese population[J]. Scientific reports, 2015, 5(1): 1-10.
  7. Lee J H, Kim D, Kim H J, et al. Hepatic steatosis index: a simple screening tool reflecting nonalcoholic fatty liver disease[J]. Digestive and Liver Disease, 2010, 42(7): 503-508.
  8. Wieckowska A, Feldstein A E. Diagnosis of nonalcoholic fatty liver disease: invasive versus noninvasive[C]//Seminars in liver disease. © Thieme Medical Publishers, 2008, 28(04): 386-395.
  9. Ma H, Xu C, Shen Z, et al. Application of machine learning techniques for clinical predictive modeling: a cross-sectional study on nonalcoholic fatty liver disease in China[J]. BioMed research international, 2018, 2018.
  10. Yoo T K, Kim S K, Kim D W, et al. Osteoporosis risk prediction for bone mineral density assessment of postmenopausal women using machine learning[J]. Yonsei medical journal, 2013, 54(6): 1321-1330.
  11. Choi S B, Kim W J, Yoo T K, et al. Screening for prediabetes using machine learning models[J]. Computational and mathematical methods in medicine, 2014, 2014.
  12. Lee C L, Liu W J, Tsai S F. Development and Validation of an Insulin Resistance Model for a Population with Chronic Kidney Disease Using a Machine Learning Approach[J]. Nutrients, 2022, 14(14): 2832.
  13. Fan J G, Jia J D, Li Y M, et al. Guidelines for the diagnosis and management of nonalcoholic fatty liver disease: update 2010:(published in Chinese on Chinese Journal of Hepatology 2010, 18: 163-166)[J]. Journal of digestive diseases, 2011, 12(1): 38-44.
  14. Hassan A, Paik J H, Khare S, et al. PPFS: Predictive Permutation Feature Selection[J]. arXiv preprint arXiv:2110.10713, 2021.
  15. Wang Y, Gao X, Ru X, et al. Identification of gene signatures for COAD using feature selection and Bayesian network approaches[J]. Scientific Reports, 2022, 12(1): 1-13.
  16. Sumner M, Frank E, Hall M. Speeding up logistic model tree induction[C]//European conference on principles of data mining and knowledge discovery. Springer, Berlin, Heidelberg, 2005: 675-683.
  17. Breiman L. Random forests[J]. Machine learning, 2001, 45(1): 5-32.
  18. Mining W I D. Data mining: Concepts and techniques[J]. Morgan Kaufmann, 2006, 10: 559-569.
  19. Jiang L, Li C, Cai Z. Learning decision tree for ranking[J]. Knowledge and Information Systems, 2009, 20(1): 123-135.
  20. Ke G, Meng Q, Finley T, et al. Lightgbm: A highly efficient gradient boosting decision tree[J]. Advances in neural information processing systems, 2017, 30.
  21. Veronika Dorogush A, Ershov V, Gulin A. CatBoost: gradient boosting with categorical features support[J]. arXiv e-prints, 2018: arXiv: 1810.11363.
  22. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks[C]//Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 2010: 249-256.
  23. Mining W I D. Data mining: Concepts and techniques[J]. Morgan Kaufmann, 2006, 10: 559-569.
  24. Jiang L, Li C, Wang S. Cost-sensitive Bayesian network classifiers[J]. Pattern Recognition Letters, 2014, 45: 211-216.
  25. Le T T, Fu W, Moore J H. Scaling tree-based automated machine learning to biomedical big data with a feature set selector[J]. Bioinformatics, 2020, 36(1): 250-256.
  26. Yang N C, Ismail H. Voting-based ensemble learning algorithm for fault detection in photovoltaic systems under different weather conditions[J]. Mathematics, 2022, 10(2): 285.
  27. Yan B, Ye X, Wang J, et al. An Algorithm Framework for Drug-Induced Liver Injury Prediction Based on Genetic Algorithm and Ensemble Learning[J]. Molecules, 2022, 27(10): 3112.
  28. Husain A, Khan M H. Early diabetes prediction using voting based ensemble learning[C]//International conference on advances in computing and data sciences. Springer, Singapore, 2018: 95-103.
  29. Mining W I D. Data mining: Concepts and techniques[J]. Morgan Kaufmann, 2006, 10: 559-569.
  30. Koller D, Friedman N. Probabilistic Graphical Models: Principles and Techniques, ser. Adaptive computation and machine learning[J]. MIT Press, 2009, 11: 16-19.
  31. Cover T M. Elements of information theory[M]. John Wiley & Sons, 1999.
  32. Friedman N, Geiger D, Goldszmidt M. Bayesian network classifiers[J]. Machine learning, 1997, 29(2): 131-163.
  33. Systique H. Machine learning based network anomaly detection[J]. Int. J. Recent Technol. Eng, 2019, 8: 542-548.
  34. Fawcett T. An introduction to ROC analysis[J]. Pattern recognition letters, 2006, 27(8): 861-874.
  35. Hand D J, Till R J. A simple generalisation of the area under the ROC curve for multiple class classification problems[J]. Machine learning, 2001, 45(2): 171-186.
  36. Canbay A, Kälsch J, Neumann U, et al. Non-invasive assessment of NAFLD as systemic disease—a machine learning perspective[J]. PloS one, 2019, 14(3): e0214436.
  37. Bangash A H. Leveraging AutoML to provide NAFLD screening diagnosis: Proposed machine learning models[J]. medRxiv, 2020.
  38. Panigrahi S, Deo R, Liechty E A. A New Machine Learning-Based Complementary Approach for Screening of NAFLD (Hepatic Steatosis)[C]//2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2021: 2343-2346.
  39. Sorino P, Campanella A, Bonfiglio C, et al. Development and validation of a neural network for NAFLD diagnosis[J]. Scientific Reports, 2021, 11(1): 1-13.
  40. Noureddin M, Ntanios F, Malhotra D, et al. Predicting NAFLD prevalence in the United States using National Health and Nutrition Examination Survey 2017–2018 transient elastography data and application of machine learning[J]. Hepatology Communications, 2022.