After infecting the host, Schistosoma japonicum produces a large number of eggs and deposits them in tissues such as the liver. Without timely and effective intervention, the resulting egg granulomas and liver fibrosis may progress to hepatocellular carcinoma [7]. Studies have shown that liver fibrosis is not a uniformly irreversible process and has the potential to regress [8]. Early diagnosis and treatment of liver fibrosis are therefore valuable. At present, schistosomiasis has not attracted enough attention in the major endemic countries, so clinical and basic research on the disease lags behind, and few basic data are available on schistosomiasis liver fibrosis [9]. This study constructs a diagnostic model to predict the risk of liver fibrosis, which has clinical significance for early and appropriate treatment and intervention.
This study uses a machine learning model to predict liver fibrosis in schistosomiasis japonicum, helping clinicians understand how key factors affect liver fibrosis. The model supports early identification of liver fibrosis and grading of its severity, so that patients with early liver fibrosis can be detected in time and their prognosis improved. Data from 1049 patients with schistosomiasis japonicum were analyzed to build a liver fibrosis prediction model with machine learning algorithms that helps identify patients at high risk. The resulting model discriminates well and shows satisfactory specificity and sensitivity.
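For readers less familiar with these two metrics, the following minimal sketch shows how sensitivity and specificity are obtained from binary predictions. It is illustrative only and is not the pipeline used in this study; the function name `sensitivity_specificity` and the example labels are assumptions introduced here.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 = fibrosis, 0 = no fibrosis.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # share of fibrosis cases the model detects
    specificity = tn / (tn + fp)  # share of non-fibrosis cases correctly ruled out
    return sensitivity, specificity


# Made-up labels and predictions, for demonstration only.
example_true = [1, 1, 1, 0, 0, 0, 0, 1]
example_pred = [1, 1, 0, 0, 0, 1, 0, 1]
example_sens, example_spec = sensitivity_specificity(example_true, example_pred)
```

In this toy example, three of four fibrosis cases and three of four non-fibrosis cases are classified correctly, so both metrics equal 0.75.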
After 10 key factors were selected, six machine learning algorithms were used for classification. Compared with the other models, the LightGBM algorithm performed better and was more stable, and the optimal model achieved an AUC of 0.8367. In the variable-importance evaluation, the top three indicators contributing positively to the liver fibrosis outcome were neutrophils, red blood cells, and age, while the indicators with the largest negative contributions were the red cell distribution width standard deviation (RDW-SD) and the mean corpuscular volume (MCV). Apart from patient age, all of these indicators come from the routine blood test.
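The AUC reported above can be read as the probability that a randomly chosen patient with liver fibrosis receives a higher predicted score than a randomly chosen patient without it. The sketch below computes the AUC directly from that pairwise definition. It is a pure-Python illustration under assumed inputs, not the evaluation code of this study; the name `roc_auc` and the toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation).

    labels: 1 for fibrosis, 0 for no fibrosis; scores: predicted risk, higher = more likely.
    Hypothetical helper for illustration; quadratic in sample size, fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up scores, for demonstration only.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Here three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.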
Overall, the key variables included in the model play an important role in the early diagnosis of Schistosoma japonicum liver fibrosis. Previous reports indicate a close relationship between routine blood indicators and liver fibrosis [10], and the results of this study support this association. The neutrophil-to-lymphocyte ratio (NLR) is widely used to assess inflammatory diseases. In patients with nonalcoholic fatty liver disease (NAFLD), NLR was significantly correlated with liver fibrosis stage and the NAFLD activity score (NAS), whereas in patients with chronic hepatitis B (CHB), NLR was negatively correlated with liver fibrosis stage [11–14]. NLR may therefore be associated with the stage of liver fibrosis. Kekilli et al. also demonstrated that NLR reflects the severity of advanced liver fibrosis [15]. Red cell distribution width (RDW) reflects the heterogeneity of red blood cell volume; it is often used to differentiate types of anemia and is closely related to the body's inflammation and nutritional status. Elevated RDW often indicates a shortened lifespan and increased destruction of red blood cells. Michalak et al. suggest that RDW and its derivatives may be related to the deterioration of liver function [16]. Studies have shown that RDW is closely related to liver fibrosis in diseases such as NAFLD and CHB [17–19]. RDW can be expressed as RDW-CV or RDW-SD. RDW-SD is the width of the red blood cell volume distribution curve measured at the 20% height level above baseline. One study [20] showed that RDW-SD is closely related to significant liver fibrosis (F2-F4) in CHB and can serve as an effective predictor of it. Liu et al. [21–23] also found that only RDW-SD differed significantly between stages of liver fibrosis in autoimmune hepatitis (AIH) (P = 0.046).
In univariate logistic regression analysis, RDW-SD was a risk factor for advanced liver fibrosis (F3-F4) in AIH. MCV reflects the volume of red blood cells, and changes in MCV suggest that the patient's hemoglobin synthesis is impaired. Liu et al. [21] further found that MCV differed significantly among stages of liver fibrosis in AIH and was positively correlated with the severity of liver fibrosis. Together, MCV and RDW describe the dispersion of peripheral red blood cell volume. The mechanism linking RDW and MCV to liver fibrosis remains unclear, and may include the following: (1) inflammatory cytokines may inhibit the maturation of red blood cells and accelerate the entry of newer and larger reticulocytes into the peripheral circulation, increasing RDW; (2) patients with liver disease often have reduced intestinal absorption, leading to deficiencies of folic acid, vitamin B12, and other nutrients, and in turn to varying degrees of megaloblastic anemia and heterogeneous red blood cell volume; (3) hepatic fibrosis often causes splenomegaly and hypersplenism, which accelerate red blood cell destruction and shorten red blood cell lifespan, and this may promote the release of immature red blood cells and eventually increase RDW [17,24,25]. These studies provide a theoretical basis for the correlation between routine blood indicators and liver fibrosis, but they do not clearly quantify the strength of the correlation or the degree of liver function deterioration, nor do they offer a means of predicting early liver fibrosis. Machine learning can make up for this deficiency. This study also finds that age is a key variable associated with liver fibrosis in schistosomiasis japonicum: the model predicts that the older the patient, the greater the likelihood of liver fibrosis.
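The two derived indices discussed above follow from standard definitions: NLR is the neutrophil count divided by the lymphocyte count, and RDW-CV is the standard deviation of red blood cell volume divided by MCV, expressed as a percentage. The sketch below is a minimal illustration of these formulas; the function names, parameter names, and example values are assumptions introduced here and are not taken from the study's data. Note that the standard deviation used for RDW-CV is not the same quantity as RDW-SD, which is a width read from the volume distribution curve.

```python
def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """NLR = absolute neutrophil count / absolute lymphocyte count (same units, e.g. 10^9/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def rdw_cv_percent(volume_sd_fl, mcv_fl):
    """RDW-CV (%) = (standard deviation of red cell volume / MCV) * 100, both in femtoliters.

    volume_sd_fl is the statistical SD of cell volume, not the RDW-SD curve width.
    """
    if mcv_fl <= 0:
        raise ValueError("MCV must be positive")
    return 100.0 * volume_sd_fl / mcv_fl


# Made-up values, for demonstration only.
example_nlr = neutrophil_lymphocyte_ratio(4.2, 1.4)   # about 3.0
example_rdw_cv = rdw_cv_percent(11.7, 90.0)           # about 13.0 %
```

Because RDW-CV is normalized by MCV while RDW-SD is not, the two measures can move differently when mean cell volume changes, which is one reason the literature cited above reports them separately.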
The value of the machine learning method in this study lies in establishing a clinical prediction and identification model that uses only simple routine blood indicators and patient age to support the diagnosis of complex liver fibrosis.
This study built and evaluated a machine learning model using abundant data. Compared with models in the published literature, this model needs only routine blood test results, age, and gender for prediction, giving clinicians a diagnostic method that is easier to operate and understand.
This study also has limitations. It is a single-center retrospective study and may not avoid inherent selection bias and information bias. The next step is multi-center prospective research for external validation. The variables of the current model include only patients' clinical information and test results. To improve the performance of the identification model, biomarkers from the microbiome and metabolomics could also be incorporated. However, using only clinical variables reduces the burden on patients to some extent and is convenient in clinical application. Finally, the limited interpretability of SHAP values warrants the development of more understandable models. In the future, we will develop an automatic clinical scoring system based on nomograms or machine learning from the research data, to provide clinicians with more practical and easy-to-understand tools.