Diagnosis of significant liver fibrosis in patients with chronic hepatitis B using a deep learning-based data integration network

Chronic hepatitis B virus (CHB) infection remains a major global health burden, and the non-invasive and accurate diagnosis of significant liver fibrosis (≥ F2) in CHB patients is clinically very important. This study aimed to assess the potential of the joint use of ultrasound images of liver parenchyma, liver stiffness values, and patients’ clinical parameters in a deep learning model to improve the diagnosis of ≥ F2 in CHB patients. Of 527 CHB patients who underwent ultrasound (US) examination, liver elastography and biopsy, 284 eligible patients were included. We developed a deep learning-based data integration network (DI-Net) to fuse the information of ultrasound images of liver parenchyma, liver stiffness values and patients’ clinical parameters for diagnosing ≥ F2 in CHB patients. The performance of DI-Net was cross-validated in a main cohort (n = 155) of the included patients and externally validated in an independent cohort (n = 129), with comparisons against single-source data-based models and other non-invasive methods in terms of the area under the receiver-operating-characteristic curve (AUC). DI-Net achieved an AUC of 0.943 (95% confidence interval [CI] 0.893–0.973) in the cross-validation and an AUC of 0.901 (95% CI 0.834–0.945) in the external validation, which were significantly greater than those of the comparative methods (AUC ranges: 0.774–0.877 and 0.741–0.848 for cross- and external validations, respectively; all p < 0.01). The joint use of ultrasound images of liver parenchyma, liver stiffness values, and patients’ clinical parameters in a deep learning model could significantly improve the diagnosis of ≥ F2 in CHB patients.


Introduction
Chronic hepatitis B virus (CHB) infection remains a major global health burden, with approximately 257 million people infected worldwide and 800,000 deaths annually [1][2][3]. Alterations in liver structure and function in CHB patients may lead to the development of liver fibrosis. According to the METAVIR system [4], liver fibrosis can be stratified into the following stages: F0, no fibrosis; F1, portal fibrosis without septa; F2, portal fibrosis and few septa; F3, numerous septa without cirrhosis; and F4, cirrhosis. Without proper treatment, liver fibrosis may progress from F0 through F4 to hepatocellular carcinoma (HCC) or even liver failure [2,3,5]; however, if treated in time, the progression of fibrosis can be reversed [3]. Therefore, for better management of liver fibrosis in CHB patients, accurate staging of fibrosis is essential.
Currently, liver biopsy remains the gold standard for liver fibrosis staging, but it is associated with potential complications and subject to several limitations, such as sampling error and inter-observer variability. As an alternative to liver biopsy, image analysis based on computed tomography, magnetic resonance or ultrasound (US) imaging is commonly used for non-invasive staging of liver fibrosis. Among these imaging modalities, US is preferred because it involves no ionizing radiation and is widely available. Studies have shown that US image features, such as an uneven or undulating liver surface, heterogeneous echo texture of the liver parenchyma, and changes in vessel diameters, blood flow velocity, and spleen size, are correlated with liver fibrosis [6][7][8]. Hence, in clinical practice, visual assessment of US image features is often performed for fibrosis staging and screening prior to liver biopsy. However, visual assessment of US images is subjective, and its accuracy depends heavily on the experience of the radiologist.
Apart from visual assessment, other non-invasive methods exist for liver fibrosis staging, such as fibrosis biomarkers based on patients' clinical parameters [9][10][11], fibrosis detectors based on liver stiffness measurements (LSMs) [12,13], and artificial intelligence (AI) applied to US images [14][15][16]. In particular, deep convolutional neural networks (DCNNs), a subtype of AI, have developed rapidly and are becoming a promising tool for US image analysis. Through supervised learning on a large dataset of labeled images, a DCNN model can be trained to provide an objective and automated evaluation of the disease reflected in US images. Many studies have explored the potential of DCNN models for assessing liver fibrosis on US images. However, existing DCNN models often performed well in diagnosing liver cirrhosis (F4) but poorly in detecting significant liver fibrosis (≥ F2) [14,15], whereas the identification of ≥ F2 is particularly important because it signals the need for anti-fibrotic treatment [17]. Thus, a new DCNN model is needed for better diagnosis of ≥ F2.
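The fibrosis biomarkers mentioned above compute a score from routine laboratory values. As an illustration, the two biomarkers used as comparators in this study, APRI and FIB-4 [9,10], follow widely published formulas; the sketch below encodes them with hypothetical function names, and it does not reproduce the diagnostic cut-offs applied in the study.

```python
import math


def apri(ast_iu_l, ast_uln_iu_l, platelets_1e9_l):
    """AST-to-platelet ratio index.

    (AST / upper limit of normal for AST) * 100 / platelet count,
    with AST in IU/L and platelets in 10^9/L.
    """
    return ast_iu_l / ast_uln_iu_l * 100.0 / platelets_1e9_l


def fib4(age_years, ast_iu_l, alt_iu_l, platelets_1e9_l):
    """Fibrosis-4 index: age * AST / (platelets * sqrt(ALT))."""
    return age_years * ast_iu_l / (platelets_1e9_l * math.sqrt(alt_iu_l))


print(apri(80, 40, 200))      # 1.0
print(fib4(50, 40, 16, 100))  # 5.0
```

Both indexes depend only on a handful of serological values, which is why they are widely available but, as discussed later, limited in detecting intermediate fibrosis stages.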
Most existing deep learning models rely solely on single-source US image data and neglect other data sources, such as the previously mentioned LSMs and patients' clinical parameters, which also have the potential to reflect the liver fibrosis stage. It is therefore worth investigating whether the performance of DCNN models in diagnosing ≥ F2 can be further enhanced using multi-source data. Accordingly, the aim of this study was to assess the potential of the joint use of US images of liver parenchyma, liver stiffness values, and patients' clinical parameters in a deep learning model to improve the diagnosis of ≥ F2 in CHB patients.

Patients
All consecutive patients with CHB (hepatitis B surface antigen positive for more than 6 months) who underwent liver biopsy from May 2016 to January 2021 at Shenzhen Third People's Hospital (SZTPH), Shenzhen, China, were studied. Patients who had received a US B-mode examination and LSM via point shear-wave elastography (P-SWE) within one week prior to liver biopsy were included. Exclusion criteria were: (a) age less than 18 years; (b) antiviral treatment within 6 months prior to liver biopsy; (c) any coexisting liver disease, including autoimmune hepatitis, alcoholic liver disease, ascites, and HCC; (d) co-infection with any other viral hepatitis; (e) liver samples shorter than 10 mm or containing fewer than 6 portal tracts; (f) missing required serological results; and (g) unsuccessful P-SWE. According to the eligibility criteria, 385 patients were enrolled, of whom 284 with complete required data were finally included for analysis (Fig. 1). This study was approved by the ethics committee of SZTPH. Written informed consent was obtained from each enrolled patient.

US B-mode and P-SWE examinations
A conventional US examination followed by P-SWE was conducted after the patient had fasted overnight (at least 8 h). The examinations were performed on one of two US systems, Philips iU22 or Mindray Resona7, both of which support P-SWE and are routinely used at SZTPH. For the conventional US examination, a linear-array probe (L9-3 for Philips iU22, L11-3U for Mindray Resona7) was used to capture US B-mode images of the liver surface, while a convex-array probe (C14L5 for Philips iU22, SC6U-1 for Mindray Resona7) was used to capture B-mode images of the liver parenchyma and spleen. During the examination, the patient was instructed to remain supine with both arms extended above the head, and static B-mode images were acquired. Immediately after the conventional US examination, the patient was asked to remain supine with the right arm in maximum abduction, and the sonographer switched the scanning mode to P-SWE for LSMs. Of the 284 included patients, 155 were examined on the Philips system and the remaining 129 on the Mindray system.

Pathological examination
The pathological examination was performed within one week after the US examination. A US-guided percutaneous biopsy of the right liver lobe was performed using a 16-gauge biopsy needle (MC1616 [16G, 16 cm], BARD, New Jersey, USA). Each biopsied liver sample was 10–20 mm in length and contained at least 6 portal areas; it was embedded in paraffin, sectioned, and stained with Sirius Red (S1020, Wuhan Haotian Bioscience Technology Limited Company, Hubei, China). Liver fibrosis was semi-quantitatively assessed according to the METAVIR system [4], and stages ≥ F2 indicated significant fibrosis. The pathological results served as the ground truth for subsequent model training and validation.

Model development
We devised a deep learning-based data integration network (DI-Net) to fuse the information of US images of liver parenchyma, liver stiffness values and patients' clinical parameters for diagnosing ≥ F2 in the included CHB patients. DI-Net (Fig. 2) has two branches: Branch-1, whose input is the B-mode image of the liver parenchyma, and Branch-2, whose input is the clinical + LSM data. Branch-1 is a DCNN with a ResNet50 backbone [20] that automatically extracts deep features from B-mode images (the feature set is denoted B and is 512 in size; see details in the Supplementary Material). Branch-2 compacts the clinical parameters (C) and liver stiffness values (S) into a single feature vector (C + S, 23 in size). At the backend of the model (Fig. 2, Step-3), the features of the two branches are fused via a feature fusion module consisting of three feature units (FUs). The first unit, FU-1, has two fully connected (FC) layers whose numbers of neurons decrease from 512 to 23, so that the B features from Branch-1 are condensed into a feature set whose dimension matches the number of C + S parameters. The second unit, FU-2, also has two FC layers, each with as many neurons as there are C + S parameters; in this way, the C + S parameters are first combined internally to strengthen their association with the class labels before being fused with the B features. Finally, the feature vectors output by FU-1 and FU-2, each of size 23, are concatenated by FU-3 into a comprehensive feature set (46 in size), which is then processed by two FC layers (with 46 and 2 neurons, respectively; see Step-4 of Fig. 2) to predict the final class label: ≥ F2 or < F2.
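To make the dimension bookkeeping of the fusion module concrete, the NumPy sketch below runs a forward pass through FU-1, FU-2, FU-3 and the Step-4 head with random weights. It is a shape-level illustration only, not the authors' implementation: the ReLU activations, the random initialization, and the reading of FU-1 as layers of 512 and then 23 neurons are assumptions, and the names `fc`, `run` and `di_net_fusion` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
B_DIM, CS_DIM = 512, 23  # sizes of the B and C + S feature sets stated in the text


def fc(n_in, n_out):
    """Hypothetical FC layer: He-style random weights and zero bias."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# FU-1: two FC layers, assumed 512 -> 512 -> 23 neurons
fu1 = [fc(B_DIM, 512), fc(512, CS_DIM)]
# FU-2: two FC layers, each with 23 neurons
fu2 = [fc(CS_DIM, CS_DIM), fc(CS_DIM, CS_DIM)]
# Step-4 head: FC layers with 46 and 2 neurons
head = [fc(2 * CS_DIM, 2 * CS_DIM), fc(2 * CS_DIM, 2)]


def run(layers, x, last_activation=True):
    """Apply FC layers in order; ReLU after each (optionally not after the last)."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if last_activation or i < len(layers) - 1:
            x = relu(x)
    return x


def di_net_fusion(b_feat, cs_feat):
    """Fuse B (batch, 512) and C + S (batch, 23) features into (batch, 2) probabilities."""
    b_red = run(fu1, b_feat)                        # FU-1 output: (batch, 23)
    cs_mix = run(fu2, cs_feat)                      # FU-2 output: (batch, 23)
    fused = np.concatenate([b_red, cs_mix], axis=1)  # FU-3: (batch, 46)
    logits = run(head, fused, last_activation=False)  # (batch, 2)
    return softmax(logits)                           # P(< F2), P(>= F2)


probs = di_net_fusion(rng.normal(size=(4, B_DIM)), rng.normal(size=(4, CS_DIM)))
print(probs.shape)  # (4, 2)
```

The key design point the sketch shows is that both inputs reach FU-3 with the same width (23), so neither feature set dominates the concatenated 46-dimensional vector by sheer size.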

Model training and validation
Patients were divided into two cohorts according to the US system on which they were examined. Patients examined on the Philips system (n = 155) formed the main cohort, whose data constituted an internal dataset for model training and cross-validation. The remaining patients, examined on the Mindray system (n = 129), served as an independent cohort, whose data constituted an additional dataset for external validation of the model trained on the internal dataset.
We used fivefold cross-validation to train and validate DI-Net on the internal dataset. Specifically, the internal dataset was divided into five subsets with no patient overlap; three subsets were used for training (see details in the Supplementary Material), one of the remaining two for model optimization, and the last for validation. By rotating the training, optimization and validation folds five times, each fold was used once for validation. The models trained on the internal dataset were additionally validated on the external dataset. Because cross-validation yielded five models, the class probabilities predicted by these models were averaged for final decision-making in the external validation.
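The fold rotation and the ensemble averaging described above can be sketched as follows. The random patient-level shuffle and the rule "fold k validates, fold k + 1 (mod 5) optimizes, the other three train" are assumptions chosen for illustration; the text does not specify how folds were assigned or paired, and the function names are hypothetical.

```python
import random


def five_fold_splits(patient_ids, seed=0):
    """Return five (train, optimization, validation) patient lists.

    Assumed rotation: fold k is used for validation, fold (k + 1) % 5 for
    model optimization, and the remaining three folds for training.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)         # assumed random assignment
    folds = [ids[i::5] for i in range(5)]    # disjoint, patient-level folds
    splits = []
    for k in range(5):
        opt_k = (k + 1) % 5
        train = []
        for j, fold in enumerate(folds):
            if j not in (k, opt_k):
                train.extend(fold)
        splits.append((train, folds[opt_k], folds[k]))
    return splits


def ensemble_probability(per_model_probs):
    """Average each external patient's P(>= F2) over the fold models.

    per_model_probs[m][i] is model m's probability for external patient i.
    """
    n_models = len(per_model_probs)
    return [sum(p) / n_models for p in zip(*per_model_probs)]


splits = five_fold_splits(range(155))
print([len(s) for s in splits[0]])  # [93, 31, 31]
print(ensemble_probability([[0.25, 0.75], [0.75, 0.25]]))  # [0.5, 0.5]
```

Splitting by patient identifier, rather than by image, is what guarantees the "no patient overlap" condition stated in the text.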

Evaluation metrics and comparative methods
The diagnostic capacity of DI-Net was evaluated in terms of accuracy (ACC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), the receiver-operating-characteristic (ROC) curve, and the area under the ROC curve (AUC). We compared the performance of DI-Net against the following methods: (1) single-source data-based models, i.e., models trained with single-source data of B-mode images, liver stiffness values, or patients' clinical parameters; (2) the fibrosis biomarkers APRI (aspartate aminotransferase-to-platelet ratio index) and FIB-4 (fibrosis-4 index) [9,10]; and (3) interpretations by skilled radiologists. Further details of the comparative methods are presented in the Supplementary Material.
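For reference, the five threshold-based metrics follow from the 2 × 2 confusion matrix, and the AUC equals the probability that a randomly chosen ≥ F2 patient receives a higher score than a randomly chosen < F2 patient (the Mann-Whitney formulation, with ties counted as one half). The stdlib sketch below assumes labels coded as 1 for ≥ F2 and 0 for < F2, and binary predictions already obtained by thresholding the model output; the threshold itself is not specified here and the function names are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """ACC, SEN, SPE, PPV and NPV from binary labels (1 = >= F2, 0 = < F2)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "ACC": (tp + tn) / len(pairs),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


def auc_mann_whitney(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(confusion_metrics([1, 1, 0, 0], [1, 0, 0, 1]))  # every metric is 0.5
print(auc_mann_whitney([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.4]))  # 0.875
```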

Statistical analysis
Continuous variables were expressed as mean ± standard deviation and compared using Student's t test, while categorical variables were expressed as number (percentage) and compared using the Chi-square test. For ACC, SEN, SPE, PPV, NPV and AUC, 95% confidence intervals (CIs) were calculated using the Clopper-Pearson method [21]. The DeLong test [22] was used to evaluate differences between ROC curves and the associated AUC values. All p values were two-sided, and values less than 0.05 indicated statistical significance. All statistical analyses were performed using MedCalc (version 18.2) and MATLAB (version R2019a).
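The exact Clopper-Pearson interval [21] for a proportion such as sensitivity (x correct detections out of n positive cases) can be obtained by inverting the binomial tail probabilities: the lower bound is the p at which P(X ≥ x) = α/2, and the upper bound is the p at which P(X ≤ x) = α/2. The stdlib sketch below solves both equations by bisection; it illustrates the method and is not the MedCalc or MATLAB routine used in the study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05, tol=1e-10):
    """Exact two-sided (1 - alpha) CI for a binomial proportion x / n."""

    def bisect(f, target, increasing):
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            # move toward the root depending on the monotonic direction of f
            if (f(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # lower bound: P(X >= x) = alpha / 2, which increases with p
    lower = 0.0 if x == 0 else bisect(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # upper bound: P(X <= x) = alpha / 2, which decreases with p
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


print(clopper_pearson(5, 10))  # approximately (0.187, 0.813)
```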

Patient characteristics
A total of 527 CHB patients who underwent liver biopsy, conventional US and P-SWE examinations were studied. Of these, 142 (26.9%) did not meet the eligibility criteria and were excluded (Fig. 1). Among the 385 enrolled patients, 101 (26.2%) had incomplete data: 73 lacked the required serological parameters, and 28 lacked successful P-SWE measurements. Finally, 284 patients were included for analysis, of which 155 (mean age 37.9 years, standard deviation 10 […] Table 1. […] Table 2. Notably, the quantitative indexes of DI-Net were universally higher than those of every single-source data-based method. The ROC curve of DI-Net is shown in Fig. 3a, together with the curves of the single-source data-based models for comparison. In the same validation, the overall performance of DI-Net was also significantly better than that of APRI (AUC, […] external validation). The ROC curves yielded by the models with different data sources and those of APRI, FIB-4 and radiologists' interpretations are plotted in Fig. 3c and d, respectively.
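The patient flow reported above can be cross-checked with simple arithmetic. The snippet below only re-derives the stated counts and percentages; the variable names are ours.

```python
screened = 527
excluded_ineligible = 142
enrolled = screened - excluded_ineligible          # 385

missing_serology, failed_pswe = 73, 28
incomplete = missing_serology + failed_pswe        # 101
analysed = enrolled - incomplete                   # 284

philips_cohort, mindray_cohort = 155, 129          # main and independent cohorts

print(f"{excluded_ineligible / screened:.1%}")     # 26.9%
print(f"{incomplete / enrolled:.1%}")              # 26.2%
print(philips_cohort + mindray_cohort == analysed) # True
```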

Discussion
The accurate and non-invasive diagnosis of ≥ F2 is crucial for the management of patients with chronic liver disease [2,3,23], but achieving this goal is technically challenging. As one type of non-invasive method, fibrosis biomarkers have the advantages of high applicability, good inter-laboratory reproducibility, and wide availability. However, many studies have shown that these biomarkers are less accurate in diagnosing significant fibrosis than cirrhosis [24,25]. In our study, the diagnostic performances of two commonly used fibrosis biomarkers, APRI and FIB-4, in […] The lower accuracy of non-invasive methods in diagnosing ≥ F2 than F4 is probably because the changes of liver fibrosis at intermediate stages are subtle and heterogeneous [24,26,27]. Single-source data of US images [14,15], liver stiffness values [13], or biomarkers [11,24,25] may not be sufficient to capture such changes. The results of this study indicate that the information carried by US B-mode images of liver parenchyma can be complemented by liver stiffness values and patients' clinical parameters to improve the diagnosis of ≥ F2. Therefore, the joint use of multi-source data has the potential to reflect the subtle changes of liver fibrosis better than single-source data alone. To further verify this point, we replaced the network in Branch-1 of DI-Net with other popular DCNN backbones, including VGG16 [28], Inception V3 [29], and DenseNet121 [30], and validated the resulting models on our dataset. The results of these models, together with those of the model based on the ResNet50 backbone, are summarized in Table 1 of the Supplementary Material. All models that additionally used liver stiffness values and patients' clinical parameters achieved significantly better performance than those using only B-mode images, in both cross-validation (AUC ranges: 0.908–0.943 vs. 0.773–0.873, all p < 0.01, DeLong test) and external validation (AUC ranges: 0.865–0.901 vs. 0.702–0.806, all p < 0.01, DeLong test). These results further support the added value of liver stiffness measurements and clinical parameters, beyond B-mode images of liver parenchyma, for improving the diagnosis of ≥ F2.
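The paired AUC comparisons above rely on the DeLong test [22], which accounts for the correlation between two ROC curves computed on the same patients. A minimal NumPy sketch of DeLong's nonparametric test is shown below; it is an illustration rather than the MedCalc implementation used in the study, the function names are hypothetical, and it assumes the variance of the AUC difference is non-zero.

```python
import math

import numpy as np


def _placement_values(pos, neg):
    """DeLong structural components for one score vector.

    psi = 1 if a positive outscores a negative, 0.5 for a tie, 0 otherwise.
    Returns V10 (one value per positive), V01 (one per negative) and the AUC.
    """
    gt = (pos[:, None] > neg[None, :]).astype(float)
    eq = (pos[:, None] == neg[None, :]).astype(float)
    psi = gt + 0.5 * eq
    return psi.mean(axis=1), psi.mean(axis=0), psi.mean()


def delong_paired(scores_a, scores_b, labels):
    """Two-sided DeLong test of AUC_a vs AUC_b on the same cases (1 = >= F2)."""
    y = np.asarray(labels).astype(bool)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    v10_a, v01_a, auc_a = _placement_values(a[y], a[~y])
    v10_b, v01_b, auc_b = _placement_values(b[y], b[~y])
    m, n = int(y.sum()), int((~y).sum())
    # 2 x 2 covariance matrices of the placement values (rows = models)
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    s = s10 / m + s01 / n
    var_diff = s[0, 0] + s[1, 1] - 2.0 * s[0, 1]
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return float(auc_a), float(auc_b), float(z), p
```

For example, `delong_paired(model_scores, bmode_scores, labels)` would return both AUCs, the z statistic and the two-sided p value for one pair of models evaluated on the same validation fold.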
The performance improvement achieved by DI-Net also benefited from the specialized design of the feature fusion module. To examine this, we replaced the feature fusion module in DI-Net with a functional unit that directly concatenated the features output from the two branches, adjusting the number of neurons in the penultimate FC layer accordingly. The new model, named DCNN-DFC (deep convolutional neural network with direct feature concatenation), was trained and validated following the same procedure as DI-Net. The quantitative results yielded by DCNN-DFC are shown in Table 3. Compared with the model using B-mode images only, DCNN-DFC improved the AUC by 1.7 percentage points in the cross-validation and by 1.9 percentage points in the external validation, smaller gains than those achieved by DI-Net (7.0 and 9.5 percentage points, respectively). These results indicate that directly concatenating two feature sets of highly imbalanced dimensions (512 vs. 23) weakens the contribution of the smaller set, and that it is preferable to concatenate the feature sets after bringing their dimensions to a comparable level.
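The imbalance argument can be made concrete by counting how much of the fused input vector the C + S features occupy in each design. This is only a rough proxy, since learned weights can partly compensate for unequal widths; the helper name is hypothetical.

```python
B_DIM, CS_DIM = 512, 23  # feature sizes stated in the Model development section


def cs_share(b_dim, cs_dim):
    """Fraction of the concatenated vector occupied by the C + S features."""
    return cs_dim / (b_dim + cs_dim)


direct = cs_share(B_DIM, CS_DIM)    # DCNN-DFC: 512 + 23 = 535 inputs
matched = cs_share(CS_DIM, CS_DIM)  # DI-Net after FU-1: 23 + 23 = 46 inputs
print(f"direct: {direct:.1%}, matched: {matched:.1%}")  # direct: 4.3%, matched: 50.0%
```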
Our study has limitations. First, this was a single-center retrospective study with a limited population size. It is necessary to enrich our data by prospectively including more CHB patients from our hospital and/or retrospectively or prospectively including patients from other centers, to further evaluate the robustness of the DI-Net model. Second, our data excluded patients with non-viral liver diseases, such as non-alcoholic fatty liver disease, which may be a confounding factor in fibrosis assessment.
In conclusion, this study developed a deep learning-based data integration network, DI-Net, to integrate the information of US images of liver parenchyma, liver stiffness values and patients' clinical parameters for diagnosing significant liver fibrosis in CHB patients. Both cross- and external validations, as well as the comparison of DI-Net against other non-invasive methods, demonstrate that the joint use of multi-source data in a deep learning model can significantly improve the diagnosis of significant liver fibrosis in CHB patients.