Diagnosis of Significant Liver Fibrosis in Patients With Chronic Hepatitis B By Using a Deep Learning-Based Data Integration Network

Zhong Liu, Shenzhen University, https://orcid.org/0000-0001-9650-6097
Huiying Wen, Shenzhen University
Ziqi Zhu, Shenzhen University
Qinyuan Li, Affiliated Hospital of Guangdong Medical University
Li Liu, The Chinese University of Hong Kong
Tianjiao Li, Shenzhen University
Wencong Xu, Shenzhen University
Chao Hou, Shenzhen University
Bin Huang, Shenzhen University
Zhiyan Li, Shenzhen Third People's Hospital
Changfeng Dong, Shenzhen Third People's Hospital, chaosheng-02@szsy.sustech.edu.cn
Xin Chen, Shenzhen University, https://orcid.org/0000-0002-5406-4136


Introduction
Chronic hepatitis B (CHB), caused by hepatitis B virus infection, remains a major global health burden, with approximately 257 million people infected worldwide and 800,000 deaths annually [1][2][3]. Alterations in liver structure and function in CHB patients may lead to liver fibrosis. According to the METAVIR system [4], liver fibrosis can be stratified into the following stages: F0, no fibrosis; F1, portal fibrosis without septa; F2, portal fibrosis and few septa; F3, numerous septa without cirrhosis; F4, cirrhosis. Without proper treatment, liver fibrosis may progress through stages F0-F4 to hepatocellular carcinoma (HCC) or even liver failure [2,3,5]; if treated in time, however, the progression of fibrosis can be reversed [3]. Accurate staging of fibrosis is therefore essential for managing liver fibrosis in CHB patients.
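The study's target is binary: significant fibrosis (≥F2) versus F0-F1 (<F2). As a minimal sketch, the dichotomization of the METAVIR stages listed above could be written as follows; the string encoding of stages and the function name are illustrative assumptions, not taken from the study.

```python
# Hypothetical helper: map METAVIR stages (F0-F4) to the binary target
# used for significant fibrosis (1 = >=F2, 0 = <F2).
METAVIR_STAGES = {
    "F0": "no fibrosis",
    "F1": "portal fibrosis without septa",
    "F2": "portal fibrosis and few septa",
    "F3": "numerous septa without cirrhosis",
    "F4": "cirrhosis",
}


def is_significant_fibrosis(stage: str) -> int:
    """Return 1 for F2-F4 (significant fibrosis) and 0 for F0-F1."""
    if stage not in METAVIR_STAGES:
        raise ValueError(f"unknown METAVIR stage: {stage!r}")
    return int(int(stage[1]) >= 2)


print([is_significant_fibrosis(s) for s in METAVIR_STAGES])  # [0, 0, 1, 1, 1]
```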
Currently, liver biopsy remains the gold standard for staging liver fibrosis, but it carries a risk of complications and is subject to limitations such as sampling error and inter-observer variability. As an alternative to biopsy, analysis of computed tomography, magnetic resonance, or ultrasound (US) images is commonly used for non-invasive staging of liver fibrosis. Among these modalities, US is preferred because it involves no ionizing radiation and is widely available.
Studies have shown that US image features, such as an uneven or undulating liver surface, heterogeneous echotexture of the liver parenchyma, and changes in vessel diameter, blood flow velocity, and spleen size, are correlated with liver fibrosis [6-8]. Hence, in clinical practice, these features are often assessed visually for fibrosis staging and for screening prior to liver biopsy. However, visual assessment of US images is subjective, and its accuracy depends heavily on the radiologist's experience.
Besides visual assessment, other non-invasive methods exist for liver fibrosis staging, such as fibrosis biomarkers derived from patients' clinical parameters [9][10][11], fibrosis detectors based on liver stiffness measurements (LSMs) [12,13], and artificial intelligence (AI) applied to US images [14][15][16]. In particular, deep convolutional neural networks (DCNNs), a class of AI models, have developed rapidly and are becoming a promising tool for US image analysis. Trained by supervised learning on a large dataset of labeled images, a DCNN model can provide an objective, automated evaluation of the disease as reflected in US images. Many studies have explored DCNN models for assessing liver fibrosis on US images. However, existing DCNN models have often performed well in diagnosing cirrhosis (F4) but poorly in detecting significant fibrosis (≥F2) [14,15], even though identifying ≥F2 is particularly important because it signals the need for antifibrotic treatment [17]. A new DCNN model that better diagnoses ≥F2 is therefore needed.
Most existing deep learning models rely solely on single-source data, namely US images, and neglect other data sources, such as the LSMs and clinical parameters mentioned above, which can also reflect the fibrosis stage. It is therefore worth investigating whether multi-source data can further improve the performance of DCNN models in diagnosing ≥F2. The aim of this study was to assess whether jointly using US images of the liver parenchyma, liver stiffness values, and patients' clinical parameters in a deep learning model improves the diagnosis of ≥F2 in CHB patients.

Patients And Methods
Patients
All consecutive patients with CHB (hepatitis B surface antigen positive for more than 6 months) who underwent liver biopsy between May 2016 and January 2021 at Shenzhen Third People's Hospital (SZTPH), Shenzhen, China, were studied. Patients who had undergone a US B-mode examination and an LSM by point shear-wave elastography (P-SWE) within one week prior to liver biopsy were included. Exclusion criteria were: (a) age under 18 years; (b) antiviral treatment within the 6 months before liver biopsy; (c) any other concomitant liver disease, including autoimmune hepatitis, alcoholic liver disease, ascites, and HCC; (d) co-infection with any other viral hepatitis; (e) a liver sample shorter than 10 mm or containing fewer than 6 portal tracts; (f) missing required serological results; (g) unsuccessful P-SWE.
Based on the eligibility criteria, 385 patients were enrolled, of whom the 284 with complete required data were finally included in the analysis (Figure 1). This study was approved by the ethics committee of SZTPH. Written informed consent was obtained from each enrolled patient.
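The exclusion criteria (a)-(g) above amount to a simple per-patient screening rule. The sketch below expresses them as code, assuming a hypothetical record type whose field names are invented for illustration; it is not the study's data schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical per-patient record; field names are illustrative only."""
    age: int
    antiviral_within_6_months: bool
    other_liver_disease: bool  # e.g. autoimmune hepatitis, alcoholic liver disease, ascites, HCC
    other_viral_hepatitis: bool
    biopsy_length_mm: float
    portal_tracts: int
    has_required_serology: bool
    pswe_successful: bool


def exclusion_reasons(c: Candidate) -> List[str]:
    """Return the letters of the exclusion criteria (a)-(g) met by a candidate."""
    checks = [
        ("a", c.age < 18),
        ("b", c.antiviral_within_6_months),
        ("c", c.other_liver_disease),
        ("d", c.other_viral_hepatitis),
        ("e", c.biopsy_length_mm < 10 or c.portal_tracts < 6),
        ("f", not c.has_required_serology),
        ("g", not c.pswe_successful),
    ]
    return [letter for letter, hit in checks if hit]


def is_eligible(c: Candidate) -> bool:
    """A candidate is eligible when no exclusion criterion applies."""
    return not exclusion_reasons(c)


ok = Candidate(40, False, False, False, 15.0, 8, True, True)
short = Candidate(40, False, False, False, 8.0, 8, True, False)
print(exclusion_reasons(ok))     # []
print(exclusion_reasons(short))  # ['e', 'g']
```

Returning the list of triggered criteria, rather than a single boolean, makes it easy to tally exclusion reasons for a flow diagram such as Figure 1.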

US B-mode and P-SWE Examinations
A conventional US examination followed by P-SWE was performed after an overnight fast (at least 8 hours). The examinations were performed on one of two US systems, Philips iU22 and Mindray Resona7, both of which support P-SWE and are routinely used at SZTPH. For the conventional US examination, a linear-array probe (L9-3 for Philips iU22, L11-3U for Mindray Resona7) was used to acquire B-mode images of the liver surface, and a convex-array probe (C14L5 for Philips iU22, SC6U-1 for Mindray Resona7) was used to acquire B-mode images of the liver parenchyma and spleen. During this examination, the patient lay supine with both arms raised above the head, and static B-mode images were acquired. Immediately afterward, with the patient still supine and the right arm in maximum abduction, the sonographer switched the scanning mode to P-SWE for LSMs. Of the 284 included patients, 155 were examined on the Philips system and the remaining 129 on the Mindray system.

Pathological Examination
The pathological examination was performed within one week after the US examination. A US-guided percutaneous biopsy of the right liver lobe was performed using a 16-gauge biopsy needle (MC1616 [16G, 16cm], BARD, New Jersey, USA). Each biopsied liver sample was 10-20 mm long and contained at least 6 portal areas; it was prepared as paraffin sections and stained with Sirius Red (S1020, Wuhan Haotian Bioscience Technology Limited Company, Hubei, China). Liver fibrosis was semi-quantitatively assessed according to the METAVIR system [4].
[…] Step-3), the two branch features were fused via an efficient feature fusion module consisting of three feature units (FUs). The first, FU-1, has two fully connected (FC) layers whose number of neurons is reduced from 512 to 23, so that the B features from Branch-1 are condensed to a feature set whose dimension matches the number of C+S parameters. The second, FU-2, also has two FC layers, each with as many neurons as there are C+S parameters; in this way, the C+S parameters are first combined internally to strengthen their association with the class labels before fusion with the B features. Finally, FU-3 concatenates the feature vectors output by FU-1 and FU-2 (each of size 23) into a combined feature set of size 46, which is processed by two FC layers (with 46 and 2 neurons, respectively; see Step-4 of Figure 2) to predict the final class label: ≥F2 or <F2.
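The feature fusion module can be sketched as a plain-Python forward pass with random, untrained weights to trace how dimensions flow (B features to 23, C+S parameters to 23, concatenation to 46, then FC layers of 46 and 2 neurons). This is not the authors' implementation. The following are all assumptions: a 2048-dimensional input to FU-1 (the usual ResNet50 pooled output), reading "reduced from 512 to 23" as two layers of 512 and 23 neurons, ReLU activations, uniform initialization, and the output order.

```python
import math
import random

# Hypothetical sizes: B_DIM assumes a ResNet50-style pooled feature vector;
# CS_DIM = 23 matches the number of C+S parameters implied by the text.
B_DIM = 2048
CS_DIM = 23


def dense(n_in, n_out, rng):
    """Fully connected layer as (weights, biases), uniform init in +/- 1/sqrt(n_in)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def apply(layer, x, relu=True):
    """Compute W x + b, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class FusionModule:
    """Sketch of FU-1/FU-2/FU-3 plus the 46 -> 2 classifier head (untrained)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # FU-1: two FC layers (512, then 23 neurons) condensing the B features.
        self.fu1 = [dense(B_DIM, 512, rng), dense(512, CS_DIM, rng)]
        # FU-2: two FC layers with CS_DIM neurons each, mixing the C+S parameters.
        self.fu2 = [dense(CS_DIM, CS_DIM, rng), dense(CS_DIM, CS_DIM, rng)]
        # Classifier after FU-3: FC layers with 46 and 2 neurons.
        self.head = [dense(2 * CS_DIM, 2 * CS_DIM, rng), dense(2 * CS_DIM, 2, rng)]

    def fuse(self, b_features, cs_params):
        """Run FU-1 and FU-2 in parallel, then concatenate their 23-d outputs (FU-3)."""
        h_b, h_cs = list(b_features), list(cs_params)
        for layer in self.fu1:
            h_b = apply(layer, h_b)
        for layer in self.fu2:
            h_cs = apply(layer, h_cs)
        return h_b + h_cs  # list concatenation -> 46 features

    def __call__(self, b_features, cs_params):
        hidden = apply(self.head[0], self.fuse(b_features, cs_params))
        logits = apply(self.head[1], hidden, relu=False)
        return softmax(logits)  # hypothetical order: [P(<F2), P(>=F2)]
```

The key design point the sketch mirrors is that both branches reach the concatenation step with the same width (23), so neither feature set dominates the fused 46-dimensional vector by sheer size.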

Model Training and Validation
Patients were divided into two cohorts according to the US system on which they were examined. Patients examined on the Philips system (n = 155) formed the main cohort, whose data constituted an internal dataset for model training and cross-validation. The remaining patients, examined on the Mindray system (n = 129), formed an independent cohort, whose data constituted an external dataset for validating the model trained on the internal dataset.
We used five-fold cross-validation to train and validate DI-Net on the internal dataset. Specifically, the internal dataset was divided into five subsets with no patient overlap; three subsets were used for training (see details in the Supplementary Material), one of the remaining two for model optimization, and the last for validation. The training, optimization, and validation folds were rotated five times so that each fold was used once for validation. The five models trained through cross-validation were additionally validated on the external dataset, where their predicted class probabilities were averaged for the final decision.
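The fold rotation and the external-set ensembling described above can be sketched as follows. The random, unstratified fold assignment and the rule of taking the next fold as the optimization fold are illustrative assumptions; the text does not specify how the folds were formed or paired.

```python
import random


def assign_folds(patient_ids, k=5, seed=0):
    """Shuffle patient IDs and deal them into k disjoint folds (no patient overlap)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def rotations(folds):
    """Yield (train, optimization, validation) ID lists: 3 folds / 1 fold / 1 fold."""
    k = len(folds)
    for i in range(k):
        opt_idx = (i + 1) % k  # hypothetical rule for picking the optimization fold
        train = [pid for j, fold in enumerate(folds) if j not in (i, opt_idx) for pid in fold]
        yield train, list(folds[opt_idx]), list(folds[i])


def ensemble_probability(per_model_probs):
    """Average the >=F2 probabilities of the k cross-validation models (external set)."""
    return sum(per_model_probs) / len(per_model_probs)


folds = assign_folds(range(155))  # 155 patients in the main cohort
for train, opt, val in rotations(folds):
    print(len(train), len(opt), len(val))  # 93 31 31 (printed five times)
```

Because each patient sits in exactly one fold, every patient is validated exactly once, and averaging the five models' probabilities on the external cohort avoids having to pick a single "best" fold model.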

Evaluation Metrics and Comparative Methods
The diagnostic performance of DI-Net was evaluated in terms of accuracy (ACC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC). We compared DI-Net against the following methods: 1) single-source data-based models, i.e. models trained only with B-mode images, liver stiffness values, or patients' clinical parameters; 2) the fibrosis biomarkers APRI and FIB-4 [9,10]; and 3) interpretations by skilled radiologists. Further details of the comparative methods are given in the Supplementary Material.
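For reference, the metrics listed above follow directly from the confusion matrix, with ≥F2 as the positive class, and the AUC equals the Mann-Whitney probability that a random positive case scores higher than a random negative one. A minimal sketch follows; the toy labels, scores, and 0.5 threshold are invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with 1 = >=F2 (positive) and 0 = <F2."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def diagnostic_metrics(y_true, y_pred):
    """ACC, SEN, SPE, PPV and NPV (assumes no denominator is zero)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


def auc_mann_whitney(y_true, scores):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 threshold
print(diagnostic_metrics(y_true, y_pred))
print(auc_mann_whitney(y_true, scores))  # 8/9 for this toy example
```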

Statistical Analysis
Continuous variables were expressed as mean ± standard deviation and compared using Student's t-test; categorical variables were expressed as number (percentage) and compared using the chi-square test. For ACC, SEN, SPE, PPV, NPV, and AUC, 95% confidence intervals (CIs) were calculated using the Clopper-Pearson method [21]. The DeLong test [22] was used to compare the ROC curves and the associated AUC values. P values were two-sided, and values below 0.05 indicated statistical significance. All statistical analyses were performed using MedCalc (Version 18.2) and MATLAB (Version R2019a).
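The Clopper-Pearson ("exact") interval for a proportion x/n, such as a sensitivity, is defined by inverting binomial tail probabilities. The study computed its CIs in MedCalc; the sketch below only illustrates the textbook definition, solving each bound by bisection instead of the usual beta-quantile formula so that it needs nothing beyond the standard library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target, increasing, tol=1e-12):
    """Find p in [0, 1] with f(p) = target for a monotone function f."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies below mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""
    if x == 0:
        lower = 0.0
    else:
        # P(X >= x | p) increases with p; solve P(X >= x) = alpha / 2.
        lower = _bisect(lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    if x == n:
        upper = 1.0
    else:
        # P(X <= x | p) decreases with p; solve P(X <= x) = alpha / 2.
        upper = _bisect(lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


print(clopper_pearson(5, 10))  # roughly (0.187, 0.813)
```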

Patient Characteristics
A total of 527 CHB patients who underwent liver biopsy, conventional US, and P-SWE examinations were studied. Of these, 142 (26.9%) did not meet the eligibility criteria and were excluded (Figure 1). Among the 385 enrolled patients, 101 (26.2%) had incomplete data: 73 lacked complete required serological parameters, and 28 lacked successful P-SWE measurements. Finally, 284 patients were included in the analysis: 155 (mean age 37.9 ± 10.3 years; F0-F1, 68; F2-F4, 87) formed the main cohort, and 129 (mean age 37.7 ± 8.3 years; F0-F1, 73; F2-F4, 56) formed the independent cohort. The main clinical and demographic characteristics of the study cohorts are summarized in Table 1.
[…] Table 2. Notably, all quantitative indexes of DI-Net were higher than those of every single-source data-based method. The ROC curve of DI-Net is shown in Figure 3a, together with the curves of the single-source data-based models for comparison. In the same validation, the overall performance of DI-Net was also significantly better than that of APRI […] Table 2 (see the rows corresponding to the external validation). The ROC curves of the models with different data sources and those of APRI, FIB-4, and the radiologists' interpretations are plotted in Figure 3c and 3d, respectively.

Discussion
Accurate, non-invasive diagnosis of ≥F2 is crucial for the management of patients with chronic liver disease [2,3,23], but it is technically challenging. As a class of non-invasive methods, fibrosis biomarkers offer high applicability, good inter-laboratory reproducibility, and wide availability. However, many studies have shown that biomarkers are less accurate for diagnosing significant fibrosis than for diagnosing cirrhosis [24,25]. In our study, the performance of two commonly used fibrosis biomarkers, APRI and FIB-4, in diagnosing ≥F2 was likewise unsatisfactory, with AUC values ranging from 0.791 to 0.828. DCNN models have also been less accurate for ≥F2 than for F4. For instance, in the study conducted by Wang et al. [14] […]
The lower accuracy of non-invasive methods for ≥F2 than for F4 is probably because fibrotic changes at intermediate stages are subtle and heterogeneous [24,26,27]. Single-source data, whether US images [14,15], liver stiffness values [13], or biomarkers [11,24,25], may not be sufficient to capture such changes. Our results indicate that the information carried by B-mode images of the liver parenchyma can be complemented by liver stiffness values and clinical parameters to improve the diagnosis of ≥F2. The joint use of multi-source data therefore has the potential to reflect subtle changes in liver fibrosis better than single-source data alone. To further verify this point, we replaced the network in Branch-1 of DI-Net with other popular DCNN backbones, including VGG16 [28], Inception V3 [29], and DenseNet121 [30], and validated the resulting models on our dataset. The results of these models, together with those of the model based on the ResNet50 backbone, are summarized in Table 1 of the Supplementary Material.
All models that additionally used liver stiffness values and clinical parameters performed significantly better than those using only B-mode images, in both cross-validation (AUC range: 0.908-0.943 vs. 0.773-0.873; all P < 0.01, DeLong test) and external validation (AUC range: 0.865-0.901 vs. 0.702-0.806; all P < 0.01, DeLong test). These results further support the added value of liver stiffness measurements and clinical parameters to B-mode images of the liver parenchyma for diagnosing ≥F2.
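The comparisons above rely on the DeLong test for two correlated AUCs, that is, two models scored on the same patients. The study ran this test in MedCalc. The sketch below is a plain-Python rendering of the method from DeLong et al. (1988), not MedCalc's code, and the toy data in the usage lines are invented. It builds per-case structural components, estimates the covariance of the two AUCs, and returns a two-sided z-test.

```python
import math


def _psi(x, y):
    """Kernel: 1 if a positive outranks a negative, 1/2 for ties, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """DeLong structural components V10 (per positive) and V01 (per negative)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, scores_a, scores_b):
    """Return (AUC_a, AUC_b, z, two-sided p) for paired ROC curves."""
    pos_idx = [i for i, t in enumerate(y_true) if t == 1]
    neg_idx = [i for i, t in enumerate(y_true) if t == 0]
    m, n = len(pos_idx), len(neg_idx)
    ca = _components([scores_a[i] for i in pos_idx], [scores_a[i] for i in neg_idx])
    cb = _components([scores_b[i] for i in pos_idx], [scores_b[i] for i in neg_idx])
    auc_a, auc_b = sum(ca[0]) / m, sum(cb[0]) / m

    def s(c1, c2):
        # Covariance term: cov(V10)/m + cov(V01)/n.
        return _cov(c1[0], c2[0]) / m + _cov(c1[1], c2[1]) / n

    var = s(ca, ca) + s(cb, cb) - 2 * s(ca, cb)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return auc_a, auc_b, z, p


y = [1, 1, 1, 1, 0, 0, 0, 0]
a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
b = [0.6, 0.4, 0.7, 0.2, 0.5, 0.3, 0.8, 0.1]
print(delong_test(y, a, b))
```

Because both AUCs are computed on the same patients, the covariance term is what distinguishes the DeLong test from comparing two independent AUCs.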
The performance improvement of DI-Net also benefited from the dedicated design of the feature fusion module. To examine this, we replaced the feature fusion module in DI-Net with a unit that directly concatenated the features output by the different branches and adjusted the number of neurons in the penultimate FC layer accordingly. The new model, named DCNN-DFC (deep convolutional neural network with direct feature concatenation), was trained and validated following the same procedure as DI-Net. The quantitative results of DCNN-DFC are shown in Table 3.
Compared with the model using B-mode images only, DCNN-DFC improved the AUC by 1.7 percentage points in cross-validation and 1.9 percentage points in external validation. These gains were smaller than those achieved by DI-Net, which were 7.0 percentage points in cross-validation and 9.5 percentage points in external validation. These results indicate that directly concatenating two feature sets of highly imbalanced dimensions weakens the contribution of the smaller set, and that it is better to concatenate the feature sets after bringing their dimensions to a comparable level.
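A crude way to see the imbalance argument is to compute the fraction of the concatenated vector that the 23 C+S features occupy. The 512-dimensional B-feature size used for the direct-concatenation case is a hypothetical value chosen for illustration, since the dimension used in DCNN-DFC is not stated here. Dimension share is only a proxy: the learned weights, not the vector length alone, determine each feature set's actual influence.

```python
def smaller_set_share(b_dim, cs_dim=23):
    """Fraction of a concatenated feature vector contributed by the C+S features."""
    return cs_dim / (b_dim + cs_dim)


# Direct concatenation with hypothetical 512-d B features (DCNN-DFC-like case):
print(f"{smaller_set_share(512):.1%}")  # 4.3%
# After FU-1 condenses the B features to 23 dimensions (DI-Net-like case):
print(f"{smaller_set_share(23):.1%}")   # 50.0%
```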
Our study has limitations. First, this was a single-centre retrospective study; data from multiple centres are needed to further validate the robustness of DI-Net. Second, our data excluded patients with non-viral liver diseases, such as non-alcoholic fatty liver disease, which may be a confounding factor in fibrosis analysis.
In conclusion, this study developed a deep learning-based data integration network, DI-Net, that integrates US images of the liver parenchyma, liver stiffness values, and patients' clinical parameters to diagnose significant liver fibrosis in CHB patients. Cross-validation, external validation, and the comparison of DI-Net with other non-invasive methods together demonstrate that jointly using multi-source data in a deep learning model can significantly improve the diagnosis of significant liver fibrosis in CHB patients.

Supplementary Files
This is a list of supplementary files associated with this preprint.