Diagnostic Performance of TE, 2D-SWE And MRE For Liver Fibrosis In Treatment-Naive People With HBV: A Systematic Review And Meta-Analysis

Background/aims: To assess the performance of transient elastography (TE), two-dimensional shear wave elastography (2D-SWE), and magnetic resonance elastography (MRE) for staging significant fibrosis and cirrhosis in untreated chronic hepatitis B (CHB) patients. Methods: PubMed, Embase, Web of Science, and the Cochrane Library were searched for terms involving CHB, TE, SWE, and MRE. Studies involving other etiologies of chronic liver disease (CLD) or previously treated patients, and articles not published in SCI journals, were excluded. Hierarchical non-linear models were used to evaluate the diagnostic accuracy of TE, 2D-SWE, and MRE. Heterogeneity was explored via analysis of threshold effect and meta-regression. Results: Twenty-eight articles with a total of 4540 untreated CHB patients were included. The summary AUROCs of TE, 2D-SWE, and MRE for predicting significant fibrosis (SF) were 0.84, 0.89, and 0.99, respectively. MRE was more accurate than both TE (P<0.01) and 2D-SWE (P<0.01) in staging SF, and 2D-SWE was superior to TE (P<0.01). The summary AUROCs of TE, 2D-SWE, and MRE for detecting cirrhosis were 0.9, 0.94, and 0.99, respectively. TE and 2D-SWE showed similar diagnostic accuracy in staging cirrhosis (P=0.14), and MRE and 2D-SWE were comparable (P=0.08). MRE was superior to TE (P<0.01) in staging cirrhosis. Conclusion: TE, 2D-SWE, and MRE show acceptable diagnostic accuracy in staging significant fibrosis and cirrhosis in untreated CHB patients. Both MRE and 2D-SWE are better choices, while TE can be regarded as a secondary option.
treatment; CHB, chronic hepatitis B; DOR, diagnostic odds ratio; HBV, hepatitis B virus; LR, likelihood ratio; MRE, magnetic resonance elastography; NAFLD, nonalcoholic fatty liver disease; NPV, negative predictive value; PPV, positive predictive value; QUADAS, quality assessment of diagnostic accuracy studies; ROC, receiver operating characteristic; SF, significant fibrosis; sROC, summary receiver operating characteristic; 2D-SWE, two-dimensional shear wave elastography; TE, transient elastography.

The inclusion criteria were as follows: 1) the accuracy of 2D-SWE, TE, or MRE for discriminating liver fibrosis in CHB patients was investigated; 2) the specific liver fibrosis stage of each patient was biopsy-proven; 3) the sensitivity, specificity, and number of patients in each fibrosis stage could be extracted to create a 2×2 table of test performance; 4) at least 50 patients were enrolled in each investigation; and 5) the original article was published in English in an SCI journal. The exclusion criteria were as follows: 1) the original article did not focus on the diagnostic performance of TE, 2D-SWE, or MRE; 2) special types of work such as patents, book sections, case reports, replies, letters, commentaries, conference abstracts, reviews, or meta-analyses; 3) studies on children or animals; 4) insufficient data to create a 2×2 table of test performance; 5) patients were co-infected with other hepatitis viruses or HIV; 6) patients were diagnosed with CLD of other etiologies such as alcoholic liver disease (ALD), non-alcoholic fatty liver disease (NAFLD), or autoimmune liver disease; 7) patients had received antiviral therapy, hepatectomy, or liver transplantation before biopsy or imaging tests; 8) patients were diagnosed with hepatic carcinoma before TE, 2D-SWE, MRE, or liver biopsy; 9) the interval between imaging tests and liver biopsy, or the liver biopsy size, was unclear.

Identification of liver fibrosis
Significant fibrosis (SF) and cirrhosis were defined as stages F2-F4 and F4, respectively, using the corresponding scoring systems such as Scheuer, Ishak, Metavir, Batts-Ludwig, and Knodell.

Data acquisition
Two experienced researchers (ML and SW) first screened the online databases and made preliminary selections. The eligibility and quality of each article were assessed by each investigator. The two researchers then extracted the targeted data separately. The basic and technical characteristics of the included studies, as well as the diagnostic performance of the three noninvasive approaches, were summarized in our predesigned forms.

Quality assessment
The Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool was used to evaluate the quality of the included studies. The results of the QUADAS-2 evaluation were visualized with Review Manager 5.3 (The Cochrane Collaboration). A third investigator (XW) was then invited to assess the discrepancies between the two researchers. Disagreements between the investigators were resolved through discussion.

Data synthesis and statistical analysis
The demographic characteristics of the included patients were presented as mean ± standard deviation (SD) or median ± interquartile range. The numbers of true positives (tp), false positives (fp), false negatives (fn), and true negatives (tn) were calculated from the reported number of patients in each biopsy-proven fibrosis stage and the sensitivity and specificity of the three noninvasive imaging methods. Once the 2×2 table of tp, fp, fn, and tn was completed, the summary positive predictive value (PPV), negative predictive value (NPV), area under the receiver operating characteristic curve (AUROC), positive likelihood ratio (LR), and negative LR were calculated according to the corresponding formulas. For the meta-analysis, the pooled sensitivity and specificity were estimated with the midas and metannif modules in Stata 16.0 (StataCorp LP) [14]. The summary diagnostic odds ratios (DORs) were calculated using a DerSimonian and Laird random-effects model with a corresponding test of heterogeneity. Hierarchical non-linear models, including the HSROC model and the bivariate model, were used to evaluate the diagnostic accuracy. Non-threshold heterogeneity was presented with the Q and I² statistics in the forest plots. An I² value >50% was regarded as indicating substantial statistical heterogeneity [15]. SAS 9.4 (SAS Institute Inc) with the "mixed" command was used for the multiple comparisons of sensitivity and specificity among TE, 2D-SWE, and MRE [16]. R 4.0.3 (R Foundation for Statistical Computing) with the "mada" package was used to compare the fitted SROC curves of TE, 2D-SWE, and MRE [17]. Pairwise comparisons of the AUROC values were conducted with the DeLong test [18]. A P value < 0.05 was considered to indicate a statistically significant difference.
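The reconstruction of the 2×2 table and the derived measures can be illustrated with a minimal Python sketch. It is not the authors' code: the function and class names are hypothetical, the counts are rounded to the nearest integer, and it assumes that no cell of the table is zero (a continuity correction such as adding 0.5 to every cell would otherwise be needed).

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int
    fp: int
    fn: int
    tn: int


def reconstruct_table(n_diseased, n_healthy, sensitivity, specificity):
    """Rebuild a 2x2 table from group sizes and reported accuracy.

    n_diseased: patients at or above the target fibrosis stage (biopsy).
    n_healthy:  patients below the target fibrosis stage (biopsy).
    """
    tp = round(sensitivity * n_diseased)
    tn = round(specificity * n_healthy)
    return TwoByTwo(tp=tp, fp=n_healthy - tn, fn=n_diseased - tp, tn=tn)


def accuracy_measures(t):
    """Standard per-study measures; assumes every cell is non-zero."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "dor": (t.tp * t.tn) / (t.fp * t.fn),
    }


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any included study.
    table = reconstruct_table(60, 40, sensitivity=0.85, specificity=0.80)
    print(table)
    print(accuracy_measures(table))
```

Because the counts are back-calculated from rounded percentages, the reconstructed cells may differ by one patient from the true table, which is one reason to check them against any counts a study reports directly.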

Publication bias
Deeks' funnel plots were generated in Stata 16.0 with the "midas" command and the "mylabels" package to evaluate publication bias among the included studies. P < 0.05 was considered to indicate the existence of publication bias.
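For readers unfamiliar with the asymmetry test behind Deeks' funnel plot, the sketch below shows the usual formulation: the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size (ESS), weighted by ESS, and the slope is tested against zero. This is a plain Python illustration under those assumptions, with hypothetical function names, and is not the midas implementation. It returns the slope, its t statistic, and the degrees of freedom; a P value would then come from a t distribution with k − 2 degrees of freedom in any statistics package.

```python
import math


def wls_slope(x, y, w):
    """Weighted least-squares slope of y on x and its standard error."""
    k = len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ssr = sum(wi * (yi - intercept - slope * xi) ** 2
              for wi, xi, yi in zip(w, x, y))
    se = math.sqrt(ssr / (k - 2) / sxx)
    return slope, se


def effective_sample_size(n_diseased, n_healthy):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4 * n_diseased * n_healthy / (n_diseased + n_healthy)


def deeks_test(tables):
    """tables: list of (tp, fp, fn, tn), at least three studies.

    Returns (slope, t statistic, degrees of freedom).
    """
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        if 0 in (tp, fp, fn, tn):
            # Continuity correction so that ln(DOR) is defined.
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        x.append(1 / math.sqrt(ess))
        y.append(math.log(tp * tn / (fp * fn)))
        w.append(ess)
    slope, se = wls_slope(x, y, w)
    return slope, slope / se, len(tables) - 2
```

A slope that differs significantly from zero suggests that smaller studies report systematically different DORs, which is the pattern interpreted as possible publication bias.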

Exploration of heterogeneity
As different cut-off values were adopted in individual studies, the threshold effect was evaluated via Spearman correlation analysis of the sensitivity and the specificity with Meta-DiSc 1.4. Meta-regression analysis was used to evaluate the influence of seven characteristics of individual studies on AUROC, namely the location of the study population (Asia vs Europe), study design (prospective cohort study vs retrospective cohort study or cross-sectional study), mean biopsy length (<20 mm vs ≥20 mm), mean ALT (<5 ULN vs ≥5 ULN), liver biopsy scoring system (Metavir vs non-Metavir), the interval between biopsy and imaging test (<3 months vs ≥3 months), and study quality (all questions scored yes vs one or more questions scored no or unclear).
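The threshold-effect check can be sketched as a Spearman rank correlation between sensitivity and the false positive rate (1 − specificity) across studies. The Python below is a self-contained illustration with hypothetical function names, not the Meta-DiSc implementation; because Spearman's coefficient depends only on ranks, using raw proportions or their logits gives the same value.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def threshold_effect_rho(sensitivities, specificities):
    """Spearman correlation between sensitivity and 1 - specificity."""
    fpr = [1 - s for s in specificities]
    return pearson(average_ranks(sensitivities), average_ranks(fpr))
```

A strong positive coefficient indicates that studies gaining sensitivity do so at the cost of specificity, which is the signature of differing cut-off values rather than of genuine differences in accuracy.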

Characteristics and the Quality of the Retrieved Studies
The flow diagram of the study selection is presented in Figure 1. A total of 4190 records were retrieved using our primary search strategies, and 3609 articles remained after duplicates were removed. After excluding the studies that did not fulfill the eligibility criteria, 28 studies were ultimately included, which are listed in the reference portion of the Supporting Information.

As shown in Figure 2 and Table 4, for staging significant fibrosis, MRE was more accurate than both TE (Z=8.88, P<0.01) and 2D-SWE (Z=5.16, P<0.01). 2D-SWE was superior to TE (Z=3.53, P<0.01).

As shown in Figure 2 and Table 4, for staging cirrhosis, MRE was more accurate than TE (Z=4.71, P<0.01). MRE displayed similar accuracy to 2D-SWE (Z=1.73, P=0.08). The accuracies of 2D-SWE and TE were comparable (F=1.47, P=0.14). As shown in Figure 4, in descending order, the combined DORs of MRE, 2D-SWE, and TE for staging cirrhosis were 223.9 (I² = 0, P = 0.607), 70.62 (I² = 59.6%, P = 0.042), and 24.21 (I² = 58.7%, P = 0.001).
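The combined DORs and I² values reported here come from DerSimonian and Laird random-effects pooling. A minimal Python sketch of that estimator on the log-DOR scale follows; the function name and the returned dictionary keys are hypothetical, a 0.5 continuity correction is assumed for tables with a zero cell, and at least two studies are required.

```python
import math


def dersimonian_laird(tables):
    """Pool diagnostic odds ratios from a list of (tp, fp, fn, tn) tables."""
    y, v = [], []
    for cells in tables:
        if 0 in cells:
            # Continuity correction so that ln(DOR) and its variance exist.
            cells = tuple(c + 0.5 for c in cells)
        tp, fp, fn, tn = cells
        y.append(math.log(tp * tn / (fp * fn)))
        v.append(1 / tp + 1 / fp + 1 / fn + 1 / tn)

    # Fixed-effect (inverse-variance) estimate and Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (method of moments), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects estimate with 95% confidence interval.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "dor": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }
```

When Q does not exceed its degrees of freedom, both tau² and I² are truncated to zero and the random-effects estimate coincides with the fixed-effect one, which matches the pattern of an I² of 0 reported for MRE.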

Heterogeneity and Publication Bias
Non-threshold heterogeneity was observed in TE, 2D-SWE, and MRE for detecting significant fibrosis and cirrhosis (Supplementary Table 3). A meta-regression analysis can only be conducted in groups of more than 10 studies with complete data to examine the methodological heterogeneity. In groups larger than 10 studies, heterogeneity existed when TE was used for staging fibrosis. The diagnostic accuracy was not affected by the following factors when TE was used for staging significant fibrosis and cirrhosis: study design (P = 0.18 and 0.87), classification criteria (P = 0.5 and 0.21), region (P = 0.4 and 0.49), interval between biopsy and imaging test (P = 0.55 and 0.32), obviously abnormal ALT (P = 0.9 and 0.94), liver biopsy length (P = 0.33 and 0.71), and QUADAS-2 score (P = 0.1 and 0.94). There was no evidence of publication bias for TE, 2D-SWE, or MRE for staging fibrosis (Supplementary Figure 3).

Discussion
In this meta-analysis, we evaluated the diagnostic performance of TE, 2D-SWE, and MRE for staging fibrosis in CHB. This study demonstrates that TE, 2D-SWE, and MRE show acceptable diagnostic accuracy for staging significant fibrosis and cirrhosis in treatment-naive people with HBV. Additionally, for staging SF and cirrhosis, MRE and 2D-SWE are the better choices, while TE can be regarded as a secondary option. For staging SF, the summary AUROC of MRE is significantly greater than that of 2D-SWE (0.99 vs 0.89, P < 0.01) and TE (0.99 vs 0.84, P < 0.01). 2D-SWE is superior to TE (AUROC = 0.89 vs 0.84, P < 0.01). For discriminating cirrhosis, MRE is superior to TE (AUROC = 0.99 vs 0.9, P < 0.01), but comparable to 2D-SWE (AUROC = 0.99 vs 0.94, P = 0.08). Moreover, the AUROCs of MRE for staging SF and cirrhosis were both greater than 0.95, at cut-off values of 2.47-4.07 kPa and 3.46-6.87 kPa, respectively.
MRE showed the best diagnostic performance for staging SF and cirrhosis, which is consistent with previous findings in patients with CLD [19; 20]. For discriminating cirrhosis in patients with other etiologies such as NAFLD, MRE and 2D-SWE were also comparable. Both MRE and 2D-SWE were regarded as better choices for staging cirrhosis [21].
Venkatesh et al. [22] confirmed that normal liver stiffness measurement (LSM) values assessed by MRE in a healthy Asian population are highly reproducible and are not affected by age, sex, or body mass index (BMI). MRE can image the whole organ without requiring an acoustic window, which is an advantage over TE [23]. A larger measurement area of the liver can effectively lower sampling errors [24]. Ichikawa et al. [25] explained that TE can only conduct a unidirectional measurement, which is more likely to be disturbed by reflected and refracted waves, whereas MRE evaluates the two-dimensional (2D) or even three-dimensional (3D) displacement vector. Additionally, compared with TE, MRE can generate images of better quality with compressional and continuous waves. Because MRE conducted with a gradient-recalled echo (GRE) sequence has been well validated in previous large clinical cohorts [26], the most commonly applied commercial MRE technique is GRE-MRE [27]. Nevertheless, the conventional GRE-MRE technique tends to be technically deficient because its imaging is susceptible to iron deposition. Moreover, GRE-MRE is rather time-consuming and requires a more stringent breath hold from patients [28]. To lower these barriers, the spin-echo-based echo planar imaging (SE-EPI) MRE sequence was developed. This newer sequence is less sensitive to iron overload and thus offers a shorter imaging time and a higher technical success rate [29]. Despite these promising advances, MRE is currently time-consuming and costly. Considering the high prevalence of CHB and the scarcity of MRE in Asian countries, there is still a long way to go before MRE can be applied to CHB patients on a large scale.
Compared with TE, 2D-SWE offers easier access to the region of interest (ROI) and high-quality measurements with a color-coded elasticity map [30]. Moreover, variations in blood flow can be monitored through 2D-SWE [31; 32]. Given these advantages and the results of our analysis, 2D-SWE may be a better choice than TE for staging SF.
There are still limitations in this study. First, due to incomplete data, our meta-regression analysis did not include factors such as obesity, ascites, and HBV-DNA, which may also be sources of heterogeneity and thus affect our ultimate conclusions [33]. Surprisingly, Petzold et al. [34] pointed out that parameters such as age, gender, BMI, and liver function indexes had no significant impact on LSM measured by 2D-SWE. It is worth noting that LSM measured by TE tends to be affected by inflammation, congestion, and cholestasis [35], which may also affect our judgments. Second, this study did not take the financial cost, convenience, or success rate of the examinations into consideration. A cheaper and less time-consuming technique would lower the barrier to clinical application [36]. Moreover, regarding 2D-SWE and MRE, although the AUROCs are high, the number of studies on which these findings rely is rather small (n = 6 and 827 patients; n = 5 and 408 patients), limiting the persuasiveness of our conclusions. Therefore, more prospective and multicenter studies are needed.
Collectively, our current study confirms that TE, 2D-SWE, and MRE show acceptable diagnostic accuracy in staging fibrosis in treatment-naïve people with HBV.

Declarations
the study conception and study supervision.
Funding This study has received funding from the National Natural Science Foundation of China (82070574), the Natural Science Foundation Team Project of Guangdong Province (2018B030312009), and the Fundamental Research Funds for the Central Universities (19ykpy29).

Compliance with ethical standards
Conflict of interest The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent Not applicable.
Clinical trials registration Not applicable.
Data sharing We agree with the policy in the journal.

The HSROC plots of TE, 2D-SWE, and MRE for the sensitivity and specificity in fibrosis stage F ≥ 2 and F = 4. The study estimate represents the data of each included study, and the size of the circle represents the weight of each study based on the number of patients; the summary point represents the summary sensitivity and specificity; the 95% confidence region represents the 95% confidence interval of the summary sensitivity and specificity; the 95% prediction region represents the 95% confidence interval of the sensitivity and specificity of each individual study included in the meta-analysis.