This study was conducted and reported in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.
I. Study selection
This systematic review and meta-analysis included studies that (a) were RCTs or non-RCTs with primary experimental designs; (b) compared clearly defined interventions, i.e., EPT+EPBD versus EPT; (c) included participants with CBD stones; (d) were published in English between 2003 and 2020; and (e) reported at least one of the outcome variables. Studies were excluded if they (a) were published in a non-English journal; (b) were meta-analyses or review studies; or (c) did not include a comparison group. The study protocol was registered with PROSPERO (number CRD42020171689).
II. Search strategy and data sources
Comprehensive searches of Embase, PubMed, and Web of Science were conducted for studies published between January 1, 2003 and February 2, 2020. A medical librarian (MLF) assisted in developing the search strategies. Sets of keywords were combined with the Boolean operators OR and AND to ensure that the search was comprehensive. Additionally, we cross-referenced and manually searched the bibliographies of pertinent studies. Detailed search strategies and results are summarized in S1 Appendix. The search yielded 1,228 articles, of which 845 non-duplicate citations were screened. Ten RCTs and eleven non-RCTs were ultimately selected for analysis (Fig 1).
III. Data extraction and quality assessment
Two investigators (TWC and JLC) independently extracted study data including publication year, study design, study population, characteristics of the stones, type of endoscopic technique performed, rate of successful first-session stone removal, the need for ML, and recorded adverse events. Eligibility for study inclusion was based on these data. Any discrepancies or disagreements on which studies to include were resolved through consensus. When the study investigators could not agree, a third investigator (JL) was consulted.
We used the Cochrane Risk of Bias Tool to assess quality and risk of bias for the RCTs, and the modified Newcastle-Ottawa Scale (NOS) according to Cummings et al. 2010 to assess quality and risk of bias for the non-RCTs. Two investigators (TWC and JLC) independently conducted the quality assessment of the RCTs and non-RCTs. The criteria used are presented in the supplementary figures (Fig S3A and Fig S3B).
Stone size was assessed on the cholangiogram by comparing the largest diameter of the stone with the diameter of the duodenoscope (12-13 mm). In addition, the dilation balloon diameter (12, 15, 18, or 20 mm) was chosen according to the CBD stone size and served as a reference for estimating CBD stone size. Although balloon size, dilation technique, and management algorithm differed among the included studies, the variation was minor, and the main principle and purpose, removal of CBD stones, were similar among the expert endoscopists. The management algorithms and the methods of evaluating CBD stone size in the included studies are summarized in Table 1. Subgroup analyses were performed based on the mean CBD stone size reported in each included study, with studies divided into group A (small): <13 mm; group B (medium): 13-17 mm; and group C (large): >17 mm. These cutoffs refer to the diameter of the duodenoscope (12-13 mm) and the diameter of the dilation balloons (large: over 17 mm) [21, 22] (Table 2). In clinical practice, CBD stone size is a variable that is difficult to define with millimeter precision.
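The subgroup rule described above can be written as a small classifier. This is only an illustrative sketch: the function name `stone_size_group` is hypothetical, and the handling of the boundary values (exactly 13 mm and exactly 17 mm both fall in group B) is our reading of the stated ranges, not something the text spells out.

```python
def stone_size_group(mean_stone_size_mm: float) -> str:
    """Assign a study to a subgroup from its mean CBD stone size in mm.

    Cutoffs follow the text: A (small) < 13 mm, B (medium) 13-17 mm,
    C (large) > 17 mm. Boundary handling is an assumption.
    """
    if mean_stone_size_mm < 13:
        return "A"  # small: below the duodenoscope diameter (12-13 mm)
    if mean_stone_size_mm <= 17:
        return "B"  # medium: 13-17 mm inclusive
    return "C"      # large: above 17 mm (large dilation balloons)


# Hypothetical mean stone sizes, used only to show the call.
example_groups = [stone_size_group(s) for s in (11.5, 13.0, 17.0, 18.2)]
```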
After checking the CBD stone measurement method of each included study, we found that most studies assessed CBD stone size, stone number, and bile duct size from the cholangiogram findings during ERCP, and this was true for both RCTs and non-RCTs. Therefore, data from RCTs and non-RCTs were pooled in this meta-analysis to study the relationship between stone size and the efficacy of EPT+EPBD in stone removal.
IV. Data synthesis and analysis
We used statistical software (Stata 15.0, Stata Corp, College Station, TX) to calculate pooled odds ratios (ORs), risk differences (RDs), or standardized mean differences (SMDs) with 95% confidence intervals (CIs) for each pairwise comparison. Pooled ORs and their 95% CIs were estimated with a fixed effects model if there was no significant heterogeneity, and with a random effects model if significant heterogeneity existed. Two methods were used to assess heterogeneity: the χ²-based Q test, the results of which were considered statistically significant if the P value was < 0.05; and the I² statistic, wherein values of 30% to 60% and 60% to 90% suggested moderate and substantial heterogeneity, respectively [24, 25]. Sensitivity analyses were performed to verify the source of heterogeneity. Relationships between stone size and outcomes (first stone clearance rate and ML usage rate) were analyzed using meta-regression with the logarithmic OR (log OR) as the dependent variable. A log OR of 0 corresponds to an OR of 1. Publication bias was assessed qualitatively by inspecting funnel plots of log ORs versus their standard errors, and quantitatively using the Egger regression test and the Begg and Mazumdar adjusted rank correlation test. Publication bias was considered present if the P value was < 0.1.
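To make the model selection rule concrete, the following Python sketch pools log ORs by inverse-variance weighting, computes the Q test and I², and switches to a random effects estimate when the Q test is significant. It is not the Stata procedure used in this study. The function names, the 0.5 continuity correction for zero cells, and the DerSimonian-Laird estimator of the between-study variance are all assumptions made for illustration.

```python
import math


def log_or_with_se(a, b, c, d):
    """Log OR and its standard error from a 2x2 table.

    a, b: events / non-events in one arm; c, d: same for the comparator.
    Adding 0.5 to every cell when a zero is present is an assumption.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0  # k = 0 term of the finite series
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += h ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


def pool_log_ors(log_ors, ses, alpha=0.05):
    """Pool log ORs; fixed effects unless the Q test has P < alpha."""
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    p_q = chi2_sf(q, df) if df > 0 else 1.0
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird between-study variance (assumed estimator)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    model = "fixed"
    if p_q < alpha:
        model = "random"
        w = [1.0 / (se ** 2 + tau2) for se in ses]
    est = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    se_est = math.sqrt(1.0 / sum(w))
    ci = (math.exp(est - 1.96 * se_est), math.exp(est + 1.96 * se_est))
    return {"model": model, "log_or": est, "or": math.exp(est), "ci95": ci,
            "q": q, "p_q": p_q, "i2": i2, "tau2": tau2}
```

A call such as `pool_log_ors([0.0, 2.0], [0.1, 0.1])` uses made-up inputs with strongly conflicting estimates, so the Q test is significant and the random effects branch is taken. A pooled `log_or` of 0 in the returned dictionary corresponds to an OR of 1.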