Systematic Differences in Effect Estimates Between Observational Studies and Randomized Controlled Trials in Meta-Analyses Combining Both Study Designs in Nephrology

The limited availability of randomized controlled trials (RCTs) in nephrology undermines causal inference in meta-analyses. Under such circumstances, systematic reviews of observational studies have grown more common. We conducted systematic reviews of all comparative observational studies in nephrology published from 2006 to 2016 to assess trends over the past decade. We then focused on the meta-analyses combining observational studies and RCTs to evaluate systematic differences in effect estimates between study designs using two statistical methods: estimating the ratio of odds ratios (ROR) of the pooled OR obtained from observational studies versus those from RCTs, and examining discrepancies in their statistical significance. The number of systematic reviews of observational studies in nephrology grew 11.7-fold over the past decade. Among 56 records combining observational studies and RCTs, the ROR suggested that the estimates from the two study designs agreed well (ROR: 1.05, 95% confidence interval: 0.90-1.23). However, almost half of the reviews led to discrepant interpretations in terms of statistical significance. In conclusion, the findings based on the ROR might encourage researchers to justify the inclusion of observational studies in meta-analyses. However, caution is needed, as interpretations based on statistical significance were less concordant than those based on the ROR.


Introduction
Randomized controlled trials (RCTs) provide high levels of evidence because they can minimize threats to internal validity. However, it is difficult to conduct RCTs in certain situations, such as with participants with serious complications, or with interventions subject to ethical constraints (e.g. surgical procedures) or serious adverse effects [1][2][3][4]. In particular, RCTs in nephrology have been limited because patients with kidney diseases generally have a number of complications [5][6][7][8][9]. When the number of available RCTs is insufficient, meta-analyses restricted to RCTs can be misleading 10,11. Authors might then be justified in including observational studies 3,12. Observational studies can reflect real-world practice and have superior generalizability compared with RCTs conducted under ideal conditions. However, the GRADE system holds that nonrandomized studies constitute only a low level of evidence owing to many threats to internal validity 13.
Discrepancies in findings between observational studies and RCTs can be caused by differences in sample size, confounding factors, several biases such as selection bias and publication bias, and follow-up period 14,15. In particular, unmeasured confounding factors can hamper causal inference between the exposure and outcome 16,17. Despite such controversy, several previous studies have reported no differences in the risk estimates obtained from meta-analyses of observational studies in comparison with those from RCTs 14,18,19. However, the evidence has not been sufficiently established in nephrology. Further, recent meta-analyses comparing observational studies with RCTs based their conclusions on the ratio of odds ratios (ROR) between the pooled OR derived from observational studies and those derived from RCTs, whereas most clinical studies generally interpret efficacy based on statistical significance [18][19][20][21][22].
Therefore, in the present study, we aimed to 1) assess the trends and characteristics of systematic reviews of observational studies in nephrology over the past decade; and 2) quantify systematic differences in effect estimates between observational studies and RCTs using two statistical methods: the ROR and discrepancies in statistical significance between the two study designs among meta-analyses combining observational studies and RCTs.

Literature search and selection of studies
The literature searches were conducted in January 2017 using EMBASE and MEDLINE. We searched for studies published from January 2006 to December 2016 with no language restriction. The search strategy was developed with the assistance of a medical information specialist and included key words related to 'observational study', 'systematic review', and 'kidney disease' (see Supplement Table 1). Search terms relevant to this review were collected through expert opinion, literature review, controlled vocabulary (including Medical Subject Headings [MeSH] and the Excerpta Medica Tree), and a review of the primary search results. The titles and abstracts were screened independently by two authors (M.K, K.K) and records were excluded during screening if they were irrelevant to our research question or duplicated. Studies suspected of including relevant information were retained for full-text assessment using the inclusion and exclusion criteria. If more than one publication of a study existed, we grouped them together and adopted the publication with the most complete data. The present study was conducted according to a protocol prospectively registered at PROSPERO (CRD42016052244).
Evaluation of the characteristics of the systematic reviews of observational studies

We included systematic reviews of all comparative observational studies in nephrology to assess the trends and characteristics of systematic reviews of observational studies in nephrology over the past decade. We included systematic reviews published from 2006 to 2016 to assess the influence of reporting assessment tools, including PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) published in 2009, and risk of bias (RoB) tools, including the Newcastle-Ottawa Scale (NOS).

1. We included studies on participants with kidney diseases. Kidney diseases were defined as diseases occurring in the renal parenchyma, such as acute or chronic kidney injury, kidney neoplasms, and nephrolithiasis, based on the MeSH search builder for the term 'Kidney Diseases'. Studies were excluded if they had participants with extra-renal diseases, including ureteral diseases, urethral diseases, and urinary bladder diseases.
2. We included studies with primary outcomes related to kidney diseases, using the same definition of kidney diseases as above. We excluded studies wherein kidney diseases were treated as part of a composite outcome (e.g. a composite of kidney, pancreas, and liver cancers).
We described the characteristics of the included systematic reviews of observational studies as follows.

Comparison of effect estimates between observational studies and RCTs in meta-analyses combining both types of studies

To compare the effect estimates between study designs, we focused on the meta-analyses combining observational studies and RCTs that compared two specific interventions. We included non-randomized studies, such as cohort, case-control, and cross-sectional studies, as well as controlled trials using inappropriate strategies for allocating interventions (sometimes called quasi-randomized studies), as observational studies 23. We expressed the quantitative differences in effect estimates for primary efficacy outcomes between study designs using the ROR 24. Further, we assessed discrepancies in statistical significance between study designs. The absence of a discrepancy, representing agreement between efficacy and effectiveness, was defined as follows: 1) both study types were significant with the same direction of point estimates, or 2) neither study type was significant. In contrast, the presence of a discrepancy was defined as follows: 1) one study type was significant while the other was not, or 2) both study types were significant but the point estimates had opposite directions 24. We assessed the methodological quality of the meta-analyses combining both types of studies using the AMSTAR (assessment of multiple systematic reviews) appraisal tool 25. Two review authors (M.K, A.O) independently graded each review, rating overall confidence as high, moderate, low, or critically low.
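The discrepancy definitions above reduce to a simple decision rule. The following Python function is an illustrative sketch (not the authors' code) that classifies a pair of pooled ORs with their 95% CIs, treating an estimate as significant when its CI excludes 1.0 and taking the direction from the sign of the log odds ratio:

```python
import math

def classify_discrepancy(or_obs, ci_obs, or_rct, ci_rct):
    """Classify agreement between observational and RCT pooled ORs.

    An estimate is 'significant' when its 95% CI excludes 1.0;
    direction is the sign of ln(OR). Returns 'no discrepancy' or
    'discrepancy' per the definitions in the text.
    """
    sig_obs = not (ci_obs[0] <= 1.0 <= ci_obs[1])
    sig_rct = not (ci_rct[0] <= 1.0 <= ci_rct[1])
    same_dir = (math.log(or_obs) >= 0) == (math.log(or_rct) >= 0)

    if sig_obs and sig_rct:
        # Both significant: concordant only if the effects point the same way.
        return "no discrepancy" if same_dir else "discrepancy"
    if not sig_obs and not sig_rct:
        # Neither significant: concordant by definition.
        return "no discrepancy"
    # One significant, the other not.
    return "discrepancy"
```

For example, an observational OR of 1.5 (95% CI 1.1 to 2.0) paired with an RCT OR of 1.2 (95% CI 0.9 to 1.6) is classified as a discrepancy, because only one of the two is significant.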

Data extraction
Two authors (M.K, K.K) collected the above-mentioned characteristics of the systematic reviews of observational studies. We also determined the effect size for the primary outcome of each primary article. All data related to the comparison of effect estimates between study designs were independently reconfirmed by two authors (A.O, A.T) using standard data extraction forms.

Statistical analyses
We described the baseline characteristics of systematic reviews of observational studies using means (standard deviation [SD]) for continuous data with a normal distribution, medians (interquartile range [IQR]) for continuous variables with skewed data, and proportions for categorical data.
For the comparison of effect estimates between observational studies and RCTs in meta-analyses combining both types of studies, we extracted relative risks or risk ratios (RR), OR, hazard ratios (HR), standardized mean differences (SMD), and mean differences (MD) of each primary study, together with their standard errors (SE) or 95% confidence intervals (CI), from the reviews as outcome measures. If an OR was not reported in a review, we recalculated it by extracting the number of events and non-events in both the intervention and control groups from the review or the primary study itself. If the number of events or non-events was 0, we added 0.5 to all cells of that result 23. If we could not determine the number of events or non-events from a review or primary article, we substituted the original outcome measures, such as the RR and HR, for the OR. Additionally, SMD and MD were converted to OR based on a previous study 26. The SE and 95% CI were calculated in accordance with previous studies 22,24. Further, if a review did not report effect sizes separately for the two designs, we synthesized the results obtained from the primary articles. If positive outcomes such as survival were adopted, the OR comparing the intervention with the control were inverted; likewise, if ordinary or older interventions were included in the numerator of the OR, those OR were also inverted. If several outcomes were reported, we used the first outcome described in the paper.
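The recalculation and conversion steps above can be illustrated with a minimal Python sketch. This is not the authors' code: `or_from_counts` rebuilds an OR and the SE of its log from a 2x2 table, applying the 0.5 continuity correction when any cell is zero, and `smd_to_or` applies the standard conversion ln(OR) = SMD x pi/sqrt(3), which assumes a logistic distribution of the underlying continuous outcome:

```python
import math

def or_from_counts(e1, n1, e0, n0):
    """Odds ratio and SE of ln(OR) from event counts in the
    intervention (e1 of n1) and control (e0 of n0) arms.
    Adds 0.5 to every cell if any cell is zero (continuity correction)."""
    a, b = e1, n1 - e1          # intervention: events, non-events
    c, d = e0, n0 - e0          # control: events, non-events
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    or_ = (a * d) / (b * c)
    se_ln_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return or_, se_ln_or

def smd_to_or(smd, se_smd):
    """Convert a standardized mean difference (and its SE) to an
    odds ratio scale via ln(OR) = SMD * pi / sqrt(3)."""
    factor = math.pi / math.sqrt(3)
    return math.exp(smd * factor), se_smd * factor
```

For instance, 10 events in 100 intervention patients versus 5 in 100 controls yields an OR of (10 x 95) / (90 x 5), roughly 2.11, with its SE computed from the reciprocals of all four cells.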
We estimated the difference in primary efficacy outcomes between study designs by calculating the pooled ROR with its 95% CI using a two-step approach 27. First, the ROR was estimated from the OR obtained from observational studies and RCTs in each review using random-effects meta-regression. Second, we estimated the pooled ROR with its 95% CI across reviews with a random-effects model. Further, we performed a sensitivity analysis using a fixed-effect model. An ROR greater than 1.0 indicates that the OR from observational studies were larger than those from RCTs. Heterogeneity was estimated using the I² statistic 23; I² values of 25%, 50%, and 75% represent low, medium, and high levels of heterogeneity, respectively.
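The two-step pooling can be sketched in Python. This is a simplification for illustration only: here each review's log ROR is taken as the difference of the two log ORs with their variances added in quadrature (rather than the random-effects meta-regression used in the study), and the reviews are then pooled with a DerSimonian-Laird random-effects model, with I² derived from Cochran's Q:

```python
import math

def pooled_ror(reviews):
    """DerSimonian-Laird random-effects pooling of per-review RORs.

    `reviews` is a list of (or_obs, se_obs, or_rct, se_rct) tuples,
    with SEs on the ln(OR) scale. Within each review the log ROR is
    ln(OR_obs) - ln(OR_rct), with variance var_obs + var_rct.
    Returns (pooled ROR, 95% CI lower, 95% CI upper, I^2 in %).
    """
    y = [math.log(o) - math.log(r) for o, _, r, _ in reviews]
    v = [so**2 + sr**2 for _, so, _, sr in reviews]
    w = [1 / vi for vi in v]

    # Fixed-effect estimate and Cochran's Q.
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fe)**2 for wi, yi in zip(w, y))
    k = len(y)

    # Between-review variance (tau^2) and I^2 from Q.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0

    # Random-effects pooled estimate.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return (math.exp(mu), math.exp(mu - 1.96 * se),
            math.exp(mu + 1.96 * se), i2)
```

Dropping `tau2` from the weights (i.e., using `w` in the final step) reproduces the fixed-effect sensitivity analysis described above.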
Further, we examined the association between discrepancies in the statistical significance of each design, in accordance with the above-mentioned definitions, and risk factors using a multiple logistic regression model adjusted for the difference in the number of primary articles between study designs, publication year, country of the first author, pharmacological intervention, adjustment for confounding factors, and the methodological quality of the systematic reviews based on the overall confidence rating of the AMSTAR tool.
All statistical analyses were performed using STATA 16.0 (StataCorp LLC, College Station, TX, USA).

Study flow diagram
The PRISMA flow diagram (see Fig. 1) shows the study selection process. Of 5,547 records identified through database searching, 3,994 remained after removing duplicates; we screened their titles and abstracts and retained 613 records. After full-text review, we included a total of 477 records for the description of the characteristics of systematic reviews of observational studies. Of the 114 records that combined both observational studies and RCTs, 56 were eligible for the evaluation of quantitative systematic differences in effect estimates between observational studies and RCTs (see Supplement Table 2).

Trends over the past decade and description of study characteristics
We summarized the baseline characteristics of the 477 nephrology systematic reviews of comparative observational studies (see Table 1). The number of systematic reviews of observational studies in nephrology increased 11.7-fold from 2006 to 2016. In particular, the number of publications from China, as well as from the United States of America and European countries, has increased (see Supplement Table 3). As shown in Table 1, most of the reviews dealt with topics related to therapies for patients with acute kidney injury, malignancy, end-stage renal disease, and renal transplantation, aside from basic research. As for the eligible designs of observational studies, 67.1% of records included cohort studies and 33.8% included case-control studies. Of the 82 reviews related to basic research, 75 (91.5%) included case-control studies. Case series and before-after studies without comparisons were excluded in many reviews. The NOS was the most frequently used tool for assessing the risk of bias; ACROBAT-NRSI was used in only 0.8% of records.
Comparison of quantitative systematic differences in effect estimates between observational studies and RCTs in meta-analyses combining both types of studies

Fifty-six meta-analyses combining both observational studies and RCTs were eligible for the analyses. A total of 418 observational studies and 204 RCTs were included; the median number per meta-analysis was 7 (IQR 2.5 to 10) observational studies and 3 (IQR 2 to 5) RCTs. Almost all reviews were rated as critically low quality (see Supplement Table 4).
We compared the effect estimates of primary outcomes between study designs using the ROR with its 95% CI. No significant differences were noted in the effect estimates between study designs (ROR 1.05, 95% CI 0.90 to 1.23) (see Fig. 2). There was moderate heterogeneity (I² = 47.5%). Additionally, the result obtained using the fixed-effect model was very similar to that obtained using the random-effects model (ROR 0.98, 95% CI 0.89 to 1.07). Of the 56 studies, 2 reviews showed that observational studies had significantly larger effects than RCTs (ROR > 1.0), while 6 showed that observational studies had significantly smaller effects than RCTs (ROR < 1.0). The remaining 48 reviews indicated no significant differences between the study designs.
Of the 56 studies, 29 reviews showed no discrepancy in terms of statistical significance (14 reviews significant in the same direction of point estimates; 15 reviews significant in neither design), while 27 reviews showed a discrepancy (in all 27, one design was significant and the other was not). No review showed statistical significance in opposite directions of the point estimates. Table 2 shows a comparison of the baseline characteristics between reviews with and without discrepancies. In addition, we explored the factors associated with discrepancies (see Table 3), but no significant association was noted for any covariate, in particular the difference in the number of papers between observational studies and RCTs (OR 1.10, 95% CI 0.99 to 1.23).
Further, comparing the ROR results with the distribution of discrepancies in statistical significance, of the 48 records (85.7%) with a non-significant ROR, 20 (35.7% of all 56 records) showed discrepancies in statistical significance (see Table 4).

Discussion
The findings of the present study indicate that the number of systematic reviews of observational studies in nephrology has dramatically increased over the past decade, especially from China and the United States of America. Around 60% of the reviews assessed the risk of bias, mostly using the NOS. A comparison of effect estimates between observational studies and RCTs in meta-analyses combining both types of studies revealed that the effect estimates from observational studies were largely consistent with those from RCTs. However, when interpreted in terms of statistical significance, almost half of the reviews led to discrepant interpretations.
Observational studies generally have larger sample sizes and better represent real-world populations than RCTs. Nevertheless, confounding factors, especially confounding by indication, often disturb the precise assessment of causal inference and the establishment of high levels of evidence [28][29][30][31]. The quality of evidence based on observational studies might depend on how confounding factors are controlled. Adjustment using appropriate techniques, including propensity score matching and instrumental variables, is likely to be useful, although these methods cannot completely deal with unmeasured variables 32,33. However, most of the reviews included in the present study did not describe the implementation of such adjustments in detail.
Recently, several risk of bias appraisal tools for evaluating the quality of systematic reviews of observational studies across multiple domains have been developed, including ACROBAT-NRSI 34-36. However, the present study showed that these tools are not yet widely implemented. Most of the studies reported the risk of bias using the NOS, although the NOS has been reported to show uncertain reliability and validity in previous studies 37,38.
In the present study, we compared the effect estimates between observational studies and RCTs in meta-analyses combining both types of studies using two analytical methods: the ROR and discrepancies in statistical significance between the study designs. The ROR with its 95% CI revealed that effect estimates were, on average, consistent between the two study designs. However, with regard to the interpretation of the findings, almost half of the records showed discrepancies in statistical significance between the study designs. Further, 35.7% of records indicated disagreement in judgement between the two analytical methods. It might be reasonable to combine different types of designs in meta-analyses based on the ROR, because the improvement in statistical power leads to a more definite assessment when a sufficient number of RCTs cannot be obtained. However, the findings should be interpreted with care, as inconsistencies arising from a change in analytical method might reflect poor internal validity across the study designs. Additionally, the present study failed to identify systematic review-level factors associated with discrepancies in statistical significance, including the difference in sample sizes between study designs and adjustment for confounding factors. Future work should explore risk factors at the level of primary studies, such as the details of adjustment techniques for confounding factors and the presence of biases in observational studies.
Several limitations of our study should be mentioned. First, it is possible that we failed to include some gray literature or smaller studies, although we performed a comprehensive search. Second, we included similar research questions published by different authors, which might have led to overestimation. Third, to compare effect estimates between study designs, we substituted the original outcome measures, such as the RR and HR, for the OR when the number of events could not be determined from the primary articles, similarly to previous studies 21,39. However, results using the RR and HR are not necessarily consistent with those using the OR, particularly when the number of events is large. Fourth, we were unable to estimate the ROR adjusted for the methodological quality of systematic reviews based on the AMSTAR tool, as almost all reviews were judged to be of low quality. Finally, because we sampled meta-analyses including both observational studies and RCTs, it is conceivable that extreme results, from either observational studies or RCTs, could have been excluded when the original meta-analyses were conducted, leading to spuriously greater concordance between the two study designs. Without a pre-specified protocol, we cannot assess the extent of such practices.

Conclusion
This study indicates that evidence synthesis based on observational studies has been increasing in nephrology. When we examined the ROR, we found no systematic differences in effect estimates between observational studies and RCTs in meta-analyses including both study designs. These findings might encourage researchers to justify the inclusion of observational studies in meta-analyses; this approach can increase statistical power and allow stronger causal inference. However, caution is needed when interpreting findings from both observational studies and RCTs, because interpretations based on statistical significance were less concordant than those based on the ROR. Further studies are necessary to explore the causes of these contradictions.

Declarations
Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Lilly, personal fees from Mitsubishi-Tanabe, personal fees from Asahi-Kasei, personal fees from Takeda, personal fees from Pfizer, grants from Advantest, outside the submitted work; A.T. reports personal fees from Mitsubishi-Tanabe, personal fees from Dainippon-Sumitomo, personal fees from Otsuka, outside the submitted work; T.A.F. reports grants and personal fees from Mitsubishi-Tanabe, personal fees from MSD, personal fees from Shionogi, outside the submitted work; M.K. and K.K. declare that they have no relevant financial interests.

Competing interests
The authors declare no competing interests.

Adjusted for differences in the number of primary articles between observational studies and RCTs, publication year, country of first author, and pharmacological intervention.