Analysis of Evidence-Rating Systems Used in Meta-Analyses of Pharmacotherapy

Background: Evidence-rating systems (ERSs) provide a framework for the systematic evaluation of the quality of individual interventional or observational studies and of the overall body of evidence in meta-analyses. Authors and users of meta-analyses require familiarity with ERSs to determine the level of confidence in the application of results. Many ERSs have been published, but no consensus exists regarding best practice for their use. Objective: The aim is to describe patterns of use of ERSs in meta-analyses of drug therapy published in contemporary high-impact medical journals. Methods: We designed a review. Medline/PubMed was searched to identify meta-analyses evaluating drug therapy from the top 5 ranked general medical journals from 2012 to 2016. The methods sections of full texts were reviewed to ensure the meta-analyses evaluated drug therapy and to identify the ERS used to rate individual studies and the overall body of evidence. Frequency of ERS use was analyzed using descriptive statistics. Results: The top-ranked journals were Ann Intern Med, BMJ, JAMA, Lancet, and PLoS Medicine. Of the 309 results, manual review excluded 111 meta-analyses. Of the 198 evaluated meta-analyses, 86.4% (171) utilized an ERS; the most commonly used was the Cochrane Risk of Bias Tool, used in 80.7% (138) of meta-analyses employing an ERS. An ERS was used to evaluate the body of literature in 19.1% (38) of meta-analyses; the most commonly used of three such systems was the GRADE methodology. Overall, 14 unique ERSs, including author-defined systems, were used. Conclusions: Most meta-analyses of drug effects in high-impact medical journals evaluated individual studies with an ERS, most commonly the Cochrane Risk of Bias Tool, while the use of ERSs to evaluate the body of literature was less frequent. The familiarity of authors and users of meta-analyses with commonly used ERSs may facilitate the evaluation and application of findings of meta-analyses.


Impacts On Practice
Experience is crucial for the evaluation and application of findings of meta-analyses. Meta-analyses evaluating individual studies with an evidence-rating system were frequent compared with those evaluating the body of literature, and modified versions of evidence-rating systems, although used, may not be optimal.
It could be beneficial for journals to be more explicit regarding the detail required for risk-of-bias assessments, and further work is required to develop a general framework of best practices for evidence-rating systems.

Background
Evidence-based clinical decision-making relies on well-designed research that utilizes rigorous methodology [1,2]. Ranking highly among the various clinical research designs are meta-analyses, which aim to quantitatively integrate the findings of individual studies [3]. The results of meta-analyses may be more useful than those of the individual studies they contain: the integrated result may include a more representative population and therefore possess greater statistical power, providing more confidence in decision-making through more precise results and offering a more efficient review of a full body of literature [4].
These benefits of meta-analyses are particularly relevant in the assessment of drug effects, which are the subject of a voluminous literature. A cursory Medline search of publications indexed in 2016 revealed approximately 10,500 results for publications on clinical trials related to drug therapy alone. Additionally, the ClinicalTrials.gov website listed 138,002 clinical trials on drugs or biologics registered as of May 9, 2019 [5].
In spite of these benefits of meta-analyses, their results are only as reliable as those of the individual studies they include. For this reason, meta-analyses should evaluate the risk of bias of the individual studies they include. This practice is recommended by various sources, including the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement, the Meta-analysis Of Observational Studies in Epidemiology: a proposal for reporting (MOOSE) statement, and the Cochrane Handbook on Systematic Reviews of Interventions [6-8]. These sources are designed to improve the reporting and utility of meta-analyses, and to delineate the preferred methodology for meta-analyses produced by some of the world's most recognized and authoritative groups of authors [6-9].
A study's risk of bias may be assessed through evaluation of elements of research design (e.g., randomization and blinding) and other design flaws that lead to increased risk of selection, performance, information, and other biases [6-8]. Nonetheless, the assessment of study quality may present difficulties because the concept of "quality" is not well defined; not surprisingly, some assessments of markers of study quality (e.g., blinding) are judged subjectively and have poor inter-rater reliability [10,11].
In attempts to provide greater objectivity and reproducibility to assessments of study quality, numerous evidence-rating systems (ERSs) have been developed. A 2002 Evidence Report Summary from the US Agency for Healthcare Research and Quality (AHRQ) found that among 1,602 reviewed publications, many ERSs were used for different purposes: 49 evaluated randomized controlled trials (RCTs), 19 evaluated observational studies, and 20 evaluated systematic reviews [12].
Several of these systems are often used in contemporary meta-analyses; for prospective trials, these include systems developed by AHRQ, Cochrane, and Jadad and colleagues. Underlying their familiarity, however, are significant differences in methodology. For example, the AHRQ and Cochrane tools assign non-numeric assessments to independent domains related to risks of bias, while the Jadad score generates a quantitative score reflecting methodological quality [8,13]. For observational studies, prevalent systems include the Downs-Black and Newcastle-Ottawa systems [14-16]. These also differ in that the Downs-Black system may be applied to both randomized and non-randomized studies and provides a semi-quantitative assessment, while the Newcastle-Ottawa system is applied only to non-randomized studies, as either a checklist or a quantitative scale [16]. Additionally, tools that evaluate the quality of the overall body of evidence have been created to help summarize main findings, and their use is also recommended by PRISMA and Cochrane [8,17]. These assessments are similar to epidemiologic determinations of causality (e.g., the Bradford Hill criteria), which consider elements of quantity, quality, and consistency of evidence for each outcome across studies [18].
Although there is much overlap in some domains between systems that assess the risk of bias in individual studies, no single ERS addresses all the necessary components [13]. Certainly, different study designs and assessment purposes dictate the need for different ERSs, and no single system is considered universally most appropriate [4,14,21]. The resulting variability among the numerous systems can be problematic for those conducting meta-analyses [14]. Further knowledge regarding trends in the use of specific ERSs in different scenarios may help identify commonly used systems.
Because the generation of meta-analyses involves assessment of the quality of individual studies and of the overall body of evidence, familiarity with commonly used ERSs may help both authors and users of meta-analyses to assess the level of confidence in, and application of, their results. In summary, there is an abundance of published drug studies, numerous ERSs coexist in use, and no consensus exists on best practices.

Aim of the study
The objective of this study is to describe the use of ERSs (including those used to evaluate the body of literature) in meta-analyses of drug therapy published in high-impact medical journals.

Ethics Approval
Not applicable. This research does not contain any studies with human participants conducted by any of the authors.

Methods
We performed a review of the use of ERSs in meta-analyses of drug effects published in high-impact medical journals. The highest-impact journals in the category of Medicine, General and Internal for the years 2012 through 2016 were determined using the Clarivate Analytics Journal Citation Reports (JCR) database [22]. The journals ranked within the top five positions for this category in any year from 2012 through 2016 were selected as source journals for this analysis, similar to the methods utilized in other evaluations of reporting quality [23]. We searched for meta-analyses of drug effects published in these journals in Medline/PubMed using Medical Subject Headings (MeSH). The subheadings descriptive of drug effects included therapeutic use, adverse effects, drug therapy, and administration and dosage. Meta-analyses were targeted by searching for the term "meta-analysis" in the title field or for articles classified with the National Library of Medicine publication type meta-analysis. The time period 2012 to 2016 was selected to allow sufficient time for indexation with MeSH terms between the time of publication and the time of searching, because delays in indexing have previously been documented [24]. In the online supplementary appendix, we report the key terms (PRISMA flowchart as Fig. 1) outlining the search strategy. This research did not require further review by a local Institutional Review Board, as it does not meet the definition of human subject research. Methods of the analysis and inclusion criteria were specified in advance and documented in an internal non-registered protocol (available upon request).
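As an illustration of how a search of this kind can be assembled, the following sketch combines the drug-effect subheadings and meta-analysis filters described above into a single PubMed query string. This is a hypothetical reconstruction, not the authors' exact strategy, which is reported in the supplementary appendix; the field tags used are standard PubMed syntax.

```python
# Hypothetical sketch of a PubMed query combining the MeSH drug-effect
# subheadings described above with a meta-analysis filter; the authors'
# actual strategy is reported in the supplementary appendix.
subheadings = [
    "therapeutic use",
    "adverse effects",
    "drug therapy",
    "administration and dosage",
]
# [sh] = MeSH Subheading, [ti] = Title, [pt] = Publication Type,
# [dp] = Date of Publication (standard PubMed field tags)
drug_block = " OR ".join(f'"{s}"[sh]' for s in subheadings)
design_block = "(meta-analysis[ti] OR meta-analysis[pt])"
query = f"({drug_block}) AND {design_block} AND 2012:2016[dp]"
print(query)
```

The resulting string can be pasted into the PubMed search box or passed to an E-utilities client; limiting to the five source journals would add journal-name clauses to the same boolean structure.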
Medline data were downloaded from PubMed and entered into a standardized data collection form. The full texts of all articles were manually reviewed against the inclusion criteria: records were retained if they were determined to be meta-analyses (i.e., reported quantitative synthesis) and evaluated drug therapy. The methods sections, and when indicated the supplemental material, of included articles were reviewed, and data were collected regarding the ERS used (if any) to rate the quality of evidence at the levels of the individual study and the body of literature, the methods used for assessing risk of bias of individual studies, and whether authors incorporated modifications to an ERS. A global assessment of risk of bias that may affect the evidence was performed. Descriptive statistics were calculated using SPSS to report frequencies.
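The descriptive analysis was performed in SPSS; as a minimal sketch, the same kind of frequency tabulation could be expressed in Python as follows. The records shown are illustrative only, not the study's data.

```python
from collections import Counter

# Illustrative records only: the ERS (if any) reported by each included
# meta-analysis; the values below are hypothetical, not the study's data.
ers_used = [
    "Cochrane Risk of Bias Tool",
    "Cochrane Risk of Bias Tool",
    "Newcastle-Ottawa scale",
    "Jadad score",
    None,  # a meta-analysis reporting no ERS
]
counts = Counter(e for e in ers_used if e is not None)
n_total = len(ers_used)
for system, n in counts.most_common():
    # Percentage is taken over all included meta-analyses (the denominator
    # choice matters when reporting "% of all" vs "% of ERS users").
    print(f"{system}: n={n} ({100 * n / n_total:.1f}%)")
```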
Principal summary measures may differ from those used in some of the included studies and may not be reported for all studies. Exploratory analyses describing ERS use, stratified by journal, were also performed.
The between-study variability (heterogeneity or inconsistency) of results may influence the decision of whether to combine results in a meta-analysis; accordingly, there were no planned methods for combining results.
No additional sensitivity or subgroup analyses were performed. Overall, 86.4% (n = 171) of meta-analyses of drug effects published in high-impact medical journals utilized an ERS to evaluate interventional or observational studies. Fourteen unique ERSs were identified, including author-defined systems; the latter were utilized to evaluate interventional and observational studies in 12 and 2 meta-analyses, respectively.

Review
Among the meta-analyses reporting use of an ERS to rate interventional trials (n = 171), the most frequently used ERS was the Cochrane Risk of Bias Tool (80.7%; n = 138), followed by a variety of other systems (Table 1). Meta-analyses that reported ERSs for rating observational studies (n = 24) most frequently used the Newcastle-Ottawa scale (66.7%; n = 16; Table 2). An ERS evaluating the body of literature was used by 38 meta-analyses, with the most commonly used system being the GRADE methodology (78.9%; n = 30; Table 3). Systems developed by AHRQ and the USPSTF were also utilized (n = 7 and n = 1, respectively). Four meta-analyses incorporated a secondary ERS to evaluate interventional trials, including systems by Jadad (n = 2), AHRQ (n = 1), and McHarm (n = 1). No meta-analyses evaluating observational studies utilized a secondary ERS. Not all meta-analyses presented results with confidence intervals and measures of consistency.
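As a quick arithmetic check, the proportions reported above follow directly from the counts given; a minimal Python sketch:

```python
# Sanity check of the proportions reported in the Results: each
# percentage is 100 * n / denominator, rounded to one decimal place.
def pct(n, denom):
    return round(100 * n / denom, 1)

assert pct(171, 198) == 86.4  # meta-analyses using any ERS
assert pct(138, 171) == 80.7  # Cochrane RoB Tool among ERS users
assert pct(16, 24) == 66.7    # Newcastle-Ottawa among observational
assert pct(30, 38) == 78.9    # GRADE among body-of-literature ERSs
print("all reported percentages check out")
```

Note the denominators differ by row (all included meta-analyses vs. the subset using an ERS of each type), which is why the same count can yield different percentages in different tables.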
Modifications of an ERS were made in 11 meta-analyses. Eight of these modifications involved ERSs for interventional studies; the modified systems included the Cochrane Risk of Bias Tool (n = 7) and the Jadad score (n = 1). Observational ERSs were modified in 3 meta-analyses, including the Newcastle-Ottawa scale (n = 2) and ROBINS-I (n = 1). Most meta-analyses presented results of an assessment of risk of bias across studies. No exploratory subgroup or sensitivity analyses were performed, bearing in mind the potential for multiple analyses to mislead.
ERS use across journals indicated variations in whether any ERS was used and which ERS was used (Table 4; Fig. 2). Notably, the meta-analysis reporting use of AHRQ methods referenced the AHRQ Methods Guide for Effectiveness and Comparative Effectiveness Reviews; this document provides information about the process of evaluating risk of bias but does not propose its own ERS per se, and the meta-analysis did not explicitly state the actual ERS used to assess risk of bias [14]. The variety of unique ERSs identified confirms that there is no well-accepted gold-standard system, consistent with statements to this effect by respected organizations commenting on this topic [14]. While most meta-analyses included an ERS to rate the quality of individual studies, only 19.1% incorporated an ERS evaluating the body of literature. In addition, inter-journal practices varied, as noted by the disparity in the proportion of meta-analyses using any ERS, which was as low as 53.7% in Lancet, compared to 100% in Annals of Internal Medicine and JAMA.
These findings indicate that adherence to the recommendations for the reporting of several aspects of meta-analytic designs might be suboptimal. For example, the PRISMA statement and Cochrane Handbook recommend that authors specify assessments of risk of bias for each study, across studies, and at the outcome level. Similarly, the MOOSE guidelines suggest, less specifically, that risk of bias should be discussed [6-8]. Our findings (86.4% of meta-analyses in high-impact medical journals used an ERS) show variation in use across journals; these findings suggest these standards are not consistently met. Systematic reviewers should have flexibility to choose the tool that best matches their study and literature base.
Another notable finding was that more than 10 meta-analyses used modifications of an ERS. This practice may not be optimal, as authors of the GRADE system have argued that such modifications undermine the goal of promoting a single system with which all readers can be familiar [25]. Indeed, recent commentaries have questioned the rigor of using modified versions of ERSs in meta-analyses, specifically the Newcastle-Ottawa scale [26]. Readers of meta-analyses should therefore be observant for such modifications and consider their effects on estimates of study quality. Additionally, author-defined systems were utilized to evaluate interventional and observational studies in 12 and 2 meta-analyses, respectively. One of these rated study quality using methods from a previous publication, which was not an ERS per se but instead evaluated how specific elements of study design biased the estimates of an intervention's effect [27,28]. In such instances, authors should be explicit about the methods used to assess the risk of bias, to clearly explain the process to readers.
Our findings suggest potential improvements in standards for publication of meta-analyses; it could be beneficial for journals to consider more explicit statements regarding the amount and type of detail required in risk-of-bias assessments to improve reporting. For example, guidance from journals included in this analysis refers authors to PRISMA, MOOSE, and other relevant guidance on reporting of systematic reviews [29-33]. In addition to the commonly used system developed by the GRADE Working Group, other systems for rating the body of evidence have been developed by the AHRQ, the US Preventive Services Task Force, and the Oxford Centre for Evidence-Based Medicine [18-20]. These systems evaluate different domains and incorporate their own processes of translating assessments of a body of literature into clinical recommendations, such as those provided by clinical practice guidelines. However, an assessment of 40 of these systems in the 2002 AHRQ report determined that they are less uniform than those used for assessing individual studies, which may complicate the selection of an appropriate system to rate a body of evidence [12]. Further instructions from journals may help address the limitations identified in this study. For example, while certain ERSs will undoubtedly be preferred in different scenarios, journals may consider establishing preferred ERSs for meta-analyses characteristic of the journal's scope, such as one preferred ERS for interventional studies and another for observational studies. This may allow for more meaningful comparisons of estimates between meta-analyses published in the same journal, when, for example, one meta-analysis evaluates efficacy and another evaluates safety of the same drug; this task would be more complex if different meta-analyses used various systems to rate the quality of evidence. Journal-specific preference for particular ERSs may cultivate more awareness and familiarity among readers and facilitate the application of evidence in practice.
Our analysis has several limitations. Firstly, we considered only a narrow scope of journals: the top five journals in Medicine, General & Internal. This category covers resources on medical specialties such as general medicine, internal medicine, clinical physiology, pain management, and military and hospital medicine, whereas the Pharmacology and Toxicology category, which covers resources on the discovery and testing of bioactive substances (including animal research, clinical experience, delivery systems, and dispensing of drugs) as well as on the biochemistry, metabolism, and toxic or adverse effects of drugs, was not included. These findings may therefore not be representative of nonmedical journals, or of medical journals with lower impact factors or from specialized practice areas, and an Embase search could also have broadened coverage. Secondly, the editorial and peer-review standards of the higher-impact journals in this analysis may have produced findings more reflective of "best practice" in meta-analysis production. Among these journals, a large number of meta-analyses originated in BMJ, and the requirements of this journal may disproportionately influence our overall findings. Thirdly, we evaluated only meta-analyses of drug effects, and our conclusions are not generalizable to meta-analyses of other interventions. Taken together, these limitations indicate there may be a greater variety of ERS utilization outside the journals and interventions considered in this review.
Future research conducting an overview of systematic reviews would be needed. The 40 fields of the protocol should be prospectively registered on PROSPERO, an international prospective database of registered systematic reviews, developed and managed by the Centre for Reviews and Dissemination (CRD) at the University of York.
Our review represents, to the authors' knowledge, the first description of the frequency of use of ERSs in the medical literature. Further research should build on these findings to develop a general framework for best practices in this field.

Conclusions
Most meta-analyses of drug effects in high-impact medical journals evaluated individual studies with an ERS, most commonly the Cochrane Risk of Bias Tool, while use of ERSs to evaluate the body of literature was less frequent. The evaluation and application of findings of meta-analyses may be facilitated by familiarity of authors and users of meta-analyses with commonly used ERSs, as well as by more specific guidance for journal submissions.

Competing interests
The authors declare that they have no competing interests.

Supplementary Files
This is a list of supplementary files associated with this preprint.