We will conduct two studies to assess the frequency (objective 1) and the implications (objective 2) of the use of GRADE in current dental literature. We adhered to all sections of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) statement that applied to our methodological study [see Additional File 1].11
Search Strategy
We will utilize one search strategy to retrieve potentially eligible SRs for both studies.
We will perform a search in Ovid MEDLINE from January 1, 2016, to the present day. We will use search filters from the Health Information Research Unit (HIRU) of McMaster University as well as the Medical Subject Heading (MeSH) “dentistry” to search for SRs.12 There will be no language restrictions in our search strategy. Our final search strategy (Box 1) will be reviewed by a methods expert (R.B.-P.).
Box 1: Ovid MEDLINE Search
1. MEDLINE.tw.
2. systematic review.tw.
3. meta analysis.pt.
4. 1 or 2 or 3
5. exp Dentistry/
6. 4 and 5
7. limit 6 to yr="2016 -Current"
Terms 1-4 refer to the HIRU review filter that maximizes specificity.
Screening Process
For both studies, we will screen the titles and abstracts and full texts of the retrieved citations independently and in duplicate using Covidence. Conflicts will be resolved through discussion or by a third reviewer when necessary. Eligibility criteria will be different for each study and are described below.
Study Sample & Random Sampling of Citations
We will screen all citations retrieved in our search at the title and abstract screening stage. We will then take a random sample of studies that meet the eligibility criteria at this stage and screen them in full-text. We will repeat the random sampling process until we reach our target sample size.
To obtain an informative sample, we aim to include a minimum of 50 SRs that use GRADE. Because a previous study found that nearly 30% of oral health SRs used GRADE, we adopted a more conservative estimate of 25% and set our target sample size at 200 SRs (50/0.25 = 200).9
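The sample size arithmetic and the repeated sampling procedure described above can be sketched as follows. This is only an illustration: the function names, the batch size, and the seed are assumptions that do not come from the protocol, and `is_eligible_full_text` stands in for the reviewers' full-text eligibility decision.

```python
import math
import random


def target_sample_size(min_grade_srs: int, assumed_grade_proportion: float) -> int:
    """Number of SRs needed so the expected count of SRs using GRADE reaches the minimum."""
    return math.ceil(min_grade_srs / assumed_grade_proportion)


def sample_until_target(candidates, is_eligible_full_text, target, batch_size=50, seed=2016):
    """Draw random batches without replacement until `target` SRs pass full-text screening.

    `batch_size` and `seed` are hypothetical choices made for this illustration.
    """
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)  # reading a shuffled pool in order samples without replacement
    included = []
    while pool and len(included) < target:
        batch, pool = pool[:batch_size], pool[batch_size:]
        included.extend(c for c in batch if is_eligible_full_text(c))
    return included[:target]
```

With a minimum of 50 SRs using GRADE and an assumed proportion of 0.25, `target_sample_size(50, 0.25)` gives the target of 200 SRs. If the pool of citations is exhausted first, the sketch simply returns the SRs found so far.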
Study 1: Assessment of the frequency of the utilization of the GRADE approach in recent dental literature
Objectives
- To determine the frequency of the utilization of GRADE in dentistry SRs.
- To summarize the frequency of the levels of certainty determined by GRADE assessments conducted in dentistry SRs.
- To assess whether GRADE is being used appropriately at both the review and outcome level (for the primary outcome) in dentistry SRs.
- To evaluate whether SRs using GRADE differ from those that do not use GRADE with regard to methodological quality.
Eligibility Criteria
Inclusion Criteria
We will include SRs of interventions in dentistry, published in English, which included only RCTs.
We will consider SRs to be studies in which either of the following criteria are met:
- The authors refer to the study as either a SR or meta-analysis and search at least one electronic database for published studies.
- The authors search at least one electronic database for published studies and use well-defined eligibility criteria. We will consider eligibility criteria to be well-defined if they address all of the following:
  - The study designs to be included in the SR.
  - The population of interest for the research question (e.g., patient characteristics, specific indication for treatment).
  - The intervention(s)/comparator(s) the authors aim to investigate.
In order to be considered a SR in dentistry, one of the following conditions must be met:
- The SR includes studies in which patients receive treatment for an oral pathology or undergo an oral-health related procedure.
- The SR includes studies in which one oral-health related intervention is compared to another, placebo, or standard care.
Exclusion Criteria
- SRs which conduct network meta-analyses (NMAs)
- SRs which find no evidence and therefore fail to include any studies
- SRs which are published in combination with another type of study (e.g., case study/series, health technology assessment, clinical practice guidelines, etc.)
Data Extraction
Pairs of reviewers will extract data from eligible studies independently and in duplicate using forms created in Microsoft Excel. Reviewers will undergo a data extraction calibration exercise of three SRs per reviewer and pilot the standardized extraction sheet prior to the start of extraction. We will resolve conflicts through discussion or by consulting a third reviewer.
Data to be extracted from each SR will include general characteristics including title, author(s), journal, year of publication, country of authors, and, if applicable, dentistry specialty or specialties.13 For SRs conducting GRADE assessments, reviewers will also identify the primary outcome of each SR, which is the outcome defined as such by the authors or the outcome first listed in the methods section. If there are multiple primary outcomes defined by the SR authors, we will use the first outcome mentioned. If the methods section does not clearly describe the outcomes, we will consider the first outcome mentioned in the results section to be the primary outcome. Additionally, if a review assesses multiple comparisons for the primary outcome, we will only consider the results of the first comparison described in the results. If the primary outcome is assessed at multiple time points, we will consider only the results of the shortest time point. We will also extract data on the methodology of each SR, including the methods of searching, screening, and data extraction as well as the results of the SRs, including the outcomes analyzed and number of included RCTs. In order to allow us to select an outcome of interest for study 2, we will also extract data on whether the SR authors conducted and reported the results of an RoB assessment and whether they report the number of participants analyzed for narratively reported outcomes.
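The fallback order for identifying the primary outcome can be written as a small sketch. The `ReviewOutcomes` structure and its field names are hypothetical and serve only to illustrate the rule; each list is assumed to hold outcome names in the order in which they appear in the SR.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewOutcomes:
    """Outcome names in order of appearance in each part of an SR (hypothetical structure)."""
    author_defined_primary: list = field(default_factory=list)
    methods_outcomes: list = field(default_factory=list)
    results_outcomes: list = field(default_factory=list)


def select_primary_outcome(review: ReviewOutcomes):
    """Apply the fallback order: author-defined primary, then methods, then results."""
    for source in (review.author_defined_primary,
                   review.methods_outcomes,
                   review.results_outcomes):
        if source:
            return source[0]  # the first-mentioned outcome is used when several are listed
    return None  # no outcome could be identified
```

The same "first listed" convention would apply to comparisons, while for time points the rule is the shortest one rather than the first.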
Regarding GRADE, we will extract the extent to which it was used in each SR (for all, some, or none of its outcomes), whether summary of findings tables were used, whether GRADE was used for all outcomes that were meta-analyzed, and whether GRADE was used for outcomes that were not meta-analyzed. We will also determine whether the SR authors refrain from making recommendations (i.e., statements about whether an intervention should or should not be used in clinical practice). We will search for potential recommendations in the conclusion, discussion, and abstract of the SR. We will extract additional data on the GRADE assessments for the primary outcome including the final certainty of the evidence rating, ratings and explanations for each GRADE domain, and additional information to allow us to determine whether GRADE was used appropriately at the outcome level. We will note any other issues with the GRADE assessments of the SR authors as part of our evaluation of whether GRADE was used appropriately.
We will also extract whether GRADE assessments were incorporated into conclusions about the primary outcome in the abstract and body of the SR. We will define a conclusion as a statement in which the authors interpret their results by stating whether the intervention(s) has beneficial or harmful effects relative to, or is no different from, the comparator(s), or stating that there is a lack of evidence regarding the outcome. We will first extract conclusions about the primary outcome from the abstract. If there is no conclusion section in the abstract, we will extract any conclusion statements from the results of the abstract. We will also extract conclusions about the primary outcome from the body of the SR, referring to the SR’s designated conclusion section to minimize subjective judgements. If there is no conclusion section, we will extract the conclusion from the discussion section. Finally, if there is no clear conclusion statement in any of the aforementioned sections, we will not assess the conclusions of the SR but will still incorporate the SR in our other analyses (e.g., percentage of use of GRADE).
A summary of the data extraction fields can be found in Table 1. Should additional data need to be extracted, we will modify the standardized form, extract these new data for all eligible studies, and report these protocol modifications in the final publication.
Table 1: Data Extraction Fields
Data Analysis
All retrieved articles will be presented in a study selection flow chart and the data of eligible studies summarized in tables.
For determining how frequently GRADE is used in dentistry SRs (objective A), we will first conduct a descriptive analysis. We will calculate the percentage of SRs using GRADE overall, by year, and by dental specialty from our entire sample of studies. Additionally, for SRs using GRADE for at least one outcome, we will calculate the percentage of SRs using GRADE for outcomes in which no meta-analysis was conducted.14 Finally, we will determine how frequently authors incorporate GRADE into conclusions about the SR’s primary outcome in both the body of the SR and the abstract.
For summarizing the frequency of the levels of certainty determined by GRADE assessments in dentistry SRs (objective B), we will determine the percentage of high, moderate, low, and very low certainty evidence amongst the primary outcomes of each SR. To evaluate which limitations are more likely to lead to lower certainty evidence in the current literature, we will also quantify the frequency of concerns that lead to rating down the certainty of the evidence for each GRADE domain.
For assessing whether GRADE is being used appropriately (objective C), we will conduct two separate evaluations: at the review level and at the outcome level for the primary outcome of each SR.6,15–20 We will determine the percentage of SRs using GRADE appropriately at each level using the criteria outlined in box 2.6,15–20
Box 2: Checklist for determining whether GRADE was used appropriately
- If the response to all of the following questions is “yes,” then GRADE has been used appropriately at the review level. If any of these criteria are not met, then GRADE was not used appropriately.
  - Do the SR authors use GRADE for all outcomes for which a meta-analysis was conducted?
  - Are the GRADE assessments compiled in a GRADE evidence table (summary of findings table or evidence profile)*?
  - Do the SR authors refrain from making recommendations?
- If the response to all of the following questions is “yes,” then GRADE has been used appropriately at the outcome level. If any of these criteria are not met, then GRADE was not used appropriately.
  - Are all five GRADE domains assessed?
  - Do the SR authors refrain from using the criteria for rating up?
  - Are explanations provided for all domains that are downgraded?
  - Are all the explanations for the downgraded domains informative?**15
    - For study limitations, do the authors indicate the proportion of studies for which there were concerns about high risk of bias or the specific RoB assessment criteria that were of most concern?
    - For imprecision, do the authors indicate whether the sample size or number of events was too low or whether the bounds of the CI have different meanings based on thresholds for the optimal information size or the effect size, respectively?
    - For inconsistency, do the authors indicate how heterogeneity was judged (e.g., confidence interval overlap, statistical tests)?
    - For indirectness, do the authors indicate whether it was the population, intervention, comparator, or outcome of the included RCTs that does not align with the SR question and is therefore a reason for concern?
    - For publication bias, do the authors indicate the reason to suspect publication bias (e.g., funnel plot, suspected selective reporting)?
  - If the primary outcome is dichotomous, do the SR authors transform relative estimates of effects to absolute estimates in order to assess imprecision?
  - For the GRADE domains which were downgraded, is there evidence that the SR authors assessed the domains using incorrect criteria (e.g., referring to the criteria for indirectness in the explanation for rating down imprecision, creating concerns for whether imprecision and indirectness were appropriately assessed)?
We will also note any other issues with the SR’s GRADE assessments. If the review has any issues in the GRADE assessments at the review level or the outcome level, we will conclude that GRADE was not used appropriately at the review level or outcome level, respectively.
*We will consider any table that lists the certainty of the evidence ratings achieved after the GRADE assessments with information regarding which domains were downgraded (either reported in the table or in the footnotes) to meet this criterion.
**We will capture whether only some of the downgraded domains have informative explanations and report which domains were most or least likely to have informative explanations as defined above.
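The "all criteria must be met" logic of the checklist can be sketched as below for the review level. The item keys are hypothetical labels for the three review-level questions, and treating an unanswered item as "no" is an assumption of this sketch rather than a rule stated in the protocol. The outcome level could be handled in the same way by passing its own list of items.

```python
REVIEW_LEVEL_ITEMS = (
    "grade_for_all_meta_analyzed_outcomes",
    "evidence_table_present",
    "refrains_from_recommendations",
)


def used_appropriately(responses: dict, items=REVIEW_LEVEL_ITEMS, other_issues: bool = False) -> bool:
    """Return True only if every item is answered "yes" (True) and no other issue was noted.

    A missing response counts as "no", which is a conservative assumption of this sketch.
    """
    return not other_issues and all(responses.get(item, False) for item in items)
```

For example, an SR that meets the first two criteria but makes a recommendation would be classified as not using GRADE appropriately at the review level.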
We will evaluate whether SRs using GRADE differ from those that do not use GRADE with regard to methodological quality (objective D). To evaluate the methodological quality of each SR, we will refer to two aspects of the ROBIS tool.21 First, we will determine whether the search strategy was comprehensive. A search strategy will be considered comprehensive if it searches for published and unpublished reports (by specifying grey literature databases or searching for unpublished reports through any other means) (ROBIS question 2.1).21 Second, we will assess whether efforts were made to minimize errors during screening (i.e., title/abstract and/or full-text screening) as well as data extraction (ROBIS questions 2.5, 3.1).21 As we anticipate poor reporting of the methods used for screening, we will consider any mention of conducting screening independently and in duplicate or of having a second reviewer check the work of another to be minimizing errors. For data extraction, as described in the ROBIS tool, SRs for which this process is conducted independently and in duplicate or by having a second reviewer check the work of another reviewer in detail will be considered to be minimizing errors.21 We will use the odds ratio and its 95% confidence interval (CI) to determine whether SRs using GRADE are more likely to (1) have a comprehensive search strategy that considers grey literature and (2) take steps to avoid errors in screening and data extraction.
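As an illustration of the planned comparison, the odds ratio and its 95% CI can be computed from a 2x2 table of counts. The protocol does not state how the CI will be calculated; the sketch below assumes the common Wald interval on the log odds ratio, and the function name and cell labels are illustrative.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: SRs using GRADE with the feature,     b: SRs using GRADE without it
    c: SRs not using GRADE with the feature, d: SRs not using GRADE without it
    """
    if min(a, b, c, d) == 0:
        raise ValueError("Zero cell: a continuity correction or an exact method is needed")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

Here "the feature" would be, for example, a comprehensive search strategy. An interval that excludes 1 would suggest that the odds of having the feature differ between SRs that use GRADE and those that do not.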
Study 2: Impact of GRADE assessments or lack thereof on the conclusions of dentistry-related systematic reviews
Objectives
- To determine whether a lack of certainty of the evidence assessments is a predictor of inappropriately formulated conclusions in SRs.
- To determine whether the use of GRADE changes the conclusions of dentistry SRs which do not utilize the tool.
Outcome of Interest
To conduct this study, we will focus on a specific outcome across all SRs. We will determine the outcome of interest based on the following criteria:
- The outcome of interest will be the outcome most frequently reported within our sample of SRs and for which the following information is also available:
- The findings of the RoB assessment conducted by SR authors
- The effect estimate with its 95% CI or number of participants analyzed
This outcome will be selected upon completion of data extraction for Study 1, which will allow us to map the outcomes frequently investigated in the sample. The outcome must meet the aforementioned requirements because this information is needed to conduct the GRADE assessments for Study 2. Given that oral health SRs have been found to be most frequently downgraded in the study limitations and imprecision domains,9 the aforementioned criteria are the minimum that our review team will require to conduct GRADE assessments. We will focus on the single most frequently investigated outcome to make it feasible for our review team to conduct GRADE assessments.
Eligibility Criteria
Inclusion Criteria
SRs eligible for this study must meet all of the eligibility criteria outlined above for study 1, in addition to reporting on the outcome of interest.
Data Extraction
All data will be extracted independently and in duplicate using a piloted data extraction form. Reviewers will begin extraction upon completion of a calibration exercise. For all eligible SRs, we will extract the SR’s conclusion for the outcome of interest alongside additional data including whether the conclusions made by study authors relied on statistical significance, included recommendations, and considered whether there were any limitations. For SRs not using GRADE, we will extract the minimum information needed for our team to make a GRADE assessment. For SRs where a meta-analysis was conducted for the outcome of interest, this will include the results of the meta-analysis. In cases where the outcome of interest is summarized without a meta-analysis, we will extract the list of RCTs analyzed for the outcome, the number of participants analyzed overall, the SR’s narrative summary of the analysis, and any effect estimates provided for each of the individual RCTs. In the case where the outcome of interest was measured at multiple time points or investigated for multiple comparisons, we will only consider the results of the shortest time point and the first comparison listed in the results. We will also extract the results of the RoB assessment, identify the level of contextualization used to assess imprecision, and determine whether there was any evidence of publication bias or indirectness for the outcome of interest. The additional data extraction fields for this study can be found in Table 2.
Table 2: Additional Data Extraction Fields for the Outcome of Interest in Study 2
Section: Results
Data to be Extracted:
- Did the SR authors conduct GRADE assessments for the outcome of interest? (yes/no)
If the review authors did not use GRADE, extract the following:
- How do the authors define the outcome of interest and at what time point is it measured?
- What intervention and comparator are being investigated for the outcome of interest?
- How do the SR authors measure the outcome of interest?
- If a meta-analysis was completed, extract:
  - The type of effect measure used by the authors
  - The pooled effect estimate and 95% CI
  - Screenshot of the forest plot
    - If there is no forest plot, also extract the I² value and corresponding p-value
- If there was no meta-analysis, extract:
  - The included RCTs used to analyze the outcome of interest (extract first author name and reference number)
  - The number of participants analyzed for the outcome of interest
  - Verbatim quotation of the qualitative synthesis of the outcome of interest by the SR authors
  - Any effect estimates provided for each of the included RCTs investigating the outcome of interest
- Was a minimally or partially contextualized approach used to assess imprecision?
- Results of the RoB assessment
- Is there evidence of serious or very serious indirectness? (yes/no)
  - If yes, provide a rationale.
- Is there any reason to suspect publication bias? (yes/no)
  - If yes, provide a rationale.

Section: Conclusions
Data to be Extracted:
- Verbatim quotation of the SR’s conclusion statements pertaining to the outcome of interest from the SR’s conclusion section
  - If there is no conclusion section, this is to be extracted from the discussion section. If there is no conclusion outlined in the discussion, this is to be extracted from the abstract.
- Do the authors rely on a p-value to make their conclusions (e.g., p<0.05)? (yes/no)
- Do the authors make any recommendations in their conclusion for the outcome of interest? (yes/no)
- Do the authors consider if there are any limitations to their findings in their conclusion (by means of a GRADE assessment or otherwise)? (yes/no)
Data Analysis
We will use a study selection flow chart to present the retrieved articles and tables to summarize the characteristics of eligible studies.
To determine whether a lack of certainty of the evidence assessments is a predictor of inappropriately formulated conclusions in SRs (objective A), we will first evaluate the conclusions made by all the SRs for the outcome of interest, irrespective of whether they use GRADE. Two reviewers will independently assess these conclusions to determine whether they are appropriately formulated; conflicts will be resolved through discussion or by a third reviewer where needed. Once all conclusions have been classified as appropriately formulated or not, we will use the odds ratio and its 95% CI to evaluate whether SRs using GRADE are more likely to formulate appropriate conclusions compared to SRs not using GRADE.
A conclusion will be considered to be appropriately formulated if it meets all the following criteria:
- The conclusion does not rely on statistical significance.21
- The conclusion considers if there are any limitations.
- We will consider SR authors to have addressed limitations by stating whether or not the results are impacted by any number of factors (e.g., low quality of RCTs, heterogeneity, small sample size, publication bias, short follow-up time in RCTs) or referencing their GRADE certainty of the evidence rating.
- The conclusion does not make recommendations.22
To determine whether the use of GRADE changes the conclusions of dentistry SRs (objective B), reviewers will evaluate the conclusions of the subset of SRs in the study sample that do not use the GRADE approach and that report on the outcome of interest. First, we will classify authors’ conclusions in terms of their certainty as either definitive or recognizing uncertainty (i.e., addressing any limitations of the evidence by means of a GRADE assessment or through some other means). Examples of other ways to recognize uncertainty include stating that the results should be interpreted with caution due to high risk of bias, heterogeneity, small sample size, etc., stating that there were limitations in the evidence, stating that there is insufficient evidence to draw a conclusion, or stating that further high-quality studies are needed. We will also identify how the effect size is categorized in the SR conclusions according to the level of contextualization used by the authors of each SR (i.e., minimally contextualized or partially contextualized).23 A minimally contextualized approach will be defined as an approach that focuses on whether an important effect exists (i.e., negligible/trivial/no difference or important difference between interventions), while a partially contextualized approach defines the magnitude of the effect (i.e., negligible, small, moderate, or large).23 We will assume a minimally contextualized approach is used unless the authors explicitly state that they used a partially contextualized approach or this can be inferred from conclusions that refer to different magnitudes of effect. If the SR authors rely on statistical significance to classify the effect size, we will consider this to be a minimally contextualized approach. Conclusions will be classified by two reviewers until consensus is reached.
Second, after classifying the authors’ original conclusions, the review team will complete GRADE assessments for the outcome of interest in these SRs independently and in duplicate. We will assess imprecision using the same level of contextualization used by the authors of each SR (i.e., minimally contextualized or partially contextualized) as classified by our team using the criteria above. We will use the information provided by the SR authors to assess risk of bias and inconsistency. We will assume no concerns for indirectness and publication bias unless otherwise stated in the SR’s results or discussion sections. Using the results of our GRADE assessment, we will then formulate one conclusion per SR as shown in Table 3, using the same level of contextualization utilized by the SR authors.24
Table 3. Methods for Formulating Conclusions
| Level of Certainty | Conclusion Based on a Minimally Contextualized Approach | Conclusion Based on a Partially Contextualized Approach |
|---|---|---|
| High | Intervention X increases/decreases the outcome by an important/negligible amount when compared to intervention Y. | Intervention X increases/decreases the outcome by a negligible/small/moderate/large amount when compared to intervention Y. |
| Moderate | Intervention X probably increases/decreases the outcome by an important/negligible amount when compared to intervention Y. | Intervention X probably increases/decreases the outcome by a negligible/small/moderate/large amount when compared to intervention Y. |
| Low | Intervention X may increase/decrease the outcome by an important/negligible amount when compared to intervention Y. | Intervention X may increase/decrease the outcome by a negligible/small/moderate/large amount when compared to intervention Y. |
| Very Low | Intervention X may increase/decrease the outcome by an important/negligible amount when compared to intervention Y, but the evidence is very uncertain. | Intervention X may increase/decrease the outcome by a negligible/small/moderate/large amount when compared to intervention Y, but the evidence is very uncertain. |
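The statement templates in Table 3 can be expressed as a small function, which may help reviewers word their conclusions consistently. This is a sketch only: the function name, the argument names, and the labels "minimal" and "partial" for the two levels of contextualization are assumptions made for illustration.

```python
MAGNITUDES = {
    "minimal": ("negligible", "important"),
    "partial": ("negligible", "small", "moderate", "large"),
}
VERB_PHRASES = {
    "high": "{third_person}",
    "moderate": "probably {third_person}",
    "low": "may {base}",
    "very low": "may {base}",
}


def formulate_conclusion(certainty, direction, magnitude, approach,
                         intervention="Intervention X", comparator="intervention Y"):
    """Build a conclusion sentence following the Table 3 wording for a certainty level."""
    if direction not in ("increase", "decrease"):
        raise ValueError("direction must be 'increase' or 'decrease'")
    if magnitude not in MAGNITUDES[approach]:
        raise ValueError(f"{magnitude!r} is not a magnitude of the {approach} approach")
    verb = VERB_PHRASES[certainty].format(third_person=direction + "s", base=direction)
    article = "an" if magnitude == "important" else "a"
    sentence = (f"{intervention} {verb} the outcome by {article} {magnitude} amount "
                f"when compared to {comparator}")
    if certainty == "very low":
        sentence += ", but the evidence is very uncertain"
    return sentence + "."
```

For instance, moderate certainty with an important decrease under the minimally contextualized approach yields "Intervention X probably decreases the outcome by an important amount when compared to intervention Y."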
After this, our team’s conclusion will be compared to the authors’ conclusions with respect to the certainty and effect size. If the review authors’ original conclusion only states that there is insufficient evidence to comment on the outcome or does not comment on the effect size, we will only evaluate whether the conclusion changes with respect to certainty. We will calculate the percentage of conclusions which changed after a GRADE assessment was completed with respect to the classification of the certainty and/or effect size. We will also calculate the percentage of conclusions which have increased certainty, decreased certainty, increased effect size, or decreased effect size after a GRADE assessment is conducted. This will allow us to evaluate how the utilization of GRADE may impact authors’ conclusions by assessing whether conclusions change following GRADE assessments and by determining the direction of that change (i.e., do conclusions become more or less conservative following GRADE assessments?).
Table 4. Methods for Determining the Effect of the GRADE Approach on SR Conclusions
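|  | Classification of SR Authors’ Conclusion | Classification of Our Conclusion (after GRADE assessments) | Effect of our use of the GRADE Approach on the SR Conclusion |
|---|---|---|---|
| Certainty | Definitive | High | No change |
| Certainty | Definitive | Moderate, Low, or Very Low | The level of certainty of the SR conclusion decreased |
| Certainty | Recognizing Uncertainty | High | The level of certainty of the SR conclusion increased |
| Certainty | Recognizing Uncertainty | Moderate, Low, or Very Low | No change |
| Certainty | Recognizing Uncertainty* | High, Moderate, or Low | The level of certainty of the SR conclusion increased |
| Certainty | Recognizing Uncertainty* | Very Low | No change |
| Effect Size: Minimally Contextualized Approach | Negligible Effect | Negligible Effect | No change |
| Effect Size: Minimally Contextualized Approach | Negligible Effect | Important Effect | The magnitude of the effect in the conclusion of the SR increased |
| Effect Size: Minimally Contextualized Approach | Important Effect | Negligible Effect | The magnitude of the effect in the conclusion of the SR decreased |
| Effect Size: Minimally Contextualized Approach | Important Effect | Important Effect | No change |
| Effect Size: Partially Contextualized Approach | Negligible Effect | Negligible Effect | No change |
| Effect Size: Partially Contextualized Approach | Negligible Effect | Small, Moderate, or Large Effect | The magnitude of the effect in the conclusion of the SR increased |
| Effect Size: Partially Contextualized Approach | Small Effect | Negligible Effect | The magnitude of the effect in the conclusion of the SR decreased |
| Effect Size: Partially Contextualized Approach | Small Effect | Small Effect | No change |
| Effect Size: Partially Contextualized Approach | Small Effect | Moderate or Large Effect | The magnitude of the effect in the conclusion of the SR increased |
| Effect Size: Partially Contextualized Approach | Moderate Effect | Negligible or Small Effect | The magnitude of the effect in the conclusion of the SR decreased |
| Effect Size: Partially Contextualized Approach | Moderate Effect | Moderate Effect | No change |
| Effect Size: Partially Contextualized Approach | Moderate Effect | Large Effect | The magnitude of the effect in the conclusion of the SR increased |
| Effect Size: Partially Contextualized Approach | Large Effect | Negligible, Small, or Moderate Effect | The magnitude of the effect in the conclusion of the SR decreased |
| Effect Size: Partially Contextualized Approach | Large Effect | Large Effect | No change |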
*This refers to conclusions where it is only stated that there is a lack of sufficient evidence
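The decision rules in Table 4 can be summarized in a short sketch. The function names and category labels are illustrative assumptions; in particular, "insufficient evidence" is used here as a label for the asterisked category (conclusions that only state a lack of sufficient evidence).

```python
GRADE_LEVELS = ("high", "moderate", "low", "very low")
PARTIAL_ORDER = ("negligible", "small", "moderate", "large")  # partially contextualized
MINIMAL_ORDER = ("negligible", "important")                   # minimally contextualized


def certainty_change(author_class: str, our_grade: str) -> str:
    """Compare the authors' certainty classification with our GRADE rating."""
    if our_grade not in GRADE_LEVELS:
        raise ValueError(f"unknown GRADE level: {our_grade!r}")
    if author_class == "definitive":
        return "no change" if our_grade == "high" else "decreased"
    if author_class == "recognizing uncertainty":
        return "increased" if our_grade == "high" else "no change"
    if author_class == "insufficient evidence":
        return "no change" if our_grade == "very low" else "increased"
    raise ValueError(f"unknown classification: {author_class!r}")


def effect_size_change(author_effect: str, our_effect: str, order=PARTIAL_ORDER) -> str:
    """Compare effect magnitudes on an ordinal scale (larger index means larger effect)."""
    diff = order.index(our_effect) - order.index(author_effect)
    if diff == 0:
        return "no change"
    return "increased" if diff > 0 else "decreased"
```

Treating the magnitude categories as an ordered scale reproduces every effect size row of the table, so the rows do not need to be enumerated one by one; pass `MINIMAL_ORDER` as `order` for SRs that used a minimally contextualized approach.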
Ethics & Dissemination
Ethics committee approval and consent are not required for any component of this methodology project since only previously published data will be used.