Rating the quality of evidence and grading the strength of recommendations
Almost all guidance documents (93%) suggested a structured approach or system for rating the quality of the evidence, and most of them (85%) contained a specific section or chapter on it. GRADE was the most common approach suggested (53%), followed by the approaches proposed by the Australian National Health and Medical Research Council (NHMRC) and the Scottish Intercollegiate Guidelines Network (SIGN), reported in three documents each (4%). Of note, seven documents (10%) suggested rating systems that were based on GRADE or other approaches. Finally, nine documents (13%) proposed their own systems, mainly based on the previous approaches. See Additional file 2 for further details.
Most of the guidance documents (88%) proposed a system for grading the strength of recommendations; half of them suggested the GRADE approach. Around a fifth of the documents (22%) suggested their own approach, whereas 14% of the documents did not report any. Other approaches were SIGN (3%), Oxford Centre for Evidence-Based Medicine (1 document), NICE (1 document), and USPSTF (1 document). Five documents (8%) proposed other approaches, mostly adapted from GRADE (Table 1).
Table 1
Recommendation formulation information reported in the included guidance documents

| | N (%) |
|---|---|
| Contains specific section in the document | 55 (81%) |
| Details of people involved | Entire panel/GDG (8, 56%); Panel + other (5, 7%); Subgroup of the panel (3, 4%); Not specified (19, 28%) |
| Technical team shares materials (e.g., evidence summaries) with the guideline panel ahead of meeting | 27 (40%) |
| Technical team or someone else makes preliminary judgments on the different criteria (e.g., certainty of evidence) | 25 (37%) |
| Technical team or someone else makes preliminary judgment on the strength of recommendations (e.g., strong, conditional) | 15 (22%) |
| Technical team or someone else makes preliminary judgments about the direction of recommendations (e.g., in favor or against) | 7 (10%) |
| Approach to grading the strength of recommendations | 60 (88%) |
| Approach suggested for grading the strength of recommendations | GRADE (35, 51%); NHMRC (2, 3%); SIGN (2, 3%); CEBM (1, 1.5%); NICE (1, 1.5%); USPSTF (1, 1.5%); Adapted systems (5, 7%): GRADE + NICE (1, 1.5%), NICE + SIGN (2, 3%), GRADE + SIGN (1, 1.5%), GRADE + AHRQ + USPSTF (1, 1.5%); Other or not specified (21, 36%) |
| Use of a framework for the EtD process | 45 (66%) |
| Frameworks suggested for the EtD process | GRADE-EtD: 19 (42%); Other approaches: 26 (58%): Own approach 20 (76%), NICE 2 (8%), SIGN 1 (4%), USPSTF 1 (4%), SIGN + NICE 1 (4%), GRADE + SIGN + AHRQ 1 (4%) |
| Explicit method to reach agreement among panel members (e.g., consensus, nominal group techniques) | 56 (82%) |
| Documentation of judgements made | 27 (38%) |
AHRQ: The Agency for Healthcare Research and Quality (USA); CEBM: The Centre for Evidence-Based Medicine, based in the Nuffield Department of Primary Care Health Sciences at the University of Oxford; GDG: Guideline development group; GRADE: The Grading of Recommendations Assessment, Development and Evaluation approach; NHMRC: National Health and Medical Research Council (Australia); NICE: The National Institute for Health and Care Excellence (UK); SIGN: Scottish Intercollegiate Guidelines Network; USPSTF: The U.S. Preventive Services Task Force
Recommendation formulation
Two out of three guidance documents (66%) suggested a structured process for formulating recommendations, and more than half (56%) involved the entire panel. In contrast, 28% of the guidance documents did not report who was involved in formulating recommendations. In 40% of the guidance documents, the technical team was expected to share preliminary material (e.g., evidence summaries and evidence-to-recommendation tables) with the guideline panel ahead of GDG meetings, and 48% of the documents reported that the technical team should make preliminary judgments about the quality of the evidence. Fewer documents (37%) stated that preliminary judgments on the certainty of the evidence or any other factor (e.g., values and preferences, equity, resources required) should be made before the panel meetings. Fewer still suggested preliminary judgments on the strength (22%) or the direction (10%) of recommendations (see Table 1).
Use of EtD frameworks
The GRADE-EtD framework was the most frequently reported (42%), followed by NICE’s framework (8%), SIGN (4%), and USPSTF (4%). Twenty guidance documents (76%) reported their own multi-criteria framework. None of the guidance documents reported the use of multi-criteria decision analysis frameworks. More than half of the documents (57%) provided guidance for formulating recommendations when the evidence is insufficient, unavailable, or of low or very low quality. Most of the documents (82%) suggested a method to reach agreement among guideline panel members (e.g., consensus, voting (majority rule), or nominal group techniques) (see Table 1).
Setting, perspective, and subgroup considerations
We found that 42% and 24% of the guidance documents that included an EtD framework specified the setting and perspective, respectively. These rates were considerably lower among documents that did not suggest any framework (17% setting and 9% perspective). Guidance documents suggesting the GRADE-EtD framework most often reported the inclusion of both setting and perspective (42%), followed by those that suggested another EtD framework (42% and 11%, respectively). The inclusion of subgroup considerations showed a similar pattern among the guidance documents (i.e., 31% with any EtD framework and 4% with no framework).
Recommendation-related criteria
A total of 14 recommendation-related criteria were identified across the guidance documents. Overall, guidance documents that suggested an EtD framework considered a more comprehensive set of criteria than those that did not suggest any framework. Magnitude of desirable effects, magnitude of undesirable effects, and certainty of the evidence were the most frequently reported criteria among documents that suggested any framework for guiding the EtD process, each reported in more than 80% of them. In contrast, each of these three criteria was reported in 52% of the documents that suggested no framework. Other criteria such as equity, acceptability, and feasibility were reported in less than half of the documents that guided the EtD process through a systematic framework (any EtD framework), and considerably lower rates of use were observed among the documents that did not suggest a framework to guide the EtD process (i.e., from 0 to 22%). Of note, one document each reported on the consideration of legal consequences (Domus Medica) and on bioethical considerations (Italian National Center for Clinical Excellence, Quality and Safety of Care, CNEC).
Among the documents that suggested a framework for guiding the EtD process, those that suggested the GRADE-EtD framework addressed a larger number of criteria than those that suggested another framework for the EtD process (see Table 3). Rates of use differed between the two categories. For instance, 95% of the guidance documents that suggested the GRADE-EtD framework reported the consideration of patients’ values, compared with 46% of those that used another EtD framework. A marked difference was also observed for equity considerations (42% with the GRADE-EtD framework vs 11% with another EtD framework). Table 2 presents all the recommendation-related criteria reported in the guidance documents. Results from the bivariate analysis are presented in the section below.
Table 2
Recommendation-related criteria in the EtD process

| Criteria | All guidance documents, 68 (100%) | Any framework, 45/68 (66%) | No framework, 23/68 (34%) | GRADE-EtD¹, 19/45 (42%) | Other framework, 26/45 (58%) |
|---|---|---|---|---|---|
| Problem priority | 40 (59%) | 30 (67%) | 10 (43%) | 12 (63%) | 18 (69%) |
| Desirable effects | 49 (72%) | 37 (82%) | 12 (52%) | 17 (89%) | 20 (77%) |
| Undesirable effects | 50 (73%) | 38 (84%) | 12 (52%) | 17 (89%) | 21 (81%) |
| Certainty of the evidence of effects | 50 (73%) | 38 (84%) | 12 (52%) | 18 (95%) | 20 (77%) |
| Values (outcome importance) | 35 (51%) | 30 (67%) | 5 (22%) | 18 (95%) | 12 (46%) |
| Balance of effects | 39 (57%) | 32 (71%) | 7 (30%) | 17 (89%) | 15 (58%) |
| Resources required | 37 (54%) | 31 (69%) | 6 (26%) | 18 (95%) | 13 (50%) |
| Certainty of evidence of required resources | 17 (25%) | 16 (36%) | 1 (4%) | 9 (47%) | 7 (27%) |
| Cost-effectiveness | 36 (53%) | 30 (67%) | 6 (26%) | 16 (84%) | 14 (54%) |
| Equity | 11 (16%) | 11 (24%) | 0 | 8 (42%) | 3 (11%) |
| Acceptability | 19 (28%) | 17 (38%) | 2 (9%) | 9 (47%) | 8 (31%) |
| Feasibility | 23 (34%) | 18 (40%) | 5 (22%) | 9 (47%) | 9 (35%) |

¹ Guidance documents that suggested the GRADE-EtD frameworks for the process of formulating recommendations.
GRADE: The Grading of Recommendations Assessment, Development and Evaluation approach
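The percentages in Table 2 follow from dividing each count by the denominator given in its column header. The short Python sketch below reproduces that arithmetic for the "Values (outcome importance)" row and checks that the sub-columns add up to their parent column. The function and variable names are illustrative assumptions and are not part of the original analysis.

```python
# Column denominators as stated in the header of Table 2.
DENOMINATORS = {
    "all": 68,        # all guidance documents
    "any": 45,        # documents suggesting any EtD framework
    "none": 23,       # documents suggesting no framework
    "grade_etd": 19,  # documents suggesting the GRADE-EtD framework
    "other": 26,      # documents suggesting another framework
}


def percent(count, column):
    """Return count / column denominator as a whole-number percentage."""
    return round(100 * count / DENOMINATORS[column])


def row_is_consistent(row):
    """Check that the sub-column counts sum to their parent column."""
    return (
        row["any"] + row["none"] == row["all"]
        and row["grade_etd"] + row["other"] == row["any"]
    )


# Counts copied from the "Values (outcome importance)" row of Table 2.
values_row = {"all": 35, "any": 30, "none": 5, "grade_etd": 18, "other": 12}

for column, count in values_row.items():
    print(f"{column}: {count} ({percent(count, column)}%)")
print("row consistent:", row_is_consistent(values_row))
```

The same two functions can be applied to any other row of the table by swapping in its counts.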
Table 3

| Criteria | All guidance documents, 68 (100%) | Any framework, 45/68 (66%) | No framework, 23/68 (34%) | GRADE-EtD¹, 19/45 (42%) | Other framework, 26/45 (58%) |
|---|---|---|---|---|---|
| Summary of the judgments made about the different criteria considered | 28 (41%) | 22 (49%) | 6 (26%) | 9 (47%) | 13 (50%) |
| Justification of the recommendation | 28 (41%) | 24 (53%) | 4 (17%) | 11 (58%) | 13 (50%) |
| Subgroup considerations | 13 (19%) | 12 (27%) | 1 (4%) | 7 (37%) | 5 (19%) |
| Implementation considerations | 38 (56%) | 31 (69%) | 7 (30%) | 14 (74%) | 17 (65%) |
| Monitoring and evaluation considerations | 30 (44%) | 25 (56%) | 5 (22%) | 14 (74%) | 11 (42%) |
| Research priorities | 19 (28%) | 18 (40%) | 1 (4%) | 9 (47%) | 9 (35%) |

¹ Guidance documents that suggested the GRADE-EtD frameworks for the process of formulating recommendations.
GRADE: The Grading of Recommendations Assessment, Development and Evaluation approach
Although the wording of some recommendation-related criteria may vary across the documents, which might be explained by organizational preferences, they refer to the same set of criteria described in Table 2. Variation in the wording of criteria is expected for organizations that adapt a framework and process for arriving at recommendations. Thus, some organizations may have addressed the magnitude of the problem before assessing the evidence pertaining to the EtD process, and therefore might not report this as an independent criterion in their EtD framework.
To illustrate this better, the GRADE-EtD framework presents a set of additional considerations, so-called detailed judgments, that assist the panel when considering the evidence that underlies the main criteria (4). For example, legal consequences, reported by Domus Medica as a specific criterion, would be covered by the GRADE-EtD framework as one of the detailed judgements included under feasibility (i.e., are there important legal or bureaucratic constraints that make it difficult or impossible to cover the intervention?). The same principle applies to ethical considerations, which are suggested by the Italian National Center for Clinical Excellence, Quality and Safety of Care (CNEC) and are addressed as a detailed judgement under acceptability by the GRADE-EtD framework (i.e., are there key stakeholders who would disapprove of the intervention morally, for reasons other than its effects on people’s autonomy (such as regarding ethical principles such as non-maleficence, beneficence, or justice)?) (4).
Drawing conclusions as part of the EtD process
Overall, 41% of the guidance documents reported a process to summarize the judgments made about the different recommendation-related criteria. This step was more common in the documents that suggested the use of a framework (49%) than in documents that did not (26%). We did not observe major differences in this step between those suggesting the use of the GRADE-EtD framework and those suggesting another framework (47% vs 50%, respectively). Similarly, the justification of the recommendation’s strength and direction was more common in documents that used a framework than in those that did not (53% vs 17%, respectively). This trend was also observed for other steps, such as considerations for relevant subgroups, implementation, monitoring, and evaluation, as well as the formulation of priorities for further research (see Table 3).
Bivariate analysis
As stated in the methods section, we express the results of the bivariate analysis as the probabilities of addressing each recommendation-related criterion in the different categories of the EtD process (i.e., any framework vs no framework; GRADE-EtD vs other EtD framework; GRADE-EtD vs no framework; other EtD framework vs no framework). The following is a summary of the main findings. We refer the reader to Additional file 4 for a full description of the data.
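For readers less familiar with the statistic, the following Python sketch shows one common way to obtain a crude odds ratio and a Wald-type 95% confidence interval from a 2×2 table. It is an illustration under stated assumptions only: the estimation method actually used in this study is described in the methods section and Additional file 4, so this calculation is not expected to reproduce the reported estimates. The function name, the continuity correction for zero cells, and the example counts are all hypothetical.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval for a 2x2 table.

    a: documents with a framework that address the criterion
    b: documents with a framework that do not address it
    c: documents without a framework that address the criterion
    d: documents without a framework that do not address it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) == 0:
        # Assumption: add 0.5 to every cell (Haldane-Anscombe correction)
        # so that the ratio and its standard error remain defined.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to illustrate the calculation.
or_, lower, upper = odds_ratio_wald_ci(20, 10, 10, 20)
print(f"OR {or_:.1f}; 95% CI {lower:.1f} to {upper:.1f}")
```

An interval whose lower bound lies below 1, as for several of the comparisons reported in this section, indicates that the data are also compatible with no association.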
The use of an EtD framework for guiding the recommendation formulation, compared to no framework, was associated with higher odds of incorporating both perspectives (OR 2.8; 95% CI 0.6 to 13.8) and subgroup considerations (OR 7.2; 95% CI 0.9 to 57.9). Similarly, the documents that incorporated the GRADE-EtD framework were more likely to incorporate these criteria in the recommendation formulation process than those that suggested another EtD framework or no framework, with ORs ranging from 1.4 to 8.4.
The probability of using each of the recommendation-related criteria identified in this study was higher in documents that suggested the use of any EtD framework relative to no framework, as well as in the documents that suggested the GRADE-EtD framework compared to those that suggested another framework or no framework (see Figure 3).
For instance, guidance documents that suggested the use of any EtD framework were more likely to consider patients’ values in the recommendation formulation process compared to those that did not follow any framework (OR 3.1; 95% CI 1 to 8.9). The odds of including patients’ values were twofold greater in documents that suggested the GRADE-EtD framework relative to those that suggested another framework (OR 2; 95% CI 0.8 to 5.4). The odds ratio increased to 4.4 when the GRADE-EtD framework was compared to no framework (OR 4.4; 95% CI 1.4 to 13.9) (see Figure 3).
The guidance documents that suggested any EtD framework were more likely to present evidence on the balance between desirable and undesirable effects when formulating recommendations than documents that did not suggest any framework (OR 2.3; 95% CI 0.9 to 6.1). The odds were larger for the comparison of the documents that suggested the GRADE-EtD framework with those that used no framework (OR 2.9; 95% CI 1.1 to 8.6). Similar odds were observed for the criterion related to resources required.
The Odds Ratios (ORs) are based on the comparison of each category of EtD framework relative to the use of no framework. The vertical lines illustrate the 95% confidence intervals (CIs).
Suggesting the use of an EtD framework was associated with higher odds of including cost-effectiveness considerations when formulating recommendations relative to the use of no framework (OR 2.5; 95% CI 0.9 to 7.1). As with other criteria, the documents that suggested the use of the GRADE-EtD framework were more likely to incorporate cost-effectiveness considerations in the recommendation formulation process than those that suggested another framework (OR 1.6; 95% CI 0.6 to 3.9) or no framework (OR 3.2; 95% CI 1.1 to 9.8) (Figure 3).
Of note, the odds of including recommendation-related criteria such as equity, acceptability, and feasibility followed the same pattern. That is, the use of any EtD framework and of the GRADE-EtD framework was associated with higher odds of including those criteria when formulating recommendations relative to the use of no framework or another EtD framework.
Finally, and in line with the associations observed for the recommendation-related criteria, the documents that suggested either the use of any framework or the GRADE-EtD framework were more likely to provide a justification of the judgements made, implementation considerations, and monitoring and evaluation considerations than the documents that suggested no framework or another framework. Additional file 4 presents further details.