Our document analysis of a sample of 15 CPGs on CRP, colonoscopy, and FeNO tests revealed that none of these CPGs reported evidence on all components of a test-treatment pathway. Consideration of the consequences of testing for patient-relevant outcomes was barely described. A systematic review of the literature, including a judgement of the certainty in the supporting evidence, was reported for only a few recommendations and mainly covered diagnostic accuracy.
The importance of systematically evaluating test consequences when developing CPGs has been recognised,(4, 11, 12) but this study suggests that implementation is lagging behind. This also applies to CPGs that claim to use the GRADE approach. There appears to be a gap between endorsing a methodologically robust approach and applying it when developing CPGs in practice.
Two issues may explain this gap. First, guideline developers may have considered the downstream consequences of a test without explicitly reporting them. It may not be strictly necessary to systematically evaluate all evidence components, but we still recommend transparent documentation of the choices made during guideline development. A guideline user should be able to read which elements of a test-treatment pathway were considered and how, and which were not considered and why.
Second, performing systematic literature reviews of the complete test-treatment pathway, including assessment of the certainty in the evidence on test accuracy and downstream consequences, is complex and time-consuming. The use of the GRADE approach for the evaluation of medical tests and test strategies is considered challenging.(8, 9) Strategies to facilitate its use, such as training of CPG panel members, may improve its application. Unfortunately, we could not determine which factors contribute to successful use of the GRADE approach, because we could not identify a ‘best practice’.
A lack of transparency combined with claimed use of state-of-the-art methods was also described by Arevalo-Rodriguez and colleagues, who studied the methods and reporting of 191 rapid reviews of medical tests.(30) The majority of those reviews did not report their study selection methods. Although almost 20% of the reviews claimed to have applied the GRADE approach, few actually reported their data extraction and quality appraisal methods.
This finding is consistent with a recent report on the application of GRADE in U.S. guidelines.(31) Although guideline developers indicated that they used the GRADE approach, only 10% of the included CPGs reported on all 8 criteria for assessing the certainty in the evidence (e.g. indirectness and dose-response gradient), and around half of these included an evidence profile or summary of findings table.
In a qualitative study among European CPG developers, Gopalakrishna et al. examined barriers to developing recommendations about medical tests.(32) They reported challenges in, for example, defining key questions, selecting the types of evidence and outcomes to include in the CPG, and synthesising and appraising the evidence. Awareness and education were reported as the most important ways to address these challenges.
Our study emphasises the need for more knowledge and expertise among CPG developers in evaluating medical tests. Currently available competency-based frameworks for CPG developers do not include a specific focus on medical test evaluation.(33, 34) The same applies to current training programmes for CPG panel members, e.g. INGUIDE.(35) Facilitating the implementation of GRADE for diagnosis by defining competencies and training needs may improve CPGs.
Strengths and limitations
This study evaluated the supporting evidence for recommendations in CPGs on 3 medical tests. The selection of only 3 topics is a limitation of this study. However, we chose 3 tests with divergent characteristics (e.g. invasiveness, possible burden of the test, disease of interest, costs), allowing comparison across many CPGs. The homogeneous results in all 3 clusters of CPGs strengthen the external validity of our findings. Additionally, we found large variation in the methodological quality of the included CPGs. However, CPGs scoring high on the AGREE methodology domain did not show a better or more transparent underpinning of their recommendations than lower-scoring CPGs.
Due to the document analysis design, we could not retrieve information about the dynamics within the CPG panels that might explain their decisions and the reasons for the lack of transparency in the CPG documents. We did not contact the CPG developers, since in our opinion CPG users should be able to find the panel's considerations behind the recommendations in the published CPG documents.
Implications for practice
We suggest that developers of CPGs covering diagnostic topics clearly describe which elements of a test-treatment pathway were or were not considered, and why. In addition, CPG developers should indicate whether systematic reviews of the evidence, including a determination of the certainty in that evidence, were performed, as is already usual for recommendations about therapy. Facilitating the implementation of GRADE for diagnosis would help improve the clinical content of CPGs.
Implications for research
This study highlighted the lack of (transparency about) supporting evidence for medical test recommendations in CPGs. A next step could be to study why CPG developers do not report all elements of the test-treatment pathway, including a review of the evidence and its quality. Furthermore, it would be worthwhile to investigate how to support CPG developers in explicitly and reliably considering all relevant steps of a test-treatment pathway when developing medical test recommendations.