This article reports the results of the quality appraisal, using the AGREE II, of the most recent CPGs for LBP interventions published from January 2016 to January 2020, retrieved through a systematic search of electronic databases and guideline websites.
A key finding of this research was the variability in quality of LBP CPGs across all six AGREE II domains: Domain 6 - Editorial Independence and Domain 1 - Scope and Purpose obtained the highest average scores (> 60%), whereas Domain 5 - Applicability obtained the lowest (< 15%).
The overall quality of the LBP CPGs was judged to be low, and the most frequent overall judgement was not to recommend the guideline ("No"; n = 15 out of 21).
Domain 1 - Scope and Purpose addresses the overall aim of the guideline, the clinical question, and the target population. The high compliance of our sample with this domain (63.9%) may be mainly due to its focus on LBP, which is the most prevalent musculoskeletal condition for which guidelines are needed, considering the years lived with disability it causes in most countries.(30)
Domain 2 - Stakeholder Involvement addresses the degree to which the guideline represents the views of its intended users. Fewer than one third of the CPGs met the AGREE II requirements, mostly because they lacked the participation of patients and their advocates.
Domain 3 - Rigour of Development revealed extreme variability across CPGs, with scores ranging from 0 to 90.1%. Only half of the CPGs showed an acceptable rigour of development. A low score in this domain is worrying, as Domain 3 has been identified as the strongest predictor of quality in the AGREE instrument.(5) Indeed, a regression analysis showed a statistically significant and strong influence of Domain 3 on overall guideline quality.(31) Among the items addressed by Domain 3, the one regarding the systematic search may be considered of great importance (i.e., "Item 7: Systematic methods were used to search for evidence"), because CPGs have the duty to present the most updated evidence. We found that fewer than half of the CPGs did not report the time coverage of the systematic search; when reported, it ranged from 1 to 4 years before CPG publication. Regarding another important item in Domain 3 (i.e., "Item 12: There is an explicit link between the recommendations and the supporting evidence"), we found that two thirds of the CPGs in our sample adequately planned and judged the body of evidence (e.g., with GRADE). However, the application of a system for grading the evidence (e.g., GRADE) does not always guarantee the inclusion of the most updated evidence within an acceptable time span; thus, reliability should be evaluated cautiously.
The validity of each recommendation, and consequently of the CPG, is ultimately determined by the methodological quality and transparency of its development and by the "living evidence" on which it is based. As suggested by Garcia et al., waiting more than 3 years to review a guideline is potentially too long; in that case, recommendations could be outdated even at the time of guideline publication.(12) This critical issue has been addressed by the concept of living clinical practice guidelines,(32) which draws inspiration from the already established model of living systematic reviews, in which the evidence is continuously updated and incorporated as soon as it becomes available in the literature, through a process of continuous surveillance.(33)
In this view, AGREE II should place greater emphasis on timing, rating a CPG as high quality only if the search was conducted within 2 years of completion of the review.(34)
Domain 4 - Clarity of Presentation reflects the adequacy of the reporting of recommendations and of the different management options; it was satisfactory in only half of our CPGs. This can be related to the purpose of AGREE II: the current version makes no distinction between quality of reporting and quality of conduct of a CPG. Despite good reporting, the methodological conduct underlying a guideline can be weak.(35) Quality of conduct and quality of reporting should be judged separately, as for all other study designs.(36, 37) For instance, in systematic reviews, PRISMA and AMSTAR assess quality of reporting and quality of conduct, respectively.(38)
Domain 5 - Applicability was the poorest scoring domain, with results similar to those reported for other conditions.(5, 8, 39–41) Development and implementation of guidelines are erroneously considered separate activities.(5)
Domain 6 - Editorial Independence showed high compliance in most of the CPGs. Considering the high global socioeconomic burden of low back pain and the related need for care, CPGs must report the presence and management of conflicts of interest.
Our appraisal has several strengths. We performed an exhaustive search with explicit eligibility criteria and independent duplicate assessment of eligibility. We involved four reviewers for the appraisal, with nearly perfect inter-rater reliability. While all appraisers were trained in the use of AGREE II, it should be acknowledged that they shared a similar background (methodology and rehabilitation), which may partially explain the high overall agreement. Indeed, our team included clinical experts and methodologists with wide experience in clinical epidemiology, including systematic reviews and CPGs. Even after the same training, guideline appraisers from different disciplines may interpret the items and scoring system differently.(42) Furthermore, it is possible that the appraisers, basing their assessment on their own experience, paid more attention to assessing the quality of reporting than the quality of conduct, or vice versa. We analysed a reliable subset of CPGs restricted to LBP in order to ensure consistency of appraisal while avoiding discrepancies in item judgements due to different clinical contexts (e.g., AGREE II may assess CPGs in oncology differently from those in orthopaedics). We focused on the most recent versions of the guidelines in order to offer stakeholders, policy makers, clinicians, and patients the latest evidence on the effectiveness of interventions. However, the selection of CPGs was challenging, since the definition of a guideline is not universally established. In the literature, there is confusion between what constitutes a consensus statement and what constitutes an evidence-based CPG. The rigour of methods and the panel of experts have to be considered simultaneously in a CPG, but the current definition does not make these elements explicit.
Future directions for research
At the time of its publication, a CPG may already be outdated, not reflecting the most recent evidence. Indeed, time can influence reliability at two points: (a) during the conduct of the systematic reviews producing the body of evidence needed for CPG development; and (b) between the finalization of a CPG and its publication. To avoid wasted effort resulting in duplicate CPGs, or in newly published CPGs that are unreliable because already outdated, we call for a universal database where all guidelines can be registered and updated. Such a database could parallel the existing registers of RCTs (e.g., the WHO registry or ClinicalTrials.gov) and of systematic reviews (e.g., PROSPERO), but for CPGs. In this way, a "living and dynamic" development of recommendations could be better recognized by identifying the most recent literature.(43)