The reporting of this systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (24, 25). No ethics committee approval was required. The protocol was registered in PROSPERO (CRD42019127619).
Inclusion and exclusion criteria
In line with the World Health Organization, we defined a CPG as a document containing “systematically developed evidence-based statements that assist providers, patients, policy makers and other stakeholders to make informed decisions on health care and public health policy”. (26)
Inclusion criteria were: (i) the recommendations were evaluated through a systematic process; (ii) the CPG focused on rehabilitation, pharmacological, or surgical therapeutic interventions for LBP management; (iii) the full text was published between 2016 and 2020. We used the most up-to-date version of each CPG and its supplementary documents. No language restrictions were applied. Exclusion criteria were: (i) not primarily focused on LBP, such as national/international guidelines in which LBP was only briefly mentioned within a broader disease evaluation; (ii) not issued by a national or international society (e.g., designed for local use); (iii) recommendations based exclusively on consensus statements, systematic reviews, or commentary editorials on published CPGs; (iv) focus on non-therapeutic interventions (e.g., prevention, diagnosis); (v) focus on population subgroups (e.g., pregnant women), specific causes (e.g., spondyloarthritis), or mixed/generic populations (e.g., chronic musculoskeletal pain).
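To make the screening logic explicit, the criteria above can be expressed as a checklist that returns every failed criterion, which also yields the reasons for exclusion. This is a minimal illustrative sketch, not a tool used in the review: the record type `CandidateCPG`, its field names, and the helper `exclusion_reasons` are hypothetical, and each boolean stands for a reviewer's judgement rather than something computed automatically.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical screening record: field names are illustrative assumptions,
# not the review's actual extraction form. Each flag is a reviewer judgement.
@dataclass
class CandidateCPG:
    title: str
    publication_year: int
    systematic_evaluation: bool               # inclusion (i)
    therapeutic_lbp_focus: bool               # inclusion (ii) / exclusion (iv)
    primarily_about_lbp: bool                 # exclusion (i)
    national_or_international: bool           # exclusion (ii)
    consensus_review_or_commentary: bool      # exclusion (iii)
    subgroup_cause_or_mixed_population: bool  # exclusion (v)


YEAR_FROM, YEAR_TO = 2016, 2020  # publication window stated in the criteria


def exclusion_reasons(cpg: CandidateCPG) -> List[str]:
    """Return every failed criterion; an empty list means the CPG is eligible."""
    reasons = []
    if not cpg.systematic_evaluation:
        reasons.append("recommendations not evaluated systematically")
    if not cpg.therapeutic_lbp_focus:
        reasons.append("not focused on therapeutic interventions for LBP")
    if not YEAR_FROM <= cpg.publication_year <= YEAR_TO:
        reasons.append(f"published outside {YEAR_FROM}-{YEAR_TO}")
    if not cpg.primarily_about_lbp:
        reasons.append("LBP only briefly mentioned")
    if not cpg.national_or_international:
        reasons.append("not issued by a national or international body")
    if cpg.consensus_review_or_commentary:
        reasons.append("consensus statement, systematic review or commentary only")
    if cpg.subgroup_cause_or_mixed_population:
        reasons.append("population subgroup, specific cause or mixed population")
    # No language criterion: the review applied no language restrictions.
    return reasons
```

Returning a list of reasons rather than a single boolean mirrors the reporting of reasons for exclusion.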
Information sources and search strategy
We systematically searched the PubMed, Embase, PEDro, and TRIP databases using terms and keywords adapted from a preliminary scoping search. We also checked guideline organisation databases (e.g., National Institute for Health and Care Excellence) and guideline websites (e.g., eGuidelines). Supplementary Digital Content 1 illustrates the search strategy. Two reviewers (SG, GC) with a solid background in clinical epidemiology ran the search in March 2019 and updated it in January 2020. Grey literature was searched using Google Scholar, and reference lists were screened for further eligible CPGs.
Selection of clinical practice guidelines
Search results were uploaded to EndNote software, and duplicates were removed (27, 28). Two independent reviewers (SG, VI) screened titles and abstracts against the eligibility criteria. Full texts were retrieved when abstracts gave insufficient information or when the two reviewers disagreed. When disagreement persisted, a third reviewer (GC) was consulted. Rayyan software (https://rayyan.qcri.org/) was used to manage screening and selection (29). Reasons for exclusion are reported.
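The screening flow described above can be sketched as two decision functions. This is a hypothetical illustration, not how Rayyan implements screening: the `Vote` categories and the function names are assumptions, and the third reviewer is modelled as a callback so that they are consulted only when the two reviewers still disagree at the full-text stage.

```python
from enum import Enum
from typing import Callable


class Vote(Enum):
    """Title/abstract decision by one reviewer (hypothetical categories)."""
    INCLUDE = "include"
    EXCLUDE = "exclude"
    UNCLEAR = "unclear"  # abstract gives insufficient information


def title_abstract_outcome(a: Vote, b: Vote) -> str:
    """Return 'exclude' or 'retrieve_full_text' for one record."""
    if a == b == Vote.EXCLUDE:
        # Both reviewers agree the record is ineligible.
        return "exclude"
    # Agreement to include, an 'unclear' vote, or a disagreement all lead
    # to retrieval of the full text.
    return "retrieve_full_text"


def full_text_decision(a: bool, b: bool,
                       third_reviewer: Callable[[], bool]) -> bool:
    """Final inclusion decision; the third reviewer is asked only on disagreement."""
    if a == b:
        return a
    return third_reviewer()
```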
Appraisal of clinical practice guidelines
Four independent researchers (MB, GC, SG, VI) appraised each CPG using the AGREE II instrument and recorded the time taken for each assessment with a stopwatch. All appraisers were trained in the use of AGREE II: they completed the AGREE II Online Training Tool (http://www.agreetrust.org/resource-centre/agree-ii-training-tools/) and took part in two calibration rounds using a sample of four relevant CPGs of varying quality drawn from a previous overview of clinical guidelines for chronic LBP restricted to 2012 (30). The original AGREE tool, published in 2003, has since been revised as AGREE II. The AGREE II instrument (22) consists of 23 items organized into six quality domains: scope and purpose, stakeholder involvement, rigour of development, clarity of presentation, applicability, and editorial independence. Supplementary Digital Content 2 shows the items and domains of the AGREE II instrument (31). Each item is rated on a 7-point scale from 1 (strongly disagree) to 7 (strongly agree). A standardized score (range, 0 to 100%) was calculated for each domain.
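For reference, the AGREE II item-to-domain mapping (23 items across six domains, each rated from 1 to 7) can be encoded as a lookup table with a completeness check for one appraiser's ratings. The names `AGREE_II_DOMAINS` and `validate_ratings` are hypothetical helpers introduced for illustration; the item ranges follow the published AGREE II instrument.

```python
from typing import Dict

# AGREE II: 23 items grouped into six domains (item numbers as in the instrument).
AGREE_II_DOMAINS = {
    "Scope and purpose": range(1, 4),          # items 1-3
    "Stakeholder involvement": range(4, 7),    # items 4-6
    "Rigour of development": range(7, 15),     # items 7-14
    "Clarity of presentation": range(15, 18),  # items 15-17
    "Applicability": range(18, 22),            # items 18-21
    "Editorial independence": range(22, 24),   # items 22-23
}
SCALE_MIN, SCALE_MAX = 1, 7  # 1 = strongly disagree, 7 = strongly agree


def validate_ratings(ratings: Dict[int, int]) -> None:
    """Raise ValueError unless one appraiser rated all 23 items on the 1-7 scale."""
    expected = {item for items in AGREE_II_DOMAINS.values() for item in items}
    missing = expected - set(ratings)
    extra = set(ratings) - expected
    if missing or extra:
        raise ValueError(
            f"missing items {sorted(missing)}, unexpected items {sorted(extra)}")
    for item, score in ratings.items():
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(
                f"item {item}: score {score} outside {SCALE_MIN}-{SCALE_MAX}")
```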
The appraisers completed the first global rating item on a 7-point scale (1 = lowest possible quality, 7 = highest possible quality) and the second global rating item, on whether they would recommend the guideline for use in practice, by choosing one of three options (Yes; Yes, with modifications; No). One author (VI) calculated the standardized score for each of the six domains as recommended by AGREE II (22, 32). We collected the following general data from each CPG: (i) authors and year of publication; (ii) CPG status (ex novo, update, or adoption/adolopment); (iii) continent of origin; (iv) issuing organization/society/association, funding source, and conflicts of interest. We also extracted content information such as the target population, target interventions (i.e., surgery, physical therapy, pharmacological therapy, educational/behavioural, alternative medicine), the method used to rate the quality of evidence (e.g., the Grading of Recommendations Assessment, Development and Evaluation, GRADE), the presence of a multidisciplinary panel (as defined by AGREE II: potential panel members include clinicians, content experts, researchers, policy makers, clinical administrators, and funders, with at least one methodology expert), and patient involvement (as defined by AGREE II: capturing patient/public views and preferences) (Supplementary Digital Content 2).
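A structured record can make the extraction fields listed above explicit. This is a hypothetical sketch: the class and field names are assumptions, and `is_multidisciplinary` is one possible operationalisation of the AGREE II panel definition (at least two distinct roles, one of them a methodology expert), not the rule applied by the authors.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class CPGStatus(Enum):
    EX_NOVO = "ex novo"
    UPDATE = "update"
    ADOPTION_OR_ADOLOPMENT = "adoption/adolopment"


@dataclass
class ExtractionRecord:
    """One row of a hypothetical data-extraction sheet for a single CPG."""
    authors: str
    year: int
    status: CPGStatus
    continent: str
    organization: str
    funding_source: Optional[str] = None
    conflict_of_interest: Optional[str] = None
    target_population: str = ""
    # e.g. "surgery", "physical therapy", "pharmacological therapy"
    target_interventions: List[str] = field(default_factory=list)
    evidence_rating_method: Optional[str] = None  # e.g. "GRADE"
    panel_roles: Set[str] = field(default_factory=set)
    patient_involvement: bool = False


def is_multidisciplinary(roles: Set[str]) -> bool:
    """Assumed rule: at least two distinct roles, one of them a methodologist."""
    return len(roles) >= 2 and "methodologist" in roles
```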
Data synthesis
We used descriptive statistics to summarize the characteristics of the included CPGs. Data are summarized as frequencies (percentages) or medians and interquartile ranges (IQR). We calculated a quality score for each of the six domains of each CPG using the formula presented in the AGREE II User's Manual (32). Domain scores are calculated by summing the appraisers' scores for the individual items in a domain and scaling the total against the minimum and maximum possible scores for that domain, i.e., (obtained score - minimum possible score) / (maximum possible score - minimum possible score) x 100%; these scores were generated automatically on the My AGREE PLUS platform (33). The appraisers added notes and completed the two global rating items at the end of each AGREE II assessment. The first global rating item asks appraisers to rate the overall quality of the guideline on a 7-point scale (1 = lowest possible quality, 7 = highest possible quality).
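The AGREE II scaled domain score can be written as a short function. This sketch assumes a complete appraiser-by-item matrix for a single domain of one guideline; the function name `scaled_domain_score` and the example ratings are hypothetical, and in the review these scores were produced by My AGREE PLUS rather than by custom code.

```python
from typing import Sequence


def scaled_domain_score(ratings: Sequence[Sequence[int]],
                        scale_min: int = 1, scale_max: int = 7) -> float:
    """AGREE II scaled domain score (0-100) for one domain of one guideline.

    ratings[a][i] is the score appraiser a gave to item i of the domain.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0]) if ratings else 0
    if n_appraisers == 0 or n_items == 0 or any(len(r) != n_items for r in ratings):
        raise ValueError("need a complete, non-empty appraiser x item matrix")
    obtained = sum(sum(r) for r in ratings)
    minimum = scale_min * n_items * n_appraisers  # every appraiser scores 1
    maximum = scale_max * n_items * n_appraisers  # every appraiser scores 7
    return 100 * (obtained - minimum) / (maximum - minimum)
```

For example, with four appraisers and a three-item domain the minimum and maximum possible totals are 12 and 84, so ratings of `[[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]` (total 53) give (53 - 12) / (84 - 12), or about 56.9%.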
The second global rating item asks whether the appraiser would recommend the guideline for use in practice, with one of three responses (Yes; Yes, with modifications; No).
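The descriptive summaries mentioned in the data synthesis (median and IQR for the 7-point first global rating; frequencies and percentages for the three-option second global rating) can be sketched with the Python standard library. The function names are hypothetical, and the quartile definition (`statistics.quantiles` with `method="inclusive"`) is an assumption that may differ from Stata's default percentile definition, so IQR limits can differ slightly.

```python
from collections import Counter
from statistics import quantiles
from typing import Dict, Iterable, Sequence, Tuple

RECOMMENDATION_OPTIONS = ("Yes", "Yes, with modifications", "No")


def median_iqr(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, Q1, Q3) using inclusive linear interpolation."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


def frequency_table(responses: Iterable[str],
                    categories: Sequence[str] = RECOMMENDATION_OPTIONS
                    ) -> Dict[str, Tuple[int, float]]:
    """Count and percentage for each category, including empty categories."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {c: (counts[c], 100 * counts[c] / total if total else 0.0)
            for c in categories}
```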
Agreement among the four appraisers on the overall assessment was measured on the first global rating item using the intraclass correlation coefficient (ICC) with its 95% confidence interval (CI). The degree of agreement was graded according to Landis and Koch (34): slight (0.01-0.20); fair (0.21-0.40); moderate (0.41-0.60); substantial (0.61-0.80); and almost perfect (0.81-1.00). Statistical significance was set at P < 0.05, and all tests were two-sided (34). All data analyses were performed using Stata (StataCorp. 2017. Stata Statistical Software: Release 15. College Station, TX, USA: StataCorp LLC).
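A sketch of the agreement analysis follows. The text does not state which ICC form was used; this example assumes the two-way random-effects, absolute-agreement, single-rater ICC(2,1) of Shrout and Fleiss, computed from ANOVA mean squares, and it omits the confidence interval. The function names are hypothetical, and the Landis and Koch bands are the thresholds listed above.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score rater j gave to target i (here, a guideline).
    """
    n = len(ratings)                 # targets
    k = len(ratings[0]) if n else 0  # raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n, k >= 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def landis_koch(value: float) -> str:
    """Map an agreement coefficient to the Landis and Koch labels in the text."""
    if value <= 0:
        return "poor"  # assumption: values below the 'slight' band listed in the text
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if value <= upper:
            return label
    return "almost perfect"
```

Rows are guidelines and columns are appraisers, so with four appraisers each row holds four first-global-rating scores.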