In the present study, we introduced a mathematical method to assess objectively how effectively guidelines contribute to decision-making and how well research findings are implemented into guidelines.
As an example, we applied the mathematical analysis to the ESC Guideline on sports cardiology and exercise in patients with cardiovascular disease (ESC GL-SCE). The analysis showed that the guideline provides helpful recommendations regarding exercise and sports activities in cardiovascular disease (CVD), but that the evidence supporting the recommendations is, in most cases, of lower quality.
Because the ESC GL-SCE was analysed with mathematical methods, without giving any opinion regarding the scientific content of the GL, the present findings can be compared with those of our previous analytical study on several other GLs. That comparison shows that GLs differ in the strength of their recommendations and in the quality of the supporting evidence (Figs. 1–2 and Additional file 3 online).
In this particular case, the ESC GL-SCE showed ‘strong’ recommendations, even though they are supported mostly by lower quality evidence. The strength of recommendations and the levels of evidence varied across disease conditions. In many conditions, the existing evidence is less than sufficient to make informed decisions at Level A or B. For Level of Evidence C, the results revealed that sufficient evidence is available to make informed decisions at this Level.
Previous studies evaluated guidelines by the frequencies of Classes of Recommendations and/or Levels of Evidence (independently of, or in relation to, each other) without giving meaningful explanations for these findings. Assessing the efficacy of guidelines in terms of Certainty/Uncertainty may be more practical and may support the implementation of scientific evidence into clinical guidelines and practice. Optimal decision-making means making a decision under uncertainty; therefore, guidelines have to remain patient-centric [35] and should not assume an ideal but unrealistic setting in which every recommendation can be proved by RCTs and patients have similar clinical phenotypes [36].
Thus, as a form of ‘quality control’ of GLs and to evaluate the certainty and uncertainty rates, we introduced the Certainty Index (CI), which ranges from −1 to +1 (Additional file 1 online). The index also makes it possible to compare various GLs and to identify issues that need to be investigated by future studies. It is also recommended that the Certainty Index be evaluated for the type and actions of recommendations [37].
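To illustrate how an index bounded by −1 and +1 can be used to compare guidelines, a minimal sketch follows. The formula used here (a normalised difference between the numbers of recommendations classified as certain and uncertain) is an assumption made purely for illustration; it is not the definition given in Additional file 1. The function name and the counts are hypothetical.

```python
def certainty_index(n_certain: int, n_uncertain: int) -> float:
    """Illustrative index in [-1, +1].

    ASSUMPTION: normalised difference of counts. This is a hypothetical
    stand-in, not the Certainty Index defined in Additional file 1.
    """
    if n_certain < 0 or n_uncertain < 0:
        raise ValueError("counts must be non-negative")
    total = n_certain + n_uncertain
    if total == 0:
        raise ValueError("at least one recommendation is required")
    # +1 when every recommendation is certain, -1 when none is, 0 when balanced.
    return (n_certain - n_uncertain) / total


# Hypothetical counts for two guidelines (not data from this study).
guidelines = {
    "GL-X": (30, 10),
    "GL-Y": (10, 30),
}

for name, (certain, uncertain) in guidelines.items():
    print(f"{name}: {certainty_index(certain, uncertain):+.2f}")
```

Under this assumed form, a positive value indicates that recommendations classified as certain predominate and a negative value that uncertain ones predominate, which is the kind of between-guideline comparison described above.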
Recently, the use of implementation methods to improve the uptake of clinical guidelines into cardiology practice has increased [38]. These methods can support the implementation and dissemination of clinical guidelines and thereby improve health care and patient outcomes. It should be mentioned, however, that it is difficult to include all risk factors (obesity, dyslipidaemia, hypertension, diabetes) and different pathomechanisms in one GL, as was experienced in the analysed ESC GL-SCE. Moreover, when developing such a GL, it is difficult to assign specific values for the type, intensity, duration, and frequency of exercise needed to improve various cardiovascular conditions. Even if this could be achieved, inter-individual differences such as age, gender, or co-morbidities [39, 40] would still prevent combining patients into one particular group.
In addition, giving personalized exercise and sport therapy recommendations is difficult in many cases. For example, a relatively small number of women were recruited in previous trials, although it has been recognized that there are sex-related differences in the mechanisms, presentation, diagnosis and treatment of various cardiovascular disease conditions [41]. Thus, the sex-specific response to exercise with different parameters and the sex-specific therapeutic use of exercise modalities are not fully understood [42–44]. Also, the age-related range of various cardiovascular parameters is still not always based on solid facts but rather on an idealized young male [45]. Furthermore, in diseases with a grey zone in their diagnosis, such as heart failure [46], or in extreme conditions such as high altitude [47], temperature etc., it is difficult to assign personalized exercise regimens.
In general, it is known that decisions inevitably involve some bias: human experimental studies [48] have revealed that cognitive bias always occurs in our decision-making, and it must therefore also be part of the decisions made during the preparation of guidelines. Although the Institute of Medicine has established minimal standards for a clinical practice guideline to be reliable [49], guideline writers may differ in their basic and clinical scientific expertise and practice, and may have conflicts of interest of which they are unaware.
Future considerations regarding the implementation of research findings into Guidelines
Because, for methodological and ethical reasons, Level A evidence cannot always be provided in human studies, one can suggest that the certainty of a GL cannot, or should not, be increased by Level A evidence alone; in some cases, lower-level evidence can be accepted as sufficient and thus implemented into guidelines. In such cases, it would be advisable to give greater weight to lower quality evidence, which would at the same time avoid the disadvantages of rigid classifications based on dichotomized or categorised systems such as the GRADE framework, the AAP scheme, or the ESC Guidelines classification scheme used in the present guideline [50–52].
Although many different methods have been proposed for grading recommendation strength, which greatly influences the implementation of knowledge, most developers agree that determining the strength of action is distinct from rating the aggregate quality of evidence. One can argue that high-quality evidence (such as grade A) does not always justify strong recommendations. Conversely, recommendations, or even strong recommendations, may be possible despite lower quality evidence (such as grade B, C, or X) [53]. The primary modifying factor in this regard is the benefit-harm assessment, as defined in the preceding section on action statement profiles.
The method for determining the strength of recommendation developed by the American Academy of Pediatrics (AAP) is simple, transparent, and clinically relevant [50]. As in the GRADE approach [51], the aggregate evidence level and the benefit-harm assessment are the primary rating determinants. GRADE is more complex, however, and offers only two levels of action strength (‘strong recommendation’ and ‘weak recommendation’), in contrast to the three levels of the AAP (‘strong recommendation’, ‘recommendation’ and ‘option’). Empirical experience in developing guidelines suggests that three levels support more flexible decision-making and are better accepted by clinicians [54].
In the present paper, we have also shown the deviation of the observed rates of Certainty from the expected rates (based on a hypothetical optimal distribution) at each Level of Evidence in the ESC GL-SCE (GL-RA: Risk factors and Ageing, GL-CS: Clinical settings) (Table 2). These data show that in many cases there is not enough evidence to make an informed decision, and thus further studies are needed. Such analysis may be useful for revealing the ‘weakest links’ in other guidelines as well: when a clinical practice guideline is updated, it is crucial to identify critical knowledge gaps and define the clinical problem.
Limitations of the study
There are potential limitations to this analytic study. The first is the inherent ‘potential limitations and harms of guidelines’: for example, recommendations may be wrong for a group of patients or for an individual patient, and any such errors were carried forward into our analysis. The second limitation concerns the optimal decision-making intervals used to determine certainty and uncertainty. Although these intervals are theory-based and, to some extent, arbitrary, they are in line with expert opinion and are supported by the clinical and decision-making literature [55, 56]. Finally, there has been no prior analytic research of this kind on guidelines; therefore, it is likely that the methodology applied in the current study will be refined and improved in future research.