Principal Results
The first objective of this systematic review was to identify existing BRA methods for medical devices described in the scientific literature. We analysed six studies, which yielded ten individual BRA methods for MDs. A majority (7) of those methods are described as quantitative, two as both qualitative and quantitative, and one as qualitative only. The quality of the included papers was mixed, ranging from x to y. Several papers describe methods that are not sufficiently defined within those papers (HOM, CNCO, PC, CPBR) [10, 25]. Our assessment of the methods shows that, even when they are described as quantitative, the use of mathematical calculations to determine the variables within the BRA ranges from not numerical (0) to completely numerical (5). At the same time, the methods range from nearly subjective (1) to nearly objective (4). We identified a strong, positive, and significant correlation between objectivity and the use of numerical calculations.
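The relationship between the two ordinal scores can be illustrated with a rank correlation. The following is a minimal sketch, assuming Spearman's rank correlation as the statistic (the text above does not name the coefficient that was used); the function names and the example scores are hypothetical placeholders and are not the values assessed in this review.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical placeholder scores, NOT the data from this review:
# numerical use on the 0-5 scale, objectivity on the 1-4 scale.
numerical_use = [0, 3, 3, 4, 5]
objectivity = [1, 2, 2, 3, 4]
rho = spearman(numerical_use, objectivity)
```

A rank-based coefficient is chosen here only because both scales are ordinal; a significance test would have to be added separately.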
The second objective was to describe the weaknesses and strengths of the included methods as reported in the literature. For example, the MCDA method was noted for its structured approach to evaluating multiple conflicting criteria but was criticised for its reliance on point estimates and the complexity of implementation [5, 10, 27, 28]. The NBS & BRR method provided clear numerical values for benefits and risks but was criticised for potentially oversimplifying complex benefits and risks [5, 26]. Other methods, like HOM, SCS, and qBRA, offered comprehensive approaches but required extensive data and sophisticated analysis techniques [10, 27].
Comparing Qualitative and Quantitative BRA Methods
Multiple authors criticise the qualitative nature of some current methods, such as the BRF, as being subjective, susceptible to bias and therefore inferior to quantitative methods [5, 26–28]. Our findings partially support this critique, since the use of numerical methods correlated with the objectivity of a method, although there are some outliers. Because quantitative methods are assumed to be superior, the majority (70%) of the methods identified in this systematic review were developed to be quantitative in nature. However, other studies indicate that the most common BRA methods used by MD decision-makers in real-world assessments are qualitative [19, 31–33]. One potential reason for this preference could be that many of the quantitative methods originate from the pharmaceutical industry, where such approaches are well-established and useful [5, 8]. Multiple studies attempt to adapt these methods for MDs [5, 26, 27], yet some may not be a perfect fit because of the fundamental differences between pharmaceuticals and MDs described above [3, 5, 10, 11, 27]. Purely quantitative methods in particular, such as PC and QBRD, can process only quantitative data, making them unsuitable for risks that are hard to quantify, such as those related to new technologies [12–17]. This limitation is acknowledged in the most recent version of the applicable standards [3, 7].
In our assessment of the use of numerical calculations and objectivity within each method, we observed that the extent to which numerical methods were used varied from three to five in the subgroup of quantitative methods. The categorisation as ‘quantitative’ was derived from the self-classification by the authors of the individual methods. However, the strong correlation between the use of numerical methods and objectivity, shown in our analysis and expected by other authors [5, 26–28], does not apply universally; some methods remain very subjective despite being highly numerical (SCS [10]). Additionally, the overall level of objectivity in our assessment was medium-low, with a mean value of 2.2 for methods labelled as quantitative. The finding that even methods described as highly quantitative can, in fact, be subjective in their practical application is supported by other researchers [25]. This could be because most methods for the BRA of MDs include some degree of subjectivity, at least in the identification of endpoints and the assignment of relevance, making complete objectivity potentially impossible.
Practical Implications and Current Gaps
To our knowledge, this is the first review that refers exclusively to BRA methods for MDs. Previous reviews have primarily focused on pharmaceuticals, resulting in a higher number of methods overall, including a greater variety of qualitative BRA methods [9]. However, owing to the unique characteristics of MDs, not all of those methods can be applied to them [3, 5, 10, 11, 27]. Additionally, the BRA of pharmaceuticals relies on the comparison of effects between different groups, e.g., intervention against placebo [9, 25], which is not always possible in MD testing scenarios. In such cases, methods and guidelines recommend using the current state of the art (SOTA) as the basis for comparison [1, 5, 30].
The importance of conducting a BRA is widely recognised and mandated by regulatory requirements and standards [1–4, 7, 34]. In the EU, there is currently no standardised approach or guideline for BRA, creating a considerable gap [5, 10, 19]. In contrast, in the US, the FDA has issued comprehensive guidance documents that describe the qualitative BRF and provide practical examples but leave open whether quantitative methods may also be used [2]. However, both US and EU regulations recognise that the BRA is an integral step in the approval process, in which all risks of a device are weighed against the clinical benefits to determine whether it should be made available on the market [1, 2, 4]. Thus, in standards and guidance, two levels of the BRA exist: first, an individual BRA for each unacceptable risk, and second, an overall BRA [1–3, 7, 11, 18, 35].
Some BRA methods, such as PC and QBRD, rely on pairwise comparisons of benefits and risks, which have some advantages (granularity, transparency) [5, 25, 30] but fall short in several critical aspects. Firstly, MD benefits and risks are not necessarily linked in pairs, making such comparisons impractical in many cases. Secondly, these methods rely on quantitative data, which could be difficult to collect for some risks. Thirdly, conducting the overall BRA required by standards and guidelines [1–3, 7, 11, 18] could be difficult. Lastly, as the number of risks and benefits increases, pairwise comparison methods like QBRD become increasingly complex and hard to manage.
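The growth in effort can be made concrete with a small calculation. This is a minimal sketch under the assumption that every benefit is compared with every risk; PC and QBRD may pair items differently, and the function name is hypothetical.

```python
def benefit_risk_pairs(n_benefits: int, n_risks: int) -> int:
    """Number of comparisons if each benefit is paired with each risk (assumed scheme)."""
    if n_benefits < 0 or n_risks < 0:
        raise ValueError("counts must be non-negative")
    return n_benefits * n_risks


# Illustrative sizes only; doubling both lists quadruples the comparisons.
for n_benefits, n_risks in [(3, 5), (6, 10), (12, 20)]:
    pairs = benefit_risk_pairs(n_benefits, n_risks)
    print(f"{n_benefits} benefits x {n_risks} risks -> {pairs} comparisons")
```

Under this assumption the number of comparisons grows with the product of the two list sizes, whereas the number of items to be documented grows only with their sum.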
New categories of digital and connected MDs bring new challenges that might be difficult to assess with some of the methods described in this review. It is already recognised that numerical data about risks connected to, for example, cybersecurity, user interfaces, or MDs integrated into IT systems could be hard to find [3, 7] and that innovative MDs often pose risks that were previously unknown and are difficult to quantify [12–17]. This is a gap that current BRA methods described in the literature do not address adequately.
While addressing this gap might be difficult, multiple authors state that there is no “one size fits all solution” [9] which covers all types of MDs, all stages of the product life cycle, and all cases in which a BRA might be necessary [9, 10, 19, 25]. Therefore, different methods could be used depending on the level and type of risk. One approach would be to conduct a qualitative BRA first and then assess whether a quantitative BRA should be conducted [28]. Alternatively, combining qualitative and quantitative approaches into one method could be effective. This can be achieved with methods such as MCDA [5, 28] or the BRF [2, 5]. The benefit of such a combined approach is that these frameworks provide a holistic overview of MDs and are capable of incorporating new risks, making them comprehensive tools for evaluating the safety and efficacy of innovative MDs in particular. A benefit of the BRF specifically is that it can incorporate quantitative aspects through improvements proposed by various authors [26], thus overcoming the limitations of qualitative approaches (subjectivity and lack of transparency) by introducing consistency and objectivity to benefit evaluations [5, 26].
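How an MCDA-style method can combine qualitative judgement with numerical aggregation can be sketched with a linear additive (weighted-sum) model. This is one common MCDA form and an assumption made here for illustration; it is not necessarily the model used in the cited studies. The criterion names, scores, and weights are hypothetical, and every score is assumed to lie between 0 and 1 with higher values being more favourable (for a risk criterion, a higher score means a lower risk).

```python
def mcda_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of criterion scores (linear additive MCDA, assumed model)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total weight normalises the weights so they sum to one.
    return sum(weights[c] * scores[c] for c in scores) / total_weight


# Hypothetical device profile; the weights reflect (subjective) stakeholder judgement.
weights = {"symptom_relief": 4, "procedure_safety": 3, "cybersecurity": 1}
device_scores = {"symptom_relief": 0.8, "procedure_safety": 0.6, "cybersecurity": 0.5}
overall = mcda_score(device_scores, weights)
```

The weights and the choice of criteria remain judgement calls, which is consistent with the observation above that even numerical methods retain subjective elements.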
We suggest that future research should empirically evaluate the methods identified in this review by applying them to a range of example devices (either real devices or realistic and detailed models of devices), as has already been done in part for pharmaceuticals [36]. This would enable the direct analysis and comparison of the strengths and weaknesses of the different methods and would furthermore allow an assessment of which methods should be excluded from application and which should be retained as reliable. It would also show how the methods could be further refined and developed to be a better fit for purpose.
Limitations
This systematic review is subject to several limitations. First, our findings may be influenced by a form of publication bias, in that MD manufacturers do not disclose the BRA methods they use to the public. Second, we included only reports written in English. Although the majority of scientific research is published in English, this may have resulted in the exclusion of pertinent studies in other languages. Third, we limited our review to studies published after 2000. This restriction was necessary because of the major updates to regulations in recent years, but it may exclude valuable historical insights. Fourth, there is a scarcity of methods specifically designed for medical devices, with most methodologies being developed for pharmaceuticals. The significant differences between these fields mean that many pharmaceutical methods may not be applicable to MDs. Fifth, our review focused exclusively on the EU and the US. This geographic limitation may reduce the applicability of our findings to other regions. Sixth, while research indicates that companies often employ qualitative approaches [32], these are infrequently detailed in the scientific literature, potentially limiting our understanding of their use in practice. Seventh, the lack of standardised terminology in the field may have affected the comprehensiveness of our search terms, possibly leading to the omission of relevant reports. These limitations suggest that our findings should be interpreted with caution and highlight the need for further research to gain a complete understanding.