The background and context criteria viewed as most important to use in clinical simulation to generate evidence for SaMD were a clear description of the SaMD being evaluated, including its purpose and intended users, and a description and justification of the simulation performed, alongside any other research being conducted to evaluate the SaMD. While remaining important, the criteria for appropriately declaring sources of funding and other conflicts of interest scored less highly. In terms of overall study design criteria, participants viewed it as important that potential limitations of the study design, and any associated biases, are discussed. Similarly, participants noted the importance of discussing strategies to minimize potential study biases, particularly regarding issues of equity (e.g., high-risk patient profiles, racial disparities). Again, while consensus was reached to include information on how digital literacy is considered in the study design, participants did not rate this as highly as the other areas. The study population criteria rated as most important included the eligibility criteria for clinicians who took part in the clinical simulation being representative of the intended end users (e.g., staff level, qualification, experience) and transparency around the number of clinicians who took part in the simulation.
The highest rated delivery of the simulation criteria included the need to describe the environment in which the simulation took place (e.g., physical/virtual, type of facility), the equipment used, the initial orientation and training provided to clinicians before taking part, and how the SaMD was presented to clinicians beforehand. The fidelity criteria rated as most important were that the clinical simulation has high conceptual fidelity matching the intended use of the SaMD, uses high-fidelity synthetic patient cases, and has high clinical scenario fidelity. Beyond these elements, participants highly rated the need to describe the methodology and rationale for developing the synthetic patient cases, the overall representativeness of the synthetic patient cases, their potential limitations, and potential data biases in their development. On software and AI, participants viewed all criteria as highly important, including the need to describe any continuous machine learning (ML) algorithms embedded in the SaMD, covering their design and development, to ensure they are reviewed at regular intervals to monitor changes, and to describe and justify any software updates made to the SaMD since the clinical simulation study.
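As one illustration of what reviewing a continuous ML algorithm "at regular intervals" might involve in practice, the sketch below re-scores a model against a frozen reference set and flags drift from the performance documented at the clinical simulation. This is a minimal, hypothetical example, not a prescribed monitoring method: the function names, baseline value, and alert threshold are all assumptions for illustration.

```python
# Hedged sketch of periodic performance monitoring for a continuously
# learning SaMD model. All names (current_model, load_reference_cases,
# the baseline and threshold values) are hypothetical illustrations,
# not a real API or a mandated procedure.
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.91      # performance documented at the clinical simulation
ALERT_THRESHOLD = 0.05   # maximum tolerated drop before escalation

def scheduled_review(current_model, load_reference_cases):
    """Re-score a fixed, held-out reference cohort at each review
    interval and flag any drift from the documented baseline."""
    X_ref, y_ref = load_reference_cases()  # frozen reference cohort
    auc = roc_auc_score(y_ref, current_model.predict_proba(X_ref)[:, 1])
    drift = BASELINE_AUC - auc
    if drift > ALERT_THRESHOLD:
        # In practice this would trigger change control and documentation
        # of the update, per the software and AI criteria described above.
        raise RuntimeError(f"Performance drift detected: AUC fell to {auc:.3f}")
    return auc
```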
The most important study analysis criteria included the need to clearly define primary and secondary outcome measures, with a rationale and justification for selecting them, and to report the impacts of any unintended consequences (e.g., harm) from the study, the data analysis methods, the generalizability of the findings, and the results of a sensitivity analysis to assess the robustness of the clinical simulation findings.
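As a concrete, hypothetical illustration of the sensitivity analysis criterion, a simple bootstrap over a simulation's primary outcome shows whether the headline result is robust to sampling variation across cases. The outcome data below are placeholders, not study data; this is one possible approach among many.

```python
# Hedged sketch of a bootstrap sensitivity analysis over a primary
# outcome from a clinical simulation (e.g., per-case decision accuracy).
import numpy as np

rng = np.random.default_rng(42)
outcomes = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1])  # 1 = correct decision (hypothetical)

# Resample cases with replacement to estimate the variability of the mean.
boot_means = np.array([
    rng.choice(outcomes, size=len(outcomes), replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"primary outcome: {outcomes.mean():.2f} (95% bootstrap CI {lo:.2f}-{hi:.2f})")
```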
The inclusion of the majority (43/55) of criteria presented to the Delphi participants suggests that substantial data collection and reporting need to be considered and executed if clinical simulation is to be increasingly used to authorise medical devices for market based in part or exclusively on validation simulations. However, the benefits of such evidence generation and reporting are also clear; evidence can be used to achieve regulatory authorisation and, post-market, to detect errors and validate technologies11. Developing the SIROS framework with guidance across the 43 criteria presented in this study can enable manufacturers to work methodically on evidence generation and regulatory submissions for more streamlined SaMD approval. A next step will be to develop an accompanying checklist that makes the SIROS framework actionable for SaMD developers.
While our study is the first of its kind in collecting consensus on how to assess clinical simulation used to generate evidence on SaMD, participants viewed many of the criteria as important, which is complementary to findings from other contemporary studies on methods to generate evidence on DHTs. More generally, participant comments on the importance of such guidance are in line with research by Day et al.12, who note that many digital health start-ups have limited clinical robustness, as measured by regulatory filings and clinical trials. The authors note a lack of meaningful clinical validation for almost half of digital health companies (44% had a clinical robustness score of 0), highlighting the need for guidance such as the SIROS framework. In recent years, national guidance and regulations have been developed to enable rapid assessment of DHTs. In the United States, the Digital Medicine Society has created a regulatory compass tool entitled RegPath, with input from the FDA, to enable improved understanding of whether a specific DHT falls within FDA regulation, and if so, which regulatory pathway is relevant13. In Europe, German regulators have developed a fast-track pathway for digital health applications (in German, DiGA) to be reimbursed by statutory health insurances14,15, a model that will now be employed in other European countries. Developing a framework for assessing clinical simulation can therefore be seen as a next step to complement ongoing regulatory developments internationally, where there are limited tailored guidelines or frameworks at present.
The seven areas and associated criteria agreed by the participants are in line with existing literature and research studies that utilized clinical simulation to evaluate DHTs. Gardener et al. used clinical simulation to evaluate a clinical decision support tool for matching cancer patients to clinical trials16. Participants in that research stated that they were provided sufficient guidance on the exercises and enough clinical information in the synthetic patient cases, though a small number noted that they would have preferred more histology information. Such findings suggest the importance of providing regulators with information on the initial orientation and training given to clinicians before taking part, as a key factor in the clinical simulation's success. Gardener et al. also reported that participants noted that a lack of familiarity with the novel solution could potentially challenge the clinical simulation approach. However, as there are few published studies on the role of clinical simulation in evaluating DHTs, there is an urgent need for further research in this area that both utilizes the areas developed through our research and validates them in practice. Similarly, there is the potential for the seven-dimension SIROS framework developed through this research to be utilized in the evaluation of other types of DHTs, where many similar issues will be of concern to regulators.
Echoing the well-researched importance of fidelity in simulation used in health education17,18, participants in the Delphi identified this as a key area where researchers were required to outline how the clinical simulation sought high fidelity with the planned future use of the SaMD. The specific context in which the SaMD is intended to be used must be considered when planning the clinical simulation to enable accurate reporting, with particular attention paid to high-fidelity synthetic patient cases and their implications for representativeness and equity. In this regard, the evidence required is similar to that of all DHTs developed with AI/ML methods. For example, the Good Machine Learning Practice for Medical Device Development: Guiding Principles, developed by the FDA, UK Medicines and Healthcare products Regulatory Agency (MHRA) and Health Canada in 2021, encourage good practice in medical device development using AI/ML, including the reduction of bias through representative clinical study participants and datasets19. In cases where the SaMD being simulated uses continuous self-learning algorithms, the Delphi participants highlighted the need to report plans for continuous monitoring and other steps to maintain quality and safety as part of defining, controlling, and improving the software life cycle processes outlined in ISO/IEC 1220720, and to adhere to relevant local legislation and regulatory guidance. Such views were echoed by Carolan et al., who note the need for international standards and guiding principles addressing the uniqueness of SaMD with a continuous learning algorithm6.
The importance of presenting issues of bias and equity through the simulation process is a central element of how to assess clinical simulation being used to generate evidence on SaMD, according to our research participants. This is perhaps unsurprising given the increasing research outlining the potential risk of bias and increased health inequities associated with poorly developed or implemented DHTs, including AI/ML7,21-23. Guo et al. identified a range of relevant tools and frameworks providing guidance on different aspects of bias in evidence generation studies24. These include the Quality In Prognosis Studies tool (QUIPS), the Cochrane risk-of-bias tool for randomized trials (RoB2), the Prediction model Risk Of Bias ASsessment Tool (PROBAST), and the Risk Of Bias In Non-randomized Studies of Interventions tool (ROBINS-I). Such frameworks offer SaMD developers a ready source of information to address and report on issues related to bias as part of clinical simulation research during submissions for regulatory approval. In response to the challenge of adaptive technologies, the proposed FDA framework for modifications to AI/ML-based SaMD further seeks to ensure safety and effectiveness are maintained25.
Current tools and guidance remain at best a stopgap until regulatory environments and international guidelines can be developed to ensure SaMD is developed and deployed with a clearer understanding of its impact on quality and safety throughout the software life cycle. To advance this process, manufacturers should engage with regulators and propose clinical simulation methods for the purpose of regulatory approval. While there are a few existing examples of clinical simulation data being used, in part or wholly, as evidence for regulatory approvals, developing more real-life use cases will enable the development of best-practice guidelines and lessons learned that will benefit all stakeholders. Regulators and notified bodies should also work with manufacturers on the application of clinical simulation. Providing greater clarity on what they would like to see from data, and how it can best be collected and presented for SaMD approval, will enable manufacturers to be increasingly targeted in their approach to clinical simulation.
Limitations
Delphi studies traditionally begin with an open-ended question, to which participants provide numerous free-text responses. The results of this initial idea-generation stage are then analysed, summarized and presented in subsequent rounds26. However, some studies have taken a different approach, where pre-existing information is initially presented to participants, who are then asked their opinion of it27,28. This approach was taken in this study, as outlined in the procedure. It can be justified as it prepares participants for the upcoming rounds and can reduce the potentially overwhelming task of data analysis. However, it carries limitations, including potential bias in responses and the exclusion of relevant ideas that participants might have contributed had they been asked in an open-ended format.
There was some confusion apparent in the panelists' qualitative comments, particularly in the scoping round, about the terminology used and the context of the questions. For example, panelists most commonly misinterpreted the terms 'study participants', 'intervention' and 'evaluation opinion'. This may have been due to the complexity of the use case scenario and the absence of practical examples to aid understanding. To overcome this, an analysis of the comments regarding lack of clarity was carried out and used to improve the wording of the questions and criteria for round 1.
Despite efforts to recruit a global cohort representative of both high-income and low- and middle-income countries, most participants came from high-income countries, particularly the UK. This may have arisen from several factors, such as the study sponsor's informal networks being predominantly based in the UK, or the global SaMD community having a greater base in high-income countries. Regardless, further research is required to ensure that the study results are applicable to other settings and country contexts.
The two pre-defined rules for deciding whether to retain items between successive rounds may have led to some potentially important items being removed unnecessarily. As mentioned previously, in the quantitative analysis of rounds 1 and 2, an item was removed if more than 10% of panelists rated it 'not important' or 'not important at all', or if fewer than 60% of panelists rated it 'important' or 'very important'. The analysis of round 1 identified 5 items that failed the former rule while satisfying the latter (i.e., at least 60% of panelists rated them 'important' or 'very important', but more than 10% rated them 'not important' or 'not important at all'). Because each failed one of the pre-determined rules, they were excluded from round 2. These 5 items may therefore have contained important information for regulators to consider when evaluating clinical simulation methods, yet were excluded.
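To make the interaction of the two rules concrete, the retention logic described above can be expressed as a short check over each item's ratings. This is a minimal sketch assuming a five-point rating scale; the labels and example data are hypothetical, not the study's actual analysis code.

```python
# Hedged sketch of the rounds 1-2 retention rule described above.
# Rating labels and the example data are hypothetical placeholders.
from collections import Counter

NOT_IMPORTANT = {"not important", "not important at all"}
IMPORTANT = {"important", "very important"}

def retain_item(ratings: list[str]) -> bool:
    """Apply both pre-defined rules: an item is retained only if
    <=10% of panelists rated it unimportant AND >=60% rated it important."""
    counts = Counter(r.lower() for r in ratings)
    n = len(ratings)
    frac_not_important = sum(counts[label] for label in NOT_IMPORTANT) / n
    frac_important = sum(counts[label] for label in IMPORTANT) / n
    return frac_not_important <= 0.10 and frac_important >= 0.60

# Example: an item rated important by 70% of panelists is still dropped
# when more than 10% rated it unimportant -- the situation that affected
# the 5 borderline items discussed above.
ratings = ["very important"] * 7 + ["not important"] * 2 + ["neutral"]
print(retain_item(ratings))  # False: 20% rated it unimportant
```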