Population-level information as a key input for public health decision-making
While diagnostic tests are usually developed for individual diagnosis and patient care, their results also play a crucial role in public health decision-making. Population-level case data, collected from the number of positive diagnostic tests in surveillance systems worldwide, are a central input parameter for decision-making processes in public health policy. In this context, cases may represent different outcomes of contact with an infectious agent (e.g., infections or deaths), as well as different types of measures of this contact (e.g., incident cases or cumulative cases derived from seroprevalence studies).
Surveillance systems for infectious diseases report the number of cases associated with specific pathogens using standardized case definitions based on pre-defined rules (including diagnostic test results) and legal obligations. These surveillance systems run constantly for notifiable diseases associated with high public health risks. Surveillance-related case data (based on diagnostic test results) are directly used for public health decision-making. They enable the development and parameterization of infectious disease models (e.g., for early warning and monitoring) and of decision-analytic models (e.g., for assessing the benefit-harm balance, cost-effectiveness or other trade-offs when guiding public health interventions). This is especially true in epidemic or pandemic situations, when reducing harm at a population level becomes a crucial aspect of the decision-making philosophy8,9 and high-consequence decisions must be made under uncertainty and time pressure. In such scenarios, two fundamental and extremely relevant quantities are: a measure of the presence of the infection in the population (e.g., prevalence or incidence data), and a measure of existing immunity to the infection in the population, i.e., seroprevalence data.
An important decision supported by dynamic infectious disease modelling studies focusing on predicting infection dynamics is the timing of interventions. Interventions are most effective when deployed in time10 and may cease to be effective if implemented too late11. Decisions about implementing interventions must therefore be made in a timely manner, sometimes with incomplete evidence, but with all relevant information collected and reported appropriately. Monitoring population-level data from as early as possible is essential, since such data can be used to set thresholds for starting interventions12 and to determine when intervention measures are no longer necessary and can be ended13.
Due to reporting delays and the fact that the decision-making process is not instantaneous, decisions can come too late when relying solely on current population-level data. This is where infectious disease modelling comes in. Models help decision-makers obtain reasonable estimates of how the epidemic is likely to progress and what impact different interventions may have. This enables timely and informed decision-making14–16. Combined with benefit-harm and health economic models to account for unintended effects and costs of interventions, infectious disease models enable decision-makers to make optimal decisions given the available evidence and resources17–19.
The points discussed above are exemplified by decision-making during the SARS-CoV-2 pandemic. Even during the early phases of the pandemic, decisions about interventions were made with population-level data in hand. In the UK, the timing of the first nationwide lockdown was determined based on the predicted number of people treated for SARS-CoV-2 in the intensive care unit (ICU)12. In Australia, more targeted lockdowns were implemented based on regional prevalence data20,21, and local lockdowns were also implemented in the UK during later phases of the pandemic22. Prevalence data became even more important when contact tracing and test-intervention strategies were implemented, because the predictive value of diagnostic tests depends on the infection prevalence. As vaccines became available, subpopulations most at risk of severe COVID-19 were prioritised and given the opportunity to be vaccinated first23,24. In Germany, vaccination and testing rules controlling access to parts of public life varied from region to region. Again, the region-specific thresholds were based on the number of hospitalised patients testing positive for SARS-CoV-2 in the respective region.
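The dependence of a test's predictive values on prevalence can be made concrete with a short calculation; the accuracy values and prevalences below are purely illustrative:

```python
def predictive_values(sens, spec, prev):
    """Positive and negative predictive values from sensitivity,
    specificity and prevalence (expected cell proportions)."""
    tp = sens * prev              # true positives
    fp = (1 - spec) * (1 - prev)  # false positives
    fn = (1 - sens) * prev        # false negatives
    tn = spec * (1 - prev)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)

# The same test (95% sensitive, 98% specific) at low vs. high prevalence:
for prev in (0.001, 0.05, 0.30):
    ppv, npv = predictive_values(0.95, 0.98, prev)
    print(f"prevalence {prev:>5.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At 0.1% prevalence, fewer than 1 in 20 positive results of this hypothetical test would be true positives, whereas at 30% prevalence about 19 in 20 would be.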
Mathematical models were used throughout to support the decision-making process. The threshold for applying the first nationwide lockdown in the UK was set based on the number of people estimated to be in need of ICU treatment based on different modelling scenarios12. In Austria, the decision to prioritise vaccinating elderly and vulnerable groups was based on decision-analytic modelling to minimise hospitalisations and deaths25. In general, infectious disease and decision-analytic models contributed substantially to the type and intensity of interventions implemented26–28. Once tests became widely available, they were also used to devise effective mass testing and isolating strategies29–31.
The current pandemic has thus demonstrated the need for accurate and timely population-level case data and clinical case data (requiring different diagnostic tests and testing strategies), to allow public health policy decisions to be as well-informed as possible. Diagnostic tests, as the primary tool to obtain these population-level data, are therefore at the heart of all modelling efforts during an epidemic or pandemic, and early and precise knowledge about their accuracy is crucial for interpreting and further applying these case data.
Challenges for diagnostic test evaluation in an epidemic setting
Diagnostic tests developed for emerging infections should serve various purposes, including individual clinical diagnosis, screening, and – as discussed above – surveillance. These purposes demand distinct strategies and, in theory, require separate approval mechanisms32. However, test development, the evaluation of technical validity, clinical validity and utility, and test validation currently do not account for this in a generalized way. The challenges and potential solutions described in this article, and the framework proposed herein, have been developed with all these purposes in mind.
In the initial phase of an outbreak of an emerging infection, the main focus of diagnostic test development is providing a diagnostic test that can identify infected individuals with high sensitivity, so that they can be isolated and treated as soon as possible. This is usually achieved by direct detection of the pathogen, e.g., by molecular genetic tools like polymerase chain reaction (PCR), microscopy, antigen tests or cultivation of the microorganisms involved. Later, a better understanding of the immune protection conferred by contact with the agent is required, leading to the development of indirect pathogen detection tools, i.e., antibody tests. Here, sensitivity and specificity are equally important to evaluate proxies of long-term immune protection and to detect past low-severity infections that would otherwise have been missed. The specificity of the direct detection tools developed earlier can also come into play in the case of reported reinfections, where it becomes important to understand whether these reinfections were due to false positives in a time of intensified testing. High specificity is also important once treatment options are available but possibly come with relevant side effects, high costs or limited availability. Once tests begin to be used as part of an intervention strategy or for other population-level aims, they need to be developed as point-of-care (POC) diagnostic tests, which may tolerate lower accuracy but must be easy and quick to administer in practice. Furthermore, target populations, testing aims and prioritised estimators (e.g., sensitivity or specificity) can change rapidly, necessitating constant test evaluation and re-evaluation.
During an epidemic or pandemic, direct and indirect tests are thus used for different purposes and require different study designs, with different sample size calculations and study populations, to provide critical information with high precision and validity.
During epidemics with emerging infections, all new tests must, in general, quickly go through three steps: the test must be developed, its clinical performance assessed, and then information on its performance incorporated into infectious disease modelling to inform public health decision-making. Each step involves potential sources of bias that must be considered. In the following, we describe potential challenges during these steps and how these challenges might affect the submission process to regulatory agencies, taking into account the perspective of test developers from the industry.
Diagnostic test development
Diagnostic tests for emerging infections typically fall into the so-called in vitro diagnostic (IVD) test category, as they examine human body specimens (e.g., nasopharyngeal swabs, nasal swabs, blood or saliva32). IVDs are generally considered medical devices33. Consequently, their development has to adhere to the rules of regulatory agencies and a pre-defined, complex legal framework. Currently, the EU IVD Regulation 2017/746 covers IVD medical devices and focuses on a legislative process that prioritises individual safety, which means that different types of clinical data must be collected before submission. If a test is deemed capable of distinguishing infected individuals from non-infected ones, it must be demonstrated that this capability is not a one-off result34.
There are several phase models for the development of diagnostic tests in the literature. In the following, we use the frequently used four-phase model2,35,36:
- In phase I, the analytical performance is evaluated,
- in phase II, the diagnostic accuracy is estimated roughly and the threshold is determined,
- in phase III, the clinical performance is estimated in a confirmatory way, and
- in phase IV, the test is evaluated together with the subsequent diagnostic and/or therapeutic measures with regard to a patient-relevant endpoint.
Inter-rater agreement, analytical sensitivity (minimally detectable levels)34 and cross-reactivity have to be investigated in phase I studies to verify the technical validity, repeatability and reproducibility of laboratory tests (on a lot-to-lot, instrument group, and day-to-day basis). However, in the early phase of an epidemic or pandemic, there are often not enough samples from infected individuals. Sharing data and using a common infrastructure, for instance by collecting samples at national reference centres, could solve this problem, provided the samples are made accessible to IVD developers. A possible limitation of this approach is the risk of spectrum bias due to the particular mix of individuals, e.g., there may be more severe cases among the samples than in the target population. Furthermore, regulatory agencies do not allow the use of (frozen) biobank samples for approval.
After having shown good technical performance, the next step is demonstrating clinical performance in phase II and III studies. An integral part of assessing the sensitivity and specificity of a continuous diagnostic test is determining the threshold at which it should be used34. This must be fixed before moving on to diagnostic test evaluation, to avoid bias caused by data-driven threshold selection37,38. The optimal threshold for a diagnostic test depends on the prevalence and on the consequences of misclassification in either direction31,39, both of which may change over time; this would mean that a new study is needed every time the threshold changes, requiring extensive resources (especially time and money).
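To make this prevalence dependence concrete, the following sketch picks the cost-optimal operating point of a continuous test from a small set of candidate thresholds; the operating points and misclassification costs are hypothetical:

```python
# Hypothetical ROC operating points for a continuous test: (threshold, sens, spec)
operating_points = [
    (1.0, 0.99, 0.80),
    (2.0, 0.95, 0.90),
    (3.0, 0.88, 0.96),
    (4.0, 0.75, 0.99),
]

def expected_cost(sens, spec, prev, c_fn=5.0, c_fp=1.0):
    """Expected misclassification cost per person tested:
    cost of false negatives plus cost of false positives."""
    return c_fn * prev * (1 - sens) + c_fp * (1 - prev) * (1 - spec)

def optimal_threshold(points, prev, c_fn=5.0, c_fp=1.0):
    """Operating point minimising the expected misclassification cost."""
    return min(points, key=lambda p: expected_cost(p[1], p[2], prev, c_fn, c_fp))

# The optimum shifts as prevalence changes:
for prev in (0.02, 0.20):
    thr, se, sp = optimal_threshold(operating_points, prev)
    print(f"prevalence {prev:.0%}: threshold {thr}, Se {se}, Sp {sp}")
```

With a false negative weighted five times a false positive, the cost-optimal choice moves from the most specific operating point at 2% prevalence to a markedly more sensitive one at 20% prevalence.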
Phase II studies are initial, so-called proof-of-concept studies of clinical performance and are often carried out in a two-gate design40, where sensitivity is estimated in diseased individuals and specificity in healthy samples from a different source. However, this design can lead to spectrum bias (Table 1). Rutjes et al.40 have already pointed out that sensitivity and specificity are generally overestimated in such studies. Likewise, Lijmer et al.41 have shown in their meta-analysis that a two-gate case-control design is likely to lead to an overestimation of diagnostic accuracy. In most situations outside an epidemic or pandemic, tested individuals are symptomatic and suspected of having the infection of interest when the test is intended to guide therapy or decisions about isolation. During epidemics or pandemics, however, tested individuals can also be asymptomatic if the test is intended as a contact tracing tool or screening test34. In both cases, real-world samples may not be as pristine as those in a laboratory setting34, because testing can also be performed at the POC, in the community, at the workplace, at school, or at home32. A test may require different performance characteristics if it is the first test in line, used to triage who will be tested further, compared to when it is used to confirm infection. For instance, in a confirmation setting, most individuals who clearly do not have the infection of interest will already have been excluded34.
Diagnostic test evaluation
IVDs must be evaluated in phase III diagnostic accuracy studies that ideally start by including all individuals who would be tested in clinical practice, to avoid selection bias (all-comer studies). Individuals fulfilling the inclusion criteria should be enrolled consecutively, without judging how likely each person is to test positive or negative34. In such prospective diagnostic studies, to minimise variability and thus increase statistical power, all study participants ideally undergo all tests under investigation (index tests) as well as the reference standard used to assign their final diagnosis.
The reference standard must be sufficiently reliable to differentiate between people with and without the target condition, but it is usually not perfect34. This imperfection has to be taken into account when interpreting the results. Suppose a POC antigen test for SARS-CoV-2 is evaluated against a PCR reference standard, resulting in a sensitivity of 90%. This does not mean that 90% of people with SARS-CoV-2 will be detected, but that the POC test will be positive in 90% of cases with a positive PCR test. Solutions may include follow-up data or composite reference standards, which use all tests or clinical criteria available for a diagnosis. However, if the test under evaluation is part of this composite reference standard, incorporation bias may result42.
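The distinction matters because sensitivity measured against an imperfect reference overstates detection of true infections. A minimal sketch, assuming for illustration that PCR itself detects 95% of true infections and that the POC test detects none of the cases PCR misses:

```python
def detection_rate_vs_truth(sens_vs_ref, sens_ref, sens_in_ref_negatives=0.0):
    """Share of truly infected people detected by the index test, when the
    reference standard (e.g. PCR) itself misses a fraction of infections.
    sens_vs_ref          : index test sensitivity measured against the reference
    sens_ref             : reference standard's own sensitivity vs. the truth
    sens_in_ref_negatives: index test detection rate among reference-missed cases
    """
    return sens_vs_ref * sens_ref + sens_in_ref_negatives * (1 - sens_ref)

# POC test: 90% sensitivity measured against PCR; PCR detects 95% of infections.
print(detection_rate_vs_truth(0.90, 0.95))
```

Under these assumptions, the 90% figure reported against PCR corresponds to only about 85% of truly infected individuals being detected.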
Depending on the phase of the epidemic or pandemic, recruitment speed can vary considerably due to changes in incidence. The guideline on clinical evaluation of diagnostic agents of the European Medicines Agency2 demands that the sample size of a confirmatory diagnostic accuracy study be specified in the study protocol. The required sample size is highly dependent on the prevalence of the target condition, which may change during the recruitment phase, potentially rendering a priori sample size calculations inappropriate by the time recruitment takes place.
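The prevalence dependence of the required sample size can be sketched with a standard normal-approximation calculation; the target precision and prevalences below are illustrative:

```python
from math import ceil

def total_n_for_sensitivity(sens_expected, precision, prevalence, z=1.96):
    """Total participants needed so that the expected number of true cases
    yields an approximate 95% CI half-width of `precision` around the
    sensitivity estimate (normal approximation)."""
    n_positive = (z**2 * sens_expected * (1 - sens_expected)) / precision**2
    return ceil(n_positive / prevalence)

# Estimating Se = 90% to +/- 5 percentage points: the required cohort
# grows steeply as prevalence falls.
for prev in (0.20, 0.05, 0.01):
    print(f"prevalence {prev:.0%}: N = {total_n_for_sensitivity(0.90, 0.05, prev)}")
```

Because the number of truly infected participants is what drives precision, a prevalence drop mid-study multiplies the required enrolment and can invalidate the original plan.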
Submission to regulatory agencies
Studies for industry face rigorous regulatory and ethics requirements, as clinical trials follow strict processes and regulatory guidelines that are assessed in the regulatory submission process and potentially controlled by audits. Clinical studies must be transparent, traceable and reproducible, and special attention must be paid to data quality and privacy. This leads to very detailed study preparation, documentation and quality control, and to long, relatively inflexible study processes.
When the SARS-CoV-2 pandemic began in late 2019, the need for diagnostic tests grew with the rising number of cases. Regulatory bodies (such as the U.S. Food & Drug Administration, FDA) established country-specific emergency use authorization guidelines43,44 to make it easier and faster to bring a SARS-CoV-2 test to market and make it accessible during the pandemic. As soon as the emergency situation is declared over, tests must go through the regular submission process in every country to obtain clearance.
Requirements like sample size, inclusion criteria for subjects, properties of the reference test and more are different for each country's submission process and may change during an epidemic or pandemic. Therefore, it is not always possible to cover submissions for different countries or certificates within one study, and several studies must be planned.
The different and changing requirements are not the only challenges submission teams face. The changing prevalence of infection makes adequate project management and timeline planning difficult. Recruitment of positive cases fulfilling the recruitment requirements can be very slow, leading to a longer study duration and, therefore, a longer time to market. New mutations of SARS-CoV-2 make re-evaluations of statistical properties necessary. Considering regulatory changes during pandemics and possible mutations, (pre)planning such a study is complicated and time-consuming.
Potential solutions for the challenges presented
The challenges discussed in the previous sections are multidimensional but can be addressed by countermeasures in three areas. First, test developers should use methodological approaches to study design and statistical analysis that increase study efficiency and reduce the risk of bias. Second, strategic approaches and regulatory guidance for the industry should be deployed to clearly define opportunities but also limitations in the development and approval process. Third, results and feedback from population-level mathematical modelling should inform test development and validation, for instance by deriving optimal study designs based on formal value-of-information analyses.
Methodological solutions
Methodological solutions fall into two categories: statistical methods to control bias, and methods to increase speed and efficiency.
The different biases in diagnostic studies have been described extensively, both in general45–47 and also specifically in the context of the SARS-CoV-2 pandemic48 and POC tests for respiratory pathogens49. From a methodological standpoint, the problem of bias can be addressed in two ways: either by choosing a study design in the planning stage that minimises the risk of bias, or by using analytical methods that correct for potential bias.
An excellent overview of how to avoid bias through an appropriate design can be found in Pavlou et al.50. Important for the planning phase is the work of Shan et al.51, who present an approach to calculate the sample size in the presence of verification bias (i.e., partial or differential verification bias).
In terms of bias reduction methods during the analysis phase, most studies focus on the correction of verification bias. Bayesian approaches are mainly proposed for differential verification bias52,53, while there are a variety of methods for partial verification bias (for a methodological review see Chikere et al.54 or de Groot et al.55, for implementation in R see Arifin et al.56).
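As one classical example of such an analysis-phase correction, the Begg and Greenes approach re-weights the verified subsample under the assumption that verification depends only on the index test result. A minimal sketch with hypothetical counts:

```python
def begg_greenes(n_tpos, n_tneg, v_tpos, d_tpos, v_tneg, d_tneg):
    """Sensitivity and specificity corrected for partial verification bias,
    assuming verification depends only on the index test result
    (Begg & Greenes, 1983).
    n_tpos, n_tneg : all participants testing positive / negative
    v_tpos, v_tneg : how many of them were verified by the reference standard
    d_tpos, d_tneg : verified participants found truly diseased in each group
    """
    p_d_given_tpos = d_tpos / v_tpos  # P(diseased | test positive)
    p_d_given_tneg = d_tneg / v_tneg  # P(diseased | test negative)
    tp = p_d_given_tpos * n_tpos          # expected true positives overall
    fn = p_d_given_tneg * n_tneg          # expected false negatives overall
    tn = (1 - p_d_given_tneg) * n_tneg    # expected true negatives overall
    fp = (1 - p_d_given_tpos) * n_tpos    # expected false positives overall
    return tp / (tp + fn), tn / (tn + fp)

# 1000 tested, 200 positive; all positives verified, only 200 of 800 negatives:
sens, spec = begg_greenes(200, 800, 200, 180, 200, 20)
print(f"corrected Se {sens:.3f}, corrected Sp {spec:.3f}")
```

In this example, the naive sensitivity computed from verified participants only would be 90%, while the corrected estimate is about 69%, illustrating how strongly partial verification can distort the apparent accuracy.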
Time to market has to be reduced significantly in pandemics to find an optimal trade-off between misclassification and missed opportunities for action. From a statistical point of view, the methods and processes must be reconsidered. One possibility for improving study designs and statistical analyses is the use of adaptive designs, which can increase efficiency. These approaches have long been established in therapeutic studies and are also anchored in guidelines3,57. With adaptive designs, it is possible to make pre-specified modifications during the study. For example, inclusion and exclusion criteria can be changed, the trial can be terminated early for futility or efficacy, or the sample size can be recalculated. Thorlund et al.58 summarise the characteristics of typical adaptive designs very clearly in their review. Cerqueira et al.59 observed in their review of published studies with adaptive designs that the pharmaceutical industry in particular increasingly uses simple adaptive designs, while more complex adaptive designs remain rare.
In diagnostic studies, however, this topic is still relatively new, and experience with adaptive designs in diagnostic clinical trials for submissions is limited. A summary of the current state of research can be found for diagnostic accuracy studies in Zapf et al.60, for randomised test-treatment studies in Hot et al.61 and for adaptive seamless designs in Vach et al.62. Methods for blinded and unblinded sample size re-calculation in diagnostic accuracy studies have been published recently63–66, as have adaptive designs for test-treatment studies67 and adaptive seamless designs68. The diagnostic industry depends heavily on regulatory guidelines worldwide. If regulatory bodies emphasised more efficient diagnostic trials including, e.g., adaptive designs, this would incentivise the implementation of modern study designs.
In the following, concrete possible solutions to the above-mentioned challenges are explained as examples. For details, please refer to the corresponding articles.
- To address the problem of setting a threshold in an early study that may later turn out not to be optimal, the approach of Westphal et al.69 can be used: a limited pool of promising thresholds is selected and then evaluated simultaneously in the validation study, with the type I error adjusted accordingly. Another idea is to use mixture modelling without defining a threshold70. Prevalence-specific cut-offs might also be developed and defined a priori.
- If the testing strategy and thus the target population change during the study, adaptive designs offer the possibility to re-estimate the sample size in a blinded manner based on the prevalence estimated in the interim analysis63.
- To address the problem of biased diagnostic accuracy estimates in two-gate designs, a seamless enrichment design could be chosen, in which proof-of-concept and confirmation are performed together in one study68. However, regulatory authorities remain wary of the possible shortcomings of these innovative designs, and a lot of work is needed to get them approved71. This, in turn, leads manufacturers of diagnostic tests to be conservative in their study designs.
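The blinded sample size re-estimation mentioned above can be sketched as follows: the number of positive cases needed for the accuracy estimate stays fixed, while total enrolment is rescaled to the prevalence observed (pooled, blinded to index test results) at the interim analysis. All numbers are illustrative:

```python
from math import ceil

def reestimate_total_n(required_n_positive, interim_prevalence):
    """Blinded interim re-estimation: the number of true-positive cases needed
    for the planned precision is fixed; total enrolment is rescaled to the
    prevalence actually observed so far (pooled across arms, blinded to
    individual test results)."""
    return ceil(required_n_positive / interim_prevalence)

# Planned for 139 positives at an assumed 10% prevalence (N = 1390);
# the interim look shows only 4% prevalence, so enrolment must grow:
print(reestimate_total_n(139, 0.10))  # planned: 1390
print(reestimate_total_n(139, 0.04))  # re-estimated: 3475
```

Because only the pooled prevalence is used, the re-estimation does not reveal the interim accuracy results and therefore does not inflate the type I error.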
Solutions for political decision-making based on mathematical modelling
When considering model input data, one key aspect that modelling studies must take into account is the deliberate parameterization of test accuracy for case numbers that are based directly or indirectly on the results of a diagnostic test72. This typically includes incidence rates as well as seroprevalence estimates. During the first three months of the SARS-CoV-2 pandemic, only a minority of modelling studies in the field accounted for test accuracy estimates; the remainder used incidence and later seroprevalence data as if they represented the ground truth. This approach would be appropriate if incidence or seroprevalence data were already corrected for imperfect test accuracy. Even then, the correction procedure should be reported in the modelling study to enable a transparent evaluation of model parameterization, and the model(s) should be reparametrized once updated information on diagnostic test accuracy becomes available. The earlier that decisions are made based on updated information, the greater the impact of these decisions on population health (Fig. 1). A decision made earlier by just a few weeks or even a couple of days can make a huge difference, offering a critical time window for accelerated diagnostic studies. Figure 2 shows the sensitivity of model-based assessments of interventions to diagnostic test accuracy parameters. The results show that even relatively small biases in the estimation of test accuracy (much smaller than those found in the Cochrane reviews) for an antibody test used to derive the proportion of undetected cases in a population have an enormous effect on the predicted further course of the epidemic (the mechanism for this impact is that the proportion of undetected cases is used to correct reported case numbers before they are used to calibrate transmissibility estimates and other parameters).
Such results can be enough to change public health decision-making from, for instance, not implementing population-level contact reduction measures to introducing a hard lockdown, if the defined outcome of interest crosses a set decision-analytic threshold.
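One widely used way to correct an apparent prevalence for imperfect test accuracy before it enters a model is the Rogan-Gladen estimator; a minimal sketch with illustrative survey numbers:

```python
def rogan_gladen(apparent_prevalence, sens, spec):
    """True prevalence corrected for imperfect test sensitivity and
    specificity (Rogan & Gladen, 1978), clipped to [0, 1]."""
    corrected = (apparent_prevalence + spec - 1) / (sens + spec - 1)
    return min(max(corrected, 0.0), 1.0)

# A seroprevalence survey reads 8% positive with an antibody test that is
# 90% sensitive and 98% specific:
print(rogan_gladen(0.08, 0.90, 0.98))  # about 0.068
```

With a 98% specific test, an 8% apparent seroprevalence corresponds to a corrected estimate of roughly 6.8%; feeding the uncorrected figure into a model would overstate existing population immunity accordingly.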
Longitudinal panels as a platform for diagnostic accuracy studies
Given the rapidly changing research questions during an epidemic or pandemic, there is a huge practical challenge in setting up diagnostic studies, even with the modern study designs described above, because the acceptable time spans for recruiting study participants and for conducting the actual studies are very short. The availability of a study platform that allows immediate initiation of diagnostic studies reflecting the current research question and infection dynamics is indispensable for timely studies in the field. One way to ensure this is the sustainable implementation of a longitudinal panel within existing cohorts (e.g., the NAKO Health Study74) that is tested regularly for the presence or absence of the pathogen with one or several defined tests under evaluation. Another way would be to use data from hospitals, health insurers or public health agencies. In this approach, a platform comparable to the UK ONS panel75 or the REACT study76 can be built and used for two equally important purposes: the evaluation of the tests or testing strategies under study, and the real-time communication of the results of the respective tests representing current or past infection dynamics. In this setting, flexible and fast study designs can fulfil both purposes at the same time.
Feedback triangle at the centre of a unified framework
As discussed in the section above, the development and evaluation of diagnostic tests in an epidemic or pandemic setting is closely linked to the modelling studies used to inform political and public health decision-making. This link is at the centre of the unified framework we propose based on experiences during the SARS-CoV-2 pandemic (Fig. 3). The execution of diagnostic studies for new tests, or for new application areas of existing tests, depends heavily on current test strategies and those potentially applied in the future. Results from diagnostic studies are a direct input into mathematical modelling studies, and in turn the results of these models are used for decision-making within a defined decision-making framework. However, modelling studies can also give crucial feedback to those responsible for planning and analysing diagnostic accuracy studies. Here, so-called value-of-information analyses can help identify those gaps in knowledge about diagnostic test accuracy that need to be tackled first or require the greatest attention77. This can directly affect sample size estimations, for instance if it is clear that more precision is needed to estimate the test's specificity (as is often the case with antibody tests). Therefore, the optimal strategy for dealing with these constant feedback loops is to establish continuous collaboration between the disciplines representing the three parts of the loop (in green in Fig. 3). This collaboration platform can use the longitudinal panel with complementary perspectives described above to create a unified diagnostic test development and evaluation framework during an epidemic or pandemic. The modern study designs and bias reduction methods described above can be applied to obtain the best potentially available evidence on diagnostic test accuracy in different settings.
Diagnostic test-intervention studies using a cluster-randomised approach
In many situations, diagnostic test accuracy estimates should only be seen as surrogate information, since the actual outcome of interest during an ever-changing pandemic, especially in its later phases, is the effect of applying the test on clinical or population-level outcomes. Here it is possible, as discussed during the SARS-CoV-2 pandemic, to take a step further and move test evaluation to phase IV, i.e., diagnostic test-intervention studies. In this phase, individuals or clusters of individuals are randomised to a diagnostic strategy (e.g., regular testing of the entire population versus testing only in case of symptoms). The relevant clinical endpoint is then compared between randomised groups35. Thus, the test strategy is treated like an intervention evaluated for its effectiveness and safety. Diagnostic test accuracy contributes to this endpoint but is not the only factor under evaluation. The practicability of the strategy, as well as real-world effectiveness and interaction with other interventions (e.g., case isolation and quarantine of close contacts), are also assessed indirectly in this approach. In a dynamic infectious disease setting, where an intervention can have indirect effects on people other than the target population, only cluster-randomised approaches allow a reasonable estimation of the population-level effects of the intervention under study. In infectious disease epidemiology, similar designs are applied when assessing the effectiveness of vaccination programs on a population level, often combined with a staggered entry approach that allows all clusters to benefit from the intervention over time (the so-called stepped-wedge design). During the pandemic, small-scale pilot studies were discussed that tried to mirror such an approach in a non-randomised way, often claiming to be natural experiments.
However, most of them did not follow guidelines and recommendations available for diagnostic test-intervention studies that would have improved the quality of the results and their usefulness for evidence-based public health. Rigorous application of cluster-randomised diagnostic test-intervention studies to implement testing strategies can support decision-making processes in the later stages of an epidemic or pandemic.
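The staggered entry underlying a stepped-wedge design can be sketched as a simple allocation matrix, in which every cluster starts under control conditions and one cluster crosses over to the testing strategy per period (cluster and period counts are illustrative):

```python
def stepped_wedge_schedule(n_clusters, n_periods):
    """0/1 intervention matrix for a basic stepped-wedge design: all clusters
    start in the control condition, and one cluster crosses over to the
    intervention per period, so every cluster eventually receives it."""
    schedule = []
    for cluster in range(n_clusters):
        crossover = cluster + 1  # period in which this cluster switches
        schedule.append([1 if period >= crossover else 0
                         for period in range(n_periods)])
    return schedule

# Four clusters observed over five periods:
for row in stepped_wedge_schedule(4, 5):
    print(row)
```

Randomising the order in which clusters cross over (rather than assigning it) is what turns this schedule into a cluster-randomised stepped-wedge trial.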