Using Threshold Analysis to Assess the Robustness of Public Health Intervention Recommendations from Network Meta-Analyses: Application to Poison Prevention in Households with Children Under Five


Abstract

Background
In the appraisal of clinical interventions, complex evidence synthesis methods, such as network meta-analysis (NMA), are commonly used to investigate the effectiveness of multiple interventions in a single analysis. The results from a NMA can inform clinical guidelines directly or be used as inputs into a decision-analytic model assessing the cost-effectiveness of the interventions. However, there is hesitancy in using complex evidence synthesis methods when evaluating public health interventions. This is due to significant heterogeneity across studies investigating such interventions and concerns about their quality.
Threshold analysis has been developed to help assess and quantify the robustness of recommendations made based on results obtained from NMAs to potential limitations of the data. Developed in the context of clinical guidelines, the method may also prove useful in the context of public health interventions. In this paper, we illustrate the use of the method in a study investigating the effectiveness of interventions aiming to increase the uptake of poison prevention behaviours in homes with children aged 0-5.

Methods
Random effects NMA was carried out to assess the effectiveness of several interventions for increasing the uptake of poison prevention behaviours, focusing on the safe storage of other household products outcome.
Threshold analysis was then applied to the NMA to assess the robustness of the intervention recommendations made based on the NMA.

Results
Fifteen studies assessing seven interventions were included in the NMA. The results of the NMA indicated that the complex intervention comprising Education, Free/low-cost equipment, Fitting equipment and Home safety inspection was the most effective intervention at promoting poison prevention behaviours. However, the threshold analyses highlighted that this intervention recommendation was not robust.

Conclusions
In our case study, threshold analysis allowed us to demonstrate that the intervention recommendation for promoting poison prevention behaviours was not robust to changes in the evidence due to potential bias. Therefore, caution should be taken when considering such interventions in practice. We have illustrated the potential benefit of threshold analysis and, therefore, encourage the use of the method in practice as a sensitivity analysis for NMA of public health interventions.

Background
Evidence synthesis methods, including systematic reviews and meta-analysis, are used in evidence-based decision-making, for example, as part of the technology appraisals of new health interventions. A range of meta-analytic methods are available for different data scenarios. Pairwise meta-analysis pools evidence from multiple studies that compare two interventions head-to-head, where the interventions are the same or similar across studies, to obtain a pooled overall estimate of the relative treatment effect. However, issues with pairwise meta-analysis arise when more than two interventions need to be compared. Network meta-analysis (NMA) expands on the pairwise meta-analysis framework by allowing for the comparison of multiple interventions in a single analysis. The results from a NMA are often used to inform a decision-analytic model assessing the cost-effectiveness of the interventions [1]. The effectiveness and cost-effectiveness of interventions are vital components in policy decision-making and the development of guidelines, for example, by the National Institute for Health and Care Excellence (NICE).
Despite the known benefits of NMA, there is some hesitancy in using NMA methods in public health intervention appraisals. Public health interventions can be highly complex as they can consist of multiple and often overlapping components. It is common to see substantial between-studies heterogeneity due to, for example, different study designs, which is often listed as the reason for not using meta-analysis methods [2].
As well as substantial between-studies heterogeneity, there is often concern regarding the quality of the studies evaluating public health interventions. Due to the nature of public health outcomes and corresponding interventions, there tends to be a broader range of study types in contrast to individual-focused randomised controlled trials (RCTs) typically seen in clinical settings. Due to the nature of RCTs, particularly the randomisation and blinding, they are considered to be the least biased source of evidence compared to other study designs such as non-randomised controlled trials (NRCTs) and observational studies. The broad range of study designs in public health introduces issues with the validity of the results from these studies and increases the potential risk of bias. This is one of the reasons behind the hesitancy for using NMA methods in the public health setting.
In a recent study, Smith et al (2021) highlighted that there is increasing use of evidence synthesis methods in the appraisals of public health interventions by NICE. Thirty-one percent (14/45) of NICE public health intervention appraisals used a meta-analysis as part of the statistical analysis assessing the effectiveness of such interventions, which is an increase of 8% since 2012. However, only one of these appraisals conducted a NMA [2].
All studies included in a NMA should be assessed in terms of their quality and the potential risk of bias. If the studies included in the NMA have issues with their conduct and design, causing problems with their validity or their relevance, then there will be concerns regarding the reliability and validity of the NMA estimates and rankings. The Cochrane risk of bias tool can be used to assess the quality and potential risk of bias for individual studies [3]. This is typically used for RCTs where the studies are assessed on several aspects whereby possible bias could occur. Each aspect of the trial design that could introduce bias is then assigned a judgement based on how susceptible the study is to bias. These judgements are rated "high", "low", or "unclear" [4]. For network meta-analysis, the Grading of Recommendations Assessment, Development and Evaluation (GRADE), also formerly known as GRADE NMA, has been developed to assess the quality of evidence contributing to the intervention contrasts for every pair of interventions. The quality of evidence for each contrast in the network is rated as high, moderate, low, or very low across five areas: inconsistency, study limitations, indirectness, imprecision, publication bias. However, as networks become larger, loops of evidence become more complex leading to GRADE NMA becoming insufficient. Furthermore, whilst there are tools or methods that exist to assess the quality of evidence in NMA, these do not indicate or consider the impact any potential bias has on the intervention recommendations, which is less useful for decision-makers and guideline developers [5].
Threshold analysis, a method recently proposed by Phillippo et al [3], quantifies the sensitivity of effect estimates and decisions resulting from a NMA to any changes in the evidence. In this paper, we aim to illustrate that the application of threshold analysis in the public health setting can allow researchers and policy makers to assess and quantify the credibility of the results from NMAs in the presence of evidence that could be biased. We illustrate this using an example of NMA investigating the effectiveness of interventions to increase the uptake of poison prevention behaviours in homes with children under 5.

Network meta-analysis
Network meta-analysis (NMA) allows for the comparison of multiple interventions in a single analysis to obtain the relative effectiveness of all interventions compared to each other. In NMA, the structure of the network is used to gain indirect estimates of effects between interventions that have not been compared directly. For example, by combining trials that have direct evidence comparing interventions B versus A and trials of C versus B, we can estimate the indirect relative effect of interventions C versus A. The use of indirect evidence is suitable provided that we can assume consistency in the network, meaning that there is little difference between the direct evidence from trials (in this case, trials of C versus A, if they exist in the network) and the indirect evidence obtained from the network. By combining the direct and indirect evidence, NMA allows for the estimation of relative intervention effects for all interventions in the network and enables ranking of the interventions according to the probability of an intervention being the best, thus identifying the most effective intervention [6]. The results from the NMA are often incorporated into a decision-analytic model to consider the cost-effectiveness of interventions. We conducted a NMA in WinBUGS 1.4.3 using a Bayesian approach, which gave effect estimates as odds ratios with 95% credible intervals.
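As a minimal illustration of the indirect comparison described above, the sketch below combines a B versus A estimate with a C versus B estimate on the log odds ratio scale, assuming consistency and independent sources of evidence. The input values and the function name `indirect_estimate` are hypothetical, and this simple calculation is not the Bayesian model fitted in WinBUGS.

```python
import math

def indirect_estimate(d_ab, se_ab, d_bc, se_bc):
    """Indirect log OR of C vs A from B vs A and C vs B, assuming consistency."""
    d_ac = d_ab + d_bc                      # effects add on the log OR scale
    se_ac = math.sqrt(se_ab**2 + se_bc**2)  # variances add for independent evidence
    return d_ac, se_ac

# Hypothetical inputs (log odds ratios and standard errors), for illustration only
d_ac, se_ac = indirect_estimate(d_ab=0.40, se_ab=0.20, d_bc=0.30, se_bc=0.25)
lower, upper = d_ac - 1.96 * se_ac, d_ac + 1.96 * se_ac
print(f"Indirect OR (C vs A): {math.exp(d_ac):.2f} "
      f"(95% CI {math.exp(lower):.2f} to {math.exp(upper):.2f})")
```

The indirect estimate is always less precise than either of its inputs, because the two variances are summed.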

Threshold analysis
Threshold analysis identifies how sensitive the intervention recommendations from a NMA are to the level of imprecision in the effect estimates [3]. It derives multiple thresholds that represent changes in the effect estimates that lead to a change in the intervention decision. As a result, the method can identify the smallest possible change in a data point, in both the positive and negative direction, that leads to a change in the conclusion of which intervention is deemed optimal. Once the smallest threshold in each direction is identified, the method allows us to construct an invariant interval for each effect estimate. An invariant interval is the interval within which the effect estimate can lie without altering the intervention decision. If the 95% confidence or credible interval for an effect estimate extends beyond the invariant interval, then we can deduce that the intervention recommendation is sensitive to the level of imprecision in the effect estimate. Conversely, if the 95% confidence or credible interval lies within the invariant interval, then the intervention decision is robust to the imprecision in that estimate.
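The comparison between an invariant interval and a 95% interval can be sketched in a few lines. The example below is illustrative only: the function names, the estimate, the thresholds and the 95% intervals are all hypothetical, and `None` is used to represent "no threshold" in a given direction.

```python
import math

def invariant_interval(estimate, neg_threshold, pos_threshold):
    """Build the invariant interval from the smallest threshold in each direction.

    A threshold of None means "no threshold" (NT) in that direction.
    """
    lower = -math.inf if neg_threshold is None else estimate + neg_threshold
    upper = math.inf if pos_threshold is None else estimate + pos_threshold
    return lower, upper

def is_robust(interval_95, invariant):
    """True if the 95% interval lies entirely within the invariant interval."""
    return invariant[0] <= interval_95[0] and interval_95[1] <= invariant[1]

# Hypothetical log OR, thresholds and 95% intervals, for illustration only
inv = invariant_interval(estimate=0.50, neg_threshold=-0.30, pos_threshold=None)
wide = is_robust((0.10, 0.90), inv)    # lower limit falls below the invariant interval
narrow = is_robust((0.25, 0.75), inv)  # interval sits inside the invariant interval
print(inv, wide, narrow)
```

Here the wider interval is flagged as sensitive and the narrower one as robust, even though both share the same point estimate and thresholds.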
Threshold analysis can be conducted at the study level and the contrast level. Study level threshold analysis considers the impact of imprecision in individual studies included in each intervention contrast in the network on the results of the NMA, including intervention ranking. It helps to assess the robustness of the intervention recommendation based on each study individually. Contrast level threshold analysis examines the robustness of the results from the NMA in the combined evidence for each intervention contrast in the network. That is, assuming that direct evidence for the contrast is present in the network, we assess the impact of imprecision in the combined evidence for that particular contrast on the results from the NMA. Contrast level analysis is more useful in guideline development as the robustness of the entire body of evidence is considered, rather than just the individual studies [3]. For the full algebraic breakdown of both study and contrast level threshold analyses, refer to Phillippo et al [3].
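The full derivation is given in [3]. As a rough sketch of the idea only, suppose that adjusting a single data point by some amount shifts the posterior mean effect of each intervention linearly, with a known influence per intervention. Under that assumption, the adjustment at which another intervention overtakes the current optimum can be solved directly. The effect estimates, influence values and function name below are hypothetical, and larger effects are assumed to be better.

```python
def thresholds_for_data_point(effects, influence):
    """Smallest negative and positive adjustments that change the optimal intervention.

    effects[k]   : posterior mean effect of intervention k vs reference (higher is better)
    influence[k] : assumed change in effects[k] per unit adjustment of the data point
    Returns (neg, pos), each a (threshold, new_optimal) pair or None for "no threshold".
    """
    best = max(effects, key=effects.get)
    neg, pos = None, None
    for k in effects:
        if k == best:
            continue
        slope = influence[k] - influence[best]
        if slope == 0:
            continue  # adjusting this data point never lets k overtake the optimum
        # Adjustment u at which k and the current optimum have equal effects
        u = (effects[best] - effects[k]) / slope
        if u > 0 and (pos is None or u < pos[0]):
            pos = (u, k)
        elif u < 0 and (neg is None or u > neg[0]):
            neg = (u, k)
    return neg, pos

# Hypothetical values for a three-intervention network, for illustration only
effects = {1: 0.0, 2: 0.8, 3: 1.0}
influence = {1: 0.0, 2: 0.5, 3: -0.25}
neg, pos = thresholds_for_data_point(effects, influence)
print("negative threshold:", neg)
print("positive threshold:", pos)
```

In this toy network no negative adjustment changes the decision, so the lower end of the invariant interval would be reported as NT, while a modest positive adjustment makes intervention 2 optimal.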

Application
We adapted the threshold analysis to allow for the modelling of a random-effects NMA with a binary outcome. We applied it to a published NMA of studies evaluating the effectiveness of several interventions aiming to increase the uptake of poison prevention behaviours in homes with children under 5 [7]. The data were obtained from primary studies identified in two systematic reviews [8]. We replicated the published NMA using a random-effects NMA with a binary outcome: the uptake of poison prevention behaviours within a household. A variety of outcomes were considered in the original NMAs. However, in this paper, we focus on interventions to promote the safe storage of other household products. The data were obtained from 15 studies assessing the effectiveness of 7 different interventions. The studies included 10 RCTs, two NRCTs, two cluster RCTs and one cluster NRCT. As with any NMA, the quality of the studies was assessed before inclusion in the NMA. Table 1 includes the assessment of each study's quality as reported [7].
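For reference, a random-effects NMA with a binary outcome is commonly written in the following standard form. This is a generic sketch rather than the exact specification fitted here: the notation is assumed, and prior distributions and the correlation between effects in multi-arm studies are omitted.

```latex
\begin{align*}
r_{ik} &\sim \mathrm{Binomial}(n_{ik},\, p_{ik}), \\
\operatorname{logit}(p_{ik}) &=
  \begin{cases}
    \mu_i, & k = b, \\
    \mu_i + \delta_{i,bk}, & k \neq b,
  \end{cases} \\
\delta_{i,bk} &\sim \mathrm{N}\!\left(d_k - d_b,\, \tau^2\right), \qquad d_1 = 0.
\end{align*}
```

Here $r_{ik}$ is the number of households with the behaviour out of $n_{ik}$ in arm $k$ of study $i$, $b$ is the baseline arm of study $i$, $\mu_i$ is the study-specific baseline log odds, $d_k$ is the log odds ratio of intervention $k$ relative to the reference intervention, and $\tau$ is the between-study standard deviation.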
The interventions compared across these studies were:

The upper triangle contains the results from the NMA, and the lower triangle contains the results from a random-effects pairwise meta-analysis.
The network plot showing the comparisons between interventions evidenced in these studies can be seen in Figure 1.

Results
Network meta-analysis (NMA)
The results from the NMA can be seen in Table 2, listing the relative effects of all interventions present in the network. The relative effectiveness of the interventions is presented as odds ratios (ORs) with 95% credible intervals. From Table 2, we can see that most interventions are more effective than usual care at increasing the uptake of the poison prevention behaviours for the safe storage of other household items, apart from the free/low-cost equipment intervention. Using the results of the NMA, we ranked the interventions according to which was the most effective at increasing the uptake of the poison prevention measures in the home. The results from the rankings can be seen in Table 3.

The last column includes the number of households with safe storage out of the total number of households.

Abbreviations:
A = adequate allocation concealment; B = blinded outcome assessment; C = the prevalence of confounders does not differ by more than 10% between treatment arms; CBA = controlled before-and-after study; F = at least 80% of participants followed up in each arm; NMA = network meta-analysis; RCT = randomised clinical trial; U = unclear; Y = yes.

From Table 3, we can see that the intervention with the highest probability of being the most effective is education + free/low-cost equipment + fitting + home safety inspection (E + FE + F + HSI), which is the most intensive intervention. This intervention was ranked highest, along with education + free/low-cost equipment + fitting (E + FE + F). The least effective interventions were usual care and free/low-cost equipment only. There was overlap between the 95% credible intervals for the rankings of all the interventions, indicating that no single intervention is clearly the best or the worst.
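The probability of each intervention being the best is usually computed as the proportion of posterior draws in which that intervention has the largest effect. The sketch below shows the calculation on simulated draws that stand in for MCMC output; the intervention labels, means and standard deviations are hypothetical and are not the results of this analysis.

```python
import random

def probability_best(samples):
    """Proportion of posterior draws in which each intervention has the largest effect."""
    names = list(samples)
    n_draws = len(samples[names[0]])
    wins = {name: 0 for name in names}
    for i in range(n_draws):
        best = max(names, key=lambda name: samples[name][i])
        wins[best] += 1
    return {name: wins[name] / n_draws for name in names}

# Simulated draws standing in for MCMC output; means and SDs are hypothetical
random.seed(1)
n = 5000
posterior = {
    "Reference": [0.0] * n,  # reference intervention, log OR fixed at 0
    "Intervention A": [random.gauss(1.0, 0.6) for _ in range(n)],
    "Intervention B": [random.gauss(1.2, 0.8) for _ in range(n)],
}
p_best = probability_best(posterior)
for name, p in p_best.items():
    print(f"{name}: P(best) = {p:.3f}")
```

When the posterior distributions overlap substantially, as in this toy example, the probability of being best is shared across interventions and no single one dominates.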

Study level threshold analysis
Figure 2 presents the results of the study level threshold analysis. The studies are sorted by the size of their thresholds, smallest first, with the intervention contrast for each study given in brackets. Studies in bold are those in which the 95% confidence interval for the effect estimate extends beyond the invariant interval; in these cases, the invariant interval is coloured red rather than light blue. The optimal intervention for this NMA is intervention 6. The new optimal intervention for each study is displayed on either side of the invariant interval. NT indicates "No threshold", meaning that no amount of adjustment in that direction would change the optimal intervention from intervention 6.
We can see that of the 15 studies included in the network meta-analysis, 7 had 95% confidence intervals extending beyond the invariant interval (indicated in bold). This demonstrates that the intervention recommendations are sensitive to the amount of imprecision in the estimates from studies 6, 7, 9, 10, 12, 14, and 15. For example, for study 15, which compared interventions 4 and 6, the estimated log OR of 0.04 had an invariant interval of (0.00, NT). This indicates that a change of -0.04 in the log OR would change the optimal intervention recommendation from intervention 6 to intervention 4, whereas no amount of change in the positive direction would change the optimal intervention recommendation. For study 10, which compared interventions 1 and 4, the estimated log OR of 2.76 had an invariant interval of (2.19, 50.88). This illustrates that a change in the log OR of -0.57 is substantial enough to change the intervention recommendation from intervention 7 to intervention 3.
Therefore, a change in the log odds ratio of 0.82 would change the intervention recommendation to intervention 3 being optimal rather than intervention 6. However, for studies 6 and 12, the upper limits of the invariant intervals lie very close to the upper limits of the 95% confidence intervals. For the remaining 8 studies, the respective 95% confidence intervals fall within the invariant intervals, which indicates that the intervention recommendation is robust to the level of imprecision in these study estimates, as the bias adjustment thresholds are large.
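The changes quoted for studies 15 and 10 follow directly from the reported estimates and the lower bounds of their invariant intervals. The short check below reproduces that arithmetic and, as an added convenience not reported in the study, expresses each change as a multiplicative factor on the odds ratio scale. The function name is hypothetical.

```python
import math

def threshold_to_bound(log_or, bound):
    """Signed change in the log OR needed to reach an invariant interval bound."""
    return bound - log_or

# Reported values: study -> (estimated log OR, lower bound of invariant interval)
reported = {"Study 15": (0.04, 0.00), "Study 10": (2.76, 2.19)}

changes = {}
for study, (log_or, lower_bound) in reported.items():
    change = threshold_to_bound(log_or, lower_bound)
    changes[study] = change
    # An additive change in the log OR is a multiplicative factor on the OR scale
    print(f"{study}: change in log OR = {change:.2f}, "
          f"OR multiplied by {math.exp(change):.2f}")
```

Reading the thresholds on the odds ratio scale can make them easier to judge against the size of bias that might plausibly affect a study.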
Contrast level threshold analysis
It is important to note that when only one study observes a particular contrast in the network, the results of the threshold analyses at study level and contrast level must be consistent. From Figure 1, there are two two-arm studies in the network, which are single studies for comparisons 7 vs 3 and 6 vs 4. From Figure 3, we can see that the thresholds for the contrast 6 vs 4 are identical to those corresponding to study 15 in the study level analysis (as seen in Figure 2), as expected. However, we can see that the 95% credible interval for the effect estimate is wider in the contrast level analysis than the 95% confidence interval in the study level analysis. This is because the combined NMA result is less precise than the study estimate, owing to the large level of heterogeneity in the NMA. For the 7 vs 3 contrast, by comparison, both the effect estimates and thresholds are different at the study level and the contrast level. Despite the quantitative differences between the study level and the contrast level analyses for this comparison, the results for this particular contrast/study are consistent qualitatively. There is a lot of uncertainty around the effect estimate for this contrast/study, and the upper threshold (in favour of intervention 7) lies well within the confidence interval at study level and the credible interval at contrast level.

Discussion
The network meta-analysis identified the most intensive intervention as the most effective at increasing the uptake of poison prevention behaviours for the safe storage of other household products in homes with children under 5. The usual care and free/low-cost equipment interventions were identified as the least effective interventions. The results from the NMA and both the study and contrast level threshold analyses indicated that no distinct intervention could be recommended as the optimal intervention. This is illustrated by the credible intervals for the effect estimates from the NMA and the overlapping intervention rankings. In the threshold analysis, this was reflected in the small thresholds identified in the analyses, which meant that a small change in the evidence would result in an alternative intervention being most effective. Furthermore, the intervention recommendation from the NMA was not robust, as the effectiveness estimate was sensitive to the level of imprecision in the evidence and potential bias.
As recommended by Phillippo et al [3], any studies with reasonably small thresholds need to be assessed for risk of bias by using the tools discussed previously. From the threshold analysis, there were 3 studies with thresholds less than 0.5; these were studies 6, 7 and 15. Referring to the study quality assessment in Table 1, these studies did not appear to be particularly at risk of bias and did not have any major issues with their quality.
A limitation of this work is that threshold analysis has only been applied to one example of a public health intervention evidence synthesis. A further limitation is that the published NMA example only had a small number of studies, with little evidence for many of the contrasts. In addition, there was no distinct or clear intervention recommendation from the NMA, as the credible intervals for all effect estimates contained 1 and the rankings overlapped. However, this example still illustrates the use of NMA and threshold analysis in the context of public health.
Threshold analysis allows researchers to identify and quantify the robustness of intervention recommendations from NMAs to any potential bias in the evidence. The use of this method helps researchers and policy makers to judge how far their results from NMAs are robust to changes in the evidence that might be due to bias. It is important to note that threshold analysis does not investigate the presence or absence of any particular bias and does not make any assumptions on the type and source of the bias. Threshold analysis is more concerned with the implications that any bias, if present, would have on the intervention recommendations and resulting decisions.
There still should be some careful consideration when applying complex evidence synthesis methods to highly heterogeneous data, as this method is not a way to fix the issues that arise. The primary consideration with heterogeneity is that we should account for it appropriately rather than avoid complex analyses due to the arising issues. Heterogeneity is inevitable, especially in public health intervention appraisals. The use of advanced methods for evidence synthesis, including the appropriate account of the heterogeneity, can lead to more detailed and robust conclusions, which will improve research and aid the decision-making process.
Threshold analysis could be extended to incorporate GRADE judgements in the analyses, as seen in the paper by Holper 2019 [9]. The use of GRADE judgements alongside threshold analysis offers a qualitative judgement as well as quantitative. Threshold analysis could also be incorporated into a cost-effectiveness analysis to consider the robustness of decisions on the cost-effectiveness of interventions.
A further application of threshold analysis could be to component network meta-analysis. Component network meta-analysis expands on the NMA framework and splits the interventions into components to consider which combination of components is most effective. The interventions in the NMA assessed in this example consist of several components, for example, education, fitting, and home safety inspection, so it could be more appropriate to explore which combinations of these components, not just the ones observed, are most effective. In addition, in recent literature, threshold analysis has been applied to continuous and binary outcomes. These methods could be extended to look at other possible outcomes.

Conclusion
Applying threshold analysis to a NMA of public health interventions, we have highlighted that the intervention recommendation for the network was not robust, as the decision was sensitive to the imprecision in the effect estimate and to possible bias. We have illustrated that threshold analysis gives an insight into the effects of changes in the evidence on the resulting intervention decisions. The application of threshold analysis should ease any hesitancy to use complex evidence synthesis methods, such as NMA, in public health intervention appraisals. The increase in the use of such methods in public health intervention appraisals can improve the standard of the evaluation of interventions and, consequently, the decision-making process, with benefit to policy-makers and the public.

Figure 2
Study level forest plot for the safe storage of other household products outcome.

Figure 3
Contrast level threshold analysis for safe storage of other household products outcome.