A systematic review and classification of information-sharing methods

Abstract

Background: Sparse relative effectiveness evidence is a frequent problem in Health Technology Assessment (HTA). Where evidence directly pertaining to the decision problem is sparse, it may be feasible to expand the evidence-base to include studies that relate to the decision problem only indirectly: for instance, when there is no evidence on a comparator, evidence on other treatments of the same molecular class could be used; similarly, a decision on children may borrow strength from evidence on adults. Usually, in HTA, such indirect evidence is either included by ignoring any differences ('lumping') or not included at all ('splitting'). However, a range of more sophisticated methods exists, primarily in the biostatistics literature. The objective of this study is to identify and classify the breadth of the available information-sharing methods. Methods: Forwards and backwards citation-mining techniques were used on a set of seminal papers on the topic of information-sharing. Papers were included if they specified (network) meta-analytic methods for combining information from distinct populations, interventions, outcomes or study-designs. Results: Overall, 89 papers were included. A plethora of evidence synthesis methods have been used for information-sharing. Most papers (n = 78) described methods that shared information on relative treatment effects. Amongst these, there was a strong emphasis on methods for information-sharing across multiple outcomes (n = 39) and treatments (n = 23), with fewer papers focusing on study-designs (n = 10) or populations (n = 6). We categorise and discuss the methods under four 'core' relationships of information-sharing: functional, exchangeability-based, prior-based and multivariate relationships, and explain the assumptions made within each of these core approaches. Conclusions: This study highlights the range of information-sharing methods available. These methods often impose more moderate assumptions than lumping or splitting. Hence, the degree of information-sharing that they impose could potentially be considered more appropriate. Our identification of four 'core' methods of information-sharing allows for an improved understanding of the assumptions underpinning the different methods. Further research is required to understand how the methods differ in terms of the strength of sharing they impose and the implications of this for health care decisions.


Background
Health Technology Assessment (HTA) is the systematic evaluation of the properties, effects and impact of health technologies with a view to informing decision-making in health care [1]. Regardless of whether or not a system functions under explicit budget constraints, resources spent could always have been used for alternative purposes. Therefore, policy-makers are always faced with difficult decisions about whether interventions should be funded. This requires an assessment of whether the benefits of an intervention are sufficient to justify the health opportunity costs of funding it [2]. It follows that a set of tools ought to be used so that policy-makers can rationally and transparently decide about the adoption of a given health technology [3].

*Correspondence: georgios.nikolaidis@york.ac.uk. The University of York, Centre for Health Economics, Alcuin A Block, Heslington, YO10 5DD York, UK. Full list of author information is available at the end of the article.
Decision analysis provides a quantitative framework that brings together all relevant evidence on the impact of an intervention on health outcomes and costs, whilst making explicit judgements about how different types and sources of evidence are linked together (model structure) and which elements are relevant to decision-making (reflecting social values). The outputs of a Decision Analytic Model (DAM) include incremental costs and benefits and can be useful for decision-makers [4].
Each input within a DAM is a parameter and constitutes a potential research question that can be informed by evidence which is typically identified using literature reviews. To assist study selection when identifying evidence for reviews, research questions are defined using the PICOS framework, where P stands for Population, I for Intervention, C for Comparator, O for Outcome, and S for Study-design [5]. Typically, reviewers exclude studies deviating from the inclusion criteria on any PICOS dimension; that is, they usually only include studies providing direct evidence. Hence, direct evidence on relative effectiveness comprises one or more randomised studies, evaluating the intervention(s) under assessment, recruiting patients from the population of interest, and measuring effects on all relevant outcomes.
Where multiple studies exist to inform the same parameter, these can be synthesised to generate a single estimate that represents the evidence-base. To synthesise the evidence base and provide DAMs with relative effectiveness inputs, standard Meta-Analysis (MA) and Network Meta-Analysis (NMA) methods [6,7] are commonly used. Although synthesis is more common for Relative Treatment Effects (RTEs), evidence synthesis methods can also be applied for other DAM inputs such as costs and Quality of Life (QoL).
However, in HTA, direct evidence may be sparse, heterogeneous, or limited in other ways and synthesis may become problematic. Where evidence is sparse, it may not be possible to obtain the required Relative Treatment Effect (RTE) estimates, and even when they can be obtained, they may be highly uncertain and may not be robust due to assumptions imposed in the analysis [8,9]. Evidence sparsity may also prevent appropriate exploration of heterogeneity because small studies are at higher risk of enrolling unrepresentative populations [10] and provide less evidence to enable robust subgroup analyses.
A policy relevant alternative to limited or sparse data may be to extend the evidence base beyond the direct evidence. A topical example concerns paediatric indications, for which the evidence-base is typically sparse due to the regulatory restrictions on trials. To support decision-making for this population, the U.S. Food and Drug Administration (FDA) [11] and the European Medicines Agency (EMA) now propose that "The evidence needed to address the research questions that are important for marketing authorisation of a given product in the target population might be modified based on what is known for other population" [12]. Whilst in the aforementioned example the evidence is extended to consider another population, in principle, indirect evidence may relate to any other dimension of PICOS (Figure 1): it may include studies assessing a different, but related, treatment or pertaining to a different study-design than that specified in the research question. Note that, in this context, NMA also considers indirect evidence pertaining to other treatment comparisons, i.e. indirect evidence on the 'Intervention' PICOS dimension, to inform the treatment effect(s) of primary interest [13].
[INSERT FIGURE 1 HERE]

Within a decision-making context, the use of indirect evidence, as long as it is judged relevant, contributes to accountability by allowing all relevant evidence to be considered. Combining all relevant sources of evidence may yield more precise estimates than the direct evidence alone and allow better characterisation of heterogeneity and uncertainty. However, when the indirect evidence is not sufficiently relevant or of high quality, its use may also introduce bias and inflate heterogeneity estimates.
The use of indirect evidence to support decision-making is not exclusive to the aforementioned regulatory context and has permeated HTA processes. Examples can be found in Technology Appraisals (TAs) conducted by the National Institute for Health and Care Excellence (NICE) to inform routine use of technologies in the National Health Service (NHS) in England and Wales. For instance, TA445 [14] considered adult studies to complement a sparse paediatric evidence base. Also, relative effectiveness has been generalised across subgroups of different Hepatitis C genotypes [15]. These two examples use indirect evidence by considering both sources perfectly generalisable ('lumping'), as an alternative to considering them completely independent ('splitting'). There are, however, examples of appraisals which use indirect evidence in more sophisticated ways. For instance, TA383 [16] used indirect evidence across interventions by assuming a 'class-effect' between treatments that function through the same molecular pathway. TA139 [17] and TA168 [18] simultaneously modelled two outcomes, leveraging their correlation structure, and TA244 [19] modelled a network of interventions with multiple treatment components, assuming that the relative effect of an intervention is the sum of the relative effects of its comprising components.
Inevitably, a judgement on whether the indirect evidence is relevant is always required. However, what is often not made explicit is that, where both direct and indirect evidence are considered, there should be appropriate consideration of the extent of information-sharing permitted by different synthesis methods (i.e. the extent to which the indirect evidence is allowed to affect the estimates obtained by using only the direct evidence).
The objective of this review is to identify information-sharing evidence synthesis methods that have been used in the literature and to improve understanding of these methods by making explicit the fundamental assumptions underpinning them. We do so by identifying the 'core' relationships used to share information. This review increases awareness of the breadth of available information-sharing methods and aids transparency in the choice of information-sharing method. To our knowledge, this topic has not been explored in the past with a clear policy focus.

Methods
We scoped the literature to inform the design and conduct of our systematic review. The aims of our scoping process were to clarify working definitions, determine inclusion and exclusion criteria, and understand whether the most suitable way of systematically searching the literature is using keyword-based or 'citation-mining' methods. Details on the scoping process are provided in Additional file 1. During our scoping process, we found that the literature lacked consistent terminology when referring to methods that combined direct and indirect evidence. Therefore, for our systematic review, instead of keyword-based search methods [20], we used citation-mining methods [21] which are efficient [22] and have been used for similar reviews [23]. Briefly, the citation-mining process comprised two steps: first, a list of seminal/influential papers was compiled after scoping the literature and consulting with two external evidence synthesis experts. Seminal papers were selected to represent a variety of fields including MA, NMA, multi-parameter evidence synthesis, and the incorporation of evidence on historical controls in trial-design. Second, the citation-mining review was conducted using the final list of seminal papers.
The citations of 7 seminal papers [8, 24-29] were searched in the Web of Science (WoS) on 20-Feb-2019. Subsequently, articles were identified that cited (forwards citation-mining) or were cited by (backwards citation-mining) the seminal papers. Articles were included if they mathematically specified MA or NMA models that combine information pertaining to multiple populations, interventions, outcomes or study-designs, or utilise evidence from an external source such as previous meta-analyses. Importantly, papers that used only standard NMA methods were excluded, even though they share information across treatment comparisons, because such methods are well established in the literature. Further information on the search strategy and inclusion and exclusion criteria is provided in Additional file 1.
From each included paper, the synthesis model was isolated and, from within it, methods facilitating information-sharing were extracted. Methods were subsequently categorised according to the 'core' relationship that they used to enable information-sharing. When papers tackled multiple synthesis challenges simultaneously (e.g. [30-32]), the issues they dealt with were isolated along with the method used to address each. The PICOS dimension of indirectness was also extracted. The search was conducted in Zotero version 5.0.69. The PRISMA checklist for systematic reviews is provided in Additional file 2.

Results

Characteristics of the included studies
The review identified 89 papers (Figure 2). The majority (n = 78) described methods that shared information on relative treatment effects. Other studies used methods to share information on comparison-specific meta-regression slopes (n = 4), comparison-specific between-studies heterogeneities (n = 6), or study-specific baselines (n = 2). Overall, there was a balance between papers that developed methods within MA (n = 45) and NMA (n = 44). There was a strong emphasis on methods for information-sharing across multiple outcomes (39 papers) and treatments (23 papers), with fewer papers focusing on study-designs (10 papers) or populations (6 papers) (Table 1). Note that some papers described methods sharing information on several types of parameters and across more than one PICOS dimension (e.g. [30-32]). A full list of the included papers, along with a description of how information was shared within each paper, can be found in Additional file 3.
[INSERT FIGURE 3 HERE]

Table 1 classifies papers according to the 'core' method used and the PICOS dimension of indirectness. It shows that some 'core' relationships are preferred when information is shared across specific PICOS dimensions. For instance, most of the identified papers sharing across interventions use either functional or exchangeability-based relationships, and no example using priors was found. Also, papers that use multivariate relationships do so to share information across related outcomes, not across populations or study-designs. This may be partly because the information required to implement multivariate methods for multiple populations or study-designs is usually unavailable in the literature. For instance, to synthesise evidence on multiple populations using multivariate methods, we would need studies that enrol all relevant populations and report separately for each, and such information is rarely provided.

Functional relationships
The simplest functional relationship is lumping (i.e. common effects) where all data points inform a single parameter independently of whether the evidence is direct or indirect. Examples include pooling RTEs across time-points [32], (sub-)populations [14,35], or interventions of the same treatment class [30,32,39,45], as well as pooling between-trial heterogeneity parameters [45,100,102,103] or meta-regression slopes [36,99,101].
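As a minimal sketch (using hypothetical log-odds-ratio estimates and standard errors, not taken from any of the included papers), lumping amounts to inverse-variance pooling of every available estimate, direct or indirect, into a single common effect:

```python
import math

def pool_common_effect(estimates, std_errors):
    """Inverse-variance ('common effect') pooling: every data point,
    direct or indirect, is assumed to inform one shared parameter."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical estimates: two 'direct' studies lumped with one
# 'indirect' study (e.g. an adult study added to paediatric evidence).
est, se = pool_common_effect([-0.5, -0.3, -0.6], [0.2, 0.25, 0.3])
```

The indirect study contributes to the pooled estimate with full weight, which is exactly what makes lumping the strongest possible form of sharing.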
Another type of functional relationship is a constraint, where a strict inequality is imposed among parameters. In a Bayesian framework, information-sharing is facilitated by preventing simulation samples that do not conform to the specified constraint. Such methods have been used to relate RTEs across dosages, expressing that higher dosages are expected to exhibit larger RTEs [39,40], to describe structurally-related outcomes [67], and to specify second-order consistency equations that impose a triangle inequality on the comparison-specific between-trial variances [102,103].
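The rejection mechanism can be sketched with simple Monte Carlo sampling (a stand-in for how an MCMC sampler would discard non-conforming draws; the dose effects and standard errors below are hypothetical):

```python
import random

random.seed(1)

def constrained_draws(mu_low, se_low, mu_high, se_high, n=10_000):
    """Monte Carlo sketch of a monotonic-dose constraint: draw the two
    relative effects independently, then discard any joint sample in
    which the higher dose does not exceed the lower dose."""
    kept = []
    for _ in range(n):
        d_low = random.gauss(mu_low, se_low)
        d_high = random.gauss(mu_high, se_high)
        if d_high >= d_low:          # the constraint does the sharing
            kept.append((d_low, d_high))
    return kept

# Hypothetical overlapping dose effects; the constraint pulls them apart.
samples = constrained_draws(0.4, 0.2, 0.5, 0.2)
mean_low = sum(s[0] for s in samples) / len(samples)
mean_high = sum(s[1] for s in samples) / len(samples)
```

Even though the two effects were drawn independently, conditioning on the inequality makes each posterior depend on the other, which is the information-sharing at work.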
Meta-regression-type methods have also been suggested. In the examples found, the relationships were usually linear (on the modelling scale), with one RTE component independent of, and another RTE component dependent on, a particular study characteristic. The most common example in this category is bias-adjustment methods, primarily used to synthesise studies of different designs. Bias-adjustment methods broadly fall into two categories: general frameworks that adjust the RTE for biases affecting internal and external validity, provided that the extent of bias can be either estimated from empirical evidence or elicited from experts [57,58,65,66], and approaches that adjust for bias due to particular study-level characteristics considered proxies for study quality, such as study size [47, 61-64], publication year [106], or risk-of-bias [59,60]. Meta-regression-type relationships have also been used for complex interventions. In their simplest form, they model the RTE of a complex intervention as the sum of the RTEs of its treatment components [30,45,49,50]. More sophisticated approaches allow for synergistic or antagonistic relationships by suggesting functions that also contain treatment-interaction RTE components [48]. Other applications include approaches that model the RTEs for two survival outcomes (e.g. time-to-mortality and time-to-progression) by assuming that they only differ by a constant component which is invariant across treatment comparisons [31], models that assume a linear relationship between dosage and RTE [39,44], methods for baseline-risk adjustment [36], and models that relate the relative effects of population subgroups of differing disease severity [35].
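The simplest bias-adjustment meta-regression can be sketched as a weighted regression of each study's RTE on an indicator flagging indirect (e.g. non-randomised) studies; the estimates, standard errors and indicator values below are purely illustrative:

```python
def bias_adjusted_effect(estimates, std_errors, is_indirect):
    """Meta-regression sketch: y_i = d + beta * x_i, where x_i flags
    indirect studies; d is the bias-adjusted RTE and beta the
    estimated bias. Closed-form weighted least squares with
    inverse-variance weights."""
    w = [1.0 / se**2 for se in std_errors]
    x = [float(z) for z in is_indirect]
    y = estimates
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    beta = (sw * swxy - swx * swy) / (sw * swxx - swx**2)  # bias term
    d = (swy - beta * swx) / sw                            # adjusted RTE
    return d, beta

# Two hypothetical randomised (x = 0) and two non-randomised (x = 1) studies.
d_adj, bias = bias_adjusted_effect([-0.5, -0.4, -0.1, -0.2], [0.2] * 4, [0, 0, 1, 1])
```

The indirect studies still inform residual variability, while the RTE of interest, d, is anchored to the unbiased (randomised) design.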
Finally, more complex, non-linear, relationships have also been presented in the literature, namely those enabling the synthesis of RTEs across a range of dosages using the Emax model [38,41,42] commonly employed in pharmacokinetics or other non-linear models [39] and those enabling the sharing of information across follow-up periods [68,69].
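The Emax relationship can be written down in a few lines; the parameter values below are hypothetical and only illustrate how a handful of shared parameters tie together the RTEs at every dose:

```python
def emax_effect(dose, e0, emax, ed50):
    """Emax dose-response: the effect rises from e0 toward e0 + emax,
    reaching half of the maximum gain at dose == ed50."""
    return e0 + emax * dose / (ed50 + dose)

# Hypothetical parameters: instead of one free RTE per dose, all dose
# effects are functions of just three shared parameters.
low, mid, high = (emax_effect(d, 0.0, 1.0, 10.0) for d in (5, 10, 40))
```

Because every observed dose contributes to estimating (e0, emax, ed50), evidence at one dose sharpens the predicted effect at all others.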

Exchangeability-based relationships
The simplest exchangeability-based relationship uses a random effect to relate a set of parameters, in this way accounting for heterogeneity without explicitly modelling its source(s). The random effect assumes that all parameters are drawn from a common distribution, implying that individual parameters are shrunken towards the random-effect mean; this can happen to a greater or lesser extent, depending on the precision of each individual estimate and its discrepancy from the random-effect mean. Examples of parameters to which random effects have been applied include: comparison-specific meta-regression slopes [36,47,61,99,101], comparison-specific between-trial variances [102,103], and study-specific baseline-risks [36,37].
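The shrinkage mechanism can be sketched in closed form for normal estimates, treating the between-studies standard deviation tau as known for simplicity (in practice it would be estimated); the numbers are hypothetical:

```python
def shrink_to_mean(estimates, std_errors, tau):
    """Exchangeability sketch: each study effect is shrunk toward the
    random-effect mean, more strongly when its standard error is large
    relative to the between-studies standard deviation tau."""
    w = [1.0 / (se**2 + tau**2) for se in std_errors]
    mu = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    shrunk = []
    for y, se in zip(estimates, std_errors):
        b = tau**2 / (tau**2 + se**2)      # shrinkage factor in [0, 1]
        shrunk.append(b * y + (1 - b) * mu)
    return mu, shrunk

# Two hypothetical discrepant studies are pulled toward their common mean.
mu, shrunk = shrink_to_mean([-0.8, -0.2], [0.3, 0.3], tau=0.2)
```

As tau grows, the shrinkage factor approaches 1 and each study keeps its own estimate (approaching splitting); as tau shrinks to 0, all estimates collapse onto mu (approaching lumping).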
Random-walks are another form of exchangeability relationship. They assume that data points which are more similar with respect to a particular characteristic are expected to exhibit more similar RTEs. Examples include approaches assuming that the RTE of a particular dosage or follow-up period is drawn from a distribution centred around the RTE of its adjacently lower or higher dosage [39] or follow-up period [43,68].
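A random-walk prior for ordered doses can be sketched by forward simulation (the anchor effect and step size below are hypothetical); the key property is that prior uncertainty grows with distance from the anchor, so adjacent doses share more information than distant ones:

```python
import random

random.seed(7)

def random_walk_prior(n_doses, base_effect, step_sd):
    """Random-walk sketch: each dose's RTE is drawn from a distribution
    centred on the RTE of the adjacent lower dose."""
    effects = [base_effect]
    for _ in range(n_doses - 1):
        effects.append(random.gauss(effects[-1], step_sd))
    return effects

draws = [random_walk_prior(5, 0.0, 0.1) for _ in range(5000)]
# Prior variance grows with distance from the anchor dose.
var_dose2 = sum(d[1]**2 for d in draws) / len(draws)
var_dose5 = sum(d[4]**2 for d in draws) / len(draws)
```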
Multi-level models also use exchangeability, but apply it to the hierarchical/clustered structure of the available data. As such, exchangeability is applied at a first level within specific groups of parameters (i.e. multiple random effects are applied, each within a group of RTEs from studies sharing a particular characteristic) and at a second level across the group-specific hyper-parameters. This is shown in Figure 4, where, at the bottom level, studies are categorised according to a characteristic and a different random effect is imposed within every category, producing group-specific basic parameters and heterogeneities. Subsequently, at the top level, exchangeability is also assumed across the group-specific basic parameters, which are shrunk towards an overall, global, group-independent hyper-mean. Examples include 'class-effects' models where, on top of the classical Random-Effects (RE) NMA models, the basic parameters of treatments that function through the same mechanism are assumed to be drawn from a common distribution with an overall 'class' mean and an across-treatments, within-class heterogeneity [32,35,39,40,44,46]. Class-effect approaches have also been imposed across comparison-specific meta-regression slopes [47,99]. Multi-level models have been suggested to combine adult and paediatric evidence [34], RTEs measured at different time-points [30], and studies of different designs [51,52,54,105].

[INSERT FIGURE 4 HERE]
Prior-based relationships

Direct and indirect evidence can also be combined through the use of prior distributions. The process usually consists of two steps: initially, the indirect evidence is analysed and, subsequently, the resulting distribution is used as a prior in the analysis of the direct evidence. Of note is that this approach is mathematically equivalent to lumping, which was described under functional relationships. Examples include the combination of adult and paediatric evidence [34] or randomised and non-randomised evidence [51,52,55,56,105]. The prior can additionally be adjusted for bias or have its precision decreased [51]. Alternative ways to define the prior include the use of meta-epidemiological evidence or expert elicitation. The former has been used primarily for bias-adjustment [60], whilst both the former [24,96,97] and the latter [107] have been used to define a prior distribution for the between-trials heterogeneity.
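Under normal approximations the two-step process reduces to a conjugate update, which also makes the equivalence with lumping visible (the closed form is the same inverse-variance pooling); the means and standard deviations below are hypothetical:

```python
def normal_posterior(prior_mean, prior_sd, data_mean, data_sd):
    """Conjugate normal update: combine a prior (here, the analysed
    indirect evidence) with the direct evidence in closed form."""
    w_prior = 1.0 / prior_sd**2
    w_data = 1.0 / data_sd**2
    post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    post_sd = (w_prior + w_data) ** -0.5
    return post_mean, post_sd

# Step 1: analyse the indirect (e.g. adult) evidence -> N(-0.4, 0.15).
# Step 2: use that distribution as the prior for the sparse direct
# (e.g. paediatric) estimate, N(-0.2, 0.30).
m, s = normal_posterior(-0.4, 0.15, -0.2, 0.30)
```

The posterior is more precise than either source alone, and its mean sits between them, weighted by precision.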
More nuanced prior-based approaches, such as mixtures of priors, have also been used. Here, the informative prior (the distribution representing the indirect evidence) is not used at face value, but instead mixed with a vague prior according to weights that may be specified by the analyst or estimated within the synthesis model. The resulting prior is typically heavy-tailed and allows for 'adaptive' information-sharing, whereby information-sharing is stronger when the direct and indirect evidence are in agreement and weaker when they conflict [33]. Mixtures of priors have been used to combine evidence on RTEs and between-studies heterogeneity across adults and children [33] and to analyse the study-specific baseline parameters from studies that enrol populations with different baseline risks [36]. The use of mixtures of priors has also been discussed for the synthesis of randomised and non-randomised evidence [51].
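The adaptive behaviour can be sketched for a two-component normal mixture: the posterior re-weights the components by how well each predicts the direct estimate. All numerical values (the vague component centred at 0, the prior weights, the estimates) are illustrative assumptions:

```python
import math

def mixture_posterior(y, se, info_mean, info_sd, vague_sd, w_info=0.5):
    """Normal mixture-prior sketch: returns the posterior mean and the
    posterior weight on the informative component, which measures how
    much information-sharing the data permit."""
    def marg_pdf(mean, sd):                 # marginal density of y
        v = sd**2 + se**2
        return math.exp(-(y - mean)**2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    def comp_post_mean(mean, sd):           # conjugate update per component
        wp, wd = 1 / sd**2, 1 / se**2
        return (wp * mean + wd * y) / (wp + wd)
    m_info, m_vague = marg_pdf(info_mean, info_sd), marg_pdf(0.0, vague_sd)
    p_info = w_info * m_info / (w_info * m_info + (1 - w_info) * m_vague)
    post_mean = (p_info * comp_post_mean(info_mean, info_sd)
                 + (1 - p_info) * comp_post_mean(0.0, vague_sd))
    return post_mean, p_info

# Agreement -> the informative component dominates; conflict -> it is
# discounted and the analysis falls back on the vague component.
_, p_agree = mixture_posterior(-0.4, 0.2, info_mean=-0.4, info_sd=0.1, vague_sd=10.0)
_, p_conflict = mixture_posterior(1.5, 0.2, info_mean=-0.4, info_sd=0.1, vague_sd=10.0)
```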
Finally, a flexible method that has been proposed is the power-prior [108]. In this method, the likelihood of the indirect evidence is raised to a power scalar a, 0 ≤ a ≤ 1, which reflects the perceived similarity between the two sources of evidence. When a = 1 the results are equivalent to 'lumping' and when a = 0 the results are identical to 'splitting'. The power parameter, a, needs to be specified, and it has been proposed that it be elicited [109] or varied in sensitivity analyses [110]. Power priors have been used to combine observational and randomised evidence [53] and for the synthesis of adult and paediatric evidence [34].
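For a normal likelihood, raising the indirect likelihood to the power a simply scales its precision by a, so the whole method has a closed form; the estimates below are hypothetical:

```python
def power_prior_posterior(direct_mean, direct_sd, indirect_mean, indirect_sd, a):
    """Normal power-prior sketch: raising the indirect likelihood to
    a in [0, 1] multiplies its precision by a, so a = 1 lumps the two
    sources and a = 0 splits them (indirect evidence discarded)."""
    if a == 0.0:
        return direct_mean, direct_sd
    w_dir = 1.0 / direct_sd**2
    w_ind = a / indirect_sd**2          # down-weighted indirect evidence
    mean = (w_dir * direct_mean + w_ind * indirect_mean) / (w_dir + w_ind)
    sd = (w_dir + w_ind) ** -0.5
    return mean, sd

# a = 0 reproduces 'splitting', a = 1 reproduces 'lumping', and
# intermediate values interpolate between the two.
split = power_prior_posterior(-0.2, 0.3, -0.5, 0.15, 0.0)
lump = power_prior_posterior(-0.2, 0.3, -0.5, 0.15, 1.0)
half = power_prior_posterior(-0.2, 0.3, -0.5, 0.15, 0.5)
```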

Multivariate relationships
Multivariate relationships have primarily been used to share information across multiple outcomes. Multivariate meta-analysis correlates the various outcomes and may separate within- and between-studies correlations [75]. At the within-study level, the study-specific correlations arise due to differences among the included patients and indicate how the outcomes co-vary across individuals within the study. For example, patients who, due to a baseline characteristic that makes their disease more severe, show high values for outcome A are also more likely to yield high values for outcome B. At the between-studies level, correlations arise mainly due to study-level differences, such as the distribution of patient-level characteristics across studies. For instance, studies that enrol more severe cases, and therefore may show high values for the mean of outcome A, are also more likely to show high values for the mean of outcome B, whilst studies enrolling less severe cases may show lower mean values for both outcomes. These models can potentially produce more precise estimates [89] and mitigate outcome reporting bias [111,112].
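The borrowing mechanism can be sketched with a bivariate normal conditional update. This is a deliberate simplification that collapses the within- and between-studies structure into a single correlation rho; all numbers are hypothetical:

```python
def conditional_outcome(mean_a, sd_a, mean_b, sd_b, rho, observed_b):
    """Bivariate-normal sketch of borrowing across outcomes: observing
    outcome B updates outcome A through their correlation rho."""
    cond_mean = mean_a + rho * (sd_a / sd_b) * (observed_b - mean_b)
    cond_sd = sd_a * (1 - rho**2) ** 0.5
    return cond_mean, cond_sd

# A strongly correlated second outcome shifts and sharpens outcome A ...
m_hi, s_hi = conditional_outcome(0.0, 0.3, 0.0, 0.3, 0.8, 0.3)
# ... whereas an uncorrelated one leaves it unchanged.
m_no, s_no = conditional_outcome(0.0, 0.3, 0.0, 0.3, 0.0, 0.3)
```

The conditional standard deviation shrinks by the factor sqrt(1 - rho^2), which is why the precision gains of multivariate methods depend directly on the strength of the correlations.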
Multivariate methods have been developed to consider two [76,77,88], three or more correlated outcomes [26,91], accommodate the simultaneous analyses of multiple treatments [78,79,81], and assess the relationship between surrogate and final outcomes [74,84]. Given that within-trial correlations are commonly not reported, authors have suggested the use of external data to inform these parameters [73] or, when external data is not available, methods that approximate the within-study co-variances [90]. Further extensions have been developed to handle missing data [86], assist the estimation of the between-studies covariance matrix when only a few studies are available [87], model the within-studies covariance structure using copulas [93], and allow modelling of heterogeneity and inconsistency using two separate variance components [85].
To accommodate cases where the within-trials correlations are unavailable and cannot be otherwise obtained, alternative methods, which require the same data as a univariate approach and do not separate within- and between-trials correlations, have been suggested for MA [83,104] and NMA [79]. Assuming that the overall correlation is not very strong, these methods perform very similarly to their counterparts that separate the two correlations, whilst preserving their benefits over the univariate approach.
Finally, some methods only account for either the within- or the between-studies correlations. For example, to model mutually exclusive outcomes, it has been suggested to account only for the within-trials negative correlations which are induced by the competing-risks structure of the data (i.e. the more patients that reach one outcome, the fewer the patients that can reach another) [80]. Also, other approaches have only modelled the between-studies covariance matrix to allow simultaneous synthesis of multiple outcomes [30,31,67,82], accommodate outcomes reported at several follow-up periods [70,71] and enable information-sharing across different treatment components of complex interventions [45].

Discussion
The aim of this review was to identify and classify evidence synthesis methods that have been used to combine evidence from sources that relate directly and indirectly to a particular research question. A wide range of methods have been developed to share information between populations, treatments, outcomes and study-designs. We found that across the breadth of methods identified, four 'core' relationships are used to facilitate information-sharing. These are functional, exchangeability-based, prior-based, and multivariate relationships and are illustrated in Figure 3.
This review highlights the breadth of method options that can facilitate information-sharing. Although particular relationships are typically used preferentially in specific information-sharing contexts, it is likely that several methods are applicable and analysts would need to choose which method is most appropriate. This paper highlights that appropriate considerations need to be made when choosing 'core' relationships and methods, as these choices are likely to influence the degree of information-sharing. Specifically, method selection may be informed by the following considerations. The first is the plausibility of the assumptions imposed by the methods in the context of interest. By classifying methods according to the 'core' relationship that enables information-sharing, we hope to facilitate a clearer discussion about the plausibility of these assumptions in the decision context of interest.
The second is the degree of information-sharing that is imposed between direct and indirect evidence. Within the literature, there is limited exploration of how much different methods borrow strength from indirect evidence, though for multivariate methods it has been noted that information-sharing is 'usually modest' [26,92] and, sometimes, instead of 'borrowing strength', multivariate methods may end up 'borrowing weakness' [113]. The few studies that have assessed the degree of information-sharing typically consider only the degree of precision gains [114], rather than also examining how the point estimate, which is also important for decision-making, changes. Further research to understand the extent to which different methods share information is warranted.
Finally, decision-makers may be interested in exploring different levels of information-sharing. One way to do so is by using prior-based methods that allow some control over the degree of information-sharing. For instance, an informative prior may use either the posterior distribution of the mean or the predictive distribution of the indirect evidence. The former is equivalent to lumping, whilst the latter imposes less information-sharing. Similarly, mixture priors can regulate the weight that is placed on the informative component, and power-priors allow a range of values to be used for a, which determines the extent of information-sharing.
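The contrast between the two prior choices can be made concrete under normal approximations: the predictive distribution for a new study inflates the posterior of the mean by the between-studies variance, so it imposes weaker sharing. The pooled mean, its standard error, and tau below are hypothetical:

```python
def candidate_priors(mean, se_mean, tau):
    """Sketch of two prior choices from the same indirect meta-analysis:
    the posterior of the mean (narrow, equivalent to lumping) versus
    the predictive distribution for a new study (its variance is
    inflated by the between-studies sd tau, imposing weaker sharing)."""
    posterior_sd = se_mean
    predictive_sd = (se_mean**2 + tau**2) ** 0.5
    return (mean, posterior_sd), (mean, predictive_sd)

# From a hypothetical indirect meta-analysis: pooled mean -0.4 with
# standard error 0.1 and between-studies sd 0.25.
post, pred = candidate_priors(-0.4, 0.1, 0.25)
```

When the between-studies heterogeneity is large, the predictive prior is much flatter than the posterior of the mean, and the direct evidence correspondingly dominates the final estimate.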
Whilst we believe that the above identification of 'core' relationships is exhaustive, the use of citation-mining techniques may have missed relevant methods, particularly those outside of health research. Additionally, we only looked for methods that shared information between sources of evidence that address different research questions. Methods such as commensurate priors, which have been used to combine individual-patient data and aggregate-level evidence on the same research question [115], could also be useful for combining evidence sources that pertain to different research questions, but were here considered outside the scope of the search.
This paper is the first to summarise and categorise the existing literature by classifying methods according to the 'core' assumption that they use to facilitate information-sharing. Further research could explore the following questions: first, how can we determine whether indirect evidence is relevant? Second, how can the appropriateness of each information-sharing method be assessed for the synthesis problem at hand? Finally, can the extent of information-sharing be quantified to assist transparent decision-making?

Conclusions
We conclude that a plethora of methods has been used to facilitate information-sharing. These can be categorised, according to the main assumption they impose, into functional, exchangeability-based, prior-based, and multivariate relationships. Despite the wide range of available methods, some are often used preferentially without ensuring that all options have been explored. Given that methods may differ in the degree of information-sharing they impose, the implication is that the chosen method may impose stronger or weaker information-sharing than what is considered appropriate by policy-makers. Further research should investigate ways of judging the appropriateness of the degree of information-sharing imposed by each method, and assess the impact of using different methods on decisions.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
All authors have provided their consent for the publication of the final manuscript.

Availability of data and materials
The list of included studies in the systematic review is available online at the following URL: https://www.zotero.org/groups/2360368/citation-mining_included-studies

Competing interests
The authors declare that they have no competing interests.

Funding
This work was funded by a doctoral studentship awarded to GFN by the Centre for Health Economics. The Centre for Health Economics did not have a role in the design of the study, the collection, analysis, and interpretation of data, or in writing the manuscript.
Authors' contributions
GFN drafted the initial manuscript, conducted the citation-mining review and coordinated contributions from all authors in drafting the final manuscript. MS oversaw the work. MS, BW, and SP participated in the study-design, revised the manuscript and approved its final version.

Table 1 A categorisation of papers that share information on the relative effectiveness parameter according to the 'core' relationship that they use and the PICOS dimension in which direct and indirect evidence differ.

Additional Files
Additional file 1 - Search strategy. A description of the inclusion and exclusion criteria of the search, as well as the number of citations for each of the seminal papers/'pearls' and the number of times each has been cited.
Additional file 2 -PRISMA checklist for systematic reviews. The PRISMA checklist for systematic reviews, indicating the page at which each characteristic of the review is described.
Additional file 3 - Brief summary of each included study. A multi-page table including a brief description of how each of the included papers shared information between direct and indirect evidence and on which PICOS dimension.