Classifying information-sharing methods: a citation-mining systematic review

Abstract

Background: Sparse relative effectiveness evidence is a frequent problem in Health Technology Assessment (HTA). Where evidence directly pertaining to the decision problem is sparse, it may be feasible to expand the evidence-base to include studies that relate to the decision problem only indirectly: for instance, when there is no evidence on a comparator, evidence on other treatments of the same molecular class could be used; similarly, a decision on children may borrow strength from evidence on adults. Usually, in HTA, such indirect evidence is either included by ignoring any differences ('lumping') or not included at all ('splitting'). However, a range of more sophisticated methods exists, primarily in the biostatistics literature. The objective of this study is to identify and classify the breadth of the available information-sharing methods.

Methods: Forwards and backwards citation-mining techniques were used on a set of seminal papers on the topic of information-sharing. Papers were included if they specified (network) meta-analytic methods for combining information from distinct populations, interventions, outcomes or study-designs.

Results: Overall, 89 papers were included. A plethora of evidence synthesis methods have been used for information-sharing. Most papers (n = 78) described methods that shared information on relative treatment effects. Amongst these, there was a strong emphasis on methods for information-sharing across multiple outcomes (n = 39) and treatments (n = 23), with fewer papers focusing on study-designs (n = 10) or populations (n = 6). We categorise and discuss the methods under four 'core' relationships of information-sharing: functional, exchangeability-based, prior-based and multivariate relationships, and explain the assumptions made within each of these core approaches.

Conclusions: This study highlights the range of information-sharing methods available. These methods often impose more moderate assumptions than lumping or splitting. Hence, the degree of information-sharing that they impose could potentially be considered more appropriate. Our identification of four 'core' methods of information-sharing allows for an improved understanding of the assumptions underpinning the different methods. Further research is required to understand how the methods differ in terms of the strength of sharing they impose and the implications of this for health care decisions.


Background
Health Technology Assessment (HTA) is the systematic evaluation of the properties, effects and impact of health technologies with a view to informing decision-making in health care [1]. Regardless of whether or not a system functions under explicit budget constraints, resources spent could have always been used for alternative purposes. Therefore, policy-makers are always faced with difficult decisions about whether interventions should be funded. This requires an assessment of whether the benefits of an intervention are sufficient to justify the health opportunity costs of funding it [2]. It follows that a set of tools ought to be used so that policy-makers can rationally and transparently decide about the adoption of a given health technology [3].

* Correspondence: georgios.nikolaidis@york.ac.uk. The University of York, Centre for Health Economics, Alcuin A Block, Heslington, YO10 5DD York, UK. Full list of author information is available at the end of the article.
Decision analysis provides a quantitative framework that brings together all relevant evidence on the impact of an intervention on health outcomes and costs, whilst making explicit judgements about how different types and sources of evidence are linked together (model structure) and which elements are relevant to decision-making (reflecting social values). The outputs of a Decision Analytic Model (DAM) include incremental costs and benefits and can be useful for decision-makers [4].
Each input within a DAM is a parameter and constitutes a potential research question that can be informed by evidence which is typically identified using literature reviews. To assist study selection when identifying evidence for reviews, research questions are defined using the PICOS framework, where P stands for Population, I for Intervention, C for Comparator, O for Outcome, and S for Study-design [5]. Typically, reviewers exclude studies deviating from any of the PICOS criteria; that is, they usually only include studies providing direct evidence. Hence, direct evidence on relative effectiveness comprises one or more randomised studies, evaluating the intervention(s) under assessment, recruiting patients from the population of interest, and measuring effects on all relevant outcomes.
Where multiple studies exist to inform the same parameter, these can be synthesised to generate a single estimate that represents the evidence-base. However, due to the availability of evidence, the need for evidence synthesis is more common for parameters like the Relative Treatment Effect (RTE) than for other DAM inputs such as costs and Quality of Life (QoL). Standard Meta-Analysis (MA) and Network Meta-Analysis (NMA) methods [6,7] are commonly used to synthesise the evidence base and provide DAMs with relative effectiveness inputs.
However, in HTA, direct evidence may be sparse, heterogeneous, or limited in other ways and synthesis may become problematic. Where evidence is sparse, it may not be possible to obtain the required RTE estimates, and even when they can be obtained, they may be highly uncertain and may not be robust due to assumptions imposed in the analysis [8,9]. Evidence sparsity may also prevent appropriate exploration of heterogeneity because small studies are at higher risk of enrolling unrepresentative populations [10].
A policy relevant alternative to limited or sparse data may be to extend the evidence base beyond the direct evidence. A topical example concerns paediatric indications, for which the evidence-base is typically sparse due to the regulatory restrictions on trials. To support decision-making for this population, the U.S. Food and Drug Administration (FDA) [11] and the European Medicines Agency (EMA) now propose that "The evidence needed to address the research questions that are important for marketing authorisation of a given product in the target population might be modified based on what is known for other population" [12]. Whilst in the aforementioned example the evidence is extended to consider another population, in principle, indirect evidence may relate to any other level of PICOS (Figure 1); it may include studies assessing a different, but related, treatment or pertaining to a different study-design than what is specified in the research question. Note that, in this context, NMAs also consider indirect evidence, pertaining to other treatment comparisons, to inform the treatment effect(s) of primary interest [13].
[INSERT FIGURE 1 HERE]

Within a decision-making context, the use of indirect evidence, as long as it is judged relevant, contributes to accountability by allowing all relevant evidence to be considered. Combining all relevant sources of evidence may yield more precise estimates than the direct evidence alone and allow better characterisation of heterogeneity and uncertainty. However, when the indirect evidence is not sufficiently relevant or of high quality, its use may also introduce bias and inflate heterogeneity estimates.
The use of indirect evidence to support decision-making is not exclusive to the aforementioned regulatory context and has permeated HTA processes. Examples can be found in Technology Appraisals (TAs) conducted by the National Institute for Health and Care Excellence (NICE) to inform routine use of technologies in the National Health Service (NHS) in England and Wales. For instance, TA445 [14] considered adult studies to complement a sparse paediatric evidence base. Also, relative effectiveness has been generalised across subgroups of different Hepatitis C genotypes [15]. These two examples use indirect evidence by considering both sources perfectly generalisable ('lumping'), as an alternative to being considered completely independent ('splitting'). There are, however, examples of appraisals which use indirect evidence in more sophisticated ways. For instance, TA383 [16] used indirect evidence across interventions by assuming a 'class-effect' between treatments that function through the same molecular pathway. TA139 [17] and TA168 [18] simultaneously modelled two outcomes, leveraging their correlation structure, and TA244 [19] modelled a network of interventions with multiple treatment components, assuming that the relative effect of an intervention is the sum of the relative effects of its comprising components.
Inevitably, a judgement on whether the indirect evidence is relevant is always required. However, what is often not made explicit is that, where both direct and indirect evidence are considered, there should be appropriate consideration of the extent of information-sharing permitted by different synthesis methods (i.e. the extent to which the indirect evidence is allowed to affect the estimates obtained by using only the direct evidence).
The objective of this review is to identify information-sharing evidence synthesis methods that have been used in the literature and to improve understanding of these methods by making explicit the fundamental assumptions underpinning them. We do so by identifying the 'core' relationships used to share information. This review increases awareness of the breadth of available information-sharing methods and aids transparency in the choice of information-sharing method. To our knowledge, this topic has not been explored in the past with a clear policy focus.

Methods
Given the lack of consistent terminology in the literature referring to methods that combine direct and indirect evidence, keyword-based search methods [20] were not used. Instead, 'citation-mining' methods [21] were used, which are efficient [22] and have been used for similar reviews [23]. The review protocol was developed by the authors and validated by the first author's Doctoral Thesis Advisory Panel. The process consisted of the following steps: initially, a list of seminal/influential papers was compiled after consulting with experts and conducting a scoping literature review. The scoping review included terms to represent a variety of fields including MA, NMA, multi-parameter evidence synthesis, and the incorporation of evidence on historical controls in trial-design. Although the last category is outside the scope of this work, it may have influenced the extension of methods to the MA/NMA field.
The citations of 7 seminal papers [8,24-29] were searched in the Web of Science (WoS) on 20-Feb-2019. Subsequently, articles were identified that cited (forwards citation-mining) or were cited by (backwards citation-mining) the seminal papers. Articles were included if they mathematically specified MA or NMA models that combine information pertaining to multiple populations, interventions, outcomes, study-designs, or that utilise evidence from an external source such as previous meta-analyses. Importantly, papers that used only standard NMA methods were excluded even though they share information across treatment comparisons, because such methods are well established in the literature. Further information on the search strategy and inclusion and exclusion criteria is provided in Additional file 1.
From each included paper, the synthesis model was isolated and from within it, methods facilitating information-sharing were extracted. Methods were subsequently categorised according to the 'core' relationship that they used to enable information-sharing. When papers tackled multiple synthesis challenges simultaneously (e.g. [30,31]), the issues they dealt with were isolated along with the method used to address each. The PICOS level of indirectness was also extracted. The search was conducted in Zotero version 5.0.69. The PRISMA checklist for systematic reviews is provided in Additional file 2.

Results

Characteristics of the included studies
The review identified 89 papers (Figure 2). The majority (n = 78) described methods that shared information on relative treatment effects. Other studies used methods to share information on comparison-specific meta-regression slopes (n = 4), comparison-specific between-studies heterogeneities (n = 6), or study-specific baselines (n = 2). Overall, there was a balance amongst papers that developed methods within MA (n = 45) and NMA (n = 44). There was a strong emphasis on methods for information-sharing across multiple outcomes (39 papers) and treatments (23 papers), with fewer papers focusing on study-designs (10 papers) or populations (6 papers) (Table 1). Note that some papers described methods sharing information on several types of parameters and across more than one PICOS level (e.g. [32]). A full list of the included papers along with a description of how information was shared within each paper can be found in Additional file 3.
'Core' relationships for information-sharing

The methods identified were classified according to the 'core' relationship facilitating information-sharing. Four 'core' methods were identified (Figure 3): 1) functional relationships, which impose deterministic functions among model parameters, resulting in a reduced number of parameters that need to be estimated; 2) exchangeability-based relationships, which assume that a set of parameters are drawn from a common distribution, allowing them to be shrunk towards its mean; 3) prior-based relationships, which employ a Bayesian framework to 'load' the indirect evidence into prior distributions; and 4) multivariate relationships, which assume that model parameters are correlated and enable information-sharing through the covariance structure.
[INSERT FIGURE 3 HERE]

Table 1 classifies papers according to the 'core' method used and the PICOS level of indirectness. It shows that some 'core' relationships are preferred when information is shared across specific PICOS levels. For instance, most of the identified papers sharing across interventions use either functional or exchangeability-based relationships, and no example using priors was found. Also, papers that use multivariate relationships do so to share information across related outcomes, not across populations or study-designs. This may be partly because the information required to implement multivariate methods that capture the between-study correlation for multiple populations or study-designs (i.e. studies reporting separately for two different populations or designs) is usually unavailable, because single studies rarely enroll, or report separately for, multiple populations and cannot, by definition, pertain to multiple designs.

Functional relationships
The simplest functional relationship is lumping (i.e. common effects) where all data points inform a single parameter independently of whether the evidence is direct or indirect. Examples include pooling RTEs across time-points [32] or (sub-)populations [14,35] as well as pooling between-trial heterogeneity parameters [92] or meta-regression slopes [91].
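As a concrete illustration of how lumping differs from splitting, the following sketch pools log odds ratio estimates by inverse-variance weighting: splitting uses the direct evidence only, whilst lumping pools direct and indirect evidence as if fully generalisable. All numbers are hypothetical and not drawn from any of the included studies.

```python
import numpy as np

# Hypothetical direct (target population) and indirect (related population)
# log-OR estimates with their within-study variances.
y_direct, v_direct = np.array([-0.40, -0.55]), np.array([0.10, 0.15])
y_indirect, v_indirect = np.array([-0.20, -0.30, -0.25]), np.array([0.05, 0.04, 0.06])

def ivw_pool(y, v):
    """Inverse-variance weighted (common-effect) pooled mean and variance."""
    w = 1.0 / v
    return np.sum(w * y) / np.sum(w), 1.0 / np.sum(w)

# 'Splitting': use only the direct evidence.
split_mean, split_var = ivw_pool(y_direct, v_direct)

# 'Lumping': treat the indirect evidence as fully generalisable and pool everything.
lump_mean, lump_var = ivw_pool(np.concatenate([y_direct, y_indirect]),
                               np.concatenate([v_direct, v_indirect]))
```

Lumping always yields a more precise estimate, but the pooled mean is pulled towards the indirect evidence and may be biased if that evidence is not, in fact, generalisable.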
Another type of functional relationship is a constraint, where a strict inequality is imposed among parameters. In a Bayesian framework, information-sharing is facilitated by preventing simulation samples that do not conform to the specified constraint. Such methods have been used to relate RTEs across dosages, expressing that higher dosages are expected to exhibit larger RTEs [39,40], to describe structurally-related outcomes [65], and to specify second-order consistency equations that impose a triangle inequality on the comparison-specific between-trial variances [94,95].
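The rejection mechanism described above can be sketched as follows. The dose-specific posteriors are approximated as normals, the numbers are hypothetical, and sampling the two dosages independently before applying the constraint is a simplification of a full joint model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior samples (normal approximations) for the log-OR RTEs
# of a low and a high dose of the same treatment, estimated independently.
low = rng.normal(-0.20, 0.15, size=100_000)
high = rng.normal(-0.30, 0.20, size=100_000)

# Monotonic-dose constraint: keep only joint samples in which the higher dose
# has the larger effect (more negative log-OR). Discarding inconsistent samples
# lets each dose's evidence constrain the other's estimate.
keep = high <= low
low_c, high_c = low[keep], high[keep]
```

Conditioning on the constraint pulls the two constrained posteriors apart: the low-dose samples shift towards the null and the high-dose samples away from it.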
Meta-regression-type methods have also been suggested. In the examples found, the relationships were usually linear (on the modelling scale), with one RTE component independent of, and another RTE component dependent on, a particular study characteristic. The most common example in this category is bias-adjustment methods, primarily used to synthesise studies of different designs. Bias-adjustment methods broadly fall into two categories: general frameworks that adjust the RTE for biases affecting internal and external validity, provided that the extent of bias can be either estimated from empirical evidence or elicited from experts [55,56,63,64], and approaches that adjust for bias due to particular study-level characteristics considered proxies for study quality, such as their size [47,59-62], publication year [104], or risk-of-bias [57,58]. Meta-regression-type relationships have also been used for complex interventions. In their simplest form, they model the RTE of a complex intervention as the sum of the RTEs of its treatment components [30,45,49,50]. More sophisticated approaches allow for synergistic or antagonistic relationships by suggesting functions that also contain treatment interaction RTE components [48]. Other applications include approaches that model the RTEs measured in two survival outcomes (e.g. time-to-mortality and time-to-progression) by assuming that they only differ by a constant component which is invariant across treatment comparisons [31], models that assume a linear relationship between dosage and RTE [44], and methods for baseline-risk adjustment [36].
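A minimal sketch of a meta-regression-type bias adjustment, with hypothetical data: a study-design indicator enters as a covariate, so the intercept is the bias-adjusted RTE and the slope is the mean bias of the non-randomised studies. Note that when the bias term is estimated freely from the data, as here, the non-randomised studies contribute little to the adjusted estimate; in the frameworks cited above, the bias term would instead be informed by empirical (meta-epidemiological) evidence or expert elicitation.

```python
import numpy as np

# Hypothetical log-OR estimates, within-study variances, and a design indicator
# (1 = non-randomised, assumed subject to bias; 0 = randomised).
y = np.array([-0.50, -0.42, -0.60, -0.15, -0.10])
v = np.array([0.08, 0.10, 0.06, 0.05, 0.07])
z = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

# Weighted least squares for y_i = d + beta * z_i + e_i, Var(e_i) = v_i.
# d is the bias-adjusted RTE; beta is the mean bias in non-randomised studies.
X = np.column_stack([np.ones_like(y), z])
W = np.diag(1.0 / v)
coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
d_adjusted, beta_bias = coef
```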
Finally, more complex, non-linear, relationships have also been presented in the literature, namely those enabling the synthesis of RTEs across a range of dosages using the Emax model [38,41,42] commonly employed in pharmacokinetics and those enabling the sharing of information across follow-up periods [66,67].

Exchangeability-based relationships
The simplest exchangeability-based relationship uses a random effect to relate a set of parameters; in this way accounting for heterogeneity without explicitly modelling its source(s). The random effect assumes that all parameters are drawn from a distribution, implying that individual parameters are shrunken towards the random effect mean; this can happen to a greater or lesser extent, depending on the precision and discrepancy of each individual estimate in relation to the random effect mean. Examples of parameters to which random-effects have been applied include: RTEs of different dosages of the same treatment [39], comparison-specific meta-regression slopes [36,47,59,91], comparison-specific between-trial variances [94,95], and study-specific baseline-risks [36,37]. Random-walks are another form of exchangeability relationship. They assume that data points which are more similar with respect to a particular characteristic are expected to exhibit more similar RTEs. Examples include approaches assuming that the RTE of a particular dosage or follow-up period is drawn from a distribution centred around the RTE of its adjacently lower or higher dosage [39] or follow-up period [43,66].
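The shrinkage induced by exchangeability can be illustrated with a method-of-moments (DerSimonian-Laird) estimate of the heterogeneity followed by empirical-Bayes shrinkage of the individual estimates; the data below are hypothetical.

```python
import numpy as np

# Hypothetical parameter estimates (log-OR scale) and within-study variances.
y = np.array([-0.90, 0.10, -0.60, -0.20])
v = np.array([0.05, 0.05, 0.05, 0.05])

# Method-of-moments (DerSimonian-Laird) estimate of tau^2 under the
# exchangeability assumption theta_i ~ N(mu, tau^2).
w = 1.0 / v
mu_fe = np.sum(w * y) / np.sum(w)
q = np.sum(w * (y - mu_fe) ** 2)
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (q - (len(y) - 1)) / c)

# Random-effects mean and shrunken parameter estimates: each theta_i is
# pulled towards mu by the factor v_i / (v_i + tau^2).
w_re = 1.0 / (v + tau2)
mu_re = np.sum(w_re * y) / np.sum(w_re)
shrink = v / (v + tau2)
theta = (1 - shrink) * y + shrink * mu_re
```

Precise estimates (small v_i) or large heterogeneity (large tau^2) produce little shrinkage; imprecise estimates under small heterogeneity are pulled strongly towards the overall mean.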
Multi-level models also use exchangeability, but apply it to the hierarchical/clustered structure of the available data. As such, exchangeability is applied at a first level within specific groups of parameters (i.e. multiple random effects are applied, each within groups of RTEs from studies sharing a particular characteristic) and at a second level across the group-specific hyper-parameters. This is shown in Figure 4, where, at the bottom level, studies are categorised according to a characteristic and a different random effect is imposed within every category, producing group-specific basic parameters and heterogeneities. Subsequently, at the top level, exchangeability is also assumed across the group-specific basic parameters, which are shrunk towards an overall, global, group-independent hyper-mean. Examples include 'class-effects' models where, on top of the classical Random-Effects (RE) NMA models, the basic parameters of treatments that function through the same mechanism are assumed to be drawn from a common distribution with an overall 'class' mean and an across-treatments, within-class heterogeneity [32,35,40,44-46]. Class-effect approaches have also been imposed across comparison-specific meta-regression slopes [47,91]. Multi-level models have been suggested to combine adult and paediatric evidence [34] and studies of different designs [51,52,54,105].

[INSERT FIGURE 4 HERE]
Prior-based relationships

Direct and indirect evidence can also be combined through the use of prior distributions. The process usually consists of two steps: initially, the indirect evidence is analysed and, subsequently, the resulting distribution is used as a prior in the analysis of the direct evidence. Of note is that this approach is mathematically equivalent to lumping, which was described under functional relationships. Examples include the combination of adult and paediatric evidence [34] or randomised and non-randomised evidence [51,52,105,107]. The prior can additionally be adjusted for bias or its precision decreased [51]. Alternative ways to define the prior include the use of meta-epidemiological evidence or expert elicitation. The former has been used primarily for bias-adjustment [48], whilst both the former [24,88,89] and the latter [108] have been used to define a prior distribution for the between-trials heterogeneity. More nuanced prior-based approaches, such as mixtures of priors, have also been used. Here, the informative prior (the distribution representing the indirect evidence) is not used at face value, but is instead mixed with a vague prior according to weights that may be specified by the analyst or estimated within the synthesis model. The resulting informative prior is typically heavy-tailed and allows for 'adaptive' information-sharing, whereby information-sharing is stronger when the direct and indirect evidence are in agreement and weaker when they conflict [33]. Mixtures of priors have been used to combine evidence on RTE and between-studies heterogeneity across adults and children [33] and to analyse the study-specific baseline parameters from studies that enroll populations with different baseline risks [36]. The use of mixtures of priors has also been discussed for the synthesis of randomised and non-randomised evidence [51].
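The 'adaptive' behaviour of a mixture prior can be illustrated in a simple normal setting (all numbers hypothetical): the posterior weight on the informative component rises when the direct evidence agrees with the indirect evidence and falls when the two conflict.

```python
import numpy as np

def npdf(x, m, var):
    """Normal density with mean m and variance var."""
    return np.exp(-0.5 * (x - m) ** 2 / var) / np.sqrt(2 * np.pi * var)

def posterior_weight(y, v, w_inf, m_inf, v_inf, m_vag, v_vag):
    """Posterior weight on the informative component of a two-component normal
    mixture prior, after observing a normal summary y with variance v."""
    marg_inf = npdf(y, m_inf, v_inf + v)   # marginal likelihood, informative component
    marg_vag = npdf(y, m_vag, v_vag + v)   # marginal likelihood, vague component
    return w_inf * marg_inf / (w_inf * marg_inf + (1 - w_inf) * marg_vag)

# Hypothetical indirect evidence summarised as N(-0.40, 0.02),
# mixed with a vague N(0, 10) fallback, prior weight 0.8.
prior = dict(w_inf=0.8, m_inf=-0.40, v_inf=0.02, m_vag=0.0, v_vag=10.0)

w_agree = posterior_weight(-0.35, 0.05, **prior)     # direct evidence agrees
w_conflict = posterior_weight(0.30, 0.05, **prior)   # direct evidence conflicts
```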
Finally, a flexible method that has been proposed is the power-prior [109]. In this method, the likelihood of the indirect evidence is raised to a power scalar 0 ≤ a ≤ 1, which reflects the perceived similarity between the two sources of evidence. When a = 1 the results are equivalent to 'lumping' and when a = 0 the results are identical to 'splitting'. The power parameter, a, needs to be specified, and it has been proposed that it be elicited [110] or varied in sensitivity analysis [111]. Power priors have been used to combine observational and randomised evidence [53] and for the synthesis of adult and paediatric evidence [34].
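In the normal case with a flat initial prior, the power-prior posterior has a closed form, so the effect of a can be sketched directly (hypothetical summary data):

```python
# Hypothetical normal summaries: indirect evidence (y0, v0), direct (y1, v1).
y0, v0 = -0.30, 0.04
y1, v1 = -0.55, 0.09

def power_prior_posterior(a):
    """Posterior mean and variance under a flat initial prior, with the
    indirect-evidence likelihood raised to the power a."""
    prec = a / v0 + 1.0 / v1          # a down-weights the indirect precision
    mean = (a * y0 / v0 + y1 / v1) / prec
    return mean, 1.0 / prec

m_split, v_split = power_prior_posterior(0.0)   # a = 0: 'splitting'
m_lump, v_lump = power_prior_posterior(1.0)     # a = 1: 'lumping'
m_half, v_half = power_prior_posterior(0.5)     # partial information-sharing
```

Intermediate values of a trace a smooth path between the two extremes, in both the point estimate and its precision.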

Multivariate relationships
Multivariate relationships have primarily been used to share information across multiple outcomes. Multivariate meta-analysis correlates the various outcomes and may separate within- and between-studies correlations [72]. At the within-study level, the study-specific correlations arise due to differences among the included patients and indicate how the outcomes co-vary across individuals within the study. For example, patients who, due to a baseline characteristic that makes their disease more severe, show high values for outcome A, are also more likely to yield high values for outcome B. At the between-studies level, correlations arise mainly due to study-level differences such as the distribution of the patient-level characteristics across studies. For instance, studies that enroll more severe cases, and therefore may show high values for the mean of outcome A, are also more likely to result in high values for the mean of outcome B, whilst studies enrolling less severe cases may show lower mean values for both outcomes.
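A stylised sketch of how a between-outcome correlation enables borrowing of strength (a conditional-normal update with hypothetical numbers, not a full multivariate meta-analysis): a new study reporting only outcome B still sharpens the estimate for outcome A.

```python
import numpy as np

# Hypothetical joint estimate of two outcome-specific pooled effects (A, B)
# with between-outcome correlation rho.
mu = np.array([-0.30, -0.50])
sd = np.array([0.25, 0.20])
rho = 0.8
Sigma = np.array([[sd[0] ** 2,          rho * sd[0] * sd[1]],
                  [rho * sd[0] * sd[1], sd[1] ** 2]])

# A new study reports outcome B only: estimate y_b with variance v_b.
y_b, v_b = -0.80, 0.02

# Conditional (Kalman-style) update of outcome A via the covariance structure.
k = Sigma[0, 1] / (Sigma[1, 1] + v_b)                     # gain on outcome A
mu_a_new = mu[0] + k * (y_b - mu[1])
var_a_new = Sigma[0, 0] - Sigma[0, 1] ** 2 / (Sigma[1, 1] + v_b)
```

With rho = 0 the gain is zero and outcome A is untouched; the stronger the correlation, the more the B-only study moves, and tightens, the estimate for A.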
Multivariate methods have been developed to consider two [73,74,84], three, or more correlated outcomes [26,87], accommodate the simultaneous analyses of multiple treatments [75,77], and assess the relationship between surrogate and final outcomes [80,96]. Given that within-trial correlations are commonly not reported, authors have suggested the use of external data to inform these parameters [71] or, when external data is not available, methods that approximate the within-study co-variances [86]. Further extensions have been developed to handle missing data [82] and allow modelling of heterogeneity and inconsistency using two different variance components [81].
To accommodate cases where the within-trials correlations are unavailable and cannot be otherwise obtained, alternative methods, which require the same data as a univariate approach and do not separate within- and between-trials correlations, have been suggested for MA [79,112] and NMA [75]. Assuming that the overall correlation is not very strong, these methods perform very similarly to their counterpart, which separates the two correlations, whilst preserving their benefits over the univariate approach.
Finally, some methods only account for either the within- or the between-studies correlations. For example, to model mutually exclusive outcomes, it has been suggested to only account for the within-trials negative correlations which are induced by the competing risks structure of the data (i.e. the more patients that reach an outcome, the fewer the patients that can reach another outcome) [76]. Also, other approaches have only modelled the between-studies covariance matrix to allow simultaneous synthesis of multiple outcomes [30,31,65,78], accommodate outcomes reported at several follow-up periods [68,69] and enable information-sharing across different treatment components of complex interventions [45].

Discussion
The aim of this review was to identify evidence synthesis methods that have been used to combine evidence from sources that relate directly and indirectly to a particular research question, and to classify the relationships these methods rely on. A wide range of methods have been developed to share information between populations, treatments, outcomes and study-designs. We found that, across the breadth of methods identified, four 'core' relationships are used to facilitate information-sharing. These are functional, exchangeability-based, prior-based, and multivariate relationships (Figure 3).
This review highlights the breadth of methodological options that can facilitate information-sharing. Although particular relationships are typically preferred in specific information-sharing contexts, it is likely that several methods are applicable and analysts will need to choose which method is most appropriate. This paper highlights that appropriate considerations need to be made when choosing 'core' relationships and methods, as these choices are likely to influence the degree of information-sharing. Specifically, method selection may be informed by the following considerations. The first is the plausibility of the assumptions imposed by the methods in the context of interest. By classifying methods according to the 'core' relationship that enables information-sharing, we hope to facilitate a clearer discussion about the plausibility of these assumptions in the decision context of interest.
The second is the degree of information-sharing that is imposed between direct and indirect evidence. Within the literature, there is limited exploration of how much different methods borrow strength from indirect evidence, though for multivariate methods it has been noted that information-sharing is 'usually modest' [26,97] and that, sometimes, instead of 'borrowing strength', multivariate methods may end up 'borrowing weakness' [113]. The few studies that have assessed the degree of information-sharing typically consider only the degree of precision gains [114] rather than also examining how the point estimate, which is also important for decision-making, changes. Further research to understand the extent to which different methods share information is warranted.
Finally, decision-makers may be interested in exploring different levels of information-sharing. One way to do so is by using prior-based methods that allow some control over the degree of information-sharing. For instance, an informative prior may use either the posterior distribution of the mean or the predictive distribution of the indirect evidence. The former is equivalent to lumping, whilst the latter imposes less information-sharing. Similarly, power-priors allow a range of values to be used for a, which determines the extent of information-sharing.
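The difference between the two prior choices can be sketched in a conjugate normal setting (hypothetical numbers): using the predictive distribution inflates the prior variance by the heterogeneity, so the direct evidence dominates more.

```python
# Hypothetical random-effects summary of the indirect evidence:
# posterior of the mean mu ~ N(m, s2); between-study heterogeneity tau2.
m, s2, tau2 = -0.35, 0.01, 0.04

# Hypothetical direct-evidence summary.
y, v = -0.10, 0.05

def normal_update(prior_mean, prior_var, y, v):
    """Conjugate normal update of a normal prior with a normal data summary."""
    prec = 1.0 / prior_var + 1.0 / v
    return (prior_mean / prior_var + y / v) / prec, 1.0 / prec

# Prior = posterior of the mean: strong sharing (equivalent to lumping).
m_post, v_post = normal_update(m, s2, y, v)
# Prior = predictive distribution N(m, s2 + tau2): weaker sharing.
m_pred, v_pred = normal_update(m, s2 + tau2, y, v)
```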
Whilst we believe that the above identification of 'core' relationships is exhaustive, the use of citation-mining techniques may have missed relevant methods, particularly those outside of health research. Additionally, we only looked for methods that shared information between sources of evidence that address different research questions. Hence, methods such as commensurate priors, which have been used to combine individual-patient data and aggregate-level evidence on the same research question [99], were considered outside the scope of the search, although they could also be useful for combining evidence sources that pertain to different research questions. This paper is the first to summarise and categorise the existing literature by classifying methods according to the 'core' assumption that they use to facilitate information-sharing. Further research could explore the following questions: first, how can we determine whether indirect evidence is relevant? Second, how can the appropriateness of each information-sharing method be assessed for the synthesis problem at hand? Finally, can the extent of information-sharing be quantified to assist transparent decision-making?

Conclusions
We conclude that a plethora of methods has been used to facilitate information-sharing. These can be categorised, according to the main assumption they impose, into functional, exchangeability-based, prior-based, and multivariate relationships. Despite the wide range of available methods, some are often used preferentially without ensuring that all options have been explored. Given that methods may differ in the degree of information-sharing they impose, the implication is that the chosen method may impose stronger or weaker information-sharing than what is considered appropriate by policy-makers. Further research should investigate ways of judging the appropriateness of the degree of information-sharing imposed by each method, and assess the impact of using different methods on decisions.

Declarations
Ethics approval and consent to participate

Not applicable.

Consent for publication
All authors have provided their consent for the publication of the final manuscript.
Availability of data and materials

The list of included studies in the systematic review is available online at the following url: https://www.zotero.org/groups/2360368/citation-mining_included-studies

Competing interests
The authors declare that they have no competing interests.

Funding
This work was funded by a doctoral studentship awarded to GFN by the Centre for Health Economics. The Centre for Health Economics did not have a role in the design of the study, the collection, analysis, and interpretation of data, or in writing the manuscript.
Authors' contributions

GFN drafted the initial manuscript, conducted the citation-mining review and coordinated contributions from all authors in drafting the final manuscript. MS oversaw the work. MS, BW, and SP participated in the study-design, revised the manuscript and approved its final version.