The methodology comprises three complementary pillars: 1) a benchmark review of relevant registries and Clinical Practice Guidelines; 2) machine learning-based screening (using frequent words in abstracts) of the studies retrieved by a systematic search using only the disease as the search term; and 3) iterative extraction of candidate outcome measures from batches of studies retrieved by a second systematic search, until saturation is reached (Figure 1). To demonstrate the feasibility of our approach, we illustrate this methodology with the ICHOM standard dataset for heart valve disease (HVD).
Pillars of standardized outcome measure development
Pillar one: Benchmark review
In most disease domains, a wealth of registries and guidelines exists for the capture of clinical outcome measures. These resources provide excellent starting points for the identification of candidate outcome measures for a standardized dataset. For the HVD dataset, these resources included existing well-known valve registries, as well as international Clinical Practice Guidelines for the management of patients with HVD. Even in clinical areas that lack guidelines or registries (e.g., rare diseases), efforts should focus on a literature search to identify consensus documents or other relevant literature for the specific disease. In the HVD case, national and international registries and guidelines were reviewed to extract clinical outcome measures, including patient-reported outcome measures (PROMs). No systematic search was performed, as the guidelines and registries are well known. Although a benchmark review may not capture all relevant candidate outcome measures, because emerging outcomes may not yet have been adopted in the literature, it provides an overview of the widely used outcome measures for the targeted clinical domain. In addition, registries that focus only on short-term clinical outcomes may miss relevant long-term clinical outcomes. As such, combining this pillar with the other steps complements the development process.
Pillar two: Text mining
To obtain a quick overview of the outcome measures that should be included, text mining using machine learning can be applied. That is, a traditional scoping search (using only the disease as the search term) is performed at an initial stage, after which machine learning is used to combine and analyze the retrieved data. For the HVD dataset, the search term “Heart Valve Disease” was used in Embase, yielding 142,279 articles (Table 1); the abstracts of the identified articles were saved in separate text files. Using machine learning, a Wordcloud can be created from these text files (9) to visually present the frequency of the words used in the abstracts (the larger a word appears in the Wordcloud, the more frequently it occurs in the abstracts). The Wordcloud of the HVD search is presented in Figure 2 as an illustration. A quick scan of this Wordcloud suggests that outcomes such as mortality, valve regurgitation, and stenosis are commonly discussed in the literature. Note that the algorithm stems words, so that identical words with different suffixes (e.g., valve and valves) are counted as one word. An example of R code to develop a Wordcloud is provided in Supplementary Text 3.
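The frequency counting that underlies such a Wordcloud can be sketched in a few lines. The paper's own code is in R (Supplementary Text 3); the following is a minimal Python sketch of the same idea, using a crude suffix-stripping rule in place of a proper stemmer. The sample abstracts and stopword list are illustrative only.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "of", "and", "in", "after", "a", "with"})

def stem(word):
    # Crude suffix stripping so that variants such as "valve" and
    # "valves" are counted as one word (a stand-in for a real stemmer).
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def word_frequencies(abstracts):
    """Count stemmed word frequencies across a collection of abstracts."""
    counts = Counter()
    for text in abstracts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[stem(token)] += 1
    return counts

# Two illustrative abstracts; in practice each retrieved abstract is one text.
abstracts = [
    "Mortality after valve replacement in aortic stenosis.",
    "Valves and regurgitation: long-term mortality outcomes.",
]
freqs = word_frequencies(abstracts)
```

A plotting library (such as the wordcloud R package used in the paper) would then size each word by its count; only the counting step is shown here.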
Pillar three: Iterative approach to reviewing literature
A systematic literature review is an essential component of identifying candidate outcome measures for a standardized dataset. It is recommended to conduct the systematic search in collaboration with a medical information specialist and to include multiple medical literature databases. As ICHOM focuses on outcomes of importance to patients, PROMs have been incorporated as an integral part of all ICHOM datasets. However, to ensure the capture of all relevant clinical outcomes (e.g., mortality, complications), it is recommended to perform two separate systematic searches: one focusing on clinical outcomes and one on PROMs. Duplicate articles should be removed before selecting relevant articles. For the HVD standardized dataset, we used a broad search to identify clinical outcomes and a separate systematic search for PROMs (Table 1).
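Removing duplicates across the two searches amounts to matching records on a normalized key. The following is a minimal sketch, assuming matching on title alone; real reference managers also match on DOI, authors, and year, and the sample records are hypothetical.

```python
def normalize_title(title):
    # Lowercase and strip punctuation so that trivially different
    # renderings of the same title compare equal.
    cleaned = "".join(c for c in title.lower() if c.isalnum() or c.isspace())
    return " ".join(cleaned.split())

def deduplicate(records):
    """Keep the first occurrence of each article, matching on normalized title."""
    seen, unique = set(), []
    for rec in records:
        key = normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"title": "Outcomes after aortic valve replacement"},
    {"title": "Outcomes After Aortic Valve Replacement."},  # duplicate from the second search
    {"title": "PROMs in heart valve disease"},
]
unique = deduplicate(records)
```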
To balance the burden of unnecessarily reviewing all studies against the need for saturation of candidate outcomes (i.e., that all relevant outcomes are identified), we propose an iterative algorithm: randomly select a batch of articles from an extensive literature search, then keep selecting new batches of articles until saturation of outcomes is achieved. The steps of this algorithm are described in Table 2 and visually represented in Figure 3.
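The batch-wise selection can be sketched as follows. This is an illustrative Python sketch, not the paper's own implementation: `extract_outcomes` stands in for the manual extraction step, the stopping rule (stop once an entire batch adds no new outcome) is one possible formalization of saturation, and the batch sizes and corpus are hypothetical.

```python
import random

def iterative_review(papers, extract_outcomes, initial_batch=100,
                     batch_size=25, seed=0):
    """Draw random batches of papers and stop once a whole batch
    contributes no new candidate outcome (saturation)."""
    rng = random.Random(seed)
    pool = list(papers)
    rng.shuffle(pool)
    found = set()
    start, size = 0, initial_batch
    while start < len(pool):
        batch = pool[start:start + size]
        new = set()
        for paper in batch:
            new |= set(extract_outcomes(paper)) - found
        found |= new
        if not new:  # saturation: this batch added nothing new
            break
        start += size
        size = batch_size
    return found

# Hypothetical corpus: each paper is represented by its reported outcomes.
corpus = [["mortality"], ["mortality", "stroke"], ["reoperation"]] * 30
outcomes = iterative_review(corpus, lambda paper: paper,
                            initial_batch=10, batch_size=5)
```

Because the draws are random, the set of recovered outcomes depends on which batches are drawn before a batch happens to add nothing new; the simulation module below explores exactly this behavior.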
Simulation module
The number of articles to include in the initial and subsequent batches of the iterative approach (pillar three) can be difficult to determine, and reaching full saturation is a stochastic process that depends on several parameters. Therefore, a simulation study was performed to determine at which cutoffs maximal saturation of candidate outcomes was achieved. It was hypothesized that saturation depends on several parameters, including: 1) the total number of candidate outcome measures, 2) the number of individual outcome measures per study, 3) the number of selected papers in the starting batch, 4) the number of selected papers in the subsequent batches, 5) the saturation achieved in pillars one and two, and 6) the probability of an outcome being reported in a random paper. The last parameter implies that some outcomes (e.g., mortality) are studied more often than others (e.g., left ventricular size), which reflects the real world better than assuming an equal probability of being studied for all potential candidate outcome measures. As such, we developed a simulation algorithm in R to simulate 1,000 cases for any given combination of the aforementioned parameters. The simulation algorithm details are presented in Supplementary Text 1 and can be used to replicate the simulation in different settings. In the HVD standard set, we used 100 papers in the initial batch and 25 papers in the subsequent ones.
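The paper's simulation code is in R (Supplementary Text 1). As a minimal illustration of the idea, the Python sketch below draws outcomes with unequal reporting probabilities, applies the batch-wise stopping rule, and estimates how often full saturation is reached. All parameter values here are illustrative defaults, not those used for the HVD set.

```python
import random

def simulate_saturation(n_outcomes=50, p_common=0.3, p_rare=0.02,
                        initial_batch=100, batch_size=25,
                        n_sims=1000, seed=1):
    """Estimate the proportion of simulated reviews that recover all
    candidate outcomes before the stopping rule triggers.
    Half of the outcomes are 'common' (e.g., mortality) and half
    'rare' (e.g., left ventricular size), reflecting unequal
    reporting probabilities."""
    rng = random.Random(seed)
    probs = [p_common] * (n_outcomes // 2) + \
            [p_rare] * (n_outcomes - n_outcomes // 2)
    full = 0
    for _ in range(n_sims):
        found = set()
        size = initial_batch
        while True:
            new = set()
            for _ in range(size):                # papers in this batch
                for i, p in enumerate(probs):    # outcomes a paper may report
                    if i not in found and rng.random() < p:
                        new.add(i)
            if not new:                          # stopping rule: batch adds nothing
                break
            found |= new
            size = batch_size
        if len(found) == n_outcomes:
            full += 1
    return full / n_sims

# Proportion of simulated reviews that reach full saturation (illustrative run).
saturation_rate = simulate_saturation(n_sims=200)
```

Varying the batch sizes and reporting probabilities in such a sketch shows the tradeoff directly: with rarely reported outcomes, small batches often trigger the stopping rule before every outcome has been seen.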