The methodology comprises three complementary pillars: 1) a benchmark review of relevant registries and Clinical Practice Guidelines; 2) machine learning-based screening (using frequent words in abstracts) of the studies retrieved by a systematic search using only the disease as the search term; and 3) iterative extraction of candidate outcome measures from batches of studies retrieved by a second systematic search, until saturation is reached (Figure 1). To demonstrate the feasibility of our approach, we illustrate this methodology with the ICHOM standard dataset for heart valve disease (HVD).
Pillars of standardized outcome measure development
Pillar one: Benchmark review
In most disease domains, a wealth of registries and guidelines exists for the capture of clinical outcome measures. These resources provide excellent starting points for the identification of candidate outcome measures for a standardized dataset. For the HVD dataset, these resources included existing well-known valve registries, as well as international Clinical Practice Guidelines for the management of patients with HVD. Even in clinical areas that lack guidelines or registries (e.g., rare diseases), efforts should focus on a literature search to identify consensus documents or other relevant literature for the specific disease. In the HVD case, national and international registries and guidelines were reviewed to extract clinical outcome measures, including patient-reported outcome measures (PROMs). No systematic search was performed, as the guidelines and registries are well known. Although a benchmark review may not capture all relevant candidate outcome measures, because emerging outcomes may not yet have been adopted in the literature, it provides an overview of the widely used outcome measures for the targeted clinical domain. In addition, registries that focus only on short-term clinical outcomes may miss relevant long-term clinical outcomes. As such, combining this pillar with the other steps complements the development process.
Pillar two: Text mining
To obtain a quick overview of the outcome measures that should be included, text mining using machine learning can be applied. That is, a traditional scoping search (using only the disease as the search term) is performed at an initial stage, after which machine learning is used to combine and analyze the retrieved data. For the HVD dataset, the search term “Heart Valve Disease” was used in Embase, yielding 142,279 articles (Table 1); the abstracts of the identified articles were saved in separate text files. Using machine learning, a Wordcloud can be created from these text files (9) to visually present the frequency of the words used in the abstracts (the larger a word appears in the Wordcloud, the more frequently it occurs in the abstracts). The Wordcloud of the HVD search is presented in Figure 2 as an illustration. A quick scan of this Wordcloud suggests that outcomes such as mortality, valve regurgitation, and stenosis are commonly discussed in the literature. Note that the algorithm stems words, so that identical words with different suffixes (e.g., valve and valves) are counted as one word. An example of R code to develop a Wordcloud is provided in Supplementary Text 3.
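The frequency counting that underlies such a Wordcloud can be sketched in a few lines. The paper's own code is in R (Supplementary Text 3); the following is a minimal Python sketch of the same idea, using a crude suffix-stripping rule in place of a proper stemmer. The sample abstracts and stopword list are illustrative only.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "of", "and", "in", "after", "a", "with"})

def stem(word):
    # Crude suffix stripping so that variants such as "valve" and
    # "valves" are counted as one word (a stand-in for a real stemmer).
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def word_frequencies(abstracts):
    """Count stemmed word frequencies across a collection of abstracts."""
    counts = Counter()
    for text in abstracts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[stem(token)] += 1
    return counts

# Two illustrative abstracts; in practice each retrieved abstract is one text.
abstracts = [
    "Mortality after valve replacement in aortic stenosis.",
    "Valves and regurgitation: long-term mortality outcomes.",
]
freqs = word_frequencies(abstracts)
```

A plotting library (such as the wordcloud R package used in the paper) would then size each word by its count; only the counting step is shown here.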
Pillar three: Iterative approach to reviewing literature
A systematic literature review is an essential component of identifying candidate outcome measures for a standardized dataset. It is recommended to conduct the systematic search in collaboration with a medical information specialist and to include multiple medical literature databases. As ICHOM focuses on outcomes of importance to patients, PROMs have been incorporated as an integral part of all ICHOM datasets. However, to ensure the capture of all relevant clinical outcomes (e.g., mortality, complications), it is recommended to perform two separate systematic searches: one focusing on clinical outcomes and one on PROMs. Duplicate articles should be removed before selecting relevant articles. For the HVD standardized dataset, we used a broad search to identify clinical outcomes and a separate systematic search for PROMs (Table 1).
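Removing duplicates across the two searches amounts to matching records on a normalized key. The following is a minimal sketch, assuming matching on title alone; real reference managers also match on DOI, authors, and year, and the sample records are hypothetical.

```python
def normalize_title(title):
    # Lowercase and strip punctuation so that trivially different
    # renderings of the same title compare equal.
    cleaned = "".join(c for c in title.lower() if c.isalnum() or c.isspace())
    return " ".join(cleaned.split())

def deduplicate(records):
    """Keep the first occurrence of each article, matching on normalized title."""
    seen, unique = set(), []
    for rec in records:
        key = normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"title": "Outcomes after aortic valve replacement"},
    {"title": "Outcomes After Aortic Valve Replacement."},  # duplicate from the second search
    {"title": "PROMs in heart valve disease"},
]
unique = deduplicate(records)
```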
To balance the burden of unnecessarily reviewing all studies against the need for saturation of candidate outcomes (i.e., that all relevant outcomes are identified), we propose an iterative algorithm: randomly select a batch of articles from an extensive literature search, then keep selecting new batches of articles until saturation of outcomes is achieved. The steps of this algorithm are described in Table 2 and visually represented in Figure 3.
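The batch-wise selection can be sketched as follows. This is an illustrative Python sketch, not the paper's own implementation: `extract_outcomes` stands in for the manual extraction step, the stopping rule (stop once an entire batch adds no new outcome) is one possible formalization of saturation, and the batch sizes and corpus are hypothetical.

```python
import random

def iterative_review(papers, extract_outcomes, initial_batch=100,
                     batch_size=25, seed=0):
    """Draw random batches of papers and stop once a whole batch
    contributes no new candidate outcome (saturation)."""
    rng = random.Random(seed)
    pool = list(papers)
    rng.shuffle(pool)
    found = set()
    start, size = 0, initial_batch
    while start < len(pool):
        batch = pool[start:start + size]
        new = set()
        for paper in batch:
            new |= set(extract_outcomes(paper)) - found
        found |= new
        if not new:  # saturation: this batch added nothing new
            break
        start += size
        size = batch_size
    return found

# Hypothetical corpus: each paper is represented by its reported outcomes.
corpus = [["mortality"], ["mortality", "stroke"], ["reoperation"]] * 30
outcomes = iterative_review(corpus, lambda paper: paper,
                            initial_batch=10, batch_size=5)
```

Because the draws are random, the set of recovered outcomes depends on which batches are drawn before a batch happens to add nothing new; the simulation module below explores exactly this behavior.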
Simulation module
The number of articles to include in the initial and subsequent batches of the iterative approach (pillar three) can be difficult to determine, and reaching full saturation is a stochastic process that depends on several parameters. Therefore, a simulation study was performed to determine at which cutoffs maximal saturation of candidate outcomes was achieved. It was hypothesized that saturation depends on several parameters, including: 1) the total number of candidate outcome measures, 2) the number of individual outcome measures per study, 3) the number of selected papers in the starting batch, 4) the number of selected papers in the subsequent batches, 5) the saturation achieved in pillars one and two, and 6) the probability of an outcome being reported in a random paper. The last parameter implies that some outcomes (e.g., mortality) are studied more often than others (e.g., left ventricular size), which reflects the real world better than assuming an equal probability of being studied for all potential candidate outcome measures. As such, we developed a simulation algorithm in R to simulate 1,000 cases for any given combination of the aforementioned parameters. The simulation algorithm details are presented in Supplementary Text 1 and can be used to replicate the simulation in different settings. In the HVD standard set, we used 100 papers in the initial batch and 25 papers in the subsequent ones.
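The paper's simulation code is in R (Supplementary Text 1). As a minimal illustration of the idea, the Python sketch below draws outcomes with unequal reporting probabilities, applies the batch-wise stopping rule, and estimates how often full saturation is reached. All parameter values here are illustrative defaults, not those used for the HVD set.

```python
import random

def simulate_saturation(n_outcomes=50, p_common=0.3, p_rare=0.02,
                        initial_batch=100, batch_size=25,
                        n_sims=1000, seed=1):
    """Estimate the proportion of simulated reviews that recover all
    candidate outcomes before the stopping rule triggers.
    Half of the outcomes are 'common' (e.g., mortality) and half
    'rare' (e.g., left ventricular size), reflecting unequal
    reporting probabilities."""
    rng = random.Random(seed)
    probs = [p_common] * (n_outcomes // 2) + \
            [p_rare] * (n_outcomes - n_outcomes // 2)
    full = 0
    for _ in range(n_sims):
        found = set()
        size = initial_batch
        while True:
            new = set()
            for _ in range(size):                # papers in this batch
                for i, p in enumerate(probs):    # outcomes a paper may report
                    if i not in found and rng.random() < p:
                        new.add(i)
            if not new:                          # stopping rule: batch adds nothing
                break
            found |= new
            size = batch_size
        if len(found) == n_outcomes:
            full += 1
    return full / n_sims

# Proportion of simulated reviews that reach full saturation (illustrative run).
saturation_rate = simulate_saturation(n_sims=200)
```

Varying the batch sizes and reporting probabilities in such a sketch shows the tradeoff directly: with rarely reported outcomes, small batches often trigger the stopping rule before every outcome has been seen.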