Study conduct
Methods for this prospective evaluation are reported in an a priori protocol posted on 25 October 2018 to the Open Science Framework, https://osf.io/wxebg/ (doi: 10.17605/OSF.IO/6M28H), and are outlined more briefly below.
Test Systematic Review
We tested the proposed complementary search approaches on an SR initiated at our centre in 2016 that focused on the effectiveness of pharmacologic treatments for the acute management of bronchiolitis (PROSPERO registration #CRD42016048625). The SR was chosen to test our living SR approach because (a) the topic is of high clinical priority, (b) there is uncertainty about the most effective treatment, and (c) new evidence is rapidly emerging that could alter conclusions and/or clinical practice. The primary outcomes of the SR were rate of admission (outpatients) and length of stay (inpatients). Appendix 1 shows the selection criteria for the SR.
The literature search was developed by a research librarian and peer-reviewed following PRESS guidelines (Appendix 2) [16]. The search was initially run in October 2016 and updated in May 2018 in the following electronic databases: Ovid MEDLINE; Ovid Embase; Cochrane Central Register of Controlled Trials (CENTRAL) via Wiley Cochrane Library; and CINAHL Plus with Full Text via EBSCOhost (1937 to present; removed in the 2018 search because it had previously retrieved no unique included studies). This was supplemented by searches of selected conference proceedings and clinical trials registers, hand-searching of the reference lists of relevant SRs, and contact with content experts. As of May 2018, the search had identified 6,999 unique records and the SR included 146 trials.
Complementary search approaches
We tested three automated search approaches (referred to as ‘complementary’ approaches) over a one-year period, between October 2018 and September 2019: (1) Automated Full Search; (2) PubMed Similar Articles; (3) Scopus Citing References. A research librarian set up each search such that updates would be received by a central e-mail account on an approximately monthly basis, depending on the functionality of each database. We compared the performance of these strategies to the results of a full search update completed at the end of the one-year period. We refer to the full search update as the ‘reference standard’.
Automated Full Search. The Automated Full Search was very similar to the reference standard, but was adapted such that MEDLINE and Embase could be searched simultaneously via Ovid (Appendix 3). We set the Ovid alerts to be received monthly. The timing of alerts from the Wiley Cochrane Library cannot be controlled by the user; these were received when the database was reloaded. We supplemented these searches with a Google alert for clinicaltrials.gov (received ‘as it happens’) and a monthly alert for the Conference Proceedings Citation Index (CPCI) via Clarivate Analytics.
PubMed Similar Articles. We ran a Similar Articles search in PubMed (via NCBI Entrez) manually each month, as the process cannot be automated. The Similar Articles function in PubMed allows users to search for citations related to key ‘seed’ articles chosen by the reviewer [17]. We chose 48 seed articles: 13 key SRs and trials chosen by the SR authors, as well as the 3 largest and 3 most recent trials for each intervention (Appendix 4). We limited the searches by date (i.e., to the previous month).
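For illustration, PubMed's related citations are also exposed programmatically through the NCBI E-utilities ELink service (link name ‘pubmed_pubmed’). The sketch below is not the workflow we used (we ran the searches manually in the PubMed interface); the function name and the seed PMID shown are placeholders.

```python
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def similar_article_pmids(seed_pmid: str) -> list[str]:
    """Return PMIDs of PubMed 'Similar Articles' for one seed article via E-utilities ELink."""
    url = f"{EUTILS}?dbfrom=pubmed&db=pubmed&id={seed_pmid}&cmd=neighbor"
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    pmids: list[str] = []
    # The 'pubmed_pubmed' link set holds the Similar Articles neighbours.
    for linksetdb in tree.iter("LinkSetDb"):
        if linksetdb.findtext("LinkName") == "pubmed_pubmed":
            pmids = [link.findtext("Id") for link in linksetdb.findall("Link")]
    return pmids

# Example call with a placeholder seed PMID:
# print(similar_article_pmids("12345678")[:10])
```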
Scopus Citing References. We set automated monthly alerts for Citing References in Scopus, using the same 48 seed articles that were used in the PubMed Similar Articles search. The Citing References function in Scopus allows users to view all articles that have cited a particular ‘seed’ article. The Citing References search cannot be restricted by date, but the monthly alerts reflected new citations from the previous month.
Reference management and screening
Following a pilot phase, we assigned a pair of reviewers to the management and screening of records retrieved from each of the search approaches. Pairs were matched for speed and accuracy, based on data collected during the pilot round. We approximated the approach to reference management and screening that may occur in a living SR. One reviewer in each pair received the automated search alerts via e-mail (or ran the search, for PubMed Similar Articles) and forwarded these to the other reviewer in the pair for screening. Duplicate records were not removed. Reviewers screened records independently, in duplicate, in a two-phase process (titles and abstracts, followed by full texts), and came to agreement on the studies included after full-text review. Reviewers screened records directly from the search alert e-mails.
At the end of the one-year period, a research librarian uploaded the results of the full search update to an EndNote (v.X7, Clarivate Analytics, Philadelphia, PA) library and removed duplicates. The records were transferred to a Microsoft Office Excel (v.2016, Microsoft Corporation, Redmond, WA) spreadsheet for screening. As with the other search approaches, records were screened independently by two reviewers. The final inclusion of studies in the SR was determined by consensus between the two reviewers. This was supplemented by scanning the reference lists of the included studies and of pertinent SRs identified by the search.
Data collection and analysis
Search performance. One reviewer documented the following in an Excel spreadsheet each month: the number of records (a) retrieved by the search, (b) screened by title and abstract, (c) reviewed by full text, and (d) included in the SR. As shown in Table 1, for each search approach we calculated performance metrics using standard formulae, as defined by Cooper et al. [18], and the proportion of studies missed compared to the reference standard.
Table 1
Definitions and formulae for the search performance metrics used to evaluate the complementary approaches
Performance metric a | Definition & formula
Proportion of studies missed b | Number of relevant studies not identified by the search, out of the total identified by the reference standard: (# relevant studies not identified by the complementary search approach / # studies included using the reference standard approach) × 100
Precision (specificity) | The number of relevant studies identified by the search, relative to the total number of records retrieved by the search: (# relevant studies identified / # records retrieved by the search) × 100
Sensitivity | The number of relevant studies correctly identified by the search, relative to the total number of relevant studies that exist (identified by the reference standard): (# relevant studies identified by the search / # studies included using the reference standard approach) × 100
Number needed to read (NNR) | The number of records identified by the search that need to be screened to locate one included study: 1 / precision
a Metrics were calculated for each of the complementary search approaches and compared to the reference standard approach.
b We planned to also record any additional studies located by a complementary method that were not located via the reference standard approach, but this was not applicable.
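To illustrate how the formulae in Table 1 are applied, the following sketch computes the four metrics from counts for a single complementary approach; the function and variable names are our own, and the example numbers are hypothetical.

```python
def search_performance(n_retrieved: int, n_relevant_found: int, n_relevant_reference: int) -> dict:
    """Compute the Table 1 metrics for one complementary search approach.

    n_retrieved          -- records retrieved by the complementary search
    n_relevant_found     -- included studies identified by the complementary search
    n_relevant_reference -- included studies identified by the reference standard search
    """
    missed = (n_relevant_reference - n_relevant_found) / n_relevant_reference * 100
    precision = n_relevant_found / n_retrieved * 100
    sensitivity = n_relevant_found / n_relevant_reference * 100
    # NNR is 1 / precision (with precision expressed as a proportion).
    nnr = n_retrieved / n_relevant_found if n_relevant_found else float("inf")
    return {"% missed": missed, "precision (%)": precision,
            "sensitivity (%)": sensitivity, "NNR": nnr}

# Hypothetical example: 500 records screened; 4 of the 5 reference-standard studies found.
print(search_performance(n_retrieved=500, n_relevant_found=4, n_relevant_reference=5))
```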
Impact on results and certainty of evidence. At the end of the one-year period, one reviewer extracted the following data from studies located via any of the search approaches, using a standardized form in Excel: publication characteristics (author, year, country, design, funding source, language); population (age, sex, setting [inpatient or outpatient]); intervention and comparator (drug, dose, timing, duration, mode of administration); co-interventions; and outcome data for the primary outcomes. A second reviewer verified the extraction.
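For illustration only, the extraction fields listed above could be captured in a structured record such as the sketch below; the class and field names are hypothetical shorthand rather than the labels used on our Excel form.

```python
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    """One row of the standardized extraction form (field names are illustrative)."""
    author: str
    year: int
    country: str
    design: str
    funding_source: str
    language: str
    population: dict          # e.g., {"age": ..., "sex": ..., "setting": "inpatient"}
    intervention: dict        # drug, dose, timing, duration, mode of administration
    comparator: dict
    co_interventions: list = field(default_factory=list)
    primary_outcomes: dict = field(default_factory=dict)  # admission rate, length of stay
```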
Two reviewers independently assessed the risk of bias of new included studies using the Cochrane Risk of Bias Tool (version 2011) [19]. We assessed trials to be at overall high risk of bias when any critical domain was judged to be at high risk of bias; at unclear risk of bias when any critical domain was judged to be at unclear risk and no domain was at high risk; and at low risk of bias when there were no concerns in any critical domain. Reviewers resolved disagreements by discussion.
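The overall risk-of-bias rule described above amounts to a simple decision function. The sketch below is intended only to make the rule explicit; it assumes per-domain judgements coded as 'high', 'unclear', or 'low', and the function name is our own.

```python
def overall_risk_of_bias(critical_domain_judgements: list[str]) -> str:
    """Derive the overall judgement from per-domain ratings ('high', 'unclear', 'low')."""
    if any(j == "high" for j in critical_domain_judgements):
        return "high"       # any critical domain at high risk -> overall high
    if any(j == "unclear" for j in critical_domain_judgements):
        return "unclear"    # no high-risk domain, at least one unclear -> overall unclear
    return "low"            # no concerns in any critical domain -> overall low

# Example: one critical domain unclear, the rest at low risk.
print(overall_risk_of_bias(["unclear", "low", "low", "low"]))  # -> "unclear"
```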
When new included studies were located by any search approach, we added the relevant study data to the pre-existing pairwise meta-analyses (any of the individual treatments vs. placebo) in Review Manager (RevMan v.5.3, The Nordic Cochrane Centre [Cochrane Collaboration], Copenhagen, Denmark). We pooled data using the DerSimonian and Laird random-effects model [20] and present the findings as mean differences (MD) with 95% confidence intervals (CIs). For each new meta-analysis, two reviewers independently appraised the outcome-level certainty of evidence using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach [21]. Discrepancies in ratings between the reviewers were resolved by discussion. For ease of interpretation, we present the results of the appraisals in GRADE summary of findings tables and explicitly report decisions to rate down the certainty of evidence. For each complementary search approach, we recorded the timing (i.e., the month) at which any changes to our classification of the results and the certainty of evidence occurred.
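For context, the DerSimonian and Laird estimator pools study-level effects with inverse-variance weights after estimating the between-study variance (tau-squared) from Cochran's Q. The sketch below is a minimal stand-alone illustration assuming per-study mean differences and standard errors as inputs; it is not a reproduction of our RevMan analyses, and the example values are hypothetical.

```python
import numpy as np

def dersimonian_laird(md, se):
    """Pool per-study mean differences (md) with standard errors (se)
    using the DerSimonian and Laird random-effects model."""
    md, se = np.asarray(md, float), np.asarray(se, float)
    w = 1.0 / se**2                           # inverse-variance (fixed-effect) weights
    md_fixed = np.sum(w * md) / np.sum(w)     # fixed-effect pooled estimate
    q = np.sum(w * (md - md_fixed)**2)        # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(md) - 1)) / c)  # between-study variance, truncated at 0
    w_re = 1.0 / (se**2 + tau2)               # random-effects weights
    pooled = np.sum(w_re * md) / np.sum(w_re)
    se_pooled = np.sqrt(1.0 / np.sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, tau2

# Hypothetical example: three trials reporting mean difference in length of stay (days).
md, ci, tau2 = dersimonian_laird(md=[-0.4, -0.1, -0.6], se=[0.2, 0.15, 0.3])
print(f"MD {md:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f}); tau^2 = {tau2:.3f}")
```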
Feasibility and time requirement. Throughout the year, reviewers kept a log in an Excel file of thoughts and experiences related to logistical challenges, opportunities, successes, and barriers. At the end of the one year of testing, the reviewers came to consensus on considerations for research groups undertaking living SRs, based on their experiences. We had planned to analyze the qualitative data thematically, but given the small amount of data collected, they were instead summarized narratively.
We had initially planned to use a time log in Google Forms to collect monthly data related to the search and screening process for each review team, to the closest 5 minutes per task. At the end of the project, it became apparent that time estimates collected using this method tended to be overestimates. We therefore instead assigned a standard time per record for screening, estimated from the time logs (0.5 minutes per title/abstract; 5 minutes per full text). This had the advantage of removing differences in the speed of the reviewer pairs as a source of confounding in our comparison. For each complementary search approach, we calculated descriptive statistics (i.e., medians, ranges) in Excel for the number of hours spent screening per month and over the one-year period. We retrospectively removed duplicates from the records retrieved via each complementary approach to estimate the number of duplicates screened using each approach.
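As a worked illustration of the standard-time approach, the sketch below converts monthly screening counts into hours using the assigned times (0.5 minutes per title/abstract; 5 minutes per full text) and summarizes them as a median, range, and yearly total; the monthly counts shown are hypothetical.

```python
from statistics import median

TITLE_ABSTRACT_MIN = 0.5   # minutes per title/abstract screened
FULL_TEXT_MIN = 5.0        # minutes per full text reviewed

def screening_hours(n_title_abstract: int, n_full_text: int) -> float:
    """Estimated screening time (hours) for one month of one search approach."""
    return (n_title_abstract * TITLE_ABSTRACT_MIN + n_full_text * FULL_TEXT_MIN) / 60

# Hypothetical monthly counts for one approach over a year: (title/abstracts, full texts)
monthly_counts = [(120, 2), (80, 1), (150, 3), (60, 0), (90, 1), (110, 2),
                  (70, 0), (130, 2), (100, 1), (85, 1), (95, 2), (140, 3)]
hours = [screening_hours(ta, ft) for ta, ft in monthly_counts]
print(f"median {median(hours):.1f} h/month, range {min(hours):.1f}-{max(hours):.1f} h, "
      f"total {sum(hours):.1f} h/year")
```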