Given the exploratory nature of the research question, we anticipated that formulating it according to the traditional ‘population, intervention, comparator and outcome’ (PICO) framework [7], using systematic review methods alone, would be either too restrictive (gonorrhoea-related conditions would have to be specified a priori, limiting the outcome) or too imprecise, resulting in a high number of hits of low specificity. To address this, we used an established scoping review methodology [8] combining three complementary approaches to search the published literature: a structured preliminary search, a traditional systematic search, and a novel AI-assisted Medline search, which we report here for the first time.
The research question was based on the ‘patient, concept and context’ (PCC) [9] structure: ‘Among persons infected with gonorrhoea, what is the range of clinical presentations, complications, coinfections and health outcomes that are associated with the infection?’ ‘Association’ was assumed to imply that gonorrhoea infection could be a plausible component along the causal pathway to the health outcome, either directly or indirectly [10, 11]. Identified conditions were then contextualized, according to known pathogenic processes, as being associated with primary urogenital, anorectal, oropharyngeal or conjunctival infection. To compile an initial list of key health problems associated with Ng infection, we first conducted a ‘high yield’ or ‘snowball’ search [12], accessing websites of major public health institutes in the United States (US), United Kingdom (UK), Germany and the Netherlands [13-17] to review current disease summaries and guidance on gonorrhoea (Supplementary text 1.1, Additional file 1). We pursued targeted Medline searches of key health problems based on the quoted references, knowledge of seminal authors and studies in the field, and the reference list of each paper (Supplementary text 1.2, Additional file 1). The resulting list of health problems was then compared against the existing compendia of clinical diagnoses related to gonorrhoea from the International Statistical Classification of Diseases and Related Health Problems, Ninth and Tenth Revisions (ICD-9/10) [18, 19] and Read diagnostic codes [20] (Supplementary tables 1–2, Additional file 1). The list of health problems was supplemented with ICD codes (used to systematically classify diseases, disorders, injuries and other health conditions) where necessary.
We then conducted a traditional systematic Medline search applying the PICO methodology [7], posing the broad question ‘In people exposed to Ng, what is the natural history of gonorrhoea infection?’ The search string was developed iteratively (Additional file 2), combining keywords and Medical Subject Headings (MeSH) terms identified from seminal references, which in turn resulted from the snowball search. Full-text articles were retrieved if the title and abstract specifically related aspects of the natural history or pathogenesis of Ng to clinical sequelae or health problems in humans. Only English language abstracts were included. No other limitations were applied. Reference lists were reviewed and full-text articles were accessed where relevant. The outcome of this search was used to provide a brief narrative summary of the key pathogenic processes associated with the complications and health problems identified, as well as to identify further health problems associated with Ng.
We supplemented the searches with Papyrus [21], a novel AI-assisted Medline search tool (Supplementary text 3.1–3.3, Additional file 3). A broad search query (‘gonorrhoea’) was run, identifying relevant literature with an English title and abstract. The tool used automatic natural language processing (NLP) methods, with pre-processing by the Stanford CoreNLP library [22] (see details in Supplementary text 3.1, Additional file 3), to extract ‘topic-words’ from all abstracts; these are typically nouns or expressions describing concepts related to gonorrhoea (e.g., ‘salpinx’). A vector space model was constructed and the ‘CoClus’ co-clustering algorithm [23] (see details in Supplementary text 3.2, Additional file 3) was applied to partition the vocabulary and the document set into topics, so that each topic comprises semantically related ‘topic-words’ and the documents that contain them. By analogy, applying the same approach to press articles might reveal, automatically and without prior knowledge, a topic whose most important words are ‘covid19’, ‘lockdown’, ‘mask’, ‘PCR’, ‘vaccine’, ‘test’, ‘layoff’, ‘stimulus’ and ‘bill’. Within each topic, the associated ‘topic-words’ are ranked by a score based on the frequency with which they occur in abstracts, reflecting their importance with respect to the given topic (Supplementary text 3.1, Additional file 3). Supplementary figure 1 (Additional file 1) shows an example of the raw textual output of the words listed under a topic, as extracted by the tool. Figure 2 shows the graphical user interface of Papyrus, which is composed of a topic map in the form of a mosaic of word clouds. Each rectangle is a topic grouping a subset of abstracts (e.g., outcomes related to urogenital Ng infections) and their most representative topic-words (‘ectopic pregnancy’, ‘endometritis’, ‘epididymitis’ and ‘salpinx’).
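To make the vector space model and the within-topic ranking more concrete, the following minimal Python sketch builds a document-term count matrix from toy abstracts and ranks the words of one topic. It is an illustration only and not the Papyrus implementation: the toy abstracts, the function names `build_vector_space` and `rank_topic_words`, and the choice of document frequency as the ranking score are all assumptions, and the co-clustering step itself (CoClus) is not reproduced but replaced by a hand-assigned topic.

```python
from collections import Counter

# Hypothetical toy abstracts, already reduced to candidate topic-words.
abstracts = {
    "doc1": ["salpinx", "endometritis", "ectopic pregnancy"],
    "doc2": ["epididymitis", "salpinx"],
    "doc3": ["conjunctivitis", "neonate"],
    "doc4": ["salpinx", "ectopic pregnancy", "salpinx"],
}


def build_vector_space(docs):
    """Return the sorted vocabulary and a document-term count matrix."""
    vocabulary = sorted({word for words in docs.values() for word in words})
    column = {word: i for i, word in enumerate(vocabulary)}
    matrix = {}
    for doc_id, words in docs.items():
        row = [0] * len(vocabulary)
        for word, count in Counter(words).items():
            row[column[word]] = count
        matrix[doc_id] = row
    return vocabulary, matrix


def rank_topic_words(vocabulary, matrix, topic_docs, topic_words):
    """Rank a topic's words by how many of the topic's abstracts contain them.

    Document frequency is an assumed stand-in for the tool's actual score.
    Ties are broken alphabetically.
    """
    scores = {}
    for word in topic_words:
        col = vocabulary.index(word)
        scores[word] = sum(1 for doc_id in topic_docs if matrix[doc_id][col] > 0)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vocabulary, matrix = build_vector_space(abstracts)

# Assumption: a co-clustering step has already grouped these documents
# and words into a single topic; here the grouping is assigned by hand.
topic_docs = ["doc1", "doc2", "doc4"]
topic_words = ["salpinx", "ectopic pregnancy", "endometritis", "epididymitis"]

ranking = rank_topic_words(vocabulary, matrix, topic_docs, topic_words)
for word, score in ranking:
    print(f"{word}: {score}")
```

In this toy example, ‘salpinx’ ranks first because it appears in all three abstracts of the hand-assigned topic; in a word cloud such a score would typically determine the display size of the word.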
As a first step, all ‘topic-words’ corresponding to each topic displayed in the topic map were extracted and screened manually and independently by two reviewers (JW and EB) for relevance to clinical and psychosocial gonorrhoea-related health outcomes. The papers corresponding to the agreed topic-words were eligible for inclusion if they described a potential association between primary gonorrhoea infection at any anatomic site and any clinical or psychosocial health outcome in women, men or children. Abstracts and titles were then manually screened and full-text articles were accessed only if the inclusion criteria were met.
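The independent double screening of topic-words can be pictured as a comparison of two sets. The sketch below is illustrative only: the reviewer selections are hypothetical, `compare_screening` is an assumed helper name, and how disagreements were resolved is not specified here, so disputed words are simply listed.

```python
# Hypothetical topic-word selections made independently by two reviewers.
reviewer_a = {"salpinx", "epididymitis", "endometritis", "arthritis"}
reviewer_b = {"salpinx", "epididymitis", "arthritis", "tenosynovitis"}


def compare_screening(first, second):
    """Return (agreed, disputed) topic-words as sorted lists."""
    agreed = first & second      # selected by both reviewers
    disputed = first ^ second    # selected by exactly one reviewer
    return sorted(agreed), sorted(disputed)


agreed, disputed = compare_screening(reviewer_a, reviewer_b)
print("agreed:", agreed)
print("disputed:", disputed)
```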
To map the list of health problems and outcomes associated with Ng infections, the outputs of the three approaches were cross-referenced, duplicate conditions and references were removed, and the results were reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) framework [24]. The retrieved reports were categorized by study design (e.g., cohort study/case control/literature review [Additional file 4]) and primary research papers were assigned a quality score according to the Scottish Intercollegiate Guidelines Network (SIGN) criteria [25]. As some health outcomes are serious but rare (e.g., disseminated gonorrhoea infection [DGI]), categories of evidence considered included case reports, case series (SIGN score of 3) and higher levels of evidence. Health outcomes identified through secondary reporting in review papers only (and not in primary research) were also included as we considered that primary research from the pre-antibiotic era may not have been indexed on PubMed. Where associated conditions were derived from the clinical compendia of ICD 9/10 or Read codes (classification of clinical terms for describing the care and treatment of patients), these were categorized separately based on the causal pathogenic pathway. To summarize the results, health outcomes with the highest level of supporting evidence (SIGN score) were selected for inclusion in an illustrative figure. The full evidence table was reviewed by an independent expert (MA) for the plausibility of association with gonorrhoea. All identified conditions, associated references, study design and SIGN scores are provided in Additional file 4.
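The cross-referencing and de-duplication step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the records are placeholders (‘Condition A’, ‘ref-1’), the helper names `normalise` and `merge_findings` are hypothetical, and name matching is reduced to case and whitespace normalisation. The ordering of SIGN levels of evidence (from 1++ down to 4) follows the SIGN grading system.

```python
# SIGN levels of evidence, ordered from strongest to weakest.
SIGN_ORDER = ["1++", "1+", "1-", "2++", "2+", "2-", "3", "4"]


def normalise(name):
    """Lower-case a condition name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def merge_findings(*sources):
    """Merge (source_name, records) pairs into one de-duplicated table.

    Each record is (condition, reference, sign_level or None).
    """
    merged = {}
    for source_name, records in sources:
        for condition, reference, sign in records:
            entry = merged.setdefault(
                normalise(condition),
                {"references": set(), "sources": set(), "best_sign": None},
            )
            entry["references"].add(reference)   # sets drop duplicate references
            entry["sources"].add(source_name)
            if sign is not None and (
                entry["best_sign"] is None
                or SIGN_ORDER.index(sign) < SIGN_ORDER.index(entry["best_sign"])
            ):
                entry["best_sign"] = sign        # keep the strongest evidence level
    return merged


# Placeholder records only; not data from the review.
snowball = [("Condition A", "ref-1", "2+"), ("Condition B", "ref-2", "3")]
systematic = [("condition a ", "ref-1", "2+"), ("Condition A", "ref-3", "2++")]
ai_assisted = [("condition  b", "ref-2", "3"), ("Condition C", "ref-4", None)]

merged = merge_findings(
    ("snowball", snowball),
    ("systematic", systematic),
    ("ai_assisted", ai_assisted),
)
for condition, entry in sorted(merged.items()):
    print(condition, sorted(entry["references"]), entry["best_sign"])
```

A record with no SIGN level (for example, a condition reported only in a review paper) is retained with `best_sign` left as `None`, so that it stays in the table without being ranked.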