Literature search overview
Our literature search identified 1737 unique references. After screening titles and abstracts, we excluded 1498 citations and assessed the full texts of the remaining 239 articles, of which we excluded 234 (Fig. 2). Most were excluded because the index test did not meet our criteria (n = 166): 23 studies of stationary colposcopes; 30 studies of low-magnification devices (VIA and visual inspection with Lugol’s iodine, smartphones, EVA™, Aviscope™, cervicscan and Magnivisualiser™); 21 studies in which the full colposcopy procedure was not carried out (e.g. only acetic acid was used, as with digital cervicography devices, smartphones and microscopes); and 92 studies of visual biopsy devices (e.g. artificial intelligence technologies, electrical impedance spectroscopy, confocal microscopy, Truscreen™ and sonoelastography). Six publications were ineligible because test accuracy data were missing [19–24], and seven were based on study populations already included in our analysis [17, 25–29]. A complete list of the excluded full texts and the reasons for their exclusion is given in Additional file 4.
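The flow counts above can be checked with simple arithmetic. The short Python sketch below tallies the reported numbers. Its variable names and category labels are illustrative shorthand and are not terms taken from the review protocol.

```python
# Tally of the PRISMA flow counts reported in the text (illustrative check only).
identified = 1737
excluded_at_screening = 1498
full_texts_assessed = identified - excluded_at_screening
excluded_full_texts = 234
included = full_texts_assessed - excluded_full_texts

# Full texts excluded because the index test did not meet the eligibility criteria.
index_test_exclusions = {
    "stationary colposcopes": 23,
    "low-magnification devices": 30,
    "incomplete colposcopy procedure": 21,
    "visual biopsy devices": 92,
}
index_test_total = sum(index_test_exclusions.values())

print(full_texts_assessed, included, index_test_total)
```

Running it prints `239 5 166`. These match the 239 full texts assessed, the five included studies and the 166 index-test exclusions reported above.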
Figure 2. PRISMA flow diagram of articles evaluated for inclusion and exclusion
From: Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group (2009). Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med 6(7): e1000097 [50].
We included five diagnostic test accuracy (DTA) studies. Table 1 shows the characteristics of these studies, which together included 2693 women. Four studies were conducted in low- and middle-income countries (LMICs): India [30], Bangladesh [31], Peru [32] and China [29]. One was conducted in a high-income country, Sweden [33]. All studies used a single-gate design [12]. One study estimated DTA for two methods of screening [29], and another estimated it for two different groups of providers (nurses and doctors) [31]. Four studies evaluated the Gynocular™ [30, 31, 33, 34] and one evaluated the Pocket device [32]. These devices have 4–12× and 3–30× optical magnification, respectively. All studies carried out the full colposcopy procedure outlined in the IARC manual Colposcopy and Treatment of Cervical Intraepithelial Neoplasia [5]. Investigators from two studies received funding from the device manufacturer for their contribution to the study. In the remaining studies that received funding, the manuscripts state that the funder played no role in planning or conducting the research or in writing the manuscript.
Table 1
Characteristics of included studies (all studies used biopsy to identify CIN2+)
First author / publication year | Clinical setting | Index test | Procedure as described in manuscript | Age Mean (SD) | Number of women receiving index test | Number of women receiving biopsy (%) | Number of women refusing biopsy | Biopsy indication | Person performing index test | Prior tests | Prevalence of CIN2+ n (%) | Funding |
First-line test |
Newman 2019 | Boashan, China | Gynocular™ | Traditional colposcopy; use of green filter not specified | 44.3 (6.7)** | 488 | 27 (5.5) | NR | Abnormal findings on colposcopy or abnormal cytology | Gynaecologists (1 of 2) | none | 3/488* (0.6%) | Two devices were donated to the study |
Mixed use: First-line test and add-on test |
Nessa 2014 Doctors | Dhaka, Bangladesh | Gynocular™ | Colposcopy IARC guidelines | 35.1 (8.1) | 932 | 228 (24.5) | 28 | Women who had a Swede score of greater than 4 | Gynaecologists or colposcopy-trained physicians (1 of 6) | VIA (n = 528) OR no screening (n = 404) | 39/932* (4.2%) | Two investigators were funded by the device manufacturer |
Nessa 2014 Nurses | Dhaka, Bangladesh | Gynocular™ | Colposcopy IARC guidelines | 35.1 (8.1) | 932 | 228 (24.5) | 28 | Women who had a Swede score of greater than 4 | Colposcopy-trained nurses (1 of 2) | VIA (n = 528) OR no screening (n = 404) | 39/932* (4.2%) | Two investigators were funded by the device manufacturer |
Add-on test only |
Banerjee 2018 | West Bengal, India | Gynocular™ | Colposcopy IARC guidelines | 39.2 (7.4)** | 1021 | 1020 (99.9) | 1 | All women who had the index test | Gynaecologist (uncertain how many gynaecologists performed the index test) | HPV AND/OR VIA (180 had VIA only) | 36/1021 (3.5%) | No funding |
Kallner 2015 | Stockholm, Sweden | Gynocular™ | Colposcopy IARC guidelines | 33.4 (9.9) | 123 | 113 (92.0) | NR | Women who had a Swede score of greater than 0 | Gynaecologist (1 of 6) | PAP smear AND HPV | 44/123* (35.7%) | One investigator was funded by the device manufacturer |
Mueller 2018 | Lima, Peru | Pocket | Colposcopy; green filter excluded as not standard practice in Lima | 37.1 (20–67)*** | 129 | 81 (62.7) | NR | Abnormal findings on colposcopy | Physicians (1 of 4) | HPV OR PAP smear | 22/129* (17.1%) | Two National Institutes of Health grants |
CIN2+, cervical intraepithelial neoplasia grade two and above; numbers in italics: calculated from extracted data; HPV, human papillomavirus; PAP smear, Papanicolaou smear test; VIA, visual inspection with acetic acid; IARC, International Agency for Research on Cancer; NR, not reported; SD, standard deviation. * Prevalence is based on the assumption that all women without biopsy were free from CIN2+; ** SD approximated from data on age categories; *** age range, as reported in the paper |
ADDITIONAL FILES
Additional file 1: “Protocol”. Protocol for the systematic review and meta-analysis.
Additional file 2: “PRISMA-DTA check-list”. Completed Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) checklist.
Additional file 3: “Medline Ovid search strategy”. Description of the Medline Ovid search strategy.
Additional file 4: “Full text assessments_explaination for exclusions”. Table of excluded full texts and explanations for their exclusion.
Additional file 5: “Paired forest plot for all Swede score studies”. Sensitivity and specificity estimates for all Swede score thresholds.
Additional file 6: “Quality of the eligible studies”. Quality of the eligible studies, scored using the QUADAS-2 criteria.
The studies evaluated test accuracy at different stages of the screening pathway (Fig. 1). The Pocket device was evaluated as an add-on test to HPV testing or PAP smear [32]. The Gynocular™ was evaluated as a first-line test [29] and as an add-on test to HPV testing, PAP smear or VIA [30, 33]. In one study, the Gynocular™ was used both as a first-line test among 404 women (43%) and as an add-on test after a positive VIA among 528 women (57%) [31]. Test accuracy estimates were not reported separately for these two subgroups, so the results could not be summarised with those of the other studies. The add-on studies performed colposcopy after a positive PAP smear, HPV and/or VIA test, and in these studies the prevalence of CIN2+ ranged from 3.5% [30] to 35.7% [33]. Prevalence was 0.6% in the study that assessed the device as a first-line test [29]. It was 4.2% in the study that used the device at both points in the screening pathway [31].
Test accuracy for the detection of CIN2+
Three of the four studies evaluating the Gynocular™ used the Swede scoring system to describe the colposcopy result [15]. We report sensitivity and specificity estimates for Swede score thresholds of five and above in Fig. 3, and for all thresholds in Additional file 5. In all studies, sensitivity decreased and specificity increased as the Swede score threshold increased. The threshold that optimised sensitivity and specificity was six in the three study populations assessed by doctors [30, 31, 33]. It was seven in the one population assessed by nurses [31].
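The review does not state here how the optimal Swede score threshold was chosen. One common criterion is to pick the threshold that maximises Youden's J (sensitivity + specificity - 1). The Python sketch below assumes that criterion and treats a score at or above the threshold as test-positive. The score distributions are hypothetical and do not come from any included study.

```python
def accuracy_at_threshold(diseased, healthy, threshold):
    """Sensitivity and specificity when a Swede score >= threshold is test-positive.

    diseased / healthy map each Swede score (0-10) to the number of women
    with / without CIN2+ who received that score.
    """
    tp = sum(n for score, n in diseased.items() if score >= threshold)
    fn = sum(diseased.values()) - tp
    tn = sum(n for score, n in healthy.items() if score < threshold)
    fp = sum(healthy.values()) - tn
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(diseased, healthy, thresholds=range(1, 11)):
    """Return (threshold, sensitivity, specificity) with the largest Youden's J.

    Ties keep the lowest threshold, because only a strictly larger J replaces it.
    """
    best = None
    for t in thresholds:
        sens, spec = accuracy_at_threshold(diseased, healthy, t)
        youden_j = sens + spec - 1
        if best is None or youden_j > best[3]:
            best = (t, sens, spec, youden_j)
    return best[:3]


# Hypothetical counts of women by Swede score; not data from any included study.
cin2_plus = {4: 1, 5: 2, 6: 3, 7: 2, 8: 2}
no_cin2_plus = {0: 20, 2: 25, 3: 20, 4: 15, 5: 10, 6: 6, 7: 3, 8: 1}

threshold, sens, spec = best_threshold(cin2_plus, no_cin2_plus)
print(f"Swede score >= {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made-up counts, the sketch prints `Swede score >= 5: sensitivity 0.90, specificity 0.80`. Raising the threshold trades sensitivity for specificity, as described above. Other criteria, such as fixing a minimum sensitivity, would select a different threshold.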
Figure 3. Paired forest plot for Swede scores five to ten
TP, true positive; FP, false positive; FN, false negative; TN, true negative; CI, confidence intervals
Figure 4 shows study estimates of sensitivity and specificity, stratified by stage in the clinical pathway. Few studies were available for each stage. We pooled results from three studies, including 1273 women, that used the index test as an add-on to any previous test. The pooled sensitivity was 0.79 (95% CI: 0.55–0.92) and the pooled specificity was 0.83 (95% CI: 0.59–0.94), with an AUC of 0.88 (95% CI: 0.85–0.90) (Fig. 5). However, the prediction interval indicates substantial variation between studies and imprecision in the pooled estimates. One study reported the sensitivity and specificity of the index test used as a first-line test: 0.33 (95% CI: 0.01–0.91) and 0.95 (95% CI: 0.93–0.97), respectively [29]. We did not pool estimates across different stages of the screening pathway.
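The very wide interval around the first-line sensitivity reflects the small number of CIN2+ cases in that study. The Python sketch below makes two assumptions. The first is that the estimate corresponds to 1 true positive among 3 CIN2+ cases, which is consistent with the reported 0.33 and with a prevalence of 0.6% among 488 women. The second is that an exact Clopper-Pearson interval was used; the review does not state its interval method here. The sketch finds each bound by bisection on the binomial CDF.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, tol=1e-10):
    """Root in [0, 1] of a function that increases with p, found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Point estimate and exact (Clopper-Pearson) CI for x successes in n trials."""
    # Lower bound solves P(X >= x | p) = alpha/2; this tail probability increases with p.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2; the CDF decreases with p, so negate it.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return x / n, lower, upper


est, lower, upper = clopper_pearson(1, 3)
print(f"{est:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

Under these assumptions, the sketch prints `0.33 (95% CI: 0.01-0.91)`, which matches the reported interval. With only three diseased women, a single additional true or false negative would move the estimate by a third.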
Figure 4. Paired forest plot of index test sensitivity and specificity stratified by clinical pathway
TP, true positive; FP, false positive; FN, false negative; TN, true negative; CI, confidence intervals
Figure 5. Bivariate model plot of add-on tests
1, Banerjee 2018; 2, Kallner 2015; 3, Mueller 2018; SENS, sensitivity; SPEC, specificity; AUC, area under the receiver operating characteristic curve; SROC, summary receiver operating characteristic
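The pooled add-on estimates come from a bivariate model, which jointly models logit sensitivity and logit specificity and the correlation between them across studies. The Python sketch below is a deliberately simplified illustration of random-effects pooling and is not the review's method. It pools a single proportion, such as sensitivity, with a univariate DerSimonian-Laird model on the logit scale, so it ignores the correlation that the bivariate model accounts for. The 0.5 continuity correction and the example counts are assumptions for illustration only.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1 / (1 + math.exp(-t))


def pool_logit_proportions(events, totals, z=1.96):
    """Univariate DerSimonian-Laird random-effects pooling of proportions.

    Returns (pooled proportion, lower CI, upper CI, tau^2) on the proportion scale.
    """
    ys, vs = [], []
    for x, n in zip(events, totals):
        a, b = x + 0.5, n - x + 0.5         # 0.5 continuity correction (assumption)
        ys.append(math.log(a / b))          # logit of the corrected proportion
        vs.append(1 / a + 1 / b)            # approximate variance of that logit
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]     # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(y_re), expit(y_re - z * se), expit(y_re + z * se), tau2


# Hypothetical true positives and CIN2+ totals for three studies (not review data).
pooled, lo, hi, tau2 = pool_logit_proportions([10, 20, 15], [12, 30, 25])
print(f"pooled {pooled:.2f} (95% CI: {lo:.2f}-{hi:.2f}), tau^2 = {tau2:.3f}")
```

A larger tau² widens the random-effects interval. This mirrors, in one dimension, the between-study variation that the wide prediction region in Figure 5 reflects.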
Quality assessment
Overall, the quality of the eligible studies was moderate. Assessment using the QUADAS-2 criteria identified common weaknesses in three domains: (i) patient selection, (ii) the index test, and (iii) the reference standard (Additional file 6).
None of the five included studies described its sampling strategy in detail, so it was unclear how the sample was derived (for example, by consecutive, random or convenience selection). Information about the target population was also missing, and no study considered whether the sample population was comparable to the target population. Data on excluded women were generally not available. In all studies, it was therefore unclear whether selection bias influenced the results.
Overall, the index test was conducted reasonably. However, in two studies (Nessa et al [31] and Kallner et al [33]), for 50% of women the same assessor performed stationary colposcopy immediately before the index test. This sequence might have influenced the assessment of the index test.
Several important issues were identified in the reference standard domain. Partial verification was identified in four of the five studies and was considered to carry a high risk of bias in three. We considered two studies, Banerjee et al and Kallner et al [30, 33], to have a low risk of bias in the reference standard domain, because in these studies more than 90% of women who received the index test also received the reference standard. In contrast, 63% of women received a biopsy in Mueller et al [32], 25% in Nessa et al [31], and only 6% in Newman et al [29]. Conduct of the reference standard was also problematic in two studies because of incorporation bias, in which investigators used the index test to determine both the need for the reference standard and the final diagnosis [31, 33]. These two studies used the Gynocular™ to assess the Swede score. They used thresholds of 1+ [33] and 5+ [31] to decide whether a biopsy was necessary. In contrast, two studies used alternative methods to indicate the need for biopsy. In Mueller et al [32], the need for biopsy was determined by a standard colposcopic examination, performed by assessors other than those performing the index test. In Newman et al [29], 488 women received the index test. Of these, 24 were biopsied following Gynocular™ examination, and a further seven were biopsied following a positive HPV test, cytology and stationary colposcopic examination. Women who were negative on the index test in this study therefore also received alternative tests, which reduced the risk of misclassification.
None of the studies reported verification of histopathological diagnoses as a quality-control measure to minimise misclassification.
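The verification proportions discussed above can be recomputed from the counts in Table 1. The Python sketch below does this and applies a simplified rule inferred from the text: more than 90% of tested women biopsied means a low risk of partial verification bias. That rule is an assumption for illustration, not the full QUADAS-2 judgement. Nessa 2014 appears once because its doctor and nurse rows share the same study population.

```python
# (women receiving the index test, women receiving biopsy), taken from Table 1.
STUDIES = {
    "Banerjee 2018": (1021, 1020),
    "Kallner 2015": (123, 113),
    "Mueller 2018": (129, 81),
    "Nessa 2014": (932, 228),
    "Newman 2019": (488, 27),
}

# Simplified rule (an assumption): more than 90% verified -> low risk of
# partial verification bias; otherwise high risk.
LOW_RISK_CUTOFF = 0.90


def verification_risk(tested: int, biopsied: int) -> tuple[float, str]:
    """Return the proportion of tested women who were biopsied and a risk label."""
    fraction = biopsied / tested
    return fraction, ("low" if fraction > LOW_RISK_CUTOFF else "high")


risk = {}
for name, (tested, biopsied) in STUDIES.items():
    fraction, label = verification_risk(tested, biopsied)
    risk[name] = label
    print(f"{name}: {fraction:.1%} of tested women biopsied, {label} risk")
```

Under this rule, Banerjee 2018 and Kallner 2015 are labelled low risk, and the other three studies are labelled high risk.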