Challenges of False Positive and Negative Results in Cervical Cancer Screening

Background: Liquid-based cytology (LBC) molecular testing for human papillomavirus (HPV) infection andcombinations are practical modalities for cervical-screening. While life-saving, false positive and negative results are possible, leading to potential over or under treatment. Quantifying this is complicated by the increasing number of options available, including parallel co-testing and sequential triage. As HPV vaccination rates increase, it also has a potential impact onscreening test performance and interpretation of results. Methods:A modelling approach was used to compare different screening modalities in terms of Cervical intra-epithelial neoplasia (CIN) grade 2 and 3 detected and missed, false positives leading to excess colposcopy, and number of tests required to achieve a given accuracy. The positive predictive value (PPV) and negative predictive value (NPV) of different modalities were simulated under varying levels of HPV vaccination. Results:The model suggested that in a cohort of 1000 women, LBC screening typically misses 4.9 cases (95% Con�dence Interval (CI) 3.5-6.7), with 95 (95% CI: 93-97%) excess colposcopies. With primary HPV testing, 2.0 (95% CI:1.9-2.1) were missed with 99 (95% CI:98-101) excess colposcopies. Co-testing reduced missed cases to 0.5 (95% CI:0.3-0.7) but dramatically increased excess colposcopy referral (184, 95% CI:182-188). Conversely, triage testing with re�ex screening substantially reduced excess colposcopy to 9.6 (95% CI:9.3-10) at the cost of missing more cases (6.4, 95% CI:5.1-8.0). Over a life-time of screening, women who always attend co-testing hada 93.8-100% chance of a false positive over screening life-time. For annual, 3-year, and 5-year triage testing (either LBC with HPV re�ex or vice-versa), lifetime risk of a false positive is 35.1%, 13.4%,


Background
The impact of cervical cancer screening programmes with conventional cytology has been dramatic, as shown in numerous national programmes.While its effect has been primarily restricted to squamous cell carcinoma (SCC) in women over the age of 25, the estimated 80% reduction in mortality in high-quality national screening programs illustrates its truly positive impact (1)(2)(3).This ability to save women's lives has been key in justifying the substantial cost associated with screening, which in 2004 was estimated at £36,000 per life saved in the UK (1).
The evolution of cervical cancer screening to include HPV testing is a desirable step (4).However, this evolution necessitates that critical lessons from history are not repeated.HPV DNA testing as a screening tool has already demonstrated superior sensitivity in detecting cervical intra-epithelial neoplasia (CIN) grade 2 and 3 (5).Less emphasis has been placed on the high prevalence of HPV infection, and that suboptimal implementation of primary HPV screening will result in increased referrals to colposcopy.This is especially relevant as national programmes transition to HPV testing, which necessitates re-education of the screened population regarding the bene ts and limitations of cervical screening, particularly with the meaning of a "normal smear" (6).
A central consideration is the disproportionate impact false negatives can have on women who attend for cervical screening and the need to determine what represents acceptable false negative rates within programmes (7,8).This is further complicated by the legal standard in the UK and Ireland that screeners have "absolute con dence" in a negative test (9, 10).
Screening tests are however far from perfect in the identi cation of disease.Cochrane review gures (11) indicate that for every 1000 women screened 20 will be diagnosed with cervical intra-epithelial neoplasia (CIN) of grades 2 or 3 (CIN2/3).Cytology alone will identify 15 women of these women and screening with HPV testing would identify an additional 3 that would have been missed with cytology (18 vs 15).This increased sensitivity of HPV testing comes at the expense of lower speci city than liquid-based cytology (LBC).Typically, HPV screening results in an additional 4 women in 1000 being told they have a suspected lesion with the subsequent potential for unnecessary, harmful interventions.
It is also worth looking at the future of screening and at the in uences that will affect its performance.
HPV vaccination is already markedly reducing the prevalence of CIN across the world.As uptake increases, prevalence of HPV infection drops.While relatively recent, HPV vaccination is having a dramatic impact; a recent study in a Scottish cohort found a reduction of 88% in cervical disease due to vaccination (12), and modelling studies suggest Australia could virtually eliminate cervical cancer in coming decades due to its pioneering adoption and long-term maintenance of a successful vaccination program (13).As prevalence falls, a positive result is increasingly likely to not re ect a true case.The positive predictive value (PPV) of any test is the probability that a positive result truly re ects underlying disease that needs to be discovered and treated.There is already evidence of vaccination in younger women decreasing PPV (12,14), which was predicted at the time the HPV vaccine was rst approved (15).Theoretically, one would expect an impact on negative predictive value (NPV), the likelihood that a negative result can be safely reassuring that no disease is present.Implications of increased vaccination uptake must also be considered for any viable screening modality, as these shape interpretation of results and clinical judgement.
In this work, we use a mathematical model to examine the nuances of screening programmes using different screening modalities, examining the impact of different strategies on CIN2/3 cases detected, missed, and excess referral to colposcopy.We demonstrate the impacts of strategies to reduce false negative rates and examine whether these approaches might potentially result in overtreatment.Finally, we also examine the impact of HPV vaccination on screening accuracy, and likely challenges for the future to maintain bene cial screening programs.

Screening considerations
Screening is performed on asymptomatic women, in the hope of detecting potential CIN2/3 lesions before they advance into cancer.There are three especially relevant parameters to any screening test and in our model: prevalence, sensitivity and speci city.
Prevalence (p) -The fraction of a given population that have a disease (in this case, CIN 2/3) Sensitivity (s n ) -The proportion of cases with disease (CIN2 + in this case) that are correctly identi ed as positive by the screen.A test with a high sensitivity has in consequence a low false negative rate.Speci city (s p ) -The proportion of individuals without disease correctly identi ed as such by screening, i.e., negative.A test with a high speci city means a test has a low rate of false positives.
Tests do not have perfect sensitivity nor speci city.Consequently, false positive results (incorrectly identifying lesion-free women as CIN2/3 or worse) and false negative results (missing instances of CIN2/3 or cancer) are unavoidable in practice.Their impact on clinical decisions is a function of sensitivity and speci city of the test, and prevalence of disease itself.The PPV and NPV are respectively the probability that a test positive is a true positive and that a test negative is a true negative and are de ned in more detail in the mathematical appendix.As prevalence falls, a positive result is increasingly likely to not re ect a true case, and PPV would thus decline.Conversely, as the prevalence of a disease decreases, the NPV of testing tends to increase.

Testing Modalities And Implementation
Primary testing, triage, and co-testing Worldwide, there are numerous different approaches to cervical screening, and these can differ even inside a country.We will thus concentrate on general methods for illustration.LBC Primary test-only is illustrated in Fig. 1(a), where a positive LBC test is referred to colposcopy.Conversely, primary LBC may be performed with re ex HPV test, where a ASC-US positive results on a smear are then HPV tested.For primary HPV testing, a positive result is followed by cytological examination of the same specimen, where If abnormalities are then detected, referral is made to colposcopy.Another option with some clinical use is co-testing, where both HPV and LBC tests are performed.Worldwide, there are different ways to manage co-test results, but only interpretation is that a positive result in either arm instigates an elevation to colposcopy, illustrated in Fig. 1(b).Finally, re ex triage approaches are illustrated in Fig. 1(c).where an LBC screen can be performed, and AS-CUS results interrogated with a re ex HPV test, with positive results referred to colposcopy, as is common in Australia.Alternatively, the converse can occur (HPV test with re ex LBC, as is the recommended case in Ireland).In this work, we simulate all these general approaches.

Repeat testing of negative results
Another potential option to reduce the risk of false negatives is to perform n retests of tests that are initially negative.As detailed in the mathematical appendix, multiple testing with LBC modalities constantly reduces the number of missed cases of CIN2/3 cells, but at the cost of increasing excess unnecessary colposcopy (increased false positives).To the author's knowledge such a methodology hasn't been previously implemented in precisely such a fashion, but it is worth noting that repeat testing of negative results is similar to asking women who screen negative to come back for another regular screen after a certain interval, similar to what is recommended in most countries.While the model here does not attempt to reproduce the natural history of HPV infection, it can still be employed to illuminate the clinical value and untoward consequences of multiple screenings for a woman over time if she has cervical disease or not.

The Impact Of The Hpv Vaccine On Screening Accuracy
The HPV vaccine will markedly reduced CIN prevalence.A reduction in prevalence, however, has implications for both the PPV and NPV of screening tests.To model the impact of vaccine uptake on screening performance, we took predictive data from 29 modelling studies as previously described in the literature(16) to estimate the expected HPV prevalence under different levels of vaccine uptake (presuming vaccination against subtypes 6,11, 16 and 18), and then calculated PPV and NPV for both HPV testing and LBC.

Brief Modelling Description
A mathematical model was constructed to simulate likely outcomes of different modalities and implementations, so that they could be cross-compared.Full mathematical details of the model are in the appendix, as well as details on parameter estimation.Outcomes were simulated for a hypothetical cohort of 1000 women with a CIN2 + prevalence of 2%.This is a simpli cation, as natural history models show signi cant variation in CIN2 + prevalence with age and nationality (17), but this gure for prevalence is broadly representative and appropriate to assess screening performance in a randomly selected cohort (11).Table 1 lists the parameter values used for simulations.In addition to test outcomes, the model is also capable of yielding cumulative life-time probability of a false positive for women without CIN 2/3 as a function of test and testing interval, for the scenario where her underlying health status does not change.Exact formula are given in the supplementary material, and using these methods, we also model the impact of different screening regimens in a typical population to compare their false negative and false positive risk as a function of screening frequency.

Model Outcomes
Different implementations were tested, with emphasis on the following aspects vital to any screening programme 1.

2.
Theoretical maximum excess colposcopy referrals per 1000 women screened (false positives leading to unnecessary colposcopy referral) 3.
The positive predictive value of a given implementation.

4.
The negative predictive value of a given implementation.

5.
Total number of tests undertaken per 1000 women.
Ideally, false negative (FN) rates should be as low as reasonably possible, so that the NPV can approach 100%.In practice this is impossible (see mathematical supplementary material), but in principle one could reduce the number of FNs by retesting negative results.We might instead de ne an appropriate threshold for "absolute" con dence in a negative result, below which we can conclude with extremely high certainty that a woman does not have CIN2+.As an example in our study, we selected a con dence threshold of t s < 0.1%, which is consistent with an NPV of greater than 99.99%, and is the threshold likely to be the foundation of some forthcoming US guidelines (22).Each screening round, however, CIN2/3 prevalence in the cohort changes from p o initially to p(n).This acts to reduce missed cases of CIN 2/3, but conversely increases false positives requiring colposcopy.The formula for this is outlined in the mathematical appendix.Here, we simulate likely impacts of cautionary re-testing of negatives, ascertaining how many screening rounds would be required to achieve a threshold of t s and the implications for colposcopy rates.

Cross comparison of screening modalities and implementations
Table 2 shows the modelled comparison of screening statistics for different modalities discussed.Due to low prevalence of CIN2/3, NPV is consistently high across all modalities (> 99% typically) with the exception of screening modalities with triage testing, where it is lower.Triage testing leads to more missed CIN2/3 cases, but drastically reduces excess colposcopy referrals; HPV with LBC re ex triage results in typically only 9.6 excess colposcopy referrals per 1000 women screened, but increases the number of false negatives (6.4 per 1000 women screened).For primary testing alone as illustrated in Fig. 2, HPV testing detects 19% more cases than LBC testing, but led to 4% more colposcopy referrals relative to LBC. a HPV tests alone without co-testing not typically employed, but shown for completeness b Secondary test used as a triage test for initially positive results.These results are for the rst round of screening plus triage and do not include subsequent management of triage negative women.
c In co-testing, both tests are performed and a positive on either is referred to colposcopy.Brackets show order of tests Primary testing however produces an abundance of false positives.This can be reduced with triage testing, as depicted in Fig. 2. Both triage modalities (HPV with LBC re ex or LBC with HPV re ex) lead to only 10% of the false positives of LBC, at the cost of detecting only 90% of the CIN 2/3 cases LBC would detect.More importantly, triage outcomes are the same regardless of the primary test.This is relevant, as it may have economic impact.For example, 1000 women with HPV primary would require 1000 HPV tests, and a follow up of 117 triage LBC tests.Conversely, an LBC primary with HPV re ex for 1000 women requires 1000 LBC and 110 follow up triage HPV tests, as derived from formula in mathematical appendix.
Co-testing can markedly increases the number of cases detected, detecting 29% more cases than LBC alone.This however comes at the cost of increasing the false positive rate by 94%.Table 3 shows the number of test iterations required to reach an NPV of greater than 99.99%.This analysis does not account for test interval or triage strategies, discussed below.

Implications of test modality and frequency on overscreening and missed positives
The accuracy of screening is also dependent on inter-test interval.Figure 2 shows outcomes for different modalities, depicting both the cumulative probability of a false positive result over a screening lifetime (assuming screening begins at 25 and ceases at 70) for a CIN2/3 negative woman and the probability of missing successive true and persistent CIN 2/3 (Fig. 2b).Intervals of 1, 3, and 5 years are shown.As can be seen clearly, triage testing results in much fewer false positives, but at the cost of missing a greater proportion of CIN2/3 cases.Co-testing by contrast, can drastically reduce the number of CIN2/3 cases missed, but at the cost of increasing the false positive rate markedly.This can be somewhat ameliorated by reducing screening frequency.
Triage tests themselves have some added nuance that must be considered.Table 4 depicts the likely outcome of HPV primary testing with LBC re ex, considering the expedited retesting process that results from a positive HPV infection status.In these instances, women are tested again 6-18 months after this result.The retest can take several forms; another HPV test (UK (23) and Australia (24)), an LBC (The Netherlands( 25)) or both (United States(26)).Outcomes and times to detection are shown in Table 4, which clearly illustrates that retesting with both HPV and LBC detects more cases than a HPV retest alone, and much more cases relative to a single LBC retest.Another important consideration for Triage tests is the primary test; while outcomes are the same, the order of testing slightly impacts the total number of tests undertaken, as shown in Table 3. Depending on the cost differential between HPV and LBC screens, this might be economically relevant, as explored further in discussion.Impact Of The Hpv Vaccine On Screening Accuracy selection of natural history models as previously described.As vaccination rates increase, PPV of both tests fall, while NPV rises.What is immediately clear, however, is that HPV testing is a superior modality as vaccination increases, as it results in much fewer false positives than LBC testing.This can further be improved by implementing triage testing, and the results of this simulation would strongly suggest HPV testing is a superior method of screening to employ as vaccination rates increase and HPV infection decreases.

Disccussion And Conclusion
Cervical screening is a life-saving intervention, but as the results of this analysis show, it must be applied judiciously to have maximum bene t, whilst minimising impacts of over-treatment in false positive cases.
In this work, we examined advantages and limitations of different screening methods.The improved con dence gained for the patient in a negative primary HPV result in comparison with cytology testing alone (performed at an equal frequency) is due to the HPV test's improved sensitivity (27,28).The minimal bene t accrued in terms of sensitivity for treatable cervical cancers from co-testing, as outlined in this work, means that the decision of which modality to use is not only a scienti c question, but one of appropriate allocation of limited public health resources rather than a scienti c one which screening strategy should be pursued (29).Primary HPV testing at a 3 year interval has been demonstrated to have at a minimum equivalence with 5 yearly co-testing (30).Co-testing has other drawbacks too, as outlined in the result section and later in this discussion.
Primary testing (21,31) produces an abundance of false positives, and this can certainly be reduced with triage testing, as depicted in Fig. 3.Both triage modalities (HPV with LBC re ex or LBC with HPV re ex) have only 10% the false positive rate of LBC, detecting 90% of the cases LBC would detect.As these approaches yield essentially the same result, then a screening programme could implement as their resources allowed without ill-effect.More importantly, triage outcomes are the same regardless of the primary test.As mentioned in the results section, this has economic implications.While the total number of tests required differs by only a small amount, if there is substantial differential costs between both modalities, then the optimum could be selected to minimise these without causing harm.If, for example, HPV tests were much costlier than LBC, then taking LBC primary approaches for triage would be costsaving.Alternatively, if cytology was a limiting resource, then a HPV primary approach might be more suitable.A move towards HPV testing modalities will likely be strengthen given the WHO's commitment to cervical cancer elimination.
The bene t of triage testing is the reduced number of excess colposcopies performed, but this comes at a slightly reduced detection rate.One potential approach to increase detection is to perform surveillance and expedited retesting of triage-negative women, and our results show that this approach should allow to eventually detect most prevalent CIN2/3 cases.Another potential approach to maximise detection rate is to perform co-testing, also shown in Fig. 2.This results in improved detection ratio relative to LBC primary (29% improvement) but at the cost of almost doubling the false positive ratio (95% increase).This is accordingly highly likely to prove excessively expensive, and ultimately detrimental to public health, as the increased rate of detection is associated with an ampli ed false positive rate.While this is a modality reduces missed cases of CIN2/3 cells, this analysis suggests it would not be viable, resulting in needless harm, as other authors have warned (32).This raises important ethical questions regarding the safety of any screening program.When an asymptomatic population are invited into a screening program, there remains an ethical obligation that they exit the program with a reduced cancer risk and minimal harm.Increasing referrals to colposcopy is likely to lead to over-treatment of dysplastic lesions with associated impact on fertility and obstetric outcomes, including a 2 fold increased risk of preterm birth (33).Over-diagnosis resulting from screening has long been recognised as a serious issue (17,34) with screening programmes, although it is one that remains di cult to quantify (35).The results of this work should be useful in elucidating potential harms and bene ts.
While the model presented here is useful for quantifying detection statistics, it is important to consider the limitations of this analysis.For the false positive and negative life-time probability, we did not model natural history, and there is an implicit assumption that test results are independent from previous test results.The model is not adapted to predict the accuracy of screening at different ages, or for assessing the risk of progression/regression of CIN between screening opportunities.However, the results are likely a good approximation for the worst case scenarios, i.e. the risk for a woman with persistent CIN2/3 and the risks of screening for a woman who remains negative throughout her screening journey.This important assumption requires consideration as it is plausible that there are simply some CIN2/3 lesions that may never be detected with cytology or HPV testing due to characteristics of the lesion, such as low volume or low viral load.This in uences the cumulative probability of a false negative/positive.
A crucial point to acknowledge is that all screening modalities have inherent limitations -those which maximise detection are most likely to lead to false positives, and those which reduce the incidences of false detection also reduce CIN2/3 detection.However, there are serious caveats to this that must be considered.Considering HPV triage with LBC screening in Table 4, it is immediately apparent that expedited re-testing of HPV positive results outside of the regular screening cycle (6-18 months) goes a long way towards ameliorating the reduced detection ratio of triage tests, whilst minimising false positives and excess colposcopy referrals.This analysis also suggests that LBC only retesting of triage results tends to detect less disease than HPV retesting, or both HPV and LBC retesting.
In designing a screening programme, one must be cognisant of the potential harms as much as bene ts.
The advent of HPV testing has huge implications for cervical screening (11) (36), the implications of which are quanti ed further in this paper.The question of testing intervals was beyond the scope of this work, but was brie y alluded to in the analysis of false positive and false-negative cumulative probability illustrated for 1 year, 3 year, and 5 years intervals in Figs. 2 and 3. A recent French study (37) found that reducing testing window interval does more harm than good, leading to over-screening with needless risk, excess costs, and over-treatment.Other authors(28) have suggested that re-screening after a negative primary HPV screen should occur no sooner than every 3 years, and with Dillner et al 29 reporting that intervals of even 6 year were safe and effective.2018 US guidelines for HPV screening recommend a minimum interval of 5 years (26) between routine screening tests.As this analysis illustrates, we would expect in most cases that extending this interval has only a miniscule impact on missed CIN2/3 rate, while substantially reducing false positives.
As HPV testing becomes cheaper and more common, it is vital to consider how they are best implemented into screening.Evidence from recent multicentre studies (5,38) indicate that HPV-based screening provides greater protection against invasive cervical carcinomas relative to cytology.In those studies, recorded cumulative incidence of cervical cancer was lower 5.5 years after a negative HPV test than 3.5 years after a negative cytology result.This indicates empirically that 5-year intervals for HPV screening are safer than 3-year intervals for cytology.Results in this work support the hypothesis that HPV screening every 5 years could reduce the number of unnecessary colposcopy and biopsy procedures compared to frequent cytology, cutting costs and invasive unnecessary procedures.There is also ample evidence that negative HPV tests provide greater reassurance of low abnormality risk than negative cytology results(28, 39), with authors suggesting that primary HPV screening can be considered as an alternative to current US cytology-based cervical cancer screening methods.Certainly, the results of this analysis support the contention that HPV testing can strongly increase predictive power of screening tests, and when correctly deployed can also reduce potential harms of over-screening.
It is vital to look towards the future of cervical screening.The staggering international success of the HPV vaccine is already apparent (40), and countries with high update of the HPV vaccine are already seeing a precipitous drop in rates of precancer and abnormal cervical cells.The falling prevalence of CIN2/3 has deep implications for how we interpret future tests.As Fig. 3 demonstrates, the chief impact of falling HPV infection rates is that across all modalities, positive results are less likely to be informative.Conversely, our con dence in negative results increases as vaccine rates increase.It is clear that HPV testing is superior as infection rates fall, and results in much fewer false positives relative to LBC testing.This is likely to be important in planning the future evolution of screening programmes.The model outlined in this work has application here too, and can be employed to predict the con dence one should afford a particular screening result under varying levels of population prevalence.
The requirement to both educate women and provide accurate information on the impact of any alteration to screening programmes can be illustrated by the psychosocial impact the addition to or replacement of LBC with HPV for primary screening or for triage which can cause additional stress and anxiety for those participating in the screening programme (41)(42)(43).Unfortunately the discrepancy between society's expectation of screening programmes and actual sensitivities exist demonstrating the importance of public education (44).It is worth noting too that physicians and healthcare professionals are frequently under informed about the bene ts and limitations of screening programmes (45,46), and confusion can easily arise.While screening is an extraordinary measure that saves lives, it is important to understand its fundamental limitations and nuances, so that maximum bene t can be derived from any national programme and misunderstandings minimised.
Cervical screening comes with some inherent uncertainty, irrespective of the modality employed.While this work should help elucidate some optimum strategies for screening, the reality is that screening, while life-saving, cannot be expected to be perfect.It is worth being clear that perfect detection is a mathematical impossibility, and this is demonstrated in the appendix.There is an inherent trade-off in strategies to increase detection, as they inevitably lead to a disproportionate rise in false positives, with needless over-treatment.This is particularly relevant in the context of the legal requirement in some jurisdictions, such as Ireland where following legal action over missed cancer diagnosis, the high court ruled that screeners must have 'absolute con dence' in negative results(10), despite multiple investigation showing the labs in question were operating to high standard and no negligence was committed.
Such a stipulation is impossible, and as this analysis shows, even striving to get arbitrarily close to this standard is likely to result in more harm than good.This is neither conducive to public health, nor sustainable.It also has potential to muddy public expectation and understanding of screening, and what it can realistically achieve.Screening is a vital undertaking if we are to reduce cervical cancer mortality, and its strengths and limitations must be seen in context so that bene t can be maximised.The results of this analysis should prove useful in optimising approaches and demonstrating the complexities of different implementations so informed decisions can be made.

Figure 3 (
Figure 3(a) and 3(b) depict the impact of screening on the number of false positives (excess colposcopy) for HPV, LBC, and Triage testing as vaccination rates increase.Figure 3(c) and (d) show the PPV and NPV change with vaccination rates for HPV and LBC tests.Con dence envelopes were derived from a Abbreviations

Figures
Figures

Flowcharts
for (a) Primary testing only (b) Co-testing (c) Triage testing.Note that these are general schemas, and that there is large variation worldwide in exact implementation.

Table 3
Screening rounds required to achieve t s < 0.1% (NPV > 99.99%) a Triage results not shown, as repeated triage does not increase NPV.See text for details.b Total tests is the number of tests of initial type (1000) plus follow-up type.For example, Co-testing HPV/LBC implies 1000 HPV tests (initial) plus 883 (881-884) LBC tests.c HPV tests alone without co-testing not typically employed, but shown for completeness

Table 4
Possible Triage outcomes with expedited retesting for a woman with CIN 2/3

Table 3 and
Fig.2also show that while multiple screening rounds increase detection rate, it does so at the cost of increasing over-diagnosis and expense.For example, three rounds of LBC improves detection by 30% relative to a single screen, but requires a factor of 2.69 more tests, with 172% the false positive rate, rendering it an unsustainable approach.