Accuracy of Self-report for Cervical and Breast Cancer Screening: A Systematic Review and Meta-analysis

Background: Guideline-based breast and cervical cancer screenings are fundamental components of high-quality preventive women’s health care services. Accurate measurement of screening rates is vital to ensure all women are adequately screened. Our systematic review and meta-analysis aims to provide an updated synthesis of the evidence on the accuracy of self-reported measures of cervical and breast cancer screening compared to medical records. Methods: To identify studies, we searched MEDLINE®, Cochrane Database of Systematic Reviews, and other sources up to July 2019. Two reviewers sequentially selected studies, abstracted data, and assessed internal validity and strength of the evidence. Adjusted summary numbers for sensitivity and specificity were calculated using a bivariate random-effects meta-analysis. Results: Unscreened women tended to over-report screening among 39 included studies examining the accuracy of self-report for cervical and/or breast cancer screening. The specificity of self-report was 48% (95% CI 41 to 56) for cervical cancer screening and 61% (95% CI 53 to 69) for breast cancer screening, while the sensitivity of self-report was much higher at 96% (95% CI 94 to 97) for cervical cancer screening and 96% (95% CI 95 to 98) for breast cancer screening. We have moderate confidence in these findings, as they come from a large number of studies directly assessing the accuracy of self-report compared to medical records and are consistent with findings from a previous meta-analysis. Conclusions: Unscreened women tend to over-report cervical and breast cancer screening, while screened women more accurately report their screening. Future research should focus on assessing the impact of over-reporting on clinical and system-level outcomes.

knowledge, attitudes, or beliefs about screening; lack of access to a primary care physician for screening referral; lack of insurance; or lack of transportation to and from appointments (4,5) may hinder women from getting screened (6,7). These barriers may be more pronounced among minority and low-income women, leading to racial and economic disparities in receipt of care and worse health outcomes (8-13).
Accurate measurement of cancer screening rates is vital to identify inadequate screening, determine funding priorities and national health goals, inform initiatives to increase the uptake of screening, and determine if disparities exist in receipt of screening. Methods to collect cancer screening data range from population-based mail or telephone surveys relying on patient self-report (14) to specific medical record documentation requirements for quality reporting (15) (including physician-recorded data entered from patient self-report and pathology, lab, or radiology reports confirming that a test was performed). Different methods of measuring cancer screening rates have trade-offs to consider. Given that most individuals in the US receive health care from multiple providers, clinics, and hospitals, obtaining specific medical record documentation of prior screening may be time-consuming and costly. For these reasons, most cancer screening data rely on patient self-report (14,16). However, self-reported screening data is subject to potential conscious and unconscious biases, including telescoping (remembering an event occurring sooner than it actually occurred), (17) acquiescence bias (the tendency to respond "yes" when in doubt), (18) and social desirability response bias (over-reporting of events that are socially desirable), (19,20) which have led to concerns about the accuracy of self-report data (21,22). Concerns about the accuracy of self-report data and uncertainty about the balance between the trade-offs of utilizing self-report or medical record documentation highlight the need to understand both the accuracy of patient self-report compared to medical records, as well as clinical and system-level impacts (i.e., over- or under-screening, missed diagnoses, and provider, administrative, or patient burden) of accepting self-report.
Previous systematic reviews (23,24) on the accuracy of self-report for breast and cervical cancer screening found that women tend to over-report cancer screenings. However, several subsequent studies have emerged since the publication of these reviews. Additionally, these reviews did not assess the impacts of accepting self-report on clinical and system-level outcomes. Our systematic review and meta-analysis aims to provide an updated synthesis of the evidence on the accuracy of self-reported measures of cervical and breast cancer screening compared to medical records, as well as the clinical and system-level impacts of relying on patient self-report instead of medical record documentation.

Methods
This paper is based on our full evidence report (25) and was reported and guided by current systematic review standards and guidelines (26, 27). The complete description of our methods can be found on the international prospective register of systematic reviews (PROSPERO; http://www.crd.york.ac.uk/PROSPERO/; registration number CRD42019116781) and in our full evidence report (25).

Searching and Study Selection
To identify relevant articles, our research librarian searched Medline, CINAHL, PsycINFO, Cochrane Central Register of Controlled Trials (CCRCT), and Cochrane Database of Systematic Reviews (CDSR), using terms for self-report and screening, mammography, pap smear, breast cancer screening, and cervical cancer screening from 2005 through July 2019. We relied on existing systematic reviews (23,24) to identify studies published prior to 2005. We identified additional citations by hand-searching reference lists and consulting with content experts. We limited the search to published and indexed articles involving human subjects available in the English language.
We included studies that assessed the accuracy of self-report compared to medical records in women reporting cervical and breast cancer screening. We excluded studies in women identified as high risk (i.e., genetic mutation carriers, (28) patients with abnormal previous screens (29)) or with a history or diagnosis of breast or cervical cancer. These populations may be expected to be more involved in screening and treatment planning and more accurately report screening. We excluded studies that did not provide enough data to allow for calculation of diagnostic accuracy characteristics (i.e., sensitivity, specificity, positive predictive value, negative predictive value). Two investigators independently reviewed titles and abstracts and used sequential review (one investigator completing the first review, and a second investigator checking) for full-text articles. Investigators resolved all disagreements by consensus.

Quality Assessment and Synthesis
We used predefined criteria to rate the internal validity of all included studies. We used the Cochrane ROBIS tool to rate the internal validity of systematic reviews (30) and the QUADAS-2 tool to rate the internal validity of diagnostic accuracy studies (31). We abstracted data from all studies for prespecified study and patient characteristics of interest and results for each included outcome. We calculated sensitivity, specificity, positive predictive value, negative predictive value, concordance, and report to record (Rep/Rec) ratio for all included studies. Two investigators sequentially reviewed all data abstraction and internal validity ratings and resolved all disagreements by consensus.
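To make the accuracy metrics above concrete, the following is a minimal sketch (in Python, not the software used in the review) of how each metric derives from a 2×2 table of self-report versus medical record; the counts shown are hypothetical, chosen only for illustration.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Diagnostic accuracy of self-report against the medical record.

    tp: reported screened, record confirms
    fp: reported screened, record does not confirm
    fn: reported unscreened, record shows a screening
    tn: reported unscreened, record confirms no screening
    """
    sensitivity = tp / (tp + fn)   # screened women who report screening
    specificity = tn / (tn + fp)   # unscreened women who report no screening
    ppv = tp / (tp + fp)           # positive reports confirmed by the record
    npv = tn / (tn + fn)           # negative reports confirmed by the record
    concordance = (tp + tn) / (tp + fp + fn + tn)  # overall agreement
    rep_rec = (tp + fp) / (tp + fn)  # self-reported screens / record-documented screens
    return {"sens": sensitivity, "spec": specificity, "ppv": ppv,
            "npv": npv, "concordance": concordance, "rep_rec": rep_rec}

# Hypothetical cohort: 500 women screened per the record, 100 not screened
m = accuracy_metrics(tp=480, fp=52, fn=20, tn=48)
```

A Rep/Rec ratio above 1 signals net over-reporting: more women report a screening than the record documents.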
We informally graded the strength of the evidence based on the AHRQ Methods Guide for Comparative Effectiveness Reviews (32) by considering risk of bias (includes study design and aggregate quality), consistency, directness, and precision of the evidence. Ratings typically range from high to insufficient, reflecting our confidence that the evidence reflects the true effect. For this review, we applied the following general algorithm: evidence comprising multiple large studies with moderate risk of bias and inconsistent findings (significant unexplained heterogeneity) received a rating of "low strength" or lower. We assigned the same type of evidence but with consistent findings (limited or explained heterogeneity) a rating of "moderate strength" or higher.

Statistical Analysis
Where studies were appropriately homogenous, we synthesized outcome data quantitatively using STATA v.15 (StataCorp, College Station, TX). We calculated pooled adjusted summary numbers for sensitivity and specificity using a bivariate random-effects meta-analysis and positive predictive values (PPV) and negative predictive values (NPV) using the midas command (generates summary predictive values) (33). We calculated summary numbers for concordance and the Rep/Rec ratio by averaging across studies. We also conducted univariable bivariate and multivariable univariate meta-regressions to examine the impact of prespecified moderator variables, including age, publication year, sample size, study design (case-control vs. cohort), setting (population- vs. clinic-based), referral or screening program (whether the patients were part of a screening or referral program vs. not), survey administration method (in-person vs. telephone or mail survey), timeframe of recall (≤ 1 year to ever), minority status, and low-income status. We assessed heterogeneity using the Q statistic and the I² statistic and publication bias using Deeks' test (34).
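To illustrate the pooling idea, the sketch below shows univariate DerSimonian-Laird random-effects pooling of proportions on the logit scale, with Cochran's Q and I². This is a deliberate simplification: the bivariate model used in the review pools logit sensitivity and logit specificity jointly, accounting for their correlation, and the midas implementation differs in detail. The study counts are hypothetical.

```python
import math

def pooled_logit_dl(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    Returns (pooled proportion, Cochran's Q, I^2, tau^2). A univariate
    simplification of the bivariate model described in the text.
    """
    # Logit transform with a 0.5 continuity correction to guard against zero cells
    y = [math.log((e + 0.5) / (n - e + 0.5)) for e, n in zip(events, totals)]
    v = [1 / (e + 0.5) + 1 / (n - e + 0.5) for e, n in zip(events, totals)]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and the between-study variance tau^2 (method of moments)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights, pooled estimate, back-transform to a proportion
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # I^2 heterogeneity statistic
    return 1 / (1 + math.exp(-mu)), q, i2, tau2

# Hypothetical specificity data: true negatives / total unscreened, per study
pooled, q, i2, tau2 = pooled_logit_dl([30, 55, 90], [100, 100, 120])
```

The logit transform keeps the pooled estimate inside (0, 1), and tau² inflates the study variances so that heterogeneous studies receive more even weights.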
Results

Accuracy of Self-reported Measures of Cervical and Breast Cancer Screening Compared to Medical Records

Overall, we have moderate confidence (strength of evidence) in findings from these studies, as they come from a large number of studies which directly assessed the accuracy of self-report compared to medical records. Most studies had unclear risk of bias (QUADAS-2) due to unclear blinding of data collectors or survey facilitators and incomplete medical record review. This may have led to potential biases in the collection of self-report and medical record data or to not obtaining medical record documentation of all screenings (Figure 2). The studies showed significant heterogeneity, but this is likely due to differences among the populations studied (e.g., a study in low-income Black women vs. a population-based study in a Scandinavian country). Although heterogeneity remained in subgroup analyses, this may be due to our inability to adjust for age or differences between low-income or minority populations because of inadequate reporting of detailed patient characteristics in the studies.
Women tended to over-report receipt of cervical and breast cancer screenings in studies reporting the accuracy of cervical or breast cancer screening self-report compared to medical record (Table 2). The adjusted pooled specificity of self-report was 48% (95% CI 41 to 56) for cervical cancer screening and 61% (95% CI 53 to 69) for breast cancer screening (Online Resources 1 & 2). This indicates that about 40% to 50% of women without screenings in their medical record inaccurately reported having a screening. The adjusted pooled sensitivity of self-report was much higher at 96% (95% CI 94 to 97) for cervical cancer screening and 96% (95% CI 95 to 98) for breast cancer screening (Online Resources 3 & 4). This indicates that women with screenings in their medical record more accurately report their screening. These findings are consistent with previous meta-analyses (23,24). Taken together, the average overall agreement (concordance) between self-report (both positive and negative) and medical record was 81% (cervical cancer) to 82% (breast cancer).
Both sensitivity (χ² P < .001, I² = 95.7 to 99.5) and specificity (χ² P < .001, I² = 94.1 to 99.7) outcomes showed considerable heterogeneity. We found no evidence of publication bias (P = .131) among cervical cancer studies, but we found evidence of publication bias (P < .001) among breast cancer studies. This may suggest that small studies that did not find a significant difference between self-report and the medical record may have been less likely to be published compared to those that found a significant difference. However, the assessment of the potential for publication bias is more complicated for diagnostic accuracy studies than for studies of interventions, because there may not be a clearly favored finding in these studies. Empirical evidence for the existence of publication bias in this area of literature is scarce (34).
The positive predictive value (PPV) for cervical and breast cancer self-report ranged from 80% to 84%, and the negative predictive value (NPV) ranged from 83% to 86% (Table 2). This indicates that the medical record verified most positive and negative reports of screening. The PPV and NPV are dependent upon the screening rate in the population, with a higher screening prevalence leading to higher PPV and lower NPV. The screening prevalence in the included studies (as verified by medical record) was fairly high at 74% for cervical cancer and 84% for breast cancer, which may have led to the high observed PPV and NPV.
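The prevalence dependence of PPV and NPV follows directly from Bayes' theorem, as the brief sketch below shows. It uses the pooled cervical cancer estimates (sensitivity 0.96, specificity 0.48) as inputs; the lower prevalence value is hypothetical, included only for contrast, and these direct calculations are not the midas-generated summary predictive values reported in Table 2.

```python
def predictive_values(sens, spec, prev):
    """PPV and NPV as functions of screening prevalence, via Bayes' theorem."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# Pooled cervical cancer estimates: sensitivity 0.96, specificity 0.48
high = predictive_values(0.96, 0.48, prev=0.74)  # prevalence as in included studies
low = predictive_values(0.96, 0.48, prev=0.40)   # hypothetical lower prevalence
# Higher screening prevalence raises PPV and lowers NPV
```

In a population where fewer women are actually screened, a positive self-report is confirmed by the record less often, even with identical sensitivity and specificity.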

Differences in the Accuracy of Self-reported Measures of Cervical and Breast Cancer Screening by Patient or Measure Characteristics or Setting
We evaluated the effects of prespecified variables (publication year, sample size, study design (case-control vs. cohort), setting (population- vs. clinic-based), referral or screening program (whether the patients were a part of a screening or referral program vs. not), survey administration method (in-person vs. telephone or mail survey), timeframe of recall (≤ 1 year to ever), and population (minority or low-income population vs. not)) using univariable bivariate and multivariable univariate meta-regression models. Due to a lack of comprehensive reporting of specific moderators in primary studies, we did not include a variable for age in our meta-regressions, and we combined minority status and low-income status into a single variable of "population".
Results of multivariable univariate models suggest that minority or low-income women may have less accurate self-report than the general population for both cervical and breast cancer screening, but the clinical significance of these differences is unclear. For breast cancer screening, sample size was also significantly associated with the accuracy of self-report, but this is likely due to the inclusion of a single study (N > 400,000) that was larger than all other studies combined. When we evaluated sensitivity and specificity using a univariable bivariate model, several factors (study design, setting, referral or screening program, survey administration method, and population) were significantly associated with sensitivity for cervical and breast cancer screening studies. In contrast, no factors showed a statistically significant effect on specificity for cervical cancer screening studies, and only population had a significant effect on specificity for breast cancer screening studies. The implications of the impact of these factors on sensitivity are unclear, as we found the results between studies for sensitivity to be similar (tightly clustered), so minor differences resulted in statistical significance. In contrast, results for specificity were highly varied, so larger differences were necessary to reach statistical significance. Additionally, the sensitivity among studies was high (sensitivity range 94% to 97%), and the absolute differences were equal to or less than 3%, which may not be clinically significant.

Clinical and System-Level Impacts of Using Self-reported Measures of Cervical and Breast Cancer Screening

No studies reported clinical or system-level outcomes. However, study authors frequently hypothesized potential adverse events or unintended consequences of using self-report, including overestimation of the success of screening interventions, masking of disparities in screening prevalence, less frequent screening, and risk of missed cancer diagnoses. Mention of adverse events or unintended consequences of using medical records was less frequent, but included the time and resource burden of tracking medical records and the potential for inaccuracies in medical records.

Discussion
It is essential for providers and policymakers to understand the accuracy of self-reported cancer screening rates as well as the clinical and system-level impacts of different documentation requirements. This systematic review updates the most recent syntheses of the evidence (23,24) with 11 new studies on the accuracy of self-reported measures of cervical and breast cancer screening, including the clinical and system-level impacts of relying on patient self-report instead of medical record documentation. Our meta-analysis confirmed over-reporting of cervical and breast cancer screening among women with no screening identified in their medical record. Only 48% to 61% (specificity) of unscreened women accurately reported not having a screening, but 96% (sensitivity) of screened women accurately reported their screening. These findings are consistent with the most recent meta-analyses, (23,24) indicating a fairly stable level of over-reporting of cervical and breast cancer screening among unscreened women.
We found that several factors had a significant effect on the accuracy of self-report for both cervical and breast cancer screenings. In adjusted models, studies in minority or low-income populations were more likely to find less accurate self-report compared to medical records. However, the clinical significance of these differences is unclear. This greater over-reporting in minority or low-income populations may be due to lower health literacy, (72) cultural factors, (73,74) or greater barriers to healthcare access in general, thus increasing the number of unscreened women (12,13,73-75). The National Breast and Cervical Cancer Early Detection Program (NBCCEDP) was established in the US in 1991 to improve screening rates among low-income and uninsured women (76). Despite these efforts, the percentage of eligible women screened remains low and further efforts to improve screening uptake in this population are warranted (12,13).
The over-reporting of screening among unscreened women can potentially be improved in several ways. Use of electronic medical records as opposed to paper records may improve self-report accuracy by improving documentation accuracy and access (61). Additionally, ensuring respondents understand the questions (i.e., including detailed explanations of what the tests entail); phrasing survey questions to minimize social desirability bias (i.e., asking about barriers or future plans prior to asking about past behaviors) (22); and having clear, exhaustive, mutually exclusive response categories can improve patient recall. Several of the included studies showed evidence of telescoping, in which women recalled screening tests as occurring more recently than indicated by medical record (44,50,51). Some of this bias may be alleviated by expanding the timeframe of recall to allow some room for error in recall of the timing of screenings (77). Self-report for cervical cancer screening may also be improved by explaining in detail the procedures performed during regular gynecological exams. The accuracy of self-report was lower for pap test recall than for mammograms. This may be due to the fact that patients often receive pap tests as a part of routine gynecological exams, and may falsely assume that they received a pap test when they did not (41). In contrast, patients typically receive mammograms as separately scheduled appointments, specifically for the purpose of completing a mammogram.
We did not find any studies reporting clinical or system-level impacts of accepting self-report, including the potential for over- or under-screening, missed diagnoses, and provider, administrative, or patient burden. Future research should focus on assessing the impact of accepting self-report on these important outcomes. Ideally, studies would directly compare the use of self-report versus provider documentation, but directly assessing clinical outcomes may not be feasible due to the necessary lengthy follow-up to identify cancer outcomes and due to patients' fragmented health care use. Even so, researchers may find it possible to track system-level outcomes to weigh the potential harms of accepting self-report against the time and resource burden of tracking medical records.

Limitations
The evidence in this review contains some important limitations. We found significant heterogeneity among the studies for all outcomes assessed, and heterogeneity remained in subgroup analyses.
However, this is most likely due to our inability to include all potential moderators in our meta-regression, due to lack of comprehensive reporting of specific moderators in primary studies (e.g., we were unable to include a variable for age, and we combined low-income and minority status into one variable due to the limited reporting in primary studies). These variables may have accounted for some of the heterogeneity observed in our results. Common methodological limitations of the studies included lack of or unclear reporting of blinding of the medical record and/or self-report data collection and incomplete medical record review. Lack of blinding of patient interviewers or medical record data collectors may influence survey responses or the thoroughness of medical chart review.
Additionally, incomplete medical record review may lead to studies reporting higher false positives (a patient reports a test was done, but the medical record cannot confirm it) and reduce the overall specificity of self-report found in the study.
Limitations of our review methods include our literature search and our use of sequential, instead of independent, dual assessment. For our literature search, we limited the timeframe and used existing systematic reviews to identify earlier studies, which may have led to missed studies. This is unlikely, however, as several content experts identified no additional studies upon review. Additionally, although widely used, sequential dual review has not been empirically compared to independent dual review and may increase the risk of error and bias.

Conclusion
Unscreened women tend to over-report having had a mammogram or pap test, while screened women more accurately report their screening. Policymakers and providers should continue to consider this over-reporting in clinical and system-level decisions. However, as these findings are consistent with the previous synthesis, changes to current documentation policies do not appear necessary. Future research should focus on assessing the impact of over-reporting on clinical and system-level outcomes to advance knowledge about the trade-offs of accepting self-report.

Declarations
Ethics approval and consent to participate: Not applicable

Consent for publication: Not applicable

Availability of data and materials: The datasets used and/or analyzed in the current study are available from the corresponding author on reasonable request.
Competing interests: The authors declare that they have no competing interests.