Challenging the Results and Consequences of Cancer Screening Programs.

Background. Cancer screening tests have been used in clinical routine for more than half a century, without consensus as to their value. Method. We offer a method to quantify two components of screening programs relevant to their interpretation: the self-selection of participants and the effect of screening on reduction of mortality. Self-selection occurs because patients with a favorable prognosis are more health-conscious and accept screening more often than participants with poor prognosis. Separate quantification of these two components might allow estimates of the effects of screening independent of the effect of self-selection. Result. Using the example of a single prostate screening study, our analysis shows that most of the apparent beneficial effects of screening can be explained by self-selection. We also describe how the Informed Consent process can influence outcomes. Solution. Using similar methods, other published studies can be interpreted in the light of self-selection bias. To avoid these biases, we suggest application of the Pragmatic Controlled Trial design that yields estimates of Real-World Effectiveness as an alternative to Randomized Controlled Trials.


Background
All screening systems are subject to 'Healthy Volunteer Bias'. This bias occurs if people who accept the offer of a screening program are more health-conscious and less likely to be ill than people who reject this offer (1)(2)(3)(4). Attempts to correct for this bias using data from screening programs have not been successful. Promising approaches were suggested 30 years ago but have not been applied in empirical studies (5). The challenge is that it has not been possible to reliably model self-selection bias (6)(7)(8).
Four different effects can make it difficult to evaluate the results of screening programs: (i) non-transparent informed consent, (ii) contamination, i.e. screening requested by participants in control groups of screening programs, (iii) self-selection of participants in subgroups who either accept or refuse the planned cancer screening program, and (iv) application of a therapeutic intervention in participants with a positive screening result.
Here, we suggest a strategy that can be used to gain more detailed information from published reports of screening trials, to quantify the presumed effects, to improve the understanding of cancer screening programs, and finally to optimize methods for generation of unbiased screening data.

Methods
We illustrate our method by analysis using detailed methodological information from the 'Eighteen-year follow-up of the Göteborg Randomized Population-based Prostate Cancer Screening Trial' by Hugosson et al. (9)(10). Our analysis is based on the terminology used to describe the four potential confounders that may influence or distort the results of a screening evaluation.
Terminology. a) Groups determined by pre-randomization.
The Göteborg study used pre-randomization: i.e. participants were randomized before they were asked to provide Informed Consent (IC). This sequence may raise ethical, legal and epidemiological concerns (11,12).
The invited population (Inv) splits into two subgroups: one that accepted the invitation (InvA) and another that rejected the invitation (InvR). The number of participants sorting into these two subgroups is recorded.
Among the invited participants who accept the invitation, one subgroup will get a positive screening result. For the purposes of illustration, we assume that all participants of this subgroup will accept an offer for intervention (InvA/Sp/Int). Another subgroup with a negative screening result will not be offered an intervention (InvA/Sn/NoInt).
b) Contamination: screening in the control group.
Some of the individuals who were not invited to participate in the screening program (NInv) will nonetheless obtain screening outside of the study. The not invited group therefore splits into two subgroups: one without participation in a screening program (NInv/NS) and another that has used external screening (NInv/ExtS). Participants of the latter subgroup with a positive screening result will be offered an intervention and are likely to accept it (NInv/Sp/Int), while those with a negative screening result will not have an intervention (NInv/Sn).
The remainder of those not invited are not screened (NInv/NS). Most of this information will not be available in reports of studies.
c) Participants' self-selection.
Another effect that may influence the results of screening programs is the postulated 'Healthy Screenee Bias', which is potentially associated with a good prognosis for people who attend or seek screening and a poorer prognosis for those who refuse the invitation to a screening program.
d) Application of a preventive intervention.
The fourth effect that may influence the results of screening is the preventive intervention offered to all participants with a positive screening result obtained either within or outside of a clinical trial.
We assume that the number randomized in the study (n = 19,899) is large enough to guarantee a fairly equal distribution of similar risk profiles in the two randomized groups of the trial. The calculations derived from this assumption are described in detail together with the results. Table 1 also shows the number of prostate cancer cases detected, deaths from prostate cancer, deaths from all causes, and cases of metastatic prostate cancer. These data show that, among patients with detected prostate cancer, death from prostate cancer was about twice as frequent in the not invited population (12.6%) as in the invited population (5.6%). The same ratio is found for the development of metastatic disease (18.6% versus 8.3%). Both findings contrast with the all-cause mortality rates, which are identical in the invited and not invited groups.
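The two ratios quoted above can be checked directly from the reported percentages. The following is a minimal sketch; the variable and function names are illustrative and are not taken from the original study.

```python
# Percentages as quoted in the text (share of detected prostate cancers).
reported = {
    "death_from_prostate_cancer": {"not_invited": 12.6, "invited": 5.6},
    "metastatic_disease": {"not_invited": 18.6, "invited": 8.3},
}


def rate_ratio(not_invited_pct, invited_pct):
    """Ratio of two percentages: not invited divided by invited."""
    if invited_pct <= 0:
        raise ValueError("invited_pct must be positive")
    return not_invited_pct / invited_pct


ratios = {
    outcome: rate_ratio(values["not_invited"], values["invited"])
    for outcome, values in reported.items()
}

for outcome, ratio in ratios.items():
    # Both ratios come out slightly above 2, i.e. "about twice as high".
    print(f"{outcome}: {ratio:.2f}")
```

Both ratios are close to 2.2, which matches the statement that the two outcomes were roughly twice as frequent in the not invited population.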

Results
If those who declined screening were equivalent to people randomly assigned not to be offered screening, we would expect equal detection rates in the two groups. However, the detection rate among those who were offered screening but refused it (5.3%) was about half that among those who were not offered screening (9.6%). The detection rate among those who accepted the invitation for screening was 16.6%. The "intention to treat" method combines those who accepted and those who refused an offer for screening. When considered by intention to treat, there were fewer prostate cancer deaths in the invited group than in the not invited group (0.8% vs 1.2%). However, there was no difference in all-cause mortality (28.6% vs 28.7%).

Effects of Consent
The synopsis in Fig. 1 highlights the differences among the subgroups in this study. It shows that not all individuals who were invited to participate in a screening program took advantage of this offer. According to the study protocol, the participants who were invited to screening were asked to provide informed consent. As consent was obtained only after randomization, it is unclear (and not described in the original publication) whether consent was also requested from invited participants with negative screening results (Fig. 1, cells with blue background) and from invited participants who refused the screening offer (Fig. 1, the cell with yellow background in the invited group).
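As a rough illustration of the intention-to-treat pooling described in the Results, the rate for the whole invited group is a weighted average of the rates among those who accepted and those who refused. The detection rates below are those quoted in the text; the acceptance share of 0.75 is a purely hypothetical placeholder, because the subgroup sizes are not restated in the text.

```python
def intention_to_treat_rate(rate_accepted, rate_refused, share_accepted):
    """Pooled rate (%) for everyone invited, whether or not they accepted."""
    if not 0.0 <= share_accepted <= 1.0:
        raise ValueError("share_accepted must lie between 0 and 1")
    return share_accepted * rate_accepted + (1.0 - share_accepted) * rate_refused


# Detection rates (%) quoted in the text.
DETECTION_ACCEPTED = 16.6
DETECTION_REFUSED = 5.3

# Hypothetical share of invited participants who accepted (an assumption,
# not a figure from the study).
HYPOTHETICAL_SHARE_ACCEPTED = 0.75

pooled = intention_to_treat_rate(
    DETECTION_ACCEPTED, DETECTION_REFUSED, HYPOTHETICAL_SHARE_ACCEPTED
)
print(f"pooled detection rate among invited: {pooled:.1f}%")
```

The pooled value always lies between the two subgroup rates, so an intention-to-treat comparison dilutes whatever difference exists between accepters and refusers.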
Calculation of missing details
The information generated by randomization was used for calculation of the missing details in the group of participants who were not invited for screening. The invited and not invited groups were large enough to assume a fairly equal distribution of subpopulations in the two groups. Assuming that those with a good prognosis are most likely to accept a screening offer, randomization should result in equal distributions of participants with assumed good prognosis (accepting the screening offer) or assumed poor prognosis (refusing the screening offer). Owing to random allocation, the number of would-be refusers (assumed poor prognosis) in the not invited (control) group is assumed to equal the number of refusers in the invited group. The number of not invited participants with assumed good prognosis can then be calculated as the difference between all not invited participants and the not invited participants with assumed poor prognosis. These calculated data are presented in Table 2.

Table 2. Calculation of missing data. Owing to random allocation, the participants of the screened group (columns A, B, C) and of the not screened group (columns D, E, F) should be equally distributed. The distribution of participants who refused the invitation to screening, with assumed poor prognosis, in the invited group (column B) will be similar to the distribution of participants with assumed poor prognosis in the control group (column E). The difference between the total number of participants in the control group (column F) and the number of participants with poor prognosis (column E) gives the number of participants in the control group with assumed good prognosis (column D). The data in column E are shown in brackets to indicate that they are similar to the numbers in column B. To avoid an overload of information, lines 02-04 of Table 1 were omitted from Table 2.

The completion of the missing numbers enables the calculation of five indicators for assessment of the efficacy of prostate cancer screening.
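The column arithmetic described for Table 2 can be sketched as follows. This is only an illustration under the stated assumption that column E equals column B. The counts are hypothetical placeholders and not the trial data, and the function and row names are invented for this example.

```python
def impute_control_split(refused_invited, control_total):
    """Return (column D, column E) for one row of the table.

    Column E (control, assumed poor prognosis) is set equal to column B
    (invited, refused). Column D (control, assumed good prognosis) is the
    remainder: column F minus column E.
    """
    assumed_poor = refused_invited               # E := B
    assumed_good = control_total - assumed_poor  # D := F - E
    if assumed_good < 0:
        raise ValueError("column B exceeds column F; the assumption cannot hold")
    return assumed_good, assumed_poor


# Hypothetical rows: row name -> (column B, column F). Not the study's numbers.
hypothetical_rows = {
    "participants": (2500, 10000),
    "detected_prostate_cancers": (130, 960),
}

completed = {
    row: impute_control_split(col_b, col_f)
    for row, (col_b, col_f) in hypothetical_rows.items()
}

for row, (col_d, col_e) in completed.items():
    print(f"{row}: D = {col_d}, E = ({col_e})")
```

The same subtraction is applied row by row, which is why a small error in the assumed equality of columns B and E carries over to every indicator derived from column D.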

The indicators of efficacy
The five indicators of efficacy are shown in Table 3. These data provide three types of specific information. They explain the decisions and assumptions that were made by the investigator, the participants, and the attending physicians.
The investigator selected the participants from a database and used randomization to allocate the participants to either the group invited or the group not invited for screening. The attending physician and investigators recorded each participant's decision to accept or to refuse the invitation and offered the screening according to that decision. The investigator also derived the assumed prognosis from the individual participant's decision: a good prognosis is assumed if the invitation was accepted and a poor prognosis if it was refused.

The Informed Consent
According to the original publications (9, 10), Informed Consent (IC) was obtained only from participants who were invited to screening but not from the not invited group. The original publication does not, however, describe whether all invited participants were asked to provide an IC, or only those who accepted the invitation for screening, or only those who were offered treatment because of a positive screening result. This information is important for ethical, epidemiological, medical and economic reasons. The lack of information on the IC has to be considered in the interpretation of the results.

Table 3. The upper part of this synopsis replicates the results of Table 2. This synopsis provides three types of information. First, the descriptions of the two study groups, invited or not invited (line 1), and their four subgroups: invited participants who accepted or refused the invitation (line 2, col. A and B) and not invited participants (line 2, col. D and E). The assumed prognoses and the obtained IC in these four subgroups are described in lines 3 and 4. The second type of information includes the numbers, in the four subgroups, of participants, of detected prostate cancers, of deaths from prostate cancer, of deaths from all causes, and of metastatic prostate cancers. The third type of information describes the indicators [in brackets the lines used for calculation] that express the effects of prostate cancer screening mostly but not exclusively under experimental study conditions. These indicators are the incidence of prostate cancer (IPC, line 10), the disease-specific mortality (DSM, line 11), the all-cause mortality (ACM, line 12), the lethality from prostate cancer (line 13), and the rate of metastatic disease (RMD, line 14).

Line 10 in Table 4 shows that the incidence of prostate cancer (i.e. the pathologist's assignment of the diagnosis 'prostate cancer') seems to be higher in populations with assumed good prognosis than in subgroups with poor prognosis. The likely reason for this difference is that the population with assumed good prognosis accepts screening more often than the subgroups with assumed poor prognosis.

Table 4. Assessment of efficacy of prostate cancer screening based on the results of Hugosson et al. (9,10). The odds ratios of the incidences of prostate cancer, the disease-specific mortalities, the all-cause mortalities, the lethality from prostate cancer, and the risks of advanced stage of disease are compared in three pairs of target populations. Left column: the good prognosis subgroups with or without screening. Middle column: not invited groups with good or poor prognosis. Right column: the groups of all invited versus all not invited participants.
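To make the five indicators and their odds ratios concrete, the sketch below computes them from group counts. The formulas are an assumption inferred from the descriptions in the text (per participant for incidence and mortality, per detected cancer for lethality and metastatic disease). They are not copied from Table 3, and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupCounts:
    participants: int
    detected_cancers: int
    prostate_cancer_deaths: int
    all_cause_deaths: int
    metastatic_cancers: int


def indicators(g):
    """Five indicators as proportions; the denominators are assumptions."""
    return {
        "IPC": g.detected_cancers / g.participants,            # incidence
        "DSM": g.prostate_cancer_deaths / g.participants,      # disease-specific mortality
        "ACM": g.all_cause_deaths / g.participants,            # all-cause mortality
        "LPC": g.prostate_cancer_deaths / g.detected_cancers,  # lethality
        "RMD": g.metastatic_cancers / g.detected_cancers,      # metastatic disease
    }


def odds_ratio(p_a, p_b):
    """Odds ratio of two proportions, each strictly between 0 and 1."""
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    return (p_a / (1.0 - p_a)) / (p_b / (1.0 - p_b))


# Hypothetical counts for two comparison groups (not the trial data).
group_a = GroupCounts(1000, 100, 5, 300, 10)
group_b = GroupCounts(1000, 50, 10, 300, 10)

ind_a, ind_b = indicators(group_a), indicators(group_b)
odds_ratios = {name: odds_ratio(ind_a[name], ind_b[name]) for name in ind_a}

for name, value in odds_ratios.items():
    print(f"{name}: odds ratio = {value:.2f}")
```

In this invented example the all-cause mortality odds ratio is exactly 1 while the incidence odds ratio is above 2, which mirrors the kind of contrast discussed for lines 10 and 12 of Table 4.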

[Table 4 column headings: effects of screening versus no screening in subgroups with assumed good prognosis; effects of assumed good versus poor prognosis; effects of screening of invited versus not invited participants. Odds ratios are calculated from the columns in Table 3.]
[...] participants with better prognosis than the not screened groups. This is not true for the middle column, where the odds ratio was 1. Neither of the two comparison groups was screened.
Line 12 of Table 4 shows that screening has no effect on the all-cause mortality when either subgroups with good prognosis who accepted frequent screening were compared with the corresponding subgroup that was not offered screening (left column) or all participants with or without an offer to be screened were compared (right column). The ACM odds ratio of 0.5 in the middle column seems to be plausible because this column compares participants with a good versus a poor prognosis. Neither of these subgroups was offered screening within the study but the subgroup with good prognosis may have asked more often for external screening than the subgroup with poor prognosis. This information was not presented in the analyzed publication.
Line 13 seems to demonstrate that the lethality from prostate cancer (LPC) is always lower in screened than in not screened groups. Another explanation for these data is the difference in prognosis, which is likely to be better in the screened than in the not screened groups. A third possible explanation is potential bias in both types of information used for the calculation of LPC: the diagnosis of prostate cancer and the attribution of prostate cancer as cause of death. From a scientific perspective, the numbers of participants in a study and the numbers of observed deaths are the numbers least likely to be affected by bias. Line 14 shows a problem similar to lines 10 and 13. The caveat in this line is related to the diagnosis of metastases from prostate cancer. We are not sure that a PSA concentration of > 100 ng/ml is a reliable indicator of advanced-stage, i.e. metastatic, prostate cancer, as some authors have suggested (9). Numbers on the reduction of metastatic prostate cancer that are not based on the demonstration of either osteolytic or osteoplastic bone metastases, together with solid histopathologic data that confirm prostate cancer as the primary tumor, should be interpreted with caution.

Discussion
Our analysis suggests four conclusions: 1) screening studies are very sensitive to bias and have to be carefully controlled for confounders; 2) to control for confounders a sequence should be followed; 3) minimal [...] (19,20). Randomized studies seek to identify the single variable that affects outcome. If, in screening studies, only some of the participants are asked to provide an IC, a second variable is likely to confound the outcomes. The information exchanged during the IC process has a considerable influence on individual decisions and will influence the self-allocation of participants to the study groups. The example of the Hugosson screening study (10) demonstrates that some participants were invited to participate while others were not. Those who were invited and accepted the invitation were very likely asked to provide a formal IC. In many screening programs it is not clear whether the study participants who were allocated to the not screened control group were also asked to provide a formal IC. This IC was explicitly not requested (10) in the study we analyzed (9).
All-cause mortality remains the outcome least likely to be affected by bias. The denominator (number of participants in a study) and the numerator (number of deaths in this study) can be counted with low risk of bias. ACM depends on the prognosis of participants included in the study. Our analysis confirms this influence of the prognosis and does not support the assumption that screening, i.e. the early detection of prostate cancer using the described setting and methodologies, can reduce the ACM. Our conclusions are consistent with suggestions that the PSA test has little value for preventing deaths from all causes (21)(22)(23)(24).

Screening bias has two components: bias associated with selection and consent, and bias associated with emotion.
There is a strong public demand for cancer screening (17,18), and many consumers are not interested in scientists' 'complicating' maneuvers, recommendations, and caveats. The public wants to feel safe. The subjective perception of safety is one of the basic human needs (25). This basic human need can be controlled very easily with very simple and plausible information. Some believe we should protect consumers from information that disrupts their sense of perceived safety. Medical oncologists are aware of the patient demand for feeling safe. The 'safety loop' shown in Fig. 2 summarizes our experience derived from several projects (Supplement I).
Careful elimination of all forms of bias should be a high priority before a screening program is marketed to the public. Once a program is implemented, recalling it is difficult, particularly if the public feels that nothing better can be offered.
Making the quality criteria for new screening programs transparent should be an important goal of health care.
Declarations

Figure 1. The synopsis in Figure 1 highlights the differences among the subgroups in this study. It shows that not all individuals who were invited to participate in a screening program took advantage of this offer.