Research Protocol for an Observational Health Data Analysis to Assess the Long-term Outcomes of Prostate Cancer Patients Undergoing Non-Interventional Management (i.e., Watchful Waiting) and the Impact of Comorbidities and Life Expectancy – PIONEER IMI’s “Big Data for Better Outcomes” program

This is a study protocol for an observational health data analysis, submitted as a preprint to facilitate transparency and open science. Watchful waiting (WW) is a deferred treatment option for prostate cancer (PCa) patients for whom curative treatment would be overtreatment from the outset. Patients are ‘watched’ for the development of local or systemic progression with disease-related symptoms, at which stage they are treated palliatively according to their symptoms in order to maintain quality of life. When choosing WW, it is important to adequately assess the life expectancy of patients. Although previous studies have reported the outcomes of PCa patients managed with WW, the impact of individual patient characteristics and comorbidities on long-term outcomes is still largely unknown. PIONEER, a novel project of the Innovative Medicine Initiative’s (IMI’s) “Big Data for Better Outcomes” program with the mission to transform PCa care, with particular focus on improving cancer-related outcomes, health system efficiency and the quality of health and social care across Europe, aims to assess the long-term outcomes of PCa patients undergoing WW, overall and after stratification according to disease characteristics, comorbidities and life expectancy. Of note, this topic received the second highest agreement score among different stakeholders in an international consensus process to identify and prioritize the most important questions in the field of PCa. This study aims to describe demographics and clinical characteristics and to estimate outcomes of PCa patients under delayed treatment (WW) across a network of databases, in the overall population and in subgroups of patients identified by individual disease characteristics, demographics and comorbidities. The study will rely on large observational data, namely population-based registries, electronic health records and insurance claims data.
The study will be an observational cohort study based on routinely collected health care data which has been mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM).


Introduction
Prostate cancer (PCa) is the second leading cause of cancer deaths in men worldwide [1]. Despite high incidence rates, outcomes and survival rates for prostate cancer have improved significantly over the years, partly due to the widespread availability of prostate-specific antigen (PSA) testing [2]. Localized PCa is characterized by a relatively long natural history, and not all patients affected by the disease will experience metastases and cancer-related death at long-term follow-up. For example, a man diagnosed with cT1c Gleason 6 PCa at age >75 has a risk of PCa-specific death after 15 years of observation of 10%, while the risk of overall mortality is close to 80% [3]. On the other hand, both radical prostatectomy (RP) and radiotherapy (RT) are associated with increased rates of urinary incontinence and erectile dysfunction, with consequent detrimental effects on health-related quality of life [4]. Consequently, more conservative approaches have been welcomed for the management of localized and locally advanced PCa. Active surveillance (AS) and watchful waiting (WW) are popular conservative approaches for PCa patients with non-metastatic disease and are recommended in selected patients according to the European Association of Urology (EAU) guidelines [5].
Several studies have justified the use of these conservative approaches for PCa patients with non-metastatic disease. In the Prostate Cancer Intervention versus Observation Trial (PIVOT), 731 men with localized PCa (mainly low-risk) were randomly assigned to radical prostatectomy or observation. After ~20 years of follow-up, RP was not associated with significantly lower all-cause or prostate-cancer mortality than observation. RP was associated with a higher frequency of adverse events compared with observation but a lower frequency of treatment for disease progression. Additionally, urinary incontinence and erectile and sexual dysfunction were each greater with RP than with observation through 10 years. Furthermore, disease-related or treatment-related limitations in activities of daily living were greater with surgery than with observation through 2 years [6].
The terms AS and WW are frequently used interchangeably, but they refer to very different observational approaches in PCa management. Although both AS and WW aim at avoiding unnecessary therapies and their treatment-related side effects, they have substantial practical differences [5]. AS involves the avoidance or postponement of immediate curative therapy, combined with careful surveillance, whereby curative treatment is offered only upon evidence of increased risk of disease progression or patient preference [5,7]. For WW, no treatment with curative intent is planned, with the aim of avoiding treatment-related side effects. At disease progression and impending PCa-related complications, palliative treatment is started using hormonal therapy [5,8]. Patients considered for WW are deemed unsuitable for curative treatments due to their age and life expectancy, or have specifically chosen this management strategy, and are therefore typically monitored until the development of local or systemic symptoms [5]. Since available studies rely on historical cohorts [3], the natural history of contemporary patients managed with WW and the rates of disease progression and survival require further investigation. Moreover, the improved life expectancy and different impact of comorbidities on survival would preclude the generalizability of the results of these studies to contemporary cohorts.
Taken together, these observations highlight that there is currently a lack of real-world population-based data on the long-term outcomes of contemporary PCa patients managed with non-curative intent therapies such as WW. Moreover, the impact of patient disease characteristics, individual comorbidity profiles, life expectancy and race/ethnicity on the selection of WW candidates remains to be elucidated. In the face of such a paucity of data, the PIONEER Consortium, a novel project of the Innovative Medicine Initiative's (IMI's) "Big Data for Better Outcomes" program with the mission to transform PCa care, with particular focus on improving cancer-related outcomes, health system efficiency and the quality of health and social care across Europe, aims to address the question "Which are the long-term outcomes of prostate cancer patients undergoing non-interventional management (i.e., watchful waiting) and what is the impact of comorbidities and life expectancy?". This question emerged as the one with the second highest agreement score among different stakeholders after an international consensus process to identify and prioritize the most important questions in the field of PCa, performed within the IMI PIONEER project [9]. The EAU Prostate Cancer Guideline panel and other prostate cancer Key Opinion Leaders were consulted to propose the most critical questions in the field of PCa to be answered using big data. Through this process, 44 key questions were identified. Afterwards, the PIONEER consortium conducted a two-round Delphi survey in order to build consensus between the two stakeholder groups: healthcare professionals (including representatives from pharmaceutical companies) and PCa patients.
Respondents were asked to consider what impact answering the proposed questions would have on better diagnosis and treatment outcomes for PCa, while scoring these questions on a scale of 1 (not important) to 9 (critically important). The results were analysed by calculating the percentage of respondents scoring each question as not important (score 1 to 3), important (score 4 to 6) or critically important (score 7 to 9). A modified Delphi method was then adopted for this prioritization process in order to build consensus among the participants. In the second round, participants were shown a summary of the percentage of other participants (patients and healthcare professionals) who considered the question "critically important" in round one. The question "Which are the long-term outcomes of prostate cancer patients undergoing non-interventional management (i.e., watchful waiting) and what is the impact of comorbidities and life expectancy?" was ranked second among the 56 questions identified as "critically important" by the PIONEER project [9].

5. To characterize demographics, clinical characteristics and estimate long-term outcomes of patients newly diagnosed with PCa who received immediate treatment (target cohort 2)
6. To characterize demographics, clinical characteristics and estimate long-term outcomes of patients newly diagnosed with PCa who delayed immediate treatment (target cohort 4)

Data Sources
The study will rely on large observational data, namely population-based registries, electronic health records (EHR) and insurance claims data. The data will be analyzed using a federated model, where the data remain with the data custodians and only the analysis results are shared and published.
Case series and AS cohorts will not be considered.

Study design
The study will be an observational cohort study based on routinely collected health care data which has been mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM).
First, cohorts of individuals with PCa will be identified. Patients' demographics and clinical characteristics at or prior to index date (defined below) and treatments and outcomes of these individuals at or after their index date will be described (clinical characterization).
Target Cohort 1 - Newly Diagnosed PCa: Adult patients with newly diagnosed PCa with at least 365 days of prior observation

Stratifications
Each target cohort will be analyzed in full and stratified on factors based on the following pre-index characteristics. All strata are pending meeting minimum reportable cell counts (as specified by data owners):
-Comorbidities classified according to standardized systems (e.g., Charlson Comorbidity Index).
Patients will be stratified into three groups:

Features of interest
These features span the full set of target cohorts and research questions of interest in subgroups. Some features will only be relevant in some target cohorts or some subgroups, but the full list is given here.
Pre-index characteristics
These features will be described as assessed during the year (-1 to -365 days) pre-index:

Post-index characteristics
These features will be described in two different time windows: at index date (day 0) and in the 365 days from index date (0 to 365 days). The characteristics will include:
Concept-based:
-Condition groups (SNOMED + descendants), >=1 occurrence during the interval
-Drug era groups (ATC/RxNorm + descendants), >=1 day during the interval which overlaps with at least 1 drug era
Cohort-based:
• ER visits within 12 months after onset of symptoms
• Hospitalization within 12 months after onset of symptoms

Symptomatic progression
Analysis: Characterizing cohorts
All analyses will be performed using code developed for the OHDSI Methods library. The code for this study can be found at link. A diagnostic package, built on the OHDSI Cohort Diagnostics (https://ohdsi.github.io/CohortDiagnostics/) library, is included in the base package as a preliminary step to assess the fitness for use of the phenotypes on each database. If a database passes cohort diagnostics, the full study package will be executed. Baseline covariates will be extracted using an optimized SQL extraction script based on principles of the FeatureExtraction package (http://ohdsi.github.io/FeatureExtraction/) to quantify Demographics, Condition Group Eras, and Drug Group Eras. Additional cohort-specific covariates will be constructed using OMOP Standard Vocabulary concepts.
At the time of executing Feature Extraction, the package will create a data frame in which individuals' age and sex will be extracted. Individuals' medical conditions, procedures, measurements and medications will be summarized 1) over the year prior to their index date (-365 to -1 days), 2) at index date (day 0), and 3) at and over the follow-up time (0+ days). The number and proportion of persons with feature variables during time-at-risk windows will be reported by target cohort and specific stratifications. Standardized mean differences (SMD) will be calculated when comparing characteristics of study cohorts, with plots comparing the mean values of each characteristic (with the color indicating the absolute value of the standardized difference of the mean).
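As an illustration of the SMD comparison described above, the following is a minimal sketch in Python. It is not the study package: the function names are hypothetical, and it assumes the common formulation in which the difference in means (or proportions) is divided by the square root of the average of the two cohort variances.

```python
import math

def smd_binary(p1: float, p2: float) -> float:
    """SMD for a binary feature, given its proportion in two cohorts.

    Assumption: pooled variance is the simple average of p(1-p) in each cohort.
    """
    pooled_var = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    if pooled_var == 0:
        # Both proportions are 0 or 1: identical cohorts give 0, otherwise undefined/infinite.
        return 0.0 if p1 == p2 else math.copysign(math.inf, p1 - p2)
    return (p1 - p2) / math.sqrt(pooled_var)

def smd_continuous(mean1: float, sd1: float, mean2: float, sd2: float) -> float:
    """SMD for a continuous feature, given mean and standard deviation per cohort."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    if pooled_sd == 0:
        return 0.0 if mean1 == mean2 else math.copysign(math.inf, mean1 - mean2)
    return (mean1 - mean2) / pooled_sd

# Hypothetical example: a comorbidity present in 30% of one cohort and 10% of another.
example_smd = smd_binary(0.30, 0.10)
```

In such a sketch, the absolute value of the result is what would drive the color scale in a comparison plot.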
Baseline disease characteristics at diagnosis will be reported using medians and proportions for non-normally distributed continuous variables and categorical variables, respectively.
The median follow-up will be computed for the overall study cohort. The absolute number of patients who experienced overall mortality, cancer-specific mortality, other-cause mortality and disease progression will be reported.
Kaplan-Meier analyses will assess time from PCa diagnosis to overall survival, cancer-specific survival, other-cause-mortality-free survival and symptomatic progression-free survival, and time to palliative or curative treatment initiation, in the overall cohort and after stratifying patients according to the pre-defined subgroups.
Kaplan-Meier analyses will assess time from disease progression to overall survival, cancer-specific survival and other-cause-mortality-free survival in the overall cohort and after stratifying patients according to the pre-defined subgroups.
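For readers unfamiliar with the estimator, the Kaplan-Meier product-limit calculation can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions (follow-up time in arbitrary units, event flag 1 for an observed event and 0 for censoring); the function name is hypothetical and this is not the code of the study package.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times:  follow-up duration for each patient
    events: 1 if the event was observed at that time, 0 if the patient was censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve

# Hypothetical toy data: five patients, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Stratified analyses would apply the same calculation separately within each pre-defined subgroup.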

Strengths
The study is anticipated to be the largest patient-level cohort of PCa patients who received conservative management (AS or WW), thus allowing characterization of relatively uncommon outcomes that are not identifiable in smaller datasets. Data will be obtained from multiple centres and providers from at least five countries and two continents. This enables comprehensive characterisation of the study population, key baseline characteristics and outcomes. Lastly, the use of routinely collected data from multiple sources maximizes the external validity and generalisability of the findings.

Limitations
This study is carried out using data recorded in a collection of EHR, claims and tumor registries. As with any healthcare database used for secondary data analysis, the patient records might be incomplete following PCa diagnosis. Similarly, distinction between WW and AS is not possible in the secondary data and can only be inferred based on the intensity of screening during the follow-up. Using future information to define the study (target) cohorts can lead to immortal time bias [11,12]. To avoid this, landmark analyses will be used. Six months after the initial diagnosis of PCa will be used as a landmark time (landmark time 1) to ascertain initial treatment status. Patients receiving any PCa-related treatment during this period will be classified as immediate management, and patients who did not receive any treatment during this period will be classified as conservative management. Patients who were lost to follow-up or died prior to this time will be excluded. To further distinguish WW from AS in the conservative management cohorts, a combination of lack of treatment in the first six months following initial PCa diagnosis, PCa risk group and intensity of surveillance during the first 18 months of follow-up (landmark time 2) will be used.
Patients with intermediate- or high-risk PCa who received no treatment in the first six months following initial PCa diagnosis will be categorized into the "Intermediate-, high-risk WW" cohort. Patients with low-risk PCa and minimal surveillance during the first 18 months will be categorized into "low-risk WW", and low-risk PCa patients with intense surveillance in the 18-month period after PCa diagnosis will be categorized into "low-risk AS". Landmark analysis itself can lead to misclassification of patients. To reduce this potential bias, landmark times (six months and 18 months after PCa diagnosis) were chosen a priori and based on clinically meaningful periods [13].
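The landmark-based assignment described above can be sketched as a simple decision rule. The following Python sketch is illustrative only and rests on several assumptions that are not part of the protocol: the `Patient` structure and its field names are hypothetical, six and 18 months are approximated as 183 and 548 days, surveillance counts are assumed to exclude the diagnostic biopsy, and low-risk patients not observed through landmark time 2 are simply left unclassified. The threshold for "intense surveillance" (at least one biopsy, or ≥3 PSA tests, or ≥3 urological visits within 18 months) is taken from the cohort definitions in this protocol.

```python
from dataclasses import dataclass
from typing import Optional

LANDMARK_1_DAYS = 183  # assumption: "six months" expressed in days
LANDMARK_2_DAYS = 548  # assumption: "18 months" expressed in days

@dataclass
class Patient:
    """Hypothetical per-patient summary; all day counts are relative to PCa diagnosis."""
    risk_group: str                     # "low", "intermediate" or "high"
    first_treatment_day: Optional[int]  # None if no PCa-related treatment was recorded
    end_of_followup_day: int            # death, loss to follow-up or end of observation
    biopsies_18m: int = 0               # follow-up biopsies in the first 18 months
    psa_tests_18m: int = 0
    urology_visits_18m: int = 0

def classify(p: Patient) -> str:
    # Patients who died or were lost before landmark time 1 are excluded.
    if p.end_of_followup_day < LANDMARK_1_DAYS:
        return "excluded"
    # Any PCa-related treatment before landmark time 1 means immediate management.
    if p.first_treatment_day is not None and p.first_treatment_day <= LANDMARK_1_DAYS:
        return "immediate management"
    if p.risk_group in ("intermediate", "high"):
        return "intermediate-, high-risk WW"
    # Low-risk patients need follow-up through landmark time 2 (assumption).
    if p.end_of_followup_day < LANDMARK_2_DAYS:
        return "conservative management (unclassified)"
    intense = (p.biopsies_18m >= 1 or p.psa_tests_18m >= 3
               or p.urology_visits_18m >= 3)
    return "low-risk AS" if intense else "low-risk WW"
```

Because group membership is fixed using only information available up to each landmark, follow-up for outcomes would start at the landmark rather than at diagnosis.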
Medical conditions may be underestimated as they will be based on the presence of condition codes, with the absence of such a record taken to indicate the absence of a disease. Meanwhile, medication records indicate that an individual was prescribed or dispensed a particular drug, but this does not necessarily mean that an individual took the drug as originally prescribed or dispensed.

Protection of Human Subjects
The study uses only de-identified data. Confidentiality of patient records will be maintained at all times. Data custodians will remain in full control of executing the analysis and packaging results. There will be no transmission of patient-level data at any time during these analyses. Only aggregate statistics will be captured. Study packages will contain minimum cell count parameters to obscure any cells which fall below allowable reportable limits. All study reports will contain aggregate data only and will not identify individual patients or physicians.
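The minimum cell count rule can be illustrated with a short sketch. This is a hypothetical Python example rather than the study package: the function name, the threshold of 5 and the choice to leave true zeros unmasked are assumptions, since the allowable limits are set by each data custodian.

```python
from typing import Dict, Optional

def suppress_small_cells(counts: Dict[str, int],
                         min_cell_count: int = 5) -> Dict[str, Optional[int]]:
    """Mask aggregate counts that fall below the reportable limit.

    Counts between 1 and min_cell_count - 1 are replaced by None.
    Assumption: zero counts are kept, as they do not describe any individual.
    """
    return {
        name: (n if n == 0 or n >= min_cell_count else None)
        for name, n in counts.items()
    }

# Hypothetical aggregate table for one stratum.
masked = suppress_small_cells({"overall mortality": 12, "colostomy": 3, "pelviectomy": 0})
```

Applying the mask before results leave the data custodian keeps the shared output free of small, potentially identifying cells.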
The study aims to describe demographics, clinical characteristics and estimate outcomes of PCa patients under initial conservative management (delayed treatment) across a network of databases in the overall population and subgroups of patients identified by individual disease characteristics, demographics and comorbidities. In detail, the main objectives of the study are:
1. To describe demographic and clinical characteristics of patients with PCa under conservative management (delayed treatment, target cohort 3)
2. To estimate clinical outcomes of PCa patients under conservative management (delayed treatment):
• Overall survival
• Cause-specific survival (cancer and other causes)
• Time to symptomatic progression
• Time to palliative (or curative) treatment initiation
3. To characterize detailed treatment patterns and outcomes of patients with PCa under conservative management (delayed treatment) who initiated treatment:
• Distribution of treatment type: curative and palliative
• Distribution of treatment categories: ADT, RT, RP, systemic anti-neoplastic treatment
4. To characterize demographics, clinical characteristics and estimate long-term outcomes of patients newly diagnosed with PCa across a network of databases (target cohort 1)

Outcome Cohort 3. Treatment (curative or palliative) initiation. Initiation of PCa-related palliative or curative treatment, such as surgery, radiotherapy and systemic anti-neoplastic treatment, during the follow-up.
Curative treatment was defined as having any of the following:
• Prostatectomy (radical, open, laparoscopic radical and robot-assisted radical)
• Focal therapy (HIFU, cryotherapy, RFA)
Palliative treatment was defined as having any of the following:
• [...] (..., paclitaxel, cabazitaxel, mitoxantrone)
• Immunotherapy (sipuleucel-T, pembrolizumab)
• PARP inhibitors (olaparib, rucaparib)
• Androgen receptor inhibitor (ARTA)
• ADT (ATC for GnRH agonists: L02AE OR GnRH antagonists: L02BX OR anti-androgens: L02BB)
• Radiotherapy following symptoms
• Placement of ureteral stent or nephrostomy for acute kidney failure
• Colostomy
• Chronic Foley catheter placement
• Pelviectomy (total pelvic exenteration)
• Suprapubic catheter placement
Outcome Cohort 4. Curative treatment initiation
Outcome Cohort 5. Palliative treatment initiation
Outcome Cohort 6. Hospitalization within 12 months after onset of symptoms
Outcome Cohort 7. ER visits within 12 months after onset of symptoms
Outcome Cohort 8. Cancer-specific mortality: occurrence of death from PCa
Outcome Cohort 9. Other-cause mortality: occurrence of death from causes other than PCa

Follow-up
Patients are followed up from index date until death, diagnosis with another malignancy (except for non-melanoma skin cancer), or end of observation period.
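The follow-up rule above amounts to taking the earliest of the possible end points. A minimal Python sketch is given below; the function and parameter names are hypothetical, and it assumes that the date of another malignancy passed in has already had non-melanoma skin cancer excluded and that only events on or after the index date end follow-up.

```python
from datetime import date
from typing import Optional

def follow_up_end(index_date: date,
                  observation_end: date,
                  death_date: Optional[date] = None,
                  other_malignancy_date: Optional[date] = None) -> date:
    """Earliest of death, diagnosis of another malignancy, or end of observation.

    Assumption: events dated before the index date are ignored here.
    """
    candidates = [observation_end]
    for event_date in (death_date, other_malignancy_date):
        if event_date is not None and event_date >= index_date:
            candidates.append(event_date)
    return min(candidates)

# Hypothetical patient: death occurs before the end of the observation period.
end = follow_up_end(date(2015, 1, 1), date(2020, 12, 31), death_date=date(2018, 6, 1))
```

The follow-up duration used in time-to-event analyses would then be the difference between this end date and the index date.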

Inclusion criteria for Target Cohort 1:
• Male adults (age ≥ 18)
• a diagnosis of PCa (Index date: date of first visit with PCa diagnosis)
• a prostate biopsy within 30 days of the first visit with PCa diagnosis
• no history of PCa or prostate dysplasia within 365 days prior to index
• no drug exposure to ADT or androgen agonist/inhibitor within 365 days prior to index

Target Cohort 2 - Immediate management: Adult patients with newly diagnosed PCa and treatment within six months with at least 365 days of prior observation
• Male adults (age ≥ 18)
• a diagnosis of PCa
• a prostate biopsy within 30 days of the first visit with PCa diagnosis
• no history of PCa or prostate dysplasia within 365 days prior to first PCa diagnosis
• no drug exposure to ADT or androgen agonist/inhibitor within 365 days prior to first PCa diagnosis
• receipt of at least one treatment (curative or palliative) within the first six months after PCa diagnosis (Index date: six months after first prostate cancer diagnosis)

Target Cohort 3 - Delayed management: Adult patients with newly diagnosed PCa and no treatment within 6 months of their diagnosis
• Male adults (age ≥ 18)
• a diagnosis of PCa
• a prostate biopsy within 30 days of the first visit with PCa diagnosis
• no history of PCa or prostate dysplasia within 365 days prior to first PCa diagnosis
• no drug exposure to ADT or androgen agonist/inhibitors within 365 days prior to first PCa diagnosis
• no treatment (curative or palliative) within the first six months of prostate cancer diagnosis (Index date: six months after first prostate cancer diagnosis)

Target Cohort 3.1 - Intermediate- and high-risk PCa watchful waiting: Adult patients with newly diagnosed intermediate- or high-risk PCa who received no treatment within 6 months of their diagnosis
• Male adults (age ≥ 18)
• a diagnosis of PCa
• a prostate biopsy within 30 days of the first visit with prostate cancer diagnosis
• no history of PCa or prostate dysplasia within 365 days prior to first PCa diagnosis
• no drug exposure to ADT or androgen agonist/inhibitors within 365 days prior to first PCa diagnosis
• intermediate-risk or high-risk PCa according to EAU risk groups [5]
• no curative or palliative treatment within the first six months of PCa diagnosis (Index date: six months after first prostate cancer diagnosis)

Target Cohort 3.2.1 - Low-risk PCa watchful waiting (low-risk PCa patients not managed with AS during the first 18 months): Adult patients with newly diagnosed low-risk PCa who received no treatment within the first six months of diagnosis and were not managed with AS in the first 18 months of diagnosis
• [...] of PCa or prostate dysplasia within 365 days prior to first PCa diagnosis
• no drug exposure to ADT or androgen agonist/inhibitors within 365 days prior to first PCa diagnosis
• no curative or palliative treatment within the first six months after PCa diagnosis
• low-risk PCa
• at least one biopsy or ≥3 PSA tests or ≥3 urological visits within the first 18 months after the first diagnosis (Index date: 18 months after prostate cancer diagnosis)

• Male adults (age ≥ 18)
• a diagnosis of PCa
• a prostate biopsy within 30 days of the first visit with PCa diagnosis
• no history of PCa or prostate dysplasia within 365 days prior to first PCa diagnosis
• no drug exposure to ADT or androgen agonist/inhibitors within 365 days prior to first PCa diagnosis
• no curative or palliative treatment within the first six months after PCa diagnosis

• Male adults (age ≥ 18)
• a diagnosis of PCa
• a prostate biopsy within 30 days of the first visit with PCa diagnosis
• no history of PCa or prostate dysplasia within 365 days prior to first PCa diagnosis
• no drug exposure to ADT or androgen agonist/inhibitors within 365 days prior to first PCa diagnosis
• no curative or palliative treatment within the first six months after PCa diagnosis
• initiation of curative treatment or palliative treatment after six months of first prostate cancer diagnosis (Index date: date of treatment initiation)

Target Cohort 4.1 - Delayed curative management and further treated curatively PCa:
• no drug exposure to ADT or androgen agonist/inhibitors within 365 days prior to first PCa diagnosis
• no curative or palliative treatment within the first six months after PCa diagnosis

• no drug exposure to ADT or androgen agonist/inhibitors within 365 days prior to first PCa diagnosis
• no curative or palliative treatment within the first six months after PCa diagnosis
• palliative treatment (RT or systemic therapies) after six months of initial prostate cancer diagnosis (Index date: date of treatment initiation)

To include newly diagnosed PCa patients not undergoing biopsy at the time of diagnosis, more inclusive cohorts were defined by including patients with a PSA value above 50 ng/mL within 30 days of PCa diagnosis, with or without a prostate biopsy.
Patient records may be incomplete in many respects and may have had erroneous entries, leading to misclassification of study variables. Data regarding diagnosis of prostate cancer, treatments, pathology and lab results or baseline covariates prior to enrollment within the database may not be available. PCa-specific characteristics such as stage or grade at diagnosis, the extent of the disease or the mutational status of genes implicated in PCa are not readily available in most EHR and claims databases. Treatment provided in hospitals or any other setting outside each participating institution is not included. Lack of information on treatment intent and difficulty in distinguishing WW from AS are major limitations of the study. Treatment intent upon PCa diagnosis is not generally captured in the data. As such, identification of patients who were put on initial conservative management (target cohort 3) is based on lack of events (drugs, observations or procedures indicative of immediate PCa treatment).