Development and validation of patient-level prediction models for symptoms, hospitalization and treatment initiation amongst prostate cancer patients on watchful waiting


 The objective of this study is to develop and validate patient-level prediction models for patients on watchful waiting (WW) that estimate the risk of symptomatic progression, hospitalization, ER visit, initiation of curative or palliative treatment, and death. All models will be estimated based on two predictor sets: 1) age and clinical measurements (e.g., PSA) in the 6 months before diagnosis; and 2) age, clinical measurements in the 6 months before diagnosis, and clinical conditions in the year before diagnosis. Finally, a clinically usable model will be developed based on expert clinical input. All prediction models will be implemented using lasso logistic regression for the time-at-risk analyses.


Introduction
Prostate cancer (PCa) is the second leading cause of cancer death in men worldwide (1). Despite high incidence rates, outcomes and survival rates for PCa have improved significantly over the years, partly due to the widespread availability of prostate-specific antigen (PSA) testing (2). PSA-based screening leads to earlier detection of PCa, often at a stage when it is amenable to treatment but years before it would have presented clinically (lead-time bias) (3). However, it also leads to a significant increase in the number of men found to harbour prostate cancer, the majority of whom would not have gone on to develop clinical disease (4). Historically, early detection led to treatment, much of which was unnecessary. Such overtreatment can result in dysfunction of the urogenital tract, including incontinence, infertility and impotence, or in secondary malignancies, affecting patients' quality of life (5).
These two issues (overtreatment and lead-time bias) have led the European Association of Urology (EAU) (https://uroweb.org/guideline/prostate-cancer/) and American Urological Association (AUA) (https://www.auanet.org/guidelines/prostate-cancer-clinicallylocalized-guideline) guidelines to recommend two distinct conservative approaches for patients with PCa: active surveillance (AS) and watchful waiting (WW). AS is an internationally accepted management strategy for men with low- and intermediate-risk PCa with a low risk of disease progression. These patients retain the option to convert to curative treatment at the time of progression (6). AS thus attempts to reduce overtreatment by treating only those patients who are shown to progress.
Asymptomatic patients with localized disease and a life expectancy of less than 10 years at the time of diagnosis are unlikely to benefit from radical treatment because of the lead-time bias associated with PSA testing. Instead, these patients are offered WW (i.e., symptom-guided treatment) and may receive palliative treatment in case of progression to maintain quality of life.
Knowledge of potential risk factors for outcomes of men on WW is limited, as is knowledge of the natural history of the disease under WW. The current evidence for WW has mainly emerged from clinical studies, which might limit the identification of potential risk factors. PIONEER (https://prostate-pioneer.eu), a European network of excellence for big data in PCa and a partner in the Innovative Medicines Initiative's (IMI's) "Big Data for Better Outcomes" program, aims to improve PCa care across Europe through the application of big data analytics (7). Within the context of PIONEER, the objective of the current study is to apply data-driven strategies to identify predictors of symptomatic progression, hospitalization, ER visit, treatment initiation and death in order to support clinical decision making in the management of WW.

Objective
The objective is to develop and validate patient-level prediction models for symptomatic progression, hospitalization and palliative treatment initiation amongst prostate cancer patients on watchful waiting. In detail, we predict the 1-, 2-, and 5-year risk of symptomatic progression, hospitalization, ER visit, treatment initiation, and all-cause death based on age, clinical measurements and clinical conditions, to guide expectation management for both the clinician and the patient (Figure 1). For use in clinical practice, we develop an additional model (the clinical model) based on expert clinical input. Discrimination will be used to compare the full "big data" model with the clinical model.

Procedure
This study will follow a retrospective, observational, patient-level prediction design (https://ohdsi.github.io/TheBookOfOhdsi/PatientLevelPrediction.html). We define 'patient-level prediction' as a modeling process wherein an outcome is predicted within a time-at-risk relative to the target cohort start and/or end date. Prediction will be performed using a set of covariates derived from data recorded prior to the start of the target cohort. Figure 2 illustrates the prediction problem we will address: among a population at risk, we aim to predict which patients at a defined moment in time (t = 0) will experience an outcome during a time-at-risk. Prediction is done using only information about the patients in an observation window prior to that moment in time.
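The prediction problem can be illustrated with a minimal Python sketch. It assumes a hypothetical timeline of (event code, date) pairs for a single patient. The function name `split_timeline` and the event codes are illustrative only and are not part of the OMOP CDM or the PatientLevelPrediction package. The sketch shows the two constraints of the design: predictors come only from data recorded before t = 0, and the label records whether the outcome falls inside the time-at-risk.

```python
from datetime import date
from typing import List, Tuple

Event = Tuple[str, date]  # (hypothetical event code, event date)


def split_timeline(index_date: date, events: List[Event], outcome_code: str,
                   tar_start: int, tar_end: int) -> Tuple[List[Event], bool]:
    """Return (events usable as predictors, whether the outcome occurs in the time-at-risk).

    tar_start and tar_end are days relative to index_date (t = 0), both inclusive."""
    # observation window: only data recorded strictly before t = 0 may be used
    predictors = [(code, d) for code, d in events if d < index_date]
    # label: does the outcome occur within [t + tar_start, t + tar_end]?
    outcome = any(code == outcome_code and tar_start <= (d - index_date).days <= tar_end
                  for code, d in events)
    return predictors, outcome
```

For example, a PSA test recorded three months after diagnosis is excluded from the predictors. This prevents information from the future from leaking into the model.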
We follow the PROGRESS best practice recommendations for model development and the TRIPOD guidance for transparent reporting of the model results (8,9). For a description of all data sources, we refer to the appendices.
In all models, we estimate the risk at 1, 2, and 5 years after diagnosis. The corresponding population settings use time-at-risk windows of 0 to 365 days, 0 to 730 days, and 0 to 1826 days after diagnosis. In all settings, the minimum lookback period applied to the target cohort is 365 days. Patients without time at risk and patients with an outcome prior to diagnosis are not removed. We include only the first exposure per patient. Model performance will be evaluated using calibration plots and discrimination in both internal and external validation.
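The population settings above can be sketched in Python as follows. This is a hypothetical illustration, not the PatientLevelPrediction implementation. The `Exposure` record, `build_population` and `outcome_labels` are names chosen here, and the outcome is simplified to a single first-occurrence date per person. Because patients without full time-at-risk are kept, a patient who leaves the database before the window ends without an observed outcome is labelled 0.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Optional


class Exposure(NamedTuple):
    person_id: int
    index_date: date               # diagnosis date, start of WW (t = 0)
    observation_start: date        # first day the person is observed in the data
    outcome_date: Optional[date]   # first recorded outcome, if any


TIME_AT_RISK = {"1y": (0, 365), "2y": (0, 730), "5y": (0, 1826)}  # days after index
MIN_LOOKBACK_DAYS = 365


def build_population(exposures: List[Exposure]) -> List[Exposure]:
    """Keep the first exposure per person and require >= 365 days of prior observation.

    Persons without full time-at-risk and persons with an outcome before index are NOT removed."""
    first: Dict[int, Exposure] = {}
    for e in sorted(exposures, key=lambda e: e.index_date):
        first.setdefault(e.person_id, e)  # first exposure only
    return [e for e in first.values()
            if (e.index_date - e.observation_start).days >= MIN_LOOKBACK_DAYS]


def outcome_labels(e: Exposure) -> Dict[str, int]:
    """1 if the outcome occurs inside each time-at-risk window, else 0."""
    labels = {}
    for horizon, (start, end) in TIME_AT_RISK.items():
        offset = None if e.outcome_date is None else (e.outcome_date - e.index_date).days
        labels[horizon] = int(offset is not None and start <= offset <= end)
    return labels
```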

Quality Control
The PatientLevelPrediction package itself, as well as other OHDSI packages on which PatientLevelPrediction depends, use unit tests for validation.

Tools
This study will be designed using OHDSI tools and run with R.

Diagnostics
Reviewing the incidence rates of the outcomes in the target population prior to performing the analysis will allow us to assess the feasibility of the study. The full Shiny app can be viewed here: PIONEER watchful waiting.
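An incidence rate for the feasibility check is the number of outcome events divided by the person-time at risk. The following minimal Python sketch assumes a hypothetical record layout of (index date, end of observation, first outcome date). Follow-up is censored at the first outcome, the end of observation, or the end of the time-at-risk, whichever comes first. The study's own incidence figures come from the OHDSI tooling, not from this code.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# (index_date, observation_end, outcome_date or None): hypothetical record layout
Record = Tuple[date, date, Optional[date]]


def incidence_rate(records: Iterable[Record], tar_end_days: int,
                   per: float = 1000.0) -> Tuple[int, float, float]:
    """Return (events, person-years, events per `per` person-years) within [0, tar_end_days]."""
    events, person_days = 0, 0
    for index_date, obs_end, outcome_date in records:
        # follow-up ends at the end of the time-at-risk or of observation, whichever is first
        follow_up = max(0, min((obs_end - index_date).days, tar_end_days))
        if outcome_date is not None and 0 <= (outcome_date - index_date).days <= follow_up:
            events += 1
            person_days += (outcome_date - index_date).days  # censored at first outcome
        else:
            person_days += follow_up
    person_years = person_days / 365.25
    if person_years == 0:
        raise ValueError("no person-time at risk")
    return events, person_years, per * events / person_years
```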

Algorithm Settings
For the time-at-risk analyses, we use lasso logistic regression with a fixed random seed and a starting lambda value of 0.01.
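To show what the lasso penalty does, here is a minimal Python sketch of L1-penalised logistic regression fitted by proximal gradient descent (ISTA). This is a swapped-in technique for illustration. The study itself uses the R PatientLevelPrediction package, whose lasso is fitted with the Cyclops library and a different optimiser, and the `lam`, `step` and `n_iter` values below are assumptions. The fit is deterministic for fixed data, so in this sketch a random seed would only matter for the data split and cross-validation folds. The soft-thresholding step sets small coefficients exactly to zero, which is how the lasso performs variable selection.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrinks v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                       lam: float = 0.01, step: float = 0.1,
                       n_iter: int = 1000) -> Tuple[float, List[float]]:
    """Minimise mean log-loss + lam * sum(|w_j|) by proximal gradient descent.

    The intercept b is not penalised."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi  # residual
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= step * grad_b / n
        # gradient step on w followed by soft-thresholding (the L1 proximal step)
        w = [_soft_threshold(w[j] - step * grad_w[j] / n, step * lam) for j in range(p)]
    return b, w


def predict_risk(b: float, w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of the outcome for one covariate vector."""
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```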

Covariate Settings
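A covariate is included in the model only if it occurs in at least a fraction of 0.001 (0.1%) of the target population. In all models, we specified the medium-term lookback window as 180 days and the long-term lookback window as 365 days before diagnosis.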
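The two covariate settings can be sketched in Python as follows, under stated assumptions. Each (concept ID, date) pair is turned into binary features for the 180-day and 365-day lookback windows. Features present in fewer than 0.1% of the target population are then dropped. The helper names and the concept IDs used in usage are hypothetical. The sketch counts only days strictly before diagnosis, which is an assumption; the OHDSI FeatureExtraction package handles these windows itself.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Set, Tuple

MIN_COVARIATE_FRACTION = 0.001
LOOKBACK_WINDOWS = {"medium_term": 180, "long_term": 365}  # days before index


def window_features(index_date: date, events: Iterable[Tuple[int, date]]) -> Set[str]:
    """Turn (concept_id, event_date) pairs into binary features per lookback window."""
    features = set()
    for concept_id, event_date in events:
        days_before = (index_date - event_date).days
        for window, length in LOOKBACK_WINDOWS.items():
            if 0 < days_before <= length:  # strictly before t = 0 (assumption)
                features.add(f"{concept_id}_{window}")
    return features


def filter_rare(feature_sets: List[Set[str]],
                min_fraction: float = MIN_COVARIATE_FRACTION) -> Set[str]:
    """Keep features present in at least `min_fraction` of the target population."""
    n = len(feature_sets)
    counts = Counter(f for fs in feature_sets for f in fs)
    return {f for f, c in counts.items() if c / n >= min_fraction}
```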

First model
In the first model, we include age and all concept-based clinical measurements recorded in the 6 months before diagnosis, as defined in the OMOP Common Data Model. For model development, we will split the data into a training set (75%) and a test set (25%) for internal validation. The optimal lambda for the lasso regression will be selected by 3-fold cross-validation on the training set. Discriminative ability will be assessed by the area under the receiver operating characteristic curve (AUC), and the discrimination of the clinical model will be compared against that of the concept-based model.
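The validation scheme can be sketched in Python with a seeded 75/25 split, an assignment of training rows to 3 cross-validation folds, and a rank-based (Mann-Whitney) AUC. This is an illustrative stand-in, since PatientLevelPrediction performs the split and cross-validation internally. The function names and the seed value 42 are assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def train_test_split(n: int, test_fraction: float = 0.25,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Randomly split row indices 0..n-1 into (train, test) with a fixed seed."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def assign_folds(train_idx: Sequence[int], k: int = 3, seed: int = 42) -> Dict[int, int]:
    """Map each training index to one of k cross-validation folds (0..k-1)."""
    shuffled = list(train_idx)
    random.Random(seed).shuffle(shuffled)
    return {i: pos % k for pos, i in enumerate(shuffled)}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC; tied scores receive their average rank."""
    pairs = sorted(zip(scores, y_true))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

To select lambda under this scheme, one would fit the lasso on two folds for each candidate value and compute the AUC on the held-out fold. The value with the highest mean held-out AUC is kept, then refitted on the whole training set and evaluated once on the test set.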

Strengths & Limitations
A strength of the study is the inclusion of multiple data sources, such as clinical data and claims data, all standardized to the OMOP Common Data Model, which allows more generalizable results. The analysis of big data may identify predictors that are currently not used in daily clinical practice. This is both a limitation and an opportunity for the study. Newly identified significant predictors might not be part of current clinical procedures, in which case the findings may be of limited relevance for current clinical questions. On the other hand, they may provide an opportunity to adapt PCa treatment in the future.
A clear limitation of this study is that death is not accurately recorded in claims data, so results for the death outcome might be biased.

Protection of Human Subjects
Analyses are run locally to account for the sensitive nature of the data. Confidentiality of patient records will be maintained at all times. All study reports will contain aggregate data only and will not identify individual patients or physicians. At no time during the study will the sponsor receive patient-identifying information, except when required by regulations for reporting adverse events.

Tables & Figures
For the incidence rates and cohort characterization, we refer to PIONEER watchful waiting.

Figure 1. Overview of current study
Figure 2. The prediction problem