Artificial Intelligence-based Personalized and Risk-adapted Surveillance Management for Urologic Cancer: A SEER-based Study

Current clinical practice is limited by "one-size-fits-all" follow-up schedules or subjective assessments that are not necessarily appropriate for individual patients' risk profiles. A risk-adapted follow-up is more efficient and clinically optimal. Herein, we introduce an artificial intelligence-based, data-driven solution for a risk-adapted follow-up schedule that determines the number of follow-up visits per year and assesses the cancer-specific risk profile over a long interval depending on the age at diagnosis. We used the Surveillance, Epidemiology, and End Results (SEER) database, a cancer registry and a resource for cancer-specific survival estimation in the United States that includes more than 2 million patients diagnosed with urologic cancers. We compared how different machine learning algorithms rank feature importance and selected clinically relevant parameters that are regularly used in clinical routine to develop a survival model. The underlying recurrent neural network model for follow-up planning achieved an overall concordance index of 0.80 on the unseen test set. Controlled access to the online tool is available for physicians.


Introduction
Devising an individualized surveillance strategy for patients after oncologic interventions remains one of the most challenging tasks for clinicians caring for cancer patients. Treatment and surveillance strategies are typically based on the patient's risk profile and associated comorbidities [1][2][3][4][5] . After curative treatment or during active surveillance, patients are followed up at predefined intervals to identify treatment-associated side effects or prognosis-changing events that require clinical reassessment, initiation of a new treatment strategy, or modification of the ongoing treatment 6,7 .
Follow-up visits generally start three months after treatment, are conducted at 3- to 6-month intervals initially, and may be extended to an annual basis depending on cancer risk and the time elapsed since the initial treatment 8 . The frequency of follow-up visits is generally based on expert opinion, given that curative treatment fails in some patients within the first years after diagnosis 7 . Current clinical practice is limited by rigid scheduling or subjective assessment that is not necessarily appropriate for the patient's risk profile.
We hypothesize that risk-adapted, individualized follow-up planning for cancer is feasible. Toward that goal, we introduce an artificial intelligence (AI)-based, data-driven solution that recommends the number of follow-up visits per year, assesses the cancer-specific risk profile over a long period, and suggests when the follow-up plan should be reconsidered. For that purpose, we used a cancer registry database that includes more than 2 million patients diagnosed with cancers of the urogenital organs (i.e., prostate, testis, kidney, urinary bladder, ureter, renal pelvis, and penis) from the Surveillance, Epidemiology, and End Results (SEER) Program, which is considered a resource for cancer-specific survival estimation in the United States 9 . We further built a web-based, explainable AI interface that recommends a risk-adapted follow-up strategy for urologic cancer cases using an advanced machine learning algorithm and the puzzle concept. The puzzle concept assumes that all patients in a national dataset are pieces of a puzzle from which the population's survival history can be reconstructed. Our concept relies on a recurrent artificial neural network, a model class that has been applied to solving the N-puzzle problem and is frequently used in natural language processing (e.g., automated text generation) 10 .

Dataset
We used the SEER database covering the years 1975-2017 (Version 18) and included 2,006,052 patients diagnosed with one of the urologic cancers (i.e., prostate, testis, kidney, urinary bladder, ureter, renal pelvis, penis, and other genital organs). The affected organ was determined according to the documentation guideline of the SEER program (`Site recode B ICD-O-3/WHO 2008`). We collected 505 features from the database covering demographic, clinical, and pathological information.
Information on treatments was not included, since only surgery data were available.
Two additional features (cancer-specific death status and follow-up duration in years) provided the cancer-specific survival follow-up data. Supplement file 1 lists the variables considered in the current study.
We randomly divided the database into a development set (90%) and a held-out test set (10%), stratified by cancer-specific death status so that both sets had the same proportion of cancer-specific deaths. The development set was then split into a training set (90%) and a validation set (10%). The validation set was used to monitor the optimization of the model weights. We applied feature-wise normalization by ranking the distinct values (units) of each feature according to their order of appearance in the dataset and dividing by the maximum rank, yielding a value between 0 and 1. Missing values were encoded as -1. In contrast, the follow-up duration was fed into the model as a non-negative continuous value without normalization, and the cancer-specific death status was binarized (0: no cancer-specific death; 1: cancer-specific death).
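The split and normalization steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the helper names `stratified_split` and `rank_normalize` are hypothetical, missing values are assumed to be encoded as `None` before normalization, ranks are assumed to start at 1 (so observed values fall in (0, 1]), and the second (training/validation) split is assumed to be stratified in the same way as the first.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split row indices into (kept, held_out) so that each label value
    (e.g., cancer-specific death status) keeps the same proportion in both parts."""
    rng = random.Random(seed)
    rows_by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        rows_by_label[label].append(idx)
    kept, held_out = [], []
    for rows in rows_by_label.values():
        rows = rows[:]  # copy before shuffling
        rng.shuffle(rows)
        n_held_out = round(len(rows) * test_fraction)
        held_out.extend(rows[:n_held_out])
        kept.extend(rows[n_held_out:])
    return sorted(kept), sorted(held_out)


def rank_normalize(values):
    """Rank distinct values by order of first appearance (1, 2, ...), divide by the
    maximum rank, and encode missing values (None) as -1."""
    ranks = {}
    for value in values:
        if value is not None and value not in ranks:
            ranks[value] = len(ranks) + 1
    max_rank = max(ranks.values(), default=1)
    return [-1.0 if value is None else ranks[value] / max_rank for value in values]


# Toy example: 100 patients, 10 cancer-specific deaths.
death_status = [0] * 90 + [1] * 10
development, test = stratified_split(death_status, test_fraction=0.10)
dev_labels = [death_status[i] for i in development]
train_pos, val_pos = stratified_split(dev_labels, test_fraction=0.10)
train = [development[i] for i in train_pos]
validation = [development[i] for i in val_pos]
print(rank_normalize(["T2", "T1", None, "T2", "T3"]))
# → [0.3333333333333333, 0.6666666666666666, -1.0, 0.3333333333333333, 1.0]
```

Because the split is done separately for each label value, the event rate in every part matches that of the full data up to rounding.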
Reliability of Machine Learning for Feature Selection
A total of 506 features, including the follow-up duration, were considered as input data.
We evaluated the feature importance to determine whether machine learning can provide a list of clinically relevant features. For all features, we evaluated the Gini importance for the random forest and XGBoost algorithms and the coefficients for the linear support vector machine (LSVM). In addition, we estimated the hidden-layer coefficients and the Shapley values for the simple recurrent neural network (sRNN). Shapley values quantify the contribution of each feature to a prediction 11 . To derive a single hidden-layer coefficient per feature for the sRNN, we summed the coefficients feature-wise (506 coefficients were available per feature).
These models were trained on the training set and optimized on the validation set, and the feature importance was determined on the test set. The prediction target was the binarized cancer-specific death status.
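The feature-wise summation of hidden-layer coefficients for the sRNN can be sketched as follows, assuming the input-to-hidden weights are available as a matrix with one row per input feature (`weights[i][j]` links feature `i` to hidden unit `j`). The function names, the toy feature names, and the optional `use_abs` switch are illustrative assumptions; the text only states that the coefficients were summed per feature.

```python
def featurewise_importance(weights, feature_names, use_abs=False):
    """Sum each feature's row of hidden-layer coefficients into one score.
    weights[i][j] is the coefficient linking input feature i to hidden unit j."""
    if len(weights) != len(feature_names):
        raise ValueError("one weight row per feature is required")
    return {
        name: sum(abs(w) if use_abs else w for w in row)
        for name, row in zip(feature_names, weights)
    }


def top_k(scores, k=20):
    """Return the k feature names with the largest scores (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical 3-feature, 3-hidden-unit example.
weights = [
    [0.2, 0.1, 0.4],   # age at diagnosis
    [0.05, 0.0, 0.1],  # race
    [0.3, 0.6, 0.2],   # tumor stage
]
names = ["age_at_diagnosis", "race", "tumor_stage"]
scores = featurewise_importance(weights, names)
print(top_k(scores, k=2))  # → ['tumor_stage', 'age_at_diagnosis']
```

With raw summation, positive and negative coefficients can cancel; summing absolute values (`use_abs=True`) is a common alternative, but which variant the study used is not stated.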
Finally, we evaluated whether the top 20 features of each algorithm were all clinically relevant from the urological perspective or whether they included administrative, social, or demographic parameters, or outdated features no longer maintained by the SEER program. A feature was considered clinically relevant if it covered tumor staging, race, age at diagnosis, the number of malignancies, treatment-related information, or a cancer-specific risk factor (e.g., biopsy information, biomarkers, number of positive lymph nodes). Further, the list was not allowed to contain redundant or outdated features (e.g., year of diagnosis, different versions of the tumor-stage parameters). A top-feature list was considered invalid if it contained any clinically irrelevant feature.
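The validity rule for a top-feature list can be expressed as a small check, sketched below under assumptions: each feature is assumed to have been assigned a category label (e.g., by a urologist) in a lookup table, versions of the same variable are assumed to share a "concept" key, and all feature and category names are hypothetical.

```python
CLINICAL_CATEGORIES = {
    "tumor_stage", "race", "age_at_diagnosis",
    "number_of_malignancies", "treatment", "risk_factor",
}


def is_valid_top_list(top_features, category_of, concept_of=None, outdated=frozenset()):
    """Return True only if every feature is clinically relevant, not outdated,
    and not a redundant version of a concept already in the list."""
    concept_of = concept_of or {}
    seen_concepts = set()
    for feature in top_features:
        if feature in outdated or category_of.get(feature) not in CLINICAL_CATEGORIES:
            return False  # administrative, social, or outdated feature
        concept = concept_of.get(feature, feature)
        if concept in seen_concepts:
            return False  # e.g., two versions of the same tumor-stage variable
        seen_concepts.add(concept)
    return True


# Hypothetical category assignments.
category_of = {
    "age_at_diagnosis": "age_at_diagnosis",
    "race": "race",
    "t_stage_v1": "tumor_stage",
    "t_stage_v2": "tumor_stage",
    "year_of_diagnosis": "administrative",
}
concept_of = {"t_stage_v1": "t_stage", "t_stage_v2": "t_stage"}
print(is_valid_top_list(["age_at_diagnosis", "race", "t_stage_v1"], category_of, concept_of))  # → True
```

A single irrelevant, outdated, or redundant entry invalidates the whole list, mirroring the all-or-nothing criterion in the text.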

Feature Selection for Model Development
We aimed to select features for survival modeling that carry relevant clinical information on tumor stage and biomarkers, in addition to age at diagnosis and race, because of their well-established clinical importance in urologic cancers [1][2][3][4][5] .
The approach for feature selection depended on the results of the feature importance analysis described above. Since the SEER database can cover all variations of disease conditions and characteristics 12 , we defined a puzzle concept to develop our cancer-specific survival model. For that purpose, we chose a recurrent neural network (RNN) to reconstruct the risk profile from an information sequence, since RNNs are effective at learning to predict an outcome from a sequence of information 13 . For model development, we assumed that each patient provided an information sequence for a given time t and therefore contributed to the reconstruction of the cohort's risk-profile history. The model estimates the cancer-specific survival probabilities for a period T, where $t \in T := [1, 44]$ years. While developing and validating our model, we followed the recommendations of PROGRESS 14 .
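One common way to turn each patient into a per-year information sequence over T = [1, 44] is a discrete-time survival encoding. The sketch below is an assumption for illustration, since the text does not specify the encoding: each year t receives a label (cancer-specific death by year t) and a mask (whether the status at year t is known), and pooling the sequences of many patients reassembles a cohort survival curve, in the spirit of the puzzle concept, via a Kaplan-Meier-style product of yearly survival fractions. The function names are hypothetical.

```python
T_MAX = 44  # follow-up horizon in years: t in T = [1, 44]


def yearly_targets(follow_up_years, cs_death, t_max=T_MAX):
    """Encode one patient as per-year (label, mask) sequences for t = 1..t_max.

    label[t-1] = 1 if cancer-specific death occurred by year t, else 0.
    mask[t-1]  = 1 if the status at year t is known, 0 after censoring.
    """
    labels, mask = [], []
    for t in range(1, t_max + 1):
        if cs_death and follow_up_years <= t:
            label, observed = 1, 1  # death has occurred by year t
        elif follow_up_years >= t:
            label, observed = 0, 1  # still under observation and event-free
        else:
            label, observed = 0, 0  # censored before year t: status unknown
        labels.append(label)
        mask.append(observed)
    return labels, mask


def cohort_survival(encoded_patients, t_max=T_MAX):
    """Reassemble a cohort survival curve from per-patient sequences using a
    Kaplan-Meier-style product of yearly survival fractions."""
    survival, curve = 1.0, []
    for t in range(t_max):
        at_risk = deaths = 0
        for labels, mask in encoded_patients:
            alive_before = t == 0 or labels[t - 1] == 0
            if mask[t] and alive_before:
                at_risk += 1
                deaths += labels[t]
        if at_risk:
            survival *= 1 - deaths / at_risk
        curve.append(survival)
    return curve


patients = [(3, 1), (5, 0), (10, 0), (2, 1)]  # (follow-up years, cancer-specific death)
curve = cohort_survival([yearly_targets(years, death) for years, death in patients])
print(curve[:4])
```

The mask lets censored patients contribute only the years in which their status is known, which is what allows many partial "pieces" to be combined into one population curve.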

Hyperparameter Configuration
A grid search was applied to identify the optimal hyperparameter configuration for the model architecture [i.e., simple recurrent neural network (sRNN), gated recurrent units (GRU) 15 , …].

Risk Velocity-based Follow-up Assessment
After developing the risk-profile reconstruction model, we estimated the risk velocity as the change in risk per unit of time, because it reflects the time-dependent alteration in the risk profile and prognosis of the case. A high velocity indicates a rapid change in prognosis, whereas a low velocity indicates that the risk profile changes slowly over time. A velocity of zero (standstill) indicates that the risk profile is stable over time.
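As an illustration only (the text defines the velocity in words), the risk velocity of the follow-up score s at time t can be written as a derivative and, on a discrete grid of time points $t_i$, approximated by a forward difference; the grid notation and the choice of a forward difference are assumptions:

$$
v(t) = \frac{\mathrm{d}s(t)}{\mathrm{d}t} \approx \frac{s(t_{i+1}) - s(t_i)}{t_{i+1} - t_i}, \qquad t_i \le t < t_{i+1}
$$

A large $|v|$ then corresponds to a fast change in prognosis, a small $|v|$ to a slowly changing risk profile, and $v = 0$ to a stable (standstill) risk profile.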
To develop the risk velocity-based follow-up assessment for a case, we defined the following steps:
1. Since we aimed to develop a risk-adapted follow-up strategy for a single case, the risk prognostication is considered a follow-up score (s). The collection of follow-up scores (S) is defined as $S = \{ s \mid s \in \mathbb{R} \wedge 0 \le s \le 1 \}$.
2. We defined a set (P) of 120 time points (data points) with an equal time distance (k) within a year over the period $f = [a, b]$ from the age at diagnosis (a) to the median U.S. life expectancy 19 (b; 78 years) to generate a smooth follow-up curve.
6. The period for close follow-up is defined by the magnitude of the velocities: time intervals with score velocities exceeding a threshold of 0.5% were marked for close follow-up. We chose 0.5% because the lowest overall 5-year cancer-specific death probability was 0.5% 9 .
7. The follow-up plan needs to be adapted over time. Therefore, we added a rule based on the score velocity that suggests when to reconsider the follow-up plan. Since a score velocity of 0 means that the risk profile remained unchanged (stable) over a given time interval, it is intuitive to reassess the follow-up plan at that point.
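Steps 2, 6, and 7 can be combined into a small scheduling sketch. It rests on several labeled assumptions: the 120 time points are read as 120 equally spaced points per year, the 0.5% threshold is read as a score change of 0.005 per year, "exceeding" is applied to the absolute velocity, and a velocity of 0 is detected with a small numerical tolerance. `example_score` is a hypothetical stand-in for the model's follow-up score, and all function names are illustrative.

```python
LIFE_EXPECTANCY = 78.0      # median U.S. life expectancy used as the end of the period (b)
POINTS_PER_YEAR = 120       # assumption: 120 equally spaced time points per year
VELOCITY_THRESHOLD = 0.005  # 0.5%, assumed here to mean a score change of 0.005 per year


def time_grid(age_at_diagnosis, b=LIFE_EXPECTANCY, points_per_year=POINTS_PER_YEAR):
    """Equally spaced ages from the age at diagnosis (a) to b, inclusive."""
    n_steps = round((b - age_at_diagnosis) * points_per_year)
    return [age_at_diagnosis + i / points_per_year for i in range(n_steps + 1)]


def score_velocities(score_fn, grid):
    """Forward-difference velocity of the follow-up score (change per year)."""
    return [(score_fn(t1) - score_fn(t0)) / (t1 - t0) for t0, t1 in zip(grid, grid[1:])]


def close_follow_up_intervals(grid, velocities, threshold=VELOCITY_THRESHOLD):
    """Merge consecutive grid steps whose velocity exceeds the threshold into (start, end) ages."""
    intervals, start, end = [], None, None
    for i, v in enumerate(velocities):
        if abs(v) > threshold:
            if start is None:
                start = grid[i]
            end = grid[i + 1]
        elif start is not None:
            intervals.append((start, end))
            start = None
    if start is not None:
        intervals.append((start, end))
    return intervals


def reconsider_ages(grid, velocities, tol=1e-12):
    """Ages at which the velocity drops to (approximately) zero after a non-zero stretch."""
    return [grid[i] for i in range(1, len(velocities))
            if abs(velocities[i]) <= tol and abs(velocities[i - 1]) > tol]


def example_score(age, age_at_diagnosis=60.0):
    """Hypothetical follow-up score: rises linearly to 0.3 over 5 years, then plateaus."""
    return 0.3 * min(age - age_at_diagnosis, 5.0) / 5.0


grid = time_grid(60.0)
velocities = score_velocities(example_score, grid)
print(close_follow_up_intervals(grid, velocities))  # → [(60.0, 65.0)]
print(reconsider_ages(grid, velocities))            # → [65.0]
```

In this toy case, the first 5 years after diagnosis are marked for close follow-up, and the point where the score plateaus is suggested for re-evaluating the plan.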

Evaluation Metrics
The discriminatory accuracy was measured by Harrell's concordance index 20 . The model fit was assessed by comparing Kaplan-Meier curves of the prognosticated and observed probabilities 21 . The clinical utility of the survival model was assessed by its capability to define discriminative risk groups. Here, we assessed the observed cancer-specific survival probabilities at the 10th year after diagnosis after stratifying the test set by quantile thresholds (10%, 50%, 90%) of the prognosticated cancer-specific death probability determined on the training set. The log-rank test was applied to determine the significance of the discrimination between risk groups.
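Harrell's concordance index and the quantile-based risk grouping can be sketched in plain Python as below. This is an illustrative O(n²) implementation under simplifying assumptions (pairs with tied observed times are skipped, and the helper names are hypothetical), not the study's evaluation code; the Kaplan-Meier estimation and the log-rank test are not shown.

```python
from bisect import bisect_left
from statistics import quantiles


def harrell_c_index(times, events, risks):
    """Harrell's C: among comparable pairs (the shorter observed time ends in an event),
    the fraction where that patient also has the higher predicted risk; risk ties count 0.5."""
    concordant, comparable = 0.0, 0
    for t_i, e_i, r_i in zip(times, events, risks):
        if not e_i:
            continue  # a censored time cannot anchor a comparable pair
        for t_j, r_j in zip(times, risks):
            if t_i < t_j:
                comparable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def quantile_thresholds(train_risks):
    """10%, 50%, and 90% cut-offs of the predicted death probability on the training set."""
    deciles = quantiles(train_risks, n=10, method="inclusive")
    return [deciles[0], deciles[4], deciles[8]]


def assign_risk_groups(risks, thresholds):
    """0 = at or below the 10% cut-off, 1 and 2 = intermediate, 3 = above the 90% cut-off."""
    return [bisect_left(thresholds, r) for r in risks]


times = [1, 2, 3, 4]
events = [1, 1, 1, 0]
print(harrell_c_index(times, events, [0.9, 0.7, 0.5, 0.1]))  # → 1.0
thresholds = quantile_thresholds(list(range(101)))
print(assign_risk_groups([5, 10, 30, 95], thresholds))  # → [0, 0, 1, 3]
```

The thresholds are computed once on the training set and then applied unchanged to the test set, so that the test-set risk groups are not tuned on the data used for evaluation.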

Results
In the SEER database, complete follow-up information was available for 1,941,893 cases. Among these, 14.34% of patients died of one of the urologic cancers, with an overall median follow-up of 12 years (range 1-44 years). Figure 1 illustrates the distribution of the cancer-specific death status over the years. When stratified by age at diagnosis, patients aged 30-35 years had a median follow-up of 14 years, and patients aged 65-69 years had a median follow-up of 13 years (Figure 1A).
By stratifying according to the cancer-specific death status, the median time to cancer-specific death was 11 years, and the median observation time was 13 years for patients who were event-free at the time the 18th version of the SEER database was completed (Figure 1B and C).

Table (fragment): number of positive lymph nodes, coded 0 to 90, >=90, or unknown or not performed (-1); regional nodes examined (1988), the number of lymph nodes examined: …

… (Figure 3). When the test set was stratified by cancer origin (organ), the KM fit remained mostly unchanged. Stratification by quantile thresholds yielded well-discriminated risk groups (log-rank P < 0.001), as shown in Figure 4B. When compared with the tumor dissemination status used by the SEER program for survival estimation (Figure 4A), … significantly discriminative (log-rank P < 0.005).

Figure 4 caption (partial): For panel A, the log-rank P was calculated using the localized, distant, regional, and regional/localized stage groups. Localized and regional prostate cancer cases were combined due to their high 5-year survival rates (nearly 100%) compared with other entities. The KM curves were generated on the test set.

Follow-up Management Solution
We prepared an access-controlled web portal that incorporates the AI solution (https://www.myhealthadvisor.info) and covers 6 major urologic cancers for follow-up planning. The solution proposes follow-up intervals and displays the cancer-specific follow-up score over the follow-up period as a curve; it highlights the case's …

Figure 5 (panel content): example follow-up plans for men with prostate cancer (PCa), showing the recommended follow-up visits per year, the point at which re-evaluation of the follow-up plan is recommended, and clinical events (BCR, RTX, cM0, cN0, cM1b, CRPC, start of ADT). Example cases: (1) A white man, 60 years old at diagnosis, with PCa confirmed by biopsy indicated due to elevated PSA; detailed information about biopsy cores is not available. (2) A white man, 65 years old at diagnosis, with PCa confirmed by biopsy (cT1c) indicated due to an elevated PSA of 5 ng/mL (08'); 2 of 12 biopsy cores were positive, the maximum tumor length was 15%, and the Gleason grade was 3+3; active surveillance was recommended. (3) A white man, 67 years old at diagnosis, with PCa confirmed by biopsy (1/10) indicated due to elevated PSA; treated with RPE (04'), pT3aR0 pN1 (1/3) cM0, Gleason grade 4+5=9. (4) A white man, 61 years old at diagnosis, with PCa confirmed by biopsy indicated due to elevated PSA; only the total number of biopsy cores is known (6); treated with RPE (95'), pT3aR0 pN0 cM0, Gleason grade 3+4=7a.

We found that the observation intervals suggested by our algorithms (red area in Figure 5) covered the first 5 years, which are the main focus of current clinical guidelines and follow-up practices 23 . Our approach also covers the period beyond 5 years, for which no study data on optimal observation management of urologic cancers are available.
Our AI solution can deliver clinically meaningful follow-up plans, as illustrated for men diagnosed with prostate cancer. For instance, our solution correctly scheduled close observation for the example case who developed castration-resistant prostate cancer (CRPC).
Developing a follow-up model for cancer survivors is considered one of the major challenges in clinical routine 24 . Various cancer organizations recommend a risk-stratified approach to care and transition for follow-up planning 24 . Our solution may help cancer patients obtain a data-driven follow-up plan personalized to their risk profile.
Feature importance aims to explain the contribution of a feature to a machine learning model's prediction 25 (Figures 1 and 3).
The current study has potential limitations and controversies that warrant mention. First, cancer progression is a dynamic process influenced by different factors. To mitigate this limitation, our study adopted a puzzle-completion concept and developed the survival model on a large dataset covering a long follow-up period. Diagnostic and therapeutic improvements have helped detect cancers at earlier stages, resulting in longer survival, as reflected in the SEER database 9 . Most of the recent drug options for different urologic cancers received their first approval in the early 2000s 27,28 . Although the SEER database does not include detailed treatment information, the version of the SEER database used in the current study (1975-2017) very likely covered patient groups who received such approved drugs. Information on patient comorbidities, complications after local treatment, and different treatment options was lacking, although these factors undoubtedly influence progression and survival. The staging procedure was not standardized, which may have led to an underestimation or overestimation of the tumor extent and thereby affected data quality. However, SEER-based data reflect the actual situation in daily clinical practice.
Further, the SEER database is the only comprehensive population-based database in the United States and represents an ideal resource for studying the survival of patients diagnosed with cancer, especially in recent periods. We did not include treatment status in our survival modeling because the clinical and pathological information may implicitly carry information about the treatment strategy. Although we did not study other well-known prognostic factors such as treatment response, we chose cancer-specific death status as the endpoint for the follow-up model because it also reflects treatment response and cancer progression risk. We demonstrated that AI can help develop personalized, risk-based follow-up strategies, in contrast to current practice. We preferred the national cancer registry because it represents the different clinical conditions of cancer diseases better than study cohorts from a single institution or even two institutions 12 . We did not consider tumor grading for bladder cancers due to the significant variation in the definition of tumor grading between versions and the challenge of adopting the recent version of tumor grading in clinical routine 29,30 . For testicular cancer, we currently preferred AFP over the serum level of beta human chorionic gonadotropin (BHCG) because AFP provides additional information on the testicular cancer subtype compared with BHCG 31 .
In the future, we aim to connect this AI solution to the hospital information system (HIS) via an application programming interface (API) and the HL7 protocol 32 , as an addition to the HIS-based application for real-time survival rate estimation introduced in our previous study 33 . Another potential use of our AI-based follow-up planner is defining follow-up schedules for clinical trials.
In summary, AI-based follow-up management is feasible; it provides decision-support and patient-friendly tools for a more personalized follow-up plan for men with prostate cancer. We believe that having a defined follow-up plan in advance enables better time management for patients and healthcare personnel. Further, the current study unlocked the potential of a national cancer registry for developing clinically oriented solutions. Finally, we call on research teams from different regions to validate our model independently, as validation by independent studies is essential 14 . Secure API access, which is required to communicate with the solution, will be provided to these research teams upon request. API access will not be publicly available due to security concerns.