Artificial Intelligence Predictive Model for Hormone Therapy Use in Prostate Cancer

Background: Androgen deprivation therapy (ADT) with radiotherapy can benefit patients with localized prostate cancer. However, ADT can negatively impact quality of life and there remain no validated predictive models to guide its use.
Methods: Digital pathology image and clinical data from pre-treatment prostate tissue from 5,727 patients enrolled on five phase III randomized trials treated with radiotherapy +/− ADT were used to develop and validate an artificial intelligence (AI)-derived predictive model to assess ADT benefit with the primary endpoint of distant metastasis. After the model was locked, validation was performed on NRG/RTOG 9408 (n = 1,594) that randomized men to radiotherapy +/− 4 months of ADT. Fine-Gray regression and restricted mean survival times were used to assess the interaction between treatment and predictive model and within predictive model positive and negative subgroup treatment effects.
Results: In the NRG/RTOG 9408 validation cohort (14.9 years of median follow-up), ADT significantly improved time to distant metastasis (subdistribution hazard ratio [sHR] = 0.64, 95%CI [0.45–0.90], p = 0.01). The predictive model-treatment interaction was significant (p-interaction = 0.01). In predictive model positive patients (n = 543, 34%), ADT significantly reduced the risk of distant metastasis compared to radiotherapy alone (sHR = 0.34, 95%CI [0.19–0.63], p < 0.001). There were no significant differences between treatment arms in the predictive model negative subgroup (n = 1,051, 66%; sHR = 0.92, 95%CI [0.59–1.43], p = 0.71).
Conclusions: Our data, derived and validated from completed randomized phase III trials, show that an AI-based predictive model was able to identify prostate cancer patients, with predominately intermediate-risk disease, who are likely to benefit from short-term ADT.


Introduction
Radiotherapy is a common form of treatment administered with curative intent for localized prostate cancer. Trials conducted since the 1980s have consistently demonstrated an improvement in oncologic outcomes when androgen deprivation therapy (ADT) is added to radiotherapy [1][2][3][4][5]. However, ADT has well-documented toxicity, including hot flashes, declines in libido and erectile function, loss of muscle mass, increase in body fat, osteoporosis, and potential deleterious effects on cardiac and brain health [6].
Unfortunately, there remain no predictive biomarkers to identify which men specifically derive benefit from ADT with radiotherapy, and thus current guidelines recommend the use of ADT based on prognostic National Comprehensive Cancer Network (NCCN) risk groups or other methods of prognostication [12].
Gleason grading has modest prognostic ability and a plethora of tissue-based gene expression, serum, and imaging biomarkers have also been developed. While some have demonstrated improvements in prognostication [13], none have been shown to function as predictive biomarkers for ADT use with randomized trial validation. Thus, there is a large unmet need to guide the individualized use of ADT with radiotherapy for men with localized prostate cancer.
Digital pathology has been used for years as a method to archive, visualize, and share histopathology images [14]. More recently, there has been growing interest in leveraging artificial intelligence (AI) to assist in the diagnosis and grading of prostate cancer [15][16][17]. Fundamentally, these efforts restrict AI to predicting human-interpretable and defined features (i.e., Gleason score). In a recent study, a multi-modal AI (MMAI) system leveraging digital histopathology and clinical data from five NRG Oncology phase III clinical trials, termed the MMAI Prostate Prognostic Model, was used to develop and validate prognostic models that consistently outperformed NCCN risk groups in localized prostate cancer [18]. In this study, we extend this approach by adapting the MMAI Prostate Prognostic Model to develop a predictive model, based on "deep learning", that has the potential to be used to identify men who will benefit from ADT.
In this report, we used extant data from four NRG Oncology North American phase III randomized trials, i.e., NRG 9202, 9413, 9910, and 0126, with long-term follow-up data, including pathology images. Data from these trials were acquired, digitized, and used to train a predictive AI model for the identification of men with localized prostate cancer who were likely to derive differential benefit from the addition of ADT to radiotherapy. This predictive model for differential benefit from ADT was then validated using data from NRG/RTOG 9408, a clinical trial which randomized men to treatment with radiotherapy plus or minus 4 months of ADT; this trial consisted mostly of men with intermediate-risk prostate cancer, defined as Gleason score of 7 or a Gleason score of 6 or less with a PSA 10-20 ng/mL or a clinical stage T2b

Methods
Ancillary Project and Trial Details
NRG Oncology randomized phase III trials conducted in men with localized non-metastatic prostate cancer that enrolled at least a subset of patients with intermediate-risk disease, included treatment with radiotherapy alone or with ADT, had long-term follow-up defined as a median follow-up greater than 8 years, and had stored histopathology slides in the NRG Oncology Biospecimen Bank were eligible for inclusion. Trials testing the use of chemotherapy were excluded. Data from five prospective phase III randomized trials (NRG/RTOG 9202, 9413, 9910, 0126, and 9408) were identified and used for the development and validation of a predictive model for the escalation of hormone therapy in patients with localized prostate cancer [7][8][9][10][11]. NRG/RTOG 9408 was used as the validation cohort in this study as it represents one of the largest phase III clinical trials evaluating patients who received radiotherapy with or without 4 months of ADT. All image data from the remaining trials were used for the image feature extraction model, and full image, clinical and outcome data from NRG/RTOG 9910 and 0126 were used for downstream predictive model development.

Objective and endpoints
The primary objective was to develop and validate an AI-based predictive model that could identify differential benefit from the addition of short-term ADT to radiotherapy in localized prostate cancer. The primary endpoint was time to distant metastasis, measured from time of randomization until development of distant metastasis or last follow-up. The secondary objective was to evaluate the predictive model on a secondary endpoint, prostate cancer-specific mortality (defined in the present study as death in the setting of distant metastasis). Metastasis-free survival (MFS, distant metastasis or death from any cause) and overall survival (OS) were evaluated as exploratory endpoints.
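The endpoint definitions above can be illustrated with a minimal sketch that turns one patient's follow-up record into (time, status) pairs. The field names (`dm_time`, `death_time`, `last_followup`) and the 0/1/2 status codes are hypothetical, chosen only for illustration; treating death without the event of interest as a competing risk follows the Statistical Analysis section.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative status codes (an assumption, not the trial's data dictionary).
CENSORED, EVENT, COMPETING = 0, 1, 2


@dataclass
class Patient:
    # All times are in years from randomization; None means "not observed".
    dm_time: Optional[float]     # distant metastasis
    death_time: Optional[float]  # death from any cause
    last_followup: float         # last contact


def distant_metastasis(p: Patient) -> Tuple[float, int]:
    """Primary endpoint: death without metastasis is a competing risk."""
    if p.dm_time is not None:
        return p.dm_time, EVENT
    if p.death_time is not None:
        return p.death_time, COMPETING
    return p.last_followup, CENSORED


def pcsm(p: Patient) -> Tuple[float, int]:
    """Prostate cancer-specific mortality: death in the setting of metastasis."""
    if p.death_time is not None:
        status = EVENT if p.dm_time is not None else COMPETING
        return p.death_time, status
    return p.last_followup, CENSORED


def mfs(p: Patient) -> Tuple[float, int]:
    """Metastasis-free survival: first of metastasis or death from any cause."""
    observed = [t for t in (p.dm_time, p.death_time) if t is not None]
    if observed:
        return min(observed), EVENT
    return p.last_followup, CENSORED
```

A patient who dies at 6 years without metastasis therefore counts as a competing event for distant metastasis but as an event for MFS, which is why the two endpoints can behave differently in a cohort where most deaths are unrelated to prostate cancer.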

Histopathology image acquisition
Unannotated hematoxylin and eosin (H&E)-stained histopathology slides in patients with localized prostate cancer from the NRG Oncology Biospecimen Bank were independently digitized without access to clinical outcomes data. The slides were digitized using a Leica Biosystems Aperio AT2 digital pathology scanner at a 20x magnification level.

Image feature extraction model development
The first component of model development was image feature extraction, which was trained on images only to recognize defining tissue features and did not evaluate any clinical variables or outcomes. For each patient, the tissue across all available digital slides was divided into 256 x 256-pixel patches. A ResNet-50 feature extraction model was trained on image patches using self-supervised learning (SSL) [19].
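As a rough illustration of the patching step, the sketch below enumerates 256 x 256-pixel patch coordinates for an image of a given size. The non-overlapping grid, the dropping of partial patches at the right and bottom edges, and the absence of any tissue filtering are assumptions made for this example; the text does not specify them, and `patch_boxes` is a hypothetical helper name.

```python
from typing import Iterator, Tuple

PATCH = 256  # patch edge length in pixels, as stated in the text

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def patch_boxes(width: int, height: int, patch: int = PATCH) -> Iterator[Box]:
    """Yield boxes for a non-overlapping grid of full-size patches.

    Partial patches at the right and bottom edges are dropped
    (an assumption of this sketch).
    """
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            yield (left, top, left + patch, top + patch)


# Hypothetical 1000 x 600 pixel region: 3 columns x 2 rows of full patches.
boxes = list(patch_boxes(1000, 600))
print(len(boxes), boxes[0], boxes[-1])
```

Each box could then be cropped from the slide image and passed to the feature extractor; that part depends on the imaging library and is left out here.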
We employed the MoCo-v2 training protocol without access to any clinical or outcomes data [20] [...] tuning (40%) sets for model training and hyperparameter tuning, respectively [21,22]. Clinical data, image data, and treatment types were used as inputs to a multimodal predictive model architecture (Figure S1A in Supplementary Appendix). The treatment type was used only for model development; treatment type was not required for model score generation on the locked model. The image and clinical data were preprocessed as specified in the Methods for Multimodal Deep Learning Model Development Section in Supplementary Appendix. The multimodal predictive model optimized the difference in the magnitude of ADT benefit, outputting a continuous score 'delta' (Figure S1A in Supplementary Appendix). The 67th percentile of the delta scores in the development set was selected as the cutoff threshold as it maximized the difference between predictive model subgroup treatment effects in the tuning set and would result in reasonably sized predictive model subgroups for clinical utility. Patients with a delta score greater than the cutoff are classified as predictive model positive and those below the cutoff as predictive model negative (Figure S1B in Supplementary Appendix). Model development was performed using Python programming language (Python Software Foundation. Python Language Reference, version 3.8.12. Available at http://www.python.org). After the model was locked, it was provided to independent biostatisticians (HCH and JZ) to perform clinical validation of the model in NRG/RTOG 9408.
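The thresholding rule can be sketched as follows. This is only an illustration of a percentile cutoff applied to a continuous score: the delta scores are synthetic, the linear-interpolation percentile is one of several common definitions and is assumed here, and a score exactly equal to the cutoff is labeled negative as an assumption (the text covers only scores above and below the cutoff).

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    if not values:
        raise ValueError("values must not be empty")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def classify(delta_scores: Sequence[float], cutoff: float) -> List[str]:
    """Label each score; ties with the cutoff are labeled negative (assumption)."""
    return ["positive" if d > cutoff else "negative" for d in delta_scores]


# Synthetic development-set scores 0.00, 0.01, ..., 1.00 (not real model output).
development_scores = [i / 100 for i in range(101)]
cutoff = percentile(development_scores, 67)

# The cutoff is fixed on the development set and then applied unchanged elsewhere.
labels = classify([0.50, 0.67, 0.90], cutoff)
print(cutoff, labels)
```

With a cutoff at the 67th percentile, roughly one third of a similar population would be labeled positive, which is in line with the 34% positive rate reported for the validation cohort.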

Statistical Analysis
The NRG/RTOG 9408 validation cohort characteristics by predictive model status (positive or negative) were reported and compared using chi-square test or Fisher's exact test in the presence of low cell counts for categorical variables, and Wilcoxon rank-sum test for continuous variables. Time to event was analyzed using the cumulative incidence function; for distant metastasis and prostate cancer-specific mortality, death without the corresponding event was treated as a competing risk. Fine and Gray regression was also performed to estimate the subdistribution hazard ratio (sHR) and 95% confidence interval (CI) for the short-term ADT treatment effect for distant metastasis and prostate cancer-specific mortality [23]. A test for predictive model-treatment interaction was performed to evaluate this predictive model. Treatment effects within the predictive model positive and negative subgroups were assessed in the same way as in the overall validation cohort to measure the relative treatment effect between arms. Fifteen-year restricted mean survival times were reported to provide alternative estimates given that non-proportional hazards were observed [10].
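To make the competing-risk handling concrete, the sketch below computes a nonparametric cumulative incidence function with the Aalen-Johansen estimator. It is an illustration in plain Python on a tiny synthetic data set, not the analysis code used in the study (which was run in R), and it does not implement Fine and Gray regression. The status coding (0 = censored, 1 = event of interest, 2 = competing event) is an assumption of this example.

```python
from typing import List, Sequence, Tuple


def cumulative_incidence(
    times: Sequence[float], status: Sequence[int], cause: int = 1
) -> List[Tuple[float, float]]:
    """Aalen-Johansen estimate of the cumulative incidence of `cause`.

    status: 0 = censored, any other value = event type.
    Returns (time, CIF) at each distinct time where any event occurs.
    """
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            s = data[i][1]
            n_here += 1
            if s != 0:
                d_any += 1
                if s == cause:
                    d_cause += 1
            i += 1
        if d_any:
            # Increment uses event-free probability just before t.
            cif += event_free * d_cause / n_at_risk
            event_free *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve


# Synthetic example: 5 patients, times in years.
curve = cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 0, 1, 0])
print(curve)
```

Unlike one minus a Kaplan-Meier estimate that censors competing deaths, this estimator does not let a patient who has died without metastasis remain hypothetically at risk, so the curve cannot overstate the incidence of the event of interest.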
Exploratory subgroup analyses were performed where the primary analysis was reanalyzed within NCCN low- and intermediate-risk patients. Due to stage and Gleason score migration, low-risk patients from NRG/RTOG 9408 are more similar to contemporary intermediate-risk patients and were included in the subgroup analyses. Statistical analyses were performed using R, version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria). No multiplicity adjustments for the secondary and exploratory endpoints were defined. Therefore, only point estimates and 95% confidence intervals are provided. The confidence intervals have not been adjusted for multiple comparisons and should not be used to infer definitive treatment effects. Differences in percentages may not add up due to rounding.

Patient and Model Characteristics
Of the 7,752 eligible patients enrolled on the five phase III randomized trials, 6,020 (77.7%) patients had available slides at the NRG Biospecimen Bank. Of these patients, 5,727 (95.1%) had available pretreatment prostate slides. Pre-treatment slides were not available for 285 patients and 8 patients had insufficient tissue. Additionally, 39 patients with transurethral resection of the prostate samples were further excluded from the validation cohort (NRG/RTOG 9408). Details regarding the representativeness of the trial patients are provided in Table S3 in Supplementary Appendix [24].
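The patient-flow figures in this paragraph are internally consistent, as the short check below shows. The variable names are illustrative; the counts are taken directly from the text.

```python
# Counts reported in the text.
eligible = 7752
with_slides = 6020
no_pretreatment_slides = 285
insufficient_tissue = 8

# Patients with usable pre-treatment prostate slides.
evaluable = with_slides - no_pretreatment_slides - insufficient_tissue

pct_with_slides = round(100 * with_slides / eligible, 1)
pct_evaluable = round(100 * evaluable / with_slides, 1)

print(evaluable, pct_with_slides, pct_evaluable)  # 5727 77.7 95.1
```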
The development cohort for the downstream predictive model for differential benefit from ADT had 2,024 patients with a median follow-up of 10.6 years, and 1,050 (52%) patients received radiotherapy alone and 974 (48%) patients received radiotherapy with short-term ADT (Table S2 and Table S4 in Supplementary Appendix). The median PSA was 9 ng/mL (interquartile range [IQR], 6-13), 87% had intermediate-risk disease, and the median age was 71 years (IQR, 65-74). The final locked model was composed primarily of histopathology features (Gleason score and imaging features), contributing to more than 86% of model prediction (Figure S2 in Supplementary Appendix). While histopathology features provide a large contribution, the multi-modal AI architecture utilizes deep learning and also captures interaction effects, with the model benefitting from learning of all features.
The validation set (NRG/RTOG 9408) consisted of 1,594 patients with a median follow-up of 14.9 years, with the arms reasonably balanced in size (RT alone = 806 patients, and RT plus short-term ADT = 788 patients; Fig. 1 and Table 1). The median PSA was 8 ng/mL (IQR, 6-12), 56% had intermediate-risk disease, and the median age was 71 years (IQR, 66-74). To evaluate representativeness of the overall trial cohort, baseline characteristics between trial arms, evaluable cohort and original eligible cohorts for the NRG/RTOG 9408 trial are outlined in Table 1. In the validation set, 543 patients (34%) were classified as predictive model positive (predicted to benefit most from short-term ADT), and 1,051 patients (66%) were predictive model negative (predicted to derive lesser or no benefit from short-term ADT). Baseline characteristics were generally well-matched between predictive model positive and negative patients, except Gleason score, where 24% of predictive model positive patients versus 30% of predictive model negative patients had a Gleason score of 7 (Table S5 in Supplementary Appendix).
On exploratory subset analysis, when restricting the analyses to solely patients with low- and intermediate-risk disease, the results remained similar (Figure S3 in Supplementary Appendix).
We did not observe differential treatment benefits between predictive model subgroups on the exploratory endpoints, MFS and OS (p-interaction = 0.31 and 0.23, respectively; Figure S4 in Supplementary Appendix). The predictive model effects on distant metastasis and prostate cancer-specific mortality were evaluated within each treatment arm (Table S6 in Supplementary Appendix). For distant metastasis, within the RT alone arm, the predictive model positive vs negative subgroup sHR was 1.93 (95% CI 1.24-2.98), whereas within the RT + short-term ADT arm, the predictive model sHR was 0.72 (95% CI 0.39-1.34); similar results were found for prostate cancer-specific mortality as well.
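The within-arm model effects reported here and the within-subgroup treatment effects reported in the abstract (sHR = 0.34 in model positive and 0.92 in model negative patients) are two views of the same interaction. As a rough consistency check, assume a single multiplicative model for the subdistribution hazard $\lambda$ with a treatment indicator $T$ (1 = short-term ADT), a model status indicator $M$ (1 = positive), and their product; the coefficients $\beta_T$, $\beta_M$, and $\beta_{TM}$ are notation introduced here for illustration. Under that assumption the ratio of ratios is the same whichever way it is computed:

```latex
\begin{align*}
\log \frac{\lambda(t \mid T, M)}{\lambda_0(t)}
  &= \beta_T T + \beta_M M + \beta_{TM}\, T M \\[4pt]
e^{\beta_{TM}}
  &= \frac{\mathrm{sHR}_{\text{ADT vs RT} \,\mid\, M=1}}
          {\mathrm{sHR}_{\text{ADT vs RT} \,\mid\, M=0}}
   = \frac{\mathrm{sHR}_{M=1 \text{ vs } 0 \,\mid\, \text{ADT}}}
          {\mathrm{sHR}_{M=1 \text{ vs } 0 \,\mid\, \text{RT}}} \\[4pt]
\frac{0.34}{0.92} &\approx 0.37,
\qquad
\frac{0.72}{1.93} \approx 0.37
\end{align*}
```

The two ratios agree to two decimal places. Exact equality is not expected, because the published subgroup and within-arm estimates are rounded and may come from separately fitted models.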

Discussion
The current standard of care for men with intermediate-risk, specifically unfavorable intermediate-risk, localized prostate cancer treated with RT is the addition of short-term ADT. Despite the improvement in outcomes in all-comers, the majority of men will not develop distant metastasis with RT alone, and many will experience side effects from ADT. Unfortunately, there are no validated predictive models to guide ADT use or duration in these men. Herein, we report our results using novel deep learning methodology and leveraging image data from over 5,000 patients on five phase III randomized trials with long-term follow-up to create and validate a predictive model to guide ADT use with RT in men with localized prostate cancer.
As a patient's prognosis worsens (i.e., going from NCCN low- to high-risk) the recommendations to add ADT to RT strengthen. This is despite evidence that NCCN risk groups are not predictive of ADT benefit [5].
To this point, we demonstrate that among patients with positive and negative AI model predictions, the baseline PSA, T-stage, and NCCN risk group distributions were similar; there were small differences in Gleason score. These results confirm that historical categorization of tumor aggressiveness alone is insufficient to determine which patients derive differential relative benefit from ADT.
A concern with any model is the possibility of overfitting and failure to validate. This cannot be overstated, and independent validation remains necessary to prove the performance of a model. In the specific case of predictive models, which aim to identify those patients who derive greater or lesser relative benefit, this almost always should be performed within the context of a randomized trial of the treatment of interest to avoid confounding and bias between arms. Herein, we intentionally selected NRG/RTOG 9408, as it remains the largest published trial of radiotherapy with or without short-term ADT with very long-term follow-up. While there was clear benefit of ADT in unselected patients in this trial, the majority of patients enrolled had no demonstrable benefit. Our results indicate that over 60% of the intermediate-risk patients enrolled on NRG/RTOG 9408 could be spared the morbidity and costs of ADT.
The primary endpoint of time to distant metastasis was specifically selected to train the short-term ADT predictive model. Other endpoints, such as biochemical recurrence, metastasis-free survival (MFS), and OS all have clinical relevance, but in the context of localized prostate cancer model development have notable limitations. ADT inhibits PSA production, and thus ADT is expected to delay biochemical recurrence irrespective of subgroup. Furthermore, the majority of biochemical recurrence events do not result in metastasis or death [25]. Therefore, it is a suboptimal endpoint for model training to determine intrinsic tumor-specific benefit from ADT. MFS and OS are important endpoints for determining the net effect of a given therapy and are the gold standard for clinical trial design as they also capture death from competing causes. However, they are suboptimal endpoints for development of prostate cancer-specific predictive models for localized disease. This is because 78% of deaths in the validation cohort were not from prostate cancer, and only 12% of events in the MFS endpoint were from metastatic events. Thus, the strongest prediction models for MFS and OS would be driven by variables associated with death from non-prostate cancer causes (i.e., comorbid conditions). Importantly, despite the model being trained for distant metastasis, it showed a clear differential impact of ADT by predictive model status for prostate cancer-specific mortality, a cancer-driven endpoint.
As with any model, generalizability is critical. Concerns have been raised about AI models derived from a limited number of centers and in cohorts with limited diversity. Due to the limitations of the available data, we were unable to fully account for the potential confounding effect of factors impacting various aspects of health (e.g., socioeconomic status). Fortunately, NRG/RTOG enrolls patients from over 500 centers across primarily the USA and Canada from academic, community, and Veterans Affairs centers, and 20% of the 1,594 patients in the validation cohort were African American, which is higher than the proportion of African American men (15.6%) diagnosed with localized prostate cancer in the United States [26]. This important real-world diversity strengthens the generalizability of our findings. However, this study was underpowered to further assess the predictive performance of the model for African American men and future studies are needed for evaluation.
The study has limitations. Similar to other prognostic and predictive models in active clinical use, our short-term ADT predictive model was not developed and validated as part of a de novo prospective, model-dedicated trial. This approach is supported by Simon et al, and use of a randomized trial of RT with or without ADT strengthens the credibility and level of evidence of our work [27]. During the era of conduct and follow-up of this trial, there was effectively no use of advanced molecular imaging. Grade migration due to changes in the Gleason grading system may also have impacted patient stratification into NCCN risk groups. However, any potential biases introduced by this are likely random and impact both trial arms, and the raw histopathology imagery would not be impacted by changes in definitions of grading over time. Information on other prognostic clinicopathologic variables, such as percentage Gleason pattern 4 or percent positive biopsy cores, was not available. Thus, alternative risk-classification schemas for exploratory analyses were not performed [28,29].

Conclusions
We have developed and independently validated in a completed phase III randomized trial an AI-based predictive model to guide ADT use with radiotherapy in localized prostate cancer using a novel, multimodal, AI-derived digital pathology-based platform. Using this predictive model, we showed from the trial data that the majority of intermediate-risk patients did not benefit from ADT treatment.

Declarations
Data Sharing Statement: The data published in this article will be publicly available six months from publication, through requests made to NRG Oncology at APC@nrgoncology.org.
Model Availability Statement: The model used in this study is proprietary and currently available to patients as part of a clinical test that may be ordered by physicians through the Artera laboratory. It is not currently available for public research use due to commercial restrictions.

Karnofsky performance status scores range from 0 to 100. A higher score indicates the patient having better ability to carry out daily activities.

Figures
ST-ADT = short-term androgen-deprivation therapy; RT = radiotherapy. Cumulative incidence in the validation cohort, NRG/RTOG 9408, histopathology-imaged patients by AI predictive model subgroups for A) distant metastasis and B) prostate cancer-specific mortality.

Supplementary Files
Supplementary file associated with this preprint: EV2300023.R2SuppClean.docx