Personalized Prognostication of Kidney Transplant Candidates on the Waiting List: A Machine Learning Approach

Kidney transplantation (KT) remains the best available treatment for end-stage kidney disease. Currently, there is an unmet need for personalized clinical decision aids that help clinicians and KT candidates on the waitlist (WL) make informed decisions. To this end, we took a machine learning (ML)-based approach, using random survival forests with competing risks (RSF-CR) to develop a personalized prognostication system that predicts patient outcomes. This work demonstrates the value of RSF-CR in the identification of prognostic factors and the personalized prediction of outcomes for KT candidates on the WL. More research is warranted to fully unlock the potential of ML in providing personalized medicine.


Introduction
Kidney transplantation (KT) remains the best available treatment for end-stage renal disease (ESRD); however, the need for kidneys continues to far exceed the number of organs available. At the end of 2018, there were approximately 92,000 KT candidates on the waitlist (WL); during the same year, 14,784 and 6,120 candidates received deceased donor (DDKT) and living donor KT (LDKT) respectively, 4,193 died, and 4,240 became too sick for transplant while waiting for a kidney 1 . Keeping patients well informed of likely outcomes and treatment options is an essential component of patient-centered care and informed decision-making. It is thus critical to develop and adopt clinical decision aids that healthcare services may use to provide patients with accurate and complete information 2 . KT candidates often weigh multiple treatment options (e.g., whether to continue statin therapy or go on dialysis while waiting for a kidney) 3 . Consequently, clinicians are frequently asked by KT candidates during clinical consultations about the likelihood of receiving a kidney. In the United States, the answer to this question primarily depends on the median waiting time, i.e., the time by which 50% of candidates received KT in the past, calculated from a Kaplan-Meier survival curve 4 . While providing this statistic does give patients an answer, it fails to account for several important factors, such as the presence of competing risks and the inter-individual differences that make each case unique, severely limiting the accuracy of the information provided.
Therefore, addressing these shortcomings of the current practice is imperative for clinicians to provide more personalized care and improve informed decision-making for KT candidates on WL.
Competing risks are events that preclude the main event of interest from being observed, and they should therefore be accounted for accordingly. The Kaplan-Meier survival estimate treats competing risks merely as censored events, resulting in overestimation of the survival probability 5 . For receipt of KT in particular, WL removal due to death or deteriorating health condition may be considered a competing risk and should be treated as such. Previously, Cox regression models with competing risks were developed using the Fine-Gray method with a priori variables to predict the probabilities of death, KT, and remaining on the WL at the time of listing 4 . However, while the use of a set of preselected variables helps ensure a model is clinically relevant, there is a risk of missing factors that are in fact predictive of the outcome. Additionally, in December 2014, the new Kidney Allocation System (KAS) went into effect in the United States, introducing several major allocation policy changes for deceased donor kidneys 6 . As a result, there has been a considerable impact on the rates of deceased donor KT, especially for highly sensitized patients with a high calculated panel reactive antibody (cPRA) 7,8 , further limiting the validity of the previously developed model in the post-KAS era. Finally, more studies are needed to elucidate the advantages and disadvantages of the random forests-based algorithm recently proposed by Ishwaran et al. 9 over the Fine-Gray method. Random survival forests with competing risks (RSF-CR) is a nonparametric machine learning (ML) algorithm that makes no assumptions about the underlying distributions in the training data and automatically accounts for high-level interactions and higher-order terms in the features being analyzed. This allows the algorithm to learn complex patterns effectively from time-to-event data and to compute variable importance and outcome predictions without preselection of a priori variables.
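To make the overestimation concrete, the following sketch on a small synthetic dataset (in Python, rather than the R used for the actual models) compares the naive 1 - Kaplan-Meier estimate, which wrongly censors the competing event, with the Aalen-Johansen cumulative incidence estimator that competing-risks methods such as RSF-CR target:

```python
# (time, event) pairs: event 1 = transplant, event 2 = death (competing), 0 = censored.
# Times are distinct, so no tie handling is needed in this toy example.
data = [(1, 1), (2, 2), (3, 1), (4, 2), (5, 1), (6, 0)]

def naive_one_minus_km(data, cause=1):
    """1 - Kaplan-Meier for `cause`, wrongly treating competing events as censored."""
    surv = 1.0
    at_risk = len(data)
    for _, event in sorted(data):
        if event == cause:
            surv *= 1 - 1 / at_risk
        at_risk -= 1          # each subject leaves the risk set after its time
    return 1 - surv

def aalen_johansen_cif(data, cause=1):
    """Aalen-Johansen CIF: accumulate S(t-) * d_cause(t) / n(t)."""
    cif, surv = 0.0, 1.0
    at_risk = len(data)
    for _, event in sorted(data):
        if event == cause:
            cif += surv * (1 / at_risk)
        if event != 0:        # any event (either cause) reduces overall survival
            surv *= 1 - 1 / at_risk
        at_risk -= 1
    return cif

print(naive_one_minus_km(data))  # 0.6875 -- inflated by ignoring the competing risk
print(aalen_johansen_cif(data))  # 0.5    -- correct cumulative incidence
```

The naive estimate (0.6875) exceeds the Aalen-Johansen cumulative incidence (0.5) because censoring the deaths implicitly assumes those patients could still receive a transplant.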
In this study, we applied the RSF-CR method to candidates listed after the introduction of the KAS to construct prognostic models that compute the one-year post-listing probabilities of two potential outcomes: receipt of KT and WL removal due to death or deteriorating condition.

Study design
This study used data from the Scientific Registry of Transplant Recipients (SRTR). The SRTR data system includes data on all donors, wait-listed candidates, and transplant recipients in the US, submitted by the members of the Organ Procurement and Transplantation Network (OPTN). The Health Resources and Services Administration (HRSA), U.S. Department of Health and Human Services, provides oversight of the activities of the OPTN and SRTR contractors. Data on adult kidney transplant candidates listed between 12/04/2014 and 12/04/2016 were used to develop ML models to predict two potential outcomes: receipt of KT (primary outcome) and WL removal due to death or deteriorating condition (secondary outcome) (Figure 1). WL removal for other reasons was censored. The dataset was split on a WL date of 12/05/2015 into development and validation sets.
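The date-based split can be sketched as follows; the records and field layout below are hypothetical, but the logic mirrors a temporal (rather than random) partition, which mimics prospective validation:

```python
from datetime import date

# Hypothetical records: (listing_date, candidate_id). Real SRTR records carry
# many more fields; only the listing date matters for the split itself.
records = [
    (date(2015, 3, 1), "A"),
    (date(2015, 12, 5), "B"),
    (date(2016, 6, 30), "C"),
]

SPLIT = date(2015, 12, 5)  # listings on/after this date go to validation

development = [r for r in records if r[0] < SPLIT]
validation = [r for r in records if r[0] >= SPLIT]

print(len(development), len(validation))  # 1 2
```

A temporal split guards against information leaking from future listings into the development set, unlike a random split.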

Model training (development)
Grid search was performed with nine different model configurations to tune three model parameters: the number of trees (ntree), node size (nodesize), and the number of variables selected for each tree (mtry). The Brier score (BS) at one year post-listing (with standard deviation) and the integrated Brier score (IBS) were used as the performance metrics to select the final model for validation. The algorithm was implemented using the randomForestSRC package in R 10 .
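A minimal sketch of such a grid search follows, in Python rather than the R used for the models; the hyperparameter values and IBS scores are invented for illustration, since the actual grid is not reproduced here:

```python
import itertools

# Illustrative grid (values are hypothetical) for the three tuned parameters.
grid = {
    "ntree": [500, 1000, 2000],
    "nodesize": [15, 30, 60],
    "mtry": [5],
}

def evaluate_ibs(config):
    """Stand-in for fitting an RSF-CR model with `config` and computing the
    integrated Brier score on out-of-bag data; returns mock scores here."""
    mock_scores = {
        (500, 15, 5): 0.091, (500, 30, 5): 0.089, (500, 60, 5): 0.093,
        (1000, 15, 5): 0.088, (1000, 30, 5): 0.086, (1000, 60, 5): 0.090,
        (2000, 15, 5): 0.088, (2000, 30, 5): 0.087, (2000, 60, 5): 0.089,
    }
    return mock_scores[config]

configs = list(itertools.product(*grid.values()))  # nine configurations
best = min(configs, key=evaluate_ibs)              # lowest IBS wins
print(best)  # (1000, 30, 5)
```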

Feature selection
With regard to feature selection, RSF-CR is considered an embedded method in which feature selection is integrated as part of the learning 11 . As such, variables that are predictive of the outcome were determined using the best-performing model based on the variable importance score (VIS). A VIS of less than 0 indicates a non-informative factor, and such factors were removed. The VIS was calculated by taking, for each tree, the difference between the prediction error under the predictor perturbed via random node assignment and that under the original predictor, averaged over x trees. The accuracy is generally higher for a lower number of trees used for aggregation, which ranges between one and the number of trees used to build the forest 9 . We averaged the values over 10 trees for the current study to achieve good accuracy while keeping the computation time and required memory manageable.
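The averaging step can be sketched as follows; the per-tree error values are synthetic, whereas in practice they would come from the fitted forest's out-of-bag predictions:

```python
# VIS per the description above: prediction error with the predictor perturbed
# (random node assignment) minus the error with the original predictor, for
# each tree, averaged over the first N_TREES trees. Errors here are synthetic.
perturbed_err = [0.26, 0.24, 0.27, 0.25, 0.26, 0.25, 0.24, 0.27, 0.26, 0.25]
original_err = [0.21, 0.20, 0.22, 0.21, 0.20, 0.21, 0.22, 0.21, 0.20, 0.21]

N_TREES = 10  # trees used for aggregation, as in the study
vis = sum(p - o for p, o in zip(perturbed_err[:N_TREES], original_err[:N_TREES])) / N_TREES

print(round(vis, 3))  # 0.046: positive, so this predictor is informative
```

A negative VIS would mean perturbing the predictor did not degrade accuracy, i.e. the factor is non-informative and would be dropped.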

RSF-CR
RSF-CR, as proposed by Ishwaran et al., is an extension of the vanilla random forest algorithm developed by Breiman. As our main aim is to assist decision-making rather than investigate treatment options with the models, we used the event-specific CIF rather than the hazard function as the model output. The CIF is defined as the probability of experiencing an event of type j by time t given a set of covariates x:

CIF_j(t | x) = P(T^o <= t, δ^o = j | x),

where T^o is the observed event time and δ^o is the event type of interest. For performance evaluation, the out-of-bag (OOB) CIF was used, which is derived from an OOB ensemble constructed with the data excluded from each bootstrap sample.
The following steps describe how RSF-CR is constructed:
1. Take k bootstrap samples from the training data, one for each of the k trees to be included in the forest.
2. Grow a risk tree for each bootstrap sample. At each node, randomly select l candidate variables. The node is split on the candidate variable that maximizes a splitting rule (i.e., the modified Gray's test).
3. Grow the tree until each terminal node has no fewer than m unique cases.
4. Compute the cause-specific CIF, cause-specific hazard function, cause-specific mortality, and forest event-free survival for each tree.
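Step 1 and the out-of-bag ensemble mentioned above can be sketched in a few lines of Python; tree growing (steps 2-4) is left as a commented stub, since a full implementation is beyond the scope of a sketch:

```python
import random

# Each tree is grown on a bootstrap sample drawn with replacement; the cases
# a tree never saw ("out-of-bag", OOB) are later used to form its OOB CIF.
random.seed(0)
n, k = 10, 3  # 10 training cases and 3 trees (toy sizes)

for tree in range(k):
    bootstrap = [random.randrange(n) for _ in range(n)]  # sample with replacement
    oob = sorted(set(range(n)) - set(bootstrap))         # cases excluded from this sample
    # grow_tree(bootstrap, mtry=l, nodesize=m)           # steps 2-4, omitted (hypothetical)
    print(tree, oob)
```

On average roughly a third of the cases land out-of-bag for each tree, which is what makes the OOB ensemble an honest internal estimate of prediction error.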

Model validation and evaluation
Model validation was conducted on the validation set. Overall predictive performance and discrimination were evaluated using the time-dependent IBS and ROC-AUC. The IBS is an overall performance measure calculated by integrating the BS over all available times. The ROC-AUC is another indicator of performance based on the area under the true positive rate (sensitivity) versus false positive rate (1 - specificity) curve. Error analysis was performed by bootstrapping with replacement to estimate the 95% confidence intervals. As probability-based measures suffer in the presence of an imbalance in the dataset, we also evaluated the precision (positive predictive value), recall (sensitivity), and F1 of the classifiers by dichotomizing the predicted probabilities at an optimal threshold value determined by decision curve analysis. F1 is the harmonic mean of precision and recall 12 , defined as:

F1 = 2 × (precision × recall) / (precision + recall).

Model calibration was assessed via calibration plot by plotting the observed prevalence against the mean predicted probability for the deciles of the predicted probability 13 . Additional analyses were carried out in patient subgroups by cPRA category, OPTN region, listing year, and race, where we compared the mean predicted probability with the observed prevalence in each of the subgroups. Additionally, decision curve analysis was performed to assess the clinical utility of the final models. To develop decision curves, net benefits (NBs) were plotted against threshold probabilities from zero through one in increments of 0.01 for three different scenarios: all patients are treated, no patients are treated, and only selected patients are treated using the prognostic systems. The net benefit was calculated using the following formula:

NB = TP/n - (FP/n) × p_t/(1 - p_t),

where TP = true positive count, n = sample size, FP = false positive count, and p_t = threshold probability 14 .
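The F1 and net benefit formulas are small enough to express directly; the snippet below checks F1 against the precision and recall reported for the KT model in the Results, and uses invented counts for the net benefit:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def net_benefit(tp, fp, n, p_t):
    """Net benefit at threshold probability p_t: TP/n - (FP/n) * p_t/(1 - p_t)."""
    return tp / n - (fp / n) * (p_t / (1 - p_t))

# F1 from the rounded precision/recall reported for the KT model (0.927, 0.824):
print(round(f1(0.927, 0.824), 2))  # 0.87

# Net benefit with synthetic counts: 30/100 - (10/100) * 0.2/0.8 = 0.275
print(round(net_benefit(tp=30, fp=10, n=100, p_t=0.2), 3))  # 0.275
```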
The calculation of NB was adjusted from its original form so that it accounts for censored events and competing risks 15 . The reduction in avoidable treatment per 100 patients (N 100 ) was then computed as follows:

N 100 = (NB_m - NB_treat all) / (p_t/(1 - p_t)) × 100,

where NB_m = net benefit of the model and NB_treat all = net benefit of treat all.
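As a sketch, with invented net benefit values:

```python
def n100(nb_model, nb_treat_all, p_t):
    """Reduction in avoidable treatment per 100 patients:
    (NB_m - NB_treat_all) / (p_t / (1 - p_t)) * 100."""
    return (nb_model - nb_treat_all) / (p_t / (1 - p_t)) * 100

# Synthetic example: a model net benefit of 0.25 versus 0.21 for treat-all,
# at a threshold probability of 0.2, spares 16 avoidable treatments per 100 patients.
print(round(n100(0.25, 0.21, 0.2), 1))  # 16.0
```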

Results
Predictive modeling process

To develop reliable and robust RSF-CR models, we followed the procedure described in Figure 1. First, data were pre-processed as outlined and split into development (training) and validation (test) sets on a listing date of December 4, 2015. The development set was then used to train RSF-CR models. Grid search with nine different model configurations was performed for hyperparameter tuning. The variable importance score (VIS) was obtained from the best-performing model to assess the contribution of each predictor to the outcome prediction. Finally, the selected model was validated on the validation set using a series of statistical techniques commonly used for clinical predictive modeling: receiver operating characteristic (ROC) curve analysis, calibration, and decision curve analysis.
Patient characteristics: development and validation cohorts

There was generally good agreement between the mean estimated CIF 1yr, KT and the observed CIF 1yr, KT (Table 2), although minor discrepancies were found for some groups such as 0.99 cPRA (∆ = -5.69), 0.9-0.97 cPRA (∆ = -5.38), and Region 8 (∆ = -4.12). Next, the clinical utility of the models was assessed via decision curve analysis (Figure 5). Compared with treating all patients and treating none, treating patients based on the predicted CIF 1yr, KT provided greater net benefit (NB) at certain threshold probabilities. Treatment here refers to any clinical intervention that may be suggested when a patient is predicted to undergo KT at one year post-listing, such as discontinuation of statin therapy. NB peaked at a threshold probability of 0.28. At this threshold probability, the NB was 0.217, which translates to a reduction of 10.15 patients in avoidable treatment per 100 patients compared with the treat-all strategy. As our dataset was imbalanced, with a disproportionate number of patients censored compared with those who experienced one of the events of interest, we took a moving-threshold approach to address the issue 16 . That is, the optimal threshold probability from the decision curve analysis was used to categorize the patients into two groups: 1 (predicted to receive KT at one year post-listing) versus 0 (predicted not to undergo KT by one year post-listing). At this cutoff, the model yielded excellent prognostic performance, as shown by a confusion matrix and a series of performance measures in Figure 6. There were a total of 6,630 true positive cases, corresponding to a detection prevalence of 0.24 against an observed prevalence of 0.27. We placed greater emphasis on the precision (positive predictive value), recall (sensitivity), and F1 score because of the class imbalance in the dataset, with substantially more patients not receiving KT.
The precision and recall were 0.927 and 0.824, respectively, with an F1 of 0.873, suggesting that the model is highly capable of accurately differentiating one group from the other. Likewise, the predicted CIF 1yr, death demonstrated a degree of clinical utility (Supplementary Figure 3); using the RSF-CR model may assist clinicians in making clinical decisions about the administration of treatment to prevent death. Moreover, the predicted CIF 1yr, death exhibited high predictive performance when dichotomized at a probability of 0.11, at which the NB represents a reduction of avoidable treatment by 1.46 cases per 100 patients (Supplementary Figure 4). The number of true positive cases and the detection prevalence were 1,136 and 0.038, respectively (observed prevalence = 0.049). The model achieved a precision of 0.975, a recall of 0.785, and an F1 score of 0.869.
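The moving-threshold step can be illustrated as follows; the predicted probabilities and outcomes below are synthetic, and only the cutoff (0.28) is taken from the analysis above:

```python
# Dichotomize predicted one-year CIFs at the decision-curve-optimal cutoff,
# then score with imbalance-robust metrics. All data here are synthetic.
THRESHOLD = 0.28
pred_cif = [0.05, 0.10, 0.30, 0.45, 0.20, 0.60, 0.35, 0.15]
observed = [0,    0,    1,    1,    0,    1,    0,    1   ]  # 1 = received KT

pred = [int(p >= THRESHOLD) for p in pred_cif]
tp = sum(p and o for p, o in zip(pred, observed))         # predicted 1, observed 1
fp = sum(p and not o for p, o in zip(pred, observed))     # predicted 1, observed 0
fn = sum((not p) and o for p, o in zip(pred, observed))   # predicted 0, observed 1

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision, recall)  # 0.75 0.75
```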

Discussion
A recent report indicates that there were nearly 90,000 KT candidates listed and waiting to receive a kidney at the end of 2018 1 . The introduction of the new KAS has significantly changed the mechanisms by which donor kidneys are allocated to patients, increasing KT access for those who were previously at a disadvantage, such as highly sensitized (HS) patients 7 . This paradigm shift in donor allocation has necessitated a new prognostic system that accurately estimates the likelihood that individual patients will receive KT in the post-KAS era. While the Fine and Gray method has conventionally been used to model survival data with competing risks, RSF-CR 9 is another attractive technique for developing prediction models with such data. Therefore, we employed RSF-CR coupled with post-KAS data from a national-scale registry database to identify predictors of the outcomes and build robust prognostic models. The final models achieved good discrimination and calibration and exhibited good predictive performance for both KT and WL removal due to death or deteriorating condition at one year post-listing.
We identified 16 predictors for CIF 1yr, KT and 18 predictors for CIF 1yr, death based on the VIS computed by the best-performing model. There is a body of evidence supporting the prognostic ability of each of these factors. Poor functional status was shown to be associated with both lower transplant rates 17 and a higher risk of WL mortality 18 . A lower albumin level has been described as a risk factor for decline in kidney function and for renal diseases such as diabetic nephropathy 19 . Diabetes poses a great risk of increased mortality not just among KT candidates but also in the general population 20 , rendering it one of the important variables considered by the KAS in donor allocation 21 . Inactive status was reported to be correlated with lower KT rates in a study where candidates initially having inactive status were more likely to be older, female, and African-American and to have a higher BMI 22 . Although one of the goals of the KAS is to reduce geographical inequality in donor allocation, such inequality remains prevalent today 23 . This explains Organ Procurement and Transplantation Network (OPTN) region being one of the predictors for CIF 1yr, KT . Racial disparities in access to KT are also an active area of research, with recent studies showing the importance of addressing this issue to increase KT rates [24][25][26] . Interestingly, cPRA was not one of the significant prognostic factors, although it is an integral part of the new allocation policy. This observation is likely attributable to the definition of KT used in the study, which includes both LDKT and DDKT, as the primary scope of the KAS is to improve the rates of DDKT rather than LDKT.
For the present study, we utilized a random forests-based ML algorithm, RSF-CR 9 , to develop prognostic models from survival data with competing risks. RSF, an extension of Breiman's random forests 27 , was originally proposed by Ishwaran et al. in 2008 28 , upon which they built RSF-CR for competing risk applications. The advantages of this algorithm include the ability to (1) "learn" nonlinear relationships between the input features without stringent assumptions regarding the underlying data distributions (i.e., no model assumptions to be met); (2) quantify the variable importance of each factor for estimation of the CIF; (3) potentially reveal new prognostic factors in a data-driven manner; (4) account for the presence of competing risks to avoid overestimation of the CIF; and (5) leverage the strengths of random forests and RSF, both of which have empirically been shown to outperform traditional algorithms such as logistic regression 29 , decision trees 30 , and Cox regression 28 . The developed prognostic system is superior to the current median waiting time approach for KT candidates, which accounts for neither inter-individual differences nor the presence of competing risks. The ML model provides the ability to estimate the likelihood of receiving KT and of WL removal due to death or deteriorating health condition in a personalized manner. Based on the predicted probabilities, clinicians and patients can make more informed decisions about treatment options.
Despite our results showing the potential of RSF-CR for predicting patient outcomes, our study has several limitations. One is the imbalanced nature of our dataset, with substantially more patients censored than experiencing any of the events of interest. This caused some of the performance measures, such as ROC-AUC and IBS, to be relative rather than absolute, rendering comparisons with other studies difficult. To address this issue, we dichotomized the predicted probabilities at a threshold probability and employed performance measures (precision, recall, and F1 scores) known to be more appropriate for imbalanced datasets 31 . This moving-threshold approach is one of the traditional methods for dealing with class imbalance 16 . Another limitation of the study comes from the use of a registry database. Large databases have been reported to have high error rates caused by various factors such as human data entry errors and misinterpretation of original documents 32 . We dedicated a substantial amount of time to data cleaning to minimize this risk. Also, we validated the model only with the registry data, as an external dataset was not available. Thirdly, the KAS was put into effect relatively recently, so it was impractical to predict beyond one year post-listing. Finally, as with the original random forests algorithm, RSF-CR by default does not provide the direction of influence of each predictor in the presence of multiple variables. Several techniques may be utilized to understand the directionality of each predictor, such as Shapley Additive Explanations and Local Interpretable Model-Agnostic Explanations, to name a few 33 . Therefore, as a next step, we will perform additional analyses of the predictors along with external model validation to improve both the local and global interpretability and the robustness of the model.
With advances in computing power, ML algorithms, and data-generating capability, ML has created an unprecedented volume of opportunities for various sectors, including healthcare. While ML has gained increasing attention in the healthcare industry, it has faced a unique set of challenges owing to the bioethical concerns surrounding the use of complex ML algorithms with poor interpretability 34,35 . This has led to a preference for algorithms such as decision trees and logistic regression that have high model transparency. However, more convoluted algorithms such as ensemble learning and deep learning models have been demonstrated to predict clinical outcomes with higher accuracy while being more efficient at handling complex data such as images and electronic medical records 36 . Hence, more research is needed to elucidate AI's capability in medicine while advancing ML algorithms and their interpretability. In addition to predicting patient prognosis, ML models have potential in other aspects of patient care, such as diagnosis, treatment, and clinical workflow, to increase efficiency and augment the work of clinicians. This synergy between clinicians and AI suggests that ML-based systems may make patient care accessible to a larger patient population while ensuring a high quality of service. This also highlights the importance of a concept known as "augmented intelligence", in which AI works to aid human physicians to maximize their performance. In addition, AI's potential has started to be realized by the pharmaceutical industry and clinical researchers. Among the many ways AI may help the industry, one key application of ML is to allow more systematic and accurate risk quantification, which may lead to higher success rates of clinical trials. As clinical studies for drug development require substantial resources, strategies to reduce trial failure are imperative.
ML algorithms may be part of such strategies as a tool that aids in patient stratification, treatment response identification, and/or subgroup identification 37 .

Conclusions
In all, we have demonstrated the potential of RSF-CR for both the identification of prognostic factors and the personalized prediction of outcomes for KT candidates on the WL. This ML-based strategy is preferable to the current median-time-based approach because it enables personalized prognostication of KT candidates, helping patients make informed decisions. As ML progressively finds its place in medicine, more research is required to ensure successful clinical implementation and to fully unlock the potential of AI in personalized medicine.