Patient cohort
A total of 1231 first-time symptomatic stone formers were identified. Of these, 1104/1231 (90%) patients were included in the 2-year analyses; 585/1104 (53%) were male, the mean age was 50 (SD = 15) years, and 278/1104 (25%) had a symptomatic recurrence. A total of 875/1231 (71%) patients were included in the 5-year analyses; 470/875 (54%) were male, the mean age was 50 (SD = 15) years, and 385/875 (44%) had a symptomatic recurrence. The remaining features used for model training are presented in Tables 1 and 2.
Table 1
Cohort demographics and EHR features for the 2- and 5-year recurrence cohorts.
Patient Features | 2-Year Recurrence n = 1104 | 5-Year Recurrence n = 875 |
Recurrence | 278 (25) | 385 (44) |
Gender, Male | 585 (53) | 470 (54) |
Age (years)* | 50 (15) | 50 (15) |
BMI* | 30 (7.8) | 30 (7.8) |
Hyperlipidemia | 315 (29) | 256 (29) |
Bowel Disease | 101 (9) | 81 (9) |
Osteoporosis, immobility, hyperparathyroidism | 58 (5) | 43 (5) |
Epilepsy or migraines | 48 (4) | 40 (5) |
Hypertension | 602 (55) | 480 (55) |
Gout | 48 (4) | 37 (4) |
Type 2 Diabetes Mellitus | 247 (22) | 191 (22) |
Cystinuria | 3 (0.3) | 3 (0.3) |
Coronary artery disease / MI | 110 (10) | 85 (10) |
Cerebrovascular accident | 31 (3) | 25 (3) |
Gastroesophageal reflux disease | 396 (36) | 328 (38) |
Alkalinizing agent | 92 (8) | 63 (7) |
Hydrochlorothiazide | 72 (7) | 60 (7) |
Allopurinol | 43 (4) | 32 (4) |
Unless otherwise indicated, data represent number of patients and values in parentheses represent percentage of cohort. *Data represent mean value and values in parentheses represent standard deviation. |
Table 2
24H urine metrics and stone type for 2- and 5-year recurrence cohorts
24H urine and stone features | 2-Year Recurrence n = 1104 | 5-Year Recurrence n = 875 |
Urine Component* | | |
Urine volume [L/d] | 2.01 (0.88) | 2.00 (0.88) |
Urine calcium | 218 (127) | 219 (125) |
Urine oxalate | 38 (18) | 38 (18) |
Urine citrate | 619 (434) | 618 (406) |
Urine pH | 6.1 (0.55) | 6.1 (0.55) |
Urine uric acid [g/d] | 0.64 (0.25) | 0.64 (0.25) |
Urine sodium [mmol/d] | 182 (85.9) | 181 (85.9) |
Urine potassium [mmol/d] | 58 (26) | 58 (25) |
Urine creatinine [mg/d] | 1677 (614.3) | 1661 (606.9) |
Majority stone composition | | |
Calcium oxalate monohydrate | 617 (56) | 479 (55) |
Calcium oxalate dihydrate | 116 (11) | 92 (11) |
Calcium phosphate | 205 (19) | 163 (19) |
Uric Acid | 86 (8) | 73 (8) |
Other | 75 (7) | 64 (7) |
No single type | 5 (0.5) | 4 (0.5) |
Unless otherwise indicated, data represent number of patients and values in parentheses represent percentage of cohort. Urine metrics were selected based on the AUA minimum recommended features. Urine data were recorded from the samples closest to the time of the index symptomatic stone event. *Data represent mean value and values in parentheses represent standard deviation. |
Predictive Models
For 2-year symptomatic stone recurrence, the AUCs for the LR, LASSO, RF, and XGBoost models were 0.585, 0.617, 0.570, and 0.580, respectively (Table 3). For 5-year symptomatic stone recurrence, the AUCs for the LR, LASSO, RF, and XGBoost models were 0.618, 0.625, 0.608, and 0.621, respectively. The LASSO model had the highest AUC among both the 2- and 5-year recurrence models (Fig. 1).
Table 3
Area under the curve (AUC) for the four predictive models
Model | 2-Year AUC (95% CI) | 5-Year AUC (95% CI) |
LR | 0.585 (0.514, 0.656) | 0.618 (0.550, 0.686) |
LASSO | 0.617 (0.547, 0.687) | 0.625 (0.557, 0.692) |
RF | 0.570 (0.498, 0.641) | 0.608 (0.539, 0.677) |
XGBoost | 0.580 (0.505, 0.654) | 0.621 (0.552, 0.690) |
All AUCs were calculated using the 30% validation data that were not used in building the predictive models. Values in parentheses denote the lower and upper bounds of the 95% confidence interval. LR, logistic regression; LASSO, least absolute shrinkage and selection operator; RF, random forest; XGBoost, gradient boosted decision tree. |
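To make the interval widths in Table 3 easier to interpret, the following is a minimal sketch of how an AUC and an approximate 95% confidence interval can be computed. It uses the rank-based definition of the AUC and the Hanley-McNeil standard error, which is an assumption here: the source does not state how its intervals were derived. The validation counts (83 recurrences and 248 non-recurrences) are also assumed, taken as roughly 30% of 1104 patients at a 25% recurrence rate, and the function names are illustrative.

```python
import math


def auc_rank(scores_pos, scores_neg):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate confidence interval for an AUC using the
    Hanley-McNeil standard error (assumed method, not the paper's)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(variance)
    return auc - z * se, auc + z * se


# Toy predicted risks, made up purely to exercise auc_rank.
toy_auc = auc_rank([0.9, 0.6, 0.4], [0.5, 0.3, 0.4])

# Assumed validation split: about 30% of 1104 patients, 25% with recurrence.
low, high = hanley_mcneil_ci(0.617, n_pos=83, n_neg=248)
print(f"toy AUC = {toy_auc:.3f}; approx. 95% CI for 0.617 = ({low:.3f}, {high:.3f})")
```

Under these assumptions the interval spans roughly 0.14, which is comparable in width to the intervals reported in Table 3 and illustrates why the differences between models are small relative to the uncertainty at this sample size.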
Predictive Features
Each model output its own ranking of features based on their importance in the model structure. Within the top 10 features of the 2- and 5-year LASSO models, urine pH was the only feature derived from 24H urine results (Supplementary Table 4). Type II diabetes mellitus and patient age ranked within the top 3 predictive features of both LASSO models. Patient age was the only feature found in the top 5 of every 2- and 5-year model tested. Within the 2-year recurrence models, none of the top 5 features of either the LASSO or the LR model were derived from 24H urines, whereas 4/5 (80%) of the top features of both the RF and XGBoost models were derived from 24H urines. Urine pH ranked in the top 5 features of the 2- and 5-year RF and XGBoost models. In the 5-year recurrence models, again 0/5 (0%) of the top features of the LASSO and LR models were from 24H urine studies.
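The top-5 comparison described above can be sketched as follows. This is an illustrative example only: the feature names and importance values are hypothetical placeholders, not results from the study, and ranking by absolute value is an assumption that suits coefficient-based models such as LASSO and LR (tree-based models typically report non-negative importances).

```python
# Hypothetical set of features derived from 24H urine studies.
URINE_FEATURES = {"urine_ph", "urine_calcium", "urine_citrate", "urine_volume"}


def top_k_features(importances, k=5):
    """Return the k feature names with the largest absolute importance."""
    ranked = sorted(importances, key=lambda name: abs(importances[name]), reverse=True)
    return ranked[:k]


def urine_share(top_features, urine_features=URINE_FEATURES):
    """Count how many of the top features come from 24H urine studies."""
    count = sum(1 for name in top_features if name in urine_features)
    return count, len(top_features)


# Placeholder importances (made-up values, for illustration only).
example_importances = {
    "age": -0.31,
    "type2_diabetes": 0.27,
    "bmi": 0.12,
    "urine_ph": 0.10,
    "hypertension": 0.08,
    "urine_calcium": 0.05,
    "gout": 0.02,
}

top5 = top_k_features(example_importances, k=5)
n_urine, n_top = urine_share(top5)
print(f"{n_urine}/{n_top} top features derived from 24H urine: {top5}")
```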
Comment
We demonstrate the feasibility of integrating 24H urine data into machine learning models for the prediction of kidney stone recurrence at 2 and 5 years following an index stone event. When comparing LASSO, RF, and XGBoost models to an LR model, the LASSO model performed best, with AUCs of 0.617 and 0.625 at 2 and 5 years, respectively. Patient age and a history of Type II diabetes mellitus were highly important features in the LASSO model, and patient age was highly important in all models.
The incorporation of complex, non-linear data to create predictive models is a recognized strength of ML models [20]. However, previous evaluations of symptomatic recurrence have focused on the use of traditional statistical approaches to form predictive models or identify risk factors of stone recurrence [4, 6, 21, 22]. Machine learning model performance in our study was similar to that of logistic regression and other linear models of prediction [23]. This demonstrates the workability of these models for predicting stone recurrence using large, non-linear datasets. We specifically found that the LASSO model outperformed logistic regression at recurrence prediction in our study. Larger, more robust datasets are likely to enhance ML model performance.
Previously, Rule et al. reported 2- and 5-year stone recurrence rates of 11% and 20%, respectively, after an index stone event [6]. That study included 2239 first-time stone formers and identified younger age, male sex, and family history of stones as independent predictors of recurrence. At-home 24H urine testing was not required for inclusion in that study. Our study population was higher risk, as all patients were required to have 24H urine testing for inclusion. Thus, the 2- and 5-year recurrence rates identified in our study (2-year: 25%, 5-year: 44%) are higher than previously described [6].
Similar to prior studies, many features with known associations to stone recurrence were prioritized by the ML models [22, 24–28]. Specifically, age, diabetic status, urine pH, and stone composition were among the most important features utilized by all of the ML models. Thus, the machine learning models were able to prioritize features associated with known stone pathogenesis. Moreover, in the highest performing model (i.e., the LASSO model), the top features that predicted stone recurrence included age, type II diabetes, and BMI, which reflect known risk factors for recurrence.
This study and its incorporation of ML models have several limitations. Primarily, ML models require large, unbiased data sets to develop robust results [29]. The 2- and 5-year recurrence data sets, containing 1104 and 875 patients, respectively, are smaller than what is considered a large ML dataset. Ensuring that training and testing data have correctly assigned recurrence outcomes is also important for model performance. This retrospective study could not account for patients who had a symptomatic recurrence but did not present to our institution for treatment. Furthermore, post-treatment imaging was not required after the index stone event to confirm stone-free status before tracking for stone recurrence. A 90-day wait period was enforced following the index stone event to reduce false positives associated with the index stone event. Lastly, the use of a single 24H urine to predict chronic stone formation exposes the results to the variability of 24H urine studies over time [14].
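The outcome labeling described above, including the 90-day wait period, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the use of 365 days per year for the horizon, and the boundary handling (events on or before day 90 are ignored, events on the final day of the horizon are counted) are not specified in the source.

```python
from datetime import date, timedelta


def had_recurrence(index_date, event_dates, horizon_years, washout_days=90):
    """Return True if any symptomatic event falls after the washout window
    and within the follow-up horizon measured from the index stone event."""
    window_start = index_date + timedelta(days=washout_days)
    # Assumption: the horizon is approximated as 365 days per year.
    window_end = index_date + timedelta(days=365 * horizon_years)
    return any(window_start < d <= window_end for d in event_dates)


# Hypothetical dates, for illustration only.
index_event = date(2015, 1, 1)
early_event = date(2015, 2, 15)   # inside the 90-day wait period
later_event = date(2016, 6, 1)    # about 17 months after the index event
late_event = date(2018, 6, 1)     # beyond 2 years, within 5 years

print(had_recurrence(index_event, [early_event], horizon_years=2))
print(had_recurrence(index_event, [later_event], horizon_years=2))
print(had_recurrence(index_event, [late_event], horizon_years=5))
```

Events inside the wait period are treated as attributable to the index stone event and do not count as recurrences, so the same patient can be labeled differently in the 2-year and 5-year cohorts.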