In this study, we first evaluated the relative importance of 20 possibly predictive variables for ESKD using a machine learning random forest model and a Cox proportional hazard model. Both models selected eGFR, proteinuria, hemoglobin HA1c, serum albumin, and serum bilirubin as the most important predictors. Then, we developed a simplified prediction model using these 5 variables and the Cox proportional hazard model. To the best of our knowledge, this is the first report to show that a 5-year risk model using these 5 commonly available variables has a good performance in the predictive ability for ESKD in patients with diabetes.
Previously, several prediction models for ESKD in patients with diabetes have also been reported.11–15 Jardine et al. reported a prediction model using 7-variables, including eGFR, urinary albumin-creatinine ratio, sex, systolic blood pressure, blood pressure-lowering agent use, presence of retinopathy, and education career from the ADVANCE trial (C-statistic: 0.847).12 A similar prediction model using 11 variables was reported in Chinese patients with diabetes (area under the curves [AUC] of the 3-, 5-, and 8-year risk: 0.90, 0.86, and 0.81, respectively).14 However, these models lacked external validation and thereby may not be generalized well to other populations. The model reported by Elley et al. showed a good performance in the predictive ability in the development cohort and the external validation cohort (C-statistic: 0.89–0.92), but this model used 10 variables including sex, ethnicity, age, diabetes duration, albuminuria, serum creatinine, systolic blood pressure, HbA1c, smoking status, and previous cardiovascular disease status.13 A recent study developed a machine learning based prediction model called the feed-forward neural network model.15 In that model, 18 variables were used in patients with diabetes and nephropathy participating in past clinical trials, including RENAAL, IDNT and ALTITUDE studies (AUC: 0.82, 0.81, and 0.84, respectively). The machine learning approach appears to be superior to the traditional hypothesis-driven statistical methods in terms of its data-driven approach to analyze a large number of possibly predictive variables. Our random forest model using 20 variables also showed an excellent predictive ability for ESKD (C-statistic 0.935). However, the main obstacle is that many predictive variables are not readily obtainable in primary care, thus limiting their usefulness to clinicians’ managing patients with diabetes. In contrast, Keane et al reported a simple prediction model using 4 variables (serum creatinine, urine albumin-creatinine ratio, serum albumin, and hemoglobin) in a cohort from the RENNAL study.11 They selected those 4 variables from 23 baseline variables using the Cox proportional hazard model with backward selection process, with P < 0.01 required for inclusion in a final model. However, our analysis using both the machine learning approach and the Cox proportional hazard model showed that HbA1c levels and bilirubin levels were more important predictors than hemoglobin levels for predicting ESKD. The mean follow-up period was much shorter in the RENNAL study than in our study (3.4 years vs. 5.6 years), and decreasing hemoglobin level is a generally late sign of renal impairment. Their model may be effective for risk prediction at a time shorter than 3 years. Thus, our simplified 5-year prediction model may be more useful than previous prediction models in clinical practice. Our model could guide clinicians in making clinical decision earlier regarding intensification of monitoring and preventive therapies or referral to specialists. Risk information helps patients to become aware of their current risk, promote motivation on improving their lifestyle. In our study, the nomogram based on our model was built to predict the absolute ESKD probability for each individual. The 5-year risk equation we showed can be also used as electronic applications. These tools may provide practical risk predictive tools for future clinical application.
The reason why serum bilirubin levels were so important among various predictive variables for predicting ESKD remained to be elucidated. Bilirubin is a product of heme catabolism by heme oxygenase, which is a major antioxidant enzyme. Bilirubin is thought to have a protective effect on oxidative stress-induced organ damage through its strong antioxidant activity.22 We previously showed a lower prevalence of nephropathy and other vascular complications, as well as reduced oxidative stress, in patients with diabetes and Gilbert syndrome, which is a hereditary hyperbilirubinemia.25 Accumulating evidence has also shown that serum bilirubin levels are negatively associated with the progression of DKD.26–28 Taken together, it is most likely that serum bilirubin may prevent the progression of nephropathy via its anti-oxidative activities. This possibility is supported by an animal study, which showed that bilirubin prevented renal oxidative stress and dysfunction in type-1 diabetic rats and type 2 diabetic mice.29 In addition, serum bilirubin levels have been reported to be affected by oxidative stress-related factors such as smoking, obesity, hypertension, metabolic syndrome, and cardiovascular diseases in addition to genetic factors,30–34 all of which are possible risk facors for DKD. Therefore, serum bilirubin levels might represent a total susceptibility determined by such factors to the progression to ESKD. In line with these concepts, the effect of anti-oxidative properties of albumin on the progression to ESKD may be plausible, although the effect of serum albumin levels may be mainly explained by their association with the levels of albuminuria. Albumin is thought to be an important serum antioxidant in addition to serum bilirubin.24,25 In serum, free thiol groups are one of the most important scavengers of hydroxy radicals and other oxidants and are largely located on albumin.35 Serum albumin levels have been reported to be inversely associated with the cardiovascular disease risk and aging, supporting its possible causal relationships with the oxidative stress-related status and diseases.36–38 The mechanisms underlying the close associations of serum bilirubin and albumin levels with the progression to ESKD should be clarified in future studies.
There are several limitations to this study. First, we might not have obtained ideal information regarding clinical data and timely assessment of endpoint, compared with controlled clinical trials. Second, the sample size may not have been sufficient to develop prediction models. Third, we used proteinuria by a conventional urine test rather than measurements of albuminuria because the rate of albuminuria measurements was low in the electronic medical record data. However, despite these limitations, our prediction models showed a good performance in the independent external validation as well as the development cohorts. Additionally, proteinuria data are much more easily obtainable than those of albuminuria measurement in primary care. Lastly, although the excellent performance in discrimination of our prediction model was confirmed in an external cohort, the performance in calibration should be evaluated in more various populations including different ethnicities, and cohorts outside Japan for widespread adoption of this model, because patient characteristics, healthcare system, and treatment strategies vary between health centers, regions and countries, and such heterogeneity can affect risk estimates and their calibration.39,40
In conclusion, we developed a simplified and accurate 5-year prediction model using the commonly available clinical variables of eGFR, proteinuria, HbA1c, serum albumin and bilirubin levels, which are easy to measure. Prospective studies should be done to establish its clinical usefulness in reducing ESKD in patients with diabetes.