PD-associated peritonitis is one of the leading causes of PD withdrawal and death[12, 13]. ML algorithms are becoming increasingly popular in medical research and can be applied to disease screening, diagnosis, and prognosis. We used ML to construct a predictive model for the adverse prognosis of PD and demonstrated that age, body weight, and albumin levels are important predictive factors. We developed five predictive models; in the complete model, we used the calculated SHAP values to identify the strongest predictive indicators, then ranked the features and extracted the top 20 to reconstruct the model. Collectively, our findings suggest that the CatBoost model demonstrated the strongest performance.
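As an illustration of the ranking step, the sketch below orders features by their mean absolute SHAP value and keeps the top k. It is a minimal example under stated assumptions, not the pipeline used in this study: the function name `rank_features_by_shap`, the four feature names, and all SHAP values are hypothetical, and mean absolute SHAP is assumed as the ranking criterion because it is a common convention.

```python
from statistics import fmean


def rank_features_by_shap(shap_rows, feature_names, k):
    """Return the k (name, importance) pairs with the largest mean |SHAP|.

    shap_rows holds one row per patient and one column per feature.
    """
    importance = []
    for j, name in enumerate(feature_names):
        # Global importance of a feature: mean absolute SHAP value over patients.
        mean_abs = fmean([abs(row[j]) for row in shap_rows])
        importance.append((name, mean_abs))
    # Largest importance first.
    importance.sort(key=lambda item: item[1], reverse=True)
    return importance[:k]


# Hypothetical SHAP values (illustrative only, not study data).
feature_names = ["age", "body_weight", "albumin", "blood_lipids"]
shap_rows = [
    [0.40, -0.20, -0.30, 0.01],
    [-0.50, 0.10, 0.25, -0.02],
    [0.60, -0.30, -0.10, 0.03],
]

top = rank_features_by_shap(shap_rows, feature_names, k=3)
top_names = [name for name, _ in top]
print(top_names)
```

In the setting described here, `k` would be 20 and the retained feature names would define the inputs of the reduced model.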
We ranked the factors closely related to adverse patient prognosis by their SHAP values; the top 20 key factors included age, body weight, albumin, and blood lipids. A meta-analysis revealed that age is a risk factor for all-cause cardiovascular death in dialysis patients[14]. In this study, we observed that patients in the PD continuation group were significantly younger than those in the adverse prognosis group (45.36 vs 51.11 years, P < 0.001). In the complete model, the calculated SHAP values confirmed that age had the strongest impact on predicting an adverse prognosis for PD patients. In addition, body weight and BMI were critical predictive factors for adverse PD prognosis, and higher BMI has been reported to lead to higher hospitalization rates for peritonitis[15]. In the general population, obesity is associated with increased cardiovascular risk and reduced survival, but the “obesity paradox” in ESRD remains controversial[16, 17]. Our study suggests that increased body weight and BMI correlate with a lower risk of adverse PD prognosis. The nutritional indicators include body weight, as well as albumin and blood lipids. A positive correlation between nutritional status and dialysis duration has been reported in PD patients because a nutritious diet reduces the incidence of complications such as peritonitis[12].
Education level was also considered a vital predictor of adverse PD prognosis, and multiple studies have demonstrated that patients with lower education levels experience higher rates of peritonitis and technical failure than those with higher education levels[18, 19]. A potential reason is that patients with lower education levels have lower incomes and poorer compliance, which affects their access to timely healthcare, medication, and treatment.
The high prevalence of cardiovascular diseases in PD patients is related to uremic toxins, inflammation (erythrocyte sedimentation rate), and disorders in bone mineral metabolism (vitamin D, serum phosphorus, and iPTH)[20]. Similarly, we observed that vitamin D, serum phosphorus, iPTH, erythrocyte sedimentation rate, creatinine, and cardiovascular disease were associated with adverse PD prognosis. Furthermore, we observed that TIBC and SF are critical predictive factors for adverse PD prognosis; higher iron levels have been reported to increase the risk of QT dispersion[21]. Functional iron deficiency is an independent risk factor for all-cause death in PD patients. Consistent with our research findings, PD patients with high iron levels have a four-fold higher risk of all-cause cardiovascular death[22].
ML is an interdisciplinary field of mathematics and statistics[23] that involves fitting predictive models to data or identifying informative groupings within the data. We hypothesised that ML methods could predict the adverse prognosis of patients before they start PD, help recommend the most favourable dialysis method, and support timely medical intervention, thereby improving patient prognosis and reducing medical costs.
CatBoost is the third GBDT-based improved algorithm after XGBoost and LightGBM[24]. It was launched by Yandex in Russia in 2018 and is open-source. It uses gradient boosting on decision trees and can be easily integrated into deep-learning frameworks. Based on the GBDT framework, which has fewer parameters, CatBoost supports categorical variables with high accuracy and can efficiently and reasonably process t- algorithms. CatBoost has been extensively studied in the prediction of skin sensitisation[25], depression occurrence[26], gestational diabetes management[27], and transplanted kidney function[8], and it exhibits good predictive performance. Because numerous factors affect an adverse PD prognosis, and with clinical application in mind, we reconstructed a compact model from the 20 key features ranked by the SHAP values. This simplified version of the model was slightly weaker in performance than the full model but was more conducive to clinical application and data collection. Before a patient starts PD, the CatBoost model can be used to predict whether the patient is suitable for PD treatment and whether PD-related peritonitis may occur. Based on the prediction, the most suitable dialysis plan can be selected for the patient, allowing early intervention.
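To make the handling of categorical variables more concrete, the sketch below illustrates ordered target statistics, the encoding idea CatBoost is documented to use for categorical features. It is a simplified illustration rather than CatBoost's implementation or the configuration used in this study: the function name, the `prior` and `weight` parameters, and the education and outcome values are all hypothetical, and the given row order stands in for the random permutations that CatBoost draws internally.

```python
def ordered_target_statistic(categories, targets, prior, weight=1.0):
    """Encode each row's category using only the targets of earlier rows.

    Using earlier rows only means a row's own outcome never leaks into
    its encoding. `prior` smooths categories with little or no history.
    """
    sums = {}    # running sum of targets seen so far, per category
    counts = {}  # running number of rows seen so far, per category
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        encoded.append((seen_sum + weight * prior) / (seen_count + weight))
        # Update the history only after the current row has been encoded.
        sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded


# Hypothetical data: education level and adverse-prognosis label (1 = adverse).
education = ["primary", "university", "primary", "primary", "university"]
adverse = [1, 0, 1, 0, 0]
prior = sum(adverse) / len(adverse)  # overall event rate in this toy sample

encoded = ordered_target_statistic(education, adverse, prior)
print(encoded)
```

The first row of each category receives the prior alone, and later rows move towards the event rate observed so far in that category.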
Our study had several limitations. First, this was a single-centre retrospective study, and we could not evaluate whether an external cohort would exhibit the same pattern. Second, missing values were imputed with the median, which inevitably introduced bias. Third, the number of cases was relatively small, and the model construction lacked cross-validation and external validation, all of which limit the generalisability of the model. A multicentre joint study is needed to validate the model.
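As a sketch of how the second and third limitations might be addressed together in future work, the example below combines median imputation with k-fold cross-validation, computing the medians from each training fold only so that held-out patients do not influence the imputed values. It describes one possible design under stated assumptions, not the procedure used in this study: the helper names, the two columns (assumed to be age in years and albumin in g/L), and all values are hypothetical.

```python
from statistics import median


def fit_medians(rows):
    """Column-wise medians computed from the observed (non-None) values."""
    n_cols = len(rows[0])
    return [
        median([row[j] for row in rows if row[j] is not None])
        for j in range(n_cols)
    ]


def impute(rows, medians):
    """Replace each missing value (None) with the median of its column."""
    return [
        [medians[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


# Hypothetical records: [age, albumin]; None marks a missing value.
rows = [
    [45.0, 38.0], [51.0, None], [None, 35.0],
    [60.0, 33.0], [48.0, 36.0], [55.0, None],
]

folds = []
for train_idx, test_idx in k_fold_indices(len(rows), k=3):
    # Medians come from the training fold only and are reused on the test fold.
    medians = fit_medians([rows[i] for i in train_idx])
    train = impute([rows[i] for i in train_idx], medians)
    test = impute([rows[i] for i in test_idx], medians)
    folds.append((train, test, medians))
```

A model would then be fitted on each `train` set and scored on the matching `test` set. In practice the rows would be shuffled or stratified by outcome before splitting, which this sketch omits.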