DOI: https://doi.org/10.21203/rs.3.rs-2040978/v1
Background Heart failure (HF) is one of the most prevalent diseases in China and worldwide, and it carries a poor prognosis. A prognostic model for predicting readmission in patients with HF could greatly facilitate risk stratification and the timely identification of high-risk patients. Various HF prediction models have been developed worldwide; however, there are few prognostic models for HF in Chinese populations. We therefore developed and tested an eXtreme Gradient Boosting (XGBoost) model for predicting 90-day readmission in patients with HF.
Methods Clinical data from 1,532 patients with HF admitted to Zigong Fourth People’s Hospital in Sichuan Province between December 2016 and June 2019 were retrospectively analysed to develop and test two prognostic models: an XGBoost model and a logistic regression model. The least absolute shrinkage and selection operator (LASSO) regression method was applied to filter variables and select predictors. The data were randomly split 7:3 into a training set and a test set; the XGBoost model was tuned with 10-fold cross-validation in the training set, and the tuned models were validated in the test set. The performance of the XGBoost model was assessed by accuracy (ACC), kappa, area under the curve (AUC) and other metrics, and was compared with that of the logistic regression model.
Results Systolic blood pressure, diastolic blood pressure, type of HF, mean corpuscular hemoglobin concentration and total cholesterol were selected as predictors by LASSO regression. In the training set, we optimized four major XGBoost parameters (max depth, eta, nrounds and early stopping rounds), with optimal values of 6, 0.5, 1000 and 5, respectively. In the test set, the XGBoost model achieved an ACC of 0.99, a kappa of 0.98, an AUC of 1.00, a sensitivity of 1.00 and a specificity of 0.99, and its prediction performance was significantly higher than that of the logistic regression model.
Conclusion The XGBoost model developed in our study had excellent prediction performance in the test set and may contribute to the assessment of 90-day readmission risk in patients with HF in the Chinese population.
Heart failure (HF) is one of the most prevalent diseases in China and worldwide. According to a national HF epidemiology survey conducted from 2012 to 2015, there were approximately 8.9 million HF cases in China [1, 2], an estimated increase of 5 million compared with 2000 [3]. Although standardized diagnosis and treatment in China have improved in recent years [4, 5], the prognosis of patients with HF remains suboptimal. Previous studies indicated that the 90-day readmission rate reached up to 24.8% [4, 6], and the resulting annual cost of inpatient stays was estimated to be as high as ¥20,000 per capita in China [7]. HF has therefore become a major public health problem that seriously affects the health of residents and aggravates the economic burden. Accurate assessment of the prognosis of patients with HF and timely identification of high-risk patients are required for long-term reduction of readmissions, post-discharge management and control of medical expenses. Currently, over 80% of the prediction models for readmission in patients with HF are based primarily on populations of European and American background [8], and most of them have not been externally validated. These models therefore may not generalize to the population in China because of vast differences in national conditions. For example, most existing models developed in western countries predict 30-day readmission for patients with HF [9–11]. However, Chinese patients with HF tend to refuse short-term (30-day) rehospitalization for economic reasons unless their HF symptoms become intolerable. Compared with a 30-day period, the 90-day period after discharge may therefore be more valuable for observing and evaluating readmission for HF in China [6, 9]. Given this situation, it is necessary to develop an accurate prediction model for Chinese patients with HF.
In recent years, machine learning (ML) methodologies have been widely used to construct prediction models based on biological features. With the development of artificial intelligence, ML algorithms have shown advantages in prediction and recognition tasks [12–14], and one advanced and successful ML method is eXtreme Gradient Boosting (XGBoost) [15]. XGBoost has been widely recognized in a number of ML and data mining challenges: for example, 17 of the 29 challenge-winning solutions published on Kaggle’s blog in 2015 used XGBoost, and all of the top-10 winning teams in the Knowledge Discovery and Data Mining (KDD) Cup 2015 used XGBoost. We chose XGBoost as our classifier because its boosting algorithm makes it a stronger learner than simple decision trees, and its regularization makes it robust against noise, so that it can outperform other ML algorithms [16]. Consequently, in the present study, we sought to train and test an XGBoost model for predicting 90-day readmission in patients with HF after hospital discharge, based on the retrospective medical records of 1,532 hospitalized patients with HF.
Study population
Data analysed in this study were retrieved from the Research Resource for Complex Physiologic Signals (PhysioNet). The data are a restricted-access resource, which can be freely downloaded after passing the required ethics examination and signing the project’s data use agreement, according to the website’s protocols [17]. The study dataset was a retrospective cohort of 2,008 patients with HF consecutively admitted to Zigong Fourth People’s Hospital in Sichuan Province from December 2016 to June 2019. The dataset is available at https://physionet.org/content/heart-failure-zigong/1.2 [6].
Inclusion and Exclusion Criteria
In our study, HF was diagnosed according to the 2016 European Society of Cardiology (ESC) criteria [18]. Patients with a diagnosis of HF on hospital admission were identified by International Classification of Diseases (ICD)-9 codes and selected from the inpatient electronic health record system. Details of the ICD-9 codes used for the diagnosis of HF are provided in the original publication [6]. Participants with any missing data were excluded, and a total of 1,532 patients were included in the final statistical analysis.
Data collection
Data collected for the study fell into three broad categories: demographic data, baseline characteristics and laboratory findings. Demographic data were collected from the first page of the medical records and included age, sex and body mass index (BMI). Baseline characteristics were measured on the day of hospital admission and included admission way (emergency or non-emergency), body temperature (T), pulse rate (PR), respiratory rate (RR), systolic blood pressure (SBP), diastolic blood pressure (DBP), Charlson Comorbidity Index Score (CCI), type of HF (left, right or both), New York Heart Association (NYHA) cardiac function classification, Killip grade, Glasgow Coma Scale (GCS) and fraction of inspired oxygen (FiO2). Laboratory findings were obtained on day one of hospital admission and included creatinine (CREA), uric acid (UA), glomerular filtration rate (GFR), cystatin-C (cys-C), white blood cell count (WBC), coefficient of variation of red blood cell distribution width (RDW-CV), standard deviation of red blood cell distribution width (RDW-SD), lymphocyte count (LYM), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean platelet volume (MPV), basophil count (BASO), eosinophil count (EON), hemoglobin (HGB), platelet (PLT), platelet distribution width (PDW), platelet hematocrit (PCT), neutrophil count (NEUT), D-Dimer (D-Di), high sensitivity troponin T (hs-TnT), brain natriuretic peptide (BNP), albumin (ALB), total cholesterol (TC), low density lipoprotein cholesterol (LDL-C), triglyceride (TG) and high density lipoprotein cholesterol (HDL-C). Details are given in the original reference [6]. The primary outcome of this study was readmission within 90 days, measured from the index hospital admission.
Ethical permission and informed consent
The planning, conduct and reporting of the original study were in accordance with the Declaration of Helsinki, as revised in 2013. Ethical approval was obtained from the ethics committee of Zigong Fourth People’s Hospital (Approval Number: 2020-010), which waived the requirement for informed consent in the original study. Informed consent was not required for the present study because it used secondary data that were analysed anonymously [19, 20].
All data analyses were performed using R version 4.2.1 (https://www.r-project.org, The R Foundation). Normally distributed continuous variables were expressed as mean ± standard deviation, whereas skewed continuous variables were reported as median (interquartile range). Categorical variables were expressed as frequency (percentage).
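The three summary formats described above can be sketched in Python as follows. This is not the authors' R code, and the function names are hypothetical. Quartile definitions differ between software packages, so an IQR computed this way may differ slightly from R's output.

```python
import statistics

def summarize_normal(values):
    """Normally distributed variable -> 'mean ± SD' (sample SD, n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return f"{mean:.2f} ± {sd:.2f}"

def summarize_skewed(values):
    """Skewed variable -> 'median (Q1–Q3)'."""
    # "inclusive" treats the minimum and maximum as the 0th and 100th percentiles
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"{median:.2f} ({q1:.2f}–{q3:.2f})"

def summarize_categorical(values):
    """Categorical variable -> {level: 'count (percentage%)'}."""
    n = len(values)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return {level: f"{c} ({100 * c / n:.2f}%)" for level, c in counts.items()}

# Hypothetical toy data, not study values
skewed_example = summarize_skewed([1, 2, 3, 4, 5])  # "3.00 (2.00–4.00)"
```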
The participants were randomly divided into a training set (N = 1068) and a test set (N = 464) at a ratio of 7:3; these were used to establish the readmission prediction models and to test their accuracy (ACC), respectively. Baseline characteristics of the training and test sets were compared using the independent t test, Mann-Whitney U test or Chi-square test, as appropriate. Variable selection was performed using LASSO (least absolute shrinkage and selection operator) regression in the training set. Ten-fold cross-validation was used to determine the optimal lambda (shrinkage coefficient) that minimized the cross-validated error, as well as the largest value of lambda within one standard error of this optimum (lambda 1se). Using the training set, we trained two prediction models as classifiers for outcome prediction: an XGBoost model and a logistic regression model (generalized linear model). The trained logistic regression model is available at https://shengsong.shinyapps.io/readmission_at_3_months_in_HF_patient. To tune the hyperparameters of the XGBoost classifier and evaluate its performance, we obtained the 10-fold cross-validation performance at each iteration and selected the iteration that yielded the best performance [21]. We extracted the gain, cover, frequency and importance results from the XGBoost output to evaluate feature importance. In addition, SHapley Additive exPlanations (SHAP), a model-agnostic explanation technique derived from cooperative game theory, was used to quantify the importance of the clinical features and their relationship to readmission in the XGBoost model [22].
The XGBoost and logistic regression models were evaluated in the separate test set, and their prediction performance was estimated using the area under the curve (AUC), kappa, ACC, balanced ACC, the P value for ACC > no information rate (NIR), McNemar’s Chi-square test, sensitivity, specificity, positive predictive value, negative predictive value, precision, recall, F1 score, detection rate and detection prevalence. Kappa values were classified as poor (< 0.00), slight (0.00–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80) or almost perfect (0.81–1.00), and AUC values were classified as follows: AUC = 0.5, no discrimination; 0.5 < AUC < 0.7, poor discrimination; 0.7 ≤ AUC < 0.8, acceptable discrimination; 0.8 ≤ AUC < 0.9, excellent discrimination; AUC ≥ 0.9, outstanding discrimination. The predictive performance of the XGBoost model was compared with that of the logistic regression model in the test set using the metrics described above, and the AUCs of the two models were compared with the DeLong test.
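The choice between the lambda that minimizes cross-validated error and the lambda 1se rule can be sketched as follows. The sketch assumes the mean cross-validated error and its standard error are already available for each candidate lambda (R's `cv.glmnet` reports these as `cvm` and `cvsd`; the source does not name the package used). The function name and the numbers in the example are hypothetical, not values from this study.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation results.

    lambdas -- candidate penalty values (any order)
    cv_mean -- mean cross-validated error (e.g. binomial deviance) per lambda
    cv_se   -- standard error of that mean per lambda
    """
    if not lambdas or not (len(lambdas) == len(cv_mean) == len(cv_se)):
        raise ValueError("inputs must be non-empty and of equal length")
    # lambda_min: the penalty with the smallest cross-validated error
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # lambda_1se: the largest (most regularizing) penalty whose error
    # lies within one standard error of the minimum
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se

# Hypothetical cross-validation results (not the study's values)
grid = [0.2, 0.1, 0.05, 0.04, 0.02, 0.01]
deviance = [1.10, 1.05, 1.02, 1.00, 1.01, 1.06]
se = [0.03] * len(grid)
lambda_min, lambda_1se = select_lambdas(grid, deviance, se)  # (0.04, 0.05)
```

Because lambda 1se is the larger penalty, it usually keeps fewer variables than the error-minimizing lambda. It gives up a small amount of cross-validated error in exchange for a sparser model.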
The training and test sets included 1068 and 464 patients, of whom 254 (23.78%) and 104 (22.41%), respectively, were readmitted within 90 days. Admission way, RDW-CV, MCHC, D-Di and ALB differed significantly between the two sets, whereas the remaining variables did not. Baseline characteristics of the patients are listed in Table 1.
Variable | Training set | Test set | P |
---|---|---|---|
N | 1068 | 464 | |
Readmission within 90 days | | | 0.561 |
No | 814 (76.22%) | 360 (77.59%) | |
Yes | 254 (23.78%) | 104 (22.41%) | |
Age category (years) | | | 0.975 |
≤ 59 | 91 (8.52%) | 40 (8.62%) | |
60–79 | 570 (53.37%) | 250 (53.88%) | |
≥ 80 | 407 (38.11%) | 174 (37.50%) | |
Sex | | | 0.828 |
Female | 622 (58.24%) | 273 (58.84%) | |
Male | 446 (41.76%) | 191 (41.16%) | |
BMI (kg/m²) | 20.81 (18.37–23.50) | 20.66 (18.49–23.44) | 0.655 |
Admission way | | | 0.034 |
Non-emergency | 576 (53.93%) | 223 (48.06%) | |
Emergency | 492 (46.07%) | 241 (51.94%) | |
T (℃) | 36.41 ± 0.46 | 36.39 ± 0.40 | 0.437 |
PR (beats/min) | 85.22 ± 21.88 | 84.69 ± 21.39 | 0.661 |
RR (breaths/min) | 19.08 ± 1.76 | 19.09 ± 1.77 | 0.899 |
SBP (mmHg) | 131.37 ± 25.23 | 131.64 ± 23.56 | 0.846 |
DBP (mmHg) | 76.98 ± 15.12 | 76.05 ± 13.40 | 0.250 |
CCI | 2.00 (1.00–2.00) | 2.00 (1.00–2.00) | 0.689 |
Type of heart failure | | | 0.990 |
Left | 246 (23.03%) | 106 (22.84%) | |
Right | 31 (2.90%) | 14 (3.02%) | |
Both | 791 (74.06%) | 344 (74.14%) | |
NYHA classification | | | 0.330 |
II | 177 (16.57%) | 88 (18.97%) | |
III | 555 (51.97%) | 245 (52.80%) | |
IV | 336 (31.46%) | 131 (28.23%) | |
Killip grade | | | 0.181 |
I | 268 (25.09%) | 124 (26.72%) | |
II | 548 (51.31%) | 248 (53.45%) | |
III | 224 (20.97%) | 76 (16.38%) | |
IV | 28 (2.62%) | 16 (3.45%) | |
GCS | 14.87 ± 0.91 | 14.77 ± 1.56 | 0.098 |
FiO2 (%) | 32.82 ± 5.04 | 32.87 ± 5.00 | 0.867 |
CREA (µmol/l) | 87.60 (65.07–122.75) | 87.65 (65.97–122.15) | 0.894 |
UA (µmol/l) | 477.04 ± 164.28 | 487.83 ± 170.40 | 0.243 |
GFR (ml/min/1.73 m²) | 64.72 (40.64–89.16) | 64.56 (43.10–89.05) | 0.863 |
cys-C (mg/l) | 1.56 (1.22–2.17) | 1.50 (1.20–2.27) | 0.741 |
WBC (10⁹/l) | 6.52 (5.04–8.61) | 6.56 (5.26–9.07) | 0.180 |
RDW-CV (%) | 14.92 ± 2.02 | 14.63 ± 1.72 | 0.007 |
RDW-SD (fl) | 48.92 ± 6.39 | 48.23 ± 6.13 | 0.051 |
LYM (10⁹/l) | 0.93 (0.62–1.27) | 0.95 (0.61–1.33) | 0.286 |
MCH (pg) | 29.97 ± 3.33 | 30.27 ± 3.15 | 0.098 |
MCHC (g/l) | 324.52 ± 14.22 | 326.90 ± 11.96 | 0.002 |
MPV (fl) | 12.15 ± 1.71 | 12.17 ± 1.72 | 0.836 |
BASO (10⁹/l) | 0.03 (0.02–0.04) | 0.03 (0.02–0.04) | 0.603 |
EON (10⁹/l) | 0.06 (0.02–0.13) | 0.06 (0.02–0.13) | 0.879 |
HGB (g/l) | 114.69 ± 24.74 | 116.77 ± 23.07 | 0.123 |
PLT (10⁹/l) | 145.67 ± 63.67 | 147.29 ± 60.22 | 0.642 |
PDW (fl) | 16.34 ± 1.35 | 16.38 ± 1.46 | 0.652 |
PCT (%) | 0.17 ± 0.07 | 0.17 ± 0.06 | 0.473 |
NEUT (10⁹/l) | 4.84 (3.60–6.80) | 4.94 (3.74–7.05) | 0.218 |
D-Di (mg/l) | 1.18 (0.78–2.13) | 1.31 (0.84–2.33) | 0.040 |
hs-TnT (pg/ml) | 0.06 (0.02–0.12) | 0.06 (0.02–0.12) | 0.813 |
BNP (pg/ml) | 764.89 (324.50–1789.01) | 757.82 (314.55–1758.87) | 0.771 |
ALB (g/l) | 36.80 ± 4.99 | 36.24 ± 4.84 | 0.044 |
TC (mmol/l) | 3.76 ± 1.11 | 3.67 ± 1.03 | 0.140 |
LDL-C (mmol/l) | 1.87 ± 0.76 | 1.83 ± 0.72 | 0.313 |
TG (mmol/l) | 0.96 (0.72–1.31) | 0.99 (0.71–1.31) | 0.875 |
HDL-C (mmol/l) | 1.11 ± 0.36 | 1.09 ± 0.33 | 0.228 |
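The sketch below applies Pearson's chi-square test to the admission-way counts in Table 1, as a check on one of the between-set comparisons. It assumes no continuity correction. With these counts it gives P ≈ 0.034, which matches the table. R's `chisq.test` applies Yates' correction to 2 × 2 tables by default, so the match suggests the correction was not used, although the source does not say. The function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]] (1 df, no continuity correction)."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Admission way from Table 1: rows = non-emergency / emergency, columns = training / test
chi2, p = chi_square_2x2(576, 223, 492, 241)
```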
Variable selection
A total of 41 variables were included in the LASSO regression analysis. A tuning parameter of lambda = 0.04 (log(lambda) = -3.21) was selected by 10-fold cross-validation as the value minimizing the binomial deviance. Through the LASSO method, five important variables (SBP, DBP, type of HF, MCHC and TC) were selected to construct the models. The results of the LASSO regression are shown in Fig. 1.
Training and testing of the two models
In the training set, 10-fold cross-validation guided parameter tuning for the XGBoost model. We optimized four major parameters (max depth, eta, nrounds and early stopping rounds), with optimal values of 6, 0.5, 1000 and 5, respectively. The final XGBoost model, with the best AUC and logloss, was obtained at iteration 17. At this iteration, performance was almost perfect in the training set (AUC = 1.00 ± 0.00 and logloss = 0.01 ± 0.00) and in 10-fold cross-validation (AUC = 1.00 ± 0.00 and logloss = 0.03 ± 0.03). The results of 10-fold cross-validation for XGBoost are shown in Supplementary Table 1.
In the training set, the XGBoost model with the selected predictors and fine-tuned parameters achieved perfect agreement (kappa = 1.00) and an ACC of 1.00 (95% CI: 0.99–1.00), significantly exceeding the NIR (P < 0.001 for ACC > NIR). In contrast, the logistic regression model had an overall ACC of only 0.63 with fair agreement (kappa = 0.28), and its ACC was not significantly better than the NIR (P > 0.05 for ACC > NIR).
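The early-stopping rule can be sketched as patience-based stopping on a validation metric. Training continues as long as the metric improves at least once every `early_stopping_rounds` rounds, up to `nrounds`. The sketch does not call the xgboost library. Instead it assumes a precomputed list of per-round cross-validated logloss values (hypothetical numbers) and reports which round's model would be kept and when training would halt.

```python
def early_stopping(val_losses, patience=5, max_rounds=1000):
    """Simulate patience-based early stopping on a per-round validation metric.

    val_losses[k] is the metric after boosting round k + 1 (lower is better, e.g. logloss).
    Returns (best_round, rounds_run): the 1-based round whose model would be kept
    and the number of rounds actually trained.
    """
    best_loss = float("inf")
    best_round = 0
    rounds = val_losses[:max_rounds]
    for k, loss in enumerate(rounds, start=1):
        if loss < best_loss:
            best_loss, best_round = loss, k
        elif k - best_round >= patience:
            # `patience` consecutive rounds without improvement: stop training
            return best_round, k
    return best_round, len(rounds)

# Hypothetical cross-validated logloss per round (not the study's values)
losses = [0.60, 0.40, 0.30, 0.25, 0.26, 0.27, 0.25, 0.28, 0.29, 0.20]
best, stopped_at = early_stopping(losses, patience=5)  # (4, 9)
```

In the example, the improvement at round 10 is never reached because five rounds without improvement after round 4 trigger the stop at round 9. This is why a large `nrounds` such as 1000 can coexist with a best iteration as small as 17.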
After applying the classifier-specific feature evaluator to the XGBoost model, we ranked the included features by gain, cover, frequency and importance. Feature importance analysis indicated that TC was the most influential variable in the XGBoost model, followed by SBP, MCHC, DBP and type of HF. Feature importance of the five variables is shown in Fig. 2. To interpret the XGBoost model, SHAP values were used to visualize and explain how these variables affect readmission events. The SHAP-based feature ranking likewise showed that TC had the greatest impact on predicting readmission (Fig. 3), consistent with the feature importance results described above.
In the test set, XGBoost achieved a high ACC of 0.99 with an almost perfect kappa of 0.98, and its ACC was significantly higher than the NIR (P < 0.001 for ACC > NIR). Its sensitivity and specificity were 1.00 and 0.99, respectively. By comparison, the logistic regression model yielded an ACC of 0.60 and a kappa of 0.20, with a non-significant P value for ACC > NIR; its sensitivity and specificity were 0.72 and 0.56, respectively. Comparisons of the other metrics between the two models in the test set are given in Table 2. For all of these metrics, the XGBoost model was consistently superior to the logistic regression model.
As shown in Fig. 4, the AUCs of the XGBoost and logistic regression models for discriminating readmission were 1.00 and 0.71, respectively, indicating that the XGBoost model clearly outperformed the logistic regression model (P < 0.001, DeLong test).
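SHAP values are Shapley values from cooperative game theory. The brute-force sketch below computes them exactly for a small function by averaging each feature's marginal contribution over all coalitions of the other features, with absent features filled in from a single baseline vector. Treat this only as an illustration under that baseline assumption. Tree explainers (TreeSHAP) compute these values efficiently and define the value of an "absent" feature differently. The toy model is hypothetical and is not the study's XGBoost model.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy risk score with an interaction between features 0 and 1
def toy_model(z):
    return 0.5 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[1] - 0.3 * z[2]

x = [2.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The values sum to f(x) minus f(baseline), which is the additivity property that lets SHAP plots decompose one prediction feature by feature. The interaction term is split equally between features 0 and 1, so phi is approximately [1.1, 0.3, -0.3].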
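The P value for ACC > NIR can be computed with a one-sided exact binomial test. The number of correct predictions is compared with a Binomial(n, NIR) distribution, where NIR is the share of the majority class. This mirrors the "P-Value [Acc > NIR]" reported by R's caret `confusionMatrix`, but the source does not state which implementation was used, so treat this as an assumption. The counts come from Table 2: 461 and 278 correct predictions out of 464, and 360 non-readmitted patients, so NIR = 360/464. The function name is hypothetical.

```python
from math import comb

def acc_vs_nir_pvalue(n_correct, n_total, nir):
    """One-sided exact binomial test of H1: accuracy > NIR.

    Returns P(X >= n_correct) for X ~ Binomial(n_total, nir).
    """
    return sum(comb(n_total, k) * nir ** k * (1 - nir) ** (n_total - k)
               for k in range(n_correct, n_total + 1))

nir = 360 / 464                                # majority-class share in the test set
p_xgboost = acc_vs_nir_pvalue(461, 464, nir)   # 104 TP + 357 TN correct
p_logistic = acc_vs_nir_pvalue(278, 464, nir)  # 75 TP + 203 TN correct
```

The logistic model's ACC (0.60) is below the NIR (about 0.78), so its one-sided P value is close to 1. This is why a model can have an acceptable AUC but a non-significant ACC > NIR result.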
Metric | XGBoost | Logistic regression |
---|---|---|
True Positives | 104 | 75 |
False Positives | 3 | 157 |
True Negatives | 357 | 203 |
False Negatives | 0 | 29 |
Area Under Curve | 1.00 | 0.71 |
kappa | 0.98 | 0.20 |
Accuracy | 0.99 | 0.60 |
Balanced Accuracy | 1.00 | 0.64 |
Sensitivity | 1.00 | 0.72 |
Specificity | 0.99 | 0.56 |
Positive Predictive Value | 0.97 | 0.32 |
Negative Predictive Value | 1.00 | 0.88 |
Precision | 0.97 | 0.32 |
Recall | 1.00 | 0.72 |
F1 Score | 0.99 | 0.45 |
Detection Rate | 0.22 | 0.16 |
Detection Prevalence | 0.23 | 0.50 |
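The metrics in Table 2 follow from the four confusion-matrix counts. The sketch below recomputes them with the standard definitions, treating readmission as the positive class. Rounded to two decimals, the outputs reproduce both columns of the table, for example a kappa of 0.98 for XGBoost and 0.20 for logistic regression. The function name is hypothetical.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Binary classification metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    sens = tp / (tp + fn)   # sensitivity = recall
    spec = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)    # positive predictive value = precision
    npv = tn / (tn + fn)    # negative predictive value
    acc = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (acc - p_chance) / (1 - p_chance)
    return {
        "accuracy": acc,
        "kappa": kappa,
        "balanced_accuracy": (sens + spec) / 2,
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sens / (ppv + sens),
        "detection_rate": tp / n,                # true positives / all cases
        "detection_prevalence": (tp + fp) / n,   # predicted positives / all cases
    }

xgb = confusion_metrics(tp=104, fp=3, tn=357, fn=0)
logit = confusion_metrics(tp=75, fp=157, tn=203, fn=29)
```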
The high readmission rate of HF is gaining increasing attention, given the rising incidence of HF observed in China over the past two decades [1, 2, 4]. Accurate assessment of readmission risk is an important precondition for reducing readmissions and improving outcomes in patients with HF.
At present, there are no unified criteria for the critical time point of readmission in clinical studies of HF; the time points used include 7 days, 30 days, 90 days, 6 months, 1 year or longer. Readmission rates and mortality in patients with HF increase significantly within 3 months after discharge as a result of short-term worsening of hemodynamics; this early post-discharge period is termed the vulnerable phase [23]. As observed in the EVEREST trial, the high rate of early post-discharge events appears to be driven by a subgroup of high-risk patients with HF [24–26]. Identifying high-risk patients thus has the potential to substantially improve outcomes after HF hospitalization. For this reason, we used 90-day readmission as the endpoint and developed a prediction model to identify patients with HF at high risk of readmission. In 2019, Tan et al. developed a logistic regression model to predict the risk of 90-day readmission based on 350 patients with HF retrospectively collected at Hunan Provincial People’s Hospital. In their study, four variables, among them NT-proBNP, RDW-CV and CCI, were included in a multivariable model [27]. To our knowledge, this is the only prediction model for 90-day readmission in Chinese patients with HF reported to date. However, the study’s relatively small sample size makes its results difficult to generalize. Additionally, the model had limited discriminatory power (AUC = 0.73), with merely acceptable sensitivity (0.74) and specificity (0.61), and model performance in external clinical settings tends to be even worse than in the original study. Our study has several advantages over that model. First, the sample size was larger, with a total of 1,532 patients included.
Second, we established an XGBoost prediction model with almost perfect performance in the test set (AUC = 1.00, sensitivity = 1.00 and specificity = 0.99), which could accurately predict 90-day readmission in patients with HF. In addition, all predictors in the XGBoost model are routinely available, making the model clinically feasible and practical. Finally, we compared the classification performance of the XGBoost and logistic regression models in the test set, and the XGBoost model significantly outperformed the traditional logistic regression model. We therefore have reason to believe that the XGBoost model can provide a more accurate prediction of 90-day readmission in patients with HF.
We acknowledge that the present study has certain limitations. First, the data were derived from a single-center cohort of patients with HF. Although internal testing showed almost perfect model performance, whether the results can be extrapolated to other populations remains uncertain, and further external testing with a multicenter design is necessary to confirm model performance. Second, this study was based on retrospective data, so the accuracy and quality of the data might be inferior to those of prospectively collected data, which might have adversely affected our model [28]. Third, follow-up period data were missing for a substantial number of patients; as a result, the model could not account for the effect of the follow-up period on readmission in patients with HF. Fourth, in contrast to previous prediction models for readmission at different time points, our prediction model is the first to include MCHC and type of HF in addition to conventional predictors such as SBP, DBP and TC. However, no studies have yet shown that MCHC or type of HF is associated with readmission in patients with HF.
The XGBoost prediction model developed in our study had excellent prediction performance in both the training and test sets and may contribute to the assessment of 90-day readmission risk and the accurate identification of high-risk patients with HF in the Chinese population.
Abbreviations
HF: Heart failure
ML: machine learning
XGBoost: eXtreme Gradient Boosting
PhysioNet: Research Resource for Complex Physiologic Signals
ESC: European Society of Cardiology
ICD: International Classification of Diseases
BMI: Body Mass Index
T: body temperature
PR: pulse rate
RR: respiratory rate
SBP: systolic blood pressure
DBP: diastolic blood pressure
CCI: Charlson Comorbidity Index Score
NYHA: New York Heart Association
GCS: Glasgow Coma Scale
FiO2: fraction of inspired oxygen
CREA: creatinine
UA: uric acid
GFR: glomerular filtration rate
cys-C: cystatin-C
WBC: white blood cell count
RDW-CV: coefficient of variation of red blood cell distribution width
RDW-SD: standard deviation of red blood cell distribution width
LYM: lymphocyte count
MCH: mean corpuscular hemoglobin
MCHC: mean corpuscular hemoglobin concentration
MPV: mean platelet volume
BASO: basophil count
EON: eosinophil count
HGB: hemoglobin
PLT: platelet
PDW: platelet distribution width
PCT: platelet hematocrit
NEUT: neutrophil count
D-Di: D-Dimer
hs-TnT: high sensitivity troponin T
BNP: brain natriuretic peptide
ALB: albumin
TC: total cholesterol
LDL-C: low density lipoprotein cholesterol
TG: triglyceride
HDL-C: high density lipoprotein cholesterol
ACC: accuracy
LASSO: least absolute shrinkage and selection operator
1se: one standard error
SHAP: SHapley Additive exPlanations
AUC: area under the curve
NIR: no information rate
Ethics approval and consent to participate
The planning, conduct and reporting of the original study were in accordance with the Declaration of Helsinki, as revised in 2013. Ethical approval was obtained from the ethics committee of Zigong Fourth People’s Hospital (Approval Number: 2020-010), which waived the requirement for informed consent in the original study. Informed consent was not required for the present study because it used secondary data that were analysed anonymously [19, 20].
Consent for publication
Not applicable
Availability of data and materials
The data analysed in this study are available from PhysioNet at the following link: https://physionet.org/content/heart-failure-zigong/1.2.
Competing interests
All authors declare no competing interests.
Funding
This study was funded by Special Training Program for Outstanding Young Scientific and Technological Talents (grant number ZZ13-YQ-012).
Authors' contributions
Song Sheng completed the statistical analysis and wrote the paper. Professor Ye Huang designed the study and substantively revised it.
Acknowledgements
We are very grateful to the original authors of the study, who conducted the entire study and made their raw data freely available: Zhongheng Zhang, Linghong Cao, Rangui Chen, Yan Zhao, Lukai Lv, Ziyin Xu and Ping Xu.