Predicting 90-day readmission for patients with heart failure: a machine learning approach using XGBoost

doi:10.21203/rs.3.rs-2040978/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Background Heart failure (HF) is one of the most prevalent diseases in China and worldwide and carries a poor prognosis. A prognostic model for predicting readmission in patients with HF could greatly facilitate risk stratification and the timely identification of high-risk patients. Various HF prediction models have been developed worldwide; however, few prognostic models exist for HF in Chinese populations. We therefore developed and tested an eXtreme Gradient Boosting (XGBoost) model for predicting 90-day readmission in patients with HF.

Methods Clinical data for 1,532 patients with HF admitted to Zigong Fourth People’s Hospital in Sichuan Province from December 2016 to June 2019 were analysed retrospectively to develop and test two prognostic models: an XGBoost model and a logistic regression model. Least absolute shrinkage and selection operator (LASSO) regression was applied to filter variables and select predictors. The XGBoost model was tuned using 10-fold cross-validation, and the tuned models were validated in a test set (7:3 random split). The performance of the XGBoost model was assessed by accuracy (ACC), kappa, area under the curve (AUC) and other metrics, and was compared with that of the logistic model.

Results Systolic blood pressure, diastolic blood pressure, type of HF, mean corpuscular hemoglobin concentration and total cholesterol were selected as predictors through LASSO regression. In the training set, we optimized four major XGBoost parameters (max depth, eta, nrounds and early stopping rounds), with optimal values of 6, 0.5, 1000 and 5, respectively. In the test set, the XGBoost model achieved an ACC of 0.99, a kappa of 0.98, and an AUC, sensitivity and specificity of 1.00, 1.00 and 0.99, respectively, a significantly higher prediction performance than that of the logistic model.

Conclusion The XGBoost model developed in our study had excellent prediction performance in the test set and can contribute to the assessment of 90-day readmission risk for patients with HF in the Chinese population.

heart failure

90-day readmission

XGBoost

machine learning

predictive model

Heart failure (HF) is one of the most prevalent diseases in China and worldwide. According to a national HF epidemiology survey conducted from 2012 to 2015, there were approximately 8.9 million HF cases in China [1, 2], an estimated increase of 5 million compared with 2000 [3]. Although domestic standardized diagnosis and treatment have improved in recent years [4, 5], the prognosis of HF patients remains suboptimal. Previous studies indicated that the 90-day readmission rate reached up to 24.8% [4, 6] and that the resulting annual costs of inpatient stay were estimated to be as high as ¥20,000 per capita in China [7]. Undoubtedly, HF has become a major public health problem that seriously affects the health of residents and aggravates the economic burden. Therefore, accurate assessment of HF patients’ prognosis and timely identification of high-risk patients are required for long-term effective readmission reduction, post-discharge management and control of medical expenses. Currently, over 80% of the prediction models for readmission in patients with HF are based primarily on populations of European and American background [8], and most of them have not been externally validated. Thus, these prediction models cannot readily be generalized to the population in China owing to vast differences in basic national conditions. As an illustration, most of the existing models developed in Western countries are directed at predicting 30-day readmission for patients with HF [9–11]. However, Chinese patients with HF tend to refuse short-term (30-day) rehospitalization for economic reasons unless HF symptoms become intolerable. Compared with a 30-day period, the 90-day period after discharge may be more valuable for the observation and evaluation of readmission for HF in China [6, 9]. Given this situation, it is necessary to develop an accurate prediction model for Chinese patients with HF.

In recent years, machine learning (ML) methodologies have been widely used to construct prediction models based on biological features. With the development of artificial intelligence, ML algorithms have shown their advantages in prediction and recognition tasks [12–14], and one such advanced and successful ML method is eXtreme Gradient Boosting (XGBoost) [15]. XGBoost has been widely recognized in a number of ML and data mining challenges: for example, 17 of the 29 challenge-winning solutions published on Kaggle’s blog in 2015 used XGBoost, and the top-10 winning teams in the Knowledge Discovery and Data Mining Cup 2015 used XGBoost. We chose XGBoost as our classifier because its boosting algorithm makes it a strong learner with better performance than simple decision trees, and its regularization makes it robust against noise, allowing it to outperform other ML algorithms [16]. Consequently, in the present study, we sought to train and test an XGBoost model for predicting 90-day readmission after hospital discharge for patients with HF, based on the retrospective medical records of 1,532 hospitalized patients with HF.
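For readers unfamiliar with the regularization mentioned above, the objective that XGBoost minimizes is usually written in the following general form. This is the standard textbook formulation and not something specific to this study; the symbols (loss l, trees f_k, number of leaves T, leaf weights w, penalties γ and λ) are introduced here only for illustration, and this λ is unrelated to the LASSO lambda used later for variable selection.

```latex
\mathcal{L} \;=\; \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) \;+\; \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) \;=\; \gamma T \;+\; \tfrac{1}{2}\,\lambda \,\lVert w \rVert^{2}
```

The first term rewards fit to the observed outcomes, while the second penalizes trees with many leaves or large leaf weights, which is the source of the robustness to noise described in the text.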

Study population

Data analysed in this study were retrieved from the Research Resource for Complex Physiologic Signals (PhysioNet). The data are a restricted-access resource, which can be freely downloaded after passing the ethical examinations and signing the data use agreement for the project according to the website’s protocols [17]. The study dataset was a retrospective cohort of 2,008 patients with HF consecutively admitted to Zigong Fourth People’s Hospital in Sichuan Province from December 2016 to June 2019. The dataset is available from the following link: https://physionet.org/content/heart-failure-zigong/1.2 [6].

Inclusion and Exclusion Criteria

In our study, HF was diagnosed based on the 2016 European Society of Cardiology (ESC) criteria [18]. Patients with a diagnosis of HF on hospital admission were identified using International Classification of Diseases (ICD)-9 codes and selected from the inpatient electronic health record system. Details on the ICD-9 codes for the diagnosis of HF are provided in the original publication [6]. Participants with any missing data were excluded from the study, and a total of 1,532 patients were included in the final statistical analysis.

Data collection

Data collected for the study fell into three broad categories: demographic data, baseline characteristics and laboratory findings. Subject demographics were collected from the first sheet of the medical records and included age, sex and Body Mass Index (BMI). Baseline characteristics were measured on the day of hospital admission and included admission way (emergency or non-emergency), body temperature (T), pulse rate (PR), respiratory rate (RR), systolic blood pressure (SBP), diastolic blood pressure (DBP), Charlson Comorbidity Index Score (CCI), type of HF (left, right or both), NYHA (New York Heart Association) cardiac function classification, Killip grade, Glasgow Coma Scale (GCS) and fraction of inspired oxygenation (FiO2). Laboratory findings were obtained on day one of hospital admission and included creatinine (CREA), uric acid (UA), glomerular filtration rate (GFR), cystatin-C (cys-C), white blood cell count (WBC), coefficient of variation of red blood cell distribution width (RDW-CV), standard deviation of red blood cell distribution width (RDW-SD), lymphocyte count (LYM), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean platelet volume (MPV), basophil count (BASO), eosinophil count (EON), hemoglobin (HGB), platelet (PLT), platelet distribution width (PDW), platelet hematocrit (PCT), neutrophil count (NEUT), D-Dimer (D-Di), high sensitivity troponin T (hs-TnT), brain natriuretic peptide (BNP), albumin (ALB), total cholesterol (TC), low density lipoprotein cholesterol (LDL-C), triglyceride (TG) and high density lipoprotein cholesterol (HDL-C). The details are given in the original reference [6]. The primary outcome in this study was readmission within 90 days, measured from the index hospital admission.

Ethical permission and informed consent

The planning, conduct and reporting of the original study were in accordance with the Declaration of Helsinki, as revised in 2013. Ethical approval was obtained from the ethics committee of Zigong Fourth People’s Hospital (Approval Number: 2020-010), and informed consent was waived with the approval of the same ethics committee in the original study. Informed consent was not required in the present study because it uses secondary data that were analysed anonymously [19, 20].

Data Analysis

All data analyses were performed using R version 4.2.1 (https://www.r-project.org, The R Foundation). The continuous variables with normal distribution were expressed as the mean ± standard deviation, whereas continuous variables with a skewed distribution were reported as the median (interquartile range). Categorical variables were expressed as frequency (percentage).

The participants were randomly divided into a training set (N = 1068) and a test set (N = 464) at a ratio of 7:3, which were used to establish the readmission prediction models and to test the accuracy (ACC) of the models, respectively. Baseline characteristics of the training and test sets were compared using the independent t test, Mann-Whitney U test or Chi-square test, as appropriate. Variable selection was performed using LASSO (least absolute shrinkage and selection operator) regression in the training set. Ten-fold cross-validation was used to compute the optimal lambda shrinkage coefficient that minimized the cross-validated error, as well as the largest value of lambda within one standard error of this optimum (lambda 1se). Using the training dataset, we trained two prediction models, an XGBoost model and a logistic regression model (generalized linear model), as classifiers for outcome prediction. The trained logistic regression model is detailed at https://shengsong.shinyapps.io/readmission_at_3_months_in_HF_patient. To tune the hyperparameters of the XGBoost classifier and evaluate its performance, we obtained the 10-fold cross-validation performance for each iteration and selected the iteration that generated the best performance [21]. For the XGBoost model, we extracted gain, cover, frequency and importance from the XGBoost output to evaluate the importance of the features. SHapley Additive exPlanations (SHAP), a model-agnostic explanation technique derived from cooperative game theory, was also used to quantify the importance of clinical features and their relationship to readmission in the XGBoost model [22].
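As a minimal sketch of the partitioning described above, the following standard-library Python shows a 7:3 random split followed by 10-fold cross-validation index generation. The function names, the seed and the rounding rule are assumptions made for illustration; the study itself was run in R, and its exact partitioning procedure (which produced 1068 and 464 patients) is not described, so the set sizes here differ slightly.

```python
import random


def split_train_test(n, train_fraction=0.7, seed=0):
    """Randomly partition indices 0..n-1 into train and test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(indices, k=10, seed=0):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


# 1532 patients, as in the study; a plain 70% cut gives 1072 / 460.
train_idx, test_idx = split_train_test(1532)
print(len(train_idx), len(test_idx))

for fit_idx, val_idx in k_fold_indices(train_idx, k=10):
    pass  # fit a candidate model on fit_idx, score it on val_idx
```

Only the training indices are passed to the cross-validation loop, so the test set stays untouched until the final evaluation.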

The XGBoost model and the logistic regression model were tested using the separate test set, and the prediction performance of the trained models in the test set was estimated using area under the curve (AUC), kappa score, ACC, balanced ACC, the ACC > no information rate (NIR) metric of McNemar’s Chi-square test, sensitivity, specificity, positive predictive value, negative predictive value, precision, recall, F1 score, detection rate and detection prevalence. Kappa was classified as poor (< 0.00), slight (0.00–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80) or almost perfect (0.81–1.00), and AUC values were classified as follows: AUC = 0.5, no discrimination; 0.5 < AUC < 0.7, poor discrimination; 0.7 ≤ AUC < 0.8, acceptable discrimination; 0.8 ≤ AUC < 0.9, excellent discrimination; AUC ≥ 0.9, outstanding discrimination. The XGBoost model was also compared with the logistic regression model in the test set using the metrics described above, and the AUCs of the two models were compared with the DeLong test.
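The interpretation bands above can be expressed as two small lookup functions. This is only an illustrative sketch in Python: the function names are hypothetical, and the handling of AUC values below 0.5 is an assumption, since the scale in the text does not cover that case.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the agreement band used in the text."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def interpret_auc(auc):
    """Map an AUC value to the discrimination band used in the text."""
    if auc < 0.5:
        # Assumption: the text defines no band below 0.5.
        return "below chance"
    if auc == 0.5:
        return "no discrimination"
    if auc < 0.7:
        return "poor"
    if auc < 0.8:
        return "acceptable"
    if auc < 0.9:
        return "excellent"
    return "outstanding"


# Values reported for the two models in the test set.
print(interpret_kappa(0.98), interpret_auc(1.00))  # XGBoost
print(interpret_kappa(0.20), interpret_auc(0.71))  # logistic regression
```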

There were 1068 and 464 patients in the training set and test set, of whom 254 (23.78%) and 104 (22.41%), respectively, required readmission within 90 days. Between the two sets, a significant difference was found in the distribution of admission way, RDW-CV, MCHC, D-Di and ALB. The remaining variables did not differ significantly between the two sets. Baseline characteristics of the patients are listed in Table 1.

Table 1

Baseline characteristics of the patients
Group	Training set	Test set	P
N	1068	464
Readmission within 90-day			0.561
No	814 (76.22%)	360 (77.59%)
Yes	254 (23.78%)	104 (22.41%)
Age category (years)			0.975
≤ 59	91 (8.52%)	40 (8.62%)
60–79	570 (53.37%)	250 (53.88%)
≥ 80	407 (38.11%)	174 (37.50%)
Sex			0.828
Female	622 (58.24%)	273 (58.84%)
Male	446 (41.76%)	191 (41.16%)
BMI (kg/m²)	20.81 (18.37–23.50)	20.66 (18.49–23.44)	0.655
Admission way			0.034
Non-emergency	576 (53.93%)	223 (48.06%)
Emergency	492 (46.07%)	241 (51.94%)
T (℃)	36.41 ± 0.46	36.39 ± 0.40	0.437
PR (bpm)	85.22 ± 21.88	84.69 ± 21.39	0.661
RR (bpm)	19.08 ± 1.76	19.09 ± 1.77	0.899
SBP (mmHg)	131.37 ± 25.23	131.64 ± 23.56	0.846
DBP (mmHg)	76.98 ± 15.12	76.05 ± 13.40	0.250
CCI	2.00 (1.00–2.00)	2.00 (1.00–2.00)	0.689
Type of heart failure			0.990
Left	246 (23.03%)	106 (22.84%)
Right	31 (2.90%)	14 (3.02%)
Both	791 (74.06%)	344 (74.14%)
NYHA classification			0.330
II	177 (16.57%)	88 (18.97%)
III	555 (51.97%)	245 (52.80%)
IV	336 (31.46%)	131 (28.23%)
Killip grade			0.181
I	268 (25.09%)	124 (26.72%)
II	548 (51.31%)	248 (53.45%)
III	224 (20.97%)	76 (16.38%)
IV	28 (2.62%)	16 (3.45%)
GCS	14.87 ± 0.91	14.77 ± 1.56	0.098
FiO2 (%)	32.82 ± 5.04	32.87 ± 5.00	0.867
CREA (µmol/l)	87.60 (65.07–122.75)	87.65 (65.97–122.15)	0.894
UA (µmol/l)	477.04 ± 164.28	487.83 ± 170.40	0.243
GFR (ml/min/1.73 m²)	64.72 (40.64–89.16)	64.56 (43.10–89.05)	0.863
cys-C (mg/l)	1.56 (1.22–2.17)	1.50 (1.20–2.27)	0.741
WBC (10⁹/l)	6.52 (5.04–8.61)	6.56 (5.26–9.07)	0.180
RDW-CV (%)	14.92 ± 2.02	14.63 ± 1.72	0.007
RDW-SD (fl)	48.92 ± 6.39	48.23 ± 6.13	0.051
LYM (10⁹/l)	0.93 (0.62–1.27)	0.95 (0.61–1.33)	0.286
MCH (pg)	29.97 ± 3.33	30.27 ± 3.15	0.098
MCHC (g/l)	324.52 ± 14.22	326.90 ± 11.96	0.002
MPV (fl)	12.15 ± 1.71	12.17 ± 1.72	0.836
BASO (10⁹/l)	0.03 (0.02–0.04)	0.03 (0.02–0.04)	0.603
EON (10⁹/l)	0.06 (0.02–0.13)	0.06 (0.02–0.13)	0.879
HGB (g/l)	114.69 ± 24.74	116.77 ± 23.07	0.123
PLT (10⁹/l)	145.67 ± 63.67	147.29 ± 60.22	0.642
PDW (fl)	16.34 ± 1.35	16.38 ± 1.46	0.652
PCT (%)	0.17 ± 0.07	0.17 ± 0.06	0.473
NEUT (10⁹/l)	4.84 (3.60–6.80)	4.94 (3.74–7.05)	0.218
D-Di (mg/l)	1.18 (0.78–2.13)	1.31 (0.84–2.33)	0.040
hs-TnT (pg/ml)	0.06 (0.02–0.12)	0.06 (0.02–0.12)	0.813
BNP (pg/ml)	764.89 (324.50–1789.01)	757.82 (314.55–1758.87)	0.771
ALB (g/l)	36.80 ± 4.99	36.24 ± 4.84	0.044
TC (mmol/l)	3.76 ± 1.11	3.67 ± 1.03	0.140
LDL-C (mmol/l)	1.87 ± 0.76	1.83 ± 0.72	0.313
TG (mmol/l)	0.96 (0.72–1.31)	0.99 (0.71–1.31)	0.875
HDL-C (mmol/l)	1.11 ± 0.36	1.09 ± 0.33	0.228
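To illustrate how the categorical comparisons in Table 1 can be checked, the sketch below applies a Pearson Chi-square test to the readmission counts in the two sets. It is written in standard-library Python rather than the R used in the study, the function name is hypothetical, and the absence of a continuity correction is an assumption, because the text does not state which variant was used; under that assumption the result appears consistent with the reported P of 0.561.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: training set, test set. Columns: not readmitted, readmitted (Table 1).
stat, p = chi_square_2x2(814, 254, 360, 104)
print(round(stat, 3), round(p, 3))
```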

Variable selection

A total of 41 variables were included in the LASSO regression analysis. In the LASSO regression model, a tuning parameter of lambda = 0.04 (log(lambda) = -3.21) was selected by 10-fold cross-validation to minimize the binomial deviance among the 41 variables. Through the LASSO method, five important variables (SBP, DBP, type of HF, MCHC and TC) were selected to construct the models. The results from the LASSO regression are shown in Fig. 1.
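The two lambda values produced by cross-validated LASSO (the minimizer and the one-standard-error value) follow a simple selection rule, sketched here in Python. The function name and every number in the example are invented for illustration; they are not the cross-validation results of this study, which are shown only in Fig. 1.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from per-lambda cross-validated error.

    lambda_min minimizes the mean cross-validated error; lambda_1se is the
    largest lambda whose mean error is within one standard error of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[best], lambda_1se


# Hypothetical binomial deviance values for six candidate lambdas.
lambdas = [0.16, 0.08, 0.04, 0.02, 0.01, 0.005]
cv_mean = [1.100, 1.088, 1.080, 1.083, 1.087, 1.092]
cv_se = [0.010, 0.010, 0.009, 0.009, 0.009, 0.010]

lambda_min, lambda_1se = select_lambdas(lambdas, cv_mean, cv_se)
print(lambda_min, lambda_1se)
```

A larger lambda shrinks more coefficients to zero, so the one-standard-error value gives a sparser model at a small cost in cross-validated error.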

Training and testing of the two models

In the training set, 10-fold cross-validation was used to guide parameter tuning for the XGBoost model. We optimized four major parameters (max depth, eta, nrounds and early stopping rounds), with optimal values of 6, 0.5, 1000 and 5, respectively. The final XGBoost model with the best AUC and logloss was obtained at iteration 17. At this iteration, performance was almost perfect in the training set (AUC = 1.00 ± 0.00 and logloss = 0.01 ± 0.00) and in 10-fold cross-validation (AUC = 1.00 ± 0.00 and logloss = 0.03 ± 0.03). The results of 10-fold cross-validation in XGBoost are shown in Supplementary Table 1.
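The interplay between nrounds and early stopping rounds can be sketched as follows: boosting runs for up to nrounds iterations but halts once the validation loss has failed to improve for the given number of consecutive rounds. This Python function is a simplified illustration of that rule, not the XGBoost implementation, and the loss curve is invented so that the best round happens to be 17; the actual per-iteration values are in Supplementary Table 1.

```python
def best_iteration_with_early_stopping(losses, patience=5):
    """Return (best_round, stopped_round), 1-indexed, for validation losses.

    Training stops at the first round that is `patience` rounds past the
    best round seen so far without any improvement.
    """
    best_round, best_loss = 0, float("inf")
    for round_number, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_round, best_loss = round_number, loss
        elif round_number - best_round >= patience:
            return best_round, round_number
    return best_round, len(losses)


# Hypothetical logloss curve: improves for 17 rounds, then plateaus higher.
losses = [0.60 * 0.8 ** i for i in range(17)] + [0.02] * 10

best_round, stopped_round = best_iteration_with_early_stopping(losses, patience=5)
print(best_round, stopped_round)
```

With a patience of 5, the upper limit of 1000 rounds is rarely reached; it acts as a ceiling rather than a target.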

With the selected predictors and fine-tuned parameters, the XGBoost model yielded a perfect kappa (kappa = 1.00) and an excellent ACC (ACC = 1.00, 95% CI: 0.99–1.00) in the training set, with ACC significantly higher than NIR (P < 0.001 for ACC > NIR). In the logistic regression model, overall ACC was only 0.63 and the kappa statistic showed fair agreement (kappa = 0.28); moreover, the ACC of the logistic model was not significantly better than NIR (P > 0.05 for ACC > NIR).

After applying the classifier-specific feature evaluator to the XGBoost model, the included features were ranked by their gain, cover, frequency and importance. Feature importance analysis indicated that TC was the most influential variable in the XGBoost model, followed by SBP, MCHC, DBP and type of HF. The feature importance of the five variables is shown in Fig. 2. To aid interpretability, SHAP values were used to visualize and explain how these variables affect readmission events within the XGBoost model. Based on the SHAP algorithm, the feature ranking showed that TC was the feature with the greatest impact on predicting readmission (Fig. 3), in line with the feature importance results described above.
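The idea behind SHAP values can be illustrated with a brute-force Shapley computation on a toy scoring function. This Python sketch averages each feature's marginal contribution over all feature orderings; it is not the tree-specific algorithm that SHAP libraries use for XGBoost models, and the toy function and its effect sizes are hypothetical, chosen only to show how an interaction is shared between features.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for ordering in orderings:
        coalition = set()
        for f in ordering:
            before = value_fn(frozenset(coalition))
            coalition.add(f)
            totals[f] += value_fn(frozenset(coalition)) - before
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_value(coalition):
    """Hypothetical risk score given which features are 'known' to the model."""
    score = 0.0
    if "TC" in coalition:
        score += 0.30
    if "SBP" in coalition:
        score += 0.20
    if "TC" in coalition and "SBP" in coalition:
        score += 0.10  # interaction, split equally between TC and SBP
    if "MCHC" in coalition:
        score += 0.05
    return score


phi = shapley_values(toy_value, ["TC", "SBP", "MCHC"])
print(phi)
```

The three values sum to the score of the full feature set (0.65), which is the additivity property that makes SHAP rankings comparable across features.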

In the test set, the XGBoost model obtained a high ACC of 0.99 with an almost perfect kappa score of 0.98, and ACC was significantly higher than NIR (P < 0.001 for ACC > NIR). Sensitivity and specificity were 1.00 and 0.99, respectively. By comparison, the logistic regression model yielded an ACC of 0.60 and a kappa of 0.20, with a non-significant P value for ACC > NIR; its sensitivity and specificity were 0.72 and 0.56, respectively. The comparisons of other metrics between the two models in the test set are given in Table 2. For all of these metrics, the XGBoost model was consistently superior to the logistic regression model.
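The comparison of ACC with the no information rate can be sketched as a one-sided exact binomial test, where NIR is the proportion of the largest class (here 360 of 464 test patients were not readmitted). Treating the P value this way is an assumption about how it was obtained, since the text does not spell out the procedure; the function name is hypothetical, and the counts of correct predictions are taken from Table 2.

```python
from math import comb


def p_value_acc_gt_nir(correct, total, nir):
    """One-sided exact binomial P(X >= correct) under H0: accuracy equals NIR."""
    return sum(
        comb(total, k) * nir ** k * (1 - nir) ** (total - k)
        for k in range(correct, total + 1)
    )


total = 464
nir = 360 / total  # always predicting "no readmission" is right 77.6% of the time

p_xgboost = p_value_acc_gt_nir(104 + 357, total, nir)  # 461 correct predictions
p_logistic = p_value_acc_gt_nir(75 + 203, total, nir)  # 278 correct predictions
print(p_xgboost < 0.001, p_logistic > 0.05)
```

An accuracy of 0.60 falls below the 0.78 obtained by always predicting the majority class, which is why the logistic model cannot be significantly better than NIR.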

As shown in Fig. 4, the AUCs of the XGBoost and logistic regression models for discriminating readmission were 1.00 and 0.71, respectively, indicating that the XGBoost model clearly outperformed the logistic regression model (P < 0.001, DeLong test).
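As a reminder of what the AUC measures, it equals the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen patient who was not readmitted, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison; the function name and the four example scores are hypothetical, and the DeLong comparison itself is not reproduced here.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks for two readmitted (1) and two other (0) patients.
example_auc = auc_from_scores([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(example_auc)
```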

Table 2

The comparisons of other metrics between the two models in test set
Metric	XGBoost	Logistic regression
True Positives	104	75
False Positives	3	157
True Negatives	357	203
False Negatives	0	29
Area Under Curve	1.00	0.71
kappa	0.98	0.20
Accuracy	0.99	0.60
Balanced Accuracy	1.00	0.64
Sensitivity	1.00	0.72
Specificity	0.99	0.56
Positive Predictive Value	0.97	0.32
Negative Predictive Value	1.00	0.88
Precision	0.97	0.32
Recall	1.00	0.72
F1 Score	0.99	0.45
Detection Rate	0.22	0.16
Detection Prevalence	0.23	0.50
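Most of the metrics in Table 2 follow directly from the four confusion-matrix counts. The Python sketch below recomputes them using the standard definitions; the function and key names are chosen here for illustration and the study's own computation was done in R. Rounded to two decimals, the results agree with the tabulated values for both models.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive common classification metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # also called precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - expected) / (1 - expected),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
    }


# Counts taken from Table 2 (test set, N = 464).
xgb = confusion_metrics(tp=104, fp=3, tn=357, fn=0)
logit = confusion_metrics(tp=75, fp=157, tn=203, fn=29)

for name in xgb:
    print(f"{name}: {xgb[name]:.2f} vs {logit[name]:.2f}")
```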

The high readmission rate of HF is attracting increasing attention, given the rising incidence of HF observed in China over the past two decades [1, 2, 4]. Accurate assessment of the risk of readmission is an important precondition for reducing the rate of readmissions and improving the final outcome for HF patients.

At present, there are no unified criteria for the critical time point of readmission in clinical studies involving HF; the time points used include 7 days, 30 days, 90 days, 6 months, 1 year or longer. Readmission rates and mortality in patients with HF increase significantly within 3 months after discharge as a result of short-term worsening of hemodynamics. This early postdischarge period is termed the vulnerable phase [23]. As observed in the EVEREST trial, the high rate of early postdischarge events seems to be driven by a subgroup of high-risk patients with HF [24–26]. Thus, identification of high-risk patients has the potential to improve patient outcomes substantially after HF hospitalization. For this reason, we used 90-day readmission as the endpoint and developed a prediction model to identify patients with HF at high risk of readmission. In 2019, Tan et al. developed a logistic regression model to predict the risk of 90-day readmission for patients with HF based on 350 patients with HF retrospectively collected in Hunan Provincial People’s Hospital. In their study, four variables were included into a multivariable model: NT-proBNP, RDW-CV and CCI [27]. To our knowledge, this is the only prediction model for 90-day readmission in patients with HF in the Chinese population reported so far. However, the sample size of that study was relatively small, which makes it difficult to generalize the results. Additionally, the model had only limited discriminatory power, with an AUC of 0.73, and provided a merely acceptable sensitivity of 0.74 and specificity of 0.61. Predictably, model performance in external clinical application tends to be even worse than in the original study.

First, compared with the previous prediction model, the sample size was enlarged, with a total of 1,532 patients included in this study. Second, our study established an XGBoost prediction model with almost perfect performance in the test set (AUC = 1.00, sensitivity = 1.00 and specificity = 0.99), which could predict 90-day readmission for patients with HF quite accurately. In addition, all the predictors in the XGBoost model are routinely available, making the model clinically feasible and practical. Finally, we compared the classification performance of the XGBoost model and the logistic regression model on the test dataset, and the result showed that the XGBoost model significantly outperformed the traditional logistic regression model. Therefore, we have reason to believe that the XGBoost model can provide a more accurate prediction of 90-day readmission for patients with HF.

We acknowledge that the present study has certain limitations. First, the data in this study were derived from a single-center cohort of patients with HF. Although internal testing showed almost perfect performance of the model, whether the results can be extrapolated to other populations remains uncertain, and further external testing with a multicenter design is necessary to confirm model performance. Second, this study is based on retrospective data. Thus, the accuracy and quality of the data might be inferior to those of prospectively collected data, which might have an adverse impact on our model [28]. Third, a substantial number of patients had missing data for the follow-up period. Because of this, the model could not account for the effect of the follow-up period on readmission for patients with HF. Fourth, in contrast to previous prediction models for readmission at different time points, our prediction model is the first to include MCHC and type of HF in addition to conventional predictors such as SBP, DBP and TC. However, no studies have yet indicated that MCHC and type of HF are associated with readmission in patients with HF.

The XGBoost prediction model developed in our study had excellent prediction performance in both the training and test sets, and the model can contribute to the assessment of 90-day readmission risk for patients with HF and to the accurate identification of high-risk patients in the Chinese population.

Abbreviations

HF: heart failure
ML: machine learning
XGBoost: eXtreme Gradient Boosting
PhysioNet: Research Resource for Complex Physiologic Signals
ESC: European Society of Cardiology
ICD: International Classification of Diseases
BMI: Body Mass Index
T: body temperature
PR: pulse rate
RR: respiratory rate
SBP: systolic blood pressure
DBP: diastolic blood pressure
CCI: Charlson Comorbidity Index Score
NYHA: New York Heart Association
GCS: Glasgow Coma Scale
FiO2: fraction of inspired oxygenation
CREA: creatinine
UA: uric acid
GFR: glomerular filtration rate
cys-C: cystatin-C
WBC: white blood cell count
RDW-CV: coefficient of variation of red blood cell distribution width
RDW-SD: standard deviation of red blood cell distribution width
LYM: lymphocyte count
MCH: mean corpuscular hemoglobin
MCHC: mean corpuscular hemoglobin concentration
MPV: mean platelet volume
BASO: basophil count
EON: eosinophil count
HGB: hemoglobin
PLT: platelet
PDW: platelet distribution width
PCT: platelet hematocrit
NEUT: neutrophil count
D-Di: D-Dimer
hs-TnT: high sensitivity troponin T
BNP: brain natriuretic peptide
ALB: albumin
TC: total cholesterol
LDL-C: low density lipoprotein cholesterol
TG: triglyceride
HDL-C: high density lipoprotein cholesterol
ACC: accuracy
LASSO: least absolute shrinkage and selection operator
1se: one standard error
SHAP: SHapley Additive exPlanations
AUC: area under curve
NIR: no information rate

Ethics approval and consent to participate

Consent for publication

Not applicable

Availability of data and materials

The data analysed in this study are available from the following link: https://physionet.org/content/heart-failure-zigong/1.2.

Competing interests

All authors declare that they have no competing interests.

Funding

This study was funded by the Special Training Program for Outstanding Young Scientific and Technological Talents (grant number ZZ13-YQ-012).

Authors' contributions

Song Sheng completed the statistical analysis and wrote the paper. Professor Ye Huang designed the study and substantively revised it.

Acknowledgements

We are very grateful to the original authors of the study. They finished the entire study and uploaded their raw data for free. They are Zhongheng Zhang, Linghong Cao, Rangui Chen, Yan Zhao, Lukai Lv, Ziyin Xu, Ping Xu.

References

1. Hao G, Wang X, Chen Z, et al. Prevalence of heart failure and left ventricular dysfunction in China: the China Hypertension Survey, 2012–2015. Eur J Heart Fail. 2019;21(11):1329–37.
2. Metra M, Lucioli P. Corrigendum to 'Prevalence of heart failure and left ventricular dysfunction in China: the China Hypertension Survey, 2012–2015' [Eur J Heart Fail 2019;21:1329–1337]. Eur J Heart Fail. 2020;22(4):759.
3. Gu DF, Huang GY, Wu XG, et al. Investigation of prevalence and distributing feature of chronic heart failure in Chinese adult population. Chin J Cardiol. 2003;31(1):3–6.
4. Working Group on Heart Failure National Center for Cardiovascular Quality Improvement. 2020 Clinical Performance and Quality Measures for Heart Failure in China. Chin Circul J. 2021;36(3):221–238.
5. Zhang Y, Zhang J, Butler J, et al. Contemporary Epidemiology, Management, and Outcomes of Patients Hospitalized for Heart Failure in China: Results From the China Heart Failure (China-HF) Registry. J Card Fail. 2017;23(12):868–75.
6. Zhang Z, Cao L, Chen R, et al. Electronic healthcare records and external outcome data for hospitalized patients with heart failure. Sci Data. 2021;8(1):46.
7. Huang J, Yin H, Zhang M, et al. Understanding the economic burden of heart failure in China: impact on disease management and resource utilization. J Med Econ. 2017;20(5):549–53.
8. Sun Z, Dong W, Shi H, et al. Comparing Machine Learning Models and Statistical Models for Predicting Heart Failure Events: A Systematic Review and Meta-Analysis. Front Cardiovasc Med. 2022;9:812276.
9. Riester MR, McAuliffe L, Collins C, et al. Development and validation of the Tool for Pharmacists to Predict 30-day hospital readmission in patients with Heart Failure (ToPP-HF). Am J Health Syst Pharm. 2021;78(18):1691–700.
10. Driscoll A, Romaniuk H, Dinh D, et al. Clinical risk prediction model for 30-day all-cause re-hospitalisation or mortality in patients hospitalised with heart failure. Int J Cardiol. 2022;350:69–76.
11. Sharma V, Kulkarni V, McAlister F, et al. Predicting 30-Day Readmissions in Patients With Heart Failure Using Administrative Data: A Machine Learning Approach. J Card Fail. 2022;28(5):710–22.
12. Khera R, Haimovich J, Hurley NC, et al. Use of Machine Learning Models to Predict Death After Acute Myocardial Infarction. JAMA Cardiol. 2021;6(6):633–41.
13. Zhang Z, Ho KM, Hong Y. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit Care. 2019;23(1):112.
14. Zhao QY, Liu LP, Luo JC, et al. A Machine-Learning Approach for Dynamic Prediction of Sepsis-Induced Coagulopathy in Critically Ill Patients With Sepsis. Front Med (Lausanne). 2020;7:637434.
15. Taninaga J, Nishiyama Y, Fujibayashi K, et al. Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study. Sci Rep. 2019;9(1):12384.
16. Li H, Zhou J, Zhou Y, et al. An Interpretable Computer-Aided Diagnosis Method for Periodontitis From Panoramic Radiographs. Front Physiol. 2021;12:655556.
17. Goldberger AL, Amaral LA, Glass L, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):E215–20.
18. Ponikowski P, Voors AA, Anker SD, et al. 2016 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure: The Task Force for the diagnosis and treatment of acute and chronic heart failure of the European Society of Cardiology (ESC). Developed with the special contribution of the Heart Failure Association (HFA) of the ESC. Eur Heart J. 2016;37(27):2129–200.
19. Bangalore S, Guo Y, Samadashvili Z, et al. Everolimus-eluting stents or bypass surgery for multivessel coronary disease. N Engl J Med. 2015;372(13):1213–22.
20. Filion KB, Azoulay L, Platt RW, et al. A Multicenter Observational Study of Incretin-based Drugs and Heart Failure. N Engl J Med. 2016;374(12):1145–54.
21. Liu G, Chen Z, Danilova IG, et al. Identification of miR-200c and miR141-Mediated lncRNA-mRNA Crosstalks in Muscle-Invasive Bladder Cancer Subtypes. Front Genet. 2018;9:422.
22. Twick I, Zahavi G, Benvenisti H, et al. Towards interpretable, medically grounded, EMR-based risk prediction models. Sci Rep. 2022;12(1):9990.
23. Greene SJ, Fonarow GC, Vaduganathan M, et al. The vulnerable phase after hospitalization for heart failure. Nat Rev Cardiol. 2015;12(4):220–9.
24. Chun S, Tu JV, Wijeysundera HC, et al. Lifetime analysis of hospitalizations and survival of patients newly admitted with heart failure. Circ Heart Fail. 2012;5(4):414–21.
25. Desai AS, Stevenson LW. Rehospitalization for heart failure: predict or prevent? Circulation. 2012;126(4):501–6.
26. Gheorghiade M, Pang PS, Ambrosy AP, et al. A comprehensive, longitudinal description of the in-hospital and post-discharge clinical, laboratory, and neurohormonal course of patients with heart failure who die or are re-hospitalized within 90 days: analysis from the EVEREST trial. Heart Fail Rev. 2012;17(3):485–509.
27. Tan BY, Gu JY, Wei HY, et al. Electronic medical record-based model to predict the risk of 90-day readmission for patients with heart failure. BMC Med Inform Decis Mak. 2019;19(1):193.
28. Euser AM, Zoccali C, Jager KJ, et al. Cohort studies: prospective versus retrospective. Nephron Clin Pract. 2009;113(3):c214–217.

SupplementaryTable1.xlsx
