Prediction of 30-Day In-Hospital Mortality in Elderly UGIB Patients Using a Simplified Risk Score and Comparison with AIMS65 score

doi:10.21203/rs.3.rs-3332780/v1

Background: Upper gastrointestinal bleeding (UGIB) in elderly patients is associated with substantial in-hospital morbidity and mortality. This study aimed to develop and validate a simplified risk score for predicting 30-day in-hospital mortality in this population.

Methods: A retrospective analysis was conducted on data from 1899 UGIB patients aged ≥65 years admitted to a single medical center between January 2010 and December 2019. An additional cohort of 330 patients admitted from January 2020 to October 2021 was used for external validation. Variable selection was performed using five distinct methods, and models were generated using generalized linear models, random forest, support vector machine, and k-nearest neighbors approaches. The developed score, "ABCAP," incorporated Albumin <30 g/L, Blood Urea Nitrogen (BUN) >7.5 mmol/L, Cancer presence, Altered mental status, and Pulse rate >100/min, each assigned a score of 1. Internal and external validation procedures compared the ABCAP score with the AIMS65 score.

Results: In internal validation, the ABCAP score demonstrated robust predictive capability with an area under the curve (AUC) of 0.878 (95% CI: 0.824-0.932), comparable to the AIMS65 score (AUC: 0.827, 95% CI: 0.751-0.904). External validation yielded an AUC of 0.799 (95% CI: 0.709-0.889) for the ABCAP score, slightly outperforming the AIMS65 score (AUC: 0.743, 95% CI: 0.647-0.838). The ABCAP score effectively stratified mortality risk into low (0-2 points), moderate (3 points), and high (4-5 points) categories. This score exhibited consistent accuracy across variceal and non-variceal UGIB subgroups.

Conclusions: The ABCAP score incorporates easily obtained clinical variables and demonstrates promising predictive ability for 30-day in-hospital mortality in elderly UGIB patients. It allows effective mortality risk stratification and showed slightly better performance than the AIMS65 score. Further cohort validation is required to confirm generalizability.

Elderly

UGIB

in-hospital outcome prediction

AIMS65

Upper gastrointestinal bleeding (UGIB) is a prevalent and clinically significant condition that poses a considerable burden on public health systems. In the United Kingdom, the annual incidence ranges from 103 to 172 cases per 100,000 adults, with a mortality rate of 8–14% [1]. Similarly, in the United States, UGIB accounts for over 800,000 emergency department visits annually, and approximately 50% of these cases require hospitalization [2].

The incidence of UGIB is linked to a range of factors, including aging, peptic ulcer disease, cirrhosis, alcohol abuse, Helicobacter pylori infection, stress, and the rapidly increasing use of drugs such as non-steroidal anti-inflammatory drugs (NSAIDs), corticosteroids, and anticoagulant/antiplatelet medications. Among these factors, aging plays a particularly pivotal role, increasing the prevalence of UGIB and worsening its prognosis in the elderly [3, 4]. Despite advances in prevention and treatment, such as the eradication of Helicobacter pylori, the use of proton pump inhibitors (PPIs), and improvements in endoscopic therapy, mortality among elderly Egyptian patients was nearly double that among younger patients [5]. In China, the UGIB-specific death rate was estimated to be 4–14% [6], which remains a substantial concern among elderly individuals.

To aid in the clinical management of UGIB, several prediction tools have been developed and validated. These tools, including the Glasgow Blatchford Score (GBS) [7], Rockall Score (RS) [3], and AIMS65 score [8], have demonstrated utility in predicting in-hospital mortality in patients with UGIB and have been integrated into routine clinical care. However, the complexity of the GBS and RS may limit their widespread use. Additionally, the complete version of the RS relies on endoscopic findings, and endoscopy may not be feasible for, or preferred by, all patients. Furthermore, many studies have not specifically focused on the elderly population, and the AIMS65 score, although brief, places all individuals aged 65 and above into a single risk category, which may not adequately capture the heterogeneity within this age group. Given the current rapid aging of the population, there is a pressing need to shift our focus toward the elderly and reassess the existing scoring systems, particularly within the Chinese population [9].

In light of these considerations, the primary objective of this study is to develop a predictive model for estimating 30-day in-hospital all-cause mortality among elderly patients with UGIB before endoscopy. Additionally, we aim to compare the predictive performance of this new score with that of the widely used AIMS65 score, which also does not require endoscopic evaluation and is more accessible for clinical implementation. By achieving these objectives, we strive to enhance risk stratification, improve clinical decision-making, and ultimately optimize the outcomes and overall management of elderly patients with UGIB.

2.1 Participants

The source of participants for this study is the Elderly Diseases Dataset, a well-established and continuously updated research dataset comprising individuals aged over 60 years. The dataset is derived from the Electronic Health Record of the First Medical Center of the Chinese People's Liberation Army General Hospital (PLAGH). The development dataset encompasses patients admitted between January 2010 and December 2019, as per the version available before 2022. Furthermore, an external validation dataset was used to assess the predictive score's performance beyond its development dataset. This validation dataset comprises patients admitted between January 2020 and October 2021.

Inclusion Criteria:

Age at admission of 65 years or older
Admission diagnosis of UGIB (blood loss from a gastrointestinal source above the ligament of Treitz), including diagnoses documented by physicians, admission records, and corresponding International Classification of Diseases 10th codes (ICD-10)
Typical symptoms described in patient complaints and medical history, such as "hematemesis (vomiting of fresh blood)," "coffee-ground" emesis (vomiting of dark altered blood), and/or melena
Identification by a gastroenterologist
For patients with multiple admissions, only data from the earliest hospitalization were considered.

Exclusion Criteria:

Unclassified gastrointestinal bleeding
Nonbleeding periods, such as old bleeding episodes or bleeding history
Cases with a significant (>50%) amount of missing laboratory results.

2.2 Data Collection

2.2.1 ICD-10 Codes for UGIB

The ICD-10 codes of UGIB: see the supplementary document.

2.2.2 Laboratory Test Indicators

Priority was given to data collected within 24 hours of admission, including hemoglobin (HGB, g/L), platelet count (PLT, 10^9/L), international normalized ratio (INR), albumin (g/L), blood urea nitrogen (BUN, mmol/L), creatinine (CR, μmol/L), and estimated glomerular filtration rate (eGFR), which was calculated using the following formula: eGFR (mL/min/1.73 m²) = 186 × (Scr)^-1.154 × (Age)^-0.203 × 0.742 (if female) × 1.233. Several other indicators, such as white cell count and prothrombin time, were excluded from the analysis because of multicollinearity identified during preliminary work. Indicators with more than 20% missing values were not collected.
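
As an illustration of the eGFR formula above, the following Python sketch evaluates it for a single patient. The study itself was carried out in R; the function name, the conversion of creatinine from μmol/L to mg/dL (division by 88.4), and the assumption that the formula expects creatinine in mg/dL are ours and are not stated in the text.

```python
def egfr_modified_mdrd(scr_umol_l: float, age_years: float, female: bool) -> float:
    """eGFR in mL/min/1.73 m^2 using the formula quoted in Section 2.2.2.

    Assumption (not stated in the text): serum creatinine enters the
    formula in mg/dL, so a value recorded in umol/L is divided by 88.4.
    """
    scr_mg_dl = scr_umol_l / 88.4
    egfr = 186.0 * scr_mg_dl ** -1.154 * age_years ** -0.203 * 1.233
    if female:
        # Sex factor applied to female patients only
        egfr *= 0.742
    return egfr


# Hypothetical patient: creatinine 71 umol/L, 72 years old
male_value = egfr_modified_mdrd(71.0, 72.0, female=False)
female_value = egfr_modified_mdrd(71.0, 72.0, female=True)
```

With identical creatinine and age, the female value is 0.742 times the male value, which is the only place where sex enters the formula.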

2.2.3 Vital Signs, Checkup, and Mental Status

Vital signs, including systolic blood pressure (SBP, mmHg), diastolic blood pressure (DBP, mmHg), and pulse (beats per minute), were recorded. Body mass index (BMI) was calculated as BMI = weight (kg) / height² (m²). Altered mental status was defined as a Glasgow Coma Scale score of less than 14 or a physician-charted designation of "disoriented," "lethargy," "stupor," or "coma."

2.2.4 Charlson Comorbidity Index and comorbidities

The collection of comorbidities for the Charlson Comorbidity Index (CCI) involved the utilization of ICD-10 codes, with subsequent calculation of the CCI score for each patient by summing the assigned weights of the respective comorbidities (as detailed in the supplementary document). CCI serves as an extensively utilized tool for prognosticating 10-year survival rates among patients grappling with multiple comorbid conditions [10, 11]. This index attributes specific weights to diverse comorbidities in accordance with their individual impact on prognosis. In our study, the enumeration of CCI components was instrumental in depicting the intricate landscape of health challenges faced by the elderly population.

The spectrum of collected comorbidities encompassed an array of conditions, including coronary heart disease (CAD), congestive heart failure (CHF), peripheral vascular disease (PAD), cerebrovascular disease (CVD), chronic obstructive pulmonary disease (COPD), moderate to severe kidney disease (Kidney Diseases), and liver disease (Liver Diseases). Additionally, cancers, both metastatic and nonmetastatic, as well as other conditions featured in the CCI were comprehensively incorporated. Furthermore, common geriatric comorbidities such as hypertension (HTN) and atrial fibrillation (AF) were meticulously recorded in the dataset, contributing to the comprehensive portrayal of the patients' health status.

2.2.5 Endoscopy and Outcome

Endoscopy records during hospitalization were collected. The primary outcome was defined as death from any cause occurring within 30 days of hospitalization with UGIB. We also collected data on the length of hospital stay (LOS, days).

2.3 Quality Control

Two independently trained investigators collected and analyzed data from the electronic medical records. In case of disagreement, a more senior investigator made the final determination.

2.4 Statistical, Data Handling, Training, and Evaluation Methods

2.4.1 Statistical Methods

All statistical analyses were conducted using R version 4.2.3 (R Core Team (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/). Normally distributed variables are presented as the mean ± standard deviation, while nonnormally distributed variables are described as the median (interquartile range). Group comparisons for normally distributed variables employed t tests, whereas the Kruskal-Wallis test was applied for nonnormally distributed variables. Categorical variables were compared using the chi-square test or Fisher's exact test. To assess the association of each variable with the outcome, univariable logistic regression was used.

2.4.2 Training and internal validation Dataset

The development dataset was further divided into a 70% training subset, used for variable selection and model training, and a 30% internal validation subset. The external validation dataset was used for independent model evaluation. The random partitioning of the development dataset into these subsets was carried out with the 'createDataPartition' function of the 'caret' package.
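
The 70/30 partition can be sketched in Python as follows. This is not the authors' R code: it is a minimal standard-library illustration that samples each outcome class separately, as 'createDataPartition' does for a categorical outcome. The function name, the seed, and the synthetic outcome vector (built from the class sizes reported for the development dataset) are assumptions for illustration.

```python
import random


def stratified_split(outcomes, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx), sampling each outcome class separately."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(outcomes):
        by_class.setdefault(label, []).append(idx)
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train_idx.extend(indices[:n_train])
        valid_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# Synthetic outcomes: 97 deaths (1) and 1802 survivors (0)
outcomes = [1] * 97 + [0] * 1802
train_idx, valid_idx = stratified_split(outcomes)
train_deaths = sum(outcomes[i] for i in train_idx)
```

Sampling within each class keeps the proportion of deaths nearly equal in the two subsets, which matters when the outcome is as rare as it is here.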

2.4.3 Missing value handling

In our study, the missing values, mainly attributed to the lack of testing within a specific time window, were classified as "Missing Completely at Random" (MCAR). In MCAR scenarios, the absence of data is assumed to be unbiased and unlikely to systematically affect the outcomes, which simplifies imputation and analysis. The missing values within the development dataset were handled as follows.

To ensure the integrity and reliability of the results, variables with missing data exceeding 20% were excluded from the analysis. Additionally, individual cases with missing values surpassing 50% in laboratory indicators were excluded during the data screening process. For instances where missing values were less than 20%, we utilized the 'missForest' function from the 'missForest' package in R for imputation. To enhance the reliability of the imputed data, we repeated the imputation process five times, and a statistical test was conducted to compare the imputed data with the original dataset. This comparison confirmed the absence of significant discrepancies between the imputed data and the original dataset. Detailed information can be found in the supplementary document.
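
The two exclusion thresholds described above can be sketched as follows. This Python example is illustrative only: the function names, the use of None to mark a missing value, the toy records, and the choice to compute both missingness rates on the unscreened data are assumptions, and the random-forest imputation step ('missForest') is not reproduced.

```python
def missing_fraction(values):
    """Share of entries that are None."""
    return sum(v is None for v in values) / len(values)


def screen_missing(rows, lab_columns, max_var_missing=0.20, max_case_missing=0.50):
    """Drop lab variables and cases whose missingness exceeds the thresholds."""
    kept_columns = [
        col for col in lab_columns
        if missing_fraction([row[col] for row in rows]) <= max_var_missing
    ]
    kept_rows = [
        row for row in rows
        if missing_fraction([row[col] for col in lab_columns]) <= max_case_missing
    ]
    return kept_columns, kept_rows


# Toy records (hypothetical values)
rows = [
    {"albumin": 35.0, "bun": 5.1, "inr": None},
    {"albumin": 28.0, "bun": None, "inr": None},
    {"albumin": 31.0, "bun": 7.9, "inr": 1.2},
    {"albumin": 40.0, "bun": 6.0, "inr": 1.1},
    {"albumin": 33.0, "bun": 4.4, "inr": 1.0},
]
kept_columns, kept_rows = screen_missing(rows, ["albumin", "bun", "inr"])
```

In the toy data, "inr" is missing in 40% of records and is dropped as a variable, and the second record is missing two of three laboratory values and is dropped as a case.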

2.4.4 Variable selection methods

To enhance the clinical applicability and interpretability of the predictive model, continuous variables were transformed into categorical variables using both general standard cutoff values and specific cutoff values employed in the First Medical Center of PLAGH. To identify the most relevant predictors and reduce dimensionality, we employed five variable selection methods: Stepwise by Akaike Information Criterion ('StepAIC', 'MASS' package), Least Absolute Shrinkage and Selection Operator ('LASSO', 'glmnet' package), Elastic net ('ENT', 'glmnet' package), Best subset ('BestSub', 'leaps' package), and Recursive Feature Elimination ('RFE', 'caret' package). Each selection method was applied to the five training iterations, and the variables selected across the iterations were pooled for each method. Where the number of pooled variables exceeded 10, we retained the 10 variables that appeared most frequently in the selection results.
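
The pooling rule for one selection method can be sketched as follows. This is an illustrative Python example rather than the authors' R code: the function name, the toy selections, and the tie-breaking behavior (first-seen order, as given by 'Counter.most_common') are assumptions, since the text does not say how ties at the cutoff were resolved.

```python
from collections import Counter


def most_frequent_variables(selections, limit=10):
    """Pool the selections of repeated runs; cap at the `limit` most frequent."""
    # Count each variable at most once per run
    counts = Counter(var for run in selections for var in set(run))
    if len(counts) <= limit:
        return sorted(counts)
    return [var for var, _ in counts.most_common(limit)]


# Toy example with three runs and a cap of two variables
runs = [["Albumin", "BUN", "INR"], ["Albumin", "BUN"], ["Albumin", "SBP"]]
top_two = most_frequent_variables(runs, limit=2)
```

Here "Albumin" appears in three runs and "BUN" in two, so they are retained, while "INR" and "SBP" (one run each) are dropped.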

2.4.5 Model Training Methods and Evaluation in Internal and External Evaluation

In this study, we employed four model training methods ('train' function of the 'caret' package): generalized linear models (GLM), k-nearest neighbors (KNN), support vector machine (SVM), and random forest (RF). These methods were selected because they capture relationships between predictors and the outcome in different ways. The models were trained on the five training datasets using the variable subsets obtained from the selection process, and the trained models were then evaluated on the corresponding five internal validation datasets, allowing performance to be assessed across different data partitions. Specificity, sensitivity, accuracy, F1 score, and area under the curve (AUC) were calculated for each iteration on the internal validation data. To provide a more reliable estimate of overall performance, the mean value of each metric across the five internal validation iterations was calculated.

Once the optimal model was identified and the scoring system was established, we employed AUC as the metric to validate its performance. This validation encompassed a comparison of the model's performance with the established AIMS65 score, across both internal and external validation phases.
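
For reference, the AUC used throughout can be computed directly from scores and outcomes. The Python sketch below uses the pairwise (Mann-Whitney) formulation with ties counted as one half; it is an illustration with toy data, not the routine used in the study, and it does not reproduce the confidence intervals, whose method is not described in the text.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: integer risk scores and outcomes (1 = death)
toy_scores = [0, 1, 1, 2, 3, 4]
toy_labels = [0, 0, 1, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
```

Because an integer score such as ABCAP or AIMS65 produces many ties, the half-credit rule for ties has a visible effect on the resulting AUC.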

3.1 Characteristics of Participants in the Development Dataset

A total of 1899 distinct patients diagnosed with UGIB were included in the development dataset and subsequently divided into training and internal validation subsets, after applying the inclusion and exclusion criteria detailed in the Methods section. An external validation dataset of 330 patients was defined using the same criteria. A flow diagram of the participant selection process is presented in Figure 1. The median age of the cohort was 72 years (range, 65 to 102 years). During the 30-day follow-up period, 97 patients died, and an additional 21 patients died after the 30-day period.

Within the development dataset, a comparative analysis was conducted to investigate potential differences between patients who died within 30 days and those who survived. Demographic characteristics, comorbidities, and laboratory results were compared and summarized in Table 1. The differences between the groups were assessed using appropriate statistical tests, including Pearson's chi-squared test, Fisher's exact test, and the Wilcoxon rank sum test. Furthermore, univariate logistic regression analysis was conducted to examine the association between each characteristic and the outcome. Table 1 presents the results of these analyses. Importantly, all the data in Table 1 are original and have not undergone imputation.

Figure 1 Flow diagram of participants in study

In Table 1, it is noteworthy that the "Alive" group exhibits a notably higher ratio of in-hospital endoscopy, variceal cases, and alcohol use compared to the "Death" group, signifying a statistically significant association. Conversely, the "Death" group demonstrated an elevated mean age, increased CCI value, and a greater frequency of altered mental status. Additionally, the "Death" group shows a comparatively shorter LOS compared to the "Alive" group.

Table 1 Baseline Characteristics of Patients in the Development Dataset

Characteristic¹	Alive at 30 d, N = 1,802	Death within 30 d, N = 97	p-value²	Univariable OR(95%CI)	Univariable p-value
AGE (years), Median (IQR³)	72 (68, 78)	78 (71, 83)	<0.001	1.08(1.05,1.11)	<0.001
Female (n, %)	729 (40%)	37 (38%)	0.651	1.1(0.73,1.69)	0.651
In-hospital Endoscopy (n, %)	1,324 (73%)	38 (39%)	<0.001	0.23(0.15,0.35)	<0.001
Altered Mental Status (n, %)	93 (5.5%)	34 (41%)	<0.001	12.13(7.42,19.69)	<0.001
Smoke History (n, %)	421 (25%)	23 (28%)	0.532	1.17(0.7,1.89)	0.533
Alcohol Use (n, %)	453 (25%)	15 (15%)	0.031	0.54(0.3,0.93)	0.034
Comorbidities (n, %)
Variceal	601 (33%)	21 (22%)	0.017	0.55(0.33,0.89)	0.018
Peptic Ulcer	353 (20%)	21 (22%)	0.619	1.13(0.67,1.83)	0.619
ICH	33 (1.8%)	5 (5.2%)	0.041	2.91(0.98,7.01)	0.030
CAD	89 (4.9%)	12 (12%)	0.001	2.72(1.37,4.98)	0.002
Liver Diseases	666 (37%)	36 (37%)	0.976	1.01(0.65,1.53)	0.976
Diabetes	405 (22%)	23 (24%)	0.777	1.07(0.65,1.71)	0.777
Cancer	313 (17%)	39 (40%)	<0.001	3.2(2.08,4.87)	<0.001
HTN	645 (36%)	49 (51%)	0.003	1.83(1.21,2.76)	0.004
AF	85 (4.7%)	5 (5.2%)	0.805	1.1(0.38,2.52)	0.843
CVD	236 (13%)	23 (24%)	0.003	2.06(1.24,3.31)	0.004
CHF	66 (3.7%)	14 (14%)	<0.001	4.44(2.31,8.01)	<0.001
COPD	26 (1.4%)	3 (3.1%)	0.182	2.18(0.51,6.34)	0.208
Kidney Diseases	56 (3.1%)	13 (13%)	0.001	4.83(2.44,8.92)	<0.001
CCI, Median (IQR)	5 (4, 6)	7 (5, 9)	<0.001	1.44(1.32,1.57)	<0.001
LOS (days), Median (IQR)	14 (9, 21)	5 (2, 14)	<0.001
Physical & Lab Examination, Median (IQR)
BMI (kg/m²)	23.4 (21.0, 25.8)	21.8 (19.5, 24.4)	<0.001	0.9(0.84,0.96)	0.001
DBP (mmHg)	71 (65, 80)	65 (58, 72)	<0.001	0.95(0.93,0.97)	<0.001
SBP (mmHg)	130 (118, 140)	117 (100, 136)	<0.001	0.97(0.96,0.98)	<0.001
Pulse (per minute)	78 (72, 83)	87 (78, 104)	<0.001	1.06(1.05,1.07)	<0.001
PLT (10^9/L)	128 (46, 211)	89 (53, 145)	0.040	1(0.99,1)	0.049
HGB (g/L)	96 (80, 117)	85 (73, 105)	0.003	0.98(0.97,0.99)	0.001
Albumin (g/L)	34.9 (30.7, 39.0)	27.4 (22.0, 31.9)	<0.001	0.83(0.79,0.86)	<0.001
BUN (mmol/L)	5.5 (4.2, 7.6)	10.7 (6.9, 18.3)	<0.001	1.12(1.09,1.15)	<0.001
INR	1.14 (1.05, 1.28)	1.32 (1.20, 1.62)	<0.001	2.09(1.43,3.04)	<0.001
CR (μmol/L)	71 (59, 86)	81 (64, 125)	0.003	1.00(1.00,1.01)	<0.001
eGFR (ml/min/1.73m²)	110 (91, 132)	95 (63, 133)	0.002	0.99(0.98,0.99)	0.001

¹ n (%); before imputation

² Pearson's Chi-squared test; Fisher's exact test; Wilcoxon rank sum test

³ IQR = interquartile range

Table 2: Variable Selection Results by Different Methods in the Training Dataset

Variable	Categorical	StepAIC	LASSO	ENT	RFE	BestSub
AGE	75years					✓
Albumin	30 g/L	✓	✓	✓	✓	✓
BUN	7.5 mmol/L	✓	✓	✓	✓	✓
Altered Mental	Yes/No	✓	✓	✓	✓	✓
eGFR	<30~90<				✓
CHF	Yes/No	✓		✓		✓
HGB	120 g/L(Male); 110 g/L(Female)	✓
ICH	Yes/No	✓
INR	1.5	✓		✓	✓	✓
Liver Diseases	Yes/No				✓
Peptic Ulcer	Yes/No				✓
Pulse	100/min	✓	✓	✓	✓	✓
SBP	90 mmHg		✓			✓
Cancer	Yes/No	✓	✓	✓	✓	✓
Variceal	Yes/No	✓			✓	✓
No.	15	10	6	7	10	10

3.2 Variables Subset selection

Following imputation and dataset splitting, variable selection was carried out with five distinct methods on the training dataset. The outcomes of these methods differed, as shown in Table 2. In total, 15 variables were selected by the five methods. StepAIC, RFE, and BestSub each identified more than 10 features, so only their 10 most frequently occurring features are included in Table 2, whereas LASSO and ENT each produced a relatively stable result. INR was selected by four of the methods, CHF and Variceal were each chosen by three methods, and SBP by two. AGE, eGFR, HGB, ICH, Liver Diseases, and Peptic Ulcer were each selected by a single method. Five variables, "Albumin", "BUN", "Cancer", "Altered Mental", and "Pulse", were selected by all five methods, and we grouped them into an additional subset called "ABCAP".
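
The derivation of the ABCAP subset can be checked directly from Table 2. The Python sketch below transcribes the check marks of Table 2 into sets and takes their intersection; the dictionary layout and variable names are ours, and the snippet is an illustration rather than part of the original analysis.

```python
from collections import Counter

# Selections transcribed from Table 2 (one set per selection method)
selected = {
    "StepAIC": {"Albumin", "BUN", "Altered Mental", "CHF", "HGB", "ICH",
                "INR", "Pulse", "Cancer", "Variceal"},
    "LASSO": {"Albumin", "BUN", "Altered Mental", "Pulse", "SBP", "Cancer"},
    "ENT": {"Albumin", "BUN", "Altered Mental", "CHF", "INR", "Pulse", "Cancer"},
    "RFE": {"Albumin", "BUN", "Altered Mental", "eGFR", "INR", "Liver Diseases",
            "Peptic Ulcer", "Pulse", "Cancer", "Variceal"},
    "BestSub": {"AGE", "Albumin", "BUN", "Altered Mental", "CHF", "INR",
                "Pulse", "SBP", "Cancer", "Variceal"},
}

# Variables chosen by every method form the ABCAP subset
common = set.intersection(*selected.values())

# Number of methods that chose each variable
votes = Counter(var for chosen in selected.values() for var in chosen)
```

The intersection contains exactly the five ABCAP variables, and the vote counts match the text: INR is chosen by four methods, CHF and Variceal by three, and SBP by two.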

Table 3: Model performance in internal validation

	Accuracy	Sensitivity	Specificity	F1 Score	AUC
KNN+StepAIC	0.951±0.002	0.953±0.002	0.71±0.213	0.975±0.001	0.872 (0.8, 0.943)
KNN+LASSO	0.953±0.003	0.956±0.001	0.69±0.192	0.976±0.001	0.813 (0.716, 0.909)
KNN+ENT	0.951±0.001	0.955±0.003	0.7±0.184	0.975±0	0.825 (0.736, 0.915)
KNN+BestSub	0.951±0.002	0.953±0.001	0.72±0.189	0.975±0.001	0.863 (0.782, 0.943)
KNN+RFE	0.951±0.005	0.953±0.004	NA±NA	0.975±0.002	0.833 (0.745, 0.921)
RF+StepAIC	0.949±0.004	0.954±0.003	0.659±0.317	0.974±0.002	0.8 (0.707, 0.893)
RF+LASSO	0.951±0.002	0.959±0.003	0.56±0.034	0.975±0.001	0.769 (0.676, 0.862)
RF+ENT	0.95±0.002	0.953±0.002	0.55±0.17	0.974±0.001	0.806 (0.717, 0.896)
RF+BestSub	0.951±0.004	0.957±0.001	0.578±0.132	0.975±0.002	0.808 (0.718, 0.897)
RF+RFE	0.953±0.004	0.956±0.002	0.745±0.228	0.976±0.002	0.84 (0.759, 0.92)
SVM+StepAIC	0.947±0.002	0.953±0.002	0.393±0.068	0.973±0.001	0.76 (0.66, 0.859)
SVM+LASSO	0.95±0.003	0.956±0.005	0.495±0.113	0.974±0.002	0.67 (0.552, 0.787)
SVM+ENT	0.948±0.003	0.953±0.002	0.438±0.133	0.973±0.002	0.665 (0.55, 0.779)
SVM+BestSub	0.949±0.005	0.954±0.004	0.544±0.29	0.974±0.002	0.711 (0.589, 0.832)
SVM+RFE	0.951±0.004	0.955±0.001	0.689±0.289	0.975±0.002	0.695 (0.57, 0.821)
GLM+StepAIC	0.956±0.006	0.963±0.006	0.65±0.128	0.977±0.003	0.884 (0.822, 0.946)
GLM+LASSO	0.955±0.006	0.961±0.004	0.675±0.165	0.977±0.003	0.88 (0.818, 0.943)
GLM+ENT	0.955±0.006	0.962±0.004	0.645±0.157	0.977±0.003	0.878 (0.813, 0.943)
GLM+BestSub	0.956±0.008	0.963±0.006	0.67±0.173	0.977±0.004	0.89 (0.831, 0.949)
GLM+RFE	0.956±0.007	0.963±0.004	0.669±0.158	0.977±0.003	0.882 (0.819, 0.945)
GLM+ABCAP	0.957±0.007	0.962±0.005	0.691±0.157	0.978±0.003	0.879 (0.818, 0.939)

3.3 Performance of Models Combined with Variable Subsets in Internal Validation.

Table 3 presents the internal validation results for each training method combined with each variable subset. For instance, KNN+StepAIC denotes a KNN model trained on the variable subset selected by StepAIC. All combinations achieved accuracy, sensitivity, and F1 scores above 0.94. Combinations involving RF, KNN, and SVM had difficulty correctly identifying negative instances, which lowered their AUC values, especially for SVM. The GLM models consistently outperformed the other methods. Among them, GLM+BestSub achieved the highest AUC (0.89, 95% CI: 0.831, 0.949), with the other GLM combinations close behind. GLM+ABCAP, which includes only five variables, reached an AUC of 0.879 (95% CI: 0.818, 0.939), slightly below that of GLM+BestSub.

Figure 2 displays the receiver operating characteristic (ROC) curves for each combination with the highest AUC, including the GLM+ ABCAP combination. While GLM+BestSub emerged as the top performer in terms of AUC, it's essential to consider the balance between model complexity and performance. Notably, the GLM+ABCAP combination offers a more parsimonious model by utilizing only five variables, compared to the ten variables used by GLM+BestSub. This suggests that the GLM+ABCAP combination may be a more appropriate choice when seeking a simpler and more interpretable model, without a significant sacrifice in predictive performance.

Figure 2 Maximum AUC of each training method in internal validation

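3.4 Generalized Linear Model Equation and ABCAP score

After weighing predictive performance against model simplicity, we chose the GLM+ABCAP combination as our final model. The resulting GLM equation, fitted on the training dataset, takes the following form, with the intercept given in the text below and the coefficients listed in Table 4:

logit(P(death)) = -3.71 + 1.60 × Cancer + 1.59 × Altered Mental + 1.40 × Pulse (>100/min) - 1.39 × Albumin (≥30 g/L) + 1.35 × BUN (>7.5 mmol/L)

where each predictor equals 1 if the condition is present and 0 otherwise.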

In the GLM equation, each variable is represented by a coefficient value, indicating its contribution to the log-odds of the outcome (death). A positive coefficient suggests a positive association with the outcome, while a negative coefficient suggests a negative association. By plugging in the respective values of the variables, the equation allows for the estimation of the probability of the outcome being death.

The equation and Table 4 reveal important associations between the variables and the outcome. The intercept term of -3.71 represents the log-odds of the outcome when all predictor variables are at their reference levels or baseline values. Among the five variables integrated into the model, Cancer, Altered Mental, Pulse (>100/min), and BUN (>7.5 mmol/L) exhibit positive coefficients, indicating an elevated risk of the outcome. Conversely, Albumin (≥30 g/L) has a negative coefficient, suggesting a protective effect on prognosis, while Albumin (<30 g/L) has the opposite effect. The odds ratios further quantify these associations. Odds ratios exceeding 1 signify an increased risk, whereas odds ratios below 1 indicate a decreased risk.
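
To make the use of the fitted model concrete, the Python sketch below converts the linear predictor into a predicted probability with the inverse-logit function. The intercept (-3.71) and coefficients are those reported in the text and Table 4; the dictionary keys, the function name, and the two example patients are hypothetical.

```python
import math

INTERCEPT = -3.71
# Coefficients as reported in Table 4 (each predictor coded 0 or 1)
COEFFICIENTS = {
    "cancer": 1.60,
    "altered_mental": 1.59,
    "pulse_gt_100": 1.40,
    "albumin_ge_30": -1.39,
    "bun_gt_7_5": 1.35,
}


def predicted_death_probability(patient):
    """Inverse-logit of the linear predictor.

    `patient` maps predictor names to 0/1; absent predictors count as 0.
    """
    log_odds = INTERCEPT + sum(
        beta * patient.get(name, 0) for name, beta in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical patients at the two extremes of the model
lowest_risk = predicted_death_probability({"albumin_ge_30": 1})
highest_risk = predicted_death_probability(
    {"cancer": 1, "altered_mental": 1, "pulse_gt_100": 1, "bun_gt_7_5": 1}
)
```

The first patient has a linear predictor of -3.71 - 1.39 = -5.10 and the second of -3.71 + 1.60 + 1.59 + 1.40 + 1.35 = 2.23, which bound the range of probabilities the model can produce.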

Table 4: Generalized Linear Models With ABCAP variables in the Training Dataset

Variable	β-coefficient	OR	95%CI	p-value
Cancer	1.60	4.943	2.671-9.154	<0.001
Altered Mental Status	1.59	5.276	2.679-10.394	<0.001
Pulse >100/min	1.40	4.151	2.086-8.263	<0.001
Albumin ≥30 g/L	-1.39	0.254	0.138-0.465	<0.001
BUN >7.5 mmol/L	1.35	3.894	2.044-7.422	<0.001

Table 5: Features and points of the ABCAP score and AIMS65 score

	Variable	Parameter	Score
ABCAP score	Albumin	<30 g/L	1
	BUN	>7.5 mmol/L	1
	Cancer	Yes	1
	Altered Mental	Yes	1
	Pulse	>100/min	1
AIMS65 score	Albumin	<30 g/L	1
	INR	>1.5	1
	Mental status	Altered	1
	SBP	≤90 mmHg	1
	Age	≥65 years	1

The ABCAP score, derived from the GLM equation mentioned earlier, simplifies calculations by assigning one point to each of the five predictors. In parallel, the AIMS65 score—a validated scoring system for predicting in-hospital mortality in patients with UGIB—also incorporates five variables, with each variable assigned a score of 1 point[8].

Table 5 compares the variables used by the two scoring systems to assess severity and predict in-hospital mortality in patients with UGIB. The ABCAP score includes Cancer, Altered Mental Status, Pulse > 100/min, Albumin < 30 g/L, and BUN > 7.5 mmol/L. In contrast, the AIMS65 score involves Albumin < 30 g/L, INR > 1.5, Altered Mental Status, SBP ≤ 90 mmHg, and Age ≥ 65 years. Both scores incorporate Albumin and Altered Mental Status as indicators of disease severity. The ABCAP score differs from the AIMS65 score in that it does not use Age, INR, or SBP as scoring criteria.
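
Both scores in Table 5 reduce to counting how many criteria a patient meets. The Python sketch below implements the two counts using the thresholds listed in Table 5; the function names, argument names, and the example patient are hypothetical.

```python
def abcap_score(albumin_g_l, bun_mmol_l, cancer, altered_mental, pulse_per_min):
    """One point per ABCAP criterion in Table 5 (range 0-5)."""
    return sum([
        albumin_g_l < 30,
        bun_mmol_l > 7.5,
        bool(cancer),
        bool(altered_mental),
        pulse_per_min > 100,
    ])


def aims65_score(albumin_g_l, inr, altered_mental, sbp_mmhg, age_years):
    """One point per AIMS65 criterion in Table 5 (range 0-5)."""
    return sum([
        albumin_g_l < 30,
        inr > 1.5,
        bool(altered_mental),
        sbp_mmhg <= 90,
        age_years >= 65,
    ])


# Hypothetical patient: low albumin, raised BUN, tachycardia, no cancer,
# normal mental status, INR 1.2, SBP 110 mmHg, age 78
example_abcap = abcap_score(28.0, 9.0, False, False, 105)
example_aims65 = aims65_score(28.0, 1.2, False, 110, 78)
```

For this patient the ABCAP score is 3 (albumin, BUN, pulse) and the AIMS65 score is 2 (albumin, age), which shows how the two scores can diverge for the same presentation.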

3.5 Comparison of ABCAP score with AIMS65 score in Internal and External validation

Table 6: Characteristics of External Validation Dataset

Characteristics	Summary	Alive (N=298)	Death (N=32)	p
Altered Mental	Yes, n (%)	23 (7.7%)	9 (28.1%)	0.001
Cancer	Yes, n (%)	103 (34.6%)	20 (62.5%)	0.004
SBP (mmHg)	Mean ± SD	130.9 ± 20.8	107.4 ± 20.6	<0.001
	<90 mmHg, n (%)	3 (1%)	7 (21.9%)
Pulse (per minute)	Median (IQR)	78.0 (72.0,90.0)	93.0 (77.5,109.0)	0.001
	>100/min, n (%)	34 (11.4%)	13 (40.6%)
Albumin (g/L)	Mean ± SD	34.3 ± 5.1	29.4 ± 6.2	<0.001
	<30 g/L, n (%)	61 (20.5%)	17 (53.1%)	<0.001
BUN (mmol/L)	Median (IQR)	7.2 (4.7,10.8)	10.8 (6.8,21.2)	0.002
	>7.5 mmol/L, n (%)	138 (46.3%)	22 (68.8%)
INR	Median (IQR)	1.1 (1.1,1.3)	1.4 (1.2,1.6)	<0.001
	>1.5, n (%)	37 (12.8%)	12 (37.5%)
ABCAP score	Median (IQR)	1(1,2)	2(2.5,3.25)	<0.001
AIMS65 score	Median (IQR)	1(1,2)	2(1,3)	<0.001

Figure 3 Performance of the AIMS65 score, ABCAP score, and original equation in internal and external validation

Table 6 presents the characteristics of the external validation dataset. All variables included in both scoring systems differed significantly between patients who survived and those who did not. Figure 3 shows the ROC curves of the original equation, the AIMS65 score, and the ABCAP score in both internal and external validation. In internal validation, the original equation achieved an AUC of 0.886 (95% CI: 0.832-0.940), the ABCAP score an AUC of 0.878 (95% CI: 0.824-0.932), and the AIMS65 score an AUC of 0.827 (95% CI: 0.751-0.904).

In external validation, the predictive power of both scores decreased: the ABCAP score yielded an AUC of 0.799 (95% CI: 0.709-0.889) and the AIMS65 score an AUC of 0.743 (95% CI: 0.647-0.838). Despite this decline, the ABCAP score retained a numerically higher AUC than the AIMS65 score in our patients, although the confidence intervals overlap.

3.6 ABCAP score Performance in the Variceal and Nonvariceal Groups

Figure 4 ABCAP score distribution and performance in the Variceal and Nonvariceal groups of the Development Dataset

The distribution of the ABCAP score at different score levels and its performance within the variceal (622 patients) and nonvariceal (1277 patients) groups within the Development Dataset are visualized in Figure 4. Notably, the number of patients in the nonvariceal group was nearly double that in the variceal group. The distribution of ABCAP scores among different levels within each group correlates with the proportion of individuals in that group.

To assess the predictive performance of the ABCAP score, we utilized the AUC values for both the variceal and nonvariceal groups. In the variceal group, the calculated AUC was 0.881 (95% CI: 0.805-0.958), signifying a strong level of predictive accuracy. Similarly, within the nonvariceal group, the AUC was measured at 0.873 (95% CI: 0.834-0.912). A statistical analysis with a P value of 0.853 indicates no significant difference in the performances of the ABCAP score between these two groups.

3.7 ABCAP score level metrics and risk stratification

Table 7 Score and death counts of ABCAP and AIMS65 in the Development dataset

Score level	ABCAP counts	ABCAP deaths	AIMS65 counts	AIMS65 deaths	Same-score patient counts
0	909	2(0.2%)	0	0(NaN)	0
1	635	17(2.7%)	1338	15(1.1%)	421
2	238	31(13.0%)	396	35(8.8%)	141
Cumulative result	1782	50(2.8%)	1734	50(2.9%)	562
3	79	24(30.4%)	117	22(18.8%)	32
4	33	19(57.6%)	44	22(50.0%)	20
5	5	4(80.0%)	4	3(75.0%)	0
Cumulative result	117	47(40.2%)	165	47(28.5%)	52

Table 7 lists the patient and death counts in the development dataset at each score level for both the ABCAP and AIMS65 scores. The distribution of scores differs between the systems: most patients fall at 0 or 1 point under ABCAP and at 1 or 2 points under AIMS65, in which no patient scores 0 because every patient meets the age criterion. The cumulative patient counts for low scores (0 to 2 points) and high scores (3 to 5 points) are similar for the two systems. The cumulative death ratios, however, show distinct patterns. At low scores, mortality under ABCAP (2.8%) is nearly identical to that under AIMS65 (2.9%). At high scores, ABCAP is associated with a higher mortality ratio than AIMS65 (40.2% vs 28.5%).

Of particular interest is the "Same score patients counts" column, which reveals the count of patients who received identical scores in both the ABCAP and AIMS65 systems. Remarkably, the proportion of patients displaying the same score was consistently less than 50% across all score levels in the AIMS65 system.
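The cumulative figures discussed above can be re-derived directly from the per-level counts in Table 7. The following is a minimal sketch with the counts transcribed by hand; the dictionary layout and the helper name band_mortality are illustrative choices, not part of the original analysis.

```python
# Counts transcribed from Table 7 (development dataset): score -> (patients, deaths)
ABCAP = {0: (909, 2), 1: (635, 17), 2: (238, 31), 3: (79, 24), 4: (33, 19), 5: (5, 4)}
AIMS65 = {0: (0, 0), 1: (1338, 15), 2: (396, 35), 3: (117, 22), 4: (44, 22), 5: (4, 3)}


def band_mortality(table, scores):
    """Return (patients, deaths, mortality %) pooled over the given score levels."""
    patients = sum(table[s][0] for s in scores)
    deaths = sum(table[s][1] for s in scores)
    rate = 100 * deaths / patients if patients else float("nan")
    return patients, deaths, rate


for name, table in (("ABCAP", ABCAP), ("AIMS65", AIMS65)):
    for label, scores in (("0-2", range(0, 3)), ("3-5", range(3, 6))):
        n, d, rate = band_mortality(table, scores)
        print(f"{name} {label}: {d}/{n} deaths ({rate:.1f}%)")
```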

Table 8 Cumulative mean statistical metrics of the ABCAP score in the total dataset

Score	Sens	Spec	PPV	NPV	Death Patients	All Patients	Mortality%	Score Level Mortality%	PLR
≥0	1.000	0.000	0.051	NaN	97	1899	5.11	0.2	1
≥1	0.965	0.492	0.093	0.996	94	1010	9.27	2.7	1.898
≥2	0.784	0.837	0.206	0.986	76	369	20.58	13.0	4.815
≥3	0.476	0.961	0.397	0.972	46	116	39.69	30.4	12.232
≥4	0.208	0.995	0.677	0.959	20	30	67.73	57.6	40.296
5	0.041	0.999	0.797	0.951	4	5	79.67	80.0	74.309

Sens: sensitivity; Spec: specificity; PPV: positive predictive value; NPV: negative predictive value; PLR: positive likelihood ratio.

Table 8 provides the cumulative mean values of the statistical metrics for each score level and the corresponding mortality rates obtained from the five imputation datasets using the ABCAP score. As score levels increase, an evident trend emerges. Sensitivity experiences a decline, while specificity follows an opposing trajectory. The Positive predictive value (PPV) exhibits an ascending pattern, whereas the negative predictive value (NPV) displays an inverse association with higher score levels.

In terms of mortality rates, score levels of ≥1 and ≥2 were associated with mortality rates of 9.27% and 20.58%, respectively. The positive likelihood ratio (PLR) values for these score levels were 1.898 and 4.815. For score level ≥3, the mortality rate further increased to 39.69%, accompanied by a PLR of 12.232. At score levels ≥4 and 5, mortality rates rose markedly to 67.73% and 79.67%. The corresponding PLR values also increased substantially, reaching 40.296 and 74.309, respectively.
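To make the threshold metrics concrete, the sketch below computes sensitivity, specificity, PPV, NPV, and PLR for an "ABCAP score ≥ cutoff" rule. It assumes the raw per-level counts from Table 7 rather than the means over the five imputation datasets, so its results are close to, but not identical with, the values in Table 8. The function name threshold_metrics is illustrative.

```python
# Counts transcribed from Table 7 (development dataset): score -> (patients, deaths)
ABCAP = {0: (909, 2), 1: (635, 17), 2: (238, 31), 3: (79, 24), 4: (33, 19), 5: (5, 4)}


def threshold_metrics(table, cutoff):
    """Diagnostic metrics when 'score >= cutoff' is treated as a positive test."""
    tp = sum(d for s, (n, d) in table.items() if s >= cutoff)      # deaths flagged
    fp = sum(n - d for s, (n, d) in table.items() if s >= cutoff)  # survivors flagged
    fn = sum(d for s, (n, d) in table.items() if s < cutoff)       # deaths missed
    tn = sum(n - d for s, (n, d) in table.items() if s < cutoff)   # survivors not flagged
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sens": sens,
        "spec": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
        # PLR = sensitivity / (1 - specificity)
        "plr": sens / (1 - spec) if spec < 1 else float("inf"),
    }


m = threshold_metrics(ABCAP, 3)
print({k: round(v, 3) for k, v in m.items()})
```

At a cutoff of 3, these raw counts give a sensitivity near 0.48 and a PLR near 12.5, compared with 0.476 and 12.232 in Table 8.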

Considering the distribution of ABCAP scores and the observed metrics at each score level, we can classify scores ranging from 0 to 2 as indicative of low risk, a score of 3 as signifying moderate risk, and scores ranging from 4 to 5 as indicative of high risk.

Despite significant advancements in the prevention and treatment of UGIB, the prognosis for elderly patients remains a challenge during hospitalization. Interestingly, among the 1899 patients included in our development dataset, those who did not survive exhibited a lower rate of in-hospital endoscopy compared to the surviving group, despite the majority of patients undergoing such procedures as part of their medical care. The reasons behind this discrepancy could include safety concerns, patient preferences, and practical considerations, all of which can hinder the accessibility and suitability of endoscopic interventions, particularly among the elderly population [12]. Consequently, the development of a streamlined scoring system capable of swiftly assessing prognosis in elderly patients prior to undergoing endoscopy is of paramount importance.

While established scoring systems such as the GBS, RS, and AIMS65 scores have undergone extensive validation and implementation for patient triage in clinical settings, it is crucial to acknowledge that the severity of various acute and chronic conditions might differ in elderly individuals compared to their younger counterparts. Thus, it becomes imperative to explore specific risk factors that address the distinct challenges encountered by this particular patient demographic. Identifying high-risk individuals in a timely manner is pivotal for effective resource allocation, enabling prompt endoscopic or surgical interventions. Furthermore, managing the prognosis of low-risk patients who require antithrombotic therapy, such as aspirin for secondary prevention, is a critical aspect in treatment decision-making [13].

Differing from the focus of AIMS65, RS, and GBS, which predominantly address overall in-hospital mortality, our study delved into a more specific aspect: the 30-day in-hospital mortality rate. Significantly, within our development dataset, this rate was measured at 5.1%, a value notably higher than the corresponding 30-day mortality rate of 1.37% observed in the broader original population of 253,947 patients (3485 cases). This particular timeframe allows for the assessment of the near-term risk of death following UGIB, capturing the most relevant outcomes within the hospital stay. Our concentration on the 30-day in-hospital death outcome permits an evaluation of the effectiveness of interventions and risk prediction models in mitigating mortality within a critical window. Such an approach yields valuable insights for clinicians, aiding them in informed decision-making and the prioritization of suitable interventions during the acute phase of care. Furthermore, by limiting the outcome to in-hospital death, potential biases and confounding factors associated with long-term follow-up are circumvented. Factors like patient adherence, access to healthcare resources, and shifts in treatment strategies over time could influence outcomes beyond the immediate hospitalization period.

To overcome the above limitations and predict short-term outcome, we developed the ABCAP score, a simplified scoring system specifically designed for elderly patients with UGIB. This scoring system incorporates key variables, including the presence of cancers, mental status alterations, elevated heart rate, low albumin levels, and increased blood urea nitrogen. Each variable is assigned a score of 1, resulting in a concise and practical tool for risk stratification for 30-day in-hospital death.
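The scoring rule can be sketched as a simple function. The cutoffs below are the ones given in the abstract (Albumin <30 g/L, BUN >7.5 mmol/L, Pulse >100/min); the Patient class and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Field names are illustrative, not taken from the study database
    albumin_g_per_l: float
    bun_mmol_per_l: float
    has_cancer: bool
    altered_mental_status: bool
    pulse_per_min: float


def abcap_score(p: Patient) -> int:
    """Sum one point for each ABCAP criterion that is met (range 0 to 5)."""
    criteria = [
        p.albumin_g_per_l < 30,    # A: Albumin < 30 g/L
        p.bun_mmol_per_l > 7.5,    # B: BUN > 7.5 mmol/L
        p.has_cancer,              # C: Cancer present
        p.altered_mental_status,   # A: Altered mental status
        p.pulse_per_min > 100,     # P: Pulse > 100/min
    ]
    return sum(criteria)


example = Patient(albumin_g_per_l=28, bun_mmol_per_l=9.0, has_cancer=False,
                  altered_mental_status=False, pulse_per_min=110)
print(abcap_score(example))  # low albumin + high BUN + fast pulse
```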

Previous studies have primarily relied on traditional multivariable analysis or stepwise methods to identify risk factors in elderly populations with UGIB. Furthermore, some studies have explored the use of compound features, such as the shock index [14] and the blood urea nitrogen to serum albumin ratio [15], to improve conciseness and prediction accuracy. However, it is important to note that not all of these newly identified risk factors have proven to be effective predictors [14]. Therefore, in our study, we focused on incorporating key individual variables that are widely available and easy to measure. We employed a combination of traditional methods and innovative techniques, such as StepAIC, LASSO, ENT, RFE, and Best subset selection, to facilitate the selection of key variables during the training and prediction modeling process. For example, the LASSO method has distinct advantages for variable selection in predictive modeling, particularly in complex datasets with interrelated factors, commonly found in studies involving elderly populations with multiple comorbidities [16, 17]. It efficiently handles high-dimensional data by shrinking irrelevant variables' regression coefficients to zero, enabling automatic variable selection. This regularization penalty promotes sparsity in the model, resulting in a concise set of predictors with strong predictive power, preventing overfitting and improving generalizability. LASSO has been successfully applied in various studies, including predicting in-hospital mortality risk for elderly patients undergoing cardiac valvular surgery and predicting mortality in elderly patients after hip fractures [18, 19].
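The way the LASSO penalty shrinks coefficients exactly to zero can be illustrated with the soft-thresholding operator, which gives the LASSO solution in closed form in the special case of an orthonormal design matrix. The sketch below is a toy illustration under that assumption, not the procedure used in this study; the coefficient values, variable names, and penalty are hypothetical.

```python
def soft_threshold(beta, lam):
    """Shrink a coefficient toward zero by lam; set it to zero if |beta| <= lam."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


# Hypothetical unpenalized coefficients (illustrative values only)
ols_coefficients = {"albumin_low": 1.40, "bun_high": 0.95, "bmi": 0.08, "sbp": -0.05}
lam = 0.10  # hypothetical penalty strength

lasso_coefficients = {name: soft_threshold(b, lam) for name, b in ols_coefficients.items()}
selected = [name for name, b in lasso_coefficients.items() if b != 0.0]
print(selected)
```

Coefficients whose magnitude falls below the penalty are removed entirely, which is what makes the method act as an automatic variable selector.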

Each variable selection method has its own set of advantages and limitations. While StepAIC can yield promising models, it may sometimes result in the selection of an overly expansive feature subset. In contrast, Best Subset selection exhaustively explores all potential predictor combinations, ultimately identifying the subset that optimally fits the model. This approach provides a comprehensive tool for pinpointing the most favorable set of predictors, although it is worth noting that its exhaustive nature can be computationally intensive and might lead to overfitting when the number of predictors is large. Across five iterations, it was clear that specific methodologies, especially StepAIC, BestSub, and RFE, led to the selection of over 10 variables. However, the consistency and stability of selections from these methods varied across iterations, highlighting the potential for overfitting within each iteration. Notably, LASSO consistently chose 6 variables, and ENT consistently chose 7 variables, indicating significant alignment in their selections.

In total, 15 variables emerged from the selection process, each potentially associated with in-hospital death. This encompasses variables selected by specific methods, such as Age, eGFR, HGB, ICH, Liver Diseases, and Peptic Ulcer. It is noteworthy that Age, Liver Diseases, and HGB, which were present in both the GBS and RS scores, were also consistently identified across a substantial body of research, underscoring their significance in predicting outcomes. Age, a pivotal prognostic factor in various medical contexts, demonstrated a nuanced impact in our population, indicating its potential to be outweighed by other factors for individuals over 65 years old. The presence of UGIB often coincides with ICH, which is linked to heightened mortality risk and prolonged stays in the ICU [20, 21]. Some studies found that renal function, as exemplified by eGFR, emerged as a critical marker and was indicative of poorer UGIB outcomes [22]. Furthermore, the prevalence of Peptic Ulcer Bleeding outweighed that of variceal bleeding. Depending on the type of comorbidities, this association translated into varying degrees of short-term mortality elevation [23]. Variceal bleeding is an uncertain risk factor according to some research [24, 25]. These findings highlight the complexity and variability in the outcomes of patients with different types of gastrointestinal bleeding and with different types of complications. Our aim was to find the best subset of predictors, which is why we implemented several methods to create different combinations and evaluated their performance using internal validation.

To comprehensively capture and analyze the complex conditions of the elderly population, we considered the use of the Charlson Comorbidity Index (CCI), a well-established scoring system for predicting mortality. CCI provides a quantitative assessment of the cumulative comorbidity burden, contributing to the evaluation of long-term prognosis [26]. Notably, certain studies have identified the potential of the CCI in predicting short-term in-hospital prognosis for elderly patients as well [27]. The Alive and Death groups showed median (interquartile range, IQR) CCI values of 5 (4, 6) and 7 (5, 9) points, respectively, shedding light on the intricate comorbidity landscape among elderly individuals. In our initial exploration, we deliberated on whether to include CCI as a predictor, considering its independence from individual diseases within the index. Although both the LASSO and ENT methods identified CCI, its incorporation yielded only average performance, and it proved less effective than replacing it with a small number of individual comorbidities as predictors. Additionally, calculating CCI is complex unless specialized tools or integration with diagnostic systems are available, and it requires further information, which rendered its inclusion less practical. As a result, in this study, we made the decision not to include CCI and instead focused on individually integrating each comorbidity into the variable selection process.

In our study, certain variables including age, BMI, and SBP were initially considered in their continuous format by certain variable selection methods. However, their inclusion with small coefficients would have limited the interpretability and practical utility of a scoring system. To address this concern, we undertook the necessary step of transforming these variables into categorical formats.

Presenting the data in a categorical format allowed us to effectively communicate the implications of each variable on the outcome. This approach emphasized the significance of specific ranges or levels in predicting the target variable. Moreover, using categorical variables facilitated the integration of the model into established clinical guidelines or risk stratification systems, enhancing its practical applicability in real-world contexts. It is important to acknowledge that converting continuous variables into categorical ones might result in some loss of information, and the selection of cutoff points should be made thoughtfully. For example, in the case of the variable AGE, we utilized the Youden index to determine the optimal threshold, which was identified as 75 years old. However, it is worth noting that this variable was ultimately not included in the final selection. On the other hand, for variables such as BUN, Albumin, and Pulse, we aligned the chosen cutoff points with the hospital's laboratory standards and established conventions. This approach ensured consistency with common practices, making our findings clinically meaningful and comparable across various healthcare institutions.
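As an illustration of how a cutoff can be chosen with the Youden index (J = sensitivity + specificity - 1), the sketch below scans every observed value as a candidate threshold and keeps the one that maximizes J. The ages and outcomes are synthetic toy data, not study data, and the function name is illustrative; the resulting cutoff is therefore unrelated to the 75-year threshold reported above.

```python
def youden_cutoff(values, outcomes):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A value >= cutoff is treated as a positive prediction; outcomes are 0/1.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, outcomes) if v >= cutoff and y == 1)
        tn = sum(1 for v, y in zip(values, outcomes) if v < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Synthetic example: ages and 30-day death indicator (1 = died)
ages = [66, 68, 70, 72, 74, 76, 78, 80, 83, 88]
died = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]

cutoff, j = youden_cutoff(ages, died)
print(cutoff, round(j, 3))
```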

Ultimately, we arrived at a subset of five variables (Albumin, BUN, Cancer, Altered Mental Status, and Pulse) that were consistently selected by all five methods and that make up our scoring system. This subset was assembled manually, and its performance was then evaluated.

Traditional regression models have a well-established history of application and validation across various studies, leading to the development of widely used scoring systems. Notably, GBS and RS rely on logistic regression and forward stepwise techniques, respectively, while AIMS65 employs the recursive partition approach, a more recent decision tree method. In recent years, the field of predictive modeling has witnessed the emergence of innovative techniques. RF, KNN, and SVM have demonstrated distinct features and have found applications in diverse medical research domains, including predicting bleeding events among elderly patients with mechanical valve replacement [28], early detection of Alzheimer's disease stages [29], and predicting medication adherence in elderly patients with chronic diseases [30]. Each of these methods is adept at managing categorical data and excels in performing classification tasks. Each method possesses unique strengths and limitations, and the selection of the most suitable approach hinges on the specific characteristics and objectives of the dataset. It is crucial to recognize that while machine learning methods have shown promise, they also exhibit certain limitations, such as their "black box" nature with reduced interpretability. Moreover, these methods may require iterative parameter tuning to achieve optimal performance.

In our study, we employed a comprehensive set of predictive modeling methods, including RF, KNN, SVM, and GLM, to conduct prediction and classification tasks. The selection of these methods aimed to thoroughly evaluate their performance in our specific domain. The combinations of RF, KNN, SVM, and GLM demonstrated diverse performance in predicting binary outcomes. Overall, most combinations exhibited strong performance with high accuracy and sensitivity. However, SVM-based combinations showed comparatively lower specificity, implying a higher false-positive rate. Of particular note is the observation that the RF + RFE combination yielded an NA value for specificity.

Despite dedicated efforts to fine-tune the critical parameters of each machine learning approach, the results remained unsatisfactory. However, it is important to highlight that generalized linear models demonstrated commendable performance and suitability in this context. This pattern led us to hypothesize that the challenges in applying machine learning methods to this specific cohort arise from its unique characteristics. Machine learning methodologies generally shine when dealing with high-dimensional, complex datasets. However, our attempts to use machine learning methods with all variables in model training yielded only marginal improvements in AUC, while complicating the prediction model considerably.

In our comparative analysis of the three machine learning methods alongside the GLM-based combinations, with a special focus on the GLM + BestSub combination, a consistent pattern emerged. We observed that this specific combination consistently demonstrated well-balanced performance across a range of evaluation metrics, including specificity. Notably, even the ABCAP score, which was manually derived from the selection of five variables, displayed slightly lower metric values in comparison to GLM + BestSub.

Our decision to develop the ABCAP score was influenced by several factors, including the need for result interpretability, data availability, domain expertise, practical ease of calculation and application.

When contrasting the ABCAP score with the GBS, RS, and AIMS65 score, there are both shared and distinct variables. For instance, BUN and Pulse, featured in the ABCAP score, are also significant factors in other scoring systems such as the GBS and RS. Additionally, the presence of Cancer, encompassing both metastatic and nonmetastatic malignancies, has proven to be a crucial predictor of outcomes in UGIB patients. This characteristic is present in both the ABCAP score and the RS. The inclusion of Cancer as a predictive factor holds relevance due to its prevalence among our study population, a factor driven by the age-related increase in cancer cases and its substantial impact on prognosis[31]. Furthermore, a multicenter study on chronic diseases among elderly inpatients in China, utilizing our development dataset, revealed that malignancy remains the leading cause of in-hospital mortality[32].

Another significant observation within our study pertains to the dominant role of serum albumin levels, rather than HGB levels, at the time of presentation. This discovery aligns with recent research findings and the AIMS65 score, both of which emphasize the critical importance of hypoalbuminemia in forecasting mortality within the context of upper gastrointestinal bleeding and critical illness[33, 34]. Interestingly, hypoalbuminemia remains absent from the RS and GBS systems, despite its clinical relevance.

Analyzing the disparities between internal and external validation performance requires consideration of the differences in basic characteristics of the study populations. It is important to note that the AIMS65 and ABCAP scores are tailored for distinct patient groups and outcomes. Our focus on the 30-day in-hospital mortality rate diverges from AIMS65, which considers overall in-hospital mortality without a specific time frame. This divergence significantly impacts the differences in predictive performance. Additionally, the AIMS65 score offers the ability to predict length of stay (LOS) and intensive care unit (ICU) admission, features not included in our ABCAP score. Given that elderly patients in our cohort generally experience longer hospital stays and ICU admissions for reasons beyond UGIB, these additional features might contribute to disparities.

During internal validation, the ABCAP score achieved a numerically higher AUC than the AIMS65 score (0.878 vs. 0.827), although the 95% confidence intervals overlapped. The same pattern held in external validation, where the performance of both scores decreased but remained acceptable (AUC 0.799 vs. 0.743). The ABCAP score therefore maintained a modest advantage over the AIMS65 score in predicting 30-day in-hospital mortality.
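For an integer score such as ABCAP, the AUC equals the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. The sketch below applies this definition to the per-level counts in Table 7. It therefore yields an apparent AUC on the development data, which is an assumption of this illustration and is not the same quantity as the internal or external validation AUCs reported above. The function name is illustrative.

```python
# Counts transcribed from Table 7 (development dataset): score -> (patients, deaths)
ABCAP = {0: (909, 2), 1: (635, 17), 2: (238, 31), 3: (79, 24), 4: (33, 19), 5: (5, 4)}


def auc_from_counts(table):
    """AUC of an ordinal score from grouped counts (ties contribute 0.5)."""
    deaths = {s: d for s, (n, d) in table.items()}
    survivors = {s: n - d for s, (n, d) in table.items()}
    concordant = 0.0
    for s_death, n_death in deaths.items():
        for s_surv, n_surv in survivors.items():
            if s_death > s_surv:
                concordant += n_death * n_surv        # death ranked above survivor
            elif s_death == s_surv:
                concordant += 0.5 * n_death * n_surv  # tied pair
    return concordant / (sum(deaths.values()) * sum(survivors.values()))


apparent_auc = auc_from_counts(ABCAP)
print(round(apparent_auc, 3))
```

On these counts the apparent AUC is roughly 0.89, which is of the same order as the subgroup AUCs reported for the development dataset.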

Considering the specific attributes of our study population, which comprised elderly individuals aged 65 years and older in China [35], it is critical to note that the AIMS65 score's age criterion (≥65 years) is effectively a fixed 1-point predictor in our population because every patient meets it. Despite this, the AIMS65 score still demonstrated reasonable predictive performance in our study, as indicated by its respectable AUC value. Collectively, these findings emphasize the potential of the ABCAP scoring system as a more suitable tool for risk assessment and prediction within our specific context. This contributes to enhancing the accuracy of clinical decision-making and strategies for patient care.

Variceal upper gastrointestinal bleeding (UGIB) is typically associated with underlying liver disease and the presence of esophageal or gastric varices. On the other hand, nonvariceal UGIB often arises from causes such as peptic ulcers, erosions, or Mallory-Weiss tears. Many studies focused on UGIB tend to primarily concentrate on the nonvariceal population, given the specialized management required for variceal bleeding. The classification of patients into variceal and nonvariceal groups has been a point of consideration in various research efforts. While some studies specifically examine variceal or nonvariceal patients, others include both patient groups in their analyses. Notably, the distinction between variceal and nonvariceal bleeding is not always clear-cut. For instance, one study comparing these two types of gastrointestinal bleeding reported higher mean age and mortality rates in the nonvariceal bleeding group [35]. However, another study found no significant differences in clinical outcomes, including mortality, between patients admitted with variceal and nonvariceal gastrointestinal bleeding [25]. In our dataset, distinguishing between variceal and nonvariceal bleeding proved challenging due to limited information on the presence of varices, stemming from lower rates of endoscopic utilization in the elderly population. Consequently, we were unable to clearly classify patients based on this criterion.

Despite the limitations in classifying patients by variceal status, our analysis revealed that variceal bleeding was associated with prognosis in univariable analysis. However, it was selected by only three methods during the variable selection process. Interestingly, when comparing the performance of the ABCAP score in both groups, we noted no significant differences in score distribution or predictive accuracy. This suggests that the ABCAP score effectively stratifies the risk of adverse outcomes in both variceal and nonvariceal UGIB patients, regardless of the underlying cause. Consequently, the ABCAP score emerges as a versatile and reliable prognostic tool for managing UGIB, providing valuable risk assessment irrespective of the presence of varices.

We further examined the patient and death counts at each score level for both the ABCAP and AIMS65 scoring systems across the entire development dataset. Particularly noteworthy is the similarity in cumulative patient counts for the 0 to 2 point and 3 to 5 point categories. However, a notable divergence becomes evident when accounting for the corresponding death counts and their ratios. Within our study cohort, the 3 to 5 point range of the ABCAP score stratified mortality more effectively than the same range of the AIMS65 score.

In a cumulative analysis of the corresponding metrics across each score level in the development dataset, an upward trend in both mortality and positive likelihood ratio (PLR) was observed with increasing ABCAP scores. However, this trend was not consistently smooth. Based on the significant increase in mortality and PLR with higher scores, we were able to establish a risk stratification system. Patients with scores ranging from 0 to 2 were categorized as low risk, with score-level mortality of 13.0% or lower. A score of 3 indicated moderate risk, corresponding to a noticeable increase in mortality to 30.4%. For patients scoring 4 or 5, representing high risk, the mortality rate escalated further, ranging from 57.6% to 80.0%.

This risk stratification framework offers valuable guidance for healthcare providers when managing elderly patients with UGIB. If a patient's calculated ABCAP score is 3 or higher, timely intervention becomes crucial due to the significantly elevated risk of mortality. Conversely, if the score is below 3, while the mortality rate remains relatively high, the prognosis is generally expected to be more favorable. This risk-based approach facilitates informed decision-making and aids in prioritizing appropriate interventions for optimal patient care.
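The stratification described above maps an ABCAP score to a risk category. A minimal sketch follows; the function name is illustrative, and the mortality figures are the score-level percentages observed in the development dataset (Table 7), shown only for context rather than as predicted probabilities.

```python
# Score-level mortality (%) observed in the development dataset (Table 7)
OBSERVED_MORTALITY_PCT = {0: 0.2, 1: 2.7, 2: 13.0, 3: 30.4, 4: 57.6, 5: 80.0}


def abcap_risk_category(score: int) -> str:
    """Map an ABCAP score (0 to 5) to the risk category used in this study."""
    if not 0 <= score <= 5:
        raise ValueError("ABCAP score must be between 0 and 5")
    if score <= 2:
        return "low"
    if score == 3:
        return "moderate"
    return "high"


for score in range(6):
    category = abcap_risk_category(score)
    print(score, category, f"{OBSERVED_MORTALITY_PCT[score]}%")
```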

Despite the imputation of missing values in the development dataset using missForest, the imputed data showed minimal divergence from the original dataset. This result reinforces the credibility and robustness of our analysis. To ensure consistency and reliability, all computations were carried out across five iterations, and mean values were calculated accordingly. It is worth noting that the ABCAP score comprises only three numerical variables, of which Albumin and BUN have missing values below 10%. This careful selection and subsequent evaluation of variables contribute to a high level of acceptability and data integrity. Furthermore, in the external validation using patients admitted more recently, almost no missing values were present in the five variables. As a result, the ABCAP score exhibited acceptable predictive power in this subset.

In conclusion, our developed ABCAP score, incorporating Altered Mental Status, BUN, Cancer, Albumin, and Pulse as key variables, has demonstrated strong predictive performance in assessing the 30-day in-hospital mortality risk for elderly patients with UGIB. This score exhibits comparable predictive ability to the widely utilized AIMS65 scoring system, as evidenced by robust results in both internal and external validation. Importantly, the ABCAP score effectively stratifies mortality risk in both variceal and nonvariceal bleeding cases. Nonetheless, to establish its wider applicability and generalizability, further validation studies across diverse healthcare settings and patient populations are imperative. With its potential to provide valuable risk assessment insights, the ABCAP score stands as a promising tool to guide clinicians in making well-informed decisions and prioritizing appropriate interventions during the acute care phase for elderly UGIB patients.

Limitations

Our study has provided valuable insights and promising results. However, certain data-related limitations need to be acknowledged. First, the relatively small sample size may limit the generalizability of our findings to a broader population. To enhance reliability, larger and more diverse cohorts should be considered in future studies. Additionally, missing data and potential biases in our retrospective study require careful consideration, even though we used imputation methods. In terms of study design, our adoption of a single-center retrospective design may introduce selection bias and limit the ability to establish causality. Conducting a multicenter prospective study would enhance the robustness of our findings. Despite our efforts to control for potential confounders, unmeasured factors or unknown biases may still influence the results and interpretations of our study. Therefore, more external validation and broader exploration are necessary to validate the performance and applicability of the ABCAP score in diverse patient populations and clinical settings.

Ethics approval and consent to participate

This study received ethical approval from the Ethics Committee of the People's Liberation Army General Hospital. All participants provided informed consent, and their confidentiality was safeguarded throughout the study. The research adheres to the principles of the Declaration of Helsinki and ensures participant welfare and data integrity.

Consent for publication

We, the authors of the manuscript titled 'Prediction of 30-Day In-Hospital Mortality in Elderly UGIB Patients Using a Simplified Risk Score and Comparison with AIMS65 score,' submitted to 'BMC Geriatrics,' hereby provide our consent for the publication of our work in your esteemed journal. This manuscript is an original contribution, and each author has made substantial contributions to the research. We affirm that there are no conflicts of interest or ethical concerns associated with this work.

In accordance with the journal's request, we hereby indicate that the "Consent for publication" section is deemed as "Not applicable (NA)" as there are no images or identifiable information in our manuscript that would require special consent for publication.

Availability of data and materials

We regret to inform that the original data used in this study cannot be made publicly available due to restrictions imposed by our affiliated institution. The data contain sensitive patient information, and our institution has not granted approval for its public release. However, interested researchers may request access to the de-identified data, subject to approval from our institutional ethics committee. Requests for data access can be directed to [email protected].

Competing interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Funding

This research was supported by the 2019 Medical Big Data and Artificial Intelligence Research and Development Project (2019MBD-018).

Authors' contributions

Z.X.: conceptualization, investigation, methodology, validation, and writing; H.C.: technical support and data curation; J.R.: data collection and curation; D.X.: technical support and methodology; Q.S.: conceptualization, supervision, and review. All authors contributed to the article and approved the submitted version.

References

1. Hearnshaw, S.A., et al., Acute upper gastrointestinal bleeding in the UK: patient characteristics, diagnoses and outcomes in the 2007 UK audit. Gut, 2011. 60(10): p. 1327-35.
2. Peery, A.F., et al., Burden and Cost of Gastrointestinal, Liver, and Pancreatic Diseases in the United States: Update 2018. Gastroenterology, 2019. 156(1): p. 254-272 e11.
3. Rockall, T.A., et al., Risk assessment after acute upper gastrointestinal haemorrhage. Gut, 1996. 38(3): p. 316-21.
4. Longstreth, G.F., Epidemiology of hospitalization for acute upper gastrointestinal hemorrhage: a population-based study. Am J Gastroenterol, 1995. 90(2): p. 206-10.
5. Elsebaey, M.A., et al., Predictors of in-hospital mortality in a cohort of elderly Egyptian patients with acute upper gastrointestinal bleeding. Medicine (Baltimore), 2018. 97(16): p. e0403.
6. Zhong, M., et al., Comparison of three scoring systems in predicting clinical outcomes in patients with acute upper gastrointestinal bleeding: a prospective observational study. J Dig Dis, 2016. 17(12): p. 820-828.
7. Blatchford, O., W.R. Murray, and M. Blatchford, A risk score to predict need for treatment for upper-gastrointestinal haemorrhage. Lancet, 2000. 356(9238): p. 1318-21.
8. Saltzman, J.R., et al., A simple risk score accurately predicts in-hospital mortality, length of stay, and cost in acute upper GI bleeding. Gastrointest Endosc, 2011. 74(6): p. 1215-24.
9. Wang, H. and H. Chen, Aging in China: Challenges and Opportunities. China CDC Wkly, 2022. 4(27): p. 601-602.
10. Charlson, M.E., et al., A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis, 1987. 40(5): p. 373-83.
11. Quan, H., et al., Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am J Epidemiol, 2011. 173(6): p. 676-82.
12. Miyanaga, R., et al., Complications and outcomes of routine endoscopy in the very elderly. Endosc Int Open, 2018. 6(2): p. E224-E229.
13. Derogar, M., et al., Discontinuation of low-dose aspirin therapy after peptic ulcer bleeding increases risk of death and acute cardiovascular events. Clin Gastroenterol Hepatol, 2013. 11(1): p. 38-42.
14. Saffouri, E., et al., The Shock Index is not accurate at predicting outcomes in patients with upper gastrointestinal bleeding. Aliment Pharmacol Ther, 2020. 51(2): p. 253-260.
15. Bae, S.J., et al., Predictive performance of blood urea nitrogen to serum albumin ratio in elderly patients with gastrointestinal bleeding. Am J Emerg Med, 2021. 41: p. 152-157.
16. Tibshirani, R., The lasso method for variable selection in the Cox model. Stat Med, 1997. 16(4): p. 385-95.
17. Kang, J., et al., LASSO-Based Machine Learning Algorithm for Prediction of Lymph Node Metastasis in T1 Colorectal Cancer. Cancer Res Treat, 2021. 53(3): p. 773-783.
18. Zhu, K., et al., An In-Hospital Mortality Risk Model for Elderly Patients Undergoing Cardiac Valvular Surgery Based on LASSO-Logistic Regression and Machine Learning. J Cardiovasc Dev Dis, 2023. 10(2).
19. Endo, A., et al., Prediction Model of In-Hospital Mortality After Hip Fracture Surgery. J Orthop Trauma, 2018. 32(1): p. 34-38.
20. Qiu, W., et al., Age-to-Glasgow Coma Scale score ratio predicts gastrointestinal bleeding in patients with primary intracerebral hemorrhage. Front Neurol, 2023. 14: p. 1034865.
21. Wei, J., et al., Stress-related upper gastrointestinal bleeding in adult neurocritical care patients: a Chinese multicenter, retrospective study. Curr Med Res Opin, 2019. 35(2): p. 181-187.
22. Bai, Z., et al., Incidence and mortality of renal dysfunction in cirrhotic patients with acute gastrointestinal bleeding: a systematic review and meta-analysis. Expert Rev Gastroenterol Hepatol, 2019. 13(12): p. 1181-1188.
23. Leontiadis, G.I., et al., Effect of comorbidity on mortality in patients with peptic ulcer bleeding: systematic review and meta-analysis. Am J Gastroenterol, 2013. 108(3): p. 331-45; quiz 346.
24. Tandon, P., et al., Comparison of clinical outcomes between variceal and non-variceal gastrointestinal bleeding in patients with cirrhosis. J Gastroenterol Hepatol, 2018. 33(10): p. 1773-1779.
25. Farooq, U., et al., Comparison of outcomes between variceal and non-variceal gastrointestinal bleeding in patients with cirrhosis: Insights from a Nationwide Inpatient Sample. Ann Gastroenterol, 2022. 35(6): p. 618-626.
26. Shuvy, M., et al., The age-adjusted Charlson comorbidity index: A significant predictor of clinical outcome in patients with heart failure. Eur J Intern Med, 2020. 73: p. 103-104.
Radovanovic, D., et al., Validity of Charlson Comorbidity Index in patients hospitalised with acute coronary syndrome. Insights from the nationwide AMIS Plus registry 2002-2012. Heart, 2014. 100(4): p. 288-94.
Kim, J. and I. Jang, Predictors of bleeding event among elderly patients with mechanical valve replacement using random forest model: A retrospective study. Medicine (Baltimore), 2021. 100(19): p. e25875.
Elgammal, Y.M., M.A. Zahran, and M.M. Abdelsalam, A new strategy for the early detection of Alzheimer disease stages using multifractal geometry analysis based on K-Nearest Neighbor algorithm. Sci Rep, 2022. 12(1): p. 22381.
Lee, S.K., et al., Predictors of medication adherence in elderly patients with chronic diseases using support vector machine models. Healthc Inform Res, 2013. 19(1): p. 33-41.
Smith, B.D., et al., Future of cancer incidence in the United States: burdens upon an aging, changing nation. J Clin Oncol, 2009. 27(17): p. 2758-65.
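Cao, F., et al., Clinical multicenter report on geriatric diseases in China [in Chinese]. Chinese Journal of Multiple Organ Diseases in the Elderly, 2018. 17(11): p. 801-808.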
Freire, A.X., et al., Admission hyperglycemia and other risk factors as predictors of hospital mortality in a medical ICU population. Chest, 2005. 128(5): p. 3109-16.
Tung, C.F., et al., The prevalence and significance of hypoalbuminemia in non-variceal upper gastrointestinal bleeding. Hepatogastroenterology, 2007. 54(76): p. 1153-6.
Ratiu, I., et al., Acute gastrointestinal bleeding: A comparison between variceal and nonvariceal gastrointestinal bleeding. Medicine (Baltimore), 2022. 101(45): p. e31543.

No competing interests reported.

