Despite significant advancements in the prevention and treatment of UGIB, the prognosis for elderly patients remains a challenge during hospitalization. Interestingly, among the 1899 patients included in our development dataset, those who did not survive exhibited a lower rate of in-hospital endoscopy compared to the surviving group, despite the majority of patients undergoing such procedures as part of their medical care. The reasons behind this discrepancy could include safety concerns, patient preferences, and practical considerations, all of which can hinder the accessibility and suitability of endoscopic interventions, particularly among the elderly population [12]. Consequently, the development of a streamlined scoring system capable of swiftly assessing prognosis in elderly patients prior to undergoing endoscopy is of paramount importance.
While established scoring systems such as the GBS, RS, and AIMS65 scores have undergone extensive validation and implementation for patient triage in clinical settings, it is crucial to acknowledge that the severity of various acute and chronic conditions might differ in elderly individuals compared to their younger counterparts. Thus, it becomes imperative to explore specific risk factors that address the distinct challenges encountered by this particular patient demographic. Identifying high-risk individuals in a timely manner is pivotal for effective resource allocation, enabling prompt endoscopic or surgical interventions. Furthermore, managing the prognosis of low-risk patients who require antithrombotic therapy, such as aspirin for secondary prevention, is a critical aspect in treatment decision-making [13].
Unlike AIMS65, RS, and GBS, which predominantly address overall in-hospital mortality, our study focused on a more specific outcome: 30-day in-hospital mortality. Notably, within our development dataset, this rate was 5.1%, considerably higher than the corresponding 30-day mortality rate of 1.37% observed in the broader original population of 253,947 patients (3485 cases). This timeframe allows for the assessment of the near-term risk of death following UGIB, capturing the most relevant outcomes within the hospital stay. Our concentration on the 30-day in-hospital death outcome permits an evaluation of the effectiveness of interventions and risk prediction models in mitigating mortality within a critical window. Such an approach yields valuable insights for clinicians, aiding them in informed decision-making and the prioritization of suitable interventions during the acute phase of care. Furthermore, by limiting the outcome to in-hospital death, potential biases and confounding factors associated with long-term follow-up are circumvented. Factors like patient adherence, access to healthcare resources, and shifts in treatment strategies over time could influence outcomes beyond the immediate hospitalization period.
To overcome the above limitations and predict short-term outcome, we developed the ABCAP score, a simplified scoring system specifically designed for elderly patients with UGIB. This scoring system incorporates key variables, including the presence of cancers, mental status alterations, elevated heart rate, low albumin levels, and increased blood urea nitrogen. Each variable is assigned a score of 1, resulting in a concise and practical tool for risk stratification for 30-day in-hospital death.
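The scoring rule described above (one point per criterion met) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class and field names are hypothetical, and because the cutoffs used to dichotomize albumin, blood urea nitrogen, and heart rate are not stated in this section, the sketch takes already-dichotomized indicators as input.

```python
from dataclasses import dataclass


@dataclass
class AbcapInputs:
    """Binary indicators for the five ABCAP items (True = criterion met).

    The cutoffs that turn albumin, BUN, and pulse into yes/no flags are
    not given in this section, so the caller supplies booleans.
    Field names are hypothetical.
    """
    low_albumin: bool
    elevated_bun: bool
    cancer: bool
    altered_mental_status: bool
    elevated_pulse: bool


def abcap_score(x: AbcapInputs) -> int:
    """Sum one point per criterion met; the result lies in 0..5."""
    return sum([
        x.low_albumin,
        x.elevated_bun,
        x.cancer,
        x.altered_mental_status,
        x.elevated_pulse,
    ])
```

For example, a patient with low albumin, a cancer diagnosis, and an elevated heart rate, but normal BUN and mental status, would receive 3 points under this rule.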
Previous studies have primarily relied on traditional multivariable analysis or stepwise methods to identify risk factors in elderly populations with UGIB. Furthermore, some studies have explored the use of compound features, such as the shock index[14] and the blood urea nitrogen to serum albumin ratio[15], to improve conciseness and prediction accuracy. However, not all of these newly identified risk factors have proven to be effective predictors[14]. Therefore, in our study, we focused on incorporating key individual variables that are widely available and easy to measure. We employed a combination of traditional methods and innovative techniques, such as StepAIC, LASSO, ENT, RFE, and Best subset selection, to facilitate the selection of key variables during the training and prediction modeling process. For example, the LASSO method has distinct advantages for variable selection in predictive modeling, particularly in complex datasets with interrelated factors, commonly found in studies involving elderly populations with multiple comorbidities[16, 17]. It efficiently handles high-dimensional data by shrinking the regression coefficients of irrelevant variables to zero, enabling automatic variable selection. This regularization penalty promotes sparsity in the model, resulting in a concise set of predictors with strong predictive power, preventing overfitting and improving generalizability. LASSO has been successfully applied in various studies, including predicting in-hospital mortality risk for elderly patients undergoing cardiac valvular surgery and predicting mortality in elderly patients after hip fractures[18, 19].
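The way an L1 penalty drives coefficients exactly to zero can be illustrated with a small pure-Python sketch. It fits an L1-penalized logistic regression by proximal gradient descent, where each update is followed by a soft-thresholding step. This is an illustrative assumption about one way to fit such a model, not the software or settings used in the study; the function names, learning rate, and iteration count are hypothetical.

```python
import math


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam, lr=0.1, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.

    X: list of feature rows, y: list of 0/1 labels, lam: penalty strength.
    The intercept is not penalized. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for row, label in zip(X, y):
            z = intercept + sum(w * v for w, v in zip(coefs, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted minus observed
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        intercept -= lr * grad_b
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        coefs = [soft_threshold(w - lr * g, lr * lam)
                 for w, g in zip(coefs, grad_w)]
    return intercept, coefs
```

With a sufficiently large `lam`, every coefficient stays at zero; as `lam` decreases, predictors enter the model one by one, which is the behavior that makes the method usable for variable selection.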
Each variable selection method has its own set of advantages and limitations. While StepAIC can yield promising models, it may sometimes result in the selection of an overly expansive feature subset. In contrast, Best Subset selection exhaustively explores all potential predictor combinations, ultimately identifying the subset that optimally fits the model. This approach provides a comprehensive tool for pinpointing the most favorable set of predictors, although it is worth noting that its exhaustive nature can be computationally intensive and might lead to overfitting when the number of predictors is large. Across five iterations, it was clear that specific methodologies, especially StepAIC, BestSub, and RFE, led to the selection of over 10 variables. However, the consistency and stability of selections from these methods varied across iterations, highlighting the potential for overfitting within each iteration. Notably, LASSO consistently chose 6 variables, and ENT consistently chose 7 variables, indicating significant alignment in their selections.
In total, 15 variables emerged from the selection process, each potentially associated with in-hospital death. This encompasses variables selected by specific methods, such as Age, eGFR, HGB, ICH, Liver Diseases, and Peptic Ulcer. It is noteworthy that Age, Liver Diseases, and HGB, which are present in both the GBS and RS scores, have also been consistently identified across a substantial body of research, underscoring their significance in predicting outcomes. Age, a pivotal prognostic factor in various medical contexts, demonstrated a nuanced impact in our population, indicating that it may be outweighed by other factors in individuals over 65 years old. UGIB often coincides with ICH, which is linked to heightened mortality risk and prolonged stays in the ICU [20, 21]. Some studies found that renal function, as exemplified by eGFR, was a critical marker indicative of poorer UGIB outcomes[22]. Furthermore, the prevalence of Peptic Ulcer Bleeding outweighed that of variceal bleeding. Depending on the type of comorbidities, this association translated into varying degrees of short-term mortality elevation[23]. Whether variceal bleeding is a risk factor remains uncertain according to some research[24, 25]. These findings highlight the complexity and variability in the outcomes of patients with different types of gastrointestinal bleeding and with different types of complications. Our aim was to find the best subset of predictors, which is why we implemented several methods to create different combinations and evaluated their performance using internal validation.
To comprehensively capture and analyze the complex conditions of the elderly population, we considered the use of the Charlson Comorbidity Index (CCI), a well-established scoring system for predicting mortality. CCI provides a quantitative assessment of the cumulative comorbidity burden, contributing to the evaluation of long-term prognosis[26]. Notably, certain studies have identified the potential of the CCI in predicting short-term in-hospital prognosis for elderly patients as well[27]. The Alive and Death groups showed median (interquartile range, IQR) CCI values of 5 (4, 6) and 7 (5, 9) points, respectively, shedding light on the intricate comorbidity landscape among elderly individuals. In our initial exploration, we deliberated on whether to include CCI as a predictor, considering its independence from individual diseases within the index. Although both the LASSO and ENT methods identified CCI, its incorporation yielded only average performance and proved less effective than replacing it with a small number of individual comorbidities as predictors. Additionally, calculating CCI is complex unless specialized tools or integration with diagnostic systems are available; it requires further information, which rendered its inclusion less practical. As a result, in this study, we decided not to include CCI and instead integrated each comorbidity individually into the variable selection process.
In our study, certain variables including age, BMI, and SBP were initially considered in their continuous format by certain variable selection methods. However, their inclusion with small coefficients would have limited the interpretability and practical utility of a scoring system. To address this concern, we undertook the necessary step of transforming these variables into categorical formats.
Presenting the data in a categorical format allowed us to effectively communicate the implications of each variable on the outcome. This approach emphasized the significance of specific ranges or levels in predicting the target variable. Moreover, using categorical variables facilitated the integration of the model into established clinical guidelines or risk stratification systems, enhancing its practical applicability in real-world contexts. It is important to acknowledge that converting continuous variables into categorical ones might result in some loss of information, and the selection of cutoff points should be made thoughtfully. For example, in the case of the variable AGE, we utilized the Youden index to determine the optimal threshold, which was identified as 75 years old. However, it is worth noting that this variable was ultimately not included in the final selection. On the other hand, for variables such as BUN, Albumin, and Pulse, we aligned the chosen cutoff points with the hospital's laboratory standards and established conventions. This approach ensured consistency with common practices, making our findings clinically meaningful and comparable across various healthcare institutions.
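The Youden index mentioned above selects the cutoff that maximizes J = sensitivity + specificity - 1. A minimal sketch follows, assuming that a case is classified as positive when its value is at or above the cutoff; the function name and the toy data in the usage note are hypothetical and not taken from the study.

```python
def youden_cutoff(values, outcomes):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted positive when value >= cutoff.
    outcomes: 1 = event (e.g. in-hospital death), 0 = no event.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        # True positives: events at or above the cutoff.
        tp = sum(1 for v, o in zip(values, outcomes) if v >= cutoff and o == 1)
        # True negatives: non-events below the cutoff.
        tn = sum(1 for v, o in zip(values, outcomes) if v < cutoff and o == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

On made-up, perfectly separable data such as ages `[66, 70, 72, 76, 80, 85]` with outcomes `[0, 0, 0, 1, 1, 1]`, the function returns the lowest age among the events as the cutoff, with J equal to 1.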
Ultimately, we arrived at a subset of five variables (Albumin, BUN, Cancer, Altered Mental Status, and Pulse), which were consistently selected by all five methods and which make up our scoring system. This specific subset was assembled manually, and we then evaluated its performance.
Traditional regression models have a well-established history of application and validation across various studies, leading to the development of widely used scoring systems. Notably, GBS and RS rely on logistic regression and forward stepwise techniques, respectively, while AIMS65 employs recursive partitioning, a more recent decision tree method. In recent years, the field of predictive modeling has witnessed the emergence of innovative techniques. RF, KNN, and SVM have demonstrated distinct features and have found applications in diverse medical research domains, including predicting bleeding events among elderly patients with mechanical valve replacement[28], early detection of Alzheimer's disease stages[29], and predicting medication adherence in elderly patients with chronic diseases [30]. Each of the five methods is adept at managing categorical data and excels in performing classification tasks. Each method possesses unique strengths and limitations, and the selection of the most suitable approach hinges on the specific characteristics and objectives of the dataset. It is crucial to recognize that while machine learning methods have shown promise, they also exhibit certain limitations, such as their "black box" nature with reduced interpretability. Moreover, these methods may require iterative parameter tuning to achieve optimal performance.
In our study, we employed a comprehensive set of predictive modeling methods, including RF, KNN, SVM, and GLM, to conduct prediction and classification tasks. The selection of these methods aimed to thoroughly evaluate their performance in our specific domain. The combinations of RF, KNN, SVM, and GLM demonstrated diverse performance in predicting binary outcomes. Overall, most combinations exhibited strong performance with high accuracy and sensitivity. However, SVM-based combinations showed comparatively lower specificity, implying a higher false-positive rate. Of particular note is the observation that the RF + RFE combination yielded an NA value for specificity.
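The metrics discussed above are all derived from the four cells of a confusion matrix. The sketch below shows the standard definitions and returns `None` when a denominator is zero, which is one general way an undefined (NA) value can arise; the source does not state why the RF + RFE combination produced NA, so this is not offered as an explanation of that result. The function name is hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    A metric is None when its denominator is zero (undefined).
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator > 0 else None

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true-positive rate
        "specificity": ratio(tn, tn + fp),   # true-negative rate
    }
```

Because specificity equals 1 minus the false-positive rate, a lower specificity corresponds directly to more false positives among patients who did not experience the outcome.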
Despite dedicated efforts to fine-tune the critical parameters of each machine learning approach, the results remained unsatisfactory. However, it is important to highlight that generalized linear models demonstrated commendable performance and suitability in this context. This pattern led us to hypothesize that the challenges in applying machine learning methods to this specific cohort arise from its unique characteristics. Machine learning methodologies generally shine when dealing with high-dimensional, complex datasets. However, our attempts to use machine learning methods with all variables in model training yielded only marginal improvements in AUC, while complicating the prediction model considerably.
In our comprehensive comparative analysis of the three machine learning methods alongside the GLM-based combinations, with a special focus on the GLM + BestSub combination, a consistent pattern emerged. We observed that this specific combination consistently demonstrated well-balanced performance across a range of evaluation metrics, including specificity. Notably, the ABCAP score, which was manually derived from the selection of five variables, displayed slightly lower metric values in comparison to GLM + BestSub.
Our decision to develop the ABCAP score was influenced by several factors, including the need for result interpretability, data availability, domain expertise, and practical ease of calculation and application.
When contrasting the ABCAP score with the GBS, RS, and AIMS65 score, there are both shared and distinct variables. For instance, BUN and Pulse, featured in the ABCAP score, are also significant factors in other scoring systems such as the GBS and RS. Additionally, the presence of Cancer, encompassing both metastatic and nonmetastatic malignancies, has proven to be a crucial predictor of outcomes in UGIB patients. This characteristic is present in both the ABCAP score and the RS. The inclusion of Cancer as a predictive factor holds relevance due to its prevalence among our study population, a factor driven by the age-related increase in cancer cases and its substantial impact on prognosis[31]. Furthermore, a multicenter study on chronic diseases among elderly inpatients in China, utilizing our development dataset, revealed that malignancy remains the leading cause of in-hospital mortality[32].
Another significant observation within our study pertains to the dominant role of serum albumin levels, rather than HGB levels, at the time of presentation. This discovery aligns with recent research findings and the AIMS65 score, both of which emphasize the critical importance of hypoalbuminemia in forecasting mortality within the context of upper gastrointestinal bleeding and critical illness[33, 34]. Interestingly, hypoalbuminemia remains absent from the RS and GBS systems, despite its clinical relevance.
Analyzing the disparities between internal and external validation performance requires consideration of the differences in basic characteristics of the study populations. It is important to note that the AIMS65 and ABCAP scores are tailored for distinct patient groups and outcomes. Our focus on the 30-day in-hospital mortality rate diverges from AIMS65, which considers overall in-hospital mortality without a specific time frame. This divergence significantly impacts the differences in predictive performance. Additionally, the AIMS65 score offers the ability to predict length of stay (LOS) and intensive care unit (ICU) admission, features not included in our ABCAP score. Given that elderly patients in our cohort generally experience longer hospital stays and ICU admissions for reasons beyond UGIB, these additional features might contribute to disparities.
During internal validation, the ABCAP score demonstrated superior predictive power compared to the AIMS65 score, as evidenced by its notably higher AUC value. This trend continues in external validation, where both scores experience some decrease in performance but remain acceptable. Notably, the ABCAP score maintains its advantage over the AIMS65 score, highlighting its robust performance in predicting the 30-day in-hospital mortality rate.
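The AUC used for these comparisons can be read as the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. A minimal pairwise sketch of that definition follows; the function name is hypothetical, and for an integer score such as ABCAP the tie handling matters because many patients share the same value.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    scores: numeric risk scores; outcomes: 1 = event, 0 = no event.
    Each (event, non-event) pair counts 1 if the event case scores higher,
    0.5 if the scores are tied, and 0 otherwise.
    """
    pos = [s for s, o in zip(scores, outcomes) if o == 1]
    neg = [s for s, o in zip(scores, outcomes) if o == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of patients, which is acceptable for illustration; rank-based formulas give the same value more efficiently on large cohorts.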
Considering the specific attributes of our study population, which comprised elderly individuals aged 65 years and older in China[35], it is critical to note that the AIMS65 score's inclusion of the age factor (≥ 65 years) is effectively a fixed 1-point predictor in our population due to our population's age range. Despite this, the AIMS65 score still demonstrated reasonable predictive performance in our study, as indicated by its respectable AUC value. Collectively, these findings emphasize the potential of the ABCAP scoring system as a more suitable tool for risk assessment and prediction within our specific context. This contributes to enhancing the accuracy of clinical decision-making and strategies for patient care.
Variceal upper gastrointestinal bleeding (UGIB) is typically associated with underlying liver disease and the presence of esophageal or gastric varices. On the other hand, nonvariceal UGIB often arises from causes such as peptic ulcers, erosions, or Mallory-Weiss tears. Many studies focused on UGIB tend to primarily concentrate on the nonvariceal population, given the specialized management required for variceal bleeding. The classification of patients into variceal and nonvariceal groups has been a point of consideration in various research efforts. While some studies specifically examine variceal or nonvariceal patients, others include both patient groups in their analyses. Notably, the distinction between variceal and nonvariceal bleeding is not always clear-cut. For instance, one study comparing these two types of gastrointestinal bleeding reported higher mean age and mortality rates in the nonvariceal bleeding group[35]. However, another study found no significant differences in clinical outcomes, including mortality, between patients admitted with variceal and nonvariceal gastrointestinal bleeding[25]. In our dataset, distinguishing between variceal and nonvariceal bleeding proved challenging due to limited information on the presence of varices, stemming from lower rates of endoscopic utilization in the elderly population. Consequently, we were unable to clearly classify patients based on this criterion.
Despite the limitations in classifying patients by variceal status, our analysis revealed that variceal bleeding was associated with prognosis in univariable analysis. However, it was selected by only three methods during the variable selection process. Interestingly, when comparing the performance of the ABCAP score in both groups, we noted no significant differences in score distribution or predictive accuracy. This suggests that the ABCAP score effectively stratifies the risk of adverse outcomes in both variceal and nonvariceal UGIB patients, regardless of the underlying cause. Consequently, the ABCAP score emerges as a versatile and reliable prognostic tool for managing UGIB, providing valuable risk assessment irrespective of the presence of varices.
We further delved into the patient and death counts for distinct score levels attributed to both the ABCAP and AIMS65 scoring systems across the entirety of the development dataset, yielding insightful findings. Particularly noteworthy is the equilibrium observed in cumulative counts for the 1 to 2 point and 3 to 5 point categories. However, a notable divergence becomes evident when accounting for the corresponding death counts and their ratios. Within the context of our study cohort, the 3 to 5 score range of the ABCAP score exhibits a heightened ability to effectively stratify mortality, surpassing the performance of the AIMS65 score.
In a cumulative analysis of the corresponding metrics across each score level in the development dataset, an upward trend in both mortality and positive likelihood ratio (PLR) was observed with increasing ABCAP scores. However, this trend was not consistently smooth. Based on the significant increase in mortality and PLR with higher scores, we were able to establish a risk stratification system. Patients with scores ranging from 0 to 2 were categorized as low risk, with a mortality rate lower than 13%. A score of 3 indicated moderate risk, corresponding to a noticeable increase in mortality to 30.4%. For patients scoring 4 or 5, representing high risk, the mortality rate escalated further, ranging from 57.6% to 80%.
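The positive likelihood ratio referred to above is conventionally defined as sensitivity divided by (1 - specificity). The sketch below computes it for a "score at or above threshold" rule, which is an assumed reading of the cumulative analysis; the function name and the toy data in the usage note are hypothetical.

```python
import math


def positive_likelihood_ratio(scores, outcomes, threshold):
    """PLR = sensitivity / (1 - specificity) for the rule score >= threshold.

    outcomes: 1 = event (death), 0 = no event.
    Returns math.inf when no non-event case reaches the threshold.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    tp = sum(1 for s, o in zip(scores, outcomes) if s >= threshold and o == 1)
    fp = sum(1 for s, o in zip(scores, outcomes) if s >= threshold and o == 0)
    sensitivity = tp / n_pos
    false_positive_rate = fp / n_neg  # equals 1 - specificity
    if false_positive_rate == 0:
        return math.inf
    return sensitivity / false_positive_rate
```

On made-up data with scores `[0, 1, 2, 3, 3, 4, 5, 1]` and outcomes `[0, 0, 0, 0, 1, 1, 1, 0]`, a threshold of 3 captures all three events and one of the five non-events, giving a PLR of about 5.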
This risk stratification framework offers valuable guidance for healthcare providers when managing elderly patients with UGIB. If a patient's calculated ABCAP score is 3 or higher, timely intervention becomes crucial due to the significantly elevated risk of mortality. Conversely, if the score is below 3, while the mortality rate remains relatively high, the prognosis is generally expected to be more favorable. This risk-based approach facilitates informed decision-making and aids in prioritizing appropriate interventions for optimal patient care.
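The three risk bands described in this section (0 to 2 low, 3 moderate, 4 or 5 high) map directly onto a small lookup. The following is a minimal sketch with a hypothetical function name; the band boundaries are those stated in the text.

```python
def abcap_risk_category(score: int) -> str:
    """Map an ABCAP score (0..5) to the risk band described in the text."""
    if not 0 <= score <= 5:
        raise ValueError("ABCAP score must be between 0 and 5")
    if score <= 2:
        return "low"
    if score == 3:
        return "moderate"
    return "high"
```

Under this mapping, the "3 or higher" trigger for timely intervention corresponds to the moderate and high bands together.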
After missing values in the development dataset were imputed using missForest, the imputed data showed minimal divergence from the original dataset. This result reinforces the credibility and robustness of our analysis. To ensure consistency and reliability, all computations were carried out across five iterations, and mean values were calculated accordingly. It is worth noting that the ABCAP score comprises only three numerical variables, of which Albumin and BUN have missing values below 10%. This careful selection and subsequent evaluation of variables contribute to a high level of acceptability and data integrity. Furthermore, in the external validation using patients admitted more recently, almost no missing values were present in the five variables. As a result, the ABCAP score exhibited acceptable predictive power in this subset.