Risk Prediction Models for Patients With Acute Heart Failure: A Protocol for Systematic Review, Critical Appraisal, and Meta-Analysis

Nearly a third of patients with acute heart failure (AHF) die or are readmitted within three months after discharge, accounting for the majority of costs associated with heart failure-related care. A considerable number of risk prediction models, which predict mortality and readmission outcomes, have been developed and validated for patients with AHF. These models could help clinicians stratify patients by risk level, improve decision making, and direct specialist care and resources to high-risk patients. However, clinicians are sometimes reluctant to use these models, possibly because of their poor reliability, the variety of models, and/or the complexity of the statistical methodologies. Here, we describe a protocol to systematically review extant risk prediction models. We will describe characteristics, compare performance, and critically appraise the reporting transparency and methodological quality of risk prediction models for AHF patients.


Discussion
The results of the systematic review could help clinicians better understand and use prediction models for AHF patients, and support standardized decisions about more precise, risk-adjusted management.

Systematic review registration
PROSPERO registration number: CRD42021256416.

Background
Heart failure is a rapidly growing health issue associated with high mortality and readmission rates [1,2].
Nearly a third of patients with acute heart failure (AHF) die or are readmitted within three months after discharge, accounting for the majority of costs associated with heart failure-related care [3][4][5]. Although some improvements have been made in the field of AHF, these achievements were realized through adherence to existing chronic heart failure care and improvement in the quality of patient care [6], not through new therapeutic developments [7][8][9]. In this context, risk stratification of AHF patients would be useful for more effective, risk-adjusted management [1,10]. Accurate risk predictions could support timely decisions at emergency department (ED) patient triage (discharge with close outpatient follow-up versus hospitalization) and thus an appropriate level of care (outpatient care, ward, telemetry, or intensive care) and its corresponding treatments [11]. Risk prediction models help clinicians stratify patients by their severity of disease or general health condition [12]. Therefore, specialist care and resources could be directed to high-risk patients, while unnecessary testing and procedures could be avoided for low-risk patients, leading to overall healthcare cost savings [10].
Risk prediction models usually encompass a combination of predictors such as vital signs, biomarkers, and demographics; they provide estimates of future outcomes of AHF [13]. These statistical models yield a so-called risk score for mortality, hospital readmission, and other outcomes in AHF. A considerable number of multivariable prognostic models have been developed for patients with AHF, and the outcomes most frequently predicted are mortality and readmission rate [12]. Several scores, such as MEESSI-AHF [5] and ADHF/NT-proBNP [14], have been confirmed to achieve reliable risk stratification in patients with AHF, while others have shown less satisfactory performance [15,16]. On the other hand, clinicians and other associated health care professionals are usually unable to accurately judge the probability of death or severe complications, or to predict hospital readmission, in AHF. Ideally, the best prognostic models and clinical judgment can supplement each other and optimize clinical care. However, hospital administrators and policy makers still see great demand for improvement, as clinicians are reluctant to use these models, possibly because of their poor reliability, the variety of models, and/or the complexity of the statistical methodologies.
Although several systematic reviews are available in the literature on prognostic models for outcomes in heart failure [12,17], few have been specifically dedicated to the AHF population or AHF subgroups. Therefore, a new systematic review of models evaluated in both the ED and hospitalization settings is a necessary supplement. Here we describe our protocol to systematically review the prognostic models for patients with AHF. Our specific objectives are: (1) to identify prediction models for mortality and readmission in patients with AHF and qualitatively describe the characteristics of these models; (2) to compare the performance of identified models quantitatively across different populations and settings, with meta-analysis if appropriate; and (3) to critically appraise the reporting transparency and methodological quality of these prediction model studies.

Methods
The protocol is reported with reference to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) statement [18,19] (the checklist is shown in the Supplementary file in Appendix 1), and the review will be conducted using the methodology recommended for systematic reviews and meta-analyses of prediction models [20]. We will conduct the framing of the review question (Table 1), study design, data extraction, data analysis, and critical appraisal under the relevant guidelines [21][22][23]. We have registered the study on PROSPERO (CRD42021256416).

Eligibility criteria
Studies will be screened on the basis of the following criteria: population (P), index (I), comparator (C), outcomes (O), timing (T), and setting (S). PICOTS is an amendment of the PICOS system specific to systematic reviews of prediction models, with extra consideration for timing and clinical setting [20,24].

Patients
Studies on prediction models for patients with AHF (or acute decompensated heart failure) will be considered for inclusion, but those focusing exclusively on patients with specific comorbidities or on children will be excluded.

Index/Potential prognostic models
Studies reporting the derivation and validation of a multivariable prediction model will be eligible for this review. The derived model should include at least 50 patients who experienced an event during the period of observation, because studies with fewer cases may not be sufficient for convincing administrative or clinical use [17].

Outcome, timing and setting
The primary outcomes predicted by the models under review are mortality and readmission rates of AHF patients. Readmission is defined as all-cause or heart failure-related readmission, which includes emergency department revisits or re-hospitalizations. The studies included in this review should report at least one of the primary outcomes. Studies of prediction models in inpatient and emergency department settings at any period will be included.

Types of studies and limits
Cohort, nested case-control, case-cohort, or registry studies using any type of data source (e.g., administrative databases and electronic medical records) will be included in this review. Only original research articles reported in peer-reviewed publications will be eligible for this review. Secondary research, reviews, conference proceedings, dissertations, editorials, expert opinions, consensus papers, and abstracts will be excluded.

Information sources and search strategy
Embase, PubMed, Web of Science, and the Cochrane Library will be searched from their inception onwards. Detailed search strategies are presented in the Supplementary file in Appendix 2; the search terms cover expressions for acute heart failure, prediction model, and mortality/readmission [25]. We will focus on studies published in English. We will perform a backward citation search on all model derivation studies and identify relevant external validation studies through them.

Data management and study selection
Reviewers will undergo formal training before study selection, given their different backgrounds and knowledge. Eligibility criteria will be explained and discussed in detail to ensure the reviewers have the same understanding. Search results will be combined using EndNote X9, and duplicates will be removed. Abstracts and titles of each study will be screened by two reviewers (selected among XZ, SW, ZG, MH), excluding articles based on the eligibility criteria.

Data extraction
Two independent reviewers (selected among XZ, SW, ZG, MH) will perform the data extraction, using a standardized data extraction form for all included studies. The form was developed based on the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist [22]. For each included study, the following information (but not limited to it) will be extracted: study information (title, authors, countries and regions, publication year); type of prediction model (regression-based model or risk score, derivation and/or validation); source of data (cohort, registry, case-control, or randomized trial participant data); participants (consecutive/non-consecutive participants, location, number of centers, ED/hospitalization/combination/unclear, acute heart failure/acute decompensated heart failure, age limitation, mean age, percentage of male patients, eligibility criteria, study dates); outcomes to be predicted (type of outcome, the time point and period over which the outcome is predicted); candidate predictors (number and type of predictors, timing of predictor measurement); missing data; sample size; model derivation; model performance; and model evaluation and results. Full details of the form are presented in the Supplementary file in Appendix 3.
Disagreements will be resolved through discussion or, when a consensus cannot be reached, consultation with a third reviewer. We will obtain missing data from the authors if possible. A study will be excluded if key missing data cannot be supplemented.

Critical appraisal
We will provide an overall critical appraisal of the methodological quality and reporting transparency of the included studies. To critically appraise the methodological quality of the included studies, covering both risk of bias and applicability, we will use the PROBAST tool [23,24]. This tool consists of 20 items structured in four domains: participants, predictors, outcome, and analysis. After a four-step overall assessment, each domain will be rated as "high", "low", or "unclear" for both risk of bias and applicability. The final results of the PROBAST assessments will be presented in a summary table (Supplementary file in Appendix 4).
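As an illustration of how the domain-level ratings combine, the published PROBAST decision rule for the overall judgement can be sketched as follows (the function name and dictionary layout are our own; the rule itself follows the PROBAST guidance):

```python
def probast_overall(domain_ratings):
    """Combine the four PROBAST domain ratings into an overall judgement:
    'high' if any domain is rated high, 'low' only if all domains are
    rated low, and 'unclear' otherwise."""
    ratings = set(domain_ratings.values())
    if "high" in ratings:
        return "high"
    if ratings == {"low"}:
        return "low"
    return "unclear"
```

For example, a study rated low in three domains but unclear in the analysis domain would receive an overall judgement of "unclear".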
The TRIPOD statement [21,26], which provides a checklist of 22 items considered indispensable for the transparent reporting of a prediction model study, will be used to evaluate the transparency of reporting of the included studies. Each element of the TRIPOD statement is answered with "yes" or "no", depending on whether the element was reported in the study. Each "yes" answer receives 1 point, and each "no" answer receives 0 points. Generally, when several aspects are described within a TRIPOD item, all aspects have to be reported to obtain a point for that item. Elements that do not apply to a specific situation can be marked as "not applicable" [27]. We will report the overall TRIPOD adherence score of each study, which was developed to assess uniformity in measuring adherence to the TRIPOD statement. The TRIPOD adherence score is calculated by dividing the number of adhered TRIPOD items by the total number of applicable TRIPOD items [28].
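The adherence score computation can be sketched in a few lines (a hypothetical helper; the item identifiers and mapping format are ours, while the score definition follows the cited method [28]):

```python
def tripod_adherence_score(item_answers):
    """Compute the overall TRIPOD adherence score for one study.

    item_answers maps each TRIPOD item to True (reported), False (not
    reported), or None (not applicable). The score is the number of
    adhered items divided by the number of applicable items."""
    applicable = [v for v in item_answers.values() if v is not None]
    if not applicable:
        raise ValueError("no applicable TRIPOD items")
    return sum(applicable) / len(applicable)
```

For instance, a study adhering to 15 items, failing 5, and having 2 items marked "not applicable" would score 15/20 = 0.75 (75%).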
To ensure consistency in evaluating the parameters, reviewers will participate in pre-study training before the critical appraisal. Any disagreement will be handled as described previously.

Qualitative data synthesis of the prediction models
Characteristics of the included studies will be described systematically in a narrative synthesis, and quantitative data from the included studies will be presented. Key findings, such as outcomes to be predicted, predictors, performance measures, and the predictive accuracy of each model, will be tabulated to facilitate comparisons. We will report uncertainty measures as they were published or approximate them using published methods [20].

Preparation for quantitative synthesis
Quantitative synthesis of the predictive performance of the included models will be based on the reported performance measures and their precision [20]. However, the types of performance measures vary across studies and are often inconsistent or not reported at all. We will focus on discrimination and calibration, the two most common performance measures of predictive models. Discrimination refers to a prediction model's ability to distinguish between patients who do and do not develop the target outcome and is often quantified by the concordance (C) statistic [29]. When missing, the C-statistic will be estimated from the reported measures, if necessary [20]. Calibration refers to the accuracy of a model's predicted risk probabilities and indicates the degree to which expected and observed outcomes agree with each other [29]. Previous studies offer different statistical measures for calibration, such as the calibration plot, calibration slope, Hosmer-Lemeshow test, and others. If data on the total numbers of expected (E) and observed (O) events are available for extraction among the included models, the O:E ratio will be used for further analysis, providing a rough indication of overall model calibration [30]. Extracted C-statistics and total O:E ratios will be rescaled before meta-analysis to improve the validity of the underlying assumptions, according to statistical models provided in the literature [31,32].
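To illustrate the rescaling step: C-statistics are commonly logit-transformed (with a delta-method standard error) and total O:E ratios log-transformed (with an approximate Poisson standard error) before pooling, in line with the cited guidance [31,32]. The function names are our own, and the formulas are a simplified sketch of that approach:

```python
import math

def logit_c_statistic(c, se_c):
    """Logit-transform a C-statistic so pooling is done on a scale
    closer to normality; the standard error of the transformed value
    is approximated via the delta method."""
    logit_c = math.log(c / (1 - c))
    se_logit = se_c / (c * (1 - c))  # delta method
    return logit_c, se_logit

def log_oe_ratio(observed, expected):
    """Log-transform a total O:E ratio; an approximate standard error
    follows from treating the observed event count as Poisson."""
    log_oe = math.log(observed / expected)
    se_log_oe = math.sqrt(1 / observed)  # Poisson approximation
    return log_oe, se_log_oe
```

For example, a reported C-statistic of 0.75 (SE 0.02) maps to logit scale as log(3) ≈ 1.10 with SE 0.02/(0.75 × 0.25) ≈ 0.107; 50 observed versus 40 expected events gives log(1.25) ≈ 0.223.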

Quantitative syntheses
The nature of the quantitative analyses will depend on the number of prediction models in the included studies. Derivation and validation of models will be considered separately. Data will be synthesized in a meta-analysis if a subset addressing an analogous clinical question contains enough studies (at least 5) [33]. A meta-analysis of validation studies of a common prediction model will also be performed if feasible.

Meta-analyses and investigation of heterogeneity
Performance measures of discrimination and calibration will be analyzed with a random-effects meta-analysis model to estimate the average performance of all included models, if appropriate [20]. The meta-analysis will be performed using Stata 16.0 (Stata Corp, College Station, TX, USA) with several packages. To better handle the uncertainty in the estimated between-study heterogeneity, we will adopt restricted maximum likelihood (REML) estimation and use the Hartung-Knapp-Sidik-Jonkman (HKSJ) method when calculating 95% confidence intervals for the average performance [34,35].
Heterogeneity in the pooled results usually depends on differences in design and populations across the validation studies, such as case-mix variation or differences in baseline risk [36,37]. The case-mix variation of each study will be quantified by estimating the standard deviation of the linear predictor [20]. Cochran's Q and the I² statistic will be calculated for the statistical assessment of heterogeneity [38]. Potential sources of heterogeneity will be explored using meta-regression analyses if enough studies are included in the meta-analyses (≥10 studies) [39,40].
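To make the pooling step concrete, the sketch below pools transformed study-level estimates (e.g., logit C-statistics) with a random-effects model. For simplicity it uses the closed-form DerSimonian-Laird estimate of the between-study variance rather than the iterative REML estimate the protocol specifies (Stata handles the latter), but it does apply the Hartung-Knapp variance adjustment and reports Cochran's Q and I². All names are illustrative:

```python
import math

def random_effects_meta(estimates, variances):
    """Random-effects pooling of k >= 2 study estimates.

    Returns (pooled mean, Hartung-Knapp SE, Cochran's Q, I^2 in %).
    A 95% CI would use the t distribution with k-1 degrees of freedom:
    mu +/- t_{k-1, 0.975} * se."""
    k = len(estimates)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                     # DerSimonian-Laird
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    # Hartung-Knapp adjusted variance of the pooled estimate
    var_hk = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w_star, estimates)) / (
        (k - 1) * sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return mu, math.sqrt(var_hk), q, i2
```

With two equally precise studies at 0.0 and 2.0 (both variances 1.0), the pooled mean is 1.0, Q = 2.0, and I² = 50%.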

Subgroup analysis
If the number of included studies is large enough for specific subgroups, we will perform the following subgroup analyses: (1) type of prediction model; (2) source of data; (3) outcome to be predicted (mortality and readmission); (4) method of predictive model building; (5) location; (6) timing (at which time point and over what period the outcome is predicted); (7) setting; and (8) others. Further subgroup analysis will depend on the final data extraction.

Sensitivity analysis
Sensitivity analyses will be conducted by excluding studies with a high risk of bias (at least 4/7 domains, determined using the PROBAST tool) and studies with low reporting transparency (overall TRIPOD adherence score <50%), to explore their influence on effect size.

Reporting and dissemination
We will report our review per the guidance of the PRISMA statement [41]. The GRADE approach (Grading of Recommendations, Assessment, Development, and Evaluation) [42] will be used to appraise confidence in estimates. Informed consent and ethical approval are not required because all data will be acquired from published studies. The results of the study will be published in peer-reviewed journals and presented at conferences if possible. Any important protocol modifications will be presented and made available on the PROSPERO registration.

Discussion
This systematic review protocol will identify existing prognostic models for mortality and readmission in patients with AHF. These prognostic models will be comprehensively summarized and critically appraised, and a meta-analysis of their performance, compared across pre-defined subgroups, will be conducted if appropriate.
The methodological quality (including applicability and risk of bias) and reporting transparency of the available reviews seem to be suboptimal [30,43]. Good methodological quality and transparent reporting are essential for clinicians to accurately judge the performance and applicability of prognostic models in real clinical AHF settings for individualized prediction. Without these criteria, the use of prediction models will be limited due to lack of availability, and can sometimes even be misleading [44,45]. Importantly, the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement [21] and the Prediction model Risk Of Bias Assessment Tool (PROBAST) [23] were published in 2015 and 2019, respectively, to improve the reporting transparency and methodological quality of prediction model studies. A 2020 systematic review of risk prediction models in patients with AHF focused on models used in the ED setting; however, models applied exclusively in hospitalized patients were excluded from that review [25].
These aforementioned limitations compromise the relevance of the models and the comprehensiveness of the reviews. Our review will include models used in both ED and inpatient settings and will compare the performance of the two groups with meta-analysis if feasible. Thus, our review will complement the existing reviews in this area. Additionally, we will assess risk of bias and applicability using PROBAST, and adherence to the reporting guideline using TRIPOD, which have not been previously reported in reviews. The study may provide a structural reference for data extraction and critical appraisal in future reviews. Finally, our review will use formal training to standardize article selection, data extraction, and assessment.
We acknowledge several limitations of our review protocol. First, the included studies may report model performance using different kinds of measures, and the procedure of merging data may introduce additional bias. Second, there may be too few studies meeting our eligibility criteria to support further meta-analyses. Third, we will exclude models focusing purely on outcomes other than mortality and hospitalization, such as adverse events [46]. Future studies may extend the scope to models for other kinds of outcomes and review all AHF prediction models.
Ultimately, our systematic review will identify prognostic models, which may help clinicians better allocate and utilize medical resources and contribute to more precise, risk-adjusted management, allowing personalized prevention and therapeutic options while decreasing mortality and hospital readmission in AHF.

Table 1. Framing of this systematic review using key items identified by the CHARMS checklist [22].

Availability of data and materials
Not applicable.

Competing interests
The authors declare that the research was conducted without any commercial or financial relationships that could be considered as potential conflicts of interest.

Funding
The study is supported by the National Key R&D Program of China (2017YFC1700400, 2017YFC1700403) and the National Natural Science Foundation of China (82174233). The funders played no role in the research.

Author Contributions
HS, YL and XZ conceived the study. KZ, HD, JZ and YC contributed to the design study. YL advised on statistical analysis. XZ drafted the manuscript. All authors provided suggestions during the drafting of manuscript.