We reported the current study according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement.[12]
Source of data: design and sample of the IBenC study
We conducted a comparative validation study by using data of the cross-European “Identifying best practices for care-dependent elderly by Benchmarking Costs and outcomes of community care” (IBenC) study.[13]
Data collection for the IBenC study was performed between January 2014 and August 2016 in six European countries: Belgium, Finland, Germany, Iceland, Italy and the Netherlands. Participants were home care recipients expected to receive care for 6 more months. Other selection criteria can be found elsewhere.[13] For this study we used data from baseline and the 6-month follow-up.
Data collection of the IBenC cohort
Data on care-recipient characteristics and resource utilization were collected with the interRAI Home Care (interRAI-HC) instrument.[14] The interRAI-HC contains about 300 items, covering domains of function, cognition, health, social support and service use, with good to excellent interrater reliabilities.[14] Trained (research) nurses collected the data at the residences of home care recipients. All available data sources were used: patient interviews, care files, observations, and information obtained from informal and formal caregivers.[13]
Outcome measures
The dichotomous outcomes of this study were the presence of “unplanned hospital admissions”, “ED visits”, and “any unplanned hospital visits” from three to six months after baseline. This timeframe was chosen because the interRAI-HC assesses hospital admissions and ED visits over the 90 days prior to the follow-up assessment.
Study population and loss to follow-up
At baseline, the IBenC cohort consisted of 2656 home care recipients. After six months, 347 participants (13.1%) were lost to follow-up (see Supplementary Figure 1, Additional File 1). Participants with missing outcomes because of death or a nursing home admission had a significantly higher age, more comorbidities and more functional impairments than care recipients with an available outcome (data not shown). Moreover, these participants did not match the target population for which the risk scores were developed, and were therefore excluded (n=210).[15] For the remaining missing data (n=137), multiple imputation (MI) was applied, resulting in a total sample of 2446 cases for this study.[16]
Risk scores
We calculated seven risk scores to predict unplanned hospitalizations or ED visits in the IBenC study sample. A detailed description of the risk scores and their use within the IBenC data can be found in Additional File 2.
Four of the risk scores were developed to predict hospital admissions or ED visits in older people specifically. We selected these risk scores because of their accurate predictive value after validation and/or their applicability within the IBenC data. The risk scores are listed below:
- Detection of Indicators and Vulnerabilities for Emergency Room Trips (DIVERT) scale[4]
The DIVERT is a prognostic case-finding tool for ED use within 6 months. The tool was derived and internally validated using routinely collected data from interRAI-HC assessments in home healthcare recipients in Canada. More than 80% of home care recipients were aged ≥ 65. ED use was assessed through electronic records.
- The Community Assessment Risk Screen (CARS)[5]
The CARS is a tool developed in Illinois (USA) for stratifying community-dwelling older adults at risk for hospitalizations or ED visits within 12 months. The development cohort consisted of Medicare fee-for-service patients and the tool was externally validated in a cohort of individuals enrolled in a Medicare Risk Demonstration. All participants were aged ≥ 65. Data were obtained through telephone interviews and mailed questionnaires. Health care utilization was mainly determined from claims files.
- The Emergency Admission Risk Likelihood Index (EARLI)[6]
The EARLI is a tool to predict the likelihood of emergency hospital admission within 12 months. Data of the development and (external) validation cohorts came from questionnaires sent to older people aged ≥ 75 registered with general practices in north-west England. Emergency hospitalizations were determined through administrative and clinical data.
- The Previous Acute Admissions (PAA) score
Prior hospital visits are considered an important predictor of unplanned hospitalizations[17, 18] and appear as an item in all of the above risk scores. Because of this predictive potential combined with easy applicability, we decided to assess the performance of the PAA-score as a stand-alone risk score. The PAA-score is a discrete measure, based on interRAI-HC data, that counts the number of unplanned ED visits and hospitalizations in the past 90 days.
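Since the PAA-score is simply a count of recent unplanned hospital contacts, it can be sketched in a few lines. This is a hypothetical Python illustration; the argument names are assumed for clarity and are not taken from the interRAI-HC item specification:

```python
def paa_score(unplanned_hospitalizations_90d: int, ed_visits_90d: int) -> int:
    """Sum of unplanned hospitalizations and ED visits in the past 90 days.

    Illustrative sketch of the PAA-score as described in the text; the
    argument names are invented, not interRAI-HC item codes.
    """
    return unplanned_hospitalizations_90d + ed_visits_90d
```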
In addition, we computed three generic frailty indicators in the IBenC data:
- The MDS Changes in Health, End-stage disease and Symptoms and Signs (CHESS) scale[19]
The CHESS was developed using routinely collected data and was designed to identify health instability (i.e. mortality and hospitalizations) within 30 days in long-term care residents. The development cohort consisted of Medicare beneficiaries aged 65 and over, newly admitted to a nursing home. The scale was temporally validated in a cohort admitted one year later and was also tested in long-stay nursing home residents. Death and hospitalizations were obtained from medical files.
- Fried’s Frailty Criteria (FFC)[20, 21]
The FFC was developed to define a phenotype of frailty based on 5 criteria. The criteria were based on a prospective study of adults aged 65 and older, and were validated in community-dwelling older women. For this study, we applied the Bandeen-Roche specifications to operationalize the criterion ‘Weakness’.[21]
- The Frailty Index (FI)[22, 23]
The FI developed by Rockwood et al.[24] is based on an accumulation-of-deficits approach. The FI is calculated as the proportion of potential deficits present and therefore ranges from 0 to 1. For this study we combined the FIs developed by Armstrong et al. and Lutomski et al., resulting in an FI of 44 deficits.
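As an illustration of the accumulation-of-deficits approach, an FI can be computed as the proportion of deficits present. A minimal Python sketch, assuming binary-coded deficits and excluding missing items from the denominator (a common, though not universal, convention; the study's exact handling of missing deficits is not specified here):

```python
def frailty_index(deficits):
    """Proportion of deficits present.

    deficits: list of 0/1 flags (deficit absent/present), with None for
    missing items. Missing items are dropped from the denominator
    (an assumed convention for this sketch).
    """
    observed = [d for d in deficits if d is not None]
    if not observed:
        return None  # no information available
    return sum(observed) / len(observed)
```

For a 44-item FI, a person with 11 deficits present and all items observed would score 11/44 = 0.25.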
Statistical analysis
Descriptive statistics were computed for baseline characteristics and main outcomes. Loss-to-follow-up analyses were performed using logistic regression. P-values <0.05 were considered statistically significant. All variables with missing data were handled through MI by chained equations (m=5) (Additional File 1).[25] We compared two MI procedures: a multilevel method with country as the cluster variable, and a standard MI method that included country as a covariate. The multilevel-imputed dataset[25] was used for the primary results.
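The study performed MI with the mice package in R. As a rough, hypothetical analogue, chained-equations imputation producing m = 5 completed datasets can be sketched in Python with scikit-learn's IterativeImputer (illustrative simulated data, not the IBenC variables; the multilevel clustering by country is not reproduced here):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Simulated data with ~10% missingness, standing in for the study variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[rng.random(X.shape) < 0.1] = np.nan

# Draw m = 5 completed datasets; sample_posterior=True adds the
# between-imputation variability that proper MI requires.
imputations = [
    IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X)
    for m in range(5)
]
```

Analyses would then be run on each completed dataset and the estimates pooled with Rubin's rules, which in the study was done with the R packages listed below.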
We used the original scoring systems to compute the risk scores and used these scores to assess their performance. Performance of the risk scores was evaluated in terms of discrimination and calibration. Discrimination is the ability of a risk score to differentiate between participants with and without the outcome. This was estimated with the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals.
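On toy data, an AUC with a percentile-bootstrap 95% confidence interval (one common way to obtain the interval; the original analysis may have used a different method, such as DeLong's) can be sketched as:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic outcome and risk score, standing in for a real risk score.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=500)             # observed outcome (0/1)
score = y + rng.normal(scale=1.5, size=500)  # noisy score correlated with y

auc = roc_auc_score(y, score)

# Percentile bootstrap for the 95% CI.
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y), size=len(y))
    if len(set(y[idx])) < 2:                 # AUC needs both classes present
        continue
    boot.append(roc_auc_score(y[idx], score[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
```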
Calibration reflects the agreement between observed and predicted values. Calibration was inspected graphically with calibration plots.
For calibration of the risk scores developed using logistic regression (i.e. CARS and EARLI), we used the coefficients reported in the original publications and estimated the intercept in the IBenC data, since the intercepts were not reported in the original publications. The intercept and coefficients of the PAA-score were estimated entirely in the IBenC data. The calibration plots of these three risk scores were constructed for all three outcomes.
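Keeping published coefficients fixed while re-estimating only the intercept amounts to fitting an intercept-only logistic model with the published linear predictor as an offset. A hypothetical Python sketch with simulated data (the coefficients are invented for illustration and are not the CARS or EARLI weights):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit

# Simulated predictors and an invented "published" coefficient vector.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
published_beta = np.array([0.8, -0.5, 0.3])

offset = X @ published_beta                  # fixed linear predictor
y = rng.binomial(1, expit(offset - 1.0))     # simulate outcomes (true intercept -1)

def neg_loglik(alpha):
    """Negative Bernoulli log-likelihood with the intercept as the only free parameter."""
    p = expit(alpha + offset)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

alpha_hat = minimize_scalar(neg_loglik, bounds=(-10, 10), method="bounded").x
p_hat = expit(alpha_hat + offset)            # recalibrated predicted risks
```

The recalibrated predictions `p_hat` would then be binned and plotted against observed event rates to produce the calibration plots.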
DIVERT and CHESS were not based on logistic regression. We therefore used the observed proportion of the outcome reported in the original publications (i.e. ED visits and hospital admissions, respectively) within each risk category (1-6 and 0-5, respectively) as the expected proportion for that risk category in the IBenC data. This could only be done when the outcomes matched.
Because the FFC and FI were neither developed with logistic regression nor predicted an identical outcome, calibration measures could not be assessed for them.
Statistical analyses were performed with SPSS version 26.0 and RStudio version 1.1.463. For MI and pooling analyses in R, we used the packages mice, miceadds, micemd and psfmi.