Predictors and prognostic models for early discharge planning of hospitalized acute geriatric patients, a retrospective study

Motivation: When treated for an acute disorder, older adults are vulnerable to functional losses and to a need for care after discharge. In a specialised geriatric ward, patients receive a comprehensive treatment complementary to medical care in order to maintain and improve mobility and activities of daily living (ADL) and to facilitate the return to their domestic environment. The aim of this paper is to identify the relevant predictors for the impact of geriatric treatment on the status at discharge, which are then used in logistic models to predict a patient's potential to reach a certain level of independence during treatment.
Method: In a retrospective cohort study with 580 patients, we analysed the impact of acute geriatric early rehabilitation on the functional outcome after treatment. As the endpoint indicating a sufficient improvement of ADLs and mobility, we defined at least 60 Barthel points (ADL) and the ability to perform the Timed-Up-and-Go-Test (TUG) at discharge from acute hospital care. To identify relevant predictors in the set of screening assessments at admission, we used linear and logistic regressions as well as odds ratios. Multivariate logistic models are used to predict the probability that a patient reaches the endpoint. Their predictive quality is tested on an additional 120 test patients from a different cohort.
Results: Statistical analysis shows that all patients improved significantly in ADLs and physical function (TUG) during early rehabilitation. Barthel score, walking distance and handgrip strength on admission are the strongest predictors for the outcome after geriatric treatment. Logistic models predict the outcome correctly in 70% to 80% of the cases. These models, once established for a certain cohort of patients, can be applied to other data sets.


Background
Older adults are the largest group of hospitalized patients. In western countries up to 50% [1] of the admitted persons are 65 years or older. Most older adults live at home (74% in Germany and 67.5% in OECD25 [2]). However, older patients in hospitals face severe risks of functional decline. Early rehabilitation of the elderly along with clinical treatment in specialized acute geriatric wards, often labeled Acute Care for the Elderly (ACE) units, minimizes these risks. Such treatment is an important and useful strategy to facilitate a return to the patients' previous social and domestic life [1,3,4]. However, early rehabilitation patients typically need support or even institutional care after discharge. Hence, discharge planning is an important task, which should start as early as possible. To aid this process, a prognostic tool that quantifies the potential of a patient on the basis of geriatric assessments at admission would be helpful.
To explore the potential of a patient we need a useful and efficient outcome measure. There is a wide variety of outcome measures reported in the literature [5], where mobility and the Barthel score for ADLs were found to be most sensitive to the benefits of early rehabilitation in acute care. Hence, we use as a first outcome target a Barthel score of at least 60 points, which is the approved level of independence for acceptance in a further rehabilitation unit. As independence requires a certain capability to stand up and walk, the second outcome target is the ability to perform the Timed-Up-and-Go-Test (TUG). Together, both outcome targets form a meaningful endpoint. To identify strong predictors for a positive outcome, we analyze the scores from standard geriatric assessments at admission. Logistic models are formed from the predictors showing the largest effect sizes in order to predict the probability of meeting the outcome target. These prognostic models are tested on different cohorts of patients to analyze their predictive power.
The paper is organized as follows. First, we describe the procedures and participants of this retrospective study and the geriatric assessments, and give an overview of the statistical methods and a descriptive statistical listing of the data. As our endpoint contains the Barthel score and the TUG as outcome measures, we first analyze the dependencies of the Barthel score at discharge (BD) and the TUG at discharge (TUGd) on the admission assessments separately. We then extend this analysis to the combined outcome measure as a suitable endpoint and develop a predictive model for the prognosis of a patient's potential to reach the endpoint. We describe the overall model quality using sensitivity, specificity and the area under the ROC curve. To test the quality of the models, we apply them to data of a different cohort of test patients.

Procedures and Participants
Participants in this retrospective cross-sectional study were 700 randomly chosen patients of the acute geriatric ward of a medium-sized hospital in a rural area in Germany, admitted between 2009 and 2019. Data of 580 patients are analyzed, and the 351 of them with a Barthel score at admission (BA) below 60 are used for model development. Another group of 120 patients, treated in 2018 and 2019, forms the test cohort.
Patients are typically aged > 70 years, have multiple chronic illnesses and are admitted because of an acute disorder. They were transferred to the geriatric ward either from the surgical or internal ward of the hospital or from other nearby hospitals without a geriatric unit. The disorders treated were internal (apoplexy, cardiovascular disorders, diabetes, pneumonia, infections) or surgical (fractures of the femoral neck, pelvis, vertebral bodies and humerus, treated surgically or conservatively); some patients were convalescents after abdominal or cardiac surgery.
During their stay (typically 14 days) patients get, complementary to acute medical care, a comprehensive training in order to maintain and improve mobility and activities of daily living. The rehabilitative training comprises intense activating daily care, physiotherapy, occupational therapy, physical exercise, cognitive training and if needed speech therapy and psychological consulting. The training is tailored to meet the specific individual deficits of a patient. Ten multidisciplinary therapeutic training sessions are scheduled per week. Staff-members (nursing, therapists and doctors) in the geriatric ward have a special geriatric qualification and meet regularly to discuss the development of the patients.

Randomization
Participants in this study were chosen randomly by picking arbitrary hospital registration numbers of patients admitted to the geriatric ward from 2009 to 2019. Data of 580 patients form the basic cohort used for statistical analysis, and 351 data sets with BS < 60 were used for model development. Another 120 randomly chosen patients with BS < 60 form a test cohort used to validate the predictive power of the model. To compensate for bias and outliers in the randomly chosen cohorts, bootstrap resampling was applied to derive 95% confidence intervals.

Measures
For the whole set of treatment, the hospital gets a fixed remuneration under certain conditions: Specially trained staff, adequate equipment and furnishing of the ward, size of the hospital rooms and specific procedures. In particular, the testing of mobility, ADLs, cognition and emotion on admission and ADLs and mobility at discharge is mandatory. The specific tests were selected by the hospital and then accredited by the federal geriatric association.
The following tests were applied at admission: Barthel-Score (BS), Timed-up and Go-Test (TUG), walking distance (WD) and handgrip strength (HG), Shulman's Clock-Drawing-Test (CDT) and Mini-Mental State Examination (MMSE) and the geriatric depression scale (GDS-15). Tests for physical function (TUG, HG, WD) and BS were repeated before discharge.
The Barthel-Score (BS) is a marker for the performance in ADL. A higher score is associated with a greater likelihood of being able to live with a certain degree of independence. According to ICD-10, Barthel-Scores are clustered in 5 intervals. A score 60-75 indicates a medium sized impairment. In the three intervals below 60 there is growing dysfunctionality with falling Barthel-Scores. There is none or only a slight impairment for scores from 80 to 95.
TUG test results indicate fall risk and measure the impairment of mobility by the time in seconds required to stand up from a chair, walk 3 meters, turn around, walk back and sit down again. TUG performance is thus a quantitative indicator for physical functioning, such as frailty and fall risk, but is also a useful indicator for cognition [11,12,22,25,29]. A TUG time of more than 12 seconds is predictive of future falls [24].
Hand grip strength correlates positively with overall physical performance and has a predictive validity for decline in cognition and mobility [6,13,14]. Handgrip was measured in kPa using the Vigorimeter (KLS Martin), which is as reliable as the JAMAR Dynamometer [15]. In the present study, we consider the averaged pressure of both hands.
The walking distance (WD) is an indicator for physical function, cognition and mental status as well as their interrelationship [16,17]. This test thus accounts for three important aspects of overall functioning. We measure the untimed walking distance in meters.
For the assessment of cognition, we use the Mini-Mental State Examination (MMSE) [18] and Shulman's Clock-Drawing-Test (CDT) [19,20]. The MMSE is a screening test for dementia with a maximum score of 30 points. The threshold for no or negligible impairment is 24 points. Below this threshold there are three categories: mild (19-23 points), moderate (10-18 points) and severe deficits (9 points or fewer). In the Clock-Drawing-Test, patients have to draw the dial of an analogue clock with the hands set to a certain time: 10 past 11. The outcome is assessed on a 6-point inverted scale: 3 or more points indicate cognitive impairment.
The score for the Geriatric Depression Scale (GDS) is determined via individual self-assessment of the patient [21]. 15 questions have to be answered by yes/no. From a scoring grid with one point per answer, the severity of depression can be assessed. Below 5 points there is no depression, a mild depression is to be expected between 5 to 10 points. 11 to 15 points indicate a severe depression. This test is not possible in the case of dementia.
Besides the screening tests, the following characteristics, taken from the hospital records, were included for each patient in the analysis: age, gender, duration of stay in days and referral from the surgical or medical department.

Endpoint and Outcome measures
Heldmann and coworkers [5] analysed outcome measures in a meta-analysis for acute geriatric early rehabilitation. ADL-capacity (BS) is a typical indicator, substantiated in many cases by mobility criteria.
In our view, independent or moderately assisted living after treatment requires a certain capability in ADLs plus basic mobility, e.g. the transfer from a passive state (i.e. sitting) to active movement (i.e. walking) and vice versa, while maintaining a low fall risk. As the endpoint indicating a sufficient improvement of ADLs and mobility, we define at least 60 Barthel points plus the ability to perform an untimed TUG.
A drawback of the timed TUG test for hospitalized patients is that many are not able to stand up and are therefore unable to perform the timed test. Hence, a certain improvement in seconds at discharge is an outcome measure with a large floor effect. Dichotomous tests (TUG possible or not), also reported in the literature [12], avoid this effect. It was shown that the timed as well as the dichotomous test are significantly associated with functional performance. Patients found able to do the TUG had a lower fall rate than those who failed. We therefore use the dichotomous test as a suitable outcome measure and monitor the TUG time as well.

Statistical methods
We use SPSS 27 [23] to perform the statistical analysis of the assessment data. A significance level of p < 0.05 was used throughout. The magnitude of an effect is measured using the standardized effect-size parameters r, f and V: r > 0.5, f > 0.4 and V > 0.5 indicate a strong effect.
The assessment data may deviate from a normal distribution. Findings from parametric tests (e.g. t-tests) are therefore validated with non-parametric tests (e.g. Wilcoxon or Mann-Whitney U tests). The chi-squared test is used to examine statistical properties of categorical variables, and odds ratios quantify the impact of dichotomous measures (e.g. gender) on the outcome of the geriatric treatment. With bootstrapping using 1000 samples, we obtain 95% ranges for the statistical findings.
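The bootstrap step can be illustrated with a minimal sketch. It assumes a simple percentile bootstrap of the mean; the analysis in this study was run in SPSS, whose exact resampling procedure may differ. The function name `bootstrap_ci` and the sample values are hypothetical.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, n_resamples=1000, level=0.95, seed=0):
    """Approximate percentile bootstrap interval for `stat` over `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Draw n_resamples samples of size n with replacement and evaluate the statistic
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lower, upper

# Hypothetical Barthel score improvements (illustrative values, not study data)
improvements = [20, 25, 10, 35, 15, 30, 20, 5, 40, 25, 15, 20]
low, high = bootstrap_ci(improvements)
print(f"mean = {statistics.mean(improvements):.1f}, 95% range = [{low:.1f}, {high:.1f}]")
```

With 1000 resamples, as used here, the interval bounds are the values near the 2.5th and 97.5th percentiles of the resampled statistic.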
Linear and logistic regressions were performed to examine the impact of the screening parameters (predictors) taken at admission (BA, HG, WD, MMSE etc.) on the outcome at discharge. Linear regression analysis not only yields optimal straight-line fits of the data but also gives the correlation of predictors with endpoints. Logistic regression defines a model for the probability P (0 ≤ P ≤ 1) of reaching a certain condition, here of meeting the endpoint. Using P = 0.5 as the classification cutoff, a binary classification, i.e. a go/no-go decision, is possible. P is a logistic function of the metric assessment scores weighted with coefficients fitted to the data. In a bivariate analysis, e.g. P(BA), BA is the only variable considered.
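Written out under the standard definition of the logistic function, the model takes the following form. The coefficient symbols $\beta_0, \dots, \beta_k$ and the generic score symbols $x_i$ are notation assumed here for illustration; they do not appear in the original text.

```latex
\[
  P(x_1,\dots,x_k) \;=\; \frac{1}{1+\exp\!\bigl(-(\beta_0 + \sum_{i=1}^{k}\beta_i x_i)\bigr)},
  \qquad 0 \le P \le 1 .
\]
% Bivariate case with BA as the only variable:
\[
  P(\mathrm{BA}) \;=\; \frac{1}{1+\exp\!\bigl(-(\beta_0 + \beta_1\,\mathrm{BA})\bigr)} .
\]
% Cutoff P = 0.5 is equivalent to a sign test on the linear predictor:
\[
  P > 0.5 \iff \beta_0 + \sum_{i=1}^{k}\beta_i x_i > 0 .
\]
```

The last line shows why the cutoff P = 0.5 amounts to a go/no-go decision on the weighted sum of the assessment scores.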
Logistic regression models for a probability, once established for a given set of data, may be applied to other data sets. If the probabilities predict the outcome in such test sets with comparable accuracy, then the models obtained from the initial data have an inherent generic predictive quality. To investigate this, we compared the clinical outcomes to the prognosis of the models for the initial data set and for 120 different test patients. Quality measures are the sensitivity and specificity of the P-models using P = 0.5 as the classification cutoff, and the ROC curves (Receiver Operating Characteristics). The ROC curve evaluates a model by plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity) of the model predictions for varied decision thresholds (i.e. P-values). A reference for no predictive quality is a random classifier, which gives a 45° line in the ROC diagram. Specifically, the area under the ROC curve (AUC) is a quantitative quality criterion. Its range is between 0 and 1, and the larger the area, the better [30]. A suitable index number for the predictive quality is twice the difference between the AUC and the area under the 45° line (0.5). This defines the Gini index, which varies between 0 and 1. The best discrimination threshold yields the maximum number of true positives and the minimum number of false positives. This threshold maximizes the Youden index, which is the sum of sensitivity and specificity minus 1. In the ROC curve, the best discriminator corresponds to the point with the largest distance from the 45° line and may be determined by placing the 45° tangent to the curve.
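The quality measures described above can be sketched in a few lines of plain Python. This is an illustration only: the outcome labels and predicted probabilities are invented, the function names are hypothetical, and the study itself obtained these quantities from SPSS.

```python
def roc_points(y_true, p_pred):
    """Return (threshold, false positive rate, true positive rate) triples."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(float("inf"), 0.0, 0.0)]  # threshold above all scores: nothing classified positive
    for t in sorted(set(p_pred), reverse=True):
        tp = sum(1 for y, p in zip(y_true, p_pred) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, p_pred) if p >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def youden_best(points):
    """Point maximizing J = sensitivity + specificity - 1 = TPR - FPR."""
    return max(points[1:], key=lambda pt: pt[2] - pt[1])

# Invented example: 1 = endpoint reached, 0 = not reached
y = [1, 1, 0, 1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

pts = roc_points(y, p)
area = auc(pts)
gini = 2.0 * (area - 0.5)  # twice the area between the ROC curve and the 45° line
t_best, fpr_best, tpr_best = youden_best(pts)
j_best = tpr_best - fpr_best
print(f"AUC = {area:.4f}, Gini = {gini:.4f}, best cutoff = {t_best}, J = {j_best:.2f}")
```

In this toy data several thresholds share the same maximal J, so `youden_best` simply returns the first of them; a plateau of this kind corresponds to a range of equally good tangent positions on the ROC curve.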

Patient data
The basic cohort of 580 patients consists of 426 women and 154 men. Of these, 291 had internal diseases and 289 had surgical problems (84 of them were treated conservatively). The screening assessments at admission are listed in Table 1.

Table 1. Screening assessments at admission (columns: scores and statistics, valid measures, range, M ± SD, median).

Predictors for Barthel-Score or TUG at discharge
The endpoint of this study consists of two components: BD and TUGd. As a first step, we investigate the predictors for BD and TUGd independently to quantify the impact of the geriatric early rehabilitation scheme on ADLs and mobility, respectively. Patients' BA scores were categorized according to the five ICD-10 groups. The improvement of the score during hospitalization is determined using the difference of means (M) before and after treatment in each category and for all n = 580 patients (Fig. 1). A paired-sample t-test for each group before and after treatment is performed, and the results are confirmed by a non-parametric Wilcoxon test (Tab. 2). The increase in BS is significant with a strong effect size in all groups (0.78 < r < 0.91). BD minus BA is around 20 points, i.e. on average patients improve their ADL score by one ICD-10 group. The groups 20 to 35 and 40 to 55 benefit most from early geriatric rehabilitation.

Predictors for BD
To find the most important predictors for BS improvement, we examined all metric scores at admission for all 580 patients (except CDT, which is categorical) with bivariate linear regressions (Table 3). F-tests prove the significance of the linear model and t-tests the significance of the coefficients (i.e. constant and slope of the linear fit). R², the coefficient of determination, is the squared Pearson correlation coefficient R, and f is Cohen's effect size. The table lists only significant models, i.e. those with an F-value large enough for p < 0.05.
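A bivariate linear regression of this kind, together with R² and the effect size, can be sketched as follows. The sketch assumes that f follows Cohen's standard definition f² = R² / (1 - R²), which the text does not state explicitly; the function name and the BA/BD values are hypothetical.

```python
import math

def simple_linear_regression(x, y):
    """Least-squares line y = intercept + slope * x, with R^2 and Cohen's f."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)        # squared Pearson correlation
    f = math.sqrt(r2 / (1.0 - r2))     # assumed definition; undefined for a perfect fit (R^2 = 1)
    return slope, intercept, r2, f

# Hypothetical Barthel scores at admission (BA) and discharge (BD)
ba = [20, 30, 40, 50, 60, 70]
bd = [45, 50, 65, 70, 80, 85]
slope, intercept, r2, f = simple_linear_regression(ba, bd)
print(f"BD = {intercept:.1f} + {slope:.2f} * BA, R^2 = {r2:.3f}, f = {f:.2f}")
```

Under this definition, the threshold f > 0.4 for a strong effect corresponds to R² above roughly 0.14.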
As expected, the Barthel score at admission (BA) has the strongest impact on the score at discharge (BD). The figure shows that BA has the highest leverage on the Barthel score at discharge. The other predictors, with similar but smaller slopes, have a moderate impact on BD and need a large vertical intercept to interpolate between the measured data. This finding is in full accordance with the effect sizes and the Pearson correlation coefficients, see Tab. 2.
TUG-test at admission and discharge:

Dichotomous variable: Logistic Regression
As TUG times at admission as well as at discharge are typically well above the cutoff value of 12 seconds, and as only 33% of the patients were able to do the test at admission, the dichotomous version of the test (TUG possible or not) is the appropriate choice. Bivariate logistic regressions are applied to identify strong predictors for such dichotomous outcome measures. χ²-tests prove the significance of the logistic model. The coefficient of determination is Nagelkerke's R², from which Cohen's f is obtained. MMSE (f = 0.305) and age (f = 0.139) establish significant models, but the effect size is medium or small, respectively. Note that WD and HG change positions in the hierarchical order as compared to the linear BD regression. In particular, the small effect size of age indicates that age is not very important for performing the TUG at discharge. Weight, GDS and hospitalization in days have a non-significant effect and hence are not considered.

Predictors for BD ≥ 60 and TUGd (endpoint) for patients with BA < 60
The endpoint of this study is the joint performance in the metric BS (BD ≥ 60) and in the dichotomous TUG (TUGd possible or not). Reaching the endpoint is meaningful only for patients whose BA is less than 60; restricting the analysis to them avoids bias. This applies to 351 of the total 580 patients.

TUG and BS in the BA < 60 group
All metric scores of these 351 patients at admission are listed in Table 5. Comparing Tab. 5 and Tab. 1 shows that the major differences (aside from BA) are found in WD, HG and in particular in TUG: the ability to perform the TUG is 194/580 = 33% in the full cohort but only 50/351 = 14% in the BA < 60 group. In total, 164 of the 351 patients reach the endpoint, but 184 (52%) could do the TUG at discharge.
It should be noted that 20 patients who could perform the TUG at discharge failed to reach BD ≥ 60, and 58 of the 222 patients with BD ≥ 60 failed to perform the TUGd. In this group (BA < 60), patients improve their BS by 21.6 ± 13.4 points (n = 351), and 129 still had a BD below 60. Patients with a successful TUG at discharge improve their BS significantly more than those who were unable to do the TUG at discharge (improvement of 27.2 ± 11.4 versus 15.4 ± 12.7 points). This difference is significant (unpaired t-test p < 0.001, r = 0.45; Mann-Whitney U test p < 0.001, r = 0.47).
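The counts reported for the BA < 60 group can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are hypothetical, and the last quantity (patients meeting neither criterion) is derived here rather than reported.

```python
n_total = 351            # patients with BA < 60
n_bd60 = 222             # patients with BD >= 60 at discharge
n_bd60_no_tug = 58       # BD >= 60 but TUG not possible at discharge
n_tug_bd_below60 = 20    # TUG possible at discharge but BD < 60

endpoint = n_bd60 - n_bd60_no_tug        # both criteria met
tug_d = endpoint + n_tug_bd_below60      # all patients able to do the TUG at discharge
bd_below60 = n_total - n_bd60            # patients still below 60 Barthel points
neither = bd_below60 - n_tug_bd_below60  # derived: neither criterion met

print(endpoint, tug_d, bd_below60, neither)
print(f"TUG possible at discharge: {100 * tug_d / n_total:.0f}%")
```

The derived values agree with the 164 patients reaching the endpoint, the 184 (52%) able to perform the TUG at discharge, and the 129 with a BD below 60.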

Table 5. Metric scores at admission for the 351 patients with BA < 60 (columns: scores and statistics, valid measures, range, M ± SD, median).
Patients meeting both endpoint criteria improve their BS significantly more, by 28.7 ± 9.9 points (n = 164), than those who fail to do so (15.3 ± 13.0 points). A t-test for unpaired groups results in p < 0.001, r = 0.50, and the Mann-Whitney U test gives p < 0.001, r = 0.69. The TUG time for patients reaching the endpoint is 24.3 s ± 9.6 s. The 20 patients with TUGd but BD < 60 had a TUG time of 29.4 s ± 10.9 s. Though the difference of means is large, the t-test is non-significant: p = 0.058.
Larger effect sizes are found for binary versions of BA, WD, MMSE and HG. CDT is on the verge of being significant, and age above or below 85 plays a non-significant role for the outcome of the treatment.

Bivariate Logistic Regression
Only the method of treatment in the surgical ward has a minor impact on the outcome. Gender is also not significant.

Age as a confounder?
To rule out age as a confounder for the dominant predictors, we analyze bivariate linear regressions for BA, WD, MMSE and HG with age as the input variable. Age is a significant predictor only for handgrip but with a small effect size. See Table 8.

Summary of bivariate analysis
The above investigations showed that only a few of the assessments at admission exhibit a significant impact with a relevant effect size on the improvement of BS, TUGd or the endpoint. Fig. 4 summarizes the effect sizes found in bivariate regression analysis. Important predictors are BA, WD, HG and MMSE.

Fig. 4 Effect sizes of significant predictors in bivariate regressions for BD, TUGd (580 patients) and reaching the endpoint (351 patients with BA < 60 at admission) as outcome variables.
Note that age, weight, GDS, gender and length of hospitalization are not significant for the endpoint. We use the important metric predictors to develop prognostic models in the following section.

Prognostic Models
Logistic regression models with appropriate predictors as variables yield probabilities (P) for reaching the endpoint. In this way, patients with the highest rehabilitation potential may be identified. Logistic models with three, two and a single one of the strong predictors identified above were derived. Table 9 shows the model parameters for the multivariate models. Parameters for bivariate models may be taken from Table 6. Fig. 5 shows, as an example, a 2D plot of the probability function P(BA, WD). We see that BA, the predictor with the largest effect size, creates the larger curvature. This is in accordance with the decreasing effect size.
Another, more practical way of visualizing probabilities is a contour plot (Fig. 6). For example, the probability P of meeting the endpoint with a certain HG and WD at admission may be read off at the intersection of the corresponding vertical and horizontal lines. If, for instance, HG = 10 kPa and WD = 40 m, the probability of reaching the endpoint is 66%, see Fig. 6. For the three-dimensional model P(BA, WD, HG), a graphical evaluation is not possible and P has to be obtained by computation.
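Such a computation can be sketched as follows. The coefficients below are hypothetical placeholders chosen only to make the example run; the fitted values are those of Table 9 and are not reproduced here. The function name `endpoint_probability` is likewise an assumption.

```python
import math

# HYPOTHETICAL coefficients for illustration only (not the fitted values of Table 9)
COEF = {"const": -4.0, "BA": 0.05, "WD": 0.03, "HG": 0.02}

def endpoint_probability(ba, wd, hg, coef=COEF):
    """Logistic model P(BA, WD, HG): probability of reaching the endpoint."""
    z = coef["const"] + coef["BA"] * ba + coef["WD"] * wd + coef["HG"] * hg
    return 1.0 / (1.0 + math.exp(-z))

# Admission scores of a fictitious patient: Barthel points, meters, kPa
p = endpoint_probability(ba=40, wd=40, hg=10)
reaches_endpoint = p > 0.5  # go/no-go decision at the cutoff P = 0.5
print(f"P = {p:.2f}, predicted to reach endpoint: {reaches_endpoint}")
```

Replacing the placeholder coefficients with fitted ones would turn this into a bedside calculation from three admission scores.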

Test of Models
As a first test we determined sensitivity, specificity and correct classification rate using P = 0.5 as the classification threshold (Table 10). If P > 0.5 then we expect the patient to reach the endpoint otherwise not. Table 10 shows the results for patients with BA < 60, left: 351 patients used for model derivation and right: 120 additional patients from the test cohort, whose screening parameters at admission were not used for model derivation. Obviously, qualitatively similar classification rates are achieved for the test-set of 120 patients as for the cohort of 351 patients.
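The three rates used in this first test can be computed from predicted probabilities and observed outcomes as in the following sketch. The data are invented for illustration and the function name is hypothetical.

```python
def classification_quality(y_true, p_pred, cutoff=0.5):
    """Sensitivity, specificity and correct classification rate at a given cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_pred):
        predicted_positive = p > cutoff  # P > cutoff: patient expected to reach the endpoint
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "correct": (tp + tn) / len(y_true),
    }

# Invented example: 1 = endpoint reached, 0 = not reached
y = [1, 1, 0, 1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

at_050 = classification_quality(y, p, cutoff=0.5)
at_057 = classification_quality(y, p, cutoff=0.57)
print(at_050)
print(at_057)
```

In this toy data a cutoff above 0.5 trades no sensitivity for a gain in specificity; whether that holds for real data depends on the model and the cohort.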
A systematic evaluation of model quality is done using ROC curves, AUCs and the relevant indices for the test set and the basic cohort (Tab. 11). Fig. 7 shows these curves for the models with the best Gini indices: P(BA, WD, HG), P(BA, WD) and P(WD). We see that in both cases the three-predictor model has the largest area under the curve and hence gives the best classification results of the three models. The optimal cutoff (Youden index) is found by inspecting the coordinates of the ROC curves. The index corresponds to the 45° tangent to the ROC curve, shown in Fig. 7 for the three-predictor model. In both plots there is no single point of tangency; rather, the tangent can be placed within a certain range. The best cutoff point for symmetrical sensitivity and specificity (351 patients: 72% and 72%; 120 patients: 78% and 78%) is indicated in the ROC curves and is found from the coordinates at P = 0.57 and P = 0.63, respectively.
Table 11 lists Youden indices, AUCs (with 95% confidence intervals) and Gini coefficients for both cohorts and for all significant models. The Gini coefficients and the AUCs show the same qualitative behavior for both cohorts, though the values for the 351-patient group are systematically smaller. The Youden indices for multivariate models suggest larger cutoffs than P = 0.5 for optimum model quality. Evidently, the classification cutoff P = 0.5 used so far favors true positives over true negatives. In the case of bivariate models, smaller cutoffs are sometimes better.

Discussion
In this work we investigated the impact of early in-patient rehabilitation along with medical treatment in a specialized acute geriatric ward. The aim of this paper was to study and predict the improvements of the patient's condition from admission to discharge using standardized screening parameters taken at admission. As outcome measures in acute geriatric care typically combine ADLs with mobility [5], we use in this study BD ≥ 60 for ADLs at discharge, which is the approved level of independence for acceptance in a further rehabilitation unit. The outcome target for mobility is a successful but dichotomous Timed-Up-and-Go-Test (TUGd = possible). TUG is not only an indicator for mobility but also correlates with cognition [22,25]. Hence, TUG performance and Barthel score, which together form the endpoint, correspond to a reasonable level of physical functioning and independence after treatment. With prognostic models we can quantify a patient's potential to reach this endpoint. We started our analysis with 580 randomly chosen patients and investigated predictors for BD ≥ 60 and TUGd separately and then in combination as the endpoint.
Descriptive statistics showed that the Barthel score was significantly higher at discharge than at admission, with a strong effect size in all ICD-10 groups. Patients starting the treatment with a BA between 20 and 55 points have the largest improvement: their BD was more than 20 points higher, which corresponds to a substantial reduction of the assistance needed after discharge. If 20 points is the average improvement, we see that 40 Barthel points at admission indicate that a patient may leave the hospital with a mild ADL impairment (BD ≥ 60).
In order to find predictors for BS improvement, we use linear regression models and classify the relevance of the significant predictors by effect sizes. As expected, the Barthel score at admission (BA) has the strongest impact. This is in accordance with the literature [26,27]. HG, WD and MMSE follow in descending order. GDS-15 and TUG time are weak predictors; however, there is a correlation between TUG and the ADLs [21]. Note that age had only a small influence on the BS improvement (BD - BA). Weight and duration of treatment were found to be non-significant for BD. These findings are in accordance with a meta-analysis [31], which showed that ADL improvement in acute geriatric care depends neither on age nor on the days of hospitalization.
TUG testing is a problem for geriatric patients because many cannot stand up, and this results in a large floor effect. The analysis of the TUG testing showed that only 194 (33%) of the patients could perform the test at admission and 406 (70%) at discharge. The number of patients with a TUG time under the 12-second limit [24] for normal mobility increased from 6 at admission to 47 at discharge. These small numbers indicate that TUG times are not suitable as criteria for the mobility of acute geriatric patients. Large floor effects are seen in the above cited meta-analysis [31] as well, where the data of almost 154,000 patients were analyzed, but only 80,700 data sets for the timed TUG were available. TUG times are supposed to be indicators for fall risk, but as shown in [12], a dichotomous TUG has the same predictive quality as the timed TUG. Moreover, typical geriatric patients will not achieve normal functional mobility after treatment, with TUG times at admission of around 30 s and an average improvement of 10 s [31]. These are approximately the same times as found for our patients: the mean TUG time improved significantly from 26.7 s to 17.9 s (n = 194). If the level of assistance needed after discharge is of interest, the dichotomous TUGd is sufficient. Less assistance is required if a patient can stand up and walk, even if the TUG time is 20 s or higher.
The important predictors for TUGd with a strong effect size are the same as found in the BD analysis: BA, WD, HG. MMSE and HG lose some importance, and WD seems to be more relevant for TUGd than HG, which is plausible, since TUG and WD are mobility indicators [16,17,24]. Age is significant but with a small effect size. Weight, GDS and hospitalization in days have non-significant effects. [27,28] and hence a strong HG favours higher BD (see Tab. 3), but is less relevant for the performance of the TUG. HG tests only static strength, whereas the TUG performance has multifactorial components and a dynamic aspect. MMSE keeps the fourth position in the effect size ranking, already found for the independent regressions for BD and TUGd. This may be related either to a moderate impact on physical function or, more likely, to the small spread of MMSE scores found in our data. Our patients typically have no relevant impairments in cognition (M = 24.3, SD = ± 5.0, median 26.0), and hence small differences in MMSE scores can only produce limited differences in the outcome.
Odds ratios for reaching the endpoint support the ranking of metric predictors by effect sizes. Significantly higher odds for reaching the endpoint were found for BA ≥ 40, WD > 20 m, MMSE ≥ 24 and HG ≥ 25 kPa. Among the surgical patients, only conservative treatment is significantly favorable as compared to an operation, and patients admitted from an internal ward have higher odds than those from a surgical ward. The effect sizes for BA and WD are medium-sized; all other odds are either not significant or exhibit weak effects. Again, BA and WD are most important, followed by MMSE and HG. Note that age < 85 and male/female yield non-significant odds.
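How an odds ratio for a dichotomized predictor is obtained from a 2x2 table can be sketched as follows. The counts are hypothetical, and the confidence interval shown is the common Wald interval on the log scale, which is an assumption made for this illustration; the 95% ranges in this study were obtained by bootstrapping.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with Wald 95% interval from a 2x2 table.

    a, b: endpoint reached / not reached with the factor present (e.g. BA >= 40)
    c, d: endpoint reached / not reached with the factor absent
    """
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper

# Hypothetical counts, not study data
ratio, lower, upper = odds_ratio(20, 10, 10, 20)
print(f"OR = {ratio:.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes 1 indicates significantly different odds between the two groups at the 5% level.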
As only elderly patients are admitted and age is a marker for functional decline, age might be expected to correlate with our strong predictors and hence to act as a confounder. However, bivariate linear regressions for BA, WD and MMSE with age as predictor result in non-significant models. HG is the only predictor that exhibits a significant correlation with age, as already reported in [6]; it declines by 0.29 kPa per year of age, which is of no clinical relevance. Hence, age is no relevant confounder for the dominant predictors. Even the oldest patients benefit from the comprehensive treatment and improve their ADLs and mobility significantly, as already noted in [31].
Logistic regression models with the most important predictors as variables yield probabilities (P) for reaching the endpoint. In this way, patients with the highest rehabilitation potential may be identified.
Prognosis in medical decision-making and the prognostic value of geriatric assessments have been investigated by several authors [7, 8, 9, 10, 29, 32]. The main focus was on survival rates after critical clinical conditions, fall risks, hospitalization and mortality, but to our knowledge no prognostic models for the outcome of acute geriatric care have been published so far. Here we derived probabilities P that a patient in an acute geriatric ward with a certain set of assessment scores in BA, HG and WD at admission reaches a well-defined endpoint in ADLs and mobility at discharge.
The predictive power of seven multivariate and bivariate models was analyzed: P(BA, HG, WD), P(BA, WD), P(BA, HG), P(HG, WD), P(BA), P(WD) and P(HG). We compared sensitivities, specificities and the correct classification rates, with P = 0.5 as the classification cut-off, for the basic set of 351 patients (BA < 60) and for a different set of 120 test patients with BA < 60 who were not used for model derivation. The classification rates for the test set and for the full cohort of 351 patients are similar, but the classification works slightly better for the test set. As the 95% confidence intervals of the AUCs overlap, this may be a random effect. The full model P(BA, HG, WD) yields the best correct classification rates, followed by P(BA, WD) as a two-predictor model. P(BA) and P(WD) are the best bivariate models. However, P(HG, WD), which is based on simple and quick assessments, yields a sufficient measure to explore the potential of a patient, e.g. by using the contour plot (Fig. 5).
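The three quantities compared above follow directly from the predicted probabilities and the observed outcomes once a cut-off is fixed. A minimal sketch, with hypothetical probabilities and outcomes that are not study data:

```python
def classification_rates(probs, outcomes, cutoff=0.5):
    """Sensitivity, specificity and correct classification rate at a cut-off."""
    tp = fp = tn = fn = 0
    for p, reached in zip(probs, outcomes):
        predicted = p >= cutoff  # model predicts that the endpoint is reached
        if predicted and reached:
            tp += 1
        elif predicted and not reached:
            fp += 1
        elif reached:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    correct = (tp + tn) / len(probs) if probs else float("nan")
    return sensitivity, specificity, correct

# Hypothetical model outputs and observed outcomes (1 = endpoint reached)
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
sens, spec, rate = classification_rates(probs, outcomes, cutoff=0.5)
print(sens, spec, rate)
```

Raising the cut-off trades sensitivity for specificity, which is why a single cut-off such as 0.5 does not characterize a model completely.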
ROC curves are frequently used [7, 29, 32] for a systematic evaluation of model quality. We see that for both cohorts the three-predictor model has the largest area under the curve and hence gives the best classification. The model with BA and WD is confirmed to be second best, and among the one-dimensional models either P(BA) or P(WD) is useful. The standard classification cut-off P = 0.5 gives suboptimal results because at this threshold the models favor positive classifications at the expense of true negatives. The best cut-off for symmetrical sensitivity and specificity, i.e. the best Youden index, is found at values larger than 0.5 for multivariate models and close to 0.5 for bivariate models. Hence, at the standard cut-off, bivariate models seem to be more specific and less sensitive than multivariate models.
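The AUC and the Youden-optimal cut-off can be computed from the same inputs. The sketch below assumes the rank-based definition of the AUC (the probability that a randomly chosen patient who reached the endpoint receives a higher P than one who did not, ties counted as one half) and the Youden index J = sensitivity + specificity − 1; the data and function names are hypothetical.

```python
def roc_points(probs, outcomes):
    """(cut-off, sensitivity, specificity) for every distinct predicted probability."""
    n_pos = sum(1 for y in outcomes if y)
    n_neg = len(outcomes) - n_pos
    points = []
    for cutoff in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, outcomes) if y and p >= cutoff)
        tn = sum(1 for p, y in zip(probs, outcomes) if not y and p < cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points

def auc(probs, outcomes):
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly."""
    pos = [p for p, y in zip(probs, outcomes) if y]
    neg = [p for p, y in zip(probs, outcomes) if not y]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def best_youden(probs, outcomes):
    """Cut-off that maximizes J = sensitivity + specificity - 1."""
    return max(roc_points(probs, outcomes), key=lambda t: t[1] + t[2] - 1.0)

# Hypothetical model outputs and observed outcomes (1 = endpoint reached)
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
area = auc(probs, outcomes)
cutoff, sens, spec = best_youden(probs, outcomes)
print(f"AUC = {area:.3f}, best cut-off = {cutoff}, J = {sens + spec - 1:.2f}")
```

The pairwise AUC computation is quadratic in the number of patients, which is unproblematic for cohorts of a few hundred cases.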
A comparison of model accuracy in terms of AUCs with other prognostic models in geriatric research implies that our models for predicting the level of independence and physical functioning show an outstanding predictive quality. Predictions of future falls on the basis of TUG times or other mobility-related criteria, for example, have AUCs in the range 0.51 to 0.67 [29, 32], whereas our models show AUCs from 0.7 to 0.89. Note that 0.5 is the AUC of a random classifier and 1.0 is the maximum value for the perfect model. Therefore, prognosis of future falls seems to be more complicated, as the limited predictive quality of the models in [29, 32] shows. However, unexpected hospitalization within a certain time period and 3- or 5-year mortality can be predicted with similar prognostic strength [7], since the AUCs reported there are similar to those in our investigation.

Limitations
In total, we investigated the data of 700 randomly chosen patients. Bias was minimized by using bootstrap resampling methods to derive 95% confidence intervals and by considering patients from a 10-year time span. The main limitation is the single-center character of the study. Moreover, the hospital is located in a rural area with an almost mono-ethnic population and a high fraction of patients who worked in farming and industry. Hence the physical background differs from that of patients in metropolitan areas. Data from other geriatric wards would be highly welcome to test our approach.
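The bootstrap idea mentioned above can be sketched as follows. This is a percentile bootstrap, which is an assumption on our part for the purpose of illustration; other variants exist, and the outcome vector, function name and number of resamples are hypothetical.

```python
import random

def bootstrap_ci(values, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    stats = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample
        resample = [values[rng.randrange(n)] for _ in range(n)]
        stats.append(statistic(resample))
    stats.sort()
    lower = stats[int(round((alpha / 2) * n_resamples))]
    upper = stats[int(round((1 - alpha / 2) * n_resamples)) - 1]
    return lower, upper

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical outcomes: 70 of 100 patients reach the endpoint (NOT study data)
reached = [1] * 70 + [0] * 30
ci_low, ci_high = bootstrap_ci(reached, mean)
print(f"proportion = {mean(reached):.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

The same function applies to any statistic of the sample, for instance an AUC, provided that predicted probabilities and outcomes are resampled together as pairs.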
Our prognostic models address the potential for independent living, but whether a patient was in fact able to return home depends on his or her social situation, which is not under the control of the hospital. Since our study is retrospective, there is no possibility of a follow-up. A longitudinal study would therefore be of great interest. These questions will be addressed in future investigations.

Conclusions
In this study, we showed that early rehabilitation in acute geriatric care significantly improves ADL capability and physical function. Whether the improvements are sufficient for a discharge to domesticity can be predicted by reliable prognostic models, which explore the potential of a patient already at admission using standardized screening parameters for the functional status.
The combination of BD ≥ 60 and the ability to perform the TUGd is a valid endpoint and a useful criterion to quantify the level of physical functioning and independence at discharge. TUGd and BD are complementary and predictive measures for the amount of assistance needed by a patient after returning to domesticity, because this combination addresses a wide range of competences such as ADLs, cognition and motor capabilities. Patients reaching the endpoint have the potential to be discharged home or to live with reduced assistance in institutionalized care or in nursing homes.
The predictive quality of screening tests at admission was analyzed using linear and logistic regressions for BD, then for TUGd and finally for their combination, which defines the endpoint. Significant predictors with strong effect size are the assessments of BA, WD, HG and, to a minor extent, MMSE. That is, the functional status at admission has the highest impact on the status at discharge. Age, the kind of disorder (internal or surgical), gender and the duration of hospitalization show no relevance. In fact, all patients benefit from the treatment of their disorders and the comprehensive rehabilitative training in the acute geriatric ward. These results correspond to the findings of surveys including more than 100,000 geriatric patients, which implies that our cohort of 580 patients represents a typical cross section.
Once the important predictors are known, we can calculate the probability that a patient reaches the endpoint using prognostic models. These models determine the potential of a patient for a discharge to home on the basis of the functional status measured at admission, quantified with BA, WD and HG.
The use of such models enables discharge planning at an early stage of treatment in hospital. Different combinations of predictors in the logistic models yield correct classification in up to 80% of the cases. The model with all three important predictors (BA, HG, WD) is the most accurate; however, (BA, WD) is almost as reliable. For practical purposes, the model that does not rely on a time-consuming ADL assessment (HG, WD) is also an efficient way to assess the potential of a patient. The bivariate models (BA) and (WD) are less sensitive than their multivariate counterparts but have a higher specificity. AUCs verify that these models have a profound predictive quality.
It was verified that the models, once established for a representative cohort, are also useful for predicting the outcome of geriatric treatment for other patients who were not considered during model development. Indeed, the prognostic models worked surprisingly well in the test cohort. This was validated by comparing ROC curves, Youden indices and Gini coefficients for the basic cohort with the results for the test patients.
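For readers less familiar with the Gini coefficient in this context, the following short sketch assumes the usual ROC-based definition, Gini = 2·AUC − 1, so that a random classifier has a Gini coefficient of 0 and a perfect one of 1. The AUC values used are the bounds of the range quoted in the discussion, not results for individual models.

```python
def gini_from_auc(auc):
    """ROC-based Gini coefficient, assuming the definition Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0

# 0.5 = random classifier, 0.7 and 0.89 = range of AUCs quoted in the text, 1.0 = perfect
for auc_value in (0.5, 0.7, 0.89, 1.0):
    print(f"AUC = {auc_value:.2f} -> Gini = {gini_from_auc(auc_value):.2f}")
```

Because the Gini coefficient is a linear rescaling of the AUC, it ranks models identically and adds no independent information.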
As a basic rule that can be applied directly in clinical practice, we find that a substantial BA score of around 40 is the best starting point for the comprehensive treatment in acute geriatric care, and it should be backed up by a walking distance of more than 25 m. Patients with these competences at admission are very likely to reach a basic level of independence at discharge. We emphasize that some parameters of the patient's status at admission, which seem to be important at first glance, lack any predictive quality. In particular, these are the clinical condition, the medical treatment before admission, length of hospitalization, weight, age and gender. They are non-predictive for the outcome of early rehabilitative care if the potential for increased mobility and independence or even a discharge home is considered. Hence, according to our results, all kinds of geriatric patients benefit from acute rehabilitative care of the elderly.