Reporting of this study was informed by the STROBE guideline for observational studies, the GRRAS guideline for reliability studies and the criteria of the COSMIN risk of bias checklist.31-34
Design and Setting
We performed a cross-sectional study of the DEMMI’s measurement properties in neurorehabilitation. This study was approved by the Local Committee for Ethics in Medical Research (Canton of Thurgau, Switzerland: 2013/13), was registered a priori (German Clinical Trials Register: DRKS00004681), and all participants gave written informed consent. Briefly, rehabilitation inpatients with neurological conditions were examined with the DEMMI and a set of functional assessments (listed below) on several occasions to analyze the DEMMI’s psychometric properties. The present study reports on the DEMMI’s structural and construct validity, internal consistency, inter-rater reliability, measurement error, interpretability, and feasibility for the complete sample of rehabilitation inpatients with neurological conditions. The DEMMI’s measurement properties for sub-samples with stroke and PD have been published previously.28,29
The study was conducted in a neurological rehabilitation hospital in Switzerland, where patients were typically referred from acute hospitals, neurologist consultants, or general practitioners located in the eastern and central parts of Switzerland.
Participants
The study sample consisted of all inpatients present on May 8, 2013 or entering the rehabilitation hospital consecutively within the following 20 weeks. Inclusion criteria were a neurological disorder and an age of 18 years or older. The main exclusion criteria were severe cognitive impairment and a contraindication for mobilization (for all criteria, see Figure 1).
Procedures
Eligible participants were examined by the primary investigator (TB) in a single session of 30–45 minutes scheduled within the first 7 days after hospital admission, if possible. The DEMMI and a comprehensive set of functional assessments were performed in a standardized order (baseline).
The participants’ socio-demographic data were taken from the medical records. For common disorders, disease-specific measures were performed to describe disease severity and functional capacity. For participants with stroke, the National Institutes of Health Stroke Scale was assessed to measure the global severity of stroke symptoms.35 For participants with PD and Multiple Sclerosis, Hoehn and Yahr staging36 and the Expanded Disability Status Scale37 were completed by the hospital neurologist, respectively. In all three scales, higher scores indicate higher impairment or disease severity.
Inter-rater reliability was examined between 2 trained and experienced physiotherapists, the primary investigator (TB) and a second rater (DM). Characteristics of both raters are described elsewhere.28,29
The second rater performed the DEMMI independently in a convenience sub-sample (reliability sample). Participant selection was mainly based on the second rater’s availability (temporal resources) and on participants’ consent to perform a second study assessment. Both DEMMI assessments were performed within 2 days. To create a stable retest situation, participants were excluded if they reported a change in their physical or mental condition with respect to the first session (e.g., fatigue, pain, ON/OFF state in PD). The test environment (patient’s room) was similar for both sessions (baseline and retest). Both raters were blinded to each other’s ratings and we tried to balance the number of participants each rater visited first.
A sample size of ≥ 50 participants for reliability studies was considered “good” at the time the study was conducted.38,39 However, within the initial recruitment period (20 weeks), we could not include ≥ 50 participants for each of the major sub-samples of participants with stroke and PD. Hence, we set up a second recruitment period, using the same inclusion criteria, and screened all present and incoming patients over a period of 9 consecutive days. This additional convenience sample was only included in the inter-rater reliability analysis.
Measurements
Participants were assessed with the DEMMI, together with a set of functional assessments, including Berg Balance Scale, timed up and go test, 10-meter walk test, Functional Ambulation Categories (FAC), 6-minute walk test, Performance Oriented Mobility Assessment, and Functional Independence Measure. For the sub-samples of participants with stroke and PD, we performed additional functional assessments, which were only used to analyze these sub-samples.28,29
A detailed description of the assessment procedures and of the comparator assessments is given in Supplementary file 1. Table 1 provides an overview of the scale width and constructs measured by the comparator assessments.
DEMMI
The DEMMI is a performance-based clinical outcome assessment of mobility capacity, consisting of 15 hierarchical mobility items.15,40,20,41 The patient is asked to perform functional tasks related to bed and chair mobility, ambulation, static balance, and dynamic balance. The items are rated with 2- or 3-point response options, resulting in a maximum ordinal score of 19 points. This raw score is transformed into a total interval DEMMI score of 0–100 points, with higher scores indicating a higher level of mobility capacity.
Statistical Analysis
Data were analyzed using SPSS version 23.0 and Microsoft Excel (Professional Plus 2016) for all analyses except the Rasch analysis, which was completed using RUMM2030 version 5.1 software. Descriptive statistics were used to present sample characteristics. Interval-based data were examined for normal distribution with the Shapiro-Wilk test of normality and by visual inspection of the related histograms and P-P plots. The DEMMI scores were not normally distributed (p < 0.001); therefore, only non-parametric statistics were applied. A significance level of 5% was used.
Measurement properties
Structural validity (Rasch analysis)
The Rasch model is a probabilistic model asserting that item response is a logistic function of item difficulty and person ability.16 The DEMMI was developed based on the Rasch model in geriatric inpatients15 and data fitted the model in various other medical conditions.22,20,23,28,30
We performed a Rasch analysis to evaluate the following properties of the DEMMI in neurological inpatients: stochastic (probabilistic) ordering of items, monotonicity (increase in item responses consistent with the underlying trait), local item independence (zero correlation between items when conditioned on the score), unidimensionality, and group invariance (no difference in item response by group membership at the same level of the underlying trait, here ‘mobility capacity’); a violation of group invariance is referred to as differential item functioning (DIF). Data fit to the model was deemed acceptable if a set of criteria was fulfilled (Supplementary file 2). Full details of the Rasch analysis process are given elsewhere.42,17 Reporting followed established recommendations.17
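The logistic relationship described above can be sketched for a dichotomous item as follows. This is an illustration only, not part of the study's analysis (which was run in RUMM2030); the function name and the example values are hypothetical, and ability and difficulty are assumed to be expressed in logits.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability that a person passes a dichotomous item under the Rasch model.

    Both arguments are in logits (hypothetical illustrative values, not DEMMI
    calibrations). The probability depends only on their difference.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A person whose ability equals the item difficulty has a 50% chance of success.
p_equal = rasch_probability(0.0, 0.0)
# A person one logit above the item difficulty is more likely to succeed.
p_above = rasch_probability(1.0, 0.0)
```

Because only the difference between ability and difficulty enters the function, persons and items are located on the same scale, which is what allows items to be ordered hierarchically by difficulty.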
A target sample size of at least 150 was set to provide 99% confidence within ± 0.5 logits.43 The unrestricted (partial credit) Rasch polytomous model was used with a conditional pair-wise parameter estimation.
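For items with more than two response options, the partial credit model assigns a probability to each response category. The sketch below shows only the model's response function in its standard textbook form; it does not reproduce the conditional pair-wise estimation performed by RUMM2030. The function name and threshold values are hypothetical.

```python
import math


def pcm_category_probabilities(ability, thresholds):
    """Category probabilities for one polytomous item (partial credit model).

    ability    -- person location in logits
    thresholds -- item thresholds in logits, one per step between adjacent
                  categories (hypothetical values, not DEMMI calibrations)
    Returns a list with one probability per category 0..len(thresholds).
    """
    # Cumulative sums of (ability - threshold); category 0 has exponent 0.
    exponents = [0.0]
    for threshold in thresholds:
        exponents.append(exponents[-1] + (ability - threshold))
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# A 3-point item (categories 0, 1, 2) with two thresholds at -1 and +1 logits.
probs = pcm_category_probabilities(0.0, [-1.0, 1.0])
```

With a single threshold the function reduces to the dichotomous Rasch model, so 2-point and 3-point items can be handled by the same expression.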
Construct validity
In the absence of a ‘gold standard’ for ‘mobility capacity’, construct validity was assessed by following the methodological approach of hypotheses testing.38,39 We used the other functional outcomes and participants’ clinical information to assess the DEMMI’s construct validity. Aspects of convergent and known-groups validity were used to formulate 15 hypotheses (H1–H15).39,44 All hypotheses were formulated a priori, based on existing literature and the clinical expertise of clinicians and the research team.30,15,45,23,20 Formulated and shortened versions of the hypotheses are presented in Supplementary file 1 and Table 1, respectively. Details on the statistical analyses and interpretation of hypotheses testing are given in Supplementary file 1. A sample size of ≥ 100 participants is recommended.46
Reliability
Cronbach’s alpha and the Person-Item-Separation Index, which are measures of internal consistency reliability in the case of a unidimensional scale, were derived from the validity sample because of its larger sample size.39 An outcome between 0.70 and 0.95 was considered acceptable.39
Inter-rater reliability was examined using the intra-class correlation coefficient (ICC) model 2.1 (two-way random effects model; ICCAGREEMENT).44 An ICC ≥ 0.70 was deemed acceptable.39 The standard error of measurement (SEMAGREEMENT) was calculated and deemed satisfactory if it was ≤ 10% of the total scale range (100 DEMMI points).47,44 The absolute and relative agreement between both raters per DEMMI item was calculated as a percentage (%) and as the weighted kappa with linear weights (κ), respectively.44 Agreement per item equal to or above 70% and κ ≥ 0.70 was considered acceptable.39 For additional information on reliability statistics, see Supplementary file 1.
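As an illustration of the internal consistency statistic named above, Cronbach's alpha can be computed from item-level scores with the standard formula. This is a minimal sketch with a hypothetical function name and made-up data; the study itself derived the coefficient with statistical software, and the Person-Item-Separation Index is not shown.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a participants x items table of scores.

    scores -- list of rows, one row of item scores per participant
              (made-up example data below, not study data).
    """
    n_items = len(scores[0])
    items = list(zip(*scores))                      # one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


def is_acceptable(alpha, lower=0.70, upper=0.95):
    """Apply the 0.70 to 0.95 acceptability range stated in the text."""
    return lower <= alpha <= upper


example = [[0, 1, 1], [1, 1, 2], [1, 2, 2], [0, 0, 1], [1, 2, 1]]
alpha = cronbach_alpha(example)
```

The upper bound of the range matters as much as the lower one: values above 0.95 are usually read as a sign of redundant items rather than of better measurement.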
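The agreement statistics above can be sketched from a two-way ANOVA decomposition. The code assumes the commonly used formulas: ICC(2,1) for absolute agreement, SEM_agreement as the square root of the rater variance plus the residual variance, and kappa with linear disagreement weights. The exact computations used in the study are described in its Supplementary file 1 and may differ in detail. All function names are hypothetical.

```python
import math


def icc_agreement_and_sem(ratings):
    """Return (ICC(2,1) for absolute agreement, SEM_agreement).

    ratings -- one row per participant, one column per rater.
    """
    n = len(ratings)        # participants
    k = len(ratings[0])     # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ms_subjects = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_raters = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc = (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )
    # Assumption: a negative rater variance estimate is truncated at zero.
    rater_variance = max(0.0, (ms_raters - ms_error) / n)
    sem = math.sqrt(rater_variance + ms_error)
    return icc, sem


def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Weighted kappa with linear weights for item scores coded 0..n_categories-1.

    Undefined (division by zero) if both raters use only one, identical category.
    """
    n = len(rater_a)
    span = n_categories - 1
    observed = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / (n * span)
    count_a = [rater_a.count(c) for c in range(n_categories)]
    count_b = [rater_b.count(c) for c in range(n_categories)]
    expected = sum(
        count_a[i] * count_b[j] * abs(i - j)
        for i in range(n_categories) for j in range(n_categories)
    ) / (n * n * span)
    return 1.0 - observed / expected


def percent_agreement(rater_a, rater_b):
    """Percentage of participants given the identical item score by both raters."""
    same = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * same / len(rater_a)
```

Under the criteria stated in the text, an ICC of at least 0.70, an SEM of at most 10 points on the 0–100 scale, and per-item agreement of at least 70% with κ of at least 0.70 would count as acceptable.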
Interpretability
Bland and Altman’s method was used to illustrate agreement between the two raters.48 The minimal detectable change (MDC) with 90% and 95% confidence was calculated for individual subjects (MDCind) as well as for comparisons of mean scores between groups (MDCgroup).49,44 A floor or ceiling effect was considered present if ≥ 15% of the participants scored the lowest or highest possible DEMMI score, respectively.39 Supplementary file 1 gives more information on the statistical methods.
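The interpretability statistics can be sketched as follows. The MDC formulas are the widely used ones (z × √2 × SEM for individuals, divided by √n for group means) and are assumed here, since the study's exact definitions are given in its Supplementary file 1. Function names and default values are hypothetical, apart from the 15% threshold and the 0–100 score range, which come from the text.

```python
import math
from statistics import mean, stdev

# Two-sided standard normal quantiles for the two confidence levels in the text.
Z_VALUES = {90: 1.645, 95: 1.96}


def mdc_individual(sem, confidence=95):
    """Minimal detectable change for an individual participant."""
    return Z_VALUES[confidence] * math.sqrt(2) * sem


def mdc_group(sem, n, confidence=95):
    """Minimal detectable change for the mean score of a group of size n."""
    return mdc_individual(sem, confidence) / math.sqrt(n)


def bland_altman_limits(rater_1, rater_2):
    """Return (lower limit, mean difference, upper limit) of agreement."""
    differences = [a - b for a, b in zip(rater_1, rater_2)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias - spread, bias, bias + spread


def floor_and_ceiling(scores, lowest=0, highest=100, threshold=0.15):
    """Return (floor effect present, ceiling effect present)."""
    n = len(scores)
    at_floor = sum(s == lowest for s in scores) / n
    at_ceiling = sum(s == highest for s in scores) / n
    return at_floor >= threshold, at_ceiling >= threshold
```

The group-level MDC shrinks with the square root of the group size, which is why changes too small to detect in a single patient can still be detected in a group comparison.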
Feasibility
We calculated the mean administration time for the DEMMI in minutes and related the administration time to the participants’ functional status. We documented any adverse events, such as falls, reports of pain, atypical and severe changes of muscle tone, or significant fatigue.