German Translation, Cultural Adaption and Validation of the Unidimensional Self-Efficacy Scale for Multiple Sclerosis Using Rasch Analysis

Self-efficacy refers to people's beliefs in their ability to perform relevant activities to achieve personal goals. In people with multiple sclerosis (PwMS), self-efficacy has been shown to significantly impact health-related behaviour. To date, no validated German-language self-efficacy scale for PwMS has been available. Therefore, the aims of this study were to translate the Unidimensional Self-Efficacy Scale for Multiple Sclerosis (USE-MS) into German, to establish face and content validity, and to culturally adapt the German version for PwMS in Austria. A further aim was to validate the German USE-MS (USE-MS-G) in PwMS using Rasch analysis.


Background
Multiple sclerosis (MS) is a chronic demyelinating disease of the central nervous system, with accumulating disability and loss of quality of life (1). It appears crucial for people with MS (PwMS) to preserve their autonomy despite functional limitations and an unpredictable disease course. Bandura's social cognitive theory proposes constructs such as personal agency, self-reflection and self-efficacy (2). Self-efficacy concerns the belief in one's capability to perform relevant tasks, overcome challenges and achieve meaningful goals. Importantly, self-efficacy is not related to people's level of functioning or their skills but to their judgement of what they can achieve (2). This implies that PwMS who are confident in their ability to master challenges and reach their goals may cope with the disease more effectively.
Higher levels of self-efficacy may enhance people's motivation to be physically active, as self-efficacy is strongly related to health-promoting behaviour and perceived quality of life in PwMS (3).
Several scales have been developed to assess self-efficacy in PwMS (4-6). The Unidimensional Self-Efficacy Scale for Multiple Sclerosis (USE-MS) (5) was developed from the Liverpool Self-efficacy Scale (LSES) (6) and the Multiple Sclerosis Self-Efficacy Scale (MSSS) (4), both of which were based on in-depth patient interviews and underpinned by Bandura's theory of self-efficacy. Only the USE-MS met the stringent criteria of Rasch analysis (7) when its psychometric properties were assessed in a large sample of PwMS. Accordingly, the USE-MS is a valid and reliable instrument for use in clinical practice and research. So far, however, no validated German language version of the USE-MS has been available. Therefore, the main purpose of this study was to translate the USE-MS into German and validate the German language version (USE-MS-G) using Rasch analysis. A further purpose was to examine differential item functioning (DIF), including DIF by language, to equate the original English USE-MS and the German USE-MS-G.

Study design and participants
This was a bi-centre, prospective, cross-sectional translation and validation study with repeated measures, consisting of Phase 1 and Phase 2. In Phase 1, the forward-backward translation, the establishment of face and content validity and the cross-cultural adaptation of the prefinal USE-MS-G were performed. In Phase 2, the construct validity and reliability of the USE-MS-G were examined. Information brochures and invitations to participate were displayed in the MS Clinic and the Clinic for Rehabilitation, published in the Austrian MS Society's patient magazine and on its website, and forwarded to MS support groups. Upon agreement, severely disabled PwMS (Expanded Disability Status Scale (EDSS) (8) ≥ 8) were visited at home to facilitate their participation. Additionally, PwMS were notified about the study by clinic staff during their regular visits. All procedures followed the tenets of the Declaration of Helsinki, and written informed consent was obtained from all participants. Research data are available on reasonable request (barbara.seebacher@i-med.ac.at).
A random cross-sectional cohort of patients with clinically definite MS according to the version of the McDonald criteria (9) valid at the time of diagnosis, and with any MS phenotype, was recruited into this study. PwMS of any ethnicity with very good German language skills, aged ≥ 18 years and with different levels of functioning were included (EDSS scores from 0 (no disability) to 9.0 (severe disability)); see (10) for a detailed study protocol.
Exclusion criteria were comorbidities potentially affecting subjective self-efficacy ratings (e.g., malignant diseases, other neurological or psychiatric disorders), a relapse of MS within 2 months prior to the study or any change in medication within 4 weeks of the study commencement. A relapse between test and retest required the exclusion of the participant.
The Austrian dataset was also pooled with a dataset from the UK development sample to test for invariance by language.
Outcome measures
Demographic (gender, age) and disease-specific data (disease duration, MS phenotype, disease-modifying treatment) were retrieved from patients' files. The current EDSS was assessed by neurologists. Questionnaire data were collected twice within a 14-21-day period (test, retest).
The original USE-MS has been shown to be reliable and valid for assessing self-efficacy in PwMS (5). It uses a 4-point Likert scale (0 = strongly disagree to 3 = strongly agree). The summary score is obtained by summing all 12 items, with items 5, 7-9 and 11 reverse scored. A higher summary score signifies stronger self-efficacy beliefs.
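The scoring rule above can be sketched as follows. This is a minimal illustration, not the scale developers' official scoring syntax. It assumes that responses are coded 0-3 as described and that reverse scoring maps a response r to 3 - r. The function name `score_use_ms` is hypothetical.

```python
# Hypothetical scoring helper for the 12-item USE-MS / USE-MS-G.
# Assumption: responses are coded 0 (strongly disagree) to 3 (strongly agree),
# and a reverse-scored response r contributes 3 - r to the total.

REVERSED_ITEMS = {5, 7, 8, 9, 11}  # 1-based item numbers that are reverse scored
N_ITEMS = 12
MAX_RESPONSE = 3


def score_use_ms(responses: list[int]) -> int:
    """Return the raw summary score (0-36) for one respondent's 12 answers."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 0 <= response <= MAX_RESPONSE:
            raise ValueError(f"item {item_number}: response {response} is outside 0-3")
        # Reverse-scored items are flipped so that a higher value always
        # means stronger self-efficacy.
        if item_number in REVERSED_ITEMS:
            total += MAX_RESPONSE - response
        else:
            total += response
    return total


# Usage: answering 3 on every item gives 7 * 3 + 5 * 0 = 21.
print(score_use_ms([3] * 12))  # 21
```

Under these assumptions, the maximum of 36 requires 3 on the seven positively keyed items and 0 on the five reverse-scored items.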
Validated questionnaires used to evaluate the external construct validity of the USE-MS-G had been recommended by governmental or patient organisations (11,12). These included the German language versions of the General Self-Efficacy Scale (GSE) (13) …
Patients were recruited until saturation was achieved, i.e., until additional interviews yielded no further information.

Phase 2
For Rasch analysis, a sample size of 243 participants has been shown to provide accurate estimates of item and person locations irrespective of scale targeting (19). Moreover, with polytomous items, ≥ 10 observations per response category are recommended (20). It is also important to collect a wide range of responses across the latent trait under consideration, i.e., self-efficacy (19).
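The ≥ 10 observations per category recommendation can be checked before fitting the model, as in the sketch below. The function `sparse_categories` is a hypothetical helper. It assumes responses are stored as one list of item responses per person, coded 0-3.

```python
from collections import Counter


def sparse_categories(responses, n_categories=4, min_count=10):
    """Map each item number (1-based) to the response categories observed
    fewer than `min_count` times; items with no sparse categories are omitted."""
    n_items = len(responses[0])
    flagged = {}
    for i in range(n_items):
        counts = Counter(row[i] for row in responses)  # missing keys count as 0
        sparse = [k for k in range(n_categories) if counts[k] < min_count]
        if sparse:
            flagged[i + 1] = sparse
    return flagged


# Usage with made-up data: item 1 uses all four categories 10 times each,
# item 2 is always answered with 0.
demo = [[k % 4, 0] for k in range(40)]
print(sparse_categories(demo))  # {2: [1, 2, 3]}
```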

Translation, face and content validity and cultural adaption
In Phase 1, a forward-backward translation process was conducted by 6 bilingual translators (3 native German speakers, 3 native English speakers). It followed guidelines for the cross-cultural adaptation of patient-reported outcomes (21,22) and an enhanced version of these guidelines from the University of Leeds, UK. The process comprised a synthesis of the translations and expert committee consensus. Pretesting (Test 1, T1) and face-to-face cognitive interviews on the questionnaire wording were carried out with male and female PwMS across the disability range. Saturation was achieved after 30 recorded interviews. Cross-cultural equivalence between the USE-MS and USE-MS-G was established in the semantic, idiomatic, experiential and conceptual domains (21,22). Qualitative content analysis of the verbatim interview transcripts was performed (described in detail elsewhere (10)). During all stages of the iterative adaptation process of the USE-MS-G, consensus was reached with the original scale developers (5).

Statistical analyses
External validity and test-retest reliability
Correlational analyses were performed between the USE-MS-G and other measures to determine convergent and discriminant construct validity. We hypothesised moderate to high positive correlations of the USE-MS-G with the GSE, RS-13 and MusiQol and moderate to high negative correlations with the HADS and NFI-MS. Spearman's rank correlation coefficients of 0.3-0.49 were considered low, 0.5-0.69 moderate and ≥ 0.7 strong (23). They were calculated with their 95% confidence intervals (CI). The p-values were corrected for multiple comparisons using the Bonferroni correction. Descriptive statistics and external validity estimates were computed using IBM SPSS software (IBM SPSS Statistics; Version 26.0. Armonk, NY: IBM Corp.) or GraphPad Prism Version 8 (GraphPad Software, La Jolla, CA). Statistical significance was defined as a two-tailed p-value < 0.05.
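The correlation step can be illustrated with a short sketch that needs no dependencies. It covers three pieces:

- Spearman's rho, computed as the Pearson correlation of tie-averaged ranks.
- A Bonferroni adjustment, i.e. p multiplied by the number of comparisons and capped at 1.
- The strength bands stated above.

The helper names are hypothetical and confidence intervals are omitted. The sketch is not a reproduction of the SPSS or Prism procedures used in the study.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


def strength(rho):
    """Bands used in the text: 0.3-0.49 low, 0.5-0.69 moderate, >= 0.7 strong."""
    r = abs(rho)
    if r >= 0.7:
        return "strong"
    if r >= 0.5:
        return "moderate"
    if r >= 0.3:
        return "low"
    return "below 0.3 (not classified in the text)"


# Usage with made-up scores: a perfectly inverse ranking gives rho = -1.
print(strength(spearman_rho([10, 14, 20, 25], [30, 22, 18, 5])))  # strong
```

With 18 comparisons, as in the correlation table, a raw p-value of 0.001 would become 0.018 after adjustment.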
Test-retest reliability was determined using Lin's concordance correlation coefficient (r_c) (24) between Test 2 (T2) and Test 3 (T3). The r_c (0-1; with 95% CI) was used to estimate the agreement between the test and retest USE-MS-G data. The Pearson correlation coefficient was calculated as a measure of precision, and the bias correction factor (C_b) as a measure of accuracy (24). MedCalc software (https://www.medcalc.org/) was used to determine the r_c.
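Lin's coefficient can be written as r_c = r × C_b. Here r is the Pearson correlation (precision) and C_b = 2 / (v + 1/v + u²) is the accuracy term, with v = s_x / s_y and u = (mean_x - mean_y) / sqrt(s_x s_y). The sketch below computes all three from paired test and retest scores using 1/n moments, as in Lin's original definition. It is an illustrative sketch based on that assumption, not MedCalc's implementation, and it omits the confidence interval.

```python
import math


def lin_ccc(x, y):
    """Return (r_c, r, C_b) for paired test/retest scores.

    Uses 1/n (population) moments:
      r   = s_xy / (s_x * s_y)                  -> precision
      C_b = 2 / (v + 1/v + u**2)                -> accuracy
            with v = s_x / s_y, u = (mean_x - mean_y) / sqrt(s_x * s_y)
      r_c = r * C_b
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    r = sxy / (sx * sy)
    v = sx / sy
    u = (mx - my) / math.sqrt(sx * sy)
    cb = 2 / (v + 1 / v + u ** 2)
    return r * cb, r, cb


# Usage with made-up scores: retest is exactly one point higher, so the
# scores are perfectly correlated (r = 1) but not in perfect agreement.
rc, r, cb = lin_ccc([1, 2, 3], [2, 3, 4])
print(round(rc, 4), round(r, 4), round(cb, 4))  # 0.5714 1.0 0.5714
```

The example shows why r alone is not enough for test-retest agreement. A systematic shift leaves r at 1 but lowers C_b, and therefore r_c.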

Internal validity: Rasch analysis
Rasch analysis uses the mathematical Rasch model to assess whether a summary score for a scale can be calculated with confidence (7). Internal construct validity of the USE-MS-G was determined by examining deviations from model expectations. These expectations describe how persons should interact with test items to produce linear measurement (7). The model expects the probability of a person giving a certain answer to an item to be a logistic function of the difference between the person's 'ability' (perceived self-efficacy) and the item's 'difficulty'. This is checked visually by inspecting item characteristic curves and numerically by analysis of variance (ANOVA) fit statistics (uniform DIF; non-uniform DIF (25)). As the USE-MS-G has 4 response categories, the polytomous Rasch model was chosen for the current study (26).
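For polytomous items, one common parameterisation is the partial credit model, which is mentioned later in this section. It gives the probability of responding in category k as proportional to exp(Σ_{j ≤ k} (θ - b_j)). Here θ is the person location and b_j are the item's threshold locations in logits. With a single threshold this reduces to the dichotomous logistic function. The sketch below evaluates these category probabilities and the expected item score, which is the item characteristic curve. The threshold values are hypothetical. The code does not estimate parameters as the study's software does; it only evaluates the model for given parameters.

```python
import math


def pcm_category_probs(theta, thresholds):
    """Category probabilities P(X = 0..m) under the partial credit model.

    theta      -- person location (logits)
    thresholds -- the item's threshold locations b_1..b_m (logits)
    P(X = k) is proportional to exp(sum_{j<=k} (theta - b_j)); the sum is 0 for k = 0.
    """
    log_numerators = [0.0]
    for b in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - b))
    top = max(log_numerators)  # subtract the maximum for numerical stability
    numerators = [math.exp(v - top) for v in log_numerators]
    total = sum(numerators)
    return [v / total for v in numerators]


def expected_score(theta, thresholds):
    """Model-expected item score at theta (the item characteristic curve)."""
    probs = pcm_category_probs(theta, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Usage: a hypothetical 4-category item (3 thresholds), as in the USE-MS-G.
item_thresholds = [-1.0, 0.0, 1.0]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(expected_score(theta, item_thresholds), 3))
```

Comparing observed mean scores per class interval with `expected_score` at the corresponding θ is the idea behind the visual item characteristic curve check described above.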
Using different chi-square (χ²) fit statistics, the USE-MS-G data were tested against the model expectation of unidimensionality. This means that 'ability' and 'difficulty' are required to relate to the same construct of self-efficacy (described in detail elsewhere (26)). Ideal values for the different fit statistics and for unidimensionality are provided in Table 3.

The expectation of local independence was examined using a residual correlation matrix of all items. Item residuals represent the difference between an item's observed and expected values, divided by its standard deviation for standardisation. Residual correlations more than 0.2 above the mean correlation of the whole matrix indicate local dependence (27). Local dependence denotes a confounding factor inducing an association between items, or multidimensionality (28,29). In the presence of item dependency, two "super-items" can be created from alternate items and compared with each other using a robust conditional chi-square test of fit (28). The proportion of common to total variance retained in a bi-factor equivalent solution corresponds to the explained common variance (ECV) (30). For a unidimensional scale, the ECV should be > 0.9, indicating that > 90% of the variance is common and retained in the latent estimate (28).

Table 3 notes:
³ The PSI indicates reliability and the differentiation of strata.
⁴ Based on independent t-tests comparing person residuals that load positively and negatively on the first principal component.
* Bonferroni-adjusted; varies with the number of items.

The property of invariance means that items are equally difficult for all participants with the same level of self-efficacy, regardless of other person characteristics (31). If certain groups of participants, e.g. males and females, respond differently to items, the assumption of invariance is violated. This is called differential item functioning (DIF) (31). … which was based upon the unrestricted or partial credit model (34).
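The sketch below shows the local-dependence rule and the ECV criterion. The code assumes a matrix of already standardised residuals, with one row per person and one column per item. It flags item pairs whose residual correlation exceeds the mean off-diagonal correlation by more than 0.2. It computes ECV with the standard bi-factor formula ECV = Σλ_G² / (Σλ_G² + Σλ_S²), where λ_G and λ_S are general- and specific-factor loadings. Function names, residuals and loadings are hypothetical. The study's software may derive ECV differently, for example from the super-item solution.

```python
import math
from itertools import combinations


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes no item has zero residual variance


def flag_local_dependence(residuals, margin=0.2):
    """Return (item_a, item_b, r) for pairs whose residual correlation exceeds
    the mean off-diagonal correlation by more than `margin`.

    residuals -- one row per person, one standardised residual per item
    """
    n_items = len(residuals[0])
    columns = [[row[i] for row in residuals] for i in range(n_items)]
    corr = {(i, j): pearson(columns[i], columns[j])
            for i, j in combinations(range(n_items), 2)}
    cutoff = sum(corr.values()) / len(corr) + margin
    return [(i + 1, j + 1, r) for (i, j), r in corr.items() if r > cutoff]


def explained_common_variance(general_loadings, specific_loadings):
    """ECV = sum(general^2) / (sum(general^2) + sum(specific^2))."""
    general = sum(loading ** 2 for loading in general_loadings)
    specific = sum(loading ** 2 for loading in specific_loadings)
    return general / (general + specific)


# Usage with made-up residuals: items 1 and 2 move together, items 3 and 4 do not.
residuals = [[1, 1, 1, 1], [-1, -1, 1, 1], [1, 1, -1, 1],
             [-1, -1, -1, 1], [1, 1, 0, -2], [-1, -1, 0, -2]]
print(flag_local_dependence(residuals))  # [(1, 2, 1.0)]
print(explained_common_variance([0.8] * 6, [0.2] * 6) > 0.9)  # True
```

The cutoff is relative to the mean correlation rather than fixed. This follows the "+ 0.2 above the mean" rule described above.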

Phase 1
The prefinal USE-MS-G resulted from the forward-backward translation procedure. … (10).
Table note: Spearman's correlation coefficients are shown with 95% confidence intervals; *** correlation is significant at the < 0.001 level (two-tailed; p-values corrected for 18 comparisons).
Phase 2: Rasch analysis
The USE-MS-G test data and the pooled UK and Austrian data were fitted to the Rasch model separately. The initial USE-MS-G data showed misfit to the Rasch model, multidimensionality and DIF by age, gender, disease duration and centre. Because local independence of the items was not given, the items were combined into two super-items (testlets), each consisting of alternate items, and fit to the bi-factor model was tested. Both testlets had the same class interval structure. The results demonstrated a latent correlation of -0.976 between the two item sets. The ECV was 0.988, which supports the view that a single score derived from the testlet items reflects respondents' self-efficacy. Fit to the Rasch model, excellent reliability and unidimensionality were shown, as presented in detail in Table 3 and Fig. 1. The fit residuals were well within the limits of ± 2.5, indicating small differences between the expected and observed values for persons and items.
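The equating t-tests underlying Fig. 1 are commonly computed in a particular way. Each person's location is estimated separately from two item subsets, and each difference is tested. The share of significant tests is then compared with 5%, either directly or through the lower bound of its confidence interval. This is a convention, not something stated in this section. The sketch below assumes the per-person estimates and standard errors are already available. It uses a normal-approximation binomial confidence interval for the proportion of significant tests. The function name is hypothetical.

```python
import math


def equating_t_tests(theta_a, se_a, theta_b, se_b, z_crit=1.96):
    """Per-person t-tests comparing locations estimated from two item subsets.

    Returns (number significant, proportion significant, (ci_low, ci_high)),
    where the 95% CI for the proportion uses the normal approximation.
    """
    n = len(theta_a)
    n_significant = 0
    for ta, sa, tb, sb in zip(theta_a, se_a, theta_b, se_b):
        t = (ta - tb) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > z_crit:
            n_significant += 1
    p = n_significant / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return n_significant, p, (max(0.0, p - half_width), min(1.0, p + half_width))


# Usage with made-up estimates: 19 persons agree across subsets, one does not.
theta_a = [0.0] * 20
theta_b = [0.0] * 19 + [3.0]
ones = [1.0] * 20
print(equating_t_tests(theta_a, ones, theta_b, ones)[:2])  # (1, 0.05)
```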
Inspection of the item characteristic curves showed classic fit to the Rasch model for both subtests (super-items). All observed scores for the five class intervals at the different 'ability' levels of self-efficacy matched the scores expected by the model (see Additional File 1). The item characteristic curves were also examined for DIF by gender (male; female), age group (18-40; 41-51; 52-58; 59+ years), disease duration group (0-10; 11-16; 17-25; 26+ years), timepoint (test; retest) and centre (Innsbruck; Münster). This examination confirmed the absence of DIF (see Additional File 2). All ANOVA results for DIF by person factor (uniform DIF) and for the person factor by class interval interaction (non-uniform DIF) were statistically non-significant. The USE-MS-G and original USE-MS data (N = 485) were also pooled and tested for invariance by language (English; German) to equate the two language versions. In the pooled dataset, there was likewise no DIF by language or centre in any of the analyses. Figure 2 presents the item characteristic curves with DIF analyses for the person factor language.
Both the threshold map and the category probability curves of the testlets (Fig. 3) showed perfectly ordered thresholds, with all scoring categories progressing in a logical order. No floor effects (0%) and 1% ceiling effects (3/309 persons) were observed. Targeting was good: persons showed a slightly higher mean level of self-efficacy than the mean of the scale, which itself showed a near-ideal distribution across the trait (Fig. 4). The standard error of measurement (SEM) was 0.439. The minimum detectable change (MDC) was 4.56 points on the original raw score range of 0-36 points. Expressed as a percentage of the scale range (MDC%), the smallest change exceeding the measurement error was 12.7%. Given fit to the Rasch model, a transformation of the raw score to an interval scale is available (Table 4).
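The MDC percentage and the floor and ceiling percentages reported above are simple ratios, for example 4.56 / 36 ≈ 12.7%. The sketch below reproduces this arithmetic on the 0-36 raw score range. The helper names are hypothetical. The scores in the usage example are constructed only to mirror the reported 3/309 ceiling count and are not study data.

```python
def mdc_percent(mdc_points, score_min=0, score_max=36):
    """MDC expressed as a percentage of the raw score range."""
    return 100 * mdc_points / (score_max - score_min)


def floor_ceiling_percent(scores, score_min=0, score_max=36):
    """Percentages of respondents at the lowest and highest possible raw score."""
    n = len(scores)
    floor = 100 * sum(s == score_min for s in scores) / n
    ceiling = 100 * sum(s == score_max for s in scores) / n
    return floor, ceiling


print(round(mdc_percent(4.56), 1))  # 12.7
# Constructed scores mirroring the reported counts: 3 of 309 at the maximum.
scores = [36] * 3 + [20] * 306
floor, ceiling = floor_ceiling_percent(scores)
print(floor, round(ceiling, 2))  # 0.0 0.97
```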

Discussion
The purpose of this study was to translate the USE-MS into German, cross-culturally adapt the prefinal USE-MS-G for Austria and validate the final USE-MS-G in PwMS across a wide range of disability. Forward-backward translation and pretesting according to guidelines are critical steps in a scale's cross-cultural adaptation, as the adapted scale is expected to reflect the latent trait under investigation (22). The study population covered the full range of MS disability levels and phenotypes. The USE-MS-G showed fit to the Rasch model, good targeting and unidimensionality. This justifies the use of a summary score and the transformation of the ordinal raw score into an interval score, which is suitable for parametric analysis. At the testlet level, no DIF was observed, and the pooled dataset showed invariance by language. This indicates that the USE-MS-G is equivalent to the original English version and appropriate for measuring self-efficacy in people with mild to severe MS.

At the individual-item level, however, the scale did not fit the model, with some misfitting items that were either over- or under-discriminating. Items 5, 7, 8 and 12 showed DIF by language. Items 5 and 12 were harder in English than in German, whereas items 7 and 8 showed the opposite pattern. At the test level, the DIF therefore cancelled out, which was confirmed by the absence of DIF in the super-item (bi-factor) solution. Low levels of item dependency were also observed, which supported employing a bi-factor approach. The bi-factor solution with alternate items allocated to the testlets mirrors the scale's use as a domain score in clinical practice. Excellent fit to the model was shown, and merely 0.0050% of the variance was lost (1 minus the A value in the summary statistics window).
The low SEM indicates good precision of the USE-MS-G. Based on this, the minimum detectable change was 12.7% of the scale range, i.e., changes of less than 4.6 points fall within the measurement error of the scale. There were no floor effects (0%) and only 1% ceiling effects, with just 3 people scoring at the top end of the scale. This suggests that the USE-MS-G can discriminate across the range of self-efficacy in PwMS. It is also sensitive to changes exceeding 4.6 points, including at both ends of the spectrum.
To our knowledge, the English USE-MS is the most rigorously developed and tested scale for assessing self-efficacy in PwMS, and it was therefore chosen for this study. This study has demonstrated the validity and reliability of the German USE-MS-G, and the scale is available for use in clinical practice and research. Assessing self-efficacy may help to enable individualised and comprehensive treatment of PwMS. The USE-MS-G is easy to use and can be completed by PwMS within 5-10 minutes. It can be accessed free of charge from tonic.measures@gmail.com.

Conclusions
To conclude, the USE-MS-G is a robust, valid and reliable scale to assess self-efficacy in PwMS. The translation and cross-cultural adaptation for Austria were performed according to international guidelines (21) …
Declarations
… Scale (EDSS) ≥ 8) were visited at home to facilitate their participation. Additionally, PwMS were notified about the study by clinic staff during their regular visits. All procedures followed the tenets of the Declaration of Helsinki, and written informed consent was obtained from all participants.

Consent for publication
Not applicable.
Availability of data and materials
All relevant data are shown in the manuscript and its additional files as figures or tables. All data described in the manuscript, including all relevant raw data, are available on reasonable request (barbara.seebacher@i-med.ac.at). They are free to reviewers and to any scientist wishing to use them for non-commercial purposes, provided participant confidentiality is not breached.
Competing interests
Figure 1 Equating t-test distribution plot
Figure 1 shows that the majority of respondents scored similarly on the two subsets of items (person residuals loading positively and negatively on the first principal component). At the 5% level, 32 persons (4.07%) scored significantly differently, and at the 1% level, 11 persons (1.40%) showed significantly different responses.
Figure 2 DIF by language analysis results for the pooled UK and Austrian datasets
(A) Subtest 1 works the same way for English- and German-speaking participants (no DIF by language). (B) Subtest 2 works the same way for English- and German-speaking participants (no DIF by language).
Figure 3 Category probability curves for subtest 2
The category probability curves for subtest 2 show 19 ordered categories, where higher numbers represent higher levels of self-efficacy.
Figure 4