Participants
The data for this validation study originate from the baseline assessment of the project “STEP.De - Sports Therapy for Depression”, which examined whether sports therapy is a non-inferior treatment alternative to psychotherapy for patients with depression (16). For the purpose of the current study, the sample comprised n = 277 patients (72.7% women) with mild to moderately severe depression (Beck Depression Inventory II (BDI-II): M = 22.28, SD = 10.14). The mean age was M = 48.3 years (SD = 11.1, range 20–65), and 80.9% of the participants had worked within the last three months. Further patient characteristics are shown in Table 1. The study was approved by the local ethics committees of the University of Potsdam (No. 17/2018) and the Freie Universität Berlin (No. 206/18) and was conducted in compliance with the Declaration of Helsinki. All methods were carried out in accordance with relevant guidelines and regulations. The study was registered in the ISRCTN registry (ISRCTN28972230).
Table 1
Characteristics of the sample (n = 277)
| Characteristic | n | No. (%), range |
| --- | --- | --- |
| Age (years), M (SD), range | 275 | 48.3 (11.0), 20–65 |
| Gender | 275 | |
| Female | | 200 (72.7) |
| Male | | 75 (27.3) |
| Education level | 266 | |
| Lower secondary school | | 22 (8.3) |
| Secondary school | | 161 (58.1) |
| Higher education | | 83 (31.2) |
| Living status | 274 | |
| Alone | | 70 (25.5) |
| Not alone | | 204 (75.5) |
| Income | 262 | |
| Low income | | 28 (10.7) |
| Middle income | | 158 (60.3) |
| High income | | 76 (29.0) |
| First language | 263 | |
| German | | 255 (97.0) |
| Other | | 8 (3.0) |
| Worked within the last 3 months | 272 | |
| Yes | | 220 (80.9) |
| No | | 52 (19.1) |
| Depressive symptoms (BDI-II) | 277 | |
| Minimal | | 54 (19.5) |
| Mild | | 62 (22.4) |
| Moderate | | 81 (29.2) |
| Severe | | 80 (28.9) |

Note. BDI-II: Beck Depression Inventory II.
Procedure
Strict inclusion and exclusion criteria were followed for the clinical sample of patients with mild to moderately severe depression (see Heissel et al., 2020 (16) for full details). To include patients with diverse social backgrounds, the sample was recruited by health insurance data managers from diverse urban districts between August 2018 and October 2020. Patients were informed about the study aims and the voluntary nature of participation. Those who expressed general interest met a trained study assessor, who explained the study and the data protection policy and obtained signed informed consent. Participants then provided their data via an electronic case report form (eCRF), completing the WSAS and the further self-report questionnaires outlined below.
Measures
Participants’ self-reported sociodemographic data on age, gender, education level, living status, income, first language, and employment were collected. For the education level, a variable with the three categories of low (lower secondary school), middle (secondary school diploma), and high education (university entrance qualification and university degree) levels was created. The income variable was categorised into low (< 1000€), middle (1000–2000€), and high (> 2000€) personal monthly net income.
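The income categorisation described above can be sketched as a small function. This is an illustration in standard-library Python rather than the SPSS/R tooling used in the study; the function name `income_category` is hypothetical, and treating exactly 1000€ and 2000€ as "middle" follows the ranges as stated in the text.

```python
def income_category(net_income_eur):
    """Map personal monthly net income (in euros) to the three study categories."""
    if net_income_eur < 0:
        raise ValueError("income cannot be negative")
    if net_income_eur < 1000:
        return "low"       # < 1000 EUR
    if net_income_eur <= 2000:
        return "middle"    # 1000-2000 EUR
    return "high"          # > 2000 EUR
```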
Work and Social Adjustment Scale (WSAS)
The WSAS is a patient-reported outcome measure comprising 5 items (work, home management, social leisure, private leisure, and relationships), each rated on a 9-point Likert scale from 0 “not at all impaired” to 8 “very severely impaired”. Item scores can be summed to a total score ranging from 0 to 40, with higher scores denoting higher levels of disability (12, 13). Scores above 20 indicate moderately severe or worse impairment, scores between 10 and 20 represent significant functional impairment, and scores below 10 are considered subclinical (13). The initial translation from English to German (forward translation) was performed by two independent German native speakers fluent in English. The resulting two German versions were synthesized, and technically and linguistically revised by a third German native speaker. The result was then translated back into the source language by an English native speaker who was fluent in German but blind to the original WSAS (back translation). Non-equivalent translations were discussed until all translators agreed upon a functionally equivalent German version, the ASAS (“Arbeits- und Sozialanpassungsskala”). The clinical guideline for cultural translation and adaptation of self-report measures was strictly followed in the translation process (17). The resulting translation is presented in Additional File 1 (Table S1). Participants completed this measure online at baseline (prior to taking part in the treatment trial).
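The WSAS scoring rule described above can be written out as a minimal sketch. The function names `wsas_total` and `wsas_band` are hypothetical, and the code only restates the cut-offs given in the text (below 10, 10 to 20, above 20); it is not the scoring syntax used in the study.

```python
def wsas_total(items):
    """Sum the five WSAS item ratings (each 0-8) into a 0-40 total score."""
    if len(items) != 5:
        raise ValueError("the WSAS has exactly 5 items")
    if any(not 0 <= rating <= 8 for rating in items):
        raise ValueError("each item must be rated from 0 to 8")
    return sum(items)


def wsas_band(total):
    """Assign the impairment band for a WSAS total score."""
    if not 0 <= total <= 40:
        raise ValueError("total score must lie between 0 and 40")
    if total < 10:
        return "subclinical"
    if total <= 20:
        return "significant functional impairment"
    return "moderately severe or worse impairment"
```

For example, ratings of 2, 3, 4, 1, and 5 sum to 15, which falls in the band of significant functional impairment.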
World Health Organization Disability Assessment Schedule (WHODAS 2.0)
Within five days of the initial online assessment, the World Health Organization Disability Assessment Schedule (WHODAS 2.0) (18) was administered by telephone by trained assessors. It was chosen as a comparable instrument for measuring workability because, like the WSAS, it assesses functioning in multiple domains. The WHODAS 2.0 is a questionnaire that assesses an individual’s level of functioning in six domains: cognition, mobility, self-care, getting along, life activities, and participation in society. In this study, the German 12-item screening version of the WHODAS 2.0 (19) was used. For each item, respondents indicated the level of difficulty experienced during the previous 30 days on a five-point Likert scale from 1 “none” to 5 “extreme/cannot do”. The total score for global disability ranges from 0 “no disability” to 60 “complete disability”. In the present sample, the internal consistency of the WHODAS 2.0 was Cronbach’s α = .77.
Beck Depression Inventory II (BDI-II)
Depressive symptoms were assessed with the German version (20) of the Beck Depression Inventory II (BDI-II) (21). The BDI-II is a 21-item self-report depression screening measure. Individuals were asked to respond to each question with reference to the past two weeks. Items were rated on a four-point Likert scale ranging from 0 to 3. The maximum total score is 63, with higher scores indicating higher levels of depressive symptoms. According to the BDI-II manual (21), a score of 0–13 indicates minimal depression, 14–19 mild depression, 20–28 moderate depression, and 29–63 severe depression. In the present sample, the internal consistency was Cronbach’s α = .91.
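The BDI-II severity bands quoted from the manual can be expressed as a short lookup. This is a sketch only; the function name `bdi_ii_severity` is hypothetical and the code assumes integer total scores.

```python
def bdi_ii_severity(total):
    """Return the BDI-II severity band for an integer total score (0-63)."""
    if not 0 <= total <= 63:
        raise ValueError("BDI-II total score must lie between 0 and 63")
    if total <= 13:
        return "minimal"
    if total <= 19:
        return "mild"
    if total <= 28:
        return "moderate"
    return "severe"
```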
Single Item General Impairment
For validity measures, a single-item question was developed from the WSAS (“Meine Depression beeinträchtigt mich im Alltag/ in der Freizeit/ im Berufsleben – My depression affects me in everyday life/ in my free time/ in my work life”) to assess the global impairment due to depression. Participants rated the item on a Likert scale from 0 “not at all impaired” to 8 “very severely impaired” with every second step marked, so that higher values indicated greater impairment.
12-Item Short Form Survey (SF-12)
To assess health-related quality of life, the 12-Item Short Form Survey (SF-12) (22, 23) questionnaire was used. It consists of seven questions including twelve items and representing eight domains. Next to a weighted sum score, items can be grouped into two subscales, the mental component summary (MCS-12) and the physical component summary (PCS-12). The PCS-12 represents four domains, namely general health perception, physical functioning, physical role functioning, and pain. The MCS-12 reflects the four domains of emotional role functioning, mental well-being, negative affectivity, and social functioning. Both summary scores range from 0 to 100, with higher scores indicating better quality of life. In this study, only the two SF-12 sub-summary scores were used to better differentiate between the mental and physical constructs of workability.
Data analyses
Data analyses were performed in SPSS (IBM Corp. Released 2017. IBM SPSS Statistics for Windows, Version 25.0.0.1. Armonk, NY: IBM Corp.) and RStudio (version 1.2.5042 for Macintosh). For all analyses, statistical significance was set at a p value of less than .05.
Sample characteristics were summarized as frequencies and percentages for the categorical variables and as means and standard deviations (SD) for the continuous variables. The Shapiro-Wilk test was used to assess normality. The non-parametric Kruskal-Wallis and Mann-Whitney U tests were used to investigate differences in the WSAS total score by age, gender, income, education level, and depressive symptom severity. The correlation between age as a continuous variable and the WSAS total score was also examined. Chi-square tests were used to investigate differences in the level of impairment (WSAS score < 10, subclinical impairment; 10–20, significant functional impairment; > 20, moderately severe or worse impairment) by gender, income, and education level.
Floor and Ceiling Effects
To examine the usability of the WSAS in a homogeneous group of patients with mild to moderately severe depressive disorders, floor and ceiling effects were examined by evaluating the mean and standard deviation of each item and testing these against the lowest and highest possible scale values via one-sample t-tests. Furthermore, the frequency of participants with the lowest and highest possible scores and the skewness of each item distribution were assessed. A floor or ceiling effect was considered present if more than 15% of participants achieved the lowest or highest possible score (24). For skewness, values less than −1 or greater than +1 were considered highly skewed, values between −1 and −.50 or between +.50 and +1 were considered moderately skewed, and values between −.50 and +.50 were considered approximately symmetrical (25).
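As a minimal sketch of these descriptive checks, the share of responses at the scale endpoints and the skewness of an item can be computed as follows. The function names are hypothetical. The skewness shown is the simple moment coefficient, whereas statistical packages such as SPSS report a bias-adjusted variant, so values may differ slightly from those in the study. How values of exactly ±.50 or ±1 are labelled is an assumption, as the text leaves the boundaries open.

```python
def floor_ceiling_share(scores, lowest=0, highest=8):
    """Return the proportions of responses at the lowest and highest scale value."""
    n = len(scores)
    floor = sum(1 for s in scores if s == lowest) / n
    ceiling = sum(1 for s in scores if s == highest) / n
    return floor, ceiling


def skewness(scores):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (no bias adjustment)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant scores")
    return m3 / m2 ** 1.5


def skew_label(value):
    """Label a skewness value using the bands given in the text."""
    size = abs(value)
    if size > 1:
        return "highly skewed"
    if size >= 0.5:  # assumption: exactly .50 counts as moderately skewed
        return "moderately skewed"
    return "approximately symmetrical"
```

The two proportions returned by `floor_ceiling_share` are the quantities that would be compared against the 15% cutoff.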
Factorial validity
To test the factorial validity, Confirmatory Factor Analysis (CFA) was performed using the lavaan package in RStudio (26). To test the hypothesis that the WSAS is best interpreted as a one-factor model, models were evaluated using a chi-square test and additional fit indices. As the chi-square is known to be affected by the sample size, a relative/normed chi-square (ratio of the chi-square statistic to the degrees of freedom) (27), which reduces the impact of sample size on the assessment of model fit, was calculated. A value < 2 for the normed chi-square is considered a good model fit, and a value < 3 an acceptable model fit (28). The Bentler Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI) were used as comparative fit indices. Following the literature, values ≥ .90 were taken to indicate an acceptable model fit and values ≥ .95 a good model fit (29). The standardized root-mean-square residual (SRMR) and root-mean-square error of approximation (RMSEA) were assessed as absolute fit indices. For the SRMR, values < .05 were considered good and values < .10 were considered acceptable (28). For the RMSEA, values < .05 were interpreted as good, and values between .05 and .08 as acceptable (31, 32). Modification indices were calculated to identify where linear constraints might be relaxed to improve model fit (33). To ensure that the characteristics of the dataset were suitable for CFA, the linear relationship between WSAS items was checked graphically using Q-Q plots. Normality was tested with the Shapiro-Wilk test and the Kolmogorov-Smirnov test. As these tests indicated deviations from normality, maximum likelihood estimation with robust (Huber-White) standard errors and a scaled test statistic (Yuan-Bentler) was used for CFA.
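The fit criteria listed above can be collected into one helper for illustration. This is a sketch, not the lavaan workflow used in the study: the function name `rate_fit` and the label "not acceptable" are assumptions, and the input values would come from the fitted CFA model.

```python
def _lower_is_better(value, good, acceptable):
    """Rate an index where smaller values mean better fit."""
    if value < good:
        return "good"
    if value < acceptable:
        return "acceptable"
    return "not acceptable"


def _higher_is_better(value, good, acceptable):
    """Rate an index where larger values mean better fit."""
    if value >= good:
        return "good"
    if value >= acceptable:
        return "acceptable"
    return "not acceptable"


def rate_fit(chi2, df, cfi, tli, srmr, rmsea):
    """Apply the cut-offs from the text to a set of model fit statistics."""
    if rmsea < 0.05:
        rmsea_rating = "good"
    elif rmsea <= 0.08:  # "between .05 and .08" read as inclusive
        rmsea_rating = "acceptable"
    else:
        rmsea_rating = "not acceptable"
    return {
        "normed_chi2": _lower_is_better(chi2 / df, 2, 3),
        "CFI": _higher_is_better(cfi, 0.95, 0.90),
        "TLI": _higher_is_better(tli, 0.95, 0.90),
        "SRMR": _lower_is_better(srmr, 0.05, 0.10),
        "RMSEA": rmsea_rating,
    }
```

With purely hypothetical inputs such as `rate_fit(12.5, 5, 0.96, 0.92, 0.04, 0.07)`, the normed chi-square is 2.5 and is rated acceptable, while the CFI and SRMR are rated good.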
To compare different models, the Satorra-Bentler scaled chi-square difference test (SBS-χ2) was used (34), in which the usual normal-theory chi-square statistic is divided by a scaling correction to better approximate the chi-square distribution under non-normality. Because the SBS-χ2, like the chi-square test of goodness of fit, is sensitive to sample size (35), the difference in CFI (∆CFI) (36) and two predictive fit indices, Akaike’s Information Criterion (AIC) (37) and the Bayesian Information Criterion (BIC) (38), were also considered. A decrease of less than .01 in the CFI of the more parsimonious model should be treated as support for that model (39, 40). Lower BIC and AIC values indicate better model fit (41). A difference of 10 points between models was accepted as a relevant difference.
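The scaled difference test can be sketched from the commonly cited Satorra-Bentler (2001) formula. This assumes that formula is the variant applied here; the function name and argument names are hypothetical. Each model contributes its scaled chi-square `t`, its scaling correction factor `c`, and its degrees of freedom `d`; model 0 is the more constrained (nested) model.

```python
def sb_scaled_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler (2001) scaled chi-square difference.

    t0, c0, d0: scaled chi-square, scaling factor, df of the nested model.
    t1, c1, d1: the same quantities for the less constrained model.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    # t * c recovers each model's unscaled chi-square.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1
```

When both scaling factors equal 1, the statistic reduces to the ordinary chi-square difference. In small samples `cd` can turn out negative, in which case this version of the test is not interpretable.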
Measurement invariance
To examine whether the WSAS had the same psychometric properties across patients with different severities of depressive symptoms (minimal/mild vs. moderate/severe) according to the BDI-II (21), measurement invariance of Model 2 was tested in a series of multigroup CFAs with three levels of invariance (configural, weak, and strong invariance) (39). Whereas configural invariance imposes the same factor structure in all groups, weak invariance constrains all factor loadings to be equal across groups, and strong invariance additionally constrains the intercepts to be equal. All model fits were tested using robust maximum likelihood (robust ML) and full information maximum likelihood (FIML) estimation. Model comparisons were based on the Satorra-Bentler scaled chi-square difference test (ΔSBSχ2) for two nested models (42), changes in fit indices, and AIC and BIC values. For testing weak invariance, a change of ≥ −.01 in CFI, supplemented by a change of ≥ .03 in SRMR or a change of ≥ .015 in RMSEA, would indicate non-invariance. For testing strong invariance, a change of ≥ −.01 in CFI, supplemented by a change of ≥ .015 in SRMR or a change of ≥ .015 in RMSEA, would indicate non-invariance. Among the three indices, CFI was chosen as the main criterion (43).
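The decision rule for non-invariance can be made explicit in a short sketch. It reads "a change of ≥ −.01 in CFI" as a drop in CFI of at least .01 and "supplemented by" as a logical AND with either the SRMR or the RMSEA criterion; both readings are assumptions, as is the function name. Changes are computed as the more constrained model minus the less constrained model.

```python
def indicates_non_invariance(level, delta_cfi, delta_rmsea, delta_srmr):
    """Apply the change-in-fit rule for 'weak' or 'strong' invariance.

    delta_* = fit of the more constrained model minus fit of the
    less constrained model (so a worsening CFI gives a negative delta_cfi).
    """
    srmr_cutoff = {"weak": 0.030, "strong": 0.015}[level]
    cfi_dropped = delta_cfi <= -0.01
    other_index_worse = delta_rmsea >= 0.015 or delta_srmr >= srmr_cutoff
    return cfi_dropped and other_index_worse
```

Because the CFI is the main criterion, a model whose CFI drops by less than .01 is not flagged here even if the RMSEA or SRMR worsens.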
Internal consistency
As measures of reliability, internal consistency was assessed using Cronbach’s alpha (α) (44) and coefficient omega (ω) (45, 46). Values of alpha and omega above .70 were considered satisfactory (47, 48).
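Cronbach's alpha follows from the item variances and the variance of the total score, α = k/(k − 1) · (1 − Σ item variances / total variance). A minimal standard-library sketch is shown below. The function name is hypothetical and the data in the usage note are invented for illustration. Coefficient omega requires the loadings of a fitted factor model and is not shown.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) and of the total score per respondent.
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

For the invented two-item data `[[1, 2], [2, 1], [3, 3]]`, both item variances are 1 and the total-score variance is 3, giving α = 2 · (1 − 2/3) ≈ .67, which would fall just below the .70 criterion.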
Convergent validity
Convergent validity was assessed by the correlations between the WSAS and the WHODAS 2.0, a different measure of workability and social functioning, as well as between their individual items. As the Shapiro-Wilk test indicated that the WSAS and WHODAS 2.0 scores were not normally distributed (p < .05), the non-parametric Spearman’s rho coefficient was used for the correlations between the WSAS and the other instruments.
Criterion validity
To examine the criterion validity, Spearman’s rho correlation coefficients between the WSAS and related constructs were calculated. Specifically, the WSAS was correlated with the BDI-II as a measure of symptom severity, with the Single Item as a measure of general impairment, and with the two SF-12 subscales as measures of mental and physical health status. Correlations less than .30 were considered weak, correlations between .30 and .49 were considered moderate, and correlations greater than .49 were considered strong (49).
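Spearman's rho is the Pearson correlation of the rank-transformed scores, with tied values receiving their average rank. The sketch below implements this in standard-library Python together with the strength labels from the text. The function names are hypothetical, and applying the labels to the absolute value of rho is an assumption.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return covariance / (spread_x * spread_y)


def correlation_strength(rho):
    """Label a correlation using the bands in the text (applied to |rho|)."""
    size = abs(rho)
    if size < 0.30:
        return "weak"
    if size <= 0.49:
        return "moderate"
    return "strong"
```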