Ethical approval
Ethical approval was obtained from the National Research Ethics Service Committee East Midlands – Leicester South (17/EM/0409) and the University of Salford’s School of Health Sciences Ethics Panel. All participants provided written, informed consent.
Study Design
The study comprised cross-cultural adaptation, followed by cross-sectional surveys to establish the psychometric properties of the WALS. The COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklists for assessing methodological quality and reporting guidelines were followed [19, 20].
Participants and recruitment
Patients were identified by research facilitators or therapists in 47 UK National Health Service (NHS) Trusts (41 secondary care and six community Rheumatology, Orthopaedic or Therapy out-patient clinics). We also recruited some possible participants from our research group’s Arthritis Volunteer Register. Participants were eligible if they: were at least 18 years of age; were in paid employment for at least one day a week (including self-employed); were currently at work (if on short-term sick leave, i.e., less than four weeks, participation was delayed until they were at work); and had a confirmed primary diagnosis of: RA or undifferentiated inflammatory arthritis (UIA); AS or axial spondyloarthritis (AxSpA); OA (knee and/or hip); or FM. Diagnoses were confirmed by a Rheumatologist for RA/UIA and AS/AxSpA; or a Rheumatologist, Orthopaedic Surgeon, General Practitioner, or extended scope practitioner physiotherapist for OA and FM. There were no restrictions on RMD duration. Participants needed to be able to read, write and understand British English. Exclusion criteria were: being on long-term sick leave (as unable to complete the work measures); and being unable to provide informed consent. Patients identified using these criteria were given a short study explanation and an information pack (introductory letter, participant information sheet, reply form, and Freepost envelope addressed to the research team). The reply form included diagnosis, employment, and sick leave status, to further check eligibility criteria. Patients could return the reply form themselves or provide written agreement that NHS staff could do so on their behalf.
For those contacted for Phase 1, a study explanation was provided by telephone and written consent given prior to interview. For those in Phase 2, the questionnaire booklet included a consent form on the front.
Phase 1: Linguistic validation, cross-cultural adaptation, and content validity
The following procedures were used [21]:
Forward translation: two translators independently reviewed the WALS to identify words requiring changing into British English and use of Plain English (i.e., simplifying words and phrases). One was a rheumatology researcher familiar with the WALS (AH); the other was a non-health professional unfamiliar with the WALS (JG: an experienced teacher, including of English).
Translation synthesis: the two translators discussed and agreed recommended changes.
Backward translation: this was not required, as the translation was into another form of English.
Expert committee review: the committee included: one translator (AH); three occupational therapists experienced in work and musculoskeletal conditions (YH, TW, RO’B); the WALS developer (MG: Canadian-English speaker); experienced PROMs researchers (AT, AH, YP, SV); and two patient research partners (AP, SK). The committee discussed the synthesised translation, made additional recommendations, and agreed and approved the draft British English WALS. This process ensures semantic, idiomatic, experiential, and conceptual equivalence.
Field testing of the draft WALS and content validity
Cognitive debriefing interviews were used to investigate the WALS from the perspectives of people with RMDs [18]. PROM content validity should be assessed by experts, i.e., patient/public representatives of the target populations [22]. At least 10 in each target group should be included [23]. Participants were mailed a paper questionnaire booklet, including the draft British English WALS, to complete at home, and asked to consider WALS ease of completion, item relevance and whether anything important was missing. Within two weeks, they were interviewed, face-to-face or by telephone, about comprehensiveness (item relevance rated from 1 = not relevant to 5 = extremely relevant, and any missing items) and comprehensibility (instructions, content, layout). Findings were discussed with the expert committee, further changes were made and the final British English WALS was agreed.
Content validity was further examined by linking the WALS to the Activities and Participation component of the International Classification of Functioning, Disability and Health (ICF) Core Set for Vocational Rehabilitation [24, 25]. The Flesch-Kincaid Grade score was calculated using Microsoft Word to check readability was similar to the original WALS [26].
Phase 2: Psychometric testing
Data collection
Participants were mailed a paper questionnaire booklet to complete at home (Test 1: T1). Two weeks after return, they were mailed a second paper questionnaire (Test 2: T2), to assess test-retest reliability. If either was not returned, participants were sent a reminder letter at two weeks, followed at four weeks by a further letter and another copy of the questionnaire booklet. The T1 booklet included items on demographic, disease and employment characteristics: age, gender, living arrangements, education status, condition duration, medication regimen, employment status and job title, to allow coding to job skill level (1 = elementary occupations; 2 = requiring compulsory education and work-related training; 3 = post-compulsory education (sub-degree) or longer work experience; 4 = degree-level education or equivalent experience [27]). Data were collected as part of a wider study testing six other contextual factor work-related measures. At T2, participants did not have knowledge of their previous scores.
Instruments
To test construct (concurrent) validity, at T1 we included the following work and health measures. For all, a higher score indicates worse status.
British English WALS: 12 items, measured on a 0–3 scale of difficulty performing work tasks (0 = no difficulty; to 3 = unable to do) (Supplementary File 1). WALS content is specific to arthritis, with items created through literature review [10]. It includes: eight physical activity items (e.g., working with hands, standing, moving around inside, commuting); three about managing work (i.e., work hours, pace and job demands); and one mental demand item (concentration at work) [12]. Instructions ask respondents to answer without help from others or use of special gadgets or equipment, so that scores are not confounded by use of workplace behavioural coping strategies [10]. The recall period is not specified. Items recorded as “not applicable to my job” are assigned a score of 0. The scoring allows up to three missing items, which can be imputed using the individual’s mean or median score (depending on data distribution). A total summed score is calculated (0–36). A WALS score of 0–4 is considered to indicate a low level of work limitations; 5–8 = moderate; and ≥ 9 = high [13]. A score ≥ 9 is associated with greater need for work accommodations, absenteeism and job disruptions, compared to those scoring < 5 [13].
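The scoring rules described above can be illustrated with a minimal Python sketch. It is not the developers' scoring algorithm: the function names, the `"NA"` code for "not applicable to my job", and the use of `None` for a missing item are hypothetical. Two further assumptions are made: the imputed value is taken over all answered items (with "not applicable" counted as 0), and a non-integer total produced by imputation is banded as low if < 5 and high if ≥ 9.

```python
from statistics import mean, median

N_ITEMS = 12            # number of WALS items
MAX_MISSING = 3         # scoring allows up to three missing items
NOT_APPLICABLE = "NA"   # hypothetical code for "not applicable to my job"


def score_wals(responses, impute="mean"):
    """Return the summed WALS score (0-36), or None if it cannot be scored.

    responses: 12 entries, each 0-3, NOT_APPLICABLE, or None (missing).
    """
    if len(responses) != N_ITEMS:
        raise ValueError("expected 12 item responses")
    if impute not in ("mean", "median"):
        raise ValueError("impute must be 'mean' or 'median'")
    # "Not applicable" items are assigned a score of 0.
    coded = [0 if r == NOT_APPLICABLE else r for r in responses]
    answered = [r for r in coded if r is not None]
    if any(r not in (0, 1, 2, 3) for r in answered):
        raise ValueError("item scores must be 0, 1, 2 or 3")
    n_missing = N_ITEMS - len(answered)
    if n_missing > MAX_MISSING:
        return None  # too many missing items to score
    # Assumption: impute each missing item with the person's own mean
    # (or median) over the answered items.
    fill = mean(answered) if impute == "mean" else median(answered)
    return sum(answered) + n_missing * fill


def wals_category(total):
    """Band a total score: 0-4 low, 5-8 moderate, >= 9 high."""
    if total >= 9:
        return "high"
    if total >= 5:
        return "moderate"
    return "low"
```

For example, a respondent scoring 1 on every item has a total of 12, which falls in the high band; a respondent with four or more missing items is returned as unscorable.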
WLQ-25: a reliable, valid measure including 25 items in four sub-scales (1–5 scale), indicating the percentage time in the past two weeks a person was limited in physical work demands, time demands, mental-interpersonal demands and output demands [7]. From these, the WLQ Percentage Productivity Loss [7] and Summed scores [28] can be created.
WIS: measured using the RA-WIS in RA, OA, and FM, and the AS-WIS in AS [29–31]. This measures the degree of mismatch between work abilities and job demands. There is evidence of reliability and validity for the RA-WIS in RA and OA (but not yet in FM), and for the AS-WIS. The RA-WIS includes 23 true/false items and the AS-WIS 20 items. Both measures have cut-points indicating low, moderate and high work instability (Table 2).
WPAI (General Health): a reliable, valid measure of six items from which Percentage Overall Work Impairment due to Health (in the past seven days) is calculated [32].
Health status measures
Perceived health status: measured using a 5-point Likert scale “Considering all the ways that your condition affects you, how have you been over the past month?” (1 = very good (no symptoms; no limitation of normal daily activities); to 5 = very poor (very severe, intolerable symptoms; unable to do many normal daily activities)).
Perceived change in health status: At T2 only, measured using a 5-point Likert scale “Overall, how much is your arthritis/ condition troubling you now compared to when you last completed this questionnaire?” (1 = much less; 3 = about the same; 5 = much more).
Condition-specific health measures:
Four condition-specific questionnaire booklets were used. Participants completed only those measures relevant to their condition.
RA
Rheumatoid Arthritis Impact of Disease (RAID): includes seven 0–10 numeric rating scales (NRS): pain, fatigue, sleep, functional disability, coping, physical and emotional well-being. A total score is created from the sum of weighted NRS scores [33].
HAQ: physical function evaluated by 20 daily activities rated on a 0–3 scale (0 = not at all difficult; 3 = unable to do) [34]; scored using the HAQ20 method, i.e., all 20 items are summed (0–20 = mild; 21–40 = moderate; 41–60 = severe disability) without adjustment for using aids and devices [35].
AS
Bath Ankylosing Spondylitis Disease Activity Index (BASDAI): six 10 cm visual analogue scales (VAS) of symptom severity (fatigue, spinal pain, other joint pain/swelling, localised tenderness, morning stiffness, duration of morning stiffness), from which an average score (0–10) is calculated. Scores > 4 indicate active disease [36].
Bath Ankylosing Spondylitis Functional Index (BASFI): ten 10 cm VAS of physical function (mobility), from which an average score (0–10) is calculated [37].
OA
Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC): two of the three sub-scales: pain (five items); and physical function (17 items), scored on 0–4 scales (0 = none; 4 = extreme), from which total scores for each sub-scale are calculated [38].
FM
Revised Fibromyalgia Impact Questionnaire (FIQR): three sub-scales rated on 0–10 NRS: overall impact (two items); symptoms (10 items); and function (nine items). Sub-scale and overall total scores are calculated [39].
Sample size
As Rasch analysis was used to assess internal construct validity (unidimensionality), enough cases are needed within each condition group to test for invariance across groups [40]. The sample does not need to be representative, as the Rasch model is independent of distribution, but should have a good distribution across the work activity limitation domain. A minimum of 150 responses is required for Rasch analysis, although we aimed to collect up to 250 to ensure a broad spread of responses. At least 79 sets of repeated responses were required to demonstrate that a test-retest correlation of 0.7 differs from a background correlation (constant) of 0.45, with 90% power at the 1% significance level. A test-retest correlation of 0.7 is deemed a minimum acceptable level [41].
Statistical analyses
Demographic, work, and disease measures were summarised descriptively, as appropriate. RUMM 2030 + software was used for Rasch analysis [42]. As all work and health measures either consisted of ordinal data, or were not normally distributed, non-parametric statistical tests were conducted using the Statistical Package for the Social Sciences (SPSS) v26 [43].
Compliance (missing data): the number (%) of missing data items and WALS which could not be scored were identified.
Internal construct validity: The primary analytical strategy was testing the fit of the WALS for each condition to the Rasch Measurement Model to determine reliability and internal construct validity [44]. Given the requirements for fit, a hierarchical strategy was used (Supplementary Table 1). With Level 1 as the priority (individual item fit), all requirements for model fit had to be met. Where a Level 5 solution was unavailable (bi-factor solution on alternative items), item deletion was considered (Level 6). If this failed, Level 7 was used to test for a valid ordinal scale; if this also failed, Level 8 indicated no valid ordinal scale. Details of the Rasch analysis undertaken are in Supplementary File 2 and described elsewhere [45].
Construct (concurrent) validity: was assessed using Spearman’s correlations with work and health measures. Correlations were deemed: 0.80–1.00 = very strong; 0.60–0.79 = strong; 0.40–0.59 = moderate; 0.20–0.39 = weak; and 0–0.19 = very weak [46]. We hypothesised that, in the four condition groups, there would be: moderate to strong correlations between the WALS and scores for the three work measures: WLQ-25 (Percentage Productivity Loss and Summed scores), WIS (RA- and AS-WIS) and WPAI; and moderate correlations with severity of perceived health status, and condition-specific symptoms and physical function scales.
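The correlation bands above can be expressed as a small lookup. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the bands apply to the absolute value of the coefficient after rounding to two decimal places, which the text does not state.

```python
def correlation_strength(rho):
    """Label a Spearman coefficient using the bands cited from [46]."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    # Assumption: bands apply to the magnitude, rounded to two decimals.
    r = round(abs(rho), 2)
    if r >= 0.80:
        return "very strong"
    if r >= 0.60:
        return "strong"
    if r >= 0.40:
        return "moderate"
    if r >= 0.20:
        return "weak"
    return "very weak"
```

Under these assumptions, a coefficient of 0.45 would be labelled moderate and one of -0.65 strong.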
Discriminant validity: was assessed using Kruskal-Wallis tests to evaluate differences in WALS scores between participants reporting very poor/poor, fair, and good/very good perceived health status. A p-value ≤ 0.05 was considered significant.
Internal consistency: was assessed using Cronbach’s alpha. Results of ≥ 0.8 were deemed good to excellent: ≥ 0.9 is consistent with individual use; and > 0.7 with group-level use [46].
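For readers unfamiliar with the statistic, Cronbach's alpha can be computed from item-level data as k/(k − 1) × (1 − Σ item variances / variance of the total score), where k is the number of items. The sketch below is a plain-Python illustration of that standard formula, not the SPSS procedure used in the study; the function name and the example data are hypothetical, and it assumes complete responses.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of item scores per participant (all the same length).
    """
    if len(rows) < 2:
        raise ValueError("need at least two participants")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items, same count per row")
    # Sample variance of each item across participants.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of each participant's summed score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical two-item example with four participants.
example = [[1, 2], [2, 1], [3, 3], [4, 4]]
alpha = cronbach_alpha(example)
```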
Test-retest reliability: was assessed in those reporting their health was “the same” at T2, using Spearman’s correlations and ICC (2,1): two-way random consistency, average measures model. An ICC ≥ 0.75 was considered excellent and 0.5–0.74 moderate [47]. Reliability of individual WALS items was calculated using linear weighted kappa, with levels of agreement as < 0.20 = poor; 0.21–0.40 = fair; 0.41–0.60 = moderate; 0.61–0.80 = good; 0.81–1.00 = very good [46].
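As an illustration of the item-level statistic, linear weighted kappa for one WALS item rated at T1 and T2 can be written as 1 − (observed weighted disagreement / chance-expected weighted disagreement), with disagreement weighted by the distance |i − j| between categories. The sketch below implements that standard definition in plain Python; it is not the software routine used in the study, and the function name and example ratings are hypothetical.

```python
def linear_weighted_kappa(t1, t2, categories=(0, 1, 2, 3)):
    """Linear weighted kappa for paired ordinal ratings (e.g., T1 vs T2)."""
    if len(t1) != len(t2) or not t1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(t1)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    # Cross-tabulate T1 (rows) against T2 (columns).
    table = [[0] * k for _ in range(k)]
    for a, b in zip(t1, t2):
        table[index[a]][index[b]] += 1
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Weighted disagreement: observed, and expected under independence.
    # The common factor 1/(k - 1) in the linear weights cancels out.
    observed = sum(abs(i - j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(abs(i - j) * row_tot[i] * col_tot[j]
                   for i in range(k) for j in range(k)) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1 - observed / expected


# Hypothetical ratings of one item by four participants.
kappa = linear_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1])
```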
Sensitivity to change: was assessed by calculating the Standard Error of Measurement (SEM) and the Minimal Detectable Change95 (MDC95) scores, i.e., a statistical estimate of the smallest detectable change corresponding to change in ability [47, 48]. The formulae used were: SEM = s × √(1 – r), where s = the mean plus standard deviation (SD) of the T1 and T2 difference, and r = the reliability coefficient for the test, i.e., Pearson’s correlation coefficient between T1 and T2 values. Thereafter, the MDC95 was calculated using the formula: MDC95 = SEM × √2 × 1.96 [48].
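The two formulae can be checked with a short worked example. The sketch below takes s and r as inputs exactly as defined in the text; the function names and the numerical values (s = 4.0, r = 0.84) are hypothetical and are not study results.

```python
import math


def standard_error_of_measurement(s, r):
    """SEM = s * sqrt(1 - r), with s and r as defined in the text."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return s * math.sqrt(1 - r)


def minimal_detectable_change_95(sem):
    """MDC95 = SEM * sqrt(2) * 1.96."""
    return sem * math.sqrt(2) * 1.96


# Hypothetical inputs for illustration only.
sem = standard_error_of_measurement(s=4.0, r=0.84)  # 4.0 * 0.4 = 1.6
mdc95 = minimal_detectable_change_95(sem)           # about 4.43
```

With these illustrative inputs, a change in WALS score smaller than about 4.4 points could not be distinguished from measurement error at the 95% confidence level.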
Floor and ceiling effects: were considered present if > 15% of participants achieved either the lowest or highest scores in the WALS [50].
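The 15% criterion can be applied to a list of total scores as follows. This is a minimal sketch: the function name and example scores are hypothetical, and the lowest and highest possible scores default to the WALS range of 0 and 36.

```python
def floor_ceiling_effects(totals, lowest=0, highest=36, threshold=0.15):
    """Flag floor/ceiling effects: more than 15% at the lowest/highest score."""
    if not totals:
        raise ValueError("no scores supplied")
    n = len(totals)
    floor_prop = sum(1 for t in totals if t == lowest) / n
    ceiling_prop = sum(1 for t in totals if t == highest) / n
    return {
        "floor_proportion": floor_prop,
        "ceiling_proportion": ceiling_prop,
        "floor_effect": floor_prop > threshold,
        "ceiling_effect": ceiling_prop > threshold,
    }


# Hypothetical sample: 4 of 20 participants score 0 (20%), none score 36.
result = floor_ceiling_effects([0] * 4 + [10] * 16)
```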