Translation and cross-cultural adaptation
We translated the MSK-HQ into Danish in accordance with international standards. This was done in close collaboration with the license holder of the MSK-HQ questionnaire, Oxford University Innovation Ltd (http://innovation.ox.ac.uk), and a representative of the research group that developed the questionnaire (JCH). A professional English language translator specialising in medical translation and a bilingual physiotherapist, both Danish native speakers, independently translated the questionnaire from English into Danish. Their translations were then compared at a consensus meeting in the project group, in which both translators participated. A professional native English-speaking translator back-translated the questionnaire from Danish into English without any prior knowledge of the original version, and the back-translation was compared to the original version. The translation report was then reviewed and approved by JCH. Finally, we performed cognitive debriefing interviews with 13 patients with musculoskeletal disorders recruited in one physiotherapy clinic in the Central Denmark Region. The results of the cognitive debriefing were presented and reviewed in the project group, and the final version of the questionnaire was completed.
Design and study populations
The study was a prospective comparative study encompassing two consecutively recruited cohorts of patients with musculoskeletal disorders in primary physiotherapy practice in Denmark (DK) and the United Kingdom (UK).
Consecutive adult (≥18 years) consulters referred to physiotherapy with a musculoskeletal disorder were invited to participate in six physiotherapy (PT) clinics in the Central Denmark Region. Participants had to be able to understand and speak Danish well enough to complete questionnaires; no further inclusion criteria were applied. In the period January-July 2017, a total of 180 patients agreed to participate, of whom 27 were excluded because of ‘no show’/cancellations (n=9), withdrawal (n=5), a diagnosis other than a musculoskeletal disorder (n=2), failure to complete the baseline questionnaire (n=7) or other reasons (n=4), leaving 153 patients for analysis.
At first contact, the patient was informed about the project. If the patient agreed to participate, an e-mail with a link to an electronic questionnaire was sent to the patient (baseline). The questionnaire included the Danish versions of the MSK-HQ, the generic EQ-5D-5L [19-21] and validated reference standard measures selected according to the pain region from which the patient’s main problem originated: the shortened Disabilities of the Arm, Shoulder and Hand questionnaire (Q-DASH) [22, 23], the Oswestry Disability Index (ODI) and the Neck Disability Index (NDI) [24-26], and the pain, stiffness and function modules of the Knee injury and Osteoarthritis Outcome Score (KOOS) and the Hip disability and Osteoarthritis Outcome Score (HOOS) [27-29]. An appointment was made for a first PT consultation 5-7 days later. Immediately before this appointment, the patient completed the questionnaire once more (test-retest). Follow-up questionnaires were sent by e-mail after six and 12 weeks. In addition, in the retest and follow-up questionnaires, patients were asked to rate the overall change in their condition on a 7-point scale (much better, better, slightly better, unchanged, slightly worse, worse, much worse). A total of 134 patients (88%) completed the test-retest questionnaire (median time interval 6 days [interquartile range 3-8 days]). Follow-up questionnaires were completed by 118 patients (77%) at six weeks and 128 patients (84%) at 12 weeks. For comparison with the UK cohort, only the results of the 12-week follow-up are included in the present study.
The UK cohort was drawn from the primary care physiotherapy sample used as the original validation cohort for the MSK-HQ. The materials and methods have been described in detail elsewhere. Briefly, the cohort included 210 consecutive consulters in community musculoskeletal physiotherapy clinics in five UK West Midlands towns. Of those, 166 (78%) had test-retest data and were available for the present study.
Participants completed the English paper version of the MSK-HQ and the EQ-5D-5L index before the start of treatment at the first clinic visit (baseline) and again at the second visit, typically 2 weeks later (test–retest). Follow-up questionnaires were completed at 12 weeks by 133 (80%) patients. A transition question on overall change in the condition on a 5-point scale (much better, slightly better, unchanged, slightly worse, much worse) was completed by patients at test-retest and 12-week follow-up.
Descriptive statistics were calculated for all variables and compared between the two cohorts. Sum scores were calculated at all time points and raw scores were calculated if no more than 3 items were missing in the respective score; otherwise, the score was left missing. Possible floor and ceiling effects were examined and such effects were considered to be present if more than 15% of the respondents achieved the highest or the lowest sum score, respectively.
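The scoring and floor/ceiling rules above can be illustrated with a minimal Python sketch. It is not the software used in the study. The function names are hypothetical, and the handling of missing items (prorating with the mean of the answered items) is an assumption, since the text only states when a score is left missing.

```python
from typing import Dict, Optional, Sequence

MAX_MISSING = 3  # from the text: a score is computed only if no more than 3 items are missing


def sum_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Return a sum score, or None when more than MAX_MISSING items are missing.

    Assumption (not stated in the text): missing items are prorated with the
    mean of the answered items.
    """
    answered = [v for v in items if v is not None]
    n_missing = len(items) - len(answered)
    if n_missing > MAX_MISSING or not answered:
        return None
    return sum(answered) / len(answered) * len(items)


def floor_ceiling(scores: Sequence[Optional[float]], lowest: float, highest: float,
                  threshold: float = 0.15) -> Dict[str, bool]:
    """Flag a floor/ceiling effect when more than `threshold` of the valid
    respondents obtain the lowest/highest possible sum score."""
    valid = [s for s in scores if s is not None]
    if not valid:
        raise ValueError("no valid scores")
    floor_share = sum(s == lowest for s in valid) / len(valid)
    ceiling_share = sum(s == highest for s in valid) / len(valid)
    return {"floor": floor_share > threshold, "ceiling": ceiling_share > threshold}
```

The score range is passed in explicitly (`lowest`, `highest`) so that the same check can be applied to any of the instruments.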
As no reference standard measures were collected for the UK primary care cohort, we assessed convergent construct validity for the Danish version of the MSK-HQ only. This was done by correlation analyses between the MSK-HQ scores and the relevant reference standard measure scores at baseline (Q-DASH, ODI, NDI and WOMAC scores calculated from KOOS and HOOS). Likewise, longitudinal convergent validity was assessed by correlation analyses between MSK-HQ and reference standard measure change scores at three months. Based on findings from the original validation study of the MSK-HQ and previous literature on health outcome research [30-32], we expected correlations between the MSK-HQ and the relevant reference standard measures to be moderate to strong (r ≥ 0.5).
Cross-cultural validity and measurement invariance
Measurement invariance according to language, categories of age, pain site, duration of pain, and work status was assessed by differential item functioning (DIF) analysis of the baseline ratings of the two cohorts. DIF analysis assesses the extent to which items function differently between groups once differences in the overall scores of those groups have been accounted for. Uniform dichotomous DIF on the raw scores was assessed via the Mantel-Haenszel (MH) statistic calculated with the difR package in R 3.4.1. Item purification was used, and adjustment for multiple comparisons was made via Holm’s method. The effect size of the DIF was assessed on the ETS Delta scale for the dichotomous categories (i.e., country, duration, work status). Note that MSK-HQ items were dichotomised such that the two lower impact categories (‘not at all’ and ‘slightly/rarely’) opposed the three higher impact categories. Furthermore, for the assessment of pain location, ‘neck’ was collapsed with ‘back’, as there were too few instances of neck pain to calculate the MH statistic robustly.
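As an illustration of the MH effect size, the following Python sketch computes the MH common odds ratio for one dichotomised item and converts it to the ETS Delta scale (Δ = −2.35 ln α). It is a simplified sketch and not the difR implementation: item purification and Holm adjustment are omitted, the function names are hypothetical, and the A/B/C cut-offs at |Δ| of 1.0 and 1.5 are the conventional ETS thresholds, assumed here because the text does not state them.

```python
import math
from typing import Iterable, Tuple

# One 2x2 table per level of the matching (total) score:
# (reference low impact, reference high impact, focal low impact, focal high impact)
Stratum = Tuple[int, int, int, int]


def mh_delta(strata: Iterable[Stratum]) -> Tuple[float, float]:
    """Return (MH common odds ratio, ETS Delta) for one dichotomous item."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty score levels carry no information
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these tables")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


def ets_category(delta: float) -> str:
    """Classify |Delta| with the conventional ETS cut-offs (assumed here)."""
    size = abs(delta)
    if size < 1.0:
        return "A"  # negligible DIF
    if size < 1.5:
        return "B"  # moderate DIF
    return "C"      # large DIF
```

An odds ratio of 1 (Δ = 0) means that, among respondents with the same total score, both groups are equally likely to endorse the lower impact categories of the item.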
Measurement error and reliability
As test-retest was administered differently in the two cohorts with respect to time interval and initiation of treatment, we restricted the analysis to patients who reported their condition to be ‘stable’ between administrations. Systematic measurement error between MSK-HQ scores at baseline and retest was analysed by Bland-Altman plots and paired t-tests for the DK and UK cohorts. Furthermore, random error was estimated by the standard error of measurement (SEM), and the minimal detectable change (MDC = 1.96 × √2 × SEM) was calculated. Cronbach’s alpha was calculated to assess internal consistency. The intraclass correlation coefficient (ICC(2,1)) was used to assess test-retest reliability, and for single items Cohen’s Kappa with quadratic weights was used. Confidence intervals (95% CI) for Kappa values were obtained using non-parametric bootstrap methods (1000 replications).
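The reliability quantities can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code of the study: the function names are hypothetical, the ICC follows the standard two-way random effects, absolute agreement, single measurement form, and the SEM is derived from the standard deviation of the test-retest differences (SD_diff/√2), which is an assumption because the text does not specify how the SEM was obtained.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def icc_2_1(test: Sequence[float], retest: Sequence[float]) -> float:
    """ICC(2,1) for two administrations, from two-way ANOVA mean squares."""
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = mean(list(test) + list(retest))
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)          # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in (mean(test), mean(retest)))  # between occasions
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem_and_mdc(test: Sequence[float], retest: Sequence[float]) -> Tuple[float, float]:
    """Return (SEM, MDC) with MDC = 1.96 * sqrt(2) * SEM as given in the text.

    Assumption: SEM = SD of the test-retest differences divided by sqrt(2).
    """
    diffs = [b - a for a, b in zip(test, retest)]
    sem = stdev(diffs) / math.sqrt(2)
    mdc = 1.96 * math.sqrt(2) * sem
    return sem, mdc
```

Under this SEM definition the MDC reduces to 1.96 times the standard deviation of the differences, which corresponds to the half-width of the Bland-Altman limits of agreement around the mean difference.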
Sensitivity to change, responsiveness and interpretability
To evaluate sensitivity to change, MSK-HQ change scores from baseline to three months and effect-size statistics (i.e., mean change/standard deviation at baseline) were calculated and compared between the two cohorts. For responsiveness, the MSK-HQ's ability to discriminate between improved and unchanged patients was calculated and compared between cohorts by receiver-operating-characteristic (ROC) curve analyses for large improvement (much better, better versus a little better, unchanged, little worse) and small improvement (much better, better, little better versus unchanged), using the transition question as the external anchor. Responsiveness to worsening was not analysed, as only a few patients rated their condition to be ‘worse’ or ‘much worse’. Minimal clinically important change (MCIC) values were estimated by using Pythagoras' theorem (a^2 + b^2 = c^2) to identify the change score on the ROC curve closest to the upper left-hand corner, i.e., the score that best discriminated between improved and unchanged patients. As MCIC values can be affected by baseline scores, the analysis was repeated with relative change scores (i.e., change scores expressed as percentages of the baseline scores). Finally, as a key vision for the development of the MSK-HQ was to produce a single musculoskeletal health measure superior to generic health tools, we compared effect size estimates and areas under the ROC curve for the MSK-HQ and the EQ-5D change scores. The statistical package STATA version 15 was used.
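The effect size and the ROC-based cut-off can be illustrated with a short Python sketch. It is not the STATA code used in the study; the function names are hypothetical, and it assumes that higher change scores indicate improvement and that a patient is classified as improved when the change score is at or above the cut-off.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change divided by the standard deviation at baseline."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def closest_to_corner_cutoff(changes: Sequence[float],
                             improved: Sequence[bool]) -> Tuple[float, float, float, float]:
    """Return (cut-off, distance, sensitivity, specificity) for the change score
    whose ROC point lies closest to the upper left-hand corner.

    The distance follows Pythagoras' theorem:
    sqrt((1 - sensitivity)^2 + (1 - specificity)^2).
    """
    pos = [c for c, y in zip(changes, improved) if y]
    neg = [c for c, y in zip(changes, improved) if not y]
    if not pos or not neg:
        raise ValueError("both improved and unchanged patients are required")
    best = None
    for cut in sorted(set(changes)):
        sens = sum(c >= cut for c in pos) / len(pos)
        spec = sum(c < cut for c in neg) / len(neg)
        dist = math.hypot(1 - sens, 1 - spec)
        if best is None or dist < best[1]:
            best = (cut, dist, sens, spec)
    return best
```

The same function can be applied to relative change scores by passing changes expressed as percentages of the baseline score.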