WOMAC questionnaire
The WOMAC has been validated in Arabic-speaking countries and has since been used in practice. Nevertheless, additional analyses were conducted to explore the psychometric characteristics of the WOMAC questionnaire used in this study.
Reliability at the first testing occasion, calculated using Cronbach's alpha, was 0.98 for the pain subscale, 0.98 for the stiffness subscale, and 0.99 for the physical function subscale. At the second testing occasion, reliability was 0.99, 0.97, and 0.99 (pain, stiffness, and physical function, respectively). These values indicate that the WOMAC is an instrument with good reliability.
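As an illustration of how Cronbach's alpha is obtained from item-level responses, the following minimal sketch applies the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response matrix are hypothetical and are not the study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(items[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    # Sample variance of the respondents' total scores
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 items (illustrative only)
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [0, 1, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values close to 1 arise when the items vary together, so that most of the variance of the total score is shared across items.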
To check content validity, floor and ceiling effects were examined. A floor effect was recorded for 10% of patients on the pain subscale, 14% on the stiffness subscale, and 12% on the physical function subscale. A ceiling effect was recorded for 3% of patients on each of the three subscales. Because these percentages are far below 30% (the threshold considered relevant), they support the content validity of the WOMAC.
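The floor and ceiling percentages can be computed as the share of respondents who score exactly at the lowest or highest possible value of a scale. The sketch below assumes a 0-100 scale and uses hypothetical scores; the function name `floor_ceiling_rates` is illustrative, and only the 30% threshold is taken from the text above.

```python
def floor_ceiling_rates(scores, min_score, max_score):
    """Return the proportions of scores at the scale minimum and maximum."""
    n = len(scores)
    floor_rate = sum(1 for s in scores if s == min_score) / n
    ceiling_rate = sum(1 for s in scores if s == max_score) / n
    return floor_rate, ceiling_rate


# Hypothetical subscale scores on a 0-100 scale (illustrative only)
scores = [0, 12, 35, 50, 50, 64, 78, 90, 100, 41]
floor_rate, ceiling_rate = floor_ceiling_rates(scores, 0, 100)

THRESHOLD = 0.30  # proportion treated as relevant in the text
print(floor_rate < THRESHOLD, ceiling_rate < THRESHOLD)
```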
Harris Hip Score
To test the reliability of the instrument, Cronbach's alpha was calculated. For each of the three testing occasions, reliability was very good or excellent: α1 = 0.92, α2 = 0.91, and α3 = 0.90. The intra-class correlation coefficient was good, at 0.76 (95% CI 0.44-0.88).
In the first week of testing, a floor effect was recorded for 1% of patients and a ceiling effect for 2%. Two and a half weeks later, 1% of respondents again showed a ceiling effect, and no floor effect was recorded. At the third testing, 1% recorded a floor effect, and once again no ceiling effect was documented. The Shapiro-Wilk test was used to check whether the data deviated significantly from a normal distribution, which they did on all three testing occasions.
Table 1. Descriptive statistics of Harris Hip Score questionnaire

| | | N | Min | Max | Mean | SD | Sk | Ku | Floor effect | Ceiling effect |
|---|---|---|---|---|---|---|---|---|---|---|
| HHS | Week 1 | 110 | 0 | 91 | 66.0 | 17.613 | -1.232 | 1.494 | 1% | 2% |
| | Week 2 | 110 | 0 | 87 | 61.1 | 17.841 | -1.024 | .692 | 1% | 0% |
| | Week 3 | 108 | 0 | 85 | 52.6 | 18.563 | -.565 | -.015 | 1% | 0% |

Note: N = sample size; Min = minimum; Max = maximum; SD = standard deviation; Sk = skewness; Ku = kurtosis.
Test-retest reliability of the HHS was assessed over a 2-week interval. Of the 110 patients who completed the questionnaire, 108 responded to the second assessment after the initial evaluation.
Table 2: Mean, standard deviation, change, ICC, and Cronbach's alpha between different assessments of each subscale

| Subscales | First assessment: Mean | First assessment: SD | Second assessment: Mean | Second assessment: SD | Third assessment: Mean | Third assessment: SD | Change* | ICC (95% CI) | Cronbach's alpha (95% CI) |
|---|---|---|---|---|---|---|---|---|---|
| WOMAC | | | | | | | | | |
| Pain | 53.22 | 15.90 | 63.17 | 18.85 | | | 9.95 | 0.581 (0.234 - 0.760) | 0.735 (0.379 - 0.864) |
| Stiffness | 53.38 | 16.87 | 63.55 | 18.50 | | | 10.17 | 0.593 (0.230 - 0.772) | 0.745 (0.375 - 0.872) |
| Physical Function | 53.31 | 16.39 | 62.91 | 18.60 | | | 9.60 | 0.623 (0.262 - 0.793) | 0.768 (0.416 - 0.884) |
| HHS | 72.55 | 19.35 | 67.12 | 19.61 | 57.81 | 20.40 | -14.74 | 0.755 (0.442 - 0.876) | 0.902 (0.704 - 0.955) |

* A minus sign for HHS means that the patient's condition worsened over time (lower score = deterioration). A plus sign for WOMAC means that the patient's condition worsened over time (higher score = deterioration).
Test-retest reliability was also assessed using the intra-class correlation coefficient (ICC). The results (Table 2) indicated that the HHS has an acceptable ICC of 0.755 (95% CI 0.442 - 0.876). Given a Cronbach's alpha of 0.902 (95% CI 0.704 - 0.955), the internal consistency across the three assessments was very high.
To compare the WOMAC results with those of the HHS, the WOMAC scores were standardized to a range of 0-100. In addition, the HHS scores, which ranged from 0 to 91, were rescaled to 0-100 to match the WOMAC scores. Figure 1 illustrates the change in, and the mean level of, the different subscales across the assessments, which were conducted 2 weeks apart. It is visually evident that the mean HHS score decreased, which corresponds to more pain and symptoms. At the same time, the mean WOMAC score shows an upward trend, which also corresponds to more pain and a generally worsened condition. The two questionnaires are therefore in visual agreement.
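The rescaling step can be sketched as a simple linear (min-max) transformation. This assumes that a linear mapping of the observed range onto 0-100 was used, which the text does not state explicitly; the function name `rescale_to_100` is hypothetical.

```python
def rescale_to_100(score, scale_min, scale_max):
    """Linearly map a score from [scale_min, scale_max] onto 0-100."""
    return (score - scale_min) / (scale_max - scale_min) * 100.0


# Week 1 HHS mean from Table 1 (66.0 on the observed 0-91 range)
hhs_week1_rescaled = rescale_to_100(66.0, 0, 91)
print(round(hhs_week1_rescaled, 2))
```

Under this assumption the week 1 HHS mean of 66.0 maps to roughly 72.5, which is close to the first-assessment HHS mean reported in Table 2; the small difference is consistent with the Table 1 mean being rounded.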
As can be seen in the table below, there are medium to large negative correlations between the Harris Hip Score and all subscales of the WOMAC questionnaire. Patients with high WOMAC scores therefore have low HHS scores; that is, those who experience greater hip pain score higher on the WOMAC and lower on the HHS.
Table 2. Convergent validity of the Harris Hip Score (Spearman's rank correlation coefficient)

| | | WOMAC: Pain | WOMAC: Stiffness | WOMAC: Physical function |
|---|---|---|---|---|
| Week 1 | Harris Hip Score | -.56** | -.61** | -.62** |
| Week 2 | Harris Hip Score | -.41** | -.42** | -.48** |

Note: ** Correlation is significant at the 0.01 level (2-tailed).
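For readers who wish to see how a Spearman rank correlation of this kind is obtained, the following sketch ranks both variables (using average ranks for ties) and computes the Pearson correlation of the ranks. The data are hypothetical and the helper names are illustrative; significance testing is not included.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for six patients (illustrative only)
hhs = [85, 70, 62, 55, 40, 30]
womac_pain = [10, 25, 20, 45, 60, 55]
rho = spearman_rho(hhs, womac_pain)
print(round(rho, 2))
```

A negative coefficient, as in the table above, reflects the opposite scoring directions of the two instruments.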
Responsiveness
According to the WOMAC questionnaire, 14 patients (13.1%) reported relevant overall improvement in their condition, 53 patients (49.5%) reported worsening, and 40 patients (37.4%) remained stable.
Table 3: Responsiveness and agreement between the two questionnaires

| QUESTIONNAIRES | | HARRIS HIP SCORE (HHS): Stable | HHS: Improvement | HHS: Deterioration | TOTAL |
|---|---|---|---|---|---|
| WOMAC | Stable | 3.7% | 2.8% | 30.8% | 37.4% |
| | Improvement | 0.0% | 2.8% | 10.3% | 13.1% |
| | Deterioration | 3.7% | 0.9% | 44.9% | 49.5% |
| TOTAL | | 7.5% | 6.5% | 86.0% | 100.0% |
In contrast, only 8 patients (7.3%) remained stable according to the HHS questionnaire. The majority (86.4%) considered their condition to have deteriorated, and only 6.4% reported relevant improvement after 2 weeks. In addition, 12 patients (11.2%) showed contradictory results (1 patient improved according to the HHS and worsened according to the WOMAC, while 11 patients showed the opposite). A further 33 patients (30%) had worsened according to the HHS but were unchanged according to the WOMAC (Table 3).
Table 4: Effect Sizes and SRMs for the WOMAC subscales and HHS. Bars represent the 95% confidence intervals.

| Questionnaire | Subscales | Effect Size (Cohen's d) | 95% CI* | SRM | 95% CI* |
|---|---|---|---|---|---|
| WOMAC | Pain | 0.571 | 0.387 - 0.751 | 0.406 | 0.358 - 0.434 |
| | Stiffness | 0.574 | 0.395 - 0.749 | 0.411 | 0.366 - 0.436 |
| | Physical Function | 0.547 | 0.378 - 0.709 | 0.410 | 0.363 - 0.434 |
| HHS | | 0.729 | 0.537 - 0.891 | 0.456 | 0.441 - 0.467 |

* Bootstrap confidence interval (1000 iterations; random number seed: 978).
Effect sizes are often used to give meaning to change over time in terms of 'trivial' (ES < 0.20), 'small' (0.20 ≤ ES < 0.50), 'moderate' (0.50 ≤ ES < 0.80) or 'large' (ES ≥ 0.80) change. Cohen introduced this 'matched pairs' effect size, which was later renamed the standardised response mean (SRM) by Liang et al. [20]. According to the responsiveness test, the WOMAC subscales show similar responsiveness (SRM = 0.41) between the first and second measurements. In comparison, the HHS showed better responsiveness, with SRM = 0.46. It is important to note, however, that the responsiveness of the two questionnaires is very similar and the differences are not considerable.
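The two responsiveness statistics can be sketched as follows. The text does not state which standardizer was used for Cohen's d; the sketch assumes the root mean square of the two assessment SDs, which reproduces the reported pain value from the Table 2 summary statistics. The SRM is the mean change divided by the SD of the change scores, and the confidence interval is assumed to be a percentile bootstrap. All function names and the paired scores are hypothetical, and Python's random generator seeded with 978 will not reproduce the numbers from the software used in the study.

```python
import random
from statistics import mean, stdev


def cohens_d_pooled(mean1, sd1, mean2, sd2):
    """Effect size: mean change divided by the pooled SD (assumed form)."""
    pooled_sd = ((sd1 ** 2 + sd2 ** 2) / 2) ** 0.5
    return (mean2 - mean1) / pooled_sd


def srm(before, after):
    """Standardised response mean: mean change / SD of the change scores."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)


def bootstrap_ci(before, after, stat, iterations=1000, seed=978, level=0.95):
    """Percentile bootstrap CI, resampling patients (pairs) with replacement."""
    rng = random.Random(seed)
    n = len(before)
    estimates = []
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([before[i] for i in idx], [after[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * iterations)]
    upper = estimates[int((1 + level) / 2 * iterations) - 1]
    return lower, upper


# WOMAC pain summary statistics from Table 2 (first and second assessment)
d_pain = cohens_d_pooled(53.22, 15.90, 63.17, 18.85)
print(round(d_pain, 3))

# Hypothetical paired scores for 12 patients (illustrative only)
before = [50, 55, 48, 60, 52, 47, 58, 53, 49, 56, 51, 54]
after = [58, 60, 55, 75, 50, 57, 66, 70, 52, 61, 64, 59]
srm_value = srm(before, after)
ci_lower, ci_upper = bootstrap_ci(before, after, srm)
print(round(srm_value, 2), round(ci_lower, 2), round(ci_upper, 2))
```

Resampling whole patients rather than individual scores keeps each before/after pair intact, which is what a paired statistic such as the SRM requires.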
Level of Agreement between WOMAC & HHS
One of the best methods for assessing the level of agreement between two measurement methods is the Bland-Altman plot. In this method, the difference between the WOMAC and HHS scores is plotted against the mean of the two scores. As shown in the graphs, the overall mean difference between the WOMAC and HHS suggests that there could be a systematic bias between the two questionnaires (M = -7.49, 95% CI -13.59 to -1.41, p = 0.016). To test this result, a linear regression was performed with the difference between the WOMAC and HHS as the dependent variable and the mean of the WOMAC and HHS as the independent variable. The regression also indicates a statistically significant difference between the two measurement methods (β = -0.94, 95% CI -1.801 to -0.081, t = -2.168, p = 0.03).
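The quantities behind a Bland-Altman analysis can be sketched as below: the per-patient differences and means, the mean difference (bias), the conventional 95% limits of agreement (bias ± 1.96 SD), and the ordinary least squares slope of the differences on the means. The data and the function name `bland_altman` are hypothetical; confidence intervals and p-values for the bias and the slope, as reported in the text, are not computed here.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return bias, limits of agreement, and the slope of difference on mean."""
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Ordinary least squares regression of the differences on the means
    m_bar = mean(means)
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx
    intercept = bias - slope * m_bar
    return {"bias": bias, "limits": limits, "slope": slope, "intercept": intercept}


# Hypothetical 0-100 scores for six patients (illustrative only)
womac = [40, 55, 60, 45, 70, 65]
hhs = [50, 60, 72, 48, 85, 70]
result = bland_altman(womac, hhs)
print(round(result["bias"], 2), round(result["slope"], 2))
```

A bias that differs from zero points to a constant offset between the two methods, while a non-zero slope suggests that the disagreement depends on the magnitude of the score.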
The first and last measurements of both methods were also compared using Bland-Altman plots, to investigate whether the systematic bias between the two methods changes over time. The results indicate a systematic bias between the two methods at the first measurement (M = -18.9, 95% CI -25.13 to -12.65, p < 0.001), which the linear regression confirms (β = -0.95, 95% CI -1.81 to -0.104, t = -2.235, p = 0.028). This means that the HHS increasingly overestimates worsening relative to the WOMAC. At the last measurement, however, the slope of the regression line was smaller and no longer statistically significant (β = -0.58, 95% CI -1.38 to 0.23, t = -1.429, p = 0.156).