This study was conducted to test the validity of the Swedish translation of WOOS. Convergent criterion validity was tested by correlating WOOS with CMS, OSS and EQ-5D. The correlation between WOOS and CMS was high, which is in accordance with the original English version (1) and with the results from a validation study of the Danish version of WOOS (17). The correlation between WOOS and OSS was also high, and our results show that the Swedish version of WOOS is valid compared to both of these widely used shoulder-specific scoring systems.
The correlation between WOOS and EQ-5D 3L was higher than we expected. In the validation studies for the original English version and the Danish version, WOOS was correlated with the 36-item general health measure SF36 instead of EQ-5D. In both studies, the correlations between WOOS and SF36 were shown to be poor (1, 17).
In addition, we examined correlations between each separate domain of WOOS and EQ-5D in this study. The best correlation was seen between the physical domain of WOOS and EQ-5D. This might be explained by the emphasis on pain in the physical domain of WOOS, and by the fact that pain is also reflected in EQ-5D to a large extent. The weakest correlation was seen between the emotional domain of WOOS and EQ-5D.
Comparing WOOS domains with the clinically examined items of CMS, we found that the correlation for the total score was 0.53. The highest correlation was seen for the physical domain of WOOS, which may be expected as the physical domain of WOOS covers the same type of issues that a physical examination does. As noted earlier, one difference between WOOS and CMS is that WOOS only contains patient-reported questions, whereas CMS includes questions that necessitate a clinical examination.
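The convergent validity analysis discussed above rests on correlation coefficients between paired scores. The following is a minimal sketch of a Pearson correlation, assuming paired scores per patient; the source does not state which coefficient was used in this study, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of paired scores.

    Illustrative only: the study may have used a different coefficient
    (for example a rank-based one).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

A value near 1 would indicate that patients who score high on one instrument also score high on the other.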
Content validity was analyzed by means of floor and ceiling effects. There was no floor effect and a small ceiling effect for total WOOS. There were adequate floor and ceiling effects in some of the domains, the highest being postoperative ceiling effects in the emotions domain. This is in accordance with the results from other articles on validation of the Western Ontario shoulder instruments (7, 17, 18).
The lack of a preoperative floor effect is a good property of WOOS, and makes the score sensitive not only to improved, but also to worsened, symptoms. The small postoperative ceiling effect means that some patients reported that they were free of all symptoms after surgery, and that they will therefore not be able to report any further improvement in a later assessment. This may be considered a weakness, or an indication that other measures are needed to assess shoulders that are free of symptoms.
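As an illustration of how floor and ceiling effects are typically quantified, the sketch below computes the percentage of respondents at the worst and best possible scores. It assumes scores expressed on a 0 to 100 scale where 100 means no symptoms; this scale, the function name `floor_ceiling` and its parameters are assumptions made for the example, not details taken from the study.

```python
def floor_ceiling(scores, worst=0.0, best=100.0):
    """Return (floor %, ceiling %) for a list of scores.

    Assumption: `worst` and `best` are the extreme values of the scale
    (here a hypothetical 0-100 scale with 100 meaning symptom-free).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    # Share of respondents sitting exactly at each end of the scale
    floor_pct = 100.0 * sum(1 for s in scores if s == worst) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == best) / n
    return floor_pct, ceiling_pct
```

Patients counted in the ceiling percentage are those who cannot register further improvement at a later assessment.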
Analyzing reliability, CA was shown to be excellent in all the domains and for total WOOS in the postoperative group as well as when combined with the preoperative group. CA for total WOOS was very high (0.95), which indicates that some items are redundant. However, as CA normally increases with the number of items in an instrument (WOOS has 19 items), this might be a contributing factor for the high CA value.
In the preoperative group, lower CA values were seen when the domains were analyzed separately, and the sport and lifestyle domains both had CA values below 0.7. The emotions domain had a CA value of 0.72 and was thus graded as adequate; the physical domain also had a CA value above 0.7.
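The relation between CA and the number of items can be sketched in code. The example below assumes the conventional definition of Cronbach's alpha and, for the item-count effect, the standardized form based on the mean inter-item correlation. The function names and the input layout (one list of item scores per respondent) are hypothetical, and no study data are used.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for `responses`: one list of k item scores per respondent."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the summed score
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)
```

With a purely hypothetical mean inter-item correlation of 0.5, `standardized_alpha(4, 0.5)` gives 0.8 while `standardized_alpha(19, 0.5)` gives about 0.95, which shows how a longer instrument can reach a high CA without any change in how strongly the items correlate.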
Responsiveness to change was analyzed by calculating ES as well as SRM. All four scoring systems showed high ES, all exceeding 0.8. The WOOS score had the highest ES at 2.52, and an SRM of 1.43. These results were similar to those of the study on the Danish translation of WOOS (17), which reported an ES of 2.32 and an SRM of 1.41 for the WOOS score. Support for a high responsiveness of WOOS in shoulder arthroplasty, as well as an excellent correlation with the American Shoulder and Elbow Surgeons score, is also shown in another recent study (19). In the original WOOS article (1), as well as in the previous Swedish validation of WOOS in patients with subacromial pain (7), only the SRM was presented for the different scores. In these studies, the SRMs were 1.20 and 1.91, respectively. We believe that these results support the notion that WOOS is a responsive instrument in a clinical setting.
The difference between the ES and SRM for WOOS in our study is an effect of the much larger standard deviation seen in the postoperative group compared with the preoperative group. When plotted as a histogram, the preoperative scores come closer to a normal distribution than the postoperative scores. This could be explained by the large number of good results in the postoperative group and is also reflected by the occurrence of a small, but adequate, ceiling effect.
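To make the distinction between the two responsiveness measures concrete, the sketch below assumes the conventional definitions: ES is the mean change divided by the standard deviation of the baseline scores, and SRM is the mean change divided by the standard deviation of the change scores. The function names and the paired-list input are hypothetical, and the source does not spell out the exact formulas used.

```python
from statistics import mean, stdev

def effect_size(pre, post):
    """ES: mean change divided by the SD of the baseline (preoperative) scores."""
    change = [after - before for before, after in zip(pre, post)]
    return mean(change) / stdev(pre)

def standardized_response_mean(pre, post):
    """SRM: mean change divided by the SD of the individual change scores."""
    change = [after - before for before, after in zip(pre, post)]
    return mean(change) / stdev(change)
```

Under these definitions both measures share the same numerator, so ES exceeds SRM exactly when the change scores vary more than the baseline scores, which a wider spread of postoperative results can produce.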
The MDC and MCID were found to be at the level of previous estimates for WOOS, with a 10% change or difference as the minimum of clinical relevance recommended for WOOS (20).
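For readers unfamiliar with the MDC, one common distribution-based formulation derives it from the standard error of measurement (SEM). The sketch below shows that formulation only as an assumption: the source does not state which formula was used, and the function names, the reliability coefficient and the numbers in the usage note are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)

def minimal_detectable_change(sd, reliability, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%).

    The factor sqrt(2) accounts for measurement error at both assessments.
    """
    return z * standard_error_of_measurement(sd, reliability) * sqrt(2.0)
```

For example, a hypothetical standard deviation of 10 points and a reliability of 0.75 give an SEM of 5 points and an MDC of roughly 13.9 points.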
We find it notable that EQ-5D 3L, a general health measure, performs so well compared to shoulder-specific health measures. EQ-5D 3L was shown to be highly responsive to change (ES=0.82, SRM=0.86) in patients with glenohumeral OA. EQ-5D 3L provides no possibility to study specific shoulder-related problems and cannot replace WOOS as a shoulder evaluation tool. However, our results suggest that EQ-5D 3L adequately reflects disease-specific QoL in patients with glenohumeral OA. The EQ-5D 3L questionnaire also takes less time and effort to complete than the WOOS questionnaire.
The outcome of the treatment, as measured by the PROM used in the SSAR, is considered stable over time. There is a slight decrease in the overall results at 10 years, but it is smaller than the MDC, and it may be difficult to determine whether a change is related to implant performance or results from increasing patient age. The possible need for an age-adjusted WOOS will have to be studied separately. The lack of a clinical examination in WOOS might be regarded as a weakness of the score. However, evidence that WOOS adequately covers these questions could improve evaluation of patients with glenohumeral OA and save resources, and should be further studied.
One strength of this study is that WOOS was correlated with both CMS and OSS. CMS is a well-established and widely used shoulder score, and we think it is an important comparison in the validation process of any shoulder score. The correlation with OSS is important since OSS is used in other shoulder arthroplasty registries, which can be of value when comparing results from different registries. The patient cohort was limited but could be considered useful for the planning of future studies comparing PROM outcomes. We could also demonstrate the real performance of the PROM over time, in use for a 10-year follow-up within the SSAR. No test-retest analysis was performed within this study, which we consider to be a weakness. A test-retest analysis of the Swedish translation of WOOS might be a subject for a future study to validate the score within the registry.