This study established a high level of content validity for the PRWHE, AUSCAN and TDX in patients with hand arthritis. The content validity index was very high for every individual item of each questionnaire (I-CVI > 0.77) and for the overall scale scores (S-CVI > 0.85) in terms of both relevancy and clarity. Kappa inter-rater agreement among the raters was excellent for every individual item of all three PROMs.
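For readers less familiar with these indices, the following is a minimal sketch of how an I-CVI and an averaged S-CVI are commonly computed. It assumes a 4-point rating scale on which ratings of 3 or 4 count as endorsing an item, and it assumes the S-CVI is the average of the item-level values; the scale, the threshold, the function names and the example ratings are all illustrative assumptions and are not taken from this study.

```python
from statistics import mean


def item_cvi(ratings, threshold=3):
    """Proportion of raters who scored the item at or above `threshold`.

    Assumes a 4-point scale where 3 and 4 count as endorsement
    (an illustrative convention, not confirmed by the study).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    endorsing = sum(1 for r in ratings if r >= threshold)
    return endorsing / len(ratings)


def scale_cvi_average(ratings_by_item, threshold=3):
    """Average of the item-level CVIs across all items of a questionnaire."""
    return mean(item_cvi(r, threshold) for r in ratings_by_item.values())


# Hypothetical ratings from five raters for two items.
example_ratings = {
    "item_1": [4, 4, 3, 2, 4],  # four of five raters endorse the item
    "item_2": [3, 3, 4, 4, 4],  # all five raters endorse the item
}

i_cvis = {name: item_cvi(r) for name, r in example_ratings.items()}
s_cvi = scale_cvi_average(example_ratings)
```

In this hypothetical example the item-level values are 0.8 and 1.0, and the averaged scale-level value is their mean.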
The content validity of the PRWHE was established during the development of the PRWE through semi-structured interviews with patients with distal radius fracture and through expert opinion. However, neither source of evidence was quantified. Thus, this study provides novel information on the content validity of the PRWE/PRWHE items, with specific reference to people with hand arthritis. All PRWHE items had a very high content validity index for relevance (I-CVI > 0.79) and clarity (I-CVI > 0.87).
For the AUSCAN, content validity was established during development using a formal clinimetric process in which patients in a tertiary care centre rated items by importance and frequency to establish relevance. This study provides additional support for its content validity in a community sample of people living with hand arthritis and adds new data on the clarity of the items.
It might have been expected that the AUSCAN would be more relevant to our sample than the PRWHE, since it is a disease-specific PROM. Both the point estimates and the CIs indicate that the AUSCAN had slightly higher overall scores for relevancy (S-CVI = 0.92, 95% CI: 0.90 to 0.94) and clarity (S-CVI = 0.99, 95% CI: 0.98 to 1.00) than the PRWHE (S-CVI = 0.85, 95% CI: 0.82 to 0.88 for relevancy; S-CVI = 0.95, 95% CI: 0.93 to 0.97 for clarity). Although the CIs of the respective S-CVIs indicate small statistically significant differences between the compared values (AUSCAN vs TDX and AUSCAN vs PRWHE; Table 5), all PROMs met the standard for very high content validity. In addition, since 6 to 8 more raters assessed the PRWHE than the AUSCAN, the small differences may reflect differences in the rater pools rather than an actual difference in perceptions.
The TDX is a relatively newly developed PROM (Noback et al. 2017) that was initially tested in patients with basal joint arthritis. The TDX demonstrated a very high content validity index for relevancy (S-CVI = 0.87, 95% CI: 0.85 to 0.89) and clarity (S-CVI = 0.91, 95% CI: 0.89 to 0.94). All individual items of the TDX had a very high content validity index (I-CVI > 0.77). To the authors' knowledge, no previous study has assessed the content validity index of the TDX. Item generation for the TDX included a review of items from relevant scales (Michigan Hand Questionnaire (MHQ), Disabilities of the Arm, Shoulder, and Hand (DASH), AUSCAN, PRWHE and McGill Pain Questionnaire). The development process then proceeded through item reduction, pilot testing and a final item reduction.
Our kappa statistics indicated excellent agreement between patient raters after correcting for chance agreement (K > 0.77). Ratings from a large pool of experts (n > 60) produced similar I-CVI and K values. This has been described previously in the literature: as the number of experts increases, the probability of chance agreement (Pc) decreases, and the K and I-CVI values tend to converge.
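The convergence of K and I-CVI with larger rater pools can be illustrated with a short sketch. It assumes the modified kappa formulation commonly attributed to Polit, Beck and Owen, in which the probability of chance agreement is Pc = [N! / (A!(N - A)!)] × 0.5^N for N raters of whom A endorse the item, and K = (I-CVI - Pc) / (1 - Pc). This study does not state the exact formula it used, so the formulation, the function names and the rater counts below are illustrative assumptions.

```python
from math import comb


def chance_agreement(n_raters, n_agree):
    """Probability that exactly `n_agree` of `n_raters` endorse an item by chance.

    Assumes each rater endorses with probability 0.5 (binomial model).
    """
    return comb(n_raters, n_agree) * 0.5 ** n_raters


def modified_kappa(n_raters, n_agree):
    """I-CVI adjusted for chance agreement: (I-CVI - Pc) / (1 - Pc)."""
    if n_raters < 1:
        raise ValueError("at least one rater is required")
    i_cvi = n_agree / n_raters
    pc = chance_agreement(n_raters, n_agree)
    return (i_cvi - pc) / (1 - pc)


# Same I-CVI of 0.80 with a small and a large hypothetical rater pool.
small_pool_kappa = modified_kappa(n_raters=5, n_agree=4)
large_pool_kappa = modified_kappa(n_raters=60, n_agree=48)
```

With 5 hypothetical raters, Pc is 0.15625 and the adjusted value falls noticeably below the I-CVI of 0.80; with 60 raters, Pc is on the order of one in a million and the two values are practically identical.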
This study provides novel data on the content validity index of 3 different PROMs in patients with hand arthritis. Since few studies address content validity, these data are important to support the conceptual foundations of these measures. While the CVI is relatively easy to compute, its major weakness is that it does not adjust for chance agreement. The authors tried to mitigate this problem by calculating a modified kappa agreement. A potential limitation is that the PROMs were not presented in random order; their items were rated for relevance and clarity in a fixed order (PRWHE, AUSCAN, TDX). We consider an order effect on the CVI values highly unlikely. First, higher scores were found for the AUSCAN and not for the PRWHE, which was rated first. Second, all CVI scores were very high, which indicates that the conclusions are not affected by order.