Instruments
The OxCAP-MH is a self-reported, 16-item, mental health specific instrument, where items are rated on a 1–5 Likert scale and each question contributes equally to the overall score. The 16 items cover a broad range of aspects of individual wellbeing, including: Overall health, Enjoying social and recreational activities, Losing sleep over worry, Friendship and support, Having suitable accommodation, Feeling safe, Likelihood of discrimination and assault, Freedom of personal and artistic expression, Appreciation of nature, Self-determination and Access to interesting activities or employment (10). The OxCAP-MH initial score (16–80 scale) is converted to a 0–100 scale, referring to minimum and maximum capabilities, using the formula: 100 × (OxCAP-MH total score – minimum possible score)/possible range (11). Higher scores indicate better capabilities; items 2, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15 and 16 are reverse coded. The OxCAP-MH has shown validity (5, 11), responsiveness (5, 11) and feasibility (10) in several settings and mental health disease areas and is currently available in English, German (22) and Hungarian (23), with further translations ongoing. In an earlier factor analysis, Laszewska et al. found that all EQ-5D-5L items and seven OxCAP-MH items loaded on one factor and the nine remaining OxCAP-MH items loaded on a separate factor, indicating that the OxCAP-MH may be seen as supplementary rather than complementary in its concept when compared to the EQ-5D-5L (5). The OxCAP-MH does not yet have a preference-based value set; however, research is ongoing to develop a weighting system for its domains.
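The score conversion described above can be illustrated with a minimal sketch. The function name `oxcap_mh_standardised` is hypothetical, and the sketch assumes that reverse coding maps a response x to 6 − x on the 1–5 scale; the official scoring instructions should be consulted for actual use.

```python
# Items stated in the text as reverse coded (1-based item numbers).
REVERSE_CODED = {2, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16}


def oxcap_mh_standardised(responses):
    """Return a 0-100 score from 16 raw item responses (each 1-5).

    Hypothetical helper for illustration only.
    """
    if len(responses) != 16:
        raise ValueError("expected 16 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    # Assumption: a reverse-coded response x becomes 6 - x.
    coded = [6 - r if item in REVERSE_CODED else r
             for item, r in enumerate(responses, start=1)]

    total = sum(coded)        # raw total on the 16-80 scale
    minimum = 16              # 16 items x lowest response (1)
    possible_range = 80 - 16  # maximum minus minimum possible score

    # 100 x (total score - minimum possible score) / possible range
    return 100 * (total - minimum) / possible_range


# A respondent answering 3 on every item sits at the scale midpoint.
midpoint = oxcap_mh_standardised([3] * 16)
```

Because 3 is its own mirror image under the assumed reverse coding, a respondent who answers 3 throughout has a raw total of 48 and a standardised score of 50.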
The ICECAP-A is a brief self-reported measure for the general adult population with five items, each of which can take one of four levels ranging from full capability to no capability. The domains include Stability (being able to feel settled and secure), Attachment (being able to have love, friendship and support), Autonomy (being able to be independent), Achievement (being able to achieve and progress), and Enjoyment (being able to have enjoyment and pleasure) (12). The ICECAP-A has shown validity (16, 17, 19, 24, 25), reliability (26, 27), responsiveness (28) and feasibility (14) in different populations. Besides the original English language version, it is also available in German (26), Chinese (29), Welsh, Dutch, Danish, Persian and Italian (30). Previous factor analyses comparing the ICECAP-A with the items of the EQ-5D-5L (31) and EQ-5D-3L (13, 15) found that these instruments measure two different constructs and therefore provide potentially different information. A recent systematic literature review found inconsistencies between the ICECAP-A and EQ-5D instruments, suggesting that the ICECAP-A is most appropriately regarded as a complement to, and not a substitute for, the EQ-5D-3L and EQ-5D-5L in particular (32). The ICECAP-A has a preference-based value set derived from the UK general population (24) and it is increasingly used in economic evaluations (32). The simple sum of ICECAP-A level scores ranges from 5 to 20, with higher scores representing better capabilities.
The EQ-5D-5L is one of the most commonly used self-reported generic health status measures, and its validity and reliability have been reported in various health conditions and populations (33). The EQ-5D-5L descriptive system comprises five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Besides the original 3-level version (34), a more sensitive 5-level version has existed since 2009 (35). Both versions have value sets in several countries (36), but they can also be used as simple descriptive systems with total scores ranging from 5–15 for the 3L version and 5–25 for the 5L version, with higher scores representing better HRQoL. As part of this instrument, respondents’ self-rated health is also recorded on a vertical visual analogue scale (EQ-5D VAS), where scores range between 0 and 100, referring to the worst imaginable health state and the best imaginable health state, respectively.
Since the OxCAP-MH and EQ-5D VAS scores range between 0 and 100, the ICECAP-A level sum scores range between 5 and 20, and the EQ-5D-5L descriptive system level sum scores range between 5 and 25, direct comparisons between the instruments would be challenging. Hence, all values were transformed to a 0–1 range for the relevant statistical calculations, i.e. for the responsiveness and agreement analyses. For the OxCAP-MH and EQ-5D VAS scores, this was a simple division by 100; the ICECAP-A and EQ-5D-5L scores were transformed so that a score of 5 was recalibrated to 0 and scores of 20 and 25 were recalibrated to 1, respectively.
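The rescaling to a common 0–1 range can be sketched as follows. The helper name `rescale_unit` is hypothetical, and the sketch assumes the recalibration is linear between the stated end points, which the text implies but does not state explicitly.

```python
def rescale_unit(score, low, high):
    """Linearly map a score from [low, high] onto [0, 1].

    Hypothetical helper; linearity between the end points is an assumption.
    """
    if not low <= score <= high:
        raise ValueError("score lies outside the instrument's range")
    return (score - low) / (high - low)


# Instrument ranges as given in the text.
oxcap_unit = rescale_unit(50, 0, 100)   # equivalent to dividing by 100
vas_unit = rescale_unit(80, 0, 100)     # equivalent to dividing by 100
icecap_unit = rescale_unit(11, 5, 20)   # ICECAP-A level sum score
eq5d_unit = rescale_unit(15, 5, 25)     # EQ-5D-5L level sum score
```

With a lower bound of 0 and an upper bound of 100, the same function reduces to the simple division by 100 described for the OxCAP-MH and EQ-5D VAS.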
The Beck Depression Inventory (BDI), General Anxiety Disorder (GAD), Rosenberg self-esteem scale (RSES), and the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) are all mental-health specific, self-reported outcome instruments. They were used as anchors for the sensitivity to change analysis to assess external responsiveness.
BDI is a self-reported measure of depressive symptoms and their severity in adolescents and adults according to the Diagnostic and Statistical Manual of Mental Disorders (37). It has 21 items scored on a 4-point polytomous response scale ranging from 0 to 3 (21). Scores range between 0 and 63, with higher scores representing more severe depression.
GAD is a self-reported measure of anxiety symptoms over the last two weeks. It consists of seven items scored on a 0–3 scale, with higher scores indicating more severe symptoms (range from 0 to 21) (38). The cut-off scores of 5, 10 and 15 reflect mild, moderate and severe anxiety symptoms, respectively (39).
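As an illustration of the scoring and cut-offs just described, the sketch below sums the seven item scores and assigns a severity band. The function name `gad_severity` is hypothetical, and the label "minimal" for totals below 5 is an assumption, since the text names only the mild, moderate and severe bands.

```python
def gad_severity(item_scores):
    """Return (total, band) for seven GAD item scores, each 0-3.

    Hypothetical helper; the "minimal" label for totals below 5 is assumed.
    """
    if len(item_scores) != 7:
        raise ValueError("expected 7 item scores")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item score must be an integer from 0 to 3")

    total = sum(item_scores)  # possible range 0-21

    # Cut-offs of 5, 10 and 15 as stated in the text.
    if total >= 15:
        band = "severe"
    elif total >= 10:
        band = "moderate"
    elif total >= 5:
        band = "mild"
    else:
        band = "minimal"
    return total, band


example_total, example_band = gad_severity([2, 2, 2, 2, 1, 1, 0])
```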
RSES is a 10-item, self-reported instrument that measures global self-worth by measuring both positive and negative feelings about the self (40). Items are answered using a 4-point polytomous response scale ranging from strongly agree to strongly disagree. Items 2, 5, 6, 8 and 9 are reverse scored.
The self-reported WEMWBS instrument was developed in the UK to assess mental wellbeing including affective-emotional aspects, cognitive-evaluative dimensions and psychological functioning. It is a 14-item scale with 5 response categories (‘none of the time’, ‘rarely’, ‘some of the time’, ‘often’, ‘all of the time’), with a total score ranging from 14–70. A higher score indicates a higher level of mental wellbeing (41).
Responsiveness
Responsiveness was defined as the ability to capture clinically important changes over time (45). Patients filled out each of the four scales at both baseline and 9 months, which allowed for an exploration of change in mean scores over time. Responsiveness was assessed using an external approach, comparing the extent to which change in a capability measure relates to corresponding change in anchor instruments (46, 47). The analysis of responsiveness started with the definition of 2–4 instruments which could be used as autonomous anchors because they identify change that is unlikely to have arisen by chance (47).
The level of responsiveness was evaluated by defining groups who worsened, improved or remained stable, based on whether a change in the instrument scores between the baseline and 9-month follow-up assessments was measured for individuals by the reference or anchor instruments. The calculation was based on the difference between the baseline and 9-month values of the standard error of measurement (SEM), using the following formula: \(S_{diff}=\sqrt{SEM_{1}^{2}+SEM_{2}^{2}}\), where \(SEM_{1}\) and \(SEM_{2}\) denote the SEM at baseline and at 9 months, respectively.
SEM was calculated as the standard deviation (SD) of the instrument multiplied by the square root of one minus its reliability coefficient, at baseline and at 9 months (11, 48). Internal consistency reliability coefficients were calculated for each scale based on the baseline to 3-month and 6-month to 9-month follow-up scores. More details on the calculation of the difference in SEM values can be found in Appendix 5. There is no consensus about how many SEMs an individual's score must change for that change to be considered clinically meaningful. This paper used the threshold of one SEM, which is known to frequently correspond to a minimally important difference (11, 49). In addition, the standardised response mean (SRM) was calculated as the ratio of the mean change between baseline and 9-month follow-up scores in a single group to the SD of the change scores (42). Small, moderate and large magnitudes of change were indicated by SRM values of 0.20–0.49, 0.50–0.79 and ≥ 0.80, respectively (33). Next, the percentages of the study respondents who improved, worsened or remained stable according to the capability and anchor questionnaires were calculated to explore changes at the individual patient level (5).
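The quantities described in this section can be sketched in a few lines. All function names below are hypothetical, the input values are invented for illustration and are not study data, and the sketch makes three assumptions: that the change threshold passed to `classify_change` is the value of \(S_{diff}\) (the exact procedure is given in Appendix 5), that the SD of the change scores is the sample SD, and that the caller states whether higher scores are better for the instrument in question.

```python
import math
import statistics


def sem(sd, reliability):
    """Standard error of measurement: SD x sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def s_diff(sem_1, sem_2):
    """Combine baseline and 9-month SEM values: sqrt(SEM1^2 + SEM2^2)."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


def classify_change(baseline, follow_up, threshold, higher_is_better=True):
    """Label one individual as improved, worsened or stable.

    Assumption: a change larger than `threshold` in absolute value counts
    as real change; smaller changes are treated as stable.
    """
    change = follow_up - baseline
    if not higher_is_better:
        change = -change  # e.g. symptom scales where lower is better
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "stable"


def srm(baseline_scores, follow_up_scores):
    """Standardised response mean: mean change / SD of change scores.

    Assumption: the sample SD (n - 1 denominator) is used.
    """
    changes = [f - b for b, f in zip(baseline_scores, follow_up_scores)]
    return statistics.mean(changes) / statistics.stdev(changes)


# Illustrative values only (not study data).
threshold = s_diff(sem(10.0, 0.84), sem(9.0, 0.84))
label = classify_change(baseline=50.0, follow_up=58.0, threshold=threshold)
group_srm = srm([10, 12, 14, 16], [12, 15, 15, 20])
```

Keeping the direction of scoring as an explicit argument avoids mislabelling change on anchors such as the BDI and GAD, where a falling score indicates improvement.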