The analysis in this paper was based on data from the PoMeT trial (15), which investigated the impact of Positive Memory Training on depression symptoms of schizophrenia patients (n=100) in the UK between 2014 and 2016. The trial received ethical approval from the Berkshire Research Ethics Committee (REC ref 13/SC/0634). Patients were eligible for inclusion if they were between 18 and 65 years of age, had a DSM-V diagnosis of schizophrenia or schizoaffective disorder, and had at least a mild level of depression, defined as a score of 14 or more on the Beck Depression Inventory-II (27). Participants were identified by trial research assistants, working in collaboration with care coordinators based within community mental health teams. Randomisation was stratified by site and severity of depression (above and below a BDI-II score of 29, i.e. a severe level of depression) using randomly permuted blocks (15). Patients were assessed at four time points during the 9-month study period: baseline, 3 months, 6 months and 9 months. More details about the PoMeT trial can be found in Steel et al. (15).
The OxCAP-MH is a self-reported, 16-item instrument developed in the context of mental health outcome measurement, where items are rated on a 1–5 Likert scale and each question provides an equal contribution to the overall score. The 16 items cover a broad range of individual wellbeing aspects including: Overall health, Enjoying social and recreational activities, Losing sleep over worry, Friendship and support, Having suitable accommodation, Feeling safe, Likelihood of discrimination and assault, Freedom of personal and artistic expression, Appreciation of nature, Self-determination and Access to interesting activities or employment (9). The OxCAP-MH initial score (16-80 scale) is converted to a 0–100 scale referring to minimum and maximum capabilities using the formula: 100 × (OxCAP-MH total score – minimum possible score)/possible range (17). Higher scores indicate better capabilities; items 2, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15 and 16 are reverse coded. The OxCAP-MH has shown validity (5, 17), responsiveness (5, 17) and feasibility (9) in several settings and mental health disease areas, including schizophrenia and depression (14, 15, 24). It is currently available in the English, German (28), Hungarian (29) and Luganda languages, with further language translations ongoing. The OxCAP-MH does not yet have a preference-based value set and has so far been used in economic evaluations as a score; however, research is ongoing to develop a weighting system for its domains.
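As an illustration of the transformation described above, the following Python sketch computes a 0–100 score from 16 raw item responses. The function name `oxcap_mh_score` and the reverse-coding rule (6 minus the raw response) are assumptions made for illustration only; the instrument's official scoring instructions should be consulted for actual use.

```python
# Items reverse coded according to the description in the text.
REVERSE_CODED = {2, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16}


def oxcap_mh_score(responses):
    """Return the 0-100 score for 16 raw responses (item 1 first), each in 1..5."""
    if len(responses) != 16:
        raise ValueError("expected 16 item responses")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if not 1 <= value <= 5:
            raise ValueError(f"item {item_number}: response {value} outside 1..5")
        # Assumption: reverse coding maps 1<->5 and 2<->4, i.e. 6 - value.
        total += 6 - value if item_number in REVERSE_CODED else value
    minimum_possible = 16            # 16 items x lowest response of 1
    possible_range = 80 - 16         # maximum total minus minimum total
    return 100 * (total - minimum_possible) / possible_range


# A respondent choosing the middle category on every item scores 50.
print(oxcap_mh_score([3] * 16))
```

Under these assumptions, the lowest and highest possible totals (16 and 80) map to 0 and 100, respectively.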
The ICECAP-A is a brief self-reported measure for the general adult population with five items, each of which can take one of four levels ranging from full capability to no capability. The domains include Stability (being able to feel settled and secure), Attachment (being able to have love, friendship and support), Autonomy (being able to be independent), Achievement (being able to achieve and progress), and Enjoyment (being able to have enjoyment and pleasure) (18). The ICECAP-A has shown validity (22, 23, 25, 30, 31), reliability (32, 33), responsiveness (34) and feasibility (20) in different populations, including depression (35). The simple addition of ICECAP-A level sum scores ranges from 5 to 20, with higher scores representing better capabilities. Besides the original English language version, it is also available in German (32), Chinese (36), Welsh, Dutch, Danish, Persian and Italian (37). The ICECAP-A has a preference-based value set derived from the UK general population (30) and it is increasingly used in economic evaluations (38).
The EQ-5D is one of the most commonly used self-reported generic health status measures, and its validity and reliability have been reported in various health conditions and populations (39). The EQ-5D descriptive system comprises five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Besides the original 3-level version (40), a more sensitive, 5-level version has been available since 2009 (41), and both versions have value sets developed in several countries (42). The EQ-5D instruments can be used to summarise change with or without using value sets (43); however, value sets can introduce an exogenous source of variance into statistical inference (44). The EuroQol Group guidance to users of EQ-5D value sets warns against using value sets to produce a single index for statistical analysis of profiles that are meant to be purely descriptive. The EQ-5D instruments can be used as simple descriptive systems with total scores ranging from 5-15 for the 3L version and 5-25 for the 5L version, with higher scores representing better HRQoL. As part of the EQ-5D-5L, respondents’ self-rated health is also recorded on a vertical visual analogue scale (EQ-5D VAS), where scores range from 0 (worst imaginable health state) to 100 (best imaginable health state). The EQ-5D VAS can capture further important information that complements the health state patients self-report on the EQ-5D descriptive system (45).
The analysis also included comparative information from four condition-specific instruments used in the PoMeT trial. These instruments capture important aspects of the condition from the patient’s perspective and thereby enabled an assessment of the ability of the OxCAP-MH, ICECAP-A and EQ-5D-5L to reflect clinically relevant mental health outcomes. The Beck Depression Inventory (BDI), General Anxiety Disorder (GAD), Rosenberg self-esteem scale (RSES), and the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) are all mental-health specific, self-reported measures.
BDI is a self-reported measure of depressive symptoms and their severity in adolescents and adults according to the Diagnostic and Statistical Manual for Mental Disorder (46). It has 21 items scored on a 4-point polytomous response scale ranging from 0 to 3 (27). Scores range between 0 and 63, with higher scores representing more severe depression.
GAD is a self-reported measure of anxiety symptoms over the last two weeks. It consists of seven items scored on a 0-3 scale, with higher scores indicating more severe symptoms (range from 0 to 21) (47). The cut-off scores of 5, 10 and 15 reflect mild, moderate and severe anxiety symptoms, respectively (48).
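The GAD scoring rule and cut-offs described above can be sketched in Python as follows. The function name `gad_severity` and the label used for totals below the mild cut-off are illustrative assumptions; only the cut-offs of 5, 10 and 15 are taken from the text.

```python
def gad_severity(item_scores):
    """Return (total, band) for seven GAD item scores, each in 0..3."""
    if len(item_scores) != 7:
        raise ValueError("expected 7 item scores")
    if any(not 0 <= score <= 3 for score in item_scores):
        raise ValueError("each item score must lie in 0..3")
    total = sum(item_scores)          # possible range 0..21
    if total >= 15:
        band = "severe"
    elif total >= 10:
        band = "moderate"
    elif total >= 5:
        band = "mild"
    else:
        band = "below mild cut-off"   # assumed label, not from the source
    return total, band


print(gad_severity([2, 2, 2, 2, 2, 0, 0]))
```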
RSES is a 10-item, self-reported instrument that measures global self-worth by measuring both positive and negative feelings about the self (49). Items are answered using a 4-point polytomous response scale ranging from strongly agree to strongly disagree. Items 2, 5, 6, 8 and 9 are reverse scored.
The self-reported WEMWBS instrument was developed in the UK to assess mental wellbeing including affective-emotional aspects, cognitive-evaluative dimensions and psychological functioning. It is a 14-item scale with 5 response categories (‘none of the time’, ‘rarely’, ‘some of the time’, ‘often’, ‘all of the time’), with a total score ranging from 14–70. A higher score indicates a higher level of mental wellbeing (50).
Level sum scores
Using preference-based index values is an important aspect of utility-based economic evaluations; however, it can also introduce an exogenous source of variance into statistical inference (44). Since the focus of this paper was on the comparability of the descriptive systems of the instruments, the preference-based weights available for the ICECAP-A and the EQ-5D-5L at the time of writing were not used; instead, level sum scores were applied for all instruments. In this way we also avoided the ceiling effect present in both sets of preference-based index values, as presented in Appendix 1. Moreover, the relevant value sets for the EQ-5D-5L descriptive system and the ICECAP-A have different anchor points. The 0 point of the EQ-5D-5L value set is anchored against ‘death’, while the 0 point of the ICECAP-A value set is anchored against ‘no capability’, leading to further potential difficulties in interpreting any comparisons based on preference-weighted scales.
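A level sum score is simply the unweighted sum of the item levels. The Python sketch below illustrates this for the ICECAP-A (five items, four levels) and the EQ-5D-5L (five dimensions, five levels). The function names and the use of a five-digit profile string such as '21231' as input are assumptions for illustration, and the sketch does not recode the direction of any item.

```python
def level_sum_score(levels, n_levels):
    """Sum item levels after checking that each lies between 1 and n_levels."""
    for level in levels:
        if not 1 <= level <= n_levels:
            raise ValueError(f"level {level} outside 1..{n_levels}")
    return sum(levels)


def icecap_a_level_sum(levels):
    """ICECAP-A: five items with four levels each, so sums range from 5 to 20."""
    if len(levels) != 5:
        raise ValueError("expected 5 item levels")
    return level_sum_score(levels, 4)


def eq5d5l_level_sum(profile):
    """EQ-5D-5L: five-digit profile string, so sums range from 5 to 25."""
    if len(profile) != 5 or not profile.isdigit():
        raise ValueError("expected a five-digit profile string")
    return level_sum_score([int(digit) for digit in profile], 5)


print(icecap_a_level_sum([4, 3, 4, 2, 3]), eq5d5l_level_sum("21231"))
```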
The statistical analysis focused on exploring and comparing the measurement properties of the OxCAP-MH, ICECAP-A, EQ-5D-5L descriptive systems and the EQ-5D VAS. The analyses comprised correlations of baseline and change scores to test and compare construct validity across the scales, exploratory factor analysis (EFA), and investigation of responsiveness to change and degree of agreement.
For all analyses, the level of significance was set at p < 0.05, unless stated otherwise. Group comparisons of mean baseline scores were conducted using the Wilcoxon rank-sum test (51, 52) for two-group comparisons and Kruskal-Wallis one-way ANOVA for multiple group comparison (53). Analysis was conducted on complete cases, excluding missing items at the relevant time point, unless stated otherwise. EFA was conducted with the freely available FACTOR software, and we used Stata Version 16 for all other analyses.
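For readers unfamiliar with the Wilcoxon rank-sum test, the following Python sketch shows the underlying calculation using a tie-corrected normal approximation. It is a minimal illustration under stated assumptions, not a reproduction of the statistical software used in this study; the function name `rank_sum_test` and the example data are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(group_a, group_b):
    """Return (rank sum of group_a, z statistic, two-sided p-value)."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b, n = len(group_a), len(group_b), len(pooled)
    # Collect the 1-based sorted positions of each distinct value.
    positions = {}
    for position, value in enumerate(sorted(pooled), start=1):
        positions.setdefault(value, []).append(position)
    # Tied observations share the average of their positions.
    rank_of = {value: sum(p) / len(p) for value, p in positions.items()}
    rank_sum = sum(rank_of[value] for value in group_a)
    expected = n_a * (n + 1) / 2
    # Variance of the rank sum with the usual correction for ties.
    tie_term = sum(len(p) ** 3 - len(p) for p in positions.values())
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum - expected) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided, normal approximation
    return rank_sum, z, p_value


# Hypothetical baseline scores for two groups.
print(rank_sum_test([41, 45, 52, 38], [55, 60, 47, 63, 58]))
```

The normal approximation is a simplification for very small groups, where an exact test may be preferable.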
Convergent validity indicates the degree to which two measures of constructs that theoretically should be related are in fact related (54, 55). The hypothesis that capability instruments and their items correlate more strongly with each other than with a HRQoL instrument was tested by exploring the correlation between baseline scores. Spearman’s rank correlations across the OxCAP-MH, ICECAP-A, EQ-5D-5L descriptive system, EQ-5D VAS and condition-specific measures were calculated at total score level at baseline and assessed based on Cohen's effect size classification, in which coefficients below 0.3 are small, those from 0.3 to below 0.5 are moderate and those of 0.5 or above are large (56).
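The calculation of Spearman's rank correlation and its classification by the thresholds given above can be sketched in Python as follows. The function names and example scores are hypothetical; ties are handled by average ranks, and the classification is applied to the absolute value of the coefficient, which is an assumption of this sketch.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1   # mean of positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


def cohen_label(rho):
    """Classify |rho| as small (< 0.3), moderate (0.3 to < 0.5) or large (>= 0.5)."""
    size = abs(rho)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "moderate"
    return "small"


# Hypothetical baseline total scores on two instruments.
rho = spearman_rho([55, 62, 48, 70, 66], [12, 15, 11, 14, 18])
print(round(rho, 2), cohen_label(rho))
```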
Next, locally weighted smoothing curve (LOWESS) fit lines were used to graphically indicate nonlinear trends in the scatter plots between OxCAP-MH, ICECAP-A, EQ-5D-5L descriptive system and EQ-5D VAS scores (57). LOWESS is a form of nonparametric regression that plots a line of central tendency between two variables on a scatterplot, thereby visualizing the relationship across the possible score ranges. LOWESS captures general patterns in the relationship between two measures without making assumptions about their actual relationship (58).
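To make the idea behind LOWESS concrete, the following Python sketch fits a tricube-weighted local linear regression around each observation. It is a simplified illustration that omits the robustness iterations of the full procedure; the function name `lowess`, the `frac` parameter and its default value are assumptions of this sketch and do not describe the settings used in this study.

```python
from math import ceil


def lowess(x, y, frac=0.8):
    """Return fitted values from local linear fits using the nearest frac of points."""
    n = len(x)
    k = max(2, ceil(frac * n))                # neighbours used for each local fit
    fitted = []
    for x0 in x:
        distances = sorted(abs(xi - x0) for xi in x)
        bandwidth = distances[k - 1]          # distance to the k-th nearest point
        if bandwidth == 0:
            # All neighbours share the same x value: average their y values.
            weights = [1.0 if xi == x0 else 0.0 for xi in x]
        else:
            # Tricube weights fall smoothly to zero at the bandwidth.
            weights = [max(0.0, 1 - (abs(xi - x0) / bandwidth) ** 3) ** 3 for xi in x]
        total_w = sum(weights)
        mean_x = sum(w * xi for w, xi in zip(weights, x)) / total_w
        mean_y = sum(w * yi for w, yi in zip(weights, y)) / total_w
        sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


# Hypothetical paired scores on two instruments.
scores_a = [20, 35, 40, 55, 60, 72, 80, 90]
scores_b = [8, 10, 13, 14, 16, 17, 19, 19]
print([round(value, 1) for value in lowess(scores_a, scores_b)])
```

Plotting the fitted values against the first instrument's scores gives the smoothed line of central tendency overlaid on the scatter plot.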
Exploratory factor analysis
EFA is a method to uncover the underlying structure of variables and is therefore used to assess whether an instrument measures what it intends to measure (59). It was used to evaluate the construct validity through the factors (or concepts) assessed by the instruments and their relevance to the underlying construct. EFA was conducted on the baseline scores of the OxCAP-MH, ICECAP-A and EQ-5D-5L descriptive system to examine the overlap between the constructs of the two capability measures and the multidimensional measure of HRQoL, and to study how far they share the same set of underlying factors. Further details on the methods of EFA can be found in Appendix 3.
Responsiveness was defined as the ability to capture clinically important changes over time (60). Patients completed each scale at both baseline and 9 months, which allowed for an exploration of change in mean scores over time across all instruments. Responsiveness was first assessed by Spearman’s rank correlation between the baseline-to-endpoint change scores of the instruments. Next, responsiveness was also assessed in terms of the standardised response mean (SRM), which was calculated as the ratio of the mean change between baseline and 9-month follow-up scores in a single group to the SD of the change scores (54). Trivial, small, moderate and large magnitudes of change were indicated by SRM values of <0.20, 0.20–0.49, 0.50–0.79 and ≥0.80, respectively (61). Appendix 4 includes further assessment of responsiveness based on an external approach comparing the extent to which change in a capability measure relates to corresponding change in anchor instruments (62, 63).
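The SRM calculation can be sketched in Python as follows, assuming paired baseline and 9-month scores and a complete-case approach in which pairs with a missing value (represented here as `None`) are dropped. The function names and example data are hypothetical, and the classification is applied to the absolute value of the SRM, with a value of exactly 0.20 treated as small; both are assumptions of this sketch.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """Mean of the change scores divided by the SD of the change scores."""
    changes = [
        after - before
        for before, after in zip(baseline, follow_up)
        if before is not None and after is not None   # complete cases only
    ]
    return mean(changes) / stdev(changes)              # sample SD (n - 1)


def srm_magnitude(srm):
    """Classify |SRM| using the thresholds 0.20, 0.50 and 0.80."""
    size = abs(srm)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:        # assumption: exactly 0.20 counts as small
        return "small"
    return "trivial"


# Hypothetical baseline and 9-month scores; the last patient has no follow-up.
baseline_scores = [52, 47, 60, 55, 49, 58]
follow_up_scores = [57, 46, 66, 59, 55, None]
srm = standardised_response_mean(baseline_scores, follow_up_scores)
print(round(srm, 2), srm_magnitude(srm))
```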