Does Assessing Outcomes in Terms of Capability for Schizophrenic Patients with Depression Provide More Information Than Use of the NICE-Recommended QALY? An Empirical Comparison of the OxCAP-MH, ICECAP-A and EQ-5D-5L Instruments

There is increasing evidence that assessing outcomes in terms of capability wellbeing provides information beyond that of health-related quality of life measures for evaluation in mental health research. This paper aims to comprehensively compare the properties of the Oxford CAPabilities questionnaire-Mental Health (OxCAP-MH), the ICECAP-A, the EQ-5D-5L descriptive system and the EQ-5D VAS in schizophrenic patients with depression. Using trial data for 100 patients from the UK, the properties of the instruments were compared in terms of construct validity, including correlations between the OxCAP-MH, the ICECAP-A, the EQ-5D-5L descriptive system and the EQ-5D VAS scores, and comparative assessment of their sensitivity to change based on external anchors. Exploratory factor analysis (EFA) investigated the extent to which the instruments measure complementary or overlapping constructs. The pattern and extent of agreement between all instruments was plotted on Bland-Altman diagrams. [...] provide more information than use of the NICE recommended QALY. OxCAP-MH and ICECAP-A show similar construct validity in severely ill mental health patients within the capability framework. Future research should extend the comparison of the properties of these instruments to other areas of mental health.
The EQ-5D-5L descriptive system was less sensitive to change, and its evaluative space is also limited compared to both capability instruments. The two capability instruments were more convergent with each other than with any HRQoL measure, confirming the hypothesised more similar underlying ‘capability’ construct. On the other hand, neither of them proved superior to the other in the current context. Instead, they seem to have different pros and cons. Establishing the psychometric properties of an instrument is a continuous process and further research should replicate this analysis on a larger number of patients and in other disease areas to strengthen these conclusions and explore potential psychometric differences related to diagnosis. Comparisons of OxCAP-MH and/or ICECAP-A with other capability (e.g. Achieved Capabilities Questionnaire for Community Mental Health (54)) or wellbeing (e.g. ReQoL (55)) instruments developed for the area of mental health would further contribute to our understanding of their measurement characteristics.


Introduction
The capability approach was developed by Amartya Sen with a core focus on what individuals are free and able to do (i.e., capable of) (1,2). This approach places emphasis on promoting wellbeing through enabling people to realise their capabilities and engage in behaviours that they value (3). There is increasing interest in the use of the capability approach for the economic evaluation of health-related interventions (4). One reason for this is the wider evaluative space this approach offers in comparison to the commonly used methods of assessment (5). Quality-Adjusted Life Years (QALYs) are routinely used as a summary measure of health outcome for economic evaluation, incorporating the impact on both the quantity and quality of life (6). The quality component is measured with preference-based utility values of health-related quality of life (HRQoL) instruments. Currently EQ-5D is the most commonly recommended such instrument in a number of settings, including by the National Institute for Health and Care Excellence (NICE) in the UK (7,8). In their current form, however, QALYs may not capture important consequences where impacts of interventions go beyond a rather narrow definition of health. For instance, QALYs may be insensitive to the impact of social care interventions and therefore underestimate their full welfare impact in the area of mental health (9). Mental health care interventions usually target both health and social impairments because many people with severe and enduring mental illness experience significant functional and social challenges (10).
A recent literature review of capability instruments in economic evaluations of health-related interventions has identified 14 instruments, differing in their domains, levels, target populations and interventions (4). Two of these instruments are commonly used and have been validated for the adult population with mental health problems: the Oxford CAPabilities questionnaire-Mental Health (OxCAP-MH) and the ICECAP measure for Adults (ICECAP-A). Both instruments have been shown to move beyond the standard HRQoL approach for the measurement and valuation of outcomes (5,10-18). While both instruments are grounded in the capability approach and have been implemented in the mental health context, their conceptual approaches differ. The OxCAP-MH is rooted in Nussbaum's central human capabilities and was developed free from geographical and cultural contexts. It was published in 2013 (10). The ICECAP-A belongs to a broader group of ICECAP capability instruments, each focusing on different aspects of capabilities and life span. It draws on the capability approach, using participatory (qualitative) methods to generate attributes as recommended by Sen (19). The ICECAP-A descriptive system was published in 2012.
Questions remain about whether different applications of the same broad concept of the capability approach result in similar or different measurement properties. Comparative studies of the measurement properties of alternative capability instruments have not been conducted yet, so researchers cannot rely on published studies when choosing between instruments. The lack of such comparative information hinders the future optimisation of research efforts related to quality of life and wellbeing in the (mental) health field.
Exploring the construct structure and the convergence and divergence between the ICECAP-A and the OxCAP-MH measures would not only contribute to our understanding of which measure may be used in certain settings and provide further information about their complementary or enhanced conceptual properties, but it may also shed light on some broader questions about how each method of instrumentalising the capability theory influences measurement processes. Moreover, the hypothesis that capability instruments, even when derived from differing conceptual underpinnings, are more correlated to each other than to an HRQoL instrument, e.g. EQ-5D-5L or EQ-5D VAS, has not been tested before in the area of mental health. This paper aims to contribute to the utilisation of the capability approach in mental health research, by exploring the empirical relationship between the OxCAP-MH, the ICECAP-A and the EQ-5D instruments. More specifically, the purpose of this study is to examine correlations between the OxCAP-MH, the ICECAP-A, the EQ-5D-5L descriptive system and the EQ-5D VAS scores, explore whether they measure complementary or overlapping constructs, and investigate the similarities in how they capture change.
The focus of the paper is on the comparability of the descriptive systems of the instruments; therefore, the preference-based weights that were available for the EQ-5D-5L and the ICECAP-A at the time of writing this paper were not used. Moreover, the relevant tariff values for the EQ-5D-5L descriptive system and the ICECAP-A have different anchor points. The 0 point of the EQ-5D-5L value set is anchored against 'death', while the 0 point of the ICECAP-A value set is anchored against 'no capability', leading to potential difficulties in interpreting any comparisons based on preference-weighted scales.

Data source
The analysis in this paper was based on data from the PoMeT trial (20), which investigated the impact of Positive Memory Training on depression symptoms of schizophrenia patients (n = 100) in the UK between 2014 and 2016. The trial received ethical approval from the Berkshire Research Ethics Committee (REC ref 13/SC/0634). Patients were eligible for inclusion if they were between 18 and 65 years of age, had a DSM-V diagnosis of schizophrenia or schizoaffective disorder, and had at least a mild level of depression as measured by scoring 14 or more on the Beck Depression Inventory-II (21). Patients were assessed at four time points through the 9-month study period: baseline, 3 months, 6 months and 9 months. More details about the PoMeT trial can be found in Steel et al. (20).

Instruments
The OxCAP-MH is a self-reported, 16-item, mental health specific instrument, where items are rated on a 1-5 Likert scale and each question provides an equal contribution to the overall score. The 16 items cover a broad range of individual wellbeing including: Overall health, Enjoying social and recreational activities, Losing sleep over worry, Friendship and support, Having suitable accommodation, Feeling safe, Likelihood of discrimination and assault, Freedom of personal and artistic expression, Appreciation of nature, Self-determination and Access to interesting activities or employment (10). The OxCAP-MH initial score (16-80 scale) is converted on to a 0-100 scale referring to minimum and maximum capabilities using the formula: 100 × (OxCAP-MH total score − minimum possible score)/possible range (11). Higher scores indicate better capabilities; items 2, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15 and 16 are reverse coded. The OxCAP-MH has shown validity (5,11), responsiveness (5,11) and feasibility (10) in several settings and mental health disease areas and is currently available in the English, German (22) and Hungarian (23) languages with further language translations ongoing. In an earlier factor analysis, Laszewska et al. found that all EQ-5D-5L items and seven OxCAP-MH items loaded on one factor and nine remaining OxCAP-MH items loaded on a separate factor, indicating that the OxCAP-MH may be seen as supplementary rather than complementary in its concept, when compared to the EQ-5D-5L (5). The OxCAP-MH does not yet have a preference-based value set; however, research is ongoing to develop a weighting system for its domains.
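The standardisation described above can be illustrated with a short Python sketch. It is a minimal illustration and not the official OxCAP-MH scoring algorithm: the function name `oxcap_mh_score` is hypothetical, and the rule that a reverse-coded response x is recoded as 6 − x is an assumption, since the text only lists which items are reverse coded.

```python
# Constants taken from the instrument description in the text.
N_ITEMS = 16
MIN_LEVEL, MAX_LEVEL = 1, 5
# Items listed in the text as reverse coded (1-based item numbers).
REVERSE_CODED = {2, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16}


def oxcap_mh_score(responses):
    """Return the 0-100 standardised score for 16 Likert responses (1-5).

    Assumption: a reverse-coded response x is recoded as 6 - x, so that a
    higher recoded value always means better capability.
    """
    if len(responses) != N_ITEMS:
        raise ValueError("expected 16 item responses")
    total = 0
    for item_number, level in enumerate(responses, start=1):
        if not MIN_LEVEL <= level <= MAX_LEVEL:
            raise ValueError(f"item {item_number}: response must be 1-5")
        if item_number in REVERSE_CODED:
            level = MIN_LEVEL + MAX_LEVEL - level
        total += level
    minimum = N_ITEMS * MIN_LEVEL                        # 16
    possible_range = N_ITEMS * (MAX_LEVEL - MIN_LEVEL)   # 80 - 16 = 64
    # Formula from the text: 100 x (total - minimum) / possible range.
    return 100 * (total - minimum) / possible_range
```

Under these assumptions, a respondent answering the middle category on every item obtains a raw total of 48 and a standardised score of 50.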
The ICECAP-A is a brief self-reported measure for the general adult population with five items, each of which can take one of four levels ranging from full capability to no capability. The domains include Stability (being able to feel settled and secure), Attachment (being able to have love, friendship and support), Autonomy (being able to be independent), Achievement (being able to achieve and progress), and Enjoyment (being able to have enjoyment and pleasure) (12). The ICECAP-A has shown validity (16,17,19,24,25), reliability (26,27), responsiveness (28) and feasibility (14) in different populations. Besides the original English language version, it is also available in German (26), Chinese (29), Welsh, Dutch, Danish, Persian and Italian languages (30). Previous factor analysis comparing the ICECAP-A with the items of EQ-5D-5L (31) and EQ-5D-3L (13,15) found that these instruments measure two different constructs and therefore provide potentially different information. A recent systematic literature review found inconsistencies between the ICECAP-A and EQ-5D instruments, suggesting that the ICECAP-A is most appropriately regarded as a complement to and not a substitute for the EQ-5D-3L and EQ-5D-5L in particular (32). The ICECAP-A has a preference-based value set derived from the UK general population (24) and it is increasingly used in economic evaluations (32). The simple sum of ICECAP-A levels ranges from 5 to 20, with higher scores representing better capabilities.
The EQ-5D-5L is one of the most commonly used self-reported generic health status measures, and its validity and reliability have been reported in various health conditions and populations (33). The EQ-5D-5L descriptive system comprises five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Besides the original 3-level version (34), a more sensitive, 5-level version has existed since 2009 (35). Both versions have value sets in several countries (36), but they can also be used as simple descriptive systems with total scores ranging from 5 to 15 for the 3L version and from 5 to 25 for the 5L version, with higher scores representing better HRQoL. As part of this instrument, respondents' self-rated health is also recorded on a vertical visual analogue scale (EQ-5D VAS) where scores range from 0 to 100, referring to the worst imaginable health state and the best imaginable health state, respectively.
Since the OxCAP-MH and EQ-5D VAS scores range from 0 to 100, the ICECAP-A level sum scores range from 5 to 20 and the EQ-5D-5L descriptive system level sum scores range from 5 to 25, direct comparisons between the instruments would be challenging. Hence, all values were transformed to a 0-1 range for the relevant statistical calculations, i.e. for the responsiveness and agreement analyses. This was calculated as a simple division by 100 in the case of the OxCAP-MH and EQ-5D VAS scores, and as a transformation of the ICECAP-A and EQ-5D-5L scores such that a score of 5 was recalibrated to 0 and scores of 20 and 25 were recalibrated to 1, respectively.
The Beck Depression Inventory (BDI), Generalised Anxiety Disorder scale (GAD), Rosenberg self-esteem scale (RSES), and the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) are all mental health specific, self-reported outcome instruments. They were used as anchors for the sensitivity to change analysis to assess external responsiveness.
BDI is a self-reported measure of depressive symptoms and their severity in adolescents and adults according to the Diagnostic and Statistical Manual of Mental Disorders (37). It has 21 items scored on a 4-point polytomous response scale ranging from 0 to 3 (21). Scores range between 0 and 63, with higher scores representing more severe depression.
GAD is a self-reported measure of anxiety symptoms over the last two weeks. It consists of seven items scored on a 0-3 scale, with higher scores indicating more severe symptoms (range from 0 to 21) (38). The cut-off scores of 5, 10 and 15 reflect mild, moderate and severe anxiety symptoms, respectively (39).
RSES is a 10-item, self-reported instrument that measures global self-worth by measuring both positive and negative feelings about the self (40). Items are answered using a 4-point polytomous response scale format ranging from strongly agree to strongly disagree. Items 2, 5, 6, 8, 9 are reverse scored.
The self-reported WEMWBS instrument was developed in the UK to assess mental wellbeing including affective-emotional aspects, cognitive-evaluative dimensions and psychological functioning. It is a 14-item scale with 5 response categories ('none of the time', 'rarely', 'some of the time', 'often', 'all of the time'), with a total score ranging from 14 to 70. A higher score indicates a higher level of mental wellbeing (41).
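The transformation of the four wellbeing instrument scores to a common 0-1 range, described earlier in this section, amounts to a min-max rescaling. The following Python sketch illustrates it under the score ranges stated in the text; the names `SCALE_BOUNDS` and `to_unit_range` are hypothetical and the code is not taken from the study's analysis scripts.

```python
# Minimum and maximum of each score as stated in the text.
SCALE_BOUNDS = {
    "OxCAP-MH": (0, 100),   # standardised 0-100 score
    "EQ-5D VAS": (0, 100),
    "ICECAP-A": (5, 20),    # level sum score
    "EQ-5D-5L": (5, 25),    # level sum score
}


def to_unit_range(instrument, score):
    """Rescale a score so that the scale minimum maps to 0 and the maximum to 1."""
    minimum, maximum = SCALE_BOUNDS[instrument]
    if not minimum <= score <= maximum:
        raise ValueError(f"{instrument} score must lie in [{minimum}, {maximum}]")
    return (score - minimum) / (maximum - minimum)
```

For the two 0-100 scores this reduces to the division by 100 mentioned above, while an ICECAP-A level sum score of 5 maps to 0 and a score of 20 maps to 1.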

Statistical analysis
The statistical analysis focused on exploring and comparing the measurement properties of the OxCAP-MH, ICECAP-A, EQ-5D-5L descriptive system and the EQ-5D VAS. Exploratory factor analysis (EFA), correlations of baseline and change scores to test and compare construct validity across the scales, and investigation of responsiveness to change and degree of agreement were carried out. For all analyses, the level of significance was set at p < 0.05, unless stated otherwise. Group comparisons of mean baseline scores were conducted using t-tests for two-group comparisons and ANOVA for multiple group comparisons. Analysis was conducted on complete cases, excluding missing items at the relevant time point, unless stated otherwise. EFA was conducted with the freely available FACTOR software, and we used STATA Version 16 for all other analyses.

Construct validity
Construct validity indicates the degree to which the scores of the capability and HRQoL instruments are consistent with the underlying concepts of these wellbeing measures (42,43). Graphical presentation of the correlation between baseline and change scores explored the degree of agreement between the four scales. The axes of the graphs represented the minimum and maximum values of the relevant instruments. The hypothesis that capability instruments and their items have stronger correlation with each other than with an HRQoL instrument was tested through exploring the correlation between baseline scores. Pearson correlations across OxCAP-MH, ICECAP-A, EQ-5D-5L descriptive system and EQ-5D VAS were calculated at total score level and at item level for each time point and assessed based on Cohen's effect size classification, namely < 0.3 is small, 0.3 to < 0.5 is moderate and ≥ 0.5 is large (44).
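As a sketch of how a correlation coefficient can be computed and labelled with the thresholds above, the following Python example may help. It uses only the standard library; the function names `pearson_r` and `cohen_label` are hypothetical, and applying the thresholds to the absolute value of the coefficient is an assumption, because the text does not state how negative correlations were handled.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def cohen_label(r):
    """Label a correlation: < 0.3 small, 0.3 to < 0.5 moderate, >= 0.5 large.

    Assumption: the thresholds are applied to the absolute value of r.
    """
    size = abs(r)
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "moderate"
    return "large"
```

With these thresholds, the baseline correlation of 0.641 reported between the OxCAP-MH and ICECAP-A would be labelled large, and the correlations of 0.315-0.385 between capability and HRQoL measures moderate.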

Exploratory factor analysis
EFA was conducted on the baseline scores of the OxCAP-MH, ICECAP-A and EQ-5D-5L to examine the overlap between the constructs of the two capability measures and the multidimensional measure of HRQoL, and to study how far they share the same set of underlying factors. Further details on the methods of EFA can be found in Appendix 1.

Responsiveness
Responsiveness was defined as the ability to capture clinically important changes over time (45). Patients filled out each of the four scales at both baseline and 9 months, which allowed for an exploration of change in mean scores over time. Responsiveness was assessed in terms of an external approach comparing the extent to which change in a capability measure relates to corresponding change in anchor instruments (46,47). The analysis of responsiveness started with the definition of 2-4 instruments which could be used as autonomous anchors because they identify change that is unlikely to have arisen by chance (47).
The level of responsiveness was evaluated by defining groups who worsened, improved or remained stable, based on whether a change in the instrument scores between baseline and 9-month follow-up assessments was measured for individuals by the reference or anchor instruments. The calculation was based on the difference between baseline and 9-month values of the standard error of measurement (SEM), using the following formula: SEM = SD × √(1 − r), where SD is the standard deviation of the instrument and r is its reliability coefficient at baseline and 9 months (11,48). Internal consistency reliability coefficients were calculated for each scale based on the baseline to 3-month and 6-month to 9-month follow-up scores. More details on the calculation of the difference in SEM values can be found in Appendix 5. There is no consensus about how many SEMs an individual's score must change for that change to be considered clinically meaningful. This paper used the threshold of one SEM, which is known to frequently correspond to a minimally important difference (11,49). In addition, the standardised response mean (SRM) was calculated as the ratio of the mean change between baseline and 9-month follow-up scores in a single group to the SD of the change scores (42). Small, moderate and large magnitudes of change were indicated by SRM values of 0.20-0.49, 0.50-0.79 and ≥ 0.80, respectively (33). Next, the percentages of the study respondents who improved, worsened or remained stable according to the capability and anchor questionnaires were calculated to explore changes at the individual patient level (5).
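The quantities used in the responsiveness analysis can be sketched in a few lines of Python. This is a simplified illustration and not the procedure of Appendix 5: it applies a single SEM value as the change threshold, treats a change of at least one SEM as meaningful, and uses the sample standard deviation for the SRM. The function names and the `higher_is_better` parameter are hypothetical.

```python
import math
import statistics


def sem(sd, reliability):
    """Standard error of measurement: SD x sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def classify_change(baseline, follow_up, threshold, higher_is_better=True):
    """Classify one patient as improved, worsened or stable.

    Assumption: a change of at least `threshold` (here one SEM) counts as
    meaningful. For symptom scales such as GAD, where lower scores are
    better, pass higher_is_better=False.
    """
    change = follow_up - baseline
    if not higher_is_better:
        change = -change
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "stable"


def srm(baseline_scores, follow_up_scores):
    """Standardised response mean: mean change divided by SD of change."""
    changes = [f - b for b, f in zip(baseline_scores, follow_up_scores)]
    return statistics.mean(changes) / statistics.stdev(changes)
```

For example, an instrument with a standard deviation of 10 and a reliability coefficient of 0.84 would have an SEM of about 4 under this formula, so a patient would need to change by roughly 4 points to be classified as improved or worsened.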

Agreement analysis
The pattern and extent of the agreement between OxCAP-MH, ICECAP-A, EQ-5D-5L descriptive system and EQ-5D VAS scores were plotted on Bland and Altman diagrams (50), where the difference between the instruments is shown on the vertical axis of the diagram against the mean of the pair on the horizontal axis (51).
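The quantities plotted in such a diagram can be computed as in the following Python sketch, which takes two sets of scores already rescaled to the 0-1 range. The function name `bland_altman` is hypothetical, and the use of 1.96 standard deviations for the limits of agreement follows the common Bland and Altman convention; it is an assumption here, since the text does not state the multiplier used.

```python
import statistics


def bland_altman(scores_a, scores_b, multiplier=1.96):
    """Return the points and summary lines of a Bland and Altman diagram.

    The horizontal axis holds the mean of each pair, the vertical axis the
    difference between the two instruments.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    pair_means = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
    bias = statistics.mean(differences)        # average discrepancy
    sd_diff = statistics.stdev(differences)    # sample SD of differences
    return {
        "pair_means": pair_means,
        "differences": differences,
        "bias": bias,
        "lower_limit": bias - multiplier * sd_diff,
        "upper_limit": bias + multiplier * sd_diff,
    }
```

A bias close to zero indicates a small average discrepancy between two instruments, while wider limits of agreement indicate that individual scores on the two instruments diverge more.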

Patient characteristics
Patient characteristics are presented in Table 1. Mean baseline scores for all instruments used in this analysis are presented in Table 2.

Construct validity
Both the graphical (Fig. 1) and numerical (Table 3) presentation of correlations at baseline confirmed the hypothesis that the capability instruments are more correlated with each other than with the EQ-5D-5L descriptive system's level sum scores or the EQ-5D VAS. Correlations between the capability and HRQoL measures (0.315-0.385) were lower than those between OxCAP-MH and ICECAP-A (0.641). The ICECAP-A was slightly more correlated with the EQ-5D VAS (0.385) than with the EQ-5D-5L descriptive system (0.354), whilst the OxCAP-MH was somewhat more highly correlated with the EQ-5D-5L descriptive system (0.370) than with the EQ-5D VAS (0.315).
(insert Table 3)

Exploratory factor analysis
A four-factor solution was chosen according to Kaiser's criterion based on a scree plot, as described in the Appendix. EFA with four factors found that all items of the instruments had communalities greater than 0.35, i.e. none of the items struggled to load significantly on any factor. Hence, factor loadings > 0.35 are shown in Table 4.
(insert Table 4)
Factor one consisted of the five EQ-5D-5L descriptive system domains, with particularly high communalities for all items apart from the Anxiety and depression domain. The Daily activities and Suitable accommodation domains of OxCAP-MH and the Being independent domain of ICECAP-A also loaded on this undoubtedly physical health related factor.
None of the EQ-5D-5L descriptive system domains loaded on factors two, three and four. Only the Feeling settled and secure domain of ICECAP-A loaded on factor two, where high communalities were observed for the domains of OxCAP-MH related to the perception of settlement and security, e.g. Losing sleep, Neighbourhood safety, Potential for assault and Discrimination. The negative loading of Access to interesting activities is consistent with the direction of scoring of the items.
Factor three consisted of four ICECAP-A domains (the Being independent domain did not load on this factor) and the Social networks, Enjoy recreation, Influence local decisions, Freedom of expression, Love and support, Planning one's life, Imagination and creativity and Access to interesting activities domains of OxCAP-MH.
Factor four consisted of two OxCAP-MH domains, both focusing on the appreciation of a person's environment. These two domains had remarkably high communalities on factor four and did not load to any other factor.

Responsiveness
The Pearson correlations between the baseline to endpoint change scores of the OxCAP-MH, ICECAP-A, EQ-5D-5L and EQ-5D VAS, and the potential reference instruments are presented in Table 5.
Table 5. Pearson correlations between change scores of OxCAP-MH, ICECAP-A, EQ-5D-5L descriptive system and EQ-5D VAS scores
(insert Table 5)
Correlation between the OxCAP-MH and ICECAP-A change scores was moderate (0.389). The ICECAP-A change scores also moderately correlated with change scores of generic health-related scales (0.307-0.357) and disease-specific instruments (0.295-0.468). The OxCAP-MH change scores had low correlation with generic (0.153-0.202) and moderate to high correlation with disease-specific instruments (0.441-0.527). Since the GAD and WEMWBS measures had the highest correlation with the four wellbeing instruments under investigation in this paper, they were selected as suitable reference anchor instruments for the analysis of responsiveness.
(insert Table 6)
Table 6 presents the number of patients who improved, deteriorated or remained stable based on assessment by different anchors, and the mean scores in each group. Each instrument captured changes in patients' health state with somewhat similar magnitude. For the study participants who reported improvement in GAD, the improvements in the OxCAP-MH, ICECAP-A and EQ-5D VAS scores were statistically significant at the 1% level with large SRM statistics. However, improvement in WEMWBS was associated with statistically significant improvement at the 1% level with large SRM statistics only in the case of the ICECAP-A and the EQ-5D VAS measures, with moderate results for OxCAP-MH. The effect sizes were lower for the EQ-5D-5L descriptive system.
(insert Table 7)
*Changes in instrument scores between baseline and 9-month follow-up were categorised as improved, worsened or no change; the definition of groups is based on the difference in SEM; values in agreement are in bold.
Table 7 contrasts the number of patients who improved, remained stable and deteriorated based on the capability, HRQoL and the anchor instruments. All four measures identified the majority of patients in agreement with the anchor instruments, but the EQ-5D-5L descriptive system performed worst. Each instrument classified a similar proportion of patients as "Stable" (50-74%), indicating similar sensitivity to change. There was no significant difference in terms of sensitivity to change between the investigated instruments.

Agreement analysis
The Bland and Altman analysis showed that the OxCAP-MH and ICECAP-A had poorer agreement with the EQ-5D-5L descriptive system than with each other or the EQ-5D VAS scale (Fig. 2). There was a small average discrepancy between the four instruments; however, the limits of agreement were wider and therefore more ambiguous in the comparisons with the EQ-5D-5L descriptive system than in the direct comparison of OxCAP-MH, ICECAP-A and EQ-5D VAS.

Discussion
This paper aimed to contribute to the utilisation of the capability approach in mental health by empirically demonstrating that two instruments embedded in the capability framework but with a different approach to development show different psychometric properties when deployed on the same patient cohort. To our knowledge, this is the first paper to empirically compare the two most commonly used capability instruments in the area of mental health and compare them simultaneously to HRQoL, measured by the EQ-5D-5L descriptive system and the EQ-5D VAS. The study confirmed that both the OxCAP-MH and ICECAP-A instruments possess good psychometric properties among patients with severe mental health problems. In particular, this paper further confirmed the construct validity of both OxCAP-MH and ICECAP-A questionnaires. Both questionnaires are well correlated with self-reported measures of symptoms of anxiety (assessed with GAD) and general mental health wellbeing (e.g. WEMWBS); and relatively well correlated with instruments measuring depressive symptoms (assessed with BDI) and self-worth/self-esteem (assessed with RSES).
Different aspects of the analysis confirmed that the capability instruments had stronger associations and were more correlated to each other than to the HRQoL instruments, which implies that the capability instruments may be seen as supplementary rather than complementary in their concept. The Bland and Altman plots showed that the OxCAP-MH and ICECAP-A had poorer agreement with EQ-5D-5L than with each other. The results of the EFA of the items of both capability instruments and the EQ-5D-5L demonstrate that the capability instruments measure concepts beyond the standard interpretation of health because all items of the HRQoL measure loaded onto one factor, whilst the capability instruments spread across multiple factors. The results of the EFA suggest that the four factors represent different aspects of wellbeing measurement. Factor one could be linked to a narrower interpretation of health, but also including independence and suitable accommodation. Factor two includes items related to settlement and security aspects, where the negative loading of access to interesting activities indicates that this might be an auxiliary concept. Most of the ICECAP-A and OxCAP-MH items loaded on factor three, previously interpreted as the internal psycho-social aspect of capabilities. These alternative loadings to factors two and three demonstrate the difference between the internal and external aspects of freedom within the capability approach. The findings of the current study are in line with a qualitative validation study of the Hungarian version of the OxCAP-MH, i.e. most domains in factors two and four of the EFA in this study are associated with the internal aspects of freedom, whilst factor three can be linked to external aspects (23). The two domains of OxCAP-MH related to the capabilities of appreciating people and nature loaded on a separate, fourth factor, indicating that this concept is supplementary and moves beyond the evaluative space included within the ICECAP-A or EQ-5D-5L instruments. Nevertheless, some domains of both the OxCAP-MH and ICECAP-A load on the same factor together with the EQ-5D-5L items. This confirms the findings of Laszewska et al. (5), namely that the OxCAP-MH indeed measures some aspects of HRQoL and can be considered supplementary rather than complementary to EQ-5D-5L. An interesting finding of the current study is that the "Suitable accommodation" domain also loaded on to the common factor with EQ-5D-5L, suggesting an association with health. Previous studies exploring EFA between ICECAP-A and EQ-5D measures found that these two instruments share only a few common factors (13,15). This study confirmed that there is a weak overlap between these two instruments, and only the "Being independent" domain of ICECAP-A loads on the same factor as the EQ-5D-5L items. These findings suggest that both capability instruments can be considered supplementary rather than complementary to EQ-5D-5L.
In contrast to most previous papers, this analysis presents relatively weak correlations between the OxCAP-MH, the ICECAP-A and the EQ-5D-5L descriptive system in the area of mental health. Studies comparing the OxCAP-MH with the EQ-5D-3L and EQ-5D-5L instruments in a mixed mental health population found correlation coefficients between 0.45 and 0.66 (5,11). Similar correlations were observed between the ICECAP-A and the EQ-5D instruments when they were compared for opiate-dependent patients. The study by Goranitis et al. found that the ICECAP-A and EQ-5D-5L have similar construct validity when compared to other clinical measures (17). The slightly different results of the current study confirm previously identified weaknesses of the EQ-5D-5L instrument in measuring HRQoL in severely ill mental health patients (52). Our results also confirm the findings of a study comparing the ICECAP-A and EQ-5D-5L instruments in the area of depression, which concluded that instruments designed specifically to measure depression and mental health explained a greater proportion of the variation in the ICECAP-A than the EQ-5D-5L did (53).
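For readers who wish to reproduce this kind of comparison on their own data, a correlation coefficient between two sets of instrument scores can be computed as sketched below. This is an illustrative Pearson calculation only; it is an assumption that a product-moment coefficient is appropriate, since the type of coefficient used is not restated in this section. The function name `pearson_r` and the example scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired lists with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Made-up baseline scores for illustration only (not study data)
capability_scores = [62.5, 48.4, 71.9, 55.0, 39.1, 66.0]
hrqol_scores = [0.71, 0.55, 0.80, 0.74, 0.52, 0.60]
r = pearson_r(capability_scores, hrqol_scores)
print(f"r = {r:.3f}")
```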
In terms of sensitivity to change, no significant differences were observed between the two capability instruments, and the EQ-5D VAS performed better than the EQ-5D-5L descriptive system. The ICECAP-A seems to be slightly more correlated with generic measures, including the EQ-5D-5L descriptive system and the EQ-5D VAS, whilst the OxCAP-MH seems to be more highly correlated with disease-specific measures. This could be explained by either its supplementary nature or the fact that the OxCAP-MH is a more detailed and longer questionnaire. The OxCAP-MH and ICECAP-A instruments are both embedded in the capability approach, but they were developed using different approaches, and this study has shown that they consequently exhibit different psychometric properties when deployed in the same patient cohort. A major advantage of using the ICECAP-A in economic evaluations is the availability of its preference-based value set and its shorter length, which reduces the burden for respondents. Future research could explore the relationship between preference-based scores once these become available for the OxCAP-MH instrument, if the relevant scale anchors allow.
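Sensitivity to change is often summarised with the standardised response mean (SRM), i.e. the mean change score divided by the standard deviation of the change scores, computed within a group defined by an external anchor. The sketch below illustrates that statistic only; it is an assumption that the SRM is comparable to the responsiveness statistics used in this study, and the function name and example values are hypothetical.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """SRM = mean(change) / SD(change) for paired baseline and follow-up scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    if len(baseline) < 2:
        raise ValueError("at least two patients are needed")
    changes = [f - b for b, f in zip(baseline, follow_up)]
    sd_change = stdev(changes)  # sample standard deviation of the changes
    if sd_change == 0:
        raise ValueError("SRM is undefined when all changes are identical")
    return mean(changes) / sd_change


# Made-up scores for patients classed as "improved" by a hypothetical anchor
baseline_scores = [50.0, 60.0, 55.0, 40.0]
follow_up_scores = [58.0, 63.0, 65.0, 45.0]
srm = standardised_response_mean(baseline_scores, follow_up_scores)
print(f"SRM = {srm:.2f}")
```

Computing the same statistic for each instrument within the same anchor-defined group allows the instruments to be compared on a unit-free scale, which is the sense in which one instrument can be described as more or less sensitive to change than another.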
Limitations of this research include a restricted number of data points compared to the number of items; hence, the robustness of the EFA may be limited. In addition, the lack of an objective scale that could indicate whether a patient has improved, and that could be used as an absolute anchor in the calculation of external responsiveness statistics and Bland-Altman plots, could have introduced some bias. External responsiveness could not be assessed by methods that require a gold-standard anchor, such as Receiver Operating Characteristic (ROC) curve analysis, because none of the instruments in this study could be used as an appropriate reference standard: they are all patient-reported measures (16). The responsiveness statistics also relied on a relatively small number of patients in most of the identified groups.

Conclusion
The main conclusion of this study is that assessing outcomes in terms of capability for schizophrenic patients with depression provides more information than use of the NICE recommended QALY. Both the OxCAP-MH and the ICECAP-A are valid instruments for measuring the impacts of mental health interventions within the capability framework. The EQ-5D-5L descriptive system was less sensitive in capturing change, and its evaluative space is also limited compared to both capability instruments. The two capability instruments were more convergent with each other than with any HRQoL measure, confirming the hypothesised more similar underlying 'capability' construct. On the other hand, neither proved superior to the other in the current context. Instead, they seem to have different pros and cons. Establishing the psychometric properties of an instrument is a continuous process, and further research should replicate this analysis in a larger number of patients and in other disease areas to strengthen these conclusions and explore potential psychometric differences related to diagnosis. Comparisons of OxCAP-MH and/or ICECAP-A with other capability (e.g. Achieved Capabilities Questionnaire for Community Mental Health (54)) or wellbeing (e.g. ReQoL (55))

Figures
Correlations between OxCAP-MH and EQ-5D-5L descriptive system scores at baseline
Figure 3: Correlations between OxCAP-MH and VAS scores at baseline
Figure 4: Correlations between ICECAP-A and EQ-5D-5L descriptive system level sum scores at baseline
Correlations between EQ-5D-5L descriptive system level sum scores and VAS scores at baseline
Bland-Altman plot of difference in OxCAP-MH vs ICECAP-A change scores
Bland-Altman plot of difference in OxCAP-MH vs EQ-5D-5L descriptive system change scores
Bland-Altman plot of difference in OxCAP-MH vs VAS change scores
Figure 10: Bland-Altman plot of difference in ICECAP-A and EQ-5D-5L descriptive system change scores
Figure 11: Bland-Altman plot of difference of ICECAP-A and VAS change scores
Bland-Altman plot of difference of EQ-5D-5L descriptive system and VAS change scores

Supplementary Files
This is a list of supplementary files associated with this preprint. SupplementarymaterialOxCAPMHICECAPAcomparison.pdf