Do alternative capability instruments capture differing aspects of mental health and quality of life in comparison to the EQ-5D for schizophrenic patients with depression?

Background: There is increasing evidence that assessing outcomes in terms of capability wellbeing provides information beyond that of health-related quality of life measures for evaluation in mental health research. This paper aimed to contribute to the utilisation of capability-based measures relative to health status measures in mental health research, by exploring the empirical relationship between the OxCAP-MH, the ICECAP-A, and the EQ-5D instruments against condition-specific measures and each other for schizophrenic patients with depression. Methods: Using trial data for 100 patients from the UK, the properties of the instruments were compared in terms of convergent validity, including correlations between the OxCAP-MH, the ICECAP-A, the EQ-5D-5L descriptive system and the EQ-5D VAS scores; locally weighted smoothing curves were used to plot a line of central tendency between pairs of variables. Exploratory factor analysis (EFA) investigated the extent to which the instruments measured complementary or overlapping constructs. Responsiveness was assessed in terms of standardised response mean and correlation between change scores (baseline to endpoint) of the instruments. Results: Correlation between the OxCAP-MH and ICECAP-A baseline scores was strong (0.682) and between change scores was moderate (0.401). The baseline scores of both capability instruments correlated more with condition-specific (0.481-0.718) than with generic (0.344-0.425) instruments. Their change scores weakly correlated with change scores of the generic health-related scales (0.183-0.247), but moderately with those of condition-specific instruments (0.339-0.557). The EFA found that while the EQ-5D-5L descriptive system loaded onto one factor, the items of the ICECAP-A loaded onto two additional factors and the items of the OxCAP-MH loaded onto three additional factors. Conclusions: The capability instruments had stronger convergent validity with each other than with any of the other instruments.
For schizophrenic patients with depression, assessing outcomes in terms of capabilities with either the OxCAP-MH or the ICECAP-A captured broader relevant information than the EQ-5D-5L, albeit to a different extent. When comparing the two capability instruments, the OxCAP-MH tended to have stronger correlations with condition-specific instruments, most likely due to its origin in mental health outcome measurement, while the ICECAP-A had a slightly stronger correlation with the EQ-5D VAS.
The ICECAP-A was more responsive than the OxCAP-MH. The EQ-5D VAS performed better than the EQ-5D-5L descriptive system. The ICECAP-A seemed to be slightly more correlated with generic measures, including the EQ-5D-5L descriptive system and the EQ-5D VAS, whilst the OxCAP-MH seemed to be more highly correlated with disease-specific measures. This could be explained by either the supplementary nature of capability instruments, or the fact that the OxCAP-MH is a more detailed and longer questionnaire. The OxCAP-MH and ICECAP-A instruments are both embedded in the capability approach, but they were developed with different approaches, and this study showed that they exhibited somewhat different psychometric properties when deployed on the same patient cohort. A major advantage of using the ICECAP-A in economic evaluations is the availability of its preference-based value set and its shorter length, which reduces the burden for respondents. Future research could explore the relationship of preference-based scores once those become available for the OxCAP-MH instrument, if the relevant scale anchors allow.


Introduction
The capability approach was developed by Amartya Sen with a core focus on what individuals are free and able to do (i.e., capable of) (1,2). This approach places emphasis on promoting wellbeing through enabling people to realise their capabilities and engage in behaviours that they value (3). There is increasing interest in the use of the capability approach for the economic evaluation of health-related interventions (4). One reason for this is the wider evaluative space this approach offers for the measurement of quality of life in comparison to the commonly used methods of assessing impacts in terms of health-related quality of life (HRQoL) (5). Currently the EQ-5D is the most commonly recommended and used HRQoL instrument (6,7). However, the EQ-5D-5L may not capture important impacts of interventions when these go beyond health change and may therefore underestimate their full welfare impact (8). Mental health care interventions usually target both health and social impairments because many people with severe and enduring mental illnesses experience significant functional and social challenges (9). Hence, their impacts go beyond HRQoL (10), particularly in the case of schizophrenia (11). People living with schizophrenia are more likely to be homeless, unemployed, or living in poverty compared with the general population; moreover, their disease is progressive and causes disability, dependence on care, and an increasing need for assistance from others in daily life activities (12,13). The prevalence of depressive disorder in schizophrenia has been reported to be around 50% (14,15). Evidence suggests that depression is linked to poorer outcomes in schizophrenia, paired with particularly high levels of health care use and suicide (14-16).
The capability approach has been highlighted as a way of facilitating people with a lived experience of mental health difficulties to engage with their values and priorities (3). A recent literature review of capability instruments in economic evaluations of health-related interventions has identified 14 instruments, differing substantially in their development methods, domains, levels, target populations and interventions (4). Two of these instruments are commonly used and have been validated for the adult population with mental health problems: the Oxford CAPabilities questionnaire-Mental Health (OxCAP-MH) and the ICECAP measure for Adults (ICECAP-A). Both instruments have been shown to move beyond the standard HRQoL approach for the measurement and valuation of outcomes (5,9,17-24). While both instruments are grounded in the capability approach and have been implemented in the mental health context, their conceptual approaches differ. The OxCAP-MH was developed in the UK in the context of mental health outcome measurement using a top-down approach rooted in Nussbaum's central human capabilities free from geographical and cultural contexts. It was published in 2013 (9). The ICECAP-A belongs to a broader group of ICECAP capability instruments, each focusing on different aspects of capabilities according to a given life stage. It was also developed in the UK using bottom-up participatory (qualitative) methods to generate contextual capability attributes as recommended by Sen (25). The ICECAP-A descriptive system was published in 2012.
Questions remain about whether different applications of the same broad concept of the capability approach result in similar or different measurement properties. Comparative studies of the measurement properties of alternative capability instruments have not been conducted yet, and researchers cannot rely on published studies when choosing between instruments. The lack of such comparative information hinders the future optimisation of research efforts related to quality of life and wellbeing in the (mental) health field.
Exploring the construct structures of the OxCAP-MH and the ICECAP-A measures, and the convergence and divergence between them, would provide further information about their complementary or enhanced conceptual properties and contribute to our understanding of their pros and cons in certain contexts. Moreover, it may also shed light on some broader questions about how each method of instrumentalising the capability theory influences measurement processes. Additionally, the hypothesis that capability instruments, regardless of using a top-down or bottom-up approach to their development, are more correlated with each other than with an HRQoL instrument, e.g. EQ-5D-5L or EQ-5D VAS, has not been tested before in the area of mental health. Laszewska et al. found that all EQ-5D-5L items and seven OxCAP-MH items loaded on one factor and nine remaining OxCAP-MH items loaded on a separate factor, indicating that the OxCAP-MH may be seen as supplementary rather than complementary in its concept, when compared to the EQ-5D-5L [5]. Previous factor analysis comparing the ICECAP-A with the items of EQ-5D-5L [51] and EQ-5D-3L [18,20] found that these instruments measure two different constructs and therefore provide potentially different information. A recent systematic literature review found inconsistencies between the ICECAP-A and EQ-5D instruments, suggesting that the ICECAP-A is most appropriately regarded as a complement to, and not a substitute for, the EQ-5D-3L and EQ-5D-5L in particular [36]. In addition, capability instruments have only been compared to a limited number of condition-specific mental health instruments (17,26).
This paper aimed to contribute to the utilisation of capability-based measures relative to health status measures in mental health research, by exploring the empirical relationship between the OxCAP-MH, the ICECAP-A, and the EQ-5D instruments against condition-specific measures and each other for schizophrenic patients with depression. We conducted analyses to assess the measures': (1) convergent validity, based on the strength and statistical significance of correlations, using statistical and graphical methods; (2) construct validity by using exploratory factor analysis (EFA) to assess the underlying construct of the measures; and (3) responsiveness, based on standardised response mean (SRM). The findings of this study should provide relevant information for researchers and decision-makers on how to optimise future quality-of-life information collection given a patient population similar to our study.

Data source
The analysis in this paper was based on data from the PoMeT trial (15), which investigated the impact of Positive Memory Training on depression symptoms of schizophrenia patients (n=100) in the UK between 2014 and 2016. The trial received ethical approval from the Berkshire Research Ethics Committee (REC ref 13/SC/0634). Patients were eligible for inclusion if they were between 18 and 65 years of age, had a DSM-V diagnosis of schizophrenia or schizoaffective disorder, and had at least a mild level of depression as measured by scoring 14 or more on the Beck Depression Inventory-II (27). Participants were identified by trial research assistants, working in collaboration with care coordinators based within community mental health teams. Randomisation was stratified by site and severity of depression (above and below a BDI-II score of 29, i.e. a severe level of depression) using randomised-permuted blocks (15). Patients were assessed at four time points through the 9-month study period: baseline, 3 months, 6 months and 9 months. More details about the PoMeT trial can be found in Steel et al. (15).

Instruments
The OxCAP-MH is a self-reported, 16-item instrument developed in the context of mental health outcome measurement, where items are rated on a 1-5 Likert scale and each question provides an equal contribution to the overall score. The 16 items cover a broad range of individual wellbeing aspects including: Overall health, Enjoying social and recreational activities, Losing sleep over worry, Friendship and support, Having suitable accommodation, Feeling safe, Likelihood of discrimination and assault, Freedom of personal and artistic expression, Appreciation of nature, Self-determination and Access to interesting activities or employment (9). The OxCAP-MH initial score (16-80 scale) is converted to a 0-100 scale referring to minimum and maximum capabilities using the formula: 100 × (OxCAP-MH total score − minimum possible score)/possible range (17). Higher scores indicate better capabilities; items 2, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15 and 16 are reverse coded. The OxCAP-MH has shown validity (5, 17), responsiveness (5, 17) and feasibility (9) in several settings and mental health disease areas, including schizophrenia and depression (14,15,24). It is currently available in the English, German (28), Hungarian (29) and Luganda languages, with further language translations ongoing. The OxCAP-MH does not yet have a preference-based value set, and so far it has been used in economic evaluations as a score; however, research is ongoing to develop a weighting system for its domains.
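The score conversion described above can be sketched in a few lines of Python. This is an illustration only, not the instrument developers' scoring code: the function name `oxcap_mh_score` is hypothetical, and the sketch assumes that reverse coding maps a response x on the 1-5 scale to 6 − x, which the text does not state explicitly.

```python
# Items listed in the text as reverse coded (1-based item numbers).
REVERSE_CODED_ITEMS = frozenset({2, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16})


def oxcap_mh_score(responses):
    """Convert 16 item responses (each 1-5) to the 0-100 OxCAP-MH scale.

    Assumption: reverse coding maps a response x to 6 - x.
    """
    if len(responses) != 16:
        raise ValueError("expected 16 item responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if response not in (1, 2, 3, 4, 5):
            raise ValueError(f"item {item_number}: response must be 1-5")
        # Reverse-coded items are flipped so that higher always means better.
        total += (6 - response) if item_number in REVERSE_CODED_ITEMS else response
    minimum_possible = 16          # 16 items x lowest coded value of 1
    possible_range = 80 - 16       # maximum total minus minimum total
    return 100 * (total - minimum_possible) / possible_range


# Hypothetical respondent who answers the middle category on every item.
print(oxcap_mh_score([3] * 16))
```

A respondent choosing the middle category throughout has a total of 48, which the formula places at the midpoint of the 0-100 scale.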
The ICECAP-A is a brief self-reported measure for the general adult population with five items, each of which can take one of four levels ranging from full capability to no capability. The domains include Stability (being able to feel settled and secure), Attachment (being able to have love, friendship and support), Autonomy (being able to be independent), Achievement (being able to achieve and progress), and Enjoyment (being able to have enjoyment and pleasure) (18). The ICECAP-A has shown validity (22,23,25,30,31), reliability (32,33), responsiveness (34) and feasibility (20) in different populations, including depression (35). The simple addition of ICECAP-A level sum scores ranges from 5 to 20, with higher scores representing better capabilities. Besides the original English language version, it is also available in German (32), Chinese (36), Welsh, Dutch, Danish, Persian and Italian languages (37). The ICECAP-A has a preference-based value set derived from the UK general population (30) and it is increasingly used in economic evaluations (38).
The EQ-5D is one of the most commonly used self-reported generic health status measures, and its validity and reliability have been reported in various health conditions and populations (39). The EQ-5D descriptive system comprises five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Besides the original 3-level version (40), a more sensitive, 5-level version has existed since 2009 (41), and both versions have value sets developed in several countries (42). The EQ-5D instruments can be used to summarise change with or without using value sets (43); however, value sets can introduce an exogenous source of variance into statistical inference (44). The EuroQol Group guidance to users of EQ-5D value sets warns against using value sets to produce a single index for statistical analysis of profiles that are meant to be purely descriptive. The EQ-5D instruments can be used as simple descriptive systems with total scores ranging from 5 to 15 for the 3L version and from 5 to 25 for the 5L version, with higher scores representing better HRQoL. As part of the EQ-5D-5L, respondents' self-rated health is also recorded on a vertical visual analogue scale (EQ-5D VAS), where scores range from 0 (worst imaginable health state) to 100 (best imaginable health state). The EQ-5D VAS can capture further important information that complements the health state information patients provide when they self-report their health on the EQ-5D (45).
The analysis also included comparative information from four condition-specific instruments used in the PoMeT trial. These instruments capture important aspects of the condition from the patient's perspective and thereby enabled the assessment of the ability of the OxCAP-MH, ICECAP-A, and EQ-5D-5L instruments to reflect clinically relevant mental health outcomes. The Beck Depression Inventory (BDI), Generalised Anxiety Disorder scale (GAD), Rosenberg self-esteem scale (RSES), and the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) are all mental-health specific, self-reported measures.
BDI is a self-reported measure of depressive symptoms and their severity in adolescents and adults according to the Diagnostic and Statistical Manual of Mental Disorders (46). It has 21 items scored on a 4-point polytomous response scale ranging from 0 to 3 (27). Scores range between 0 and 63, with a higher score representing more severe depression.
GAD is a self-reported measure of anxiety symptoms over the last two weeks. It consists of seven items scored on a 0-3 scale with a higher score indicating more severe symptoms (range from 0 to 21) (47). The cut-off scores of 5, 10 and 15 reflect mild, moderate and severe anxiety symptoms, respectively (48).
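As a minimal sketch, the GAD scoring and cut-offs described above could be expressed as follows. The function names are hypothetical, and the label returned for totals below 5 is an assumption, since the text names only the mild, moderate and severe bands.

```python
def gad_total(item_scores):
    """Sum seven GAD item scores, each on the 0-3 scale (total range 0-21)."""
    if len(item_scores) != 7:
        raise ValueError("expected seven item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2 or 3")
    return sum(item_scores)


def gad_severity(total):
    """Map a GAD total score to the severity bands given by the cut-offs 5, 10, 15."""
    if not 0 <= total <= 21:
        raise ValueError("total score must be between 0 and 21")
    if total >= 15:
        return "severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    # Assumption: the text does not name the band below the mild cut-off.
    return "below mild cut-off"


# Hypothetical respondent.
total = gad_total([2, 1, 2, 1, 2, 1, 2])
print(total, gad_severity(total))
```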
RSES is a 10-item, self-reported instrument that measures global self-worth by measuring both positive and negative feelings about the self (49). Items are answered using a 4-point polytomous response scale format ranging from strongly agree to strongly disagree. Items 2, 5, 6, 8, 9 are reverse scored.
The self-reported WEMWBS instrument was developed in the UK to assess mental wellbeing including affective-emotional aspects, cognitive-evaluative dimensions and psychological functioning. It is a 14-item scale with 5 response categories ('none of the time', 'rarely', 'some of the time', 'often', 'all of the time'), with a total score ranging from 14-70. A higher score indicates a higher level of mental wellbeing (50).

Level sum scores
Using preference-based index values is an important aspect of utility-based economic evaluations; however, it can also introduce an exogenous source of variance into statistical inference (44). Since the focus of this paper was on the comparability of the descriptive systems of the instruments, the preference-based weights available for the ICECAP-A and the EQ-5D-5L at the time of writing were not used, and level sum scores were applied for all instruments instead. In this way we also avoided the ceiling effect present in the case of both preference-based index values, as presented in Appendix 1. Moreover, relevant value sets for the EQ-5D-5L descriptive system and the ICECAP-A have different anchor points. The 0 point of the EQ-5D-5L value set is anchored against 'death', while the 0 point of the ICECAP-A value set is anchored against 'no capability', leading to further potential difficulties in interpreting any comparisons based on preference-weighted scales.

Statistical analysis
The statistical analysis focused on exploring and comparing the measurement properties of the OxCAP-MH, ICECAP-A, EQ-5D-5L descriptive systems and the EQ-5D VAS. We calculated correlations of baseline and change scores to test and compare construct validity across the scales, conducted an exploratory factor analysis (EFA), and investigated responsiveness to change and degree of agreement. For all analyses, the level of significance was set at p < 0.05, unless stated otherwise. Group comparisons of mean baseline scores were conducted using the Wilcoxon rank-sum test (51,52) for two-group comparisons and Kruskal-Wallis one-way ANOVA for multiple group comparison (53). Analysis was conducted on complete cases, excluding missing items at the relevant time point, unless stated otherwise. EFA was conducted with the freely available FACTOR software, and we used STATA Version 16 for all other analyses.

Convergent validity
Convergent validity indicates the degree to which two measures of constructs that theoretically should be related are in fact related (54,55). The hypothesis that capability instruments and their items have a stronger correlation with each other than with an HRQoL instrument was tested by exploring the correlation between baseline scores. Spearman's rank correlations across the OxCAP-MH, ICECAP-A, EQ-5D-5L descriptive system, EQ-5D VAS and condition-specific measures were calculated at total score level at baseline and assessed based on Cohen's effect size classification, namely < 0.3 is small, 0.3 to < 0.5 is moderate and ≥ 0.5 is large (56).
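To make the procedure concrete, the following Python sketch computes Spearman's rank correlation as the Pearson correlation of average ranks and labels the result with the thresholds quoted above. It is an illustration under stated assumptions, not the STATA code used in the study: the function names are hypothetical, the paired scores are invented, and applying the thresholds to the absolute value of the coefficient is an assumption.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cohen_label(rho):
    """Classify a correlation as small (<0.3), moderate (0.3 to <0.5) or large (>=0.5)."""
    size = abs(rho)  # assumption: thresholds apply to the magnitude
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "moderate"
    return "large"


# Hypothetical paired baseline scores for two instruments.
instrument_a = [55.0, 62.5, 40.0, 71.0, 48.5, 66.0]
instrument_b = [12, 15, 11, 17, 10, 14]
rho = spearman(instrument_a, instrument_b)
print(round(rho, 3), cohen_label(rho))
```

Under this classification the reported baseline correlation of 0.682 between the two capability instruments counts as large, and the change-score correlation of 0.401 as moderate.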
Next, locally weighted smoothing curve (LOWESS) fit lines were used to graphically indicate nonlinear trends in the scatter plots between OxCAP-MH, ICECAP-A, EQ-5D-5L descriptive system and EQ-5D VAS scores (57). LOWESS is a form of nonparametric regression that plots a line of central tendency between two variables on a scatterplot, thereby visualising the relationship across the possible score ranges. LOWESS captures general patterns in the relationship between two measures without making assumptions about their actual relationship (58).
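The idea behind a LOWESS line can be illustrated with a simplified sketch: at each observed x value, a straight line is fitted to the nearest points using tricube distance weights, and the fitted value at that x becomes one point of the smoothed curve. The code below is a teaching sketch, not the routine used in the study; the function name `lowess_fit`, the `frac` parameter and the example data are assumptions, and the robustness iterations found in full LOWESS implementations are omitted.

```python
import math


def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear fit evaluated at each observed x.

    frac is the share of observations used in each local fit.
    Simplification: no robustness (outlier down-weighting) iterations.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    k = min(n, max(2, math.ceil(frac * n)))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        distances = [abs(xi - x0) for xi in x]
        # Bandwidth: distance to the k-th nearest point (guard against zero).
        bandwidth = sorted(distances)[k - 1] or 1e-12
        # Tricube weights: 1 at x0, falling to 0 at the bandwidth.
        weights = [(1 - min(d / bandwidth, 1.0) ** 3) ** 3 for d in distances]
        total_weight = sum(weights)
        mean_x = sum(w * xi for w, xi in zip(weights, x)) / total_weight
        mean_y = sum(w * yi for w, yi in zip(weights, y)) / total_weight
        sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


# Hypothetical scores on two instruments.
scores_x = [5, 8, 10, 12, 13, 15, 17, 18, 20]
scores_y = [30, 42, 41, 55, 50, 62, 60, 71, 75]
smoothed = lowess_fit(scores_x, scores_y, frac=0.6)
print([round(value, 1) for value in smoothed])
```

Plotting `smoothed` against `scores_x` over the scatter of raw points gives the kind of central-tendency line described above; a smaller `frac` follows local fluctuations more closely, while a larger one gives a smoother curve.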

Exploratory factor analysis
EFA is a method to uncover the underlying structure of variables and is therefore used to assess whether an instrument measures what it intends to measure (59). It was used to evaluate construct validity through the factors (or concepts) assessed by the instruments and their relevance to the underlying construct. EFA was conducted on the baseline scores of the OxCAP-MH, ICECAP-A and EQ-5D-5L descriptive system to examine the overlap between the constructs of the two capability measures and the multidimensional measure of HRQoL, and to study how far they share the same set of underlying factors. Further details on the methods of EFA can be found in Appendix 3.

Responsiveness
Responsiveness was defined as the ability to capture clinically important changes over time (60). Patients filled out each scale at both baseline and 9 months, which allowed for an exploration of change in mean scores over time across all instruments. Responsiveness was first assessed by Spearman's rank correlation between baseline to endpoint change scores. Next, responsiveness was also assessed in terms of standardised response mean (SRM), which was calculated as the ratio of the mean change between baseline and 9-month follow-up scores in a single group to the SD of the change scores (54). Trivial, small, moderate and large magnitudes of change were indicated by SRM values of <0.20, 0.20-0.49, 0.50-0.79 and ≥0.80, respectively (61). Appendix 4 includes further assessment of responsiveness based on an external approach comparing the extent to which change in a capability measure relates to corresponding change in anchor instruments (62, 63).
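The SRM calculation can be sketched as follows. This is a minimal illustration with invented data and hypothetical function names; it assumes the sample standard deviation (n − 1 denominator), complete baseline and follow-up pairs, and that a value of exactly 0.20 falls in the "small" band.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """SRM: mean of the paired change scores divided by their standard deviation."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two complete baseline/follow-up pairs")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    # Assumption: sample SD (n - 1 denominator) of the change scores.
    return mean(changes) / stdev(changes)


def srm_magnitude(srm):
    """Label an SRM using the bands quoted in the text."""
    size = abs(srm)
    if size < 0.20:
        return "trivial"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "moderate"
    return "large"


# Hypothetical scores for four patients at baseline and 9 months.
baseline_scores = [10, 12, 14, 16]
endpoint_scores = [12, 13, 17, 18]
srm = standardised_response_mean(baseline_scores, endpoint_scores)
print(round(srm, 3), srm_magnitude(srm))
```

Applied to the values reported in the Results, an SRM of 0.77 would be labelled moderate, 0.28 small, and 0.17 and 0.13 trivial.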

Patient characteristics
The mean age of respondents was 43 years. Further patient characteristics are presented in Table 1.

Convergent validity
Mean baseline scores for all instruments used in this analysis are presented in Table 2. Graphical presentations of baseline correlations are included in Appendix 2.
Spearman's rank correlations at baseline confirmed the hypothesis that the capability instruments are more correlated with each other than with the EQ-5D-5L descriptive system's level sum scores or the EQ-5D VAS. Correlations between the capability and HRQoL measures (0.344-0.425) were lower than those between OxCAP-MH and ICECAP-A (0.682). The ICECAP-A was slightly more correlated with the EQ-5D VAS (0.425) than with the EQ-5D-5L descriptive system (0.344), whilst the OxCAP-MH was somewhat more highly correlated with the EQ-5D-5L descriptive system (0.381) than with the EQ-5D VAS (0.354). The baseline scores of both capability instruments were more highly correlated with condition-specific (0.481-0.718) than with generic (0.344-0.425) instruments.
The scatter plots with the LOWESS lines visualising the relationship between the instruments are presented in Figures 1a-1f. The flat or upward trend of the LOWESS lines indicates that, as expected, a higher score (better state) on one instrument tends to be associated with a higher or unchanged score on the other scales. This indicates that the capability and HRQoL instruments share a general broad pattern in capturing the full welfare impact of mental health care. However, the most fluctuating trend could be observed between the OxCAP-MH and ICECAP-A instruments, suggesting that the alternative capability measures might capture differing aspects of mental health and quality of life.

Exploratory factor analysis
A four-factor solution was chosen according to Kaiser's criterion, based on a scree plot, as described in Appendix 3. EFA with four factors found that all items of the instruments had communalities greater than 0.35, i.e. none of the items struggled to load significantly on any factor. Table 3 therefore shows factor loadings greater than 0.35.
Factor one consisted of the five EQ-5D-5L descriptive system domains, with particularly high communalities for all items apart from the Depression/anxiety domain. The Limit daily activities and Suitable flat situation domains of OxCAP-MH and the Being independent domain of ICECAP-A also loaded on this undoubtedly HRQoL-related factor. None of the EQ-5D-5L descriptive system domains loaded on factors two, three and four, thereby suggesting that these factors capture capability-related concepts. Only the Feeling settled and secure domain of ICECAP-A loaded on factor two, where high communalities were observed for the domains of OxCAP-MH related to perceptions of settlement and security, e.g. Less sleep over worries, Safety in neighbourhood, Probability for assault and Probability of discrimination. The negative loading of Access to interesting activities/employment is consistent with the direction of scoring of the items. Factor three consisted of four ICECAP-A domains (the Being independent domain did not load on this factor) and the following OxCAP-MH domains: Meet socially with friends and family, Enjoy free time activities, Local decisions, Freedom of expression, Enjoy love and support, Freedom of deciding for yourself, Creativity, and Access to interesting activities/employment.
Factor four consisted of two OxCAP-MH domains, both focusing on the appreciation of a person's environment. These two domains had remarkably high communalities on factor four and did not load on any other factor.

Responsiveness
The Spearman's rank correlations between the baseline to endpoint change scores of the OxCAP-MH, ICECAP-A, EQ-5D-5L and EQ-5D VAS, and the potential reference instruments are presented in Table 4.
Correlation between the OxCAP-MH and ICECAP-A change scores was moderate (0.401). The ICECAP-A change scores weakly correlated with change scores of generic health-related scales (0.227-0.247), whilst moderate correlations were observed in the case of self-reported condition-specific instruments (0.339-0.461). The OxCAP-MH change scores had weak correlation with generic scales (0.183-0.203), but moderate to high correlation with self-reported condition-specific instruments (0.440-0.557).
All SRM-based responsiveness statistics revealed a similar order of responsiveness (Table 5). The depression-specific BDI was the most responsive instrument (SRM=0.77), whilst the EQ-5D-5L descriptive system (SRM=0.13) was the least responsive scale. The ICECAP-A was more responsive than the OxCAP-MH (SRM=0.28 vs. 0.17). The results echo the findings of an anchor-based analysis presented in Appendix 4.

Discussion
This paper aimed to contribute to the utilisation of the capability approach in mental health by empirically demonstrating that two instruments embedded in the capability framework may show somewhat different psychometric properties when deployed on the same patient cohort. This could be explained by their different approaches to development, namely that the OxCAP-MH followed a more top-down approach, whilst the ICECAP-A followed a bottom-up method. The findings of this study could be used as guidance when optimising instrument selection for economic evaluations or other types of assessment in this context. To our knowledge, this is the first paper to empirically compare the two most commonly used capability instruments in the area of mental health and compare them simultaneously to condition-specific and generic HRQoL scales. In particular, this paper further confirmed the construct validity of both OxCAP-MH and ICECAP-A questionnaires as instruments measuring capabilities. Both scales are strongly correlated with self-reported measures of general mental health wellbeing (e.g. WEMWBS) and instruments measuring depressive symptoms (assessed with BDI) and self-worth/self-esteem (assessed with RSES). However, the ICECAP-A is somewhat more strongly correlated with symptoms of anxiety (assessed with GAD) than the OxCAP-MH. The two capability instruments are strongly correlated with each other, but moderately correlated with the EQ-5D-5L descriptive system and EQ-5D VAS.
Capability instruments may be seen as supplementary rather than complementary in their concept, and their evaluative space captures more of what might be important for those with this condition because their correlation with self-reported condition-specific instruments is stronger. The LOWESS plots across the OxCAP-MH, ICECAP-A and EQ-5D-5L instruments showed that a higher score on one scale tends to be associated with a higher or unchanged score on the other scales. This confirmed that the capability and HRQoL instruments share a general broad pattern in capturing the full welfare impact of mental health care. However, the results of the EFA of the items of both capability instruments and the EQ-5D-5L demonstrated that the capability instruments measure concepts beyond the standard interpretation of health because all items of the HRQoL measure loaded onto one factor, whilst the capability instruments spread across multiple factors. The results of the EFA suggested that the four factors represent different aspects of wellbeing measurement. Factor one could be linked to HRQoL, but it also included independence and suitable accommodation, which suggests an interesting link between these domains. Factors two, three and four seemed to include capability-specific domains. Factor two included items related to settlement and security aspects, where the negative loading of access to interesting activities indicates that this might be an auxiliary concept. Most of the OxCAP-MH and ICECAP-A items loaded on factor three, which could be interpreted as the internal psycho-social aspect of capabilities. These alternative loadings to factors two and three demonstrated the difference between the internal and external aspects of freedom within the capability approach. The findings of the current study were in line with a qualitative validation study of the Hungarian version of the OxCAP-MH, i.e. most domains in factors two and four of the EFA in this study were associated with the internal aspects of freedom, whilst factor three can be linked to external aspects (29). The two domains of OxCAP-MH related to the capabilities of appreciating people and nature loaded on a separate, fourth factor, indicating that this concept is supplementary and moves beyond the evaluative space included within the ICECAP-A or EQ-5D-5L instruments. Nevertheless, some domains of both the OxCAP-MH and ICECAP-A loaded on the same factor together with the EQ-5D-5L items. This confirmed the findings of Laszewska et al. (5); namely that the OxCAP-MH indeed measures some aspects of HRQoL and can be considered supplementary rather than complementary to EQ-5D-5L. An interesting finding of the current study is that the "Suitable flat situation" also loaded on to the common factor with EQ-5D-5L, suggesting an association with health. Previous studies exploring EFA between ICECAP-A and EQ-5D measures found that these two instruments share only a few common factors (19,21). This study confirmed that there is a weak overlap between these two instruments, and only the "Being independent" domain of ICECAP-A loaded on the same factor as the EQ-5D-5L items. These findings suggested that both capability instruments could be considered supplementary rather than complementary to EQ-5D-5L. Nevertheless, if the purpose of the intervention in question is to improve mental health, then a mental health focused measure (e.g. ReQoL (64)) could capture these relevant aspects too. Capability instruments capture broader aspects than physical and mental health, but it is currently debatable whether they should be used alongside or supplementary to other measures.
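The LOWESS lines referred to in this paragraph can be illustrated with a minimal sketch of locally weighted linear smoothing. It is not the implementation used in this study: the function name `lowess_line`, the smoothing fraction `frac=0.8`, the omission of robustness iterations and the simulated scores are all assumptions made purely for illustration.

```python
import numpy as np

def lowess_line(x, y, frac=0.8):
    """Locally weighted linear fit with tricube weights (no robustness iterations).

    Returns x sorted in ascending order and the smoothed y at each sorted x.
    Requires at least two observations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(x)
    k = min(n, max(2, int(np.ceil(frac * n))))  # points in each local window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th nearest neighbour (guarded against zero).
        h = max(np.sort(dist)[k - 1], np.finfo(float).eps)
        weights = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        # Weighted least squares for a straight line centred on x[i];
        # the intercept is the smoothed value at x[i].
        sw = np.sqrt(weights)
        design = np.column_stack([np.ones(n), x - x[i]])
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]
    return x, fitted

# Simulated (hypothetical) scores on two scales, arbitrary units.
rng = np.random.default_rng(1)
scale_a = rng.normal(0, 10, size=60)
scale_b = 0.4 * scale_a + rng.normal(0, 8, size=60)
xs, smooth = lowess_line(scale_a, scale_b)
```

Plotting `smooth` against `xs` over a scatterplot of the raw pairs gives a line of central tendency; a rising or flat line corresponds to the "higher or unchanged" pattern described above.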
The results of the EFA suggested that both capability instruments captured relevant aspects for common mental health disorders; however, the OxCAP-MH seemed to capture further traits which might be relevant to people with severe mental health disorders, such as schizophrenic patients with depression.
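How items from different instruments come to "load onto" shared or separate factors can be sketched as follows. This is an illustrative stand-in only: it uses unrotated principal-component extraction with the eigenvalue-greater-than-one rule rather than the EFA procedure of this study, and the data are simulated with two hypothetical latent constructs. The function name `pc_loadings` and all numerical settings are assumptions.

```python
import numpy as np

def pc_loadings(items):
    """Eigenvalues of the item correlation matrix and loadings of retained components.

    Components are retained when their eigenvalue exceeds 1 (Kaiser criterion).
    """
    corr = np.corrcoef(items, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings

# Simulated data: five items driven by one latent construct, three by another.
rng = np.random.default_rng(0)
n_respondents = 1000                             # hypothetical, not the trial sample
latent = rng.normal(size=(n_respondents, 2))
factor_of_item = np.array([0, 0, 0, 0, 0, 1, 1, 1])
items = 0.8 * latent[:, factor_of_item] + 0.6 * rng.normal(size=(n_respondents, 8))

eigvals, loadings = pc_loadings(items)
# For each item, the retained component with the largest absolute loading.
dominant = np.argmax(np.abs(loadings), axis=1)
```

In this toy setting the first five items share one dominant component and the last three share another, which is the kind of pattern interpreted above as instruments measuring overlapping or distinct constructs. A full EFA would normally add a rotation step and a different extraction method.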
In contrast to most previous papers, this analysis presented relatively weak correlations between the OxCAP-MH, the ICECAP-A and the EQ-5D-5L descriptive system in the area of mental health, which could also be linked to the use of level sum scores. Previous studies comparing the OxCAP-MH with the EQ-5D-3L and -5L instruments in a mixed mental health population context found correlation coefficients between 0.45 and 0.66 (5,17). Similar correlations were observed between the ICECAP-A and the EQ-5D instruments when they were compared for opiate dependent patients. The study by Goranitis et al. found that ICECAP-A and EQ-5D-5L have similar construct validity when compared to other clinical measures (23). The slightly different results of the current study confirmed previously identified weaknesses of the EQ-5D-5L instrument in measuring HRQoL in severely ill mental health patients (65). Our results also confirmed the findings of a study comparing ICECAP-A and EQ-5D-5L instruments in the area of depression, which concluded that instruments designed specifically to measure depression and mental health explained a greater proportion of the variation in ICECAP-A than the EQ-5D-5L (35).
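One reason level sum scores might attenuate correlations is that distinct health profiles collapse onto the same total. The following is a minimal sketch, assuming the common convention that an EQ-5D-5L profile is written as five digits from 1 (no problems) to 5 (extreme problems) and that the level sum score is their unweighted sum; the function name and the example profiles are hypothetical.

```python
def level_sum_score(profile):
    """Unweighted sum of the five EQ-5D-5L dimension levels (range 5 to 25)."""
    levels = [int(ch) for ch in profile]
    if len(levels) != 5 or any(level < 1 or level > 5 for level in levels):
        raise ValueError("profile must be five digits, each between 1 and 5")
    return sum(levels)

# Two different profiles: slight problems on the first dimension only,
# versus slight problems on the last dimension only.
first_dimension_only = level_sum_score("21111")
last_dimension_only = level_sum_score("11112")
same_total = first_dimension_only == last_dimension_only
```

Both profiles receive the same total even though they describe problems on different dimensions, so the score treats all dimensions and all level steps as equally important.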
In terms of sensitivity to change, the ICECAP-A was slightly more responsive than the OxCAP-MH. The EQ-5D VAS performed better than the EQ-5D-5L descriptive system. The ICECAP-A seemed to be slightly more correlated with generic measures, including the EQ-5D-5L descriptive system and the EQ-5D VAS, whilst the OxCAP-MH seemed to be more highly correlated with disease-specific measures. This could be explained by either the supplementary nature of capability instruments, or the fact that the OxCAP-MH is a more detailed and longer questionnaire. The OxCAP-MH and ICECAP-A instruments are both embedded in the capability approach, but they were developed using different approaches, and this study showed that they thereby exhibited somewhat different psychometric properties when deployed on the same patient cohort. A major advantage of using the ICECAP-A in economic evaluations is the availability of its preference-based value set and its shorter length, which reduces the burden for respondents. Future research could explore the relationship of preference-based scores once those become available for the OxCAP-MH instrument if the relevant scale anchors allow.
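The two responsiveness quantities discussed above, the standardised response mean (SRM) and the correlation between change scores, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code of this study: the function names and the small data set are hypothetical, the Pearson coefficient is assumed for the correlation, and missing values are handled by available-case analysis.

```python
import numpy as np

def standardised_response_mean(baseline, endpoint):
    """SRM: mean of the change scores divided by their standard deviation."""
    baseline = np.asarray(baseline, dtype=float)
    endpoint = np.asarray(endpoint, dtype=float)
    # Available-case analysis: keep patients observed at both time points.
    observed = ~np.isnan(baseline) & ~np.isnan(endpoint)
    change = endpoint[observed] - baseline[observed]
    return change.mean() / change.std(ddof=1)

def change_score_correlation(base_a, end_a, base_b, end_b):
    """Pearson correlation between the change scores of two instruments."""
    arrays = [np.asarray(v, dtype=float) for v in (base_a, end_a, base_b, end_b)]
    # Keep patients with complete data on both instruments at both time points.
    observed = ~np.any(np.isnan(np.vstack(arrays)), axis=0)
    change_a = arrays[1][observed] - arrays[0][observed]
    change_b = arrays[3][observed] - arrays[2][observed]
    return np.corrcoef(change_a, change_b)[0, 1]

# Hypothetical scores for five patients; one baseline value is missing.
instrument_a_baseline = [10, 12, 14, 16, np.nan]
instrument_a_endpoint = [12, 13, 17, 18, 20]
instrument_b_baseline = [50, 55, 60, 52, 58]
instrument_b_endpoint = [56, 54, 70, 57, 60]

srm_a = standardised_response_mean(instrument_a_baseline, instrument_a_endpoint)
r_change = change_score_correlation(
    instrument_a_baseline, instrument_a_endpoint,
    instrument_b_baseline, instrument_b_endpoint,
)
```

A larger absolute SRM indicates that an instrument registers more change relative to the variability of that change, which is the sense in which one instrument is described as more responsive than another.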
Limitations of this research included a restricted number of data points compared to the number of items. Hence, the robustness of the EFA and the responsiveness analysis may be limited. A further limitation of this study was the relatively small sample size, which motivated the application of available-case analysis for individual analyses. Establishing the psychometric properties of an instrument is a continuous process and further research should replicate this analysis for a higher number of patients and in other disease areas to strengthen these conclusions and explore potential psychometric differences related to diagnosis. Comparisons of OxCAP-MH and/or ICECAP-A with other capability (e.g. Achieved Capabilities Questionnaire for Community Mental Health (66)) or wellbeing (e.g. ReQoL (64)) instruments developed for the area of mental health would further contribute to our understanding of their measurement characteristics. Future research should explore the best method for incorporating capability measures into economic evaluation in the area of mental health (24).

Conclusion
The main conclusion of this study is that assessing outcomes in terms of capability for schizophrenic patients with depression provided more relevant, condition-specific information than the EQ-5D-5L descriptive system or the EQ-5D VAS. The EQ-5D-5L descriptive system showed less sensitivity to capturing change than the OxCAP-MH and the ICECAP-A instruments. The two capability instruments were more convergent with each other than with any HRQoL measure, confirming the hypothesised more similar underlying 'capability' construct. On the other hand, neither proved superior to the other in the current context. Instead, they seem to have different pros and cons. The ICECAP-A is more responsive than the OxCAP-MH and slightly more strongly correlated with the EQ-5D-5L descriptive system and the EQ-5D VAS; whilst the OxCAP-MH tends to have stronger correlations than the ICECAP-A with condition-specific instruments due to its mental health-specific design.

Figure 1
a. Scatterplots and LOWESS lines for OxCAP-MH vs ICECAP-A change scores
b. Scatterplots and LOWESS lines for OxCAP-MH vs EQ-5D-5L descriptive system change scores
c. Scatterplots and LOWESS lines for OxCAP-MH vs EQ-5D VAS change scores
d. Scatterplots and LOWESS lines for ICECAP-A and EQ-5D-5L descriptive system change scores
e. Scatterplots and LOWESS lines for ICECAP-A and EQ-5D VAS change scores
f. Scatterplots and LOWESS lines for EQ-5D-5L descriptive system and EQ-5D VAS change scores

Figure 2