An Investigation of Construct Validity and Responsiveness of the Danish ICECAP-A

Background This study aimed to provide the first assessment of the construct validity of the ICECAP-A in patients with cardiovascular disease, chronic obstructive pulmonary disease and diabetes, and to assess the responsiveness of the measure in this group. Method Data were provided from patients attending rehabilitation in the municipality of Aalborg, Denmark, from March 2018 to March 2019. Patients answered a questionnaire from the healthcare centre and the ICECAP-A at baseline and at 12 weeks follow-up. To assess construct validity, a priori hypotheses were developed. Based on these hypotheses, associations between sociodemographic characteristics, ‘general health’, a freedom dimension, and ICECAP-A were analysed through chi-squared tests and Spearman rank correlations for categorical and ordinal variables, respectively. To investigate responsiveness, the anchor-based method was used. Patients were divided into improved, worsened or no change, based on changes between baseline and follow-up on the anchor measures (‘general health’ and ‘freedom’). To quantify responsiveness, effect sizes, standardised response means and t-tests were calculated for both the weighted and un-weighted ICECAP-A scores. Findings were explored across different age groups.

The municipality of Aalborg, Denmark, decided in 2018 to develop a self-completion questionnaire to evaluate their rehabilitation programme. The full questionnaire consists of 33 questions, including background information (gender, employment status, education level and cohabitation). Additional questions concerning training level and satisfaction with the programme were asked at follow-up. The healthcare centre uses six of the questions to evaluate the rehabilitation programmes: (1) 'general health', (2) 'improvement of quality of life', (3) 'feeling fit to do the things I want to', (4) 'better at handling everyday life after programme', (5) 'know how to sustain health in the future' and (6) 'able to be more physically active after programme'. Questions 1 and 3 were the only questions asked at both baseline and follow-up; the rest were only asked at follow-up. Questions 1-5 have four or five possible response categories (where higher scores indicate greater levels of general health, for example). Question 6 has a binary response option (yes or no).

Construct validity
Construct validity is the degree to which an instrument (such as a questionnaire) measures what it is hypothesised to be measuring. It can be assessed by considering the degree to which expected relationships between a measure and other factors are confirmed [13,14]. Best-practice guidance on psychometric analyses highlights the importance of an a priori statement of hypotheses on the anticipated relationship between the constructs explored [15]. Drawing on Sen's theoretical framework for the establishment of capabilities, capability can be limited by reduced socioeconomic status and improved by good circumstances [3]. For the assessment of construct validity, a priori hypotheses were developed based on existing evidence about the ICECAP measures in other contexts [16,17]. Table 1 indicates the expected direction of association between the five attributes of ICECAP-A and indicators of socioeconomic status, general health and freedom in terms of 'feeling fit to do the things I want to' included in the Aalborg questionnaire.
The interpretation of Table 1 is as follows. The stability attribute is initially expressed as being able to feel settled and secure, and relates to the absence of significant changes in life and stress. It is therefore hypothesised that significant negative life changes (such as changes in general health) were likely to be associated with reduced capability. The validity study by Al-Janabi et al. found that, among other factors, employment, education and relationship status were associated with stability in a positive direction [17]. Therefore, this study expected an association between stability and employment, education and cohabitation in a positive direction, despite the different definitions of relationship status and education level. The attachment attribute is stated in terms of being able to have love, friendship and support, and relates to the ability to interact with others and have good relationships. Al-Janabi et al. found a positive association between attachment, employment and relationship status [17]. This study therefore anticipated finding an association between attachment, employment and cohabitation in a positive direction. The autonomy attribute is defined as being able to be independent and relates to looking after oneself and making one's own decisions. Previously, positive associations between autonomy and employment and education have been found [17]. It was therefore anticipated that a higher capability level for autonomy would be associated with a higher level of employment and education in this study. The achievement attribute is defined as being able to achieve and progress, and reflects individuals' abilities to move forward and achieve their goals. Previously, positive associations between achievement and employment, education and relationship status have been found [17]. It was therefore anticipated that capability for achievement would be associated with employment, education and cohabitation in a positive direction in this study. The enjoyment attribute is defined as being able to have enjoyment and pleasure in life. It reflects opportunities for the small pleasures in life, as well as things that are perceived to be enjoyable or exciting. As such, an association with employment and cohabitation was anticipated in a positive direction [17].
The ICECAP-A measure was developed to measure the effectiveness of health and social care interventions. Variation in health and healthcare usage is reflected in individuals' capabilities and is therefore of central interest, because poor health and disabilities affect one's capabilities [4,17]. Previous studies concerning ICECAP-A have found that impairments to physical health reduce the capability for stability, autonomy, achievement and enjoyment [17,18]. Therefore, this study anticipated an association between general health and stability, autonomy, achievement and enjoyment.
Here, it was anticipated that the question focusing on general health would be interpreted by participants as a question about physical health only, given the reasons that they were accessing the service, and thus would not be associated with attachment. 'Feeling fit to do the things I want to' was hypothesised to be associated with all five attributes of the ICECAP-A, and high levels of capability were anticipated to relate to high scores on this freedom question. This hypothesis is based on the findings by Al-Janabi et al., where a similar question was asked, 'I can do the things in life I want to do', and an association was found with all attributes [17].

Statistical analysis
Based on these hypotheses (Table 1), associations between selected variables and the ICECAP-A attributes at baseline were analysed using chi-squared tests for categorical variables and Spearman rank correlation for ordinal variables. A correlation was considered strong if the coefficient was higher than 0.5, moderate if the coefficient was between 0.3 and 0.5, and weak if the coefficient was below 0.3 [19].
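The correlation step described above can be illustrated with a minimal Python sketch. It is not the study's Stata code: the function names are hypothetical, ties are handled with average ranks (a common convention that the text does not specify), and applying the cut-offs to the absolute value of the coefficient is an assumption.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank-transformed data.

    Assumes both variables take at least two distinct values.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def correlation_strength(rho):
    """Label a coefficient with the cut-offs stated in the text (assumed on |rho|)."""
    size = abs(rho)
    if size > 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"


# Hypothetical ordinal responses for six patients (illustration only).
general_health = [1, 2, 2, 3, 4, 4]
stability = [1, 2, 3, 3, 4, 4]
rho = spearman_rho(general_health, stability)
print(round(rho, 2), correlation_strength(rho))
```

The chi-squared tests for categorical variables are not sketched here.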

Responsiveness
The ability of outcome measures to detect meaningful change is central to their usefulness in health and social care interventions. Two core ideas in the assessment of evaluative instruments are sensitivity to change and responsiveness. Sensitivity to change refers to the ability of instruments to measure change statistically. Responsiveness addresses the detection of clinically relevant change [13,20].
To assess responsiveness, some criterion is needed to ascertain whether patients have changed over time. The two main methods for assessing responsiveness are the distribution- and anchor-based approaches. The distribution-based method relies on the statistical characteristics of the sample, using indices such as the effect size, standardised response mean, standard error of measurement and responsiveness statistics. The anchor-based method is sample-independent and examines the relationship with an anchor, such as a QoL measure, to explain the meaning of a particular degree of change [21]. The anchors can either be cross-sectional or longitudinal. An anchor-based analysis aims to assess whether scores on the target measure change in an anticipated way, as indicated by changes in the scores on the anchor [22]. Distribution methods alone do not provide information about the clinical relevance of the observed change. Therefore, this study assessed responsiveness using anchor-based methods to investigate the association between change over time in the ICECAP-A scores and change over time in the anchors. An exploratory analysis of the correlation between the change scores of longitudinal outcome measures was used to support the choice of anchors for this study.
Using Cohen's rule, correlations were considered strong when the coefficients were > 0.50, moderate when ≥ 0.30, and weak when < 0.30. Therefore, 0.30 was used as a correlation threshold to define an at least moderate association between an anchor and outcome measure change score [23]. General health and 'feeling fit to do the things I want to' were the only two questions for which there were longitudinal data, but they were only used if they reached a threshold of baseline correlation of 0.3 (at least moderate correlation). For appropriate anchors, patients were divided into three groups depending on the changes in scores in general health and 'feeling fit to do the things I want to': (1) those who had worsened between baseline and follow-up scores, (2) those who had improved between baseline and follow-up scores, and (3) those with no change in scores between baseline and follow-up.
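The anchor selection and grouping rule can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the function names and the patient records are hypothetical, and it assumes that a higher anchor score is better, in line with the questionnaire description. The two correlation values are those reported in the Results.

```python
def usable_anchors(correlations, threshold=0.30):
    """Keep candidate anchors whose correlation reaches the threshold."""
    return [name for name, rho in correlations.items() if abs(rho) >= threshold]


def anchor_group(baseline, follow_up):
    """Classify one patient by the change in an anchor score (higher is better)."""
    if follow_up > baseline:
        return "improved"
    if follow_up < baseline:
        return "worsened"
    return "no change"


def split_by_anchor(patients, anchor):
    """Collect patient ids into the three change groups for one anchor."""
    groups = {"improved": [], "worsened": [], "no change": []}
    for patient in patients:
        baseline, follow_up = patient[anchor]
        groups[anchor_group(baseline, follow_up)].append(patient["id"])
    return groups


candidates = {
    "general health": 0.54,
    "feeling fit to do the things I want to": 0.52,
}
anchors = usable_anchors(candidates)

# Hypothetical (baseline, follow-up) anchor scores for three patients.
patients = [
    {"id": "p1", "general health": (2, 3)},
    {"id": "p2", "general health": (4, 3)},
    {"id": "p3", "general health": (3, 3)},
]
groups = split_by_anchor(patients, "general health")
print(anchors)
print(groups)
```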
When assessing the responsiveness of a weighted measure such as ICECAP-A [8], consideration needs to be given independently to both the descriptive system [4] and the value weighting of the descriptive system. It is essential that the descriptive system can detect a change in a construct for the weighted measure to reflect meaningful change. If the analysis only uses the weighted tariff scores, a misleading conclusion could be drawn, that is, a conclusion whereby the measure is thought not to be responsive when, in fact, the descriptive system of the measure shows change, but the value weightings suggest that these changes are not highly valued [24]. The weighted tariff scores also reflect the values of the UK population and not those of the Danish public. Therefore, for each anchor, two analyses are presented: (1) an analysis of the 'un-weighted' descriptive system of the ICECAP-A and (2) an analysis of the 'weighted tariff scores'. For the un-weighted and weighted analysis, change was calculated in groups that improved and worsened. Un-weighted scores were calculated by summing ICECAP-A item response levels, with four indicating full capability on an item and one indicating no capability on an item. The weighted tariff scores were calculated using the UK general population tariff from Flynn et al. [25]. Findings were explored across different age groups (< 65 versus ≥ 65 years of age).
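The un-weighted score described above can be sketched in Python as a simple sum of item levels. The function name and the example response are hypothetical. The weighted tariff score is not reproduced because the tariff values of Flynn et al. are not given in the text.

```python
ATTRIBUTES = ("stability", "attachment", "autonomy", "achievement", "enjoyment")


def unweighted_score(response):
    """Sum the five ICECAP-A item levels (1 = no capability, 4 = full capability)."""
    total = 0
    for attribute in ATTRIBUTES:
        level = response[attribute]
        if level not in (1, 2, 3, 4):
            raise ValueError(f"{attribute}: level must be 1-4, got {level!r}")
        total += level
    return total


# Hypothetical response from one patient (illustration only).
example = {
    "stability": 3,
    "attachment": 4,
    "autonomy": 4,
    "achievement": 2,
    "enjoyment": 3,
}
print(unweighted_score(example))  # 16
```

With five items scored from 1 to 4, the sum ranges from 5 (no capability on every item) to 20 (full capability on every item).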
Responsiveness of the ICECAP-A scores was assessed using Cohen's effect size (ES) and the standardised response mean (SRM). Additionally, a paired t-test was applied to test the null hypothesis that no change in the mean scores between baseline and follow-up had occurred. These indices were calculated separately for patients who reported improved, worsened or no change in the anchors [13,23]. The ES was calculated by dividing the mean difference between baseline and follow-up scores by the standard deviation (SD) of baseline scores; the SRM was calculated by dividing the mean score change (follow-up minus baseline) by the SD of the change [22]. For all indices, a value of < 0.2 was considered small, 0.2-0.5 moderate and > 0.5 large responsiveness [23]. The range of the unweighted score was 16 (5-20), and for the tariff score was 1 (0-1), with higher scores on both representing higher capability. Age differences in responsiveness were investigated by subgroup analysis using a group < 65 years of age and a group ≥ 65 years of age.
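The three indices can be sketched in Python from the definitions above. This is an illustration under stated assumptions: the function names and scores are hypothetical, the sample SD (n - 1 denominator) is assumed because the text does not say which SD was used, and only the paired t statistic is computed, not its p-value.

```python
from math import sqrt
from statistics import mean, stdev


def responsiveness_indices(baseline, follow_up):
    """Return mean change, ES, SRM and the paired t statistic for one group."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    mean_change = mean(changes)
    sd_change = stdev(changes)  # sample SD of the change scores (assumption)
    return {
        "mean_change": mean_change,
        "es": mean_change / stdev(baseline),  # mean change / SD of baseline
        "srm": mean_change / sd_change,       # mean change / SD of change
        "t": mean_change / (sd_change / sqrt(len(changes))),  # df = n - 1
    }


def magnitude(index):
    """Label an index with the cut-offs from the text (assumed on its absolute value)."""
    size = abs(index)
    if size > 0.5:
        return "large"
    if size >= 0.2:
        return "moderate"
    return "small"


# Hypothetical un-weighted scores for four patients in one anchor group.
baseline = [10, 12, 14, 16]
follow_up = [11, 14, 15, 18]
result = responsiveness_indices(baseline, follow_up)
print({name: round(value, 3) for name, value in result.items()})
print(magnitude(result["es"]), magnitude(result["srm"]))
```

Because the SRM and the paired t statistic share the same numerator and SD, the t statistic equals the SRM multiplied by the square root of the group size, which is why small groups can show a large SRM without a statistically significant change.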
To assess the responsiveness of the individual ICECAP-A items, a response profile (frequency of participants answering each level for each item, at baseline and follow-up) was completed for the two anchors. Change in response profiles between baseline and follow-up was analysed for each item to indicate which items were the 'drivers' of change in the overall measure.
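A response profile of this kind can be sketched in Python as a frequency count per item level. The function names and the responses are hypothetical and serve only to illustrate the tabulation.

```python
from collections import Counter

LEVELS = (1, 2, 3, 4)


def response_profile(responses, attribute):
    """Count how many participants answered each level for one item."""
    counts = Counter(response[attribute] for response in responses)
    return {level: counts.get(level, 0) for level in LEVELS}


def profile_change(baseline, follow_up, attribute):
    """Follow-up count minus baseline count at each level of one item."""
    before = response_profile(baseline, attribute)
    after = response_profile(follow_up, attribute)
    return {level: after[level] - before[level] for level in LEVELS}


# Hypothetical autonomy responses for four patients in one anchor group.
baseline = [{"autonomy": 2}, {"autonomy": 3}, {"autonomy": 3}, {"autonomy": 4}]
follow_up = [{"autonomy": 3}, {"autonomy": 4}, {"autonomy": 3}, {"autonomy": 4}]

print(response_profile(baseline, "autonomy"))
print(profile_change(baseline, follow_up, "autonomy"))
```

Repeating the comparison for each of the five attributes shows which items shift most between baseline and follow-up.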

Statistical analysis
The investigation of construct validity was based on all baseline data. The responsiveness analysis was based on complete cases in terms of questionnaire data because of the high rate of missing data (78%); hence, imputation was not considered. The data were anticipated to be missing completely at random because in all cases the entire questionnaire was missing. The amount of missing data is explained by the voluntary completion of the questionnaire, both at baseline and follow-up. Therefore, complete case analysis was performed for the responsiveness analysis. All analyses were carried out in Stata version 15 with significance levels set at 1% and 5%.
The study was carried out in accordance with the General Data Protection Regulation (2015-509-00007).
In accordance with the Danish National Committee on Health Research Ethics, this research satisfies the criteria of being 'questionnaire and register-based research excluding human biological material', and thus was not required to undergo a formal ethics procedure [26].

Results
A total of 729 patients were registered at baseline as having completed the rehabilitation programme. At baseline, 454 patients completed the ICECAP-A. Of these, 155 completed the ICECAP-A at follow-up. The baseline characteristics for the complete cases and for the whole sample are presented in Table 2. More men were included, and just over half were aged over 65 years, with a similar proportion being retired. Around two thirds were living with a spouse and approximately half had a medium level of education, with similar numbers having a low and a high level of education. Patients' responses (complete cases) at baseline and follow-up are listed in Table 3. The baseline weighted tariff score was 0.87 and the follow-up weighted tariff score was 0.89, giving a mean change of 0.02. The majority of responses were at the highest or second-highest level of capability for each of the five attributes. Nevertheless, some patients indicated that their capability level was limited (little or no capability) in most of the five attributes. However, the proportion was small (< 5 patients), and in the autonomy attribute, there were no responses at the lowest level at follow-up. The percentage of patients reporting the highest response level increased for each of the attributes between baseline and follow-up data collection. Table 4 shows the associations between selected variables and ICECAP-A attributes at baseline. Of the 26 hypothesised associations, 16 (62%) were in the expected direction. Hypothesised associations that did not meet our a priori tests were (1) education, cohabitation and the stability attribute, (2) employment and the attachment attribute, (3) employment, education (negative correlation, but close to zero: −0.0005) and the autonomy attribute, (4) education, cohabitation and the achievement attribute, (5) employment and the enjoyment attribute, and (6) employment, education and the weighted tariff score.
In contrast, the association found between general health and the attachment attribute was not hypothesised. Based on the correlation analyses, general health and 'feeling fit to do the things I want to' were chosen as anchors, as both reached a strong correlation (0.54 and 0.52) and were therefore appropriate to use as anchors (see Table 4). Table 5 shows the change in un-weighted and weighted tariff scores in groups that reported improved (n = 70) and worsened (n = 16) general health scores. In groups that reported improved general health scores, ICECAP-A scores increased (0.05), and in the groups that reported a worsening of general health scores, ICECAP-A scores decreased (-0.06). The ES and SRM for those reporting an improvement in general health were small for both the un-weighted and weighted tariff scores; for those who reported a worsening in general health scores, the ES and SRM were moderate to strong. The ES and SRM in ICECAP-A scores were more substantial in the groups that reported a worsening of general health than in those that reported an improvement. Table 5 shows the change in un-weighted and weighted tariff scores in groups that reported improved (n = 37) and worsened (n = 15) 'freedom' scores. In groups that reported improved freedom scores, ICECAP-A scores increased (0.06), and in the groups that reported a worsening of freedom scores, ICECAP-A scores decreased (-0.03). The change in ICECAP-A scores was more substantial in the groups that reported an improvement of freedom. The ES and SRM for those reporting an improvement in freedom were small to moderate for both the un-weighted and weighted tariff scores; for those who reported a worsening in freedom scores, the ES and SRM were small.

Subgroup analysis of responsiveness in different age groups
The results concerning responsiveness in the different age groups (Table 6) showed small differences, with the younger age group having a higher mean change, ES and SRM than the older group. In anchor between groups. More respondents reported improved general health (n = 35) compared with those improving in 'feeling fit to do the things I want to' (n = 15). The ES and SRM were larger in the < 65 group. In the < 65 group, the mean change in both the improved and worsened groups was statistically significant between baseline and follow-up. This was only the case for the improved group in the ≥ 65 subgroup. Results concerning freedom showed small ES and SRM in both age groups, with the smallest in the ≥ 65 subgroup. The item-by-item analysis (Table 7) showed that in the group of patients reporting an improvement in general health, the largest increase was in stability, and in the patients reporting a worsening of general health, the biggest decrease was in autonomy. In the group of patients reporting an improvement in 'feeling fit to do the things I want to', the increase was comparable across attributes, with increases in attachment lowest, and in the patients reporting a worsening in 'feeling fit to do the things I want to', the biggest decreases were seen in autonomy.

Discussion
This is the first study to assess the construct validity and responsiveness of the Danish ICECAP-A measure. To achieve this, it used longitudinal data from a rehabilitation setting in a population of chronically ill patients. The findings indicate that scores on the Danish ICECAP-A are associated with indicators of freedom and general health. The results provide evidence about the instrument's ability to respond to differences in socioeconomic characteristics such as employment, education and cohabitation. The responsiveness analysis explored changes in the ICECAP-A scores in response to general health and freedom, and the results indicate that the ICECAP-A is responsive and that patients younger than 65 years of age appear more responsive than older patients. The Danish ICECAP-A, therefore, demonstrated encouraging construct validity and responsiveness in a rehabilitation setting among chronically ill patients. The item-by-item analysis showed that those reporting an increase in general health and 'feeling fit to do the things I want to' scores had the largest change in achievement and autonomy, respectively, and that for those reporting a decrease in general health and 'feeling fit to do the things I want to' scores, the largest change was found in autonomy in both cases.
The overall findings are consistent with previous studies that found the ICECAP-A to be promising in terms of validity [16,17,27] and responsiveness [16,28] in different populations and health conditions. The most comparable is the study by Al-Janabi et al. [17], where the ICECAP-A was found to be associated with various socioeconomic variables, the EQ-5D, and questions concerning freedom and opportunities. The most noticeable result was that the present study found an association between general health and the attachment attribute, whereas Al-Janabi et al. found the opposite. Al-Janabi et al. did, however, find an association between anxiety and depression and attachment. This could indicate that the participants in this study considered mental health to be a part of general health, which could relate to differences in the setting, but could also reflect the increasing focus on mental health across society more generally since the Al-Janabi research was published in 2013.
The study benefits from the available Danish ICECAP-A translation (discussed elsewhere [11]) that made it possible to investigate the psychometric properties of ICECAP-A. Further, this study extends our academic knowledge around accurate outcomes assessment in the context of rehabilitation medicine among chronically ill patients. ICECAP-A is still a relatively new questionnaire, and so developing a better understanding of the tool's validity and responsiveness across populations is essential for its further use in health economic evaluations. Previous studies have demonstrated construct validity in different populations, including the general British population [17], women with irritable lower urinary tract symptoms [16] and a population with depression [27].
One methodological limitation of the study is the small number of possible anchors and the lack of clinical anchors. While the use of general health as an anchor was driven by methodological considerations when considering a capability measure's suitability for use in health interventions, it is essential to identify how the instrument responds to changes in health. Health is one of many factors that affect the capability of a person and a relevant factor in this study population in particular. A smaller change in capability scores would, therefore, be expected in response to changes in health, and it could have been useful to investigate this with more anchors than general health. A previous study used EQ-5D as an anchor in a population with depression, resulting in a correlation between all attributes of the ICECAP-A [27]. This study had a large proportion of missing data in terms of patients not having both a baseline and a follow-up measure. This missingness was anticipated to be completely at random because entire questionnaires were missing, rather than responses to specific questions. The amount of missingness may be due to the fact that completion of the questionnaire was voluntary, both at baseline and follow-up. This could influence the results if the analysed sample differs from the patients with missing data, and it reduces the power of the analysis. However, the proportion of missing data (78%) was considered too large to impute.
The evidence of validity and responsiveness presented in this study adds to the psychometric profile of the ICECAP-A measure, and the results provide an initial indication that the ICECAP-A may be responsive in public health research and chronically ill populations. In the Danish municipal rehabilitation setting, no national outcome measurement procedures exist, so a more extensive study with more participating municipalities would be interesting to explore the implications further. Establishing the psychometric performance of a measure is a continuous process, and further research is needed to explore how well the ICECAP-A performs in different public health and social care settings, such as in interventions regarding self-care. Ideally, capability measures could be incorporated into future health agreements and clinical guidelines. More importantly, it is necessary to show personnel in healthcare centres and decision-makers the benefits of implementing ICECAP-A in everyday work as a tool in public health and social care interventions, and not just as a scientific instrument.

Conclusion
This study provides the first investigation into construct validity and responsiveness to change for the Danish translation of the ICECAP-A and the first investigation into responsiveness to change for any ICECAP measure in the context of CVD, COPD and diabetes. The Danish ICECAP-A has demonstrable potential for accurately measuring the effect of rehabilitation. Furthermore, it appears to be responsive in terms of capturing the effects on general health and the freedom to do things. Future research into the psychometric properties of the Danish ICECAP-A would be beneficial to clinicians and decision-makers in Denmark interested in capturing broader benefits to patients, beyond just health.

Declarations
Ethics approval and consent to participate
The study was carried out in accordance with the General Data Protection Regulation (2015-509-00007).
In accordance with the Danish National Committee on Health Research Ethics, this research satisfies the criteria of being 'questionnaire and register-based research excluding human biological material', and thus was not required to undergo a formal ethics procedure [29].

Consent for publication
Not applicable

Availability of data and materials
The data that support the findings of this study are available from [Denmark Statistics and the municipality of Aalborg, Denmark] but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of [Denmark Statistics and the municipality of