Content validity, construct validity and magnitude of change for the eight-item HAKIR questionnaire - a patient reported outcome in the Swedish national healthcare quality registry for hand surgery.

Background The Swedish National Healthcare Quality registry for hand surgery, called HAKIR, includes collection of patient-reported outcome measures (PROMs) for all operations performed at the specialist departments of hand surgery. A prerequisite for PROMs is that the collection of data is based on psychometrically sound outcome instruments. This study therefore aims to evaluate content and construct validity, floor and ceiling effects, data completeness and magnitude of change over time for the eight-item HAKIR questionnaire. Methods Content validity was investigated by patients, an expert group and literature review. Construct validity was investigated through predefined hypotheses and correlation statistics between single-item questions and QuickDASH. Floor and ceiling effects and data completeness were analysed on 13197 preoperative, 10702 three-month and 9986 twelve-month postoperative questionnaires. Changes in scores for single items between pre- and 3 and 12 months follow-up for elective hand-related diagnoses, and between 3 and 12 months for acute injuries, were quantified by effect size calculations. Results The HAKIR Questionnaire (HQ-8) included items concerning pain on load, pain on motion without load, pain at rest, stiffness, weakness, numbness, cold sensitivity and ability to perform daily activities. Correlation coefficients between single items and total QuickDASH score at pre, 3 and 12 months follow-up ranged from 0.44 to 0.79. No ceiling effect but a floor effect in the total group was seen in all items at all follow-ups. Percentage of missing item responses was < 2.6% except for the cold sensitivity question. The magnitude of change for individual items varied from small to large effect sizes in elective hand-related diagnoses. For acute injuries a small effect size was seen between 3 and 12 months follow-up.
Conclusions This study provides evidence of content and construct validity of the HQ-8 including lack of ceiling effect, expected floor effect, good data completeness and ability to detect changes in scores over time. Associations between single-items and QuickDASH indicate that HQ-8 measures unique disability aspects important in hand surgery. Further studies are needed to evaluate test-retest reliability, responsiveness including clinically important change of the HQ-8 questions in subgroups of hand-related diagnoses.


Background
The Swedish National Healthcare Quality Registries (NQRs) have brought about considerable improvements in different fields of medicine [1,2]. The first NQRs in the 1970s were orthopaedic arthroplasty registries, registering all joint prostheses that had been implanted and removed. As one result, Sweden now has one of the lowest revision rates for knee and hip implants in the world [3,4]. Evaluating outcomes after hand surgery is complex, since the distinction between a good and a bad result cannot be made simply by measuring joint motion or grip strength or by following the rate of revision of joint implants. Common indications for hand surgical procedures are often perceived symptoms like ache, pain and paraesthesias. PROMs should therefore be included in an NQR for hand surgery to give a complete picture of treatment results. The short version of the Disabilities of the Arm, Shoulder and Hand (QuickDASH) [5] as well as a single-item questionnaire with seven questions concerning perceived symptoms in the affected/injured hand and one question about the ability to perform activities of daily living have been included in HAKIR since the start. This eight-item questionnaire is hereafter referred to as the HAKIR questionnaire (HQ-8) [6] and can be found in the online Additional file 1, HAKIR questionnaire, HQ-8.
A prerequisite for PROMs in a quality registry such as HAKIR is that the systematic collection of data is based on psychometrically sound outcome instruments. The QuickDASH has retained measurement properties equal to those of the original DASH, with strong evidence for reliability and validity [7]. These aspects, however, have not been evaluated for the HQ-8. Content validity, which is the extent to which the content of the instrument is an adequate reflection of the construct to be measured, therefore needs to be evaluated [8]. Construct validity refers to the degree to which scores on an instrument relate to other measures in a way that is consistent with predefined hypotheses concerning the concept being measured [9]. The outcome measures should also be valid for the purpose and population and responsive enough to identify true and clinically meaningful changes in function, not just a change due to random error [10,11]. Floor and ceiling effects occur when a considerable proportion of subjects score the best or worst score [12]. The measure is then unable to discriminate between subjects at either extreme of the scale. This may indicate limited content validity and responsiveness as well as reduced reliability because changes cannot be measured [9]. Data completeness (item-response) is another feature important for the content validity of questionnaires.
The aim of the present study was to evaluate content and construct validity as well as floor and ceiling effect, data completeness and the ability to detect magnitude of change for the single-item questions (HQ-8) included in HAKIR.

Methods
The HAKIR registry is web-based with a secure logon function. The legal requirements for healthcare registries in Sweden do not demand active consent from the participants. There are, however, strict demands on information to all participants before registration, for instance about the option of not being registered (opt-out) and the possibility of having all personal data erased at any time. When patients are scheduled for hand surgery, they are informed orally about HAKIR by staff and through an information brochure which is available in six languages (https://hakir.se/wp-content/uploads/2018/06/Eng_Patientinformation_2018.pdf). A tag with username and password is attached to the back of the brochure, and the patients log in themselves at a computer at the clinic. At three and twelve months after surgery, postoperative web questionnaires are automatically sent out by the registry to the e-mail address that the patient has registered. After 2 days, an SMS is sent as a reminder. Some patients request a paper form instead, which they then get at their visit to the clinic. These patients are sent the postoperative questionnaires by surface mail by staff, with a prepaid return envelope. The data in the questionnaires are then registered by staff. Paper questionnaires are labour-intensive and costly, and eight out of nine participating departments have completely transitioned to web questionnaires [13].
All performed operations at each participating department are included in the basic registration in HAKIR, where surgical codes and reoperations are registered by hospital staff. Patients are asked to complete the HQ-8 and QuickDASH before, as well as three and twelve months after their surgery [13].

HQ-8 - Content validation process
The choice of questions concerning different hand symptoms was initially made by the registry manager and a senior hand surgeon at the hand surgery department in Stockholm. The selection of symptoms was based on clinical experience and previous literature [14-21]. The symptoms were: pain on load, pain on motion without load, pain at rest, reduced range of motion, reduced strength, reduced sensibility or numbness and hand function in daily activities. The content validity of these questions was assessed by cognitive interviewing (think aloud) [22,23] of seven patients (3 men and 4 women, age 25-68 years) while they were responding.
Patients' views on the relevance of each item, as well as any ambiguities in the formulations, were noted during the interviews, which were performed by an experienced hand therapist. An oral summary was made together with each patient.
The field notes were then transcribed and analysed using content analysis by the interviewer (fourth author, KS) [24]. A review of the included questions and formulations was then performed by an inter-professional expert group consisting of the registry manager, hand surgeons, occupational therapists and physiotherapists as well as nurses from all hand surgery departments. The final version of the HQ-8 was reached through consensus and unanimous decision and included eight questions [6]. As in the initial process mentioned above the review of literature and following discussion included a discussion about the clinical importance and relevance of the  [6]. Patients were asked to rate the problems they had experienced during the last week in the hand/arm relevant for surgery. In case of surgery for an acute injury, patients were asked to estimate perceived problems prior to the injury.

QuickDASH
Furthermore, patients were asked to respond to the Swedish version of the 11-item QuickDASH [5]. The QuickDASH is a region-specific self-report outcome instrument quantifying physical function and symptoms in persons with any or multiple musculoskeletal conditions of the upper limb [26,27]. Each item in QuickDASH is scored on a 5-point scale (1 = No difficulty to 5 = Unable). Patients were requested to answer the questions based on their condition in the last week and regardless of which hand they used to perform the task. If they did not have the opportunity to perform an activity in the past week, they were asked to make their best estimate of which response would be the most accurate. A total QuickDASH score ranging from 0 (no disability) to 100 (most severe disability) was calculated from the item scores [5].
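The conversion from item responses to the 0-100 total score can be illustrated with a minimal sketch. It assumes the standard published QuickDASH scoring rule (mean of the answered items, minus 1, times 25, with at most one missing item allowed); the text above does not state which rule the registry software applies, and the function name `quickdash_score` is hypothetical.

```python
from typing import Optional, Sequence


def quickdash_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Return the 0-100 QuickDASH score, or None if it cannot be computed.

    Assumption: standard QuickDASH scoring, ((mean of answered items) - 1) * 25,
    which requires at least 10 of the 11 items to be answered.
    Missing items are passed as None.
    """
    if len(items) != 11:
        raise ValueError("QuickDASH has exactly 11 items")
    answered = [v for v in items if v is not None]
    if any(v < 1 or v > 5 for v in answered):
        raise ValueError("item responses must be between 1 and 5")
    if len(answered) < 10:
        return None  # more than one missing item: no total score
    mean_response = sum(answered) / len(answered)
    return (mean_response - 1) * 25
```

With this rule, a patient answering 1 on every item scores 0 (no disability) and a patient answering 5 on every item scores 100, which matches the range described above.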

Data analysis
Two aspects of validity (content and construct validity) were investigated. Content validity was assessed by patients, an inter-professional group of experts in the field of hand surgery and via literature review. Construct validity was investigated through predefined hypotheses and correlation statistics (Spearman's rank correlation coefficient, r_s). The strength of the correlations was interpreted as: r_s < 0.5 low; 0.5 to < 0.7 moderate; ≥ 0.7 high [28]. The analyses were made for the total group and in a selection of hand surgery diagnoses.
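As an illustration of the correlation analysis, the sketch below computes Spearman's rank correlation with average ranks for ties (ties are frequent on 0-100 single-item scales) and applies the interpretation bands given above. It is a dependency-free illustration, not the statistical software used in the study; the function names are hypothetical, and a value of exactly 0.70 is classed as high, in line with the "0.70 or greater" criterion stated for the hypothesis.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's r_s: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("r_s is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


def correlation_strength(rs: float) -> str:
    """Interpretation bands from the text: <0.5 low, 0.5 to <0.7 moderate, >=0.7 high."""
    size = abs(rs)
    if size < 0.5:
        return "low"
    if size < 0.7:
        return "moderate"
    return "high"
```

In use, `x` would hold the scores for one HQ-8 question and `y` the total QuickDASH scores of the same patients at the same follow-up.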
It was hypothesized that scores from question 8 in HQ-8 (ability to perform ADL) would correlate positively with the total QuickDASH score for the total group and in a selection of hand surgery diagnoses. A point estimate of the correlation of 0.70 or greater is considered a high correlation [8]. Analyses were based on the questionnaires answered at each follow-up. A high correspondence between the visual analogue scale (VAS) and the numeric rating scale (NRS) has previously been established [29]. We therefore aggregated the data from the VAS and NRS responses in the present study.
Floor and ceiling effects were calculated as the percentage of answered questionnaires with a score <5 (floor) and a score >95 (ceiling), respectively, for each of the HQ-8 questions at pre, 3 and 12 months follow-up. A proportion >15% was defined as a floor or ceiling effect [12]. Data completeness (item-response) was calculated as the number of missing item responses for each HQ-8 question in relation to the number of answered questionnaires. A proportion of missing responses >15% was defined as unacceptable [30].
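The floor, ceiling and completeness calculations for a single HQ-8 question can be sketched as follows. The cut-offs (<5, >95 and >15%) are taken from the text; the function name and the choice to use answered items as the denominator for floor and ceiling percentages, and all returned questionnaires as the denominator for missing responses, are assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence

FLOOR_CUT = 5        # scores below this count toward the floor
CEILING_CUT = 95     # scores above this count toward the ceiling
THRESHOLD_PCT = 15   # more than 15% flags an effect or unacceptable missingness


def summarise_item(scores: Sequence[Optional[float]]) -> Dict[str, object]:
    """Summarise one HQ-8 question; a missing response is passed as None."""
    n_questionnaires = len(scores)
    answered = [s for s in scores if s is not None]
    if not answered:
        raise ValueError("no answered items to summarise")
    floor_pct = 100.0 * sum(1 for s in answered if s < FLOOR_CUT) / len(answered)
    ceiling_pct = 100.0 * sum(1 for s in answered if s > CEILING_CUT) / len(answered)
    missing_pct = 100.0 * (n_questionnaires - len(answered)) / n_questionnaires
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "missing_pct": missing_pct,
        "floor_effect": floor_pct > THRESHOLD_PCT,
        "ceiling_effect": ceiling_pct > THRESHOLD_PCT,
        "missing_unacceptable": missing_pct > THRESHOLD_PCT,
    }
```

Because 0 represents no problem on every HQ-8 scale, a floor effect here means that many patients report no or almost no symptoms on that question.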
The magnitude of change was quantified by effect size (ES) calculations. For elective hand-related diagnoses, the mean paired change between pre- and 3 months follow-up as well as pre- and 12 months follow-up was divided by the standard deviation (SD) of the preoperative (baseline) score in each HQ-8 question. For acute hand-related diagnoses, the mean paired change between 3 and 12 months follow-up was divided by the SD of the 3-month score [31,32]. According to Cohen's criteria, an effect size of 0.20 is considered small, 0.50 medium, and 0.80 large [33].
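A minimal sketch of this effect size calculation is given below. It makes three assumptions that the text does not settle: only patients with both measurements contribute, the SD is taken over those same patients' reference scores, and change is computed as reference minus follow-up so that a positive ES means fewer problems (lower scores). The function names and the label "below small" for values under 0.20 are hypothetical.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def effect_size(reference: Sequence[Optional[float]],
                follow_up: Sequence[Optional[float]]) -> float:
    """Mean paired change divided by the SD of the reference scores.

    `reference` is the preoperative score for elective diagnoses, or the
    3-month score for acute injuries. Missing values are passed as None.
    """
    pairs = [(r, f) for r, f in zip(reference, follow_up)
             if r is not None and f is not None]
    if len(pairs) < 2:
        raise ValueError("at least two complete pairs are required")
    changes = [r - f for r, f in pairs]       # positive = improvement
    sd_reference = stdev([r for r, _ in pairs])
    if sd_reference == 0:
        raise ValueError("effect size is undefined when the reference SD is 0")
    return mean(changes) / sd_reference


def cohen_label(es: float) -> str:
    """Cohen's benchmarks from the text: 0.20 small, 0.50 medium, 0.80 large."""
    size = abs(es)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"
```

For example, three patients with baseline scores of 0, 10 and 20 who all score 0 at follow-up have a mean paired change of 10 and a baseline SD of 10, giving an ES of 1.0, which would be classed as large.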

Sample characteristics
In total, data from 33 885 questionnaires were analysed: 13 197 before operation, 10 702 at three months, and 9 986 at 12 months after hand surgery. Mean age was 52 years, and the sample comprised 50.0% men and 50.0% women. Thirty-six percent of the questionnaires were completed in a web format and 64% in a pencil and paper format.

Content validity
Seven questions were initially included in the HQ-8 questionnaire. The inter-professional expert group added one question concerning discomfort/problems when exposed to cold, giving a total of eight included questions. Formulations of four symptoms were changed from the initial version in order to achieve unambiguity and consistency with the formulation of verbal anchors for all questions (0 representing no problem and 100 the worst problem imaginable). Reduced range of motion was therefore changed to stiffness, reduced strength to weakness, reduced sensibility or numbness to numbness/tingling in fingers, and hand function in daily activities to ability to perform daily activities. The final version was achieved through consensus discussion in the expert group, taking into consideration patients' suggestions for clarification of symptoms and anchors, clinical experience of the spectrum of symptoms and literature review [6]. See the online Additional file 1, HAKIR questionnaire, HQ-8. In order to reduce respondent burden and increase response rates, the number of included items was restricted, excluding questions concerning fine-motor skills, grip function and aesthetics.

Construct validity
The correlation coefficients between the single-item questions in HQ-8 and the total score in QuickDASH for the total group at pre, 3 and 12 months follow-up ranged from 0.44 to 0.79. The strongest correlation (r_s = 0.79) in the total group was noted for question 8 (ability to perform daily activities) and the total QuickDASH score at the 3 months follow-up. The weakest correlation (r_s = 0.44) was seen for question 7 (cold sensitivity) and total QuickDASH score in the preoperative analysis. Detailed information for the total group and a selection of diagnostic subgroups is available in Fig 1, Fig 2, Fig 3 and Table 3.

Thumb osteoarthritis: A medium ES was seen in all pain questions (#1-3) and ability to perform daily activities (#8) at 3 months follow-up, which increased to a large ES at the 12 months follow-up. Stiffness (#4) and weakness (#5) reached a medium ES at 12 months follow-up.
Dupuytren's contracture: A medium ES was seen for stiffness (#4) and ability to perform daily activities (#8) at both follow-ups.
Morbus de Quervain: A medium ES was seen for pain on load (#1) and ability to perform daily activities (#8) at 3 months follow-up, which increased to a large ES for pain on load (#1) at 12 months follow-up. At 12 months follow-up, pain on motion without load, weakness and ability to perform daily activities (#2, 5, 8) showed a medium ES.
Ganglion: A medium ES was seen for pain on load (#1) at 12 months follow-up.
Trigger finger: A medium ES was seen for ability to perform daily activities (#8) at 3 months follow-up, which increased to a large ES at 12 months follow-up. At 12 months follow-up, a medium ES was also seen for pain on motion without load (#2), pain at rest (#3), stiffness (#4) and weakness (#5), and a large ES for pain on load (#1).
Carpal tunnel syndrome: A medium ES was seen for pain at rest (#3) and numbness/tingling in fingers (#6) at 3 months follow-up; the latter increased to a large ES at 12 months follow-up. A medium ES was seen for all other questions (#1, 2, 3, 4, 5, 8) except cold sensitivity (#7) at 12 months follow-up.
Ulnar nerve entrapment: A small ES was seen for all questions at both follow-ups.

Discussion
The use of an eleven-increment NRS question has been recommended as a responsive scale with a good compliance rate, ease of use and psychometric properties similar to those of a visual analogue scale [29,35-38]. One issue to consider when using single-item questions is the definition of anchors. This has been commented on for pain ratings [39] but may also apply to ratings of other impairments related to hand function. "Worst pain ever experienced" is another upper anchor alternative but is only interpretable with knowledge of the patient's pain history [39].
The idea of accuracy when measuring pain has also been questioned, since an unequivocal reference standard does not exist or cannot be obtained [40]. Self-reports of pain can be influenced by previous experiences and by behavioural, affective or cognitive factors, and vary depending on the context in which the pain is experienced. When measuring change over time it is therefore important to use mean paired comparisons to limit the influence of between-person variability.
Multidimensional pain scales have been recommended because they offer a broader understanding of the pain experience beyond the simple factor of pain intensity as measured with single questions such as an NRS [41], but they are not applicable in a total population of hand surgery patients. To capture different levels of pain intensity, ratings of pain on load, on motion without load and at rest were all included in HQ-8. Furthermore, it is also possible to include additional self-report instruments specific to different hand-related diagnoses in research projects.
As hypothesized, a high positive correlation (>0.70) between question 8 (ability to perform daily activities) and the total QuickDASH score was found for the total group at all follow-ups (hypothesis 1). This was expected since the items included in QuickDASH mainly reflect the ability to perform certain activities. Question 8 reflects the ability to perform ADL with the affected/operated hand, whereas QuickDASH is answered regardless of which hand is used. This may partly explain why a "perfect" correlation close to 1 is not present and confirms the relevance of both outcome measures.
Strong positive correlations were also seen in the total group for all HQ-8 questions concerning pain and the total QuickDASH score (hypothesis 2). In our experience pain is one of the most limiting factors for satisfactory performance at activity level. This is also confirmed in previous studies [42-44]. A weaker r_s was noted preoperatively between all pain-related questions and total QuickDASH scores for the thumb osteoarthritis subgroup. A possible explanation may be that patients with thumb osteoarthritis rate their problems with pain as more troublesome than their performance in activities preoperatively. Access to compensatory strategies may be reflected in a proportionally lower QuickDASH score, and the correlation may therefore be weaker. Postoperatively, the perceived pain improves, and the scores on pain and QuickDASH are therefore reflected in stronger r_s values.
A stronger positive correlation in the total group was seen for weakness and the total QuickDASH score compared to the r_s values for stiffness, numbness and cold sensitivity (hypothesis 3). Hand strength has previously been shown to correlate strongly with the DASH score in a variety of hand-related diagnoses [42,44-46]. For the thumb osteoarthritis group, r_s values of 0.36, 0.63 and 0.70 were seen for weakness and total QuickDASH score at pre, 3 and 12 months follow-up, respectively, which may indicate that increased strength after surgery is important for performance at activity level.
A limited content validity may be indicated if a floor and ceiling effect occurs since the ability to discriminate between subjects at the extremes of the scale is lost [9].
A floor effect, defined by the 15% threshold [12], was seen in the total group in all HQ-8 questions, with the largest floor effect seen at the 12 months follow-up. This, however, was not observed in all of the selected subgroups, since certain symptoms are relevant neither before nor after surgery for some diagnoses. It is also expected that treatments should decrease or completely eliminate the symptoms that are relevant for different subgroups.
Data completeness was very good, with 1.5-2.6% of patients having missing item responses depending on item and follow-up. This is well within the guideline threshold, under which >15% missing responses is considered unacceptable [30]. One exception was cold sensitivity (#7), with a slightly higher percentage of missing item responses at pre and 3 months follow-up (6.2 and 7.6%). This may be explained by the time point at which the question was answered, since the relevance of responding may vary between seasons.
Although cold sensitivity has been described as a frequent problem in a mixed group of hand injuries [47], it is not present for all hand surgery patients. This may influence the motivation to reply.
Interesting to note is that the effect sizes in various elective subgroups was The small ES between 3 and 12 months follow-up for the selected acute hand-related injuries is also as expected. The main change in scores probably occurs within 3 months after the injury, when the rehabilitation usually is completed.
The preoperative scores are filled in retrospectively, and recall bias may exist, especially in an acute injury situation. At this point we therefore chose to prioritize ES calculations on the mean paired change between 3 and 12 months.
Although the results of the ES calculations were consistent with clinical experience following surgery in the selected subgroups of hand-related diagnoses, one has to remember that ES only measures the magnitude of change and not its clinical importance [48].
According to the COSMIN guidelines, responsiveness refers to the validity of a change score [48], and it has been defined as the ability to detect minimal important change (MIC) over time, even if these changes are small. Furthermore, instruments should be able to distinguish MIC from measurement error [9]. We therefore plan to study these aspects in a future study when test-retest data have been collected.

Methodological considerations
The response rate was 45% for the 3 months and 47% for the 12 months postoperative questionnaires. It has been reported that online surveys in general are less likely than paper surveys to receive high response rates [49]. This was initially also the case for the HAKIR surveys, but after improving web functionality and sending out a reminder SMS, the response rate is now similar between web and paper. Response rates around 30-40% have commonly been reported for large online surveys [49].
The response rate for the HAKIR questionnaire could therefore be considered as acceptable, even though efforts should be made to increase it further.
To determine the accuracy of HQ-8 in another culture or country, or when different language versions are used, a reassessment of validity and reliability is advocated. Guidelines for the cross-cultural adaptation process should then be followed [50,51].

Conclusions
This study provides evidence of the content and construct validity of the HQ-8, including good data completeness, expected floor effect, lack of ceiling effect and an ability to detect changes in scores over time. The different associations between

Competing interests
The authors report no conflicts of interest.

Figures: Spearman's rank correlation coefficient between questions in HQ-8 and total QuickDASH score.