Psychometric Comparison of the Performance of Quality of Life Assessment Instruments in Dermatology—the DLQI, Skindex-16, and Skindex-17 —in a Brazilian Population.

Background: The DLQI (Dermatology Life Quality Index) is the most commonly used instrument for evaluating quality of life in dermatology. Skindex was developed as a multidimensional instrument, and successive versions have been published; the most recent are Skindex-16 and Skindex-17, both derived from Skindex-29 through different techniques. This study aimed to compare the psychometric performance of the three instruments (the DLQI, Skindex-16, and Skindex-17) to refine the assessment of quality of life among dermatological patients.
Methods: A methodological study compared the psychometric performance of the DLQI, Skindex-16 (Sk-16), and Skindex-17 (Sk-17) among adults with dermatoses that were classified according to characteristic physical symptoms and psychological or social domains. Internal consistency, correlation, test-retest reproducibility, and responsiveness were assessed according to classical psychometry, and item discrimination and difficulty were tested according to item response theory (IRT).
Results: The sample consisted of 229 patients, predominantly women (71%), of adult age (average 45 years), and with intermediate phototypes (III and IV = 73%). The analyses of internal consistency for the instruments resulted in Cronbach's α coefficients >0.80. There was adequate test-retest reproducibility and responsiveness for all dimensions of the instruments. The IRT analysis indicated adequate ordering and discrimination (a >1.0) for all items of the DLQI, Sk-16, and Sk-17; four items of Sk-16 did not adequately fit the IRT model (p <0.01). The items with the greatest discrimination were q3 (domestic activities) and q5 (leisure activities) in the DLQI; F2 (desire to be with people) and E6 (annoyance) in Sk-16; and S4 (irritated skin), P5 (relationship), and P6 (autonomy of tasks) in Sk-17. The Sk-16 and Sk-17 instruments presented more items that registered mild impacts on the quality of life (b <-0.5).
Conclusions: The DLQI, Sk-16, and Sk-17 presented the assessment of health-related of


Introduction
Quality of life assessment instruments can provide more personalized information than traditional clinical data on the impact of health conditions on patients' lives. (1) Metrics based on the perspective of patients (patient-centered outcomes) are important for monitoring chronic diseases. Such an assessment is based on the patient's opinion without interference or interpretations by the doctor, contributing to the understanding of the health-disease process as well as the evaluation of treatments.
The first well-structured generic instrument for evaluating health-related quality of life (HRQOL) in dermatology was the Dermatology Life Quality Index (DLQI), published in 1994 (2) and validated for Portuguese (DLQI-BRA) in 2004. (3) It is a one-dimensional instrument, composed of 10 items arranged in six categories (symptoms, daily activity, work/school, leisure, interpersonal relationships, and treatment), that assesses the individual's perception in the last week. (4) There are five possible answers for each item (very much, a lot, a little, not at all, and not relevant); the score for each item ranges from 0 to 3. (2)(3)(4)(5) Conventionally, DLQI scores are interpreted based on the sum of the scores of the 10 items: no impairment of quality of life (0-1), or mild (2-5), moderate (6-10), severe (11-20), or very severe (21-30) impairment. (6) The DLQI is the most frequently used instrument for clinical follow-up and treatment guidelines (7)(8)(9); however, in recent years, its dimensionality and psychometric properties, such as the ordering of item scores and the presence of differential item functioning, have been questioned in light of the item response theory (IRT). (10)(11)(12)(13)(14)
In 1996, Chren et al. developed the first version of Skindex, a multidimensional instrument for evaluating HRQOL among patients with dermatological diseases, with 61 items aimed at assessing several psychological and psychosocial aspects not yet addressed. (15) The second version of the instrument, Skindex-29, was published the following year, in 1997. (16) In 2000, the third version of the instrument was published: Skindex-16 (Sk-16). In this new version, the effects of dermatoses on HRQOL were divided into three domains: symptoms, emotions, and functioning. The items assess the frequency of discomfort experienced through a seven-point Likert scale (ranging from never bothered to always bothered). (17) Each item is transformed into a score that ranges from 0 (without discomfort) to 100 (maximum discomfort).
In 2006, a new version was published: Skindex-17 (Sk-17), which, unlike the other versions, is based on the IRT rather than on classical test theory. Sk-17 has two subscales: psychosocial and symptoms. On the psychosocial subscale, the scores range from 1 to 24 and are classified according to the impact on HRQOL: less than five (little), between five and nine (moderate), and greater than nine (high). The symptom scale result is dichotomized; there are five response options (never, rarely, sometimes, often, and all the time), and scores greater than or equal to five indicate that many symptoms are present. (18,19)
The development of the Sk-16 and Sk-17 instruments differed, and the two instruments vary in terms of structure, dimensionality, and scores. Despite coincident items, the overall designs of the instruments are different. They have not yet been compared in terms of their psychometric properties.
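The scoring rules summarized above can be illustrated with a minimal Python sketch. The function names are hypothetical and do not come from the instruments' manuals. The cut-offs are those quoted in the text; the linear rescaling of Sk-16 responses (0 to 6 mapped onto 0 to 100) is an assumption, since the text states only the resulting range.

```python
def dlqi_band(item_scores):
    """Sum ten DLQI item scores (each 0-3) and map the total to its band."""
    if len(item_scores) != 10 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("DLQI expects ten item scores, each between 0 and 3")
    total = sum(item_scores)
    if total <= 1:
        band = "no impairment"
    elif total <= 5:
        band = "mild"
    elif total <= 10:
        band = "moderate"
    elif total <= 20:
        band = "severe"
    else:
        band = "very severe"
    return total, band


def skindex16_item_score(raw):
    """Rescale one Sk-16 response (0-6) to 0-100 (linear mapping is assumed)."""
    if raw not in range(7):
        raise ValueError("Sk-16 responses are expected to lie between 0 and 6")
    return raw * 100 / 6


def skindex17_psychosocial_category(score):
    """Classify an Sk-17 psychosocial score using the cut-offs quoted in the text."""
    if score < 5:
        return "little"
    if score <= 9:
        return "moderate"
    return "high"


def skindex17_many_symptoms(score):
    """Dichotomize the Sk-17 symptom score: five or more means many symptoms."""
    return score >= 5


if __name__ == "__main__":
    print(dlqi_band([1, 2, 0, 1, 0, 0, 1, 0, 0, 1]))
    print(skindex16_item_score(3))
    print(skindex17_psychosocial_category(7), skindex17_many_symptoms(4))
```

Keeping each instrument's rule in its own function mirrors the point made in the text: the three instruments differ in structure and scoring, so their scores are not directly interchangeable.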
By comparing the available instruments, it is possible to better understand the sensitivity of HRQOL assessment tools. This study aimed to compare the psychometric performance of the DLQI-BRA, Sk-16, and Sk-17 in a Brazilian population.

Methods
A methodological study was conducted to compare psychometric performance between HRQOL instruments. The project was approved by the institutional ethics committee (no. 2,367,912), and all participants signed a consent form before inclusion.
Adult patients with dermatological diseases seen at the Hospital das Clínicas, Faculty of Medicine of Botucatu-Unesp, between December 2017 and October 2019 were included. During a medical consultation, demographic information and information about the dermatological disease were collected, and the patient was asked to complete Sk-16, the DLQI-BRA, and Sk-17 (in that order). The underlying dermatoses were classified into three groups by consensus among the dermatologist authors: evident psychosocial impact (e.g. vitiligo, melasma), predominantly symptomatic impact (e.g. venous ulcer, urticaria), or symptomatic and psychosocial impact (e.g. psoriasis, hidradenitis suppurativa).
The frequency of responses for each item of the instruments was evaluated, and the floor ("ground") effect was assessed, defined as more than 50% of responses falling on the minimum response option.
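As an illustration of this criterion, the sketch below flags items for which more than half of the answers fall on the minimum option. The function name, the data layout (a dictionary of item labels to response lists), and the example values are hypothetical.

```python
def floor_effect_items(responses, min_option=0, threshold=0.5):
    """Return {item: share of minimum responses} for items above the threshold.

    responses maps an item label to the list of answers given to that item.
    """
    flagged = {}
    for item, answers in responses.items():
        if not answers:
            continue  # skip items with no recorded answers
        share = answers.count(min_option) / len(answers)
        if share > threshold:  # strictly more than 50%, as in the text
            flagged[item] = share
    return flagged


if __name__ == "__main__":
    example = {"q1": [0, 0, 0, 1], "q2": [0, 1, 2, 3], "q3": [0, 0, 1, 1]}
    print(floor_effect_items(example))
```

An item with exactly 50% of answers on the minimum option is not flagged here, because the stated criterion is a frequency greater than 50%.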
The internal consistency of each dimension of the instruments (Sk-17, Sk-16, DLQI-BRA) was assessed by Cronbach's alpha coefficient with a 95% confidence interval (CI), whose lower bound should exceed 0.8 in constructs of adequate consistency. (20) The correlations between the questionnaire scores were assessed using Spearman's rank correlation coefficients, which should be greater than 0.7 (strong correlation). (21) The test-retest reliability was assessed in a subgroup of 21 subjects with no clinical alteration of their dermatoses, with an interval of 7-30 days between the interviews. Reproducibility was analyzed by the intraclass correlation coefficient (ICC) for single measures (complete agreement) and considered adequate if greater than 0.8. (22) For the responsiveness analysis, the response to treatment was evaluated with the Wilcoxon test in a subgroup of 21 patients who presented clinical alteration of their dermatosis; responsiveness was considered adequate if p < 0.05.
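For readers less familiar with the internal consistency index used here, a minimal sketch of Cronbach's alpha follows. It uses the standard formula, alpha = k/(k-1) × (1 − Σ item variances / variance of the total score), and only the Python standard library. The function name and the example data are hypothetical, and the 95% CI used in the study is not computed here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed score across respondents.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    answers = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
    print(round(cronbach_alpha(answers), 3))
```

In this toy example the three items move in lockstep, so alpha reaches its maximum; real questionnaire data would yield lower values.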
Data to assess temporal stability and responsiveness were collected through convenience sampling among patients with brief outpatient returns (within 7-30 days). The presence or absence of clinical alteration was assessed by the attending dermatologist and confirmed with the patient. All evaluations were consistent between the patients and attending physicians.
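The test-retest index can be sketched as follows, assuming that the ICC described in the Methods corresponds to the two-way, absolute-agreement, single-measures form; this reading of the text is an assumption, and the function name and data are hypothetical. Each row holds one patient's scores at the first and second interview.

```python
def icc_absolute_single(data):
    """Two-way, absolute-agreement, single-measures ICC.

    data is a list of subjects, each a list of k repeated scores.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Sums of squares for subjects, occasions, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    test_retest = [[4, 5], [10, 9], [2, 2], [15, 16], [7, 7]]
    print(round(icc_absolute_single(test_retest), 3))
```

Under absolute agreement, a systematic shift between the two interviews lowers the index even when patients keep the same rank order, which suits a stability check in patients without clinical change.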
The instruments were evaluated for global informativeness (information function) and the characteristics of the items (a: discrimination; b: difficulty) according to the multidimensional IRT. (23) For model fitting, we used the model specification identified by the Akaike information criterion (AIC) and Bayesian information criterion (BIC). The fit of the items to the model was assessed using the S-X2 statistic and considered appropriate if p ≥ 0.01.
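To make the roles of the a and b parameters concrete, the sketch below computes response-category probabilities under a graded response model. This model is an assumption chosen for illustration: it is a common choice for ordinal items, but the text does not state which model the AIC and BIC selected. The function name and parameter values are hypothetical.

```python
import math


def grm_category_probabilities(theta, a, thresholds):
    """Category probabilities for one ordinal item under a graded response model.

    theta      -- latent trait level (here, degree of HRQOL impact)
    a          -- discrimination (slope) of the item
    thresholds -- ordered difficulty parameters b_1 < ... < b_m
    Returns m + 1 probabilities, one per response category.
    """
    def p_at_or_above(b):
        # Probability of answering in the category above threshold b, or higher.
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    cumulative = [1.0] + [p_at_or_above(b) for b in thresholds] + [0.0]
    return [cumulative[i] - cumulative[i + 1] for i in range(len(thresholds) + 1)]


if __name__ == "__main__":
    # An item with low thresholds responds already at mild impact levels.
    for theta in (-1.0, 0.0, 1.0):
        probs = grm_category_probabilities(theta, a=1.5, thresholds=[-1.0, 0.0, 1.2])
        print(theta, [round(p, 3) for p in probs])
```

In this parameterization, a larger a separates respondents just below and just above a threshold more sharply, and lower b values mean that the item starts to register at milder levels of impact.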
Assuming the need for up to 10 participants per item in each instrument, a minimum of 170 participants was estimated for the comparison of the instruments. (24,25) The data were tabulated in Microsoft Excel and analyzed using IBM SPSS 25.0 software, JASP 0.14, and R (mirt package). (23)

Results
In total, 229 participants were interviewed, resulting in 271 complete questionnaires (including test-retest and responsiveness). The clinical demographic characteristics are listed in Table 1. The patients were predominantly married, female, and of adult age, with intermediate phototypes. The dermatoses included in the study are presented in Table 2. The analysis of internal consistency for the three instruments resulted in Cronbach's alpha coefficients ≥ 0.80 (Table 3).
There were no major difficulties in interpreting and responding to the items; however, due to the coincident items in the Sk-16 and Sk-17 questionnaires, some patients queried the need to answer the same question again. Furthermore, the response options are different (Sk-16: Likert scale with options from 0 to 6; Sk-17: response options of never, rarely, sometimes, often, and all the time), and some participants reported more ease in responding to the Sk-17 options. This difference was observed by the authors during the research but not measured.
The frequency of each response for each item of the instruments was studied to detect floor effects. Floor effects were found in Sk-17 (12 items), the DLQI-BRA (eight items), and Sk-16 (four items).
The results indicated adequate test-retest reproducibility (Table 5) for all instruments (ICC > 0.80). Responsiveness analysis was also satisfactory for all dimensions (p < 0.05). The items with the greatest discrimination were q3 (shopping/domestic activities) and q5 (social/leisure activities) in the DLQI; F2 (desire to be with people) and E6 (annoyance) in Sk-16; and S4 (irritated skin), P5 (closeness with loved ones), and P6 (doing things alone) in Sk-17.
Notably, in the DLQI-BRA, q1 (symptoms) and q2 (embarrassment) were the items that identified the mildest impacts on HRQOL, while q6 (sport), q8 (partner/family member problems), q9 (sexual life), and q10 (treatment) were only sensitive to situations with more evident impacts. The Sk-16 and Sk-17 instruments more thoroughly assessed the different levels of impact on HRQOL. In Sk-16, items E1 (persistence/recurrence) and E2 (worry about skin condition) identified earlier impacts on HRQOL; item F3 (difficulty showing affection) was sensitive to more impactful situations. In Sk-17, items S4 (irritated skin) and P8 (embarrassment) indicated early impacts on HRQOL, and items P5 (closeness with loved ones) and P12 (sex life) delineated more substantial impacts on HRQOL.
[...] on HRQOL is common in dermatological series and requires instruments with sufficient discrimination in this impact range. (4,11,27-33) The three instruments studied demonstrated adequate feasibility, internal consistency, temporal stability, and sensitivity to change. The Sk-17 symptoms subscale exhibited lower psychometric indicators. Notably, the article describing Sk-17 (18) suggests that this domain be interpreted through its dichotomized score (greater than or equal to five versus less than five). Thus, the non-continuous interpretation of this dimension may have decreased its accuracy. Over time, the emotions dimension of Sk-16 and psychosocial dimension of Sk-17 exhibited greater internal consistency than the DLQI-BRA, resulting in better discriminating power.
The DLQI is the most widely used HRQOL assessment tool in dermatology worldwide, and it correlated strongly with the Sk-17 domains (rho ≥ 0.75). Its correlations with the Sk-16 domains were weaker (rho ≥ 0.59).
While analyses of the floor effect of instruments are not always described in validation studies, this effect can influence the performance of the questionnaires. There was a significant number of items with a floor effect in the Sk-17 and DLQI questionnaires. The inclusion of dermatoses with mild to moderate HRQOL involvement favors the selection of options with lower scores, especially when the questionnaires use fewer than seven options per item. Notably, in the DLQI, two options (not relevant and not at all) that result in a score of zero can have markedly different practical implications. (34,35)
There is a tendency to use numerical scales (0-10) in the construction of psychometric instruments. This affects internal consistency and dilutes the floor effect. However, the preference reported by some patients for answering questions using the verbal response options in Sk-17 rather than the numerical Likert scale in Sk-16 indicates greater ease in considering concrete and non-numeric concepts when assessing quality of life.
The IRT analysis allowed a more detailed evaluation of the behavior of the items and instruments independently. In general, the items exhibited high discriminatory capacity; however, Sk-16 and Sk-17 presented items that more adequately represented mild impacts on HRQOL (b <-1.00), while the DLQI presented items that resulted in b >-0.50, which can compromise sensitivity at lower scores. In addition, multidimensional instruments favor separate analyses of the disease's impact dimensions, allowing a detailed assessment of the type of impairment inflicted. (36)
This study has limitations related to the sample of patients from a single Brazilian center that is part of a public institution. These limitations, however, did not compromise the sample's representativeness regarding gender, age, education, income, and impact on HRQOL. Generic HRQOL instruments are fundamentally influenced by the type of disease and may not be suitable for comparing different disease groups (14,37-39); however, this study used a parallel comparison of the diseases for the three instruments simultaneously.
Finally, this study contrasted the popularity, agility, clarity, and availability of the DLQI-BRA with the multidimensional analysis and greater discriminatory sensitivity of Sk-16 and Sk-17 at low levels of impact on HRQOL. Still, the performance of the multidimensional instruments, Sk-16 and Sk-17, was similar. The Sk-17 performance correlated more strongly with the established DLQI-BRA. In addition, the greater ease in understanding the response options, as reported by the participants, and the well-established categorization of the Sk-17 scores are favorable points for its use.
These results provide researchers with support for the selection of the instrument in clinical trials and its interpretation, as well as for the development of new questionnaires.

Conclusions
Consent for publication: Not applicable.