Evaluation of Skindex-16 construct validity in routinely collected psoriasis data: a retrospective analysis of the relationship between overall physician global assessment scores and Skindex-16 and measure discordance

Patient-reported outcome measures (PROMs) capture disease severity metrics from the patient’s perspective, including health-related quality of life (HRQL). Disease-specific validation of PROMs improves their clinical utility. We evaluated construct validity (HRQL) for Skindex-16 in routinely seen psoriasis patients and characterized instances of discordance between Skindex-16 scores and a clinician-reported outcome measure of disease severity. We retrospectively studied psoriasis patients seen by University of Utah Dermatology from 2016 to 2020. Cross-sectional construct validity was assessed using quantile regression and Spearman correlation between overall physician global assessment (OPGA) score and Skindex-16 scores. Longitudinal within-subject correlation was assessed using linear mixed models. Discordance (10th percentile or lower OPGA and 90th percentile or higher Skindex-16 score [clear skin, poor HRQL; cspHRQL] or the reverse [severe skin, good HRQL; ssgHRQL]) was characterized descriptively. A total of 681 first-visit patients with psoriasis were included. Median overall Skindex-16 score varied by ≥ 10 points across all levels of OPGA scores. OPGA and Skindex-16 domain scores were moderately correlated (emotions ρ = 0.54, functioning ρ = 0.47, and symptoms ρ = 0.53). Longitudinal correlations were similar (emotions ρxy = 0.54, functioning ρxy = 0.65, symptoms ρxy = 0.47). Visits with cspHRQL discordance occurred for each Skindex-16 domain (emotions = 7, functioning = 13, symptoms = 12). The ssgHRQL group was observed within the emotions (n = 1) and functioning (n = 23) domains. Median Skindex-16 scores differed across OPGA levels and showed moderate cross-sectional and longitudinal correlation with OPGA. This supports construct validity in patients with psoriasis. Severe discordance was rare and occurred most often in patients with clear skin but poor HRQL. These discordances can prompt further patient–clinician conversation.


Z. H. Hopkins and G. Kuceki have contributed equally to this manuscript.

Introduction
Patient-reported outcome measures (PROMs) can be used by clinicians to quantify a patient's perceived health-related quality of life (HRQL) impact from disease [1]. Measures from the patient's perspective are important because multiple patient, clinician, and social dynamics can impede clinicians' understanding of how diseases are impacting their patients [2][3][4][5]. Using a well-validated PROM can help clinicians assess HRQL in a standardized manner and track changes longitudinally, improving therapy counseling and shared decision making [5][6][7][8]. Despite advantages and increasing adoption, many clinicians remain unacquainted with PROMs or skeptical of implementing PROMs for their patients in routine practice [9].
To be useful, PROMs must be well-validated, have favorable measurement properties across a broad spectrum of patients and disease severities, and offer interpretable and actionable data [9][10][11][12]. Construct validity, or how well a measure captures the construct it is designed to evaluate (here, HRQL), is vital to this goal. While the Dermatology Life Quality Index (DLQI) and the adapted DLQI-Relevant (DLQI-R) instruments have been evaluated in psoriasis [13][14][15][16], construct validity testing for Skindex-16 remains sparse in psoriasis. Skindex-16 has recently shown good construct validity in hidradenitis suppurativa [17] and acne [18], and it compared favorably with DLQI in recent direct comparisons [16,19]. We have also found that within our institution, physicians find the Skindex-16 scales more interpretable (internal memo). Thus, we sought to further elucidate the construct validity of the Skindex-16 in patients with psoriasis routinely seen in our clinics.
Often, a moderate level of correlation between clinician-reported outcomes (CROs) and PROMs is desired [20]. Because PROMs and CROs measure the impact of disease through related but different constructs (HRQL versus clinical disease severity), we hope for correlation that is high enough to confirm the relatedness of the constructs being measured, but low enough to demonstrate that different data are being generated. Conceptually, score discordance can drive decreased correlation. However, discordance may be clinically useful, representing different perspectives on how a patient experiences disease.
The purpose of this study is twofold. First, we test hypothesis-driven construct validity of Skindex-16 by investigating known-groups validity and correlation between Skindex-16 (a PROM) and the overall physician global assessment (OPGA), a CRO. Second, we quantify how often highly discrepant PROM and CRO scores occur during the same visit and describe factors associated with discordance.

Methods
We collected data on all visits from patients ≥ 18 years old who were seen for psoriasis at University of Utah Dermatology (UUD) from 2016 to 2020. Patients needed a documented OPGA and Skindex-16 assessment for analysis. This study was approved by the University of Utah Institutional Review Board (#76927).

Instruments
Many psoriasis measures exist, but most are largely confined to use in clinical trials. Body surface area is the only commonly collected measure during clinical encounters [21]. At UUD, OPGA is used in routine clinical practice because it is a simple, quick, and easily calculated measure that concisely summarizes the plaque provider global assessment (PPGA), body surface area, and body location into a simple ordinal severity scale. Validation of this measure has been published previously [11]. OPGA ranges from 0 to 5, with 0 indicating clear and 5 indicating very severe disease activity [11]. Skindex-16 is a validated PROM that evaluates the impact of skin conditions on a patient's HRQL in three specific domains: emotions, functioning, and symptoms [22]. Skindex-16 employs a 7-day recall period with Likert-scale responses ranging from "never bothered" to "always bothered." Item scores are averaged, summed, and converted to a linear 0-100 scale. Higher scores indicate a greater HRQL impact [22].
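As a rough illustration of the 0-100 conversion described above, the sketch below rescales item responses and averages them within each domain. It assumes that each item is coded 0 ("never bothered") to 6 ("always bothered") and that items 1-4, 5-11, and 12-16 make up the symptoms, emotions, and functioning domains. The coding, the item grouping, and all function and variable names are assumptions made for illustration; they are not taken from this study or from the instrument's scoring manual.

```python
from statistics import mean

# Assumed item grouping (1-based item numbers); hypothetical, for illustration only.
DOMAINS = {
    "symptoms": range(1, 5),
    "emotions": range(5, 12),
    "functioning": range(12, 17),
}
MAX_ITEM = 6  # assumed top response code ("always bothered")


def skindex16_domain_scores(responses):
    """Map 16 item responses (each coded 0..6) to 0-100 domain scores."""
    if len(responses) != 16:
        raise ValueError("expected 16 item responses")
    if any(not 0 <= r <= MAX_ITEM for r in responses):
        raise ValueError("item responses must lie in 0..6")
    scores = {}
    for name, items in DOMAINS.items():
        # Rescale each item to 0-100, then average within the domain.
        rescaled = [responses[i - 1] * 100 / MAX_ITEM for i in items]
        scores[name] = mean(rescaled)
    return scores
```

Under these assumptions, a patient answering "never bothered" to every item scores 0 in each domain, and one answering "always bothered" to every item scores 100.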

Cross-sectional analysis
The relationships between OPGA and Skindex-16 were evaluated in two stages, cross-sectionally and longitudinally. For cross-sectional analysis, only data from a patient's first visit were included, and in the longitudinal analyses, we used all time points. Skindex-16 and OPGA scores were described using median, range, and interquartile range. Medians, 10th, and 90th percentile scores were calculated using quantile regression.
Known-group comparisons were made by comparing Skindex-16 domain score distributions across the levels of OPGA. We hypothesized that median scores between each level of OPGA should be significantly different (p < 0.05) and show at least a 10-point difference across categories (10% of instrument range) [23]. Differences in Skindex-16 between sexes were assessed using quantile regression. A model assessing differences in Skindex-16 scores by sex after adjusting for disease severity (OPGA score) was assessed next. Finally, an interaction term between sex and OPGA score was added to investigate whether the relationship between Skindex-16 score and disease severity varied by sex.
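The 10-point criterion in the known-groups hypothesis can be checked descriptively with a few lines of Python. This is only a sketch: it computes raw group medians and their differences, whereas the analysis described above used quantile regression in Stata and also tested significance, which this example does not do. The function names and the sample data in the usage note are hypothetical.

```python
from collections import defaultdict
from statistics import median

MIN_DIFF = 10  # hypothesized minimum difference (10% of the 0-100 range)


def medians_by_level(opga, skindex):
    """Median Skindex-16 score within each OPGA level."""
    groups = defaultdict(list)
    for level, score in zip(opga, skindex):
        groups[level].append(score)
    return {level: median(scores) for level, scores in sorted(groups.items())}


def adjacent_differences(medians):
    """Difference in medians between consecutive observed OPGA levels."""
    levels = sorted(medians)
    return {(lo, hi): medians[hi] - medians[lo]
            for lo, hi in zip(levels, levels[1:])}
```

For example, with made-up scores `medians_by_level([0, 0, 0, 1, 1, 1], [0, 5, 10, 20, 25, 30])` gives medians of 5 and 25, and `adjacent_differences` then reports a 20-point gap between OPGA 0 and 1, which would meet the `MIN_DIFF` criterion.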
Cross-sectional correlation was assessed with Spearman correlations. Correlations were also calculated by patient sex with differences compared numerically and using the "cortesti" Stata command. Because increasing OPGA and Skindex-16 both represent worsening disease severity and HRQL states, respectively, a positive correlation coefficient was expected. We defined a moderate to moderately high (0.40-0.80) correlation as desirable for convergent validity [20]. As OPGA measures observable disease severity and Skindex-16 measures multiple skin-related HRQL domains, we expect that these measures should not perfectly align.
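For readers less familiar with the Spearman coefficient, the sketch below computes it as the Pearson correlation of average ranks, which handles the many ties expected with an ordinal measure such as OPGA. It is a standard-library illustration only, not the Stata routine used in the study, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)
```

A coefficient between 0.40 and 0.80 from `spearman_rho(opga_scores, skindex_scores)` would fall in the range defined above as desirable for convergent validity.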

Longitudinal analysis
Skindex-16 scores for each visit were described using median, range, and interquartile range. Domain-level floor and ceiling effects were defined as observing a score of 0 (all items answered "Never Bothered") or a score of 100 (all items answered "Always Bothered") [24]. Acceptable ceiling/floor effects in this setting were considered to be ≤ 15% of reported visits [24,25]. High ceiling or floor effects suggest a measure has difficulty in distinguishing patients with severe or mild disease, respectively.
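The floor and ceiling definitions above amount to a simple proportion check. The following sketch, with hypothetical names, computes the share of visits at 0 and at 100 for one domain and compares each share with the 15% threshold.

```python
def floor_ceiling_effects(scores, floor=0, ceiling=100, threshold=0.15):
    """Share of visits at the floor and ceiling, with a flag for each."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(s == floor for s in scores) / n
    ceiling_share = sum(s == ceiling for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        # An effect is treated as acceptable when it is at or below the threshold.
        "floor_acceptable": floor_share <= threshold,
        "ceiling_acceptable": ceiling_share <= threshold,
    }
```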
Correlation over time (ρxy) was calculated using linear mixed models [26,27]. This technique can account for the correlation in measures over time within the same subject. In short, a multilevel linear mixed model is fit to account for within-subject effects, and the correlation between the two measurement variables is derived from the variance-covariance matrix under the assumption of a compound symmetric structure [26,27].
Since we could not assume that a patient's clinical status remained stable and varying amounts of time occurred between visits for each patient, reliability in a traditional psychometric sense could not be calculated. However, we hypothesized that after adjusting for disease severity, some degree of reliability or relatedness in Skindex-16 scores over time should still be apparent. We calculated a conditional intraclass correlation coefficient (cICC) from the linear mixed model and patient identifier was used as the class distinguisher. ICC values close to 1 suggest high similarity between values in the same group (patient) and low values suggest repeated Skindex-16 values are not related to person-level effects. We hypothesized that repeated Skindex-16 measures should be moderately related to patient-level characteristics and effects, especially after adjusting for clinical severity.
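For reference, one common way to write a conditional ICC is shown below. It assumes a random-intercept model with OPGA entered as a fixed effect; the exact model specification is not given in the text, so the notation here is an assumption rather than the authors' stated model. In this notation, y_ij is the Skindex-16 domain score for patient i at visit j, u_i is the patient-level random intercept, and ε_ij is the residual.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{OPGA}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma_u^2),\quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2),\\
\mathrm{cICC} &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}.
\end{aligned}
```

Under this formulation, the cICC is the share of the variance remaining after adjustment for OPGA that is attributable to stable between-patient differences.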

Discordant visit analysis
Discordant cases were defined a priori as encounters where one score was ≤ 10th percentile and the other was ≥ 90th percentile. These were labeled as either clear skin (OPGA ≤ 10th percentile) and poor HRQL (Skindex-16 ≥ 90th percentile) [cspHRQL], or severe skin (OPGA ≥ 90th percentile) and good HRQL (Skindex-16 ≤ 10th percentile) [ssgHRQL]. Frequencies of discordance were described, and associations between discordant and non-discordant visits were assessed for patient sex, comorbid skin conditions, depression, and rheumatologic comorbidities using Fisher's exact test.
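The discordance definition above can be expressed as a small classification rule. The sketch below is illustrative only: the names are hypothetical, and the percentile convention (Python's "inclusive" method) is an assumption, since statistical packages differ in how percentiles are interpolated and the convention used in the study is not stated.

```python
from statistics import quantiles


def decile_cutoffs(values):
    """Return the (10th, 90th) percentile cut points of the observed values."""
    cuts = quantiles(values, n=10, method="inclusive")
    return cuts[0], cuts[-1]


def classify_visit(opga, skindex, opga_cuts, skindex_cuts):
    """Label one visit as 'cspHRQL', 'ssgHRQL', or 'concordant'."""
    opga_low, opga_high = opga_cuts
    skin_low, skin_high = skindex_cuts
    if opga <= opga_low and skindex >= skin_high:
        return "cspHRQL"  # clear skin, poor HRQL
    if opga >= opga_high and skindex <= skin_low:
        return "ssgHRQL"  # severe skin, good HRQL
    return "concordant"
```

In use, the cut points would be computed once per measure (and per Skindex-16 domain) from all visits, and `classify_visit` would then be applied to each visit's pair of scores.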
For all comparisons, two-tailed p values were used. All calculations were computed using Stata v14.2 (StataCorp, College Station, TX, USA).
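To make the two-tailed Fisher's exact comparison concrete, the sketch below computes a two-sided p value for a 2x2 table (for example, discordant versus non-discordant visits by sex). It uses one common definition, summing the hypergeometric probabilities of all tables no more likely than the observed one. It is a standard-library illustration with a hypothetical function name, not the Stata implementation used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = prob(a)
    # Sum the probabilities of all tables at most as likely as the observed one.
    # The small relative tolerance guards against floating-point near-ties.
    p = sum(prob(k) for k in range(k_min, k_max + 1)
            if prob(k) <= observed * (1 + 1e-7))
    return min(1.0, p)
```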

Results

Cross-sectional analysis
Median Skindex-16 scores by categorical OPGA score for first visits are shown in Table 2 and their associated distributions in Fig. 1. Pairwise comparisons of Skindex-16 score medians between progressive levels of OPGA (0 vs 1, 1 vs 2, etc.) were significant for all domains (p < 0.05) except OPGA 3 vs 4 in the emotions domain (p = 0.07).
Longitudinal correlation was moderate for all domains and was similar to cross-sectional correlation (Table 2). cICCs were also calculated and were moderate, ranging from 0.47 (symptoms) to 0.65 (functioning), and increased after conditioning on OPGA score (range 0.47-0.65, Table 2).

Discordance
The number and characteristics of visits with a cspHRQL discrepancy are shown in Table 3. This discrepancy was seen in 7/1028 visits for the emotions, 13/1031 for functioning,

Discussion
In this study, we evaluated three aspects of construct validity: known-groups comparisons, convergent validity, and floor/ceiling effects. We observed statistically and clinically meaningful differences in Skindex-16 domain scores across OPGA severity levels. Likewise, we observed the hypothesized positive, moderate correlation between all domains of Skindex-16 (a PROM) and OPGA (a CRO), both cross-sectionally and over time. These data support the construct validity of the Skindex-16 in routinely evaluated psoriasis patients. While women did experience more severe impact on HRQL in this cohort, the relationship between OPGA and Skindex-16 domains was similar between sexes. This is similar to recent findings in hidradenitis suppurativa [17].
Reliability, or the consistency of multiple test administrations in the same patient without clinical change, could not be assessed because Skindex-16 was tracked over time in a clinical setting. However, we hypothesized that given a reliable test, patient-level characteristics such as their own understanding of the questions, baseline HRQL levels, and pattern of answering should lead to at least a moderate degree of within-person correlation. Our findings suggest that person-level effects contributed importantly to the score variability over time, even after adjusting for changes in disease severity. This underscores the importance of tracking individual patient HRQL data and having personalized discussions [28]. Each patient's characteristics are incorporated into their scores, and thus, deviations from their own trends are more likely to be meaningful than established cut points of severity. Lastly, ICCs offer insight into differences in domain measurement. For example, Skindex-16 functioning and emotions domains showed less within-person variability over time than the symptoms domain. Emotional and functioning-based HRQL may be more person-specific and may respond differently to changes in disease status than symptoms. For example, prior work has demonstrated that emotional effects and impact on social interactions can persist even after treatment [29][30][31]. Similar to prior studies, domain-level floor and ceiling effects were generally rare, which may be an important advantage of Skindex-16 over other general dermatology measures such as the commonly used Dermatology Life Quality Index (DLQI) [32]. Though floor effects > 20% were seen for the Skindex-16 functioning domain, most of these scores were in cleared disease (OPGA score 0) or very mild disease (OPGA score 1).
Importantly, OPGA scores of 0 and 4 were much more common in the cross-sectional cohort than Skindex-16 scores of 0 or 100 (n = 279, 27.1% and n = 60, 5.82%, respectively), illustrating how PROMs can add information beyond CROs, regardless of disease severity [9].
In this cohort, only about 1% of patients fell into discordant groups where PROM and CRO scores were highly mismatched. Most patients had only one discordant event. Together, our findings of good construct validity, apparently reliable detection of person-level impacts on HRQL scoring, and rare discordance suggest that these discordant events may be clinically important and not just artifacts of HRQL instrument error. cspHRQL may suggest a sudden worsening in HRQL without clinically observable disease change or a new skin condition that is now affecting patient HRQL. For example, Skindex-16 scores were elevated for one patient's first two visits, while OPGA was 0. At the third visit, Skindex-16 remained elevated and OPGA jumped to 3, potentially suggesting HRQL effects that occurred prior to a clinically observable psoriasis flare. However, discordances may also represent more complex phenomena, including HRQL impacts such as anxiety about recurrence and side effects of medication, which may persist despite good disease management and have more complicated time trends [29][30][31][33].
If collected prior to the visit, discordant Skindex-16 scores may give context to the visit and trigger investigation for new skin findings, non-obvious symptoms, or impact from other comorbid or new disease. Pruritus, for example, is often a stronger determinant of depression than other visible processes in patients with psoriasis [34]. Another non-obvious issue that could affect functioning is sleep disturbance, which is common with psoriasis [35]. In this study, we found that discordance was more common in patients with other comorbid skin conditions or with multiple diagnoses in addition to psoriasis listed at the clinical visit. Thus, exploring discordance could result in improved understanding of the patient experience, aid shared decision making, and potentially alter management to better meet the global needs of the patient [36][37][38][39].

Limitations
This study has important limitations. First, our data came from a single academic institution and PROM scores are known to vary by socioeconomic status or other demographic characteristics [4,6,40]. Also, data on treatment regimen were unavailable for this analysis. Treatment regimens could potentially add variability to correlations and especially could alter longitudinal relationships. Lastly, convergent and known-groups comparisons were limited to those defined by the OPGA. Although the OPGA was only recently validated, it is a composite of two simple, validated measures, is easy to interpret, and is routinely collected [11]. The psoriasis area and severity index (PASI), though considered a gold-standard CRO, is not routinely collected in the clinical setting [12]. Simple measures such as investigator global scores or body surface area are more readily used in clinic, and since a composite of these (the OPGA) is routinely used in clinic, we sought to validate the Skindex-16 with a clinic-ready and easily calculated severity score that is routinely utilized in our institution. Although the OPGA itself is likely not widely used, the measures on which it is based are, and it has been shown to be highly related to them. Future studies evaluating Skindex-16 characteristics in other populations and with other outcome measurements will be important in further understanding how to best implement this measure.

Conclusion
These data support the HRQL construct validity of the Skindex-16 for psoriasis patients routinely seen in clinic. Skindex-16 increased with each increase in clinical severity, and we observed a moderate correlation with OPGA, a composite of commonly used measures such as investigator global assessment, body surface area, and plaque provider global assessment. Domain-level ceiling and floor effects were low, suggesting good discrimination across a broad range of disease severity and the opportunity to provide more granular insight into disease severity. Severe discrepancies were rare but may provide valuable clinical opportunities and should prompt clinician-patient conversations.