Comparative analysis of pediatric anxiety measures in clinical sample: evaluation of the PROMIS pediatric anxiety short forms

Examine the psychometric properties, validity in relation to a legacy measure, and diagnostic accuracy of the PROMIS Anxiety Short Form 2.0 (PROMIS A-SF) Caregiver and Youth Reports in a clinical sample. Participants were 301 youth and caregivers referred to a behavioral health clinic by their pediatrician. Participants and their caregivers completed the PROMIS A-SF (youth and caregiver proxy), the SCARED (youth and caregiver proxy), and a semi-structured interview. Descriptive, correlational, test–retest reliability, and receiver operating characteristic (ROC) analyses were conducted for both measures. PROMIS A-SF measures were highly correlated with SCARED total scores and the panic subscale. PROMIS A-SF measures had AUCs ranging from .49-.79 for the detection of any of three primary subtypes of anxiety: Generalized Anxiety, Separation Anxiety, and Social Anxiety. Implications: Dimensional anxiety subtypes, such as Social Anxiety, may not be well detected on the PROMIS youth measure. Use of the PROMIS A-SF as part of an Evidence Based Assessment process is discussed.

relative to (a) the specific construct(s) it is supposed to be measuring; (b) the application of the measure (e.g., diagnosis, screening, se etc.); (c) the population with which it is to be used (age, clinical sample etc.). To do this, the classic psychometric properties of the PROMIS-A-SF need to be examined, as has been done with the PROMIS measures for pediatric depression [17]. Presently, the test-retest reliability of the short form in a pediatric sample has been reported (r = 0.57-0.74) [15], but our literature review has not identified any studies that examine the concurrent validity of the short form by comparing it to existing pediatric anxiety scales. Second, the PROMIS-A-SF measure was developed as a unidimensional measure of anxiety that has not been assessed against specific anxiety disorders as defined by the DSM-5. As the authors of the PROMIS measures indicate, clinician-researchers are expected to further evaluate the strength of association between the measure and any disorder identified in clinical samples. For this reason, the unidimensional construct of anxiety as measured by the PROMIS anxiety scale needs to be compared to each of the identified types of anxiety commonly seen in clinics and considered disorders in the DSM-5. Third, future directions for research on PROMIS measures include finding ways to implement their use in diverse clinical settings and identifying clinical thresholds for these measures [18]. While the percentiles for scores derived from the PROMIS A-SF in a nationally representative sample have been reported by Carle and colleagues [19], no study has determined whether the PROMIS A-SF can be used to predict any of the common pediatric anxiety disorder diagnoses derived from gold-standard assessments of the presence or absence of disorder. Presently, little progress has been made in developing a research base to support the clinical applications of the PROMIS-A-SF scale.
For the PROMIS-A-SF to be clinically useful, its diagnostic accuracy needs to be assessed and meaningful cutoff scores need to be defined.

Study aims
Our primary study aim was to examine the concurrent validity of the PROMIS A-SF compared to the SCARED using an outpatient psychiatry sample. To accomplish this, the single-factor PROMIS A-SF was assessed against three major subtypes of anxiety, Separation Anxiety (SEP), Generalized Anxiety (GAD), and Social Anxiety (SOC) identified in the DSM-5 and the SCARED. Along with concurrent validity, we examined the test-retest reliability of the PROMIS A-SF to replicate rank order at two administrations and the agreement (i.e., the capacity of the PROMIS A-SF to produce identical results across two points) [20] of the test and the retest in our sample.
While examining the correlations between the PROMIS A-SF and the SCARED helps establish whether the two measures map onto the same construct, it is not sufficient to establish the utility of the PROMIS for determining whether a youth meets criteria for an anxiety disorder based on a structured diagnostic interview. However, the PROMIS A-SF may be a useful part of the EBA process to support statistical decision making. For that reason, the second aim of this study was to examine the diagnostic accuracy of the PROMIS A-SF in relation to an evidence-based semistructured interview using Receiver Operating Characteristic (ROC) analysis.

Method
The Institutional Review Board at Lurie Children's Hospital approved the study procedures as compliant with ethical standards. The retrospective review of clinical data was considered minimal risk, and a full HIPAA waiver of consent was granted.

Participants
Youth and caregivers in this sample were referred by their primary care pediatricians for an outpatient psychiatric evaluation. Pediatricians making these referrals were part of a program called Mood, Anxiety, ADHD Collaborative Care (MAACC) [21] designed to train and expedite mental health care access in small- to medium-sized community pediatric practices. There were 537 youth referred by their pediatrician for evaluation between June 2018 and October 2020. Of the referred families, 310 completed both a psychological evaluation and either the SCARED and/or PROMIS A-SF as part of routine clinical care. Nine youth were removed from the analysis because they were age 6 years at the time of the evaluation. Table 1 includes patient demographics for the sample (N = 301). Youth (ages 7-18 years, M = 12.93; SD = 3.02) and their caregivers completed measures in English or Spanish. Thirteen caregivers used the Spanish Language PROMIS and SCARED measures; all other youth measures were completed in English. Seven youth, aged 7, completed the youth PROMIS A-SF measure. Although the PROMIS A-SF is validated for ages 8-17, the research team felt that the data from these seven youth were appropriate for inclusion in the sample. This decision was made for two reasons: (a) the psychologist conducting the evaluation felt the language skills of those children were sufficient to allow them to complete the measures; and (b) seven-year-old children are often included in studies of treatment of anxiety [22], so including them in a study assessing the utility of the PROMIS A-SF in predicting a clinical anxiety diagnosis could be useful. A smaller sample of youth and caregivers (n = 52) who had completed measures 2-3 weeks prior to the diagnostic interview were asked to repeat the PROMIS A-SF measure on the date of the diagnostic interview.

PROMIS anxiety short forms 2.0 (PROMIS A-SF)
The PROMIS A-SF measures were created to assess anxiety for youth ages 8-17 and caregiver proxy from ages 5-17 [14,15]. Likert response total scores range from 8 to 40 (1 = "never" to 5 = "almost always"). Summed raw scores and associated T-Scores (M = 50, SD = 10) are provided on the Health Measures website (https://www.healthmeasures.net/search-view-measures?task=Search.search). As noted above, there are no empirically established clinical cut off scores for the PROMIS A-SF, although T-score severity levels of mild-moderate and moderate-severe have been described by Carle et al. [16] in a large sample. Internal consistency of the PROMIS A-SF (α = 0.84) and test-retest reliability at 11-17 days were reported as good (r = 0.75), although 40% of that sample included youth recruited from an asthma clinic [18].

Screen for child anxiety related emotional disorders (SCARED)
The SCARED is a 41-item questionnaire designed to assess a variety of anxiety symptoms occurring over the prior three months, with parallel caregiver- and child-report versions for youth [9]. The SCARED allows for calculation of a total anxiety score (Range = 0-82) and has a five-dimension structure, with subscale scores for Separation Anxiety, Generalized Anxiety, Social Anxiety, Panic/Somatic symptoms, and School Avoidance. Cut off scores for each dimension are used to indicate the presence of a disorder [12]. The SCARED has demonstrated discriminant validity between anxious and non-anxious youth, strong internal consistency (coefficient α of approximately 0.90), and favorable psychometrics in treatment-seeking samples [9,10,12,22,23].

Anxiety disorders interview schedule for DSM-IV: child and parent versions (ADIS-IV-C/P)
The ADIS-IV-C/P [24] are semi-structured diagnostic interviews used to assess psychopathology among youth ages 6-18, with a particular emphasis on anxiety disorders. Clinical Severity Ratings (CSR) ranging from 0 to 8 are assigned by the clinician for each diagnosis with the ADIS-IV-C/P, with a CSR of 4 or greater representing symptoms and distress/interference at a level that meets full diagnostic criteria. There is strong evidence supporting the reliability, validity, and sensitivity to clinical change for the ADIS-IV-C/P [25]. Test-retest reliability for anxiety disorder diagnoses for both parent and child reports is excellent (κ coefficients, 0.80 to 0.92) [25], and interrater agreement for anxiety disorders diagnosed with the ADIS-IV-C/P is strong (κ coefficients, 0.80 to 1.0) [26]. The ADIS-IV-C/P modules for Attention Deficit Hyperactivity Disorder, Oppositional Defiant Disorder, Separation Anxiety, Generalized Anxiety, Social Anxiety, Panic Disorder, and Depression were administered to all youth and caregivers.

Table 1 Sample characteristics
a Includes the following primary, secondary, tertiary, and quaternary diagnoses: GAD, separation anxiety, social anxiety, or panic disorder
b Includes the following primary, secondary, tertiary, and quaternary diagnoses: OCD
c Includes the following primary, secondary, tertiary, and quaternary diagnoses: depression (NOS), major depressive disorder, mood disorder (NOS), major depressive episode, mood disorder, disruptive mood dysregulation disorder
d Includes the following primary, secondary, tertiary, and quaternary diagnoses: ADHD-inattentive type, ADHD-hyperactive type, ADHD-combined type
e Includes the following primary, secondary, tertiary, and quaternary diagnoses: disruptive disorder, conduct disorder, oppositional defiant disorder, impulse control disorder
f Includes the following primary, secondary, tertiary, and quaternary diagnoses: PTSD, trauma, adjustment disorder, bereavement, other trauma

Procedure & data collection
Patients were referred to MAACC for further mental health evaluation by their pediatrician. Patients were initially screened by telephone for appropriateness of referral (e.g., age 7-18 years, no autism or developmental disorders, not recently engaged in higher levels of care, etc.). All patients and caregivers received the PROMIS A-SF and SCARED (among other clinical measures not included in this study) by mail or accessed online through a secure portal prior to their appointment. Directions were given to both the caregiver and youth to complete the measures prior to the diagnostic intake. The psychologist was occasionally presented with completed paper measures at intake if the patient did not complete measures digitally. This procedure represented the typical EBA process in busy outpatient clinics described by Van Meter and colleagues [27] and Ford-Paz et al. [28]. A licensed psychologist administered the ADIS-IV-C/P anxiety modules with the young person and at least one caregiver/guardian. Although inter-rater reliability was not tracked, the psychologist and program psychiatrist, both trained in ADIS-IV administration, discussed findings for all new diagnostic evaluations and agreed upon a consensus diagnosis.

Statistical analysis
All analyses were conducted using STATA 15.1 [29]. Descriptive statistics were completed for the full sample (N = 301). Test-retest reliability was completed with a subset (n = 52) of participants who repeated measures. There is disagreement about the preferred approach to capture test-retest reliability, so three analyses were performed: Pearson correlations, intraclass correlation coefficients (ICC), and Bland-Altman methods. Pearson correlations measure the rank order of individuals across two measurements [20]. While Pearson correlations do not assess agreement, they remain a common way of reporting test-retest reliability and can be used in calculating a reliable change index [30]. Two-way mixed effects ICCs were also calculated as a measure of both reliability and agreement between measures [31][32][33]. An ICC value ≥ 0.70 is considered good [33]. In contrast, Berchtold [20] notes that the Bland-Altman method is preferable to ICC because the ordering of the two measures in a test-retest situation may matter. The Bland-Altman approach involves determining the percent of pairs of test and retest scores for individuals that fall within the 95% confidence interval (95% CI) of the mean difference in test and retest scores. The Bland-Altman method also allows for comparing the 95% CIs with clinicians' interpretation of when differences in two ratings essentially agree versus reflecting what may be a clinically meaningful change. In the present study, the ICC is reported along with the agreement using the Bland-Altman method and how agreement on the PROMIS-A compares to clinicians' interpretation of meaningful score differences over time.
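The agreement step described above can be illustrated with a minimal Python sketch. It is not the STATA code used in the study. It assumes the conventional Bland-Altman limits of agreement (mean difference ± 1.96 × SD of the differences), and the function names, the example T-scores, and the 5-point threshold argument are hypothetical values chosen only for illustration.

```python
from statistics import mean, stdev


def bland_altman(test, retest, z=1.96):
    """Return the bias, limits of agreement, and percent of pairs inside them.

    Assumes the conventional limits: mean difference +/- z * SD of differences.
    """
    diffs = [r - t for t, r in zip(test, retest)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    lower, upper = bias - z * sd, bias + z * sd
    inside = sum(lower <= d <= upper for d in diffs)
    return {
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "pct_within": 100.0 * inside / len(diffs),
    }


def exceeds_clinical_threshold(limits, threshold):
    """True when the half-width of the limits is larger than the change
    clinicians would regard as clinically meaningful."""
    half_width = (limits["upper"] - limits["lower"]) / 2
    return half_width > threshold


# Hypothetical test and retest T-scores (not study data).
test_scores = [50, 55, 60, 45, 52]
retest_scores = [52, 53, 61, 47, 50]

result = bland_altman(test_scores, retest_scores)
print(result)
print(exceeds_clinical_threshold(result, threshold=5))
```

Comparing the half-width of the limits with a clinician-derived threshold mirrors the interpretive step described in the text: when the limits are wider than the change clinicians regard as meaningful, agreement is hard to interpret for an individual patient.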
Clinician-completed ADIS-IV structured interviews were administered to caregivers and youth and used to identify anxiety-related diagnoses of GAD, SEP, or SOC. The category of "any anxiety disorder" included youth meeting criteria for any of those three anxiety disorders. ROC analyses were performed to determine how well the PROMIS A-SF and SCARED measures could distinguish between the presence or absence of each disorder. Panic disorder was not included because there were too few cases (n = 2) to allow for analyses. The Area Under the Curve (AUC) summarizes the overall classification ability of a measure for each diagnosis. AUC values are typically evaluated on the following scale: 0.90-1.0 outstanding; 0.80-0.89 excellent; 0.70-0.79 acceptable; 0.60-0.69 limited discrimination; 0.50-0.59 no discrimination [35]. Sensitivity (SE), specificity (SP), and positive predictive values (PPV) were calculated for the PROMIS A-SF.
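As an illustration of the quantities named above, the following Python sketch computes SE, SP, and PPV at a single cutoff, plus the AUC as the probability that a randomly chosen youth with the diagnosis scores higher than one without it (ties counted as one half). It is a sketch under stated assumptions rather than the study's STATA analysis: the function names, the scores, the diagnoses, and the cutoff of 58 are hypothetical.

```python
def screening_metrics(scores, has_dx, cutoff):
    """Sensitivity, specificity, and PPV when scores >= cutoff screen positive.

    Assumes at least one case, one non-case, and one positive screen.
    """
    pairs = list(zip(scores, has_dx))
    tp = sum(1 for s, d in pairs if s >= cutoff and d)
    fp = sum(1 for s, d in pairs if s >= cutoff and not d)
    fn = sum(1 for s, d in pairs if s < cutoff and d)
    tn = sum(1 for s, d in pairs if s < cutoff and not d)
    return {
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "PPV": tp / (tp + fp),
    }


def auc(scores, has_dx):
    """AUC via pairwise comparison of cases and non-cases (ties count 0.5)."""
    cases = [s for s, d in zip(scores, has_dx) if d]
    non_cases = [s for s, d in zip(scores, has_dx) if not d]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in non_cases
    )
    return wins / (len(cases) * len(non_cases))


# Hypothetical T-scores and interview-based diagnoses (not study data).
t_scores = [62, 58, 70, 45, 50, 66, 48, 55]
diagnosed = [True, True, True, False, False, False, False, True]

print(screening_metrics(t_scores, diagnosed, cutoff=58))
print(auc(t_scores, diagnosed))
```

The pairwise form of the AUC is used here because it needs no external libraries; for a sample of several hundred youth it is fast enough, though a rank-based routine would scale better.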
The percent of missing data was low for SCARED and PROMIS A-SF measures. Youth and Caregiver measures for SCARED-Total, Panic, GAD, SEP, SOC, and School Avoidance subscales had less than 13% missing data. Missing data for the youth and caregiver measures of the PROMIS A-SF Total Scores was less than 6%. To determine if the data were Missing Completely at Random (MCAR), Little's test was performed by analyzing missing data from the youth and caregiver SCARED and PROMIS A-SF total scores. Little's test was not found to be significant (p = 0.57), and we concluded that our data were missing completely at random. As a result, the research team determined that imputation was not needed and pairwise deletion was used to manage missing data.

Descriptive statistics, test-retest reliability, and test-retest agreement
Key demographic descriptive statistics are reported in Table 1. The average age of patients was 12.93 years (SD = 3.02). Approximately 70.43% (n = 212) of the sample was White and 16.94% was Hispanic.
In paired sample t-tests, the youth and caregiver PROMIS A-SF total score means and the youth and caregiver SCARED total score means did not significantly differ by race, ethnicity, or insurance type. However, mean scores were significantly higher in females compared to males on the youth PROMIS A-SF (t = -4.69; p < 0.001), caregiver PROMIS A-SF (t = -2.16; p < 0.05), youth SCARED total score (t = -4.40; p < 0.001), and the caregiver SCARED total score (t = -2.49; p < 0.05).
Bland and Altman [36] note that, even when findings are statistically significant, the range in difference scores can be substantial and difficult to interpret. To help with interpretation of the Bland-Altman test results, we followed deVet et al. [33] in soliciting clinicians' impressions of what constituted a clinically meaningful difference in test-retest scores on an anxiety measure with the same range of scores as the PROMIS A-SF (i.e., T-Scores with Mean = 50 and SD = 10). Eleven child and adolescent psychologists were surveyed and asked to report a score that would represent a clinically meaningful change at 3 weeks from initial administration. The median cutoff score for the 11 psychologists was a 5-point difference on both a youth and caregiver-proxy measure of childhood anxiety. Given that the upper and lower bounds for both the youth and caregiver report (± 9.4 and ± 10.6, respectively) were considerably higher than the clinicians' cutoff of 5 for a possibly clinically meaningful change, the level of agreement for both the caregiver and youth reported PROMIS A-SF may be difficult to interpret, even if statistically significant for the ICC and Bland-Altman methods.

Concurrent validity
Table 2 includes the correlations between the PROMIS A-SF and the anxiety subscales of the SCARED. For the youth measures, the PROMIS A-SF correlated moderately highly with the SCARED total (r = 0.69; p < 0.001), GAD (r = 0.59; p < 0.001), and Panic scale (r = 0.66; p < 0.001). The youth PROMIS A-SF had a low correlation with the Youth SCARED SEP (r = 0.43; p < 0.001) and a negligible correlation with SCARED SOC (r = 0.27; p < 0.001). Differences in the magnitude of these correlations were examined using Fisher's z-transformation. The correlations between the PROMIS A-SF and both the SCARED SEP (z = 4.80, p < 0.001) and SCARED SOC (z = 4.80, p < 0.001) were significantly lower than the correlation between the PROMIS A-SF scale and the SCARED total score.
The caregiver proxy PROMIS A-SF measure was moderately highly correlated with the caregiver SCARED Total (r = 0.68; p < 0.001) and caregiver SCARED SEP (r = 0.64; p < 0.001), the SCARED GAD (r = 0.56; p < 0.001), and the SCARED Panic scale (r = 0.59; p < 0.001). The correlation of the caregiver proxy PROMIS A-SF and the SCARED SOC was negligible (r = 0.21; p < 0.001). The correlation between the PROMIS anxiety scale and SCARED social anxiety scale (z = 4.80, p < 0.001) was significantly lower than the correlation between the PROMIS anxiety scale and the SCARED total anxiety score. Table 3 presents a comparison of AUC values for Caregiver and Youth PROMIS A-SF Total score, SCARED Total score and SCARED subscales (SOC, GAD, SEP) in accurately diagnosing Separation Anxiety, Generalized anxiety, Social Anxiety, or any anxiety disorder.
Caregiver and Youth SCARED Total Score and SCARED Subscales AUC values ranged from acceptable (0.72) to outstanding (0.91).
SE, SP, PPVs, and associated clinical cut offs and T-scores are presented in Table 3 for the PROMIS and SCARED measures. For a screening measure, an ideal cutoff score generates an SE of ≥ 0.80 and an SP of ≥ 0.80 [37]. None of the PROMIS measures generated a score meeting those SE and SP values. In our sample, this standard was only approached for the caregiver PROMIS A-SF SEP (0.82; 0.72).

Table 3 Comparison of area under the curve and cutoff scores for the PROMIS and SCARED
*Any anxiety disorder includes separation anxiety disorder, GAD, social anxiety disorder, & panic disorder; total scores on the SCARED are compared to any anxiety disorder (SEP, GAD, social anxiety, & panic disorder); GAD, social anxiety disorder, and separation anxiety disorder are all compared to SCARED subscales that correspond to each diagnosis

Discussion
This study examined the concurrent validity of the youth and caregiver proxy PROMIS A-SF by comparing it with a well-established pediatric anxiety measure, the SCARED, and assessed the diagnostic accuracy of the PROMIS A-SF measures as predictors of the three major subtypes of anxiety disorders derived from an evidence-based semi-structured diagnostic interview. Assessing the concurrent validity of a new measure (e.g., PROMIS A-SF) in comparison to a well-established measure (e.g., SCARED), both ostensibly measuring the same construct, helps establish whether the underlying construct is indeed the same for both measures. Because the goal of developing the PROMIS measures was to produce caregiver- and youth-report anxiety scales with a single factor structure invariant across age, sex, and race/ethnicity, it was likely that the PROMIS A-SF would not do equally well in assessing the several different dimensions of anxiety assessed on the SCARED [9,10]. As a result, it is not surprising that significant differences were detected in the magnitude of the correlations between the PROMIS A-SF and the SCARED subscales for both youth- and caregiver-report. The results indicate that the caregiver PROMIS A-SF provides a better estimate of the underlying construct for total anxiety, panic, generalized anxiety, and separation anxiety as defined by the SCARED. The youth PROMIS A-SF demonstrated strong relationships with total anxiety, panic, and generalized anxiety as defined by the dimensions on the SCARED.
This pattern of results suggests that if a researcher or clinician is seeking a caregiver-reported anxiety scale measuring total anxiety, SEP, GAD, or panic scales, the PROMIS A-SF is a reasonable choice. The caregiver PROMIS A-SF would not be appropriate for SOC. If the researcher is seeking a youth report measuring total anxiety, GAD, and panic scales, the PROMIS A-SF is a reasonable choice. The youth PROMIS A-SF would be less appropriate for measuring SOC or SEP.
PROMIS A-SF measures may add value over other available anxiety measures in busy clinical settings. They are free, brief tools, developed with the goal of reducing racial and ethnic bias. Moreover, the PROMIS A-SF could be used in a battery with other brief PROMIS tools that assess a broad spectrum of health functioning. The PROMIS A-SF tools could also be used to improve diagnostic accuracy in clinical settings when incorporated into an Evidence Based Assessment (EBA) process. EBA techniques improve diagnostic decision making and align with precision mental health procedures [38]. In a precision mental health approach, the likelihood that an individual will meet criteria for a disorder is calculated and can be used for clinical decision-making.
By applying the present findings, the PROMIS measures may be relatively quickly incorporated into recommended EBA practices. For example, Van Meter et al. [27] described the use of nomograms to improve statistical prediction of diagnosis in clinic settings by using the identified prevalence of a disorder in a local sample and the Parent SCARED, with known SE and SP. The nomogram procedure could be completed with the PROMIS A-SF. Considering the present sample, the prevalence of GAD based on semi-structured interview was 36.88% and the calculated accuracy of the youth PROMIS A-SF in detection of GAD was SE = 69.16; SP = 68.16 (T-Score 58.7). An above cut off score would result in an increased probability that a youth would meet criteria for a diagnosis of GAD (56%), whereas a below cut off score would decrease the probability (21%) of a GAD diagnosis. In this example, a measure being above or below the cut off significantly improves prediction of a disorder's presence or absence over the starting point of population prevalence. While optimal (SE = 0.8, SP = 0.8) cut off scores were not achieved by any of the PROMIS A-SF in our sample, knowledge of local prevalence of a disorder of interest and SE and SP of a measure for a clinical condition can still improve the process of statistical prediction. Additionally, while structured interviews remain the gold standard for diagnostic decision-making, they are underutilized in clinic settings because of their length and/or administration time [39]; instruments like the PROMIS® measures might be useful in determining which modules of a structured interview should or should not be administered to shorten administration time of a structured interview [28].
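The arithmetic behind the GAD example above can be sketched in a few lines of Python. A nomogram is a graphical shortcut for the same calculation: convert prevalence to odds, multiply by the likelihood ratio implied by SE and SP, and convert back to a probability. The function name is hypothetical; the prevalence, SE, and SP are the values reported in the text, expressed as proportions.

```python
def posttest_probability(prevalence, sensitivity, specificity, above_cutoff):
    """Probability of the diagnosis after a screen result, via likelihood ratios."""
    pretest_odds = prevalence / (1 - prevalence)
    if above_cutoff:
        likelihood_ratio = sensitivity / (1 - specificity)   # LR+
    else:
        likelihood_ratio = (1 - sensitivity) / specificity   # LR-
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Values reported in the text for GAD and the youth PROMIS A-SF.
PREVALENCE = 0.3688
SE = 0.6916
SP = 0.6816

p_above = posttest_probability(PREVALENCE, SE, SP, above_cutoff=True)
p_below = posttest_probability(PREVALENCE, SE, SP, above_cutoff=False)
print(f"above cut off: {p_above:.0%}")
print(f"below cut off: {p_below:.0%}")
```

With these inputs the sketch returns approximately 56% for a score above the cut off and 21% for a score below it, matching the probabilities given in the text. A clinic could substitute its own local prevalence to see how the same screen result shifts the probability of diagnosis.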
The PROMIS A-SF measures the broad construct of total anxiety and may be valuable in EBA. Both the caregiver and youth PROMIS A-SF measure the presence of Any Anxiety Disorder reasonably well. The findings of the present study regarding the reliability and agreement of the PROMIS A-SF have implications for monitoring patient outcomes. Pearson correlations are customarily reported for measures of psychopathology and describe the level of rank-order correspondence at test and retest but not the degree to which scores agree at both measurement time points. Agreement between test and retest is assessed using ICC or Bland-Altman methods. There were statistically significant rank order correlations (i.e., Pearson, r = 0.75) and significant agreement using both the ICC and Bland-Altman methods in our sample. Further, clinicians surveyed as part of the Bland-Altman method identified change scores of ± 5 as clinically meaningful, while the ± 95% CI is much higher. Thus, clinicians may benefit from education on how to interpret what scores are clinically meaningful.
This reasonably high test-retest reliability is a strength of the PROMIS A-SF. Test-retest reliability is valuable for assessing reliable change in monitoring patient progress with an intervention. Thus, the PROMIS A-SF may be particularly useful as an initial screener for Any Anxiety Disorder symptoms and progress monitoring of anxiety symptoms during treatment, particularly if recommended cut offs to assess reliable change are provided. The PROMIS A-SF measure may be less useful in screening for and monitoring progress for specific anxiety disorders, such as social anxiety.

Strength and limitations
The strengths of this study include the novel investigation of the PROMIS A-SF compared to a gold standard semi-structured interview (ADIS-IV-C/P) in an outpatient clinical psychiatry sample. The sample of 301 youth and caregivers is large enough to support conclusions about the performance of the PROMIS A-SF, and missing data were limited (< 13%). This study is the first to describe the relationship between the youth and caregiver PROMIS A-SF report and a well-established anxiety measure and to report psychometric properties of the PROMIS A-SF in a clinical sample. Findings support considering the PROMIS A-SF for use in clinical practice.
The ADIS-IV-C/P anxiety modules were completed by a trained evaluator with the youth and caregiver. The evaluator was not blinded to SCARED or PROMIS measures but did not have scoring of these measures available prior to the interview. As has been discussed [27], while not blinding the evaluator may introduce bias, it also reflects real world clinical practice. Measures of inter-rater reliability for the ADIS-IV were not collected, but interviews were completed by a trained psychologist and reviewed by a trained psychiatrist. The ethnic and racial diversity of the sample closely aligns with regional demographic data, apart from underrepresenting youth and caregivers defining themselves as Black/African American. Thus, generalization from this sample should be undertaken cautiously.
Additional investigation of the PROMIS A-SF is needed in clinical samples. The PROMIS A-SF and PROMIS Depression-Short Forms were designed as unidimensional measures; however, the developers of these measures have suggested a bi-factor model with a shared latent general factor [41]. In a clinical sample, where comorbidity is common across disorders (anxiety and depression) and within each disorder (SEP, GAD, etc.), the unidimensional model may not hold up. Confirmatory Factor Analysis is needed to assess for a latent construct in a clinical sample, and further Exploratory Factor Analysis would be needed if the unidimensional structure were not confirmed.

Conclusion
The PROMIS A-SF are brief, psychometrically sound instruments created for the purpose of assessing symptomology across a range of health conditions. The SCARED is a widely used, well-established measure of anxiety. Youth and caregiver PROMIS A-SF results are comparable to the SCARED in the detection of any anxiety disorder; however, they appear limited in the detection of specific subtypes of anxiety, including social anxiety and separation anxiety, in a general psychiatry sample. Specific consideration of the anxiety disorders of interest should be taken when choosing how best to incorporate the PROMIS A-SF into research and clinical practice.