The present study replicates and extends initial findings, confirming the utility of our shortened and “optimized” PANSS versions for use in pediatric trials. The shortened PANSS versions, derived from the NIMH TEOSS pediatric schizophrenia trial dataset [2], performed as well as the original 30-item PANSS in the TEOSS trial with respect to psychometric integrity and change over time. They have now been examined and tested in two additional, wholly independent, placebo-controlled, positive-outcome multicenter pivotal trials conducted by different sponsors with different drugs: the paliperidone pivotal trial [4] and, herein, this placebo-controlled multicenter aripiprazole pivotal trial. In all instances, the 10-item and 20-item PANSS versions demonstrated psychometric properties that equaled those of the full 30-item version.
Although the benefits of a shorter scale seem obvious with respect to reduced burden on adolescent patients, caregivers, and the practitioners who administer the scale, the guiding question of greatest interest is whether reducing the number of items would detract from signal detection. Clearly, pediatric clinical trials that are not optimized for detecting signal are in many ways a waste of precious human resources and a betrayal of the good-faith efforts of participants, families, sponsors, regulators, and ultimately the field at large.
For both the paliperidone randomized placebo-controlled trial and now this aripiprazole randomized placebo-controlled trial, we found that each shortened version detected drug vs. placebo treatment effects as well as, or better than, the full 30-item scale.
Results showed good reliability and high correlation between the short forms (10- and 20-item) and the standard 30-item version. No clinical bias was detected, and prorated scores closely matched the full-length form, particularly within the score range commonly used in clinical trials (e.g., total scores of 60 to 120). Short and full-length form scores also had similar correlations with CGI-S scores.
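The close match between prorated and full-length scores reflects simple scaling from the administered items to the 30-item total. As a minimal sketch, assuming the common proration rule of mean item score times the number of full-scale items (the function name is ours, and the exact rule used in the trial analyses may differ):

```python
def prorate(short_total: float, n_short: int, n_full: int = 30) -> float:
    """Prorate a short-form PANSS total to the 30-item scale.

    Assumes the mean score on the administered items would hold
    across the omitted items (a common proration approach; purely
    illustrative, not the trial's documented analysis method).
    """
    mean_item_score = short_total / n_short
    return mean_item_score * n_full

# A 10-item total of 40 (mean item score 4.0) prorates to 120,
# the upper end of the 60-120 range typical of registration trials.
print(prorate(40, 10))  # 120.0
```

Because each PANSS item is scored 1 to 7, a 10-item total of 10 to 70 maps linearly onto the 30-item range of 30 to 210 under this rule.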
Results strongly replicated prior work indicating that the PANSS items tap five modestly correlated factors [7–9, 14, 18–19]. Confirmatory factor analyses found that the 10- and 20-item scales developed in Findling et al. [2] retained the consensus five-factor model [7, 14], with all items showing good loadings on the posited factor. Whether using 30, 20, or 10 items, the five-factor model fit markedly better than a one-factor model. The total composite score provided high reliability across an extremely wide range of severity levels for all three lengths. The lack of judgment and insight item (G12) showed weak factor loadings and poor item characteristics, also consistent with prior work in adult as well as pediatric samples.
In keeping with the PANSS having multiple underlying factors with low correlations between them, the Guttman lambda-6 and omega-total reliability estimates were higher than Cronbach’s alpha. Alpha assumes that all of the items are related to a single underlying factor [10], which is well established not to be the case for the PANSS. The reliability findings for the five subscales based on both the 10- and 20-item subsets also were good, and even better than we found in prior analyses with the paliperidone dataset [4].
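The point about alpha's single-factor assumption can be made concrete with a small computational sketch (the simulated data and function names are our own illustrative assumptions, not the study's analysis code): when items tap two weakly correlated factors, as with the PANSS, lambda-6 typically exceeds alpha because it credits each item's predictability from the remaining items rather than assuming a single common factor.

```python
import numpy as np

def cronbach_alpha(X: np.ndarray) -> float:
    """Cronbach's alpha for an (observations x items) score matrix."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def guttman_lambda6(X: np.ndarray) -> float:
    """Guttman's lambda-6, based on each item's squared multiple
    correlation (SMC) with the remaining items."""
    R = np.corrcoef(X, rowvar=False)
    smc = 1 - 1 / np.diag(np.linalg.inv(R))  # SMC from inverse correlation matrix
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return 1 - np.sum(item_vars * (1 - smc)) / total_var

# Simulate 10 items loading on two independent factors (5 items each),
# mimicking a multidimensional scale.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 500))
X = np.column_stack(
    [f1 + rng.normal(scale=0.5, size=500) for _ in range(5)]
    + [f2 + rng.normal(scale=0.5, size=500) for _ in range(5)]
)
# With multidimensional items, lambda-6 exceeds alpha.
print(cronbach_alpha(X) < guttman_lambda6(X))  # True
```

In this two-factor setup, between-factor item correlations are near zero, which depresses alpha, while lambda-6 remains high because each item is still well predicted by the other items on its own factor.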
Psychology and medicine have been facing “replication crises” [20–21]. Practice guidelines, the EQUATOR guidelines, and even the Wikipedia guidelines for articles on medicine-related topics all stress the importance of replication and de-emphasize findings based on a single study. To address this need for replication before recommending clinical implementation, we have worked to obtain access to multiple large, independent data sets based on registered clinical trials, and we have used consistent statistical methods and an a priori choice of factor structure and items to retain in all our subsequent replications and extensions. In addition, the replications drew on completely independent samples: the studies involved different patients, different countries, different raters, and different pharmacological interventions [22–23]. Each of these variations heightens the risk of the effect size shrinking. Despite this, the short forms have shown high reliability, convergent validity, and sensitivity to change during treatment that compare favorably to the full-length version.
The current paper has certain limitations, chief among them that it is a secondary analysis of a clinical trial in which the PANSS was administered in its full 30-item format. It would be beneficial to examine the reliability and validity of the shortened versions when administered independently, to ensure that item characteristics do not depend on the context created by interviewing about the other (subsequently omitted) items. Because the omitted items performed poorly, not just in the present analyses but also in other pediatric samples and, indeed, across adult samples (see Santor et al. [14] for review), they are unlikely to be contributing crucial context for responses to the stronger, retained items. A shortened interview could also reduce burden and duration, potentially enhancing rater and participant focus and consequently improving scale reliability. Further research is also warranted to explore the reliability and treatment sensitivity of the five subscales.
Implications
Our work provides further support and confirmation of the utility of the shortened PANSS for pediatric trials. The 10-item and 20-item versions we developed from the TEOSS dataset have performed equivalently to each other and to the 30-item version in their ability to detect baseline-to-endpoint treatment change; in addition, we now have two large, independent, placebo-controlled positive drug trials replicating the psychometrics and showing equivalent drug/placebo signal detection for each of the three versions.
Given the rich body of findings, our recommendation for clinical and psychopharmacology trial use at present is that the 10-item version be used. The 10-item version not only reduces burden but performed as well as the 20- and 30-item versions across a wide range of adolescent patients with severity matching that sought in psychopharmacology registration trials. That said, researchers or drug developers targeting symptoms not covered in the 10-item version are always free to use the 20-item or even the full 30-item version if the symptom of interest so requires. We are providing the psychometric analyses and comparisons for all three versions (10-item, 20-item, and 30-item) as supplementary data to assist others and in the hope of further growing the literature on the psychometric characteristics of these PANSS versions in pediatric samples.
Future Directions
Efforts to further optimize the PANSS assessment in pediatric trials should continue. To help improve standardization, reduce noise, and improve accuracy, our group is currently developing a pediatric semi-structured interview for the 10-item version to assist clinicians and researchers in assessing the 10 targeted items. Surprisingly, although many of us (JB, DGD, RLF) have long trained investigators in best practices when interviewing adolescents and their parents on the PANSS, a structured interview for the pediatric age group has never been developed. A standardized semi-structured interview should benefit all stakeholders: it would assist clinicians in better assessing symptoms initially and over time, allowing for more informed clinical management, and it would assist researchers by reducing interrater and intrarater variance, thus improving signal detection and allowing for a more robust and reliable determination of treatment effects for the ultimate benefit of our patients and their families.