Descriptive Statistics
Means, standard deviations, skewness, number of large correlations (r > 0.7), and mean correlations for the ISQ items are displayed in Table 2. Several items (Items 6, 10, 11, 12, 13, 14, 16, and 18) each showed more than five large correlations. Of the 190 unique inter-item correlations, 43 (22.6%) were greater than 0.7, indicating a likely high degree of item content overlap (39). Several problematic item pairs had a very high degree of correlation (e.g., rpoly = 0.85 for Items 5 and 13). These included Items 5 and 13 (5. I find it difficult to describe feelings like hunger, thirst, hot or cold; 13. It is difficult for me to describe what it feels like to be hungry, thirsty, hot, cold or in pain), Items 3 and 11 (3. I have difficulty feeling my bodily need for food; 11. I have difficulty understanding when I am hungry or thirsty), and Items 10 and 14 (10. I find it difficult to read the signs and signals within my own body [e.g., when I have hurt myself or I need to rest]; 14. I am confused about my bodily sensations).
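The figure of 190 unique correlations follows from the 20 items (20 × 19 / 2 = 190 pairs). As a minimal sketch of how such a tally might be computed, the following counts the off-diagonal pairs of a correlation matrix that exceed a threshold. The function name and the toy matrix are hypothetical and are not the study's code or data.

```python
from itertools import combinations


def count_large_correlations(corr, threshold=0.7):
    """Return (unique pairs, pairs above threshold, per-item counts)."""
    n_items = len(corr)
    per_item = [0] * n_items
    n_pairs = 0
    n_large = 0
    # Visit each unordered item pair exactly once (upper triangle).
    for i, j in combinations(range(n_items), 2):
        n_pairs += 1
        if corr[i][j] > threshold:
            n_large += 1
            per_item[i] += 1
            per_item[j] += 1
    return n_pairs, n_large, per_item


# Toy 4-item matrix with illustrative values only (not taken from the study).
toy = [
    [1.00, 0.85, 0.40, 0.72],
    [0.85, 1.00, 0.35, 0.60],
    [0.40, 0.35, 1.00, 0.20],
    [0.72, 0.60, 0.20, 1.00],
]
n_pairs, n_large, per_item = count_large_correlations(toy)
print(n_pairs, n_large, per_item)
```

Applied to a 20-item matrix, the first returned value would be 190, and the per-item counts correspond to the "number of large correlations" column described for Table 2.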
Confirmatory Factor Analysis
Model fit for the 20-item ISQ was inadequate based on conventional fit criteria (Table 3). The Chi-square test was significant (p < 0.001), rejecting the null hypothesis of exact model fit. Other fit indices also failed to meet a priori cutoff values (i.e., CFIcML/TLIcML > 0.95, RMSEAcML < 0.06, WRMR < 1.0, and SRMRu/CRMRu < 0.08), suggesting that this model did not fit the data in our sample well. Using McDonald’s omega, the model showed good reliability (ω = 0.966, 95% bootstrapped CI [0.961, 0.971]); however, as a model-based reliability coefficient is only as valid as the model it is based on (69), this coefficient should be interpreted with caution given the poor fit of the model. Factor loadings for the items in the CFA model are displayed in Table 4.
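To make the a priori cutoffs concrete, the sketch below encodes them as a lookup table and flags which indices pass. The index labels are simplified plain-text names, and the example values are hypothetical placeholders, not the values reported in Table 3.

```python
# A priori cutoffs as listed in the text: (direction, cutoff value).
CUTOFFS = {
    "CFI": (">", 0.95),
    "TLI": (">", 0.95),
    "RMSEA": ("<", 0.06),
    "WRMR": ("<", 1.0),
    "SRMR": ("<", 0.08),
}


def check_fit(indices):
    """Map each supplied fit index to True if it meets its cutoff."""
    results = {}
    for name, value in indices.items():
        direction, cutoff = CUTOFFS[name]
        results[name] = value > cutoff if direction == ">" else value < cutoff
    return results


# Hypothetical values for illustration only (not from Table 3).
example = {"CFI": 0.93, "TLI": 0.92, "RMSEA": 0.09, "WRMR": 1.4, "SRMR": 0.07}
print(check_fit(example))
```

A model would be judged adequate under these criteria only if every index it reports meets its cutoff.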
Item Reduction and Short Form Construction
Misspecification analysis was conducted to identify the specific pairs of items driving the misfit of the unidimensional model. Based on this method, several pairs of items were found to have omitted error correlations (i.e., EPC > 0.1; 51), indicating item content redundancy (e.g., Items 19/20, 5/13, and 3/11; see Supplemental Table S1 for a full list of flagged item pairs).
Using the polychoric correlation matrix, the items were ordered by their number of large correlations (>0.7). First, the six items with the most large correlations were removed (Items 6, 10, 11, 13, 14, and 16). Item 17 was then cut because of its high correlations with Items 12 and 1 (r = 0.73 and 0.71, respectively; 17. I don’t tend to notice feelings in my body until they’re very intense; 12. I find it difficult to identify some of the signals that my body is telling me [e.g., If I’m about to faint or I’ve over exerted myself]; 1. I have difficulty making sense of my body’s signals unless they are very strong). After these reductions, several large correlations were still present among the 13 remaining items. To further reduce item redundancy, the items in each flagged pair were compared, and the item with the more general content was retained for the final scale. Using this criterion, Item 3 was kept over Item 8 (3. I have difficulty feeling my bodily need for food; 8. I only notice I need to eat when I’m in pain or feeling nauseous or weak), Item 20 was kept over Item 19 (20. Even when I know that I am physically uncomfortable, I do not act to change my situation; 19. Even when I know that I am hungry, thirsty, in pain, hot or cold, I don’t feel the need to do anything about it), and Item 5 was kept over Item 18 (5. I find it difficult to describe feelings like hunger, thirst, hot or cold; 18. I find it difficult to put my internal bodily sensations into words). This item reduction process resulted in a 10-item scale with all inter-item correlations less than 0.7. Based on information from the misspecification analyses, item pairs 2/3 (2. I tend to rely on visual reminders [e.g., times on the clock] to help me know when to eat and drink; 3. I have difficulty feeling my bodily need for food) and 7/20 (7. If I injure myself badly, even though I can feel it, I don’t feel the need to do much about it; 20. Even when I know that I am physically uncomfortable, I do not act to change my situation) were further identified as misspecified, and Items 3 and 20 were retained due to their more general content. The final short form of the ISQ contained 8 items (ISQ Items 1, 3, 4, 5, 9, 12, 15, and 20; Supplemental Table S2).
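The item counts at each stage of the reduction can be traced with a short bookkeeping sketch. The step labels are paraphrases of the text, and the code only tallies the removals described above. It does not reproduce the correlation-based or EPC-based decisions themselves.

```python
# Start from the 20 ISQ items and apply the removals in the order described.
all_items = set(range(1, 21))
removal_steps = [
    ("most large correlations", {6, 10, 11, 13, 14, 16}),
    ("high correlations with Items 12 and 1", {17}),
    ("less general item of a flagged pair", {8, 18, 19}),
    ("flagged by misspecification analysis", {2, 7}),
]

remaining = set(all_items)
sizes = []
for reason, dropped in removal_steps:
    remaining -= dropped
    sizes.append(len(remaining))
    print(f"{reason}: {len(remaining)} items remain")

final_items = sorted(remaining)
print(final_items)
```

The intermediate sizes (14, 13, 10, 8) match the 13-item and 10-item stages mentioned in the text, and the final list matches the eight items of the short form.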
After item reduction, the short form ISQ (ISQ-8) showed far better fit when evaluated against the same criteria (Table 3). The Chi-square test once again rejected the null hypothesis of exact model fit (p = 0.007), signaling at least some degree of model misspecification. However, the other fit indices met a priori criteria (i.e., CFIcML/TLIcML > 0.95, RMSEAcML < 0.06, WRMR < 1.0, and SRMRu/CRMRu < 0.08), demonstrating trivial levels of global misfit, and misspecification analysis of this reduced item set showed no flagged pairs, indicating a low likelihood of item content redundancy. Reliability of the model was evaluated with coefficient omega (ω = 0.901, 95% bootstrapped CI [0.886, 0.913]), suggesting good internal consistency for this 8-item model.
Item Response Theory Analyses
The model for the ISQ-8 showed overall good fit in the adult sample (C2(20) = 32.5, p = 0.038, CFIC2 = 0.997, RMSEAC2 = 0.036, SRMR = 0.040). Additionally, the standardized LD-χ2 values were all less than 5.79, providing no evidence for remaining item redundancies. The marginal reliability of the ISQ-8 was good (rxx = 0.891, 95% bootstrapped CI [0.881, 0.890]), further demonstrating the psychometric adequacy of the reduced scale. Scores for individual participants all had reliability values greater than 0.7, indicating the 8-item form measured the construct with sufficient precision in all cases. Factor loadings and IRT slope/intercept parameters can be found in Table 4.
Based on an examination of the item category characteristic curves (Supplemental Figure S4), we concluded that a 7-point response scale was not optimal for the ISQ-8. For all 8 items, the plots showed response options that were not the most probable choice at any point on the latent continuum, suggesting that there were too many response options. As a result, adjacent item responses were collapsed to create a 5-point scale (i.e., the “2”/“3” responses were combined into a single response option, as were the “5”/“6” responses). Using this new 5-point scale, the IRT model was re-run in the adult sample. This model also showed good fit (C2(20) = 32.0, p = 0.043, CFIC2 = 0.997, RMSEAC2 = 0.035, SRMR = 0.038), no local dependencies (LD-χ2 values < 9.26), and good reliability (rxx = 0.887, 95% bootstrapped CI [0.878, 0.897]). EAP-estimated latent trait scores derived from the recoded ISQ-8 correlated very highly with those derived from the original ISQ-8 (r > 0.997). The item trace lines for the 5-point scale indicated more consistent use of the response options than those for the 7-point scale, but the middle response was still underutilized in a number of cases (Supplemental Figure S4).
Differential item functioning (DIF) was also evaluated using the iterative Wald test procedure to identify differences in item performance by age, sex, gender, and household income. No differential item functioning was found between any of the tested groups on any item (all p’s > 0.101, FDR corrected; see Supplemental Table S3 for full DIF results). Given that no difference was observed between the adult and adolescent groups, the two were combined and run together in another model using the 5-point scale. This model showed good overall fit (C2(20) = 48.2, p < 0.001, CFIC2 = 0.994, RMSEAC2 = 0.046, SRMR = 0.036), no local dependence (LD-χ2 values all < 9.14), and good reliability (rxx = 0.880, 95% bootstrapped CI [0.871, 0.889]). Latent trait scores from this model (EAP estimation) correlated very highly with total scores on the original ISQ-20 (r = 0.942), and thus we concluded that this short form adequately represented the longer measure from which it was derived. A regression of ISQ-8 score on age and sex across the full sample explained very little of the variance in interoceptive sensibility (R2 = 0.045), although a statistically significant main effect of sex indicated moderately higher levels of interoceptive difficulties in autistic women and girls compared to autistic men and boys (bF-M = 0.612, p < 0.001). The main effect of age and the age by sex interaction were not significant (p’s > 0.104). Neither of these results differed according to reported sex or gender.
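The DIF p-values above are reported as FDR corrected, but the text does not name the procedure. As an illustration only, the sketch below assumes the commonly used Benjamini-Hochberg adjustment; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values (not from Supplemental Table S3).
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

Under this assumption, an item would be flagged for DIF only if its adjusted p-value fell below the chosen significance level.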