Participants and Demographics
In total, our sample included TAS data from 1464 unique individuals across the two data sources (Table 1). Autistic adults in the SPARK sample (n = 743, age = 30.91 ± 7.02 years, 63.5% female sex) were predominantly non-Hispanic White (79.4%) and college-educated (46.4% with a 2- or 4-year college degree, and an additional 26.5% with some college but no degree), similar to the previous sample drawn from this same SPARK project (78). The median age of autism diagnosis was 19.17 years (IQR = [10.33, 28.79]), indicating that the majority of individuals in the sample were diagnosed in adulthood. The majority of participants reported a current depressive or anxiety disorder (defined as symptoms in the past three months or an individual currently being treated for one of these disorders), with depression present in 59.2% and anxiety present in 71.7%. TAS-20 scores in the SPARK sample were present across the full range of trait levels (M = 60.55, SD = 13.11), and just over half of the sample (54.5%) was classified as “high alexithymia” based on TAS-20 total scores greater than or equal to 61. Less demographic information was available for the general population adults in the HPP sample (n = 721, age = 30.92 ± 13.01 years, 64.9% female), but the available demographics indicated that these individuals were well-matched to the SPARK sample on age and sex. Partially imputed TAS-20 scores in the HPP sample were slightly higher than those of other general population samples (M = 50.21, SD = 11.21), and based on these scores, 17.1% of HPP participants were classified as having “high alexithymia.” As anticipated, large differences in TAS-20 total scores were present between groups (d = 0.880, 95% CrI [0.767, 0.995]).
Table 1
Demographics for Autistic and General Population Samples
| | SPARK (n = 743) | HPP (n = 721) |
|---|---|---|
| Age (Years) | 30.91 (7.02) | 30.92 (13.01) |
| Sex | | |
| Male | 271 (36.5%) | 253 (35.1%) |
| Female | 472 (63.5%) | 468 (64.9%) |
| Gender Identity | | |
| Cisgender Man | 245 (33.0%) | — |
| Cisgender Woman | 400 (53.8%) | — |
| Transgender Man | 15 (2.0%) | — |
| Transgender Woman | 6 (0.8%) | — |
| Non-binary | 76 (10.2%) | — |
| Non-Hispanic White | 590 (79.4%) | — |
| Education | | |
| No High School Diploma | 25 (3.4%) | — |
| High School Diploma/GED | 140 (18.8%) | — |
| Vocational Certificate | 36 (4.8%) | — |
| Some College | 197 (26.5%) | — |
| Associate Degree | 74 (10.0%) | — |
| Bachelor’s Degree | 171 (23.0%) | — |
| Graduate/Professional Degree | 100 (13.5%) | — |
| Age of Autism Diagnosis (Years) | 19.67 (11.17) | — |
| Current Depression | 440 (59.2%) | — |
| Current Anxiety | 533 (71.7%) | — |
| Current Suicidality | 292 (39.3%) | — |
| Lifetime ADHD | 342 (46.0%) | — |
| TAS-20 Total Score | 60.55 (13.11) | 50.21 (11.21)ᵃ |
| TAS-8 Latent Trait Score | 1.01 (1.17) | 0.01 (0.93) |
| "High Alexithymia" (TAS-20 ≥ 61) | 405 (54.5%) | 123 (17.1%)ᵃ |
Note. Continuous variables are presented as M (SD), and categorical variables are presented as N (%). All data in both samples were gathered by self-report. SPARK = Simons Powering Autism Research Knowledge; HPP = Human Penguin Project; ADHD = attention deficit hyperactivity disorder; TAS = Toronto Alexithymia Scale.
a Participants in the HPP sample completed a 16-item version of the TAS, which excluded items 16, 17, 18, and 20. For comparison with the TAS-20 scores in the SPARK sample, these four items were imputed for all HPP participants using random forest imputation.
Confirmatory Factor Analysis
Within the SPARK sample, the confirmatory factor model for the full TAS-20 exhibited subpar model fit, with only the SRMRu meeting a priori fit index cutoff values (Table 2). Additionally, examination of residual correlations revealed five values greater than 0.1, indicating a non-ignorable degree of local model misfit. Model-based bifactor coefficients indicated strong reliability and general factor saturation of the TAS-20 composite (ωT = 0.912, ωH = 0.773), though the ECV/PUC indicated that the scale could not be considered “essentially unidimensional” (ECV = 0.635, PUC = 66.8%). Both the DIF and DDF subscales exhibited good composite score reliability (ωS = 0.906 and 0.854, respectively), although omega hierarchical coefficients indicated that the vast majority of reliable variance in each subscale was due to the “general alexithymia” factor (DIF: ωHS = 0.162, S-ECV = 0.753; DDF: ωHS = 0.145, S-ECV = 0.768). Conversely, the EOT subscale exhibited very poor reliability, with only one fourth of common subscale variance attributable to the general factor (ωS = 0.451, ωHS = 0.300, S-ECV = 0.245). Examination of the factor loadings further confirmed the inadequacy of the EOT subscale (which comprises items 5, 8, 10, 15, 16, 18, 19, and 20), as seven of its eight items loaded poorly onto the “general alexithymia” factor (λG = -0.116–0.311; Supplemental Table S1). Notably, these psychometric issues were not limited to autistic adults. The fit of the TAS-20 CFA model in the HPP sample was equally poor, and the bifactor coefficients indicating the psychometric inadequacy of the EOT and reverse-scored items were replicated in this sample as well (Table 2).
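The model-based coefficients reported above can be illustrated with a minimal sketch, assuming a bifactor model with orthogonal factors and standardized loadings. The function name and the loadings in the usage example are hypothetical and chosen only for illustration; they are not the estimates from Supplemental Table S1.

```python
from math import fsum


def bifactor_coefficients(general, specific):
    """Return (omega_total, omega_hierarchical, ecv) for a bifactor model.

    general  -- standardized general-factor loadings, one per item
    specific -- dict mapping each specific factor name to a dict of
                {item_index: standardized loading}
    Assumes all factors are orthogonal (an assumption of this sketch).
    """
    # Communality of each item: squared loadings summed over all factors.
    h2 = [g * g for g in general]
    for loadings in specific.values():
        for item, lam in loadings.items():
            h2[item] += lam * lam

    general_var = fsum(general) ** 2
    specific_var = fsum(fsum(l.values()) ** 2 for l in specific.values())
    unique_var = fsum(1.0 - h for h in h2)
    total_var = general_var + specific_var + unique_var

    omega_total = (general_var + specific_var) / total_var
    omega_hierarchical = general_var / total_var
    # ECV: share of common variance explained by the general factor.
    ecv = fsum(g * g for g in general) / fsum(h2)
    return omega_total, omega_hierarchical, ecv


# Hypothetical four-item example with two specific factors.
general = [0.6, 0.6, 0.6, 0.6]
specific = {"S1": {0: 0.4, 1: 0.4}, "S2": {2: 0.4, 3: 0.4}}
omega_t, omega_h, ecv = bifactor_coefficients(general, specific)
print(round(omega_t, 3), round(omega_h, 3), round(ecv, 3))
```

Because the general-factor term appears in both numerators, omega hierarchical can never exceed omega total, which is a quick sanity check for any reported pair of values.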
Table 2
Confirmatory Factor Analysis Fit Indices and Model-based Omega Coefficients
| Index | TAS-20 Bifactor: SPARK | TAS-20 Bifactor: HPP | TAS-11 Bifactor: SPARK | TAS-11 Bifactor: HPP |
|---|---|---|---|---|
| Model Fit Indices | | | | |
| χ² (df)ᵃ | 590.6 (145) | 669.9 (145) | 151.6 (33) | 124.0 (33) |
| CFIcML | 0.924 | 0.900 | 0.970 | 0.978 |
| TLIcML | 0.900 | 0.869 | 0.951 | 0.963 |
| RMSEAcML [90% CI] | 0.072 [0.066, 0.078] | 0.086 [0.081, 0.092] | 0.080 [0.069, 0.092] | 0.068 [0.056, 0.079] |
| SRMRu [90% CI] | 0.036 [0.033, 0.040] | 0.051 [0.047, 0.056] | 0.020 [0.017, 0.024] | 0.019 [0.015, 0.023] |
| WRMR | 1.119 | 1.565 | 0.768 | 0.699 |
| Residuals with absolute value > 0.1 | 2.60% | 8.90% | 0% | 0% |
| Largest Residual | 0.149 | 0.225 | 0.084 | 0.055 |
| Bifactor Coefficients | | | | |
| ωT/ωH | 0.912/0.773 | 0.914/0.741 | 0.929/0.861 | 0.925/0.952 |
| ωS/ωHS (DIF) | 0.906/0.162 | 0.880/0.224 | 0.913/0.087 | 0.892/0.071 |
| ωS/ωHS (DDF) | 0.854/0.145 | 0.803/0.120 | 0.800/0.163 | 0.839/0.223 |
| ωS/ωHS (EOT) | 0.451/0.300 | 0.512/0.307 | — | — |
| ωS/ωHS (REV) | 0.559/0.441 | 0.692/0.689 | — | — |
Note. Fit indices that meet the a priori cutoffs for acceptable model fit (CFI/TLI > 0.95, RMSEA < 0.06, SRMR < 0.08, WRMR < 1, all residuals < 0.1) are presented in bold. TAS = Toronto Alexithymia Scale; SPARK = Simons Powering Autism Research Knowledge; HPP = Human Penguin Project; CFIcML = comparative fit index (categorical maximum likelihood estimation); TLIcML = Tucker-Lewis Index (categorical maximum likelihood estimation); RMSEAcML = root mean square error of approximation (categorical maximum likelihood estimation); SRMRu = population-unbiased standardized root mean square residual; WRMR = weighted root mean square residual; ωT = omega total (composite reliability of total score); ωH = omega hierarchical (proportion of total score variance accounted for by general factor); ωS = omega subscale (composite reliability of subscale score); ωHS = omega hierarchical subscale (proportion of subscale score variance accounted for by specific factor); DIF = difficulty identifying feelings; DDF = difficulty describing feelings; EOT = externally-oriented thinking; REV = reverse-coded item method factor.
ᵃ All p values < 0.001.
Following the removal of the EOT and reverse-coded items from the TAS-20, we fit a bifactor model with two specific factors (DIF and DDF) to the remaining 11 items in our SPARK sample. The fit of this model was substantially improved over the TAS-20, with all indices except RMSEAcML meeting a priori designated cutoffs (Table 2) and all residual correlations below 0.1. Moreover, model-based coefficients (ECV = 0.815; PUC = 50.9%) indicated that the 11-item TAS was unidimensional enough to be fit by a standard graded response model with little parameter bias. Notably, the estimated reliability and general factor saturation of the 11-item TAS composite score were higher than those of the 20-item composite (ωT = 0.925, ωH = 0.852), suggesting that the inclusion of EOT and reverse-coded items on the scale actually reduces the amount of scale variance attributable to the underlying alexithymia construct. Fit of the 11-item TAS model in the HPP sample was equally strong (Table 2), with an approximately equal ECV (0.793) supporting the essential unidimensionality of this scale in both samples.
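The PUC value for the 11-item model can be reproduced with a short sketch. It assumes that each item loads on exactly one specific factor and that the 11 retained items split into seven DIF items and four DDF items, following the standard TAS-20 scoring key once the reverse-coded DDF item is dropped; the split is an assumption of this example and is not stated explicitly in the text above.

```python
from math import comb


def puc(specific_factor_sizes):
    """Percentage of uncontaminated correlations for a bifactor model.

    A correlation between two items is "contaminated" when both items
    load on the same specific factor, so it reflects more than the
    general factor alone.
    """
    n_items = sum(specific_factor_sizes)
    total_pairs = comb(n_items, 2)
    contaminated = sum(comb(k, 2) for k in specific_factor_sizes)
    return (total_pairs - contaminated) / total_pairs


# Assumed split of the 11 retained items: 7 DIF, 4 DDF.
print(round(puc([7, 4]), 3))  # 0.509
```

With 55 item pairs in total and 27 within-factor pairs, 28 of 55 correlations are uncontaminated, matching the reported PUC of 50.9%.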
Item Response Theory Analyses
A unidimensional graded response model fit to the 11-item TAS short form did not display adequate fit according to a priori fit index guidelines (C2(44) = 485.7, p < 0.001, CFIC2 = 0.955, RMSEAC2 = 0.116, SRMR = 0.068). Examination of residual correlations indicated that item 7 (I am often puzzled by sensations in my body) was particularly problematic, exhibiting a very large residual correlation of 0.259 with item 3 as well as two other residuals greater than 0.1. Removal of this item caused the resulting 10-item graded response model to approximately meet the minimum standards for adequate fit (C2(35) = 485.7, p < 0.001, CFIC2 = 0.976, RMSEAC2 = 0.086, SRMR = 0.051), with all remaining residual correlations below 0.1. The overall fit of this 10-item model was somewhat worse in the HPP sample (C2(35) = 319.9, p < 0.001, CFIC2 = 0.960, RMSEAC2 = 0.106, SRMR = 0.065); however, it is notable that this model contained item 17, which was not contained within the TAS-16 and was thus fully imputed in the HPP sample. Removal of this item resulted in a substantial improvement in fit in the HPP sample (C2(27) = 169.1, p < 0.001, CFIC2 = 0.974, RMSEAC2 = 0.086, SRMR = 0.058), with fit indices approximately reaching the a priori cutoffs. As the 9-item TAS also exhibited good fit in the SPARK sample (C2(27) = 161.7, p < 0.001, CFIC2 = 0.980, RMSEAC2 = 0.082, SRMR = 0.049), we chose this version of the measure to test I-DIF between autistic and general population adults.
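As a rough check on the fit statistics above, the RMSEA can be recovered from a chi-square-type statistic, its degrees of freedom, and the sample size. The sketch below assumes the common convention with N − 1 in the denominator; the software used for the analyses may apply a slightly different convention, so agreement is expected only to rounding.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """Root mean square error of approximation from a chi-square-type
    statistic (here C2), its degrees of freedom, and sample size n.
    Uses the N - 1 convention, which is an assumption of this sketch.
    """
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# 11-item model in the SPARK sample (n = 743).
print(round(rmsea(485.7, 44, 743), 3))  # 0.116
# 10-item model in the HPP sample (n = 721).
print(round(rmsea(319.9, 35, 721), 3))  # 0.106
```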
For the remaining nine TAS items, I-DIF was evaluated across diagnostic groups using the iterative Wald test procedure. Significant I-DIF was found in eight of the nine items (all except item 6) at the p < 0.05 level (Table 3); however, effect size indices suggested that practically significant I-DIF was only present in item 3 (I have physical sensations that even doctors don’t understand; wABC = 0.433, ESSD = 0.670). The remaining items all exhibited I-DIF with small standardized effect sizes (all wABC < 0.165, all |ESSD| < 0.187), allowing these effects to be ignored in practice (83). After removal of item 3, we re-tested I-DIF in the resulting eight-item scale (TAS-8), producing nearly identical results (significant I-DIF for all items except item 6; all wABC < 0.167, all |ESSD| < 0.186). The overall DTF of the TAS-8 was also small enough to be ignorable, with the average difference in total scores between autistic and non-autistic adults of the same trait level being less than 0.5 scale points (UETSDS = 0.460, ETSSD = -0.011).
Table 3
Differential Item Functioning Results Comparing Autistic and General Population Adults on 9-item Toronto Alexithymia Scale
| TAS-20 Item # | χ²(5) | p-value | wABC | ESSD | Parametersᵃ |
|---|---|---|---|---|---|
| 1 | 35.30 | <0.001 | 0.089 | -0.018 | a1, d1, d2 |
| 2 | 23.18 | <0.001 | 0.164 | 0.157 | d2, d3 |
| 3 | 65.10 | <0.001 | 0.433ᵇ | 0.670ᵇ | d2, d3, d4 |
| 9 | 26.03 | <0.001 | 0.064 | -0.021 | d1 |
| 11 | 30.47 | <0.001 | 0.165 | 0.001 | a1, d2, d3 |
| 12 | 30.19 | <0.001 | 0.149 | -0.187 | d1 |
| 13 | 57.66 | <0.001 | 0.064 | -0.022 | a1, d1, d2, d3, d4 |
| 14 | 61.90 | <0.001 | 0.031 | -0.022 | a1, d1, d2, d3, d4 |
Note. Results indicate omnibus Wald tests of differential item functioning using the iterative anchor-selection method of Cao et al. (2017). P-values are corrected for a 5% false discovery rate using the Benjamini-Hochberg procedure. Parameters that were significantly different between groups when tested alone with follow-up Wald tests (FDR < 0.05) are indicated in the Parameters column. wABC = weighted area between curves; ESSD = expected score standardized difference (in Cohen’s d metric); a1 = slope parameter; d1-d4 = item intercept parameters (i.e., item “difficulty” parameters).
a Parameters in bold are larger (i.e., more discriminating for a parameters and “easier” for d parameters) in the autistic group. Larger values of a indicate that the item is more strongly related to the latent trait in autistic adults, whereas larger values of d indicate that a given item response is endorsed at lower latent trait levels in autistic adults relative to the general population.
b Practically significant DIF (i.e., wABC > 0.3).
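The Benjamini-Hochberg correction mentioned in the table note can be sketched in a few lines of plain Python. The function name and the p-values in the usage example are illustrative only; they are not the uncorrected p-values from the I-DIF analyses.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over equal or
    # higher ranks, capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values (hypothetical).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

An item is flagged at a 5% false discovery rate when its adjusted p-value falls below 0.05.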
After establishing practical equivalence in item parameters between the two diagnostic groups, we then tested I-DIF for the TAS-8 for a number of subgroups within the HPP and SPARK samples. Within the general population HPP sample, none of the eight TAS-8 items displayed significant I-DIF by sex, age (≥ 30 vs. < 30), or phase of the HPP study (all ps > 0.131). Similarly, in the SPARK sample, there was no significant I-DIF by sex, gender, race, education level, current anxiety disorder, history of ADHD, or current suicidality (all ps > 0.105). However, significant I-DIF was found across several demographics, including age (item 6; wABC = 0.0543, ESSD = -0.045), age of autism diagnosis (items 2, 6, and 14; all wABC < 0.267, all |ESSD| < 0.135), and current depressive disorder (item 13; wABC = 0.274, ESSD = 0.361), although wABC values for these items indicated that the degree of I-DIF was ignorable in practice.
As no items of the TAS-8 exhibited practically significant I-DIF across any of the tested contrasts, we retained all eight items for the final TAS short form. A graded response model fit to the full sample exhibited adequate fit (C2(20) = 240.4, p < 0.001, CFIC2 = 0.983, RMSEAC2 = 0.087, SRMR = 0.045) and no residual correlations greater than 0.1. A multi-group model with freely estimated mean/variance for the autistic group was used to calculate the final item parameters (Table 4), as well as individual latent trait scores. Item characteristic curves indicated that all TAS-8 items behaved appropriately, although the middle response option was insufficiently utilized for three of the eight items (Fig. 1). The MAP-estimated latent trait scores for the TAS-8 showed strong marginal reliability (ρxx = 0.895, 95% bootstrapped CI: [0.895, 0.916]), and individual reliabilities were greater than the minimally acceptable 0.7 for the full range of possible TAS-8 scores (i.e., latent trait values between −2.19 and 3.52; Fig. 2A). Item information plots for the eight TAS-8 items (Fig. 2B) indicated that all items contributed meaningful information to the overall test along the full trait distribution of interest. TAS-8 latent trait scores were also highly correlated with total scores on the TAS-20 (r = 0.910, 95% CrI [0.897, 0.922]), indicating that the general alexithymia factor being assessed by this short form is strongly related to the alexithymia construct as operationalized by the TAS-20 total score. Diagnostic group differences in TAS-8 latent trait scores remained large, with autistic individuals demonstrating substantially elevated levels of alexithymia on this measure (d = 1.014 [0.887, 1.139]).
Table 4
TAS-8 Graded Response Model Parameters and Equivalent Factor Loadings for Full Sample
| TAS-20 Item # | Item Content | a1 | d1 | d2 | d3 | d4 | λ | h² |
|---|---|---|---|---|---|---|---|---|
| 1 | I am often confused about what emotion I am feeling. | 2.802 | 3.092 | -0.689 | -2.740 | -6.336 | 0.855 | 0.731 |
| 2 | It is difficult for me to find the right words for my feelings. | 2.190 | 3.478 | 0.491 | -0.931 | -3.841 | 0.790 | 0.623 |
| 6 | When I am upset, I don’t know if I am sad, frightened, or angry. | 2.335 | 2.090 | -0.805 | -2.413 | -5.497 | 0.808 | 0.653 |
| 9 | I have feelings that I can’t quite identify. | 2.402 | 3.137 | 0.072 | -1.434 | -5.170 | 0.816 | 0.666 |
| 11 | I find it hard to describe how I feel about people. | 1.870 | 2.745 | -0.234 | -1.505 | -4.340 | 0.740 | 0.547 |
| 12 | People tell me to describe my feelings more. | 1.235 | 1.739 | -0.526 | -1.636 | -3.644 | 0.587 | 0.345 |
| 13 | I don’t know what’s going on inside me. | 1.892 | 2.054 | -0.646 | -2.231 | -4.771 | 0.743 | 0.553 |
| 14 | I often don’t know why I am angry. | 1.538 | 1.285 | -1.133 | -2.201 | -4.361 | 0.671 | 0.450 |
Note. Parameters estimated using maximum marginal likelihood based on the Bock-Aitkin EM algorithm. This model contained two groups: general population (θ fixed to M = 0, SD = 1 in this group) and autistic group (mean and SD of θ free to vary), with all item parameters constrained to equality between groups. TAS = Toronto Alexithymia Scale; a1 = slope parameter; d1-d4 = item intercept parameters (more positive values indicate “easier” items); λ = factor loading on single factor; h² = communality (squared factor loading).
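To show how the slope and intercept parameters in Table 4 translate into factor loadings and response probabilities, the following sketch uses the slope-intercept form of the graded response model. The conversion constant 1.702 is the usual logistic-to-normal-ogive scaling factor; treating it as the constant behind the tabled loadings is an assumption of this example, as are the function names.

```python
from math import exp, sqrt

D = 1.702  # scaling constant linking logistic and normal-ogive metrics


def loading_from_slope(a):
    """Standardized factor loading implied by a logistic slope."""
    return (a / D) / sqrt(1.0 + (a / D) ** 2)


def category_probabilities(theta, a, intercepts):
    """Probabilities of each response category at trait level theta.

    intercepts holds d1..d4 in decreasing order, so that
    P(X >= k | theta) = logistic(a * theta + d_k).
    """
    cumulative = [1.0 / (1.0 + exp(-(a * theta + d))) for d in intercepts]
    bounds = [1.0] + cumulative + [0.0]
    # Adjacent differences of the cumulative curves give category
    # probabilities, ordered from lowest to highest response option.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


# Item 1 parameters taken from Table 4.
a1, d = 2.802, [3.092, -0.689, -2.740, -6.336]
print(round(loading_from_slope(a1), 3))  # 0.855
probs = category_probabilities(0.0, a1, d)
print([round(p, 3) for p in probs])
```

At the general population mean (θ = 0), the second-lowest response option is the most likely response to item 1, which reflects how far the higher intercepts sit below zero.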
Validity Analyses
Overall, the TAS-8 latent trait scores demonstrated a pattern of correlations with other variables that generally resembled the relationships seen in other clinical and non-clinical samples (Table 5). The TAS-8 latent trait score was highly correlated with autistic traits as measured by the SRS-2 (r = 0.642 [0.598, 0.686]), additionally exhibiting moderate correlations with lower-order (r = 0.386 [0.320, 0.450]) and higher-order (r = 0.432 [0.372, 0.494]) repetitive behaviors as measured by the RBS-R. TAS-8 latent trait scores were also correlated with psychopathology measures, exhibiting the hypothesized pattern of correlations with depression, anxiety, somatic symptom burden, social anxiety, and suicidality, as well as lower autism-related quality of life. As with other versions of the TAS, the TAS-8 displayed a moderate-to-large correlation with trait neuroticism (r = 0.475 [0.416, 0.531]), raising the possibility that relationships between TAS-8 scores and internalizing psychopathology are driven by neuroticism rather than alexithymia per se. To investigate this possibility further, we calculated partial correlations between the TAS-8 and other variables after controlling for IPIP-N10 scores, using a Bayes factor to test the interval null hypothesis that rp falls between −0.1 and 0.1 (i.e., < 1% of additional variance in the outcome is explained by the TAS-8 score after accounting for neuroticism). Bayes factors provided substantial evidence that the partial correlations between the TAS-8 and SRS-2, RBS-R subscales, BDI-II, and ASQoL exceeded the ROPE. Additionally, while partial correlations with the BFNE-S, PHQ-15, and BDI suicidality item were all greater than zero, Bayes factors suggested that all three of these correlations were more likely to lie within the ROPE than outside of it (all BFROPE < 0.258). There was only anecdotal evidence that the partial correlation between the TAS-8 and GAD-7 exceeded the ROPE (BFROPE = 2.18). However, there was a 91.3% posterior probability of that correlation exceeding the ROPE, suggesting that there was a strong likelihood of alexithymia explaining a meaningful amount of additional variance in anxiety symptoms beyond that accounted for by neuroticism.
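The logic of the partial correlations can be sketched with the standard first-order formula, which removes the variance shared with a single covariate. This is a point-estimate illustration only and does not reproduce the Bayesian estimation or the Bayes factors used in the analyses. The function names are hypothetical, and apart from the TAS-8 correlation with neuroticism (0.475), the input correlations are made-up values.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))


def within_rope(r, bound=0.1):
    """True if r lies inside the region of practical equivalence.

    A bound of 0.1 corresponds to less than 1% of variance explained,
    since 0.1 squared is 0.01.
    """
    return abs(r) < bound


# x = TAS-8, z = neuroticism (r_xz = 0.475 from the text);
# r_xy and r_yz are hypothetical values for some outcome y.
rp = partial_correlation(r_xy=0.50, r_xz=0.475, r_yz=0.60)
print(round(rp, 3), within_rope(rp))
```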
The relationships between TAS-8 scores and demographic variables were also examined in order to determine whether relationships found in the general population apply to autistic adults. As hypothesized, TAS-8 scores showed a small and practically insignificant correlation with age (r = 0.032 [-0.041, 0.104], BFROPE = 5.77 × 10⁻⁶), likely due to the absence of older adults (i.e., ages 60+) in our sample. Alexithymia also showed a nonzero negative correlation with education level, although the magnitude of this relationship was small enough to not be practically significant (rpoly = -0.089 [-0.163, -0.017], BFROPE = 0.045). Unlike in the general population, females in the SPARK sample had slightly higher TAS-8 scores (d = 0.183 [0.022, 0.343]), although this difference was small and not practically significant (BFROPE = 0.265). Additionally, there was an absence of practically significant differences in alexithymia by race/ethnicity (d = -0.052 [-0.247, 0.141], BFROPE = 0.029). Lastly, age of autism diagnosis was positively correlated with TAS-8 scores (r = 0.133 [0.06, 0.204]), although this correlation was also small enough to not be practically significant (BFROPE = 0.014).
Readability Analysis
Using the FORCAST algorithm, we calculated the equivalent grade level of the full TAS-20 (including instructions) to be 10.2 (i.e., appropriate for individuals at the reading level of a high school sophomore after the second month of class). Using this same algorithm, the TAS-8 items had a readability of 8.8, indicating a moderate decrease in word difficulty. Thus, in addition to improving the psychometric properties of the measure, our item reduction procedure seemingly improved the overall readability of the TAS.
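For readers unfamiliar with FORCAST, the grade level is computed as 20 minus one tenth of the number of single-syllable words in a 150-word sample. The sketch below takes the word counts as inputs rather than counting syllables itself. The counts in the usage example are hypothetical values chosen to reproduce the reported grade levels; they are not the actual counts from the TAS items.

```python
def forcast_grade_level(n_monosyllabic, n_words=150):
    """FORCAST reading grade level.

    n_monosyllabic -- number of single-syllable words in the sample
    n_words        -- sample length; counts are rescaled to the
                      150-word sample the formula is defined on
    """
    per_150_words = n_monosyllabic * 150.0 / n_words
    return 20.0 - per_150_words / 10.0


# Hypothetical counts: 98 and 112 one-syllable words per 150 words.
print(round(forcast_grade_level(98), 1))   # 10.2
print(round(forcast_grade_level(112), 1))  # 8.8
```

Because the formula depends only on the share of one-syllable words, a lower grade level reflects simpler vocabulary rather than shorter sentences, which is why it suits questionnaire items that are not running prose.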