Participants
Experiment 1 was conducted on 40 neurotypical adults (32 females; age [mean ± SD]: 29.42 ± 1.93 years) with no diagnosed neurological condition. For Experiment 2, we recruited 40 children with developmental disorders, including 18 autistics (3 females; age 6.2–15.6 years; mean ± SD: 10.77 ± 3.02 years). Note that we often use the term “autistic” throughout the paper, in line with the preference for identity-first language expressed by the autistic community [51]. The other 22 children (7 females; age 6.8–15.5 years; mean ± SD: 10.88 ± 2.34 years) were diagnosed with disorders considered to be outside the autism spectrum, specifically: learning disabilities (LD) (n = 7), developmental language disorder (DLD) (n = 7), behavioral disorder (BD) (n = 3), or attention deficit hyperactivity disorder (ADHD) (n = 5).
Autistic children had received a diagnosis of autism according to DSM-5 criteria [52], or of autistic disorder, Asperger disorder, or pervasive developmental disorder-not otherwise specified according to DSM-IV criteria [53] (Table 1). In all cases, the diagnosis had been made prior to admission to the study by a multidisciplinary team that included a senior child psychiatrist and an experienced, clinically trained research child psychologist. The autistic and comparison groups were matched for chronological age (two-sample t-test on age in years: t(39) = 0.51, p = 0.60, lgBF = −1.07) and Performance IQ (t(39) = 0.23, p = 0.81, lgBF = −1.04), as measured by standardized tests (Leiter International Performance Scale-Revised, or Leiter-R [54]; Wechsler Preschool and Primary Scale of Intelligence, WPPSI, Italian version [55]; Wechsler Intelligence Scale for Children [56]), chosen for each participant according to their level of verbal functioning. All children had a Performance IQ score above 70 and were thus considered “cognitively able”. No child in either group had additional medical or developmental conditions, as reported by parents, and no child was on medication at the time of the study.
Using G*Power [57], we computed the sensitivity of our tests given the sample size, a power of 80% and a type I error probability of 5%. This indicated that the smallest between-group difference we would be able to detect was a Cohen’s d of 0.9, a large effect size.
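The order of magnitude of this sensitivity estimate can be checked with a short Python sketch. It is not the G*Power computation: it uses the normal approximation for a two-sided two-sample test rather than the noncentral t distribution, and it assumes that the relevant comparison is between the 18 autistic and 22 comparison children. The function name `min_detectable_d` is illustrative only.

```python
import math
from statistics import NormalDist


def min_detectable_d(n1, n2, alpha=0.05, power=0.80):
    """Smallest Cohen's d detectable by a two-sided two-sample test.

    Normal approximation (an assumption of this sketch); G*Power uses the
    noncentral t distribution, which gives a slightly larger value.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the type I error rate
    z_power = z.inv_cdf(power)          # quantile matching the requested power
    return (z_alpha + z_power) * math.sqrt(1 / n1 + 1 / n2)


# Assumed group sizes: 18 autistic and 22 comparison children.
d = min_detectable_d(18, 22)
print(f"minimum detectable d (normal approximation): {d:.2f}")
```

With these group sizes the approximation comes out close to the value of 0.9 reported above.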
Table 1. Mean (standard deviations) in each group of participants; the last column gives the comparison between the two groups of children
| | Adults | Autistics | Controls | A-C Comparison |
|---|---|---|---|---|
| Gender (F:M) | 32:8 | 3:15 | 7:15 | χ² = 1.21, p = 0.27 |
| Age | 29.78 (1.61) | 10.95 (2.35) | 10.51 (2.94) | t(39) = 0.51, p = 0.60 |
| Performance IQ | - | 99.00 (17.00) | 100.40 (14.06) | t(39) = 0.27, p = 0.78 |
| ADOS-2 Total Score | - | 12.20 (3.30) | - | |
| AQ Total Score | 14.25 (8.01) | 27.56 (7.81) | 18.18 (7.55) | t(39) = 3.84, p < 0.0001 |
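As an illustration of the gender comparison in Table 1, the sketch below computes a Pearson chi-square statistic for a 2 × 2 table from the F:M counts of the two groups of children. It assumes that no continuity correction was applied, which is not stated in the text; the function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: autistic group (3 F, 15 M) and comparison group (7 F, 15 M).
chi2, p = chi2_2x2(3, 15, 7, 15)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")
```

Under that assumption the result agrees with the tabulated statistic and p-value.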
All participants had normal or corrected-to-normal visual acuity. Experimental procedures were approved by the regional ethics committee (Comitato Etico Pediatrico Regionale, Azienda Ospedaliero-Universitaria Meyer, Firenze (FI)) and are in accordance with the Declaration of Helsinki; participants (and their legal guardians, where appropriate) gave written informed consent.
AQ score
All neurotypical adult participants filled out an online or paper version of the Autism-spectrum Quotient (AQ) questionnaire, using the validated Italian version [58, 59]. The test comprises 50 items. Responses are made on a 4-point Likert scale: “strongly agree”, “slightly agree”, “slightly disagree”, and “strongly disagree”. Items were scored as described in the original paper [58]: 1 when the participant’s response was characteristic of autism (whether slightly or strongly), 0 otherwise. Total scores can range from 0 to 50, with higher scores indicating higher degrees of autistic traits. AQ scores for children were parent-reported and were collected using the age-appropriate form [60]. For one child participant the AQ score was not collected (the parent filled out only part of the questionnaire and could not be re-contacted to complete it).
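The binary scoring rule can be sketched as follows. The scoring key used here (which items count agreement as the autism-characteristic answer) is a made-up four-item placeholder, not the real AQ key, which is given in the original paper [58]. The names `score_aq` and `agree_keyed` are likewise illustrative.

```python
AGREE = {"strongly agree", "slightly agree"}
DISAGREE = {"strongly disagree", "slightly disagree"}


def score_aq(responses, agree_keyed):
    """Sum binary item scores for an AQ-style questionnaire.

    responses: dict mapping item number to one of the four response labels.
    agree_keyed: set of item numbers for which agreeing is the
        autism-characteristic answer; all other items are disagree-keyed.
    """
    total = 0
    for item, answer in responses.items():
        answer = answer.strip().lower()
        if answer not in AGREE and answer not in DISAGREE:
            raise ValueError(f"item {item}: unknown response {answer!r}")
        characteristic = AGREE if item in agree_keyed else DISAGREE
        # Strength of (dis)agreement is ignored: only its direction counts.
        total += answer in characteristic
    return total


# Hypothetical four-item example; the key {1, 3} is NOT the real AQ key.
example = {
    1: "strongly agree",     # agree-keyed, agrees -> 1
    2: "slightly agree",     # disagree-keyed, agrees -> 0
    3: "slightly disagree",  # agree-keyed, disagrees -> 0
    4: "strongly disagree",  # disagree-keyed, disagrees -> 1
}
print(score_aq(example, agree_keyed={1, 3}))  # prints 2
```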
Stimuli and Procedure
The experiment was conducted in a dark room with no illumination other than the display screen. For adults (Experiment 1) the display was a cathode-ray tube (CRT) monitor (40 × 30 cm, Barco Calibrator with resolution 1024 × 768; maximum and minimum luminance 53 and 0.1 cd/m²). Children (Experiment 2) were tested with a more portable device (53 × 32.8 cm Acer LCD color monitor with resolution 1920 × 1080; maximum and minimum luminance 110 and 0.1 cd/m²). In both cases, the screen was placed 57 cm from the participant, whose head was stabilized by a chin rest. Visual stimuli were generated in Matlab (Mathworks) using the Psychophysics Toolbox [61]. Total testing time (for both adults and children) was about 30 min, including the time for initial adjustment of the apparatus to match each participant’s eye level.
Fig. 1. Top: images (sun, moon and meaningless control images) were presented for 1 s each in random order, embedded within an animated movie. Bottom: the same protocol was used for the presentation of full-screen maximum or minimum luminance squares, for testing the pupillary light or dark response.
During experimental sessions, participants watched a clip extracted from an animated movie [62], displayed at screen center within a window of 17 × 9.1 deg. Each stimulus presentation blanked out the movie (with no interruption of the soundtrack) for 1 s; presentations occurred every 4 s on average (Fig. 1). When testing adults (Experiment 1), three types of images were used: photographs of the sun; photographs of the moon, adjusted to match the mean luminance of the sun images; and phase-scrambled images of the sun that preserved mean luminance, power spectrum, and root mean square contrast [63]. There were 13 images per category, all 10 × 10 cm (subtending 10 × 10 deg at 57 cm viewing distance). Each image was presented twice, over two sessions, in pseudorandomized order. For children (Experiment 2), only sun images and phase-scrambled images of the sun were used, as this pair yielded the strongest difference in pupillary response in adults. In addition, in separate sessions, full-screen white or black squares were shown for 1 s, with 13 repetitions, to estimate each child’s pupillary light/dark responses.
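The correspondence between centimetres on the screen and degrees of visual angle at a 57 cm viewing distance follows from basic trigonometry, as in the sketch below. The function name `visual_angle_deg` is illustrative and not taken from the authors' code.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (deg) subtended by a stimulus centred on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# At 57 cm, 1 cm on the screen subtends approximately 1 deg.
print(f"{visual_angle_deg(1, 57):.3f} deg per cm")
print(f"{visual_angle_deg(10, 57):.2f} deg for a 10 cm image")
```

The 10 cm images therefore subtend very nearly 10 deg, with a small deviation because the tangent is not exactly linear.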
Pupil diameter was monitored at 500 Hz with an EyeLink 1000 system (SR Research) with an infrared camera mounted below the screen, recording from the left eye. Pupil measures were calibrated using an artificial 4-mm pupil placed at the approximate position of the participant’s eye. Synchronization between eye recordings and visual presentations was ensured by the Eyelink toolbox for MATLAB [61].
Analysis of pupillometry and eye-tracking data
Eye-tracking data were preprocessed using custom Matlab scripts that implemented the following steps:
- Identification and removal of gross artifacts: removal of time-points with unrealistically small or large pupil size (more than 2 mm from the mean of the trial or < 0.1 mm, corresponding to blinks or other signal losses).
- Identification and removal of finer artifacts: identification of samples where pupil size varied at unrealistically high speeds (> 25 mm per second, beyond the physiological range) and removal of the 20 ms epoch surrounding this disturbance.
- Down-sampling of data at 100 Hz, by averaging the retained time-points in non-overlapping 100 ms windows. If no retained sample was present in a window, that window was set to “NaN” (MATLAB code for “not a number”).
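The three steps above can be sketched in Python as follows. This is an illustration, not the authors' Matlab scripts, and it rests on several assumptions: the trial mean is computed over samples of at least 0.1 mm; speed is computed between consecutive retained samples; the 20 ms epoch is centred on the fast sample (±10 ms); and the averaging window is left as a parameter, set by default to the 100 ms given in the text. The function name `preprocess_trial` is hypothetical.

```python
import math

NAN = float("nan")


def preprocess_trial(pupil_mm, fs_hz=500.0, window_ms=100.0):
    """Clean one trial of pupil-diameter samples (mm) and down-sample it."""
    n = len(pupil_mm)
    keep = [True] * n

    # Step 1: gross artifacts. Samples < 0.1 mm are treated as signal loss;
    # the trial mean is taken over the remaining samples (assumption).
    valid = [x for x in pupil_mm if not math.isnan(x) and x >= 0.1]
    trial_mean = sum(valid) / len(valid) if valid else NAN
    for i, x in enumerate(pupil_mm):
        if math.isnan(x) or x < 0.1 or abs(x - trial_mean) > 2.0:
            keep[i] = False

    # Step 2: finer artifacts. Flag changes faster than 25 mm/s between
    # consecutive retained samples, then drop +/-10 ms around them (assumption).
    half = int(round(0.010 * fs_hz))
    fast = [i for i in range(1, n)
            if keep[i] and keep[i - 1]
            and abs(pupil_mm[i] - pupil_mm[i - 1]) * fs_hz > 25.0]
    for i in fast:
        for j in range(max(0, i - half), min(n, i + half + 1)):
            keep[j] = False

    # Step 3: average retained samples in non-overlapping windows;
    # a window with no retained sample becomes NaN.
    win = int(round(window_ms / 1000.0 * fs_hz))
    out = []
    for start in range(0, n, win):
        chunk = [pupil_mm[j] for j in range(start, min(start + win, n)) if keep[j]]
        out.append(sum(chunk) / len(chunk) if chunk else NAN)
    return out


# 100 ms of steady 4 mm pupil followed by 100 ms of signal loss (blink).
trace = [4.0] * 50 + [0.0] * 50
print(preprocess_trial(trace))  # first window 4.0, second window NaN
```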
Pupil traces were transformed into changes from baseline by subtracting the average pupil diameter in the first 200 ms after stimulus onset (i.e. during the latency of the pupillary light response). After averaging all traces per subject and image type, we took the maximum dilation (for dark images) or the maximum constriction (for all other images) after the stimulus presentation to index the size of the response, which we submitted to statistical tests. Because of the preprocessing described above, trials with blinks or artifacts yielded traces with several missing values; we excluded these from our analyses by eliminating all trials for which no sample was available over the stimulus presentation window (percentage of trials excluded, mean ± s.e.m.: adults, 1.5 ± 0.23%; autistic group, 19.40 ± 3.22%; control group, 7.20 ± 1.53%).
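A minimal Python sketch of this response index is given below, under stated assumptions: each trace starts at stimulus onset, `dt_s` is the spacing between down-sampled points, the peak is searched over the whole averaged trace, and missing values are ignored when averaging. The function names are illustrative and do not come from the authors' code.

```python
import math

NAN = float("nan")


def nanmean(values):
    """Mean of the non-NaN entries, or NaN if there are none."""
    vals = [v for v in values if not math.isnan(v)]
    return sum(vals) / len(vals) if vals else NAN


def baseline_correct(trace, dt_s, baseline_s=0.2):
    """Subtract the mean of the first 200 ms after stimulus onset."""
    n_base = int(round(baseline_s / dt_s))
    base = nanmean(trace[:n_base])
    return [v - base for v in trace]


def response_index(trials, dt_s, stim_s=1.0, dark=False):
    """Peak of the trial-averaged, baseline-corrected trace.

    Trials with no valid sample during the stimulus window are excluded.
    Returns the maximum (dilation) for dark stimuli, else the minimum
    (constriction).
    """
    n_stim = int(round(stim_s / dt_s))
    usable = [t for t in trials if any(not math.isnan(v) for v in t[:n_stim])]
    if not usable:
        return NAN
    corrected = [baseline_correct(t, dt_s) for t in usable]
    n = min(len(t) for t in corrected)
    mean_trace = [nanmean([t[i] for t in corrected]) for i in range(n)]
    finite = [v for v in mean_trace if not math.isnan(v)]
    if not finite:
        return NAN
    return max(finite) if dark else min(finite)


# Two usable trials and one trial lost entirely to artifacts.
trials = [
    [4.0, 4.0, 3.5, 3.0, 3.5],
    [5.0, 5.0, 4.5, 4.5, 5.0],
    [NAN, NAN, NAN, NAN, NAN],
]
print(response_index(trials, dt_s=0.1))  # peak constriction in mm
```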
Statistical Analysis
Data were analyzed using custom Matlab code and JASP [64]. We used a repeated-measures approach, computing average per-participant responses and comparing them across stimulus types and (for children) across participant groups. Data from Experiment 1 were analyzed with a one-way repeated-measures ANOVA, with ‘stimulus category’ as the within-subject factor; subsequent paired t-tests assessed pairwise differences between sun, moon and phase-scrambled stimuli. Data from Experiment 2 were analyzed with a mixed-design repeated-measures ANOVA, with a within-subject factor ‘stimulus category’ (sun versus phase-scrambled, or bright versus dark) and a between-subject factor ‘group’ (autistic versus comparison group). Pupillometric results from both experiments were correlated with participants’ AQ scores using Pearson’s correlation coefficient.
Each analysis was complemented with a Bayesian repeated-measures ANOVA, which estimated Bayes Factors for each of the F-terms [65]. Statistical significance was evaluated using both p-values and log-transformed Bayes Factors. The Bayes Factor is the ratio of the likelihoods of the two models, H1/H0, where H1 assumes an effect (e.g. a correlation between two variables or a difference between two means) and H0 assumes no effect. By convention, a base-10 logarithm of the Bayes Factor (lgBF) > 0.5 is considered substantial evidence in favor of H1, and lgBF < −0.5 substantial evidence in favor of H0.
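The decision rule on lgBF can be written out as a short Python sketch. The label used for values between −0.5 and 0.5 ("inconclusive") is an assumption of this example, since the text does not name that range, and the function name `interpret_bf` is hypothetical.

```python
import math


def interpret_bf(bf10, threshold=0.5):
    """Return lgBF (base-10 log of the H1/H0 Bayes Factor) and a verbal label."""
    lg_bf = math.log10(bf10)
    if lg_bf > threshold:
        label = "substantial evidence for H1"
    elif lg_bf < -threshold:
        label = "substantial evidence for H0"
    else:
        label = "inconclusive"  # label assumed for this sketch
    return lg_bf, label


# A Bayes Factor of 0.1 corresponds to lgBF = -1, i.e. evidence for H0.
print(interpret_bf(0.1))
print(interpret_bf(10.0))
```

On this rule, the lgBF values of −1.07 and −1.04 reported for the age and Performance IQ comparisons count as substantial evidence that the groups did not differ.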