Participants. Toddlers were recruited through community referral and a population-based screening method in collaboration with pediatricians via the 1-Year Well-Baby Check-Up Approach15,17. All toddlers participated in clinical assessments, including the ADOS34, Mullen Scales of Early Learning35, and Vineland Adaptive Behavior Scales36. Toddlers who received their initial diagnostic and clinical evaluations at < 36 months were invited to return for repeat evaluations until they reached 48 months. Clinical scores at the outcome visit were used as a best estimate of a child’s abilities (Supplementary Table 3). Clinical testing occurred at the University of California, San Diego Autism Center of Excellence. Adult participants were recruited by word of mouth. This study was approved by the University of California, San Diego Institutional Review Board. Informed consent was obtained from parents or guardians of toddlers and from adult participants.
Clinical scores and fMRI scans were collected from 71 toddlers (41 ASD/30 TD). Scans were conducted during natural sleep, which has been shown to yield robust activation in ASD and TD toddlers37–40. Toddlers were considered TD if their diagnosis at outcome was TD and their Mullen Early Learning Composite scores fell within 2 standard deviations of the group mean. This allowed us to examine activation patterns along a continuum of language and cognitive abilities in TD children. A subset of toddlers (4 ASD/6 TD) had test-retest fMRI scans collected at intervals of 1–15 months after the initial scan. fMRI scans were also obtained from 14 TD adults (6M/8F, 20–37 years old).
Language paradigms. We presented three types of language stimuli with varying levels of emotional valence, including the “Story” language paradigm we used in earlier studies31,33,38,41,42 (i.e., low emotional valence); a newly-created “Karen” language paradigm (i.e., moderate emotional valence); and a Motherese paradigm (i.e., high emotional valence). Stimuli were created by recording female voices reading nursery stories or age-appropriate phrases using either neutral or infant-directed utterances (e.g., Motherese2,8). Language paradigms were presented in a block design (20s stimulus/20s rest).
The Story language paradigm has been used previously to identify ASD language-related brain signatures38,39 and to develop predictors of language outcome among ASD toddlers31,33. It consists of three types of speech stimuli (simple forward speech, complex forward speech, and backward speech; 9 cycles of speech + rest; 6min 25s). Forward and backward speech stimuli were combined, and our main contrast of interest was all speech versus rest31,33.
To expand our understanding of neurofunctional language and affect development in ASD, we developed two new social orienting language paradigms. The Karen language paradigm utilizes 18 different nursery story stimuli (12min 5s) with moderate levels of emotion and prosody. The Motherese paradigm includes 12 phrases (8min 5s) recorded using the high-pitched, intonational, lyrical, sing-song speech characteristic of motherese.
Emotionality level testing. Two computer-based surveys were administered to TD adults to test emotionality levels of language paradigms.
Each fMRI paradigm consists of unique language segments: 2 Story language, 18 Karen language, and 12 Motherese segments. For survey 1, each unique segment was presented in a randomized order that was the same for every participant. Subjects were instructed to listen to each segment and respond on a Likert scale of 1–5, with 1 indicating the least emotionality and 5 the most. TD adults (n=19) rated Story language segments as the least emotional, Karen language segments as intermediate, and Motherese segments as the most emotional (Supplementary Fig. 5a).
Survey 2 consisted of 18 trials, each containing a Story language segment, a Karen language segment, and a Motherese segment. Presenting all three stimulus types within each trial allowed for direct evaluation of differences in emotionality across the language paradigms. TD adults (n=15) rated each segment on a Likert scale of 1–3, with 1 indicating the least emotionality, 2 some emotionality, and 3 very emotional. Participants rated Story language segments as the least emotional, Karen language segments as intermediate, and Motherese segments as the most emotional (Supplementary Fig. 5b).
fMRI data acquisition. All fMRI data were collected on a 3T GE scanner at the University of California, San Diego Center for Functional MRI. Functional images were acquired with a multi-echo EPI protocol (echo times (TE) = 15 ms, 28 ms, 42 ms, 56 ms; TR = 2500 ms; flip angle = 78°; matrix size = 64 × 64; slice thickness = 4 mm; field of view (FOV) = 256 mm; 34 slices). Structural images were acquired using a T1-weighted MPRAGE sequence (FOV = 256 mm; TE = 3.172 ms; TR = 8.142 ms; flip angle = 12°).
Imaging data preprocessing. Functional data were preprocessed using the ME-ICA analysis pipeline “meica.py”43,44 implemented in AFNI45 and Python. First, the first 4 volumes of each run were discarded to allow for magnetization to reach steady state. Next, motion correction parameters were calculated based on the first TE images (TE = 15 ms) using a rigid-body alignment procedure. Slice timing correction was implemented for functional images of each TE, which were then normalized to an age-matched infant template46. The time series of the four TEs were combined into a single time series47. Both principal and independent component analyses were applied to denoise the data through isolation of thermal (i.e., random) noise from structured signals (i.e., BOLD and non-BOLD signals) and separation of BOLD and non-BOLD signals. Only the BOLD-like components were retained in the preprocessed images, which were then spatially smoothed with an 8 mm FWHM Gaussian kernel.
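The T2*-weighted combination of the four echoes can be illustrated with a minimal sketch. This is not the meica.py implementation itself; it assumes the standard optimal-combination scheme for multi-echo fMRI, in which each echo is weighted by w_n ∝ TE_n · exp(−TE_n/T2*), and the function name and toy data are ours.

```python
import numpy as np

def optimal_combination(echo_series, tes, t2star):
    """Combine multi-echo time series into one, weighting each echo by
    w_n ∝ TE_n * exp(-TE_n / T2*) (T2*-weighted optimal combination).

    echo_series : array, shape (n_echoes, n_timepoints)
    tes         : echo times in ms, shape (n_echoes,)
    t2star      : estimated T2* in ms for this voxel
    """
    tes = np.asarray(tes, dtype=float)
    weights = tes * np.exp(-tes / t2star)
    weights /= weights.sum()          # normalize so weights sum to 1
    return weights @ np.asarray(echo_series, dtype=float)

# Example with the four TEs used in this protocol (15, 28, 42, 56 ms)
tes = [15.0, 28.0, 42.0, 56.0]
signals = np.ones((4, 10))            # toy data: constant signal at each echo
combined = optimal_combination(signals, tes, t2star=30.0)
```

Because the weights are normalized, a constant input signal is preserved by the combination; in practice T2* is estimated voxel-wise from the echo decay.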
Head motion was quantified via framewise displacement (FD)48. For adults and sleeping toddlers, head motion was minimal (mean FD < 0.1 mm). Group differences between ASD and TD toddlers in mean FD were only seen in the Motherese paradigm; there were no group differences between adults and toddler groups (Supplementary Table 4).
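Framewise displacement as defined by Power and colleagues48 can be computed directly from the six rigid-body motion parameters. The sketch below assumes rotations are reported in radians and uses the conventional 50 mm head radius to convert them to millimeters; the function name and toy data are ours.

```python
import numpy as np

def framewise_displacement(motion, head_radius=50.0):
    """Framewise displacement (FD): sum of absolute frame-to-frame changes in
    the six rigid-body parameters, with the three rotations (radians)
    converted to arc length on a sphere of `head_radius` mm.

    motion : array, shape (n_timepoints, 6) = [x, y, z, pitch, roll, yaw]
    Returns FD in mm, shape (n_timepoints,); FD of the first frame is 0.
    """
    motion = np.asarray(motion, dtype=float).copy()
    motion[:, 3:] *= head_radius                 # rotations -> mm
    diffs = np.abs(np.diff(motion, axis=0))
    return np.concatenate([[0.0], diffs.sum(axis=1)])

# Toy example: a single 0.05 mm translation in x between frames 2 and 3
params = np.zeros((3, 6))
params[2, 0] = 0.05
fd = framewise_displacement(params)   # mean FD well below the 0.1 mm level
```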
Whole-brain analyses. First-level and second-level whole-brain activation analyses were conducted with the general linear model (GLM) in SPM12 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/). Events in first-level models were based on the canonical hemodynamic response function and its temporal derivative. Given that scans were collected at multiple time points, including retest scans, we ran second-level whole-brain analyses with mixed effects models using the 3dMVM program in AFNI.
In the mixed effects models, brain activation to each language paradigm (i.e., the speech versus rest contrast) served as the dependent variable. Individual subjects were treated as a random effect to account for repeated scans, and age, gender, and mean FD were included as fixed-effect covariates.
Using a similar approach, we conducted whole-brain analyses with adult data for each language paradigm. However, only within-group tests were performed as all adult participants had typical development.
Resulting activation maps were corrected for multiple comparisons with the family-wise error (FWE) approach using the 3dClustSim program in AFNI (voxel-wise p = 0.005; cluster size > 138 voxels for adults and > 186 voxels for toddlers). This spatial cluster correction accounted for spatial autocorrelation via the ‘-acf’ option in 3dClustSim.
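The first-level design described above, a 20 s speech / 20 s rest boxcar convolved with the canonical hemodynamic response function, can be sketched as follows. The double-gamma parameters (peak near 6 s, undershoot near 16 s, 1/6 undershoot ratio) are the common SPM defaults, and the function names, run length, and omission of the temporal derivative are our simplifications.

```python
import numpy as np
from math import gamma

def canonical_hrf(t):
    """Double-gamma HRF sampled at times t (seconds): a positive gamma
    peaking near 6 s minus a 1/6-scaled undershoot peaking near 16 s."""
    pos = t ** 5 * np.exp(-t) / gamma(6)
    und = t ** 15 * np.exp(-t) / gamma(16)
    return pos - und / 6.0

def block_regressor(n_scans, tr=2.5, block=20.0):
    """Boxcar for a 20 s speech / 20 s rest block design, convolved with
    the canonical HRF and truncated to the scan length."""
    t = np.arange(n_scans) * tr
    boxcar = ((t % (2 * block)) < block).astype(float)   # 1 = speech, 0 = rest
    hrf = canonical_hrf(np.arange(0.0, 32.0, tr))
    reg = np.convolve(boxcar, hrf)[:n_scans]
    return boxcar, reg

# A run of roughly the Story paradigm's length (6 min 25 s at TR = 2.5 s)
boxcar, reg = block_regressor(n_scans=154, tr=2.5)
```

The resulting regressor enters the GLM as the speech predictor; the contrast of speech versus rest then tests its estimated amplitude against baseline.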
Calculation of percent signal changes in temporal language cortex. Two language-relevant ROIs from the meta-analytic activation map in Neurosynth (https://neurosynth.org/) with the term “language”, including left and right temporal regions, were used for ROI analysis (Fig. 2). These ROIs were identical to those used in previous papers31,33. Given that a toddler template was used for toddler samples, ROIs were co-registered to the toddler template using FSL’s flirt function49,50. For each language paradigm, percent signal changes were calculated with first-level models for the speech versus rest contrast in all toddlers and adults.
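The quantity being extracted here can be illustrated with a direct definition on an ROI-averaged time series; note that in the paper the values come from first-level model estimates rather than raw block means, so this sketch, with our own function name and toy data, only conveys the idea.

```python
import numpy as np

def percent_signal_change(ts, condition):
    """Percent signal change of an ROI time series for speech vs. rest:
    100 * (mean during speech - mean during rest) / mean during rest.

    ts        : ROI-averaged BOLD time series, shape (n_timepoints,)
    condition : boolean array, True for speech volumes, False for rest
    """
    ts = np.asarray(ts, dtype=float)
    condition = np.asarray(condition, dtype=bool)
    speech, rest = ts[condition].mean(), ts[~condition].mean()
    return 100.0 * (speech - rest) / rest

# Toy run: 8-volume speech/rest blocks, baseline 1000, speech raised by 1%
cond = np.tile([True] * 8 + [False] * 8, 4)
ts = np.where(cond, 1010.0, 1000.0)
psc = percent_signal_change(ts, cond)   # -> 1.0
```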
Examination of Stability and Validation of fMRI Activation in Toddlers: Toddler-Adult, Test-Retest, and Sleep-Awake. Given the challenges of implementing sleep imaging with toddlers, test-retest reliability is rarely examined, yet it is essential for establishing the rigor of this approach. Additional key questions surround the degree to which functional activation patterns vary across sleep and awake states and across developmental periods, such as between toddlerhood and adulthood or within individuals across time. Here we took steps toward filling these gaps and tested: 1) whether brain activation to language stimuli in sleeping TD toddlers is similar to that in passively listening adults; and 2) whether brain activation patterns are stable and reproducible in TD and ASD individuals across time.
These questions were addressed by comparing percent signal change values between TD adults and TD toddlers, and by computing intraclass correlation coefficients of brain activation within individuals who were scanned multiple times at intervals of 1–15 months, respectively.
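The test-retest intraclass correlation can be sketched from its ANOVA decomposition. The source does not state which ICC form was used, so the example below assumes ICC(2,1) (two-way random effects, absolute agreement, single measure); the function name and toy data are ours.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data : array, shape (n_subjects, n_sessions) of activation values.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    # Mean squares for subjects (rows), sessions (columns), and residual
    ms_r = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_c = n * ((data.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    resid = data - data.mean(axis=1, keepdims=True) \
                 - data.mean(axis=0, keepdims=True) + grand
    ms_e = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Perfectly reproduced test-retest activation values give ICC = 1
scores = np.array([[0.2, 0.2], [0.5, 0.5], [0.9, 0.9], [0.4, 0.4]])
icc = icc_2_1(scores)
```

Values near 1 indicate that between-subject differences dominate session-to-session noise, i.e., stable individual activation across scans.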
Group differences in temporal cortex activation between ASD and TD. We compared percent signal change values between ASD and TD toddlers in a priori temporal ROIs relevant to language processing. Given the repeated time points, mixed effects models with the lmer function (lme4 package) in R51 were used, in which age, gender, and mean FD were included as fixed effects and subjects served as a random effect.
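The analyses were run with lmer in R; an equivalent random-intercept model can be sketched in Python with statsmodels' MixedLM on simulated data. Everything here is illustrative: the data frame, variable names, and effect sizes are our own, chosen only to mirror the model structure (group, age, and mean FD as fixed effects; subject as a random effect).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_scans = 30, 2

# Simulated long-format data: repeated scans nested within subjects
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_scans),
    "group":   np.repeat(rng.integers(0, 2, n_subj), n_scans),  # 0 = ASD, 1 = TD
    "age":     np.repeat(rng.uniform(12, 48, n_subj), n_scans),  # months
    "mean_fd": rng.uniform(0.01, 0.1, n_subj * n_scans),
})
subj_intercept = np.repeat(rng.normal(0, 0.2, n_subj), n_scans)
df["psc"] = 0.3 + 0.4 * df["group"] + subj_intercept + rng.normal(0, 0.1, len(df))

# Random-intercept mixed model: covariates as fixed effects, subject random
model = smf.mixedlm("psc ~ group + age + mean_fd", data=df,
                    groups=df["subject"])
fit = model.fit()
```

In lme4 syntax the same model would be written psc ~ group + age + mean_fd + (1 | subject); the coefficient on group tests the ASD-TD difference while accounting for repeated scans.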
Brain–behavior correlation analysis. Using mixed effects models similar to those described above, but with social or communication scores as the predictor of interest (age, gender, and mean FD as fixed-effect covariates; subjects as a random effect), we investigated the relevance of brain activation to a child’s social and communication abilities assessed by the Vineland Adaptive Behavior Scales36.
Clustering analysis using Similarity Network Fusion. SNF is a novel approach for capturing heterogeneity in multiple types of patient data and forming clusters or subgroups. The method reduces noise by aggregating across multiple types of data, detects common and complementary signals from different types of data, and reveals the importance of each data type to patient similarity. Validation and clinical value emanate from follow-up analyses testing whether different clusters predict important “held-back” clinical variables of interest. In the original description, five SNF clusters formed from multiple types of heterogeneous genomic and genetic variables were associated with significantly different patient survival times20, a finding substantially superior to single-modality clustering approaches.
To identify clusters of clinical features of ASD and TD toddlers linked with patterns of fMRI activation to speech with varying emotionality levels, we used SNF20 to integrate fMRI brain activation in the three language paradigms with clinical measures, and then used the Louvain algorithm52 to detect clusters in the similarity network. For this analysis, we included a subset of 52 of the 71 toddlers who had successful scans of all three language paradigms. The analysis was performed with 6 ROI variables (left and right temporal activation for each of the three language paradigms) and 14 clinical variables (i.e., 3 ADOS variables, 6 Vineland variables, and 5 Mullen variables) in R with the SNFtool package. First, ROI and clinical data were normalized separately. Next, pairwise distance matrices between subjects were calculated for the ROI and clinical data. Affinity matrices (networks) were computed based on the distance matrices. Each affinity matrix is equivalent to a similarity network where nodes are samples (e.g., subjects) and weighted edges represent pairwise sample similarities. Network fusion that iteratively updates every network was then performed, making the two networks more similar to each other with every iteration. After a few iterations, the two networks converged to a single network. We constructed the network with the strongest 15% of connecting partners of each subject and ran the clustering analysis with the Louvain community algorithm. The clusters were visualized with Cytoscape53.
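The fusion steps above, affinity matrices, sparse nearest-neighbour kernels, and iterative cross-view diffusion, can be conveyed with a minimal two-view sketch. This is a simplification of the SNFtool implementation, not a re-implementation of it: the kernel bandwidth is fixed, the two views are updated sequentially rather than simultaneously, the Louvain step is omitted, and all names and toy data are ours.

```python
import numpy as np

def affinity(x, sigma=1.0):
    """Gaussian-kernel affinity matrix from a samples-by-features array."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def row_normalize(w):
    return w / w.sum(axis=1, keepdims=True)

def knn_kernel(w, k):
    """Sparse kernel keeping each sample's k strongest connecting partners."""
    s = np.zeros_like(w)
    for i, row in enumerate(w):
        nn = np.argsort(row)[-k:]
        s[i, nn] = row[nn]
    return row_normalize(s)

def snf(x1, x2, k=3, iterations=10):
    """Minimal two-view similarity network fusion: each view's full network
    is repeatedly diffused through the other's k-NN kernel, pulling the two
    networks toward a single fused similarity network."""
    p1, p2 = row_normalize(affinity(x1)), row_normalize(affinity(x2))
    s1, s2 = knn_kernel(affinity(x1), k), knn_kernel(affinity(x2), k)
    for _ in range(iterations):
        p1, p2 = s1 @ p2 @ s1.T, s2 @ p1 @ s2.T
        p1, p2 = row_normalize(p1), row_normalize(p2)
    return (p1 + p2) / 2

rng = np.random.default_rng(1)
roi = rng.normal(size=(10, 6))        # e.g., 6 ROI activation variables
clinical = rng.normal(size=(10, 14))  # e.g., 14 clinical variables
fused = snf(roi, clinical)            # subjects-by-subjects fused network
```

Community detection (here, Louvain) would then be run on the fused subjects-by-subjects network to obtain the clusters.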
Motherese eye-tracking task. We used a novel eye tracking Motherese task that utilized gaze-contingent technology wherein a toddler’s gaze activates what he or she sees and hears. Toddlers could choose to watch a movie depicting an actress telling a story using motherese speech or computer “techno” sounds and images. Motherese utterances used in this task were among those included in the fMRI experiments. Fifty-four toddlers had moderate or good eye-tracking performance and total looking time > 50% and were therefore included in the analysis. For 38 toddlers, eye-tracking data were collected prior to fMRI, while 16 toddlers completed the task after fMRI.
Eye tracking was conducted using Tobii software (Tobii Studio; Tobii Pro Lab), and fixation data were collected using a velocity threshold of 0.42 pixels/ms (Tobii Studio Tobii Fixation Filter) or 0.03 degrees/ms (Tobii Pro Lab Tobii I-VT Fixation Filter). Preference for Motherese was characterized by comparing percent fixation duration on the Motherese versus the computer “Techno” sounds and images. We then tested differences in Motherese preference between ASD and TD toddlers and across the clusters identified by the clustering analysis.
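The preference measure reduces to a simple proportion of fixation time, which can be sketched as follows; the function name and toy numbers are ours.

```python
def motherese_preference(motherese_ms, techno_ms):
    """Percent fixation duration on Motherese out of total fixation time on
    either stimulus: 100 * motherese / (motherese + techno)."""
    total = motherese_ms + techno_ms
    return 100.0 * motherese_ms / total if total else float("nan")

# A toddler who fixated 36 s on Motherese and 12 s on Techno
pref = motherese_preference(36_000, 12_000)   # -> 75.0
```

A value above 50 indicates a preference for the Motherese stimulus; group and cluster comparisons then operate on these per-toddler percentages.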