The objective of our study is to test innovative methodological approaches that could effectively distinguish toddlers with ASD from other children in an elevated-risk population. Previous studies have explored the potential of early ASD screening with eye-tracking techniques (21, 10, 22). High exclusion rates have been reported in eye-tracking studies of children with both typical and atypical development (23). A major challenge is that children under the age of 3 have difficulty following instructions during examinations and tend to move their heads frequently, which often results in data loss and inaccuracy. In the present research, we used a self-developed eye-tracking device that tolerates head movements and does not require calibration.
Moreover, we defined variables that are sufficiently robust. In similar research investigating fixations, subject exclusion rates were 11–26%, frequently due to a substantial (40–50%) lack of data (11, 18, 24). In a meta-analysis, Von Holzen and Bergmann (25) reported dropout rates ranging from 0% to 69% in mispronunciation detection studies with one- to four-year-olds using the preferential looking paradigm. They emphasize that fixation durations should be established on theoretical grounds and that only results obtained with similar durations should be compared.
Instead of using fixations, we opted for gaze retention interval (GRI) variables, which cover a broader spatial and temporal range. Additionally, we incorporated variables based on areas of interest (AOIs) and variables that capture the dynamic aspects of eye movements (Markov variables).
Participants
Participants were recruited through social media advertising and various early childhood institutions (e.g., primary care or diagnostic centers). Our recruitment process highlighted whether the suspicion of autism had arisen in the parents or in professionals working with the family. As a result, most of the toddlers’ parents had some suspicion of or curiosity about autism. Thus, subjects were not screened from an average population but from an elevated-risk sample. Our control group consisted of children for whom the suspicion of ASD was not confirmed by our examiners. Data collection took place from September 2022 to March 2023. In total, our research included 74 toddlers aged 12–30 months. For descriptive statistics of the subjects, see the section “Study population” and Table 1.
Procedure
Parents registering for the research completed an online questionnaire about their child’s anamnestic data (e.g., developmental problems). The child’s participation began with the eye-tracking procedure (max. 5 minutes), followed by the ADOS-2 and ophthalmological screening (together max. 45–60 minutes). The parent completed the M-CHAT questionnaire during these examinations. Families received a 5000 HUF (approx. 14.2 USD) voucher for their participation.
Clinical Measures
ADOS-2 Toddler module
The Autism Diagnostic Observation Schedule 2 (ADOS-2) is a semistructured, standardized assessment for the diagnosis of autism spectrum disorder. In our study, we used the Toddler module (26), which is similar to the original tasks of ADOS Module 1 but is intended for children under 30 months of age. The observations were conducted by our specialists, who are certified in the method. To assess interrater reliability, 10 toddlers (Mage = 23.5 months, SD = 4; 3 non-ASD, 7 hrASD) were rated both by our specialists and by a clinical psychologist with several years of experience. The interrater agreement between the observers on this sample was 86–100% for the algorithm items (M = 93.6%). There was no difference between the raters in the classification of the toddlers. Therefore, the subsequent ADOS evaluations were performed by our own specialists.
We used two different algorithms, for verbal and nonverbal children, to calibrate the severity score. To determine the cut-off points, we used the results of Hong et al. (27). In Algorithm 1, severity scores of 0–9 were classified as “nonspectrum” (non-ASD), and scores of 10 or above were classified in the “high-risk ASD” (hrASD) group. In Algorithm 2, scores of 0–7 were classified as “nonspectrum”, and scores of 8 or above were classified in the hrASD group.
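As a minimal illustration of this classification rule, the sketch below applies the two cut-off points; the function name and input handling are ours, and only the cut-off values (10 for Algorithm 1, 8 for Algorithm 2) come from the study.

```python
# Minimal sketch of the cut-off rule; only the cut-off values are taken from the text.
def classify_ados(severity_score: int, algorithm: int) -> str:
    """Classify a toddler as 'non-ASD' or 'hrASD' from the ADOS-2 severity score."""
    cutoff = 10 if algorithm == 1 else 8  # Algorithm 1 vs. Algorithm 2 cut-off
    return "hrASD" if severity_score >= cutoff else "non-ASD"

print(classify_ados(9, algorithm=1))   # non-ASD
print(classify_ados(8, algorithm=2))   # hrASD
```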
Ophthalmological screening
The ophthalmological screening aimed to identify ophthalmic issues that are prevalent in atypical development. The screenings were administered by our examiners, who had been trained by a specialist in pediatric ophthalmology and strabismus. The screening tests included guided eye movements, the cover and uncover test, and the Lang-II test.
M-CHAT
The M-CHAT is a standardized parent-report questionnaire that helps to screen children aged 18–24 months for autism (28). In our research, the questionnaire was administered to the entire 12–30-month age group because it is risk-free and provides relevant information.
Eye-tracking methods
Stimuli and presentation
Eye-tracking research with toddlers poses a number of challenges: limited attention capacity, the need for movement, and limited understanding of instructions. Therefore, the present study was preceded by two iterative stages (N1 = 34, N2 = 35), in which we selected the most suitable stimuli (types and items) from several candidate materials and tested our eye-tracking protocol on toddlers with typical development.
We included two major types of stimuli in the presented material: joint attention scenes (hereafter, ostensive scenes) and preferential stimuli (with social and non-social sides) (see Fig. 1 for examples).
In the ostensive videos, an actress first displayed ostensive communication (looking at the camera, smiling, and addressing the child, e.g., saying ‘hello’) and then directed the child’s attention to one of two similar objects (toys) by shifting her gaze (and head). These videos lasted between 9 and 13 seconds.
In the preferential stimuli, two dynamic videos were presented side by side: a social video showing an adult (displaying ostensive communication), a moving child, or children interacting (two or three children moving, dancing, or playing together), and a non-social video showing moving objects or nonfigurative scenes (all real video recordings). The social scene appeared equally often on the right and left sides. These videos lasted between 8 and 13 seconds.
Throughout the examination, the videos followed each other in a quasirandom order: seven joint attention videos and 16 preferential videos.
During the presentation, each video began with an animation (zoom-out) and ended with an animation and a sound signal (zoom-in and a short beep). These were used to maintain the children’s interest and direct their attention to the screen.
Eye tracker
For our measurements, we used our custom-built eye tracker, which consists of a uEye UI-3860LE-M (SONY IMX-290 sensor) monochrome camera equipped with 16 mm optics and four 850 nm LED emitters arranged in a rectangular fashion, approximately 5 cm inwards from each corner of the virtual stimulus presentation surface. The lighting and stimuli were combined by a wavelength-selective semitransparent mirror. The toddlers were seated in their caretaker’s lap, 50 cm from the front of the device housing, 100 cm from the front lens of the optics, and 110 cm from the LED emitters. The camera produced 50 frames per second at a resolution of 1936×1096 pixels with a single channel. The LEDs were verified by an independent laboratory to emit harmless levels of infrared light, in terms of both corneal and retinal irradiation, over the full range of possible subject movement towards the emitters (min. 50 cm).
The eye tracker was specifically constructed to tolerate the head movements typical of toddler behavior, which are accommodated by custom software built by our team. The eye tracker does not need any calibration to operate. Head movements are tolerated as long as the head and at least one eye remain within the field of view of the camera, which varies with the distance of the subject. The optics are set up so that the eye tracker maintains gaze capture when the subject moves up to 30 cm towards the camera from the original position or up to 10 cm further away. Note that backwards movement is more limited because the subject sits in the caretaker’s lap.
We measured the accuracy and precision of the eye-tracking gaze points in a sample of 11 adults with healthy vision. On a central target, the average accuracy in the X and Y directions was 0.74° and 1.25°, respectively, while the average precision was 0.37° and 0.53°. The eye tracker produces raw gaze-position data calculated from the positions of both eyes. When only a single eye was detectable, the gaze point was estimated from a previously learned relationship between the detectable eye and the binocular gaze point.
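The form of this learned relationship is not specified in the study; the sketch below is a hedged illustration that assumes a simple per-eye affine (least-squares) mapping fitted on samples where both eyes were visible. All names and the choice of model are ours.

```python
import numpy as np

def fit_monocular_mapping(eye_xy: np.ndarray, binocular_xy: np.ndarray) -> np.ndarray:
    """Fit an affine map from single-eye positions to binocular gaze points.

    eye_xy and binocular_xy are (n_samples, 2) arrays collected while both
    eyes were visible; the returned (3, 2) coefficient matrix includes a bias.
    """
    X = np.hstack([eye_xy, np.ones((len(eye_xy), 1))])  # add bias column
    coeffs, *_ = np.linalg.lstsq(X, binocular_xy, rcond=None)
    return coeffs

def apply_monocular_mapping(eye_xy: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Estimate binocular gaze points when only one eye is detectable."""
    X = np.hstack([eye_xy, np.ones((len(eye_xy), 1))])
    return X @ coeffs
```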
Eye-tracking variables
The eye tracker generated a row-based table in which each row contained a time-stamped gaze point for each eye and a combined gaze point, together with the temporal position of the video stimulus being presented at that moment.
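A minimal sketch of what such a row-based table might look like is given below; the column names are our own illustration, not the device’s actual output format.

```python
import pandas as pd

# Assumed columns of the row-based gaze table; names are illustrative only.
gaze = pd.DataFrame(columns=[
    "timestamp_ms",        # sample time (50 Hz, i.e. 20 ms steps)
    "left_x", "left_y",    # left-eye gaze point (pixels)
    "right_x", "right_y",  # right-eye gaze point (pixels)
    "gaze_x", "gaze_y",    # combined (binocular) gaze point (pixels)
    "stimulus_id",         # which video was on screen
    "stimulus_time_ms",    # temporal position within that video
])
```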
The stimuli were sectioned into temporal parts that represented similar situations across multiple similar stimuli. In the preferential paradigm, we used only the core part of each stimulus, without the animations at the beginning and the end. In the ostensive stimuli, we used the parts of the videos after the actress turned towards the target.
The areas of interest (AOIs) were developed using the Voronoi method (29). In the preferential stimuli, the social and non-social parts were defined in space and time, and in the ostensive stimuli, there were person, target, and non-target AOIs (see examples in Fig. 1). Our AOI variables relied on gaze-time ratios. We used five AOI variables: in preferential situations, the ratio of social and the ratio of non-social AOI; in ostensive situations, the ratio of person, the ratio of target, and the ratio of non-target AOI.
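A minimal sketch of the gaze-time-ratio computation is shown below. It assumes each gaze sample of the analysed stimulus part has already been labelled with the AOI it falls into (‘social’, ‘non-social’, or None for samples outside all AOIs); the labelling step and the use of the total sample count as the denominator are our assumptions.

```python
from collections import Counter

def aoi_gaze_ratios(aoi_labels):
    """Return, for each AOI, the share of gaze samples falling into that AOI."""
    counts = Counter(label for label in aoi_labels if label is not None)
    total = len(aoi_labels)  # all samples of the analysed stimulus part
    return {aoi: n / total for aoi, n in counts.items()}

# Example: 60% of the samples in the social AOI, 30% non-social, 10% elsewhere.
print(aoi_gaze_ratios(["social"] * 6 + ["non-social"] * 3 + [None]))
```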
State-change dynamics were captured by Markov-chain-like features (30) representing changes in the gazed AOI. Staying in the same state was counted in units of 5-second windows (allowing fractional counts), while switching to another AOI was counted discretely, with each switch counted as one. From these counts, a Markov chain was calculated for each person within a domain/item, and the probabilities of this Markov chain were used as features. For example, for a preferential item consisting of social and non-social AOIs, there were four Markov features: social to social, social to non-social, non-social to non-social, and non-social to social.
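The sketch below illustrates one way these Markov features could be computed under the counting scheme described above. It assumes the gaze stream has already been reduced to a temporally ordered list of (AOI, dwell duration) episodes; this intermediate representation and the helper names are ours.

```python
from collections import defaultdict

STAY_UNIT_S = 5.0  # one "stay" count per 5 seconds spent in the same AOI

def markov_features(dwells):
    """dwells: ordered list of (aoi, duration_in_seconds) episodes."""
    counts = defaultdict(float)                # (from_aoi, to_aoi) -> count
    for i, (aoi, duration) in enumerate(dwells):
        counts[(aoi, aoi)] += duration / STAY_UNIT_S   # fractional stay counts
        if i + 1 < len(dwells):
            counts[(aoi, dwells[i + 1][0])] += 1.0     # one count per switch
    # Normalise so that each source AOI's outgoing probabilities sum to 1.
    totals = defaultdict(float)
    for (src, _), c in counts.items():
        totals[src] += c
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}

# Example for a preferential item with social and non-social AOIs:
print(markov_features([("social", 6.0), ("non-social", 3.0), ("social", 2.0)]))
```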
To detect gaze retention intervals (GRIs), we used a common algorithm, identification by dispersion threshold (IDT) (31–34). With conventional parameter settings, the algorithm detects fixations; with modified (more tolerant) parameters, it can detect the more dispersed, longer-duration GRIs. The IDT algorithm uses the x and y coordinates of the eye positions and two fixed thresholds: the maximum dispersion and the minimum duration. To form a GRI, consecutive eye-movement samples must remain within a spatial area not exceeding the dispersion threshold (300 pixels) for at least the duration threshold (300 ms). Samples fulfilling these criteria were marked as GRIs. Based on the detected GRIs, we calculated the following parameters: density (number of GRIs per minute), duration (average GRI length in ms), gaze time (total duration of all GRIs in ms), gaze stuck (duration in ms of a GRI continuing after the end of an AOI), GRI-in-AOI ratio (number of GRIs in an AOI divided by the number of all GRIs during the AOI time), and latency (time in ms from the start of an AOI to the first GRI in the AOI).
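Below is a minimal sketch of the IDT detection step with the tolerant GRI parameters given above (300-pixel dispersion, 300 ms minimum duration, corresponding to 15 samples at 50 Hz); the exact implementation used in the study is not specified, so this follows the textbook IDT formulation.

```python
import numpy as np

def detect_gri(x, y, disp_thresh=300.0, min_dur_ms=300, sample_ms=20):
    """Return (start, end) sample-index pairs of detected GRIs (end exclusive)."""
    min_samples = int(round(min_dur_ms / sample_ms))
    x, y = np.asarray(x, float), np.asarray(y, float)

    def dispersion(a, b):
        # Dispersion of window [a, b): horizontal plus vertical extent.
        return (x[a:b].max() - x[a:b].min()) + (y[a:b].max() - y[a:b].min())

    gris, i, n = [], 0, len(x)
    while i + min_samples <= n:
        j = i + min_samples
        if dispersion(i, j) <= disp_thresh:
            # Grow the window while the dispersion threshold is not exceeded.
            while j < n and dispersion(i, j + 1) <= disp_thresh:
                j += 1
            gris.append((i, j))
            i = j
        else:
            i += 1
    return gris
```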
For each subject, the values of the eye-movement and gaze retention interval (GRI) variables were calculated in the preferential and ostensive situations in each AOI for each stimulus. Markov probability variables were also calculated. The values of variables from different stimuli but belonging to the same AOI were averaged. This resulted in robust variables in which the effects of occasional missing data (due to not watching the stimuli) were mitigated.
Variables used in statistical tests
Simple AOI-based eye-movement variables used in preferential situations: ratio of social (%) and ratio of non-social (%). AOI-based eye-movement variables used in ostensive situations: ratio of person (%), ratio of target (%), and ratio of non-target (%). Markov variables used in preferential situations: staying in a social state, staying in a non-social state, transitioning from a non-social to a social state, and transitioning from a social to a non-social state. Markov variables used in ostensive situations: staying in a person state, transitioning from a person to a non-target state, and transitioning from a person to a target state. GRI variables used for social and non-social areas: density, duration, gaze time, gaze stuck, GRI-in-AOI ratio, and latency.
Statistical method
To determine the optimal sample size, we performed power calculations for independent-sample t-tests. A sample size of N ≥ 34 per group was found to be required for a power of at least 0.8 to detect an effect size of 0.7 at p < 0.05. Therefore, we aimed for a sample size of approximately N = 34 in each of the non-ASD and hrASD groups.
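The same a priori calculation can be reproduced, for illustration, with statsmodels instead of the G*Power software actually used in the study; the effect size (0.7), alpha (0.05), and power (0.8) come from the text.

```python
import math
from statsmodels.stats.power import TTestIndPower

# Solve for the per-group sample size of an independent-sample t-test.
n_per_group = TTestIndPower().solve_power(effect_size=0.7, alpha=0.05,
                                          power=0.8, alternative="two-sided")
print(math.ceil(n_per_group))  # approximately 34 subjects per group
```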
The non-ASD and hrASD groups were compared using independent-sample t-tests (α = 0.05), and Hedges’ g was used to calculate the effect size of the differences between the groups. Pearson’s product-moment correlation coefficient was computed between the ADOS score and the eye-movement parameters. Correlations were computed for all subjects and separately for the hrASD and non-ASD subjects. The difference (expressed as a Fisher z score) between the correlations in the non-ASD and hrASD groups and the significance of the difference were also calculated. The Holm–Bonferroni method was used to correct for multiple comparisons. Power analysis was carried out in G*Power 3.1.9.7 (Franz Faul, Universität Kiel, Germany, 1992–2020). TIBCO STATISTICA 14.0.1.25 (TIBCO Software Inc., 1984–2020) was used for the independent-sample t-tests and correlation calculations. Microsoft Excel Professional 2021 was used for the Hedges’ g effect size and Holm–Bonferroni calculations. An online tool (VassarStats: Website for Statistical Computation, Richard Lowry, 2001–2023; http://vassarstats.net/rdiff.html) was used to compute the significance of the difference between two correlation coefficients using the Fisher r-to-z transformation.
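For illustration, the two less common statistics above can be computed as follows; the formulas are the standard ones for Hedges’ g and for comparing two independent correlations via the Fisher r-to-z transformation, and the function names are ours (the study itself used Excel and VassarStats for these steps).

```python
import numpy as np
from scipy import stats

def hedges_g(a, b):
    """Bias-corrected standardized mean difference between two samples."""
    n1, n2 = len(a), len(b)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(a, ddof=1) + (n2 - 1) * np.var(b, ddof=1))
                        / (n1 + n2 - 2))
    d = (np.mean(a) - np.mean(b)) / pooled_sd   # Cohen's d
    return d * (1 - 3 / (4 * (n1 + n2) - 9))    # small-sample bias correction

def compare_correlations(r1, n1, r2, n2):
    """Two-tailed p value for the difference between two independent r values."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)     # Fisher r-to-z transform
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    return 2 * stats.norm.sf(abs(z))
```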