Forty-one patients with a diagnosis of RTT took part in the experiment; forty were female and one was male. Their families were contacted by the Italian association for Rett syndrome (AIRETT), which asked them to participate in the study. The families came from all over Italy. Five patients were excluded because they were unable to focus on the stimuli on the monitor, so 36 patients ultimately participated in the study. Their ages ranged from 4 to 32 years. A general assessment was carried out by a psychologist using the Vineland Adaptive Behaviour Scale (VABS) and the Rett Syndrome Rating Scale (RARS). Thirty-one girls and one male attended schools or socio-educational centres; four girls were assisted by an educator at home. All the girls were in the post-regression phase of the disorder: they had severe intellectual disability and only six were able to use verbal speech. All showed little or no purposeful hand use, and pervasive hand stereotypies were striking. Ambulation was preserved in 19 girls. Table 1 shows the chronological age of the participants together with their RARS and VABS scores.
A Tobii Series-I eye-tracker was used to record the subjects’ visual scanning. This device records ocular movements such as the location and duration of ocular fixations (pauses of eye movement on an object of interest) and saccadic movements (rapid movements between fixations). The participant was positioned at a distance of about 30 cm from the screen, and the direction of gaze was determined using the Pupil Centre/Corneal Reflection method under low-intensity infrared light. Passive gaze tracing software (LC Technologies, Sao Paulo, Brazil) was used to generate gaze data during visual scanning. In addition, the device allows areas of interest (AOIs) to be defined within the images chosen for the statistical analysis of eye tracking. An AOI cluster refers to selected specific areas that are used for both attention and recalling details of the images.
The eye-tracker was used for both the overselectivity paradigm and the memory paradigm. The avatar was created using the educational platform "Voki for Education" (https://www.voki.com/). Voki is a free collection of customizable speaking avatars for teachers and educators that allows users to create a precise profile of a talking character. Voki was created by Oddcast, and its avatars can be customized to look like humans, cartoons, and/or animals.
The characteristics of the avatar were selected through a pre-calibration carried out during the 2018 AIRETT Campus, at which several patients with RTT spent their holidays with family and educators. The pre-calibration was fundamental because it made it possible to use the avatar that the patients with RTT preferred. The materials of both paradigms are presented below.
The memory test was implemented as follows. The story cartoons presented on the Tobii eye-tracker were easy to understand and remember, and the events were presented in a logical order. The cartoon sequences were extracted from “La Pimpa”: “Ant Bibi” and “Pimpa on the Beach”. They were chosen out of seven cartoon sequences presented to 31 three-year-old children and were calibrated on the basis of the children’s comprehension of the story (> 90%) and their recall indices (> 90%). Each cartoon sequence contained 8 significant memory indices (Table 2). Both cartoon sequences lasted 2 min 30 s.
The test was carried out for each patient. After each cartoon was presented through the eye-tracker, the participants were asked to recall the cartoon immediately through a recognition test consisting of 8 questions about the story (Table 3).
For each of the relevant indices, two cards were presented on the screen: the correct answer and the distractor (Fig. 1). Scoring assigned 1 point for choosing the correct answer and 0 points for choosing the distractor.
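The scoring rule above can be illustrated with a minimal sketch. The function name `score_recall` and the response lists are hypothetical and serve only as an example; they are not part of the original materials.

```python
def score_recall(choices, correct_answers):
    """Total recognition score: 1 point per correct card, 0 per distractor."""
    if len(choices) != len(correct_answers):
        raise ValueError("one choice is required for each question")
    return sum(
        1 for chosen, correct in zip(choices, correct_answers) if chosen == correct
    )


# Hypothetical data for the 8 questions of one cartoon:
# each entry names the card the participant selected with their gaze.
correct_answers = ["a", "b", "a", "a", "b", "b", "a", "b"]
choices = ["a", "b", "b", "a", "b", "a", "a", "b"]

total = score_recall(choices, correct_answers)
print(total)
```

With 8 questions per cartoon, the score for one cartoon ranges from 0 to 8.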
In the overselectivity paradigm, two cards of 10 cm × 30 cm, each showing a different complex stimulus composed of three familiar objects in black and white, were presented on the screen of the Tobii eye-tracker. In the second phase, individual stimuli, consisting of cards of about 10 cm × 10 cm, were presented on the screen. Each card showed a single familiar object previously included in the target complex stimulus (Fig. 2). The cards were calibrated in a previous study.
In the condition with virtual avatar, the avatar presented both the complex stimuli and the individual stimuli between which the participant had to choose (Fig. 3, phases 1 and 2). In the condition without virtual avatar, no avatar was presented to participants.
The experiment was carried out in a quiet room during the 2019 Rett summer campus of the AIRETT. The examiner administered the VABS and the RARS through an interview with the parents of the subjects with RTT and with their educators. Participants sat in a dimly lit room of the association in front of the eye-tracker screen at a distance of 30 cm. The eye-tracker was positioned in such a way that ambient lighting did not affect the recordings. The eye-tracking equipment was calibrated for each participant at the beginning of the experiment. Gaze fixations of at least 1000 ms within a region of 2°–3° around each calibration point were considered accurate. All participants were tested in the morning, between 9:00 a.m. and 12:00 noon.
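As a rough illustration of the calibration criterion, the sketch below converts an on-screen gaze offset to visual angle at the 30 cm viewing distance and checks it against the duration and angular limits. It assumes that the upper bound of the 2°–3° region is used as a radius around the calibration point. The function names and example values are hypothetical.

```python
import math

VIEWING_DISTANCE_CM = 30.0  # viewing distance reported in the procedure
MIN_DURATION_MS = 1000.0    # minimum fixation duration for an accurate point
MAX_OFFSET_DEG = 3.0        # assumption: upper bound of the 2-3 degree region, used as a radius


def offset_deg(dx_cm, dy_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle (degrees) between the gaze point and the calibration point."""
    return math.degrees(math.atan(math.hypot(dx_cm, dy_cm) / distance_cm))


def calibration_ok(dx_cm, dy_cm, duration_ms):
    """True if the fixation is long enough and close enough to the calibration point."""
    return duration_ms >= MIN_DURATION_MS and offset_deg(dx_cm, dy_cm) <= MAX_OFFSET_DEG


# Hypothetical fixations: (horizontal offset, vertical offset, duration)
print(calibration_ok(1.0, 0.0, 1200.0))  # about 1.9 degrees off, long enough
print(calibration_ok(2.0, 0.0, 1200.0))  # about 3.8 degrees off, too far
print(calibration_ok(1.0, 0.0, 800.0))   # close enough, but too short
```

At 30 cm, an offset of about 1.6 cm on the screen corresponds to roughly 3° of visual angle, so the accepted region is small relative to the stimulus cards.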
The two tasks of this experiment were presented in random order. In the memory task, in the condition with the avatar, the avatar initially appeared on the whole screen of the Tobii I-15 and said “Hi, my name is Giorgio. Watch this cartoon with me!” Then the avatar became smaller and moved to the lower left part of the screen. During the cartoon, it moved only its eyes and head, in a stereotyped way, so that it seemed alive. After the first cartoon, the avatar appeared again and said “Hello, we will play together now!” The avatar then asked the participant the 8 questions mentioned above (Table 3). The participant had to choose the correct answer with their eyes and avoid focusing on the distractor. Then the avatar appeared again and repeated the process with the second video. As can be seen in Table 2, the questions had various levels of difficulty, ranging from simple recognition of the main character of the story, to recognition of the emotional states of the characters, to identification of the actions within the story.
In the overselectivity procedure, the examiner presented a pair of complex stimuli on the screen, one on the right and one on the left. The left or right position of each image was assigned in random order. The images were placed 40 cm from each other. In this way, both images were easily observable and within easy reach and grasp of the patients who could use their hands. The task was carried out in two phases.
During the first phase, two images showing complex stimuli (ABC, correct stimulus; XYZ, incorrect stimulus; see Fig. 2) were presented. The examiner first presented each subject with the correct complex stimulus, described as the “correct one”; both the correct and incorrect cards were then presented on the screen in front of the subject, who was asked: “Which is the correct one?” Forty-five seconds were allowed to answer the question. The subjects could answer by grasping an image or by looking at it. If the subject chose the correct card (ABC) within the 45 s, the examiner gave verbal reinforcement (e.g. “Great!”, “Very good!”). If the subject chose the incorrect image (XYZ) or did not choose any image within the 45 s, both images were removed, a ‘no’ answer was coded, and a new opportunity to choose was offered after 10 s.
In the second phase, the examiner used the cards showing individual objects (Fig. 2, second phase) extracted from both the correct and the incorrect complex stimuli, devising 9 different pairs of individual stimuli by combining A with Y, B with X, etc. The examiner asked every participant to choose the correct stimulus.
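The 9 pairs of the second phase correspond to every combination of one element of the correct stimulus (A, B, C) with one element of the incorrect stimulus (X, Y, Z). A minimal sketch of this enumeration follows; the variable names are illustrative only.

```python
from itertools import product

correct_elements = ["A", "B", "C"]    # objects taken from the correct complex stimulus
incorrect_elements = ["X", "Y", "Z"]  # objects taken from the incorrect complex stimulus

# Each pair contains one correct and one incorrect individual stimulus.
pairs = list(product(correct_elements, incorrect_elements))

for correct, incorrect in pairs:
    print(correct, incorrect)
```

Three correct elements crossed with three incorrect elements give 3 × 3 = 9 pairs, which matches the number reported in the procedure.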
Memory task: two parameters were considered: the fixation length (FL) on the correct stimuli related to the significant memory indices during viewing of the cartoon (Table 2), and the number of correctly recalled indices. FL refers to the amount of time (in seconds) that the subject spent looking at the correct stimulus. Total fixation length refers to the sum of the time spent looking at each significant index during viewing of the cartoon. Fixations were extracted using a threshold of 100 ms.
Overselectivity task: two parameters were considered: the FL on the complex correct stimulus and the number of individual images correctly recalled.
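The FL measure can be sketched as follows, assuming that fixations are already available as screen coordinates with a duration and that each AOI is a rectangle. The `Fixation` and `AOI` classes, the pixel coordinates, and the example values are hypothetical; the eye-tracking software used in the study may compute FL differently.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal gaze position on the screen (pixels)
    y: float            # vertical gaze position on the screen (pixels)
    duration_ms: float  # time the gaze rested at (x, y)


@dataclass
class AOI:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


MIN_FIXATION_MS = 100.0  # fixation threshold reported in the text


def fixation_length_s(fixations, aoi, min_ms=MIN_FIXATION_MS):
    """Seconds spent fixating inside one AOI, ignoring fixations below the threshold."""
    total_ms = sum(
        f.duration_ms
        for f in fixations
        if f.duration_ms >= min_ms and aoi.contains(f.x, f.y)
    )
    return total_ms / 1000.0


def total_fixation_length_s(fixations, aois):
    """Sum of FL over all AOIs (for example, the significant memory indices)."""
    return sum(fixation_length_s(fixations, aoi) for aoi in aois)


# Hypothetical example: one AOI and four fixations.
aoi = AOI(left=50, top=50, right=200, bottom=200)
fixations = [
    Fixation(100, 100, 250),  # inside the AOI, counted
    Fixation(120, 110, 80),   # inside the AOI, but shorter than 100 ms
    Fixation(500, 500, 400),  # outside the AOI
    Fixation(150, 150, 300),  # inside the AOI, counted
]
fl = fixation_length_s(fixations, aoi)
print(fl)
```

Only the two fixations that fall inside the AOI and last at least 100 ms contribute, so the FL here is 250 ms + 300 ms = 0.55 s.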
The data were analysed using SPSS version 22.0 for Mac. The descriptive statistics of the dependent variables were tabulated and examined. The alpha level was set to 0.05 for all statistical tests. Where effects were significant, the effect size of the test was reported. The relationship between continuous variables was evaluated using Pearson’s r; group comparisons were conducted using paired-samples t-tests.
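The two tests named above can be illustrated with a short, dependency-free sketch. The analysis itself was run in SPSS; the code below only shows the underlying formulas. The data values are hypothetical, and Cohen's d for paired samples is used here as an assumed effect-size measure, since the text does not specify which one was reported. Obtaining a p-value additionally requires the t distribution, which is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired-samples t statistic, degrees of freedom, and Cohen's d (assumed effect size)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance of differences
    t = mean_d / math.sqrt(var_d / n)
    cohen_d = mean_d / math.sqrt(var_d)
    return t, n - 1, cohen_d


# Hypothetical scores for four participants in the two avatar conditions.
with_avatar = [5, 6, 7, 8]
without_avatar = [4, 4, 6, 6]
t, df, d = paired_t(with_avatar, without_avatar)
print(t, df, d)

# Hypothetical, perfectly linear relationship between two continuous variables.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r)
```

The paired test operates on within-participant differences, which suits a design in which every participant is tested both with and without the avatar.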