Autism and Inner Speech: A Computational Model of Language Functions in Autistic Flexible Behaviour

Experimental and computational studies propose that inner speech boosts categorisation skills and executive functions, making human behaviour more focused and flexible. In addition, many clinical studies highlight a relationship between poor inner-speech and an executive impairment in autism spectrum condition (ASC), but contrasting findings are reported. Here we investigate the latter issue through a previously implemented and validated computational model of the Wisconsin Card Sorting Test. In particular, the model was applied to detect the individual differences in cognitive flexibility and inner speech contribution in ASC and neurotypical participants. Our results suggest that the use of inner-speech increases along the life-span of neurotypical participants but is absent in ASC ones. Although we found more attentional failures in autistic children/teenagers and more perseverative behaviours in autistic young/older adults, only ASC children and ASC older adults exhibited a lower performance than matched control groups. Overall, our results corroborate the idea that the lower use of inner speech in ASC teenagers and young adults is compensated by alternative cognitive strategies (e.g., visual thinking), but it could represent a disadvantage for children (for the missing support of development) and older adults (for the missing compensation of cognitive decline). Moreover, the results suggest that cognitive-behavioural therapies should focus on developing inner speech skills in ASC children as this could provide cognitive support along their whole life span.

Many studies have investigated the relationship between inner speech and executive functions in ASC, and contrasting results are reported (for a review see [13]). For example, some studies on planning [14,15,16] found that an experimental interference of inner speech (e.g., articulatory suppression) impairs planning abilities in control participants but not in an ASC population. However, the results should be taken with caution due to potential methodological limitations (see critiques by [13]). Similarly, evidence on working memory suggests that ASC individuals do not spontaneously use inner speech to name stimuli internally (e.g., [17]), while most studies on motor control indicate either that ASC individuals use inner speech or that the absence of inner speech does not impact their performance. Crucially for us, [18] showed that articulatory suppression does not interfere with cognitive flexibility in ASC people, who do not show an impaired performance. However, [19] found that ASC children performed worse than controls even if they did use private speech. Overall, these scattered and controversial results leave room for further research. In particular, findings suggest that autistic people make a reduced use of inner speech, but it is debated whether this reduction has an impact on executive functioning and in particular on cognitive flexibility.
Here we use a computational modelling approach to investigate the relationship between inner speech and executive functions in autistic people. Specifically, we use a previously implemented and validated computational model [6] able to perform the Wisconsin Card Sorting Test (WCST), a neuropsychological test commonly used to measure cognitive flexibility [20]. The model reproduced several human behavioural data, and was also able to distinguish the different levels of inner speech contribution during an experimental protocol involving verbal shadowing. Here we use the model to address four studies in which the WCST is administered to ASC children [21], teenagers [22], young adults [23] and old adults [24].
Our results suggest that the control groups show an inner speech contribution that progressively increases with the age of the participants. Instead, autistic people do not show an inner speech contribution at any age. Moreover, although we found more attention failures in autistic children/teenagers and more perseveration in autistic young/old adults, only children and old adults show statistically significant lower global performances than their control groups. Therefore we propose that for neurotypical (non-autistic) people inner speech is a solid cognitive tool at all ages, representing a developmental support for children and a compensating tool for old adults (cognitive decline). On the other hand, the absence of inner speech in autism can be compensated in teenagers and young adults (e.g., based on above-average visual skills and thinking; [25,26,27]), but can represent an impairing factor for children and old adults.
The presented results have clinical implications. In particular, they suggest focusing psychotherapeutic treatments on the development of inner speech skills, especially in ASC children. The suitable integration of inner speech and strong visual thinking abilities could indeed be a protecting factor against ageing of autistic people.

Task and participants data
The WCST [28] is a neuropsychological test that is commonly used to measure cognitive flexibility, the capacity to change behavioural strategies to achieve a target goal on the basis of external feedback [20]. The task setting is composed of two decks of 64 cards and four target cards, both put on a table in front of the participant (Figure 1). Each deck card shows a specific combination of elements varying with respect to three categories (colour, shape, number), each characterised by four attributes (colour: red, green, blue, yellow; shape: stars, triangles, circles, crosses; number: one, two, three, four). Each target card shows a unique combination of attributes (e.g. one red triangle, two green circles, etc.). In our simulations the third category (numerosity) is substituted with size (small, medium small, medium large, and large), and the star and cross shapes are substituted with square and bar shapes to reduce the resolution of the images and gain simulation speed. These modifications do not alter the quality of the results. Participants are required to sort each deck-card, choosing one of the three categories, and put the drawn deck card under one target-card matching it for the attribute of the chosen category. For example, if the chosen category is 'colour', a card with a blue item has to be put under the target card with a blue item. Importantly, there is a correct sorting rule for each turn, but it is unknown to the participant. After each sorting attempt, an external operator provides feedback ('correct' or 'not correct') depending on the current sorting rule and the action executed. The key challenge of the task is hence that the participant has to infer the rule based on the feedback. After a succession of ten correct matches, the sorting rule is changed without informing the participant, who has thus to change the sorting rule to the new one by inferring it on the basis of the feedback.
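As an illustration, the task logic described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: the names `WCSTOperator`, `draw_card`, `sort_by` and `TARGET_CARDS` are hypothetical, cards are reduced to one attribute index per category, each target card is assumed to carry the same attribute index in all three categories, and the new rule is assumed to be drawn at random after ten consecutive correct sorts.

```python
import random

CATEGORIES = ("colour", "shape", "size")  # size replaces number, as in the simulations
N_ATTRIBUTES = 4                          # four attributes per category
RUN_LENGTH = 10                           # correct sorts needed before the rule changes

# Assumption: target card i shows attribute i in every category.
TARGET_CARDS = [{c: i for c in CATEGORIES} for i in range(N_ATTRIBUTES)]


def draw_card(rng):
    """Draw a deck card as one random attribute index per category."""
    return {c: rng.randrange(N_ATTRIBUTES) for c in CATEGORIES}


def sort_by(deck_card, category):
    """Pick the target card that matches the deck card on the chosen category."""
    return TARGET_CARDS[deck_card[category]]


class WCSTOperator:
    """Hypothetical 'external operator' holding the hidden sorting rule."""

    def __init__(self, rng):
        self.rng = rng
        self.rule = rng.choice(CATEGORIES)
        self.streak = 0
        self.completed_categories = 0

    def feedback(self, deck_card, target_card):
        """Return True ('correct') if the two cards match on the hidden rule."""
        correct = deck_card[self.rule] == target_card[self.rule]
        self.streak = self.streak + 1 if correct else 0
        if self.streak == RUN_LENGTH:
            # Ten correct sorts in a row: count the category and silently
            # switch to a different rule (random choice is an assumption).
            self.completed_categories += 1
            self.streak = 0
            self.rule = self.rng.choice([c for c in CATEGORIES if c != self.rule])
        return correct
```

A participant model would call `sort_by` with its currently selected category and use the Boolean returned by `feedback` as the only information about the hidden rule.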
To extract a complete cognitive profile of participants, we considered four behavioural indices scored during the task performance: Completed Categories (CCs), a global performance index indicating the number of the performed non-interrupted ten-card sequences of correct sorting; Perseverative Errors (PEs), indicating a perseverative behaviour; Non Perseverative Errors (NPEs), indicating an attentional failure or incorrect inferential reasoning; Failure to Maintain Set (FMS), indicating a sustained attention failure.
To investigate the relationship between inner speech and executive functions in ASC we considered four specific works that (a) adopted Heaton's version of the WCST, (b) involved an ASC group (mild autism/Asperger syndrome/high-functioning autism) and a matched control group, and (c) reported at least the CC, PE, and NPE indices.
The first group [21] involved 26 children (6 to 12 years) with a diagnosis of autism without mental retardation (DSM-III) and a control group of 52 children matched for age. The second group [22] involved 13 teenagers (16.40 ± 2.84) with a diagnosis of Asperger syndrome or high-functioning autism (ICD-10) and a control group of 13 teenagers matched for age and IQ. The third group [23] involved 9 young adults (27 ± 7) with a diagnosis of autism without mental retardation and high verbal competences (DSM-III) and a control group of 10 young adults matched for age, education and IQ. The fourth group [24] involved 27 old adults (33.5 ± 12) with a diagnosis of Asperger syndrome (ICD-10) and a control group of 20 old adults matched for age and IQ.
Although the populations of [23] and [24] have similar ages (27 ± 7 vs 33.5 ± 12), we call them 'young adults' and 'old adults' to better distinguish them. We did not find studies that administered the WCST to ASC adults older than this age.

Model
A previous version of the model, without an inner speech component, was shown to reproduce the behavioural differences between young adults, old adults, frontal patients, and Parkinson patients in the WCST [5]. The model is based on a hypothesis proposing that flexible goal-directed behaviour relies on the manipulation of perceptual representations, through internal attention, and of the external world, for example by displacing objects. In particular, the synergistic integration of internal manipulation of representations and external manipulations of objects allows an agent to support goal-directed flexible perception and behaviour. The hypothesis, and the derived computational model, are based on three main components: a hierarchical visual system, a working memory, and a top-down selector of internal representations.
An updated version of the model was enhanced with the addition of an inner speech component and was validated in [6]. The model reproduced a complete behavioural profile (WCST indices) of three groups of teenagers in different experimental conditions: control, motor tapping and verbal tapping. In particular, it was able to detect the different levels of inner speech contribution of the participants in the control condition and in the verbal shadowing condition.
We present here a description of the model components that allows the reader to interpret the presented results. These components support the cognitive functions needed to perform the task in a goal-directed manner. We report the computational details of the model in Supplementary Materials, and a thorough analysis of the model computational architecture and dynamics is presented in [6].
A first group of components, supporting the internal manipulations of perceptual representations, is as follows. (a) A hierarchical perceptual component that extracts the input visual features at increasing levels of abstraction; this component is analogous to the visual brain system [29], and in the model is implemented as a deep neural-network generative model. (b) A working-memory component that stores the 'priorities' of the task sub-goals (sorting rules), thus determining the probability with which the rules are selected; in the brain, this is a function that is mostly supported by the reentrant circuits of frontal cortices [30,20] and it is implemented here as a recurrent neural network. (c) A motivational component, using the external feedback to update the information in working memory; in the brain, this function is mostly supported by ventral basal ganglia [31,32] and it is implemented here as a reinforcement learning (RL) algorithm. (d) A selector and a manipulator, the former choosing the sorting rule to follow and the latter implementing the manipulation of the internal representations by biasing the perceptual system; these functions reflect the top-down control that the fronto-parietal cortex and basal ganglia exert on the perceptual cortices [33,34]; these functions are implemented here as a softmax function selecting the rule to follow on the basis of the working memory activation, and a mechanism that uses the selected rule to dis-inhibit the high-level representations within the hierarchical perceptual component. (e) An inner-speech component storing a linguistic code on the relevance of the rules to follow, and influencing the working memory rule selections; this component is inspired by the brain systems that integrate linguistic and emotional information [35,36], and in the model it involves a deep neural-network that produces an output formed by the positive/negative valence, and intensity, of the change of the rule priority.
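To make the selector in (d) concrete, a softmax choice over working-memory priorities can be sketched as follows. This is only an illustrative sketch: the function names `softmax` and `select_rule` and the `temperature` argument are assumptions introduced here, and the actual network that produces the priorities is not reproduced.

```python
import math
import random


def softmax(priorities, temperature):
    """Turn rule priorities into selection probabilities.

    A higher temperature flattens the distribution (more random choices);
    a lower temperature concentrates it on the highest-priority rule.
    """
    scaled = [p / temperature for p in priorities]
    largest = max(scaled)                      # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_rule(priorities, temperature, rng):
    """Sample the index of the sorting rule to follow."""
    probabilities = softmax(priorities, temperature)
    return rng.choices(range(len(priorities)), weights=probabilities, k=1)[0]


# Example: three rules (colour, shape, size) with colour currently favoured.
rng = random.Random(0)
chosen = select_rule([1.0, 0.2, 0.1], temperature=0.5, rng=rng)
```

The sampled index would then drive the manipulator, which biases the perceptual component toward the features relevant to the chosen rule.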
The model is also formed by additional components that implement a set of sensorimotor auxiliary functions needed to accomplish the WCST and that work as follows. (f) A visual sensor component, extracting the visual information from deck cards and target cards; this component is analogous to the eye retina and in the model it is implemented as an RGB matrix of pixels. (g) A visual comparator component, executing a visual matching of the deck and selected target card based on the comparison of the low-level perceptual representations of the cards reconstructed by the perceptual component under the bias of the chosen rule [5]; in the brain, these processes might rely on an integrated network involving the frontal and temporal-occipital cortices [37,38], and are implemented here as the computation of the Euclidean distance between the representations of the two cards. (h) A motor component, controlling the saccades on the deck and target cards, and the actions to move the deck card close to the chosen target card after a successful visual matching. After each sorting attempt, a simulated 'external operator', knowing the correct sorting rule, receives the deck card and the chosen target card and returns positive or negative feedback to the model.
The model has four key parameters that influence its computations and behaviour: 'error sensitivity' (µ), representing the magnitude with which the motivational component influences the working-memory sorting-rule priorities in case of negative feedback; 'memory refresh/forgetting speed' (φ), representing the decay speed of the working memory rule priorities towards a baseline; 'distractibility/explorative tendency' (τ), representing the level of randomness of the rule selection; 'inner speech contribution' (λ), representing the magnitude with which the inner-speech component influences the working memory. The last parameter is the most important for this study.
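The roles of µ, φ and λ can be illustrated with a toy update of the working-memory priorities after one sorting attempt. The rule below is a deliberately simplified assumption made for illustration only; the equations actually used by the model are those reported in Supplementary Materials and in [6]. The function name `update_priorities`, the `baseline` argument and the scalar `speech_signal` (signed valence multiplied by intensity) are all hypothetical.

```python
def update_priorities(priorities, chosen, correct, mu, phi, lam,
                      speech_signal, baseline=0.0):
    """Return new rule priorities after one sorting attempt (toy rule).

    mu  : error sensitivity, applied only after negative feedback
    phi : decay speed of every priority towards the baseline
    lam : weight of the inner-speech signal on the rule just used
    """
    # phi: all priorities move towards the baseline.
    updated = [p + phi * (baseline - p) for p in priorities]
    # mu: negative feedback lowers the priority of the rule just used.
    if not correct:
        updated[chosen] -= mu
    # lam: the inner-speech signal shifts the same priority further,
    # upwards for a positive valence and downwards for a negative one.
    updated[chosen] += lam * speech_signal
    return updated


# Example: after an error on rule 0, a model with lam > 0 abandons the
# rule faster than a model with lam = 0.
without_speech = update_priorities([1.0, 0.5, 0.2], 0, False, 0.5, 0.1, 0.0, -1.0)
with_speech = update_priorities([1.0, 0.5, 0.2], 0, False, 0.5, 0.1, 0.4, -1.0)
```

In this toy setting τ does not appear, because it acts at selection time (the randomness of the rule choice) rather than on the stored priorities.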

Configurations of parameters of the best fitting models
As done in [6], we used a statistical search method based on the minimisation of the mean square error (MSE) to find the model parameters (see Supplementary Materials for further details). In particular, this method was used to find the parameter configurations that best reproduced the behavioural data of the control and ASC populations. Although the sample size of some groups was small, the model reproduces the human behavioural data with a low average MSE for both control and ASC groups (see Table S1 in Supplementary Materials). Table 1 and Figure 2 show the parameter values of the model populations that best fit the dataset of the human groups. The parameters represent the simulated cognitive traits of the model and, therefore, of the modelled human participants. Regarding the inner speech contribution (parameter λ), the control groups show an increasing tendency with age. In contrast, ASC groups show an absent or negligible inner speech contribution at all ages.
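The idea of fitting parameters by minimising the MSE between simulated and observed WCST indices can be sketched as below. An exhaustive grid search is used here as a simple stand-in; the actual search method is the one described in Supplementary Materials. The function `toy_simulate` is a made-up placeholder that maps the four parameters to four indices (CC, PE, NPE, FMS) and has no relation to the real model.

```python
import itertools


def mse(simulated, observed):
    """Mean square error between two equally long lists of indices."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)


def grid_search(simulate, observed, grid):
    """Return the parameter configuration with the lowest MSE on the grid."""
    names = list(grid)
    best_params, best_error = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = mse(simulate(**params), observed)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_simulate(mu, phi, tau, lam):
    """Placeholder for a model run; returns fake [CC, PE, NPE, FMS] values."""
    return [6 - 4 * tau, 10 * (1 - mu), 20 * tau + 5 * phi, 3 * tau - lam]


grid = {name: [0.0, 0.5, 1.0] for name in ("mu", "phi", "tau", "lam")}
observed = toy_simulate(mu=0.5, phi=0.0, tau=0.5, lam=1.0)  # pretend human data
best_params, best_error = grid_search(toy_simulate, observed, grid)
```

With a stochastic model, each call to the simulator would typically average the indices over a population of simulated participants before the MSE is computed.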
Regarding the error sensitivity (parameter µ), the control groups show an "inverse U-shaped" curve. In particular, children and old adults show a similar and lower error sensitivity, while teenagers and young adults show a similar and higher error sensitivity. In the case of ASC groups, we found similarities among pairs of different groups. In particular, children and young adults show a similar and lower error sensitivity, while teenagers and old adults show the same higher error sensitivity.
Regarding the memory refresh/forgetting speed (parameter φ), the control groups again show similarities between children and old adults. In contrast, teenagers show the lowest value and young adults the highest value. In the case of ASC groups, we found a descending tendency. In particular, children show the highest value with respect to the other groups, and the latter ones show similar values.
Regarding the distractibility/exploratory behaviour (parameter τ), the control groups have similar values. Despite this, children and old adults show the same slightly higher value with respect to teenagers and young adults, which show the same value. In the case of ASC groups, similarly to the φ parameter, we found a descending tendency. In particular, children show a higher value with respect to the other groups, and the latter ones are similar to one another.

Comparisons between perseverative errors and non perseverative errors in each group
Since perseverative errors and non perseverative errors identify two opposite tendencies, respectively for perseveration and for distraction [5], we performed statistical comparisons (t-tests with Bonferroni's correction) between PEs and NPEs of each model to investigate its behavioural profile (Figure 3).
The results show that in the control condition only old adults have significant differences in their behavioural profile, with an imbalance toward NPE (7.9 ± 2.32 vs 12.05 ± 3.53, p < .001). In the ASC condition, we found that children have an imbalance toward NPE (24.77 ± 4.48 vs 38.04 ± 4.4, p < .001) while young adults have an imbalance toward PE. Although the plots show many imbalances of PE and NPE population means in the other groups of models, they also show a high population variability that prevents further statistical differences.
[Figure: Parameters of models: trends]
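For readers who want to reproduce this kind of comparison from summary statistics, the sketch below computes a two-sample t statistic and applies Bonferroni's correction to a set of p-values. It assumes an independent-samples Welch test, which the text does not specify, and the sample size of 20 used in the example is hypothetical. Converting the t statistic into a p-value requires the t distribution, which is not in the Python standard library and is therefore left out.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and degrees of freedom from summary statistics."""
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# PE vs NPE of the control old adults (means and SDs from the text;
# n = 20 per sample is an assumption made only for this example).
t_stat, dof = welch_t(7.9, 2.32, 20, 12.05, 3.53, 20)

# Correcting three made-up p-values for three comparisons.
adjusted = bonferroni([0.01, 0.04, 0.5])
```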

Comparison between the behaviour of different age groups (intra-condition analysis)
We performed statistical comparisons (one-way ANOVA and post-hoc t-tests with Bonferroni's correction) between the models of each condition. These analyses aimed to investigate the differences in the ageing process of control and ASC conditions (Figure 4, blue and red trend lines).
Regarding the completed categories index (CC), we found a statistically significant difference between the control models (F = 7.03, p < .001). Post hoc tests (Table S2 in Supplementary Materials) indicate that children achieve a lower CC index than teenagers (5.06 ± 0.93 vs 6.0 ± 0.0, p < .001). We did not find statistically significant differences between the other models, probably due to the high variability of each model population. We found a statistically significant difference between the ASC models (F > 50, p < .001). Post hoc tests ( . We did not find significant statistical differences between the other models.
[Figure: Behavioural indices of models (inter-conditions analysis: control vs. ASC)]
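The F statistic of a one-way ANOVA, as used for the intra-condition comparisons, is the ratio between the between-group and the within-group mean squares. The sketch below computes it from raw samples in plain Python; the data in the example are invented and the function name `one_way_anova_f` is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of the observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented CC scores for three age groups.
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A significant F only indicates that at least one group differs; the pairwise post-hoc tests with Bonferroni's correction are what identify which groups differ.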

Comparison between the behaviour of the control and experimental groups (inter-condition analysis)
We performed statistical comparisons (t-tests with Bonferroni's correction) between the indices of the control and ASC models to investigate the behavioural differences between them at each age (Figure 4). Regarding the completed categories (CC), we found that they are lower in ASC children (5.06 ± 0.93 vs 0.12 ± 0.32, p < .001) and ASC older adults (5.5 ± 0.81 vs 4.44 ± 1.03, p < .01). We did not find statistically significant differences in teenagers (6.0 ± 0.0 vs 5.08 ± 1.21, p > .05) and young adults (5.9 ± 0.3 vs 5.44 ± 0.68, p > .05).

Internal functioning comparisons
We also investigated the internal functioning of the models. Figure 5 shows the internal activation of the working memory units of the models recorded during their task performance. The activation of each unit corresponds to a specific sorting rule to follow and the top-space of each plot of the figure shows the errors that occur during each card response.
In the case of children, the activations of the working-memory units of the control and ASC models appear very different. In particular, the ASC model shows several erratic strategy changes that cause the occurrence of several NPEs. Interestingly, although the model is evidently distracted and does not keep the focus on a specific strategy, a few PEs are scored. As already shown in [6], a participant with high distractibility can choose by chance an already tried strategy, thus erroneously appearing perseverative. Here we refer to these errors as 'distraction-related PEs'. Finally, the control model also shows a sub-optimal performance, caused by reasoning errors (e.g., see the 65-80 interval of trials) and attention failures (e.g., 3-4, 25-35 interval).
In the case of teenagers, the control model shows a good landscape with some negligible reasoning failures (e.g., 0-5 interval) and perseveration (e.g., 40-45 interval). The ASC model shows several 'sustained attentional failures' (e.g., 10-40 interval) and reasoning errors (e.g., 110-120 interval) that cause many NPE and FMS errors.
In the case of old adults, both control and ASC models show sub-optimal landscapes. The ASC model shows many attention failures (e.g., 35-45 interval) and reasoning errors (e.g., 55-65 interval), causing many PEs and NPEs.
Interestingly, the control model shows many FMS (e.g., 50-75 interval), like the ASC model, showing a poorly focused behaviour.

Discussion
The computational model we presented reproduces most behavioural indices of control and ASC human groups performing the Wisconsin Card Sorting Test. Moreover, it captures several intra-group and inter-group cognitive and behavioural differences.
Regarding control populations, we generally found similar parameter values between children and old adults (Figure 2, blue lines) in error sensitivity, memory refresh/forgetting speed, and distractibility/exploratory behaviour, detecting some 'U-shaped tendencies' related to age. Differently from the other three parameters, we found an inner-speech contribution that increases with age, being low in children and high in adults. Further investigations of the cognitive profile of control groups confirmed the U-shaped trends in perseverative errors (perseverative behaviour) and non-perseverative errors (attention/reasoning failures) (Figure 3, left plot). Finally, a qualitative analysis of internal activations of the models corroborated these trends (Figure 5), showing that teenagers and young adults exhibit the best performance with respect to children and old adults, in which we found more sub-optimal behaviours affected by distraction and perseveration.
Despite the emergence of these trends, the cognitive differences (parameters) between control groups do not always cause statistically significant differences in behavioural data (Figure 4). For example, only children show significantly lower global performances than the other groups (teenagers, young adults, and old adults), which do not show statistical differences between them.
These results allow the interpretation of contrasting findings on ageing-related effects. In particular, several studies indicate that ageing causes significant brain changes (e.g., [39,40]), in particular a weakening of executive functions [41,42,43], but other studies reveal compensating brain processes such as functional reorganisation and increased bilateral recruitment [44,45].
Considering this literature, based on the results presented here we suggest that the inner speech contribution, showing an increasing trend from children to old adults, can have an ageing-compensation effect. In particular, we propose that inner speech contributes to support early development and to avoid/compensate cognitive decline, thus mitigating the life-span cognitive and behavioural differences between neurotypical individuals. This proposal is also coherent with our results from [6], highlighting that inner speech interacts with the other cognitive processes (working memory storing, error sensitivity, attention), boosting the global performance and diminishing distracted and perseverative behaviours. Moreover, our proposal corroborates the several studies that highlight an important executive modulator function of inner speech in old adults (e.g., [46,47,48]).
Regarding the ASC populations, we found relevant differences in cognitive profiles with respect to the control populations. First, we found that ASC groups do not show any contribution of inner speech along the life-span. Second, we found greater differences between children and other groups regarding working memory decay and distractibility. Third, autistic groups show different imbalances with respect to control groups (Figure 3, right plot). In particular, autistic children show an evident imbalance toward distractibility (NPE), while young adults show an imbalance toward perseverative behaviours (PE).
These results are particularly interesting because the diagnostic criteria for autism rely on repetitive behaviours [11] and clinical studies mostly focus on perseverative/repetitive behaviours in ASC children and adults [49,50]. On the other hand, several works have investigated attention abnormalities in autism, suggesting that an attention impairment could play a causal role in the development of ASC individuals (for a review see [51]). The results presented here agree with these last studies, suggesting that ASC children mostly show an imbalance toward distractions with respect to perseverative behaviours. Moreover, the models suggest a cognitive change in ASC people along the life-span, from a distracted profile in children to a perseverative one in young adults.
Regarding behavioural age-related differences, the cognitive traits (parameters) seem to have a more marked effect on behaviours in ASC people with respect to the control groups. For example, the descending values of distractibility and memory refresh are reflected by the similar curve of NPE, and the low error sensitivity in children and young adults causes higher PE with respect to teenagers and old adults. However, in the case of children this result is evidently altered by many distractibility-related PEs [6]. In particular, the extreme distractibility of ASC children causes a random behaviour (Figure 5, first row) that is sometimes scored as 'perseverative behaviour' although it is caused by attention failures (see the imbalance toward NPEs in Figure 3, right plot).
Interestingly, the FMS curve shows a different and unexpected trend with respect to the control trends. In particular, we could expect that ASC children would show higher FMS due to high distraction, but in fact they showed a low value of this index. This is probably explained by the difference between NPE and FMS, where the first indicates an attentional/reasoning failure and the second indicates a sustained attention failure. Since ASC children cannot focus on a specific strategy (sorting rule) for long, they often do not achieve the necessary number of responses to incur a sustained attention error (FMS). These results are coherent with [52], detecting an impairment in selective attention and not sustained attention, and with [53], detecting a response inhibition impairment rather than a sustained attention impairment. A high FMS in old adults is another interesting finding. While a higher FMS value of teenagers is expected and corroborates a sustained attention deficit [54,55], we could expect an imbalance toward perseveration in old adults. Instead, we found higher FMS with respect to young adults without a marked PE/NPE imbalance (Figure 3, right plot). These results need further investigations, in particular regarding sustained attention in autistic old adults.
In summary, comparing control and ASC populations we found statistically lower performance only in ASC children and ASC old adults with respect to their control groups. The behaviour comparisons (Figure 4) and the analysis of the internal activation of the models (Figure 5) suggest that these differences are caused mainly by more distractions in ASC children/teenagers and a higher perseveration in ASC young/old adults. These results suggest an immature executive functioning in ASC children and a slight cognitive decline in old adults, as suggested by similar trends in the control groups. Despite this, the control groups show weaker intra-condition behavioural differences than the ASC groups, where the difference between age groups appears more marked.
Although many latent variables can contribute to these different behavioural performances (e.g., impaired social learning in autistic children, see [56] for a review), our data suggest that the lack of inner speech development in ASC people could make the ageing effects more evident. In particular, since in control conditions the inner speech represents a cognitive support for an immature executive functioning in children and a compensating process in old adults, its absence in autistic people could deprive them of these compensation processes.
Our hypothesis can help explain the contrasting evidence of studies on autism, inner speech and executive functions (for a review see [13]). In particular, the differences might be due to the heterogeneous populations involved, which span from children to old adults. Moreover, this proposal is coherent with many studies regarding autism and life-span cognitive changes [57,58], suggesting that autistic people also show an improvement of executive functions during the life-span. Indeed, in ASC teenagers and young adults, compensating processes emerge (e.g., higher visual skills and visual thinking with respect to neurotypical people; [26,27,25]). However, the lack of inner-speech support still represents a strong impairment for children and old adults.
Finally, our results have interesting clinical implications. Many therapeutic approaches aim to limit compromising symptoms in autism [59], but only a few of them focus on speech abilities in autism [60,61,62]. These approaches aim to increase linguistic skills to improve social communication abilities, but they do not directly focus on self-directed language (inner-speech). This study suggests that clinicians should devise a new class of therapeutic approaches primarily focusing on developing inner speech skills in autistic children. In particular, the integration of early development of inner speech and strong visual thinking could represent an important cognitive support along the life-span of autistic people, from childhood to adulthood.

Conclusions
This study investigated the relationship between inner speech, ageing, and autism spectrum condition. The study used a previously validated computational model of the Wisconsin Card Sorting Test to reproduce and interpret the data of eight groups of participants differing in age and condition (control condition vs. autism spectrum condition; ASC). The results showed that in the control condition, inner speech contribution increases from childhood to adulthood, possibly supporting early development in children and compensating cognitive decline in old adults. Conversely, the results showed that ASC participants do not exhibit an inner speech contribution at any age. Although we found more attention failures in ASC children/teenagers and perseveration in ASC young/old adults, only ASC children and old adults showed lower performances than their matched controls, while no difference was present in teenagers and younger adults. Hence, our results suggest that the absent inner speech support in ASC people creates more difficulties in early development and life-span ageing effects. This hypothesis has clinical implications, suggesting that psycho-therapeutic approaches should focus more on developing inner speech skills in autism spectrum condition.

Author contributions
GG: Idea, model design, model implementation, simulations, data analysis, result interpretation, writing. AB: Idea, result interpretation, writing. AM: data analysis, result interpretation, writing. GB: Idea, model design, result interpretation, writing, overall supervision.

Table 1 - Values of the parameters of the models that produce the best fit of the data on the WCST indices.