An autism spectrum disorder adaptive identification based on the Elimination of brain connections: a proof of long-range underconnectivity

Autism spectrum disorder (ASD) is theoretically characterized by alterations in functional connectivity between brain regions. Many works have presented approaches to determine informative patterns that help to distinguish autism from typical development. However, most of the proposed pipelines are not specifically designed for the autism problem, i.e. they are not grounded in autism theories about functional connectivity. In this paper, we propose a framework that takes into account the properties of local connectivity and long-range under-connectivity in the autistic brain. The originality of the proposed approach is to adopt elimination as a technique to bring out the connectivity alterations of the autistic brain and to show how they contribute to differentiating ASD subjects from controls. Experimental results obtained on the large multi-site Autism Brain Imaging Data Exchange (ABIDE) show that our approach provides a prediction accuracy of up to 70% and succeeds in showing the existence of deficits in long-range connectivity in the brains of ASD subjects.


Introduction
Autism spectrum disorder (ASD) is one of the biggest challenges of modern medicine. From being considered a rare disorder affecting approximately 4 to 5 in every 10,000 children (Harris 2016), it is now considered a common disorder whose occurrence has reached 1 in 54 children identified with ASD, according to the estimates from the Centers for Disease Control and Prevention (CDC) Autism and Developmental Disabilities Monitoring Network (Maenner et al. 2016), with a predominance in boys over girls. Its impact affects not only the children but also their parents, as high levels of distress were measured in parents of autistic children (Salomone et al. 2019). Autism symptoms and severity are heterogeneous and vary throughout development. In general, ASD is characterized by impaired social interaction, language and communication abnormalities, and stereotypical behavior. Moreover, an investigation of Predictive Adaptive Behavior Skill (Emma et al. 2021) showed that autistic children also presented lower individual raw scores in community use, functional preacademics, home living, health and safety, leisure, self-care and self-direction. According to Stiles and Jernigan (2010), two important processes in brain development involve a substantial loss of neural elements. The first is naturally occurring cell death, which involves a loss of more than 50% of the neurons; the second is synaptic exuberance and pruning, in which there is a massive production of connections followed by the systematic elimination of up to 50% of those connections. However, Courchesne et al. (2011) state that males with autism had an abnormal excess of prefrontal neurons (a mean of 67% more) compared with those in the control group. This suggests that the autistic brain is mainly characterized by connectivity problems.
The brain is a complex network composed of regions that communicate through anatomical and functional connections, with near-optimal information processing. These networks can be obtained through anatomical and functional neuroimaging techniques such as MRI, DTI, EEG, MEG, fMRI, PET and SPECT (Camprodon and Stern 2013; Alessandro et al. 2017; Du et al. 2018). Among these, MRI has played a pivotal role in providing in vivo information in a noninvasive manner to identify aberrations in brain development (Hiremath et al. 2021). fMRI (functional MRI) goes further by detecting changes in the MRI signal that arise when changes in neuronal activity occur with a change in brain state. This method remains the most used neuroimaging technique (Meijie et al. 2021), thanks to its availability and safety, especially for children. In fMRI, the various neural connections are identified by measuring the temporal correlation between fluctuations in the Blood Oxygen Level Dependent (BOLD) signal of discrete anatomical regions (Nierhaus et al. 2012). The final results are 4-dimensional images that enable the visualization of brain structure and the study of the functional activities of the brain regions. fMRI can be conducted in two settings, resting-state and task-based. Task-based fMRI may be more suitable for achieving high classification accuracies, but large task-based datasets are difficult to acquire: such datasets usually comprise fewer than 50 subjects (Meijie et al. 2021). Moreover, task-based fMRI can be ill-suited for many individuals with ASD, particularly those who would be considered low-functioning or young children (Lau et al. 2019). In contrast, resting-state fMRI requires neither a constrained experimental setup nor the active and focused participation of the subjects. It has also been shown to capture interactions between brain regions that may lead to neuropathology diagnostic biomarkers (Greicius 2008).
Many methods have been used to study the connectivity networks contained in resting-state fMRI images, such as multi-echo independent component analysis (multi-echo ICA) (Kundu et al. 2017), Group-ICA (Tang et al. 2017) and the neighborhood one-class SVM (Yang et al. 2007). These methods yield descriptors that are then used to classify subjects. The main issue with the works cited above is the small size of the data, which prevents the generalization of the descriptors, as addressed by Button et al. (2013) and Mumford (2012). This led the authors of Nielsen et al. (2013), Abraham et al. (2017) and Liu et al. (2019) to use a large database, ABIDE, to overcome the handicap of the reduced samples of previous autism research. In Nielsen et al. (2013), the BOLD signals of the fMRI images of 964 subjects were used to compute 7266 × 7266 connectivity matrices by calculating the pair-wise correlation between ROIs. The 7266 ROIs were formed by clustering seed voxels into Euclidean-close regions separated by at least 5 mm. A leave-one-out approach was then conducted, reaching a classification accuracy of 60%. Later, in Abraham et al. (2017), the authors' approach consisted of intra-site and inter-site classification using cross-validation. They built functional connectivity matrices using multiple atlases for the extraction of the ROIs and three methods for computing correlation. This approach showed good classification results, achieving an accuracy of 67%. However, the pipeline did not consider autism connections specifically, since the connectomes were used with no intermediate processing before the classification. On the other hand, Bolte et al. (2019) concluded that research needs to consider autistic traits to achieve better results in autism identification. The results achieved in Liu et al. (2019) support this point, since the authors were able to observe differences in patterns between healthy and autistic children when local and long-distance deficits were considered.
Theoretically, ASD is characterized by over-connectivity between local brain regions and under-connectivity between long-range regions (Hull et al. 2017). In this paper, we propose an approach that consists of eliminating brain connections in order to verify ASD theories. Our main idea is to extract the connections that correspond to a given theory before eliminating them from the initial connectivity matrices. The resulting connectivity matrices are then used to classify the subjects. The impact of the elimination indicates whether the studied theory holds, relying on the fact that when the removal of specific connections impacts the classification performance, descriptors of the disorder are hidden in those connections. In the present study, we focus on the long-range underconnectivity of autistic brains. For this, we use tools that are technically suited to this theory, namely hierarchical clustering (HC) (Fang 2021) and the minimum spanning tree (MST) (Daintith and Edmund 2008). HC makes it possible to extract local-range and long-range connections, while MST is used to identify the under-connectivity between the long-range regions. It is worth mentioning that we already investigated the underconnectivity deficit in Benabdallah et al. (2018). Here, we extend the investigation to long-range (inter-cluster) underconnectivity, which is a more specific property of the autistic brain. To generalize our results, experiments are conducted on the ABIDE (Autism Brain Imaging Data Exchange) database. ABIDE was created to gather a large base of data to facilitate autism research and to overcome the lack of generalization due to the small size of testing data.
This database provides fMRI images of a large number of subjects from multiple international sites with no prior coordination and with different ranges of age, IQ and gender, among other characteristics. It should be noted that our approach makes it possible to deal with the computational issues related to the large size of the ABIDE database, thanks to the elimination process, which helps to lighten the network.
Hence, in the present work we propose the elimination of brain connectivities as an approach to address three major concerns:
- Investigation of the long-range underconnectivity theory related to ASD.
- Enhancement of the classification of ASD subjects.
- Generalization of the investigation using large datasets, and dealing with the computational demands of such datasets.
The remainder of this paper is organized as follows. Section 2 describes the data and the experimental framework used to classify the subjects and verify the autism theories. Section 3 presents the results of the classification after the application of every method of the approach. In Sect. 4, we discuss the results achieved and the performance of the framework. Finally, a conclusion sums up and presents what was achieved in this paper.

Data
The ABIDE database is partitioned into two datasets, ABIDE I and ABIDE II. We decided to work with ABIDE I because a preprocessed version is available. The latter offers a base for comparison that diminishes the impact of the preprocessing pipeline between research approaches, since, according to Strother (2006) and Hull et al. (2017), the final results of any analysis are highly correlated with the choice of the preprocessing pipeline. Accordingly, the preprocessing pipeline used is C-PAC, which was also used in Abraham et al. (2017). The ABIDE I dataset involved 17 international sites sharing previously collected resting-state functional magnetic resonance imaging (Rs-fMRI), anatomical and phenotypic datasets, made available for data sharing with the broader scientific community. This dataset was the first available for sharing and comprises 1112 subjects (539 individuals with ASD and 573 typical controls) with ages between 7 and 64 years. All the information included is anonymous. The imaging acquisition machines are from different brands, and most fMRI images in the ABIDE database are not treated or corrected in any way. Therefore, the experts of the Nilearn project (Abraham et al. 2014) tested the data and eliminated all damaged images (Abraham et al. 2017). Hence, the new database contains 870 subjects, with 468 controls and 402 autistic subjects.

FMRI images to connectivity graphs
The brain is a complex system with many interactions. These interactions can be visualized in fMRI by capturing the changes in blood oxygen level that are related to the activity of neurons (Li and van Zijl 2020). However, the brain is composed of billions of neurons; hence, clustering them into ROIs is an important step. From a neuroscience perspective, using ROIs instead of voxels has proved more informative (Zafar et al. 2016). Moreover, having the same set of ROIs for all subjects allows easier inter-subject comparisons and reduces the individual anatomical variability that exists in human brains.
As can be seen in Fig. 1, the Automated Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al. 2002) is used for brain parcellation. Time series of these parcels (AAL ROIs) are extracted using an ordinary least squares (OLS) approach from Nilearn (a library whose design patterns are described in Abraham et al. (2014)). Then, these time series are used to compute the correlation matrices between the ROIs of the brain. Here, the correlation matrices are computed using the tangent method, which led to the best classification results in Abraham et al. (2017). Finally, the correlation matrices are transformed into graphs to visualize the connections between the ROIs of the atlas. A graph G = (V, E) consists of vertices V that are connected by a set of weighted edges E ⊆ V × V. In this work, the vertices are the atlas ROIs and the edges are the connections computed in the correlation matrices.
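The matrix-to-graph step can be illustrated with a minimal sketch. It is not the pipeline used in this paper: plain Pearson correlation stands in for the tangent method, the time series are made-up toy values rather than Nilearn output, and the names `pearson`, `connectivity_matrix` and `matrix_to_edges` are hypothetical helpers introduced only for this illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def connectivity_matrix(time_series):
    """Symmetric ROI-by-ROI matrix from a list of per-ROI time series."""
    n_rois = len(time_series)
    matrix = [[1.0] * n_rois for _ in range(n_rois)]
    for i in range(n_rois):
        for j in range(i + 1, n_rois):
            r = pearson(time_series[i], time_series[j])
            matrix[i][j] = matrix[j][i] = r
    return matrix

def matrix_to_edges(matrix):
    """Weighted edge list (i, j, weight) for every ROI pair with i < j."""
    n = len(matrix)
    return [(i, j, matrix[i][j]) for i in range(n) for j in range(i + 1, n)]

# Toy example (assumed data): three ROIs with five time points each.
toy_series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with ROI 0
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
toy_matrix = connectivity_matrix(toy_series)
toy_edges = matrix_to_edges(toy_matrix)
```

With the 116 ROIs commonly associated with the AAL atlas, the same edge list would contain one entry per ROI pair; the vertices are ROI indices and each weight is the corresponding matrix entry.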

Minimum spanning tree
The spanning tree is an unbiased graph theory tool that makes it possible to reduce the complexity of a high-order network structure while preserving its core framework (Li et al. 2017; Guo et al. 2017). Its potential as a powerful method has been brought to light in the analysis of schizophrenia (Anjomshoa et al. 2016). Moreover, its application greatly improved the diagnostic accuracy for Alzheimer's disease (Guo et al. 2017). A spanning tree T of a graph G is a subgraph that includes all vertices of G and connects them without any cycles. To extract a specific spanning tree, a condition is added. For example, to extract the minimum spanning tree, the condition is to keep only the edges with the weakest weights. In this paper, this process is done through a greedy algorithm, the Kruskal algorithm (Horowitz et al. 1978). The greedy choice is to pick the smallest weights while constructing the tree, which results in the minimum spanning tree. For that, the algorithm sorts the edges based on their weights. Then, it builds the tree by adding arcs of increasing cost at each step, skipping any arc that would create a cycle, so that the total weight of the tree's edges is kept to a minimum.
Fig. 1 Steps to construct the connectivity graphs
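The greedy construction described above can be sketched as follows. This is a generic union-find version of Kruskal's algorithm, not the authors' code; the function name `kruskal_mst` and the toy edge weights are assumptions made for illustration, and edge weights are compared as given (whether raw or absolute correlation values should be used is not specified here).

```python
def kruskal_mst(n_vertices, edges):
    """Return the minimum spanning tree as a list of (u, v, weight) edges.

    `edges` is an iterable of (u, v, weight) with vertices in 0..n_vertices-1.
    """
    parent = list(range(n_vertices))

    def find(x):
        # Find the root of x's component, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Greedy step: examine edges from the weakest to the strongest weight.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so no cycle
            parent[root_u] = root_v
            tree.append((u, v, w))
            if len(tree) == n_vertices - 1:
                break
    return tree

# Toy graph (assumed weights): four vertices, all six possible edges.
toy_edges = [(0, 1, 0.9), (0, 2, 0.1), (0, 3, 0.4),
             (1, 2, 0.3), (1, 3, 0.8), (2, 3, 0.2)]
toy_tree = kruskal_mst(4, toy_edges)
```

On this toy graph the tree keeps the three weakest edges that do not close a cycle, which is the set of connections the elimination step later removes.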

Hierarchical clustering
Hierarchical clustering (HC) builds a cluster hierarchy that is commonly displayed as a tree diagram called a dendrogram. In the context of brain connectivity, each ROI initially represents a separate cluster. Then, step by step, the dissimilarity between every two clusters is computed and the two most similar clusters are joined into a single new cluster, until the construction of the final cluster that contains all the ROIs. The idea behind the clustering came from Ghanbari et al. (2015), where a multi-layer graph clustering method made it possible to detect connectivity problems in the autistic brain. Hierarchical clustering was also used to investigate the genotype-to-phenotype relationships in autism spectrum disorders (Chang et al. 2014), using the strength of interactions in the phenotypic network as a metric. However, this metric does not guarantee spatially homogeneous clusters, a requirement that has been found to be advantageous (Blumensath et al. 2013). In this work, clusters are constructed using the Euclidean distance, since it enforces spatial contiguity and because we are dealing with ROIs that are defined in a 3-dimensional space. The Euclidean distance between two points $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$ of a Euclidean n-space is given by:
$$d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$
In addition, to compute the dissimilarity of sets, we use Ward's method since, according to Esztergár-Kiss and Caesar (2017), it offers higher accuracy compared to other methods and minimizes the variance between the elements. In Ward's method, the choice of the clusters to merge at each step is based on the optimal value of an objective function. In its standard form, when clusters $C_i$ and $C_j$ are merged, the distance from the new cluster to any other cluster $C_k$ is:
$$d(C_i \cup C_j, C_k) = \sqrt{\frac{(n_i + n_k)\, d(C_i, C_k)^2 + (n_j + n_k)\, d(C_j, C_k)^2 - n_k\, d(C_i, C_j)^2}{n_i + n_j + n_k}}$$
where $C_i$, $C_j$ and $C_k$ are disjoint clusters with sizes $n_i$, $n_j$ and $n_k$, respectively.
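To make the merging procedure concrete, here is a small pure-Python sketch of agglomerative clustering with Ward linkage on 3-dimensional points, cut at a dissimilarity threshold. It is an illustration under assumptions, not the implementation used in the paper: the function `ward_clusters`, the centroid-based form of the Ward distance and the toy coordinates are all introduced here, and a tested library routine such as those in `scipy.cluster.hierarchy` would normally be preferred.

```python
import math

def ward_clusters(points, threshold):
    """Agglomerative clustering with Ward linkage, cut at `threshold`.

    Merging stops as soon as the cheapest available merge has a Ward
    distance above `threshold`. Returns one cluster label per point.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], tuple(p)) for i, p in enumerate(points)]

    def ward_distance(a, b):
        (members_a, centroid_a), (members_b, centroid_b) = a, b
        n_a, n_b = len(members_a), len(members_b)
        # Centroid form of Ward's distance; equals the Euclidean
        # distance when both clusters are single points.
        scale = math.sqrt(2.0 * n_a * n_b / (n_a + n_b))
        return scale * math.dist(centroid_a, centroid_b)

    while len(clusters) > 1:
        # Pair of clusters with the smallest Ward distance.
        dist, i, j = min(
            ((ward_distance(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if dist > threshold:
            break
        (members_i, centroid_i), (members_j, centroid_j) = clusters[i], clusters[j]
        n_i, n_j = len(members_i), len(members_j)
        merged_centroid = tuple(
            (n_i * ci + n_j * cj) / (n_i + n_j)
            for ci, cj in zip(centroid_i, centroid_j)
        )
        merged = (members_i + members_j, merged_centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    labels = [0] * len(points)
    for label, (members, _) in enumerate(clusters):
        for index in members:
            labels[index] = label
    return labels

# Hypothetical ROI centre coordinates (mm): two tight groups far apart.
toy_coords = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (50, 50, 50), (52, 50, 50)]
toy_labels = ward_clusters(toy_coords, threshold=10.0)
```

Raising the threshold merges more ROIs into fewer, larger clusters; lowering it leaves many small clusters.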

ASD classification
An ASD classifier categorizes the input subjects into a defined number of groups. Here, the first group contains the neurotypical subjects, with label 0. The second group contains the ASD subjects, with label 1. The objective of this classification is to decide the class label of every subject using feature vectors extracted from the correlation matrices computed using the methods presented above. The classifier used is an l2-penalized support vector classifier (SVC) with a p-value of 0.01 and 10-fold cross-validation. SVC is an implementation of SVM provided by the scikit-learn library (Pedregosa et al. 2012). It returns a "best fit" hyperplane that divides the learning data into a fixed number of classes and then predicts the test data.
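To measure the performance of this classification, three metrics are computed, namely the accuracy, the sensitivity and the specificity (Baratloo et al. 2015). In the following, true positive (TP) is the number of ASD subjects correctly classified as ASD, true negative (TN) is the number of TDC subjects correctly classified as TDC, false positive (FP) is the number of cases incorrectly classified as ASD, and false negative (FN) is the number of cases incorrectly classified as TDC. The accuracy shows the classifier's ability to categorize ASD and TDC subjects correctly. It is defined by:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
The sensitivity measures the ability of the classifier to classify the ASD subjects correctly. It is defined by:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
The specificity measures the ability of the classifier to classify the TDC subjects correctly. It is defined by:
$$\text{Specificity} = \frac{TN}{TN + FP}$$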
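The three metrics can be computed directly from true and predicted labels, as in the short sketch below. The function name `classification_metrics` and the toy label vectors are assumptions for illustration; labels follow the convention above (1 for ASD, 0 for neurotypical controls).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = ASD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # ASD found as ASD
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # control found as control
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # control called ASD
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # ASD called control
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy labels (assumed): four ASD subjects followed by six controls.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

In this toy case one ASD subject is missed and two controls are flagged, so the sensitivity (3 of 4) is higher than the specificity (4 of 6) while the accuracy is 7 of 10.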

Proposed approach
As mentioned in the introduction, the originality of the proposed approach is that it is based on the elimination of brain connectivities. The elimination takes its root from the mathematical law of double negation (Hazewinkel 2013), where one can negate certain information in order to prove its veracity. Here, the elimination process suppresses connections extracted using the tools described in Sects. 2.3 and 2.4. It should also be noted that the approach is ASD-specific, since the tools used are technically aligned with autism theories. It first eliminates weak connections to test the under-connectivity of the autistic brain. Then, it separates the brain connectivity map into long-range and local connectivity. Finally, weak connections are eliminated from the long-range connectivity matrices to test the long-range under-connectivity. Figure 2 summarizes all the steps of the proposed approach.

Elimination of the weakest connections of the brain
The under-connectivity theory stipulates that some regions in the autistic brain are under-connected or, in other words, cannot communicate efficiently. Therefore, to verify this theory, weak connections are first extracted from the initial connectivity graphs using the algorithm described in Sect. 2.3. Then, the constructed tree is eliminated by suppressing all its edges from the original graph from which it was extracted. Finally, the weights of the obtained graphs are transformed back into new correlation matrices for the classification. Figure 3 presents all the important steps.
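A minimal sketch of the suppression step is given below. It assumes that "eliminating" an edge means setting its entry in the symmetric matrix to zero, which is an interpretation made here rather than a detail stated in the text; the helper `eliminate_edges`, the toy matrix and the hand-written toy tree are likewise hypothetical.

```python
def eliminate_edges(matrix, edges_to_remove, fill_value=0.0):
    """Return a copy of a symmetric matrix with the given edges suppressed.

    `edges_to_remove` holds (u, v) or (u, v, weight) tuples, for example
    the edges of a minimum spanning tree. Assumption: an eliminated
    connection is represented by `fill_value` (zero by default).
    """
    result = [row[:] for row in matrix]  # leave the input untouched
    for u, v, *_ in edges_to_remove:
        result[u][v] = result[v][u] = fill_value
    return result

# Toy connectivity matrix (assumed values) for four ROIs.
toy_matrix = [
    [1.0, 0.9, 0.1, 0.4],
    [0.9, 1.0, 0.3, 0.8],
    [0.1, 0.3, 1.0, 0.2],
    [0.4, 0.8, 0.2, 1.0],
]
# Minimum spanning tree of this toy graph: its three weakest acyclic edges.
toy_tree = [(0, 2, 0.1), (2, 3, 0.2), (1, 2, 0.3)]
toy_reduced = eliminate_edges(toy_matrix, toy_tree)
```

The reduced matrix keeps the stronger connections (0.9, 0.8, 0.4) and can be passed to the classifier in place of the initial matrix.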

Long-range and local connectivity matrices
In this step, we first create clusters of the AAL atlas ROIs using SciPy (a Python-based ecosystem of open-source software for mathematics) (Oliphant 2007). The procedure first allocates each ROI to a separate cluster, then merges clusters using the equations described in Sect. 2.4 until the creation of the final cluster that contains all the ROIs (Fig. 4 shows the dendrogram created in this step). Then, a dissimilarity value is fixed to extract the clusters, before extracting the connections within and between these clusters. By eliminating the intra-cluster connections we obtain the long-range connectivity matrices, and by eliminating the inter-cluster connections we obtain the local-range connectivity matrices; both are used for classification.
Depending on the value of the dissimilarity measure, the number of ROIs in the clusters changes. This can be seen in Fig. 4. Therefore, multiple dissimilarity measures have been tested to bring out their impact on the results. Their values and results are reported below in the experimental results section.
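Given one cluster label per ROI, the separation into local and long-range matrices can be sketched as below. As before, this is an illustration under assumptions: eliminated entries are set to zero, the diagonal is dropped, and the function `split_by_clusters`, the toy matrix and the toy labels are introduced only for this example.

```python
def split_by_clusters(matrix, labels, fill_value=0.0):
    """Split a connectivity matrix into local and long-range parts.

    `labels[i]` is the cluster of ROI i. The local matrix keeps only
    intra-cluster connections, the long-range matrix keeps only
    inter-cluster connections; everything else is `fill_value`.
    """
    n = len(matrix)
    local = [[fill_value] * n for _ in range(n)]
    long_range = [[fill_value] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-connections carry no information here
            if labels[i] == labels[j]:
                local[i][j] = matrix[i][j]
            else:
                long_range[i][j] = matrix[i][j]
    return local, long_range

# Toy matrix (assumed values) and a toy clustering of four ROIs.
toy_matrix = [
    [1.0, 0.9, 0.1, 0.4],
    [0.9, 1.0, 0.3, 0.8],
    [0.1, 0.3, 1.0, 0.2],
    [0.4, 0.8, 0.2, 1.0],
]
toy_labels = [0, 0, 1, 1]  # ROIs 0-1 in one cluster, ROIs 2-3 in another
toy_local, toy_long = split_by_clusters(toy_matrix, toy_labels)
```

Changing the dissimilarity value changes the labels, and therefore which entries fall into each of the two matrices.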

Long-range under-connectivity
As mentioned in the introduction, the main goal of our study is to verify the under-connectivity between long-range regions by eliminating the weak connections from the inter-cluster connectivity matrices. For this, we first extract the long-range connections from the initial graphs using the clusters constructed by the hierarchical clustering. Then, we eliminate the minimum spanning trees from the resulting graphs. Finally, we transform the newly composed graphs into matrices for the classification. All these steps are presented in Fig. 5, where a decrease in the number of edges after the extraction of the clusters' long-range connections is notable ((a) to (b)). The same decrease is also visible in the matrices (fewer connections in the long-range connectivity matrix than in the initial matrix, thanks to the clustering). From (b) to (c), the elimination of the MST leads to a decrease in the number of the weakest long-range connections (blue edges), which can also be seen in the associated connectivity matrix, where fewer blue points are represented compared to the other matrices.

Experimental results
In this work, the connectivity of autistic brains is studied in a highly heterogeneous setting and on large test datasets. The methods of our approach are first applied to the whole ABIDE I dataset. Then, to verify the impact of different criteria (age, gender, handedness) on the results, the same methods are also applied to different sub-datasets. Information on these sub-datasets is given in Table 1.
As mentioned before, the proposed approach starts by computing the correlation matrices of the AAL atlas ROIs. Then, our approach's methods are applied to construct the new connectivity matrices as reported in Sect. 2.6. Later, every group of matrices is classified separately. The performance measures of these classifications are stated in the tables below.
Finally, to highlight the performance of our approach, we compare it with the most influential state-of-the-art approaches, namely Nielsen et al. (2013) and Abraham et al. (2017). A brief description of those approaches and the proposed one is given as follows:
- Nielsen et al. (2013): This method uses a leave-one-subject-out cross-validation for classification on the ABIDE subjects' correlation matrices.
[...] slight increase in the accuracy values of sub-datasets D1, D2, and D5 but in D3 and D4 the accuracy decreases. However, an important result shows up: the sensitivity decreases in all datasets regardless of the accuracy value. The same result was also found in Benabdallah et al. (2018) on sub-datasets built with different criteria, which means that the eliminated connections contain information about the autistic brain connectivity deficit. These results are reported in Table 2, which compares the classification of the initial matrices with the classification after the elimination of the MST values.
In the second method (HC-LRM), we divide the brain into several clusters. Hierarchical clustering makes it possible to create different groups of clusters depending on the value of the dissimilarity measure. This value is chosen empirically by testing different values to extract different clusters of ROIs. Then, using these clusters, we construct the long-range and local connectivity matrices of the first dataset D1 and classify them separately. The metrics of this classification are reported in Table 3, where we notice an increased accuracy of the inter-connectivity matrices classification using dissimilarity values of 80, 90 and 100, with a mean of 6-8 ROIs per cluster. However, after tests on the other datasets, 80 remains the most suitable dissimilarity value. Another point that stands out in Table 3 is that the inter-connectivity (HC-LRM) leads to better accuracy results than the intra-connectivity (HC-LM).
The highest intra-connectivity classification result in the whole dataset (D1) is 65.09%. It was achieved with a 300 dissimilarity distance value (3 clusters of ROIs). But this value is still lower than the accuracy achieved without clustering (68.13%). That is why we focus on the inter-connectivity in the remainder of this work.
The results of Table 4 stem from classifying the inter-connectivity matrices of clusters constructed using a dissimilarity value of 80. They show an improvement of the three classification metrics (accuracy, sensitivity and specificity) of the inter-connectivity matrices when compared to the initial matrices in all datasets. In other words, the elimination of the clusters' local connections yields better results even when using datasets built with different criteria. This supports the brain's communication deficit in autism, and precisely the long-range deficit.
From the results presented in Tables 2 and 4, we can conclude that both the elimination of the minimum spanning tree and the elimination of the HC clusters' local connections had an impact on the classification. When combined (eliminating the local connections, then removing the MST from the resulting inter-cluster connections), they lead to the same decrease in sensitivity as reported in Table 2 after removing the MST from the initial matrices. In other words, the elimination of the minimum spanning tree has a direct impact on the sensitivity (the ability to predict ASD), which means that the weakest connections extracted using the MST contain information about autism. Therefore, after their elimination the sensitivity always drops. Table 5 summarizes those results. Comparing the methods of our approach with the state-of-the-art approaches applied to ABIDE I, reported in Table 6, we can see an increase in the accuracy value, which reaches 69.76%, about 3 percentage points higher than Abraham et al. (2017), where the best result was 66.80%. We also note an increase in sensitivity in the long-range classification compared to the highest result achieved by Nielsen et al. (2013). Finally, the impact of eliminating the MST is highlighted when comparing the sensitivity values of all methods.

Discussion
In this work, we studied the autistic brain connectivity in a highly heterogeneous setting and on large test datasets from ABIDE I. The problem with using ABIDE is the uncontrolled heterogeneity due to the multi-site fMRI data acquisition, which influences the results. This uncontrolled heterogeneity poses great challenges for the development of brain-based classifiers for psychiatric illnesses in general. Sources of uncontrolled variation across sites also include the scanner type, pulse sequence and sample composition. But since our objective is to simplify the process of autism detection and since the heterogeneity is part of the real setting, we used the datasets with few changes. Only the images with quality issues were eliminated. That is why using sub-datasets with certain criteria helps to mitigate those challenges and might even give more weight to findings from the whole dataset.
Our objective was to verify the ASD theories and also to improve the detection process of autism. Our approach consisted of suppressing brain connections that are related to a given theory. For this, we used and combined specific tools to extract those connections. We chose the minimum spanning tree because it makes it possible to simplify a complex network into a simple one based on weights. Moreover, MST is consistent with the under-connectivity theory of the autistic brain, since it extracts the weakest connections.
Besides, we used hierarchical clustering to divide the brain regions into clusters. By doing so, we verified the intra-and inter-connectivity of the brain as in Ghanbari et al. (2015) where the use of clustering led to the discovery of deficits in ASD brain connectivity.
Moreover, since autism affects children in the early years of childhood, we searched for the most common and simple imaging acquisition method that can capture brain connectivity without causing any damage to the brain and that has the advantage of availability and low cost compared to other imaging methods. Then, we tested multiple ASD-specific methods and combined them to improve the accuracy of ASD detection. The accuracy obtained is 69.76% on the whole dataset D1, as can be seen in Table 5, which exceeds previously published ABIDE findings (Abraham et al. 2017; Nielsen et al. 2013).
The elimination of the minimum spanning tree led to a diminution in the sensitivity in all datasets. Since the sensitivity mirrors the classification of the autistic subjects, its decrease implies that the eliminated weak connections contain biomarkers relative to autism, and supports the autism under-connectivity theories. The result of this elimination was reported in Table 2.
On the other hand, the elimination of intra-connectivity (the clusters' local connectivity) using the clusters constructed with the hierarchical clustering improved accuracy and sensitivity. The sensitivity increased from 61.9 to 63.07%, which indicates a long-range connectivity deficit in ASD. This can be seen in Table 4, which reports the elimination of intra-connectivity in all datasets. The importance of the choice of the dissimilarity distance used to construct the clusters was also highlighted by testing multiple values. A good choice can strongly improve the prediction accuracy and sensitivity, as seen in Table 3.
Later, when the minimum spanning tree was again eliminated from the long-range connectivity matrices, the above findings of Table 2 were confirmed in Table 5, where, again, the elimination of the weak connections extracted from the inter-connectivity matrices showed a decrease in the sensitivity. This gives more weight to the under-connectivity deficit in autism, especially between the long-range regions. Hence, we conclude the existence of the long-range underconnectivity deficit in the autistic brain. In this study, the results showed evidence of a deficit in the long-range connectivity between the regions of the autistic brain that is independent of age, gender and handedness. This was confirmed by the classification results of the sub-datasets, which showed the same pattern as the classification applied to the whole dataset. An accuracy of close to 70% indicates that the proposed approach is very promising and leads to accurate prediction. The elimination of intra-connectivity enhances the accuracy, while the elimination of the MST impacts the sensitivity. Their combined use highlighted the long-range under-connectivity and yielded better results compared to the state of the art, as can be seen in Table 6. However, there are still other theories that we could not verify, such as the over-connectivity reported in Ghanbari et al. (2015), Supekar et al. (2013) and Keown et al. (2013). Such connectivity might be between specific ROIs of the brain and should be studied separately.

Conclusion
In this paper, we suggested a novel approach based on ASD-adaptive methods to verify autism theories about brain connectivity. Our aim was to improve the accuracy of the detection of this disorder while keeping up with the advances in autism research. We used ABIDE I, a large dataset with data from different sites around the world. We applied the minimum spanning tree and hierarchical clustering to extract the weak connections and the local/long-range connections, respectively. A combination of these two methods was used to construct connectivity matrices to verify the long-range underconnectivity. From the results, we were able to detect a long-range connectivity deficit and showed that it is independent of age, gender and other criteria such as handedness. Furthermore, an under-connectivity problem was highlighted by the pattern found in the specificity and the sensitivity values. This holds promise for the future detection of decisive biomarkers of autism that will help in understanding and diagnosing this disorder.
Funding Not applicable.

Data Availability
The resting state fMRI images data used to support the findings of this study are from the ABIDE I dataset that have been cited. The preprocessed version of the data is available at: http://preprocessedconnectomes-project.org/abide/Pipelines.html.