## Subjects

A total of 177 participants including 60 typical development (TD) subjects (50 males/10 females, mean age = 11.8 years, standard deviation = 2.8), 29 ASD subjects (24 males/5 females, mean age = 11.5 years, standard deviation = 2.6), 54 ADHD-Combined subjects (45 males/9 females, mean age = 11.2 years, standard deviation = 2.5) and 34 ADHD-Inattentive (28 males/6 females, mean age = 11.7, standard deviation = 2.5) matched in age and gender were enrolled in this study. The differences in gender and age were tested using Chi-square test and analysis of variance (ANOVA) across groups. All the data was accessed from NYU Langone Medical Center’s dataset in which participants were recruited in New York City and surrounding areas and scanned using the same MRI and parameters (https://fcon_1000.projects.nitrc.org/indi/abide/abide_I.html). The study for ASD and ADHD was approved by the local Ethics Committee, and the written informed consent/assent in accordance with the NYU-SOM IRB was provided and obtained. Although ABIDE has a lot of ASD data, we only kept ASD and ADHD data which was acquired from the same site to exclude effects of different MRI scanner and scan parameters for further analyses. Although there have some harmonization approaches to eliminates the infulences of different scanner and scanner parameters, we believe that data from the same center using the same scanning protocol is more controllable and convincing. Given that NYU dataet has the largest ASD data in single site, thus we chose data from NYU for analyses. Since most of ASD subjects from NYU are co-morbid with ADHD, 29 ASD subjects were finally selected in our study by excluding the ASD subjects comorbid with ADHD.

## Clinical assessments

For all the participants, the full intelligence quotient (FIQ), verbal IQ (VIQ), and performance IQ (PIQ) are assessed using the subtests of the Wechsler Abbreviated Scale of Intelligence (WASI), and the handedness was assessed by 22 items Edinburgh Handedness Inventory. Body Mass Index (BMI) was determined by measuring body weight and height at the initial visit only for TD and ASD. For ASD subjects, the clinical performances including Autism Diagnostic Interview–Revised (ADI), Autism Diagnostic Observation Schedule (ADOS), and Vineland Adaptive Behavior Scales-Second Edition (VABS) were evaluated (Pepperdine and McCrimmon, 2018). ADI consists of social (reciprocal social interaction subscore), verbal (abnormalities in communication subscore), restricted, repetitive, and stereotyped patterns of behavior subscore (RRB), and onset (abnormalities of behavior evident at or before 36 months subscore) scores. ADOS consists of total (classic total score), social affect (social affect subscore), and RRB (restricted and repetitive behavior subscore) scores. VABS consists of communication, daily living, and social scores. For ADHD subjects, the clinical performances including ADHD index, Inattentive, and Hyper/Impulsive scores were acquired.

## Resting-state fMRI data acquisition

The resting-state fMRI was scanned using a SIEMENS 3.0T (MAGNETOM Allegra syngo) MRI machine. Most subjects were instructed to relax and to keep their eyes open while a few of them closed their eyes in a few cases. The fMRI images were scanned with the following parameters: repetition time = 2000 ms, echo time (TE) = 15 ms, flip angle = 90 degrees, voxel size = 3×3×4 mm3, slices = 33, SNR = 1, 180 measurements. The details for the scanning information could be found in a previous study (Pehlivanidis et al., 2021)

### Resting-state fMRI data preprocessing

Resting-state fMRI data were preprocessed as follows: 1) removing the first 10 volumes to avoid magnetization effects; 2) remaining volumes were realigned to the first volume to correct head motion; 3) registration to the EPI template and resampled to 3 × 3 × 3 mm3; 4) smoothing the images with 6mm full-width at half maximum (FWHM) Gaussian kernel; 5) detrending and regressing nuisance covariates including Friston-24 head motion parameters, global mean, white matter, and cerebrospinal fluid signals; 6) filtering with band path of 0.01–0.1 Hz; 7) scrubbing with cubic spline interpolation for frame-wise motion correction with mean frame displacement (FD) > 0.5 mm. To eliminate head motion effects, the subjects with head motion exceeding one voxel and the mean frame-wise displacement > 0.5 were excluded. FD differences were tested using ANOVA with *p* < 0.05 and post-hoc between groups differences were tested using two-sample t-tests with *p* < 0.05 with false discovery rate (FDR) corrected.

### Define subject-specific 17 functional networks

In this study, we used a spatially regularized NMF to delineate subject-specific functional networks (Li et al., 2017). Mounting evidence has demonstrated that the human brain could be stably and reproduceablely parcellated into 7 or fine-grained 17 functional networks for visual, somatomotor, limbic, dorsal and ventral attention, frontoparietal, and default mode networks (Yeo et al., 2011). In this study, regularized NMF method was employed to define 17 large-scale brain functional networks in each individual for further analysis. Before decomposition, a linear shift for each voxel’s time series was performed to make the values of all the time points nonnegative. Then, the time series were normalized with its maximum value to change the values of all time points within the range of 0–1. After that, the individual functional network mapping was performed as follows (details see Figure S1 in Supplementary Materials): 1) group network initialization: we first constructed a matrix with 8500-time points (170-time points for each subject) and 67541 voxels (whole brain) based on 50 randomly selected subjects and decomposed this matrix with an alternative optimization method and random nonnegative initialization to generate network time series matrix and network probability matrix (Lee and Seung, 1999). The network probability matrix had 17 rows (17 networks) and 67541 columns (67541 whole voxels), indicating the probability of each voxel belonging to each network. We repeated this step 50 times to enhance robustness and obtained 50 network probability matrices; 2) group network atlas creation: we used spectrum clustering method to make 50 network probability matrix into one consensus probability matrix (Jianbo and Malik, 2000). The size of consensus probability matrix is the same as the network probability matrix, and served as the group network atlas; 3) personalized network definition: each subject’s whole functional time series matrix was decomposed using regularized NMF and generated the individual’s 17 functional networks with the group atlas generated in step 2 as a prior. The details for individual functional network mapping can be found in previous studies (Cui et al., 2020; Zhang et al., 2022), and the codes to define individual networks can be found in this linkage (https://github.com/hmlicas/Collaborative_Brain_Decomposition).

To test whether the reconstructed signal can restore the original signal, a Pearson’s correlation coefficient was calculated between original and reconstructed signals for each voxel of the whole brain in each subject. Each voxel was assigned to one of the 17 networks in which this voxel has the maximum load. The mean correlation coefficient was calculated across all voxels and subjects for each network characterizing reconstruction accuracy.

To further quantify individual differences of the 17 functional networks in the brain, the median absolute deviation was taken as an indicator to evaluate the variability of the brain functional networks across subjects as previous studies (Cui et al., 2020; Zhang et al., 2022). First, a load matrix (67541 × 17) representing the loadings of each voxel in 17 networks was acquired using NMF method as stated above, and the median loading of each voxel in each network was computed across all the subjects. Then, the absolute deviation between the load of this voxel in each subject and the median loading was computed. Finally, the average value of the median absolute deviation across all the 17 networks was used to evaluate the variability of the brain functional networks.

## Node-wise and edge-wise FNC analyses of the 17 networks

The largee-scale functional network connectivity (FNC) was measured by calculating the Pearson correlation coefficient between time courses. In our study, both node-wise and edge-wise FNC were analyzed in TD, ASD, and ADHD. For node-wise network topological analysis, a 17×17 FNC matrix was calculated for each subject. For edge-wise functional network analysis, we first obtained all edges, i.e. functional connectivities of any pair of the 17 networks. Then, the time courses of all 136 (17×(17 − 1)/2) edges were acquired to calculate the edge-wise functional network. Suppose that \({x}_{i}=[{x}_{i}\left(1\right), {x}_{i}\left(2\right),\dots , {x}_{i}\left(T\right)]\) and \({x}_{j}=[{x}_{j}\left(1\right), {x}_{j}\left(2\right),\dots , {x}_{j}\left(T\right)]\) are time series of network *i* and *j* respectively, the node-wise functional connectivity is defined as Pearson’s correlation coefficient between *x**i* and *x**j*. For edge-wise functional connectivities, edge time series \({e}_{i}=[{e}_{i}\left(1\right), {e}_{i}\left(2\right),\dots ,{e}_{i}(T\left)\right]\) are calculated as \({e}_{i}\left(t\right)={x}_{i}\left(t\right)\bullet {x}_{j}\left(t\right)\), and repeating this procedure for every pair of networks to obtain all the 136 edges’ time series. Then, the functional connectivities between any pair of edges were computed and a 136×136 matrix representing edge-wise functional network was obtained.

## Node-wise and edge-wise large-scale functional connectivity differences

After obtain node-wise and edge-wise functional connectivity matrix, the node-wise and edge-wise functional connectivity differences between any groups of TD, ASD, ADHD-Combined, and ADHD-Inattentive were analyzed. ANCOVA with FD as covariate was first used to identify the difference in each functional connectivity across all the groups with *p* < 0.05. If significant differences found, post-hoc two-sample t-tests were used to determine between-group differences in connectivity strength with the significant level of *p* < 0.05, FDR corrected.

## Graph theory based network attributes analyses

Graph theroy was used to analyze global and nodal network parameters to determine differences in brain topological organization. Both node-wise and edge-wise network topological properties were investigated. Before analyses, both node-wise and edge-wise functional connectivity matric were threshold with *p* < 0.05 to reserve significant connectivities. In this study, we did not use different sparity values to threshold for network analyses since the node-wise connectivity matrix only contains 17 node. The small-world property including normalized clustering coefficient (*γ* ) *> >* 1, normalized characteristic path length (*λ*) ≈ 1, and smallworldness (*δ*) *>* 1 was first evaluated before calculation of global parameters (Watts and Strogatz 1998). When meeting the small-world criterion, the global and nodal topological parameters including shortest path length (*Lp*), global efficiency (*Eglob*), local efficiency (*Eloc*), clustering coefficient (*Cp*), assortativity (*r*), modularity (*Q*), betweenness centrality (*B**e*), and degree centrality (*D**eg*) were computed. The formula for these attributes calculation are shown in Supplementary Materials (section of Network attributes calculation). The significant differences in network attributes were determined using ANOVA with *p* < 0.05 among all the groups, and post-hoc two-sample t-tests were applied to determine between group differences with the significant level of *p* < 0.05, false discovery rate (FDR) corrected.

## Edge-wise community detection and functional connectivity analysis

To explore whether the 136 edges could be futher grouped to specific communities, a modified k-means algorithm was applied to clustering the connectivity matrix of the 136 edges (Faskowitz et al., 2020). The details for edges’ community detection were as follows: 1) modularity analysis using a spectral optimization algorithm of the connectivity matrix derived from the 136 edges was adopted to identify the number of optimal modules; 2) a modified k-means algorithm was used to group the 136 edges into different communities; 3) to link different edges’ communities with cortical networks, the total numbers of each functional network belonging to a specific community was calculated (for each edge of the 136 edges, it connects two of the 17 functional networks, when it belongs to a specific community, the connected two functional networks were considered to belong this community);

To determine whether functional connectivities between different communities could differentiate TD, ASD, ADHD-Combined, and ADHD-Inattentive, functional connectivity defined using Pearson’s correlation coefficient was computed between any two communities. ANCOVA with FD as covariate was used to identify the difference in each functional connectivity across all the groups with *p* < 0.05, and post-hoc two-sample t-tests were used to determine between-group differences in connectivity strength with the significant level of *p* < 0.05, FDR corrected.

## Top and bottom amplitude co-fluctuation analysis

A recent study demonstrated that only a small fraction of frames exhibiting the strongest co-fluctuation amplitude could explain the overall pattern of connection and act as the primary driver of resting-state functional connectivity (Zamani Esfahlani et al., 2020). Thus, we examined the top 5% and bottom 5% amplitude co-fluctuation frames of each subject’s time series to explore whether they can effectively discriminate different disorders. We computed the root sum square (RSS) of the top and bottom 5% volumes’ time series. ANCOVA with FD as covariate was executed to identify the difference in top and bottom 5% amplitude co-fluctuation across all the groups with *p* < 0.05. Post-hoc two-sample t-tests were used to determine between-group differences in top and bottom 5% amplitude co-fluctuation with the significant level of *p* < 0.05, FDR corrected.

## The transition analysis of high-normal-low amplitude frames

To determine whether different groups have different numbers of transitions among high, normal, and low amplitude frames, we divided the whole time series of the 136 edges into three levels of frames according to the magnitude of amplitude: bottom 5% frames (the lowest 5% amplitude frames), normal frames (amplitude from 5–95%), and top 5% frames (the highest 95% amplitude frames). We calculated the number of transitions of the whole time series from bottom to bottom stage, bottom to normal stage, bottom to top stage, normal to bottom stage, normal to normal stage, normal to top stage, top to bottom stage, top to normal stage, and top to top stage. ANCOVA with FD as covariate was performed to identify the difference in the number of transitions across all the groups with *p* < 0.05. Post-hoc two-sample t-tests were used to determine between-group differences in the number of transitions with the significant level of *p* < 0.05, FDR corrected.

## Correlation analyses

To identify the associations between changed neuroimaging measurements and clinical performances, correlation analyses using Pearson’s correlation coefficients were performed. The significance level was set at *p* < 0.05 corrected with the FDR method.