Functional Imaging Derived ADHD Biotypes Based on Deep Clustering May Guide Personalized Medication Therapy

Attention deficit hyperactivity disorder (ADHD) is one prevalent neurodevelopmental disorder with childhood onset, however, there is no clear correspondence established between clinical ADHD subtypes and primary medications. Identifying objective and reliable neuroimaging markers for categorizing ADHD biotypes may lead to more individualized, biotype-guided treatment. Here we proposed graph convolutional network plus deep clustering for ADHD biotype detection using functional network connectivity (FNC), resulting in two biotypes based on 1069 ADHD patients selected from Adolescent Brain and Cognitive Development (ABCD) study, which were well replicated on independent ADHD adolescents undergoing longitudinal medication treatment (n=130). Interestingly, in addition to differences in cognitive performance and hyperactivity/impulsivity symptoms, biotype 1 treated with methylphenidate demonstrated significantly better recovery than biotype 2 treated with atomoxetine (p<0.05, FDR corrected). This imaging-driven, biotype-guided approach holds promise for facilitating personalized treatment of ADHD, exploring possible boundaries through innovative deep learning algorithms aimed at improving medication treatment effectiveness.


INTRODUCTION
Disease heterogeneity has been a critical challenge for precision diagnosis and treatment, especially in neuropsychiatric diseases.As a highly complex and heterogeneous disorder with childhood onset 1 , attention deficit hyperactivity disorder (ADHD) is usually divided into three subtypes based on core specific symptoms (ICD-9/10;DSM-4/5) 2,3 , i.e., Hyperactivity-Impulsivity (ADHD-HI), Inattention (ADHD-I) and the Combined (ADHD-C) subtype 2 .Meanwhile, stimulant medications (e.g., methylphenidate (MPH), amphetamine) and non-stimulant medications (e.g., atomoxetine (ATX)) are two main categories of medications approved for treating ADHD.However, currently there is no clear medication treatment plan corresponding to each subtype, which is decided mainly relying on clinical experience and not always being effective.Reports have demonstrated that MPH can equally improve inattention symptoms in both ADHD-C and ADHD-I subtypes, while predominantly reducing hyperactivity in the former 4 .A rat study reported that both MPH and ATX improved overall attention, but the highest dose of MPH significantly increased impulsivity, while ATX induced decrease in impulsivity 5 .Therefore, for ADHD patient with pure inattention, the optimal medication selection is challenging in clinical practice.Thus, the identification of imaging-based ADHD biological subtypes (biotypes) that may exhibit greater sensitivity to specific drug therapies holds promise for guiding a more personalized treatment approach with enhanced effectiveness.Meanwhile, multiple functional magnetic resonance imaging (fMRI) analyses have been performed to study ADHD subtype [6][7][8][9] , though in a relatively highly piecemeal way, which most relied heavily on prior clinical subtype divisions based on pure symptoms 3 and highlighted group-level differences via univariate statistic methods [10][11][12] .For example, ADHD-C were found to have hyper-connectivity in the pre-default mode network and lower network modularity compared to health controls (HC), whereas no such differences were found in the ADHD-I subtype 13 , whilst ADHD-C/HI to ADHD-I were often compared on the level of comorbidity, treatment response, and possible etiological factors 14 .
In contrast, understanding ADHD heterogeneity how defined by multimodal neuroimaging and demographic features rather than symptoms can offer a novel perspective for unraveling the ADHD complexity and pathology, as it may provide critical information for developing a more targeted, effective treatment program based on objective measures.The existing studies on ADHD biotypes encompass investigations into brain function, structure and neuropathology using diverse methodologies [15][16][17][18] .For instance, Fair et al. adopted graph theory and community detection to clarify neuropsychological heterogeneity in typically developing children and children with ADHD, which yielded four unique subgroups each of which had unique patterns of factor scores 19 .Dias et al. performed community detection analysis and characterized heterogeneity in children with and without ADHD based on reward system connectivity 20 .These studies helped disentangle the disease heterogeneity, however, the replicability and utility of the identified ADHD biotypes require further research.
Motivated by this, we aim to develop innovative MRI-compatible, deep learning approaches to identify reliable ADHD biotypes in addressing the afore-mentioned medical issues.Deep learning (DL) has gained wide attention in mental disorder analysis due to its capability in automatically learning useful feature expression from high dimensional neuroimaging data, though its application in ADHD is still rare.The time is ripe to capitalize on these sophisticated methodologic advances and foster the convergence between clinical psychiatry and imaging neuroscience.In biotype division, compared to traditional clustering methods, deep clustering is able to learn nonlinear properties, represent low-dimensional embeddings from highdimensional neuroimaging input, and constrains these learned features to adapt to the clustering target, which maximizes the between-class distance and minimizes the within-class distance [21][22][23][24] .Specifically, both convolutional neural network 25 and generative adversarial network 26 have been employed to identify biotypes, e.g., from mild cognitive impairment.However, one issue is that clustering methods often ignored the topological information of population networks [21][22][23][24] , i.e., complex associations between individuals were not fully explored, which may lead to sub-optimal performance in biotype identification.Considering graph convolutional network (GCN) is able to encode topological information of population networks to improve disease prediction 27 , and has performed well for community detections such as social networks 28 and protein interaction networks 29 , we are motivated to integrate deep learning advances such as GCN and deep K-means to identify reliable, multi-factorial signatures for imaging-derived ADHD biotypes; so we can take an initial step towards developing biotype-guided treatment recommendations.Furthermore, public imaging data of children offers unprecedented opportunities for ADHD research.With the aid of publicly shared data from Adolescent Brain Cognitive Development (ABCD) 30 study and local longitudinal medication treatment data, we have an excellent platform with sufficient statistical power and enriched multi-factors (fMRI, demographics, cognition, symptoms) for testing and validating the proposed method and the identified ADHD biotypes.
Consequently, we proposed a graph convolution network for biological subtype detection (GCN-BSD) using both functional imaging and non-imaging phenotypic data.As illustrated in Fig. 1, we first construct a population graph based on functional network connectivity (FNC, Fig. 1a) and phenotypic information (age, gender) to build individual mappings; where FNCs serve as the feature of the nodes and the similarities between two subjects, which were abstracted from gender and age serve as edges (see detailed FNC and network derivation in Method section).Then we applied GCN-BSD to learn embeddings that are both groupdiscriminative between ADHD and controls, as well as adapted to the clustering constraint through K-Means loss (Fig. 1b).We selected 1069 ADHD patients from 11875 children aged 9-11 from the ABCD study as the discovery dataset, identified K=2 for biotype division via the cluster sum of square (CSS) using the elbow method 31 , and evaluated the clustering performance of 4 popular algorithms with proposed method based on Davies-Bouldin Index (DBI) and Calinski-Harabasz Index (CHI) 32 (Fig. 1c).As a result, two ADHD biotypes were identified, manifesting with different FNC patterns and distinguishing cognitive abilities.
More importantly, we used a valuable ADHD dataset with longitudinal medication treatment to validate the reliability and utility of the identified ADHD biotypes from ABCD.This data was collected from Peking University Sixth Hospital (PKU6, 130 ADHD, 105 controls, aged 9-15) with fMRI, cognitive metrics, and symptom records.Interestingly, we found that ADHD biotypes identified in ABCD and PKU showed high similarity and replicability in FNC patterns.The most contributing FNCs and clinical records were compared dedicatedly between two biotypes (Fig. 1d).Specifically, biotype 1 presented milder symptoms while biotype 2 manifested more severe hyperactivity/impulsivity symptoms and worse cognitive levels.Finally, we compared the symptom relief and treatment outcome of two biotypes from 44 ADHD patients either treated by MPH or ATX at PKU6 according to our division (Fig. 1e).Interestingly, biotype 1 treated with methylphenidate (MPH) showed significantly better recovery than biotype 2 treated with atomoxetine (ATX) ( < 0.05).Collectively, this is a novel attempt to identify and validate ADHD biotypes that might facilitate ADHD personalized treatment, pushing the boundaries of what is possible through innovative deep learning algorithms and assisting in clinical drug selection to enhanced cure rates.

RESULTS
Two ADHD biotypes were identified, and the number of clusters was decided by the within-cluster sum of squared errors (SSE) using the elbow method 31 , as shown in Fig. 2a.Through plotting the SSE as a function of the number of clusters  and picking the elbow of the curve, we chose  = 2 as the optimal number of clusters/subtypes.The performance of our GCN-BSD for significant characteristics along with conventional methods and deep methods with the input of FNC was reported in Table 1 and Fig. 2b.The clustering performance is evaluated concerning two standard measures: Davies-Bouldin Index (DBI) and Calinski-Harabasz Index (CHI).As shown in Fig. 2b, GCN-BSD reached the lowest DBI and highest CHI.Meanwhile, GCN-BSD achieved the smallest P-values in several cognitive characteristics.In general, compared with the traditional clustering methods and naïve deep learning methods, GCN-BSD achieves better performance by its abstracted features and GCN module.The two biotypes and demographically matched healthy controls were compared in cognitive measures (Table 2).Overall, across all ten cognitive scores, HC has the highest mean values, and biotype 1 has the lowest mean values among the three groups (Fig. 2c).The two biotypes are significantly different in seven out of 10 cognitive measures.Notably, fluid cognition (  = 3.09 × 10 !" ) and total cognition (  = 2.61 × 10 !# ) showed the most significant difference between the two biotypes.

ADHD biotypes replicated in PKU cohort
We further tested whether the identified connectivity pattern was consistent in the independent data cohort.By projecting the identified ADHD biotypes from ABCD into the PKU dataset, subjects were divided into 2 subsets.We detected the top 100 discriminative connectivity from 1378 FNCs in two biotypes by using two-sample t-tests to estimate the group differences of FNC features.To explicitly characterize the contribution of each functional network, we grouped the 53 nodes into 7 networks according to the NeuroMark pipeline, as shown in Fig. 3a and 3b.Connectivity patterns of biotype 1 and biotype 2 were highly replicable from ABCD to PKU (correlation:  = 0.75,  = 5.66 × 10 !$% for biotype 1,  = 0.76,  = 3.48 × 10 !$% for biotype 2), and regions in the DM and CC exhibited the greatest involvement.For biotype 1, the results showed that the connectivity among CC, DM, SC, SM, and VI is overrepresented and repeatable.While for biotype 2, apart from the connectivity among multiple networks and VI, the overrepresented connections between AU and CC, DM, between CC and CB are also replicated between datasets.Then we expanded these to the whole FNC, the mean differences between biotypes and HCs in PKU   In PKU, two-sample t-tests were used to estimate the group differences of the IQ scale and symptom scale, see Table 3. Performance IQ measured by the Wechsler Child/Adult Intelligence Scale distinguished itself ( = 0.01).Further, the severity of ADHD symptoms was evaluated by the ADHD Rating Scale-IV (ADHD RS-IV) and Conners' Parent Rating Scale (CPRS) 33 .And we noticed that hyperactive/impulsive symptoms and total ADHD symptoms from RS-IV, and impulsivity-hyperactivity factor and hyperactivity index from CPRS showed significant differences between biotype 1 and biotype 2 ( = 4.97 × 10 !( for RShyperactive/ impulsive,  = 3.82 × 10 !& for RS-total,  = 8.67 × 10 !& for CPRS-impulsivityhyperactivity,  = 0.01 for CPRS-hyperactivity index).

FNC pattern consistency across datasets.
Moreover, as shown in Fig. 3c and d, we extracted the overlapping FNCs between ABCD and PKU, resulting in 30 FNCs for biotype 1 and 27 for biotype 2. Both in the biotype 1 and biotype 2, the connections between the superior parietal lobule and posterior cingulate cortex, precuneus in DM, between the precuneus and anterior cingulate cortex, superior frontal gyrus, between the precentral gyrus and superior frontal gyrus are overrepresented.Separately, the key nodes in biotype 1 are the paracentral lobule, superior parietal lobule, cerebellum, frontal gyrus, and precentral gyrus, while in biotype 2 those are fusiform gyrus, posterior cingulate cortex, precuneus, anterior cingulate cortex and cerebellum.For biotype 1, the connectivity within the frontal gyrus and their connections with other regions including the cerebellum, middle temporal gyrus, paracentral lobule, precentral gyrus, and inferior parietal lobule are overrepresented.While for biotype 2, FNCs exhibited considerably denser patterns and greater degrees in DM regions including the precuneus, anterior cingulate cortex, and posterior cingulate cortex.And connectivity between the cerebellum and fusiform gyrus, and between the insula and anterior cingulate cortex are overrepresented for biotype 2.
The correlation results of the connectivity between the middle temporal gyrus and paracentral lobule, between the superior parietal lobule and posterior cingulate cortex, and the specialized connectivity cerebellumfusiform gyrus in biotype 2 are listed in Table S4.

Comparison of medication effect of two biotypes
One of the important applications of biotype detection is to observe their clinical progression and differences under treatment; therefore, we further evaluated symptom improvement within the ADHD patients under the treatment of ATX and MPH in week 1, 2, 3, 4, 8.For ADHD children under medication from PKU, 27 of 44 patients were classified as biotype 1 and the rest of them as biotype 2. To test whether there are differences in the benefits of the drug between 2 biotypes, we compared the symptom scores RS-IV and CPRS from baseline to week 8.As shown in Fig. 4, we found symptom reduction in both biotype 1 and biotype 2 from baseline to endpoint.Note that no significant group difference in symptom scales was identified at baseline, however, in week 8 the reduction rate of inattention ( = 0.01) and total ( = 0.02) scores in RS-IV and learning problem ( = 0.04) score in CPRS showed significant group difference.After medication, biotype1 and biotype2 showed similar declines in symptoms from week 1 to week 4.After week 4, biotype 1 still showed a steady decline, while biotype 2 showed a minor reduction in multiple symptoms.Furthermore, we analyzed the prognosis of using different medication in two biotypes.16 of 26 subjects in biotype 1 and 11 of 18 subjects in biotype 2 took MPH, and the rest of them took ATX.The demographic information and differences in symptom reduction between the ATX group and MPH group were shown in the Supplementary file.Our results showed that there was a significant group difference between biotype1 with MPH and biotype 2 with ATX in inattention ( = 0.05), hyperactive/impulsive ( = 0.05), and the total score ( = 0.03) from RS-IV in week 8.

DISCUSSION
In this study, we proposed a novel GCN-based deep clustering framework, compared it with several clustering methods, validated its utility in ADHD biotype detection, and identified two clinically discrepant biotype patterns.Biotype 1 contained more patients, presented milder symptoms, and overrepresented several wildly recognized brain aberrations including frontal gyrus and cerebellum.In contrast, biotype 2 included fewer patients, presented more severe symptoms especially hyperactive/impulsive, and showed greater degrees in regions from DM to SM, as well as the connectivity between the cerebellum and fusiform gyrus.
Based on this, we considered biotype 1 as a typical but mild ADHD pattern and biotype 2 as an atypical but more severe one.To the best of our knowledge, this is a novel attempt to introduce population association into the identification of disorder biotypes and identified two ADHD biotypes showing significant group differences.
Disease heterogeneity has attracted much interest, for it elucidates the implicit pathological mechanisms and makes it possible to offer individual treatment options to patients from different subgroups.For example, Varol et al. present an algorithm HYDRA based on SVM to simultaneously identify a class of pathological samples and separate them into coherent subgroups 34 .Building on HYDRA, Wen et al. proposed MAGIC to capture multi-scale representations of disorder subgroups and exploited a double-cycle optimization method 35 .In contrast to existing clustering approaches presented in [34][35][36] , we considered the population association by introducing population graph GCN into deep clustering, for heterogeneity is a property that can only be expressed at the population level instead of the individual level.Population graph GCN has been proven effective in disease classification by a number of previous studies [37][38][39] .Our GCN-BSD not only outperformed conventional clustering methods but also significantly outperformed the deep learning-based clustering model.When using the publicly accessible dataset ABCD, lowest DBI and highest CHI were achieved in ADHD biotype detection.Both as traditional methods, K-Means showed decreased P-values in characteristics compared to agglomerative clustering.As expected, we observed that the DNN with K-Means and GCN with K-Means performed better than the two conventional approaches.In addition, our validation in the dependent dataset further indicated the effectiveness of our proposed method.Our ADHD biotype results are consistent across two independent cohorts, as shown in Fig. 3 a and b.Consistent with the cognitive ability results, PKU biotypes were differentiable in performance IQ and further several symptoms including hyperactive/impulsive.ADHD biotypes identified by GCN-BSD from ABCD not only make sense in the objective index but also be reproducible and show significance in clinical scales.
The pattern of biotype 1 is in accord with the classical ADHD model as a disorder of deficient frontoparietal and multiple regions activation, while biotype 2 expresses an unconventional pattern.
Regarding the specific abnormalities identified in biotype 1, this typical but mild biotype overrepresented in multiple connectivity from DM to SM and CC, besides, the anterior cingulate cortex, cerebellum, paracentral lobule, and frontal gyrus also present as the key nodes.The frontal gyrus is hypo-activated in various tasks ranging from working memory to time discrimination 40,41 .ADHD has long been thought to reflect dysfunction of prefrontal-striatal-cerebellar circuitry, but accumulating evidence suggests that the prefrontal-striatal model of ADHD should be extended to include other circuits and their interrelationships from the perspective of systems neuroscience 42 .Consistent with the previous studies 43,44 , the biotype 1 supported a model that ADHD-related dysfunctions are not only involved in higher-level cognitivebehavioral functions, such as the frontoparietal and default networks, but also in sensorimotor processes, including SM 45 .Note that all the remaining regions implicated in the prefrontal-striatal-cerebellar model of 46,47 are components of the frontoparietal circuit known as the executive control circuit 48,49 .As for the unconventional biotype 2, we identified connectivity which not overrepresented in biotype 1.We noticed that besides the frontoparietal circuit, the special connectivity cerebellum-fusiform gyrus is highly correlated with multiple measures, including total intelligence, fluid intelligence, crystallized intelligence, inattentive and hyperactive/impulsive symptoms.The fusiform gyrus territory has received considerable attention for its role in reading 50,51 , moreover, its damage might lead to the letter-by-letter reading strategies shown by neuroanatomical investigations 52,53 .Our identified biotype 2 is consistent with this investigation that though the cerebellum and fusiform gyrus do not connect physically, their functional connection contributes to reading abilities, phonological processing, and semantic memory 54 .This abnormal connectivity might further influence cognitive abilities, and lead to inattention and hyperactive symptoms.
As the most prominent network in the clinical neuroscience literature regarding spontaneous intrinsic brain activity, DM underlies some of the executive function deficits, working memory deficits, and attention lapses in ADHD patients 55,56 .Our results showed that the extracted common top 100 discriminative FNCs from two separate cohorts showed great overlap and correlated with multiple cognition and symptom scales.
Among top 100 FNCs, the fundamental connection between the superior parietal lobule and posterior cingulate cortex, and precuneus are highly correlated with multiple cognitive abilities and symptoms, indicating that these connections from DM regions to the superior parietal lobule are associated with several attention problems 57,58 .The superior parietal lobule has been shown to play a major role in the voluntary shift of spatial visual attention and attention switching, and thus to be particularly important for regulated attentional processes and attention disengagement 59,60 .Besides, the common connection in DM from the precuneus to the anterior cingulate cortex is also considered a candidate locus of dysfunction in ADHD 61,62 .
A functional connectivity analysis suggested that structural and functional circuits linking the anterior cingulate cortex to the precuneus may represent "small-world network" long-range connections 63 .Our results affirmed the importance of DM in ADHD analysis and put forward a presumption that the connection between DM and the superior parietal lobule might also be a key circuit.
Most interestingly, we illustrate the treatment record varying from 1, 2, 3, 4, 8 weeks after the treatment between the two biotypes.Though only part of the subjects received medication treatment, we still found that biotype 1 showed a steady decline in all ADHD symptoms, and biotype 2 approached healthy children at a smaller degree after 8 weeks, which reflected that biotype 2 suffered from severe symptoms from the side.
Multiple studies revealed that MPH and ATX improve attention functions and upregulate abnormal frontocortical activation during executive function tasks in ADHD patients [64][65][66] .Specifically, both of the drugs have a moderating effect on the DM and cerebellum.DM serves as a core neuropathological mechanism of attention deficit behavior and cerebellar degree centrality was significantly associated with learning function performance 67,68 .Therefore, differential DM and cerebellar connectivity patterns between the two biotypes may influence drug responsiveness, consequently leading to varying symptom improvement outcomes, including inattention, total, and learning problem scores.Furthermore, we found reduction rate divisible in different biotypes with different medications, particularly, MPH treatment in biotype 1 significantly outperforms ATX treatment in biotype 2, implying that for patients with poorer treatment outcomes, MPH may be a preferable treatment option.Previous studies showed that stimulant medications like MPH are the most effective treatments for ADHD, while the non-stimulant ATX shows slightly lower but good efficacy in reducing ADHD symptoms 69 .Both biotype 2 showing server abnormality and the effectiveness of MPH might illustrate our results.Due to the limitation of our dataset, we need further study with more subjects to support our hypothesis.Some limitations should be considered.First, ABCD is an ongoing prospective longitudinal study starting at the ages of 9-10 and includes a diverse sample enrolled at 21 research sites across the U.S. The PKU dataset is collected from a local hospital, which contains potential ADHD patients.These two cohorts might be different in data distribution, even after regressing the factor of site and scanner.This limitation likely reduces the similarity of the ADHD biotype patterns from two datasets.Second, only some patients from PKU under medication have the longitudinal scale information.Therefore, our analysis of medication influence is an explorative attempt.We might include much more patients under medication to make a detailed analysis in the future.
Collectively, in this study, we proposed a novel framework, GCN-BSD, that can jointly characterize brain imaging data and phenotypic association and further use this knowledge to guide disease biotype detection.The population graph provides association information that the DNN-based model ignores.
Importantly, the identified two ADHD biotypes exhibit significant group differences in functional networks and multiple cognitive abilities and symptoms, especially in fluid intelligence and hyperactive/impulsive.All the above findings indicate the validation of the frontoparietal circuits to serve as a key signature to ADHD and provide the first evidence for the connection from the cerebellum to the fusiform gyrus to be used as a biomarker in the uncommon subgroup.This study helps move forward from a conventional biotype detection approach to the use of a more flexible deep learning-based analysis.

Study design
A population graph was first constructed based on functional network connectivity (Fig. 1a) and phenotypic information (age, gender) to build individual mappings; where FNCs serve as the feature of the nodes and the similarities between two subjects, which were abstracted from gender and age serve as edges.
Then we applied GCN-BSD to learn embeddings that are both group-discriminative between ADHD and controls, as well as adapted to the clustering constraint through K-Means loss (Fig. 1b).We selected 1069 ADHD patients from the ABCD study as the discovery dataset, identified K=2 for biotype division via the cluster sum of square (CSS) using the elbow method 31 , and evaluated the clustering performance of 4 popular algorithms, including (1) agglomerative clustering, (2) conventional K-Means, (3) DNN with deep K-Means, and (4) autoencoder GCN with K-Means with GCN-BSD based on Davies-Bouldin Index (DBI) and Calinski-Harabasz Index (CHI) 32 (Fig. 1c).As a result, two ADHD biotypes were identified, manifesting with different FNC patterns and distinguishing cognitive abilities.Then we used 130 ADHD and 105 controls collected from Peking University Sixth Hospital as validation dataset to test the generalizability and potential clinical use of the identified biotypes.Interestingly, we found that ADHD biotypes identified in ABCD and PKU showed high similarity and replicability in FNC patterns(Fig.1d).The most contributing FNCs and clinical records were compared dedicatedly between two biotypes.Specifically, biotype 1 presented milder symptoms while biotype 2 manifested more severe hyperactivity/impulsivity symptoms and worse cognitive levels.
Finally, we compared the symptom relief and treatment outcome of two biotypes from 44 out of 130 ADHD patients either treated by MPH or ATX at PKU6 according to our division (Fig. 1e).

The discovery ABCD cohort
The ABCD study (https://nda.nih.gov/abcd/) is an observation assessment of brain development in children in the US at 9-11 years of age to characterize psychological and neurobiological development from early adolescence to early adulthood 30 .Our data were analyzed from the ABCD Study curated annual release 3.0, which contains baseline data from 11,875 children.We determined ADHD same as previous studies 70,71 by past and present mental disorder diagnoses using parent-reported responses to the self-administered computerized Schedule for Affective Disorders and Schizophrenia for School-Age Children for DSM-5 (K-SADS-5).

Independent PKU cohort
Another in-house dataset was included in this study as a validation.This children's cohort was approved Children-Present and Lifetime version (K-SADS-PL) 72 , which is a clinical and semi-structured interview based on the Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition (DSM-IV).The severity of inattentive symptoms, hyperactive/impulsive symptoms, and total ADHD symptoms of all subjects were evaluated by the ADHD Rating Scale-IV (ADHD RS-IV).Conners' Parent Rating Scale (CPRS) 33 was used to assess the hyperactive/impulsive symptoms(see more details in the Supplementary file).The demographic information was listed in Table S2.

Image acquisition and processing
The resting-state fMRI data from two datasets were preprocessed with a combination of FMRIB Software Library v6.0 toolbox and Statistical Parametric Mapping 12 toolbox, including rigid body motion and distortion correction, spatial normalization into Montreal Neurological Institute space, and smoothing with a 6-mm full width at half maximum Gaussian kernel (please see more details in Supplementary file).
We decomposed the preprocessed MRI by a fully automated spatially constrained ICA run on each individual subject separately based on the neuromark_fMRI_1.0template (available in GIFT; http://trendscenter.org/software/gift and as a separate download at http://trendscenter.org/data) and pipeline used in 73,74 .For this template, fifty-three independent component networks (ICNs) were identified and arranged into 7 functional networks, including the subcortical (SC) (5 components), auditory (AU) (2 components), sensorimotor (SM) (9 components), visual (VI) (9 components), cognitive control (CC) (17   components), default mode (DM) (7 components), and cerebellar (CB) (4 components) networks.FNC is calculated using Pearson correlation, resulting in a 53×53 matrix for each subject.We extracted the upper triangle elements of the matrix as FNC features, namely, each subject has an FNC vector in the dimension of (53×52)/2 = 1378.Participants from the ABCD dataset were selected based on the availability of non-imaging measures and high-quality neuroimaging data.The quality assessment of the neuroimaging data was based on the "Quality Control and Recommended Inclusion Criteria for Structural Data" provided as a document in the "Release Notes Imaging Instruments" folder.Participants were excluded from the present analyses because of incomplete rsMRI data at baseline and/or because brain scans did not pass the ABCD Study's quality control.After all quality control, we have 1069 ADHD participants included (see more details in the Supplementary file).And 1164 demographically matched HCs were included in our study.Their demographic and cognitive information were listed in Table S1.

Cognitive abilities and symptom scales
The ABCD baseline cognitive battery assessed by a well-validated National Institute of Health Toolbox was used as evaluation in this study 75,76  For patients from PKU, the severity of inattentive symptoms, hyperactive/impulsive symptoms, and total ADHD symptoms of all subjects were evaluated by the ADHD Rating Scale-IV (ADHD RS-IV).Besides, the Conners' Parent Rating Scale (CPRS) was used to assess the symptoms in children participants with ADHD.The CPRS is a widely used instrument for screening and evaluating ADHD-related symptoms as well as other behavioral problems frequently associated with ADHD in children.It contains 48 items and can be divided into six factors: conduct problems, learning problems, psychosomatic problems, impulsivityhyperactivity, anxiety, and hyperactivity index.Verbal IQ, performance IQ, and full-scale IQ were measured by the Wechsler Child/Adult Intelligence Scale, Third Edition.

GCN for biotype detection
GCN-BSD is an autoencoder-based GCN model.The proposed architecture consists of three main elements: an encoder, an embedding layer, and a decoder, as shown in Fig. 1b.The encoder and decoder are connected via the embeddings.By putting the embeddings into the fully connected layer, we adopted the binary cross entropy function to learn group-discriminative features, which identified the subject labels as patients or healthy controls for all subjects.Besides, we put the patient subgraph with the connection unchanged into the clustering layer for biotype detection.The deep K-Means loss function is adopted as the clustering loss.To better extract high-resolution embeddings, we also adopted the reconstruction loss.The GCN-BSD can then be formulated as minimizing the following function:  = ∑  $  *+, () +  - ./() +  &  ,0123+* () where  $ ,  -,   & regulate the trade-off among seeking good representation for , which is set for biotype clustering, meanwhile is faithful to the original feature and maintains label information.More details can be found in the supplementary file.
Firstly, GCN-BSD was adopted in the ABCD dataset to identify ADHD biotypes.The number of clusters  was decided by the cluster sum of square (CSS) using the elbow method 31 .GCN-BSD was implemented with Pytorch and optimized using Adam optimizer 77

Biotypes validation
To validate the constancy of the identified ADHD biotype patterns, we used the mean FNC feature of two biotypes as templates and projected them into an independent dataset from PKU to extract similar biotype patterns.Specifically, for each subject in PKU, we calculated the Euclidean distance between individual FNC and two FNC templates, and then we divided the subject into a subgroup where the template showed more similarity than the other.Two-sample t-tests were used to calculate the group differences between two biotypes of the symptom scales, including RS-IV, CPRS, and Wechsler Child/Adult Intelligence Scale.Note that sites and scanners were regressed out before statistical analysis.
The top 100 discriminative FNC in two biotypes were identified by two-sample t-tests in ABCD and PKU, respectively.We grouped the 100 FNCs into 7 brain networks according to the NeuroMark pipeline to analyze the functional connectivity on a large scale.Moreover, to discover the exact consistent brain patterns, we picked the overlapped top 100 FNCs in two separate datasets and obtained brain functional patterns for two biotypes.We further calculated the correlations between the identified overlapped FNC and ABCD cognitive battery, RS-IV, CPRS, Wechsler Child/Adult Intelligence Scale in two datasets, separately, in order to study the association between FNC and multiple symptoms.

Clinical progression of two biotypes
In order to explore whether the identified biotypes respond differently to medical treatment, we compared the treatment outcome under medication treatment of ATX and MPH after 1, 2, 3, 4 and 8 weeks.
44 out of 130 ADHD patients from PKU were recorded with longitudinal symptom and cognitive scores, who were divided into two biotypes according to the previous results.Particularly, 27 of 44 patients were classified as biotype 1 and the rest of them as biotype 2. To test whether there are differences in the benefits of the drug between 2 biotypes, we compared the symptom scores RS-IV and CPRS from baseline to week 8 by repetive two sample t-test.

Fig. 1
Fig. 1 Research outline.(a) Functional imaging feature extraction--functional network connectivity(FNC) deriviation.(b)Biotype detection based on GCN+deep k-means using population graph constructed via FNC and phenotypic data from the ABCD study.(c) Clustering method comparison using Davies-Bouldin Index (DBI) and Calinski-Harabasz Index (CHI), resulting in 2 biotypes, which were mapped to PKU.(d) biotype comparison and replication.(e) Comparison of the symptom relief and treatment outcome of two biotypes either treated by MPH or ATX using longitudinal data at PKU.

Fig. 2
Fig. 2 ADHD biotypes detection results.(a) Cluster sum of square (CSS) across the number of clusters.(b) GCN-BSD shows the best clustering performance among the 5 methods, i.e., highest CHI and lowest DBI.(c) Mean cognitive scores for biotype 1 (blue), biotype 2 (yellow), and HC (green) in ABCD dataset, note that a higher score means better cognitive abilities.(d) Mean cognitive scores for biotype 1 (blue) and biotype 2 (yellow) in the PKU dataset, note that a higher score means better cognitive abilities in the left subgraph and a higher score means severer symptoms in the right subgraph.(e) Clustering results of the embedding features learned from GCN-BSD for ADHD biotypes and HC visualized by t-SNE.

Fig. 3
Fig. 3 ADHD biotypes detection results.(a) The mean differences of FNC between 2 biotypes and HC in ABCD and Top 100 connectivity patterns of biotype 1 and biotype 2 in the view of the contribution of functional networks.(b) The mean FNC differences between 2 biotypes-HC in PKU show high correlation with results from ABCD (r>0.6, p<10 -100 for both biotypes).Top 100 connectivity patterns of biotype 1 and biotype 2 in the view of the contribution of functional networks in PKU, showing high correlation with results from ABCD (r=0.75 for biotype 1 r=0.76 for biotype 2, p<6x10 -10 ).(c) Overlapped top 100 discriminative FNCs for biotype 1 between ABCD and PKU.(d) Overlapped top 100 discriminative FNCs for biotype 2 between ABCD and PKU.The two biotypes and demographically matched healthy controls were compared in cognitive measures still reveal a clear correlation to ABCD, where  = 0.60,  = 1.05 × 10 !$&' for biotype 1 and  = 0.67,  = 1.11 × 10 !$#" for biotype 2, as shown in Fig.3b.Similar to the results from the ABCD dataset, biotype 1 shows positive connectivity between SC, AU, SM, CC, and CB, as well as SC and SM.And it also shows negative connectivity within CB and between DM and SM.As for biotype 2, consistent with the result from ABCD, it presents more impaired regions than biotype 1. Biotype 2 has positive connectivity between SM and DM, VI and CB, and negative connectivity between SM and VI, CB and DM, and within DM.

Fig. 4
Fig. 4 The longitudinal results for patients under medication.(a) The longitudinal results for biotypes under medication from PKU in terms of the symptom scores RS-IV and CPRS from baseline to week 8.Note that a lower score means milder symptoms.From the baseline to week 8, biotype 1 showed a steady decline in all ADHD symptoms.While for biotype 2, the reduction after week 4 stagnated.(b) At week 8, the reduction rate showed significant group differences at RS-inattention ( = 0.01), RS-total ( = 0.02) and CPRS-learning problem ( = 0.04).(c) The longitudinal results for biotypes with ATX or MPH in terms of the symptom scores RS-IV and CPRS from baseline to week 8.(d) At week 8, we found significant group difference between biotype 1 with MPH and biotype 2 with ATX in inattention (  = 0.05 ), hyperactive/impulsive ( = 0.05) and total score ( = 0.03) from RS-IV, all FDR corrected.
by the Ethics Committee of Peking University Sixth Hospital.A total of 130 children with ADHD and 105 HCs (6-15 years old) were recruited.44 patients received medical treatment, 17 of them were treated with atomoxetine (ATX), and 27 of them were treated with methylphenidate (MPH), and we recorded their symptom scales at baseline, week 1, week 2, week 3, week 4, and week 8.The diagnosis was made by a senior psychiatrist based on the Schedule for Affective Disorders and Schizophrenia for School-Age

.
Hyperparameters were set as:  $ =0.1,  -= 0.5,  & = 1, and the learning rate was 0.001.To evaluate the clustering performance, we compared four alternative clustering methods with GCN-BSD, including (1) agglomerative clustering, (2) conventional K-Means, (3) DNN with deep K-Means, and (4) autoencoder GCN with K-Means.The clustering performance was measured by two internal cluster evaluation measures: Davies-Bouldin Index (DBI) and Calinski-Harabasz Index (CHI) 32 , and one performance indicator: ABCD cognitive battery.Statistical analyses were performed among ADHD biotypes and HC.A two-sample t-test was performed for all measures between biotypes and HC.The P-values were corrected for multiple comparisons using Bonferroni correction.

Table 1 . Comparison of P-value in significant cognitive characteristics between 2 biotypes Cognitive measures P-values of 2 sample t-test between biotype 1 and biotype 2
.  ×  ! Note: PVT=picture vocabulary test, LSWMT=list sorting working memory test, DCCST=dimensional change card sort test, PCPST=pattern comparison processing speed test, PSMT=picture sequence memory test.

Table 2 . Demographic and Cognitive Characteristics of the biotypes from ABCD.
PVT=picture vocabulary test, LSWMT=list sorting working memory test, DCCST=dimensional change card sort test, PCPST=pattern comparison processing speed test, PSMT=picture sequence memory test.

Table 3 . Demographic and Clinical Characteristics of the biotypes from PKU
, which covers Picture Vocabulary Test Score (PVT); Flanker Inhibitory Control and Attention Test Score (FICA); List Sorting Working Memory Score (LSWM); Dimensional Change Card Sort Test Score (DCCS); Pattern Comparison Processing Speed Test Score (PCPS);