Novel data-driven subtypes and stages of brain atrophy in the ALS-FTD spectrum

Background: TDP-43 proteinopathies represent a spectrum of neurological disorders, anchored clinically on either end by amyotrophic lateral sclerosis (ALS) and frontotemporal degeneration (FTD). The ALS-FTD spectrum exhibits a diverse range of clinical presentations with overlapping phenotypes, highlighting its heterogeneity. This study aimed to use disease progression modeling to identify novel data-driven spatial and temporal subtypes of brain atrophy and its progression in the ALS-FTD spectrum.
Methods: We used a data-driven procedure to identify 13 anatomic clusters of brain volumes for 57 behavioral variant FTD (bvFTD; with either autopsy-confirmed TDP-43 or TDP-43 proteinopathy-associated genetic variants), 103 ALS, and 47 ALS-FTD patients with likely TDP-43. A Subtype and Stage Inference (SuStaIn) model was trained to identify subtypes of individuals along the ALS-FTD spectrum with distinct brain atrophy patterns, and we related subtypes and stages to clinical, genetic, and neuropathological features of disease.
Results: SuStaIn identified three novel subtypes: two disease subtypes with predominant brain atrophy either in prefrontal/somatomotor regions or limbic-related regions, and a normal-appearing group without obvious brain atrophy. The Limbic-predominant subtype tended to present with more impaired cognition, higher frequencies of pathogenic variants in TBK1 and TARDBP genes, and a higher proportion of TDP-43 types B, E and C. In contrast, the Prefrontal/Somatomotor-predominant subtype had higher frequencies of pathogenic variants in C9orf72 and GRN genes and a higher proportion of TDP-43 type A. The normal-appearing brain group showed a higher frequency of ALS relative to ALS-FTD and bvFTD patients, higher cognitive capacity, a higher proportion of lower motor neuron onset, milder motor symptoms, and lower frequencies of genetic pathogenic variants. Overall, SuStaIn stages also correlated with evidence for clinical progression, including longer disease duration, higher King's stage, and cognitive decline. Additionally, SuStaIn stages differed across clinical phenotypes, genotypes and types of TDP-43 pathology.
Conclusions: Our findings suggest distinct neurodegenerative subtypes of disease along the ALS-FTD spectrum that can be identified in vivo, each with distinct brain atrophy, clinical, genetic and pathological patterns.


Introduction
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disorder characterized by loss of motor neurons in the brain and spinal cord, leading to muscle weakness, atrophy and ultimately paralysis [1].
Behavioral variant frontotemporal dementia (bvFTD) is the most common subtype of frontotemporal degeneration (FTD) characterized by impairments in behavior, personality, and/or executive function [2,3].
ALS may additionally exhibit cognitive and behavioral symptoms overlapping with bvFTD [4,5], and bvFTD can exhibit motor neuron dysfunction consistent with ALS [6]. These two phenotypes can occur either separately or simultaneously and share clinical, neuropathological and genetic features, constituting the two ends of a spectrum of disorders with ALS-FTD lying in between [7]. Defining subtypes and elaborating their distinct characteristics in the ALS-FTD spectrum may help capture potential driving causes of neurodegeneration.
Neurodegenerative diseases display a high degree of inter-individual variation in disease biomarkers, including neuropsychological profiles, neuroimaging features, and molecular biological indicators. Distinct patterns of brain atrophy have been observed along the ALS-FTD spectrum. Regarding clinical phenotypes, bvFTD patients exhibit greater grey matter atrophy in the frontotemporal cortex, insula, thalamus, striatum, hippocampus and amygdala, while ALS patients show more severe atrophy in the motor cortex, pons and brainstem [8,9]. Different genetic pathogenic variants also result in distinct patterns of brain atrophy in individuals on the ALS-FTD spectrum. These patterns vary in severity, progression rate, and affected brain regions. C9orf72-related FTD and ALS are associated with a higher degree of atrophy extensively in frontal, parietal, occipital, cingulate and insula regions, thalamus and cerebellum compared to sporadic patients [10][11][12][13][14][15][16][17]. Progranulin (GRN)-frontotemporal lobar degeneration (FTLD) patients tend to exhibit greater grey matter volume loss in the frontal cortex [18,19]. Longitudinal data suggest that patients with pathogenic variants in GRN experience faster brain atrophy progression than those with pathogenic variants in C9orf72, indicating different rates of pathological progression and fundamental mechanisms associated with different gene variants [10,20]. Thus, distinct clinical phenotypes and genotypes may account for both spatial and temporal heterogeneity in brain atrophy patterns.
To better understand the spatial and temporal patterns of brain atrophy, an unsupervised machine-learning algorithm called Subtype and Stage Inference (SuStaIn) was developed. This tool can identify distinct subtypes and extract their progression patterns simultaneously [21], unlike previous studies that applied either subtype-only [22][23][24] or stages-only [25][26][27] models. A recent study utilized the SuStaIn algorithm to establish a data-driven pathological TDP-43 staging system in ALS, FTLD due to TDP-43 proteinopathies (FTLD-TDP), and limbic-predominant age-related TDP-43 encephalopathy neuropathologic change (LATE-NC) [28]. They identified two subtypes within FTLD-TDP that were cortical-predominant or brainstem-predominant, and two subtypes within ALS that were subcortical-predominant or corticolimbic-predominant.
To date, this method has been applied to reconstruct different patterns of sequential disease progression trajectories in TDP-43 proteinopathies [28], FTD [21,29] and Alzheimer's disease (AD) [30,31], providing fundamental insights into the underlying biological processes of these diseases.
This study aimed to investigate the complex progression patterns and heterogeneity within earlier stages of the ALS-FTD spectrum, in contrast to late-stage neuropathological studies. To achieve this, we focused on individuals with a high likelihood of (clinical ALS) or definite (pathology-confirmed or genetic variants) TDP-43 pathology, and we trained a SuStaIn model on baseline cortical and subcortical volume data. Our prior study using the SuStaIn model trained on TDP-43 proteinopathy data had limitations related to its focus on the end stage of disease and its reliance on ordinal pathology ratings [28]. In contrast, this study utilized more quantitative data, namely MRI-derived cortical and subcortical volumes, which can identify earlier evidence of brain atrophy. We classified individuals into subtypes with different brain atrophy patterns and extracted a full trajectory for each subtype. Furthermore, we examined the differences in clinical phenotypes, genotypes and pathologies across subtypes. We also assessed the effectiveness of the fitted model by analyzing longitudinal brain volumetric data.

Participants
Participants were retrospectively selected from the Integrated NeuroDegenerative Disease (INDD) database at the University of Pennsylvania (Additional file 1: Fig. S1) [32]. This study included a cohort of individuals who met published clinical criteria for ALS (n = 103), ALS-FTD (n = 47), or bvFTD (n = 57) [33][34][35], diagnosed by board-certified neurologists. We also included 172 demographically comparable (age, sex) healthy controls who self-reported a negative neurological and non-significant psychiatric history and had a normal Mini-Mental State Examination (MMSE) score > 27 (out of 30). Individuals with bvFTD had either autopsy-confirmed TDP-43 proteinopathy or genetic evidence of pathogenic variants associated with TDP-43 proteinopathy, including C9orf72, GRN, metalloendopeptidase (MME), TANK-binding kinase 1 (TBK1), and TAR DNA binding protein (TARDBP). Of the 207 individuals with ALS-FTD spectrum disorder, 62 (22 with ALS, 8 with ALS-FTD and 32 with bvFTD) had one follow-up MRI scan; these scans were used in secondary analyses to evaluate the longitudinal consistency of SuStaIn subtype and stage assignments.
Images were processed using the Advanced Normalization Tools (ANTs) software package through standard preprocessing steps, as previously described [36]. Briefly, this procedure included N4 bias field correction, diffeomorphic and symmetric registration to a custom template, brain extraction, and segmentation into six tissue classes (cortical grey matter, subcortical grey matter, deep white matter, CSF, brainstem, and cerebellum) using template-based priors [37]. The custom template was in turn aligned to the MNI152 2009c Asymmetric T1-weighted template. The Schaefer 17-network atlas with 100 cortical parcels [38] and the Melbourne subcortex atlas [39] were warped from MNI152 space through the custom template to individual space. From each label, a volumetric measurement was extracted, normalized for age, sex, and intracranial volume, and converted to a w-score relative to healthy controls [40].
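The w-score conversion described above can be illustrated with a minimal sketch. It assumes a common formulation in which a linear model of volume on age, sex, and intracranial volume is fitted in healthy controls only, and each patient's deviation from the predicted volume is divided by the standard deviation of the control residuals. The function name, the array layout, and the sign convention (positive values indicate smaller-than-expected volume) are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def w_scores(vol_controls, cov_controls, vol_patients, cov_patients):
    """W-scores of regional volumes relative to a healthy-control regression.

    vol_*: (n_subjects, n_regions) volumes.
    cov_*: (n_subjects, n_covariates), e.g. age, sex (0/1), intracranial volume.
    """
    # Design matrices with an intercept column.
    Xc = np.column_stack([np.ones(len(cov_controls)), cov_controls])
    Xp = np.column_stack([np.ones(len(cov_patients)), cov_patients])
    # Ordinary least squares fitted on controls only, one model per region.
    beta, *_ = np.linalg.lstsq(Xc, vol_controls, rcond=None)
    # Standard deviation of the control residuals, per region.
    resid_c = vol_controls - Xc @ beta
    dof = Xc.shape[0] - Xc.shape[1]
    sd = np.sqrt((resid_c ** 2).sum(axis=0) / dof)
    # Assumed sign convention: expected minus observed, so atrophy is positive.
    return (Xp @ beta - vol_patients) / sd
```

Under this convention a w-score of 2 means that the observed volume lies two control standard deviations below the volume expected for that person's age, sex, and head size.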
Because the SuStaIn model requires input data of relatively low dimensionality, it is important to limit the number of features. We therefore applied data reduction to enhance the power of the analysis, improve model identifiability, and reduce uncertainty. An unsupervised consensus-clustering algorithm, Bootstrap Analysis of Stable Clusters (BASC), was utilized to identify spatially stable clusters that consistently exhibited similar volumetric measurements of cortical and subcortical structures across subjects [41]. This algorithm performed k-means clustering on 1000 bootstrapped samples to reduce the dimensions of the input data. A stability matrix was generated to represent the probability of each pair of brain regions falling into the same cluster. Based on the Silhouette index, an optimal number of data-driven clusters was identified. The volumetric measurements of the BASC-identified clusters were then extracted and used as input biomarkers to the SuStaIn model (Fig. 1a, Additional file 1: Table S1).
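The bootstrap step behind the stability matrix can be sketched as follows. This is a simplified illustration rather than the BASC implementation used in the study: it resamples subjects with replacement, clusters the regions with a plain k-means, and records how often each pair of regions lands in the same cluster. The helper names, the deterministic farthest-point initialisation, and the fixed number of clusters are assumptions made for the example; the Silhouette-based choice of cluster number and the final consensus clustering are not shown.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def stability_matrix(data, k, n_boot=1000, seed=0):
    """data: (n_subjects, n_regions) w-scores.

    Returns an (n_regions, n_regions) matrix holding the fraction of
    bootstrap resamples in which two regions share a cluster.
    """
    rng = np.random.default_rng(seed)
    n_sub, n_reg = data.shape
    stab = np.zeros((n_reg, n_reg))
    for _ in range(n_boot):
        idx = rng.integers(0, n_sub, n_sub)     # resample subjects with replacement
        labels = kmeans_labels(data[idx].T, k)  # cluster regions, not subjects
        stab += labels[:, None] == labels[None, :]
    return stab / n_boot
```

Entries close to 1 mark pairs of regions whose volumes co-vary consistently across resamples, which is the property the data reduction step relies on.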

Clinical data
Clinical and neuropsychological assessments were conducted at the Penn Frontotemporal Degeneration Center and Penn Comprehensive ALS Clinic. Neuropsychological test scores were obtained from the testing visit that was closest to the MRI scan. Demographic information, including age, sex, years of education, disease duration from symptom onset to MRI scan, diagnostic delay (the time interval between symptom onset and confirmed disease diagnosis), and site of symptom onset were collected.
Motor Assessments. The Penn Upper Motor Neuron Score (PUMNS) measures upper motor neuron signs in individuals with ALS/ALS-FTD [42]. The Revised ALS Functional Rating Scale (ALSFRS-R) evaluates the severity of functional impairment from motor symptoms in ALS/ALS-FTD [43]. Disease progression was measured by the Progression index, which is calculated as (48 - ALSFRS-R score) divided by duration in months [44].
We also calculated King's stage, derived from the ALSFRS-R, to assess spreading of motor symptoms [45].
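As a worked illustration of the Progression index formula given above, the following minimal sketch computes it for a single visit. The function name and the input checks are illustrative, and the duration is assumed here to be the time since symptom onset in months.

```python
def progression_index(alsfrs_r_score, duration_months):
    """(48 - ALSFRS-R) / duration in months; higher values mean faster decline."""
    if not 0 <= alsfrs_r_score <= 48:
        raise ValueError("ALSFRS-R total score must lie between 0 and 48")
    if duration_months <= 0:
        raise ValueError("duration must be positive")
    return (48 - alsfrs_r_score) / duration_months

# 12 points lost over 24 months
print(progression_index(36, 24))  # 0.5
```

An individual scoring 36 on the ALSFRS-R two years after onset has therefore lost, on average, half a point per month.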
Cognitive Assessments. Cognitive and behavioral changes were evaluated using tests including the MMSE, Edinburgh Cognitive Assessment Scale (ECAS) [5,46], Philadelphia Brief Assessment of Cognition (PBAC) [47], Boston Naming Test (BNT), semantically guided category naming fluency for the number of animals generated in 60 seconds (Animal fluency score), letter-guided category naming fluency for the number of 'F' words generated in 60 seconds (Letter fluency score), and digit span for the longest number of digits repeated in forward and backward sequences (Digit forward span and Digit backward span).

Genetic screening
Genomic DNA was extracted from peripheral blood or frozen brain tissue collected from participants [48].
DNA was not available for 6 individuals. Genotyping for C9orf72 hexanucleotide repeat expansions was performed using a modified repeat-primed polymerase chain reaction, as previously described [49]. Pathogenic variants associated with the ALS-FTD spectrum were screened using either a targeted next-generation sequencing panel (MiND-Seq) [48] or whole exome/genome sequencing (WES/WGS). Of the 201 individuals who underwent genetic screening, 64 were found to have pathogenic variants. Specifically, 48 had repeat expansions in C9orf72 (> 30 repeats), and 16 had other known pathogenic variants: 11 in GRN, 1 in MME, 2 in TBK1, and 2 in TARDBP.

Neuropathological examination
Autopsy was performed on a subset of individuals (n = 55) including 21 ALS, 7 ALS-FTD, and 27 bvFTD. Neuropathological diagnosis of FTLD-TDP and ALS with TDP-43 proteinopathies (ALS-TDP) was performed by expert neuropathologists according to previously described protocols [50]. TDP-43 proteinopathies were classified into categories including types A, B, C and E [51]. Since type E is relatively rare and shows some biological overlap with type B [51], it has been proposed to combine these two types. Of the 55 individuals, 16 were classified as type A cases, 18 as type B or E, 3 as type C cases, and the remaining 18 cases (1 bvFTD and 17 ALS) that could not be further subtyped were classified as TDP-43 non-specific type.

Subtype and Stage Inference modelling
We utilized the w-scored volumetric measurements of 13 BASC-identified clusters (Fig. 1a, Additional file 1: Table S1) as input biomarkers for training the SuStaIn model (https://github.com/ucl-pond/pySuStaIn). As the volumetric measurements were continuous variables, we employed the piecewise linear SuStaIn model. This algorithm combines clustering and disease progression modelling to identify subtypes with different rates and patterns of disease progression [21]. To evaluate the performance of the SuStaIn model, we used 10-fold cross-validation, where the optimal number of subtypes was selected based on the out-of-sample log-likelihood and cross-validation information criterion (CVIC) [21] to better balance model complexity with accuracy (Fig. 1b-c). Each subtype's disease progression pattern was described by a piecewise linear model, which reconstructed the trajectory of brain atrophy. Each event, alternatively referred to as a stage, corresponded to a change in a specific biomarker, quantified by w-scores representing the severity of brain atrophy. We utilized w-score waypoints of 1, 2, and 3, with 3 set as the maximum value that represented the point at which the biomarker reached severe abnormality. To capture the progression pattern where each SuStaIn stage corresponds to a new region reaching a new score, the number of stages was determined by multiplying the number of BASC-identified clusters (13) by the maximum w-score value (3), resulting in a total of 39 stages. The model uncertainty was estimated using 100,000 Markov chain Monte Carlo (MCMC) iterations. For each subject, the SuStaIn model assigned a probability value to each subtype and stage, enabling their assignment to a specific subtype and stage within the disease progression pattern of this subtype.
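The stage count described above follows directly from the waypoint layout, as the short sketch below shows. The variable names and the array layout (one row of waypoints per biomarker plus a per-biomarker maximum) are illustrative assumptions about how such settings could be organised for a z-score-based SuStaIn model; they are not taken from the study's code, and no model is fitted here.

```python
import numpy as np

n_biomarkers = 13      # BASC-identified clusters (from the text)
waypoints = [1, 2, 3]  # w-score waypoints (from the text)

# One row of waypoints per biomarker, plus a per-biomarker maximum.
z_vals = np.tile(waypoints, (n_biomarkers, 1))  # shape (13, 3)
z_max = np.full(n_biomarkers, max(waypoints))   # shape (13,), as stated in the text

# Each stage is one biomarker reaching one new waypoint.
n_stages = z_vals.size
print(n_stages)  # 39
```

Stage 0 is not counted among these 39 events; it corresponds to no biomarker having reached its first waypoint.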
Longitudinal MRI scans were withheld from the SuStaIn model calculations and then used in a secondary analysis to assess the stability of SuStaIn subtypes and progression of SuStaIn stages over time. At follow-up visits, the volumetric measurements were w-scored as described above using the same healthy control cohort for normalization. Subtype stability was determined as the proportion of individuals who were either assigned to the same subtype or progressed from the normal-appearing group to a SuStaIn subtype at follow-up visits. The advancement of SuStaIn stage over time was evaluated in individuals with stable subtypes. The annualized change of SuStaIn stage was calculated by dividing the change in SuStaIn stage from baseline to follow-up visit by the follow-up period.
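The two longitudinal definitions above can be written out as a minimal sketch. The function names and the "normal-appearing" label string are hypothetical, and the follow-up period is assumed to be recorded in months and converted to years.

```python
def is_subtype_stable(baseline_subtype, followup_subtype,
                      normal_label="normal-appearing"):
    """Stable if the subtype is unchanged, or if a normal-appearing
    individual has moved into an atrophy subtype at follow-up."""
    return followup_subtype == baseline_subtype or baseline_subtype == normal_label

def annualized_stage_change(baseline_stage, followup_stage, followup_months):
    """Change in SuStaIn stage per year of follow-up."""
    if followup_months <= 0:
        raise ValueError("follow-up period must be positive")
    return (followup_stage - baseline_stage) / (followup_months / 12.0)

# An individual moving from stage 4 to stage 7 over 18 months
print(annualized_stage_change(4, 7, 18))  # 2.0
```

Note that a switch from one atrophy subtype to the other, or from an atrophy subtype back to the normal-appearing group, counts as unstable under this rule.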

Statistical analyses
The statistical analyses and plotting were conducted with R statistical software (version 4.2.0; R Foundation for Statistical Computing, Vienna, Austria) and GraphPad Prism (version 9.0; GraphPad Software, Inc., San Diego, CA). The brain heatmaps were visualized using BrainNet Viewer [52]. The normality of variable distributions was tested using the Shapiro-Wilk test. Continuous variables with a normal distribution were compared using the two-sample t-test, while the Mann-Whitney test was used for variables with a non-normal distribution. For comparison of categorical variables, the chi-squared test or Fisher exact test was employed. We compared clinical features, frequencies of pathogenic variants, proportions of TDP-43 types, SuStaIn stages and annualized change of SuStaIn stage across subtypes. Additionally, subtype probability at baseline was compared between subtype-stable and subtype-unstable individuals. A p value < 0.05 was considered significant. Cortical and subcortical volumes were compared between groups using a generalized linear model, with a false discovery rate (FDR)-corrected p < 0.05 used to account for multiple testing.
Correlation analyses were conducted between the predicted SuStaIn stages and clinical profiles, baseline and follow-up SuStaIn stages, as well as the change in SuStaIn stage and follow-up period. All correlation analyses were considered significant at a threshold of p < 0.05.
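The FDR correction applied to the regional volume comparisons can be illustrated with a short sketch. The text does not name the specific procedure, so the widely used Benjamini-Hochberg step-up method is assumed here; the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Four hypothetical regional p-values
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q < 0.05 for q in adjusted]
```

With these hypothetical inputs only the first region remains below the 0.05 threshold after correction, even though three of the four raw p-values were below it.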

Participant characteristics
The demographic, clinical, genetic and pathological characteristics are summarized in Table 1. Compared to ALS individuals, bvFTD individuals had a longer disease duration. Diagnostic delay, which is influenced by multiple factors, increased in ascending order from ALS to ALS-FTD to bvFTD. The ALS individuals were younger and had higher MMSE scores than ALS-FTD and bvFTD individuals. Individuals with bvFTD had higher frequencies of pathogenic variants in C9orf72 and GRN genes than those with ALS/ALS-FTD, and the two individuals with pathogenic TARDBP variants both had bvFTD. Higher proportions of TDP-43 type A, B, and E cases were observed in the ALS-FTD and bvFTD groups compared to the ALS group. All three TDP-43 type C cases were bvFTD. Most of the ALS cases in our cohort were classified as TDP-43 non-specific type.

Subtype progression patterns
The SuStaIn algorithm was applied to the baseline brain volumetric measurements, resulting in the identification of subtypes that exhibit distinct progression patterns of brain atrophy. Figure 2 illustrates the brain atrophy trajectory for each subtype, with the w-score ranging from 1 to 3, indicating the degree of brain atrophy from mild to moderate to severe. The most noticeable differences between the two subtypes with distinct brain atrophy patterns were observed in the initial sites of brain atrophy during the early SuStaIn stages.
The first identified subtype exhibited brain atrophy that initially appeared in the prefrontal cortex and subsequently in the somatomotor cortex at SuStaIn stage 3; we refer to it as the "Prefrontal/Somatomotor-predominant subtype". By SuStaIn stage 12-13, parts of the prefrontal cortex reached w-scores exceeding 3. Additionally, the volumetric loss of subcortical regions, including the thalamus, caudate, globus pallidus, putamen, and nucleus accumbens, was evident in early stages but developed more slowly than atrophy in the prefrontal cortex. This volume loss continued to progress and reached a severe degree after SuStaIn stage 17.
The second identified subtype displayed brain atrophy that was first observed in the temporal pole within the limbic network, hippocampus, and amygdala at SuStaIn stage 1; we refer to it as the "Limbic-predominant subtype". The brain regions related to the limbic system experienced a more rapid progression of atrophy. Specifically, the hippocampus and amygdala reached w-score 3 by SuStaIn stage 8, while the temporal pole and insula reached w-score 3 by stage 12. The volumetric loss of subcortical regions also began in the early stages of atrophy progression, but it reached w-score 3 later than in the Prefrontal/Somatomotor-predominant subtype, indicating a relatively slower rate of progression. It is worth noting that the 11th cluster, which included prefrontal regions, orbitofrontal cortex and insula, experienced significant volumetric loss in the early stages and ultimately reached a severe level of atrophy by SuStaIn stage 11 in both subtypes. In addition to these two subtypes with atrophy, individuals assigned to SuStaIn stage 0 were labeled as the "normal-appearing group", which showed no detectable brain atrophy. Individuals with ALS were more likely to be classified into the normal-appearing group, whereas the majority of the ALS-FTD and bvFTD individuals were assigned to atrophy subtypes.
The Prefrontal/Somatomotor-predominant subtype was the most common assignment across clinical diagnoses, occurring almost 2.5 times more frequently than the Limbic-predominant subtype. The distributions of clinical phenotypes significantly differed across subtypes (Fig. 3a, Additional file 1: Table S2).

Comparison of cortical and subcortical volumes between subtypes
Comparing cortical and subcortical volumes across groups (Fig. 4), we found that the normal-appearing group did not display any significant brain atrophy at baseline MRI. As its name indicates, this group showed no noticeable reduction in brain volumes compared to healthy controls, in line with our expectations.
The two atrophy subtypes displayed extensively decreased brain volume in comparison to the normal-appearing group. The Prefrontal/Somatomotor-predominant subtype exhibited reduced volume in brain [...]
The two SuStaIn subtypes exhibited distinct patterns of brain atrophy (Fig. 4). Given the significant difference in SuStaIn stage between subtypes, we conducted additional comparisons of volumetric measurements between subtypes while adjusting for SuStaIn stage, to avoid attributing regional atrophy differences solely to subtypes with more advanced atrophy due to disease progression (Additional file 1: Fig. S2). Similar findings were observed, with differences more concentrated in regions relevant to the respective subtypes. Specifically, the Prefrontal/Somatomotor-predominant subtype exhibited reduced volume primarily in the thalamus, prefrontal and somatomotor cortices, while the Limbic-predominant subtype showed decreased volumes mainly in the temporal lobe, insula, parahippocampal cortex, hippocampus, and amygdala.

Comparison of clinical, genetic, and neuropathological features between subtypes
Demographic, clinical, genetic and neuropathological characteristics for each subtype are summarized in Fig. 3 and Additional file 1: Table S2. Although the two SuStaIn subtypes displayed different patterns of brain atrophy, there were substantial overlaps in clinical features across subtypes. This suggests that despite differences in neurodegenerative patterns, the clinical manifestations and symptomatology remain largely consistent between the subtypes. The Limbic-predominant subtype exhibited poorer performance on the BNT, which assesses language and semantic memory, compared to the Prefrontal/Somatomotor-predominant subtype. In terms of genetic status, the Prefrontal/Somatomotor-predominant subtype had a significantly higher frequency of pathogenic variants in GRN compared to the Limbic-predominant subtype. Notably, all 11 cases with GRN pathogenic variants were classified into the Prefrontal/Somatomotor-predominant subtype.
Although not statistically significant, there was also a trend towards higher frequencies of repeat expansions in C9orf72 in the Prefrontal/Somatomotor-predominant subtype. Additionally, it is worth highlighting that two individuals with bvFTD who had pathogenic variants in the TARDBP gene, as well as one individual with ALS-FTD and one with bvFTD who carried TBK1 pathogenic variants, were all classified under the Limbic-predominant subtype. The distribution of TDP-43 types varied across SuStaIn subtypes. The Prefrontal/Somatomotor-predominant subtype had a higher proportion of TDP-43 type A. The Limbic-predominant subtype was more prone to having TDP-43 type B or E, and all three bvFTD individuals with TDP-43 type C also belonged to this subtype. The TDP-43 non-specific type, predominantly observed in individuals with ALS-TDP, was more prevalent in the Prefrontal/Somatomotor-predominant subtype than in the Limbic-predominant subtype. Compared to the atrophy subtypes, the normal-appearing group had a significantly shorter diagnostic delay and a higher proportion of individuals with ALS than ALS-FTD and bvFTD. Additionally, this group had a lower frequency of cognitive onset relative to lower and upper motor neuron onset. It also showed higher cognitive scores, as evidenced by better performance on tests including the MMSE, ECAS, PBAC, BNT, Animal and Letter fluency tasks, and Digit forward and backward span. Two cases in the normal-appearing group were found to have pathogenic variants in either C9orf72 or MME genes. Additionally, most individuals in this group who underwent autopsy were classified as having TDP-43 non-specific type pathology.
Certain tests (including PUMNS, ALSFRS-R, Progression index, and King's stage) were specifically administered to individuals with ALS/ALS-FTD, as these tests were considered more relevant or sensitive in assessing motor impairments. Thus, we focused on ALS/ALS-FTD as a distinct subgroup to compare clinical profiles across subtypes (Fig. 3, Additional file 1: Table S3). Despite the smaller number of ALS-FTD cases in this cohort, the Limbic-predominant subtype still exhibited a higher percentage of individuals with ALS-FTD compared to the Prefrontal/Somatomotor-predominant subtype. Likewise, individuals who experienced cognitive onset were more likely to be classified under the Limbic-predominant subtype, given that this subtype had more individuals with cognitive decline. Regarding the motor symptom scales, the normal-appearing group tended to have lower King's stages compared to the atrophy subtypes. Moreover, focusing solely on bvFTD (Additional file 1: Table S4) yielded a relatively homogeneous sample, allowing a comprehensive examination of cognitive function across subtypes. In this subgroup, the Limbic-predominant subtype had a longer disease duration and showed worse performance only on the BNT.
To demonstrate that the differences between the two subtypes were related to atrophy patterns rather than one subtype being in a more advanced stage, we further adjusted for SuStaIn stage when comparing the clinical profiles. This adjustment allowed us to account for the potential confounding effect of disease progression.
Even after adjusting for SuStaIn stage, the Limbic-predominant subtype still showed poorer performance on the BNT (t-statistic = -5.70, p < 0.0001) and the language scale of the PBAC (t-statistic = -2.17, p = 0.03) compared to the Prefrontal/Somatomotor-predominant subtype. This finding further supported the presence of language impairments in the Limbic-predominant subtype. Furthermore, the Limbic-predominant subtype showed a longer diagnostic delay (t-statistic = 2.009, p = 0.04).

Relationship between SuStaIn stage and clinical characteristics
Each individual was assigned a SuStaIn stage, which reflected progression of brain atrophy. The distribution of individuals assigned to each SuStaIn stage is illustrated in Fig. 5a. ALS individuals were predominantly assigned to earlier SuStaIn stages of brain atrophy, while ALS-FTD and bvFTD individuals were more frequently assigned to later stages (Fig. 5b). Individuals in the Limbic-predominant subtype had higher SuStaIn stages than individuals in the Prefrontal/Somatomotor-predominant subtype (Fig. 3b, Tables S2-S3).
We further investigated the relationship between SuStaIn stages and clinical profiles, genotypes, and neuropathologies in all individuals. The SuStaIn stage was positively correlated with disease duration (r = 0.22, p = 0.002; Fig. 5f) and diagnostic delay (r = 0.46, p < 0.0001; Fig. 5g), and negatively correlated with cognitive scales including MMSE (r = -0.50, p < 0.0001; Fig. 5h), ECAS scores, PBAC score, BNT, Animal and Letter fluency tasks, and Digit forward and backward span tasks (Additional file 1: Fig. S3). In terms of motor symptoms, individuals with ALS/ALS-FTD who had higher King's stages exhibited higher SuStaIn stages compared to individuals in King's stage 1 (Fig. 5c). Furthermore, individuals carrying pathogenic variants in C9orf72 and GRN had significantly higher SuStaIn stages compared to those with sporadic forms of the disease (Fig. 5d). Individuals with pathogenic variants in C9orf72 exhibited higher SuStaIn stages than those who had pathogenic variants in GRN. Furthermore, autopsy-confirmed TDP-43 typable cases, including types A, B, C, and E, also showed significantly higher SuStaIn stages than cases with TDP-43 non-specific type (Fig. 5e).

Longitudinal stability and reliability of SuStaIn subtypes and stages
Subtyping stability. The mean follow-up period was 17.5 months, with a standard deviation of 13.1 months. The subtype assignments of follow-up visits are shown in Fig. 6a and Additional file 1: [...] were deemed "subtype stable" individuals. The remaining 3 (4.8%) follow-up visits resulted in inconsistent subtype assignments and were considered "subtype unstable". The probability of subtype assignment at baseline was higher in subtype stable individuals than in unstable individuals (Mann-Whitney U-statistic = 27, p = 0.04; Fig. 6b). Individuals assigned to the Prefrontal/Somatomotor-predominant subtype exhibited more atrophy in its key regions, the BASC-identified clusters 1, 2, 5, and 10. The Limbic-predominant subtype showed more atrophy in its key regions, the BASC-identified clusters 9 and 12 (Fig. 2, Additional file 1: Fig. S4). During follow-up visits, brain atrophy showed slight progression. Specifically, the two normal-appearing cases that progressed to the Limbic-predominant subtype exhibited significant atrophy progression, particularly in clusters 9 and 12. In contrast, the two normal-appearing cases that progressed to the Prefrontal/Somatomotor-predominant subtype showed more widespread atrophy progression, particularly in the prefrontal cortex, with less pronounced progression in limbic-related regions (Additional file 1: Fig. S4b). Cases displaying abnormal longitudinal changes were typically classified as "subtype unstable" or "stage unstable".
Staging reliability. Among individuals with a stable subtype, most of the follow-up visits were assigned to a more advanced SuStaIn stage or remained at the same stage. Of the 59 subtype stable cases, 6 (10.2%) retrogressed to an earlier stage at follow-up and were regarded as "stage unstable" individuals. The probability of stage assignment at baseline was significantly higher in stage stable individuals compared to unstable individuals (Mann-Whitney U-statistic = 45, p = 0.003; Fig. 6c). The annualized change in SuStaIn stage may indicate the rate of disease progression, with the normal-appearing group showing slower progression than the Prefrontal/Somatomotor-predominant subtype (Mann-Whitney U-statistic = 110, p = 0.01; Fig. 6d). In stage stable individuals, the annualized change in SuStaIn stage was significantly smaller in the normal-appearing group compared to both atrophy subtypes (Mann-Whitney U-statistic = 86 and p = 0.003 for the Prefrontal/Somatomotor-predominant subtype, and Mann-Whitney U-statistic = 14 and p = 0.005 for the Limbic-predominant subtype). Additionally, the SuStaIn stage at baseline was significantly correlated with the stage at follow-up visits (r = 0.89, p < 0.0001; Fig. 6e). Furthermore, we observed a positive correlation between the follow-up period and the change in SuStaIn stage (r = 0.27, p = 0.04; Fig. 6f).

Discussion
In this study, we utilized a data-driven SuStaIn model approach to investigate diverse spatial and temporal patterns of brain atrophy in the ALS-FTD spectrum. By analyzing baseline cross-sectional volumetric imaging data, we identified distinct patterns of regional brain atrophy, which included a Prefrontal/Somatomotor-predominant subtype, a Limbic-predominant subtype and a normal-appearing group. These data-driven subtypes exhibited variations in clinical, genetic and neuropathological characteristics. Moreover, the data-driven SuStaIn stages constructed progression trajectories of each subtype, which aligned with worsening clinical profiles. Together, our findings provided new insights into the heterogeneity in progression patterns of brain atrophy in the ALS-FTD spectrum and highlighted the potential utility for patient stratification in precision medicine.
Previous evidence has demonstrated that the ALS-FTD spectrum displays a high degree of clinical, genetic and neuropathological heterogeneity [8]. Although various biomarkers have been applied to subtype individuals and characterize their brain atrophy patterns within the ALS-FTD spectrum [20,53,54], there is still no ideal method to fully disentangle the heterogeneity of brain atrophy. Using the SuStaIn model, we identified data-driven subtypes with distinct progression patterns of brain atrophy. The Prefrontal/Somatomotor-predominant and Limbic-predominant subtypes exhibited brain atrophy in both shared and distinct brain regions. As their names suggest, the two subtypes were characterized by their distinctive regions of atrophy: the Prefrontal/Somatomotor-predominant subtype exhibited atrophy in prefrontal and somatomotor regions, while the Limbic-predominant subtype exhibited atrophy in limbic-related regions such as temporal regions, hippocampus and amygdala. In addition, both subtypes exhibited volumetric loss in several shared brain regions including prefrontal, paralimbic, and subcortical regions. The prefrontal regions were likely to be the vulnerable regions in the Prefrontal/Somatomotor-predominant subtype, while the orbitofrontal cortex and insula, two major components of the paralimbic belt, were vulnerable regions in the Limbic-predominant subtype. Our findings were partly consistent with previous studies that have identified subtypes of brain atrophy in subsets of the ALS-FTD spectrum [24,55]. Tan  . Young et al. applied the SuStaIn model to genetic FTD, and identified a temporal subtype and a frontotemporal subtype of brain atrophy [29]. Our study trained the SuStaIn model on a diverse range of clinical phenotypes within the ALS-FTD spectrum. Our approach benefited from considering both spatial and temporal progression of brain atrophy, setting it apart from previous subtype-only and stage-only studies. 
By incorporating spatial patterns of brain atrophy, we gained a more comprehensive understanding of the different subtypes within the ALS-FTD spectrum. Simultaneously, analysis of temporal progression allowed us to capture the dynamic nature of brain atrophy in the ALS-FTD spectrum and to determine the progressive stage of an individual. As a result, the two subtypes we identified provide a comprehensive summary of the characteristics of previously identified subtypes.
The two brain atrophy subtypes showed distinct characteristics. The Limbic-predominant subtype captured a higher proportion of individuals with cognitive (rather than motor) symptom onset, with more pronounced cognitive decline, particularly in the language domain. This subtype resembled a semantic variant primary progressive aphasia pattern. The Prefrontal/Somatomotor-predominant subtype had higher frequencies of pathogenic variants in C9orf72. Carriers of C9orf72 pathogenic variants have been shown to exhibit prominent structural and functional disruptions in various brain regions, including prefrontal and motor cortices [57,58]. Additionally, this subtype included all carriers of GRN pathogenic variants. FTD individuals with GRN pathogenic variants may exhibit asymmetric cortical atrophy involving frontal, temporal and parietal cortices [10,59,60]. Both bvFTD individuals with the I383V variant in the TARDBP gene fell into the Limbic-predominant subtype, consistent with previous observations that the I383V variant was associated with predominant atrophy of the temporal lobes and hippocampus [61,62]. The distribution of TDP-43 types differed between subtypes. The Prefrontal/Somatomotor-predominant subtype had a higher proportion of type A, which has been linked to atrophy in the dorsal frontotemporal, striatal, and thalamic regions [53,57], all of which were predominant regions of this subtype. The Limbic-predominant subtype presented higher proportions of TDP-43 types B and E. TDP-43 type B has been reported to be associated with relatively symmetric atrophy of the medial temporal, medial prefrontal, and orbitofrontal-insular cortices [53], which are regions involved in the Limbic-predominant subtype. TDP-43 type C was highly associated with neurodegeneration in the anterior temporal lobes including the temporal pole and amygdalo-hippocampal area [63]. 
It is notable that all three bvFTD individuals with confirmed TDP-43 type C pathology fell into the Limbic-predominant subtype, which aligns with a staging system of brain atrophy in TDP-43 type C with early involvement of the amygdala, medial and lateral temporal cortex, and temporal pole, followed by later involvement of the insula [64]. The normal-appearing group displayed better cognitive abilities in various domains including executive functioning, language, visual skill, and memory, as well as milder behavioral symptoms and a tendency towards shorter disease duration. This group mostly consisted of individuals with ALS, who exhibited better cognitive performance and were more likely to have lower motor neuron onset. This observation is in line with established knowledge, which suggests that ALS typically exhibits a lesser degree of cortical TDP-43 pathology and greater involvement of lower motor neurons [65]. The spread of TDP-43 pathology in ALS follows a sequential pattern, starting from motor neurons in the spinal cord, brainstem, and agranular motor cortex, then propagating to the frontotemporal and subcortical regions [50].
The SuStaIn model further reconstructed the progression trajectories of brain atrophy of each subtype. The SuStaIn stages represent an ordered progression of brain atrophy from normal to a certain degree of abnormality. The Limbic-predominant subtype had higher SuStaIn stages, indicating a more advanced degree of brain atrophy progression than the Prefrontal/Somatomotor-predominant subtype. Individuals with genetic pathogenic variants were assigned to more advanced stages compared to sporadic individuals. Specifically, the individuals with pathogenic variants in GRN exhibited more advanced stages than those with the C9orf72 repeat expansions. This aligns with previous work demonstrating a faster progression rate of brain atrophy in individuals with pathogenic variants in GRN than in those with variants in C9orf72 [20,66]. Furthermore, individuals with the TDP-43 non-specific type exhibited lower SuStaIn stages compared to those with typable TDP-43 pathology. This is because the TDP-43 non-specific type mainly consisted of ALS cases with less cortical pathology, making them unclassifiable into specific TDP-43 types. These individuals exhibited less brain atrophy, indicating an early-stage level of brain atrophy. As individuals entered advanced SuStaIn stages, brain atrophy increased in degree and spatial extent, accompanied by a subsequent progression of clinical symptoms. SuStaIn stage showed good linear correlations with clinical progression measures including disease duration, motor symptom severity and cognitive decline, suggesting that it is a reliable representation of disease progression and could be used to evaluate how advanced an individual's disease is.
To test the reliability of the SuStaIn model, we examined the consistency of subtype assignments on follow-up MRI data. The results supported the effectiveness of the disease progression model in subtyping and staging, as 95.2% of the individuals showed stable subtype assignments over time. This includes individuals who were consistently assigned to the same subtype, and those who progressed from the normal-appearing group to the corresponding atrophy subtype as brain atrophy initiated in either prefrontal/somatomotor or limbic-related regions. Overall, the model demonstrated a subtyping reliability of 95.2%. Staging reliability refers to the proportion of follow-up visits where individuals either advanced to a higher SuStaIn stage or remained at the same stage as at the baseline assessment. The model exhibited a staging reliability of 89.8%; the remaining cases may be explained by the lower probabilities of stage assignment in stage-unstable cases, which made them more prone to being assigned to an earlier stage at follow-up. The "subtype unstable" or "stage unstable" assignments in longitudinal assessments could be attributed to various factors, including technical issues that may lead to inconsistencies in the measured imaging features used to classify subtypes or stages. Moreover, our findings revealed progressive worsening of brain atrophy over time, with longer follow-up periods associated with greater changes in SuStaIn stage, reflecting more advanced disease progression.
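The staging reliability quoted above follows directly from the counts given in the Results (59 subtype-stable cases, of which 6 regressed to an earlier stage). The short Python sketch below reproduces that arithmetic; the function name is hypothetical and is introduced only for illustration.

```python
def staging_reliability(n_subtype_stable: int, n_stage_unstable: int) -> float:
    """Proportion of subtype-stable cases whose stage advanced or stayed the same."""
    if n_subtype_stable <= 0:
        raise ValueError("number of subtype-stable cases must be positive")
    if not 0 <= n_stage_unstable <= n_subtype_stable:
        raise ValueError("unstable count must lie between 0 and the total")
    return (n_subtype_stable - n_stage_unstable) / n_subtype_stable


# Counts taken from the Results: 59 subtype-stable cases, 6 stage-unstable.
reliability = staging_reliability(59, 6)
print(f"stage unstable: {1 - reliability:.1%}")  # prints "stage unstable: 10.2%"
print(f"stage stable:   {reliability:.1%}")      # prints "stage stable:   89.8%"
```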
There are several limitations to consider in future work. One limitation is the inherent heterogeneity of the ALS-FTD spectrum. Our clinical assessments were routinely collected clinical measures (e.g., ALSFRS-R, UMN) that largely did not differ across the observed subtypes, but a more detailed clinical exam or finer-grained motor measures may better identify how our observed patterns map onto clinical heterogeneity in the future.
Our study specifically focused on individuals associated with TDP-43 proteinopathies. This selective focus may restrict the generalizability of the SuStaIn model in capturing the full extent of heterogeneity within the ALS-FTD spectrum, including bvFTD due to a tauopathy or an atypical form of AD. Another limitation is the lack of sampling from important regions including the spinal cord and brainstem, which play crucial roles in the pathophysiology of ALS. This limitation may partially explain why approximately 40% of ALS individuals were assigned to the normal-appearing group without apparent brain atrophy. The absence of data from these regions may mask important changes occurring specifically in the spinal cord and brainstem, thereby restricting our ability to fully comprehend the underlying neurodegenerative processes in ALS. Future investigations should address these limitations to gain a more comprehensive understanding of the ALS-FTD spectrum.

Conclusions
In general, we utilized the SuStaIn model to gain a deeper understanding of the heterogeneity within progressive processes of the ALS-FTD spectrum. We demonstrated two distinct spatiotemporal subtypes of cortical atrophy with varying clinical, genetic and neuropathological profiles, which shed light on the intricate progression patterns and heterogeneity of the ALS-FTD spectrum. This data-driven disease progression modelling method provided a valuable tool for individual classification and staging, paving the way for precision medicine in the field.

Figure 1
Methodology of selecting the optimal number of brain clusters and subtypes. (a) Bootstrap analysis of stable clusters on cortical and subcortical volume. The stability matrix showed that partitions of the brain were classified into stable clusters. Cross-validation was employed, and (b) out-of-sample log-likelihood and (c) CVIC were both calculated to select the optimal number of subtypes. (d) Subtype probability across SuStaIn stage. CVIC, cross-validation information criterion.

Figure 2
Subtype progression patterns identified by the SuStaIn algorithm. (a) W-scores of subtype progression patterns for each region for each subtype. Color shade represents the probability that a given w-score in each region is reached at each SuStaIn stage, with red for mild atrophy (w-score = 1), magenta for moderate atrophy (w-score = 2), and blue for severe atrophy (w-score = 3). (b) Spatial distribution and degree of cortical atrophy at each SuStaIn stage. Color shades represent the cumulative sum of probabilities in each brain region.

Comparison of volumetric measurements between groups at baseline. (a) Cortical volumetric differences between groups at baseline. (b) Subcortical volumetric differences between groups at baseline. Only results surviving an FDR-corrected threshold of p < 0.05 are shown. Cool colors indicate more cortical atrophy in the former group than in the latter, while warm colors indicate more cortical atrophy in the latter group than in the former.