Short Sensory Profile Responses in non-syndromic ASD are heterogeneous
The primary goal of this study was to use sensory features to identify subgroups within ASD. Creating subgroups requires variation within the group, so we first examined the distribution of SSP scores within our cohort. Consistent with prior reports (Rogers et al., 2003; Tomchek and Dunn 2007; Ashburner et al., 2008; Chen et al., 2009), we observed a wide range of scores in each of the seven sensory areas assayed by the SSP (Fig. 2A). Most histograms skewed towards higher values, which are associated with typical scoring ranges. However, the score distributions for under-responsiveness/seeks sensation and auditory filtering were bell-shaped with the peak in mid-range scores, suggesting that most subjects were probably different from typically developing children in those areas (Fig. 2A). Only eight subjects (2%) scored in the typically performing range (Fig. 2A, green shading) for all sensory areas. Similarly, only 24 subjects (6%) scored in the definitely different range (Fig. 2A, blue shading) for all sensory areas. Scoring in the probably or definitely different range for all sensory areas was slightly more common, seen in 54 subjects (14%). Overall, the scores were broadly distributed, suggesting sufficient variation to allow for clustering.
Meaningful subgrouping requires that the features being grouped are not random. Therefore, we next asked whether all possible combinations of sensory features were present. To answer this question, the data were simplified into binary categories: each subject was classified in each sensory area as either typically performing or different from typical performance (probably different or definitely different), using standardized, previously published ranges from a group of 1,037 children without disabilities (McIntosh et al. 1999). Using this method, only 82 of the 128 possible combinations of sensory areas were present in the cohort, suggesting that sensory changes are not random and that certain sensory areas tend to be affected together. In fact, the 10 most common patterns described 162 subjects (43%) (Table 2). Forty-five patterns were rare, each being seen in three or fewer subjects.
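The pattern count described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `pattern_summary` and the toy profiles are hypothetical, and each subject is assumed to be encoded as a tuple of seven 0/1 values (0 = typical performance, 1 = probably or definitely different).

```python
from collections import Counter

N_AREAS = 7  # the seven sensory areas assayed by the SSP


def pattern_summary(binary_profiles, rare_cutoff=3, top_n=10):
    """Summarize which binary sensory patterns occur in a cohort.

    binary_profiles: iterable of length-7 tuples of 0 (typical) or 1 (different).
    """
    counts = Counter(tuple(p) for p in binary_profiles)
    return {
        "possible": 2 ** N_AREAS,            # 128 combinations for 7 binary areas
        "observed": len(counts),             # distinct patterns actually present
        "top": counts.most_common(top_n),    # most frequent patterns with counts
        "rare": sum(1 for c in counts.values() if c <= rare_cutoff),
    }


# Toy cohort (hypothetical values, for illustration only)
toy_profiles = (
    [(1, 1, 1, 1, 1, 1, 1)] * 3
    + [(0, 0, 0, 0, 0, 0, 0)] * 2
    + [(0, 0, 0, 0, 1, 0, 1)]
)
summary = pattern_summary(toy_profiles)
print(summary["observed"], "of", summary["possible"], "patterns observed")
```

If every combination were equally likely, a cohort of several hundred subjects would be expected to cover most of the 128 patterns, which is why a much smaller observed count points to non-random co-occurrence.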
Broad heterogeneity was also present in the distribution of hyper- and hypo-sensitivity scores (Fig. 2B) derived from a subset of the SSP responses (Table 1). Notably, many subjects were both hyper- and hypo-sensitive in the modalities of touch (47, 12.4%), hearing (156, 41%) and vision (35, 9%). Consistent with the analysis of the seven sensory areas defined by the SSP, only 26 subjects (7%) were hyper-sensitive in all sensory modalities. Thirty-two subjects (8.4%) were neither hyper- nor hypo-sensitive in any sensory modality. Overall, scores in the subset analysis were similarly broadly distributed, suggesting sufficient variation to allow for clustering.
Cluster analysis optimizes at seven sensory-based subgroups
Having determined that SSP scores in this cohort were sufficiently broadly distributed to allow for clustering, we performed K-means cluster analysis on responses from 377 subjects. K-means clustering was run with K = 3 to K = 10 on bootstrap samples for 100 iterations, and the consensus clusters for each K were used as the final clusters for subsequent analyses. This cluster analysis was performed separately using responses to all 38 questions (Fig. 3A) and the hyper/hypo-sensitivity subset (Fig. 3B). Visual inspection of the consensus heatmaps showed that K = 3 provided the most consistent clustering. However, these groupings could be described as “affected in all areas,” “affected in no areas” and “all other subjects” and thus were not clinically relevant. Overall, we determined that six clusters represented the most reproducible and consistent number of subgroups (Fig. 3).
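The bootstrap consensus procedure can be illustrated with a short sketch. It assumes a plain Lloyd's K-means and a consensus matrix that records, for each pair of subjects, the fraction of bootstrap samples containing both in which they fell in the same cluster. The function names, the toy data, and the NumPy-only implementation are assumptions for illustration; the software actually used is not specified in the text.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's K-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def consensus_matrix(X, k, n_boot=100, seed=0):
    """Fraction of co-sampled bootstrap runs in which two subjects co-cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_boot):
        idx = rng.choice(n, size=n, replace=True)       # bootstrap sample
        labels = kmeans_labels(X[idx], k, rng)
        uniq, first = np.unique(idx, return_index=True)  # drop duplicate draws
        lab = labels[first]
        sampled[np.ix_(uniq, uniq)] += 1
        together[np.ix_(uniq, uniq)] += lab[:, None] == lab[None, :]
    return np.divide(together, sampled, out=np.zeros_like(together),
                     where=sampled > 0)


# Toy data: two well-separated groups of 10 "subjects" (hypothetical values)
data_rng = np.random.default_rng(1)
X_toy = np.vstack([data_rng.normal(0.0, 0.1, (10, 3)),
                   data_rng.normal(10.0, 0.1, (10, 3))])
C = consensus_matrix(X_toy, k=2)
```

Repeating this for K = 3 to K = 10 and plotting each matrix as a heatmap gives the kind of display inspected in Fig. 3: a stable K produces crisp blocks of values near 1 and 0, while an unstable K produces intermediate values.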
Cluster analysis of responses to all 38 questions in the SSP identified six subgroups (Fig. 4A). Group 1 (N = 72) was characterized by a high percentage of subjects who scored probably or definitely different in all seven areas. Conversely, group 2 (N = 72) was characterized by predominantly typical performance in six of the seven sensory areas; subjects in group 2 scored probably different only in area seven (visual/auditory sensitivity). Each of the remaining four subgroups exhibited a unique pattern (Fig. 4A). For example, group 3 (N = 64) was defined by subjects scoring in the definitely different range in all areas except three and six (movement sensitivity and low energy/weak). In contrast, group 4 (N = 79) was defined by subjects scoring in the definitely different range only in areas four and five (under-responsiveness/seeks sensation and auditory filtering), paired with probably different scoring in area seven (visual/auditory sensitivity). Group 5 (N = 63) was associated with isolated scoring in the definitely different range for areas two and five (taste/smell sensitivity and auditory filtering) as well as probably different scores for area seven. Finally, group 6 (N = 27) was associated with scores in the definitely different range for areas five and six (auditory filtering and low energy/weak), with mixed scores in areas one through four and probably different scores for area seven. Consistent with the distribution of scores in this cohort (Fig. 2), subjects scoring probably or definitely different in areas five (auditory filtering) and seven (visual/auditory sensitivity) were seen in all six clusters. Thus, cluster analysis identified a cohort of subjects likely to be different in all or almost all areas, a cohort of subjects mostly scoring in the typically developing range, and four unique clusters with distinct patterns of sensory features.
Next, we performed the same analysis using the subset of questions evaluating hyper- and hypo-sensitivity (Table 2, Fig. 4B). Notably, group 1 (N = 92), characterized by increased hyper- and hypo-sensitivity in all sensory modalities, was similar to group 1 in the 38-question analysis. Similarly, group 4 (N = 51) was characterized by the absence of hyper- or hypo-sensitivity in all sensory modalities and group 3 (N = 60) was characterized by isolated auditory hypo-sensitivity. These two groups corresponded phenotypically to 38-question group 2 and had the greatest overlap in subjects with that group. The remaining three groups were defined by distinct patterns of hyper- and hypo-sensitivity. Specifically, group 2 (N = 74) was characterized by auditory hyper- and hypo-sensitivity. Group 5 (N = 58) was characterized by hypo-sensitivity in the auditory and tactile domains. Finally, group 6 (N = 42) was characterized by auditory hypo-sensitivity and hyper-sensitivity to taste (Fig. 4B). Similar to the 38-question analysis, five of the six groups included auditory hypo-sensitivity, reflecting the high prevalence of auditory behaviors in this cohort. Thus, both cluster analyses identified a cohort of subjects that were very affected, a group mostly scoring in the typically developing or unaffected range, and at least three unique clusters with distinct patterns of sensory features.
Having identified phenotypically distinct subgroups using two methods, we validated the subgroups by evaluating the degree of overlap between the two methods. As expected given the corresponding phenotypic features, there was notable overlap between the groups identified by each method (Fig. 5). However, 38-question group 2 was divided between hyper/hypo-sensitivity groups 3 and 4. Similarly, hyper/hypo-sensitivity group 1 had the greatest overlap with both 38-question groups 1 and 6. The overlap between the two methods best identified seven subgroups (Fig. 5, highlighted boxes) comprising 222 subjects.
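The overlap between two cluster assignments can be tabulated as a contingency table, as in the sketch below. The function name `overlap_table`, the zero-based integer labels, and the toy assignments are illustrative assumptions; they are not taken from the study data.

```python
import numpy as np


def overlap_table(labels_a, labels_b):
    """Count subjects falling in each (cluster in method A, cluster in method B) cell.

    Both inputs are zero-based integer cluster labels, one entry per subject,
    listed in the same subject order.
    """
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    table = np.zeros((labels_a.max() + 1, labels_b.max() + 1), dtype=int)
    np.add.at(table, (labels_a, labels_b), 1)  # unbuffered add handles repeats
    return table


# Toy assignments for six subjects (hypothetical values)
full_questionnaire = [0, 0, 0, 1, 1, 2]
sensitivity_subset = [0, 0, 1, 1, 1, 2]

table = overlap_table(full_questionnaire, sensitivity_subset)
best_match = table.argmax(axis=1)  # subset cluster with the largest overlap per row
print(table)
```

Cells with large counts correspond to pairs of clusters that the two methods agree on; a row whose counts are split across two columns indicates a cluster that one method divides further, as described for 38-question group 2.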
Thus, we concluded that sensory features can reliably be used to generate clinically distinct subgroups within a heterogeneous population of non-syndromic ASD patients. Further, we concluded that there are seven distinct subgroups in the MSSNG cohort.
Subgroups are associated with genetic variants
Having successfully identified seven sensory-based subgroups, we next asked whether these subgroups predicted shared molecular mechanisms, as indicated by correlation with genetic variants identified by whole genome sequencing (Fig. 1). The analysis included variants from the MSSNG database that had a CADD score of at least 15 and were annotated to a gene; these variants were aggregated to the gene level for each patient for further analysis. Whole genome sequencing results were not available for seven subjects, who came from five of the seven subgroups. In total, we identified 24,896 genes for which at least one patient had an annotated variant.
To ask whether variants in a given gene were associated with a subgroup, we calculated the variant frequency within each subgroup for each of these genes. The mean variant frequency was 9.14% (Fig. 6A). Most genes had a low variant frequency; even two standard deviations above the mean, the variant frequency was only 30%. Importantly, because all variants were collapsed, this analysis considers both common and rare variants, and thus a low variant frequency is not unexpected. Many genes (41.5%) had a low variant frequency in all subgroups, meaning that less than 10% of subjects in each subgroup had a variant. In fact, the mean variant frequency for each subgroup was less than 13% (range 7.1%–12.2%). Group 3 had the highest mean variant frequency at 12.2%, which was significantly higher than that of all other groups (Kruskal-Wallis, p < 2.2e-16).
To screen for enrichment of variants in a given subgroup, we calculated the average difference in variant frequency between that subgroup and the other subgroups. For example, 62% of subjects in subgroup 6 had at least one variant in SHANK1. In contrast, the other subgroups had variant frequencies ranging from 29–33%, for an average of 31%. Thus, the average difference in variant frequency was 30.6% (p < 0.0005 for a gene to have a difference in variant frequency of this magnitude), indicating that variants in SHANK1 were 30.6 percentage points more common in subgroup 6 than in the other subgroups. For comparison, the mean frequency of these variants as listed in gnomAD is 0.88%. Plotting the distribution of the difference in variant frequency for each gene revealed that the top 1% of genes had a difference in variant frequency of 21.68% or greater (Fig. 6B).
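The two quantities used here, per-subgroup variant frequency and the difference between one subgroup and the average of the others, can be sketched as below. The boolean subject-by-gene matrix, the function names, and the toy values are assumptions for illustration only and do not reproduce the study data.

```python
import numpy as np


def subgroup_variant_frequency(has_variant, labels, n_groups):
    """Fraction of subjects in each subgroup carrying >= 1 variant in each gene.

    has_variant: boolean array, shape (n_subjects, n_genes).
    labels: zero-based subgroup label per subject.
    Returns an array of shape (n_groups, n_genes).
    """
    labels = np.asarray(labels)
    return np.vstack([has_variant[labels == g].mean(axis=0)
                      for g in range(n_groups)])


def difference_from_others(freq):
    """For each subgroup and gene: frequency minus the mean of the other subgroups."""
    n_groups = freq.shape[0]
    others_mean = (freq.sum(axis=0, keepdims=True) - freq) / (n_groups - 1)
    return freq - others_mean


# Toy example: 6 subjects, 3 subgroups, 2 genes (hypothetical values)
toy_labels = [0, 0, 1, 1, 2, 2]
toy_variants = np.array([
    [True, False],
    [True, False],
    [True, False],
    [False, False],
    [False, False],
    [False, False],
])
freq = subgroup_variant_frequency(toy_variants, toy_labels, n_groups=3)
diff = difference_from_others(freq)
print(diff[:, 0])  # gene 0: enriched in subgroup 0, depleted in subgroup 2
```

In the toy data, gene 0 is carried by all of subgroup 0, half of subgroup 1, and none of subgroup 2, so subgroup 0 sits 0.75 above the mean of the other two. The SHANK1 example in the text has the same form: 62% in one subgroup against an average of about 31% in the rest.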
It is not known what percent difference in variant frequency is clinically significant. Therefore, we considered multiple thresholds based on standard deviations, with the goal of being more inclusive, and ultimately used dual criteria to ensure that each associated gene was associated with just one subgroup. The initial cut-off was four standard deviations (a variant frequency 20.7% higher or lower than the mean of all subgroups). The second criterion accounted for differences in subgroup distribution: any gene that was less than three standard deviations from any other subgroup was also excluded. Under these criteria, some subgroups were associated with a small number of genes: six genes with increased variant frequency in group 2, four genes with increased or decreased variant frequency in group 4, and six genes with increased or decreased variant frequency in group 6. In contrast, group 3 was associated with 126 genes with increased variant frequency, group 5 with 12 genes with increased or decreased variant frequency, and group 7 with 50 genes with increased or decreased variant frequency. Notably, no genes were associated with group 1, which was characterized by difficulties in all sensory areas (Fig. 6C).
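One possible reading of the dual criteria is sketched below. It is an interpretation under stated assumptions rather than the authors' procedure: criterion 1 is taken as a deviation of at least four standard deviations from the mean across all subgroups, and criterion 2 as a gap of at least three standard deviations from every other subgroup. The function name `select_genes`, the standard deviation of 0.05, and the frequencies are all illustrative.

```python
import numpy as np


def select_genes(freq, sd, primary=4.0, secondary=3.0):
    """Flag (subgroup, gene) pairs passing both criteria.

    freq: array of shape (n_groups, n_genes) of variant frequencies (0 to 1).
    sd: standard deviation used to scale both thresholds (assumed known).
    Returns a boolean array with the same shape as freq.
    """
    n_groups = freq.shape[0]
    # Criterion 1 (assumed): far from the mean of all subgroups.
    overall_mean = freq.mean(axis=0, keepdims=True)
    far_from_mean = np.abs(freq - overall_mean) >= primary * sd
    # Criterion 2 (assumed): separated from every other individual subgroup.
    gaps = np.abs(freq[:, None, :] - freq[None, :, :])  # (group, group, gene)
    diag = np.arange(n_groups)
    gaps[diag, diag, :] = np.inf                        # ignore self-comparison
    separated = gaps.min(axis=1) >= secondary * sd
    return far_from_mean & separated


# Toy frequencies for 7 subgroups and 2 genes (hypothetical values)
toy_freq = np.array([
    [0.62, 0.50],
    [0.31, 0.50],
    [0.31, 0.10],
    [0.30, 0.10],
    [0.32, 0.10],
    [0.29, 0.10],
    [0.33, 0.10],
])
mask = select_genes(toy_freq, sd=0.05)
print(mask[:, 0])  # gene 0 is flagged for the first subgroup only
```

The second toy gene shows why a second criterion is useful: two subgroups share the same elevated frequency, so both pass the first cut-off, but neither is separated from the other and the gene is not assigned to a single subgroup.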
We were particularly interested in the three subgroups associated with a small number of genes. Subgroup 2, characterized by differences in sensory areas four (under-responsiveness/seeks sensation) and five (auditory filtering) as well as auditory hyper- and hypo-sensitivity, was associated with variants in FUT8, GGT2, KCNQ1, NALCN-AS1, PHF12, and SLC25A23 (Table 3). Five of these genes are protein-coding, although their functions vary. Subgroup 4, characterized by largely typical performance in all areas for both methods, was associated with variants in CACNA2D3, CUX1, SMG6, and SOBP (Table 3). Three of these four genes were negatively correlated, meaning that subjects in this subgroup were less likely to have variants. Finally, subgroup 6, characterized by definitely different scores in areas two and five (taste/smell sensitivity and auditory filtering) as well as hyper-sensitivity to taste and auditory hypo-sensitivity, was associated with variants in FNDC3B, GRIK4, LINC01798, NPR3, SHANK1, and SPAG7 (Table 3). Notably, the first four of these genes were negatively correlated. Further, SHANK1 was the gene most strongly associated with this subgroup and is part of a gene family previously identified as being related to ASD (Sala et al., 2015; Sato et al., 2012). The remaining subgroups with associated genes (groups 3, 5, and 7) were associated with larger numbers of genes (Supplementary Table 1).