Brain-based gene expression of putative risk genes for anorexia nervosa

The etiology of anorexia nervosa (AN) remains elusive. Recent genome-wide association studies identified the first genes liked to AN which reached genome-wide significance, although our understanding of how these genes confer risk remains preliminary. Here, we leverage the Allen Human Brain Atlas to characterize the spatially distributed gene expression patterns of genes linked to AN in the non-disordered human brain, developing whole-brain maps of AN gene expression. We found that genes associated with AN are most expressed in the brain, relative to all other body tissue types, and demonstrate gene-specific expression patterns which extend to cerebellar, temporal and basal ganglia structures in particular. fMRI meta-analyses reveal that AN gene expression maps correspond with functional brain activity involved in processing and anticipating appetitive and aversive cues. Findings offer novel insights around putative mechanisms through which genes associated with AN may confer risk.

Anorexia nervosa (AN) is a debilitating and life-threatening psychiatric disorder characterized by self-directed starvation, low weight and emaciation, and an intense fear of weight gain [1].These core symptomatic features appear homogeneous across both time and cultures, and a stereotypic post-pubertal onset is commonly observed.Illness duration persists for several decades without remission for more than half of those afflicted [2], and recent meta-analyses suggest that leading specialized treatments do not outperform control treatments on key indices of symptom remission [3].With illness pathophysiology and the biological mechanisms driving the potentially lethal self-directed restriction of food intake remaining elusive, the need to better characterize the pathophysiology of AN is critical.
Despite the ubiquity of driven weight loss efforts in Western society, AN effects ~1% of the population, and appears to be a largely heritable neuropsychiatric phenotype.For instance, (i) epidemiological studies have illustrated an 11-fold greater risk of AN in first degree relatives of AN probands [4,5], (ii) heritability rates among twins range from 48 to 84% [6][7][8][9], and (iii) the variability in twin-based heritability of AN is due to genetic variation [10].Cumulatively, this has prompted some to contest that genetics are the single greatest risk factor for AN [6].
Locating the precise source of the genetic risk for AN, however, has proven challenging.Owing to the noted challenges around recruitment of AN participants for research, recent transcriptomic [11], whole exome sequencing [12] and genome-wide association studies (GWAS) [10,[13][14][15] have been underpowered, and have not yielded definitive insights as to the genetic architecture of AN.
However, the more recent cumulative aggregation of all available genotyped AN data in the world recently resulted in the breakthrough discovery of eight loci reaching genome-wide significance, that are associated with nine genes [16].
Following this landmark discovery of the genetic loci associated with AN-a highly complex neuropsychiatric disease, a fundamental challenge facing the 'post-GWAS era' lies in understanding how specific genes confer risk.Elucidating the magnitude and neuroanatomical distribution of the expression of the prioritized genes may offer novel insights into the neurogenetic mechanisms of AN.Specifically, genetic associations at the level of nondisordered human tissue transcriptome may offer important insights into normative gene function, without the confound of clinical epiphenomena common among clinical populations [17], and postmortem mRNA of human samples in particular has been outlined as the 'ultimate intermediate phenotype' to examine neuropsychiatric disorders [18].
Here we characterize the neuroanatomical distribution of multiarray derived mRNA expression patterns of the nine genes associated with AN [16] in the non-disordered human brain, and explore the functional relevance of these patterns.To do this, we extracted data from 20,737 protein-coding genes in post-mortem brain tissue from six healthy donors from the Allen Human Brain Atlas (http://human.brain-map.org/).First, (i) we assessed the donor-to-donor reproducibility of the nine genes linked to AN in the largest GWAS to date [16] (Supplementary Fig. 1) by leveraging the concept of differential stability [19,20].Next, (ii) we created voxel-by-voxel volumetric gene expression maps for genes demonstrating strong differential stability and developed a composite brain map representing an average of the 6 donors on the left hemisphere, (Supplementary Fig. 2).Next, we isolated individual gene expression patterns for each of the 9 genes linked to AN, across 54 brain regions (Supplementary Fig. 3), and additionally undertook out-of-sample replication of these gene expression maps in a separate dataset (Supplementary Fig. 4).In ascertaining the centrality of brain-based gene expression of the 9 risk genes lined to AN, relative to other human body tissue types, we additionally assessed gene expression patterns across 30 different tissue types across the human body (Supplementary Fig. 5).Lastly, we correlated voxel-by-voxel mRNA expression maps for the genes of interest with the NeuroSynth database, performing quantitative reverse inference via large-scale metaanalysis of functional neuroimaging data using mRNA brain expression maps on voxel-by-voxel left hemisphere brain maps to "decode" cognitive states on a given activation map.

Brain-based region-of-interest gene expression
Of the five genes showing reproducible expression patterns, three demonstrated spatially specific expression patterns which reached statistical significance for specific brain regions (Fig. 2).Compared   1.
The relative expression of NCKIPSD and MGMT was not spatially confined to specific brain regions, and appeared to be expressed diffusely.Voxel-by-voxel volumetric gene expression maps for the additional six genes are presented in Supplementary Fig. 3. Importantly, out of sample validation indicates similar expression patterns for all genes of interest in the Genotype-Tissue Expression (GTEx) project database (Supplementary Fig. 4).

Whole-body gene expression
Alongside whole-brain gene expression, we additionally assessed gene expression across 30 different body tissue types in the human body, by extracting normalized gene expression values (reads per kilo base per million; RPKM) from the GTEx database, via the FUMA platform (Supplementary Fig. 5).Normalized expression [zero mean of log2(RPKM + 1)] was used to assess differentially expressed gene sets [21], and Bonferroni adjusted p-values were calculated using two-sided t-tests per gene per tissue against all other tissues.These analyses indicate that the aggregated set of AN risk genes are cumulatively most expressed in the brain, relative to other body tissue types.

Cognitive state correlates
Expression maps for two of the five differentially stable genes (CADM1, NCKIPSD) were highly correlated with functional imaging maps reflecting 'conditioning', 'fear' and 'reward', ranking among the top 0.5% strongest associations for each of these cognitive state activation maps, respectively (Fig. 3).Moreover, when assessing the relationship between gene expression maps and functional activation maps most associated with specific mental states, these same two genes (CADM1, NCKIPSD) were among the 0.5% strongest associations with depression, anxiety, stress and addiction (Fig. 4).In addition, two of these five genes (NCKIPSD, MGMT) were correlated with functional imaging maps reflecting visual processing.

DISCUSSION
The anatomical distribution of gene expression networks in the human brain is idiosyncratic, dynamic and highly coordinated.Cumulatively, gene-gene co-expression patterns represent genetic signatures in the brain which are likely involved in a series of cognitive and affective states, learning processes, and brain disorders.This is especially important to examine in AN, a largely heritable and uniformed neuropsychiatric phenotype, for which the largest GWAS to date recently identified nine genes implicated in its genetic risk [16].While comprehensive post-mortem assessment of brain-based gene expression in AN is lacking, the explication of precisely where in the non-disordered human brain these risk genes are most and least expressed is critical for Fig. 2 The pathway of gene expression for anorexia nervosa risk genes in the human brain.Each point represents expression from six donors with standard errors for each given brain region for (a) CHD10, (b) CADM1, and (c) FOXP1.Asterisks represent regions of statistically significant over or under expression, relative to the rest of the brain (*p < 0.05, FDR corrected for 54 tests).
advancing our understanding of how the genetic signature of AN impacts the mechanisms that drive illness psychopathology.Here, we leveraged the expansive human brain mRNA library from the Allen Brain Atlas to illustrate the spatial location of expression patterns of mRNA reflecting specific genes linked to AN.
Cumulatively, the 9 AN risk genes assessed are most predominantly expressed in the brain, relative to other body tissue types, supporting the notion that AN is a brain-based disorder [22].Further, 5 of these 9 genes demonstrate reliable expression patterning across sex, ethnicity and age, which was corroborated in independent sample comparisons, which is suggestive of well conserved mechanisms of action.In the nondisordered brain, CADM1 mRNA expression was elevated throughout the cerebellum, the olfactory bulb, and in limbic regions including the hippocampus and amygdala.Interestingly, while the amygdala and hippocampus have been implicated in feeding behaviors via their modulation of contextually learned feeding behaviors [23,24], the cerebellum and olfactory bulb have been directly implicated in appetitive signaling and regulation via their respective projections to and from hypothalamic nuclei [25,26], which in turn secrete orexigenic and anorexigenic neuropeptides which directly modulate feeding behaviors [25].
Previous GWAS studies have linked CADM1 to the regulation of body mass and energy homeostasis [27,28].For instance, risk alleles for obesity in humans are associated with greater mRNA expression of the CADM1 gene in the cerebellum and hypothalamus [29], and animal studies have illustrated elevated neuronal expression of CADM1 in cerebellar, hypothalamic and hippocampal regions in obese mice relative to lean mice [30].Moreover, the induction of CADM1 in excitatory neurons facilitates weight gain while paradoxically enhancing energy expenditure [30].In contrast, the removal of CADM1 in knockout mice results in a prolonged negative energy balance, rapid weight loss, and prevents weight gain even in the context of extended high fat dietary regimens [30], which mirrors the rapid weight loss and profound difficulty reported by those with AN in gaining weight [31,32].Importantly, the expression of CADM1 in cerebellar and hippocampal regions may be gated by a dynamic interplay between bodyweight and dietary intake, as dietary restriction serves to reduce CADM1 expression in obese mice, but not in controls [30].Importantly, spatiotemporal trends suggesting that CADM1 expression increases across the lifespan in the olfactory bulb and cerebellum [33], while remaining static from birth onwards in the hippocampus [33], provides intriguing clues around the stereotypic onset of AN in adolescence, although further investigation into spatiotemporal gene expression patterns is warranted.Cumulatively, these data suggest that CADM1 may act on the cumulative genetic risk for AN via its contribution to synaptic plasticity and the excitatory/inhibitory balance of neuronal networks which regulate energy expenditure and bodyweight, which may be exacerbated by low weight.
The FOXP1 gene, which encodes a transcription factor important for the early development of many organs including the brain [34], demonstrated highly diffuse patterns of overexpression in frontal, occipital, parietal, and temporal regions, and patterns of profound under-expression in an array of cerebellar regions, alongside thalamic, hippocampal and amygdalar regions.In addition, CDH10 was significantly underexpressed in central structures such as the caudate, putamen, pallidum, and thalamus, and temporal regions such as the hippocampus and parahippocampus in the non-disordered human brain.Interestingly, both FOXP1 and CDH10 expression patterns have been linked to the pathophysiology of autism [35,36].For instance, overexpression of FOXP1 has been associated with autism [36,37] and related features including language impairment and intellectual disability [38].Elevated expression of CDH10 in the frontal cortex has also been associated with autism spectrum disorders [39].Certainly, AN has been characterized by elevated autistic traits [40,41] and conceptualized by some as a related endophenotype characterized by cognitive rigidity and social cognition [42,43].Cognitive rigidity in AN may also be mediated by the orexin/hyprocretin and oxytocin neuropeptide messaging systems, which have been associated with several brain regions that we have identified in our study [20,44].It is thought that both orexin/ hyprocretin and oxytocin may exert their effect via increasing synaptic plasticity [45][46][47].A particular challenge relating to the treatment of AN relates to the high rates of relapse, which is partly underpinned by cognitive and behavioral inflexibility [48].These data raise an intriguing question around whether FOXP1 and CDH10 expression may serve a plausible mechanism through which cognitive flexibility is altered in those with AN.Interestingly, and while not demonstrating strong differential stability in expression across donors, the C2orf30/ERLEC1 gene demonstrated spatially specific patterns of expression in the nondisordered human brain (Supplementary Fig. 3).Specifically, the ERLEC1 gene is reliably under-expressed in cerebellar regions.Interestingly, the ERLEC1 gene also exhibits bodyweightdependent effects of dietary restriction.Animal studies suggest that during periods of dietary restriction, lower birthweight animals, relative to normal weight animals, demonstrate profound deacetylation at ERLEC1 sites, and an accompanying downregulation of ERLEC1 expression [49].In AN, broader findings have illustrated global volumetric reductions in gray matter structures in the context of starvation [50], suggesting that the brain changes dynamically in response to changing caloric and nutrient availability.The present findings add to this body of evidence, raising the notion of a possibly synergistic effect of weight status and dietary intake in regulating the expression of genes implicated in the pathophysiology of AN.
In examining the functional relevance of genes linked to AN, quantitative reverse inference via meta-analysis of approximately 15,000 fMRI studies illustrated that two of these five differentially stable genes (CADM1, NCKIPSD) were highly correlated with functional imaging maps reflecting 'conditioning', 'fear' and 'reward', respectively.The cognitive state maps relating to fear and reward conditioning had a stronger association with each of these seven two genes than 99.5% of all other protein coding genes in the brain.The psychopathology of AN is centrally by alterations in both fear and reward, inasmuch as typically non-threatening cues (normative bodyweight, palatable food cues, etc) are found highly aversive [51], while typically hedonic cues (food, money, etc) are deemed rewarding or motivating [52].With evidence illustrating abnormal fear conditioning in AN [53], and structural and functional abnormalities in nodes within the fear circuit among those with AN [54], it been postulated that pathogenic fear learning may be fundamental to the psychopathology of AN [55].In tandem, neuroimaging evidence has illustrated reliable aberrancies in circuits underpinning reward [56][57][58], which give rise to altered reward sensitivity (and marked capacity to inhibit and delay reward seeking behaviors [57,[59][60][61].These data accord with our findings noting that several of the genes implicated in AN demonstrate reliable patterns of over (CADM1) and under-expression (FOXP1, CDH10, ERLEC1) in central nodes within fear and rewards circuits, suggesting that the genes linked with AN may confer risk for AN psychopathology by altering processes relating to fear-and reward learning.In conjunction with data illustrating that abnormalities relating to reward and fear are evident premorbidly prior to the development of AN [62], and endure after weight gain [63], our findings support the behavioral endophenotype of AN may be underpinned by spatially specifically risk gene expression patterns.
Cumulatively, these results expand recent GWAS findings by offering proof-of-principle demonstration that the genes conferring risk intersect multiple regions and cognitive processes which function abnormally in AN.These findings offer important insights around putative mechanisms through which risk genes associated with AN may confer psychopathology, and how relevant gene expression patterns depend to an extent on both weight and nutritional status which point towards novel targets in elucidating the pathophysiology of AN.

MATERIALS AND METHODS Gene selection
The largest GWAS of AN to date revealed eight loci which reached genome-wide significance [16].Extraction of the nearest genes (the nearest gene within the region of linkage disequilibrium 'friends' of the lead variant) [16] resulted in the identification of nine candidate genes.

Post-mortem brain samples
mRNA distribution data was obtained from the Allen Human Brain Atlas (http://human.brain-map.org/).One donor was a Hispanic female, three donors were Caucasian males, and two donors were African-American males.Notably, the majority of these donors are male, which represents a limitation given the female skew in prevalence of AN.This in turn underscored the importance of assessing donor-to-donor reproducibility in gene expression patterns.Mean donor age was 42.5 years (S.D. = 11.2 years), and postmortem brain tissue was collected, on average, 22.3 h after death.Data collection adhered to ethical guidelines for the collection of human postmortem issue collection, and institutional review board approval was granted at each issue bank and repository that provided tissue samples.Further, informed consent was obtained from each donor's next-of-kin.For more details regarding data collection procedures, see http://help.brainmap.org/display/humanbrain/Documentation.

Gene expression data
Gene expression data from 20,737 protein coding genes was collected from the Allen Human Brain Atlas.Each donor brain was sampled in 363-946 locations, either in the left hemisphere only (n = 6), or over both hemispheres (n = 2) using a custom 8 × 60 K cDNA array chip.Analyses were performed on left hemisphere samples due to a larger sample size.Individual brain maps were non-linearly registered to the MNI152 (Montreal Neurological Institute) template using Advanced Normalization Tools.Next, we extracted region specific statistics for 54 brain regions based on the Automated Anatomical Label (AAL) atlas.

Donor-to-donor reproducibility of gene expression patterns
Owing to differences in donor sex and ethnicity, we assessed the similarity of gene expression patterns across the six donors by leveraging the concept of differential stability [19], which is the average Spearman's correlation between any possible combination of 15 pairs between donors [20].This method has previously indicated that genes with strong differential stability are highly biologically relevant [19].For analysis, we selected the probe with the greatest differential stability, which represented probe with the least amount of spatial variability among donors.The average correlation (Spearman's r) across 15 pairs of 6 donors' voxel-by-voxel brain maps were used to calculate differential stability using an approach described by Hawrylycz and colleagues (2015) [19].Since each voxel-by-voxel map was generated based on a limited and variable number of samples, to calculate the statistical significance of Spearman's coefficients we calculated p-values between two donors based on the smallest number of samples out of the two.That is, if one donor had samples from 353 locations and another one from 456, we would use the 353 samples to calculate a p-value for the pair.

Out-of-sample validation
The Genotype-Tissue Expression (GTEx) project database was used for independent sample validation.Although this dataset provides gene expression data from fewer brain regions (i.e., 10) compared to the Allen dataset, the data is derived from a larger dataset of donors (mean sample size for mRNA expression across brain regions = 131.7,range = 88-173).Median gene expression profiles from 10 distinct brain regions were extracted for the above-specified 20 genes of interest from the GTEx database and median values were calculated for these same 10 regions from the Allen dataset.For independent sample validation, the rank-order correlation of gene expression between the Allen and GTEx datasets using the 10 brain regions reported in the GTEx database was calculated.

Voxel-by-voxel gene expression maps
To create novel voxel-by-voxel volumetric expression maps, we first marked all the sample locations and expression values in native image space [20,64].To interpolate missing voxels, we labeled brain borders with the sample expression value that had the closest distance to a given border point (Supplementary Fig. 2).Next, we divided the space between scattered points into simplices based on Delaunay triangulation, then linearly interpolated each simplex with values to yield a completed map.All maps were computed in Matlab 2014a (The Mathworks Inc., Natick, MA, USA).We then created a composite brain map representing an average of 6 individuals on the left hemisphere, registered brains to MNI space using ANT's non-linear registration and averaged so that each gene's mRNA is represented by a single voxel-by-voxel brain map.

Cognitive state correlates
The Neurosynth framework has collated neuroimaging data from over 14,000 fMRI studies (database version 0.7, released July, 2018).While this framework can be used to develop meta-analytic brain activation maps for specific cognitive states (i.e., stress, learning, reward) using forward inference, it may also be leveraged to "decode" cognitive states on a given activation map, via reverse inference [20].We correlated voxel-by-voxel mRNA expression maps for the genes of interest with NeuroSynth (version 0.3.7),performing quantitative reverse inference via large-scale meta-analysis of functional neuroimaging data using mRNA brain expression maps on voxelby-voxel left hemisphere brain maps, representing the average of the six donors.Next, we modified the NeuroSynth package to calculate Spearman's correlation coefficient instead of the default Pearson's correlation coefficient.To test the specificity of these cognitive states, we extracted association Z maps, which reflect Z-scores of the association between the presence of activation and the presence of a cognitive feature.We assessed the 5 strongest relationships for expression maps for each gene of interest and all available cognitive state maps in the Neurosynth database.Additionally, we calculated Spearman's correlation between all 20,737 genes and association Z score maps and ranked them from largest to smallest.

Statistical analysis of gene expression data
The R statistical package (version 3.3.2) was used for statistical analysis.One-sample t-tests (two-tailed) were conducted to assess which of the 54 left hemisphere regions from six donor samples expressed mRNA to a significantly greater or lesser degree compared to average mRNA expression across the brain.To correct for multiple tests (54 in total), reported p-values were adjusted using a false discovery rate (FDR) threshold.Cohen's d values for one-sample t-tests were calculated to yield a measure of effect size.
To assess gene expression across various body tissues, normalized gene expression values (reads per kilo base per million; RPKM) were extracted from the GTEx database, via the FUMA platform.As described by Watanabe and colleagues [21], normalized expression [zero mean of log2(RPKM + 1)] was used to assess differentially expressed gene sets.Bonferroni adjusted p-values are then calculated using two-sided t-tests per gene per tissue against all other tissues.Genes with a Bonferroni adjusted p-value < 0.05 and absolute log fold change ≥0.58 were categorized as a differentially expressed gene set in a given tissue type.The presented -log10 p-values represent results from hypergeometric tests, which were used to assess if genes of interest were overrepresented in differentially expressed gene sets in specific tissues.
Similarly, hypergeometric tests were used to assess if genes of interest were overrepresented in gene sets reported in the GWAS catalog and gene sets associated with behavioral and cognitive state processes reported within GO biological processes gene sets within Molecular Signatures Database, using 20,119 protein coding genes as the background set.pvalues were Benjamini-Hochberg adjusted for all genesets reported in the GO biological processes dataset and GWAS catalog, respectively.

Fig. 1
Fig.1Differential stability of protein coding genes.Differential stability for all protein coding genes (n = 20,737) was calculated as a function of the average Spearman's correlation between any possible combination of 15 pairs between donors, to assess the similarity of gene expression patterns from donor-to-donor.FOXP1, CADM1, CDH10, NCKIPSD and MGMT were above the 50 th percentile of all genes, suggesting adequate reproducibility.

Fig. 3
Fig. 3 Cognitive state correlates of the five differentially stable genes associated with AN. a Cognitive states were meta-analytically decoded from AN risk gene mRNA maps (Supplementary Fig. 2) using the NeuroSynth framework.The top five strongest relationships for CADM1, CDH10, FOXP1, MGMT and NCKIPSD are shown, with duplicates removed.b The absolute distribution of Spearman's correlations between each protein coding gene map (N = 20,737) and cognitive state maps.

Fig. 4
Fig. 4 Mental state correlates of the five differentially stable genes associated with AN. a Mental states were meta-analytically decoded from AN risk gene mRNA maps (Supplementary Fig. 2) using the NeuroSynth framework.The top five strongest relationships for CADM1, CDH10, FOXP1, MGMT and NCKIPSD are shown, with duplicates removed.b The absolute distribution of Spearman's correlations between each protein coding gene map (N = 20,737) and mental state maps.