Genotoxic stress in SOD1 G93A mutants
Oxidative genomic damage is increased in spinal cord tissue of SOD1G93A mouse mutants as approved by OdG and γH2AX markers [26–29]. Recent studies indicate that the macrodomain-containing core histone variant macroH2A1 (mH2A1) is involved in DDR and DNA damage repair following its attraction to DSB [65]. Under oxidative stress conditions, it stabilizes the DDR sensor poly(ADP)ribose PAR, reduces oxidative DNA damage and prevents cell death [66]. According to this role, we observed an increased number of nuclear mH2A1+ foci in SOD1G93A mutant SMI-32+ motor neurons (MN) relative to control MN as quantified from grey matter areas (Fig. S1A-C). In contrast to other histone variants, mH2A1 is transcribed throughout the cell cycle and not limited to S phase. To further prove this, mH2A1 detection in post-replicative MN was assessed under co-expression of a cell cycle-sensitive Fucci (fluorescence ubiquitination-based cell cycle indicator) reporter [67] thus confirming a G0/G1 state (Fig. S1A, B). These observations complement previous studies on genotoxic stress evaluation in the SOD1G93A strain [26–29].
Circulomics and proteomics workflow
Under corresponding conditions, eccDNA was isolated from cervical spinal cord of 9 symptomatic hSOD1G93A mutant mice and 10 controls. In short, for each individual sample eccDNA was enriched from 400 ng total genomic DNA by removal of linear DNA with exonuclease, followed by RCA. The obtained eccDNA pool was normalized against each sample’s total DNA amount, followed by paired-end short read sequencing as described earlier [37]. Mapping of the purified circles discovered hundreds to thousands of eccDNAs in the control and disease-affected Mus musculus cervical spinal cord, from which the gene-specific eccDNAs, PpGCs, were calculated. DifCir-based differential analysis [55, 56] of the genic eccDNAs identified a set of PpGCs that were differentially produced in ALS, DPpGCs, and that were up-produced under disease-like conditions (up-DPpGCs) compared to controls. Applying the democratic method [56, 59, 60] for finding common PpGCs (CPpGCs) expanded the set of eccDNA gene hotspots. The experimental setup and computational workflow for eccDNA mapping and differential analysis, and subsequent proteomics analysis, is presented in Fig. 1.
Unique eccDNAs are more abundant in the ALS diseased than control CNS
Almost all, 99%, of the mapped unique eccDNAs in the hSOD1G93A and control samples covered eccDNA sizes of < 104 bp. Detailed length-stratified profiling, as exemplified in the histograms of Fig. 2A, and illustrated for all control and ALS samples in the histograms of Fig. S2A-B, revealed that the majority of eccDNAs were smaller than ~ 104 bp in size (Fig. 2A; Fig. S2A-B). Overall size distributions of eccDNAs were similar for hSOD1G93A and control tissues (Fig. 2A-B; Fig. S2A-B). Notably, in both groups eccDNA exhibited a periodic enrichment with a size-interval of approximately every ~ 200 bp (Fig. 2C), which is in accordance to prior indications of 188 bp regular intervals in mouse embryonic stem cells (mESCs) [68]. Whether such consistent observations can indicate an association with the nucleosome repeat length (NRL) in mouse, comprising the 147 bp of core nucleosomes and their linker stretches, needs further investigations. NRL varies between species, cell types and genomic locations, but can occupy more than 200 bp in mouse [69].
Based on our genotoxic stress model, we anticipated eccDNA accumulations in spinal cord of the hSOD1G93A mutant. To verify this notion, we assessed the sample-individual load of eccDNA species for each group. Irrespective of their absolute quantities, which are not assessable from RCA, the numbers of unique, mappable eccDNAs across all chromosomes up to a size of 105 bp were defined. These ranged from 94–1,501 eccDNAs in controls (µ ± SEM = 480.7 ± 137.3) up to 1,377–5,610 eccDNAs in ALS samples (µ ± SEM = 2,882.6 ± 419.3), revealing a statistically significant, six-fold increase in the number of unique eccDNAs under disease conditions (p-value = 2.165− 05; Fig. 2D) (where µ is the mean and SEM the standard error of the mean). We next explored the chromosomal origin of the eccDNA species prevailing in the CNS, and particularly in presence of the hSOD1G93A mutation. In compliance with a genome-wide impact of the SOD1 enzyme dysfunction and endogenous ROS on chromatin structures, eccDNA provenience was genomically scattered in both the ALS and control samples and arose from all 19 murine autosomes (Fig. 2E; Fig. S3A-B).
Split-reads based differential analysis identifies a distinctive genic eccDNA profile in ALS spinal cord
To assess if any of the genes in the mouse genome were particularly prone to form eccDNA under the two different conditions, we calculated the PpGC for the control and ALS samples. A heatmap of the top 1000 most variable PpGCs is shown in Fig. S4. The paired scatter plot in Fig. 3A discloses differences in the PpGCs produced in the hSOD1G93A mutants and control specimens. For a log2 fold change (FC) > 1 (FC ≥ 2 on the linear scale) and a -log10p-value ≥ 2.0 (significance level α = 0.01) as statistical thresholds, we identified 225 up-DPpGCs, corresponding to genes that were particularly prone to form eccDNA under ALS conditions, i.e., in an ALS-specific pattern (Fig. 3B). No up-DPpGCs were identified in the controls relative to ALS for the same thresholds.
The top up-DPpGCs were shed by the following gene loci: Large1 (-log10p-value = 5.7, FC = 5.2), Csmd1 (-log10p-value = 5.4, FC = 5.9), Sox5 (-log10p-value = 5.2, FC = 6.3), Cdh4 (-log10p-value = 5.2, FC = 5.7), Ntm (-log10p-value = 4.8, FC = 4.9), and Galnt2l (-log10p-value = 4.3, FC = 5.3), as indicated in Fig. 3C. EccDNA release from each of these 6 hotspot loci was consistently detected in at least 8 out of the 9 ALS samples investigated (Fig. 3C). In Fig. 4A, these 6 top up-DPpGCs are further illustrated as ‘linear’ circle track plots together with their coverages, while solely the linear circle plots are shown for all remaining up-DPpGCs in Fig. S5. The linear plots illustrate that up-DPpGCs are represented by eccDNA carrying gene fragments only but no full genes, and the lack of sequence overlap of genic eccDNAs among the samples.
When considering gene size, we found these 6 hotspot genes to be large and, except for Large1, to be exclusively or dominantly transcribed in CNS and skeletal muscle cells, as deduced from single cell cluster expression patterns annotated in the Human Protein Atlas (2022) for different tissues and cell types. To explore a putative linear correlation between the probability to excise eccDNA and the originating gene length, the -log10 (p-values) of the 225 hotspot genes were plotted against the size of the corresponding genes. The resulting regression line (Fig. 4B) and correlation coefficient of R2 = 0.15175 indicated that, though long genes were involved, the propensity to shed eccDNA in our ALS model was not correlated with the size of the underlying genes.
Association of eccDNA gene hotspots with ALS risk l oci shows 63 genes in common
For a comprehensive assessment of whether the 225 up-DPpGCs carry a signature linked to the ALS pathophysiology, we referenced them to publicly available databases that aggregate genetic variants associated with fALS/sALS disorders from genome wide association studies (GWAS). In total, 610 gene sets were notified in association with the term ‘amyotrophic lateral sclerosis’ in the integrative Harmonizome database, which collects information for human and mouse species from GWAS and other genetic association studies and displays empirical cumulative probabilities for the 10% with strongest associations in a standardized fashion [70]. Applying the ‘GWASdb SNP-Disease Associations’ setting for output search [70], 608 genes were found listed in association with the disease ‘amyotrophic lateral sclerosis’. Among, we found 54 listed genes to overlap with our 225 up-DPpGCs, 3 of which, Csmd1, Sox5, Cdh4 were among the aforementioned 6 up-DPpGCs with highest statistical significance (Fig. 5A, B). The total list is found in Table S1. Notably, all of these disease-associated genes were also positively screened in the category of ‘GWASds SNP-Phenotype Associations’, indicating relevance as for symptom manifestation and disease penetrance.
In the context of a second, high-quality curated repository, the National Human Genome Research Institute – European Bioinformatics Institute (NHGRI-EBI) Catalog of human genome-wide association studies [71], approximately 350 variant associations were displayed for the trait ‘ALS’, underlying defined GWAS curation cutoff values of p-value ≤ 1.0 x 10− 5 for single SNP-traits as described in the recently updated deposition resource [71]. From this database collecting human associations only, we could extract 237 unique genes associated with ALS (state: 16/01/2023), 19 of which were overlapping with our 225 up-DPpGCs (Fig. 5A, B; Table S1). These comprised the genes Adamtsl1, Asic2, Camta1, Creb5, Ctnnd2, Dach1, Dpp6, Erbb4, Grid1, Itpr2, Kalrn, Kcnmb2, Lama2, Lama3, Macrod2, Mir99ahg, Opcml, Ptprn2, and Trpm8. Notably, 10 of them, Asic2, Ctnnd2, Creb5, Erbb4, Itpr2, Kalrn, Lama3, Macrod2, Opcml, and Ptprn2, were recorded both in the Harmonizome and NHGRI-EBI study assortment (Fig. 5A, B; Table S1). In total, we found 63 out of the 225 up-DPpGCs, 28.0%, as being indexed in these two databases (Fig. 5A, B; Table S1). The convergence of ALS risk gene considerations established in experimental studies and human GWAS with our eccDNA analyses is comprehensively illustrated in Fig. 5A and B and detailed in Table S1.
EccDNA levels in ALS correlate with the ALS proteome
To investigate whether genes prone to form eccDNA were also affected at the proteome level, we assessed changes in protein abundances in hSOD1G93A and control spinal cord via mass spectrometry. Proteins found to be differentially expressed in the ALS proteome when referenced to the healthy control proteome (q-value < 0.05; absolute average log2 ratio > 0.38; Fig. 5C) were extracted from a cell compartment-specific analysis (Fig. 1F, G). In total, 3,312 differentially expressed proteins (DEPs) were identified. Purity of the 5 subcellular fractions harvested was recently confirmed mass spectrometrically by marker proteins (unpublished data). Overall, when relating our ALS-specific catalogue of eccDNA hotspot genes to the ALS sub-proteome in the respective cell compartment, we found that the highest percentage of eccDNA releasing genes in ALS had counterparts in the Cysk (2.0%) and sNE (1.4%) fractions, followed by comparable percentages in the CP and cNE fractions (1.3% and 1.2%), and lowest in the ME fraction (0.8%) (Fig. 5D).
In total, 42 out of the 225 up-DPpGCs, i.e., 18.7%, exhibited a coincidentally altered protein level under ALS relative to control conditions in at least one of the cellular subfractions (Fig. 5A-C). While all involved genes were up-producing eccDNAs in the hSOD1G93A mutants compared to controls, protein counterparts were either up- or downregulated (Fig. 5C). DEP pendants for the up-DPpGCs with strongest q-values (-log10 scaled) were mainly discordantly down-regulated (Fig. 5C). Seventeen of these 42 DPpGC – DEP tandems overlapped with ALS-associated gene annotations in the Harmonizome database, including Cadm2, Cdh4, Cdh13, Ctnna3, Dclk1, Gria3, Grm7, Grm8, Itpr2, Lama3, Macrod2, Plcb1, Prmd16, Ptprn2, Rbfox1, Wwox, and Zbtb20 (Fig. 5A-B; Table S1), two of which, Macrod2, and Itpr2, also intersected with the human GWAS catalogue (Fig. 5A-B; Table S1). Additionally, the DPpGC – DEP pairs related to Dpp6 and Lama2 were found registered in the human GWAS database (Fig. 5A-B: Table S1). Thus, 19 out 42, i.e., 45.2% of the DPpGC – DEP couples were ascribed to an ALS risk (Fig. 5A-B; Table S1), indicating triple DPpGC-DEP-GWAS risk associations. In comparison to the 28.0% of all up-DPpGCs that were overlapping with ALS risk gene annotations (Fig. 5A-B; Table S1), these protein correlations strengthened our preceding indications of an ALS-related, SOD1G93A- coined eccDNA profile in the spinal cord of affected animals. Prevailing functional denotations associated with the up-DPpGC – down-DEP pairs, according to GeneCards® annotations complemented by manual research, are summarized in Fig. 5E. Itemized are neurotransmitter homeostasis, synaptic transmission, establishment of neuronal projections, cytoskeletal reorganization, cAMP/cGMP homeostasis and DNA repair, all of which are of relevance in the ALS pathology. Screening of single cell expression clusters based on the Human Protein Atlas (2022) proved that the genes underlying these DPpGC – DEP tandems were predominantly expressed in neurons and glia entities, and some of them showed neural and muscular clusters (Fig. 5F).
Additionally, the majority of DPpGCs that remained devoid of a DEP pendant were generated from neuronal genes, as indicated by gene set enrichment analyses (GSEA) (Fig. 6A). Among the gene ontology (GO) terms enriched, our analyses discovered the class of ‘adenylate cyclase modulating g proteins’ as an overrepresented category that included 10 related genes, Adcy2, Akap13, Chrm3, Gng2, Grm7, Grm8, Htr4, Pde4b, Rims2 and Rit2 (Fig. 6A), all of which up-produced DPpGCs under ALS conditions. The entire fraction of PpGCs generated from genes of the adenylate cyclase signaling pathway are displayed in Fig. 6B, and their numerical overrepresentation is illustrated in the violin plot of Fig. 6C.
Higher number of PpGCs from RDC neural coding genes in ALS than in controls
To explore a putative preponderance in eccDNA production by chromosome structure as found described recently [72], the parent loci of the 225 up-DPpGCs in ALS were mapped back to the murine chromosomal set (Fig. 7A). No chromosome proved to be significantly enriched in up-DPpGC shedding (Fig. 7B).
Predisposing factors for genes to liberate eccDNA can be structural or functional, and might be associated with susceptibility loci for DSB. For mice, 27 neural coding genes are known to harbor recurrent DSB clusters (RDC), as demasked under conditions of deficient DNA repair and replication stress [73]. Ten of these 27 RDC genes (37.0%), i.e., Csmd1, Csmd3, Ntm, Cdh13, Magi1, Grik2, Rbfox1, Ctnnd2, Cadm2 and Wwox also occurred in our list of up-DPpGCs in ALS, emphasizing the role of genomic instability in ALS-related circular DNA production. We assessed the differences in PpGCs between ALS and controls for the complete set of these 27 RDC genes (Fig. 7C) and found a higher number of RDC-related PpGCs in ALS than in control specimens (Fig. 7C-D), implicating a FC of 3.6185 for the ratio built from the ALS and control means (p-value = 3.1704− 26).
No difference in extrachromosomal telomere repeats between ALS and controls
Telomeres are particularly susceptible to oxidative stress compared to the bulk genome [74]. We therefore tested whether telomeres were more prone to form eccDNA in the ALS model by estimating the sample-specific length of telomeric TTAGGG minisatellites on eccDNA circles, termed extrachromosomal telomere repeats (ECTR). We observed slightly longer ECTR in control as compared to ALS samples (Fig. 7E), with the difference being more pronounced for the range of 5–10 repeats (Fig. 7E). To estimate the total ECTR length, a sequence of 4 hexanucleotide repeats was chosen since this length was not exceeded in most samples. The mean total TTAGGG repeat length on ECTR showed high inter-sample variability (Fig. 7F) but was predicted to be 35.1 ± 6.8 Kbp (n = 10) for control samples and 30.5 ± 16.9 Kbp for ALS samples (n = 9) and thus was comparable between the groups (p-value = 0.397; Fig. 7F).
Common PpGCs in ALS are frequent and predominantly specific
To further identify PpGCs that are common either in the control or ALS group, denoted as CPpGCs, we employed the democratic method that shows a group-intrinsic over-representation independently of an inter-group differentiation. To find the minimum eccDNA production values to be considered as a positive vote, the empirical distributions of the PpGCs were assessed for ALS and control samples (Fig. 8A-B). As threshold parameter served the ceiling of the maximum, which was 8 and 9 for control and ALS, respectively. Considered were all genes that cumulated at least 3 votes (Fig. 8C-D).
In total, we identified 72 CPpGCs in ALS (Fig. 8D), 25 of which were among the DPpGCs already retrieved with the DifCir algorithm. Notably, the CPpGCs derived from the Scn7a gene and those originating from the Cdkal1, Chchd3, Dcc, Fhit, Naaladl2, Pbx1, and Sorcs2 genes intersected with the NHGRI-EBI and Harmonizome databases, respectively. The remaining expanded the ALS eccDNA profile. By contrast, only one CPpGC was detected in control samples, Mir100hg (Fig. 8C). The miRNAs deriving from the miR-100/let-7a-2/miR-125b-1 cluster host gene, Mir100hg, are transcribed in spinal cord and upregulated in presence of the hSOD1G93A mutation [75]. Likewise, miR-125b-5p is induced in FUS-mutated motor neurons and therein associated with an impaired DDR and DNA damage accumulation [76]. According to such role, CPpGC from this gene was also found in the ALS samples. Hence, it was not specific for healthy spinal cord tissue. Notably, miRNAs from this cluster can also modulate inflammatory responses that might be relevant for ALS [77, 78].
EccDNA carry full genes
When screening all eccDNAs for full gene sequences, independently of thresholds for differential production and sample types, we gathered replicates of 235 uncropped genes or pseudogenes (Fig. S6, A-B), 69 of which carried coding functions. Notably, 100% of the ALS samples provided full gene eccDNAs excised from numerous gene loci (µ ± SEM = 0.085 ± 0.011; Fig. 8E). By contrast, 80% of the control samples embedded full genes on their circles, involving a minor number of gene loci (0.028 ± 0.0062; Fig. 8E). These data attest a significantly higher probability to excise full gene sequences from the ALS than from the control genome (p-value = 4.19− 6). Regarding inter-sample recurrence, the immunoglobulin κ joining-2 and − 3 (Igkj2, Igkj3) loci were iterated by two ALS samples (Fig. 8E) and thus showed minor sample overlap. As the underlying highly repetitive, polymorphic locus is subjected to V-D-J recombination, class switch recombination and somatic hypermutations [79], all of which are engaged in eccDNA genesis, it might reflect a general source of circular DNA synthesis, unrelated to the ALS pathophysiology, or originate from residual blood cells in the extracted samples. Accordingly, most prominently represented were the lymphocyte-related Tra/Trb and Igh/Igl loci encoding B- and T-cell receptor specificity in response to pathogens and other antigens. A second cluster relied in sensory receptor signaling engaging the extremely large Olfr multi-gene family [80]. In conclusion, as lacking intra-group re-occurrence and presenting with single-event eccDNA, whole genes did not contribute to ALS hotspot loci.