Structural variants shape driver combinations and outcomes in pediatric high-grade glioma

We analyzed the contributions of structural variants (SVs) to gliomagenesis across 179 pediatric high-grade gliomas (pHGGs). The most recurrent SVs targeted MYC isoforms and receptor tyrosine kinases (RTKs), including an SV amplifying a MYC enhancer in 12% of diffuse midline gliomas (DMGs), indicating an underappreciated role for MYC in pHGG. SV signature analysis revealed that tumors with simple signatures were TP53 wild type (TP53WT) but showed alterations in TP53 pathway members PPM1D and MDM4. Complex signatures were associated with direct aberrations in TP53, CDKN2A and RB1 early in tumor evolution and with later-occurring extrachromosomal amplicons. All pHGGs exhibited at least one simple-SV signature, but complex-SV signatures were primarily restricted to subsets of H3.3K27M DMGs and hemispheric pHGGs. Importantly, DMGs with complex-SV signatures were associated with shorter overall survival independent of histone mutation and TP53 status. These data provide insight into the impact of SVs on gliomagenesis and the mechanisms that shape them.

Dubois and colleagues assemble a large cohort of human pediatric high-grade glioma samples, identifying patterns of simple and complex structural variants and characterizing their role in tumor development and evolution.

breast cancers 22 and other cancer types 23,26,27 . However, whereas SNV signatures have been characterized across tens of thousands of exomes, the relationships between currently described SV signatures across cancer types remain underexplored. For example, high rates of tandem duplications are associated with deficiencies in homologous recombination (HR) only in tumors with very high SV burdens 24 . It remains unclear whether these associations translate to other tumor types, including pHGGs, and which other SV signatures or associated variant-generating processes exist.
The differences across lineages indicate the role of epigenetics in shaping the SVs that are observed in cancer 21,23 . Mutations in core histones in pHGGs 8,9 highlight the role of epigenetic dysregulation in these tumors. pHGGs therefore offer a unique perspective on the relationships between patterns of SVs and different alterations in chromatin. Associations between patterns of SVs and other molecular and clinical characteristics of these tumors are also largely unknown.
Historically, the characterization of DMGs lacked pretreatment tissue owing to the risks involved in performing biopsies of midline brain structures [28][29][30][31] . A concern with posttreatment samples is that treatment, which often involves ionizing radiation, might alter the SV patterns in these tumors. We leveraged samples from the first multi-institutional North American clinical trials to incorporate biopsies of DMGs 31 and added published [4][5][6][7] pre- and posttreatment samples to assemble the largest pHGG WGS cohort to date. We identified recurrent driver events, stratified pHGGs based upon mechanistically informative SV signatures, and detected genetic events and differences in clinical outcomes associated with these signatures.

Results
Significantly recurrent SVs. We assembled a pHGG WGS cohort including 61 hemispheric tumors and 118 DMGs from 179 children. Of these, 61 were sequenced de novo for this study. The other 118 samples include 18 from Buczkowicz et al. 5 , 20 from Taylor et al. 7 , 30 from Wu et al. 6 and 50 from Bender et al. 4 (Table 1). All sequences were processed with a single, uniform computational pipeline. Among the DMGs, 84 (71%) were from pretreatment biopsies, including 33 obtained from the first multi-institutional North American clinical trial to incorporate diagnostic biopsies 31 . The tumor purity of the pretreatment biopsies was comparable with that of autopsy samples (median: 0.8 versus 0.78, P = 0.5) (Extended Data Fig. 1a).
The most notable finding was a recurrent amplification in 8q24.21, 2 Mb telomeric to MYC. This amplicon, which was probably not detected in prior array- and exome-based studies because it lies outside the exome 3 , was present in 28 tumors (16% of the cohort). All but one of these tumors were DMGs, a significant enrichment (P = 0.0016). Most of these amplicons excluded MYC itself (Extended Data Fig. 1c). A nonoverlapping peak was also detected that did encompass MYC, owing to two tumors with extrachromosomal MYC amplicons.
We also found that several regions were recurrently amplified together to high levels, including 2p25.1 and the MYCN locus at 2p24.3. This pattern of correlated SCNAs in distinct genomic loci suggests underlying recurrent SVs. We therefore comprehensively cataloged SVs using an assembly-based method 33 with improved ability to detect complex and short SVs compared with standard alignment-based methods. We detected 15,485 SVs (Supplementary Table 3), averaging 87 per tumor, including 1482 (10%) that were 10-300 base pairs (bp) in span, a size range that was a 'blind spot' in prior analyses 33 .
To distinguish recurrent SVs, we took two approaches based on methods that we have recently developed 21 . In the first, we conducted a 'one-dimensional' (1D) analysis that identified genomic loci with more SV breakpoints than expected (termed significantly recurrent breakpoints or SRBs; Supplementary Table 4a). This analysis splits the whole genome into 50 kbp bins and compares the observed number of SVs in each bin with a background distribution that considers sequence, epigenetic and other features of each locus (all results in Supplementary Table 4b). In the second, we conducted a two-dimensional (2D) analysis to detect pairs of loci that were recurrently juxtaposed by SVs (termed significantly recurrent juxtapositions, or SRJs). This analysis compares the rate at which pairs of bins are connected by SVs with a background distribution that reflects the rates at which each bin suffers breakpoints and the genomic distances between them. The bins and background model in this analysis were determined in a prior pan-cancer study 22 ; the 2D bin median length was 467 kbp. In both analyses, bins with q values < 0.1 were considered significant.
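To make the 1D test concrete, here is a minimal sketch under simplifying assumptions: breakpoints are counted in 50 kbp bins, each bin is scored with a one-sided Poisson tail probability against an expected count, and Benjamini-Hochberg correction is applied with the q < 0.1 cutoff stated above. The Poisson model and the `expected` dictionary are stand-ins of ours; the background model described in the text also accounts for sequence, epigenetic and other locus features and is not reproduced here. All function names are hypothetical.

```python
import math
from collections import Counter

BIN_SIZE = 50_000  # 50 kbp bins, matching the 1D analysis described in the text


def bin_breakpoints(breakpoints, bin_size=BIN_SIZE):
    """Count SV breakpoints per (chromosome, bin index)."""
    return Counter((chrom, pos // bin_size) for chrom, pos in breakpoints)


def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu); adequate for a sketch, not for extreme tails."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i  # P(X = i) obtained from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted q values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def recurrent_breakpoint_bins(breakpoints, expected, q_threshold=0.1):
    """Flag bins whose observed breakpoint count exceeds a (hypothetical) expected count."""
    observed = bin_breakpoints(breakpoints)
    bins = list(expected)
    pvalues = [poisson_upper_tail(observed.get(b, 0), expected[b]) for b in bins]
    qvalues = benjamini_hochberg(pvalues)
    return {b: q for b, q in zip(bins, qvalues) if q < q_threshold}


# Toy example: a flat expected rate of 0.5 breakpoints per bin over 100 bins,
# with eight breakpoints piled into one bin and three scattered singletons.
expected = {("chr8", i): 0.5 for i in range(100)}
breakpoints = [("chr8", 1_000_000 + i) for i in range(8)]
breakpoints += [("chr8", 50_000 * k + 10) for k in (3, 40, 77)]
hits = recurrent_breakpoint_bins(breakpoints, expected)
print(hits)  # only bin 20 (1,000,000 // 50,000) passes q < 0.1
```

Bins absent from `expected` are ignored, so in practice the expected counts would need to cover every bin in the genome.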
We identified 10 SRB bins across five TADs (Supplementary Table 4a,b, Fig. 1a and Extended Data Fig. 1d) and two SRJs (Fig. 2a). The most significant SRB was within the MYC TAD, encompassing breakpoints in 28 tumors, more than for any other TAD in the genome. This locus was also a component of an SRJ connecting two adjacent bins at the telomeric end of the MYC TAD. The remaining SRBs corresponded to SVs within the TADs of the RTK genes MET (q = 0.0025), EGFR (q = 0.029) and PDGFRA (q = 0.032), as well as an SV within the TAD of the transcription factor ID2 (Supplementary Table 4a). This latter SRB was also a component of the second SRJ, which connected ID2 and MYCN. Fifteen tumors contained a tandem duplication centering on intron 1 of the long noncoding RNA CCDC26, with a median span of 216 kbp and a minimal common region of amplification (MCR) of only 42 kbp (Fig. 1b). The remaining seven rearrangements exhibited no consistent structure.

Fig. 1 legend (panels c-g, partial): … K27M and four H3 K27WT pHGG tumors. Only the top H3K27ac enrichment track originates from a tumor with a CCDC26-SV. Significantly enriched peaks (q value < 0.01) are indicated below each H3K27ac ChIP-seq track. The CCDC26 amplicon boundaries for individual samples are indicated by the paired red arrows at the top. The consensus amplicon is indicated by the red dotted lines and centers on an H3K27ac peak. d, Hi-C heatmap across the MYC-CCDC26 locus from a midline glioma with CCDC26-SV. Increasing interaction frequencies are indicated by brighter shades of red. The black arrowheads indicate significant interaction loops. The track beneath the heatmap indicates RobusTAD left and right TAD boundary scores, which represent the likelihood that TAD boundaries are present. The third row contains a virtual 4C track, in which peaks indicate higher interaction frequencies with an anchor sequence in the MYC promoter, which is highlighted in gray. The fourth row shows H3K27ac ChIP-seq data from the same sample indicating the location of the enhancer peak within the CCDC26-SV consensus amplicon. e, Normalized MYC expression in DMG samples with WT CN profiles at 8q24.21 (n = 92 tumors), CCDC26-SVs (n = 8 tumors) or amplifications of the MYC coding sequence (n = 12 tumors). *P = 0.04 as determined by two-sided Wilcoxon rank sum test. The center line of the boxplot indicates the median, bounds of the box indicate the 25th and 75th percentiles, and whiskers extend from the box to the largest or smallest value no further than 1.5× IQR.

f, Schematic illustrating the luciferase reporter used to validate the enhancer in CCDC26, showing the positions of the E1 and E2 sequences with respect to the enhancer within CCDC26. g, Luciferase activity in DIPG13 cells following transduction of the E1, E2 and LUAD enhancer reporters or empty vector controls. Values represent the average of four technical replicates in each of three independent experiments. *P = 1.6 × 10^−5 (E1 versus backbone) and P = 0.89 (LUAD versus backbone), n = 3 independent experiments, nested one-way ANOVA with Tukey's post-test; boxplot defined as in e.
The 2 Mb region telomeric to MYC has been shown to contain MYC enhancers in lineage-specific locations in several cancer types 34 . We therefore hypothesized that the CCDC26 amplicon promoted oncogenesis by amplifying an associated neural-lineage enhancer. We analyzed published H3K27ac enhancer tracks generated from H3 K27M and H3 WT pHGGs 35 and adult glioblastomas 36 and observed H3K27ac enhancer peaks within the MCR of the CCDC26 amplicon (Fig. 1c and Extended Data Fig. 2a). We also confirmed the presence of this enhancer in an independent pHGG assay for transposase-accessible chromatin using sequencing (ATAC-seq) dataset 37 and in H3K27ac chromatin immunoprecipitation sequencing (ChIP-seq) data from normal neural tissue 38 (Extended Data Fig. 2a). The H3K27ac peak at chr8:130640182-130641543 was present in all tissues of neural origin but not in enhancer maps from nonneural lineages (hematopoietic and lung tissues) (P = 0.0005, Fisher's exact test). We conclude that the CCDC26 amplicon centers on a neural-lineage-specific enhancer.
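The MCR logic used above can be sketched as an interval intersection: the consensus amplicon is the stretch shared by every tumor's amplicon, and the question is whether a given enhancer peak falls inside it. The peak coordinates are the ones quoted in the text; the three amplicon intervals are hypothetical, chosen only so that their intersection spans 42 kbp. Intervals are treated as half-open on a single chromosome, which is an assumption of this sketch.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on one chromosome


def minimal_common_region(amplicons: Sequence[Interval]) -> Optional[Interval]:
    """Return the region shared by all amplicons, or None if they do not all overlap."""
    start = max(s for s, _ in amplicons)
    end = min(e for _, e in amplicons)
    return (start, end) if start < end else None


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical CCDC26 amplicons (chr8) from three tumors.
amplicons = [
    (130_540_000, 130_760_000),
    (130_600_000, 130_700_000),
    (130_620_000, 130_662_000),
]
peak = (130_640_182, 130_641_543)  # H3K27ac peak coordinates quoted in the text

mcr = minimal_common_region(amplicons)
print(mcr, mcr[1] - mcr[0], overlaps(mcr, peak))
# (130620000, 130662000) 42000 True
```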
The lineage-specific H3K27ac enhancer at the CCDC26 locus also appears to interact directly with MYC. We used Hi-C to evaluate the chromatin topology of an H3.3 K27M pHGG that harbored a CCDC26-SV. The CCDC26-SV breakpoints were within the MYC TAD, and the H3K27ac peak within the CCDC26-SV formed an interaction peak with the MYC promoter (Fig. 1d). However, this interaction was not restricted to tumors with CCDC26-SVs. Analysis of Hi-C data generated in a CCDC26-WT pHGG, two patient-derived H3.3 K27M cell lines and induced pluripotent stem cell (iPSC)-derived neural progenitors (Extended Data Fig. 2b) revealed a similar TAD structure and interactions with the MYC promoter. We conclude that the CCDC26-SV amplifies a preexisting MYC enhancer.
The eight pHGGs with the CCDC26-SV also exhibited increased MYC RNA expression compared with tumors without SVs or amplifications in the MYC TAD (8q24.21-WT, n = 92, P = 0.04, Fig. 1e) and similar levels of MYC to those of pHGGs with amplifications of the MYC coding sequence (P = 0.85). Indeed, the absolute CNs of both MYC and the CCDC26-SV were correlated with MYC expression (P = 0.0003 for MYC and P = 0.01 for the CCDC26-SV, Spearman rank correlations; Extended Data Fig. 2c,d). We conclude that CCDC26-SVs and associated enhancer amplifications activate MYC expression.
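Spearman's rank correlation, used above to relate absolute CN to MYC expression, is the Pearson correlation of the ranks. The dependency-free sketch below computes it with average ranks for ties and attaches a two-sided permutation P value. The permutation approach is our choice for a self-contained example and is not necessarily how the reported P values were computed; the function names and the toy CN and expression values are hypothetical.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y, n_perm=2000, seed=0):
    """Return (rho, two-sided permutation P value)."""
    rx, ry = average_ranks(x), average_ranks(y)
    rho = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = list(ry)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any CN-expression pairing
        if abs(pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            extreme += 1
    return rho, (extreme + 1) / (n_perm + 1)


# Hypothetical absolute CNs and normalized MYC expression for eight tumors.
cn = [2, 2, 3, 4, 6, 8, 12, 20]
expr = [1.1, 0.9, 1.4, 1.3, 2.0, 2.6, 3.1, 4.8]
rho, p = spearman(cn, expr)
print(round(rho, 3), p)
```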
We next used minimal reporter assays to confirm that the H3K27ac peak in CCDC26 represented a functionally relevant, lineage-specific enhancer. We generated two enhancer reporter systems (E1 and E2), each encompassing slightly more than half of the enhancer, with a small region of overlap (Fig. 1f). We transduced two histone-mutant pHGG cell lines with the reporter constructs and evaluated induction of luciferase expression to mark enhancer activity. In both lines, the E1 enhancer region was sufficient to increase luciferase expression relative to the vector control (P < 0.01 in both cases, n = 3, nested one-way analysis of variance (ANOVA): Tukey's multiple comparisons; Fig. 1g, Extended Data Fig. 2e). We performed similar experiments in the lung cancer cell line A549 and found no increase in luciferase activity, although previously validated lung adenocarcinoma (LUAD) MYC enhancers 34 did induce luciferase activity (P = 0.96 (E1 versus control); P = 0.0071 (LUAD versus control), n = 3, nested one-way ANOVA: Tukey's multiple comparisons; Extended Data Fig. 2f).

MYCN activation through enhancer amplification and hijacking.
Somatic enhancer amplification also seems to play a part in the activation of MYCN in pHGG. The SRJ (Extended Data Fig. 3a) connecting ID2 with MYCN represents a set of complex SVs in tumors with high-level amplifications within this region on chromosome 2 (Fig. 2b). ID2 is a transcription factor regulating neural differentiation 39 , and the ID2 locus is associated with an H3K27ac enhancer that was present across all analyzed pHGG tumor samples (Extended Data Fig. 3b). The MCR of the MYCN-ID2 amplification contains both the ID2-associated enhancer and the coding sequence of MYCN (Extended Data Fig. 3b). The SVs result in juxtaposition of the enhancer in ID2 with MYCN, reducing the distance between the two from the normal 7 Mbp to less than 700 kbp (Fig. 2c). These data suggest these SVs hijack the ID2 enhancer to activate MYCN.
We also identified four pHGGs with MYCN amplifications that did not connect to ID2. However, these latter 'localized MYCN' amplicons always encompassed more of the immediate neighborhood of MYCN than the complex MYCN-ID2 amplicons. In contrast to MYCN-ID2 amplicons, which only contained a small fraction of the MYCN TAD (23% on average), localized MYCN amplicons contained most of this TAD (60% on average, P = 0.03, t-test), including several enhancers not included in the MYCN-ID2 amplicons (Fig. 2d).
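The TAD-coverage comparison above (23% versus 60% on average) amounts to measuring what fraction of the MYCN TAD each amplicon covers. A minimal sketch, assuming an amplicon is given as a list of genomic segments on one chromosome and that overlapping segments are merged so no base is counted twice. The TAD and segment coordinates are hypothetical, as is the function name.

```python
def covered_fraction(segments, tad):
    """Fraction of the half-open TAD interval covered by the union of amplified segments."""
    tad_start, tad_end = tad
    # Clip each segment to the TAD and drop segments that fall outside it.
    clipped = sorted(
        (max(s, tad_start), min(e, tad_end))
        for s, e in segments
        if s < tad_end and e > tad_start
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Close the previous merged block (if any) and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # extend the current merged block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (tad_end - tad_start)


tad = (15_500_000, 16_500_000)  # hypothetical 1 Mbp MYCN TAD on chr2
localized = [(15_600_000, 16_300_000)]
# Hypothetical MYCN-ID2 amplicon: two overlapping MYCN-region pieces plus a
# distant ID2-region segment that lies outside the MYCN TAD.
complex_id2 = [(15_900_000, 16_100_000), (16_050_000, 16_130_000), (8_800_000, 8_900_000)]
print(covered_fraction(localized, tad), covered_fraction(complex_id2, tad))  # 0.7 0.23
```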
The high-level MYCN amplicons showed typical characteristics of extrachromosomal amplicons, reaching CNs of 50-300 per cell. Other oncogenes with absolute CNs greater than 10 have been shown to reside on extrachromosomal amplicons in various cancer types 13,40 . Although projections on the linear reference genome resulted in typical complex patterns 14 , it was possible to construct circular amplicons containing MYCN and, in the MYCN-ID2 cases, ID2 (Fig. 2e). Indeed, these circular amplicons represent an optimal solution to explain the joint CN and SV profiles in this region. We therefore sought to validate this using metaphase fluorescence in situ hybridization (FISH) on a pHGG cell line derived from a tumor with a MYCN-ID2 rearrangement. We found abundant extrachromosomal amplicons containing both the MYCN and ID2 loci (Fig. 2f). These appeared to reflect multiple subclonal amplicons, including some containing additional oncogenes such as MDM4 (Fig. 2g). The CN of MYCN was consistently higher than that of ID2, raising the possibility that ID2 was incorporated into a subset of preexisting MYCN amplicons during tumor development.
Alternatively, MYCN might be further amplified within cells that already harbor MYCN-ID2 amplicons. These data suggest that MYCN-ID2 rearrangements are an example of enhancer hijacking, bringing a strong enhancer in ID2 next to MYCN on amplicons without the endogenous elements of the MYCN TAD, whereas amplifications of MYCN without ID2 amplify enhancers within the MYCN TAD.
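Reconstructing a circular amplicon amounts to finding a closed path through the amplified segments in which consecutive segments are joined by SV junctions. The toy sketch below follows junctions from a starting segment and reports whether the walk closes back on itself. It is a hypothetical simplification: real reconstructions must handle strand orientation, segments used more than once and CN consistency, none of which is modeled here, and the segment names are illustrative.

```python
def trace_amplicon(junctions, start):
    """Follow SV junctions from `start`.

    `junctions` maps each segment to the segment joined to its downstream end
    (orientation is ignored in this sketch). Returns (path, is_circular).
    """
    path = [start]
    seen = {start}
    current = start
    while current in junctions:
        nxt = junctions[current]
        if nxt == start:
            return path, True  # the walk closes: candidate circular amplicon
        if nxt in seen:
            return path, False  # loops back into the middle, not a simple circle
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path, False  # the walk runs off a free end: linear structure


# Hypothetical MYCN-ID2 amplicon: MYCN segment -> ID2 enhancer segment -> back to MYCN.
circular = {"MYCN_seg": "ID2_enh_seg", "ID2_enh_seg": "MYCN_seg"}
print(trace_amplicon(circular, "MYCN_seg"))  # (['MYCN_seg', 'ID2_enh_seg'], True)
```

Segments amplified to different CNs, like MYCN and ID2 here, would fit more naturally as distinct cycles in a fuller model than as one shared circle.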

Recurrent SVs around RTKs suggest extrachromosomal DNA.
The remaining SRBs all involved RTK genes that are known to be amplified and oncogenic in pHGG: PDGFRA, EGFR and MET 3-7,41 . These loci, along with MYC and MYCN, were also the only regions with high-level amplifications that recurred in at least three patients (Fig. 3a). The RTK SRBs also comprised both simple SVs that presumably amplify local enhancers (Fig. 3b and Extended Data Fig. 4c) and SVs that appeared to reflect complex extrachromosomal amplicons that integrate distant sites and reach as many as 200 copies (Fig. 3c,e).
Overall, 35 of 179 tumors showed at least one >50 kbp amplicon with an absolute CN greater than ten. In 34 of these 35 tumors, the high-level amplicons contained at least one well-known oncogene, and, apart from the coding sequence of the oncogene, they recurrently incorporated the same genomic loci around the oncogene (Extended Data Fig. 4a,b for PDGFRA and EGFR). In several pHGGs, we detected SVs that allowed for the reconstruction of circular extrachromosomal amplicons containing multiple oncogenes from different chromosomes (Fig. 3d,e). We again validated this by performing FISH on a tissue slide of a tumor with a high-level amplification and SVs connecting segments on chr8 (including GATA4) and chr10 (including FGFR2). This showed markedly increased numbers of foci for both the GATA4 and FGFR2 probes. In many cases, the signals of both probes overlapped, indicating colocalization (Extended Data Fig. 4e). The number of copies per cell was highly variable, and the amplicons were distributed in heterogeneous clusters throughout the nucleus. All these features have been associated with extrachromosomal amplicons 13 . These data suggest that only a subset of pHGGs develop high-level amplicons, which recurrently contain the same (presumably regulatory) sequences in addition to the target oncogene and can contain segments originating from different chromosomes.
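The thresholds above (amplicons longer than 50 kbp with absolute CN greater than ten) can be applied as a simple filter over segmented CN calls, followed by a check for known oncogenes. The `Segment` record, the oncogene set (limited to genes named in this section) and the example rows are hypothetical; this is a sketch of the filtering step, not the study's pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Segment:
    tumor: str
    chrom: str
    start: int
    end: int
    absolute_cn: float
    genes: Tuple[str, ...]


ONCOGENES = {"PDGFRA", "EGFR", "MET", "MYC", "MYCN", "FGFR2"}  # illustrative subset


def high_level_amplicons(segments: List[Segment], min_span=50_000, min_cn=10.0) -> List[Segment]:
    """Keep segments longer than min_span bp with absolute CN above min_cn."""
    return [s for s in segments if s.end - s.start > min_span and s.absolute_cn > min_cn]


def oncogenes_per_tumor(segments: List[Segment]) -> Dict[str, Set[str]]:
    """Map each tumor with a high-level amplicon to the listed oncogenes it contains (possibly none)."""
    result: Dict[str, Set[str]] = {}
    for s in high_level_amplicons(segments):
        result.setdefault(s.tumor, set()).update(ONCOGENES.intersection(s.genes))
    return result


segments = [
    Segment("T1", "chr4", 54_000_000, 56_600_000, 40, ("PDGFRA", "KIT", "KDR")),
    Segment("T2", "chr7", 55_000_000, 55_300_000, 8, ("EGFR",)),      # CN too low
    Segment("T3", "chr2", 15_900_000, 15_940_000, 120, ("MYCN",)),    # span too short
    Segment("T4", "chr12", 10_000_000, 10_200_000, 25, ("GENE_X",)),  # no listed oncogene
]
print(oncogenes_per_tumor(segments))  # {'T1': {'PDGFRA'}, 'T4': set()}
```

A tumor such as T4 in this toy example corresponds to the kind of case the text describes as the single high-level amplicon without a well-known oncogene.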
To further understand the structure of these amplicons, we first focused on high-level amplicons containing PDGFRA. These amplicons span more than 2.5 Mbp and are superimposed on low-level amplicons of the surrounding region, often starting from the centromere. The amplicons included KIT in 80% of cases and KDR in 60%. All but one (14 of 15, 93%) of the amplicons in the PDGFRA TAD amplified PDGFRA itself, often to the highest CNs reached in the region (Extended Data Fig. 4a). The sole exception was the amplification of a short sequence centromeric to PDGFRA containing H3K27ac peaks (Fig. 3b) that have been shown to interact with the PDGFRA promoter in pHGGs 18 and adult GBMs 42 , suggesting use of enhancer amplification to activate PDGFRA. Indeed, this region was included in the PDGFRA amplicon in nearly all tumors (14 of 15, 93%) (Extended Data Fig. 4a). These data suggest that SVs in pHGG recurrently incorporate an upstream enhancer-rich region into high-level PDGFRA amplicons.
The high-level EGFR and MET amplicons also extended beyond the RTK coding sequence to recurrently involve associated enhancers (Extended Data Fig. 4b-d). The EGFR amplicons showed a skew towards the enhancers in SEC61G, which drive EGFR expression in extrachromosomal amplicons in adult GBM 20 . Both EGFR and MET amplicons showed subclonal SVs within the coding sequence, potentially allowing for expression of alternate transcripts. In two tumors, these resulted in the EGFRvIII variant. EGFR amplicons also showed complex subclonal structures with incorporation of distant oncogenes (Fig. 3e). MET amplicons skewed towards a region including enhancers in CAPZA2 (Extended Data Fig. 4d). We conclude that the RTK gene amplicons are shaped by the epigenetic machinery necessary to drive their expression.
SV signatures relate to genetic and epigenetic tumor states. The discovery of these two distinct classes of recurrent SVs, simple enhancer amplifications and complex extrachromosomal DNA (ecDNA)-based amplicons, raises the question of whether these alterations are part of distinct variant patterns in different pHGGs. Unsupervised identification of SV signatures can reveal tumor subgroups with distinct SV-generating mechanisms 26 . Recent work has identified SV signatures that are present across cancer types, albeit in tissue-specific conformations and with tissue-specific variant associations 26 . We performed a manual review of individual SVs to assess their probable mechanisms of formation as described above; however, automated classifiers have recently been developed for genome-wide analyses 43 . Using methods developed by the Pan-Cancer Analysis of Whole Genomes group 23 , we detected 10,385 complex and/or clustered SVs. Among these, automated methods further classified 21% by their probable formation mechanism. We therefore combined both approaches to identify SV signatures, using the more precise formation mechanisms when available and denoting the remaining complex events as complex-NOS (not otherwise specified).
We thus obtained nine SV signatures (Fig. 4a). Six of these signatures were complex (breakage-fusion-bridge cycles (BFB), chromothripsis, chromoplexy, complex-NOS, double minutes (DMs) and tyfonas) and three were simple (deletion, duplication, translocation/inversion). The DM signature comprised only DMs, of which 6% were complex DMs according to the categorization of Hadi et al. 43 . Templated insertion chains contributed 6% of the translocation/inversion signature, 1% of the deletion signature and 1% of the complex-NOS signature. Inversions contributed 35% of the translocation/inversion signature and 13% of the duplication signature. Four signatures were composed entirely of a single type of feature: chromothripsis, BFB, chromoplexy and tyfonas (Fig. 4b).
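SV signatures of this kind are typically extracted by factorizing a features-by-tumors count matrix into nonnegative signatures and per-tumor activities. The sketch below uses plain Lee-Seung multiplicative-update NMF (Frobenius loss) on a toy matrix with the number of signatures fixed by hand; it illustrates the general technique only and is not the specific extraction method used in this study. The helper `activity_fractions` shows how per-tumor activity shares, such as the 20% complex-SV thresholds used later, could be derived. All names and values are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0])))


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factor V (features x tumors) ~ W (features x k signatures) @ H (k x tumors)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H


def activity_fractions(H):
    """One row per tumor: each signature's share of that tumor's total activity."""
    return [[x / sum(col) for x in col] for col in transpose(H)]


# Toy matrix: 3 SV feature classes (rows) x 4 tumors (columns); values are made up.
V = [[2, 0, 1, 3],
     [0, 2, 1, 1],
     [2, 2, 2, 4]]
W, H = nmf(V, k=2)
print(round(frobenius_error(V, W, H), 4))
print([[round(x, 2) for x in row] for row in activity_fractions(H)])
```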
We next looked for possible causes and consequences of the pHGG SV signatures by testing for associations between the activity of each signature and the presence of recurrent and known oncogenic 44 variants (Supplementary Table 5). As expected, the DM signature was associated with ecDNA amplifications of MYCN and ID2. The complex-NOS SV signature was closely associated with focal TP53 disruption and loss of 17p (encompassing TP53) and anticorrelated with oncogenic mutations in PPM1D, ACVR1 and HIST1H3B (Extended Data Fig. 5a). Notably, complex-SV signature activity correlated with the total number of SVs per tumor (Spearman's ρ = 0.68, P < 2.2 × 10^−16). Unlike several high-SV-count adult cancers with disrupted DNA damage response (DDR) 22,23,45,46 and HR/BRCA 23 , in which tandem duplication signatures were dominant, pHGGs with the simple tandem duplication signature tended to have few SVs. None of the genes previously implicated in tandem duplication signatures or loss of HR in adults 23 reached a significant level of association with any signature in pHGG.
We next asked whether pHGGs separated into subsets with different DNA damage and damage response characteristics based on patterns of both SVs and SNVs. We detected an anticorrelation between the complex-SV signatures and the three simple-SV signatures (Fig. 4c). Evaluation of SNV mutation patterns revealed 14 SNV signatures, including signatures similar to known aging, APOBEC (COSMIC signature SBS13 (ref. 47 )), HR deficiency (SBS3 (ref. 25 )) and hypermutation SNV signatures 25 (Extended Data Fig. 5b-d). We also performed signature analyses using alternative methods 26 ; these generated similar results (Extended Data Figs. 6, 7).
The entire pHGG cohort separated into two groups reflecting different amplitudes of the 9 SV and 14 SNV signatures (Fig. 4d). One cluster (complex-SV) was dominated by complex-SV signatures (q values ranging from 0.02 for DM to 1.4 × 10^−30 for complex-NOS; Extended Data Fig. 8a). SBS3 and SBS13 were also enriched in this cluster, consistent with their close correlation with the complex-NOS SV signature (Extended Data Fig. 8b). The complex-SV cluster was enriched for TP53 inactivation (q < 0.1, Fig. 4e) and for SVs surrounding, and amplifications of, PDGFRA, EGFR and MET (q < 0.1). By contrast, the other cluster (SNV-dominant) was dominated by simple-SV signatures (q < 8 × 10^−5), lacked TP53 disruption (q < 0.1) and was enriched for PPM1D mutations. This cluster seemed to be driven instead by a combination of SNVs including ACVR1, PPM1D, H3.1 K27M and PIK3CA mutations (all q < 0.1). Both clusters included hemispheric and midline gliomas. H3.3 K27M showed no enrichment in either cluster (q = 0.46). These data suggest that pHGG genomes are shaped by at least two distinct variant-generating processes, which are associated with distinct driver combinations.

Signatures indicate two groups, complex-SV and SNV-dominant.
We next evaluated whether SV signatures could inform pHGG subtypes. Currently, pHGGs are classified according to their location and histone mutations; different histone mutations are known to be associated with distinct recurrent SNVs and SCNAs. We confirmed these known relationships 3,48 and detected two additional associations with SVs (Fig. 5a,b and Extended Data Fig. 8c). The SV in CCDC26 resulting in MYC enhancer amplification was enriched in H3.3 K27M gliomas (q = 0.008), and H3.1 K27M pHGGs were enriched for a focal deletion of CDKN2C with breakpoints in the adjacent gene FAF1 (q = 0.04).
Across the DMGs with more than 20% complex-SV signature activity (denoted H3 K27M complex-SV, including H3.3 K27M (n = 43) and H3.1 K27M (n = 3) DMGs), the TP53 pathway was inactivated almost universally through direct disruption of TP53 (44 of 46 cases, 96%; Fig. 5a). By contrast, the majority of DMGs with less complex signature activity (denoted H3 K27M SNV-dominant, H3.3 K27M (n = 30) and H3.1 K27M (n = 21)) lacked direct TP53 disruption (37 of 51, 73% TP53-WT; q = 2.3 × 10^−5) but appeared to suppress the TP53 pathway through other mechanisms. Mutations in PPM1D were more prevalent in this group, although they were still a minority (7 of 30 H3.3 K27M, 2 of 21 H3.1 K27M, 20% in total; versus 1 of 46 H3 K27M complex-SV tumors; q = 0.008). It is possible that gains of 1q, encompassing MDM4, also served to suppress the TP53 pathway in these tumors. Although 1q spans approximately 2,580 genes, we observed two sources of evidence that the prevalence of 1q gains in SNV-dominant DMGs was related to MDM4 and TP53 pathway suppression. First, MDM4 was significantly overexpressed in 1q-amplified pHGGs of all types in our cohort (q = 0.004; Extended Data

Focusing on the tumors that harbored significantly recurrent SVs, we observed two groups. One group contained tumors with high-level amplicons of PDGFRA, EGFR, MET, MYC and MYCN (oncogene-amp). By contrast, the second group showed amplification of presumed enhancer elements within the TADs of these oncogenes without amplification of their coding sequences (enhancer-amp). The oncogene-amp pHGGs exhibited significantly higher activity of complex-SV signatures (P = 4 × 10^−7; Fig. 6a,b). The two groups also harbored inactivating alterations in different DDR genes (Fig. 6a). Oncogene-amp pHGGs were enriched for TP53 SNVs (69% of oncogene-amp versus 18% of enhancer-amp pHGGs, q = 0.01) and RB1 deletions (23% of oncogene-amp versus 0% of enhancer-amp pHGGs, q = 0.16).
By contrast, enhancer-amp pHGGs were enriched for PPM1D SNVs (29% of enhancer-amp versus 0% of oncogene-amp pHGGs, q = 0.03) and gains of 1q encompassing MDM4 (71% of enhancer-amp versus 34% of oncogene-amp pHGGs, q = 0.16). In summary, alterations in TP53 and RB1 are associated with complex-SV signatures and high-level amplifications of oncogenes, whereas PPM1D SNVs and 1q gains more frequently occur with simple-SV signatures and amplifications of enhancer elements near oncogenes. These data raise the possibility that alterations in the DDR shape not only the processes that generate SVs but also the types of driver alterations that arise in MYC, MYCN and RTK genes.
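The group comparisons above reduce to 2 × 2 tables (altered versus unaltered, in one group versus the other). The text does not name the test behind these q values; a two-sided Fisher's exact test, sketched below with only the standard library, is one common choice for such tables. The example reuses the PPM1D counts quoted earlier for SNV-dominant versus complex-SV DMGs (9 of 51 versus 1 of 46) purely as an illustration: it gives an unadjusted P value, whereas the reported q values also reflect multiple-testing correction, which is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance keeps tables tied with the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# PPM1D-mutant versus wild-type DMGs: SNV-dominant (9 of 51) vs complex-SV (1 of 46).
p = fisher_exact_two_sided(9, 42, 1, 45)
print(f"unadjusted two-sided P = {p:.3g}")
```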
Temporal evolution of genetic variants. A correlation between the presence of a variant and the activity of a signature cannot, by itself, reveal the direction of the link between the two. This is most obvious for the inactivation of tumor suppressors through CN loss or the amplification of oncogenes and their associations with complex-SV signatures. These events could be direct consequences of the activity of this signature. On the other hand, these genetic variants could drive survival after catastrophic SVs, increase genomic instability and thereby drive  happen later in tumor development as a consequence of complex SVs involving these genes. Notably, we observed no effects of therapy on SV patterns, suggesting that the SVs occurred during gliomagenesis. Although radiation treatment has been shown to induce DNA breaks 50 , we found no differences in the number of SVs per sample (median 35 versus 42; q = 0.6) or in the activity of the complex-SV signatures (median 24% versus 28%; q = 0.7) between pretreatment biopsy and autopsy samples (Extended Data Fig. 9e,f).
We performed a timing analysis reflecting the relative ordering of mutations and SCNAs during gliomagenesis 51 . Focusing on the subset of pHGGs with simple enhancer amplifications (enhancer-amp pHGGs), we found that the focal amplification of the MYC enhancer in CCDC26 was one of the earliest variants in these samples (Fig. 6c), occurring earlier than alterations in PPM1D and 1q/MDM4 gain. By contrast, amplification of MYC isoforms and RTK genes in the oncogene-amp samples happened after the loss of the tumor suppressors TP53, RB1 and CDKN2A/B (Fig. 6d). These data suggest that simple tandem duplications can arise in tumors without major disruptions of DDR, potentially contributing to tumor initiation, whereas the creation of complex high-level oncogene amplicons may require prior direct genetic disruption of TP53, RB1 or CDKN2A/B.
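Timing analyses of this kind rest on two ingredients: per-tumor estimates of when each event happened and a way to combine orderings across tumors. A deliberately simplified sketch follows. For a clonal single-copy gain (two copies of one allele, one of the other), SNVs present on both copies of the gained allele (multiplicity 2) arose before the gain, which gives the commonly used estimate t = 3*n2 / (2*n2 + n1) on a 0-1 molecular-time scale, assuming a constant mutation rate and only clonal SNVs. The cross-tumor step here is a plain win-fraction ranking over pairwise orderings; it is a hypothetical stand-in, not the model of ref. 51, and the event labels and times in the example are made up.

```python
from collections import defaultdict
from itertools import combinations


def gain_time_2plus1(n_mult2, n_mult1):
    """Molecular time (0-1) of a clonal gain to major/minor CN 2/1.

    Before the gain at time t, the to-be-gained allele collects SNVs that end
    up at multiplicity 2 (n2 ~ t); every other SNV is at multiplicity 1
    (n1 ~ t + 3 * (1 - t)). Solving n2 / n1 = t / (3 - 2t) gives the formula.
    """
    return 3 * n_mult2 / (2 * n_mult2 + n_mult1)


def precedence_scores(tumor_event_times):
    """Fraction of pairwise comparisons in which each event came first.

    `tumor_event_times` is a list with one dict per tumor, mapping event -> time.
    Only events co-occurring in the same tumor are compared; ties are skipped.
    """
    wins, comparisons = defaultdict(int), defaultdict(int)
    for times in tumor_event_times:
        for e1, e2 in combinations(sorted(times), 2):
            if times[e1] == times[e2]:
                continue
            wins[e1 if times[e1] < times[e2] else e2] += 1
            comparisons[e1] += 1
            comparisons[e2] += 1
    return {e: wins[e] / comparisons[e] for e in comparisons}


print(gain_time_2plus1(10, 30))  # 0.6
tumors = [
    {"TP53": 0.1, "MET_amp": 0.7},
    {"TP53": 0.2, "RB1_del": 0.3, "EGFR_amp": 0.8},
]
print(precedence_scores(tumors))
```

Events with a score near 1 tend to precede the events they co-occur with; a probabilistic model would additionally propagate the uncertainty of each per-tumor time estimate.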
Prior studies have shown histone mutations to be the initiating event in the pHGGs in which they occur 8,9,35,41 . However, the studies investigating pHGG evolution in human tumor tissues were limited to exomic alterations in fewer than 15 patients [52][53][54][55] . Using WGS data from 179 tumors, we confirmed the findings of these previous studies [52][53][54][55] , including that H3 K27M mutations are the earliest mutations, followed by SNVs in ACVR1 and TP53 in H3.1 K27M and H3.3 K27M gliomas, respectively (Extended Data Fig. 10). This large WGS cohort also allowed us to time focal SCNAs based on the ratio of SNVs acquired before and after each change in each CN 51 Fig. 9e,bottom).

To address SV signatures specifically, we also investigated the correlation between the activity of the combined complex-SV signature and overall survival (OS). Across all DMGs, this complex-SV signature was significantly anticorrelated with OS (Fig. 7a; P = 0.001).
The combined complex-SV signature was also significantly associated with shorter survival in a multivariate Cox regression analysis of DMGs that controlled for the known predictors of survival 3,48 (histone SNV and age) and for TP53 status (Fig. 7b; P = 0.038). This analysis confirmed a significantly increased hazard ratio for H3.3 K27M compared with both H3.1 K27M and H3 WT DMGs and, as previously described 48 , a lack of significant associations between TP53 disruptions and OS in multivariate analyses. However, associations with age did not reach significance, possibly owing to the low representation of the under-three and over-ten age groups in our cohort. Although all patients with DMGs in our study died from their disease, these factors were associated with survival differences of several months. For example, children with DMGs with at least 20% complex-SV activity survived a median of 9.6 months, about 3 months less than the 12.3-month median survival of children with less than 20% complex-SV activity (Fig. 7c).
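Median survival by group, as in the 20% complex-SV split above, is usually read off a Kaplan-Meier curve. Because all patients with DMGs in this study died, every survival time is an observed event and the curve is simply the empirical survival function; the estimator below also handles censored follow-up for generality. The sketch does not reproduce the multivariate Cox model, and the survival times and signature fractions in the example are made-up values, not study data. Function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier step points [(time, survival)].

    events[i] is 1 if death was observed at times[i] and 0 if follow-up was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = n_at_t = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= n_at_t  # deaths and censored cases both leave the risk set
    return curve


def km_median(curve):
    """Smallest time at which estimated survival drops to 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def median_by_group(months, complex_fraction, threshold=0.2):
    """Split patients at a complex-SV activity threshold; assumes no censoring."""
    groups = {"high": [], "low": []}
    for m, f in zip(months, complex_fraction):
        groups["high" if f >= threshold else "low"].append(m)
    return {g: km_median(kaplan_meier(ts, [1] * len(ts))) for g, ts in groups.items()}


months = [5.0, 8.0, 9.6, 11.0, 14.0, 10.0, 11.5, 12.3, 15.5, 18.0]
fractions = [0.45, 0.30, 0.25, 0.60, 0.22, 0.05, 0.10, 0.00, 0.15, 0.08]
print(median_by_group(months, fractions))  # {'high': 9.6, 'low': 12.3}
```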

Discussion
These analyses found recurrent SVs, including a tandem duplication encompassing a MYC enhancer in 12% of all DMGs; revealed distinct SV signatures; and indicated two classes of DMG, whose driver alterations were either largely complex SVs or dominated by SNVs.
The MYC enhancer amplifications highlight an underrecognized role for MYC in pHGGs. MYC is the most frequently amplified gene across all cancers, with focal amplifications observed in 15% of tumors 49 . By contrast, MYC amplifications occur in only 5% of pHGGs 3 . The observation of MYC enhancer amplification in pHGGs, without amplification of MYC itself, begins to address this discrepancy. Although tissue-specific amplifications of MYC enhancers occur in other cancers 34 , CCDC26 duplication is a pHGG-specific occurrence apparently driven by differences in enhancers across cell types. Altogether, when including high-level MYCN amplifications, 14% of pHGGs harbored SVs predicted to activate MYC pathways. Given this high rate, the role of MYC in pHGG formation requires further study.
Although both SNV-dominant and complex-SV pHGGs activate MYC signaling pathways, they do so in strikingly different ways. Whereas SNV-dominant pHGGs amplify only the MYC enhancer in CCDC26, pHGGs with complex-SV signatures contain high-level amplicons of both the MYC coding sequence and segments of CCDC26, PVT1 or other distant regions. Amplifications of MYC-PVT1 have been reported in DMGs and other cancers 57,58 . These additional segments could contain independent oncogenic activity, as has been proposed for PVT1, or they could represent regulatory elements that have been hijacked to drive MYC expression. The complex MYC amplicons are often extrachromosomal, as indicated by their circular topology and high CN. In this respect, MYC exemplifies a pattern also seen for other oncogenes, including MYCN, PDGFRA, EGFR and MET. Extrachromosomal amplicons (double minutes, DMs) containing recurrent oncogene-enhancer combinations occur in several cancers 12,20,59 . However, their regulatory elements differ from those of pHGG and appear to reflect the tissue specificity of regulatory loci 19 .
DMs have been shown to originate as byproducts of chromothripsis 10 . Our data suggest that in pHGG they often contain multiple oncogenes from different chromosomes. These DMs would therefore either require simultaneous chromothripsis of two chromosomes or need to develop sequentially through a less clear mechanism. Our data also suggest multiple variants of DMs within individual pHGGs. These could correspond to different descendants of the initial DM, as suggested by recent mechanistic and long-read sequencing studies 10,14 . In cases where two oncogenes are integrated into a DM that is subsequently amplified, the number of copies of each oncogene should be identical. However, pHGGs often exhibit different amplification levels of these oncogenes, suggesting sequential incorporation into the amplicon. The exact mechanism for this remains elusive; the possibilities range from sequential chromothripsis events 10 to reversible DM integration in proximity to oncogenes 11 , or deletions within the DMs. It is tempting to speculate that the evolution and optimization of DMs 13 could contribute to the rapid, lethal growth of pHGGs and their poor response to available therapies. RTK inhibition is still a promising goal in pHGG 4 , but our study highlights that understanding how these DMs evolve might provide insight into resistance mechanisms.
We also observed an association between H3.3 K27M, complex-SV signatures and TP53 loss. Although TP53 disruption is known to be associated with higher SV burden 49 , the reason for its association with H3.3 K27M rather than H3.1 K27M is unclear. H3.1 K27M and H3 WT pHGGs also included both complex-SV and SNV-dominant tumors, although H3.1 K27M DMGs were enriched in the SNV-dominant subgroup. The split into complex-SV and SNV-dominant types observed in H3.3 K27M pHGGs could also occur in H3.1 K27M and H3 WT tumors (and these distinctions may indeed exist in other tumor types), but our cohort was too small to address this possibility.
We found TP53 disruption to be an early event in tumors with complex SVs. TP53 disruption also precedes and might facilitate survival after chromothripsis in medulloblastoma 60 . Notably, although almost all DMGs with complex-SV signatures were TP53 disrupted, not all TP53-disrupted DMGs showed complex-SV signatures. In addition, hemispheric pHGGs with complex-SV signatures were frequently TP53 WT but often harbored early loss of CDKN2A/B. This indicates that although TP53 loss and H3.3 K27M are correlated with complex-SV signatures, they are neither necessary nor sufficient, either alone or in combination, for the generation of complex-SV signatures in pHGG.
Finally, we found variants in known cancer genes in 98.3% (176 of 179) of pHGGs, substantially expanding the share of patients with identified potential drivers compared with those reported by prior exome-sequencing-based studies. Many of our observed alterations were in noncoding regions of the genome, targeting regulatory elements such as enhancers. WGS also allowed us to determine which patients had complex-SV or SNV-dominant signatures, which were associated with survival, controlling for histone and TP53 status. The association between the complex-SV signatures and survival might be causative and indicate potential therapeutic targets or it could represent a quantifiable biomarker for underlying factors such as genome instability. In any case, these findings indicate that both research and clinical sequencing of these tumors should encompass the whole genome.

WGS.
Library preparation for paired-end WGS was performed as previously described 16 . Genomic DNA was fragmented and prepared for sequencing (60× depth for tumors and 30× depth for normal samples) on an Illumina HiSeq 2000. Reads from both novel and published data were aligned to hg19/GRCh37 with Burrows-Wheeler Aligner (BWA), duplicate-marked and indexed using SAMtools and Picard. Base quality scores were bias-adjusted for flow cell, lane, dinucleotide context and machine cycle and recalibrated, and local realignment around insertions or deletions (indels) was performed using the Genome Analysis Toolkit. All paired samples underwent quality control.
Recurrent juxtapositions (2D analysis; Extended Data Fig. 3a) were detected using a background model determined from 2,658 cancers across several types 21 and a binning scheme (5,583 bins, median span 467 kbp, interquartile range (IQR) 347 kbp). One SV from each sample was allowed to contribute to connections between any two bins (a 'tile'). SVs with at least four supporting reads and a span of >1 kbp were included in this analysis. The P values reflecting the significance of enrichment of SVs within each tile were corrected using the Benjamini-Hochberg procedure. Only significantly recurrent juxtapositions that did not occur at the same nucleotide position, had a mean SvABA-assigned quality score greater than 20, included at least one SV detected with postassembly (ASDIS or ASSMB) evidence, occurred in more than two samples and had a q value smaller than 0.1 were considered for further analysis. SV signature analysis. SV signature analysis followed published approaches 22,23 . SVs were stratified according to the span between the two breakpoints (0-30 kbp, 0.03-1 Mbp, >1 Mbp, interchromosomal); read orientation (deletion, duplication, inversion and interchromosomal); and whether they were clustered, as determined by clusterSV 23 . The resulting matrix of SV counts per category was analyzed with Bayesian NMF using SignatureAnalyzer 24,25 .
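The stratification used to build the SV signature input, together with the Benjamini-Hochberg adjustment applied to the tile P values, can be sketched as follows. This is a minimal illustration rather than the published pipeline: the `SV` record and its fields are hypothetical, treating each span limit as inclusive is an assumption, and clustering status is taken as given (for example, from clusterSV output) rather than recomputed.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class SV:
    # Hypothetical record; field names are illustrative only.
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    svtype: str      # 'deletion', 'duplication' or 'inversion' from read orientation
    clustered: bool  # clustering status, e.g., taken from clusterSV output

def span_class(sv: SV) -> str:
    """Span bins from the Methods; inclusive upper limits are an assumption."""
    if sv.chrom1 != sv.chrom2:
        return "interchromosomal"
    span = abs(sv.pos2 - sv.pos1)
    if span <= 30_000:
        return "0-30kbp"
    if span <= 1_000_000:
        return "0.03-1Mbp"
    return ">1Mbp"

def sv_feature(sv: SV) -> str:
    """Combine clustering status, orientation class and span class into one label."""
    prefix = "clustered" if sv.clustered else "nonclustered"
    if sv.chrom1 != sv.chrom2:
        return f"{prefix}:interchromosomal"
    return f"{prefix}:{sv.svtype}:{span_class(sv)}"

def feature_counts(svs) -> Counter:
    """Per-tumor feature counts; one such vector per tumor forms the NMF input."""
    return Counter(sv_feature(sv) for sv in svs)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg q values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

Tiles would then be kept only if their q value is below 0.1 and the other filters listed above are met.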
JaBbA was used to generate genome graphs 43 . SV events were called on the gGraph output from JaBbA using the gGnome::events function. Events were mapped to individual SVs with gGnome designations when available. SVs without annotations from JaBbA/gGnome were classified as 'complex-NOS' if they belonged to a cluster of more than two SVs according to the clusterSV method 23 ; otherwise, they were classified as an inversion, translocation, deletion or duplication based on the orientation of their supporting reads. The resulting count matrix was analyzed with Bayesian NMF using SignatureAnalyzer 24,25 .
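The fallback labeling of SVs that lack a gGnome event call can be sketched as below, under the reading that SVs in clusters of more than two SVs become 'complex-NOS' and all remaining SVs keep their read-orientation class. The dictionary keys (`event`, `cluster_size`, `orientation`) are hypothetical, and the event name used in the usage check is only illustrative.

```python
ORIENTATION_CLASSES = {"inversion", "translocation", "deletion", "duplication"}

def classify_unannotated(cluster_size: int, orientation: str) -> str:
    """Fallback label for an SV without a JaBbA/gGnome event annotation."""
    if cluster_size > 2:  # part of a cluster of more than two SVs
        return "complex-NOS"
    if orientation not in ORIENTATION_CLASSES:
        raise ValueError(f"unexpected orientation class: {orientation!r}")
    return orientation  # simple SV keeps its read-orientation class

def label_svs(svs):
    """svs: iterable of dicts with hypothetical keys 'event', 'cluster_size', 'orientation'."""
    labels = []
    for sv in svs:
        event = sv.get("event")
        if event:  # an event annotation takes precedence when available
            labels.append(event)
        else:
            labels.append(classify_unannotated(sv["cluster_size"], sv["orientation"]))
    return labels
```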
RNA-seq analysis. RNA-seq data were available for 112 of 179 tumors (57 sequenced de novo and 55 previously published 4,6 ). For de novo samples, cDNA libraries were prepared as previously described 16 using the TruSeq Strand Specific Large-Insert kit and sequenced to a depth of 50 million paired ends using Illumina HiSeq. All reads were aligned to the hg19 reference genome using STAR and quantified with RNA-SeQC following the GTEx analysis pipeline 66 . Counts were normalized using the variance-stabilizing transformation (VST) as implemented in DESeq2 (ref. 67 ) and batch-corrected with ComBat 68 as implemented in sva 69 .
ChIP. ChIP-seq was performed by Active Motif. Cells were fixed with 1% formaldehyde (15 min) and quenched with 0.125 M glycine. Chromatin was isolated by adding lysis buffer, followed by disruption with a Dounce homogenizer. Lysates were sonicated, and DNA was sheared to an average length of 300-500 bp with Active Motif's EpiShear probe sonicator (cat. no. 53051). Genomic DNA was prepared by treatment with RNase, proteinase K and heat for de-crosslinking, followed by clean-up with SPRI beads (Beckman Coulter) and quantitation by Clariostar (BMG Labtech).
An aliquot of chromatin (30 μg) was precleared with protein A agarose beads (Invitrogen). Genomic DNA regions of interest were isolated using 4 μg of antibody against H3K27ac. Complexes were washed, eluted from the beads with sodium dodecyl sulfate buffer and treated with RNase and proteinase K. Crosslinks were reversed by incubation overnight at 65 °C, and ChIP DNA was purified by phenol-chloroform extraction and ethanol precipitation.
ChIP-seq. Illumina sequencing libraries were prepared from the ChIP and input DNAs by the standard consecutive enzymatic steps. Steps were performed on an automated system (Apollo 342, Wafergen Biosystems/Takara). After PCR amplification, the DNA libraries were sequenced on Illumina's NextSeq 500 (75-nucleotide (nt) reads, single end). Reads were aligned to hg19 using the BWA algorithm (default settings). Duplicate reads were removed, and only those with mapping quality ≥25 were used for further analysis. Alignments were extended in silico at their 3ʹ-ends to a length of 200 bp and assigned to 32-nt bins along the genome. Published H3K27ac ChIP-seq data from primary DMGs were downloaded from GSE128745 (ref. 35 ). Peaks were called using MACS2 (ref. 70 ) callpeak with -B --SPMR to save the fragment pileup per million reads track. The bedGraph (.bdg) files were used to calculate fold enrichment and q value tracks with MACS2 bdgcmp, transformed into bigWig files with rtracklayer and visualized with CN and SV calls in gTrack. Additional bigWig files for adult GBM H3K27ac, pHGG ATAC-seq and noncancerous or nonbrain tissues were downloaded from GSE54792 (ref. 36 ), GSE126319 (ref. 37 ) and the ENCODE project 38 , respectively.
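The in silico extension and binning step can be sketched as follows for single-end reads. The coordinate convention (0-based, half-open) and the rule that each extended fragment adds one count to every 32-nt bin it overlaps are assumptions; the Methods do not state exactly how fragments are assigned to bins.

```python
from collections import Counter

FRAGMENT_LEN = 200  # in silico extension length from the Methods
BIN_SIZE = 32       # bin width in nucleotides from the Methods

def extend_read(start: int, end: int, strand: str, frag_len: int = FRAGMENT_LEN):
    """Extend a single-end read toward its 3' end (0-based, half-open coordinates)."""
    if strand == "+":
        return start, start + frag_len  # 3' end lies at higher coordinates
    return max(0, end - frag_len), end  # minus strand: 3' end lies at lower coordinates

def bin_coverage(reads, bin_size: int = BIN_SIZE) -> Counter:
    """Count extended fragments overlapping each bin; reads are (start, end, strand)."""
    counts = Counter()
    for start, end, strand in reads:
        f_start, f_end = extend_read(start, end, strand)
        for b in range(f_start // bin_size, (f_end - 1) // bin_size + 1):
            counts[b] += 1
    return counts
```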
Hi-C. Library generation and sequencing. In situ Hi-C libraries were generated from 5 million cells of cultured H3.3 K27M glioblastoma cell lines (HSJ-019 and HSJ-031) and of H3.3 K27M primary tumors (HSJ-031 and 039) following published protocols 71 with minor modifications. Briefly, the steps were as follows: (1) crosslinking cells with formaldehyde; (2) digesting the DNA using a 4-cutter restriction enzyme (for example, DpnII) within intact permeabilized nuclei; (3) filling in and biotinylating the resulting 5′-overhangs and ligating the blunt ends; (4) shearing the DNA; (5) pulling down the biotinylated ligation junctions with streptavidin beads; (6) amplifying the library; and (7) analyzing these fragments using paired-end sequencing. Quality control for efficient sonication was performed by agarose gel electrophoresis with appropriate size selection and by Agilent Bioanalyzer analysis of the final amplified libraries. This was followed by low-pass sequencing on an Illumina HiSeq 2500 (~30 M reads per sample) to assess library quality using the percentage of reads passing the filter, the percentage of chimeric reads and the percentage of forward-reverse pairs. Data processing. Additional Hi-C files for neural progenitor cells were downloaded from www.synapse.org/#!Synapse:syn12979101 (registration required; Data Download - Study 'iPSC-HiC') 72 . Analysis of the generated and downloaded Hi-C fastq files was performed using Juicer 73 . Contact maps were generated using Juicer with the following parameters: -s DpnII -g hg19. Map resolution was determined using Juicer's 'calculate_map_resolution.sh' script. Hi-C contact maps and associated annotations were visualized using Juicebox. The HIFI algorithm was used to process 5-kb-resolution Hi-C data to obtain more accurate estimates of interaction frequencies, using the following parameters: bandSize = 1,000, outputNormalized, boundaryKS = 1,000. TAD boundaries were determined using RobusTAD 74 .
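Map resolution, as commonly defined for in situ Hi-C maps, is the smallest bin size at which at least 80% of bins have at least 1,000 contacts. A minimal sketch of that definition follows; it assumes a single linear genome and counts each contact end as one observation in its bin. It is not a reimplementation of Juicer's script, and the candidate bin sizes are supplied by the caller.

```python
import numpy as np

def map_resolution(contact_positions, genome_length, candidate_bins,
                   min_contacts=1000, frac=0.8):
    """Smallest bin size at which >= frac of bins hold >= min_contacts contacts.

    contact_positions: integer positions of contact ends on a single linear
    genome (assumption of this sketch). Returns None if no candidate qualifies.
    """
    positions = np.asarray(contact_positions, dtype=np.int64)
    for bin_size in sorted(candidate_bins):
        n_bins = int(np.ceil(genome_length / bin_size))
        counts = np.bincount(positions // bin_size, minlength=n_bins)
        if np.mean(counts >= min_contacts) >= frac:
            return bin_size
    return None
```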
Composite figure panels including Hi-C and other genomic data were created using plotgardener (https://github.com/PhanstielLab/plotgardener).
Luciferase reporter. Cell lines. The pHGG cell line DIPG13 (ref. 75  Cell line authenticity and mycoplasma surveillance. Cell line authenticity was confirmed using short tandem repeat (STR) profiling. All cell lines were monitored and confirmed to be negative for mycoplasma infection using the MycoAlert Mycoplasma Detection Kit (Lonza) following the manufacturer's protocol.
Luciferase reporter construction. A lentiviral firefly luciferase reporter system was constructed from pGL4.26 (Promega) and the pLKO.1 backbone via Gibson Assembly. The pLKO.1 backbone was digested with FastDigest KflI and EcoRI. The minimal promoter firefly reporter cassette was PCR-amplified from pGL4.26 using the lucminP primer set (Supplementary Table 6) and NEB Q5 polymerase. These two fragments were assembled into the lentiviral firefly luciferase reporter using the NEBuilder HiFi DNA Assembly Cloning Kit according to the manufacturer's instructions. The DNA sequence in the H3K27ac peak in the consensus CCDC26-SV amplicon was split into two fragments (E1 and E2) and PCR-amplified from DIPG13 genomic DNA with the primers listed in Supplementary Table 6 using NEB Q5 polymerase. The resulting E1 and E2 fragments were cloned into the vector using the KpnI and NheI restriction sites. The lentiviral constitutively active pLX313-Renilla construct was obtained from Addgene (plasmid no. 118016) to serve as an intrinsic control.
Viral production. HEK-293T (ATCC CRL-3216) cells were cultured in T75 tissue-culture-treated flasks in DMEM (Gibco) supplemented with 10% fetal bovine serum (FBS; Gemini Bio). Lipofectamine 3000 (Invitrogen) was used to transfect the cells with the plasmid of interest, together with the packaging plasmids VSV-G and psPAX2, according to the manufacturer's protocol. Media were replaced with DMEM supplemented with 20% FBS 6 h after transfection. The media were harvested 24-48 h posttransfection, and the viruses were concentrated (20×) using a Lenti-X Concentrator (Takara Bio) per the manufacturer's protocol.

Lentiviral infection of BT245 and DIPG13 cells.
Cells were dissociated and plated in a 12-well tissue culture plate at a density of 1.5 million cells ml−1. Concentrated virus was added to the medium, and the cells were centrifuged for 120 min at 850g and 30 °C. Cells were then placed in a T75 ULA flask; for selection, 1 µg ml−1 puromycin (firefly reporter) and 300 µg ml−1 hygromycin (pLX313-Renilla) were added the following day to achieve survival of 40-80% under infected conditions. A549 transfection. A549 (ATCC CCL-185) cells were cultured in a T75 tissue culture flask in RPMI (Gibco) supplemented with 10% FBS. Lipofectamine was used to transfect the enhancer reporter plasmids following the manufacturer's protocol.
Luciferase reporter readout. The Dual-Glo Luciferase Assay System (Promega) was used according to the manufacturer's protocol for all measurements, 4 days after spinfection (2 days after puromycin selection).
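Dual-luciferase readouts are typically summarized as the firefly/Renilla ratio per well, scaled to a control construct. The sketch below assumes that the control is the minimal-promoter reporter without an enhancer insert; the Methods do not specify the normalization, so this choice is illustrative.

```python
from statistics import mean

def relative_luciferase(firefly, renilla, control_firefly, control_renilla):
    """Firefly/Renilla ratio of each well, divided by the mean ratio of the control wells.

    The Renilla signal (pLX313-Renilla) corrects for cell number and transduction;
    using the enhancer-free reporter as control is an assumption of this sketch.
    """
    ratios = [f / r for f, r in zip(firefly, renilla)]
    control = mean(f / r for f, r in zip(control_firefly, control_renilla))
    return [x / control for x in ratios]
```

Replicate ratios from this function could then feed the nested one-way ANOVA described under Statistics.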

Visualization and reconstruction of complex MYCN and RTK amplicons.
JaBbA 43 was used to generate cancer genome graphs, with SvABA SVs, GATK CNVs and ABSOLUTE purity and ploidy estimates as inputs. Tracks were visualized in gGnome/gTrack, which was also used to calculate distances between loci in the cancer genome. Extrachromosomal amplicons were inferred using only circular path segments with CN > 20 and were reconstructed with the gGnome walks() function.
Genomic loci recurrently incorporated into the amplicons were determined based on the distribution of amplicons around the oncogene 19,20 . TADs adjacent to the amplified oncogene were divided into 10 kbp windows. The average CN per 10 kbp window was calculated for all tumors with an amplicon of CN > 5 anywhere in the TAD of the oncogene (using the germinal zone TAD boundaries from GSE77565 (ref. 63 )). Among tumors with an amplification of CN > 5 anywhere within the TAD of the oncogene, the fraction of tumors with an amplification in each 10 kbp window was determined. The locations of probable enhancer elements, which are necessary to drive expression of the amplified oncogene, were inferred from the direction of the skew of the observed distribution compared with the expected symmetric normal distribution.
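The per-window amplification profile can be sketched as below. The first function follows the Methods (fraction of tumors with CN > 5 in each 10-kbp window, among tumors amplified anywhere in the TAD). The second function is a swapped-in, simpler summary: instead of comparing against a fitted symmetric normal distribution, it reports how far the profile's weighted centroid lies from the oncogene, so its sign only indicates the side toward which the profile is skewed.

```python
import numpy as np

def amplified_fraction(cn, min_cn=5.0):
    """Fraction of amplified tumors with CN > min_cn in each 10-kbp window.

    cn: array of shape (tumors, windows) with the mean CN per window across the
    oncogene's TAD. Tumors without any window above min_cn are dropped.
    """
    amplified = np.asarray(cn, dtype=float) > min_cn
    keep = amplified.any(axis=1)
    if not keep.any():
        raise ValueError("no tumor has an amplification in this TAD")
    return amplified[keep].mean(axis=0)

def centroid_offset(fraction, oncogene_window):
    """Weighted centroid of the profile minus the oncogene's window index.

    Positive values mean the profile extends further toward higher coordinates,
    hinting at a co-amplified element on that side (simplified stand-in for skew).
    """
    fraction = np.asarray(fraction, dtype=float)
    idx = np.arange(fraction.size)
    centroid = float((fraction * idx).sum() / fraction.sum())
    return centroid - oncogene_window
```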
Slide processing for paraffin-embedded tissue samples. Slides were placed in a 90 °C oven for 15 min. Slides were then deparaffinized with xylene (two times, 15 min each) at room temperature (RT), dehydrated in 100% ethanol for 5 min at RT, placed in 10 mM citric acid (pH 6.0) and microwaved for 10 min. The slides were then immersed in 2× saline-sodium citrate (SSC) for 5 min at 37 °C, followed by digestion in 0.2% pepsin working solution (1.2 g pepsin per 600 ml of 0.9% NaCl, pH 1.5) at 37 °C for 12 min. Immediately after digestion, the slides were dehydrated using an ethanol series (70, 85, 100%) for 2 min each at RT. A working solution of GATA4/3ʹFGFR2 (Mayo Clinic laboratory-developed probe) was made by mixing 2 µl of concentrated 3ʹFGFR2 probe and 1 µl of concentrated GATA4 probe with 7 µl of LSI/WCP hybridization buffer (Abbott Laboratories). The working solution was applied to the target areas, coverslipped, co-denatured with a ThermoBrite at 83 °C for 5 min and hybridized overnight in a 37 °C humidified oven. Following hybridization, slides were soaked in RT 2× SSC/0.1% NP-40 to remove coverslips, placed in 2× SSC/0.1% NP-40 at 74 °C for 2 min and then placed in RT 2× SSC/0.1% NP-40 for 2 min. The slides were stained with 4′,6-diamidino-2-phenylindole (DAPI) (Vector Laboratories) and coverslipped.
ID2/MYCN metaphase FISH. Probe specifics. ID2/MYCN enumeration was analyzed by FISH. BACs covering the ID2 gene region were identified using the UCSC Genome Browser (hg38 assembly, accessed August 2021). The ID2 clone (CTD-2131H8) was labeled by nick translation with Spectrum Green (Abbott Molecular), and the MYCN probe was commercially available from Abbott Molecular. The ID2 probe and MYCN probe were combined to create an enumeration probe set.
Slide processing for metaphase samples. Slides were air-dried at RT overnight. The slides were then immersed in 2× SSC for 30 min at 37 °C and dehydrated using an ethanol series (70, 85, 100%) for 2 min each at RT. A working solution of ID2/MYCN was made by mixing 2 μl of concentrated ID2 probe and 1 μl of concentrated MYCN probe with 7 μl of LSI/WCP hybridization buffer (Abbott Laboratories). The working solution was applied to the target areas, coverslipped, co-denatured with a ThermoBrite™ at 73 °C for 5 min and hybridized overnight in a 37 °C humidified oven. Following hybridization, slides were soaked in RT 2× SSC/0.1% NP-40 to remove coverslips, placed in 2× SSC/0.1% NP-40 at 74 °C for 2 min and then placed into RT 2× SSC/0.1% NP-40 for 2 min. The slides were stained with 10% DAPI (Vector Laboratories) and coverslipped. SNV signature analysis. De novo SNV signature extraction was performed using Bayesian NMF in SignatureAnalyzer 24,25 . The resulting SNV signatures were compared with the COSMIC v3 SBS signatures using cosine similarity to annotate known etiologies and signature names. deconstructSigs 76 was used to extract the SBS signatures with the highest degree of similarity to the de novo signatures, including a designation of 'unknown'.
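Matching de novo SNV signatures to reference signatures by cosine similarity can be sketched as follows. Both matrices are assumed to list mutation contexts in the same column order; the reference names in the usage check are hypothetical placeholders, not COSMIC identifiers.

```python
import numpy as np

def best_cosmic_match(de_novo, reference, names):
    """Closest reference signature (name, cosine similarity) for each de novo signature.

    de_novo: (k x C) array; reference: (n x C) array with the same C contexts
    (C = 96 for SBS signatures).
    """
    a = np.asarray(de_novo, dtype=float)
    b = np.asarray(reference, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # unit-length rows
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sims = a @ b.T                                    # pairwise cosine similarities
    best = sims.argmax(axis=1)
    return [(names[j], float(sims[i, j])) for i, j in enumerate(best)]
```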

Signature integration and definition of signature clusters with similar variant-generating processes.
To better understand the information contained in each of the nine SV and 14 SNV signatures, consensus clustering was applied to the tumor × signature proportion matrix, in which the 23 values for each tumor are the proportions of its combined SV and SNV signature activity attributed to each signature. The proportions for each signature were median-centered across all tumors before consensus clustering with the ConsensusClusterPlus R package using the following parameters: reps=1,000, pItem=0.9, pFeature=0.9, clusterAlg = 'hc', distance = 'spearman'. The most stable and informative of the resulting clusters were named 'complex-SV' and 'SNV-dominant', after the signatures with the highest enrichment in each cluster.
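The resampling scheme behind consensus clustering can be sketched in Python as below. This mirrors the listed parameters (90% of tumors and 90% of signatures per resample, hierarchical clustering on one minus Spearman correlation) but is not ConsensusClusterPlus itself; average linkage is assumed here as the within-resample linkage, and choosing the number of clusters (k) from the consensus distributions is left out.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

def consensus_matrix(X, k, reps=1000, p_item=0.9, p_feature=0.9, seed=0):
    """Fraction of co-sampled resamples in which two tumors fall into the same cluster.

    X: tumors x signatures proportion matrix (median-centered below).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - np.median(X, axis=0)  # median-center each signature across tumors
    n, m = X.shape
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(reps):
        items = rng.choice(n, size=int(round(p_item * n)), replace=False)
        feats = rng.choice(m, size=int(round(p_feature * m)), replace=False)
        sub = X[np.ix_(items, feats)]
        rho, _ = spearmanr(sub, axis=1)  # tumor-by-tumor rank correlations
        dist = 1.0 - np.nan_to_num(rho, nan=0.0)
        np.fill_diagonal(dist, 0.0)
        tree = linkage(squareform(dist, checks=False), method="average")
        labels = fcluster(tree, t=k, criterion="maxclust")
        together[np.ix_(items, items)] += labels[:, None] == labels[None, :]
        sampled[np.ix_(items, items)] += 1
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)
```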
Chromothripsis and extrachromosomal amplicons. To define regions of chromothripsis, we used Shatterseek 77 . Regions in the genome with CN > 10 extending for more than 50 kbp were defined as probably extrachromosomal or derived from an extrachromosomal stage.
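The CN-based definition of probably extrachromosomal regions can be sketched as follows. The segment tuple format is hypothetical, and merging only segments that directly abut each other is an assumption about how a region "extending for more than 50 kbp" is assembled from adjacent segments.

```python
def high_cn_regions(segments, min_cn=10.0, min_len=50_000):
    """Merge abutting segments with CN > min_cn; keep merged regions longer than min_len.

    segments: list of (chrom, start, end, cn) sorted by chromosome and start.
    """
    regions = []
    current = None  # [chrom, start, end] of the region being extended
    for chrom, start, end, cn in segments:
        if cn > min_cn:
            if current and current[0] == chrom and current[2] == start:
                current[2] = end  # abutting high-CN segment extends the region
                continue
            if current:
                regions.append(tuple(current))
            current = [chrom, start, end]
        else:
            if current:
                regions.append(tuple(current))
            current = None
    if current:
        regions.append(tuple(current))
    return [r for r in regions if r[2] - r[1] > min_len]
```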
Comut plots and variant combination matrix. SNVs were annotated using Oncotator. SVs were annotated and linked to a gene based on whether the SV breakpoints were in exons of the gene ('coding SV'), in introns ('intron SV') or in the TAD of the gene ('flank SV'). Absolute purity- and ploidy-adjusted CN was determined for each gene using the width-weighted mean CN of all segments overlapping the gene.
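A width-weighted gene-level CN can be computed as in the sketch below. Weighting each segment by its overlap with the gene, rather than by its full width, is an assumption; the segment format is hypothetical.

```python
def gene_copy_number(gene_start, gene_end, segments):
    """Width-weighted mean CN of segments overlapping [gene_start, gene_end).

    segments: iterable of (start, end, cn) on the gene's chromosome, half-open.
    Weights are overlap widths (assumption of this sketch).
    """
    total_width = 0
    weighted = 0.0
    for start, end, cn in segments:
        overlap = min(end, gene_end) - max(start, gene_start)
        if overlap > 0:
            total_width += overlap
            weighted += overlap * cn
    return weighted / total_width if total_width else float("nan")
```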
To create the variant combination matrix, we used a subset of only Cancer Gene Census 44 genes and genes that showed significantly recurrent variants in this cohort. For SNVs, the variant classification was simplified to truncating_snvs = ('Nonsense_Mutation', 'Frame_Shift_Del', 'Frame_Shift_Ins', 'Splice_Site', 'Start_Codon_SNP', 'START_CODON_SNP', 'Translation_Start_Site') and missense.snvs = ('Missense_Mutation', 'In_Frame_Del', 'Stop_Codon_Del', 'DE_NOVO_START_IN_FRAME', 'DE_NOVO_START_OUT_FRAME', 'Nonstop_Mutation', 'In_Frame_Ins', 'START_CODON_INS'). SCNAs of genes with a ploidy- and purity-adjusted CN of <0.4 were annotated as 'homdel', CN > 5.4 as 'amp', CN > 10 as 'ExChr_amp' and amplifications covering only parts of a gene with a CN > 3.1 as 'part.amp', based on the CN histogram across all tumors defining recurrent CN states. A genetic variant had to recur in at least three samples (excluding the hypermutant samples for SNVs) to be kept in the matrix. For SVs in the TAD of a gene ('flank SV'), this threshold was increased to at least ten. GISTIC peaks in each sample were used to incorporate the SCNAs of lower amplitude.
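The simplified SNV classes and CN thresholds can be sketched as lookup functions. The class sets and cutoffs come from the text above; the order of precedence (a homozygous deletion or whole-gene amplification taking priority over a partial amplification, and 'ExChr_amp' over 'amp') and returning None for unclassified values are assumptions.

```python
TRUNCATING_SNVS = {"Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
                   "Start_Codon_SNP", "START_CODON_SNP", "Translation_Start_Site"}
MISSENSE_SNVS = {"Missense_Mutation", "In_Frame_Del", "Stop_Codon_Del", "DE_NOVO_START_IN_FRAME",
                 "DE_NOVO_START_OUT_FRAME", "Nonstop_Mutation", "In_Frame_Ins", "START_CODON_INS"}

def snv_class(variant_classification):
    """Collapse an Oncotator-style variant classification into 'truncating' or 'missense'."""
    if variant_classification in TRUNCATING_SNVS:
        return "truncating"
    if variant_classification in MISSENSE_SNVS:
        return "missense"
    return None

def scna_class(gene_cn, partial_segment_cn=None):
    """Map adjusted gene CN (and CN of a partial-gene amplicon, if any) to a category."""
    if gene_cn < 0.4:
        return "homdel"
    if gene_cn > 10:
        return "ExChr_amp"  # checked before 'amp' so the higher threshold wins
    if gene_cn > 5.4:
        return "amp"
    if partial_segment_cn is not None and partial_segment_cn > 3.1:
        return "part.amp"
    return None
```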
cBioPortal OncoPrinter was used to visualize variants. Columns represent samples, ordered within each subgroup determined by hierarchical clustering (HC). HC was performed with one minus the Spearman rank correlation as the distance metric (Extended Data Fig. 6c) and with one minus the cosine similarity on the respective subsets of the variant combination matrix (Fig. 6a and Extended Data Fig. 8c), with average linkage in all cases. Genes of interest were manually selected based on the variants with the highest enrichment in the subgroups.
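Ordering samples by average-linkage HC on one minus cosine similarity can be sketched with SciPy, whose `"cosine"` metric in `pdist` is defined as one minus the cosine similarity. The genes x samples orientation of the input matrix is an assumption; samples without any variant must be removed first because their cosine distance is undefined.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist

def column_order(variant_matrix):
    """Sample (column) order from average-linkage HC on 1 - cosine similarity.

    variant_matrix: genes x samples array; every sample needs at least one variant.
    """
    samples = np.asarray(variant_matrix, dtype=float).T  # rows = samples
    dist = pdist(samples, metric="cosine")               # 1 - cosine similarity
    return leaves_list(linkage(dist, method="average"))
```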
Distances between sample groups in variant space. We calculated Jaccard distances between the genetic profiles of each pair of tumors across 369 variants in genes from the Cancer Gene Census 44 . Tumor groups were defined by H3.1 or H3.3 mutation status and by whether the complex-SV signature contributed more or less than 20% of all SV signature activity.
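The Jaccard distance between two tumors is one minus the size of the intersection of their variant sets divided by the size of the union. A minimal sketch, including a mean between-group distance, follows; the profile format (a set of variant identifiers per tumor) and returning 0.0 for two empty profiles are assumptions.

```python
from itertools import product
from statistics import mean

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two tumors' sets of variant identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention for two tumors without variants (assumption)
    return 1.0 - len(a & b) / len(union)

def mean_between_groups(profiles, group_a, group_b):
    """Mean pairwise Jaccard distance between tumors of two groups.

    profiles: dict mapping a tumor ID to its set of variant identifiers (hypothetical format).
    """
    return mean(jaccard_distance(profiles[x], profiles[y]) for x, y in product(group_a, group_b))
```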
Variant timing analysis. The palimpsest 78 R package was used to determine single-patient timings of SNVs 79 . SNVs were classified as clonal or subclonal based on their cancer cell fraction (variant allele fraction adjusted for local CN and for purity and/or ploidy). SNVs overlapping with SCNAs could further be timed as early or late depending on whether they occurred before or after the SCNA 51 . MutationTimeR 51 was used to determine the timing of SCNAs in individual patients, with clone clusters as input. Mobster 80 was used to define clone clusters based on the distribution of the ABSOLUTE 62 cancer cell fractions. The resulting molecular time for the SCNA segments was assigned to the GISTIC peaks present in the respective samples with a width-weighted mean and categorized based on the timing quarters. For each subgroup, in addition to the timed GISTIC peaks, single-patient timed SNVs in consensus cancer genes were tallied into winning tables reflecting how often each variant was an early event, using published code 79 . The BradleyTerryScalable R package was used to estimate the winning probability across the subgroup for each variant, which in this setting is a measure of the probability of this variant being an early event in the tested subgroup. The Bayesian maximum a posteriori probability estimate was used to fit the model as previously described 79 . To control for outlier samples, the analysis was performed on 100 random samples of 70% of each subgroup. The resulting distributions for the strength parameters (on a log scale) were plotted for variants recurring at least three times in the tested subgroup.
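The Bradley-Terry model gives each variant a strength p_i, with P(i timed earlier than j) = p_i / (p_i + p_j). BradleyTerryScalable fits it by Bayesian MAP estimation; as a simpler swapped-in technique, the sketch below computes maximum-likelihood strengths from a winning table with Hunter's minorization-maximization (MM) updates. Unlike the MAP fit, this ML estimate exists only when every variant wins at least once and the win graph is strongly connected.

```python
import numpy as np

def bradley_terry_mm(wins, n_iter=1000, tol=1e-10):
    """Maximum-likelihood Bradley-Terry strengths via MM updates (not the MAP fit).

    wins[i, j] = number of tumors in which variant i was timed earlier than variant j.
    Returns strengths normalized to sum to 1.
    """
    wins = np.asarray(wins, dtype=float)
    n_games = wins + wins.T                 # comparisons between each pair
    total_wins = wins.sum(axis=1)
    p = np.full(len(wins), 1.0 / len(wins))
    for _ in range(n_iter):
        denom = n_games / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)        # no self-comparisons
        new_p = total_wins / denom.sum(axis=1)
        new_p /= new_p.sum()
        if np.max(np.abs(new_p - p)) < tol:
            return new_p
        p = new_p
    return p
```

Log-strengths (`np.log(p)`) correspond to the log-scale strength parameters described above; repeating the fit on random 70% subsets of a subgroup would give their distributions.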
Survival analysis. Univariate correlations for differences in survival were analyzed using the Kaplan-Meier method, and significance was determined by log-rank test. Spearman rank correlation tests were used to determine correlations between OS and complex-SV signature activity. This was possible because all children with DMGs died within the observed period, resulting in an absence of censored data. Variables included in the multivariate analysis (Cox model) were histone SNV, age and TP53 status combined with complex-SV signature activity.
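Because no DMG survival times were censored, rank correlation and sample medians can be used directly. The sketch below computes the Spearman correlation between OS and complex-SV activity and the median OS on each side of the 20% cutoff used in the Results; the input format is hypothetical, and with censored data Kaplan-Meier estimates would be required instead of sample medians.

```python
import numpy as np
from scipy.stats import spearmanr

def summarize_os(os_months, complex_sv_fraction, threshold=0.2):
    """Spearman rho/P between OS and complex-SV activity, plus median OS by group.

    Valid only without censoring; 'high' means complex-SV activity >= threshold.
    """
    os_months = np.asarray(os_months, dtype=float)
    frac = np.asarray(complex_sv_fraction, dtype=float)
    rho, pval = spearmanr(frac, os_months)
    high = frac >= threshold
    return {
        "rho": float(rho),
        "p": float(pval),
        "median_os_high": float(np.median(os_months[high])),
        "median_os_low": float(np.median(os_months[~high])),
    }
```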

Statistics and reproducibility.
No statistical method was used to predetermine sample size. Three previously published samples were excluded because their fastq files could not be successfully realigned using our pipelines. Exclusion criteria were pre-established. The experiments were not randomized. The investigators were not blinded to allocation during experiments or outcome assessment. All statistical analyses were performed in R 3.6.3. Unless otherwise indicated, statistical comparisons were performed using Fisher's exact tests or Wilcoxon tests, as appropriate. The data met the assumptions of the statistical tests used. Unless otherwise specified, data were assumed to be not normally distributed, but this was not formally tested. P values less than 0.05 were considered significant. Multiple testing was accounted for by using false discovery rate q values unless otherwise indicated. In all box plots, the boxes represent the range between the 25th and 75th percentiles, and the central line indicates the median. Statistical comparisons for the luciferase reporter were performed in Prism v.9 using nested one-way ANOVA and Tukey's multiple comparison test.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
De novo generated sequencing data from this study are accessible under dbGaP accession number phs002380.v1.p1. Previously published sequencing data 4-7 that were reanalyzed here are available under accession codes EGAS00001000575, EGAS00001001139, EGAS00001000572, EGAS00001000192, GSE128745 (ref. 35 ), GSE54792 (ref. 36 ) and GSE126319 (ref. 37 ) and from the ENCODE project 38 . COSMIC signatures and cancer genes are available at: https://cancer.sanger.ac.uk/cosmic/download. TAD boundaries are from GSE77565 (ref. 63 ). Source data are provided with this paper. All other data supporting the findings of this study are available from the corresponding author on reasonable request.

Code availability
Publicly available software was used as indicated in the Methods. The main custom analysis code is available at: https://github.com/FrankDubois/pHGG_SVs. All custom code used to connect and reformat the outputs of the publicly available software, as well as code used to generate the figures, is available upon request.

Fig. 1 | Sample characteristics and significantly recurrent variants. (a) Purity of pretreatment biopsy and autopsy samples was not significantly different (p = 0.5, two-sided Wilcoxon, n = 174 tumors; the center line of the box plot indicates the median, the bounds of the box the 25th and 75th percentiles, and the whiskers extend from the box to the largest or smallest value no further than 1.5× IQR). (b) Significantly recurrent SNVs in nonhypermutant tumors (n = 179 tumors). (c) Significantly recurrent SCNAs (n = 179 tumors). All of these SCNAs have been noted previously 3 except for a non-protein-coding locus in 8q24.21 near MYC, which is also within a separate recurrently amplified locus. (d) Q-Q plot for the analysis of significantly recurrent SV breakpoints (SRBs). The largest number of SRBs lies within the long noncoding RNA CCDC26, within the TAD encompassing MYC (based on n = 179 tumors). (e) Representative examples of enhancer amplification through simple tandem duplications within the long noncoding RNA encoding CCDC26.