Parent-of-origin bias for de novo events at recurrent CNV loci has been well documented but has lacked a compelling explanation. Our analysis of data on 1,438 CNVs from 25 published reports demonstrates that sex-specific variation in local meiotic recombination rates predicts parent of origin at recurrent CNV loci. Human male and female meiotic recombination rates and patterns differ greatly at the broad scale of human chromosomes. Recombination events are nearly uniformly distributed across the chromosome arms in females but tend to cluster near the telomeres in males [57]. Our analysis reveals a parallel trend: NAHR-mediated CNVs that arise more frequently in female meiosis occur closer to the centromeres of their respective chromosomes, whereas those exhibiting paternal biases occur closer to the telomeres. We note that this pattern has been recognized previously [26]. Here we have formally tested the hypothesis that recombination variation drives variation in parent of origin, using a rigorous statistical framework, and have estimated the proportion of variance in parent-of-origin bias that is attributable to sex-dependent recombination rates.
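One way to frame the test described above is as a regression. Each locus's observed parental bias is regressed on a summary of its local sex-specific recombination rates, and the coefficient of determination (R²) serves as the share of variance explained. The sketch below is a hypothetical illustration of that idea, not the statistical framework used in this study. Several choices in it are assumptions: the `Locus` fields, the use of the male share of local recombination as the covariate, and the weighting of each locus by its number of parent-of-origin calls.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    """Hypothetical per-locus summary; field names are illustrative only."""
    name: str
    paternal: int       # de novo CNVs of paternal origin
    maternal: int       # de novo CNVs of maternal origin
    male_rate: float    # local male recombination rate (e.g. cM/Mb)
    female_rate: float  # local female recombination rate (e.g. cM/Mb)


def fit_bias_vs_recombination(loci: list[Locus]) -> tuple[float, float, float]:
    """Weighted least-squares fit of paternal fraction on male recombination share.

    Returns (slope, intercept, r_squared). Each locus is weighted by its
    number of informative parent-of-origin calls.
    """
    xs = [loc.male_rate / (loc.male_rate + loc.female_rate) for loc in loci]
    ys = [loc.paternal / (loc.paternal + loc.maternal) for loc in loci]
    ws = [float(loc.paternal + loc.maternal) for loc in loci]
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    syy = sum(w * (y - y_bar) ** 2 for w, y in zip(ws, ys))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both the covariate and the bias")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With a single predictor, weighted R^2 equals the squared weighted correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

Weighting by the number of informative calls reduces the influence of loci with only a handful of parent-of-origin determinations, whose observed fractions are noisy. An unweighted fit or a binomial (logistic) model would be reasonable alternatives.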
Investigations into the mechanism by which recurrent CNVs arise have focused on LCRs and their makeup [1, 58]. These regions are composed of repeated sequence units that vary in orientation, percent homology, length, and copy number. Consequently, LCRs are mosaics of varying units, which makes LCR architecture complex [23]. The frequency of NAHR events mediated by LCRs is a function of these characteristics and of other features of the genomic architecture [21]. Specifically, the rate of NAHR is known to correlate positively with LCR length and percent homology and to decrease as the distance between LCRs increases [19, 21]. However, because LCRs are challenging to study with short-read sequencing technology, the population-level variability of these regions is not well described [59]. Recent breakthroughs in long-read sequencing and optical mapping have revealed remarkable variation in LCRs [60–62], and haplotypes with higher risks of CNV formation have now been identified [63]. Our data suggest that any evaluation of CNV formation would be well served by considering the local meiotic recombination landscape. LCRs are substrates for NAHR [1] and are thus subject to the recombination process, so local recombination rates may influence how likely an NAHR event is between two LCRs. Therefore, analyses of LCR haplotypes and their susceptibility to NAHR should take sex differences in recombination into account. For example, at loci with maternal biases, specific risk haplotypes may be required for CNVs to form in male meiosis, and vice versa at loci with paternal biases. Greater enrichment of GC content, homologous core duplicons, PRDM9 binding motifs, or other recombination-favoring factors may also be required [1, 19].
While variation in recombination rates between the sexes is well established [57, 64–67], prediction of individual risk may also need to consider individual variation in meiotic recombination, which is itself a heritable trait [65]. Here we show that 83% of the variation in parent-of-origin bias is explained by mean recombination rates in males and females. Some of the remaining 17% may be explained by individual-level variation in recombination rates. Variants in several genes, including PRDM9, have been shown to affect recombination rates and the distribution of double-strand breaks in mammals [68, 69]. Common PRDM9 alleles have been shown to affect the proportion of an individual's recombination events that occur at hotspots [69]. Additionally, there is evidence that sex-specific hotspots exist in the genome and coincide with CNV loci [26, 65]. While CNVs at 22q11.2 show a slight maternal bias [30], the maternal bias at 16p11.2 is more pronounced [26, 30]. This may be due to a female-specific hotspot at the 16p11.2 locus [26]. Sex-specific hotspots may increase the likelihood of recombination in NAHR-prone regions in one sex and thereby strengthen the parental bias at those loci.
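Whether a locus shows a "slight" or a clear parental bias depends on two things: the maternal fraction and the number of informative cases. The sketch below assumes an exact two-sided binomial test against a 50:50 null. This is a standard choice, but not necessarily the test used in the cited studies. The hypothetical helper `parental_bias_test` takes maternal and paternal counts and returns the maternal fraction together with its p-value.

```python
from math import comb


def parental_bias_test(maternal: int, paternal: int) -> tuple[float, float]:
    """Return (maternal fraction, exact two-sided binomial p-value vs. a 50:50 null)."""
    n = maternal + paternal
    if n == 0:
        raise ValueError("no informative parent-of-origin calls")
    k = min(maternal, paternal)
    # With p = 0.5 the binomial distribution is symmetric, so the two-sided
    # p-value is twice the probability of a split at least as lopsided as observed.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return maternal / n, min(1.0, 2 * lower_tail)
```

For example, a 60:40 maternal split gives p ≈ 0.75 with 10 informative calls but p ≈ 0.057 with 100. Similar fractions can therefore carry very different weights of evidence.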
Many human genetic studies have observed correlations between inversion polymorphisms and genomic disorder loci [25, 70]. Because these inversions are copy-number neutral and often located in complex repeat regions [71], they can be difficult to assay with current high-throughput strategies, and their true impact remains to be explored. One model proposes that during meiosis these regions may fail to synapse properly, increasing the probability of NAHR [72, 73]. Another suggests that inversions increase the directly oriented content of LCRs, producing NAHR-favorable haplotypes [74]. Supporting these models, inversion polymorphisms have been identified at the majority of recurrent CNV loci [24, 25, 30, 70, 72, 74, 75]. At the 7q11.23, 17q21.31, and 5q35.3 loci [24, 25, 75], compelling data implicate inversions as markers highly associated with CNV formation. However, heterozygous inversions are known to suppress recombination, perturbing the local pattern of recombination and altering the fate of chiasmata [76]. The analysis presented here strongly suggests that recombination is the driving force for CNV formation. This points to an alternative explanation for the association between inversions and CNVs: both are consequences (and neither is the cause) of recombination between non-allelic homologous LCRs. Inversions and CNVs appear to be associated because both are initiated by aberrant recombination. This view also explains the frequency of individual inversions at CNV loci. Like CNVs, inversions arise via rare aberrant recombination, but they are subsequently driven to higher frequency by natural selection because they suppress recombination and “save offspring” from deleterious genomic disorders.
Of course, frequent mutations leading to inversions and the details of LCR structure, such as relative orientation and homology within a genomic region, may promote or impede CNV formation in a locus-specific manner [77–79]. Further exploration of this relationship with improved genomic mapping could test these alternative models [80]. One testable prediction of the model described here is that inversions should be more frequent at loci giving rise to highly deleterious CNVs than at loci harboring recurrent benign CNVs.
To our knowledge, this study is the first comprehensive investigation of the parental origin of CNVs at NAHR-mediated loci. Hehir-Kwa et al. and Ma et al. conducted similar large-scale studies of intellectual disability, developmental delay, and congenital dysmorphisms, and found a paternal bias in samples consisting predominantly of non-recurrent CNVs [81, 82]. They hypothesized that replication-based mechanisms of CNV formation contributed to the bias. Our study focuses on loci predicted to form via NAHR, which limits confounding by multiple mechanisms of CNV formation. Although our analysis includes data from over 1,400 samples, it is limited to existing studies of rare pathogenic CNVs. It does not include benign CNVs such as the 7q11.2 deletion [83], because parent-of-origin data are scarce for these non-pathogenic loci. Analysis of a larger set of CNV loci, including benign CNVs, would give greater insight into the role of recombination, and of sex differences in recombination, in determining the parent of origin of CNVs.
Our data show that meiotic recombination predicts the parent of origin of recurrent CNVs underlying genomic disorders. The role of recombination in CNV formation may also influence the incidence of recurrent CNVs. Females, on average, have more crossovers per genome, and a pattern emerges when the population frequencies of CNVs with known parental biases are compared. CNVs at 22q11.2 and 16p11.2 show maternal biases and have prevalences of 1 in 4,000 and 1 in 3,000, respectively [30, 84]. In contrast, CNVs at 3q29 and 5q35.3 show paternal biases and have prevalences of 1 in 30,000 and 1 in 14,000, respectively [85]. While prevalence could be confounded by the severity of the disorders, our data suggest that sex-specific differences in the frequency of meiotic recombination may also influence the incidence of these genomic disorders. This possibility could be investigated in a cohort of individuals with benign CNVs, which would reduce confounding by severity.
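The prevalence comparison above is simple arithmetic on the quoted figures. The minimal sketch below converts each "1 in N" prevalence to cases per 100,000 and compares unweighted group means. The grouping dictionaries and helper names are illustrative assumptions. With only four loci, the comparison is descriptive rather than a statistical test, and it does not adjust for severity or ascertainment.

```python
def per_100k(one_in: int) -> float:
    """Convert a prevalence of '1 in N' to cases per 100,000 individuals."""
    return 100_000 / one_in


# Prevalences as quoted in the text, expressed as the N in "1 in N".
MATERNAL_BIASED = {"22q11.2": 4_000, "16p11.2": 3_000}
PATERNAL_BIASED = {"3q29": 30_000, "5q35.3": 14_000}


def mean_rate(group: dict[str, int]) -> float:
    """Unweighted mean prevalence (per 100,000) across the loci in a group."""
    return sum(per_100k(n) for n in group.values()) / len(group)


if __name__ == "__main__":
    for name, n in {**MATERNAL_BIASED, **PATERNAL_BIASED}.items():
        print(f"{name}: {per_100k(n):.1f} per 100,000")
    ratio = mean_rate(MATERNAL_BIASED) / mean_rate(PATERNAL_BIASED)
    print(f"maternal/paternal mean prevalence ratio: {ratio:.1f}")
```

On these figures, the maternally biased loci average about 29 cases per 100,000 and the paternally biased loci about 5. The maternally biased loci are therefore roughly 5.6 times as common.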