To explore the timing of mutational events in the human germline, we used a three-step method to detect genome-wide DNMs, assign each to a parental haplotype, and assess parental blood and sperm mosaicism in five individuals. We found that parental embryonic mosaicism was a common source of DNMs, detectable in every genome. Pre-PGC events (detected in blood) appeared equally distributed between the paternal and maternal haplotypes, which accounted for 1.9 and 1.8 events per child, respectively, after adjustment for detectability. Post-PGC events in the male germline, defined as variants detected in sperm only, occurred at a rate similar to that of pre-PGC events, at about 2.0 events per child. Sequencing sperm samples therefore appears about twice as sensitive as sequencing blood for identifying variants at risk of recurrence. Overall, our approach based on mosaicism detection yielded an average recurrence risk of 0.27% for paternally phased variants. We compared this estimate with a model based on the actual recurrence rate of variants in an Icelandic population [21], which predicted a recurrence risk of 0.55% for our paternally phased variants (Supplementary Information). A possible explanation for this difference is that our set of de novo variants does not capture all variants at risk of recurrence. In the deCODE study, besides detecting de novo variants by a trio approach as we did, the authors also detected variants by a haplotype-based method in large families, which allows the detection of variants with a high VAF in parents (high-level mosaicism) that trio-based methods would classify as inherited. Indeed, the authors estimated that the trio-based method would miss about half of the variants that actually recurred [21]. Our result therefore represents an estimate of recurrence risk for trio-accessible DNMs only, which remains valuable because clinically relevant DNMs are often detected by this approach.
The risk of recurrence of maternally derived DNMs is more difficult to assess from clonal VAF detection because the germline cells that harbour post-PGC variants are inaccessible. However, post-PGC events detectable in bulk analysis of germ cells are thought to occur very early in embryonic development (“peri-PGC” [23]), in primordial germ cells, prior to sexual differentiation. This shared biology suggests that the absolute count and VAF of mosaic variants in oocytes should be similar to those in sperm cells. Under this assumption, the risk of recurrence for maternally derived variants equals RRpat × α, where RRpat is the risk of recurrence for paternally phased variants and α is the ratio of paternal to maternal counts (Supplementary Information). With this approach and α = 4 in our cohort, we estimate the maternal recurrence risk to be 1.08% and the overall risk of recurrence to be 0.43%. This appears lower than the commonly accepted recurrence risk of 1% for DNMs [8]. Once again, the detection method should be taken into account: our estimate concerns DNMs detected by stringent trio-based rules.
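The arithmetic behind these two figures can be reproduced with a short sketch. The maternal risk follows directly from RRpat × α as stated in the text. The overall risk is computed here as an average weighted by the share of paternal and maternal DNMs, α/(1 + α) and 1/(1 + α); this weighting is an assumption that happens to match the reported 0.43%, and the exact model is the one given in the Supplementary Information. Function names are illustrative.

```python
def maternal_recurrence_risk(rr_pat: float, alpha: float) -> float:
    """Maternal recurrence risk (%) as RRpat x alpha, as stated in the text."""
    return rr_pat * alpha


def overall_recurrence_risk(rr_pat: float, alpha: float) -> float:
    """Overall recurrence risk (%) as a weighted average of both parents.

    Assumption (not spelled out in the main text): DNMs are weighted by
    their parental origin, with a paternal share of alpha / (1 + alpha).
    """
    rr_mat = maternal_recurrence_risk(rr_pat, alpha)
    paternal_share = alpha / (1.0 + alpha)
    return paternal_share * rr_pat + (1.0 - paternal_share) * rr_mat


rr_pat = 0.27  # % recurrence risk for paternally phased variants (from the text)
alpha = 4.0    # paternal/maternal ratio in the cohort (from the text)

rr_mat = maternal_recurrence_risk(rr_pat, alpha)
rr_all = overall_recurrence_risk(rr_pat, alpha)
print(f"maternal: {rr_mat:.2f}%, overall: {rr_all:.2f}%")
# prints "maternal: 1.08%, overall: 0.43%"
```

Under this weighting, paternal and maternal variants contribute equally to the overall risk (0.216 percentage points each), because the higher per-variant maternal risk is offset exactly by the lower number of maternal variants.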
Recent studies on the genome-wide mutability of somatic tissues showed that variants accumulate at a relatively constant rate throughout life in many tissues, correlating strongly with time and only weakly with cell division rate [24]. In the germline, however, the mutation rate does not appear constant over time. Indeed, we found that 6.9% of the assessed de novo variants showed evidence of an early embryonic mutational event, a substantial proportion given the short period in which these variants are thought to arise (days or weeks post fertilization) relative to the duration of a generation over which DNMs can occur. This observation can be linked to a marked hypermutability of the first few cell divisions after the zygote stage, which has recently been detected by multiple approaches [25]. This hypermutability coincides with rapid cell divisions termed “cleavages”, which lack G1 and G2 phases and in which cell cycle checkpoints are suppressed. This particular cellular state may be much more prone to mutations, which would explain the enrichment of mutations in this short period.
In this study, we applied a sensitive deep sequencing method to detect parental mosaicism. Using these results as a gold standard, we can evaluate how well parental WGS VAF alone would detect parental blood mosaicism. Considering only variants with at least one alternate read in parental WGS, performance would have been surprisingly good, with 77% recall and 67% precision (Supplementary Fig. 7). Notably, this criterion would have captured all variants with a VAF > 1%. This suggests that, in the absence of deep sequencing data from parental samples, revisiting the raw WGS data may still be very useful for assessing recurrence risk. However, when only a single alternate read is observed, confirmation by an independent assay remains mandatory. On a technical note, the alternate read count was assessed using samtools mpileup because DeepVariant did not report any alternate read counts in its output VCF, a limitation that prevents the use of DeepVariant for VAF quantification of mosaic variants.
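The comparison between WGS alternate read counts and the deep sequencing gold standard can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `ParentalCall` record, the function name and the toy values are all hypothetical, and the alternate read counts are assumed to have been extracted beforehand (for example from a pileup).

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ParentalCall:
    """One DNM site assessed in a parent (hypothetical record layout)."""
    wgs_alt_reads: int      # alternate reads seen in parental WGS
    deep_seq_mosaic: bool   # gold standard: mosaicism confirmed by deep sequencing


def precision_recall(calls: Iterable[ParentalCall],
                     min_alt_reads: int = 1) -> Tuple[float, float]:
    """Precision and recall of the rule 'WGS alt reads >= min_alt_reads'."""
    calls = list(calls)
    tp = sum(c.wgs_alt_reads >= min_alt_reads and c.deep_seq_mosaic for c in calls)
    fp = sum(c.wgs_alt_reads >= min_alt_reads and not c.deep_seq_mosaic for c in calls)
    fn = sum(c.wgs_alt_reads < min_alt_reads and c.deep_seq_mosaic for c in calls)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Toy data, invented for illustration only (not the study's data).
toy_calls = [
    ParentalCall(wgs_alt_reads=3, deep_seq_mosaic=True),
    ParentalCall(wgs_alt_reads=1, deep_seq_mosaic=True),
    ParentalCall(wgs_alt_reads=1, deep_seq_mosaic=False),
    ParentalCall(wgs_alt_reads=0, deep_seq_mosaic=True),
    ParentalCall(wgs_alt_reads=0, deep_seq_mosaic=False),
]

precision, recall = precision_recall(toy_calls, min_alt_reads=1)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Raising `min_alt_reads` trades recall for precision, which is consistent with the recommendation that a single alternate read be confirmed by an independent assay.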
A major original feature of our study was the assessment of the recurrence risk of genome-wide variants, regardless of their effect on biology or disease risk. Most previous studies have focused on the recurrence risk of individual pathogenic de novo variants [7, 26–28]. In a remarkable example involving 59 de novo variants, the authors applied a general framework consisting of (i) phasing the variants using targeted long-read sequencing and (ii) sequencing multiple parental tissues [7]. In our study, we used long-read genome sequencing only to phase the DNMs called from short-read data, because of the low performance of the Nanopore v9 chemistry in small variant calling. However, recent advances in long-read sequencing technologies have significantly improved small variant calling and now enable highly accurate and efficient identification of de novo variants [29]. The transition from short-read to long-read genome sequencing in the coming years is therefore likely to enable much more systematic phasing of DNMs and thereby benefit genetic counselling for DNM-associated diseases. With long-read-based DNM identification, the pipeline for recurrence risk assessment could be restricted to deep sequencing of paternal sperm for paternally phased variants. Such an approach would provide a precise estimate of recurrence risk for 80% of DNMs and avoid unnecessary invasive prenatal testing in most of these cases. Evidence shows little variation in the VAF of sperm mosaicism over time [22, 30], which supports using VAF as a proxy for the recurrence risk of paternally phased variants. While prenatal diagnosis techniques are improving and non-invasive prenatal testing (NIPT) is becoming accessible for de novo variants [31], anticipating the recurrence risk by sperm analysis before any pregnancy could better suit some families and would have the advantage of being performed only once, as opposed to one NIPT at each pregnancy.
Finally, other criteria should be taken into account when assessing the recurrence risk of DNMs. Some pathogenic variants in specific genes can confer a developmental advantage on either the wild-type or the mutant cell over the other [25], which biases the recurrence risk. For instance, selfish mutations affecting the RAS/MAPK pathway occur almost systematically in the adult paternal germline, and even though these mutations lead to spermatogonial clonality, the overall proportion of mutated cells remains very limited [32]. In line with this, epidemiological observations have revealed a low risk of recurrence for selfish mutations, calling into question the need for prenatal diagnostic testing in subsequent pregnancies after an affected child [33]. In contrast, pathogenic mutations in other genes, such as SCN1A, appear enriched for parental mosaicism and show an elevated recurrence risk [34–42]. Another genomic feature that could potentially inform recurrence risk assessment is the presence of the variant in a mutational cluster (i.e. multiple variants within a small genomic interval, typically 20 kb). Many mutation clusters are thought to derive from age-related changes in the biology of the germline, notably in oocytes [43]. Clustered variants could therefore be indicative of low recurrence risk. Interestingly, none of the 11 clustered variants for which deep sequencing was performed showed evidence of parental mosaicism. Larger studies are needed to assess the correlation between recurrence risk and occurrence in mutation clusters.
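As an illustration of how clustered variants might be flagged, the sketch below groups the DNMs of one child by chromosome and chains consecutive variants separated by at most 20 kb. The chaining rule, the function name and the input layout are assumptions made for this example; the text only defines a cluster as multiple variants within a small genomic interval, typically 20 kb.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_clusters(variants: Iterable[Tuple[str, int]],
                  max_gap: int = 20_000) -> List[Tuple[str, List[int]]]:
    """Return groups of >= 2 DNMs whose consecutive positions are <= max_gap apart.

    `variants` is an iterable of (chromosome, position) pairs from one child.
    """
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in variants:
        by_chrom[chrom].append(pos)

    clusters: List[Tuple[str, List[int]]] = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)          # extend the running cluster
            else:
                if len(current) > 1:
                    clusters.append((chrom, current))
                current = [pos]              # start a new candidate cluster
        if len(current) > 1:
            clusters.append((chrom, current))
    return clusters


# Toy coordinates, invented for illustration only.
toy_dnms = [("chr1", 1_000), ("chr1", 15_000), ("chr1", 500_000),
            ("chr2", 100), ("chr2", 30_000)]
clusters = find_clusters(toy_dnms)
print(clusters)
```

Variants flagged this way could then be compared with the deep sequencing results to test whether clustering is associated with an absence of parental mosaicism.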