SNV/indel hypermutator phenotype in biallelic RAD51C variant - Fanconi anemia

We previously reported a fetus with Fanconi anemia (FA), complementation group O due to compound heterozygous variants involving RAD51C. Interestingly, the trio exome sequencing analysis also detected eight apparent de novo mosaic variants with variant allele fraction (VAF) ranging between 11.5%−37%. Here, using whole genome sequencing and a ‘home-brew’ variant filtering pipeline and DeepMosaic module, we investigated the number and signature of de novo heterozygous and mosaic variants and the rare phenomenon of hypermutation. Eight-hundred-thirty apparent dnSNVs and 21 de novo indels had VAFs below 37.41% and were considered postzygotic somatic mosaic variants. The VAFs showed a bimodal distribution, with one component with an average VAF of 25% (range: 18.7–37.41%) (n=446), representing potential postzygotic first mitotic events, and the other component with an average VAF of 12.5% (range: 9.55–18.69%) (n=384), describing potential second mitotic events. No increased rate of CNV formation was observed. The mutational pattern analysis for somatic single base substitution showed SBS40, SBS5, and SBS3 as the top recognized signatures. SBS3 is a known signature associated with homologous recombination-based DNA damage repair error. Our data demonstrate that biallelic RAD51C variants show evidence for defective genomic DNA damage repair and thereby result in a hypermutator phenotype with the accumulation of postzygotic de novo mutations, at least in the prenatal period. This ‘genome hypermutator phenomenon’ might contribute to the observed hematological manifestations and the predisposition to tumors in patients with FA, and pregnancy loss in general. We propose that other FA groups should be investigated for genome-wide de novo variants.


Introduction
Fanconi anemia (FA) is a rare chromosomal instability syndrome that affects one in every 136,000 births (Mamrak et al. 2017). FA is a genetically and phenotypically heterogeneous disorder, resulting from perturbations in genes involved in DNA repair and cell cycle regulation. When the cell cannot sufficiently repair its genetic information, and genomic integrity is compromised, various clinical manifestations, such as congenital malformations, early progressive bone marrow failure, and predisposition to hematologic malignancies and solid tumors occur (Auerbach 2009;Kolinjivadi et al. 2020;Mehta and Ebens 2021; Moldovan and D'Andrea 2009;Rageul and Kim 2020). Physical abnormalities, present in approximately 75% of affected individuals with FA, include one or more of the following: endocrine dysfunctions leading to short stature, abnormal skin pigmentation, skeletal deformities of the thumb and forearm, microcephaly, and ophthalmic and genitourinary tract anomalies (Auerbach 2009;Ceccaldi et al. 2016;Kee and D'Andrea 2012;Mehta and Ebens 2021). In addition, FA patients are predisposed to various cancers, including acute myeloid leukemia (AML) and squamous cell carcinoma of the head and neck (Kutler et al. 2016;Kutler et al. 2003).
To date, variants in more than 20 genes (FANCA to FANCW) have been found to cause FA. Most are inherited as an autosomal recessive (AR) trait; the exceptions are FANCB, which exhibits X-linked (XL) inheritance, and FANCR/RAD51, variants in which have been associated with an autosomal dominant (AD) rare disease trait (Ameziane et al. 2015;Badra Fajardo et al. 2022;Moreno et al. 2021;Wang et al. 2015). The encoded proteins form an integrated network, known as the FA/BRCA repair pathway (Ceccaldi et al. 2016). The network is active during DNA replication when converging replisomes encounter damage that covalently binds the two DNA strands (interstrand crosslinking; ICL). Sequential recruitment of FA proteins and associated partners onto chromatin unhooks the ICL and coordinates the repair through monoubiquitination and homologous recombination (Badra Fajardo et al. 2022;Moreno et al. 2021;Nalepa and Clapp 2018). Emerging evidence suggests that independent of ICL and homologous recombination (HR) repair, FA proteins also regulate cell-cycle checkpoints and/or promote replication fork remodeling in response to replication stress, redefining the FA pathway as a cardinal mechanism to preserve genome integrity throughout the entire replication process (Badra Fajardo et al. 2022).
Alterations in FA genes have been found to incite genomic instability and contribute to tumorigenesis (Badra Fajardo et al. 2022). Accordingly, cells from FA patients are hypersensitive to ICL-inducing agents such as diepoxybutane (DEB) and mitomycin C (MMC), which cause high levels of chromosomal aberrations, including chromosomal breaks and quadriradial formation. In addition to the identification of pathogenic variants in the FA genes, this characteristic cellular phenotype is still used for objective clinical laboratory diagnosis of FA through a DEB- and MMC-induced chromosome breakage test of lymphocytes (Rageul and Kim 2020).
Previous whole genome sequencing (WGS) and exome sequencing (ES) studies have provided insights into the scale of de novo variants in the normotypical population and as a cause of genetic diseases (Acuna-Hidalgo et al. 2016;Veltman and Brunner 2012).
The mutation rate of single nucleotide variants (SNVs) has been estimated at 1.0-1.8 × 10⁻⁸ variants per base per generation, giving rise to 60-70 de novo variants per genome, with one to two affecting the coding sequence (Campbell and Eichler 2013;Goldmann et al. 2016;Kong et al. 2012;Roach et al. 2010). Locus-specific spontaneous mutation rates for copy-number variants (CNVs) are estimated to be hundreds to thousands of times higher than those of de novo SNVs (dnSNVs), i.e., ~10⁻⁶ to 10⁻⁴ per generation, resulting in 0-1 de novo CNV per genome (Lupski 2007;Turner et al. 2008).
Interestingly, Liu et al. (Liu et al. 2017) and Du et al. (Du et al. 2022) described a novel type of constitutional genome instability with an unusually large number of de novo mutations (DNMs) for multiple de novo CNVs (MdnCNV) occurring during perizygotic mutagenesis; this MdnCNV phenomenon showed evidence for regional SNV hypermutagenesis in a 4 Mb 'window' surrounding the CNV breakpoint junctions consistent with replicative recombination repair involving an error prone polymerase (Kaplanis et al. 2022;Liu et al. 2017). Most MdnCNVs were arranged as large tandem duplications (~1 Mb in size) with microhomology and microhomeology at the breakpoints and dnSNVs in their vicinity. Genetic marker studies revealed the MdnCNV arose in a perizygotic time interval of organismal development, thus affecting all cells of the human body.
In a more recent study, germline hypermutation of dnSNVs was identified in genome-wide studies from 12 individuals out of 21,879 families with rare genetic diseases.
The number of dnSNVs for each individual with hypermutation ranged from 110 to 425, correlating to a 1.7-6.5 fold increase compared with the median number of dnSNVs in the general population. Two of these individuals also had a significantly increased number of de novo insertion/deletions (indels) (Kaplanis et al. 2022). Constitutional new variants have been considered to primarily arise from germline or zygotic events; however, more recent data suggest postzygotic new variants are an under-recognized source of de novo genomic variations (Acuna-Hidalgo et al. 2015;Rahbari et al. 2016). Postzygotic events contribute to the formation of mosaicism, and recent advances in genomic technologies have enhanced our ability to detect and characterize low-level mosaicism (Contini et al. 2015;Doan et al. 2021;Lannoy and Hermans 2020;Uchiyama et al. 2016).
In 2018, we reported a newborn female with an expanded phenotype of Fanconi anemia, complementation group O (FANCO) (Jacquinet et al. 2018). She was diagnosed prenatally with several congenital anomalies: bilateral ventriculomegaly, absence or fenestration of the septum pellucidum and fusion of the fornices anteriorly, thick and echogenic corpus callosum, cleft lip and palate, overlapping fingers, heart anomalies, symmetric fetal growth restriction, and suspected ambiguous genitalia. Of note, the family history was significant for breast cancer in the paternal grandmother and great-grandmother. Chromosomal microarray analysis using the Affymetrix CytoScan HD SNP array performed on DNA from amniotic fluid was normal. Prenatal trio ES on DNA isolated from cultured amniocytes revealed inherited compound heterozygous variant alleles in RAD51C: (NC_00017.10(NM_058216.2)): c.935G>A (p.Arg312Gln) and c.571+5G>A that were interpreted as likely pathogenic. In addition, trio ES analysis detected eight apparent de novo mosaic variants in the fetus with variant allele fractions (VAF) ranging between 11.5% and 37% (Jacquinet et al. 2018). Chromosome breakage studies confirmed the diagnosis of FANCO, while the cleft lip and palate and the lobar holoprosencephaly were considered an expansion of the phenotypic spectrum of FANCO.
The child died soon after birth.
Here, we describe the results of subsequent trio WGS studies in the family, which revealed SNV hypermutagenesis as evidenced by a large number of apparent dnSNVs and indels.

Genomic sequencing
Fetal DNA was extracted from amniotic fluid and parental DNA was extracted from peripheral blood. The prenatal trio ES and WGS were performed on the Illumina HiSeq platform following standard protocols as previously described (Liu et al. 2017;Normand et al. 2018;Yang et al. 2013). The total mean autosomal sequencing read depth-of-coverage in WGS ranged between 42-58x per sample (Supplementary Table 1).

Selection criteria for candidate de novo variants
A custom bioinformatics script was used to detect and filter apparent de novo SNVs or small indels in the trio WGS data (Gambin et al. 2020). We analyzed the VCF file to select variants for which the proband was found to be heterozygous by calculating the VAFs. We have previously shown that more than 95% of apparent de novo autosomal SNVs and X-linked SNVs in females in ES analyses have VAF ranging between 37.41-62.6% (Cao et al. 2019). We have now used more stringent criteria to eliminate genotype calls erroneously classified as heterozygous and removed variants with VAF above 70%, variants with a total depth of coverage below 20x in any sample from the trio, and DNMs overlapping known segmental duplications, centromeres, or Alu repetitive elements. We have included variants (SNVs or indels) with ≥2 alternative reads in the proband and absent in both parents. To further reduce the number of false positives and technical artifacts, we have removed all variants present in the gnomAD v3.1 database [https://gnomad.broadinstitute.org]. For each selected variant, we have retrieved pileup information from the proband and parental BAM files, which provided more precise data on read depth and VAF in these samples.
The analyses of the pileup data were performed using Samtools version 1.13 (Danecek et al. 2021) and the Integrative Genomics Viewer (IGV) software (Robinson et al. 2011;Thorvaldsdottir et al. 2013) with the previously described criteria (Du et al. 2022).
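The selection criteria above can be sketched in a few lines of Python. This is a minimal illustration and not the custom script cited in the text: the `TrioCall` record layout, the field names, and the reading of "absent in both parents" as zero alternative reads are assumptions made here, and the 37.41% boundary between mosaic and heterozygous calls follows the VAF range quoted from Cao et al. (2019).

```python
from dataclasses import dataclass


@dataclass
class TrioCall:
    """Read counts for one variant across the trio (hypothetical record layout)."""
    proband_alt: int
    proband_depth: int
    father_alt: int
    father_depth: int
    mother_alt: int
    mother_depth: int
    in_gnomad: bool = False           # variant present in gnomAD
    in_excluded_region: bool = False  # segmental duplication, centromere, or Alu


MIN_DEPTH = 20       # minimum total depth required in every trio member
MIN_ALT_READS = 2    # minimum alternative reads in the proband
MAX_VAF = 0.70       # calls above this VAF are removed
MOSAIC_VAF = 0.3741  # calls below this VAF are treated as candidate mosaics


def classify(call):
    """Return 'mosaic', 'heterozygous', or None when the call is filtered out."""
    depths = (call.proband_depth, call.father_depth, call.mother_depth)
    if min(depths) < MIN_DEPTH:
        return None
    if call.in_gnomad or call.in_excluded_region:
        return None
    # Assumption: "absent in both parents" means no alternative reads at all.
    if call.father_alt > 0 or call.mother_alt > 0:
        return None
    if call.proband_alt < MIN_ALT_READS:
        return None
    fraction = call.proband_alt / call.proband_depth
    if fraction > MAX_VAF:
        return None
    return "mosaic" if fraction < MOSAIC_VAF else "heterozygous"
```

For example, a call with 12 alternative reads out of 50 in the proband and none in either parent would be labeled as a candidate mosaic variant, whereas the same call with a single alternative read in one parent would be discarded.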
To validate the customized variant filtering, we have used the recently published prediction module DeepMosaic, which combines an image-based visualization for mosaic SNVs with a convolutional neural network-based classification for mosaic variant detection. This pipeline has an increased sensitivity (using HaplotypeCaller with ploidy=50) and a fully automated filtering mechanism (Yang et al. 2023). The de novo variants observed in the trio WGS of the FA patient were compared to the de novo variants in the control trio WGS using both pipelines.

De novo substitution mutational signature pattern analysis
The R package MutationalPatterns (Manders et al. 2022) was used for de novo substitution mutational signature analysis. The tri-nucleotide and pan-nucleotide mutational contexts were extracted and visualized with the 'mut_matrix' and 'plot_96_profile' functions from the MutationalPatterns R package. Known signatures from COSMIC (v3.2) were refitted using the 'backwards' method. The method starts by achieving an optimal reconstruction via 'fit_to_signatures'. The signature with the lowest contribution is then removed and refitting is repeated iteratively. At each iteration, the cosine similarity between the original and reconstructed profiles is calculated.
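The backwards refitting loop described above can be illustrated with a small pure-Python sketch. It is not the MutationalPatterns implementation: the multiplicative-update solver, the function names, and the stopping threshold `max_delta` (the largest drop in cosine similarity tolerated when a signature is removed) are assumptions chosen for illustration, and signatures are assumed to be normalized so that a weight can be read as a contribution.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two mutation-count profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reconstruct(signatures, weights):
    """Weighted sum of signatures, channel by channel."""
    n = len(next(iter(signatures.values())))
    return [sum(weights[k] * sig[i] for k, sig in signatures.items()) for i in range(n)]


def fit_nonnegative(profile, signatures, iterations=2000):
    """Non-negative least squares via multiplicative updates (illustrative solver)."""
    weights = {name: 1.0 for name in signatures}
    for _ in range(iterations):
        current = reconstruct(signatures, weights)
        for name, sig in signatures.items():
            numerator = sum(s * p for s, p in zip(sig, profile))
            denominator = sum(s * c for s, c in zip(sig, current)) + 1e-12
            weights[name] *= numerator / denominator
    return weights


def backwards_refit(profile, signatures, max_delta=0.004):
    """Drop the weakest signature repeatedly until the fit degrades by more than max_delta."""
    active = dict(signatures)
    weights = fit_nonnegative(profile, active)
    similarity = cosine_similarity(profile, reconstruct(active, weights))
    while len(active) > 1:
        weakest = min(active, key=lambda name: weights[name])
        trial = {k: v for k, v in active.items() if k != weakest}
        trial_weights = fit_nonnegative(profile, trial)
        trial_similarity = cosine_similarity(profile, reconstruct(trial, trial_weights))
        if similarity - trial_similarity > max_delta:
            break  # removing this signature costs too much reconstruction accuracy
        active, weights, similarity = trial, trial_weights, trial_similarity
    return weights, similarity
```

With three toy four-channel signatures and a profile built from only two of them, the loop discards the unused signature and stops before removing either of the two that are needed to reconstruct the profile.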

Refitting bimodal VAF distribution
To infer the timing of the de novo variant events within the life cycle, we have evaluated their VAF distribution (density plot). We have used custom R code to identify the boundaries corresponding to the first, second, and third cell divisions by looking for the points of intersection between the different distributions. To this end, we have sampled 10,000 values from theoretical densities of binomial distributions of VAFs corresponding to each cell division using the cbinom R package (https://cran.r-project.org/web/packages/cbinom/cbinom.pdf). Next, we have estimated the boundaries between the consecutive pairs of distributions by identifying the positions closest to the intersection points.
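As a rough cross-check of the boundary estimation, the intersection of two binomial likelihoods can also be written in closed form. The sketch below is a simplification and not the sampling procedure used here: it assumes that both components share the same read depth and carry equal weight, and that a variant arising at postzygotic division d has an expected VAF of 0.5^(d+1) (25% for the first division, 12.5% for the second). The function names are hypothetical.

```python
import math


def expected_vaf(division):
    """Expected VAF of a heterozygous variant arising at the given postzygotic division."""
    return 0.5 ** (division + 1)


def binomial_crossover(p_high, p_low):
    """VAF at which Binomial(n, p_high) and Binomial(n, p_low) are equally likely.

    Setting p_high^k (1 - p_high)^(n - k) equal to p_low^k (1 - p_low)^(n - k)
    and solving for k / n gives a value that does not depend on the depth n.
    """
    a = math.log((1 - p_low) / (1 - p_high))
    b = math.log(p_high / p_low)
    return a / (a + b)


first_second = binomial_crossover(expected_vaf(1), expected_vaf(2))
second_third = binomial_crossover(expected_vaf(2), expected_vaf(3))
print(f"first/second boundary: {first_second:.4f}")
print(f"second/third boundary: {second_third:.4f}")
```

Under these assumptions the boundaries fall at roughly 18.2% and 9.1%, close to but not identical with the empirically estimated boundaries, which were derived from sampled continuous-binomial densities.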

CNV calling and visualization
CNVs were called using Illumina Dragen Bio-IT Platform (v3.4.15). The read depth was calculated with mosdepth (v 0.3.4) (Pedersen and Quinlan 2018) and visualized with the in-house visualization tool VizCNV (https://github.com/BCM-Lupskilab/VizCNV) that allows for normalized read depth plotting of the proband and both parents to help with manual inspection of potential de novo CNVs larger than 3 kb.
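The idea behind normalized read-depth inspection of a trio can be sketched as follows. This is an illustrative simplification and not the VizCNV code: the median normalization, the `tolerance` threshold, and the function names are assumptions. A heterozygous deletion is expected to give a normalized depth near 0.5 and a duplication near 1.5.

```python
from statistics import median


def normalize(depths):
    """Scale per-window read depths so that the sample median equals 1.0."""
    centre = median(depths)
    return [d / centre for d in depths]


def candidate_de_novo_windows(proband, father, mother, tolerance=0.25):
    """Indices of windows where only the proband departs from a normalized depth of 1.0."""
    p, f, m = normalize(proband), normalize(father), normalize(mother)
    flagged = []
    for i, (pv, fv, mv) in enumerate(zip(p, f, m)):
        parents_normal = abs(fv - 1.0) <= tolerance and abs(mv - 1.0) <= tolerance
        if parents_normal and abs(pv - 1.0) > tolerance:
            flagged.append(i)
    return flagged
```

Windows flagged in this way would still require manual inspection, since an inherited CNV with uneven parental coverage can mimic a de novo event.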

De novo variants identification in ES and WGS data
Computational CNV analyses followed by manual read-depth visualization did not reveal any increased rate of CNV formation (Supplementary Figure 1; Supplementary Table 2).
Re-analysis of prenatal trio ES data confirmed eight apparent de novo mosaic variants with VAF ranging between 11.5% and 37% and revealed an additional 15 apparent de novo mosaic variants with VAFs ranging between 9% and 37% (Supplementary Table 3). The novel variants are mainly located in non-coding regions, close to the exon boundaries. Although the number of de novo mosaic variants detected through the exome analysis is not high, the depth-of-coverage is sufficient to consider the VAFs of the mosaic variants reliable.

De novo variant allele frequency distribution
The VAF pattern of apparent dnSNVs (Figure 1) suggests their bimodal distribution, with one component with an average VAF of 25% (ranging between 18.7-37.41%) (n=446), representing potential postzygotic first mitotic events, and the other with an average VAF of 12.5% (ranging between 9.55-18.69%) (n=384), representing potential second mitotic events.
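The assignment of variants to the two components can be expressed as a simple binning step. The sketch below uses the VAF ranges reported above as cut-offs; the function name and the example values are hypothetical.

```python
from collections import Counter


def mitotic_bin(vaf_percent):
    """Assign a VAF (in percent) to the first or second postzygotic mitotic component."""
    if 18.7 <= vaf_percent <= 37.41:
        return "first"
    if 9.55 <= vaf_percent < 18.7:
        return "second"
    return None  # outside both reported ranges


# Hypothetical VAF values, for illustration only.
example_vafs = [25.0, 30.2, 12.5, 10.1, 45.0]
counts = Counter(mitotic_bin(v) for v in example_vafs)
```

In this toy example two values fall in each component and one value (45.0%) falls outside both ranges.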

Mutational signature of the de novo substitutions
Genome-wide distribution of the somatic de novo substitutions shows clusters (genomic distance <50 bp) spreading across multiple chromosomes (Figure 2a). The mutational pattern analyses of the SBS revealed enrichment of C>G, C>A, and C>T variants (Figure 3). The mutational pattern analysis for somatic SBS shows SBS40, SBS5, and SBS3 as the top signatures (Figure 4). SBS40 is a flat signature similar to SBS5, of which the underlying etiology is uncertain. SBS3 is a known signature associated with HR-based DNA damage repair error, often due to BRCA1 or BRCA2 inactivation (Nik-

RAD51 is a RecA-like DNA recombinase that initiates HR upon DNA damage by replacing Replication protein A (RPA) and catalyzing strand transfer between the broken sequence and its undamaged homolog (Boni et al. 2022). RAD51C is a member of the RAD51 paralog family (RAD51B, RAD51C, RAD51D, XRCC2, and XRCC3), which is required for efficient DNA double-strand break (DSB) repair by HR (Chun et al. 2013). Furthermore, RAD51 paralogs have also been shown to play cardinal roles in protecting the replication fork during DNA synthesis (Somyajit K 2015), avoiding unrestrained fork progression, and promoting efficient restart (Berti et al. 2020). Depletion of RAD51 paralog genes in human cell lines has been associated with impaired HR, reduced genome stability, increased DSBs, and growth defects (Boni et al. 2022). The accumulation of replication stress-associated DNA lesions leads to genomic instability and may predispose to carcinogenesis, including in the heterozygous state, e.g., cancers arising mainly in the breast and ovary (Badra Fajardo et al. 2022;Boni et al. 2022;Kottemann and Smogorzewska 2013).
The first pathogenic variant in RAD51C associated with a human disorder was described in three siblings from a consanguineous Pakistani family with clinical features suggestive of FA (Vaz et al. 2010). Functional analyses of the homozygous missense mutation (NM_058216.2:c.773G>A (p.Arg258His)) demonstrated an increase in G2 arrest in response to MMC in patient cultured lymphocytes compared to controls, a decrease in RAD51C focus formation in response to MMC in patient fibroblasts, a modest radiosensitivity, and an increased sensitivity to the topoisomerase I inhibitor, camptothecin (Somyajit et al. 2012;Vaz et al. 2010). Since the report of this family, only the presented individual with FANCO has been reported in the literature, thus the phenotypic spectrum associated with RAD51C mutations remains obscure.
The 'mutator phenotype' was first described in Drosophila and in bacteria (Liu et al. 2017;Miyake 1960;Plough 1941). In the last two decades, it has been appreciated also in cancers (Nicolaides et al. 1998). The variants have been found to be usually driven by an intrinsic source, such as a defective mismatch repair gene (Loeb 2001). Therefore, the mutations are expected to develop over time, and accumulate in a somatic mosaic state (Kilpivaara and Aaltonen 2013). In contrast, the non-cancer constitutional CNV mutator phenotype described by Liu et al. (Liu et al. 2017) and most recently further expanded by Du et al. (Du et al. 2022) (Sopik et al. 2015). Importantly, in the present case, the family history was significant for breast cancer in female family members of the proband's father with the heterozygous splicing variant c.571+5G>A in RAD51C. Interestingly, the same variant was identified in a Chinese individual among 273 BRCA1/2-negative familial breast cancer cases (Pang et al. 2011). Segregation analysis of the paternally inherited splicing variant (c.571+5G>A) was recommended to better counsel the family about the increased risk for breast and ovarian cancer in female carriers. Similar segregation should also be considered for the maternally inherited missense variant, although it was not previously described in familial breast cancer cases. Of note, reproductive options, such as prenatal diagnosis and preimplantation genetic testing, were discussed with the parents due to 25% risk of recurrence of FA.
Finally, we propose that, in addition to the chromosome aneuploidies that are common during early human embryogenesis (Cavazza et al. 2021;Vanneste et al. 2009), the hypermutation phenomenon observed in our patient with FA might represent an under-recognized mechanism responsible for early pregnancy losses.

Conclusions
Our data demonstrate that biallelic RAD51C variants affecting DNA damage repair process result in an SNV hypermutator phenotype, leading to the accumulation of postzygotic de novo variants, at least in the prenatal period. This phenomenon might contribute to the hematological manifestations and the predisposition to cancer in patients with FA.
Investigating this phenomenon may enhance our knowledge about the phenotype of FA and cancer biology. We propose that additional FA groups be analyzed genome-wide for de novo variants to enable better genotype-phenotype correlations. Because most DNA-repair disorders, including FA, are recessive, we suggest that genetic causes of hypermutation are more likely to be found at higher frequencies in populations with increased consanguinity rates. Importantly, the hypermutation phenotype may be an under-ascertained factor responsible for early pregnancy losses.

Statements and Declarations
Funding

Availability of data and materials
The dataset supporting the conclusions of this article is included within the article (and its additional files).

Ethics approval
Our study was approved by Baylor College of Medicine Institutional Review Board (H-46683, H-41191).

Consent to participate
The deceased child's parents signed an informed consent for trio genome analysis as part of the research protocol. This research conformed with the principles of the Declaration of Helsinki.