There are numerous cases of the identical “founder” mutation occurring more than once in the same or different populations. For example, a three-bp deletion in DFNA5 causing familial deafness is a founder mutation in the East Asian population yet is also found in a family of European ancestry with no East Asian background (Booth, Azaiez and Smith 2020). In Lynch syndrome, a missense mutation in MLH1, c.2059C>T (p.Arg687Trp), is a founder mutation in the Swedish population, but the same mutation is seen in families from other countries on different haplotypes, indicating that it arose independently in those cases (von Salome et al. 2017). Another mutation in MLH1, c.589-2A>G, was determined to be a founder mutation in both the US and Italy; however, it arose on different haplotypes in the two countries (Tomsic et al. 2012). In each of these cases, the haplotype on which the mutation was found distinguished a founder mutation from a recurrent event.
We examined approximately 10 Kb of sequence both proximal and distal to the recombination junction on the affected allele in our cohort of 27 CCM2 deletion patients. All sequenced individuals shared the identical SNP haplotype across this entire region, including the minor allele of one SNP, rs7792895. As it is unlikely that the deletion would have occurred 27 times on the identical haplotype, these data are consistent with the hypothesis that the deletion had a single founder. Adding to this argument is the specific location of the recombination, between an AluSx site on the proximal side and an AluSg site on the distal side. Alu/Alu rearrangements are well documented and are the cause of ~0.3% of human genetic diseases (Batzer and Deininger 2002). However, a tool that predicts potential Alu/Alu recombination (Song et al. 2018) does not predict pairing of these two sites, suggesting that this would be an unusual event.
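The haplotype argument above can be made concrete with a minimal sketch. It assumes that, if the deletion were recurrent, each independent event would land on a haplotype background in proportion to that haplotype's population frequency. The function name `prob_same_haplotype` and the frequencies in the loop are hypothetical and for illustration only; the frequency of the shared haplotype is not reported in the text.

```python
def prob_same_haplotype(n_events: int, hap_freq: float) -> float:
    """Probability that n independent recurrent events all fall on the same
    haplotype as the first event, given that haplotype's population frequency.

    Assumption (not from the source): each event hits a haplotype background
    in proportion to its frequency, independently of the other events.
    """
    if n_events < 1:
        raise ValueError("n_events must be at least 1")
    if not 0.0 <= hap_freq <= 1.0:
        raise ValueError("hap_freq must lie between 0 and 1")
    # Condition on the first event; each of the remaining n - 1 events
    # matches its haplotype with probability hap_freq.
    return hap_freq ** (n_events - 1)


# Illustrative frequencies only; these are not measured values.
for freq in (0.05, 0.25, 0.50):
    p = prob_same_haplotype(27, freq)
    print(f"haplotype frequency {freq:.2f}: P(all 27 on same haplotype) = {p:.2e}")
```

Even with a haplotype carried on half of all chromosomes, 27 independent events on the same background would be expected with probability on the order of 10^-8 under this model, which is why a single founder is the more parsimonious explanation.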
Also striking in the group of CCM2 deletion families is their geographic distribution within the US. The majority of the families live in Southern and Midwestern states; none are from the mid-Atlantic or New England. While this appears to be a somewhat limited range, it spans the entire continent. In contrast to other known founder mutations in CCM genes, which are restricted to specific ethnic or geographic backgrounds, such as the common Hispanic mutation in KRIT1, the CCM2 mutation in Ashkenazi Jews, and the Sardinian mutation in KRIT1 (Sahoo et al. 1999; Gallione et al. 2011; Cau et al. 2009), this CCM2 deletion has spread through the population of the United States over an expansive geographic range. To date, this deletion has been reported only in the US and not in other countries, suggesting that it is a US founder mutation.
Intriguingly, most of the families also report long family histories of living in the continental US, some dating back to colonial times in the 1600s. Extensive genealogical research, sparked by CCM2 deletion patients discovering shared ancestral family names via social media, uncovered five large families with founders in the 1600s to 1800s in southern US states. At least two members from distant arms of each of these families were included in the sequencing cohort. The most closely related were 3rd cousins (Families R/H) and the most distantly related were 5th cousins once removed (Family F/W) (Fig. 2). At these degrees of relatedness, between 0.05% and 0.78% of the genome, or 1.5 to 53 Mb in total, would be predicted to be shared (Browning and Browning 2012) (isogg.org/wiki/Wiki). It is perhaps not surprising, then, that two distant cousins within a family would share a 97 Kb SNP haplotype (10 Kb on either side of a 77 Kb deletion). What is most notable is that this identical SNP haplotype is shared among all sequenced members of this CCM2 patient cohort. Through genealogical research, many of these individuals have now been connected to one of these five multi-generational families, but the same haplotype is also shared with individuals from smaller, unconnected families. Thus, the genetic and genealogical data combined support the hypothesis that this mutation arose once on this haplotype and that all of these people are related to a single, common founder. We may never be able to connect all of these individuals and families using genealogical tools, owing to the paucity and fragmentary nature of historical records in the United States. We are, however, able to connect these individuals genetically, based on their shared haplotype across this mutation. This combination of genetics and genealogy allows us to observe this founder mutation as it spread through the population across the country over at least the last four centuries.
We also note that an entire gene, NACAD, is included in the deletion, which was not noted previously (Liquori et al. 2007). Our preliminary comparison of patients carrying the large NACAD-including deletion with those carrying point mutations affecting only CCM2 did not uncover any phenotypic differences between the two groups. More work will be needed to determine whether any clinical phenotypes are affected by NACAD deletion, especially as part of a contiguous gene deletion with CCM2.