Discovery of BRCA1/BRCA2 Founder Variants by Haplotype Analysis

A significant number of hereditary breast and ovarian cancers are caused by germline variants, mostly in the BRCA1/BRCA2 genes. Because genetic predisposition varies by ethnicity, several studies have reported founder variants of the BRCA1/BRCA2 genes. Such founder variants have been reported primarily on the basis of their population frequencies. We reviewed the variant data for the BRCA1 and BRCA2 genes from January 2012 to March 2019 at Samsung Medical Center, Seoul, Korea. Among the cases with pathogenic variants (PVs) or likely pathogenic variants (LPVs), we defined recurrent variants as those found in more than five unrelated patients. Using single nucleotide polymorphisms, we analyzed patient haplotypes. There were 14 recurrent variants in the BRCA1 gene and seven in the BRCA2 gene. Of note, three variants in each gene were primarily detected in Korean populations. Among them, the BRCA1 c.5339T>C variant had a long block of 74.5 kb, and the BRCA2 c.1399A>T variant had a long block of 35.5 kb. We suggest that BRCA1 c.5339T>C and BRCA2 c.1399A>T are founder variants of the Korean population. These two recurrent variants were ethnicity-prevalent, being found primarily in Korean populations, and their linkage disequilibrium blocks were longer than those of the other variants.


Background
Female breast cancer has been the second most common malignancy in Korean women since 2004, ranking behind only thyroid cancer [1]. Among Korean women, the disease accounted for 15,942 new cases and 1,999 deaths in 2011 [1]. Ovarian cancer is the second most commonly diagnosed gynecological cancer worldwide and the leading cause of cancer death in females in western countries [2]. Annually, it accounts for an estimated 239,000 new cases and 152,000 deaths worldwide [3]. Two major susceptibility genes, BRCA1 and BRCA2, were identified in 1994 [4] and 1995 [5], respectively, and the importance of these genes has been emphasized. Many studies have reported the distributions of BRCA1/2 pathogenic variants (PVs) and likely pathogenic variants (LPVs) in breast and ovarian cancers. Women with germline BRCA1 and BRCA2 PV/LPVs have a high risk of developing breast, ovarian, and pancreatic cancer throughout their lifetime [6][7][8][9][10].
Several studies have investigated founder variants in the BRCA1/2 genes, and several well-known 'founder variants' have been suggested in specific ethnicities [11][12][13][14][15][16]. These reports suggest that ethnicity is relevant to the founder variants associated with hereditary breast and ovarian cancers. Specific ethnic groups harbor common, recurrent variants at a higher frequency than the general population because of founder effects that arise from small founder population sizes, isolation, and rapid population expansion [17]. The reduced genetic heterogeneity in these founder populations makes it easier to identify disease-associated genes and causal variants in cancer families and in patients without a family history of cancer [18]. There are no standard rules for defining founder variants, and most analyses and reports have been based on prevalence alone.
We detected several recurrent PV/LPVs in Korean patients and analyzed their haplotypes using single nucleotide polymorphisms (SNPs). Comparing haplotypes between unrelated patients and the general population can distinguish whether high-frequency alleles derive from an older or a more recent single mutational event and whether they have arisen independently more than once [19].

Patient Characteristics.
Of the 308 BRCA1-positive patients, 173 had breast cancer (56.2%), 115 had ovarian/peritoneal cancer (37.3%), and 14 (4.5%), whose BRCA1/BRCA2 gene tests were requested by other hospitals, had cancer of unknown origin. Two patients had dual primary cancers (breast/ovary and breast/peritoneal cancer). Of the 222 BRCA2-positive patients, 168 had breast cancer (75.7%), 39 had ovarian/peritoneal cancer (17.6%), and eight had cancer of unknown origin (3.6%). Breast cancer was the most common, followed by ovarian and fallopian tube cancer. Notably, the proportion of breast cancer in BRCA2-positive patients (76.6%) was higher than that in BRCA1-positive patients (56.2%) (p<0.0001). Likewise, gynecologic cancers (ovarian, fallopian tube, and peritoneal cancer) were more common in BRCA1-positive patients than in BRCA2-positive patients (37.3% vs. 17.6%, respectively; p<0.0001). Family histories were dominant contributing factors for breast cancer, especially in the BRCA2-positive group. The mean ages were 46 (range 21-77) years for BRCA1-positive patients and 48 (range 27-78) years for BRCA2-positive patients (Table 2).
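The comparison of cancer-site proportions between the two groups can be checked with a short calculation. The following is a minimal sketch, assuming a Pearson chi-square test on a 2x2 table without continuity correction; the text does not state which test was actually used, and the function name `chi_square_2x2` is hypothetical. The counts are those reported above.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value). This is an illustrative choice of test,
    not necessarily the one used in the study.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Breast cancer: 173 of 308 BRCA1-positive vs. 168 of 222 BRCA2-positive patients.
breast_chi2, breast_p = chi_square_2x2(173, 308 - 173, 168, 222 - 168)

# Ovarian/peritoneal cancer: 115 of 308 BRCA1-positive vs. 39 of 222 BRCA2-positive.
gyn_chi2, gyn_p = chi_square_2x2(115, 308 - 115, 39, 222 - 39)

print(f"breast: chi2={breast_chi2:.2f}, p={breast_p:.2e}")
print(f"gynecologic: chi2={gyn_chi2:.2f}, p={gyn_p:.2e}")
```

Under this assumed test, both comparisons give p values below 0.0001, in line with the values reported above.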
Variant Type Classification.
BRCA1 and BRCA2 genes were analyzed for all coding exons in all patients. We classified the detected pathogenic and likely pathogenic variants as nonsense, frameshift, missense, synonymous, intronic, or amino acid deletion variants. In the BRCA1 gene, there were 84 total variants; frameshift variants were the most common (39/84, 46.4%), followed by nonsense variants (30/84, 35.7%). More than 80% of variants in the BRCA1 gene conferred a loss of function. There were 81 total variants in the BRCA2 gene, and 54 (66.7%) were frameshift variants. We defined recurrent variants as variants that were found in more than five unrelated cases. There were 14 recurrent variants of the BRCA1 gene. Among them, seven were frameshift, four were nonsense, two were intronic, and the remaining one was a missense variant. In the BRCA2 gene, there were seven recurrent variants. Among them, four were frameshift variants, and the other three were nonsense variants (Figure 1).
Detected Recurrent Variants of BRCA1/2 and Classified Groups.
Haplotype analysis of Patients with a BRCA1 gene variant.
We defined the linkage block as the block that harbored the detected variant. Even if there was a significant block that did not harbor the variant, we did not count it as a haplotype block. If the analysis revealed no absolute block, we assumed the block size was shorter than the distance between the two SNPs. There were 14 variants of BRCA1 detected in more than five people. We compared them against a sample of the normal Korean population (n=397) through haplotype and SNP chip analysis. In group A, the BRCA1 variant c.390C>A had an 8.9 kb haplotype block, c.3627dup had a 0.8 kb haplotype block, and c.5467+1G>A had a 1.3 kb haplotype block. The c.5030_5033del and c.5080G>T variants had haplotype blocks of the same size, 4.0 kb. These two variants are located close together, so nearby SNPs were linked in both variants. For BRCA1 c.2433del, which was detected in eight patients, we tried to find the linkage block; the start of the block could be defined, but the end could not be found because the available SNPs were lopsided. For the c.302-2A>C and c.5445G>A variants, no remaining specimen was available for analysis. In group B, the BRCA1 c.5496_5506delinsA variant had a 1.3 kb linkage block, and c.3442del had a 0.4 kb linkage block. There were only two patients with c.5470_5477del, so we could not analyze the haplotype block of this variant because of the small number. In group C, the linkage block of c.1831del, which was found in six patients, was 23.0 kb long. The variant c.922_924delinsT showed a 2.9 kb block. For the c.5339T>C variant, the linkage block was very long, 74.5 kb. This variant was detected in 20 unrelated patients, of whom we could analyze eight samples. The most common block size among the eight analyzed was 74.5 kb. The linkage block that covered more than 50% of patients (the major block) was 398.8 kb, which exceeded the size of the BRCA1 gene. This variant has been detected and reported mostly in the Korean population. The representative variants of each group are summarized in Figure 2.
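The block definition used in this section (only the block that harbors the variant is counted, and the distance between the two flanking SNPs serves as an upper bound when no block is found) can be illustrated with a small sketch. The function names, the boolean "linked" flags, and the positions below are all hypothetical; how linkage was called for each SNP is not specified here.

```python
def block_harboring_variant(snp_positions, linked, variant_pos):
    """Return (start, end) of the contiguous run of linked SNPs spanning variant_pos.

    snp_positions: genomic coordinates of the genotyped SNPs.
    linked: parallel booleans, True where a SNP is in linkage with the variant.
    Returns None when no run of linked SNPs spans the variant; blocks that
    do not harbor the variant are ignored.
    """
    runs, current = [], []
    for pos, is_linked in sorted(zip(snp_positions, linked)):
        if is_linked:
            current.append(pos)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    for run in runs:
        if run[0] <= variant_pos <= run[-1]:
            return run[0], run[-1]
    return None


def flanking_gap(snp_positions, variant_pos):
    """Distance between the nearest SNPs on each side of the variant.

    Used as an upper bound on block size when no block is found.
    Returns None if the variant has no SNP on one side.
    """
    left = [p for p in snp_positions if p < variant_pos]
    right = [p for p in snp_positions if p > variant_pos]
    if not left or not right:
        return None
    return min(right) - max(left)


# Hypothetical coordinates (bp) and linkage calls.
positions = [100, 200, 300, 400, 500, 600]
flags = [True, True, False, True, True, True]

block = block_harboring_variant(positions, flags, variant_pos=450)
if block is not None:
    print(f"block size: {(block[1] - block[0]) / 1000:.1f} kb")
```

In this toy input, the run 100-200 is a linked block but does not harbor a variant at position 450, so only the run 400-600 is reported.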

Haplotype analysis of Patients with BRCA2 gene variants.
There were seven recurrent variants in the BRCA2 gene. In group A, there were two variants, c.7480C>T and c.3744_3747del, which were found in all ethnicities. The BRCA2 c.7480C>T variant was the most frequently detected variant in this study and has also been detected in other studies [22]. Forty-five patients with this variant were analyzed with 24 SNPs and compared to a sample of the normal Korean population (n=397). However, there were no meaningful blocks or linkages because of the long distances between neighboring SNPs. Therefore, we could infer that the block size was less than 6.0 kb based on the SNPs around the variant. After analyzing 14 patients using the Korea Biobank Array, we concluded that the common block of this variant was 4.7 kb long. Haplotyping for the c.3744_3747del variant showed a common block size of 12.3 kb in 17 patients. In particular, the SNP at position 32929387 (rs169547) was predominantly T in patients (84.6%), and the odds ratio (OR) was extremely high, 7,945 (95% confidence interval [CI], 415.14 to 152053.25; p value, 0.0001). In group B, which comprised variants primarily found in Asia, the c.5576_5579del variant was analyzed in 12 patients and showed a most common block of 4.0 kb. It could be expanded to 123.1 kb to cover more than 50% of patients. This variant has been reported in all ethnicities [47][48][49][55] but is usually detected in the Japanese population and in people with Japanese ancestry. In this context, this variant was suggested to be a Japanese founder variant [55]. The BRCA2 variant c.10150C>T could be analyzed neither by haplotyping nor by SNP chip analysis because of its terminal location and a lack of specimens.
In group C, the c.1399A>T variant, which has been reported mostly in the Korean population, was detected in 26 unrelated patients in this study. Compared to a sample of the normal Korean population (n = 1,099), there was a highly suspicious haplotype, over 21.0 kb in size, harboring c.1399A>T. The OR for this linkage block was 273.85 (95% CI, 37.4721 to 2001.3852; p value <0.0001). After Korea Biobank Array analysis, the common block of this variant was defined as 35.5 kb, and the major block was 144.5 kb long. It appears that the original block was the longest and that recombination generated the other, shorter blocks. The BRCA2 c.3018del variant, which was found in six patients, had a 3.5 kb block, and the c.6724_6725del variant, found in seven patients, had a long block of 21.0 kb.
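The odds ratios and confidence intervals reported in this section can be illustrated with a short sketch. It assumes the standard log (Woolf) method for the interval and a 0.5 continuity correction when any cell is zero; the text does not state which method was used, the function name is hypothetical, and the counts in the example are invented for illustration only.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a, b: patients with / without the haplotype or allele.
    c, d: controls with / without the haplotype or allele.
    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction).
    Returns (odds_ratio, lower, upper).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_ci(20, 10, 5, 40)
print(f"OR = {or_:.2f} (95% CI, {lo:.2f} to {hi:.2f})")
```

Very wide intervals such as those reported above typically arise when one cell of the table is very small or zero, which is why the sketch includes the continuity correction.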
Characteristics of BRCA1 and BRCA2 Korean founder variants.
The sites of disease in patients with c.5339T>C in the BRCA1 gene were similar to those in other BRCA1-positive patients for both breast cancer (p=0.7213) and gynecologic cancer (p=0.0986) (Table 4). However, three patients (25.0%) with c.5339T>C in the BRCA1 gene had bilateral breast cancer, suggesting a higher risk than in other BRCA1-positive breast cancer patients (1.9%) (p<0.0001). Moreover, the hormone receptor status of breast cancer in those with c.5339T>C differed from that in other BRCA1-positive patients. BRCA1 germline variants are well known to be associated with negative hormone receptor status. Of those with the BRCA1 Korean founder variant, ten patients showed positive and four showed negative hormone receptor status. Twenty-two patients with c.1399A>T in the BRCA2 gene had breast cancer, and two had gynecologic cancer. Seven patients had both breast and gynecologic cancer, but none of the patients with the BRCA2 founder variant had both cancers; this difference was not statistically significant. The hormone receptor status in those with c.1399A>T did not differ from that in other BRCA2-positive patients. Patients with the founder variant in the BRCA2 gene more often had a family history of gynecologic cancer (p=0.0055).

Discussion
This is the first report to define founder variants on the basis of both frequency and haplotyping. We described the results of BRCA1 and BRCA2 gene analysis of more than 5,000 cases at a single center. Detected variants were classified based on the 2015 ACMG/AMP guideline. We selected the variants detected in more than five unrelated patients and defined them as recurrent variants. There were 14 recurrent variants in the BRCA1 gene and seven recurrent variants in the BRCA2 gene. There were 84 total variants in the BRCA1 gene among 308 patients, and the 14 recurrent variants covered 64.3% (198/308) of patients. In the BRCA2 gene, seven recurrent variants, from 80 total variants, covered 51.8% (116/222) of patients. We reviewed several Korean reports on BRCA1/BRCA2 variants, counted the shared variants, and merged the results to define recurrent variants [23][26][52][53]. In the aforementioned studies, five recurrent variants were found in the BRCA1 gene and six in the BRCA2 gene, each in more than three unrelated patients. These recurrent variants represented 43.5% (27/62) of all BRCA1 gene variants and 40.9% (38/93) of the BRCA2 gene variants.
It is important to note that the well-known Ashkenazi-Jewish and Irish founder variants were not observed in our population. Many reports have shown that founder variants differ by ethnicity [11][12][13][16], but there have not been many reports on BRCA1/2 founder variants in Asian populations. The BRCA1 and BRCA2 genes have many variants; 3,517 BRCA1 variants and 3,902 BRCA2 variants were reported in HGMD (http://portal.biocase-international.com). However, the database was mostly composed of Caucasian patients. One report pointed out that 38% of variants detected in the Chinese population did not overlap with the BIC database [32]. We considered previous results pertinent to the Asian population [32][56][57][58], and recurrent variants were generally different, even between Chinese and Japanese populations. As such, knowledge of recurrent and founder variants could guide the steps to understanding a specific population.
A variant prevalent in a specific ethnicity, but not in others, has been suggested to be a founder variant. However, other conditions, such as pseudogenes, should be considered. To determine a probable founder status for the recurrent variants, we applied two rules. The first rule was that the variant was recurrent in a specific population. This rule was used in several other reports suggesting founder variants in specific ethnicities. The second rule concerned the size of the linkage block. Analyzing patient haplotypes with SNPs, we found that c.5339T>C in the BRCA1 gene was detected in 20 unrelated patients and that its linkage block was as long as 74.5 kb, with no embedded recombinants. In the BRCA2 gene, c.1399A>T was detected in 26 unrelated patients, and its haplotype formed a 35.5 kb block. Moreover, judging from the length of the linkage blocks, these variants likely arose in a recent event. Several studies have attempted to estimate the age of genetic variants [59][60][61][62]. This can be done by considering the linkage disequilibrium (LD) at multiple genetic markers. The decay of LD due to recombination can be used to date the age of the variant, or the time of its introduction, in a specific population. On this basis, if one linkage block is longer than the others, we can assume that the variant arose more recently. Thus, we concluded that c.5339T>C and c.1399A>T arose through founder effects. We classified variants into three groups, A to C. Group A was defined as variants found in all ethnicities, group B contained variants found in Asia, and group C was composed of Korea-specific variants. The block sizes in group A were approximately 0.8-8.9 kb for the BRCA1 gene and 4.7-12.3 kb for the BRCA2 gene. In contrast, the block sizes for BRCA1 and BRCA2 in group C were 2.9-74.5 kb and 3.5-35.5 kb, respectively. In this context, we could infer that the variants of group A occurred in very old common ancestors, while the variants of group C occurred recently.
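The reasoning that LD decay can date a variant can be made concrete with the textbook relation D_t = D_0 (1 - theta)^t, where theta is the recombination fraction between the variant and a marker and t is the number of generations. The following is a minimal sketch of that relation only; no variant ages were calculated in this study, and the function name and example values are hypothetical.

```python
import math


def generations_from_ld_decay(ld_ratio, theta):
    """Solve D_t / D_0 = (1 - theta) ** t for t.

    ld_ratio: remaining fraction of the initial LD (D_t / D_0), in (0, 1].
    theta: recombination fraction per generation between variant and marker.
    Returns the number of generations t.
    """
    if not 0 < ld_ratio <= 1:
        raise ValueError("ld_ratio must be in (0, 1]")
    if not 0 < theta < 1:
        raise ValueError("theta must be in (0, 1)")
    return math.log(ld_ratio) / math.log(1 - theta)


# Hypothetical values: half of the initial LD remains at a marker
# with recombination fraction 0.001 from the variant.
t = generations_from_ld_decay(0.5, 0.001)
print(f"about {t:.0f} generations")

# More residual LD at the same marker implies fewer generations (a younger variant).
assert generations_from_ld_decay(0.9, 0.001) < t
```

The same logic underlies the qualitative argument above: for a given recombination rate, a longer intact block around a variant means less decay and therefore a more recent origin.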
We searched for the recurrent variants in other ethnicities and found that the variants in group A were present in all ethnicities, whereas the group C variants were very rare in other ethnic groups [20][21][24][25][28][63]. The prevalence of founder variants in other ethnic groups is much higher. For example, the c.5266dup (5382insC) and c.68_69del (185delAG) BRCA1 variants and the c.5946del (6174delT) BRCA2 variant have a combined prevalence of 2-3% in U.S. Ashkenazi Jews [64][65][66]. In this study, recurrent variants covered more than 50% of the detected PV/LPVs. The BRCA1 c.5339T>C variant accounted for 6.5% (20/308) of all BRCA1-positive patients and 10.1% (20/198) of recurrent variants in the BRCA1 gene. The BRCA2 c.1399A>T variant accounted for 10.8% (24/222) of all BRCA2-positive patients and 20.7% (24/116) of all recurrent variants in the BRCA2 gene.
The BRCA1 variant c.922_924delinsT, classified into group C, had a short block of 2.9 kb. This variant has been found in other ethnicities, but we did not classify it accordingly because of its very low frequency there. Moreover, it is an indel variant, a type that is less likely to arise than other variant types. The c.1831del BRCA1 variant had a long block (23.0 kb); however, we concluded that the number of patients detected with c.1831del was not sufficient to distinguish it as a founder variant. Variant c.6724_6725del in BRCA2 was detected in only seven patients, and c.3018del, detected in six patients, had a short block.
In conclusion, our findings indicate that the detected variants of both BRCA1 and BRCA2 genes may be ethnicity-specific and, in particular, that founder variants could account for a substantial proportion of hereditary breast or ovarian cancer.

Patients.
From January 2012 to March 2019 at the Samsung Medical Center in Seoul, Korea, 5,090 BRCA1 and 5,093 BRCA2 gene tests were performed. There were 308 BRCA1 (6.05%) and 222 BRCA2 (4.36%) cases with PVs or LPVs. The included patients were diagnosed with breast, ovarian, fallopian tube, or primary peritoneal cancer, and genetic tests were requested by the general surgery, obstetrics, gynecology, or hemato-oncology departments of Samsung Medical Center and by other hospitals. Following full review, the Samsung Medical Center Institutional Review Board waived the requirement for informed consent from patients.
Sanger Sequencing and Targeted Exon Sequencing.
From January 2012 to August 2016, we performed Sanger sequencing for all samples. Genomic DNA was extracted and purified from peripheral blood leukocytes using a Wizard Genomic DNA Purification kit according to the manufacturer's instructions (Promega, Madison, WI, USA). All BRCA1/BRCA2 exons and their intrinsic flanking sequences were amplified by polymerase chain reaction (PCR). The amplified products were directly sequenced and compared with reference sequences using Sequencher software (Gene Codes Co., Ann Arbor, MI, USA).
From September 2016 to March 2019, we performed all exon sequencing with the Ion Torrent S5 XL sequencer and Oncomine™ (Thermo Fisher Scientific, Waltham, MA, USA). Libraries were prepared using an Ion Chef System (Thermo Fisher Scientific) according to the manufacturer's instructions. Barcoded libraries were generated from 10 ng of DNA per sample using an Ion AmpliSeq Chef Solutions DL8 Kit (Thermo Fisher Scientific) and an Oncomine™ BRCA Research Assay (Thermo Fisher Scientific). Two premixed pools of 265 primer pairs were used to generate the sequencing libraries. Clonal amplification of the libraries was carried out by emulsion PCR using an Ion AmpliSeq IC 200 Kit (Thermo Fisher Scientific). The prepared libraries were then sequenced with an Ion S5 XL Sequencer using an Ion 520 Chip and an Ion 520 kit-Chef Kit (Thermo Fisher Scientific). When PVs and LPVs were identified by next-generation sequencing (NGS), they were confirmed by Sanger sequencing.

Interpretation of Variants.
Sequences were compared with the BRCA1 (NM_007294.3) and BRCA2 (NM_000059.3) reference sequences for variant detection. Results were interpreted and reported following the American College of Medical Genetics and Genomics and the Association for Molecular Pathology 2015 guidelines (ACMG-AMP 2015) [67].

Selection Criteria.
We defined recurrent variants as the variants found in more than five unrelated patients. In other studies, recurrent variants were defined as the variants found in a minimum of four to eight patients, depending on the study's sample size [20][22][57][68]. Among the detected PV/LPVs, we selected the recurrent variants. There were 14 variants in the BRCA1 gene and seven variants in the BRCA2 gene (Table 1).
Defined Groups based on Literature Review.
To find ethnicity-specific variants, we searched the literature for the identified variants, mostly using PUBMED (https://pubmed.ncbi.nlm.nih.gov) and Human Gene Mutation Database (HGMD) reports. Based on this search, we classified variants into three groups, A to C: several variants were found in all ethnicities, and some were found mostly in Asia or Korea. Group A was defined as variants frequently found throughout all ethnicities. Variants found in Asia, especially Northeast Asia, were classified into group B. Lastly, group C included variants mostly found in Korea.
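The selection rule can be expressed as a short sketch: each variant is counted once per unrelated family, and those seen in more than five families are kept. The record layout, the `family_id` field, and the function name are hypothetical; the threshold parameter reflects the "more than five" criterion stated above.

```python
from collections import defaultdict


def recurrent_variants(records, more_than=5):
    """Return {variant: number_of_unrelated_families} for recurrent variants.

    records: iterable of (family_id, variant) pairs; relatives share a
    family_id so that each family is counted only once per variant.
    A variant is recurrent when it is found in more than `more_than` families.
    """
    families_by_variant = defaultdict(set)
    for family_id, variant in records:
        families_by_variant[variant].add(family_id)
    return {
        variant: len(families)
        for variant, families in families_by_variant.items()
        if len(families) > more_than
    }


# Hypothetical records: six unrelated families carry the first variant
# (one family tested twice), five carry the second.
records = [(f"F{i}", "c.5339T>C") for i in range(6)]
records.append(("F0", "c.5339T>C"))  # relative from an already counted family
records += [(f"G{i}", "c.1831del") for i in range(5)]

print(recurrent_variants(records))
```

Counting by family rather than by sample avoids inflating the frequency of a variant when several relatives from one pedigree are tested.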

SNP Chip Analysis.
We analyzed several variants with the Korea Biobank Array (referred to as KoreanChip), which was optimized for the Korean population and has demonstrated findings in genome-wide association studies of blood biochemical traits [69]. The content of the Korea Biobank Array was derived from sequencing data of 2,576 Korean genetic samples, 397 by whole genome sequencing and 2,179 by whole exome sequencing [70]. We requested SNP genotype data from DNA Link (Seoul, Korea), which were analyzed with the Axiom® Korean Biobank Array 1.0 (Thermo Fisher Scientific). In our study, we analyzed each variant with the 65 SNPs of the BRCA1 gene. DNA for genotyping was extracted and purified from peripheral blood leukocytes using a Wizard Genomic DNA Purification kit according to the manufacturer's instructions (Promega).

Computational haplotype phasing.
Haplotypes were estimated using the statistical software package PHASE version 2.1.1 [71][72][73], a program based on a Bayesian statistical method that uses coalescent-based models to infer phase at loci from unphased genotype data for samples of unrelated individuals [71], though extensions to related individuals are possible [74]. The algorithm uses a flexible model for the decay of linkage disequilibrium with distance and explicitly incorporates an assumption about recombination rate variation. The individual haplotypes can be estimated from the posterior distribution by choosing the most likely haplotype reconstruction for each individual. Computational haplotype analysis was carried out in all unrelated carriers with recurrent variants. To analyze haplotypes, we used the BRCA1/BRCA2 sequences with variable SNPs. We used the reference population from the Korean reference genome (KRG) database (http://coda.nih.go.kr/coda/KRGDB/index), which included 397 individuals. Unrelated individuals can be phased by considering sets of common haplotypes that can explain the observed genotype data. The larger the number of unrelated patients, the better the expected estimation. All methods were performed in accordance with the relevant guidelines and regulations.
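The idea that unrelated individuals can be phased by finding common haplotypes that explain their genotypes can be illustrated with a toy sketch. This is not the Bayesian coalescent algorithm implemented in PHASE; it only enumerates the haplotype pairs compatible with one unphased genotype and picks the pair with the highest product of population haplotype frequencies. The genotype coding, function names, and frequencies are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of alternate-allele counts per SNP (0, 1, or 2).
    Each haplotype is a tuple of 0/1 alleles. Heterozygous sites (count 1)
    are the only ones whose phase is ambiguous.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
    pairs = set()
    for alleles in product((0, 1), repeat=len(het_sites)):
        h1, h2 = base[:], base[:]
        for site, allele in zip(het_sites, alleles):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


def most_likely_pair(genotype, haplotype_freqs):
    """Pick the compatible pair with the highest product of haplotype frequencies."""
    def score(pair):
        h1, h2 = pair
        return haplotype_freqs.get(h1, 0.0) * haplotype_freqs.get(h2, 0.0)
    return max(compatible_pairs(genotype), key=score)


# Hypothetical population haplotype frequencies over three SNPs.
freqs = {(0, 0, 0): 0.6, (1, 1, 0): 0.3, (1, 0, 0): 0.05, (0, 1, 0): 0.05}

# A double heterozygote has two possible phasings; common haplotypes decide.
print(most_likely_pair((1, 1, 0), freqs))
```

With more unrelated individuals, haplotype frequencies are estimated more reliably, which is consistent with the expectation stated above that a larger sample gives a better reconstruction.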

Ethics statement
The research was approved by the Institutional Review Board of Samsung Medical Center (IRB No. 2019-10-013).
Tables
Table 1 Demographics of the patients who were tested for BRCA1 and BRCA2 gene haplotypes.

Figures

Figure 2