Whole exome sequencing reveals a combination of rare high and low penetrance variants that correlates with familial breast cancer relative risk

Background: Genetic risk factors for breast cancer (BC) are highly heterogeneous and complex. They vary according to the familial relative risk, the age at diagnosis of the index case and the age of the affected relatives. Objectives: We aimed to identify simultaneously all rare pathogenic and common variants in unrelated BC cases with different familial relative risks for breast cancer and to evaluate the contribution of these variants to genetic susceptibility to breast cancer. Patients and Methods: All frequent BRCA mutations previously identified in Tunisia were excluded by Sanger sequencing in 42 affected women from high-risk families, each with at least three cancer-affected relatives. Two unrelated cases with different family histories (in terms of the number of affected first-degree relatives and young age at onset) were selected for whole exome sequencing. The first family comprises three sisters, F1.1, F1.2 and F1.3, diagnosed at 46, 50 and 32 years of age, respectively. The second has only two breast cancer cases, F2.2 and F2.4, diagnosed late, at 61 and 70 years of age, respectively, as well as five other members affected by different types of cancer. Selected high-risk variants were confirmed, and segregation analysis was performed, by Sanger sequencing. Results and discussion: In case F1.1, we identified a pathogenic frameshift loss-of-function variant in BRCA2, p.Val1283Lysfs. In F2.2, we identified a rare pathogenic variant in OGG1, p.Arg46Gln, that co-segregates with a rare nonsense variant in BRCA2, p.K3326X, only in the breast cancer-affected cases. Moreover, patient F2.2 carries 9 other common low-penetrance variants at different loci that are known to confer independently minor, but cumulatively significant, increases in breast cancer risk. Conclusion: The family history and young age at onset of patient F1.1 correlate with the presence of a rare high-penetrance variant (p.Val1283Lysfs) in the BRCA2 gene.
However, the late age at onset and the less severe phenotype of patient F2.2 are probably the consequence of the pathogenic variant p.Arg46Gln in the OGG1 gene, which co-segregates with the BRCA2 nonsense variant p.K3326X.


Background
Cancer is one of the leading causes of death worldwide. In 2012, there were 6.7 million new cancer cases and 3.5 million cancer deaths among women worldwide, with 56% of the cases and 64% of the deaths occurring in less developed countries. As a result of lifestyle changes and population aging, these numbers are expected to rise to 9.9 million cases and 5.5 million deaths among women annually by 2030 [1].
Breast cancer (BC) is the most frequently diagnosed cancer worldwide, especially in economically developed countries. In Tunisia, BC is a major public health problem, with at least 2300 new cases per year. Many studies suggest that BC is more aggressive in Tunisia than in Western countries, with a notably large proportion of young patients [2].
Both non-genetic and genetic factors are involved in the etiology of breast cancer. About 15% of cases have a family history of the disease, which is the strongest risk factor for mutation carriers. In carriers, risk varies according to the genes involved and the location of the mutation. Genetic counseling should therefore incorporate both the family history profile and the mutation location [3]. A measure of this familial clustering is the familial relative risk (FRR), defined as the ratio of the risk of disease for a relative of an affected individual to that for the general population. Current international guidelines for BRCA testing use three main risk criteria: (i) a family history of breast or ovarian cancer, (ii) young age at diagnosis (≤36 years) and (iii) triple-negative breast cancer. However, these commonly used testing guidelines were insufficient to detect all mutation carriers in BC cohorts [4]. Indeed, higher rates of both BRCA1 and BRCA2 mutations have been observed in affected patients without a family history from North Africa than from France (BRCA1: 8.0% in North Africa vs. 1.1% in France, P = 0.02; BRCA2: 7.2% in North Africa vs. 1.1% in France, P < 0.05) [5].
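The FRR definition above is a simple ratio. The sketch below makes it concrete. The function name and the risk values are hypothetical and are not estimates from this study.

```python
def familial_relative_risk(risk_in_relatives: float, population_risk: float) -> float:
    """FRR = risk of disease in relatives of an affected individual / risk in the general population.

    Both arguments are absolute risks (e.g. lifetime risks) expressed as proportions.
    """
    if population_risk <= 0:
        raise ValueError("population_risk must be positive")
    return risk_in_relatives / population_risk


# Hypothetical values: 24% risk in relatives of an affected woman vs 12% in the general population.
frr = familial_relative_risk(0.24, 0.12)
print(f"FRR = {frr:.1f}")
```

An FRR of 2 would mean that relatives of an affected woman carry twice the population risk. It says nothing about which variants produce that excess.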
The genetic variants associated with breast cancer risk can be classified as: high-penetrance mutations, which are rare in the population but confer a very high risk (relative risk of carriers versus non-carriers of 5 to >20); moderate-penetrance variants, which confer a moderate risk (1.5 < relative risk < 5); and low-penetrance polymorphisms, which are common and confer a small risk (relative risk < 1.5) [6].
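The three penetrance classes can be expressed as a simple threshold rule on the carrier relative risk. The text does not say how values falling exactly on 5 or 1.5 are classed. The sketch below assumes that RR ≥ 5 is high and 1.5 ≤ RR < 5 is moderate.

```python
def penetrance_class(relative_risk: float) -> str:
    """Classify a variant by its carrier vs non-carrier relative risk.

    Assumed boundary handling (not specified in the text):
    RR >= 5 -> high, 1.5 <= RR < 5 -> moderate, RR < 1.5 -> low.
    """
    if relative_risk >= 5:
        return "high"
    if relative_risk >= 1.5:
        return "moderate"
    return "low"


# Hypothetical relative risks, one per class.
for rr in (10.0, 2.5, 1.2):
    print(rr, penetrance_class(rr))
```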
By age 80, the cumulative risk for BRCA1 and BRCA2 mutation carriers ranges from 69% to 72% for breast cancer and from 17% to 44% for ovarian cancer [3]. This variability is explained by other genetic modifiers and/or environmental factors [7]. Genetic variants influencing breast cancer risk in BRCA1/2 mutation carriers were studied first by the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA) and then by the Collaborative Oncological Gene-environment Study (COGS). So far, more than one hundred SNPs associated with the risk of breast or ovarian cancer in BRCA1 or BRCA2 carriers have been identified [8][9][10][11][12][13]. More recent studies have shown that common variants in DNA repair genes, especially those of the Base Excision Repair (BER) pathway, have a synergistic functional effect that increases cancer susceptibility in BRCA2 mutation carriers [14,15]. Among 144 SNPs analyzed in a two-stage study involving 23,463 carriers from the CIMBA consortium, eleven showed evidence of association with breast and/or ovarian cancer at p < 0.05 in the combined analysis. Four of the five genes for which strong evidence of association was observed encode DNA glycosylases, notably OGG1, TDG and NEIL2. Most of these SNPs are common, non-coding and located in regulatory regions. However, the role of rare coding modifier variants in these genes remains unknown.
On the other hand, a large proportion of all breast cancers arise in a genetically susceptible minority of women who do not carry BRCA1 or BRCA2 mutations. Susceptibility to breast cancer is likely to result from risk alleles in many different genes. Previous studies have suggested that disease susceptibility in non-carriers of BRCA1/2 mutations can be explained, under a polygenic model, by large numbers of susceptibility polymorphisms acting multiplicatively on risk [16][17][18]. It is estimated that 28% of familial breast cancer risk is explained by common breast cancer susceptibility loci. In some cases, SNP associations may be specific to particular ethnic groups or to estrogen receptor status [13]. The most recent and largest breast cancer genome-wide association study (GWAS), using the Illumina OncoArray BeadChip, identified a total of 172 risk-associated SNPs that account for an estimated ~18% of the familial relative risk [19][20][21].
Because they focus on common non-coding variants, GWAS have limited capacity to identify interactions between BRCA1 and BRCA2 mutations and rare coding modifier variants [22]. Part of the missing heritability in familial breast cancer is likely represented by rare functional coding variants in genes not currently included in available panels [23]. These rare coding variants could be identified using whole exome sequencing (WES) or multigene high-throughput sequencing.
WES has recently proven efficient for discovering novel breast cancer predisposition genes, such as those encoding proteins involved in the DNA damage response or DNA repair [24]. It has also helped to determine the frequency of causal germline mutations [23,25] and to identify novel candidate genetic modifiers of risk for early-onset breast cancer in carriers of high-risk mutations [26].
In the present study, we used WES to explore potential rare pathogenic variants in two cases with different familial relative risks for breast cancer and to evaluate the contribution of these variants to genetic susceptibility to the disease.
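The multiplicative polygenic model mentioned above can be sketched under the following assumptions. Each SNP i has a per-allele relative risk r_i. A woman carrying g_i risk alleles (0, 1 or 2) has a combined relative risk equal to the product of r_i^g_i, which is equivalent to adding log relative risks. The SNP names and relative risks below are hypothetical. Published polygenic risk scores also rescale to the population average, and this sketch omits that step.

```python
import math


def combined_relative_risk(per_allele_rr: dict, genotype: dict) -> float:
    """Multiplicative (log-additive) model: RR = prod(r_i ** g_i).

    per_allele_rr: SNP -> per-allele relative risk (hypothetical values)
    genotype:      SNP -> number of risk alleles carried (0, 1 or 2); absent SNP = 0
    The result is relative to a woman carrying no risk alleles at these SNPs.
    """
    log_rr = 0.0
    for snp, rr in per_allele_rr.items():
        dosage = genotype.get(snp, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid risk-allele count {dosage} for {snp}")
        log_rr += dosage * math.log(rr)  # log relative risks add across SNPs
    return math.exp(log_rr)


per_allele = {"snp_a": 1.10, "snp_b": 1.20, "snp_c": 1.05}  # hypothetical SNPs
carrier = {"snp_a": 2, "snp_b": 1}                          # two copies of a, one of b
print(round(combined_relative_risk(per_allele, carrier), 3))
```

Each allele contributes only a small increase, but under this model the increases compound. This is why many low-penetrance variants can together shift an individual's risk noticeably.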
Patients and Methods

Exome sequencing
Whole-exome sequencing was performed for patients F1.1 and F2.2. The exome was captured from genomic DNA using the Agilent SureSelect protocol version 1.2 (Agilent Technologies; Santa Clara, CA, USA) and sequenced on an Illumina HiSeq 2000 sequencer. Sequence reads were aligned to the hg19 reference genome with BWA, and SNVs and indels were called with GATK. Quality control showed that 88% of targeted bases were covered at >20X.
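The coverage metric reported above is the fraction of targeted bases whose read depth exceeds 20. The sketch below assumes that per-base depths over the capture intervals have already been computed from the alignments. A short hypothetical list stands in for those depths.

```python
def fraction_covered(depths: list, min_depth: int = 20) -> float:
    """Fraction of targeted bases with read depth strictly greater than min_depth ('>20X')."""
    if not depths:
        raise ValueError("no targeted bases supplied")
    covered = sum(1 for d in depths if d > min_depth)
    return covered / len(depths)


# Hypothetical per-base depths for ten targeted positions.
depths = [35, 18, 22, 50, 20, 41, 27, 19, 64, 30]
print(f"{fraction_covered(depths):.0%} of targeted bases covered at >20X")
```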

Exome sequencing data analysis
The results were analyzed using VarAFT software version 1.6 (http://varaft.eu/index.php). A dominant model of inheritance was selected for the exome analysis. Given the number of variants identified by WES, variants were prioritized using several stringent filtering criteria. We kept only rare functional variants (missense, nonsense, splice-site variants and indels) that were heterozygous in the index cases. We discarded variants with a minor allele frequency (MAF) ≥1% according to four databases: the 1000 Genomes Project, the Exome Variant Server (EVS), the Exome Aggregation Consortium (ExAC) and a local database of 48 exomes from Tunisian individuals with no personal or family history of breast cancer. Variants with low sequencing quality were also excluded. The functional impact and pathogenicity of missense variants were predicted with online tools such as MutationTaster, PolyPhen and SIFT. Among the variants that met these criteria, we selected all rare coding variants (frequency <0.01) reported at least once as pathogenic in ClinVar and located in a gene matching breast cancer disorders according to the VarElect prioritization tool (http://varelect.genecards.org) [27]. We also extracted common risk variants previously reported to increase breast cancer risk.
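The filtering cascade above can be summarized as a chain of boolean tests over annotated variants. This sketch is not VarAFT's implementation, and it makes several assumptions:

- The consequence labels and database keys are hypothetical.
- A variant is discarded if its MAF reaches 1% in any one database.
- A database that does not report a variant counts as MAF 0.
- The VarElect ranking is reduced to a plain set of breast cancer-matched genes.

The MAF values and the two non-BRCA2 variants in the example are hypothetical.

```python
from dataclasses import dataclass

# Simplified, hypothetical consequence tags for "functional" variants.
FUNCTIONAL = {"missense", "nonsense", "splice_site", "frameshift", "inframe_indel"}
# Population databases named in the text; the key names are hypothetical.
MAF_DATABASES = ("1000G", "EVS", "ExAC", "local_tunisian")


@dataclass
class Variant:
    gene: str
    protein_change: str
    consequence: str
    zygosity: str             # "het" or "hom" in the index case
    maf: dict                 # database -> MAF; missing key = not observed (assumed 0)
    high_quality: bool        # passed sequencing-quality filters
    clinvar_pathogenic: bool  # reported at least once as pathogenic in ClinVar


def is_rare_functional_het(v: Variant, max_maf: float = 0.01) -> bool:
    """Dominant-model filter: functional, heterozygous, good quality, MAF < max_maf in every database."""
    return (
        v.consequence in FUNCTIONAL
        and v.zygosity == "het"
        and v.high_quality
        and all(v.maf.get(db, 0.0) < max_maf for db in MAF_DATABASES)
    )


def prioritize(variants, breast_cancer_genes):
    """Keep filtered variants that are ClinVar-pathogenic and lie in a breast cancer-matched gene."""
    return [
        v for v in variants
        if is_rare_functional_het(v) and v.clinvar_pathogenic and v.gene in breast_cancer_genes
    ]


calls = [
    Variant("BRCA2", "p.Val1283Lysfs", "frameshift", "het", {}, True, True),
    Variant("GENE_X", "hypothetical", "missense", "het", {"ExAC": 0.12}, True, False),  # common
    Variant("GENE_Y", "hypothetical", "synonymous", "het", {}, True, False),            # not functional
]
kept = prioritize(calls, breast_cancer_genes={"BRCA2", "OGG1"})
print([f"{v.gene} {v.protein_change}" for v in kept])
```

Keeping each criterion as a separate clause of one predicate makes it easy to report how many variants each step removes.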

Variant validations and co-segregation
Exons with poor coverage (namely exon 5 of BRCA2 and exon 13 of BRCA1) were amplified and Sanger sequenced, together with exons 11 and 27 of BRCA2 and exon 1 of OGG1, which contain potential risk variants. The primers are listed in Supplementary Table 1. Sequencing was used to confirm the variants and to analyze co-segregation in available affected and healthy relatives.

Screening for variants in BRCA genes
All heterozygous and homozygous variants in the BRCA1 and BRCA2 genes identified in the two cases are listed in Table 3. The genetic profiles of the two patients showed a marked difference in terms of variant frequency and function (Fig. 1).

The OGG1 gene has the highest VarElect score for matching breast cancer disorders (236.97). The variant R46Q has been previously described as a risk allele for human clear cell carcinoma of the kidney that impairs the enzymatic activity of the OGG1 DNA glycosylase [41]. It was recently observed in an affected member of a family with a familial form of small intestinal neuroendocrine tumors (SI-NETs) and in a putative clinically healthy carrier member [42].

The third variant is c.990dupG (p.P331fs) in the FTCD gene. Mutations in FTCD are the molecular basis of the mild phenotype of glutamate formiminotransferase deficiency, an autosomal recessive disorder and the second most common inborn error of folate metabolism. Epidemiological evidence on the role of folate in breast cancer risk is conflicting. A recent meta-analysis has shown that breast cancer does not appear to be associated with folate intake, and that this does not vary by menopausal status or hormone receptor status. In addition, blood folate levels do not appear to be associated with breast cancer risk [48].

Screening for Common at risk Variants
Investigating common risk variants could help estimate and refine each individual's risk and identify the patients at highest risk. We therefore took a published list of 182 SNPs with genome-wide significant associations with breast cancer [13, 21] and extracted those present in each of our patients.
In patient F1.1, we found only one of these SNPs, rs11374964, whereas in patient F2.2 we found six (rs2992756, rs4971059, rs4245739, rs6964587, rs11374964, rs2236007) (Supplementary Table 3). Together with the four variants present in the BRCA genes, these six SNPs could contribute to an increased individual risk of developing breast cancer.
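The extraction step above is essentially a set intersection between each patient's observed variant IDs and the published risk-SNP list. The sketch uses only an illustrative excerpt of that list, namely the SNPs named in the text plus one hypothetical entry. The "other" IDs in the patient lists are likewise hypothetical. In practice, the match should also confirm that the patient carries the reported risk allele, because the presence of an rsID in a call set alone does not establish this.

```python
def extract_risk_snps(patient_rsids, published_risk_snps):
    """Return the published risk SNPs observed in a patient, sorted for stable output."""
    return sorted(set(patient_rsids) & set(published_risk_snps))


# Illustrative excerpt only; the full published list contains 182 SNPs [13, 21].
published = {
    "rs2992756", "rs4971059", "rs4245739", "rs6964587",
    "rs11374964", "rs2236007", "rs_hypothetical_risk",
}
f11_calls = ["rs11374964", "rs_hypothetical_other"]
f22_calls = ["rs2992756", "rs4971059", "rs4245739", "rs6964587",
             "rs11374964", "rs2236007", "rs_hypothetical_other"]

print("F1.1:", extract_risk_snps(f11_calls, published))
print("F2.2:", extract_risk_snps(f22_calls, published))
```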

Sanger confirmation and validation
The BRCA2 c.3847_3848delGT frameshift mutation was confirmed by Sanger sequencing, and co-segregation analysis was performed in the two other affected sisters (F1.2 and F1.3). However, the mutation was absent in 40 available affected women from high-risk families with at least three affected relatives.
We also confirmed that the second BRCA2 variant, c.9976A>T (p.Lys3326Ter), was present in the index case and in her BC-affected sister (F2.4). It was absent in the healthy sister (F2.3) and brother (F2.6), in the sister affected by colorectal cancer (F2.1) and in the brother affected by testicular cancer (F2.5). This confirms that the variant segregates only with the breast cancer phenotype.
The OGG1 variant was likewise confirmed by Sanger sequencing, and co-segregation analysis was performed. It was found in the heterozygous state in the two BC-affected patients (F2.2 and F2.4) and in one healthy sister (F2.3), and was absent in the three remaining family members. F2.3 is 68 years old and reached menopause at the age of 48. The BRCA2 and OGG1 variants co-occurred only in the two BC-affected cases, suggesting an additive risk.
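The co-segregation reasoning for family F2 can be checked mechanically. A variant, or a combination of variants, co-segregates with a phenotype in this simple sense when the set of family members who carry it equals the set of members with that phenotype. The carrier statuses below are taken from the results above. The dictionary layout and function names are illustrative. This is a descriptive set-equality check, not a formal linkage (LOD score) analysis.

```python
# Family F2 as reported above (True = heterozygous carrier).
family_f2 = {
    "F2.1": {"phenotype": "colorectal cancer", "BRCA2 K3326X": False, "OGG1 R46Q": False},
    "F2.2": {"phenotype": "breast cancer", "BRCA2 K3326X": True, "OGG1 R46Q": True},
    "F2.3": {"phenotype": "healthy", "BRCA2 K3326X": False, "OGG1 R46Q": True},
    "F2.4": {"phenotype": "breast cancer", "BRCA2 K3326X": True, "OGG1 R46Q": True},
    "F2.5": {"phenotype": "testicular cancer", "BRCA2 K3326X": False, "OGG1 R46Q": False},
    "F2.6": {"phenotype": "healthy", "BRCA2 K3326X": False, "OGG1 R46Q": False},
}


def carriers(family, variants):
    """Members carrying every variant in `variants`."""
    return {member for member, rec in family.items() if all(rec[v] for v in variants)}


def cosegregates_with(family, variants, phenotype):
    """True if carriers of all `variants` are exactly the members with `phenotype`."""
    affected = {member for member, rec in family.items() if rec["phenotype"] == phenotype}
    return carriers(family, variants) == affected


for combo in (["BRCA2 K3326X"], ["OGG1 R46Q"], ["BRCA2 K3326X", "OGG1 R46Q"]):
    print(combo, cosegregates_with(family_f2, combo, "breast cancer"))
```

The OGG1 variant alone fails the check because the healthy sister F2.3 also carries it.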

Discussion
The identification of mutations responsible for breast cancer through clinical genetic testing enables patients to benefit from early screening and prevention strategies, some of which generally provide a survival benefit. Next-generation sequencing allowed us to identify the rare and common variants that could be linked to breast cancer predisposition.
According to international guidelines for BRCA testing, female members of family F1 should undergo BRCA testing because of the positive family history and the young age at onset of the third sister, F1.3 (31 years old) (Table 1). WES revealed a highly deleterious variant, c.3847_3848delGT, in the BRCA2 gene. This mutation has been described as a frequent founder mutation in the Danish population [49]. It is also present among Japanese patients and in other Asian populations but is rare elsewhere [50]. The index case also carries the variant Ile1307Lys in the APC gene. This variant has been described as a risk factor for breast, lung, urologic, pancreatic and skin cancers [33, 34] and has been associated with an increased risk of colorectal cancer among Ashkenazi Jewish, Croatian, and

Patient F2.2 and her affected sister had late-onset disease, with ages at onset ranging from 62 to 70 years, and non-aggressive tumors on histopathology. They responded well to treatment, without signs of recurrence. We found a rare pathogenic variant, OGG1 R46Q, in the two affected sisters and in the clinically healthy sister, who is currently 73 years old. This variant has been previously described as a risk allele for human clear cell carcinoma of the kidney that impairs the enzymatic activity of the OGG1 DNA glycosylase [41]. OGG1 R46Q was recently observed in a patient with a familial form of SI-NETs and in a putative clinically healthy carrier member [42].
In addition, meta-analyses and experimental data have shown significant associations between other OGG1 germline variants and breast cancer risk. For some missense variants in OGG1, the risk increases 14-fold (p < 0.01) and reaches 18-fold [67].

A recent study has shown that K3326X acts as a trans-eQTL involved in the DNA repair pathway. In addition, it shows a statistically significant association with expression of the TRPC6 gene and of the 4q21 locus [68]. The 4q21 locus has recently been identified as a novel breast cancer susceptibility locus associated with differential allelic expression [69]. A genome-wide haplotype study of the general Tunisian population identified 4q21 among the most frequent candidate loci carrying a high-risk haplotype (haplotype frequency > 5%) [70]. Little is known about the involvement of common variants and their association with breast cancer risk in Tunisia.
The "rare variant hypothesis" for susceptibility to common diseases postulates that a significant proportion of the inherited component might be due to the combined effects of a series of low-frequency, independently acting variants in a variety of genes, each conferring a moderate but detectable increase in relative risk [71]. Accumulation of rare genetic variants in DNA repair

In summary, using WES and segregation analysis, we identified a low-penetrance variant in the BRCA2 gene, K3326X. It co-segregates with a rare pathogenic variant in the OGG1 gene, R46Q, only in breast cancer-affected cases. The OGG1 variant is a candidate risk factor predisposing to the disease. Other rare variants, such as the variant in the GCGR gene, should be investigated in future studies to understand their potential role. We also recommend that the ten common risk variants found in the BRCA genes and in other candidate loci be investigated in a large association study to

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its additional files.

Consent for publication
Not applicable.

Figure 1. The structure of the genetic profile of the BRCA1 and BRCA2 genes in patients F1.1 and F2.2.
Pedigrees of the F1 and F2 families and DNA-sequence electropherograms for unaffected (wt) and affected family members.