We used segregation analysis and other molecular data to reclassify rare variants as benign (n=51) or likely benign (n=211) in a Brazilian cohort of patients with rare diseases. All variants are in genes associated with autosomal dominant disorders. Proper interpretation of rare variants is crucial for adequate molecular diagnosis and, consequently, for clinical management and genetic counseling. Distinguishing pathogenic variants from rare benign variants is therefore a key step in molecular diagnosis, but ascertaining which rare variants have a clinical or health impact remains a major challenge.
We used a systematic approach to preselect the variants in this study. In addition to having a low allele frequency in control databases and being assessed against reports from the literature and functional impact, all variants were clinically curated: they were in genes associated with autosomal dominant disorders whose features resembled the patients’ manifestations. In other words, all variants were associated with autosomal dominant disorders and were judged highly suspicious as an explanation for the diagnosis that had led to exome sequencing.
A segregation study was performed to investigate the clinical relevance of the rare variants found in our cohort because a de novo event was expected for all 327 suspicious variants. Segregation is important for assessing the pathogenicity of a variant. Indeed, variant segregation in patients and their families is widely used to discover novel disease-associated genes and pathogenic/likely pathogenic variants. In this study, we used the same approach for the opposite purpose: to uncover benign/likely benign variants. Inheritance of a variant that would have to be de novo to be considered pathogenic was treated in our study as non-segregation and, therefore, as a key step toward benign or likely benign classification. However, this was not the sole step in this process, as discussed below.
Figure 1 summarizes the key workflow steps of the variant classification. A total of 81 variants occurred in patients with a clear alternate molecular cause of disease and received the BP5 (supporting) criterion. All these 81 variants were classified as benign (n=13) or likely benign (n=66) because they also received at least one other benign criterion: either they were inherited from presumably asymptomatic parents (n=79) and received the BS4 or BS4_supporting criterion, or their REVEL score suggested no impact (n=2). Note that two variants were not inherited; non-inherited variants are discussed below. In this branch of cases shown in Figure 1, the combination of BP5 and BS4/BS4_supporting was key to the assumption of non-pathogenicity. Figure 2 shows that this combination of criteria occurred for 13 benign variants and 66 likely benign variants: 29 likely benign variants received exclusively these two criteria, making the combination critical for their final classification, while the remaining variants received other benign criteria as well.
Another key step in the classification is the frequency in the control database. In this study, we used only the gnomAD database, which may have led us to underestimate the “benign weight” of the BS2 criterion because other databases might suggest higher frequencies, especially for ethnicities not covered by gnomAD; on the other hand, the quality of variant calls and the integrity of such databases might introduce an uncontrolled bias. All variants found in five or more controls in gnomAD were eventually classified as benign or likely benign because they received other benign criteria. Among the benign variants, 14 received exclusively the combination of BS4 (variants inherited from parents and associated with conditions curated as pediatric onset) and BS2. Among the likely benign variants, nine received exclusively BS4_supporting (variants inherited from parents and associated with conditions not curated by NHGRI) and BS2.
A fourth key step was in-silico prediction. In this study, we used REVEL, an ensemble method for predicting the pathogenicity of rare missense variants, as our sole in-silico predictor. REVEL has shown better performance than other methods, especially for rare neutral variants [2, 3, 15]. This evidence supports the strength of REVEL as a predictor of benignity and makes the tool well suited to our study, which aims to identify rare neutral variants. Its restriction to missense variants, though, is a limitation of the REVEL approach.
Among the 170 variants that received neither the BP5 nor the BS2 criterion, 106 were predicted by REVEL to have a benign effect (score <0.4) and received the BP4 criterion; 104 of these variants were inherited from parents (receiving BS4 or BS4_supporting), and the combination of BP4 with BS4/BS4_supporting was critical for the final classification as likely benign.
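The REVEL cutoff described above can be written as a small rule. This is only an illustrative sketch: the function name `bp4_applies` and the use of `None` for variants without a REVEL score (for example, non-missense variants) are assumptions made for this example, not part of the study pipeline.

```python
from typing import Optional

# Cutoff stated in the text: REVEL score < 0.4 suggests a benign effect.
REVEL_BENIGN_CUTOFF = 0.4


def bp4_applies(revel_score: Optional[float]) -> bool:
    """Return True when the REVEL score supports the BP4 criterion.

    A score of None (hypothetical convention) means REVEL gives no score,
    as for non-missense variants, so BP4 is not assigned.
    """
    return revel_score is not None and revel_score < REVEL_BENIGN_CUTOFF
```

Under this sketch, a variant with a score of 0.12 would receive BP4, whereas a variant with a score of 0.4 or above, or with no score, would not.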
Finally, 65 variants did not meet the minimum requirements for reclassification as benign or likely benign and were considered VUS; 63 of these were inherited from a parent (receiving solely BS4 or BS4_supporting).
A total of six unique variants were not confirmed to be inherited from either parent in our study: four were not found in parental samples (a presumed de novo event), and for the remaining two the paternal samples were not available, although these two variants were not found in the maternal samples. Half of these variants (n=3) were classified as likely benign on the basis of other molecular data. Although it is widely known that de novo genomic changes can be associated with a multitude of pathological conditions, a broad range of de novo molecular events is also characteristic of individuals from the general population [16].
For the majority of variants classified as benign or likely benign, we did not find a ClinVar classification (n=189; 72.1%). Even among the variants with ClinVar entries, the ClinVar classification agreed with ours for only 15 variants.
Our approach focuses on variant segregation with phenotype and makes several simplifying assumptions: 1) parents were asymptomatic: we recognize that several dominant diseases may present with varying degrees of severity and that it may be difficult for clinicians to identify oligosymptomatic patients; 2) non-segregation was considered a strong benign criterion for childhood-onset diseases and a supporting criterion for later-onset diseases or for conditions whose onset is not classified by NHGRI, which was our sole source for onset: the NHGRI database lacks information for several genes, including some widely known early-onset genes (e.g., ASH1L, KMT2A, DEAF1, TRPV4, CHD2, SRCAP, ANKRD11, RAI1, ZEB2, SHANK3, ARID1B, and others), and this may have led us to underestimate non-segregation (unclassified genes received BS4_supporting); 3) variants associated with adult-onset conditions were a minority (seven unique variants), and segregation analysis for these cases may be more complex to interpret: we opted to apply the BS4_supporting criterion because parents were asymptomatic and we had no information on other affected family members, even though we may have overestimated the non-segregation effect for these seven variants; 4) our method applies to sequence variants of high penetrance and dominant inheritance: for this model, we considered all conditions sufficiently penetrant for segregation/non-segregation purposes.
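The way assumption 2 assigns the strength of the segregation criterion can be sketched as follows. The function name and the onset labels (`"pediatric"`, `"adult"`, `None` for an onset not classified by NHGRI) are hypothetical conventions chosen for this illustration; the sketch also takes the parents' asymptomatic status as given, as assumption 1 does.

```python
from typing import Optional


def segregation_criterion(inherited_from_asymptomatic_parent: bool,
                          onset: Optional[str]) -> Optional[str]:
    """Return 'BS4', 'BS4_supporting', or None for one variant.

    onset is 'pediatric', 'adult', or None when the onset is not
    classified (labels are assumptions made for this sketch).
    """
    if not inherited_from_asymptomatic_parent:
        # De novo or unconfirmed inheritance: no non-segregation evidence.
        return None
    if onset == "pediatric":
        # Childhood-onset condition with an asymptomatic carrier parent:
        # non-segregation counts at full (strong) weight.
        return "BS4"
    # Later-onset or unclassified onset: downgraded to supporting strength.
    return "BS4_supporting"
```

Treating an unclassified onset like a later onset is the conservative choice described in the text: a gene missing from the onset annotation never yields more than supporting evidence.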
The model that we used has limitations for more complex inheritance, including unknown codominant, digenic or even oligogenic mechanisms, and for recessive inheritance. It is noteworthy that some variants are in genes associated with both autosomal dominant and autosomal recessive forms of disease. However, all variants were clinically selected because only the autosomal dominant form of the corresponding disease resembled the patients’ phenotypes, and a de novo event was expected under the assumption of pathogenicity. Therefore, the non-segregation of these variants under the autosomal dominant model was considered relevant evidence of non-pathogenicity.
It is important to note, though, that there is growing evidence that rare variants make important contributions to human phenotypic variation and disease susceptibility. Detecting the effects of rare variants on complex traits is challenging because 1) it generally requires very large sample sizes to achieve statistical power and 2) rare SNVs are population-specific, which makes it difficult to replicate disease associations across different populations [17]. Therefore, it is possible (or even likely) that a rare variant classified as B/LB for an autosomal dominant Mendelian trait may have health or disease implications for complex traits, for recessive forms, or even for as yet unknown recessive forms of a gene.
Our study has important limitations. First, it analyzed the impact of single variants in monogenic models of autosomal dominant diseases and did not consider more complex interactions that might modulate phenotypes. Second, several variants were classified as likely benign/benign based solely on ACMG criteria but were not definitively proven to be benign. We applied some ACMG benign criteria conservatively, which may have led us to underestimate their effects: BS2 required at least five controls in gnomAD, and BS1 required a frequency of at least 0.1% in gnomAD (a frequency considered high for many rare conditions). Even though we followed the ACMG criteria for variant classification, we may have falsely classified some variants as B/LB, since we relied solely on presumed mechanisms without proper functional studies.
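The two frequency thresholds mentioned above can be expressed as a short rule. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and the carrier count and allele frequency are assumed to have already been retrieved from gnomAD by some other means.

```python
from typing import List

# Thresholds as stated in the text.
BS2_MIN_CONTROLS = 5       # variant seen in at least five gnomAD controls
BS1_MIN_FREQUENCY = 0.001  # allele frequency of at least 0.1%


def frequency_criteria(control_carriers: int,
                       allele_frequency: float) -> List[str]:
    """Return the frequency-based benign criteria met by one variant."""
    criteria = []
    if allele_frequency >= BS1_MIN_FREQUENCY:
        criteria.append("BS1")
    if control_carriers >= BS2_MIN_CONTROLS:
        criteria.append("BS2")
    return criteria
```

For example, a variant seen in five controls at a frequency of 0.00002 would receive BS2 only, while one seen in four controls would receive neither criterion.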
Even though we took extra precautions that underestimate the weight of several benign criteria (e.g., BS2, BS1) and underestimated non-segregation for several conditions not classified by NHGRI, the majority of variants (n=262; 80.1%) were reclassified as benign or likely benign. One main reason is that it is much easier for a variant to be classified as B/LB than as pathogenic or likely pathogenic, because two supporting benign criteria are enough for LB classification. We use the term “easier” for two reasons: B/LB classification requires fewer criteria, and benign criteria are not subject to upgrades or downgrades in strength as several pathogenic criteria are. These observations raised our concerns about the wide application of benign criteria. They also led us to unilaterally downgrade the BS4 criterion for later-onset conditions and for conditions of unclassified onset. On the other hand, the addition of extra supporting benign criteria does not influence the final classification: for instance, a variant receiving a combination of BS2 and BP4 will have the same likely benign classification as a variant receiving BS2, BS4_supporting, BP4 and BP5, though the latter variant is more likely to be benign than the former.
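The combining logic discussed in this paragraph can be illustrated with a short sketch of the ACMG/AMP benign rules (benign: one stand-alone criterion or at least two strong criteria; likely benign: one strong plus one supporting criterion, or at least two supporting criteria). The function name, the strength table and the handling of the `_supporting` suffix are assumptions made for this example; the sketch considers benign criteria only and ignores any conflicting pathogenic evidence.

```python
from collections import Counter

# Strength assumed for each benign criterion code used in this discussion.
# BS4_supporting is BS4 downgraded to supporting strength.
STRENGTH = {
    "BA1": "stand_alone",
    "BS1": "strong", "BS2": "strong", "BS4": "strong",
    "BS4_supporting": "supporting",
    "BP4": "supporting", "BP5": "supporting",
}


def classify_benign(criteria):
    """Return 'benign', 'likely benign', or 'VUS' from benign criteria only."""
    counts = Counter(STRENGTH[c] for c in set(criteria))
    strong, supporting = counts["strong"], counts["supporting"]
    if counts["stand_alone"] >= 1 or strong >= 2:
        return "benign"
    if (strong == 1 and supporting >= 1) or supporting >= 2:
        return "likely benign"
    return "VUS"


# The two combinations compared in the text reach the same class.
print(classify_benign(["BS2", "BP4"]))
print(classify_benign(["BS2", "BS4_supporting", "BP4", "BP5"]))
```

Because the rules only test whether minimum counts are reached, additional supporting criteria beyond the threshold leave the category unchanged, which is the point made above about the two example variants.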
We did not find comprehensive studies that address segregation analysis and classification of B/LB variants in Brazil or any other Latin American country. The populations of these countries are generally underrepresented in international databases and are genetically heterogeneous, with important genetic contributions from Amerindians, Africans and Western Europeans. The more NGS and segregation studies become available in such nations, the more will be known about rare regional benign variants.
The literature has an important limitation regarding the description of benign/likely benign variants, because studies and research journals generally show a strong bias toward positive results. We believe that studies like ours are valuable for identifying rare benign variants and that this strategy may improve the interpretation of genetic findings.