Current techniques available for routine genetic analyses, such as Sanger sequencing, aCGH, MLPA and short-read exome or whole-genome sequencing (WGS), have shortcomings when it comes to SV detection. Insertions, deletions or rearrangements are frequently missed due to limitations in resolution, particularly if repetitive elements are involved, combined with the focus on coding exon sequences. Over half of the human genome consists of dispersed repetitive elements 38, and recombination events are more likely to occur between repetitive sequences 39 40, so current practices will likely miss many pathogenic SV. The SV found in this study is a good example of such an event, consisting of inserted non-coding sequences with breakpoints in repetitive elements.
By applying OGM, we were able to detect an SV causing LS through heterozygous disruption of the MSH2 gene. The 39 kb insertion placed non-coding 5′ UTR exons from MSH6 into intron 7 of the MSH2 gene. mRNA studies confirmed that a non-coding exon was spliced into MSH2 RNA, creating a truncated protein and causing loss of function. The variant segregated with loss of MSH2 and MSH6 protein expression and MSI-high tumours in several individuals from the two families. For these reasons the variant has been classified as pathogenic. This finding demonstrates the advantages of OGM over routine diagnostic methods for SV detection. At 39 kb, the SV found in this study is well above the lower resolution limit claimed for OGM (ca. 0.5 kb). However, despite the inclusion of only seven OGM markers, the technique was able to correctly identify the source of the inserted sequence, underlining its utility. In this case, the finding confirmed a diagnosis of LS that had been uncertain for around 15 years in two families, and enabled relatives to obtain genetic testing and clarification of their cancer risk.
Nonetheless, this study also demonstrates a limitation of OGM: it does not have the resolution to determine breakpoints to within the distances required to design PCR primers for verification. We therefore also employed long-read sequencing with Oxford Nanopore technology to verify the OGM findings and improve detection of the breakpoints themselves. Although the data from a single MinION flowcell confirmed the insertion sequence and position on this occasion, the low coverage (2.8-fold genome coverage) prevented long-read variant calling software from identifying the sequences, and manual inspection of BLAST alignments was necessary. The low coverage and high error rate of the MinION reads also meant that we were unable to identify the breakpoints precisely from the long reads alone. Even so, the long reads narrowed the breakpoints to within ±50 bp, which facilitated the design of PCR primers and allowed exact breakpoint identification by Sanger sequencing.
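The effect of low coverage on automated variant calling can be illustrated with a simple calculation. The sketch below is not part of the analysis performed in this study. It assumes that read depth at a locus follows a Poisson distribution, that a heterozygous insertion receives half of the 2.8-fold genome coverage, and that a caller requires a minimum number of supporting reads. The thresholds shown and the function name `prob_at_least` are hypothetical and chosen only for illustration.

```python
import math


def prob_at_least(k: int, mean_depth: float) -> float:
    """Return P(X >= k) for X ~ Poisson(mean_depth).

    Assumption: read depth at a single locus is Poisson distributed,
    which ignores read length, mappability and coverage bias.
    """
    below_k = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below_k


GENOME_COVERAGE = 2.8   # fold coverage reported in the text
ALLELE_FRACTION = 0.5   # assumption: heterozygous variant, half of the reads

allele_depth = GENOME_COVERAGE * ALLELE_FRACTION

# Hypothetical minimum numbers of supporting reads a caller might require.
for min_support in (1, 3, 5):
    p = prob_at_least(min_support, allele_depth)
    print(f"P(at least {min_support} supporting reads) = {p:.3f}")
```

Under these assumptions, the chance of observing several reads from the variant allele at a given breakpoint falls quickly as the required support increases, which is consistent with the need for manual inspection of alignments at this coverage.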
It is likely that deeper sequencing with long reads (for example with higher-output PromethION flowcells), or the use of targeted capture combined with MinION sequencing (as has been demonstrated for MSH2 41), would be more effective than low-coverage WGS MinION sequencing for breakpoint detection. Indeed, long-read WGS approaches may on their own suffice to identify SV without the prior need for optical mapping. However, these approaches also cost more, so for the purposes of verifying Bionano optical genome mapping findings, Oxford Nanopore’s recently released adaptive sampling method, employed with a MinION flowcell, may offer the most cost-effective approach.
Dispersed repeats such as the SVA and LTR elements found at the breakpoints of the insertion in this study can contribute to pathogenic SV through either transposition or recombination events 34. The SVA repeats (SINE-VNTR-Alu) are non-LTR retrotransposons, similar to LINE-1 and Alu elements. As their name suggests, SVA retrotransposons are composite elements consisting of sequences derived from each of the contributing repeat types. However, unlike LINE-1 and Alu elements, which have proliferated widely to constitute approximately 17% and 10% of the human genome respectively 39, SVA elements are relatively rare: it is estimated that there are only approx. 2700 SVA copies in the human genome (ca. 0.2%) 42.
SVA repeats have been shown to be involved in pathogenic SV formation 34. More specifically, SVA transposition events in MSH2, MSH6 and PMS2 have been reported to cause LS 43 44 45. In the patients reported in the present study, both the MSH2 insertion point and the donor MSH6 site contain the SVAD retrotransposon repeat element (SINE-VNTR-Alu, subfamily D) 34. The SVAD copy at the MSH2 insertion site appears to be a full-length element, while a truncated fragment is found at the MSH6 donor site. There is no evidence of an SVA transposition event; rather, existing SVA repeats appear to have contributed to the generation of an SV through recombination. Since the left-hand side of the duplicated MSH6 sequence contains sequence from the unrelated ERV1 LTR repeat, a complex recombination event is assumed to have occurred.
The 39 kb insertion detected in both families, who were not aware they were related, was determined to have originated from a founding mutation event estimated to have occurred at least 4 generations prior to the two index cases presented here. Analysis of additional individuals will be required to estimate the timing of the founder event more accurately, as it could have occurred in the more distant past. At the time of writing, individuals from three additional families (none aware of their relation to the others), also from the same region of Norway and presenting with suspected LS, have been confirmed to carry the mutation. In all these families, immunohistochemistry demonstrated no expression of MSH2 and MSH6 in tumour tissue, and their cancers were MSI-high.
It has been recommended for over a decade that immunohistochemistry and MSI analyses be performed on patients with colorectal or endometrial cancer 46. These methods, which can indicate with up to 90% sensitivity an underlying MMR mutation, can be used to prioritize patients for extended genetic testing, including for SV. SV are a common source of pathogenic variants in the LS genes. Some of these, but not all, are detectable through MLPA or exome sequencing 41 47 48 49. We therefore encourage the use of OGM or long-read sequencing technologies to test for the presence of SV mutations where LS is suspected based on family history and/or immunohistochemistry/MSI findings but a genetic diagnosis has not been obtained with routine analyses. The recent observation that cancer-associated genes are enriched for retro-elements, which may act as recombination hotspots 50, supports the increased use of techniques such as OGM or long-read sequencing to evaluate SV as causative mutations also in other familial cancers.