Identifying NAHR mechanism between two distinct Alu elements through breakpoint junction mapping by NGS

Background: Genomic rearrangements encompass deletions, duplications, inversions, insertions and translocations, and they may cause several genetic diseases. One of the most frequent mechanisms that generates these rearrangements is Non-Allelic Homologous Recombination (NAHR), which is caused by misalignment between regions of high sequence similarity, such as Low Copy Repeats (LCRs) and Alu sequences. We aimed to sequence the breakpoint of a patient with a single deletion on chromosome 22q13.2 in order to understand the genomic structure of the region involved and to elucidate the mechanism behind this rearrangement. Investigating breakpoints is of the utmost importance for understanding the influence of genomic architecture in clinical assays. Results: We flanked the breakpoint detected by array, captured the flanking regions using Illumina Nextera Rapid Capture Custom, and sequenced them on an Illumina MiSeq. We found a chimeric read at Chr22:41,026,090, defining a 624,688 bp deletion at Chr22:41,026,112-41,650,780 (hg19). This deletion merges intronic regions of the MKL1 and RANGAP1 genes, located in two different Alu sequences (AluSx and AluY, respectively). Conclusions: The breakpoint sequence reveals that Alu elements are an important feature of the human genome in generating rearrangements.

The mechanisms that cause these rearrangements are classified as replicative and non-replicative. Moreover, features of the human genome can make a specific region susceptible to rearrangements, leading to genomic instability [5,6].
Recurrent rearrangements are caused mainly by low copy repeat (LCR) regions [7,8]. The most frequent recurrent mechanism known is non-allelic homologous recombination (NAHR), which occurs mainly because of LCRs. Because of their high similarity, these sequences may misalign during either mitosis or meiosis, between either homologous or non-homologous chromosomes. This misalignment leads to deletions, duplications, inversions and translocations [1,9].
In addition to LCRs, other repetitive elements, such as short interspersed nuclear elements (SINEs), can lead to NAHR [10]. SINEs are retrotransposons, sequences that amplify themselves throughout the genome through RNA intermediates, and probably originated as parasitic DNA [11,12]. Alu elements are the most frequent SINEs in the human genome and are present only in primates [13,14]. They are ~300 bp long, make up more than 10% of the human genome, and are classified into several subfamilies [14].
Determining the mechanism of breakpoint formation requires sequencing the breakpoint at base-pair resolution. This makes it possible to detect its exact position, analyze the genomic architecture of the loci involved, and identify the nucleotide sequence of the breakpoint [15,16,17,18,19].
Only with this information is it possible to understand the molecular event that triggered the rearrangement.
Thus, this study aimed to sequence the breakpoint of a patient with a single deletion on chromosome 22q13.2 in order to understand the genomic structure of the region involved as well as elucidate the mechanism behind this rearrangement.
This patient presented a SNP array (Illumina Cyto-SNP-12) result showing a deletion encompassing chr22:41,032,863-41,640,297 (hg19). We aligned the reads and searched for chimeric reads from which we could identify the breakpoint and the junction sequence connecting the two loci. With this approach, we found the exact coordinates of the entire deletion: Chr22:41,026,112-41,650,780 (hg19), defining a 624,688 bp deletion.
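The chimeric-read search described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study (alignment was done with Agilent SureCall and reads were inspected in IGV): it assumes reads have already been parsed from the BAM file into simple records, and it treats a long soft clip as the signature of a chimeric read whose clipped part belongs to the other side of the junction. The record type, helper names, thresholds and toy coordinates are all hypothetical.

```python
import re
from collections import Counter
from typing import NamedTuple

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


class AlignedRead(NamedTuple):  # hypothetical minimal read record
    name: str
    pos: int    # 1-based leftmost aligned reference position, as in the SAM POS field
    cigar: str


def clip_breakpoints(read: AlignedRead, min_clip: int = 20) -> list[int]:
    """Reference positions adjacent to soft clips of at least min_clip bases."""
    # Hard-clipped bases are absent from SEQ and consume no reference, so drop them.
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(read.cigar) if op != "H"]
    # Reference bases consumed by the alignment (M, D, N, =, X).
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    points = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        points.append(read.pos)  # junction just before the first aligned base
    if ops and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        points.append(read.pos + ref_len - 1)  # junction just after the last aligned base
    return points


def candidate_junctions(reads, min_clip=20, min_support=2):
    """Positions supported by at least min_support clipped reads, sorted by position."""
    counts = Counter()
    for read in reads:
        counts.update(clip_breakpoints(read, min_clip))
    return sorted((pos, n) for pos, n in counts.items() if n >= min_support)


# Hypothetical toy reads (not data from this study).
reads = [
    AlignedRead("r1", 901, "100M50S"),
    AlignedRead("r2", 951, "50M100S"),
    AlignedRead("r3", 1001, "150M"),
    AlignedRead("r4", 2001, "40S110M"),
    AlignedRead("r5", 2001, "60S90M"),
    AlignedRead("r6", 500, "145M5S"),  # clip too short to count
]
print(candidate_junctions(reads))  # [(1000, 2), (2001, 2)]
```

A real analysis would also check that the clipped sequence realigns to the partner locus (for example via the SAM `SA` supplementary-alignment tag); clustering clip positions alone only nominates candidate junctions.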
The deletion occurred in introns of two distinct genes, merging the MKL1 and RANGAP1 genes. Investigation of the region revealed that the two breakpoints lie within two distinct Alu SINE sequences: the Chr22:41,026,112 breakpoint is inside an AluSx sequence, and the Chr22:41,650,780 breakpoint is inside an AluY sequence. Comparison of the two elements showed a high level of sequence similarity. At the breakpoint there is a region of complete homology, followed by the sequence of the strand after the deletion (Figure 1).
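The similarity between the two Alu elements can be quantified as percent identity over a global alignment. The text does not state which tool was used, so the sketch below is a hypothetical stand-in: a plain Needleman-Wunsch alignment with a linear gap penalty, where the scoring parameters (`match`, `mismatch`, `gap`) and the toy sequences are assumptions. Identity is reported as identical columns divided by alignment length, which is only one of several common definitions.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Fraction of identical columns in one optimal global (Needleman-Wunsch) alignment."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j] (linear gap penalty).
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back one optimal path, counting alignment columns and identical pairs.
    i, j, columns, identical = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return identical / columns if columns else 1.0


# Hypothetical toy sequences differing by one substitution.
print(f"{global_identity('ACGTACGT', 'ACGTTCGT'):.1%}")  # 87.5%
```

The quadratic table is unproblematic for ~300 bp Alu elements; for real comparisons a dedicated aligner would normally be preferred.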
The sequence shows two regions with a high level of similarity, and the breakpoint itself occurs without microhomology, although there is some homology after the breakpoint. After this point, the strand follows the other sequence.
Using a previous array result to define the region to be sequenced is an effective and inexpensive way to flank a breakpoint. Because arrays are comparative tests that assess copy-number imbalances in the genome and use a logarithmic scale to call gains and losses, they are not very precise in determining a breakpoint [20,21,22]. The same DNA sample tested twice on the same platform can yield different breakpoint positions. To overcome this limitation, we considered a range of three SNP array probes preceding and three SNP array probes following each previously detected breakpoint region. This approach proved successful, since we were able to directly reach and sequence the breakpoint.
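The logarithmic scale mentioned above can be made concrete with the log R ratio (LRR) reported by Illumina SNP arrays, assuming its usual definition as the base-2 logarithm of observed over expected normalized probe intensity. The theoretical values below assume a diploid reference; observed values are typically compressed and noisy, and a copy-number boundary can only be placed between two adjacent probes, which is why array coordinates bracket rather than pinpoint a breakpoint.

```latex
\mathrm{LRR} = \log_2\!\left(\frac{R_{\mathrm{obs}}}{R_{\mathrm{exp}}}\right),
\qquad
\text{one-copy loss: } \log_2\tfrac{1}{2} = -1,
\quad
\text{two copies: } \log_2\tfrac{2}{2} = 0,
\quad
\text{one-copy gain: } \log_2\tfrac{3}{2} \approx 0.585
```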
Sanger sequencing of this region would be challenging, since it would require a large number of primers spanning several regions of repetitive elements. Because the regions to be sequenced encompass both genic and intergenic sequences, we needed to build a customized NGS panel. However, a panel targeting only such a small region would not be economically viable. We therefore added this region to another custom panel, drastically reducing the cost of the assay.
Since we reached the exact breakpoint, we could assess the region involved. The location of the breakpoints within two different Alu elements, together with the absence of microhomology at the junction, led us to consider Alu-driven NAHR as the mechanism. AluY is among the most recent Alu subfamilies in the human genome and the only one that includes active elements, that is, Alu elements that still replicate themselves throughout the genome. AluSx belongs to the AluS subfamily, which is no longer active.
The high similarity between these two Alu elements can be explained by the fact that the AluY subfamily derives from the AluS subfamily and was classified as a separate subfamily because of its activity [20,23,24,25,26,27,28,29].
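One way to make "absence of microhomology at the junction" operational is to count how far the deletion can slide along the reference without changing the rearranged product: identical bases shared by both sides of the junction make the exact breakpoint ambiguous, and that shared stretch is the microhomology. The text does not describe how this was assessed, so the function below is a hypothetical sketch; it uses 0-based, half-open Python indices (the deleted segment is `ref[start:end]`) and a made-up toy sequence.

```python
def junction_homology(ref: str, start: int, end: int) -> int:
    """Bases by which the deletion ref[start:end] can slide without changing the product.

    Shifting left is possible while the base before `start` equals the base before `end`;
    shifting right is possible while the base at `start` equals the base at `end`.
    A result of 0 means a blunt junction (no microhomology).
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("expected 0 <= start < end <= len(ref)")
    left = 0
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    return left + right


# Hypothetical toy reference: deleting "GCATTTTTT" leaves a junction flanked by a shared "GCA".
ref = "AAAAGCATTTTTTGCACCCC"
print(junction_homology(ref, 4, 13))                  # 3
print(junction_homology("AAAACCCCGGGGTTTT", 4, 8))    # 0 (blunt junction)
```

Microhomology in this sense is usually a few base pairs; it is distinct from the long stretch of Alu-Alu similarity, spanning hundreds of base pairs, that is expected to mediate NAHR.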
This is the first time that a fusion between the MKL1 and RANGAP1 genes has been sequenced.
The MKL1 gene is known to be involved in leukemia through a fusion with RBM15 [30,31]. However, the consequences of its fusion with RANGAP1 remain unclear. Neither of these genes is related to any phenotypic feature of the patient, and it is unlikely that their fusion is responsible for these characteristics.
The phenotype is better explained by the deletion of the EP300 gene, which causes the rare autosomal dominant Rubinstein-Taybi syndrome 2, a milder form of Rubinstein-Taybi syndrome that fits the clinical signs observed [32,33].
Flanking breakpoints with array data, then capturing and sequencing the flanked regions on an NGS platform, is an inexpensive and effective way to reveal the sequence of a breakpoint junction. Alu elements are important SINEs in recurrent non-replicative mechanisms, such as NAHR, that generate genomic imbalances in the human genome.
Delineating and studying breakpoints is of the utmost importance for understanding the influence of genomic architecture in clinical assays.

Methods
We assessed the DNA sample of a 20-year-old male patient presenting with double aortic arch, esophageal stricture, seizures, inguinal hernia DI, cryptorchidism, downslanting palpebral fissures, speech disorder, polydactyly, microcephaly and floating nose. Karyotype and FISH analysis for 22q13.2 were normal. The array study revealed a 607 kb deletion in chromosome 22q13.2 (Illumina Cyto-SNP-12). To define the breakpoint regions precisely, we reanalyzed the result in BlueFuse Multi software. Around the chr22:41,640,297 locus, we selected the three probes preceding the last probe identified as deleted and the three probes following the first probe identified as not deleted. This yielded two regions for sequencing: Chr22:41,025,934-41,036,329 and Chr22:41,640,297-41,651,034.
To capture these regions, we designed probes for Nextera Rapid Capture Custom in Illumina Design Studio. The experiments were performed according to the manufacturer's instructions, and libraries were sequenced on an Illumina MiSeq. Reads were aligned with Agilent SureCall, and the BAM file was analyzed in IGV.
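The probe-based flanking rule can be sketched as a small calculation. Here we read the rule as: for each copy-number transition between probe k and probe k + 1, take the window from three probes before probe k to three probes after probe k + 1. This interpretation, the helper names and the evenly spaced toy probe positions are assumptions for illustration only. The `span_bp` helper also shows that the array call chr22:41,032,863-41,640,297 corresponds to the ~607 kb deletion reported above (inclusive coordinates).

```python
def span_bp(first: int, last: int) -> int:
    """Length in bp of an interval given by inclusive 1-based start and end coordinates."""
    return last - first + 1


def transitions(calls: list[bool]) -> list[int]:
    """Indices k where the copy-number call changes between probe k and probe k + 1."""
    return [k for k in range(len(calls) - 1) if calls[k] != calls[k + 1]]


def flank_window(positions: list[int], k: int, pad: int = 3) -> tuple[int, int]:
    """Genomic window from `pad` probes before probe k to `pad` probes after probe k + 1."""
    lo = max(0, k - pad)
    hi = min(len(positions) - 1, k + 1 + pad)
    return positions[lo], positions[hi]


# Hypothetical toy array: 20 evenly spaced probes, probes 5..14 called as deleted.
probes = list(range(100, 2100, 100))
deleted = [5 <= i <= 14 for i in range(len(probes))]
windows = [flank_window(probes, k) for k in transitions(deleted)]
print(windows)                            # [(200, 900), (1200, 1900)]
print(span_bp(41_032_863, 41_640_297))    # 607435
```

Padding by whole probes rather than by a fixed number of base pairs adapts the window to local probe density, which is uneven across the genome.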

Availability of data and materials
All data generated or analyzed in this study are included in this published article.