A germline chimeric KANK1-DMRT1 transcript derived from a complex structural variant is associated with a congenital heart defect segregating across five generations

Structural variants (SVs) are challenging to detect and interpret, but their study provides novel biological insights and molecular diagnoses for rare diseases. The aim of this study was to resolve a 9p24 rearrangement segregating through five generations in a family with a congenital heart defect (congenital pulmonary and aortic valvular stenosis, and pulmonary artery stenosis) by applying a combined genomic analysis. The analysis involved multiple techniques, including karyotyping, chromosomal microarray analysis (CMA), FISH, whole-genome sequencing (WGS), RNA-seq and optical genome mapping (OGM). A complex 9p24 SV was suggested by the CMA results, which showed three interspersed duplicated segments. Combined WGS and OGM analyses revealed that the 9p24 duplications constitute a complex SV in which a set of breakpoints matches the boundaries of the CMA duplicated sequences. The proposed structure for this complex rearrangement implies three duplications associated with an inversion of a ~2 Mb region on chromosome 9, with a SINE element insertion at the more distal breakpoint. Interestingly, this hypothesized structure forms a chimeric transcript of the KANK1/DMRT1 loci, which was confirmed by RNA-seq on blood from 9p24 rearrangement carriers. Together with breakpoint amplification and FISH analysis, this combined approach allowed a deep characterization of this complex rearrangement. Although the genotype-phenotype correlation remains elusive from the molecular mechanism point of view, this study identified a large genomic rearrangement at 9p segregating with a familial congenital clinical trait, revealing a genetic biomarker that was successfully applied for embryo selection, changing the reproductive perspective of affected individuals.


Introduction
Structural variations (SVs) can have a significant impact on congenital human diseases (Schuy et al. 2022). Duplications of the short arm of chromosome 9 (9p) are frequent autosomal alterations in newborns (Temtamy et al. 2007; Guilherme et al. 2014; Cammarata-Scalisi 2019), with more than 200 cases reported to date (Sams et al. 2022). In the majority of cases, 9p duplication causes global developmental delay and a well-recognized spectrum of findings, such as craniofacial (mainly microcephaly and typical facial dysmorphisms) and hands/toes anomalies, accompanied by a broad spectrum of less common features, including kidney abnormalities, other skeletal malformations, and congenital heart defects (Sams et al. 2022; Nakagawa et al. 1999). This phenotypic heterogeneity can be explained by the large number of genes that can be affected and the variable size of the duplicated regions. In addition, 9p duplications are mostly due to segregation of derivative chromosomes from balanced rearrangements, resulting in additional chromosomal abnormalities outside 9p, and only a few of them are de novo pure 9p duplications (Sams et al. 2022; Tkemaladze et al. 2023; Krepischi-Santos and Vianna-Morgante 2003). Although 9p is a relatively gene-poor genomic segment, it contains more than 450 genes, some of them essential for human development. At least 50 of them were previously associated with human diseases.
Several efforts have been made to define specific loci within 9p responsible for each phenotypic manifestation (Sams et al. 2022; Wilson, Raj, and Baker 1985; Huret et al. 1988), resulting in the delimitation of a minimal critical subregion at 9p24-9p22 (Haddad et al. 1996; Fujimoto, Lin, and Schwartz 1998). However, there is still no consistent genotype-phenotype correlation (Cammarata-Scalisi 2019; Littooij et al. 2002). It is noteworthy that both 9p deletions and duplications can be associated with congenital heart defects (CHD) (Morrissette et al. 2003; Nakagawa et al. 1999; Sams et al. 2022), implicating one or more loci for this pathology in the short arm of chromosome 9. CHD is the most common birth defect in newborns, and a substantial cause of morbidity and mortality in infancy (Houyel and Meilhac 2021).
SVs, including duplications, can affect the expression of genes near breakpoints and even several hundred kilobases away (Kabirova et al. 2023). Here we report a complex cryptic rearrangement at 9p24 comprising three duplications, which segregates in a dominant pattern through five generations of a large pedigree; 22 carriers of the rearrangement were affected by an isolated CHD (pulmonary artery and aortic stenosis). Using a combination of genomic approaches and transcriptomic analysis, we dissect the structure of this complex SV.

Results
We investigated here a 5-generation family with 22 individuals presenting with a phenotype of nonsyndromic pulmonary artery and aortic stenosis.

Clinical description
The proband was followed by a pediatric cardiologist from 2 years of age, with a clinical and echocardiographic diagnosis of valvular aortic stenosis and both valvular and supravalvular pulmonary stenosis. At diagnosis, the maximal gradients were 25 mmHg through the aortic valve, 20 mmHg through the pulmonary valve and 15 mmHg at the supravalvular pulmonary artery. The severity of the condition slowly progressed, and at the age of 8 the gradients had risen to 34, 70 and 35 mmHg, respectively. He underwent percutaneous dilatation of the pulmonary valve, followed by a progressive worsening of the electrocardiographic, radiological, and echocardiographic signs, with an increase in aortic gradient to 114 mmHg at the age of 15. Cardiac surgery was then indicated, and he underwent implantation of a metallic aortic valve prosthesis coupled to supravalvular pulmonary artery angioplasty. Both immediate and late postoperative outcomes were uneventful, and he is still asymptomatic at 42 years of age, with near normal cardiac clinical, radiological, and echocardiographic examinations. The proband has had three children. The first baby had an early diagnosis of severe valvular aortic and valvular pulmonary stenosis requiring urgent cardiac surgery at 2 months of age; although technically successful, the surgery was followed by clinical deterioration and rapid demise. The two other children, conceived by in vitro fertilization followed by embryo selection based on the results of the genomic analysis described here, were unaffected by the CHD.
G-banded karyotype, whole-exome sequencing and chromosomal microarray analysis (CMA)
The G-banded karyotype of the proband showed no abnormalities (Supplementary Fig. 1a), and exome sequencing did not detect any pathogenic variants. CMA at 180K resolution (Agilent Technologies) was performed on the proband's DNA sample extracted from peripheral blood, revealing copy number variants (CNVs) of 9p24.3 sequences < 500 kb: two adjacent duplicated genomic segments separated by a segment with normal copy number (Supplementary Fig. 1b).
Although the duplicated segments partially encompass DOCK8, KANK1 and DMRT1 sequences, the resulting structure would not be expected to cause gene disruption. Moreover, this region is covered by several overlapping CNVs (duplications and deletions) in control populations (DGV; http://dgv.tcag.ca/dgv/app/home). Therefore, the 9p24 CNVs were classified as variants of unknown significance (VUS).
The proband is one of the 22 affected individuals of a large family with aortic and pulmonary artery stenosis transmitted in a dominant pattern through five generations (Fig. 2). To verify a possible association of the complex SV detected in the proband with the phenotype, 21 additional family members were evaluated by CMA (data not shown). The analysis revealed that the 9p24 rearrangement segregated with the CHD in all 11 affected individuals tested, while it was absent in all 11 unaffected relatives.

Whole-genome sequencing (WGS) and optical genome mapping (OGM)
To dissect the structure of the 9p chromosomal rearrangement discovered by CMA, we employed WGS and OGM techniques.
OGM at 100X coverage was analyzed regarding the 9p24 rearrangement and other possible SVs.
Although the duplications could be visualized in the copy number track, they were not called by either of the Access software pipelines (the CNV pipeline, which detects copy number changes > 500 kb, and the SV pipeline, which detects duplications > 30 kb). Analysis of the detected SVs showed the presence of three hybrid molecules at 9p24 (Fig. 3). These 9p24 hybrid molecules were manually analyzed based on the genomic coordinates at the breakpoint/junction sequences, identifying breakpoints and an inversion.
WGS data analysis identified the three 9p24 duplicated segments, showed two breakpoints with discordant reads, and indicated that both homologs of chromosome 9 contain at least one copy of the concordant sequence around the breakpoints, as shown in Fig. 4 (a and b). OGM data confirmed all the breakpoints identified by WGS analysis and revealed one additional junction, including an inverted segment.
Taken together, WGS and OGM analyses confirmed the three 9p24 duplications reported by CMA, disclosed a complex SV pattern, and identified a partially overlapping set of breakpoints (Fig. 4), all of them matching the boundaries of the CMA duplicated sequences.
The combined evaluation of the data allowed us to propose a structure for this complex rearrangement, which implies an inversion of a ~2 Mb region on chromosome 9 with partial duplications at the breakpoint regions. Given that this 2 Mb inversion is confirmed, the structure resembles a large inverted region containing both duplicated and non-duplicated sequences.
To validate the proposed 9p24 rearrangement structure, we amplified the dup2-dup3 breakpoint, which was not covered by split-reads in the WGS data, and sequenced it by the Sanger method. Due to the presence of homopolymer tracts within these regions, we were not able to obtain a complete end-to-end sequence. Thus, we employed NGS to analyze the obtained amplicons. This analysis confirmed the junction between the dup2 and inverted dup3 regions and revealed an insertion of a ~500 bp sequence between them that corresponds to a SINE element (Supplementary Fig. 2).
FISH analysis with BAC probes (Supplementary Fig. 3) mapped to the 9p24.3 duplicated segment dup1 showed that the additional genomic copies were not inserted in other chromosomes or moved to a distant region of chromosome 9. In addition, the rearranged chromosome 9 appeared to carry an inverted segment inserted between the two duplicated regions dup1 and dup2. We also performed FISH analysis with non-duplicated probes proximal to dup3 and within the inverted segment, respectively; the results also confirmed the presence of an inversion near dup3.
At this point, the 9p24 SV was considered a biomarker segregating with the phenotype, and this information was used for embryo selection by preimplantation genetic diagnosis (PGD) after in vitro fertilization (IVF). Two healthy non-carrier children were born after this procedure.

RNASeq analysis and splicing prediction using AI
The proposed structure of the 9p24 rearrangement suggests the formation of a chimeric transcript, which would include a 5′ fragment of KANK1 (exons 1 and 2 of the MANE-annotated isoform of KANK1, ENST00000382297.7) and a portion of the DMRT1 locus. To test this hypothesis, we performed an RNA-seq experiment on blood samples from three 9p24 rearrangement carriers and three unrelated controls. In all samples, we observed high DOCK8 expression and low-level expression of KANK1 (Fig. 5). In control samples, we detected no RNA-seq reads aligning to the DMRT1 locus, consistent with the testis-specific expression pattern of this gene (Raymond et al. 1999). However, in the 9p24 SV carriers, the duplicated region including the DMRT1 gene was covered by RNA-seq reads. All reads mapped in this region support transcription from the forward strand, which is the strand expected if transcription starts from the KANK1 gene promoter.
To predict the structure of the chimeric transcript that can be formed at the KANK1-DMRT1 breakpoint, we inferred splice acceptor (SA) and splice donor (SD) sites using the GENA tool (Fishman et al. 2023). Within the KANK1 and DMRT1 genes, the predicted splice sites match known exon junctions. As shown in Fig. 5, the KANK1 gene is truncated after splice-donor site SD1.
Within the region that is presumably transcribed in the rearranged sequence, GENA annotates one pair of SA and SD sites, suggesting the formation of an exon containing 48 nucleotides (ag-GTA CCT ACG CTT GGA AGT GCC AGC ACT ATT ACG TTT CAC TCT GAA CAG-gt). The next predicted splice site corresponds to the beginning of DMRT1 exon 2. Thus, the predicted transcript includes the first two KANK1 exons, 48 bases of additional sequence, and DMRT1 exons 2-9. The inserted 48-nucleotide sequence contains a stop codon in the KANK1 reading frame (Supplementary material); thus, the predicted chimeric transcript contains a premature stop codon and probably undergoes nonsense-mediated RNA decay. Moreover, KANK1 exon 2 and the DMRT1 exons are in different reading frames; thus, even if the additional 48 nucleotides are not included, the resulting chimeric transcript likely does not encode a functional protein.
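As an illustration of the stop-codon check on the 48-nucleotide predicted exon, the following minimal Python sketch scans the three forward reading frames of the sequence printed above. The function name `find_stop_codons` is hypothetical and is not part of GENA or of the analysis pipeline used here. Which of the three frames corresponds to the KANK1 reading frame depends on the phase at the KANK1 exon 2 boundary, which is given only in the Supplementary material, so the sketch reports all frames rather than assuming one.

```python
# Minimal sketch: scan the 48-nt predicted exon for stop codons in each forward frame.
# The sequence is copied from the text (flanking ag/gt splice signals removed).
STOP_CODONS = {"TAA", "TAG", "TGA"}

EXON_48 = (
    "GTA CCT ACG CTT GGA AGT GCC AGC ACT ATT ACG TTT CAC TCT GAA CAG"
).replace(" ", "")


def find_stop_codons(seq):
    """Return {frame_offset: [(position, codon), ...]} for forward frames 0, 1 and 2.

    Positions are 0-based offsets from the first base of the sequence.
    """
    seq = seq.upper()
    hits = {}
    for frame in range(3):
        hits[frame] = [
            (i, seq[i:i + 3])
            for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS
        ]
    return hits


if __name__ == "__main__":
    for frame, stops in find_stop_codons(EXON_48).items():
        print(f"frame +{frame}: {stops}")
```

Under these assumptions, a single in-frame stop (TGA) is found, in the frame starting at the third base of the exon; the other two forward frames are open across all 48 nucleotides.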
Although the coverage of this region is low, we were able to detect several RNA-seq reads confirming junctions of DMRT1 exons and the junction of KANK1 exon 2 with DMRT1 exon 2. Together with breakpoint amplification and FISH analysis, these data confirm the proposed structure of the 9p24 complex SV, indicating that there is an additional, truncated copy of KANK1 and that there is readthrough from this copy into the DMRT1 locus.

Discussion
Complex rearrangements are still an underestimated cause of genetic diseases, and in some loci they constitute up to 30% of the pathogenic CNVs (Schuy et al. 2022). The sensitivity of the available methods for SV detection is especially limited for resolving complex SVs involving multiple chromosomal segments. This study confirms the importance of a multiomics approach combining different techniques, such as CMA, FISH, WGS, OGM and RNA-seq, to fully dissect a complex chromosomal rearrangement. CMA revealed the duplications, whereas WGS/OGM allowed the refinement of the breakpoints, revealed the presence of an inversion, phased the multiple rearrangements in cis, and provided a framework for the proposed genomic structure. Although the complex nature of the 9p24 SV was revealed by OGM, which confirmed breakpoints already detected by WGS and revealed a new one, the duplicated segments were not called, exposing a limitation of the system. FISH was crucial to show that the duplicated segments mapped to 9p24, and also to support the proposed structure of the rearrangement, with an inversion associated with duplications. Finally, RNA-seq provided experimental evidence of chimeric KANK1/DMRT1 transcripts, and in silico AI-based predictive tools assisted in the analysis of the chimeric transcript structure.
Duplications/deletions restricted to the 9p24.3 cytoband, including DOCK8 and KANK1, have been reported across multiple neurodevelopmental/psychiatric phenotypes (Capkova et al. 2021; Glessner et al. 2017). DOCK8 biallelic mutations cause a recessive condition (https://omim.org/entry/243700); its disruption in heterozygosity was identified in a few patients with mental retardation and/or seizures (Griggs et al. 2008), who were not further evaluated by exome analysis for the presence of additional pathogenic variants. This is the case for several reports of 9p24.3 CNV cases, and current data can only support a possible contribution to neurodevelopmental/psychiatric phenotypes in a multifactorial model. Therefore, an association of 9p24.3 heterozygous CNVs with clinical findings, as major variants with high impact, is still controversial. CNVs encompassing DOCK8 or KANK1 are detected in the general population at a relatively high frequency, and a possible contribution to a rare congenital phenotype should be evaluated with caution. The absence of a neurodevelopmental phenotype associated with the DOCK8/KANK1 duplication (dup1) disclosed in our family is therefore not surprising.
Haploinsufficiency of DMRT1, 2 and 3, mainly due to 9p24.3 […]). In the current case, only the DMRT1 gene is involved (dup2), and similar phenotypes are not present in the 9p24 SV carriers reported here. In association with the duplications and inversion, we detected a non-reference (for both GRCh38 and T2T) SINE insertion at one of the breakpoints, disrupting one of the copies of the DMRT1 gene. SINEs are transposable elements, and their mobilization has long been associated with evolution and human diseases (Akrami and Habibi 2014; Pfaff, Singleton, and Kõks 2022). Several cases linked with SINE-VNTR-Alu rearrangements show aberrant splicing patterns, and we cannot exclude the possibility that this insertion alters the DMRT1 expression pattern. Copy number variants overlapping the short arm of chromosome 9 were already associated with CHD (ref), implicating one or more loci in this genomic region. The genetic landscape of CHD is complex, and an interesting emerging feature is that CHD mutations often alter gene/protein dosage (Fahed et […] SMARCA2 is not disrupted by the rearrangement, but it is included in the inverted segment. Haploinsufficiency of SMARCA2 causes two dominant developmental conditions, namely blepharophimosis-impaired intellectual development syndrome (OMIM #619293) and Nicolaides-Baraitser syndrome (OMIM #601358), with other clinical signs including CHD. However, as both conditions are associated with severe syndromic intellectual disability, it is unlikely that SMARCA2 expression is disrupted by the rearrangement.
Regarding KANK1, deletion of the paternal allele was reported in a single family to cause the condition named cerebral palsy, spastic quadriplegic 2 (OMIM #612900); however, no subsequent studies support this association. Indeed, chromosome 9 uniparental disomy is not related to imprinting syndromes (Elbracht et al. 2020), and clinical findings in UPD(9) are commonly attributed to homozygous variants in genes related to recessive conditions or to residual mosaic trisomy. Currently, there is no clinical evidence for haploinsufficiency or triplosensitivity of KANK1 (KANK1 curation results for Dosage Sensitivity). Nevertheless, we found evidence in the literature proposing a role for KANK1 in cardiac development (Nguyen and Lee 2022; Botos et al. 2023). KANK genes encode scaffold proteins that bridge microtubules to focal adhesion sites (Botos et al. 2023; Pan et al. 2018). Kank1 protein expression was shown to be widely distributed in various murine tissues, with relatively high levels in cardiac muscle (Nguyen and Lee 2022). In humans, the longest transcript (NM_015158) shows tissue-specific expression, predominantly in heart and kidney. In addition, it was found in an injury-specific gene regulatory network in a transcriptome analysis related to cardiac regeneration in the zebrafish (Botos et al. 2023).
It is not clear how a complex SV involving three DNA segments, with six breakpoints (two in each CNV) and three breakpoint junctions, was formed. On both sides flanking the dup2-dup3 breakpoint, we observed microhomologies of simple repeats composed of polyA/T sequences. However, the insertion of a non-reference SINE element between dup2 and dup3 argues against non-allelic recombination caused by homology of these polyA/T sequences. Alternatively, the SINE insertion might have been present in the ancestral chromosome on which the rearrangement took place, or it might be an additional event that occurred after the SV was formed. It is interesting to note that the transcriptome analysis detected a chimeric transcript encompassing KANK1 and DMRT1 exons, possibly pointing to a modified product of KANK1 as a candidate for the phenotype. The role of chimeric transcripts as a cause of congenital defects is poorly explored (Zuccherato et al. 2016), in contrast to fusion transcripts commonly described as somatic events in cancer (Salokas, Dashi, and Varjosalo 2023). Only isolated cases have been reported in which chimeric transcripts (gene fusions) were detected as the underlying molecular cause of developmental/neurological phenotypes (Boone et al. 2014; Ferrari et al. 2017). Recently, two studies detected chimeric transcripts using RNA-seq data in rare congenital diseases, one of them in individuals with birth defects (Yamada et al. 2021; Oliver et al. 2019), leading to an increased diagnostic rate. However, in silico analysis in the current case predicted a premature stop codon in the fusion transcript, which would probably undergo nonsense-mediated RNA decay. A possible contribution of this KANK1-DMRT1 fusion gene to the cardiac phenotype remains to be fully explored.
Considering the recent report of ultra-long-range interactions between active regulatory elements (Friman et al. 2023), distant 9p genes with normal copy number could be misregulated due to this 9p rearrangement, which makes the derivation of genotype-phenotype relationships even more complicated. Importantly, the study of this SV was crucial for genetic counseling and reproductive choices of the family. Even without the identification of the precise mechanism underlying the CHD phenotype, this study identified the SV as a biomarker that was used to identify embryos at risk and to select for implantation those without the CHD risk. This strategy resulted in healthy offspring for at least one couple.

Patients and genomic samples
Written informed consent for this study was obtained from affected individuals or their parents. Genomic DNA samples were extracted from peripheral blood of 22 family members (n = 11 patients and n = 11 unaffected relatives) using standard procedures (phenol-chloroform followed by ethanol precipitation). RNA samples were obtained from peripheral blood of three male patients and three unrelated male controls using the RNeasy Mini Kit (QIAGEN).

GTG-banded karyotype, and chromosomal microarray analysis (CMA)
Short-term peripheral blood cultures (72 h) were performed in the presence of phytohemagglutinin, and GTG-banding was obtained according to standard methods. FISH analysis on metaphase spreads and interphase preparations was performed using BAC clones with genomic sequences mapped to the short arm of chromosome 9, as previously described (Krepischi-Santos et al. 2009).

Whole-genome sequencing (WGS) analysis
WGS data were obtained from six individuals (three carriers of the 9p rearrangement and three unrelated controls). Briefly, genomic libraries were constructed with 1 µg of genomic DNA and sequenced on the Illumina HiSeq 2500 platform using 150-base paired-end reads (~30x coverage). Reads were aligned to the GRCh38 human genome reference using the BWA algorithm (Li 2013). For analysis, based on the alignment results, we computed tracks showing the depth of coverage (using deepTools bamCoverage) (Ramírez et al. 2016) and discordant read pairs (using samtools) (Danecek et al. 2021). These data were visualized using the IGV software. Breakpoint structures were assessed based on the following filters: the presence of split-reads (≥ 4 in the case, none in the control) with a matching supplementary segment, and the presence of discordantly aligned mates (≥ 5 in the case, ≤ 1 in the control) with matching mate alignment coordinates. The orientation of the DNA segments was determined based on the alignment locations and strands. For the two breakpoints where discordant reads were detected, we identified single nucleotide variants (SNVs) near the breakpoints. By analyzing the SNV distribution in reads, we classified all pairs as: 1) concordant read pair with the reference sequence; 2) discordant read pair with an alternative sequence; 3) concordant read pair with an alternative sequence.
These data indicate that both homologs of chromosome 9 contain at least one copy of the concordant sequence around the breakpoint, as shown in the first two lines of Fig. 4B.
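The read-support thresholds described above can be expressed as a simple filter. The Python sketch below is illustrative only and is not the code used in this study: the `JunctionEvidence` container, the `passes_filters` function and the `require_both` switch are hypothetical names introduced here, and the read counts are assumed to have been tallied beforehand (for example from split and discordant alignments inspected in IGV). Because one junction in this study was supported by discordant pairs but not by split-reads, the sketch lets the caller choose whether both lines of evidence are required.

```python
# Minimal sketch of the breakpoint read-support filters (thresholds taken from the text).
from dataclasses import dataclass


@dataclass
class JunctionEvidence:
    """Pre-computed read support for one candidate junction (hypothetical container)."""
    split_case: int          # split-reads with matching supplementary segment, carrier
    split_control: int       # same count in the control
    discordant_case: int     # discordant mates with matching coordinates, carrier
    discordant_control: int  # same count in the control


def passes_filters(ev, require_both=True,
                   min_split_case=4, max_split_control=0,
                   min_disc_case=5, max_disc_control=1):
    """Apply the split-read and discordant-pair thresholds to one junction."""
    split_ok = (ev.split_case >= min_split_case
                and ev.split_control <= max_split_control)
    disc_ok = (ev.discordant_case >= min_disc_case
               and ev.discordant_control <= max_disc_control)
    # require_both=True demands both evidence types; False accepts either one.
    return (split_ok and disc_ok) if require_both else (split_ok or disc_ok)


if __name__ == "__main__":
    well_supported = JunctionEvidence(6, 0, 9, 0)
    discordant_only = JunctionEvidence(0, 0, 7, 1)
    print(passes_filters(well_supported))                       # both criteria met
    print(passes_filters(discordant_only))                      # fails the split-read criterion
    print(passes_filters(discordant_only, require_both=False))  # discordant pairs suffice
```

Keeping the thresholds as keyword arguments makes it straightforward to test how sensitive a candidate junction is to the chosen cut-offs.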

Optical genome mapping (OGM) data analysis
OGM was conducted with ultra-high molecular weight DNA samples (> 150 kb) extracted from peripheral blood cells of the proband using the Bionano Prep SP Blood and Cell DNA Isolation kit (Bionano, San Diego, CA, USA). DNA labeling was performed using the DLS DNA Labeling Kit (Bionano, San Diego, CA, USA) to add fluorophores to the specific motif "CTTAAG", and the sample was run on the Saphyr chip to collect data on the Saphyr System (Bionano, San Diego, CA, USA) at 100x coverage. OGM data were analyzed using the De Novo Assembly pipeline, followed by CNV and SV pipelines, and visualized using the Bionano Access software.
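To make the labeling step concrete, the following Python sketch locates occurrences of the "CTTAAG" motif in a DNA string, which is the kind of in silico label map that optical maps are compared against. It is a toy illustration under stated assumptions, not part of the Bionano software: the function names `reverse_complement` and `label_sites` and the example sequence are hypothetical. Because CTTAAG equals its own reverse complement, scanning one strand is sufficient.

```python
# Minimal sketch: find label-motif positions in a DNA sequence (toy example).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def label_sites(seq, motif="CTTAAG"):
    """Return 0-based start positions of every (possibly overlapping) motif occurrence."""
    seq = seq.upper()
    sites = []
    start = seq.find(motif)
    while start != -1:
        sites.append(start)
        start = seq.find(motif, start + 1)
    return sites


if __name__ == "__main__":
    toy = "AACTTAAGGGCTTAAGT"  # made-up sequence, not from the 9p24 region
    print(label_sites(toy))
    print(reverse_complement("CTTAAG") == "CTTAAG")
```

The spacing between consecutive positions returned by `label_sites` corresponds to the inter-label distances that an optical map measures along a single molecule.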

RNA-seq
Total RNA samples extracted from peripheral blood of three patients and three unrelated male controls were used to build cDNA libraries using the TruSeq® Stranded Total RNA LT kit (with Ribo-Zero™ Gold) (Illumina, USA). Sequencing was performed on the NextSeq 500 platform with the Mid Output v2 Kit (150 cycles) (Illumina, USA). The FASTQ files were aligned against the ribosomal reference sequence (NCBI, 12/2017) using the BWA software [26], version 0.7.17-r1188, in MEM mode with the standard parameters, except for the -t 4 parameter. Reads not aligned to ribosomal sequences were then aligned against the human genome reference sequence (GRCh37/hg19) using the STAR software [27], version 2.6.1a_08-27. The annotation database (GTF file) was Ensembl version 87, in the same build as the human genome reference (GRCh37).

Breakpoint amplicon sequencing
One of the breakpoints was amplified by PCR using the proband's and control genomic DNA as template (94°C, 4 min; (94°C, 30 sec; 67°C, 40 sec; 72°C, 2 min) × 14 cycles, decreasing the annealing temperature by 0.5°C after each cycle; (94°C, 30 sec; 60°C, 40 sec; 72°C, 2 min) × 23 cycles; 72°C, 10 min; 4°C). Primer […]

Figures

Figure 1 Chromosomal
The plot shows the copy number profile (log2 ratios, Y axis) of the distal region of the short arm of chromosome 9 (9p24), with probes (black dots) depicted according to their genomic coordinates (from pter to the centromere, X axis). To further refine the duplication breakpoints, we applied a CMA (array-CGH) based on a custom 44K (Agilent) platform covering the 9p sequences at higher resolution, which confirmed the presence of two adjacent 9p24.3 duplications (dup1 and dup2) and disclosed a third one (dup3) at 9p24.3 (blue shadows and dark blue horizontal lines in the 9p24 ideogram). Above the CNV plot, regions with polymorphic CNVs are presented (pink horizontal lines), as well as genes mapped to the segment (black lines) with their respective exons. Image extracted from Nexus Copy Number software (Bionano).

Figure 2 Family
The arrow indicates the proband (black symbols denote affected individuals with pulmonary artery and aortic stenosis; orange is a female patient who was born only with a heart murmur). All affected individuals evaluated by CMA were carriers of the 9p24 rearrangement (red asterisk), while evaluated unaffected family members were non-carriers (black minus symbol). The two living affected individuals of the last generation were not tested. The two unaffected children of the proband were conceived by in vitro fertilization followed by embryo selection, based on the results of the genomic analysis described here.