Mate-pair genome sequencing reveals structural variants for idiopathic male infertility

Currently, routine genetic investigation for male infertility includes karyotyping analysis and PCR for Y chromosomal microdeletions to provide prognostic information such as sperm retrieval success rate. However, over 85% of male infertility cases remain idiopathic. In this retrospective cohort analysis, we assessed 101 male patients with primary infertility who had previously received negative results from standard-of-care tests. Mate-pair genome sequencing (large-insert-size library), an alternative long-DNA sequencing method, was performed to detect clinically significant structural variants (SVs) and copy-number neutral absence of heterozygosity (AOH). Candidate SVs were filtered against our in-house cohort of 1077 fertile men. Genes disrupted by potentially clinically significant variants were correlated with single-cell gene expression profiles of human fetal and postnatal testicular developmental lineages and adult germ cells. Follow-up studies were conducted for each patient with clinically relevant findings. Molecular diagnoses were made in 11.1% (7/63) of patients with non-obstructive azoospermia and 13.2% (5/38) of patients with severe oligozoospermia. Among them, 12 clinically significant SVs were identified in 12 cases, including five known syndromes, one inversion, and six SVs with direct disruption of genes by intragenic rearrangements or complex insertions. Importantly, a genetic defect related to intracytoplasmic sperm injection (ICSI) failure was identified in a patient with non-obstructive azoospermia, illustrating the value of an etiologic diagnosis beyond determining the sperm retrieval rate. Our study reveals a landscape of diverse genomic variants in 101 men with idiopathic infertility, advancing understanding of the underlying mechanisms of male infertility and informing clinical management.


Introduction
Infertility affects around 8-12% of couples worldwide, and a male factor accounts for approximately 50% of cases (Agarwal et al. 2021). Male infertility is clinically heterogeneous, with variable phenotypic presentations. Spermatogenesis is a highly complex process that proceeds through successive mitotic, meiotic, and post-meiotic phases involving multiple molecular pathways (Krausz 2011). Human spermatogenesis involves dynamic transcription of over 4000 genes in various germ cell subtypes; thus, defects in any of these genes may disrupt pathways and lead to male infertility (Hotaling 2014). Currently, known genetic factors, including numerical abnormalities (specifically sex chromosome aneuploidies) and Y chromosome microdeletions (Alhathal et al. 2020), are etiologic in approximately 15% of male infertility cases (Krausz and Riera-Escamilla 2018). In particular, identifying genetic factors in patients with nonobstructive azoospermia (NOA) is of utmost clinical importance because it provides prognostic information (such as sperm retrieval success rate), so that unnecessary invasive testicular sperm extraction (TESE) can be avoided (Krausz 2011).

Zirui Dong, Jicheng Qian, and Tracy Sze Man Law contributed equally.
Recently, an increasing number of studies have shown that male infertility can be attributed to monogenic disorders caused by single-nucleotide variants (SNVs) and/or small insertions/deletions (Indels) (Alhathal et al. 2020; Kherraf et al. 2022). In comparison, copy-number variants (CNVs) affecting dosage-sensitive genes have been reported in only a limited number of cases. In addition, gross chromosomal structural rearrangements, including balanced translocations, have been reported in patients with male infertility, and gene disruption or dysregulation has been demonstrated as the pathogenic mechanism in some cases (Schilit et al. 2020; Yammine et al. 2021). However, the contribution of cryptic structural variants (SVs, including deletions, duplications, translocations, inversions, and insertions) that are not detectable by karyotyping analysis remains largely unknown. Lastly, the prevalence of copy-number neutral absence of heterozygosity (AOH) resulting from uniparental disomy and/or parental consanguinity in these patients has not yet been well investigated. A method that can comprehensively investigate the spectrum of SVs and AOH, and their contribution to male infertility, is therefore clearly warranted.
Currently, standard genome sequencing (GS) relies on massively parallel sequencing of small-insert-size DNA libraries on a next-generation sequencing platform. While SNVs and small indels can be accurately detected, its performance in detecting SVs is limited across challenging regions of the genome, particularly those containing repetitive elements. Furthermore, it is not always possible to resolve the orientation and location of complex SVs (for example, whether a duplication is in tandem or inserted elsewhere in the genome). In comparison, advances in long-read sequencing have revealed a broader spectrum of SVs, including complex and cryptic SVs that are invisible to conventional methods, both in the general population (Jain et al. 2018) and in human diseases (Stephens et al. 2018). However, the high cost and large requirement for high-molecular-weight DNA associated with these platforms prevent routine clinical application. To overcome these limitations, we developed an alternative long-DNA sequencing method that uses large DNA inserts (~5 kb, namely mate-pair) and low-pass (low-coverage, high-throughput) GS (Dong et al. 2019a, b) with a longer read length (100 bp) than the jumping library (~26 bp) (Talkowski et al. 2011). It identifies and delineates SVs and copy-number neutral AOH (Dong et al. 2019b, 2021a) with an increased diagnostic yield compared with short-read methods (Dong et al. 2019b). Using this approach, we previously profiled the spectrum of SVs, particularly balanced translocations and inversions, in 1077 fertile men (Dong et al. 2019a). Meanwhile, functional validation of candidate variants is challenging, as transcriptome studies of peripheral blood (the most common sample type) may not recapitulate the aberrant expression caused by SVs. In addition, bulk-tissue RNA-seq may not resolve the specific cell type(s) of interest.
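The reason a low-pass mate-pair library can still capture breakpoints becomes clearer with a back-of-the-envelope comparison of read depth (sequenced bases per genome position) and physical coverage (the number of inserts spanning each position). The sketch below uses the ~5-kb insert and 100-bp reads described here and the minimum of 60 million read pairs per case given in the Methods. The ~3.1-Gb haploid genome size is our assumption, and the textbook formulas ignore unmappable regions, duplicate reads, and the spread of insert sizes.

```python
"""Back-of-the-envelope coverage for a low-pass mate-pair library (illustrative only)."""

GENOME_SIZE_BP = 3.1e9  # assumed haploid human genome size; not a figure from this study


def read_depth(read_pairs: float, read_length_bp: int) -> float:
    """Mean sequencing depth: bases sequenced (two reads per pair) / genome size."""
    return read_pairs * 2 * read_length_bp / GENOME_SIZE_BP


def physical_coverage(read_pairs: float, insert_size_bp: int) -> float:
    """Mean number of sequenced inserts spanning a given base."""
    return read_pairs * insert_size_bp / GENOME_SIZE_BP


if __name__ == "__main__":
    pairs = 60e6  # minimum read pairs per case given in the Methods
    print(f"read depth:        {read_depth(pairs, 100):.1f}X")          # 3.9X
    print(f"physical coverage: {physical_coverage(pairs, 5_000):.0f}X")  # 97X
```

Under these assumptions, each base is spanned by roughly 97 inserts even though only about four reads cover it. This is why chimeric pairs can pinpoint a breakpoint at a read depth far too low for SNV calling.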
Herein, we performed mate-pair GS on 101 infertile men with negative findings from karyotyping analysis and Y microdeletion detection (Dong et al. 2019b) to investigate the genetic etiologies attributable to SVs and copy-number neutral AOH, which are underappreciated by standard-of-care tests. We used single-cell RNA-seq datasets collected during human testicular development and from adult germ cells (Guo et al. 2021; Han et al. 2020; Karlsson et al. 2021) to pinpoint the precise cell types and timepoints in which target gene(s) affected by SVs are expressed. With this information, we sought to better understand the relevance of these genes to male infertility.

Study participants
This study was approved by the Joint Chinese University of Hong Kong-New Territories East Cluster Clinical Research Ethics Committee (CREC Ref. No. 2021.719). This retrospective study included 101 consecutive patients with non-obstructive azoospermia (absence of sperm) or severe oligozoospermia (< 5 million/mL) based on two semen analyses. All patients had previously received negative results from G-banded chromosome analysis and Y microdeletion assays (see Supplemental Materials and Methods) between 2018 and 2022. Informed written consent was obtained from each participant at the time of genetic testing or of microsurgical epididymal sperm aspiration (MESA) or testicular sperm extraction (TESE). Johnsen scores were recorded from the pathological examination.

DNA preparation, genome sequencing and variant identification
The DNA remaining from Y microdeletion detection was quantified using a Qubit dsDNA HS Assay kit (Invitrogen, Carlsbad, CA, USA), and DNA integrity was assessed by agarose gel electrophoresis. After QC, the DNA was subjected to mate-pair library construction and low-pass GS (Dong et al. 2019b) on an MGISEQ-2000 platform (MGI Tech Co., Ltd., Shenzhen, China). Each case was sequenced to a minimum of 60 million read pairs (100-bp reads; equivalent to ~4X read depth) (Dong et al. 2021a). Rare SVs were then identified from chimeric read pairs and read-depth differences, and AOH was identified from variant allele fractions, using our previously reported in-house bioinformatics pipelines (Dong et al. 2019b, 2021a). Variants were considered rare if they were absent from the 1000 Genomes Project data in our previous analyses (Dong et al. 2018, 2021b). They were annotated and interpreted according to the guidelines of the American College of Medical Genetics and Genomics (ACMG) (Del Gaudio et al. 2020; Riggs et al. 2019) (see Supplemental Materials and Methods). SVs were filtered out if they were present in any of the 1,077 fertile men reported in our previous study (Dong et al. 2019a). During interpretation, genes involved in or likely affected by the rearrangements were reviewed for associations with male infertility, oligozoospermia or azoospermia, spermatogenesis deficiency, and ciliopathies (Kherraf et al. 2022).
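The Methods state that SVs were called from chimeric read pairs by the in-house pipeline (Dong et al. 2019b, 2021a), whose code is not reproduced here. As a generic illustration only, the sketch below shows the discordant read-pair logic common to many SV callers. It assumes that pairs have been normalized to forward-reverse (FR) orientation. Real mate-pair libraries are often sequenced in reverse-forward orientation, and the insert-size bounds used here are hypothetical.

```python
"""Discordant read-pair classification (generic sketch, not the study's pipeline)."""
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int     # mapped position of read 1
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(p: ReadPair, min_insert: int = 2_000, max_insert: int = 8_000) -> str:
    """Assumes FR-normalized pairs; the insert-size bounds are hypothetical and would
    normally come from the library's observed insert-size distribution."""
    if p.chrom1 != p.chrom2:
        return "inter-chromosomal"
    # Order the two reads by genomic coordinate.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(p.pos1, p.strand1), (p.pos2, p.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"            # both reads on the same strand
    if left_strand == "-":
        return "tandem duplication"   # everted (reverse-forward) pair
    span = right_pos - left_pos
    if span > max_insert:
        return "deletion"             # FR pair spanning more than expected
    if span < min_insert:
        return "short insert"         # e.g. insertion or short-fragment contamination
    return "concordant"


if __name__ == "__main__":
    # Hypothetical pair straddling the junction of a tandem duplication.
    print(classify_pair(ReadPair("chr8", 74_516_800, "+", "chr8", 74_500_900, "-")))
```

In practice, single discordant pairs are noisy. Callers cluster several pairs supporting the same event and combine them with read-depth changes, as the Methods describe.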
High read-depth GS was performed for selected cases to identify or exclude additional clinically significant SNVs/Indels. It was used in two situations: when mate-pair GS identified a pathogenic deletion in a gene associated with compound heterozygous or autosomal recessive (AR) inheritance (n = 2; cases 20C1594 and 20C2993), and when multiple regions of AOH were found (n = 3; cases 20C0911, 22C0195 and 21C3049). For each case, 500 ng of genomic DNA was used for library construction and sequenced to a read depth of ~40-fold on an MGISEQ-2000 platform (MGI). After quality control, paired-end reads were aligned to the human reference genome (GRCh37/hg19) with BWA, and SNVs/Indels were called with HaplotypeCaller (Genome Analysis Toolkit v3.4, Broad Institute). Annotation was performed with ANNOVAR using public and in-house datasets. Potentially deleterious SNVs/Indels were identified and classified following ACMG guidelines (Richards et al. 2015) (see Supplemental Materials and Methods).
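Follow-up GS in AOH cases is valuable because recessive variants that are homozygous by descent should lie inside the AOH intervals. The sketch below illustrates only that intersection step, using VCF-style genotype strings. The AOH interval and all variants except the HBB position reported in the Results are hypothetical, and this is not the annotation workflow used in the study.

```python
"""Select homozygous calls inside AOH intervals (illustrative sketch)."""
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    genotype: str  # VCF-style GT, e.g. "0/1", "1/1", "1|1"
    gene: str


Region = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive

HOM_ALT = {"1/1", "1|1"}


def homozygous_in_aoh(variants: List[Variant], aoh: List[Region]) -> List[Variant]:
    """Keep homozygous-alternate variants that fall inside any AOH region."""
    return [
        v for v in variants
        if v.genotype in HOM_ALT
        and any(v.chrom == c and start <= v.pos <= end for c, start, end in aoh)
    ]


if __name__ == "__main__":
    # Hypothetical AOH boundaries; the real intervals are reported in Table 2.
    aoh_regions = [("chr11", 1_000_000, 20_000_000)]
    calls = [
        Variant("chr11", 5_248_155, "1/1", "HBB"),     # position from the Results
        Variant("chr11", 30_000_000, "1/1", "geneX"),  # hypothetical, outside AOH
        Variant("chr11", 6_000_000, "0/1", "geneY"),   # hypothetical, heterozygous
    ]
    print([v.gene for v in homozygous_in_aoh(calls, aoh_regions)])  # ['HBB']
```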

Variant verification
CNVs were verified by either chromosomal microarray analysis (CMA) or quantitative PCR (qPCR; see Supplemental Materials and Methods). For verification of structural rearrangements (Dong et al. 2021a) and SNVs/Indels, primers targeting the rearrangement junctions or the regions containing the SNVs/Indels were designed using Primer3, Primer-BLAST (NCBI), and In-Silico PCR (UCSC). PCR was performed on the case and negative control samples in parallel, and products were sequenced on an ABI 3730 DNA Analyzer (Applied Biosystems). Sanger sequencing results were aligned to the reference genome with BLAT (UCSC) to verify and delineate breakpoints or to confirm the genotypes of SNVs/Indels.
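For a tandem duplication, a forward primer near the end of the duplicated segment and a reverse primer near its start face away from each other on the reference. They face toward each other across the novel junction, so a product forms only from the rearranged allele. The toy sketch below shows how that expected junction sequence can be assembled in silico before primer design. The sequence is hypothetical, and this is not the authors' design procedure, which used Primer3, Primer-BLAST and In-Silico PCR.

```python
"""Expected junction of a tandem duplication (toy sequence; illustrative only)."""


def tandem_dup_allele(ref: str, start: int, end: int) -> str:
    """Derived allele when ref[start..end] (1-based, inclusive) is duplicated in tandem."""
    return ref[:end] + ref[start - 1:end] + ref[end:]


def dup_junction(ref: str, start: int, end: int, flank: int) -> str:
    """Novel end-to-start junction: last `flank` bases of the segment + its first `flank` bases."""
    if not 0 < flank <= end - start + 1:
        raise ValueError("flank must fit inside the duplicated segment")
    return ref[end - flank:end] + ref[start - 1:start - 1 + flank]


if __name__ == "__main__":
    ref = "AAAACCCCGGGGTTTT"  # toy reference
    allele = tandem_dup_allele(ref, 5, 12)
    junction = dup_junction(ref, 5, 12, flank=3)
    print(allele)                                         # AAAACCCCGGGGCCCCGGGGTTTT
    print(junction, junction in allele, junction in ref)  # GGGCCC True False
```

The check that the junction is absent from the reference mirrors why the negative controls should give no product.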

Single-cell RNA-seq data analysis
To relate genes likely affected by SVs to their expression during human gonadal and testicular development, we retrieved two published scRNA-seq datasets (Guo et al. 2021; Han et al. 2020) and processed them with a unified analytical pipeline using functions provided by Seurat v4.0.6 (Hao et al. 2021). In brief, for each dataset, data QC was performed by filtering out (i) cells with < 200 expressed genes and (ii) genes expressed in < 10 cells. Then, 'NormalizeData' in Seurat v4.0 was applied to normalize the sparse single-cell gene expression matrix. Subsequently, normalization was performed using 'SCTransform' (Hafemeister and Satija 2019), followed by batch correction with 'Harmony' as recommended (Korsunsky et al. 2019). Highly variable genes were identified with 'FindVariableFeatures', and the top 2000 were used for dimensionality reduction by principal component analysis (PCA). The top 30 PCs were then used to build a shared nearest-neighbor (SNN) graph with 'FindNeighbors' for clustering. Overall, the integrated dataset consisted of 74,845 cells derived from 12 timepoints in prenatal and postnatal testicular development [gestational week (GW): GW6, GW7, GW8, GW9, GW10, GW11, GW12, GW13, GW15, GW16, GW26 and 5 months]. UMAP was plotted, and each cell was labelled either by cell type (using the markers provided in the original studies) or by sampling timepoint (Supplementary Figure S1). The analytical results were curated in our newly established Temporal Expression during Development Database (TEDD) (Zhou et al. 2022). In addition, the UMAP results from the integrated dataset were compared with those from each of the two original studies, and cell types clustered consistently (Supplementary Figure S2), indicating that any batch effect was minimal. Furthermore, single-cell expression data from the Human Protein Atlas were also obtained by querying its browser (Karlsson et al. 2021).
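The two QC thresholds and the log-normalization step can be made concrete with a small stdlib-only sketch on a sparse toy matrix. The study itself used Seurat in R. This Python version only mirrors the stated cut-offs and the log1p(count / cell total × 10,000) form of Seurat's default LogNormalize method, and it does not reproduce SCTransform or Harmony. Applying the gene filter before the cell filter is our assumption.

```python
"""scRNA-seq QC thresholds and log-normalization (toy sketch; the study used Seurat in R)."""
import math
from collections import Counter
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # {cell_id: {gene: raw UMI count}}


def qc_filter(counts: Counts, min_genes_per_cell: int = 200,
              min_cells_per_gene: int = 10) -> Counts:
    """Drop genes detected in too few cells, then cells with too few detected genes.

    Filtering genes before cells is an assumption of this sketch.
    """
    cells_per_gene = Counter(g for genes in counts.values() for g, c in genes.items() if c > 0)
    keep = {g for g, n in cells_per_gene.items() if n >= min_cells_per_gene}
    out: Counts = {}
    for cell, genes in counts.items():
        kept = {g: c for g, c in genes.items() if c > 0 and g in keep}
        if len(kept) >= min_genes_per_cell:
            out[cell] = kept
    return out


def log_normalize(cell: Dict[str, int], scale_factor: float = 10_000) -> Dict[str, float]:
    """log1p(count / cell total * scale_factor), the form of Seurat's default LogNormalize."""
    total = sum(cell.values())
    return {g: math.log1p(c / total * scale_factor) for g, c in cell.items()}


if __name__ == "__main__":
    toy = {"c1": {"A": 3, "B": 1, "C": 0}, "c2": {"A": 1, "B": 2}, "c3": {"A": 5}, "c4": {"D": 1}}
    kept = qc_filter(toy, min_genes_per_cell=2, min_cells_per_gene=2)  # thresholds shrunk for toy data
    print(sorted(kept))  # ['c1', 'c2']
    print(log_normalize(kept["c1"]))
```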

Statistical analysis
Statistical analysis was performed using R (version 4.0.2). A p value < 0.05 was considered statistically significant.

Cohort characteristics and the landscape of genomic variants
A total of 101 unrelated patients presenting with NOA (n = 63) or severe oligozoospermia (n = 38) (Table 1) without a positive finding from routine standard-of-care tests were included in this study. The clinical workup included detailed hormone analysis (follicle-stimulating hormone [FSH] and testosterone), two semen analyses, and testicular ultrasound imaging. Overall, low-pass mate-pair GS identified clinically significant SVs in 12 patients (11.9%; Tables 2 and 3), and all findings were confirmed by orthogonal methods. Follow-up was conducted for each of these patients regarding the outcomes of sperm retrieval and pregnancy with in vitro fertilization (IVF).

Clinically significant variants in patients with NOA
Among the 63 NOA patients, mate-pair GS identified seven clinically relevant SVs in seven patients (11.1%, Table 2).
Case 20C1113 was a 33-year-old patient presenting with small testes and NOA. Mate-pair GS identified a 535-kb forward tandem duplication of Xp21.2, seq[GRCh37] dup(X)(p21.2) chrX:g.29957001_30492292dup, involving NR0B1 and the MAGEB gene cluster (Fig. 1A and B). NR0B1 and the MAGEB gene cluster have been curated as triplosensitive regions by ClinGen (TS = 3; Dosage #ISCA-46302). Duplications of similar size have been reported in cases of isolated 46,XY gonadal dysgenesis (Barbaro et al. 2012). In addition, a 4-Mb inversion, inv(13)(q31.1q32.1), which had been missed by earlier G-banded chromosome analysis (Fig. 1C), was detected without gene disruption at the breakpoint junctions. Thus, the forward tandem duplication was classified as pathogenic (Table 2), and the inversion was classified as a variant of uncertain significance (VUS; Supplementary Table S1). Follow-up showed that no natural conception had been achieved in more than 1 year since the patient received the earlier negative karyotyping result.
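The event sizes quoted in this section follow directly from the genomic (g.) coordinates, which are inclusive at both ends, so size = end − start + 1. The sketch below parses only the simple `chrN:g.START_ENDdel|dup` form used here. It is not a general HGVS parser, and the regular expression is our own.

```python
"""Event size from simple genomic (g.) range notation, as used in this section."""
import re
from typing import Tuple

_RANGE = re.compile(r"^(chr[0-9XYM]+):g\.(\d+)_(\d+)(del|dup)$")


def sv_size(notation: str) -> Tuple[str, int]:
    """Return (event type, size in bp) for 'chrN:g.START_ENDdel|dup' (inclusive coordinates).

    Covers only this simple form, not the full HGVS grammar.
    """
    m = _RANGE.match(notation)
    if m is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    _chrom, start, end, kind = m.groups()
    return kind, int(end) - int(start) + 1


if __name__ == "__main__":
    for s in ("chrX:g.29957001_30492292dup",    # case 20C1113, reported as 535 kb
              "chrX:g.150838316_151978891del",  # case 21C0764, reported as 1.14 Mb
              "chr8:g.74500678_74517010dup"):   # case 21C2529, reported as 16.3 kb
        kind, size = sv_size(s)
        print(f"{s}: {kind} of {size:,} bp")
```

Running it gives 535,292 bp, 1,140,576 bp and 16,333 bp, consistent with the rounded sizes in the text.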
Case 21C0764 was diagnosed with NOA (testicular failure). Ultrasound examination revealed small testes (5 cc) and a small epididymal cyst. Mate-pair GS detected a hemizygous 1.14-Mb deletion in Xq28 (seq[GRCh37] del(X)(q28) chrX:g.150838316_151978891del) involving the MAGEA and CSAG gene clusters as well as FATE1. The MAGEA gene cluster is crucial for maintaining normal testicular size in mice (Hou et al. 2016), and its knockout produced specific defects during the first wave of spermatogenesis. ScRNA-seq data from fetal and postnatal testicular development indicated that the MAGEA gene cluster is expressed in primordial germ cells (Fig. 2A).
The FATE1 protein is known to control early testicular differentiation and cell proliferation (https://www.ncbi.nlm.nih.gov/gtr/genes/89885/), and high FATE1 expression was observed only in Sertoli cells (Fig. 2A) during early testicular development. In comparison, depletion of CSAG1 disrupts centrosomes and leads to multipolar spindles during mitosis (Sapkota et al. 2020), but no CSAG1 expression was detected in fetal samples (Fig. 2A). Both FATE1 and CSAG1 are highly expressed in adult germ cells (Fig. 2B and C). Therefore, loss of these genes through the hemizygous deletion likely led to the small testes (MAGEA gene cluster and FATE1) and the absence of sperm (CSAG1). Follow-up showed that the patient pursued IVF pregnancy using donor sperm.
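Statements such as "high expression only in Sertoli cells" summarize a gene's expression per annotated cell type, typically as a mean level together with the fraction of cells in which the gene is detected, as in a dot plot. The sketch below computes both statistics from hypothetical normalized values. The numbers are invented and are not data from Fig. 2. Treating cells missing from the expression map as zero reflects the usual sparse-matrix convention.

```python
"""Per-cell-type summary of one gene's expression (toy values, not data from Fig. 2)."""
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_by_cell_type(expr: Dict[str, float],
                           labels: Dict[str, str]) -> Dict[str, Tuple[float, float]]:
    """Return {cell_type: (mean expression, fraction of cells with expression > 0)}.

    Cells absent from `expr` are treated as having zero expression.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for cell, cell_type in labels.items():
        groups[cell_type].append(expr.get(cell, 0.0))
    return {
        t: (sum(v) / len(v), sum(x > 0 for x in v) / len(v))
        for t, v in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical normalized values for a gene such as FATE1.
    expr = {"s1": 2.0, "s2": 1.0, "g1": 0.0, "g2": 0.5}
    labels = {"s1": "Sertoli", "s2": "Sertoli", "g1": "germ", "g2": "germ"}
    for cell_type, (mean, frac) in summarize_by_cell_type(expr, labels).items():
        print(f"{cell_type}: mean={mean:.2f}, expressing={frac:.0%}")
```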
A 3.2-Mb inversion, inv(16)(q22.2q23.1), was identified in case 20C0011 with NOA. Both breakpoint junctions were located in intergenic regions (i.e., no gene was disrupted; Fig. 3B). However, topologically associated domain (TAD) analysis showed that RFWD3 lies within the TAD containing the distal breakpoint junction of the inversion (Supplemental Figure S3). Mutations in RFWD3 are known to cause Fanconi anemia with variable expressivity of azoospermia (GeneReviews# NBK1401), so the inversion likely exerted a pathogenic position effect. Follow-up showed that MESA was performed with limited sperm retrieved, and an ongoing pregnancy was achieved by IVF with ICSI.
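A breakpoint that disrupts no gene can still act through a position effect on genes sharing its TAD. The sketch below shows the interval logic for flagging such candidate genes. Every coordinate and gene name in it is hypothetical, because the TAD boundaries used in the study are given only in Supplemental Figure S3, and the TAD map itself would come from published Hi-C data.

```python
"""Genes sharing a TAD with an SV breakpoint (hypothetical coordinates)."""
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals on the same chromosome overlap."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_breakpoint_tads(breakpoints: List[Tuple[str, int]],
                             tads: List[Interval],
                             genes: Dict[str, Interval]) -> List[str]:
    """Names of genes overlapping any TAD that contains a breakpoint."""
    hit = [t for t in tads
           if any(t.chrom == c and t.start <= pos <= t.end for c, pos in breakpoints)]
    return sorted(name for name, g in genes.items() if any(overlaps(g, t) for t in hit))


if __name__ == "__main__":
    tads = [Interval("chrA", 0, 999), Interval("chrA", 1_000, 1_999)]
    genes = {"geneA": Interval("chrA", 1_200, 1_400), "geneB": Interval("chrA", 100, 300)}
    print(genes_in_breakpoint_tads([("chrA", 1_500)], tads, genes))  # ['geneA']
```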
Mate-pair GS also detected a VUS (Supplemental Table S1). Case 21C2529 was a 38-year-old NOA patient with bilateral small testes. A heterozygous 16.3-kb intragenic forward tandem duplication (seq[GRCh37] dup(8)(q21.11) chr8:g.74500678_74517010dup) involving exons 10 and 11 (NM_001164380) of STAU2 was identified, likely resulting in gene truncation (loss of function; Fig. 3A). STAU2 mRNA is expressed in spermatocytes in seminiferous tubules at stages VI-XII of the spermatogenic cycle, and STAU2 plays a role in mammalian gametogenesis (Saunders et al. 2000). ScRNA-seq data indicated that STAU2 is likely expressed in most cell types during fetal testicular development as well as in adult germ cells (Supplementary Figure S4). In addition, STAU2 is likely haploinsufficient (pLI = 0.97 in gnomAD). Stau2 knockdown disrupted spindle formation and chromosome alignment during meiotic progression in mouse oocytes (Cao et al. 2016). Depletion of or interference with STAU2 during early zebrafish development results in aberrant migration and subsequent loss of primordial germ cells (Ramasamy et al. 2006). Furthermore, downregulation of Stau2 in vivo resulted in small tissue size (Cockburn et al. 2012). However, Stau2 knockout mice showed no reproductive defects. Given this conflicting evidence, we classified this intragenic duplication as a VUS. Follow-up showed that the patient did not proceed to TESE.

Lastly, mate-pair GS detected multiple regions with AOH in three NOA patients (Table 2). In case 20C0911, a complex insertion was identified that disrupts ANOS1 (ClinGen dosage sensitivity map: HI = 3; Table 2), a gene known to cause Kallmann syndrome (Thakker et al. 2020). In addition, two interstitial regions with AOH, with a total size of 18.7 Mb, were identified. Uniparental disomy of chromosome 6 (UPD6) was suspected; however, no parental samples were available for confirmation.
Paternal UPD6 (UPD6pat) is known to cause transient neonatal diabetes mellitus (Milenkovic et al. 2001) but has no known association with male infertility, so this finding was classified as a VUS (Supplemental Table S1). High read-depth GS excluded causative SNVs/Indels in genes related to male infertility.
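AOH regions such as those above are called from variant allele fractions, as noted in the Methods. The principle is that heterozygous sites (fractions near 0.5) become nearly absent across the affected stretch. The toy sketch below illustrates this with fixed windows of informative sites. All thresholds are hypothetical, and per-site fractions at ~4X depth are far noisier than in this example, so this is not the low-pass method of Dong et al. (2021b).

```python
"""Toy AOH caller: windows of informative sites with almost no heterozygous calls."""
from typing import List, Tuple


def call_aoh(sites: List[Tuple[int, float]], window: int = 50,
             max_het_fraction: float = 0.02,
             het_range: Tuple[float, float] = (0.2, 0.8)) -> List[Tuple[int, int]]:
    """sites: (position, variant allele fraction) at polymorphic sites on one chromosome,
    sorted by position. Returns merged (start, end) regions; a trailing partial window
    is ignored. All thresholds are hypothetical."""
    lo, hi = het_range
    regions: List[Tuple[int, int]] = []
    for i in range(0, len(sites) - window + 1, window):
        chunk = sites[i:i + window]
        het = sum(lo <= vaf <= hi for _, vaf in chunk)
        if het / window <= max_het_fraction:
            start, end = chunk[0][0], chunk[-1][0]
            # Extend the previous region if it ended at the last site of the preceding window.
            if regions and regions[-1][1] == sites[i - 1][0]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy chromosome: heterozygous, then a homozygous stretch, then heterozygous again.
    sites = ([(p, 0.5) for p in range(1, 101)] + [(p, 1.0) for p in range(101, 201)]
             + [(p, 0.5) for p in range(201, 301)])
    print(call_aoh(sites))  # [(101, 200)]
```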
In comparison, the total sizes of the regions with AOH in cases 22C0195 and 21C3049 were 68 Mb and 218.6 Mb, respectively (Table 2), suggesting parental consanguinity. High read-depth GS in patient 22C0195 revealed a pathogenic homozygous SNV, NC_000011.9:g.5248155C>G in HBB (Fig. 3Ci), located in the region with AOH on chromosome 11 (ClinVar ID 15447). Variants in HBB are a well-established cause of beta-thalassemia. Beta-thalassemia is classically considered to cause iron deposition in the endocrine glands, leading to male infertility (De Sanctis et al. 2013); however, the relationship between beta-thalassemia and NOA is not yet well established. We therefore classified this variant as an incidental finding (Supplemental Table S1). High read-depth GS in patient 21C3049 revealed a homozygous two-base deletion, NC_000017.10:g.78064149_78064150delAC in CCDC40 (NM_001243342.2:c.3044_3045del; Fig. 3Cii). Loss-of-function variants in CCDC40 are known to cause ciliary dyskinesia, primary, 15 (OMIM# 613808) and male infertility by impairing sperm morphology and motility (Precone et al. 2020). However, because case 21C3049 presented with absence of sperm, this variant was classified as a VUS (Supplemental Table S1). Follow-up indicated that the patient had Sertoli-cell-only (SCO) syndrome, as revealed by MESA.

Fig. 1 (caption fragment) A Copy-number plot (GRCh37), with copy number on the Y axis. B CMA confirmed the Xp21.2 duplication; the probe distribution on the CMA platform is shown with the candidate region reported by low-pass mate-pair GS, and the duplicated regions are highlighted in blue in both plots. C A 4-Mb inv(13)(q31.1q32.1) identified by mate-pair GS but not by prior G-banded karyotyping, although it involves more than one band; breakpoint junction-specific PCR and Sanger sequencing confirmed the inversion, and ideograms show the banding patterns of the normal and inverted chromosomes.
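A common way to put total AOH in context is to express it as a percentage of the autosomal genome. For autozygosity due to parental relatedness, this percentage approximates the inbreeding coefficient F, which is 1/16 (6.25%) for offspring of first cousins. The sketch below applies this arithmetic to the totals reported above. The ~2,881-Mb autosomal length (sum of GRCh37 chr1-chr22, gaps included) is our assumption, denominators differ between laboratories, and the heuristic does not apply to the suspected UPD6 in case 20C0911.

```python
"""Proportion of the autosomal genome in AOH (rough heuristic; illustrative only)."""

# Approximate total length of GRCh37 autosomes chr1-chr22, in Mb (includes assembly gaps).
AUTOSOMAL_LENGTH_MB = 2881.0
FIRST_COUSIN_F = 1 / 16  # expected autozygous fraction for offspring of first cousins


def percent_autosomal_aoh(total_aoh_mb: float, autosome_mb: float = AUTOSOMAL_LENGTH_MB) -> float:
    """Total AOH as a percentage of the autosomal genome length."""
    return 100.0 * total_aoh_mb / autosome_mb


if __name__ == "__main__":
    for case, aoh_mb in {"20C0911": 18.7, "22C0195": 68.0, "21C3049": 218.6}.items():
        print(f"{case}: {aoh_mb} Mb = {percent_autosomal_aoh(aoh_mb):.1f}% of autosomes")
    print(f"first-cousin expectation: {100 * FIRST_COUSIN_F:.2f}%")
```

Under these assumptions, the totals correspond to about 0.6%, 2.4% and 7.6% of the autosomes.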

Clinically significant variants in patients with severe oligozoospermia
Among the 38 patients with severe oligozoospermia, low-pass mate-pair GS identified five clinically significant SVs in five patients (13.2%, Table 2). They included three recurrent CNV syndromes (Supplementary Figure S5), one intragenic forward tandem duplication and one complex insertion.
Patient 21C1102 was diagnosed with severe oligozoospermia and dry eye syndrome. A 2.6-Mb duplication at 22q11.2 (seq[GRCh37] dup(22)(q11.21) chr22:g.18862504_21453290dup) was detected, consistent with 22q11.2 microduplication syndrome (OMIM# 608363). 22q11.2 duplication syndrome exhibits variable expressivity and incomplete penetrance (Wentzel et al. 2008), and affected individuals may have intellectual or learning disability and/or developmental delay as well as infertility (Abdullah et al. 2021). Follow-up revealed that the couple had undergone IVF but had not achieved a pregnancy.
Patient 20C2873 was diagnosed with severe oligozoospermia. A 1.5-Mb microdeletion (seq[GRCh37] del(16)(p13.11) chr16:g.14991714_16507667del) involving MYH11 was identified in 16p13.11. This is consistent with the well-established 16p13.11 microdeletion syndrome, which has been reported with variable congenital anomalies, incomplete penetrance and variable expressivity. Paternally inherited 16p13.11 deletions have been reported. Nevertheless, immunohistochemical analysis has shown that specific contractile markers, including MYH11, are reduced or lost in peritubular cells of seminiferous tubules in testicular biopsies from men with mixed atrophy (Welter et al. 2013). ScRNA-seq analysis indicated that MYH11 is mainly expressed in myocytes during fetal testicular development and in early/late spermatids among adult germ cells (Supplementary Figure S6). Follow-up revealed that a pregnancy achieved by IVF resulted in a live birth.
Mate-pair GS also reported three SVs that were further classified as VUS (Supplemental Table S1). For instance, patient 20C2993 was diagnosed with severe oligozoospermia. Mate-pair GS revealed a 22.5-kb heterozygous deletion (seq[GRCh37] del(9)(p21.2) chr9:g.26972539_26995048del) involving exons 3 to 8 (NM_001099223) of IFT74 in 9p21.2 (Supplementary Figure S7). Mutations in IFT74 are known to cause spermatogenic failure 58 (OMIM# 619585) with AR inheritance. High read-depth GS did not reveal any clinically significant SNVs/Indels in IFT74 or in other genes related to male infertility. However, IFT74 is highly expressed in spermatocytes and early spermatids among adult germ cells. Knockdown of Ift74 in spermatocyte-derived GC-2 cells significantly reduced the levels of cell-adhesion molecules such as E-cadherin, which is required for the initial cell division of spermatogonial stem cells (Yamashita et al. 2003). Further study is warranted to investigate the causal relationship between the heterozygous intragenic IFT74 deletion and severe oligozoospermia.

Fig. 4 Inter-chromosomal insertion identified in case 20C1850. A Diagram of the complex inter-chromosomal insertion identified in case 20C1850. The upper panel shows the composition of normal chromosomes 2 and 9. Blue and orange bars represent the 53.2-kb segment dup(2)(q31.1) chr2:g.171170285_171223517dup and the 21.7-kb segment dup(9)(p24.3) chr9:g.910281_931988dup, respectively, and arrows indicate sequence orientation. The middle diagrams show the composition of the der(9), with insertion of a segment of 2q31.1 in antisense orientation and subsequent duplication of a segment originating from 9p24.3. At the bottom, breakpoint junction-specific PCR and Sanger sequencing results confirm the rearrangements and show sequence microhomologies (highlighted in yellow) at both breakpoint junctions. Bi and Bii show the abundance of DMRT1 expression in different cell types and timepoints during fetal and postnatal testicular development. C The expression profile of DMRT1 in adult germ cells; the histogram was obtained from The Human Protein Atlas browser.

Comparison of the detection yields by mate-pair GS and follow-up
Overall, mate-pair GS identified clinically significant variants in seven patients with NOA (11.1%, 7/63) and five patients with severe oligozoospermia (13.2%, 5/38). There was no significant difference in detection yield between the two groups (chi-square test, p = 0.758).
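As a check on the reported p value (0.758), the 2×2 comparison of diagnostic yields can be recomputed with the standard library alone. The sketch below uses Pearson's chi-square without continuity correction, with the 1-df upper tail obtained as erfc(√(χ²/2)). This reproduces the reported value, whereas R's `chisq.test` applies Yates' correction to 2×2 tables by default. We infer that the uncorrected test was used, since the text does not say. One expected count (about 4.5 diagnosed severe oligozoospermia cases) falls below the usual threshold of 5, so Fisher's exact test would be a common alternative.

```python
"""Pearson chi-square test for a 2x2 table, standard library only (illustrative)."""
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Table [[a, b], [c, d]]; no continuity correction. Returns (statistic, p) with 1 df."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p


if __name__ == "__main__":
    # Rows: NOA, severe oligozoospermia; columns: diagnosed, not diagnosed.
    stat, p = chi_square_2x2(7, 63 - 7, 5, 38 - 5)
    print(f"chi2 = {stat:.4f}, p = {p:.4f}")  # chi2 = 0.0948, p = 0.7581
```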
In the present study, follow-up was attempted for all 12 cases. Among the seven patients with NOA, follow-up information was obtained for five (71.4%). Three opted out of testicular biopsy (one of whom pursued IVF with donor sperm), and two opted in (one with limited sperm retrieved and one with no sperm retrieved; Table 2). Among the five cases with severe oligozoospermia, four had follow-up. All four pursued IVF with ICSI, which was unsuccessful in one (Table 3).

Discussion
Using mate-pair GS, an alternative long-DNA sequencing method, our study revealed the spectrum of SVs and copy-number neutral AOH in male patients with idiopathic infertility who had received negative results from standard-of-care tests. In 12 patients (11.9%, 12/101), mate-pair GS identified clinically significant variants that potentially explain the infertility phenotypes. These comprised 12 clinically significant SVs in 12 cases, including five recurrent CNV syndromes, one inversion, and six SVs with direct disruption of genes by intragenic rearrangements or complex insertions.
In this study, we demonstrated the advantages of mate-pair GS as an alternative long-DNA sequencing method. Apart from higher resolution for CNV detection (down to 7.8 kb in this study), this method enabled delineation of the genomic location and orientation of duplications. For instance, the duplication in STAU2 (case 21C2529) was confirmed as an intragenic forward tandem duplication rather than an insertion elsewhere in the genome. This helped interpretation, as the intragenic defect likely causes loss of function of that allele, although the variant was ultimately classified as a VUS because of conflicting evidence. Two further duplications, both classified as unlikely to be related to male infertility (Supplementary Table S1), also show why the genomic composition of a duplication matters. In patient 18C0169, diagnosed with NOA, a 4.5-Mb duplication, seq[GRCh37] dup(4)(q12q13.1) chr4:g.58183887_62691666dup, was identified involving a portion of ADGRL3, which is likely haploinsufficient (pLI = 1). However, mate-pair GS defined it as a forward tandem duplication; thus, ADGRL3 remained intact. In case 21C0925, diagnosed with severe oligozoospermia, a 47.1-kb duplication, seq[GRCh37] dup(Y)(p11.32) chrY:g.2841415_2888522dup, involving part of ZFY (from intron 3 to the 3′ UTR of transcript NM_003411) was detected. Overexpression of ZFY in spermatocytes has been demonstrated as a physiological marker of meiotic arrest leading to azoospermia and infertility (Jan et al. 2018). In addition, Zfy double-knockout mice are infertile and display severe spermatogenesis defects (Yamauchi et al. 2022). However, mate-pair GS indicated that this duplication arose when a 217-kb segment, seq[GRCh37] dup(Y)(p11.2) chrY:g.3216692_3434226dup, was duplicated and inserted at the distal breakpoint junction of the 47-kb segment.
ZFY spanned the proximal breakpoint junction, and the duplication involved only intron 3 through the 3′ UTR, so ZFY remained intact. The duplication was therefore interpreted as unlikely to cause aberrant ZFY expression.
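The reasoning applied to STAU2, ADGRL3 and ZFY can be summarized by where the breakpoints of a forward tandem duplication fall relative to a gene. If both fall inside the gene, its only copy gains an internal duplicated segment. If only one falls inside, the tandem arrangement still leaves one full-length copy. The sketch below encodes that rule with hypothetical coordinates. It deliberately ignores complex cases such as the additional inserted segment described for case 21C0925, as well as effects on regulatory elements.

```python
"""Effect of a forward tandem duplication on a gene, by breakpoint position (sketch)."""
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive; gene and duplication assumed on the same chromosome
    end: int


def tandem_dup_effect(gene: Span, dup: Span) -> str:
    if dup.end < gene.start or dup.start > gene.end:
        return "no overlap"
    if dup.start <= gene.start and gene.end <= dup.end:
        return "whole-gene copy gain"
    if gene.start < dup.start and dup.end < gene.end:
        # Both breakpoints inside the gene: its only copy now carries an internal duplication.
        return "intragenic: may disrupt the gene"
    # Only one breakpoint inside the gene: the tandem arrangement keeps one full copy.
    return "partial: one intact copy retained"


if __name__ == "__main__":
    gene = Span(1_000, 5_000)  # hypothetical coordinates
    for dup in (Span(2_000, 3_000), Span(4_000, 9_000), Span(500, 6_000), Span(7_000, 8_000)):
        print(dup, tandem_dup_effect(gene, dup))
```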
Another advantage of mate-pair GS is its ability to detect balanced chromosomal structural rearrangements (such as balanced translocations and inversions). In this study, three inversions were identified in karyotypically normal patients. Notably, two of the three inversions involved more than one chromosomal band (Figs. 1C and 3B), yet neither was identified by previous G-banded chromosome analysis. The third inversion, inv(5)(p14.3), had both breakpoint junctions within the same dark band. It was identified in NOA case 19C3124, who had bilateral testicular microlithiasis and a 9-mm cyst at the right epididymal head. CDH12 was disrupted by the breakpoint junction (Supplementary Figure S8). Follow-up indicated that no sperm was retrieved by TESE (Johnsen score = 2), and the patient was diagnosed with SCO syndrome. In addition, a deletion involving CDH12 has been reported in a patient with SCO syndrome and was absent in normal controls (Stouffs et al. 2016). However, the same inversion was observed in our former cohort of 1077 fertile men (Dong et al. 2019a), indicating that it is unlikely to be the causative factor. Lastly, mate-pair GS also revealed copy-number neutral AOH on a genome-wide scale (Dong et al. 2021b): two cases were attributed to parental consanguinity and one to suspected UPD6. Multiple regions with AOH reported by low-pass mate-pair GS are an important indicator for follow-up testing for SNVs/Indels.
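Recognizing the inv(5) as an event also seen in fertile men, and filtering candidate SVs against the 1,077-man cohort as described in the Methods, requires a rule for calling two SVs the same event. The sketch below uses one simple rule: same SV type and chromosomes, with both breakpoints within a fixed window. The 5-kb tolerance and all coordinates are hypothetical, and the study's actual matching criteria are not specified here.

```python
"""Remove candidate SVs that recur in a control cohort (breakpoint-window matching sketch)."""
from typing import List, NamedTuple


class SV(NamedTuple):
    kind: str      # e.g. "inv", "del", "dup", "tra"
    chrom_a: str
    bp_a: int
    chrom_b: str
    bp_b: int


def same_event(x: SV, y: SV, tol: int) -> bool:
    """Same type and chromosomes, with both breakpoints within `tol` bp."""
    return (x.kind == y.kind and x.chrom_a == y.chrom_a and x.chrom_b == y.chrom_b
            and abs(x.bp_a - y.bp_a) <= tol and abs(x.bp_b - y.bp_b) <= tol)


def filter_against_controls(cands: List[SV], controls: List[SV], tol: int = 5_000) -> List[SV]:
    """Keep only candidates with no matching event in the controls."""
    return [sv for sv in cands if not any(same_event(sv, c, tol) for c in controls)]


if __name__ == "__main__":
    controls = [SV("inv", "chr5", 21_000_000, "chr5", 22_000_000)]  # hypothetical
    cands = [SV("inv", "chr5", 21_002_000, "chr5", 21_998_500),     # recurs in controls
             SV("inv", "chr16", 70_000_000, "chr16", 73_200_000)]  # not seen in controls
    print(filter_against_controls(cands, controls))
```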
Although some variants identified in this study are known to cause male infertility (such as the 535-kb forward tandem duplication of Xp21.2 involving NR0B1 and the MAGEB gene cluster), interpreting the relationship of other genomic variants with male infertility remained challenging, particularly when only leftover DNA from routine testing was available. In this study, we used scRNA-seq datasets generated from human fetal and postnatal testicular development (Guo et al. 2021; Han et al. 2020) and adult germ cells (Karlsson et al. 2021) to identify the specific cell types and timepoints in which the target gene(s) are expressed. For instance, a hemizygous 1.14-Mb deletion in Xq28 involving 24 RefSeq genes was detected in case 21C0764 with NOA and small testes (5 cc). Knockout of the MAGEA gene cluster is known to cause specific defects (smaller testes) during the first wave of spermatogenesis (Hou et al. 2016). Consistent with this, the published scRNA-seq datasets show that the MAGEA gene cluster is expressed in primordial germ cells during fetal testicular development. In addition, FATE1 is known to control early testicular differentiation and cell proliferation, and its high expression was observed mainly in fetal Sertoli cells (Fig. 2A). Finally, depletion of CSAG1 disrupts centrosomes and leads to multipolar spindles during mitosis (Sapkota et al. 2020); CSAG1 expression was absent in fetal stages but high in several types of adult germ cells (Fig. 2B and C). Taken together, these findings support that loss of these genes (null allele) through the hemizygous deletion would result in small testes and azoospermia.
In this study, we included two groups of male infertility patients with negative findings from standard-of-care genetic testing: (1) 63 patients with NOA and (2) 38 patients with severe oligozoospermia. Regarding the impact of a genetic diagnosis, current routine genetic investigation for male infertility includes conventional karyotyping and Y chromosome microdeletion analysis (Alhathal et al. 2020). Identifying numerical or structural aberrations is important for patients because some of them are strongly correlated with the sperm retrieval rate during testicular biopsy (Krausz and Riera-Escamilla 2018). The current management for patients with NOA is testicular biopsy followed by IVF with ICSI. Among the seven NOA cases with a genetic diagnosis by mate-pair GS, the findings in some cases clearly provided prognostic information, such as a likely failure of sperm retrieval (e.g., cases 21C1113 and 21C0764). In addition, other factors are known to contribute to ICSI failure even when sperm retrieval is successful. For instance, a centrosome defect is known to result in ICSI failure because the centrosome is involved in controlling the first cell division after fertilization (Toure et al. 2021). In case 21C0764, a hemizygous deletion of CSAG1 was detected, and depletion of CSAG1 disrupts centrosomes and leads to multipolar spindles during mitosis (Sapkota et al. 2020). Information on the likelihood of ICSI failure is therefore also important when counseling NOA patients. For severe oligozoospermia, the clinical management option is IVF with ICSI. In this study, we identified genetic defects in five cases with severe oligozoospermia, including three well-established microdeletion/duplication syndromes.
Although these syndromes are known to show variable expressivity and incomplete penetrance within families, they have also been detected in infertile males (Rossi et al. 2005; Wentzel et al. 2008). Former studies may have focused largely on cognitive impairment, with reproductive failure given lower priority; moreover, patients with severe oligozoospermia may still achieve natural conception, which could further obscure the association with infertility. Genetic investigation in cases with severe oligozoospermia may also provide insight into the mechanisms underlying co-morbid phenotypes that affect management options. Identifying a pathogenic microdeletion/duplication syndrome is also important for genetic counselling when preimplantation testing is being considered for conception by IVF with ICSI. Taken together, investigating genomic variants with mate-pair GS in both NOA and severe oligozoospermia cases with negative standard-of-care findings is important not only for understanding the mechanism(s) of infertility but also for guiding patient management.
Limitations remain. First, the diagnostic yield was slightly higher in the severe oligozoospermia group than in the NOA group (13.2% vs 11.1%), but the difference was not significant, most likely because of the limited sample size. Enlarging the sample size in a future study would be valuable to provide a comprehensive landscape and spectrum of the different types of genomic variants. Second, to address causal relationships between genetic variants and infertility phenotypes, collecting samples with testicular biopsy results for genotype-phenotype correlation would be important; however, testicular biopsies were not available for all patients. Third, for variant interpretation, detailed family medical histories and segregation analysis of variants within families should be prioritized, together with recruitment of parental samples where possible. Fourth, no other sample type was available for functional validation. Lastly, this study focused on detecting SVs/CNVs and copy-number neutral AOH; other types of causative genomic variants remain to be investigated.
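The diagnostic yields and the group comparison can be reproduced from the counts reported in this study (5/38 severe oligozoospermia, 7/63 NOA, 12/101 overall). The text does not state which statistical test was used; the sketch below assumes a two-sided Fisher's exact test, a common choice for small 2×2 tables, implemented with the Python standard library only. The function names are illustrative.

```python
from math import comb


def diagnostic_yield(diagnosed: int, total: int) -> float:
    """Diagnostic yield as a percentage."""
    return 100.0 * diagnosed / total


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are patient groups; columns are (diagnosed, not diagnosed).
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # Probability that x of the col1 diagnosed cases fall in row 1.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


print(f"{diagnostic_yield(5, 38):.1f}%")    # 13.2%
print(f"{diagnostic_yield(7, 63):.1f}%")    # 11.1%
print(f"{diagnostic_yield(12, 101):.1f}%")  # 11.9%
p = fisher_exact_two_sided(5, 38 - 5, 7, 63 - 7)
print(f"two-sided Fisher p = {p:.2f}")
```

With expected counts this close to the observed ones, the p-value is far above 0.05, which is consistent with the statement that the difference was not significant at this sample size.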

Conclusions
Our mate-pair genome sequencing study provided a landscape of various genomic variants in 101 males with idiopathic infertility and identified molecular etiologies in 12 cases (11.9%). Identifying such genomic variants not only helps in understanding the underlying mechanisms leading to male infertility but is also likely to impact clinical management.