Genome-wide DNA methylation pro les and small non- coding RNAs signatures study of sperm with high DNA fragmentation index

Minghua Liu Changhai Hospital, Naval Medical University Peiru Liu Fudan University Yunjian Chang Chinese Academy of Sciences, University of Chinese Academy of Sciences Beiying Xu Chinese Academy of Sciences, University of Chinese Academy of Sciences Nengzhuang Wang Changhai Hospital, Naval Medical University Lina Qin Changhai Hospital, Naval Medical University Jufen Zheng Changhai Hospital, Naval Medical University Yun Liu Fudan University Ligang Wu Chinese Academy of Sciences, University of Chinese Academy of Sciences Hongli Yan (  hongliyan@smmu.edu.cn ) Changhai Hospital, Naval Medical University


Abstract
Background Sperm DNA fragmentation index (DFI) is a more accurate measure of sperm quality than traditional semen analysis (sperm motility, concentration, and morphology) because DFI can better predict male fertility potential and assisted reproductive technology (ART) outcomes. However, no studies have compared the genome-wide DNA methylation pro les and small noncoding RNAs (sncRNAs) signatures of high DFI and low DFI sperm samples.

Methods
Whole-genome bisul te sequencing (WGBS) and sncRNAs deep sequencing were performed to compare the genomewide DNA methylation pro les and sncRNAs signatures of sperm samples of weak group(DFI > 30%) and normal group(DFI < 15%). In addition, we used the WGBS data to construct DNA methylation correlation matrices of the weak and normal group sperm.

Results
A total of 4939 diferentially methylated regions (DMRs) (3083 hypermethylated and 1856 hypomethylated) were identifed in the two group sperm samples. The hypermethylated DMRs were found to be mainly signifcantly enriched in the area of neurons and microtubule. Compared with the normal group, the global DNA methylation level of the weak group sperm showed a downward trend and the methylation correlation within the chromosomes of weak group sperm was weakened. 27 miRNAs, 151 tsRNAs and 70 rsRNAs were differently expressed between the two group sperm samples. Finally, we identi ed nine sncRNAs as candidate sperm quality biomarkers. In addition, the target genes of both the downregulated and upregulated miRNAs were involved in the nervous system and cell development

Conclusions
Our ndings suggest that the genome-wide DNA methylation levels and sncRNAs expression of sperm were signi cantly correlated with sperm DFI levels. Therefore, our study provides useful biomarkers for improving the success rate of natural pregnancy and ART. The chromosomes of high DFI sperm became loose and more vulnerable to ROS attack. The increase of sperm DFI level may affect embryonic nervous system development.

Background
Infertility refers to the failure of couples to achieve clinical pregnancy for more than one year without contraception [1].
Globally, more than 15% of couples of reproductive age suffer from infertility, of which 50% are caused by male factors [2]. Currently, assessment of sperm quality in infertile men still relies on traditional semen analysis, such as sperm motility, concentration, and morphology, but these analyses cannot accurately predict male fertility potential and assisted reproductive technology (ART) outcomes [3]. In fact, about 15% of infertile men have normal sperm motility, concentration, and morphology [4], but these analyses do not re ect sperm DNA integrity [5].
Sperm DNA Fragmentation (SDF) can be caused by extrinsic factors (smoking, high temperature, chemotherapeutic drugs and environmental pollutants) and intrinsic factors (oxidative stress[OS], apoptosis failure and germ cell maturation defects) [6]. Numerous studies have shown that OS is a major factor in male infertility [7]. Reactive oxygen species (ROS) are essential for physiological processes such as apoptosis and capacitation, but excess ROS can cause SDF[8] and affect the outcome of natural pregnancy or ART. Over the past 20 years, there has been an increasing number of studies reporting that SDF is associated with male infertility, with the main focus of SDF research being on lifestyle factors, asthenozoospermia, and varicocele [9]. In addition, there is evidence that SDF affects the health and well-being of offspring [10].
Some studies showed that changes in sperm DNA methylation in imprinted regions are associated with male infertility and abnormal semen parameters (especially oligozoospermia) [11][12][13][14][15]. However, no speci c DNA methylation signatures that convincingly replicate male infertility or abnormal semen parameters have been found in genome-scale unbiased analyses [16]. In addition, there is no clear evidence linking sperm DNA methylation with pregnancy outcomes or offspring health [16].
To date, no studies have demonstrated the effect of SDF on genome-wide DNA methylation pro les and sncRNAs signatures in human sperm. Through WGBS and sncRNAs deep sequencing in two groups of sperm with DNA fragmentation index (DFI) > 30% and DFI < 15%, we revealed that the genome-wide DNA methylation levels and sncRNAs expression of sperm were signi cantly correlated with sperm DFI, which has important clinical signi cance. . The reads were aligned to the human genome (hg19) as well as the lambda phage genome using Bowtie2 v.2.2.3. PCR duplication reads were then removed, methylation measurements for each CpG site were obtained, and the bisulfte conversion rates were calculated based on the spiked-in unmethylated lambda phage DNA.

Methods
Identifying DMRs in the sperm samples using the bsseq package in BSmooth [27]. Only CpGs that appeared more than twice in at least three samples from weak and normal group were analyzed. DMRs were identi ed using a smoothing window containing 70 CpGs or 1 kb width, whichever was larger. The presumptive DMRs must meet the following 4 criteria at the same time: (1) must located on autosomes; (2) must have a methylation difference of 10% or more; (3) must contain more than 3 CpG sites; (4) must have a t-statistic with qcutoff in the range 0.025 to 0.975.
The pathway enrichment analysis Sperm DMRs were annotated to their nearest genes. The hypergeometric test was performed with a focus on biological processes using the Molecular Signatures Database and the R package clusterPro er v.3.6.0 enricher function with default parameters (P-value cut-of=0.05; OrgDb=org.Hs.eg.db; and p.adjust method="BH"). Pathway enrichment analysis was performed with Kyoto Encyclopedia of Genes and Genomes (KEGG) database (P-value cut-of=0.05) and R package clusterPro er v.3.6.0.

Construction of WGBS correlation matrices
We constructed WGBS correlation matrices follow the method of Kasper D. Hansen[28]. In our correlation matrix, we only included so-called open sea CpGs that are more than 4 kb away from CpG islands. We then binned each chromosome into 100 kb bins, and took the correlations of the mean methylation levels for each bin. We obtained the rst eigenvector of this binned correlation matrix and gently smoothed the signal by using two iterations of a moving average with a window size of three bins. The sign of the eigenvector is chosen so that the sign of the correlation between the eigenvector and column sums of the correlation matrix is positive which ensures that positive values of the eigenvector are associated with the closed compartment. sncRNAs library preparation and deep sequencing Total RNA of 13 weak group sperm samples and 17 normal group sperm samples was extracted using TRIzol reagent (Takara). Approximately 50 ng RNA was used to construct the sncRNAs library according to the Illumina protocol. Highthroughput RNA sequencing was performed using a Hiseq X Ten (PE150). Fastp was used to clip adaptor and lter low quality reads. Reads that do not match the adapter or reads less than 17 nt in length were discarded. Redundant sequences were collapsed as useful reads for further analysis [29]. To assess the expression levels of miRNAs, only reads with an exact match to the 5' start site of the annotated miRNA with ≤ 2 nt deletions at the 3' end or additional sequences derived from pri-miRNAs were counted as miRNAs. Normalize miRNA counts to total miRNAs and multiplied by 1,000,000. Expression levels and classi cation of tsRNAs were based on GtRNAdb, and the tsRNA counts were also normalized to total tsRNAs and multiplied by 1,000,000. rsRNAs were mapped to rRNA precursors in the order of 5S, 5.8S, 18S, 28S and 45S, and the normalized method of rsRNA was same as that of miRNA and tsRNA. The remaining 25-32 nt sequences were used to identify the piRNAs [30]. sncRNAs with more than one annotation were characterized in the following order: miRNA, tsRNA, rsRNA, snRNA, snoRNA, and piRNA. Sequences not annotated with any of the above sncRNAs were classi ed as other sequences.

Statistical analysis
We used the R software package for statistical analysis. The differentially expressed miRNA tsRNA and rsRNA levels were analyzed using DESeq2 [32]. Only sncRNAs with P-values less than 0.05 were considered differentially expressed sncRNAs. The Student's test was used to analyze the sncRNAs ratio in the weak and normal group which is a statistical signi cance test, and if P < 0.05, the association between the miRNAs/tsRNAs/rsRNAs and the two groups could be established. Unsupervised hierarchical clustering analysis was performed using Pheatmap [33]. The PCA was performed useing the prcomp package, which was visualized by ggbiplot [34].

miRNA functional analysis
The miRNA target genes were predicted by the online computational algorithm TargetScan (http://www.targetscan.org). The target gene enrichment tests of upregulated and downregulated miRNAs were annotated based on their biological processes GO terms [35].

Genome-wide DNA methylation analysis of sperm
To compare the genome-wide DNA methylation pro les of sperm samples of the weak group (DFI>30%) and normal group (DFI<15%), 6 weak group sperm samples and 7 normal group sperm samples were examined using WGBS (Table  1). After quality control and data preprocessing, all samples were found to have a bisul te conversion rate greater than 96%.
Compared with the normal group sperm samples, the global DNA methylation level of the weak group sperm samples showed a downward trend ( Figure 1A). We then performed DMRs analysis to determine epigenetic differences between the two groups. A total of 4939 diferentially methylated regions (DMRs) (3083 hypermethylated and 1856 hypomethylated) were identifed in the weak group sperm samples relative to the normal group sperm samples (Table S1) , with 2072 of them (41.95%) located in promoter regions (1 to 3000 bp upstream or downstream of transcription start site) ( Figure 1B). The percentages of hypermethylated DMRs were higher than hypomethylated DMRs in all of the seven examined gene annotation groups ( Figure 1C).
The top 300 DMRs included 7921 CpGs with the median length of 1282.5 bp, and most of them were located at promoter regions. We were able to separate sperm samples into two groups corresponding to DFI>30% and DFI<15% ( Figure 1D) using the top 300 DMRs, thus these 300 DMRs may be potential biomarkers for sperm quality.
To characterize the functional relevance of the 3083 hypermethylated DMRs, all of the 3083 hypermethylated DMRs were associated with their nearest genes and the gene ontology(GO) and pathway enrichment analysis was performed. The 3083 hypermethylated DMRs were found to be mainly signifcantly enriched in the area of neurons, such as axonogenesis, regulation of neuron projection development, synaplic membrane, axon guidance, neuron projection guidance and distal axon( Figure 1E). These ndings suggest that the increase of sperm DFI level may affect embryonic nervous system development by causing epigenetic dysregulation of genes associated with neurons. In addition, the 3083 hypermethylated DMRs were also found to be signifcantly enriched in the area of microtubule ( Figure 1E) which is an important part of the 9+2 structure of the sperm tail and is very important to ensure sperm motility. Since the sperm motility of weak group sperm was signi cantly lower than that of the normal group sperm (Table 1), the increase of sperm DFI level may affect sperm tail structure and sperm motility by causing epigenetic dysregulation of genes associated with microtubule.

Chromosome Compartments Analysis
Because the global DNA methylation level of the weak group sperm samples showed a downward trend, we speculated that there might be differences in the spatial conformation of chromosomes between the two groups of samples. We constructed the chromosome compartments of sperm from the two groups using the WGBS data, and found that the compartments of ve chromosomes (13, 4, 5, 21 and Y) in the weak group sperm changed compared with the normal group, and the correlations of methylation within the ve chromosomes were weakened (Figure 3) which suggested that the structure of the ve chromosomes of the weak group sperm become loose. Therefore, the chromosomes of the weak group sperm is more vulnerable to ROS attacks and more prone to break. Among these ve chromosomes, the chromosome compartments and the correlations of methylation of chromosome Y changed the most between the two groups( Figure 3I, 3J), which may suggests that elevated DFI levels produce more damage on sperms with chromosome Y than chromosome X.

sncRNAs deep analysis
We extracted total RNA from 13 weak group sperm samples(DFI>30%) and 17 normal group sperm samples(DFI 15%) (Table 1), and analyzed the expression of sncRNAs by deep sequencing. We found that rsRNAs, tsRNAs, yRNAs and miRNAs were abundant in sperm samples. On average, about 40.5% of the sncRNAs annotated to rsRNAs, 19.3% to tsRNAs, 10.4% to yRNAs, and 7.1% to miRNAs ( Table 2, Table S2). The length distribution of these sncRNAs was similar in each sample in both groups (Figure 4). The peak of the tsRNAs, rsRNAs, and miRNAs length ranged from 17 to 40 nt, 17 to 40 nt and 20 to 23 nt, respectively. We analyzed the proportions of miRNAs, tsRNAs, rsRNAs, sn/snoRNAs, yRNAs, and piRNAs in each sample ( Figure 5A) and found that the proportions of these sncRNAs were not signi cantly different between the two groups ( Figure 5B).
A total of 632 miRNAs (average RPM > 10) were detected in sperm samples (Table S3), of which 27 miRNAs were differently expressed between the two groups (9 up-regulated, 18 down-regulated) ( Figure 6A, 6B, 6C and Table S4). Figure 6D shows the difference in the expression of the top 10 miRNAs by average expression between the two groups. Furthermore, we found that a principal component analysis (PCA), which is a powerful tool for exploratory data analysis and generating predictive models, could separate the weak group sperm samples from the normal group based on these 27 differently expressed miRNAs (PC1=48.72%, PC2=19.14%) ( Figure 10A). These results indicated that the 27 tsRNAs have an excellent prognostic value and can be potential biomarkers for assessing human sperm quality. Moreover, the target genes of 16 of the 27 differentially expressed miRNAs were predicted by TargetScan. An analysis of signi cant GO-enriched terms showed that target genes of both the downregulated and upregulated miRNAs were involved in the nervous system and cell development (Figure 7), indicating that these miRNA target genes might be important for early embryo nervous system development and other systems development.
tsRNAs were high expressed in human sperm, and a total of 3612 tsRNAs (average RPM > 10) were detected in sperm samples (Table S5), of which 151 tsRNAs were differently expressed between the two groups (76 up-regulated, 75 downregulated) ( Figure 8A, 8B, 8C and Table S6). Figure 8D shows the difference in the expression of the top 10 tsRNAs by average expression between the two groups. PCA classi er analyses showed that these 151 differently expressed tsRNAs could also classify the samples into two groups (PC1=28.79%, PC2=23.57%), indicating that these tsRNAs have comparable predictive power and may be another type of useful biomarker for the clinical evaluation of sperm quality( Figure 10B).
We found that rsRNAs is the most highly expressed sncRNAs in human sperm. A total of 10707 rsRNAs (average RPM > 100) were detected in sperm samples (Table S7), of which 70 rsRNAs were differently expressed between the two groups(42 up-regulated, 28 down-regulated) ( Figure 9A, 9B, 9C and Table S8). Figure 9D shows the difference in the expression of the top 10 rsRNAs by average expression between the two groups. PCA classi er analyses showed that these 70 differently expressed rsRNAs can be used for separating the weak and normal group sperm samples(PC1=47.06%, PC2=18.13%) ( Figure 10C).
Finally, we identi ed nine sncRNAs as candidate sperm quality biomarkers (Table 3). PCA classi er analyses showed that these nine sncRNAs can be used for better separating the weak and normal group sperm samples(PC1=34.02%, PC2=26.28%)( Figure 10D). In the future, we also need to verify whether these nine sncRNAs can be used as sperm quality biomarkers on the basis of larger sample size. True sncRNAs biomarkers not only require the ability to distinguish the high DFI sperm from the low DFI sperm, but more importantly, can accurately predict the pregnancy outcome.

Discussion
Studies in non-human mammals have shown that environmental factors in uence sperm DNA methylation [36][37][38][39][40][41], which is also associated with diseases such as infertility and imprinting disorders in offspring. In recent years, more and more studies have explored the human sperm DNA methylation landscape, but the impact of environmental factors on human sperm DNA methylation is still unclear, and the relationship between human sperm DNA methylation and various sperm parameters and offspring health is also not clear.
In our study, genome-wide methylation sequencing of high DFI sperm was performed for the rst time. We found that relative to low DFI sperm, the overall methylation level of high DFI sperm decreased; DMRs were mainly located in the promoter regions, and the number of hypermethylated DMRs was signi cantly higher than that of hypomethylated DMRs. We were able to separate sperm samples into two groups corresponding to DFI > 30% and DFI < 15% using the top 300 DMRs, thus these 300 DMRs may be potential biomarkers for sperm quality. At present, there is no clear sperm quality DNA methylation biomarker, our study may provide a basis for sperm quality DNA methylation biomarker research. The hypermethylated DMRs were found to be mainly signifcantly enriched in the area of neurons and microtubule which suggest that the increase of sperm DFI level may affect embryonic nervous system development and sperm motility.
We constructed the chromosome compartments of the two groups using the WGBS data, and found that the compartments of some chromosomes in the high DFI sperm changed compared with the low DFI sperm, and the correlations of methylation within some chromosomes were weakened. Therefore, the structure of the ve chromosomes of the high DFI sperm may become loose, and the chromosomes is more vulnerable to ROS and more prone to break. Therefore, patients with high DFI should be treated in time or change their bad habits to avoid the continuous increase of DFI.
Previously, sperm RNA was thought to be a simple residual product of spermatogenesis, but studies over the past few years have shown that RNA is actually dynamically regulated during spermatogenesis and is spatially compartmentalized after sperm mature [42][43][44]. Studies have shown that both RNA pro les and RNA modi cations in mammalian sperm are sensitive to the paternal environment. Various environmental factors, such as alcohol abuse, mental stress, chemical exposure, exercise and so on, can alter the composition of sperm miRNAs, tsRNAs, and rsRNAs. There are potential interactions between different kinds of RNAs in sperm and can synergistically regulate complex biological processes [45], such as optimizing early embryonic development [46,47] or transmitting paternal traits to offspring [20,[48][49][50].
We performed deep sequencing of sncRNAs in high DFI sperm for the rst time, and found that there was no signi cant difference in the overall expression of various sncRNAs between high DFI and low DFI sperm. However, there were many differentially expressed miRNAs, tsRNAs and rsRNAs between the two groups of sperm, and PCA classi er analyses showed that they had some discriminative effect on the two groups of samples, so they may be potential sncRNAs biomarkers of sperm quality. We screened nine sncRNAs from the differentially expressed sncRNAs as candidate sperm quality biomarkers, and PCA classi er analyses showed that these 9 sncRNAs could well distinguish the two groups of sperm samples, but we need to verify these 9 sncRNAs on the basis of more sperm samples in the future

Conclusions
In conclusion, our study showed the genome-wide DNA methylation pro les and small non-coding RNAs signatures of sperm with high DNA fragmentation index. Our study provides useful biomarkers for improving the success rate of natural pregnancy and ART. Our ndings suggest that the chromosomes of high DFI sperm became loose and more vulnerable to ROS attack. The increase of sperm DFI level may affect embryonic nervous system development.        Differential expression miRNAs between the weak and normal group sperm samples analysis.

Abbreviations
(A) Valcano plot of the 27 differential expression miRNAs.
(C) Heatmap of the 27 differentially expressed miRNAs, The branching pattern is illustrated using a dendrogram.
(D) Bar plot of the top ten differentially expressed miRNAs in average expression. GO analysis of the predicted genes targeted by the differentially expressed miRNAs.

Figure 8
Differential expression tsRNAs between the weak and normal group sperm samples analysis.
(C) Heatmap of the 151 differentially expressed tsRNAs. The branching pattern is illustrated using a dendrogram.
(D) Bar plot of the top ten differentially expressed tsRNAs in basic expression.

Figure 9
Differential expression rsRNAs between the weak and normal group sperm samples analysis.
(C) Heatmap of the 70 differentially expressed rsRNAs. The branching pattern is illustrated using a dendrogram.
(D) Bar plot of the top ten differentially expressed rsRNAs in basic expression.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. SupplementaryTable.xlsx