The Novel Mutations of Unexplained Abortion Were Analyzed at the Nucleotide Level and Classified According to Gene Function

Background Abortion is a major problem affecting women's normal reproductive life. Chromosomal abnormalities account for about 50% of abortions, and in around 50% of cases the etiology remains unknown. The present analysis aimed to screen for mutations at the nucleotide level and to detect new genetic origins of human disease. Results Seventy-two abortion tissues (with chromosomal abnormalities excluded) were recruited, and whole-genome low-coverage mate-pair sequencing (WGL-MPS) was performed on each tissue. Data from 100 healthy individuals were used as a background control to find specific relevant genes and mutations through stringent bioinformatics analysis. After sequencing, the data from the 72 abortion samples were merged. Twenty-eight sequence mutations in 25 genes potentially associated with unexplained abortion were identified. These 25 genes were divided into six classifications: fetal development, cell cycle, genital correlation, kidneys, nerve, and other known functions. At these sites, the mutation frequency relative to the reference sequence was greater than 20% in the merged abortion data, whereas in the control samples it was less than 1%. Furthermore, we verified these findings in 10 additional unexplained abortion tissues, seven of which carried mutations at the above sites in these 25 genes. Conclusions We identified 28 mutations in 25 genes that may be associated with unexplained abortion, some of which are associated with embryonic development. The results of the present study provide a good research direction for unexplained abortions.


Introduction
Miscarriage is a common complex disease that affects about 10-25% of clinically confirmed pregnancies [1,2]. The risk of miscarriage increases with age [1] and has been related to a range of causes, including obesity, endocrine and immunological dysregulation, embryo and oocyte aneuploidy, and parental chromosomal abnormalities, although the causal underlying factors remain largely unknown [3].
Normal pregnancy requires a series of vascular, metabolic, immunological, and endocrine regulatory processes, and disorders in these processes may lead to miscarriage. The majority (50-60%) of early pregnancy losses are caused by chromosomal aberrations, which either arise de novo in parents with normal karyotypes or are inherited from the parents [4,5]. Embryonic aneuploidy, which increases significantly with advanced maternal age, accounts for a large portion of spontaneous abortions [4]. Previous studies have suggested that an abnormal embryonic karyotype may contribute to abortion [6][7][8][9], and de novo numerical abnormalities, particularly autosomal trisomy, may explain a proportion of abortions [9]. However, in up to 50% of recurrent pregnancy loss (RPL) cases no etiology is found, and these are therefore referred to as unexplained recurrent pregnancy losses [10]. Polymorphic variants in different genes have been the target of investigations into genetic susceptibility to idiopathic RPL [11].
Beyond these, are there common genetic causes of abortion? The development and application of next-generation sequencing (NGS), including whole-exome sequencing (WES) and whole-genome sequencing (WGS), make it possible to screen entire genomes and detect new genetic origins of human disease at the nucleotide level [12]. However, NGS has been used less extensively to find the causes of embryonic or fetal death and associated developmental abnormalities in pregnancy loss. Evica summarized next-generation sequencing approaches and outcomes in recurrent pregnancy loss, including the specific mutations or affected genes reported in different studies [13]. Additional examples of the whole-family WES approach have been reported, with compound heterozygous mutations detected in a variety of genes considered to contribute to the lethal or abnormal phenotype in the embryo [14][15][16]. Some coding variants in genes potentially related to unexplained abortion were identified using whole-exome sequencing [17][18][19][20][21]. Chen et al. identified 275 potential developmental genes within 154 CNVs detected by WGS and CMA [22]. Filges et al. used family-based whole-exome sequencing to identify causal variants for a recurrent pattern of an undescribed lethal fetal congenital anomaly syndrome and identified mutations in KIF14 as a novel cause of an autosomal recessive lethal fetal ciliopathy phenotype [23].
In this study, we present a new method that combines NGS and bioinformatics approaches to detect mutations with potential functional effects associated with unexplained abortion. This method is also more effective when applied to large populations. We merged the data of 72 abortions (with chromosomal aneuploidy excluded) and used data from 100 normal individuals as a background control to find specific relevant genes and variants. We found 28 variants in 25 genes potentially related to the phenotype.
The functions of the genes at these loci are related to embryo development, immunity, neurodevelopment, genitalia, and other processes, and we report their characteristics here. We consider that some of these genes or variants may be potential future biomarkers for unexplained abortion.

Subjects
All samples in this study were obtained from Chinese individuals at The Affiliated Suzhou Hospital of Nanjing Medical University, Peking University Shenzhen Hospital, and the Second Affiliated Hospital of Shantou University.
Samples were collected from May 2018 to October 2019. The clinical and demographic data of the 72 abortion tissues and 100 control samples are summarized in Table 1.
A total of 72 abortion tissues were evaluated in this case-control study. These abortion tissues came from unexplained pregnancy losses that occurred before 24 weeks of pregnancy. The woman may have been miscarrying for the first time or may have had a previous miscarriage. Abortion tissue samples were excluded if (1) the pregnant woman was diagnosed with a chromosomally abnormal abortion or another known cause, or (2) the pregnant woman had previously given birth normally.
Because the human genome usually does not change from birth to adulthood, we collected peripheral blood from 100 adults with no significant medical history and no reproductive system diseases as the control group for comparing genetic variants with the abortion tissues. Demographic information and medical and obstetric histories were also evaluated for both groups. All procedures performed in the study involving abortion tissue samples were in accordance with the ethical standards of The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital. Tissue and blood samples were frozen in liquid nitrogen.

DNA extraction
After thawing, the aborted tissue was washed with phosphate-buffered saline (PBS), and genomic DNA was extracted using the QIAamp® DNA Mini Kit (Qiagen, Hilden, Germany). Genomic DNA was extracted from blood in the same way. The quantity and purity of gDNA were assessed with a Qubit® 3.0 Fluorometer (Invitrogen, Carlsbad, CA, USA) and a NanoDrop-One (Thermo Scientific, Wilmington, DE, USA). The isolated DNA was immediately stored at −80 ℃.
Library preparation and Illumina sequencing
1 µg of gDNA was fragmented to 200-500 bp with an M220 Focused-ultrasonicator (Covaris, UK), followed by end repair, tailing, and adaptor ligation with the VAHTS Universal DNA Library Prep Kit for Illumina (Vazyme); 7 cycles of amplification were performed. The resulting sequencing libraries were quality-controlled with an Agilent 2100 Bioanalyzer (Agilent) and a Qubit 3.0 (Invitrogen). Massively parallel sequencing of the aborted tissue libraries was performed on the BGI500 platform (BGI, China), yielding 70M-200M PE150 raw reads per sample.

Sequence data analysis
We used WGL-MPS (whole-genome low-coverage mate-pair sequencing) to detect CCRs (complex chromosomal rearrangements) in maternal chromosomes [24] and further checked for aneuploidy by conventional PGT-A [25]. The 72 abortion tissue samples without abnormalities of chromosome structure or number were therefore selected as the experimental group. Each sample had more than 70 million original reads. To eliminate differences in sequencing depth, we randomly selected 70 million reads from each sample and merged the raw data. The merged data set contained a total of 5.04 billion reads.
First, we filtered the merged whole-genome data using SOAPnuke (V1.5.0). After filtering, the merged data were mapped to the reference genome sequence (hg19) using BWA (V0.7.12). We used a self-developed program to analyze the BAM file and count the number of each of the four bases (A, T, C, G) at each covered site. We retained high-quality mutations (depth greater than 4 and non-reference frequency greater than 10%) and annotated the mutation sites with genomic, population frequency, and functional annotations. For the control group, we selected 100 whole-genome sequencing samples and applied the same filtering, alignment, BAM file statistics, mutation filtering, and annotation steps as for the experimental group to obtain a result file for each sample (Fig. 1a).
Then, we identified the genes and their mutation sites in the case sample, and retained meaningful mutation sites related to abortion (Fig. 1b). Afterward, we selected sites with a non-reference frequency greater than 10% from the 72 samples. Subsequently, we filtered out variants with allele frequencies above 1% in the ExAC database or with unknown population frequencies. The distribution of population frequency across chromosomes is shown in Fig. 2. In exonic regions, we retained variants of the following types: frameshift deletion, stopgain, nonframeshift insertion, nonsynonymous SNV, and splicing. We then located the corresponding mutation sites in the controls after the above screening. Finally, we identified the mutation sites whose non-reference mutation frequency differed significantly between the merged sample and the controls. The non-reference mutation frequency is the sum of the frequencies of all bases that differ from the reference, regardless of the direction of mutation.
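The depth-equalization step can be sketched as follows. This is a minimal in-memory illustration, assuming each sample's reads are already available as a Python list in which one element stands for one read (or one read pair, so mates stay together); the function names are hypothetical, and a real pipeline would stream FASTQ files rather than hold them in memory. It also checks the arithmetic behind the merged total: 72 samples × 70 million reads = 5.04 billion reads.

```python
import random
from typing import List, Sequence

READS_PER_SAMPLE = 70_000_000  # target depth per sample, from the text
N_SAMPLES = 72                 # abortion tissue samples, from the text


def downsample(reads: Sequence[str], k: int, seed: int) -> List[str]:
    """Randomly keep k reads without replacement from one sample (hypothetical helper)."""
    if len(reads) < k:
        raise ValueError("sample has fewer reads than the target depth")
    rng = random.Random(seed)  # per-sample seed makes the subsampling reproducible
    return rng.sample(list(reads), k)


def merge_downsampled(samples: Sequence[Sequence[str]], k: int, seed: int = 0) -> List[str]:
    """Downsample every sample to k reads, then concatenate them into one merged set."""
    merged: List[str] = []
    for i, reads in enumerate(samples):
        merged.extend(downsample(reads, k, seed + i))
    return merged


# Expected size of the merged data set under the study design.
expected_total = READS_PER_SAMPLE * N_SAMPLES
print(expected_total)
```

Sampling the same number of reads from every sample keeps any single deeply sequenced tissue from dominating the base counts of the merged data.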
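The per-site base counting and the high-quality filter can be illustrated with a toy pileup. This sketch is not the authors' self-developed program: it assumes ungapped alignments given as hypothetical `(0-based start, sequence)` pairs and ignores CIGAR operations, base qualities, and mapping qualities, all of which a real implementation reading a BAM file would have to handle. The thresholds follow the text: depth greater than 4 and non-reference frequency greater than 10%.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

BASES = "ACGT"


def count_bases(aligned_reads: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Count A/C/G/T at each reference position covered by (start, seq) reads.

    Assumption: reads are ungapped, so read offset i maps to position start + i.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq.upper()):
            if base in BASES:  # skip N and other ambiguous calls
                counts[start + offset][base] += 1
    return counts


def non_ref_frequency(counter: Counter, ref_base: str) -> Tuple[int, float]:
    """Return (depth, summed frequency of all bases that differ from the reference)."""
    depth = sum(counter[b] for b in BASES)
    if depth == 0:
        return 0, 0.0
    return depth, (depth - counter[ref_base.upper()]) / depth


def high_quality_sites(counts: Dict[int, Counter], reference: str,
                       min_depth: int = 4, min_non_ref: float = 0.10) -> Dict[int, Tuple[int, float]]:
    """Keep sites with depth > min_depth and non-reference frequency > min_non_ref."""
    kept = {}
    for pos in sorted(counts):
        depth, freq = non_ref_frequency(counts[pos], reference[pos])
        if depth > min_depth and freq > min_non_ref:
            kept[pos] = (depth, freq)
    return kept
```

For example, four reads `ACGT` and two reads `ACTT` starting at position 0 of the reference `ACGTACGT` give position 2 a depth of 6 and a non-reference frequency of 2/6, so that site passes both thresholds.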
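The variant-level filters described above can be summarized as a single predicate. This is a sketch under stated assumptions: the `Variant` record and its field names are hypothetical, consequence labels are treated as plain strings in the style of ANNOVAR annotations, and a missing ExAC frequency is represented by `None`. Because the text drops variants with unknown population frequency, that behavior is the default, but it is exposed as the hypothetical parameter `drop_unknown_af` so the rule can be switched off.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Variant types retained by the text, written as annotation labels.
KEPT_CONSEQUENCES = {
    "frameshift deletion",
    "stopgain",
    "nonframeshift insertion",
    "nonsynonymous SNV",
    "splicing",
}


@dataclass
class Variant:
    site: str                 # hypothetical identifier, e.g. "chr3:12345"
    consequence: str          # functional annotation label
    exac_af: Optional[float]  # ExAC allele frequency; None means unknown
    case_non_ref: float       # non-reference frequency in the merged case data


def filter_variants(variants: Iterable[Variant], max_af: float = 0.01,
                    min_case: float = 0.10, drop_unknown_af: bool = True) -> List[Variant]:
    """Apply the case-frequency, population-frequency, and consequence filters in turn."""
    kept = []
    for v in variants:
        if v.case_non_ref <= min_case:
            continue  # not above 10% non-reference frequency in the merged cases
        if v.exac_af is None:
            if drop_unknown_af:
                continue  # unknown population frequency is removed, as in the text
        elif v.exac_af > max_af:
            continue  # common variant (above 1% in ExAC)
        if v.consequence in KEPT_CONSEQUENCES:
            kept.append(v)
    return kept
```

Keeping each criterion as a separate early `continue` makes it easy to count how many variants each filter removes, which is useful when reporting the filtering cascade.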

Results
A total of 172 samples were recruited in this study: 72 abortion tissues from unexplained abortions and 100 controls. Their general information and data are shown in Table 1. We observed more mutations in the case sample merged from the 72 abortion tissues, and we speculated that genetic factors may play an important role in patients with unexplained abortion. In Fig. 3, the red line shows sites whose non-reference mutation frequency was greater than 10% in the merged abortion tissue sample. In the 100 control samples, we retained sites where the proportion of samples with a non-reference mutation frequency greater than 10% was < 0.05, shown by the green line. The horizontal axis is the mutation site, with a total of 525; the vertical axis is the proportion of non-reference mutation frequencies. The non-reference mutation frequencies at the 525 mutation sites differed significantly between the merged sample and the controls (Wilcoxon rank-sum test, P-value < 0.01, Fig. 3).
Many mutations differing from the reference sequence occurred in the merged sample, whereas almost no control samples without reproductive problems carried them. The black threshold line retained only non-reference frequencies greater than 20%, leaving 81 mutations. From these 81 sites, we selected 28 mutation sites closely related to reproduction; mutations in genes with unknown functions were not considered. All 28 loci carried heterozygous mutations, shown in Supplementary Table 1. In total, 28 sequence variants in 25 genes potentially associated with unexplained abortion were identified. Based on gene annotation, these 25 genes were divided into 6 classifications: fetal development (5 SNPs), cell cycle (2 SNPs), genital correlation (3 SNPs), kidneys (4 SNPs), nerve (4 SNPs), and other known functions (10 SNPs), as shown in Fig. 4a.
Next, we analyzed ten new abortion tissue samples. These samples came from pregnant women who had never had a normal birth, and abortions with chromosomal abnormalities were excluded. Three of these samples had no mutations at the 28 sites. For the other seven samples, the mutation ratio of each classification at these 28 sites is shown in Table 2 and Fig. 4b. The ratio is the number of mutated sites in the sample within a classification divided by the total number of mutation sites in that classification. The radar chart (Fig. 4b) shows that the mutation sites of samples S2 and S4 were mainly concentrated on sites related to fetal development, reaching 80%. Samples S1, S5, and S6 showed the same pattern, at 60%. The mutation sites of S3 were mainly related to the cell cycle (50%). S7 showed no obvious tendency.
Of the 25 genes we found, four are associated with embryonic development, with five mutation sites among them. A missense mutation was detected in MST1 (c.469A > C:p.K157Q). It accounted for 49.35% of the merged abortion data but did not occur in the controls. Both its RS number and its population frequency are unknown in variation databases. The MST1 gene has been associated with embryogenesis [26]. We also found a heterozygous mutation in RBPJ (c.548C > G:p.T183R) on chromosome 4. RBPJ encodes the recombination signal-binding protein for immunoglobulin kappa J region; its previous symbol is RBPSUH. RBPSUH and DLL4 (605185) are both involved in Notch signaling. Krebs et al. (2004) showed that Dll4 haploinsufficiency or RBPSUH knockout in mice resulted in severe vascular defects leading to embryonic lethality [27]. TRIM71 (c.1166T > C:p.I389T) was found on chromosome 3; this gene is related to embryo development and is known to cause embryonic lethality in animals [28,29]. In 2005, Mitschka et al. confirmed that loss of TRIM71 resulted in 100% embryonic lethality in mice; TRIM71−/− mice showed general growth retardation of the trunk, and all embryos were dead by E14.5 [29]. There were two variants in FOXD4L4, including c.506T > C:p.I169T; FOXD4L4 regulates embryonic development and tissue differentiation. Mutation sites significantly different from the controls have also been found in these two genes [30].
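For readers who want to reproduce the comparison, the Wilcoxon rank-sum test can be written with the Python standard library alone. This sketch uses the large-sample normal approximation with average ranks for ties and no tie or continuity correction, the same form as SciPy's `scipy.stats.ranksums`; the text does not state which implementation or options the authors used, and the example inputs are placeholders, not the study's data.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                      # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2             # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return z, p


# Placeholder per-site values: case non-reference frequencies vs control proportions.
z, p = rank_sum_test([0.30, 0.40, 0.50, 0.60], [0.00, 0.01, 0.02, 0.03])
```

A positive `z` means the first sample (here, the merged cases) tends to have larger values. Note that this test treats the two lists as independent samples; with site-matched values a paired test such as the Wilcoxon signed-rank test would be an alternative design choice.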
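The 20% threshold and the per-classification mutation ratio used for Table 2 and Fig. 4b reduce to a few lines. In this sketch the class sizes come from Fig. 4a (5 + 2 + 3 + 4 + 4 + 10 = 28 sites), while the site identifiers and the `site_class` mapping are hypothetical placeholders; each ratio is the number of mutated sites a sample carries within a classification divided by the number of sites in that classification.

```python
from collections import Counter
from typing import Dict, Iterable

# Number of retained sites per classification (Fig. 4a).
SITES_PER_CLASS: Dict[str, int] = {
    "Cell cycle": 2,
    "Fetal development": 5,
    "Genital correlation": 3,
    "Kidneys": 4,
    "Nerve": 4,
    "Other": 10,
}


def above_threshold(case_freqs: Dict[str, float], threshold: float = 0.20) -> Dict[str, float]:
    """Keep sites whose non-reference frequency in the merged case data exceeds the threshold."""
    return {site: f for site, f in case_freqs.items() if f > threshold}


def classification_ratios(mutated_sites: Iterable[str],
                          site_class: Dict[str, str]) -> Dict[str, float]:
    """Per classification: mutated sites in this sample / total sites in the classification."""
    hits = Counter(site_class[s] for s in set(mutated_sites))  # set() avoids double counting
    return {cls: hits[cls] / total for cls, total in SITES_PER_CLASS.items()}
```

With five fetal-development sites, a hypothetical sample mutated at four of them scores 4/5 = 0.8 for that class, which is how percentages such as 80% and 60% arise; with two cell-cycle sites, one mutated site gives 50%.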

Discussion
In the present study, we merged the data of 72 abortions (with chromosomal abnormalities excluded) and used data from 100 normal individuals as a control to find specific relevant genes and variants. The aim was to screen the entire genome and detect the genetic origin of human disease at the nucleotide level. Using a strict filtering strategy, we identified 28 rare sequence variants in 25 genes potentially associated with unexplained abortion.
These 25 genes were divided into six classifications: fetal development, cell cycle, genital correlation, kidneys, nerve, and other known functions. At these sites, the mutation frequency relative to the reference sequence exceeded 20% in the case data, whereas almost no control samples carried these mutations. We verified the findings in 10 additional abortion tissues without chromosomal abnormalities, most of which carried mutations at sites in these genes.
We found that mutation sites in four autosomal genes, such as MST1 (c.469A > C:p.K157Q), were associated with an increased incidence of abortion. The functions of these genes have been reported in previous studies and verified in animal models, but these particular loci have not been reported before. In addition, as described below, we found several other genes related to embryo development. Based on our data, we suggest that single-gene mutation sites may be strongly linked to abortion of unknown cause. However, a larger population analysis is still needed, and the sites and genes should be validated in animal models in follow-up studies.
It is noteworthy that a non-reference mutation in MUC4 (c.12119C > T:p.S4040L) reaches 46.75% in the merged case data but only 3% in the control samples. Moehle et al. (2006) found that MUC4 was highly expressed in adult trachea and colon and in fetal lung, with moderate expression in adult testis, prostate, mammary gland, lung, and stomach [42]. Such a high mutation frequency in the abortion tissue is unlikely to be accidental. Next, we will expand the study population for further verification.
Conclusion
We found specific relevant genes and variants by comparing abortion and control samples. Variants at 28 loci may be related to abortion. The functions of the genes at these loci are related to fetal development, cell cycle, neurodevelopment, genital correlation, kidneys, and other processes. Although further research is needed, these results provide a good research direction for abortions of unknown cause, and subsequent experiments can be performed to verify the sites found.
Declarations
and design; been involved in revising the manuscript critically for important intellectual content and given final approval of the version to be published. All authors contributed to and approved the final manuscript.

Funding
This study was funded by grants from the Special Funds for the Sanming Project of Medicine in Shenzhen, the Shenzhen Leading Gynecological Subject, and the Shenzhen Municipal Science and Technology Innovation Committee (JSGG20180703164202084).

Availability of data and materials
All data generated or analyzed during this study are included in the article.

Competing interests
The authors declare that they have no conflict of interest.

Ethics approval and consent to participate
This study was approved by the ethics committee of The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, and informed consent was obtained from all participants before enrollment in the project. This study conformed to the approved guidelines.

Consent for publication
All patients in this study provided their consent for publication.

Tables
Table 1. Overview of the cohorts studied.
Table 2. The mutation ratio of each classification in the seven samples, across the 28 mutation sites.

SampleID | Cell cycle (2) | Fetal development (5) | Genital correlation (3) | Kidneys (4) | Nerve (4) | Other (10)

Figure 3. Comparison of non-reference mutation frequencies in case and control. The non-reference mutation frequencies at the 525 mutation sites differ significantly between the merged sample and the controls (Wilcoxon rank-sum test, P value = 2.2e-16). Red line: non-reference mutation frequency greater than 10% in the case. Blue line: proportion of control samples whose non-reference mutation frequency is greater than 10%, retaining points where this proportion is less than 0.05. The horizontal axis is the mutation site, with a total of 525; the vertical axis is the proportion of non-reference mutation frequencies. The black threshold line keeps only non-reference mutation frequencies greater than 20%. After filtering, 81 mutation sites remain, of which 28 are closely related to reproduction.

Supplementary Files
This is a list of supplementary files associated with this preprint.
TableS1.xls