All samples in this study were obtained from Chinese individuals at The Affiliated Suzhou Hospital of Nanjing Medical University, Peking University Shenzhen Hospital, and the Second Affiliated Hospital of Shantou University. Samples were collected from May 2018 to October 2019. We summarized the clinical and demographic data of the 72 abortion tissues and 100 control samples (Table 1).
A total of 72 abortion tissues were evaluated in this case-control study. These tissues came from unexplained pregnancy losses that occurred before 24 weeks of pregnancy. The women may have been experiencing their first miscarriage or may have had a previous miscarriage. Abortion tissue samples were excluded if (1) the pregnant woman was diagnosed with an abortion caused by a chromosomal abnormality or another known cause, or (2) the pregnant woman had previously had a normal delivery.
Because the human genome usually does not change from birth to adulthood, we collected peripheral blood from 100 adults with no significant medical history and no reproductive system diseases as the control group, so that their genetic variants could be compared with those of the abortion tissues. Demographic information and medical and obstetric histories were also evaluated for both groups. All procedures performed in the study involving abortion tissue samples were in accordance with the ethical standards of The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital. Tissue and blood samples were frozen in liquid nitrogen.
After thawing, the aborted tissue was washed with phosphate-buffered saline (PBS), and genomic DNA (gDNA) was extracted with the QIAamp® DNA Mini Kit (Qiagen, Hilden, Germany). gDNA was extracted from blood in the same way with the same kit. The quantity and purity of gDNA were assessed with a Qubit® 3.0 Fluorometer (Invitrogen, Carlsbad, CA, USA) and a NanoDrop-One (Thermo Scientific, Wilmington, DE, USA). The isolated DNA was immediately stored at −80 °C.
Library preparation and Illumina sequencing
1 µg of gDNA was fragmented to a size range of 200–500 bp with an M220 Focused-ultrasonicator (Covaris, UK), followed by end repair, tailing, and adaptor ligation with the VAHTS Universal DNA Library Prep Kit for Illumina (Vazyme); amplification was performed for 7 cycles. The resulting sequencing libraries were quality-controlled with an Agilent 2100 Bioanalyzer (Agilent) and a Qubit 3.0 (Invitrogen). Massively parallel sequencing of the aborted tissue libraries was performed on the BGI500 platform (BGI, China), yielding 70–200 million PE150 raw reads per sample.
Sequence data analysis
We used WGL-MPS (whole-genome low-coverage mate-pair sequencing) to detect complex chromosomal rearrangements (CCRs) in maternal chromosomes and further checked for aneuploidy by conventional PGT-A. On this basis, 72 abortion tissue samples without abnormalities in chromosome structure or number were selected as the experimental group. Each sample had more than 70 million raw reads. To eliminate differences in sequencing depth, we randomly selected 70 million reads from each sample and merged them. The merged data set contained 5.04 billion reads in total (72 × 70 million).
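The random selection of a fixed number of reads per sample can be illustrated with reservoir sampling. This is only a minimal sketch, not the procedure actually used in the study: the function name `reservoir_sample`, the seed, and the treatment of each record as one read are assumptions made for illustration.

```python
import random

READS_PER_SAMPLE = 70_000_000  # per-sample target stated in the text
N_SAMPLES = 72                 # abortion tissue samples in the experimental group


def reservoir_sample(records, k, seed=0):
    """Return k records drawn uniformly at random from an iterable.

    Hypothetical helper (Algorithm R); the study does not state which
    tool or algorithm was used for downsampling.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


# Size of the merged data set after downsampling every sample.
merged_total = READS_PER_SAMPLE * N_SAMPLES
```

Reservoir sampling reads the input once and keeps only `k` records in memory, which is why it is a common choice when the total number of reads is not known in advance.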
First, we filtered the merged whole-genome data with SOAPnuke (V1.5.0). After filtering, the merged data were mapped to the reference genome sequence (hg19) using BWA (V0.7.12). We used a self-developed program to analyze the BAM file and counted the occurrences of each of the four bases (A, T, C, G) at every covered site. We retained high-quality mutations (depth greater than 4 and non-reference frequency greater than 10%) and annotated the mutation sites with genomic, population frequency, and functional annotations. For the control group, we used 100 whole-genome sequencing samples and applied the same filtering, alignment, BAM file statistics, mutation filtering, and annotation steps to each sample, obtaining one result file per sample (Fig. 1a).
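The site-level quality filter described above can be sketched as follows. The self-developed program is not available, so the function names and the dictionary layout of the per-site base counts are assumptions; only the two thresholds (depth greater than 4, non-reference frequency greater than 10%) come from the text.

```python
MIN_DEPTH = 4            # sites must have depth strictly greater than this
MIN_NON_REF_FREQ = 0.10  # non-reference frequency must exceed 10%


def non_reference_frequency(counts, ref_base):
    """Return (depth, non-reference frequency) for one genomic site.

    counts   : dict mapping 'A', 'C', 'G', 'T' to observed read counts
               (hypothetical layout, not the authors' file format)
    ref_base : reference base at this site
    """
    depth = sum(counts.get(base, 0) for base in "ACGT")
    if depth == 0:
        return 0, 0.0
    # All bases that differ from the reference are pooled together,
    # regardless of which alternative base was observed.
    non_ref = depth - counts.get(ref_base.upper(), 0)
    return depth, non_ref / depth


def passes_site_filter(counts, ref_base):
    """Apply the depth and non-reference frequency thresholds."""
    depth, freq = non_reference_frequency(counts, ref_base)
    return depth > MIN_DEPTH and freq > MIN_NON_REF_FREQ
```

For example, a site with 8 reference reads and 2 non-reference reads has depth 10 and a non-reference frequency of 0.2, so it is retained, whereas a site covered by only 4 reads is dropped regardless of its base composition.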
Using Exomiser (v12.1.0), we selected all genes on hg19 related to abortion, based on the terms therapeutic abortion (HP:0030449), recurrent spontaneous abortion (HP:0200067), and spontaneous abortion (HP:0005268). We then identified these genes and their mutation sites in the case sample and retained the meaningful mutation sites related to abortion (Fig. 1b). Next, we selected the sites with a non-reference frequency greater than 10% from the 72 samples. Subsequently, we filtered out variants with allele frequencies above 1% in the ExAC database or with unknown population frequencies. The distribution of the population frequency in CHR is shown in Fig. 2. In exon regions, variants of the following types were retained: frameshift deletion, stopgain, nonframeshift insertion, nonsynonymous SNV, and splicing. We then located the corresponding mutation sites in the controls. Finally, we identified the mutation sites whose non-reference mutation frequency differed significantly between the merged sample and the controls. The non-reference mutation frequency is the summed frequency of all bases that differ from the reference, regardless of the direction of mutation.
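The population-frequency and functional-class filters can be expressed compactly. The sketch below is illustrative only: the record keys `exac_af` and `functional_class` and both function names are hypothetical, and the annotation format actually used in the study is not specified. The 1% ExAC threshold and the list of retained variant types are taken from the text.

```python
MAX_EXAC_AF = 0.01  # variants above 1% allele frequency in ExAC are removed

# Variant types retained in exon regions, as listed in the text.
RETAINED_CLASSES = {
    "frameshift deletion",
    "stopgain",
    "nonframeshift insertion",
    "nonsynonymous SNV",
    "splicing",
}


def keep_variant(variant):
    """Return True if an annotated variant passes both filters.

    variant : dict with hypothetical keys
              'exac_af'          -> float, or None if the frequency is unknown
              'functional_class' -> annotation string
    """
    af = variant.get("exac_af")
    # Unknown population frequencies are filtered out, as are common variants.
    if af is None or af > MAX_EXAC_AF:
        return False
    return variant.get("functional_class") in RETAINED_CLASSES


def filter_variants(variants):
    """Keep only the variants that pass the frequency and class filters."""
    return [v for v in variants if keep_variant(v)]
```

Treating a missing frequency as a failure, rather than as a frequency of zero, mirrors the statement that variants with unknown population frequencies were removed.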