Prediction key genes and the potential molecular mechanism for early recurrent pregnancy loss by Multi-Omics analysis

Background: The potential mechanism of early recurrent pregnancy loss (ERPL) has not been fully elucidated, a multi-omics analysis can help us find key genes. Results: We download data from Gene Expression Omnibus (GEO), 90 hypermethylation-down regulated genes and 49 hypomethylation-up regulated genes were identified by intersection. Compared with the normal early pregnancy group, the expression of ERCC2 and MLH1 was lower in ERPL group, but the expression of PLEK and FOS was higher, and the expression of MLH1 was statistically significant (p<0.05). Compared with the anembryonic (empty sac) miscarriage group, the expression of MLH1, PLEK, FOS decreased, and that of ERCC2 increased in the embryonic miscarriage group. TF-MDEGs networks predicted SP1, POLR2A, YY1, CREM and CREB1 were involved in methylation regulation of DNA promoter with MDEGs. Among them, YY1, FOXP3 and p53 may be related to the mechanism of MLH1 in ERPL. Conclusions: Our study identified possible aberrant MDEGs, and TF- MDEGs regulatory networks may be related to its mechanism. MLH1 may play an important role in early embryonic development.


Introduction
Early recurrent pregnancy loss (ERPL) concerning approximately 1% of couples aiming at childbirth [1] , is generally defined as two or more consecutive clinical pregnancy losses before 12 weeks of gestation. Majority of sporadic cases of ERPL (50-60%) are due to an accidental non-inherited nondisjunction events causing chromosome aneuploidies [2] .
Another current known causes of ERPL include age, uterine abnormalities, endocrine disorders, immune disorders and thrombosis [3] . However, there are still 25 to 50% of normal karyotype RPL remain unexplained after evaluation [4] , and the risk of premature birth, stillbirth, pre-eclampsia increases after normal pregnancy [5] . ERPL continues to be a clinically frustrating challenge for pregnant women. Thus, discovering a new mechanism is urgent.
DNA methylation, as one of the major epigenetic modifications, contributes to gene expression regulation and chromosomal stable modification [6] . Animal studies show that genome-wide changes in methylation reprogramming result in embryonic arrest and embryo loss prior to implantation [7] . During pregnancy, placenta plays a crucial role in the normal growth and development of the fetus. Scholars observed abnormal DNA methylation in human chorionic villus, and the loci of the imprinted genes can result in ERPL with normal karyotype [8,9] . A genome-wide DNA methylation analysis reveals that CREB5 hypomethylation can recruit TFs, such as P53 and SP1, which alter trophoblast cell function, leading to RPL [10] . The promoter or enhancer region is often accompanied by the binding of TFs, suggesting that it may be involved in the regulation of DNA methylation [11][12][13] . Abnormal DNA methylation of the placental gene is closely related to the normal growth and development of the fetus, leading to normal human pathological conditions, especially abortion [14] . Insufficient DNA methylation in villus tissue is likely to be an independent predisposing factor for ERPL.
However, due to the small size of most studies, which is limited to the existing etiological hypothesis of ERPL to measure a small set of genes, this phenomenon may be explained in a biased manner. Multiple cellular processes and biological key genes and pathways cannot be involved. As a result, the relationship between TF and DNA methylation is inconclusive with RPL and needs to be further studied.
To date, no research has been performed to jointly analyze of both gene and methylation expression profiling microarray in ERPL. We performed a multi-omics analysis on existing microarray data, combined with gene expression profiling microarray (GSE22490, GSE76862) and gene methylation profiling microarray (GSE43256, GSE74738), and identified aberrant MDEGs and pathways in ERPL. Key genes were revealed through PPI network. TFs associated with DNA promoter methylation were also predicted to explore possible mechanisms of TF-DNA methylation interactions. In this way, we expect to find novel key genes and reveal the potential molecular mechanism in ERPL.

Microarray data processing
Gene expression profiling data (GSE22490, GSE76862) and gene methylation profiling data (GSE43256, GSE74738) were collected through the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) of The National Center for Biotechnology Information (NCBI). All of these were collected from early karyotype normal placental villi.
The microarray data were analyzed by GEO2R, and identify differentially expressed genes (DEGs) and differentially methylated genes (DMGs), Cut-off criteria for DEGs and DMGs were defined as p<0.05 and |t|>2. By overlapping DEGs and DMGs, MDEGs were obtained.
Functional analysis and enrichment analysis were performed on MDEGs at least three or three intersections.

Integration of Protein-protein interaction (PPI) networks and functional enrichment analysis
The STRING database was used to construct PPI network with DMGs (cut-off criteria score 0.4). PPI network was visualized by Analyzer module in Cytoscape software, and the key genes were determined by node degree. To clearly understand the biological mechanism of DMGs, Gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed using the Omicshare tools (http://www.omicshare.com/tools) and the R software cluster Profiler package. P<0.05 was regarded as statistically significant.

Study population and sample collection
The villus tissues were obtained from selected 22  history. The embryonic development of this pregnancy is normal. The gestational age of the two groups was less than 12 weeks, and no significant difference was observed in age and gestational age (ultrasound) between the two groups. At the same time, the clinical data, such as age, previous pregnancy history, and gestational age were collected ( Table   1). The following specific ultrasound information was included: gestational sac size (GDS), crown-rump-length (CRL), embryo or germ and cardiac tube beat. According to ultrasound, the ERPL group was divided into two subgroups: the anembryonic (empty sac) miscarriage group (n = 6) and embryonic miscarriage group (n = 16). Anembryonic (empty sac) miscarriage is defined as the loss of intrauterine pregnancy with no yolk sac or embryo but with pregnancy sac during ultrasonography, and embryonic miscarriage is defined as the loss of intrauterine pregnancy in which ultrasound shows that the fetus does not have a pulsatile tube [15] . All subjects in this study were provided with a verbal explanation and had signed an informed consent. The procedures were approved by the medical ethics committee of the first affiliated hospital of Guangxi Medical University.

qRT-PCR of key genes
Four key genes of ERCC2, PLEK, FOS, MLH1 were selected and verified by qRT-PCR. Villus tissue of 100 mg was taken and evenly ground. The total RNA was extracted using TRIZOL (Thermo). Total RNA of 1 μg was reverse-transcribed according to generate cDNA by using the reverse transcription kit (QIAGEN). The reverse-transcribed cDNA was diluted 10 times and used as a template for qRT-PCR detection. The reaction conditions of PCR were as follows: 95 ℃ 2min, 95 ℃ 5s, 60 ℃ 30s, 40 cycles, along with GAPDH was used as endogenous control. The relative expression was calculated by 2-∆∆Ct. All primer information of PCR was shown in Table S1.

Statistical analysis
Data were presented as Mean ± SD. The significant difference between two groups was determined by student's t-test using SPSS 22.0, visualized with GraphPad PrismVersion 8 (San Diego, CA, USA). P<0.05 were considered statistically significant.

TF-MDEGs networks
GeneHancer (https://www.genecards.org/) predicted the TFs of the key genes' promoter interaction and the highest part of the TFs of the Gene Association Score. PROMO (http://alggen.lsi.upc.es/cgibin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3) and JASPAR (http://jaspar.genereg.net/) databases were used to verify whether binding sites between TFs and key genes. At the same time, from the Pubmed retrieval of TFs and RPL of the published literature, combined with the TFs, the possible mechanism of key genes was further explored.

Identification of aberrantly MDEGs in ERPL
According to analyze by GEO2R, GSE22490 was screened 707 up-regulated genes and 692 down-regulated genes, 644 up-regulated genes and 1326 down-regulated genes were screened by GSE76862. A total of 2800 hypermethylated genes and 993 hypomethylated genes were found in GSE43256, and 9509 hypermethylated genes and 8985 hypomethylated genes were identified by GSE74738. As a result, 90 hypermethylateddown expression genes and 49 hypomethylated-up expression genes were obtained by overlapping DEGs and DMGs (Fig. 1, Table 2). The heatmap of top 50 up-regulated and down-regulated genes in GSE22490 and GSE43256 were shown in Fig. 2.

PPI networks and functional enrichment
Constructed by STRING database, the first 10 key genes in PPI networks were PLEK, FOS,  (Table 3). The difference was not statistically significant (Fig 4, Table S2, S3).

TF-MDEGs networks and predicted MLH1 function
The hypermethylated-down expression genes had the highest correlation with SP1, POLR2A, YY1, DRF2 and PKNOX1 (Fig. 5a). The hypomethylated-up expression genes were closely related to SP1, POLR2A, CREM, CREB1 and ZEB2 (Fig. 5b). Thirty-three TFs were related to ERPL from Pubmed and through PROMO (cut-off criteria 0%). Twenty-two TFs associated with MLH1 were predicted. By JASPAR database (cut-off criteria score 450), 21 TFs with binding sites to MLH1 were predicted. Combined with the reported TFs related to ERPL, FOXP3, YY1 and P53 may be related to the mechanism of MLH1 in ERPL (Fig 5c, d).

Discussion
Abnormal methylation, especially abnormal promoter CpG island methylation, can be used as a marker to monitor the dynamic changes of many diseases [16,17] , and plays an important role in embryonic development, X chromosome inactivation, genetic imprinting, and chromatin stability [14] . Therefore, abnormal DNA methylation in early placental villi is also considered to be an important mechanism leading to developmental failure [7] . The analysis of abnormal DNA methylation genes helps discover the mechanism of placental resistance to adverse environmental conditions and allows us to take protective measures.
In our study, we found PLEK, FOS, PLCG2, ERBB2, EXO1, MLH1 and ERCC2 are in the core position in the regulatory network of ERPL occurrence and development. Early abortion less than 12 weeks of gestation is generally calculated according to the last menstruation of a woman, but delayed ovulation or miscarriage may be observed, thus, estimating the embryo size on the basis of ultrasound may be reliable [18] . In our study, ultrasound was used to estimate the gestational age of embryos, and the normal groups with no statistical difference were selected to respond well to the factors related to early embryonic development.
Early pregnancy loss can be divided into anembryonic (empty sac) miscarriage, yolk sac miscarriage and embryonic miscarriage, usually due to chorionic vascular hypoplasia, resulting in abnormal embryonic development [19] . However, Reus et al suggested that although a decrease in villus angiogenesis could be observed in the abortion tissues of women with recurrent pregnancy loss, there was no significant difference in the degree of vascularization between empty sac miscarriage and embryonic miscarriage [20] . Previous studies have shown that the size of yolk sac is related to the results of specific types of ultrasound [21] . However, research on the relationship between ultrasound and normal karyotype abortion is lacking. Whether other mechanisms affect the development of embryos is worth considering. In our study, we specifically divided the ERPL group into anembryonic (empty sac) miscarriage and embryo miscarriage according to ultrasound in order to explore whether the genes related to DNA repair and immunity are related to the early embryonic development. We preliminarily found that the expression of MLH1, PLEK, FOS was lower, and that of ERCC2 was higher in the empty sac miscarriage group than in the empty miscarriage group. We speculated that these genes may be involved in the development of early embryos. We could not obtain statistically significant results though due to the limitation of sample size. Yolk sac miscarriage was also excluded in the scope of the study. Accurately judging whether these genes influence the development of yolk sac is impossible. Expanding the sample size and further investigation are necessary.
The mechanism of DNA repair is crucial for maintaining the fidelity of DNA replication during mitosis, meiosis and post-meiotic development of male and female germ cells.
Common approaches include nucleotide excision repair (NER), base excision repair (BER) and DNA mismatch repair (MMR) [22] . Human MMR genes mainly include HMLH1, HMSH1, HMSH6. The defects of MMR gene lead to increase of base mismatch and instability of microsatellites during DNA replication, resulting in cell growth and apoptosis out of control [23] . MMR protein can promote cell cycle arrest, P53 protein phosphorylation and initiate apoptosis, whereas loss of MMR function can inhibit apoptosis involving exogenous DNA damage due to changes in the external environment, thus losing the inhibition of tumours [24] . To date, MMR plays an important role in embryonic development, but the detailed mechanism is unclear. Mismatch repair proteins involved in meiosis are identified, including MLH1, MLH3 and MSH4 [25] . Gene knockout mice reveal that MLH1 greatly influence infertility in female mice [26] . Ernest et al found that in men with sperm disorders, the ability of spermatocytes to express MLH1 decreases, resulting in meiosis and spermatogenesis [27] . In other studies, MLH1 gene is highly expressed in patients with azoospermia, combining with hMuts to form a large number of temporary complexes, initiating mismatch repair, resulting in excessive repair of DNA during meiosis and may likely cause meiosis to fail [28] . Although the results of spermatogenesis in MLH1 are inconsistent, MLH1 may play an important role in early embryonic development. To further explore the role of MLH1 in embryonic development is necessary. In the normal villi of early pregnancy, cytotrophoblast cells are highly proliferative, DNA replication is active and high expression of mismatch repair genes, such as MLH1, can maintain the stability of genetic material [29] . By contrast, the expression of syncytiotrophoblast is negative or weakened. Therefore, we speculate that the MLH1 expression helps maintain the normal function of trophoblast cells and meet the needs of embryonic development. Our qRT-PCR results also confirmed that the MLH1 expression in the villi of the normal early pregnancy group was significantly higher than that of ERPL. This result was consistent with the function of MLH1 in trophoblast cells.
The function of MLH1 gene seems abnormally related to the corresponding regulatory sequence of the promoter [30] . Some TFs are preferred to combine with these areas, which can regulate the establishment and maintenance of cells and local DNA methylation levels in a sequence-specific manner [31] . A semi-estrogen response element (hemi-ERE) is also present in the promoter region of the MLH1 gene. Under the action of other TFs, its estrogen can bind to the estrogen receptor and interact with the corresponding regulatory sequence of the MLH1 gene promoter and regulation of the transcription of downstream genes [32] . We predicted by GeneHancer that TF, SP1 and YY1, which may bind to MLH1.
SP1 is the first identified member of the zinc finger transcription factor family. Binding to DNA is essential for the embryonic development of normal mammals. Abnormal DNA binding domain can lead to a variety of heterogeneous phenotypic abnormalities in mice and intrauterine death [33] . As a member of the zinc finger protein GLI-Krüppel family, YY1 is involved in the inhibition and activation of a variety of promoters [34] and regulation of embryogenesis, proliferation and tumour and other cellular processes. The YY1 deletion leads to serious neurological defects, resulting in embryo death during implantation [35][36][37] . Tian et al found that there is a binding site between YY1 and MMP2 promoter, affecting the migration and invasion of trophoblast cells by regulating the expression of MMP2D [38] . We further predicted the binding sites of MLH1 promoter and TFs by UCSC and PROMO, needing to be further verified by experiments.

Conclusion
In this study, we predicted key genes in ERPL by a multi-omics analysis. The results showed that MLH1 in villus tissue of ERPL patients was significantly lower than that of normal early pregnancy, we speculate that DNA mismatch repair gene MLH1 may play an important role in the occurrence and development of ERPL. Further verification of how MLH1 plays an important role in the pathogenesis of ERPL will be the focus of our next research.

Ethics approval and consent to participate
All subjects in this study were provided with a verbal explanation and had signed an informed consent. The procedures were approved by the medical ethics committee of the first affiliated hospital of Guangxi Medical University.

Consent for publication
Not applicable.

Availability of data and material
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Funding
This work was supported by grants from the National Natural Science Foundation of China (Nos. 81960281, Nos. 81660256).
CQW and ZZ collected and analyzed the data from GEO, ZWZ collected the patient data. JC and LJD carried out the qRT-PCR. LH participated in the design of the study and wrote the manuscript. LHP conceived of the study, and participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript. the gestational diameter of the sac; GA on ultrasound: <7 weeks' size, GA (days)=GDS (mm) +30; 7-12 weeks' size, GA (days)= CRL (mm)+42 Data are presented as Mean ± SD, and dichotomous variables as n (%); *Two sample (ERPL/normal) t-test was used.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download.