Comprehensive analysis of micro RNAs in recurrent implantation failures patients and construction of prediction models based on circulating micro RNAs

Background Recurrent implantation failure (RIF) is an obstacle in the process of assisted reproductive technology (ART). At present, there was limited research on its pathogenesis, diagnosis and treatment methods. Results In this study, a series of analytical tools were used to analyze differences in miRNAs, mRNAs and lncRNAs in the endometrium of patients in the RIF group and the control group. At the same time, multiple databases are used to predict the target genes of non-coding RNAs. Then the competing endogenous RNA (ceRNA) network was built to describe the relationship between gene regulation in the endometrium of the RIF. Concludes Based on the results of the logistic regression of co-expression miRNAs between serum and endometrial samples, we built a predictive model based on circulating miRNAs.


Introduction
Recurrent implantation failure (RIF) is a thorny issue that couples undergoing in vitro fertilization (IVF) /intracytoplasmic sperm injection (ICSI) may face. The generally accepted de nition is that women under the age of 40 have transferred at least four high-quality embryos in at least three fresh or frozen cycles or have transferred a total of 10 high-quality embryos but have not yet achieved clinical pregnancy (1)(2)(3).
Though along with improving in vitro fertilization embryo transfer (IVF-ET) technology and increasing clinical pregnancy rate, recurrent implantation failure is still a tough problem in the process of IVF-ET. The normal embryo implantation generally only occurs during the window of implantation (WOI)(4) which refers to days 20-24 of the normal menstrual cycle. Abnormalities of the endometrium at this stage were important factors that lead to RIF.
MicroRNAs (miRNA) are a class of non-coding RNA molecules with a length of about 22 nucleotides that are widely found in eukaryotic cells. It plays an important role in the post-transcriptional regulation of genes, which includes targeting mRNA, negatively by mRNA cleavage (5), releasing translational suppression (6), and targeting gene promoters (7). There had been some studies con rming the role of miRNA in endometrial regulation (8)(9)(10)(11). For example, miR-30b, miR-30d, and miR-494 had been reported to play an important role in the regulation of endometrial function (8). Recent researches reported miRNAs associated with RIF, such as miR-34c-5p (12) and miR-148A-3P (13). But the related mechanism was still unclear, and further in-depth studies were still lacking.
MicroRNAs not only exist in human cells, but also widely exist in extracellular body uids, such as plasma, serum, urine and saliva (14). Peripheral blood miRNA is stable and can resist freezing and thawing at 4 °C and − 80 °C (15). The stability is due to the binding with protein complexes, mainly Argonaute2 protein complexes, resulting in a protective effect on miRNAs (16). This characteristic makes peripheral blood miRNA a more stable biomarker for disease diagnosis (17). Fatemeh Azhari et. al showed that hsa-miR-145 and hsa-miR-23b were signi cantly down-regulated in both serum samples and endometrial samples of RIF patients (18). However, they did not describe the correlation in miRNA expression between serum and endometrium samples, nor the related regulatory mechanism.
Moreover, the regulatory effect of miRNA on mRNA can be regulated by LncRNA through competitive binding, which is called the endogenous RNA (ceRNA) network hypothesis (19). In the present study, we aim to use a larger sample size of data in our analysis to explore the regulatory molecular mechanism in endometrium of RIF patients at the WOI stage. At the same time, we aim to look for peripheral blood miRNAs closely related to RIF and provide a new way for non-invasive early diagnosis for RIF, thereby improving the clinical outcome of patients.

Method
We used R software (version 3.

Data acquisition and preprocessing
The raw data of endometrial expression pro le and corresponding clinical data of RIF group and control group (GSE71331, GSE71332) were obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo). Paired serum and endometrial miRNA expression pro le data were extracted from GSE108966.
To ensure consistency in the results of the analysis, we started with raw data of the dataset. Due to the different character of the platforms, the raw data of GSE71331 and GSE71332 were processed by "Limma" R package(20) (Agilent-052909 CBC lncRNA mRNA V3, Agilent-046064 Unrestricted Human miRNA V19.0), and the raw data of GSE108966 was processed by "Deseq2" R package(21) (Illumina HiSeq 2500).
Weighted correlation network analysis (WGCNA) of miRNA of endometria samples With the "WGCNA" R package (22), WGCNA was performed on differentially expressed miRNAs which selected based on GSE71332 dataset. The work ow of WGCNA was as follows. First, outliers were reduced by ltering the RNA-seq data. The coexpression matrix was constructed based on the absolute points of the relationships between gene expression levels. Then, we constructed a Pearson correlation matrix of paired genes, and a weighted adjacency matrix was developed based on the power function (where is the Pearson correlation between gene b and gene d, is the adjacency between gene b and gene d, and β emphasizes the strength of the relationship between the genes). Then, we selected an optimal β value to develop a scale-free coexpression network and a similarity matrix. Next, a topological overlap matrix (TOM) was constructed from the adjacency matrix. We performed average linkage hierarchical clustering with a minimum gene dendrogram size of 40 by using TOM-based dissimilarity measurements. By analyzing the modules, we calculated the dissimilarity and constructed module dendrograms for these modules.
We then calculated gene signi cance (GS) to estimate the signi cance of each module and measure the relationships between genes and sample traits. In the principal component analysis, module eigengenes (MEs) were considered as the main components, and we summarized gene expression levels as a single feature within a given module. Then, the p-value of the linear regression between gene expression levels and clinical feature were transformed, and the result was de ned as the GS value (lgP = GS). Next, to estimate the relationship between the module and sample traits, we calculated the average value of GS, which was then de ned as the module signi cance (MS). We used the relevant p-value to determine statistical signi cance. Next, clinical data including rst trimester losses and live births were combined with gene modules for further analysis.
For the identi cation of key genes, the GS and module membership (MM, the correlation between the genes in the module and their expression pro les) of every key gene were calculated with the following thresholds: cor. gene GS > 0.5 and cor. gene MM > 0.8.
Selection and validation of co-expression miRNAs between serum and endometrial samples The intersection of endometrium DE miRNAs and serum DE miRNAs was taken as intersection miRNAs. To ensure that their expressions were relevant, Pearson correlation analysis is performed by using GraphPad Prism (version 8). Genes with the Pearson correlation coe cient | r | ≥ 0.5 were considered to be co-expressed miRNAs between serum and endometrium. And the intersection miRNAs were then validated in GSE71332.

Functional enrichment analysis of targeted DE mRNAs
To understand the biological function of targeted DE mRNAs of GSE71332 and targeted mRNAs of GSE108966, Metascape (29) (http://metascape.org), which includes abundant functional annotations, such as KEGG pathway, Reactome pathway, canonical pathway, GO biological process, and CORUM (the comprehensive resource of mammalian protein complexes) annotations, was then used to perform functional enrichment analysis with a p-value of < 0.01 as the cutoff value (29). Selected terms with a pvalue of < 0.01 and a number` of genes greater than or equal to 3 were considered signi cant terms.
Transcriptional Regulatory Relationship analysis of targeted DE mRNAs TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust) is a TF-target regulatory interactions database based on the manual curation of Medline abstracts (30). TRRUST was subsequently used to screen transcription factors related to targeted DE mRNAs and targeted mRNAs and study their transcription regulation relationships.

Causal relationship analysis
DisNor (31) (https://disnor.uniroma2.it/) is a web-based tool that can generate and explore protein interaction networks based on disease genes using Mentha protein interaction data and causal interaction information annotated by SIGNOR. DisNor was used to explore the causal relationships among targeted DE mRNAs.

Construction and validation of nomogram based on circulating miRNAs
Logistic regression analysis was then performed with three selected factors by using "survival" R package (32) to select the best t model. Then a nomogram was built to predict the risk of RIF patients by using "rms" R package. At the same time, the consistency index (C-index) was calculated to evaluate the model's ability to distinguish. The consistency of the predicted probability and the actual probability of the model was evaluated by the Calibration curves. The predictive performance of the model was evaluated by drawing the receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC) values.

Result
Clinical characteristics of samples used in the study All data came from samples taken during the window of implantation. The clinical characteristics of RIF group and control group in GSE71331 and GSE71332 were listed in Table 1

Selection of RIF related miRNAs by WGCNA in GSE71332
A gene coexpression network was then constructed based on the samples of GSE71332 by WGCNA to select the most signi cant gene modules and genes. This procedure can also help to elucidate the relationship between genes and clinical features. We rst eliminated the outlier samples, showing the highest variance in the module, for further analysis. Next, a scale-free network was constructed with a soft threshold of β = 7, and 16 modules were selected with a minimum module size of 50 for further analysis (Fig. 2 B).
Then, the overall expression gene level was taken as the MS to estimate the relationship between the corresponding modules and clinical phenotypes (Fig. 2 A). Based on the results, we found that the turquoise module showed the most signi cant positive correlation with the RIF (cor = 0.97, p < 0.0001) (Fig.2 C). Therefore, the turquoise module was chosen as RIF related modules.
Finally, 97 intersection miRNAs between DEG and WGCNA were selected as RIF-related DE miRNAs (supplement table 3).

Construction of lncRNA-miRNA-mRNA regulatory network
Based on the interaction of the prediction of 3 databases (miRDB, miTarBase, TargetScan) and DE mRNAs, 45 mRNAs were selected for network construction. Similarly, 7 lncRNAs were selected (supplement table 4). Finally, a lncRNA-miRNA-mRNA regulatory network was constructed based on 80 miRNAs, 45 mRNAs and 7 lncRNAs by using cytoscape (Fig. 3 A, C). By using MCODE app of cytoscape, key module network was selected (Fig.3 B).

Functional enrichment analysis and causal relationship analysis
By using Metascape, we found that the targeted DE mRNAs were enriched mainly in terms of cell adhesion mediated by integrin (GO: 0033627), female pregnancy (GO: 0007565), fatty acid metabolic process (GO: 0006631) and reproductive structure development (GO: 0048608) (Fig.4 C). The pathways targeted DE mRNAs enriched in included focal adhesion, signaling by nuclear receptors, MAPK pathway and PIP3 activates AKT pathway (Fig.4 D).
Key modules were selected in functional network (Fig.4A) by using MCODE cytoscape app. We found that one of the key modules was closely related with reproduction function (Fig.4B red circle), and their st neighbor nodes were also drawn in Fig.4 B (outside the red circle).
By using the DisNor tool, we screened the rst neighboring genes of the targeted DE mRNAs, and built a causal relationship network (Fig.5). We found that these genes were closely related to MAPK signal pathway as we could see in the results of functional enrichment analysis.
Selection of co-expression miRNAs between serum and endometrial samples and functional enrichment analysis For GSE108966, 63 down-regulation miRNAs and 45 up-regulation miRNAs were selected from endometrium samples by Deseq2 with the following criteria: p value < 0.05 and |log 2-fold change| > 1 (Fig. 6 A). Similarly, 28 down-regulation miRNAs and 22 up-regulation miRNAs were selected from serum samples (Fig. 6 B).
Similarly, we used multiple databases to predict their target mRNAs (supplement table 5) and use the metascape tool for functional enrichment analysis. The results showed that the targeted mRNAs of hsa-miR-96-5p mainly enriched in the terms of cellular response to organonitrogen compound, negative regulation of cell differentiation and regulation of protein serine/ threonine kinase activity (Fig.7 A). Fig.7 B showed the pathway of the targeted mRNAs of hsa-miR-96-5p. We also noticed that the targeted mRNAs of hsa-miR-378e mainly enriched in protein serine/ threonine/ tyrosine kinase activity and regulation of organ growth (Fig.7 C). The pathway terms that the targeted mRNAs of hsa-miR-378e enriched in were cell cycle, Mitotic (Fig.7 D).

Construction and validation of nomogram based on circulating miRNAs
Based on the results of logistic regression analysis, the best t models included age, the expression of hsa-miR-96-5p, and the expression of hsa-miR-378e. A nomogram with C-index: 0.865 was established to act as a prediction tool of RIF (Fig.8 A), which means that our model has a good ability to distinguish. The calibration curves in Fig.8 B showed a good consistency of the predicted probability and the actual probability of the model with mean absolute error as 0.028, mean squared error as 0.00096 and 0.9 quantile of absolute error as 0.044. Fig.8 C showed the receiver operating characteristic (ROC) curve of the model, and the area under the curve (AUC) values was 0.865, which means a promising predictive performance.

Discussion
Epigenetic regulation of gene expression played an important role in the development of recurrent implantation failure (RIF), and one of the most important parts was miRNA. As a previous study showed, the expression levels of miRNAs in endometrium and serum were difference during the menstrual cycle. Several factors, including different miRNAs, were selected as key molecules of regulation of endometrial acceptance and implantation (33). However, there was still a lack of research on the molecules that affect endometrial tolerance before implantation and the mechanisms of early dialogue between the embryo and the uterus (33). At present, more and more studies were beginning to pay attention to the diagnostic predictive value of circulating miRNA (34,35). A research by Fatemeh Azhari et al. studied previously reported endometrial-speci c miRNAs based on PCR technology in endometrial and serum samples from RIF patients. They found has-miR-145 and has-miR-23b were both down-regulated in endometrial and serum samples (18). Due to the limitations of the sample and detection technology, they could only study the known potentially meaningful miRNAs and could not explore them in depth.
In this study, by using WGCNA, we built a lncRNA-miRNA-mRNA regulatory network to analyze the expression and regulation characteristics of miRNAs in the endometrium and serum. By using MCODE app in cytoscape, two key modules were selected. We noticed that both hsa-miR-23a and hsa-miR-23b were interacted with ACADSB, DUSP5 and NPR3. Fan Y et. al reported that the up-regulator expression of hsa-miR-23a could suppress hdac2 and activate NF-κB and in uence the ability of adhesion, invasion and proliferation of trophoblast (36). Our study showed that hsa-miR-23a played an important role in embryo implantation. At the same time, there had several studies suggested that hsa-miR-23a and hsa-miR-23b were closely related to MAPK pathway (37,38). As many studies reported, MAPK pathway played an important role in embryo implantation, and it was closely related to the ability of adhesion, invasion and proliferation of trophoblast and the procession of endometrium angiogenesis (39)(40)(41). The causal relationship network in Fig. 5 showed that DUSP5 down-regulates MAPK1 (R = 0.42). According to these results, we could make a hypothesis that lncRNA PART1 may act as a sponge of hsa-miR-23a/b to down-regulate DUSP5 to promote RIF.
In addition, we found that hsa-miR-96-5p which also highly expressed in serum was also closely related to the ability of adhesion, invasion and proliferation of trophoblast (42).Chen D et. al found that the function of human trophoblast cell could be promoted by down-regulating miR-96-5p via targeting DDAH1 (42). Another study con rmed that PTEN, the main negative regulator of PI3K-Akt pathway, was direct target of hsa-miR-96-5p (43). Several studies had shown that abnormal activation of AKT pathway was one of the main mechanisms of endometrial abnormality during WOI period (44)(45)(46). Based on the above results, we speculated that hsa-miR-96-5p could affect the receptivity of the endometrium during WOI by activated the PI3K-AKT pathway.
In this study, the results of functional enrichment analysis of miRNAs target genes also supported our conclusions. The targeted mRNAs of hsa-miR-96-5p mainly enriched in the terms of cellular response to organonitrogen compound, negative regulation of cell differentiation, regulation of protein serine/ threonine kinase activity, apoptosis pathway and MAPK signaling pathway.
At the same time, we also developed a non-invasive RIF diagnostic scoring model to assist in the diagnosis and treatment of RIF patients and it showed better predictability and accuracy. As far as we know, this was the rst RIF predictive scoring model based on circulating miRNA. Clinical trials of models will also be conducted soon. For this study, there were still some shortcomings, such as lacking adequate laboratory tests to verify the mechanism. Because of the huge work of validation experiments, we are planning to do so, and the results will be published in our future research.

Declarations
Ethics approval and consent to participate Not application.

Consent for publication
All authors had been carefully examined and agreed to publish.

Availability of data and materials
The data that support the ndings of this study are available in the Gene Expression Omnibus (GEO) database.

Competing interests
The authors declare that there are no con icts of interest to disclose.

Author Contributions
Cong Fang and Peigen Chen carried out the study. Peigen Chen analyzed and interpreted the data. Yingchun Guo and Tingting Li drafted the manuscript. Lei Jia, Yanfang Wang and Yi Zhou collected and analyzed the data. Cong Fang and Peigen Chen coordinated the study, participated in the design and reviewed the manuscript. All authors read and approved the nal manuscript. Figure 1 The work ow of this study.