Articial Neural Network Model Using Immune-inltration Modules for Endometrial Receptivity Assessment of Implantation Failure

9 Objectives: This study was anchored on the state of local immune-infiltration in the endometrium, 10 which acts as critical factors affecting embryonic implantation, and aimed at establishing novel 11 approaches to assess endometrial receptivity for patients with IVF failure. 12 Methods: Immune-infiltration levels in the GSE58144 dataset (n=115) from GEO were analyzed by 13 digital deconvolution and validated by immunofluorescence (n=30), illustrating that dysregulation of 14 the ratio of M φ 1 to M φ 2 is an important factor contributing to implantation failure. Then, modules 15 most associated with M1/M2 macrophages (M φ s) and their hub genes were then selected by weighted 16 gene co-expression network and univariate analyses, then validated by GSE5099 macrophage dataset, 17 qPCR analysis (n=16), and western blot. It revealed that closely related gene modules dominated 18 19 pathway, and phagosome acidification, respectively. Their hub genes were significantly altered in 20 patients and related with ribosomal, lysosome, and proteasomal pathways. Finally, the artificial 21 neural network (ANN) and nomogram models were established from hub genes, of which efficacy 22 was compared and validated in the GSE165004 dataset (n=72). Models established by the selected 23 hub genes exhibited excellent predictive values in both datasets, and ANN performed best with an 24 accuracy of 98.3% and an AUC of 0.975 (95% CI 0.945-1). 25 Conclusions: Macrophages, proven to be essential for endometrial receptivity, were regulated by 26 gene modules dominating antigen presentation, interleukin−1−mediated signalling pathway, and 27 phagosome acidification. Selected hub genes can effectively assess endometrial dysfunction 28 receptivity for IVF outcomes by the ANN approach. 29


Introduction 33
Recurrent implantation failure (RIF) is one of the most frustrating and difficult areas in reproductive 34 medicine because the etiology is often unknown and there are few evidence-based diagnostic and 35 treatment strategies. Defective endometrial receptivity is currently becoming a critical theory in the 36 study of this disease. Endometrial receptivity, is a complex process that enables embryonic 37 attachment, invasion, and development. For healthy females, during the secretory phase, the window 38 of implantation (WOI) lasts from 3 to 6 days. In certain inflammatory or anatomical cases, this 39 window can be narrowed or displaced to inhibit normal implantation, resulting in infertility or loss of 40 7 at 37°C for 1 h with primary antibodies, which included mouse anti-CD68 (ab201973, Abcam, 134 dilution 1:200), rabbit anti-CD86 (13395-1-AP, Proteintech, dilution 1:200) for Mφ1 as well as 135 mouse anti-CD68, rabbit anti-CD86 (13395-1-AP, Proteintech, dilution 1:200) for Mφ2. Next, 136 sections were rinsed 3 times in phosphate-buffered saline (PBS) and stained using anti-rabbit-Alexa 137 Fluor® 488 (ab150073, abcam) and anti-mouse-Alexa Fluor® 594 (ab150064, abcam) for 1 h at 138 room temperature (both Invitrogen). Slides were mounted in the SlowfadeGold reagent containing 139 DAPI (Thermofisher, Landsmeer, The Netherlands) and examined using a NIKON CORPORA 140 microscope (Nikon, Japan). The percentage of Mφ1 or Mφ2 was obtained by counting fluorescence 141 positive cells and dividing them by DAPI signal points using Image J (Version 1.50b), respectively. 142

Weighted gene co-expression network 143
The network module was established using the WGCNA package (in the R environment, version 144 3.6.2) in GSE58144 dataset (21). To minimize noise in the gene expression dataset, data was filtered 145 as follows. Pearson correlation analysis was used to rank all genes according to their association with 146 Mφ1/Mφ2. Correlations between genes and Mφ1/Mφ2 were established at a cut-off of p≤0.05. To 147 reduce the computational burden and enhance signals in our data, we used 2,185 of the 5,531 genes 148 with the greatest variability ranked by the variance in our initial network construction with a cut-off 149 value of var > 0.05 (21). By definition, module genes are highly connected (i.e., module genes tend 150 to have relatively high connectivity). Therefore, for module detection, restricting analysis to the most 151 connected genes should not lead to a major loss of information. Then, we performed cluster analysis 152 of 2,185 genes in these 115 patients. The theory of network construction algorithm has been 153 previously described (22). Briefly, for co-expression module identification, Pearson correlation 154 matrices were first generated (average linkage method) for all pairwise genes. An adjacency matrix 155 was then constructed using a "soft" power adjacency function, = | ( )| . Based on scale-156 8 free topology criteria (R 2 =0.85), we selected a power of β=10. In WGCNA, a soft threshold 157 parameter, beta, of the power function was used to ensure that the co-expression network (adjacency 158 matrix) best approximates scale-free topology. This adjacency matrix was then transformed into a 159 topological overlap matrix to measure relative gene interconnectedness and proximity. Finally, gene 160 co-expression modules corresponded to branches of the resulting hierarchical clustering tree 161 (dendrogram). To ensure that genes in the analyzed network exhibited sufficient correlation, we set 162 the weight threshold of the co-expression network to 0.03. Uniform Manifold Approximation and 163 Projection (UMAP) analysis and visualization of the modules was performed using the umap 164 package (in R, version 3.6.2). Visual network diagram was constructed using Cytoscape (version 165 3.4.0). 166

Enrichment analysis of functional categories 167
The STRING v11.5 online tool (https://string-db.org/) was used for functional enrichment analysis of 168 the gene module that was most associated with endometriosis, identified after WGCNA analysis. In 169 the enrichment analysis, Gene Ontology (GO) terms (including Biological Process BP, Cellular 170 Component CC, and Molecular Function MF) as well as the Kyoto Encyclopedia of Genes and 171 Genome (KEGG) were used to evaluate functional categories and pathways for genes involved in the 172 module. Correlations with Mφ1/Mφ2 for all genes was calculated using the cor function in R 173 environment, version 3.6.2. Then, the GOplot package (in the R environment, version 3.6.2) was 174 used to visualize GO enrichment analysis results and their regulatory conditions. The formula for 175 ，which is a value that indicates whether GO terms are 176 more likely to be decreased (negative value) or increased (positive value). Gene-set enrichment 177 analysis (GSEA) was performed on the GSE5099 dataset using the "ClusterProfiler" (24) package in 178 R. The Broad Molecular Signature Database (MSigDB v7.0) dataset in the Kyoto Encyclopedia of 179 9 Genes and Genomes (KEGG) (c2.cp.kegg.v7.0.symbols) was used. This database summarizes and 180 presents specifically well-defined biological states and pathway processes. For statistical significance 181 estimation, the GSEA program was run with 1,000 permutations, and correlations between selected 182 genes and other genes were used to rank all genes.  Table 3. PCRs were performed on an LightCycler 480 PCR System (Roche,190 Germany) with the protocol for the SYBR Premix Ex TaqTM II (RR820A, Takara). The reaction 191 began at 95°C for 30 seconds for initial denaturation, followed by 35 cycles of 5 seconds at 95°C and 192 34 seconds at 60°C(23). The measurements were repeated three times, and the relative quantification 193 was performed by the comparative CT (2^-ΔΔCT) method. 194

prognostic model establishment and statistics 195
The ANN prognostic model was implemented using the nnet package in R environment, version 196 3.6.2 (12). To determine the number of units in the hidden layer, the GSE58144 dataset was used: 197 . Moreover, to ensure maximum optimization of the 198 prognostic model, NNET was run with 100 permutations in the GSE165004 dataset for verification. 199 For prognostic model efficacy testing, Z tests were used to determine the significance of the area 200 under the receiver operating characteristic (ROC) curve (AUC) using the pROC package in R, 201 10 version 3.6.2. Sensitivity, specificity, Youden index (YI), positive predictive value (PPV), and 202 negative predictive value (NPV) were used to assess prognostic values of hub genes and the machine 203 learning model. 204 Comparisons of cell proportions and single gene expression levels between the two groups were 205 performed by Wilcox test. Normally distributed continuous variables were analysed by the Students 206 t-test and paired t-test was used for paired analysis (in R, version 3.6.2). 207 Then, we performed enrichment analysis of the three modules. We found that the main biological 238 processes of the three modules were the three aspects of macrophage functions (antigen processing 239 and presentation of exogenous peptide antigen via MHC class I, TAP−dependent, FDR=4.5×10 -10 ; 240 phagosome acidification, FDR=6.2×10 -4 ; interleukin-1-mediated signaling pathway, FDR= 1.13×10 -241 13 ), while the corresponding GO-CC and GO-MF terms of the three were also different. KEGG 242 enrichment analysis revealed that the three modules were enriched in ribosomal, lysosome, and 243 proteasomal pathways, respectively (Fig. 4A). 244

Selection and verification of hub genes associated with Mφ1/Mφ2 245
12 Due to differences in biological roles of the three gene modules and high-levels of consistency in 246 expression among genes within the modules, we selected three hub genes in each of the modules. 247 First, we selected genes that were most associated with modules and Mφ1/Mφ2 based on the median 248 of the membership in the module and gene significance for Mφ1/Mφ2 (Fig. 3C). Then, we screened 249 the hub genes in the network based on the number of connections between nodes. CytoHubba was 250 used to calculate the top 40 hub genes (Fig. 4B). Intersections between the two sets of hub genes 251 were established. Intersections for the genes in blue, turquoise, and green modules were 15, 15, and 252 19, respectively (Fig. 4C). Finally, univariate analysis revealed that RPS9, DUT, and KIAA0430 253 genes were significantly associated with implantation failure in the blue, turquoise, and green 254 modules (Fig. 4D). 255

Lab and external verification of hub genes expression and correlation with Mφ1/ Mφ2 256
We first validated the altered mRNA expression levels of DUT, RPS9, and KIAA0430 in 257 endometrial tissues. Compared with the control group, the expression of all three genes (P=0.029, 258 0.028, and 0.0006 for DUT, RPS9, and KIAA0430, respectively) in patients with IVF failure was 259 significantly downregulated (Fig.5 A). 260 Then, we selected Mφs datasets in GSE5099 to verify the association between the above screened 261 genes and macrophage polarization. As shown in Fig. 5B (Fig. 6B). As a further verification that the ANN model is the optimal model for 279 implantation failure prediction, we also established a prediction model based on logistic regression. 280 In Fig. 6C, expression levels of the three hub genes were found to be significantly different between 281 the two groups, consistent with the results of the GSE58144 dataset. Also, the risk score, based on 282 the logistic regression, showed the ability for defective endometrial receptivity identification. The 283 Nomogram based on logistic regression is shown in Fig.6D. Prognostic results were averaged and 284 plotted as ROC, and were found to compare to those by modules and hub genes. As result, the AUC 285 of the ANN model was 0.975 (95% CI 0.945-1), significantly better than that of nomogram model, 286 DUT, RPS9, and KIAA0430 (p=0.0439, 9.27×10 -6 , 1.15×10 -6 , and 5.33×10 -4 ), as shown in Fig. 7A. The altered immune microenvironment in the uterine cavity, an essential aspect of endometrial 292 receptivity, is a crucial factor for a successful IVF(24). However, immune-related factors have been 293 seriously underestimated when used to assess endometrial receptivity. In this study, we found that the 294 balance of the ratio between Mφ1 and Mφ2 is an important factor that affects endometrial receptivity 295 for patients. Then, we screened different biological functional gene modules and obtained hub genes 296       Assessment for the models and genes in implantation prediction. A) ROC curve for ANN, Nomogram, DUT, RPS9 and KIAA0430 (left), and AUC comparison between models and genes. B) Diagnostic e cacies for the models and genes. YI=Youden index, PPV=positive predictive value, and NPV=negative predictive value.

Figure 8
Regulation of mid-secretory macrophage mechanisms. Macrophage polarization can ameliorate endometrial receptivity through regulation of hub genes, DUT, RPS9, and KIAA0430, mediating ribosomal as well as proteasomal pathways for endometrial decidualization.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.