Key genes and pathways of ovarian granulosa cells in polycystic ovary syndrome identified by bioinformatics analysis

Purpose Polycystic ovary syndrome (PCOS) is one of the factors leading to infertility. The specific pathogenesis of PCOS is still unclear. The purpose of this study was to determine key changes in gene expression during the formation of PCOS and to provide a theoretical basis for the clinical diagnosis and treatment of PCOS. Methods We identified differentially expressed genes (DEGs) in the dataset GSE34526 using the bioinformatics array research tool (BART) online analysis tool (bart.salk.edu). We then used the Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/) online analysis software for gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, the STRING (https://string-db.org/) online analysis tool for protein-protein interaction (PPI) network construction, and Cytoscape software for Mcode module and HUB gene analysis

important role in the process of folliculogenesis [7]. Understanding gene expression of PCOS GCs is of great significance for effective diagnosis and treatment. In previous literature, many scholars have analyzed potential differential genes in the GSE34526 gene expression profile, performing gene ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, PPI network analysis and Mcode module analysis but not Hub gene analysis [8][9][10][11][12].
In this study, we used the bioinformatics array research tool (BART) online analysis tool to analyze the original microarray dataset GSE34526 (ovarian GCs from healthy women and from women with PCOS) for differentially expressed genes (DEGs). GO enrichment and KEGG pathway analysis, PPI network, Mcode module and Hub analysis were used to determine the genes, pathways and molecular mechanisms related to ovarian granulosa cells in women with PCOS to provide a theoretical basis for clinical diagnosis, treatment and prevention of PCOS.

Materials And Methods:
Microarray Data and Identification of DEGs
GEO (http://www.ncbi.nlm.nih.gov/GEO) [13] is a public functional genomics database that contains high-throughput gene expression data, chips and microarrays. The gene expression dataset GSE34526 was selected from GEO (platform GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array). The GSE34526 dataset includes 10 samples: 3 ovarian granulosa cell samples from normal women and 7 ovarian granulosa cell samples from women with PCOS. Human granulosa cells were isolated from ovarian fluid aspirated from normal and PCOS women who received in vitro fertilization. DEGs were obtained from BART (bart.salk.edu) [14], a platform that can process raw microarray data from GEO or from local files into a list of differential genes and related pathways. DEGs were defined by logFC greater than 1 or less than -1 and t-tests with adj. P < 0.05.
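The DEG selection rule can be sketched in Python as follows. This is a minimal illustration, not BART's implementation: the `ProbeResult` layout, the `select_degs` helper and the example values are hypothetical, and the adjusted P values are assumed to have been computed beforehand.

```python
from typing import NamedTuple


class ProbeResult(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    log_fc: float   # log fold change, PCOS vs. normal
    adj_p: float    # multiple-testing-adjusted P value


def select_degs(rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split rows into up- and down-regulated DEGs.

    A row is kept when |logFC| > lfc_cutoff and adj. P < p_cutoff.
    """
    up, down = [], []
    for row in rows:
        if row.adj_p >= p_cutoff:
            continue  # not significant after adjustment
        if row.log_fc > lfc_cutoff:
            up.append(row.gene)
        elif row.log_fc < -lfc_cutoff:
            down.append(row.gene)
    return up, down


# Made-up values for illustration only; not taken from GSE34526.
example = [
    ProbeResult("GENE_A", 1.8, 0.01),
    ProbeResult("GENE_B", -2.3, 0.002),
    ProbeResult("GENE_C", 0.4, 0.001),  # fold change too small
    ProbeResult("GENE_D", -1.5, 0.20),  # not significant
]
up_genes, down_genes = select_degs(example)
print(up_genes, down_genes)
```

Both conditions must hold at once, so a gene with a large fold change but a non-significant adjusted P value is excluded, as is a significant gene with a small fold change.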

GO Enrichment and KEGG Pathway Analysis of DEGs
The Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/) (Version 6.8) online analysis software was used for GO enrichment and KEGG pathway analysis of the differential genes. GO analysis included three categories: BP (Biological Process), CC (Cellular Component) and MF (Molecular Function). GO terms and KEGG pathways with P < 0.05 were considered statistically significant.
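To make the enrichment P value concrete, the following Python sketch computes the probability of seeing at least k annotated genes in a gene list by chance. It is an illustration under stated assumptions: DAVID reports a modified Fisher exact P value (the EASE score), whereas this sketch uses the plain hypergeometric upper tail as a stand-in, and the function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term.

    k: genes in the list annotated to the term
    n: size of the gene list
    K: genes in the background annotated to the term
    N: size of the background
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts for illustration only (not results from this study):
# a list of 91 genes, 12 of them annotated to a term that covers
# 400 genes in a background of 20000.
p_value = hypergeom_upper_tail(k=12, n=91, K=400, N=20000)
print(f"enrichment P = {p_value:.3g}")
```

A term would pass the screen described above when this P value falls below 0.05.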

PPI network construction and Module Analysis
The protein-protein interaction (PPI) network of the differential genes was constructed using STRING (https://string-db.org/) (Version 11.0) online analysis software. PPI analysis can provide a better understanding of the pathogenesis of PCOS; a minimum required interaction score of 0.400 was used as the cut-off. The TSV file of the STRING results was downloaded and imported into Cytoscape (www.Cytoscape.org) (version 3.7.2), an open-source systems biology analysis software that can be used for data visualization. Mcode (version 1.6.1), a plug-in of Cytoscape, was used to cluster the STRING network loaded from the TSV file into functional modules. The selection criteria were as follows: MCODE degree cut-off = 2, node score cut-off = 0.2, Max depth = 100, and k-core = 2. Then, GO and KEGG analyses were performed on the DEGs in each Mcode module.
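Two of the steps above, the interaction-score cut-off and the k-core requirement, can be illustrated with a short Python sketch. It is not the MCODE algorithm, which also weights vertices and grows modules from seed nodes; the function names, the edge tuples and the node labels are hypothetical.

```python
from collections import defaultdict


def build_network(edges, min_score=0.400):
    """Keep edges whose score reaches min_score; return adjacency sets."""
    adj = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def k_core(adj, k=2):
    """Repeatedly drop nodes with fewer than k neighbours; return the rest."""
    core = {node: set(nbrs) for node, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in core.items() if len(nbrs) < k]:
            # Remove the node and erase it from its neighbours' sets.
            for nbr in core.pop(node):
                core[nbr].discard(node)
            changed = True
    return core


# Toy interactions (node1, node2, score); labels and scores are made up.
toy_edges = [
    ("A", "B", 0.9),
    ("B", "C", 0.7),
    ("A", "C", 0.5),
    ("A", "D", 0.6),  # D hangs off the triangle by a single edge
    ("A", "E", 0.2),  # below the 0.400 cut-off, so it is dropped
]
network = build_network(toy_edges)
core = k_core(network, k=2)
print(sorted(network), sorted(core))
```

In the toy network the low-scoring edge never enters the graph, and the singly connected node is pruned because it cannot belong to a 2-core.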

HUB Gene Selection
The cytoHubba plug-in in Cytoscape software was used for HUB gene selection. The top 10 HUB genes were screened by each of five ranking methods (Radiality, MCC, Degree, Stress and Closeness), and an overlapping HUB gene network was constructed.

Results:

Identification of DEGs
We used the BART online analysis software to identify the DEGs of GSE34526. BART can automatically download data from GEO and analyze it using the LIMMA bioinformatics software package. The original fluorescence CEL files were used as input and divided into a PCOS group and a normal group. All samples were isolated from the ovarian fluid of normal women and patients with polycystic ovary syndrome undergoing in vitro fertilization. The array contains a total of 54675 probe sets. The Hclust R function was used to cluster the first 1000 normalized expressed genes (see Figure 1a). DEGs were selected by requiring logFC greater than 1 or less than -1 and t-tests with adj. P < 0.05. The results showed that there were 91 DEGs, comprising 7 up-regulated genes and 84 down-regulated genes (see Figure 1b).

PPI network construction and Mcode Analysis
The PPI network was generated with the STRING online analysis software, and the TSV file was downloaded, imported into Cytoscape, and analyzed with the Mcode plug-in, revealing 60 nodes and 193 edges. The nodes represent the DEGs, and the edges in the PPI network represent interactions between DEGs. Based on this, two modules were obtained from the PPI network, as shown in the corresponding figure and Table 2.

HUB Genes Selection
In this study, we used the cytoHubba plug-in in Cytoscape software to select the HUB genes and screened the top ten genes according to each of the Radiality, MCC, Degree, Stress and Closeness methods (see Table 3).
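The overlap step, keeping only genes that rank in the top k under every method, can be sketched in Python. This is an illustration rather than cytoHubba's implementation: only two simple centralities (degree and a reciprocal-distance closeness) stand in for the five rankings, and the helper names and the toy network are hypothetical.

```python
from collections import deque


def degree_scores(adj):
    """Number of neighbours of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def closeness_scores(adj):
    """Sum of reciprocal shortest-path distances from each node (BFS)."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        scores[source] = sum(1.0 / d for d in dist.values() if d > 0)
    return scores


def top_k(scores, k):
    """Names of the k highest-scoring nodes (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return set(ranked[:k])


def overlapping_hubs(rankings, k=10):
    """Nodes that appear in the top k of every ranking."""
    tops = [top_k(scores, k) for scores in rankings]
    return set.intersection(*tops)


# Toy network with made-up node labels: H touches every other node.
toy = {
    "H": {"A", "B", "C", "D"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
    "D": {"H"},
}
hubs = overlapping_hubs([degree_scores(toy), closeness_scores(toy)], k=2)
print(sorted(hubs))
```

Intersecting the per-method lists is a conservative choice: a gene is retained only if every ranking places it near the top, so the final set can be smaller than k.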

Discussion:
GCs play an important role in follicle formation and in the formation of the cumulus-oocyte complex around the egg [15]. Recently, many scholars have found that gene expression in cumulus GCs can predict oocyte development [16,17].
Is the pathogenesis of PCOS related to GCs? Many scholars have found changes in oocyte growth and embryonic potential in patients with PCOS. Abnormal GC function is one of the primary causes of follicular dysplasia in PCOS [18,19]. Victor Blasco and colleagues reported decreased expression of TAC3, TACR3 and KISS1 mRNA in mural granulosa and cumulus cells of patients with PCOS, which may be related to abnormal follicular development and ovulation disturbance in PCOS patients [20]. These data suggest that abnormal gene expression in ovarian GCs is closely related to the pathogenesis of PCOS. However, many genes are expressed in GCs, and whether additional gene abnormalities and gene interactions contribute to the pathogenesis of PCOS needs to be further explored.
Many scholars have performed DEG analysis, PPI network analysis, and GO enrichment and KEGG pathway analysis of the GSE34526 dataset [8][9][10][11][12]. Different genes were identified using different research methods, providing important clues for the diagnosis and treatment of PCOS. However, no HUB genes were analyzed. In this study, analysis of DEGs in the GSE34526 dataset (3 normal GC samples and 7 PCOS GC samples) was performed using the BART online analysis software. This analysis tool has six modules. Users can test differential expression of the original microarray data from GEO or local data using the LIMMA bioinformatics software package [14]. The selection criteria were logFC > 1 or < -1 and t-tests with adj. P < 0.05. A total of 91 DEGs, 7 up-regulated and 84 down-regulated, were found. GO enrichment analysis of DEGs using DAVID software showed that these genes were primarily involved in inflammatory reactions, plasma membrane and protein binding. Abnormalities of inflammatory cytokines and GC membrane receptors are reportedly related to the pathogenesis of PCOS [21][22][23]. In addition, recent studies have reported that sRAGE plays a protective role in the development of PCOS by inhibiting inflammation [24]. KEGG pathway analysis also showed that the DEGs were primarily associated with infection and phagosome pathways, consistent with the GO enrichment analysis.
The PPI network of DEGs was analyzed by STRING, and the TSV file was downloaded and imported into the Cytoscape software, allowing identification of the two modules by the Mcode plug-in. GO enrichment and KEGG pathway analysis of DEGs in the modules were performed using the DAVID online tool. Our research shows that Module 1 significantly participates in innate immune response, inflammatory response, Phagosome and IPAF inflammasome complex, while Module 2 significantly participates in interferon-gamma-mediated signaling pathway, clathrin-coated endocytic vesicle membrane, MHC class II receptor activity and Staphylococcus aureus infection. Previous research has confirmed that serum levels of interferon-γ in patients with PCOS are lower than in healthy women. Interferon-γ may be a new biomarker for the diagnosis and treatment of PCOS [22]. Androgens can induce apoptosis of GCs, a process related to macrophages. Therefore, infection and immunity play an important role in the occurrence and development of PCOS [25].
In addition, we analyzed the PPI network of DEGs and used five ranking methods to identify seven HUB genes: ITGAM, CYBB, TLR1, PTAFR, CD163, CASP1, and MMP9. ITGAM was the most prominent HUB gene and has been reported to be associated with the pathogenesis of PCOS [26,27], but no scholars have studied its expression in ovarian GCs or its specific role in the pathogenesis of PCOS. The relationship between CYBB, CASP1 and polycystic ovary syndrome has not been studied. At present, some studies have reported that saturated fat intake promotes the increase of circulating endotoxin levels and TLR-4 gene expression in obese women of childbearing age, especially in the presence of PCOS [28]. However, the role of TLR1 in the pathogenesis of PCOS is still unclear. PTAFR, a member of the G protein coupled receptor family, was detected in the luminal epithelial cells of embryonic diapause and is strongly expressed in all stages of resuscitation [29]. The relationship between PTAFR and PCOS has not been reported. CD163 is a marker of macrophages. Asa Lindholm and other scholars have found that expression of CD163 in peripheral blood is decreased in overweight women with PCOS [30]. Nine members of the family of matrix metalloproteinases, the main proteases involved in extracellular matrix remodeling, were identified. It was found that levels of MMP2 and MMP9 were higher in the circulation, follicular fluid and granulosa cells of patients with PCOS, while levels of TIMP1 were constant or low. Increased activity of MMPs may disrupt the process of tissue remodeling, as well as the availability of growth factors and gap junctional communication, leading to the development of abnormal ovarian phenotypes in women with PCOS [31].
Based on our analysis, we speculated on the potential mechanism of seven HUB genes in the formation of PCOS. However, our study is limited by a lack of analysis on the dataset related to the peripheral blood of women with PCOS. In our next study, we will further analyze whether genes related to the peripheral blood of women with PCOS are consistent with the expression of GCs. In addition, we will perform molecular biology experiments to confirm the potential mechanism of Hub genes in the formation of PCOS to provide a theoretical basis for clinical diagnosis and treatment of PCOS.

Conclusion:
In conclusion, 91 DEGs, 2 network modules and 7 HUB genes were identified. The seven Hub genes, ITGAM, CYBB, TLR1, PTAFR, CD163, CASP1 and MMP9, and their associated signaling pathways identified in this study are worthy of further study. Molecular biological experiments are needed to confirm the role of these genes in the formation of PCOS to provide a basis for clinical diagnosis and treatment.