We used bioinformatics methods to explore the biological characteristics of PCOS. A schematic of the study is shown in Fig. 1.
3.1 Results of Data Processing Process
Herein, we analyzed the GSE34526, GSE5850, and GSE98421 datasets. Samples were divided into PCOS and normal groups. The datasets (GSE34526, GSE5850, and GSE98421) were normalized and data cleaning operations were applied, including annotation probe usage(GSE34526 (Fig. 2a-b), GSE5850 (Fig. 2c-d), and GSE98421 (Fig. 2e-f) ). Boxplots were generated to visualize the data distribution before and after standardization, showing consistent expression levels among samples.
3.2 Analysis of DEGs associated with PCOS
Using the limma package, we analyzed the GSE34526, GSE5850, and GSE98421 datasets to identify differentially expressed genes (DEGs) associated with PCOS. In the GSE34526 dataset, 2077 genes met the criteria, with 1320 upregulated and 757 downregulated in the PCOS group. The volcano map (Fig. 3a) represents these results. Similarly, the GSE5850 dataset displayed 1113 threshold genes, with 485 upregulated and 628 downregulated (Fig. 3b). For the GSE98421 dataset, 567 threshold genes were identified, with 245 upregulated and 322 downregulated (Fig. 3c).
To identify RNA modification-related DEGs (RMRDEGs), the GSE34526 dataset was analyzed, revealing six RMRDEGs (ALYREF, AGO2, TET2, YTHDF2, TRMT61B). A DEG Wayne figure (Fig. 3d) display the different groups (normal/PCOS) in GSE34526, while a heat map (Fig. 3e) represents the clustering of these RMRDEGs. Table 2 shows the DEGs related to RNA modification.
Table 2
List of the RNA modification-related differentially expressed gene list
Gene Symbol
|
logFC
|
P.value
|
P.adj
|
AveExpr
|
ALYREF
|
2.445726262
|
0.018133586
|
0.416860734
|
7.377749002
|
NUDT1
|
1.700986254
|
0.032050661
|
0.480644162
|
7.345364833
|
AGO2
|
0.793577606
|
0.045731939
|
0.513127199
|
10.17742066
|
TET2
|
0.736254891
|
0.032323652
|
0.481568679
|
9.583706327
|
YTHDF2
|
-0.585378033
|
0.040860923
|
0.505449152
|
11.8732077
|
TRMT61B
|
-1.403558047
|
0.016363417
|
0.406696578
|
10.70490338
|
3.3 Functional enrichment analysis (GO) and pathway enrichment (KEGG) of DEGs associated with RNA modification
Functional enrichment analyses were conducted to understand the biological pathways and relationships of the six identified RMRDEGs (ALYREF, NUDT1, AGO2, TET2, YTHDF2, TRMT61B) in PCOS. GO (Table 3) and KEGG (Table 4) enrichment analyses revealed significant findings. Enrichment items with P-value < 0.05 and FDR value (P-value) < 0.05 were considered statistically significant. The results were presented using bubble diagrams (Fig. 4a, b) and ring network diagrams (Fig. 4c, d). The RMRDEGs showed enrichment in pathways such as Wnt signaling, Notch signaling, RNA methylation, and mitochondrial matrix. Molecular functions associated with the RMRDEGs included endonuclease activity, catalytic activity, and single-stranded RNA binding. KEGG enrichment analysis highlighted involvement in pathways such as amyotrophic lateral sclerosis, RNA transport, spliceosome, and mRNA surveillance.
Table 3
GO enrichment analysis results of RNA modification related DEGs
ONTOLOGY
|
ID
|
Description
|
GeneRatio
|
BgRatio
|
P.value
|
p.adjust
|
qvalue
|
BP
|
GO:0061014
|
Positive regulation of mRNA catabolic process
|
2/6
|
48/18670
|
9.65e-05
|
0.021
|
0.008
|
BP
|
GO:1903313
|
Positive regulation of mRNA metabolic process
|
2/6
|
78/18670
|
2.56e-04
|
0.021
|
0.008
|
BP
|
GO:0006446
|
Regulation of translational initiation
|
2/6
|
79/18670
|
2.62e-04
|
0.021
|
0.008
|
BP
|
GO:2000241
|
Regulation of reproductive process
|
2/6
|
146/18670
|
8.92e-04
|
0.033
|
0.013
|
BP
|
GO:0006413
|
Translational initiation
|
2/6
|
193/18670
|
0.002
|
0.033
|
0.013
|
CC
|
GO:0000932
|
P-body
|
2/6
|
84/19717
|
2.66e-04
|
0.006
|
0.002
|
CC
|
GO:0036464
|
Cytoplasmic ribonucleoprotein granule
|
2/6
|
212/19717
|
0.002
|
0.014
|
0.004
|
CC
|
GO:0035770
|
Ribonucleoprotein granule
|
2/6
|
223/19717
|
0.002
|
0.014
|
0.004
|
CC
|
GO:0005845
|
mRNA cap binding complex
|
1/6
|
12/19717
|
0.004
|
0.016
|
0.004
|
CC
|
GO:0000346
|
Transcription export complex
|
1/6
|
14/19717
|
0.004
|
0.016
|
0.004
|
MF
|
GO:0008174
|
mRNA methyltransferase activity
|
1/6
|
11/17697
|
0.004
|
0.035
|
0.007
|
MF
|
GO:0035197
|
siRNA binding
|
1/6
|
11/17697
|
0.004
|
0.035
|
0.007
|
MF
|
GO:0047429
|
Nucleoside-triphosphate diphosphatase activity
|
1/6
|
11/17697
|
0.004
|
0.035
|
0.007
|
MF
|
GO:0000340
|
RNA 7-methylguanosine cap binding
|
1/6
|
12/17697
|
0.004
|
0.035
|
0.007
|
MF
|
GO:0000339
|
RNA cap binding
|
1/6
|
19/17697
|
0.006
|
0.035
|
0.007
|
RMRDEGs: RNA modification-related differentially expressed genes, GO: Gene Ontology, BP: biological process, CC: cellular component, MF: molecular function. |
Table 4
KEGG enrichment analysis results of RNA modification-related DEGs.
ONTOLOGY
|
ID
|
Description
|
GeneRatio
|
BgRatio
|
P.value
|
p.adjust
|
Q value
|
KEGG
|
hsa03015
|
mRNA surveillance pathway
|
1/1
|
97/8076
|
0.012
|
0.038
|
0.008
|
KEGG
|
hsa03040
|
Spliceosome
|
1/1
|
151/8076
|
0.019
|
0.038
|
0.008
|
KEGG
|
hsa03013
|
RNA transport
|
1/1
|
186/8076
|
0.023
|
0.038
|
0.008
|
KEGG
|
hsa05014
|
Amyotrophic lateral sclerosis
|
1/1
|
364/8076
|
0.045
|
0.056
|
0.012
|
KEGG
|
hsa05168
|
Herpes simplex virus 1 infection
|
1/1
|
498/8076
|
0.062
|
0.062
|
0.013
|
KEGG: Kyoto Encyclopedia of Genes and Genomes, RMRDEGs: RNA modification-related differentially expressed genes. |
3.4 GSEA and GSVA enrichment analysis of the PCOS dataset
GSEA and GSVA enrichment analyses were conducted on the PCOS dataset (GSE34526) to assess the impact of DEG expression levels. The relationship between DEGs and their associated biological processes (BPs) in different groups (normal/PCOS) was examined. The DEGs exhibited significant enrichment in pathways such as Hedgehog, MAPK, JAK-STAT, and Notch (Fig. 5b-e, Table 5). Furthermore, GSVA enrichment analysis was performed on the GSE34526 dataset to compare PCOS and normal samples. The analysis revealed significant differences in functional enrichment between the two groups, particularly in pathways such as Notch, Wnt, IL2, and IL6 (Fig. 5f, Table 6).
Table 5
Results of the GSEA analysis of the GSE34526 dataset
Description
|
setSize
|
enrichmentScore
|
NES
|
P.value
|
qvalues
|
KEGG_HEDGEHOG_SIGNALING_PATHWAY
|
53
|
-0.38933726
|
-1.551408026
|
0.013736264
|
0.069866662
|
BIOCARTA_MAPK_PATHWAY
|
80
|
0.448943803
|
1.727238945
|
0.002898551
|
0.025339617
|
KEGG_JAK_STAT_SIGNALING_PATHWAY
|
152
|
0.375749027
|
1.581280386
|
0.002649007
|
0.024018199
|
WP_NOTCH_SIGNALING
|
44
|
0.416721058
|
1.418660187
|
0.053398058
|
0.176443176
|
KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION
|
250
|
0.40029646
|
1.771594646
|
0.001221001
|
0.017365304
|
WP_MAPK_SIGNALING_PATHWAY
|
236
|
0.382955129
|
1.687913094
|
0.001226994
|
0.017365304
|
REACTOME_INTERFERON_SIGNALING
|
186
|
0.497277529
|
2.142536431
|
0.001272265
|
0.017365304
|
KEGG_CHEMOKINE_SIGNALING_PATHWAY
|
179
|
0.475981098
|
2.043156396
|
0.001282051
|
0.017365304
|
WP_VITAMIN_D_RECEPTOR_PATHWAY
|
164
|
0.438926203
|
1.865810445
|
0.00131406
|
0.017365304
|
WP_CHEMOKINE_SIGNALING_PATHWAY
|
161
|
0.461873975
|
1.950643205
|
0.001335113
|
0.017365304
|
WP_HEPATITIS_B_INFECTION
|
148
|
0.416609384
|
1.745983235
|
0.001338688
|
0.017365304
|
REACTOME_TOLL_LIKE_RECEPTOR_CASCADES
|
145
|
0.539202181
|
2.250308586
|
0.001355014
|
0.017365304
|
NABA_ECM_AFFILIATED
|
141
|
0.428886833
|
1.787868154
|
0.001358696
|
0.017365304
|
KEGG_NATURAL_KILLER_CELL_MEDIATED_CYTOTOXICITY
|
128
|
0.475032981
|
1.966855354
|
0.001362398
|
0.017365304
|
KEGG_CELL_ADHESION_MOLECULES_CAMS
|
125
|
0.442460334
|
1.827867032
|
0.00136612
|
0.017365304
|
GSEA: Gene Set Enrichment Analysis. |
Table 6
Results of the GSVA analysis of the GSE34526 dataset
Description
|
logFC
|
AveExpr
|
t
|
P.Value
|
B
|
HALLMARK_IL6_JAK_STAT3_SIGNALING
|
0.573437762
|
-0.001917768
|
3.431651992
|
0.001563163
|
-1.155628174
|
HALLMARK_ALLOGRAFT_REJECTION
|
0.54725735
|
0.018321638
|
3.348258541
|
0.00196351
|
-1.358354491
|
HALLMARK_INFLAMMATORY_RESPONSE
|
0.502184956
|
-0.02553951
|
3.052933165
|
0.004322486
|
-2.056087693
|
HALLMARK_INTERFERON_GAMMA_RESPONSE
|
0.477406957
|
-0.024534411
|
2.852626222
|
0.007248781
|
-2.509152526
|
HALLMARK_COMPLEMENT
|
0.464929639
|
0.058184326
|
2.726586773
|
0.009951851
|
-2.784842811
|
HALLMARK_SPERMATOGENESIS
|
-0.407470319
|
-0.034641677
|
-2.695622036
|
0.010746389
|
-2.851391302
|
HALLMARK_INTERFERON_ALPHA_RESPONSE
|
0.478556216
|
-0.066120837
|
2.666408386
|
0.011549555
|
-2.913735465
|
HALLMARK_NOTCH_SIGNALING
|
0.422849057
|
-0.003580058
|
2.644791124
|
0.012179265
|
-2.959588996
|
HALLMARK_WNT_BETA_CATENIN_SIGNALING
|
0.408537465
|
-0.011178712
|
2.451810136
|
0.019375666
|
-3.357979318
|
HALLMARK_APOPTOSIS
|
0.396896132
|
0.010790873
|
2.411608018
|
0.021295095
|
-3.438393105
|
HALLMARK_IL2_STAT5_SIGNALING
|
0.35818706
|
-0.002127791
|
2.232680643
|
0.032101453
|
-3.784821298
|
HALLMARK_TNFA_SIGNALING_VIA_NFKB
|
0.391963242
|
-0.022104131
|
2.16194063
|
0.03758126
|
-3.916402428
|
HALLMARK_KRAS_SIGNALING_UP
|
0.319926245
|
-0.004284628
|
2.058993963
|
0.047037772
|
-4.102178567
|
GSVA: Gene Set Variation Analysis |
3.5 Immunoinfiltration Analysis of the PCOS Dataset (CIBERSORTx)
We utilized the CIBERSORTx online tool to calculate the correlation between 22 immune cell types and the expression profile data of the GSE34526 PCOS dataset. The resulting immunoinfiltration analysis was visualized using a bar graph to display immune cell infiltration in each sample (Fig. 6a). Correlation heatmaps were generated to assess the relationships between the six RMRDEGs (ALYREF, NUDT1, AGO2, TET2, YTHDF2, and TRMT61B) and the abundance of the 22 immune cell types in the GSE34526 dataset (Fig. 6b). The results revealed correlations between the RMRDEGs and the abundance of 14 immune cell infiltrates. Notably, most immune cell types exhibited significant correlations with RNA-modified DEGs (P < 0.005).
3.6 PPI network: hub gene-miRNA interaction network; hub gene-RBP interaction network; hub gene-TF interaction network
PPI analysis of the six RMRDEGs (ALYREF, NUDT1, AGO2, TET2, YTHDF2, TRMT61B) was performed using the STRING database. The resulting PPI network was visualized using Cytoscape software (Fig. 7a).
We applied the mRNA-miRNA data from the ENCORI and miRDB databases to predict miRNAs that interact with the six hub genes (mRNA). The intersection of results from both databases was extracted and visualized as an mRNA-miRNA interaction network using Cytoscape software (Fig. 7b). This network consisted of five hub genes (NUDT1, AGO2, TET2, YTHDF2, and TRMT61B), 179 miRNA molecules, and 195 mRNA-miRNA interaction pairs (Table S2).
RBP interactions were predicted using mRNA-RBP data from the ENCORI database. The mRNA-RBP interaction network, composed of five hub genes (ALYREF, AGO2, TET2, YTHDF2, TRMT61B) and 24 RBP molecules, was visualized using Cytoscape software (Fig. 7c). It consisted of 33 mRNA-RBP interaction pairs (Table S3). To identify TFs binding to hub genes, we subsequently searched the CHIPBase (version 3.0) and hTFtarget databases. The resulting interactive relationships and interactions with hub genes (mRNA) were downloaded and visualized using Cytoscape software. The network included six hub genes (ALYREF, NUDT1, AGO2, TET2, YTHDF2, TRMT61B) and 77 TFs (Fig. 7d, Table S4).
3.7 Expression and correlation analyses of hub genes in the dataset
In the PCOS dataset (GSE34526), we examined the expression differences and correlations of the six RMRDEGs. The correlation analysis revealed significant differences in the expression levels of most hub genes (ALYREF, AGO2, TET2, YTHDF2, TRMT61B) between the different groups (normal/PCOS) in the GSE34526 dataset (Fig. 8a). Specifically, the expression levels of these hub genes showed statistically significant differences among the different groups (P < 0.05). To further investigate the correlation between the six hub genes (ALYREF, NUDT1, AGO2, TET2, YTHDF2, and TRMT61B), a heatmap displaying their correlations in the GSE34526 dataset was generated (Fig. 8b). The results indicated statistically significant differences between ALYREF and NUDT1 (P < 0.01) and between ALYREF and YTHDF2 (P < 0.05).
3.8 ROC validation of hub genes in the three data sets
ROC was performed to assess the discriminatory accuracy of the six hub genes (ALYREF, NUDT1, AGO2, TET2, YTHDF2, and TRMT61B) in the GSE34526, GSE5850, and GSE98421 datasets, which were derived from PPI network construction in the PCOS dataset. In the GSE34526 dataset, ROC validation identified AGO2, ALYREF, and TRMT61B as hub genes with high discriminatory accuracy (AUC = 0.952) between the normal and PCOS groups (Fig. 9a-d). NUDT1 showed moderate accuracy (AUC = 0.857) (Fig. 9c). The performance of TET2 and YTHDF2 is displayed in Fig. S1. In the GSE5850 dataset, ALYREF, NUDT1, and TET2 displayed low discriminatory accuracy (AUC = 0.639, 0.611, and 0.694, respectively) (Fig. 9e-g). TRMT61B and YTHDF2 demonstrated moderate accuracy (AUC = 0.750 and 0.889, respectively) (Fig. 9h-i). AGO2 is shown in Fig. S1. In the GSE98421 dataset, TRMT61B and YTHDF2 exhibited low accuracy (AUC = 0.625) (Fig. 9j-k). AGO2, ALYREF, NUDT1, and TET2 are displayed in Fig. S1. *[Fig. S1: ROC curve verification for each hub gene in the respective dataset.]