Identification of DEGs in PCOS and OV
To identify shared DEGs between PCOS and OV, we analyzed gene expression data from public datasets. The synthesis of five PCOS datasets yielded data from 28 PCOS and 22 normal granulocyte samples. The TCGA OV dataset GSE18520 provided 53 cancer and 10 normal tissue samples. Our analysis revealed 1179 PCOS-associated DEGs (520 upregulated and 659 downregulated; Fig. 1A) and 2092 OV-related DEGs (1237 upregulated and 855 downregulated; Fig. 1B). Remarkably, there was an overlap of 28 upregulated and 45 downregulated DEGs between PCOS and OV (Fig. 1C).
Association of DEGs with m6A modification and cellular senescence
To identify DEGs associated with both m6A modification and cellular senescence (m6A_CellAgeRG), we aligned the DEGs from both PCOS and OV with known m6A target genes and cellular senescence genes. This intersection produced 8 significant m6A_CellAgeRGs, including LMNB1, SNAI2, NOTCH1, DDIT3, MEIS2, STAT5A, TACC3, and VCAN (Fig. 1D). A subsequent m6A_CellAgeRG score was derived using the ssGSEA algorithm, which revealed elevated scores in PCOS and OV patients compared to controls (Fig. 1E and 1F). This finding suggests that m6A modification and cellular senescence may play a pivotal role in the pathogenesis of both PCOS and OV.
Identification of core modules and OV-related hub genes
To identify core modules related to the m6A_CellAgeRG score, we conducted WGCNA on PCOS and TCGA OV datasets. After initial sample clustering, we detected and removed one outlier in the PCOS dataset (Fig. 2A). To establish an optimal scale-free topology and connectivity, we set the soft threshold β to 12 (Fig. 2B). Utilizing hierarchical clustering, we categorized the genes into 10 distinct modules (Fig. 2C). Notably, the blue module exhibited the highest correlation (R = 0.8) with the m6A_CellAgeRG score (Fig. 2D). Similarly, in the TCGA OV dataset, we identified and removed 13 outliers (Fig. 2E), selected a soft threshold β of 8 (Fig. 2F), and organized the genes into 19 modules (Fig. 2G), with the brown module showing the highest correlation (R = 0.74) with the m6A_CellAgeRG score (Fig. 2H). By applying the criteria GS > 0.5 and |MM| > 0.6, we identified 1246 hub genes within the blue module in the PCOS cohort (Fig. 3A) and 308 hub genes within the brown module in the OV cohort (Fig. 3B). Out of these hub genes, 242 were also among the 2092 DEGs associated with OV (termed as DEhub genes) (Fig. 3C). These data suggest the involvement of these hub genes in OV development and progression.
Functional enrichment and pathway analysis of DEhub genes
To investigate the biological functions and pathways of the DEhub genes, we performed GO and KEGG enrichment analyses. For functional annotation, GO analysis revealed that the DEhub genes were significantly enriched in 2627 biological processes, including cell-substrate adhesion, positive regulation of angiogenesis, and positive regulation of vasculature development. They were also associated with 18 cellular components, such as the collagen-containing extracellular matrix, external side of the plasma membrane, and membrane raft, as well as 11 molecular functions, including extracellular matrix structural constituent, coreceptor activity, and cytokine binding (Fig. 3D, Table S1). Furthermore, through KEGG pathway enrichment analysis, we identified 11 significant pathways, including Adherens junction, Fc gamma R-mediated phagocytosis, and the AGE-RAGE signaling pathway in diabetic complications (Fig. 3E, Table S2). These results suggest that these pathways and processes may be potential therapeutic targets for OV.
Development and validation of an OV prognostic model
To identify hub genes related to the prognosis of OV, we conducted a univariate Cox analysis to assess the correlation between the expression levels of 242 DEhub genes and patient prognosis in the TCGA OV cohort. Using a significance threshold of P < 0.05, we identified 31 DEhub genes significantly associated with patient prognosis (Table S3). To address potential issues such as variable collinearity and overfitting, we performed LASSO Cox analysis on these 31 DEhub genes in the training set (Fig. 4A), selecting the minimum λ value from the model (Fig. 4B). This process led us to identify 19 genes (APBB2, C5AR1, CCDC80, CFI, CXCR2, CXCR4, DOCK11, FNIP1, GBP5, KATNAL1, KCTD1, LILRA2, MRC1, P2RX1, P2RY14, PI3, PYGB, TMOD2, and ZBP1) with corresponding weight coefficients shown in Fig. 4C. Subsequently, these 19 genes underwent multivariate Cox analysis (Fig. 4D), and their functional interactions with closely related proteins were explored (Fig. 4E). Using the expression levels and regression coefficients of these 19 DEhub genes, we calculated risk scores for each TCGA sample. Patients were then divided into high-risk and low-risk groups based on the median risk score, and the risk score formula was as follows: risk score = (-0.102) × APBB2 + 0.016 × C5AR1 + 0.13 × CCDC80 + (-0.066) × CFI + 0.485 × CXCR2 + (-0.133) × CXCR4 + 0.073 × DOCK11 + 0.067 × FNIP1 + (-0.088) × GBP5 + 0.108 × KATNAL1 + (-0.041) × KCTD1 + 0.165 × LILRA2 + 0.05 × MRC1 + (-0.192) × P2RX1 + (-0.678) × P2RY14 + 0.089 × PI3 + 0.177 × PYGB + 0.073 × TMOD2 + (-0.13) × ZBP1. A heatmap depicted the expression of these 19 DEhub genes in each TCGA OV sample (Fig. 4F).
Efficacy of prognostic risk score model across OV cohorts
To explore the prognostic implications of the risk score, we performed Kaplan-Meier (KM) survival analysis on the TCGA OV cohort. The results showed diminished OS for high-risk patients in both the training and test sets (Fig. S2A and S2B). The ROC curves indicated AUC values of 0.72, 0.77, and 0.74 for 1-year, 2-year, and 3-year OS in the training set and 0.61, 0.67, and 0.64 in the test set (Fig. S2D and 2E). When evaluating the model's reliability on an external OV cohort, GSE140082, a significant decrease in OS for the high-risk group was observed (Fig. S2C). The corresponding ROC curves showed AUC values of 0.56, 0.63, and 0.64 for 1-year, 2-year, and 3-year survival (Fig. S2F). Upon analyzing the correlation between risk scores and OS among the TCGA OV patients, we categorized these patients based on their age, tumor stage, and grade. The data showed that high-risk patients generally had lower OS rates compared to their low-risk counterparts across the overall patient population (P < 0.0001; Fig. S2G) and when differentiated by age (P < 0.0001; Fig. S2H and S2I). The difference in OS was notably significant among patients with high-grade (G3+G4, P < 0.0001; Fig. S2J) and advanced-stage tumors (III+IV, P < 0.0001; Fig. S2K). However, for those diagnosed with early-stage (I+II, P = 0.25; Fig. S2L) or low-grade tumors (G1+G2, P = 0.46; Fig. S2M), there wasn't a significant difference in OS between the high and low-risk groups. The results suggest that the risk score is a reliable prognostic indicator for OS in OV, particularly among high-grade and advanced-stage tumors.
Construction and validation of the prognostic nomogram
To explore the clinical utility of the prognostic model, we identified independent prognostic clinical factors through univariate and multivariate Cox analyses. Both analyses unveiled significant associations between OS and the risk score (both HR = 2.7, P < 0.001; Fig. 5A and 5B). Subsequently, we constructed a nomogram by integrating the risk score with tumor stage, grade, and age, aiming to forecast 1-year, 2-year, and 3-year survival probabilities (Fig. 5A). The calibration plot showed good agreement between the observed and predicted survival rates spanning 1, 2, and 3 years (Fig. 4D-F). The DCA for these durations further underscored the clinical utility of this nomogram (Fig. 4G-I). These data suggest that the constructed nomogram provides a valuable tool for predicting survival outcomes in OV patients.
Association of risk score with immune cell infiltration
To explore the potential interplay between the risk score and the tumor's immune microenvironment, we used the CIBERSORT algorithm to quantify the infiltration of immune cells within the tumor and compared these measurements across risk groups. The results revealed significant differences in the abundance of seven immune cell types between the risk groups (Fig. 6A and 6B). In particular, the high-risk group was characterized by elevated counts of M0 macrophages and monocytes, coupled with a subtle rise in M2 macrophages. Conversely, counts of M1, CD8+ T cells, and T follicular helper (Tfh) cells decreased significantly in this group. By employing the ESTIMATE algorithm, we found that the ESTIMATEScore and stromal score were noticeably higher in the high-risk group compared to the low-risk group (P = 0.014 and 0.00011, respectively; Fig. 6C and 6E), while tumor purity reduced significantly in the high-risk group (P = 0.014; Fig. 6D). However, no marked difference was observed in the immune score across the groups (Fig. 6F). Subsequent TIDE analyses to measure the potential for immunotherapy response revealed that interferon gamma (IFNG), CD8, microsatellite instability (MSI), and myeloid-derived suppressor cell (MDSC) levels decreased in the high-risk group, while T cell dysfunction, exclusion, and cancer-associated fibroblasts (CAF) increased (Fig. 6G). These observations suggest an intricate relationship between the risk score and the immune landscape of the tumor.
Drug sensitivity analysis in different risk categories
To investigate how the risk score reflects therapeutic responsiveness, we projected drug sensitivity in patients. Differences in drug sensitivity between risk groups were determined based on the prediction scores (Fig. 7A). The lower the prediction score, the higher the sensitivity of the corresponding patients to the drug. Three drugs, namely Sabutoclax_1849, AGI-6780_1634, and BMS-536924_1091, emerged as the most correlated with the risk scores (Fig. 7B). However, the prediction scores of Sabutoclax_1849 showed no difference between the two groups. Thus, we only display the Pearson correlation coefficient (r) and P-value between the risk score and the prediction scores of AGI-6780_1634 and BMS-536924_1091 (Fig. 7C and 7D). BMS-536924_1091 response was negatively correlated with the risk score (r = -0.22, p = 1.5e-05), while AGI-6780_1634 response was positively correlated with the risk score (r = 0.25, p = 1.4e-06). These patterns suggest that risk scores could potentially serve as informative indicators for therapeutic strategies in OV.
Model gene expression in external OV datasets and OV cell lines.
Lastly, we examined the alterations in model gene expression in external OV datasets and OV cell lines. The expression patterns of these model genes in GSE18520 and GSE27651 datasets (Fig. 8A) are summarized in Table 2. Notably, our analysis revealed that CXCR4 emerged as the only m6A target gene, subject to regulation by m6A-associated genes FTO, METTL14, ELAVL1, and YTHDF2 (Fig. 8B). This finding suggests a complicated regulatory network governing CXCR4 in OV. Furthermore, in SKOV-3 and A2780 OC cell lines, qRT-PCR analysis revealed notable upregulation of C5AR1, CFI, CXCR2, CXCR4, KCTD1, PI3, PYGB, ZBP1, and LILRA2 (only in SKOV3 cells) compared to HOSEpiC. Conversely, CCDC80, FNIP1, GBP5 (only in SKOV3 cells), KATNAL1, MRC1, P2RX1, and TMOD2 (only in A2780 cells) showed significant downregulation (all P < 0.05). The other genes did not exhibit significant changes (all P > 0.05; Fig. 9). These findings are generally consistent with the gene expression alterations observed in OV datasets (Table 2).