Clinical characteristics in the HGSOC training set
Three high-quality datasets (GSE49997, GSE9891, and TCGA) were obtained from the ‘NormalizerVcuratedOvarianData’ Bioconductor package and merged into the modeling dataset. After data preprocessing, 464 samples and 10647 genes were retained. Of the 464 patients, 429 had late-stage disease (FIGO III-IV). Statistical analysis of the clinical information and univariate Cox regression analysis were performed. As shown in Table 1, FIGO stage (early versus late) was significantly associated with recurrence (p = 0.001). Moreover, age of 60 years or older (p = 0.001), late FIGO stage (p = 0.049), and recurrence (p = 0.005) were associated with death. These trends were consistent with known clinical and epidemiological distributions.
Table 1
General clinicopathological information of high-grade serous ovarian cancer patients in the modeling dataset
| Clinical features | Sample (n = 464) | Non-recurrence (n = 212) | Recurrence (n = 252) | χ2 (recurrence) | P value (recurrence) | Living (n = 260) | Death (n = 204) | χ2 (survival) | P value (survival) |
|---|---|---|---|---|---|---|---|---|---|
| Age < 60 | 248 | 118 | 130 | 3.67 | 0.055 | 157 | 91 | 10.815 | 0.001 |
| Age ≥ 60 | 216 | 94 | 122 | | | 103 | 113 | | |
| FIGO I | 15 | 14 | 1 | 13.933 | 0.003 | 12 | 3 | 4.821 | 0.185 |
| FIGO II | 20 | 11 | 9 | | | 15 | 5 | | |
| FIGO III | 370 | 160 | 210 | | | 201 | 169 | | |
| FIGO IV | 59 | 27 | 32 | | | 32 | 27 | | |
| Early stage (FIGO I-II) | 35 | 25 | 10 | 11.279 | 0.001 | 27 | 8 | 3.885 | 0.049 |
| Late stage (FIGO III-IV) | 429 | 187 | 242 | | | 233 | 196 | | |
| Non-recurrence | 212 | - | - | - | - | 170 | 42 | 7.787 | 0.005 |
| Recurrence | 252 | - | - | | | 90 | 162 | | |
Construction of the gene co-expression network and screening of overall survival-correlated modules
WGCNA was performed to identify gene co-expression networks associated with HGSOC prognosis. A soft-thresholding power of 3 was used, which yielded 27 gene modules (Fig. 2E). For each module, we calculated the correlation between gene expression and clinical features, namely tumor stage, age, recurrence time, survival time, recurrence status, and vital status. In the present study, we focused on risk factors for death. Survival analysis identified five modules correlated with overall survival (OS): greenyellow (138 genes), midnightblue (126 genes), cyan (126 genes), darkturquoise (70 genes), and white (56 genes) (Fig. 3A).
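The soft-thresholding step can be illustrated with a minimal sketch. It assumes an unsigned network, in which the adjacency between two genes is the absolute Pearson correlation raised to the chosen power; the network type is not stated in the text, and the gene names and expression values below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def soft_threshold_adjacency(expr, power=3):
    """Unsigned adjacency a_ij = |cor(gene_i, gene_j)| ** power.

    expr maps a gene name to its expression values across samples.
    power=3 is the soft threshold reported in the text; the unsigned
    form is an assumption, since the network type is not stated.
    """
    genes = sorted(expr)
    adjacency = {}
    for g in genes:
        for h in genes:
            r = 1.0 if g == h else pearson(expr[g], expr[h])
            adjacency[(g, h)] = abs(r) ** power
    return adjacency

# Hypothetical toy expression matrix: 3 genes measured in 5 samples.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # strongly correlated with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly correlated with geneA
}
adj = soft_threshold_adjacency(toy_expr, power=3)
```

Raising the correlation to a power greater than 1 shrinks weak correlations much more than strong ones, which is why the weakly correlated pair in the toy data ends up with an adjacency close to zero.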
To test the reproducibility and stability of the gene co-expression networks, we rebuilt the networks on ten randomly sampled datasets, each containing 90% of the 464 samples. In parallel, the sampling ratio was reduced stepwise to 80%, 70%, and 60%. In the survival analysis, 2 of 20 modules and 1 of 24 modules were correlated with OS in the 90% and 80% sampling groups, respectively. No prognostic module was found in the 70% or 60% sampling groups. Finally, we ran WGCNA on 100 randomly sampled datasets, each containing 90% of the 464 samples, and identified a single OS-correlated module, cyan* (113 genes) (Fig. 1B). As expected, the greenyellow and cyan* modules shared 108 genes. We therefore selected the cyan* module genes for subsequent analysis (Fig. 3).
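The resampling scheme and the module comparison can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the sample identifiers are hypothetical, the subset size is rounded down, and the fixed seed is there only to make the sketch repeatable.

```python
import random

def subsample_ids(sample_ids, fraction, n_repeats, seed=0):
    """Draw n_repeats random subsets (without replacement) of the samples."""
    rng = random.Random(seed)  # fixed seed only for repeatability
    size = int(len(sample_ids) * fraction)  # rounds down; the rounding rule is an assumption
    return [rng.sample(sample_ids, size) for _ in range(n_repeats)]

def module_overlap(module_a, module_b):
    """Number of shared genes and their share of the smaller module."""
    set_a, set_b = set(module_a), set(module_b)
    shared = set_a & set_b
    return len(shared), len(shared) / min(len(set_a), len(set_b))

# Hypothetical identifiers standing in for the 464 modeling samples.
sample_ids = [f"sample_{i}" for i in range(464)]
subsets = subsample_ids(sample_ids, fraction=0.9, n_repeats=100)
```

With the reported numbers, 108 shared genes out of the 113 genes in the cyan* module corresponds to an overlap of 108/113, or about 96% of the smaller module.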
GO and pathway enrichment analysis of cyan* module genes
To explore the biological functions of the cyan* module genes, we performed Gene Ontology (GO) analysis, including biological process (GOBP), cellular component (GOCC), and molecular function (GOMF). The top five enriched GOBP terms were protein modification by small protein removal (FDR=0.014), protein deubiquitination (FDR=0.03), ubiquitin-dependent protein catabolic process (FDR=0.011), post-translational protein modification (FDR=0.014), and cellular protein catabolic process (FDR=0.011). The significantly enriched KEGG pathways were proteasome (hsa03050) and thermogenesis (hsa04714), while the significantly enriched Reactome pathway was apoptosis (R-HSA-109581).
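Enrichment FDR values of this kind are commonly obtained from an over-representation test followed by multiple-testing correction. The sketch below assumes a one-sided hypergeometric test and the Benjamini-Hochberg procedure; the enrichment tool and its settings are not named in the text, so both choices are assumptions.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Here `N` would be the size of the background gene set, `K` the number of background genes annotated with a term, `n` the module size, and `k` the number of module genes carrying the term.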
PPI construction of cyan* module genes
We extracted a protein-protein interaction (PPI) subnetwork with 118 nodes and 68 edges from the high-quality STRING protein interaction database. The PPI enrichment p-value was 0.0183. PSMA4, PSMC4, PSMD8, PSMA3, PSMC5, and PSMD12 interacted directly with one another and were involved in the proteasome pathway (FDR=6.89e-05) (Figure 4). Meanwhile, ATP5D, RPS6KB1, COX11, ATP5H, GRB2, ATP5G1, and SMARCD2 were involved in the thermogenesis KEGG pathway (hsa04714), and RPS6KB1, RAC3, GRB2, and PRKCA were involved in choline metabolism in cancer (hsa05231).
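The reported network size can be turned into simple summary statistics. The following sketch computes the mean degree and density from the node and edge counts, and checks that a set of proteins is fully interconnected. The proteasome edge list is built only from the statement that the six subunits all interact directly; it is not an export from STRING.

```python
from itertools import combinations

def mean_degree(n_nodes, n_edges):
    """Average number of interaction partners per protein."""
    return 2 * n_edges / n_nodes

def density(n_nodes, n_edges):
    """Fraction of all possible undirected edges that are present."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)

def is_clique(nodes, edges):
    """True if every pair of the given nodes is directly connected."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(nodes, 2))

proteasome = ["PSMA4", "PSMC4", "PSMD8", "PSMA3", "PSMC5", "PSMD12"]
# Edge list implied by the text (all six subunits interact directly).
proteasome_edges = list(combinations(proteasome, 2))

network_mean_degree = mean_degree(118, 68)
network_density = density(118, 68)
```

For 118 nodes and 68 edges, the mean degree is 2 × 68 / 118, a little above one partner per protein, so the network as a whole is sparse while the six proteasome subunits form a fully connected core of 15 edges.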
Validation of ten independent predictors of overall survival
As mentioned above, the greenyellow and cyan* modules shared 108 genes. To test the association of each gene with OS, we built forest plots of gene expression using another seven datasets (E.MTAB.386, GSE17260, GSE26712, GSE30161, GSE32062.GPL6480, PMID17290060, and TCGA.RNASeqV2) and listed the top ten independent predictors (C17orf62, CANT1, DUS1L, FN3KRP, GRB2, NARF, NUP85, P4HB, SIRT7, and STRA13) in Table 2. The p values for the overall HR ranged from 4.42e-08 to 6.48e-05 (Figure 5). In addition, we summarized the gene descriptions and other cancers in which these genes have prognostic relevance.
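An overall HR in a forest plot is typically a weighted combination of the per-dataset estimates. The sketch below assumes fixed-effect inverse-variance pooling of log hazard ratios; whether the study used a fixed-effect or random-effects model is not stated, and the input values are hypothetical, not results from the seven datasets.

```python
import math

def pooled_hazard_ratio(log_hrs, std_errs):
    """Fixed-effect inverse-variance pooling of per-dataset log hazard ratios.

    Returns (pooled HR, 95% CI lower, 95% CI upper, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled_log = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_log / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    lower = math.exp(pooled_log - 1.96 * pooled_se)
    upper = math.exp(pooled_log + 1.96 * pooled_se)
    return math.exp(pooled_log), lower, upper, p_value

# Hypothetical per-dataset estimates for one gene (NOT values from the study).
toy_log_hrs = [-0.30, -0.25, -0.40]
toy_std_errs = [0.10, 0.12, 0.15]
toy_hr, toy_lower, toy_upper, toy_p = pooled_hazard_ratio(toy_log_hrs, toy_std_errs)
```

Datasets with smaller standard errors receive larger weights, so a large cohort influences the overall HR more than a small one.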
Validation of four candidate predictors in optimal debulking ovarian cancer patients
We selected 801 ovarian cancer patients with optimal debulking as the validation dataset through an online tool. Four of the ten predictors showed highly significant differences in this cohort. As shown in Figure 6, low mRNA expression levels of CANT1 (p=7.6e-05), P4HB (p=9.8e-07), DUS1L (p=6.4e-06), and SIRT7 (p=9.5e-08) were associated with worse OS in OvCa patients.
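Survival differences between low- and high-expression groups are usually assessed with a log-rank test. The following is a minimal sketch of the standard two-group log-rank statistic, assuming that is the test behind the reported p values; the online tool and its cutoff rule are not specified in the text.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up times
    events_* : 1 = death observed, 0 = censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [g for u, _, g in data if u >= t]
        n = len(at_risk)                      # patients still at risk
        n_a = sum(1 for g in at_risk if g == 0)
        d = sum(1 for u, e, _ in data if u == t and e == 1)
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
    return stat, p_value
```

In practice, group A and group B would be the patients below and above an expression cutoff for a given gene, with follow-up times and vital status taken from the validation cohort.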