Classification of TCGA data
All 467 cases were randomly divided into a training cohort (n=234, used to identify key microRNAs) and a validation cohort (n=233, used to validate the microRNA signature). There were no significant differences between these two cohorts regarding age, clinical stage, histological grade, lymphatic invasion, living status, or residual tumor status (Table 1).
Table 1. Clinical characteristics of the two cohorts from TCGA datasets.
Parameters
|
|
Discovery cohort
(n=234)
|
Validation cohort
(n=233)
|
p-value
|
method
|
Age ( Mean ± SD)
|
|
59.6 ± 11.5
|
60.2 ± 11.7
|
0.57
|
t-test
|
Clinical stage
|
|
|
|
|
|
|
I/II
|
14
|
16
|
0.82
|
χ2 test
|
III/IV
|
219
|
214
|
Null
|
1
|
3
|
|
|
Histologic grade
|
|
|
|
|
|
|
G1
|
1
|
0
|
0.63
|
Fisher’s exact test
|
G2
|
29
|
32
|
G3
|
201
|
190
|
Null
|
3
|
11
|
|
|
Lymphatic invasion
|
|
|
|
|
|
|
No
|
28
|
32
|
0.67
|
χ2 test
|
Yes
|
54
|
51
|
Null
|
152
|
150
|
|
|
Living status
|
|
|
|
|
|
|
Living
|
87
|
88
|
0.97
|
χ2 test
|
Dead
|
147
|
145
|
Tumor residual disease
|
|
|
|
|
|
|
1-10mm
|
96
|
112
|
0.51
|
χ2 test
|
11-20mm
|
14
|
15
|
>20mm
|
43
|
37
|
Null
|
81
|
69
|
|
|
Identification of microRNAs related to prognosis
Univariable cox regression analysis of the training cohort identified 59 statistically significant microRNAs (P <0.05) (Supplementary table 2).The six prognosis-related microRNAs (hsa-miR-3074-5p, hsa-miR-758-3p, hsa-miR-877-5p, hsa-miR-760, hsa-miR-342-5p and hsa-miR-6509-5p) were picked out using Robust likelihood-based survival analysis (Table 2).
Table 2. A prognosis-related microRNA signature in the training cohort
MicroRNA
|
Gene ID
|
nloglik
|
AIC
|
Selected
|
hsa-miR-3074-5p
|
5
|
666.12
|
1334.24
|
*
|
hsa-miR-758-3p
|
4
|
663.16
|
1330.33
|
*
|
hsa-miR-877-5p
|
16
|
660.98
|
1327.96
|
*
|
hsa-miR-760
|
10
|
658.13
|
1324.26
|
*
|
hsa-miR-342-5p
|
14
|
655.4
|
1320.8
|
*
|
hsa-miR-6509-5p
|
20
|
654.24
|
1320.49
|
*
|
hsa-miR-410-3p
|
11
|
654.08
|
1322.16
|
|
hsa-miR-654-3p
|
15
|
654.06
|
1324.11
|
|
hsa-miR-4473
|
8
|
652.13
|
1322.25
|
|
hsa-miR-551a
|
3
|
651.35
|
1322.7
|
|
|
|
|
|
|
|
|
Developing a risk score based on a 6-microRNA signature for ovarian cancer
we performed multivariable Cox proportional regression analysis based on the six prognosis-related microRNAs determined by the training cohort. The consequent risk score was established as follows:
Risk score = (-0.05171* hsa-miR-3074-5p) + (0.22493* hsa-miR-758-3p)+ (-0.06977*hsa-miR-877-5p) + (-0.16781*hsa-miR-760) + (-0.24161*hsa-miR-342-5p) + (-0.12669*hsa-miR-6509-5p)
Patients were assigned to high-risk group (n=117) and low-risk group (n=117) according to the Youden index(-2.083) of risk score. As shown in Figure 2. Most deaths were associated with a high risk score, while patients surviving for a long time tended to have a low risk score. Furthermore, the expression levels of hsa-miR-758-3p tended to be higher among patients in the high risk group; the opposite trend was shown for the other 5 microRNAs. A Pearson correlation coefficient was calculated as -0.32 with P-value <10-6. Thus, the survival time for OVC patients was negatively correlated with their risk scores.
Performance and validation of the prognostic risk score system
As shown in Figure 3A, OS was significantly higher in patients with low risk score group compared with patients with high risk score (P <0.0001). The prognosis-related microRNA signature possess a remarkable ability to predict survival with AUC values of 0.681 and 0.802 for 5-year and 10-year OS, respectively by time-dependent ROC analysis (Figure 3B). To reinforce the prediction ability of the prognostic risk score system, we also performed survival analysis for the validation cohort and the complete cohort. In accordance with the results of the training cohort, the survival curves (Figure 3C and Figure 3E) indicated a highly significant difference between the low risk and high risk groups. In terms of the time dependent ROC analysis for the validation cohort, the AUC was 0.647 and 0.742 for 5-year and 10-year OS, respectively (Figure 3D). In addition, the AUC value was 0.671 and 0.784 for the 5-year and 10-year OS for the complete cohort, respectively (Figure 3F). As most patients involved in the present TCGA data had clinical stages of III or IV. Thus, the patients with clinical stages of III or IV were further divided into smaller subgroup to perform survival analysis and time-independent ROC analysis, as shown in Supplementary figure 1. It was found that the proposed risk score had an outstanding ability in stratifying patients with clinical stages of IV in terms of 10 years survival.
Functional characteristics of the genes targeted by the prognosis-related microRNAs
Using the six microRNAs in miRWalk 3, we derived six cohorts of target genes belonging to each microRNA. When duplicate genes were removed, there were 2325, 2097, 2405, 3058, 2847, and 2970 target genes found for hsa-miR-3074-5p, hsa-miR-758-3p, hsa-miR-877-5p, hsa-miR-760, hsa-miR-342-5p, and hsa-miR-6509-5p, respectively. A gene targeted by 3 or more microRNAs (and therefore appeared in at least 3 cohorts) was defined as an ‘overlapping target gene’. In total, we identified 1870 overlapping target genes (Figure 4A). To explore the molecular mechnisms uncerlying the function of the overlapping target genes, we conducted KEGG functional enrichment analysis, in which pathways with P <0.01 were defined as significantly enriched pathways (Supplementary table 3). The result showed that these genes were predominantly involved in pathways including phosphatidylinositol signaling system, morphine addiction, choline metabolism in cancer, the phospholipase D signaling pathway, circadian entrainment, the prolactin signaling pathway, cholinergic synapses and EGFR tyrosine kinase inhibitor resistance(Figure 4B). According to the GO enrichment analysis, intracellular signal transduction, nervous system development and phospholipid biosynthetic process were the most significantly enriched biological process; protein binding, transcription factor activity, sequence-specific DNA binding and metal ion binding were the most significantly enriched molecular function. The top 15 GO terms were shown in Supplementary table 4.
Network of significantly enriched KEGG pathways and related genes
Figure 5 shows a network that includes significantly enriched KEGG pathways and related genes as visualized by Cytoscape. These results suggested that PRKCA, MAPK1, PRKCB, PIK3CB, PIK3CA, NRAS, PIK3R3, SOS1 and PLCB2 were involved in at least half of the KEGG pathways. Therefore, these genes may participate in regulating the progression of OVC.
Verification of the selected markers using clinical tissues
To investigate the expression of the selected microRNAs in OVC, we assessed the expression levels of the 6 microRNAs in 180 OVC samples and 162 normal ovaries by qRT-PCR. As shown in Figure 6A, the results indicated that the expression of miR-6509-5p, miR-342-5p, miR-3074-5p, miR-877-5p, and miR-760, were dramatically reduced in OVC tissues, compared with normal tissues (P < 0.05). However, miR-758-3p expression was markedly increased in OVC simples compared with normal ovaries (P < 0.05). To investigate the prognostic significance of the selected microRNAs, we performed Kaplan-Meier survival analysis. We selected the median value of the micro-RNAs from the entire dataset as the cutoff point. These results suggested that median survival of patients with higher levels of miR-6509-5p (51 months), miR-342-5p (50 months ), miR-3074-5p (43month ), miR-877-5p (48 months ), and miR-760 (46 months) had better time than those with lower levels of miR-6509-5p (40 months ), miR-342-5p (38 months), miR-3074-5p(33 monhs), miR-877-5p(38 months), and miR-760 (37 months) ( all p-value <0.05, Figure 6B). Patients who over-expressed miR-758-3p had a shorter median survival time (36 months ) than its low expression (58 months, p-value <0.05).