Patient data
MiRNA expression files and clinical information for 521 CRA patients (comprising 529 tumor samples and 11 normal tissues) were downloaded from the TCGA database. A total of 415 CRA patients with complete clinical data were enrolled in further analysis. All enrolled patients had primary adenocarcinoma, did not have a previous or concurrent malignancy, and received no chemotherapy or radiotherapy before surgery. After differential miRNA expression analysis, the 415 CRA patients were randomly divided into two sets(Table 1).
Differentially expressed miRNAs between cancer tissues and normal tissues in CRA patients
Before analysis, data normalization and log2 transformation were conducted in the miRNA expression files firstly using R software (version 3.6.1). We finally identified 385 differentially expressed (231 upregulated and 154 downregulated) miRNAs (|log2FC| >1 and the adjusted P-value <0.05) using the “DESeq” package. The heatmap of the expression of differentially expressed miRNAs in Figure 1 could clearly distinguish the CRA samples from the controlling normal samples.
Signature identification of the prognosis-related miRNA in the training set
Univariate Cox regression analysis with the LASSO method was used in the training set for selection of CRA prognosis-related miRNAs among the 385 differentially expressed miRNAs (Figure 2). Then, through a stepwise multivariate Cox regression analysis to maximize accuracy and validity, seven miRNAs were finally screened, and the prognosis-related miRNA signature was constructed based on the expression levels of these seven miRNAs. The signature formula was as follows: seven-miRNA signature= hsa-miR-144×−0.255 + hsa-miR-153-2×0.472 + hsa-miR-505×0.475 + hsa-miR-887×0.565 + hsa-miR-3199-2×0.509+hsa-miR-561×0.452 + hsa-miR-3684×0.516. The higher concordance index (C index), the better model performance, and the C index of the seven-miRNA signature was 0.772.
Among these miRNAs, hsa-miR-144 had a negative coefficient in the risk formula, indicating that it might be considered a protective miRNA; that is, the higher the expression level of this miRNA, the longer was the survival time of the patient. The remaining six miRNAs in this risk formula had positive coefficients, pointing that higher expression levels of these miRNAs were associated with shorter patient survival times.
Evaluation of the survival prediction capability of the seven-miRNA signature in the training set
The 208 samples in the training set were scored according to the risk scoring formula and divided into the high-risk group (n=104) and the low-risk group (n=104), with a median value of 1.009 as the cutoff point. The Kaplan-Meier overall survival curves of the two groups showed statistically significant differences between patient survival (log-rank test P=5e−08) (Figure 3A). ROC curves along with AUC values were further generated to evaluate the prognostic ability of the seven-miRNA signature (Figure 3B). The AUCs for 3- and 5-year survival in the training set were 0.769 and 0.889, respectively, indicating that the signature had good performance in predicting CRA patients’ survival risk. In addition, the distributions of the risk score, survival status and expression profile of the seven miRNAs in the training set were illustrated to visualize the analysis results (Figure 4). Patients with higher risk scores had poorer overall survival than patients with lower risk scores, as shown in the figure. The expression levels of the protective miRNA hsa-miR-144 were significantly decreased in patients with high risk scores, whereas those of the remaining six oncogenic miRNAs (miR-153-2, miR-505, miR-887, miR-3199-2, miR-561, and miR-3684) were increased.
Performance verification of the seven-miRNA signature in the test set and entire set
To evaluate the performance of the seven-miRNA signature for predicting the survival of CRA patients, Kaplan-Meier survival analysis was conducted and ROC curves with AUC values were generated in the test set (n= 207) and the entire set (n= 415). Patients in the two validation sets were classified into a high-risk group and a low-risk group according to the constructed seven-miRNA signature. As shown in the Kaplan-Meier survival curve for the test set, patients in the high-risk group (n= 103) had significantly shorter overall survival times than those in the low-risk group (n= 104) (P = 6e - 03, Figure 5A). In the test set, the AUC for 3-year survival was 0.666, and that for 5-year survival was 0.742 (Figure 5B). Similar results were obtained in Kaplan-Meier survival analysis of the entire TCGA set, in which the patients in the high-risk group (n= 207) had significantly shorter overall survival times than those in the low-risk group (n= 208) (P= 0, Figure 6A). In the entire TCGA set, the AUC for 3-year survival was 0.722, while that for 5-year survival was 0.816 (Figure 6B). These results confirmed the power of the seven-miRNA signature for predicting the prognosis of CRA patients.
To better demonstrate differences in the survival outcomes and miRNA expression levels between CRA patients in the high-risk and low-risk groups, we visualized the distributions of the risk scores, survival statuses and miRNA expression levels of patients in the test set and the entire TCGA set. The overall survival times of patients in the high-risk group was significantly shorter than those of patients in the low-risk group; moreover, one protective miRNA, hsa-mie-144, was highly expressed in the low-risk group, while the remaining 6 oncogenic miRNAs were highly expressed in the high-risk group, reconfirming the results in the training set. (Figure 5C, Figure 6C).
Combination of routine clinical factors for efficacy evaluation of the seven-miRNA signature
Clinical factors play an important role in tumor prognosis. To construct a more stable and effective prognostic model, the seven-miRNA signature combined with conventional clinical parameters (including age, race, sex, tumor site and TNM stage) was analyzed by univariate and multivariate Cox regression along with stratification analysis to evaluate the associations of these parameters with the overall survival in the entire data set. Variables with a P-value of <0.05—i.e., the seven-miRNA signature, age and TNM stage—were considered to be associated with prognosis, as shown in Table 2. In addition, the conclusion can be made that the seven-miRNA signature is an independent risk factor for the prognosis of CRA patients.
Then, ROC curves and AUC values were used to horizontally compare the efficacy of the seven-miRNA signature with those of patient age and TNM stage in predicting the prognosis of patients with CRA. First, the seven-miRNA signature, age, and a new variable combining both of these variables were included in the analysis. As shown in Figure 7A, the seven-miRNA signature showed a higher predictive power than the age. Then, the seven-miRNA signature, TNM stage, and a new variable combining both of these variables were included; similarly, the predictive power of the seven‐miRNA signature was superior to that of the TNM stage (Figure 7B).
Finally, analysis was performed on the 415 patients stratified by age and TNM stage to compare the risk score of the seven-miRNA signature. The patients were divided by the average age in years into the younger group (<65, n=189) and the aged group (≥65, n=226. In the younger group, the seven-miRNA signature divided the patients into a high-risk group (n=95) and a low-risk group (n=94) according to the median risk score. Kaplan-Meier survival curves showed that the difference between the two risk groups was statistically significant (P=5.891e−05, Figure 8A&B). The corresponding values were also statistically significant (P=1e−07) in the aged group, which was also divided into the high- and low-risk groups (n=112 and 114, respectively). Then, the patients were classified by TNM stage into the early-stage group (stages I+II, n=229) and the advanced-stage group (stages III+IV, n=186). In both stage groups, the seven-miRNA signature separated the patients into a high-risk group and a low-risk group independently. Similarly, the Kaplan-Meier survival curves for both stage groups indicated that the overall survival times of patients in the high-risk groups were significantly shorter than those of patients in the low-risk groups (P=5.891e−05, 0.00011855, Figure 8C&D).
Target gene prediction and functional enrichment analysis of the prognostic miRNAs
To investigate the cellular processes and biological pathways associated with the seven prognostic miRNAs, miRNA target prediction was separately performed with miRDB, miRTarBase7.0 and TargetScan. MiRDB identified 9297 miRNA-related genes,11 while TargetScan,12 the most comprehensive miRNA target gene database, identified 38678 mRNAs. MiRTarBase,13 the experimentally validated miRNA-target interaction database, identified 152 mRNAs associated with the seven prognostic miRNAs. The intersection of the results from these three databases was identified as the set of miRNA-related genes (n=67, Figure 9B) for further functional enrichment analyses. First, GO BP and KEGG pathway enrichment analyses were performed with these miRNA-related genes via DAVID.14,15 Then, the genes with a P-value of < 0.05 (Table 3) were visualized using the R packages “Cairo” and “ggplot2” (Figure 9A&C). The GO and KEGG analysis results showed that these genes are involved in cellular processes associated with cancer, such as regulation and negative regulation of apoptosis, and in biological pathways such as the p53 signaling pathway. The functional enrichment analysis results indicated the potential roles of the seven prognostic miRNAs in the regulation of CRA tumorigenesis and progression via targeting of their associated mRNAs.