DOI: https://doi.org/10.21203/rs.3.rs-70357/v1
Background and aims: Colorectal cancer (CRC) remains a global public health problem. We aimed to identify prognostic microRNAs (miRNAs) associated with overall survival (OS) and to evaluate their diagnostic value in CRC.
Methods: Three serum miRNA expression profiles from the Gene Expression Omnibus (GEO) database and the tissue miRNA expression profile of CRC from The Cancer Genome Atlas (TCGA) database were analyzed to identify differentially expressed miRNAs (DEmiRNAs) common to serum and tissue. A total of 569 CRC patients with survival information from the TCGA database were randomly divided into a discovery cohort and a validation cohort. In the TCGA discovery cohort, univariate Cox analysis, least absolute shrinkage and selection operator (LASSO) regression analysis and multivariate Cox proportional hazards regression analysis were performed, and a prognostic signature was constructed. The prognostic capacity of the signature was verified in the TCGA validation cohort, and the expression of the signature miRNAs in tumors and serum was validated in an independent microarray dataset and in our clinical set. We also evaluated the diagnostic power of the model in serum by calculating the area under the receiver operating characteristic (ROC) curve in the clinical set. Functional enrichment analyses were applied to explore the potential roles of the signature in CRC.
Results: A total of 154 common DEmiRNAs were identified. In the TCGA discovery cohort, a four-miRNA (miR-10b, miR-130a, miR-561, miR-4684) prognostic signature was constructed, and a predictive nomogram based on the expression of these four miRNAs performed well in predicting 3- and 5-year OS. These findings were verified in the TCGA validation cohort. miR-10b, miR-130a, miR-561 and miR-4684 were significantly upregulated in both CRC tumors and serum. Circulating levels of all four miRNAs decreased significantly after surgical resection of the tumor, suggesting that these circulating miRNAs are tumor-derived. The combination of the four circulating miRNAs might serve as a novel diagnostic signature for CRC. Functional enrichment analyses suggested that these four miRNAs might be related to CRC progression.
Conclusions: We developed a novel, reliable prognostic and diagnostic signature that may benefit CRC screening and clinical therapeutic decision-making.
Colorectal cancer (CRC) remains a global public health problem, with more than a million new diagnoses and 500 thousand deaths every year[1]. By 2030, there are estimated to be more than 2 million new CRC cases and over 1 million deaths[2]. Because the early symptoms of CRC are not obvious, many patients are diagnosed at middle and late stages, when five-year survival is much lower than at stages I and II[3]. Early diagnosis and intervention are therefore critical to improving CRC patients' outcomes. For early-stage CRC, faecal screening and diagnostic strategies such as faecal occult blood testing (FOBT) and faecal immunochemical testing (FIT) have limited sensitivity[4]. Studies suggest that patients prefer blood tests to faecal screening[5], and the uptake of CRC screening would be significantly increased by a better-performing blood-based biomarker[6]. Meanwhile, prognosis remains unsatisfactory because of individual differences between patients. Clinical therapeutic strategies and prognosis are based on the cancer stage at diagnosis and the clinicopathological features of patients[7]. Only one fifth of patients with stage III CRC benefit from chemotherapy, and a large majority of patients are exposed to unnecessary toxicity[8]. Prognostic markers provide treatment-independent information on outcomes such as overall survival (OS) and relapse-free survival (RFS). Thus, developing reliable, sensitive and minimally invasive diagnostic and prognostic biomarkers is essential to CRC control.
MicroRNAs (miRNAs) are short, endogenous, non-coding RNAs in eukaryotic organisms that regulate the expression of target genes[9]. Numerous studies have reported that dysregulated miRNAs are related to the tumorigenesis, progression and metastasis of CRC[10]. Many CRC-related miRNAs have been identified, and numerous upregulated or downregulated miRNAs, such as miR-30b, miR-214, miR-200c, miR-224 and miR-155, are associated with the prognosis of CRC patients[11–16]. Tumor-derived circulating miRNAs are stable in peripheral blood, so serum miRNAs are potential minimally invasive diagnostic biomarkers. Many potential CRC diagnostic markers have been reported, for example miR-21, miR-92a and panels of several miRNAs[16–19]. Novel miRNA markers for CRC are continuously being identified, but these markers still require in-depth study and validation before clinical application. Combinations of markers seem to be more effective in clinical application than individual markers[20, 21].
In this study, we used three GEO datasets and the TCGA CRC dataset to obtain miRNAs that were significantly differentially expressed (DEmiRNAs) in both the tumor tissue and serum of CRC patients, and identified the DEmiRNAs associated with OS. A four-miRNA prognostic model (miR-10b, miR-130a, miR-561, miR-4684) was developed that predicted the post-surgical prognosis of CRC patients with moderate accuracy, and its predictive power was validated in an independent CRC cohort. We also measured the expression of these four miRNAs in tissue and serum in a clinical set and evaluated the diagnostic power of the model in serum by determining the area under the receiver operating characteristic (ROC) curve. Functional enrichment analyses were performed on the potential targets of the four miRNAs. In sum, we developed a novel, reliable prognostic and diagnostic signature that may benefit CRC screening and prognosis prediction.
The online tool GEO2R was used to analyze three serum miRNA expression datasets (GSE113486, GSE124158 and GSE106817) and identify DEmiRNAs in CRC serum compared with healthy controls. Tissue miRNA expression profiles for CRC, comprising 635 CRC tissues and 11 normal tissues, were downloaded from the TCGA database; all 11 normal samples were paired with their matched CRC tumors. DEmiRNAs between CRC and normal tissues were identified with the R/Bioconductor package “edgeR”. DEmiRNAs were defined by |log2 fold change (FC)| ≥ 1 and adjusted p-value < 0.05. The R package “venn” was then used to obtain the DEmiRNAs common to the serum and tumor tissue of CRC patients for further study.
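The selection rule above can be illustrated with a short Python sketch. It is not the GEO2R or edgeR pipeline: it assumes that a log2 fold change and a raw p-value have already been computed for every miRNA in each dataset, it assumes Benjamini-Hochberg adjustment (the text says only "adjusted p-value"), and it assumes that "common" means present in the up- (or down-) regulated list of every dataset. The helper names and toy values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_demirnas(results, lfc_cutoff=1.0, alpha=0.05):
    """results maps miRNA name -> (log2FC, raw p). Returns (up, down) name sets."""
    names = list(results)
    adjusted = bh_adjust([results[name][1] for name in names])
    up, down = set(), set()
    for name, q in zip(names, adjusted):
        lfc = results[name][0]
        # Keep miRNAs with |log2FC| >= 1 and adjusted p < 0.05.
        if q < alpha and abs(lfc) >= lfc_cutoff:
            (up if lfc > 0 else down).add(name)
    return up, down


def common_demirnas(datasets):
    """Intersect the up- and downregulated sets across all datasets (Venn centre)."""
    ups, downs = zip(*(select_demirnas(d) for d in datasets))
    return set.intersection(*ups), set.intersection(*downs)


# Hypothetical toy inputs: one serum dataset and one tissue dataset.
serum = {"miR-A": (2.1, 0.001), "miR-B": (-1.5, 0.002), "miR-C": (0.4, 0.0001)}
tissue = {"miR-A": (1.2, 0.0004), "miR-B": (0.8, 0.01), "miR-C": (3.0, 0.03)}
print(common_demirnas([serum, tissue]))  # ({'miR-A'}, set())
```

In an analysis of this kind, the list passed to `common_demirnas` would hold the three serum results and the tissue result; an empty downregulated intersection is the situation reported in the Results.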
CRC patients without survival information in the TCGA database were excluded. Using a computer-generated allocation sequence, the remaining 569 CRC patients were randomly divided into the TCGA discovery cohort (n = 285) and the TCGA validation cohort (n = 284). Univariate Cox analysis was carried out in the TCGA discovery cohort to study the association between DEmiRNA expression and OS. miRNAs with a p-value < 0.05 were considered significantly related to OS and were selected for further analysis. The optimal miRNAs were then identified with the least absolute shrinkage and selection operator (LASSO) regression algorithm.
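The exclusion step and random allocation can be reproduced in spirit with a seeded shuffle, as in the minimal Python sketch below. The record layout, the field names `os_time` and `os_status`, and the seed are hypothetical; the study reports only that a computer-generated sequence was used.

```python
import random


def split_cohorts(records, n_discovery, seed=2020):
    """Drop records without survival data, then randomly split the rest."""
    # Exclude patients lacking survival time or status.
    eligible = [r for r in records
                if r.get("os_time") is not None and r.get("os_status") is not None]
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = eligible[:]
    rng.shuffle(shuffled)
    return shuffled[:n_discovery], shuffled[n_discovery:]


# Hypothetical records: 569 with survival data and 3 without.
records = [{"id": i, "os_time": 100 + i, "os_status": i % 2} for i in range(569)]
records += [{"id": 569 + j, "os_time": None, "os_status": None} for j in range(3)]
discovery, validation = split_cohorts(records, n_discovery=285)
print(len(discovery), len(validation))  # 285 284
```

Recording the seed alongside the analysis is what allows the same discovery/validation split to be regenerated later.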
Independent prognostic miRNAs for OS were determined by multivariate Cox proportional hazards regression analysis, and a forest plot was drawn from the expression of the key miRNAs and their regression coefficients. We calculated each patient's risk score with the “Predict” function in R. The TCGA discovery cohort was stratified into high-risk and low-risk groups based on the median risk score. A heatmap was plotted to characterize the expression of the key miRNAs, and Kaplan-Meier analysis was used to assess the difference in prognosis between the high-risk and low-risk groups.
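For a Cox model, the risk score is the linear predictor: for patient j, risk_j = Σ β_i x_ij over the four signature miRNAs, where β_i is the regression coefficient and x_ij the expression value. A minimal Python sketch of the score and the median split follows. The coefficient values are hypothetical placeholders because the fitted coefficients are not reported here, and the helper names are illustrative. Scores strictly above the median are labelled high risk; with untied scores this gives 142 high-risk patients out of 285 and 142 out of 284, in line with the group sizes reported in the Results. Some software centres the linear predictor, but a constant shift does not change which patients fall above the median.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear predictor sum(beta_i * x_ij) for each patient j."""
    return [sum(beta * sample[mirna] for mirna, beta in coefficients.items())
            for sample in expression]


def stratify_by_median(scores):
    """Label scores strictly above the median 'high', the rest 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores], cut


# Hypothetical coefficients (not the fitted values from the study).
coefs = {"miR-10b": 0.40, "miR-130a": 0.30, "miR-561": 0.20, "miR-4684": 0.10}
patients = [
    {"miR-10b": 1.0, "miR-130a": 2.0, "miR-561": 0.5, "miR-4684": 1.0},
    {"miR-10b": 3.0, "miR-130a": 1.0, "miR-561": 2.0, "miR-4684": 0.0},
    {"miR-10b": 0.5, "miR-130a": 0.5, "miR-561": 0.5, "miR-4684": 0.5},
]
scores = risk_scores(patients, coefs)
labels, cut = stratify_by_median(scores)
print(labels)  # ['low', 'high', 'low']
```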
In the TCGA discovery cohort, a nomogram was constructed from the expression of the key miRNAs, and a nomogram score was calculated for each patient. Agreement between model predictions and actual outcomes was evaluated with calibration curves. The 3- and 5-year OS probabilities of patients in the TCGA discovery cohort were predicted from the nomogram score, and prognostic capacity was assessed with the time-dependent ROC curve.
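Time-dependent ROC analysis asks, at a horizon t such as 3 or 5 years, how well a score separates patients who died by t (cases) from those still under observation after t (controls). The Python sketch below is a simplified, hypothetical illustration: it drops patients censored before t instead of re-weighting them, whereas dedicated estimators (for example those in the R packages survivalROC or timeROC) correct for censoring, and the study does not state which estimator was used. The times, events and scores are toy values.

```python
def naive_time_auc(times, events, scores, t):
    """Cumulative/dynamic AUC at horizon t, ignoring patients censored before t."""
    # Cases: died at or before t. Controls: still under observation after t.
    cases = [s for ti, e, s in zip(times, events, scores) if ti <= t and e == 1]
    controls = [s for ti, s in zip(times, scores) if ti > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at horizon t")
    concordant = 0.0
    for c in cases:
        for k in controls:
            # A case scoring above a control counts 1, a tie counts 0.5.
            concordant += 1.0 if c > k else 0.5 if c == k else 0.0
    return concordant / (len(cases) * len(controls))


times = [1.0, 2.0, 4.0, 6.0, 7.0, 2.5]   # years (toy data)
events = [1, 1, 1, 0, 1, 0]              # 1 = death, 0 = censored
scores = [0.9, 0.8, 0.3, 0.2, 0.5, 0.7]  # higher = higher predicted risk
print(naive_time_auc(times, events, scores, t=3.0))  # 1.0
print(naive_time_auc(times, events, scores, t=5.0))  # 0.8333333333333334
```

Because the AUC is recomputed at each horizon, the same score can discriminate differently at 3 and 5 years, as the discovery and validation results illustrate.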
As in the TCGA discovery cohort, the TCGA validation cohort was divided into high-risk and low-risk groups according to each patient's risk score, calculated with the “Predict” function in R. A heatmap was plotted to characterize the expression of the key miRNAs, and Kaplan-Meier analysis was used to assess the difference in prognosis between the two groups. Prognostic accuracy in the TCGA validation cohort was also assessed with the time-dependent ROC curve.
To validate miRNA expression and assess the diagnostic value of the prognostic model, 52 CRC patients who underwent surgical treatment at the Shenzhen Second People's Hospital (Shenzhen, China) and 52 healthy volunteers were invited to participate in this study. None of the CRC patients received treatment before surgery. The clinicopathological information of the CRC patients is shown in Table 1. The study was approved by the Ethics Committee of the Tsinghua Shenzhen International Graduate School, and written informed consent was obtained from all participants before their tissue and blood specimens were used. Resected tumor tissue and non-cancerous tissue (> 5 cm from the tumor) from CRC patients were immediately placed in RNAlater™ stabilization solution (Thermo Fisher Scientific, Waltham, MA, USA) and then stored at -80 °C until use. Pre- and post-operative peripheral blood from CRC patients and peripheral blood from healthy controls were collected in vacuum blood tubes without clot activator, allowed to clot at 4 °C for 2 h, and centrifuged at 2,000 rpm at 4 °C for 10 minutes. The serum was transferred into RNase-free tubes and stored at -80 °C.
Table 1. Clinicopathological characteristics of the 52 CRC patients

Characteristics | Subgroup | Number of patients, n (%) |
---|---|---|
Gender | Male | 28 (53.85%) |
| | Female | 24 (46.15%) |
Age (years) | ≤ 50 | 15 (28.85%) |
| | > 50 | 37 (71.15%) |
T stage | T1 | 3 (5.77%) |
| | T2 | 4 (7.69%) |
| | T3 | 10 (19.23%) |
| | T4 | 35 (67.31%) |
Lymph nodes (N stage) | N0 | 22 (42.31%) |
| | N1 | 19 (36.54%) |
| | N2 | 11 (21.15%) |
Metastasis (M stage) | M0 | 44 (84.62%) |
| | M1 | 8 (15.38%) |
TNM stage | I | 6 (11.54%) |
| | II | 14 (26.92%) |
| | III | 24 (46.15%) |
| | IV | 8 (15.38%) |

T1: Tumor invades submucosa; T2: Tumor invades muscularis propria; T3: Tumor invades through the muscularis propria into pericolorectal tissues; T4: Tumor invades visceral peritoneum or invades or adheres to adjacent organ or structure.
N0: No regional lymph node metastasis; N1: Metastasis in 1–3 regional lymph nodes; N2: Metastasis in 4 or more regional lymph nodes.
M0: No distant metastasis; M1: Distant metastasis.
Total miRNA was extracted from tissue with RNAiso for Small RNA (Takara, Japan) and from serum with the miRNeasy Serum/Plasma kit (Qiagen, Hilden, Germany). miRNA was reverse-transcribed with the Mir-X miRNA First-Strand Synthesis Kit (Takara). miRNA expression was quantified by quantitative real-time PCR (qRT-PCR) on a 7500 PCR system (Thermo Fisher Scientific) using TB Green Advantage qPCR Premix (Takara). All primers used for qRT-PCR are listed in Table 2. The PCR conditions were as follows: step 1, 95 °C for 5 minutes; step 2, 40 cycles of 95 °C for 10 seconds and 60 °C for 35 seconds. At the end of cycling, a melting curve was run under the preset conditions of the 7500 PCR system. Relative miRNA expression in colorectal tumor tissue was calculated with the 2^(-ΔΔCt) method, using snRNA U6 as the internal control. For serum, cel-miR-39 (Gene Pharma, Shanghai, China) was used as the external reference: 1 μL of cel-miR-39 at 1 μmol/L was added to each serum sample, and the relative expression of miRNAs in serum was calculated with the 2^(-ΔCt) method.
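The two relative-quantification formulas can be written out explicitly. With ΔCt = Ct(target miRNA) − Ct(reference), tissue expression is 2^(−ΔΔCt), where ΔΔCt = ΔCt(tumor) − ΔCt(non-cancerous tissue) and U6 is the reference; serum expression is 2^(−ΔCt) with cel-miR-39 as the reference. The Python sketch below assumes that the matched non-cancerous tissue serves as the calibrator and, like the 2^(−ΔΔCt) method itself, assumes close to 100% amplification efficiency. The Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Normalize a target Ct to a reference Ct (U6 or the cel-miR-39 spike-in)."""
    return ct_target - ct_reference


def fold_change_ddct(ct_mir_tumor, ct_u6_tumor, ct_mir_normal, ct_u6_normal):
    """2^(-ddCt): tumor expression relative to the matched non-cancerous tissue."""
    ddct = delta_ct(ct_mir_tumor, ct_u6_tumor) - delta_ct(ct_mir_normal, ct_u6_normal)
    return 2 ** (-ddct)


def relative_serum_level(ct_mir, ct_cel_mir_39):
    """2^(-dCt): serum miRNA level relative to the cel-miR-39 spike-in."""
    return 2 ** (-delta_ct(ct_mir, ct_cel_mir_39))


# Hypothetical Ct values for one patient.
print(fold_change_ddct(26.0, 20.0, 28.0, 20.0))  # 4.0 (dCt 6 vs 8, ddCt = -2)
print(relative_serum_level(30.0, 25.0))          # 0.03125 (2 ** -5)
```

A lower Ct means more template, so a negative ΔΔCt corresponds to upregulation in the tumor.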
Table 2. Primers used for qRT-PCR

Primer | Sequence |
---|---|
miR-130a-F | CAGTGCAATGTTAAAAGGGCAT |
miR-10b-F | TACCCTGTAGAACCGAATTTGTG |
miR-561-F | ATCAAGGATCTTAAACTTTGCC |
miR-4684-F | CTCTCTACTGACTTGCAACATA |
cel-miR-39-F | TCACCGGGTGTAAATCAGCTTG |
miRNAs-R | Universal primer in Mir-X miRNA First-Strand Synthesis Kit of Takara |
snRNA U6-F | CTCGCTTCGGCAGCACA |
snRNA U6-R | AACGCTTCACGAATTTGCGT |
The ability of miRNA expression to distinguish CRC patients from healthy controls was determined by ROC analysis. Sensitivity was plotted against (1 - specificity) at each cutoff threshold, and discriminative performance was quantified by the area under the curve (AUC).
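The ROC construction and its AUC can be sketched in a few lines of Python. The sketch sweeps every observed value as a cutoff (values at or above the cutoff are called CRC), traces (1 − specificity, sensitivity) points, and integrates them with the trapezoidal rule. It also computes the equivalent Mann-Whitney form of the AUC: the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counting one half. It assumes higher expression indicates CRC, as for the upregulated miRNAs here; the expression values are hypothetical, and the sketch does not compute confidence intervals.

```python
def roc_points(case_scores, control_scores):
    """(1 - specificity, sensitivity) at every cutoff, from strict to lenient."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(case_scores) | set(control_scores), reverse=True):
        sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
        false_pos_rate = sum(s >= cutoff for s in control_scores) / len(control_scores)
        points.append((false_pos_rate, sensitivity))
    return points  # the lowest cutoff gives (1.0, 1.0)


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_mann_whitney(case_scores, control_scores):
    """P(case score > control score), counting ties as one half."""
    total = sum(1.0 if c > k else 0.5 if c == k else 0.0
                for c in case_scores for k in control_scores)
    return total / (len(case_scores) * len(control_scores))


crc = [0.9, 0.8, 0.4]      # hypothetical relative serum levels, CRC patients
healthy = [0.5, 0.3, 0.2]  # hypothetical relative serum levels, healthy controls
print(auc_trapezoid(roc_points(crc, healthy)), auc_mann_whitney(crc, healthy))
```

Both functions return the same value (8/9 here), which is a convenient cross-check when implementing ROC analysis by hand.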
Gene Ontology (GO) and protein-protein interaction (PPI) enrichment analysis of the potential target genes of the miRNAs
To explore the potential roles of the miRNAs in CRC, we used the mirDIP database to obtain the top 100 potential target genes with the highest scores for each miRNA. GO term and pathway enrichment analyses of these genes were performed in Metascape (http://metascape.org/gp/index.html#/main/step1) to identify the functional categories and biological processes involved. Terms with an enrichment factor > 1.5, a p-value < 0.01 and a minimum count of 3 were collected and grouped into clusters based on their membership similarities. PPI enrichment analysis was carried out in Metascape using the BioGRID, OmniPath and InWeb_IM interaction databases[22–24]. The resulting network contains the subset of proteins that interact with at least one other member of the input list, and the Molecular Complex Detection (MCODE) algorithm was applied to identify densely connected network components. MCODE is a graph-based clustering algorithm that rapidly detects densely connected regions in large protein interaction networks and measures the degree of association among the proteins in each module.
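The term-filtering thresholds can be made concrete. As we understand Metascape's reports, the enrichment factor is the ratio of observed to expected hits and the p-value comes from the cumulative hypergeometric distribution; the Python sketch below implements both for a single term under that reading. The background size, term size and hit counts in the example are hypothetical, and the sketch omits the multiple-testing correction and term clustering that Metascape also performs.

```python
from math import comb


def term_enrichment(k, n, K, N):
    """Enrichment factor and hypergeometric P(X >= k) for one term.

    k: input genes annotated to the term; n: size of the input list;
    K: background genes annotated to the term; N: background size.
    """
    expected = n * K / N
    factor = k / expected
    # Upper-tail (cumulative) hypergeometric probability of seeing >= k hits.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return factor, upper / comb(N, n)


def keep_term(k, n, K, N, min_count=3, min_factor=1.5, max_p=0.01):
    """Apply the stated thresholds: count >= 3, factor > 1.5, p < 0.01."""
    factor, p = term_enrichment(k, n, K, N)
    return k >= min_count and factor > min_factor and p < max_p


# Hypothetical: 15 of 400 target genes fall in a 200-gene term, background 20,000.
print(term_enrichment(15, 400, 200, 20000), keep_term(15, 400, 200, 20000))
```

Python's arbitrary-precision integers keep the large binomial coefficients exact, so the only rounding happens in the final division.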
Statistical analyses were performed with SPSS 20.0 and GraphPad Prism 8.0. Independent prognostic miRNAs for CRC were identified by univariate and multivariate Cox proportional hazards regression analyses, and prognostic capacity was assessed with the time-dependent ROC curve. Survival was estimated with the Kaplan-Meier method, and survival curves were compared with the log-rank test. Student's t test was used to compare miRNA expression between two groups. ROC curves and AUCs were used to evaluate discrimination between CRC patients and healthy subjects. Statistical significance was set at p < 0.05 (two-sided).
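The Kaplan-Meier estimate and the log-rank comparison can be sketched from first principles. This Python version is an illustrative re-implementation, not the SPSS or GraphPad Prism procedure used in the study. The Kaplan-Meier curve multiplies (1 − deaths/at-risk) at each event time, and the log-rank statistic compares observed with expected deaths in one group, (O − E)²/V, referred to a chi-square distribution with 1 degree of freedom. The survival data are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = death)."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square (1 df) and two-sided p-value."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    o_minus_e = variance = 0.0
    for t in sorted({ti for ti, e, _ in pooled if e}):
        n = sum(1 for ti, _, _ in pooled if ti >= t)
        n_a = sum(1 for ti, _, g in pooled if ti >= t and g == 0)
        d = sum(e for ti, e, _ in pooled if ti == t)
        d_a = sum(e for ti, e, g in pooled if ti == t and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square(1) upper tail


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
print(logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1]))
```

The p-value uses the identity that a 1-df chi-square is a squared standard normal, so its upper tail equals erfc(sqrt(x/2)).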
Figure 1 shows an overview of the study strategy. The dysregulated miRNA expression patterns of GSE113486, GSE124158, GSE106817 and the TCGA CRC cohort are visualized in volcano plots (Fig. 2A-2D), in which red and green dots indicate upregulated and downregulated miRNAs, respectively, in CRC serum or tumor tissue. Venn analysis was used to identify miRNAs commonly upregulated or downregulated in both the serum and tumor tissue of CRC patients, and Venn diagrams were generated. In total, 154 commonly upregulated DEmiRNAs were identified (Fig. 2E), and no commonly downregulated DEmiRNAs were found (Fig. 2F). The common DEmiRNAs are listed in Table 3.
Table 3. Common DEmiRNAs in the serum and tumor tissue of CRC patients

Regulation | miRNAs |
---|---|
Upregulated (n = 154) | miR-451a, miR-23a, miR-22, miR-130a, miR-151a, miR-17, miR-15a, miR-21, miR-126, miR-20a, miR-411, let-7g, miR-18a, miR-106a, miR-4668, miR-4766, miR-135b, miR-654, miR-379, miR-95, miR-655, miR-148a, miR-651, miR-217, miR-136, miR-27a, miR-32, miR-1245a, miR-561, miR-24-1, miR-223, miR-10b, miR-27b, miR-142, miR-5683, miR-628, miR-3607, miR-4724, miR-374a, miR-549a, miR-545, miR-19a, miR-301a, miR-708, miR-590, miR-542, miR-96, miR-660, miR-643, miR-577, miR-454, miR-758, miR-146a, miR-26b, miR-556, miR-152, miR-5579, miR-10a, miR-374b, miR-627, miR-889, miR-224, miR-215, miR-576, miR-190a, miR-1277, miR-1537, miR-653, miR-539, miR-452, miR-154, miR-624, miR-3613, miR-369, miR-376c, miR-376b, miR-582, miR-218-2, miR-30e, miR-144, miR-503, miR-143, miR-548v, miR-192, miR-493, miR-186, miR-5586, miR-25, miR-4791, miR-19b-2, miR-511, miR-30a, miR-141, miR-3664, miR-337, miR-182, miR-181d, miR-34b, miR-19b-1, miR-33b, miR-31, miR-541, miR-33a, miR-412, miR-26a-2, miR-205, miR-29c, miR-3065, miR-29a, miR-598, miR-551b, miR-200a, miR-335, miR-6854, miR-5000, miR-381, miR-382, miR-429, miR-34a, miR-4999, miR-548s, miR-7705, miR-374c, miR-425, miR-200b, miR-183, miR-147b, miR-218-1, miR-4473, miR-496, miR-3677, miR-7-1, miR-203b, miR-1269b, miR-450a-1, miR-552, miR-26a-1, miR-24-2, miR-188, miR-196b, let-7f-2, miR-7-2, miR-449a, miR-4684, miR-2355, miR-16-1, miR-34c, miR-30b, miR-584, miR-581, miR-185, miR-134, miR-16-2, miR-210 |
The 569 CRC patients in the TCGA database were randomly divided into the TCGA discovery cohort (n = 285) and the TCGA validation cohort (n = 284). In the discovery cohort, the prognostic relationship between DEmiRNA expression and OS was studied by univariate Cox regression analysis; 22 of the 154 DEmiRNAs had a p-value < 0.05 (Table 4). Because all 154 DEmiRNAs were upregulated in CRC, only the 12 miRNAs with a hazard ratio (HR) > 1, for which higher expression was associated with shorter OS, were selected for LASSO analysis. The coefficients of these 12 miRNAs were calculated, and 7 miRNAs (miR-10b, miR-130a, miR-5579, miR-561, miR-217, miR-34c and miR-4684) were retained under the minimum λ (lambda.min) criterion (Fig. 3A-B). Multivariate Cox regression analysis then showed that miR-10b, miR-130a, miR-561 and miR-4684 were independent prognostic biomarkers for CRC, as shown in the forest plot (Fig. 3C). Risk scores for patients were calculated with the “Predict” function in R, and the 285 patients in the TCGA discovery cohort were split into a high-risk group (n = 142) and a low-risk group (n = 143) based on the median risk score. The distributions of risk scores, OS and OS status are plotted in Fig. 3D-E. Cluster analysis of the four miRNAs between the high- and low-risk groups is shown as a heatmap (Fig. 3F). Kaplan-Meier analysis showed that the high-risk group had worse OS than the low-risk group (P = 0.0124) (Fig. 3G).
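Table 4. DEmiRNAs significantly associated with OS in univariate Cox analysis of the TCGA discovery cohort (p < 0.05)

miRNA | Hazard ratio | p-value | miRNA | Hazard ratio | p-value |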
---|---|---|---|---|---|
miR-3677 | 0.660166391 | 0.000331741 | miR-22 | 1.675037398 | 0.018808316 |
miR-10b | 1.61051756 | 0.002174175 | miR-425 | 0.669353339 | 0.019528583 |
miR-561 | 1.447264325 | 0.002818638 | miR-493 | 1.497718743 | 0.025639308 |
miR-188 | 0.680682457 | 0.00567738 | miR-4684 | 1.246131828 | 0.026607058 |
miR-34b | 1.425594558 | 0.006165811 | miR-429 | 0.753387006 | 0.03196279 |
miR-34c | 1.284070183 | 0.0076846 | miR-451a | 0.790666585 | 0.032279276 |
miR-196b | 0.790120228 | 0.009075736 | miR-16-1 | 0.704184925 | 0.032893757 |
miR-5579 | 1.502461992 | 0.013189274 | miR-30a | 1.24901058 | 0.033735734 |
miR-526b | 1.325834901 | 0.014182196 | miR-16-2 | 0.70721554 | 0.036521963 |
miR-130a | 1.520549448 | 0.017036171 | miR-3613 | 0.74353705 | 0.042162458 |
miR-217 | 1.285700337 | 0.017910929 | miR-144 | 0.79251505 | 0.04546269 |
A nomogram incorporating the expression of miR-10b, miR-130a, miR-561 and miR-4684 was constructed to predict the prognosis of CRC patients in the TCGA discovery cohort (Fig. 4A). Nomogram scores calculated from the expression of the four miRNAs were used to predict each patient's 3-year and 5-year OS. Calibration curves suggested that the nomogram performed well in predicting 3-year and 5-year OS (Fig. 4B-C). Moreover, in time-dependent ROC analysis the AUCs for 3- and 5-year survival were 0.733 and 0.704, respectively, suggesting that the integrated four-miRNA signature had good prognostic capacity (Fig. 4D).
Next, we used a separate patient cohort to assess whether the prognostic capacity of the integrated four-miRNA signature generalized. Using the four-miRNA signature, the 284 patients in the TCGA validation cohort were divided into a high-risk group (n = 142) and a low-risk group (n = 142). The distributions of risk scores and survival status are plotted in Fig. 5A-B, and a risk heatmap based on the expression of the four miRNAs in the TCGA validation cohort is shown in Fig. 5C. Kaplan-Meier analysis showed that the high-risk group had worse OS than the low-risk group (P = 0.0127) (Fig. 5D). In ROC analysis, the AUCs for 3- and 5-year survival were 0.625 and 0.723, respectively (Fig. 5E). These results were broadly consistent with those in the TCGA discovery cohort.
The expression of miR-10b, miR-130a, miR-561 and miR-4684 was examined in an independent GEO dataset (GSE98406). We also compared miRNA expression between tumor tissue and non-cancerous tissue by qRT-PCR in our clinical specimens. Consistent with the TCGA data, all four miRNAs were significantly upregulated in tumor tissue in both the GSE98406 dataset and our clinical specimens (Fig. 6A-B).
To investigate whether the circulating miRNAs were tumor-derived, circulating miR-10b, miR-130a, miR-561 and miR-4684 were measured by qRT-PCR in preoperative and postoperative serum of CRC patients. Circulating levels of all four miRNAs were significantly lower in postoperative than in preoperative serum, suggesting that the CRC tumor was the source of the elevated circulating miRNAs (Fig. 7A). Relative levels of miR-10b, miR-130a, miR-561 and miR-4684 were significantly higher in CRC serum than in serum from healthy individuals (Fig. 7B). To assess the potential diagnostic capacity of the four circulating miRNAs, ROC analysis was carried out and the AUC was calculated. For distinguishing CRC patients from healthy controls, the AUC was 0.854 (95% CI 0.783–0.924) for miR-10b, 0.739 (95% CI 0.640–0.839) for miR-130a, 0.830 (95% CI 0.745–0.915) for miR-561, 0.842 (95% CI 0.765–0.919) for miR-4684, and 0.976 (95% CI 0.951–1.000) for the four-miRNA combination (Fig. 7C).
To analyze the potential roles of the four miRNAs in CRC, their downstream target mRNAs were predicted with mirDIP, which integrates miRNA target predictions from 30 databases. We selected the 100 highest-scoring potential target genes for each miRNA (Table 5) and performed GO and KEGG enrichment analyses in Metascape. A heatmap of the top 20 significantly enriched pathways and functions (p < 0.05) was constructed (Fig. 8A). Several of these biological processes and pathways, such as chromatin remodeling, mitotic cell cycle phase transition, proteoglycans in cancer and transcriptional misregulation in cancer, are related to cancer progression. Based on their functional correlations, the enriched terms were clustered and displayed as a network, in which darker colors indicate pathways or biological processes containing more genes (Fig. 8B) and different colors indicate different categories (Fig. 8C). Finally, a PPI network of the potential target genes was built (Fig. 8D) and the MCODE networks were extracted (Fig. 8E-F).
Table 5. Top 100 potential target genes of each miRNA predicted by mirDIP

miRNA | Potential targets |
---|---|
miR-10b | NCOR2,H3F3B,NR4A3,GATA6,ZMYND11,TFAP2C,MAPRE1,BDNF,GTF2H1,GALNT1,HOXA3,DAZAP1,CNOT6,RAP2A,KLF11,SOBP,EBF2,BAZ2B,ESRRG,ANK3,CTDSPL,NR5A2,MDGA2,NCOA6,RB1CC1,SLC38A2,BCL6,DOCK11,MAP3K7,CRK,MTMR3,BACH2,ACTG1,SON,FXR2,CRLF3,CSMD1,BAZ1B,EPHA4,TMEM183A,ARRDC3,TBX5,ZNF367,E2F7,FIGN,WDR26,RPRD1A,TRIM2,FNBP1L,E2F3,ELAVL2,GATA3,HNRNPK,CAMK2G,BICD2,CTNNBIP1,DLGAP2,GOLGA3,SDC1,HOXA1,HOXB3,MTF1,WNK3,IL1RAPL1,SMTNL2,NFAT5,ITSN1,SMAP1,SRSF1,SCARB2,HCN1,ANKFY1,NEDD4,JARID2,KCTD16,CCDC71L,USP46,LCOR,ARSJ,TMOD1,PIK3CA,ZNF608,PHF20L1,IGDCC4,KLHL29,TIAM1,ID4,CNNM4,PAPD5,FLRT2,ERI3,NONO,HAS3,TPP2,PAFAH1B1,CECR6,BTRC,RORA,GRM3,LGALSL |
miR-130a | FAM234B,PDE4B,NEK6,TNRC6A,AMBRA1,PDE7A,KPNA4,RBM25,PKP4,SOCS6,TOP1,SEMA6D,NACC2,KMT5A,NR6A1,CSNK1G3,ZBTB34,ASF1A,SATB1,CNOT6L,DNAJC6,AUH,TLK1,WBP2,ETNK1,KLF3,CA2,BRWD1,MRC1,CTCF,RAB39B,MAP3K1,EBF3,PPARGC1A,SEC23IP,PPIF,UBE2D1,HMGB2,RRAS2,POU4F2,CLCN3,ARHGAP20,CXCL12,MEIS1,CELF2,ZNF292,SEC24A,FMR1,PNRC2,ELF2,MCFD2,FNBP1L,MYCT1,LBR,ZNF839,TGIF1,NAP1L5,PPP1R12A,MAPRE1,NEK7,STX12,SATB2,MAP7,PLCXD3,CCNG1,TMED5,ROBO2,TOX3,CAPRIN1,MAB21L2,SLC6A14,FOXP2,CUL3,ZNF654,PKIA,NUFIP2,CREBZF,SPOPL,C2orf69,PPP1CB,WNK1,CELF1,MICU3,CDC40,TMPO,MYH1,MARCKSL1,RAB8B,SPRY2,MEIS2,QSER1,C3orf52,APAF1,HOXD10,HMGN2,TOX,MAF,IRF2,RCN1,TXLNG |
miR-561 | PCNX1,RBBP4,EIF4G2,RERE,ZNF236,ARFGEF2,GSPT1,MED1,DOK6,YOD1,SLAIN2,WNK3,PPARGC1B,DDA1,KL,BCL11B,ALDH5A1,HIPK1,ELOVL7,CCDC88A,GABRA1,SLC5A1,KCTD12,ADNP,ZXDB,SPRY3,NID1,TMEM123,ARHGAP5,PTPRD,ASH1L,TRPC5,EIF3J,SEC24A,HMBOX1,DDX21,SLC4A11,TMBIM4,GPHN,ZCCHC2,WWC3,SOX11,ATP2C1,KIAA0408,ABHD13,FKBP1A,GABPA,PCP4,GYPA,PCK1,ADAR,MGA,FGD4,RAB2B,ACTL6A,TRAF6,AGGF1,HDLBP,ZNF384,MIER1,STK4,RBM46,GATAD2B,ZNF501,CDH6,STAMBP,EEA1,ANK2,CREB1,BOLL,FAM110B,SLC35A2,CA13,SMAD2,USP37,MEF2A,CLEC16A,SRRM4,GALNT2,STK35,PDPR,ABHD2,DIP2B,TBC1D13,ANKRD50,ZNF461,MICU3,RET,VCAM1,SRPX,FIGN,FUS,MAPK1IP1L,IGFBP5,ARFGEF3,MDGA2,ATP6V0A2,IRGQ,WAPL,RFX7 |
miR-4684 | GNS,CUL3,SLC23A2,KLHL24,ASH1L,JADE2,HMGB2,ZMYM2,GID8,RIN2,ZBTB38,YPEL5,SCOC,JADE1,SHISA2,ZNF850,RAB11B,ZNF490,NOL9,MDGA2,PTPN11,WIZ,CHRDL1,ARHGAP11A,JCHAIN,ANP32E,GXYLT1,TMEM236,KLHL5,NXT1,AMMECR1L,ZNF33A,MTURN,MYOZ2,HSPD1,TXLNG,ZNF250,ZEB2,GK5,EP300,ELK4,TATDN3,SAR1B,SLC5A5,ZNF528,ADGRG1,RFT1,VCPIP1,IRGQ,MGAT5,SYNPO2,SLCO1A2,ZBTB3,SOX11,WBP11,MTFMT,GINS1,ZNF638,ZDHHC9,AKAP9,DTX3L,RBM8A,GNG4,NFYB,ZNF772,FOXN3,RPS6KA3,FAM102B,LMO4,SBSPON,ZNF629,MMACHC,SPRYD7,PCDHA9,CDK12,USP1,PPP1R3D,ESRRG,PGAP1,EIF5B,EXOSC6,KCNIP4,PNPO,TRAPPC2,JAM3,GATAD2B,ORAI2,ZNF709,SPAST,PSME4,MATR3,MDM2,PAK2,NHLRC2,HES1,LRTOMT,TEAD1,CDC42BPA,BRIP1,ADAMTS4 |
A growing number of studies have reported that dysregulated miRNAs play important roles in CRC progression and may be potential diagnostic, therapeutic and prognostic biomarkers for CRC[16]. An effective prognostic biomarker provides information on treatment-independent clinical outcomes, such as OS and RFS, that is valuable for selecting therapeutic strategies[25]. Prognostic biomarkers can help avoid undertreatment or overtreatment, which is essential for personalized and precision therapy[25]. The predictive accuracy of a system that integrates multiple biomarkers appears to be better than that of a single biomarker. In this study, 569 CRC patients with survival status and survival time from the TCGA database were randomly divided into discovery and validation cohorts to construct and validate the prognostic model. Using multiple statistical methods, we identified four miRNAs (miR-10b, miR-130a, miR-561, miR-4684) as independent prognostic indicators in the TCGA discovery cohort and established an integrated four-miRNA prognostic signature. Risk scores for patients in the TCGA discovery cohort were calculated from the expression of the signature miRNAs and their regression coefficients, and patients were divided into low-risk and high-risk groups. Kaplan-Meier analysis indicated that the high-risk group had worse OS than the low-risk group, and time-dependent ROC analysis showed that the four-miRNA signature predicted 3-year and 5-year OS with moderate accuracy (AUCs of 0.733 and 0.704, respectively). The generalizability of the four-miRNA signature was supported by the results in the TCGA validation cohort.
Cancer cells release molecules carrying signatures of their cells of origin into peripheral blood, which is the theoretical basis of liquid biopsy[26, 27]. Blood-based screening has the advantages of minimal invasiveness and good reproducibility. However, few liquid biopsy markers are in routine use for CRC screening or diagnosis. Low sensitivity and specificity limit the use of carcinoembryonic antigen (CEA) in CRC diagnosis or screening, so CEA is mainly used to monitor CRC recurrence[28]. Developing more sensitive liquid biopsy biomarkers, to replace or complement current stool-based screening, is crucial for the early detection of CRC and for reducing CRC-related mortality. Because miRNAs can be detected in serum and tissue and remain stable after long-term storage[29, 30], they are considered suitable candidate markers for liquid biopsy[16]. In the present study, we also assessed the diagnostic potential of the four miRNAs in independent clinical samples. Expression of circulating miR-10b, miR-130a, miR-561 and miR-4684 was significantly higher in CRC serum than in serum from healthy subjects, and the integrated four-miRNA signature distinguished CRC patients from healthy controls with high accuracy.
To explore the potential biological functions of the four miRNAs, we predicted the potential target mRNAs of each miRNA and performed GO and KEGG pathway analyses. The results suggested that these four miRNAs are potentially involved in several cancer-related biological processes and pathways, such as chromatin remodeling, mitotic cell cycle phase transition, proteoglycans in cancer and transcriptional misregulation in cancer. Transcriptional regulation is essential for responding to intra- and extracellular signals, defining and maintaining cell identity, and coordinating cellular activity[31], and most cancers are characterized by transcriptional dysregulation. Chromatin remodeling is mainly involved in the regulation of gene transcription and is related to apoptosis, DNA damage repair, cell proliferation and differentiation, and the maintenance of genomic stability[32]; abnormal chromatin remodeling is closely related to the development of cancer[33]. The cell cycle is strictly ordered by regulatory control mechanisms, and accurate cell cycle phase transitions are crucial for genome duplication and chromosome segregation[34]. Misregulation of phase transitions such as G1/S, S/G2 or G2/M may lead to tumorigenesis. Some miRNAs have been reported to be associated with these biological processes in cancer: miR-661 and miR-22 regulate chromatin remodeling[35, 36]; miR-122, miR-15a, miR-26, miR-29, miR-30 and let-7 are related to transcriptional misregulation in cancer[37]; and miR-188, miR-638 and miR-1258 are involved in cell cycle regulation[38–40].
Some mechanisms by which miR-10b and miR-130a act in CRC have been reported. miR-10b significantly inhibited PIK3CA expression and suppressed PI3K/Akt/mTOR pathway activity, increased TGF-β and SM α-actin expression, and promoted cancer-associated fibroblast (CAF) formation and CRC growth[41]. miR-10b may target HOXD10, E-cadherin, KLF4 and RhoC to promote the invasion and migration of CRC tumors, and upregulation of miR-10b was associated with liver metastasis in CRC[42–46]. miR-130a was also reported to promote the proliferation, migration and invasion of CRC cells by regulating FOXF2[47]. miR-130a is a potential therapeutic target in CRC: it functioned as a radiosensitizer in rectal cancer[48], and curcumin and mesalazine suppressed colon cancer proliferation by inhibiting miR-130a expression[49, 50]. To our knowledge, this study is the first to characterize the expression of miR-561 and miR-4684 in the tumors and serum of CRC patients and to identify their prognostic and diagnostic capacity in CRC. The molecular mechanisms by which miR-561 and miR-4684 act in CRC remain completely unknown.
This study has some limitations. First, the prognostic and diagnostic capacities of the four-miRNA signature were evaluated separately, in the TCGA dataset and in a single clinical cohort, respectively; the effectiveness of the signature should be further validated in additional cohorts. Second, the biological functions of these miRNAs require in-depth research to uncover novel mechanisms of CRC carcinogenesis.
We established a novel integrated four-miRNA signature that was significantly associated with OS in CRC patients. This signature could distinguish patients at low prognostic risk from those at high prognostic risk, and it showed good accuracy and reliability in distinguishing CRC patients from healthy controls. These results suggest that the four-miRNA signature could serve as a potential prognostic and diagnostic model for CRC.
Abbreviations
colorectal cancer: CRC; microRNAs: miRNAs; differentially expressed miRNAs: DEmiRNAs; overall survival: OS; Gene Expression Omnibus: GEO; The Cancer Genome Atlas: TCGA; receiver operating characteristic: ROC; faecal occult blood testing: FOBT; faecal immunochemical testing: FIT; relapse-free survival: RFS; least absolute shrinkage and selection operator: LASSO; quantitative real-time PCR: qRT-PCR; area under the curve: AUC; Gene Ontology: GO; Kyoto Encyclopedia of Genes and Genomes: KEGG; protein-protein interaction: PPI; Molecular Complex Detection: MCODE; hazard ratio: HR.
Acknowledgements
Not applicable.
Authors’ contributions
YC is the principal investigator. YC and YY J conceived the idea for the paper. YQ collected the clinical specimens. YC and CQ conducted data management and bioinformatics analysis. YC, YY Z and MM D performed the qRT-PCR. QS S and LL L conducted statistical analysis. YC was a major contributor in writing the manuscript and YY J revised the manuscript critically. All authors read and approved the final manuscript.
Funding
This work was supported by Shenzhen Progression and Reform Committee (No.2019156), Shenzhen Foundation of Science and Technology (No. JCYJ20180306174248782), Department of Science and Technology of Guangdong Province (No.2017B030314083).
Availability of data and materials
The datasets analyzed during the current study are available at the GEO database (https://www.ncbi.nlm.nih.gov/gds) and the TCGA database (https://cancergenome.nih.gov/).
Ethics approval and consent to participate
This study was approved by the Ethics Committee of Tsinghua Shenzhen International Graduate School. All subjects gave written informed consent in accordance with the Declaration of Helsinki before their inclusion in the study.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details