Identification of 4 dynamic down-regulated miRNAs in the colorectal adenoma-carcinoma sequence (ACS)
In this study, we aimed to identify key miRNAs of the colorectal ACS and uncover their biological molecular function during colorectal tumorigenesis. As shown in Fig. 1, we performed comprehensive bioinformatic analyses in several datasets of miRNAs and/or mRNAs (Table 1) to construct a miRNA-mRNA regulatory network and explore their biological roles in this sequence.
Table 1
Detailed information of 7 datasets analyzed in this study.
Dataset | Platform | Data type | Normal samples | Adenoma samples | CRC samples | Source | Species |
GSE41655 | GPL11487 | microRNA | 15 | 59 | 33 | Tissue | Homo sapiens |
GSE115513 | GPL18402 | microRNA | 84 | 84 | 84 | Tissue | Homo sapiens |
GSE41657 | GPL6480 | mRNA | 12 | 51 | 25 | Tissue | Homo sapiens |
GSE37364 | GPL570 | mRNA | 38 | 29 | 14 | Tissue | Homo sapiens |
TCGA-COAD | Illumina | mRNA & microRNA | 8 | / | 8 | Tissue | Homo sapiens |
GSE161277 | GPL20301 | scRNAseq | 3 | 3 | 3 | Tissue | Homo sapiens |
TCGA-PanCancer | Illumina | mRNA | / | / | 10953 Pan-Cancers | Tissue | Homo sapiens |
As for the miRNA dataset, we searched datasets with normal, adenoma, and CRC samples in GEO database, and finally selected GSE41655(15 normal, 59 adenoma, and 33 CRC samples) and GSE115513 with the 84 paired samples for further analysis. We ordered samples according to the colorectal normal‑adenoma‑carcinoma in the two datasets, respectively. STEM analysis then classified ordered expression data into 15 profiles with the significant ones colored (Fig. 2A-B and Additional File 1: Table S1). Profile 4 (35 miRNAs) in GSE41655 and Profile 4 (18 miRNAs) in GSE115513 were significantly descending expression, respectively (p < 0.05). A total of 4 miRNAs (hsa-miR-133b, hsa-miR-145-5p, hsa-miR-30a-5p, hsa-miR-497-5p) were merged and called as the dynamic down-regulated miRNAs along the colorectal ACS (Fig. 2C).
Moreover, logistic linear regression analysis and ROC analysis evaluated the performance of these 4 miRNAs for adenoma or CRC detection. In GSE41657, AUC values of the 4 miRNAs exhibited high accuracy for identifying CRC (100%) or adenoma (100%) compared with normal mucosa samples, and even for identifying CRC (76.1%) compared with adenoma samples (Fig. 2D). AUC values of the 4 miRNAs in GSE115513 also showed highly diagnostic potential with AUC value 76.6%, 89.4%, and 62.1%, respectively (Fig. 2E). In total, these results showed 4 dynamic down-regulated miRNAs can play vital roles in CRC or adenoma diagnosis.
To further interpret the biological function of these dynamic down-regulated miRNAs, the DIANA-mirPath database, a miRNA pathway analysis web-server was used. As shown in Fig. 2F and Additional File 1: Table S2, these 4 miRNAs may be involved in some cancer related pathways, including the “cell cycle”, “colorectal cancer”, “Wnt signaling”, and “Hippo signaling” pathways (p < 0.05). In total, the 4 dynamic down-regulated miRNAs might be involved in the colorectal tumorigenesis.
Identification of 278 dynamic up-regulated mRNAs during the colorectal ACS
Given the directly negative biological relationship between miRNAs and mRNAs, the similar analysis (STEM analysis) was also performed to obtain the dynamic deregulated mRNAs. GSE41657(12 normal, 51 adenoma, and 25 CRC samples) and GSE117606 with 48 paired samples were also ordered along the colorectal ACS, respectively. A total of 15 profiles were classified via STEM analysis (Fig. 3A-B, and Additional File 2: Table S2). Among them, profile 11, 12 and 15 having 1048 genes in GSE41657, profile 11, 12 and 15 including 749 genes in GSE117606 showed the increasing expression pattern along the transition. Then, a total of 278 genes were merged from these mRNAs, and identified as the dynamic up-regulated mRNAs in the sequence (Fig. 3C).
Next, we explored the biological function of these 278 dynamic up-regulated mRNAs by performing KEGG pathway analysis via ClusterProfiler R package. As shown in Fig. 3D, some cancer related pathways, including the “cell cycle”, “colorectal cancer”, and “PI3K-Akt signaling” pathways were significantly enriched (p < 0.05). In total, we used similar analyses as the miRNA data and found that 278 dynamic up-regulated mRNAs may also participate in the initiation of CRC.
Construction of a 4 miRNA- 25 mRNA regulatory network
According to the above dynamic deregulated miRNAs and mRNAs, we aimed to construct a miRNA-mRNA regulatory network. We identified 25 target mRNAs by merging the predicted targets from mirTarbase and Tarbase with the 278 dynamic up-regulated mRNAs presented in Fig. 3C (Fig. 4A-B). Next, we validated the regulatory relationship between these miRNAs and their targets using datasets with paired miRNA (GSE41655) and mRNA (GSE41657) data. As shown in Fig. 4C, 4 miRNAs were significantly negatively associated with these 25 target genes (Left panel), and the signature composed of these 4 miRNAs and 25 target genes effectively distinguished the normal, adenoma, and CRC samples (Right panel). Moreover, the TCGA-COAD dataset also contained paired miRNA and mRNA data, and it exhibited a significantly negative relationship between these down-regulated miRNAs and up-regulated mRNAs (Fig. 4D, Left panel), except for hsa-mir-30a, which showed higher expression in the tumor samples compared to the normal samples (Fig. 4D, Right panel). In total, we constructed and validated a regulatory network composed of 4 miRNAs and 25 mRNAs along the colorectal ACS.
Pathway analyses of 25 targeted mRNAs during the colorectal ACS in GSE41657
We firstly conducted WGCNA analysis to explore the distinct tumor biology associated with these 25 targets by identifying modules linked to the signature of 25 target genes (Sig-25Targets) during the three stages of CRC initiation in GSE41657. As shown in Fig. 5A, a total of 7 significant modules can be classified into 3 distinct patterns (R > = 0.2, p < 0.05). In particular, the pattern one (blue, lightcyan, and brown modules) displayed a strong positive correlation with normal samples (R = 0.9, p = 7e-33; R = 0.69, p = 1e-13; R = 0.79, p = 1e-19), but were found to be negatively associated with the Sig-25Targets (R = -0.65, p = 6e-12; R = -0.66, p = 2e-12; R = -0.75, p = 4e-17). The pattern two (magenta and pink modules) exhibited a high positive correlation both with adenoma samples (R = 0.61, p = 3e-10; R = 0.29, p = 0.005) and the Sig-25Targets (R = 0.31, p = 0.003; R = 0.44, p = 2e-05). The pattern three (yellow and midnight blue modules) demonstrated a strongly positive correlation with both CRC samples (R = 0.65, p = 1e-11; R = 0.6, p = 8e-10) and the Sig-25Targets (R = 0.94, p = 1e-40; R = 0.36, p = 5e-04). These results indicated that pattern one represented the expression characteristics of normal samples and was negatively correlated with Sig-25Targets. Pattern two and three represented the expression characteristics of adenoma samples and colorectal cancer samples, respectively, and were positively correlated with Sig-25Targets. Genes in these patterns were considered as co-expression genes of these Sig-25Targets.
Subsequently, KEGG pathway analysis was performed on genes in the aforementioned patterns. Significant pathways were depicted in Fig. 5B, 1286 genes in pattern one were enriched in immune related pathways, including “Nature killer cell mediated cytotoxicity”, “antigen processing and presentation” pathways. Pattern two containing 314 genes was enriched in “cell cycle”. 675 genes in pattern three were enriched in some cell cycle-related pathways like “cell cycle”, “DNA replication”, “Notch signaling”, “Hippo signaling” and “PI3K-Akt signaling” pathways (p < 0.05).
We further conducted GSEA using the tumor samples from GSE41657, based on the high and low expression of Sig-25Targets, with the median expression as the cut-off. GSEA results revealed significant differences between two groups in enrichment of HALLMARK pathways from MSigDB Collection. Specifically, the high expression group exhibited positive associations with stemness and cell cycle related pathways, such as "EPITHELIAL_MESENCHYMAL_TRANSITION", "NOTCH SIGNALING", "WNT_BETA_CATENIN_SIGNALING", "DNA_REPAIR", and "G2M_CHECKPOINT" (p < 0.05, FDR < 0.25, NES > 0). In contrast, the low expression group demonstrated strong associations with immune-related pathways, including "INFLAMMATORY_RESPONSE" and "INTERFERON_GAMMA_RESPONSE" (p < 0.05, FDR < 0.25, NES < 0) (Fig. 5C). These results suggested potential involvement of the 25 target genes in the dysregulation of tumor immune microenvironment and cell cycle-related pathways during the colorectal ACS.
Validation of Sig-25Targets implicated in cell cycle signaling using a single-cell RNA sequencing dataset
Single-cell sequencing provides an elevated level of resolution that cannot be achieved through conventional bulk sequencing methods, thus facilitating the measurement of tumor heterogeneity at the individual cellular level. A dataset of single cell sequencing (GSE161277) including samples from normal tissue, adenoma, and CRC was utilized to validate the distinct expression of 25 target mRNAs in the colorectal ACS (Additional file 3: Figure S1). As shown in Fig. 6A-B, these 25 mRNAs mainly expressed in the epithelial cells but not in other cell types, indicating that they are epithelial cell specific expression in CRC initiation. Monocle analysis further explore the lineage trajectory of epithelial cells along the normal-adenoma-carcinoma transition (Fig. 6C). The results demonstrated that the Sig-25Targets exhibited an increasing trend during the transition to CRC (Fig. 6D). Moreover, there was a corresponding increase in the expression score. of cell cycle-related pathways, such as "KEGG_CELL_CYCLE", "HALLMARK_DNA_REPAIR", and "HALLMARK_G2M_CHECKPOINT" along the sequence (Fig. 6E-G). Notably, the Sig-25Targets showed a positive correlation with scores of "HALLMARK_DNA_REPAIR" and “HALLMARK_G2M_CHECKPOINT” signaling in normal, adenoma, and CRC epithelial cells (Fig. 6H-I, R > 0, p < 0.05). Overall, these results confirmed the association between these 25 target genes and cell cycle-related pathways at the single cell level during CRC initiation.
Validation of Sig-25Targets as an immunosuppressor during CRC initiation.
To investigate the correlation between Sig-25Targets and infiltration of immune cells, we used several cellular deconvolution methods to examine the contribution of different cell types in the tumor microenvironment in GSE41657 and GSE37364. Initially, ESTIMATE analysis revealed a gradual increase in tumor purity score, whereas the immune and stromal scores decreased along the colorectal normal-adenoma-carcinoma stages in both datasets (Additional file 3: Figure S2). Subsequently, correlation analysis demonstrated a strong positive association between the Sig-25Targets and the tumor purity score in the adenoma and CRC groups, but a negative association with the immune and stromal scores in the same groups (Fig. 7A-B, p < 0.05). To further investigate these observed differences, immune cell infiltration analysis based on MCPCOUNTER, XCELL, QUANTISEQ, and EPIC approaches was conducted using the immunedeconv package. Correlation analysis validated that the signature was indeed negatively associated with immune cells like B cells in both datasets (Fig. 7C-D, p < 0.05). To assess the generalizability of the correlation between the Sig-25Targets and the cellular ecosystem, we evaluated ESTIMATE analysis and correlation analysis in TCGA-Pan-cancer and observed a similar correlation pattern: this Sig-25Targets exhibited a positive association with tumor purity score, but a negative association with the immune score and stromal score in ESCA, UCEC, STAD, READ, LUSC, HNSC (Fig. 7D, p < 0.05). These analyses revealed that Sig-25Targets may represent tumor cells and function as an immunosuppressor during CRC initiation.
Efficacy of Sig-25Targets in predicting drug sensitivity.
Due to the fact that drug treatment is also an important treatment modality for CRC, we conducted drug sensitivity predictions of GSE41657 and GSE37364 datasets via OncoPredict package. We then performed correlation analysis between the Sig-25Targets and drug response. As depicted in Fig. 8A, 19 out of 198 drug IC50 values had significant associations with the Sig-25Targets in both datasets (p < 0.05). We further classified these drugs into seven categories and noticed that this signature was negatively associated with the IC50 values of drugs in cell cycle category, such as AZD7762, Wee1 inhibitor, BI2536. According to annotation in the OncoPredict package, these drugs mainly target genes involved in the G2/M checkpoint of cell cycle, such as CHEK1, CHEK2, WEE1, PLK1, PLK2, and PLK3. Correlation analysis then revealed a positive association between the Sig-25Targets and these G2/M checkpoint related targets (Fig. 8B, R > 0.3, p < 0.05). These results suggested that Sig-25Targets may exhibit sensitivity to drugs that target the cell cycle-related pathways.
Furthermore, to identify potential therapeutic response of immunotherapy of these 25 target genes, we utilized the biomarker evaluation module of Tumor immune dysfunction and exclusion (TIDE) analysis. This Sig-25Targets was associated with a poorer ICB outcome in bladder cancer and melanoma treated with ICB therapy (Fig. 8C). In addition, the regulator prioritization module analysis highlighted TNFRSF10B as one of the top targets ranked by this module. In detail, high expression of TNFRSF10B was consistently associated with T cell dysfunction phenotypes across all datasets analyzed (Fig. 8D, Left panel). Moreover, high expression of TNFRSF10B was also linked to a worse outcome in terms of ICB response in bladder cancer and melanoma (Fig. 8D, Right panel). Taken together, these results suggest that patients with higher expression of this signature are less likely to benefit from immune checkpoint inhibitor therapy. These findings provided potential drug options for clinical treatment of patients with higher expression of Sig-25Targets.