Data source
We downloaded publicly available CRC gene expression datasets from Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/gds/) and The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) (Additional file 1: Table S1). We collected a set of 3,360 CRC tissue samples from 21 datasets measured by eight platforms and another set of 618 adjacent normal tissue samples from 23 datasets measured by six platforms. Only 12,401 genes that are shared in all these datasets were used. Especially for CRC samples, we only used the datasets containing more than 50 samples to avoid the influence of small sample bias. Data were preprocessed as previously described[14].
We also downloaded 619 miRNA samples, 536 mutation samples analyzed by MuTect2 algorithm (.maf files), 617 copy number alteration (CNA) samples and 295 DNA methylation samples from the TCGA project (TCGA-COAD and TCGA-READ).
Classification of colorectal cancer tissue samples
The random forest classifyCMS function in the R package CMSclassifier (https://github.com/Sage-Bionetworks/CMSclassifier) was used to assign CMSs to CRC samples based on the gene expression profile of each tissue dataset[2]. Microarray data for all datasets were obtained in the form of normalized expression values on the log2 scale. For RNAseq data, we used log2 transformed FPKM matrix after adding 1 to avoid undefined values.
Classification of colorectal cancer cell lines
The gene expression data of 57 large intestine cell lines were download from the Cancer Cell Line Encyclopedia (CCLE) database (https://portals.broadinstitute.org/ccle). The CMScaller algorithm was used to assign CMSs to CRC cell lines (https://github.com/Lothelab/CMScaller). CMS cell lines were also supplemented with the reported results[15].
Manually curation of gene interaction
Protein–protein interaction network (PPIN) data were downloaded from STRING v11.0 (https://string-db.org/)[16]. We used interactions with the organism of “Homo sapiens” and score more than 0.9 with high confidence. TF–gene interactions assayed by chromatin immunoprecipitation followed by sequencing (ChIP-seq) were also downloaded from the ChIPBase v2.0 database (http://rna.sysu.edu.cn/chipbase/)[17].
PPI and TF–gene interactions in the context of CMS3-CRC were filtered based on their expression correlation measured by the Pearson correlation analysis. We only retained the significantly co-expressed interactions with the threshold of adjusted P < 0.05 and |ρ|>0.3. “ρ” represents the Pearson correlation coefficient.
miRNA-mediated gene regulation
miRNA–gene interactions were all experimentally validated and collected from public databases, including TarBase[18], miRTarBase[19], and miRecords[20]. miRNA–gene regulations in the context of CMS3-CRC were filtered based on their expression correlation measured by the Pearson correlation analysis with the threshold of P < 0.05 and |ρ|>0.3.
We note that these target genes of miRNAs were derived from TCGA data. To avoid biased results, we took the target genes as a gene signature for each miRNA and performed an evaluation step (sigQC) to ensure that the signature used has suitable properties across different datasets. Here, we present radar plots summarizing the gene signature quality control metrics implemented by the R package sigQC[21].
Identification of functional miRNAs
We identified Fmirs by firstly identifying Fgenes. Based on the individual-level differentially expressed genes calculated by RankComp algorithm[22]. We identified genes with significantly upregulated or downregulated patterns and defined them as CMS-related genes. Then, we proposed a computational method to prioritize CMS-related genes and identify Fgenes in CMS3-CRC by integrating multi-omics features[13]. Briefly, for each CMS-related gene in CMS3-CRC, we calculated the scores of seven features that characterized the genomic and transcriptomic activity, regulation by co-expressed TFs, and central roles in PPIN of gene regulators. The aggregated score for each gene was further calculated by integrating the seven separate ranks in CMS3-CRC, using the robust rank aggregation method in R package RobustRankAggreg[23]. Then, we utilized the most effective feature and identified top-ranked 100 genes as the Fgenes.
For miRNA, we directly analyzed the 1,881 miRNAs measured by the TCGA-COAD and TCGA-READ project. We identified Fmirs using a backward derivation approach. Firstly, we identified significantly co-expressed miRNA and genes for CMS3-CRC among the collected miRNA-gene interactions. Then, miRNAs whose target genes were significantly intersected with Fgenes were defined as Fmirs. The hypergeometric test was used to calculate the significance as follows:
Where N is the number of target genes that overlap with all detected genes in the expression profiles, F is the number of target genes that overlap with Fgenes, Ti is the number of target genes of miRNA i, and Fi is the number of target genes for miRNA i that overlap with Fgenes.
Manually curation of CRC driver miRNAs
A panel of oncogenic and tumor-suppressive miRNAs was collected from the public databases miRCancer[24] and OncomiRDB[25]. Briefly, the microRNA–cancer association in miRCancer was extracted using the text mining algorithm against PubMed based on 75 rules. The entries of CRC-related miRNA regulations in OncomiRDB were manually curated from abstracts with direct experimental evidence. Finally, we retained miRNAs associated with CRC and collected 250 CRC-related miRNAs.
Subpathway analysis
The R package “Subpathway-GMir” was performed to identify the metabolism subpathways mediated by miRNAs[26]. We mapped the Fmirs and significantly co-expressed target genes as nodes into the reconstructed metabolism pathway graphs obtained from the package. Then, we identified miRNA-mediated FA metabolism subpathways based on the “lenient distance” similarity method. It evaluated the significance of candidate subpathways using the hypergeometric method. The miRNA–target interactions for CMS3-CRC were collected as described above.
Cell culture and Transfection
Human colon epithelial cell line NCM460 and human CRC cell lines Lovo, SW48, T84, SW948, LS174T, HT29, HCT116 and SW480 were purchased from Procell Life Sciences Co. Ltd., (Wuhan, Hubei, China). Cell lines NCM460 was cultured in RPMI-1640 (Gibco) medium, T84 in DMEM/F12 (Gibco) medium, Lovo in F12K (Gibco) medium, LS174T in MEM, HT29 and HCT116 in (Gibco) McCoy’s 5A, SW48, SW948 and SW480 in L15 (Gibco) medium.
All media for the human cell lines were supplemented with 10% FBS and 1% penicillin/streptomycin antibiotics. SW48, SW948 and SW480 cell lines were cultured in 100% air at 37 °C. Other cell lines were cultured under 95% air-5% CO2 at 37 °C. All cell lines were validated by STR DNA finger-printing. Experiments were carried out within 6 months after acquisition of the cell lines. In addition, mycoplasma contamination was ruled out using a PCR- based method.
HT29 and LS174T cells were transfected with miR-20a mimic and negative control (NC) or miR-20a inhibitor and inhibitor NC (inNC) respectively using Lipofectamine 2000 (Invitrogen, USA). After 48 h transfection, these cells were harvested for further experiments.
Western Blotting
The cells were washed with ice-cold PBS and lysed with RIPA buffer containing protease inhibitors. After mixed with SDS sample buffer, the proteins were heated to 99 °C for 10 min, separated on 10% SDS–polyacrylamide gels, transferred to PVDF mem-branes, and probed with antibodies against β-catenin (1:1000, CST, USA) and β-actin (1:3000, Proteintech, China) at 4 °C overnight and blots were visualized using gel imaging systems (Bio-Rad). ECL was used to visualize the immunoblot signals.
Real Time-quantitative PCR (RT-qPCR)
Total RNA of HT29 and LS174T cells were extracted using total RNA extraction kit (Thermo Fisher Scientific, Waltham, USA). 1 μg RNA was used for reverse transcription into complementary DNA (cDNA) using the PrimeScript RT-PCR Kit (Takara, Tokyo, Japan). Real-time polymerase chain reaction (RT-PCR) was conducted using the SYBR Premix Ex-Taq II kit (Takara, Tokyo, Japan) on the Quant Studio 3 Real-Time PCR System (Applied Biosystems, CA, USA). The fold changes were determined using the 2-△△CT method. All primers used were as follows: Quantitative-FASN, forward 5’-AAGGACCTGTCTAGGTTTGATGC-3’, reverse 5’-TGGCTTCATAGGTGACTTCCA-3’, Quantitative-ACACA, forward 5’-ATGTCTGGCTTGCACCTAGTA-3’ reverse 5’-CCCCAAAGCGAGTAACAAATTCT-3’, Quantitative-ACLY, forward 5’-TCGGCCAAGGCAATTTCAGAG-3’ reverse 5’-CGAGCATACTTGAACCGATTCT-3’, Quantitative-CPT1A, forward 5’-TCCAGTTGGCTTATCGTGGTG-3’ reverse 5’-TCCAGAGTCCGATTGATTTTTGC-3’, Quantitative-Bactin, forward 5’-CATGTACGTTGCTATCCAGGC-3’ reverse 5’-CTCCTTAATGTCACGCACGAT-3’, Quantitative-U6, forward 5’-CTCGCTTCGGCAGCACATATACT-3’ reverse 5’-ACGCTTCACGAATTTGCGTGTC-3’, Quantitative-miR-20a, forward 5’-TAAAGTGCTTATAGTGCAG-3’ reverse 5’-GTCGTATCCAGTGCAGGGTCCGAGGT-3’, Reverse Transcription-miR-20a, 5’-GTCGTATCCAGTGCAGGGTCCGAGGTATTCGCACTGGATACGACCTACCT-3’, Reverse Transcription-U6, 5’-AAAATATGGAACGCTTCACGAATTTG -3’.
Clone-formation assay, EdU assay, Transwell assay and Wound-healing assay
For the clone formation assay, HT29 and LS174T cell lines were transfected with mimic/NC (inhibitor/inNC) on day 1. Digest the cells the next day and seed 200 cells into a six-well plate, followed by the addition of 2 ml of 10% FBS DMEM for 14 days. After 14 days, the cells were fixed in 4% paraformaldehyde and stained with crystal violet for 30 min at room temperature. Colonies consisting of > 50 cells were counted. For EdU assay, cells were plated in 12-well plates and performed transfection. After 48h, 5-ethynyl-2′-deoxyuridine (EdU) assay (BeyoClick™ EdU-488 Kit, Beyotime, Shanghai, China) were performed to analyze the cell proliferation. The cells were incubated with 10μM EdU solution for 2 h and fixed with 4% paraformaldehyde. And the cells were washed with PBS for 3 times and 0.5% TritonX-100 once. Then, the cells were stained with 100 μL 1 × DAPI solution. After washed with 100μL PBS for 3 times, images were obtained from fluorescence microscope for further calculation of proliferation rates.
Transwell assays were conducted using an 8 μm pore Transwell filter (Corning, US). HT29 or LS174T cells were seeded in the upper chamber with serum-free medium. And medium with 10% FBS was added to the lower chamber. After incubation for 48 h, the Transwell filter was washed, fixed with 4% paraformaldehyde and in turn stained with 0.1% crystal violet staining solution.
For wound-healing assay, HT29 or LS174T cells were seeded on six-well plates. Then, a wound across the wall was introduced. After washing with PBS, the plate was incubated with serum-free medium for 24 h. Cell migration was observed and photographed by microscopy.
Immunofluorescence analysis
The cells were seeded on poly-lysine-coated coverslips for 24 h, fixed with 4% paraformaldehyde for 15 min, and permeabilized in 1% Triton X-100 for 10 min. The cells were then blocked with 5% BSA in PBS for 30 min at room temperature and incubated in antibodies overnight at 4°C. The coverslips were washed three times with PBS buffer, and the secondary antibody was applied for 1 h at 37°C. The coverslips were then mounted onto glass slides using an antifade mounting medium (Beyotime, China). Cells were then examined under a laser scanning confocal microscope (Zeiss LSM 510 Meta).