The Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo) database is a public functional genomics data set to help users download statistics and import gene expression. In our research, Gene expression profile data (GSE56350, GSE70574 and GSE95109) were obtained from GEO.
GSE56350, including 8 primary CRC tissues derived from stage II–III CRC patients with (n = 20) or without (n = 15) lymph node metastasis. MicroRNA expression profiling analysis of these samples was performed on Agilent-021827 Human miRNA Microarray [miRNA_107_Sep09_2_105]. Dataset GSE70574, 16 T1-stage CRCs (7 lymph node-positive and 9 lymph node- negative tumors), were processed by Agilent-031181 Unrestricted_Human_miRNA_V16.0_Microarray 030840 (Feature Number version). GSE95109, 13 patients in lymph node negative subgroup and 9 patients in lymph node positive subgroup. The mRNA profiles of all 22 patients were analyzed by Agilent microarray technology to explore the different expression between lymph node negative subgroup and lymph node positive subgroup.
Differently expressed miRNAs and mRNAs analysis
R software was used to compare two groups of tissues. Besides, |log2FC|≥1 and P<0.05 were used as a cut-off criterion and a significant statistical difference would be considered if the statistics met our standards .
Functional and pathway enrichment analysis
FunRich (http://www.funrich.org) is a publicly accessible software with the ability to identify the enriched transcription factors and perform Gene Ontology (GO) functional analysis of upload differentially expressed miRNAs (DE-miRNAs). In this study, transcription factors enrichment analysis was carried out and 10 transcription factors that may regulate DE-miRNAs were identified. Besides, we also used this software to obtain target genes of DE-miRNAs. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was performed by using Cytoscape software. ClueGO is a Cytoscape plug-in that visualizes the non-redundant biological terms for large clusters of genes in a functionally grouped network. The network graph of ClueGO is created based on kappa statistics. Each node in the graph represents a term. The connection between nodes reflects the correlation between terms, while the color of nodes reflects the enrichment and classification of the node (which functional group it belongs to).
PPI network and clustered subnetworks construction
The exploration of protein interactions helps to reveal the underlying pathological mechanism of CRC. In this study, we used the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (https://string-db.org/) to construct a protein–protein interaction network.
Prediction of miRNA target genes and miRNA-mRNA regulatory network
A previous study suggested that the function of miRNAs lies in the regulation of target genes. Therefore, the prediction of target genes is particularly important. It can indirectly understand the biological function and enrichment pathway of miRNAs. The miRNA enrichment function in FunRich was used to achieve miRNA targeting prediction. Besides, GSE95109 was downloaded from GEO database. By combining the results from FunRich and difference analysis of GSE95109, screened genes were identified and the miRNA-mRNA regulatory network was built by using Cytoscape software.
Construction of a prognostic signature model
To find out the influence of differential expression of miRNAs on the prognosis of CRC patients, the univariate and multivariate Cox proportional risk regression analysis was carried out for different expression of miRNAs, the miRNAs related to CRC prognosis were selected, and the risk linear model based on The Cancer Genome Atlas (TCGA) data set was established. 569 CRC patients downloaded from TCGA were randomly divided into training groups and test groups. We built a predictive feature model in the training group and tested its validity in the test group. First, the training group was analyzed by univariate Cox regression analysis to select the prognoses related DEGs. In the further functional analysis and development of potential risk characteristics, the least absolute shrinkage and selection operator (LASSO) method was used to regress the high-dimensional prediction factors [16, 17]. The "glmnet" packet in R was used to calculate the coefficient and partial likelihood deviation . Through multivariate Cox regression analysis, we further studied these miRNAs to identify significant targets and build a risk linear model. To better understand the relationship between the selected miRNAs and the prognosis of CRC patients, we constructed a risk prediction model. By using “survival ROC” package, we calculated AUC (Area Under Curve) of 3 years and 5 years dependent ROC (Receiver Operating Characteristic) curve to assess the predictive power of identified miRNAs.
Verification of miRNA expression and mRNA expression
Twenty lymph node non-metastasis CRC tissues and twenty lymph node metastatic CRC tissues were obtained from patients with CRC. We obtained the approval from the ethics committee of Second Affiliated Hospital of Soochow University. Total RNA was extracted from the tissues using Trizol reagent (Mesgen Biotech Co., Shanghai, China). The reverse transcription was performed according to the manufcturer’s protocol. Real-time PCR was performed on the QuantStudio 5 Real-Time PCR System (Thermo Fisher) using qPCR SYBR Green master mix (Vazyme Biotech Co., Nanjing, China). The primers are showed in Table S1. The expression levels of miRNAs were normalized to U6 (internal standard control) and calculated using the 2−ΔΔCt method. All experiments were performed in triplicate.
Hematoxylin and eosin (H&E) staining and analysis.
Fresh colorectal carcinoma and lymph node tissues were fixed in 10% formalin and embedded in paraffin before sectioning and staining. Tissue sections 4 μm thick were deparaffinized in xylene and rehydrated in ethanol series. H&E staining was performed according to standard protocols.
Lymph node negative test and lymph node positive test were performed to evaluate the statistical significance between the two groups. All data analysis was performed using R software (version 3.6.6) and GraphPadPrism6 package (GraphPad Software, Inc., La Jolla, California, USA). Correlation between miRNA and its possible targeted mRNA among individual samples was assessed. A P value of <0.05 was viewed as statistically significant. The correlation analysis between the RT-qPCR results and RNA‑seq results was calculated in Excel 2013 (Microsoft Corporation, Redmond, WA, USA) with the function of CORR.