Comprehensive analysis of miRNA sequencing profiles identifies novel deregulated and prognostic biomarkers in hepatocellular carcinoma

Background: Liver cancer is the fourth most common cause of cancer-related death and rank sixth in terms of incident cases. We aim to identify a set of miRNAs and a miRNA-based signature related to tumorigenesis and prognosis in patients with hepatocellular carcinoma (HCC). Methods: We analyzed the miRNA sequencing profiles of 373 HCC patients downloaded from The Cancer Genome Atlas LIHC program. The isoform quantification profiles were transformed into 5p and 3p mature miRNA names. Differentially expressed (DE) miRNAs between tumor and adjacent normal tissues were identified by Wald test based on the negative binomial distribution. Prognostic miRNAs associated with overall survival were confirmed by multivariate Cox proportional hazards models. The miRNA-based signatures were obtained from the linear predictors of cox regression, and the prognostic performance was compared by Harrel’s C-index and revealed by the restricted mean survival (RMS) curve. Results: The selected twelve DE miRNAs showed a good performance to classify tumor tissues from normal tissues. Meanwhile, a miRNA-based prognostic signature of eight mature miRNAs was constructed, which significantly stratified patients into high- vs low-risk groups in terms of overall survival (hazard ratio, 4.11; 95% CI, 2.71-6.24; P<0.001). When integrated with clinical information, the composite miRNA-clinical signature showed improved prognostic accuracy relative to the eight-miRNA signature alone. As we set the follow-up time at 5 years, the estimated RMST difference between low- and high-risk group stratified by miRNA index was 1.39 (95% CI: 0.95-1.83) months, which is lesser than the difference between miRNA-clinical risk groups (1.63, 95%CI: 1.20-2.06). Functional enrichment analysis indicated that the target mRNAs of selected miRNAs were mainly enriched in cancer-related pathways and vital cell biological processes. Conclusions: The proposed DE miRNAs and miRNA-clinical signature are promising biomarkers for discrimination and predicting overall survival respectively in HCC patients. These biomarkers may have significant relevance for development of new drug research and targeting therapies for HCC patients.


Introduction
Liver cancer is predicted to be the sixth most commonly diagnosed cancer and the fourth leading cause of cancer death worldwide in 2018, with about 841,000 new cases and 782,000 deaths annually [1]. With a 5-year survival of 18%, liver cancer is the second most lethal tumor, after pancreatic cancer [2]. Despite improvements in therapeutic strategies of liver cancer over recent years, such as liver resection, transplantation, ablation, embolization and chemotherapy [3], the long-time survival of patients with HCC is not improved due to tumor recurrence and metastasis after liver resection [4].
Exploring the molecular mechanisms related to tumorigenesis and progression of HCC survival is helpful for developing new drug research and targeting therapies in early stage to prolong patients' survival time.
MicroRNAs (miRNAs) are produced through two steps via primary miRNAs (pri-miRNAs) and precursor miRNAs (pre-miRNAs) with cleavage by Drosha and Dicer. [5] The pre-miRNA, approximately 70 nucleotides in length, is composed of a 5p arm, a 3p arm and a terminal loop. Mature sequences of pre-miRNAs are named as miR-#-5p and miR-#-3p, nearly 22 nucleotides in length [6,7]. They play essential roles across various biological processes in post-transcriptional regulation, which selectively bind to 3' untranslated region of message RNAs (mRNAs) thus inducing to mRNA degradation or protein translation repression [7]. Several length and/or sequence variants of the same miRNA, which were called isoforms of miRNA (isomiR), are observed by using high-throughput sequencing technology [8]. Various miRNAs have elucidated to be related with kinds of cancers, such as breast cancer [9], non-small cell lung cancer [10], ovarian cancer [11] and liver cancer [12,13].
Accumulating evidence showed that miRNAs played critical roles in pathological processes of HCC, such as cell proliferation, metastasis, migration, DNA methylation and so on [14].
The available public database, such as The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO), are helpful for data researchers to conduct an in-depth analysis of identifying potential biomarkers across various cancer types [15,16]. Recent study showed that a six-miRNA signature as an independent factor for predicting recurrence and outcome of patients with HCC was identified based on precursor miRNA profiles [17]. Furthermore, a miRNA combination was established as a promising tool for HCC diagnosis [18]. However, an isomiR-based signature related to tumorigenesis and patients' prognosis of HCC has not been published.
In this study, we aim to identify a class of mature miRNAs that might serve as clinical biomarkers of HCC discrimination and survival outcome from a perspective of isomiR profiling. Firstly, we identified the differentially expressed (DE) miRNAs between tumor and normal tissues with the miRNA sequencing data of HCC from TCGA. Then we constructed a miRNA-based signature that effectively distinguish patients into a high-or low-risk. This signature might be helpful in guiding the development of new drug research and more effective individualized therapies, and improving survival outcome of patients with HCC.

Data processing and normalization
Forty hundred and twenty-five miRNA expression profiles and their clinical information on 373 cases with liver cancer were downloaded from the Cancer Genome Atlas (TCGA) data portal (https://tcgadata.nci.nih.gov/). The miRNA sequencing (miRNAseq) data were generated by Canada's Michael Smith Genome Sciences Centre (GSC) at the British Columbia (BC) Cancer Agency between 2010 and 2015. After checking the annotations, 13 profiles of 10 patients were deleted as they are not hepatocellular cases. One profile of non-matched normal tissue and three profiles of recurrent tumor tissues were also excluded. Raw read counts of mature miRNAs annotated MIMAT accession IDs were derived from isoforms quantification profiles with self-defined Python codes. Then we translated 2166 MIMATs into 5p and 3p mature strand names and removed 14 dead miRNAs using miRBase version 22.1 (http://www.mirbase.org/). We generated a log2(RPM + 1) normalized data, first normalizing read counts of each mature miRNAs to total reads of per million mapped reads, then scaling to log2(RPM + 1).

Differential Expression Analysis
For the comparison of primary tumors and adjacent normal tissues, the differentially expressed (DE) miRNAs were identified with DESeq2 (release 3.9) package, which was based on the assumption that the read counts of deep sequencing data follow the negative binomial distribution [19]. Also, paired ttest was used to the log2(RPM + 1) normalized data and p-values were adjusted with false discovery rate (FDR) [20]. The significantly DE miRNAs were defined as |mean difference between paired tissues|≥2 and FDR adjusted p-value < 0.05. Considering the abundance of miRNAs varied greatly in tumor tissues, we select those DE miRNAs with a median tumor expression (362 samples) greater than or equal to 20 RPM for drawing a heatmap. DE miRNAs (rows) and paired tissues (columns) were clustered using unsupervised hierarchical clustering with Euclidean distance as distance measure and Ward's hierarchical clustering as clustering method [21].
Construction of a prognostic miRNA signature and a miRNA-clinical signature Prognostic miRNAs were selected using Cox univariate analyses to assess the association between each mature miRNAs and patients' overall survival. To ensure the robustness of results for great varies in tumor expression, prognostic miRNAs with a p-value less than 0.05 and a median tumor expression greater than or equal to 20 RPM were candidates to build the miRNA risk index. To future reduce the prognostic miRNAs, we applied the Cox proportional hazards regression model combined with the least absolute shrinkage and selection operator (LASSO, glmnet package, version 3.0-2) [22].
The penalty parameter was estimated by 10-fold cross-validation set at the minimum partial likelihood deviance [23]. Final model was constructed with only those statistically significant miRNAs using multivariate Cox stepwise regression method.
The miRNA risk index was achieved from the linear predictors of cox regression with multiple prognostic miRNAs. To separate patients into low-and high-risk groups, the optimal cutoff was determined by a time-dependent receiver operating characteristic (ROC) curve (survivalROC package, version 1.0.3) at 5 years [24]. We used the nearest neighbor estimation method to estimate the survival ROC curve [25]. Here we chose zero instead of the optimal value as a cutoff value because the optimal value (0.13) was close to zero and it would infinitely approach to zero as the sample size increased.
The prognostic miRNA signature was assessed in patients with different groups of clinical factors, including gender, age group, race, treatment type and tumor stage. Then we integrated miRNA risk index with the clinical features to a composite miRNA-clinical signature in a multivariate cox model.
The cutoff value for miRNA-clinical signature was set as zero other than the optimal value (-0.034) to separate patients into low-and high-risk. The prognostic performance of the miRNA-clinical signature was compared with that of the miRNA signature in terms of C-index and revealed by the restricted mean survival (RMS) curve. RMS represents the life expectancy at 10 years for patients with different risk. A higher RMS time ratio corresponds to a larger prognostic difference.

Functional Annotation And Enrichment Analysis
Potential gene target of DE miRNAs and prognostic miRNAs were obtained from two bioinformatics algorithm prediction database (targetScan v7.2 [26]and miRDB v6.0 [27]) and one experimentally validated miRNA-target interactions database (miRTarBase v7.0 [28]). Gene2ensembl database was used to perform gene id conversion. At first, the potential targets of miRNAs were separately obtained from three databases. Those targets predicted by all the three databases were kept, which ensured each mRNA target with experimental evidence. The biological processes of gene ontology and KEGG pathway enrichment analyses were performed with DAVID (Database for Annotation, Visualization and Integrated Discovery) Bioinformatics Tool (https://david.ncifcrf.gov/, version 6.8) [29]. Both biological processes of gene ontology and KEGG pathway with P < 0.10 were considered as significantly enriched function annotation.

Differentially expressed miRNAs between normal and tumor tissues
To systematically identify mature miRNAs related to tumorigenesis and prognosis, approximately 2.0 billion reads for 408 samples from 362 HCC patients were sequenced using miRNA sequencing 7 (miRNAseq). The flowchart of this study was shown in Fig. 1. After translating MIMAT accession IDs into 5p and 3p mature strand names using miRBase v22.1, 2152 mature miRNAs were available. For the 46 paired tumor and adjacent normal tissues, 12 miRNAs were differentially expressed (DE), including four upregulated and eight downregulated. The results of DESeq2 for raw read counts and paired t-test for log2(RPM + 1) transformed data were very similar to each other. The expression abundance of the selected 12 miRNAs (miR-199a-5p, miR-199a-3p, miR-139-5p, miR-10b-5p, miR-182-5p, miR-183-5p, miR-130a-3p, miR-424-5p, miR-451a, miR-199b-3p, miR-1269a, miR-675-3p) in normal and tumor tissues were listed in Table 1 and Figure S1.  *Those differentially expressed miRNAs with a median tumor expression (n = 362) greater than or equal to 20 RPM were listed above. Normalized expression data means log2(RPM + 1); RPM was a data normalization as counts normalized to reads per million mapped reads. SD stands for standard deviation; FC stands for fold change between tumor and adjacent normal tissues, and here log 2 (FC) is calculated as the mean difference of normalized data between paired tissues. P-values were derived from Wald test based on the negative binomial distribution (Bioconductor package DESeq2, release 3.9). AUC (95%CI) were obtained from modeling univariate logistic regression, showing the discriminate accuracy of each miRNA.

Prognostic miRNAs Associated With Overall Survival
A total of 185 mature miRNAs were selected as candidate miRNAs to assess the association between each miRNA and patients' overall survival, removing those miRNAs' RPM less than 20 in tumor expression. Also, the observations were deleted if their overall survival (OS) time were missing (n = 7) or less than 30 days (n = 20). Therefore, 335 patients with 118 meeting death events were included in the survival analysis. With the log2 (RPM + 1) transformed data, univariate cox regression identified 65 miRNAs with statistical significance. As shown in Table 2 and Figure S2, 16 miRNAs were selected using LASSO method with the seed number as 10101 and lambda as the minimum partial likelihood deviance (0.04221). The miRNA risk index was calculated from multivariate Cox regression model with 8 miRNAs (let-7a-3p, miR-148a-3p, miR-151a-3p, miR-210-3p, miR-215-5p, miR-361-5p, miR-582-3p, miR-9-5p) at a significant level. Zero was used as a cutoff value to separate the patients into highand low-risk groups ( Figure S3). The hazard ratio of the eight-miRNA signature between high-risk and low-risk group was 4.11(95% CI: 2.71-6.24) with statistically significance (P < 0.001). The relation between eight-miRNA risk group and other clinical factors was assessed using univariate cox regression and the results were shown in Fig. 3. In early and mid-term (stage I and stage II), the Kaplan-Meier curves were significantly separated from each other; otherwise, the curves for stage III were somewhat crossed although the pvalue < 0.05 ( Figure S4).

Integrated prognostic miRNA-clinical risk index by combining the miRNA risk index with clinical factors
In multivariate analysis, clinical features and eight-miRNA risk index were independent factors. To further improve accuracy, we derived a miRNA-clinical risk index by combining important clinical features (gender, age group, race and tumor stage) and eight-miRNA risk index (Table S1). If we set the follow-up time at 5 years, the estimated RMST difference between low-and high-risk group stratified by eight-miRNA risk index was 1.39 (95% CI: 0.95-1.83) months, which is lesser than the difference between miRNA-clinical risk groups (1.63, 95%CI: 1.20-2.06) (Table S2). Similar results were observed in the Kaplan-Meier curves for the two risk groups derived from the prognostic miRNA risk index and miRNA-clinical risk index (Fig. 4). We also plotted the restricted mean survival curve for the two signatures, the estimation of overall survival was significantly improved (mean C-index, 0.74 vs 0.77; P = 0.0067) (Fig. 5). Additionally, the accuracy of prediction of five-year survival was slightly enriched GO biological processes terms and KEGG pathways for HCC prognosis-related miRNAs targeted genes are shown in Fig. 6C-D. These results demonstrated that the eight prognostic miRNAs might be associated with cell proliferation, apoptotic process and TGF-β signaling pathway [28]. Our results support use of this eight-miRNA signature as an independent factor for predicting the overall survival of patients with HCC.

Discussion
Patients with HCC are at substantially high risk for recurrence and death, even after surgical resection. A prognostic signature can help oncologists to proceed systematic therapy on patients at a high risk as early as possible. In this study, based on the isomiR profiles, we identified twelve differentially expressed miRNAs related to tumorigenesis through 46 paired tissues, then a miRNAclinical signature, which was derived from multivariate Cox regression by combining eight-miRNA risk index with several clinical factors, significantly stratified the high-risk to the low-risk patients with HCC. These results prompted that the miRNA screened in this study have potential value for the discrimination and prognosis of hepatocellular carcinoma, and might be useful for the development of novel drug research and individualized treatment of HCC.
It is widely accepted that a series of molecular pathways and biological processes are involved in the pathogenesis and progression of HCC. The miR-199 family members (miR-199a-3p, miR-199a-5p and miR-199b-3p in our study) have been reported to under-expression in HCC cells, promoting HCC cell proliferation, increasing invasion and inhibiting apoptosis with different signaling pathway [32,33].
Besides, the mechanism of one DE miRNA (miR-675-3p) and three prognostic miRNAs (let-7a-3p, miR-148a-3p, and miR-151a-3p) in our results is rarely reported. The DE analysis demonstrated that miR-675-3p expression is inhibited significantly in tumor tissues, suggesting as a suppressor in tumorigenesis. With the Cox model used, we found that let-7a-3p and miR-151a-3p may be the risk factors for patients' overall survival (HR > 1, P < 0.05) while miR-148a-3p may be a protective factor (HR < 1, P < 0.05). Unlike to Bai et al. [17], our data mining were based on the isoform quantification profiles other than the pre-miRNA expression data. The analysis strategy of Bai et al. is first identifying the prognostic miRNA, then using two-sided Student's t-test to determine significance differences between tumor and nontumor tissues. In fact, the aim of differential expression analysis is to select those miRNAs that might be helpful for cancer diagnosis, and the miRNA-based signature is related to cancer prognosis, which are two separate things. In our study, there might be some  Figure 1 Flowchart of study design.

Figure 2
Heatmap plot for paired tissues with differentially expressed miRNAs. Tumor tissues (red color) and adjacent normal tissues (blue color) form two distinct clusters. The distance measure is Euclidean distance and clustering method is Ward's hierarchical clustering. The expression data were normalized to log2(RPM+1).

Figure 3
Associations between the prognostic eight-miRNA signature and clinical factors for overall survival of HCC patients.

Figure 4
Kaplan-Meier curves of HCC patients' overall survival for the prognostic eight-miRNA signature or miRNA-clinical signature. The RMS curves for the prognostic eight-miRNA signature and miRNA-clinical signature.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download. Supplemental.docx