Correlation of Hepatocellular Carcinoma Stemness Indices and Clinical Characteristics and Prognosis

Recent studies have shown that cancer stem cell (CSC) is related to the occurrence and development of hepatocellular carcinoma(HCC), but the mechanism has not yet been elucidated. Therefore, it is necessary to explore the relationship between HCC stem cells and the survival time of HCC patients and its mechanism to guide the treatment. We downloaded the RNA-Seq data and clinical information from The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC). The mRNA gene expression-based stemness index (mRNAsi) and DNA methylation-based stemness index (mDNAsi) were calculated through one-class logistic regression (OCLR). By applying the univariable Cox regression analysis, we found that mRNAsi and mRNAsi were signiﬁcantly correlated with overall survival (OS). Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analysis were used to seek for different genes, and searched for hub genes through protein-protein interaction (PPI) analysis. The spearman’s rank correlation coefficient test was applied to analyze the relationship between these hub genes and the stemness indices. methylation data of each sample gene with the model's gene weight vector. Perform a linear conversion on the Spearman correlation coefficient obtained from the sample. The conversion methods were utilized to identify the minimum Spearman correlation coefficient from the Spearman correlation coefficient of each sample, and then Molecular Function; DEGs: differentially expressed genes.


Results
The mRNA gene expression-based stemness index (mRNAsi) and DNA methylation-based stemness index (mDNAsi) levels of HCC samples from TCGA and ICGC were significantly negatively related to clinical characteristics and OS. Analysis of differentially expressed genes and PPI revealed that SNAP25, KPT19, GABBR1 and EPCAM were significantly negative correlation with mDNAsi.

Conclusion
Our new model based on stemness indices-related genes was available for predicting prognosis. The SNAP25, KPT19, GABBR1 and EPCAM were potentially therapeutic targets for HCC patients.

Background
Hepatocellular carcinoma (HCC) is the sixth common cancer and the third most frequent cause of cancer death [1]. Despite surgical resection provides the most potential cure for the early stage HCC, 3 the 5-year survival rate of HCC is less than 20%, mainly due to limited effective interventions for advanced HCC [2,3]. However, own to the absence of specific clinical symptoms in early stages, most patients with HCC are diagnosed in an advanced stage with poor prognosis [3,4]. But the specific pathogenesis of HCC is still not clear [5]. Thus, it is essential to further explore the development and progression mechanism of HCC to achieve early diagnosis and target treatment.
The emergence of the concept of cancer stem cells (CSCs) allows people to know that cancer cells are not exactly the same, but there are many types of cells [6,7]. Therefore, malignant solid tumors consist of a highly heterogeneous cancer stem cell population. And CSCs illustrate that a small number of cancer cells can self-replicate as stem cells and maintain tumor characteristics, just as normal stem cells renew and maintain our organs and tissues [8,9]. Growing evidence showed that CSCs are original cells of tumor and play crucial roles in tumorigenic, metastasis, and recurrence, and resistance to chemotherapy and radiation therapy [10]. However, the potential molecular mechanism and signaling pathway of CSCs in cancer, including HCC, are not clear.
Increasing studies have shown that CSCs are therapeutic targets of cancers [11]. Yang et al. found that HGF/c-Met promoted the enrichment of renal carcinoma cancer stem cells, which may be one of the possible mechanisms of metastasis and recurrence, and may be a new therapeutic target in renal cell carcinoma [12]. Increasing evidence has suggested that clarifying the specific source of CSCs can provide a novel method for individualized treatment [13]. However, little is known about the expression pattern and the specific mechanism of CSCs in HCC.
In this study, we built a model based on the sequencing data of stem cells in the Progenitor Cell Biology Consortium (PCBC) database to predict the stemness indices in tumor samples. Besides, we utilized TCGA cohort to make predictions, and validated our findings by an independent dataset. Furthermore, we analyzed the relationship between the stemness indices, clinical characteristics and OS of patients, and searched different genes between high and low stem cell factor levels samples.
And PPI analysis was performed to find the hub genes, which might provide potential target for stem cell treatment of HCC.

Analyzing And Screening Of Differentially Expressed Genes
We used the R package DESeq2 (v1.24.0) to perform a differential analysis on the RNA-seq data of the mRNAsi-high/ mDNAsi-high and mRNAsi-low/ mRNAsi-low samples, and calculated the differentially expressed genes (DEGs) between the mRNAsi-high/ mDNAsi-high sample and the mRNAsi-low/ mRNAsi-low samples. Differential genes were filtered according to the threshold false discovery rate (FDR) < 0.05 and |log2FC| > 1. . Through these three functional categories, gene functions were defined and described in many aspects. To examine the function of stemness indices differential genes, we performed KEGG pathway analysis and GO function enrichment analysis on the differential genes of mRNAsi and mDNAsi through the R package cluster Profiler (v3.14.0). The significance level was defined as FDR < 0.05.

PPI Network Analysis
The STRING database was used to analyze the PPI network of differential genes, and the degree algorithm of the cytoHubba plug-in in Cityscape software was applied to identify the hub genes. PPI network of organism serves as the backbone of its signaling pathways that mediate cellular responses to the environment and heredity [20]. PPI is commonly used to elucidate the molecular basis of disease and provide detailed knowledge about proteins and their interactions, which can be used to suggest and improve the diagnosis, prevention, and treatment of a disease [21]. PPI network analysis provides information about genes and proteins shared in disease and describes interactions [22].

Statistics 6
A t test was used for continuous variables. The statistical analyses were performed using IBM SPSS 24.0. All statistical tests were two-sided, and the significance level was defined as P < 0.05.
Spearman's rank correlation coefficients were tested to analyze the correlation between two variables. Kaplan-Meier survival curves were generated after stratification of the data and compared with a log-rank test.
Results mRNA expression-and DNA methylation-based stemness indices in HCC Firstly, to explore the relationship between the index distributions of stem cells and clinical characteristics. We ranked the HCC samples according to their mRNA gene expression-based stemness index (mRNAsi) and DNA methylation-based stemness index (mDNAsi) values and tested whether any clinical feature was correlated with low or high stemness index. The mRNAsi and mDNAsi exponential distribution of TCGA HCC samples were shown in Fig. 1a and Fig. 1b, respectively. The mRNAsi exponential distribution of ICGC HCC samples was shown in Fig. 1c.

Relationship Between Stem Cell Index And Clinical Characteristics
To further explore the relationship between mRNAsi and mDNAsi indexes and clinical features of tumor samples, such as: gender, age, pathological stage, whether HBV or HCV infection, immune cell score,TNM stage and iCluster typing, we performed a linear regression test on the above characteristics. As shown in Table 1, the mRNAsi of TCGA samples were significantly related to pathological stage, HBV infection, immune cell score, and iCluster typing (P < 0.05). And the mDNAsi was remarkably associated with immune cell score and iCluster typing (P < 0.05). Furthermore, we analyzed the data from ICGC cohort, which indicated that the mRNAsi was significantly associated with TNM stage ( Table 2, P < 0.05).  HBV infection (Fig. 4c, d, P = 0.15, P = 0.88). The overall survival rate of HCV infection subgroup was lower and the higher TNM stage had lower OS (Fig. 4e, f, P = 0.033, P < 0.05). And the OS of mRNAsilow group was higher than mRNAsi-high group (Fig. 4g, P < 0.001). All these hinted, the stemness indices negatively related to OS and affected the prognosis of patients, which may provide a new objective criterion for evaluating the prognosis of HCC patients.
Screening The Stem Cell-related Genes In HCC Based on the above results, we further analyzed the differentially expressed genes (DEGs). In TCGA cohort, a total of 136 differential genes were filtered according to the threshold, among which the mRNAsi-high group was down-regulated by 110 genes and up-regulated by 26 genes compared to the mRNAsi-low group. The DEGs were mainly down-regulated differentially expression (Fig. 5a). A total of 569 differential genes were selected after filtering, of which mDNAsi-high group were down-regulated by 546 genes and up-regulated by 23 genes compared to mDNAsi-low group. The DEGs were mainly down-regulated differentially expression (Fig. 5b). Among the differential genes in the two sets, mRNAsi and mDNAsi were both down-regulated genes with intersect into four genes, respectively APOBEC3C, C1orf116, GABBR2 and TFCP2L1. In ICGC cohort, the volcano map of DEGs showed there were 190 differential genes. Among them, mRNAsi-high group have 16 genes down-regulated and 174 genes up-regulated compared to mRNAsi-low group, which were major up-regulated differential expression (Fig. 5c). The intersection of the up-regulated genes with the TCGA samples based on the mDNAsi grouping was ALDH1A3. These up-or down-regulated differential genes reflect different gene expressions in HCC samples with high and low stem cell indices, which provides new ideas for the further exploration of HCC pathological mechanisms.

Functional Analysis Of Differentially Expressed Genes
To further explore the function of differentially expressed genes and their roles in tumor genesis, metastasis, recurrence, and drug resistance, a functional analysis of them were performed. GO function annotations of 136 differential genes of TCGA tumor samples grouped by mRNAsi showed without significant enrichment (FDR < 0.05). GO function annotations of 569 differential genes of TCGA tumor samples identified by mDNAsi indicating that the pathway "metal ion trans-membrane transporter activity" and "monovalent inorganic action trans-membrane transporter activity" were significantly enriched (Fig. 5d, FDR < 0.05). KEGG pathway enrichment analysis of mRNAsi differential genes found no significant enriched pathway (FDR < 0.05). For the KEGG pathway enrichment analysis, our results demonstrated the pathway "Camp signaling pathway" was significantly enriched (Fig. 5e, FDR < 0.05). For the GO function annotations of 569 differential genes obtained by ICGC tumor samples based on mRNAsi grouping, the pathway "extracellular matrix structural constituent" and "glycosaminoglycan binding" were significantly enriched (Fig. 5f, FDR < 0.05). The KEGG pathway enrichment analysis of mRNAsi differential genes found three significantly enriched pathways. "PI3K-Akt signaling pathway" was significantly enriched (Fig. 5g, FDR < 0.05). Taken together, we found the DEGs of high and low stemness indices always enriched on pathway correlated with tumor formation, metastasis and recurrence, which help to understand the tumor genesis mechanism and guide the targeted therapy of tumors.

PPI analysis and hub genes analysis of differentially expressed genes
To further find out hub genes related to stemness indices, PPI analysis of differentially expressed genes were performed. 10 genes (SNAP25, EPCAM, CALB2, SOX2, KRT19, GABBR1, MUC1, AFP, GRIN1, and GAD1) were selected as hub genes (Fig. 6a). The correlation between these hub genes and the mDNAsi index were performed. The results suggested that SNAP25, KPT19, GABBR1 and EPCAM showed a significant negative correlation with mDNAsi ( Fig. 6b-

Discussion
In this study, we found that the stemness indices were related to pathological stage, HBV infection,  Furthermore, we investigated the differentially expressed genes of stemness indices, and analyzed the function of differential genes. The pathways "metal ion trans-membrane transporter activity" "monovalent inorganic action trans-membrane transporter activity" "cAMP signaling pathway" "extracellular matrix structural constituent" "glycosaminoglycan binding" and "PI3K/Akt signaling

Acknowledgements
We earnestly appreciate all the participators in our work. We thank the patients and investigators who participated in TCGA, PCBC and ICGC for providing data.

Authors' contributions
JL and ZJY: study concepts, study design, literature research, manuscript preparation and editing; CTZ: definition of intellectual content, literature research, manuscript preparation and editing; XY: data acquisition, literature research and manuscript writing. All authors read and approved the final manuscript.

Funding
This study was supported by funds from the National Natural Science Foundation of China (81702927).

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Consent for publication
Not applicable.

Ethics approval and consent to participate
Not applicable. exosomes in the development and treatment of hepatocellular carcinoma. 2020, 19(1).   probability of the mRNAsi-low group is higher than the mRNAsi-high group in iCluster2. k

Jiao J, González
The survival probability of the mRNAsi-low group and the mRNAsi-high group in iCluster3 has no significantly difference.  TCGA. e KEGG annotation map of mDNAsi differential genes in TCGA. f GO annotation map of the 569 mRNAsi differential genes in ICGC. g KEGG annotation map of mRNAsi differential genes in ICGC.

Figure 6
PPI analysis and correlation between Hub genes and mDNAsi. a Protein interaction between 10 hub genes (the darker the color, the higher the score, the more important it is). b EPCAM is significantly negatively correlated with mDNAsi. c GABBR1 is significantly negatively correlated with mDNAsi. d KRT19 is significantly negatively correlated with mDNAsi. e SNAP25 is significantly negatively correlated with mDNAsi. f AFP has no significantly relation 24 with mDNAsi. g CALB2 has no significantly relation with mDNAsi. h GAD1 has no significantly relation with mDNAsi. i GRIN1 has no significantly relation with mDNAsi. j MUC1 has no significantly relation with mDNAsi. k SOX2 has no significantly relation with mDNAsi.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download. Figure S1.tif Figure S2.tif