Co-Expression Network Revealed the Potential Regulatory Mechanism of Lncrnas in Relapse Hepatocellular Carcinoma.

The mechanistic basis for the relapse of hepatocellular carcinoma (HCC) remains poorly understood. Recent research has highlighted the important roles of long non-coding RNAs (lncRNAs) in HCC. However, there are only a few studies on lncRNAs associated with the relapse of HCC. Methods: We analyzed lncRNA and mRNA proles in the GSE101432 dataset associated with HCC relapse. The differentially expressed lncRNAs and mRNAs were used to construct an lncRNA-mRNA co-expression network. Weighted gene co-expression network analysis followed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses were conducted on the database. Furthermore, correlation and survival analyses were performed using The Cancer Genome Atlas database, and the clinical samples were veried by qRT-PCR. signatures for predicting the prognosis of patients with HCC relapse. Our study indicates that the and LIN00668 may serve as candidate prognostic biomarkers and targets for new therapies in human gastric


Introduction
Liver cancer is the sixth most prevalent cancer in world. [1] Furthermore, the incidence rate of liver cancer showed the greatest increase from 2007 to 2016 at 2-3% per year, although this rate has decreased as compared to that in previous years. For all stages combined, the 5-year relative survival rate is the lowest (18%) for liver cancers. [2] Although patients receive curative treatment for early hepatocellular carcinoma (HCC), up to 70% of them may experience relapse in the liver after 5 years. [3] Therefore, we aimed to explore the relapse mechanism of liver tumors and search for prognostic-associated long non-coding RNAs (lncRNAs), which are of great signi cance as they provide useful information for the diagnosis and prognosis of liver cancer.
lncRNAs are a group of non-coding RNAs that are more than 200 nucleotides in length and have no ability to encode proteins. An expanding body of evidence shows that lncRNAs modulate gene expression at multiple levels through many complex mechanisms. In addition to their well-established role as regulators of transcription, lncRNAs are also effective regulators of pre-mRNA splicing, mRNA decay, and translation. [4] Portions of lncRNAs are speci cally expressed in different tissues and cancers. [5] In addition, lncRNAs are involved in the pathological process of tumors and act via mechanisms, such as cis-regulation at enhancers, chromatin reprogramming, and post-transcriptional regulation of mRNA processing. [6] The emergence of wide-ranging cancer projects, such as The Cancer Genome Atlas (TCGA) research network project and Gene Expression Omnibus (GEO) has provided an excellent opportunity to characterize lncRNAs in various cancers. Furthermore, bioinformatics analysis has been used to establish lncRNA features with prognostic value: for example, the identi cation of three lncRNA prognostic markers of ovarian cancer according to genome-wide copy number variation. [7] Additionally, a few survival-related lncRNA signatures have been discovered in malignant tumors of the digestive system, such as gastric cancer [8] and pancreatic cancer. [9] While a few reports have focused on the bioinformatics analysis of liver cancer, [10][11][12] few studies related to the recurrence of liver cancer can provide us.
In this study, we repurposed and integrated HCC data from TCGA and GEO to identify potential lncRNA signatures for predicting the survival of patients with relapse liver cancer. Combining the correlation analysis of lncRNAs and mRNA expression, it was discovered for the rst time that lncRNAs are essential for the diagnosis, treatment, and prognosis of liver cancer relapse.

Data downloaded and preprocessing
The expression pro le of genes associated with liver cancer (GSE101432) was obtained from the NCBI Gene Expression Omnibus database. Furthermore, the GEPIA 2 database was used to analyze the gene expression pro les from the Cancer Genome Atlas dataset and the Genotype-Tissue Expression projects. [13] Kaplan-Meier Plotter was used to analyze overall survival (OS) and recurrence-free survival (RFS). [14] 2.2 Read alignment and analysis of differentially expressed genes (DEGs) The clean reads were aligned with the human GRch38 genome using tophat2, allowing up to 4 mismatches. [15] Uniquely mapped reads were ultimately used to calculate the read number and reads per kilobase million (RPKM) for each gene. The gene expression levels were evaluated based on the RPKM. The software edgeR, [16] which is speci cally applied to analyze the differential expression of genes was used to screen the RNA-seq data for DEGs. A fold change (FC) ≥ 2 or ≤ 0.5 and a false discovery rate (FDR) ≤ 0.05 were used as cutoffs.

LncRNA prediction and direction identi cation
To identify lncRNAs associated with liver cancer relapse, we executed the pipeline described by Liu S et al. [17] , which was based on the cu inks software developed by Trapnell et al. [18] 2.4 Weighted gene co-expression network analysis (WGCNA) and co-expression analysis The lncRNAs and mRNAs that were differentially expressed between primary tumors, relapsed tumors, and benign adjacent tissue in the dataset were utilized for WGCNA [19] to reveal gene expression patterns. All expressed genes were considered input data. The eigengenes of each clustering module were utilized as the representative expression pattern of genes in each module. In order to investigate the regulatory relationship between lncRNAs and their target mRNAs, we calculated the Pearson's correlation coe cients (PCCs).

Functional enrichment analysis
Gene Ontology (GO) terms and KEGG pathways were analyzed to determine the functional categories associated with the identi ed DEGs using the KOBAS 2.0 server. [20] Benjamini-Hochberg FDR controlling procedure and hypergeometric test were used to de ne the enrichment of each term. Reactome pathway analysis was also conducted for functional enrichment analysis of the selected genes sets.

qRT-PCR
To further verify our results, samples of tumors and matched adjacent tissue were obtained from 5 patients with primary HCC and 3 patients with relapsed liver cancer who underwent tumor resection at the First A liated Hospital of Kunming Medical University from July 15, 2019, to December 20, 2020. This study was approved by the Ethics Committee of the First A liated Hospital of Kunming Medical University. A signed informed consent form was obtained from each patient or their family members, who were provided a detailed explanation about the study. The total RNA samples were isolated using TRIzol reagent. Next, quantitative real-time reverse transcription-polymerase chain reaction (qRT-PCR) analysis was performed using SYBR Green Master Mix and an ABI Quant Studio 5 system, and changes in gene expression were calculated using the comparative CT approach. Primers used in the study are listed in Table 1.

Statistical analysis
Principal component analysis (PCA) analysis was employed by R software package factoextra (https://cloud.rproject.org/package=factoextra) for effective dimension reduction, pattern recognition, and exploratory visualization of high-dimensional data of the whole genome. After normalizing the reads by TPM (Tags Per Million) of each gene in samples, in house-script (sogen) was utilized for visualization of next-generation sequence (NGS) data and genomic annotations. The pheatmap package (https://cran.rproject.org/web/packages/pheatmap/index.html) in R was used to carry the clustering based on Euclidean distance. Student's t-test was used by SPSS22.0 for comparisons between two groups. For results, P value < 0.05 and a false discovery rate (FDR) < 0.25 were saw as statistically signi cant.

Results
3.1 A comprehensive catalog of lncRNA and mRNA genes in primary and relapse tumors and benign adjacent tissue.
Employing the edgeR software package, 4051 lncRNAs and 4968 mRNAs were found to be differentially expressed in the GSE101432 dataset, of which 1218 lncRNAs and 1676 mRNAs were upregulated in the primary tumor samples and 1540 lncRNAs and 1787 mRNAs were upregulated in the relapse tumor samples (Fig. 1).
3.2 Identi cation of differentially expressed (DE lncRNAs) and mRNAs (DE mRNAs) in the recurrence of liver cancer Figure 2A shows a Venn diagram that illustrates the overlap between the relapse and primary tumors. Box plots for gene expression data were created to assess the distribution of DE lncRNAs (Fig. 2B). The analysis of the data revealed that a higher number of lncRNAs in the two groups were upregulated (68%). Next, principal component analysis (PCA) was performed to compare the DE lncRNAs and DE mRNAs between the primary tumor and relapse tumor groups (Fig. 2C, 2D). It can be clearly seen that lncRNA expression differs between the primary tumor and the relapse tumor groups. Therefore, lncRNAs could be used as important markers for liver cancer relapse. In addition, a Venn diagram was generated to identify the genes that were similar between all the groups. The intersection of combined co-upregulated and co-downregulated lncRNAs was identi ed between primary tumors and relapse tumors (co-upregulated = 508, co-downregulated = 216) (Fig. 2E).

Co-expression network and gene enrichment analysis
We used scatter plots to determine the co-expression of DE lncRNAs with DE mRNAs in relapse tumor samples (Fig. 3A). The top 10 Gene Ontology (GO) terms of the DE mRNAs co-expressed with the speci c upregulated lncRNAs were identi ed using the KOBAS 2.0 server (Fig. 3B). These processes included homophilic cell adhesion, cartilage condensation, and cell-cell junction organization. We then constructed a co-expression network between speci c upregulated lncRNAs and co-expressed DE mRNAs, involving the top 10 GO terms ( Fig. 3C). We obtained the top three DE lncRNAs and co-expressed mRNAs which were speci c up-regulated in relapse tumor and which were expressed in the primary and relapse tumors and benign adjacent tissue (Fig. 3D).

Identi cation of weighted gene co-expression network analysis (WGCNA) modules
We performed WGCNA analysis on all DE lncRNAs and DE mRNAs to obtain module-trait associations (Fig. 4A). Here, we focused on modules related to liver cancer relapse, including the pink, black, green-yellow, and yellow modules (R > 0.5). When comparing the fold change in lncRNAs and mRNAs with that in the control group in the four modules that were positively correlated in relapse liver cancer, all the genes were found to be upregulated in the four modules (Fig. 4B). We performed GO functional enrichment analysis on mRNAs co-expressed with lncRNAs in the four modules. The rst 15 GO terms in the yellow and black modules were related to cell proliferation, differentiation, and survival. Therefore, the yellow and black modules were selected for further analysis. We generated an integrated lncRNA-mRNA regulatory network based on the hub genes identi ed within the yellow-and black-module hub genes (Fig. 4C, 4D).

Identi cation of hub lncRNAs based on key modules
Using the results of the co-expression network and WGCNA analyses, we constructed a bar graph of the expression level of selected lncRNAs from the GSE101432 dataset (Fig. 5A). Therefore, we selected RP11-334A14.8, RP4-738P11.4, TRBV6-6, LINC00668, and LINC00941, which showed higher relapse-tumor group than in the primary-tumor group for qRT-PCR. Taking into account TCGA database correlation analysis (Fig. 5B) and lncRNA qRT-PCR results, mRNA was selected for clinical sample veri cation (Fig. 5B).

Validation of lncRNA and regulated gene expression network in clinical samples
To directly validate a subset of these bioinformatics results, we assessed lncRNA and mRNA expression levels in a distinct cohort of the primary liver tumor, relapse liver tumor, and matched normal tissue samples via qRT-PCR (*P < 0.05, **P < 0.01, ***P < 0.001, **** P < 0.0001) (Fig. 6). Table 2 listed clinical samples relevant information. At the same time, according to the results of qRT-PCR, the expression levels of LOX, OTX1, MICB, NDUFA4L2, BAIAP2L2 and KCTD17 in recurrent HCC group were higher than which in primary HCC group, and had statistical signi cance in recurrent HCC.

Survival analysis of lncRNAs related to relapse liver cancer
The Kaplan-Meier potter was used to assess the relationship between these genes and liver hepatocellular carcinoma patient survival, leading to the identi cation of prognosis-related lncRNAs (Fig. 7). The OS and RFS of LINC00668, LINC00941 and their co-expressed mRNA in HCC were statistically signi cant.

Discussions
Liver cancer is the second most common cause of cancer-related deaths. [21] Surgical resection is the most effective rst-line treatment for speci c patients. The 5-year recurrence-free survival (RFS) rate after partial hepatectomy is only 48.4%. [22,23] For HCC, most studies that have published lncRNAs signatures associated with prognosis have focused on primary liver cancer, while few have focused on relapse liver tumors and thus, nding effective biomarkers for HCC is crucial.
Data mining strategies can be used to explore signi cant biological phenotypes associated with highdimensional datasets. TCGA and GEO databases, with large-scale genomic analyses, can evaluate the molecular features associated with cancer outcomes. Recent developments in next-generation sequencing technologies have greatly expanded our knowledge on non-coding RNAs, and these non-coding RNAs are considerably more abundant than mRNAs. [24] Many studies have revealed the role of lncRNAs in cancer development, indicating their potential as novel biomarkers for cancer diagnosis and prognosis. [25][26][27] To better understand the molecular markers of relapse HCC, we comprehensively analyzed the database and identi ed lncRNAs that can be used to predict overall survival (OS) and RFS. Herein, we rst retrieved GSE101432 dataset containing data pertaining to the HCC primary and relapse tumors, and matched non-tumor tissues from the GEO database, following which we analyzed the differential expression patterns of lncRNA and mRNA.
We identi ed 1218 lncRNAs and 1676 mRNAs upregulated in the primary tumor samples and 1540 lncRNAs and 1787 mRNAs upregulated in the relapse tumor samples. The results of PCA revealed a signi cant difference for recurrent liver cancer between lncRNA and mRNA. Additionally, we found that 1540 of the 2261 (68.1%) lncRNAs were upregulated and 508 genes were co-upregulated between the two groups, suggesting upregulated lncRNAs play a more important role in relapse HCC than downregulated lncRNAs.
LncRNA-mRNA co-expression combined with GO enrichment analysis showed that lncRNAs were related to cell adhesion, cell-cell junction organization, adherens junction organization, cell junction assembly, and pattern speci cation process, suggesting that lncRNAs are related to the development of cancer. In the relapse tumor group, the co-expression network showed that upregulated lncRNA and co-expressed DE mRNA are involved in processes related to cancer metastasis; this creates conditions to promote the recurrence of liver cancer. CTD-2369P2.5 co-expressed protein coding gene ICAM1 that is associated with liver cancer has a high level of expression, and this expression is regulated by lncRNAs. In HCC, ICR speci cally affects cancer stem cell properties of ICAM-1(+) HCC cells and lncRNA ICR contributes to portal vein tumor thrombus development. [28] WGCNA technology has been leveraged to identify upregulated gene modules associated with HCC recurrence.
Additionally, WGCNA has been used to analyze lncRNA and mRNA by clustering genes with similar expression patterns. [29] It considers the expression of all genes evaluated in the experiment to reveal co-expressed gene clusters (modules), which are likely also co-regulated. If some genes are co-expressed in the control group but not expressed in the pathological samples, it can be assumed that the regulatory mechanism has changed, which may be the cause or result of the disease. Therefore, the genes in these modules can play a role in cancer and are therefore considered potential therapeutic targets or diagnostic/prognostic biomarkers. [30] Herein, we employed WGCNA by assessing co-expressed gene networks in the GSE101432 dataset, leading to the identi cation of 15 gene co-expression modules. Based on the p-and R-values, we chose four modules namely: black, green-yellow, pink, and yellow, and performed GO enrichment analysis on them. Of these four modules, the yellow and black modules were most closely linked to cell proliferation, differentiation, and survival, as well as some transcription-related biological processes; this indicates that there are multiple abnormal cell activities in relapse HCC. These two modules were thus suspected to be crucial regulators of relapse HCC, and were therefore analyzed further.
Given that the co-expression network and WGCNA were found to be closely associated with the relapse of HCC, we next screened for hub genes that showed higher expression and correlation in relapse tumors. We found that the expression of RP11-334A14.8, RP4-738P11.4, TRBV6-6, LINC00668, and LINC00941 was higher in the relapse tumor group than in the primary tumor group. We then assessed the expression of these genes via qRT-PCR in a separate set of primary-HCC patient and relapse-HCC patient clinical samples, which con rmed that LINC00668 and LINC00941 expression levels were increased in both HCC tumor groups, whereas PHACTR2 expression was reduced in these samples relative to control tissue levels. LINC00668 and LINC00941 and their co-expression mRNA were selected using correlation analysis with R > 0.1 and p < 0.05 as the selection threshold values. Therefore, we selected BAIAP2L2, MDFI, ZNF750, and SLC44A5 as LINC00668 co-expression mRNAs.
Besides, GPR160, LOX, NXPH4, OTX1, GMNN, and MICB were LINC00941 co-expression mRNAs. As a result, BAIAP2L2, MDFI, LOX, OTX1, and MICB were highly expressed in both the groups. This means that LINC00941 is co-expressed with LOX, OTX1, and MICB and is related to the recurrence of liver cancer. Furthermore, LINC00668 was found to be co-expressed with BAIA02L2, KCTD17, and NDUFA4L2, which has biological signi cance for the recurrence of liver cancer.
To the best of our knowledge, LINC00941 has been reported to exhibit protumorigenic and prometastatic behaviors during tumorigenesis, such as colorectal cancer (CRC) and gastric cancer. [31,32] Wu et al. found that LINC00941 activates the TGF-β/SMAD2/3 axis in metastatic CRC, which provided new insight into the mechanism of metastatic CRC and a novel potential therapeutic target for advanced CRC. This is a potential marker for recurrent liver cancer. As with LINC00941 co-expression mRNAs, expression of OTX1 has been found to be positively correlated with nodal metastasis status (p = 0.009) and TNM staging (p = 0.001) in HCC tissues. [33] Moreover, overexpression of OTX1 promotes the proliferation, migration, invasion, and tumor angiogenesis of HCC cells. [34] MICB is highly expressed in HCC, and its expression level is signi cantly and negatively associated with TNM stages. [35] Among the patients with different stages of hepatitis, asymptomatic individuals have higher MICB serum levels, while liver cirrhosis patients have lower MICB serum levels (p < 0.0001) compared to those in other patient groups. [36] The lysyl oxidase (LOX) family members are secreted copper-dependent amine oxidases, which are characterized by catalytic activity that contributes to the remodeling of the cross-linking of the structural extracellular matrix. [37] Umezaki et al. found that high LOX expression was associated with epithelial-mesenchymal transition (EMT) markers and predicted early recurrence and poor survival in patients with HCC. This is supported by our ndings, which indicate thatLINC00941 and coexpression mRNA are potential biomarkers and therapeutic targets for HCC recurrence.
However, another lncRNA, located at ch18p11.31, may play a pivotal role in the recurrence of liver cancer. In HCC, the molecular mechanism indicated that LINC00668 affects cell division, cell cycle, mitotic nuclear division, and drug metabolizing cytochrome P450 enzymes (all p ≤ 0.05). [38] However, the co-expression mRNA, BAIAP2L2, was found to be highly expressed in gastric cancer tumors and its expression signi cantly correlated with tumor diameter, TNM stage, and lymph node metastasis, respectively. [39] KCTD17 is upregulated in the liver tissues of obese mice and nonalcoholic fatty liver disease patients; however, few studies have been published on its role in cancer. [40] NDUFA4L2 can promote cell migration, invasion, proliferation, and EMT of cancer cells under hypoxic conditions. [41] In terms of survival analysis, whether in OS or RFS, both LINC00941 and LINC00668, as well as the co-      Relative expression levels of lncRNAs and mRNAs in clinical samples.