Discovery and validation of immune-related long non-coding RNA biomarkers associated with prognosis in hepatocellular carcinoma

Background: Hepatocellular carcinoma (HCC) is one of the most common malignant tumors in clinical practice and is associated with high mortality and poor prognosis. Long non-coding RNAs (lncRNAs) play an important role in the onset, metastasis and recurrence of HCC, and the immune system plays a vital role in the development, progression, metastasis and recurrence of cancer. Immune-related lncRNAs may therefore serve as novel biomarkers for predicting the prognosis of HCC.
Methods: Transcriptome and clinical data of HCC patients were obtained from The Cancer Genome Atlas-Liver Hepatocellular Carcinoma (TCGA-LIHC) cohort, and immune-related genes were extracted from the Molecular Signatures Database (IMMUNE RESPONSE M19817 and IMMUNE SYSTEM PROCESS M13664). By constructing a co-expression network and performing Cox regression analysis, a signature of 13 immune-related lncRNAs was identified to predict the prognosis of HCC patients. Patients were divided into high-risk and low-risk groups using the risk score formula, and the difference in overall survival (OS) between the two groups was assessed with Kaplan-Meier survival curves. Time-dependent receiver operating characteristic (ROC) analysis and principal component analysis (PCA) were used to evaluate the 13-lncRNA signature.
Results: RNA-seq and clinical data for 343 HCC patients were extracted from TCGA-LIHC, and 331 immune-related genes were extracted from the Molecular Signatures Database. Co-expression network construction and Cox regression analysis identified a signature of 13 immune-related lncRNAs as biomarkers for predicting patient prognosis. Patients were divided into high-risk and low-risk groups at the median risk score, and Kaplan-Meier analysis showed that OS was shorter in the high-risk group than in the low-risk group. The area under the ROC curve (AUC) was 0.828, and PCA showed that the immune-related lncRNAs clearly separated patients into two groups, supporting the use of the 13-lncRNA signature as a prognostic marker.
Conclusion: Our study identified a signature of 13 immune-related lncRNAs that can effectively predict the prognosis of HCC patients and may serve as a new prognostic indicator of clinical outcomes.

Background
Hepatocellular carcinoma (HCC) is the most common histological type of primary liver cancer [1]. Worldwide, it is the sixth most common malignant tumor and the fourth leading cause of cancer death [2]. Because of its strong invasiveness and high incidence of metastasis, HCC often has a poor prognosis, so early diagnosis and treatment are key to improving patient survival. Efficient, non-invasive markers for early diagnosis and prognosis are therefore needed.
The occurrence and development of HCC is a complex process involving multiple genes and pathways [3]. Long non-coding RNAs (lncRNAs) are a class of regulatory RNA macromolecules discovered in recent years, with transcripts longer than 200 nt. For a long time, lncRNAs were regarded as meaningless transcriptional by-products [4]. Recent studies have found that lncRNAs regulate gene expression at the transcriptional, post-transcriptional and epigenetic levels and participate in many pathophysiological processes, such as activation of proto-oncogenes, transcriptional activation and interference, and binding to proteins to regulate their activity [5]. Current studies have shown that lncRNAs play an important role in the pathogenesis, metastasis and recurrence of HCC [6,7].
The occurrence and development of HCC is closely related to stimulation by mediators in the tumor microenvironment [8]. HCC is a complex ecosystem that contains non-tumor cells (mainly immune-related cells), and the success of immune checkpoint inhibition in solid tumors underscores the critical role of the tumor microenvironment in cancer progression [9]. Therefore, immune-related lncRNAs may serve as novel biomarkers for predicting the prognosis of HCC.
In this study, we collected transcriptome and clinical data of HCC patients from The Cancer Genome Atlas-Liver Hepatocellular Carcinoma (TCGA-LIHC) cohort and immune-related genes from the Molecular Signatures Database. By establishing a co-expression network and performing Cox regression analysis, we identified immune-related lncRNAs that can predict the prognosis of HCC patients, and we used the resulting risk score to group the patients. Multivariate Cox regression analysis confirmed that the identified lncRNA signature has independent prognostic value. Time-dependent receiver operating characteristic (ROC) analysis and principal component analysis (PCA) were used to further evaluate its prognostic ability, and gene set enrichment analysis (GSEA) was used to help explain the underlying mechanism.

Materials And Methods
Patient cohort and lncRNA profile mining
Transcriptome data and clinical information of HCC patients were extracted from The Cancer Genome Atlas-Liver Hepatocellular Carcinoma (TCGA-LIHC) (http://larssonlab.org/tcga-lncrnas/index.php) [10,11]. After evaluation of the clinical data, 343 patients with HCC were included in the study. Patients who were followed up for no more than 30 days (because they were likely to have died from fatal complications other than HCC) and patients who had RNA-seq data but no clinical prognostic information were excluded [12,13]. To identify lncRNAs and mRNAs differentially expressed between HCC and adjacent non-tumor tissues, the R/Bioconductor package "edgeR" was used to process the mRNA and lncRNA expression data [14].
Immune-related genes were extracted from the Molecular Signatures Database (IMMUNE RESPONSE M19817 and IMMUNE SYSTEM PROCESS M13664) [15]. Immune-related lncRNAs were identified by establishing a co-expression network. Univariate and multivariate Cox regression analyses were then used to select immune-related lncRNAs associated with patient survival time (p < 0.05) for model construction [16].
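The co-expression screening step can be illustrated with a minimal sketch. It assumes that an lncRNA is called immune-related when its expression correlates with at least one immune gene across samples. The function names, the toy expression values and the correlation cutoff `min_abs_r` are hypothetical and are not taken from the study, and the significance filter reported in the Results is omitted for brevity.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)

def immune_related_lncrnas(lncrna_expr, immune_expr, min_abs_r=0.4):
    """Return lncRNAs whose expression correlates with at least one immune gene.

    lncrna_expr / immune_expr: dict mapping name -> list of per-sample values.
    min_abs_r is a hypothetical cutoff, not a value taken from the study.
    """
    hits = {}
    for lnc, lnc_values in lncrna_expr.items():
        partners = [gene for gene, gene_values in immune_expr.items()
                    if abs(pearson(lnc_values, gene_values)) >= min_abs_r]
        if partners:
            hits[lnc] = partners
    return hits

# Toy data: four samples, made-up names and values.
lncrna_expr = {"lncA": [1.0, 2.0, 3.0, 4.0], "lncB": [2.0, 1.0, 1.0, 2.0]}
immune_expr = {"geneX": [2.1, 3.9, 6.2, 8.0], "geneY": [5.0, 5.1, 4.9, 5.0]}
related = immune_related_lncrnas(lncrna_expr, immune_expr)
```

In this toy example only lncA passes the cutoff, through its correlation with geneX. A real analysis would also test each correlation for significance and would typically run on a normalized expression matrix.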

Validation of prognostic model and survival analysis
Immune-related lncRNAs associated with prognosis were selected, and the prognostic model was constructed by multivariate Cox analysis. A risk score was then calculated for each patient from the expression value of each selected lncRNA, weighted by its estimated regression coefficient in the multivariate Cox regression analysis.
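To make the Cox regression step concrete, the following sketch fits a Cox model for a single covariate by Newton-Raphson maximization of the partial likelihood. It is a simplified, univariate illustration under stated assumptions, not the procedure used in the study: the function name and toy data are hypothetical, tied event times are handled in the Breslow form, and a multivariate model would estimate all coefficients jointly.

```python
from math import exp

def fit_univariate_cox(times, events, x, iterations=25, tol=1e-9):
    """Estimate the log hazard ratio for one covariate by Newton-Raphson.

    Maximizes the Cox partial likelihood (Breslow form for tied times).
    times: follow-up times; events: 1 = death, 0 = censored; x: covariate.
    """
    beta = 0.0
    for _ in range(iterations):
        gradient, information = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored patients contribute only through risk sets
            # Risk set: everyone still under observation at time t_i.
            risk_set = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            weights = [exp(beta * x_j) for x_j in risk_set]
            s0 = sum(weights)
            s1 = sum(w * x_j for w, x_j in zip(weights, risk_set))
            s2 = sum(w * x_j * x_j for w, x_j in zip(weights, risk_set))
            gradient += x_i - s1 / s0
            information += s2 / s0 - (s1 / s0) ** 2
        step = gradient / information
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy cohort (made-up values): higher expression tends to go with earlier death.
times = [2, 3, 5, 7, 8, 11, 12, 15]
events = [1, 1, 1, 1, 0, 1, 1, 0]
expression = [2.1, 1.2, 1.8, 0.4, 1.0, 0.9, 0.2, 0.5]
beta = fit_univariate_cox(times, events, expression)
hazard_ratio = exp(beta)
```

A positive coefficient (hazard ratio above 1) marks an lncRNA whose higher expression is associated with higher risk. In practice an established survival analysis package would be used, since it also provides standard errors, confidence intervals and p-values.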
The risk score was calculated as:

$$\text{Risk score} = \sum_{i=1}^{n} \text{coefficient}(\text{lncRNA}_i) \times \text{expression}(\text{lncRNA}_i)$$

Here, lncRNAi represents the i-th prognostic lncRNA, n is the number of prognostic lncRNAs in the signature, and expression(lncRNAi) is the expression level of lncRNAi for the patient. The regression coefficient from the multivariate Cox analysis is denoted coefficient(lncRNAi) and represents the contribution of lncRNAi to the prognostic risk score [17]. Patients with higher risk scores tend to have poorer survival.
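As a minimal sketch of this calculation, the risk score of one patient is the sum over the prognostic lncRNAs of the regression coefficient multiplied by the expression level. The lncRNA names, coefficients and expression values below are hypothetical and serve only as an illustration.

```python
def risk_score(expression, coefficients):
    """Weighted sum of expression values using Cox regression coefficients."""
    return sum(coefficients[lnc] * expression[lnc] for lnc in coefficients)

# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"lncA": 0.5, "lncB": -0.25}
patient = {"lncA": 2.0, "lncB": 2.0}
score = risk_score(patient, coefficients)  # 0.5 * 2.0 + (-0.25) * 2.0 = 0.5
```

An lncRNA with a negative coefficient lowers the score as its expression rises, which is how a protective lncRNA enters the signature.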
HCC patients were assigned to high-risk or low-risk groups using the median risk score as the cutoff. Differences in overall survival between the high-risk and low-risk groups were assessed using the Kaplan-Meier method and compared using the log-rank test [18].
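The grouping and survival comparison can be sketched as follows, assuming follow-up times, event indicators (1 = death, 0 = censored) and risk scores are available per patient. All function names and toy values are hypothetical. The log-rank p-value uses the chi-square distribution with one degree of freedom, whose survival function equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt
from statistics import median

def split_by_median(scores):
    """Label each patient 'high' if the risk score exceeds the median, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n_high = sum(1 for u, g in zip(times, groups) if u >= t and g == "high")
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_high = sum(1 for u, e, g in zip(times, events, groups)
                     if u == t and e and g == "high")
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))  # chi-square survival function, 1 d.f.

# Toy cohort with made-up values (times in months).
scores = [0.2, 1.5, 0.4, 2.0, 0.1, 1.8]
times = [60, 12, 48, 8, 72, 20]
events = [0, 1, 1, 1, 0, 1]
groups = split_by_median(scores)
high = [(t, e) for t, e, g in zip(times, events, groups) if g == "high"]
high_curve = kaplan_meier([t for t, _ in high], [e for _, e in high])
chi2, p_value = logrank_test(times, events, groups)
```

In this toy cohort every high-risk patient dies before any low-risk patient, so the high-risk curve drops to zero and the log-rank test gives p < 0.05. Real data would show overlapping curves and should be analyzed with an established survival package.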

Statistical analysis
To evaluate whether the immune-related lncRNA signature is independent of key clinical factors, multivariate regression analysis and stratified analysis were used to examine the role of the risk score in predicting patient prognosis. Hazard ratios (HR) and 95% confidence intervals (CI) were calculated by Cox analysis, and clinical correlation analysis was performed [19]. Survival predictions based on the lncRNA signature and on key clinical characteristics were compared using time-dependent receiver operating characteristic (ROC) analysis [20]. Principal component analysis (PCA) of the immune-related lncRNAs and of the whole gene expression profile was used to further verify the prognostic accuracy of the identified immune-related lncRNAs [21].
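The idea behind a time-dependent AUC can be sketched in simplified form. At a chosen time horizon, patients who have died by that time are treated as cases and patients still under follow-up beyond it as controls, and the AUC is the fraction of case-control pairs in which the case has the higher risk score. This sketch simply drops patients censored before the horizon, whereas standard time-dependent ROC methods account for censoring by weighting. The function name, horizon and toy values are hypothetical.

```python
def auc_at_time(scores, times, events, horizon):
    """AUC for predicting death by `horizon` from a risk score.

    Cases: patients who died at or before the horizon.
    Controls: patients still under follow-up after the horizon.
    Patients censored before the horizon are dropped (a simplification).
    """
    cases = [s for s, t, e in zip(scores, times, events) if e and t <= horizon]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0  # case ranked above control
            elif c == k:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))

# Toy cohort with made-up values (times in months).
scores = [2.0, 1.5, 1.6, 1.8, 0.2, 0.1]
times = [8, 12, 48, 20, 60, 72]
events = [1, 1, 1, 1, 0, 0]
auc = auc_at_time(scores, times, events, horizon=36)
```

Here three patients die before 36 months and three are still followed afterwards; eight of the nine case-control pairs are ranked correctly, giving an AUC of 8/9.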

Functional enrichment analysis
Gene Set Enrichment Analysis (GSEA) of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene Ontology (GO) terms was used to investigate the possible molecular mechanisms underlying the difference in prognosis between the high-risk and low-risk groups [22,23]. Genes were ranked by their degree of differential expression between the high-risk and low-risk groups, and the enrichment of predefined gene sets was then examined.
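The ranking-and-enrichment idea can be sketched with an unweighted running-sum enrichment score. Genes are ranked by a differential expression statistic, the running sum rises at each gene that belongs to the gene set and falls otherwise, and the score is the largest deviation from zero. This is a simplified illustration: GSEA as commonly run uses a weighted statistic and permutation-based significance, and the gene names and log fold changes below are made up.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score of gene_set in ranked_genes."""
    members = set(gene_set) & set(ranked_genes)
    n_hits = len(members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up for a gene-set member, step down for any other gene.
        running += 1 / n_hits if gene in members else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical log fold changes (high-risk vs low-risk group).
log_fc = {"g1": 2.5, "g2": 1.8, "g3": 0.9, "g4": 0.3, "g5": -1.1, "g6": -2.0}
ranked = sorted(log_fc, key=log_fc.get, reverse=True)
es = enrichment_score(ranked, {"g1", "g2", "g4"})
```

A positive score indicates that the gene set is concentrated among genes up-regulated in the high-risk group; a negative score indicates concentration among down-regulated genes.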

Results
Construction of the co-expression network to identify the immune-related lncRNA signature
Immune-related lncRNAs were identified by constructing an immune co-expression network, and 542 lncRNAs were identified as immune-related (P ≤ 0.01) (Table S1). The expression of these immune-related lncRNAs was combined with the survival data, 56 immune-related lncRNAs associated with patient prognosis were identified by univariate Cox regression analysis (Table S2), and a forest plot was drawn (Fig. 1). Multivariate Cox regression analysis was then used to screen out 13 immune-related lncRNAs for construction of the prediction model (Table 1).

Prognostic validation of 13 immune-related lncRNAs signature in HCC patients
To establish an expression signature of immune-related lncRNAs for survival prediction, the expression profiles of the 13 immune-related lncRNAs with independent prognostic value were used to build a multivariate Cox regression model and evaluate their relative predictive power. The 13-lncRNA signature was characterized by weighting the expression level of each lncRNA by its regression coefficient from the multivariate Cox analysis, following the risk score formula given in the Methods. A risk score was calculated for each patient, and the 343 patients were divided into high-risk and low-risk groups at the median risk score (Fig. 2A). The relationship between risk score and cancer death was also examined (Fig. 2B). As shown in the heat map (Fig. 2C), the expression level of LINC01554 gradually decreased as the risk score increased, whereas the expression levels of the other immune-related lncRNAs gradually increased. Kaplan-Meier analysis was used to evaluate the difference in survival between the high-risk and low-risk groups (Fig. 2D), and the results showed that the overall survival (OS) of the high-risk group was worse than that of the low-risk group.

Evaluation of 13 immune-related lncRNAs signature models
Univariate and multivariate Cox regression analyses were used to determine whether the 13-lncRNA signature could serve as an independent predictor for HCC patients. In univariate analysis, the risk score was significantly correlated with prognosis. After multivariate adjustment for clinical factors, the risk score remained a reliable and independent predictor of risk in the cohort (P < 0.001) (Table 2). Based on the univariate and multivariate analyses, we constructed forest plots (Fig. 3A, B) that included independent prognostic factors (risk score, age, gender, grade, stage and TNM staging). The prognostic performance of the 13-lncRNA signature was evaluated by time-dependent receiver operating characteristic (ROC) analysis and by calculating the area under the ROC curve (AUC) (Fig. 3C). The AUC of the signature was 0.828, indicating higher specificity and sensitivity than the other independent prognostic factors. The clinical correlation between the 13-lncRNA signature and T staging was also analyzed (Fig. 3D), and AC010761.1, AC023157.3, AC145207.5, AL031985.3, LINC01554, MIR210HG and PTOV1-AS1 were found to be correlated with T staging. Principal component analysis (PCA) was used to examine the difference between the low-risk and high-risk groups based on the 13-lncRNA signature (Fig. 4A) and on the whole gene expression profile (Fig. 4B). The low-risk and high-risk groups were generally distributed in different directions, indicating that the risk genes can clearly separate HCC patients into two groups and providing further evidence for the 13-lncRNA signature as a prognostic marker for HCC patients.

Discussion
Hepatocellular carcinoma (HCC) is one of the most common malignant tumors in clinical practice and the third leading cause of cancer death, accounting for more than 80% of liver cancer cases [24]. HCC is characterized by strong invasiveness and a high incidence of metastasis, resulting in high mortality and poor prognosis, with a 5-year survival rate of only 10%-30% [25,26]. Early diagnosis and treatment are key to improving the survival of HCC patients. At present, common clinical markers of liver cancer include alpha-fetoprotein (AFP), des-γ-carboxy prothrombin (DCP) and squamous cell carcinoma antigen-IgM complexes (SCCA-IgMCs) [27,28]. Although these markers have some specificity in the diagnosis of HCC, they are less sensitive for early diagnosis, and HCC is usually diagnosed at an advanced stage. Therefore, efficient, non-invasive new markers for early diagnosis and prognosis are needed.
The tumor microenvironment (TME) is the complex, integrated system in which tumor cells grow [29,30]. It is composed of tumor cells, endothelial cells, immune cells, fibroblasts and extracellular matrix [31]. The immune system is an important part of the TME and plays a vital role in the development, progression, metastasis and recurrence of cancer [32]. Cancer cells and immune cells undergo metabolic reprogramming in the TME, which is closely related to immune cell function and shapes tumor immunity [33]. The latest findings in immune cell metabolism offer broad prospects for clinical cancer therapies and will be critical to the identification of biomarkers [9,34].
LncRNAs are a class of RNAs that are longer than 200 nucleotides but do not encode proteins [35]. Studies have shown that their expression is tissue-specific and that they participate in almost all aspects of cellular activity. They can regulate cell proliferation, cell differentiation and other processes at the epigenetic, transcriptional and post-transcriptional levels [36,37]. Abnormal expression of lncRNAs is therefore closely related to the occurrence and development of many human diseases, including tumors [38,39]. Many studies have shown that lncRNAs can regulate gene expression through interaction with proteins or RNA, thereby regulating cell proliferation, apoptosis and metastasis and participating in the occurrence and development of HCC [40]. Differential expression of lncRNAs can distinguish cancer tissues from adjacent tissues, indicating that lncRNAs in cancer tissues have a characteristic expression profile [41]. The abnormal expression of many lncRNAs in HCC has now been identified. In HCC tumor tissues, up-regulation of lncRNAs such as H19 and HOTAIR is associated with distant metastasis, and down-regulation of lncRNAs such as MEG3 is associated with cell proliferation [42-44]. Because the abnormal expression of some lncRNAs has been shown to be closely related to cancer, the use of lncRNAs as markers for cancer diagnosis and prognosis has become a new research direction [45]. To date, immune-related lncRNAs have not been reported as prognostic markers for HCC patients.
In this study, we obtained transcriptome and clinical data of HCC patients from the TCGA-LIHC database and immune-related genes from the Molecular Signatures Database. By establishing a co-expression network and performing Cox regression analysis, we identified a signature of 13 immune-related lncRNAs that predicts the prognosis of HCC patients. The risk score was used to divide HCC patients into high-risk and low-risk groups, and the differential expression of the 13 lncRNAs between the two groups was analyzed. Kaplan-Meier survival curves were used to analyze the difference in survival between the high-risk and low-risk groups. Univariate and multivariate Cox regression analyses showed that the 13-lncRNA signature can serve as an independent predictor for HCC patients. Time-dependent receiver operating characteristic (ROC) analysis, with an AUC of 0.828, was used to evaluate the predictive value of the signature for overall survival, and principal component analysis (PCA) further supported its predictive value. Gene set enrichment analysis (GSEA) was used to investigate the possible molecular mechanisms underlying the difference in prognosis between the high-risk and low-risk groups.

Conclusions
In summary, by constructing a co-expression network we identified a signature of 13 immune-related lncRNAs that can effectively predict the prognosis of HCC patients. This signature may become a new prognostic indicator of clinical outcomes and contribute to clinical decision-making for individualized treatment. The underlying molecular mechanisms in hepatocellular carcinoma merit further study.

Ethics approval and consent to participate
This study was reviewed and approved by the Medical Ethics Committee of Renmin Hospital of Guizhou Medical University, Guiyang, China.

Consent for Publication
All authors have agreed to publish this manuscript.

Availability of data and materials
All data generated or analyzed during this study are included in this published article.

Competing interests
The authors declare that they have no competing interests.

Funding
This study was supported by the "Xin Miao" Foundation of Guizhou Medical University, No. 5779-10.