Development of a prognostic signature based on six immune related lncRNAs for hepatocellular carcinoma.

Background: Hepatocellular carcinoma (HCC) is one of the most common malignant tumors in the world, and its prognosis is poor. A precise biomarker is therefore needed to guide treatment and improve prognosis. A growing number of studies have shown that lncRNAs and the immune response are closely related to the prognosis of HCC. The aim of this study was to establish a prognostic signature based on immune related lncRNAs for HCC. Methods: Univariate Cox regression analysis was performed to identify immune related lncRNAs that were negatively correlated with overall survival (OS) in 370 HCC patients from The Cancer Genome Atlas (TCGA). A prognostic signature based on OS related lncRNAs was constructed using multivariate Cox regression analysis. Gene set enrichment analysis (GSEA) and a competing endogenous RNA (ceRNA) network were used to clarify the potential mechanisms of the lncRNAs included in the prognostic signature. Results: A prognostic signature based on six OS related lncRNAs (AC145207.5, AL365203.2, AC009779.2, ZFPM2-AS1, PCAT6, LINC00942) showed moderate performance in prognosis prediction and was related to pathologic stage (stage I and II vs. stage III and IV), distant metastasis status (M0 vs. M1) and tumor stage (T1-2 vs. T3-4). The ceRNA network comprised 15 axes among differentially expressed immune related genes, lncRNAs included in the prognostic signature and differentially expressed miRNAs. GSEA indicated that these lncRNAs were involved in cancer-related pathways. Conclusion: We constructed a prognostic signature based on immune related lncRNAs that can predict prognosis and guide therapies for HCC.

of HCC, KM analysis of these lncRNAs showed significant differences in our present study. Our study provided novel evidence that AC145207.5, AL365203.2, AC009779.2, ZFPM2-AS1 and LINC00942 might be potential predictors of HCC prognosis, and further studies are needed to validate these results and investigate their molecular mechanisms.

Long non-coding RNAs (lncRNAs) are a class of poorly conserved non-coding RNAs with transcripts longer than 200 nucleotides [10]. Various lncRNAs have been shown to function as signals, decoys, guides or scaffolds for other regulatory proteins [11][12][13]. In HCC, many studies have revealed that lncRNAs can affect prognosis by targeting oncogenic or tumor suppressor mRNAs. For instance, lncRNA F11-AS1 suppresses HCC progression by competitively binding miR-3146 to regulate PTEN expression [14], and lncRNA ANCR promotes HCC metastasis by up-regulating HNRNPA1 expression [15].
Immune responses and processes are considered to be related to tumorigenesis in many cancers. It is well known that the chronic inflammation induced by HBV infection can lead to liver damage.
In China, more than 80% of HCC cases are attributed to cirrhosis and chronic inflammation, which is now considered an important factor in cancer progression [16][17][18]. We therefore hypothesized that lncRNAs may affect HCC progression by targeting immune related genes, and we refer to these lncRNAs as immune related lncRNAs.
The underlying molecular mechanisms that mediate recurrence and metastasis remain largely unclear. Constructing an appropriate survival prediction model will help improve the overall prognosis of HCC patients. In the present study, we used The Cancer Genome Atlas (TCGA) database and other online databases to identify a prognostic signature based on immune related lncRNAs for HCC patients.

Data acquisition
Transcriptome profiling RNA-seq data (HTSeq-fpkm) for 50 non-tumor liver specimens and 374 HCC specimens, together with the corresponding clinical data, were downloaded from the TCGA data portal (https://portal.gdc.cancer.gov/). Patients without complete follow-up data were excluded, leaving 370 patients enrolled in our study. We also downloaded miRNA expression profiles (HTSeq-count) for 50 non-tumor liver specimens and 375 HCC specimens from the TCGA data portal. Ethical approval was not required because TCGA is a public database. The present study complied with TCGA publication guidelines and data access policies.

Extracting expression profile of immune related lncRNAs
First, we separated the expression profiles of genes and lncRNAs. We extracted the expression profile of immune related genes using the immune response (M19817) and immune system process (M13664) gene sets together with the TCGA transcriptome expression profile. We then extracted the expression profile of immune related lncRNAs through co-expression analysis between immune related genes and lncRNAs; a correlation score greater than 0.4 with P<0.001 was considered significant.
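The co-expression filter described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' R workflow. It assumes a Pearson correlation (the text does not name the coefficient), and all feature names and expression values are hypothetical toy data. The P<0.001 criterion would additionally require a significance test on each coefficient, which is omitted here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpressed_lncrnas(lncrna_expr, gene_expr, r_cutoff=0.4):
    """Keep lncRNAs whose correlation with at least one immune gene exceeds r_cutoff."""
    selected = []
    for lnc_name, lnc_values in lncrna_expr.items():
        if any(pearson_r(lnc_values, g) > r_cutoff for g in gene_expr.values()):
            selected.append(lnc_name)
    return selected


# Hypothetical toy data: one immune related gene and two lncRNAs in five samples.
gene_expr = {"GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0]}
lncrna_expr = {
    "LNC_1": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises with GENE_A
    "LNC_2": [5.0, 1.0, 4.0, 2.0, 3.0],  # weak negative correlation (r = -0.3)
}

immune_related = coexpressed_lncrnas(lncrna_expr, gene_expr)
print(immune_related)
```

With real data, each lncRNA would be compared against every gene in the two immune gene sets, so a vectorized correlation matrix would be far more efficient than this nested loop.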

Differential expression analysis of immune related lncRNAs, immune related genes and miRNAs
Differentially expressed immune related lncRNAs and immune related genes between HCC specimens and non-tumor liver specimens were identified with the limma package, and differentially expressed miRNAs were identified with the edgeR package. A threshold of |log2 fold change (log2FC)| > 1 and false discovery rate (FDR) < 0.05 was considered significant.
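The significance threshold above can be illustrated with a short sketch. It assumes that per-feature log2FC values and raw P values are already available (in the study these come from limma and edgeR) and that the FDR is obtained with the Benjamini-Hochberg procedure, which is a common choice but is not stated in the text. The feature names and numbers are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_features(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """results: list of (name, log2FC, raw P value) tuples."""
    fdr = benjamini_hochberg([p for _, _, p in results])
    return [
        name
        for (name, lfc, _), q in zip(results, fdr)
        if abs(lfc) > lfc_cutoff and q < fdr_cutoff
    ]


# Hypothetical results for four features.
results = [
    ("F1", 2.3, 0.001),   # large change, small P value
    ("F2", -1.8, 0.004),  # down-regulated, small P value
    ("F3", 0.4, 0.002),   # significant P value but small fold change
    ("F4", 1.5, 0.20),    # large fold change but not significant
]

selected = significant_features(results)
print(selected)
```

Both criteria must hold at once, so F3 and F4 are dropped even though each passes one of the two cut-offs.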

Identification of OS related immune related lncRNAs
Because of the low overall survival (OS) rate of HCC, we chose OS as the primary endpoint. We performed univariate Cox proportional hazards regression analysis to identify OS related immune related lncRNAs, with P<0.01 as the cut-off value.

Construction and evaluation of an immune related lncRNAs prognostic signature
We performed LASSO analysis on the results of the univariate Cox analysis to avoid over-fitting the prognostic signature. Based on the LASSO results, we performed multivariable Cox proportional hazards regression analysis to construct a prognostic signature based on immune related lncRNAs. The risk score was calculated as the sum, over the lncRNAs in the signature, of each Cox regression coefficient multiplied by the expression level of the corresponding lncRNA. Patients were categorized into high- and low-risk groups according to the median risk score. Kaplan-Meier (KM) curves were plotted to compare the OS of the high-risk and low-risk groups, and KM analysis was also performed for each lncRNA included in the prognostic signature. A receiver operating characteristic (ROC) curve was plotted and the area under the ROC curve (AUC) was calculated to evaluate the predictive efficacy of the prognostic signature. Univariate and multivariate Cox analyses of clinicopathologic characteristics and the risk score were performed to identify independent prognostic factors for HCC. The relevance between clinicopathologic characteristics and the risk score was analyzed as well.
Gene set enrichment analysis (GSEA) of the lncRNAs included in the prognostic signature was performed to explore the potential pathways affecting the prognosis of HCC.
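For readers unfamiliar with the Kaplan-Meier curves used to compare the risk groups, the product-limit estimator can be sketched as below. This is a generic textbook illustration in Python, not the authors' code (the study used the survival R package), and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical cohort of five patients (time in years).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Applying the function separately to the high-risk and low-risk groups gives the two curves that are then compared, typically with a log-rank test.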

Statistical analysis
All statistical analyses were performed with R software (version 3.6.1) and Perl (version 5.30). Univariate and multivariable Cox proportional hazards regression analyses were performed with the survival R package. The survivalROC R package was used to calculate the AUC of the survival ROC curve. Cluster heatmaps and volcano plots were generated using the gplots and heatmap packages. The ceRNA network was constructed with Cytoscape (version 3.6.1). GSEA was performed using GSEA software (version 4.0.1).

Construction of prognostic signature
According to the results of univariate Cox regression analysis (P<0.05), 31 lncRNAs were related to OS (Figure 2A). LASSO analysis indicated that 11 lncRNAs were suitable for constructing a prognostic signature (Figure 3). According to the results of multivariable Cox regression analysis, six OS related immune related lncRNAs were included in the prognostic signature (Figure 2B). We constructed the prognostic signature from the expression levels of these six lncRNAs and their coefficients. The formula was as follows: risk score = (0.3342 × expression level of AC145207.5) + (0.0865 × expression level of AL365203.2) + (0.092 × expression level of AC009779.2) + (0.0441 × expression level of ZFPM2-AS1) + (0.0801 × expression level of PCAT6) + (0.0266 × expression level of LINC00942). HCC patients were categorized into a high-risk group (n=185) and a low-risk group (n=185) according to the median risk score.
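The formula above translates directly into code. The sketch below uses the six coefficients reported in the text; the patient identifiers and expression values are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median

# Cox regression coefficients as reported in the formula above.
COEFFICIENTS = {
    "AC145207.5": 0.3342,
    "AL365203.2": 0.0865,
    "AC009779.2": 0.092,
    "ZFPM2-AS1": 0.0441,
    "PCAT6": 0.0801,
    "LINC00942": 0.0266,
}


def risk_score(expression):
    """Sum of coefficient x expression level over the six signature lncRNAs."""
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score."""
    cutoff = median(scores.values())
    # Assumption: a score exactly equal to the median is labelled 'low'.
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical patients; every lncRNA is given the same toy expression level.
patients = {
    "P1": {name: 1.0 for name in COEFFICIENTS},
    "P2": {name: 2.0 for name in COEFFICIENTS},
    "P3": {name: 0.0 for name in COEFFICIENTS},
    "P4": {name: 3.0 for name in COEFFICIENTS},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

Because the cut-off is the cohort median, the grouping is relative to this dataset and would need to be recalibrated or fixed in advance before use in another cohort.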

Evaluation of prognostic signature based on OS related immune related lncRNA
Kaplan-Meier analysis showed that the high-risk group had significantly poorer OS than the low-risk group (P=2.094e−06, Figure 4A). The 1-, 3- and 5-year survival rates of the high-risk group were 72.3%, 50.1% and 36.2%, respectively, whereas the corresponding rates in the low-risk group were 92.6%, 73% and 58.5%. All lncRNAs included in the prognostic signature were negatively correlated with the OS of HCC patients (Figure 5A-F).
Compared with other clinicopathological parameters, the AUC for the prognostic signature was the highest (0.778), suggesting moderate predictive efficacy for OS (Figure 4B). A heat map of the expression profiles of the six lncRNAs showed that all of them were negatively associated with OS (Figure 6A). A dot plot of survival status revealed that patients in the high-risk group had a much higher mortality rate than those in the low-risk group (Figure 6B). Figure 6C shows the ranking of the prognostic index and the distribution of groups.
Principal component analysis (PCA) revealed that the risk score distinguished high-risk from low-risk patients better than all lncRNAs, immune related lncRNAs or differentially expressed immune related lncRNAs (Figure 7). Univariate Cox regression analysis revealed that pathologic stage, tumor stage, distant metastasis status and risk score were related to OS (Figure 8A). Multivariate Cox regression analysis suggested that the risk score was an independent predictor after adjustment for other parameters, including age (≥65 years old), gender, tumor grade, pathologic stage, tumor stage, lymph node metastasis status and distant metastasis status (Figure 8B).

Analysis of the clinical relevance of the prognostic signature
To apply the prognostic signature in clinical practice, we analyzed the relevance between the risk score and clinicopathologic characteristics, including age (≥65 years old), gender, tumor grade (Grade

Potential molecular mechanisms of lncRNAs included in prognostic signature
According to the results of the ceRNA network (Figure 10A), five of the lncRNAs included in the prognostic signature, six differentially expressed immune related genes and 32 differentially expressed miRNAs were included in the network. Finally, 15 immune related ceRNA axes were constructed (Table 1).
The results of GSEA (c2.cp.kegg.v7.symbol.gmt) showed that the main functions of these six lncRNAs were related to cancer, including nine highly expressed pathways (cell cycle, DNA replication, NOTCH signaling pathway, tight junction, ERBB signaling pathway, bladder cancer, pathways in cancer, RIG-I-like receptor signaling pathway, NOD-like receptor signaling pathway) and three lowly expressed pathways (complement and coagulation cascades, PPAR signaling pathway, drug metabolism cytochrome P450) (Table 2 and Figure 10B). The highly expressed pathways were considered to promote cancer formation, invasion and metastasis, whereas the lowly expressed pathways were considered to inhibit them.
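A ceRNA axis links a lncRNA and an mRNA through a shared miRNA. The text does not describe how the interaction pairs were obtained, so the sketch below simply assumes that lists of lncRNA-miRNA and miRNA-mRNA pairs are already available and joins them on the miRNA. All names are hypothetical placeholders, not entries from Table 1.

```python
def build_cerna_axes(lnc_mirna_pairs, mirna_mrna_pairs):
    """Join lncRNA-miRNA and miRNA-mRNA interactions on the shared miRNA.

    Returns a list of (lncRNA, miRNA, mRNA) triples, one per ceRNA axis.
    """
    targets_by_mirna = {}
    for mirna, mrna in mirna_mrna_pairs:
        targets_by_mirna.setdefault(mirna, []).append(mrna)

    axes = []
    for lnc, mirna in lnc_mirna_pairs:
        for mrna in targets_by_mirna.get(mirna, []):
            axes.append((lnc, mirna, mrna))
    return axes


# Hypothetical interaction pairs.
lnc_mirna_pairs = [("LNC_1", "miR_x"), ("LNC_1", "miR_y"), ("LNC_2", "miR_z")]
mirna_mrna_pairs = [("miR_x", "GENE_A"), ("miR_x", "GENE_B"), ("miR_y", "GENE_A")]

axes = build_cerna_axes(lnc_mirna_pairs, mirna_mrna_pairs)
for axis in axes:
    print(" -> ".join(axis))
```

In this toy example LNC_2 contributes no axis because its miRNA has no listed mRNA target, which mirrors how one signature lncRNA can be absent from the final network.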

Discussion
HCC is one of the most common malignant cancers worldwide and has a poor prognosis. An effective survival prediction model is needed to help improve the overall prognosis of HCC patients. In the present study, using the TCGA data portal, we constructed a prognostic signature based on immune related lncRNAs. Six lncRNAs (AC145207.5, AL365203.2, AC009779.2, ZFPM2-AS1, PCAT6, LINC00942) related to OS were included in the prognostic signature.
We then evaluated the prognostic signature using several types of analysis. HCC patients with a high risk score had poorer OS than those with a low risk score. Multivariate Cox regression analysis showed that the risk score was an independent predictor of OS in HCC patients. The area under the ROC curve (AUC) for the prognostic signature also suggested moderate predictive efficacy for OS. The risk score was associated with distant metastasis status, pathologic stage and tumor stage. PCA revealed that the risk score can better distinguish high-risk from low-risk patients. The results of these analyses suggest that the prognostic signature is robust and reliable for predicting OS in HCC patients.
With respect to the lncRNAs included in the prognostic signature, previous studies have revealed that ZFPM2-
There are some limitations to the present study. First, it lacks an independent validation cohort. Second, in vitro or in vivo experiments are needed to validate the molecular mechanisms and pathways.

Conclusion
We constructed a prognostic signature based on immune related lncRNAs that can predict prognosis and guide therapies for HCC.

Competing interests
The authors have declared no conflict of interest.

Funding
This work received no funding.

Authors' contributions
Conceptualization: Shao-qiang Li.

Figure 5
Kaplan-Meier analysis of lncRNAs included in prognostic signature.

Figure 7
Principal component analysis of all lncRNAs, immune related lncRNAs, differentially expressed immune related lncRNAs and risk score. Figure A represents all lncRNAs. Figure B represents immune related lncRNAs. Figure C represents differentially expressed immune related lncRNAs. Figure D represents risk score. Blue dots represent low-risk group, red dots represent high-risk group.

Figure 8
Univariate and multivariate Cox regression analysis of clinicopathologic characteristics and risk score. Figure A represents the results of univariate cox regression analysis. Figure B represents the results of multivariate cox regression analysis.

Figure 9
Clinical relevance of risk score. A: The relevance between risk score and distant metastasis status. B: The relevance between risk score and pathologic stage. C: The relevance between risk score and tumor stage.