Data source and processing
RNA-sequencing data of LIHC cases were retrieved from the TCGA repository [31], which were operated on the Illumina HiSeq RNA‐Seq platform. The corresponding clinical data including survival time, TNM classification information, risk factors, etc. were also downloaded. The data contained 424 samples, of which 50 were normal, and 374 were primary hepatocellular carcinoma. The use and acquisition of these data are according to TCGA data access policies and publication guidelines.
The human general transfer format obtained from the ensemble website [32] was used to match the ensemble name of mRNA and long non-coding RNA. Clinical samples without precise outcome or follow-up time less than 30 days were excluded from our study.
Selection of immune-related long non-coding RNAs
Two gene sets “immune response” and “immune system process” were acquired from the Molecular Signatures Database [33] and then employed to identify the immune-related genes of LIHC samples. Afterward, these genes were fitted into correlation analysis using “limma” packages and “cor function” in R to find the correlated lncRNAs, with the filter criteria set to absolute cor (corresponding coefficients) > 0.4, adjust P value < 0.001. Then the network was visualized by Cytoscape software [34]. In this study, lncRNAs failed to meet the following conditions were deleted: (1). exhibited expression (FPKM ≥ 1) in at least 50% of the LIHC samples; (2). exhibited no fluctuation in expression levels between samples [35].
Construction of prognostic lncRNA signature and statistical analysis
Univariate cox regression analysis was performed to select a subset of candidate lncRNAs, whose expression level and patient’s overall survival (OS) are closely associated (P < 0.001). Afterward, a multivariate cox regression analysis was conducted to screen these candidates to build the prognostic lncRNA model. The formulae of the risk score were given below. Based on the cut-off of risk score, tumor specimens were divided into high-risk group and low-risk group. Kaplan-Meier survival method was used to calculate the survival rate. Receiver Operating Characteristic curve (ROC) was carried out to assess the effectiveness of this lncRNA model. Akaike Information Criterion (AIC) was adopted in the optimization process. All analyses were implemented using the following packages “survival”, “survminer”, “survival ROC”, “pheatmap” on R platform.
lncRNA risk score = |cor|* expression value
risk score of samples = ∑ (lncRNA risk score)
Clinical feature analysis of lncRNA signature
Based on the foregoing analyses, we obtained the expression matrix of lncRNA biomarkers in LIHC patients and then combined the expression data with clinical information. Subsequently, clinical features such as T-stage (tumor), G-stage (histologic grade) and S-stage (pathological stage) were analyzed to investigate the correlation between these clinic factors and expression of lncRNA biomarkers (P < 0.05) using “ggpubr” package in R.
Gene set enrichment analysis
Gene set enrichment analysis (GSEA) was applied to figure out the underlying mechanism between risk scores, which were derived from the co-expression analysis. Two immune gene sets “immune system process” (M13664 Genes annotated by the GO term GO:0002376), “immune response” (M19817 Genes annotated by the GO term GO:0006955) mentioned above were downloaded from MsigDB database as reference gene sets. The enrichment results were considered statistically significant at a level of FDR < 0.25.
Principal component analysis
Principal component analysis was employed to show the cluster of samples with risk scores separated by different levels of six-lncRNA signature, immune lncRNAs, immune-related genes, and all genes, respectively. The expression matrixes were pre-processed before PCA analysis. De-duplication via taking an average value and disregarded the data exhibited no fluctuation in expression level. Graphs displayed first, second and third major components (PC1, PC2, PC3) in three dimensions. The “limma”, “scatterplot3d” packages were conducted in the analysis.
Statistics
Mean ± standard deviation (SD) was used for data expression. Two‐group comparisons were analyzed by student’s t-test, while multi‐group comparisons were analyzed via one-way ANOVA. Spearman’s correlation analysis was performed to evaluate expression correlation. Kaplan Meier analysis was carried out to analyze overall survival. P value < 0.05 was considered statistically significant.