Construction of Novel Immune-Related LncRNA Signature and its Potential Prediction of Immune Status in Hepatocellular Carcinoma


 Background: The accuracy of the existing biomarkers in predicting the prognosis of hepatocellular carcinoma (HCC) is not satisfactory. It is necessary to explore biomarkers that can accurately predict the prognosis of HCC.Materials and methods: In this study, the original transcriptome data was downloaded from The Cancer Genome Atlas (TCGA) database. Immune-related long non-coding ribonucleic acids (irlncRNAs) were identified by co-expression analysis, and different expression irlncRNA (DEirlncRNA) pairs were distinguished by univariate analysis. Besides, the least absolute shrinkage and selection operator (LASSO) penalized regression was modified. Next, the cut-off point was determined based on the area under the curve (AUC) and Akaike information criterion (AIC) values of the 5-year receiver operating characteristic curve (ROC) to establish an optimal model for identifying high-risk and low-risk groups in HCC patients. The model was then reassessed in terms of clinicopathological features, survival rate, tumor-infiltrating immune cells, immunosuppressive markers, and chemotherapy efficacy.Results: A total of 1009 pairs of DEirlncRNA were recognized in this study, 30 of which were included in the Cox regression model for subsequent analysis. After regrouping according to the cut-off point, we can more effectively identify factors such as aggressive clinicopathological features, poor survival outcomes, specific tumor immune infiltration status, the high expression level of immunosuppressive biomarkers, and low sensitivity to chemotherapy drugs in HCC patients.Conclusions: The non-specific expression level signature involved in irlncRNAs shows a promising clinical value in predicting the prognosis of HCC patients.

phenotypes of cancer by changing the genome or transcriptome level and varying the immune microenvironment [15]. LncRNAs can activate immune cells by expressing related genes, which leads to tumor immune cells in ltrating [16].
Tumor immune in ltration markers show prospective predictive and prognostic value in tumor diagnosis, treatment, and survival evaluation [17][18][19]. Moreover, lncRNAs have a close relationship with tumor immunity; the study of its combination with tumor immunity will help to establish these markers. Hong et al. constructed a model to predict the prognosis of HCC according to immune-related lncRNAs (irlncRNAs) and risk scores [20]; Wei et al. generated a marker for evaluating the prognosis of pancreatic cancer by screening nine irlncRNAs [21]; Jiang et al. established a prognostic model composed of three lncRNAs to predict the prognosis of clear cell renal cell carcinoma [21]; Wu et al. integrated clinical data and RNAs, including lncRNA, microRNA (miRNA), and mRNA, and established an immune-related signature to predict patients' survival with head and neck squamous cell carcinoma [22].
Generally speaking, the accuracy of the tumor prediction model based on the combination of two biomarkers is better than that of simple genes [23]. Up to the present, few models have studied the predictive role of lncRNAs and tumor immune-related cells in HCC. This study used a novel modeling algorithm, which does not need to involve a speci c expression level, but through pairing and iteration to establish an irlncRNA signature. Subsequently, we evaluated the diagnostic effect, predictive value, tumor immune in ltration, and chemotherapy e cacy of this signature in HCC patients.

Materials And Methods
Transcriptome data collection and differential expression analysis We downloaded the transcriptome data (RNAseq) corresponding to fragments perkilobase million (FPKM) from the TCGA database (https://tcga-data. nci.nih.gov/tcga/). The GTF les used for subsequent analysis, distinguishing mRNA and lncRNA, were extracted from the genome database, Ensembl (http://asia.ensembl.org). The ImmPort database (http://www.immport.org) was accessed to download the list of identi ed immune-related genes (ir-genes), serving as a screening role to lter irlncRNAs through co-expression methods. The correlation between lncRNAs and ir-genes was analyzed.
The inclusion criteria of irlncRNAs were immune gene correlation coe cient > 0.4 and p < 0.001. Package limma of R was performed to analyze the differential expression among irlncRNAs, and log fold change (FC) > 1 and false discovery rate (FDR) < 0.05 was regarded as thresholds to distinguish different expression irlncRNA (DEirlncRNA) .

DEirlncRNAs pairing
DEirlncRNAs were periodically single paired. Assuming that C was equal to lncRNA A plus lncRNA B, a 0 or 1 matrix was constructed; If the lncRNA A expression level is higher than that of lncRNA B, Cis is de ned as 1; otherwise, C is de ned as 0. After that, the constructed matrix was further screened.
Suppose the expression level of lncRNA pairs is 0 or 1. In that case, it is considered that there is no relationship between pairing and prognosis because there is no speci c level of pairing to predict the survival outcome of patients correctly. When the number of lncRNA pairs with an expression of 0 or 1 accounts for more than 30% of the total pairs, it was regarded as an effective match.

Clinicopathological data acquisition
The clinicopathological information of HCC patients was collected from the TCGA database. After excluding the cases with follow-up time less than 30 days and duplicate data, the adequate data were extracted. Establish a risk model for evaluating the riskScore The single-factor analysis was the rst step. Then, the lasso regression with 10-fold cross-validation was carried out, and the p-value was 0.05. Lasso regression was performed for 1000 cycles, and 1000 random stimulation were set in each cycle. The next step was to record the frequency of each pair in the Lasso regression model that was repeated 1000 times and select the pairs with a frequency more than 100 times for Cox proportional hazard regression analysis and model generation. The AUC value was calculated, and the curve was plotted. The maximum area under the curve (AUC) value is when the curve reaches the highest point, the calculation process is terminated, and the model is regarded as the best candidate. In this study, the 1-year, 3-year, and 5-year receiver operating characteristic curves (ROC) of the risk model were drawn. The formula of the riskScore for all clinical cases is as follows. RiskScore = βiSi. The Akaike information criterion (AIC) value of each point of the 5-year ROC curve is used to determine the cut-off point to distinguish the high or low risk of RiskScores.

Risk model validation
Kaplan-Meier analysis was used to verify the cut-off point to show the difference of survival rate between high-risk group and low-risk group. Then, the survival curve was plotted, and R was used to visualize the riskScore of each case in the model. The R packages, including glmnet, survival, survivalROC, pbapply, surfminer, and pheatmap, were utilized in these analyses. A Chi-square test was conducted to investigate the relationship between the clinicopathological features and model, which in order to validate the clinical appliance usefulness of the generated model. Then the band chart was visualized and marked as below: <0.05 was marked *, <0.01 was marked **, and <0.001 was marked ***. Wilcoxon signed-rank test was performed to analyze the differences in riskScores between various clinicopathological feature groups. Results were demonstrated by Box plots. Univariate and multivariate Cox regression analysis was performed between the riskScore and clinicopathological features to certify the possibility of an independent predictor of clinical prognosis for this model. Results were displayed by a forest plot. Packages of R, including ggupbr, survival, and pHeatmap, were used in these procedures.

Study on tumor-in ltrating immune cells
The currently accepted methods for examining the immune in ltration status between samples from LIHC of the TCGA dataset were used to analyze the relationship between risk and immune cell features. These methods include XCELL, TIMER, QUANTISEQ, MCPCOUNTER, EPIC, CIBERSORT-ABS, and CIBERSORT. The immune in ltrating cell content analysis between the high-risk and low-risk groups used the Wilcoxon signed-rank test. Results were shown by a box diagram. The relationship between immune in ltrating cells and riskScores was performed by Spearman correlation analysis. A lollipop chart was drawn to show the correlation coe cient of the results. P < 0.05 was considered as a signi cant threshold. Ggplot2 packages of R conducted the procedure.
The noteworthy relationship between the model and the clinical therapeutics In order to assess the clinical application value of the model in the treatment of HCC, the IC50 of commonly used chemotherapy drugs in the TCGA project of the LIHC dataset was calculated. Anti-tumor medications such as bleomycin, doxorubicin, erlotinib, gemcitabine, methotrexate, mitomycin, paclitaxel, rapamycin, cisplatin, and sorafenib are commonly used in the treatment of various types of malignant tumors recommended by AJCC guidelines. Wilcoxon signed-rank test was performed to compare the difference of IC50 between the high-risk group and low-risk group. The results were shown in the box diagram drawn by ggplot2 and pRRophetic of R.

Expression analysis of immunosuppressive molecules related to ICIs
The relationship between the model and the gene expression level associated with ICIs was examined and visualized by the ggstatsplot and violin plot.

Differential expression analysis of irlncRNAs (DEirlncRNAs)
We downloaded transcriptome data of HCC from the TCGA database, including 50 non-tumor tissues and 374 tumor tissues. The gene transfer format (GTF) les from Ensemble were used to annotate the accessed data, and co-expression of known ir-genes and lncRNAs was analyzed. Totally, 740 irlncRNAs were identi ed, of which 490 were classi ed as DEirlncRNAs. Among the DEirlncRNAs, 16 were downregulated, and 474 were up-regulated ( Figure 1A, 1B, Table S1, S2).

DEirlncRNA pairs screening and risk model construction
A total of 10344 valid DEirlncRNA pairs from 490 DEirlncRNAs were identi ed by iteration loop and 0 or 1 matrix. 1009 DEirlncRNA pairs were screened by a single factor test and modi ed LASSO analysis, of which 30 pairs were involved in the Cox model by the stepwise method. Results were shown in Figure 1C, 1D, and 1E. After that, the areas under the curve (AUC) for each receiver operating characteristic (ROC) curve of 1009 were calculated, and the curve was plotted. At the same time, we found that the maximum AUC value was obtained when the highest point equal to 0.941, then the the optimal DEirlncRNA pair was determined (Figure 2A). Our study also used the Akaike information criterion (AIC) values to determine the maximum in ection point as the cut-off point of the 5-year ROC curve ( Figure 2B). ROC curves at 1, 3, and 5 years were drawing to verify the optimality, which indicated all AUC values exceeded 0.91 ( Figure 2C).
Moreover, AUC values between the 5-year ROC curve and some common clinical parameters were also compared. (Figure 2D). Furthermore, 343 HCC patients' data were collected from the TCGA database, and the risk scores of all patients were calculated. Then the cut-off point was utilized to re-differentiate the high-risk group and low-risk group for veri cation.
Application of risk assessment model in the clinical evaluation Fifty-nine cases and 284 cases were divided into high-risk and low-risk groups according to the cut-off point. Figure 3A and Figure 3B display the riskScores and survival rate of each case. These data indicate that patients' clinical outcome in the high-risk group was inferior to that in the low-risk group. The survival of the high-risk group was poor than that of the low-risk group by Kaplan Meier analysis (p < 0.001) ( Figure 3C). Next, a Chi-square test was conducted to explore the relationship between the risk of HCC and clinicopathological features. The stripping diagram ( Figure 4A) and the scatter plots examined through Wilcoxon signed-rank test displayed T classi cation, tumor stage, tumor grade, Child-Pugh grade, ECOG, vascular invasion, and survival status ( Figure 4B-4H) were signi cantly associated with the risk.  Figure 4I).

Relationship between tumor-in ltrating immune cells, immune molecules and the risk model
Since lncRNAs were initially associated with ir-gene, we explored whether this model correlates with the tumor immune microenvironment. Our study found that there is a positive correlation between the highrisk group and tumor-in ltrating immune cells, such as B cells, Neutrophils, and macrophages, while negatively correlated with CD8 + T cells, CD4 + T cells, and monocytes. Spearman correlation analysis was carried out in detail, and the results were shown as a lollipop diagram ( Figure 5A). Since ICIs currently play a signi cant role in the treatment of HCC, we explored the correlation between the risk model and ICI-related biomarkers. The results showed that high risk scores were associated with the expression of CD276 (p < 0.001), GSDME (p < 0.001), HAVCR2 (p < 0.01), and TNFRSF18 (p < 0.05) ( Figure 5B-5E). There was no statistical difference between the high-risk scores and ir-genes, such as CTLA4, PDCD1, and LAG3 (All p > 0.05, Figure 5F-5H).

Correlation analysis of risk model and chemotherapy drugs
In addition to checkpoint blocking therapy, we also tried to explore the association between the risk and e cacy of commonly used chemotherapy drugs for HCC. We found that the high-risk score was along with a higher half inhibitory concentration (IC50) of chemotherapy drugs for erlotinib (p < 0.001), methotrexate (p = 0.0024), and rapamycin (p = 0.0043), while a lower IC50 for bleomycin (p < 0.001), doxorubicin (p = 0.0023), gemcitabine (p < 0.001), mitomycin (p < 0.001), and paclitaxel (p = 0.011), suggesting that this model can be served as a potential predictor for the sensitivity of chemotherapies ( Figure 6A-6J).

Discussion
It is necessary to improve the accuracy of prognostic markers for HCC patients. LncRNAs are closely related to normal physiological activities and the development of diseases [10,24]. Furthermore, studies demonstrated that lncRNAs play a vital role in tumor development and anti-tumor processes [25][26][27]. Recent researches have focused on investigating the potential relationship between coding genes and non-coding RNAs to predict patients' prognosis with cancers [28,29]. Regretfully, the majority of these signatures were generated by the particular expression levels of transcripts. Our research did not pay attention to the speci c expression levels of lncRNAs but utilized the method of ir-gene pairing to generate a practical model with the combination of lncRNAs.
First of all, we downloaded the original information of lncRNAs from the TCGA database, and then differential co-expression analysis was performed to catalog DEirlncRNAs. The lncRNA-pairs pairs were veri ed by an improved cyclic single pair method along with 0 or 1 matrix. Secondly, univariate analysis and modi ed LASSO penalty regression were performed to determine DEirlncRNAs pairs, procedures including cross-validation, multiple repetitions, and random stimulation. And then, we gained the optimum model by examining each AUC value of ROC, and the optimum cut-off point was according to the AIC value of each point on the AUC to distinguish the high-risk and low-risk group in HCC cases. Finally, the model was evaluated according to various parameters, such as survival rate, clinicopathological features, tumor-in ltrating immune cells, checkpoint-associated molecules, and chemotherapeutics.
The relationship between lncRNAs and tumors has been paid more and more attention [30][31][32]. Deng et al. established a model to predict HCC patients' survival [33]. The method utilized in this study does not need to check the speci c expression level of each lncRNA but only needs to detect the pairs with high or low expression levels. Therefore, the model is practical and straightforward in distinguishing high-risk or low-risk clinical cases. The lncRNAs included in this model are related to ir-genes, so these lncRNAs may regulate the immune microenvironment and the activation of immune cells.
Our research reveals that part of the DEirlncRNAs included in the modeling play a vital role in the malignant phenotype of many cancers, such as MYLK − AS1 [34,35], THUMPD3 − AS1[36], and DSCR8 [37], especially in the development of HCC. MYLK − AS1 promotes angiogenesis and HCC progression by targeting the miR-424-5p/E2F7 axis and activating the VEGFR-2 signaling pathway [35]. In order to achieve better accuracy and effectiveness of risk prediction, this study used the improved method of the LASSO penalty model [39]. In addition, we determined the maximum value for an optimal model by calculating each AUC value and then compared it with other clinicopathological characteristics to further improve the modeling process. The AIC value was applied to get the ideal cut-off point for model tting instead of only using the median value to discriminate risk. After using this new method to differentiated high-risk and low-risk groups, survival outcomes and univariate and multivariate analysis of clinicopathological features were reevaluated. Moreover, the sensitivity of common chemotherapy drugs to HCC treatment was analyzed. The relationship between high-risk and low-risk groups and tumor immune in ltration, and the relationship between high-risk and low-risk groups and immune checkpointrelated genes were also studied, which indicated that this modeling algorithm has a good clinical application prospect.
The reaction of blocking immune checkpoint is closely related to tumor-in ltrating immune cells [40]. Our research used seven common recognized methods to reckon the immune in ltrating cells, which in order to investigate the relationship between risk scores and tumor-in ltrating immune cells, including XCELL [41,42], TIMER [43,44], QUANTISEQ [45,46], MCPCOUNTER [47], EPIC[48], CIBERSORT-ABS, and CIBERSORT [49,50]. Due to the defects and complexity of these algorithms, they are rarely compared with each other. Through integration analysis, our ndings show that DEirlncRNA pairs have a positive correlation with tumor-in ltrating immune cells such as B cells, Neutrophils, macrophages, while negatively correlated with CD8 + T cells, CD4 + T cells, and monocytes. Wang et al. demonstrated that the immune score could predict the e cacy of immunotherapy and chemotherapy [51]. IrlncRNAs SATB2-AS1 can affect tumor immune cell microenvironment and inhibit colorectal cancer metastasis [34]. LncRNA-EGFR can stimulate T regulatory cell differentiation and promote immune evasion of HCC [35]. Our model suggests that the high risk was related to the sensitivity of chemotherapy drugs such as methotrexate, rapamycin, bleomycin, doxorubicin, gemcitabine, mitomycin, and paclitaxel, rather than the sensitivity of sorafenib. We believe that immunotherapy is more effective than traditional chemotherapy, mainly because immunotherapy can activate the immune function and resist tumors through active immunity. Tumoral mutations can cause a large number of neoantigens to be released, which can be recognized by T cells and cause rich immune cells to in ltrate into the tumor [52][53][54].
We have to admit that our study has some defects and limitations. Firstly, the research data was based on public databases. Some data are incomplete, such as some clinicopathological features and the sensitivity of drugs commonly used in the treatment of HCC, for instance, lenvatinib and oxaliplatin, have not been analyzed; Secondly, the constructed model needs external veri cation because the expression level of each sample is different, which may lead to the unreliability of the nal model. However, this research uses various methods to verify the new modeling algorithm and optimizes and analyzes it. Although the lack of external data validation, from the analysis results, our model was acceptable. But this study will be more convincing if there is external validation. Therefore, our team will re-collect clinicopathological data for the subsequent studies and enlarge the sample size for further veri cation.

Conclusions
Our research shows an innovative signature established by irlncRNAs that do not need to predict the expression level of lncRNA can predict HCC patients' prognosis and may contribute to identifying those patients who could bene t from anti-tumor immunotherapy.