3.1. Identification of endoplasmic reticulum stress-associated lncRNAs
Firstly, we explored the Molecular Signatures Database for relevant ERS genes, including the IRE1, PERK, and ATF6 gene sets. According to the transcriptome data of TCGA. In total, we identified 744 lncRNAs associated with ERS（CorFilter>0.4 and P-value<0.001）, and draw co-expression network plot(Fig. 1).We defined endoplasmic reticulum stress-associated lncRNAs in this study as ERS-associated lncRNAs.
3.2. Six ERS-associated lncRNAs for establishing risk models
First, 210 lncRNAs associated with prognosis were filtered out of the training samples using univariate Cox regression study (P-value< 0.01). The Lasso-Cox regression study was used for the lncRNAs mentioned before to prevent over-fitting genes. When Log(λ) is at its minimum value, 16 lncRNAs are available for multivariate Cox regression study(Fig. 2A；Fig. 2B).The final six lncRNAs were utilized to construct the risk model for ERS-associated lncRNAs and draw the forest plot(Fig. 2C).When we evaluated the expression levels of the above six ERS-associated lncRNAs in HCC tumor and normal samples, we found that the above six ERS-associated lncRNAs expressed highly in HCC tumor samples (Fig. 2D).The above six ERS-associated lncRNAs are MKLN1-AS, LINC01224, AL590705.3, AC008622.2, AC145207.5 and AC026412.3.The expression level and coefficient of the above six lncRNAs were used to calculate the risk score of each HCC patient, use the equation below.
Risk score equation=MKLN1-AS expression level×1.2381+LINC01224 expression level×0.5136+AL590705.3 expression level×0.8334+AC008622.2 expression level×2.2724+AC145207.5 expression level×0.5644+AC026412.3 expression level×2.7248.
The risk score equation was adopted for obtaining the risk score for the respective patient in the training samples and verification samples. Patients in the training samples were divided into groups at a high risk and a low risk in accordance with their median risk score. In addition, in accordance with median risk scores, patients in the verification sample were assigned into groups at a high risk and a low risk.
Following that, we performed KM survival study on both the entire samples and training samples and discovered that overall survival was significantly shorter in the group at a high risk than in the group at a low risk（Fig. 2E；Fig. 2F). In verification samples, overall survival was also shorter in the group at a high risk than in the group at a low risk (Fig. 2G). Meanwhile, the three samples examined were submitted to a time-ROC curve study. The area under the ROC curve (AUC) for 1, 3, and 5 years was 0.821, 0.733 and 0.703 in the entire sample and 0.842, 0.799 and 0.868 in the training sample, respectively（Fig. 3A； Fig. 3B）. Their area under the ROC curve (AUC) was 0.805, 0.671 and 0.547 for 1, 3, and 5 years, respectively, in the verification samples (Fig. 3C). Moreover, patient mortality increased as risk scores improved in all three samples (Fig. 3D-Fig. 3L). We stratified the different clinical features to further validate the risk model prognostic ability (such as gender, stage, and grade). In the entire samples, the group at a high risk also had a poorer prognosis across different clinical features (Fig. 4A-4F). The above results suggest that six ERS-associated lncRNAs risk models have a better prognosis for patients with hepatocellular carcinoma.
3.3. Evaluating the Independent Prognosis of Six ERS-associated lncRNAs Risk Models
The Hazard ratio and 95% confidence interval (CI) of the Risk score for the entire samples were 1.052 and 1.031-1.072, in the univariate Cox study (p-value<0.001) (Fig. 5A). The Hazard ratio and 95% confidence interval (CI) of the Risk score for the entire samples were 1.035 and 1.012-1.058, respectively, in multivariate Cox study (P=0.002) (Fig. 5B). As a consequence of the above, we conclude that the risk model we developed may be utilized as an independent prognostic factor.
We subsequently performed multi-exponential ROC curve study on the risk scores of the entire samples, training samples and verification samples to investigate the correlation with different clinical features. The AUC for the three samples were 0.821,0.805 and 0.842 respectively (Fig. 5C; Fig. 5D; Fig. 5E). According to the multi-exponential ROC curve study., the AUC of the risk model was much better than the other clinical features.
3.4.GSEA（Gene Set Enrichment Analysis）
In the entire HCC samples, the group at a high risk was enriched in DNA REPLICATION, RNA DEGRADATION, PATHWAYS IN CANCER, INSULIN SIGNALING PATHWAY. And the high risk is also enriched in BLADDER CANCER, SMALL CELL LUNG CANCER, RENAL CELL CARCINOMA. The group at a low risk was mainly enriched in GLYCINE SERINE AND THREONINE METABOLISM, PPAR SIGNALING PATHWAY, RETINOL METABOLISM, FATTY ACID METABOLISM (Fig. 5F). The results of the aforementioned research further indicate that the risk model we created plays a different biological role in the development of cancers.
3.5. To investigate the link between immune checkpoints and Chemotherapy drugs sensitivity in the ERS-associated lncRNAs risk model.
The pRRophetic package in R language was adopted for studying the sensitivity of groups at a high risk and a low risk to chemotherapies in the whole sample of HCC. As revealed by the high IC50 values of Sorafenib, Erlotinib, and Gefitinib in the group at a high risk, the group at a low risk had sensitivity to the above three medicines(Fig. 6A-6C).In the group at a low risk, the high IC50 for Bicalutamide, Bleomycin, Doxorubicin, Gemcitabine, and Imatinib indicated that the group at a high risk was sensitive to the above five medications（Fig. 6D-6H）.Furthermore, this study suggested that the group at a high risk had more activation in 48 immune checkpoints（Fig. 6I）.
3.6. Correlation between ERS-associated lncRNAs risk model and immune cell infiltration.
We found a connection between risk scores and immune cells across seven distinct platforms of immune cell infiltration. Such as, Neutrophil, Macrophage, T cell regulatory (Tregs), Macrophage M0, Macrophage M1, Macrophage M2, T cell CD4+ memory（Fig. 7A）.Further, The group at a high risk had a higher level of immune cell infiltration, which explains why patients in the group at a high risk had a worse prognosis (Fig. 7B-7J).
3.7. Two molecular subtypes by Consensus Clustering algorithm
In accordance with the risk model we built for ERS-associated lncRNAs, consensus clustering was performed on the entire sample of HCC using the ConsensusClusterPlus algorithm. Different molecular subtypes are expressed by the consensus number (K) (k=2-9) (Fig. 8A). K=2 (Fig. 8B) is selected as the best molecular subtype in accordance with the cumulative distribution function (CDF)(Fig. 8C) and relative change in area under the CDF curve (Fig. 8D). As a result, the entire HCC sample was split into two molecular subtypes: Cluster 1 and Cluster 2. According to a Kaplan-Meier survival study, the cluster 2 survival rate was significantly higher than Cluster 1(P-value<0.001) (Fig. 8E). The principal component study (PCA) showed that Cluster 1 and Cluster 2 had different distribution areas, which means our grouping was correct (Fig. 8F). Moreover, PCA study of the group at a high risk and a low risk achieved the same result（Fig. 8G）.
Then, Calculate the stromal, immune, and ESTIMATE scores of the Cluster 1 and Cluster 2 using the ESTIMATE algorithm. Although there was no significant difference in immune scores between Cluster 1 and Cluster 2 (P-value=0.21) (Fig. 8H), Cluster 1 and Cluster 2 exhibited different stromal (Fig. 8I) and ESTIMATE scores(J). Furthermore, we found that Cluster 1 had higher activation of 27 immune checkpoints (Fig. 8k).