1. Identification of prognosis necroptosis-related lncRNAs
The workflow of this research is shown in Fig. 1. Univariate Cox regression analysis was applied for 56 necroptosis-related genes. Eventually, 19 prognosis-related genes were obtained, including FADD, MLKL, TRIM11, IPMK, MYC, TNFRSF1A, TRAF2, PANX1, MAP3K7, DIABLO, ID1, HSPA4, FLT3, HAT1, PLK1, ALK, TERT, HSP90AA1, DDX58. And then, a correlation analysis was performed (|correlation coefficients| > 0.4, and p < 0.001) to filter necroptosis-related lncRNAs. 267 lncRNAs were identified, and a co-expression network of necroptosis-related lnRNAs was shown in Fig. 2A. A univariate Cox analysis was performed to confirm necroptosis and prognosis associated lncRNAs, and 41 lncRNAs were identified in the end (P < 0.05) (Fig. 2B). The expression profiles of the 41 lncRNAs in tumor and normal samples were also illustrated in Fig. 2C (* P < 0.05, ** P < 0.01, ***P < 0.001).
2. Consensus Clustering for the Prognosis Necroptosis-Related lncRNAs
Based on the 41 prognosis necroptosis-related lncRNAs, we conducted a consensus clustering analysis for all LUAD samples to explore the relationship between the expression signatures of the 41 lncRNAs and LUAD subtypes. According to the empirical cumulative density function (CDF) plot, when k = 2, the consensus clustering matrix shows that the LUAD samples can be classified into two subtypes (Fig. 3A). According to the consensus clustering analysis result, we next compared the clinical information based on clusters (cluster1 and cluster2). The gender, N staging, and tumor stage were shown to have significant differences (Fig. 3B). Furthermore, to further explore the differences between the two subgroups, Kaplan-Meier survival curves for cluster1 and cluster1 showed that the OS (overall survival) of the two subgroups are different. The OS of cluster 2 is better than cluster 1(Fig. 3C).
3. The immune checkpoint and TME signatures of the cluster1 and cluster2 Subgroups
To explore the tumor microenvironment(TME) features of the two subgroups of LUAD, we performed a correlation analysis for the immune checkpoints PDCD1(PD-1), CD274(PD-L1), PDCD1LG2(PD-L2), CTLA4, and LAG3 with the prognosis necroptosis-Related lncRNA (Fig. 4A). And we also compared the immune checkpoints expression level in different clusters (Fig. 4B-F). Eventually, we found that except for CTLA4, all the other four immune checkpoints have expression differences, and the expression levels for cluster1 are all higher than cluster2. In addition, to evaluate the degree of immune activation of different clusters, we further calculated the tumor microenvironment score by “ESTIMATE” arithmetic, which showed that the immune score of cluster1 is lower than cluster2 while the stromal score and estimate score have no significant difference in cluster1 and cluster2 (Fig. 4G-I).
4. Immune cells infiltration analysis, WGCNA, and GO analysis for clusters
To assess the immune cell infiltration conditions, we applied “CIBERSORT” arithmetic to evaluate the proportions of immune cells in LUAD. The 22 immune cells infiltration in cluster1 and cluster2 were displayed in Fig. 5A. Based on the analysis results and simultaneously combining the immune cells with a relatively high fraction level, we ultimately found that the Plasma cells, T cells CD8, Macrophages M0, and Macrophages M1 are higher in cluster1. On the contrary, the T cells CD4 memory resting, Macrophages M2, Dendritic cells resting, Dendritic cells activated, and Mast cells resting are higher in cluster2.
To further explore the signature of gene expression patterns in different clusters, we utilized the WGCNA algorithm to recognize features for cluster1 and cluster2. According to the WGCNA, we established that the MEgrey module has the highest correlation (|correlation coefficient|=0.56, P < 0.001) with the cluster (Fig. 5B). Therefore, we screened the top-30 intramodule connectivity genes from the MEgrey module, including ANXA10, ASCL1, ALCA, CDHR2, CDHR5, CSAG1, CSAG3, DDX3Y, EIF1AY, EPS8L3, FER1L6, HNF4A, KDM5D, MAGEA12, MAGEA3, MAGEA6, MT-ATP6, MT-CO1, MT-CO2, MT-CO3, MT-CYB, MT-ND2, MT-ND4, MT-ND4L, MYO1A, REG4, RPS4Y1, USP9Y, UTY, ZFY.
And then, Gene Ontology(GO)enrichment analysis was operated to identify functional features of the top-30 genes. A network diagram was plotted to manifest the results of GO analysis, which showed that the CDHR2 has high connectivity with enriched pathways (Fig. 5C). Next, we performed a Kaplan-Meier survival analysis in GEPIA, which showed that the overall survival time for the CDHR2 high expression group was worse than the low expression group in LUAD༈P = 0.033༉(Fig. 5D). Further, we validated the CDHR2 expression status in the HPA database, showing that CDHR2 has high antibody staining in some immunohistochemical samples and a high CDHR2 expression level has a lousy prognosis, which indicates that CDHR2 is a hub gene for the prognosis of LUAD patients (Fig. 5E).
5. Development and validation of a Prognostic Model of LUAD based on prognosis necroptosis-related lncRNAs
A total of 494 tumor samples of the TCGA lncRNAs matrix were randomly divided into train set (348) and test set (146) in a ratio of 7:3. According to the previously screened 41 prognosis necroptosis-related lncRNAs, we processed LASSO regression and 10-fold cross-validation to filter variables for the prognostic model (Fig. 6A, B). In the end, 13-lncRNAs were established to construct the model, including RHOQ-AS1, LINC02693, LINC02273, LINC01800, LINC00996, FGD5-AS1, CADM3-AS1, TMPO-AS1, LINC01116, SFTA3, LINC02802, CRNDE, and PARP11-AS1. The risk score can be calculated by the following formula: (-0.1694×RHOQ-AS1 Expression)+(0.1762×LINC02693 Expression)+(-0.1310×LINC02273 Expression)+(-0.6575× LINC01800 Expression) +(-0.0423×LINC00996 Expression)+(0.1671×FGD5-AS1 Expression)+(-0.0160×CADM3-AS1 Expression)+(0.0808×TMPO-AS1 Expression)+(0.0845×LINC01116 Expression)+(-0.0156×SFTA3 Expression+(0.1348× LINC02802 Expression)+(-0.050×CRNDE Expression)+(-0.0505×PARP11-AS1 Expression). In this model, five genes (LINC02693, FGD5-AS1, TMPO-AS1, LINC01116, LINC02802) have positive coefficient, indicating that these are harmful factors for LUAD patients. On the contrary, eight genes (RHOQ-AS1, LINC02273, LINC01800, LINC00996, CADM3-AS1, SFTA3, CRNDE, PARP11-AS1) have negative coefficient that undertake a protective effect on the prognosis of LUAD patients, and their higher expression level often means a better outcome. Based on this risk prognosis model, we divided train set into high and low-risk group by the median risk score. Kaplan-Meier survival analysis was conducted for two groups, and log-rank tests showed that the low-risk group had a better prognosis (P < 0.001, Fig. 6C). Next, the ROC curve of 1, 3, and 5 years were plotted to estimate the model’s prediction efficiency, and the area under the curve (AUC) was calculated, which respectively are 0.75, 0.7, 0.67(Fig. 6D).
To examine the model’s prognosis prediction efficacy, we also calculated the risk score of the test set. We separated samples into high and low-risk groups in the median risk score cutoff. The same as before, we compared the survival differences of the two groups, plotted the ROC curves, and calculated the AUC for 1, 3, and 5 years. Similar to before, the two groups have significant differences in survival. The low-risk group has a better prognosis than the high (P = 0.002, Fig. 6E). And the AUC for the 1, 3,5 years ROC curves are 0.66, 0.6, and 0.78 (Fig. 6F). These results indicating this model has a good prediction effect.
6. External validation and independent prognostic analysis for model
For further confirmation of the efficacy of this model, we verified this model in an external validation dataset from the GEO database, GSE30219, with 289 samples from the GPL570 platform. Similarly, a Kaplan-Meier survival analysis was performed to compare the survival difference between the high-risk and low-risk groups based on the median risk score cutoff, which showed that the high-risk group is bad in prognosis (Fig. 7A). Moreover, the AUC of the ROC curves in 1, 3, and 5 years respectively, are 0.68, 0.64, and 0.61(Fig. 7B). These results robustly indicate that this model has an excellent performance in LUAD patients’ prognosis survival prediction.
To verify whether our model can be used as an independent prognostic factor independent of other clinical traits. We conducted independent prognostic analysis for this model in the entire LUAD TCGA dataset combined with clinical information. First, a univariate independent prognostic analysis showed that age and gender are not correlated with survival, but the risk score and tumor stage are survival-related (Fig. 7C). Then, a multivariate independent prognostic analysis was implemented, which indicated the risk score and tumor stage could be independent of other factors as independent prognostic factors (Fig. 7D). In conclusion, the results showed that the model-based risk score and tumor stage are independent factors for prognosis, while age and gender are not independent prognosis factors. Therefore, our established model can be an independent prognostic factor for LUAD prognosis.
7. Screening differential expression gene in model and identifying clusters’ signature based on risk score
We utilized the “limma” package to analyze the differentially expressed genes (lncRNAs), and filtered genes rely on the following criteria: | logFC |>1 and P- value < 0.05. At last, we confirmed that SFTA3 was downregulated in tumor group ( logFC = -1.4548, P- value < 0.001). And its expression level in the tumor and the normal group was compared in a box map, showing that SFTA3 is relatively high expression in normal but low expression in the tumor group (Fig. 8A). According to the previous Cox analysis regression analysis (Fig. 2B), the Hazard Ratio (HR) of SFTA3 is 0.842 (P < 0.001), which indicates that a high SFTA3 expression level is a protective factor for patients’ survival and a low SFTA3 expression level may indicate bad prognosis. Meanwhile, this comparison has established that SFTA3 is low expressed in tumors. It also reminds us that detecting SFTA3 low expression may be an essential marker for LUAD diagnosis.
Moreover, for more profoundly investigating the multiple traits of LUAD. We compared the differences in clusters based on our model’s risk score. Results showed that cluster1 has a higher risk score than cluster2, consistent with our previous studies that cluster1 has a worse prognosis than cluster 2 (Fig. 3C, Fig. 8B). Combined with our previous studies, cluster1 has a lower immune score than cluster2 (Fig. 4H), which can explain why cluster1 has a poor prognosis —— due to cluster1 in low immune infiltration status that causes patients into high risk and has an unfavorable prognosis. Therefore, these findings suggested that the immune process plays an essential role in LUAD progression.